Abstract
This study was designed to replicate an earlier report on the link between low scores on the Grooved Pegboard test (GPB), invalid responding, and elevated self-reported psychiatric symptoms. A fixed battery of neuropsychological tests was administered to 100 consecutively referred outpatients (MAge = 38.3, MEducation = 13.6 years) following traumatic brain injury at a Midwestern academic medical center. Classification accuracy of GPB validity cutoffs was computed against a free-standing PVT and three composite measures of embedded validity indicators. Previously suggested GPB validity cutoffs (T ≤ 29 in either hand) produced good combinations of sensitivity (0.25–0.55) and specificity (0.89–0.98) to psychometrically defined invalid performance. Raising the cutoff to T ≤ 31 resulted in a reasonable trade-off between increased sensitivity (0.36–0.55) and decreased specificity (0.84–0.94). T ≤ 31 in both hands was highly specific (0.93–0.98) to noncredible responding. GPB validity cutoffs were unrelated to psychiatric symptoms or injury severity. Failing PVTs based on forced choice recognition was associated with elevated self-reported depression, somatic concerns, and overall symptomatology. Low scores on the GPB are reliable indicators of noncredible responding. Self-reported emotional distress has a complex relationship with performance validity. Psychogenic interference is a potential mechanism behind PVT failures, and its expression is likely mediated by instrumentation and sampling artifacts. Further research on the topic is clearly needed to advance current understanding of psychogenic interference as a confound in cognitive testing.
Introduction
The validity of the clinical interpretation of neuropsychological profiles rests on the assumption that examinees were able and willing to demonstrate their true ability level during the testing (Bigler, 2015; Lezak, Howieson, Bigler, & Tranel, 2012). The limitation of clinical intuition in detecting noncredible responding has long been evident (Heaton, Smith, Lehman, & Vogt, 1978). Therefore, a consensus emerged among professional organizations that a systematic and empirical evaluation of performance validity during neuropsychological assessment is necessary (Bush et al., 2005; Bush, Heilbronner, & Ruff, 2014; Chafetz et al., 2015; Heilbronner et al., 2009).
The traditional gold standard measures were free-standing performance validity tests (PVTs) designed specifically to evaluate the credibility of a given response set. These instruments are robust and, by design, optimized to differentiate genuine impairment from invalid performance. However, they often require multiple learning trials or time delays and provide little to no information on cognitive ability, which remains the ultimate goal of a clinical evaluation. In contrast, embedded validity indicators (EVIs) are nested within established tests of neuropsychological functioning and were subsequently co-opted as PVTs.
Given that EVIs simultaneously measure cognitive ability and performance validity, they have the potential to address several of the current challenges faced by practicing neuropsychologists (Boone, 2013). As they abbreviate the psychometric testing without sacrificing data on either ability or effort, EVIs allow clinicians to provide a comprehensive evaluation of core constructs despite growing systemic pressures to optimize assessment practices for cost-effectiveness. Shorter testing also reduces demand on the examinee’s mental stamina, which is especially important in certain vulnerable populations such as young children and patients with complex medical and/or psychiatric conditions (Lichtenstein, Erdodi, & Linnea, 2017). Limiting exposure to free-standing PVTs could also help preserve the long-term integrity of these instruments, as repeated exposure to PVTs has been reported to compromise their signal detection performance (Boone, 2013). On the other hand, the practice of using the same instrument to measure both cognitive ability and performance validity has raised concerns about the inevitable confluence of these conceptually distinct constructs that PVTs are meant to differentiate (Bigler, 2012, 2015).
Research on EVIs has increased exponentially in recent years. Today, EVIs cover a broad range of cognitive domains: attention (Ashendorf, Clark, & Sugarman, 2017; Reese, Suhr, & Riddle, 2012; Trueblood, 1994), memory (Bortnik et al., 2010; Moore & Donders, 2004; Shura, Miskey, Rowland, Yoash-Gatz, & Denning, 2016; Pearson, 2009), processing speed (Erdodi et al., 2017a; Etherton, Bianchini, Heinly, & Greve, 2006; Kim et al., 2010b; Shura et al., 2016; Sugarman & Axelrod, 2015; Trueblood, 1994), language (Erdodi, 2017; Whiteside et al., 2015), executive functions (Ashendorf, Clark, & Sugarman, 2017; Shura et al., 2016; Suhr & Boyer, 1999), vigilance (Erdodi, Roth, Kirsch, Lajiness-O’Neill, & Medoff, 2014b; Lange et al., 2013; Ord, Boettcher, Greve, & Bianchini, 2010; Shura et al., 2016), and visuospatial/perceptual (Lu, Boone, Cozolino, & Mitchell, 2003; Reedy et al., 2013; Shura et al., 2016; Sussman, Peterson, Connery, Baker, & Kirkwood, 2017). In contrast, measures of motor speed have been underrepresented in this otherwise growing trend. Although the finger tapping test has long-established validity cutoffs (Arnold et al., 2005; Axelrod, Meyers, & Davis, 2014; Greiffenstein, Baker, & Gola, 1996; Larrabee, 2003), and the Grooved Pegboard Test (GPB) has a growing evidence base supporting its potential as an EVI (Arnold & Boone, 2007), the majority of the extant research only reported the effect of invalid performance on GPB scores as a continuous variable. The GPB (Lafayette Instrument, 2015) is a measure of manual dexterity designed to assess fine motor speed. It instructs the examinee to place grooved pegs into slotted holes angled in various directions, one at a time, as quickly as possible. Physically, the test consists of a small board with five rows, with five holes per row. The most commonly used score is completion time.
The GPB as an EVI has a long presence in the research literature. Examinees with noncredible responding produced consistently lower mean scores when compared to credible controls, with effect size ranging from medium (d = 0.50; Inman & Berry, 2002) to very large (d = 1.21; Rapport, Farchione, Coleman, & Axelrod, 1998) for the dominant hand and from small (d = 0.21—nonsignificant; Inman & Berry, 2002) to large (d = 1.03; Rapport et al., 1998) for the nondominant hand. The largest effect (d = 1.33) was reported by Binder and Willis (1991) using the combined raw score from both hands. All studies found a larger effect in the dominant hand (Johnson & Lesniak-Karpiak, 1997; van Gorp et al., 1999) relative to the nondominant hand. As a reference, in experimental malingering designs, Cohen’s d values of 0.75 are considered moderate, while Cohen’s d values of 1.25 are considered large (Rogers, Sewell, Martin, & Vitacco, 2003). Based on this classification system, the GPB shows promise as a PVT.
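Since the review above turns on Cohen's d values, it may help to make the metric explicit. The sketch below uses the pooled-standard-deviation variant, one common formulation; the cited studies may have used slightly different estimators, and the function name is ours.

```python
import math

def cohens_d(m1, sd1, n1, m2, sd2, n2):
    """Cohen's d between two groups using a pooled standard deviation.

    m1/sd1/n1: mean, SD, and size of group 1 (e.g., credible responders);
    m2/sd2/n2: the same for group 2 (e.g., noncredible responders).
    This is one common estimator; variants exist (e.g., Hedges' g).
    """
    pooled_sd = math.sqrt(
        ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / (n1 + n2 - 2)
    )
    return (m1 - m2) / pooled_sd
```

With two groups of 50 whose means differ by one pooled SD, the function returns d = 1.0, a "moderate-to-large" effect by the Rogers et al. (2003) benchmarks cited above.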
While the research reviewed above was important to establish the sensitivity of the GPB to noncredible responding in general, the practical demands of clinical neuropsychology require specific thresholds with known classification accuracy. The first study that published validity cutoffs for the GPB was by Erdodi et al. (2017d). Their mixed clinical sample consisted of 190 patients medically referred for neuropsychological assessment in the northeast USA. A demographically adjusted T score ≤ 29 for either hand was specific (0.85–0.90) to psychometrically defined invalid responding, with variable sensitivity (0.33–0.66). If Fail was defined as T ≤ 31 on both hands, specificity improved slightly (0.86–0.91). In addition, failing GPB validity cutoffs was associated with higher levels of self-reported symptoms on the Beck Depression Inventory—Second Edition (BDI-II) and several scales of the Personality Assessment Inventory (PAI; Depression, Somatic Concerns, Borderline and Antisocial Features, Alcohol and Drug Problems). Those who passed the dominant hand cutoff reported mean BDI-II scores in the mild range, while those who failed it had a mean score in the moderate severity range. The clinical significance of the increased symptom report was less clear on the PAI scales. On Somatic Concerns, the mean T score for patients who failed any of the GPB validity cutoffs crossed the T > 70 mark, while those who passed it produced mean T scores < 70. However, all means on the Antisocial Features scales were T < 60, regardless of Pass/Fail status on the GPB.
The authors attributed this unusual pattern of findings to psychogenic interference: a hypothesized mechanism by which emotional distress disrupts examinees’ ability to consistently demonstrate their maximal performance during psychometric testing (Bigler, 2012; Erdodi et al., 2016), producing internally inconsistent profiles, which in turn are commonly interpreted as evidence of noncredible responding (Boone, 2007a; Delis & Wetter, 2007; Greiffenstein et al., 1996; Slick, Sherman, Grant, & Iverson, 1999).
The link between emotional distress and cognitive performance is an intriguing topic in the context of clinical and forensic neuropsychology (Boone, 2007b; Henry et al., 2018; Suhr, Tranel, Wefel, & Barrash, 1997). Although multiple independent investigations converge in the conclusion that depression is orthogonal to performance validity (Considine et al., 2011; Egeland et al., 2005; Rees, Tombaugh, & Boulay, 2001), the evidence on the link between depression and test performance on specific cognitive domains remains equivocal. For example, some investigations concluded that depression and memory performance were unrelated (Egeland et al., 2005; Langenecker et al., 2005; Raskin, Mateer, & Tweeten, 1998; Rohling, Green, Allen, & Iverson, 2002). In contrast, other studies found a significant relationship between them (Bearden et al., 2006; Christensen, Griffiths, MacKinnon, & Jacomb, 1997; Considine et al., 2011).
Other factors affecting emotional functioning, such as complex trauma history, pain, and fatigue (Costabile, Bilo, DeRosa, Pane, & Sacca, 2018; Greiffenstein & Baker, 2008; Kalfon, Gal, Shorer, & Ablin, 2016; Suhr, 2003; Williamson, Holsman, Chaytor, Miller, & Drane, 2012), have also been linked to both PVT failure and cognitive performance. Accumulating evidence for the psychogenic interference hypothesis led to a proposed diagnostic entity (“cogniform disorder”) designed to capture excessive cognitive symptoms and poor test taking effort in the context of an assumed sick role nested in a conversion-like manifestation (Delis & Wetter, 2007).
More recently, and using a conceptually and computationally sophisticated methodology, Henry et al. (2018) demonstrated that illness perception in general and cogniphobia specifically predicted PVT outcomes. Illness perception refers to thoughts and beliefs about one’s health status, while cogniphobia is conceptualized as the belief that cognitive exertion may exacerbate an underlying neurological condition and the resulting avoidance of tasks that require significant mental effort (Suhr & Spickard, 2012). Taken together, existing research suggests that studying psychogenic interference has the potential to account for some of the unexplained variance in cognitive test performance.
The Erdodi et al. (2017d) study had a number of limitations. Their sample was diagnostically heterogeneous, which raises questions on whether the classification accuracy statistics developed in their study would generalize to specific diagnostic groups. In addition, missing data limited the effective sample size and, potentially, biased their parameter estimates. Therefore, the present study was designed to replicate their findings in a sample of patients with traumatic brain injury (TBI) assessed in a different region of the USA, using a fixed battery of neuropsychological tests.
Method
Participants
The sample consisted of a consecutive case sequence of 100 adults (55% male, 90% right-handed) clinically referred for neuropsychological assessment subsequent to a TBI at an outpatient neurorehabilitation unit of a Midwestern academic medical center. Mean age was 38.3 years (range 17–70), while mean level of education was 13.6 years. Overall intellectual functioning was at the low end of the average range (mean FSIQ = 92.8), as was estimated premorbid functioning based on performance on a single-word reading test (mean WRAT-4 score = 94.1). The sample was used in two previous publications (Erdodi, Abeare et al., 2017b; Erdodi, Roth, Kirsch, Lajiness-O’Neill, & Medoff, 2014b) focused on EVIs within Conners’ Continuous Performance Test and the Forced Choice Recognition trial of the California Verbal Learning Test, respectively. Given that patients were referred for cognitive evaluation by treating physicians, information on external incentive to appear impaired was inconsistently available. Therefore, the criteria for malingered neurocognitive deficits put forth by Slick, Sherman, Grant, and Iverson (1999) could not be applied. Instead, noncredible performance was operationalized using a variety of psychometric tools.
The majority (76%) of the patients sustained a head injury of mild severity. The remainder were classified as moderate or severe by the assessing neuropsychologist, based on available data on commonly used injury parameters (duration of loss of consciousness, evidence of intracranial abnormalities on neuroradiological imaging, duration of peritraumatic amnesia, Glasgow Coma Scale score at the scene of the accident). A mild head injury was operationalized as a GCS ≥ 13, loss of consciousness < 30 min, posttraumatic amnesia < 24 h, and negative neuroradiological findings. Patients with compromised upper extremity neurological (e.g., hemiparesis or lesions to the peripheral nervous system) or orthopedic (e.g., bone fracture or soft tissue injury to the arm or hand) integrity were not administered the GPB. All patients were in the postacute stage of recovery (> 3 months post mild TBI and > 1 year post severe TBI).
Materials
Tests Administered
A fixed battery of commonly used neuropsychological tests [Booklet Category Test (DeFilippis & McCampbell, 1997), Conners’ Continuous Performance Test—Second Edition (Conners, 2004); Peabody Picture Vocabulary Test—Fourth Edition (Dunn & Dunn, 2007); Tactual Performance Test (Halstead, 1947), Trail Making Test (Reitan, 1955, 1958), verbal fluency (FAS and animals), Wisconsin Card Sorting Test (Heaton et al., 1993), Wide Range Achievement Test—Fourth Edition (Wilkinson & Robertson, 2006), Word Choice Test (Pearson, 2009)] was administered to all patients, covering a wide range of cognitive domains. Intellectual functioning was measured with the Wechsler Adult Intelligence Scale—Fourth Edition (WAIS-IV; Wechsler, 2008). Memory functioning was measured using the Wechsler Memory Scale—Fourth Edition (WMS-IV; Wechsler, 2008) and the California Verbal Learning Test—Second Edition (CVLT-II; Delis, Kramer, Kaplan, & Ober, 2000).
Self-reported emotional functioning was assessed using the Beck Depression Inventory—Second Edition (BDI-II; Beck, Steer, & Brown, 1996) and the Symptom Checklist 90—Revised (SCL-90-R; Derogatis, 1994). BDI-II scores between 14 and 19 are considered mild, 20–28 are considered moderate, and ≥ 29 are considered severe. SCL-90-R T scores ≥ 63 on any of the clinical scales are considered clinical elevations. The Global Severity Index has been reported to be a particularly sensitive indicator of overall psychological distress in individuals with TBI (Westcott & Alfrano, 2005), although other scales were also shown to be sensitive to the residual effects of head injury (Linn, Allen, & Willer, 1994). However, the SCL-90-R was also reported to be vulnerable to symptom fabrication/exaggeration in the context of experimentally induced malingering (McGuire & Shores, 2001; Sullivan & King, 2008).
The main free-standing PVT was the Word Memory Test (WMT) at standard cutoffs (Green, 2003). GPB scores were demographically corrected using the norms by Heaton, Miller, Taylor, and Grant (2004). Given previous theoretically based (Bigler, 2014; Leighton, Weinborn, & Maybery, 2014) and empirically substantiated (Erdodi, 2017; Erdodi et al., 2017c) concerns about modality specificity as a confounding variable in calibrating PVTs, two additional instruments (versions of the “Erdodi Index Five” [EI-5]) were developed, each aggregating five EVIs into a single composite measure of performance validity.
The EI-5 (FCR and PSP)
The first composite consisted of PVTs based on forced choice recognition memory (EI-5FCR), while the other consisted of EVIs nested within measures of processing speed (EI-5PSP). Each EI-5 component was recoded onto a four-point scale: a score in the clear Pass range was coded as 0, while a score in the clear Fail range was coded as 3, with two intermediate levels of failure (Table 1), following the methodology described by Erdodi (2017). The demarcation for an EI value of 1 on each individual component is determined by the most liberal cutoff available in the literature. EI values of 2 and 3 are defined either by previously published, more conservative cutoffs or by the 10th and 5th percentiles of the local distribution. In other words, if no clear cutoff is available to define the EI level of 3, the score associated with the bottom 5% of the distribution (the most egregious failures) determines the specific cutoff. This methodology ensures that the EI model has a similar interpretation across studies by balancing universally accepted cutoffs against study-specific variations in base rates of failure (BRFail).
Naturally, fixing the cutoffs for level 2 and level 3 to BRFail does not eliminate the confounding effect of the inherent variability across samples. The 5th percentile will be defined by very different scores in healthy university students and forensically evaluated patients with comorbid severe neuropsychiatric disorders. Therefore, EI values may not be comparable across studies. Moreover, extreme fluctuations in local distributions resulting in unusually high or low functioning examinees may render the EI model outright useless. For example, if only 5% of the sample fails the most liberal cutoff, meaningful differentiation of examinees in terms of the “extent of failure” is not possible due to the low BRFail. Conversely, if 90% of the sample fails highly conservative cutoffs, determining the range for EI level 1 is problematic, as it is defined as the most liberal published cutoff and is designed to be the gateway to noncredible designation (i.e., the additive effect of repeated near-Passes is equivalent to a couple of clear Fails), yet the vast majority of the sample demonstrates much stronger evidence of invalid performance, undermining the practical utility of this designation.
Such dramatic sampling idiosyncrasies are rare. When they do occur, they violate the implicit assumption underlying the EI model, namely that invalid performance is a continuous variable with a normative distribution of the severity gradient. Existing research on the EI produced remarkably consistent results supporting this a priori condition for the psychometric utility of the model. Across several studies, samples, EI components, and cutoffs (An et al., 2018; Erdodi, 2017; Erdodi et al., 2014b, 2017a, 2017b, 2017d, 2018a, 2018c, 2018d; Erdodi, Pelletier, & Roth, 2018b; Erdodi, Tyson et al., 2017c; Zuccato, Tyson, & Erdodi, 2018), zero was the modal value, the majority (half to two thirds) of the sample fell in the Pass range (i.e., EI ≤ 1), around 15–25% fell in the Borderline range (i.e., EI of 2 or 3), and 15–25% in the Fail range (i.e., EI ≥ 4). Two notable exceptions occurred when multiple EVIs nested within the same test were allowed to serve as independent EI components (Erdodi & Roth, 2017) or when two of the EI components had unusually high BRFail (Erdodi & Rai, 2017). In both cases, the left side of the distribution of EI scores was flattened, and as a result, the demarcation lines became less clear. However, more extreme EI scores retained their high specificity to noncredible responding.
The value of the full EI-5 scale is obtained by summing its recoded components and, thus, can range from 0 (all five constituent PVTs were passed) to 15 (all five constituent PVTs were failed at the most conservative cutoff). An EI-5 score ≤ 1 can be confidently classified as a Pass, as it indicates at most a single marginal failure. The clinical interpretation of the next range of scores (EI-5 of 2 or 3) is problematic: it can reflect several soft Fails, a single failure at the most conservative cutoff, or some combination of both. While this level of performance is clearly not a clean Pass, it also lacks sufficiently strong evidence to deem the response set invalid. Therefore, it is labeled Borderline and excluded from analyses requiring a binary (Pass/Fail) outcome. Previous research demonstrated that individuals who scored in the Borderline range were more likely to fail other PVTs compared to those in the Pass range, but less likely to fail other PVTs compared to those in the Fail range (Erdodi, 2017; Erdodi et al., 2018c; Erdodi, Tyson et al., 2017). In contrast, an EI-5 score ≥ 4 contains sufficient evidence of noncredible responding from independent PVTs and, therefore, serves as the lower limit of the Fail range (Table 2). In previous studies, the EI model produced classification accuracy comparable to established free-standing PVTs in clinical samples of both adults (Erdodi, 2017; Erdodi et al., 2016; Erdodi & Rai, 2017) and children (Lichtenstein, Erdodi, Rai, Mazur-Mosiewicz, & Flaro, 2018).
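The aggregation logic described above can be condensed into a short sketch. The function below is an illustration of the scoring rules only (the name and structure are ours, not part of the published EI model); mapping raw test scores onto the 0–3 component levels requires the instrument-specific cutoffs in Table 1.

```python
def classify_ei5(component_levels):
    """Aggregate five recoded EI-5 components into a composite and label.

    Each element of component_levels is an ordinal value 0-3:
    0 = clear Pass, 1 = failure at the most liberal published cutoff,
    2 and 3 = failure at progressively more conservative cutoffs.
    Returns (total, label) where total ranges from 0 to 15.
    """
    if len(component_levels) != 5:
        raise ValueError("the EI-5 requires exactly five component PVTs")
    if any(v not in (0, 1, 2, 3) for v in component_levels):
        raise ValueError("each component must be recoded onto a 0-3 scale")
    total = sum(component_levels)
    if total <= 1:
        label = "Pass"        # at most a single marginal failure
    elif total <= 3:
        label = "Borderline"  # indeterminate; excluded from binary analyses
    else:
        label = "Fail"        # cumulative evidence of noncredible responding
    return total, label
```

For example, one marginal failure (`[1, 0, 0, 0, 0]`) yields a Pass, three marginal failures yield a Borderline, and a single most-conservative-cutoff failure plus one intermediate failure (`[3, 2, 0, 0, 0]`) yields a Fail.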
The EI-5s were designed to capture both the number and the extent of PVT failures, providing a single-number summary of cumulative evidence of noncredible responding on a continuous scale of measurement. By accounting for different levels of PVT failure, they offer a more nuanced measure of performance validity. In contrast, the common practice of using a single binary cutoff that separates a scale into a valid and an invalid range ignores alternative cutoffs and inevitably sacrifices the diagnostic purity of both groups by forcing borderline cases into one of the two categories.
For example, a Reliable Digit Span (RDS; Greiffenstein, Baker, & Gola, 1994) has two commonly used cutoffs. While both have been shown to meet minimum specificity standards (Heinly, Greve, Bianchini, Love, & Brennan, 2005; Mathias, Greve, Bianchini, Houston, & Crouch, 2002; Reese, Suhr, & Riddle, 2012), RDS ≤ 7 (the liberal cutoff) is optimized for sensitivity, while ≤ 6 (the conservative cutoff) is optimized for specificity. As such, they have different predictive power. Ignoring that fact attenuates the overall classification accuracy and introduces systematic errors in the analysis.
The coexistence of alternative cutoffs creates a potential for diverging interpretations. For example, some assessors may consider an RDS score of 7 a “near-Pass” (Bigler, 2012)—in other words, a performance that does not meet a “sufficiently conservative” standard for failing a PVT. As such, they may be inclined to classify the response set as valid. Others may interpret the same score as the first level of noncredible responding and, consequently, label it as a “soft Fail” (Erdodi & Lichtenstein, 2017). While both hypothetical assessors above correctly determined that the score provides some, albeit weak, evidence of invalid performance, the confluence of the demand for binary classification (Pass/Fail), subjective thresholds for failure, and the implications of descriptive labels (“near-Pass” vs. “soft Fail”) ultimately led them to reach the opposite conclusions.
The EI model is a methodological innovation that provides a psychometric solution to the dilemma of “in-between” scores by the virtue of rescaling each individual performance onto a four-point ordinal scale (0–3), establishing a clearly valid range (= 0), and three levels of failure (1–3). This recoding is performed without making a definitive binary classification (i.e., Pass/Fail) at the level of individual components. Delaying the final decision on any given constituent PVT allows the assessor to take into account scores on other PVTs, and make the ultimate determination based on the cumulative evidence, consistent with recommendations from multiple professional organizations (Bush et al., 2005; Bush, Heilbronner, & Ruff, 2014; Chafetz et al., 2015; Heilbronner et al., 2009).
As such, an RDS of 7 is coded as 1 (weak evidence of noncredible performance). If all other PVTs were passed (= 0), this is then interpreted as a single incident of unusually low performance, and the overall EI composite will be considered a Pass. Similarly, in isolation, a CVLT-II forced choice recognition (FCR) score of 15, a Logical Memory recognition (LMRecog) raw score of 20 out of 30, or a Coding age-corrected scaled score of 5 would be treated as insufficient evidence to deem the entire response set invalid. However, if all four of these scores were to occur together, the overall neurocognitive profile would be interpreted as a Fail and considered equivalent to the combination of an RDS of 6 and an FCR of 14—both of which are individually highly specific to invalid performance (Jasinski, Berry, Shandera, & Clark, 2011; Persinger et al., 2018; Schwartz et al., 2016).
Most of the independent empirical support for the EI model comes from the Advanced Clinical Solutions module for assessing suboptimal effort (Pearson, 2009). The Technical Manual reports that around 25% of the overall clinical sample had an RDS of 7 and LMRecog of 20. It also notes that 19% of the overall clinical sample failed both of them [or two other PVTs at a comparably liberal cutoff (i.e., at the 25% base rate)]. However, only 6% failed three, 3% failed four, and only 1% failed all five. The precipitous drop in the base rate of cumulative failure can be interpreted as a decrease in false positive rate. In other words, while having one or two marginal PVT failures is a relatively common occurrence (i.e., weak evidence of globally invalid performance), the probability of ≥ 3 such failures is very low even in a clinical sample, and therefore, it provides strong evidence that the overall neurocognitive profile is likely invalid. Flexibility is a major advantage of the EI model over the Pearson ACS model: it is a modular index that can be built by mixing and matching components. Anyone (researcher or clinician) who has data on five or more well-validated EVIs can build his/her unique version of the EI and use it as a multivariate index of performance validity in either archival research or prospective studies.
Although some methodologists recommend excluding scores in the indeterminate range of performance on the criterion variable to improve the internal validity of the design (Greve & Bianchini, 2004), and this has since become widespread practice in performance validity research (Axelrod, Meyers, & Davis, 2014; Erdodi et al., 2017d; Jones, 2013; Kulas, Axelrod, & Rinaldi, 2014), it also raises concerns about artificially inflating classification accuracy and, in turn, limiting the generalizability of the findings. Handling borderline cases is a complex, multifactorial decision. The ultimate decision should take into account missing data (i.e., if component PVTs are inconsistently administered, a score in the Pass range is less reliable compared to a sample where all participants have data on all measures), the level of cutoff (failing PVTs at more conservative cutoff provides stronger evidence of noncredible performance, and hence, fewer failures are required to confidently establish a criterion group of invalid profiles), and the number of PVTs administered (requiring two failures has a different meaning if 2 [100% failure rate] or 12 PVTs [16.7% failure rate] were administered).
The VI-7
To provide an alternative validity composite that included all participants (i.e., did not exclude indeterminate cases), the Validity Index Seven (VI-7) was created, following the traditional approach of counting the number of PVT failures at more conservative dichotomous (i.e., Pass/Fail) cutoffs (Arnold et al., 2005; Babikian, Boone, Lu, & Arnold, 2006; Bell-Sprinkle et al., 2013; Kim et al., 2010a; Nelson et al., 2003). A VI-7 value ≤ 1 (i.e., at most one failed PVT) was considered a Pass, while a VI-7 value ≥ 2 (i.e., at least two failed PVTs) was considered an overall Fail, following commonly accepted and empirically supported forensic standards (Boone, 2013; Larrabee, 2014; Odland, Lammy, Martin, Grote, & Mittenberg, 2015). Table 3 lists the components of the VI-7, cutoffs, references, and BRFail. Given that in the present sample all tests were administered to every patient, and only two failures were required to deem the entire profile invalid, more conservative cutoffs were applied to each of the seven components.
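In contrast to the four-level EI-5 recoding, the VI-7 reduces to a simple failure count over dichotomous outcomes. A minimal sketch (the function name is ours; the component cutoffs themselves come from Table 3):

```python
def classify_vi7(pvt_failed):
    """Classify the VI-7 from seven dichotomous (Pass/Fail) PVT outcomes.

    pvt_failed: sequence of seven booleans, True where the component PVT
    was failed at its conservative cutoff.
    Returns (number_of_failures, label).
    """
    if len(pvt_failed) != 7:
        raise ValueError("the VI-7 requires exactly seven component PVTs")
    n_failures = sum(bool(f) for f in pvt_failed)
    # <= 1 failure: Pass; >= 2 failures: Fail (the two-failure standard)
    return n_failures, ("Pass" if n_failures <= 1 else "Fail")
```

Note that, unlike the EI-5, this scheme has no Borderline band: every profile receives a binary classification, which is precisely why the VI-7 retains all participants.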
Evidence for the Validity of the Composite PVTs
The EI model has growing empirical support as a multivariate PVT. Since its introduction, it performed very similarly to free-standing PVTs as an alternative measure of performance validity (Erdodi et al., 2014b, 2016, 2017d, 2018b, 2018d; Erdodi et al., 2018c; Erdodi & Roth, 2017). Cross-validation against free-standing PVTs further consolidated these findings. Erdodi and Rai (2017) reported that on a different version of the EI-5, a score < 4 had poor specificity (0.65–0.66) against the WMT and Non-Verbal Medical Symptom Validity Test (Green, 2008). However, specificity improved greatly at ≥ 4 (0.80–0.85) and ≥ 5 (0.89–0.90). More recently, An et al. (2018) found that a version of the EI-5 was highly predictive of experimental malingering (0.73 sensitivity at 0.88 specificity) as well as the WCT (0.79 sensitivity at 0.92 specificity) and Test of Memory Malingering (0.65 sensitivity at 0.97 specificity).
The evidence to support the indeterminate range (Borderline) as a legitimate third outcome of performance validity testing is even stronger, both for the traditional VI and the novel EI model. Erdodi (2017) demonstrated that EI-5 scores in the Borderline range provide consistently stronger evidence of noncredible responding than scores in the Pass range, but consistently weaker evidence of noncredible responding than scores in the Fail range. He argued that forcing these scores into either the Pass or the Fail category would contaminate the diagnostic purity of the criterion groups and, hence, attenuate classification accuracy. These findings were replicated in subsequent studies using different versions of both the VI and the EI (Erdodi et al., 2017d, 2018a, 2018c). Taken together, the cumulative evidence suggests that when validity composites are built by aggregating individual PVTs using liberal cutoffs, excluding the indeterminate range is necessary to optimize the separation of valid and invalid profiles and to restore the stringent (≥ 0.90) specificity standard.
Procedure
Data were collected from the clinical archives of the outpatient neuropsychology service where patients were assessed. Only deidentified test data were recorded for research purposes to protect patient confidentiality. The project was approved by the Institutional Review Board of the hospital system where the data were collected. APA ethical guidelines regulating research involving human participants were followed throughout the project.
Data Analysis
Basic descriptive statistics (M, SD, BRFail) were reported when relevant. Overall classification accuracy (AUC) and corresponding 95% confidence intervals (95% CIs) were computed in SPSS version 23.0. Sensitivity, specificity, positive (PPP) and negative predictive power (NPP), and risk ratio (RR) were calculated using standard formulas. The minimum acceptable level of specificity is 0.84 (Larrabee, 2003), although values ≥ 0.90 are the emerging new norm (Boone, 2013; Donders & Strong, 2011). Between-group contrasts were performed using independent t tests. Alpha level was not corrected for multiple comparisons, given that all contrasts were planned, and effect size estimates (Cohen’s d) were provided to allow readers to evaluate the magnitude of the difference independent of sample size.
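The "standard formulas" referenced above can be made explicit. The sketch below computes them from a 2 × 2 classification table; the risk ratio is expressed in one common formulation (probability of criterion-defined invalidity given a flag, divided by the same probability given no flag), and the counts are illustrative, not the study's data.

```python
def classification_accuracy(tp, fp, fn, tn):
    """Classification accuracy statistics from a 2x2 table.

    tp: invalid cases flagged by the test (true positives)
    fp: valid cases incorrectly flagged (false positives)
    fn: invalid cases missed (false negatives)
    tn: valid cases correctly passed (true negatives)
    """
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    ppp = tp / (tp + fp)  # positive predictive power
    npp = tn / (tn + fn)  # negative predictive power
    # Risk ratio: P(invalid | flagged) / P(invalid | not flagged)
    rr = (tp / (tp + fp)) / (fn / (fn + tn))
    return {"sensitivity": sensitivity, "specificity": specificity,
            "PPP": ppp, "NPP": npp, "RR": rr}
```

With hypothetical counts of 25/5/25/45, this yields 0.50 sensitivity and 0.90 specificity, i.e., the test misses half the invalid profiles but rarely flags a valid one, which mirrors the specificity-over-sensitivity priority stated above.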
Results
To demonstrate their utility in differentiating valid and invalid response sets, the classification accuracy of the validity composites was computed against the WMT as criterion (Table 4). The EI-5FCR produced a good AUC (0.88), with very high sensitivity (0.83) and adequate specificity (0.90). The EI-5PSP performed significantly more poorly. Nevertheless, it produced an AUC in the moderate range (0.74) and a good combination of sensitivity (0.50) and specificity (0.90). The VI-7 had the highest AUC at 0.91, 0.67 sensitivity, and 0.92 specificity.
Dominant hand GPB T scores produced significant AUCs against all four criterion PVTs (0.68–0.82). The T ≤ 29 cutoff had high specificity (0.90–0.99) and variable sensitivity (0.36–0.55). Increasing the cutoff to ≤ 31 sacrificed some of the specificity (0.87–0.94) without any gains in sensitivity. T ≤ 33 produced a good combination of sensitivity (0.50) and specificity (0.89) against the EI-5FCR but failed to maintain the minimum specificity standard against the WMT, EI-5PSP, and VI-7.
Similarly, nondominant hand GPB T scores produced significant AUCs against all four criterion PVTs (0.67–0.79). The T ≤ 29 cutoff had adequate specificity (0.89–0.92), but low sensitivity (0.24–0.30). Increasing the cutoff to ≤ 31 resulted in a predictable trade-off: improved sensitivity (0.32–0.50) at a reasonable cost to specificity (0.84–0.89). As before, T ≤ 33 produced a good combination of sensitivity (0.45) and specificity (0.88) against the EI-5FCR but failed to maintain the minimum specificity standard against the WMT, EI-5PSP, and VI-7 (Table 5).
When failure was redefined as scoring below a given cutoff with both hands, ≤ 31 produced high specificity (0.93–0.98) but relatively low sensitivity (0.26–0.36). Increasing the cutoff to ≤ 33 cleared the specificity standard against all four criterion PVTs (0.89–0.94) and improved sensitivity (0.31–0.45). Given the increasing awareness that sensitivity and specificity should be interpreted in the broader context of the base rate of the condition of interest (Lange & Lippa, 2017), PPP and NPP were calculated at five different hypothetical BRFail (Table 6).
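The dependence of PPP and NPP on the hypothetical base rate can be sketched with Bayes' theorem. The sensitivity and specificity values below (.30 and .95) are assumed for demonstration only and are not the tabled results.

```python
# Sketch of how PPP and NPP shift with the hypothetical base rate of invalid
# performance (BR_Fail), holding sensitivity and specificity fixed.

def predictive_power(sens, spec, base_rate):
    """Bayes-derived predictive power at a given base rate of failure."""
    ppp = (sens * base_rate) / (
        sens * base_rate + (1 - spec) * (1 - base_rate))
    npp = (spec * (1 - base_rate)) / (
        (1 - sens) * base_rate + spec * (1 - base_rate))
    return ppp, npp

# Illustrative parameters (assumed, not tabled values): sens = .30, spec = .95
for br in (0.10, 0.20, 0.30, 0.40, 0.50):  # five hypothetical BR_Fail values
    ppp, npp = predictive_power(sens=0.30, spec=0.95, base_rate=br)
    print(f"BR_Fail = {br:.2f}: PPP = {ppp:.2f}, NPP = {npp:.2f}")
```

The sketch makes the usual pattern visible: as BRFail rises, PPP improves while NPP deteriorates, which is why highly specific cutoffs are most informative when failure is common.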
As severe TBI may result in legitimately impaired performance on the GPB, the effect of injury severity was independently examined (Table 7). Patients with mild TBI had consistently higher BRFail on the four criterion PVTs (RR 1.25–2.76) as well as the newly introduced GPB validity cutoffs (RR 1.23–2.20) compared to patients with moderate-to-severe TBI. However, it should be noted that most differences in BRFail did not reach statistical significance. In addition, to examine the domain specificity effect (Erdodi, 2017) as a potential confound, independent t tests were performed using the newly introduced GPB validity cutoffs (i.e., T ≤ 29 and T ≤ 31) as the independent variable and the EI-5s as dependent variables. Previous research suggests that the match in cognitive domain between the criterion and predictor variable influences classification accuracy: PVTs perform better against criterion measures that are similar in cognitive domain and/or administration format (Erdodi et al., 2017d, 2018a, 2018c), perhaps due to idiosyncratic patterns of selectively demonstrating certain types of deficits, but not others (Cottingham, Victor, Boone, Ziegler, & Zeller, 2014; Erdodi et al., 2014a). Although patients who failed the GPB had significantly stronger evidence of invalid performance on both versions of the EI-5s (Table 8), effect sizes were larger on the EI-5PSP (d = 0.82–1.13, large) than on the EI-5FCR (d = 0.47–0.93, medium-large). This finding is consistent with the modality specificity hypothesis.
Finally, independent t tests were computed with PVT status (Pass/Fail) as the independent variable and the SCL-90-R scales as well as the BDI-II as dependent variables. Patients who failed the WMT reported significantly higher scores on all measures of self-reported emotional distress, with effect sizes ranging from medium (d = 0.42) to large (d = 1.03). Failing the VI-7 was associated with smaller effects (d = 0.42–0.63). Similar findings emerged with the EI-5FCR, with somewhat larger effects ranging from medium (d = 0.51) to large (d = 0.91). However, none of the contrasts with EI-5PSP as the independent variable reached significance.
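The Cohen's d values for these independent-groups contrasts follow the standard pooled-SD formulation, which can be sketched as below. The group statistics in the example are hypothetical illustrations, not values drawn from Table 9.

```python
import math

# Minimal sketch of Cohen's d for an independent-groups contrast,
# using the pooled standard deviation in the denominator.

def cohens_d(m1, sd1, n1, m2, sd2, n2):
    """Standardized mean difference between two independent groups."""
    pooled_sd = math.sqrt(
        ((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / (n1 + n2 - 2))
    return (m1 - m2) / pooled_sd

# Hypothetical BDI-II summary statistics for PVT-Fail vs. PVT-Pass groups
# (illustrative values only, not results from the present sample)
d = cohens_d(m1=20.5, sd1=10.0, n1=25, m2=13.0, sd2=9.0, n2=75)
```

Under these assumed inputs the contrast works out to a large effect (d ≈ 0.81), comparable in magnitude to the WMT-based contrasts reported above.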
More importantly, mean BDI-II scores among those who passed the criterion PVT were in the mild range or below (≤ 14.6), whereas those who failed the criterion PVT produced means at the upper limit of the mild range (17.1–18.8) or in the moderate severity range (20.3–20.7). Likewise, among significant contrasts, those who passed the criterion PVT scored in the nonclinical range on the SCL-90-R, with the exception of the Obsessive–Compulsive scale (Table 9). However, those who failed the criterion PVTs did not report clinically significant symptoms on the majority of the SCL-90-R scales (Interpersonal Sensitivity, Anxiety, Hostility, Paranoid Ideation, and Phobic Anxiety). At the same time, PVT failure was associated with clinically elevated scores on the Somatization, Obsessive–Compulsive, Depression, and Psychotic Symptoms scales as well as the GSI.
In contrast, passing or failing any of the newly endorsed GPB cutoffs was unrelated to SCL-90-R or BDI-II scores (Table 10). As a side note, there was no significant difference between the sample of Erdodi et al. (2017d) (M = 18.1, SD = 12.1, range 0–51) and the present sample (M = 15.4, SD = 11.5, range 0–45) on the BDI-II, the only measure of self-reported psychiatric symptoms administered in both studies: t(268) = 1.79, p = 0.075, d = 0.23 (small effect). Both group means were within the mild range. Passing the GPB validity cutoffs was associated with a mean score in the nonclinical range on all SCL-90-R scales. Patients who failed the GPB reported mean scores in the clinical range on the Somatization, Obsessive–Compulsive, Depression, and GSI scales.
Discussion
This study was designed to replicate an earlier investigation based on a mixed clinical sample suggesting that in addition to measuring fine motor speed, the GPB can also function as an index of noncredible responding and that failing GPB validity cutoffs is associated with elevated self-reported psychiatric symptoms (Erdodi et al., 2017d). The first finding was well-replicated: the previously endorsed cutoff (GPB T score ≤ 29 in either hand) produced good classification accuracy in the present sample against a different set of criterion PVTs. Moreover, based on these data, the default cutoffs could be raised to ≤ 31 for either hand or ≤ 33 for both hands, the point where the instrument best approximates the "Larrabee limit." The phrase refers to the seemingly inescapable trade-off between false positives and false negatives, such that fixing specificity at 0.90 results in sensitivity hovering around the 0.50 mark (Lichtenstein et al., 2017).
More importantly, scoring below the GPB validity cutoffs was largely unrelated to head injury severity, pre-empting arguments that genuine neurological impairment could account for the PVT failure, despite previous reports that GPB is sensitive to neurological disorders (Larrabee, Millis, & Meyers, 2008). In fact, patients with moderate-to-severe TBI were less likely to score below the cutoffs than those with mild head injuries. While this finding may appear paradoxical, it is well replicated in performance validity research (Carone, 2008; Erdodi & Rai, 2017; Green, Iverson, & Allen, 1999; Grote et al., 2000).
As in the original study, GPB validity cutoffs resulted in similar classification accuracy across psychometrically diverse criterion PVTs (free-standing vs. embedded, univariate vs. multivariate, modality congruent vs. incongruent), suggesting that existing sensitivity and specificity values provide stable estimates of the GPB’s signal detection parameters. The consistency across samples and criteria alleviates concerns about modality specificity as a confounding variable in PVT research (Leighton et al., 2014; Root et al., 2006). At the same time, in the original study, GPB validity cutoffs performed notably better against the domain-congruent criterion PVT, the EI-5PSP. In contrast, in the present sample, the GPB resulted in consistently higher specificity against the domain-incongruent criterion PVT, the EI-5FCR. This puzzling finding serves as a reminder that measurement models can behave differently across samples, reinforcing the need for multiple independent replications. A possible explanation for this inconsistency is sample-specific differences in BRFail—most notably, on the CPT-II Omissions scale: while only 3.2% of patients in the original study scored T > 100, 15.0% of the present sample failed that cutoff.
The second finding of the Erdodi et al. (2017d) study was not replicated: passing or failing the GPB validity cutoffs was orthogonal to self-reported psychopathology. However, the outcome of the WMT or EI-5FCR was significantly related to BDI-II and SCL-90-R scores, producing a large overall effect. Similarly, failing the VI-7 was associated with a medium effect on all three measures of self-reported psychiatric symptoms. Within the SCL-90-R, the GSI was particularly sensitive to invalid responding, confirming previous reports that the instrument is vulnerable to noncredible responding (McGuire & Shores, 2001; Sullivan & King, 2008). This finding also cautions against equating elevated SCL-90-R scores with the presence of psychiatric illness (Johnson, Ellison, & Heikkinen, 1989) and reveals the need for a systematic evaluation of the credibility of symptom report by either utilizing free-standing instruments (Giromini, Viglione, Pignolo, & Zennaro, 2018; Viglione, Giromini, & Landis, 2017) or built-in validity scales within established inventories such as the PAI. While these results overall provide partial support for the psychogenic interference hypothesis, they also emphasize that it may be instrument-, scale-, and/or sample-specific.
A possible explanation for the discrepancy between the two sets of results is the choice of outcome measure: the SCL-90-R vs. the Personality Assessment Inventory (Morey, 1991). The latter is a more robust instrument with a strong evidence base in a variety of clinical populations (Boone, 1998; Hopwood et al., 2007; Karlin et al., 2005; Siefert, Sinclair, Kehl-Fie, & Blais, 2009; Sims, Thomas, Hopwood, Chen, & Pascale, 2013; Sinclair et al., 2015) that contains almost four times as many items and, more importantly, validity scales to evaluate the veracity of the response sets. The SCL-90-R lacks this important feature.
However, both studies used the BDI-II (Beck, Steer, & Brown, 1996) as a measure of emotional functioning, allowing for a direct comparison. The BDI-II has excellent psychometric properties (Sprinkle et al., 2002; Storch, Roberti, & Roth, 2004) and demonstrated both high sensitivity and specificity to a clinical diagnosis of depression (Kjærgaard, Arfwedson Wang, Waterloo, & Jorde, 2014). A careful examination of the results from Erdodi et al. (2017d) and the present study reveals that the only discrepancy in BDI-II scores as a function of passing or failing the GPB validity cutoffs was on the dominant hand (Cohen’s d 0.46 vs. 0.09). Nondominant hand (Cohen’s d 0.18 vs. 0.11) and combined (Cohen’s d 0.29 vs. 0.13) cutoffs produced essentially the same results.
It could be argued that the current findings are internally consistent (i.e., nonsignificant results were observed as a function of passing or failing all GPB cutoffs) regarding the BDI-II and that the isolated positive finding by Erdodi et al. (2017d) associated with failing the dominant hand GPB cutoff was an outlier. Likewise, medium to large effects were observed on the BDI-II as a function of passing or failing PVTs based on the forced choice recognition paradigm and the validity composite that contained several constituent PVTs based on recognition memory, but contrasts involving a criterion PVT based solely on processing speed measures were consistently nonsignificant. Further comparisons are hindered by the fact that the original study did not report BDI-II scores as a function of passing or failing the criterion PVTs. If future investigators agreed to consider reporting such findings (i.e., Is there a difference in self-reported psychiatric symptoms between those who passed and those who failed PVTs?), regardless of the main focus of the study, the knowledge base on psychogenic interference could be advanced more quickly.
As one of the reviewers pointed out, the age range (17–70 years) within the present sample and that of Erdodi et al. (2017d; 18–69 years) constrain the generalizability of the findings. Although the GPB cutoffs are given in a metric that has been adjusted for age in addition to gender, education, and race using norms by Heaton et al. (2004), it is unclear whether the classification accuracy would extend outside that range. While the emerging evidence base suggests that adult cutoffs on some PVTs can be applied to children (Donders, 2005; Erdodi & Lichtenstein, 2017; Kirkwood & Kirk, 2010; Lichtenstein et al., 2017, 2018; MacAllister, Nakhutina, Bender, Karantzoulis, & Carlson, 2009), increased false positive rates have been reported in older adults (Ashendorf, O’Bryant, & McCaffrey, 2003; Dean, Victor, Boone, Philpott, & Hess, 2009; Kiewel, Wisdom, Bradshaw, Pastorek, & Strutt, 2012; Zenisek, Millis, Banks, & Miller, 2016).
A significant limitation of the present investigation is that neither measure of emotional functioning has a built-in validity scale to provide an objective evaluation of the credibility of self-reported symptoms. Therefore, the data can support competing interpretations: (A) individuals who exaggerate neurocognitive impairment also tend to exaggerate psychiatric symptoms (i.e., “double malingering”); and (B) genuine emotional distress (accurately captured on psychiatric inventories) prevented a subset of the patients from demonstrating their true ability level on performance-based tests, resulting in both repeated PVT failures and elevations on the BDI-II and SCL-90-R (i.e., psychogenic interference; Delis & Wetter, 2007; Kalfon et al., 2016; Suhr, 2003; Suhr & Spickard, 2012). The fact that in some of the earlier studies the relationship between performance validity and self-reported psychiatric symptoms was present even though patients passed the symptom validity checks (Erdodi et al., 2017d, 2018c) argues for the latter explanation. In addition, the lack of data on external incentive status prevented us from classifying patients according to the diagnostic criteria for malingered neurocognitive dysfunction proposed by Slick et al. (1999).
Overall, the limited available evidence precludes a definitive conclusion. While the psychogenic interference hypothesis has merit and, thus, warrants further investigation, clinicians are cautioned against its broad interpretation aimed at discounting objective evidence of noncredible performance, especially in the presence of external incentives to appear impaired (Larrabee, 2012; Slick et al., 1999). Instead, the combination of multiple PVT failures and elevated self-reported distress could inform the clinical management of the patient, in the form of psychotherapy or cognitive rehabilitation exploring causal mechanisms behind poor test performance (Boone, 2007a, b; Delis & Wetter, 2007; Erdodi et al., 2017b). If following symptom relief achieved after successful therapy, the patient produces valid data upon retesting, that pattern of findings would retroactively support psychogenic interference as an explanation for the initial PVT failures.
Future research on the psychogenic interference hypothesis would benefit from prospective studies specifically designed to dissociate the effects of several confounding variables identified in previous reports, such as apparent external incentives to underperform or exaggerate symptoms and self-reported emotional distress on both face-valid and opaque instruments. In addition, history of complex psychiatric trauma should be recorded and evaluated as a potential contributing factor to PVT failures with unknown etiology (Williamson et al., 2012). Although the link between abuse history and performance validity is far from being well-understood (Kemp et al., 2008; Tyson et al., 2018), previous studies found that individuals with severe developmental trauma were overrepresented among patients who were misclassified by multivariate models of performance validity assessment due to internal contradiction among various scores (Erdodi et al., 2017c, 2017e). If future studies replicate internal inconsistency as a reliable psychometric marker of the link between adverse life events and inexplicable PVT failures, it would significantly advance our current understanding of the mechanisms behind noncredible responding (Berry et al., 1996; Bigler, 2012, 2015; Boone, 2007a, b; Delis & Wetter, 2007).
In sum, the present investigation successfully replicated an earlier study that introduced validity cutoffs embedded within the GPB in a TBI sample from a different geographic region. Results suggest that a demographically adjusted T score ≤ 31 in either hand or ≤ 33 in both hands is specific to psychometrically defined invalid responding. Unlike in the previous report, failing the GPB was unrelated to self-reported psychiatric symptoms. At the same time, patients who failed PVTs based on the forced choice recognition paradigm reported higher levels of depression, somatic concerns, and overall symptomatology. Given the consistently good classification accuracy of GPB validity cutoffs across samples and criterion PVTs, they appear to be a valuable addition to the growing arsenal of EVIs available to clinical neuropsychologists. However, further research is clearly needed to elucidate the complex relationship between noncredible responding and self-reported emotional distress. Exploring the complex manifestation of psychogenic interference during cognitive testing appears to be a promising line of investigation that has the potential to provide new insights into the causal mechanisms behind internally inconsistent neuropsychological profiles, isolated cognitive deficits with no plausible etiology, and PVT failures.
References
An, K. Y., Charles, J., Ali, S., Enache, A., Dhuga, J., & Erdodi, L. A. (2018). Re-examining performance validity cutoffs within the Complex Ideational Material and the Boston Naming Test-Short Form using an experimental malingering paradigm. Journal of Clinical and Experimental Neuropsychology, 1–11. https://doi.org/10.1080/13803395.2018.1483488.
Arnold, G., & Boone, K. B. (2007). Use of motor and sensory tests as measures of effort. In K. B. Boone (Ed.), Assessment of feigned cognitive impairment. New York, NY: Guilford.
Arnold, G., Boone, K. B., Lu, P., Dean, A., Wen, J., Nitch, S., & McPhearson, S. (2005). Sensitivity and specificity of finger tapping test scores for the detection of suspect effort. The Clinical Neuropsychologist, 19(1), 105–120.
Ashendorf, L., Clark, E. L., & Sugarman, M. A. (2017). Performance validity and processing speed in a VA polytrauma sample. The Clinical Neuropsychologist, 31(5), 857–866.
Ashendorf, L., O’Bryant, S. E., & McCaffrey, R. J. (2003). Specificity of malingering detection strategies in older adults using the CVLT and WCST. The Clinical Neuropsychologist, 17(2), 255–262.
Axelrod, B. N., Meyers, J. E., & Davis, J. J. (2014). Finger tapping test performance as a measure of performance validity. The Clinical Neuropsychologist, 28(5), 876–888.
Babikian, T., Boone, K. B., Lu, P., & Arnold, G. (2006). Sensitivity and specificity of various digit span scores in the detection of suspect effort. The Clinical Neuropsychologist, 20(1), 145–159.
Bearden, C. E., Glahn, D. C., Monkul, E. S., Barrett, J., Najt, P., Villareal, V., & Soares, J. C. (2006). Patterns of memory impairment in bipolar disorder and unipolar major depression. Psychiatry Research, 142, 139–150.
Beck, A. T., Steer, R. A., & Brown, G. K. (1996). Beck depression inventory-II. San Antonio, TX: Psychological Corporation.
Bell-Sprinkle, T. L., Boone, K. B., Miora, D., Cottingham, M., Victor, T., Ziegler, E., Zeller, M., & Wright, M. (2013). Re-examination of the Rey Word Recognition Test. The Clinical Neuropsychologist, 27(3), 516–527.
Berry, D. T. R., Adams, J. J., Clark, C. D., Thacker, S. R., Burger, T. L., Wetter, M. W., Baer, R. A., & Borden, J. W. (1996). Detection of a cry for help on the MMPI-2: An analog investigation. Journal of Personality Assessment, 67(1), 26–36.
Bigler, E. D. (2012). Symptom validity testing, effort and neuropsychological assessment. Journal of the International Neuropsychological Society, 18, 632–642.
Bigler, E. D. (2014). Effort, symptom validity testing, performance validity testing and traumatic brain injury. Brain Injury, 28(13–14), 1623–1638.
Bigler, E. D. (2015). Neuroimaging as a biomarker in symptom validity and performance validity testing. Brain Imaging and Behavior, 9(3), 421–444.
Binder, L. M., & Willis, S. C. (1991). Assessment of motivation after financially compensable minor head trauma. Psychological Assessment: A Journal of Consulting and Clinical Psychology, 3, 175–181.
Boone, D. (1998). Internal consistency reliability of the Personality Assessment Inventory with psychiatric inpatients. Journal of Clinical Psychology, 54(6), 839–843.
Boone, K. B. (2007a). Assessment of feigned cognitive impairment. A neuropsychological perspective. New York, NY: Guilford.
Boone, K. B. (2007b). Commentary on “Cogniform disorder and cogniform condition: Proposed diagnoses for excessive cognitive symptoms” by Dean C. Delis and Spencer R. Wetter. Archives of Clinical Neuropsychology, 22, 675–679.
Boone, K. B. (2013). Clinical practice of forensic neuropsychology. New York, NY: Guilford.
Bortnik, K. E., Boone, K. B., Marion, S. D., Amano, S., Ziegler, E., Victor, T. L., & Zeller, M. A. (2010). Examination of various WMS-III logical memory scores in the assessment of response bias. The Clinical Neuropsychologist, 24(2), 344–357.
Bush, S. S., Heilbronner, R. L., & Ruff, R. M. (2014). Psychological assessment of symptom and performance validity, response bias, and malingering: Official position of the Association for Scientific Advancement in Psychological Injury and Law. Psychological Injury and Law, 7(3), 197–205.
Bush, S. S., Ruff, R. M., Troster, A. I., Barth, J. T., Koffler, S. P., Pliskin, N. H., … Silver, C. H. (2005). Symptom validity assessment: Practice issues and medical necessity (NAN Policy and Planning Committees). Archives of Clinical Neuropsychology, 20, 419–426.
Carone, D. A. (2008). Children with moderate/severe brain damage/dysfunction outperform adults with mild-to-no brain damage on the Medical Symptom Validity Test. Brain Injury, 22(12), 960–971.
Chafetz, M. D., Williams, M. A., Ben-Porath, Y. S., Bianchini, K. J., Boone, K. B., Kirkwood, M. W., Larrabee, G. J., & Ord, J. S. (2015). Official position of the American Academy of Clinical Neuropsychology Social Security Administration policy on validity testing: Guidance and recommendations for change. The Clinical Neuropsychologist, 29(6), 723–740.
Christensen, H., Griffiths, K., MacKinnon, A., & Jacomb, P. (1997). A quantitative review of cognitive deficits in depression and Alzheimer-type dementia. Journal of the International Neuropsychological Society, 3, 631–651.
Conners, K. C. (2004). Conners’ continuous performance test (CPT II). Version 5 for Windows. Technical guide and software manual. North Tonawanda, NY: Multi-Health Systems.
Considine, C., Weisenbach, S. L., Walker, S. J., McFadden, E. M., Franti, L. M., Bieliauskas, L. A., Maixner, D. F., Giordani, B., Berent, S., & Langenecker, S. A. (2011). Auditory memory decrements, without dissimulation, among patients with major depressive disorder. Archives of Clinical Neuropsychology, 26, 445–453.
Costabile, T., Bilo, L., DeRosa, A., Pane, C., & Sacca, F. (2018). Dissociative identity disorder: Restoration of executive functions after switch from alter to host personality. Psychiatry and Clinical Neurosciences, 72, 189–190.
Cottingham, M. E., Victor, T. L., Boone, K. B., Ziegler, E. A., & Zeller, M. (2014). Apparent effect of type of compensation seeking (disability vs. litigation) on performance validity test scores may be due to other factors. The Clinical Neuropsychologist, 28(6), 1030–1047.
Dean, A. C., Victor, T. L., Boone, K. B., Philpott, L. M., & Hess, R. A. (2009). Dementia and effort test performance. The Clinical Neuropsychologist, 23, 133–152.
DeFilippis, N. A., & McCampbell, E. (1997). Manual for the booklet category test. Odessa, FL: Psychological Assessment Resources.
Delis, D. C., Kramer, J. H., Kaplan, E., & Ober, B. (2000). The California Verbal Learning Test (2nd ed.). San Antonio, TX: The Psychological Corporation.
Delis, D. C., & Wetter, S. R. (2007). Cogniform disorder and cogniform condition: Proposed diagnoses for excessive cognitive symptoms. Archives of Clinical Neuropsychology, 22, 589–604.
Derogatis, L. R. (1994). SCL-90-R: Administration, scoring, and procedures manual (3rd ed.). Minneapolis, MN: National Computer Systems.
Donders, J. (2005). Performance on the Test of Memory Malingering in a mixed pediatric sample. Child Neuropsychology, 11(2), 221–227.
Donders, J., & Strong, C. H. (2011). Embedded effort indicators on the California Verbal Learning Test—Second edition (CVLT-II): An attempted cross-validation. The Clinical Neuropsychologist, 25, 173–184.
Davis, J. J. (2014). Further consideration of Advanced Clinical Solutions Word Choice: Comparison to the Recognition Memory Test – Words and classification accuracy on a clinical sample. The Clinical Neuropsychologist, 28(8), 1278–1294. https://doi.org/10.1080/13854046.2014.975844.
Dunn, L. M., & Dunn, D. M. (2007). Peabody Picture Vocabulary Test (4th ed.). San Antonio, TX: Pearson.
Egeland, J., Lund, A., Landro, N. I., Rund, B. R., Sudet, K., Asbjornsen, A., Mjellem, N., Roness, A., & Stordal, K. I. (2005). Cortisol level predicts executive and memory function in depression, symptom level predicts psychomotor speed. Acta Psychiatrica Scandinavica, 112, 434–441.
Erdodi, L. A. (2017). Aggregating validity indicators: The salience of domain specificity and the indeterminate range in multivariate models of performance validity assessment. Applied Neuropsychology: Adult, 1–18. https://doi.org/10.1080/23279095.2017.1384925.
Erdodi, L. A., Abeare, C. A., Lichtenstein, J. D., Tyson, B. T., Kucharski, B., Zuccato, B. G., & Roth, R. M. (2017a). WAIS-IV processing speed scores as measures of non-credible responding—The third generation of embedded performance validity indicators. Psychological Assessment, 29(2), 148–157.
Erdodi, L. A., Abeare, C. A., Medoff, B., Seke, K. R., Sagar, S., & Kirsch, N. L. (2017b). A single error is one too many: The Forced Choice Recognition trial on the CVLT-II as a measure of performance validity in adults with TBI. Archives of Clinical Neuropsychology. https://doi.org/10.1093/arclin/acx110.
Erdodi, L. A., Dunn, A. G., Seke, K. R., Charron, C., McDermott, A., Enache, A., Maytham, C., & Hurtubise, J. (2018d). The Boston Naming Test as a measure of performance validity. Psychological Injury and Law, 11, 1–8. https://doi.org/10.1007/s12207-017-9309-3.
Erdodi, L. A., Hurtubise, J. L., Charron, C., Dunn, A., Enache, A., McDermott, A., & Hirst, R. B. (2018a). The D-KEFS Trails as performance validity tests. Psychological Assessment, 30(8), 1082–1095.
Erdodi, L. A., Kirsch, N. L., Lajiness-O’Neill, R., Vingilis, E., & Medoff, B. (2014a). Comparing the Recognition Memory Test and the Word Choice Test in a mixed clinical sample: Are they equivalent? Psychological Injury and Law, 7(3), 255–263. https://doi.org/10.1007/s12207-014-9197-8.
Erdodi, L. A., & Lichtenstein, J. D. (2017). Invalid before impaired: An emerging paradox of embedded validity indicators. The Clinical Neuropsychologist, 31(6–7), 1029–1046.
Erdodi, L. A., Pelletier, C. L., & Roth, R. M. (2018b). Elevations on select Conners’ CPT-II scales indicate noncredible responding in adults with traumatic brain injury. Applied Neuropsychology: Adult, 25(1), 19–28. https://doi.org/10.1080/23279095.2016.1232262.
Erdodi, L. A., & Roth, R. M. (2017). Low scores on BDAE Complex Ideational Material are associated with invalid performance in adults without aphasia. Applied Neuropsychology: Adult, 24(3), 264–274. https://doi.org/10.1080/23279095.2017.1298600.
Erdodi, L. A., & Rai, J. K. (2017). A single error is one too many: Examining alternative cutoffs on trial 2 on the TOMM. Brain Injury, 31(10), 1362–1368. https://doi.org/10.1080/02699052.2017.1332386.
Erdodi, L. A., Roth, R. M., Kirsch, N. L., Lajiness-O’Neill, R., & Medoff, B. (2014b). Aggregating validity indicators embedded in Conners’ CPT-II outperforms individual cutoffs at separating valid from invalid performance in adults with traumatic brain injury. Archives of Clinical Neuropsychology, 29(5), 456–466.
Erdodi, L. A., Sagar, S., Seke, K., Zuccato, B. G., Schwartz, E. S., & Roth, R. M. (2018c). The Stroop Test as a measure of performance validity in adults clinically referred for neuropsychological assessment. Psychological Assessment, 30(6), 755–766. https://doi.org/10.1037/pas0000525.
Erdodi, L. A., Seke, K. R., Shahein, A., Tyson, B. T., Sagar, S., & Roth, R. M. (2017d). Low scores on the Grooved Pegboard Test are associated with invalid responding and psychiatric symptoms. Psychology & Neuroscience, 10(3), 325–344.
Erdodi, L. A., Tyson, B. T., Abeare, C. A., Lichtenstein, J. D., Pelletier, C. L., Rai, J. K., & Roth, R. M. (2016). The BDAE Complex Ideational Material—A measure of receptive language or performance validity? Psychological Injury and Law, 9, 112–120.
Erdodi, L. A., Tyson, B. T., Abeare, C. A., Zuccato, B. G., Rai, J. K., Seke, K. R., …, & Roth, R. M. (2017e). Utility of critical items within the Recognition Memory Test and Word Choice Test. Applied Neuropsychology: Adult. https://doi.org/10.1080/23279095.2017.1298600
Erdodi, L. A., Tyson, B. T., Shahein, A., Lichtenstein, J. D., Abeare, C. A., Pelletier, C. L., Zuccato, B. G., Kucharski, B., & Roth, R. M. (2017c). The power of timing: Adding a time-to-completion cutoff to the Word Choice Test and Recognition Memory Test improves classification accuracy. Journal of Clinical and Experimental Neuropsychology, 39(4), 369–383. https://doi.org/10.1080/13803395.2016.1230181.
Etherton, J. L., Bianchini, K. J., Heinly, M. T., & Greve, K. W. (2006). Pain, malingering, and performance on the WAIS-III Processing Speed Index. Journal of Clinical and Experimental Neuropsychology, 28(7), 1218–1237.
Giromini, L., Viglione, D. J., Pignolo, C., & Zennaro, A. (2018). A clinical comparison, simulation study testing the validity of SIMS and IOP-29 with an Italian sample. Psychological Injury and Law. https://doi.org/10.1007/s12207-018-9314-1.
Green, P. (2003). Green’s word memory test. Edmonton, Canada: Green’s Publishing.
Green, P. (2008). Green’s non-verbal medical symptom validity test. Edmonton: Green’s Publishing.
Green, P., Iverson, G., & Allen, L. (1999). Detecting malingering in head injury litigation with the Word Memory Test. Brain Injury, 13, 813–819.
Greiffenstein, M. F., & Baker, W. J. (2008). Validity testing in dually diagnosed post-traumatic stress disorder and mild closed head injury. The Clinical Neuropsychologist, 22, 565–582.
Greiffenstein, M. F., Baker, W. J., & Gola, T. (1994). Validation of malingered amnesia measures with a large clinical sample. Psychological Assessment, 6(3), 218–224.
Greiffenstein, M. F., Baker, W. J., & Gola, T. (1996). Motor dysfunction profiles in traumatic brain injury and postconcussion syndrome. Journal of the International Neuropsychological Society, 2(6), 477.
Greve, K. W., & Bianchini, K. J. (2004). Setting empirical cutoffs on psychometric indicators of negative response bias: A methodological commentary with recommendation. Archives of Clinical Neuropsychology, 19, 533–541.
Greve, K. W., Bianchini, K. J., Mathias, C. W., Houston, R. J., & Crouch, J. A. (2002). Detecting malingered neurocognitive dysfunction with the Wisconsin Card Sorting Test: A preliminary investigation in traumatic brain injury. The Clinical Neuropsychologist, 16(2), 179–191.
Grote, C., Kooker, E. K., Garron, D. C., Nyenhuis, D. L., Smith, C. A., & Mattingly, M. L. (2000). Performance of compensation seeking and non-compensation seeking samples on the Victoria Symptom Validity Test: Cross-validation and extension of a standardization study. Journal of Clinical and Experimental Neuropsychology, 22, 709–719.
Halstead, W. (1947). Brain and intelligence. A quantitative study of the frontal lobes. Chicago: University of Chicago Press.
Heaton, R. K., Chelune, G. J., Talley, J. L., Kay, G. G., & Curtis, G. (1993). Wisconsin Card Sorting Test (WCST) manual revised and expanded. Odessa, FL: Psychological Assessment Resources.
Heaton, R. K., Miller, S. W., Taylor, M. J., & Grant, I. (2004). Revised comprehensive norms for an expanded Halstead-Reitan battery: Demographically adjusted neuropsychological norms for African American and Caucasian adults. Lutz, FL: Psychological Assessment Resources.
Heaton, R. K., Smith, H. H., Lehman, R. A. W., & Vogt, A. T. (1978). Prospects for faking believable deficits on neuropsychological testing. Journal of Consulting and Clinical Psychology, 46(5), 892–900.
Heilbronner, R. L., Sweet, J. J., Morgan, J. E., Larrabee, G. J., Millis, S. R., & Conference Participants. (2009). American Academy of Clinical Neuropsychology consensus conference statement on the neuropsychological assessment of effort, response bias, and malingering. The Clinical Neuropsychologist, 23, 1093–1129.
Heinly, M. T., Greve, K. W., Bianchini, K., Love, J. M., & Brennan, A. (2005). WAIS digit-span-based indicators of malingered neurocognitive dysfunction: Classification accuracy in traumatic brain injury. Assessment, 12(4), 429–444.
Henry, G. K., Heilbronner, R. L., Suhr, J., Gornbein, J., Wagner, E., & Drane, D. L. (2018). Illness perceptions predict cognitive performance validity. Journal of the International Neuropsychological Society, 24, 1–11.
Hopwood, C. J., Morey, L. C., Shea, M. T., McGlashan, T. H., Sanislow, C. A., Grilo, C. M., … Skodol, A. E. (2007). Personality traits predict current and future functioning comparably for individuals with major depressive and personality disorders. Journal of Nervous and Mental Disease, 195(3), 266–269.
Inman, T. H., & Berry, D. T. R. (2002). Cross-validation of indicators of malingering: A comparison of nine neuropsychological tests, four tests of malingering, and behavioral observations. Archives of Clinical Neuropsychology, 17, 1–23.
Jasinski, L. J., Berry, D. T., Shandera, A. L., & Clark, J. A. (2011). Use of the Wechsler Adult Intelligence Scale Digit Span subtest for malingering detection: A meta-analytic review. Journal of Clinical and Experimental Neuropsychology, 33(3), 300–314.
Johnson, J. L., & Lesniak-Karpiak, K. (1997). The effect of warning on malingering on memory and motor tasks in college samples. Archives of Clinical Neuropsychology, 12, 231–238.
Johnson, R. W., Ellison, R. A., & Heikkinen, C. A. (1989). Psychological symptoms of counseling center clients. Journal of Counseling Psychology, 36(1), 110–114.
Jones, A. (2013). Test of memory malingering: Cutoff scores for psychometrically defined malingering groups in a military sample. The Clinical Neuropsychologist, 27(6), 1043–1059.
Kalfon, T. B., Gal, G., Shorer, R., & Ablin, J. N. (2016). Cognitive functioning in fibromyalgia: The central role of effort. Journal of Psychosomatic Research, 87, 30–36.
Karlin, B. E., Creech, S. K., Grimes, J. S., Clark, T. S., Meagher, M. W., & Morey, L. C. (2005). The Personality Assessment Inventory with chronic pain patients: Psychometric properties and clinical utility. Journal of Clinical Psychology, 61(12), 1571–1585.
Kemp, S., Coughlan, A. K., Rowbottom, C., Wilkinson, K., Teggart, V., & Baker, G. (2008). The base rate of effort test failure in patients with medically unexplained symptoms. Journal of Psychosomatic Research, 65(4), 319–325.
Kiewel, N. A., Wisdom, N. M., Bradshaw, M. R., Pastorek, N. J., & Strutt, A. M. (2012). Retrospective review of digit span-related effort indicators in probable Alzheimer’s disease patients. The Clinical Neuropsychologist, 26(6), 965–974.
Kim, M. S., Boone, K. B., Victor, T., Marion, S. D., Amano, S., Cottingham, M. E., Ziegler, E. A., & Zeller, M. A. (2010a). The Warrington Recognition Memory Test for words as a measure of response bias: Total score and response time cutoffs developed on “real world” credible and noncredible subjects. Archives of Clinical Neuropsychology, 25, 60–70.
Kim, N., Boone, K. B., Victor, T., Lu, P., Keatinge, C., & Mitchell, C. (2010b). Sensitivity and specificity of a digit symbol recognition trial in the identification of response bias. Archives of Clinical Neuropsychology, 25(5), 420–428.
Kirkwood, M. W., & Kirk, J. W. (2010). The base rate of suboptimal effort in a pediatric mild TBI sample: Performance on the Medical Symptom Validity Test. The Clinical Neuropsychologist, 24(5), 860–872.
Kjærgaard, M., Arfwedson Wang, C. E., Waterloo, K., & Jorde, R. (2014). A study of the psychometric properties of the Beck Depression Inventory-II, the Montgomery and Åsberg Depression Rating Scale, and the Hospital Anxiety and Depression Scale in a sample from a healthy population. Scandinavian Journal of Psychology, 55(1), 83–89.
Kulas, J. F., Axelrod, B. N., & Rinaldi, A. R. (2014). Cross-validation of supplemental Test of Memory Malingering Scores as performance validity measures. Psychological Injury and Law, 7(3), 236–244.
Lafayette Instrument. (2015). Grooved Pegboard user’s manual. Lafayette, IN: Author.
Lange, R. T., Iverson, G. L., Brickell, T. A., Staver, T., Pancholi, S., Bhagwat, A., & French, L. M. (2013). Clinical utility of the Conners’ Continuous Performance Test-II to detect poor effort in U.S. military personnel following traumatic brain injury. Psychological Assessment, 25(2), 339–352.
Lange, R. T., & Lippa, S. M. (2017). Sensitivity and specificity should never be interpreted in isolation without consideration of other clinical utility metrics. The Clinical Neuropsychologist, 31, 1015–1028.
Langenecker, S. A., Bieliauskas, L. A., Rapport, L. J., Zubieta, J. K., Wilde, E. A., & Berent, S. (2005). Face emotion perception and executive functioning deficits in depression. Journal of Clinical and Experimental Neuropsychology, 27, 320–333.
Larrabee, G. J. (2003). Detection of malingering using atypical performance patterns on standard neuropsychological tests. The Clinical Neuropsychologist, 17(3), 410–425.
Larrabee, G. J. (2012). Assessment of malingering. In G. J. Larrabee (Ed.), Forensic neuropsychology: A scientific approach (2nd ed., pp. 116–159). New York: Oxford University Press.
Larrabee, G. J. (2014). False-positive rates associated with the use of multiple performance and symptom validity tests. Archives of Clinical Neuropsychology, 29, 364–373.
Larrabee, G. J., Millis, S. R., & Meyers, J. E. (2008). Sensitivity to brain dysfunction of the Halstead-Reitan vs. an ability-focused neuropsychological battery. The Clinical Neuropsychologist, 22, 813–825.
Leighton, A., Weinborn, M., & Maybery, M. (2014). Bridging the gap between neurocognitive processing theory and performance validity assessment among the cognitively impaired: A review and methodological approach. Journal of the International Neuropsychological Society, 20, 873–886.
Lezak, M. D., Howieson, D. B., Bigler, E. D., & Tranel, D. (2012). Neuropsychological assessment. New York: Oxford University Press.
Lichtenstein, J. D., Erdodi, L. A., & Linnea, K. S. (2017). Introducing a forced-choice recognition task to the California Verbal Learning Test—Children’s version. Child Neuropsychology, 23(3), 284–299. https://doi.org/10.1080/09297049.2015.1135422.
Lichtenstein, J. D., Erdodi, L. A., Rai, J. K., Mazur-Mosiewicz, A., & Flaro, L. (2018). Wisconsin Card Sorting Test embedded validity indicators developed for adults can be extended to children. Child Neuropsychology, 24(2), 247–260. https://doi.org/10.1080/09297049.2016.1259402.
Linn, R. T., Allen, K., & Willer, B. S. (1994). Affective symptoms in the chronic stage of traumatic brain injury: A study of married couples. Brain Injury, 8(2), 135–147.
Lu, P. H., Boone, K. B., Cozolino, L., & Mitchell, C. (2003). Effectiveness of the Rey-Osterrieth Complex Figure Test and the Meyers and Meyers recognition trial in the detection of suspect effort. The Clinical Neuropsychologist, 17(3), 426–440.
MacAllister, W. S., Nakhutina, L., Bender, H. A., Karantzoulis, S., & Carlson, C. (2009). Assessing effort during neuropsychological evaluation with the TOMM in children and adolescents with epilepsy. Child Neuropsychology, 15(6), 521–531.
Mathias, C. W., Greve, K. W., Bianchini, K. J., Houston, R. J., & Crouch, J. A. (2002). Detecting malingered neurocognitive dysfunction using the reliable digit span in traumatic brain injury. Assessment, 9(3), 301–308.
McGuire, B. E., & Shores, E. A. (2001). Simulated pain on the Symptom Checklist-90-Revised. Journal of Clinical Psychology, 57(12), 1589–1596.
Moore, B. A., & Donders, J. (2004). Predictors of invalid neuropsychological performance after traumatic brain injury. Brain Injury, 18(10), 975–984.
Morey, L. C. (1991). Personality Assessment Inventory. Odessa, FL: Psychological Assessment Resources.
Nelson, N. W., Boone, K., Dueck, A., Wagener, L., Lu, P., & Grills, C. (2003). The relationship between eight measures of suspect effort. The Clinical Neuropsychologist, 17(2), 263–272.
Odland, A. P., Lammy, A. B., Martin, P. K., Grote, C. L., & Mittenberg, W. (2015). Advanced administration and interpretation of multiple validity tests. Psychological Injury and Law, 8, 46–63.
Ord, J. S., Boettcher, A. C., Greve, K. J., & Bianchini, K. J. (2010). Detection of malingering in mild traumatic brain injury with the Conners’ Continuous Performance Test-II. Journal of Clinical and Experimental Neuropsychology, 32(4), 380–387.
Pearson. (2009). Advanced clinical solutions for the WAIS-IV and WMS-IV—Technical manual. San Antonio, TX: Author.
Persinger, V. C., Whiteside, D. M., Bobova, L., Saigal, S. D., Vannucci, M. J., & Basso, M. R. (2018). Using the California Verbal Learning Test, second edition as an embedded performance validity measure among individuals with TBI and individuals with psychiatric disorders. The Clinical Neuropsychologist, 32(6), 1039–1053. https://doi.org/10.1080/13854046.2017.1419507.
Rapport, L. J., Farchione, T. J., Coleman, R. D., & Axelrod, B. N. (1998). Effects of coaching on malingered motor function profiles. Journal of Clinical and Experimental Neuropsychology, 20(1), 89–97.
Raskin, S. A., Mateer, C. A., & Tweeten, R. (1998). Neuropsychological assessment of individuals with mild traumatic brain injury. The Clinical Neuropsychologist, 12(1), 21–30.
Reedy, S. D., Boone, K. B., Cottingham, M. E., Glaser, D. F., Lu, P. H., Victor, T. L., Ziegler, E. A., Zeller, M. A., & Wright, M. J. (2013). Cross-validation of the Lu and colleagues (2003) Rey-Osterrieth Complex Figure Test effort equation in a large known-group sample. Archives of Clinical Neuropsychology, 28, 30–37.
Rees, L. M., Tombaugh, T. N., & Boulay, L. (2001). Depression and the Test of Memory Malingering. Archives of Clinical Neuropsychology, 16, 501–506.
Reese, C. S., Suhr, J. A., & Riddle, T. L. (2012). Exploration of malingering indices in the Wechsler Adult Intelligence Scale – Fourth Edition Digit Span subtest. Archives of Clinical Neuropsychology, 27, 176–181.
Reitan, R. M. (1955). The relation of the Trail Making Test to organic brain damage. Journal of Consulting Psychology, 19, 393–394.
Reitan, R. M. (1958). The validity of the Trail Making Test as an indicator of organic brain damage. Perceptual and Motor Skills, 8, 271–276.
Rogers, R., Sewell, K. W., Martin, M. A., & Vitacco, M. J. (2003). Detection of feigned mental disorders: A meta-analysis of the MMPI-2 and malingering. Assessment, 10, 160–177.
Rohling, M. L., Green, P., Allen, L. M., & Iverson, G. L. (2002). Depressive symptoms and neurocognitive test scores in patients passing symptom validity tests. Archives of Clinical Neuropsychology, 17, 205–222.
Root, J. C., Robbins, R. N., Chang, L., & Van Gorp, W. G. (2006). Detection of inadequate effort on the California Verbal Learning Test—Second Edition: Forced choice recognition and critical item analysis. Journal of the International Neuropsychological Society, 12, 688–696.
Schwartz, E. S., Erdodi, L., Rodriguez, N., Jyotsna, J. G., Curtain, J. R., Flashman, L. A., & Roth, R. M. (2016). CVLT-II forced choice recognition trial as an embedded validity indicator: A systematic review of the evidence. Journal of the International Neuropsychological Society, 22(8), 851–858.
Shura, R. D., Miskey, H. M., Rowland, J. A., Yoash-Gatz, R. E., & Denning, J. H. (2016). Embedded performance validity measures with postdeployment veterans: Cross-validation and efficiency with multiple measures. Applied Neuropsychology: Adult, 23, 94–104.
Siefert, C. J., Sinclair, S. J., Kehl-Fie, K. A., & Blais, M. A. (2009). An item-level psychometric analysis of the Personality Assessment Inventory clinical scales in a psychiatric inpatient unit. Assessment, 16(4), 373–383.
Sims, J. A., Thomas, K. M., Hopwood, C. J., Chen, S. H., & Pascale, C. (2013). Psychometric properties and norms for the Personality Assessment Inventory in egg donors and gestational carriers. Journal of Personality Assessment, 95(5), 495–499.
Sinclair, S. J., Walsh-Messinger, J., Siefert, C. J., Antonius, D., Baity, M. R., Haggerty, G., Stein, M. B., & Blais, M. A. (2015). Neuropsychological functioning and profile validity on the Personality Assessment Inventory (PAI): An investigation in multiple psychiatric settings. Bulletin of the Menninger Clinic, 79(4), 305–334.
Slick, D. J., Sherman, E. M. S., Grant, L., & Iverson, G. L. (1999). Diagnostic criteria for malingered neurocognitive dysfunction: Proposed standards for clinical practice and research. The Clinical Neuropsychologist, 13(4), 545–561.
Sprinkle, S. D., Lurie, D., Insko, S. L., Atkinson, G., Jones, G. L., Logan, A. R., & Bissada, N. N. (2002). Criterion validity, severity cut scores, and test-retest reliability of the Beck Depression Inventory-II in a university counseling center sample. Journal of Counseling Psychology, 49(3), 381–385.
Storch, E. A., Roberti, J. W., & Roth, D. A. (2004). Factor structure, concurrent validity, and internal consistency of the Beck Depression Inventory—Second Edition in a sample of college students. Depression and Anxiety, 19(3), 187–189.
Sugarman, M. A., & Axelrod, B. N. (2015). Embedded measures of performance validity using verbal fluency tests in a clinical sample. Applied Neuropsychology: Adult, 22(2), 141–146.
Suhr, J., & Spickard, B. (2012). Pain-related fear is associated with cognitive task avoidance: Exploration of the cogniphobia construct in a recurrent headache sample. The Clinical Neuropsychologist, 26(7), 1128–1141. https://doi.org/10.1080/13854046.2012.713121.
Suhr, J., Tranel, D., Wefel, J., & Barrash, J. (1997). Memory performance after head injury: Contributions of malingering, litigation status, psychological factors, and medication use. Journal of Clinical and Experimental Neuropsychology, 19(4), 500–514.
Suhr, J. A. (2003). Neuropsychological impairment in fibromyalgia. Relation to depression, fatigue, and pain. Journal of Psychosomatic Research, 55, 321–329.
Suhr, J. A., & Boyer, D. (1999). Use of the Wisconsin Card Sorting Test in the detection of malingering in student simulator and patient samples. Journal of Clinical and Experimental Neuropsychology, 21(5), 701–708.
Sullivan, K., & King, J. (2008). Detecting faked psychopathology: A comparison of two tests to detect malingered psychopathology using a simulation design. Psychiatry Research, 176, 75–81.
Sussman, Z. W., Peterson, R. L., Connery, A. K., Baker, D. A., & Kirkwood, M. W. (2017). Utility of Matrix Reasoning as an embedded performance validity indicator in pediatric mild traumatic brain injury. Applied Neuropsychology: Child.
Trueblood, W. (1994). Qualitative and quantitative characteristics of malingered and other invalid WAIS-R and clinical memory data. Journal of Clinical and Experimental Neuropsychology, 16(4), 597–607.
Tyson, B. T., Baker, S., Greenacre, M., Kent, K. J., Lichtenstein, J. D., Sabelli, A., & Erdodi, L. A. (2018). Differentiating epilepsy from psychogenic nonepileptic seizures using neuropsychological test data. Epilepsy & Behavior, 87, 39–45.
van Gorp, W. G., Humphrey, L. A., Kalechstein, A. L., Brumm, V. L., McMullen, W. J., Stoddard, M., & Pachana, N. A. (1999). How well do standard clinical neuropsychological tests identify malingering? A preliminary analysis. Journal of Clinical and Experimental Neuropsychology, 21, 245–250.
Viglione, D. J., Giromini, L., & Landis, P. (2017). The development of the Inventory of Problems–29: A brief self-administered measure for discriminating bona fide from feigned psychiatric and cognitive complaints. Journal of Personality Assessment, 99, 534–544.
Wechsler, D. (2008). Wechsler Adult Intelligence Scale—Fourth Edition (WAIS-IV). San Antonio, TX: Pearson.
Westcott, M. C., & Alfano, D. P. (2005). The Symptom Checklist-90-Revised and mild traumatic brain injury. Brain Injury, 19(14), 1261–1267.
Whiteside, D. M., Kogan, J., Wardin, L., Philips, D., Franzwa, M. G., Rice, L., Basso, M., & Roper, B. (2015). Language-based embedded performance validity measures in traumatic brain injury. Journal of Clinical and Experimental Neuropsychology, 37(2), 220–227.
Wilkinson, G. S., & Robertson, G. J. (2006). Wide range achievement test 4. Lutz, FL: Psychological Assessment Resources, Inc.
Williamson, D. J., Holsman, M., Chaytor, N., Miller, J. W., & Drane, D. L. (2012). Abuse, not financial incentive, predicts non-credible cognitive performance in patients with psychogenic non-epileptic seizures. The Clinical Neuropsychologist, 26(4), 588–598.
Zenisek, R., Millis, S. R., Banks, S. J., & Miller, J. B. (2016). Prevalence of below-criterion Reliable Digit Span scores in a clinical sample of older adults. Archives of Clinical Neuropsychology, 31(5), 426–433.
Zuccato, B. G., Tyson, B. T., & Erdodi, L. A. (2018). Early bird fails the PVT? The effects of timing artifacts on performance validity tests. Psychological Assessment. https://doi.org/10.1037/pas0000596.
Ethics declarations
Relevant ethical guidelines regulating research involving human participants were followed throughout the project. All data collection, storage, and processing were conducted in compliance with the Declaration of Helsinki.
Conflict of Interest
The authors declare that they have no conflicts of interest.
Erdodi, L.A., Kirsch, N.L., Sabelli, A.G. et al. The Grooved Pegboard Test as a Validity Indicator—a Study on Psychogenic Interference as a Confound in Performance Validity Research. Psychol. Inj. and Law 11, 307–324 (2018). https://doi.org/10.1007/s12207-018-9337-7