Background and Conditions of Use

The Structured Inventory of Malingered Symptomatology (SIMS; Widows & Smith, 2005) is a 75-item, freestanding symptom validity test (SVT). Despite the test name including the term malingering, the SIMS manual states that the purpose of the measure is to “assess symptoms of both feigned psychopathology and cognitive function” (Widows & Smith, 2005, p. 4), not malingering per se. Malingering as commonly used and defined in the DSM-5 involves “intentional production of false or grossly exaggerated physical or psychological symptoms, motivated by secondary incentives” (American Psychiatric Association, 2013, p. 726); because the SIMS does not assess intentionality or external incentives, it is best described as a measure of negative response bias or symptom feigning (exaggeration or fabrication of symptoms). The SIMS is administered via a carbonless paper booklet (marking directly on the form), local administration software, or the online PARiConnect platform. Items are true–false in format, and the measure takes about 15 min to complete. It was designed for use in adults (18 years of age and older) with at least a 5th-grade reading level (Widows & Smith, 2005). The total score includes all items and is the most widely researched index, but the SIMS also contains five subscales of 15 items each: Psychosis (P), Neurologic Impairment (NI), Amnestic Disorders (AM), Low Intelligence (LI), and Affective Disorders (AF). Practice surveys have identified the SIMS as one of the most commonly used SVTs (Martin et al., 2015).

SIMS items were developed in several phases. During the first phase, 200 items were generated based on a review of literature describing qualitative characteristics of malingerers (Widows & Smith, 2005). During the next phase, nine experienced clinical psychologists were asked to rate the initial pool of 200 items and to classify each item into one of several conditions: low intelligence, affective disorder, neurologic impairment, psychosis, amnestic disorder, another category, or some combination of these categories (Smith & Burger, 1997; Widows & Smith, 2005). Items that received at least a two-thirds (67%) agreement rate were placed into one of the five aforementioned conditions in rank order of their percentage agreement rate (Smith & Burger, 1997; Widows & Smith, 2005); items that did not meet the 67% agreement rate were discarded. The final set contained 75 items (15 items for each of the five scales), with interrater reliability values ranging from 0.76 (Neurologic Impairment) to 0.95 (Affective Disorders) and a mean interrater reliability coefficient of 0.84. The initial validation sample included 476 undergraduate students (mean age 24.43 years; 71% female; 89.7% White). A simulation design was used to establish initial scores. The SIMS scales have demonstrated acceptable internal consistency, with reliability coefficients ranging from 0.80 to 0.88 (Smith & Burger, 1997).

An initial principal component analysis reported in the SIMS manual (Widows & Smith, 2005) indicated a four-factor solution, with most of the NI items loading on the P subscale; however, NI was retained as a separate subscale given its good interrater agreement (Cohen’s kappa). In addition to the total score and five original subscales, Rogers et al. (2014) developed two new scales using a sample of psychiatric inpatients. In contrast to the original subscales, which screen for feigning by symptom domain (e.g., memory versus psychotic symptoms), Rogers et al. based their new scales on detection methods. Fifteen items that were rarely endorsed by genuine responders but commonly endorsed by simulated feigners comprise the Rare Symptoms (RS) scale. Thirteen item pairs that were highly intercorrelated in the feigning group but uncorrelated in the genuine group constitute the Symptom Combinations (SC) scale. The items on the SC scale do not necessarily reflect pairs of similar symptoms sampled by two separate items; rather, the empirical derivation method yielded a variety of symptoms across all five of the original SIMS subscales. Reliabilities for the two new scales were excellent (RS) and moderate (SC). Rogers et al. (2014) recommended a cutoff score of > 6 for both new scales.
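The detection-method logic behind the RS scale can be sketched in a few lines: an item qualifies as a "rare symptom" if genuine responders seldom endorse it while feigners endorse it often. The sketch below is illustrative only; the endorsement-rate thresholds (`rare`, `common`) are hypothetical placeholders, not the criteria Rogers et al. (2014) actually applied.

```python
def rare_symptom_items(honest, feigning, rare=0.10, common=0.30):
    """Select candidate Rare Symptoms items: rarely endorsed by honest
    responders (rate <= rare) but commonly endorsed by feigners
    (rate >= common). Each group is a list of per-respondent lists of
    0/1 item endorsements. Thresholds here are hypothetical."""
    n_items = len(honest[0])

    def rate(group, j):
        # Proportion of the group endorsing item j
        return sum(row[j] for row in group) / len(group)

    return [j for j in range(n_items)
            if rate(honest, j) <= rare and rate(feigning, j) >= common]
```

With endorsement data for two items, where item 0 is rare among honest responders but common among feigners, only item 0 would be selected.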

Short Forms

Completion of the SIMS is less burdensome than completion of long multi-scale self-report measures with embedded SVTs; nevertheless, several studies using various methodologies have identified potentially viable short forms of the SIMS. The earliest (Malcore et al., 2015) utilized a two-step process for item reduction: first, all items with a corrected item-total correlation of less than r = 0.30 were removed (n = 37); then, items were removed if 95% of respondents endorsed the same answer, indicating a lack of variability. Finally, the remaining two items of the LI subscale and one more item from the P subscale were dropped, leaving a 36-item short form (SIMS-SF) that includes four of the five subscales (of note, based on the description and on others’ references to the scale, the short form appears to actually include 35 items). Initial area under the curve (AUC) analyses suggested that the abbreviated total score, and even the abbreviated NI and AM scores, were comparable to the original total score when compared against the FBS and RBS from the Minnesota Multiphasic Personality Inventory, 2nd edition (MMPI-2). A separate study utilized machine learning to identify a SIMS short form in a sample of 132 subjects with adjustment disorder who completed a psychological evaluation in the context of a lawsuit (Orrù et al., 2021). Based on three criteria, subjects were divided into the following groups: consistent responders, accentuators (i.e., symptom exaggerators), and producers (i.e., symptom fabricators). The resulting short forms included an 8-item version and a 10-item version, which were compared to both the Malcore et al. (2015) SIMS-SF and the full SIMS, with all models showing psychometric promise; however, the items contained in those models were not listed in the manuscript.
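The two-step item-reduction procedure described by Malcore et al. (2015) can be sketched as follows. This is a minimal illustration of the general method applied to hypothetical 0/1 response data, not a reproduction of their analysis.

```python
def pearson(x, y):
    """Pearson correlation; returns None if either variable is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return None
    return sxy / (sxx * syy) ** 0.5


def reduce_items(responses, r_min=0.30, max_agreement=0.95):
    """Two-step item reduction in the spirit of Malcore et al. (2015):
    (1) drop items whose corrected item-total correlation falls below
    r_min; (2) drop items on which max_agreement or more of respondents
    give the same answer (lack of variability). `responses` is a list of
    per-respondent lists of 0/1 item scores; returns kept item indices."""
    n_items = len(responses[0])
    totals = [sum(row) for row in responses]
    kept = []
    for j in range(n_items):
        item = [row[j] for row in responses]
        corrected = [t - i for t, i in zip(totals, item)]  # total minus item j
        r = pearson(item, corrected)
        if r is None or r < r_min:
            continue  # step 1: weak (or undefined) item-total correlation
        endorsement = sum(item) / len(item)
        if endorsement >= max_agreement or endorsement <= 1 - max_agreement:
            continue  # step 2: nearly everyone gives the same answer
        kept.append(j)
    return kept
```

With toy data in which two items track the total score and a third is endorsed by no one, only the first two items survive.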

More recently, Spencer et al. (2021) created the SIMS for Neuropsychological Settings (SIMS-NS) scale, comprising 11 NI items and 8 AM items retained based on their correlations with the MMPI-2 Restructured Form (MMPI-2-RF) SVTs judged most relevant: AM items were compared to RBS, and NI items were compared to FBS-r and Fs. The authors then compared the original SIMS, the new SIMS-NS, Rogers’ RS and SC scales, and the Malcore SIMS-SF on their ability to identify those who produce invalid scores on performance validity tests (PVTs). AUCs were comparable for the SIMS Total (0.76), SIMS-NS (0.74), SIMS-SC (0.72), and SIMS-SF (0.76). Although these studies show promise for various short forms and subscales and for using SIMS scores to screen for possible invalid performance, additional research in this area is warranted.

Appropriate Age Range and Language

The SIMS manual states that the measure is intended for adults aged 18 and older (Widows & Smith, 2005); however, a simulation study by Rogers et al. (1996) evaluated the SIMS in a sample of 53 adolescents ages 14–17. Although the total score had a PPP of 0.87 at > 14 (NPP = 0.62), the AF subscale showed improved accuracy at > 5 (PPP = 0.91, NPP = 0.70). For that sample, the total score cutoff was raised to > 40 to optimize accuracy, yielding PPP = 0.49 and NPP = 0.94. No studies were found on children, though at the content level, some items would not apply to children (e.g., driving items, some knowledge items on LI), and the 5th-grade reading level would also create a lower bound regarding age. Beyond age, the SIMS has been validated in a variety of contexts (see Table 1), including research, clinical, and forensic settings.

Table 1 Summary of classification accuracy studies for various cutoffs of the SIMS total score

The SIMS has also been studied in several countries other than the USA, including Spain, Germany, Italy, Switzerland, and the Netherlands, and in several other languages, including Spanish, German, Dutch, and Turkish (Ardic et al., 2019; Giromini et al., 2018; González Ordi et al., 2012; Nijdam-Jones & Rosenfeld, 2017). Related to language, a small study using a modified Dutch version of the SIMS with asylum seekers (van der Heide & Merckelbach, 2016) found that although those with poor language proficiency endorsed more SIMS items than those with intermediate or good proficiency, this effect was not statistically significant in two different sub-samples; rather, SIMS scores were more related to incentives to malinger than to language proficiency. In another study incorporating cultural factors, Boskovic et al. (2017) found that mental health professionals from 22 countries rated items on the SIMS Short Form as less plausible than items on a measure of common psychiatric symptoms (Brief Symptom Inventory-18), regardless of whether the rater was from a Western or non-Western country; however, SIMS item plausibility was rated as not significantly different from that of a measure of dissociative symptoms, suggesting caution when using the SIMS with those reporting dissociative symptoms.

Meta-Analysis and Practice Surveys

van Impelen et al. (2014) completed a meta-analysis of the SIMS, focusing on cutoff scores of > 14 and > 16. Although sensitivity was adequate, specificity was less satisfactory and highly variable across samples. Diagnostic accuracy of the SIMS subscales was also examined and was generally adequate, though the P and LI scales were substantially less accurate than the other subscales. The authors recommended a variety of cutoff scores for different situations, with > 16 recommended for screening purposes and > 24 for clinical populations or more certain conclusions. In addition to the meta-analysis, the SIMS has been discussed in published practice surveys. For example, Dandachi-FitzGerald and Merckelbach (2013) surveyed neuropsychologists (N = 515) in six European countries and reported that the SIMS was used by 13.2% of their sample, second only to the MMPI-2. Similarly, Martin et al. (2015) surveyed North American neuropsychologists (N = 316) and reported that 10.1% of their sample used the SIMS, making it the most commonly used stand-alone symptom validity measure; only the MMPI-2/RF and Personality Assessment Inventory (PAI; Morey, 2007) were used more commonly.

Convergent and Incremental Validity

As Kelley (1927, p. 14) explained, “the problem of validity is that of whether a test really measures what it purports to measure.” As noted, the SIMS purports to measure malingered symptomatology, where symptomatology means psychopathological symptoms and malingered means—contrary to the standard definition—inaccurate symptom endorsement due to significant exaggeration or fabrication, whether or not such behavior arises from a conscious intention to achieve a specific outcome. Thus, the SIMS does not actually assess malingering as the word is commonly defined, a point made convincingly by van Impelen et al. (2014, p. 1337). The construct the SIMS purports to measure is feigned psychological and cognitive symptoms. Despite including scales for feigned memory complaints and low intellectual function, the SIMS does not appear sensitive to underperformance on cognitive tests as measured by performance validity tests, based on the few studies available (Alwes et al., 2008; Dandachi-FitzGerald et al., 2011).

Does the SIMS actually measure feigned symptomatology? The short answer is a qualified yes. As a screening instrument, the SIMS efficiently and effectively rules out feigning, i.e., it identifies evaluees who do not require further symptom validity assessment. Also, as a screening test, the SIMS effectively identifies individuals requiring comprehensive feigning assessment, provided evaluators understand its limitations. Specifically, while several studies have demonstrated convergent validity for the SIMS (e.g., Clegg et al., 2009; Edens et al., 1999; Heinze & Purisch, 2001; Lewis et al., 2002; Merten et al., 2020), some of these same studies have shown that to some extent the SIMS may measure genuine psychopathology (e.g., Ord et al., 2021); thus, evidence for the test’s discriminant validity is only moderate. Along these lines, the SIMS may misidentify patients exhibiting marked apathy (Dandachi-FitzGerald et al., 2020), patients with alexithymia (Merckelbach et al., 2018), patients with schizophrenia (Peters et al., 2013), veterans with PTSD (Wolf et al., 2020), and inpatients with extensive trauma histories (Rogers et al., 2014); conversely, the SIMS did not misidentify patients with Korsakoff amnesia in one study (Oudman et al., 2020).

If evaluators understand that discriminant validity for the SIMS is limited, they can minimize false positive findings by determining a cutoff score, in advance, that takes into account the estimated feigning base rate and the severity of genuine psychopathology typically seen at the evaluation site. For example, imagine a psychologist conducting clinical assessments at a long-term residential treatment program for patients.

    • diagnosed with severe substance use disorders;

    • who endured multiple adverse childhood events, most of whom are diagnosed with PTSD;

    • and who by and large are neither involved in personal injury litigation nor seeking any sort of disability benefit.

The psychologist at such a facility would want to assume a relatively low feigning prevalence rate and to take into account research showing that individuals with PTSD, for example, produce substantially higher validity scale scores on the MMPI-2 (Franklin et al., 2002; Frueh et al., 2000) and MMPI-2-RF (Goodwin et al., 2013) compared to other clinical groups. Consequently, the psychologist would probably want to use a SIMS cutoff score of ≤ 23 to rule out the need for additional symptom validity assessment. Thus, determining an evidence-based cutoff score in advance will improve the odds that the SIMS measures feigned psychopathology rather than genuine psychiatric illness.
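The dependence of predictive values on the feigning base rate, which drives this recommendation, follows directly from Bayes’ theorem. The sketch below plugs in the sensitivity (0.9463) and specificity (0.8788) reported in the manual’s simulation-based validation purely for illustration; the base rates are hypothetical.

```python
def ppv_npv(sensitivity, specificity, base_rate):
    """Positive and negative predictive values for a test with the given
    sensitivity and specificity, at a given base rate of feigning."""
    tp = sensitivity * base_rate               # true positives
    fp = (1 - specificity) * (1 - base_rate)   # false positives
    fn = (1 - sensitivity) * base_rate         # false negatives
    tn = specificity * (1 - base_rate)         # true negatives
    return tp / (tp + fp), tn / (tn + fn)

# At a low (10%) feigning base rate, NPV is excellent but PPV is poor,
# which is why the SIMS works better for ruling out than for ruling in:
low_ppv, low_npv = ppv_npv(0.9463, 0.8788, 0.10)   # PPV ~0.46, NPV ~0.99
high_ppv, high_npv = ppv_npv(0.9463, 0.8788, 0.40)  # PPV rises to ~0.84
```

The same arithmetic explains the paper’s broader pattern: a cutoff with high sensitivity and mediocre specificity yields many false positives wherever genuine feigning is rare.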

Convergent Validity

To demonstrate an instrument’s convergent validity, researchers identify established operationalizations (measures) of the desired construct and then determine whether the instrument in question correlates highly with those established measures. For symptom validity tests (SVTs), investigators usually compare an SVT with other SVTs already shown to exhibit good reliability and validity. The SIMS manual (Widows & Smith, 2005) refers to several studies that used an analog (simulated feigning) research design with undergraduates to assess the validity of SIMS scores. Analyses comparing honest-responding and simulated-feigning groups revealed that individuals in the simulating groups produced consistently higher SIMS total scores than those in the honest-responding groups (p < 0.01). The manual also references several early studies on convergent and divergent validity (included below), but specifically includes results from a small study (n = 20, cited via personal communication) using a disability sample, which found the SIMS total score highly correlated with other measures of response bias: PAI NIM r = 0.94, M-FAST Total r = 0.93, and TOMM r = − 0.89 to − 0.91 (Widows & Smith, 2005). Of note, despite these high correlations with the TOMM, another study from the Netherlands (Dandachi-FitzGerald et al., 2011) using a clinical sample (N = 183) demonstrated a far lower correlation (r = − 0.22) between the Dutch version of the SIMS and the Amsterdam Short-Term Memory test, a different performance validity measure. Further, unpublished analyses by Smith and Burger (1997) described in the SIMS manual (Widows & Smith, 2005) showed that the SIMS total score had high correlations with the F scale (r = 0.84) and F-K index (r = 0.81) of the MMPI. Correlations of the SIMS subscales with the MMPI F scale were moderate to strong, ranging from 0.53 (AM and LI) to 0.84 (P). Similarly, correlations of the SIMS subscales with the F-K index of the MMPI were moderate (0.47 for LI and 0.49 for AM) to strong (0.83 for P). Additionally, the SIMS total score was significantly correlated with the Insanity subscale (r = − 0.71) of the Malingering Scale and with the 16PF Faking Bad scale (r = 0.45).

Heinze and Purisch (2001) conducted a study involving 57 men suspected of feigning incompetence to stand trial and reported that the SIMS total score was moderately to strongly correlated with other commonly used measures of symptom validity: the validity scales of the MMPI-2 (correlations ranged from − 0.47 to 0.50) and the scales of the Structured Interview of Reported Symptoms (SIRS; Rogers et al., 1992), with correlations ranging from 0.43 to 0.80. Similarly, using the MMPI-2, Lewis et al. (2002), in their study of 64 men undergoing pretrial forensic assessments, found that the SIMS total score had the second-best effect size when comparing honest responders to feigners (d = 3.0), second only to MMPI-2 Fb (d = 3.6).

Edens et al. (2007) compared the ability of the SIMS, the SIRS, and the PAI to detect malingering among prison inmates (N = 115) in four subsamples: (a) a group instructed to malinger, (b) a group of suspected malingerers identified by psychiatric staff, (c) a control group selected from the general inmate population, and (d) a group of psychiatric patients. They reported that intercorrelations among measures for the total sample were quite high. Specifically, SIMS total scores were strongly correlated with SIRS scores (r = 0.81) and with the NIM subscale of the PAI (r = 0.84), and moderately correlated with two other PAI indices (MAL r = 0.68; RDF r = 0.45). The SIMS total score correlated with the SIRS total score to a far lesser degree (r = 0.54) in a residential treatment sample (n = 41) of veterans diagnosed with PTSD (Freeman et al., 2008). Thus, in general, concurrent validity is supported when the SIMS is compared to a variety of other SVTs.

Incremental Validity and Incremental Utility

If an evaluator routinely administers one of the MMPI instruments (MMPI-2, MMPI-2-RF, or MMPI-3) or the PAI to screen for exaggeration/feigning, they might wonder whether administering the SIMS as well would be useful. In other words, would the SIMS improve the ability to identify feigning? Unfortunately, few studies have examined the incremental validity of the SIMS, and definitive evidence supporting incremental validity does not exist to date. In the study by Lewis et al. (2002), the SIMS total score did not show incremental validity beyond MMPI-2 Fb, which was the best predictor of invalid status. Given the low PPP of the SIMS in this study, the authors reiterated the need to follow up with a more thorough assessment in cases of invalid SIMS scores. In the Edens et al. (2007) study, logistic regressions were used to examine whether the SIRS showed incremental validity over the SIMS; although the model was significant when the SIRS was added, predictive accuracy only increased from 69 to 72%. No other studies were found examining the incremental validity of the SIMS.

On the other hand, if an evaluator wonders if the SIMS will improve their ability to rule out exaggeration/feigning beyond evaluator judgment based on record review and interview data only, then the SIMS does evince incremental validity (Dandachi-FitzGerald et al., 2017; Edens et al., 2007). In this context, the SIMS also possesses incremental utility. We define incremental utility as the extent to which a symptom validity test demonstrates incremental validity and provides value. Value is a subjective judgment that potentially involves several variables, but in relatively simple terms, a test has value if its benefits outweigh its costs (see Hunsley & Meyer, 2003, and Haynes & Lench, 2003, for detailed and erudite discussions of incremental validity and utility). If an evaluator wishes to conduct comprehensive exaggeration/feigning assessments only with evaluees who show an elevated risk of exaggeration/feigning, then the SIMS has value as it takes only 15 min to administer, costs $5.50 USD per test for administration and scoring (PAR, n.d.), and reliably rules out a sizeable portion of evaluees from needing further assessment.

In conclusion, the incremental validity of the SIMS is not well established when compared to other SVTs; however, this concern applies more to use within a test battery than to screening use, where incremental validity is arguably less important. A small number of studies suggest incremental validity and utility for the SIMS compared to evaluator judgments about exaggeration/feigning based on interview and record review data alone.

Cut Scores and Hit Rates

A number of cutoff scores have been examined for the SIMS. The cutoff score published on the paper test forms and presented in the test manual is > 14 for the Total score (Widows & Smith, 2005). This recommendation was based on the empirical validation of the SIMS described in the manual, which used two samples of college students. Study participants were randomly assigned either to a control group of honest responders (HR) or to one of six experimental conditions, in which participants were asked to simulate psychosis (P), amnesia (A), depression (D), low intelligence (LI), neurologic impairment (NI), or “faking bad” (FB) (Widows & Smith, 2005, p. 12). The total sample was split into developmental (n = 238) and cross-validation (n = 238) subsamples. Using the developmental sample, the authors identified cutoff scores on each scale that optimally separated participants in the HR group from participants in the experimental conditions. The resulting cutoffs were as follows: Total > 14, P > 1, NI > 2, AM > 2, LI > 2, and AF > 5. Sensitivity for the total score cutoff was 94.63%, with specificity of 87.88%. The authors concluded that respondents who obtain a SIMS total score greater than 14 need further evaluation because they have endorsed a high number of atypical, improbable, inconsistent, or illogical symptoms.
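The core of this cutoff-selection procedure is computing sensitivity and specificity for the decision rule "score > cutoff" at each candidate cutoff against known group membership. A minimal sketch, using hypothetical scores rather than any actual study data:

```python
def sens_spec(scores, is_feigner, cutoff):
    """Sensitivity and specificity of the rule 'score > cutoff' against
    known group membership (True = feigning group, False = honest)."""
    tp = fn = tn = fp = 0
    for score, feigner in zip(scores, is_feigner):
        flagged = score > cutoff
        if feigner:
            tp += flagged       # feigner correctly flagged
            fn += not flagged   # feigner missed
        else:
            fp += flagged       # honest responder wrongly flagged
            tn += not flagged   # honest responder correctly passed
    return tp / (tp + fn), tn / (tn + fp)

# Hypothetical scores: two simulated feigners, two honest responders.
sens, spec = sens_spec([20, 12, 10, 18], [True, True, False, False], 14)
```

Sweeping `cutoff` over a range and picking the value that best balances the two rates is the optimization the manual describes for the developmental subsample.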

In addition to the cutoff score of > 14 published in the manual, several additional cutoff scores have been examined in the extant literature (e.g., van Impelen et al., 2014). van Impelen et al. (2014) conducted a meta-analysis of studies examining SIMS cutoffs, separating published research into two categories: (1) known-groups studies and (2) simulation studies. In the simulation studies, the utilized cutoff scores were either > 14 or > 16. The authors concluded that in those studies, SIMS scores differed significantly between experimental feigners and honest responders, and those studies generally reported rather high sensitivity values (0.63–1.00) with variable specificity (0.23–1.00). Effect sizes were variable as well, with Cohen’s d values ranging from 0.5 to 4.7. In the known-groups studies, a number of cutoffs were reported: > 13, > 14, > 16, > 19, > 21, and > 23. Sensitivity values in those studies were relatively low for cutoffs > 21 (0.68) and > 23 (0.55) but much higher (0.75–1.00) for lower cutoffs. Specificity values were 0.73 and 1.00 for cutoffs > 21 and > 23, respectively, and ranged between 0.37 and 0.93 for lower cutoffs. Overall, based on the information presented by van Impelen et al. (2014), SIMS scores produce adequate sensitivity values at lower cutoffs, but specificity values are often low in known-groups studies and highly variable in simulation studies. In summary, van Impelen et al. (2014, p. 1356) concluded that cutoffs > 14 and > 16 are “perilous” and that cutoff scores of > 19 (Clegg et al., 2009) or > 23 (Wisdom et al., 2010) should be considered.

In the present manuscript, we examined literature published to date (in English) that explored various SIMS cutoffs, and findings were organized based on the context in which each study was conducted: (1) research, (2) clinical, or (3) forensic settings. All studies included in the present review are outlined in Table 1. In a research setting, the initial validation study of the SIMS was conducted by Smith and Burger (1997). Participants in that study were college students (N = 476) who were assigned to an honest-responding group or to one of seven simulation conditions (simulating psychosis, amnestic disorder, neurological impairment, mania, depression, low intelligence, or “faking bad”). Results indicated that the SIMS total score had the highest sensitivity (95.6%) and overall efficiency (percent correctly classified as malingering or non-malingering; 94.5%) compared to other validity indices (such as the F and K scales of the MMPI and the 16PF Faking Bad scale). The corresponding specificity (87.9%) was adequate. The authors concluded that their data supported the construct validity of the SIMS and that individuals who obtain a total score > 14 on the SIMS may be “suspected of malingering” (p. 188) and may require further evaluation. Another simulation study published around the same time by Rogers et al. (1996) involved 53 adolescent offenders who were asked to complete the SIMS under honest and feigning conditions. The authors utilized a different SIMS total cutoff score of > 16 and concluded that this score was moderately effective as a screening measure in identifying feigned protocols; however, the NPP rate was rather low (0.62). The authors examined other cutoff scores for the SIMS and concluded that a score of > 40 (PPP = 0.49; NPP = 0.94) would be more optimal in their sample of dually diagnosed adolescents undergoing court-referred residential psychological treatment. Similarly, Rogers et al. (2014) reported that at a cutoff score of > 14, specificity was very low (0.28) and concluded that a much higher SIMS total cutoff score (> 44) is needed to achieve high specificity (0.98) in their sample of mental health inpatients with extensive trauma histories who were asked to simulate disability.

In general, most simulation studies have examined two cutoff scores: > 14 and > 16. Across all simulation studies included in the present review (Table 1), sensitivity values for cutoff > 14 ranged between 0.52 and 0.98, whereas specificity values ranged between 0.28 and 1.00 (although most studies indicated that this cutoff achieved specificity in the 0.79–1.00 range). For the cutoff score of > 16, sensitivity values ranged between 0.36 and 0.98, whereas specificity values ranged between 0.23 and 1.00 (although most studies indicated that this cutoff score achieved specificity in the 0.83–1.00 range). Reported classification accuracy rates for a score of > 14 in simulation studies ranged between 0.57 and 0.95, whereas classification accuracy for > 16 ranged between 0.68 and 0.99 (see Table 1).

In clinical contexts, the classification accuracy of the traditional cutoff score of > 14 has produced mixed results. One of the first studies to raise concerns about the applicability of this cutoff was published by Edens et al. (1999), who found that SIMS specificity rates were low among individuals reporting clinically significant distress and concluded that this cutoff score may be problematic due to potentially high false positive rates in clinical populations. Edens et al. (2007) later noted that at the > 14 cutoff, specificity values ranged from 0.40 in psychiatric patients to 0.97 in controls, highlighting that although specificity was high in the control group, it was poor among patients with genuine psychiatric disorders. Some more recent studies have shown inadequate specificity (0.55) with higher sensitivity (0.76) for this cutoff (Benge et al., 2012), whereas other studies have revealed lower sensitivity (0.52–0.60) but moderate to high specificity values (0.75–0.98) (e.g., Harris & Merz, 2021; Puente-Lopez et al., 2020).

Regarding base rates of failure at different cutoff scores, research indicates that in some populations, the failure rate at the > 14 cutoff may be rather high. For example, Ord et al. (2021) reported a failure rate of approximately 45% for this cutoff in a sample of Iraq and Afghanistan combat veterans, compared to a 14% failure rate when the > 23 cutoff was utilized, suggesting that higher cutoffs may be more appropriate in certain populations. This high rate of overreporting may be attributable to the fact that combat veterans often have multiple comorbid conditions and may therefore display higher levels of genuine psychopathology, which may in turn elevate their SIMS scores. These authors suggested that the traditional SIMS cutoff (> 14) may not be appropriate for use with combat-exposed veterans and that further research is needed on the use of the SIMS in this population. Similarly, Harris and Merz (2021) examined different SIMS cutoffs in a sample of 110 patients at an adult neuropsychology clinic and found a high rate of elevations at the standard cutoff score of > 14 (45.5%), which corresponded to a failure rate of 24.4% in a group of “low suspicion” cases and a failure rate of 95.7% in “high suspicion” cases. They concluded that SIMS scores should be interpreted with caution and that the > 14 score could be used for screening purposes to determine whether more thorough follow-up is needed. They recommended a cutoff score of > 16 for neuropsychological populations but noted that it still had only modest specificity.

Because the traditional cutoff score of > 14 may not be optimal for use in clinical populations, a number of alternative cutoffs have been examined. For example, Dandachi-FitzGerald et al. (2020) examined SIMS cutoffs of > 16 and > 19 in 120 clinical patients diagnosed with dementia, mild cognitive impairment, or Parkinson’s disease. They reported that only 10% obtained a score > 19 and 12.5% obtained a score > 16, concluding that the SIMS may have adequate classification accuracy at these cutoffs in these populations. Another study (Czornik et al., 2021) examined a sample of 54 memory clinic outpatients and found that the overreporting rate at the > 16 cutoff was 14.8%, and that raising the cutoff score to > 21 reduced the rate to 7.5%. Benge et al. (2012) explored the diagnostic utility of the SIMS in identifying psychogenic non-epileptic events and examined a number of cutoff scores ranging from 10 to 40. Cutoff scores of > 14 and > 16 resulted in rather high sensitivity values (0.76 and 0.71, respectively), but the corresponding specificity values were low (0.55 and 0.69). The first cutoff score to achieve an adequate specificity value of 0.90 was > 23, but with an accompanying sizable drop in sensitivity (0.39). Finally, Peters et al. (2013) examined 41 patients diagnosed with schizophrenia and 43 non-clinical controls. They reported that in their schizophrenia sample, the specificity rate at the > 16 cutoff was 0.71 and increased to 0.81 when the cutoff score was raised to > 19; to achieve a specificity rate of 0.90 or above, the > 21 cutoff would need to be utilized. Taken as a whole, findings of studies conducted in clinical contexts indicate that the traditional > 14 cutoff should be used with caution, even for screening purposes, and that higher cutoffs may be more appropriate for various clinical populations.

In forensic settings, the traditionally utilized > 14 cutoff has demonstrated high sensitivity values (0.85–1.00), similar to cutoffs of > 13 (0.87–0.96) and > 16 (0.75–1.00) (Table 1). However, these cutoffs corresponded to rather low specificity values in forensic settings (0.37–0.67) (Table 1). To achieve acceptable specificity (≥ 0.90), the cutoff score needed to be increased to > 22 (Wisdom et al., 2010), whereas the > 23 cutoff resulted in perfect specificity of 1.00 (Wisdom et al., 2010). Similarly, Clegg et al. (2009) noted that increasing the SIMS cutoff score to > 19 may improve the utility of this measure in the assessment of symptom validity among disability claimants. These authors examined 56 individuals seeking disability, classified as honest responders or suspected malingerers based on their SIRS scores, and 60 individuals from the community who completed the SIMS honestly or as if they were malingering depression. They found that individuals in both malingering groups obtained significantly higher SIMS total scores than the honest groups; however, no significant differences were found between the scores of the two malingering groups. The authors concluded that cutoff scores of > 14 and > 16 had excellent sensitivity but low specificity; therefore, increasing the SIMS total cutoff score to at least > 19 may improve the utility of this measure among Social Security Disability claimants. Similarly, other authors have suggested that the > 16 cutoff may be inadequate for forensic populations. For example, Lewis et al. (2002) examined 64 men undergoing pretrial forensic assessments and reported a sensitivity of 1.00 but a specificity of only 0.61 for the > 16 cutoff. Alwes et al. (2008) also reported moderate to high sensitivity rates (0.75–0.96) but low specificity rates (0.60–0.67) for the > 16 cutoff in their sample of 308 individuals who completed a neuropsychiatric evaluation for workers’ compensation or personal injury claims.
In summary, studies in forensic settings have shown that the traditional cutoff > 14 and the cutoff > 16 produced inadequate specificity. To achieve acceptable specificity (≥ 0.90) in forensic populations, higher cutoff scores appear to be generally more appropriate.

Cutoff Scores for SIMS Subscales

As stated earlier, the following cutoff scores were identified in the SIMS manual for each subscale: P > 1, NI > 2, AM > 2, LI > 2, and AF > 5 (Widows & Smith, 2005). At these cutoffs, sensitivity values for the subscales ranged from 0.76 (AF) to 0.89 (AM), and specificity values ranged from 0.76 (P) to 0.90 (NI and AF). Smith and Burger (1997) reported the following sensitivity values for the subscales at the published cutoffs: AF = 0.75, P = 0.80, NI = 0.85, LI = 0.85, AM = 0.88. Corresponding specificity values were as follows: AF = 0.73, P = 0.73, NI = 0.76, LI = 0.52, AM = 0.91. Benge et al. (2012) examined various cutoffs for the NI and AF scales specifically. They reported that the cutoff score > 2 for the NI scale was associated with a sensitivity value of 0.75 and a specificity value of only 0.62 in their sample of inpatients with epilepsy or psychogenic non-epileptic events. The traditionally utilized cutoff > 5 for the AF subscale corresponded with a sensitivity of 0.58 and a specificity of 0.72. Adequate specificity (≥ 0.90) was achieved at the cutoff > 5 for NI and the cutoff > 7 for AF, indicating that these scores may be more appropriate for use in populations with neurological concerns. Parks et al. (2017) reported the following sensitivity values for each subscale at the published cutoffs in a sample of 78 undergraduate simulators: P = 0.35, LI = 0.42, AM = 0.64, NI = 0.67, AF = 0.92; specificity values were not reported. Finally, Harris and Merz (2021) reported rather high failure rates for most subscales based on the published cutoffs in their sample of 110 adult neuropsychology outpatients: LI = 36.4%, AF = 47.3%, NI = 55.5%, and AM = 64.5%. The only exception was the P subscale (a failure rate of 17.3%), which was more consistent with the expected rate in clinical populations.
In summary, although most SIMS subscales displayed moderate to high sensitivity, not all of them achieved adequate specificity values at the cutoff scores published in the manual, based on results of available empirical studies.

In addition to the five subscales identified in the SIMS manual (Widows & Smith, 2005), some authors have proposed additional subscales or indices. For example, Rogers et al. (2005) developed a new index: the arithmetic difference between scores on the AF and NI subscales. In a simulation study of 65 psychology doctoral students, they found significant differences on this index between the malingering condition and both factitious conditions. They suggested a cutoff > 0 for the index, which corresponded to a sensitivity of 1.00 but a low specificity of 0.31 (total hit rate = 0.77).

Further, the previously mentioned RS and SC subscales (Rogers et al., 2014) produced high specificity (0.98) with moderate sensitivity values of 0.42 (RS) and 0.67 (SC) at the recommended cutoff > 6. The authors concluded that these scales had potential utility in identifying simulated mental disorders. Later, Edens et al. (2020) examined the classification accuracy of the RS and SC subscales in three archival samples: (1) 115 inmates (general-population inmates and inmates in a prison psychiatric unit receiving treatment for a mental disorder), (2) 196 college students, and (3) 48 community-dwelling adults. They concluded that the suggested cutoff score > 6 for both subscales produced relatively low sensitivity values (ranging from 0 to 0.56 across subsamples), albeit with correspondingly high specificity (ranging from 0.87 to 1.00). These scales have been criticized (Cernovsky & Ferrari, 2020), and further validation is warranted before they are employed clinically.

Interpretive Recommendations

Based on available known-groups studies, we offer the following general interpretive guidelines. First, SIMS cutoff scores should be determined a priori, taking into account the evaluative setting’s feigning base rate (as best it can be estimated), the extent to which examinees with genuine psychopathology in that setting are likely to produce elevated SIMS scores, and one’s willingness and ability to conduct a comprehensive feigning assessment when respondents produce SIMS scores above the pre-established cutoff. Second, in most contexts, the manual cutoff score of > 14 will result in a high number of false positives and should generally be avoided. We therefore recommend a cutoff score of > 19 for screening purposes and > 23 when the SIMS is used as convergent data (i.e., in a test battery with multiple SVTs). Alternatively, in either screening or battery use, SIMS users may consider a graded approach to interpreting SIMS scores: possibly invalid > 16, probably invalid > 19, very likely invalid > 23. Of note, we recommend using terminology other than malingering when describing test scores, given that the SIMS does not measure malingering per se, but rather feigning. Regardless of which interpretive approach is used, a SIMS Total cutoff score of at least > 23 is recommended for forensic situations.
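The interplay among base rate, sensitivity, and specificity underlying these recommendations can be made concrete with a short sketch. The function below simply applies Bayes’ rule; the numerical values are illustrative assumptions, not figures from any SIMS study, chosen to show why a lenient cutoff (high sensitivity, low specificity) works for screening (strong negative predictive value) while a stricter cutoff is needed before an elevated score is treated as convergent evidence (higher positive predictive value).

```python
# Illustrative sketch: how feigning base rate, sensitivity, and specificity
# jointly determine the trustworthiness of a positive or negative screen.
# All numbers below are hypothetical, not taken from the SIMS literature.

def predictive_values(sensitivity: float, specificity: float, base_rate: float):
    """Return (PPV, NPV) for a classifier with the given accuracy and base rate."""
    true_pos = sensitivity * base_rate            # feigners correctly flagged
    false_pos = (1 - specificity) * (1 - base_rate)  # genuine patients flagged
    true_neg = specificity * (1 - base_rate)      # genuine patients cleared
    false_neg = (1 - sensitivity) * base_rate     # feigners missed
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv

# A lenient cutoff vs. a stricter one, at an assumed 20% feigning base rate:
for label, sens, spec in [("lenient cutoff", 0.95, 0.60),
                          ("strict cutoff", 0.40, 0.95)]:
    ppv, npv = predictive_values(sens, spec, base_rate=0.20)
    print(f"{label}: PPV = {ppv:.2f}, NPV = {npv:.2f}")
```

Under these assumptions, the lenient cutoff clears non-feigners with high confidence (NPV ≈ 0.98) but most positive screens are false alarms (PPV ≈ 0.37), whereas the strict cutoff roughly doubles the PPV at the cost of missing many feigners. This is the arithmetic reason an elevated score at a screening cutoff should trigger further assessment rather than a conclusion.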

Beyond the total score, caution is warranted when considering elevated subscales, given the lack of research on subscale classification accuracy. Invalid subscale scores based on manual cutoffs may be best used to explain which types of exaggerated symptoms are driving an elevated Total score, rather than being interpreted in isolation or when the Total score is not elevated. Finally, when administering the SIMS to groups with particular characteristics, such as undergraduate students or patients presenting with PTSD symptoms, we recommend consulting the research literature for studies examining similar populations. For example, in some research settings where the SIMS is used to exclude invalid responders from a sample of healthy undergraduates, the manual cutoff score of > 14 might be appropriate; in other contexts, more extreme cutoff scores might be relevant (e.g., > 44 for psychiatric inpatients; Rogers et al., 2014). Table 1 can help readers identify studies with samples relevant to their own settings.

Strengths and Weaknesses

Strengths

The SIMS has a number of strengths that likely contribute to its popularity. First, the SIMS offers evaluators an efficient and effective method to rule out feigning among a sizeable proportion of examinees, so that they can devote the additional resources necessary for comprehensive symptom validity assessment to those demonstrating a greater likelihood of feigning. Second, the SIMS has several practical advantages: it takes only about 15 min to administer, is easy to score, is written at a fifth-grade reading level, costs less per test than many other symptom validity measures, and is available in a variety of administration modalities and several languages. Additionally, the SIMS shows incremental validity when compared to clinical judgments based on interview and record review data alone. Finally, the SIMS has been validated with a variety of cultural groups in the USA and Europe.

Weaknesses

Despite the noted strengths of the SIMS, several limitations emerge from existing research. First, the SIMS does not appear to show incremental validity beyond the validity scales of standard multiscale inventories like the MMPI-2-RF or the PAI. However, this limitation highlights the instrument’s two primary uses: (1) as a screener for identifying situations where more thorough symptom validity assessment is needed and (2) as a supplement in a battery of validity measures. Second, the SIMS does not exhibit robust discriminant validity when evaluating groups with severe psychopathology, and evaluators should use higher cutoff scores in general (e.g., total > 19 [van Impelen et al., 2014] or > 23 [Wisdom et al., 2010]) or cutoffs based on studies using samples with similar conditions (see Table 1).

The most notable limitation of the SIMS is that although its items are described as containing “implausible, rare, atypical, or extreme symptoms that bona fide patients tend not to present” (Mazza et al., 2019, p. 5), this does not appear to be the case for all items. For example, two items describe trouble sleeping in general and middle insomnia specifically, neither of which is worded as extreme or severe. Yet sleep disruption is a common complaint, with the DSM-5 noting that about a third of the general population suffers from insomnia symptoms (American Psychiatric Association, 2013). Similarly, from a content perspective, many items, particularly on the AF and AM scales, reflect genuine complaints common in clinical populations. In other words, many SIMS items represent quasi-rare symptoms (Rogers, 2018), as opposed to bizarre ones. A final weakness is the use of the term malingering in the SIMS name. Because the SIMS does not actually measure malingering, the test should be referred to as a symptom validity test, not a malingering test.

One group of authors has recently focused on criticizing the SIMS, to the extent of calling its use malpractice (Cernovsky & Diamond, 2020; Cernovsky & Fattahi, 2020; Cernovsky et al., 2019a, 2019b, 2019c, 2020) and claiming that the SIMS is “essentially an iatrogenic pseudoscientific instrument” (Cernovsky et al., 2020, p. 9). They make a few valid points, notably that the manual cutoff scores produce false positives, that many individual SIMS items can reflect genuine and common problems, and that additional validation in true medical groups is needed. Nevertheless, the group’s opinion appears to be biased, and their arguments are problematic across a few themes. First, the authors misconstrue the SIMS as intended to be used in isolation to diagnose malingering specifically. They describe professionals misusing the test in this manner but blame the test itself, without citing the numerous instances in the SIMS manual where this use is specifically not recommended (the manual recommends use either as a screener or as one part of a multi-method evaluation battery). Thus, using the SIMS alone to diagnose malingering is incorrect test use, not an inherent flaw of the test itself. Second, the authors appear to assume that concussion patients (specifically, civil forensic MVA patients) should be expected to show chronic impairments. This expectation is not supported by the vast concussion literature, in which persisting symptoms are often best explained by non-injury factors, most notably litigation (Carroll et al., 2004, 2014; Cassidy et al., 2014; McCrea, 2008).

Third, the authors appear to misunderstand the distinction between symptom exaggeration and symptom fabrication: an SVT can be composed of “real” (non-bizarre) symptoms, because endorsing a high number of them can indicate exaggeration. Many SVTs effectively use real symptoms in exactly this way (e.g., the F scale on the MMPI). Rogers (2018) refers to these detection strategies as quasi-rare symptoms, one of a variety of amplified detection strategies. Last, the authors commit the error of treating psychopathology as superordinate (Merten & Merckelbach, 2013), that is, explaining feigning behaviors in terms of psychiatric symptoms (e.g., a “cry for help”). In short, readers should be cautious about using these papers as a basis for evaluating the SIMS.

Future Perspectives

There are several opportunities for future research to further the utility of the SIMS. Generally, there is room for additional studies using the SIMS across a variety of contexts (research, clinical, forensic), psychological and medical conditions (e.g., PTSD, TBI, neurological conditions), and populations, particularly military and veteran samples. SIMS research would also benefit from additional convergent validity studies. Further research on the original five subscales would be welcome, given that the majority of existing research has focused on the Total score. In addition, an update to the manual might be helpful, incorporating the numerous studies reviewed here and in the van Impelen et al. (2014) meta-analysis into updated recommendations for cutoff scores across a variety of situations. Although several translations have been created, additional translations and further validation in diverse cultures might also be considered for future work.

The SIMS manual (Widows & Smith, 2005) describes the measure as a screening instrument, but additional research has evaluated the utility of higher cutoff scores in some situations. For example, the van Impelen et al. (2014) meta-analysis suggests using lower cutoff scores (> 14 or > 16) for most screening purposes and higher cutoff scores (> 19 or > 24) when using the SIMS as part of a test battery. Both the manual and that meta-analysis note that the SIMS is also useful when combined with other SVTs for the purpose of providing convergent data, and future studies might further explore how best to use the SIMS within a battery of numerous validity indicators (e.g., incorporating interpretation of multivariate base rates).

In addition to screening versus test battery cutoff scores, future research could further assess the idea that the SIMS measures two separate feigning constructs: symptom magnification and symptom fabrication. This classification has been explored in several studies using the SIMS (Mazza et al., 2019; Orrù et al., 2021), and additional validation of these underlying constructs could lead to improved precision when describing and interpreting feigning behavior. Similarly, the several short forms (Malcore et al., 2015; Orrù et al., 2021; Spencer et al., 2021) and alternative subscales (Rogers et al., 2014) need cross-validation before being deployed clinically. Relatedly, few studies have evaluated SIMS scores in relation to performance validity tests; similar to the RBS scale on the MMPI-2-RF/3, the AM subscale of the SIMS might be useful for screening for invalid performance, as found by Spencer et al. (2021). Finally, as additional studies are published on the SIMS, an updated meta-analysis might prove beneficial.