Introduction

Ethics research has become increasingly important and relevant in today’s world. Since the Enron and WorldCom scandals, corporate stakeholders have been more concerned about ethical issues and practices, especially in the financial world. The AACSB (Association to Advance Collegiate Schools of Business) has also recommended that business schools emphasize business ethics in their teaching, curricula, and research (Chan et al., 2010). For the past century, the vast majority of ethics research has employed self-reported surveys as an observation technique to investigate ethical intention and tolerance towards unethical acts (Ford & Richardson, 1994; O'Fallon & Butterfield, 2005; Randall & Gibson, 1990, 2013). Unfortunately, self-reported surveys, especially those that examine ethical or unethical behaviors, are very susceptible to the problem of socially desirable responding (Dalton & Ortegren, 2011; Randall & Fernandes, 1991). This is because there are no precise “right” or “wrong” answers to such questions, and individuals tend to want to appear more altruistic and society-oriented than they actually are (Krumpal, 2013). According to subjective expected utility theory, individuals are more likely to select the option that maximizes the positive outcomes, or minimizes the negative outcomes, of their response (Shanteau & Pingenot, 2009). Hence, individuals are more likely to agree with statements that match their social norms (which they assume the majority of other individuals will endorse) than with statements that reflect their true feelings, in order to present a favorable image or to avoid negative feelings (Tourangeau & Yan, 2007; Zerbe & Paulhus, 1987). Such bias is known as social desirability bias (SDB).

The presence of SDB poses a significant risk to the validity of research findings, and more so for studies in ethics than for more conventional studies in organizational behavior (Campbell & Fiske, 1959; Kemery & Dunlap, 1986; Randall & Fernandes, 1991). Such claims have been supported by Fernandes and Randall (1992), who found that SDB had stronger significant effects on ethics-related inquiry than on other types of inquiry. Bernardi et al. (2003) also found SDB to be consistently significant in responses to ethical dilemmas. For instance, individuals may perceive themselves as above average in positive characteristics such as honesty, ethicality, or reasonableness and below average in negative characteristics such as being confrontational, lazy, or unethical (Dalton & Ortegren, 2011; Goethals et al., 1991; Randall & Fernandes, 1991; 2013; Tyson, 1990, 1992). All these studies support the subjective expected utility theory that individuals tend to present themselves favorably so as to be accepted by society or to project a certain image of themselves.

SDB may lead to misleading findings that do not reflect the actual phenomenon being studied (Goethals et al., 1991; Larkin, 2000; O’Clock & Okleshen, 1993). Therefore, controlling SDB is crucial in ethics research or any study which involves sensitive questions. In a previous review, Randall and Gibson (1990) found that only one out of 96 ethics studies conducted over the preceding 29 years (1961–1989) had measured SDB. This result is not encouraging, given that individuals tend to present themselves favorably, especially when answering sensitive questions such as those asked in ethics studies (Krumpal, 2013). However, the study by Randall and Gibson (1990) was conducted 30 years ago, and we are of the opinion that an update is needed to determine whether current studies in ethics include a measure of SDB. Therefore, this study is an attempt to systematically review recently published ethics research with three specific objectives: (1) to provide an overview of the current SDB trend in ethics research, (2) to examine whether SDB has been considered in recent studies, and (3) to provide an account of the scales commonly used to measure SDB. Although similar reviews have investigated the use of SDB scales, those studies were in clinical psychology (Perinelli & Gremigni, 2016) and the nursing context (Van de Mortel, 2008), and only focused on a short period (e.g., 2010–2015). Hence, the implications and recommendations of these two studies are not generalizable to ethics research.

This article is organized into five sections. In the first two sections, we present the historical development and concept of SDB and describe how we searched for empirical research on SDB. In the third and fourth sections, we present the results and findings of our review. In the last section, we discuss the study’s implications and limitations and provide some suggestions for future research.

SDB: the concept and development

SDB was first discovered in personality inventory questionnaires where desirable traits (e.g., hardworking, honest, and generous) tended to be scored higher than undesirable traits (e.g., lazy, shy, and dirty). This discovery raised the suspicion that respondents who were rated “good” on a personality inventory questionnaire were, in fact, “faking to look good” (Bernreuter, 1933; Humm & Humm, 1944). Thereafter, Edwards (1953) introduced the concept of SDB by expanding the work of Humm and Humm (1944). SDB is generally defined as “the need for social approval and acceptance, and the belief that it can be attained by means of cultural acceptance and appropriate behavior” (Crowne & Marlowe, 1960, p. 109). In other words, individuals who tend to approve of socially desirable behaviors (e.g., voting in elections, church attendance) and disapprove of socially undesirable behaviors (e.g., abuse of alcohol or drugs) may be exhibiting SDB (Zerbe & Paulhus, 1987). Giving such socially desirable responses can lead to incorrect or inaccurate correlations between independent and dependent variables, and can also affect the mediating or moderating effects between them (Ganster et al., 1983). In short, SDB can confound research findings and even contaminate the validity of the research (Campbell & Fiske, 1959; Kemery & Dunlap, 1986).

Edwards (1957) developed a scale to measure SDB using some of the items from the Minnesota Multiphasic Personality Inventory (MMPI). However, Edwards's scale was criticized by Wiggins (1959), who claimed that the scale was not adequate for the task because it lacked empirical validity evidence and had psychopathology implications. Consequently, Crowne and Marlowe (1960) developed the 33-item Marlowe-Crowne Social Desirability Scale (MCSDS) to address the weaknesses of the Edwards scale. The MCSDS was designed to capture infrequent but socially approved behavior as well as frequent but socially disapproved behavior; people who tended to achieve high scores for socially approved behavior and low scores for socially disapproved behavior were considered to exhibit SDB (Uziel, 2010). The MCSDS gained popularity and was frequently cited in survey research in the early 1960s (Furnham, 1986). Subsequently, several simplified versions of the MCSDS were developed (see Hays et al., 1989; Reynolds, 1982; Strahan & Gerbasi, 1972). In all these scales, SDB was treated as a unidimensional construct.

Since then, SDB has evolved into a multidimensional construct where the most frequently used dimensions have been “self-deception” and “impression management” (Paulhus, 1984). These two dimensions were first introduced by Sackeim and Gur in 1978, as “self-deception” and “other-deception,” wherein “self-deception” was described as unrealistic positive self-depictions and other-deception as the conscious and deliberate distortion of self-descriptions to fool an audience (Sackeim & Gur, 1978). However, Paulhus later renamed “other-deception” to “impression management.” He defined “self-deception” to be the propensity to “unintentionally portray oneself in a favorable light, manifested as a positively biased but an honestly believed self-description” and “impression management” to be “the tendency to intentionally tailor one’s public image in order to be viewed favorably by others” (Paulhus, 2002, p. 54).

Based on these two dimensions, Paulhus (1984) developed the 40-item Balanced Inventory of Desirable Responding (BIDR) scale, in which 20 items measure self-deception and the other 20 items measure impression management on a 7-point Likert scale. However, the 40-item BIDR was criticized for being too lengthy. As a result, several simplified versions of the BIDR have been developed (see Bobbio & Manganelli, 2011; Hart et al., 2015). In this process, self-deception was further divided into the sub-dimensions of “self-deception enhancement” and “self-deception denial” (Paulhus, 2002; Paulhus & Reid, 1991; Vecchione & Alessandri, 2013), while impression management was further divided into “agentic management” and “communal management” (Blasberg et al., 2014). Self-deception enhancement involves promoting one’s positive qualities, while self-deception denial involves denying one’s negative qualities (Paulhus & Reid, 1991). On the other hand, agentic management involves exaggerating one’s social or intellectual status, while communal management involves denying socially deviant impulses and claiming divine attributes (Blasberg et al., 2014). A more recent scale is the Bi-dimensional Impression Management Index (BIMI) by Blasberg et al. (2014), which measures both agentic management and communal management.

According to Beretvas et al. (2002), there are three uses for an SDB scale. First, the SDB scale can be used to validate other scales by comparing a newly developed scale with the SDB scale. If there is no correlation between the new scale and the SDB scale, this implies that the new scale is not biased in a socially desirable manner. Second, the SDB scale can be used to verify whether SDB and the items in the other scale are distinct dimensions. Third, the SDB scale can be used to improve the quality of the data. Respondents with high SDB scores should be removed from the sample, since their responses may cause misleading results (Beretvas et al., 2002; Leite & Beretvas, 2005). Therefore, the SDB scale should be incorporated into research on sensitive topics such as ethics-related studies to control for SDB and to ensure that the results obtained from such studies are not biased.
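The first and third uses can be illustrated with a minimal sketch. It assumes each respondent has one summed SDB score and one score on a newly developed scale; the toy data, the field names, and the cutoff are invented for illustration and are not taken from Beretvas et al. (2002) or from any reviewed study.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equally long lists of scores."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two score lists of equal length (at least 2)")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant scores")
    return sxy / sqrt(sxx * syy)


def drop_high_sdb(respondents, cutoff):
    """Keep only respondents whose SDB score does not exceed the cutoff."""
    return [r for r in respondents if r["sdb"] <= cutoff]


# Invented toy data for illustration only; not from any reviewed study.
respondents = [
    {"sdb": 4, "ethics": 3.1},
    {"sdb": 9, "ethics": 4.6},
    {"sdb": 6, "ethics": 3.8},
    {"sdb": 12, "ethics": 4.9},
    {"sdb": 5, "ethics": 3.0},
]
HYPOTHETICAL_CUTOFF = 10  # assumed threshold; the source does not specify one

# Use 1: correlate the new scale with the SDB scale.
r = pearson_r([p["sdb"] for p in respondents],
              [p["ethics"] for p in respondents])

# Use 3: remove respondents with high SDB scores before further analysis.
retained = drop_high_sdb(respondents, HYPOTHETICAL_CUTOFF)

print(f"r(SDB, new scale) = {r:.2f}; "
      f"retained {len(retained)} of {len(respondents)} respondents")
```

A correlation near zero would suggest that the new scale is not contaminated by SDB. What counts as a "high" SDB score depends on the scale used, so the cutoff should be justified for the specific instrument rather than copied from this sketch.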

Data and methodology

The quality of a literature review is contingent upon the relevance and quality of the journals reviewed; i.e., these journals must be both valid and reliable (Vom Brocke et al., 2009). Based on the citation and impact analysis study conducted by Paul (2004) on business ethics journals, Business & Society (B&S), Business Ethics Quarterly (BEQ), and the Journal of Business Ethics (JBE) were found to be the leading and most widely recognized journals in the field of business ethics. Hence, articles published in these journals were included in this study. To broaden our scope of review, we also included several other well-known business ethics journals: Business Ethics, the Environment and Responsibility (formerly Business Ethics: A European Review, BEER), Business and Society Review (BSR), Ethics and Information Technology (EIT), Ethical Theory and Moral Practice (ETMP), International Journal of Value Based Management (IJVBM), Journal of Markets and Morality (JMM), and Teaching Business Ethics (TBE), as suggested by Sabrin (2002) and Chan et al. (2010). As a result, ten business ethics journals were assessed in this study (see Table 1). However, two journals, IJVBM and TBE, had to be excluded from this study because they were rolled into JBE in January 2004 (Chan et al., 2010). Hence, the review of the articles was based on eight journals. The criteria for the selection of articles from these journals were (a) empirical studies on topics related to ethics and (b) studies that incorporated a precise scale to measure SDB. The publication dates were limited to articles published between January 2000 and December 2019 (20 years) to focus on recent trends. With these 20 years of data, we were able to compare trends over time and provide an overview of SDB in ethics research.

Table 1 List of leading business ethics research journals

The keyword “social desirability” was used to search the titles, abstracts, keywords, and content of the articles. Eligibility assessments were performed manually by the authors and cross-validated between authors to minimize bias. Disagreements were resolved through consensus among all authors. Since our study focused on ethics, studies which did not contain any ethical element (e.g., job satisfaction, turnover intention, and organizational culture) were excluded. Abstracts were reviewed, qualifying texts were retrieved, and information was extracted using a data extraction form and verified among the authors. The information extracted from the articles included the journal name, year of publication, scale used to measure SDB, country in which the research was conducted, type of respondent, mode of survey, and the research findings.

Results

During the 20-year sample period, a total of 787 articles were identified in the preliminary search. However, upon examining the abstracts, keywords, and full texts, 202 articles were excluded because they did not contain any element of ethics. The remaining 585 articles were then examined to determine whether they included a measure of SDB. This led to another 505 articles being discarded for not including a measure of SDB (see the flow diagram in Fig. 1). For example, the studies by Culiberg and Bajde (2014) and De Waegeneer et al. (2016) were listed in our preliminary search but were later excluded because neither study included a scale to measure SDB. As a result, only 80 articles which fulfilled the criteria were retained for further review.

Fig. 1 Flow diagram of the literature search
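The screening arithmetic can be retraced with a short sketch. The counts are those reported in the text; the variable and function names are illustrative assumptions.

```python
# Counts reported in the Results section; names are illustrative.
identified = 787          # articles found by the keyword search
excluded_no_ethics = 202  # excluded: no element of ethics
excluded_no_sdb = 505     # excluded: no scale measuring SDB


def screening_flow(identified, excluded_no_ethics, excluded_no_sdb):
    """Return (ethics-related, retained, retained share in percent)."""
    ethics_related = identified - excluded_no_ethics
    retained = ethics_related - excluded_no_sdb
    share = 100 * retained / ethics_related
    return ethics_related, retained, share


ethics_related, retained, share = screening_flow(
    identified, excluded_no_ethics, excluded_no_sdb
)
print(f"{ethics_related} ethics-related articles, {retained} retained "
      f"({share:.2f}% of ethics-related articles)")
```

Expressing the retained articles as a share of the ethics-related articles, rather than of all 787 hits, matches the way the proportion is discussed later in the Results.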

Table 2 summarizes, by journal, the 80 reviewed articles which satisfied the designated criteria (refer to the Appendix for a full list of all 80 included articles). Among these 80 reviewed articles, the Journal of Business Ethics contributed the highest number of studies that measured SDB (90%), while Business Ethics, the Environment and Responsibility was ranked second with 7.50%, followed by Business Ethics Quarterly (1.25%) and Business & Society (1.25%). No articles in Ethics and Information Technology, Ethical Theory and Moral Practice, or Journal of Markets and Morality incorporated a measure of SDB.

Table 2 Summary of reviewed articles by journal

Table 3 presents the reviewed articles by year and by journal. The results showed that the Journal of Business Ethics published at least one article per year that incorporated a measure of SDB, while Business Ethics, the Environment and Responsibility showed a slight increase in recent years. We divided the 20-year data into two 10-year periods to make a comparison. The results showed that the number of articles that measured SDB decreased significantly between 2000 and 2010 but picked up again between 2011 and 2019. Although several authors have repeatedly singled out ethics as a phenomenon that is easily influenced by SDB (Dunn & Shome, 2009; Randall & Fernandes, 1991; 2013), only 13.68% (80 of 585) of the ethics-related articles in our review incorporated a measure of SDB into their research. Most articles merely mentioned SDB as a study limitation (see Chang & Yen, 2007; Hoogervorst et al., 2010; Mulki et al., 2009; Wagner-Tsukamoto, 2009).

Table 3 Number of articles by journal and year that have incorporated an SDB scale (N=80)

Figure 2 presents the distribution of the reviewed articles by region. Studies were classified according to the geographical location in which they were conducted. Among the 80 reviewed articles, studies conducted in the United States (U.S.) and Canada accounted for 62%, Asia-Pacific for 16%, and the United Kingdom (U.K.) and Europe for 10%. The remaining 12% of the reviewed articles were either conducted in more than one country or did not mention the location of the study. From the data presented in Fig. 2, it can be seen that by far the highest number of studies which incorporated a measure of SDB were by researchers from the U.S., as compared to their counterparts in the U.K., Europe, and Asia-Pacific. We also present the distribution of reviewed articles by year of publication and by region in Fig. 3. Based on Fig. 3, the region which most consistently published at least one study per year that incorporated an SDB scale was the U.S. and Canada, followed by the Asia-Pacific region and finally the U.K. and Europe. Comparing the two 10-year periods again, the results showed a significant increase in the number of studies that incorporated an SDB scale after 2011 in all regions as compared to pre-2011. This indicates a positive step toward recognizing SDB as a possible research bias and the need to control or measure such a bias in the studies conducted. Reviews such as those by O'Fallon and Butterfield (2005) and Krumpal (2013), which highlighted the issues associated with SDB, may have helped create awareness among ethics researchers of the importance of controlling for SDB.

Fig. 2 Distribution of reviewed articles by region

Fig. 3 Distribution of reviewed articles by year of publication and region. Articles that did not mention the location of their study or which were conducted in multiple countries were excluded from this analysis

Table 4 provides a summary of the reviewed articles according to the types of scales adopted to measure SDB. Among the scales identified, the shortened versions of the MCSDS (see Fraboni & Cooper, 1989; Hays et al., 1989; Strahan & Gerbasi, 1972) were the most frequently used SDB scales (used in 61.25% of the articles), followed by the BIDR scale (16.25%) (Paulhus, 1984; Steenkamp et al., 2010) and the full version of the MCSDS (11.25%) (Crowne & Marlowe, 1960). The remaining articles adopted either the Eysenck Personality Questionnaire–Lie scale (1.25%), Bolino and Turnley's (1999) Impression Management Scale (1.25%), the Over-Claiming Scale (5.00%), a German version of the SDB scale (1.25%), the 4-item SDB (1.25%), or the SDB scale for the working context (1.25%). What stands out in Table 4 is the scholars' scale preferences in measuring SDB. For instance, only a small percentage of ethics research scholars used the BIDR scale (Paulhus, 1984), even though the scale is based on newer theoretical and empirical knowledge and was tested using more rigorous multivariate techniques than the MCSDS. In fact, the BIDR scale is said to be better able to capture two widely recognized dimensions of SDB (i.e., self-deception and impression management), while the MCSDS does not have any specific dimensions (Paulhus, 1984). Our review also showed that scholars preferred the shortened versions of the MCSDS, even though there are numerous shortened versions of the BIDR scale (e.g., BIDR-16; BIDR-17). One possible reason for the popularity of the MCSDS is that it was the earliest scale created to measure SDB, which may have made it the most well-known. Furthermore, Lambert et al. (2016) found that the MCSDS outperformed the BIDR in identifying fakers, thus making it suitable for detecting SDB.

Table 4 Types of scales used to measure SDB

Figure 4 presents the study population of the reviewed articles. The majority of the reviewed articles sampled non-student respondents (51 articles or 63.75%) as compared to student respondents (19 articles or 23.75%), while ten (or 12.50%) of the reviewed articles used a combination of student and non-student respondents. From the data in Fig. 4, it can be seen that 54 of the 80 articles (or 67.50% of the total reviewed articles) showed SDB to be significant in at least one of the variables in their study, and approximately 33% (or 26 articles) stated that SDB did not influence their studies. Interestingly, Fig. 4 also showed that student respondents were just as inclined to provide socially desirable responses, with 15 of the 19 articles which used student respondents finding SDB to be significant. This indicates that students may also attempt to present a favorable impression in order to gain acceptance from their classmates and teachers and avoid rejection (Andrews & Meyer, 2003; Juvonen & Weiner, 1993; Pansu et al., 2008). Such evidence repudiates the earlier claims that student respondents are less likely to manifest SDB (Fastame & Penna, 2012).

Fig. 4 Significance of SDB in the reviewed articles by study population

Figure 5 summarizes the reviewed articles by mode of survey and by whether SDB was significant. Our review found that 83.75% (or 67) of the reviewed articles used the offline mode of survey (i.e., the traditional paper-and-pencil method), indicating that ethics researchers still preferred the offline mode to the online mode of survey (16.25% or 13 of the 80 studies). Of the 13 articles that used the online mode of survey, ten studies (76.92%) found SDB to be significantly correlated with at least one of the study variables. On the other hand, SDB was significantly correlated with at least one study variable in 45 of the 67 articles (or 67.16%) that used the offline mode of survey. These results indicate that SDB was present in both online and offline modes of survey.

Fig. 5 Mode of the survey in the reviewed publications
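The proportions by survey mode follow directly from the reported counts, as the sketch below retraces. The counts come from the text; the dictionary layout and the helper name `pct` are illustrative assumptions, and no significance test between the two modes is implied.

```python
# (articles finding SDB significant, total articles) per survey mode,
# as reported in the text.
mode_counts = {"online": (10, 13), "offline": (45, 67)}
total_articles = 80


def pct(part, whole):
    """Percentage of part in whole, rounded to two decimals."""
    return round(100 * part / whole, 2)


summary = {
    mode: {
        "share_of_articles": pct(total, total_articles),
        "sdb_significant": pct(significant, total),
    }
    for mode, (significant, total) in mode_counts.items()
}

for mode, row in summary.items():
    print(mode, row)
```

Because only 13 articles used the online mode, the online proportion rests on a small base and shifts by several percentage points with each additional study.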

Table 5 summarizes the significant and non-significant findings of SDB by category of dependent variable. The organizational ethics category included 36 studies that explored corporate ethical values, counterproductive work behaviors, organizational ethical climates, and the organization’s corporate social responsibility. A total of 34 studies were placed in the ethical decision-making category, which included studies on ethical reasoning and ethical behaviors, while the environmental ethics category included seven studies that explored pro-environmental behaviors, ecological sustainability, and green consumer behaviors. Lastly, the moral philosophy category included three studies that examined deontological perspectives, idealism, and relativism. As shown in Table 5, 46 studies (57.50%) had at least one of their ethics-related variables significantly correlated with SDB, while 34 studies (42.50%) found no statistically significant association between SDB and any of the ethics-related variables. From Table 5, it is interesting to note that SDB was significant in 20 of the 36 studies (55.56%) which explored organizational ethics, while 22 of the 34 studies (64.71%) that examined ethical decision making found SDB to be significant or partially significant. This supports our earlier contention and those of past researchers (e.g., Dalton & Ortegren, 2011; Randall & Fernandes, 1991) that studies which examine ethical or unethical behaviors are susceptible to the problem of socially desirable responding.

Table 5 Summary of results according to categories

Table 6 presents the demographic variables which were and were not significantly correlated with SDB in the reviewed articles. Ten of the 25 reviewed articles (40%) that measured age found it to be significantly correlated with SDB, followed by tenure/work experience (four of 23 reviewed articles, or 17.39%) and gender (three of 22 reviewed articles, or 13.64%). On the other hand, SDB was not significantly correlated with type of job, position/job level, or education. What stands out in Table 6 is the fact that only a few demographic variables were found to be significantly correlated with SDB, while the majority of the demographic variables tested in the studies were not. This finding may suggest that respondents’ demographic background is not the root cause of SDB, contrary to what was previously suggested by Fisher and Katz (2000), Kim and Kim (2015), and Larson (2019).

Table 6 Summary of demographic variables found significantly/not-significantly correlated with SDB

Discussion and conclusion

This article evaluated whether ethics researchers considered SDB when conducting their research and whether SDB was a significant variable in ethics-related studies. With regard to the first question, of the initial 585 ethics-related studies, only 80 incorporated a scale to measure SDB. A large majority (i.e., 90% or 72 articles) were published in the Journal of Business Ethics. On the question of whether SDB was a significant variable in ethics-related studies, this study found that such studies were indeed affected by SDB. Forty-six of the 80 articles found that SDB was significantly associated with at least one ethics-related research variable.

Most frequently used SDB scales

Within the 80 reviewed articles, the shortened versions of the MCSDS, especially the Strahan and Gerbasi (1972) and Reynolds (1982) versions, were the most frequently used scales for measuring SDB, followed by the BIDR scale and the full version of the MCSDS. Such results may arise because SDB is usually used as a control variable (e.g., Valentine et al., 2019; Wang et al., 2019), and since lengthy questionnaires may cause response fatigue among respondents, most researchers tend to opt for SDB scales with fewer items. However, the shortened versions of the MCSDS have been criticized repeatedly for their reliability and validity (see Ballard, 1992; Loo & Thorpe, 2000; Ventimiglia & MacDonald, 2012). For instance, Barger (2002), who analyzed nine shortened versions and the full version of the MCSDS, concluded that there was little evidence to support the model adequacy of the different shortened versions across different samples. He also noted that the apparent inadequacy of model fit found in some of the shortened versions might be a statistical artifact. Hence, based on empirical and conceptual grounds, Barger (2002) discouraged the use of both the full and shortened versions of the MCSDS as a control tool for SDB. Similarly, Leite and Beretvas (2005) questioned the usefulness of the MCSDS as a measure of SDB due to instability in the scale’s dimensionality.

In addition, SDB is widely recognized to consist of two factors, i.e., self-deception and impression management. However, most reviewed studies adopted the full or shortened versions of the MCSDS instead of the BIDR, which is claimed to be able to measure both dimensions. The MCSDS was initially developed to capture only one factor, the need for social approval (Crowne & Marlowe, 1960), while the BIDR was developed to measure self-deception and impression management (Paulhus, 1991). Although researchers who explored the items of the MCSDS (e.g., Loo & Thorpe, 2000; Paulhus, 1991; Smith, 1997) claimed that the scale was not a single factor but loaded on both self-deception and impression management, the MCSDS, unlike the BIDR, was not able to clearly separate the two factors into subscales. Hence, researchers who adopted the MCSDS might not be able to clearly examine the relationships of self-deception and impression management with the study variables separately. Therefore, we strongly suggest that researchers who wish to control for SDB consider other measurement tools such as the BIDR scale, which can address the weaknesses of the MCSDS (Hart et al., 2015; Paulhus, 1984). Creative approaches such as experimental techniques, in-basket exercises, or asking respondents to answer from other people’s perspectives could also be potential methods to reduce SDB.

Our findings also revealed that ethics researchers had used over-claiming and lie scales such as the Over-Claiming Scale and the Eysenck Personality Questionnaire–Lie scale as substitute measures of SDB. While it can be argued that over-claiming is akin to not answering honestly (i.e., lying) and that individuals who score high on over-claiming or lie scales are said to portray SDB (Paulhus, 2012), we need to caution that over-claiming or lie scales only measure an “individual’s attempt to deceive a question” and do not measure socially desirable responding (Randall & Fernandes, 1991, p. 814). While over-claiming or lying has been found to correlate with SDB (McCrae & Costa, 1983; Paulhus et al., 2003), it is now understood that this correlation is in fact affected by personality characteristics rather than perceived item desirability (Feeney & Goffin, 2015; Randall & Fernandes, 1991). For this reason, over-claiming scales are not capable of detecting SDB (Kam et al., 2015), and using either an over-claiming or a lie scale as a substitute for an SDB measure is inappropriate.

Regions of studies

The findings also showed that most of the studies that incorporated a scale to measure SDB were conducted in the U.S. and Canada (62% of the reviewed studies), compared to only 10% conducted in the U.K. and Europe. We also found SDB to be significant in at least one of the variables in studies from the U.S. and Canada (46 articles or 57.50%). Such results might be due to the fact that these regions contributed more studies than the other regions (Thomson Reuters, 2019). For instance, the Journal Citation Reports (JCR) published by Thomson Reuters showed that the U.S. contributed 451 articles to the Journal of Business Ethics, followed by the U.K. and Australia, which contributed 185 and 150 articles respectively. Furthermore, studies from the U.K. and other parts of Europe only started to incorporate SDB after 2013. Therefore, the findings of studies from these regions conducted prior to 2014 should be interpreted with care, since SDB could be present in them.

Study population

Earlier studies (e.g., Gucciardi et al., 2010; Stober et al., 2019; Wang et al., 2019) found that younger respondents/students had a higher tendency to exhibit SDB than adults/non-students. Ryff (1995) argued that adults were less likely to exhibit SDB because they were guided by their own values, which allowed them to resist social pressures better than younger respondents. However, the findings of our review suggest that individuals, regardless of age, have the same tendency to provide socially desirable responses. This outcome is also contrary to the findings of Long (2016), Morales-Vives et al. (2014), and Soubelet and Salthouse (2011), who claimed that younger individuals were less likely to exhibit SDB. Specifically, the evidence from our study shows that younger respondents/students were just as capable as adults/non-students of presenting favorable impressions or providing socially desirable responses in order to gain approval or acceptance. Younger respondents/students are also said to present a favorable impression of themselves in order to gain the approval or acceptance of their teachers and classmates (Juvonen & Weiner, 1993; Pansu et al., 2008). While student samples are widely used by researchers because of their availability, researchers should still include a measure to detect SDB when collecting data from this population.

Mode of survey

Our study also reviewed the mode of survey used in the reviewed studies since SDB appeared to be closely linked to the mode of survey used. Several lines of evidence suggest that the level of SDB for online surveys is lower than those of offline survey mode (Aquilino, 1994; Booth-Kewley et al., 1992; Ramo et al., 2011). One of the main reasons for this is that online survey provides a greater sense of privacy, security, and confidentiality than the offline surveys. Consequently, respondents are more willing to disclose sensitive information through online surveys than offline surveys (Aquilino, 1994; Booth-Kewley et al., 1992). Hence, online survey mode is often adopted by researchers as one of the methods to minimize SDB. However, the results of our review did not support this claim. Our reviews found high significant results of SDB for studies which used offline (45 out of 67 articles or 67.16%) and online survey modes (10 out of 13 articles or 76.92%). It can therefore be assumed that SDB is present regardless of the mode of survey used. Our findings reflect those of earlier studies (e.g., Carlbring et al., 2007; Gnambs & Kaspar, 2017; Pettit, 2002) which show no significant differences between survey modes and SDB. A possible explanation why SDB could also be present in online survey mode might be that some respondents know that with advances in technologies, online surveys can be monitored and traced back to them (Rosenfeld et al., 1996; Whitener & Klein, 1995). Hence, in order to avoid any possibility of negative circumstances, respondents remain to exhibiting SDB regardless of the survey mode. Therefore, we can conclude that SDB is unavoidable regardless of the type of survey mode used. Adopting the online survey mode may only encourage respondents to share sensitive information but does not eliminate SDB (Booth-Kewley et al., 2007). 
Thus, we strongly encourage ethics researchers to incorporate both an indirect questioning technique (see Fisher, 1993) and a direct measurement method such as an SDB scale (see Podsakoff et al., 2003), even when conducting their research online.
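
As a quick arithmetic check on the proportions reported above for the two survey modes, the following minimal Python sketch recomputes the share of studies with significant SDB results from the counts given in the text. The variable and function names are illustrative assumptions and are not part of the review's method. The sketch only reproduces the descriptive percentages; it does not test whether the difference between modes is statistically significant.

```python
# Counts are taken from the review text: (studies with significant SDB, studies reviewed).
# The names `counts` and `share` are illustrative, not part of the original study.
counts = {
    "offline": (45, 67),
    "online": (10, 13),
}


def share(significant, total):
    """Return the percentage of studies with significant SDB, rounded to two decimals."""
    return round(100 * significant / total, 2)


# Percentage of studies reporting significant SDB, per survey mode.
shares = {mode: share(sig, total) for mode, (sig, total) in counts.items()}

# Total number of studies across both modes.
total_reviewed = sum(total for _, total in counts.values())

for mode, pct in shares.items():
    print(f"{mode}: {pct}%")
print(f"studies reviewed: {total_reviewed}")
```

Note that the online group is much smaller than the offline group (13 versus 67 studies), so its percentage is more sensitive to the outcome of any single study.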

Categories of dependent variables

Our review also indicates that SDB was most frequently found to be a significant variable in organizational ethics studies, followed by ethical decision-making studies, and studies on environmental ethics and moral philosophies. Counterproductive work behaviors such as corruption, bribery, bullying, and cheating were the variables most frequently found to correlate significantly with SDB. This may be because individuals are more likely to under-report such behaviors, since these are viewed as “undesirable” or unethical. Another possible explanation is that most organizational ethics studies required respondents to rate or compare their own work behaviors with those of their peers or superiors. According to the holier-than-thou bias theory, individuals are more likely to perceive themselves as more ethical than others (Dalton & Ortegren, 2011). These findings corroborate the earlier findings of Randall and Fernandes (1991), Randall and Gibson (1990), and Nyaw and Ng (1994), which showed SDB to be a significant variable in ethics-related studies. Future ethics researchers who measure ethics-related variables via self-reported surveys should always be aware of the possibility of SDB.

Demographic variables

Previous studies (e.g., Fisher & Katz, 2000; Kim & Kim, 2015; Larson, 2019) have found that the level of SDB varies across respondents’ demographic variables. However, our results indicated that, except for age, the demographic variables in the studies reviewed had negligible or no significant impact on SDB. For instance, our review did not find any significant relationship between SDB and gender in any of the 19 studies reviewed. Similar results were found for tenure/work experience, where 19 of the 23 studies which examined this variable reported non-significant results. Our results reflect those of previous studies (e.g., Andrews & Meyer, 2003; Bobbio & Manganelli, 2011; Crutzen & Goritz, 2010; Haberecht et al., 2015; Kurz et al., 2016) which also reported no significant relationship between the demographic variables in their studies and SDB. Very few studies have examined the influence of demographic variables such as tenure/work experience, education level, type of job, and position/job level on SDB; this would be a fruitful area for further work.

Among all the demographic variables, age had the greatest number of studies reporting a significant relationship with SDB. We found that older respondents were more likely to provide socially desirable responses than their younger counterparts. These results are in line with Ones et al. (1996) and Thomsen et al. (2005), who found that older respondents were more likely to over-report (under-report) desirable (undesirable) behaviors because they were more likely to be associated with positive traits (e.g., agreeableness and conscientiousness) and less likely to be associated with negative traits (e.g., neuroticism).

Theoretical implications

Our findings have several important theoretical implications. First, since the initial review conducted by Randall and Gibson (1990), which identified the number of ethics studies that had incorporated a scale to measure SDB, far too little attention has been paid to whether ethics researchers consider SDB when conducting their research and the extent to which SDB is actually measured in ethics studies. Prior studies have indicated that ethics-related studies are highly susceptible to SDB (Dalton & Ortegren, 2011; Randall & Fernandes, 1991), and that steps should be taken to reduce this bias (Randall & Fernandes, 1991). Therefore, a review such as this is important for understanding the state of ethics research and for helping researchers evaluate the importance of controlling for SDB when conducting research on sensitive topics such as ethics. Our review found that of the initial 585 ethics-related studies, only 80 incorporated a scale to measure SDB. Although this figure is a significant increase from the review conducted by Randall and Gibson (1990), it is still a small percentage of the number of ethics-related studies conducted. This review has allowed us to assess the current state of ethics research and points to ways of further improving the reliability of the data presented.

Second, there has been no detailed investigation of whether SDB is present in studies which used different modes of survey or types of respondents (younger persons vs adults) in the context of ethics studies. In addition, there has been little discussion of the scales commonly used to measure SDB in ethics-related studies. While similar reviews have investigated the use of SDB scales, those studies were in clinical psychology (Perinelli & Gremigni, 2016) and nursing (Van de Mortel, 2008) contexts. Our study addresses how SDB relates to the choice of survey mode and respondent type, and the importance of incorporating a measurement scale of SDB regardless of the type of survey mode used or sample selected. By doing so, we are able to identify and acknowledge which variables can potentially be influenced by SDB.

Finally, our review also identified the type of SDB measurement most used in ethics-related studies. This issue has grown in importance given that the MCSDS has been criticized for its weakly conceptualized dimensions (Ballard, 1992; Loo & Thorpe, 2000), outdated items (Snyder et al., 2000), validity issues (Dominguez Espinosa & Van De Vijver, 2014), and doubts about whether it measures SDB accurately (Jacobson et al., 1977). Indeed, Millham and Kellogg (1980) argued that the MCSDS did not measure SDB, but that the scale was more of a measure of “avoidance” (see Jacobson et al., 1970; Jacobson et al., 1977; Millham, 1974). However, the findings of our review indicated that the shortened and full versions of the MCSDS were still the most extensively used scales for measuring SDB. Since several articles have shown validity and conceptualization problems associated with the MCSDS, ethics researchers should be cautious in using this scale to detect SDB. We also suggest various other methods to reduce SDB, such as experimental techniques, in-basket exercises, or asking respondents to answer from other people’s perspectives, which provide a basis for further empirical work on identifying which method is effective in detecting SDB.

Limitations and future research

Our study is not without limitations. First, our study was limited to ethics-related research; future research could look into other areas such as organizational and marketing studies, which have also been found to be influenced by SDB (Fisher, 2000; King & Bruner, 2000). Second, our study only reviewed articles that had incorporated a measure to control for SDB. Future research could look into the effectiveness of other methods, such as item randomized response and indirect questioning techniques, that have been cited as effective measures to control for SDB (De Jong et al., 2010; Fisher, 1993). Finally, studies need to be carried out to determine how effective the different versions of SDB scales are in detecting SDB.

In conclusion, this study provided an overview of SDB in ethics-related studies conducted during the past 20 years (from 2000 to 2019). Only 80 articles published in eight leading business ethics journals incorporated an SDB scale. Most of these studies were conducted in the U.S. and were published in the Journal of Business Ethics. SDB is a significant variable in most of the ethics-related studies regardless of study sample or mode of survey used. Therefore, we strongly encourage ethics researchers who adopt self-reported surveys to include a measurement to detect SDB in their studies. Reviewers and editors of journals should also give greater attention to whether authors include some form of method to control for SDB, since SDB can affect the veracity of a study’s results.