Introduction

Accountants play a critical role in our society and in our capitalist economy. The role of preparing and auditing financial statements places accountants and auditors in a position of public trust. Investors, creditors, customers, employees, and regulators rely on published accounting information to make informed economic decisions. Nichols and Wahlen (2004) confirm the strong link between accounting earnings and stock returns and note that this explains why “investors, managers, boards of directors, analysts, the financial press, auditors, securities regulators, and others place so much importance on accounting earnings” (p. 285).

But accountants have often failed to behave ethically and have sometimes betrayed the public trust in spectacular fashion, costing investors billions of dollars and shaking the foundations of our capital markets (Turner 2006; Sutton 2002). Turner (2006) recounted the string of scandals that began in 2001, resulting in a loss of confidence in the capital markets and catastrophic losses for investors. Wyatt (2004) observed that, in the 1990s, the accounting firms’ “focus on professionalism diminished, and the focus on revenue growth and increased profitability sharpened” and that “greed became a force to contend with in the accounting firms” (p. 49). Therefore, an analysis of the impact of ethics education on accounting students is timely and warranted.

Accountancy is a profession (Olazabal and Almer 2001). The American Assembly (2003), a conference of leaders from industry, academia, and government agencies, described auditors as “gatekeepers whose primary allegiance must be to the public” (p. 5). Duska and Duska (2003) noted the characteristics that qualify accounting as a profession, including “a standard of conduct governing the relationship of the practitioner with clients, colleagues and the public” and “an acceptance of social responsibility inherent in an occupation endowed with the public interest” (p. 66). Cheffers and Pakaluk (2005) described the professionalism of the accountant in terms of self-regulation, supplying a disinterested service, and being essentially ethical in his or her work and outlook. They noted that the “entire orientation and outlook of a professional has to be marked by a recognition of the priority, in particular, of ethical considerations over monetary benefit” (p. 92) (Footnote 1).

Moore et al. (2006) claim that there is an inherent conflict of interest for accountants who are supposed to provide an independent audit of the client who hires them and pays their audit fees. Zeff (2003) indicated that professional values within the accounting profession were deteriorating as early as 1980. Competitive pressures to generate revenue and the strong growth of non-audit services, especially management consulting, created conflicts of interest for the accounting firms. On the one hand, the firms wanted to generate profitable growth; on the other hand, they were expected to stand up to clients who took questionable accounting positions in their financial statements. Merino (2006) observed that “business scandals at the end of both the 19th and 20th centuries … increased demand for accountants and led to calls for educators to instill in students an understanding of the profession’s moral obligations” (pp. 363–364).

The Sarbanes–Oxley Act of 2002 (SOX) was passed by the United States Congress to help reduce financial reporting abuses (Sarbanes–Oxley Act of 2002). However, rules are not sufficient for the good practice of accounting (Cheffers and Pakaluk 2005). The passage of SOX, while having some positive effect on corporate behavior, does not fully prevent questionable behavior by corporate actors. For example, Miller (2010) notes that the implementation of SOX mitigated some agency costs incurred by firms in the manufacturing sector in a post-SOX environment. On the other hand, many of the accounting scandals have occurred in spite of rules which, on paper at least, should have prevented the unethical behavior (Jennings 2004).

Given the importance of ethical behavior as it relates to reporting accounting earnings, faculty at accounting programs in colleges and universities have a strong interest in discovering the most effective ways to improve the moral reasoning capabilities of their students. In part, the reputation of accounting programs depends on the ethical behavior of their graduates.

Ethical failures in the accounting profession have caused researchers to ask a number of questions. How does the moral development of accountants and accounting students compare to that of people in other professions or other majors (Armstrong 1987; Ponemon 1990; Icerman et al. 1991; Jeffrey 1993; Lampe and Finn 1994; Cohen et al. 1998; Bailey et al. 2010)? Can ethics be taught to accounting students (Ponemon 1993; Cheffers and Pakaluk 2005)? What role should ethics play in the accounting curriculum (Loeb 1988; Langenderfer and Rockness 1989; Hurtt and Thomas 2008)? What types of accounting ethics education interventions are most effective (Armstrong 1993; Welton et al. 1994; Green and Weber 1997; Wilhelm and Czyzewski 2006)? This research focused on the last of these questions.

No previous study has tested whether virtue ethics is an effective platform for teaching accounting ethics among students. In order to provide insight into the impact that choice of ethical theory has on the effectiveness of training for accountants, this research examined whether ethics education interventions that are based on Smithian virtue ethics are effective in improving accounting students’ moral reasoning.

The remainder of this paper is organized as follows. The next section provides an outline of Kohlberg’s (1958) theory of cognitive moral development (CMD), Rest’s (1986) Four Component Model (FCM) of moral functioning, and a review of research in the area of accounting ethics education. The third section presents the research methodology, including the experimental design, the hypothesis statements, the data collection process, the data analysis methods, the accounting ethics course employed, and the sample population. The fourth section presents the results of the experiment and the fifth section discusses factors that influenced the results and implications for accounting ethics education practices.

Literature Review

Moral Functioning and the Four Component Model

Most research in ethics education in the various professions has been centered on theories of moral functioning developed by Kohlberg and Rest (Rest and Narvaez 1994; Bebeau 2002; Dellaportas 2006). Rest (1986) proposed the FCM of moral functioning (see Fig. 1). The four components are moral sensitivity, moral judgment, moral motivation, and moral character (Rest et al. 1999, p. 101). Per Rest (1986), each of these components is necessary for an individual to behave morally, and it is their combination that leads to moral action. Hence, there is a complex relationship between moral judgment and moral behavior, and improving moral judgment alone will not guarantee moral behavior. However, proper moral judgment is a necessary element for moral behavior to occur.

Fig. 1 The relationship of Kohlberg’s stages of cognitive moral development (CMD) to Rest’s Four Component Model of moral functioning

Cognitive Moral Development

CMD theory focuses on component two of the FCM, i.e., moral judgment. CMD theory explores the process by which individuals progress from lower to higher levels of moral judgment. Kohlberg developed a seminal theory about the process of CMD, which is summarized by Green and Weber (1997). According to Kohlberg there are three levels of moral development: pre-conventional, conventional, and post-conventional (see Table 1). Research indicates that the majority of the adult population will not move past the conventional stages without a significant intervention (Dellaportas 2006).

Table 1 Kohlberg’s stages of cognitive moral development and the levels and schemas of the DIT and DIT-2

Rest (1979) built upon Kohlberg’s theory of CMD to develop an instrument to measure an individual’s cognitive moral capacity. This instrument is known as the Defining Issues Test (DIT). The DIT presents the subject with a series of ethical dilemmas and records the subject’s ranked preferences among alternatives, known as “items for consideration,” as factors in resolving each dilemma. The items for consideration are selected to correspond to Kohlberg’s stages of moral development. The result of the test is a “P” (post-conventional) score. Higher use of principled reasoning per Kohlberg’s scale results in higher P-scores (Dellaportas 2006). The DIT has been validated and used extensively to measure moral reasoning ability in a number of professional settings (e.g., Bebeau 2002) and has been used in a large number of studies for the measurement of CMD since its inception in 1979 (e.g., Armstrong 1987; Ponemon 1992; Lampe and Finn 1992; Earley and Kelly 2004).

Rest et al. (1999) reviewed 20 years of DIT-related ethics research and performed a factor analysis on a mega-sample of DIT results (N = 45,856). They determined that there were three factors: “personal interest” factor (loading on stages 2 and 3 of Kohlberg’s CMD), “maintaining norms” factor (loading on stage 4), and “post-conventional” factor (loading on stages 5 and 6). These three factors or schemas formed the core of what Rest et al. (1999) described as a “neo-Kohlbergian” approach and are used on the new, revised DIT-2 test. Hence, under the neo-Kohlbergian scale, as individuals mature in their CMD they move sequentially from the lower “personal interest” schema to the “maintaining norms” schema and then to the higher “post-conventional” schema. The move from a lower schema to a higher schema represents an improvement in CMD. Hence, an ethics education intervention would be considered successful if it helped students move to higher levels of the neo-Kohlbergian scale, i.e., from the personal interest schema to the maintaining norms schema and/or from the maintaining norms schema to the post-conventional schema. The Kohlbergian stages and the neo-Kohlbergian schemas are illustrated in Table 1.

Cognitive Moral Development of Accountants and Students of Accountancy

Researchers have performed a number of studies to examine the ethical reasoning development of accountants and/or accounting students. These research studies provide important insights into ethical development within the accounting profession and insights into how accounting educators should approach accounting ethics education. Several studies compare the level of moral reasoning of accounting students with the moral reasoning of students from business and other majors with mixed results (St. Pierre et al. 1990; Icerman et al. 1991; Jeffrey 1993; Snodgrass and Behling 1996; Cohen et al. 1998; Bailey et al. 2010).

Bailey et al. (2010) reviewed 23 articles or working papers that report DIT P-scores of accounting students. The average P-scores ranged from 31 to 44, with a median of 38 across the studies. Only 5 of the 23 studies report an average higher than 40. Bailey, Scott, and Thoma explained that a P-score of 40 represents the average for adults in general. They concluded that “accounting students enter the workforce without having achieved the moral reasoning level of their peers in other disciplines” (p. 12). Although the literature provides a mixed picture of the CMD of accounting students relative to other majors, the meta-analysis by Bailey et al. (2010), as well as a study by Ponemon and Glazer (1990), lends credence to the idea that improvement in accounting ethics education is desirable.

Accounting Ethics Education Interventions

A number of studies examined the impact of accounting ethics interventions on students of accountancy (Hiltebeitel and Jones 1991, 1992; Armstrong 1993; Ponemon 1993; Welton et al. 1994; Lampe 1996; Green and Weber 1997; Earley and Kelly 2004; Dellaportas 2006; Wilhelm and Czyzewski 2006; Mayhew and Murphy 2009; Welton and Guffey 2009). Studies on the effectiveness of accounting ethics education interventions have used various measuring tools and various types of interventions. The studies indicate that accounting ethics educational interventions for accountants have achieved mixed but generally modestly positive results. Most of these studies are based on whether subjects had been exposed to an ethics intervention, or whether the intervention is delivered as a component of an accounting class or as a stand-alone class. A few have studied the effectiveness of specific types of interventions (e.g., Welton et al. 1994; Green and Weber 1997; McCarthy 1997; Ponemon 1993).

Ponemon (1993) examined ethics interventions that were included in a one-semester auditing course for undergraduate and graduate students. The ethics training was based on the framework suggested by Langenderfer and Rockness (1989). Ponemon administered the DIT before and after the training to measure the impact on students’ moral reasoning. No significant improvement was found. In addition, in order to test students’ ethical behavior, the author observed the students in a “free-rider” experiment based on the game-theoretic scenario known as the prisoner’s dilemma (Footnote 2).

Earley and Kelly (2004) described an educational intervention that was adopted in an undergraduate auditing course. Students’ moral reasoning was assessed at the beginning and at the end of the course to see if the interventions improved the students’ scores. The authors found that the results of the accounting ethics education intervention were not significantly altered by the intervening event of the Enron scandal.

The literature suggests that accounting ethics education interventions may improve the moral reasoning of accounting students, although the results are mixed. Studies have tested the effects of certain types of interventions, but have not tested the impact that choice of ethical theory can have on the effectiveness of accounting ethics interventions.

Accounting Ethics Education and Ethical Theory

Accounting ethics education interventions can be based on various ethical theories and perspectives, such as natural law, virtue ethics, deontology, utilitarianism, and corporate social responsibility. Some accounting ethics education interventions are based on a “mixed” model, such as those in Texas as required by the State Board of Accountancy (Hurtt and Thomas 2008). But are all of these theories and perspectives equally appropriate and effective for teaching ethics to students of accountancy?

There is significant conflict and confusion about ethical theories. Toulmin (1981) discussed this problem as he related his experience with a national commission made up of scientists, lawyers, and theologians. The commission members, experts in their respective fields, were able to come to a consensus on detailed ethical recommendations most of the time, but were often in strong disagreement about the underlying principles. There was a large degree of agreement about ethical intuitions, but wide disagreement about underlying ethical principles. Rest et al. (1999) noted, “[p]hilosophers over the centuries have proposed many visions for a society based on moral ideals. Different ideals have been proposed (e.g., utilitarian, social contract, virtue based, feminist, casuist, religious ideals), and they have many different starting points, elements, and assumptions” (p. 41).

Several authors have called for accounting ethics education to be based on virtue ethics (Mintz 1995; Jennings 2004; Mele 2005; Cheffers and Pakaluk 2005); however, no studies have tested the effectiveness of such virtue-based ethics interventions using the various measuring tools available (e.g., DIT-2).

Virtue Ethics

Hanley (2009) describes teleological approaches to ethics that deal with the “natural and intended end of human beings” (p. 183). Virtue ethics is concerned with identifying characteristics that are in harmony with the purpose of a human being (e.g., Aristotle 1976). Cheffers and Pakaluk (2005) wrote that virtue ethics “considers what good traits of character a person should have, in order to be a good human being. It studies and classifies these traits, and then it regards an action as good or bad, depending upon whether it is the sort of action that would be done by someone having those good traits” (p. 66). Hanley noted, “[v]irtue ethicists focus on evaluating characters and the specific virtues and vices that contribute to the composition of good and bad characters” (p. 53). He contrasted virtue ethics with the other principal ethical theories, consequentialism and deontology. Pincoffs (1986) defined virtues and vices as “dispositional properties that provide grounds for preference or for avoidance of persons” (p. 79). McCloskey (2008) described virtue ethics as the oldest stream of ethical thought. The ancient Greeks focused on traits of character as the subject of ethics (Pence 1993). Plato (1955) and Aristotle (1976) developed and analyzed the cardinal virtues of courage, temperance, wisdom, and justice. In the thirteenth century, Aquinas (1274/1952) synthesized Aristotelian and Christian thought. He added the “theological” virtues of faith, hope, and love to the cardinal virtues. McCloskey described Adam Smith as “the last of the former virtue ethicists,” noting that virtue ethics fell out of favor in the late eighteenth century with the emergence of Kantianism and utilitarianism. However, virtue ethics re-emerged in academic circles in the mid-twentieth century with the work of Anscombe (1958) and, later, MacIntyre (2007) (see also Pence 1993; McCloskey 2008).

Virtue ethics has been suggested as an appropriate ethical framework for education and practice in many professional fields, including business (Mintz 1996), engineering (Harris Jr 2008), social work (Pullen-Sansfacon 2010), youth work (Bessant 2009), sports coaching (Hardman et al. 2010), teaching (Higgins 2010), nursing (Vanlaere and Gastmans 2007), psychiatry (Radden and Sadler 2008), and the military (Wortel and Bosch 2011).

Virtue Ethics and the Accounting Profession

Mintz (1995), Jennings (2004), Mele (2005), and Cheffers and Pakaluk (2005) recommended virtue ethics as the most appropriate ethical theory for accounting and as the basis for accounting ethics education. Mintz (1995) described virtue theory based on the work of Pincoffs (1986) and how it relates to the field of accounting. Virtues are dispositional properties that provide grounds for preference for (or avoidance of) persons (Pincoffs 1986). Mintz indicated that accounting professionals should have both technical and moral expertise. Technical rules can never be a complete guide in all the situations that will face an accountant. Ethical conflict occurs when an accountant’s duties toward one group are inconsistent with his or her duties to another group or with his or her own self-interest.

Cheffers and Pakaluk (2005) discussed several ethical frameworks: consequentialism, Kantianism, and virtue ethics, recommending virtue ethics as the most suitable ethical framework for accounting. They described a virtue as a characteristic that enables a person to carry out a task well. The virtues of an accountant are those traits that enable the practitioner to carry out the unique tasks of accountancy with excellence; such virtues include independence, due diligence, sense of public interest, objectivity, and integrity. Cheffers and Pakaluk indicated that these virtues provide the idealizations or principles that should guide the ethical reasoning and conduct of the accountant. They noted that the professional standards of accounting themselves appeal to these “virtues” as standards for correct action.

Although researchers have recommended virtue ethics as the appropriate ethical framework for pedagogy and practice in accountancy, the impact of accounting ethics education interventions that are based on a virtue ethics framework has not been tested. Other researchers have suggested that the moral philosophy of Adam Smith may be particularly germane to modern commercial society (Halteman 2003; Hanley 2009) and to the accounting profession (Keller 2007).

The Virtue Ethics of Adam Smith

Adam Smith (1723–1790) provided an important ethical framework in “The Theory of Moral Sentiments” (TMS; Smith 1790/1976). Smith is well known to most students of business as the “father of the capitalistic free-market economy” (Duska and Duska 2003, p. 67). His most famous work is “An Inquiry into the Nature and Causes of the Wealth of Nations” (WN) (Smith 1776/1952), which expounded upon the workings of free-market economies and is the first great treatise on the capitalist economic system. However, TMS was Smith’s first important work and the one that first made him famous. Today TMS is largely neglected and little understood (Werhune 2000; Stovall et al. 2004). McCloskey (2008) noted that Smith’s moral philosophy is based on virtue ethics, a strain of ethical thought that mysteriously disappeared from academic circles around the end of the eighteenth century and reappeared in the middle of the twentieth century. Hanley (2009) described Smith’s moral philosophy as virtue ethics in which Smith conceived of the “discrete virtues as targeted responses to the various discrete challenges posed by commercial corruption” (p. 93). Smith’s ethics framework, based on reason, natural law, and virtue ethics, may prove valuable for training accounting students (Table 2).

Table 2 Comparison of DIT-2 schemas to key virtues from The Theory of Moral Sentiments (TMS)

Research Method

This research explored the question of whether an accounting ethics education intervention that is based on Smithian virtue ethics will significantly improve the moral reasoning of undergraduate accounting students. In order to test the research hypotheses, a repeated-measures experimental design was employed. The quasi-experimental design involved a pre-test, a treatment, and a post-test, and the use of an experimental group and a control group. The experimental group consisted of accounting students who attended a stand-alone three credit-hour course in accounting ethics at a private university in the US Midwest. The content of the course was based on the moral philosophy of Adam Smith as presented in TMS (1790/1976). The course emphasized the Smithian concepts of sympathy, the impartial spectator, and also the key virtues of prudence, justice, benevolence, and self-command. The course also involved analysis and discussion of selected accounting cases. These cases provided students the opportunity to apply Smithian ethics to ethical dilemmas in an accounting context. The control group consisted of accounting students at the same university who did not attend the accounting ethics course.

Researchers have used a variety of research designs to assess the effectiveness of ethics education interventions. Rest (1986) examined a set of 55 education intervention studies involving use of the DIT, noting that nine studies used a classical pre-test–post-test experimental design with random assignment of subjects to the experimental and control groups. Twenty-eight studies used a quasi-experimental pre-test–post-test design without random assignment of subjects to the experimental and control groups. Eighteen studies either had a post-test only or no control group. Campbell and Stanley (1963) recommended the classical experimental design as ideal, but observed that random assignment of subjects is often impossible in a live academic setting. They indicated that a quasi-experimental pre-test–post-test design with an experimental group and a control group addresses the primary threats to internal validity: history, maturation, testing, and instrumentation.

This research used the DIT-2 instrument developed by Rest et al. (1999). This instrument was designed to measure an individual’s CMD using ethical scenarios of a general nature. The instrument produced, among other data, a “personal interest” score, a “maintaining norms” score, and a “post-conventional” score. CMD was measured using the personal interest score and the post-conventional score (or “P-score”). A reduction in the personal interest score represented improvement in CMD (i.e., the subject relies to a lesser degree on a less developed mode of moral reasoning), and an increase in the post-conventional score represented improvement in CMD (i.e., the subject increases his/her reliance on the higher levels of moral reasoning).

Hence the two research questions were (1) whether an accounting ethics education intervention based on Smithian virtue ethics significantly decreases accounting students’ personal interest scores on the DIT-2, and (2) whether an accounting ethics education intervention based on Smithian virtue ethics significantly increases accounting students’ post-conventional scores on the DIT-2. The null hypothesis statements to test each of these questions are as follows:

Ho 1

There are no significant decreases in reliance on personal interest factors in moral decision-making, as measured by the personal interest subscale of the DIT-2, between students who completed the accounting ethics intervention based on Smithian virtue ethics and those who did not complete the intervention.

Ho 2

There are no significant increases in reliance on post-conventional factors in moral decision-making, as measured by the post-conventional subscale of the DIT-2, between students who completed the accounting ethics intervention based on Smithian virtue ethics and those who did not complete the intervention.

Based on the results of the tests of hypotheses Ho1 and Ho2, the research was extended to include the DIT-2 N2 index, a summary index of an individual’s CMD that measures the extent to which the subject prioritizes post-conventional factors and the extent to which the subject rates personal interest factors lower than post-conventional factors. The additional hypothesis statement is as follows:

Ho 3

There are no significant increases in moral decision-making, as measured by the N2 subscale of the DIT-2, between students who completed the accounting ethics intervention based on Smithian virtue ethics and those who did not complete the intervention.

The statistical significance of the research was calculated using an analysis of gain scores (post-test minus pre-test) and an analysis of covariance (ANCOVA) with pre-test scores as the covariate, as suggested by Campbell and Stanley (1963) and Wright (2006). Finally, the effectiveness of the intervention was also measured by calculating the effect size.
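As an illustration of the gain-score and effect-size calculations mentioned above, the following minimal Python sketch computes gain scores for two groups and a standardized effect size. The scores are hypothetical, the function names are illustrative, and the use of Cohen's d with a pooled standard deviation is an assumption; the study does not state which effect-size measure was calculated.

```python
from math import sqrt
from statistics import mean, variance

def gain_scores(pre, post):
    """Return post-test minus pre-test for each subject."""
    return [b - a for a, b in zip(pre, post)]

def cohens_d(x, y):
    """Cohen's d for two independent samples, using the pooled variance."""
    nx, ny = len(x), len(y)
    pooled_var = ((nx - 1) * variance(x) + (ny - 1) * variance(y)) / (nx + ny - 2)
    return (mean(x) - mean(y)) / sqrt(pooled_var)

# Hypothetical P-scores (not data from the study)
exp_pre, exp_post = [26.0, 30.0, 22.0, 34.0, 28.0], [32.0, 34.0, 24.0, 40.0, 30.0]
ctl_pre, ctl_post = [30.0, 28.0, 36.0, 32.0, 24.0], [31.0, 27.0, 38.0, 32.0, 25.0]

exp_gain = gain_scores(exp_pre, exp_post)  # gains for the experimental group
ctl_gain = gain_scores(ctl_pre, ctl_post)  # gains for the control group
effect_size = cohens_d(exp_gain, ctl_gain)
print(mean(exp_gain), mean(ctl_gain), round(effect_size, 2))
```

A positive effect size here would mean that the experimental group's average gain exceeded the control group's; for the personal interest score, where a decrease represents improvement, the sign would be read in the opposite direction.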

The Course

The experimental group received a treatment in the form of an accounting ethics education intervention. The 5-week accounting ethics course (Footnote 3) was based on the moral philosophy of Adam Smith as presented in his work “The Theory of Moral Sentiments” (Smith 1790/1976) and utilized accounting case studies.

The accounting ethics course was delivered in an online format via Blackboard® software (Footnote 4). Online delivery helped to ensure the consistency of the material presented to the course participants (Schonfeld 2005).

The students were assigned readings from the original text of the sixth edition of Smith’s TMS (1790/1976). The readings from TMS were organized into the following topics: (1) why be ethical? (2) Sympathy. (3) The impartial spectator. (4) Prudence. (5) Justice. (6) Benevolence. (7) Self-command. The students were also required to read key portions of the AICPA Code of Professional Conduct. Strong readers and visual learners were the most likely to benefit from readings in an online course (Schonfeld 2005).

Recorded lectures on the readings were included in the accounting ethics course. This element was included both to ensure that Smith’s ideas were presented in a manner accessible to twenty-first century students and to reinforce the ideas the students encountered in the original text. The language used in TMS is typical of writings of the British Enlightenment in the eighteenth century and was probably different from the writing style most of the students were used to reading. The recorded lectures were also included to ensure that non-visual learners were not disadvantaged by the online delivery of the course (Schonfeld 2005).

In accordance with research on effective ethics education interventions generally, and accounting ethics education interventions specifically, the course included analysis and discussion of accounting-related cases (Rest 1986; Mintz 1995; Huff and Frey 2005; Dellaportas 2006). Interaction among the students and between the students and the instructor was achieved by requiring student participation in online threaded discussions. For each threaded discussion topic, students were required to contribute a minimum of one high-quality direct response to the instructor’s question and two high-quality responses to the comments of other students.

In accordance with research on effective ethics education interventions, the course required students to write reflection papers. These papers provided the opportunity for students to think deeply about ethical principles and how they apply to the students’ professional and personal lives (Rest 1986; Mintz 2006; Van Hise and Massey 2010; Pullen-Sansfacon 2010).

Results

Data Collection

The experiment was conducted at a mid-sized private university in the US Midwest in a live academic setting. The subjects were non-traditional undergraduate accounting students (Footnote 5). The university uses a cohort system for the adult degree completion program. Each cohort consisted of between 9 and 13 students. Students in the cohorts that were used in the experimental group and in the control group were all juniors and seniors (Table 3).

Table 3 Subjects and sample sizes

Descriptive Statistics

The demographics of the experimental group and control group are presented in Table 4. The experimental and control groups are similar in terms of average age, gender (predominantly female), race (predominantly Caucasian), and political orientation.

Table 4 Descriptive statistics—experimental and control groups—demographics

Table 5 presents descriptive statistics for the pre-test scores for both the experimental and control groups. Pre-test scores for each of the indices for both instruments appear to be similar for the experimental and control groups with the possible exception of the DIT-2 maintaining norms score (average of 45.71 vs. 41.52), the DIT-2 post-conventional score (average of 26.94 vs. 31.27), and the DIT-2 N2 score (average of 26.94 vs. 31.84).

Table 5 Descriptive statistics—pre-test scores on key indices

Dimitrov and Rumrill (2003) and Wright (2006) recommended that, for pre-test–treatment–post-test quasi-experimental research designs in which subjects are not randomly assigned to the experimental and control groups, it can be useful to report both ANCOVA (with pre-test score as the covariate) and analysis of gain scores. The ANCOVA provides insight into whether the average gain, after partialling out the pre-test score, is significantly different for the experimental and control groups. Analysis of gain scores addresses the question of whether the average gain score is significantly different between the experimental and control groups (Wright 2006). Accordingly, this research employed two types of tests: tests that used post-test scores as the dependent variable and tests that used gain scores (post-test minus pre-test) as the dependent variable. ANCOVA was used to compare the post-test scores of the experimental group to those of the control group. Paired samples t tests examined the gain scores for the experimental group and for the control group. Then t tests were performed to compare the average gain scores of the experimental group to those of the control group.

Analysis of Covariance

Mertler and Vannatta (2010) explained “analysis of covariance is an extension of analysis of variance where main effects and interactions are assessed after the effects of another concomitant variable have been removed” (pp. 93–94). In the analysis of DIT-2 results, the dependent variable was the post-test score, the independent (between-subjects) variable was group (experimental vs. control), and the covariate was the pre-test score. Levene’s test assesses whether the samples come from populations with different variances (Mertler and Vannatta 2010); a non-significant result is consistent with equal variances. For the covariance test of each index (personal interest, maintaining norms, post-conventional, and N2 scores), the assumption of equality of error variances was tested using Levene’s test, and the assumption of homogeneity of regression slopes was tested by calculating an F-statistic for the interaction between the independent variable (group) and the covariate (pre-test score). None of the results were significant, indicating that the assumptions were not violated.
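The group test in a one-covariate ANCOVA can be illustrated with a minimal sketch. It assumes two groups and a single covariate, and it compares the residual variation of a model with only the pre-test score against a model with the pre-test score plus group, using the pooled within-group regression slope. The function name `ancova_group_f` and all scores are hypothetical; they are not the study's data or software. Converting the resulting F to a p value requires the F distribution with (1, N − 3) degrees of freedom, which the sketch leaves out.

```python
from statistics import fmean


def _sums(x, y):
    """Return (Sxx, Sxy, Syy): sums of squares and cross-products about the means."""
    mx, my = fmean(x), fmean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    syy = sum((b - my) ** 2 for b in y)
    return sxx, sxy, syy


def ancova_group_f(pre_a, post_a, pre_b, post_b):
    """One-covariate, two-group ANCOVA.

    Returns (F for group, error df, covariate-adjusted mean difference a - b).
    Assumes homogeneous regression slopes across the two groups.
    """
    # Full model (pre-test + group): residual SS around the pooled within-group slope.
    wa, wb = _sums(pre_a, post_a), _sums(pre_b, post_b)
    sxx_w, sxy_w, syy_w = (wa[i] + wb[i] for i in range(3))
    slope = sxy_w / sxx_w
    sse_full = syy_w - sxy_w ** 2 / sxx_w

    # Reduced model (pre-test only): residual SS ignoring group membership.
    sxx_t, sxy_t, syy_t = _sums(list(pre_a) + list(pre_b), list(post_a) + list(post_b))
    sse_reduced = syy_t - sxy_t ** 2 / sxx_t

    n_total = len(pre_a) + len(pre_b)
    df_error = n_total - 3  # intercept, slope, and one group effect are estimated
    f_group = (sse_reduced - sse_full) / (sse_full / df_error)

    adjusted_diff = (fmean(post_a) - fmean(post_b)) - slope * (fmean(pre_a) - fmean(pre_b))
    return f_group, df_error, adjusted_diff


# Hypothetical pre-test and post-test scores for illustration only.
pre_exp = [10.0, 20.0, 30.0, 40.0, 50.0]
post_exp = [17.0, 24.0, 38.0, 46.0, 53.0]
pre_ctl = [10.0, 20.0, 30.0, 40.0, 50.0]
post_ctl = [12.0, 19.0, 33.0, 41.0, 48.0]

f_group, df_error, adjusted_diff = ancova_group_f(pre_exp, post_exp, pre_ctl, post_ctl)
print(f"F(1, {df_error}) = {f_group:.3f}, adjusted difference = {adjusted_diff:.2f}")
```

The adjusted difference is the gap between group post-test means after removing the part predicted by any gap in pre-test means, which is the quantity the Group F-statistic evaluates.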

The key F-statistic for assessing the effect of the treatment is the Group statistic and its associated p value. The ANCOVA tests indicate that the largest differences between the experimental group and the control group were in the personal interest score (F = 2.074, p = .154) and in the N2 score (F = 2.166, p = .145). The differences between the groups for the maintaining norms score (F = .276, p = .601) and the post-conventional score (F = .673, p = .415) were clearly non-significant. None of the results achieved statistical significance at an α level of .05. Based on the ANCOVA with pre-test scores as a covariate, null hypotheses Ho1, Ho2, and Ho3 are not rejected.

Paired Samples t Tests

Aron et al. (2008) recommended the paired samples t test as a hypothesis testing procedure that can be used when each subject in the sample is measured twice. SPSS was used to calculate the gain scores (post-test minus pre-test) for each index for the experimental group and for the control group. The paired samples t test uses the t-distribution to determine whether the mean change score for the sample is significantly different from zero, i.e., whether there was a statistically significant change.
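As a rough illustration of the paired samples procedure described above, the following sketch computes gain scores and the corresponding t statistic with Python's standard library. The function name `paired_t` and the scores are hypothetical and are not the study's data. Obtaining a p value additionally requires the t-distribution with n − 1 degrees of freedom; a statistics package routine such as `scipy.stats.ttest_rel` reports both values.

```python
from math import sqrt
from statistics import fmean, stdev


def paired_t(pre, post):
    """Return (mean gain, t statistic, df) for a paired samples t test.

    Gain is defined as post-test minus pre-test, as in the text.
    """
    gains = [after - before for before, after in zip(pre, post)]
    n = len(gains)
    mean_gain = fmean(gains)
    standard_error = stdev(gains) / sqrt(n)  # sample SD of gains over sqrt(n)
    return mean_gain, mean_gain / standard_error, n - 1


# Hypothetical index scores for six subjects, for illustration only.
pre_scores = [30.0, 42.0, 28.0, 35.0, 40.0, 33.0]
post_scores = [34.0, 45.0, 27.0, 41.0, 44.0, 38.0]

mean_gain, t_stat, df = paired_t(pre_scores, post_scores)
print(f"mean gain = {mean_gain:.2f}, t({df}) = {t_stat:.3f}")
```

A negative t would indicate a reduction from pre-test to post-test, which is the desired direction for the personal interest index.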

As shown in Table 6, the paired samples t tests showed several statistically significant results at an α level of .05. In the experimental group, there was a significant reduction in personal interest score (t = −3.77, p < .001). There was also a significant increase in maintaining norms score in the experimental group (t = 2.657, p = .011). Finally, there was a significant increase in N2 score in the experimental group (t = 3.876, p < .001). Most of the changes in the control group were not statistically significant. However, interestingly, the change in the maintaining norms score for the control group was statistically significant (t = 2.691, p = .011). It is possible that this change represents a reversion to the mean for this population. It should be noted that the post-test average maintaining norms score for the control group of 46.85 is similar to the pre-test average maintaining norms score for the experimental group of 45.71. Based on paired samples t tests, null hypothesis Ho2 is not rejected. However, based on these tests, null hypotheses Ho1 and Ho3 are rejected.

Table 6 Paired samples t test—personal interest, maintaining norms, post-conventional, and N2 scores—experimental group and control group

Independent Samples t Tests on Gain Scores

Independent samples t tests use t scores from the t-distribution to determine if there is a statistically significant difference between the means of two populations (Aron et al. 2008). For purposes of assessing the DIT-2 results in this experiment, the sample means were the average change score for the experimental group and for the control group (Table 7).

Table 7 Independent samples t test on gain scores—DIT-2 personal interest, maintaining norms, post-conventional, and N2 scores—experimental versus control group

The change (reduction) in the personal interest score was greater for the experimental group than for the control group. The difference was not statistically significant at an α level of .05 using a two-tailed test (t = 1.772, p = .080).Footnote 6 The maintaining norms scores increased by similar amounts for both the experimental and control groups (5.06 vs. 5.33). However, it should be noted that the post-test maintaining norms score for the experimental group was higher than the post-test maintaining norms score for the control group. The post-test maintaining norms score for the control group was similar to the pre-test maintaining norms score for the experimental group. The change in the control group’s maintaining norms score between the pre-test and the post-test could be attributed to a reversion to the mean for the population of non-traditional accounting students at the university. The post-conventional score improved slightly for the experimental group and declined for the control group. The difference in the change scores was not statistically significant (t = 1.479, p = .143). The N2 score increased by 5.98 for the experimental group and changed very little for the control group (+.50). The difference in the change scores between the experimental group and the control group was statistically significant at an α level of .05 (t = 2.06, p = .043). Based on the t tests of gain scores, null hypothesis Ho2 was not rejected. However, based on a one-tailed t test of gain scores, Ho1 was rejected. Null hypothesis Ho3 was also rejected based on one-tailed and two-tailed t tests of gain scores.
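The comparison of gain scores between groups can be sketched as follows. This is a minimal illustration that assumes the equal-variance (pooled) form of the independent samples t test; the article does not state which form was used. The function name `independent_t` and the gain scores are hypothetical. A p value would again come from the t-distribution, for example through `scipy.stats.ttest_ind`.

```python
from math import sqrt
from statistics import fmean, variance


def independent_t(gains_a, gains_b):
    """Return (t statistic, df) for a pooled-variance independent samples t test."""
    n_a, n_b = len(gains_a), len(gains_b)
    df = n_a + n_b - 2
    # Pooled variance weights each group's sample variance by its degrees of freedom.
    pooled_var = ((n_a - 1) * variance(gains_a) + (n_b - 1) * variance(gains_b)) / df
    standard_error = sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return (fmean(gains_a) - fmean(gains_b)) / standard_error, df


# Hypothetical gain scores (post-test minus pre-test), for illustration only.
gains_experimental = [4.0, 3.0, -1.0, 6.0, 4.0, 5.0]
gains_control = [1.0, 0.0, 2.0, -1.0, 1.0, 0.0]

t_stat, df = independent_t(gains_experimental, gains_control)
print(f"t({df}) = {t_stat:.3f}")
```

When the direction of the effect is predicted in advance and the observed difference is in that direction, the one-tailed p value is half the two-tailed value, which is why a result can be significant with a one-tailed test at α = .05 but not with a two-tailed test.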

Summary of DIT-2 Results

As noted above, Dimitrov and Rumrill (2003) and Wright (2006) recommended that, with a quasi-experimental research design such as the one used in this research, it can be useful to report both ANCOVA (with pre-test score as the covariate) and analysis of gain scores. The two types of analysis address two related but different questions. The ANCOVA examines whether the average gain, after partialling out the pre-test score, is significantly different for the experimental and control groups. The analysis of gain scores examines whether the average gain score is different for the two groups.

The null hypothesis Ho1 for the DIT-2 personal interest score is partially rejected. The hypothesis is not rejected by the results of the ANCOVA, but is rejected by the results of the paired samples t test for the experimental group and by the results of the one-tailed independent samples t test on gain scores. For the DIT-2 personal interest index, there was a statistically significant gain score in the experimental group (post-test vs. pre-test), and the difference in the average gain score between the experimental and control groups was statistically significant with a one-tailed test, although not with a two-tailed test (t = 1.772, p = .080).

The null hypothesis Ho2 for the DIT-2 post-conventional score is not rejected.

The null hypothesis Ho3 for the DIT-2 N2 score is partially rejected. The null hypothesis is not rejected by the results of the ANCOVA, but is rejected by the results of the paired samples t test for the experimental group and by the results of the independent samples t test on gain scores. For the DIT-2 N2 index, there was a statistically significant gain score in the experimental group (post-test vs. pre-test), and the difference in the average gain score between the experimental and control groups was also statistically significant.

The intervention did not have a statistically significant impact on the subjects’ post-conventional moral reasoning, but achieved mixed results on the subjects’ DIT-2 personal interest scores and on subjects’ DIT-2 N2 scores.

Calculation of Effect Sizes

Aron et al. (2008) indicate that it is often desirable to know not only whether a result is statistically significant, but also how large the effect is. Researchers are increasingly including reports of effect size in their work as an indication of the practical significance of their findings (see also, Urdan 2010). The effect size is a standardized measure of the effect of an experimental treatment, such as the standardized difference between two means. Cohen’s d is an effect size measure that is used in connection with comparisons of population means, such as a t test (Cohen 1988). Cohen (1988) established the following effect size conventions for mean differences: an effect size of .20 is small, an effect size of .50 is medium, and an effect size of .80 is large.Footnote 7 In this research, Cohen’s d was calculated for those tests where statistical significance had been established by hypothesis testing. Accordingly, Cohen’s d was calculated for null hypothesis Ho1, relating to the reduction of DIT-2 personal interest scores, and for null hypothesis Ho3, relating to the increase of DIT-2 N2 scores.

The effect size calculations for the reduction of DIT-2 personal interest scores found small–medium effect sizes. Cohen’s d for the paired samples t test on the experimental group gain score was −0.57 (medium effect size) and Cohen’s d for the independent samples t test comparing the gain scores of the experimental group to the control group was −0.40 (small–medium effect size).

The effect size calculations for the increase in DIT-2 N2 scores found medium effect sizes. Cohen’s d for the paired samples t test on the experimental group gain score was 0.55 (medium effect size) and Cohen’s d for the independent samples t test comparing the gain scores of the experimental group to the control group was 0.46 (medium effect size).
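The two kinds of effect size reported above can be illustrated with a short sketch. The article does not state which variant of Cohen's d was computed, so the formulas here are assumptions: for the paired test, the mean gain divided by the standard deviation of the gains; for the independent test, the difference in mean gains divided by the pooled standard deviation. The function names and gain scores are hypothetical.

```python
from math import sqrt
from statistics import fmean, stdev, variance


def cohens_d_paired(gains):
    """Cohen's d for a paired design: mean gain over the SD of the gains (assumed variant)."""
    return fmean(gains) / stdev(gains)


def cohens_d_independent(gains_a, gains_b):
    """Cohen's d for two independent groups: mean difference over the pooled SD."""
    n_a, n_b = len(gains_a), len(gains_b)
    pooled_var = ((n_a - 1) * variance(gains_a) + (n_b - 1) * variance(gains_b)) / (n_a + n_b - 2)
    return (fmean(gains_a) - fmean(gains_b)) / sqrt(pooled_var)


# Hypothetical gain scores (post-test minus pre-test), for illustration only.
gains_experimental = [4.0, 3.0, -1.0, 6.0, 4.0, 5.0]
gains_control = [1.0, 0.0, 2.0, -1.0, 1.0, 0.0]

print(f"paired d = {cohens_d_paired(gains_experimental):.2f}")
print(f"independent d = {cohens_d_independent(gains_experimental, gains_control):.2f}")
```

The sign of d follows the sign of the mean gain, so a reduction in an index (as with the personal interest score) yields a negative d whose magnitude is read against Cohen's conventions.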

Discussion

This research provides some evidence that the accounting ethics education intervention was associated with a beneficial impact on subjects’ personal interest scores and on their N2 scores, although there was little evidence of a positive impact on subjects’ post-conventional moral reasoning. This research did provide a number of potentially valuable insights. First, the research highlighted some important characteristics of the CMD of a significant constituency of a number of private colleges and universities, i.e., non-traditional accounting students. Second, this research has identified some potential benefits of Smith’s (1790/1976) concepts of ethics, in particular the sympathy principle and the impartial spectator, for accounting ethics education. Third, this research reinforces the importance for accounting ethics research of examining more than just the post-conventional score when using the DIT-2. Fourth, this research provides insights into the differential impact of accounting ethics education interventions on general moral reasoning versus accounting context-specific moral reasoning.

Characteristics of the Sample Studied

In examining the moral reasoning of the population using the neo-Kohlbergian schemas, this experiment suggests that it may be helpful to understand the specific characteristics of the target population before delivering an accounting ethics education intervention, so that the intervention can be tailored to the audience. As shown in Figs. 2 and 3, one of the most striking aspects of the scores of the subjects in the experimental and control groups is the strong predilection of these groups for maintaining norms (stage 4) thinking.

Fig. 2 DIT-2 norm versus DIT-2 pre-test and post-test means (experimental group)

Fig. 3 DIT-2 norm versus DIT-2 pre-test and post-test means (control group)

Subjects in this study had a very high predilection for the maintaining norms level of moral reasoning. The experimental group’s average maintaining norms score was 45.71 on the DIT-2 pre-test and 50.78 on the post-test. The control group had similarly high maintaining norms scores, with averages of 41.52 for the DIT-2 pre-test and 46.85 for the DIT-2 post-test. Bebeau and Thoma (2003) provide DIT-2 norms for individuals who are juniors in college. The average maintaining norms score was 32.93 with a sample size of 1333 and standard deviation of 13.59 (p. 35). It is apparent that the maintaining norms orientation of the sample studied in this research is much higher than for juniors in college generally. There are several possible explanations for this. First, the subjects in this study are majoring in accounting. Second, the subjects in this study are working adults as opposed to traditional students. Third, subjects in this study may have a strong religious orthodoxy due to their personal and educational backgrounds. One or several of these factors in combination may help explain the strong maintaining norms orientation of the population examined in this study.
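To make the size of this gap concrete, the following sketch expresses each group mean reported above as a distance from the Bebeau and Thoma (2003) norm for college juniors, in units of the norm's standard deviation. All numbers are taken from the text; the function name `sd_units_above_norm` is hypothetical. This is a descriptive comparison only, not a significance test, and it is not a calculation reported in the study.

```python
# Norm for college juniors reported by Bebeau and Thoma (2003), as cited in the text.
NORM_MEAN = 32.93
NORM_SD = 13.59


def sd_units_above_norm(sample_mean, norm_mean=NORM_MEAN, norm_sd=NORM_SD):
    """Express a sample mean as a distance from the norm mean in norm SD units."""
    return (sample_mean - norm_mean) / norm_sd


# Average maintaining norms scores reported in the text.
group_means = {
    "experimental pre-test": 45.71,
    "experimental post-test": 50.78,
    "control pre-test": 41.52,
    "control post-test": 46.85,
}

for label, mean in group_means.items():
    print(f"{label}: {sd_units_above_norm(mean):.2f} SD above the norm")
```

On these figures every group mean lies more than half a norm standard deviation above the college-junior average, and the experimental group's post-test mean lies more than one standard deviation above it.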

Accountants and Maintaining Norms

The subjects in this research were all accounting majors at a mid-sized private university. A question raised by this research is whether accounting students are predisposed to adopt maintaining norms moral reasoning and resistant to adopt post-conventional moral reasoning. The accounting profession may attract maintaining norms-oriented individuals and training in accounting may foster black-and-white, rules-based thinking that is conducive to maintaining norms moral reasoning. Louwers et al. (1997) suggest, “the public may expect lower levels of moral reasoning, specifically stage four, for accountants as members of a rule-based profession” (p. 210). The American Assembly (2003) in surveying the state of the accounting profession observed that independent auditors had reduced themselves to the role of rule-checkers. The literature presents evidence that the accounting discipline itself fosters rules-based thinking that may inhibit the development of post-conventional moral reasoning. The intervention featured in the present research did contain material and discussion of ethical principles versus rules. However, this topic was not emphasized as highly as the topics of sympathy and the impartial spectator, which are more central to Smith’s (1790/1976) moral theory. In the present research, the experimental treatment failed to achieve a statistically significant improvement in subjects’ post-conventional moral reasoning as measured by the DIT-2. This highlights the special challenges of accounting ethics education.

One of the key outcomes of this research is evidence that accounting students, studying a rules-oriented discipline, may gravitate to maintaining norms moral reasoning and that this orientation may inhibit them from progressing to post-conventional (principles-based) moral reasoning. In order to overcome the students’ bias toward maintaining norms moral reasoning, educators may need to use targeted approaches that place stronger emphasis on principles-based moral reasoning (Armstrong 1987; Lampe and Finn 1992; Dellaportas 2006; Thomas 2012).

Adult Learners and Maintaining Norms

The subjects in this research were adult learners at a mid-sized private university with an average age of 36.5 years for the experimental group and 34.2 years for the control group. This research raises the question of whether mature, experienced working adults are receptive to developing post-conventional moral reasoning. Loomis (2009) found that the traditional students at three Christian institutions of higher education had significantly higher DIT-2 P-scores than non-traditional students. Crain (1985) notes that adults in the US generally tend to reach stage 3 (personal interest) or 4 (maintaining norms). Rest and Narvaez (1994) referred to a 10-year longitudinal study of DIT results by Rest (1986): “The general trend is that as long as subjects continue in formal education, their DIT scores tend to gain; when subjects stop their formal education, then their DIT scores plateau” (p. 15). This raises interesting questions about adult learners who interrupt their formal education before completing an undergraduate degree and then return to a degree completion program later in their life. If their CMD plateaued when they left formal education, will their CMD progress when they resume their formal education in mid-life, or will they be resistant to the development of post-conventional moral reasoning? Other researchers have noted that most adults, without significant intervention, plateau in their CMD at stage 3 or 4 (Crain 1985; Rest 1986; Dellaportas 2006). The high maintaining norms scores of the subjects both before and after the experimental treatment are indicative of their orientation toward the established rules of behavior. This research suggests that these non-traditional students may have plateaued in their moral development at stage 4 (maintaining norms) and that they may be resistant to interventions that seek to increase their post-conventional moral reasoning. This finding is somewhat mitigated by the statistically significant improvement that the subjects of this research experienced in their N2 scores, which indicates that, in their moral reasoning, they were prioritizing post-conventional factors over personal interest factors more consistently following the experimental treatment.

Recommendations for Accounting Ethics Education

This research provides some insights about the practice of teaching ethics to accountants. The results of the experiment provide evidence of the potential benefits of Smithian ethics in accounting ethics education interventions. It seems highly desirable to reduce accounting students’ reliance on personal interest factors in making moral decisions. One could argue that many accounting scandals that actually occur might be avoided if the individuals involved had a greater respect and appreciation for the rules of the accounting profession (Jennings 2004). However, this research also provides evidence that Smithian virtue ethics may not provide a complete answer for the accounting ethics educator, if a key goal is to foster greater post-conventional moral reasoning in accounting students.

This experiment provides some evidence that Smithian virtue ethics, particularly the concepts of sympathy and the impartial spectator, may have positively impacted (i.e., reduced) the subjects’ personal interest moral reasoning. Additionally, improvements in subjects’ N2 scores indicate that subjects were more consistently assigning higher ratings to post-conventional factors versus personal interest factors (Rest et al. 1997). This suggests that the seminal concepts of sympathy and the impartial spectator may be valuable components of accounting ethics education.

To summarize, exposure of accounting students to Smithian concepts of the sympathy principle and the impartial spectator can be a key component of an effective accounting ethics education intervention. However, these concepts should be a piece of an accounting ethics education intervention that also has other components that are proven to be effective in increasing accounting students’ post-conventional moral reasoning.

Limitations

One limitation of this research is its quasi-experimental design. Because the research was performed in a live academic setting, random assignment of subjects to the experimental and control groups was not possible. Measures were taken to minimize confounding factors. Also, there is no a priori reason to suppose that there are any fundamental differences between subjects in the two groups. Nonetheless, random assignment of subjects to the experimental and control groups would have been preferable. The timing of the pre-test, experimental treatment, and post-test for part of the experimental group versus the rest of the experimental group and the control group presents a limitation due to a possible history effect. However, based on Earley and Kelly (2004) and because the timing difference was not great (only a few weeks), the potential history effect is likely to be minimal.

This research involved non-traditional students. These students are older and more experienced than traditional college students, which can impact their level of moral development (Rest 1986; Rest and Narvaez 1994; Rest et al. 1999). Caution should be used in inferring that the experimental treatment used in this research would have the same impact on traditional students.

This research was done at a private, religiously oriented university. The impact of the experimental treatment used in this research may be different at a public university or at a non-religious private university.

This research was conducted at a university in the US Midwest. Research conducted in another region of the US might find different population characteristics and lead to different results.

Another limitation of this study is its sample sizes. Although the sample sizes were adequate for the statistical tests performed, larger sample sizes might have provided clearer statistical results.

The DIT-2 instrument focuses on a subject’s CMD. This is only one of the four components of Rest’s FCM of moral functioning. According to Rest (1986), a combination of the four components will lead to moral actions. Hence, improving moral judgment alone will not guarantee moral behavior.

The experimental treatment was an online accounting ethics intervention. The treatment might have had a different effect if it had been delivered in a traditional classroom setting.

The duration of the online accounting ethics education intervention was 5 weeks. The impact of the intervention might have been different if it had lasted longer. A migration from a strong maintaining norms orientation to a post-conventional orientation might require more time.

Suggested Future Research

As a result of this research, it appears that further research in accounting ethics education is needed.

Because this research involved non-traditional students at a Christian university, the following questions require further investigation. What impact would this accounting ethics education intervention have on traditional students? What impact would this accounting ethics education intervention have on students at a secular institution?

This research dealt with only one component of Rest’s (1986) FCM. This study could be extended to address other components of Rest’s FCM, such as moral sensitivity and moral motivation.

A key area for further research is examination of other approaches that will focus on improvement of students’ post-conventional moral reasoning. What are some ideas that could be incorporated into an accounting ethics education intervention, which, in combination with Smith’s (1790/1976) sympathy principle and impartial spectator, could positively impact students’ post-conventional moral reasoning? What kind of post-conventional moral reasoning should be promoted? Which principles should be emphasized? (See Rest et al. 1999). Some possible candidates might include the Kantian categorical imperative (Kant 1785/1952), another set of virtues, such as Aristotelian virtues (Aristotle 1976; Mintz 1996), the virtues of Aquinas (1274/1952), virtues specific to accounting (Cheffers and Pakaluk 2005; Libby and Thorne 2007), or the Christian golden rule (Knudson 1943).

Conclusion

This research provides insight into the impact of an accounting ethics education intervention that is based on the moral philosophy of Adam Smith. The results provide partial support for the hypothesis that such an intervention can positively impact accounting students’ personal interest and N2 scores, but do not support the hypothesis that such an intervention will significantly impact their post-conventional moral reasoning. This is the first study to research the specific question of the effect on accounting students of an accounting ethics education intervention that is based on the moral philosophy of Adam Smith and the first study to empirically test the impact of a virtue ethics-based accounting ethics education intervention on accounting students. It is also one of the few studies that examined all three of the neo-Kohlbergian schemas (personal interest, maintaining norms, and post-conventional) in an effort to more completely understand the students’ moral development.

A key contribution of Smithian ethics is the concept of the impartial spectator (Raphael 2007). This study provides some support for the theory that Smith’s sympathy concept and the concept of impartial spectator may help improve (reduce) accounting students’ reliance on personal interest factors as they reason about ethical dilemmas. Further study is needed to explore the impact of this accounting ethics education intervention on the other components of the FCM of moral functioning. Further research is also needed to determine approaches that will positively impact students’ post-conventional moral reasoning.