Introduction

Whether or not the process of grading performance assessments is fair is a common concern among university students: Do my professors actually consider the effort I put into this? What is the reasoning behind the grade for my last assignment? Issues of assessment have received considerable attention in the higher education literature. Frequent topics are the shift toward more learner-centered methods (for a recent review, see Pereira et al. 2015) and ways to ensure reliability and validity of grading practices (Dawson 2015; Chen et al. 2016; Bloxham et al. 2015). The significance students attach to a fair assessment process was demonstrated in a recent study by Burger and Gross (2016), who found that students who perceive grading procedures as fairer are less likely to develop dropout intentions. The implications for student retention give special importance to questions regarding the formation of justice evaluations in higher education. The present study approaches this issue from an institutional perspective: How are individual-level justice evaluations affected by institution-level characteristics of university departments? The insights thus gathered can offer guidance for policy measures aimed at reducing student feelings of unjust treatment, which in turn could increase retention rates.

The basic theoretical assumptions of this study are derived from the justice climate approach. This line of organizational justice research states that justice evaluations cannot be fully understood if they are treated as individual-level phenomena only. Rather, one has to consider that individual experiences, and thus justice evaluations, are always embedded in a specific social and institutional context (Naumann and Bennett 2000; Mossholder et al. 1998; Whitman et al. 2012). In higher education, university departments provide such a context. Different departments do not simply differ in the subject matter they teach. Rather, they each represent a distinct academic environment characterized by specific approaches to teaching, organization of curricula, and methods of assessing student performance (Ramsden 1979; Entwistle and Tait 1990; Neumann 2001). As a result, student experiences vary greatly between departments, which is evident in a number of important outcomes. A large body of literature has shown that the academic environment influences perceptions of the assessment process (Pereira et al. 2016; Flores et al. 2015; Parpala et al. 2010; Sun and Richardson 2016), academic achievement (Simpson 2015; Godor 2016; Brint et al. 2012), as well as student–faculty interactions (Kim and Sax 2014; Cuseo 2007; Severiens and Schmidt 2009).

This raises the question of whether the department-specific environment manifests itself in department-specific justice evaluations as well. The idea of justice climates in university departments is supported by a descriptive study by Burger and Gross (2014), who report substantial differences in average justice evaluations between departments of the same university. This hints at the possibility that the general conditions in some academic environments may be more favorable than in others when it comes to meeting students’ justice-related expectations.

This article aims to identify contextual conditions that elicit fairness-related responses and thus lead to the emergence of department-specific justice climates. Multilevel mixed models are used to estimate the effects of department-level predictors on individual-level outcomes in a German university. The focus is on two elements of the academic environment that are central to student experiences: the method of assessing student performance and the method of instruction. Both are subject to significant interdepartmental variation. In the humanities and social sciences, student performance is commonly assessed by means of essay assignments. STEM fields, on the other hand, show a preference for examinations such as multiple-choice questionnaires (Neumann et al. 2002; Simpson 2015). Likewise, instruction in some programs primarily takes place in large-scale lectures while it is more seminar-based in others.

Department-specific combinations of these institutional characteristics give a unique profile to the assessment process as well as to the way students interact with faculty. As will be argued later on, the conditions thus created can place students in some departments in a more advantageous position when it comes to meeting their justice-related expectations. This is expected to be reflected in student evaluations regarding the fairness of the assessment process.

To provide a more nuanced picture, we also consider that the relationship between academic environment and justice perceptions is not necessarily deterministic. Students retain some leeway when faced with an environment that runs counter to their justice-related expectations. It can be argued that how well students make use of these opportunities depends on how well they can adapt to and cope with the demands of higher education, matters in which students from low socioeconomic status (SES) families have frequently been found to have greater problems than their peers from more affluent backgrounds (Tinto 1993; Pascarella et al. 2004; Ostrove and Long 2007; Rubin 2012). Therefore, special attention will be devoted to the situation of low-SES students.

Justice evaluations

The theoretical framework of this study is built on theories of organizational justice. Research in this field is concerned with the question of how individuals evaluate the fairness of the allocation of various resources (Greenberg 1990; Greenberg and Colquitt 2005). In this study, the focus is on the procedures used by faculty to assign grades for assessments of student performance. We distinguish procedural justice (Thibaut and Walker 1975; Leventhal 1980) and informational justice (Greenberg 1993). Procedural justice is further subdivided into aspects regarding the amount of control students can exert on the grading process on the one hand, and aspects regarding the perceived suitability of the procedures to produce valid results on the other (Burger and Gross 2014).

From the point of view of control-related procedural justice (PJ-C), the grading process appears fair to students if they are given the possibility to exert influence on this process (Colquitt 2001). PJ-C is based on Thibaut and Walker’s (1975) principles of process control and decision control and Leventhal’s (1980) correctability rule. Process control means that students have a voice in the grading procedure. This includes the possibility that grading criteria are established in cooperation with the instructor. Decision control refers to direct involvement in deciding the grade rather than the grading criteria. The correctability rule demands that students can appeal a grade if they feel that the grading decision was flawed. Greater influence relates to justice in that the grading of assessments gives more consideration to the students’ needs, thus assigning them a more active role instead of “forcing” grades onto them. As a consequence, students are partially responsible for the result and thus more likely to accept it. However, note that the above principles do not necessarily ensure that procedures are fairer in the sense of being more equitable. Even though decision control can be used to involve students in judging the quality of their own work in a constructive manner, there is a risk of abuse if students simply demand a better grade for no good reason. Students expressing the feeling that they deserve to have some voice in grading decisions can be indicative of a misguided sense of entitlement (Greenberger et al. 2008). That said, it is important to note that fairness is also a matter of perspective. While an impartial observer might consider it fair if a questionable attempt to exert influence is dismissed by the instructor, the student will likely perceive injustice as long as the claim seemed legitimate from their point of view.

The concept of validity-related procedural justice (PJ-V) is derived from Leventhal’s (1980) work. According to Leventhal, a distributive procedure is perceived as fair if the receiving party feels that it is in compliance with certain rules. With regard to the validity of a process, the relevant criteria are bias suppression, consistency, and accuracy (Leventhal 1980). Applied to a higher education context, these rules demand that grading decisions must not be guided by partiality for or prejudice against certain students; that the standards used in assigning grades are applied consistently; and that the methods used are able to accurately capture the students’ understanding of the subject.

Informational justice (IJ) was introduced as a distinct justice dimension by Greenberg (1993) and describes how individuals are informed about a procedure. Accurate and transparent communication enables individuals to come to a better understanding of the procedures, which in turn increases the likelihood that the procedures themselves are perceived to be fair (Greenberg 1993). In addition, feedback that is reasonable and constructive can be a motivator for improvement in future assignments (Hattie and Timperley 2007). In the present study, a fair communication policy is defined by detailed and thorough explanations on how assessments are graded. Further, explanations and feedback need to be comprehensible; they have to be communicated in a timely manner (Colquitt 2001).

University departments, academic environment, and justice climate

For a long time, research in the field of organizational justice focused on the individual level when explaining the antecedents of justice evaluations. From that point of view, whether or not a procedure is considered to be fair is primarily a reflection of individual preferences and dispositions (Naumann and Bennett 2000). While it is certainly correct that individual-level attributes play an important role, this interpretation does not take into account that procedures are embedded in specific social and institutional contexts. Even though justice evaluations are ultimately expressions of individual sentiment, they are also reactions to actual events and to the conditions surrounding these events (Wegener 1991). A shared institutional context means that individuals are subject to the same conditions, rules, and regulations, and therefore have similar experiences (Mossholder et al. 1998). This promotes the emergence of group-specific justice climates, meaning that similarity in experiences and exchange about these experiences will lead to similarities in justice evaluations (Liao and Rupp 2005).

The present article applies this concept to a university setting. Here, the academic environment on the department level describes the basic framework in which learning, teaching, and assessment take place. It is assumed that structural conditions on the institutional level impose a specific form on the assessment process as well as on student–faculty interactions, which can potentially affect the fairness of grades from a student perspective. In practice, the size of this impact will depend on a number of factors such as the specific practices and customs within a department (Neumann et al. 2002; Lindblom-Ylänne et al. 2006). There will also be variance due to different approaches chosen by individual instructors (Oleson and Hora 2014; Wilkesmann and Lauer 2015). Nevertheless, the idea here is that the structural conditions in a department give a certain direction to the experiences that students are likely to have. Since students in the same department experience their studies in light of these conditions, it seems reasonable to expect that their justice perceptions would show a certain degree of congruence. At the same time, congruence of justice perceptions within departments points to the possibility that sentiments could differ from those of students who are exposed to a different environment. The following paragraphs detail the respective roles of assessment method and instruction method in shaping these conditions.

Assessment method

While assessment can take place in a variety of other formats such as peer assessment (Topping 1998; Ashenafi 2015), self-assessment (Orsmond and Merry 2015), and portfolio assessment (Dysthe and Engelsen 2011), this study contrasts essays and examinations, as these formats remain by far the most prevalent in the institution studied here. Essays and examinations represent two rather different approaches to measuring student performance, which has consequences for student attitudes toward the assessment process (Maclellan 2001; Scouller 1998; Flores et al. 2015). With regard to justice evaluations, the assessment method informs us about the degree to which assessment is standardized. The essay format is on the low end of the standardization spectrum. Unlike with examinations, it is hardly possible to specify a priori what constitutes a perfect score, and the result cannot always be definitively judged right or wrong (Norton 1990).

It follows from this that students in departments where assessment is more essay-based should have better opportunities to bring their own perspective into the assessment process and to influence the outcome. The openness of essays also means that students have better chances of making a compelling argument in the first place; arguing about a wrong answer in a more standardized format is a less promising endeavor. This leads to our first hypothesis: A higher proportion of essays in a department is expected to increase ratings of PJ-C (Hypothesis 1PJ-C).

The assessment method also has implications for perceptions of PJ-V. Since examinations represent a more standardized approach to performance assessment, they can ensure that criteria like objectivity and consistency are adhered to (Biggs 1973). Essays grant more freedom when judging the results. A positivistic, techno-rationalist conception of assessment as described by Orr (2007) is hardly compatible with essays. Grading decisions are often too complex to be based on a predefined set of universal criteria (Bloxham et al. 2011). This does not mean that the validity of the grading process is necessarily compromised. In fact, one might argue that essays are better suited to capture student understanding of the subject matter (Huang 2016). But due to the lack of standardization, suspicions of arbitrariness are both more likely to arise and harder to dispel. This is complicated by the fact that student views about what is important when judging the quality of an essay can deviate from what faculty are looking for (Norton 1990). Thus, a higher proportion of essays relative to examinations is expected to have a negative impact on ratings of PJ-V (Hypothesis 1PJ-V).

With regard to IJ, there is no official policy regarding assessment and feedback at the level of the university studied here. Generally speaking, feedback for examinations is usually shorter than for essays. In many cases, feedback for examinations is limited to communicating the grade, unless students specifically ask for more information. Yet, feedback for essays can be very sparse as well. Departments work rather autonomously in this regard, and even within departments, there is bound to be variance between individual instructors. Nevertheless, it can be argued that essay-based assessment offers a different platform for the exchange of information between students and faculty. For example, the task assignment for an essay can be discussed in greater detail, which is usually not the case with examinations. Since communication channels are established as a by-product of essay-based assessment, the flow of information is facilitated. Therefore, students have more opportunities to satisfy their informational needs, which leads to the hypothesis that ratings of IJ are higher in departments where assessment is more essay-based (Hypothesis 1IJ).

Instruction method

Whether teaching takes place in seminars or in lectures is decisive for how students interact with faculty (Severiens et al. 2015). Interactions in traditional lectures leave students in a passive role. They mostly just follow the instructor’s presentation, and apart from their mere presence, their contributions are limited (Severiens and Schmidt 2009). This creates distance between students and faculty, making it harder for the former to become active when they need to do so to satisfy their needs (Park and Choi 2014). Research has shown that some of these issues can be compensated for by incorporating into lectures elements that promote student engagement and interaction with instructors (Cavanagh 2011; Miller et al. 2013; Roopa et al. 2013). However, interactive elements in lectures are rather uncommon in the university in which data for this study were collected. In stark contrast to traditional lectures, instruction in seminars is more student-centered, encouraging students to actively interact with faculty. This makes it easier for faculty to both recognize and serve student needs, whereas in lectures, it is not uncommon for the majority of students to not have a single direct interaction with faculty in the course of a whole semester (Cuseo 2007).

The mode of interaction implied by the instruction method is assumed to be related to perceptions of PJ-C and IJ. It can be argued that these two justice dimensions are, to some extent, determined by what the students themselves make of the situation. With regard to PJ-C, this is obvious: Exerting control on the assessment process is not possible if the students do not act on their own initiative. A professor would not know that a student feels the need or entitlement to influence his or her grades if this is not actively expressed. The same is true for IJ. Even though students can receive assessment-related information without having to become active, this is not always sufficient. The more specific the information a student wants, the less likely he or she is to receive it without explicitly asking for it. PJ-V, on the other hand, does not depend on student–faculty interactions: As the demand for impartial and accurate grades does not vary, grading procedures should be valid regardless of whether or not students approach faculty.

In the present study, the curriculum of each department in the sample includes both lectures and seminars. However, the ratio of the two formats varies considerably between departments. On the low end of the spectrum, seminars make up less than 40 % of classes, while on the upper end, more than 90 % of classes take place in seminars. The contrasting properties of seminars and lectures entail that students enrolled in more seminar-based departments are in a position in which they are more likely to become active of their own accord. This creates an environment in which the pursuit of interests regarding PJ-C and IJ is facilitated, which is why a higher proportion of seminars is expected to improve ratings of these justice dimensions (Hypothesis 2PJ-C and Hypothesis 2IJ).

Background-specific moderation of effects of the academic environment

The previous hypotheses were derived under the implicit assumption that students are a homogeneous group who respond to their environment in a uniform way. This assumption seems a little too strong, given that justice evaluations do not only tell us something about the structural conditions in the departments, but also about how students experience and interpret these conditions. Thus, students who are exposed to the same conditions can arrive at diverging judgments if the effects of department-level structure are moderated by individual-level attributes. It is expected that the students’ socioeconomic background plays such a moderating role. Students from low-SES families were found to have more difficulties adapting to the cultural and social context of university (Pascarella et al. 2004; Ostrove and Long 2007). This includes greater insecurities in interactions with faculty and peers (Bourdieu 1986; Tinto 1993; Kim and Sax 2009). These insecurities have implications for how perceptions of PJ-C and IJ are influenced by the academic environment. Since perceptions of PJ-V are assumed to be unrelated to student–faculty interactions, differential effects on this justice dimension due to insecurities associated with a lower social background are not expected.

Recall that perceptions of PJ-C and IJ are expected to be more negative if the academic environment leaves students in a passive, less involved role; that is, if assessment is examination-based and/or instruction is lecture-based. Yet, even in an unfavorable environment, students can find ways to get what they want. Regarding PJ-C, both examination-based assessment and lecture-based instruction make it harder for students to present an argument for a better grade because of the higher threshold they need to pass to initiate contact and to plead their case. Therefore, it is less likely that they even attempt to do so. Still, a higher threshold does not mean that exerting influence is ruled out per se; it just requires additional effort. This applies to IJ as well. Even though examinations and traditional lectures provide less potential for detailed explanations, this does not mean that information cannot be obtained via other means. Students can still use office hours, email, etc., to contact faculty directly and obtain the information they need.

This requires that students are confident enough in interactions with faculty to contemplate such actions and to show the necessary initiative to follow through. Students from low-SES families are more insecure in these matters than their higher status peers. Thus, they are at a disadvantage when confronted with an academic environment in which the fulfillment of justice-related expectations depends on such requirements. This disadvantage becomes less pronounced the more interactions with faculty are encouraged by the method of assessment and the method of instruction. It follows from this that the positive effects of a higher proportion of essays and a higher proportion of seminars on both PJ-C and IJ are expected to be larger for students from a lower social background than for those from higher status families (assessment method: Hypothesis 3PJ-C and Hypothesis 3IJ; instruction method: Hypothesis 4PJ-C and Hypothesis 4IJ). Hypotheses are summarized in Table 1.

Table 1 Research hypotheses

Data and operationalization

This study uses data from the first wave of the CampusPanel, an online survey conducted in the fall semester of 2013 among students of all departments of one of the largest German universities. The data include information on the students’ justice evaluations as well as on other study-related attitudes and experiences (Lang and Hillmert 2014). Data on department-level variables were collected from the online course catalog of the university for the semester in which the survey took place. The research sample consists of N = 1549 students on the individual level (L1). For some participants, ratings were missing on all items used to measure PJ-V or IJ, respectively; these cases were excluded from analyses of the respective justice dimension. The number of valid cases is N = 1496 for PJ-V and N = 1530 for IJ. Descriptive statistics of the study group are presented in Table 2.

Table 2 Description of study group

Participants are nested in N = 48 university departments. All department-level (L2) predictors are measured on this level. Yet, there is reason to expect additional variance within departments. This is due to the fact that in German universities, most departments offer more than one type of degree program (usually Bachelor’s degree, Master’s degree and state examination (Staatsexamen) for teaching professions). Multiple programs in one department suggest the possibility that while the aspects of the academic environment that are measured by the department-level predictors are constant across programs, other, unobserved factors that could affect justice perceptions may vary (e.g. workload). This possibility is taken into account when defining the level-2 units for the regression models. Departments with multiple programs are further subdivided, which leads to a total of N = 93 L2-units.
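As a minimal illustration of how such level-2 units can be constructed, the following sketch combines department and program type into a single grouping identifier (the data frame and its column names are hypothetical and not taken from the CampusPanel codebook):

```python
import pandas as pd

# Toy data standing in for the student file; names and values are illustrative.
students = pd.DataFrame({
    "student_id": [1, 2, 3, 4, 5],
    "department": ["Sociology", "Sociology", "Physics", "Physics", "Physics"],
    "program":    ["BA", "MA", "BSc", "MSc", "State exam (teaching)"],
})

# One level-2 unit per department-program combination, so that unobserved
# program-specific factors (e.g., workload) are absorbed by the grouping.
students["l2_unit"] = students["department"] + " | " + students["program"]
print(students["l2_unit"].nunique())  # number of distinct level-2 units
```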

Instruments

Dependent variables

PJ-C, PJ-V, and IJ were each measured using three-item scales. The construction of these scales is based on an instrument developed by Colquitt (2001) that is widely used in organizational justice research. Table 3 contains English translations and descriptive statistics of the survey items used in this study. Confirmatory factor analysis (CFA) was used to predict standardized factor scores for the three justice dimensions.
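As a hedged sketch of the measurement part (generic notation, not the exact specification reported in Table 3), each justice dimension can be written as a single latent factor with three indicators:

\[ x_{ijk} = \nu_{jk} + \lambda_{jk}\,\eta_{ik} + \varepsilon_{ijk}, \qquad j = 1, 2, 3, \qquad \eta_{ik} \sim \mathcal{N}(0, 1), \]

where x_{ijk} is student i’s rating of item j for dimension k and η_{ik} is the latent justice evaluation; the standardized factor scores used as dependent variables are the model-based predictions of η_{ik}.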

Table 3 Scales, reliabilities and descriptive statistics

Individual-level predictors

SES: Parental socioeconomic status is measured on a continuous scale using ISEI-08 scores, which range from 10 (low status) to 90 (high status) (Ganzeboom and Treiman 2014). If information on both parents was available, the higher value was used. This variable was rescaled so that one unit equals ten ISEI points.

Additional control variables on the individual level are immigrant background, gender, year of study (measured in number of semesters studied), satisfaction with academic achievement, preenrollment information regarding the study program, and digital media use by faculty. The variables for satisfaction with achievement and preenrollment information are factor scores, each derived from a CFA of three items (see Table 3).

Department-level predictors

Assessment method: The item measuring the proportion of essays relative to examinations in a department is based on CampusPanel data. Participants were asked how many essays and examinations they had written thus far. The ratios of these two values were then aggregated to calculate the group mean for each department. Since the data include no details on the types of examinations students have written, the category “exam” subsumes different formats such as multiple-choice questionnaires and open-ended questions. Information on the last written examination was available for a small subset of the data. Among these, 57.3 % stated that their last examination was either partially or fully multiple-choice, suggesting a focus on more standardized instruments.

Instruction method: The proportion of seminars was calculated as the ratio of the number of classes taught in a student-centered (seminar) format to the total number of classes in each department. The variables for proportion of essays and proportion of seminars are scaled so that one unit corresponds to a 10 % difference. Additional control variables on the department level are the staff-student ratio as well as a categorical variable for type of degree program. The categories are undergraduate (Bachelor), graduate (Master), state examination for teaching professions, and state examination for nonteaching professions.
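As a rough sketch of how the two department-level predictors could be constructed (the column names, the reading of the essay share as essays relative to all reported assessments, and the toy numbers are assumptions for illustration, not the CampusPanel coding):

```python
import pandas as pd

# Survey side: each student reports how many essays and examinations they have
# written so far; the individual essay share is aggregated to a department mean.
survey = pd.DataFrame({
    "department": ["Sociology", "Sociology", "Physics", "Physics"],
    "n_essays":   [4, 2, 0, 1],
    "n_exams":    [1, 3, 5, 4],
})
survey["essay_share"] = survey["n_essays"] / (survey["n_essays"] + survey["n_exams"])
# Rescaled so that one unit corresponds to a 10-percentage-point difference.
essays_dept = survey.groupby("department")["essay_share"].mean() * 10

# Course catalog side: proportion of classes taught in a seminar format,
# expressed in the same 10-percentage-point units.
catalog = pd.DataFrame({
    "department": ["Sociology", "Physics"],
    "n_classes":  [120, 150],
    "n_seminars": [96, 60],
}).set_index("department")
seminars_dept = catalog["n_seminars"] / catalog["n_classes"] * 10

print(essays_dept, seminars_dept, sep="\n")
```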

Method

Multilevel regression models are used to test the hypotheses. Multilevel modeling accounts for the fact that students in the same department are subject to the same academic environment. This within-group homogeneity would not be adequately captured by simple OLS regression, which would cause problems with statistical inference, especially in the estimation of standard errors (Raudenbush and Bryk 2002; Snijders and Bosker 2012).

The analytical procedure is divided into several steps. In a multilevel regression model, the intercept term is allowed to vary randomly between departments. This allows us to decompose the total variance in justice judgments into two parts: variance between individuals (L1) and variance between departments (L2). First, we want to obtain an estimate of the proportion of the variance that can be attributed to each of the two levels. This is done by fitting a model without predictors (Model 1). Next, a series of regression models is estimated in which each subsequent model adds parameters to the model before it. Model 2 expands on Model 1 by adding individual-level predictors. Model 3 then adds predictors that describe the academic environment on the department level. This gives us information on the effects of assessment method and instruction method on justice perceptions. In addition, we can determine how much of the between-department variance is explained by the department-level predictors. Building on this, Model 4 adds a random slope parameter for the effect of parental SES. This allows us to test whether the effect of SES differs between departments. Models 5 and 6 then test whether these differences are related to between-department differences in assessment method and instruction method.

Models are estimated using Stata 13.1. All continuous predictors are grand mean centered. With regard to the interpretation of the results, recall that the dependent variables are standardized factor scores with a mean of zero and a standard deviation of one. This means that a regression coefficient of, for example, −.5 indicates that a one-unit increase in the corresponding predictor is estimated to lower justice judgments by half a standard deviation (SD).
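The models themselves were estimated in Stata; purely as an illustration of the stepwise logic, a rough equivalent in Python/statsmodels might look as follows (synthetic data, placeholder variable names, and a reduced set of predictors; this is a sketch, not the authors’ estimation code):

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Synthetic data standing in for the CampusPanel file; all names are placeholders.
rng = np.random.default_rng(1)
n_students, n_units = 600, 30
units = pd.DataFrame({
    "l2_unit":  np.arange(n_units),
    "essays":   rng.uniform(1, 7, n_units),    # proportion of essays, 10 %-units
    "seminars": rng.uniform(4, 9.5, n_units),  # proportion of seminars, 10 %-units
})
df = pd.DataFrame({
    "l2_unit": rng.integers(n_units, size=n_students),
    "ses":     rng.normal(0, 1.5, n_students),  # parental SES, grand-mean centered
    "female":  rng.integers(2, size=n_students),
}).merge(units, on="l2_unit")
df["pjc"] = 0.08 * df["essays"] + rng.standard_normal(n_students)
df["ij"] = 0.04 * df["seminars"] - 0.02 * df["essays"] * df["ses"] \
    + rng.standard_normal(n_students)

# Model 1: null model, used to decompose the variance into L1 and L2 components.
m1 = smf.mixedlm("pjc ~ 1", df, groups=df["l2_unit"]).fit(reml=False)

# Models 2/3: individual-level predictors, then department-level predictors.
m3 = smf.mixedlm("pjc ~ female + ses + essays + seminars",
                 df, groups=df["l2_unit"]).fit(reml=False)

# Models 4-6: random slope for SES plus a cross-level interaction (here for IJ).
m5 = smf.mixedlm("ij ~ female + essays * ses + seminars",
                 df, groups=df["l2_unit"], re_formula="~ses").fit(reml=False)

print(m1.summary(), m3.summary(), m5.summary(), sep="\n\n")
```

Comparing the log-likelihoods of nested models (e.g., the null model against a plain OLS fit, or models with and without the random slope) would correspond to the likelihood-ratio tests reported in the Results section.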

Results

Random intercept models with individual-level predictors

As a first step, a model without predictors is fitted for each of the three justice dimensions in order to obtain estimates of the variance components on the individual and the department level. Model 1 shows an intraclass correlation (ICC) of .088 for PJ-C, .062 for PJ-V, and .083 for IJ (see Table 4). These values can be interpreted as the proportion of total variance in justice evaluations that is due to variation between departments (L2) as opposed to variation within departments (L1). For example, almost nine percent of the variance in evaluations of PJ-C can be attributed to the institutional context. Likelihood-ratio tests comparing the random intercept models to pooled OLS regressions without a random intercept are significant for all three models (p < .001). This means that we can reject the null hypothesis that the random intercept has zero variance across L2 units for all three justice dimensions, which we take as evidence for the existence of department-specific justice climates.
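For reference, the ICC follows from the variance components of the null model in the usual way (generic notation, not that of Table 4):

\[ \mathrm{ICC} = \frac{\tau_0^2}{\tau_0^2 + \sigma^2}, \]

where τ₀² is the between-department intercept variance and σ² the individual-level residual variance of Model 1; for PJ-C this ratio equals .088, i.e., almost nine percent of the total variance lies between departments.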

Table 4 Multilevel regression models, L1 predictors

Model 2 adds L1 predictors to the regressions. This does not produce any appreciable change in the L2 intercept variance. The results suggest that the between-department variance found in Model 1 is indeed due to department-level factors rather than a department-specific clustering of students with particular attributes.

Models with department-level predictors

Variables for proportion of essays, proportion of seminars, and staff-student ratio as well as type of degree program are added in Model 3 (see Table 5). The department-level residual variance decreases for all three justice dimensions: for PJ-C, the L2 intercept variance drops from .087 in Model 1 to .018; for PJ-V, from .060 to .022; and for IJ, from .082 to .044. This means that 79.3 % of the total between-department variance of PJ-C is explained by the department-level predictors; for PJ-V, it is 63.3 % and for IJ 46.3 %. Thus, the measures of the academic environment explain the majority of the variance in perceptions of PJ-V and PJ-C that occurs between departments, while IJ still shows a substantial amount of group-specific variance that is not accounted for by the variables in the model.
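These percentages correspond to the proportional reduction in the L2 intercept variance relative to Model 1; reconstructing the calculation from the reported variance components:

\[ \frac{\tau_{0,\mathrm{M1}}^2 - \tau_{0,\mathrm{M3}}^2}{\tau_{0,\mathrm{M1}}^2} = \frac{.087 - .018}{.087} \approx .793 \ \text{(PJ-C)}, \qquad \frac{.060 - .022}{.060} \approx .633 \ \text{(PJ-V)}, \qquad \frac{.082 - .044}{.082} \approx .463 \ \text{(IJ)}. \]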

Table 5 Multilevel regression models, L1 and L2 predictors

The assessment method shows significant effects on PJ-C and PJ-V. The effect on PJ-C is particularly large: A 10 % increase in the proportion of essays relative to examinations leads to an average increase in PJ-C of .085 SD. This supports H1PJ-C, where it was proposed that essays leave more room for negotiation. If performance were graded only according to predetermined criteria such as right or wrong answers, there would be little left to negotiate. As expected in H1PJ-V, the coefficient of proportion of essays shows a negative sign for PJ-V: A 10 % increase in the proportion of essays corresponds to a decrease in PJ-V of .052 SD. While essays do leave room for the students’ needs to be heard, they also leave room for arbitrariness. Of course, we cannot tell whether or not instructors are actually more likely to grade essays in an arbitrary fashion. Yet, we can tell that students question the validity-related aspects of the grading process to a greater extent than when assessment is more examination-based. Hypothesis H1IJ, which stated that the assessment method has an effect on ratings of IJ, is not supported by Model 3.

The proportion of seminars relative to lectures was predicted to have a significant effect on both PJ-C and IJ. The results support H2PJ-C and H2IJ. A 10 % increase in the proportion of seminars increases PJ-C by .036 SD. The effect is slightly more pronounced for IJ, where a 10 % increase in seminars corresponds to a .044 SD increase in the dependent variable. Seminars require students to actively participate in direct interaction with faculty. This lowers the threshold students need to overcome to pursue their justice-related needs, which promotes a positive justice climate. As expected, there is no significant effect on PJ-V, which supports the notion that from a student perspective, the use of valid grading criteria does not depend on student–faculty interactions.

Aside from the effects of the academic environment, one particularly interesting finding is that female students give significantly lower ratings on all three justice dimensions (PJ-C: b = −.272; p < .001; IJ: b = −.236; p < .001; PJ-V: b = −.111; p = .035). Indeed, for PJ-C and IJ, gender is one of the most influential predictors in the model. These effects persist even when controlling for the L2 variables, suggesting that the findings are not due to a gender-specific selection into departments in which the academic environment is less favorable for meeting justice-related expectations. Additional research is necessary to explain this large gender gap.

Cross-level interactions

Next, we take a look at the extent to which the effect of the academic environment on PJ-C and IJ is moderated by the students’ parental SES. Model 4 (see Table 6) adds a random slope for parental SES to the regression equation. This allows effects of SES to vary between L2 units. Likelihood-ratio tests for differences between models with and without the random slope parameter are significant for IJ, but not for PJ-C. This is taken as evidence that the relationship between SES and perceptions of IJ varies between departments. Since this is not the case with PJ-C, cross-level interactions between the L2 predictors and SES are estimated only for IJ. Model 5 provides evidence in support of H3IJ, which stated that the effect of the assessment method on IJ is larger for low-SES students.

Table 6 Multilevel regression models, L1 and L2 predictors, random slope and cross-level interactions

To ease interpretation of the interaction between assessment method and SES on IJ, conditional marginal effects are plotted in Fig. 1. The figure shows the effect of a 10 % increase in the proportion of essays on IJ across the range of parental SES as measured by ISEI scores. For students with an ISEI of 25, a 10 % increase in the proportion of essays corresponds to an increase in IJ of .114 SD, whereas the same increase raises IJ by only .065 SD at an ISEI of 50. A little further to the right, the 95 % confidence bands cross the zero line; the effect ceases to be significant for ISEI scores greater than 52. This suggests that while the means of assessing student performance can influence the way students judge grading-related information policy, this is only true for students from lower status groups. This could explain the lack of a significant main effect of assessment method on IJ observed in Model 3.
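The conditional effects plotted in Fig. 1 follow the usual expression for a linear cross-level interaction (generic coefficient symbols, not the notation used in Table 6):

\[ \frac{\partial\, \widehat{\mathrm{IJ}}}{\partial\, \mathrm{essays}} = \hat{\beta}_{\mathrm{essays}} + \hat{\beta}_{\mathrm{essays} \times \mathrm{SES}} \cdot \mathrm{SES}_c, \]

where SES_c is grand-mean-centered parental SES in 10-ISEI-point units; evaluating this expression at the centered values corresponding to ISEI scores of 25 and 50 yields the .114 and .065 SD effects reported above.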

Fig. 1 Cross-level interaction assessment method × SES on IJ. Predicted change in ratings of IJ for a 10 % increase in the proportion of essays for different levels of SES. 95 % confidence bands

Model 6 shows the interaction between instruction method and SES. The effect of the proportion of seminars on IJ varies significantly with parental SES (p = .009), providing evidence in favor of H4IJ. This interaction is visualized in Fig. 2. For students with an ISEI of 25, the model predicts ratings of IJ to increase by .13 SD if the proportion of seminars increases by 10 %. The same increase in the proportion of seminars improves ratings of IJ by only .08 SD at an ISEI of 50. The effect ceases to be significant for ISEI scores greater than 71.

Fig. 2 Cross-level interaction instruction method × SES on IJ. Predicted change in ratings of IJ for a 10 % increase in the proportion of seminars for different levels of SES. 95 % confidence bands

Discussion

Results by Burger and Gross (2016) suggest that student retention rates could be improved by reducing feelings of unjust treatment. The purpose of the present study was to contribute to this goal by exploring the role of the academic environment in shaping student perceptions of the fairness of the assessment process. The focus was on the question of how department-specific configurations of assessment method and instruction method influence evaluations of PJ-C, PJ-V, and IJ. With regard to the assessment method, essays were contrasted with examinations; in terms of instruction method, seminars were contrasted with lectures. The results provide evidence for the existence of justice climates within university departments. Ratings of PJ-C were found to be significantly higher in departments where the assessment process is more essay-based as opposed to examination-based. On the other hand, essay-based assessment was shown to have a detrimental effect on perceptions of PJ-V. Perceptions of IJ benefit from a larger proportion of essays, although this effect is concentrated among students from lower status families. As for the method of instruction, a more student-centered approach to teaching in the form of seminars as opposed to lectures proved to be beneficial for ratings of PJ-C and IJ. Again, the effect on IJ is moderated by parental SES.

Taken together, these results deliver valuable insights into how individual justice evaluations depend on the academic environment. Whether or not the grading process appears fair is to some extent a matter of being enrolled in the right (or wrong) department. Interdepartmental variation in assessment method and instruction method can lead to substantial differences in justice climates. This can be demonstrated by looking at the departments in our sample. The proportion of essays is below 11.4 % in the bottom tenth of departments, while it exceeds 64.9 % only in the top tenth. Plugging these values into the regression equation for Model 3 yields an average difference in ratings of PJ-C of .453 SD between departments in the first and the last decile. Students in the first group therefore appear to be at a major disadvantage when it comes to exerting influence on the grading process. Conversely, grading of essay-based assessment can lack transparency and at worst appear arbitrary when compared to a more standardized approach. Again comparing departments in the first and last decile, our model predicts PJ-V to be rated .278 SD lower by students in departments with the highest proportion of essays. Regarding the instruction method, the proportion of seminars is below 39.7 % in the bottom tenth of departments, while it exceeds 94.7 % in the top tenth. The model predicts an average difference in perceptions of PJ-C of .200 SD between departments in the first and the last decile, and of .244 SD for IJ. Thus, we can see obvious benefits for students in departments where instruction is primarily seminar-based.
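As a check, these decile comparisons can be reconstructed from the Model 3 coefficients and the 10 %-unit scaling of the predictors (small deviations from the reported values reflect rounding of the coefficients):

\[ \frac{64.9 - 11.4}{10} \times .085 \approx .45\ \mathrm{SD}\ \text{(PJ-C)}, \qquad \frac{64.9 - 11.4}{10} \times .052 \approx .28\ \mathrm{SD}\ \text{(PJ-V)}, \]
\[ \frac{94.7 - 39.7}{10} \times .036 \approx .20\ \mathrm{SD}\ \text{(PJ-C)}, \qquad \frac{94.7 - 39.7}{10} \times .044 \approx .24\ \mathrm{SD}\ \text{(IJ)}. \]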

The significant cross-level interactions add another dimension to these results: Since low-SES students’ perceptions of IJ exhibit a stronger dependence on the academic environment, interdepartmental differences are even more pronounced for this group. For students with an ISEI of 25, ratings of IJ are predicted to differ by .609 SD between departments in the first and the last decile of proportion of essays and by .716 SD when looking at the proportion of seminars. These comparisons point to structural inequalities between university departments that are usually overlooked. Given the particularly delicate standing of low-SES students in higher education, the findings in this study underline the importance of creating an environment that considers the needs of this group.
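The comparisons for low-SES students follow the same logic, using the conditional slopes at an ISEI of 25 reported in the Results section:

\[ \frac{64.9 - 11.4}{10} \times .114 \approx .61\ \mathrm{SD}\ \text{(essays)}, \qquad \frac{94.7 - 39.7}{10} \times .13 \approx .72\ \mathrm{SD}\ \text{(seminars)}. \]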

These conclusions have implications for policy making. From the point of view of PJ-C and IJ, it is tempting to recommend an increase in the proportion of essays to improve the justice climate. Unfortunately, the negative effect of essays on perceptions of PJ-V leads to an obvious dilemma. Still, an argument in favor of essays can be made. While we do know that essays are more likely to create an impression of questionable grading practices, we do not know the extent to which these concerns are well-founded. That is, would ratings of PJ-V still be lower in essay-based assessment if students had sufficient information on the grading process? Since grading criteria for essays are more complex, vague explanations given for an essay could be perceived more negatively than vague explanations given for an examination. It can be argued that one of the best ways to mitigate validity-related concerns is to make the process transparent through quality feedback (Carless 2006; Lizzio and Wilson 2008). Therefore, attempts to reduce feelings of injustice by prioritizing essay-based assessment can only be successful if they are accompanied by measures that ensure that a certain level of feedback quality and transparency is maintained. Such policies are still lacking in many universities, including the one where this study took place. Regarding the instruction method, the case is rather straightforward: Seminar-based instruction facilitates student–faculty interaction and should thus be preferred over traditional lectures. This adds to the existing literature in favor of a more student-centered approach to teaching (Cuseo 2007; Severiens and Schmidt 2009). Note, however, that these findings are based on data from a single German university. Therefore, the applicability of these recommendations in other contexts needs to be substantiated by further research.

Of course, these policy recommendations need to be weighed against what could realistically be implemented given the resources available to a particular institution. Large-scale lectures enable institutions to teach numbers of students that would otherwise exceed available capacity (Maringe and Sing 2014). Likewise, the benefits of essays over examinations come at the cost of an increased grading workload for faculty, not to mention the effort necessary to provide extensive feedback (Price et al. 2010). Thus, there is bound to be friction between a study organization that is in line with students’ justice-related expectations and one that is feasible in light of limited resources and other situational constraints. Decisions need to be made by evaluating the status quo on a case-by-case basis: As long as the choice of assessment method and instruction method is not dictated by the circumstances, there is a case for the option that is most beneficial for a positive justice climate.

In closing this article, it is necessary to point out some limitations. First, given that the sample comprises students of a single German university, it is difficult to assess the extent to which the results can be generalized. It can be argued that the theoretical mechanisms proposed in this study are general enough not to be limited to the context of the study. Yet, it is necessary to investigate how effects of the academic environment manifest themselves in different institutions and for different student populations, especially in cross-country comparisons. This is particularly important in terms of the policy recommendations derived from this study.

Next, future research should also consider the effects of a more varied set of methods of assessment and instruction by using more fine-grained data. Essays, examinations, seminars, and lectures are rather broad categories that subsume a range of approaches to assessment and teaching. Assessment via examination can take place in many different formats, from multiple-choice questionnaires to open-ended questions. One could argue that the negative effects of the examination format on perceptions of PJ-C should be smaller in the case of open-ended questions than for multiple-choice items. Likewise, the advantages of examinations in terms of PJ-V could be less pronounced for open formats when compared to a tightly structured approach. In the same vein, a binary representation of the instruction method cannot account for the possibility that some lectures engage students in interactive processes. Therefore, more differentiated analyses are necessary to provide a more accurate picture of student perceptions.

Finally, this study focused on structural characteristics of departments. Although the structure specifies the general direction of assessment and student–faculty interactions, it does not fully determine the outcome. Faculty actions are also guided by disciplinary norms and department-specific customs (Biglan 1973; Becher 1989; Neumann et al. 2002). These can counteract the tendencies suggested by the structural conditions. For example, while we found considerable between-department variance in evaluations of IJ, the larger part of these differences could not be explained by our department-level predictors. A possible explanation is that even though the academic environment defines the framework for the transmission of information, actual feedback practices might depend on disciplinary customs as well. We cannot rule out the possibility that some disciplines simply assign little value to extensive feedback, thus canceling out the benefits of an environment that should otherwise be favorable for perceptions of IJ. Likewise, essays might offer better opportunities to influence the grade, but this is of little use if this type of student involvement is generally frowned upon in a department. Future research should therefore also consider faculty perceptions of these matters. This would provide a broader and more differentiated knowledge base for policies that aim to reduce feelings of injustice and to close interdepartmental gaps in justice climates.