1 Introduction

In education, assessment can be defined as ‘a systematic collection, review, and use of information’ (Walvoord 2004) to acquire feedback about a student’s progress and achievements, the effectiveness of teaching and instruction, and the attainment of course outcomes (University of Tasmania (UTAS) 2011), while fulfilling the overall goal of improving student learning (Palomba and Banta 1999). In outcome-based education (OBE), such as vocational education and training (VET) or competency-based training (CBT), assessments also provide feedback about students’ attainment of the minimum standards essentially required for the workplace (Brady 1997, p. 10). Standards in such cases become the outcomes (Burke 2011) or, more correctly, ‘learning outcomes’, establishing what students should be able to demonstrate at the end of the learning period (Driscoll and Wood 2007). Students direct their learning efforts towards ‘outcome’ attainment, and assessors are guided on what they are supposed to measure via assessments. The evidence produced from assessments can be used by educators not only to improve teaching practices by identifying learning needs but also to meet accountability requirements by providing feedback to stakeholders on learners’ progress towards the achievement of standards (Brindley 1998).

Standards for the occupational practice of seafaring are provided through the Standards of Training, Certification and Watchkeeping (STCW) Code of the STCW Convention, which was introduced by the International Maritime Organization (IMO) in 1978 (then known as STCW’78). The STCW’78 was essentially knowledge-based, comprising a syllabus for a qualifying examination rather than focusing on the skills and abilities necessary to perform workplace tasks (Morrison 1997). The IMO revised the STCW Code through the 1995 amendments (since known as STCW’95), intending to fundamentally improve the training mandate by making it outcome-based. As a requirement of OBE, and for the purposes of certification and licensing, seafarers are required to demonstrate the achievement of the STCW standards through assessments.

Demonstration of attainment of competence that resembles workplace standards may require assessments that assess not only students’ progress against outcome attainment but also their ability to perform workplace tasks. Evidence produced through traditional assessment tasks such as multiple choice questions or oral examinations can provide indicators of students’ mastery of content knowledge but may not adequately capture the different aspects of a complex student performance resembling workplace tasks (Montgomery 2002). Such performance can be captured through assessment rubrics, which comprise individual and essential dimensions of performance, known as criteria, along with standards for levels of performance against those criteria (Jonsson and Svingby 2007). Rubrics involve creating a standard and a descriptive statement that illustrates how the standard is to be achieved (Cooper and Gargan 2009). Rubrics may report on outcome attainment, but the validation of attainment is achieved through the assessment process (Davis et al. 2007).

To determine if the intended outcomes have been achieved, and to collate evidence of the same, assessors need to decide whether the selected assessment methods will adequately allow for the evaluation and demonstration of students’ learning outcomes (Moskal 2000). The quality of the information on outcome attainment provided by rubrics will only be as good as the assessments on which the reporting is based (Brindley 1998). The ability to perform workplace tasks should be assessed through assessment methods that resemble professional scenarios. Hence, the fidelity of the assessment context to the conditions in which the professional skill would be applied becomes an important element of the assessment methods adopted. Such performance-based assessments applied in real-world contexts have often been described as authentic assessments (Herrington and Herrington 1998; Reeves and Okey 1996; Wiggins 1993; Meyer 1992).

However, fidelity of context alone cannot assure that the essential aspects and constructs of professional competencies are being accurately assessed. Assessments must also be valid and reliable. Validity refers to the extent to which the evidence produced through assessments supports the inferences made about a student’s competencies and whether such inferences are being interpreted in appropriate contexts (Moskal and Leydens 2000). Reliability, on the other hand, refers to the consistency of assessment scores obtained every time the same competencies are assessed, irrespective of the scorer, the time period between assessments, and the contextual and individual learning variables under which the assessments occur (Moskal and Leydens 2000). Rubrics provide clear statements of learning and performance expectations for both educators and students. Such statements can then be used by students, educators, and assessors to determine whether the intended outcomes were achieved. Hence, rubrics are highly regarded as tools that increase validity and reliability in assessments (Rezaei and Lovorn 2010; Jonsson and Svingby 2007; Silvestri and Oescher 2006).

This paper establishes the importance of using rubrics as an authentic assessment instrument for assessing outcomes that represent workplace tasks. Authentic assessment is defined by collating the characteristics used by major authors in the field. Validity and reliability are then established as the essential criteria by which researchers measure the effectiveness of assessment methods. Based on an extensive literature review in the area of authentic assessment, this paper explores the practices adopted in the past to improve the validity and reliability of authentic assessment when rubrics are used as an assessment instrument. The review uncovers a lack of a holistic approach to addressing both the validity and reliability aspects of authentic assessment and an absence of global research on authentic assessment in the field of seafarer education and training.

2 Definitions

2.1 Authentic assessment

The idea of ‘authenticity’ in education was conceived and developed in response to increasing accountability to stakeholders. The movement started in the 1980s in the high schools of the USA. The term ‘authentic’ was first linked to student achievement by Archbald and Newmann (1988), requiring students to demonstrate outcomes beyond the school learning environment in an applied/work context. Wiggins (1989) related the term to student assessment, promoting authentic assessment as a process that required student performances (Wiggins 1990) at the standards expected in the professional field. Unlike traditional tests that produced transcripts with unclear information about actual competence, evidence of student performance at workplace standards would improve accountability to stakeholders.

Authentic assessment is often used interchangeably with performance assessment as it shares some of the characteristics of the latter, but the two are not synonymous (Marzano et al. 1993). For example, all authentic assessments require a performance of some kind, but not all performance-based assessments are conducted in authentic or real-world contexts (Meyer 1992). Palm (2008) provides a detailed classification describing the similarities and the wide range of differences between the meanings of each concept. Authentic and performance assessments are known as types of ‘alternative assessments’ to traditional assessments (Dikli 2003). Traditional assessments include pen and paper testing, multiple choice questions (MCQs), and oral examinations. Cumming and Maxwell (1999) show that characteristics of authentic assessment can also be found in other assessments, such as problem-based and competency-based assessments, but provide a clear distinction between them. For example, they explain that authentic assessment is based on theories of learning where the performance of tasks occurs in genuine workplace or contextually similar situations. Competency-based assessments, on the other hand, are based on the theory of vocational education, where assessment tasks should represent workplace tasks but can be performed in individual components and not necessarily integrated into one holistic task. Authentic assessments have also been called dynamic assessments (Chance 1997; Butler 1999) due to their dynamic nature of evolving to address student learning needs.

This paper defines authentic assessment by collating the characteristics provided by the most commonly cited authors in the area (Table 1). The citation counts for the individual papers were obtained from Google Scholar.

Table 1 Characteristics of authentic assessment defined by most commonly cited authors

Based on the characteristics provided in Table 1, authentic assessment herein will encompass: tasks resulting in outcomes in a real-world context that require an integration of competence to solve forward-looking questions and ill-structured problems; processes that require performance criteria to be provided beforehand and evidence of competence to be collected by the student; and outcomes that result in valid and reliable student performance, contextual and multiple evidence of competence, higher student engagement, and transfer of skills to different contexts.

2.2 Rubrics

Rubrics (an example is shown in Table 2) are assessment tools that comprise individual and essential dimensions of performance, known as criteria, along with standards for levels of performance against those criteria (Jonsson and Svingby 2007). Although the terms ‘criteria’ and ‘standards’ are sometimes used interchangeably, they have distinct meanings (Sadler 2005). The definitions provided by Sadler (2005) and Spady (1994) provide a robust basis for distinguishing the terms. Standards are defined as levels of definite attainment and sets of qualities established by authority, custom, or consensus by which student performance is judged, whereas criteria are the essential attributes or rules used for judging the completeness and quality of standards. Table 2 provides an example of how a rubric may be designed for the unit of competence ‘Prevent, control, and fight fires on board’ at the operational level from the STCW’95 Code. The move of seafarer training to OBE has shifted the emphasis to the demonstration of competence, requiring that the intended learning outcomes (ILOs) be established and communicated to students beforehand to make the learning process transparent (Biggs and Tang 2007). As assessment rubrics communicate standards and the feedback for their achievement, they are an essential tool for OBE (Reddy 2007).

Table 2 Example of how a rubric may be constructed for the STCW unit of competence of ‘Prevent, control, and fight fires on board’ at the operational level
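To make the structure of such a rubric concrete, the sketch below (in Python) represents a rubric as a mapping from criteria to descriptors for each level of performance. It is a minimal illustrative sketch only: the criteria, performance levels, and descriptors are hypothetical placeholders and do not reproduce the STCW wording or the content of Table 2.

```python
# A minimal sketch of a rubric as a data structure. All criteria and
# descriptors below are hypothetical placeholders, not STCW wording.

PERFORMANCE_LEVELS = ["Not yet competent", "Competent", "Highly competent"]

# Each criterion maps to one descriptor per performance level.
FIREFIGHTING_RUBRIC = {
    "Identifies type and source of fire": [
        "Cannot classify the fire or locate its source",
        "Classifies the fire and locates its source with prompting",
        "Classifies the fire and locates its source promptly and unaided",
    ],
    "Selects appropriate extinguishing medium": [
        "Selects a medium unsuited to the fire class",
        "Selects a suitable medium for common fire classes",
        "Selects and justifies the optimal medium for any fire class",
    ],
    "Communicates and coordinates with the fire party": [
        "Communication is absent or unclear",
        "Maintains adequate communication with the fire party",
        "Leads the fire party with clear, closed-loop communication",
    ],
}

def report(observed):
    """Translate an assessor's level index per criterion into the
    level name and descriptor that would appear on the rubric."""
    for criterion, level in observed.items():
        descriptor = FIREFIGHTING_RUBRIC[criterion][level]
        print(f"{criterion}: {PERFORMANCE_LEVELS[level]} - {descriptor}")

# Example judgement for one (hypothetical) student.
report({
    "Identifies type and source of fire": 2,
    "Selects appropriate extinguishing medium": 1,
    "Communicates and coordinates with the fire party": 2,
})
```

The key design point the sketch illustrates is that every criterion carries a descriptor for every level, so the same statements of expected performance are available to the student beforehand and to the assessor at scoring time.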

Without rubrics, students have no guidelines for directing their efforts towards achievement or for understanding the teacher’s feedback comments (Montgomery 2002) on the outcomes achieved. For example, using a focus group discussion involving fourteen undergraduate students, Andrade and Du (2005) found the use of rubrics to be very effective in providing performance expectations and feedback about the achievement of standards in teacher education. However, using rubrics to communicate the standards achieved by students in professional education also requires assessment methods, such as authentic assessment, that can capture such standards.

Traditional assessments such as multiple choice questions and oral examinations assess the ability to recall facts and some applied skills (Archbald 1991) but fail to assess essential behaviour-based attributes (Wiggins 1992). An individual must develop such behavioural attributes along with the technical skills and knowledge that together define professional competence (Sampson and Fytros 2008). Assessment of professional competence can be captured through authentic assessment tasks based on meaningful contexts and applied in real-world settings or settings contextually resembling the real world. However, professional competence is developed and assessed under specific contexts in educational settings. Transfer from the competence to perform individual components of a task to a holistic performance of the task, where integration of competence is required, cannot be assumed (Cumming and Maxwell 1999). According to Cumming and Maxwell (1999), learning and assessment need to be contextualised to make them relevant and meaningful for students. Meaningful context can not only provide motivational benefits to student learning but also a clear understanding of the learning that can or cannot be transferred to different contextual scenarios. If real-life contexts and complexities (task-centred approach) cannot be created in assessments, the assessments should then focus on the selected constructs (construct-centred approach) of knowledge and skills (Messick 1994). For example, assessments designed in maritime education and training (MET) institutes may not be able to assess a student’s competence to manage large crowds as is required on passenger ships, but they may be designed to assess this competence through the student’s ability to analyse the risks associated with such management or to develop crowd management plans. Although such assessments may take place in controlled situations, the authenticity will be reflected in the ways the same skills would be applied in real-life contexts (Messick 1994). The standard of learning achieved in real-world contexts may be communicated via rubrics, making the rubric an important authentic assessment instrument for assessing outcomes that represent workplace tasks.

3 Authentic assessment

3.1 Aligning authentic assessment with rubrics

One of the key characteristics of authentic assessment is the requirement to provide performance criteria to students beforehand, which can be done through the use of rubrics. The provision of clear expectations of the standards of performance via rubrics allows students to learn, and educators to adopt appropriate instructional strategies, to guide students towards the achievement of the desired outcomes (Archbald 1991). A summative examination at the end of the learning period represents the final judgement of the students’ performance and is often too late to allow any changes to the learning strategies. Authentic assessment methods that are based on the ongoing use of formative assessments may be more suitable for providing diagnostic feedback and making adjustments to improve the learning process (Burke 2011).

Hence, the alignment of the learning, teaching, and instruction process towards the achievement of outcomes creates constructive alignment (Biggs and Tang 2007). Constructive alignment derives from constructivist theory (Biggs and Tang 2011), in which the student is not a mere receiver of knowledge but is actively involved in its construction while progressing in learning. Newmann et al. (1995, 1996) and Cooperstein and Kocevar-Weidinger (2004) connected authentic assessment to the constructivist way of learning. Although the principles of constructivism allow everyone to construct meaningful learning, Newmann et al. (1996) argued that the high intellectual standards provided through rubrics in authentic assessment can promote the construction of knowledge and meaning that leads to superior learning and performance, requiring students to use higher-order cognitive skills. In the current educational environment of the twenty-first century, assessments should capture not only content knowledge or professional skills but also the higher-order skills (Burke 2011) of problem-solving, critical thinking, leadership, and team-working. According to Wiggins (1989), assessments should not only monitor standards but also set them to reveal the achievement of higher-order skills, which may not be quantifiable but are a necessity in a work context. Traditional assessments are not always performance-based, nor can they always be creatively designed to encourage the demonstration of higher-order skills. For example, a study by Brawley (2009) that involved the authentic assessment of twenty-four students in early childhood showed that authentic assessments, when designed properly, are a better way to determine the higher-order thinking skills (as defined by Bloom’s taxonomy) required to complete a task. Correctly creating an authentic experience for students thus becomes central to designing authentic assessment.

3.2 Validity and reliability of authentic assessment

Advances in technology such as simulators, web-learning, and multimedia have allowed many researchers (Neely and Tucker 2012; Neo et al. 2012; Osborne et al. 2013; Scholtz 2007) to use such technology in the area of authentic assessment to create authentic experiences that can replicate real-world tasks for students. However, Messick (1996) was not convinced that authentic assessments can ever fully represent real-world tasks in educational settings. Messick believed assessments are prone to threats to validity, which concerns the appropriateness of assessment tasks as effective measures of the intended learning outcomes (Rhodes and Finley 2013). The high fidelity of authentic assessments to real-world contexts does not necessarily lead to the conclusion that they are more valid than traditional examinations. Assessment methods should instead be judged by the established criteria for the technical adequacy of measures, key among which are the concepts of validity and reliability (Linn et al. 1991).

Validity and reliability are crucial to the acceptance of authentic assessment (or rubrics as an assessment tool) as an accurate measure of knowledge, skills, and behaviours (Stevens 2013). There are numerous extraneous variables that affect the validity and reliability of rubrics when used as an assessment instrument (Taylor 2011). If these variables are not addressed, then the validity and reliability of the assessment and the resulting outcomes become questionable (Olfos and Zulantay 2007).

3.3 Validity and reliability of rubrics

In the area of education, validity is seen not as a property of the assessment itself but of how the results are interpreted (Jonsson and Svingby 2007). Validity refers to the degree to which the evidence produced from assessments supports the interpretations made about a student’s competencies. Table 3 describes the three types of evidence that are commonly examined to support the validity of an assessment instrument: content, criterion, and construct (Moskal and Leydens 2000).

Table 3 Three types of evidence commonly examined to support the validity of an assessment
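As an illustration of how criterion-related evidence might be examined in practice, the sketch below correlates rubric scores with an external criterion measure. This is a minimal sketch under stated assumptions: the scores are fabricated placeholders, and supervisor workplace ratings are merely one hypothetical choice of criterion measure.

```python
# Minimal sketch: criterion-related validity examined by correlating
# rubric scores with an external criterion measure (here, hypothetical
# workplace supervisor ratings). All numbers are illustrative only.
import numpy as np

rubric_scores = np.array([72, 85, 60, 90, 78, 66, 88, 74])
supervisor_ratings = np.array([70, 80, 58, 92, 75, 70, 85, 72])

# A high positive Pearson correlation is one piece of evidence that
# the rubric predicts performance on the criterion measure.
r = np.corrcoef(rubric_scores, supervisor_ratings)[0, 1]
print(f"Criterion-related validity coefficient: r = {r:.2f}")
```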

It is extremely difficult to construct an assessment which is truly valid in measuring what it is supposed to measure (Finch 2002). For example, an assessment designed to assess a student’s ability to fight fires may not be able to effectively measure the personal or professional behaviours (such as creativity and critical thinking) associated with the task performance. According to Messick (1996), it is hard for assessments to achieve complete validity, but he believed that the threats to validity can be minimised by ensuring that assessments do not contain anything that is irrelevant to the measurement of the desired outcomes. For example, assessments designed to assess a student’s ability to fight fires should not include pen and paper testing in classrooms, which is irrelevant to the measurement of either the task performance or the behaviours associated with it.

Does this mean that relevant and authentic scenarios can ensure validity?

Capturing a more authentic performance does not ensure validity (Stevens 2013). For example, Hoepfl (2000) pointed out that creating standards for authentic assessments is a challenging task, which may suffer from ‘construct underrepresentation’ if the standards fail to assess essential dimensions of knowledge and skills, or from ‘construct-irrelevant variance’ if the standards require tasks that are not relevant to measuring the desired competencies (Messick 1995). Assessments are valid if they effectively measure the intended learning outcomes they were designed to assess. Whether assessments effectively measure the intended learning outcomes cannot be based on the subjective judgement of whether the questions appear to do so, known as face validity (Drost 2011). Drost (2011) explains that although face validity is important for credibility with stakeholders, it is the weakest and least scientific form of establishing validity for assessments.

For effective measurement, outcomes should be accompanied by the essential criteria and the levels of performance by which the performance will be judged (Mueller 2005). The criteria and the levels are usually combined into a rubric, which forms a scoring guide for the assessment, making it easier for educators to define what is being measured through assessments and how the score is to be interpreted (Emery 2001). Scoring without specific guidelines may lead to subjective judgements. Rubrics can be used to improve the objectivity of scoring by specifying the same criteria and standards to be applied to all students’ work, whether scored by individual or multiple assessors (Dennison et al. 2015). For example, according to Jonsson and Svingby (2007), one widely cited effect of rubrics in the areas of authentic and performance-based assessment is the consistency of judgement and scoring across students, tasks, and different raters (scorers). The consistency of assessment scores obtained every time the same competencies are assessed, irrespective of the scorer, the time period between assessments, and the context under which the assessments occurred, is referred to as reliability (Moskal and Leydens 2000). Table 4 provides the different types of reliability testing conducted in the area of education.

Table 4 Different types of reliability testing used in student assessments
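As an illustration of one of the reliability checks in Table 4, the sketch below computes Cohen’s kappa, a common statistic for inter-rater reliability between two assessors applying the same rubric levels. It is a minimal sketch: the ratings are hypothetical placeholders, not data from any cited study.

```python
# Minimal sketch: inter-rater reliability via Cohen's kappa, i.e.
# agreement between two assessors corrected for chance agreement.
# The ratings below are hypothetical placeholders.
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters over the same set of students."""
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    # Probability that both raters pick the same level by chance.
    expected = sum(freq_a[level] * freq_b[level] for level in freq_a) / n**2
    return (observed - expected) / (1 - expected)

# Rubric levels (0-2) awarded by two assessors to eight students.
rater_1 = [2, 1, 2, 0, 1, 2, 1, 0]
rater_2 = [2, 1, 1, 0, 1, 2, 2, 0]
print(f"Cohen's kappa = {cohens_kappa(rater_1, rater_2):.2f}")  # ~0.62
```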

Ideally, an assessment should produce similar results independent of the scorer and the context of the assessment. But is this attainable?

The more consistent the scores are over different scorers and contexts, the more reliable the assessment is thought to be. Methodologically sound assessment instruments should have acceptable levels of both validity and reliability (Rhodes and Finley 2013). For example, the study by Vendlinski et al. (2002) used rubrics to authentically assess 134 first-year high school chemistry students and achieved valid inferences of students’ content understanding while not allowing the scores to be affected by gender, ethnic, or socioeconomic bias.

The validity of the results and the strength of the rubric as an assessment instrument are evidenced by positive results on a variety of reliability tests (Diller and Phelps 2008). Performance-based assessments like authentic assessment face the problem of obtaining reliability (Lynch 2003). Issues such as a lack of reliability, inconsistency in assessment design and grading, and the potential for grading bias remain important challenges for authentic assessment (Rhodes and Finley 2013). Authentic assessments represent real-world tasks as valid indicators of workplace competence, which should be consistent irrespective of the context or scorer. Such consistency can only be demonstrated through reliability testing. Hence, authentic assessments should achieve both validity and reliability.

Because it can be difficult to establish whether an assessment instrument truly captures the outcome for which it is intended, or whether the outcome can be consistently measured, it is preferable for instruments to demonstrate more than one type of validity (Rhodes and Finley 2013) and reliability. Numerous aspects of validity and reliability have been investigated and reported in the literature on assessment. They may be discussed selectively, but none should be ignored (Jonsson and Svingby 2007). Although rubrics do not by themselves make an assessment valid, addressing the different aspects empirically could make assessments more valid and reliable for their intended purpose of eliciting the required performance (Jonsson 2008). There is sparse research focussing on the quality of rubrics as a valid and reliable assessment tool (Stellmack et al. 2009). Hence, a literature review in the area of authentic assessment was carried out to reveal whether a holistic approach to improving its validity and reliability through rubrics has been used by past researchers in the area.

4 Classification of literature

The classification is based on a review of 124 articles, which included books, chapters in books, conference papers and proceedings, government documents, journal articles, reports, theses, and other articles classified as generic. The articles were chosen after a web-based search using popular search services such as Google and Google Scholar as well as the library database of the University of Tasmania. The University of Tasmania subscribes to popular search systems such as ProQuest and Web of Science, which enabled the search for articles to be widened. Articles were also found by the snowballing technique, based on a search through the citations in articles discovered through the online search. The online search used the phrases ‘authentic assessment’, ‘authenticity in assessment’, and ‘authentic+assessment’. Hence, all reviewed articles contain both the words ‘authentic’ and ‘assessment’ or ‘authenticity’ and ‘assessment’, the exceptions being the articles by Wiggins (1998) and BoarerPitchford (2010). The former was chosen because Wiggins is the most cited author in the area of authentic assessment, and the latter was selected for its discussion of authentic assessment. The articles span from 1989 (when authentic assessment was first introduced) to 2015 (when this paper was being written). An effort was made to obtain as many articles as possible through the above methods.

The purpose of the classification was to highlight the different types of validity and reliability demonstrated in past research, when authentic assessment was implemented with the use of rubrics. As a result, articles where authentic assessment was implemented without the use of rubrics were excluded from the classification. Table 5 provides a snapshot of the criteria used for the inclusion and exclusion of articles from the classification.

Table 5 The criteria used to select articles for classification

The articles included in the classification were reviewed (Table 6) to investigate the extent of the validity and reliability testing of rubrics used in the past as an authentic assessment instrument by researchers for student assessments in various areas of education and training.

5 Gaps found from the literature classification

The intention of the literature classification was to find out the extent of the investigation that has been carried out into testing the validity and reliability of rubrics as authentic assessment tools. Reliability and validity problems are found to be very typical of authentic assessment (Olfos and Zulantay 2007). It is often assumed that reliability is achieved concurrently with validity, due to which it may be ignored, or accepted at low levels, in traditional assessments (Olfos and Zulantay 2007). This was evident in the study by Olfos and Zulantay (2007), which showed evidence of validity but a lack of reliability. Reliability is thus often accepted as a necessary condition of validity (Olfos and Zulantay 2007). However, in the case of authentic assessment, reliability cannot be ignored or accepted at low levels as a trade-off between validity and reliability (Jonsson 2008). Reliability mainly indicates consistency of performance, which is essential for workplace-based tasks.

The most obvious gap found in this respect reflects an absence of both validity and reliability testing in some studies, such as Todorov and Brousseau (1998), Emery (2001), Vendlinski et al. (2002), and Brawley (2009). As noted above, validity and reliability are crucial to the acceptance of authentic assessment as an accurate measure of knowledge, skills, and behaviours (Stevens 2013), and unaddressed extraneous variables make the assessment and the resulting outcomes questionable (Taylor 2011; Olfos and Zulantay 2007). Fook and Sidhu (2010) believe that there is a general lack of research exploring practices that can improve the validity and reliability of assessments through the criteria and standards provided in rubrics. The classification reveals that past research in the area of authentic assessment has typically addressed only one or two aspects of validity and reliability while others have not been investigated. Validity was mostly achieved through a review by field experts, as evident in the studies by Moon et al. (2005), Fatonah et al. (2013), Olfos and Zulantay (2007), Johnson (2007), Taylor (2011), and Lang II (2012). Barring one study by Jonsson (2008), none of the studies in the classification demonstrated construct validity. A lack of construct validity may indicate that underlying psychological variables such as problem-solving, social interaction, and communication, which are required universally in most professions, were not adequately assessed in these cases.

Some studies revealed other types of validity, such as face and convergent validity, which were not categorised under the three common types of evidence required to support the validity of an assessment instrument. While face validity is the weakest and least scientific form of establishing validity, convergent validity was explained by Cassidy (2009) as a subcategory of construct validity that seeks ‘agreement between a theoretical concept and a specific measuring instrument’. The review revealed that some researchers, like Cassidy (2009), use a pre-tested instrument expecting the same validity and reliability as obtained in previous studies. However, when using a pre-existing instrument, it is essential for researchers to establish the instrument’s validity and reliability in the context of their own research (Burton and Mazerolle 2011).

A common method for establishing the reliability of rubrics is revealed to be inter-rater scoring or internal consistency reliability. Reliability in authentic assessments has often been demonstrated by a variety of statistical measures and coefficients, as evidenced by the studies of Johnson (2007), Lang II (2012), Olfos and Zulantay (2007), and Diller and Phelps (2008). According to Lovorn and Rezaei (2011), simply using rubrics does not improve the reliability of the assessment. Reliability can only be improved if rubric users are well trained in their development and use. Raters/scorers need to be involved in the development of rubrics, or else it takes time for them to understand their purpose and implementation (Diller and Phelps 2008). For example, the study by Lovorn and Rezaei (2011) involved the training of 55 teachers in rubric use and found a resulting increase in the reliability of writing assignments. However, many of the studies, such as Moon et al. (2005), Olfos and Zulantay (2007), and Diller and Phelps (2008), do not mention any training for rubric users before the rubrics were administered. In the study by Taylor (2011), teacher development workshops were carried out to minimise threats to internal validity only. However, according to Taylor (2011), training conducted for rubric development or use should be consistent for all involved. Differing approaches in terms of context, standards, or application can impact the results of research data and create problems with validity.
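To illustrate the kind of coefficient such studies report for internal consistency, the sketch below computes Cronbach’s alpha across rubric criteria. It is a minimal sketch: the score matrix is a hypothetical placeholder, not data from any of the cited studies.

```python
# Minimal sketch: internal consistency via Cronbach's alpha across
# rubric criteria. Rows are students, columns are criterion scores;
# all numbers are hypothetical placeholders.
import numpy as np

scores = np.array([
    [3, 4, 3, 4],
    [2, 2, 3, 2],
    [4, 4, 4, 5],
    [1, 2, 2, 1],
    [3, 3, 4, 3],
])

k = scores.shape[1]                          # number of criteria (items)
item_variances = scores.var(axis=0, ddof=1)  # variance of each criterion
total_variance = scores.sum(axis=1).var(ddof=1)  # variance of total scores
alpha = (k / (k - 1)) * (1 - item_variances.sum() / total_variance)
print(f"Cronbach's alpha = {alpha:.2f}")
```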

The classification also reveals an absence of research on authentic assessment in the field of seafarer education and training. Past research (Bell and Bell 2003; Cassidy 2009; Wellington et al. 2002) shows that authentic assessment has been implemented to investigate its impact on the achievement of educational or professional standards, the constructive alignment of instruction processes with assessment, and the achievement of professional competence (including the demonstration of essential behaviours). Similar research is needed but has been largely ignored in the area of seafarer education.

6 Conclusion

The move of the STCW’95 Code towards OBE highlights the need for assessment practices that allow the demonstration of learning outcomes by seafarer students through performances in the real-world or contextually similar settings provided by authentic assessment.

To validate whether intended outcomes are being measured consistently through assessments, authentic assessments need to achieve validity and reliability through the clear statements of learning expectations provided by assessment rubrics. The validity and reliability of the rubric are essential not only for the validation of outcome attainment but also for the rubric to be accepted as an instrument of authentic assessment that can effectively measure outcomes. An extensive literature review in the area of authentic assessment revealed a lack of research taking a holistic approach to addressing the different aspects of validity and reliability of rubrics when used as an authentic assessment instrument. The absence of a robust framework challenges and undermines the outcomes of the learning and teaching experience attained by past researchers, whose findings were based on rubrics that addressed only selected aspects of validity and reliability. While addressing the different aspects of validity will identify and assess the content and essential underlying constructs of professional competence in different contextual scenarios, the different aspects of reliability will assure consistency in performance. Overall, this will ensure a holistic approach to competence assessment at the standard expected in employment.

Past research provides theoretical justification and empirical evidence of the value of authentic assessment when educators are seeking to:

  1. Obtain evidence of the development and achievement of professional competence,

  2. Raise the standards of student performance and achievement,

  3. Measure the effectiveness of the teaching and learning,

  4. Develop higher-order and critical thinking skills in students, and

  5. Successfully align learning, teaching, and instruction with assessment.

The above outcomes, together with a holistic approach to competence assessment, will also benefit seafarer education and training. While knowledge-based components may continue to be assessed via traditional examinations, the application of skills in real-world contexts will engage seafarer students through meaningful and relevant learning. Authentic assessments will go beyond meaningful contexts and also require seafarer students to integrate the competence acquired for different STCW tasks into a holistic workplace-based performance. For example, assessment for the STCW task of ‘plan and conduct a passage and determine position’ may be designed to integrate components from other STCW tasks such as ‘maintain a safe navigational watch’, ‘use of ECDIS to maintain the safety of navigation’, and ‘manoeuvre the ship’. Assimilating, analysing, and integrating information from different units of competence will make seafarers active participants in the process of learning and enhance student engagement. Demonstrating competence in authentic contexts will provide seafarer students with an understanding of how skills acquired in classrooms may be transferred to the workplace. Using pre-established performance criteria, students will frequently reflect on their current level of learning and compare it with the level required at the workplace, allowing them to develop strategies for raising their standards of performance.

The review reveals that there is a lack of global research on authentic assessment in the field of seafarer education and training. Further research needs to establish how to use authentic assessment within the confines of the STCW Code to improve:

  1. Student engagement,

  2. Transfer of competence, and

  3. Standards of performance.

Inherent in such future research, investigations should also reveal ways to:

  1. Increase the validity and reliability of rubrics as an authentic assessment instrument and

  2. Use rubrics as an authentic assessment instrument to satisfy employer and regulator expectations with the attainment of the standards stipulated in the STCW Code.