1 Introduction

Recently, health professional curricula have emphasized competencies that are pertinent to each country, based on the nature and needs of its population. In Saudi Arabia, these local competencies (learning outcomes) were published as the Saudi Meds [1]. The Saudi Meds is a national competence framework developed by the medical schools of the Kingdom of Saudi Arabia. The framework has seven domains: (1) approach to daily practice, (2) doctor and patient relation, (3) doctor and community, (4) communication skills, (5) professionalism, (6) doctor and information technology, and (7) doctor and research. The framework is intended to guide curriculum development and assessment across all health professional education and to ensure their adaptation to changing needs. The following sections address assessment and evaluation of these domains.

2 Strategic Goal 10: To Develop a Comprehensive Approach to Students’ Assessment that Addresses All Educational Domains Including Knowledge, Skills, and Attitudes/Values

This goal, along with the other goals in this manual, was developed through several meetings and workshops attended by a select group of faculty educators and students representing almost all HSCs at KSU. Assessment of learning outcomes encompasses several issues, including an understanding of the principles of assessment, appropriate use of assessment methods and tools, and a comprehensive approach to assessment that covers the full range of educational domains. Effective assessment must consider the psychometric properties of the examination, i.e., it should be valid, reliable, and feasible and should have a measurable impact on learning through quality indicators. There are several assessment methods and tools to measure learning outcomes; however, each tool is appropriate only for the context it is intended to measure, so no single method is appropriate for all domains of learning outcomes. Assessment methods include written examinations, practical and clinical examinations, direct observation, portfolios, peer assessment, and self-assessment. Learning outcomes include knowledge, skills, attitudes, and values. These learning outcomes differ from one institution to another and from one environment to another depending on community needs [2]. Miller's pyramid relates the learning outcome domains to the specific tools and methods used to assess them (Fig. 18.1). The following initiatives are proposed to achieve this goal:

Fig. 18.1

Miller's pyramid to assess clinical competence. From top to bottom, the levels are does, shows how, knows how, and knows, corresponding to assessment in the work environment, assessment in controlled situations, assessment of the capacity to apply knowledge in a clinical context, and tests of factual recognition.

2.1 Objective (Initiative) 10.1: To Develop Comprehensive Assessment Approaches for Courses that Address All Learning Domains

This initiative addresses the core issue of assessment: the product, expressed as students' learning outcomes. Most health professions educational programs adopt several methods to assess students' learning outcomes in knowledge, skills, and attitudes/values. However, programs vary in the type and number of assessment methods used, depending on staff experience with assessment, the presence or absence of health professions education departments or centers, and accreditation requirements imposed by higher authorities. Ideally, assessment methods should address the achievement of all learning objectives in a valid, reliable, and feasible manner, with an impact on the learner and the educational program [3, 4]. Validity is the degree to which an assessment tool measures what it is supposed to measure; strictly speaking, it is a property of the scores and their interpretation rather than of the instrument itself. Validity can be further broken into three sub-types. The first is content validity, which reflects how well the test samples the different learning domains of the subject being assessed. This is the most important type of validity and can be achieved through appropriate construction of an assessment blueprint [5]. The second is criterion validity, which compares test scores against a criterion or gold standard. The third is construct validity, the extent to which additional evidence supports the claim that the instrument measures the underlying construct it purports to measure. Reliability refers to the consistency, reproducibility, or stability of test scores upon repetition. The degree to which a group of expert examiners agree, or come to close agreement, about an examinee is called inter-rater reliability. Reliability can be estimated by test–retest, equivalent forms, split-half, and item-to-total score comparisons (internal consistency). In practice, reliability is usually computed with commercially available software, especially for multiple-choice question tests. A reliability coefficient above 0.7 for regular examinations, or 0.8 and higher for high-stakes examinations, is generally considered acceptable. For a test to be valid it must also be reliable, but the reverse is not necessarily true (Fig. 18.2).

Fig. 18.2

Reliability and validity targets. Four target diagrams illustrate the possible combinations: unreliable and invalid, unreliable but valid, reliable but invalid, and both reliable and valid.
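
To make the internal-consistency coefficient discussed above concrete, the short sketch below computes Cronbach's alpha for a matrix of dichotomously scored MCQ items (rows are examinees, columns are items). It is a minimal illustration only: the data and function names are hypothetical, and in practice the same coefficient is usually produced automatically by exam-analysis software.

```python
import numpy as np

def cronbach_alpha(item_scores: np.ndarray) -> float:
    """Cronbach's alpha for an examinee-by-item score matrix (0/1 for MCQs)."""
    k = item_scores.shape[1]                              # number of items
    item_variances = item_scores.var(axis=0, ddof=1)      # variance of each item
    total_variance = item_scores.sum(axis=1).var(ddof=1)  # variance of examinees' total scores
    return (k / (k - 1)) * (1 - item_variances.sum() / total_variance)

# Hypothetical scores: six examinees answering five MCQ items (1 = correct, 0 = incorrect)
scores = np.array([
    [1, 1, 1, 0, 1],
    [1, 1, 0, 0, 1],
    [0, 1, 1, 0, 0],
    [1, 0, 1, 1, 1],
    [0, 0, 0, 0, 1],
    [1, 1, 1, 1, 1],
])

# Coefficients above 0.7 (0.8 for high-stakes examinations) are usually considered acceptable
print(f"Cronbach's alpha = {cronbach_alpha(scores):.2f}")
```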

Feasibility refers to the practicality of the assessment method with regard to available resources, expertise, and costs. The impact of assessment on the learner and the educational program varies from one assessment system to another. Since assessment drives learning, learners will try to pass examinations in whatever way the examinations reward, e.g., memorization, last-minute studying, review of previous examinations, guessing, or even cheating. To have a positive impact on learners, educators should ensure the validity of the assessment content, the way it is conducted, what is asked (and what information is given), and the timing and frequency of continuous assessment sessions. Once these psychometric concepts are understood, they should be applied to all testing methods to ensure the best assessment and learning outcomes. Learning domains include knowledge, skills, and attitudes/values.

Details of this initiative are summarized in Table 18.1. Reports of the needs assessment from Goal 9, initiative 9.3, "Develop a comprehensive approach looking at the entire students' process (input–process–outcomes) and document issues arising during studies," are reviewed by the Assessment Steering Committee, and areas for improvement in assessment systems are highlighted. Assessment systems are then benchmarked against best practices in order to develop a comprehensive assessment handbook through several workshops, concluding with a final workshop that includes all HSCs and stakeholders. The handbook must be aligned with the NCAAA and similar international assessment guidelines. Once the final draft of the handbook is reviewed and approved, it can be implemented and monitored, with supporting research studies to validate it.

Table 18.1 Strategic plan for developing comprehensive assessment approaches for courses

2.2 Objective (Initiative) 10.2: To Develop Guidelines for Comprehensive Assessment Across All Learning Domains

This initiative (Table 18.2) discusses how the assessment handbook and guidelines are to be implemented by all HSCs. This requires considerable effort from the parties responsible for this initiative and their partners to train HSC faculty and administrators in every step and detail of the assessment handbook so that they are ready for successful implementation. Once the approved guidelines manual is ready, it will be published by KSU Press and distributed as a reference to the teaching and learning (medical education) units/departments of all HSCs. At this stage, the assessment guidelines manual will be ready to be implemented at all levels of health professional education. Before implementation, however, knowledgeable and experienced educators at all HSCs will draft a strategic plan for implementing the manual through a one-day workshop led by the Assessment Steering Committee at the VRHSs. Each T&L unit/medical education department at the corresponding HSC will then conduct a half-day seminar introducing the assessment manual to all faculty, students, and administrative representatives who are actively involved in course coordination and students' assessment. When the manual is to be implemented, the teaching and learning unit (medical education department) in each HSC will conduct short workshops on the various assessment tools with course coordinators, faculty, students, and administrative representatives in each department. Monitoring and measuring the effectiveness of the implementation process is also of paramount importance to assure the best learning outcomes. Evaluation of the assessment methods themselves is another issue that deserves more attention from educators and statisticians, who should analyze examination results and assure their reliability and validity. This will improve examination item writing, administration, and results. Moreover, it will assure fairness and equity among students, who always look for evidence of their performance in examinations. Most health science programs have assessment centers where assessment training and workshops are conducted and examinations are reviewed before they are administered. In addition, assessment centers can monitor and supervise examinations, analyze examination results, give feedback reports to the various departments, and publish research on assessment and learning outcomes. The implementation process may take one full academic year, depending on the degree of authority and support from the VRHSs, the knowledge and skills of the personnel involved, and the enthusiasm and cooperation of faculty. An estimated budget of 26,000 USD will be needed for the initiative's resources, personnel incentives, and rewards for those who cooperate and compete for excellence. The KPIs of this initiative acknowledge that not all HSCs will be ready to implement the assessment manual; therefore, about 75% of the HSCs are expected to participate.

Table 18.2 Strategic plan for developing guidelines for comprehensive assessment across all learning domains

3 Discussion

Health professional education has undergone dramatic changes over the last century through four stages of innovation. Drawing on the scientific approach to health professional education that was commonplace in Europe at the end of the nineteenth century, the Flexner era at the beginning of the twentieth century [6] was noted for the idea of teaching the basic sciences as the foundation of clinical sciences and practice. In the 1970s, problem-based learning was strongly promoted in an attempt to integrate basic, clinical, and social sciences through the use of problem scenarios [7]. Competency/outcome-based curricula became popular around the turn of this century. Learning outcomes vary from one country to another and from one institution to another depending on societal and political needs [8]. In North America, broad learning objectives and learning outcomes were recommended by the Association of American Medical Colleges (AAMC) and the Accreditation Council for Graduate Medical Education (ACGME), respectively [9, 10]. Canadians also developed their own (CanMEDS) competencies [11]. The WHO—International Institute of Medical Education produced a consensus of learning outcomes as minimum essential requirements for medical school graduates [12]. The Scottish Deans Medical Curriculum Group [13] adopted a framework of outcomes based on a three-circle model: what the doctor is able to do, the doctor's approach to practice, and the doctor's professional attributes (Table 18.3).

Table 18.3 Recommended assessment methods for the 12 learning outcomes in order of importance

In the Kingdom of Saudi Arabia, the Executive Committee for the SaudiMED Framework developed six learning domains or themes with seventeen learning outcomes, which have been adopted by all medical schools: scientific approach to practice; patient care; community-oriented practice; communication and collaboration; professionalism; and research and scholarship [14]. Worldwide, however, assessment of such competencies/outcomes has not kept pace with these innovations in curriculum development [10]. Assessment of learning outcomes in fact encompasses several issues, including an understanding of the principles of assessment, appropriate use of assessment methods and tools against the desired competency/outcome, and a comprehensive approach to assessment that covers the full range of educational domains. Effective assessment must consider the psychometric properties of the examination; that is, it must be valid, reliable, and feasible and must have a measurable impact on learning outcomes through quality indicators. These metrics safeguard the appropriateness and quality of examination methods; without them, examinations tend to be of low quality and their products weak. Low performance on common placement tests such as progress testing [15] may indirectly indicate poor performance of health professional schools and/or low-quality examinations. There are several assessment methods and tools to measure learning outcomes; however, each tool is appropriate only for the context it is intended to measure, so no single method is appropriate for all domains of learning outcomes. Investment in good assessment is also an investment in teaching and learning [16]. Shumway and Harden [2] summarized the assessment tools against each assessment category in AMEE Guide No. 25 (Table 18.4).

Table 18.4 Assessment tools (instruments) against each assessment category

Written tests such as long essay questions (LEQs) were a common assessment tool in health professions education at the beginning of the nineteenth century. LEQs are reliable for in-depth assessment of a segment of knowledge (e.g., "Describe the process of fat digestion and absorption in the gut"); however, they are not a content-valid tool for sampling across an entire knowledge domain such as the gastrointestinal tract. Therefore, long essays may be an appropriate assessment tool for in-depth knowledge. LEQs are very easy to construct but time-consuming to mark, and teachers may lose concentration and interest while reading many scripts, which can compromise fairness in grading. This, of course, affects their practicality and, to some extent, their validity. To avoid these disadvantages of long essays, modified essay questions (MEQs), completion questions, and short answer questions emerged in the middle of the past century as reliable, valid, and practical assessment tools, and they have replaced most LEQs in health professions education [17]. Over the last three decades or so, there has been a general move away from all types of essay questions toward MCQs, which are objective, reliable, content-valid, practical, and easy to administer, share, mark, and analyze, with a good impact on learning outcomes in health professions education [2]. Moreover, MCQs are now widely used in admissions, progress testing, promotion from one level to the next, licensing, and high-stakes postgraduate board examinations. MCQs, however, cannot assess in-depth knowledge as essays can; they are difficult and costly to construct well, and they are subject to cueing and guessing effects. Patient management problems (PMPs) and extended matching items (EMIs) are not as popular as they were at the end of the past century because of difficulties with question setting, marking, and standardization.
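
To illustrate the guessing effect noted above, one commonly cited correction-for-guessing (formula-scoring) rule, shown here purely as an illustrative aside rather than a recommendation from this chapter, is

$$S_{\text{corrected}} = R - \frac{W}{k-1}$$

where R is the number of right answers, W the number of wrong answers, and k the number of options per item. On a five-option MCQ, a blind guess has a 1 in 5 chance of being correct, and the W/(k − 1) penalty offsets such guesses on average.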

Practical and clinical examinations are very important tools for assessing the practical and clinical skills and the attitudinal domains of clinical practice. Practical assessment includes spot examinations and practical observation. These are easy to construct and administer but lack content validity, i.e., they cannot sample enough from the skills and attitudinal domains. To address this problem, the Objective Structured Practical Examination (OSPE) improves sampling by increasing the number of practical encounters through multiple stations and by standardizing answer checklists across stations. Similarly, traditional clinical assessment tools include long and short cases, which lack content validity and fairness of case distribution among students (the luck of the draw). Therefore, the Objective Structured Clinical Examination (OSCE), the Objective Structured Long Examination Record (OSLER), and the Group Objective Structured Clinical Examination (GOSCE) have emerged as more valid and reliable tools that address these drawbacks. The OSCE, in particular, has gained popularity over other methods during the last three decades as a reliable, valid, and feasible tool to assess clinical competence [18]. For a reliable and valid OSCE, a minimum of 20 stations is required, together with checklists and standardized patients (SPs) [10, 19]. SPs need to be well trained to portray real patients in order to increase the validity of OSCEs [16, 20]. The feasibility of OSCEs varies from one institution to another depending on available resources, SPs, and experienced educators, and their cost varies from center to center [21]. These costs increase with the recruitment and training of SPs, the training of examiners, and the maintenance of examination security. Positive impacts of OSCEs on students include increased learning, satisfaction with the fairness of the evaluation, and greater experience for future OSCEs. The OSCE also has drawbacks, including fragmented learning, insufficient time for in-depth assessment, and the fact that students often know most OSCE stations beforehand, although many modifications to OSCE construction have mitigated these drawbacks.
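
The frequently quoted minimum number of stations reflects how reliability grows with testing time, a relationship commonly described by the Spearman–Brown prophecy formula; the figures below are illustrative assumptions rather than values from this chapter:

$$\rho_{n} = \frac{n\,\rho_{1}}{1 + (n-1)\,\rho_{1}}$$

where ρ₁ is the reliability of the current examination and n is the factor by which the number of stations (testing time) is multiplied. For example, if a 10-station OSCE yields a reliability of 0.6, doubling it to 20 stations (n = 2) predicts a reliability of (2 × 0.6)/(1 + 0.6) = 0.75.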

Other forms of learning outcome assessment include direct observation during attachments (global ratings), review of written reports (portfolios), logbooks, and self/peer/360° feedback reports. These are best used for communication, interpersonal, and other attitudinal skills. The reliability of these tools increases when assessment is performed by a committee of expert faculty/examiners and decreases when it is performed by a biased individual faculty member. For a positive impact on learning, these forms of assessment are best used for formative feedback and for improving communication and interpersonal skills, and results must be revealed to the student as early as possible. Negative impacts on learning occur when students are informed late and/or when the assessment is performed by inexperienced or biased faculty. More research on the validity and reliability of these forms of assessment is needed to encourage educators and faculty to use them more frequently. Another important practice in any assessment system is post-examination item analysis, which shows how each item functions and supports valid and reliable results.
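
As an illustration of post-examination item analysis, the sketch below computes two commonly reported statistics for dichotomously scored items: the difficulty index (the proportion of examinees answering correctly) and a point-biserial discrimination index (the correlation between an item and the rest of the test). The data and names are hypothetical; dedicated exam software typically produces these reports automatically.

```python
import numpy as np

def item_analysis(scores: np.ndarray):
    """Per-item difficulty and discrimination for an examinee-by-item 0/1 matrix."""
    results = []
    for j in range(scores.shape[1]):
        item = scores[:, j]
        rest = scores.sum(axis=1) - item        # total score excluding this item
        difficulty = item.mean()                # proportion answering correctly
        if item.std() > 0 and rest.std() > 0:   # point-biserial correlation with the rest of the test
            discrimination = np.corrcoef(item, rest)[0, 1]
        else:
            discrimination = 0.0
        results.append((j + 1, difficulty, discrimination))
    return results

# Hypothetical results: six examinees, four MCQ items (1 = correct, 0 = incorrect)
scores = np.array([
    [1, 1, 0, 1],
    [1, 0, 0, 1],
    [1, 1, 1, 1],
    [0, 0, 0, 1],
    [1, 1, 0, 0],
    [1, 1, 1, 1],
])

for item, p, r in item_analysis(scores):
    print(f"Item {item}: difficulty = {p:.2f}, discrimination = {r:.2f}")
```

In such reports, items that nearly everyone answers correctly or incorrectly, and items with low or negative discrimination, are usually flagged for review before results are finalized.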

4 Summary

Developing a comprehensive approach to students' assessment requires faculty knowledge of and skills in assessment principles, use of the assessment method(s) that match each domain (i.e., knowledge, skills, attitudes), analysis of the results, and their interpretation. The strategy to achieve this goal involves two initiatives: first, to develop comprehensive assessment approaches for courses that address all learning domains, and second, to develop guidelines for these domains. The strategic details of each initiative and the recommended assessment methods and tools have been outlined. The estimated time needed to complete each initiative and its budget depend on the studies, meetings, and discussions held by the relevant stakeholders during the implementation process.