For almost two decades, educational assessment in the health professions has seen a major effort to introduce competency-based frameworks. The intensity of the movement to competency-based education (CBE) and assessment (CBA) has not been matched by any other assessment activity in our shared professional experience of over 40 years. In contrast, the introduction of simulation into assessment in the late 1980s proceeded as a rapid series of field tests in which many evaluation studies, conducted with regulatory partnerships, served as the bases for establishing a sustainable innovation. In this paper, we argue that the underlying developmental work and the regulatory interaction with the educational sector around CBA are lagging, especially in the context of postgraduate medical education (PGME).

We are not the first to raise concerns about CBA in PGME. Three categories of key developmental issues were noted in 2011 by The Future of Medical Education in Canada Postgraduate Project (Regehr et al. 2011). Of even greater concern, the meanings of the terms and definitions used in CBA assessment tools have been questioned, raising potential validity concerns (Lurie et al. 2011; Govaerts and van der Vleuten 2013). Yet almost a decade later, we continue to encounter problems translating CBA into action. In the assessment literature, the concerned comments of respected colleagues (Norman et al. 2014) and the documentation of many challenges (Williams et al. 2015; Holmboe et al. 2015; Iobst and Holmboe 2015; Klamen et al. 2016; Hawkins et al. 2015; Carraccio et al. 2016) continue to raise red flags for us. As developers and promoters of assessment strategies internationally, we have encountered uncertainty about the implementation of CBA in our interactions with program directors and other educators. Without considerably more attention and improvement, we are concerned that CBA is not sufficiently well integrated into the wider world of health care quality, thereby potentially undermining its sustainability. In this paper, we offer two possible roadmaps, or frameworks, to help CBA achieve sustainability.

Roadmap one: reframe and address CBA as a measurement development issue

The first roadmap restates the CBA challenge as a measurement development issue, and there are good models for doing this. Successful innovation in this realm rests on a series of steps that can guide innovators along the pathway to implementation, and observing these steps is required to reach a sustainable innovation in the real world of professional education and assessment. Pioneering work by Maatsch et al. (1976) in the early days of the new certification processes of the American Board of Emergency Medicine (ABEM) in the 1970s serves as a template for assessment innovators today. In addition to shifting to criterion referencing, the project established best practices for large-scale assessment innovations: (1) planned collaboration amongst key stakeholders, including the profession, the regulators and the educators; (2) established standards for national credentialing and certification processes; (3) undertook full-scale field trials of the assessment tools (directed by independent psychometric experts); (4) used project management to plan, execute, control and finish the specific assessment goals for the ABEM; and (5) based on subsequent follow-up studies, eliminated inefficient assessment formats (Munger et al. 1982; American Board of Emergency Medicine 2018). However, since the entire ABEM system was new, integrating the innovation into an existing assessment scheme was not an issue.

In contrast, the introduction of simulated patients into summative and formative assessment in the mid-1980s needed to be integrated into existing systems. That required preparatory work and pilot studies aimed at engaging all of the institutional stakeholders and demonstrating the measurement qualities of the assessment scores. Only after finding evidence of feasibility and measurement quality were the policy decisions made to proceed and adopt the innovation on a wider scale. This was the strategy for the introduction of patient simulation into the assessment systems of the Medical Council of Canada (Reznick et al. 1993; Dauphinee and Reznick 2011), the Educational Commission for Foreign Medical Graduates (Boulet et al. 1998, 2009), and the National Board of Medical Examiners (Swanson et al. 1999).

Given the challenges facing CBA, the first step might be to pay more attention to the development of an actual assessment model. It is not clear to us that such a step has been taken by the many authors promoting CBA. Fortunately, the recently revised Practical Guide to the Evaluation of Clinical Competence by Holmboe et al. (2017) has laid out the development and pre-testing requirements that are essential to establish that CBA instrumentation meets current standards of best practice in assessment.

The second step is to address the measurement issues associated with the model, chief among them being validity. Building on Kane’s (1992) notion of establishing four validity links (scoring, generalization, extrapolation, and interpretation for promotion or for remediation), Clauser et al. (2018) have carefully outlined how the required evidentiary links can be made. Their approach assumes that dictionaries of terms and definitions, as well as rubrics for scoring or for defining performance standards for promotion or remediation, are already established. As noted above, Lurie et al. (2011) reported that these definitions, which help establish the intended assessment goals, were often being negotiated ‘on the run’. The quality of the CBA assessment tools is therefore a pivotal ‘go or no-go’ consideration. Documentation of the measurement qualities of the scores or ratings is essential before moving to the next step.

The third step is to develop a model of implementation at the local level. That model has to be planned, communicated, and monitored for results and feedback so that improvement is generalizable across settings. This was important for the administration of standardized-patient-based assessments at different centers in Canada (Tamblyn 1998; Reznick et al. 1993). It is also an issue in work-based assessment across sites (Norcini and Burch 2007). The implementation plan must also include extensive training and preparation for faculty members, as we will cover in more detail shortly.

Finally, having moved through the steps of creating an assessment model, documenting its measurement qualities, and generating a model of implementation, the last step is to judge whether it all fits together in the real world. Borrowing from a model used in the early days of defining the benefits of a health care system (Solon et al. 1960; Lee 1974), there are five key questions that must be asked repeatedly during the development of new systems that will affect many stakeholders: (1) What are the benefits? (2) Who benefits? (3) Who decides? (4) Who pays? (5) Who manages it? These guiding questions must be revisited before wider implementation of CBA is considered or even advised. This is a huge undertaking. Like all large building projects, the CBA movement needs a basic guiding framework to ensure that it addresses each of the unanswered challenges for potential users at different locations and faculties.

Roadmap two: implement CBA as a sustainable innovation within existing enterprises

The second roadmap restates the challenge of CBA within a quality improvement enterprise that encompasses both the health care and educational systems. This wider context includes recognition of all of the stakeholders involved, from the regulatory authorities and certification bodies, down through the leadership of the educational institutions, to the teachers and the faculty support services that enable assessment at all educational and clinical care institutions. Implementation requires a shared roadmap to ensure that the basic components are identified and the responsibilities for them are clear in the execution of a common vision. This roadmap was adapted from Christensen’s case-based management studies of innovation (2013). We turned to the management literature because the administration of CBA must occur at multiple levels, through multiple interfaces. While CBA may start as a well-intended national or state-based self-regulation initiative, it can soon become a set of challenges for the academic leadership and resource appropriation team at each individual Faculty. Ultimately, the implementation of CBA will surface as a set of key developmental issues at the learner–teacher–mentor interface. To illustrate, accessible central support will be needed for a peer-directed program of professional development aimed at each faculty member and for the re-orientation of each trainee. Similarly, the faculty in the field will need new technical support, including serviceable dashboards to submit summative data and to record observations for formative feedback.

At the leadership level, in The Innovator’s Solution, Christensen et al. (2013) discuss turning innovative ideas into new processes and tools that refocus an existing system to improve its outcomes and services. This implies that, before adopting any externally promoted framework, leadership must consider whether an innovation is likely to be ‘disruptive’ to current and functioning quality assurance programs. For leaders, Christensen (2000a) offers three determining ‘elements’ of success: What is the rate at which the users or learners can fully use or absorb the innovation? Does the expected rate of improvement go beyond what the users or learners can fully use or absorb? Is there clarity on the distinction between sustaining and disruptive innovations? These questions are important pretests because a sustaining innovation is aimed at users or learners with a view to ‘better performance’ of outcomes or services that were not adequately emphasized previously. In contrast, in business, disruptive innovations are not aimed at better products or services for the target audiences or users; typically, they are services that are simpler, more convenient, or cheaper to use. However, CBE and CBA are about continuous quality improvement and sustainability. Clearly, CBA should be about readiness for innovative solutions, not about creating disruptions.

Having proposed a framework or management scaffolding for leaders, we must ask: what are the potential downstream risks for the Faculty and learners if the fiscal support and technical demands of CBA are not anticipated in advance by the management team? Again, Christensen’s (2000b) summary offers sage advice. Managing innovation mirrors the resource allocation process: if the innovation is not prioritized at the Faculty leadership level, it will starve for lack of needed resources. The necessity of deliberative priority setting for innovations, especially ones with far-reaching resource implications like CBE and CBA, cannot be overstated. Furthermore, if the information needed to prioritize the innovation is non-existent, or if an innovation is undertaken in differing social or economic circumstances, Christensen suggests that ‘lessons with learning’ can be created through fast, inexpensive forays (pre-tests) into the field with teacher–learner dyads using the CBA product or tool. Analogous to phase one testing of health procedures, these pilots can give positive answers or provide small failures for iterative learning, and can establish ‘face’ validity. For sustaining innovations, getting there ‘first’ or being seen as a leader is not important; what matters is achieving consistent and incremental improvement. The human and fiscal resource implications of CBA adoption must be carefully considered.

Continuing with that same line of questioning, how else can potential users drill down further to see the implications of CBE and CBA for the program’s associated teaching institutions or community placements? By using entrustable professional activities (EPAs) (Ten Cate 2013; Ten Cate and Scheele 2007) and/or the broader programmatic assessment (PA) (van der Vleuten et al. 2015), one can conduct arm-chair trials to anticipate potential implications. The purpose is to identify specific risks and opportunities and to help potential CBE adopters decide whether the innovation is sustainable in their setting given their purposes. This is a particular concern in PGME, where a medical faculty and its associated teaching sites will be faced with integrating 30–50 specialties into an adequately funded process that provides the required central support and the needed faculty development processes (Holmboe et al. 2011).

Similarly, moving to the next downstream implication, that of ‘bigger data’ models for assessment, each faculty and each implicated PGME program could consider the data collection implications of CBA by using arm-chair pilots of introducing EPAs or PA. Doing so will help identify the support needed to ensure that teachers and mentors can: (1) record real-time observations of performance on-line; (2) be trained and organized to offer real-time, non-judgmental feedback to students and trainees for daily improvement; and (3) have central support to collect and summarize data and develop information packages on each learner. In turn, once collected, the information must be sent to the Faculty’s PGME office to be considered and interpreted for each trainee by the promotions committee of each of the faculty’s 30–50 PGME programs. It is these committees that collectively review and interpret the data and observations in order to make valid decisions about advancement to the next level of responsibility and trust. Technological and analytical advice and support aside, Eva and co-authors have explained why significant faculty preparation and training are needed (Eva et al. 2016).
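To make the data-handling load concrete, the sketch below illustrates, in Python, one minimal way such observations might be captured and rolled up into a per-learner package for a promotions committee. It is purely illustrative: the record fields, the formative/summative flag, and the summarize_for_committee helper are hypothetical, not part of any published CBA specification or tool.

```python
# Illustrative sketch only: a hypothetical, minimal data model for capturing
# real-time workplace observations and rolling them up into a per-learner
# summary for a promotions committee. The field names and the
# formative/summative flag are assumptions made for illustration.
from dataclasses import dataclass, field
from datetime import date
from collections import defaultdict


@dataclass
class Observation:
    learner_id: str
    observer_id: str
    epa: str                 # e.g. "EPA-3: manage an acutely ill patient"
    entrustment_level: int   # e.g. 1 (observe only) to 5 (may supervise others)
    purpose: str             # "formative" or "summative"; kept distinct at entry
    narrative: str           # non-judgmental feedback recorded at the point of care
    observed_on: date = field(default_factory=date.today)


def summarize_for_committee(observations):
    """Aggregate only the summative entries into a per-learner, per-EPA package."""
    package = defaultdict(lambda: defaultdict(list))
    for obs in observations:
        if obs.purpose == "summative":
            package[obs.learner_id][obs.epa].append(obs.entrustment_level)
    # Counts and the highest recorded entrustment level are only a starting point;
    # the committee still reviews the underlying narratives before deciding.
    return {
        learner: {epa: {"n": len(levels), "max_level": max(levels)}
                  for epa, levels in epas.items()}
        for learner, epas in package.items()
    }
```

Even this toy model makes the central support requirements visible: someone must host the submission interface, maintain the separation between formative and summative entries, and prepare the committee-facing summaries.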

There are several other consequences of employing continuous monitoring frameworks at the level of the individual faculty member. In contrast to summative decisions, the observations and qualitative reports supporting formative assessment will also need data collection systems during field placements. Support systems for faculty members engaged in work-based assessment are a good pre-test for those considering CBA. These day-to-day assessment moments for feedback must be de-coupled from the data used for decision-making, such as judgments about promotion (van der Vleuten et al. 2012). Moreover, real-time feedback and formative learning must be evaluated by comparing assessment information against learning goals and predefined expectations. That should include regular access to on-line programs or dashboards that permit easy submission of field data into the student’s record. Recent reports have confirmed that the introduction of these data expectations is not easily accomplished (Dudek et al. 2012; Cook et al. 2016; Van Loon et al. 2016; Hauer et al. 2016). The rush to continuous quality improvement processes in both UGME and PGME is a big step forward for most faculties, but its additional resource demands can be underappreciated. Therefore, the expertise of mentors and assessors is essential, as judgments must be made on data whose inferences have implications both for routine feedback and for promotion to the next level of responsibility. Furthermore, it has been documented that formative assessment activities can be viewed as summative by trainees, thereby posing a threat to the validity of the formative assessment processes (Govaerts 2015). Preparing faculty and trainees for authentic and regular formative feedback is essential (Dath and Iobst 2010; Holmboe et al. 2011).

Finally, as Norman et al. (2014) have noted, any assessment change must be administered and evaluated within the broader legal contexts of medical school promotion as well as licensure and certification. Those contexts are bounded and directed by basic legal and natural justice frameworks that are rooted in legislated requirements and jurisprudence. The assessments are typically based on internationally respected standards for educational and psychological testing (Dauphinee 2002; American Educational Research Association, American Psychological Association, and National Council on Measurement in Education 2014). These long-established frameworks define the legal processes and standards under which both educational institutions and licensure and certification bodies must operate. They also set the guidelines for the structures and assessment processes needed to establish equivalency across educational settings, and they help to define the learning culture needed for assessment processes to thrive. The introduction of CBA is intended to create a new culture of achieving better results through continuous improvement within an existing educational system. That necessitates evaluating, in advance, whether CBA is feasible and sustainable within the existing legal and administrative quality assurance formats.

Who will own and manage the CBA innovation challenge?

We support the view that the emphasis in assessment must be on the quality and outcomes of care in the broader clinical context (Bismil et al. 2014; Warm 2016; Wong and Holmboe 2016; Chen et al. 2014). Social accountability for the professions must always be framed within the health care quality effort, including the teaching institutions and their impact on the population’s health. Given that this is now a widely accepted perspective, we are faced with a major innovation being promoted by a broad range of institutions responsible for the governance and assessment of physicians for licensure or certification. It seems reasonable to ask whether there are consortia of stakeholder organizations willing and able to take responsibility for assuring that the optimal solutions are identified for the effective implementation of CBA. While such coordination may not be possible globally, pockets of intense CBA activity exist in several countries: Canada, the United States, and some European countries. For earlier innovations in assessment, informal coalitions of sponsoring bodies invested time, staff and money in moving the innovation forward. Sometimes, partnerships were formed with the educational sector, which created incentives for sharing the cost of innovations and promoting inter-organizational developmental training (Dauphinee and Reznick 2011; Tamblyn et al. 2002). In the case of CBA, opportunities for similar inter-organizational collaboration and longer-term planning are apparent. Is it not time for those bodies with deeper pockets and a strong sense of social responsibility to take ownership and to ensure that established management and assessment practices are followed in order to achieve sustainable change?

Concluding comments

Having considered the general status of CBA and the assessment issues surrounding it, we are concerned about the quality and feasibility of the implementation of CBA globally and within existing teaching programs. We have suggested frameworks and steps to ensure that a sustainable CBA innovation emerges. We have also included outcomes-based management lessons for dealing with the challenges of innovation at the local program level, a necessary step before a faculty introduces CBA.

In the end, the main roles of assessment should be to offer learners feedback for improvement, to provide evidence judged against defined expectations for promotion from one phase to another, and to yield data that assure continuous quality improvement of the program, its director, and its mentors. If those roles are not fulfilled, CBA will be unsustainable for the trainees, the faculty, and the public.