1 Introduction

The maritime industry is massive and is responsible for over 90% of global trade. The industry employs around 1,653,500 people across many countries (BIMCO 2015). This responsibility requires many high-stakes operations to ensure that goods are transported across the globe in a timely manner. High-stakes operations are complex and system deviations can have devastating consequences. The complexity of such operations is associated with interdependent collaborations and dynamic decision-making; these factors can add up to make the work of maritime operators exceptionally straining (Kluge 2014). This complexity has led to severe accidents such as the capsizing of Costa Concordia, the Sewol ferry tragedy (Kim et al. 2016) and the El Faro accident (Coast Guard 2017).

The potential consequences following errors in high-stakes operations are costly in terms of environmental damages, operating expenses and health hazards (Naderpour et al. 2015). Accidents are believed to be an inevitable part of high-stakes operations; accidents such as those involving the Federal Kivalina, Crete Cement and M/V Godafoss have repeatedly demonstrated this reality (Accident Investigation Board 2010a, b, 2012). In response, many significant incentives exist for ship owners, crews and local communities to identify which measures can prevent accidents or mitigate damages. Several possibilities exist for advancing on this issue, such as improving technical systems, designs or the engineering phase. Other options include establishing training and hiring procedures (Leveson 2011).

Regardless of which system components are inspected, performance assessment remains an essential method for identifying the measures that strengthen safety and efficiency (Wiggins 1993). However, it is difficult to assess high-stakes operational performance (Delandshere and Petrosky 1998). Accurate and consistent performance assessments are necessary to provide useful information regarding safety and efficiency. This need implies a systematic effort to understand the mechanisms in an operation in order for an assessor to pinpoint specific parts of the system that require enhancements (Bouejla et al. 2014). Information about system weaknesses is crucial in order to apply improvements that will eventually lead to safer and more efficient operations. This systematic effort requires that the assessment tool is able to capture key nuances in operations. The benefits of powerful assessment tools are even larger when the consequences are greater.

Assessment methods can take many forms. Generally, they consist of a hierarchy of previously identified performance indicators in which higher indicators are calculated based on lower indicators (Ernstsen et al. 2016); Manca et al. (2014) provide an example. The quality of an assessment depends on an accurate and consistent development of the method and proper identification of performance indicators. Consistency in this development process tends to vary in other industries and operational segments (Aditya et al. 2015).

Considering the maritime industry’s indispensable position in global trade, comprising a plethora of high-stakes and challenging operations, it is necessary to appraise the integrity of the way performance is assessed. Accurate and consistent performance assessment benefits all parties and leads to higher returns in terms of operational safety and efficiency, as demonstrated in other high-stakes domains, such as aviation (Mavin and Roth 2014), railways (Abril et al. 2008) and power plants (Nazir et al. 2014; Nazir et al. 2015).

The lack of performance assessment research on maritime operations is alarming, but attention to the matter has increased in recent decades (Rødseth et al. 2016). To maintain momentum in producing new research, the current study investigates the accuracy and consistency of performance assessment methods used in the maritime industry by examining how these methods are developed.

The accuracy and consistency of developing performance assessments across four major maritime segments were investigated using a systematic quantitative literature review to identify performance methods and the approaches used in all examined research papers. The following section presents a theoretical overview of the four approaches for developing performance assessment methods and a description of the maritime segments that are investigated in the current research. The method is then presented, followed by the results and analysis. The paper concludes with a discussion of the results.

2 Four approaches for developing performance assessment methods

Evaluating the development process of a tool provides information about its accuracy and consistency (Downing 2003). Following this argument, bottom-up, top-down and hybrid approaches can be considered to be accurate and consistent, while disconnections between data or theory and application are generally associated with inaccurate and inconsistent approaches to developing performance assessment methods (Hinkin et al. 1997). See Fig. 1 below.

Fig. 1

Four approaches to performance assessment. Bottom-up, top-down and hybrid approaches are considered to be accurate and consistent processes for developing performance assessment methods

2.1 Bottom-up approach

Studies that fit in this category have a goal of developing or identifying performance indicators (PIs) within a defined operation or industry. The methods of finding PIs vary, though most involve using interviews, questionnaires and observations from subject matter experts (SMEs). The work commonly generates a list of PIs that are specific to the operation but can also be developed for generic usage; Leriche et al. (2015) provide an example.

The fact that the identified PIs are not limited by existing methods can be advantageous, since the researchers have the flexibility to adapt the assessment to a specific situation. On the other hand, the approach raises questions of validity, such as whether data collection and subsequent analyses have been properly designed and carried out. It can be challenging to develop PIs in sociotechnical systems without the assistance of a theoretical framework. For example, some PIs may count a performance score twice at different stages of an operation. It is possible for a theoretical framework to account for this double counting using algebraic calculations or through sophisticated modelling prior to measurement, though the framework’s calculations must be valid as well.

2.2 Top-down approach

Another approach is to use established literature, theories, regulations, legislations and frameworks to assess the PIs associated with an operation. Studies with a top-down focus use an established PI framework to evaluate the performance of an operation. In addition, such research can provide further validation of PIs that were previously identified in studies that use the bottom-up approach, such as Talley et al. (2014).

Efficiency and validity are advantages of a top-down approach. In many situations, established frameworks can provide valuable definitions and formulas to effectively measure the performance of an operation, eliminating the work of developing new PIs. Furthermore, robust legislation, regulation and standardisation of measurement systems may justify a framework’s validity and increase trust in the measurement. One disadvantage is the lack of flexibility; if an established framework is tailored to a specific operation, its validity may be limited in another operation. Attention to and knowledge of a framework are necessary to use it effectively across situations and ensure a truer measurement of performance.

2.3 Hybrid approach

A combination of a bottom-up and a top-down approach can also be used. In this approach, data is gathered and analysed to develop PIs; at the same time, the PIs are evaluated against a set of predefined performance assessment frameworks; Sleire and Dale (2009) provide an example. The approach demands more resources than a single bottom-up or top-down approach, but it benefits from flexibility and established validity.

2.4 Inadequate approaches

Another approach is to haphazardly (or at least highly subjectively) determine a set of indicators for measuring performance. Depending on the available measurement tools, resources and knowledge of the system, this approach calls the validity of the data into question. One reason is that only a fraction of the system is measured, and interconnections existing in complex high-stakes operations are disregarded. Clearly, measuring all variables in a complex system is ideal, but a systematic approach may reveal the most important aspects of system performance. On the other hand, efficiently selecting indicators makes it possible to pinpoint areas of focus and relevant variables in a system; however, the highly subjective selection of indicators may compromise the accuracy and consistency of the overall operation. The current paper refers to these approaches to developing assessment methods as inadequate approaches.

Comparing various development processes across research studies is feasible and valuable in contrast to merely comparing specific assessment methods developed for distinct purposes. The relation between the processes is illustrated in Fig. 2 below, in which the bottom row (shaded area) represents the aims of the current research.

Fig. 2

How the aims of the current research (shaded area) fit into the overall process of assessing performance

2.5 Four major maritime segments

The current research scrutinises the process of developing performance assessment tools in four major maritime segments. Port logistics, ship handling, safety and environmental performance are investigated because they all play a significant part in most shipping operations, and are thus widely researched. This information is analysed to deduce the accuracy and consistency of assessment methods in each of the respective segments.

Ports are essential hubs in maritime trade. Ports have become increasingly complex, evolving from a rudimentary place where cargo is handled to a functional element in the logistics chain that involves the flow of commodities, people and information (Roh et al. 2007). Extensive research has been conducted to develop assessment methods that capture the complex interplay among all agents in a port in order to find the best solution to port logistics.

Ship handling is the manoeuvring of a vessel, which encompasses both technical seamanship skills and teamwork skills among crewmembers. Maritime operators must withstand a harsh and dynamic environment, often in isolation. This work is challenging, and measures must be taken to ensure that a crew has the skills required to accomplish necessary tasks.

Similarly, safety concerns are highly important in high-stakes industries and have been widely researched. The costs of a safety breach can be tremendous, so utmost care must be taken to increase safety. However, measuring safety is difficult because of its complexity, and many resources are invested in developing assessment frameworks for safety concerns.

Environmental considerations are increasingly relevant. Ship owners, local societies and governments are all apprehensive about the environment and express interest in green fleets. Being considerate of environmental impacts can yield productivity benefits for ship owners in the form of reduced fuel consumption as well as local benefits such as less pollution. Assessment methods have been developed to understand various aspects of the environmental impact of operations, from effects on coral reefs to carbon emissions.

3 Method

Peer-reviewed papers about maritime performance assessment were gathered from the Scopus, ScienceDirect and JSTOR databases. The criteria for including literature in the review follow the exclusion process depicted in Fig. 3 below. The time range considered was from 2005 to 2016; no relevant papers published before 2005 were identified.

Fig. 3

Process of excluding papers in the literature review

3.1 Research statement and database search

A search statement was developed to ensure consistency across all database searches. The use and development of performance indicators in the maritime industry was broken down into four concepts; various combinations of these keywords (please see Table 1) have been explored in the literature. Concepts were topics in the search statement with relevant synonyms or alternative spellings such as maritime and marine, which are the British and American terms for the same concept that are both widely used in maritime literature. “Maritime” is the term used in the current paper. In total, 91 papers were found in Scopus, 568 were found in ScienceDirect (although search results only display 489 findings) and 44 were found in JSTOR. The same Boolean search string was used in all databases. After duplicates and unavailable papers (193 papers) were removed, 537 distinctive papers remained.

Table 1 Keywords for the four concepts used to search for relevant literature
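As a minimal sketch of how a search statement of this kind can be assembled, the Python snippet below joins the synonyms within each concept with OR and joins the concepts with AND. This structure is an assumption about the search string, not the string actually used in the review. Only "maritime" and "marine" are taken from the text; every other keyword, and the function name `build_search_statement`, is a hypothetical placeholder for the entries of Table 1.

```python
def build_search_statement(concepts):
    """Join synonyms with OR inside each concept, and concepts with AND."""
    groups = []
    for synonyms in concepts:
        # Quote each term so that multi-word keywords are searched as phrases.
        quoted = " OR ".join(f'"{term}"' for term in synonyms)
        groups.append(f"({quoted})")
    return " AND ".join(groups)


# Four concepts; only the first list reflects keywords named in the paper.
# The remaining keywords are hypothetical stand-ins for Table 1.
example_concepts = [
    ["maritime", "marine"],
    ["performance"],
    ["assessment", "evaluation"],
    ["indicator", "measure"],
]

search_statement = build_search_statement(example_concepts)
print(search_statement)
```

Generating the string once and reusing it unchanged in every database is one way to keep the search consistent, although each database may require small syntax adjustments.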

3.2 Process of exclusion

The subsequent step in the process involved excluding irrelevant papers. The papers were first excluded based on an evaluation of abstracts conducted according to the process depicted in Fig. 3. In the first exclusion process, 128 research papers were selected for further examination. In the second part, complete articles were read to further assess relevance in relation to the search criteria; the same process was followed. Sixty-two research papers qualified from the second exclusion process and were chosen to be part of the literature evaluation. Complete numbers for each stage of the process are presented in Table 2 below.

Table 2 Process of excluding research papers

The papers were evaluated based on their relevance to the maritime industry, whether methodology and theoretical underpinnings were presented in the paper and whether the performance assessments were at the operational or tactical level. Operational performance assessments evaluate how a vessel performs within an operation, as with docking, navigating or dynamic positioning; tactical operational assessments evaluate how well a vessel performs across operations. Strategic evaluations, which are excluded from the current paper, are concerned with how an entire fleet performs over time and involve several economic calculations that were considered to be too indirectly related to job performance to be included in the current study.

3.3 Structuring the literature

The findings in the literature review were structured according to the maritime segments; the research papers were coded from A to D. Port logistics (A) encompass logistics and vessel handling when approaching a port. Ship handling (B) measures operational performance on board a vessel including both technical and navigational efficiency. Safety (C) concerns performance frameworks that assess both antecedents and consequences of crises. Environmental performance (D) focuses on research measuring green performance and the development of green performance indicators. Every paper was assessed in relation to the assessment methods and specifically in terms of which approach (i.e. bottom-up, top-down or hybrid) was taken in the conducted research.

3.4 Analysing the literature

All papers were included in univariate and bivariate analyses. The univariate analysis investigated the descriptive statistics concerning assessment methods used in the literature review. Another descriptive analysis was conducted on the distribution of the development approaches. Subsequently, to evaluate the consistency and accuracy of assessment methods, a cross-tabulation analysis was conducted on the use of the various development approaches across the four maritime segments.
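The cross-tabulation step can be sketched in a few lines of Python. The segment codes A to D and the four approach labels follow the paper, and the entry for A1 (top-down) follows the example given in the Results section; all other coded papers below are hypothetical and serve only to illustrate the counting. The names `cross_tabulate` and `coded_papers` are likewise illustrative.

```python
from collections import Counter

SEGMENTS = {
    "A": "port logistics",
    "B": "ship handling",
    "C": "safety",
    "D": "environmental performance",
}
APPROACHES = ("bottom_up", "top_down", "hybrid", "inadequate")

# (paper code, development approach). Only A1 is taken from the paper;
# the other entries are hypothetical.
coded_papers = [
    ("A1", "top_down"),
    ("A2", "inadequate"),
    ("B1", "inadequate"),
    ("B2", "bottom_up"),
    ("C1", "hybrid"),
    ("D1", "top_down"),
]


def cross_tabulate(papers):
    """Count papers per (segment, approach); the segment is the code's first letter."""
    counts = Counter((code[0], approach) for code, approach in papers)
    return {
        segment: {approach: counts[(segment, approach)] for approach in APPROACHES}
        for segment in SEGMENTS
    }


table = cross_tabulate(coded_papers)
for segment, row in table.items():
    print(SEGMENTS[segment], row)
```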

4 Results

The results from the literature are organised into Table 3. It provides a list of the various approaches used in the examined research papers. The coding shown in the tables corresponds with and is used to identify the specific papers. For instance, code A1 corresponds with the paper titled “When it comes to container port efficiency, are all developing regions equal?”, and the table illustrates that the assessment method was developed using a top-down approach.

Table 3 Listing assessment approaches that are used in the respective papers

The “X” marks the approach used in the respective research papers.

4.1 Results from the univariate analysis

Two descriptive analyses were performed to determine the distribution of data. First, descriptive statistics for the assessment methods identified in the literature are presented. Seventeen unique performance assessment methods were identified, though some were adapted for specific settings. Eleven undefined and unique methods were catalogued in the review. Such methods are often associated with an inadequate development approach.

The second descriptive analysis focused on the frequency with which different approaches were used to develop assessment methods. The top-down approach was the most prevalent approach (mode = 21 (34%)); bottom-up was the least-applied approach to performance assessment (15%). Combining adequate approaches (those that were consistent and accurate) revealed that 69.4% of the papers reviewed based their assessments on adequate research approaches. This finding signals an overall strong consistency for the maritime industry (Fig. 4).

Fig. 4

Distribution of papers along the four research approaches
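The reported shares can be checked with simple arithmetic. The sketch below uses only figures stated in the text (62 reviewed papers, 21 top-down papers and a 69.4% share of adequate approaches); the paper counts for the adequate and inadequate groups are inferred by rounding and are therefore an assumption rather than values reported by the authors.

```python
TOTAL_PAPERS = 62        # papers retained after the second exclusion step
TOP_DOWN_PAPERS = 21     # reported mode
ADEQUATE_SHARE = 0.694   # reported share of adequate approaches

# Share of top-down papers, rounded to a whole percentage.
top_down_pct = round(100 * TOP_DOWN_PAPERS / TOTAL_PAPERS)

# Inferred paper counts (assumption: obtained by rounding the reported share).
adequate_papers = round(ADEQUATE_SHARE * TOTAL_PAPERS)
inadequate_papers = TOTAL_PAPERS - adequate_papers
inadequate_pct = round(100 * inadequate_papers / TOTAL_PAPERS)

print(top_down_pct, adequate_papers, inadequate_papers, inadequate_pct)
```

Under these assumptions, the inferred inadequate share agrees with the 31% cited in the discussion.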

4.2 Results from the bivariate analysis

A bivariate cross-tabulation analysis was performed to further investigate accuracy and consistency. For port logistics and ship handling, the inadequate approach was most dominant at 34% for port logistics and 46% for ship handling. The hybrid approach was most prevalent in safety assessments (50%), and the top-down approach was used most often in assessments of environmental concerns (58%). Table 4 presents the cross-tabular bivariate analysis of the assessment approaches and maritime segments. The analysis revealed that the majority of papers in all segments used adequate approaches.

Table 4 Distribution of adequate and inadequate approaches with regards to each maritime segment

The distribution of various assessment approaches was then analysed to pinpoint the accuracy and consistency of the assessment methods for each maritime segment. Each segment received a score based on the number of research papers addressing each of the approaches. Furthermore, each approach received a weight reflecting its impact on the development process; this weight was determined by four assessment research experts to ensure consistency. The bottom-up and top-down approaches were weighted at 1, the hybrid approach was weighted at 1.5 and the inadequate approach had a negative weight of −0.5.

The weights were devised to favour the more extensive hybrid method and penalise the lack of an accurate and consistent approach. The result for each segment was a relative proportional score due to the uneven return of papers for each segment. The maximum score was achieved if all papers for a segment received a weight of 1.5, meaning that they used the hybrid method, and the relative score was the proportion of the score to the maximum score for each segment. The environmental segment received the highest score (0.64), port logistics and safety received the middle scores (0.40 and 0.58, respectively) and ship handling received the lowest score (0.21) (Table 5).

Table 5 Relative accuracy and consistency comparison with the corresponding approaches
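The relative proportional score described above can be sketched as follows. The weights (1, 1, 1.5 and −0.5) and the definition of the maximum score (all papers in a segment weighted at 1.5) come from the text; the paper counts in `example_counts` are hypothetical and do not correspond to any segment in Table 5, and the function name `relative_score` is illustrative.

```python
WEIGHTS = {
    "bottom_up": 1.0,
    "top_down": 1.0,
    "hybrid": 1.5,
    "inadequate": -0.5,
}


def relative_score(counts):
    """Weighted score of a segment divided by its maximum attainable score.

    `counts` maps each approach to the number of papers using it.
    The maximum is reached when every paper uses the hybrid approach.
    """
    total_papers = sum(counts.values())
    if total_papers == 0:
        raise ValueError("a segment needs at least one paper")
    raw_score = sum(WEIGHTS[approach] * n for approach, n in counts.items())
    max_score = max(WEIGHTS.values()) * total_papers
    return raw_score / max_score


# Hypothetical segment with ten papers (not taken from the review).
example_counts = {"bottom_up": 2, "top_down": 3, "hybrid": 1, "inadequate": 4}
example_score = relative_score(example_counts)
print(round(example_score, 2))
```

Because the score is divided by the maximum for the same number of papers, segments with different numbers of papers can be compared directly. A segment in which most papers use inadequate approaches can obtain a negative score.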

5 Discussion

A majority of the performance assessment research papers were found to develop assessment methods using adequate development approaches. At the same time, the results suggest that ship handling should receive increased attention with regard to consistency and accuracy in the development of assessment methods.

The bivariate analysis found that environmental research returned the highest relative score of the four segments (0.64). Environmental research has received much attention in recent years (Chu et al. 2017), and newer research may have paid increased attention to developing a comprehensive method for performance assessment. Another explanation may be that the maritime industry is suspected to have a high environmental footprint (Lam 2015), and strong environmental performance is a key interest for all stakeholders.

The port logistics and safety segments received the scores 0.40 (port logistics) and 0.58 (safety) for accuracy and consistency. Port logistics were associated with a relatively high number of papers using inadequate methods to develop assessment methods; however, the majority of papers used adequate approaches. Safety assessment research was associated with the highest percentage of papers applying the more extensive hybrid approach to developing assessment methods. This finding suggests that some attention should be shifted to using adequate tools to develop methods for assessing port logistics; momentum in safety research should be maintained.

Ship handling scored the lowest (0.21); this field concerns seamanship and social collaboration on vessels (Ernstsen et al. 2017). This maritime segment received the lowest score regarding accuracy and consistency in the development of assessment methods. Ship handling is difficult to measure and has mostly been measured by the examination of the technical parameters of vessel performance; Sleire and Dale (2009) provide an example. It may be perceived as less advantageous to use a comprehensive approach. Regardless, it can be argued that accurate and consistent measurements for ship handling are also beneficial (Bouejla et al. 2014). As the shift from manual to automated systems continues, it is imperative that assessments of vessel performance are accurate and consistent. The low score suggests that further research on the assessment of ship handling is necessary.

Assessing performance is critical to determining operational safety and efficiency in high-stakes operations. Evidence of inadequate performance assessments is apparent in the existing literature, as 31% of the papers examined used approaches classified as inaccurate and inconsistent. It is difficult and time-consuming to adequately develop performance methods, and it may even be a conscious and constructive decision for certain operations to adopt a pragmatic approach. Nevertheless, the current paper argues that inaccurate and inconsistent assessment methods may cause more harm than good if pragmatic approaches are portrayed and misperceived as absolute and reliable measures of a particular operation. It is essential to be conscious of the underlying approaches used in the development of assessment methods for maritime operations; the current research emphasises this need.

Some limitations are worth mentioning. First, the Boolean logic applied impacted the research papers returned for analysis. The research papers in the review were examined carefully to ensure balance and proper representation of the literature. However, subjectivity was still present in the identification of relevant concepts and keywords used in the search string. This subjective effect was minimised by ongoing discussion among the researchers; however, it is still necessary to acknowledge this limitation. The identification of research concepts used as the basis for the Boolean logic was also impacted by subjectivity, which influenced the subsequent identification of keywords and could have misled the study early on. Careful attention was paid to ensure that preconceived ideas and confirmation biases were minimised when the concepts for the systematic literature review were determined. Additionally, the exclusion criteria used to withdraw irrelevant research papers in systematic literature reviews influence the results substantially. A step-by-step process was established to ensure that the literature was excluded in a consistent way, and the process was carefully verified in a dual review of the exclusion criteria. Finally, the use of a maximum score in the calculation can be considered misleading, as using a hybrid approach (which was required to achieve a maximum score) is not advisable or sensible in all circumstances and situations. However, the maximum score can be considered valuable for calculating the relative proportional score used to compare the respective maritime segments.

Although the findings of the current systematic literature review suggest an overall tendency to develop adequate assessment methods in the maritime industry, subsequent analyses of maritime segments and specified assessment approaches suggest opportunities for further improvement. For instance, standardising the way assessment methods are developed should be investigated further. This could increase accuracy and consistency in the way performance is measured. It is also suggested that subsequent analyses pay increased attention to the development of ship handling performance frameworks. A comprehensive assessment framework to effectively determine ship-handling skills in high-stakes operations would make a significant contribution to maritime safety and efficiency.

6 Conclusion

The maritime industry is massive, and its vast impact on global ecology deserves to be accurately and consistently measured. The current study systematically investigated existing maritime literature to determine the prevalence of consistent and accurate approaches to developing assessment methods. The findings suggest that assessment methods used in the maritime industry are developed using accurate and consistent approaches such as bottom-up, top-down and hybrid approaches. In the past, assessments of ship handling have commonly used inadequate and highly subjective approaches to developing assessment methods. Therefore, it is proposed that the development of the methods used to assess performance in this maritime segment should receive additional attention. The current research paves the way for a systematic and increased understanding of performance assessment in the maritime industry. Currently, the authors are designing an experiment to evaluate the consistency and accuracy of performance indicators for ship navigation with an aim to further increase the integrity of performance assessments and lead to a safer and more efficient industry.