1 Introduction

When technologies such as machines or computers were first built, the main focus of innovation and invention was purely on the technology. However, with time, technologies have advanced far enough to enable development beyond the realm of technology, shifting the focus towards the people who use it. The principle that the user is at the centre of the design process is now vital to ensure safe, efficient, effective and enjoyable interaction between users and technology (Maguire 2001). Human-centred designs strive to understand the abilities and limitations of the user and develop technologies that play to the strengths of users whilst overcoming their weaknesses. Technologies are used in a number of environments; however, in safety critical environments, harnessing the principles of human-centred design is of utmost importance in order to maintain high levels of operator performance. Park and Jung (1996) argue that the human factors concept of mental workload is a very strong indicator for performance shaping factors.

Examining mental workload and the use of cognitive resources is thus critical for completing tasks in safety critical workplaces such as the railway signalling environment, where the research reported here was undertaken. Wickens (2002b) supports this, stating that task management is directly related to mental workload as the competing demands of tasks for cognitive resources can exceed the operator’s limited resources. During multitasking situations, there is potential to manage the demand on cognitive resources by presenting information in different formats, so that distinct resources are used, instead of overloading a single resource whilst underusing another.

Network Rail, the company that owns, maintains and operates the GB rail system, has developed an extensive ‘workload toolkit’ to support the measurement of signaller workload (Lowe and Pickup 2008). However, a potential gap was identified in that none of the existing tools specifically measure the level of usage of particular resources or the conflicts in resource demands during multitasking situations. The tools are also not always sensitive to differences in interface technologies, and this may be driving a lack of understanding of the potential impact on signallers when moving from old-style control panel type interfaces to modern VDU-based interfaces (Dadashi 2010).

Therefore, this paper aims to understand whether a particular tool, the Multiple Resource Questionnaire (MRQ; Boles et al. 2007), can be used to measure the cognitive workload faced by railway signallers in two types of signalling boxes, predominantly when multitasking situations occur. If successful, the results could then be used to demonstrate the competing use of particular resources for certain task combinations. Recommendations could be made as a consequence to spread the use of cognitive resources for specific task combinations. The MRQ was therefore applied to the signalling environment in order to investigate the exact nature of the resources used in a number of key tasks, and to explore whether the tool could identify differences between two different interface technologies. The MRQ measures workload for 17 different resources available for a given task, and this analysis allows for comparing tasks, which makes it particularly useful for multitasking situations. The analysis will provide insight into the impact of two distinct signalling technologies on the cognitive workload of signallers. This study will also allow the research community to gain some insight into how the MRQ performs as a tool in the field environment, as it has only been used in laboratory studies to date.

2 Literature review

2.1 Introducing the railway domain

Railway signalling is a workplace environment where the understanding of cognitive workload is critical. Routing of trains with the aim of safe operations and timeliness on the UK railway network is carried out by signallers in control boxes. There are three main types of signalling boxes in use: lever frames, eNtry–eXit (NX) panels and visual display unit (VDU) systems. The lever system controls comparatively small areas and uses mechanical levers to route trains. The NX panel is a console where a diagram of the control area is displayed. The signaller operates the console by pressing buttons and turning switches, and the status of the trains is provided via the illumination of light bulbs fixed in the console. More recently, VDU-based systems have been introduced which allow for automation. Here, the signallers view their control areas on VDU screens. These are used to show the position of trains as well as the status of signals and points. The information displayed is very similar to that shown on an NX panel, but the signaller interacts with the system via a tracker ball rather than by direct manipulation. Examples of both systems are shown in Figs. 1 and 2.

Fig. 1 The NX panel

Fig. 2 The VDU-based system

Whilst the underlying signalling system remains identical, these boxes tackle the task of signalling trains with inherently different interfaces, but both systems have drawbacks. The interface used in NX panels is deemed obsolete due to the difficulty in implementing automation and the increasing pressures to control larger areas covering more railway tracks. However, signallers using the VDU-based system, which allows for automation and can cover larger control areas, face other issues. For instance, the monitoring task is more complicated as areas holding important information can be located on screens far apart and the displays can be unclear and cluttered, which may lead to a reduction in terms of speed of operation for some routes (Dadashi 2010). Other problems include slower commands, as signallers have to wait for the computer to process information, as well as a lack of support for multiple users, since the tracker ball only supports a single user. Current analysis tools do not fully capture these differences and the potential impact on the signallers’ workload.

Establishing the use of cognitive resources when carrying out tasks is especially relevant in safety critical environments and workplaces such as air traffic control (ATC), emergency service dispatch, intensive care nursing and signalling. Literature shows that human factors such as mental workload affect human performance and contribute considerably to error rates (e.g. Wickens and Hollands 2000; Park and Jung 1996). In addition, Edwards et al. (2010) investigated the impact of multiple factors on human performance in ATC environments and found that mental workload occurred as a factor for incidents in half of the factor dyads and 4 out of 5 factor triads. This evidence further suggests the importance of mental workload in safety critical environments such as railway signalling. This can be explained by some of the challenges that are faced in these working environments. Safety critical environments are highly dynamic, so that problem solving tasks occur over time and at arbitrary periods. Moreover, workers face time pressures, overlapping tasks and the need to constantly maintain a high level of performance whilst having to be ready for any failure or problem that could occur (Hayes et al. 2004). Thus, if information can be collected on the way that tasks carried out in these environments impact on mental workload and therefore the use of specific cognitive resources, interfaces can be improved by identifying the format most suitable for presenting information to the operators. This information then complements the decision making process of operators, which should consequently reduce the cognitive load.

There has been a renaissance in rail human factors research in the last decade (Wilson and Norris 2006), and Network Rail in particular has put extensive effort into the development of workload measurement capabilities (Lowe and Pickup 2008). The tools developed include a signaller-specific workload self-assessment scale (Pickup et al. 2005), an activity analysis tool (Lowe and Pickup 2008), analytical interview tools (Lowe and Pickup 2008) and an operational demand evaluation checklist (ODEC) tool (Pickup, Wilson and Lowe 2010). In some cases, different versions of some of these tools have been developed for different methods of signalling, but to date the toolkit does not have an objective or analytic capability to examine the influence of the interface on the signaller mental workload. In addition, none of the existing tools focus explicitly on resource use and potential multitasking conflicts in the depth or specificity offered by the MRQ.

2.2 From cognitive load theory to the MRQ

The limited resource model assumes that resources, which are supplied to meet task demand, are limited and may be allocated (Kahneman 1973). In the case of left over, or unallocated, resources, the user has spare capacity. In contrast, when the task demands exceed the resources available, the user is faced with cognitive overload. Consequently, task performance depends on task difficulty and the allocation of available resources. However, not all variance in performance during multitasking can be attributed solely to these factors. Evidence shows that differences in the qualitative demands for information processing structures lead to differences in time-sharing efficiencies, which indicates that they behave as if supported by separate limited resources (Wickens 2002a). The Multiple Resource Model (MRM) highlights four important dichotomies which act like distinct resources. These dichotomies are the resources used by perception and cognition versus those resources used for execution of responses, visual versus auditory perception, where vision is further divided into focal and ambient vision, and finally spatial processes versus categorical or symbolic processes, which are usually linked to linguistics (Wickens and Hollands 2000). The final dichotomy is also associated with manual tasks that are linked to spatial processing and vocal tasks, which, in contrast, are processed verbally. Figure 3 shows these dichotomies in more detail. The MRM’s most valuable lesson is that overload can occur when tasks that are carried out simultaneously use the same resources. In a signalling environment, this means that if too many activities use similar resources, for instance visual or auditory resources, rather than a number of different resources, the signallers could face cognitive overload.

Fig. 3 Representation of the structure of multiple resources (as shown in Wickens 2002a)

Therefore, if cognitive resources for a certain task can be pinpointed, then technologies may encourage the use of other, still untapped resources during multitasking situations, which will enable the user to successfully carry out a number of simultaneous tasks without reaching cognitive overload. However, the MRM has some shortcomings. Firstly, not all resources are captured by this model—the theory overlooks haptics and tactile information for instance. Secondly, the use of dichotomies may overlook or simplify the relationships between these resources.

In an attempt to address some limitations of both the above models, Boles and Law (1998) proposed the Extended Multiple Resource Theory (EMRT), which acknowledges different types of resources but does not use dichotomies. In this model, the process resources identified, as shown in Table 1, suggest resources besides those of Wickens’ four dichotomies. Similarly to the MRM, the EMRT differentiates between processes concerned with encoding or central processing and processes handling responses. In addition to the resources in Wickens’ model, the EMRT includes emotional, facial figural, planar categorical and temporal processes. Moreover, the visual and spatial resources have been broken down into a number of distinct processes. Based on this theory, Boles et al. (2007) developed the MRQ, which measures the factors relating to processes to predict dual-task performance. Applying this method would mean unveiling the use of 17 different resources rather than resource use along only four dichotomies. Existing workload measures, such as the SWAT and NASA-TLX, fail to do this as they emphasise more global psychological variables. The EMRT indicates that the distinction between resources is crucial and thereby supports Wickens’ (2002a) view, which emphasises that distinct resource pools are used when people carry out tasks.

Table 1 Process-specific mental resources (adapted from Boles et al. 2007)

As a tool, the MRQ aims to assess the overlap of resources that are used to complete a number of single tasks in order to predict interference between them when multiple tasks are carried out (Boles and Adair 2001b). The validity and reliability of the MRQ have been tested in a number of laboratory experiments. Reliability was established in two studies: the first tested a large number of games with a comparatively small number of interraters per game, whereas the second focussed on two games with a large number of interraters. The results show reliability ratings, as measured by interrater agreement (r), between 0.57 and 0.83 (Boles and Adair 2001a). Secondly, to further satisfy reliability, a method should produce stable and consistent results over time (Wickens and Hollands 2000). When Phillip and Boles (2004) compared the MRQ with other workload measures, that is the workload profile and the global rating questionnaire, using computer games, they established that the MRQ showed the least variability over participants. Moreover, the validity of the MRQ was tested in a further study where 24 participants carried out 3 sets of simple tasks of a possible 4, and the results showed that the MRQ predicts dual-task interference (Boles and Adair 2001b). The MRQ therefore seems a fitting tool to investigate dual-task interference in a signalling environment, and its suitability will be further investigated in this paper. Megaw (2005) lists a number of criteria apart from validity and reliability for mental workload measurement techniques, including:

  • Sensitivity: The measure taken should be sensitive to changes in task demands or required attentional resources.

  • Diagnosticity: The reason for variation in mental workload should be detectable by the measures.

  • Intrusiveness: The measure should not interfere or disrupt the primary task performance.

  • Participants’ acceptance: The extent by which the participants follow given instructions and co-operate with the measurement technique.

From the literature, it becomes clear that principles of cognitive workload and multiple resources are critical in workplaces where safe and timely decision making is required. Understanding the use of specific resources is particularly important during multitasking situations, but multiple resources have little relevance for the demands that a single task places on the human because in this case workload simply depends on overall demands not exceeding capacity (Wickens 2008). Once the relationship between the interaction design and mental workload is understood, designers, ergonomists and HCI practitioners may be able to create interactions that optimise the use of cognitive resources.

3 Methodology

The aim of this paper is twofold. First, the MRQ is used to compare two types of technology used in signalling in terms of the cognitive workload they impose on signallers during multitasking situations in normal working conditions. Second, the MRQ is evaluated as an indicator of cognitive workload in a field setting. This research will therefore help to emphasise the strengths and weaknesses of the questionnaire as a method. To that end, the signalling task was analysed in both the NX panel and the VDU-based system in order to investigate the effect of the technology on cognitive resources used to complete the tasks. The research was executed in two stages. During the first stage, signallers were observed in order to identify the key tasks involved in signalling under normal operating conditions. The second stage aimed to identify which cognitive resources were used during these key tasks for both interfaces. The methodology applied and the results of each stage are discussed in turn.

3.1 Preparation for the Multiple Resource Questionnaire

During the first phase, direct observation was carried out to identify the key tasks in signalling. Signallers were observed at two signal boxes. Box A used the NX panel and Box B used the VDU-based system. Each box consisted of three workstations, each of which was assigned to one signaller. Although it is not possible to exactly match boxes in terms of workload due to the variation in infrastructure, the boxes were chosen to have broadly comparable levels of workload. During the first visit, the researcher observed each workstation in each box for an hour to identify the key tasks performed by signallers. Two further visits were conducted to analyse how often these tasks were carried out, so that they could be rated on importance according to occurrence. Here, the signallers at each workstation were observed for an hour, and each minute the researcher noted down the tasks carried out by the signaller. All observations were carried out under normal working conditions.
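The minute-by-minute observation record described above lends itself to a simple tally. The following Python sketch is illustrative only and is not the analysis procedure used in the study: the function name `summarise_observations`, the representation of each minute as a set of task numbers, and the example log are all assumptions made for demonstration.

```python
from collections import Counter


def summarise_observations(minutes):
    """Summarise a minute-by-minute task log.

    `minutes` is a list with one entry per observed minute; each entry is a
    set of task numbers noted during that minute (hypothetical encoding).
    Minutes with no recorded task are ignored.
    """
    active = [frozenset(m) for m in minutes if m]
    if not active:
        raise ValueError("no observed tasks in the log")

    # A minute counts as multitasking when more than one task was noted.
    multi = [m for m in active if len(m) > 1]
    single = [m for m in active if len(m) == 1]

    return {
        "multitask_share": len(multi) / len(active),
        "single_counts": Counter(next(iter(m)) for m in single),
        "combo_counts": Counter(tuple(sorted(m)) for m in multi),
    }


# Hypothetical ten-minute log; the numbers stand for task identifiers.
log = [{1}, {1, 2}, {1, 3}, {2}, {1, 2}, {1}, {7, 1}, {1, 2}, {8}, {1}]
summary = summarise_observations(log)
print(summary["multitask_share"])
print(summary["combo_counts"].most_common(1))
```

Counting sorted tuples means that a combination such as tasks 1 and 7 is treated as the same pair regardless of the order in which the tasks were noted.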

3.2 Procedure and recruiting

Recruitment took place directly at the signal boxes. Permission to approach the signallers was provided by the manager of each box. After an introduction and an explanation of the study, the signallers were approached individually. All signallers who participated during the observation signed a consent form and filled in a short demographic questionnaire. Across both boxes, a total of 13 participants took part. All participants were male, with an average age of 40.9 years, ranging from 26 to 53. The majority had obtained GCSE or A-level education and had spent an average of 15.6 years as a signaller. The observation was then carried out, and the researcher also noted any comments the signallers made about the system they were using.

3.3 Results from the observation

During the first visit, 12 tasks that signallers carry out under normal working conditions were identified. These are listed in Table 2. The remaining two rounds of observation investigated the single task and multiple task frequency, as well as the combination of tasks carried out simultaneously. The results are shown in Figs. 4 and 5.

Table 2 Typical tasks of signallers in normal working conditions

Fig. 4 Single task occurrence in both boxes (task numbers refer to tasks listed in Table 2)

Fig. 5 Multitask occurrence in both boxes (task numbers refer to tasks listed in Table 2)

The data collected show the importance of multitasking in a signalling environment. The signallers used a combination of tasks to fulfil their role during 57.3 % of their time and carried out a single task in 42.7 % of their time. Looking at the figures for both boxes, in the NX panel, 48.8 % of the time is dedicated to multitasking, and in the VDU-based system, the figure is 58.1 %. This difference could be explained by either the difference in traffic of the two areas controlled by each signal box or the design and layout of the boxes; the VDU-based system may require the signallers to multitask more frequently. Considering the data for multitasking situations in more detail, monitoring, routing, using TRUST, making calls and setting reminders (tasks 1, 2, 3, 7 and 8, respectively) are the five key tasks that describe 93 % of task combinations in the NX panel and 91 % in the VDU-based system. Therefore, these five tasks were chosen for closer analysis via the MRQ in the second stage of the research.

At this point, it can be observed that in both boxes the monitoring task is the most frequent single task as well as the most frequent in dual-task combinations. The most striking difference between the two boxes relates to an increase in dual tasking in the VDU-type box for the task combinations of monitoring and using TRUST as well as monitoring and using the timetable. This discrepancy could be explained by different rates of traffic that signallers have to manage. However, and perhaps more interestingly, it could also be explained by the different setup of the signal boxes. The shared board of information in NX panels may reduce the need for constant double checking and encourage communication between signallers, and thereby reduce the amount of time spent on these task combinations. The remainder of the dual tasking situations seems fairly equally distributed, considering that calls and setting reminders are caused by external forces and may occur more frequently at some times than others. Hence, consistency during the short period of observations cannot be assumed.

During the observations, comments by the signallers with experience in both types of signal boxes were collected. Two main points can be taken away from these comments. Firstly, the signallers find the VDU-based system more restrictive, as only one task can be carried out at a time in terms of route setting (see Fig. 2). With the tracker ball, the signallers may only use one hand, and simultaneously using an additional input method, for instance the keyboard, is not supported by the system. On the NX panel, in contrast, signallers often used two hands simultaneously to enter and cancel routes. Secondly, the signallers believe that the VDU-based system is much slower than the NX panel. The system needs to process every change the signaller makes, whereas these changes are implemented immediately on the NX panel. Although automation decreases workload to a certain extent, the signallers felt that they are held back and restricted by the system because of these two issues, and therefore, they are not able to work at their capacity. This is further problematic when multiple users are required on the same panel, which is critical especially during incident management. On the NX panel, signallers are able to pass on part of their panel to another signaller, and the supervisor is able to work on the same panel simultaneously if needed (see Fig. 1). However, the VDU-based system only allows for interaction with a single user, and it is not possible to pass on the responsibility for a part of the signalling area to another signaller. Therefore, signallers feel that the VDU-based system does not process changes fast enough and that it does not have a mechanism for meeting the need for additional help. Whilst these comments provide additional valuable insights into the signalling task, care needs to be taken as they are subjective views and may be influenced by individual preferences.

Due to the different nature of the interfaces used in the signalling boxes as well as the comments received from the signallers, it was predicted that there would be a difference in the nature and amount of the cognitive resources used for each of these tasks when comparing between boxes.

3.4 The Multiple Resource Questionnaire

During the second phase of the research, the MRQ was completed in order to predict task interference in multitasking situations. The aims for this investigation were twofold. First, the results should reveal an accurate prediction of similarity of task pairs. This, in turn, should show the degree of interference of these tasks when carried out simultaneously. The results would also show which cognitive resources are underutilised. Secondly, the investigation allowed the researcher to scrutinise the MRQ as a method in a field setting.

The MRQ uses a rating scale to measure the usage of 17 mental resources, as shown in Table 1. A number of alterations had to be made to adjust the questionnaire to the audience and the environment. Even experts in the area found it very difficult to grasp the essence of each resource, so it was necessary to simplify the language of the questionnaire. This was mainly achieved by providing examples unrelated to the signalling task to help the participants obtain a good understanding of each process. The researchers also aimed to simplify the description of each process, but this proved difficult: the descriptions were very specific, and generalising them would often have led to overlap with at least one other process. An example is shown to demonstrate the changes made:

  • Original description of the process:

    • Spatial quantitative process—required judgement of numerical quantity based on a non-verbal, non-digital representation (for example, bar graphs or small clusters of items), using the sense of vision.

  • Adjusted description of the process:

    • Spatial quantitative process—judgement of numerical quantity based on a non-verbal, non-numerical representation.

      Example: Estimating the number of people in a room without counting them.

The questionnaire was limited to measuring the usage of cognitive processes in five tasks due to time constraints. These tasks were identified in stage one; they are key to the signalling task and were often used in multitasking situations. The tasks selected were monitoring, routing, using the computer, making calls and setting reminders.

4 Results from the Multiple Resource Questionnaire

The MRQ results for both Box A, where the NX panel is used, and Box B, with the VDU-based system, are shown in Table 3. The table shows the mean usage of each mental process. Here, 0 represents no usage and numbers 1–4 represent light, moderate, high and extreme usage, respectively. Via this table, profile similarity can be demonstrated. This measure gives an insight into the similarity of the peaks and valleys of demand across resources (Boles et al. 2007). Processes that incurred high usage are highlighted. Looking at the table, two observations can be made. Firstly, the tasks in either box seem to occupy similar mental resources, and the use of these resources is slightly higher in Box B, the VDU-based system. Secondly, some resources are highly used and some resources are untapped. Both claims are investigated in more detail.

Table 3 Mean usage of mental resources

Each task was rated according to the level of usage of cognitive resources. The table shows that tasks use similar resources when carried out in different signalling boxes. First of all, monitoring shows high usage of a number of processes including short-term memory, manual, spatial attentive, spatial positional and visual temporal processes. The maximum range between these highly used processes is only 0.53 between the two types of signalling boxes. Routing shows high usage in these processes too, whereas using the computer requires fewer mental resources, with high usage restricted to manual, spatial attentive and visual lexical processes. Making calls, in contrast, makes high use of both auditory processes as well as short-term memory and the vocal response process. Finally, setting reminders makes particular use of short-term memory, manual, spatial attentive and spatial positional processes.

4.1 The overall task demand

For the monitoring, routing and calling tasks, the usage of cognitive processes in Box B, the VDU-based system, is higher than in Box A, the NX panel, which is reflected in higher scores of overall demand. Overall demand can be obtained by summing all resource ratings for a given task (Boles et al. 2007). However, the demand on cognitive resources for using the computer is the same, and setting reminders requires fewer cognitive resources in Box B. These claims were tested using a Mann–Whitney test. This test was chosen as the data do not fulfil the requirements for parametric statistical testing. Five tests, one per task, examined whether the overall demand differs between Box A and Box B. The hypotheses were defined as:

H0: There is no difference in overall demand between Box A and Box B for the monitoring, routing, computer, calls or reminder task.

H1: There is a difference in overall demand between Box A and Box B for the monitoring, routing, computer, calls or reminder task.

The Mann–Whitney U statistic is defined as

$$ U = n_{1} n_{2} + \left( {\frac{{n_{x} \left( {n_{x} + 1} \right)}}{2}} \right) - T_{x} $$

where $n_{1}$ and $n_{2}$ are the sample sizes of the first and second condition, $T_{x}$ is the largest rank total and $n_{x}$ is the sample size of the condition with the largest rank total. The null hypothesis can be rejected if

$$ U_{\text{observed}} \le U_{\text{critical}} $$

The outcomes of the statistical tests are shown in Table 4. The null hypothesis could not be rejected in any of these comparisons. Therefore, the difference in overall demand between the two boxes is not statistically significant for any of the five tasks.
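As an illustration of the calculation, the following Python sketch sums resource ratings into an overall demand score and computes a Mann–Whitney U value from two groups of such scores. It is a minimal sketch under stated assumptions rather than the analysis code used in this study: the function names and all data values are hypothetical. The sketch evaluates the formula for both conditions and reports the smaller U, which is the value usually compared against tabulated critical values; where the condition with the largest rank total also yields the smaller U, as in the example data, this coincides with the definition given above.

```python
def overall_demand(ratings):
    """Overall demand for one participant and task: the sum of all resource ratings."""
    return sum(ratings)


def rank_with_ties(values):
    """Return 1-based ranks; tied values receive the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # mean of sorted positions i..j, 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def mann_whitney_u(sample_a, sample_b):
    """Smaller of the two U values for two independent samples."""
    n1, n2 = len(sample_a), len(sample_b)
    ranks = rank_with_ties(list(sample_a) + list(sample_b))
    t1 = sum(ranks[:n1])  # rank total of the first condition
    t2 = sum(ranks[n1:])  # rank total of the second condition
    u1 = n1 * n2 + n1 * (n1 + 1) / 2 - t1
    u2 = n1 * n2 + n2 * (n2 + 1) / 2 - t2
    return min(u1, u2)


# Hypothetical overall demand scores for one task (not data from this study).
box_a = [20, 23, 25, 28]
box_b = [22, 27, 30, 31, 33]
u = mann_whitney_u(box_a, box_b)
print(u)
```

The resulting U would then be compared with the critical value for the two sample sizes at the chosen significance level, and the null hypothesis rejected only if the observed value does not exceed it.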

Table 4 Results of the Mann–Whitney U test

4.2 The overlap similarity

Secondly, it was observed that some resources are highly used and some resources are untapped, depending on the task that is carried out. When considering the evidence from the results, it becomes clear that short-term memory, manual, spatial attentive and positional, as well as visual lexical processes are used heavily for at least four out of the five tasks. In contrast, processes such as the auditory emotional and linguistic, tactile figural, vocal, facial motive and figural processes are underused. A measure called overlap similarity was used to show this in more detail. The overlap similarity score shows the degree of similarity of the cognitive resources demanded by two distinct tasks when these tasks are carried out simultaneously. To obtain this score, the difference between the ratings for each cognitive resource across the two tasks is taken, and these differences are then averaged across the resources (Boles et al. 2007). A low score indicates a high degree of similarity between two tasks, with a minimum score of 0. Conversely, a high score indicates a low degree of similarity, with a maximum score of 4, the largest possible difference on the 0 to 4 rating scale. The overlap similarity scores for the task pairs that were most common in a signalling environment are given in Table 5. This table shows that combinations of tasks that use similar resources have a lower score. Examples are monitoring and routing as well as routing and using the computer in both types of signalling boxes. However, making calls uses auditory and vocal resources. As these resources are not used to that extent in other tasks, the combinations that include calls show a higher overlap similarity score. Nevertheless, there is scope for improvement by reducing similarity for any of the combinations of tasks.
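The overlap similarity calculation can be sketched in a few lines of Python. This is a hedged illustration, not the scoring procedure of Boles et al. (2007): the function name `overlap_similarity` is hypothetical, the differences are assumed to be absolute differences, and the ratings below are invented values for only five of the resources.

```python
def overlap_similarity(task_a, task_b):
    """Mean absolute difference between two tasks' resource ratings.

    Both arguments map resource names to ratings on the 0-4 usage scale.
    Taking the absolute value of each difference is an assumption.
    """
    if task_a.keys() != task_b.keys():
        raise ValueError("both tasks must be rated on the same resources")
    diffs = [abs(task_a[name] - task_b[name]) for name in task_a]
    return sum(diffs) / len(diffs)


# Hypothetical ratings for a subset of resources (not data from this study).
monitoring = {"short-term memory": 3, "manual": 3, "spatial attentive": 3,
              "auditory linguistic": 0, "vocal": 0}
routing = {"short-term memory": 2, "manual": 3, "spatial attentive": 3,
           "auditory linguistic": 0, "vocal": 0}
calls = {"short-term memory": 3, "manual": 1, "spatial attentive": 1,
         "auditory linguistic": 4, "vocal": 3}

similar_pair = overlap_similarity(monitoring, routing)
distinct_pair = overlap_similarity(monitoring, calls)
print(similar_pair, distinct_pair)
```

In this invented example, monitoring and routing draw on nearly the same resources and so receive a low score, whereas monitoring and making calls differ on most resources and receive a higher one.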

Table 5 Overlap similarity scores

5 Discussion

The aims of this paper were twofold. Firstly, the MRQ was tested as a method measuring the usage of distinct cognitive resources required for a specific task in a field setting. Secondly, the results of the MRQ are discussed and related to the signalling environment as well as the technologies used by signallers to carry out the signalling task.

5.1 Methodological considerations for the MRQ

There are a number of criteria against which methodologies such as the MRQ can be rated. These criteria include validity, reliability, sensitivity, diagnosticity, intrusiveness, ease of use and operator acceptance (Megaw 2005). The MRQ will be analysed in terms of these criteria. Firstly, validity examines whether the method can measure changes in mental workload (Wickens and Hollands 2000). When the method was applied in laboratory experiments by Boles and his colleagues, it clearly showed differences in mental workload along the 17 different cognitive processes (Boles and Adair 2001a, b; Boles et al. 2004). Thus, the validity of the method can be assumed for this study. Secondly, the method needs to have high reliability, demonstrated by stable and consistent results over time (Wickens and Hollands 2000). When Phillip and Boles (2004) compared the MRQ with other workload measures, namely the workload profile and the global rating questionnaire, using computer games, they established that the MRQ showed the least variability over participants. Although reliability clearly holds in the laboratory setting, it can be questioned in the field setting. Some of the participants in this study struggled to grasp the meaning of the different processes, and one even openly admitted to selecting answers at random. As a consequence, participants need to undergo training to ensure reliability when the MRQ is applied.

In order to be a good measure of mental workload, the MRQ should be sensitive enough to detect changes in workload (Wickens and Hollands 2000). As before, the sensitivity of the MRQ was previously confirmed by studies carried out by Boles and colleagues (Boles et al. 2004; Boles et al. 2007) and is furthermore supported by the outcomes of this study. These studies showed that different tasks do indeed require different cognitive processes, and the MRQ captured this well by measuring a total of 17 processes that can be involved in completing a task. Moreover, the MRQ is also sensitive enough to predict dual-task interference. In the signalling environment, the differences in overall task demand are not statistically significant, but the results still show that the level of usage of mental processes is task dependent. For instance, monitoring prompted high usage of the short-term memory process, whereas making calls instigated high usage of the auditory emotional process. In addition, spatial attentive resources are highly used for both routing and setting reminders, but not for making calls. The lack of significance may result from the participants’ unfamiliarity with the tool. Had the participants been trained in how to use the MRQ to increase their understanding and confidence, sensitivity might have increased. The results certainly show potentially high sensitivity of the MRQ, but they also highlight the need for participant acceptance and understanding of such a tool.

Furthermore, diagnosticity shows the extent to which a measure provides an indication of the nature of the workload change (Wickens and Hollands 2000). As the MRQ uses a 17-item measure for subjective workload assessment, it aims to provide high diagnosticity by identifying the load on specific resources (Boles and Adair 2001a). These 17 processes have been identified as distinct cognitive resources, a distinction that is clearly critical in research fields such as psychology. However, some of these distinctions may be unnecessary when the theory is applied in areas such as human factors or any other discipline that focuses on design. When designing tangible objects or intangible experiences, the design’s complexity is limited, so that not all of the cognitive processes can be triggered by the use of different media or modalities. Future research should investigate whether the high diagnosticity is useful in practically oriented fields of research and which of these distinct resources can be pooled together.

Moreover, the intrusiveness of a method, that is, the degree to which the method interferes with task performance and thereby contaminates the results, should be low (Wickens and Hollands 2000). During this study, the participants were asked to rate the usage of the 17 cognitive processes in retrospect. Thus, intrusiveness was low. Finally, O’Donnell and Eggemeier (1986) suggest that the method should be easy to use for both the participants and the researcher, and should achieve high user acceptance. The method failed on both counts, as the participants found it very hard to understand some of the processes and the distinctions between them, although the descriptions had already been simplified and appropriate examples were provided by the researcher. Moreover, a number of participants felt overwhelmed by the questionnaire and opted out, and one participant admitted to randomly ticking boxes.

Some final remarks can be made about the use of the MRQ outside laboratory conditions. A number of biases could have been introduced by the setup. Firstly, Boles and his colleagues primed participants before asking them to complete the MRQ. Participants carried out the task which was rated continuously for a set amount of time (Boles and Adair 2001b; Boles et al. 2004). This was not possible whilst collecting data at the workplace. The lack of priming could have been overcome by using a training simulator and exposing signallers to the tasks they were rating in a more controlled manner. However, this option would be expensive and time-consuming. Participants would also be under the impression that their abilities and performance were being tested, which would have skewed the results. Secondly, the lack of priming also allowed for some variation, as participants could have been thinking about slightly different scenarios within the task definitions. For instance, a call could simply provide a train driver with information, it could concern the management of maintenance, or it could be a distress call in an emergency situation. Each of these scenarios has the potential to influence the results.

5.2 The use of cognitive processes in a signalling environment

This paper aimed to investigate the effectiveness of the Multiple Resource Questionnaire as a tool to pick up differences in workload and resource requirements between different signalling techniques and between the tasks signallers carry out on a daily basis. The results show that there are no significant differences in the overall demand for each task between the VDU-based system and the NX panel. Therefore, it can be deduced that the workload is at a similar level in both signalling boxes, which suggests that, with the help of automation in the VDU-based systems, the operators are able to cope with larger control areas. Furthermore, the tasks carried out most commonly by signallers (monitoring, routing, using TRUST, making calls and placing reminders) have a consistent profile similarity over the two types of signalling boxes. For instance, the monitoring task shows high usage of short-term memory and visual temporal processes and low usage of auditory emotional and tactile figural processes in both boxes. Although different technologies are used, these results suggest that the interaction itself has not changed by a significant amount, which is in line with the underlying signalling system and tasks remaining the same. More generally, tasks carried out by signallers are highly visual, and they require high usage of short-term memory as well as spatial reasoning. Hence, there is still potential to redesign the tasks so that they make more efficient use of cognitive resources.

Moreover, the demand on cognitive resources has not been tackled in multitasking situations. Both railway signalling boxes show high overlap similarity for all 7 task pairs. This concentration of task demands on similar resources means that the user could be faced with cognitive overload, as the resources that can be allocated are limited (Wickens 2002a). This can be overcome by spreading the task demand over distinct cognitive resources. Furthermore, some additional problems of the VDU-based system have been highlighted by the operators and the literature. These include the lack of support for multiple users, the lack of immediacy as the computers have to process commands, cluttered displays and the dispersed presentation of important information (Dadashi 2010).

Thus, the underlying issue that can cause cognitive overload in railway signalling, as well as in other safety critical environments, is the way the tasks are designed and restricted in terms of the interaction with the technology used to complete these tasks. This problem can be overcome by applying principles of multimodal communication. A modality is defined as the type of communication channel used to present or receive information (Nigay and Coutaz 1993). These modalities include speech, gestures, touch, etc. In a railway signalling environment, the communication between the operators and the technology could be designed in such a way that different tasks use different modes of communication. For instance, setting routes could be achieved using touch, gestures or speech commands. One of the benefits of using multiple modes of communication is the flexibility it grants, as the operators can choose the best mode depending on circumstances. Moreover, multimodal communication is something that we, as humans, engage in naturally, especially when task demands increase (Oviatt et al. 1997). If applied in a meaningful way, the use of specific cognitive resources can be triggered by the use of distinct communication modalities. Therefore, profile similarity and overlap similarity of any combination of tasks can be reduced by spreading task demands over a range of cognitive resources in a multitasking situation in order to reduce the risk of cognitive overload.

6 Conclusions

The aims of this paper were twofold. Firstly, the goal was to identify the distinct cognitive resources used by operators in two different railway signalling environments when carrying out tasks. The results of this study could then inform how existing technology supporting the signalling task can be improved or new technology could be developed by triggering the use of underused resources during multitasking situations so that competition over the same resources is reduced. Secondly, the MRQ, a relatively novel methodology, was applied to measure the use of cognitive resources. As this questionnaire has not been used in a field setting before, its value was investigated here.

As a method, the MRQ was adequate on a number of criteria, most importantly validity, reliability and sensitivity. Validity was supported in this field trial, as clear resource uses could be identified so that interference of resources when multitasking in a signalling environment could be highlighted. The tool was found reliable in previous research; however, reliability could be enhanced by adding a training session so that participants’ understanding and acceptance of the tool would increase. The MRQ as a tool that assesses mental workload was also found to be sensitive to changes in mental workload depending on the task carried out, which was demonstrated by the difference in processes required by signallers for distinct tasks. It is particularly useful for determining the cognitive resources that are required to carry out a task, and it further allows for a direct comparison of a pair of tasks, allowing for analysis of profile similarity, overall demand and overlap similarity. However, two main issues were identified when using the MRQ in a field environment to inform interaction design. First of all, the operators felt the distinctions between some processes were difficult to understand even after simplification and provision of examples unrelated to the signalling task. Future research should aim to clarify the questionnaire so that it is easily understood by participants without losing the meaning of the cognitive processes. Alternatively, participants could be given a familiarisation session prior to rating relevant tasks, which could also assess their level of understanding of the cognitive processes. Furthermore, the MRQ is potentially too sensitive in a human-centred design context. There are two thoughts behind this claim. First, it could be argued that it is very difficult to design interactions that trigger the use of certain cognitive resources at that granularity. Secondly, this level of granularity may not even be required. For instance, if a process requires the completion of two different tasks, then the allocation of resources from two distinct cognitive processes may be enough to cover task demands. Therefore, it can be argued that relevant groupings of cognitive processes could be developed, which may be more meaningful to designers and HCI practitioners. Future research should be carried out in order to develop such a meaningful grouping and investigate exactly how the use of these resources can be triggered via interaction design.

Applied to a railway signalling environment, the MRQ was used to establish the levels of usage of cognitive resources when signallers were multitasking at two different types of signalling boxes. These boxes use different equipment to help the operators to route trains in a safe and time effective manner. Establishing a deep understanding of mental workload in safety critical workplaces is important in order to design interactions between operators and their systems that assist in completing tasks safely, efficiently and effectively. The results of the MRQ lead to two main conclusions. Firstly, although the signalling boxes use two different types of technologies, similar cognitive resources are required to complete signalling tasks such as monitoring and routing. This shows that the technology may have advanced to allow for automation and control of larger areas, but the interactions used to complete tasks have largely remained the same. Secondly, tasks carried out by railway signallers use similar cognitive resources. Thus, there is a danger of overload, especially during non-normal working conditions. For that reason, there is potential to redesign interactions using the principles of multimodal human–computer interaction, as using different modes of communication whilst multitasking can trigger the use of distinct cognitive resources. This approach will not only make interactions more natural and flexible, but it could also reduce the risk of cognitive overload by encouraging use of currently untapped resources. Future research should investigate exactly how the use of these resources can be triggered via interaction design.