1 Introduction

For a number of years, substantial research has investigated the role of external representations in problem solving and how such representations affect problem solving efficiency [7, 8, 16]. Arguably, a key factor underlying the advantages afforded by multiple external representations (MERs), in terms of their problem solving capacity, is computational non-equivalence [1]. Representations that contain identical information (information equivalence) but exhibit different representational characteristics produce differences in the ease of information extraction. Larkin and Simon [8] contrasted interpretations of graphical and textual information in terms of search, recognition and inference, claiming that one principal benefit of diagrammatic representations is that “diagrams can group together information [to be] used together thus avoiding large amounts of search for the elements needed to make a problem solving inference.” Our prior work contrasting informationally equivalent, but computationally distinct, 2D and 3D representations revealed that differences in representational characteristics have significant effects on the amount of cognitive effort required to perform tasks [5].

In addition to individual cognition, a broader level of research considers that an understanding of the interaction between representational characteristics and cognition develops through social interaction and activity. Collaborative learning research underlines that the interaction of the cognitive processes of several people is qualitatively different to that of an individual [13, 14, 15]. Hutchins [6] broadly characterises group-level cognitive properties, stating that these properties are produced by interactions between structures internal to individuals and structures external to them (e.g. external representations).

1.1 Effect of communication medium on collaborative problem solving

When visual information must be conveyed in audio-only contexts, collaborators are forced to verbalise tacit knowledge, which can be misinterpreted. Shared representations can mediate collaborative problem solving discourse, without the need for complex verbal descriptions, by providing a means to articulate emerging knowledge and solutions within a medium visible to all participants. Generally, research findings concerning the effects of visual signals on communicative processes appear to be mixed [2, 12, 4].

These studies compared performance and effects on communication in face-to-face, video-mediated and audio-only conditions. Our aim is to investigate the effects on performance and communication of audio-visual (shared representations) and audio-only conditions, following the assumption that visually shared representations optimise collaborative problem solving. Moreover, we aim to examine both explicit and implicit references made during interactions, while prohibiting non-verbal communication except via the computer interface. In addition, we aim to gather subjective measures, via a usability assessment, to evaluate the representations.

2 Properties of 3D representations in desktop virtual reality (VR) environments

This study compares two graphical representations depicted in Figs. 1 and 2: 3D cylinder and 3D helix, respectively. The representations were presented in web browsers with Virtual Reality Modelling Language (VRML) plug-in controls (e.g. CosmoPlayer), in desktop VR environments incorporating audio-visual (shared representations) or audio-only communication. The development of representations within desktop VR environments offers certain advantages; for example, desktop and web-based VR environments can potentially distribute data visualisations over the Internet, allowing multiple users to access the information concurrently. The development of desktop VR is also more cost-effective than equivalent immersive environments [3].

Fig. 1

3D cylinder representation. Telephone calls made by six groups over a one-month period (with pop-up message, spin button, expanded group key and ‘Start View’)

Fig. 2

3D helix representation. Telephone calls made by six groups over a one-month period (with pop-up message, spin button, expanded group key and ‘Start View’)

Both representations include identical telephone usage data, incorporating call usage for six discrete groups during a one-month period. The usage data for each group is represented by colour-coded data points or cylinders on the representations. Figures 1 and 2 highlight shared VR functionality included with each representation, notably:

  • A collapsible colour key for six groups with associated pictures

  • A pop-up message (showing associated group picture and usage information) briefly displayed in the top-left corner whenever a data point is selected

  • A spin button that rotates the representation clockwise when selected

  • A ‘Start View’ (from viewpoint list) to return to the default front view orientation

Table 1 summarises the design characteristics for both 3D representations.

Table 1 Representational design characteristics

Both representations share identical databases and design characteristics, and are considered to have informational and computational equivalence. Each representation consists of a number of circular spirals, each rotating through 360°. Each spiral represents one week and is further divided into seven equal parts, equating to seven days. A darker shaded segment denotes night, with the centre of the segment representing midnight. The two coloured segments within each spiral denote weekends. The 3D cylinder in Fig. 1 presents consecutive weekly intervals of usage data across the spirals, with the earliest week represented by the innermost spiral. With the 3D helix in Fig. 2, the days of the week increment clockwise along each spiral, with the earliest week represented by the foremost spiral. The spirals of both representations hold a finite number of potential cylinders, each cylinder representing the call duration for a particular group during a two-hour period.
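As an illustration of this mapping, the sketch below (Python for exposition only; the study representations were built in VRML, and the spacing parameters and function names here are assumptions) shows how a record of week, day, two-hour slot and call duration could be positioned on each layout.

```python
# Illustrative sketch (not the authors' VRML source) of the data-to-geometry
# mapping described above. Week, day and two-hour slot determine the angular
# position around a 360-degree spiral; the 3D cylinder varies radius per week
# (innermost spiral = earliest week), while the 3D helix varies depth
# (foremost spiral = earliest week). Cylinder height encodes call duration.
import math

SLOTS_PER_DAY = 12            # two-hour periods per day
SLOTS_PER_WEEK = 7 * SLOTS_PER_DAY

def angle_for(day, slot):
    """Angle (radians) around the weekly spiral for a given day and slot."""
    return 2 * math.pi * (day * SLOTS_PER_DAY + slot) / SLOTS_PER_WEEK

def cylinder_layout(week, day, slot, duration, base_radius=2.0, week_gap=0.5):
    """3D cylinder: concentric weekly spirals, earliest week innermost."""
    theta = angle_for(day, slot)
    r = base_radius + week * week_gap
    return (r * math.cos(theta), r * math.sin(theta), 0.0, duration)  # x, y, z, height

def helix_layout(week, day, slot, duration, radius=2.0, week_depth=1.5):
    """3D helix: weekly spirals stacked in depth, earliest week foremost."""
    theta = angle_for(day, slot)
    return (radius * math.cos(theta), radius * math.sin(theta), week * week_depth, duration)
```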

3 Behavioural usage data and information processing tasks

The telephone usage data were created and manipulated to produce differences in call usage and patterns between six groups (based on typical groups representing actual BT customers). These groups, each with associated call usage, include: a family with three children (heaviest call usage), a mother with two children (medium call usage, daytime calls only), a mother with an infant (light call usage, only one evening call), a father with two children (lightest call usage), a male aged 33 (evening calls only, mostly at 8 pm) and a student (evening calls, mostly at 10 pm, with occasional weekend and early morning calls).

A set of five analytical tasks (see Table 2) was devised, with each task divided into two parts, A and B. Tasks 1a and 3a query which group demonstrates the highest and which the lowest call usage (call duration and frequency) for the entire month. Task 2a tests whether specific call patterns can be derived from the representations. The data distributions have been manipulated so that four of the six groups exhibit specific call patterns (e.g. approximately 50% or more of total call usage occurring during specific diurnal or weekly periods). Task 4a tests whether a series of ‘shared’ cylinders (six in total) can be identified (‘shared’ cylinders are divided into two colours, denoting two groups with identical calling times and durations). Finally, task 5a tests which group called at a specific diurnal period throughout the month.

Table 2 Analytical tasks for telephone usage for six groups during a one-month period

Part B components are included to encourage collaborators to discuss whether any particular representational properties facilitated their search for answers to part A tasks. Moreover, part B tasks are designed to elicit the mutual exchange of representational characteristics between the participants, to derive qualitative feedback to inform usability assessments for the representations and to gain insights into how different representations contribute to collaborative problem solving.

4 Expert walkthrough and predictions

An expert walkthrough (e.g. [9]) was performed by the first author, using task examples with both representations, to identify strategies for attaining sub-goals (most and least effective methods) and cognitive processes. Within the strategies identified, a combination of physical actions and VRML plug-in control selections were possible, including Seek, Zoom, Rotate, Pan, Undo, Redo, Straighten or the ‘Start View’ viewpoint selection from the Viewpoint List.

During the expert walkthrough, three high-level, sequential cognitive processes were identified and associated with each stage of the five tasks:

  1. Scanning the representation to identify target cylindrical data points

  2. Selectively grouping target cylindrical data points according to task requirements, e.g.:

     (a) by group colour or name, and cylindrical size and frequency (for usage estimations and comparisons, e.g. tasks 1a and 3a), and/or

     (b) by group colour or name, and cylindrical spatial location (for temporal patterns and comparisons, e.g. tasks 2a, 4a and 5a)

  3. Encoding target group(s) by writing or remembering group name(s) or colour(s)

These processes may be repeated, with the number of iterations being determined by task difficulty and the efficiency of the strategy adopted. The main factor contributing to lower mental effort is the degree of computational offloading onto each representation. The amount of offloading achievable depends on the efficiency of the strategy selected and its appropriateness for both representation and task.

Figure 3 shows the 3D helix representation at the front, or ‘Start View’. The simplest strategy is to retain this view while scanning the data and clicking on the cylinders to read the pop-up messages. However, in order to avoid data occlusion, the representation should be rotated off-centre (e.g. see Fig. 4), so that cylinders and spirals are segregated.

Fig. 3

3D helix representation (at ‘Start View’). Telephone calls made by six groups over a one-month period (with expanded group key)

Fig. 4

3D helix representation (rotated and offset left). Telephone calls made by six groups over a one-month period (with expanded group key)

Perceptual distortions can occur with the 3D helix for tasks requiring data size comparisons (tasks 1a and 3a), as cylindrical height-cues are dissipated around the spirals. By contrast, the 3D cylinder representation (see Fig. 5) optimally supports data size comparisons as the spirals are horizontally aligned, although some perceptual distortions still occur, as cylinders are positioned around the spirals at different distances from the user.

Fig. 5

3D cylinder representation (view zoomed in slightly). Telephone calls made by six groups over a one-month period (with group key collapsed)

To overcome these distortions, the 3D cylinder can be spun around so that distant cylinders are presented in the foreground, or rotated upwards so that the representation appears horizontally flat with the cylinders vertically aligned (see Fig. 6). It is then possible to spin the representation to estimate cylindrical heights as the representation turns.

Fig. 6

3D cylinder representation (rotated up to eye-level view). Telephone calls made by six groups over a one-month period (with group key collapsed)
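As a rough illustration of this viewpoint manipulation (not taken from the study software; the tilt angle and function name are assumptions), a point in the cylinder layout can be brought towards an eye-level view by rotating it about the horizontal axis:

```python
# Simple sketch of tilting the cylinder layout about the horizontal (x) axis
# so the spirals appear flat and cylinder heights can be compared at eye level.
import math

def rotate_about_x(point, degrees):
    """Rotate an (x, y, z) point about the x-axis by the given angle."""
    x, y, z = point
    a = math.radians(degrees)
    return (x, y * math.cos(a) - z * math.sin(a), y * math.sin(a) + z * math.cos(a))

# e.g. tilt every cylinder position by ~80 degrees towards an eye-level view:
# eye_level = [rotate_about_x(p, 80) for p in cylinder_positions]
```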

Both representations share representational characteristics that promote computational offloading and task efficiency. The distribution of cylinders around the spirals at regular intervals enables temporal patterns to be perceived when the cylinders are aligned across the spirals. However, one noticeable difference between the representations is that the shaded and orange segments around the spirals are less distinguishable on the 3D cylinder (see Fig. 5), appearing more prominent around the helical form of the 3D helix (see Figs. 3 and 4). This effect is primarily due to the horizontal alignment of spirals in the 3D cylinder. As a result, computational offloading may be higher with the 3D helix for tasks requiring the comparison of targets at specific diurnal periods (task 5a).

4.1 Experimental predictions

The study follows the assumption that visually shared representations optimise collaborative problem solving, predicting a performance advantage for audio-visual groups. The expert walkthrough revealed differences in representational characteristics, which may produce differences in performance measures between representations for certain tasks. To summarise, these hypotheses predict that an overall performance advantage will occur:

  • For audio-visual groups, as forced verbalisation is unnecessary due to the presence of shared visual information

  • For 3D cylinder groups for data size comparisons (e.g. tasks 1a and 3a)

  • For 3D helix groups for tasks requiring discerning between diurnal periods, weekdays or weekends (e.g. task 5a)

It is also predicted that lower usability ratings will be submitted by audio-only groups (via questionnaire, see Sect. 5.4), as participants are forced to verbalise tacit knowledge and representational characteristics.

5 Method

5.1 Participants

A total of 120 participants (58 males and 62 females, aged between 17 and 47), comprising undergraduate and postgraduate students at the University of Nottingham, were tested in a between-subjects design. Each participant received a £5 incentive payment.

5.2 Materials

Both 3D representations were presented on the screens of two Pentium 4 1.4 GHz IBM-compatible personal computers incorporating Matrox Millennium 550 dual-head graphics cards. The representations were displayed in desktop web browsers with VRML plug-in controls installed (CosmoPlayer). Direct screen capture video recording (using two Tandberg 6000 video codecs linked to a Sony DRS20 desktop digital recorder) was used to record on-screen activity; screen desktop information was output to the video codecs via the PC graphics cards. Two desk microphones were used to capture participants’ verbal comments. In addition, two Sony digital camcorders recorded participants’ head movements. Two test rooms were set up for video conferencing, allowing the transfer of audio-visual (PC screen desktop) information between the two rooms. Screen desktop information output from each local PC was displayed remotely via two Loewe 29” CRT televisions. After completing the tasks, each participant completed a brief twenty-statement scale-based questionnaire to measure subjective judgements about the representations (see Sect. 5.4).

5.3 Procedure

All 120 participants were randomly assigned to dyads (pairing participants who did not previously know each other). The study was conducted in two parts. In the first part, 30 dyads were allocated to the audio-only communication group. In the second part, a further 30 dyads were allocated to the audio-visual communication group. All dyads performed an identical series of five two-part tasks in one of three conditions (mixed, 3D cylinder or 3D helix representations).

All participants were seated in separate rooms. The experiment was preceded by a practice session in all test conditions. Participants were each given a set of written instructions to familiarise themselves with the plug-in controls and representational functionality, and three practice questions to familiarise themselves with the nature of the tasks. Prior to each trial, participants were instructed to agree on answers to each task and to provide each answer verbally via a spokesperson (the identity of the spokesperson was decided between participants). Participants were also informed of the nature of the two-part tasks (e.g. the answers to part A of each task could be derived from the representation, whereas the answers to part B were subjective), and were told that they may or may not have the same 3D representation as their collaborator (participants with audio-visual communication were further informed that they could view their collaborator’s 3D representation and on-screen behaviours via the television screen). Participants were reminded to discuss what they were seeing and doing and why, what they found easy or hard, and what they thought of how the representation worked, including likes and dislikes. Finally, participants were informed that any device interactions (e.g. keystrokes, mouse manipulations) and verbal communication during the trials would be recorded for later analysis, and that a maximum of 30 min was allocated to complete all tasks.

A final brief set of instructions was given verbally to each participant, informing them that they could communicate with their collaborator via the desk microphone, and that they would be able to hear their collaborator’s comments through the television speaker. All participants were reminded to agree on the identity of the spokesperson before continuing, and were told that written notes could be taken if required. The dyads then proceeded to answer the five two-part tasks presented in Table 2. All responses were given verbally via the nominated spokesperson. Following the set of tasks, participants completed a brief scale-based usability questionnaire without consulting their collaborator.

5.4 Usability questionnaire

The usability questionnaire was originally developed for a pilot study preceding our prior work, to assess participants’ attitudes towards each of the representations reviewed (see [5]). The original usability questionnaire has been retained and an additional dimension comprising four statements has been added. The questionnaire now comprises 20 statements, measuring attitudes over five dimensions: System Performance, User Control, Affective Experience, Usability and Communication Process. The questionnaire uses a five-point Likert scale, with responses ranging from ‘Strongly Agree’ to ‘Strongly Disagree’. Three further questions on the original questionnaire measuring interaction strategies (each requiring a dichotomous yes/no response) were reduced to a single question with mutually exclusive response options. Reliability analyses were conducted to ensure that all the statements within each dimension measured a single concept. Cronbach’s alpha, a coefficient of internal consistency, was calculated for the items within each questionnaire dimension. A corrected item-total correlation was then observed for each item, and no items with low correlations (e.g. below 0.15) were identified. Table 3 presents the revised usability questionnaire.

Table 3 Revised representation usability questionnaire incorporating additional ‘Communication Process’ dimension
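For illustration, a minimal sketch of the reliability checks described above is given below, assuming hypothetical column names and rating data; it computes Cronbach's alpha and corrected item-total correlations for the statements within one dimension.

```python
# Illustrative sketch: Cronbach's alpha for one questionnaire dimension and
# corrected item-total correlations (each item correlated with the sum of the
# remaining items). Column names and example data are hypothetical.
import pandas as pd

def cronbach_alpha(items: pd.DataFrame) -> float:
    """items: one column per statement, one row per respondent (1-5 ratings)."""
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1)
    total_var = items.sum(axis=1).var(ddof=1)
    return (k / (k - 1)) * (1 - item_vars.sum() / total_var)

def corrected_item_total(items: pd.DataFrame) -> pd.Series:
    """Correlation of each item with the total of the other items in the dimension."""
    return pd.Series(
        {col: items[col].corr(items.drop(columns=col).sum(axis=1)) for col in items}
    )

# Example usage with hypothetical ratings for a four-item dimension:
# ratings = pd.DataFrame({"q1": [...], "q2": [...], "q3": [...], "q4": [...]})
# print(cronbach_alpha(ratings)); print(corrected_item_total(ratings))
```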

6 Results

The presentation and discussion of results is organised into three sections: the first section presents the performance measures derived from part A task responses, the second discusses the qualitative subjective measures derived from the part B task responses, while the third presents the post-test questionnaire analysis.

6.1 Part A task responses

The mean completion times for each task were calculated by separating each part A response from the total completion times recorded for all sequential part A and B responses. Levene tests on the mean part A task completion times were not significant (p>0.05), indicating equal variances. Between-subjects (2×3) factorial analyses of variance (ANOVAs) (Communication: audio-only, audio-visual; Representation: mixed, 3D helix, 3D cylinder) were conducted on the mean completion times for each part A task response. The tests revealed no main effects of communication or representation, and no significant interaction between these factors, for any of the task responses (see Table 4).

Table 4 Non-significant results for five mixed ANOVAs based on mean performance times for tasks 1a–5a
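A minimal sketch of this analysis pipeline, assuming hypothetical column names rather than the authors' original scripts, is given below: a Levene test for homogeneity of variance followed by a 2×3 between-subjects ANOVA on the mean part A completion times.

```python
# Sketch of the analysis described above (assumed column names): Levene's test
# for equality of variances, then a two-way between-subjects ANOVA with
# communication and representation as factors.
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf
from scipy import stats

def analyse_task(df: pd.DataFrame):
    """df columns (assumed): 'time', 'communication', 'representation'."""
    groups = [g["time"].values for _, g in df.groupby(["communication", "representation"])]
    levene = stats.levene(*groups)                       # equality of variances
    model = smf.ols("time ~ C(communication) * C(representation)", data=df).fit()
    anova = sm.stats.anova_lm(model, typ=2)              # main effects + interaction
    return levene, anova
```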

Table 5 presents the mean completion times across representations and between communication groups for all part A responses. Faster completion times were observed for task 1a, as all dyads quickly identified the group demonstrating the heaviest call usage, due to the obvious density and length of the dark green cylinders. The inferential difficulty of the other tasks (characterised by selecting, encoding and comparing potential target cylinders) resulted in slower completion times. The slower responses to tasks 2a and 4a indicate that these tasks demanded increased mental effort. Despite the lack of significant differences, the benefits observed for certain tasks with audio-visual communication (e.g. tasks 2a, 3a and 4a) offer some support for the predicted performance advantage of shared visual information, especially with identical representations.

Table 5 Mean time (minutes) to complete tasks 1a–5a by Representation within Communication groups

To test the accuracy of the part A task responses, they were first scored according to the task specifications and data distributions (see Sect. 3). Three of the five tasks (tasks 1a, 3a and 5a) required single-item responses, with one score marked for each correct response. Tasks 2a and 4a consisted of multi-part responses, with one score marked for each correct response given. A maximum of four scores was achievable for task 2a (four groups demonstrate specific calling patterns), whereas six scores were achievable for task 4a (six groups share calling patterns).

Ceiling effects were observed for tasks 1a and 5a. All dyads tested produced error-free responses for task 1a, and only one incorrect response was obtained for task 5a (from a 3D cylinder group with audio-visual communication). Further analysis tested differences between groups in the accuracy of responses for the remaining three tasks. Levene tests based on the multi-part mean error responses (tasks 2a and 4a) were significant (p<0.05), indicating unequal variances. Non-parametric Kruskal-Wallis one-way ANOVAs based on task 2a and 4a mean errors revealed no significant differences between communication types (p>0.05), although differences between representation groups for task 4a mean errors approached significance (df=2, χ²=5.023, p<0.09), probably due to the higher mean errors produced by the audio-only mixed representations group (see Table 6).

Table 6 Mean errors for tasks 2a and 4a by Representation within Communication groups
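The non-parametric comparison could be sketched as follows (hypothetical column names; not the original analysis scripts):

```python
# Sketch of the Kruskal-Wallis comparison described above: error counts for a
# task compared across the levels of a grouping factor (communication type or
# representation group).
from scipy import stats

def kruskal_by(df, error_col, factor_col):
    """Kruskal-Wallis H test of `error_col` across the levels of `factor_col`."""
    samples = [g[error_col].values for _, g in df.groupby(factor_col)]
    return stats.kruskal(*samples)

# e.g. kruskal_by(errors_task4a, "errors", "representation")
```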

Table 7 shows the number of dyads committing errors for task 3a by representation group within communication group. The 3D helix representation groups achieved greater accuracy than the other representation groups, with no differences in observed errors between communication groups. A chi-square analysis of differences in errors committed between representation groups and communication groups was not significant (χ²=1.42, p>0.05).

Table 7 Number of dyads committing errors by representation group for task 3a (total number of dyads in parentheses)

6.2 Part B task responses

The analysis of the part B task responses partly follows the method prescribed by [11], namely theme-based content analysis (TBCA). The TBCA method was devised, using case studies, as a usability tool to evaluate a variety of virtual reality technologies, including desktop environments. TBCA is a qualitative method providing information regarding user attitudes and behaviours, and indications of results in the user population, by grouping data into useful categories (see [11] for a full review).

Following the method, the first author derived raw data themes directly from the transcribed part B responses. Higher order themes were then devised, summarising the raw data themes at a broader level. The part B task responses were classified according to participants’ explicit verbal references to items and their implicit actions (also partly derived from part A responses). Two separate classification tables were produced, summarising these explicit references and implicit actions. To check for consistency and robustness, the tables were independently verified by two qualitative researchers based at Chimera, the Institute for Socio-Technical Innovation and Research, University of Essex.

Frequencies were calculated from statements and actions falling into specific themes. Simple frequencies can identify differences or commonalities between participant groups in their approaches to completing certain tasks and in their task strategies, and can reveal representational usability problems. Table 8 presents the TBCA of dyads’ explicit verbal references.

Table 8 TBCA of dyads’ explicit references (Reps=number of repetitions)
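The frequency step of the TBCA can be illustrated with a minimal sketch (the coded segments shown are hypothetical placeholders, not transcript excerpts):

```python
# Illustrative sketch: coded transcript segments (hypothetical structure) are
# tallied per higher order theme and per participant group so that groups can
# be compared on simple frequencies.
from collections import Counter

segments = [
    # (group, higher_order_theme) - hypothetical coded segments
    ("audio-visual / 3D helix", "tool talk"),
    ("audio-only / mixed", "representational equivalence"),
    ("audio-visual / 3D helix", "domain talk"),
]

theme_counts = Counter(segments)
for (group, theme), n in sorted(theme_counts.items()):
    print(f"{group:30s} {theme:35s} {n}")
```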

The higher order theme ‘domain talk’ codes discourse about the domain (e.g. “Probably because they’re attending lectures during the day”, “That’s ‘cause she’s going out clubbing”). ‘Tool talk’, ‘representational characteristics’ and ‘data characteristics’ identify segments describing useful representational and data attributes. ‘Tool talk’ codes segments referring to plug-in tools or representational functionality (e.g. “I used Pan and Rotate”, “The spin function just helped me”). ‘Representational characteristics’ codes segments referring to representational properties (e.g. “OK, it’s the layout of the representation”, “Just by looking at the light grey and darker bits”), while ‘data characteristics’ segments refer to data properties (e.g. “It was the length of the bars that helped us”, “It was obviously the different colours that helped us”). ‘Representational envy/Representational boasting’ applies to the mixed representations group with audio-visual communication, referring to segments where participants express a preference for either their collaborator’s representation displayed on the TV screen (‘envy’: “I think yours is easier to understand than mine”) or their own representation (‘boasting’: “I think mine is easier to see than yours”). ‘Representational equivalence’ applies to audio-only groups, highlighting segments where participants have attempted to establish whether they share identical representations (e.g. “How does it look because we don’t necessarily have the same representation”), or where participants have assumed that the representations are identical (e.g. “I’m assuming you’ve got the same as me”). ‘Representational uncertainty’ is applied to segments where participants are unsure how to interpret part of the representation (e.g. “What do you think for that region, is that what represents the weekend?”) and where they are unsure how to complete a sub-goal (e.g. “Where do you click the rotate?”).

The remaining higher order themes code participants’ references to task strategies, usability problems experienced with the representations and the data, suggested new functionality and include comments relating to task difficulty.

Table 9 presents the TBCA of participants’ implicit comments and actions. The higher order theme ‘misinterpretation of actions/comments’ codes actions or discourse where participants have misinterpreted functionality suggested by their collaborator (e.g. using the spin function instead of the Cosmo rotate control), or where participants have misinterpreted their collaborator’s explanations (e.g. exploring the innermost spiral when their collaborator was referring to the outermost spiral). ‘Task collaboration’ is characterised by three raw data themes: ‘hypothesis statements’, ‘instructive comments/advice’ and ‘splitting activities/sub-goals’, whereas ‘non-collaboration’ refers to periods where participants work alone (e.g. characterised by extended periods without dialogue), only conversing to agree on a response.

Table 9 TBCA of dyads’ implicit actions (Reps=number of repetitions)

6.2.1 Qualitative observations

For ‘tool talk’ (apart from VRML plug-in references), pop-up messages were often cited as a useful function, especially with groups utilising audio-visual communication. All dyads using the 3D cylinder representation in the audio-visual condition also cited the utility of the spin function. The most cited ‘representational characteristic’ across groups was the light/dark shading on the spirals, whereas popular ‘data characteristics’ included cylindrical length and frequency and, to a lesser extent, cylindrical colours.

Participants in the audio-visual mixed representations group witnessed their collaborator’s manipulations via the TV screen, and a number of participants made judgements about their collaborator’s representation. Some participants using the 3D cylinder declared ‘representational envy’ for the 3D helix (more so than vice versa), stating that the information appeared clearer, or that it appeared easier to use. One participant elaborated on comments she had made earlier about her collaborator’s representation: “Yeah, I think it’s easier to use, the one that K’s using (3D helix), I would actually look at that rather than look at my screen for that one, ‘cause my one’s too messed up, well too obscured (3D cylinder).” Other related comments (see Table 10) offer support for the prediction that shaded and coloured segments appear more prominent around the helical form of the 3D helix (see Sect. 4.1).

Table 10 Mixed representations (audio-visual communication) transcript sample

A number of participants also demonstrated ‘representational boasting’, stating that the information in their own representation was more legible, or that it appeared more usable than their collaborator’s, e.g. “... I think I can see that easier on my representation... so it’s easier to see on mine, I think (3D helix).”

‘Representational equivalence’ applies to groups incorporating audio-only communication. Of these groups, participants using mixed representations made more explicit attempts to establish whether they were using identical displays, although three participants erroneously assumed that they were sharing the same representation. Of the groups using identical representations, four participants using the 3D helix correctly assumed that they were sharing the same representation, based on their collaborator’s descriptions.

Low frequencies of ‘representational uncertainty’ occurred, overall. Typically, when participants stated that they were unsure what part of the representation meant, or were unsure how to achieve a sub-goal, their collaborator intervened, offering instructive comments and advice. These exchanges represent integral components of task collaboration (see Table 11).

Table 11 Mixed representations (audio-visual communication) transcript sample

The strategies cited by dyads revealed some striking differences between representation groups. Most groups stated that they adopted the strategy of counting cylinders. ‘Lining up data by colour or position’ was also a popular strategy. However, the strategy of ‘observing patterns along the spirals’ was more predominant with both 3D helix groups. This strategy is indicative of high computational offloading, but was cited less by the mixed and 3D cylinder groups.

Data occlusion remained a frequently cited representational usability problem, particularly with groups in audio-visual contexts. Audio-only groups cited data usability issues, including indistinct light/dark green colours, or one cylindrical colour standing out more than the other colours. Suggestions for new functionality were occasionally raised, while the 3D helix group incorporating audio-visual communication were more verbose regarding the theme ‘task difficulty’.

‘Misinterpretation of actions/comments’ represents implicit actions demonstrated by audio-only groups. A common mistake involved participants using the Cosmo rotate tool when their collaborator was referring to the spin function (or vice versa, see Table 12). Only one dyad using audio-visual communication misinterpreted functionality.

Table 12 3D helix representations (audio-only communication) transcript sample

Finally, all dyads demonstrated elements of task collaboration, frequently proposing hypothesis statements and instructive comments or advice. The most frequent instances of splitting activities or sub-goals occurred with the mixed representations group with audio-only communication. Evidence of non-collaboration was more frequently demonstrated by both 3D cylinder groups, and by the mixed representations group with audio-only communication.

6.3 Questionnaire ratings

To test the prediction that lower questionnaire ratings would be submitted by the audio-only groups (see Sect. 4.1), means for the five questionnaire dimensions (System Performance, User Control, Affective Experience, Usability and Communication Process) were calculated from all questionnaire ratings. Levene tests based on the mean dimension ratings were not significant, indicating equal variances. A 2×3 between-subjects factorial analysis of variance (ANOVA) (Communication: audio-only, audio-visual; Representation: mixed, 3D helix, 3D cylinder) revealed a significant main effect of communication for the Usability dimension (F(1, 114)=4.650, p<0.05), and significant main effects of representation for the Affective Experience (F(2, 114)=5.571, p<0.05) and Communication Process (F(2, 114)=3.517, p<0.05) dimensions. No significant interactions were observed between communication and representation. Independent t-tests revealed significant differences in mean Usability ratings between the two groups using 3D cylinder representations, with higher ratings observed for the group incorporating audio-only communication. Further, Tukey tests revealed significantly higher mean Affective Experience ratings (p<0.05) for groups using the 3D helix than for those using mixed representations, and significantly higher mean Communication Process ratings (p<0.05) for groups using the 3D cylinder than for those using mixed representations.
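A minimal sketch of the follow-up Tukey comparisons, assuming hypothetical column names, might look as follows:

```python
# Sketch (assumed column names, not the authors' scripts) of a Tukey HSD test
# comparing a dimension's mean ratings across the three representation groups.
import pandas as pd
from statsmodels.stats.multicomp import pairwise_tukeyhsd

def tukey_by_representation(df: pd.DataFrame, dimension: str):
    """df columns (assumed): the dimension's mean rating plus 'representation'."""
    result = pairwise_tukeyhsd(endog=df[dimension], groups=df["representation"], alpha=0.05)
    return result.summary()

# e.g. print(tukey_by_representation(ratings, "affective_experience"))
```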

Figure 7 presents the mean ratings for each of the five questionnaire dimensions. High rating scores contribute to a high overall user experience score.

Fig. 7

Mean ratings for System Performance, User Control, Affective Experience, Usability and Communication Process questionnaire dimensions (error bars represent standard error of means)

Further analysis examined the responses to the question relating to interaction strategies (writing or remembering information, or neither). Figure 8 shows the number of participants, by communication and representation group, indicating their strategy choices. A large proportion of the 3D cylinder group with audio-only communication indicated that they relied on writing, rather than remembering, information, which may be indicative of less computational offloading afforded by the representation. This trend was also apparent, although less dramatically, with the mixed representations groups. A quarter of both groups indicated no particular strategy preference. The strategies adopted by the 3D helix group with audio-only communication were roughly evenly split, whereas in the audio-visual condition the group relied less on remembering information.

Fig. 8

Number of participants indicating interaction strategy during task performance

7 Discussion

The prediction of an overall performance advantage for groups utilising audio-visual communication was based on the assumption that forced verbalisation was unnecessary, due to the presence of shared visual information. However, this hypothesis was partly confounded by the ceiling effects observed for tasks 1a and 5a, where all dyads tested achieved virtually error-free performance. For task 1a, all dyads quickly and correctly perceived the group demonstrating the heaviest usage, due to the obvious density and length of the dark green cylinders. Similarly, all but one of the 60 dyads tested correctly determined which group called at 10 pm on weekdays (task 5a). One participant offered an insightful explanation: “... if for that question you had chosen, I don’t know, 2 o’clock in the afternoon, or something, it would have been a lot, lot harder to pick out 2 o’clock in the afternoon, but because there’s just one person calling all the time at that time (10 o’clock) you can sort of pick it out.” This comment illustrates that a lighter data distribution at certain diurnal periods allows call usage to be perceived more easily.

For the three remaining tasks (2a, 3a and 4a), some benefits were observed when audio-visual communication was adopted: completion times were faster, especially when identical representations were shared, although the differences were not significant. For task 4a, virtually error-free performance was achieved by groups using mixed and 3D helix representations with audio-visual communication.

On a methodological point, certain authors [10] maintain that a full evaluation of communication technology requires measures of both task outcome and communication process. Following this approach, we also consider qualitative measures derived from the TBCA and post-test questionnaire analysis. The outcome of the TBCA provides an insight into the usefulness of certain representational and data attributes, and into participants’ strategies, assumptions and problems encountered during collaborative problem solving.

The TBCA exposed differences in the strategies adopted by participants between representation groups. The majority of dyads in both 3D cylinder groups reflected positively on use of the spin function, compared to only three dyads across both 3D helix groups. The first author found, during the expert walkthrough, that spinning the 3D cylinder representation was an effective method of overcoming data occlusion. A similar strategy involves positioning the 3D cylinder so that it appears horizontally flat while spinning the representation, as confirmed by one participant: “... ‘cause yes, sometimes they overlap, but when you get it at the right angle, and I found if I spin it around, then it’s quite easy to tell if they’re overlapping.”

The representational characteristic most frequently cited as helpful was the light/dark shading on the spirals. This characteristic affords the observation of patterns along the spirals, a strategy predominantly adopted by groups using the helix representation, consistent with our prior work [5]. One participant manipulating the 3D helix offered an insightful comment, which suggests that shading and patterns along the spirals enhance computational offloading: “... I like the dark bits that tells you when it’s the night... and that you can click on it that tells you... it’s quite difficult to add up in your head, but I suppose you can see the pattern of where they are in the dark bits, so you don’t really need to write anything down.”

Another participant confirmed that data occlusion occurs if the 3D helix representation is viewed from the front, or ‘Start View’: “I just find it difficult to distinguish between the weeks, especially when it’s straight on, because some bars (cylinders) look like... you can’t see whether it’s a continuation or not (data occlusion).” Data occlusion was the most frequently cited usability problem, particularly by dyads adopting audio-visual communication. Arguably, shared visual information affords discussion, including usability and other issues.

The higher order theme ‘representational envy’ was specifically applicable to the mixed representations group in the audio-visual condition. The 3D helix was particularly ‘envied’ by participants using the 3D cylinder representation (one segment supports the prediction that the shaded and coloured segments appear more prominent on the 3D helix, see Table 10). In some instances, both participants preferred or envied each other’s representations, as illustrated in Table 13.

Table 13 Mixed representations (audio-visual communication) transcript sample

Dyads with mixed representations appeared to make general assumptions about their collaborator’s display. One explanation is that participants disliked their own representation and preferred their collaborator’s instead, or felt that their collaborator was demonstrating better control of their representation. The segment in Table 14 illustrates two participants reaching a consensus that the 3D helix is preferable.

Table 14 Mixed representations (audio-visual communication) transcript sample

For the higher order theme ‘representational equivalence’, the audio-only mixed representations group made the most attempts to establish whether the representations were identical. The group also demonstrated more task splitting activities than the other groups. After establishing representational incongruence, participants may have divided sub-goals to achieve more effective task collaboration. However, the similarity of the two representations may have confounded these findings. Unless a detailed verbal description is provided, the representations could be assumed to be similar, or identical, as illustrated in Table 15.

Table 15 Mixed representations (audio-only communication) transcript sample

A further benefit of shared visual information was demonstrated by participants correctly inferring their collaborator’s actions and explanations. A common mistake made by participants in audio-only conditions involved misinterpreting their collaborator’s references to representational functionality (e.g. confusing the spin function with the Cosmo rotate tool).

Shared visual information appears to have made an impact on certain questionnaire dimensions. Participants in the audio-visual groups provided higher ratings for Communication Process, as the presence of shared visual information aided communication between participants (e.g. forced verbalisation was unnecessary). In addition, participants using identical 3D helix and 3D cylinder representations in visually shared contexts expressed higher Affective Experience ratings. Generally, mixed representations attracted lower ratings overall for all five questionnaire dimensions, demonstrating significantly lower Affective Experience ratings compared to 3D helix groups, and lower Communication Process ratings compared to 3D cylinder groups.

However, the presence of shared visual information also supports discourse concerning other issues, such as usability problems. Groups incorporating audio-only communication provided higher Usability ratings than those in audio-visual groups. This finding was confirmed by the TBCA, where usability problems, such as data occlusion, were more frequently cited by audio-visual groups.

8 Conclusions

It is evident from the findings presented that the presence or absence of shared visual information influences collaborative problem solving. As highlighted by other studies (e.g. [5]), by looking at task performance alone, one cannot conclude that differences exist between audio-visual and audio-only conditions. However, qualitative measures exposed by the TBCA and post-test questionnaire analysis revealed differences between representation and communication groups. The analysis revealed that certain representational properties facilitated participants’ task completion, also highlighting that different strategies were adopted between representation groups. The 3D helix groups demonstrated computational offloading by observing patterns along the spirals, whereas the majority of the audio-only 3D cylinder group indicated (via their questionnaire responses) that they had relied on writing notes, which is indicative of lower computational offloading afforded by the representation.

The presence of shared visual information promotes discourse. Audio-visual groups more frequently cited usability issues such as occlusion problems, resulting in a lower Usability questionnaire score than audio-only groups. Audio-visual groups also submitted higher Communication Process questionnaire ratings, indicating an overall preference for shared information. An important benefit of presenting visually shared information within collaborative environments is that it provides the capacity to correctly infer actions and explanations. Audio-only contexts place greater emphasis on verbal references, forcing participants to verbalise tacit knowledge which can, in turn, be easily misconstrued.

Although the TBCA indicates that all representation groups demonstrated task collaboration, participants in the mixed representations groups submitted lower questionnaire ratings overall, with significantly lower Affective Experience ratings compared to 3D helix groups, and lower Communication Process ratings compared to 3D cylinder groups. The reasons why these groups should provide lower subjective ratings are not entirely clear. It is possible that some participants in the audio-only group were influenced by the knowledge that they were not sharing identical representations, or were frustrated by their collaborator’s verbal descriptions. Similarly, it is feasible that some participants in the audio-visual group submitted lower ratings for their own representation, based on their impressions of their collaborator’s 3D display (representational envy).

To summarise, there are clear indications that the presence of shared visual information enhances collaborative problem solving. The inclusion of mixed representations in collaborative environments does not, necessarily, contribute to the communication process, whereas the inclusion of identical representations, particularly in a shared workspace, offers a more familiar setting for participation.

Further studies could examine problem solving behaviours under different shared distributed representations, for example, complementary representations (i.e. each participant accesses complementary subsets of the data set). The trade-off with complementary representations is that, although cognitive effort is arguably reduced as each collaborator makes inferences and judgements from reduced data sets, certain contextual information necessary to inform decisions is absent, forcing communication between collaborators in order to find task solutions. This concept follows the paradigm that the abstract task space of a group problem solving task can be distributed across individual representations in different ways [17], yielding different problem solving behaviours, performance measures and qualitative information.

Further work is also required to address some of the usability issues highlighted in this study—the main problem being data occlusion. Possible solutions could include data filtering, enabling subsets of data to be displayed at any one time in order to reduce data density. Although both representations were presented offline, either representation could be developed in desktop web-based VR environments to distribute data visualisations over the Internet, allowing multiple users to access the information concurrently.