
1 Introduction

Big Data is nowadays a source of wealth for thousands of companies. According to a 2017 NewVantage Partners report that surveyed senior business and technology executives from international, modern companies, 48.4% of executives found the use of data “highly successful” for their businesses [21]. Data is also the backbone of many recent scientific studies, particularly multidimensional data. One such case is microbiology, where large sets of organisms and environmental data are studied to explain certain behaviors of microbial ecology [22]. Thus, understanding large data has become increasingly imperative for both business and knowledge.

Two methods of data analysis that have been used for centuries are visual analytics and data visualization. Visual thinking started to take shape during the 18th and 19th centuries. Most of the first large data sets referred to social aspects of communities, such as people, economics, and medicine, and were represented with diagrams such as nomograms [12]. As information grew more complex, different approaches and techniques emerged. Some of these focus on illustrating multiple data fields by increasing the dimensions of the visualization. Hence, varying degrees of 3D visualization started to be implemented for distinct purposes, and the use of Virtual Reality (VR) consequently began to be extensively discussed for data analysis.

Experiments have been carried out to test the usability of 3D over 2D data visualization. These experiments have explained the advantages and disadvantages of each type of visualization; specifically, they have exposed the instances for which each visualization is best suited. Such findings are discussed in the Related Work section of this paper. Nevertheless, data keeps growing to levels of dimensionality that 2D and 3D visualizations are not always suited to represent. It has been confirmed that VR excels at multidimensional data, particularly spatial dimensions, while also providing a sense of awareness, interactivity [10], and collaborative properties [19]. Additionally, its challenges have been recognized and tested [4].

This paper aims to examine and compare 2D, 3D, and VR data visualizations on specific use cases of multidimensional data analysis. To that end, this paper introduces VRParaSet, a model to visualize categorical data in a virtual space, and outlines its features. Moreover, the paper presents a use case study of VRParaSet, reporting how effective and efficient each data visualization was, along with the criteria for this evaluation. Finally, a comparison is drawn between the visualizations and a conclusion is reached.

2 Related Work

Beginning with 2D and 3D data visualizations, Dubel et al. [11] noted that distinct forms of visualization have their weaknesses and strengths. They classified and evaluated pure and mixed forms of visualization; A3 + R2 is one of these classifications, referring to graphs consisting of 3D objects (data) on 2D planes (grid). They established the following criteria for their study: occlusion, cluttering, distortion, and scalability. They found that 2D visualizations excel at precise measurement and interpretation, while 3D displays excel at navigation and relative positioning. In like manner, VR technology has been the topic of several studies intended to describe its potential in the data visualization field and its numerous applications.

Ribarsky et al. [23] introduced possible applications for VR in information visualization back in 1994. They developed a system of virtual graphical objects, or glyphs, bound to data that enabled users to interact with them. They also explored the concepts of human-scale virtual environments, such as a construction site, and non-human-scale visualizations, such as a virtual book.

Van Dam et al. [4] reviewed Immersive Virtual Reality (IVR) properties by conducting experiments on the matter. They concluded that VR environments depend highly on immersive senses. They also noted the challenges for non-human-scale scientific visualization, including abstract data representations, large amounts of data, and interactivity, among others. More recently, software systems have been developed to demonstrate the true scientific relevance of immersive data visualization.

Donalek et al. [10] directed a study on immersive data visualization (IDV). They presented an IDV application, iViz, that placed the user in a virtual space where a 3D scatter plot was shown. The virtual space allowed the user to interact with the 3D plot by selecting and shuffling data axes. The application provided “the first insight into the potential of immersive VR as a scientific data visualization platform.” Moran et al. [19] developed a tool to visualize spatial Big Data and incorporated data analytics into it to showcase the power of VR data visualizations.

A couple of recent academic papers have dealt with the comparison of different dimensions of data visualization and the evaluation of non-human-scale VR visualization. These studies have left several questions open regarding usability.

Firstly, Sullivan’s master’s thesis [25] consisted of a use case study comparing solely 2D and VR data visualizations. He developed a 3D scatter plot in a VR environment and then surveyed users regarding a series of tasks they were asked to do. These tasks consisted of localizing and estimating data in the VR and 2D visualizations. He concluded that users with previous VR experience felt more comfortable in the VR environment; nevertheless, most users were more successful localizing and estimating the data in the 2D plot. Notably, the VR application was developed for the Oculus Rift VR headset and a desktop keyboard, and the user studies showed that users had problems using the keyboard’s multiple buttons while wearing the headset. Moreover, the scatter plot design had some drawbacks, such as the labeling and the scale of the graph.

Lastly, Gustafsson’s master’s thesis [14] comprised a use case study to evaluate the viability of non-human-scale visualization. The use case study involved the creation of a scatter plot in a VR environment, with a high level of interaction achieved by adding extensive controls for the user. He concluded that useful VR visualization depends on a high number of degrees of freedom (DoF).

Multiple websites and blogs display original data visualization applications. One of these is “Night at the Museum” by Arwa Mboya [17]. This application allows the user to view a scatter graph, a timeline, and a visual music model through VR. While the music visualization feature is not scientifically faithful, the scatter graph and the timeline are. The timeline, for instance, offers the user a different perspective of time awareness by scaling the chart to human scale. While users can navigate through music history and request details on demand, the VR timeline does not provide users an overview or let them compare kinds of music across periods of time.

Another application, by the company Datavized [18], implemented a globe model and column chart to represent geographically linked data. The spatial data representation property of VR data visualization was again tested and successfully implemented.

3 The VRParaSet design

VRParaSet is a tool developed to aid in a data visualization use case study. It allows the user to view and contrast a categorical dataset in 2D, 3D, and VR (using Google Cardboard and Oculus Rift). The tool was implemented using the Three.js [20] library with exported perspective [2] and control [28] modules. The application is hosted on GitHub [15], and its VR feature is compatible with Google Cardboard [13]. VRParaSet’s VR mode compatible with Oculus Rift [27] was implemented with the same resources on top of the A-Frame [1] framework; A-Frame has been shown to be an effective means of creating and sharing VR/AR experiences [26]. This mode is hosted on GitHub [16]. The 3D graph generated by the application is a categorical stacked parallel coordinate plot as described by Dang et al. [7].

Fig. 1. Parallel sets visualization of the Titanic data set [24] on different models: (a) 2D, (b) 3D, and (c) VR on Google Cardboard.

3.1 VRParaSet features

VRParaSet can be accessed from anywhere with an internet connection. The first screen prompts the user for a tag name to identify his/her answers and for the starting category of the dataset; the start category is used to color-code the categorical data graph. The use case study begins when the play button is clicked. The tool has three modes: the 2D mode, the 3D mode, and the VR mode. When the user starts the application, one of the modes begins at random. The 2D and 3D modes are meant to be experienced on a desktop or laptop. Thus, a use case study question is prompted at the top of the page, where the user has an input text box to enter his/her answer and continue to the next question. There are ten questions in total; after all of them are answered, the next random mode begins automatically.

VRParaSet is a natural extension of Parallel Sets [9]; hence, the 2D model is implemented using that library. As seen in Fig. 1(a), the categorical data is modeled vertically. Each category is subdivided by the top category and represented by a color scheme; these data categorization features are shared by the 3D model as well. Moreover, the library allows the user to highlight a particular category by clicking on the color code for a particular field. This feature highlights the data based on the category level: for instance, if a user clicks on the third-class adult male section, the female and child sections for the third class are not highlighted. It is also possible to reset the highlight by clicking on empty space.
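
The level-aware highlight rule can be sketched as a simple predicate. This is a hypothetical helper, not taken from the VRParaSet source; the function and field names are illustrative:

```javascript
// Hypothetical sketch of the highlight rule: a ribbon stays highlighted
// only if its category values match every field of the clicked selection.
function isHighlighted(ribbon, selection) {
  return Object.keys(selection).every((dim) => ribbon[dim] === selection[dim]);
}

// Clicking the third-class adult male section:
const selection = { Class: 'Third Class', Age: 'Adult', Sex: 'Male' };
const maleRibbon = { Class: 'Third Class', Age: 'Adult', Sex: 'Male', Survived: 'Perished' };
const femaleRibbon = { Class: 'Third Class', Age: 'Adult', Sex: 'Female', Survived: 'Perished' };
console.log(isHighlighted(maleRibbon, selection));   // true
console.log(isHighlighted(femaleRibbon, selection)); // false
```

Under this sketch, resetting the highlight is the degenerate case of an empty selection, which every ribbon matches.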

VRParaSet’s 3D mode extends the classification methodology of Davies’ Parallel Sets [9] by integrating additional dimensions into the visual space. Following the description by Dang et al. [7], VRParaSet includes a categorical stacked parallel coordinate plot. This visualization contains stacked parabolas and stacked columns that represent the number of people: the parabolas connect two adjacent categories, and their heights represent the number of people pertaining to both categories, whereas the columns represent the number of people pertaining to a single category. The color scheme is again based on the first category. Furthermore, the graph has data filtering capabilities. By clicking on a parabola, the user filters out all records that do not contain the fields represented by the parabola; similarly, clicking on a column filters the data based on that column’s field. To represent the filtered data in the graph, the tally of a “new” data set is computed. Accordingly, the heights of the parabolas and columns are recomputed: given the categories i, j, and k and their respective category values c, u, and v, the updated height measurement is

$$\begin{aligned} | \{w | w \in D, w_{i}=c, w_{j}=u, w_{k}=v\} | \end{aligned}$$

where w is a sequence of values \((w_{1},w_{2},...,w_{n})\), n is the number of categories, and D is the original multiset of sequences. To reset the graph, the user clicks on an empty space in the scene. An instance of data filtering can be seen in Fig. 2. The whole plot is centered on the screen and can be rotated by clicking on the scene and dragging to the desired viewing angle.
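
As a minimal sketch (with illustrative names, not the VRParaSet source), the count above amounts to filtering the multiset D on the clicked category/value pairs:

```javascript
// Sketch of the filtering tally: count sequences w in D with w_i = c,
// w_j = u, w_k = v. `constraints` maps a category index to its required
// value, e.g. { 0: 'First Class', 2: 'Female' }.
function filteredCount(D, constraints) {
  return D.filter((w) =>
    Object.entries(constraints).every(([i, value]) => w[i] === value)
  ).length;
}

// Toy multiset of sequences (Class, Age, Sex, Survived):
const D = [
  ['First Class', 'Adult', 'Female', 'Survived'],
  ['First Class', 'Adult', 'Male', 'Perished'],
  ['Third Class', 'Child', 'Female', 'Survived'],
];
console.log(filteredCount(D, { 0: 'First Class' }));              // 2
console.log(filteredCount(D, { 0: 'First Class', 2: 'Female' })); // 1
```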

Finally, VRParaSet includes a VR mode, as depicted in Fig. 1(c). The VR mode is supported by mobile devices and the Google Cardboard set [13]. This mode presents the 3D categorical plot in a virtual space the user can navigate. The plot properties are exactly the same as in the 3D model, but the controls are different. Firstly, the scene provides a fixed center pointer, so the user knows where he/she is looking. Secondly, the Cardboard’s single button is used for both clicking and navigating: when the pointer is on top of the graph, the user can click the button to filter its content, and clicking over an empty space resets the graph. The only movement possible is forward, achieved by keeping the button pressed; the user moves in the direction he/she is pointing.
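
The press-and-hold movement can be sketched as a per-frame position update along the gaze direction. This is a hedged illustration with assumed names and a y-up angle convention, not the actual control code:

```javascript
// Illustrative sketch: while the Cardboard button is held, advance the
// camera along its gaze direction each frame. Yaw and pitch are in radians.
function stepForward(position, yaw, pitch, speed, dt) {
  // Unit gaze direction from spherical angles (y-up convention).
  const dir = {
    x: -Math.sin(yaw) * Math.cos(pitch),
    y: Math.sin(pitch),
    z: -Math.cos(yaw) * Math.cos(pitch),
  };
  return {
    x: position.x + dir.x * speed * dt,
    y: position.y + dir.y * speed * dt,
    z: position.z + dir.z * speed * dt,
  };
}

// Looking straight ahead (yaw = 0, pitch = 0) moves along -z only:
const next = stepForward({ x: 0, y: 1.6, z: 0 }, 0, 0, 2, 0.016);
// next.y stays at 1.6; next.z decreases by speed * dt = 0.032
```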

Fig. 2. VRParaSet VR visualization (A-Frame) of the Titanic dataset: filtered by male passengers.

4 Use Case Study

By conducting case studies, we aimed to establish the benefits and shortcomings of distinct categorical data visualizations. Accordingly, a use case study was performed with 14 individuals. They all trained with the VRParaSet controls first, as Sullivan’s study had demonstrated that useful VR applications are greatly bound by the user’s experience with VR technologies [25]. The study was expected to illustrate how multidimensional categorical data is perceived differently under varying visual dimensions, particularly when utilizing an affordable, low-complexity headset such as the Google Cardboard for the VR stage. This was important to highlight because previous studies used expensive and cumbersome VR headsets with numerous and confusing controllers [14, 25]. The medium through which the 2D, 3D, and VR data visualizations were tested was the VRParaSet tool. Moreover, the dataset employed in the use case study was a Titanic dataset; it was chosen because the same dataset was employed for the initial 3D categorical stacked parallel coordinate plot [7], and questions regarding the data were easy to formulate.

4.1 Participants and Dataset

We recruited 14 researchers within the computer science and computer engineering department at a university. The age of our participants ranged between 20 and 36 years, with a mean age of about 26 years. The study hardware consisted of an iMac (Retina 5K, Late 2015) with a 3 GHz dual-core processor and a standard 27-in. screen with a resolution of \(2,560\times 1,440\) pixels, used for the 2D and 3D visualizations, and a Google Pixel 2 phone with Google Cardboard, used for the VR visualizations.

Regarding the test data, we opted for the Titanic data set [7, 9] containing information on 2,202 passengers across four categories: Class (Crew, First Class, Second Class, and Third Class), Age (Adult vs. Child), Sex (Male vs. Female), and Survived (Survived vs. Perished). The studied parallel coordinates are color-coded by the Class attribute.

4.2 Procedure

In this user study, participants observed the graphed data on three different models and answered a series of 10 questions regarding the data displayed. Each session, one per participant, began with an explanation of the purpose of the study and what was expected from the participant. It was then explained how the data was represented in each parallel coordinate plot, and participants were taught the controls of each model by practicing with them. This introductory section lasted 10 min. Afterward, they were asked to begin the actual examination.

For the examination, each participant was shown the visualization models in random order and, for each model, was asked a set of 10 questions. They typed the answer to each question in an input box, and the answer was recorded along with the time taken to answer. Each visualization took approximately 10 min to complete. Once the questioning ended, participants were asked about each model’s strengths and weaknesses. The entire session lasted approximately 45 min.

4.3 Questionnaire

The following is the list of questions each participant was asked to answer for each visualization model.

1. Which class was the least populated?
2. Which class had the most survivors?
3. Which sex suffered the most deaths?
4. Which class suffered the most deaths?
5. Which class had the most children?
6. Which class had the most adult male survivors?
7. Did the second-class children perish or survive?
8. Did most adult females perish or survive?
9. Which class had the most male survivors?
10. Which class had more females perish than survive?

4.4 Hypotheses

Based on our analysis of meaningful features for pattern recognition, we hypothesized that the 3D and VR models would perform better than the equivalent 2D model on the same set of data, as an additional dimension can be utilized to present complex data sets.

  • H1—3D data visualization will have a higher percentage of questions answered correctly.

  • H2—3D data visualization will have the shortest completion time.

Fig. 3. Study results: (a) accuracy, (b) completion time, and (c) user confidence on 2D, 3D, and VR on Google Cardboard.

4.5 Results

After calculating each participant’s completion time and accuracy percentage, the following results were obtained. The accuracy of the responses was highest on the 3D model and lowest on the VR model, as shown in Fig. 3(a). Although the accuracy results support hypothesis H1, all visualizations had close accuracy percentages with very similar variance; in fact, all ranges of variation lie between 25% and 27%. It is worth noting that the overall accuracy percentage across all visualizations concurs with our expectation that categorical data visualizations are difficult to understand. This was made evident as most participants asked for clarification on what each component of the 2D and 3D plots represented. For instance, the number of people was represented by the width of the colored region in the 2D plot, while in the 3D plot it was represented by the height of the parabolas and columns.

Furthermore, and perhaps more interestingly, the completion time of some model sessions varied considerably from user to user, as illustrated in Fig. 3(b). On average, the VR visualization took the least time and the 3D visualization the longest, which completely contradicts hypothesis H2. Despite the similarity in accuracy variance, the variance in time spent differed tremendously: the most consistent model was the 3D and the least consistent was the VR. This is supported by the fact that most users believed the information in the 3D model was “easy to find,” while the VR model, even though it presented the same plot, had frustrating and at times confusing navigation. In other words, people tended to spend more time on the more difficult model (2D) and less time on the easier one (3D). However, people performed worse on the VR model even though most of them, as shown by the average, spent little time on it. Hence, it can be inferred that participants deliberately answered the questions quickly regardless of the accuracy of their answers.

4.6 Qualitative Evaluation

After the examination, every participant was asked which model was preferable overall, basing their preference on the ability to find data and the ability to compare data. Out of 14 participants, 9 preferred the 3D data visualization model. They claimed the 3D model kept the data organized and easy to find, and mentioned that the rotating and filtering features were very helpful in localizing data regions in the plot. The remaining 5 participants preferred the 2D model. They noted that the 2D visualization keeps all data visible at the same time, a comment that supports the 2D graph description of Dubel et al. [11]. These participants also mentioned that the 2D model was simpler due to the lack of controls needed to navigate the visualization.

4.7 Discussions and Limitations

Participants also noted many of the weaknesses and limitations of each model. The VR model drew the most criticism. Participants found the navigation highly impractical and thought a button to move backward was needed; it was also suggested to implement movement sensors instead of movement buttons. Moreover, several participants felt varying degrees of motion sickness after trying it. They mentioned that not having a reference point, like a floor, made “navigating through a floating graph very unreal and confusing.” Nevertheless, they found the VR model interesting and believe it has great potential. The 3D model faced some of the same issues as the VR one: some participants mentioned that 3D objects were often occluded by other 3D objects, and the perception of the data was also an issue for one participant.

Finally, for the 2D visualization, most users thought the graph was too crowded. They had a hard time finding certain data in the chart, and consequently felt it was challenging to compare data values. This becomes worse for higher-dimensional data, as the Class (or colored) attribute is split into many smaller portions as we travel to further dimensions.

4.8 VRParaSet A-Frame Extension

An extension to VRParaSet was developed to incorporate the feedback from the Google Cardboard sessions and to take advantage of the positional tracking and higher accuracy of more robust VR tools. This extension renders all of the Three.js (3D) objects in an A-Frame scene, making the application compatible with most VR headsets, including the Google Cardboard and the Oculus Rift.
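
Conceptually, this can be sketched as attaching the existing Three.js chart group to an entity in an A-Frame scene so that A-Frame handles rendering and headset support. The following is a hedged illustration; the entity layout and the `chartGroup` variable are assumptions, not the actual VRParaSet source:

```html
<!-- Hypothetical sketch: an A-Frame scene hosting the Three.js chart -->
<a-scene>
  <a-plane id="floor" rotation="-90 0 0" width="10" height="10" color="#ccc"></a-plane>
  <a-box id="table" position="0 0.5 -2" color="#888"></a-box>
  <a-entity id="chart" position="0 1 -2"></a-entity>
  <a-entity light="type: directional; castShadow: true" position="1 4 2"></a-entity>
</a-scene>
<script>
  // Attach the already-built Three.js chart (columns and parabolas)
  // to the A-Frame entity.
  document.querySelector('#chart').setObject3D('chart', chartGroup);
</script>
```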

To use the Oculus Rift, the Supermedium [3] browser was first added to the Oculus library. Supermedium is a free Oculus application that allows the user to access web pages with virtual reality content. Once the website is loaded in the Supermedium browser, the user spawns in front of the chart, which sits on top of a gray table. An important improvement over the original VRParaSet is the navigation, which is achieved by simply walking; one can get a closer look at the chart by walking towards it, giving the user an immersive sensation. Filtering is achieved by using the pointer in the middle of the screen, which is controlled by head movement, and clicking the trigger button of the right-hand controller. Chart additions include a table underneath the chart and a floor for spatial reference, a color legend, lighting and shadows to improve realism, and raised chart labels to avoid label obstruction.

A second use case study was conducted with 4 participants who had tested the original VRParaSet and 5 new participants, all university graduate students. The use case study procedure was repeated for this second part: they were all given an explanation of the categorical data chart and time to test the application using the Oculus Rift. The series of questions was also the same as in the first part. Each session, one per participant, lasted approximately 15 min.

Fig. 4. Study results: (a) accuracy, (b) completion time, and (c) user confidence on 2D, 3D, and VR models compared with the A-Frame extension.

Significant improvements through the utilization of a robust VR headset such as the Oculus Rift are evident. Both the accuracy in Fig. 4(a) and the average time in Fig. 4(b) are not only better than those of the Google Cardboard VR, but as good as or better than the 2D and 3D model measures. This is intriguing since more than half of the second-part use case participants had not seen the parallel coordinate chart before yet performed as well as those who had, as demonstrated by the boxplot boundaries. The average confidence level in Fig. 4(c) using the Oculus is also sufficiently higher than with the Google Cardboard to assume users had a more accurate understanding of the data representation.

Moreover, the participants who had taken part in the first use case study commented that the visualization in A-Frame using the Oculus was tremendously better. They said that the navigation and interactions with the chart were “intuitive” and felt as if they were “seeing the graph on their actual desk.” They also said that they felt little to no dizziness compared to the Google Cardboard VR model, and all of them mentioned that this VR visualization was better than their preferred model from the first case study. The new participants found the chart difficult to understand at the beginning but thought the concept of data visualization through VR was highly interesting and promising. Overall, most of them believe that this type of data visualization has enormous potential, especially for complex information such as multidimensional data. A couple of participants added that they would like more motion interactions with the chart, such as moving it with the Oculus controllers from side to side or up and down.

5 Conclusion and Future Work

This paper presents a user study of multidimensional categorical data on four data visualization models utilizing the VRParaSet tool. VRParaSet supports four modes: a 2D mode, a 3D mode, a VR mode for Google Cardboard, and a VR mode for Oculus Rift through an A-Frame extension. VRParaSet was evaluated in a qualitative study of 19 users, which led to conclusions on the viability of three distinct data visualizations. It was found that the 2D, 3D, and VR settings do not greatly affect the ability of a user to find and examine data. It was observed that a 3D parallel coordinate plot outperforms a 2D plot due to the organization of the data and its neat representation. Nevertheless, it was concluded that a VR visualization requires enough controls to move around the scene comfortably and that the space the plot spans should have reference points; otherwise, users tend to get frustrated and confused when analyzing the visual data.

The current user studies on multidimensional visual analytics models are limited to categorical data. In the future, more extensive studies on complex data sets (higher dimensional and with different data types) should be conducted to reach broader conclusions, as there are various visualization techniques for high-dimensional data [8], such as scatter plot matrices [29] and their variants [5, 6]. Additionally, testing other VR headset capabilities, such as hand tracking, would allow for exploiting the benefits of VR technologies for non-human-scale data visualizations. Overall, it would be ideal to also implement practical extensions of this technology in areas such as education or project management.