1 Introduction

Statistical charts, also known as statistical graphics, play an important role in statistical analysis (Friendly 2008) and exploratory data analysis. They present data in graphical form to provide insights into the underlying structure of data, such as distribution, trend, correlation, and outliers. A variety of statistical charts have been developed to present different categories of data. For instance, scatterplots, line charts, bar charts, and parallel coordinates have been proposed to present data of various dimensions.

In the big data era, conventional statistical charts are confronted with new challenges as data grow in scale and complexity (Sarikaya and Gleicher 2018). Therefore, the design space of statistical charts needs to be enhanced to address scalability, complex data characteristics, and various analysis tasks. Meanwhile, the usage scenarios of each chart have been expanded in terms of data characteristics and tasks. In general, enhanced charts are statistical charts that vary the original encoding (e.g., continuous scatterplots; Bachthaler and Weiskopf 2008) or add new visual channels to tackle the new challenges.

Confronted with the large design space of enhanced charts, novice users find it cumbersome to identify appropriate charts and to carry out the visual mapping and interactions (Li et al. 2018). In this work, we aim to help users select statistical charts and associated enhanced designs for specified usage scenarios. Specifically, we review use cases of line charts, bar charts, scatterplots, and parallel coordinates. Several considerations guided this selection. Firstly, these charts cover the most representative graphic elements, including dots, lines, bars, and areas (a variation of line charts). Secondly, we focus on enhanced statistical charts only. For example, although the pie chart is also widely used, it is not included in this review because it has few enhanced instances. Thirdly, we limit our discussion to tabular data visualization. Following the framework of Sarikaya and Gleicher (2018) and Munzner (2014), we identify the data characteristics, tasks, and design space of each chart, and reason about the design space and use cases by tasks and data characteristics. Besides the tasks, we discuss the challenges of each type of chart, such as data scalability and dimension scalability, and consider them as the targets of enhancements with respect to specific tasks. Although Sarikaya and Gleicher (2018) have given a thorough review of scatterplots, our survey still covers scatterplots in order to provide a systematic framework that guides designers in selecting charts and exploring design spaces in terms of data characteristics, tasks, and challenges. Our framework also provides a basis for designers to develop new forms of statistical chart design.

Because statistical charts are widely used in the visualization community, covering all papers that use enhanced charts is infeasible. Instead, we focus on covering typical enhancement approaches and sampling corresponding instances. We limit the scope of our survey to major conferences and journals on visualization, including IEEE VIS, EuroVis, PacificVis, IEEE TVCG, CGF, JOV, and JVLC. Papers published in these venues in the last ten years are surveyed and sampled. In addition, a few papers from other areas, such as statistics, are included for their distinct enhancements.

The rest of the paper is organized as follows. In Sect. 2, we review the taxonomies of statistical charts. Section 3 reviews the application cases and taxonomies of each chart in detail. Subsequently, we discuss and conclude our survey in Sects. 4 and 5.

2 Background

In this paper, we illustrate the enhancements of statistical charts in a task-and-challenge-driven manner. According to Munzner’s paradigm of visualization design, “What-Why-How” (Munzner 2014), the design spaces are elicited by challenges from data characteristics and tasks. In this section, we state the taxonomies of statistical charts in terms of data, tasks, and designs.

2.1 Data types and characteristics

Usually, data types and characteristics are the first consideration in information visualization design (Munzner 2014). Several taxonomies have been proposed to guide the selection of charts and designs. Shneiderman (1996) summarized seven data types. In his taxonomy, multi-dimensional data are close to our definition of tabular data that have multiple attributes. Munzner (2014) categorized attributes into three types: categorical, ordinal, and quantitative. Specifically, Sarikaya and Gleicher (2018) summarized the characteristics of tabular data for scatterplots, including class label, number of points, number of dimensions, spatial nature, and data distribution. They concluded that some characteristics, such as the number of data items and the number of dimensions, pose challenges to conventional charts.

The lexicon of the above-mentioned taxonomies touches several aspects, including data types (such as multi-dimensional, tree, and network), data characteristics (such as the number of dimensions), and tasks (such as presenting the distribution of data). Among these taxonomies, we choose a standard one for tabular data to put our discussion into a unified framework. Following Munzner (2014), we categorize the major data attributes into categorical, ordinal, and quantitative. Some attributes, such as class label, spatial nature, and temporal attributes, can be classified into these three major types. We also identify a set of characteristics, such as the number of data items and the number of dimensions, as derived characteristics. These derived characteristics pose challenges to visualizations as their scales grow. The remaining characteristics, such as distribution, are considered as the aim of tasks for clarity of discussion.

2.2 Tasks

Tasks are an important consideration in choosing visualizations (Schulz et al. 2013; Zhang et al. 2017; Mei et al. 2018). A thorough discussion of tasks is beyond the scope of our review. We list the primary tasks of each plot following Munzner (2014), and point out the task-related reasons behind the enhancing techniques. For instance, line charts are designed to show the trend of a variable. Their variation, the area chart, is used when users want to present the accumulation of the variable. In this sense, we treat tasks, together with the challenges arising from data characteristics, as the driving force of enhancements.

2.3 Designs

The design space of information visualization is built by visual marks and channels. Recently, Sarikaya and Gleicher (2018) collected the design decisions in scatterplots. They clustered the design choices as four major types: point encoding, point grouping, point position, and graph amenities. We extend their clustering to general statistical charts as encoding, grouping, position, and graph amenities, respectively.

Visual encoding, which comprises visual marks and channels, is the first consideration of visual design. In the scope of this paper, the original visual marks include points (scatterplots), lines (line charts and parallel coordinates), bars (bar charts and histograms), and areas (area charts). As summarized by Sarikaya and Gleicher (2018), the visual channels include color, size, symbols, outline, opacity, texture, depth of field, and blurriness. Munzner (2014) summarized the theory of visual marks and channels and the ranking of channel effectiveness. Generally, the ranking of channels is task dependent.

3 Enhanced charts

3.1 Line charts

The line chart was first proposed by Playfair et al. (2005) in 1786. It has been widely used in visualization applications (Ma et al. 2016; Wu et al. 2018, 2019). The connections across data points (as shown in Fig. 1a) present the trends of a series. Temporal trends and other patterns of interest can be easily perceived from the upward and downward slopes of data changes. However, when handling data with multiple series, large scale, or extended tasks, standard line charts cannot present the needed information. A large number of enhancements have been developed to address these challenges.

Fig. 1 Standard line chart and its enhancements. a A traditional line chart (Munzner 2014); b a line chart with color and shape encoding (Pagot et al. 2011); c comparison through small multiples (Chang et al. 2007); d statistical aggregation (Andrienko et al. 2010); e Focus+Context approach (Kincaid 2010); f interactive lenses on a line chart (Zhao et al. 2011)

3.1.1 Handling multiple series

As a commonly used enhancement, the colors of lines (Chen et al. 2015b, 2019) and the shapes of nodes (Liu et al. 2015) are employed to help viewers quickly identify and compare the trends of the different dimensions. Another example (Pagot et al. 2011) is shown in Fig. 1b. Alternatively, multiple series can be presented in parallel with the small multiples technique. For example, Chang et al. (2007) proposed a parallel line charts system to present the multiple dimensions of the represented data, with each row of line charts showing a specific series (Fig. 1c).

3.1.2 Handling large-scale data

As the number of series grows, it becomes difficult to identify lines and points in line charts, because they overlap each other and cause considerable visual clutter. To reveal potential features of interest, a large number of improvements have been made to enhance the visual expression of line charts (Zhao et al. 2018b; Muelder et al. 2016; Shi et al. 2012). For example, Andrienko et al. (2010) designed two graphs (Fig. 1d) to indicate that the calling behavior on Saturday and Sunday differs from that on working days. The upper graph is a traditional line chart, in which the temporal records of 238 areas are depicted as lines that overlap each other. The lower graph is a statistical aggregation of the lines, in which significant statistical variables are presented to better depict the trends of the original dataset. Other than aggregation solutions, Kincaid (2010) proposed SignalLens (Fig. 1e), which provides a Focus+Context approach to gain deeper insights into low-level signal details in the context of the entire signal trace. Liu et al. (2018a) employed blue noise sampling to reduce the number of series while preserving major patterns.
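To make the statistical aggregation idea above more concrete, the following Python sketch collapses many overlapping series into per-time-step summary statistics that could be drawn as a band instead of individual lines. The function name `aggregate_series` and the choice of minimum, median, and maximum are assumptions made for illustration; they are not taken from Andrienko et al. (2010), whose aggregation may use different statistics.

```python
from statistics import median


def aggregate_series(series):
    """Summarize equally long series per time step.

    Returns one (minimum, median, maximum) tuple per time step.
    The statistics chosen here are an illustrative assumption.
    """
    if not series:
        return []
    length = len(series[0])
    if any(len(s) != length for s in series):
        raise ValueError("all series must have the same length")
    summary = []
    # zip(*series) yields the values of all series at one time step.
    for step_values in zip(*series):
        summary.append((min(step_values), median(step_values), max(step_values)))
    return summary


# Three toy series sampled at four time steps.
toy = [[1, 4, 2, 8], [2, 5, 3, 7], [3, 6, 1, 9]]
bands = aggregate_series(toy)
```

Drawing only the resulting band and median line keeps the number of marks constant, no matter how many series are aggregated.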

3.1.3 Facilitating expression and tasks

Driven by the various requirements of analytical tasks, traditional line charts are enhanced in different manners. For example, Guo et al. (2019) combined a pixel map and a line chart to visualize details of variable correlations. Hao et al. (2011) proposed a visual analytics approach for peak-preserving prediction of large seasonal time series, in which color cues show the difference between the actual and predicted data, the certainty band shows the confidence of the prediction, and the most significant data points are highlighted in the dark shaded area. Zhao et al. (2011) proposed a novel visualization technique called ChronoLenses (Fig. 1f). Users can construct an interactive lens on a span of the line chart and perform various transformations on the data. Furthermore, a flexible and reusable time-series visual analysis interface can be created by changing the parameters of the lenses.

3.2 Parallel coordinates

Parallel coordinates are a common means of visualizing multivariate data (Inselberg 1985; Al-Dohuki et al. 2017; Xia et al. 2018a). In parallel coordinates, the axes of an n-dimensional space are represented as n parallel lines (see Fig. 2a). A data item in n-dimensional space is visualized as a polyline with vertices on the axes. The position of the vertex on the i-th axis encodes the i-th coordinate of the data item. With this visual encoding, parallel coordinates support analyzing the distribution of data items in each axis and the correlation between neighboring axes. In the past, various enhancements of parallel coordinates have been proposed to facilitate tasks and handle challenges. While there are tremendous variations in the literature, we only review distinct encoding of polylines and enhanced layouts of axes.
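The visual encoding described above can be sketched in a few lines of Python. The sketch maps one n-dimensional data item to the n vertices of its polyline, assuming equally spaced vertical axes and a linear scale per axis. The function name `polyline_vertices` and its parameters (`ranges`, `axis_spacing`, `height`) are hypothetical and serve illustration only.

```python
def polyline_vertices(item, ranges, axis_spacing=1.0, height=1.0):
    """Map one n-dimensional item to n (x, y) polyline vertices.

    ranges[i] is the (low, high) extent shown on the i-th axis.
    The i-th axis is placed at x = i * axis_spacing (an assumption).
    """
    if len(item) != len(ranges):
        raise ValueError("item and ranges must have the same length")
    vertices = []
    for i, (value, (low, high)) in enumerate(zip(item, ranges)):
        span = high - low
        # Linear scale: low maps to 0, high maps to `height`.
        # A degenerate axis places the vertex in the middle.
        t = 0.5 if span == 0 else (value - low) / span
        vertices.append((i * axis_spacing, t * height))
    return vertices


ranges = [(0, 10), (0, 100), (-1, 1)]
vertices = polyline_vertices((5, 25, 1), ranges)
```

Connecting consecutive vertices with straight segments yields the polyline of the item; the slope pattern between two neighboring axes is what reveals their correlation.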

Fig. 2 Standard parallel coordinates and their enhancements. a Standard parallel coordinates (Holten and Van Wijk 2010); b bundled curves (Palmas et al. 2014); c flexible linked axes with integrated scatterplots (Claessen and van Wijk 2011); d parallel sets (Kosara et al. 2006); e parallel coordinates with integrated scatterplots (Yuan et al. 2009); f parallel coordinates combined with a scatterplot matrix (Viau et al. 2010)

3.2.1 Encoding of polylines

Because parallel coordinates are highly related to line charts, many enhancing techniques for line charts can be applied to parallel coordinates, such as the use of color and opacity (Zhao et al. 2019; Holten and Van Wijk 2010) and sampling (Ellis and Dix 2006). These methods are proposed to encode additional information or to handle the large-scale data problem. For categorical data, Kosara et al. (2006) proposed parallel sets, which encode the number of data items into the width of polylines (see Fig. 2d). Unlike in line charts, the position and shape of lines in parallel coordinates are flexible. Therefore, many designs replace polylines with smooth curves (Graham and Kennedy 2003) or bundled curves (Palmas et al. 2014) (see Fig. 2b) to facilitate the visual tracing of data items.
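The width encoding used for categorical data can be illustrated with a small Python sketch. It counts how many data items share each pair of categories on two neighboring axes and converts the counts into relative ribbon widths. This is a simplified reading of the idea, not the implementation of Kosara et al. (2006); the function name `ribbon_widths`, its parameters, and the toy records are assumptions.

```python
from collections import Counter


def ribbon_widths(rows, left, right, total_width=1.0):
    """Return a width per (left category, right category) pair.

    Each width is proportional to the number of rows in that pair,
    and all widths sum to `total_width`.
    """
    counts = Counter((row[left], row[right]) for row in rows)
    n = sum(counts.values())
    if n == 0:
        return {}
    return {pair: total_width * c / n for pair, c in counts.items()}


# Hypothetical toy records with two categorical attributes.
rows = [
    {"class": "first", "survived": "yes"},
    {"class": "first", "survived": "yes"},
    {"class": "first", "survived": "no"},
    {"class": "third", "survived": "no"},
]
widths = ribbon_widths(rows, "class", "survived")
```

Because items with identical category pairs are merged into one ribbon, the number of marks depends on the number of category combinations rather than on the number of data items.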

3.2.2 Layout of axes

Visualizing the correlation between adjacent axes is one of the major analysis tasks in parallel coordinates. An appropriate dimension ordering is critical to reveal patterns in multi-dimensional data. A traditional solution is to enable interactive ordering or to order axes according to some measure. For instance, Peng et al. (2004) reordered the axes by calculating outliers between neighboring dimensions to reduce the visual clutter. Furthermore, Zhou et al. (2018c) proposed a cluster-aware method for parallel coordinate plots to achieve semantic dimension ordering. Another popular solution is to integrate scatterplots into parallel coordinates, yielding new layouts. Yuan et al. (2009) combined scatterplots with parallel coordinates to take advantage of both visualizations (see Fig. 2e). Converting two neighboring axes into a scatterplot shows relationships among multiple dimensions. Claessen and van Wijk (2011) proposed flexible linked axes and integrated scatterplots into the visualization to present multivariate data (see Fig. 2c). Viau et al. (2010) proposed the Parallel Scatterplot Matrix, which combines a scatterplot matrix and parallel coordinates to visualize and select features within a network (see Fig. 2f).
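Ordering axes according to a measure can be sketched as follows. Note that this example deliberately swaps in a simpler measure: it greedily places next to each axis the remaining dimension with the highest absolute Pearson correlation. It is not the outlier-based measure of Peng et al. (2004) nor the cluster-aware method of Zhou et al. (2018c); the function names and toy columns are assumptions.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equally long sequences (0.0 if constant)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / sqrt(vx * vy)


def greedy_axis_order(columns, start):
    """Order dimension names so that neighbors are strongly correlated.

    `columns` maps a dimension name to its list of values.
    Ties are broken alphabetically to keep the result deterministic.
    """
    order = [start]
    remaining = set(columns) - {start}
    while remaining:
        last = order[-1]
        best = max(
            sorted(remaining),
            key=lambda name: abs(pearson(columns[last], columns[name])),
        )
        order.append(best)
        remaining.remove(best)
    return order


columns = {
    "a": [1, 2, 3, 4],
    "b": [1, 3, 2, 4],
    "c": [4, 3, 2, 1],
}
order = greedy_axis_order(columns, "a")
```

A greedy pass is only a heuristic: it does not guarantee the globally best ordering, which is one reason interactive reordering is commonly offered as well.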

3.3 Bar charts

A bar chart (Fig. 3a) presents counts of categorical data items with bars, whose length encodes the counts. In other words, it presents two-dimensional data, where the key attribute is categorical and the other attribute is quantitative (Munzner 2014). The bars can be plotted vertically or horizontally (Gu et al. 2018; Wu et al. 2017; Zhou et al. 2018b). A bar chart supports value comparison across different categories.

Fig. 3 Standard bar chart and its enhancements. a Standard bar chart (Munzner 2014); b stacked bar chart (top) and grouped bar chart (bottom) (Streit and Gehlenborg 2014); c nonlinear dot plots (Rodrigues and Weiskopf 2018); d an expressive bar chart (Kim et al. 2017); e mosaic plots (Wickham and Hofmann 2011); f a 3D bar chart (Meuschke et al. 2017)

3.3.1 Handling multiple dimensions

Stacked bar charts and grouped bar charts (Fig. 3b) are the most common variations when there are two key attributes (Chen et al. 2018b; Streit and Gehlenborg 2014; Xie et al. 2014). Generally, they present sub-bars corresponding to the second key attribute and encode the sub-bars, e.g., with color (Liu et al. 2018b; Wang et al. 2018b; Chen et al. 2017). In stacked bar charts, each bar is a stack of multiple sub-bars that present the values of sub-categories (Zhou et al. 2018e; Liao et al. 2015; Huang et al. 2019). In comparison, grouped bar charts plot sub-bars side by side along the category axis to compare the values of sub-categories (Wang et al. 2014; Zhou et al. 2019; Kamw et al. 2019). Other than these two variations, Taher et al. (2016) laid out two key attributes on a square base, and presented each bar as a physical 3D stack. Similarly, Meuschke et al. (2017) presented 3D stacks in which the two key attributes represent spatial information (Fig. 3f). Another solution to handle multiple dimensions is to use area rather than length to encode the value. In such cases, bar charts are transformed into mosaic plots (Wickham and Hofmann 2011) (Fig. 3e), which can support more than two key attributes. Chen et al. (2016) proposed another solution that lays out the bar charts of different dimensions in a matrix, similar to a scatterplot matrix.

3.3.2 Handling composite attributes

When analysts are interested in not only the value but also the distribution statistics of each bar, such as the maximum and minimum, level lines are added to each bar to show its statistics (Hajizadeh et al. 2013). However, it may be misleading since the bar is encoded with length, while the level lines are encoded with position (Streit and Gehlenborg 2014).

3.3.3 Facilitating expression and tasks

A great number of approaches have been proposed to improve the expressiveness of bar charts and to facilitate analysis tasks (Zhou et al. 2018d). Usually, color channels are used to distinguish data of different categories (Xie et al. 2014). To highlight the part–whole relationship, a part of the bars or sub-bars can be encoded in different colors (Hajizadeh et al. 2013; Wang et al. 2018a). To emphasize the relative contributions of each sub-bar, normalized bar charts normalize each bar to a uniform length. To facilitate the comparison among bars, Unger et al. (2018) attached level lines to bar charts. While the conventional rectangular bar works well in bar charts, the general public would appreciate more expressive designs. Kim et al. (2017) proposed an approach to generate data-driven graphics, in which the bars are represented as expressive graphics.
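The normalization step behind normalized bar charts is simple enough to sketch directly. The following Python example rescales the sub-bars of every stacked bar so that each bar has a total length of one, which turns absolute values into relative contributions. The function name `normalize_stacks` and the toy data are assumptions for illustration.

```python
def normalize_stacks(stacks):
    """Rescale the sub-bars of every bar so that each bar sums to 1.

    `stacks` maps a category to a dict of sub-category values.
    A bar whose total is zero keeps zero-length sub-bars.
    """
    normalized = {}
    for category, parts in stacks.items():
        total = sum(parts.values())
        if total == 0:
            normalized[category] = {name: 0.0 for name in parts}
        else:
            normalized[category] = {
                name: value / total for name, value in parts.items()
            }
    return normalized


# Hypothetical toy data: two bars with two sub-categories each.
stacks = {"2017": {"x": 1, "y": 3}, "2018": {"x": 5, "y": 5}}
shares = normalize_stacks(stacks)
```

The trade-off is that the absolute totals of the bars are no longer visible, so a normalized bar chart suits part–whole comparison rather than comparison of overall magnitudes.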

3.3.4 Handling individual items

While bars only represent the counts of each category, designers sometimes want to show individual items in bar charts. Wang et al. (2019) employed a bar chart to visualize degree distribution with a discontinuous x axis for degree. Dot plots (Wilkinson 1999) stack data items as points in bars (Fig. 3c). Recently, Rodrigues and Weiskopf (2018) presented nonlinear dot plots that allow a dynamic size of points. Ren et al. (2017) presented each bar as stacked glyphs of people to provide an expressive presentation.

3.3.5 Histograms

Histograms (Pearson 1895) can be considered a variation of bar charts. They use the lengths of bars to encode the frequency or frequency density of values. Figure 4a shows an example of an original histogram. Although a histogram looks similar to a bar chart, its key attribute is continuous rather than discrete. Usually, the first step in constructing a histogram is to aggregate the key values into a set of bins.
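The binning step mentioned above can be sketched as follows, assuming equal-width bins over a fixed value range. The function name `bin_counts` and its parameters are hypothetical; practical tools also offer data-driven rules for choosing the number of bins, which this sketch leaves to the caller.

```python
def bin_counts(values, low, high, bins):
    """Count values in `bins` equal-width bins covering [low, high].

    Values outside the range are ignored; the value `high` itself
    is assigned to the last bin.
    """
    if bins <= 0 or high <= low:
        raise ValueError("need bins > 0 and high > low")
    width = (high - low) / bins
    counts = [0] * bins
    for v in values:
        if v < low or v > high:
            continue
        # Clamp so that v == high falls into the last bin.
        index = min(int((v - low) / width), bins - 1)
        counts[index] += 1
    return counts


counts = bin_counts([0.5, 1.5, 1.7, 3.2, 4.0], low=0.0, high=4.0, bins=4)
```

Dividing each count by the product of the number of values and the bin width would give the frequency density instead of the raw frequency.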

Fig. 4 Standard histogram and its enhancements. a Standard histogram (Guo et al. 2011); b angular histogram (Geng et al. 2011); c histogram with an extra axis (Wan and Hansen 2017); d sequence of partially encoded histograms (Unger et al. 2018); e mosaic plots (Wickham and Hofmann 2011); f histogram shaped as arcs (Alsallakh et al. 2014)

Because histograms have a shape similar to bar charts, they also share similar enhancement approaches. Bars or parts of bars can be encoded to highlight the distribution of a subset of data items (Unger et al. 2018; Chen et al. 2015a) (Fig. 4d). Similarly, stacked histograms have been developed to show part–whole relationships (Wickham and Hofmann 2011; Andrienko et al. 2018) (Fig. 4e). van den Elzen and van Wijk (2011) compared different variations of histograms, including stacked histograms, smoothed histograms, and streaming graphs. They concluded that smoothed histograms prevent discontinuities for easier interpretation, and that streaming graphs are best suited to see individual class distributions as well as quantities. Different layouts of histograms have also been proposed for specific scenarios. For high-dimensional data visualization, Fan et al. (2017) used a color-encoded smoothed histogram to present the entropy information in the event stream. Wan and Hansen (2017) added extra axes for higher-dimensional data. Geng et al. (2011) proposed angular histograms to present the frequency of data on each axis (Fig. 4b). In a radial-layout visualization, the bars of a histogram can be shaped as arcs (Alsallakh et al. 2014, 2012) (Fig. 4f).

3.4 Scatterplots

Scatterplots encode objects with two quantitative attributes as marks in a two-dimensional space. Figure 5a shows an example of original scatterplots. The two attributes are encoded as positions in the two axes, respectively. Munzner (2014) summarized that original scatterplots are suitable for hundreds of data items. When the scale and complexity of data grow, the design of traditional scatterplots is enhanced to handle new challenges (Sarikaya and Gleicher 2018; Zhou et al. 2018a; Ma et al. 2018).

Fig. 5 Standard scatterplot and its enhancements. a A standard scatterplot (Munzner 2014); b a scatterplot with color encoding (Sarikaya and Gleicher 2018); c scatterplots with color encoding and shape encoding (Gleicher et al. 2013); d the blurred scatterplot (Feng et al. 2010); e the visual abstraction of a scatterplot (Chen et al. 2014); f progressive splatting of continuous scatterplots (Heinrich et al. 2011); g BubbleSet (Collins et al. 2009)

3.4.1 Handling multiple dimensions

Usually, designers enhance scatterplots with additional visual channels to encode additional attributes (Wu et al. 2015). Sarikaya and Gleicher (2018) summarized that possible channels include color, size, symbols, opacity, texture, depth of field, and blurriness. Among these channels, Gleicher et al. (2013) showed that symbols are weaker than color in identifying multiclass scatterplots (Fig. 5c). Li et al. (2009) evaluated the performance of symbols as well as size, and Li et al. (2010) studied the discrimination of opacity in scatterplots.

For categorical attributes, such as class, many works take color as the first choice (Ma et al. 2017). For instance, Brown et al. (2012), Aupetit et al. (2014), and Chen et al. (2015a) each encoded the class of points into the color channel. Xia et al. (2017) allowed users to set the color of points to identify their classes. Redundant channels have also been used to emphasize an attribute (Kanjanabose et al. 2015).

For quantitative attributes, designers often choose size, opacity, and blurriness as the encoding channels. Usually, uncertainty is encoded into the opacity channel (Xia et al. 2018b) or blurriness (Feng et al. 2010) (Fig. 5d). Inspired by the concept of depth of field in optics, Staib et al. (2016) encoded the distance to the current focus into blurriness. Choo et al. (2014) encoded citation counts into the size channel.

When there are multiple additional attributes, designers often encode them into multiple channels. For instance, Choo et al. (2014) encoded three attributes into color, symbols, and size, respectively. As the dimensionality of data continues to grow, the increasing number of visual channels rapidly raises the perceptual burden. To address this issue, one choice is to use a fan-like glyph to represent attributes (Liao et al. 2018; Kwon et al. 2017; Zhao et al. 2018a). When each data item represents an image, designers can directly plot the images in the scatterplot (Tenenbaum et al. 2000; Dang and Wilkinson 2014; Chen et al. 2018a).

Table 1 The enhancement techniques of statistical charts are clustered into four categories which are similar to Sarikaya and Gleicher (2018)
Table 2 Representative enhancement techniques and corresponding challenges and tasks

3.4.2 Handling large-scale data

When the number of data points grows, visual clutter occurs, i.e., marks overlap each other. To address this issue, strategies can be categorized as reducing the data, simplifying the visual representation, and modifying the space of the plot (Sarikaya and Gleicher 2018). An example of reducing the data is subsampling (Bertini and Santucci 2006; Chen et al. 2014). Mayorga and Gleicher (2013) addressed the overlapping issue by abstracting dense regions as smooth shapes and subsampling outlying points in the remaining regions. Generalized scatterplots (Keim et al. 2010) distort the representation to take advantage of unused space. Similarly, continuous scatterplot approaches (Bachthaler and Weiskopf 2008; Lehmann and Theisel 2010; Heinrich et al. 2011) transform discrete data items into a continuous field and generate a color map to show the field (Fig. 5f). Along the same lines, Kernel Density Estimation (KDE) approaches (Lampe and Hauser 2011) estimate the density of data items and create a color map to represent the density distribution.
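As a rough illustration of the density-based strategy, the following Python sketch evaluates a Gaussian kernel density estimate of a point set on a regular grid; the resulting grid could then be passed through a color map instead of drawing individual marks. This is a plain, brute-force formulation under assumed names (`density_grid`, `bandwidth`), not the interactive method of Lampe and Hauser (2011).

```python
from math import exp, pi


def density_grid(points, xs, ys, bandwidth):
    """Evaluate a 2D Gaussian KDE at every (x, y) grid position.

    Returns grid[row][col], where rows follow `ys` and columns follow `xs`.
    `bandwidth` is the standard deviation of the isotropic kernel.
    """
    if bandwidth <= 0:
        raise ValueError("bandwidth must be positive")
    if not points:
        return [[0.0 for _ in xs] for _ in ys]
    # Normalization of an isotropic 2D Gaussian, averaged over all points.
    norm = 1.0 / (2.0 * pi * bandwidth ** 2 * len(points))
    grid = []
    for y in ys:
        row = []
        for x in xs:
            total = 0.0
            for px, py in points:
                d2 = (x - px) ** 2 + (y - py) ** 2
                total += exp(-d2 / (2.0 * bandwidth ** 2))
            row.append(norm * total)
        grid.append(row)
    return grid


# Two nearby points form a dense region; one point lies far away.
points = [(0.0, 0.0), (0.1, 0.0), (2.0, 2.0)]
grid = density_grid(points, xs=[0.0, 1.0, 2.0], ys=[0.0, 1.0, 2.0], bandwidth=0.5)
```

The bandwidth controls the trade-off between smoothing and detail: a small value keeps individual points visible, while a large value merges them into broad regions.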

3.4.3 Handling composite attributes

To represent composite attributes, such as scalar fields, directions, and relationships, graph amenities are added to scatterplots (Zhu et al. 2019). Cheng et al. (2016) used iso-contours and kernel density estimation to encode scalar field information. Similarly, Cheng and Mueller (2016) encoded scalar fields as iso-contours. Chen et al. (2014) encoded the trend of data with a line on each point to indicate the direction. The BubbleSet approach (Collins et al. 2009) represents the group relationship among points by bubble-like iso-contours (Fig. 5g).

4 Discussion

In this section, we shed light on the distribution of enhancement techniques among the four types of charts and reason about the distribution we found in the literature. In addition, we discuss our framework by listing the challenges and tasks with examples.

4.1 The enhancement techniques

Enhancement techniques refer to added or varied encodings that strengthen the visual presentation of statistical charts. Inspired by the work of Sarikaya and Gleicher (2018), we categorize enhancement techniques into four types, i.e., encoding, grouping, position, and graph amenities. In Table 1, we fill the corresponding cells with the concrete enhancement techniques found in the literature. We found that some techniques, such as graph amenities, are feasible for all four kinds of charts (we consider histograms a variant of bar charts). On the other hand, encoding, grouping, and position are found only in some types of enhanced charts. For instance, size can encode an additional attribute in scatterplots. Similarly, in line charts, the width of a line can be used to encode an additional attribute. However, in bar charts and histograms, we have not found such an encoding using the size or width channel. The reason behind this difference is the inherent nature of the primary marks of these charts. It is also the main reason for the differences in grouping and position methods among the four charts.

It is worth noting that no techniques are available for the options in blank cells. Besides indicating a possible mismatch between enhancement techniques and charts, a blank cell may also suggest a potential research opportunity. For instance, shape abstraction is used in scatterplots, and could be used in line charts to show the trend and distribution of a group of polylines.

4.2 Challenges and tasks

We present this survey in a challenges-and-tasks-driven manner. Table 2 presents typical examples of approaches to handle challenges and analysis tasks. We have identified seven typical challenges and tasks for four types of charts in the literature. Our survey indicates that high dimensionality is the most frequent challenge and the primary driving force of statistical chart enhancement. The other two major challenges, large data size and large data range, are mainly identified in line charts and scatterplots.

Table 2 provides suggestions for designers to select proper charts and designs. In the first step, designers can choose the proper type of chart according to the data type and the major analysis task. Subsequently, designers can identify the data characteristics and derived tasks, e.g., the number of data items, the number of dimensions, and the range of values. The characteristics and tasks, which may yield challenges, lead to the choice of design space. This table gives examples when the chart type and challenges are identified. Although the supported data types and tasks of different types of charts may overlap with each other, we suggest making the decision of chart type and designs following the above process. In this way, the performance of the major tasks could be maximized.

5 Conclusion

Statistical charts are widely used in exploratory data analysis and are the origins of many visual forms. While many enhancement techniques are available, understanding the design strategies and making the right choice remain valuable. In this paper, we have presented a challenge-and-task-driven framework to support design decisions. We have provided abundant examples and indicated potential areas for innovation in the design of the four statistical charts.