1 Learning Objectives

In this course, students will learn how to produce data-driven visualizations for both exploration and communication, choosing them according to the types of data and the visualization goals and tasks at hand. By exploring real-world examples, they will learn to identify and avoid misleading visualizations. They will also learn how to convey their insights through data stories.

2 Content

The course material is divided into seven sections:

  1. Why visualize: this section argues that visualizing data improves understanding and communication. By pairing statistical summaries with visualizations of the same data, we show how the reader's understanding of the underlying information improves.

  2. Human perceptual system: this section presents basic concepts of the human perceptual system and how they apply to visualization, such as visual variables – e.g., color (hue, luminance, and saturation), size (length, height, width, area), shape, orientation, opacity, position – and Gestalt principles.

  3. Types of data: this section introduces different types of data, e.g., categorical, numeric, hierarchical, network, temporal, and spatial data, illustrated through real-world examples.

  4. Basic charts: this section describes widely used charts and some of their variations, e.g., bar chart, line chart, histogram, dot plot, box plot, scatter plot, bubble plot, maps, trees, networks, and heat maps. We show good and bad examples, and discuss common mistakes and misleading visualizations.

  5. Visualization tasks: this section describes how the set of suitable visualizations is determined by the combination of data types and visualization tasks – e.g., retrieve value, filter, compute derived value, find extremum, sort, determine range, characterize distribution, find anomalies, cluster, and correlate [1]. Because a single visualization is sometimes insufficient, we explore how visualizations may be combined to achieve an exploration or a communication goal.

  6. Interactivity in data visualization: this section presents some interaction mechanisms for manipulating data visualizations, e.g., filtering, zooming, and brushing and linking. For high-volume data, complex data, or complex visualization tasks, interactivity is essential for proper understanding.

  7. Storytelling with data: this section discusses how to tell stories by using visualizations within a narrative so as to communicate data-driven insights. As humans learn better through stories, data stories may create empathy and be more memorable than isolated data facts and visualizations.
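Several of the visualization tasks listed in section 5 can be made concrete with a few lines of code. The following is a minimal sketch in plain Python; the data set, the variable names, and the 400-thousand threshold are hypothetical and chosen only for illustration, and they are not taken from the course material.

```python
from statistics import median

# Hypothetical data: (city, population in thousands, area in km^2).
cities = [
    ("A", 120, 40),
    ("B", 950, 310),
    ("C", 430, 95),
    ("D", 75, 60),
    ("E", 2100, 520),
]

population = {name: pop for name, pop, _ in cities}

# Retrieve value: look up one attribute of one item.
pop_c = population["C"]

# Filter: keep the items that satisfy a condition.
large = [name for name, pop, _ in cities if pop > 400]

# Compute derived value: population density from two raw attributes.
density = {name: pop / area for name, pop, area in cities}

# Find extremum: the item with the highest derived value.
densest = max(density, key=density.get)

# Sort: rank the items by population, largest first.
ranked = sorted(population, key=population.get, reverse=True)

# Determine range: the span of values an attribute takes.
pop_range = (min(population.values()), max(population.values()))

# Characterize distribution: a single summary statistic as a starting point.
pop_median = median(population.values())

print(large, densest, ranked, pop_range, pop_median)
```

Each task suggests a different chart: a sorted bar chart supports ranking and finding extrema, whereas a histogram or box plot is a better fit for characterizing a distribution.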

For pedagogical reasons, these sections will not be presented in a strict sequence, i.e., content from different sections may be intermingled.
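The argument of the "Why visualize" section, that summary statistics alone can hide the structure of the data, can be illustrated with a small sketch. The course description does not name a specific data set; the example below assumes the first two series of Anscombe's quartet, a well-known illustration, and uses a crude text plot (the helper names `summarize` and `text_plot` are hypothetical) so that no plotting library is needed.

```python
from statistics import mean, variance

# First two series of Anscombe's quartet (they share the same x values).
x = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
y1 = [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]
y2 = [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]


def summarize(ys):
    """Return the mean and sample variance, rounded to two decimals."""
    return round(mean(ys), 2), round(variance(ys), 2)


def text_plot(xs, ys, scale=4):
    """Return one text row per point, ordered by x; the star's column encodes y."""
    rows = []
    for xv, yv in sorted(zip(xs, ys)):
        rows.append(f"{xv:2d} |" + " " * round(yv * scale) + "*")
    return rows


# The rounded summaries are identical for both series...
print(summarize(y1), summarize(y2))

# ...but plotting the points shows a noisy linear trend versus a smooth curve.
for label, ys in (("y1", y1), ("y2", y2)):
    print(label)
    print("\n".join(text_plot(x, ys)))
```

With a plotting library, the same comparison would normally be drawn as two side-by-side scatter plots.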

3 Course Format

The course is formatted as a 3-hour lecture split into two sessions, with several examples given throughout.

The examples will be explored through both top-down and bottom-up approaches: (i) by presenting students with visualization goals, we will explore how to produce data-driven visualizations to achieve them; and (ii) by presenting students with both well-designed and poorly designed visualizations, we will prompt them to reflect on their quality and consequences, as well as to discuss alternative visualizations that achieve the same goal.

All slides, public data sets, and code (in R and Python) used in the course will be provided beforehand, together with instructions on how to install the open-source software necessary to run the examples.

4 Intended Audience

The intended audience is made up of students, young researchers, and professionals in information and communication technology-related fields who have not yet had a systematic exposure to data visualization. As an introductory course, it is not suitable for people with advanced knowledge of visualization concepts and techniques.

Programming knowledge of R or Python is not required, but students are encouraged to bring their own laptops to run the course examples throughout the sessions.