Introduction

Deciphering patterns of population structure and individual ancestry remains a fundamental problem in population genetics (Endler 1977; Cavalli-Sforza et al. 1994; François and Durand 2010). Bayesian methods, implemented in a number of programs (reviewed in François and Durand 2010), are widely used to infer population structure. Many of these approaches use multi-locus genotype correlations and solve for the most likely proportions of an individual’s genetic ancestry (q) in two or more (k) genetic clusters, typically using a Markov chain Monte Carlo (MCMC) method; BAPS is an exception and uses a maximization algorithm (Corander et al. 2008). For all of the individuals in a given study, these values can be represented in a matrix called the Q matrix.

Several of these programs incorporate geospatial data from individuals or sampling localities, for example GENELAND (Guillot et al. 2005), BAPS5 (Corander et al. 2008), POPS (Jay 2011), and TESS (Chen et al. 2007; Durand et al. 2009). These programs allow the user to output results as a bar plot (Fig. 1a) and to use geospatial data to map population structure across a study area. However, they offer limited options for displaying multiple clusters in a single image or for displaying overlap between them. For example, the TESS hard-clustering method assigns individuals to cells, each of which is shaded a single color (Fig. 1b) representing membership in a single cluster. This hard-clustering method makes it impossible to represent overlap between clusters. In addition, Universal Kriging analysis (Ripley 1981) has been used to interpolate population structure across sampled and unsampled parts of a study area (R script for the Universal Kriging method available at: http://membres-timc.imag.fr/Olivier.Francois/admix_display.html). Problems with this Universal Kriging method include: (1) the interpolated fractions of ancestry for each cluster are plotted on separate maps; and (2) the color display cannot be changed (Fig. 1c). The program POPS also provides methods to summarize maps, based on Universal Kriging interpolation and available as an R script (Jay et al. 2012) (Fig. 1d).

Fig. 1

Graphical representation of example data included with the TESS (a–c) and POPS (d) distribution packages. a A bar plot assuming three populations, created using DISTRUCT (Rosenberg 2004). b Color-coded Voronoi cells from TESS. c Four images created with an R script that uses Universal Kriging to display the spatial distribution of three different clusters. d Map of admixture coefficients for the POPS example data, created using Universal Kriging interpolation scripts in R

The program TESS also contains an option to create ASCII-based posterior predictive maps of admixture proportions, but offers no way to display them. TESS generates a single ASCII file for each cluster specified by the user during any particular TESS run (Fig. 2a). These ASCII files are two-dimensional predictions of the Q matrix over a simulated coordinate space of n pixels. For k clusters assumed by TESS, the posterior predictive mapping function outputs k files, each with predictions of probable ancestry for each pixel on the map. In this paper, we present a new post hoc method for mapping the structure of multiple clusters simultaneously, based on individuals’ q values and predictive maps. This method is implemented in TESS Ad-Mixer, a Clojure program that runs on a Java virtual machine (JVM).

Fig. 2

The process of using TESS Ad-Mixer. a Visual representations of admixture proportions of the TESS posterior predictive maps. b Output image combining the posterior predictive maps into a single image using TESS Ad-Mixer (sample points were added using ArcMap). c Close-up view of the TESS Ad-Mixer output file showing example points and a corresponding table. The table shows each example point’s expected q value for each of the three clusters, and its corresponding R, G, and B values. d Output image combining the posterior predictive maps into a single image using POPS example data (Jay 2011) for K = 3 (map extent was clipped using a shapefile from Jay et al. (2012), and sample points were added using ArcMap). For comparison, see Fig. 1d for the POPS R spatial interpolation of these data

Functionality description

User inputs

The user provides k ASCII files (representing spatial interpolations of the Q matrix) and k color codes (given as RGB values) that visually distinguish each of the k clusters determined by TESS. The k ASCII files are denoted \( Q_{1}, Q_{2}, \ldots, Q_{k} \). Spatially predicted q values, inferred from the Q matrix, are denoted \( Q_{i}(x, y) \), where i is a value between 1 and k, and (x, y) ranges over the study area.

Data normalization

TESS Ad-Mixer layers the k ASCII grids onto a single image, mixing the colors of corresponding cells from each grid in proportion to their q values. Each cell, \( Q_{i}(x, y) \), from each of the k ASCII grids contains a q value for a single geographic location, (x, y). For each (x, y), a vector of the normalized associated data, \( q_{\text{norm}}^{ \to } \), is calculated so that each cell displays a proportional amount of the k colors specified by the user.

$$ q_{\text{norm}}^{ \to } (x,y) = \frac{1}{\sum\limits_{i = 1}^{k} Q_{i} (x,y)} \begin{pmatrix} Q_{1} (x,y) \\ Q_{2} (x,y) \\ \vdots \\ Q_{k} (x,y) \end{pmatrix} $$
(1)

In order to avoid dividing by 0, if:

$$ \sum\limits_{i = 1}^{k} Q_{i} (x,y) < \epsilon $$
(2)

where ϵ is a value near 0, then:

$$ q_{\text{norm}}^{ \to } = 0 $$
(3)

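The normalization in Eqs. 1 to 3 can be sketched for a single pixel as follows. This is a minimal illustration in Python, not the program's Clojure source; the function name `normalize_q` and the threshold `EPSILON = 1e-9` are assumptions, since the text only describes ϵ as a value near 0.

```python
# Hypothetical threshold: the text only says epsilon is "a value near 0".
EPSILON = 1e-9


def normalize_q(q_values, epsilon=EPSILON):
    """Scale the k predicted q values at one (x, y) so they sum to 1 (Eq. 1).

    If their sum is below epsilon (Eq. 2), return the zero vector (Eq. 3)
    instead of dividing by a value that is effectively zero.
    """
    total = sum(q_values)
    if total < epsilon:
        return [0.0] * len(q_values)
    return [q / total for q in q_values]


# Three clusters whose predicted q values sum to 0.5 at this pixel.
print(normalize_q([0.25, 0.125, 0.125]))  # [0.5, 0.25, 0.25]
# A pixel with no predicted ancestry in any cluster.
print(normalize_q([0.0, 0.0, 0.0]))       # [0.0, 0.0, 0.0]
```

After normalization the entries act as mixing weights, so a pixel with weights 0.5, 0.25, and 0.25 draws half of its color from the first cluster.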

Color computation

To generate the output image, \( q_{\text{norm}}^{ \to } \) and the RGB values specified for each cluster are combined for each pixel:

$$ R = q_{\text{norm}}^{ \to \, \text{T}} \begin{pmatrix} r_{1} \\ r_{2} \\ \vdots \\ r_{k} \end{pmatrix} $$
(4)
$$ G = q_{\text{norm}}^{ \to \, \text{T}} \begin{pmatrix} g_{1} \\ g_{2} \\ \vdots \\ g_{k} \end{pmatrix} $$
(5)
$$ B = q_{\text{norm}}^{ \to \, \text{T}} \begin{pmatrix} b_{1} \\ b_{2} \\ \vdots \\ b_{k} \end{pmatrix} $$
(6)

where \( r_{i} \), \( g_{i} \), and \( b_{i} \) are the color components (red, green, and blue, respectively) previously specified by the user for the ith cluster, and R, G, and B are the color components for this pixel in the final image. The program computes R, G, and B for every pixel designated by the input ASCII grids. These pixel values are used to construct a PNG image using Java’s BufferedImage class. The resulting PNG image (Fig. 2b) is a complete representation of the spatially interpolated Q matrix as generated by TESS.
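The per-pixel color computation in Eqs. 4 to 6 amounts to a weighted sum of the cluster colors. The sketch below illustrates it in Python rather than the program's Clojure; the function name `mix_color` is hypothetical, and rounding and clamping each channel to the 0 to 255 range is an assumption made here for an 8-bit image, not something the text specifies.

```python
def mix_color(q_norm, cluster_colors):
    """Blend the cluster colors for one pixel (Eqs. 4-6).

    q_norm: list of k normalized q values for the pixel.
    cluster_colors: list of k (r, g, b) tuples chosen by the user.
    """
    if len(q_norm) != len(cluster_colors):
        raise ValueError("need exactly one color per cluster")
    channels = []
    for c in range(3):  # 0 = red, 1 = green, 2 = blue
        value = sum(q * color[c] for q, color in zip(q_norm, cluster_colors))
        # Assumption: round to the nearest integer and clamp to 0-255.
        channels.append(max(0, min(255, round(value))))
    return tuple(channels)


# Example palette: pure red, green, and blue for three clusters.
colors = [(255, 0, 0), (0, 255, 0), (0, 0, 255)]
print(mix_color([1.0, 0.0, 0.0], colors))    # (255, 0, 0)
print(mix_color([0.5, 0.25, 0.25], colors))  # (128, 64, 64)
```

A pixel assigned entirely to one cluster takes that cluster's color unchanged, while an admixed pixel takes an intermediate color. Under this sketch, a pixel whose normalized vector is zero (Eq. 3) comes out black.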

Advantages of using TESS Ad-Mixer

TESS Ad-Mixer improves upon existing methods for representing the spatial distribution of population structure. Specifically, TESS Ad-Mixer: (1) allows users to choose colors for different clusters; (2) generates a single image of multiple clusters from TESS posterior predictive ASCII files; and (3) shows areas of cluster overlap, derived from q values, by mixing the colors of the pixels in question (Fig. 2c). Color mixing is especially useful for displaying adjacent clusters with large degrees of overlap (Fig. 2d) compared to methods based on Universal Kriging (Fig. 1d). The single output image generated by TESS Ad-Mixer can be georeferenced to maps using programs such as ArcMap (ESRI Corp.).