1 Introduction

Mass spectrometry (MS) imaging brings the advantages of MS to microscopy and provides the spatial distribution of chemicals on a surface without the need for fluorescent or radioactive labeling [1–4]. The development of 2D-MS imaging for biological tissue analysis provides highly specific molecular information on the distribution of proteins [5–10], lipids [11–19], and therapeutic drugs [20–24] in the material. This information serves as a powerful tool for finding disease biomarkers as well as for understanding and developing drug delivery systems [25, 26]. While 2D-MS imaging has been widely applied to the analysis of thin tissue sections, it has also been recognized that it is highly valuable to acquire 3D spatial distributions of the chemicals in a tissue volume or in an entire organ [27–34]. Two basic approaches are used for 3D-MS imaging: first, depth profiling using an ionization source that ablates tissue, and second, recording a sequence of 2D images from serial sections taken from a tissue volume and then combining this information. In the depth profiling experiments, ablation of the tissue material is used to expose lower layers of tissue for analysis; this has been achieved with high energy ion beams in secondary ion mass spectrometry (SIMS) imaging [29, 33, 35] or with lasers in methods that include matrix-assisted laser desorption/ionization (MALDI) [5, 36, 37], laser ablation electrospray ionization (LAESI) [28, 38], and laser ablation coupled to flowing atmospheric-pressure afterglow (LA-FAPA) [39]. In the alternative serial-sectioning approach, a volume of tissue is sliced into thin sections and each of the sections is imaged using standard 2D-MS imaging. The information obtained from the 2D images is then processed to construct the 3D distributions of the chemicals.
By appropriately selecting a representative number of sections for analysis, a large volume of tissue can be analyzed in a relatively short time with adequate information being acquired to reconstruct the 3D chemical distributions. This approach has been implemented using MALDI [27, 30, 32, 40] and desorption electrospray ionization (DESI) [41].

Data processing for MS imaging is both important and challenging, since a large amount of raw data is acquired and needs to be processed and analyzed. Software tools for 2D-MS image data processing are readily available. In addition to the software provided with commercial mass spectrometers [42], free software such as BioMap [43, 44], Datacube Explorer [45], and MITICS [46] has been used widely for generating 2D-MS images. Currently there is no software available for processing MS data to assemble 3D images directly. In recent studies [32, 40, 41], 2D images for selected ions on a series of sections were first generated using 2D data processing software, and then the color distribution information was further processed by image software to generate the 3D images. Note that the m/z values and corresponding ion abundance information for distributions of multiple compounds are not represented in the 3D data set constructed in this approach. To obtain 3D images for different ions, different 2D images have to be generated first, and their color distributions are then used to represent the differences in the 3D image construction. Application of advanced data analysis methods to a 3D volume, such as principal component analysis (PCA), is not possible because the original mass spectral information is not retained through the data processing. As discussed for a previous study of peptide and protein imaging in rat brain [32], considerably more extensive data processing procedures are required for true 3D data processing that retains the MS spectral information. These procedures could include, but are not limited to, spectral smoothing, intra-section registration (2D image rotating and rescaling), inter-section registration (alignment, quality measurement), and validation (surface rendering).

In this study, we explored the methods to reconstruct a 3D data set retaining the mass spectral information for 3D-MS imaging. With the accurate masses and the abundances of the ions representative for the compounds of interest, 3D images can be instantly produced with arbitrary views, and statistical analysis can be performed in the 3D volume. The key steps necessary for the data processing were identified, and solutions were developed and implemented using selected capabilities of MATLAB so that they can be integrated into a complete software package. The data set previously acquired for 36 sections of a mouse brain with DESI imaging was used here to test the methods and to demonstrate the new software solutions. Data reduction, tissue section alignment, data visualization, as well as statistical analysis using PCA and cluster analysis (CA) have been developed.

2 Data Registration and Storage

Similar to 2D-MS imaging, the spectra recorded for each point in a 3D tissue volume need to be co-registered with the position of each sampling point. When using a series of sections from a tissue volume, the x and y coordinates are registered with the individual spectra acquired while 2D imaging is being performed on each section. The actual z coordinate value of a section needs to be registered together with the data recorded for each point on that section. A point close to the bottom-left corner of each tissue section is set as the reference point (0, 0) in the program we developed, while a relative x-y position system is used to register the points in that section. There is a challenge in aligning the x-y positions between different sections, for which a solution will be discussed later in this paper. Typically, multiple spectra are recorded for each point of the section and the averaged spectra are used for data processing. The entire data set can be stored in a database defined in various ways. The data used in our study were recorded using an LTQ mass spectrometer (Thermo Fisher Scientific, Inc., San Jose, CA, USA) equipped with a homebuilt DESI imaging source. Thirty-six tissue sections of a mouse brain were imaged in the negative ion mode, with a total of 50 rows of scans and 69 spectra recorded per row for each section [41]. The x and y positions corresponding to each spectrum were determined by the scanning speed and step length of the moving stage in the x and y directions, respectively. An index file was created to correlate each file name with the x-y coordinates. During the data processing, the peaks were identified and a single file was created for each analyte with its intensities at every point in the 3D volume.
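The registration step described above can be sketched as a simple index that maps each spectrum to its (x, y, z) position. This is a minimal illustration only: the file-naming scheme, the 0.2 mm step sizes, and the function name `register_section` are assumptions made for the example and are not taken from the original software, which was written in MATLAB.

```python
from typing import Dict, Tuple

ROWS, COLS = 50, 69  # scan rows and spectra per row reported for each section


def register_section(section_index: int, z_mm: float,
                     step_x_mm: float, step_y_mm: float
                     ) -> Dict[str, Tuple[float, float, float]]:
    """Map a (hypothetical) per-spectrum file name to its (x, y, z) position in mm.

    The reference point (0, 0) is the first spectrum of the first row; x and y
    follow from the stage step lengths, and z is the section position.
    """
    index = {}
    for row in range(ROWS):
        for col in range(COLS):
            # Hypothetical naming scheme, used only for this sketch.
            name = f"sec{section_index:02d}_row{row:02d}_scan{col:02d}"
            index[name] = (col * step_x_mm, row * step_y_mm, z_mm)
    return index


# Assumed step sizes of 0.2 mm; z = 2.22 mm is one of the section positions
# mentioned later in the text.
index = register_section(section_index=1, z_mm=2.22, step_x_mm=0.2, step_y_mm=0.2)
print(len(index), "spectra registered for this section")
```

Keeping z in the same record as x and y means that later steps (alignment, interpolation, clustering) can treat the whole data set as one 3D volume rather than as a stack of unrelated images.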

3 Data Reduction

Data reduction has been shown to be necessary for 2D-MS imaging [47], and it is even more desirable for 3D imaging, especially when high resolution mass spectra are recorded using FTICR, Orbitrap, or TOF analysis, since a significantly larger amount of raw data is then collected. Use of the raw spectra causes problems in data storage as well as in subsequent data analysis, which is typically limited by the memory size and data transfer speed of the computer. The binning method is commonly used for reduction of raw data. For each spectrum, the bin width is first selected based on the mass resolution of the instrument. Each bin window is then represented by a single peak at the nominal m/z value at the center of the window, with the peak intensity defined as either the maximum or the sum of the signal intensity across the bin width.
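As an illustration of the binning method just described, the following minimal sketch reduces one spectrum to one value per bin, using either the maximum or the sum of the intensities in the bin. The function name `bin_spectrum`, the 1 Th bin width, and the example peak list are assumptions chosen for the illustration.

```python
import math
from typing import Dict, List, Tuple


def bin_spectrum(peaks: List[Tuple[float, float]], bin_width: float = 1.0,
                 mode: str = "max") -> Dict[float, float]:
    """Reduce (m/z, intensity) pairs to one intensity per bin.

    Each bin is centered on a multiple of bin_width; the bin intensity is the
    maximum or the sum of the intensities falling inside the bin window.
    """
    if mode not in ("max", "sum"):
        raise ValueError("mode must be 'max' or 'sum'")
    binned: Dict[float, float] = {}
    for mz, intensity in peaks:
        # Index of the bin whose center is closest to this m/z value.
        center = math.floor(mz / bin_width + 0.5) * bin_width
        if mode == "max":
            binned[center] = max(binned.get(center, 0.0), intensity)
        else:
            binned[center] = binned.get(center, 0.0) + intensity
    return binned


# Made-up example peaks, not measured data.
example = [(834.4, 10.0), (834.6, 30.0), (835.2, 5.0), (888.8, 20.0)]
print(bin_spectrum(example, mode="max"))
print(bin_spectrum(example, mode="sum"))
```

Note that the measured m/z value 834.6 is replaced by the bin center 835.0 here, which is the loss of accurate mass information that motivates the peak detection and alignment approach discussed next.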

Although the bin method is easy to implement and has been widely used, a more precise peak detection and alignment method can further decrease data redundancy without losing accurate mass information (see Supporting Information for a comparison between these two methods). This retains the advantage of using high MS resolution and high MS accuracy for the imaging [48–50]. Peak detection in this work is based on statistical analysis of the spectra acquired for the entire tissue section. A histogram can be generated as shown in Figure 1a, which is obtained with 3450 spectra (50 × 69 spots) for one of the 36 tissue sections of the mouse brain [41]. For the raw data without any baseline correction, an assumption can be made that a large number of peaks can be attributed to chemical or electronic noise. This assumption holds for the DESI imaging data used for demonstration. The noise level can be equated to the intensity at the local maximum of the histogram, and peaks with intensities of three times the noise level or higher can be identified as “real peaks.” With this peak detection performed on the raw spectra, the “real peaks” are picked and retained while a large number of background signals are dropped, which significantly decreases the size of the data set. In some cases, it might be preferable to correct the peak intensity for the noise level. The reduction rate varies as a function of the chosen threshold for peak identification; as shown in the inset of Figure 1a, a 95% reduction can be achieved with a threshold of S/N = 3. Further reduction can be achieved by using a higher S/N for peak detection, with a confidence based on prior knowledge of the samples to be imaged. Selecting a lower noise level together with an S/N of 3 would help to retain low-abundance but significant peaks; however, the total amount of data involved in the later stages of analysis would also increase significantly.
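The histogram-based peak detection can be sketched as follows. The sketch assumes that the noise level is taken as the center of the most populated intensity bin and that peaks at or above three times this level are kept; the histogram bin size, the function names, and the example data are illustrative assumptions, not values from the study.

```python
from collections import Counter
from typing import List, Tuple

Spectrum = List[Tuple[float, float]]  # (m/z, intensity) pairs


def estimate_noise_level(intensities: List[float], bin_size: float) -> float:
    """Return the center of the most populated intensity-histogram bin."""
    counts = Counter(int(i // bin_size) for i in intensities)
    # On ties, prefer the lower-intensity bin.
    modal_bin = max(counts, key=lambda b: (counts[b], -b))
    return (modal_bin + 0.5) * bin_size


def detect_peaks(spectra: List[Spectrum], bin_size: float = 10.0,
                 sn_threshold: float = 3.0) -> Tuple[float, List[Spectrum]]:
    """Keep only peaks with intensity >= sn_threshold times the noise level.

    The noise level is estimated from all spectra of a section together.
    """
    all_intensities = [i for spectrum in spectra for _, i in spectrum]
    noise = estimate_noise_level(all_intensities, bin_size)
    kept = [[(mz, i) for mz, i in spectrum if i >= sn_threshold * noise]
            for spectrum in spectra]
    return noise, kept


# Made-up example data: two spectra with one strong peak each plus background.
spectra = [
    [(834.6, 500.0), (700.1, 12.0), (701.3, 14.0)],
    [(888.8, 300.0), (650.2, 11.0), (651.0, 35.0)],
]
noise, kept = detect_peaks(spectra)
print(noise, kept)
```

Raising `sn_threshold` trades sensitivity to low-abundance species for a smaller data set, which mirrors the trade-off described in the paragraph above.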

Figure 1

(a) Peak intensity histogram based on statistical analysis of 3450 spectra recorded in 2D-MS imaging of a tissue section of mouse brain. The inset shows the percentage of the original data retained as a function of the threshold set for “real peaks.” (b) The peaks of PS 18:0/22:6 at m/z 834.6 from different spectra, showing mass shifts among different scans. (c) Distribution of m/z value for the peaks of PS 18:0/22:6 from all the spectra. (d) Positions of peaks detected within a mass range m/z 834 to 839 before (top) and after peak alignment (bottom)

Peak alignment can be performed after peak identification to further decrease the data size while retaining the accurate m/z values of the compounds detected in the tissue. Mass shifts exist for some compounds in spectra acquired from different spots on a tissue section, and they can be caused by the conditions used for mass analysis and the composition of the sample matrix [51–55]. Peak alignment allows the assignment of the correct m/z value to a compound uniformly for all the pixels on a tissue section or in a tissue volume, based on the statistical analysis of the spectra and the mass accuracy and resolution of the mass spectrometer. This process plays an important role in subsequent data analysis. As an example, peak alignment for phosphatidylserine (PS) 18:0/22:6 was performed using the method shown in Figure 1c. The distribution of the peak positions obtained from all 3450 spectra acquired for a tissue section covers a narrow mass range around m/z 834.6 and could be fitted to a Gaussian distribution. The m/z value at the maximum of the distribution was then assigned to all the peaks belonging to this distribution in all the spectra, while retaining the original measured peak intensities. If internal references can be used for mass calibration of each spectrum, the statistical analysis shown in Figure 1c is not necessary, but the peak positions still need to be identified as shown in Figure 1b. In practice, identifying multiple endogenous calibrators for in situ calibration can be difficult for MS imaging, while adding external calibrators can also be cumbersome.
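A simplified version of this alignment step is sketched below. Instead of fitting a Gaussian, the sketch takes the most frequently observed m/z value as the center of each distribution and assigns it to all peaks within a mass window around it; this is a stand-in for the fitting procedure, not the method used in the study. The function name, the 0.1 Th default window, the two-decimal m/z resolution, and the example data are assumptions for the illustration.

```python
from collections import Counter
from typing import Dict, List, Tuple

Spectrum = List[Tuple[float, float]]  # (m/z, intensity) pairs


def align_peaks(spectra: List[Spectrum], window: float = 0.1,
                decimals: int = 2) -> List[Spectrum]:
    """Assign one common m/z value to peaks that fall in the same mass window.

    m/z values are counted on a grid of 10**-decimals Th. The most populated
    grid value becomes the center of a distribution, and every peak within
    +/- window/2 of it takes that m/z value. Intensities are left unchanged.
    """
    scale = 10 ** decimals
    half = round(window * scale / 2)  # half window, in grid units
    counts = Counter(round(mz * scale) for s in spectra for mz, _ in s)
    center_of: Dict[int, int] = {}
    # Visit grid values from most to least populated (lower m/z first on ties).
    for key, _ in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0])):
        if key in center_of:
            continue  # already absorbed by a more populated neighbor
        for k in range(key - half, key + half + 1):
            if k in counts and k not in center_of:
                center_of[k] = key
    return [[(center_of[round(mz * scale)] / scale, i) for mz, i in s]
            for s in spectra]


# Made-up example: the same compound observed with small mass shifts.
shifted = [[(834.58, 10.0)], [(834.60, 20.0)], [(834.60, 15.0)],
           [(834.63, 5.0)], [(888.80, 7.0)]]
aligned = align_peaks(shifted)
print(aligned)
```

After alignment every pixel reports the compound at the same m/z value, so its intensities can be collected into a single image without losing the measured mass.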

It is possible to observe two or more local maxima, and the mass accuracy and resolution of the mass spectrometer then need to be considered to determine whether multiple peaks should be assigned. In this study, mass windows of 0.1 and 0.01 Th are used for data acquired using the LTQ and Orbitrap, respectively. Peaks within these m/z windows are treated as representing a single compound. The effect of the peak alignment can be seen from the comparison for the peaks in the m/z range 834.0–839.0 shown in Figure 1d. A comparison of the data reduction between the bin method and the peak detection and peak alignment method is shown in Table 1. The peak detection and peak alignment method offers better data reduction with retention of accurate mass information. Even with a 10× better resolution than that used in the bin method, the peak detection and peak alignment method can reduce the data size by a factor of 12 relative to the bin method on the LTQ data set, and by more than 17-fold on the Orbitrap data set.

Table 1 Data Reduction for Bin and Peak Detection and Peak Alignment (PD&PA) Methods, for LTQ and Orbitrap

4 Section Alignment

Since the individual sections were imaged separately, correlation of the x-y coordinates between different sections is necessary before the 3D visualization or data analysis can be properly performed. As shown in Figure 2a, in the raw 3D data set, the relative z positions for each section are correct since they were assigned according to the position of the tissue section in the original tissue volume; however, the origins of the (x, y) positions are misaligned between the tissue sections since they were arbitrarily assigned when the individual sections were imaged. Rotation of some of the images is typically also required to get all the tissue sections perfectly aligned (Figure 2b). To solve this problem, statistical methods can be used to extract the sample region and recognize some major morphologic features from the image data. These can then be used by a computer program to align the x-y coordinates of tissue sections for 3D image data construction.

Figure 2

Image alignments of the tissue sections (a) before and (b) after the correction of the (x, y) coordinates. Image of a tissue section with SOFM classification into (c) 2, (d) 3, (e) 4, and (f) 5 categories (features). (g, h) Images of two adjacent tissue sections (I and II), with SOFM classification into 3 features, with x and y shifts between them. Overlapping of images for sections I and II (i) without alignment and (j) after moving section II 7 pixels up and 3 pixels to the left. For (a) to (f), each color represents a category; for (g) to (j), gray and white each represent a category and red represents the pixels whose categories are mismatched between sections I and II

In our study, an unsupervised, self-organizing feature map (SOFM) artificial neural network method was applied to classify the imaged area into the sample and substrate regions, so that the shape and location of the sample region can be used for the inter-section alignment. SOFM differs from other artificial neural networks in that it uses a neighborhood function to preserve the topologic or morphologic properties of a data space, which is useful for producing low-dimensional views through classification of high-dimensional data [56, 57]. A significant advantage of using SOFM is that no separate training process is required to generate the low-dimensional views. This makes SOFM very suitable for identifying the main morphologic features universal in all tissue sections that can be used to differentiate sample regions from non-sample regions. The MATLAB Neural Network Training Tool was used to implement the SOFM. Identification of the tissue sample area is done with the SOFM using the spectra with their original intensities, with the network instructed to classify the data into two categories (neuronal structure 1 × 2). As shown in Figure 2c, the sample region is clearly separated from the substrate background. More detailed morphologic features can be extracted using SOFM with the spectral intensity first normalized for the tissue region (Figure 2d, e, f). A potential limitation of using SOFM routinely in 3D tissue imaging is that it could be time consuming, depending on the number of categories that need to be identified.

To align the 36 tissue sections for the 3D data construction, SOFM is applied twice to provide images of the regions of white and gray matter. A program written in MATLAB allows for the overlay of two images from adjacent tissue sections (Figure 2g, h) and their relative movement and rotation. The program also calculates the number of pixels with color mismatch between these two images (Figure 2i), which is minimized when the best alignment is achieved (Figure 2j). The x-y coordinates are then corrected and saved for the 3D data reconstruction.
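The mismatch-minimization idea can be sketched with a brute-force search over x-y shifts. The sketch is restricted to translation (no rotation), treats pixels shifted out of the frame as substrate (category 0), and uses small made-up category images; the function names and the search range are assumptions for the illustration.

```python
from typing import List, Tuple

Image = List[List[int]]  # category label per pixel, e.g. 0 = substrate


def shift_image(img: Image, dx: int, dy: int, fill: int = 0) -> Image:
    """Shift an image by dx columns and dy rows (positive dy = higher row index)."""
    rows, cols = len(img), len(img[0])
    out = [[fill] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            rr, cc = r + dy, c + dx
            if 0 <= rr < rows and 0 <= cc < cols:
                out[rr][cc] = img[r][c]
    return out


def mismatch(a: Image, b: Image) -> int:
    """Count the pixels whose categories differ between two images."""
    return sum(pa != pb for ra, rb in zip(a, b) for pa, pb in zip(ra, rb))


def best_shift(ref: Image, moving: Image, max_shift: int = 3) -> Tuple[int, int]:
    """Return the (dx, dy) shift of `moving` that minimizes the mismatch count."""
    candidates = [(dx, dy)
                  for dy in range(-max_shift, max_shift + 1)
                  for dx in range(-max_shift, max_shift + 1)]
    # On ties, prefer the smallest displacement.
    return min(candidates,
               key=lambda s: (mismatch(ref, shift_image(moving, s[0], s[1])),
                              abs(s[0]) + abs(s[1])))


# Made-up 6 x 6 reference section with a 2 x 2 tissue region (category 1).
ref = [[1 if 2 <= r <= 3 and 2 <= c <= 3 else 0 for c in range(6)] for r in range(6)]
moving = shift_image(ref, dx=1, dy=-1)  # simulate a misregistered adjacent section
shift = best_shift(ref, moving)
print(shift, mismatch(ref, shift_image(moving, shift[0], shift[1])))
```

An exhaustive search is adequate for images of a few thousand pixels; adding rotation would extend the candidate list with a set of angles while the mismatch count stays the same objective.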

This alignment method provides a process with a quantitative measure, the number of misaligned pixels, which can be used to achieve automated alignment. It has been applied for aligning two tissue sections with different sample areas, such as those at z = 2.22 mm and z = 3.04 mm shown in Figure 2a. The symmetry in the distribution of the mismatched pixels can be used to assist the alignment process. When images with three or more identified categories are used in alignment, it is found empirically that the number of mismatched pixels is also minimized when the sections of different sample areas are best aligned. In some cases, the observed sample area is enlarged because of stretching of the tissue section during sectioning, rather than an actual change in the original shape of the tissue volume. Additional instructions need to be included to reshape the sample area; however, the rule of achieving the minimum number of misaligned pixels can still be used as a measure during that process.

5 Data Interpolation

The data stored in the reconstructed 3D data set can be used to generate images for discrete surfaces corresponding to the actual tissue sections used for data acquisition, such as those shown in Figure 2b. In order to generate 3D images with continuous chemical distributions along the z axis, data interpolation can first be executed to insert data for the appropriate image component between the real layers of data. Based on the assumption that the distributions of biological molecules are continuous, the inserted data can be generated using a variety of interpolation methods, such as nearest-neighbor, linear, and cubic-spline interpolation. The data insertion for PS 18:0/22:6 (m/z 834.6) and sulfatide (ST) 24:1 (m/z 888.8) is shown in Figure 3 as an example.

Figure 3

The intensities of (a) PS 18:0/22:6 and (b) ST 24:1 in spectra acquired for a series of tissue sections and the trend lines for data insertion. The 2D images for distributions of (c) PS 18:0/22:6 and (d) ST 24:1 on two actual tissue sections and two inserted layers between them

For a pixel with x = x0, y = y0 on the data layer z to be inserted between layer z1 and z2, the peak intensities P can be calculated using the linear interpolation method, equation 1:

$$ P = {P_1} + \left( {{P_2} - {P_1}} \right)\frac{{z - {z_1}}}{{{z_2} - {z_1}}} $$
(1)

where P1 and P2 are the MS peak intensities at (x0, y0) on layer z1 and z2, respectively. In the case of the mouse brain sample [41], there are 3450 pixels in every data layer and 19 lipid peaks with distinctive m/z values were identified as being informative, so 65,550 interpolations were performed to generate one additional data layer for insertion. Images of two inserted layers are shown in Figure 3c and d for the distributions of PS 18:0/22:6 (m/z 834.6) and ST 24:1 (m/z 888.8) on the inserted layers. To perform 3D imaging and 3D data analysis for the mouse brain experiment in which 36 sections were actually imaged with DESI, 364 additional layers were interpolated that result in a total of 400 data layers (each 26.6 μm apart) in the 3D data set reconstructed.
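Equation 1 and the interpolation count given above can be checked with a short sketch. The function name and the small example layers are assumptions for the illustration; each layer is represented as a 2D list of peak intensities for one analyte.

```python
from typing import List

Layer = List[List[float]]  # peak intensity of one analyte at each (x, y) pixel


def interpolate_layer(layer1: Layer, layer2: Layer,
                      z1: float, z2: float, z: float) -> Layer:
    """Linear interpolation (equation 1) of a virtual layer at z, z1 <= z <= z2."""
    w = (z - z1) / (z2 - z1)
    return [[p1 + (p2 - p1) * w for p1, p2 in zip(row1, row2)]
            for row1, row2 in zip(layer1, layer2)]


# Made-up 1 x 2 pixel layers at z1 = 0 and z2 = 4 (arbitrary units).
layer1 = [[0.0, 10.0]]
layer2 = [[40.0, 50.0]]
virtual = interpolate_layer(layer1, layer2, z1=0.0, z2=4.0, z=1.0)
print(virtual)

# Number of interpolations per inserted layer for the mouse brain data:
# 50 x 69 pixels per layer and 19 lipid peaks.
pixels_per_layer = 50 * 69
n_peaks = 19
interpolations_per_layer = pixels_per_layer * n_peaks
print(interpolations_per_layer)

# 36 measured sections plus 364 inserted layers.
total_layers = 36 + 364
print(total_layers)
```

Because each pixel and each analyte is interpolated independently, switching to another scheme (nearest-neighbor or cubic spline, for example) only requires replacing the expression inside `interpolate_layer`.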

The interpolation of the virtual layers helps to generate smoother images. Whether an interpolation is biologically meaningful has to be validated by comparing images built from real layers only with images built from a mixture of real and virtual layers. The outcome could also be sample-specific and analyte-specific. In this study, we demonstrate how to enable the interpolation capability and use three classic methods as examples. No significant difference was observed among them for the imaging using the 19 lipid peaks. For various biological studies, the interpolation method can be easily switched based on the user’s knowledge about the sample and the distribution of the biomarkers.

6 3D Visualization and Data Analysis

With a complete 3D data set properly constructed, 3D visualization can be implemented easily using the 3-D Visualization Module in MATLAB or another open source package such as the Visualization ToolKit [58]. Both 2D (Figure 4a) and 3D images can be generated, the latter in the form of an iso-surface (Figure 4b), slice surface (Figure 4c), or sub-volume (Figure 4d). The distribution of a compound is presented using the variation of color intensity, and the distributions of different compounds can be overlaid.

Figure 4

Visualization using 3D-MS data. (a) 2D images of selected compounds present in multiple layers, (b) Iso-surface views, (c) center slice views, and (d) subvolume views of PS 18:0/22:6 (top), ST24:1(middle), and both (bottom)

With the original MS spectral information retained in the 3D data set, statistical analysis can be readily applied to compounds distributed throughout the 3D volume, and the results can also be presented visually. As a demonstration, the k-means clustering method is applied to 19 lipids in the mouse brain. k-means clustering is a partition method that classifies n observations (x1, x2, …, xn) into k sets (k ≤ n), S = {S1, S2, …, Sk}, where xn is the mass spectrum from the nth point sampled in the 3D imaging and Sk is the kth morphologic feature or region. The basic principle of k-means clustering is to minimize the within-cluster sum of squares, equation 2:

$$ \underset{S}{\arg\min} \sum\limits_{i = 1}^{k} \sum\limits_{x_j \in S_i} \left\| x_j - \mu_i \right\|^2 $$
(2)

where μi is the mean of the points in Si. The k-means clustering was applied using a MATLAB program to partition the 3D data constructed from the mouse brain experiment into two main regions (Figure 5a, b, c), which correspond to the gray and white matter. The averaged spectra within these two regions (Figure 5d, e) show a dominant peak at m/z 834.6 (assigned to PS 18:0/22:6) in region 1 and an ion at m/z 888.8 (assigned to ST 24:1) in region 2. The visualization tool in MATLAB can also be used to generate an overlapping view of the two compounds, as shown in Figure 5c. With the 3D data space appropriately constructed, other statistical analysis methods can be applied to tissue samples for finding biomarkers through the correlation between the chemical distributions and the morphologic features.
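The two-region classification can be illustrated with a minimal implementation of the standard Lloyd iteration for the k-means objective. This is a sketch rather than the MATLAB program used in the study: the initialization from the first k points, the function name, and the two-feature example "spectra" (intensities at m/z 834.6 and m/z 888.8) are assumptions made for the illustration.

```python
from typing import List, Tuple

Point = List[float]  # one reduced spectrum: intensities of the selected peaks


def kmeans(points: List[Point], k: int,
           max_iter: int = 100) -> Tuple[List[int], List[Point]]:
    """Lloyd's algorithm: alternate assignment and centroid update.

    Centroids are initialized from the first k points (an assumption made to
    keep this sketch deterministic).
    """
    centroids = [list(p) for p in points[:k]]
    labels: List[int] = []
    for _ in range(max_iter):
        # Assign each point to the nearest centroid (squared Euclidean distance).
        new_labels = [
            min(range(k),
                key=lambda i: sum((a - b) ** 2 for a, b in zip(p, centroids[i])))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments are stable, so the centroids are too
        labels = new_labels
        # Move each centroid to the mean of its members.
        for i in range(k):
            members = [p for p, lab in zip(points, labels) if lab == i]
            if members:
                centroids[i] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# Made-up pixels: [intensity at m/z 834.6, intensity at m/z 888.8].
pixels = [[100.0, 5.0], [5.0, 90.0], [95.0, 10.0], [10.0, 100.0], [105.0, 0.0]]
labels, centroids = kmeans(pixels, k=2)
print(labels)
print(centroids)
```

Each centroid plays the role of the averaged spectrum of a region, so plotting the centroids is one way to see which peaks characterize each cluster.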

Figure 5

Two-region classification of mouse brain tissue using the k-means clustering method with 3D data constructed from the DESI imaging data of 36 sections. Side view of (a) region 1, (b) region 2, and (c) the overlap of the two, with averaged mass spectra for region (d) 1 and (e) 2

7 Conclusion

In this work, we explored a procedure and developed tools for data processing in 3D mass spectrometry imaging. The reconstruction of the 3D data set containing the mass spectral information, viz. the accurate masses and abundances, for all the compounds of interest is the critical step. The identification of the peaks and the alignment of the masses are performed based on the statistical analysis of the 2D imaging data acquired over an entire tissue section, which is important for reducing the data size while retaining the accurate mass information. Appropriate solutions were also identified for other technical challenges, including aligning the section data, producing continuous images, and generating arbitrary 3D views. These capabilities and the results of utilizing the various procedures and software tools were demonstrated with the 3D-MS imaging data acquired for a mouse brain using DESI-MS imaging. Though only data by DESI imaging are used in the demonstrations, the capabilities of the software and methods are not limited by mass range or resolution. They can be applied to data acquired by MALDI and other imaging methods, with proper m/z windows selected for the peak alignment based on the specified resolution and mass accuracy of the mass spectrometer used to record the data. In future development, the strategies for the proper interpolation of data and insertion of the virtual layers need to be explored and validated. The capability allowing direct comparison of 3D images acquired by a variety of technologies, such as mass spectrometry, MRI (magnetic resonance imaging), and spectroscopic imaging methods, would provide comprehensive morphologic and molecular information for the biological study.