Introduction

Non-invasive in vivo small animal imaging has evolved from a niche research application into a powerful and scientifically significant tool for basic research [1]. Small animal imaging enables faster translation of preclinical insights into clinical routine and thus plays a pivotal role in biomedical and pharmaceutical research. A multitude of imaging modalities is available, each with its own strengths and weaknesses, and the modality should be chosen according to the study design and scientific goal.

Positron emission tomography (PET) and single photon emission computed tomography (SPECT) provide a variety of biological targets for investigation of functional and metabolic pathways [1,2,3,4,5]. Magnetic resonance imaging (MRI) offers clear delineation of organs due to its high soft tissue contrast, as well as functional parameters, such as apparent diffusion coefficients (ADCs), for investigating diffusion in a specific tissue [6,7,8]. Optical imaging (OI), based on the detection of fluorescence, chemiluminescence, or bioluminescence, can be applied to image fluorescent probes or bioluminescent/fluorescent proteins, for instance by means of genetically encoded reporters expressed in cells [9,10,11]. However, planar imaging approaches are hampered by the lack of quantification and the low tissue penetration depth, which may be overcome by 3D methods such as fluorescence-mediated tomography (FMT) [12]. By contrast, computed tomography (CT) provides high-resolution bone and lung imaging but lacks soft-tissue contrast and exposes the patient/animal to a certain radiation dose [13,14,15,16].

The overall benefit of small animal imaging and the insights obtained, for example in specific diseases, are directly linked to the validity and reliability of the collected data. If the data (regardless of the modality used) are not reproducible and/or reliable, then the conclusions drawn from them are questionable. This lack of reproducibility in basic and preclinical research has been intensively discussed over the last decade, and it has been reported that 75–90 % of the empirical observations cannot be reproduced [17,18,19]. Initiatives for the evaluation of biomarkers for oncology-related studies have been established recently and underscore the increasing demand for standardization and validation [20].

Many of the obtained preclinical results cannot be compared, occasionally even within one institution, because they depend on various complex factors such as anesthesia, animal handling, physiological parameters, data acquisition, and analysis. These factors can greatly influence the outcome of experiments, but most are avoidable to a certain degree. To overcome these differences, two steps can be taken and will be outlined in this paper: (1) image quality parameters can be harmonized (based on phantom experiments) and (2) protocols and procedures for animal handling, image acquisition, and analysis can be standardized.

Individual sections for PET/SPECT, CT, MRI, and OI will identify phantom-based image quality parameters that can be harmonized. In addition, potential areas for standardization will be listed and the current status as well as future prospects will be discussed. Separate chapters focus on animal handling and image analysis in general.

Computed Tomography

Computed tomography is based on the attenuation of photons traversing matter. By measuring photon attenuation, an image with an intensity proportional to the local attenuation coefficient can be reconstructed. Excellent reviews regarding the CT technique exist already and the readers are referred to them [15, 16, 21]. In addition, a standardized nomenclature for bone histomorphometry has been established and is updated on a regular basis [22, 23], and guidelines to assess bone microstructure have been established, as well [24].

Parameters Obtained from Phantom Scans

The reproducibility of acquired CT data can be greatly affected by a variety of technical parameters, such as the reconstruction algorithm, data analysis, software version, or applied corrections, but also by the hardware configuration (e.g., detector choice). To ensure reproducibility and reliability of the acquired data on a permanent basis, an efficient quality control of the employed CT scanner is absolutely necessary [25]. From this work, the following parameters can be used to harmonize image quality over various devices: the accuracy of the CT water number and uniformity [25]. A 50 ml centrifuge tube filled with water can be used to derive these numbers. In addition, noise level and sharpness can be assessed using the same phantom setup: the noise can be determined in the water compartment of the phantom, and the sharpness can be determined at an edge of the phantom, e.g., at the transition from the tube to the surrounding air. Furthermore, the exposure dose can be measured using phantoms and dosimeters [26,27,28,29,30]. The spatial resolution is a crucial factor in CT and mainly depends on the focal spot size of the X-ray source, the pixel size of the detector, and the chosen magnification (set by the source-to-object distance) [31]. Clinical guidelines for quality control in CT imaging are also available through the American College of Radiology (ACR) (available via the ACR website, www.acr.org).

Future work should include the definition of a reference range of values for these parameters. For each scanner, imaging protocols producing phantom scans that yield values for these parameters (Table 1) within the reference range would allow for standardized imaging with respect to the scanner’s performance.
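As an illustration of how the water phantom parameters named above could be derived from voxel values, the following minimal Python sketch may be helpful. It is not taken from the cited work: the function names, the definition of uniformity as the largest difference between a peripheral and the central ROI mean, and the idea of passing a reference range as two numbers are assumptions made for this example. Only the CT number definition (scaled so that water is 0 and air is -1000) is standard.

```python
from statistics import mean, pstdev


def hounsfield(mu, mu_water, mu_air=0.0):
    """Convert a linear attenuation coefficient to a CT number (HU)."""
    return 1000.0 * (mu - mu_water) / (mu_water - mu_air)


def water_phantom_qc(center_roi, edge_rois):
    """Summarize a water phantom scan.

    center_roi: CT numbers (HU) of voxels in a central water ROI.
    edge_rois:  list of ROIs (each a list of HU values) near the tube wall.
    The uniformity definition used here is an assumption for illustration.
    """
    center_mean = mean(center_roi)
    edge_means = [mean(roi) for roi in edge_rois]
    return {
        "water_number_hu": center_mean,       # should be close to 0 HU
        "noise_hu": pstdev(center_roi),       # SD inside the water ROI
        "uniformity_hu": max(abs(m - center_mean) for m in edge_means),
    }


def within_range(value, low, high):
    """Check a QC value against a (hypothetical) reference range."""
    return low <= value <= high


if __name__ == "__main__":
    # Hypothetical voxel values, for demonstration only.
    qc = water_phantom_qc([1.0, -2.0, 0.5, 0.5], [[2.0, 3.0], [-1.0, 0.0]])
    print(qc)
    print(within_range(qc["water_number_hu"], -5.0, 5.0))
```

Once reference ranges have been agreed upon, a check such as `within_range` could be applied to each parameter after every phantom scan.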

Table 1 Summary of image quality parameters amenable for harmonization

Emission Tomography: PET and SPECT

Emission tomography enables the investigation of molecular, metabolic, and functional parameters due to the variety of available specific radiopharmaceuticals. PET is based on the coincident detection of two annihilation photons, originating from positron emitting radioisotopes. SPECT is based on the detection of a single photon, with a direction defined by collimators. Excellent reviews on these techniques exist [32,33,34,35,36].

Both techniques are powerful tools for basic research that facilitate the assessment of molecular and functional processes of diseases as well as potential therapies due to the multitude of available animal disease models [4, 35, 37,38,39]. A variety of small animal scanners have been developed by either university institutes or industry [35, 40,41,42,43,44,45]. This development has led to an increase in the number of preclinical emission tomography research studies. Many of these studies have been based on qualitative/visual interpretation of images. However, the demand for quantitatively accurate data is increasing due to the increasing role of preclinical data in obtaining approval for new drugs and the progress in clinical standardization of PET imaging.

In the clinical setup, harmonization of image quality, e.g., using 2-deoxy-2-[18F]fluoro-d-glucose ([18F]FDG) PET, has already been established by the European Association of Nuclear Medicine (EANM) guidelines for PET/CT tumor imaging and by others [46,47,48]. Based on this work, response monitoring criteria were extended and complemented (as envisioned by the progression from “Response Evaluation Criteria in Solid Tumors” (RECIST) to “PET Response Criteria in Solid Tumors” (PERCIST)) [49].

Comparability of results, a field standard of working with standard operating procedures (SOPs), and complete and transparent reporting of the results obtained would prevent the need for duplicate studies and consequently contribute to refinement and reduction and, finally, to a greater return on investment in the preclinical environment [17, 50, 51].

Potential areas of standardization include the following:

  1. Parameters obtained from phantom scans

  2. Quality assurance in tracer production

Parameters Obtained from Phantom Scans

Similar to CT, the reproducibility of acquired PET or SPECT data can be greatly affected by a variety of technical parameters. However, standardization of technical aspects across different sites is even harder to achieve and ultimately might not be fully possible. When PET or SPECT systems from different vendors are employed for individual studies, comparisons are hampered by differences in, for example, acquisition parameters (such as the energy or timing window) that vary from vendor to vendor [42]. To ensure reproducibility and reliability of the acquired data on a permanent basis, an efficient quality control of the employed PET or SPECT scanner is absolutely necessary [25]. From this work, the following parameters can be used to harmonize image quality over various PET scanners: image uniformity, cross-calibration accuracy, and recovery coefficients. All these parameters can be derived from the image quality phantom according to the National Electrical Manufacturers Association (NEMA) NU4-2008 recommendations. Furthermore, evaluations to assess the quantification accuracy and the partial volume effect have been performed for some preclinical PET scanners and can be transferred to other preclinical scanners [52, 53].

For SPECT, the determination of image uniformity and cross-calibration as parameters to be harmonized is suggested [25]. For the latter, a cylindrical uniform phantom with an activity of around 10 MBq is recommended.

Future work should include the definition of reference ranges of values for these parameters for both PET and SPECT. For each scanner, imaging protocols producing phantom scans that yield values for these parameters (Table 1) within the reference range would allow for standardized imaging with respect to the scanner’s performance.
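The three PET parameters named above reduce to simple ratios once ROI values have been extracted from the image quality phantom. The sketch below is an assumption-laden illustration, not the NEMA NU4-2008 analysis itself: ROI placement, the line-profile step used for the rods, and all function names are simplified or hypothetical. It uses the physical half-life of fluorine-18 (109.77 min) to decay-correct the reference activity concentration to the scan time.

```python
from statistics import mean, stdev

F18_HALF_LIFE_MIN = 109.77  # physical half-life of fluorine-18 in minutes


def decay_correct(activity, elapsed_min, half_life_min=F18_HALF_LIFE_MIN):
    """Activity remaining after elapsed_min minutes of physical decay."""
    return activity * 2.0 ** (-elapsed_min / half_life_min)


def percent_std(voxels):
    """Image uniformity expressed as percentage standard deviation."""
    return 100.0 * stdev(voxels) / mean(voxels)


def recovery_coefficient(rod_value, uniform_mean):
    """Ratio of the value measured in a small rod to the uniform-region mean."""
    return rod_value / uniform_mean


def cross_calibration_ratio(image_conc, calibrator_conc, elapsed_min,
                            half_life_min=F18_HALF_LIFE_MIN):
    """Image-derived concentration divided by the decay-corrected reference.

    calibrator_conc is the concentration measured in the dose calibrator
    elapsed_min minutes before the scan; a ratio of 1.0 means perfect agreement.
    """
    reference = decay_correct(calibrator_conc, elapsed_min, half_life_min)
    return image_conc / reference


if __name__ == "__main__":
    # Hypothetical numbers, for demonstration only.
    print(percent_std([98.0, 101.0, 100.0, 103.0]))
    print(recovery_coefficient(0.45, 1.0))
    print(cross_calibration_ratio(0.52, 1.0, 100.0))
```

The same uniformity and cross-calibration functions could be reused for SPECT by passing the half-life of the isotope in use.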

Quality Assurance in Tracer Production

Quality control (QC) of tracers for preclinical imaging is currently less formalized and in general reduced in comparison to clinical good manufacturing practice (GMP) procedures. Nevertheless, to assure the validity of data generated in preclinical imaging, comprehensive quality control (using valid analytical methods) should be performed to ensure a sufficient, reproducible quality of the tracers to be applied. This should include testing for identity, chemical and radiochemical purity by chromatographic methods (high performance liquid and/or thin layer chromatography (HPLC, TLC)), residual solvents and ethanol content (gas chromatography, GC), as well as pH value. Testing for endotoxins and sterility may not be performed for every batch, but it should be validated during synthesis process development. Keeping retention samples of each batch is advantageous because it allows for retrospective tests in case of unexpected imaging results. Some aspects need to be highlighted here with special regard to the different boundary conditions in small animal imaging compared to human imaging (for instance, small body weight and blood volume or a smaller number of receptors). Specifications need to be defined, depending on the kind of preclinical study. Activity concentration is critical, as the volume of a tracer solution to be injected, especially in small animals, is very limited (for a 25 g mouse the maximum injection volume is 125 μl (5 μl/g body weight) according to German regulations [54]). In the case of brain receptor studies, molar (or specific) activity is of fundamental importance to avoid saturation of the receptor system under investigation [55]. Molar activities as applied in human imaging will not be sufficient. Furthermore, molar activities should be comparable in individual tracer batches for a set of experiments within a study to gain valid, reproducible results in imaging.
Finally, formulation of the tracer solution needs to be verified to be suitable for animal imaging but also identical regarding excipients in individual tracer batches within the experiments of a preclinical trial, as varying compositions may influence imaging results (buffer, pH, stabilizers, etc.). Ethanol content should never exceed 10 %. Overall, defined specifications and sufficient QC of radiotracers are mandatory to secure valid imaging results and should therefore be included in the materials and methods section of a manuscript [56].
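The volume and molar activity constraints described above can be made concrete with a short calculation. The following Python sketch reproduces the 125 μl limit for a 25 g mouse from the 5 μl/g rule quoted in the text; the target activity of 10 MBq and the molar activity of 50 GBq/μmol in the example are hypothetical values chosen only to show the arithmetic, and the function names are not from any existing library.

```python
def max_injection_volume_ul(body_weight_g, limit_ul_per_g=5.0):
    """Maximum injection volume in microliters for a given body weight."""
    return body_weight_g * limit_ul_per_g


def required_activity_concentration(target_activity_mbq, body_weight_g,
                                    limit_ul_per_g=5.0):
    """Minimum activity concentration (MBq/ul) needed to stay within the limit."""
    return target_activity_mbq / max_injection_volume_ul(body_weight_g,
                                                         limit_ul_per_g)


def injected_amount_nmol(activity_mbq, molar_activity_gbq_per_umol):
    """Amount of substance injected, in nmol.

    MBq divided by GBq/umol gives 1e-3 umol, i.e. nmol, so the numeric
    ratio can be read directly in nmol.
    """
    return activity_mbq / molar_activity_gbq_per_umol


if __name__ == "__main__":
    weight_g = 25.0          # mouse body weight from the text
    target_mbq = 10.0        # hypothetical target activity
    molar_activity = 50.0    # hypothetical molar activity in GBq/umol
    print(max_injection_volume_ul(weight_g))
    print(required_activity_concentration(target_mbq, weight_g))
    print(injected_amount_nmol(target_mbq, molar_activity))
```

Running such a calculation for every batch before injection makes it easy to see whether a tracer solution has decayed too far to deliver the planned activity within the permitted volume, and whether the injected mass stays comparable across batches.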

Magnetic Resonance Imaging

Magnetic resonance techniques (i.e., magnetic resonance imaging and spectroscopy) are excellent non-invasive imaging tools that can map a wide range of tissue parameters [7, 57] or, for example, provide information about the concentration of brain metabolites that are less abundant than water in the brain [58]. The physical principles of MR techniques are based on nuclear magnetic resonance, wherein nuclei in a magnetic field absorb and re-emit electromagnetic radiation [59,60,61,62,63,64]. Some important features of MRI are that it can acquire images in any plane with a very high spatial resolution (up to 15 μm), and it provides an excellent soft tissue contrast without using ionizing radiation [65, 66]. A variety of nuclei less abundant than protons, such as P-31, C-13, Na-23, and F-19, can also be used for MRI [60]. Currently, both vertical and horizontal MRI systems as well as hybrid PET/MRI preclinical systems are available from different vendors [67,68,69].

Preclinical MRI and MR spectroscopy (MRS) techniques have been increasingly used to perform longitudinal studies to obtain neuroimaging fingerprints of subtle changes in animal models for neurodegenerative, psychiatric, and other central nervous system-related disorders, such as stroke and cancer [70]. One of the limitations of preclinical and clinical MR techniques is the lack of consensus on standardized and optimized MRI and MRS methods. Studies have increasingly focused on improving clinical system standardization for multiple MR techniques and establishing multicenter platforms for central nervous system disorders [71,72,73,74,75,76,77,78,79,80]. Furthermore, clinical MRI accreditation programs and manuals for evaluating MRI performance are available (see www.acr.org/accreditation). We are not aware of such accreditation programs for standardization of preclinical MR techniques in single-center and/or multicenter research platforms.

Potential areas of standardization include the following:

  1. Parameters obtained from phantom scans

  2. Introduction of field standard protocols for common imaging tasks

Parameters Obtained from Phantom Scans

The accuracy and reproducibility of MRI experiments are limited by instrument-related variations as well as secondary sources such as the software versions used to analyze the data. Scanner quality control steps provide more insight into instrument-related sources of variation in MRI experiments that may hamper experimental reproducibility.

Two main instrument-related sources of variation in MRI experiments are hardware-related differences and hardware imperfections [81, 82]. Preclinical studies performed at ultra-high magnetic field strengths may be particularly influenced by different hardware imperfections compared with studies performed at lower fields. All types of MR techniques (Table 2) are sensitive to differences in image acquisition via sequence parameter settings such as echo time, repetition time, flip angle, number of slices, slice orientation, phase-encoding direction, acquisition volume, number of averages, and microenvironment (i.e., scanning environment and temperature) [71, 130, 131]. These different influences can be identified and controlled by implementing scanner quality assurance programs based on dedicated phantoms [131,132,133]. In preparing this review, we contacted preclinical research centers and found that quality control protocols using phantoms were generally not applied. The Osborne review recognizes the same issue [25].

Table 2 Summary of preclinical MRI and MRS techniques and references for pitfalls, artifacts, and technical considerations

The Function Biomedical Informatics Research Network (FBIRN) project and others have suggested the development of methods to measure and/or decrease scanner-associated variations using dedicated phantoms [71, 131, 132, 134,135,136]. The protocol for the FBIRN phantom can be accessed via the website: “https://stage.nitrc.org/frs/download.php/275/fBIRN_phantom_qaProcedures.pdf” [137]. For FBIRN and ACR, automatic analysis programs for MATLAB have been developed, which reduce the time needed to analyze the images [131, 138]. Phantom measurements as described by the FBIRN and ACR MRI accreditation program can assess many important quality control characteristics, such as the signal-to-noise ratio, signal-to-fluctuation-noise ratio (a measure of temporal stability), signal drift, image uniformity, ghosting artifacts, chemical shift and spatial resolution, slice thickness accuracy, slice position accuracy, low-contrast object detectability, etc. [133, 136, 139, 140]. A common consensus on a phantom recipe for preclinical MR systems and on the types of parameters that should be used for quality control is missing. Nevertheless, a recent paper has described a step-by-step preclinical MRI quality assurance protocol to identify gradient calibration-related errors using a 3D-printed structural phantom [141]. Furthermore, another paper has described how to design and use a quality assurance phantom for monitoring tumor size using preclinical scanners (ultrasound, CT, and MRI) [142]. Lastly, Osborne and colleagues have provided a modified summary of quality control tests for preclinical MRI systems, which was adapted from the ACR MRI accreditation program (available via the ACR website, www.acr.org) [25]. In our institute (Bio-Imaging lab, Antwerp University), we conduct routine stability tests using a 15 ml phantom consisting of agarose, nickel chloride, and salt (the phantom has T1 and T2 values similar to the rodent brain) to check instrument-related variations.
Our routine stability tests using these phantoms help us detect subtle fluctuations related to the coil and/or MRI instrument as well as image quality (Table 1). The FBIRN and ACR manuals also provide MRI quality control methods that can be used for routine tests [143]. In these documents, weekly assessment of quality control is suggested. There is a need for online platforms where users can compare their quality control results with each other. Such comparisons may aid the detection of hardware- or software-related performance changes.
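To illustrate how temporal stability measures of the kind listed above can be computed from a phantom time series, the sketch below derives a percent fluctuation and a percent drift from the mean ROI signal of successive volumes, plus a simple signal-to-noise ratio. It is loosely modelled on the quantities named in the text and is not the FBIRN or ACR analysis software: it uses a linear trend where a higher-order fit may be preferred, the SNR carries no correction for the noise distribution in magnitude images, and all names are hypothetical.

```python
from statistics import mean, pstdev


def linear_fit(values):
    """Least-squares slope and intercept for equally spaced samples."""
    n = len(values)
    x_mean = (n - 1) / 2.0
    y_mean = mean(values)
    sxx = sum((i - x_mean) ** 2 for i in range(n))
    sxy = sum((i - x_mean) * (v - y_mean) for i, v in enumerate(values))
    slope = sxy / sxx
    return slope, y_mean - slope * x_mean


def stability_metrics(roi_means):
    """Percent fluctuation and drift of a phantom ROI time series.

    roi_means: mean ROI signal of each volume, in acquisition order
               (at least two volumes are required).
    """
    slope, intercept = linear_fit(roi_means)
    trend = [intercept + slope * i for i in range(len(roi_means))]
    residuals = [v - t for v, t in zip(roi_means, trend)]
    average = mean(roi_means)
    return {
        # temporal SD after removing the trend, relative to the mean signal
        "percent_fluctuation": 100.0 * pstdev(residuals) / average,
        # range of the fitted trend, relative to the mean signal
        "percent_drift": 100.0 * (max(trend) - min(trend)) / average,
    }


def snr(signal_roi, noise_roi):
    """Simple SNR: mean phantom signal over the SD of a background ROI."""
    return mean(signal_roi) / pstdev(noise_roi)


if __name__ == "__main__":
    # Hypothetical ROI means from a short stability run.
    print(stability_metrics([1000.0, 1001.5, 1001.0, 1002.5, 1003.0]))
```

Logging these values after each routine stability test would give a simple record against which later scans, or results from other sites, could be compared.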

The standardization and use of phantom quality programs in each preclinical lab may enable the creation of platforms for multicenter studies. The FBIRN and ACR MRI approach seems to be similar to the concepts of cross-calibration and image quality used within PET, in which phantom image characteristics are standardized, instead of every possible setting, during scanner QC, data acquisition, and image reconstruction.

Introduction of Field Standard Protocols for Common Imaging Tasks

An essential component of standardization is identifying what and how to standardize. Moving from image quality metrics towards parameters derived from animal experiments starts with a clear definition of those parameters (e.g., cerebral blood flow). Clearly defined protocols, based on field standards for obtaining those parameters, are required to decrease instrument- and software-related errors/differences, maximize data quality, and increase the reproducibility of results obtained by independent researchers.

Optical Imaging

Optical imaging devices rely on optical components such as lasers, filters, lenses, and cameras to detect photons emitted by fluorescent or bio-/chemiluminescent processes [9,10,11]. In contrast to high energy photons used in PET and CT, the photons used for optical imaging show a diffuse behavior because they are strongly scattered by the tissue, which needs to be considered during image reconstruction and interpretation. Furthermore, optical absorption and scattering are highly heterogeneous within the body and wavelength-dependent for many tissue constituents [144]. Blood is the dominant optical absorber in vivo, for example, which can cause problems for imaging of well-perfused regions such as liver and heart [145].

Fluorescence-based imaging requires an external light source, which illuminates the animal. Some photons reach fluorescent molecules inside the animal, are absorbed and reemitted at a different wavelength, and eventually reach the surface from where they are captured by a camera or other type of detector. Fluorescence reflectance imaging (FRI) is the most commonly used setup where light source and camera are usually positioned above the animal. FMT is a similar technique that involves more sophisticated hardware but allows tomographic 3D reconstruction of the fluorescence distribution (Fig. 1). In vivo bioluminescence imaging (BLI) does not require an external light source for excitation because the photons are generated inside the animal by bioluminescent processes. A luciferin substrate is administered to the animals prior to acquisition of light signals using dedicated low-light imaging systems such as CCD cameras. The real advantage of BLI is the exquisite sensitivity and specificity of the technique at the molecular level and the high signal-to-background ratio of the bioluminescent reaction. These advantages are particularly notable when using firefly luciferase with d-luciferin [146]. Bioluminescence tomography (BLT) allows the generation of a 3D reconstruction of signals for more precise localization of signals [147].

Potential areas of standardization include the following:

  1. Parameters obtained from phantom scans

  2. Introduction of field standard imaging protocols

Parameters Obtained from Phantom Scans

When calibrating fluorescent probes, it should be considered that many probes show a concentration-dependent absorption spectrum, resulting in non-linear calibration curves with a “quenching” effect at high concentrations. Furthermore, the behavior may strongly depend on the solvent with notable differences between water and serum. Additionally, some probes may be unstable over time or bleach due to exposure to room light. Hence, calibration should be performed at the same time as the in vivo experiment and under similar conditions. Multiple probes can be compared in well-plates or even pipette tips, but these should be filled to the same level to enable proper comparison. Furthermore, it should be noted that absorption and scattering will affect the brightness of the emission light, which is relevant for homogenized organs.
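One simple way to detect the onset of quenching in a dilution series is to compare each measured signal with a linear extrapolation from the lowest standard. The Python sketch below does this; it is only an illustration under stated assumptions: signals are taken to be background-subtracted, the lowest concentration is assumed to be non-zero and to lie in the linear regime, and the 10 % tolerance and the function name are arbitrary choices for this example.

```python
def linear_range(concentrations, signals, tolerance=0.10):
    """Return the concentrations that still follow a linear calibration.

    The slope is estimated from the lowest standard; standards are accepted
    in ascending order until the first one deviates from the linear
    prediction by more than `tolerance` (as a fraction).
    """
    pairs = sorted(zip(concentrations, signals))
    lowest_conc, lowest_signal = pairs[0]
    slope = lowest_signal / lowest_conc
    accepted = []
    for conc, signal in pairs:
        expected = slope * conc
        if abs(signal - expected) / expected > tolerance:
            break  # quenching (or another non-linearity) has set in
        accepted.append(conc)
    return accepted


if __name__ == "__main__":
    # Hypothetical dilution series: concentration (uM) vs. fluorescence signal.
    print(linear_range([1, 2, 4, 8], [10.0, 20.0, 38.0, 50.0]))
```

Repeating such a check in each solvent of interest (for instance water and serum) would show whether the usable concentration range differs between them.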

For bioluminescence imaging, it is difficult to identify an appropriate standard probe. Although low light emission probes exist or LEDs can be inserted into phantoms to reproduce the wavelength of emission of luciferase, a more appropriate standard for calibration would be a purified enzyme. On the other hand, the use of purified enzymes as standards presents many challenges. In fact, for the majority of applications, luciferase reporters are usually expressed in cells, and absolute quantification is unachievable because expression varies with time and cell conditions. As a starting point for future work, it would be interesting to assess the precision and robustness of cell imaging using a luciferase-expressing cell line in different laboratories. First, the robustness of cell lines in vitro can be evaluated as a standard. Second, the magnitude of the influence of imaging parameters can be evaluated using the imaging conditions described above.

Tissue-simulating phantoms are important to assess the brightness and stability of fluorescent probes or to assess the image quality of novel devices or reconstruction methods for both BLI and FRI. They can be constructed using silicone rubber or gelatin in combination with substances for scattering, absorption, and fluorescence, such as titanium oxide powder, lipid emulsions, India ink, or fluorescent dyes [145, 148]. Alternatively, a plastic phantom of size 15 mm × 33 mm × 40 mm with diffuse optical properties resembling average mouse tissue can be ordered from a hardware supplier [11]. This phantom contains a cylindrical inclusion (3 mm diameter) for 100 μl of the substance under investigation. A small amount (4 %) of lipid emulsion should be added to ensure that the optical scattering of the inclusion resembles the rest of the phantom [149]. For many applications, a reproducible and stable reference dye is required [145, 148] and a set of calibrated dyes at multiple wavelengths can be obtained from PerkinElmer, the manufacturer of a commonly used FMT system [149, 150]. In line with the recommendations of Osborne et al., this phantom can be used with known amounts of fluorescence or a small light-emitting lamp to check fluorescence and bioluminescence imaging systems regarding intensity stability, correct signal localization, and the degree of background signal (Table 1) [25].

Introduction of Field Standard Imaging Protocols

For fluorescence reflectance imaging, several parameters can be varied, including the excitation light source and strength and the optical filter for the emission light; some devices also allow adjustment of the field of view, e.g., to scan multiple mice at once. The exposure time is often set automatically to nearly saturate the detector, but typically a maximal exposure time can be configured. Furthermore, multispectral images may be acquired to separate different fluorophores or to reduce the background signal. Additionally, probes with a large Stokes shift or upconverting nanoparticles may require special combinations of excitation and emission wavelengths. Given this variety, general standardization may be difficult and the focus should be on proper reporting of the experiment settings to allow reproduction of the experimental procedure.

For bioluminescence imaging, measurement protocols vary depending on the luciferase enzyme employed, the substrate injection route, and the dose administered to the animals. Efforts to standardize any of these parameters have been limited. Interestingly, a standardized reference imaging protocol seems to have emerged, as evidenced by several papers reporting this protocol in their materials and methods [151,152,153].

This particular BLI protocol images firefly luciferase-expressing cells in anesthetized nude mice 10 min after intraperitoneal injection with a dose of 150 mg/kg of d-luciferin. Although this standard protocol is good for many applications (for example, imaging of subcutaneous tumors in mice), it has serious limitations for other applications. For example, a dose of 150 mg/kg d-luciferin does not saturate firefly luciferase in many organs such as the brain, and intraperitoneal injection might not be the best route of injection. For brain applications, a higher dose of substrate injected intravenously guarantees higher sensitivity [154, 155]. Moreover, application of the same protocol to cells expressing low levels of luciferase can fail to generate detectable signals. Importantly, the dose and route of administration of the substrate and the type of bioluminescent enzyme influence the kinetics of light emission in vivo, so it is important to define the emission kinetics for each application.
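The weight-based dosing in this protocol translates into a per-animal amount and injection volume as sketched below. The 150 mg/kg dose is taken from the text; the stock concentration of 15 mg/ml and the 20 g body weight used in the example are assumptions made only to show the arithmetic, and the function names are hypothetical.

```python
def luciferin_dose_mg(body_weight_g, dose_mg_per_kg=150.0):
    """Amount of d-luciferin (mg) for one animal at a weight-based dose."""
    return dose_mg_per_kg * body_weight_g / 1000.0


def injection_volume_ul(body_weight_g, stock_mg_per_ml, dose_mg_per_kg=150.0):
    """Injection volume (ul) for a given stock solution concentration."""
    dose_mg = luciferin_dose_mg(body_weight_g, dose_mg_per_kg)
    return dose_mg / stock_mg_per_ml * 1000.0


if __name__ == "__main__":
    weight_g = 20.0    # hypothetical mouse body weight
    stock = 15.0       # hypothetical stock concentration in mg/ml
    print(luciferin_dose_mg(weight_g))
    print(injection_volume_ul(weight_g, stock))
```

Reporting the dose, stock concentration, route, and time between injection and imaging alongside the results allows others to reproduce the emission kinetics for the same application.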

3D Imaging

An imaging protocol was described in detail for μCT-FMT using two commercially available devices and a multimodal mouse holder [149]. The protocol involves advanced fluorescence reconstruction using heterogeneous absorption and scattering maps and has been applied in several studies [145, 156,157,158,159,160,161]. While this protocol is specific for a special FMT device, replacement of the μCT is possible. Figure 1 shows fused μCT and FMT images acquired using this standardized protocol. The mouse was prepared with a rectal insertion containing fluorescence and μCT contrast agent to enable assessment of the fluorescence reconstruction quality. Such a rectal insertion is a compromise between a phantom and an in vivo experiment with an intravenously injected probe, thus providing a balance between realism and complexity, and was recently used to assess the sensitivity and accuracy of FMT in deep tissue regions [161].

Fig. 1

Fluorescence imaging. A nude mouse (BALB/c nu/nu) was anesthetized, prepared with a rectal insertion containing a known amount of fluorescence, and imaged with μCT-FMT. a The reflectance image acquired by the 2D mode of FMT shows the mouse. b The fluorescence image is shown as a color-coded overlay, and the rectal insertion appears as a diffuse hyperintense region, which complicates analysis. c The multimodal mouse bed holds the mouse between two transparent acrylic glass plates (green). Markers (red) are used for automated fusion. The segmentation of the mouse body (orange) is used for fluorescence reconstruction. d The reconstructed 3D fluorescence distribution (shown as an overlay at the bottom) appears at the rectal inclusion. The inclusion can be identified in the μCT data due to the addition of μCT contrast agent. Hence, this approach can be used to assess the image quality of the fluorescence reconstruction in a reproducible manner. e 3D rendering of the μCT-FMT data showing the co-localization of the fluorescence with the insertion. This example shows that standardization of fluorescence imaging involves various aspects, including mouse models, animal preparation, probe design, imaging devices, scanning protocols, and image analysis.

Standardization of BLT imaging protocols mainly depends on the technique used for light source reconstruction. As mentioned above, most of the BLT applications are performed using multispectral image acquisition. In this case, choice of bandpass filters for acquisition, field of view, exposure time, as well as animal positioning and route and dose of substrate need to be standardized [162].

Animal Handling: Impact on Preclinical Imaging and Standardization

The field of small animal imaging in preclinical research has expanded in the last few years due to its high potential to analyze functional, anatomical, and physiological processes non-invasively in living animals in follow-up studies over long time periods (depending on the animal model and the imaging methodology). Several factors, such as anesthesia, animal handling, the circadian rhythm, fasting, or administration of the imaging agents, can influence the outcome and reproducibility of each study [163]. To reduce the number of animal studies and to achieve high reproducibility and international comparability among multiple research groups, imaging protocols should be standardized. Animal handling plays a major role and has a great influence on the outcome of quantitative data [163, 164].

General Aspects

Some aspects of animal handling apply to all tomographic imaging techniques, such as anesthesia, animal monitoring (e.g., respiratory rate or electrocardiography (ECG)), temperature control, and heating beds.

Anesthesia

In imaging experiments, the use of anesthesia often cannot be avoided, as rodents must be constantly restrained. Different anesthetic agents have different effects on rodent physiology, such as glucose metabolism, heart function, blood pressure, and breathing frequency, and, consequently, on the study outcome [165, 166]. Excellent literature that discusses the individual aspects of the different available anesthetics for preclinical use is available [163, 164, 166,167,168,169,170].

Isoflurane, ketamine/xylazine, medetomidine/midazolam, and pentobarbital are frequently used as anesthetics in preclinical studies. Mice anesthetized with ketamine/xylazine show increased serum glucose levels [171], whereas decreased glucose utilization is observed in rat brains [172]. Further, xylazine alone, which stimulates the α2-adrenergic receptor on pancreatic islets, causes hyperglycemia in mice [173]. The effects of ketamine alone on cerebral glucose utilization can be reversed in specific regions by administration in combination with xylazine [172]. Volatile anesthetics such as isoflurane open mitochondrial adenosine triphosphate (ATP)-regulated potassium channels, whereas propofol or pentobarbital has no effect on these channels [174].

Moreover, anesthetics have different targets and therefore have different effects on brain function while the rodent is anesthetized. Within an imaging study, the same anesthetic should always be used, and its administration should be standardized. Due to the side effects of anesthetics on functional connectivity, cerebral hemodynamics, and brain metabolism, awake imaging of rodents has also been attempted [175].

Animal Monitoring

While anesthetics differentially affect physiology, they all cause a significant reduction of the rodent’s body temperature. This reduction itself can affect the physiology of the rodent. This decrease in body temperature should be taken into account especially in imaging studies in which changes in blood flow can affect the outcome, such as functional MRI studies and studies in which tracers are injected. During the scan, it is possible to monitor, for example, the heart rate (by ECG sensors), breathing (using pneumatic pillows), blood oxygen levels, and temperature. Keeping these parameters stable over time and as similar as possible for each rodent by adjusting or standardizing the anesthetic dose can avoid variation in the imaging outcome.

Age, Weight, and Animal Strains

A logical first step in the standardization of imaging studies is the use of rodents of the same strain, age, and animal weight within and between studies with a similar research question, as there are marked functional and behavioral differences between strains, ages, and weights [176, 177]. It is also necessary to standardize the vendor and not use rodents from different vendors within one study or multiple studies in the same animal model [176–179]. In addition, when performing multicenter studies with small animals, all subjects should be supplied from the same vendor. Transportation stress also has an impact on the animal physiology and should be considered during standardization of imaging experiments [178–181].

Housing Conditions

How rodents are housed affects their welfare and hence the way they cope with stressful experimental handling. Differences and changes in housing conditions can therefore have a large effect on the experimental outcome of imaging studies when this is not taken into account, especially in neuroimaging studies.

Guidelines exist for the cage size and the number of rodents housed in a single cage, although some studies have challenged these recommendations [182, 183]. While group size itself can have an effect on rodent welfare, an even larger effect can be observed when rodents are housed singly. Another important aspect of rodent housing is the use of environmental enrichment, which improves living conditions by meeting the need of rodents to, for example, make nests, find shelter, and gnaw, and thereby positively affects welfare [184]. However, whether environmental enrichment would increase the variability between rodents and thus negatively affect standardization has been questioned [184]. It is thus important to find a balance between enrichment for improving rodent welfare and avoiding the introduction of variability.

Rodents can identify human experimenters by smell, and Sorge et al. were the first to demonstrate that the presence of humans (either male or female) can affect the study outcome [185]. For example, male experimenters caused reduced pain behavior in mice compared with female experimenters, suggesting that standardization of animal handling should include the sex of the experimenters within one laboratory, especially in stress-related studies [185].

Chow

The content of the diet should be investigated before initiating a longitudinal study and should be considered as a source of variation when the results of different research centers using the same animal model are compared. Furthermore, some chows create an unspecific signal in vivo and should be avoided if possible (e.g., by using alfalfa-free chow for OI) [161, 186–188].

Heterogenization

On the other hand, there are also reports suggesting that environmental standardization may give rise to idiosyncratic results [189–191]. Richter et al. hypothesized that environmental standardization, rather than heterogenization, may cause poor reproducibility of experimental outcomes [189]. By contrast, van der Staay et al. emphasized the importance of standardization and suggested that it is indispensable for the risk assessment of new therapeutic drugs and limits random variation [192]. Especially in the field of neurology, mainly male rats are used due to differences in the developing brain and a 10 % larger total brain size in male rats compared to female rats [193], while females are often preferred due to their compatibility with each other. A recent meta-analysis supported the use of both male and female rodents by demonstrating that the variability between females was not greater than that observed in males and that females could be included to improve the generalizability of findings [194]. However, males and females should not be mixed in a single experiment unless it has been demonstrated that it will not affect the outcome of the study.

Review of Data on the Impact of Animal Handling in CT Imaging

Motion due to animal breathing or the cardiac cycle can interfere with high-resolution in vivo CT imaging. Badea et al. have suggested applying gating based on respiratory or cardiac signals to decrease motion artifacts and hence enable in vivo CT imaging of cardiopulmonary structures [195]. However, when performing in vivo high-resolution CT, the radiation dose delivered to the animal should be taken into account since it may affect the outcome, especially with regard to immunological studies; several groups have focused on this (e.g., [28,29,30]).

Most preclinical in vivo CT studies utilize contrast agents to increase the soft tissue contrast in certain regions compared to the surrounding tissue. To date, multiple preclinical contrast agents have been developed, since the use of clinical CT contrast agents for preclinical CT imaging is challenging: they are rapidly excreted owing to the faster cardiac and respiratory rates in mice compared to humans [196].

A detailed evaluation of several preclinical contrast agents has been performed in healthy animals on the basis of an injection volume of 100 μl/25 g body weight and the contrast enhancement in specific organs was determined, as well as the impact of the contrast agents on physiological and immunological parameters. Some of the evaluated preclinical contrast agents did have a distinct impact on certain parameters (e.g., enhanced tumor necrosis factor (TNF) mRNA expression levels), and hence this knowledge needs to be taken into account either when planning an experiment or while interpreting the acquired data when contrast agents were used [196].
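
As a small worked example of the dosing basis quoted above, the following sketch scales the injection volume with body weight. It assumes that the 100 μl per 25 g figure is applied linearly to animals of other weights, which the text does not state explicitly; the function name and its parameters are hypothetical.

```python
def injection_volume_ul(body_weight_g, ref_volume_ul=100.0, ref_weight_g=25.0):
    """Scale a reference injection volume linearly with body weight.

    Assumption: linear scaling from the 100 ul / 25 g basis quoted in the text.
    """
    if body_weight_g <= 0:
        raise ValueError("body weight must be positive")
    return ref_volume_ul * body_weight_g / ref_weight_g


if __name__ == "__main__":
    # Illustrative body weights only.
    for weight in (20.0, 25.0, 30.0):
        print(f"{weight:.0f} g -> {injection_volume_ul(weight):.0f} ul")
```

Recording the administered volume per animal in this way makes it easier to verify afterwards that all subjects received a comparable dose.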

In terms of chows, one should take care that these contain no metal fragments or stones since this can lead to metal artifacts in the CT images.

Review of Data on the Impact of Animal Handling in PET Imaging

Mahling et al. focused on tumor hypoxia imaging using [18F]fluoroazomycin arabinoside ([18F]FAZA) and the effect of the anesthetics used (isoflurane vs. ketamine/xylazine while breathing air or oxygen) on tracer uptake [197]. Higher tumor uptake was observed in ketamine/xylazine-anesthetized mice (breathing either air or oxygen), and lower whole-body uptake was observed when isoflurane was used for anesthesia, clearly revealing that anesthesia substantially influences the tracer uptake in PET imaging under hypoxic conditions [197]. Fuchs et al. analyzed pCO2, pH, and lactate values in mice before and after 3′-deoxy-3′-[18F]fluorothymidine ([18F]FLT) PET investigations with different breathing and anesthesia protocols in an inflammation (arthritic) and cancer (colon carcinoma) mouse model. Significant changes in pCO2 and lactate values were observed in anesthetized compared to conscious mice breathing air or oxygen [165]. This effect was mainly caused by sustained respiratory acidosis due to oxygen breathing, which caused increased pCO2 and reduced lactate and pH values in rodents and thus affected the results of the study. Interestingly, a significant increase in uptake was observed in the muscle tissue used as control tissue in colon carcinoma-bearing mice under anesthesia compared to awake mice. Since muscle tissue is often used as a reference uptake region and compared with tumor uptake, these results should be considered when analyzing the acquired data [165].
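
The consequence for reference-region analysis can be illustrated with simple arithmetic. The sketch below computes a tumor-to-muscle ratio for two conditions; all uptake values are hypothetical and were chosen only to show how a change in muscle uptake alone shifts the ratio, and they are not taken from the cited studies.

```python
def tumor_to_muscle_ratio(tumor_uptake, muscle_uptake):
    """Return the tumor-to-muscle uptake ratio (both values in the same unit)."""
    if muscle_uptake <= 0:
        raise ValueError("reference uptake must be positive")
    return tumor_uptake / muscle_uptake


# Hypothetical uptake values, chosen only to illustrate the arithmetic.
tumor = 4.0
muscle_awake = 1.0
muscle_anesthetized = 1.6  # higher reference uptake under anesthesia

ratio_awake = tumor_to_muscle_ratio(tumor, muscle_awake)
ratio_anesthetized = tumor_to_muscle_ratio(tumor, muscle_anesthetized)

if __name__ == "__main__":
    print(f"awake: {ratio_awake:.2f}, anesthetized: {ratio_anesthetized:.2f}")
```

With identical tumor uptake, the ratio drops solely because the reference tissue changed, which is why the anesthesia protocol should be reported alongside any ratio-based result.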

The optimal anesthesia condition for imaging of lung metastasis using [18F]FDG was evaluated by Woo et al. [198]. They detected the lowest tracer uptake in the chest wall and the heart using 0.5 % isoflurane in 100 % oxygen and concluded that this condition was suitable for their application in combination with fasting (20 h before the tracer injection) and warming [198].

The uptake behavior of [18F]FDG using either isoflurane or sevoflurane mixed in air was determined by Flores et al. [199]. The authors investigated this using athymic nude and Balb/c mice and analyzed blood glucose levels and tracer uptake in various organs. They recommended using sevoflurane instead of isoflurane as routine anesthesia especially for [18F]FDG PET, in which the blood glucose levels can change the uptake behavior [199].

Furthermore, Chan et al. performed in vivo experiments using [18F]FDG to determine the influence of tumor oxygenation on the tracer uptake and argued that in potentially hypoxic tumor areas the tracer uptake can be influenced by the tumor oxygenation [200].

Fueger et al. investigated the effect of ambient temperature, anesthesia, and dietary state on the biodistribution of [18F]FDG in mouse tumor models (Fig. 2) [171]. A profound influence of these parameters on the tumor visualization and the biodistribution of [18F]FDG was detected [171]. These results have been confirmed by others [201, 202]. To reduce variation in [18F]FDG uptake, fasting of animals should be considered. Most importantly, the fasting duration should be standardized within a study and between studies when data must be compared [171].

Fig. 2

Impact of animal handling on the biodistribution of [18F]FDG. a Not fasted, warmed, no anesthesia. b Fasted, not warmed, no anesthesia. c Fasted, warmed, no anesthesia. d Fasted, warmed, no anesthesia, conscious injection. e Reference conditions: not fasted, not warmed, no anesthesia. f μCT, sagittal view for anatomic reference. g Not fasted, warmed, isoflurane. h Fasted, warmed, isoflurane. i Fasted, warmed, ketamine. This research was originally published in JNM. From Fueger BJ, Czernin J, Hildebrandt I, et al. Impact of animal handling on the results of 18F-FDG PET studies in mice. J Nucl Med. 2006; 47:999-1006. © by the Society of Nuclear Medicine and Molecular Imaging, Inc. [171].

The impact of anesthesia on reproducibility has been investigated extensively for a variety of different PET tracers (such as [18F]FDG, [18F]FLT, or [18F]FAZA), but studies have been limited for other tracers, for instance C-11-labeled substances, such as [11C]raclopride for D2-receptor imaging, [11C]-3-amino-4(2-dimethylaminomethyl-phenylsulfanyl)-benzonitrile ([11C]DASB) for serotonin transporter imaging, or [11C]Pittsburgh compound B ([11C]PIB) to determine amyloid deposits in Alzheimer’s disease. Especially in the preclinical setting, these tracers, among others, are used extensively for various models [203–205], and hence the imaging routines for these tracers should be standardized as well to obtain reliable and highly diagnostic results.

Review of Data on the Impact of Animal Handling in MR Imaging

Variations in anesthetic regimes, route of administration, physiological parameters, sex, strain, circadian cycles, and diet can affect the results of animal MRI/MRS experiments [206]. Different studies have examined possible effects of different anesthetic regimes on structural, functional, and/or pharmacological MRI and MRS studies [83, 207–211]. Anesthetic regimes and doses should be carefully selected, particularly for functional MRI and MRS studies. While isoflurane is the most commonly used anesthetic for structural imaging due to the fast recovery of exposed animals, a variety of anesthetic regimes are used for functional and/or pharmacological MRI, including α-chloralose, medetomidine, propofol, and urethane. The assets and drawbacks of these anesthetics have been comprehensively discussed with regard to MRI in earlier reviews [83, 207–211]. To increase data quality and decrease inter- and intra-subject variability, attention should be paid to uniformly positioning the animal with respect to the RF coil(s), as this affects the loading of the coil, which is related to coil sensitivity (see [206] for a detailed discussion). The outcomes of a variety of MRI techniques, such as resting state functional MRI (fMRI) networks [176, 212], neuroanatomy [213], and cerebral metabolite levels [214], are dependent on the rodent strain. Less studied causes of variation in MRI experiments are animal stress (which alters corticosteroid levels) and diet. The impacts of different stress models on MRI have been described [215–217]. Acclimation of the animals to the scanner room and handling may decrease the physiological stress levels of the animals. Depending on the experimental context, the type of diet may play an important role in functional and structural alterations of brain networks [218]. When the conditions mentioned above are not controlled, variations among experiments performed using MR techniques may occur. The experimental setup and physiological monitoring should be standardized to minimize subject-related variations.

Review of Data on the Impact of Animal Handling in Optical Imaging

For optical imaging, nude mice bear significant advantages because the removal of hair from normal hairy mice can cause strong infectious reactions and irritations. Furthermore, in some mouse strains (for instance C57BL/6), pigmented regions frequently remain, which affect the imaging in an unpredictable manner. Additionally, nude mouse strains differ in size, e.g., BALB/c nude mice are typically smaller than CD1 nude mice; both strains are frequently used for tumor experiments due to their immune deficiencies. Small mice bear advantages for FMT and BLT because resolution and sensitivity are reduced in deep tissue regions [145]. Immuno-competent nude mice are also available, e.g., SKH1-mice or Black six nude mice. Additionally, the use of μCT contrast agents may affect the optical imaging. For example, AuroVist, a long-circulating agent, shows strong optical absorption, which is apparent as purple skin color [219]. The type of chow may also seriously affect fluorescence experiments by increasing the background signal, particularly for wavelengths below 750 nm, which can be avoided by using a special chlorophyll-free chow [161, 188].

Proper animal positioning should be considered for optical imaging. Some devices require a special holder, which squeezes the animal between two glass plates to reduce the maximal diameter, which provides advantages due to the limited optical penetration depth. This holder should be cleaned regularly to remove urine and feces that may contaminate the bed with fluorescence [149]. Furthermore, different positions are possible; e.g., a mouse may be positioned on its side to better image a subcutaneous tumor on the lower leg [220, 221].

Image Analysis

Image format standardization has already been implemented for most of the techniques. For CT, emission tomography, and MRI, the Digital Imaging and Communications in Medicine (DICOM) working group (WG-30) promoted the use of DICOM standards in preclinical imaging (readers are referred to http://dicom.nema.org/ for detailed information).

With regard to output formats, two categories can be distinguished: open formats, whose use is not restricted to a certain group of people, and proprietary formats, most of which are tied to particular organizations or industry products in order to ensure the ongoing use of their software packages. Additionally, proprietary formats might include calibration factors that automatically scale the imaging data, which needs to be taken into account as well.
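
To make the point about calibration factors concrete, the sketch below applies a linear scaling to stored pixel values. It assumes the calibration takes the form of a slope and an intercept, comparable to the Rescale Slope and Rescale Intercept attributes in DICOM; proprietary formats may use other conventions, and the function name and example numbers are hypothetical.

```python
def apply_calibration(stored_values, slope, intercept=0.0):
    """Convert stored pixel values to calibrated values.

    Assumption: a linear model, calibrated = slope * stored + intercept.
    """
    return [slope * value + intercept for value in stored_values]


if __name__ == "__main__":
    # Hypothetical stored values and calibration factors.
    stored = [0, 10, 20]
    print(apply_calibration(stored, slope=0.5, intercept=1.0))
```

When comparing results across software packages, it is worth checking whether each package has already applied such a factor, since applying it twice or not at all changes every quantitative value.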

The contribution of data analysis to reproducibility can also be assessed. Data analysis in emission tomography can be highly demanding, particularly for kinetic modeling or multiparametric PET studies. Standardization of data analysis might be limited by differences in software as well as the expertise of the operator analyzing the data, which can impact the reproducibility and reliability of the acquired data. Therefore, each operator should be trained accordingly using specific training datasets with known outcomes to ensure the reproducibility of image analysis. However, there is no common standard for such training and it is currently addressed on an individual basis by each institution.

Multiple analysis software solutions that fit the different needs of individual studies are available, ranging from proprietary developments to fully licensed software solutions by scanner manufacturers or third-party companies. There is a need for systematic analysis of all data analysis methods to compare the efficiency of these methods with each other. For standardization purposes, a sample dataset that has passed through predefined quality control assessments can be used to compare results from different data analysis techniques as well as software. The same dataset should also be analyzed by more than one person using the same data analysis methods to estimate whether the data analysis steps are adequately standardized and reproducible.
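
One simple way to quantify agreement when several operators analyze the same dataset is the coefficient of variation across their results. The sketch below is a minimal illustration using the Python standard library; the choice of this metric and the example values are assumptions and are not prescribed by the text.

```python
from statistics import mean, stdev


def inter_operator_cv(values):
    """Coefficient of variation (%) of one parameter measured by several operators."""
    if len(values) < 2:
        raise ValueError("at least two operator results are required")
    average = mean(values)
    if average == 0:
        raise ValueError("mean is zero; coefficient of variation is undefined")
    # stdev() is the sample standard deviation (n - 1 in the denominator).
    return 100.0 * stdev(values) / average


if __name__ == "__main__":
    # Hypothetical tumor uptake values reported by three operators.
    operator_results = [9.0, 10.0, 11.0]
    print(f"inter-operator CV: {inter_operator_cv(operator_results):.1f} %")
```

A laboratory could track this value per analysis method to see which steps contribute most to operator-dependent variation.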

Such comparisons should be performed using datasets with a known ground truth (such as in vivo data with ex vivo correlation or simulated datasets) to ensure reliability and should be based on quantitative uptake and/or kinetic parameters. A detailed review on segmentation analysis of PET imaging data can be found in the literature [222]. The availability of anatomical μCT or MRI data can allow for reproducible segmentation of organs, lesions, and tumors, which reduces the inter-reader variability compared with, e.g., unimodal FMT usage [221, 223]. While some organs such as the kidney and bladder are easy to segment, the liver is more difficult due to its lobular structure, resulting in higher variability between users [221]. Therefore, fully automated organ segmentation may become a valuable tool, particularly for biodistribution studies [224].
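
Agreement between two segmentations of the same organ can be expressed with an overlap measure. The sketch below uses the Dice coefficient, which is a common choice but is not named in the text and is used here as an assumption; the segmentations are represented as sets of voxel coordinates, and the example coordinates are hypothetical.

```python
def dice_coefficient(voxels_a, voxels_b):
    """Dice overlap between two segmentations given as sets of voxel coordinates.

    Returns 1.0 for identical masks and 0.0 for masks with no shared voxel.
    """
    a, b = set(voxels_a), set(voxels_b)
    if not a and not b:
        return 1.0  # two empty masks are treated as identical
    return 2.0 * len(a & b) / (len(a) + len(b))


if __name__ == "__main__":
    # Hypothetical masks drawn by two readers (x, y, z voxel indices).
    reader_1 = {(0, 0, 0), (0, 0, 1)}
    reader_2 = {(0, 0, 1), (0, 0, 2)}
    print(f"Dice: {dice_coefficient(reader_1, reader_2):.2f}")
```

Computing this value per organ would show, for example, whether liver segmentations differ more between readers than kidney or bladder segmentations.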

Finally, guidelines for reporting of small animal experiments have been suggested, and we strongly advise adhering to these guidelines [56].

Discussion and Outlook

This review aims to contribute to the validity and reliability of small animal imaging data by promoting the standardization of imaging procedures. It builds on the good practice QC paper by Osborne et al. [25]. It proposes to expand the use of phantoms to the measurement of image metrics (like signal-to-noise ratio in homogeneous regions) whose values can be required to fall within a range of reference values. This range of values determines the reproducibility of the system-specific factors. Consequently, this is the lower limit of actual reproducibility in animal studies.
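
The phantom check proposed above could be sketched as follows. The example assumes one common definition of signal-to-noise ratio, the mean divided by the standard deviation of the values in a homogeneous region; the reference range and pixel values are hypothetical, and the function names are illustrative only.

```python
from statistics import mean, stdev


def roi_snr(roi_values):
    """SNR of a homogeneous region, defined here as mean / sample standard deviation."""
    noise = stdev(roi_values)
    if noise == 0:
        raise ValueError("zero noise; SNR is undefined for a constant region")
    return mean(roi_values) / noise


def within_reference(value, low, high):
    """Return True if the metric lies inside the accepted reference range."""
    return low <= value <= high


if __name__ == "__main__":
    # Hypothetical pixel values from a homogeneous phantom region.
    roi = [98.0, 100.0, 102.0]
    snr = roi_snr(roi)
    print(f"SNR = {snr:.1f}, accepted: {within_reference(snr, 40.0, 60.0)}")
```

Running such a check at regular intervals and logging the result would document whether the system-specific contribution to variability stays within the agreed range.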

This review discusses general contributions to variability induced by animal handling, as well as examples from literature for CT, PET, MRI, and OI. Given the broad field, no general guidelines exist that apply to all imaging biomarkers. Therefore, specific animal handling protocols for a specific biomarker will need to be developed by the field. This will need to include data analysis procedures and adherence to the “Guidance for methods descriptions used in preclinical imaging papers” [56], which was designed to ensure that each report on a small animal imaging experiment contains the essential information required to understand and reproduce the experimental work.

In short, the utility of preclinical imaging would be enhanced by improved standardization. Approaches do exist for the implementation of the next steps in quantification, and it is encouraging that practical initiatives for their realization are currently being implemented in the context of the ESMI and the EANM workgroups.