Development of transcriptomics

In biological research, the evolution of today’s high-throughput techniques stems from the Human Genome Project (HGP). The push to sequence the human genome laid the foundation for the flow of information between bench work and computational methods that we know today. This is not to say that computational integration into biological fields began only with the HGP; rather, insights into DNA accumulated throughout history, together with the advent of the microprocessor, helped the computer and biological sciences merge. Sequencing DNA to unravel the code of the genome has long been a goal of scientists. Once chemical (Maxam-Gilbert) and chain terminator (Sanger) sequencing were developed in the 1970s (Maxam and Gilbert 1977; Sanger et al. 1977), researchers began to raise the question: how do we compile and analyze the sequences to form information that is biologically relevant? Scientists saw the need and developed software and algorithms to process the new data (Staden 1979, 1984). As time passed, computers increased in power and decreased in size, giving way to more familiar and public applications. This led to the creation of open-source sequence databases and algorithms (e.g., National Center for Biotechnology Information 1989).

In the early 1990s, researchers began using macroarrays to measure changes in gene expression between phenotypes of interest. The macroarray consisted of large, hand-printed spots on a nylon membrane, each containing the large quantities of cDNA needed for adequate hybridization. The arrays were radioactively labeled, and each was limited to detection of a single sample. These “pitfalls” of macroarray technology drove the development of microarray technology and its applications. As the decade progressed, fluorescent labels took the place of radioactive labels, allowing comparison of two different samples on one array. However, this protocol called for more sophisticated instrumentation to detect fluorescent emissions, pushing technology to meet the demands of the budding field of genomics. Pat Brown and his group at Stanford developed the microarray by using robotic printing of cDNA spots on smaller glass microscope slides. The spots of the microarray require less sample cDNA for hybridization, allowing a greater number of genes to be detected (Schena et al. 1995). The driving force was to make the arrays applicable to larger genomes rather than only to smaller ones. With the incorporation of robotic printing, arrays became a true high-throughput data source.

During the fabrication and development of microarrays, researchers were also interested in increasing the speed at which DNA or RNA could be sequenced. In 1990, the HGP chose to tackle the human genome by using a clone-by-clone strategy, sequencing one chromosome at a time. For the time this was semi-high throughput, because most of the experiments and analysis were done by hand, but as the project went on more robotics and computational analysis were incorporated. Eight years into the 15-year HGP, J. Craig Venter proposed that shotgun sequencing would speed the project up without affecting the accuracy. The human genome was published by both the HGP and Venter's group in February 2001 (Lander et al. 2001; Venter et al. 2001).

In a post-HGP world, microarrays and sequencing have grown by leaps and bounds. The availability of complete genomic sequences has led to improvements in both the accuracy and interpretation of microarrays, as well as to other functionalities such as examining alternative splicing. On the sequencing side, a number of technologies, known as next-generation sequencing, have now been developed that allow multiple sequencing reactions to be performed in parallel, giving researchers the ability to obtain vast amounts of sequence data starting from RNA or DNA (Schuster 2008). Sequencing of the entire transcriptome of a given sample is now doable in a quantitative manner (known as RNAseq) (Wang et al. 2009). While it was thought that RNAseq would render microarrays obsolete, microarrays remain in widespread use. This reflects the accumulated experience in their use, a high level of understanding of their strengths and weaknesses, and the wide availability of the equipment needed to perform microarray experiments as well as the computer hardware and software necessary for their analysis. Furthermore, a large amount of microarray data is shared by researchers through repositories such as Gene Expression Omnibus (GEO, www.ncbi.nlm.nih.gov/geo/) and ArrayExpress (www.ebi.ac.uk/arrayexpress/). The utility of these data has been greatly increased through actions by the research community to set up a series of standards, known as Minimum Information About a Microarray Experiment (MIAME), for the presentation, interpretation, and exchange of microarray experimental data (Brazma et al. 2001).

With this public data availability one can reanalyze such data for genes and conditions of interest. In addition, meta-analysis is becoming increasingly applicable to microarray data, as it is to a large number of other research areas. Meta-analysis is a technique for combining independent studies that address the same or similar hypotheses. This allows for the analysis of a larger sample size that better represents the population at large. By pooling evidence across studies, the method can reduce type I error (false positives) and improve the reliability of the results. Meta-analysis also has the ability to control for between-study variation and can be a useful tool in identifying differences between studies. Overall, meta-analysis can be a powerful tool for combining independent data sources in microarray analysis in biomedical studies (Ramasamy et al. 2008).
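As a minimal sketch of how such pooling can work, assume each independent study reports a log2 fold change and its standard error for the same gene. A fixed-effect (inverse-variance) combination, with Cochran's Q as one common measure of between-study variation, could then be written as below. The function name, variable names, and numeric values are hypothetical illustrations and are not taken from any of the studies reviewed here; other meta-analytic models (for example random-effects models) may be more appropriate in practice.

```python
import math


def fixed_effect_meta(effects, std_errors):
    """Pool per-study effect sizes by inverse-variance weighting.

    effects    -- one effect size per study (e.g. log2 fold change of a gene)
    std_errors -- the standard error of each effect size
    Returns (pooled effect, pooled standard error, Cochran's Q).
    """
    # Each study is weighted by the inverse of its variance,
    # so more precise studies contribute more to the pooled estimate.
    weights = [1.0 / se ** 2 for se in std_errors]
    total_weight = sum(weights)

    pooled = sum(w * e for w, e in zip(weights, effects)) / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)

    # Cochran's Q: weighted squared deviations from the pooled effect.
    # Large values relative to (number of studies - 1) suggest heterogeneity.
    q = sum(w * (e - pooled) ** 2 for w, e in zip(weights, effects))
    return pooled, pooled_se, q


# Hypothetical example: three studies measuring the same gene.
effects = [1.0, 2.0, 1.5]        # assumed log2 fold changes
std_errors = [0.5, 0.5, 0.5]     # assumed standard errors

pooled, pooled_se, q = fixed_effect_meta(effects, std_errors)
print(f"pooled effect = {pooled:.3f}, SE = {pooled_se:.3f}, Q = {q:.3f}")
```

The pooled standard error is smaller than that of any single study, which is the sense in which combining studies acts like analyzing a larger sample.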

Transcriptomic characterization in the nonhuman primate model of neuroAIDS

The use of simian immunodeficiency virus (SIV) infected macaques to study HIV infection of the brain has been quite productive for understanding HIV neuropathogenesis (Fox et al. 1997; Burudi and Fox 2001; Williams et al. 2008). Among the advantages compared to other animal models are the similarities in the virus, as well as in the host immune system and central nervous system (CNS). Furthermore, as compared to humans, the ability to control conditions is a significant plus. Relating to the virus, one can control the time, route and dose of infection as well as the nature of the infecting virus itself. A wealth of host factors can also be influenced, as one can assess host genetic factors affecting HIV/SIV pathogenesis, and control diverse aspects including the environment, comorbid conditions, diet, and medication adherence. In general the CNS can only be sampled post-mortem in humans, and even then obtaining samples can be problematic. It should be noted that one outstanding resource in this regard is the National NeuroAIDS Tissue Consortium (NNTC, www.nntc.org), which provides CNS samples from well-characterized decedents to qualified investigators (Morgello et al. 2001; Everall et al. 2009), and studies using such tissue are also reviewed below. In the monkey/SIV model, in contrast, animals can be sacrificed when the experimental protocol dictates, the brain can be perfused to eliminate blood contamination, and samples can be taken fresh, minimizing post-mortem changes and degradation. One limitation is that assessing details of neurocognitive status remains difficult, despite the development of methods to assess cognitive functions in SIV infected monkeys (Murray et al. 1992; Gold et al. 1998; Weed et al. 1999).

Messenger RNA (mRNA) studies

To date there have been four studies in which whole genome transcriptome characterization of protein coding genes has been performed on brain samples in SIV-infected monkeys. One examined acute infection (Roberts et al. 2004b), one a chronically infected pre-AIDS stage (Roberts et al. 2006), and two examined animals with terminal AIDS and SIV encephalitis (Roberts et al. 2003; Gersten et al. 2009). The study of acute infection was performed on animals 2 weeks after intravenous inoculation with SIV, chosen since reproducible CNS infection is present at this stage. Examination of mRNA from the frontal lobe of animals revealed that the transcripts of 97 genes were significantly upregulated when compared to expression levels in the frontal lobe of uninfected animals (Roberts et al. 2004b). Examination of the literature and databases revealed that a large proportion of these genes were related to cytokine pathways, specifically those of interferon and IL-6. The transcription factor Signal Transducer and Activator of Transcription 1 (STAT1), involved in the interferon response, was found to be upregulated at the mRNA level by microarray as well as at the protein level by immunohistochemistry on the SIV infected frontal lobe tissue. Although no change in the levels of interferon mRNAs was found by quantitative RT-PCR analysis, a significant increase in IL-6 was present in the acutely infected brain. A number of tools are now available to assess computationally the biological relevance of gene lists generated by microarrays, such as the Database for Annotation, Visualization and Integrated Discovery (DAVID, david.abcc.ncifcrf.gov/) (da Huang et al. 2009a, b). Interestingly, the use of DAVID to assess the upregulated genes found in this study reveals a number of related biological processes occurring during acute infection, indicating the generation of an immune response in the brain to infection (Table 1).

Table 1 Gene ontology biological processes enriched in the frontal cortex of acutely SIV infected monkeys (from the microarray study of Roberts et al. 2004b), as predicted (using DAVID) by the significantly upregulated genes. Shown are the terms with a corrected p-value of <0.01
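To illustrate the kind of calculation that underlies a gene ontology enrichment result, the sketch below computes a plain hypergeometric tail probability for one term, followed by a simple Bonferroni correction. This is an assumption-laden illustration, not DAVID's implementation: DAVID reports its own modified Fisher's exact statistic and offers several correction methods. Apart from the list size of 97 upregulated genes taken from the acute infection study, all numbers (background size, term size, overlap, number of terms tested) and all function names are hypothetical.

```python
from math import comb


def enrichment_p_value(k, n, K, N):
    """Probability of seeing at least k term-annotated genes in a list.

    k -- genes in the list that carry the annotation term
    n -- total genes in the list
    K -- genes in the background that carry the annotation term
    N -- total genes in the background
    Uses the upper tail of the hypergeometric distribution.
    """
    upper = min(n, K)
    total = comb(N, n)
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / total


def bonferroni(p_value, num_tests):
    """Conservative multiple-testing correction, capped at 1."""
    return min(1.0, p_value * num_tests)


# Hypothetical example: 97 upregulated genes (list size from the text),
# an assumed background of 20000 genes, an assumed term annotating 400
# background genes, and an assumed 15 list genes carrying that term.
p = enrichment_p_value(k=15, n=97, K=400, N=20000)
p_corrected = bonferroni(p, num_tests=1000)  # assumed number of terms tested
print(f"raw p = {p:.3g}, Bonferroni-corrected p = {p_corrected:.3g}")
```

A term would appear in a table such as Table 1 only if its corrected value fell below the chosen threshold.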

SIV infected animals at 2 years post-inoculation were used for the studies of chronic infection of the brain (Roberts et al. 2006). These animals were relatively healthy and had not developed simian AIDS, but did have abnormalities on CNS functional tests, including motor and behavioral tasks as well as electrophysiological findings. In contrast to the study on acute infection, only seven genes showed significant change in frontal lobe mRNA in the chronically infected animals relative to levels found in uninfected animals. Specifically, these were the interferon-induced genes G1P3 and IFITM1; major histocompatibility complex (MHC) genes HLA-A, HLA-C and HLA-DRα; immunoglobulin gene IGHG3, and the chemokine gene CCL5 (also known as RANTES). This again points to an active immune response in the brain. The expression of a chemokine was intriguing, given the earlier demonstration of CD8+ T cells in the brains of chronically SIV infected monkeys, acting as SIV-specific cytotoxic T lymphocytes (CTL) (von Herrath et al. 1995; Marcondes et al. 2007). Indeed in this study brain infiltrating CD8+ T cells were present which themselves expressed CCL5 (Roberts et al. 2006).

The first study examining the terminal stage of SIV encephalitis characterized genes upregulated in the frontal cortex, as well as the occipital lobe, midbrain, and cerebellum (Roberts et al. 2003). The frontal lobe was studied in the most detail, and 98 genes were found to be upregulated. A number of the mRNAs and proteins encoded by these genes were examined by in situ hybridization and immunohistochemistry, revealing a complex pattern of expression in brain endothelial cells, neurons, glia, and infiltrating macrophages.

The genes identified represented a number of the pathological processes involved in neuroAIDS, including the immune response, the interferon/STAT1 pathway, and monocyte/macrophage migration. Applying the DAVID bioinformatics tool reveals a number of related biological processes ongoing in the encephalitic brain (Table 2). This is notable for the breadth of the host response, involving processes related to the adaptive immune response, in particular to exogenous antigens via MHC class I and II, likely related to the CTL response as discussed above. Furthermore additional aspects of immunity, including B cells/immunoglobulin as well as the acute response are elevated. This study, the first of its kind in CNS infection by HIV or SIV, has led to a number of findings followed up in studies in humans as well as experimental animals, including the role of CD163+ monocytes/macrophage/microglia (Roberts et al. 2004a; Kim et al. 2006; Borda et al. 2008), osteopontin (Burdo et al. 2007, 2008; Marcondes et al. 2008; Brown et al. 2011), STAT1 (Potash et al. 2005; Chaudhuri et al. 2008), and the glycoprotein CHI3L1 (Bonneh-Barkay et al. 2008) in neuroAIDS.

Table 2 Gene ontology biological processes enriched in the frontal cortex of SIV infected monkeys with SIV encephalitis (from the microarray study of Roberts et al. 2003), as predicted (using DAVID) by the significantly upregulated genes. Shown are the terms with a corrected p-value of <0.01

The second study on SIV encephalitis made use of a significant advance. Due to the work on analysis of the human genome and the obviously greater demand for reagents to study humans, microarrays were initially available for human (as well as mouse) sequences, and thus the three earlier studies described above were performed using human gene arrays. Due to the relatively high sequence conservation between rhesus monkeys and humans (approximately 95%), many of the probe sets present on the human array do work to assess the expression of monkey genes; however, there will certainly be false negatives for genes that could not be assessed. It has been estimated that one may not be able to measure the expression of approximately one-third of the genes in rhesus monkeys using the human arrays (Chismar et al. 2002). Investigators therefore used a number of approaches to devise a microarray specific for rhesus monkeys (Spindel et al. 2005; Duan et al. 2007), and this was used for the following study (Gersten et al. 2009).

This study examined mRNA from the hippocampus. A large number of genes showed significant changes in expression between uninfected animals and those with SIV encephalitis: 720 were upregulated and 106 downregulated. The larger number of genes identified compared with the first study on SIV encephalitis (Roberts et al. 2003) likely reflects a combination of the use of the rhesus-specific microarray and the examination of the hippocampus as opposed to frontal cortex; however, true determination of these possibilities would require direct comparisons of the regions and arrays. DAVID was utilized in this study to examine altered pathways. While the downregulated genes did not yield any enrichment, the upregulated genes pointed to changes in immune, inflammatory, and stress responses, as well as apoptosis, cell proliferation, and signaling cascades.

A systems biology strategy was then utilized in order to help uncover the mechanisms behind neuronal dysfunction in neuroAIDS. The changes in gene expression were mapped to protein-protein interaction networks, and modules were identified that distinguished uninfected animals from those with SIV encephalitis. This led to the finding that the transcription factor Early Growth Response 1 (EGR1), a key molecule in hippocampus-related learning and memory, is downregulated in hippocampal neurons in the encephalitic brain (Gersten et al. 2009).

While the two studies on SIV encephalitis were performed on different brain regions, using different microarrays, and distinct forms of analysis, there is indeed commonality between the results. Examination of the upregulated genes reveals that 50 genes were identified in both studies (Table 3). The DAVID bioinformatics tool indicates that these common genes reflect the host response to infection, enriched for processes linked to inflammation and immunity (not shown).

Table 3 Gene Symbols of the 50 genes upregulated in common in the two studies on SIV encephalitis (Roberts et al. 2003; Gersten et al. 2009)

MicroRNA (miRNA) studies

While protein coding (as well as ribosomal and transfer) RNAs have been long studied, the recent finding that non-coding RNAs can have significant specific effects on gene and protein expression has led to a productive new area of research. In particular miRNAs – small (approximately 21–22 nucleotide) sequences that can bind to mRNAs and affect their stability or translatability – can play an important role in neurodegenerative disorders (Eacker et al. 2009; Yelamanchili and Fox 2010).

There has been a single study to date in which comprehensive miRNA profiling has been performed in the brain of SIV-infected monkeys (Yelamanchili et al. 2010). In this work the miRNA expression pattern was compared between uninfected animals and those with SIV encephalitis, examining two regions of the brain, the caudate and the hippocampus. Six miRNAs were significantly upregulated in the caudate, and four in the hippocampus, of monkeys with SIV encephalitis. Validation of the expression changes by quantitative RT-PCR revealed that three miRNAs were indeed upregulated in both regions: miR-21, miR-142-5p and miR-142-3p. The study then focused on miR-21. In situ hybridization confirmed its upregulation in the brain and identified neurons as the cell type in which the increased expression occurred. Functionally, introducing miR-21 expression into cultured primary neurons resulted in electrophysiological abnormalities. Examination of potential mRNA targets of miR-21 in neurons revealed that the transcription factor MEF2C, crucial in multiple aspects of neuronal development, function and survival as well as learning and memory, is targeted by miR-21. Indeed, immunohistochemistry revealed a reduction of MEF2C in hippocampal neurons in animals with SIV encephalitis (Yelamanchili et al. 2010).

Studies in humans

There have been four studies of mRNA and three studies of miRNA utilizing human brain tissue. These have all been of autopsy specimens. For mRNA, one group examined frontal cortical gray matter (Masliah et al. 2004). All subjects had HIV infection, and it was found that in those with HIV encephalitis, 74 genes were downregulated and 59 were upregulated compared to HIV infected individuals without encephalitis. A number of downregulated genes encoded signaling molecules or were involved in synaptic plasticity and transmission, whereas upregulated genes included those involved in immune responses, similar to the upregulated genes found in the studies on SIV encephalitis. The second investigation also examined frontal cortical gray matter, in addition to adjacent gyral white matter, here comparing samples from uninfected individuals to those who were HIV infected with dementia (Gelman et al. 2004). While the complete results on altered genes were not reported, the analysis focused on ionic channel genes, and revealed that many were significantly altered in HIV dementia compared to uninfected control individuals. This implied that a “channelopathy” might underlie the functional neuronal pathology in neuroAIDS. Indeed, ion channels were also found altered in the first study described above (Masliah et al. 2004). A third study compared gene expression patterns in the frontal cortex (Shapshak et al. 2004) between uninfected controls and a group of HIV infected individuals, some of whom had dementia and encephalitis. The focus was on exploring different statistical clustering methods to group differentially regulated genes; of note, there was a high level of correlation of gene expression within the HIV infected group and within the uninfected group.

The fourth study compared gene expression patterns in deep white matter from control uninfected individuals to those with HIV-associated neurocognitive disorder (HAND, predominantly HIV dementia), stratifying by whether the individuals were taking antiretroviral therapy at the time of death (Borjabad et al. 2011). Profound differences in gene expression were found in those not on therapy, with 947 genes upregulated and 523 downregulated compared to uninfected controls. In those on therapy at the time of death, there were approximately 10-fold fewer differentially regulated genes. In those not on therapy, bioinformatics analysis revealed that upregulated pathways included immune responses and inflammation, whereas downregulated pathways included synaptic transmission, neurogenesis, and ion transport (Borjabad et al. 2011). Thus in the human studies, alterations in ion transport and channels and, similar to the SIV studies, in immune responses and inflammation appear to be common themes.

In the two human studies reporting complete results (Masliah et al. 2004; Borjabad et al. 2011) despite examining distinct cellular populations (gray matter and white matter) there is again commonality in a set of upregulated genes (utilizing the genes found in (Borjabad et al. 2011) in individuals not on treatment), with 15 genes related to the immune response and the interferon system (Table 4). Comparison to the common genes found in the SIV studies (Table 3) reveals that B2M, BST2, IFI44, IFIT3, IFITM1, LGALS3BP, MX1, STAT1 are upregulated in all studies, regardless of species, microarray, and means of analysis. These are all inducible by interferons, and play diverse roles in antiviral responses, indicating the active host response present in neuroAIDS. Even when examining the genes upregulated in those who were on antiretroviral therapy at the time of death (Borjabad et al. 2011) five of these eight genes found in the SIV and HIV studies (B2M, IFI44, IFIT3, MX1, STAT1) are still significantly elevated in the brain despite treatment.

Table 4 Gene Symbols of the 15 genes upregulated in common in two of the studies on HIV neuroAIDS (Masliah et al. 2004; Borjabad et al. 2011)
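The cross-study comparisons described above amount to intersecting lists of gene symbols. The following sketch shows one way this could be done, using only gene symbols named in the text: the eight genes reported as upregulated in all SIV and HIV studies, and the five of those reported as still elevated in individuals on antiretroviral therapy. The full per-study gene lists are not reproduced here, so the second set is only the portion of the treated-group list named in the text; the function name and the normalization of symbols to upper case are assumptions made for illustration.

```python
def shared_symbols(*gene_lists):
    """Return the gene symbols present in every supplied list.

    Symbols are stripped of whitespace and upper-cased before comparison,
    an assumed normalization suitable for human-style gene symbols.
    """
    normalized = [{g.strip().upper() for g in genes} for genes in gene_lists]
    return set.intersection(*normalized)


# Genes named in the text as upregulated in all SIV and HIV studies.
common_all_studies = ["B2M", "BST2", "IFI44", "IFIT3",
                      "IFITM1", "LGALS3BP", "MX1", "STAT1"]

# Subset named in the text as still elevated despite antiretroviral therapy
# (only the genes given in the text, not the complete treated-group list).
elevated_on_therapy = ["B2M", "IFI44", "IFIT3", "MX1", "STAT1"]

still_elevated = shared_symbols(common_all_studies, elevated_on_therapy)
no_longer_listed = set(common_all_studies) - still_elevated

print("Still elevated on therapy:", sorted(still_elevated))
print("Not listed for treated group:", sorted(no_longer_listed))
```

Applied to the complete gene lists deposited with each study, the same intersection would recover overlaps such as those summarized in Tables 3 and 4.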

The miRNA study on SIV infected monkeys described above also examined RNA from the caudate of individuals with HIV encephalitis and dementia in comparison to uninfected individuals. The microarray and quantitative RT-PCR revealed that the same three microRNAs found in SIV encephalitis were also upregulated in humans: miR-21, miR-142-5p and miR-142-3p (Yelamanchili et al. 2010). Another study profiled miRNAs in frontal lobe white matter from individuals with HIV encephalitis and uninfected controls. Microarray followed by quantitative RT-PCR revealed six upregulated and eight downregulated miRNAs in the brains with HIV encephalitis (Noorbakhsh et al. 2010). Interestingly, a number of the miRNAs that were downregulated targeted caspases, implying that the caspases would be de-repressed (increased). Indeed, increased activated caspase 6 was found in astrocytes in brains of those with HIV encephalitis (Noorbakhsh et al. 2010). The third study used an integrative approach of examining the expression of miRNAs and their predicted mRNA targets in the frontal cortex from three groups: uninfected controls, HIV infected, and HIV infected with major depressive disorder. No information on neurocognitive function was given, although one of the HIV/depression cases had HIV encephalitis. Using this approach, 98 miRNAs were predicted to be altered in the HIV infected groups (Tatro et al. 2010).

In the two human studies reporting values from the array results in HIV encephalitis (Noorbakhsh et al. 2010; Yelamanchili et al. 2010), despite examining distinct regions of the brain (caudate and frontal white matter), there are two miRNAs that are upregulated in both studies: miR-142-5p and miR-142-3p.

Perspectives

Transcriptomics utilizing microarray techniques has proved to be a useful tool in identifying and understanding the effects of SIV/HIV on mRNA/miRNA levels and in uncovering the basis of CNS malfunction. As our genomic knowledge grows, it will increase our understanding of how these changes reflect or result in altered functions. This was highlighted in the second mRNA study of SIV encephalitis in monkeys by the use of the species-specific microarray platform. Unexplored aspects in the monkey model and humans include alternative splicing and long non-coding RNAs. While microarrays can be utilized for some of these aspects, RNAseq, not yet utilized in neuroAIDS, promises to yield novel findings due to the depth of analysis that can be achieved. This again depends on our ability to understand the relationship of the sequences obtained to the workings of the genome/transcriptome, the knowledge of which is growing in parallel.

While monkey and human transcriptomic studies have positive and negative qualities, both are instructive in the study of effects of HIV on the brain. While common findings are highlighted above, the application of meta-analysis would increase the strength of the findings and enable better generalization of the results. Furthermore meta-analysis is not only applicable to within species studies, but can also be utilized for between species assessments. This would be a suitable tool to highlight strengths, weaknesses, similarities, and differences between nonhuman primate and human models.

While DAVID is a useful discovery tool, there are alternate or complementary bioinformatics methodologies that would benefit the study of neuroAIDS. Some of these have been utilized in the described studies, and many more are being developed. The integration of transcriptomics with other modalities, such as proteomics and metabolomics, also holds great promise. To facilitate further analysis, the GEO accession numbers for the studies described here that have been made publicly available are listed in Table 5.

Table 5 GEO accession numbers for the data that can be accessed at http://www.ncbi.nlm.nih.gov/geo/ for non-human primate and human mRNA experiments described in the text. HAND = HIV associated neurocognitive disorder

The utility of SIV infected nonhuman primate models of neuroAIDS for transcriptomic studies will prove to be even greater as technologies become faster and less expensive and our knowledge of the monkey genome grows. The ability to assess changes in the brain under controlled conditions and to examine the brain before death from advanced disease are features unique to this model. This will enable a better understanding of the mechanisms of disease, and the development and testing of means to treat and prevent the CNS complications of HIV infection.