
Immunology, the study of the immune system, began in the late nineteenth century with two significant discoveries. One was phagocytosis by macrophages, a critical host-defense mechanism against invading pathogens, found by Elie Metchnikoff (1845–1916). The other was the antibody, which can neutralize microbial toxins, discovered by Emil von Behring (1854–1917) and Paul Ehrlich (1854–1915) (Kaufmann 2017). Since then, immunology has been a field of intensive biomedical research and, as it developed, has contributed to society by providing pivotal knowledge for both basic science and clinical applications. While the classical function of the immune system is to protect our bodies from diverse pathogenic microorganisms, including bacteria, viruses, and parasites, recent immunological studies have revealed roles for different parts of the immune system in eliminating cancer cells and in regulating physiological processes in diverse tissues, such as nervous system function, metabolic state, thermogenesis, and tissue repair (Chaplin 2010; Rankin and Artis 2018; Rouse and Sehrawat 2010). We now recognize that the immune system is a multifunctional biological system that is vital for our health and survival.

The immune system is composed of different types of immune cells, which do not form a single organ like the brain or heart but are spread throughout the body to respond rapidly to invading pathogens. Because transcriptional regulation plays a crucial role in shaping immune cells of diverse differentiation and activation states, various immune cells have been examined at the transcriptional level. These profiling analyses have yielded relevant insights into the regulatory mechanisms and crucial factors involved in the differentiation and activation of immune cells (Amit et al. 2011; Lara-Astiaso et al. 2014; Mostafavi et al. 2016; Smale and Fisher 2002; Uhlen et al. 2019). Furthermore, recent single-cell transcriptome analyses by single-cell RNA sequencing (scRNA-seq) provide unprecedented high-resolution insights into immune cells that cannot be captured by bulk studies, and they are expected to advance our understanding of the nature of immune cells in both physiological and pathological contexts (Proserpio and Mahata 2016; Roy 2019; Seumois and Vijayanand 2019; Stubbington et al. 2017; Xie et al. 2021). Here, we introduce general features of the immune system and discuss the transcriptome analyses applied to explore it.

1 The Immune System and Immune Cells

The immune system not only protects us from diverse infections, including those caused by bacteria, viruses, and parasites, but also eliminates cancer cells and heals wounds. The efficiency of immune activity relies on the orchestrated functions of a set of different types of immune cells, which are responsible for the diverse steps of the process. In the case of infections, for example, these steps include pathogen recognition, the cascade that recruits and activates effector cells, and the final clearance by other immune cells. Identifying the different types of cells involved in the immune process has been a major target of immunological research for decades. Accordingly, the number of known immune cell types has expanded, which has contributed to the dissection of immune functions. This effort began with the discovery of white blood cells in 1843 by Gabriel Andral (1797–1876) and William Addison (1802–1881). Since then, different types of immune cells have been progressively identified along with the development of technologies such as flow cytometry in the 1960s and monoclonal antibodies in the 1970s, which were employed together to specify, for instance, CD4+ T cells and CD8+ T cells (Hajdu 2003; Jayasinghe 2020; Packer 2021). The major populations of immune cells, including granulocytes and macrophages with an innate ability to phagocytose bacteria and antibody-producing B cells, were discovered before the 1990s, and more than 80 immune cell populations are recognized to date (Fig. 10.1) (Ackerman 1964; Hayakawa et al. 1983; Maecker et al. 2012; Stein et al. 1992).

Fig. 10.1 Overview of immune cell populations

Arrows indicate schematic representation of the standard model for hematopoietic stem cell differentiation. Self-reactive T cells are eliminated by negative selection after the CD4/CD8 double-positive (DP) stage in the thymic medulla with the help of mTECs. ILC innate lymphoid cell, NK natural killer cell, gdT gamma delta (γδ) T cell, Treg regulatory T cell, DC dendritic cell

While different immune cells possess distinctive functions, essentially all immune cells develop from hematopoietic stem cells in the bone marrow and share the same genome except for rearranged genes (i.e., T cell receptor and immunoglobulin genes). Through differentiation pathways that can be parsed into as many as ten successive steps, immune cells acquire their divergent capabilities, which are established by corresponding transcriptional landscapes (Hardy and Hayakawa 2001; Rothenberg 2014). As such, transcriptional regulation is the most fundamental mechanism controlling immune cells and the immune system. Thus, it is not surprising that the recent development of single-cell transcriptomics promises to confirm existing populations and to unveil new populations efficiently and in an unbiased manner.

2 Transcriptome Analysis of Different Subsets in Bulk

Although more than 80 immune cell subsets are recognized throughout the body, many of them residing in lymph nodes, tissues, and organs, immunocytes in peripheral blood are the most accessible cells for research and clinical diagnostics (Chou and Li 2018; Maecker et al. 2012; Novershtern et al. 2011). A few large transcriptomic studies have been performed on different immune cell populations in blood. For example, 13 immune cell types in peripheral blood were examined by Schmiedel et al. (2018), 29 immune cell types by Monaco et al. (2019), and 18 immune cell types by Uhlen et al. (2019). In these studies, immune cells in blood, such as monocytes, natural killer (NK) cells, neutrophils, basophils, B cells, CD4+ and CD8+ T cells, and dendritic cells (DCs), were isolated using known markers and fluorescence-activated cell sorting (FACS), and whole transcriptomes were profiled by RNA sequencing (RNA-seq) or microarrays. These studies revealed the distinctive global expression profiles of various immune cells: granulocyte cell types (neutrophils, basophils, and eosinophils) are distinct from the others, and all lymphocytes, including T cells, NK cells, and B cells, form one cluster, whereas monocytes are closely related to DCs. According to the study by Uhlen et al., among ~16,000 genes detected in the 18 immune cell types, ~10,000 were detected in each individual cell type, a number comparable to that detected in cell lines (~9,500 genes per cell line). Of these, 5,934 genes were detected across all immune cells, 1,713 genes were detected in a single cell type, and 9,939 genes showed low specificity for cell types. The sets of differentially expressed and co-expressed genes were used to deduce functional modules of genes with the aid of bioinformatic methods such as enrichment analysis using the Gene Ontology (GO). Furthermore, these cell-type-resolved transcriptome atlases are valuable for promoting the understanding of primary immunodeficiency diseases (PID).
PID are a large group of over 400 different diseases caused by quantitative and functional changes in the various mechanisms involved in the immune response. They are associated with complications including infections, autoimmune disorders, immune dysregulation with lymphoproliferation, inflammatory disorders, lymphomas, and other types of cancers (Amaya-Uribe et al. 2019; Sánchez-Ramón et al. 2019). While PID are caused by genetic disorders, and 354 diseases involving 224 identified genes were listed as consequences of monogenic defects in genes associated with the immune system, the mechanism of disease is often incompletely understood (Uhlen et al. 2019) (https://www.omim.org). Uhlen et al. hypothesized that analyzing the cellular expression of the identified genes could support a better mechanistic understanding, and they analyzed the 224 PID genes across their 18 immune cell populations. They divided these PID genes into seven clusters according to expression patterns shared among cell populations and found that some PID genes are expressed specifically in restricted populations. These included the CEBPE gene, mutations in which can cause specific granule deficiency 1 (SG1), which was highly expressed in eosinophils. Although SG1 has been considered a neutrophil-granule deficiency associated with recurrent pyogenic infections, the expression of CEBPE in eosinophils suggested that eosinophil deficiency might also be involved in SG1.
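The enrichment analysis mentioned in this section asks whether a gene set (for example, genes specific to one cell type) contains more members of an annotation term than expected by chance. The following is a minimal sketch, assuming a one-sided hypergeometric test, which is a common choice but not necessarily the test used in the cited studies. The function name `enrichment_p_value` and all counts are hypothetical.

```python
from math import comb


def enrichment_p_value(universe_size, term_size, selected_size, overlap):
    """One-sided hypergeometric p-value: P(X >= overlap).

    universe_size: number of genes considered in total
    term_size:     genes in the universe annotated with the term
    selected_size: genes in the list being tested
    overlap:       selected genes that carry the annotation
    """
    max_overlap = min(term_size, selected_size)
    total_draws = comb(universe_size, selected_size)
    tail = sum(
        comb(term_size, k) * comb(universe_size - term_size, selected_size - k)
        for k in range(overlap, max_overlap + 1)
    )
    return tail / total_draws


# Hypothetical counts, chosen only for illustration.
p_value = enrichment_p_value(
    universe_size=1000, term_size=50, selected_size=20, overlap=5
)
print(f"p = {p_value:.4g}")
```

With these illustrative counts, about one annotated gene would be expected in the list by chance, so an overlap of five yields a small p-value. In practice, many terms are tested at once, and a multiple-testing correction would be needed.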

It is also worth noting that the variable mRNA abundance in different immune cells was carefully examined in the study by Monaco et al., who developed an enhanced method for normalization. Normalization for mRNA abundance can be essential for differential expression analyses. For example, suppose an analysis compares two cell types with substantially different total amounts of mRNA per cell (e.g., both express the same 10 mRNA molecules of gene X, but cell type A contains 100 mRNA molecules in total and cell type B contains 1,000). Without accounting for this difference, one can reach the misleading conclusion that gene X is downregulated in cell type B. Indeed, existing normalization methods for transcriptome profiling, such as upper quartile (UQ), trimmed mean of M-values (TMM), and relative log expression (RLE), cannot correctly handle transcriptomes in which the overall transcriptional activity is suppressed or enhanced (Anders and Huber 2010; Bullard et al. 2010; Robinson and Oshlack 2010). Normalizing mRNA abundance also becomes relevant when analyzing transcriptomes from heterogeneous cell populations, such as peripheral blood mononuclear cells (PBMCs), with a deconvolution method. Deconvolution computationally estimates the proportions of distinct cell types in a heterogeneous sample, using the normalized abundance of mRNA in each cell type as a reference. It is an effective way to determine the composition of immune cell types in PBMCs (Abbas et al. 2009; Shen-Orr and Gaujoux 2013). Considering that the proportions of immune cell subsets in PBMCs can be dynamically affected by disease, age, or interventions (e.g., vaccines and drugs), the composition of immune cell populations needs to be carefully evaluated. Otherwise, it is not always possible to determine accurately which immune cell types are responsible for a given transcriptomic change in PBMCs, and transcriptome profiling can yield results that are inconclusive or difficult to interpret.
Hence, the appropriate normalization method is crucial for differential expression analyses and deconvolution approaches. Monaco et al. developed an advanced and robust normalization method that can be applied for future transcriptome analyses of PBMCs by taking advantage of the breadth and granularity of the datasets from 29 isolated immune cell types.
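The gene X example in this section can be worked through numerically. The sketch below uses only the numbers given in the text and assumes that the total mRNA content per cell is known. It illustrates the bias itself, not the normalization method of Monaco et al., and all variable names are hypothetical.

```python
# Numbers from the example in the text: gene X has 10 mRNA molecules per cell
# in both cell types, but the total mRNA content per cell differs tenfold.
gene_x_molecules = {"A": 10, "B": 10}
total_mrna_per_cell = {"A": 100, "B": 1000}

# Library-size style normalization keeps only the relative share of gene X.
relative_share = {
    cell: gene_x_molecules[cell] / total_mrna_per_cell[cell]
    for cell in gene_x_molecules
}
apparent_fold_change = relative_share["B"] / relative_share["A"]

# Rescaling by total mRNA per cell (assumed to be known here) restores
# the absolute number of molecules per cell.
absolute_molecules = {
    cell: relative_share[cell] * total_mrna_per_cell[cell]
    for cell in relative_share
}
rescaled_fold_change = absolute_molecules["B"] / absolute_molecules["A"]

print(f"apparent fold change (B vs A): {apparent_fold_change:.2f}")
print(f"fold change after rescaling:   {rescaled_fold_change:.2f}")
```

On relative shares alone, gene X appears roughly tenfold lower in cell type B even though the number of molecules per cell is identical. Accounting for total mRNA abundance removes this artifact.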

In summary, transcriptome analyses of immune cells isolated from peripheral blood elucidate the divergent gene expression patterns of individual immune cell populations, which promotes our understanding of diseases related to the immune system. These analyses are also critical as a resource for analyzing transcriptomes obtained from whole peripheral blood. Furthermore, large transcriptomic studies of isolated immune cells provide opportunities to develop and validate analysis pipelines, which would be impractical with heterogeneous samples.
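To illustrate how reference profiles from isolated cells can serve as a resource for mixed samples, the sketch below estimates cell-type proportions by non-negative least squares. It is a toy example under stated assumptions: the reference values and proportions are invented, the mixture is noise-free, and the solver uses simple multiplicative updates chosen only because they need no external library. Published deconvolution methods use other solvers and careful normalization.

```python
def deconvolve(signature, mixture, n_iter=500):
    """Estimate non-negative cell-type proportions from a mixed profile.

    signature: dict mapping cell type -> list of reference values (one per gene)
    mixture:   list of values measured in the mixed sample (same gene order)
    Uses multiplicative updates for non-negative least squares, then
    rescales the weights so that they sum to 1.
    """
    cell_types = list(signature)
    n_genes = len(mixture)
    weights = {c: 1.0 / len(cell_types) for c in cell_types}
    for _ in range(n_iter):
        # Profile predicted by the current weights.
        predicted = [
            sum(weights[c] * signature[c][g] for c in cell_types)
            for g in range(n_genes)
        ]
        for c in cell_types:
            numerator = sum(signature[c][g] * mixture[g] for g in range(n_genes))
            denominator = sum(signature[c][g] * predicted[g] for g in range(n_genes))
            if denominator > 0:
                weights[c] *= numerator / denominator
    total = sum(weights.values())
    return {c: w / total for c, w in weights.items()}


# Hypothetical reference profiles: three marker-like genes plus one gene
# expressed equally in all cell types (arbitrary units).
reference_profiles = {
    "T cell": [100.0, 5.0, 5.0, 50.0],
    "B cell": [5.0, 80.0, 5.0, 50.0],
    "monocyte": [5.0, 5.0, 120.0, 50.0],
}
true_proportions = {"T cell": 0.6, "B cell": 0.1, "monocyte": 0.3}

# Simulated mixed sample: a weighted sum of the reference profiles.
mixed_profile = [
    sum(true_proportions[c] * reference_profiles[c][g] for c in reference_profiles)
    for g in range(4)
]

estimated = deconvolve(reference_profiles, mixed_profile)
for cell_type, proportion in estimated.items():
    print(f"{cell_type}: {proportion:.3f}")
```

Because the simulated mixture is an exact weighted sum of the references, the estimate recovers the assumed proportions. With real data, noise and differences in mRNA content per cell make the result depend strongly on normalization.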

Another point to mention here is that immune cells have been exploited to investigate the regulatory mechanisms of gene expression. The immune system serves as an excellent model for exploring gene regulation across changing cell states, as discrete cell populations can be readily purified with well-established markers along differentiation and activation pathways that have been carefully characterized. We took advantage of the breadth and granularity of immune cells to study the dynamic epigenetic landscapes associated with target gene expression (Yoshida et al. 2019). The study provided deep insight into immunological differentiation and function and into the broad relevance of gene regulatory elements in the genome, such as a profound dichotomy within mammalian gene regulation by enhancers and promoters.

3 Transcriptome Analyses Employing Whole Blood and PBMCs

Blood is an invaluable source to examine our health not only because of the easy accessibility and minimal invasiveness during sampling but also because of the breadth of information it can provide (Sohn 2017). Transcriptomes in PBMCs have also been investigated intensively for scientific research (Corkum et al. 2015; Mello et al. 2012), as well as in medical contexts such as ischemic stroke (Baird et al. 2015), ulcerative colitis (Miao et al. 2013), epilepsy (Karsten et al. 2011), and sepsis (Davenport et al. 2016) to characterize diseases, and epidemiological contexts including aging (Peters et al. 2015), obesity (Homuth et al. 2015), and lifestyle factors such as smoking, drinking, and nutrition (Burton et al. 2018; Dumeaux et al. 2010).

Since PBMCs can include various naïve and activated immune cells recirculating throughout the body, PBMC transcriptome analyses are expected to promote the characterization of the whole immune system. However, due to the heterogeneity and dynamics of the immune cell types that make up PBMCs, cell population-level resolution has not been achieved so far, even with cutting-edge approaches such as the deconvolution mentioned above. Thus, straightforward immunological interpretations (e.g., a specific immune cell population is larger in donor A than in donor B, or a set of genes is more activated in immune cell population X in donor A than in donor B) are not readily possible. Accordingly, different approaches employing systems biology are preferentially applied to analyze transcriptomes from PBMCs (Chaussabel 2015).

Systems biology is an approach in biomedical research that aims to understand the larger picture hidden in a biological system by putting pieces of information from the system together. Because a hypothesis is constructed from all observed parameters associated with a given biological system, systems biology is compatible with high-throughput technologies called “omics,” such as genomics, transcriptomics, proteomics, and metabolomics, by which a biological system is comprehensively profiled (Aizat et al. 2018; Veenstra 2021). In omics, the parameters are not chosen in advance as in more traditional assays, and these approaches are inherently unbiased. Importantly, as the potency of systems biology intrinsically relies on the variability of the observed parameters, the size and heterogeneity of a dataset are crucial for systems biology analyses, and more informative results can be expected from larger datasets (Koumakis 2020; McCue and McCoy 2017; Qin et al. 2015). Schmidt et al. reported an analysis of the blood transcriptomes of 3,388 adult individuals (mean age = 58 years), together with phenotypic attributes including disease history, medication status, lifestyle factors, and body mass index (BMI) (Schmidt et al. 2020). Although earlier studies had analyzed blood transcriptomes, they comprised relatively small sample sizes related to specific diseases, which restricted their analytical power because of the limited variability in transcriptomic states and health conditions. Schmidt et al. demonstrated the diversity of blood transcriptomes with modules of co-expressed genes linked to different biological functions. They visualized the molecular heterogeneity of transcriptomes in combination with different phenotypic statuses by employing state-of-the-art machine learning methods.
The results include two major transcriptomic types: one related to inflammation, enhanced in male, elderly, and overweight people, and the other related to activated immune responses, enhanced in female, younger, and normal-weight people. They also found that transcriptome signatures associated with the immune response and with increased inflammatory processes are shared among multiple diseases, aging, and obesity, indicating common underlying mechanisms.

Together, transcriptome analyses employing blood or PBMCs are not a straightforward way to elucidate biological processes at the cell population level or to characterize specific immune processes. However, they provide an unprecedented opportunity to evaluate various diseases and lifestyle factors. They will be applicable to medical diagnostics and to molecular and epidemiological research, which will contribute to the promotion of personalized medicine.

4 Transcriptome Analyses: From Bulk to Single Cells

As mentioned earlier, transcriptome analyses using isolated immune cells as well as blood cells help advance our understanding of the immune system by shedding light on disease pathogenesis and global immunity. However, because these are averaged profiles of immune cells and the transcriptomes of minor cell populations are masked by those of major ones, rare subsets of cells cannot be detected even when they are responsible for an immune phenotype. Heterogeneity is evident in blood cells and PBMCs, but cells isolated by FACS according to their markers can also be heterogeneous, because immune cell types are too diverse to be entirely separated by known markers. Furthermore, immune cells can be activated transiently and in an unsynchronized manner by various stimuli, such as pathogens and proteins secreted from other cell types (e.g., cytokines). Heterogeneity should also be considered when rare subsets of cells (e.g., antigen-specific T cells or B cells) drive immune responses through temporary activation (Chattopadhyay et al. 2014; Mostafavi et al. 2016). Hence, single-cell analysis is most valuable when seeking rare, distinctive subsets of cells related to biological outcomes, for example, when rare cells are essential for conferring protection or inducing a pathological state.
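The masking effect described above can be made concrete with a small numerical sketch. The expression values and the 1% subset size are hypothetical, chosen only to show that two very different cell populations can yield the same bulk average.

```python
from statistics import mean

n_cells = 1000

# Scenario 1: a rare subset (1% of cells) expresses the gene strongly.
rare_subset = [100.0] * 10 + [0.0] * (n_cells - 10)

# Scenario 2: every cell expresses the gene weakly.
uniform_low = [1.0] * n_cells

# A bulk measurement only sees the average over all cells.
bulk_rare = mean(rare_subset)
bulk_uniform = mean(uniform_low)


def fraction_expressing(cells, threshold=0.0):
    """Fraction of cells with expression above the threshold."""
    return sum(1 for value in cells if value > threshold) / len(cells)


print(f"bulk average, rare subset:  {bulk_rare}")
print(f"bulk average, uniform low:  {bulk_uniform}")
print(f"cells expressing, rare:     {fraction_expressing(rare_subset):.1%}")
print(f"cells expressing, uniform:  {fraction_expressing(uniform_low):.1%}")
```

The two bulk averages are identical, whereas the per-cell view separates a strongly expressing 1% subset from uniform low expression. This is the kind of distinction that single-cell profiling is meant to recover.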

5 Recent Development of Single-Cell Transcriptomics

Before transcriptome analyses of single cells became possible, cDNA synthesis and amplification from a single cell were first achieved by Iscove and colleagues in 1990 and by Coleman and colleagues in 1992 (Brady et al. 1990; Eberwine et al. 1992). The cDNA was analyzed using DNA microarrays in the early 2000s, and the approach was subsequently combined with next-generation sequencing (NGS) technology to give single-cell RNA sequencing (scRNA-seq) around 2010 (Bengtsson et al. 2005; Islam et al. 2011; Klein et al. 2002; Kurimoto et al. 2006; Tang et al. 2009). Various scRNA-seq methods have been developed, ranging from lower-throughput methods that yield more detailed full-length transcriptomic data from individual cells to higher-throughput methods with coverage focused on the 3′ end of the transcript (Jaitin et al. 2014; Klein et al. 2015; Macosko et al. 2015; Picelli et al. 2013). Currently, scRNA-seq employing a commercial kit from 10x Genomics (Pleasanton, CA) is presumably the most popular. It allows profiling of up to 10,000 cells at a time and has been used in more than 1,000 publications (Daniloski et al. 2021; Stewart et al. 2020).

6 Applying scRNA-seq for Immune Cells

The heterogeneity of immune cells mirrors the unusual flexibility of the immune system and is essential for protecting our bodies efficiently from diverse pathogens. It was recognized in the 1970s and successively confirmed along with the identification of the cluster of differentiation (CD) antigens using monoclonal antibodies (Engel et al. 2015; Talal 1973). For example, a type of T cell marked by the CD4 glycoprotein on the cell surface was identified around 1980. Subtypes including Th1, Th2, Th17, and regulatory T cells (Tregs) were identified later (Engleman et al. 1981; Harrington et al. 2005; Mosmann et al. 1986; Park et al. 2005; Sakaguchi et al. 1995). However, given that the distinctions between these subtypes are defined by the expression of a few specific markers, these classifications might be a considerable simplification. Indeed, Teichmann and colleagues used scRNA-seq to demonstrate a subpopulation of Th2 cells that produces the steroid pregnenolone (Mahata et al. 2014). Importantly, because scRNA-seq provided a comprehensive transcriptome, they could identify co-regulated genes in the subpopulation, which facilitated the characterization of the cells. Shalek et al. also used scRNA-seq to report transcriptomic heterogeneity within seemingly homogeneous bone-marrow-derived dendritic cells (BMDCs) (Shalek et al. 2013). They found that hundreds of key immune genes, including genes very highly expressed at the population level, are bimodally expressed across cells. While these pioneering studies employed mouse cells, scRNA-seq approaches were later applied effectively to human cells as well.

Karamitros et al. employed scRNA-seq to investigate the transcriptomic differences between progenitor populations in human cord blood (i.e., lymphoid-primed multipotential progenitors, LMPPs; granulocyte-macrophage progenitors, GMPs; and multi-lymphoid progenitors, MLPs, which were FACS-isolated according to known markers) (Karamitros et al. 2018). They revealed that these progenitors were transcriptionally distinct yet heterogeneous at the single-cell level, with cells from different progenitor populations showing a transcriptional continuum. Combining these findings with the results of functional assays, they argued that lymphoid and myeloid differentiation is executed by a continuum of progenitors rather than by uni-lineage progenitors downstream of stem cells. Considering that functional assays can only demonstrate potential rather than actual cell fate in vivo, and that a failure to display functional potential might reflect a limitation of the assay, transcriptome analysis contributed substantially to clarifying progenitor fate in vivo. Recently, Xie et al. profiled 7,551 human blood cells isolated from 21 healthy donors (Xie et al. 2021). They isolated 32 immunophenotypic cell types by FACS and measured the transcriptomes of single cells by scRNA-seq. These cells include hematopoietic stem cells, progenitors, and mature immune cells, representing the whole blood system. The transcriptomic profiles of these 7,551 cells constitute a comprehensive atlas of hematopoietic cells at single-cell resolution. In addition to enabling the identification of putative long non-coding RNAs (lncRNAs) and transcription factors that regulate the differentiation of immune cells, the atlas is valuable as a resource that the community can use to understand the transcriptomic regulation underlying hematopoiesis and immune cell differentiation.

7 scRNA-seq Analysis in Diseases

Measuring transcriptomes at single-cell resolution by scRNA-seq is transforming our understanding of immune cells in physiological settings, as mentioned above. In addition, this approach has afforded new options for studying the immune response in pathological conditions. What types of cells are responsible for the dysregulated immune response in disease? With comprehensive transcriptome profiling at single-cell resolution, it is possible to examine whether new pathogenic cell subsets develop in disease and whether they are accompanied by the expansion (or contraction) of physiological cell subsets. For example, Golumbeanu et al. employed scRNA-seq to dissect HIV-infected primary CD4+ T cells (Golumbeanu et al. 2018). HIV can persist in latently infected cells despite effective treatment, which hampers HIV eradication. Hence, so-called “shock and kill” strategies have been developed that aim to reactivate HIV production in latent cells, so that these cells die from virus-mediated cytotoxicity or are killed by cytotoxic CD8+ T cells. However, reactivation of HIV expression is limited to a fraction of latent cells, which suggested that latently infected cells are heterogeneous. Using scRNA-seq, Golumbeanu et al. identified two major cell subpopulations characterized by a set of 134 differentially expressed genes (DEGs). Gene ontology analysis revealed enrichment of genes involved in viral processes, translational regulation, RNA and protein metabolism, and cell activation among these DEGs, which indicates a different HIV reactivation potential for each cluster. They argue that these DEGs are valuable for identifying successful reactivation and potential biomarkers of inducible cells.

The composition of the tumor microenvironment (TME) is known to affect the prognosis of cancer patients. For example, higher infiltration of cytotoxic and memory CD8+ T cells, Th1 CD4+ cells, and NK cells is usually associated with better outcomes, whereas Th2 and Th17 CD4+ cells and Tregs are associated with poor prognosis in several cancers (Fridman et al. 2012). Indeed, while immunotherapies for lung cancer can significantly improve the prognosis of patients, their efficacy varies and depends in part on the number and properties of tumor-infiltrating T cells. Guo et al. investigated the heterogeneity of tumor-infiltrating T cells by scRNA-seq (Guo et al. 2018). They performed scRNA-seq on 12,346 T cells from 14 untreated non-small-cell lung cancer (NSCLC) patients to comprehensively characterize the infiltrating T cells in terms of composition, lineage, and functional status, and they demonstrated heterogeneity within exhausted CD8+ T cells and Tregs. T-cell exhaustion was originally identified in mice during chronic infection and was later observed in cancer patients (Jiang et al. 2015; Pauken et al. 2016). Exhausted T cells in the TME are in a hyporesponsive state, with increased expression of inhibitory receptors and decreased production of effector cytokines, which leads to the failure of cancer elimination. Reinvigorating exhausted T cells with agents such as anti-CTLA-4 (ipilimumab) and anti-PD-1 (nivolumab and pembrolizumab) antibodies represents a promising strategy to treat cancer. Since scRNA-seq facilitates trajectory inference, or so-called pseudo-time ordering, which estimates cellular identity along a continuous differentiation process without prior knowledge, they could analyze CD8+ T cells undergoing exhaustion in the TME and infer two clusters of cells preceding exhaustion, including their transcriptome signatures (Saelens et al. 2019).
They employed the transcriptome datasets from TCGA LUAD (The Cancer Genome Atlas Lung Adenocarcinoma) and demonstrated that a high ratio of pre-exhausted to exhausted T cells was associated with a better prognosis. Furthermore, they identified heterogeneity within Tregs in the TME, marked by the bimodal expression pattern of TNFRSF9, a known activation marker for Tregs. They found that a set of 260 genes, including REL and LAYN, which are associated with immunosuppressive functions, is highly expressed in TNFRSF9+ Tregs compared with TNFRSF9− Tregs. Importantly, survival analysis employing the TCGA LUAD dataset indicated that higher expression of these 260 genes was predictive of a worse prognosis. These results demonstrate the efficacy of scRNA-seq for revealing heterogeneity in immune cell populations and identifying potential clinical biomarkers.
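One common way to carry a gene signature found by scRNA-seq over to bulk datasets is to summarize it as a per-sample score and then stratify samples by that score. The sketch below illustrates this idea only. It is not the procedure of Guo et al.; the gene names are placeholders, the expression values are invented, and the median split is an assumption. A survival comparison between the resulting groups would follow but is not shown.

```python
from math import log2
from statistics import mean, median


def signature_score(expression, signature_genes):
    """Mean log2(expression + 1) over the signature genes present in a sample."""
    values = [
        log2(expression[gene] + 1.0)
        for gene in signature_genes
        if gene in expression
    ]
    if not values:
        raise ValueError("no signature gene found in this sample")
    return mean(values)


# Hypothetical bulk expression values (arbitrary units) for four samples.
bulk_samples = {
    "sample_1": {"GENE_A": 120.0, "GENE_B": 80.0, "GENE_C": 5.0},
    "sample_2": {"GENE_A": 10.0, "GENE_B": 12.0, "GENE_C": 40.0},
    "sample_3": {"GENE_A": 60.0, "GENE_B": 90.0, "GENE_C": 8.0},
    "sample_4": {"GENE_A": 15.0, "GENE_B": 6.0, "GENE_C": 30.0},
}
signature_genes = ["GENE_A", "GENE_B"]  # placeholder names, not real markers

scores = {
    name: signature_score(expression, signature_genes)
    for name, expression in bulk_samples.items()
}

# Split samples into high and low groups at the median score.
cutoff = median(scores.values())
groups = {
    name: ("high" if score > cutoff else "low")
    for name, score in scores.items()
}

for name in bulk_samples:
    print(f"{name}: score = {scores[name]:.2f}, group = {groups[name]}")
```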

Compared with bulk RNA-seq, scRNA-seq detects transcriptomic nuances in single cells, which helps reveal heterogeneity in a seemingly homogeneous population. With state-of-the-art machine learning and big data analytics, scRNA-seq is becoming valuable for identifying unknown subpopulations and their transcriptome signatures that affect biological processes and disease diagnosis. However, it is worth noting that scRNA-seq also has limitations compared with bulk RNA-seq, including relatively low sensitivity, biased transcriptome coverage, and higher overall cost. Hence, we anticipate that bulk RNA-seq will not lose its value.

8 The Paradigm of Self vs. Non-self from a Transcriptomic Viewpoint

Heterogeneity in immune cells includes diversity at the DNA level, in addition to the RNA- and protein-level diversity that establishes heterogeneity at the population level, as discussed above. At the DNA level, the diversity of T-cell receptors (TCRs) and B-cell receptors (BCRs) is estimated to be on the order of 10^7, whereas the human genome contains roughly 30,000 genes (Fugmann et al. 2000; Nikolich-Zugich et al. 2004). These receptors are produced by somatic DNA recombination, called V(D)J recombination, in developing lymphocytes during the early stages of T and B cell differentiation. This exceptional diversity endows the immune system with potent effector mechanisms to destroy and eliminate a broad range of pathogenic microorganisms. Because the recombination is nearly random, which is appropriate for achieving reactivity to targets of essentially unlimited diversity, it also creates the possibility of self-reactivity. Therefore, it is critical for the immune system to have mechanisms that discriminate self from non-self to avoid destroying the host’s own tissues. The capability of the immune system to avoid damaging the host’s tissues is known as self-tolerance. As the failure of self-tolerance is associated with various autoimmune diseases, this mechanism has been broadly studied in immunology (Besnard et al. 2021; Klein et al. 2014; Sakaguchi et al. 2020).

One of the pivotal roles of T cells is to recognize and kill host cells infected by microbes, which would otherwise serve as factories for microbial replication. This is managed by a mechanism in which infected cells present a molecular complex of microbial antigens and Major Histocompatibility Complex (MHC) class I molecules on the cell surface; these cells are then recognized and killed by T cells with a compatible TCR. As MHC molecules also present normal self-peptides on the cell surface, it is crucial for T cells to maintain self-tolerance. Negative selection of self-reactive T cells is an important process in the thymus, in which developing T cells are eliminated if their TCRs react to self-peptides on MHC molecules. Intriguingly, essentially all protein-coding genes are expressed in sets of cells in the thymus. Negative selection therefore functions effectively and comprehensively in the thymus, where thymic epithelial cells (TECs) play a pivotal role. In the final section, we discuss how the establishment of self-tolerance in the thymus has been studied using transcriptomic data obtained by novel technologies.

8.1 Thymic Epithelial Cells (TECs) in the Thymic Stroma

The thymus is a highly specialized organ for the establishment of self-tolerance, which is characterized by the “education” of immature T cells. The key function of the thymus is to provide diverse competent T cells that can recognize and eliminate foreign antigens while remaining tolerant to self-components. This complicated process is mainly orchestrated by TECs, which form reticular structures in the thymus. TECs are divided into two major subsets by their localization, molecular characteristics, and functions: cortical TECs (cTECs) and medullary TECs (mTECs). Specifically, cTECs are responsible for T-cell lineage commitment and positive selection, while mTECs contribute to the negative selection of self-reactive T cells and/or their cell-fate diversion into Treg lineages (Kyewski and Klein 2006; Matsumoto et al. 2019). These unique roles of mTECs are achieved by the expression and presentation of diverse self-antigens complexed with MHC molecules on the surface of mTECs. Notably, to effectively screen the considerable number of self-reactive thymocyte clones, mTECs are equipped with a unique capacity to express almost 90% of the coding genome, including thousands of tissue-restricted antigens (TRAs) (Kadouri et al. 2020). As expected, the impairment of this “central tolerance” machinery can result in various autoimmune diseases. However, most autoimmune diseases are multifactorial, making it difficult to elucidate their pathogenesis. In this regard, autoimmune regulator (AIRE) and forkhead box P3 (FOXP3), both of which encode transcription factors, are notable in that mutation of a single gene causes severe autoimmunity. Given its close relationship with TECs, we focus on and review Aire, an intriguing transcriptional regulator.

8.2 Aire in mTEC

The human AIRE gene was first cloned as the causative gene for autoimmune polyendocrinopathy-candidiasis-ectodermal dystrophy (APECED) (Finnish-German APECED Consortium 1997; Nagamine et al. 1997). APECED shows autosomal recessive inheritance and patients have been preferentially reported in certain populations such as Finns, Norwegians, Sardinians, and Iranian Jews (Myhre et al. 2001). The human AIRE gene is composed of 14 exons and is located in the region q22.3 of chromosome 21, encoding a 545 amino-acid protein with a molecular weight of 57.5 kDa (Pitkanen et al. 2000). Importantly, Aire is almost exclusively expressed in mTECs in the thymus.

The symptoms of APECED patients are characterized by a variable combination of (i) failure of the endocrine organs, (ii) chronic mucocutaneous candidiasis, and (iii) dystrophy of the ectoderm-derived tissues (Ahonen et al. 1990). Hypoparathyroidism, adrenal insufficiency (Addison disease), and chronic mucocutaneous candidiasis are regarded as the triad of APECED. Notably, APECED patients have high levels of serum autoantibodies that react specifically with components of the affected organs, such as antibodies against steroidogenic enzymes of the P450 superfamily (e.g., P450c21 and P450c17) in the adrenal cortex (Peterson and Peltonen 2005). Furthermore, unique neutralizing autoantibodies to type I IFN and Th17-related cytokines are frequently detected in patients, and these antibodies have been considered responsible for the development of chronic mucocutaneous candidiasis (Kisand et al. 2010; Puel et al. 2010). However, this long-standing hypothesis was recently challenged by another group, who argued that aberrantly enhanced type 1 immunity in the patients promotes susceptibility to Candida infection (Break et al. 2021).

Following the identification of the human AIRE gene, Aire-knockout (Aire-KO) mice (B6 genetic background) were generated to elucidate the mechanisms linking Aire deficiency to the breakdown of self-tolerance (Anderson et al. 2002). Although the Aire-KO mice showed a milder phenotype than APECED patients, they developed lymphocytic infiltrates in several organs together with the production of several autoantibodies. Remarkably, Aire-deficient mTECs showed a considerable reduction in TRA expression, suggesting a model in which Aire functions predominantly as a direct transcriptional activator of TRA genes and reduced TRA expression is the cause of autoimmunity in Aire-KO mice (Anderson et al. 2002). This model is clear and reasonable, but some questions remain. Kuroda et al. reported that autoantibodies against α-fodrin were produced in their Aire-deficient mouse model even though mRNA levels of α-fodrin in mTECs were not reduced (Kuroda et al. 2005). Another example came from Aire-KO mice on the NOD (non-obese diabetic) background, which developed severe autoimmune pancreatitis targeting acinar cells, accompanied by the production of autoantibodies against pancreas-specific protein disulfide isomerase (PDIp), even though PDIp expression was retained in Aire-deficient mTECs (Niki et al. 2006). Although further study is required, Aire-dependent TRA reduction may not be the sole factor in the breakdown of self-tolerance in Aire-KO mice. In this regard, a role for Aire in the maturation program of mTECs has been proposed (Matsumoto 2011). Interestingly, each TRA is expressed in only a small fraction of mTECs, estimated at 1–3% of total mTECs, in a pattern described as ordered stochasticity (Derbinski et al. 2008). The expression of the complete TRA repertoire by the mTEC population as a whole is therefore attributed to the summation of mosaic TRA expression by individual mTECs (Kadouri et al. 2020).
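
As a rough illustration of this mosaic model, the following sketch asks how many mTECs a thymocyte would need to encounter before a given TRA is likely to be expressed by at least one of them. It assumes, purely for illustration, that each mTEC expresses the TRA independently with the same probability p (1–3% according to the estimate cited above). The independence assumption and the function names are ours and are not taken from the cited studies; in particular, they ignore the "ordered" component of ordered stochasticity.

```python
import math


def coverage(p: float, n_cells: int) -> float:
    """Probability that at least one of n_cells mTECs expresses a given TRA.

    Assumption (ours): every cell expresses the TRA independently with
    probability p, so the chance that no cell expresses it is (1 - p) ** n_cells.
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie between 0 and 1")
    if n_cells < 0:
        raise ValueError("n_cells must be non-negative")
    return 1.0 - (1.0 - p) ** n_cells


def cells_needed(p: float, target: float) -> int:
    """Smallest number of cells whose coverage reaches the target probability."""
    if not 0.0 < p < 1.0 or not 0.0 < target < 1.0:
        raise ValueError("p and target must lie strictly between 0 and 1")
    # Solve 1 - (1 - p) ** n >= target for the smallest integer n.
    return math.ceil(math.log(1.0 - target) / math.log(1.0 - p))


if __name__ == "__main__":
    for p in (0.01, 0.03):
        n = cells_needed(p, 0.99)
        print(f"p = {p:.2f}: {n} cells give coverage {coverage(p, n):.4f}")
```

Under these simplifying assumptions, a few hundred mTEC encounters suffice for a single TRA to be presented with high probability, which is consistent with the idea that population-level coverage arises from the sum of sparse expression in individual cells.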

8.3 The Molecular Function of Aire

Aire protein localizes to the nucleus in the form of nuclear dots that resemble promyelocytic leukemia (PML) nuclear bodies, although the two structures were found to be largely non-overlapping (Akiyoshi et al. 2004). Considering its localization and structure, the Aire protein appears to be a putative transcriptional regulator, consisting of two plant homeodomain-type zinc-fingers (PHD-fingers), a DNA-binding domain (SAND), and four nuclear receptor binding LXXLL motifs (Kumar et al. 2001). These structural and functional domains are well conserved across phyla (Saltis et al. 2008). Many studies have addressed the transcriptional role of Aire, and several unique features that distinguish it from conventional transcription factors have been reported. Aire appears to regulate its target loci in collaboration with many partner proteins, forming a large multimolecular complex (Mathis and Benoist 2009). For example, CREB-binding protein (CBP) was the first Aire partner to be identified (Pitkanen et al. 2000). It has also been reported that Aire recruits P-TEFb for transcriptional elongation of target genes (Oven et al. 2007), and a subsequent study argued that the bromodomain-containing protein Brd4 bridges Aire and P-TEFb (Yoshida et al. 2015). Furthermore, a broad screen using Aire-targeted coimmunoprecipitation followed by high-throughput mass spectrometry identified additional putative Aire-interacting proteins involved in multiple biological pathways, including nuclear transport, chromatin structure, binding to the transcription machinery, and pre-mRNA processing (Abramson et al. 2010).

Aire’s extraordinarily broad transcriptional effect seems to be achieved by activating ectopic transcription rather than through specific recognition of TRA gene promoters or enhancer motifs. Instead, Aire appears to bind to the repressive chromatin mark H3K4me0 through its PHD1 finger domain (Koh et al. 2008; Org et al. 2008) and to release RNA polymerase II paused just downstream of the transcriptional start site (TSS) (Giraud et al. 2012). Moreover, recent bioinformatic analysis revealed that Aire-containing complexes are predominantly located on mTEC super-enhancers, which are chromatin stretches enclosing the TSSs of Aire-dependent genes (Bansal et al. 2017).

8.4 mTEC Heterogeneity Defined by the Single-Cell Approach

As described above, TECs have been divided into cTECs (EpCAM+Ly51+UEA1−) and mTECs (EpCAM+Ly51−UEA1+), histologically and cytologically. With regard to their ontogeny, evidence is emerging for bipotent progenitor cells in the fetal and early neonatal thymus that give rise to both mTEC and cTEC lineages (Bleul et al. 2006; Rossi et al. 2006) and that are characterized by cTEC-like molecular markers (Baik et al. 2013; Ohigashi et al. 2013). In contrast, the existence and molecular characteristics of corresponding progenitors in the adult thymus remain controversial (Ulyanchenko et al. 2016; Wong et al. 2014).

Based on their molecular characteristics, mTECs were previously categorized as mTEClow (Aire−CD80lowMHC-IIlow) and mTEChigh (Aire+CD80highMHC-IIhigh). “Central tolerance” is primarily achieved by the effective expression of TRAs in mTEChigh and their presentation to developing thymocytes. mTEChigh cells differentiate from a subset of mTEClow cells and require RANK and CD40 signals for their development (Akiyama et al. 2008). In contrast to mTEChigh, the mTEClow fraction has been shown over the past several years to contain multiple subsets: (i) a developing stage of the mTEC lineage (recently categorized as “Ccl21+ mTEC”) and (ii) a terminally differentiated stage of mTECs, called “post-Aire mTEC” (Nishikawa et al. 2010) or “corneocyte-like mTEC” (Kadouri et al. 2020) (Table 10.1). Post-Aire mTECs lose their nuclei as they form Hassall’s corpuscles. Notably, Aire-deficient mice have reduced numbers of Krt10+ post-Aire mTECs and impaired formation of Hassall’s corpuscles in their thymi, which suggests that Aire may control the differentiation program of mTECs (Matsumoto 2011; Yano et al. 2008).

Table 10.1 mTEC clusters identified by scRNA-seq

Furthermore, recent high-throughput scRNA-seq revealed that TECs, especially mTECs, are more heterogeneous than previously appreciated (Bornstein et al. 2018; Dhalla et al. 2020; Miller et al. 2018; Miragaia et al. 2018). Bornstein et al. categorized mTECs into four subsets as follows: (i) mTEC I (Ccl21+ mTEC), (ii) mTEC II (previous “mTEChigh”), (iii) mTEC III (previous “post-Aire mTEC” or “corneocyte-like mTEC”), and (iv) a newly identified mTEC IV (called “thymic tuft cells”). The existence of thymic tuft cells, which are considered to establish an immune microenvironment in the thymus, was simultaneously reported by two groups (Bornstein et al. 2018; Miller et al. 2018). Thymic tuft cells are remarkably similar to peripheral tuft cells at mucosal barriers in that they express canonical taste transduction pathway molecules and IL-25, whereas the expression of MHC-II and CD74 is characteristic of thymic tuft cells (Miller et al. 2018). Moreover, Dhalla et al. identified a “proliferating mTEC” cluster that exhibited upregulation of Mki67 together with Aire, but its biology is still controversial (Ishikawa et al. 2021).
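
To make the subset nomenclature concrete, the following minimal sketch assigns a single cell to one of the subsets named above according to its most highly expressed marker gene. It is an illustration only and not the procedure used in the cited studies, which rely on unsupervised clustering of whole transcriptomes. The one-marker-per-subset panel, the gene symbols Ccl21a and Il25 (standing in for Ccl21 and IL-25), the threshold, and the function name are all our assumptions.

```python
from typing import Dict

# Hypothetical one-marker-per-subset panel based on the subsets named in the
# text; real annotations use many genes per cluster.
MARKERS: Dict[str, str] = {
    "mTEC I (Ccl21+)": "Ccl21a",
    "mTEC II (Aire+)": "Aire",
    "mTEC III (post-Aire)": "Krt10",
    "mTEC IV (tuft)": "Il25",
}


def label_mtec(expr: Dict[str, float], min_expr: float = 1.0) -> str:
    """Label one cell from a gene -> normalized expression mapping.

    The cell receives the label of its highest-scoring marker, or
    "unassigned" if no marker reaches min_expr (an arbitrary cutoff).
    """
    scores = {label: expr.get(gene, 0.0) for label, gene in MARKERS.items()}
    label, value = max(scores.items(), key=lambda item: item[1])
    if value < min_expr:
        return "unassigned"
    # Aire-expressing cells that also express Mki67 are reported separately,
    # mirroring the "proliferating mTEC" cluster described in the text.
    if label == "mTEC II (Aire+)" and expr.get("Mki67", 0.0) >= min_expr:
        return "proliferating mTEC"
    return label


if __name__ == "__main__":
    cells = {
        "cell_a": {"Ccl21a": 4.2, "Aire": 0.1},
        "cell_b": {"Aire": 5.0, "Mki67": 3.1},
        "cell_c": {"Il25": 2.5},
    }
    for name, expr in cells.items():
        print(name, "->", label_mtec(expr))
```

A rule of this kind is useful mainly as a sanity check on cluster annotations; it cannot discover new populations, which is precisely what the unsupervised scRNA-seq analyses contributed.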

Recently, two groups have reported scRNA-seq studies focusing on human TECs (Bautista et al. 2021; Park et al. 2020). Importantly, human TECs have been revealed to contain subsets similar to those of mouse TECs (i.e., mTEC I-IV), and the expression of TRA genes and APECED-relevant genes is enriched in the AIRE-expressing mTEChigh cluster (Bautista et al. 2021). Bautista et al. also reported the existence of immature TECs, which express canonical TEC genes but lack genes characteristic of cTECs and mTECs, in both datasets. Moreover, some unique TEC subsets that are specific to humans were identified. Both groups reported the existence of (i) MYOD1 and MYOG expressing myoid cells, and (ii) NEUROD1, NEUROG, and CHGA expressing neuroendocrine cells. Bautista et al. further identified (iii) SOX10 and MPZ expressing myelin cells. Interestingly, expression of myasthenia gravis relevant genes (i.e., CHRNA1, TTN, and MUSK) was predominantly found in the myoid and neuroendocrine subsets (Bautista et al. 2021). This raises the possibility that these unique AIRE− populations also participate in the induction of immune tolerance, although these cells may not directly present antigens because of their low levels of MHC (HLA) expression. In summary, recent transcriptome analysis at single-cell resolution revealed that the thymus orchestrates the establishment of self-tolerance through the coordination of highly heterogeneous TEC subsets, in collaboration with unique transcriptional machineries.

9 Conclusion

In this chapter, we have described recent advances in transcriptome analysis, focusing especially on the bulk RNA-seq and scRNA-seq approaches that have advanced our understanding of the immune system more globally. In the last part of the chapter, we touched on how these techniques are now bringing a new paradigm for self versus non-self discrimination in the thymus. The study of Aire deficiency, a monogenic autoimmune disease, has underscored the importance of new technologies for drawing a whole picture of the transcriptional control of the immune system. We hope that a complete picture of the transcripts of each immune cell type, together with the integration of this knowledge, will pave the way to a comprehensive understanding of the immune system from a novel viewpoint.