Introduction

Compared to the innate immune system, the adaptive immune system is highly diverse and specific. This diversity and specificity stem from the unique properties of its key players, particularly the T cells. The adaptive immune system can mount a tailored response to a wide range of pathogens due to the diverse repertoire of T-cell receptors (TCRs) expressed on the surface of T cells (Abbas et al. 2018). Each TCR recognizes a particular antigen presented by the major histocompatibility complex (MHC) molecules on the surface of antigen-presenting cells. This paradigm allows T cells to recognize and respond to millions of different viruses, bacteria, and other pathogens effectively. CD4-expressing T cells specifically bind to MHC class II molecules, while CD8-expressing T cells bind to MHC class I molecules, broadening the range of antigens they can detect. Additionally, each TCR is only able to bind to a limited number of unique antigens, ensuring a highly specific response to particular pathogens. To achieve this high level of diversity and specificity, T cells undergo three critical developmental stages in the thymus, where their TCRs are generated through a process of genetic rearrangement. The first stage takes place during early T-cell development and is known as variable, diversity, and joining (V(D)J) recombination. This process ensures the uniqueness of each TCR complex, rendering it exclusive in its ability to bind to a very limited number of antigens (Abbas et al. 2018). The second stage is positive selection, which takes place in the cortex of the thymus. In this process, T cells expressing both CD4 and CD8 markers are filtered based on their affinity to either MHCII or MHCI (Xing and Hogquist 2012; Takaba and Takayanagi 2017). 
Cells that bind to an antigen or MHC with an appropriate affinity survive, whereas cells that interact with a weaker affinity die by apoptosis. The third stage is the elimination of autoreactive T cells, also known as negative selection. In this process, T cells are recruited to the medulla of the thymus. This migration is controlled by the interaction between CCR7 expressed on T cells and its ligands (e.g., CCL19 and CCL21) expressed by medullary thymic epithelial cells (mTECs) and dendritic cells. mTECs express various tissue-restricted antigens (TRA) to allow deletion of T cells specific for antigens that otherwise would only be encountered in the periphery. Almost all T cells that recognize self-antigen-MHC complexes are deleted by mTECs and thymic dendritic cells (Xing and Hogquist 2012). T cells expressing a functional TCR without significant reactivity to self-antigens migrate to secondary lymphoid organs (e.g., the spleen and lymph nodes) and circulate throughout the body.

The expression of TRA is controlled by two main proteins, namely Autoimmune Regulator 1 (AIRE1) and Forebrain Embryonic Zinc Finger-Like Protein 2 (FEZF2) (Takaba and Takayanagi 2017). In the mouse thymus, AIRE1 controls 40% of the TRA expression (Peterson et al. 2008; St-Pierre et al. 2015). AIRE1 does not seem to be a transcription factor (Takaba and Takayanagi 2017). Rather, AIRE1 induces the transcription of TRA genes through interactions with various transcription factors. These interactions include enrichment of repressive markers in its promoter region (e.g., H3K27) (Takaba and Takayanagi 2017). It also supports transcriptional elongation by facilitating p-TEFb recruitment. Additionally, AIRE1 interacts with BRD4, which is a transcriptional and epigenetic regulator (Yoshida et al. 2015). Similarly, AIRE1 interacts with TOP1 and TOP2, which are known for their role in regulating the topological states of DNA during transcription (Pommier et al. 2016). AIRE1 expression is controlled by various gene networks, such as the TNFR family network. This network includes RANK, CD40, RANK-ligand, and CD40-ligand, as well as NF-κB pathways (Akiyama et al. 2008). It was also shown that estrogen and androgen play a crucial role in controlling AIRE1 expression (Dragin et al. 2016). Recently, strong evidence emerged supporting a crucial role of FEZF2 in central tolerance, probably through acting as a transcription factor. A difference in the usage of TCR Vβ chains in CD4 or CD8 T cells between WT and FEZF2-deficient thymus demonstrated that FEZF2 regulates the negative selection of T cells (Takaba et al. 2015). Notably, some of the genes controlled by FEZF2 are not controlled by AIRE1. However, some exceptions exist, such as FABP2, SAA2, and CDKN1C. FEZF2 expression itself is controlled by the LTBR signaling pathway. Whether other genes are fundamental for the expression of TRA is not currently known (Takaba and Takayanagi 2017).

The natural history of the process of expressing TRA to eliminate autoreactive T cells remains unclear. Additionally, the evolutionary advantage of having both AIRE1 and FEZF2 control the TRA expression process has not yet been identified. There is compelling evidence suggesting that invertebrates possess effective mechanisms for eliminating self-reactive cells. For instance, studies show that cells in sponges aggressively attack grafts from other sponges, indicating a recognition of non-self elements (Beck and Habicht 1996). Phagocytosis, a prevalent mechanism in invertebrates, facilitates the elimination of foreign elements and is observed in a wide range of animals, from starfish to humans (Beck and Habicht 1996). Moreover, invertebrates exhibit systems like prophenoloxidase (proPO) and lectins, which bear striking similarities to the vertebrate complement system and antibodies, respectively. Notably, Drosophila has a mechanism to eliminate mis-migrating embryonic cells (Sano et al. 2005). The presence of such selective processes in invertebrates suggests the existence of mechanisms distinguishing self from non-self proteins, which could potentially involve the expression of specific proteins to target cells reacting to those proteins. Given the differences in structure and function between AIRE1 and FEZF2, the need for more than one mechanism performing the same function is intriguing. FEZF2 belongs to the FEZF family, which functions as a transcriptional repressor, and it is known to contain six C2H2 zinc fingers and an EH1 repressor motif (Copley 2005; Shimizu et al. 2010). Mammalian AIRE1 contains four crucial motifs that support its function in controlling the function of various TFs (Takaba and Takayanagi 2017). These motifs are CARD (caspase recruitment domain), SAND, PHD1 (plant homeodomain 1), and PHD2. There are two opposing arguments concerning the evolutionary relationship between AIRE1 and FEZF2 in the context of TRA regulation. 
The first argument proposes that AIRE1 and FEZF2 share a common ancestral origin but have diverged over time due to accumulating differences, eventually leading to distinct roles in TRA regulation. The second argument posits that AIRE1 and FEZF2 are not related and are products of different gene families from the outset. Despite their different origins, they have independently evolved to perform similar functions in TRA regulation through convergent evolution. We hypothesized that AIRE1 and FEZF2 are not ancestrally related and have independently evolved to serve similar functions in TRA regulation, which can be attributed to convergent evolution rather than a common ancestral pathway.

In this investigation, we studied the evolutionary history of the mechanisms responsible for the regulation of TRA expression, focusing primarily on the roles of AIRE1 and FEZF2. To test our hypothesis, we conducted a range of analyses, including multiple sequence alignment followed by phylogenetic analysis. Additionally, to determine whether the FEZF2 and AIRE1 families were structurally related, we constructed putative homologs of their ancestral sequences using ancestral sequence reconstruction. We also explored whether they evolved under positive selection pressure using Phylogenetic Analysis by Maximum Likelihood (PAML). To compare their functional specificity, we employed type II functional divergence analysis, linear functional motif searches, and gene enrichment analysis for their downstream targets. Furthermore, we utilized non-homology-based artificial intelligence methods to predict the function of AIRE1 and FEZF2 in earlier diverging species. The results of our research allowed us to infer the earliest diverging homologs and the evolutionary history of the regulatory mechanisms responsible for controlling the expression of TRA, while excluding PAML from the analysis.

Materials and Methods

Workflow

To investigate the evolutionary history of the proteins responsible for the negative selection pathway, we divided these proteins into three groups: (i) main regulators (AIRE1 and FEZF2), (ii) upstream of the main regulators (e.g., regulators of regulators), and (iii) downstream targets of the regulatory elements (Table 1). For the first group, we inferred the phylogenetic history, reconstructed the ancestral sequence, investigated positive selection, functional divergence, and functional specificity, and then identified functional motifs. For the other two groups, we only investigated the evolutionary history and gene ontology.

Table 1 Investigated genes classified based on their function in the selection process

Database Search

In this study, our primary focus was investigating the evolutionary origins of the FEZF2 and AIRE1 pathways. We therefore examined not only FEZF2 and AIRE1 themselves but also their associated regulators and downstream targets (Table 1). Given the diverse nature and extensive evolutionary history of the genes under investigation, we employed a strategy based on sequence alignment of their respective proteins (Mickael et al. 2016). We analyzed the presence of FEZF2 and AIRE1 across various taxonomic classes, including Mammalia, Actinopterygii, Insecta, Arachnida, Gastropoda, Bivalvia, Cephalopoda, Anthozoa, and Placozoa, spanning over 500 million years and encompassing more than 200 protein sequences. To maintain consistency and comparability, we conducted BLASTP searches using the human protein families as queries against the aforementioned proteomes. For each species, we used the longest transcript homolog (Supplementary file 1). Candidate proteins were accepted only if their E values were below 1e–10 (Wiemerslage et al. 2016). Additionally, we filtered the hits by comparing the conserved domains within each identified protein against those of the query human protein sequences. Standardizing the search and the taxonomic categorization in this way provides a clear basis for comparing the evolutionary trajectories of the FEZF2 and AIRE1 pathways across taxa and evolutionary time frames.
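The hit-filtering logic described above can be illustrated with a minimal Python sketch. It assumes the BLASTP hits have already been parsed into simple records; the `Hit` class, its field names, and the toy values are hypothetical and are not taken from the study. The conserved-domain comparison step is not covered here.

```python
from dataclasses import dataclass

E_VALUE_CUTOFF = 1e-10  # threshold stated in the text


@dataclass(frozen=True)
class Hit:
    """One parsed BLASTP hit (hypothetical record layout)."""
    species: str
    accession: str
    e_value: float
    length: int  # protein length in residues


def select_candidates(hits):
    """Keep hits below the E-value cutoff, then the longest hit per species."""
    best = {}
    for hit in hits:
        if hit.e_value >= E_VALUE_CUTOFF:
            continue  # not significant enough to be a candidate
        current = best.get(hit.species)
        if current is None or hit.length > current.length:
            best[hit.species] = hit
    return best


# Toy records for illustration only (not real BLASTP output).
toy_hits = [
    Hit("species_A", "A1", 1e-50, 420),
    Hit("species_A", "A2", 1e-60, 455),
    Hit("species_B", "B1", 1e-5, 300),
    Hit("species_C", "C1", 1e-12, 390),
]
candidates = select_candidates(toy_hits)
```

In this toy input, `species_B` is dropped because its only hit is above the cutoff, and `species_A` retains the longer of its two significant hits.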

Alignment and Phylogenetic Analysis

The phylogenetic investigation was done in two stages (Kubick et al. 2018; 2021a, b). First, amino acid sequences were aligned using MAFFT with the iterative refinement method (FFT-NS-i) (Huson and Bryant 2006; Katoh and Standley 2013; Tamura et al. 2013). Second, we analyzed the evolutionary relationships within our dataset using IQ-Tree (Nguyen et al. 2015). This tool enabled us to investigate a diverse range of substitution models, including but not limited to GTR, HKY, and JC. The candidate models were compared using the Akaike Information Criterion (AIC) and the Bayesian Information Criterion (BIC), and the most suitable model was selected for each tree investigated. This selection was grounded in the model’s ability to explain the observed sequence variation and to reflect the underlying evolutionary processes.
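As an illustration of how information criteria rank competing substitution models, the following Python sketch computes AIC and BIC from a log-likelihood and a parameter count using the standard definitions (AIC = 2k − 2 ln L; BIC = k ln n − 2 ln L). The model names, log-likelihoods, parameter counts, and alignment length are invented placeholders; this is not IQ-Tree's implementation, only the arithmetic behind the comparison.

```python
import math


def aic(log_likelihood, n_params):
    """Akaike Information Criterion: 2k - 2 ln L (lower is better)."""
    return 2 * n_params - 2 * log_likelihood


def bic(log_likelihood, n_params, n_sites):
    """Bayesian Information Criterion: k ln n - 2 ln L (lower is better)."""
    return n_params * math.log(n_sites) - 2 * log_likelihood


def best_model(fits, n_sites, criterion="bic"):
    """Return the name of the model with the lowest score.

    fits maps a model name to (log_likelihood, n_params).
    """
    def score(item):
        _, (lnl, k) = item
        return aic(lnl, k) if criterion == "aic" else bic(lnl, k, n_sites)

    return min(fits.items(), key=score)[0]


# Hypothetical fits for illustration only.
toy_fits = {
    "model_A": (-5230.4, 41),
    "model_B": (-5215.9, 45),
    "model_C": (-5214.8, 60),
}
chosen_by_aic = best_model(toy_fits, n_sites=350, criterion="aic")
chosen_by_bic = best_model(toy_fits, n_sites=350, criterion="bic")
```

Here `model_C` has the highest likelihood but is penalized for its extra parameters, so both criteria prefer `model_B`.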

Positive Selection Analysis

We employed a maximum likelihood approach to explore whether FEZF2 or AIRE1 underwent positive selection during evolution (Kubick et al. 2021a, b). First, we obtained the respective complementary DNA (cDNA) sequences by back-translation using the EMBOSS Backtranseq tool and aligned them based on their codon arrangement (Madeira et al. 2019). Next, we examined patterns of positive selection in both genes using CODEML (PAML, Version 4) (Yang 2007). To investigate selection, we calculated the substitution rate ratio (ω) as the ratio of nonsynonymous (dN) to synonymous (dS) substitution rates. We conducted four levels of analysis: (i) basic (global) selection, (ii) branch-specific selection, (iii) branch-site-specific selection, and (iv) site-specific selection models (Kubick et al. 2018). Statistical significance was determined through a likelihood ratio test (LRT), calculated using the following equation: \(p\text{-value} = \chi^{2}\left(2\,\Delta\left(\ln\left(\text{LRTmodel}\right) - \ln\left(\text{LRTneutral}\right)\right),\ \text{number of degrees of freedom}\right)\).
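The likelihood ratio test described above can be sketched in Python using only the standard library. The function names are hypothetical, the chi-square upper tail is computed from its closed form for integer degrees of freedom, and the log-likelihood values are placeholders rather than CODEML output.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variable with integer df >= 1."""
    if df < 1 or int(df) != df:
        raise ValueError("df must be a positive integer")
    if x <= 0:
        return 1.0
    y = x / 2.0
    if df % 2 == 0:
        # Even df = 2m: exp(-y) * sum_{k=0}^{m-1} y^k / k!
        term, total = 1.0, 1.0
        for k in range(1, df // 2):
            term *= y / k
            total += term
        return math.exp(-y) * total
    # Odd df = 2m + 1: erfc(sqrt(y)) + exp(-y) * sum_{k=1}^{m} y^(k-1/2) / Gamma(k+1/2)
    total = math.erfc(math.sqrt(y))
    term = math.sqrt(y) / math.gamma(1.5)
    for k in range(1, (df - 1) // 2 + 1):
        total += math.exp(-y) * term
        term *= y / (k + 0.5)
    return total


def lrt_p_value(lnl_model, lnl_neutral, df):
    """p-value for 2 * (lnL_model - lnL_neutral) against chi-square with df."""
    statistic = max(0.0, 2.0 * (lnl_model - lnl_neutral))
    return chi2_sf(statistic, df)


# Placeholder log-likelihoods for illustration only.
p = lrt_p_value(lnl_model=-1000.0, lnl_neutral=-1003.0, df=2)
```

With these placeholder values the test statistic is 6 and, for two degrees of freedom, the p-value equals exp(−3), which is just under 0.05.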

Functional Divergence Estimation

We conducted an analysis of Type II functional divergence between the FEZF2 and AIRE1 proteins using the DIVERGE software. This analysis aimed to identify any shifts in cluster-specific amino acid properties. Type II functional divergence represents a change in amino acid properties, such as charge, size, or hydropathy (Gu and Velden 2002). In this analysis, AIRE1 was grouped into higher and lower vertebrates, while FEZF2 was classified in both vertebrates and invertebrates (Gu and Velden 2002; Kubick et al. 2018).

Linear Motifs Search

To investigate the distinction in the evolution–function relationship between FEZF2 and AIRE1, we conducted a search for linear motifs within the protein sequences of both. Linear motifs are short sequences of amino acids that serve as potential protein interaction sites. We performed this search using the ELM server (http://elm.eu.org/) with a motif significance threshold set to 100 (Kumar et al. 2019).
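To make the idea of a linear motif concrete, the sketch below scans a protein sequence for a short pattern with a regular expression. It uses the LXXLL pattern (leucine, any two residues, two leucines) as the example. The sequence is a made-up toy string, and this is a simplification for illustration, not the ELM server's method.

```python
import re

# LXXLL: L, any two residues, then LL. The lookahead allows overlapping matches.
LXXLL = re.compile(r"(?=(L..LL))")


def find_motif(sequence, pattern=LXXLL):
    """Return (1-based start position, matched residues) for every occurrence."""
    return [(m.start() + 1, m.group(1)) for m in pattern.finditer(sequence)]


# Toy sequence for illustration only (not a real protein).
toy_sequence = "MAPLSELLKRTLQALLG"
matches = find_motif(toy_sequence)
```

Positions are reported 1-based, following the usual convention for residue numbering.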

Non-Homology Functional Prediction

To substantiate our findings, we utilized two non-homology-based techniques for predicting the functions of FEZF2 and AIRE1. Initially, we employed DeepGO, a method that predicts protein function by leveraging neural networks and gene network linkages (Kulmanov et al. 2018). Additionally, we utilized a weighted K-nearest neighbor classifier as implemented in Pannzer (Törönen et al. 2018).

Functional Ontologies

To complement our study and address the question of why two different genes perform the same function, we employed functional ontologies. We aimed to determine whether the reason behind this redundancy was based on the unique functions of the genes they controlled. To achieve this, we compared the gene enrichment profiles of AIRE1 and FEZF2 downstream targets using three methods. First, we analyzed the microarray gene sets GSE69105 and GSE2585, which consist of groups of WT and knockout mice for AIRE1 and FEZF2, respectively, using GeneSpring©. We considered downstream gene candidates that exhibited a fold change greater than 1.4 and a p-value less than 0.05. Subsequently, we utilized the Gene Ontology (GO) Gorilla server to identify gene enrichment in both datasets. Additionally, we employed the list of downstream targets as input for Metascape©. To support the results of our functional ontology investigation, we examined the expression of FEZF homologs in invertebrates by referencing FlyBase (https://flybase.org/) and WormBase (https://wormbase.org/) (accessed on 21 February 2023).
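The candidate-selection thresholds stated above (fold change greater than 1.4, p-value less than 0.05) can be expressed as a small Python filter. The sketch assumes fold change is a positive knockout/WT ratio and treats up- and down-regulation symmetrically; that symmetry is an assumption, since the text does not specify direction. Gene names and values are invented.

```python
def passes_thresholds(fold_change, p_value, fc_cutoff=1.4, p_cutoff=0.05):
    """True if the change exceeds the cutoff in either direction and is significant."""
    if fold_change <= 0:
        raise ValueError("fold change must be a positive ratio")
    # Assumption: a ratio of 0.5 counts as a 2-fold change (down-regulation).
    magnitude = max(fold_change, 1.0 / fold_change)
    return magnitude > fc_cutoff and p_value < p_cutoff


# Toy (fold change, p-value) pairs for illustration only.
toy_results = {
    "gene_1": (2.0, 0.01),
    "gene_2": (0.5, 0.03),
    "gene_3": (1.2, 0.001),
    "gene_4": (3.0, 0.20),
}
selected = sorted(
    gene for gene, (fc, p) in toy_results.items() if passes_thresholds(fc, p)
)
```

In this toy set, `gene_3` fails the fold-change cutoff and `gene_4` fails the significance cutoff.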

Results

Evolutionary History of AIRE1 and FEZF2

FEZF2 and AIRE1 differ considerably in their evolutionary history. In addition to FEZF2, the FEZF family possesses another member, namely FEZF1. We only found homologs for FEZF1 in vertebrates (Fig. 1A). Conversely, FEZF2 seems to have homologs in higher vertebrates (mammals and fish) as well as lampreys. Additionally, BLASTP analysis revealed that FEZF2 has multiple homologs in various invertebrates investigated, including Spiralia (Crassostrea virginica, E-value < 3e-112), Cnidaria (Nematostella vectensis, E-value < 1e-110), and Arthropoda (Drosophila melanogaster, E-value < 2e-114). Thus, the FEZF family seems to have undergone one round of duplication that occurred during the Cambrian explosion between 541 and 530 million years ago. FEZF2 from invertebrates are orthologs of FEZF2 from vertebrates. FEZF2 from invertebrates and vertebrates are paralogs of FEZF1 from vertebrates. The duplication event occurred in the branch before all vertebrates (Fig. 1A, branch with bootstrap 76). FEZF2 is most likely the ancestral version of this protein/gene. Interestingly, we were not able to locate homologs for AIRE1 beyond vertebrates. On the structural level, multiple sequence alignment and conserved sequence investigations revealed that both FEZF1 and FEZF2 share a structural domain consisting of six C2H2 zinc finger domains (Fig. 1B). This domain appears to be conserved in all species investigated. In the case of AIRE1, four conserved domains were found among the investigated species, namely the homogeneous staining region (HSR), the SAND domain (Sp100, AIRE1, NucP41/75, and DEAF-1), as well as two Plant Homeodomain (PHD) subunits.

Fig. 1

FEZF2 and AIRE1 evolution histories. A The FEZF2 family appears to have homologs as early as Trichoplax adhaerens, while the oldest homolog of the AIRE1 family can be found in bony fish. CD4 protein sequences from humans were used as an outgroup. Internal node labels represent bootstrap values. The putative occurrence of a round of duplication is evident in FEZF2 but not in AIRE1. B On a structural level, AIRE1 contains four main functional domains, namely HSR, SAND, and two PHD units. In contrast, the FEZF family is characterized by having a single conserved domain consisting of six zinc fingers

Evolution of the Process of TRA Expression in the Thymus

The evolutionary history of TRA is intricate. While AIRE1 seems to have first emerged during the divergence of bony fish, its regulators do not share a specific emergence period (Supplementary file 1 and Fig. 2). Notably, RANK first emerged during the divergence of lampreys, and CD40 has homologs in both lampreys and tunicates. Homologs of CD40L, on the other hand, first appeared in fish. Estrogen receptor homologs (ER1 and ER2) are both present in lampreys, but ER1 seems to be more ancient, with a homolog found in Spiralia and in Cnidaria. The downstream targets of AIRE1 also exhibit a diverse evolutionary history. Various genes controlled by AIRE1 first emerged in bony fish (e.g., CAMK2B, C4BP, KRT2) or rodents (e.g., AMPLEX). Similarly, various proteins controlled by AIRE1 have homologs in cnidarians, such as GSTA2 and IGF2 (Supplementary file 1). The evolution of the FEZF2 pathways appears to follow a similar pattern to AIRE1, with no apparent common emergence period. Furthermore, several proteins regulated by FEZF2 appeared in invertebrates (Fig. 2). For example, BHMT has homologs in both Cnidaria and Trichoplax, while F2 and MAOA homologs exist in Cnidaria, and KCNJ5 in Spiralia. Interestingly, known regulators of FEZF2 seem to have emerged much later, with LTA, LTB, and LIGHT homologs first diverging in bony fish, and LTBR homologs first appearing in lampreys.

Fig. 2

Investigating the evolution of AIRE1 regulators and downstream targets reveals a complex picture. A Several of the known regulatory proteins and genes of AIRE1 appear to have earlier homologs. For example, RANK, CD40, and ESR1 all have homologs in Spiralia. Since mollusks lack T cells, this observation indicates that these regulators might have other functions. B We noticed that more than half of the genes regulated by AIRE1 seem to have homologs that predate the earliest divergence of AIRE1 homologs. This observation cannot be extended toward FEZF2-regulated pathways, as FEZF2 possesses homologs as ancient as Trichoplax. Overall, these observations indicate that controlling one gene by another is not necessarily connected to the time of divergence of either of these two genes. Numbers in parentheses are SH-aLRT support (%)/ultrafast bootstrap support (%)

AIRE1 and FEZF2 Families Do Not Share a Recent Common Origin

Regulatory elements controlling the process of TRA expression have a divergent evolutionary history (Table 2). The FEZF family contains two main genes, namely FEZF1 and FEZF2. The nearest homolog for the reconstructed ancestral sequence of the FEZF family is a protein that contains a zinc finger (C2H2 type) and is expressed in Metschnikowia aff. pulcherrima (Fig. 1B and Table 2). The nearest homologs to the reconstructed ancestral sequence of the AIRE1 family include three proteins, namely PHD finger protein 12 (Drosophila busckii), E3 ubiquitin-protein ligase TRIM33 like (Actinia tenebrosa), and Chromodomain-helicase-DNA-binding protein 4-like isoform X3 (Orbicella faveolata). We constructed a phylogenetic tree for these sequences. Our results indicate that Chromodomain-helicase-DNA-binding protein 4-like isoform X3 (Orbicella faveolata) could be the nearest homolog of the reconstructed ancestral protein sequence of AIRE1. Taken together, our results do not support the hypothesis that the FEZF family and AIRE1 are evolutionarily related (Fig. 3).

Table 2 Estimated homologs of AIRE1 reconstructed ancestral sequence compared to that of FEZF2
Fig. 3

The AIRE1 origin is resolved using evolutionary network analysis. Our analysis indicates that AIRE1’s nearest homolog is CHD4-LIKE (Orbicella faveolata). Numbers in parentheses are SH-aLRT support (%)/ultrafast bootstrap support (%)

Positive Selection

We analyzed positive selection to assess the evolutionary dynamics of AIRE1 and FEZF2. Our analysis revealed distinct patterns in the conservation of these two genes across various taxa. In the case of FEZF2, we observed a high degree of conservation between vertebrates and invertebrates, as evidenced by an ω value below 1 (0.3, p-value < 0.01) in both the global and branch evolution patterns. Our investigation at the branch-site and site levels, using the phylogenetic tree (Fig. 1), did not uncover any amino acids subject to positive selection across the vertebrate and invertebrate divide. These results were based on the Bayes Empirical Bayes (BEB) method (Table 3). For AIRE1, however, we detected a more relaxed evolutionary pattern compared to FEZF2. Globally, AIRE1 appeared to have evolved under neutral selection, reflected by an ω value of 1. Similarly, we did not find conclusive evidence of positive selection on the primate branch. In the branch-site analysis, we identified only two amino acids subjected to positive selection: 147 P (probability: 0.998**, BEB) and 315 V (probability: 0.992**, BEB).

Table 3 Positive selection test for AIRE1 and the FEZF family

Functional Divergence

Overall, we detected a low degree of functional divergence. We utilized DIVERGE 3.0 to identify putative type II functional divergence sites using a cutoff value of 0.5 (Supplementary file 2). We found only two functionally divergent regions in the AIRE1 sequence. The first is site 290, where higher vertebrates have histidine, while bony fish have tyrosine. The second comprises amino acids 320 and 321, where higher vertebrates have HA, while fish have YS. Although the FEZF2 family diverged much earlier than AIRE1, we were able to locate only a single site that could represent a candidate for functional divergence (site 454). There is a strong variation at this site, with vertebrates having glutamine, insects having threonine, Cnidaria having isoleucine, and Trichoplax having valine. These results suggest that both AIRE1 and FEZF2 exhibit a limited number of functionally divergent sites.

Motifs Search

Our linear motif search revealed a complex picture for AIRE1 and FEZF2. We found that AIRE1 contains an LXXLL motif, known to play a role in binding nuclear receptors. Conversely, FEZF2 has several motifs related to autophagy, such as DKFPHP, SYSELWKSSL, and SYSEL, as well as motifs involved in the regulation of actin and cytoskeleton dynamics, such as PPACPR. However, both genes share various similar motifs related to MAPK regulation, such as AGASPAT (see Supplementary file 3).

Non-Homolog-Based Functional Ontology of FEZF2 and AIRE1

In order to reinforce our findings, we employed non-homology-based methods for functional ontology prediction of FEZF2 and AIRE1. Specifically, we utilized DeepGO to examine the functional conservation of these proteins (Fig. 4). This approach is AI driven and not reliant on homology. The results revealed a high degree of functional conservation for both FEZF2 and AIRE1 among the species in which they are expressed. This suggests that FEZF2 may serve a similar function in invertebrates.

Fig. 4

Non-homology-based Functional Prediction. We employed AI-driven methods to predict functions based on protein sequences. This approach was chosen because homology-based assumptions may not provide a complete representation of the evolutionary history of these distant genes (Bhaumik et al. 2023). Our results indicate that the oldest AIRE1 homologs appear in fish, while FEZF2 has homologs in earlier diverging species, such as Spiralia and Cnidaria

AIRE1 and FEZF2 Have Distinctive Functional Ontologies

We explored the functions of the top genes identified through microarray data analysis (GSE69105 and GSE2585) in GeneSpring, with a fold change > 1.4 and a p-value < 0.05. In these two datasets, FEZF2 and AIRE1 were knocked out in mouse mTECs. We employed several gene enrichment approaches, including Gorilla (Supplementary file 4), GeneCards (Supplementary file 5, Table 1), the Shiny server (Supplementary file 1, Table 2), and Metascape analysis (Fig. 5). Our results indicate that FEZF2 and AIRE1 control multiple genes in a complementary manner. It is noteworthy that the knockout of AIRE1 affected the expression of more than 1000 genes, whereas knocking out FEZF2 only affected 100 genes (Supplementary file 5). Importantly, when examining the expression patterns of FEZF2 homologs in invertebrates, we found that one FEZF2 homolog (erm) is expressed in the male and female germline of Drosophila melanogaster, as well as in the embryos of both Drosophila and C. elegans.

Fig. 5

Functional Ontologies of FEZF2 versus AIRE1 Downstream Targets. A FEZF2 downstream targets exhibit a high degree of interconnectedness and form a single cluster, whereas genes regulated by AIRE1 display greater diversity. B Examination of the expression of a FEZF2 homolog (e.g., erm) in Drosophila melanogaster via scRNA-seq and manual literature curation reveals erm expression in both male and female germlines, as well as in Drosophila embryos (darker color indicates higher expression). C Similar expression patterns are observed in C. elegans embryos

Discussion

FEZF2 is a member of the FEZF family, and its closest known ancestor is a protein containing a zinc finger domain of the C2H2 type found in Metschnikowia aff. pulcherrima (E-value < 9e-32) (Table 2). In mammals, FEZF2 still retains these zinc finger protein domains (Fig. 1B). Interestingly, zinc finger domains are known for their role as DNA-binding motifs in transcription factors, like TFIIIA. They have also been shown to have the ability to interact with DNA, RNA, and proteins (Negi et al. 2008). As a result of various evolutionary processes, C2H2 zinc finger domains contain only four highly conserved residues, while the rest of the residues are highly variable (Albà 2017). C2H2 zinc finger domains are extremely diverse, with the ability to recognize the complete range of possible DNA triplets (Albà 2017). Interestingly, MCPIP1, which also contains a zinc finger, plays a role in the elimination of autoantibodies (Dobosz et al. 2021; Rakhra and Rakhra 2021). Taken together, these results support our hypothesis that FEZF2 can regulate the expression of a multitude of TRAs and extend it to suggest that zinc finger proteins could play a fundamental role in the process of eliminating harmful self-proteins in an immune context.

The nearest known ancestor for the reconstructed ancestral sequence of AIRE1 appears to be CHD4-LIKE (Orbicella faveolata) (Fig. 3). CHD4 is a protein involved in chromatin remodeling and belongs to the CHD (Chromodomain-Helicase-DNA binding) family of proteins, which are ATP-dependent chromatin remodeling factors. CHD4 plays a vital role in the regulation of gene expression by modifying chromatin structure to either activate or repress gene transcription. Dysregulation of CHD4 function has been linked to diseases, such as cancer and developmental disorders. Importantly, it has been recently reported that Chd4 might be involved in regulating the thymus’s TRA expression, where it is responsible for organizing the promoter regions of Fezf2-dependent genes and contributing to the AIRE1-mediated induction of self-antigens via super-enhancers (Tomofuji et al. 2020; Benlaribi et al. 2022). These findings support the notion that AIRE1 may regulate transcription factors both directly through protein–protein interactions and indirectly through epigenetic mechanisms. Although these results contradict the hypothesis of a recent common evolutionary origin for the regulation of TRA expression through FEZF2 and AIRE1, they nevertheless indicate that the regulation of these two regulators might have been mediated by a common mechanism.

FEZF2 and AIRE1 have evolved through a finely tuned and controlled evolutionary process. The elimination of autoreactive T cells is a crucial requirement for the body's survival. Our study reveals a significant level of complementarity between FEZF2 and AIRE1 on various levels, including the evolutionary history of their downstream targets and the functional characteristics of these targets. Regarding the evolutionary history of downstream targets, we observed that the earliest homologs of FEZF2 diverged before the emergence of the earliest homologs of AIRE1 (Figs. 1A and 2). However, both AIRE1 and FEZF2 regulate genes that can be classified as ancient, existing in invertebrates. Examples of such genes include BHMT, F2, ITIH3, KCNJ5, LGALS7, MAOA, MYO15B, PLAG1, and ZP2 in the case of FEZF2, and IGF2 and SPT1 in the case of AIRE1 (Supplementary file 1). We also found multiple downstream targets for both pathways, with homologs emerging as late as in bony fish, such as C4BP and KRT2 (Fig. 2). When considering the functional characteristics of the downstream targets, we discovered that both pathways control genes with shared functional annotations, exemplified by genes like ZP2 and ZP3 (Fig. 4, Supplementary file 2, and Supplementary file 4). However, it is worth noting that based on the analysis of knockout microarray studies (GSE69105 and GSE2585), AIRE1 appears to control the expression of approximately 1000 genes, while FEZF2 regulates around 100 genes (Supplementary file 5). Consequently, genes controlled by AIRE1 exhibit greater diversity (Fig. 4). Overall, our results seem to confirm the convergence of the functions of these two genes. Convergent evolution can be classified into two types: (i) parallel evolution and (ii) collateral evolution, based on the existence of identical mutations taking place in independent lineages or the occurrence of a hybridization process between entities, respectively (Stern 2013). 
As AIRE1 and FEZF2, and possibly other genes, regulate the levels of TRA in mTECs, it is reasonable to propose that the evolution of AIRE1 and FEZF2 followed a parallel evolution scheme. The main advantage of such a scheme seems to be that it minimizes collateral damage by reducing possible pleiotropic effects while maximizing adaptation.

Our findings suggest a potential connection between the functions of both FEZF2 and AIRE1 and the autophagy mechanism. Macroautophagy mediates endogenous MHC class II loading in mTECs (Nedjic et al. 2008; Klein et al. 2014). Notably, an elevated level of AIRE1 expression in mTECs has been correlated with the enhancement of autophagy processes (Shevyrev et al. 2022). AIRE1’s ability to regulate the expression of various proteins in the gonads has also been well established (Adamson et al. 2004; Radhakrishnan et al. 2016). AIRE1 likely facilitated the auditioning of gonadal and perhaps early embryonic cells (Forsdyke 2020). The gonadal environment is rich in mechanisms that facilitate the selection of offspring. Processes related to ovarian follicle survival and death are regulated by autophagy (Yadav et al. 2018). Autophagy involves the modification of internal membranes and an increase in the number of vesicles that engulf both bulk cytoplasm and organelles. The initial stage of autophagy includes the encapsulation of cytoplasmic components within membrane sacs known as autophagosomes. Ovarian follicular atresia results from the degradation of autophagic vesicles by lysosomal enzymes. Therefore, it has been suggested that AIRE1’s role in the thymus may be an adaptation of earlier evolving mechanisms. The apparent conservation of FEZF2 function between vertebrates and invertebrates hints at a potential role for FEZF2 in regulating simpler mechanisms that may have existed in invertebrates (Buss et al. 1985). Our investigations into positive selection using PAML and non-homology-based functional prediction of FEZF2 homologs found in invertebrates indicate that these homologs might be able to function as transcription factors (Table 2 and Supplementary file 3). Furthermore, FEZF2 has been reported to be expressed in primordial germ cells, the embryonic precursors of sperm and egg cells (Jean et al. 2015). 
A FEZF2 homolog is also expressed in the germline and embryos of Drosophila (Figs. 4B and C). Autophagy plays a fundamental role in the Drosophila germline and has also been reported in organisms such as Hydra (Chera et al. 2009). Interestingly, FEZF2 contains motifs associated with autophagy regulation, such as DKFPHP, SYSELWKSSL, and SYSEL. These findings suggest that FEZF2 may contribute to an invertebrate auto-elimination mechanism through the regulation of genes involved in autophagy.
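As a minimal illustration of how occurrences of the motifs named above could be located in a protein sequence, the following Python sketch performs a simple exact-match scan. It is not the method used in this study: the function name find_motifs and the toy sequence are hypothetical, and the toy sequence is not the real FEZF2 protein sequence.

```python
def find_motifs(sequence, motifs):
    """Return {motif: [0-based start positions]} for exact matches.

    Overlapping occurrences are reported, because the search resumes
    one residue after each hit.
    """
    seq = sequence.upper()
    hits = {}
    for motif in motifs:
        positions = []
        start = seq.find(motif)
        while start != -1:
            positions.append(start)
            start = seq.find(motif, start + 1)
        hits[motif] = positions
    return hits


# Motifs taken from the text above.
MOTIFS = ["DKFPHP", "SYSELWKSSL", "SYSEL"]

# Hypothetical toy sequence for demonstration only (NOT the FEZF2 sequence).
toy_sequence = "MAAADKFPHPGGSYSELWKSSLTT"

hits = find_motifs(toy_sequence, MOTIFS)
for motif, positions in hits.items():
    print(motif, positions)
```

Because SYSEL is a prefix of SYSELWKSSL, every exact match of the longer motif is necessarily also a match of the shorter one, so the two should not be counted as independent occurrences when they share a start position.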

Our results also offer insight into the negative selection of immune cells in lampreys. Lampreys possess distinct lymphocyte lineages defined by their variable lymphocyte receptors (VLRs): VLRA cells (akin to α/β T cells), VLRB cells (similar to B cells), and VLRC cells (resembling γ/δ T cells). These specialized immune cell types have evolved specific mechanisms for immune cell elimination. Notably, lampreys possess a thymoid structure that expresses VLR, indicating the presence of a mechanism to eliminate autoreactive immune cells (Bajoghli et al. 2011). Interestingly, while we found a homolog of FEZF2 (XP_032822733.1) in lampreys, we could not locate a homolog of AIRE1. Moreover, we identified in lampreys a homolog of LTBR (XP_032823234.1), a gene known to regulate FEZF2 expression (Takaba et al. 2015). Additionally, we observed the expression of downstream target genes that are regulated by FEZF2 in lampreys. These genes include BHMT, CALCA, COL17A1, CRISP1, CYP24A1, F2, FABP7, FABP9, GC, ITIH3, KRT10, KCNJ5, LGALS7, LYPD1, MAOA, MUC3, MYO15B, PLAGL1, PLD1, Resp18, and ZP2. In summary, our findings suggest that lampreys may have evolved a mechanism akin to negative selection, potentially mediated by FEZF2, to eliminate autoreactive immune cells.

Conclusion

Our analysis has revealed a multifaceted view of the regulation of TRA expression. Despite the absence of a recent common evolutionary origin for the mechanisms governing TRA regulation by FEZF2 and AIRE1, we have observed their co-expression in various contexts beyond the thymus, particularly in the germline. Both pathways appear to contribute to the process of autophagy, an ancient mechanism documented in organisms such as Hydra, Drosophila, and Spiralia. Interestingly, while the AIRE1 pathway emerged in bony fish, the FEZF2-mediated pathway displays remarkable conservation in invertebrates. This suggests that FEZF2 may have been one of the ancient genes responsible for facilitating self-elimination, possibly by regulating the expression of autophagy-related genes. AIRE1, on the other hand, appears to have evolved to support this function in more complex vertebrate gonads and immune systems.