Introduction

The human Y chromosome has an intriguing evolutionary history. Having started as an autosome, it has undergone massive, progressive gene decay to become one of the smallest chromosomes in the human genome. Because of this peculiarity, and because it bears the SRY gene, the master switch determining maleness, the Y chromosome was long assumed to be involved exclusively in sex determination. Eventually, several coding regions that are transcribed were identified on the human Y chromosome, and their roles in spermatogenesis were defined [1]. Even so, the belief persisted that the Y chromosome has no roles beyond sex determination and maintenance of male fertility. Countering this perception, several studies have now demonstrated a link between the Y chromosome and a wide array of biological processes and conditions, such as immune and inflammatory responses, graft-versus-host disease, several types of cancer, gender differences in cardiac failure, and sex-specific effects on the brain, including neurological and psychiatric disorders [2, 3]. In this review, we present evidence of the extra-reproductive roles of Y chromosome genes. Rather than being comprehensive, we focus on the genes within the AZF locus on the long arm of the Y chromosome, which are often deleted in infertile men [1]. The roles of AZF genes in spermatogenesis and male infertility have been reviewed recently [1, 4]; herein, we present emerging evidence for the involvement of these genes beyond fertility regulation.

Gene content on the long arm of the human Y chromosome

Like all other chromosomes in the human genomic complement, the 60-Mb-long Y chromosome comprises a short arm (p) and a long arm (q) separated by a centromeric region. However, unlike the other chromosomes, the Y has developed a unique and complex genomic structure because it does not recombine with its homologous X chromosome during male meiosis except at the pseudoautosomal regions (PARs) (Fig. 1a). PAR1 at Yp contains about 2600 kb of DNA and PAR2 at Yq contains about 320 kb of DNA. The bulk of the Y, excluding the PARs, is called the “non-recombining Y” (NRY), which is composed of a heterochromatic region (24 Mb) and a euchromatic region (30 Mb), both of which lie distal to PAR1. The euchromatic region within the NRY is also referred to as the male-specific region of the Y (MSY) [4,5,6,7].

Fig. 1

Transcribed genes within the AZF loci of the human Y chromosome, their biological processes and molecular functions. a Transcribed genes in the AZF loci of the human Y chromosome, where each block represents one of the three AZF clusters. The genes in the yellow overlaid block are ubiquitously transcribed while those in the blue block are generally testis-enriched. The phenotypic manifestations associated with loss or gain of these genes or gene families are depicted by the lilac bars to the extreme left of the image. Tissue expression of the genes is labelled adjacent to the boxes. b Biological processes and c molecular functions of the transcribed genes within the AZF loci as predicted by UniProt (https://www.uniprot.org, accessed on 15 Nov 2018).

The functional involvement of the MSY in male infertility was postulated in 1976, when microscopically visible terminal deletions in the q arm of the Y chromosome were observed specifically in azoospermic males, leading to the proposal that a locus called the azoospermia factor (AZF) exists on Yq11. Several other studies confirmed the presence of the AZF, and approximately 200 Y-specific sequence tagged sites (STS) were subsequently identified in this region [8, 9]. Soon, the original AZF region was subdivided into three sub-regions termed AZFa, AZFb, and AZFc (Fig. 1), and in 2003, Skaletsky et al. reported the first complete sequence of the intricate 23-Mb euchromatic region of a human Y chromosome, from a single male, thus paving the way for renewed research on this chromosome [6, 9, 10]. Overall, the AZF locus is estimated to contain about 244 genes, of which 28 are protein coding, 28 are transcribed but not translated (non-coding RNA), and 188 are pseudogenes (Table 1 Supplementary Data).

The AZF genes

The AZFa locus in proximal Yq11 extends about 1.1 Mb and encodes single-copy genes that have X chromosome homologs. Overall, AZFa contains 15 genes, of which three are protein coding (DDX3Y, UTY, and USP9Y), one is a testis-specific transcription unit (TTTY15), and 11 are pseudogenes (Fig. 1a, Table 2 Supplementary Data). The AZFb locus is located in the central region of Yq11 and spans 3.2 Mb, containing three single-copy regions, a Y chromosome-specific satellite DNA repeat array (DYZ19), and 14 multi-copy sequence units called amplicons, which are highly identical segmental duplications [1, 10, 11]. The AZFb locus contains a total of 132 genes, of which 15 are protein coding, 17 are non-coding RNAs, and 100 are pseudogenes. The 15 protein-coding genes comprise six copies of RBMY, two copies of PRY, two copies of HSFY, and one copy each of EIF1AY, KDM5D, CDY2A, RPS4Y2, and XKRY (Fig. 1a, Table 2 Supplementary Data). The AZFc locus is distal to AZFb and is the most commonly deleted region in infertile men. This locus is 4.5 Mb long and contains 97 genes, of which 11 are protein coding, 10 are non-coding RNAs, and 76 are pseudogenes. The 11 protein-coding genes comprise four copies of DAZ, three copies of BPY2, and two copies each of CDY1 and CSPG4LY. This region also contains seven testis-specific transcription factors (Table 2 Supplementary Data) [1, 12,13,14].
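The per-locus gene counts quoted above can be tallied with a few lines of code. The following is a minimal sketch, not part of the original analysis: the dictionary simply transcribes the numbers given in the text, the names `AZF_GENE_COUNTS`, `locus_total`, and `category_totals` are hypothetical, and the single AZFa transcription unit is assumed to belong in the non-coding RNA category.

```python
# Gene counts per AZF locus, transcribed from the text above
# (not re-derived from any database).
# Assumption: the one AZFa testis-specific transcription unit is
# counted here as a non-coding RNA.
AZF_GENE_COUNTS = {
    "AZFa": {"protein_coding": 3, "non_coding_rna": 1, "pseudogene": 11},
    "AZFb": {"protein_coding": 15, "non_coding_rna": 17, "pseudogene": 100},
    "AZFc": {"protein_coding": 11, "non_coding_rna": 10, "pseudogene": 76},
}


def locus_total(locus):
    """Total number of genes annotated in one AZF locus."""
    return sum(AZF_GENE_COUNTS[locus].values())


def category_totals():
    """Sum each gene category across the three AZF loci."""
    totals = {}
    for counts in AZF_GENE_COUNTS.values():
        for category, n in counts.items():
            totals[category] = totals.get(category, 0) + n
    return totals


if __name__ == "__main__":
    for locus in AZF_GENE_COUNTS:
        print(locus, locus_total(locus))
    print("by category:", category_totals())
    print("all loci:", sum(locus_total(name) for name in AZF_GENE_COUNTS))
```

Summing in this way gives a quick consistency check of the per-locus numbers against the locus-wide figures quoted earlier in this section.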

Y chromosome microdeletions and male infertility

The AZF regions are known to contain several fragile sites that undergo deletions, possibly due to errors in self-recombination. These deletions are called Y chromosome microdeletions (YCMD) and are defined as small sub-microscopic deletions in the proximal Yq that remove all or part of the AZF region. The classical YCMD involves loss of the complete AZFa, AZFb, or AZFc locus, alone or in combination (AZFa+b, AZFa+c, or AZFb+c), or deletion of the entire AZF locus [1]. Another type of microdeletion, the “partial AZFc deletion,” is commonly referred to as the gr/gr deletion or AZFc sub-deletion. Several variations of the partial AZFc deletions have been reported [1]. The partial AZFc deletions do not remove the AZFc locus completely but reduce the copy number of the genes within it. Typically, the partial AZFc deletions involve loss of two copies of DAZ and one copy each of CDY1 and BPY2 [1].

Analysis of data from 40,000 human Y chromosomes has revealed the presence of YCMD in approximately 7.5% of infertile males worldwide, with some variation based on geographic location [1]. Unlike the complete AZF deletions, which are found exclusively in infertile males, the partial AZFc deletions are also common in fertile males but show a higher prevalence in infertile males [14]. Analysis of > 17,000 Y chromosomes from fertile and infertile men globally revealed that partial AZFc deletions were nearly twice as common in infertile men as in fertile men (odds ratio of 1.8); however, these results were also found to be ethnicity dependent [1]. Interestingly, the partial AZFc deletions are strongly associated with reduced sperm motility and count [15, 16]. These observations suggest that not only the loss of a complete AZF locus but also a reduction in the copy number of its gene families may compromise spermatogenesis, resulting in male sub-fertility.

Copy number variants (CNVs) in the AZF genes and male infertility

Beyond YCMD and partial AZFc deletions, studies report gains and losses of individual genes or gene copies within the Y chromosome. Recently, several CNVs ranging from < 1 kb to > 3 Mb have been reported in the Y chromosome, including the AZF loci [17, 18]. Amongst the three AZF regions, AZFc is particularly susceptible to CNVs due to non-allelic homologous recombination. Several ampliconic CNVs have been reported in the AZFc locus [17, 19, 20] and are implicated in spermatogenic failure [21]. However, it is challenging to identify the genes responsible for the resulting phenotype because the CNVs involve simultaneous deletion of multiple genes or gene families. Nevertheless, the frequency of DAZ, CDY1, and BPY2 CNVs is significantly higher in infertile men than in controls, and these CNVs are associated with a reduction in semen quality [15, 16, 22]. Intriguingly, a significant relationship between Rbmy CNV and sperm head abnormalities has been observed in mice [23]. In Han Chinese males, a positive correlation between RBMY1 copy dosage and sperm motility has been observed: males with fewer than six copies of RBMY1 have an elevated risk of asthenozoospermia relative to those with six copies [24]. Thus, both the loss of AZF loci and CNVs of the genes within them are strongly associated with male infertility.

AZF genes beyond the testis

From the above discussion, it is clear that the AZF genes play an important role in the maintenance of spermatogenesis and that their deletion is associated with male infertility. However, the clinical observation that men harboring YCMD are otherwise “apparently healthy” has led to the notion that the genes within the AZF loci may not have additional functions. Notwithstanding this notion, advances in technology that aid the detection of low-abundance transcripts, together with the availability of such data from multiple tissues, have shown widespread expression of the AZF genes beyond the testis.

Although the Human Genome Project has confirmed the paucity of protein-coding genes on the Y, the discovery that a large number of Y-linked genes are expressed in non-gonadal tissues has brought the Y chromosome to the forefront of research on men’s susceptibility to disease in recent years. To understand the extra-gonadal implications of the AZF genes, we analyzed the tissue distribution of their transcripts, their putative biological processes and molecular functions, and their disease associations. In this review, we have limited our analysis to the 29 protein-coding genes in the AZF loci (DDX3Y, USP9Y, UTY, XKRY, CDY2A, HSFY1, HSFY2, KDM5D, EIF1AY, RPS4Y2, RBMY1A1, RBMY1B, RBMY1D, RBMY1E, RBMY1F, RBMY1J, PRY, PRY2, BPY2, BPY2B, BPY2C, DAZ1, DAZ2, DAZ3, DAZ4, CDY1, CDY1B, and two copies of CSPG4LY2). Based on current knowledge, the protein-coding genes of the Y chromosome can be classified into two broad categories: those that show ubiquitous expression and those that are testis specific (Fig. 1a). Accordingly, these genes may either have multiple ubiquitous functions or may be restricted to testis-related functions.

Tissue distribution of the AZF genes

To understand whether AZF genes are expressed in tissues beyond the testis, we evaluated their transcript levels in the Human Protein Atlas (HPA) datasets (https://www.proteinatlas.org, accessed in November 2018). For each gene, the reads per kilobase per million mapped reads (RPKM) values in all the reported tissues were recorded and a heat map was generated using Morpheus (https://software.broadinstitute.org/morpheus/). The results (Fig. 2) demonstrate that approximately 93% of the AZF gene transcripts are enriched in the testis; of the remaining 7%, PRY2 transcripts were not detected in the testis while the others showed transcript abundance below 0.01 RPKM. Beyond the testis, it is interesting to note that transcripts for several AZF genes are detected in multiple tissues in varying quantities [25]. Amongst these, the three protein-coding genes of the AZFa locus (USP9Y, DDX3Y, and UTY) are expressed in almost all the tissues, and 3/15 protein-coding AZFb genes (HSFY, KDM5D, EIF1AY) are also expressed in multiple tissues. However, many genes within the AZFb and AZFc loci seem to be expressed exclusively in the testis, with the exception of RPS4Y2, DAZ, DDX3Y, and CDY2A. Beyond the testis, RPS4Y2 is expressed in the prostate, the DAZ cluster is expressed in the stomach, DDX3Y is expressed in the adrenals, and CDY2A is expressed in the epididymis. The widespread detection of the transcripts of the AZF genes is also backed by the presence of the corresponding proteins in these tissues (data not shown).
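The binning of RPKM values that underlies a heat map such as Fig. 2 can be sketched as follows. This is an illustrative sketch only and not the procedure used to produce the figure: the thresholds of 0.01 and 1 RPKM follow the Fig. 2 legend, whereas the bin labels, the function names, and the example profile (gene and tissue names are placeholders, and the values are invented for the demonstration rather than taken from the HPA) are assumptions.

```python
def expression_bin(rpkm):
    """Assign an RPKM value to a display bin.

    Thresholds follow the Fig. 2 legend: values below 0.01 are left
    uncoloured, and values from 0.01 to 1 are shown in pale blue.
    The bin labels themselves are hypothetical.
    """
    if rpkm < 0.01:
        return "not detected"
    if rpkm <= 1:
        return "low"
    return "expressed"


def tissues_with_expression(profile, threshold=0.01):
    """Return the tissues in which a gene reaches the detection threshold."""
    return sorted(t for t, value in profile.items() if value >= threshold)


if __name__ == "__main__":
    # Placeholder profile with made-up values; NOT data from the HPA.
    gene_x = {"tissue_A": 12.0, "tissue_B": 0.5, "tissue_C": 0.0}
    for tissue, value in gene_x.items():
        print(tissue, value, expression_bin(value))
    print(tissues_with_expression(gene_x))
```

Applied to a real gene-by-tissue table, the same two functions would distinguish genes detected only in the testis from genes detected in several tissues.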

Fig. 2

Tissue distribution of the AZF genes. The legend depicts normalized RPKM values from the HPA dataset (source: Human Protein Atlas). The pale blue color corresponds to an RPKM value of 0.01 to 1. Blocks with no color have RPKM values < 0.01.

Intriguingly, the relative abundance of AZF gene mRNA in many somatic tissues is almost comparable to that in the testis, indicating that this expression is regulated rather than leaky. This multi-tissue expression of AZF genes is specific and not due to misalignment of the sequences to their autosomal homologous transcripts, as none of these genes was detected in female tissues, including the ovary and the reproductive tract. Furthermore, none of the AZF gene transcripts was detected in adipose tissue, smooth muscle, or the parathyroid gland, irrespective of sex. The observation that AZF genes are transcribed and translated in multiple tissues beyond the testis led us to postulate that these genes might have functions beyond male reproduction. To further elucidate the biology of these genes beyond reproduction, we scrutinized the putative molecular functions and biological processes of the protein-coding AZF genes.

Biological functions of AZF genes

We accessed UniProt (accessed in November 2018) to identify the putative functions of the protein-coding AZF genes [26]. Figure 1b details the biological processes associated with the AZF genes. As expected, most of the genes are involved in spermatogenesis; however, 14% are involved in regulation of gene expression, chromatin organization, regulation of translation and mRNA, and protein synthesis. Data on the molecular functions of these genes are presented in Fig. 1c. Twenty-four percent of the genes encode products that are involved in protein–protein interactions, 18% are involved in nucleic acid binding, and 12% are specifically involved in RNA binding. Other functions include roles in histone modification (methylation and acetylation) and protease activity. These observations indicate that many of the AZF genes have broad functions in the regulation of physiologically essential processes such as gene expression and protein synthesis. Interestingly, the AZF genes involved in regulation of transcription and translation (for example, KDM5D, RBMY, and DAZ) not only have testis-specific targets but also regulate genes that have roles in multiple cellular functions. For example, both mouse and human RBMY regulate splicing of the ubiquitously expressed transcription factor SMAD5 and the cell cycle regulators histone-lysine N-methyltransferase (EHMT1) and protein lin-9 homolog (LIN9) [27, 28]. Another AZF gene involved in regulation of gene expression is the AZFa gene UTY, which encodes a histone H3 Lys27-specific demethylase. UTY is ubiquitously expressed in multiple human tissues, and the Mouse Genome Informatics (MGI) database shows that mice with genetic deletion of Uty have abnormal cardiac development [29].
Furthermore, mice with knockout of the X homolog of Uty (Utx) specifically in hematopoietic stem and progenitor cells show significant alterations in H3K27 acetylation, loss of H3K4me1 modifications, and bidirectional alterations in chromatin accessibility. These cells also show altered expression of many genes and defective binding of the pro-oncogenic E-twenty-six (ETS) and GATA factors [29]. Together, these data suggest that the Y-linked AZF genes may have multiple extra-gonadal functions.

Clinical consequences of YCMD beyond infertility

Since most of the AZF genes are not present in the mouse genome, their extra-gonadal functions are challenging to determine. Presently, our understanding of the non-reproductive functions of AZF genes stems from studies in men with YCMD and from genetic profiling of cancer cells that carry CNVs of the AZF genes. Some insights into the extra-gonadal functions of the AZF genes also stem from observations of phenotypes in individuals with deletions, duplications, gains, or losses of AZF loci (Table 1). While we agree that most of this evidence is anecdotal, data on the phenotypic manifestations of alterations in AZF loci are beginning to crystallize for embryonic development, somatic cancers, and neuropsychiatric disorders. Reviewed below is the evidence suggesting that alterations in the AZF genes can manifest as various phenotypes beyond male infertility.

Table 1 Studies that report the extra-gonadal functions of some of the AZF genes

Poor embryo quality

In the era of assisted reproductive technologies (ARTs), azoospermic and oligozoospermic males father children via intracytoplasmic sperm injection (ICSI). Many infertile men undergoing ICSI harbor YCMD and transmit this genetic defect vertically to all their male offspring [41]. Clinical observations have shown that sperm retrieved from males harboring AZF deletions produce viable embryos and result in successful pregnancies [42,43,44]. This has prompted the notion that YCMD may not affect embryonic outcomes. However, compared with azoospermic men without YCMD, lower fertilization rates are noted when sperm from azoospermic AZF-deleted men are used in ART [45, 46]. Some studies also observe an increased number of poor-quality embryos when sperm from fathers harboring YCMD are used [12, 45, 46]; however, this finding remains debatable [47,48,49].

Beyond poor embryo quality, Mateu et al. have reported a high percentage of aneuploidy in embryos when sperm retrieved from males harboring YCMD were used in ICSI [50]. Indeed, analysis of data on men with YCMD and on patients bearing a mosaic 46,XY/45,X karyotype has shown an overall Y chromosomal instability in which the YCMD or the partial AZFc deletion gets progressively larger and often results in aneuploid cells [42, 51, 52]. However, in most cases, children born via ICSI to fathers affected by YCMD are phenotypically normal, and no data exist on the incidence of pregnancy losses due to chromosomal aneuploidies in couples where the male partner has a YCMD. Although preliminary, these findings, coupled with the fact that some of the Y-linked genes are expressed in early developing human embryos [53], suggest that loss of genes within the AZF loci might have detrimental effects on embryonic development.

AZFc genes in cancer

The Y chromosome has long been suspected to play a role in gonadal cancers. Individuals with gonadal dysgenesis bearing a full Y chromosome or even partial fragments of it (even in a few cells) have a high risk of developing gonadal tumors (specifically gonadoblastoma) [54,55,56]. Non-age-related loss of the Y chromosome (LOY) in blood cells has recently been found to be associated with smoking-related carcinogenesis [57]. In addition, YCMD and/or CNVs are suggested to be a risk factor for the development of testicular cancer [58]. In a recent study, deletion of UTY with or without mutation in its X homolog was observed in 80% of solid tumors and also in leukemic cell lines from males. Another study [59] has implicated the UTY gene in an important transcriptional regulatory network that controls prostate differentiation.

The KDM5D gene is reported to have a tumor suppressor function in prostate cancer [60]. This gene regulates invasion-associated genes, and loss of KDM5D causes cells to acquire invasiveness, leading to the development of metastasis. Another study suggests that loss of KDM5D epigenetically modifies histone methylation marks and alters gene expression, resulting in aggressive prostate cancer with poor prognosis. KDM5D preferentially binds to promoter regions with co-enrichment of the motifs of crucial transcription factors that regulate the cell cycle. Hence, loss of this gene leads to accelerated entry into the cell cycle and mitotic cycle. Also, loss of KDM5D is associated with stress-induced DNA damage on the serine/threonine protein kinase ATR [35]. Furthermore, KDM5D is known to interact with androgen receptor signaling and hence plays an important role in determining docetaxel sensitivity in the treatment of prostate cancer [34]. A recent study [61] in a large European cohort has identified the gr/gr AZFc subdeletion as a risk factor for the development of testicular germ cell tumors. Moreover, this group found that normozoospermic gr/gr subdeletion carriers have a four-fold increased risk of developing the disease.

Beyond these published reports, data in the Human Protein Atlas (https://www.proteinatlas.org/, accessed in November 2018) also reveal that expression of AZF genes is altered in cancers. Expression of USP9Y and RPS4Y2 is enhanced in prostate cancer, and expression of HSFY2 is enhanced in gliomas and cancers of the stomach, prostate, and ovary. DDX3Y, UTY, KDM5D, and EIF1AY are overexpressed in several types of cancer, such as cancers of the thyroid, lung, liver, pancreas, head and neck, colorectal, urothelial, renal, prostate, testis, breast, and skin [26]. Notably, high expression of DDX3Y and of KDM5D is a favorable prognostic marker in head and neck cancer. Although these are limited observational studies and the results of large case–control studies are still awaited, it is becoming apparent that, beyond infertility, YCMD and/or AZF CNVs may also be associated with an increased risk of cancers in men.

Neuropsychiatric disorders

From Fig. 1, it is evident that genes of the AZF locus are expressed in the adult brain. All three AZFa genes and four of the 13 AZFb genes are expressed in the cerebral cortex. While no data exist on neuronal phenotypes of mice lacking these genes, long-term follow-up of 42 Chilean men diagnosed with YCMD showed a higher prevalence of neuropsychiatric disorders [62]. A significantly higher proportion of abnormal height (severe short or extremely tall stature) was observed in men with terminal AZFb+c deletions; 5/42 men (about 12%) had some form of neuropsychiatric condition, such as bipolar disorder or severe clinical depression. Clinical histories also documented language delay, attention deficit hyperactivity disorder, and emotional and behavioral problems such as anxiety and social disability. To understand the association between YCMD and neuropsychiatric disorders, we assessed the information in the Decipher database (https://decipher.sanger.ac.uk/browser), which assembles clinical information on Caucasian individuals with CNVs of various genes. In all, the database contains information on 84 males having CNVs (deletions, duplications, or gains) of AZF genes. Of these, clinical information is available for 71 individuals. We observed that 21/71 males having AZF gene CNVs showed some form of neuropsychiatric concern, such as intellectual disorders and delayed development. These observations, along with the data on the Chilean cohort, prompt us to suggest that genetic alterations in the AZF loci might predispose these individuals to neuropsychiatric disorders. Although we understand that these are very early observations and case reports, it is vital to recognize this problem. Long-term information on the mental health of males with and without YCMD and/or CNVs of the Y chromosome must be assembled to confirm this potential link.
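The proportions quoted above come from small samples, and a short calculation gives a feel for how much uncertainty that implies. The sketch below is an illustration added for this purpose and is not an analysis from the cited studies: it computes the two proportions (5/42 and 21/71) and an approximate 95% Wilson score interval for each. The function names and the choice of the Wilson interval are assumptions.

```python
import math


def proportion(k, n):
    """Observed proportion of k affected individuals out of n."""
    return k / n


def wilson_interval(k, n, z=1.96):
    """Approximate Wilson score interval for a binomial proportion.

    z = 1.96 corresponds to a two-sided 95% interval.
    """
    p = k / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return centre - half_width, centre + half_width


if __name__ == "__main__":
    # Counts as quoted in the text: the Chilean cohort and the Decipher entries.
    for label, k, n in [("Chilean cohort", 5, 42), ("Decipher", 21, 71)]:
        low, high = wilson_interval(k, n)
        print(f"{label}: {k}/{n} = {proportion(k, n):.3f} "
              f"(95% interval {low:.3f} to {high:.3f})")
```

With samples of this size the intervals are wide, which is consistent with treating these figures as early observations that need confirmation in larger cohorts.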

Conclusion

The human Y chromosome, traditionally considered a genetic wasteland, plays a central role in the regulation of spermatogenesis. Data have now emerged demonstrating that the genes of the AZF loci of the Y chromosome are also expressed in somatic tissues and that the protein products of these genes have several housekeeping functions, including roles in the regulation of gene transcription and translation. This implies that the AZF genes may have roles beyond the regulation of spermatogenesis. Indeed, clinical analysis of males harboring AZF deletions or AZF gene CNVs, together with large-scale expression data from human cancers, has provided early evidence of an association of these genes with embryonic development, cognitive functions, and predisposition to cancers.

While we agree that such data are preliminary and observational in nature, they indicate that systematic studies are needed to address the relationship between genetic alterations in the Y chromosome and health risks beyond infertility. Since infertile males have a higher incidence of genomic instability on both sex chromosomes, it is imperative to obtain long-term follow-up data on the health status of children born to fathers harboring genetic alterations in the AZF locus. We envisage that detailed clinical analysis paralleled with genetic studies on infertile men and their offspring will afford important insights into the functions of the Y chromosome beyond the testis. This information will provide a different perspective in the area of androgenetics and have implications for devising strategies to improve the overall well-being of infertile males.