Prader-Willi syndrome (PWS) is caused by the loss of expression from an imprinted region on chromosome 15q11.2-q13.1. Most PWS patients (about 60%) have a deletion on the paternal chromosome. The deletion can occur between one of two proximal breakpoints (BP1 or BP2) and a common distal breakpoint (BP3). There are very rare instances of different distal breakpoints, BP4 and BP5. Type I deletions occur between BP1 and BP3, while type II deletions are between BP2 and BP3. About 35% of PWS patients have maternal uniparental disomy 15, and the remaining patients have an imprinting defect, typically a deletion or epimutation in the imprinting center [1, 2]. Deletions between BP1 and BP2 affect only nonimprinted genes but cause Burnside-Butler syndrome, characterized by neurological, cognitive, and behavioral problems [3].
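
The breakpoint nomenclature above can be summarized as a simple lookup. The following is a minimal sketch in Python, not a diagnostic tool: the function name and the label strings are illustrative assumptions, and the PWS labels presuppose that the deletion lies on the paternal chromosome, as described in the text.

```python
# Illustrative lookup of the deletion classes named in the text.
# Keys are (proximal breakpoint, distal breakpoint); the labels are hypothetical wording.
DELETION_CLASSES = {
    ("BP1", "BP3"): "PWS type I deletion",
    ("BP2", "BP3"): "PWS type II deletion",
    ("BP1", "BP2"): "Burnside-Butler syndrome deletion",
}


def classify_deletion(proximal: str, distal: str) -> str:
    """Return the deletion class for a pair of breakpoints.

    Rare deletions ending at BP4 or BP5 are not given a named
    class in the text, so they fall through to the default label.
    """
    return DELETION_CLASSES.get((proximal, distal), "atypical deletion")


print(classify_deletion("BP1", "BP3"))
print(classify_deletion("BP2", "BP5"))
```

Keeping the classes in a dictionary rather than in nested conditionals makes it easy to see that only three breakpoint pairs carry a named class in this chapter.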

Here, the properties and functions of genes between BP1 and BP5 are reviewed (Fig. 2.1 and Table 2.1). These genes contribute collectively to PWS, suggesting that no single gene is solely responsible for the syndrome.

Fig. 2.1

Overview of the genes in the Prader-Willi syndrome region. The genes are abbreviated using the HUGO nomenclature, and the meaning of the abbreviations is explained in Table 2.1. Nonimprinted genes are shown in orange. Maternally imprinted genes are in red, and paternally imprinted genes are in blue. Maternally imprinted genes are expressed from the father’s allele, and paternally imprinted genes are expressed from the mother’s allele. See the text for tissue or cell-type-specific deviation from these predominant imprints. BP: breakpoints; IC: imprinting center; PWS-SRO: PWS smallest region of overlap; AS: Angelman syndrome smallest region of overlap; C and T: centromere and telomere locations. Lines indicate deletion areas for Burnside-Butler syndrome, type I and type II deletions

Table 2.1 Genes in the PWS region between BP1 and BP5. Imprinted indicates the type of imprint; intron indicates whether an intron is present in the gene

Genetic Imprinting

Genomic imprinting is the epigenetic marking of a gene based on its parental origin. It results in monoallelic expression because one of the two parental alleles is transcriptionally silenced. Imprinting thus generates parent-of-origin-specific gene expression in diploid cells. Imprinting arose after the evolutionary transition from egg-laying mammals to live-bearing mammals with a placenta and is present in eutherian mammals (e.g., humans and mice) and marsupials (e.g., kangaroo) [4].

As with all autosomal regions, the gene region on chromosome 15 responsible for Prader-Willi syndrome is present in two alleles: the maternal one derived from the mother and the paternal one derived from the father. However, due to imprinting, at least five gene expression units are expressed only from the paternal allele in the brain and two genes are only expressed from the maternal allele (Fig. 2.1).
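
The consequence of parent-of-origin-specific expression can be made concrete by counting active gene copies. The sketch below is a simplified model under stated assumptions: the function and dictionary names are hypothetical, each allele is either fully active or fully silent, the gene lies inside the deleted interval, and tissue-specific deviations from the imprint are ignored.

```python
def active_copies(expressed_from: str, alleles: list) -> int:
    """Count transcriptionally active copies of a gene.

    expressed_from: 'paternal', 'maternal', or 'both' (nonimprinted).
    alleles: parental origin of each copy present, e.g. ['maternal', 'paternal'].
    """
    if expressed_from == "both":
        return len(alleles)
    return sum(1 for origin in alleles if origin == expressed_from)


# Simplified chromosome 15 constitutions (assumed labels, not clinical terms).
CONSTITUTIONS = {
    "typical": ["maternal", "paternal"],
    "paternal deletion": ["maternal"],
    "maternal uniparental disomy": ["maternal", "maternal"],
}

for name, alleles in CONSTITUTIONS.items():
    print(
        name,
        "| paternally expressed gene:", active_copies("paternal", alleles),
        "| maternally expressed gene:", active_copies("maternal", alleles),
        "| nonimprinted gene:", active_copies("both", alleles),
    )
```

Under these assumptions, both a paternal deletion and maternal uniparental disomy leave zero active copies of a paternally expressed gene, which is why two different genetic lesions converge on the same loss of expression.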

There are 165 human genes and 197 mouse genes currently known to be imprinted that represent about 1% of all genes [5]. Imprinted genes are predominantly found in the placenta where they regulate fetal growth via their influence on placental development and function [6, 7]. Recently, imprinted genes have been increasingly recognized to function in the nervous system [8]. Imprinting can be tissue-specific: for example, two genes from the PWS region, PWRN1 and NPAP1, show biallelic expression in testis but only paternal allele expression in the brain [9, 10]. UBE3A is generally expressed only from the maternal allele in brain, but data suggest expression from the paternal allele in mouse glial cells [11, 12].

Imprinted genes share common features. For example, more than 80% of imprinted genes are localized in 16 clusters containing at least two imprinted genes [13], suggesting that imprinting factors regulate multiple genes in cis [12, 13]. In general, each cluster contains a DNA sequence that is methylated either in oogenesis (maternal imprint) or spermatogenesis (paternal imprint). With one exception, each cluster expresses a long noncoding RNA (lncRNA) that is typically very large; for example, Airn is 108 kb and SNURF-SNRPN is more than 550 kb long [14]. At least two imprinted lncRNAs, SNURF-SNRPN and DLK1-MEG3 [15, 16], host C/D box snoRNAs.

Despite these common structural features, the molecular mechanisms governing imprinting are unclear. DNA methylation is the best understood epigenetic mark that represses transcription when present in promoter regions. The imprint is controlled through an imprinting control region (ICR), also named imprinting center (IC), that retains the methylation on one parental allele. The DNA methylation is established by DNA methyltransferases (DNMTs) and can be removed in the germline via DNA demethylases. In addition to DNA methylation, histone modifications, such as H3K27me3, can contribute to imprinting [17].

Compared with other known imprinted regions, such as the Igf2r (insulin-like growth factor 2 receptor), Kcnq1 (potassium voltage-gated channel subfamily Q member 1), and Dlk1 (delta-like noncanonical notch ligand 1) clusters, the PWS region is more complicated due to the large number of genes in the cluster (at least seven) and the mixed parental imprints: most genes are paternally expressed, but two genes (UBE3A and ATP10A) are expressed from the maternal allele. The existence of maternally and paternally imprinted genes in close vicinity also implies the presence of boundary elements containing the imprinting signals that have not yet been identified [14].

Genes in the Maternally Imprinted Region, Not Expressed in PWS

Protein Ubiquitination

Three maternally imprinted proteins (MKRN3, MAGEL2, NECDIN), one paternally imprinted protein (UBE3A), and one nonimprinted protein (HERC2) act in ubiquitin ligation (Fig. 2.2b–f). Ubiquitin is a small protein that is named after its ubiquitous expression and is found in all eukaryotes. Its attachment to numerous proteins regulates multiple cellular processes, including protein degradation, protein localization, DNA repair, cell cycle progression, transcription, and cell signaling [18]. Ubiquitination is performed in three sequential enzymatic steps (Fig. 2.2a): first, E1 ubiquitin-activating enzymes catalyze the formation of a thioester bond between E1 and ubiquitin; second, ubiquitin is transferred to E2 ubiquitin-conjugating enzymes; and finally, ubiquitin is transferred from E2 ubiquitin-conjugating enzymes to substrates by E3 ubiquitin ligases. These three steps form an expanding cascade: humans express two ubiquitin-activating enzymes that act on at least 35 E2 ubiquitin-conjugating enzymes, which in turn interact with more than 600 E3 ubiquitin ligases. The most abundant class of ubiquitin ligases contains a RING domain (really interesting new gene) that binds to ubiquitin-loaded E2-conjugating enzyme. RING domains bind zinc; a related domain that adopts the same fold without zinc coordination is called the U-box (Fig. 2.2a, b) [19]. MKRN3 is a RING domain E3 ubiquitin ligase; UBE3A and HERC2 are also E3 ubiquitin ligases but contain a HECT domain (homologous to E6-AP C terminus) to perform the catalysis and bind E2 enzymes with other protein motifs. E3 ubiquitin ligases can be regulated by proteins binding to their RING domain. For example, MAGEL2 and NECDIN bind to RING domains via their MHD (MAGE homology domain).
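
The five ubiquitination-related proteins of the region can be kept apart with a small record structure. This is only an organizational sketch: the class and field names are hypothetical, and the entries restate the paragraph above (a maternally imprinted gene is expressed from the paternal allele, and vice versa).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class UbiquitinFactor:
    gene: str
    expressed_allele: str  # 'paternal', 'maternal', or 'both' (nonimprinted)
    role: str


# Entries restate the text; the role strings are illustrative wording.
FACTORS = [
    UbiquitinFactor("MKRN3", "paternal", "RING E3 ubiquitin ligase"),
    UbiquitinFactor("MAGEL2", "paternal", "MHD protein binding RING domains"),
    UbiquitinFactor("NECDIN", "paternal", "MHD protein binding RING domains"),
    UbiquitinFactor("UBE3A", "maternal", "HECT E3 ubiquitin ligase"),
    UbiquitinFactor("HERC2", "both", "HECT E3 ubiquitin ligase"),
]


def paternally_expressed(factors):
    """Genes that depend solely on the paternal allele for expression."""
    return [f.gene for f in factors if f.expressed_allele == "paternal"]


print(paternally_expressed(FACTORS))
```

The filter returns the three factors whose expression is lost when the paternal contribution is missing, which matches the grouping used in the section heading.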

Fig. 2.2

Protein ubiquitination. (a) Ubiquitination pathway: E1 ubiquitin-activating enzymes bind ubiquitin while hydrolyzing ATP; ubiquitin is then transferred onto E2 ubiquitin-conjugating enzymes; the E2 ubiquitin-conjugating enzymes bind to E3 ubiquitin ligases that transfer ubiquitin to its final substrate. Ub: ubiquitin; RING: really interesting new gene. (b) Schematic structure of the MKRN3 protein. C3H: C3H-type zinc finger; CH: Makorin-type Cys-His domain; RING: RING domain [23]. (c) Schematic structure of the MAGEL2 protein. Proline-rich: proline-rich domain; U7BS: USP7 binding site (USP7, ubiquitin-specific peptidase 7, removes ubiquitin from targets); MHD: MAGE homology domain [171]. SYS is the hotspot for Schaaf-Yang syndrome mutations [38]; there are additional SYS mutations in the MHD. (d) Schematic structure of NECDIN. MHD: MAGE homology domain [43]. (e) Schematic structure of UBE3A. (f) Schematic structure of HERC2 [164]

MKRN3

The MKRN3 gene, initially called ZNF127 [20], is imprinted, and only the paternal allele generates an mRNA and protein. MKRN3 encodes an E3 ubiquitin ligase, which adds ubiquitin moieties onto substrate proteins [21]. MKRN3 stands for Makorin ring finger protein 3. It is part of the Makorin protein family that derives its name from Makor (Hebrew for source) after a novel by Michener [22]. The MKRN3 protein derives from a short pre-mRNA lacking introns, and MKRN3 is thus considered an intronless gene. However, nonprotein-coding isoforms that show intron removal exist. Its 3’UTR overlaps with an antisense RNA. The protein contains four zinc fingers, three of the C3H type and one RING finger, in addition to an MKRN3-specific Cys-His domain [23] (Fig. 2.2b).

Mutations in MKRN3 cause central precocious puberty (CPP) [24]. CPP is defined by the onset of puberty before the age of 8 years in girls and 9 years in boys [25] and is caused by the early reactivation of the hypothalamic-pituitary-gonadal axis. Currently, 39 inactivating mutations in the coding sequence of MKRN3 have been described, including 4 nonsense, 13 frameshift, and 22 missense mutations. These mutations are almost all within the zinc finger domains [26].

Due to the imprinting and paternal allele expression, affected family members inherited their mutations from their fathers [24]. MKRN3 is expressed ubiquitously, but its effect on puberty onset originates from the hypothalamus, especially from the KISS1-positive neurons in the arcuate nucleus and ventromedial nucleus (VMN) of the hypothalamus. MKRN3 suppresses puberty by inhibiting the promoters of KISS1 (KiSS-1 metastasis suppressor) and TAC3 (tachykinin precursor 3), which reduces their transcription. MKRN3 expression drops before the onset of puberty, resulting in an increase of KISS1 and TAC3 expression. This increase of KISS1 and TAC3 results in the secretion of GnRH (gonadotrophin-releasing hormone), which initiates puberty. The inhibiting activity of MKRN3 is dependent on its ubiquitin ligase activity, which explains why MKRN3 mutations located in the RING domain lead to an early onset of puberty: they abolish the repression of KISS1 and TAC3 by MKRN3 [21] (Fig. 2.3).
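
The regulatory logic described above reduces to a short chain of conditions. The following is a deliberately coarse Boolean sketch, assuming all-or-none expression and activity; the function and parameter names are hypothetical, and the real system is quantitative and involves further regulators.

```python
def gnrh_secreted(mkrn3_expressed: bool, ligase_active: bool) -> bool:
    """Boolean sketch of the MKRN3 -> KISS1/TAC3 -> GnRH chain.

    Repression requires both MKRN3 expression and an intact
    ubiquitin ligase activity (e.g., an unmutated RING domain).
    """
    repression = mkrn3_expressed and ligase_active
    kiss1_tac3_transcribed = not repression
    # In this simplified model, KISS1/TAC3 transcription directly triggers GnRH release.
    return kiss1_tac3_transcribed


print("childhood, intact MKRN3:", gnrh_secreted(True, True))
print("MKRN3 expression dropped:", gnrh_secreted(False, True))
print("RING-domain mutation:", gnrh_secreted(True, False))
```

The third case illustrates why an inactivating RING-domain mutation has the same outcome in this model as the physiological drop in MKRN3 expression, only earlier in life.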

Fig. 2.3

MKRN3 regulates puberty. MKRN3 is expressed in the hypothalamus where it blocks transcription of KISS1 and TAC3 (tachykinin precursor 3) mRNA via an unknown mechanism that requires MKRN3’s ubiquitination activity. At the onset of puberty, MKRN3 mRNA and protein are strongly reduced in the hypothalamus, leading to the transcription of KISS1 and TAC3, which results in the release of GnRH (gonadotrophin-releasing hormone) that initiates puberty

It is not clear whether the loss of MKRN3 in PWS contributes to the disorders of puberty development in PWS [27].

MAGEL2

MAGEL2 is an abbreviation for MAGE family member L2, and MAGE is an acronym for melanoma antigen family. MAGEL2 is imprinted in the brain and is only expressed from the paternal allele. MAGE family members were originally identified as tumor antigens. The MAGE protein family contains 40 members generated through gene duplications in placental mammals. Most MAGE genes are located on the X chromosome and therefore show monoallelic expression in males and, likely, in females due to X-chromosome inactivation. MAGE genes are localized in clusters, termed MAGE-A, -B, -C, -D, -E, -G, and -H, in addition to the individual members F1, L2, and Necdin. In contrast to their ancestral genes in nonplacental species, numerous MAGE genes, including the members of the MAGE-A, -B, and -C clusters, MAGEL2, and NECDIN, are intronless [28]. All MAGE proteins contain a MAGE homology domain (MHD) that mediates protein-protein interactions. MAGE proteins can associate with E3 ubiquitin ligases to form MAGE-RING ligases, which alters the activity, substrate specificity, and subcellular localization of the E3 ubiquitin ligase [29, 30] (Figs. 2.2c and 2.4a). Human MAGEL2 can be detected in most tissues, but it is most abundant in the brain, especially in the hypothalamus, nucleus accumbens, and pituitary [31].

Fig. 2.4

Function of MAGEL2. (a) Effect of MAGE proteins on E3 ubiquitin ligases. MAGE protein binds to ubiquitin ligases and can increase their activity or change their substrate specificity and location. (b) Membranes from the endoplasmic reticulum (ER) form the trans-Golgi network (TGN) from which endosomes are formed. (c) MAGEL2 promotes the ubiquitination of WASHC1 (W) through binding to the E3 ubiquitin ligase TRIM27. Ubiquitination of WASHC1 initiates actin polymerization that sends endosomes to the plasma membrane. (d) In the absence of MAGEL2, WASHC1 is not ubiquitinated, resulting in endosomes being sent to lysosomes where their content is degraded

MAGEL2 binds to the E3 RING ubiquitin ligase TRIM27. TRIM27 is a member of the tripartite motif family, characterized by the TRIM motif that contains three domains: a RING domain, B box zinc finger domains, and a coiled-coil region. TRIM27 is found throughout the cell and regulates transcription in the nucleus [32]. MAGEL2 and TRIM27 ubiquitinate WASHC1, which is part of the WASH protein complex that compartmentalizes endosomes by initiating actin polymerization. WASHC1 stands for WASH complex subunit 1, where WASH abbreviates Wiskott-Aldrich syndrome protein and SCAR homolog. Endosomes are sorting organelles that transport proteins from the plasma membrane to the trans-Golgi network [33]. Through endosomal compartmentalization, proteins are sorted, i.e., they are either recycled to the plasma membrane or sent to lysosomes for degradation. The WASHC1 protein is involved in this sorting process as it recruits a protein complex that initiates actin polymerization [34]. MAGEL2 is recruited to endosomes and promotes ubiquitination of WASHC1 by TRIM27, and this ubiquitination promotes endosomal actin polymerization [35] (Fig. 2.4b–d).

Secretory granules are vesicles generated from the trans-Golgi network that can be rapidly released upon a stimulus [36]. A proteomic study showed that the loss of MAGEL2 found in PWS reduces the amount of secretory granules in human and mouse neurons. This reduction is caused by an increase in lysosomal degradation. The increased lysosomal degradation is likely due to a sorting defect of endosomes: the endosomal sorting protein WASHC1 is no longer ubiquitinated, and endosomes are therefore sent to lysosomes [37]. Through this mechanism, the loss of MAGEL2 could contribute to PWS by reducing neuroendocrine secretion in the hypothalamus.

Nonsense mutations of MAGEL2 result in Schaaf-Yang syndrome (SYS), with 42 of the currently known 78 nonsense mutations clustered in only six nucleotides in the middle of MAGEL2 [38]. This leads to a loss of the MAGE homology domain (MHD). At early ages, Schaaf-Yang syndrome shares several symptoms with PWS, such as developmental delay, neonatal hypotonia, poor suck, and excessive weight gain [38, 39]. However, during adolescence Schaaf-Yang syndrome becomes more distinct from PWS.

Necdin/NDN

Necdin (NDN) stands for neurally differentiated EC cell-derived factor as it was discovered in a screen of neuronal cell differentiation using retinoic acid on P19 cells [40]. Necdin is an intronless gene and is expressed predominantly in the brain where it is imprinted and only expressed from the father’s allele. Similar to MAGEL2, necdin is a member of the MAGE protein family and contains a single MAGE domain. It is present both in the cytosol and nucleoplasm, and its localization is determined through its interaction with other proteins. NDN interacts with hundreds of proteins, including Grin1 (glutamate receptor), p75 (neurotrophin receptor), transportin (nuclear import factor), and Htt (huntingtin) in the cytosol and, in the nucleus, proteins forming complexes with p53 and Crebbp (Creb-binding protein) [41]. Functionally, it was shown to bind the ubiquitin E3 ligase Mdm2, leading to degradation of the proapoptotic protein CCAR1/CARP1 (cell cycle apoptosis regulatory protein) [42]. In addition, NDN binds to PIAS1, a RING-type SUMO E3 ligase, via its MHD and regulates PIAS1 activity [43].

Early characterizations showed that Necdin interacts with the transcription factor E2F1 and the p75 neurotrophin receptor that binds NGF, BDNF, NT-3, and NT4/5. Interactions with p75 and E2F1 cause cell cycle arrest. In contrast, the related MAGE family member MAGEL2 does not cause cell cycle arrest [44].

Several knockout mouse models of Necdin were created that manifest early postnatal lethality with partial penetrance. The mouse models show various phenotypes depending on the genetic background, including respiratory defects, skin scraping, and abnormal neuronal differentiation due to reduced TRKA signaling, altered differentiation of GABA neurons, and defective axonal outgrowth. Several studies reported hypothalamic changes that showed a reduction of GnRH neurons [45, 46].

PWRN1 and PWRN2

PWRN1 is the abbreviation for Prader-Willi region nonprotein-coding RNA 1. PWRN1 is most abundantly expressed in testis and is also found in the prostate, heart, kidney, lung, and brain. It shows biallelic expression in testis, but is only paternally expressed in the brain [10]. There is evidence that PWRN1 hosts an alternative promoter of the SNURF-SNRPN gene [47].

PWRN2 is a nonprotein-coding RNA located in an intron of PWRN1 in antisense orientation to PWRN1. The functions of PWRN1 and PWRN2 are not known.

NPAP1/C15orf2

NPAP1 was first named C15orf2 and later renamed NPAP1 for nuclear pore-associated protein 1. It is an intronless gene with the strongest expression in testes, where the expression is biallelic [9]. NPAP1 is imprinted in the brain where it is expressed from the paternal allele and protein can be detected in various brain tissues, including the hypothalamus [48].

The NPAP1 gene shows strong expression of piRNAs [10]. The piRNAs (PIWI-interacting RNAs) are 26–31-nucleotide-long RNAs that are made by cleaving precursor RNAs and are bound by PIWI-clade Argonaute proteins. They function mainly in the silencing of transposons [49]. Global databases indicate that piRNAs are generated throughout the Prader-Willi region, with NPAP1 forming the most piRNAs of all protein-coding genes [50].

NPAP1 shows sequence similarity with the nuclear pore complex protein POM121 [51]. POM121 is part of the nuclear pore complex and has a single-pass transmembrane domain that contributes to anchoring the nuclear pore complex in the nuclear membrane. POM121 enhances the importin-dependent nuclear transport of transcription factors like E2F1 and MYC [52]. NPAP1 colocalizes with the nuclear pore complex inside the nucleus. Overexpression of NPAP1 did not show an effect on transcriptional regulation and global mRNA transport, so the exact molecular function remains unclear [51].

An interesting feature of NPAP1 is that it is primate specific. Paralogs of NPAP1 exist in all placental species, except rodents. Since they are intronless, these paralogs likely derived from an earlier intron-containing POM121 ancestral gene through retrotransposition. This primate specificity is notable as PWS mouse models do not fully recapitulate the human phenotype, most strikingly the obesity [53].

The SNURF-SNRPN Transcriptional Unit

SNURF-SNRPN and Imprinting Center

SNURF-SNRPN stands for SNRPN upstream reading frame (SNURF) – small nuclear ribonucleoprotein polypeptide N (SNRPN) gene (Fig. 2.5a). This gene is imprinted and only expressed from the paternal allele. It is bicistronic, that is, it expresses two proteins, the SNURF and SNRPN protein, from one mRNA (Fig. 2.5b, c). Bicistronic transcripts are common in bacteria, but very rare in eukaryotes. In addition to these two proteins, the gene harbors numerous noncoding RNAs in its 3’ UTR, among them at least six classes of C/D box small nucleolar RNAs (snoRNAs) and the IPW (imprinted in Prader-Willi) RNA. The promoter region of the SNURF-SNRPN gene overlaps with the imprinting center, that is, the region that is necessary for methylation of the parental allele.

Fig. 2.5

The SNURF-SNRPN transcriptional unit. (a) Overview. The SNURF-SNRPN transcription unit comprises at least 600,000 nucleotides. It starts with at least three promoters (arrows) initiating at three start sites, one of which overlaps with the imprinting center. A further upstream promoter has been described in PWRN1 (not shown). The imprinting center (IC) is bipartite, defined by the Angelman smallest region of overlap (AS-SRO) and the PWS smallest region of overlap (PWS-SRO) that are responsible for the expression from the maternal and paternal allele, respectively. Protein coding exons are shown in black and noncoding exons in gray. (b) The SNURF-SNRPN gene creates a bicistronic mRNA that encodes the SNURF and SmN proteins. This mRNA is polyadenylated, leading to the termination of part of the transcript. ‘Read-through’, i.e., continuation of the transcript downstream of the polyadenylation site, creates a large (>440,000 bp) 3’UTR, referred to as SNHG14 (small nucleolar RNA host gene 14). SNHG14 hosts C/D box snoRNAs (SNORDs) that are located between two noncoding exons. In addition, noncoding transcripts are generated through splicing (IPW) or as intronless RNAs (PWAR1, 2). The SNHG14 transcript extends as an antisense transcript into the UBE3A gene that is expressed from the maternal allele. There is evidence for a neuron-specific promoter upstream of the SNORD115 cluster. (c) SNURF and SmN proteins are encoded by SNURF-SNRPN. (d) Overview of the SNORDs in SNHG14. Each SNORD expression unit consists of two noncoding exons flanking a SNORD located in the intron and is depicted as a single gray line. IPW is a spliced RNA consisting of three exons. Detailed genomic coordinates are given in [125, 126]

The imprinting center is defined through microdeletions [54,55,56] and is responsible for the methylation and silencing of the paternal alleles. The core of the imprinting center is defined by the shortest region of deletion overlap between PWS patients and a paternal imprinting defect (PWS-SRO). Conversely, the smallest region of deletion overlap between Angelman syndrome patients and a maternal imprinting defect is described as AS-SRO. The PWS-SRO activates the paternal allele and keeps the allele active, whereas the AS-SRO is necessary for the maternal pattern of expression [57]. It is highly unusual that imprinting centers for the maternal and paternal alleles are so close together.

The SNURF-SNRPN gene consists of at least 8 coding exons and at least 11 noncoding upstream exons. There are three major transcriptional start sites in exons 1, U1A, and U1B. Both in brain and testis, exons from the upstream PWRN1 genes are joined to SNURF-SNRPN upstream exons, indicating that PWRN1 could be part of the SNURF-SNRPN transcription unit [47]. In general, the repressed maternal allele is heavily methylated, but there are region-specific differences. For example, one of the exons in the 5’UTR (exon U1B) is located in a CpG island that is extensively methylated on the repressed maternal allele and unmethylated on the expressed paternal allele. However, the reverse methylation pattern is seen in intron 5, that is, methylated in the paternal allele, but unmethylated in the maternal allele [58].

The coding SNURF-SNRPN mRNA contains two open-reading frames, SNURF and SNRPN, that both encode nuclear proteins [59] (Fig. 2.5c). Although SNURF is evolutionarily highly conserved, its function is unclear. Its ORF contains a nuclear localization signal, as well as cAMP-dependent kinase, casein kinase, and protein kinase C sites [59].

The SNRPN open-reading frame encodes the SmN protein (Sm protein N, N for neuron), which belongs to the LSm (like Sm) protein class. Sm proteins (from Smith after an autoimmune antiserum) form a heptameric ring around small nuclear RNAs [60]. There are seven ubiquitously expressed Sm proteins (SmB, D1, D2, D3, E, F, G). SmB can be replaced with an alternatively spliced variant, SmB’ [61], or with SmN. Together with the major snRNAs (U1, U2, U4, U5), Sm proteins form small nuclear ribonucleoproteins (snRNPs) that assemble into the spliceosome.

Of note, the SmN protein is different from the SMN protein (capital M), which is generated by the survival of motor neuron 1 (SMN1) gene [62]. SMN functions in loading the Sm proteins, including SmN, onto snRNAs [63], but is not linked to the Prader-Willi gene region.

SmN is predominantly expressed in the brain and to a lesser extent in the heart [64]. Reflecting its ability to compete with SmB/B’, the relative amount of SmB/B’ is lowest in the heart and the brain. Whereas most Sm proteins are small, 16 kD or less, SmB and SmN are larger. Due to a long C-terminal extension, SmN is 28 kD. Overexpression studies showed that SmN expression leads to a reduction of SmB/B’ protein, but not their respective mRNA levels, indicating a post-transcriptional regulatory mechanism. SmN is absent in brains from PWS subjects, which is compensated by an increase of SmB/B’ [65].

SmN is incorporated predominantly into U2 snRNP and at higher cellular concentration also seen in U1 snRNPs [66]. Overexpression of SmN has little effect on overall gene expression, but it slightly influences alternative splicing of a few mRNAs, such as BIN1 and EXOC7 pre-mRNAs [67]. Mice lacking SmN show no changes in alternative splicing [68]. Thus, despite extensive molecular characterization, the physiological function of SmN and the effect of SmN substituting the SmB/B’ proteins are not fully understood.

SNHG14: 3’ UTR of the SNURF-SNRPN Transcript

The SNURF-SNRPN open-reading frames are followed by a large (>440,000 bp) 3’ untranslated region (3’ UTR), termed SNHG14 for small nucleolar RNA host gene 14. SNHG14 contains numerous nonprotein-coding RNAs, most prominently C/D box snoRNAs. The 3’UTR extends as an antisense RNA into the downstream ubiquitin protein ligase E3A gene (UBE3A) that is expressed from the maternal allele in the brain [69] (Fig. 2.5a).

The SNHG14 region is nearly devoid of H3K27Ac histone marks that indicate transcriptional start sites. However, CAGE tag analysis (cap analysis of gene expression, which maps the 5’ ends of capped transcripts) indicates start sites between the SNORD116 and SNORD115 clusters [70] (Fig. 2.5a). It is thus possible that the SNHG14 transcripts originate from the various 5’ promoters as well as from internal promoters. The SNHG14 transcript is extensively alternatively spliced and harbors multiple termination points. In addition to SNORDs, several noncoding RNAs are found in the SNHG14 primary transcript, including IPW (imprinted in PWS), PWAR1, and PWAR5 (Prader-Willi/Angelman region RNA) (Fig. 2.5b). Microdeletions of the SNORD116 cluster generate a phenotype that shares features with PWS, which emphasizes the role of SNORDs in PWS (Fig. 2.5d).

Role of C/D Box snoRNAs in PWS

General Features of C/D Box snoRNAs (SNORD)

Currently, at least 295 C/D box snoRNAs (SNORDs) have been identified [71]. However, most of our knowledge of SNORD function derives from studies performed on highly expressed SNORDs that act on ribosomal RNA (rRNA) biosynthesis. The biogenesis and function of SNORDs acting in rRNA synthesis are summarized below for a better understanding of the SNORDs in the PWS region that are far less understood [72].

snoRNAs are highly expressed RNAs, which is why they have been studied since 1979 [73] and are among the best characterized human RNAs. There are two main classes of snoRNAs: C/D box snoRNAs (SNORDs) that guide ribose 2’-O-methylation and H/ACA box snoRNAs (SNORAs) that guide pseudouridylation. A typical mammalian cell contains an estimated 200,000 copies of SNORD3 (U3) and 20,000 copies of SNORD13/14 [74, 75], which compares to an estimated 200,000 mRNA molecules in a cell [76]. SNORDs have characteristic structural elements: the C (RUGAUGA, R = purine) and D (CUGA) boxes, which are usually present in duplicates (C’ and D’ boxes), and up to two antisense boxes hybridizing to the RNA target [77]. SNORDs are flanked by two termini that form a stem in the final SNORD ribonucleoprotein complex (snoRNP) (Fig. 2.6a, c).
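
The C and D box consensus sequences given above translate directly into pattern searches. The sketch below is a minimal illustration in Python: the function name and the toy sequence are hypothetical, only exact matches to the consensus are reported, and no attempt is made to distinguish C/D from C’/D’ boxes or to check the terminal stem. Real SNORD boxes frequently deviate from the consensus, so dedicated annotation tools use more permissive models.

```python
import re

# Lookaheads allow overlapping matches to be reported.
C_BOX = re.compile(r"(?=([AG]UGAUGA))")  # RUGAUGA, R = purine (A or G)
D_BOX = re.compile(r"(?=(CUGA))")


def find_cd_boxes(sequence: str) -> dict:
    """Return 0-based start positions of exact C and D box consensus matches."""
    rna = sequence.upper().replace("T", "U")  # accept DNA or RNA alphabet
    return {
        "C": [m.start() for m in C_BOX.finditer(rna)],
        "D": [m.start() for m in D_BOX.finditer(rna)],
    }


# Toy sequence (not a real SNORD): 5' flank, C box, spacer, D box, 3' flank.
toy = "GG" + "AUGAUGA" + "CCCCAAAAUUUU" + "CUGA" + "CC"
print(find_cd_boxes(toy))
```

Because the D box consensus is only four nucleotides long, it matches by chance about once every 256 nucleotides in random sequence, which is one reason why sequence motifs alone are insufficient to identify a SNORD.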

Fig. 2.6

Structure and biogenesis of SNORDs and base complementarity between SNORD115 and 5HT2C. (a) A typical SNORD is characterized by C and D boxes that are present in duplicates (C’ and D’), flanked by two terminal sequences that form a short stem. AS: antisense boxes, that is, SNORD sequences that bind to target RNAs. (b) SNORDs (thick black line with yellow dots) reside in introns (line) flanked by two exons (boxes). The splicing reaction generates a lariat that contains the SNORD. Often, there is a distance requirement between the SNORD and the branchpoint, which reflects the binding of a protein. After debranching of the intron, exonucleases (red circles) trim the SNORD up to the stem structure that protects the snoRNA. (c) Structure of the SNORD-protein complex. A stem of four basepairs is shown; this stem can vary in structure and size. Proteins (15.5K, NOP58, NOP56, and fibrillarin) that associate with the snoRNA are indicated as ovals. The interaction with the target RNA is schematically shown; a specific residue five nucleotides downstream of the C box is modified by fibrillarin that adds a methyl group to the 2’-hydroxyl group on the ribose. (d) Schematic structure of the serotonin receptor 2C (5HT2C) pre-mRNA. The start codon in exon III is indicated. Exon Vb is alternatively spliced due to a proximal (PS) and distal (DS) splice site. Skipping of exon Vb results in RNA1 that encodes a protein residing in intracellular compartments. Vb inclusion generates a full-length receptor. A partial sequence of exon Vb is shown underneath, and the proximal splice site is highlighted in yellow. The 5HT2C pre-mRNA is edited at five sites (A–D), which changes an adenosine to inosine. (e) Base complementarity between SNORD115 and exon Vb

Biogenesis of Human SNORDs

With the exception of four SNORDs (U3, U8, U13, and U118), human SNORDs are located in introns [78, 79]. Generally, introns are released as lariats after the splicing reaction (Fig. 2.6b). The lariats are opened up by the debranching enzyme and are subsequently degraded by exonucleases: XRN1/2 acts on the 5’ end and the RNA exosome on the 3’ end [80, 81]. The close connection between pre-mRNA splicing and SNORD biogenesis is reflected by a distance requirement for SNORDs, which are located around 33–40 nt upstream of the branch point, as was shown in biochemical studies using a few SNORDs [82, 83]. SNORDs associate with proteins in a snoRNP precursor, which prevents snoRNA degradation. In addition, the termini form a short dsRNA stem that protects from exonucleases. Four proteins, NHP2L1 (15.5K, SNU13), NOP56, NOP58, and fibrillarin [84, 85, 86], which catalyzes 2’-O-methylation of target rRNAs, are then deposited on the snoRNP. The addition of proteins to the snoRNA is aided by the R2TP complex, named after the yeast proteins Rvb1 and Rvb2 (ATPases named after the E. coli DNA repair enzyme RuvB), Pih1 (protein interacting with Hsp90), and Tah1 (TPR-containing protein associated with Hsp90) [87]. R2TP components are conserved from yeast to humans. Two of its members, PIH1D1 and RPAP3, shuttle between cytosol and nucleus [88]. The shuttling and the SNORD assembly are regulated in yeast by the nutritional status via phosphorylation emanating from the mTOR pathway [88].

Well-Understood “Classical” and Novel Functions of SNORDs

The best understood function of SNORDs is their involvement in rRNA biogenesis, where they participate in 2’-O-methylation, folding, and cleavage of pre-rRNA. rRNAs are made from a large precursor that contains tandem arrays of 18S, 5.8S, and 28S rRNA. rRNA undergoes extensive modification, including 2’-O-methylation, pseudouridylation, and base modifications. SNORDs guide the 2’-O-methylation of at least 106 ribose sites in humans [89]. SNORDs bind to the nascent rRNA using interactions between their antisense elements and pre-rRNA, which positions the methylase fibrillarin to perform the 2’-O-methylation (Fig. 2.6c). It is possible that this process also influences the folding of the pre-rRNA. About 80 proteins are attached to the pre-rRNA during this step [90, 91].

In addition to 2’-O-methylation, several SNORDs, including the most abundant one, U3, direct cleavage of the pre-rRNA [92]. Although U3 binds the methylase fibrillarin, it directs only cleavage of pre-rRNA.

About half of the known SNORDs, including all the SNORDs hosted by SNHG14, do not have predicted rRNA targets and are considered “orphan” [79, 93], suggesting novel functions other than noncoding RNA methylation or cleavage.

Outside PWS, there is a growing list of diseases associated with a loss of SNORD expression that does not lead to detectable changes in rRNA processing. Examples include several forms of cancer [94,95,96,97,98,99,100,101,102,103], cancer progression [104], cardiovascular disease [105], lipotoxic stress [106], osteoarthritis [107], cerebral microangiopathy leukoencephalopathy [108], diabetic glucose deregulation [109], and viral host interactions [110].

Biochemical analyses showed that about one-third of highly expressed SNORDs form protein complexes both with and without fibrillarin, suggesting functions outside of 2’-O-methylation. It was shown that SNORD27 regulates alternative splicing in addition to its well-documented role in rRNA methylation. Fibrillarin-free, nonmethylating SNORDs likely act similarly to RNA oligonucleotides, that is, they can recognize targets using their entire sequence, not just the antisense boxes [111]. In addition, SNORDs participate in polyadenylation of RNAs [112, 113].

SNORDs of the PWS Region

All SNORDs in the PWS region are hosted by SNHG14, that is, the 3’ UTR of the SNURF-SNRPN gene (Fig. 2.5a, b). Each SNORD is localized in an intron flanked by two noncoding exons that have canonical 5’ and 3’ splice sites [114] (Figs. 2.5a and 2.6b). With the exception of SNORD115, no RNA targets have been identified for the SNORDs in the SNHG14 region.

Very little is known about SNORD107, SNORD108, SNORD64, and SNORD109A and SNORD109B. These SNORDs are predominantly expressed in the brain, but they can also be detected by RT-PCR in the other tissues tested [70, 115]. SNORD109A and SNORD109B have identical sequences but are hosted by two different introns.

SNORD116

SNORD116 is present in at least 28 tandemly arranged copies. The copies are dissimilar and fall in at least three clusters [115]. SNORD116 is strongly expressed in the brain, but it can be detected by Northern blot and RT-PCR in all tissues. The weakest expression is found in the muscle, liver, and placenta [70].

In contrast to all other SNORDs of the region, the C’ box of all SNORD116 copies deviates from the consensus RUGAUGA and is RTGAGTGA. This deviation is evolutionarily conserved [116]. Reflecting SNORD116’s dependency on splicing (Fig. 2.6b), its formation in a cellular model system depends on optimal splice sites [117] and, in mice, on the presence of neuron-specific splicing factors [118]. The exons surrounding the SNORD116 copies are joined in a large RNA that forms an “RNA cloud” that resides near sites of transcription and increases during sleep in the mouse brain [119].
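The comparison of a box sequence against the consensus RUGAUGA can be illustrated with a short script. The following is only a minimal sketch: it assumes the IUPAC convention that R denotes a purine (A or G), the function names are chosen for illustration, and the test sequence is a made-up toy RNA, not a real SNORD.

```python
import re

# Consensus quoted in the text; R is the IUPAC code for a purine (A or G).
C_BOX_CONSENSUS = "RUGAUGA"
IUPAC = {"R": "[AG]", "A": "A", "C": "C", "G": "G", "U": "U"}


def consensus_to_pattern(consensus):
    """Translate an IUPAC consensus string into a regular-expression string."""
    return "".join(IUPAC[base] for base in consensus)


def find_box(rna, consensus=C_BOX_CONSENSUS):
    """Return 0-based start positions of all consensus matches (overlaps allowed)."""
    rna = rna.upper().replace("T", "U")  # accept DNA-style input as well
    # A lookahead lets the regex engine report overlapping matches.
    pattern = re.compile("(?=" + consensus_to_pattern(consensus) + ")")
    return [match.start() for match in pattern.finditer(rna)]


# Hypothetical toy sequence for illustration only, not a real SNORD.
toy_rna = "GGAUGAUGACCUUAGUGAUGAAA"
print(find_box(toy_rna))
```

A box that deviates from the consensus, as described for the C’ box of SNORD116, would simply not be reported by such a strict scan; a tolerant search would need to allow mismatches.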

Overexpression of a single SNORD116 copy in cells caused about 200 changes in mRNA expression. These changes were often larger when SNORD115 was co-expressed, indicating a possible interaction between these SNORDs. However, no binding sites could be identified in putative target genes [117]. A comparison between neurons derived from PWS-subject and control iPS cells identified differential expression of nescient helix loop helix 2 (NHLH2) and the prohormone convertase PC1, which was also seen in a SNORD116 knockout mouse model [120]. However, no direct RNA:SNORD116 interaction could be identified.

Microdeletions of the SNORD116 region have been identified in seven individuals who exhibit a Prader-Willi-like phenotype [121,122,123,124,125,126] that is less severe than the phenotype caused by loss of expression from the full Prader-Willi region between BP1 and BP3. Comparing all microdeletions identifies a region containing SNORD116 and IPW, suggesting that the loss of these SNORDs plays a central role in PWS disease etiology [121,122,123,124,125,126,127] (Fig. 2.5b). Mice lacking SNORD116 recapitulate some of this phenotype: they show postnatal growth retardation and an increased food intake, which is compensated by higher energy expenditure [128] and can be reversed by reintroducing SNORD116 [129, 130]. In addition, mice lacking SNORD116 show a dysregulation of diurnally expressed Mtor and of the circadian genes Clock, Cry1, and Per2 [119], as well as of diurnal DNA methylation in the mouse cortex [131].

SNORD115

SNORD115 is present in 48 tandemly arranged, almost identical copies. SNORD115 is expressed almost exclusively in the brain, but smaller amounts can be detected in the kidney, liver, and muscle [70]. It is the only SNORD of the SNHG14 transcript with a known target RNA, as it shows an 18 nt complementarity to the alternative exon Vb of the serotonin receptor 2C (5HT2C) (Fig. 2.6d). Skipping of this alternative exon leads to a truncated 5HT2C receptor. In addition, exon Vb undergoes RNA editing at five sites, where an adenosine is deaminated into an inosine.
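How a stretch of complementarity between a SNORD and a candidate target can be detected computationally is sketched below. This is an illustrative example under stated assumptions, not the method used in the cited studies: it considers only perfect Watson-Crick pairs (no G-U wobble pairs or mismatches), the function names are hypothetical, and the two sequences are invented toy RNAs, not the real SNORD115 or 5HT2C sequences.

```python
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement(rna):
    """Return the reverse complement of an RNA sequence (A, C, G, U only)."""
    return "".join(COMPLEMENT[base] for base in reversed(rna))


def longest_complementary_stretch(guide, target):
    """Find the longest perfect Watson-Crick duplex between guide and target.

    Both sequences are written 5' to 3'. Returns (length, start_in_target),
    with start_in_target == -1 if no base pair is possible.
    """
    probe = reverse_complement(guide)
    best_len, best_start = 0, -1
    # Longest-common-substring dynamic programming, one row at a time.
    prev = [0] * (len(target) + 1)
    for i in range(1, len(probe) + 1):
        curr = [0] * (len(target) + 1)
        for j in range(1, len(target) + 1):
            if probe[i - 1] == target[j - 1]:
                curr[j] = prev[j - 1] + 1
                if curr[j] > best_len:
                    best_len = curr[j]
                    best_start = j - best_len
        prev = curr
    return best_len, best_start


# Hypothetical toy sequences for illustration only.
toy_guide = "UUCUAGCCAUUU"
toy_target = "CCCAUGGCUAGGAAA"
length, start = longest_complementary_stretch(toy_guide, toy_target)
print(length, toy_target[start:start + length])
```

Applied to real sequences, a search of this kind would be expected to report the 18 nt stretch mentioned above only if the complementarity is uninterrupted; allowing wobble pairs or mismatches would require a scoring-based alignment instead.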

The 5HT2C receptor regulates food intake in the arcuate nucleus. 5HT2C activity induces POMC (pro-opiomelanocortin), which is processed into alpha MSH that reduces food intake by acting on neurons in the paraventricular nucleus. The full-length 5HT2C, containing exon Vb, is constitutively active, that is, it signals without ligand binding. RNA editing changes 5HT2C’s amino acids, reducing the receptor’s coupling to G-proteins, which strongly reduces the constitutive activity. Skipping of exon Vb results in a truncated 5HT2C receptor that resides in the endoplasmic reticulum and does not reach the plasma membrane. The truncated receptor can heterodimerize with the full-length receptor, leading to a sequestration of the 5HT2C inside the cell and a reduction of 5HT2C signaling. RNA oligonucleotides mimicking the exon-inclusion effect of SNORD115 strongly reduce food intake in mice [132].

Transfection studies using reporter genes in cells showed that SNORD115 promotes the inclusion of exon Vb [133] due to direct SNORD:mRNA interaction. SHAPE assays showed that the serotonin receptor 2C pre-mRNA in this region forms a stable double-stranded structure that sequesters the regulated splice site, causing exon skipping [134]. Thus, SNORD115 appears to regulate the ratio between full-length and truncated 5HT2C receptors by influencing their alternative splicing.

In similar cell-based assays, SNORD115 was also shown to influence 2’-O-methylation of short RNAs corresponding to serotonin receptor 2C pre-mRNA when these RNAs are sent to the nucleolus by using reporter constructs with an RNA polymerase I promoter [135]. Exon Vb is included in most brain regions, with the exception of the choroid plexus, which also lacks expression of SNORD115. Overexpression of SNORD115 in choroid plexus does not promote exon Vb inclusion but has a modest effect on A-to-I editing [136].

PWS knockout mice lacking expression of the SNURF-SNRPN and SNHG14 region show a reduction in exon Vb inclusion in the arcuate nucleus and pituitary, indicating that SNORD115 could change splicing in the nucleoplasm [137, 138]. However, a recent CRISPR/Cas9 SNORD115 knockdown mouse showed no significant changes in 5HT2C alternative splicing and only modest changes in 5HT2C RNA editing, which calls into question the cell-based data and the changes seen in mice with longer deletions [139]. Thus, despite a clear binding site and effects using reporter genes in cell models, the physiological role of SNORD115 remains elusive.

IPW

IPW stands for imprinted in Prader-Willi. IPW is a noncoding RNA that is part of the SNHG14 region transcript and contains two introns. Surprisingly, removal of the IPW region in an iPS cell model resulted in an upregulation of the imprinted DLK1-DIO3 gene region on chromosome 14. This indicates that IPW downregulates maternally expressed genes of the DLK1-DIO3 region [140] through an unknown mechanism.

The imprinted DLK1-DIO3 region is important for fetal development and its deregulation in adult tissues can lead to cancer [141]. Its loss due to uniparental disomy leads to Kagami-Ogata syndrome [142]. The paternal allele expresses DLK1 and RTL1 and the maternal allele expresses GTL2, RTL1as, and MEG8 [143]. Loss of IPW leads to an upregulation of GTL2(MEG3), RTL1as, and MEG8 in cell models, but the molecular mechanism remains unclear.

GTL2/MEG3 is a noncoding RNA (maternally expressed only) that functions as a tumor suppressor. MEG8 is also a noncoding RNA; it hosts two clusters of brain-specific C/D box snoRNAs (the SNORD113 cluster and the SNORD114 cluster), as well as numerous miRNAs, an arrangement similar to the SNORD cluster in PWS [16].

Genes in Paternally Imprinted Region, Expressed in PWS, Not Expressed in Angelman Syndrome

Two genes in the Prader-Willi region, UBE3A and ATP10A, are imprinted on the paternal allele and thus expressed only from the maternal allele. Their loss of expression contributes to Angelman syndrome.

UBE3A

UBE3A encodes ubiquitin-protein ligase E3A. Analysis in mouse shows that this gene is expressed only from the mother’s allele in neurons, but it is likely expressed from both alleles in glia, that is, oligodendrocytes and astrocytes [11, 12]; thus, the loss of paternal allele expression in PWS could have an effect in these cells. UBE3A is in antisense orientation to SNHG14, which downregulates its expression [69, 144,145,146].

Functionally, UBE3A encodes the E3 ligase E6-associated protein (E6AP) that attaches ubiquitin to target proteins, resulting in their degradation. Targets have been shown to be p53, Arc, Ephexin5, and SK2 [147].

ATP10A

ATP10A encodes ATPase phospholipid transporting 10A, a P4-type ATPase that acts as a lipid flippase. It is a membrane protein that transports phosphatidylcholine across membranes, ensuring a different lipid composition of the two leaflets of the bilayer. Its overexpression in cells influences cell shape and size, cell adhesion, and spreading [148].

Nonimprinted Genes Between BP1 and BP2

These genes are biallelically expressed, but they are deleted in type I deletion patients, making it possible that their expression levels are reduced (Fig. 2.1). Microdeletions between BP1 and BP2 cause Burnside-Butler syndrome characterized by developmental delays, language impairment followed by motor delay, attention-deficit disorder/attention-deficit hyperactivity disorder, and autism spectrum disorder [3, 149, 150].

NIPA1

NIPA1 is an abbreviation for nonimprinted in Prader-Willi/Angelman syndrome region protein 1. The gene encodes a magnesium transporter that associates with early endosomes and the cell surface. This localization is magnesium dependent: a high magnesium concentration promotes localization in endosomes. Mutations in NIPA1 cause autosomal-dominant hereditary spastic paraplegia (HSP), a neurodegenerative disorder characterized by progressive lower limb spasticity and weakness [151].

NIPA2

NIPA2 is an abbreviation for nonimprinted in Prader-Willi/Angelman syndrome region protein 2. Similar to NIPA1, this protein is a magnesium transporter that maintains magnesium influx [152]. NIPA2 is downregulated in mouse models of diabetes and promotes osteoblast function, likely by influencing mitophagy, that is, the selective degradation of mitochondria by autophagy [153].

CYFIP1

CYFIP1 stands for cytoplasmic FMR1 interacting protein 1 (FMR1: fragile X mental retardation protein). A subgroup of fragile X syndrome patients shows a Prader-Willi-like phenotype with obesity and hyperphagia. In 13 investigated cases of fragile X syndrome with a Prader-Willi-like phenotype, CYFIP1 mRNA was reduced [154].

Reflecting its association with FMR1, CYFIP1 inhibits local protein biosynthesis. In addition, CYFIP1 is a component of the WAVE regulatory complex that regulates actin polymerization. In the WAVE complex, CYFIP1 is often called SRA1.

CYFIP1 also interacts with the small GTPase Rac1 and localizes in synaptosomes [155]. When induced by BDNF (brain-derived neurotrophic factor), Rac1 causes a conformational change of CYFIP1 that releases CYFIP1 from translational initiation factors and promotes its association with the WAVE complex and actin polymerization. Through this mechanism, CYFIP1 contributes to the formation of dendritic spines [156].

TUBGCP5

TUBGCP5 is tubulin gamma complex-associated protein 5. The gamma tubulin ring complex is a large protein complex that nucleates microtubules at the centrosome [157]. Centrosomes are the main microtubule-organizing centers in the cell [158]. TUBGCP5, also named KIAA1899, is expressed in all tissues, including the brain [159].

Nonimprinted Genes Between ATP10A and BP3

GABRB3, GABRA5, GABRG3

The region contains three genes encoding subunits of the GABA A receptor, namely, GABRB3 (beta3 subunit), GABRA5 (alpha 5 subunit), and GABRG3 (gamma 3 subunit). The GABRG3 locus also expresses a spliced, noncoding antisense transcript GABRG3-A3. The GABA A receptors are the major inhibitory receptors in the brain, responding to GABA (gamma amino butyric acid). Each receptor is formed by five subunits that form a chloride channel. Currently, 19 subunits are known. GABA A receptors are a common drug target, most notably for benzodiazepines [160].

OCA2

OCA2 is an abbreviation for oculocutaneous albinism, type 2. Mutations in this gene lead to oculocutaneous albinism, which affects the eyes (oculo-) and skin (-cutaneous). The gene is the homolog of the mouse pink-eyed dilution (p) locus and encodes a transmembrane protein with 12 membrane-spanning domains [161]. Melanosomes lacking OCA2 have a higher pH than melanosomes of wild-type melanocytes, suggesting that OCA2 regulates the pH of melanosomes. It is not fully understood how this change in pH leads to a loss of melanin, which could occur by inhibiting either melanin synthesis or the uptake of tyrosine, the substrate for melanin synthesis [162]. The hypopigmentation seen in PWS is likely linked to the hemizygosity of OCA2, as one copy is lost due to the deletion [163]. Since the pink-eyed dilution (p) locus in mouse shows recessive inheritance, it is possible that other modifying genes act on OCA2 [162].

HERC2

HERC2 stands for HECT and RLD domain-containing E3 ubiquitin protein ligase 2. It contains a C-terminal HECT domain (homologous to E6-AP C terminus) that catalyzes the transfer of ubiquitin from E2 ubiquitin-conjugating enzymes. HERC2 also contains an RLD domain, which stands for regulator of chromosome condensation 1 (RCC1)-like domain. The RLD domain has a twofold function: it acts as a guanine nucleotide-exchange factor (GEF) for the small GTPase Ran and also interacts with histones H2A and H2AB that are bound to chromatin [164]. In addition, the protein contains a cytochrome b5-like region, a mind-bomb/HERC2 (M-H) domain, a CPH domain, a ZZ-type zinc finger, and a DOC domain. Reflecting the multiple domains, the protein interacts with more than 300 other proteins and likely serves as a scaffold to integrate protein complexes involved in protein transport, metabolism, and translation [165]. HERC2 binds to UBE3A via its HERC2-binding domain (Fig. 2.2e) [166]. The best understood functions are in DNA repair and replication. HERC2 is expressed in most tissues and throughout the cell, but it associates with the centrosome. HERC2 deletion in mice leads to reduced growth, jerky gait, male sterility, female semisterility, and maternal behavior defects, a phenotype known as rjs (runty, jerky, sterile) [167].

Nonimprinted Genes Between BP3 and BP5

Defects in genes between BP4 and BP5 are extremely rare but have been reported.

APBA2

APBA2 is the amyloid beta A4 precursor protein-binding family A member 2. The protein acts as a scaffold regulating the generation of β-amyloid [168, 169].

CHRNA7

CHRNA7 encodes the neuronal acetylcholine receptor subunit alpha-7. The gene is ubiquitously expressed, including in most brain regions. Acetylcholine receptors are composed of five subunits, and 11 different subunits are expressed in the brain. Changes in CHRNA7 expression are associated with schizophrenia, bipolar disorder, ADHD, and epilepsy [170].

Contribution of Genes to PWS

A tremendous amount of work has been done on the genes expressed in the Prader-Willi region. No single gene stands out as the sole contributor to PWS. Five genes that function in the ubiquitination of proteins could affect hundreds of proteins. Similarly, the noncoding RNAs probably have multiple targets. It is thus likely that the loss of expression of numerous genes acts together to create the syndrome, which needs to be taken into account for any therapeutic intervention.

Databases

The piRNA database can be visualized at http://regulatoryrna.org/database/piRNA/genome.php.

Genes and their annotation can be visualized at http://genome.ucsc.edu.