Introduction

Proteotoxic diseases arise from protein misfolding within or outside cells. Extracellular amyloid, for example, is a feature of diverse diseases (Table 1A) [1]. Such β-sheet-rich pathological deposits [2, 3] arise from misfolding of globular proteins, such as immunoglobulin light chains or β2-microglobulin in association with hematologic malignancies [4]; mutations in various other proteins can predispose to toxic deposition, as exemplified by unstable variants of serpins, transthyretin, and lysozyme [5–7]. In neurodegenerative diseases, perineuronal plaque reflects aggregation-coupled misfolding of intrinsically disordered polypeptides [8, 9]. Principles of amyloidogenesis have recently been reviewed [10]. Of complementary importance is intracellular proteotoxicity (Table 1B). Its pathophysiologic importance has motivated studies of a series of foundational mechanisms, including nascent folding, quality control, trafficking, and degradation [11]: a dynamic regulatory network collectively designated proteostasis [12]. Distinctive protein inclusions within cells are histopathological hallmarks of neurodegenerative diseases, as exemplified by huntingtin aggregation in neuronal nuclei (Huntington’s disease [13]) and tau-related cytoplasmic neurofibrillary tangles (Alzheimer’s disease [14]).

Table 1 Summary of human diseases that arise due to protein deposits

This review highlights a monogenic form of diabetes mellitus (DM), the mutant proinsulin syndrome [15••, 16••], which is due to nascent protein misfolding in the endoplasmic reticulum (ER). Although monogenic DM syndromes encompass a variety of genes and molecular mechanisms, mutations in the insulin gene (INS) are of particular interest in relation to the pathway of insulin biosynthesis [17, 18]. The mutant proinsulin syndrome, also designated Mutant INS-gene-induced Diabetes of Youth (MIDY) or Maturity-onset diabetes of the young 10 (MODY10) [19, 20], is caused by toxic misfolding of variant proinsulins [15••, 16••] (for review, see [21, 22]; Table 2); the human syndrome was anticipated by the Akita mouse model [23, 24]. Clinical mutations (genetically dominant) impair secretion of both variant and wild-type insulin in trans; misfolding and aggregation activate the unfolded-protein response (UPR) and induce ER stress, leading in turn to β-cell dysfunction and death [25, 26] (for review, see [27•]). Here, we delineate structural mechanisms by which MIDY mutations impair the folding efficiency of proinsulin. Biophysical principles underlying ER-related proteotoxicity in this syndrome promise to provide general insight into a broad class of proteotoxic diseases [28].

Table 2 Mutations in preproinsulin observed in patients with diabetes [50]

Monogenic Diabetes and Proinsulin Syndrome

Monogenic DM can arise due to mutations in genes encoding key transcription factors, subunits of the β-cell potassium channel, the β-cell glucose-sensor glucokinase, or insulin itself [29, 30]. Collectively, such syndromes comprise 1–5% of DM [31]. The spectrum of phenotypes ranges from transient or permanent neonatal-onset DM (tNDM and pNDM) to MODY [32]. Whereas NDM presents within the first 6 months of life, MODY ordinarily has an onset between 10 and 25 years of age. Heterozygous INS mutations constitute the second most common cause of monogenic DM (after potassium channel mutations [33, 34]). Genotype–phenotype correlations in the mutant proinsulin syndrome suggest that ages of onset reflect mutational severity. Mutation-specific phenotypes are general features of other genetic diseases, such as partial or complete androgen-insensitivity syndrome and cystic fibrosis, among several others [35–40]. In addition to mutation-specific effects, clinical differences in penetrance, disease severity, or age of onset may be influenced by modifier genes or environment, as observed in other endocrine syndromes [41], including type 1 DM [42].

Biosynthesis of Insulin

The INS gene encodes preproinsulin, a single-chain precursor polypeptide whose domains are organized from N to C terminus as signal peptide-B-C-A [18]. The signal peptide is cleaved co-translationally on ER translocation. Folding within the ER accompanies a specific pairing of three disulfide bridges. Processing of proinsulin by prohormone convertases PC1/3 and PC2 generates the two-chain hormone in glucose-regulated secretory vesicles [17]. Two of the mature hormone’s three cystines link the A and B chains (A7-B7, B19-A20) whereas the third lies within the A chain (A6-A11) [43]; each is required for stability and activity [44, 45]. Mispairing of disulfides in vitro leads to reduced stability and activity [46, 47]. The solution structure of proinsulin (as an engineered monomer) contains a native-like insulin core (51 residues) with a flexible C domain (35 residues) [48]. Whereas clinical INS mutations primarily affect nascent folding in the ER, specific mutations have been identified that selectively perturb protein trafficking, prohormone processing, and receptor binding [49, 50•].
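The residue nomenclature used throughout (for example, LeuB6 or CysA20) can be related to the linear B-C-A organization of proinsulin by a minimal sketch such as the following. The 51-residue core and 35-residue C domain are taken from the text above; dividing the core into a 30-residue B chain and a 21-residue A chain is an assumption based on human insulin, and the names `DOMAINS` and `domain_label` are hypothetical.

```python
# Domain order and lengths of proinsulin (signal peptide already removed).
# The 35-residue C domain and 51-residue core come from the text;
# the split of the core into B = 30 and A = 21 is an assumption (human insulin).
DOMAINS = [("B", 30), ("C", 35), ("A", 21)]

PROINSULIN_LENGTH = sum(length for _, length in DOMAINS)


def domain_label(position):
    """Map a 1-based proinsulin position to a chain-relative label such as 'B6'."""
    if position < 1:
        raise ValueError("positions are numbered from 1")
    offset = position
    for name, length in DOMAINS:
        if offset <= length:
            return f"{name}{offset}"
        offset -= length
    raise ValueError("position lies beyond the end of proinsulin")


if __name__ == "__main__":
    for pos in (6, 19, 31, 85):
        print(pos, domain_label(pos))
```

Under these assumptions, the cysteines of the critical [B19-A20] bridge are widely separated in the linear sequence of the single-chain precursor, which is why their pairing depends on the collapse of intervening segments.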

ER Quality Control

Chemical trapping studies of insulin-related precursor polypeptides in vitro have demonstrated accumulation of one- and two-disulfide intermediates, thus providing evidence for a hierarchical disulfide pathway [51, 52]. Together, these studies suggested the initial formation of cystine B19-A20 along with hydrophobic clustering by the C-terminal α-helix and the central B-chain α-helix. Such a native-like structure, recapitulated in a one-disulfide peptide model [44, 45], defines a specific folding nucleus [53]. Cellular folding of proinsulin and disulfide analogs has been extensively investigated by Arvan, Kaufman, and their respective colleagues in relation to the ER oxidative-folding machinery (stress, quality control, and exit [54]; for review, see [55]). Pairwise substitution of cysteines enabled the respective contributions of each disulfide bridge to be evaluated [54]. The results highlighted the importance of cystines [A7-B7] and [B19-A20] (but not [A6-A11]) for efficient ER export and eventual secretion. Evidence was obtained that an unpaired thiol group at CysA11 underlies the proteotoxicity of SerA6-murine proinsulin (Ins2-Munich [56]). The particularly deleterious role of a single cysteine at A11 was thus highlighted, as CysA11 can mispair with three other cysteines (CysB19, CysA20, and CysB7) in the same molecule or mediate aberrant intermolecular cross-linking [54].
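The combinatorial problem posed by oxidative folding can be made concrete with a short enumeration. The sketch below is illustrative only: it counts the intramolecular disulfide species that six cysteines could form in principle, not the species observed experimentally, and the helper names (`pairings`, `partial_pairings`, `count_species`) are hypothetical.

```python
from itertools import combinations

# The six canonical cysteines of insulin, labelled by chain and position.
CYSTEINES = ["A6", "A7", "A11", "A20", "B7", "B19"]

# Native pairing as described in the text: A6-A11, A7-B7, and B19-A20.
NATIVE = frozenset({
    frozenset({"A6", "A11"}),
    frozenset({"A7", "B7"}),
    frozenset({"B19", "A20"}),
})


def pairings(sites):
    """Yield every way of pairing all sites into disulfides."""
    if not sites:
        yield frozenset()
        return
    first, rest = sites[0], sites[1:]
    for i, partner in enumerate(rest):
        remaining = rest[:i] + rest[i + 1:]
        for tail in pairings(remaining):
            yield tail | {frozenset({first, partner})}


def partial_pairings(sites, k):
    """Yield every species with exactly k disulfides (other thiols left free)."""
    for subset in combinations(sites, 2 * k):
        yield from pairings(list(subset))


def count_species(k):
    """Count the distinct k-disulfide species available to six cysteines."""
    return sum(1 for _ in partial_pairings(CYSTEINES, k))


if __name__ == "__main__":
    for k in (1, 2, 3):
        print(f"{k}-disulfide species: {count_species(k)}")
    isomers = list(pairings(CYSTEINES))
    print("native pairing among three-disulfide isomers:", NATIVE in isomers)
```

Only one of the fully oxidized isomers is native, which illustrates why a hierarchical pathway that forms cystine [B19-A20] first would be expected to narrow the search considerably.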

Molecular Rheostat of Foldability

Whereas most MIDY mutations entail either loss or addition of Cys, non-Cys-related (NCR) mutations highlight key determinants of foldability [50•]. Many such mutations cluster near the critical [B19-A20] disulfide bridge, particularly in the B9-B19 or A16-A19 helices. These are of biophysical interest as the variant polypeptide retains the six canonical Cys residues: impaired disulfide pairing presumably reflects general biophysical principles that underlie protein folding, structure, and stability [57, 58]. Prominent among these are (i) the efficiency of side-chain packing in a hydrophobic core [59] and (ii) the intrinsic secondary-structural propensities of the amino acids [60]. Large-to-small mutations [61], for example, can introduce destabilizing cavities in the native state [62] and by extension in a native-like specific folding nucleus [44]. Within helices, the substitution of a residue of high helical propensity by one of lower helical propensity can likewise impair stability [63, 64]. We describe in turn below clinical mutations that exemplify these principles. We chose the following subset of NCR mutations based on (a) their positioning within or near proinsulin’s specific folding nucleus [53] (Fig. 1) and (b) illustrative biophysical mechanisms of impaired foldability. A foundational structural model is provided by the crystallographic T-state monomer (PDB entry: 4INS) [43], as recapitulated in the insulin-like core of a proinsulin monomer [48].

  • (i) The side chain of LeuB6 inserts into an interchain cavity surrounded by the invariant side chains of LeuB11, LeuB15, and LeuA16 (Fig. 1). At this site, a variety of non-conservative mutations (Arg, Gln, Pro, and Val) lead to neonatal-onset DM. Each would be expected to introduce profound structural perturbations. In contrast, the MODY substitution MetB6 is presumably associated with only subtle changes in packing.

  • (ii) LeuB11 contributes to segmental α-helical propensity and nascent clustering of nonpolar residues. The side chain is buried within a cavity abutting the nonpolar inner surface of the A chain. Clinical mutations are Pro or Gln, each expected to impede initial [B19-A20] disulfide pairing: ProB11 would profoundly perturb α-helical propensity, stability, and self-assembly. GlnB11 would fit within the B11-related cavity, but its carboxamide group would impose an electrostatic penalty.

  • (iii) The side chain of LeuB15 packs within a nonpolar crevice delimited by CysB19 and PheB24. Clinical mutations at B15 are Pro, His, and Val (neonatal in each case). Like ProB11 (above), ProB15 would be expected to introduce marked perturbations. Another neonatal mutation at this position (His) would insert a polar aromatic side chain into the nonpolar pocket, thus destabilizing the core. The β-branched side chain of ValB15 would by contrast be associated with more subtle effects due to its lower α-helical propensity and smaller volume, relative to Leu.

  • (iv) ValB18 adjoins CysB19 near the end of the central B-chain α-helix. Clinical mutations are Gly (neonatal) and Ala (MODY). Each would impair the efficiency of core packing near cystine [B19-A20] in a solvent-exposed interchain crevice. Substitution by Gly (a residue whose helical propensity is as low as that of Val) would create a cavity and enhance main-chain flexibility, presumably interfering with nascent [B19-A20] pairing. Interestingly, the consequences of these perturbations differ between Gly and Ala, as reflected in the age of onset. Ala is predicted to exhibit offsetting biophysical effects: greater helical propensity but impaired packing efficiency.

  • (v) Three neonatal-onset MIDY mutations have recently been found in the A domain (ProA16, AspA19, and AsnA19) [50•, 65]. The side chain of LeuA16 is buried within the core (Fig. 1). ProA16 would perturb the segmental main-chain conformation and introduce a destabilizing cavity [66•]. TyrA19 projects from a nonpolar crevice (lined in part by cystine [B19-A20]) to expose its para-hydroxyl group; AspA19 would place a destabilizing negative charge within the core. Similarly, AsnA19 would impede foldability by projecting its carboxamide group into the nonpolar core.
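The genotype-phenotype relationships enumerated in items (i)-(v) can be tabulated as a small data structure. The following is a minimal sketch restricted to what is stated above; variants whose age of onset is not given in this section (ProB11 and GlnB11) are marked "not stated" rather than inferred, and the names `NCR_MUTATIONS`, `variants_by_onset`, and `sites_with_mixed_onset` are hypothetical.

```python
from collections import defaultdict

# (site, wild-type residue, variant residue, onset as stated in items (i)-(v)).
# "not stated" marks variants for which this section gives no age of onset.
NCR_MUTATIONS = [
    ("B6", "Leu", "Arg", "neonatal"),
    ("B6", "Leu", "Gln", "neonatal"),
    ("B6", "Leu", "Pro", "neonatal"),
    ("B6", "Leu", "Val", "neonatal"),
    ("B6", "Leu", "Met", "MODY"),
    ("B11", "Leu", "Pro", "not stated"),
    ("B11", "Leu", "Gln", "not stated"),
    ("B15", "Leu", "Pro", "neonatal"),
    ("B15", "Leu", "His", "neonatal"),
    ("B15", "Leu", "Val", "neonatal"),
    ("B18", "Val", "Gly", "neonatal"),
    ("B18", "Val", "Ala", "MODY"),
    ("A16", "Leu", "Pro", "neonatal"),
    ("A19", "Tyr", "Asp", "neonatal"),
    ("A19", "Tyr", "Asn", "neonatal"),
]


def variants_by_onset(mutations):
    """Group variant names (e.g. 'MetB6') by their stated age of onset."""
    grouped = defaultdict(list)
    for site, _wild_type, variant, onset in mutations:
        grouped[onset].append(f"{variant}{site}")
    return dict(grouped)


def sites_with_mixed_onset(mutations):
    """Return sites at which different substitutions give different onsets."""
    onsets = defaultdict(set)
    for site, _wild_type, _variant, onset in mutations:
        if onset != "not stated":
            onsets[site].add(onset)
    return sorted(site for site, seen in onsets.items() if len(seen) > 1)


if __name__ == "__main__":
    for onset, variants in variants_by_onset(NCR_MUTATIONS).items():
        print(f"{onset}: {', '.join(variants)}")
    print("mixed onset at:", sites_with_mixed_onset(NCR_MUTATIONS))
```

Sites at which different substitutions give different ages of onset (here B6 and B18) are the most direct illustrations of the proposal that clinical severity tracks the biophysical severity of the substitution.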

Fig. 1

Structure and sites of clinical mutations in insulin. Ribbon model of the insulin monomer showing the core residues (PDB entry 4INS [43]). Sulfur atoms in the A6-A11 and B19-A20 disulfides are shown as gold spheres and A7-B7 as sticks. Other side chains are shown in dark blue (near the A6-A11 cystine) or light blue (near the B19-A20 cystine); residues TyrA19 and HisB5, which lie near the core and are also sites of clinical mutation, are shown in magenta. All other side chains are shown in light gray (A chain) or dark gray (B chain). The right-hand panel shows the view rotated vertically by 90°.

Position A16 has long been of interest in relation to the structure, foldability, and function of insulin [43, 67•, 68••]. Invariant within an extended vertebrate family (insulin and insulin-like growth factors [IGF-I, II]) and also among most relaxins/insulin-like peptides (ILPs) [69], the side chain of LeuA16 is buried in the core in both free and receptor-bound states [70–72]. Packing of LeuA16 efficiently fills a potential cavity delimited by conserved nonpolar receptor-binding elements (LeuB15, IleA2, and TyrA19) and girded by cystines [A6-A11] and [B19-A20] [43, 70]. Such a “left-over space” (akin to Gould’s celebrated evolutionary metaphor of the spandrels of the San Marco cathedral in Venice [73]) rationalizes the exquisite sensitivity of insulin chain combination to A16 substitutions [67•]. LeuA16 is invariant as an “exaptation,” the only side chain able to fit in this space otherwise peripheral to the mechanism of receptor binding. Indeed, substitution of LeuA16 by Val, although rendering the yield of chain combination negligible and impairing the folding of proinsulin, is nonetheless compatible with native structure and function once the folded state has been reached [68••]. Remarkably, ValA16 has recently been found in an infant in Saudi Arabia as a recessive MIDY mutation [50•] (E. De Franco, personal communication), to our knowledge the first instance of a point mutation with recessive inheritance. Additional recessive mutations may occur among MIDY patients, but lack of family history could obscure their identification (a general issue in human genetics; for review in monogenic diabetes syndromes, see [74]); ValA16 provides a prototype recessive mutation in a society notable for consanguinity [75]. It is noteworthy that detailed analysis of structure, foldability, and function of ValA16-insulin and ValA16-proinsulin [68••] preceded its clinical description [50•].

In the initial steps of proinsulin folding, the side chains of B11, B15, B18, A16, and A19 are proposed to collapse to form a specific folding nucleus guiding pairing of CysB19 and CysA20 [44, 51, 52]. Together, the above analysis supports a broad hypothesis that the variable age of onset of MIDY-related DM (and perhaps the mode of inheritance, dominant or recessive) is intrinsic to the biophysical properties of the mutations, as distinct from environmental effects or the influence of potential modifier genes as pertinent to the onset of type 1 DM [42]. This hypothesis presumably extends to the collection of MIDY mutations as a whole and is not restricted to the above subset of substitutions.

We anticipate that one or another MIDY mutation may primarily impair pairing of any one of proinsulin’s three disulfide bridges. However, not all structural elements of a protein’s native state contribute to its folding nucleus or subsequent steps in oxidative protein folding. Like Sherlock Holmes’ famous clue, “the dog that did not bark in the night-time” [76], the absence of clinical mutations in such an element may be as informative as the presence of mutations in other elements. An example is provided by proinsulin’s flexible C domain: although mutations at the dibasic cleavage sites can lead to secretion of split proinsulins [49], lack of non-Cys-related mutations in this segment implies that disulfide pairing is robust to such substitutions, in accordance with both the C domain’s evolutionary variability in sequence and length and its diversification among chordate insulin-like growth factors [77].

Conclusions

The discovery of insulin in Toronto in 1921, followed the next year by its first clinical use, represents a landmark in the history of molecular medicine [78]. However transformational, the work of Banting, Best, Collip, and Macleod provided only the starting point for generations of seminal basic and translational investigations: the ensuing century of discovery is the subject of recent commemoration and review [79]. Identification of the mutant proinsulin syndrome in this century [50•] has brought together long-standing themes in diabetes research—hormone biosynthesis and structure—with foundational paradigms in human genetics, cell biology, and protein biophysics [22, 50•, 55].

NCR mutations in proinsulin associated with toxic misfolding in principle define key determinants of foldability, providing insight into how specific disulfide pairing is specified by the wild-type protein sequence [58]. Although structural studies have encountered an experimental “Catch-22” (i.e., they are confounded by impaired folding efficiency), we anticipate that frontier synthetic methods [80, 81] may circumvent this critical barrier to provide tractable models [82, 83]. Such synthetic advances promise to enable our overarching hypothesis (that the variable age of onset among MIDY patients is due to mutation-specific biophysical mechanisms) to be rigorously tested. Furthermore, such biophysical insights may enable molecular interpretation of pathophysiologic events in the stressed ER that contribute to trans-interference with wild-type proinsulin biosynthesis and impaired glucose-stimulated insulin secretion [66•]. Key questions include how wild-type and variant folding intermediates self-associate in the ER and in turn how such aggregates block trafficking to the Golgi apparatus [55, 84, 85].

The broader significance of the mutant proinsulin syndrome pertains to non-syndromic type 2 DM. Under conditions of peripheral insulin resistance leading to INS overexpression, misfolding of even wild-type proinsulin can activate the unfolded-protein response and induce ER stress [86]. We envision that β-cell dysfunction caused by mutations in proinsulin may recapitulate, in accelerated form, the natural history of type 2 DM [27•, 83, 87]. Accordingly, studies of such variants in β-cell lines, isolated islets, and engineered mouse models promise to provide broad insights into the pathogenesis of a pandemic disease [27•]. Such models may also enable development of novel therapeutic approaches that focus on reducing β-cell ER stress elicited by the misfolding of wild-type proinsulin [88•]. This prospect exemplifies a general paradigm in pharmacology whereby rare monogenic syndromes can open doors to innovative drug discovery [89]. It is fitting that such opportunities have arisen at the cusp of insulin’s second century [79, 90].

Note Added In Proof

Classification of clinical mutations in proinsulin according to structural mechanisms of disulfide pairing may be obtained from equilibrium peptide models of oxidative folding intermediates [108, 109].