INTRODUCTION

G protein-coupled receptors (GPCRs), which are one of the most important classes of membrane proteins in pharmacology and the most represented family of proteins in the human proteome, are characterized by an amphiphilic nature and low stability. GPCRs consist of seven transmembrane (TM) helices and are expressed on the cell plasma membrane. They recognize a broad range of extracellular ligands and transmit signals into the cells, thus triggering different types of cellular responses. GPCRs play a paramount role in many physiological functions of the human organism, such as vision, gustation, olfaction, regulation of the activity of the nervous, immune, and cardiovascular systems, and maintenance of homeostasis and cell density in tissues. Dysfunctions in these processes lead to serious diseases that can be corrected by blocking or activating the respective receptors. Up to 40% of currently prescribed drugs target GPCRs [1, 2]. Knowing the high-resolution structures of GPCRs in different functional states is essential for understanding the molecular mechanisms underlying their action, as well as for the creation of highly efficient medicines with minimal side effects. The time and cost of drug development can be significantly decreased by using structure-based drug design [3]; hence, studies of GPCRs can benefit basic science, medicine, and pharmacology.

Currently used methods for obtaining high-resolution structures of proteins are X-ray crystallography [4], cryoelectron microscopy (cryo-EM) [5], microcrystal electron diffraction (microED) [6], and biomolecular nuclear magnetic resonance (NMR) spectroscopy [7].

Studying GPCR structure and functions requires homogeneous stabilized samples of these receptors. Each of the involved procedures (e.g., expression and purification of stable monomeric receptor) represents an independent scientific task requiring development of the corresponding protein engineering strategy.

In this article, we reviewed the strategies that proved to be the most successful in resolving the structures of GPCRs and their complexes, including rational design of genetic modifications (prediction of necessary point mutations, deletions; the use of tags and fusion proteins), selection of the optimal expression systems, development of strategies for protein purification and stabilization (including the use of various membrane-modeling media, antibodies and their fragments, and ligands), as well as the basic approaches to the characterization and quality control of the obtained protein samples.

GENETIC CONSTRUCTS

Introduction of point mutations. In most cases, point mutations leading to the increase in the receptor expression level, protein homogeneity, and higher thermostability and, in some cases, aimed to eliminate undesired posttranslational modifications, are introduced into receptor-encoding genetic constructs. For example, it has been shown that for some class A GPCRs, amino acid substitutions in the allosteric sodium-binding site upregulate protein expression [8] and decrease the conformational heterogeneity of the receptor via its stabilization in the states preferential for the binding of antagonists [9] or agonists, or in the intermediate states [10]. Similar modifications were used for the class B receptors [11].

Posttranslational modifications can reduce the homogeneity of the protein sample during crystallization. To eliminate the negative effect of posttranslational modifications, some receptor residues were substituted with amino acids with different functional side groups. N-glycosylation is the most common posttranslational modification in GPCRs, so deleting the glycosylation sites has proven to be beneficial [12-15]. However, in most cases, posttranslational modifications are functionally important and can affect protein expression, folding, and in some cases, the ability to bind ligands; hence, elimination of posttranslational modifications can result in decreased surface expression for many GPCRs [16].

Another approach is introduction of cysteine bridges into the protein molecule. It has been commonly recognized that disulfide bonds have a direct effect on the protein folding and stability [17-19]. For some GPCRs, introduction of point mutations leading to the formation of disulfide bridges between two closely spaced cysteine residues proved to be quite successful, for example, introduction of disulfide bond into the rhodopsin molecule between the N-terminus and the third extracellular loop (ECL3). This modification increased the thermostability of the receptor by 10°C [20], both as the opsin apoprotein [21] and in a complex with retinal [20] in a detergent solution. At the same time, it had no significant effect on the receptor function [20-22]. In another example, the position for the disulfide bond introduction into the LPA1 receptor was determined using a predictive algorithm [23]. The formation of a disulfide bond between the external sides of the TM domains 5 and 6 increased the thermostability of the LPA1 receptor by 5°C in the presence of a ligand [24]. The formation of a disulfide bond between the TM helices 5 and 6 in the GLP-1 receptor resulted in the stabilization of its inactive conformation [25].

Despite the existing concepts on the mechanisms of receptor stabilization, the choice of point mutations that would lead to the desired effects remains a difficult task. Below, we present some approaches aimed to solve this problem.

Alanine scanning and StaR™ technology. Alanine scanning is a method of sequential site-directed mutagenesis used to identify specific amino acid residues that influence the stability or function of a studied protein. Because alanine has only a beta-carbon atom in its side chain [26], it is a small amino acid that can replace almost any other residue without creating steric hindrances. At the same time, alanine preserves the polypeptide backbone conformation (unlike glycine and proline) [27]. Alanine residues tend to form alpha-helices, which are common in membrane proteins [28]. Due to these properties, alanine is preferred as a substituting residue in the search for the functional amino acids in a receptor.

In the course of alanine scanning, all amino acid residues are sequentially replaced by alanine, while native alanine residues are typically replaced by leucine [29, 30] and less often, by other amino acids [31]. The stability of the resulting mutant proteins is evaluated experimentally and the best mutations are selected. Alanine scanning has been used to design more stable receptors, as well as to identify amino acids involved in the signal transduction [32].
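
The substitution rule described above can be illustrated with a short Python sketch. The function name `alanine_scan`, its `ala_replacement` parameter, and the four-residue toy sequence are hypothetical and are not taken from the cited studies; the sketch only enumerates the single-point mutants and does not model how their stability is measured.

```python
def alanine_scan(sequence, ala_replacement="L"):
    """Enumerate single-point mutants for an alanine scan.

    Every residue is replaced by alanine; native alanine residues are
    replaced by `ala_replacement` (leucine by default, an assumption
    that follows the typical choice described in the text).
    """
    mutants = []
    for pos, wild_type in enumerate(sequence, start=1):
        new = ala_replacement if wild_type == "A" else "A"
        mutated = sequence[:pos - 1] + new + sequence[pos:]
        # Conventional mutation name: wild-type residue, position, new residue
        mutants.append((f"{wild_type}{pos}{new}", mutated))
    return mutants


# Hypothetical toy sequence, used only to show the naming scheme
for name, mutated in alanine_scan("MKAL"):
    print(name, mutated)
```

Each returned pair holds the mutation name and the full mutant sequence, so the list can be passed directly to construct design or to an ordering sheet.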

As a rule, alanine scanning is not performed for wild-type receptors. Usually, the starting point is a modified construct, e.g., protein with deleted flexible N- and C-terminal domains, since such deletions can promote protein expression and reduce proteolytic cleavage of the terminal receptor sequences [29, 30, 33].

Although using alanine and leucine significantly helps in the initial search for positions that can be mutated to stabilize the protein, using other amino acid as substitutions might result in better stabilization of the receptors [29, 30].

Stabilized Receptor™ (StaR™) technology involves a design of thermostable receptors by inserting point mutations into their structure. The thermostability increases due to the protein stabilization in any of its conformations existing in nature: from the ground state to the fully active G protein-bound state. The choice of the state is determined by the type of ligand (agonist, antagonist, inverse agonist) used in the experiments. Thus, when the agonist-binding conformation is chosen, the receptor will have a high affinity to other agonists, and vice versa. For example, the constructs with an enhanced affinity to agonists or antagonists have been designed for the adenosine A2A receptor (A2AR) [30].

In essence, the StaR™ technology is based on alanine scanning. All mutants with the single point substitutions are expressed and their ability to bind radioactively labeled ligand at increasing temperatures is assessed in comparison with that of the wild-type protein [34]. Next, point mutations promoting the thermostability of the ligand–receptor complex are sequentially combined with each other to achieve the maximum melting temperature [35].
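
A minimal Python sketch of how such a thermostability comparison could be quantified is given below. It assumes that the fraction of ligand binding remaining after heating has already been measured at several temperatures, and it estimates an apparent melting temperature as the point where binding falls to 50% by linear interpolation. The function name `apparent_tm`, the 50% threshold, and all numbers are illustrative assumptions and not data from the cited works.

```python
def apparent_tm(temps_c, fraction_bound, threshold=0.5):
    """Temperature at which the remaining ligand binding drops to `threshold`.

    Uses linear interpolation between the two measurements that bracket
    the threshold; `temps_c` must be sorted in increasing order.
    """
    points = list(zip(temps_c, fraction_bound))
    for (t0, f0), (t1, f1) in zip(points, points[1:]):
        if f0 >= threshold >= f1 and f0 != f1:
            return t0 + (f0 - threshold) * (t1 - t0) / (f0 - f1)
    raise ValueError("binding curve does not cross the threshold")


# Invented example curves (fraction of binding retained after heating)
temps = [20, 30, 40, 50]
wild_type = [1.00, 0.60, 0.20, 0.05]
mutant = [1.00, 0.90, 0.50, 0.10]

shift = apparent_tm(temps, mutant) - apparent_tm(temps, wild_type)
print(f"apparent Tm shift relative to wild type: {shift:.1f} degrees C")
```

In practice, a sigmoidal fit to the full curve would usually be preferred over two-point interpolation; the interpolation is used here only to keep the sketch free of external dependencies.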

The first crystal structure obtained with the help of the StaR™ technology was the structure of the turkey β1-adrenergic receptor (β1AR) [29]. The authors used the alanine scanning, as well as substitutions based on the amino acid sequence alignment with the sequence of bovine rhodopsin, the only GPCR with the known structure at that time.

Because of the need to test a large number of mutants (more than 300) for thermostability, designing a StaR™ construct might take months of laborious work. However, this technique remains relevant for obtaining receptors for crystallization [20, 36-48].

Directed evolution. Directed evolution can assist in the search for and selection of functionally expressed receptors that cannot be produced in commonly used expression systems, such as insect and mammalian cells.

The directed evolution of GPCRs includes the following steps: (i) creation of a library of the receptor-encoding gene mutants; (ii) detection of receptor variants exposed to selection pressure, whose increased functional expression is confirmed by the binding of a specific ligand; and (iii) subsequent rounds of evolution/selection to find the most preferable variant [49]. It should be taken into account that all the procedures described in this section require the use of receptor-specific ligands.

Originally, the libraries of mutant GPCR genes were analyzed in Escherichia coli cells as the most convenient and simplest object for genetic manipulations. Permeabilization of the E. coli outer membrane made the receptors expressed on the inner membrane accessible to fluorescently labeled ligands. Then, fluorescence-activated cell sorting (FACS) was performed. Hence, a great number of receptor mutants could be screened simultaneously without the need to express each receptor separately. The use of several rounds of selection has made it possible to identify receptors with an enhanced functional expression not only in prokaryotic, but also in eukaryotic expression systems [50]. The CHESS technique (cellular high-throughput encapsulation, solubilization, and screening) developed later made it possible to identify receptors with an increased resistance to detergents. This was achieved by encapsulating E. coli cells within a polysaccharide matrix, followed by cell membrane disruption and solubilization of the receptors with a detergent. Large cellular components, including the receptor itself and the plasmid DNA, remained within the capsule. Further incubation of the semipermeable capsules with a fluorescently labeled ligand and FACS made it possible to select the clones with the receptors staying functional under the given conditions. The selected genes were exposed to further rounds of mutagenesis and selection [51]. The methods of directed evolution in E. coli cells [51, 52], together with further modifications, made it possible to produce neurotensin receptor 1 (NTSR1) suitable for structural studies [47].

The drawback of the above-mentioned techniques is the need to express functional receptors in E. coli cells, which is not always possible for the native eukaryotic membrane proteins.

The SaBRE (Saccharomyces cerevisiae-based receptor evolution) method, which appeared later [53], is based on directed protein evolution in yeast cells and allows expression of a large variety of GPCRs due to the existence of posttranslational mechanisms in yeast. Only two rounds of directed evolution in S. cerevisiae cells were needed to obtain receptors with an enhanced expression and capable of ligand binding. Mutations selected by the SaBRE method ensured an increased expression in insect cells of receptors with a significantly elevated thermostability. The method was used to create the constructs for crystallization of the oxytocin receptor [54], parathyroid hormone receptor PTH1R [55], and opioid receptor δOP with eight introduced point mutations that had been identified earlier by the SaBRE method for the related κOP receptor [56].

Recently, researchers at the Andreas Plückthun lab have combined several techniques of directed evolution for the purpose of high-throughput CHESS screening for the oxytocin receptor, which is originally toxic for E. coli cells. The mutants selected by the SaBRE method were used in subsequent rounds of evolution in E. coli cells in order to perform CHESS and to identify mutants with the increased functional expression in E. coli cells. This approach can benefit the studies, in which the use of E. coli cells is preferable [49].

Recently, the YDDS (yeast direct detergent screening) method for selecting receptors has been reported, which made it possible to identify GPCRs resistant to short-chain detergents upon their expression in yeast cells [57].

Using machine learning to predict point mutations: CompoMug algorithm. The use of methods employing artificial intelligence systems has led to significant breakthroughs in many scientific areas and made it possible to solve problems that could not have been solved efficiently by classical approaches. The CompoMug algorithm consists of four modules based on the analysis of available data, sequence analysis, structure analysis, and machine learning, respectively, which combine several approaches to compile a list of candidate point mutations that could improve the stability of GPCRs. The data- and sequence-based modules work only with the information on the target receptor sequence, while the structure- and machine learning-based modules work with the structural information. This algorithm generates a priority list of point mutations (with the efficiency of each mutation estimated using internal algorithm criteria in each module) that can be tested experimentally to create receptors that can be purified as homogeneous stable protein samples suitable for structural studies. The CompoMug algorithm uses various types of information related to the GPCR stability, e.g., the data on the stabilizing mutations transferred between GPCRs, sequence-based information on the protein evolution (e.g., conservation of absolute and relative positions of amino acids), and structural data (e.g., distances between the contacts and residue energies). All information is encoded numerically to compile the GPCR stability descriptors. Machine learning algorithms were applied to the descriptors calculated for the known GPCR-stabilizing point mutations in order to obtain a model for comprehensive prediction. The method has been iteratively improved with the accumulation of data on the stabilizing effect of point mutations for new GPCRs with regard to the experimental feedback data. The method was implemented using modern machine learning algorithms and computational biology.
The software distribution tool and its description are available at: CompoMug 2.0 https://gitlab.com/pp_lab/CompoMug. CompoMug is a computational platform for GPCR thermostabilization and can be further refined with the accumulation of experimental data [58].

Deletion of unordered regions from the receptor amino acid sequence. Flexible regions of proteins often hinder crystallization and reduce the quality of the obtained data. The most flexible regions in GPCRs are the N- and C-terminal sequences; therefore, they are often deleted with only 10-20 amino acid residues being left. A long N-terminal fragment of the receptor can contain glycosylation sites, which influence protein processing in the cytoplasmic membrane, thereby affecting its release [59]. The C-terminal part can also affect protein expression and its monodispersity [60]. Unfortunately, it is impossible to predict the effect of deletion of the N- and C-terminal regions on the level of receptor expression. At the same time, one should take into consideration the functional importance of glycosylation sites and sites providing receptor interaction with ligands, G proteins and other partners [61, 62]. In practice, it is preferable to delete the unordered regions in multiple rounds (1 to 5 residues per each) and to test the effect of these deletions experimentally.
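
The stepwise deletion strategy can be sketched in Python as the generation of a truncation series to be tested experimentally. The function name `terminal_truncations`, its parameters, and the default step of 5 residues are illustrative assumptions; the sketch deliberately makes no prediction about expression, since, as noted above, this effect cannot be predicted.

```python
def terminal_truncations(sequence, step=5, max_deleted=20, end="N"):
    """Return (residues_deleted, truncated_sequence) pairs for one terminus.

    Residues are removed from the N- or C-terminus in increments of `step`
    until `max_deleted` residues (or the whole sequence) would be removed.
    """
    if end not in ("N", "C"):
        raise ValueError("end must be 'N' or 'C'")
    constructs = []
    for deleted in range(step, max_deleted + 1, step):
        if deleted >= len(sequence):
            break  # never delete the entire sequence
        truncated = sequence[deleted:] if end == "N" else sequence[:-deleted]
        constructs.append((deleted, truncated))
    return constructs


# Hypothetical placeholder sequence; each letter stands for one residue
for deleted, construct in terminal_truncations("ABCDEFGHIJKLMNOP", step=3, max_deleted=9):
    print(deleted, construct)
```

For a real N-terminal series, the start codon and any signal sequence or tag would have to be re-attached to every truncated construct; this is omitted here for brevity.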

In class A GPCRs, the third intracellular loop (ICL3) is very flexible [63]. To improve crystallization, it is often replaced by a fusion protein, e.g., the thermostable region of apocytochrome (BRIL), flavodoxin, T4 lysozyme, rubredoxin, glycogen synthase, etc. [64] (see the section “Use of Fusion Partners”).

Insertion of additional elements into the receptor amino acid sequence. Insertions into the receptor amino acid sequence can serve different purposes. Expression of GPCRs and their translocation to the cell surface are often increased by adding the HA-tag (MKTIIALSYIFCLVFA) [65, 66], a fragment of hemagglutinin (major surface protein of the influenza virus). When located at the protein N-terminus, the HA-tag provides more efficient translocation of protein to the endoplasmic reticulum and its further transport to the cell membrane surface [65]. The HA-tag can be used in both SF9 insect cells [67] and human HEK293 cells [66, 68]. Another fragment of hemagglutinin sequence (YPYDVPDYA) was used as an epitope for the immunohistochemical studies of GPCRs [69-71].

To facilitate receptor purification by affinity chromatography, receptor molecules are often modified by special tags, such as the FLAG-tag (DYKDDDDK/A) and/or His-tag (5-10 histidine residues) at the N-terminus [72]. Both His-tag and FLAG-tag can be inserted into the C-terminal part of the receptor as well; however, the affinity resin used for receptor purification should be taken into account. These tags can also be used to analyze the level of protein expression by immunoblotting and flow cytometry. Since the N-terminus of a GPCR is always located outside the cell, the labeling of intact cells requires the FLAG-tag to be added to the protein N-terminus [73, 74].

Maltose-binding protein (MBP) is also used for the modification of the GPCR sequence, but less frequently [75]. This rather large protein (42.5 kDa) can be used not only as a tag for affinity chromatography, but as a good fusion partner to increase the expression of GPCRs in bacterial cells [29, 47, 76, 77].

Simultaneous introduction of MBP and thioredoxin A (TrxA) in the NTR1 construct increased the amount of protein expressed in the E. coli membrane and made it possible to obtain crystals of the receptor [47]. Addition of glutathione S-transferase (GST) to the N-terminus of the CXCL8 chemokine receptor (CXCR1) made it possible to express this protein in amounts sufficient for crystallization, although within the inclusion bodies [78]. Various fusion partners and expression systems are discussed in detail in the sections “Use of Fusion Partners” and “Expression Systems for GPCR Production”.

GPCR isolation can be facilitated by the introduction of the Strep-tag, a small peptide of eight amino acid residues (WSHPQFEK), which was found by chance as a peptide selectively binding to streptavidin [79, 80].

Affinity chromatography techniques are discussed in the section “Purification and Stabilization of GPCRs”.

Since polypeptide tags are unordered sequences, their presence in the receptor construct can hinder analysis of the protein spatial structure. The tags could be removed using highly specific proteases. The commonly used enzymes are TEV (tobacco etch virus) protease and PreScission protease (a variant of HRV 3C protease). The recognition site for TEV protease is the ENLYFQ/S sequence that is cleaved between the glutamine and serine residues. Instead of serine, there can be glycine, alanine, methionine, cysteine, and histidine residues. The recognition site for the PreScission protease is LEVLFQ/GP, which is hydrolyzed strictly between the glutamine and glycine residues, which makes this protease more specific than TEV protease.
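
The recognition rules given above can be expressed as a small Python sketch that locates the scissile bonds in a construct sequence. The names `SITES`, `find_cleavage_sites`, and `cleave`, as well as the toy construct, are hypothetical; the patterns encode only the sequences stated in the text (ENLYFQ followed by S, G, A, M, C, or H for TEV protease; LEVLFQ followed by GP for PreScission protease) and do not account for site accessibility in the folded protein.

```python
import re

# Lookaheads keep the residue(s) after the scissile bond out of the match,
# so match.end() is the number of residues N-terminal to the cut.
SITES = {
    "TEV": re.compile(r"ENLYFQ(?=[SGAMCH])"),
    "PreScission": re.compile(r"LEVLFQ(?=GP)"),
}


def find_cleavage_sites(sequence):
    """Map each protease to the list of cut positions found in `sequence`."""
    return {
        name: [match.end() for match in pattern.finditer(sequence)]
        for name, pattern in SITES.items()
    }


def cleave(sequence, protease):
    """Split `sequence` at every site recognized by `protease`."""
    cuts = find_cleavage_sites(sequence)[protease]
    bounds = [0] + cuts + [len(sequence)]
    return [sequence[a:b] for a, b in zip(bounds, bounds[1:])]


# Toy construct: FLAG-tag, TEV site, then a placeholder receptor fragment
construct = "DYKDDDDK" + "ENLYFQG" + "MTTT"
print(find_cleavage_sites(construct))
print(cleave(construct, "TEV"))
```

Note that after cleavage the residues upstream of the scissile bond stay with the tag, while the residue(s) downstream (for example, G or GP) remain on the receptor.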

In some cases, the receptor sequence and the introduced fusion proteins, tags, protease recognition sites are separated by linkers several amino acid residues long to facilitate the access to the epitopes (for antibodies) and protease recognition sites, as well as to modify the flexibility of the resulting recombinant protein [81].

Use of fusion partners. GPCRs are dynamic proteins that change their conformation depending on the stage of the signaling process [82]. Also, as integral membrane proteins, GPCRs have a low content of hydrophilic regions that are necessary for the formation of crystals. These features are the major obstacles in obtaining diffraction-quality crystals of GPCRs [64].

Chimeric proteins consisting of a GPCR and a partner protein are created to increase the stability of the receptor and the probability of contact formation in the protein crystal. The partner protein has to satisfy certain criteria, such as the existence of its high-resolution structure, small size (~5-20 kDa), simple folding, absence of posttranslational modifications, and hydrophilicity of the molecule [83]. Another requirement is that if the fusion partner is inserted between the receptor TM domains 5 and 6, the distance between its N- and C-termini should be less than 15 Å [64].
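
The geometric criterion for insertion between TM domains 5 and 6 can be checked with a few lines of Python once the coordinates of the partner protein are known. The sketch below assumes that the positions of the first and last Cα atoms have already been extracted from a structure file; the function names and the coordinates are hypothetical, and the choice of Cα atoms as reference points is an assumption, since the text does not specify how the distance is measured.

```python
import math


def termini_distance(n_term_ca, c_term_ca):
    """Euclidean distance (in angstroms) between two (x, y, z) points."""
    return math.dist(n_term_ca, c_term_ca)


def fits_tm5_tm6_insertion(n_term_ca, c_term_ca, cutoff=15.0):
    """True if the termini are closer than `cutoff` angstroms."""
    return termini_distance(n_term_ca, c_term_ca) < cutoff


# Invented coordinates for illustration only
n_term = (12.1, 4.3, -7.8)
c_term = (18.9, 9.0, -2.2)
print(round(termini_distance(n_term, c_term), 1),
      fits_tm5_tm6_insertion(n_term, c_term))
```

`math.dist` requires Python 3.8 or later.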

As a rule, the partner protein is incorporated into the ICL3 [13, 84], because it is conformationally heterogeneous and connects the most mobile TM domains 5 and 6. Less often, the partner protein is fused to the receptor N-terminus [85]. It is also possible to insert the partner protein into other parts of the receptor molecule, e.g., intracellular loop 2 (ICL2) [86] or the C-terminus [87]. In some cases, several partner proteins have been used simultaneously [87, 88].

None of the known fusion partners provides a universal solution for the problem of GPCR crystallization. Typically, several partner proteins are tested with the same receptor and several insertion sites are tried for the same partner protein. In most cases, the partner protein is added together with the introduction of stabilizing point mutations in the receptor. The proteins used most frequently for the fusion with GPCRs (Fig. 1) contain various elements of the secondary structure. They might consist of loops (rubredoxin), β-sheets (xylanase), α-helices [C-terminal fragment of T4 lysozyme (CtermT4L) and its modifications (mT4L), thermostabilized apocytochrome b562 (BRIL)], or a combination of these elements [flavodoxin, catalytic domain of glycogen synthase from Pyrococcus abyssi (PGS)] [64, 89].

Fig. 1. Common fusion partners of GPCRs. Images were taken from GPCR structures in the PDB database containing each of these fusion partners: rubredoxin – 4MBS; TrxA – 6IBL; mT4L – 4U16; T4L – 2RH1; BRIL – 5NM4; flavodoxin – 5TGZ; PGS – 5U09. The pie chart represents the frequency of their use as fusion proteins.

The first GPCR structures with fusion proteins were obtained with T4L, an easily crystallized well-folded protein. Currently, the most often used protein is BRIL. Unlike T4L, BRIL provides more rigid fixation of the TM helices 5 and 6 due to the smooth transition into the α-helices of BRIL. The best resolution for the GPCR structure obtained by X-ray crystallography is 1.7 Å (PDB ID: 5NM4), which was achieved by using GPCR fused with BRIL and containing thermostabilizing point mutations.

TrxA and MBP are used mainly to enhance GPCR expression in E. coli cells. TrxA was also used to increase the protein expression level in a cell-free translation system [90]. In most cases, these partner proteins are cleaved off during purification [47, 87, 91]. However, the fusion with the partner proteins can facilitate the cryo-EM studies of GPCRs. For example, TrxA was added to the N-terminus of A2AR to increase the molecular mass of the receptor without affecting its pharmacological properties [92, 93].

The use of alternative partner proteins has increased the number of successful crystallizations and published structures. One example is the addition of the DARPin D12 protein to the seventh TM helix of a GPCR [94]. This partner protein was used for crystallization of NTSR1 and the α1B-adrenergic receptor [95, 96] produced by expression in E. coli cells. DARPin D12 is a compact protein containing the ankyrin repeat motif, which forms the helix–turn–helix conformation and promotes protein–protein interactions [97]. It has been shown that the strength of the protein–protein interactions of DARPin is sufficient to form protein crystals under various conditions [98].

EXPRESSION SYSTEMS FOR GPCR PRODUCTION

The photoreceptor protein rhodopsin is the only GPCR expressed with a high density in cell membranes in a multicellular organism [99] and, therefore, it was isolated directly from the discs of the external segment of bovine (Bos taurus) retinal rod cells [100] in amounts sufficient for crystallization. As a result, the first 3D structure of GPCR resolved by X-ray structure analysis was published in 2000 [101]. Later, the structure of rhodopsin from the retina of Japanese common squid (Todarodes pacificus) was solved [102]. However, the members of the GPCR superfamily are typically expressed in cell membranes in extremely small amounts, which makes their purification from tissues impractical and technically difficult. In addition, for most receptors, production of high-quality crystals requires insertion of modifications in the receptor amino acid sequence, which is impossible for the receptors isolated from natural sources. Therefore, obtaining sufficient amounts of protein for its characterization and further structural studies is one of the key problems in the structural biology of GPCRs.

Multiple expression strategies are used to increase the yield of the purified receptor to the milligram scale, the most common being heterologous expression in various systems. Like most other proteins, GPCRs have been expressed in bacterial, yeast, insect, and mammalian cells, as well as in cell-free expression systems.

Bacterial expression systems. E. coli is an extensively studied and cost-efficient bacterial expression system that has proven effective in the production of recombinant proteins, including those used in structural studies. However, the use of E. coli cells for the production of GPCRs has been significantly limited. The reasons for this include significant differences in the lipid composition of prokaryotic and eukaryotic membranes (since lipids are important regulators of the receptor function), the absence of well-developed machinery for protein folding and processing, and the absence of most of the posttranslational modifications typical for mammalian membrane proteins, which results in impaired folding and function of the expressed membrane proteins [103]. Typical GPCR modifications are glycosylation, palmitoylation, phosphorylation, etc., which are important for the correct protein folding and intracellular transport of the receptors [16]. In addition, protein overexpression in bacterial cells often leads to the formation of insoluble protein aggregates (inclusion bodies) [104]. The latter per se are not necessarily a disadvantage, because proteins in inclusion bodies are relatively pure and protected from digestion by cellular proteases. However, protein recovery from inclusion bodies requires a great deal of effort to select conditions for in vitro protein refolding and is often unsuccessful. In addition, the reducing nature of the E. coli cytoplasm is unfavorable for the formation of disulfide bonds. There have been numerous attempts to overcome the limitations of expression of eukaryotic membrane proteins in E. coli cells, e.g., by using partner proteins directing GPCR expression to the cytoplasmic membrane.

Despite the disadvantages of bacterial expression systems, bacterial expression of GPCRs has been attempted repeatedly, resulting, for example, in the NTR1 structure obtained in 2014 [47]. The N- and C-termini of the receptor were fused with MBP and TrxA, respectively, to provide a high expression level of the recombinant protein. In addition, numerous mutations were introduced into the protein sequence in order to crystallize it, which affected the functional properties of NTR1 (at the same time, the properties of the receptor expressed in insect cells were not impaired [44]).

Expression in yeast cells combines the advantages of bacterial expression systems (low cost of reagents, simplicity of genetic manipulations, rapid biomass growth) and the possibility of some posttranslational modifications [105]. Pichia pastoris and Saccharomyces cerevisiae strains have been well characterized genetically. Protein expression in Pichia is typically preferred [106], as it has provided much higher (10-100 times) biomass yield compared to the same volume of S. cerevisiae culture [107]. Pichia strains have been successfully used for the production of recombinant membrane proteins, including GPCRs [108-111].

The shortcomings of the yeast expression system include the difficulties with cell lysis and disintegration due to the rigidness of yeast cell wall, as well as the fact that protein glycosylation profile in yeast is different from that in mammalian cells [107].

Expression in insect cells is the most common approach in the structural studies of GPCRs, because expressed proteins undergo most of the necessary posttranslational modifications occurring in mammalian cells. However, N-glycosylation in insect cells involves attachment of simple unbranched glycans, in contrast to mammalian cells, where complex glycans with branched oligosaccharide chains are formed [112-114].

The temperature conditions for the cultivation of insect and mammalian cells are different, which may influence the cell membrane composition. The membranes of insect cells are characterized by a lower cholesterol content, higher phosphatidylinositol content, and the absence of phosphatidylserine [115]. Such differences can result in lower yields of the expressed receptor. Nevertheless, the cell culture medium can be enriched with the necessary lipids, which enhanced the expression of some receptors, e.g., turkey β1AR [116] and human dopamine D3 receptor [117].

Despite the disadvantages described above, expression in insect cells yields functional proteins that can be used in structural studies.

The highest number of proteins with the solved structures have been obtained in Spodoptera frugiperda cells (cell lines Sf9 and Sf21); a lesser number of proteins have been produced in Trichoplusia ni cells (Hi5 cell line). It is recommended first to test the level of protein expression in small volumes of cell culture, because it can be significantly different in the tested cell lines. Thus, expression of turkey β1AR in Hi5 cells was much higher than in other cell lines [116].

Expression in insect cells is typically performed using the baculovirus system. The expression cassette with the target gene is incorporated through the site-specific transposition into a baculovirus shuttle vector (bacmid) [118] produced in E. coli cells. Subsequent transfection of insect cells with the recombinant bacmid results in the production of baculoviral progeny, which is then used to infect the cells for the expression of protein of interest.

The shortcomings of this method include the relatively expensive cultivation medium, cell lysis as a result of infection by the recombinant baculovirus (which can lead to proteolytic degradation of the target protein), and the differences in the glycosylation profile of the produced proteins compared with proteins synthesized in mammalian cells. The advantages of this method are its simplicity, high protein yields in suspension culture, the use of serum-free medium for cultivation, and correct folding of membrane proteins (after optimization) [119]. The overall process, from bacmid production to protein expression, takes about 3 weeks on average.

Expression in mammalian cells provides the most native environment for the production of human receptors. Due to correct protein folding and the existence of mechanisms providing posttranslational modifications of human proteins, expression in mammalian cells is widely used in molecular biology. HEK293 (human embryonic kidney) and CHO (Chinese hamster ovary) cells are the best studied and most popular in GPCR research. The limitations for protein production in mammalian cells include the use of expensive culturing media and antibiotics, as well as the time-consuming selection of cell lines providing stable expression and high yield of the target protein [120]. For this reason, mammalian expression systems are typically used in studies of the receptor role in signal transduction, when high levels of protein expression are not required [121, 122], or in cases when production of a functionally active receptor in other expression systems (e.g., insect cells) has failed [123, 124].

Cell-free expression systems. Cell-free protein synthesis in vitro is based on the use of cell extracts for the protein transcription and translation [125] or translation only [126, 127]. Such systems include high-molecular components (ribosomes, DNA template or mRNA [126], enzymes) and various substrates, such as amino acids, nucleoside triphosphates (NTPs), and energy sources [128]. Membrane-simulating components (e.g., micelle-forming detergents, liposomes, nanodiscs) are often added to ensure expression of membrane proteins in a soluble form [129, 130]. Expression in the absence of membrane-simulating components is also possible [129], but might result in the production of insoluble aggregates. Although this approach often provides higher protein yields [131], the recombinant protein has to be solubilized, which may have a negative effect on its properties [130].

One of the advantages of the cell-free system is that it can be used for the synthesis of proteins that are toxic for cells [90, 132, 133]. It also allows tight control of the reaction components [127], as well as medium adaptation using various additives [130]. Moreover, cell-free expression systems make it possible to include uncommon amino acids into the produced protein to impart new properties to the latter [132]. For example, labeled amino acids can be introduced to the protein sequence for further protein characterization by NMR spectroscopy [131, 132]. Also, affinity purification can be performed immediately after the expression [133, 134], in contrast to cell-based systems that require cell disintegration and membrane solubilization for the isolation of membrane proteins [135]. Because the proteins can be expressed in small volumes, it is easy to evaluate the results of optimization of expression conditions by high-throughput screening with a plate reader [131, 136].

The composition of the cell extract can vary depending on the centrifugal force used for the fractionation of cell components [134]. Cell extract from E. coli cells is most commonly used for the synthesis of recombinant proteins, because it is cost-efficient, easy to use, and provides high protein yields (mg/ml) [134, 135]. However, when choosing the source of cell extract, one should take into consideration the required protein yield, as well as the ability of the system to provide correct protein folding and necessary posttranslational modifications [137]. For example, the extracts of eukaryotic cells, such as insect cells [138, 139], rabbit reticulocytes [140], and Chinese hamster ovary cells [141], ensure protein glycosylation. Also, E. coli extracts can be optimized by adding chaperones, stabilizing compounds, and redox agents that contribute to the correct folding of eukaryotic proteins [142-145]. E. coli strains capable of some types of glycosylation have been obtained as well [146-148]. Other important factors to consider when choosing an expression system are its cost and how easy it is to work with.

Numerous attempts have been made to express GPCRs in the cell-free systems. The produced receptors demonstrated similar protein–ligand binding affinities compared to the receptors expressed in the cell-based systems [149-154]. Moreover, it was possible to obtain conformation-specific antibodies against class A and C GPCRs expressed in wheat germ extract [155]. However, a considerable portion of the synthesized receptors might be nonfunctional [154], because even the near-optimal conditions can affect the formation of the functional protein [129].

Apparently, this is the reason why there are only a few X-ray structures solved for membrane proteins synthesized in cell-free expression systems [135, 156, 157].

Despite the advantages of protein synthesis in cell extracts, cell-free expression systems have not become very popular in GPCR research, as they require a lot of effort to optimize the expression of functional GPCRs.

PURIFICATION AND STABILIZATION OF GPCRs

The structural studies of GPCRs are hindered by the low stability, amphiphilic nature, and conformational mobility (a quality essential for the signal transduction) of the receptors. High-affinity ligands and antibody fragments can be added to the receptors to stabilize them and to reduce their conformational variability.

Methods for GPCR purification. GPCR purification is a complex multistage process often combining several purification techniques, such as gel filtration, ion exchange chromatography, affinity chromatography, and immunoaffinity chromatography.

Affinity chromatography is a common method for purification of recombinant proteins. This technique is based on the specific interaction between a target protein and a ligand bound to the stationary phase. Below, we describe the ligands used for GPCR purification.

Metal chelate affinity chromatography uses the interaction between an immobilized metal ion and free electron donor groups of amino acid side chains. Typically, a resin with immobilized Ni2+ or Co2+ is used as the stationary phase, while the receptor is modified with the His-tag at the N- or C-terminus. The imidazole rings of the His-tag bind to the chelated ions of divalent metals, which makes it possible to separate the target protein from the cell lysate components, even in the presence of detergents used for GPCR solubilization. Resins with immobilized cobalt ions are believed to have a lower protein binding ability and a higher specificity than resins with immobilized nickel ions [158]. Because of a higher purity of the resulting protein sample, Co2+-containing resins are often preferred for the GPCR purification.

The first stages of GPCR purification are often performed using other chromatography techniques. For example, proteins can be purified using the binding between the Strep-tag II (WSHPQFEK) incorporated into the receptor amino acid sequence and immobilized Strep-Tactin, an engineered streptavidin [80, 159, 160]. It is also possible to use amylose resins for purification of MBP-tagged proteins [75, 161, 162].

Immunoaffinity chromatography is based on the antibody–protein interaction. Very often, the second [163, 164] or even the first [165, 166] step in the GPCR purification is chromatography with anti-FLAG antibodies [72]. At present, calcium-dependent antibodies M1 [73], calcium-independent antibodies M2 [167], M5 [168], and L5 [169], and other anti-FLAG antibodies are commercially available; however, only agarose beads with immobilized M1 and M2 are produced. To use immobilized M1 antibodies, the FLAG-tag should be at the protein N-terminus, while M2 antibodies bind to both the N-terminal and C-terminal FLAG-tag; however, chromatography on M1 antibodies is preferable [170].

Rhodopsin has been studied using a special resin with 1D4 antibodies specifically binding to its C-terminal sequence (TETSQVAPA) [171, 172]; this resin has also been used for purification of other recombinant GPCRs [173]. Another tag that can be employed in the immunoaffinity chromatography of GPCRs is the C-reactive protein, which binds to specific anti-C-reactive protein antibodies immobilized on agarose [12].

Ligand chromatography. Strictly speaking, the previously described techniques also represent ligand chromatography, as they involve biospecific interactions between proteins and immobilized ligands. Here, we will describe the type of affinity chromatography that does not require introduction of special tags in the receptor amino acid sequence. In the case of ligand-specific chromatography, a ligand selectively interacting with the studied receptor is cross-linked to an inert carrier, thereby making it possible to separate functional ligand-binding receptors from the nonfunctional ones [72]. The best known technique is alprenolol affinity chromatography. Alprenolol is a nonselective beta-blocker and antagonist of serotonin receptors 5-HT1A and 5-HT1B [174]. A resin with immobilized alprenolol was used for purification of β1AR [12, 175-178]. To get rid of alprenolol, different chromatographic methods could be combined [179, 180]. Ligand chromatography was used for purification of untagged M2 muscarinic receptor [181].

The final stage of purification of recombinant GPCRs and their complexes can be affinity chromatography on immobilized lectins [12]. Concanavalin A (ConA) is a protein that binds α-D-mannose and α-D-glucose with a high specificity; therefore, it binds only glycosylated membrane proteins [59].

Ion-exchange chromatography is used for GPCR purification less often than affinity chromatography. It is based on the electrostatic interactions between the charged side groups of amino acids and oppositely charged immobilized groups. The strength of binding is proportional to the total protein charge. Ion exchange chromatography has been used to purify GPCRs [47, 182], as well as G proteins, nanobodies, antibodies [180, 183], and other proteins used in the studies of GPCRs [184].

Gel filtration, or size-exclusion chromatography (SEC), is a method for the separation of proteins according to their size and shape. For globular proteins, the size of the molecule directly depends on its molecular mass, which allows proteins to be separated by mass [185]. The resin for gel filtration is synthesized by cross-linking dextran molecules into beads of a particular size that act as molecular sieves. The pores of the beads retain smaller molecules, while larger molecules do not enter the pores and pass through the resin much faster [186]. The technique is described in more detail in the section “Characterization of Protein Samples and Quality Control”.

Although preparative gel filtration is rarely used as the final purification step before receptor crystallization [20], it is commonly used to purify large GPCR complexes with G proteins, antibodies, etc. [161, 184, 187-189].

Membrane modelling systems in GPCR studies; their pros and cons. Each GPCR in the cell membrane creates a unique physicochemical environment necessary for its functioning. Correct GPCR folding, interaction with ligands, and signal transduction depend on the membrane thickness, curvature, lipid composition, electrostatic potential, as well as membrane pressure on the receptor. All these factors should be taken into account when choosing a membrane modelling system (MMS) [190]. Below, we discuss the most popular MMSs used in the studies of GPCR structure and function.

Detergent micelles. Detergents are amphiphilic compounds consisting of the hydrophilic head and hydrophobic tail. When a detergent is added to the GPCR-containing membranes, detergent molecules substitute for the membrane lipids and form a micelle around the protein (Fig. 2a). Delipidation is the main cause of GPCR destabilization and inactivation [191]. Detergent molecules are more mobile than lipids, which means that the micelle geometry is unstable; the protein can determine the thickness and shape of the micelle hydrophobic part, as well as adopt some nonphysiological conformations [192]. Therefore, studying GPCRs in micelles often requires introduction of stabilizing mutations into the protein. Detergents can also bind to ligands, G proteins, and other molecules, hindering the interactions between these molecules and GPCR.

Fig. 2. Main membrane modelling systems.

Based on the charge, detergents are divided into nonionic, ionic, and zwitterionic. Nonionic detergents [e.g., DDM (n-dodecyl-β-D-maltoside), CHS (cholesteryl hemisuccinate), and LMNG (lauryl maltose neopentyl glycol)] are very efficient for solubilization [193]. Ionic detergents, e.g., SDS (sodium dodecyl sulfate), have a more severe impact on the protein molecule and can be used for both protein solubilization [194] and denaturation [195]. Zwitterionic detergents are used less commonly; however, they make it possible to obtain GPCRs with a melting temperature close to that in a DDM+CHS mixture [196].

Branched detergents stabilize GPCRs better due to their structure and more efficient packing. The branched detergent LMNG forms a hydrogen bond between its two heads, thus restricting the mobility of the receptor. In the case of unbranched DDM, two detergent molecules cannot form stable hydrogen bonds between them; the receptor in such a micelle is more flexible and, therefore, binds the ligands more efficiently [197].

Detergents are characterized by the critical micelle concentration (CMC), which is defined as the detergent concentration above which micelles form. The CMC of a detergent depends on the temperature and properties of the solvent [198]. The CMC for different detergents can vary within a rather broad range: in water, the CMC for DDM is 0.17 mM and the CMC for LMNG is 0.01 mM (i.e., more than 15 times lower than for DDM). The CMC should be taken into account when choosing a detergent, inter alia, for the structural studies. Usually, for successful solubilization, the detergent is added to the protein in excess, which results in the presence of free detergent molecules or even empty micelles in the solution [199]. Solubilized receptors can have the same size as empty micelles and therefore might be mistaken for them, which significantly hinders the analysis by cryo-EM [200]. Free detergent also decreases the signal/noise ratio. During crystallization in lipid cubic phase (LCP; see below), free detergent molecules can prevent the phase formation [201]. Therefore, it is necessary to get rid of free detergent, which is difficult to achieve for detergents with low CMC values using traditional methods (e.g., dialysis or gel filtration). New methods are currently being developed for the removal of free detergent from the GPCR solutions [200]. A detergent with a very high CMC can be added to the protein sample to form a layer at the water–air interface and to prevent protein adsorption on it [202, 203]. However, receptors are more efficiently incorporated into micelles formed from the detergents with lower CMCs (due to longer hydrophobic tails) [204]. This makes such detergents more attractive for the micelle formation. For example, the structures of the modified β2-adrenergic receptor (β2AR) in a complex with the G protein and beta-arrestin [205], A2AR complex with the G protein [93], and rhodopsin complex with the G protein [206] were obtained using the LMNG micelles. However, there is no universal solution; the detergent should be selected for each receptor and in each experiment.
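The relationship between the CMC, free detergent, and micelles can be illustrated with a short numerical sketch. The Python code below uses the CMC values quoted above and the common pseudo-phase approximation, in which the monomer concentration stays at the CMC and all excess detergent is assumed to be packed into micelles. The working concentration (2 mM) and the aggregation number (100) are hypothetical values chosen for illustration and are not taken from this review.

```python
# CMC values in water quoted in the text (mM).
CMC_MM = {"DDM": 0.17, "LMNG": 0.01}


def micelle_concentration_mm(total_mm, cmc_mm, aggregation_number):
    """Estimate the micelle concentration (mM) in the pseudo-phase approximation.

    Below the CMC no micelles form. Above it, the monomer concentration is
    taken to stay at the CMC, and the excess detergent is assumed to form
    micelles of `aggregation_number` molecules each (an assumed,
    detergent-specific value).
    """
    if aggregation_number <= 0:
        raise ValueError("aggregation_number must be positive")
    excess = max(total_mm - cmc_mm, 0.0)
    return excess / aggregation_number


# Ratio of the two CMC values quoted in the text.
cmc_ratio = CMC_MM["DDM"] / CMC_MM["LMNG"]

# Hypothetical example: 2 mM DDM with an assumed aggregation number of 100.
ddm_micelles = micelle_concentration_mm(2.0, CMC_MM["DDM"], 100)

print(f"CMC(DDM) / CMC(LMNG) = {cmc_ratio:.0f}")
print(f"Estimated DDM micelle concentration: {ddm_micelles:.4f} mM")
```

In this simplified picture, any detergent added above the CMC ends up in micelles, which is why an excess of detergent produces empty micelles in the sample.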

Incorporation of GPCRs into detergent micelles is currently the most popular method of their solubilization due to simplicity of micelle formation and diversity and relatively low cost of detergents. Other MMSs, unless otherwise specified, require preliminary protein solubilization into detergent micelles and, therefore, have the same shortcomings, e.g., the necessity for additional protein stabilization by mutations.

Amphiphilic polymers. Amphiphilic polymers (amphipols, Apols; Fig. 2b) are long molecules with alternating hydrophilic and hydrophobic regions. The molecular mass of such an amphiphilic polymer can be up to 34 kDa [207]. Amphiphilic polymers show high affinity to the protein hydrophobic part and wrap around it; at the same time, the concentration of free amphiphilic polymers in solution is very low. Amphiphilic polymers have a much milder impact on protein than detergents and are ineffective in solubilization of membrane proteins [208]. The possibility of GPCR stabilization by a biotinylated amphiphilic polymer has been shown for the rhodopsin-like growth hormone secretagogue receptor (GHSR) [209].

SMALP polymers. SMA (styrene maleic acid) is a copolymer of styrene (hydrophobic) and maleic acid (hydrophilic) (Fig. 2d). It can solubilize receptors directly from the membrane, which makes it possible to bypass the detergent stage. The styrene moieties penetrate into the membrane and fragment it, while the maleic acid moieties stay outside the formed SMALP (styrene maleic acid lipid particle) [210]. The distinguishing feature of this MMS is that GPCR is isolated together with a surrounding membrane region. On the one hand, this makes it possible to study GPCRs in the presence of native lipids and to analyze their effect on the receptor function. On the other hand, it makes it impossible to control the lipid environment of the receptor. SMALPs were used to isolate β2AR from HEK293T cells [211] and to study the interaction between A2AR and its ligands [212].

Lipid–protein nanodiscs. A lipid–protein nanodisc (Fig. 2c) consists of two molecules of the modified apolipoprotein ApoA-1 wrapped around the lipid bilayer in the antiparallel orientation. By varying the length of ApoA-1, nanodiscs of different sizes (8-16 nm in diameter) can be obtained. Nanodiscs have a higher stability than micelles, and their structure more closely resembles that of the membrane [213], so they can be used to study the interactions between GPCRs and G proteins [214]. The nanodisc technology continues to evolve and new techniques of nanodisc assembly are being developed, e.g., covalent circularization using the treatment with sortase [215] or the use of DNA-origami barrels to obtain nanodiscs up to 90 nm in diameter [216].

SapNP nanodiscs. Saposins A, B, C, and D are small proteins involved in the sphingolipid metabolism in the body [217]. They consist of four alpha-helices and form disc-shaped particles in the presence of lipids [218]. Similar to SMALP, saposins can solubilize proteins directly from the membrane. They have different lipid-binding specificity, with saposin A binding to the largest number of lipids. Therefore, it is more often used for assembling SapNP (saposin nanoparticle) discs. The proteins in SapA discs are more thermostable than in detergent micelles [219].

Bicelles. A bicelle consists of a flat phospholipid bilayer rimmed with a detergent (Fig. 2e). The most popular bicelle components are DMPC (1,2-dimyristoyl-sn-glycero-3-phosphocholine) as the phospholipid and DHPC (1,2-dihexanoyl-sn-glycero-3-phosphocholine) as the detergent; the ratio between these components determines the size of a bicelle [220]. Bicelles can be used to study the interactions of GPCRs with G proteins and ligands, as has been done for the human neuropeptide Y receptor type 2 (Y2R) [221]. Bicelles are also used in protein crystallization [222, 223].
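The component ratio mentioned above is commonly expressed as the molar ratio q = [DMPC]/[DHPC], with larger q values generally corresponding to larger bicelles. The sketch below shows how q, or the DHPC concentration needed to reach a target q, might be computed. The symbol q, the function names, and the example concentrations are illustrative assumptions rather than values from this review.

```python
def bicelle_q(dmpc_mm, dhpc_mm):
    """Molar ratio q = [DMPC] / [DHPC] for a bicelle mixture (concentrations in mM)."""
    if dhpc_mm <= 0:
        raise ValueError("DHPC concentration must be positive")
    return dmpc_mm / dhpc_mm


def dhpc_for_target_q(dmpc_mm, target_q):
    """DHPC concentration (mM) needed to reach `target_q` at a given DMPC concentration."""
    if target_q <= 0:
        raise ValueError("target_q must be positive")
    return dmpc_mm / target_q


# Hypothetical mixture: 30 mM DMPC and 60 mM DHPC.
print(f"q = {bicelle_q(30.0, 60.0):.2f}")
print(f"DHPC needed for q = 0.5 at 30 mM DMPC: {dhpc_for_target_q(30.0, 0.5):.1f} mM")
```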

Liposomes. When lipids are dried and then rehydrated, they self-arrange into liposomes, which are small bilayer vesicles (Fig. 2f). Multilamellar liposomes can reach up to 50 µm in diameter. Multilamellar liposomes are usually undesirable; therefore, they are additionally treated to obtain unilamellar liposomes. Liposomes are produced by extrusion, sonication, and homogenization [224, 225] and can be used to study multicomponent systems and to reconstruct entire GPCR-triggered signaling pathways. Among all the above-described MMSs, liposomes are the most similar to the cell membrane in their physical properties [226, 227].

Lipid cubic phase (LCP) is a lipid bilayer, which under certain conditions, self-arranges into an infinite periodic surface that divides the space into two non-overlapping areas (Fig. 2g). LCP serves as a matrix consisting of a lipid bilayer; it provides diffusion of protein molecules within itself and facilitates crystallization. This MMS has become very popular as it has made it possible to crystallize some GPCRs and retinal-containing membrane proteins. Successful crystallization in LCP depends on several factors. Firstly, the lipid bilayer facilitates ordered positioning of hydrophilic and hydrophobic parts of adjacent protein molecules, allowing them to form a crystal consisting of multiple membrane protein layers. Such crystals usually diffract well, because the contacts between the protein molecules are formed by both hydrophilic and hydrophobic regions, which contributes to better ordering. Secondly, the lipid bilayer simulates the natural environment of membrane protein, thereby increasing its stability. Thus, crystallization in LCP or its less ordered analogs (e.g., lipidic-sponge phase) makes it possible to eliminate the major difficulties associated with membrane proteins. In addition, LCP serves as a filter that cuts off large molecular aggregates, thereby facilitating protein crystallization [228]. Depending on the selected conditions (mainly, the concentrations of components in the solution and temperature), some lipids can assemble into structures with different space group symmetry (the most common are the cubic phases Im3m and Pn3m and the lamellar phase) [229]. The most popular lipid for the LCP formation is monoolein; however, using other lipids makes it possible to alter the properties of the LCP (see above) and to vary the size of the unit cell from 30 to 240 Å [230] in order to select the phase parameters for a particular protein to facilitate its correct incorporation into the lipid bilayer [231]. GPCR crystals grown in the LCP may be too small to obtain a high-resolution structure even when using the microfocus stations at the synchrotron radiation sources, so these crystals can be studied by serial femtosecond crystallography with XFEL (X-ray free electron laser) or micro-electron diffraction [232, 233].

Using ligands for receptor stabilization. In the absence of ligands, GPCRs can exist in numerous conformational states [82]. Ligand binding results in the conformational changes that may stabilize the receptor in the ligand-bound state, which is determined by the type of ligand used (antagonist, inverse agonist, or agonist). The ligands providing the best receptor stabilization are selected before crystallization, e.g., by the thermal shift analysis with a fluorescent dye interacting with protein cysteines [234]. Among 450 GPCR structures solved by X-ray crystallography, 429 were crystallized with exogenous ligands; 301 out of 337 cryo-EM structures were also resolved in the presence of ligands (according to GPCRdb [235], August 2022).
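To put the counts above into proportion, the fractions of structures solved with ligands can be computed directly. The short sketch below uses only the numbers quoted in the text; the function and variable names are illustrative.

```python
# Counts quoted in the text (according to GPCRdb, August 2022):
# (structures solved with exogenous ligands, total structures).
structures = {
    "X-ray crystallography": (429, 450),
    "cryo-EM": (301, 337),
}


def ligand_fraction(with_ligand, total):
    """Percentage of structures solved in the presence of a ligand."""
    return 100.0 * with_ligand / total


for method, (with_ligand, total) in structures.items():
    share = ligand_fraction(with_ligand, total)
    print(f"{method}: {with_ligand}/{total} = {share:.1f}% with ligands")
```

For both methods, roughly nine out of ten structures were obtained with a bound ligand.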

So far, only several receptors have been successfully crystallized in the absence of a ligand or without stabilization by the antibody fragments. The first ligand-free structure was solved in 2008 for the bovine opsin [236]. The other structures appeared much later, e.g., of the zebrafish (Danio rerio) lipid receptor LPA6 in 2017. The structure of this receptor was shown to have a lateral vertical cleft between TM4 and TM5, that contained a hydrophobic molecule, probably, endogenous lipid, detergent, or monoolein, which was present in abundance during crystallization. Supposedly, this cleft is a part of the ligand-binding pocket and is intended for the acyl chain of lysophosphatidic acid, a natural ligand of the LPA6 receptor [237]. The structure of the class F human Frizzled 4 receptor (FZD4) with deleted extracellular cysteine-rich domain (CRD) and stabilized by 4 mutations appeared in 2018 [238]. Human orphan receptor GPR52 stabilized by point mutations was crystallized in the absence of ligands. It was found that GPR52 has a uniquely folded extracellular loop 2 (ECL2) that operates as a built-in agonist [239]. Nevertheless, for most GPCRs, ligands proved to be necessary for the receptor stabilization and crystallization, which emphasizes their importance and the necessity of further search for ligands for difficult-to-crystallize receptors.

Using antibody fragments for stabilization of GPCR conformational states. Strategies of using antibody fragments instead of or in addition to partner proteins proved to be successful for some receptors. Antibody fragments can form contacts in the crystal, stabilize proteins in a specific state (thus making protein samples homogeneous), serve as allosteric modulators, and increase receptor thermostability [240]. Antibody fragments aimed at stabilization of receptors in a certain conformational state in complexes with ligands are obtained by immunization of laboratory animals, followed by the generation and selection of antibody-producing hybridomas and purification of monoclonal antibodies [11, 240]. Purified antibodies are fragmented into the antigen-binding Fab domain and crystallizable Fc fragment using papain [241].

For the first time, the monoclonal antibody Fab fragment was used for the crystallization of β2AR (PDB ID: 2R4R) in a complex with its partial agonist in order to decrease the conformational mobility of flexible protein domains and to increase the polar surface for the formation of contacts in crystal, which resulted in a structure with a resolution of 3.4 Å [183]. The structures of the 5-HT2B receptor, sphingosine-1-phosphate receptor 3 (S1PR3), and angiotensin II type 2 receptor (AT2R) in the active conformation were also obtained using Fab fragments recognizing the extracellular domains of these receptors [240, 242-244]. Receptors that were crystallized in the inactive state with the help of Fab fragments include the class B glucagon receptor with T4L incorporated in the ICL2 [86], A2AR [109], and others [11, 38, 240, 245, 246].

In addition to Fab fragments targeting the receptor epitopes, the antibody fragments against the fusion partner BRIL were used to crystallize the ligand-bound glucagon-like peptide-1 receptor. Because anti-BRIL Fab fragments can be employed to facilitate crystallization of different GPCRs fused with BRIL without having direct effect on the receptor, they can be used to study both active and inactive receptor states [11].

In later studies, protein crystallization was performed in the presence of nanobodies (Nb), which are recombinant antigen-binding domains of unique camelid (Tylopoda) antibodies lacking the light chains [247]. The term nanobodies was chosen because of their small size (25% of typical Fab fragment) [248]. Originally, nanobodies were used for stabilization of the active conformations of receptors in complexes with agonists, because crystallization of GPCRs in the active state is a difficult task due to their conformational flexibility and instability. For example, the first structure of β2AR in the active state was solved using the Nb80 nanobody that served as an alternative to the Gs protein [248]. Later, Nb80 was subjected to molecular evolution to create Nb6B9, which was used to obtain several more receptor structures, including those with low-affinity agonists [179, 249]. Some Nbs have been found to stabilize GPCRs in the inactive conformation [250, 251].

Production of antibodies against the fixed states of individual receptors is a labor-, time-, and money-consuming task [252]. One of the approaches used to facilitate this process was successful transfer of the Nb6 epitope from the kappa opioid receptor κOP to other GPCRs [253]. Originally, Nb6 was obtained against the ICL3 of the κOP receptor [254]. Replacing the ICL3 of the target receptor with the ICL3 of κOP ensured Nb6 binding with the target GPCR, which eliminates the need to produce new specific antibodies against the studied receptor.

Using salts and chemical agents for purification and stabilization of receptors. The buffers used in GPCR studies often contain chemical agents to maintain the protein in a stable and monomeric state. The most common of them are glycerol and sodium chloride, which are present at almost all stages of receptor purification. Depending on the structural research method (crystallization or cryo-EM), it might be necessary to treat the receptor with other salts and small molecules, such as magnesium chloride, potassium chloride, ATP, and iodoacetamide.

Sodium chloride determines the ionic strength of the solution and may help to maintain the receptor in the monomeric state by abolishing polar interactions between the protein molecules [255]. The concentration of sodium chloride in buffers for different receptors can vary from trace amounts (below 6 mM) [37] to 800 mM [256, 257].

Many receptors are stable within a broad range of NaCl concentration [242, 257-259], which allows NaCl to be used as a tool in GPCR purification. For example, cells with the overexpressed receptor are washed in a buffer containing no NaCl and low concentrations of KCl and MgCl2. KCl and MgCl2 provide the ionic strength necessary to maintain the receptor stability, while the absence of NaCl outside the cells causes an osmotic influx of water into the cells, which later facilitates their disintegration. Subsequently, the membranes of already disintegrated cells are washed with a buffer with high NaCl concentration (1000 mM) to remove soluble intracellular and peripheral membrane proteins (high salt concentrations cause their aggregation) [260], while GPCRs incorporated in the membrane remain in the native conformation. In addition to NaCl, urea can be used to remove peripheral membrane proteins [44, 261].
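The contribution of each salt to the ionic strength of a wash buffer can be estimated with the standard formula I = 1/2 Σ c_i z_i^2, where c_i and z_i are the concentration and charge of each ion. The sketch below assumes complete dissociation and ignores the buffering agent and any other charged components. The low-salt concentrations in the example (20 mM KCl, 10 mM MgCl2) are hypothetical; only the 1000 mM NaCl value comes from the text.

```python
# Ions released by each fully dissociated salt: (charge, ions per formula unit).
SALT_IONS = {
    "NaCl": [(+1, 1), (-1, 1)],
    "KCl": [(+1, 1), (-1, 1)],
    "MgCl2": [(+2, 1), (-1, 2)],
}


def ionic_strength_mm(salts_mm):
    """Ionic strength (mM) of a salt mixture, I = 0.5 * sum(c_i * z_i**2).

    `salts_mm` maps a salt name from SALT_IONS to its concentration in mM.
    Complete dissociation is assumed; other buffer components are ignored.
    """
    total = 0.0
    for salt, conc in salts_mm.items():
        for charge, count in SALT_IONS[salt]:
            total += conc * count * charge ** 2
    return 0.5 * total


# Hypothetical low-salt wash (no NaCl) versus the high-salt wash from the text.
low_salt = ionic_strength_mm({"KCl": 20.0, "MgCl2": 10.0})
high_salt = ionic_strength_mm({"NaCl": 1000.0})
print(f"Low-salt wash:  I = {low_salt:.0f} mM")
print(f"High-salt wash: I = {high_salt:.0f} mM")
```

Because the charge enters as a square, a divalent salt such as MgCl2 contributes three times its molar concentration to the ionic strength, whereas NaCl and KCl contribute their molar concentration.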

Glycerol is another additive stabilizing GPCRs. Glycerol is amphiphilic and shields the hydrophobic regions on the surface of protein molecule from the aqueous solution, thereby maintaining correct receptor conformation [262]. It is especially important in the case when GPCR is extracted from the membrane by solubilization and is embedded in detergent micelles. As mentioned above, detergent molecules are more mobile than native lipids in the membrane; in addition, there is a continuous exchange of detergent molecules between micelles and surrounding buffer. Therefore, the hydrophobic amino acids hidden by lipids in the native membrane might become exposed, at least transiently, to the solution. In order to maintain the receptor stability in micelles, glycerol is added to a concentration of 10-30% during receptor solubilization [259, 263]. In many GPCR purification protocols, glycerol concentration in the washing buffer is 10% [257-259]. Glycerol is also a cryoprotectant [264] and is added to the buffers for the long-term storage of cell membranes or to purified receptors before freezing [259].

Iodoacetamide is used for the purification of GPCR preparations intended for crystallization, which requires the protein to be strictly monomeric. Iodoacetamide is an alkylating agent that irreversibly modifies the -SH group of cysteine, preventing further formation of disulfide bonds. If the receptor contains cysteine residues exposed to the solution, two receptor molecules can form a disulfide bond through these cysteines after solubilization, thus disturbing the monomeric state of the protein. Iodoacetamide prevents the formation of such covalently linked dimers [265]; for this purpose, the membranes containing the receptor are incubated with iodoacetamide at a concentration of ~2 mg/ml immediately before solubilization [258, 259].
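For comparison with molar concentrations used elsewhere in purification protocols, the mass concentration above can be converted to millimolar units. The sketch below assumes a molecular mass of 184.96 g/mol for iodoacetamide (C2H4INO); this value is not given in the review and is used here as a standard reference value.

```python
# Molecular mass of iodoacetamide (C2H4INO), g/mol; assumed reference value.
IODOACETAMIDE_MW = 184.96


def mg_per_ml_to_mm(mg_per_ml, molecular_mass):
    """Convert a mass concentration (mg/ml) to a molar concentration (mM).

    1 mg/ml equals 1 g/l; dividing by g/mol gives mol/l, and the factor
    of 1000 converts mol/l to mmol/l.
    """
    if molecular_mass <= 0:
        raise ValueError("molecular_mass must be positive")
    return mg_per_ml / molecular_mass * 1000.0


iodoacetamide_mm = mg_per_ml_to_mm(2.0, IODOACETAMIDE_MW)
print(f"2 mg/ml iodoacetamide is about {iodoacetamide_mm:.1f} mM")
```

Under this assumption, ~2 mg/ml corresponds to roughly 11 mM iodoacetamide.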

ATP is also used in GPCR purification to increase the proportion of the receptor in the monomeric state. All GPCRs have a rather complex structure and often require molecular chaperones for correct folding [266]. Individual chaperones, e.g., Hsp70, regulate the process of GPCR signaling, ensuring dissociation of G proteins from the receptors [267]. Even after membrane washing and solubilization, some receptors can still be bound to chaperones, many of which are ATP-dependent. When interacting with ATP, chaperones change their conformation and can be detached from GPCR. Therefore, ATP is added to the washing buffer for the solubilized receptor [258, 259].

CHARACTERIZATION OF PROTEIN SAMPLES AND QUALITY CONTROL

Purified protein samples used for the structural and functional studies can be characterized by analytical gel filtration, thermal shift assay, and nano differential scanning fluorimetry.

Analytical gel filtration, or size-exclusion chromatography (SEC), is a chromatographic technique providing separation of macromolecules in solution according to the ratio of the molecule hydrodynamic radius to the average pore size of chromatography resin [268] packed into a chromatography column. When proteins in a buffer pass through the column, smaller molecules enter the particle pores, so their retention time increases, while large proteins are eluted without entering the pores. As a result, proteins are fractionated according to their size [269]. Eluted proteins are usually monitored with a UV absorption detector.

In fluorescence-detection SEC (FSEC), the target proteins are covalently linked to the green fluorescent protein (GFP) or another fluorescent protein, and the SEC profile is recorded using the fluorophore emission. Since FSEC relies on the specific GFP signal, the experiment requires neither protein purification nor large-scale protein production. The measurements can be performed after solubilization of intact cells or unpurified membranes, which significantly simplifies the precrystallization screening required for selecting the optimal conditions of protein production [270].

SEC is one of the most useful tools for monitoring the monodispersity and stability of target proteins. A monodisperse and correctly folded protein is usually eluted as a symmetric Gaussian peak, while a polydisperse, unstable, or unfolded protein is eluted as several asymmetric peaks (Fig. 3). SEC elution profiles can also provide information on the impurities present in the sample; therefore, SEC is widely used for the analysis of the homogeneity, stability, and purity of proteins and their complexes (i.e., basic indicators for the suitability of protein samples for the structural studies) [271].

Fig. 3.

Example of the SEC profile of a protein sample (absorbance at 280 nm plotted vs. retention time). Monomeric protein is eluted as a symmetric Gaussian peak 1. The fractions with higher molecular masses, presumably protein dimers (peak 2) and protein aggregates (peak 3) are eluted before the monomeric protein.

The homogeneity and oligomeric state of proteins can be determined by comparing the retention times of the protein–detergent complex and molecular mass standards. However, it should be taken into account that the molecular mass standards for SEC are soluble proteins or small molecules, whereas membrane proteins form complexes with detergent molecules, which can increase their hydrodynamic radius and lead to overestimation of the protein molecular mass [272]. This problem can be solved by using light scattering techniques, e.g., MALS.
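The comparison with molecular mass standards is commonly done through a calibration line of log10(molecular mass) against elution volume. The following minimal sketch illustrates this idea; the calibration points, function names, and the assumption of a strictly log-linear relationship are hypothetical and serve only as an illustration.

```python
import math


def fit_sec_calibration(standards):
    """Least-squares fit of log10(mass_kda) = intercept + slope * elution_volume_ml.

    standards: list of (elution_volume_ml, mass_kda) pairs.
    Returns (intercept, slope).
    """
    xs = [volume for volume, _ in standards]
    ys = [math.log10(mass) for _, mass in standards]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def apparent_mass_kda(elution_volume_ml, intercept, slope):
    """Apparent mass of the eluting species read from the calibration line."""
    return 10 ** (intercept + slope * elution_volume_ml)


if __name__ == "__main__":
    # Hypothetical soluble standards: (elution volume in ml, mass in kDa)
    standards = [(9.0, 670.0), (11.5, 158.0), (13.5, 44.0), (15.5, 17.0)]
    intercept, slope = fit_sec_calibration(standards)
    # Hypothetical elution volume of a protein-detergent complex
    print(round(apparent_mass_kda(12.4, intercept, slope), 1))
```

The value obtained in this way is the apparent mass of the whole protein–detergent complex relative to soluble standards, so it should be read as an upper estimate for the protein itself, for the reasons given above.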

Multi-angle light scattering (MALS) is a well-proven method for studying protein interactions, which can be used for both single proteins and protein complexes. In a typical MALS experiment, scattering of a laser beam by the protein solution is measured at several angles in the plane perpendicular to the incident light. The total scattering intensity depends on the protein molar mass and concentration, whereas the angular dependence is related to the root mean square (rms) radius of the molecule. Analysis of the angular variation of the scattered intensity makes it possible to determine the rms radius, molecular mass, and concentration of macromolecules in the sample [273].
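The dependence of scattering on mass, concentration, and angle is often written in the Zimm formalism. The relations below are given as a sketch under that assumption; the notation is not taken from the cited work. Here R(θ) is the excess Rayleigh ratio at scattering angle θ, c is the mass concentration, M_w is the weight-average molar mass, A_2 is the second virial coefficient, ⟨r_g²⟩ is the mean square radius, n_0 is the solvent refractive index, dn/dc is the refractive index increment, λ_0 is the vacuum wavelength, and N_A is the Avogadro constant.

```latex
\frac{K^{*} c}{R(\theta)} \approx \frac{1}{M_w P(\theta)} + 2 A_2 c,
\qquad
\frac{1}{P(\theta)} \approx 1 + \frac{q^{2} \langle r_g^{2} \rangle}{3},
\qquad
q = \frac{4 \pi n_0}{\lambda_0} \sin\frac{\theta}{2},
\qquad
K^{*} = \frac{4 \pi^{2} n_0^{2}}{N_A \lambda_0^{4}} \left( \frac{dn}{dc} \right)^{2}
```

In the limit of low concentration and zero angle, the intercept gives 1/M_w, while the initial slope with respect to q² gives the rms radius.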

MALS is especially efficient in combination with SEC. As protein complexes are eluted from a SEC column, they can be immediately analyzed with a MALS detector. This makes it possible to fractionate different proteins or protein complexes present in the sample with simultaneous measurement of their molecular masses [274].

Thermal shift assay. Expression and purification of recombinant proteins can be considerably improved by the addition of stabilizing buffers or ligands that reduce the tendency of expressed proteins to form aggregates during purification and storage in vitro. Moreover, for recombinant proteins, high protein stability correlates with its ability for crystallization [275]. Stabilizing buffers and additives are identified by their capacity to increase the protein melting temperature during its thermal denaturation.

One of the methods for assessing protein thermostability is the thermal shift assay (TSA), also referred to as differential scanning fluorimetry. Protein preparation for TSA involves incubation of the protein with specific fluorescent dyes (e.g., SYPRO Orange) whose quantum yield increases upon binding to the hydrophobic regions of the protein that become accessible during denaturation. Thermal denaturation induced by a gradual temperature elevation can be monitored as an increase in the dye fluorescence [276]. Protein stability is evaluated based on the melting temperature (Tm), which can be increased by changing the buffer or introducing additives into the buffer solution. This increase in the melting temperature is referred to as the thermal shift and indicates an increase in the protein stability. The thermal shift can also be used to identify ligands that stabilize receptors by binding to them [277]. For example, the dye N-[4-(7-diethylamino-4-methyl-3-coumarinyl)phenyl]maleimide (CPM) was first used in the TSA of the human apelin receptor (APJ) [234]. Therefore, TSA can be used to analyze the stability of GPCR variants with different partner proteins, point mutations, and lengths of the N- and C-termini, as well as to search for new ligands by screening compound libraries.

Nano differential scanning fluorimetry (nanoDSF). In contrast to TSA, nanoDSF does not involve protein labeling with a fluorescent dye. NanoDSF tracks the changes in the intrinsic fluorescence of tryptophan residues in the protein molecule caused by alterations in the protein 3D structure induced by temperature changes [278]. Intrinsic tryptophan fluorescence responds to the changes in the protein microenvironment due to the solvatochromic properties (i.e., environment-dependent changes in the fluorescence parameters) of the indole ring. The fluorescence maximum of tryptophan in a nonpolar environment is 330 nm (excitation wavelength, 280 nm). In the polar environment, the fluorescence intensity usually decreases due to the static and dynamic quenching by the solvent molecules, while the emission peak shifts toward the red region of the spectrum (approximately to 350 nm) [279]. This usually occurs when tryptophan residues normally hidden in the protein hydrophobic core are exposed to the aqueous environment during denaturation. The temperature required to unfold 50% of the protein (Tm) can be determined from the extent of the red shift (sometimes, blue shift) of the tryptophan fluorescence (Fig. 4).

Fig. 4.

Example of nanoDSF data. Solid lines, the ratio of sample fluorescence at 350 nm to fluorescence at 330 nm; dashed lines, the first derivative of the ratio; blue curves, the data for protein melting without the ligand; red curves, the data for protein melting with the ligand; vertical dash-dotted lines, the melting temperature of the samples. Protein stabilization due to the ligand binding results in the increase of the melting temperature.
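The way Tm is read from such data (the F350/F330 ratio and its first derivative) can be sketched in a few lines. The example below is a simplified illustration rather than the algorithm of any particular instrument: it assumes evenly sampled, noise-free curves, takes Tm as the temperature at which the absolute first derivative of the ratio is maximal, and uses a hypothetical sigmoidal curve as input.

```python
import math


def first_derivative(temps, values):
    """Numerical derivative: central differences inside, one-sided at the ends."""
    n = len(temps)
    deriv = []
    for i in range(n):
        lo = max(i - 1, 0)
        hi = min(i + 1, n - 1)
        deriv.append((values[hi] - values[lo]) / (temps[hi] - temps[lo]))
    return deriv


def melting_temperature(temps, f350, f330):
    """Tm estimated as the temperature of the steepest change in F350/F330."""
    ratio = [a / b for a, b in zip(f350, f330)]
    deriv = first_derivative(temps, ratio)
    # abs() covers both a red shift (ratio rises) and a blue shift (ratio falls)
    i_max = max(range(len(deriv)), key=lambda i: abs(deriv[i]))
    return temps[i_max]


def synthetic_curve(temps, tm, width=2.0):
    """Hypothetical sigmoidal F350 signal (with F330 = 1) for demonstration only."""
    return [0.8 + 0.4 / (1.0 + math.exp(-(t - tm) / width)) for t in temps]


if __name__ == "__main__":
    temps = [float(t) for t in range(20, 91)]
    f330 = [1.0] * len(temps)
    tm_apo = melting_temperature(temps, synthetic_curve(temps, 55.0), f330)
    tm_ligand = melting_temperature(temps, synthetic_curve(temps, 60.0), f330)
    print("thermal shift:", tm_ligand - tm_apo)
```

With real data, smoothing of the ratio before differentiation would usually be needed, since numerical derivatives amplify noise.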

Since nanoDSF can be used to monitor protein thermostability from changes in its melting temperature, it is employed to search for the optimal conditions of protein storage and crystallization that ensure the highest protein stability [280]. NanoDSF also makes it possible to detect protein–ligand binding and can help in obtaining valuable information on the ligand-binding site and the mechanism of protein action [281].

FORMATION OF RECEPTOR COMPLEXES WITH MAIN INTERACTION PARTNERS

Formation of GPCR complexes for crystallographic studies. In order to understand the molecular mechanisms of GPCR function, it is essential to know the structures of receptor complexes not only with ligands, but also with protein molecules interacting with the receptor and involved in the signal transduction inside the cell, such as G protein-coupled receptor kinases (GRKs) [265], G proteins, arrestins, etc.

According to GPCRdb, 450 GPCR structures have been obtained by X-ray crystallography. Among them, 53 are receptor complexes with antibodies, antibody fragments, and nanobodies; 18 and 4 are complexes with G proteins and visual arrestins, respectively.

Antibodies and antibody fragments are often used in the structural studies of GPCRs (see “Using Antibody Fragments for Stabilization of GPCR Conformational States” section). For example, nanobodies have been used as chaperones in receptor crystallization; however, each receptor requires the development of its own specific nanobody [282]. The protocol for the nanobody–receptor complex formation and crystallization is as follows: the receptor is mixed with an excess of the nanobody; the reaction is carried out at 4°C for 1-8 h; the complex is purified and concentrated; aliquots (7-8 µl) of the complex are frozen in liquid nitrogen and thawed immediately before mixing with lipids and subsequent crystallization [179, 283-285]. If the receptor is thermostable, the complex formation can be performed at room temperature and more rapidly [249]. The complex can be concentrated up to 40-50 mg protein/ml [286] or crystallized without preliminary freezing [180]. The protocol for the formation of receptor–antibody fragment complexes is essentially the same: the antibodies are mixed with the receptor, which may be purified, immobilized on the affinity resin during purification, or even still be in the membrane before solubilization. The mixture is kept on ice for 1-8 h, and then the receptor–antibody fragment complex is purified by gel filtration, concentrated, and used for crystallization [11, 240, 243, 246, 287].

Heterotrimeric G proteins are a family of important proteins involved in transmitting signals through GPCRs. A G protein is composed of three subunits: Gα, Gβ, and Gγ [288]. During receptor activation, the G protein dissociates into the Gα and Gβγ subunits. Based on the homology and associated downstream signaling pathways, G proteins are classified into four families according to their Gα subunits: Gs (Gs and Golf), Gi/o (Gi1, Gi2, Gi3, Go, Gt1, Gt2, Gt3, and Gz), Gq/11 (Gq, G11, G14, and G15), and G12/13 (G12 and G13) [289]. Gs subunits activate adenylate cyclase, while Gi/o subunits inhibit it; Gq/11 subunits activate phospholipase C-β, and G12/13 subunits activate small GTPases [290]. There are also 5 different types of Gβ and 12 types of Gγ subunits, which trigger the associated signaling pathways and provide a great variety of heterotrimeric G proteins. Gα subunits are GTPases that hydrolyze guanosine triphosphate (GTP) to guanosine diphosphate (GDP) and trigger intracellular signaling cascades [291-293].
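The classification above maps naturally onto a small lookup table. The sketch below simply encodes the families, members, and effector actions as listed in this paragraph; the dictionary layout and function name are illustrative choices.

```python
# Gα families, members, and main effector actions as listed in the text above
G_ALPHA_FAMILIES = {
    "Gs": {
        "members": ["Gs", "Golf"],
        "effector": "activates adenylate cyclase",
    },
    "Gi/o": {
        "members": ["Gi1", "Gi2", "Gi3", "Go", "Gt1", "Gt2", "Gt3", "Gz"],
        "effector": "inhibits adenylate cyclase",
    },
    "Gq/11": {
        "members": ["Gq", "G11", "G14", "G15"],
        "effector": "activates phospholipase C-beta",
    },
    "G12/13": {
        "members": ["G12", "G13"],
        "effector": "activates small GTPases",
    },
}


def family_of(subunit):
    """Return the family name for a Gα subunit, or None if it is not listed."""
    for family, info in G_ALPHA_FAMILIES.items():
        if subunit in info["members"]:
            return family
    return None


if __name__ == "__main__":
    for name in ("Golf", "Gz", "G13"):
        family = family_of(name)
        print(name, "->", family, "-", G_ALPHA_FAMILIES[family]["effector"])
```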

The first crystal structure of the GPCR complex with the G protein was published in 2008. Rhodopsin solubilized from bovine retina was mixed with the chemically synthesized C-terminal fragment of the Gα subunit at a molar ratio of 1 : 4, incubated on ice, and illuminated by green light (500 ± 20 nm) for complex formation, followed by crystallization using the hanging drop technique [236].

The next solved crystal structure of the β2AR complex with the Gs heterotrimer and nanobody 35 (Nb35) has become the crowning achievement of ten years of studies of GPCRs. The structure was obtained for the receptor at the moment of its activation and signal transduction via the G protein. The G protein subunits (recombinant bovine Gα and Gγ and rat Gβ) were expressed separately, combined into a complex, and then mixed with the ligand–receptor complex with a slight excess of Gs. The mixture was incubated for 3 h at room temperature with the addition of apyrase for the hydrolysis of GDP released from Gs during the complex formation to prevent dissociation of the receptor–Gs complex. The complex was separated from the unbound G proteins using chromatography on anti-FLAG resin and from the unbound receptors using SEC. One hour before crystallization, the ligand–T4L–β2AR–Gs complex was mixed with a slight molar excess of Nb35 and kept at room temperature. Nb35 was used for stabilization of the Gs complex, which improved its affinity to the receptor. Crystallization of the ligand–T4L–β2AR–Gs complex was performed by the LCP crystallization technique [85].

To facilitate crystallization of the GPCR–Gs complex, mini-Gs (construct 414) was designed, which is a shortened variant of Gαs with eight point mutations [294] that remains stable in the absence of Gβγ [92, 295]. Mini-Gs increased the receptor affinity for the agonist (similar to the heterotrimeric Gs) and demonstrated the same sensitivity to Na+ (allosteric modulator). The complex of A2AR with the NECA agonist and mini-Gs proved to be more thermostable than the receptor–NECA complex. The complex was formed by mixing the purified A2AR with a 1.2-fold molar excess of mini-Gs in the presence of apyrase. The mixture was incubated overnight on ice, and the complex was purified by SEC before crystallization [294].
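Mixing at a defined molar ratio requires converting protein masses to moles. The helper below is a minimal sketch of this arithmetic; it reads the quoted excess as 1.2 mol of mini-Gs per mol of receptor, and the molecular masses used in the example (45 kDa for the receptor construct, 27 kDa for mini-Gs) are assumed round numbers, not values from the cited work.

```python
def partner_mass_mg(receptor_mass_mg, receptor_kda, partner_kda, molar_ratio):
    """Mass of binding partner (mg) that gives `molar_ratio` mol of partner
    per mol of receptor."""
    # mg divided by kDa is numerically equal to micromoles
    receptor_umol = receptor_mass_mg / receptor_kda
    partner_umol = receptor_umol * molar_ratio
    # micromoles multiplied by kDa gives mg
    return partner_umol * partner_kda


if __name__ == "__main__":
    # Assumed masses: 45 kDa receptor construct, 27 kDa mini-Gs (illustrative only)
    print(round(partner_mass_mg(1.0, 45.0, 27.0, 1.2), 2), "mg of mini-Gs per mg of receptor")
```

The same function covers the other ratios mentioned in this section, for example a 1 : 4 receptor-to-peptide ratio, by changing `molar_ratio` and the partner mass.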

Mini-G versions have been designed for other Gα subunits [92]. The protocol for the formation of GPCR complexes with mini-G proteins is rather universal [171].

All the crystallization procedures described above have been developed for nucleotide-free complexes, so additional experiments were required to obtain the receptor–Gs–GDP complexes. The group of Brian Kobilka [296] inserted the amino acid sequence of the GDP-binding region of Gαs (hereinafter, GsCT) into β2AR in place of ICL3. The GsCT sequence was inserted via optimized linkers between TM5 and T4L, T4L and GsCT, and GsCT and TM6. A disulfide bridge was introduced between GsCT and TM5, which stabilized the interaction between the receptor and the G protein. This method yielded high-quality crystals that made it possible to characterize the interaction between the receptor and nucleotide-free Gαs. Gi/oCT peptides created by analogy with GsCT have been used to study structural changes associated with signal transduction from the visual receptors to G proteins [297-304].

The latest crystal structure of the GPCR–G protein complex was published in 2021 [305]. The structure of the complex consisting of the D1 dopamine receptor, its ligand, Gs heterotrimeric protein, and Nb35 was compared to that of the analogous β2AR complex and studied for the interaction between the receptor and G protein. The reaction of complex formation in this work was carried out on the anti-FLAG resin. The complex components were added to the resin and the mixture was incubated for 3 h at 4°C with gentle shaking. Next, the receptors and the assembled complexes were eluted from the resin, followed by the final complex purification by gel filtration.

To ensure the assembly of the GPCR complex with G proteins, it is necessary to add apyrase in order to remove nucleotides that can cause the dissociation of the complex [85]. Copper phenanthroline can be used to catalyze the formation of disulfide bridges between the receptor and the G protein [296]. The disulfide-mediated protein aggregation during the complex formation can be prevented by tris(2-carboxyethyl)phosphine (TCEP), but not with iodoacetamide, because the latter blocks cysteines in subunits α and β of Gs and causes complex dissociation [85].

Other important components of the GPCR-mediated signal transduction are arrestins. These proteins are involved in receptor desensitization and removal from the cell membrane, as well as in the triggering of the G protein-independent cellular pathways. The arrestin family includes the subfamilies of visual arrestins (arrestin 1 and arrestin 4) and non-visual β-arrestins (β-arrestin 1 and β-arrestin 2, also referred to as arrestin 2 and arrestin 3, respectively). Also, the family of α-arrestins structurally related to β-arrestins has been identified. Type 1 and 2 β-arrestins, which are typically used in structural studies, are expressed in most mammalian tissues and cell types [306].

At present, four structures of the visual rhodopsin–arrestin 1 complexes have been solved. Three of them were obtained by expressing both proteins as a single polypeptide chain composed of the full-length human rhodopsin with T4 lysozyme at the N-terminus and type 1 arrestin connected by the flexible AAAGSAGSAGSAGSA linker to the rhodopsin C-terminus. All-trans-retinal was added to the complex during the LCP preparation [15, 307]. Another rhodopsin–arrestin 1 complex was assembled from rhodopsin isolated from bovine retina and the chemically synthesized ArrFL-1 peptide (arrestin 1 loop region). The solubilized receptor was mixed with the peptide at a molar ratio of 1 : 12; the mixture was incubated on ice for 5 min and illuminated with light at 530 nm. The complex was crystallized by the hanging drop technique [308].

Studying the complex structure by cryo-EM. The first receptor structure obtained by cryo-EM, in 2017, was the structure of the calcitonin receptor [309]. The same year, Jacques Dubochet, Joachim Frank, and Richard Henderson were awarded the Nobel Prize in Chemistry “for developing cryo-electron microscopy for the high-resolution structure determination of biomolecules in solution”. After decades of improvement, the application of this technique to the structural biology of GPCRs has become a tremendous success. According to GPCRdb, 337 GPCR structures have been solved by cryo-EM in just 5 years.

The procedure used in the first work was as follows: the receptor modified with the N-terminal FLAG-tag and C-terminal His-tag, Gαs and Gβ1γ2 were coexpressed in the baculovirus system in Hi5 insect cells [310]. For this, the cell culture was simultaneously infected with the viral suspensions at the 1 : 2 : 2 ratio, respectively. After the expression for 48 h, the cells were centrifuged and resuspended, and complex formation was triggered by adding the ligand (salmon calcitonin), Nb35, and apyrase. Nb35 bound to the Gαs–Gβ complex, thereby stabilizing the G protein. The reaction of complex formation was performed for 1 h at room temperature, followed by solubilization of the receptors and multistage purification of the complexes.

Because cryo-EM requires a certain minimum mass and shape of the studied object, this method was initially suitable for solving the structures of only class C GPCR complexes with small ligands, since these proteins contain large extramembrane domains [311-315]. For other GPCR classes, the structure of the receptor–ligand complex in the absence of other molecules could be obtained only in the case of dimerization (as for rhodopsin [172]) or by using large protein ligands, such as chorionic gonadotropin [316].

Coexpression makes it possible to obtain protein complexes in both bacterial [317-319] and eukaryotic [320-323] expression systems. The most common approach in the GPCR structural studies is cell transfection with several vectors [320-323]. Coexpression of proteins in a single reading frame is used less frequently [313].

The formation of protein complexes from purified components [12, 184, 324], before protein purification [325], or before receptor solubilization [326, 327] produced similar results. It is possible that the authors of the latter works optimized the protocol of complex purification for their purposes.

Cryo-EM has made it possible to obtain the structure of µ-opioid receptor (µOR) in a complex with Gi1 protein [328]. µOR transmits signals via the subfamily of Gi/o proteins that inhibit adenylate cyclase [329]. The activation of opioid receptor results in pain suppression [330]; therefore, understanding the structure of the agonist–µOR–Gi1 complex was especially important. The Gi1 heterotrimer in the absence of the bound nucleotide was stabilized using a single-chain variable fragment (scFv) that interacted with the Gαi and Gβ subunits, thus stabilizing the G protein but producing no effect on the interactions between the G protein and the receptor. This work was the first one to solve the structure of the GPCR complex with the G protein other than from the Gs subfamily. The experimental protocol for the ligand–GPCR–Gi complex formation was not much different from the procedures for obtaining receptor complexes with Gs proteins: coexpressed G protein subunits were purified, concentrated, and added to the purified concentrated receptor–ligand complex. The reaction of complex formation was carried out for 1 h at room temperature and then for another 1 h in the presence of apyrase. The complex was purified and treated with proteases, then scFv16 was added. The resulting complex was purified by gel filtration, concentrated, and used for the structural studies.

The Gα12/13 subfamily contains only two members: Gα12 and Gα13 [331]. About 30 receptors bind G proteins of this subfamily during activation [332]. G12 and G13 activate the guanine nucleotide exchange factor (GEF) regulating monomeric small GTPases of the Rho family [333]. The mechanisms of the receptor interaction with G12 and G13 had remained unclear for a long time, until the structures of the complexes of type 2 sphingosine-1-phosphate receptor (S1PR2) [334] and adhesion receptor ADGRG1 (GPR56) [335] were obtained in 2022. The G13 heterotrimer in [334] was unstable; therefore, it was decided to create a chimeric Gα13 subunit, in which the αN helix of the wild-type Gα13 was replaced with the αN helix of Gαi [336], and the G protein was stabilized using scFv16. The purified and concentrated components were added in a particular order ‒ first, the receptor and the G protein and then scFv16. In [335], a mini-Gα13 variant was designed by analogy with the previously created thermostable mini-Gα12 [336].

The studies of GPCR complexes with the Gαq protein are a particularly difficult case. When Gq/11 proteins bind to a GPCR, they activate phospholipase C-β, which in turn triggers a long cascade of reactions [337-339]. The structures of the receptor–Gq/11 complexes were obtained only after the introduction of stabilizing deletions and mutations, as well as modifications ensuring Gαq binding to scFv16. Mini-Gq constructs were created, in which the N-terminal sequence of Gαq/11 was replaced with the N-terminal sequences of other Gα subunits [92, 340, 341]. Gαq/11 subunits in all obtained complexes were chimeric [342-344].

β-Arrestins can desensitize GPCRs [345] or perform signaling functions when bound to GPCRs [346-348]. In contrast to visual arrestins used for the crystallization of protein complexes, β-arrestins are much more flexible and therefore hinder the X-ray analysis of the complex structure. Cryo-EM was used to establish the structure of the arrestin 2 (Arr2) complex with NTSR1 [349]. The construct consisting of the receptor, β-arrestin, and the light chain of Fab30 (BRIL-NTSR1–Arr2–Fab30L) connected via linkers was coexpressed with the heavy chain of Fab30 (Fab30H) and GRK5 kinase in Sf9 insect cells. The receptor was solubilized, and the complex of the receptor, β-arrestin, and the heavy and light chains of Fab30 was purified, concentrated, and used for further studies. GRK5 was necessary for the successful complex formation (the role of phosphorylation in the attachment of β-arrestins to the receptors will be discussed below). The receptor–β-arrestin complex can also be formed from purified and concentrated components, either by using sortase to attach a phosphorylated peptide to the C-terminus of the receptor [350] or by treating the receptor with GRK [351].

Before the attachment of β-arrestins, the C-terminus and ICLs of the receptors undergo selective phosphorylation by kinases of the GRK family, which includes seven members [352]. This process determines the direction and dynamics of biochemical signaling and the fate of the receptor; therefore, the structure of the GPCR–GRK complex is of particular interest. At present, such a structure has been obtained for the complex of visual rhodopsin with GRK1 using cross-linking and kinase inhibitors for the complex stabilization [184].

GPCRs bind G proteins during activation and dissociate from them after signal transduction. Next, GRK phosphorylates the intracellular domains of the receptor, and the receptor is desensitized by β-arrestins. However, it has been noted that the receptor sometimes continues signaling [353-355], so it was hypothesized and then proven that some GPCRs interact simultaneously with β-arrestins and G proteins [356]. The structure of such a megacomplex was obtained in 2019 [357]. For this, β2AR in which the C-terminal sequence was replaced with that of the type 2 vasopressin receptor (β2V2R), β-arrestin 1 (β-arr1), and GRK2 were coexpressed. Before centrifugation, the cells were heated to 37°C with the addition of a ligand for the complex formation. Before solubilization, an excess of Fab30 was added to the membranes and the mixture was incubated for 30 min at room temperature. The receptor was solubilized, and the Gs heterotrimer, Nb35, and Nb32K were added in excess to the purified β2V2R–β-arr1–GRK2–Fab30 complex. The complex was formed at room temperature within an hour. This publication was supplemented with a short article with comments on the complex structure [205].

CONCLUSION

Here, we reviewed modern approaches to obtaining homogeneous stabilized GPCRs. First and foremost, genetic modifications are introduced into the wild-type receptors to delete unordered regions of the protein amino acid sequence, reduce their conformational mobility, eliminate heterogeneous posttranslational modifications, and add fusion proteins. The purpose of these modifications is to increase the protein stability, facilitate the formation of contacts in the crystal, or create an additional volume and a certain protein shape for the cryo-EM studies. The resulting constructs are expressed in systems providing correct protein folding. According to GPCRdb, the baculovirus system using insect cells has proven to be the most successful for GPCR expression. The following stages include receptor purification and stabilization in suspensions with membrane-modeling media and characterization of its homogeneity and stability. We also discussed the issues related to the production of GPCR complexes with their main interaction partners, such as G proteins, β-arrestins, and GRKs.