INTRODUCTION

Since the first mention of using antibodies for protein recognition in 1968, there has been a growing interest in the development of high-affinity and specific agents for application in diagnostics and, later, in therapy and other fields [1]. In the 1970s and 1980s, technologies for the production of monoclonal antibodies were developed, the structure of immunoglobulins was determined, and methods for designing chimeric, humanized, and then fully human monoclonal antibodies were created. Antibodies have taken their place among the main specific protein-binding agents in modern biology and medicine [2]. Currently, antibodies are used as immobilizing and detecting agents in diagnostics, as well as drugs and drug delivery systems, labeling and staining reagents, research tools, etc. [3-5].

However, even modern monoclonal antibodies have drawbacks that limit their diversity and areas of application.

The necessity to optimize approaches for the development of binding agents, in particular, the requirements for their targeted and rational design, has led to a widespread application of molecular modeling methods. These methods have demonstrated high efficiency in studying high-affinity protein-binding agents, increasing the efficacy and reducing the cost of their development, improving the binding parameters, and enhancing the stability of such agents. Several studies have shown the possibility of rational design of affinity-binding agents based on the target structure. Molecular modeling methods are used at different stages in the development of protein-binding agents, from the creation of primary monomer libraries to system optimization and identification of molecular binding mechanisms. Both general-purpose methods (homology modeling, molecular docking, molecular dynamics) and specialized ones (optimized for a specific type of binding agent) can be used.

This review summarizes main advantages, disadvantages, and limitations of known protein-binding agents, as well as key experimental approaches to their development. Special attention is given to modern molecular modeling methods used for investigation, optimization, and rational design of high-affinity protein-binding agents in order to overcome the limitations of experimental approaches and to evaluate the modeling results.

PROTEIN-BASED PROTEIN-BINDING AFFINITY AGENTS

Antibodies: structure, methods of production, and applications. Antibodies are globular proteins composed of four polypeptide chains: two light chains (~25 kDa) and two heavy chains (~50 kDa) connected by disulfide bonds. The antigen-binding properties of an antibody are determined by the variable domain (Fv) formed by the heavy and light chains (Fig. 1a). The variable domain contains conserved regions responsible for maintaining the structure of the binding site and variable loops, or CDRs (complementarity-determining regions), that provide interaction with the antigen (Fig. 1b). Antibodies are produced by immune cells; they are also components of B-lymphocyte receptors [6].

Fig. 1.
figure 1

The structure of immunoglobulin G (PDB: 1HZH) (a) and variable loops (CDRs) of the Fv domain (b).

The first use of antibodies as high-affinity specific agents can be traced back to 1890, when E. von Behring and S. Kitasato used a serum containing polyclonal antibodies to treat diphtheria in animals. Antibodies became the most popular antitoxins, and polyclonal sera are still in use today. A significant breakthrough in the research and application of antibodies occurred in 1975, when G. Köhler and C. Milstein proposed the hybridoma technology for producing monoclonal antibodies. Monoclonal antibodies have enabled the development of unique, reproducible methods for protein isolation and identification, protein concentration measurement, and labeling of cells based on their antigenic composition. Since 1985, monoclonal antibodies have been used in drug therapy. Today, they remain the most common protein-binding high-affinity specific agents [7].

The most frequently used method for developing monoclonal antibodies is the hybridoma technology, which involves the fusion of genetically modified myeloma cells with B lymphocytes, resulting in the formation of cells capable of continuously producing antibodies against a target protein. The hybridoma technology is costly, resource-intensive, and cannot be easily automated, partly due to the need for continuous maintenance of cell cultures in bioreactors and monitoring of genetic changes during cell proliferation [8].

A more versatile and cost-effective alternative method for obtaining antibodies is phage display. In this technique, a gene of the antibody variable domain is inserted into a bacteriophage (e.g., M13) genome. The recombinant bacteriophage is used to infect bacteria, and the antibody fragment is expressed as a part of the virus envelope. Only bacteriophages carrying the antibody fragment on their surface can bind to the corresponding antigen; bacteriophages with low specificity are removed using a mixture of non-target proteins. This allows selection of a variable domain that can subsequently be modified to increase its affinity and specificity through directed mutagenesis and repeated selection [9].

Despite the existence of two methods for producing monoclonal antibodies, the costs of development and, more significantly, production are substantial, since both methods require the use of bioreactors. Consequently, the price of these antibodies can be high, limiting their application. Moreover, due to the difficulties in controlling the batch-to-batch reproducibility, the affinity, specificity, and stability of the produced antibodies often differ from those claimed by the manufacturer [10].

Molecular modeling methods for the development and study of antibodies. In recent years, molecular modeling methods have been actively used for optimization and development of the structure of antibodies in order to enhance their affinity, specificity, and other properties.

Modeling of antibody structure. Five out of six hypervariable regions (CDRs) in the variable domain exhibit a limited number of possible conformations, i.e., they differ only slightly between antibodies with different specificity and affinity (RMSD, ~0.7 Å) [11, 12]. Analysis of antibody tertiary structures has shown that there are eight main templates for the variable region conformation – two for the structures of the light and heavy chains, five for the hypervariable loops within the CDRs (L1, L2, L3, H1, H2), and one used as a basis for modeling the H3 loop, which can vary significantly between different antibodies (Fig. 1b). The structures of most variable antibody fragments can be predicted by combining these loop conformations. Five of the six CDR loops are typically modeled by homology-based methods. The main challenge is modeling of the H3 loop and the relative orientation of the H and L chains; since H3 is a part of the HL interface, modeling of the two is interdependent [13]. The H3 loop is built using ab initio methods that involve either searching for similar loops in known protein three-dimensional structures or constructing the loop by sequential addition of amino acid residues, followed by optimization of the loop structure using molecular mechanics and molecular dynamics methods. Methods based on stochastic approaches, such as the Monte Carlo algorithm, have also been proposed, in which a set of different geometries is generated and the most stable ones are selected by evaluating their energy. The structure is then optimized by minimizing the internal energy, and the most energetically favorable structure is chosen [14].
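The stochastic strategy described above (perturb a geometry, accept or reject the move by energy, keep the most stable structure seen) can be illustrated with a minimal Metropolis Monte Carlo sketch. The toy harmonic "loop energy" and all parameters below are illustrative stand-ins for a real force field, not part of any cited method:

```python
import math
import random

def metropolis_minimize(energy, x0, n_steps=5000, step=0.1, kT=0.1, seed=0):
    """Toy Metropolis Monte Carlo: perturb one degree of freedom at a time,
    always accept downhill moves, accept uphill moves with Boltzmann
    probability exp(-dE/kT), and track the lowest-energy geometry seen."""
    rng = random.Random(seed)
    x, e = list(x0), energy(x0)
    best_x, best_e = list(x), e
    for _ in range(n_steps):
        trial = list(x)
        i = rng.randrange(len(trial))
        trial[i] += rng.uniform(-step, step)  # small random perturbation
        e_trial = energy(trial)
        if e_trial <= e or rng.random() < math.exp(-(e_trial - e) / kT):
            x, e = trial, e_trial             # move accepted
            if e < best_e:
                best_x, best_e = list(x), e   # best geometry so far
    return best_x, best_e

# Stand-in "loop energy": harmonic wells with minima at 1.0 for each variable
toy_energy = lambda conf: sum((v - 1.0) ** 2 for v in conf)
best_conf, best_e = metropolis_minimize(toy_energy, [0.0, 0.0, 0.0])
```

In real loop modeling the perturbed variables are backbone dihedral angles and the energy comes from a molecular mechanics force field; the acceptance rule and best-structure bookkeeping are the same.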

The conserved fragments of the antibody are designed by homology modeling based on known antibody tertiary structures using programs such as Modeller, I-TASSER, and Rosetta [15-18]. Based on these approaches, several software packages have been developed, both commercial (products from Schrödinger Inc., Chemical Computing Group, and Accelrys Inc.) and freely available (PIGS, WAM, SAbPred) [17, 19-21]. A combination of homology and ab initio methods forms the basis of the RosettaAntibody software package, which provides a high correspondence between the modeled and experimentally determined structures [22, 23].

The most significant result of the methods developed for modeling antigen-antibody interactions is the ability to alter the affinity of such interactions by introducing point mutations into the antibody sequence and rapidly estimating the binding affinity in silico [24, 25]. Kiyoshi et al. [25] modeled 1178 point mutations, i.e., sequential substitutions of each of the 62 amino acid residues in the CDRs with the other 19 residues, in the structure of the 11K2 antibody directed against the chemokine MCP-1. In silico selection suggested that twelve of these substitutions would result in increased affinity. Subsequent verification by surface plasmon resonance confirmed the increased affinity for five of them; for example, one of the mutants demonstrated a 4.7-fold higher affinity than the wild-type antibody. The authors noted that the presence of charged residues in the light-chain fragment of the variable domain played the most significant role in enhancing the antibody affinity [25]. Hence, these methods are able to identify the amino acid residues most important for antibody-antigen binding.

Modeling of antigen–antibody complexes. In the aforementioned study [25], the authors improved the affinity of an antibody for which the structure of the antibody-antigen complex was already known, whereas the tertiary structures of complexes of new antibodies, as well as of most commercially available antibodies, are unknown [26]. Therefore, the ability to correctly identify the antibody-binding site on the target protein is of particular importance. Various experimental and computational approaches have been developed to solve this problem [27]. In silico methods use both the tertiary structures of target proteins and their amino acid sequences as input data. Although prediction of linear epitopes from the amino acid sequence yields good results, such epitopes represent only ~10% of all epitopes [27, 28]. Attempts to predict conformational epitopes from amino acid sequences have been largely unsuccessful due to the difficulty of accounting for spatial interactions between distant fragments of the protein sequence [29-31]. Therefore, the most common method is macromolecular (protein–protein) docking, in which two proteins are docked as rigid structures based on assessment of their electrostatic and steric complementarity. However, formation of protein–protein complexes is accompanied by mutually induced adjustment of the protein surfaces, which makes rigid protein–protein docking poorly informative. For this reason, approaches for modeling protein complexes with rapid mutual geometry optimization have been developed [32, 33]. Thus, RosettaDock performs rapid geometry adjustments by energy minimization using the Monte Carlo algorithm. The SnugDock algorithm, developed on the basis of the RosettaDock and RosettaAntibody software, was specifically designed for modeling antibody-antigen complexes and performs rapid Monte Carlo optimization to obtain higher-quality models [34]. It was found that an antibody–antigen complex close to the native one is usually more energetically favorable and corresponds to the global energy minimum of the modeled system, which makes selection more efficient [33, 35].

However, the use of this approach is limited by its low accuracy: the complex optimization step is coarse, and the energy evaluation function is simplified, leading to a poor correlation between predictions and experimental data [36]. Therefore, docking results are often further evaluated using molecular dynamics simulations. For instance, after docking of crenezumab (a humanized antibody against beta-amyloid peptides 1-40 and 1-42 developed as a treatment for Alzheimer’s disease) to beta-amyloid, about 200 complex models were generated and divided into eight clusters. Subsequent 200-ns molecular dynamics simulations were performed on representatives of these clusters in order to optimize the complex structure, analyze the structure of the obtained complexes, and evaluate their interaction energy. Two of these complexes were stable and agreed well with the experimental data [35].

The Monte Carlo method is stochastic, and its predictive power depends on the number of iterations, which increases the number of possible complex configurations and makes selection of the most probable structures challenging. Special algorithms are employed to increase conformational sampling in order to reduce random energy fluctuations, to re-score complexes with a greater weight given to the electrostatic and desolvation energies, and to cluster the complexes. Programs implementing these algorithms include FiberDock, ZRANK, and pyDock [37-40].
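The pose-clustering step mentioned above can be sketched as a simple greedy "leader" clustering. The one-dimensional "poses" and the cutoff below are purely illustrative; real docking pipelines cluster poses by pairwise RMSD:

```python
def leader_cluster(poses, dist, cutoff):
    """Greedy 'leader' clustering, as used to condense large docking pose
    sets: each pose joins the first cluster whose representative lies within
    `cutoff` by the supplied distance function; otherwise it seeds a new
    cluster and becomes its representative."""
    clusters = []  # list of (representative, members)
    for pose in poses:
        for rep, members in clusters:
            if dist(pose, rep) <= cutoff:
                members.append(pose)
                break
        else:
            clusters.append((pose, [pose]))
    return clusters

# Toy 1D "poses": positions along a coordinate, clustered with cutoff 1.0
poses = [0.0, 0.2, 5.0, 5.3, 10.1, 0.4]
clusters = leader_cluster(poses, lambda a, b: abs(a - b), cutoff=1.0)
```

A single pass gives three clusters here; picking one representative per cluster (e.g., the lowest-energy member) is what reduces hundreds of Monte Carlo poses to a handful of candidates for refinement.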

In addition to the Monte Carlo-based methods, direct approaches to searching for interacting surfaces have been developed using the fast Fourier transform (FFT). This algorithm searches for correspondences between protein surfaces by transforming them into the frequency domain. Consequently, it identifies significantly fewer potential binding sites and conformations in much less time, partly due to the ability to perform computations on graphics processors. However, the dominant factor in this algorithm is geometric matching (rather than physicochemical compatibility), which substantially limits the applicability of the approach and requires additional modeling steps [41].
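The core of FFT docking is that the shape-overlap score for every relative translation of the two molecules can be obtained in one pass via the correlation theorem, instead of an explicit scan over shifts. A minimal two-dimensional sketch with toy grids (real programs rasterize the proteins onto 3D grids and sample rotations separately):

```python
import numpy as np

def fft_docking_scores(receptor, ligand):
    """Toy FFT-based rigid docking score (2D for brevity): the correlation
    of the two grids over every relative translation is computed at once
    using the convolution/correlation theorem."""
    F_r = np.fft.fftn(receptor)
    F_l = np.fft.fftn(ligand)
    # corr[shift] = sum_n ligand[n] * receptor[n + shift]
    corr = np.fft.ifftn(np.conj(F_l) * F_r).real
    return corr

# Toy shapes: a receptor with a 4x4 "pocket footprint" and a matching ligand
receptor = np.zeros((32, 32)); receptor[10:14, 20:24] = 1.0
ligand   = np.zeros((32, 32)); ligand[0:4, 0:4] = 1.0
scores = fft_docking_scores(receptor, ligand)
best_shift = tuple(int(i) for i in np.unravel_index(np.argmax(scores), scores.shape))
```

The score peak lands at the translation that superimposes the ligand on the receptor footprint; this purely geometric criterion is exactly why FFT docking needs additional physicochemical rescoring steps.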

There are also methods that predict the interaction energy between two proteins, e.g., in antibody–antigen complexes, and are optimized to find correlations between the structure of the interacting surfaces and the experimentally determined affinity. Recently, the possibility of using the free energy perturbation (FEP) method to predict the influence of point mutations on the affinity and stability of antigen-antibody complexes has been demonstrated [42]. FEP calculates the free-energy difference accompanying the conversion of one ligand (or amino acid residue) into another through a series of small changes in the ligand structure, with the structure of each intermediate complex optimized by molecular dynamics simulation. The DDMut-PPI program, based on deep learning methods, has been developed to predict changes in the free energy of binding of two proteins with a higher efficiency [43]. The increasing number of protein complex structures in the PDB database has made possible the development of frequency matrices. These matrices have revealed a preference for polar and charged amino acid residues at the interaction interface, thus allowing rapid selection of potential interaction areas based on this criterion. This additional selection of complexes by their physicochemical properties significantly increases the probability of finding the correct complex geometry [44, 45].
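The estimator underlying FEP is the Zwanzig relation, ΔF(A→B) = −kT ln⟨exp(−(U_B − U_A)/kT)⟩_A, averaged over configurations sampled in state A. A toy numerical sketch, shifting a harmonic well; since the force constants are equal, the exact free-energy difference is zero, which the estimate should reproduce:

```python
import math
import random

def fep_delta_F(u_A, u_B, samples, kT=1.0):
    """Zwanzig free-energy perturbation estimator:
       dF = -kT * ln < exp(-(U_B - U_A)/kT) >_A
    where the average runs over configurations sampled in state A."""
    acc = 0.0
    for x in samples:
        acc += math.exp(-(u_B(x) - u_A(x)) / kT)
    return -kT * math.log(acc / len(samples))

# Toy alchemical step: shift a harmonic well (equal force constants => exact dF = 0)
rng = random.Random(42)
u_A = lambda x: 0.5 * x * x
u_B = lambda x: 0.5 * (x - 0.5) ** 2
samples = [rng.gauss(0.0, 1.0) for _ in range(100000)]  # Boltzmann samples of A at kT=1
dF = fep_delta_F(u_A, u_B, samples)
```

Real calculations split the A→B transformation into many intermediate λ windows, each sampled by molecular dynamics; the single-window estimator above is only for illustration.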

More universal combined approaches have also been developed. For instance, the modeling of antibody structure by homology, complex optimization, and application of sophisticated physicochemical evaluation functions have been used in the OptMAVEn-2.0 program for de novo modeling of variable antibody fragments for the interaction with specific epitopes on the protein antigen surface. This method implements protein–protein docking, pose optimization, side-chain positioning optimization, sequence selection, construction of statistical matrices, and more [46]. The method mimics the natural process of V(D)J gene recombination by designing antibodies from modular Ab parts (MAPs) with subsequent structure optimization to increase their affinity. When this approach was used for constructing antibodies against peptides, five designed antibodies that were tested experimentally demonstrated correct protein globule folding and stability in solution. Three of them exhibited a nanomolar-range affinity [47]. This approach was also used to design two antibodies against epitopes on the Zika virus envelope protein [46].

An intriguing prospect in the development of antibody modeling methods is the possibility of de novo design of antibodies against a specified epitope based on computational models. A new approach has been developed that includes an assembly of a library of short seed sequences capable of binding to regions on the target protein surface. These sequences are docked onto the protein surface using the hot-spot method that involves the docking of a library of short fragments and selection of those showing the highest binding energy with the protein (the term “hot spots” often refers to amino acid residues that significantly contribute to the binding energy). Then, an antibody backbone is selected to link these short sequences. With a sufficiently diverse library, this approach could be used for the directed development of antibodies for specific epitopes; however, in the original work, its application was limited to the epitopes with the known antibody–epitope complex structures [48].

Combining statistical analysis and homology modeling is particularly applicable in the modeling methods based on deep learning. The AlphaFold2 neural network based on the transformer model and its extension AlphaFold-Multimer have been utilized to model antibody-antigen complexes using primary sequences only. However, the modeling result still remained at ~30% success in reproducing the geometry of the experimentally obtained complex [49].

Various deep learning models based on graph neural networks, language models, etc., have been used for modeling antibodies for specific epitopes [50-52]. However, these methods are limited by the training datasets available from PDB, so that a very limited number of structures for both epitopes and antigen-binding domains have been used for training.

Deep learning-based approaches continue to be actively developed. They efficiently approximate hidden patterns and thus can provide high accuracy in model construction, surpassing homology modeling methods such as Modeller and I-TASSER [53]. However, high-quality training of such systems requires large training datasets, which are currently difficult to obtain; the small number of experimentally determined structures limits the efficiency of these approaches [49].

The methods used for the development and optimization of monoclonal antibodies are shown in Fig. 2.

Fig. 2.
figure 2

Molecular modeling methods used for the development and optimization of monoclonal antibodies.

In addition to the methods for designing the structure of antibodies and their complexes with antigens, various approaches have been developed to predict the antibody propensity for aggregation, immunogenicity, and pharmacokinetic clearance [54-56].

Despite the significant variety of methods for modeling interactions in antibody–antigen complexes, most of them share a serious limitation: they do not account for the role of the solvent, which is crucial in antibody–antigen interactions. The method that addresses this issue most effectively is molecular dynamics simulation in explicit solvent, which requires substantial computational resources. A potential solution to this problem might be the use of accelerated dynamics methods, including those based on deep learning [57].

Antibody fragments. Structure and methods of development. As mentioned earlier, development and production of antibodies are costly processes that require continuous monitoring. Since only the antibody variable domain is involved in target binding, truncated antibody variants retaining the ability to bind their targets have been developed. Using genetic engineering methods, a variety of protein-binding antibody fragments have been created, such as single-chain variable fragments (scFvs), individual antigen-binding fragments (Fab’s), minibodies, nanobodies, and others [58-60]. The comparison of antibody structures and their fragments is shown in Fig. 3.

Fig. 3.
figure 3

Comparison of antibody structures and examples of antibody fragments.

The development of all the above-mentioned monoclonal antibody derivatives starts with the application of hybridoma or phage display technologies. These technologies enable production of high-affinity, specific agents and allow a further increase in affinity during selection [61, 62]. Gene regions encoding the constant domains, which maintain the antibody structure and bind to the Fc receptors of the immune system, are removed by genetic engineering, while the light and heavy chains of the immunoglobulin variable fragments are connected by designed linker sequences.

Nanobodies are derived from immunoglobulins whose variable domain contains only a heavy chain and therefore has a structurally simpler antigen-binding region. Such antibodies have been found in representatives of the Camelidae family. Reducing the size of a molecule decreases the cost and complexity of its production and increases its tissue permeability, which is especially important in therapeutic applications. At the same time, it significantly reduces the stability of the antigen-binding region, thus worsening the binding parameters [58].

Affinity maturation by phage or yeast display is employed to solve the problem of reduced affinity and specificity of antibody fragments. In these methods, single substitutions are introduced into the sequences encoding the antigen-binding regions, creating a diverse library from which the highest-affinity and most specific structures are then selected. Owing to the lower cost and effort of working with antibody fragments, this approach proves to be quite efficient [63].

Molecular modeling methods in the development and research of antibody fragments. Owing to the smaller size of antibody fragments, the use of molecular modeling methods in their development and optimization has proven to be more efficient [64, 65].

Utilization of small antibody fragments makes it possible to design bispecific binding systems composed of fragments from two different antibodies connected by a linker. Molecular modeling methods have been successfully used to create bispecific scFvs that inhibit formation of the T-cell receptor (TCR) complex with the major histocompatibility complex (MHC-II) in the presence of staphylococcal enterotoxin B (SEB) [66]. The most important modeling tool in this study was the microsecond molecular dynamics simulation which helped to decipher the mechanism of allosteric inhibition of the SEB-TCR complex formation.

In [66], a bispecific system was created from the variable domains of known antibodies. However, predicting the structure of new protein-binding antibody fragments requires specialized methods. Tools used for modeling complete antibodies are ineffective for this task, as the absence of stable domains increases the antibody fragment mobility [67]. For these tasks, special programs, like NanoNet, have been developed. This deep learning model can accurately and efficiently (about 1 million structures in four hours on a standard CPU) predict the structures of nanobodies using only their amino acid sequences as an input. These models, combined with flexible protein–protein docking, can serve as virtual screening tools to find new high-affinity and specific protein-binding agents [68].

Thus, despite a smaller size of antibody fragments compared to full monoclonal antibodies, their modeling poses similar challenges and requires significant computational resources.

Antibody mimetics. Structure and methods of development. Antibodies and their fragments are the most commonly used protein-based high-affinity and specific binding agents. However, they have several disadvantages that limit their application. High costs of production, low stability (which leads to problems with aggregation), immunogenicity, and high costs of storage and logistics are the main limiting factors that hinder wider use of antibodies. Most of these problems are related to the native structure of antibodies, which are not meant to exist outside their native environment. Despite the small size of antibody fragments, their structure still remains suboptimal and includes relatively large regions whose function is limited to maintaining the structure of the binding site [69].

Therefore, it was proposed to develop protein-binding agents non-homologous to antibodies; they are known as antibody mimetics (or non-immunoglobulin epitope binders). Designing simpler protein structures with an enhanced affinity and specificity makes it possible to overcome the limitations of antibodies while retaining their useful features.

Protein antibody mimetics have a simple rigid backbone composed of alpha-helices or beta-sheets, while their protein-binding sites are usually free loops [70]. Examples of protein scaffolds for antibody mimetics are shown in Fig. 4. Their development typically employs site-directed or random mutagenesis. Protein antibody mimetics are produced using simple bacterial reactors that provide a higher yield of these molecules compared to antibodies [63].

Fig. 4.
figure 4

Examples of backbones for proteinaceous antibody mimetics (PDB codes are given in parentheses).

The most common method for developing antibody mimetics involves cycles of directed evolution followed by candidate selection. Initially, a DNA library encoding homologous protein structures is amplified by error-prone PCR (achieved by adding manganese ions to the reaction mixture) to significantly increase the diversity of amino acid sequences relative to the original library. The most promising sequences, those with increased affinity and specificity, are selected by phage display, and the procedure is repeated [71]. After a limited number of cycles, a highly stable protein-binding agent with high affinity and specificity can be obtained.

However, this method has limitations that hinder large-scale entry of antibody mimetics to the market. One of the main challenges in the design of protein-based antibody mimetics is finding an appropriate protein scaffold. Despite the large variety of known scaffolds, such as affibodies (B-domain of staphylococcal protein A), adnectins (extracellular domain of human fibronectin III), and affitins (variants of the DNA-binding protein Sac7d), each protein backbone has its drawbacks, and discovery of new ones is constrained by the ability to solve protein tertiary structures. Since most of these proteins have structures foreign to the organism, they tend to be immunogenic, which limits their application in vivo. Also, lacking Fc fragments, they have no effector functions [63]. Consequently, although protein-based antibody mimetics are free from many drawbacks typical of antibodies, their number and areas of application are currently limited.

Molecular modeling methods in research and development of protein antibody mimetics. Molecular modeling methods address several serious challenges in the development of protein antibody mimetics. The antibody mimetic structure typically includes stable, water-soluble fragments of known proteins that can adopt a correct conformation in bacterial cells. This simplifies their identification, since the amino acid sequences of such fragments predominantly consist of hydrophilic residues, have a low cysteine content, etc. [72]. Analysis of the statistical distribution of amino acid residues allows selection of a limited set of residues that can be used to form structures with the desired properties, such as solubility and pH sensitivity, without compromising the binding parameters, resulting in proteins that retain stable structures over a wide range of temperatures and pH values.

The tertiary structure of such proteins can be predicted with programs like AlphaFold2 and RoseTTAFold. However, the accuracy of these predictions remains uncertain, as these programs are trained mostly on native structures [73]. A potential solution to this issue may be the application of diffusion-based deep learning models, such as AlphaFold3 and Chroma, which are capable of predicting protein structures based on the desired parameters [74, 75]. Because diffusion models make small iterative steps during generation, the imposed constraints can cumulatively direct the modeling towards the desired properties. Thus, the structures predicted by Chroma showed minimal differences from the experimentally determined structures (RMSD, ~1 Å). Since Chroma has a subquadratic computational complexity (between O(N) and O(N log N)), this tool could be very useful for developing backbones for protein-based antibody mimetics [75]. A significant limitation of diffusion models is the generation of so-called “hallucinations,” i.e., unrealistic structures and connections that contradict physical properties [76]. Therefore, expert evaluation of the modeling results is crucial.
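RMSD values such as the ~1 Å figure quoted above are computed after optimal superposition of the two structures. A minimal implementation of the standard Kabsch procedure (illustrative coordinate arrays, no real PDB parsing):

```python
import numpy as np

def kabsch_rmsd(P, Q):
    """RMSD between two (N, 3) coordinate sets after optimal superposition
    (Kabsch algorithm): center both sets, find the best rotation via SVD of
    the covariance matrix, then take the root-mean-square deviation."""
    P = P - P.mean(axis=0)
    Q = Q - Q.mean(axis=0)
    U, S, Vt = np.linalg.svd(P.T @ Q)
    d = np.sign(np.linalg.det(Vt.T @ U.T))   # guard against improper rotations (reflections)
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T  # optimal rotation mapping P onto Q
    diff = P @ R.T - Q
    return float(np.sqrt((diff ** 2).sum() / len(P)))
```

For a predicted model and an experimental structure, P and Q would hold matched backbone (e.g., C-alpha) coordinates; the superposition step ensures that the reported deviation reflects conformational differences rather than arbitrary placement in space.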

However, the methods used for modeling protein structures do not address the issue of potential immunogenicity of the designed proteins. Based on proteomics data, databases such as SEDB, Epitome, IEDB, AntiJen, and Bcipep have been created that store information about epitopes recognized by T-cell and B-cell receptors and antibodies, as well as epitopes presented by MHC molecules [77-81]. This information can help to identify potential epitopes and replace them when designing protein scaffolds, thereby reducing their potential immunogenicity.

Despite a smaller size and lower structural complexity of antibody mimetics compared to antibodies, there are currently few examples of application of molecular modeling methods for their development. The primary approach in the de novo design of protein antibody mimetics is the hot spot method, which combines the docking of potentially binding peptides and subsequent selection of the scaffolds [82]. Using this method, several mimetics have been developed that bind with a high affinity to the conserved regions of the influenza virus hemagglutinin (HA). As a result of modeling, two high-affinity proteins were designed, and the structure of the HA complex with the antibody mimetic showing the highest affinity was analyzed by X-ray crystallography. The structure of the predicted complex closely matched the experimentally determined one [83].

A combination of several approaches, including flexible macromolecular docking, molecular dynamics simulation, deep learning methods for tertiary structure prediction, and immunogenicity prediction algorithms, can significantly improve existing approaches and lead to the development of new protein antibody mimetics that offer a number of advantages and lack the limitations of monoclonal antibodies and their derivatives.

NON-PROTEINACEOUS AFFINITY PROTEIN-BINDING AGENTS

Aptamers. Structure and application. The disadvantages of protein affinity agents, related to the cost of production, problems with storage and logistics, immunogenicity, etc., have stimulated the search for non-proteinaceous protein-binding agents. The most promising of these are aptamers.

Aptamers are short single-stranded molecules of DNA, RNA, or synthetic nucleic acids, usually 20 to 60 nucleotides long, with a high affinity and specificity for various high- and low-molecular-weight targets [84]. Aptamers were independently discovered by two groups in 1990 as a result of the SELEX (systematic evolution of ligands by exponential enrichment) process [84, 85]. Later, aptamers were found in nature as components of bacterial riboswitches [86].

The idea of using nucleic acids for the recognition of protein targets emerged in studies of the human immunodeficiency virus (HIV). It was shown that RNA sequences containing the transactivation response element (TAR) can inhibit HIV replication by binding to the viral protein Tat with a high affinity and specificity [87]. Although aptamers are completely different from antibodies in structure, it is believed that they can compete with antibodies in both diagnostic and therapeutic applications [88]. While aptamers recognize and bind targets in the same way as antibodies, they have a number of advantages, such as shorter production time and lower cost, easier modification, better thermal and chemical stability, smaller size, and low immunogenicity [89].

Due to their advantages over antibodies, aptamers have found applications in various fields of molecular biology, biotechnology, diagnostics, and therapy. One of the promising areas of research using aptamers is development of optical sensors. Numerous studies have shown that fluorescent dyes attached to the conformationally flexible regions of the aptamers can produce an optical signal in response to ligand binding [90]. However, the number of aptamers on the market is limited compared to antibodies, primarily due to the lack of an urgent need for infrastructure restructuring, which is economically costly and currently impractical for most applications [91].

Many aptamers designed for therapeutic purposes are currently tested in clinical trials or have already been released to the market, including drugs for the treatment of age-related macular degeneration, blood clotting disorders, cancer, inflammatory processes, etc. [89, 92].

Besides being used as therapeutic agents, aptamers can be employed for targeted drug delivery into human cells to enhance drug efficacy and reduce side effects [93]. Human prostate-specific membrane antigen (PSMA, a transmembrane protein associated with prostate cancer) that is overexpressed on the surface of tumor cells, has become the first model system for the aptamer-based drug delivery [94]. Bispecific aptamers against gp120 and CD4 receptor deliver a small interfering RNA-based active agent that inhibits HIV activity in vitro [95].

Methods for designing and producing aptamers. A standard method of aptamer selection is SELEX, an analogue of directed evolution followed by screening of protein-binding candidates that relies on in vitro incubation and PCR instead of phage display and expression in bacterial cells. During SELEX, the target is first incubated with a set (~10⁹-10¹¹) of single-stranded random oligonucleotides known as the primary library. The oligonucleotides in the SELEX library typically consist of 40-100 nucleotides due to limitations of chemical synthesis, with a random region in the middle and common sequences at both ends for primer annealing. After incubation of the primary library with the target, unbound oligonucleotides are removed, and the aptamer-protein complexes are separated. The released DNA sequences are amplified by error-prone PCR. Standard SELEX usually includes several rounds of this procedure [85]; the selected aptamers are sequenced, and the parameters of their binding to the target are assessed by various methods, such as surface plasmon resonance or isothermal titration calorimetry. Up to twenty selection rounds are typically carried out to enrich aptamers with a high affinity to the target [96].
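
The enrichment logic of these selection rounds can be illustrated with a toy simulation (a sketch under stated assumptions: affinity is mocked as the count of an arbitrary motif, and resampling stands in for PCR amplification; no real SELEX chemistry is modeled):

```python
import random

random.seed(0)
BASES = "ACGT"

def random_library(n, length):
    """Generate a primary library of random single-stranded sequences."""
    return ["".join(random.choice(BASES) for _ in range(length)) for _ in range(n)]

def toy_affinity(seq):
    """Stand-in scoring function: affinity is mocked as the count of a
    hypothetical binding motif; a real experiment measures Kd instead."""
    return seq.count("GCA")

def selex_round(pool, keep_fraction=0.2, amplify_to=1000):
    """One selection round: retain the best binders, then 'amplify' them
    by resampling (the PCR analogue in this toy model)."""
    ranked = sorted(pool, key=toy_affinity, reverse=True)
    survivors = ranked[: max(1, int(len(ranked) * keep_fraction))]
    return [random.choice(survivors) for _ in range(amplify_to)]

pool = random_library(1000, 40)
for _ in range(5):
    pool = selex_round(pool)

mean_affinity = sum(toy_affinity(s) for s in pool) / len(pool)
print(f"mean motif count after 5 rounds: {mean_affinity:.2f}")
```

Because the survivors are resampled without mutation, the pool converges toward the best sequences already present in the primary library, which is why the diversity of the initial library matters so much in real SELEX.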

Although traditional SELEX is the main approach for generating aptamers, it is not free of limitations. The primary library should include as many diverse structures as possible, while at the same time it should contain neither double-stranded nucleic acids nor single-stranded linear structures incapable of reliable folding. The problem of finding the optimal primary library remains unsolved [97]. Typically, to increase the library diversity, a large number of structures with a high proportion of GC pairs are used. It has been shown that increasing the GC content of the primary library leads to a greater complexity of tertiary structures, resulting in a greater diversity of potential aptamers and a higher average affinity [98].
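
The GC-content criterion is straightforward to apply when composing a candidate library; a minimal helper might look like the following (the 0.5 threshold is an arbitrary illustrative choice, not a value from the cited studies):

```python
def gc_fraction(seq):
    """Fraction of G and C bases in a sequence."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)

def filter_by_gc(library, minimum=0.5):
    """Keep candidate sequences whose GC content meets the threshold,
    favoring more complex tertiary structures in the primary library."""
    return [s for s in library if gc_fraction(s) >= minimum]

library = ["ATATATAT", "GCGCGCAT", "GGGCCCGC", "ATTTTACG"]
print(filter_by_gc(library))  # sequences with GC content >= 0.5
```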

Another approach to increasing the diversity of the primary library has been developed by SomaLogic (a subsidiary of Standard BioTools). The company's primary library for the production of special modified aptamers, the so-called SOMAmers, contains a large number of structures in which thymidines are replaced by C5′-ethynyl-2′-deoxyuridines. During SELEX, these residues are modified by side chains of amino acids or other groups to expand the properties of the obtained aptamers [99]. However, the evident limitations of this method are its high cost and the possibility of introducing only a single type of chemical modification into the SOMAmer structure.

The PCR process used for enrichment in SELEX is not ideal. Because the primary library contains a large variety of secondary structures and short sequences are synthesized at a higher rate, there is a bias that can lead to excessive enrichment with sequences whose structure is most advantageous for the polymerization reaction, rather than with structures possessing the required binding parameters. To solve this problem, emulsion PCR is used, in which each individual sequence is amplified in a separate emulsion droplet and thus does not compete for the polymerase active site [100].

Despite numerous approaches to SELEX optimization, its efficiency currently remains at about a 30% probability of identifying the desired aptamer [101].

One of the serious limitations of aptamers compared to antibodies is related to the absence of hydrophobic groups in natural nucleic acids. Among the solutions are the above-mentioned SOMAmers, which can be modified, in particular, with side radicals of hydrophobic amino acids (e.g., a benzyl radical) (Fig. 5a). However, this modification is labor-intensive and not universal, since the obtained aptamers cannot be amplified directly by PCR, and additional steps are required. More promising is the addition of a synthetic base pair, such as dNaM-d5SICS (Fig. 5a). Due to the presence of a polyaromatic hydrocarbon group, these bases are hydrophobic but still maintain the ability for complementary replication in PCR using modified DNA polymerases [102]. However, adding a new base pair to the primary library for SELEX requires an increased diversity and significantly raises production costs.

Fig. 5.
figure 5

Modifications of nucleic acids to increase the affinity of aptamers and their resistance to nucleases: a) modifications adding hydrophobicity to the molecule; b) modifications increasing resistance to autodegradation and hydrolysis by nucleases: 2′-amino- (I), 2′-fluoro- (II), 2′-O-methyl-deoxyribose (III), 2′-deoxy-2′-fluoro-D-arabinonucleic acid (IV), LNA (V), and phosphate modifications (VI, VII). B, any base.

Another limitation of aptamers as therapeutic agents is the lack of effector functions. Sequential cross-linking of the free ends of two aptamers, one binding to the target protein and the other interacting with FcRγIII, has produced an effector aptamer capable of inducing an immune response upon binding to the target protein [103].

Since aptamers are short oligonucleotides, they are susceptible to degradation. RNA-based aptamers are highly unstable due to self-degradation caused by the formation of transient bonds between the 2′-hydroxyl group and phosphate, as well as due to the high level of environment contamination with ribonucleases that catalyze RNA cleavage [104, 105]. DNA aptamers prove to be more practical in terms of stability; however, they can also be degraded by deoxyribonucleases in vivo. Numerous modifications of aptamers have been developed to prevent both autodegradation and the action of various nucleases (Fig. 5b). Substitution of the 2′-hydroxyl group with an amino group (I), fluorine (II), or O-methyl group (III) prevents both autodegradation and cleavage by nucleases. However, this modification significantly enhances the flexibility of the nucleic acid, making it similar to that of RNA, which may affect the structure of the resulting molecules. 2′-Deoxy-2′-fluoro-D-arabinonucleic acid (IV) and locked nucleic acids (LNAs) (V) are used to significantly increase the resistance of aptamers to nucleases while maintaining the flexibility of the DNA molecule [106]. More effective in this case is replacement of the phosphate group oxygen with sulfur or a methyl group, which does not affect the structure of deoxyribose. In this case, the aptamer acquires resistance to nucleases, but the altered charge of the molecule increases its propensity for non-specific binding [106, 107]. In addition, modifications increasing the resistance of aptamers to nucleases or their affinity can prevent polymerization in PCR, which complicates the SELEX process [108].

Fig. 6.
figure 6

Application of molecular modeling methods in the development and optimization of aptamers.

Molecular modeling methods in design and studies of aptamers. Method for aptamer selection from a primary sequence library. To optimize the development of aptamers, numerous methods based on molecular modeling have been proposed. Commonly used computational approaches include docking and molecular dynamics, which make it possible to simulate interactions between the aptamer and its target. These methods are sometimes supplemented by quantum mechanics or hybrid energy calculation methods to provide a more precise estimation of the binding energy [109]. Methods for predicting the secondary and tertiary structure of RNA and DNA play an important role in the design of aptamers, since the tertiary structure is required to model protein-aptamer complexes [110]. Modern in silico tools make it possible to model aptamers for both small molecules and complex biopolymers. The only limitation of these methods is that they cannot use cells as targets (unlike SELEX). A typical cycle of in silico aptamer modeling starts with the secondary structure prediction, followed by the tertiary structure prediction and its optimization. Next, rigid or flexible docking of the target and aptamer is performed, leading to the selection of complexes with the highest binding scores. The next important, but not mandatory, step is molecular dynamics simulation to assess the complex stability and to determine the binding energy with a higher accuracy [111]. Analysis of the aptamer interaction with the target makes it possible to introduce nucleotide substitutions or chemical modifications, after which the previous steps can be repeated with new candidates [112]. In addition to the structure and interaction modeling methods, quantitative structure–activity relationship (QSAR) models have been utilized in aptamer design for rapid prediction of the desired feature and scanning of the sequence space [113-115].
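
The cycle described above can be sketched as an orchestration loop. Every step function below is a toy stand-in for an external tool (secondary- and tertiary-structure predictors, a docking engine), and the scoring is a deliberately artificial proxy; only the control flow reflects the workflow from the text:

```python
def predict_secondary(seq):
    """Stand-in: real tools return a base-pairing pattern (dot-bracket)."""
    return "." * len(seq)

def predict_tertiary(seq, secondary):
    """Stand-in: real tools build 3D coordinates from the secondary structure."""
    return {"seq": seq, "ss": secondary}

def dock_and_score(model):
    """Stand-in: real docking returns poses ranked by a binding score.
    Here a trivial proxy (G count) is used so the sketch runs."""
    return model["seq"].count("G")

def modeling_cycle(candidates, n_best=2):
    """Secondary -> tertiary -> docking, then keep the top-scoring candidates;
    in practice an optional MD stage would refine the selected complexes."""
    scored = []
    for seq in candidates:
        ss = predict_secondary(seq)
        model = predict_tertiary(seq, ss)
        scored.append((dock_and_score(model), seq))
    scored.sort(reverse=True)
    return [seq for _, seq in scored[:n_best]]

print(modeling_cycle(["AUGGC", "GGGGA", "AUAUA"]))
```

The selected candidates would then be modified (nucleotide substitutions, chemical modifications) and fed back into the same cycle, as the text describes.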

All the above computational methods can be used in combination with the experimental ones, complementing each other and increasing the efficiency of aptamer design. For example, the diversity of the primary SELEX library can be increased by using secondary structure prediction methods that allow estimating the structure complexity and selecting sequences with the greatest potential. Such methods can also be applied to select a correct ratio of structures in the primary library to prevent uneven PCR amplification [116]. However, the methods for secondary structure prediction are generally slow and unable to quickly evaluate a large number of different structures. Also, they are mostly applicable to RNA, while aptamers are predominantly based on DNA due to its greater structural stability and cost-effectiveness. Furthermore, most secondary structure prediction methods ignore the common pseudoknot structures due to the high computational cost of their prediction [116, 117]. Therefore, deep learning methods have been proposed to develop models capable of predicting nucleic acid structures with a high speed and accuracy [118]. However, currently, experimental data on single-stranded DNA structures are still insufficient for high-quality model training.
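
As a reference point for what secondary structure prediction involves, the classic Nussinov base-pair maximization algorithm is sketched below. It is a simplified stand-in for the free-energy minimization performed by real tools and, like most of them, it ignores pseudoknots by construction (only nested pairings are considered):

```python
def nussinov(seq, min_loop=3):
    """Maximum number of nested base pairs in an RNA sequence (Nussinov
    dynamic programming). Real predictors minimize free energy instead,
    but share the same nested-pairing recursion, which is why pseudoknots
    are excluded."""
    pairs = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"),
             ("G", "U"), ("U", "G")}
    n = len(seq)
    dp = [[0] * n for _ in range(n)]
    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            best = dp[i + 1][j]  # base i left unpaired
            for k in range(i + min_loop + 1, j + 1):
                if (seq[i], seq[k]) in pairs:  # pair i with k, split the rest
                    right = dp[k + 1][j] if k + 1 <= j else 0
                    best = max(best, 1 + dp[i + 1][k - 1] + right)
            dp[i][j] = best
    return dp[0][n - 1]

print(nussinov("GAAACGAAAC"))  # two hairpins: (0,4) and (5,9)
```

The O(n³) inner loop makes clear why screening very large libraries with even this simplest model is costly, motivating the faster deep learning predictors mentioned above.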

A primary sequence library in SELEX contains a large number of diverse sequences; however, an aptamer selected for a specific target may still be suboptimal, i.e., have a low tertiary structure stability or affinity. A multistep approach involving the introduction of point substitutions into the aptamer followed by molecular dynamics simulation was applied in the development of an aptamer binding the antibiotic sulfadimethoxine [119]. The authors used an aptamer generated through SELEX as a starting component. Its interaction with the target was modeled using molecular dynamics, and the affinity of interaction was estimated as the change in the Gibbs free energy upon interaction between the aptamer and the target molecule. Point substitutions were introduced into the aptamer, after which molecular dynamics simulation and assessment of the binding energy were performed again. The best sequences were selected at each step, which made it possible to increase the aptamer affinity without resorting to additional SELEX experiments.
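
The mutate-simulate-select scheme can be sketched as a greedy loop. A toy binding-energy function stands in for the MD-derived Gibbs free energy used in the study, and the starting sequence is illustrative only:

```python
BASES = "ACGT"

def point_mutants(seq):
    """All single-nucleotide substitutions of the sequence."""
    return [seq[:i] + b + seq[i + 1:]
            for i in range(len(seq)) for b in BASES if b != seq[i]]

def refine(seq, binding_energy, rounds=10):
    """Greedy refinement: score every point mutant (lower energy = tighter
    binding) and accept a mutant only if it strictly improves the energy.
    Like any greedy search, it can stop at a local optimum."""
    for _ in range(rounds):
        best = min(point_mutants(seq), key=binding_energy)
        if binding_energy(best) >= binding_energy(seq):
            break  # no strict improvement: local optimum reached
        seq = best
    return seq

# Toy energy: more 'GC' dinucleotides -> lower (better) energy.
toy_energy = lambda s: -s.count("GC")
print(refine("ATGTATAT", toy_energy))
```

Each accepted substitution would correspond to one expensive MD round in the real workflow, which is why the number of evaluated mutants per step is the main cost driver.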

Another example of this approach is the development of an aptamer against prostate-specific antigen (PSA) [120]. The researchers utilized five sequences obtained by SELEX as a basis for modeling and adapted the reproduction and crossover operations of a genetic algorithm to produce next-generation sequences, whose interaction with the target was assessed using protein-protein docking. The best candidates were synthesized, and their interaction with PSA was evaluated using a quartz crystal microbalance to select the most promising ones. Hence, in this study, the sequences were iteratively optimized for the target by utilizing both real experiments and simpler, higher-throughput computer models. The resulting aptamer showed a three-fold higher affinity for the target than the original aptamers obtained by SELEX.
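
A minimal version of such a genetic algorithm is sketched below, with a toy fitness function standing in for the docking and quartz-crystal-microbalance scores used in the study; all parameter values are illustrative:

```python
import random

random.seed(42)
BASES = "ACGT"

def crossover(a, b):
    """Single-point crossover of two parent sequences."""
    cut = random.randrange(1, len(a))
    return a[:cut] + b[cut:]

def mutate(seq, rate=0.05):
    """Per-base substitution with a small probability."""
    return "".join(random.choice(BASES) if random.random() < rate else c
                   for c in seq)

def evolve(population, fitness, generations=20, n_parents=4):
    """GA loop: the fittest sequences are kept as parents (elitism) and
    recombined/mutated into the next generation."""
    for _ in range(generations):
        parents = sorted(population, key=fitness, reverse=True)[:n_parents]
        population = parents + [
            mutate(crossover(*random.sample(parents, 2)))
            for _ in range(len(population) - n_parents)
        ]
    return max(population, key=fitness)

toy_fitness = lambda s: s.count("GC")  # hypothetical docking-score proxy
pop = ["".join(random.choice(BASES) for _ in range(20)) for _ in range(30)]
best = evolve(pop, toy_fitness)
print(best, toy_fitness(best))
```

Because the parents are carried over unchanged, the best fitness never decreases between generations; in the cited study the analogous guarantee came from re-testing the best candidates experimentally at each iteration.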

Method for aptamer modeling based on the target structure. In the studies described above, computer modeling was used mainly to improve already existing, experimentally obtained aptamers. However, approaches to the de novo modeling of aptamers based on the information about the target structure have also been proposed. To design an aptamer for cytochrome P450, Shcherbinin et al. [121] used the hot spot method and defined two distinct regions in the nucleotide chain. The first one was involved in the formation of bonds with the protein and made the principal contribution to the affinity and specificity of the molecule. The second region was responsible for preserving the aptamer's tertiary structure. After determining the protein surface regions with positively charged amino acids, the structures of trinucleotides binding to the protein were obtained using small-molecule docking and molecular dynamics simulation. The fragments exhibiting the highest affinity were fused with helical complementary regions to form a loop structure. The synthesized aptamers showed a high specificity by exclusively binding to proteins within the same family; the experimentally determined binding energies strongly correlated with the calculated ones [121].

The modeling of molecules usually starts with the creation of a set of candidate molecules and assessment of their efficiency for further selection. The most accurate, but at the same time most expensive, approach for such assessment is experimental estimation of the binding energy of the candidate interaction with the target. Hence, molecular docking and dynamics methods are used instead, although simulations for a large number of candidates can be time-consuming. An alternative approach involves utilizing machine learning methods to predict the required parameter. This approach was used to develop an aptamer against aminopeptidase CD13. The aptamer sequences were generated iteratively using a genetic algorithm. Next-generation sequences were selected by employing an evaluation function based on a machine learning model pretrained on the characteristics of the primary and secondary structures [114]. In another study, a neural network model trained on experimental data was used to explore a set of sequences to develop aptamers against neutrophil gelatinase-associated lipocalin. The authors developed and synthesized aptamers with a high affinity, and also found shorter nucleic acid sequences with a comparable affinity and greater stability [115]. However, although such neural networks can identify dependencies and improve already known structures, when searching for new compounds they can exclude promising structures from consideration. For this reason, the results of these studies require thorough validation and careful interpretation.
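
Surrogate models of this kind need a numerical representation of each candidate sequence; a common minimal choice is a k-mer count vector over the primary structure (shown here purely as an illustration; the cited studies used richer primary- and secondary-structure features):

```python
from itertools import product

def kmer_features(seq, k=2):
    """k-mer count vector: a simple primary-structure featurization that
    can feed a surrogate model replacing expensive docking/MD scoring."""
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    return [sum(1 for i in range(len(seq) - k + 1) if seq[i:i + k] == km)
            for km in kmers]

# 'ACGT' contains one each of the dimers AC, CG, and GT.
print(kmer_features("ACGT"))
```

A model trained on such vectors can score millions of sequences per second, which is what makes genetic-algorithm searches over large sequence spaces tractable.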

Molecular modeling is most promising in the development of modified aptamers more suitable for therapeutic applications. Thus, using in silico approaches to aptamer development makes it possible to select an optimal combination of modifications that enhance the binding affinity and resistance to nuclease degradation while maintaining the binding specificity. In [122], sequential application of different computer programs allowed modeling of modified aptamers and their interactions with target proteins. The authors used the classical prediction of the aptamer tertiary structure, followed by nucleotide modification, macromolecular docking, and analysis of interacting bases. In particular, Mfold and 3dRNA were used to predict the secondary and tertiary structures of nucleic acids. However, Mfold cannot predict pseudoknots and also demonstrates low performance [123]; 3dRNA also has low performance and shows a significant discrepancy between the experimental and predicted structures (RMSD, ~4 Å) [124]. Thus, due to the low prediction accuracy and low performance, the suggested approach is of limited use for modeling new aptamers for a given target.

Another obvious limitation of this method is the absence of molecular dynamics stage for optimization of the complex structure. Even a single modification can profoundly affect the tertiary structure of nucleic acid [125]. For this reason, after modifications, the structure should be investigated by molecular dynamics methods to identify conformational changes. However, conventional force fields for modeling nucleic acids cannot be used to simulate modified nucleic acids. Galindo-Murillo et al. [126] conducted parameterization of modified nucleotides, enabling molecular dynamics simulation of atypical bases, which yielded a good similarity between the modeling and experimental results. However, the range of parameterized modifications was limited. Existing universal force fields are capable of simulating molecules with many different modifications, but for biopolymers, these force fields are in poor agreement with experiments [127].

Therefore, there are two promising directions in the development of methods for molecular modeling of aptamers based on the target structure. One group of methods imitates the SELEX selection process using tertiary structure prediction, docking, and molecular dynamics, while the other provides targeted construction of a nucleic acid structure capable of binding to an epitope on the protein surface (Fig. 6). Both approaches require high-quality prediction of the secondary and tertiary structures, as well as the use of molecular dynamics simulation. The development of these methods is crucial for the successful prediction of new aptamers.

Molecularly imprinted polymers. Structure and methods of production. Molecularly imprinted polymers (MIPs) were proposed in 1973 to facilitate production of antibody analogs [128]. MIPs are porous materials capable of selective recognition of a template (molecules intended to be recognized by MIP). They are obtained by self-assembly of functional monomers around the template molecule in a porogen (solvent for pore formation), after which polymerization is initiated in the presence of a cross-linking reagent [129]. MIPs have found many applications in diagnostics (e.g., in analogs of immunoassay), affinity separation, drug delivery, etc. [130-135]. MIPs were originally designed for low-molecular-weight compounds, but nowadays they start to find their application for interaction with protein molecules.

The basic approach to obtaining MIPs is the directed creation of a surface with geometric and physicochemical correspondence to a template. The first step is the addition of functional monomers, which interact with the template and form an unstable pre-polymerization complex. At the second stage, polymerization of the resulting complex is carried out in the presence of an excess of cross-linking monomers and a porogen. At the last stage, the template is washed out from the MIP, thus creating a cavity for specific binding of the target protein [136, 137].

There are two main approaches to obtaining MIPs. The first one includes the use of covalent interactions between the monomers and the template; after polymerization, the covalent bonds are cleaved. Due to the complexity of this procedure and the need for additional cleavage steps, this approach is rarely used for proteins.

The second approach is based on the formation of relatively weak noncovalent interactions between the template and functional monomers, such as hydrogen bonds, ionic interactions, van der Waals forces, dipole–dipole and hydrophobic interactions. Due to its simplicity, it is the most commonly used method. However, noncovalent binding requires a more careful selection of the composition and number of functional monomers [138].

Depending on the template structure and selected polymer, the synthesis of MIPs always requires careful selection of the reaction mixture composition and reaction conditions, which is one of the main disadvantages of this method. Besides the unguided selection of possible components (functional and cross-linking monomers, porogens) and reaction conditions, this process is also poorly reproducible. Also, each new batch of MIPs requires the use of a template, which hinders the scalability of production [139].

A combinatorial chemistry method has been proposed to solve the problem of component selection; it involves creating a large variety of possible compositions and running MIP production experiments in parallel [140, 141].
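
The parallel-screening idea amounts to enumerating a composition space before any synthesis is attempted; a sketch with hypothetical (though commonly used) component names and arbitrary ratios:

```python
from itertools import product

# Hypothetical screening components, for illustration only:
# MAA = methacrylic acid, AAm = acrylamide, 4-VP = 4-vinylpyridine;
# EGDMA and MBA are typical cross-linkers.
monomers = ["MAA", "AAm", "4-VP"]
crosslinkers = ["EGDMA", "MBA"]
ratios = [(1, 4), (1, 8)]  # monomer : cross-linker molar ratios

# Enumerate every formulation to be synthesized and tested in parallel,
# as in the combinatorial approach described above.
compositions = list(product(monomers, crosslinkers, ratios))
print(len(compositions))  # 12 candidate formulations
```

Even this tiny grid yields 12 formulations; with porogens and reaction conditions added, the space grows multiplicatively, which is exactly the cost that molecular modeling of the pre-polymerization complex aims to reduce.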

Other factors complicating MIP synthesis are polymerization conditions, including elevated temperatures, pH values unnatural for proteins, and organic solvents, which can lead to changes in the template structure [132]. The method of epitope imprinting was proposed as an alternative to using the whole protein. Most proteins have surface regions with a large number of polar and charged amino acids that are most commonly involved in binding. Instead of attempting polymerization around the whole protein, which requires selection of conditions ensuring retention of the protein structure, only fragments that include a given epitope are used. Due to a smaller size and often more stable structure of such epitopes, this approach increases the range of acceptable conditions for the polymerization reaction. However, in order to achieve a full correspondence between the given epitope and the native structure, it is necessary to carry out rather complicated design of smaller proteins that preserve the structure of the epitope [142].

Although it is possible to test a large number of components to obtain a required MIP structure, the strength of interactions between the functional monomers and the template is usually low, so the monomers are added in excess. This often results in a wide range of equilibrium states of the monomer–template complex, leading to the variety in the affinity and specificity of the resulting MIPs [143].

Polymerization around the template must allow subsequent unimpeded removal of the template protein from the binding site, leaving access for a new protein. Therefore, tightly polymerizing cross-linking monomers used to develop MIPs for small molecules are not suitable for proteins. For this reason, methods allowing the reaction conditions to be changed in order to regulate the pore size and, therefore, to control the binding and release of proteins have been proposed. In particular, it was suggested to carry out polymerization with the addition of pH-sensitive cross-linking elements of a peptide nature, so that an increase in pH would enlarge the pores and binding sites and provide the protein release [144].

A more popular approach is the surface imprint method, when the template is coated with a thin layer of polymer that does not completely cover the protein globule and does not interfere with the protein molecule release and binding. Further development of this method is nanoparticle pre-polymerization, when functional monomers are bound to the pre-prepared nanoparticles composed of a cross-linking polymer, and full polymerization occurs after interaction with the target protein. However, even with this modification, the surface imprint method significantly reduces the binding specificity compared to bulk polymerization [145]. The types of protein-binding MIPs are shown in Fig. 7.

Fig. 7.
figure 7

Protein-binding MIPs.

Molecular modeling in the development and research of MIPs. MIP development is a long and costly process. It requires selection of optimal monomers, the ratio of components in the reaction mixture, and optimal reaction conditions. This complicates and lengthens MIP development, which limits their application. Computer modeling can significantly optimize this process.

To date, most studies on molecular modeling of MIPs have been performed for low-molecular-weight compounds [146-149]. The main efforts have been focused on the pre-polymerization stage. Both quantum mechanics and molecular dynamics methods have been used for this purpose in order to find the most suitable functional monomers, evaluate the binding process, and select the optimal ratio of the reaction mixture components [150-157].

Several computational approaches have been proposed to design MIPs that bind to proteins [158-160]. In this case, the most commonly used methods are molecular docking and molecular dynamics simulation.

One of the difficulties in creating MIPs is their possible influence on the structure of the template protein. Thus, molecular dynamics modeling of complex formation between ribonuclease A and functional monomers based on styrene and the cross-linking polymer poly(ethylene glycol) dimethacrylate 400 showed possible conformational changes in the protein that led to its inactivation [161]. At the same time, no influence of functional monomers on the PSA structure was revealed by molecular dynamics simulations in [162].

Kryscio et al. [158] studied the mechanisms underlying the effect of MIP synthesis conditions on the conformation of albumin by docking various functional and cross-linking monomers onto the protein surface. It was found that the monomers preferentially bound to a particular position at the protein surface, where they significantly affected the protein tertiary structure. Moreover, different functional monomers competed for the same amino acid residues.

Molecular modeling techniques are also used to select monomers and to find the optimal reagent ratios for the design of protein-specific MIPs [158, 163, 164]. Rajpal et al. [165] used docking to analyze the composition of a mixture of functional monomers in order to select the optimal reaction mixture for the synthesis of MIPs for several peptides prior to laboratory experiments. Docking followed by molecular dynamics simulation has been used to select the concentrations of functional and cross-linking monomers for the optimal binding to myoglobin [159].

Another important task is prediction of the MIP affinity to its target [166]. Lowdon et al. [166] evaluated several deep learning-based methods for their ability to estimate the affinity of MIPs to small molecules; the use of different deep learning models enabled accurate prediction of the binding parameters of molecular imprints to 2-methoxphenidine. A similar approach can be applied to the analysis of MIP binding to proteins, which can greatly simplify the selection of components and polymerization conditions in the preparation of molecular imprints.

CONCLUSION

The development of high-affinity and specific agents clearly evolves towards simpler, cheaper, and more technologically advanced solutions. Initially, these agents were monoclonal antibodies and their derivatives (chimeric, humanized, and human monoclonal antibodies). The need to simplify production and reduce its cost, together with the further development of antibody technologies, has led to the use of antibody fragments, such as scFvs, antigen-binding fragments (Fabs), minibodies, nanobodies, and others. Antibody mimetics became the next step in the development of proteinaceous high-affinity agents. Because of difficulties with the storage and application of protein systems, the next stage was the development of non-proteinaceous systems, the main ones being aptamers and MIPs. All developed high-affinity agents have their advantages and disadvantages. It can be expected that in the future, these technologies will be developed in parallel, providing researchers with a wide range of solutions to choose from in each particular case depending on the tasks at hand.

In recent years, molecular modeling methods have been actively used in the development of high-affinity agents. Their application has made it possible to move away from the traditional trial-and-error approach. Modeling the interaction of a high-affinity agent with its target provides a more detailed understanding of the molecular mechanisms underlying the binding and facilitates selection of the optimal structure of the system and the conditions for its preparation. Molecular modeling has become an integral part of such developments.

The main methods are:

1. Homology modeling to construct the structures of antibodies, their fragments, and known antibody mimetics, as well as the structure of the target;

2. Ab initio methods for predicting variable binding regions of the antibodies, their fragments, and antibody mimetics;

3. Macromolecular docking for the pose selection for interacting bases in the design of protein-binding agents and aptamers (e.g., hot spot method) and in selection of components for MIP design;

4. Macromolecular docking for predicting the structures of protein–protein and protein–nucleic acid complexes;

5. Molecular mechanics, quantum chemistry, and statistical methods to estimate the interaction energies as a basis for selecting the binding agents with a higher affinity;

6. Molecular dynamics to simulate the in vivo behavior of molecular systems and to assess the stability of simulated structures and their complexes.

The classical modeling methods continue to improve; however, in recent years, there has been a dramatic increase in interest in prediction systems based on deep learning. The ability of neural networks to recognize hidden patterns makes it possible to create programs with a high predictive power. Systems based on deep learning are used in modeling and de novo design of proteins, protein–protein complexes (including antibodies and antigen–antibody complexes), and nucleic acids, as they allow approximation of physicochemical characteristics, accelerate computationally expensive simulation processes, and more. However, a serious limitation of deep learning-based methods is insufficient training datasets, often leading to decreased accuracy or “hallucinations”, which prevents these methods from fully supplanting the classical approaches.

Modern software packages, including freeware and web-based ones, can perform the whole cycle of computations in the development of all types of high-affinity agents. Nevertheless, more data, algorithm improvement, and comparative studies of the applicability of different methods are required to further increase their accuracy, performance, and efficiency.