Introduction

Glycosylation, the covalent addition of glycans onto proteins, is not only the most structurally diverse but also the most abundant post-translational modification (PTM) [1,2,3]. Classical glycosylation mainly occurs as oligosaccharides on proteins in the secretory pathway (commonly linked via Asn (N-linked) or Ser/Thr (“mucin-type” O-linked)). In comparison, O-linked β-N-acetylglucosamine modification (O-GlcNAcylation) on Ser/Thr residues of proteins was not reported until 1984 [4, 5]. Mounting evidence since then has gradually revealed that O-GlcNAc is distinct in multiple ways: (1) O-GlcNAc is a monosaccharide that is not elongated into complex sugar structures [6, 7]; (2) O-GlcNAc is a reversible and highly dynamic modification catalyzed by a single pair of enzymes: O-GlcNAc transferase (OGT) (which adds O-GlcNAc to Ser/Thr residues [8,9,10]) and β-D-N-acetylglucosaminidase (O-GlcNAcase, OGA) (which removes O-GlcNAc [11]); (3) O-GlcNAc exists almost exclusively on proteins localized in the nucleus, cytoplasm, and mitochondria [7, 12, 13]; (4) O-GlcNAc frequently interplays with other PTMs (particularly phosphorylation [14]); (5) O-GlcNAc is most common in metazoans, despite its wide presence across all kingdoms of life (except Archaea) [15]; and (6) owing to its monosaccharide nature, the functional roles of protein O-GlcNAcylation can be studied similarly to those of phosphorylation and other PTMs, using site-specific mutagenesis as well as other targeted approaches [16,17,18,19]. Indeed, despite being a single monosaccharide, O-GlcNAc is of critical importance in numerous biological processes and is intimately linked to physiological/pathological events [20,21,22,23,24,25,26,27,28,29,30,31,32]. Moreover, the development of therapeutics and biomarker assays that target protein O-GlcNAcylation shows great promise [33].

In the past two decades, there have been tremendous technical advances, especially in high-throughput analytical techniques (tandem mass spectrometry in particular), allowing increasingly comprehensive analysis of glycans and glycoproteins [33,34,35]. Correspondingly, the explosion of glycoproteomics data has promoted the rapid evolution of glycoinformatics, a branch of bioinformatics specifically tailored to information concerning the primary structure (composition, sequence, and linkages) and three-dimensional structure (including dynamics) of glycans and glycoproteins in tissues or cell types as well as their interaction with biological surroundings [36,37,38,39,40]. Until recently, the development of glycoinformatics tools has been slow and difficult due to the complexity of conventional glycosylation, and O-GlcNAc, despite being a structurally well-defined monosaccharide that is widespread throughout the nucleocytoplasmic proteome, has also been neglected. Of the ~20,000 proteins encoded by the human genome, it is estimated that at least 25% (>5000 proteins) are localized in the secretory pathway (including ER, Golgi apparatus, and plasma membrane) [41], whereas the majority (~75%, ~15,000 proteins) are localized in the nucleus (nuclear and nucleolar structures), cytosol, mitochondria, cytoskeleton, and other compartments. Among these ~15,000 proteins, >25% (>4000 proteins) have been found to be O-GlcNAcylated, with >11,000 Ser/Thr sites unambiguously identified as O-GlcNAcylated thus far (O-GlcNAcAtlas, version 3.0) [42]. The numbers of O-GlcNAc proteins and modification sites clearly suggest its widespread presence (O-GlcNAcylated proteins may far outnumber those modified by complex glycans). Arguably, O-GlcNAc represents a large population of glycosylated proteins in humans and other species, which has led to substantially increased research attention over the past few decades, yielding >3400 O-GlcNAc-related publications (Fig. 1). Given the unique importance of O-GlcNAc in health and disease, increased efforts have been made to develop O-GlcNAc informatics tools. Undoubtedly, there is a need to bridge these endeavors to the glycoinformatics and glycobiology communities, and even to the wider scientific community.

Fig. 1 Accumulated numbers of O-GlcNAc-focused publications indexed in PubMed from 1984 to 2023 (search queries: O-linked-β-N-acetylglucosamine, O-GlcNAc, O-GlcNAcylation)

This article thus aims to summarize advances in informatics for O-GlcNAc research in the past 40 years. We will focus on several closely intertwined aspects: bioinformatics tools for O-GlcNAc proteomics; databases/servers cataloging O-GlcNAc proteins/peptides/sites experimentally identified; software tools for the prediction of O-GlcNAc sites; and databases for OGT/OGA interaction proteins (Fig. 2). Besides addressing the progress and challenges, we also illustrate how these O-GlcNAc informatics tools have shaped O-GlcNAc research, especially the elucidation of O-GlcNAc site-specific functional roles in physiology and pathology.

Fig. 2 Application of O-GlcNAc informatics tools for O-GlcNAc site-specific functions on protein(s) of interest

Bioinformatic tools for O-GlcNAc proteomics

As for other PTMs, chemical and biochemical tools/methods are indispensable for O-GlcNAc research. With the rapid maturation of high-resolution and sensitive mass spectrometry and accompanying techniques, tandem mass spectrometry-based proteomics has emerged as a cornerstone approach, which can be used for either low-throughput (e.g., immunoprecipitated proteins from cell lysates) or high-throughput (e.g., whole cell lysates) analysis. Due to the low stoichiometry of the modification, enrichment of O-GlcNAc proteins/peptides is generally needed for O-GlcNAc proteomics. Moreover, the O-GlcNAc moiety is extremely labile in the gas phase, hindering accurate identification of O-GlcNAc sites. Thus, quite a few O-GlcNAc proteomics studies (especially in earlier days) reported numerous proteins as potentially or putatively O-GlcNAcylated (even after certain types of enrichment) without information on the specific O-GlcNAc peptides or sites. It is problematic to consider such proteins as modified by O-GlcNAc. In contrast, identification of O-GlcNAc peptides/sites provides direct evidence for confident assignment of proteins as O-GlcNAcylated. In addition, mapping O-GlcNAc sites generally serves as the first step towards elucidation of their functional importance [33, 43,44,45,46,47,48,49].

Despite the structural simplicity of O-GlcNAc, site-specific O-GlcNAc proteomics has never been an easy task. Given the aforementioned inherent challenges (e.g., low stoichiometry and extreme lability in the gas phase), it is not surprising that substantial efforts have been focused on sample preparation (e.g., development of enrichment methods/materials) and mass spectrometry data acquisition. For example, electron transfer dissociation (ETD) and hybrid fragmentation modes (such as electron transfer/higher-energy collision dissociation (EThcD), and higher-energy collisional dissociation (HCD) product dependent EThcD (HCD-pd-EThcD)) can retain the O-GlcNAc moiety on peptides during fragmentation, largely facilitating O-GlcNAc site mapping [33, 45]. Data analysis, an indispensable part of O-GlcNAc proteomics, is not trivial either. So far, a number of software packages have been applied for O-GlcNAc site mapping. Instead of introducing all proteomics software tools available, we focus on those that have been used more often for O-GlcNAc proteomics (below). It is notable that other tools (e.g., the OScore software [50], the Open Mass Spectrometry Search Algorithm (OMSSA) [51,52,53], and MS-GF+ [54]) have also been utilized for site-specific O-GlcNAc analysis.

Proteome Discoverer (PD) software, a commercial platform of Thermo Fisher Scientific, provides streamlined peptide/protein identification and quantification. Since its first release in 2007, PD has evolved to integrate multiple search algorithms such as Sequest HT, Mascot, MSAmanda, Byonic, and ProSightPD [55]. PD is also commonly used for PTM analysis, since it includes ptmRS (the successor of PhosphoRS) as a node that reports the confidence of modification localization in peptide sequences as a probability. To calculate individual probability values for each putatively modified site, the algorithm optimizes individual peak depths for different regions of the tandem mass spectrum, with a limit of eight peaks per 100 m/z window [56]. Because PhosphoRS was validated using synthetic phosphopeptides, the performance of ptmRS for other types of modifications is not fully clear [56]. Nevertheless, as the accompanying software for Orbitrap instruments, PD is widely used by many labs. As a seamlessly embedded module, ptmRS has been adopted for O-GlcNAc site localization analysis in many studies (e.g., [57,58,59,60,61]).

MaxQuant, a freely available proteomics software package integrated with the search engine Andromeda, has gained wide adoption for analyzing large mass-spectrometric datasets [62]. The PTM scoring algorithm in Andromeda was initially developed for phosphorylation and has been expanded to other types of modification. Andromeda divides the entire MS/MS spectrum into 100 m/z bins and evaluates the experimental peaks in each bin using a binomial distribution probability formula, giving priority to peaks with higher intensities [62]. Specifically, the PTM localization score is determined by selecting the four highest-intensity fragment ions from each 100 m/z segment, and this selection is performed for every possible combination of PTM sites within the peptide sequence [63]. MaxQuant has been applied for site-specific O-GlcNAc proteomics data analysis, typically with 0.75 as the PTM score threshold for unambiguous localization assignment [64,65,66].
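The peak selection and binomial scoring described above can be illustrated with a minimal Python sketch. It is not the Andromeda implementation: the function names, the 0.02 Da matching tolerance, the use of a random-match probability p chosen by the caller (e.g., 4/100 when four peaks are kept per 100 m/z bin), and the conversion of scores to probabilities via weights of 10^(score/10) are all assumptions made for illustration.

```python
from math import comb, log10


def top_peaks_per_bin(peaks, top_n=4, bin_width=100.0):
    """Keep the top_n most intense peaks in each bin_width-wide m/z bin.

    peaks is a list of (mz, intensity) tuples.
    """
    bins = {}
    for mz, intensity in peaks:
        bins.setdefault(int(mz // bin_width), []).append((mz, intensity))
    kept = []
    for members in bins.values():
        members.sort(key=lambda peak: peak[1], reverse=True)
        kept.extend(members[:top_n])
    return sorted(kept)


def count_matches(theoretical_mz, peaks, tol=0.02):
    """Count theoretical fragment m/z values matched by any peak (tol in Da)."""
    return sum(any(abs(mz - t) <= tol for mz, _ in peaks) for t in theoretical_mz)


def binomial_score(n, k, p):
    """Return -10*log10 of P(X >= k) for X ~ Binomial(n, p).

    n: number of theoretical fragments, k: number matched,
    p: assumed probability of a random match.
    """
    tail = sum(comb(n, j) * p**j * (1 - p) ** (n - j) for j in range(k, n + 1))
    return -10.0 * log10(tail)


def localization_probabilities(scores):
    """Turn per-candidate-site scores into probabilities that sum to 1.

    Assumption: each candidate is weighted by 10**(score / 10).
    """
    weights = {site: 10 ** (s / 10.0) for site, s in scores.items()}
    total = sum(weights.values())
    return {site: w / total for site, w in weights.items()}
```

In such a scheme, each candidate site assignment of a HexNAc-modified peptide would be scored separately, and a site would be reported as unambiguous only if its resulting probability exceeds the chosen threshold (e.g., 0.75).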

Protein Prospector is another freely accessible suite of tools developed for analysis of MS-based proteomics. For ambiguous modified positions, Site Localization In Peptide (SLIP) scores are reported, which are determined by comparing probability and expectation values for the same peptide with different site assignments in terms of −logP [67]. So far, Protein Prospector has been used in a number of O-GlcNAc proteomics studies [68,69,70,71,72].

The relatively effective identification of O-GlcNAc peptides/sites by the aforementioned search engines shows that it is possible to use traditional proteomics software packages for O-GlcNAc proteomics by including HexNAc as a variable modification. It should be borne in mind that the HexNAc moiety might be lost from peptides during collision-based fragmentation (especially HCD), producing peptide fragments without the glycan mass and resulting in low-confidence or erroneous site assignment. Thus, data analysis programs must anticipate the potential HexNAc loss to correctly calculate the peptide fragment masses. Besides, in recent years, a series of software tools dedicated to glycoproteomics data analysis has emerged, because commonly used proteomics software fails to analyze complex glycans. For example, to facilitate the structural elucidation and site mapping of complex O-glycans on peptides, a series of packages have been developed, including Byonic [73], MSFragger-Glyco [74], pGlyco [75], O-pair [76], and GlycReSoft [77]. Although promising, only a few of them (e.g., Byonic [78, 79]) have been tentatively applied for O-GlcNAc proteomics analysis thus far.
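To make the effect of HexNAc loss on fragment masses concrete, the following Python sketch computes singly charged b- and y-ion m/z values for a peptide with HexNAc either retained on, or lost from, the modified residue. It is an illustration only and does not reproduce any search engine: the function name and interface are hypothetical, and the values used are standard monoisotopic masses (HexNAc delta mass 203.0794 Da).

```python
# Monoisotopic residue masses (Da) of the 20 standard amino acids.
RESIDUE = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565
PROTON = 1.007276
HEXNAC = 203.079373  # monoisotopic mass added by one HexNAc (C8H13NO5)


def fragment_mz(peptide, site, retain_glycan=True):
    """Return (b_ions, y_ions) as lists of singly charged m/z values.

    site is the 1-based position of the HexNAc-modified residue.
    With retain_glycan=False, fragments are computed as if the HexNAc
    had been lost during collision-based fragmentation.
    """
    masses = [RESIDUE[aa] for aa in peptide]
    if retain_glycan:
        masses[site - 1] += HEXNAC
    n = len(masses)
    b_ions = [sum(masses[:i]) + PROTON for i in range(1, n)]
    y_ions = [sum(masses[n - i:]) + WATER + PROTON for i in range(1, n)]
    return b_ions, y_ions
```

Fragments that contain the modified residue differ by 203.0794 between the two cases, whereas fragments that do not contain it are identical; a program that considers only the glycan-retained series would therefore miss the glycan-lost fragments produced under HCD.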

Of special note, despite the significant progress in identifying O-GlcNAc peptides and localizing O-GlcNAc sites, it remains a challenge to distinguish O-GlcNAc from O-GalNAc (especially GalNAcα1-S/T, i.e., Tn antigen), since the two HexNAc isomers result in the same delta mass on the modified peptides when fragmented in modes such as HCD. Although the subcellular localization of proteins is helpful for distinguishing GlcNAc from GalNAc, its usefulness might be compromised, since O-GlcNAc also appears to be present on certain extracellular proteins (e.g., mediated by the atypical enzyme EOGT) [80] and a small portion of O-GalNAc proteins appears to be localized in the nucleus [81]. To that end, by leveraging the differential intensities of oxonium ions (i.e., m/z 126.055, 138.055, 144.066, 168.0655, and 186.0761) for O-GlcNAc and O-GalNAc, a binary logistic regression model, HexNAcQuest, was developed recently [82]. Results from independent validation datasets demonstrate that HexNAcQuest can accurately discern O-GlcNAc from O-GalNAc modification mainly based on the intensities of oxonium ions generated by HCD, EThcD, or HCD-pd-EThcD mass spectrometry. Moreover, HexNAcQuest is a more accurate and general model in comparison to criteria based on empirical observations [45, 79, 83, 84]. In addition, a detailed protocol has recently been described to integrate HexNAcQuest with commonly used proteomics data analysis workflows [85], which will further facilitate distinguishing HexNAc isomers on peptides from complex samples.
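The general idea of classifying HexNAc isomers from oxonium ion intensities with a binary logistic regression model can be sketched as follows. This is not the HexNAcQuest code and does not use its fitted parameters: the feature definition (intensities normalized to their sum), the 0.01 m/z matching tolerance, and the function names are assumptions, and any weights and intercept passed to the model would have to come from fitting on annotated spectra.

```python
from math import exp

# Oxonium ion m/z values listed in the text above.
OXONIUM_MZ = (126.055, 138.055, 144.066, 168.0655, 186.0761)


def oxonium_features(peaks, tol=0.01):
    """Return the five oxonium ion intensities normalized to their sum.

    peaks is a list of (mz, intensity) tuples from one MS/MS spectrum.
    An ion that is not observed within tol contributes 0.
    """
    raw = []
    for target in OXONIUM_MZ:
        matched = [inten for mz, inten in peaks if abs(mz - target) <= tol]
        raw.append(max(matched, default=0.0))
    total = sum(raw)
    return [x / total if total > 0 else 0.0 for x in raw]


def logistic_probability(features, weights, intercept):
    """Binary logistic regression: probability of the positive class.

    weights and intercept are placeholders supplied by the caller; they
    are NOT the published HexNAcQuest coefficients.
    """
    z = intercept + sum(w * x for w, x in zip(weights, features))
    return 1.0 / (1.0 + exp(-z))
```

With fitted coefficients, the returned probability would be compared with a cutoff (e.g., 0.5) to label a spectrum as O-GlcNAc or O-GalNAc.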

Taken together, a series of software packages have been adopted for O-GlcNAc proteomics data analysis. However, their performance (especially that of the recently developed software tailored for O-glycoproteomics) has not been rigorously evaluated for O-GlcNAc proteomics. Each algorithm used may identify overlapping as well as different populations of O-GlcNAc peptides from the same mass spectrometry data files. Furthermore, although some software (e.g., PD and MaxQuant) allows custom settings (e.g., adding a certain chemical tag onto Ser/Thr residues) for data analysis, many others do not. This is an important consideration when selecting appropriate software tools, as chemical tagging (e.g., metabolic labeling- or chemoenzymatic labeling-based enrichment) appears to be a powerful strategy for O-GlcNAc enrichment [51, 53, 64, 65, 78]. Besides identification of O-GlcNAc peptides, accurate O-GlcNAc site localization is a critical aspect. Although many software tools provide site localization scores, unambiguous site assignment is still challenging (especially for peptides containing multiple O-GlcNAc sites). Clearly, this field is still in need of a widely accepted software tool and metric that can accurately describe the certainty of site localization in a user-friendly manner for O-GlcNAc proteomics data analysis of large datasets.

Databases/servers for O-GlcNAc proteins/sites

Technical advances in O-GlcNAc proteomics have produced a large amount of data. Several general databases (e.g., PhosphoSitePlus [86], dbPTM [87], and MS-viewer [88]) have attempted to collect O-GlcNAc sites and proteins. Unfortunately, these databases cover only a limited amount of useful information. Thus, several databases have been created to specifically accommodate the rapid accumulation of O-GlcNAc information on proteins.

The database dbOGAP, established in 2011, is the first public bioinformatics resource dedicated to O-GlcNAcylated proteins and sites [89]. The initial version contained ~800 O-GlcNAcylated proteins and ~400 sites experimentally determined in about 500 articles published from 1984 to 2010. Unfortunately, the website became inaccessible due to the lack of maintenance. Consequently, we took the initiative to establish a new one, O-GlcNAcAtlas, to integrate O-GlcNAc sites and proteins from the literature [42]. Stringent criteria were applied to select O-GlcNAc sites and proteins, and proteins identified in large-scale proteomics studies without O-GlcNAc site localization were not included. Besides unambiguously identified sites (with a localization score >0.75), ambiguously identified sites (mainly due to the low localization scores for peptides with clustered Ser/Thr residues, especially when processed by the software tool Protein Prospector) were also included. In addition to O-GlcNAc sites, related information (including species, sample type, peptide sequence, protein name, and site mapping methods used) was also extracted and presented. Meanwhile, the O-GlcNAc database was developed as another online resource for O-GlcNAc proteins (first focusing on human and then extended to other species) [90]. Since many proteins without information on O-GlcNAc peptides/sites have been included, the O-GlcNAc database appears to have substantially more proteins. However, as aforementioned, it is hard to consider such proteins (mostly from O-GlcNAc proteomics studies that did not report peptides/sites) as O-GlcNAcylated, because direct evidence of the modification is lacking. Interestingly, an O-GlcNAc score (between 0 and 100) is used as a quantifier to estimate the level of O-GlcNAc confidence for each protein in the database, with caution advised for any score below 10.

Of note, the interplay between Ser/Thr O-GlcNAcylation and phosphorylation has been extensively studied in the past few decades [14]. With the public availability of specifically tailored databases of O-GlcNAc sites/proteins (as aforementioned) and phosphorylation sites/proteins (such as PhosphoSitePlus [86] and EPSD [91]), investigators can integrate such information easily, in whichever way they like (e.g., explore the potential cross-talk on modification sites on proteins of interest). The integration would also allow exploration of the intricate interaction between these two PTMs (e.g., co-localization evaluation of the two PTMs by doing meta-analysis [15]).
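As a minimal illustration of such an integration, the Python sketch below takes two site lists, one for O-GlcNAc and one for phosphorylation, and reports O-GlcNAc sites that coincide with, or lie within a chosen number of residues of, a phosphorylation site on the same protein. The input format of (protein accession, residue position) tuples and the function name are assumptions; real exports from the databases mentioned above would first need to be parsed into this form.

```python
def find_crosstalk(oglcnac_sites, phospho_sites, window=0):
    """Find O-GlcNAc sites at or near phosphorylation sites.

    Both inputs are iterables of (protein_accession, position) tuples.
    window=0 returns only sites modified by both PTMs at the same
    residue; window=n also returns pairs at most n residues apart.
    Returns a list of (accession, oglcnac_position, phospho_position).
    """
    phospho_by_protein = {}
    for accession, position in phospho_sites:
        phospho_by_protein.setdefault(accession, set()).add(position)

    hits = []
    for accession, position in sorted(set(oglcnac_sites)):
        for phospho_pos in sorted(phospho_by_protein.get(accession, ())):
            if abs(phospho_pos - position) <= window:
                hits.append((accession, position, phospho_pos))
    return hits
```

Setting window to 0 lists candidate sites of direct competition between the two modifications, whereas a larger window captures potential proximal cross-talk.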

Bioinformatic tools for O-GlcNAc prediction

So far, >16,000 unambiguous sites and >10,000 ambiguous sites have been identified on a total number of >7000 proteins in multiple species (O-GlcNAcAtlas, version 3.0). Despite the technological progress, sensitive and robust methods for global and site-specific O-GlcNAc analysis are still lacking in many cases. Since protein O-GlcNAcylation functions in a site-specific manner and accurate determination of O-GlcNAc sites is also technically demanding, there is a need to obtain O-GlcNAc site information to afford site-specific O-GlcNAc functional studies of proteins (without known modification sites reported). To that end, a number of computational tools (based on machine learning and deep learning methods) have been developed to predict O-GlcNAc sites (Table 1).

Table 1 Computational tools developed for the prediction of O-GlcNAc sites on proteins

YinOYang, an artificial neural network-based predictor developed in 2002, is a pioneering effort in this area [92]. It was trained on 40 experimentally determined O-GlcNAc acceptor sites on human proteins to recognize the sequence context and surface accessibility. Although the positive dataset is limited, it serves as a benchmark for comparison in evaluating the effectiveness and reliability of newly emerging predictors. With the growing knowledge of O-GlcNAcylation and the development of its public resources, several predictors have been developed by using larger positive datasets. When the first O-GlcNAc database, i.e., dbOGAP, was launched, a site prediction system named OGlcNAcScan was also introduced to the scientific community, which was based on a support vector machine (SVM) and trained on nearly 400 O-GlcNAcylated sites in dbOGAP [89]. However, this predictor and the related database are no longer accessible. PGlcS is another predictor of O-GlcNAcylated sites trained on the dbOGAP dataset [93]. It used k-means clustering and an SVM classifier combined with multiple features to improve the performance. Although PGlcS was reported to have better sensitivity than OGlcNAcScan when tested on an independent dataset, public access to it was not provided. Potential OGT substrate motifs have also been used in developing a two-layered machine learning-based predictive model, i.e., OGTSite [94]. The positive training data, composed of 410 experimentally verified O-GlcNAcylation sites, were investigated using the maximal dependence decomposition (MDD) method to discover substrate motif signatures. Despite the promising accuracy provided by OGTSite, its web portal is no longer available. Since 2013, Jia and colleagues have developed a series of predictors to capture O-GlcNAcylated sites on proteins. The first one, i.e., O-GlcNAcPRED, is an SVM-based model trained on 41-mer peptide sequences built from 167 proteins in dbOGAP with application of the adapted normal distribution bi-profile Bayes (ANBPB) feature encoding scheme [95]. In 2018, Jia et al. [96] introduced an improved predictor, O-GlcNAcPRED-II, based on larger training datasets (e.g., 945 O-GlcNAc sites) and the rotation forest algorithm.
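Sequence-based predictors of this kind typically start from fixed-length peptide windows centered on each candidate Ser/Thr residue. The Python sketch below extracts such windows (41-mers by default, i.e., 20 flanking residues on each side). The padding character "X" for positions beyond the protein termini and the function name are assumptions for illustration; the individual tools may handle termini differently.

```python
def extract_windows(sequence, flank=20, pad="X"):
    """Return {position: window} for every Ser/Thr in a protein sequence.

    Positions are 1-based. Each window has length 2 * flank + 1 with the
    candidate residue at its center; positions beyond the sequence ends
    are filled with the pad character (an assumption of this sketch).
    """
    padded = pad * flank + sequence + pad * flank
    windows = {}
    for i, residue in enumerate(sequence):
        if residue in "ST":
            # Residue i sits at index i + flank in the padded string,
            # so the slice starting at i is centered on it.
            windows[i + 1] = padded[i : i + 2 * flank + 1]
    return windows
```

Windows centered on experimentally identified sites would serve as positive examples, while windows around Ser/Thr residues with no reported modification are commonly used as negatives.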

The quickly evolving deep learning methods, together with the availability of much larger datasets, have prompted the development of several new prediction tools for O-GlcNAc sites in recent years. Among the >9000 unambiguously identified O-GlcNAc sites on proteins from multiple species (O-GlcNAcAtlas, version 2.0), human and mouse proteins account for the most O-GlcNAcylated sites. Thus, Jia and collaborators built ensemble models based on deep learning networks (named O-GlcNAcPRED-DL) for these two species separately [97]. In short, O-GlcNAcPRED-DL employed one-hot encoding, BLOSUM62 (a matrix reflecting sequence similarity), AAindex (reflecting amino acid physical and chemical properties), and Word2Vec (reflecting global characteristics and contextual information) as feature encodings. Moreover, four network frameworks were built based on the connection of the convolutional neural network (CNN) and the bidirectional long short-term memory method (BiLSTM). In comparison to the traditional machine learning-based tools reported previously, the ensemble O-GlcNAcPRED-DL models showed substantially enhanced performance for O-GlcNAc site prediction on human and mouse proteins. Meanwhile, also based upon the datasets from O-GlcNAcAtlas (version 2.0), Pokharel et al. developed another predictor, LM-OGlcNAc-Site [98]. In brief, it integrates embeddings from multiple protein language models (Ankh, ESM-2, and ProtT5) by adopting a decision-level fusion approach. LM-OGlcNAc-Site appears to outperform the models built on the individual language models, the integrated models using score-level fusion, and O-GlcNAcPRED-II.
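Two of the concepts mentioned above, one-hot encoding of a peptide window and the difference between score-level and decision-level fusion, can be illustrated with a short Python sketch. It is a simplified illustration rather than the code of O-GlcNAcPRED-DL or LM-OGlcNAc-Site: the function names are hypothetical, score-level fusion is shown as averaging of the model probabilities, and decision-level fusion is shown as majority voting over thresholded predictions, which is one common form and may differ from the published implementation.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def one_hot(window):
    """Encode a peptide window as a list of 20-element 0/1 rows.

    Characters outside the 20 standard amino acids (e.g., padding)
    are encoded as all-zero rows.
    """
    index = {aa: i for i, aa in enumerate(AMINO_ACIDS)}
    matrix = []
    for aa in window:
        row = [0] * len(AMINO_ACIDS)
        if aa in index:
            row[index[aa]] = 1
        matrix.append(row)
    return matrix


def score_level_fusion(probabilities, threshold=0.5):
    """Average the model probabilities first, then apply the threshold."""
    return sum(probabilities) / len(probabilities) >= threshold


def decision_level_fusion(probabilities, threshold=0.5):
    """Threshold each model separately, then take a majority vote."""
    votes = sum(p >= threshold for p in probabilities)
    return votes * 2 > len(probabilities)
```

The two strategies can disagree: for three hypothetical model outputs of 0.55, 0.60, and 0.10, the average falls below 0.5, whereas two of the three individual decisions are positive.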

Collectively, a number of computational models have been developed to predict O-GlcNAc sites in the past few decades. Encouragingly, O-GlcNAc prediction, coupled with downstream experimental approaches (e.g., site-directed mutagenesis), has been applied in many site-specific functional studies. Despite the success obtained, some challenges remain. For example, there is no consensus motif for protein O-GlcNAcylation, although there appear to be OGT-preferred amino acid sequences within several species [15]. In addition, secondary and tertiary structures also appear to affect O-GlcNAcylation [15]. Therefore, accurate prediction of O-GlcNAc sites on certain proteins may remain challenging. Moreover, the developed models are mostly trained on human and mouse datasets, due to the limited experimental datasets for other species. Thus, the performance of current prediction models for proteins in other species might be compromised. With the further development of artificial intelligence (e.g., advanced deep learning and protein language models) and the continued growth of experimental datasets for training and testing, we anticipate that current models will be refined and novel models will be developed to enhance the prediction sensitivity and accuracy of O-GlcNAc sites on proteins from human, mouse, and other species. Prediction of O-GlcNAc sites will continue to serve many researchers as a valuable strategy to expedite site-specific functional elucidation in diverse scenarios.

Databases for OGT/OGA interaction proteins

As the main executors of biological function, proteins do not work alone; rather, they work closely with their interaction partners. Defining the interaction proteins of OGT/OGA, the O-GlcNAc cycling enzymes, is therefore of critical importance.

Technologies to map protein–protein interactions (PPIs) have enabled the global characterization of PPIs of many proteins in recent years. Some high-throughput methods (including affinity purification coupled tandem mass spectrometry (AP-MS), immunoprecipitation-MS (IP-MS), cross-linking MS (XL-MS), and proximity labeling MS (PL-MS)) have also been tailored for the characterization of OGT/OGA-interacting proteins [99,100,101,102,103,104,105,106,107].

To accommodate the exponentially increasing PPI datasets, several comprehensive databases (e.g., BioGRID [108], APID [109], IntAct [110], and STRING [111]) have been constructed in the past years. Although these public repositories catalog hundreds of thousands of PPIs from many species, they cover only a limited number of the OGT-interacting proteins that have been experimentally described. Recently, we compiled the OGT-Protein Interaction Network (OGT-PIN) [112]. As a specifically designed portal, OGT-PIN is a rigorously curated and comprehensive database for interaction proteins of OGT and its orthologues identified in several intensely studied species (e.g., SXC in Drosophila melanogaster, SEC in plants, and OGT-1 in Caenorhabditis elegans). Of the more than 2500 proteins experimentally identified as OGT interaction proteins, >1000 can be regarded as high-stringency interacting proteins of OGT and its orthologues (OGT-PIN, version 2.0). Among them, human OGT appears to have >800 high-stringency interacting proteins, suggesting that it is truly one of the hub proteins in the cellular interaction network. Interestingly, only a portion (~39%) of OGT-interacting proteins have been found to be OGT substrates, supporting the notion that OGT-interacting proteins are not necessarily OGT substrates. The cataloging of OGT interaction proteins will facilitate functional studies of OGT-catalyzed O-GlcNAcylation on protein(s) of interest. In addition, it will provide clues to further understand the non-canonical roles of OGT, which are yet to be characterized [113]. In contrast, OGA interactomes have been much less explored [106, 107, 114], and a specifically tailored database of OGA interaction proteins is not available yet. Investigators who are interested in that aspect might have to retrieve information from the aforementioned comprehensive databases (such as BioGRID) and/or related publications.
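The comparison between interactors and substrates mentioned above amounts to a set overlap, which can be computed with a few lines of Python once both lists are available. The function name and the use of plain protein identifiers as input are assumptions for illustration; the sketch does not access OGT-PIN or O-GlcNAcAtlas, and the example identifiers in the comment are placeholders.

```python
def substrate_fraction(interactors, substrates):
    """Fraction of interacting proteins that are also known substrates.

    Both arguments are iterables of protein identifiers (e.g., accessions
    exported from an interactome list and from a site database).
    Returns 0.0 when the interactor list is empty.
    """
    interactor_set = set(interactors)
    if not interactor_set:
        return 0.0
    shared = interactor_set & set(substrates)
    return len(shared) / len(interactor_set)


# Placeholder usage: two of four interactors are also substrates.
# substrate_fraction(["A", "B", "C", "D"], ["B", "D", "E"]) gives 0.5
```

The identifiers in both lists must use the same naming scheme (e.g., the same accession type and species) for the overlap to be meaningful.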

With further technical advances (e.g., in vivo proximity labeling), it is anticipated that more proteins, including weak and transient interactors, will be identified as OGT/OGA-interacting proteins, which will further expand the repositories. However, how to accurately define true hits and avoid false positives remains an open question. In addition, as for other proteins, the interactomes of OGT/OGA are dynamic; thus, cataloging temporal interaction partners of OGT/OGA may provide a dynamic view of the OGT/OGA interaction networks upon perturbation in different biological settings.

Conclusions

Four decades of research on O-GlcNAcylation have been very fruitful! In this article, we summarized the great progress in informatics for O-GlcNAc research, focusing on several aspects (including bioinformatics tools for O-GlcNAc proteomics; databases/servers cataloging O-GlcNAc proteins/peptides/sites experimentally identified; software tools for the prediction of O-GlcNAc sites; and databases for OGT/OGA interaction proteins). Not only have O-GlcNAc studies created large datasets (e.g., from O-GlcNAc proteomics and interactomics), urging the development of an array of O-GlcNAc-focused databases/servers and software tools, but O-GlcNAc informatics resources have also been instrumental in facilitating O-GlcNAc studies over the years (Fig. 2). Besides helping to elucidate the functional roles of O-GlcNAcylation on many proteins, they have provided valuable insights into multiple aspects, such as a global and detailed view of protein O-GlcNAcylation and OGT biology. Of note, informatics analysis can also be applied to other O-GlcNAc-relevant research topics (e.g., prediction of putative conveyers of OGT intellectual disability [115] and cancer biomarkers [116]), which are beyond the scope of this article.

Clearly, there are aspects of O-GlcNAc informatics that need to be improved. For example, the performance of software packages for O-GlcNAc proteomics needs to be rigorously evaluated and/or refined. A key point is to further improve O-GlcNAc site localization so that unambiguous and accurate site assignment can be achieved (especially for peptides containing multiple O-GlcNAc sites). Although O-GlcNAc prediction has achieved a certain degree of success, there is room to further enhance the prediction sensitivity and accuracy of O-GlcNAc sites. The continuously expanding repositories of experimentally identified O-GlcNAc sites can be used to further improve O-GlcNAc prediction performance. The rapid evolution of artificial intelligence capabilities (e.g., via machine learning and deep learning-based algorithms) has begun to unleash unprecedented potential for modern biology research (including glycobiology) and precision medicine [117, 118]. With the implementation of these technological advances, we believe that O-GlcNAc informatics resources will become more sophisticated. The maturation of these resources will undoubtedly encourage more researchers to join the glycosciences and particularly the O-GlcNAc field. We anticipate that advances in O-GlcNAc informatics will make it a handy and indispensable tool for biomedical scientists in O-GlcNAc-targeted basic and translational research in the coming years.