Introduction

Protein glycosylation is a prevalent post-translational modification that involves complex biosynthetic pathways and modifies specific sites along the protein backbone. Glycosylation plays an important role in numerous biological processes, ranging from fertilization and immune response to cell-cell recognition and inflammation [1–3]. Advances in glycan characterisation technologies have increased our understanding of the molecular and structural roles of glycans in biological processes. This has led to the development of biotherapeutics, including monoclonal antibodies with specific glycoforms, which are driving drug design and vaccine programmes [4]. Despite the far-reaching functional and structural roles of glycans, computational methods to store and handle the growing volumes of data remain in their infancy and lag behind other -omics bioinformatics activities. Furthermore, the absence of sophisticated tools directly impacts the quality of data reported in other databases, as exemplified by the carbohydrate stereochemistry and nomenclature misassignments in the Protein Data Bank [5].

Several initiatives to catalogue and organise glycan-related (structural and experimental) information have been released, starting with CarbBank [6] and followed by the KEGG GLYCAN database [7], the Consortium for Functional Glycomics (CFG) [8], GLYCOSCIENCES.de [9], EUROCarbDB [10], GlycomeDB [11] and UniCarbKB [12]. A key impetus in glycomics is now the move toward large-scale analysis of the structure and function of glycans. A diverse range of technologies and strategies is being applied to address the technically difficult problems of glycan structural analysis and, subsequently, the investigation of their functional roles [13].

Mass spectrometry (MS), either directly or in conjunction with chromatographic separation, is widely employed for the characterisation of oligosaccharides [14], yet studies remain hampered by the inherent complexity of glycan structures. A technique that has the potential to improve glycan analysis is ion mobility-MS (IM-MS), in which ions are separated according to their mass, charge, size and shape. IM-MS separates ions based on the time required to traverse a region of neutral gas, generally helium or nitrogen, under the influence of a weak electric field, and reports both an arrival time and a mass-to-charge (m/z) value. The arrival time can be converted into a collision cross section (CCS), which is an absolute value reflecting the rotationally averaged structure of the glycan ion. To date, only a limited number of reports apply IM-MS to glycomics [15–18], but we have demonstrated its potential, including the separation of isomeric glycans and the combination of IM-MS with fragmentation to provide informative spectra of N-glycans released from sub-microgram amounts of human immunodeficiency virus gp120 [19–21].
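For a drift cell with a uniform electric field, the conversion of a drift time into a CCS is commonly described by the Mason-Schamp equation. The following Python sketch illustrates that relationship only; it is not taken from the GlycoMob code base. The function name, the parameter names and the instrument settings in the usage lines are assumptions chosen for illustration, and the time passed in is assumed to be the time spent inside the drift region (any transit time outside the cell already removed).

```python
import math

K_B = 1.380649e-23          # Boltzmann constant, J/K
E_CHARGE = 1.602176634e-19  # elementary charge, C
AMU = 1.66053906660e-27     # atomic mass unit, kg


def mason_schamp_ccs(drift_time_s, length_m, voltage_v, pressure_pa,
                     temperature_k, ion_mass_da, gas_mass_da, charge=1):
    """Return the CCS in square angstroms for a uniform-field drift cell."""
    # Mobility from drift velocity over field strength: K = L^2 / (V * t_d)
    mobility = length_m ** 2 / (voltage_v * drift_time_s)
    # Number density of the drift gas from the ideal gas law
    number_density = pressure_pa / (K_B * temperature_k)
    # Reduced mass of the ion and drift gas pair, in kg
    reduced_mass = (ion_mass_da * gas_mass_da
                    / (ion_mass_da + gas_mass_da)) * AMU
    ccs_m2 = (3.0 * abs(charge) * E_CHARGE / (16.0 * number_density)
              * math.sqrt(2.0 * math.pi / (reduced_mass * K_B * temperature_k))
              / mobility)
    return ccs_m2 * 1e20  # 1 m^2 = 1e20 square angstroms


# Illustrative settings only (hypothetical cell length, voltage and pressure).
example = mason_schamp_ccs(drift_time_s=5e-3, length_m=0.25, voltage_v=100.0,
                           pressure_pa=266.6, temperature_k=298.0,
                           ion_mass_da=1257.4, gas_mass_da=4.0026)
print(f"CCS estimate: {example:.1f} square angstroms")
```

Because the CCS is inversely proportional to the mobility, it scales linearly with the drift time at fixed field, pressure and temperature, which is why no calibrant is needed in this geometry.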

Some commercially available IM-MS instruments use a non-uniform travelling wave (TW) field to transport ions through the IM cell and require calibration with known compounds to estimate CCSs. We have shown that when estimating CCSs using TW IM-MS instruments (TWCCS) it is vital to use calibrant CCS values that were measured in the same IM gas and that belong to the same molecular class as the analyte (e.g., dextran CCSs measured in N2 for estimating N-glycan CCSs in N2) [20, 21]. Drift tube IM-MS instruments, on the other hand, utilize a uniform electric field and do not require calibration; their CCS (DTCCS) measurements are highly reproducible and precise, and are referred to as “absolute” in this report.
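One widely used way to calibrate a TW instrument is to fit a power law between the calibrant drift times and their charge- and reduced-mass-normalised CCSs, and then apply the fitted curve to the analyte. The Python sketch below shows that general scheme under stated assumptions: it is not confirmed to be the exact procedure of [20], all function names are hypothetical, and the m/z-dependent drift time correction that is usually applied before fitting is omitted for brevity.

```python
import math


def reduced_mass(ion_mass_da, gas_mass_da):
    """Reduced mass of the ion and drift gas pair, in Da."""
    return ion_mass_da * gas_mass_da / (ion_mass_da + gas_mass_da)


def normalised_ccs(ccs, charge, ion_mass_da, gas_mass_da):
    """Remove the charge and reduced-mass dependence from a calibrant CCS."""
    mu = reduced_mass(ion_mass_da, gas_mass_da)
    return ccs / (abs(charge) * math.sqrt(1.0 / mu))


def fit_power_law(drift_times, normalised_values):
    """Least-squares fit of ln(ccs') = ln(A) + B * ln(t); returns (A, B)."""
    xs = [math.log(t) for t in drift_times]
    ys = [math.log(c) for c in normalised_values]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return math.exp(mean_y - slope * mean_x), slope


def estimate_ccs(drift_time, charge, ion_mass_da, gas_mass_da, a, b):
    """Apply the fitted curve to an analyte drift time."""
    mu = reduced_mass(ion_mass_da, gas_mass_da)
    return a * drift_time ** b * abs(charge) * math.sqrt(1.0 / mu)
```

In line with the requirement stated above, the calibrant values fed to `fit_power_law` should come from the same drift gas and the same molecular class as the analyte.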

IM-MS derived CCSs have considerable potential as an additional dimension of structural information that is highly complementary to existing glycomics data storage and analysis pipelines. Because IM-MS is compatible with other separation techniques such as (ultra-)high-performance liquid chromatography (U/HPLC), it is conceivable to incorporate CCS information into existing databases. However, there are no tools available for supporting such data collections. Here, we address this deficiency by introducing the GlycoMob database, which is tailored towards the storage of IM-MS data including absolute drift tube (DTCCS) and travelling wave (TWCCS) values, and we describe how these data add new functionality to UniCarbKB.

Database implementation

GlycoMob (http://www.glycomob.org) is a modular extension of the freely available UniCarbKB framework that extends the MS resources provided by UniCarb-DB. The application is developed with the Play Framework using Java and Scala with a PostgreSQL database. The GlycoMob database schema (or collection of structures, experimental data and associated metadata) is embedded into the UniCarbKB database infrastructure. The integration of GlycoMob with UniCarb databases allows for improved data discovery and cross-referencing over the curated and experimentally verified data collections provided by the UniCarbKB initiative [22, 12].

Design and search features

GlycoMob aims to reinforce and implement data-sharing standards to provide a framework that supports new and current analytical technologies. As such, GlycoMob adopts the UniCarbKB user interface to retain familiarity across all UniCarb-related resources. The database is organized into three major sections: (i) a complete listing of structures and compositions stored in the database; (ii) biological content-specific data collections, including glycans released from purified glycoproteins; and (iii) a description of analysed commercial glycans. Each section can be accessed from the GlycoMob homepage. In addition, GlycoMob can be searched in three ways: (i) by CCS value and underivatised mass; (ii) by glycoprotein; and (iii) by monosaccharide composition. Each search function can be accessed from the navigation bar, where detailed information is provided. By default, a search returns the matching structures or compositions.
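The search by CCS value and underivatised mass can be pictured as a tolerance window applied to both quantities. The Python sketch below is a minimal in-memory illustration of that idea, not the GlycoMob implementation (which is built on the Play Framework and PostgreSQL). The record fields, the default tolerances and the function name are assumptions, and the entries in the usage lines are made-up placeholders rather than database values.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CcsEntry:
    """Hypothetical record: one CCS measurement for one glycan ion."""
    structure: str
    mass_da: float      # underivatised neutral mass
    ccs_a2: float       # CCS in square angstroms
    drift_gas: str      # "He" or "N2"


def search_by_ccs_and_mass(entries, ccs, mass, drift_gas,
                           ccs_tol_pct=2.0, mass_tol_da=0.5):
    """Return entries inside both tolerance windows, closest CCS first."""
    ccs_window = ccs * ccs_tol_pct / 100.0
    hits = [e for e in entries
            if e.drift_gas == drift_gas
            and abs(e.mass_da - mass) <= mass_tol_da
            and abs(e.ccs_a2 - ccs) <= ccs_window]
    return sorted(hits, key=lambda e: abs(e.ccs_a2 - ccs))


# Placeholder entries for illustration only (not measured values).
demo = [
    CcsEntry("placeholder-A", 1000.0, 200.0, "He"),
    CcsEntry("placeholder-B", 1000.2, 203.0, "He"),
    CcsEntry("placeholder-C", 1000.1, 201.0, "N2"),
    CcsEntry("placeholder-D", 1200.0, 201.0, "He"),
]
for hit in search_by_ccs_and_mass(demo, ccs=202.0, mass=1000.0, drift_gas="He"):
    print(hit.structure, hit.ccs_a2)
```

Filtering on the drift gas first reflects the fact that CCSs measured in helium and nitrogen are not interchangeable.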

Glycan structure CCS values

The first release of GlycoMob includes a broad range of masses and CCSs for native glycans as well as their fragments. In total, over 350 CCS values are attributed to 70 precursor glycan structures determined in both N2 and He drift gases. Furthermore, the database includes over 300 fragment ion measurements, predominantly determined in the negative ion mode, which can be used to complement structure assignments using IM. As an example, the experimental CCS values obtained for native glycans and fragments from ribonuclease B are shown in Fig. 1 and the coverage of data accumulated from the standard glycoproteins is summarised in Fig. 2. Each entry is comprehensively annotated and includes a description of the glycoprotein from which the glycans were released or, in the case of commercial standards, a reference to the supplier. Additional information on experimental conditions, including gas pressures and voltages, is also documented.

Fig. 1

Screenshot showing the absolute CCSs of RNase B native glycans and fragments in helium and nitrogen. The CCS measurements are grouped by drift gas; selecting either of the tabs above the main table displays the relevant data set. Users can search the table content via the ‘Filter by CCS’ bar. Additional search options, by composition and by glycoprotein, are listed in the side panel and on the main ‘Query’ page

Fig. 2

Screenshot of the GlycoMob homepage as part of the UniCarbKB database. In the centre of the page is a concise summary of the data available including the glycoproteins analysed and associated number of CCS values for native structures and fragments. Embedded links in the table will redirect the user to the relevant data content pages

Experimental methods

GlycoMob has been populated initially with N-glycans released from four common, well-characterised glycoproteins whose structures span all major classes, from high-mannose to complex. A detailed description of the experimental methods and procedures has been previously reported [20]. Briefly, N-linked glycans were released using hydrazine from the glycoproteins porcine thyroglobulin, ribonuclease B, chicken ovalbumin, and bovine fetuin obtained from Sigma Chemical Co., Ltd. (Poole, Dorset, U.K.) and subsequently reacetylated prior to analysis. For the glycoprotein standards thyroglobulin and fetuin, sialic acids were removed by heating with 1 % acetic acid for 1 h at 70 °C. For electrospray analysis, samples were dissolved in water:methanol (1:1, v:v) at ∼1 mg/mL. Additionally, CCS measurements were generated for the widely used calibration standard dextran (Fluka and Sigma-Aldrich).

Measurements of absolute collision cross sections were performed using a Synapt G1 HDMS quadrupole/IMS/oa-ToF instrument (Waters Corporation, Manchester, U.K.) modified for drift tube operation [23]. The reported drift tube CCS values represent averages of three (He) or two (N2) replicates acquired in independent measurements. In addition, high-resolution TW IM-MS measurements were performed on an unmodified Synapt HDMS G2-S 8000 m/z quadrupole/IMS/oa-ToF MS instrument (Waters Corporation, Manchester, U.K.). Based on these data, travelling wave CCS values were estimated as reported previously [20] and used to construct GlycoMob.

The coverage of GlycoMob also includes CCS (drift tube and travelling wave) measurements for pure synthetic high-mannose N-glycans (Dextra, Reading, UK) as [M+H]+, [M+Na]+, [M+K]+, [M-H]-, [M+Cl]- and [M+H2PO4]- ions in He and N2 drift gases [24] (Fig. 3).
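To relate an underivatised mass to the ion types listed above, the neutral monoisotopic mass can be computed from the monosaccharide composition and then shifted by the adduct. The Python sketch below is illustrative and not part of GlycoMob; the dictionary and function names are assumptions. The residue and adduct masses are standard monoisotopic values (adduct shifts include the electron mass correction), and only singly charged ions are covered.

```python
# Monoisotopic residue masses (monosaccharide minus water), in Da
RESIDUE_MASS = {"Hex": 162.052824, "HexNAc": 203.079373}
WATER = 18.010565  # added once for the free reducing-end glycan

# Mass shift in Da for each singly charged ion type
ADDUCT_SHIFT = {
    "[M+H]+": 1.007276,
    "[M+Na]+": 22.989221,
    "[M+K]+": 38.963158,
    "[M-H]-": -1.007276,
    "[M+Cl]-": 34.969401,
    "[M+H2PO4]-": 96.969619,
}


def neutral_mass(composition):
    """Monoisotopic mass of an underivatised glycan from its composition."""
    return sum(RESIDUE_MASS[r] * n for r, n in composition.items()) + WATER


def adduct_mz(mass, adduct):
    """m/z of a singly charged ion of the given adduct type."""
    return mass + ADDUCT_SHIFT[adduct]


# Man5GlcNAc2, a high-mannose N-glycan: five hexoses, two N-acetylhexosamines
man5 = neutral_mass({"Hex": 5, "HexNAc": 2})
for name in ADDUCT_SHIFT:
    print(f"{name:12s} {adduct_mz(man5, name):.4f}")
```

Extending the table to further residues (for example fucose or sialic acid) only requires additional entries in `RESIDUE_MASS`.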

Fig. 3

High-mannose summary page. For each structure entry the CCSs measured in positive and negative mode using either nitrogen or helium drift gas are shown together with its adducts. Additionally, ion mobility data sets can be accessed from the ‘Glycoprotein Standards’ and ‘Dextran’ drop down lists in the right side panel. Users can select the embedded UniCarb-DB and UniCarbKB links to access relevant mass spectra and structural information, respectively. The structures are displayed using the hybrid Consortium for Functional Glycomics [25] and Oxford graphical notation [26, 27]

Bridging GlycoMob with UniCarb-related databases

A significant problem facing the glycobioinformatics community is the distribution of data across separate resources. The principal objective of the UniCarbKB initiative is not only to connect related data collections, but also to provide descriptive metadata that is representative of the reported data. Efforts led by the Minimum Information Required for A Glycomics Experiment (MIRAGE) [28] project and the ontology work of GlycoRDF [29] aim to alleviate this situation by providing standardized data terms. In this context, fully characterized structures stored in GlycoMob (e.g., high-mannose standards) are linked to the tandem MS data repository UniCarb-DB. Similarly, users can access data stored in the curated glycan structure database UniCarbKB (and vice versa) by following the links for individual glycoprotein entries and glycan compositions. Such connections allow researchers to easily navigate between databases and view experimental tandem MS spectra and related structural/glycoprotein information.

GlycoMob adopts GlycoRDF to provide a common standard for representing the stored information. GlycoRDF aims to enhance data interoperability by providing an ontology to standardise glycomics data for building data-driven applications. Such applications can efficiently integrate heterogeneous datasets through a single SPARQL endpoint that can be used to query and return data from the semantic web. For example, such an endpoint enables users to simultaneously query GlycoMob, UniCarb and other RDF-compatible databases to answer complex research questions. The development of web applications to perform such queries is in its infancy; however, examples (using SPARQL) are provided on the UniCarbKB RDF site (http://rdf.unicarbkb.org).
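Under the SPARQL 1.1 Protocol, a query can be sent to an endpoint as an HTTP GET request with the query text in a URL-encoded `query` parameter. The Python sketch below only builds such a request URL and does not contact any server. The endpoint address is a placeholder (the actual address should be taken from the UniCarbKB RDF site), the function name is hypothetical, and the query is a generic triple pattern rather than one that uses GlycoRDF terms.

```python
from urllib.parse import urlencode

# Placeholder endpoint: replace with the address published by the provider.
ENDPOINT = "http://example.org/sparql"

# Generic query that lists a few triples; it makes no assumption about
# which GlycoRDF classes or predicates the endpoint exposes.
QUERY = "SELECT ?s ?p ?o WHERE { ?s ?p ?o } LIMIT 5"


def build_sparql_get_url(endpoint, query):
    """Return the GET URL for a SPARQL query (query sent as 'query=...')."""
    return endpoint + "?" + urlencode({"query": query})


url = build_sparql_get_url(ENDPOINT, QUERY)
print(url)
```

The resulting URL can be passed to any HTTP client; requesting the `application/sparql-results+json` media type in the `Accept` header asks the endpoint for JSON-formatted results.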

Discussion

CCS measurements provide an additional parameter that can be used to improve the specificity of glycan identification. Here, we have described GlycoMob, a novel database solution for storing and making available experimentally confirmed CCS values acquired from both TW and DT IM-MS instruments for the glycomics domain, which starts to address the lack of resources available for this emerging technology.

The database provides access to a broad dataset for CCS calibration of TW IM-MS instruments using native N-linked glycans released from naturally occurring and commercially available glycoproteins. Access to such data not only assists calibration, but also provides an extra level of confidence in the characterization of N-glycan structures. Its availability will support the development of new analytical tools for structural data querying and spectra interpretation, in particular when connected to liquid chromatography-MS-based workflows and the MS/MS database UniCarb-DB, which contains masses, retention times and fragment information.

By providing a common interface to high-quality data, we believe that GlycoMob can become a vital resource for analyzing glycomic ion mobility data. As the technology improves and further data are generated, the database will be continually updated with new CCS values and associated data for glycans released by PNGase F and hydrazinolysis, thereby providing a leading platform for IM-MS.