5.1 Introduction

Bruton agammaglobulinemia tyrosine kinase (BTK) variations lead to X-linked agammaglobulinemia (XLA, MIM# 300300), a hereditary primary immunodeficiency [26, 29]. XLA is caused by a block in B cell differentiation, resulting in severely decreased numbers of B lymphocytes, an almost complete lack of plasma cells, and very low or absent immunoglobulin levels of all isotypes. Because humoral immune responses are virtually absent, the patients have increased susceptibility to infections, mainly bacterial ones. The frequency of XLA has been estimated to be 1:200,000 live births. The disease is considered to have full penetrance. Female carriers are healthy but display nonrandom X-chromosome inactivation in their B cells. Only a few female patients have been identified.

The BTK gene (LRG_128, reference sequence used U78027.1) contains 19 exons (Fig. 5.1) and codes for a protein of 77 kDa. Exon 1 is outside the coding region. BTK is expressed in all hematopoietic lineages except for T lymphocytes and plasma cells [23]. BTK belongs to the Tec family of related cytoplasmic protein tyrosine kinases (PTKs), which also includes BMX (BMX non-receptor tyrosine kinase), ITK (IL2-inducible T cell kinase), TEC (tec protein tyrosine kinase), and TXK (TXK tyrosine kinase). Except for TXK, the family members share the same domain organization: starting from the N-terminus, a pleckstrin homology (PH) domain, a Tec homology (TH) domain, a Src homology 3 (SH3) domain, a Src homology 2 (SH2) domain, and a catalytic tyrosine kinase (TK) domain.

The three-dimensional structure has been determined for the PH domain and the first half of the TH domain [7], the SH3 domain [4], the SH2 domain [6], and the kinase domain [12]. For the full-length BTK, there is a low-resolution structure in an extended conformation [13]. BTK interacts with several partners [15].

Variations in BTK account for about 80 % of agammaglobulinemia cases. Variations in several other genes can lead to a failure of B cell development and agammaglobulinemia [1]. These genes encode components of the pre-B cell receptor or proteins that are activated by cross-linking of the pre-B cell receptor. Defects in these genes lead to a block in B cell differentiation at the pro-B to pre-B cell transition. Other forms of agammaglobulinemia appear with growth hormone deficiency or as autosomal recessive diseases. Some autosomal recessive agammaglobulinemias have been identified that involve the pre-B cell receptor (pre-BCR) or BCR component genes μ-heavy chain (IGHM), λ5/14.1 (IGLL1, immunoglobulin lambda-like polypeptide 1), Igα (CD79A), and Igβ (CD79B). Variations also occur in genes acting downstream of the BCR: BLNK (B cell linker protein), which is essential for Igμ signal transduction, and PIK3R1 (phosphoinositide-3-kinase, regulatory subunit 1 (alpha)), which encodes a phosphoinositide 3-kinase regulator.

5.1.1 BTKbase

BTKbase, founded in 1994, was the first immunodeficiency variation database (IDbase) [32]. Subsequently, more than 130 IDbases have been released [19]. BTKbase contains public variation entries for 1362 patients from 1198 unrelated families (the total number of variants in these unrelated families is 1209), showing 742 unique molecular events.

BTKbase aims at collecting all published variations. Data are either directly submitted or derived from more than 100 publications. The database format has been previously published [31, 34]. The data are presented as individual entries, each carrying a unique patient identification number (PIN) and accession number, systematic names according to the Human Genome Variation Society (HGVS) variation nomenclature, a short verbal description of the variation, submission information (submission and update dates, version numbers, and submitter details), literature citations, and annotation in detail at DNA, RNA, and protein levels. In addition, the most important clinical parameters and laboratory findings are included, provided they are available.

IDbases, including BTKbase, follow a number of standards including the use of HUGO Gene Nomenclature Committee (HGNC) gene names (www.genenames.org), HGVS variation nomenclature [3], and IDRefSeqs (reference sequences for primary immunodeficiency genes and proteins). Currently, IDbases are in the process of changing to Locus Reference Genomic (LRG) reference sequences, which are already available for some 100 immunodeficiency genes (www.lrg-sequence.org). BTKbase follows the recommendations for locus-specific variation databases (LSDBs) [33] and their curation [2].

BTKbase is freely available at http://structure.bmc.lu.se/idbase/BTKbase/. The website contains information related to XLA and BTK. The bioinformatics pages include several tables of BTK variation statistics, and the variation distributions are shown graphically along the sequences. The submission page provides variation checking facilities and electronic submission services. The variation browser provides a visual means of browsing variations along the protein sequence. Reference information for variation publications and related protein structures is included in separate sections.

5.2 Analysis of BTK Variations

XLA arises from a block in B cell development. Many BTKbase entries contain information on the immunological status of the patients. These properties have been extensively discussed in a previous publication [28]. The majority of the reported patients have significantly reduced numbers of B cells and Ig levels. A large portion of patients with X-linked diseases have de novo variations.

5.2.1 Variation Statistics

Extensive statistical analyses of variations at the three molecular levels, DNA, RNA, and protein, were performed. Since data per unique families are considered the most representative regarding, e.g., mutational effects and prevalence, the discussion about variation statistics mainly relates to these.

Variations appear throughout the BTK domains as well as in exons and introns (Fig. 5.1, Table 5.1); however, the distribution is not even. Some exons contain more variations than expected. The PH and SH2 domains contain approximately the expected number of variations, whereas there are fewer than expected in the TH and SH3 domains and more than expected in the kinase domain (Table 5.2). The TH domain has two structural elements [31, 35]: an N-terminal BTK motif and a C-terminal part containing two proline-rich regions capable of intra- and intermolecular interactions [4, 17]. The TH domain may be underrepresented because the C-terminal half of the domain likely has a partially intrinsically disordered structure, so that variations there do not have a major effect. On the other hand, XLA-causing variants do appear in the Zn2+-binding BTK motif.

Fig. 5.1

Distribution of all variations to BTK gene regions and BTK protein domains. Variations in exons are indicated by green numbers in exons which are numbered in red. Variations in introns are in black below the domain chart. Domain borders are above the chart and numbers of variations in the domains above them

Table 5.1 Distribution of variation and variation types in BTK domains for all cases, independent families, and unique variations
Table 5.2 Spectrum of variants in the structural BTK domains

We have recently investigated the putative effects of all possible amino acid substitutions due to single nucleotide changes in the BTK TK domain [27]. Altogether 67 % of the 1495 substitutions were predicted to be harmful. Although this number seems very high, it is considered to be realistic because the kinase domain contains so many conserved regions and has several functions. The situation is likely very different in the SH3 and TH domains.
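How a set of "all possible amino acid substitutions due to single nucleotide changes" can be enumerated is sketched below in Python. This is only an illustration and not the procedure used in [27]: the toy coding sequence is hypothetical, and counting unique changes at the protein level (one entry per position and resulting residue) is an assumption.

```python
from itertools import product

BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code; '*' marks a stop codon.
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def single_nucleotide_outcomes(cds):
    """Return the unique missense and nonsense changes reachable from a
    coding sequence by changing exactly one nucleotide."""
    missense, nonsense = set(), set()
    for start in range(0, len(cds) - len(cds) % 3, 3):
        codon = cds[start:start + 3]
        ref = CODON_TABLE[codon]
        if ref == "*":
            continue  # skip the stop codon itself
        position = start // 3 + 1  # 1-based residue number
        for i, base in product(range(3), BASES):
            if base == codon[i]:
                continue
            alt = CODON_TABLE[codon[:i] + base + codon[i + 1:]]
            if alt == "*":
                nonsense.add((position, ref))
            elif alt != ref:
                missense.add((position, ref, alt))
    return missense, nonsense


# Hypothetical toy sequence (Met-Arg-Trp-stop), not taken from BTK.
toy_cds = "ATGCGATGGTAA"
missense, nonsense = single_nucleotide_outcomes(toy_cds)
print(len(missense), "possible substitutions;", len(nonsense), "possible nonsense variants")
```

Applied to a real coding sequence, the size of the `missense` set would correspond to a theoretical maximum of the kind quoted in the text, subject to the counting assumption above.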

The variants are classified in Table 5.1 based on their effects at the DNA or RNA level. The largest group consists of missense variations causing amino acid substitutions (41 % of independent families). The SH3 domain is the only domain in which amino acid substitutions do not occur. Although SH3 domains are abundant in the human proteome, no disease-causing amino acid substitutions have been reported in any of them.

Nonsense variations account for 17 % of all variations, frameshift variations 19 %, and intronic variations 14 %. The proportions of deletions (4 %) and insertions (0.7 %) are very low and differ from those reported in previous publications [10, 28], where proportions of 20 % (deletions) and 7 % (insertions) were given. These differences are due to the way the variants were counted; e.g., a variant with a DNA name “deletion” and an RNA name “frameshift” has been considered here as a frameshift variation. In the future we will avoid this kind of issue by adopting variation naming according to the Variation Ontology [30].

The distribution of variation types is very similar to that in the other IDbases [19]. The ratio of missense to nonsense variations, 2.5, is somewhat higher in BTKbase than in the IDbases overall (1.5). Multiple variants in BTK have been identified in 17 families, complex variations in 9 families, and miscellaneous cases in 15 families.

There are altogether 341 unique amino acid substitutions. The theoretical maximum is 4151; thus, 8.2 % of all possible substitutions have been observed so far. However, just a fraction of the possible substitutions is harmful and thus identifiable from XLA patients. In the case of nonsense variations, a larger portion has been seen in patients: BTKbase contains 94 (28 %) of all the possible (n = 297) variants. According to χ² statistics, this overrepresentation is highly significant (p < 0.0001).
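The text does not state exactly which comparison the χ² test was applied to. As a minimal sketch, assuming the test compares observed versus unobserved variants between nonsense variations (94 of 297) and amino acid substitutions (341 of 4151), a 2 × 2 Pearson χ² statistic without continuity correction can be computed with the Python standard library as follows.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square for the 2 x 2 table [[a, b], [c, d]]
    (1 degree of freedom, no continuity correction)."""
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # For 1 degree of freedom the upper-tail probability is erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Counts quoted in the text; the pairing into one table is an assumption.
observed_nonsense, possible_nonsense = 94, 297
observed_missense, possible_missense = 341, 4151

chi2, p_value = chi_square_2x2(
    observed_nonsense, possible_nonsense - observed_nonsense,
    observed_missense, possible_missense - observed_missense,
)
print(f"chi2 = {chi2:.1f}, p = {p_value:.2e}")
```

Under this assumed table, the p value falls far below the 0.0001 threshold given in the text.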

At the amino acid level, it is apparent that arginine, as previously indicated, harbors the largest number of variants (Table 5.3). However, the most common outcome at the protein level is truncation due to the introduction of a stop codon into the coding region. Altogether 29.5 % of single nucleotide changes lead to protein truncation.

Table 5.3 Amino acid substitutions indicated in percentages

Arginine is by far the most frequently substituted amino acid (31 %). This has been observed before not only in BTK [28] but also in variant datasets extracted from dbSNP [21], and the overrepresentation of arginine is known to be due to the high mutability of codons containing CpG dinucleotides. Arginine is coded by six codons, four of which have a CpG dinucleotide in the first and second codon positions [18]. The overrepresentation of arginine as the most frequently substituted amino acid also leads to the enrichment of tryptophan as the residue to which other amino acids are substituted; arginine was replaced by tryptophan in 6.3 % of all amino acid substitutions (Table 5.3).
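The codon argument can be verified from the standard genetic code. The Python sketch below lists the arginine codons, picks out those with CG in the first and second positions, and shows the outcome of a C > T transition at the CpG on the coding strand (CGN to TGN) or on the opposite strand (read as G > A, CGN to CAN). Treating these two transitions as the typical CpG changes is general background knowledge, not a statement taken from the BTKbase data.

```python
from itertools import product

BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code; '*' marks a stop codon.
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

arginine_codons = sorted(c for c, aa in CODON_TABLE.items() if aa == "R")
cpg_codons = [c for c in arginine_codons if c.startswith("CG")]
print("Arginine codons:", arginine_codons)
print("With CpG in positions 1-2:", cpg_codons)

cpg_outcomes = {}
for codon in cpg_codons:
    coding_strand = CODON_TABLE["T" + codon[1:]]              # C>T: CGN -> TGN
    opposite_strand = CODON_TABLE[codon[0] + "A" + codon[2]]  # G>A: CGN -> CAN
    cpg_outcomes[codon] = (coding_strand, opposite_strand)
    print(f"{codon}: C>T gives {coding_strand}, G>A gives {opposite_strand}")
```

The CGG codon yields tryptophan (TGG) and CGA yields a stop codon (TGA), while the G > A changes give histidine or glutamine, in line with the replacement residues discussed in the surrounding text.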

Proline is the amino acid to which other amino acids have most often been substituted (8 %), closely followed by tryptophan (6.9 %), histidine (5 %), arginine (5 %), glutamine (4.8 %), and serine (4.8 %).

The G > A and C > T substitutions form the largest classes of changes, ~24 % (Table 5.4). The types of base changes were investigated more closely. Changes from an amino to a keto base and vice versa are much more frequent than substitutions within these groups. There is clearly a higher frequency of transitions (purine to purine and pyrimidine to pyrimidine, 66 %) than of transversions (34 %). This agrees with the ~70 % proportion of transitions found to be typical for human genes [25]. The strong to weak base substitutions are by far the biggest category, containing 60 % of the variations. This was also found in the VariSNP variant datasets [21].
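The three classification schemes used above (transition versus transversion, strong versus weak, amino versus keto) follow from simple base groupings. A minimal Python sketch, with a hypothetical function name, is given below.

```python
PURINES = {"A", "G"}   # pyrimidines are C and T
STRONG = {"C", "G"}    # three hydrogen bonds; weak bases are A and T
AMINO = {"A", "C"}     # keto bases are G and T


def classify_substitution(ref, alt):
    """Classify a single nucleotide substitution ref > alt."""
    if ref == alt or ref not in "ACGT" or alt not in "ACGT":
        raise ValueError("ref and alt must be two different bases from A, C, G, T")
    same_ring_type = (ref in PURINES) == (alt in PURINES)

    def strength(base):
        return "strong" if base in STRONG else "weak"

    def chemistry(base):
        return "amino" if base in AMINO else "keto"

    return {
        "class": "transition" if same_ring_type else "transversion",
        "strength": f"{strength(ref)} to {strength(alt)}",
        "chemistry": f"{chemistry(ref)} to {chemistry(alt)}",
    }


for ref, alt in [("G", "A"), ("C", "T"), ("A", "C")]:
    print(f"{ref} > {alt}:", classify_substitution(ref, alt))
```

Both G > A and C > T are transitions from a strong to a weak base, and every transition switches between the amino and keto groups, which is consistent with the frequencies reported in the text.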

Table 5.4 Nucleotide substitutions in unique families (%)

5.2.2 Structural Consequences

BTK consists of five domains which, except for the SH3 domain, contain amino acid substitutions (Fig. 5.2). The effects and consequences of the variations vary widely. A recent study revealed that about two thirds of all kinase domain variations originating from a single nucleotide change likely lead to XLA [27]. This is not to say that two thirds of all possible amino acid changes are harmful, since the majority of them do not originate from single base changes (because of the organization of the genetic code). Numerous variants affect functional sites, such as ligand- and substrate-binding regions in the domains. Stability-affecting changes are common. There are putative explanations available for the consequences of all the 1495 substitutions studied. These results are well in line with previous studies and predictions of BTK variants [5, 7–9, 11, 12, 14, 16, 20, 22, 24, 28, 32, 35–41].
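The remark about the organization of the genetic code can be checked directly: each codon has only nine single-nucleotide neighbours, so at most 9 of the 19 alternative amino acids can be reached by one base change. The Python sketch below counts the reachable residues for every sense codon of the standard genetic code; it is an illustration only and is not part of the analysis in [27].

```python
from itertools import product

BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code; '*' marks a stop codon.
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def reachable_amino_acids(codon):
    """Amino acids (other than the original) reachable by one base change."""
    ref = CODON_TABLE[codon]
    reachable = set()
    for i, base in product(range(3), BASES):
        alt = CODON_TABLE[codon[:i] + base + codon[i + 1:]]
        if alt not in (ref, "*"):
            reachable.add(alt)
    return reachable


counts = {c: len(reachable_amino_acids(c)) for c, aa in CODON_TABLE.items() if aa != "*"}
mean_reachable = sum(counts.values()) / len(counts)
print(f"sense codons: {len(counts)}")
print(f"reachable residues per codon: min {min(counts.values())}, "
      f"max {max(counts.values())}, mean {mean_reachable:.1f} of 19")
```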

Fig. 5.2

Distribution of amino acid substitutions to BTK domains. Affected amino acids are shown in yellow. PH domain is on top left (PDB code 1BTK [7]). The first part of the TH domain including the BTK motif binding Zn2+ (magenta) is on the top of the domain. In the SH3 (1AWX [4]), top right, and SH2 domain (2GE9 [6]), bottom left, an in-frame deletion of 21 residues is indicated. The kinase domain (1K2P [12]) is at the bottom right. Amino acid substitutions appear throughout all the domains except the SH3 domain, where there are none

Minor changes can be accommodated without major structural alterations. As has been seen especially in the PH domain, changes to electrostatics are common [16]. When a charge is reversed, added, or removed, the properties of the site are modified. If this happens at a binding site on the protein surface, interactions with partners are weakened or lost.

Structural variations appear frequently in secondary structural elements. Although there are some variations in the loops connecting these elements, the α- and β-structures are more sensitive to substitutions. Structural variants are frequent in the protein core, where tight packing leaves no space for larger side chains. Further, introduction of charged or polar residues into the protein core, even if sterically possible, is usually harmful. Much more variation is allowed on the protein surface in areas not involved in intra- or intermolecular interactions. Some of these interactions are known; however, the three-dimensional organization of the entire BTK is not yet known. The domains fold independently and are connected by loops, which can be quite long. It is likely that the domain interactions differ between different conformations of the entire protein. There is structural information for the entire BTK in an elongated conformation [13]; however, this conformation is not likely the only one.

BTK variation information has been collected into BTKbase for two decades, and the database has been a central resource for research and diagnosis. The database is constantly growing; however, the recent explosion in sequencing activities has not contributed much to the number of entries. That is presumably because many cases remain in laboratories and are never published or submitted to a database. It is in the interest of the entire community to share information about variations, especially in rare diseases such as XLA.