Introduction

Plasmodium falciparum has a complex life cycle with alternating development in mosquitoes and vertebrate hosts. This multi-step, multiple-host cycle requires adequate gene sets and appropriately regulated expression. It is hoped that a better understanding of the organisation and expression constraints of the genes within its genome will nurture the rational design of novel intervention strategies, which are greatly needed in view of the rapidly deteriorating efficiency of existing control tools.

Advances in the fields of genomics, proteomics and molecular immunology offer tremendous opportunities for the development of novel interventions against this public health threat. Today, researchers can make use of a huge amount of genome sequence data accompanied by expression data that might pave the way for the hoped-for anti-malarial therapies (Bozdech et al. 2003; LeRoch et al. 2003; Hall et al. 2005).

Recent advances in the genomics of autonomously living organisms have highlighted the widespread occurrence of multigene families, which account for up to 50% of certain genomes (Brenner et al. 1995; Labedan and Riley 1995; Tatusov et al. 1997). The diversification of individual genes within gene families provides a set of related molecules with a large array of recognition properties, specific functions or expression profiles, facilitating the fine tuning of biological activity according to specific cellular developmental or environmental conditions (Henikoff et al. 1997). These multigene families constitute a substantial part of the overall genome content in bacteria (Blattner et al. 1997) as well as in lower and higher eukaryotes (Rubin et al. 2000). A multigene family is usually a collection of related genes that are presumed to share a common ancestor and to have originated from each other by duplication and subsequent divergence through gene recombination. These sets of genes are thought to optimise the fitness of the organism. The number and the size of multigene families vary greatly from one organism to another (Rubin et al. 2000). In most genomes, the majority of the families contain few (up to five) members (Salanoubat et al. 2000). There are, however, examples of large multigene families with more than 50–100 members, including the large multigene families encoding surface antigens in Plasmodium species (Su et al. 1995; Kyes et al. 1999; Portillo 2001) or apical organelle proteins (Owen et al. 1999; Preiser et al. 1999, 2002).

While duplicated copies can diverge functionally, some copies lose their function through the generation of stop codons or the gradual accumulation of disabling mutations (Little 1982; Wilde 1986), thereby suffering the less glamorous fate of fixing a null allele and becoming a pseudogene. Pseudogenes usually act as a reservoir of sequences that recombine with their functional paralogous genes and thus generate antigenic diversity (Restrepo 1994; Balakirev et al. 2003).

In a previous study, an antigen was identified in membrane fractions of P. falciparum-infected erythrocytes (Kun et al. 1991). Sequence analysis of the full gene revealed a protein rich in Trp residues and with a domain containing threonine repeats, which was termed TryThrA (Uhlemann et al. 2001). A BLAST search of the P. falciparum genome database PlasmoDB (http://plasmodb.org) based on gene structure and amino acid sequence identified related sequences on chromosomes 1, 4, 10 and 13. The respective gene product on chromosome 1 has recently been characterised as a merozoite-associated protein (Ntumngia et al. 2004). The P. falciparum Trp-rich antigens were envisaged to be homologues of the P. yoelii secreted antigens, which have been shown to be successful in vaccine studies in mice (Burns et al. 1999; Burns et al. 2000). The identity at the amino acid level is rather low and limited to the Trp-rich domains. In the present study, we describe the characterisation of the sequences on chromosomes 10 and 13. On the basis of their gene structure and amino acid composition, we have termed them tryptophan-rich antigen-3 (TrpA-3) and lysine-tryptophan-rich antigen (LysTrpA) respectively.

Materials and methods

Parasite material and cultures

The parasites used in this study were three Plasmodium falciparum laboratory strains (the Kenyan isolate Binh1, the FCR-3 strain isolated in The Gambia and the Gabonese isolate Cys007) and a series of field isolates from malaria patients attending the Albert Schweitzer Hospital in Lambaréné, Gabon. The laboratory-adapted parasites were maintained in continuous culture using standard methods with erythrocytes (O+) at 5% hematocrit and AB+ Rh+ serum in buffered medium (Cranmer et al. 1997).

PCR and RT-PCR

Genomic DNA and total RNA were extracted from asynchronous cultures and blood samples of field isolates using the QIAamp DNA blood kit and the RNeasy mini kit (Qiagen, Hilden, Germany) respectively, as described in the user manuals. The genomic DNAs from all the isolates and total RNA from the Binh1 strain were used as templates for PCR and reverse transcriptase-PCR (RT-PCR) respectively. Sequence data from the database PlasmoDB (http://plasmodb.org) were used to design specific oligonucleotide primers for PCR and RT-PCR. The TrpA-3 gene was divided into four different regions and the various fragments were PCR amplified using the following primer sets:

  • F1: 5′-ATG CAA ATT AAT CCA ATG GAA-3′;

  • R1: 5′-TTC TAT ATC TTT TCT TGT-3′;

  • F2: 5′-CAA CTT ATG ACA TTA GAA-3′;

  • R2: 5′-TCC ATT ATT GTC GAA TTT ATT-3′;

  • F3: 5′-TAT ACT CAT GAT GAT TTG TAT-3′;

  • R3: 5′-CAC ACG CTC CAT TGT TTT TCC-3′;

  • F4: 5′-CAA TGG AGC GTG TGG TTA GAA-3′; and

  • R4: 5′-TTA TTC TTG GTT TCT AAT TAA-3′

while for LysTrpA, the gene sequence was divided into four overlapping blocks by using the corresponding primer combinations BaF/BaR, T4F/T4R, May/Ros4 and Fb3/Ros4. The primers used were:

  • BaF: 5′-CTG GAA CGA GAA GAA GAA GG-3′;

  • BaR: 5′-CCT TTT CTT TAT GAC TGC-3′;

  • T4F: 5′-CAT AAG ATG CTA CCA CAA AAA-3′;

  • May: 5′-CAT AAA TGC AGT CAT AAA-3′;

  • Fb3: 5′-TAT AAT CAA GAT GTG AAT-3′;

  • Ros4: 5′-ATA TTC AAA CAA AAC AGC TAG-3′; and

  • T4R: 5′-TTC ATT GAT ATT TAC ATT CAC-3′.

All amplifications were initiated with a denaturing step of 94°C for 5 min, followed by 35 cycles of 94°C for 1 min, 1 min of annealing (the temperature was adjusted to the Tm of the oligonucleotides used) and 1.5 min at 72°C for elongation. The reaction was stopped after a final elongation step of 72°C for 10 min. After purification, the respective PCR products were sequenced with internally generated specific primers using BigDye terminator chemistry on an ABI PRISM 3100 Genetic Analyzer (Applied Biosystems, Foster City, USA). The sequences were analysed for polymorphisms after DNA alignment of the sequenced products using the BioEdit alignment program (North Carolina State University, USA). The quality of the individual SNPs was assessed by visual inspection with reference to the electropherograms. cDNA was synthesised from total RNA by RT-PCR using the Titan One Tube RT-PCR kit (Roche, Mannheim, Germany) according to the manufacturer’s instructions. The respective products were also sequenced as described above.
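The text states only that the annealing temperature was adjusted to the Tm of the oligonucleotides; the formula used is not given. As a minimal sketch, assuming the simple Wallace rule (2°C per A or T, 4°C per G or C), which is a common approximation for short primers and not necessarily the authors' method, the Tm of a primer can be estimated as follows. The function name `wallace_tm` is introduced here for illustration only.

```python
def wallace_tm(primer: str) -> int:
    """Estimate primer Tm in deg C with the Wallace rule: 2*(A+T) + 4*(G+C).

    Assumption: this rule is used purely for illustration; the paper does
    not state how Tm was calculated.
    """
    seq = primer.replace(" ", "").upper()
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    if at + gc != len(seq):
        raise ValueError(f"unexpected character in primer: {primer!r}")
    return 2 * at + 4 * gc


# Primer sequences copied from the lists above (spaces are ignored).
primers = {
    "F1": "ATG CAA ATT AAT CCA ATG GAA",
    "R4": "TTA TTC TTG GTT TCT AAT TAA",
    "BaF": "CTG GAA CGA GAA GAA GAA GG",
}

for name, seq in primers.items():
    print(name, wallace_tm(seq))
```

For the AT-rich primers typical of P. falciparum, this estimate gives low values, which is consistent with the need to set the annealing temperature per primer pair.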

Gene expression profiles

The gene expression profiles of the five members of the Trp-rich multigene family were determined using microarray data generated by Bozdech et al. (2003). We generated a heat map using the Genesis 1.2.2 software (http://genome.tugraz.at).

Results

Genomic organisation and polymorphism

We sequenced the genes for TrpA-3 and LysTrpA from three laboratory-adapted isolates and seven field isolates from Lambaréné, Gabon. The complete nucleotide sequence of TrpA-3 in the Binh1 isolate consists of 3,075 nucleotides and contains an open reading frame which encodes a predicted protein with 971 amino acid residues and a molecular weight of about 107 kDa, while the complete gene sequence of LysTrpA consists of 2,543 base pairs and contains an open reading frame encoding a predicted protein with 797 amino acid residues and a predicted molecular weight of about 88 kDa (Fig. 1). Total RNA isolated from an asynchronous culture was used to produce cDNA by RT-PCR. Transcription of both genes was shown by analysis of the PCR products generated from cDNA and genomic DNA, which revealed that short introns of 159 bp and 143 bp at the 5′ end were spliced out of the mature messages of the TrpA-3 and LysTrpA genes respectively (Fig. 3a, b). A stop codon is situated towards the 3′ end of LysTrpA, at the beginning of the Trp-rich domain (position 1,812). To investigate the possibility that the stop codon was part of a second short intron, as suggested in PlasmoDB, we generated RT-PCR products from FCR-3 using specific primers flanking the stop codon. Sequence analysis of the RT-PCR products revealed that the internal stop codon was not spliced out. We therefore concluded that both genes have a two-exon structure. The first exons are very short, consisting of 189 and 180 nucleotides for TrpA-3 and LysTrpA respectively, while the second exons account for the majority of the total gene sequences with 2,730 and 2,220 nucleotides for TrpA-3 and LysTrpA respectively. The intron regions, like most P. falciparum intron sequences, are predominantly AT-rich and show length variation among the studied isolates.
Closer inspection of the TrpA-3 cDNA sequence reveals an N-terminal unique region, which is preceded by a sequence coding for a hydrophobic signal sequence, and a C-terminal end characterised by a cluster rich in Trp residues. In the LysTrpA gene, there are two prominent domains: the 5′ highly charged domain, which is rich in lysine residues, and a 3′ end similar to TrpA-3 characterised by a cluster of Trp residues. At the N-terminus there is also a series of residues forming a putative hydrophobic signal sequence. The Trp-rich domains are highly conserved both in number and position among the various isolates (not shown). When Trp is replaced by another amino acid, this is mainly tyrosine or phenylalanine, both of which also belong to the bulky aromatic amino acids. Just upstream of the Trp region in TrpA-3 (between nucleotides 1,863 and 2,079) is a region containing four tandem repeat sequences of 18 amino acid residues. We also detected repetitive elements consisting of a number of pentapeptide repeats (KDSDK) in the 3′ end lysine-rich domain of the LysTrpA gene in all the studied isolates. These differ in number between the different isolates, resulting in a length polymorphism within this region.
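The predicted molecular weights quoted above can be checked with simple arithmetic. The sketch below assumes an average residue mass of about 110 Da, a commonly used rule of thumb and not a value taken from this study; an exact prediction would sum the masses of the individual residues. The names `AVG_RESIDUE_MASS_DA` and `approx_mass_kda` are introduced here for illustration only.

```python
# Assumption: average mass per amino acid residue in a protein (rule of thumb).
AVG_RESIDUE_MASS_DA = 110.0


def approx_mass_kda(n_residues: int) -> float:
    """Rough protein mass in kDa from the number of residues."""
    if n_residues < 0:
        raise ValueError("number of residues must be non-negative")
    return n_residues * AVG_RESIDUE_MASS_DA / 1000.0


# Residue counts as reported in the text.
for name, length in {"TrpA-3": 971, "LysTrpA": 797}.items():
    print(f"{name}: {length} residues, about {approx_mass_kda(length):.0f} kDa")
```

Under this assumption the estimates come to roughly 107 kDa for TrpA-3 and 88 kDa for LysTrpA, in line with the reported values.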

Fig. 1

Sequence analysis of LysTrpA and TrpA-3. The deduced amino acid sequences of a LysTrpA and b TrpA-3 are shown. The tryptophan residues are indicated in bold. The repeat sequences are underlined, signal peptides are indicated in lower case in gray, and the positions of introns are indicated by arrow heads. The stop codons in the LysTrpA sequence are highlighted with asterisks (*), while the positions of mutations are highlighted in red.

To study the diversity of the two genes, we sequenced genomic DNA from a series of laboratory and field isolates. Using the sequence from the Binh1 isolate as our reference, we determined the variability of these genes among the study isolates. A representative alignment demonstrated a high conservation of the genes between the laboratory and field isolates (not shown). However, we detected 14 SNPs in the coding region of TrpA-3, four of which were synonymous and ten non-synonymous, a bias towards non-synonymous mutations. Five of the mutations were located within the repeat region. One of the laboratory clones (Cys007) contained all the observed SNPs except for that at position 2,526 (Table 1). In the LysTrpA gene, only four SNPs were detected, all resulting in non-synonymous mutations (Table 1). The most remarkable mutation was observed at position 2,202 of the nucleotide sequence. At this position, a substitution converts a glutamic acid codon to a stop codon in all three laboratory strains and in four of the seven field isolates sequenced. A comparison of TrpA-3 and LysTrpA with the other members of the gene family in P. falciparum indicates that the different members share about 31% similarity, with homology restricted to the Trp-rich region, where the positions of the Trp residues are conserved in all the isolates (not shown). The aromatic amino acids seem to build a scaffold, which can also be found in other plasmodial species (Fig. 2). The N-terminal portion follows the regularity W x2 W x7 W x2 F x10 W x7 W x2 W x7 W x2 (Y/F) x7 (Y/F), where x can be any residue. In the middle of the molecule this pattern is largely lost, but towards the C-terminus the W x2 W x7 repeats appear again. Not only the Trp residues but also the positions of some charged residues such as Glu, Asp and Lys are highly conserved between species.
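The distinction between synonymous, non-synonymous and stop-generating substitutions can be made explicit with the standard genetic code. The sketch below is illustrative only: the function name `classify_substitution` is hypothetical, and the example codons are assumptions, since the text reports that a glutamic acid codon becomes a stop codon at position 2,202 but does not give the codon itself.

```python
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G at each position.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def classify_substitution(ref_codon: str, alt_codon: str) -> str:
    """Classify a codon change as synonymous, non-synonymous or nonsense."""
    ref_aa = CODON_TABLE[ref_codon.upper()]
    alt_aa = CODON_TABLE[alt_codon.upper()]
    if ref_aa == alt_aa:
        return "synonymous"
    if alt_aa == "*":
        return "nonsense"  # a sense codon is converted to a stop codon
    return "non-synonymous"


# Hypothetical codon changes; GAA and GAG both encode Glu, TAA is a stop codon.
for ref, alt in [("GAA", "GAG"), ("GAA", "GAT"), ("GAA", "TAA")]:
    print(ref, "to", alt, ":", classify_substitution(ref, alt))
```

A Glu-to-stop change of the kind described for LysTrpA falls into the nonsense class, whichever of the two glutamic acid codons is actually present.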

Table 1 Polymorphisms and point mutations in the TrpA-3 and LysTrpA genes. Observed point mutations in TrpA-3, using the Binh1 isolate as the reference sequence, and in LysTrpA, using the genome sequence of 3D7 as the reference sequence. F = FCR-3, B = Binh1, S = Cys007, while D1−D7 and * represent field isolates.

Fig. 2

Multiple alignment of the tryptophan-rich domains of 12 genes belonging to different Plasmodium species. LysTrpA, TryThrA, MaTrA and Trp3 are members of the Trp-rich family in P. falciparum; pypAg1 and pypAg3 are from P. yoelii; P.v396893, P.v 398040, P.v398132 and P.v 398701 are from P. vivax; P. k1689e08q1c and P.b 116c08q1c correspond to P. knowlesi and P. berghei respectively. The shading indicates the conserved positions of the Trp residues in the different Plasmodium genes. Trp (W), Phe (F) or Tyr (Y) residues at conserved positions are indicated in black. In the consensus sequence, h represents conserved hydrophobic residues (A, C, F, I, L, M, V, W and Y), C indicates charged residues (D, E, K, R and H) and + or − indicate positively or negatively charged residues (K and R; D and E) in at least 70% of the residues.

Between species one can find up to 85, 71 and 38% homology (chrPy1_01062 (P. yoelii) versus PB_116c08q1c (P. berghei), Pc_5923 (P. chabaudi) and Pk_1813d08q1c (P. knowlesi) respectively), but among paralogues the homology is between 25 and 30% in P. yoelii proteins and around 40% in P. knowlesi (not shown) and largely restricted to the Trp scaffold. A TBLAST search of all available plasmodial sequences on PlasmoDB showed that all species contain proteins with a Trp-rich scaffold, in various numbers: five are found in P. falciparum, and the numbers in the murine parasites are in the same range. In the preliminary P. vivax and P. knowlesi sequences, more than 15 can be found. These Trp-scaffold proteins are not related to any of the multigene families described so far.
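The N-terminal regularity of the Trp scaffold described earlier (W, two residues, W, seven residues, W, two residues, F, ten residues, and so on, ending in two Tyr or Phe positions seven residues apart) can be written as a regular expression and used to scan protein sequences. This is a minimal sketch and not the search procedure used in this study: the names `SCAFFOLD`, `scaffold_regex` and `find_scaffold` are hypothetical, and the pattern is applied strictly, so family members in which a Trp is replaced by another aromatic residue or in which the spacing differs would be missed.

```python
import re

# (conserved residue, number of arbitrary residues that follow), in the
# order given in the text; "[YF]" stands for the (Y/F) positions.
SCAFFOLD = [("W", 2), ("W", 7), ("W", 2), ("F", 10), ("W", 7),
            ("W", 2), ("W", 7), ("W", 2), ("[YF]", 7), ("[YF]", 0)]


def scaffold_regex(spec=SCAFFOLD):
    """Compile the spacing pattern into a regular expression."""
    parts = []
    for residue, gap in spec:
        parts.append(residue)
        if gap:
            parts.append(".{%d}" % gap)
    return re.compile("".join(parts))


def find_scaffold(protein: str):
    """Return (start, end) offsets of non-overlapping strict matches."""
    return [(m.start(), m.end()) for m in scaffold_regex().finditer(protein.upper())]


# Synthetic toy sequence built to satisfy the pattern (not a real protein).
toy = "MK" + "".join(
    ("Y" if residue == "[YF]" else residue) + "A" * gap for residue, gap in SCAFFOLD
) + "GG"
print(find_scaffold(toy))
```

A more tolerant scan could allow W, Y or F at every conserved position, at the cost of more spurious matches.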

Expression profiles of the family members

Gene expression analysis of the five members of the gene family was carried out using the Genesis 1.2.2 software (http://genome.tugraz.at). Raw data for this analysis were obtained from microarray analysis of tightly synchronised parasite cultures of the reference genome strain (3D7) of P. falciparum. A general picture of the gene expression profile over the 48 h of the asexual intra-erythrocytic developmental cycle shows that both TrpA-3 and LysTrpA are expressed mainly in the mid-stages of the asexual cycle, whereas TryThrA expression is more pronounced in earlier stages (Fig. 3). This corresponds well with the observation that TryThrA expression, as shown by immunoprecipitation, starts very early in parasite development (Uhlemann et al. 2001). The expression of two other genes (PFA0135w and PFD0100c) is highest in later stages. The gene product of PFA0135w, the merozoite-associated tryptophan-rich antigen (MaTrA), was shown by Western blot analysis to be expressed in later stages and merozoites (Ntumngia et al. 2004).
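The assignment of a gene to early, mid or late expression can be illustrated by locating the time point of maximal expression in a time course. The sketch below is not the Genesis analysis used here: the function names, the division of the 48 h cycle into three equal windows and the numerical values are all assumptions made for illustration, and the toy profile is synthetic rather than taken from the Bozdech et al. (2003) data set.

```python
from typing import Mapping, Optional


def peak_hour(profile: Mapping[int, Optional[float]]) -> int:
    """Return the time point (hours post-invasion) with the highest value.

    Time points without a measurement are given as None and are ignored.
    """
    observed = {hour: value for hour, value in profile.items() if value is not None}
    if not observed:
        raise ValueError("no observed time points")
    return max(observed, key=observed.get)


def stage_of(hour: int) -> str:
    """Map an hour of the 48 h cycle to a coarse stage (arbitrary thirds)."""
    if not 0 <= hour <= 48:
        raise ValueError("hour must lie within the 48 h cycle")
    if hour < 16:
        return "early"
    if hour < 32:
        return "mid"
    return "late"


# Synthetic toy profile: hour -> relative expression; None marks a missing sample.
toy_profile = {1: 0.2, 12: 0.9, 24: 1.5, 30: None, 36: -0.8, 48: -1.2}
hour = peak_hour(toy_profile)
print(hour, stage_of(hour))
```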

Fig. 3

a Photograph of a 1.5% agarose gel showing PCR products from genomic DNA (g) and cDNA (c). g1 and c1 represent products from PCR and RT-PCR respectively with primers flanking the stop codon at position 1,812, while g2 and c2 represent products with primers flanking the predicted intron at the 5′ end of the LysTrpA gene. Molecular weight markers are indicated in base pairs (bp). b g and c represent products from PCR and RT-PCR with primers flanking the predicted intron at the 5′ end of the TrpA-3 gene. Differences in band sizes between g and c indicate the sizes of the introns. c Expression profiles of the genes which constitute the Trp-rich protein gene family of P. falciparum. Each column indicates a different time point after merozoite invasion, covering the entire asexual 48 h cycle. Red and green represent positive and negative values of median respectively. The intensity of the red or green colour is a measure of the level of expression of the genes at the different time points (3 is a threefold induction; −3 is a threefold repression compared to a standard), as given below the time course. No samples were taken at the time point shown in grey. Raw data were taken from the publication by Bozdech et al. (2003).

Discussion

The sequences of two Trp-rich proteins of a novel Plasmodium falciparum multigene family have been reported (Uhlemann et al. 2001; Ntumngia et al. 2004). These are thought to be orthologues of two Plasmodium yoelii antigens. In this study, we have identified and characterised two further paralogues of these genes. They contain two exons each and, as is typical for non-coding regions in P. falciparum, the 3′ untranslated regions have a high A+T content. Both have a Trp-rich domain and a repetitive region. Despite the similarity in structure, the two antigens vary in different ways: LysTrpA contains a highly polymorphic repetitive region and very few single nucleotide polymorphisms (SNPs), whereas TrpA-3 seems to be invariant in length but has accumulated more SNPs. Based on this, we suggest that TrpA-3 could be under greater selection pressure than LysTrpA.

TrpA-3 and LysTrpA, like their paralogues TryThrA and MaTrA, contain highly conserved Trp-rich domains in all field and laboratory isolates studied so far. The main function of this domain in the parasite is not yet known. These sequences are very distinct from the Trp-containing domains of PfEMP1 and EBA-175. However, multiple Trp residues appear in functional domains of at least three malarial proteins that are involved in key processes, including the invasion of hepatocytes (CSP and TRAP) (Goundis et al. 1988), cytoadherence (PfEMP1) (Baruch et al. 1997) and merozoite invasion of red blood cells (EBA) (Sim et al. 1994; Adams et al. 1992). Trp-rich proteins are expressed throughout the asexual phase of the parasite life cycle, which suggests that these proteins may play an essential role in the development of the parasite within its host.

Multigene families have been shown to be a common phenomenon in plasmodia and many other organisms. It is well established that the multiple paralogues generated through amplification and divergence provide an organism with a set of related genes, allowing the fine tuning of biological activity according to the prevailing cellular, developmental or environmental conditions. Through this process, some of the genes acquire disabling elements that result in loss of function and become pseudogenes. Pseudogenes in P. falciparum have been described in the large PfEMP1 family, in the family of erythrocyte binding proteins (EBA) and for the ring-infected erythrocyte surface antigen (RESA). RESA-2 is a homologue of RESA from P. falciparum that contains a stop codon in the reading frame (Cappai et al. 1992). It was shown that RESA-2 was transcribed and that the mRNA was correctly spliced (Vazeux et al. 1993). Translation, however, was not shown in either report. Two further reports describe pseudogenes in which frameshift mutations generate stop codons (Triglia et al. 2001; Taylor et al. 2001). Both genes, which belong to different gene families, were transcribed but no protein was produced. Knock-out parasites lacking PfRH3 or PsiEBA165 were viable and phenotypically indistinguishable from wild-type parasites.

LysTrpA contains two stop codons, one of which is polymorphic. The existence of these stop codons in parasites that have never been in culture rules out the possibility that these mutations were introduced by artificial conditions. Translation through internal stop codons has long been known to occur in bacteria and viruses (Gesteland and Atkins 1996) and has also been shown in P. falciparum (Bischoff et al. 2000). The molecular mechanism underlying translational readthrough is still unknown.

Comparative genomic data from malarial parasites were recently published (Hall et al. 2005). The genes coding for Trp-rich proteins are either located in subtelomeric regions (TryThrA, MaTrA, TrpA-3) or classified as “inserts” (LysTrpA) within chromosome 13 of P. falciparum. The subtelomeric regions are pivotal locations for genes involved in antigenic variation in P. falciparum, P. chabaudi and P. vivax (Rubio et al. 1996; Fischer et al. 2003; Portillo et al. 2001). Given the limited number of Trp-rich proteins in P. falciparum, it is unlikely that they play a major role in antigenic variation in this species. However, given the expression in later stages and the localisation of members of the Trp-rich family on merozoites, it is attractive to speculate about a possible role of Trp-rich proteins in antigenic variation in P. vivax or P. knowlesi merozoites.