Introduction

Characterizing the genes and proteins of the malaria parasite will lead to a better understanding of parasite biology and may reveal possible therapeutic targets. Monoclonal antibodies (Mabs) were previously generated against proteins possibly located in an ER-like compartment (Cortés et al. 2003). In that study, we demonstrated that one of these antibodies, called Mab7, recognized a 68 kDa antigen in P. falciparum and other Plasmodium species. Fluorescence microscopy localized the 68 kDa antigen to a subcellular compartment that overlapped with the ER but was distinct from it in some respects. This compartment was subsequently named the Plasmodium export compartment (Wiser 2007). In addition, by mass spectrometry analysis of the immunopurified protein, we recently demonstrated that the antigen of Mab7 corresponds to the P. falciparum HSP70-2 protein (Cortés et al. 2020). The present work began as an attempt to confirm the identity of this protein by screening recombinant DNA expression libraries with Mab7. Here, we show that Mab7 identifies recombinant DNA clones that encode a larger-than-expected protein that is conserved in all Plasmodium species. This conserved gene proved not to encode the 68 kDa protein; nonetheless, the cloned gene is of interest because it is found only in Plasmodium and, likely, the closely related genus Hepatocystis. Another notable feature of this protein is a WD40 domain that is highly conserved across all Plasmodium species.

WD40 domains are found in a large number of eukaryotic proteins and participate in a wide range of functions, including cell division, RNA transcription and processing, vesicular trafficking, signal transduction, cytoskeleton assembly, and the ubiquitin-proteasome system (Stirnimann et al. 2010). In addition, WD40 domain-containing proteins play important roles in the assembly of dynamic multiprotein complexes and often function in cellular interaction networks. Despite the important roles of WD40 proteins in cellular physiology, little work has been done on WD40-domain proteins from the malaria parasite. Chahar et al. (2015) carried out an extensive in silico analysis of WD40-domain proteins in Plasmodium falciparum and identified 80 putative WD40-domain proteins. Orthologs from other organisms were identified for most of these proteins; however, fifteen of them appear to be unique to the genus Plasmodium. One of these unique WD40-domain proteins has been shown to be involved in the formation of adhesion protein complexes during the blood stage of the parasite (von Bohl et al. 2015). Likewise, little is known about WD40-domain proteins in other apicomplexans. Toxoplasma coronin, an actin-binding protein, has been crystallized (Kallio and Kursula 2014), and a myosin from Gregarina polymorpha contains a WD40 domain (Heintzelman and Mateer 2008). Both of these WD40 proteins may be involved in gliding motility, a process unique to the Apicomplexa. Clearly, more work on the WD40 proteins of Apicomplexa and the malaria parasite is needed. Because WD40 domain-containing proteins are poorly characterized in Plasmodium and other Apicomplexa, and WD40 proteins are potential drug targets (Song et al. 2017), we characterized this unique Plasmodium protein further.

Methods

Gene cloning and sequencing

Recombinant cDNA expression libraries were obtained from Nirbhay Kumar and have been described previously (Zhang et al. 1999). Briefly, these libraries were constructed from gametocyte-enriched P. falciparum (NF54) parasites. RNA was converted into cDNA and ligated into the lambda ZAPII bacteriophage vector. For screening, the recombinant bacteriophage was mixed with Escherichia coli XL-1 Blue in top agarose, plated on bacteriological agar plates, and incubated at 37 °C to form bacterial lawns. When plaques first became visible, the agar/agarose plates were overlaid with polyvinylidene difluoride (PVDF) membranes saturated with 10 mM isopropyl β-D-1-thiogalactopyranoside (IPTG) and incubated for an additional 14–16 h at 37 °C. A total of 5.3 × 10⁵ plaques were screened.

The PVDF membranes from the plaque lifts were subjected to immunoblotting with monoclonal antibody 7 (Mab7). Mab7 was generated using a membrane fraction from P. falciparum-infected erythrocytes and recognizes a 68 kDa parasite protein (Cortés et al. 2003). Positive plaques were detected with an enzyme-conjugated anti-mouse IgG and developed as previously described (Cortés et al. 2003). Positive clones were plaque purified, and the inserts were recovered as phagemids by in vivo excision using M13 helper phage (Short et al. 1988). The purified recombinant phagemids were sent to the Instituto de Genética de la Universidad Nacional de Colombia for DNA sequencing with fluorescent dye terminators.

Sequence analysis

The sequence of the cloned cDNA insert was used for BLAST searches (https://blast.ncbi.nlm.nih.gov) of the sequences available at the National Center for Biotechnology Information (NCBI). The most recent search was carried out in January 2021. Orthologs of this gene were identified through additional BLAST searches and through PlasmoDB (Aurrecoechea et al. 2009). Multiple sequence alignments were carried out using CLUSTALW 2.1 (Larkin et al. 2007) and pairwise alignments using the SIM alignment tool (Huang and Miller 1991). The original alignment containing all species was refined using alignments carried out within a single parasite clade as guides. PROSITE (Sigrist et al. 2013) and SPARCLE (Marchler-Bauer et al. 2017) were used to identify domains and functional sites within the sequences. Low-complexity sequence was analyzed according to Wootton (1994). Transmembrane helices were predicted using TMHMM (Sonnhammer et al. 1998). WD40 domains were identified using the WD40-repeat protein structure predictor (WDSP) tool (Ma et al. 2019).

Protein structure modeling

Protein modeling was carried out with Protein Homology/analogY Recognition Engine version 2.0 (Phyre2) (Kelley et al. 2015), SWISS-MODEL (Waterhouse et al. 2018), SPARKS-X (Yang et al. 2011), or QUARK (Xu and Zhang 2012). Structures were visualized with EzMol (Reynolds et al. 2018) or the PyMOL Molecular Graphics System, Version 2.0 (Schrödinger, LLC).

Epitope prediction

The sequence of the antigen-binding region of Mab7 was determined so that epitope prediction studies could be carried out. Frozen hybridoma cells were supplied to GenScript ProBio (Piscataway, NJ), which carried out RNA extraction and the amplification and sequencing of the regions encoding the variable domains of the light and heavy chains. The translated sequences of the variable domains were then used to generate a three-dimensional (3D) model of the antigen-binding site of Mab7 with Lymphocyte Receptor Automated Modelling (LYRA) (Klausen et al. 2015). This Mab7 model was then used with EpiPred (Krawczyk et al. 2014) to search for potential epitopes in 3D models of Plasmodium HSP70-2 and of the translated cloned DNA fragment.

Three-dimensional models of HSP70-2 from P. falciparum (GenBank accession number CAD51861.1) and P. berghei (Swiss-Prot accession number Q25642) were generated with Phyre2 (Kelley et al. 2015) using human BiP (PDB ID 5E84) as the template. SPARKS-X (Yang et al. 2011) was used to search for proteins with possible structural similarity to the protein sequence of the cloned DNA fragment. A model generated using outer surface protein A of Borreliella burgdorferi (PDB ID 2FKJ) as the template was used to predict the epitope of Mab7. Other templates generating similar epitopes (data not shown) include endoglycosidase S from Streptococcus pyogenes (PDB ID 6EN3), vacuolar ATPase of Saccharomyces cerevisiae (PDB ID 6O7V), and the spike protein of the type VI secretion system from Francisella tularensis (PDB ID 6U9E).

Gene identifiers

The gene ID number of the protein described in this study is PF3D7_1228800 (P. falciparum 3D7). Orthologs of this gene include AK88_02334 (P. fragile strain Nilgiri), C922_03474 (P. inui San Antonio 1), PADL01_1229400 (P. adleri G01), PBANKA_1443500 (P. berghei ANKA), PBILCG01_1228800 (P. billcollinsi G01), PBLACG01_1228200 (P. blacklocki G01), PCHAS_1445700 (P. chabaudi chabaudi), PCOAH_00053960 (P. coatneyi Hackeri), PCYB_145450 (P. cynomolgi strain B), PcyM_1450400 (P. cynomolgi strain M), Pf7G8_120034300 (P. falciparum 7G8), PfCD01_120033700 (P. falciparum CD01), PfDd2_120033800 (P. falciparum Dd2), PfGA01_120033500 (P. falciparum GA01), PfGB4_120034200 (P. falciparum GB4), PfGN01_120034800 (P. falciparum GN01), PfHB3_120033600 (P. falciparum HB3), PfIT_120033900 (P. falciparum IT), PfKE01_120033800 (P. falciparum KE01), PfML01_120034200 (P. falciparum ML01), PfSD01_120033700 (P. falciparum SD01), PfSN01_120034400 (P. falciparum SN01), PfTG01_120033700 (P. falciparum TG01), PGABG01_1227600 (P. gaboni strain G01), PGAL8A_00129900 (P. gallinaceum 8A), PGSY75_1228800 (P. gaboni strain SY75), PKNH_1448200 (P. knowlesi strain H), PKNOH_S140266300 (P. knowlesi strain Malayan Strain Pk1 A), PmUG01_14063100 (P. malariae UG01), PocGH01_14054500 (P. ovale curtisi GH01), PPRFG01_1226000 (P. praefalciparum strain G01), PRELSG_1444400 (P. relictum SGS1-like), PRG01_1232100 (P. reichenowi G01), PVL_000196200 (P. vivax-like Pvl01), PVP01_1447000 (P. vivax P01), PVX_124070 (P. vivax Sal-1), PY01983 (P. yoelii yoelii 17XNL), PY17X_1446000 (P. yoelii yoelii 17X), PYYM_1447600 (P. yoelii yoelii YM), YYE_00892 (P. vinckei vinckei strain vinckei), and YYG_00464 (P. vinckei petteri strain CR).

Gene identifiers for proteins possibly interacting with the WD40 repeat protein include PF3D7_0416400 (histone acetyltransferase), PF3D7_0612100 (eIF3 subunit L), PF3D7_0702100 (PHISTb family protein), PF3D7_0802100 (APIAP2), PF3D7_0823300 (histone acetyltransferase GCN5), PF3D7_0935500 (Plasmodium export protein, PFI1715W), PF3D7_1246200 (actin 1), and PF3D7_1360700 (SUMO E3 ligase).

Results and discussion

In an attempt to identify the 68 kDa antigen of P. falciparum, a cDNA expression library was screened with Mab7. From a total of 5.3 × 10⁵ plaques analyzed, two positive clones were ultimately identified. Figure 1 shows the reactivity of Mab7 with plaques from one of these clones, which gave a strong positive signal. No positive plaques were detected in the negative control lacking Mab7. The phagemids from both clones (named clones 5 and 6) were excised and sequenced, and both clones contained identical 404 bp inserts. This sequence was deposited in GenBank under accession number MK618679, and its translation is given on pages 32–33 of the Supplemental Data. Searching GenBank with the MK618679 sequence identified a P. falciparum gene on chromosome 12 annotated as a putative WD40 repeat-containing protein (gene ID PF3D7_1228800). This gene consists of a single exon of 9630 nucleotides that encodes a protein of 3209 amino acids with a predicted molecular mass of 380 kDa. This protein was previously identified in a survey of WD40-repeat proteins in Plasmodium but was not further characterized (Chahar et al. 2015). We propose to call this protein Plasmodium WD40 repeat-containing protein-1 (PlasWD40-1) until more information about it is available.
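The protein length follows directly from the exon length. As a quick consistency check, and assuming (as the numbers imply) that the 9630 nucleotides include the terminal stop codon:

```latex
\[
\frac{9630\ \text{nt}}{3\ \text{nt per codon}} = 3210\ \text{codons}
  = 3209\ \text{sense codons} + 1\ \text{stop codon}
  \;\Rightarrow\; 3209\ \text{amino acids}
\]
```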

Fig. 1

Expression of a recombinant protein recognized by Mab7. Clone 6, derived from the screening of the recombinant cDNA library, was subjected to three rounds of plaque purification. Shown are plaque lifts (upper) from the third plaque purification, analyzed by immunoblotting with Mab7 (left) or without antibody as a control (right). Below the plaque lifts are images of the agar/agarose plates from which the lifts were generated. All plaques on the plate exhibited a strong positive signal in the lift analyzed with Mab7, and no plaques exhibited a positive signal in the no-antibody negative control. This demonstrates that Mab7 recognizes a protein expressed by the cloned DNA fragment (GenBank accession number MK618679) and that the recombinant DNA clone is pure

PlasWD40-1 is substantially larger than the 68 kDa protein recognized by Mab7. Furthermore, mass spectrometry analysis has recently identified the 68 kDa protein as the ER-resident heat shock protein 70 of P. falciparum, PfHSP70-2 (Cortés et al. 2020). Thus, the detection of clones 5 and 6 by Mab7 is likely due to cross-reaction of Mab7 with the protein expressed by the cloned cDNA fragment. In other words, Mab7 recognizes not only PfHSP70-2 but also a protein expressed by the cloned DNA fragment (MK618679).

To investigate a possible epitope shared by both proteins, their amino acid sequences were compared. A pairwise alignment identified a 38 amino acid segment with 29% sequence identity (Fig. 2). The variable-domain sequences of Mab7 were then used to generate a 3D model of its antigen-binding site, and this model was used to search for conformational epitopes in 3D models of PfHSP70-2 and MK618679. A 12 amino acid epitope was identified in PfHSP70-2 (Fig. 2, highlighted in orange). As expected, because Mab7 reacts equally well with HSP70-2 from P. falciparum and P. berghei (Cortés et al. 2003), the same epitope was also identified in the P. berghei ortholog PbHSP70-2 (data not shown). Two potential epitopes, six and seven amino acids long (highlighted in yellow), that overlap with the predicted PfHSP70-2 epitope were identified in MK618679. Similar epitopes were identified in three other models generated from the sequence of the cloned insert (data not shown). These overlapping potential epitopes may explain the reactivity of Mab7 with both PfHSP70-2 and the MK618679 insert product.
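The identity figure is the number of identical aligned positions divided by the alignment length: 29% identity over 38 residues corresponds to 11 identical positions (11/38 ≈ 28.9%, which rounds to 29%). A minimal Python sketch of that calculation follows. The aligned strings are hypothetical placeholders, not the actual PfHSP70-2 and MK618679 segments, and counting gap columns in the denominator is one of several conventions; it is not necessarily the one used by the SIM tool.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity between two equal-length aligned sequences.

    Gap characters ('-') never count as identities, but gap columns are
    kept in the denominator (alignment length), one common convention.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    if not aligned_a:
        raise ValueError("empty alignment")
    identical = sum(
        1 for x, y in zip(aligned_a, aligned_b) if x == y and x != "-"
    )
    return 100.0 * identical / len(aligned_a)


# Hypothetical 38-column alignment with 11 identical positions.
seq_a = "A" * 11 + "K" * 27
seq_b = "A" * 11 + "E" * 27
print(round(percent_identity(seq_a, seq_b), 1))  # 28.9
```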

Fig. 2

Predicted epitopes of PfHSP70-2 and the cloned sequence. A pairwise alignment between PfHSP70-2 (gene ID PF3D7_0917900) and the translation of the cloned DNA fragment (GenBank accession number MK618679) was carried out. Shown is the highest-scoring overlap between the two sequences, consisting of 38 residues with 29% identity (identical residues denoted with asterisks *). Predicted epitopes recognized by Mab7 are highlighted in orange and yellow

Expression of PlasWD40-1

The cloning of PlasWD40-1 from a cDNA library prepared from parasite mRNA indicates that the gene is expressed in blood-stage parasites. This is corroborated by several transcriptome studies. Messenger RNA levels of the gene peak during the late trophozoite stage (Llinás et al. 2006; Rovira-Graells et al. 2012). The gene is also expressed in gametocytes (López-Barragán et al. 2011) and in sexually committed parasites (Pelle et al. 2015). Transcripts are also detected in the ookinete stage (López-Barragán et al. 2011), as well as in oocyst and sporozoite stages (Zanghì et al. 2018). Similar expression profiles are observed for the orthologs from rodent malaria parasites (Otto et al. 2014), P. knowlesi (Lapp et al. 2015), and P. vivax (Zhu et al. 2016). During the blood stage, the transcript is also associated with polysomes (Bunnik et al. 2013), implying that the gene is translated as well as transcribed. Furthermore, proteomic analyses using mass spectrometry have detected peptides of PlasWD40-1 in blood stages as well as in sporozoites (https://plasmodb.org). Together, these results indicate that the gene is expressed in all life cycle stages examined. Insertional mutagenesis with the piggyBac transposon suggests that the gene is dispensable during in vitro blood-stage culture (Zhang et al. 2018). However, the presence of orthologs in all Plasmodium species argues that the gene is required for some aspect of the parasite's life cycle.

Orthologs in other species

The amino acid sequence of PlasWD40-1 was used to search GenBank, and orthologs were identified in all other Plasmodium species. However, no homologous proteins were identified in non-Plasmodium species, except for a partial sequence of 363 amino acids from Hepatocystis (GenBank accession number VWU53150.1). Hepatocystis is generally placed as a sister group to mammalian Plasmodium species (Martinsen et al. 2008; Galen et al. 2018) and thus is often considered to be within the Plasmodium clade. Complete genomes are available for many other Apicomplexa, such as Toxoplasma, Cryptosporidium, Theileria, and Babesia, and no homologs were found in these species. PlasWD40-1 belongs to ortholog group OG6_158520 (Chen et al. 2006), which also includes sequences from seven non-Plasmodium species. However, the inclusion of these non-Plasmodium species in the ortholog group is likely an artifact, since they share only short regions of homology with PlasWD40-1 (data not shown). Therefore, the PlasWD40-1 gene appears to be unique to the genus Plasmodium. Nevertheless, the Haemosporida have not been widely sampled at the genomic level, and this gene may also be present in other members of the Haemosporida.

The amino acid sequences from 17 Plasmodium species were aligned (Supplemental Data), and this alignment was used to generate an unrooted phylogenetic tree (Fig. 3). The phylogenetic analysis revealed four major branches, designated great ape, simian, avian, and rodent. The P. malariae and P. ovale sequences did not associate strongly with any of these groups and branched near the base. This tree agrees closely with what is known about the evolution of the genus Plasmodium, and others have reported trees with similar topologies and the same four major groups (Loy et al. 2017). For example, P. falciparum, P. reichenowi, and P. gaboni have long been grouped together in the subgenus Laverania, and their evolutionary relationships are well characterized (Sundararaman et al. 2016). Similarly, P. vivax forms a clade with malaria parasites of Asian macaques (Cornejo and Escalante 2006), the rodent parasites form a monophyletic group (Ramiro et al. 2012), and the avian parasites split from the mammalian species (Böhme et al. 2018). In addition, the PlasWD40-1 gene is located on chromosome 12 in the great ape species and on chromosome 14 in the other species, consistent with the previously noted synteny between these two chromosomes (Tachibana et al. 2012).

Fig. 3

Phylogenetic relationships of the WD40 repeat-containing protein among Plasmodium species. Shown is an unrooted neighbor-joining tree derived from the CLUSTALW 2.1 alignment of the WD40 repeat-containing protein from P. falciparum (Pfal, CZT99447.1), P. reichenowi (Prei, CDO65564.1), P. gaboni (Pgab, SOV16446.1), P. vivax (Pviv, EDL47649.1), P. cynomolgi (Pcyn, GAB69117.1), P. fragile (Pfra, KJP88059.1), P. coatneyi (Pcoa, ANQ11029.1), P. inui (Pinu, EUD66004.1), P. knowlesi (Pkno, CAQ42237.1), P. berghei (Pber, CDS52152.1), P. chabaudi (Pcha, CDR16910.1), P. vinckei (Pvin, EUD74513.1), P. yoelii (Pyoe, ETB61791.1), P. ovale (Pova, SCQ17100.1), P. malariae (Pmal, SCP03571.1), P. relictum (Prel, CRH02710.1), and P. gallinaceum (Pgal, CRG93592.1). Four major groups are observed: great ape (blue), simian (green), avian (purple), and rodent (orange). P. malariae and P. ovale do not show a strong affinity for any of the four groups
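Fig. 3 is a neighbor-joining tree. The plain-Python sketch below illustrates that algorithm, not the CLUSTALW implementation: at each step it joins the pair minimizing Q(a, b) = (n − 2)d(a, b) − Σd(a, ·) − Σd(b, ·), assigns branch lengths to the two joined nodes, and replaces them with a new internal node. It assumes a precomputed distance matrix (for sequences this might be, for example, 1 minus fractional identity, which is an assumption here), and the five-taxon matrix in the example is a textbook toy, not distances between PlasWD40-1 sequences. Internal node names (node0, node1, ...) are arbitrary labels.

```python
from itertools import combinations


def neighbor_joining(names, dist):
    """Neighbor-joining on a symmetric distance matrix.

    Returns (joins, last_edge): joins lists (node_a, node_b, branch_a,
    branch_b) in the order pairs were merged; last_edge is
    (node_a, node_b, length) connecting the final two nodes.
    """
    nodes = list(names)
    d = {(a, b): float(dist[i][j])
         for i, a in enumerate(nodes) for j, b in enumerate(nodes)}
    joins = []
    counter = 0
    while len(nodes) > 2:
        n = len(nodes)
        total = {a: sum(d[a, b] for b in nodes) for a in nodes}
        # Pair minimizing Q; on ties, min() keeps the first pair encountered.
        a, b = min(combinations(nodes, 2),
                   key=lambda p: (n - 2) * d[p] - total[p[0]] - total[p[1]])
        branch_a = d[a, b] / 2 + (total[a] - total[b]) / (2 * (n - 2))
        branch_b = d[a, b] - branch_a
        joins.append((a, b, branch_a, branch_b))
        u = f"node{counter}"
        counter += 1
        rest = [c for c in nodes if c not in (a, b)]
        # Distance from the new internal node to every remaining node.
        for c in rest:
            d[u, c] = d[c, u] = (d[a, c] + d[b, c] - d[a, b]) / 2
        d[u, u] = 0.0
        nodes = rest + [u]
    a, b = nodes
    return joins, (a, b, d[a, b])


names = ["a", "b", "c", "d", "e"]
toy = [
    [0, 5, 9, 9, 8],
    [5, 0, 10, 10, 9],
    [9, 10, 0, 8, 7],
    [9, 10, 8, 0, 3],
    [8, 9, 7, 3, 0],
]
joins, last_edge = neighbor_joining(names, toy)
print(joins[0])  # ('a', 'b', 2.0, 3.0)
```

Because ties are broken by taking the first pair, other programs may resolve equal Q values differently, which can change the topology when distances tie exactly.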

Complete sequences of PlasWD40-1 from 13 isolates of P. falciparum are available from PlasmoDB and other databases. These P. falciparum sequences are essentially identical, with the differences between isolates found almost exclusively in the number of tandem repeats in low-complexity regions (data not shown). One exception is a 33 amino acid segment that is present in some isolates but is annotated as a putative intron in others. These 33 amino acids are part of a region conserved in the great ape group (Supplemental Data, pages 26–27), and we therefore believe that the annotation of this putative intron is in error. The apparent need for the intron to maintain the reading frame arises from variation in the number of adenosines in a poly-A tract, which is a common sequencing error.
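How a poly-A length error can make a spurious intron look necessary is easy to illustrate with a toy example. The Python sketch below translates a short, made-up coding sequence (not taken from PF3D7_1228800) using the standard genetic code; a single extra A in the poly-A tract shifts every downstream codon and introduces a premature stop (*), which an annotation pipeline might then "repair" by calling a small intron.

```python
# Standard genetic code (NCBI translation table 1), codons ordered T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(dna: str) -> str:
    """Translate reading frame 1 of a DNA string ('*' marks stop codons).

    Trailing bases that do not form a complete codon are ignored; only
    A, C, G and T are accepted.
    """
    dna = dna.upper()
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna) - 2, 3))


downstream = "GATAGCTGGTAA"                      # hypothetical sequence
reference = "ATG" + "A" * 9 + downstream         # in-frame poly-A tract
one_extra_a = "ATG" + "A" * 10 + downstream      # one extra adenosine
print(translate(reference))    # MKKKDSW*
print(translate(one_extra_a))  # MKKKR*LV
```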

Sequence alignments of PlasWD40-1 from species within each of the four Plasmodium clades were also carried out. These clade-specific alignments were then used to refine the original alignment with all 17 species. The combined alignments revealed three major regions of PlasWD40-1 based upon sequence homology and structural features (Fig. 4). These three regions are (1) a conserved N-terminal region containing the WD40 domain, (2) a quasi-conserved central region containing a large amount of low-complexity sequence, and (3) a partially conserved C-terminal region.

Fig. 4

Structure of the WD40 repeat-containing protein. A schematic representation of the WD40 repeat protein based on the alignment (Supplemental Data). Blocks in red are highly conserved in all species; blocks in pink are partially conserved in all species; blocks in green are conserved among species in the simian group; blocks in blue are conserved among species in the great ape group; blocks in purple are conserved among species in the avian group; blocks in orange are conserved among species in the rodent group; and blocks in gray are not conserved. The protein is divided into three major regions: a conserved N-terminal region, a central quasi-variable region, and a partially conserved C-terminal region. The conserved N-terminal region is composed primarily of a WD40 domain consisting of seven WD40 repeats. A zinc protease motif (Zn) is also found at the end of the conserved N-terminal region in most species. In the quasi-variable region, there is little sequence conservation across all species; however, there is substantial sequence conservation among the species within a group. This region contains three small stretches of conserved sequence: a 20-residue region of primarily hydrophobic amino acids (H) and two 12-residue regions of partially or highly conserved sequence. A large portion of the quasi-variable region is predicted to form a continuous block of disordered structure in all species. The C-terminal region contains interspersed blocks of sequence that are conserved, partially conserved, or conserved within a group

WD40 repeat domain

Seven WD40 repeats are found at the beginning of PlasWD40-1 (Supplemental Data, pages 2–7). These repeats exhibit the characteristic features of WD40 repeats (Wang et al. 2013). For example, the repeats are approximately 40 amino acids long and form four anti-parallel β-strands, and many of them contain the tryptophan-aspartate (WD) dipeptide that gives the WD40 repeat its name. In addition, a conserved tetrad of aspartate (D), histidine (H), serine (S), and tryptophan (W) is present in most of the repeats; the hydrogen bonds formed by this DHSW tetrad stabilize the β-sheet. Furthermore, most WD40 domains are composed of seven WD40 repeats and fold into a seven-bladed propeller-like structure in which each blade is a four-stranded anti-parallel β-sheet (Jain and Pandey 2018). Each propeller blade is formed from three β-strands of one repeat and the N-terminal β-strand of the following WD40 repeat. The last blade is therefore formed from the three remaining β-strands of the last repeat and the N-terminal β-strand of the first repeat. This closes the ring and stabilizes the WD40 domain.
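The strand bookkeeping behind this ring closure can be made explicit with a short Python sketch. It simply encodes the rule stated in the text for a seven-repeat domain; it does not predict strand boundaries, and the (repeat, strand) labels, with strand 1 as the N-terminal strand of each repeat, are illustrative.

```python
def propeller_blades(n_repeats: int = 7) -> list[list[tuple[int, int]]]:
    """Assign the four beta-strands of each WD40 repeat to propeller blades.

    Strands are labelled (repeat, strand), where strand 1 is the N-terminal
    strand of that repeat. Blade k takes strands 2-4 of repeat k plus
    strand 1 of repeat k + 1; the last blade wraps around to strand 1 of
    repeat 1, which closes the ring. Strands are listed in sequence order.
    """
    blades = []
    for k in range(1, n_repeats + 1):
        next_repeat = k % n_repeats + 1
        blades.append([(k, 2), (k, 3), (k, 4), (next_repeat, 1)])
    return blades


print(propeller_blades()[-1])  # [(7, 2), (7, 3), (7, 4), (1, 1)]
```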

These WD40 repeats were readily identified with the WDSP tool (Wang et al. 2013). The average score for the WD40 domain from all 17 species was 72.1, which is in the high-confidence range. The individual repeats range in size from 39 to 49 amino acids, and six of the seven repeats have WDSP scores greater than 70 (Table 1). Scores greater than 62 are considered high confidence, scores between 44 and 62 medium confidence, and scores less than 44 low confidence (Ma et al. 2019). The individual repeats are highly conserved across Plasmodium species, with 51–78% sequence identity among the 17 species; if conservative substitutions are included, identical plus similar residues range from 83 to 98%. Within a clade, the sequence identity of the WD40 repeats approaches 100%. This degree of conservation is remarkable, because WD40 sequence motifs generally do not exhibit high levels of sequence identity (Stirnimann et al. 2010). Considering that avian and mammalian Plasmodium species may have diverged between 10 million (Böhme et al. 2018) and 64 million (Silva et al. 2015) years ago, this high level of conservation implies strong selective pressure to maintain the sequence. In addition, the partial PlasWD40-1 sequence from Hepatocystis shares a high level of sequence identity with WD40 repeats 3–6 (Supplemental Data, pages 3–6).
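The WDSP confidence bands and the distinction between identity and identity-plus-similarity can be expressed as a short Python sketch. Boundary scores (exactly 62 or 44) are classed as medium, following the wording of the text. The similarity groups and the column-wise way of counting conservation are assumptions made for illustration; the percentages reported for PlasWD40-1 may have been calculated differently, and the toy alignment is hypothetical.

```python
def wdsp_confidence(score: float) -> str:
    """Confidence band for a WDSP score: > 62 high, 44-62 medium, < 44 low."""
    if score > 62:
        return "high"
    if score >= 44:
        return "medium"
    return "low"


# Hypothetical similarity groups (an assumption, loosely modelled on commonly
# used "strong" conservation groups). Residues sharing a group are treated
# as conservative substitutions.
SIMILAR_GROUPS = ["STA", "NEQK", "NHQK", "NDEQ", "QHRK", "MILV", "MILF", "HY", "FYW"]


def column_conservation(alignment: list[str]) -> tuple[float, float]:
    """Return (% identical columns, % identical-or-similar columns).

    A column is identical if every sequence has the same residue and similar
    if all residues fall within one group. Columns with a gap count as neither.
    """
    length = len(alignment[0])
    if any(len(s) != length for s in alignment):
        raise ValueError("all aligned sequences must have the same length")
    identical = similar = 0
    for column in zip(*alignment):
        residues = set(column)
        if "-" in residues:
            continue
        if len(residues) == 1:
            identical += 1
            similar += 1
        elif any(residues <= set(group) for group in SIMILAR_GROUPS):
            similar += 1
    return 100.0 * identical / length, 100.0 * similar / length


print(wdsp_confidence(72.1))                                # high
print(column_conservation(["WDLKV", "WDIRV", "WELKV"]))     # (40.0, 100.0)
```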

Table 1 WDSP scores and homology across the seven WD40 repeats

Divergent sequences are found between WD40 repeats two and three (Supplemental Data, pages 2–3), repeats four and five (Supplemental Data, page 4), and repeats six and seven (Supplemental Data, page 6). These inter-WD40-repeat sequences are largely conserved within the four clades, but not conserved across the four Plasmodium clades. Similarly, the Hepatocystis sequence between repeats four and five and repeats six and seven is distinct from the Plasmodium sequences. These inserted sequences between the WD40 repeats tend to be of low complexity and blocks of tandem repeats are sometimes observed. In addition, P. vivax has a glycine-rich insert of 16 residues between WD40 repeats five and six (Supplemental Data, page 5). The inserts between the WD40 repeats would be exposed on the bottom face of the WD40 domain (Wang et al. 2013). The significance of these inserts within the WD40 domain is not clear.

The WD40 domains of the PlasWD40-1 sequences were subjected to 3D modeling with Phyre2 and SWISS-MODEL. Both 3D modeling programs readily identified the WD40 domain of PlasWD40-1 and a wide variety of WD40 domain-containing proteins were identified as templates. Many of the resulting models exhibited the canonical seven-bladed propeller (Jain and Pandey 2018) with each of the blades being formed from three β-strands of one WD40 repeat and the N-terminal β-strand of the adjacent WD40 repeat (Fig. 5). Therefore, the modeling further supports the validity of this WD40 domain. Since the primary function of the WD40 domain is to bind other proteins, the high level of conservation of the WD40 domain in both sequence and structure implies that PlasWD40-1 binds to other protein(s) that may also be conserved in the genus Plasmodium.

Fig. 5

Modeling of the WD40 domain from PlasWD40-1. Shown are representative examples of models generated with Phyre2 using the receptor for activated C-kinase 1 (PDB ID 3DM0) as template from each of the four Plasmodium clades (as indicated). The colors of the seven WD40 repeats are yellow (WD40-1), green (WD40-2), turquoise (WD40-3), light blue (WD40-4), dark blue (WD40-5), purple (WD40-6), and maroon (WD40-7). Red indicates positions where the inter-WD40-repeat sequences were manually removed before modeling, and orange indicates positions where sequence was removed by the modeling program. Remaining sequence is in light gray

Zinc protease motif

Most WD40-domain proteins also possess additional domains with other functional activities (Jain and Pandey 2018). A search for known motifs and domains in PlasWD40-1 revealed a zinc protease motif at the end of the conserved N-terminal region in most species (Fig. 4). This zinc protease motif is found within a highly conserved stretch of 39 amino acids that is separated from the WD40 domain by a short region of sequence conserved within clades (Supplemental Data, page 8). Zinc protease motifs are defined by a core element of HExxH and, more stringently, by [GSTALIVN]-{PCHR}-{KND}-H-E-[LIVMFYW]-{DEHRKP}-H-{EKPC}-[LIVMFYWGSPQ], where residues enclosed in [] are allowed and residues enclosed in {} are not allowed (Rawlings and Barrett 1995). The two histidines (H) bind the zinc atom, and the glutamate (E) acts as the catalytic base that activates the zinc-bound water molecule serving as the nucleophile. P. coatneyi has a conservative replacement of the active-site glutamate (E) in the core element (HDxxH vs HExxH). This substitution of aspartate (D) for glutamate (E) would likely have a minimal effect on the presumptive protease activity. However, members of the great ape clade (subgenus Laverania) have PDxxH as the core element, as well as an asparagine-rich insert immediately before the core element. Proline is excluded from most positions of the zinc protease motif, possibly because it would disrupt the α-helix that includes the HExxH motif. The proline substitution and the poly-asparagine insertion raise questions about the validity and universality of this putative zinc protease motif, especially in the species of the great ape clade.
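The stringent motif is written in PROSITE pattern syntax. The Python sketch below converts that syntax into a regular expression and scans a protein sequence for it. The converter handles only the PROSITE elements needed here (allowed and forbidden residue sets, x, repeat counts, terminal anchors), and the test sequences are hypothetical. Because the pattern requires H-E at its core, a strict match would not report the HDxxH or PDxxH variants described in this section.

```python
import re


def prosite_to_regex(pattern: str) -> str:
    """Translate a simple PROSITE pattern into a Python regular expression.

    Supports [..] (allowed residues), {..} (forbidden residues), x (any
    residue), repeat counts such as x(2) or x(2,4), and the < and > terminal
    anchors. Less common PROSITE features are not handled.
    """
    parts = []
    for element in pattern.strip().rstrip(".").split("-"):
        prefix = suffix = ""
        if element.startswith("<"):
            prefix, element = "^", element[1:]
        if element.endswith(">"):
            suffix, element = "$", element[:-1]
        count = ""
        repeat = re.fullmatch(r"(.+?)\((\d+(?:,\d+)?)\)", element)
        if repeat:
            element, count = repeat.group(1), "{" + repeat.group(2) + "}"
        if element == "x":
            core = "."
        elif element.startswith("{"):
            core = "[^" + element[1:-1] + "]"
        else:
            core = element  # single residue or [..] class: same syntax in regex
        parts.append(prefix + core + count + suffix)
    return "".join(parts)


# Stringent zinc protease pattern quoted in the text (Rawlings and Barrett 1995).
ZINC_PROTEASE = "[GSTALIVN]-{PCHR}-{KND}-H-E-[LIVMFYW]-{DEHRKP}-H-{EKPC}-[LIVMFYWGSPQ]"
# Wrapping the pattern in a lookahead lets overlapping matches be reported.
ZINC_RE = re.compile("(?=(" + prosite_to_regex(ZINC_PROTEASE) + "))")


def find_zinc_motifs(protein: str) -> list[tuple[int, str]]:
    """Return (1-based start, matched segment) for each motif occurrence."""
    return [(m.start() + 1, m.group(1)) for m in ZINC_RE.finditer(protein.upper())]


print(find_zinc_motifs("MKAAVHELGHALKK"))  # [(3, 'AAVHELGHAL')]
print(find_zinc_motifs("MKAAVPDLGHALKK"))  # [] (PDxxH core does not match)
```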

Initial modeling of the region containing the zinc protease motif did not identify any known structures that could serve as templates. Therefore, an ab initio approach was used to model this region. Generally, the xxxHExxHxx motif is found within an α-helix and the two histidine residues coordinate with the zinc atom (Rawlings and Barrett 1995). Some of the predicted models had the HExxH active site adjacent to an α-helix and in a region with several α-helices (Fig. 6), as is common in metalloproteases. Considering that the ab initio approach is based on probabilities of folding, and since the motif is surrounded by α-helices, it is reasonable to assume that the Zn-protease motif is within an α-helix. In addition, the critical histidine and glutamate residues are near the zinc atom suggesting the possibility of an intact active site. Furthermore, a groove that could serve as a substrate-binding site is found near the putative active site. As expected, the P. falciparum model (Fig. 6) is rather disordered in the proximity of the putative active site due to the poly-asparagine region adjacent to the putative zinc-binding site and the substitution of PDxxH for the canonical HExxH. Experimental studies are needed to resolve the inconsistencies of the modeling data with the sequence data and to determine if PlasWD40-1 has zinc-binding or metalloprotease activity.

Fig. 6

Modeling of the putative zinc protease motif. Structures were built following the ab initio approach using the QUARK tool. The zinc atom (dark gray sphere) was inserted with PyMOL. Shown are the hypothetical structural models of the Zn-protease motif plus 100 residues on each side of the canonical xxxHExxHxx sequence (blue) from PlasWD40-1 of P. relictum and P. falciparum. Dotted boxes show enlargements of the canonical HExxH motif and the Zn atom. The dashed ellipse in the P. relictum model denotes a groove that could function as a substrate-binding site. The golden-brown regions in the P. falciparum model denote the poly-asparagine insert found in the great ape group

Low-complexity disordered region

Following the highly conserved N-terminal region is a central region that exhibits little homology across all species. However, within each of the four major clades, there is considerable homology (note the amount of green, blue, purple, and orange in the central region of Fig. 4). Most of the variation in this central region is found in the simian group, and much of it reflects the two major branches within this group (Fig. 3). Within this quasi-variable central region, three small regions of 12–20 residues (Supplemental Data, pages 9, 14, and 18) are conserved across all species. Presumably, these three conserved regions play some role in the overall structure of PlasWD40-1.

The quasi-variable region is also characterized by a substantial amount of low-complexity sequence. Low-complexity sequence has a low diversity of amino acid residues, often as a result of tandem repeats. Proteins with tandem repeats and low complexity are relatively common in Plasmodium, with more than half of the genes containing low-complexity sequence (DePristo et al. 2006). The sequences from the great ape group contain several asparagine-rich regions, which have been noted in P. falciparum proteins in general (Chaudhry et al. 2018). The rodent parasite sequences contain the least low-complexity sequence within PlasWD40-1.
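One simple way to express "low diversity of amino acid residues" quantitatively is the Shannon entropy of residues within a sliding window: repeats such as poly-asparagine stretches give low values. The sketch below is a simplified illustration, not the SEG algorithm or whatever tool was used in this analysis; the window size of 12 residues and the cutoff of 2.2 bits are arbitrary assumptions, and the function names are hypothetical.

```python
import math
from collections import Counter


def shannon_entropy(window):
    """Shannon entropy (bits) of the residue composition of a sequence window."""
    n = len(window)
    counts = Counter(window)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def low_complexity_windows(seq, window=12, threshold=2.2):
    """Return (1-based start, entropy) for windows below the entropy threshold.

    window and threshold are illustrative assumptions, not published settings.
    """
    hits = []
    for i in range(len(seq) - window + 1):
        h = shannon_entropy(seq[i:i + window])
        if h < threshold:
            hits.append((i + 1, round(h, 2)))
    return hits


if __name__ == "__main__":
    # Eleven asparagines and one aspartate, like the fragment in the Interactome section.
    print(low_complexity_windows("N" * 11 + "D"))    # [(1, 0.41)]
    # Twelve distinct residues: entropy = log2(12), about 3.58 bits.
    print(low_complexity_windows("ACDEFGHIKLMP"))    # []
```

For a 12-residue window the entropy cannot exceed log2(12) ≈ 3.58 bits, so the cutoff must be chosen relative to the window size.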

Although the central region of PlasWD40-1 shows little sequence identity across species, it is predicted to be primarily disordered in all 17 species. Furthermore, the central region has the signature features of disordered regions, namely low-complexity sequence and a compositional bias, with a low content of bulky hydrophobic amino acids and a high proportion of polar and charged amino acids (Dyson and Wright 2005). Tandemly repeated sequences are also a common feature of disordered regions (Tompa 2003). Although intrinsically disordered protein regions lack a well-defined structure, they often carry out important functions, such as molecular recognition, or serve as linkers between protein domains. Thus, even though the sequence is not highly conserved across all Plasmodium species, the function of the central region may be preserved.
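The compositional signature of disordered regions (few bulky hydrophobic residues, many polar and charged ones) can be expressed as simple residue fractions. The sketch below assumes one particular grouping of residues (ILVFWYM as bulky hydrophobic, DEKRHNQST as polar or charged); this grouping is an illustrative choice rather than a standard definition, the function name is hypothetical, and the sketch is not a disorder predictor.

```python
# Illustrative residue groupings (assumptions, not a standard classification).
BULKY_HYDROPHOBIC = set("ILVFWYM")
POLAR_OR_CHARGED = set("DEKRHNQST")


def composition_bias(seq):
    """Return the fractions of bulky hydrophobic and polar/charged residues in seq."""
    seq = seq.upper()
    if not seq:
        raise ValueError("sequence must not be empty")
    n = len(seq)
    return {
        "bulky_hydrophobic": sum(aa in BULKY_HYDROPHOBIC for aa in seq) / n,
        "polar_or_charged": sum(aa in POLAR_OR_CHARGED for aa in seq) / n,
    }


if __name__ == "__main__":
    # A poly-asparagine stretch is entirely polar under this grouping.
    print(composition_bias("N" * 11 + "D"))
    # A purely hydrophobic stretch shows the opposite bias.
    print(composition_bias("ILVF"))
```

Comparing these fractions between the central region and the N-terminal WD40 domain of the same protein would be one way to make the contrast described above concrete.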

C-terminal region

The C-terminal region of PlasWD40-1 exhibits some sequence conservation, but less than the N-terminal region. BLAST searches with the C-terminal regions did not identify any highly similar sequences from non-Plasmodium species. Furthermore, no motifs or other characterized domains were identified except for a predicted membrane-spanning sequence near the C-terminus in most of the species (Supplemental Data, page 36). However, PlasWD40-1 has no signal sequence, so this predicted membrane-spanning segment may instead represent a conserved hydrophobic region that is important for the folding and structure of the protein.
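Predicted membrane-spanning segments are often flagged by averaging residue hydropathy over a sliding window. The sketch below uses the Kyte–Doolittle hydropathy scale with the classic 19-residue window and a mean cutoff of 1.6; it is a crude illustration under those assumptions, not the prediction method used for the Supplemental Data, and the test sequence is synthetic. The function name is hypothetical.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydrophobic_segments(seq, window=19, cutoff=1.6):
    """Return 1-based start positions of windows whose mean hydropathy exceeds cutoff."""
    seq = seq.upper()
    hits = []
    for i in range(len(seq) - window + 1):
        mean_h = sum(KD[aa] for aa in seq[i:i + window]) / window
        if mean_h > cutoff:
            hits.append(i + 1)
    return hits


if __name__ == "__main__":
    # Synthetic sequence: a 19-residue leucine stretch flanked by lysines.
    synthetic = "K" * 5 + "L" * 19 + "K" * 5
    print(hydrophobic_segments(synthetic))  # windows starting at 1 through 11
```

A hydrophobic stretch flagged this way is consistent with either a transmembrane helix or a buried structural element, which is why the absence of a signal sequence matters for interpreting the prediction.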

Interactome

Both the WD40 domain and disordered protein sequence are likely involved in protein–protein interactions, especially interactions involved in scaffolding and the assembly of dynamic multi-subunit complexes. A genome-wide yeast two-hybrid screen of protein–protein interactions has been carried out in P. falciparum (LaCount et al. 2005), and several interactions with PlasWD40-1 were identified (Table 2). None of the interactions mapped to the N-terminal WD40 domain. This is not unexpected, because binding to a WD40 domain is highly dependent on its three-dimensional structure, which may not be maintained in the yeast two-hybrid system. Most of the interactions identified in the yeast two-hybrid screen mapped to the central disordered region of PlasWD40-1, supporting the contention that this quasi-conserved region may function in protein–protein interactions. The only binding sequence outside the central disordered region overlaps substantially with the original cDNA clone from the lambda expression library (Supplemental Data, page 33).

Table 2 Proteins interacting with PlasWD40-1 as detected by the yeast two-hybrid assay

The most frequently observed interaction of PlasWD40-1 was with itself, which could imply that the protein forms oligomers. However, one of the interacting fragments is a twelve-residue stretch of a poly-asparagine region containing eleven asparagines and one aspartate (Supplemental Data, pages 11–12). The high frequency of poly-asparagine repeats in P. falciparum (Chaudhry et al. 2018) raises questions about the validity of this protein–protein interaction. Similarly, the regions of PlasWD40-1 that bind to the GCN5 histone acetyltransferase and the L subunit of translation initiation factor 3 are also largely poly-asparagine repeats (Supplemental Data, page 19). Likewise, the region of PlasWD40-1 that binds to actin is primarily a low-complexity repeat that is not conserved within the great ape group (Supplemental Data, page 15).

Several of the proteins interacting with PlasWD40-1 are involved in transcription and chromatin remodeling, which are known roles of some WD40-containing proteins. For example, the transcription factor APIA2 binds to a region that is partially conserved in all Plasmodium species (Supplemental Data, page 14). However, this partially conserved region is rather hydrophobic and could promote promiscuous interactions. Determining the exact interactome of PlasWD40-1 will require more direct experimental studies. In addition, yeast two-hybrid assays in other Plasmodium species may help identify interaction partners of PlasWD40-1 that are shared across the genus.

Conclusions

PlasWD40-1 is a large protein (2065–3217 amino acids) found in the genus Plasmodium. The protein consists of an N-terminal WD40-repeat domain that is highly conserved among all Plasmodium species. The N-terminal region may also contain a Zn-protease site in some Plasmodium species. This is followed by a quasi-conserved central region that consists primarily of low-complexity sequence and is predicted to be disordered. The sequence of this central region is not highly conserved across all species, but it is relatively well conserved among species within each of the four Plasmodium clades. The C-terminal region is partially conserved across all species. Because WD40 domains and disordered regions function primarily in binding other proteins, PlasWD40-1 likely plays a role in forming multi-protein complexes. The specific proteins that bind to PlasWD40-1 remain to be determined.

The apparent restriction of PlasWD40-1 to the genus Plasmodium implies that its function is somewhat unique to Plasmodium. There are certainly many unique features of Plasmodium biology that require multi-protein complexes and scaffolding. For example, PfWLP1 is another WD40-repeat protein unique to Plasmodium (von Bohl et al. 2015); it forms complexes with adhesion proteins in the micronemes of merozoites and in the subpellicular space of gametocytes. In addition, proteins that are unique to Plasmodium and play important roles in the parasite's biology make attractive therapeutic targets, and initial studies suggest that WD40 proteins are viable drug targets (Song et al. 2017).