Introduction

The Armadillo (ARM) repeat is a ubiquitous hydrophobic domain consisting of 38–45 amino acids (aa), which occurs as multiple tandem copies in a variety of eukaryotic proteins [1, 2]. It forms a 3-D structure that is highly conserved across all eukaryotic kingdoms [2]. The ARM repeats occur in tandem, with few or no aa between each repeat, forming a single unit [1]. The number of tandem repeats in the ARM proteins of plants is generally four to eight [3]. The primary function of the ARM motif is to coordinate protein–protein interactions [2], which translates into a diverse array of potential functions for the proteins containing these motifs. These functions have been thoroughly investigated in animals and include intracellular signalling, cell adhesion, cytoskeletal regulation, nucleocytoplasmic trafficking, transcriptional regulation and vesicular transport [1, 2, 4]. These biochemical phenomena are known to be involved in plant responses to pathogen perception [5, 6].

ARM repeat-containing genes in plants have sparked interest among plant researchers, as their encoded proteins appear to be involved in a variety of functions, from development to defence. ARM proteins of plants differ from those of animals in that they frequently contain novel plant-specific domains [2] that allow for the functional diversification of the plant ARM proteins. Three of these additional domains, namely the U-box, F-box and HECT domains, are associated with ubiquitination and proteasomal degradation. Other less frequently observed domains found in ARM proteins include the importin-β-binding (IBB), leucine-rich repeat (LRR), ‘broad-complex, tramtrack and bric-a-brac’/‘poxvirus and zinc finger’ (BTB/POZ), WD40, and the microtubule-associated kinesin motor and Lis-homology (LisH) domains [3]. However, of the abovementioned domains, the U-box is by far the most commonly occurring motif in ARM proteins [3]. Examples of such U-box-containing ARM proteins include ARM repeat containing 1 (ARC1) from Brassica oleracea [7]; photoperiod-responsive 1 (PHOR1) from potato (Solanum demissum and S. tuberosum) [8]; PUB17 from Arabidopsis thaliana [9]; NtPUB4 and ACRE276 from Nicotiana tabacum [9, 10] and spotted leaf11 (SPL11) from Oryza sativa [11].

Despite the large number of predicted ARM repeat proteins in A. thaliana (108, [3]), most of these have unknown functions. Moreover, very few ARM repeat genes have been characterized in other plant species. To date, fewer than ten ARM repeat genes across all plant species have been characterized and functionally investigated.

This study reports the identification and molecular characterization of an ARM repeat gene from Gossypium hirsutum, designated GhARM (GenBank accession number HQ630673). It was initially discovered as a differentially expressed sequence tag (EST) during a differential display reverse transcriptase polymerase chain reaction (DDRT-PCR) study with a cell wall-derived (CWD) Verticillium dahliae elicitor [12]. Our results confirm the down-regulation of GhARM following elicitation of cotton cells, and specify the time-frame of repression. Our findings suggest that GhARM may play a regulatory role in the defence response against fungi and provide a foundation for functional studies to further explore this possible role.

Materials and methods

Plant material and growth conditions

Cotton cv OR19 cell suspension cultures were established from callus tissue and grown in the dark at 25 °C on a continuous rotary shaker at 120 rpm. The growth medium contained Murashige and Skoog (MS) basal salts (Sigma-Aldrich), a vitamin mix (1 μg/ml each of nicotinic acid, pyridoxine and thiamine), 0.2 μg/ml 2,4-D (2,4-dichlorophenoxyacetic acid), 2 μg/ml naphthalene acetic acid (NAA), 0.1 mg/ml myo-inositol and 30 mg/ml glucose, pH 5.8. All experiments were performed on cells in the logarithmic growth phase, 2–3 days after sub-cultivation.

Elicitor preparation and induction

The V. dahliae elicitor was prepared from the heat-released fraction of the mycelial cell walls, as described by Dubery and Slater [13]. The CWD V. dahliae elicitor comprises ~70 % carbohydrates and 10 % protein [13]. Cotton cell suspensions (25 ml) were treated with 5 μg/ml CWD V. dahliae elicitor, or with MS medium only for the controls.

Genome walking

Genomic DNA was isolated from cotton cell suspensions using a CTAB method adapted from Murray and Thompson [14]. DNA (6 μg) was digested with 2.5 U/μg DNA of the following restriction enzymes: StuI, DraI, PvuII and EcoRV (Fermentas). The reaction was incubated for 16 h at 37 °C. Adaptors 1 (5′-GTAATACGACTCACTATAGGGCACGCGTGGTCGACGGCCCGGGCTGGT-3′) and 2 (5′-ACCAGCCC-3′) from a GenomeWalker™ Universal kit (Clontech) were annealed together to yield a 25 μM genome walker adaptor using an initial incubation at 94 °C for 5 min, after which the temperature was reduced by 1 °C every min until it reached 4 °C. Each digested genomic DNA library (10 μl) was ligated to 25 μM genome walker adaptor with 3 U T4 DNA ligase (Bioline) at 16 °C for 16 h. The reaction was diluted fivefold with ddH2O and primary and secondary/nested PCR were performed on the adaptor-ligated, restriction-digested cotton DNA. Gene-specific primers (GSPs, Table S1) were designed to amplify genomic sequences upstream and downstream of an EST, named C4B4 (461 bp), obtained from a previous DDRT-PCR study. The primary PCRs (25 μl) each contained 1 × Ex Taq™ buffer, 0.2 mM dNTPs (Takara Bio Inc.), 0.2 μM adaptor primer 1 (AP1) (5′-GTAATACGACTCACTATAGGGC-3′), 0.2 μM GSP (Table S1), 1.25 U TaKaRa Ex Taq™ (Takara Bio Inc.), and 0.1–1 μg adaptor-ligated, digested DNA template. The cycling conditions consisted of an initial denaturation at 94 °C for 2 min, then 35 cycles of denaturation at 94 °C for 20 s, annealing at 55 °C for 30 s, and elongation at 72 °C for 4 min, followed by a final elongation step at 72 °C for 15 min. The primary PCRs were diluted 1:49 in ddH2O and 0.5 μl of the diluted primary PCR products were used as a template for the secondary (nested) PCR with 0.2 μM adaptor primer 2 (AP2) (5′-ACTATAGGGCACGCGTGGT-3′) and an internal GSP (Table S1). The same cycling conditions were used for the secondary PCRs, except that the annealing temperature was increased to 60–65 °C to improve specificity. The PCR products were gel purified, cloned into a pGEM®-T Easy vector (Promega) and sequenced.
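The nested design described above can be checked directly from the published oligonucleotide sequences. The following is a minimal sketch, using only the adaptor and adaptor-primer sequences quoted in this section; the helper name `reverse_complement` and the variable names are illustrative and not part of the original protocol. It shows that AP1 matches the 5′ end of adaptor 1, that AP2 lies further inside the adaptor (as a nested primer should), and that adaptor 2 is complementary to the 3′ end of adaptor 1.

```python
# Sequences copied from the Methods text, written 5'->3'.
ADAPTOR_1 = "GTAATACGACTCACTATAGGGCACGCGTGGTCGACGGCCCGGGCTGGT"
ADAPTOR_2 = "ACCAGCCC"
AP1 = "GTAATACGACTCACTATAGGGC"   # adaptor primer 1 (primary PCR)
AP2 = "ACTATAGGGCACGCGTGGT"      # adaptor primer 2 (nested PCR)

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an unambiguous DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


# 0-based offsets of each adaptor primer within adaptor 1 (-1 if absent).
ap1_start = ADAPTOR_1.find(AP1)
ap2_start = ADAPTOR_1.find(AP2)

# A nested primer should start downstream of the outer primer.
ap2_is_nested = ap2_start > ap1_start >= 0

# The short adaptor strand should pair with the 3' end of the long strand.
adaptor_2_pairs_with_3prime_end = ADAPTOR_1.endswith(reverse_complement(ADAPTOR_2))

print(ap1_start, ap2_start, ap2_is_nested, adaptor_2_pairs_with_3prime_end)
```

Because AP2 anneals inside the region amplified with AP1, only primary products that carry the adaptor are re-amplified in the secondary PCR.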

Rapid amplification of cDNA ends (RACE)

The 5′ and 3′ RACE reactions were performed with a 5′/3′ RACE kit, 2nd Generation (Roche Diagnostics) in order to obtain the full-length cDNA sequence of GhARM. The protocol was carried out according to the manufacturer’s instructions with minor modifications. Briefly, total RNA was isolated from cotton cell suspensions with an RNeasy® Plant Mini Kit (Qiagen) and mRNA was isolated from the total RNA with an Oligotex mRNA Mini Kit (Qiagen). mRNA (250–500 ng) was reverse transcribed to cDNA with 0.5 μM GSP (Table S2), and 25 U Transcriptor Reverse Transcriptase (Roche Diagnostics) in a 10 μl reaction. The 5′ single-stranded cDNA molecule was purified with a Nucleospin® Extract kit (Macherey–Nagel) or DNA Clean and Concentrator™ kit (Zymo Research). A homopolymeric A-tail was then added to the 3′ end of the cDNA strand with a recombinant Terminal Transferase (Roche Diagnostics) and deoxyadenosine triphosphate (dATP). Primary PCRs were performed with 0.2 μM oligo dT-anchor primer (5′-GACCACGCGTATCGATGTCGACTTTTTTTTTTTTTTTTV-3′ V=A, C or G), and 0.2 μM of a second GSP (Table S2). The PCR cycling conditions were as follows: denaturation at 95 °C for 2 min, followed by 35 cycles of denaturation at 94 °C for 20 s, annealing at 55 °C for 30 s, extension at 72 °C for 2 min, and a final extension at 72 °C for 7 min. The primary 5′ RACE reaction products were diluted 1:49 with ddH2O and used as the templates in secondary 5′ RACE reactions. The secondary 5′ RACE reactions were performed with 0.2 μM PCR anchor primer (5′-GACCACGCGTATCGATGTCGACTTTTTTTTTTTTTTTTV-3′ V=A, C or G) and 0.2 μM of a third internal GSP (Table S2). The same cycling parameters were used for the secondary 5′ RACE PCR, except that the annealing temperature was raised to 60–65 °C to improve specificity. The putative transcription start site (TSS) was only obtained after a second 5′ RACE reaction was performed further upstream of the first. The 5′ RACE products were purified, cloned into a pGEM®-T Easy vector (Promega) and sequenced.

For 3′ RACE, 1 μl (40–80 ng) mRNA was reverse transcribed to cDNA with 0.5 μM 3′ RACE adapter (5′-GCGAGCACAGAATTAATACGACTCACTATAGGT12VN-3′ V=A, C or G, N=any base) from an Ambion FirstChoice® RLM-RACE kit (Ambion) and 25 U Transcriptor Reverse Transcriptase (Roche Diagnostics) in a 10 μl reaction according to the manufacturer’s instructions. A primary PCR was performed with 0.2 μM 3′ RACE outer primer (5′-GCGAGCACAGAATTAATACGACT-3′) (Ambion) and 0.2 μM GSP (Table S2). The PCR cycling conditions were as follows: denaturation at 95 °C for 2 min, followed by 35 cycles of denaturation at 94 °C for 20 s, annealing at 55 °C for 30 s, extension at 72 °C for 30 s, and a final extension at 72 °C for 7 min. The primary 3′ RACE reaction products were diluted 1:49 with ddH2O and used as templates in the secondary 3′ RACE reactions. The secondary 3′ RACE reactions were performed with 0.2 μM 3′ RACE inner primer (5′-CGCGGATCCGAATTAATACGACTCACTATAGG-3′) (Ambion) and 0.2 μM of a third internal GSP (Table S2). The cycling conditions were the same as for the primary 3′ RACE PCRs, except that the annealing temperatures were raised to 60–65 °C to increase specificity. The 3′ RACE products were purified, cloned into a pGEM®-T Easy vector (Promega) and sequenced.
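The degenerate positions in the 3′ RACE adapter (T12VN, with V = A, C or G and N = any base) mean that the adapter is a small pool of oligonucleotides rather than a single sequence. The sketch below enumerates that pool; the function name `expand_degenerate` and the reduced IUPAC table are illustrative choices, and only the codes used in this section are included.

```python
from itertools import product

# Subset of the IUPAC nucleotide codes needed for the adapter quoted above.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "V": "ACG",   # any base except T
    "N": "ACGT",  # any base
}


def expand_degenerate(seq: str) -> list[str]:
    """List every unambiguous sequence encoded by a degenerate oligonucleotide."""
    return ["".join(bases) for bases in product(*(IUPAC[base] for base in seq))]


# 3' RACE adapter from the text: fixed 5' portion, 12 T residues, then V and N.
ADAPTER_3RACE = "GCGAGCACAGAATTAATACGACTCACTATAGG" + "T" * 12 + "VN"

variants = expand_degenerate(ADAPTER_3RACE)
print(len(variants))                         # 3 choices for V x 4 choices for N
print(sorted({v[-2:] for v in variants}))    # the 3'-terminal dinucleotides
```

The V position excludes T, so the adapter can only prime where the poly(A) tail meets the transcript body, which anchors cDNA synthesis at the start of the tail rather than at an arbitrary position within it.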

Confirmation of the absence of introns in GhARM

Internal PCR and RT-PCRs were performed to confirm the absence of introns in regions of GhARM that were not entirely covered by the genome walking or RACE reactions, such as the region of the EST. DNA and RNA were isolated, and the RNA was reverse transcribed to cDNA, as described above. Each PCR reaction contained 0.05 U ExSel high fidelity DNA polymerase (JMR Holdings), 1× reaction buffer (with 2 mM MgSO4 final concentration (f.c.)), 0.2 mM dNTPs (Bioline), 0.2 μM of each GSP (Table S3), in addition to cDNA (1 μl) or DNA (670 ng), for the RT-PCR and PCR, respectively. The PCR cycling conditions were as follows: denaturation at 94 °C for 2 min, then 35 cycles of denaturation at 94 °C for 20 s, annealing at 60 °C for 30 s and elongation at 70 °C for 1 min, followed by a final elongation step at 70 °C for 7 min. The PCR products were purified, cloned into a pGEM®-T Easy vector (Promega) and sequenced.

Sequence analyses

Sequences were analysed and assembled with ChromasPro (Technelysium), whereas DNAssist Version 5.1 was utilized for routine genomic sequence alignments. Homologous protein sequences were identified and compared to characterized gene products with the blastp algorithm of the basic local alignment search tool (BLAST) provided by the National Center for Biotechnology Information (NCBI) at the National Institutes of Health (NIH) (http://www.ncbi.nlm.nih.gov/BLAST/). Alignments between homologous protein sequences were performed with ClustalW in ChromasPro and a phylogenetic tree was constructed by neighbour-joining analysis using MEGA5 [15]. Two web-based software programs, PlantCARE (plant cis-acting regulatory elements) (http://bioinformatics.psb.ugent.be/webtools/plantcare/html/) and PLACE (plant cis-acting regulatory DNA elements) (http://www.dna.affrc.go.jp/PLACE/), were used to identify putative cis-elements in the promoter region upstream of GhARM.

Several bioinformatics web-based software programs were used for complementary prediction and/or analysis of the domain architecture of the gene products and identification of signature motif regions. These included: InterPro (www.ebi.ac.uk/interpro), Pfam (http://pfam.sanger.ac.uk), ExPASy proteomics server (expert protein analysis system) from the Swiss Institute of Bioinformatics (SIB) (http://au.expasy.org), PROSITE (http://ca.expasy.org/prosite) and ProtParam (protein identification and analysis tool) (http://au.expasy.org/tools/protparam.html) (both on the ExPASy Server), Plantsp (plants phosphorylation) (http://plantsp.genomics.purdue.edu/html), PANTHER (protein analysis through evolutionary relationships) (www.pantherdb.org), SMART (simple modular architecture research tool) (http://smart.embl-heidelberg.de), PRODOM (protein domain) (http://prodom.prabi.fr/prodom/current/html), My Hits-Motif Scan using hidden Markov models (HMMs) (http://myhits.isb-sib.ch/cgi-bin/motif_scan), and GO (gene ontology) (www.geneontology.org). The tertiary protein structures of the characterized gene products were predicted with Phyre (protein homology/analogy recognition engine) (http://www.imperial.ac.uk/phyre), a web-based protein structure prediction program [16].

Southern blot

Genomic DNA was extracted from cotton cell suspensions using a CTAB protocol adapted from Sambrook et al. [17]. DNA (30 µg) was restriction-digested (3 U restriction enzyme/μg DNA) overnight at 37 °C with SacI, XbaI, EcoRI, and HindIII (Fermentas) in a total volume of 200 μl. The restriction digests were purified by sodium acetate precipitation to remove contaminants and reduce the volume of the digested DNA. The purified DNA (20 µg) was electrophoresed at 4 °C on a 0.8 % (w/v) TAE (40 mM Tris–acetate; 1 mM EDTA) agarose gel at 20 V and the DNA was transferred to a Hybond-N+ nylon membrane (Amersham) using an upward transfer system [17]. The probes were prepared with a PCR digoxigenin (DIG) Probe Synthesis Kit (Roche Diagnostics), according to the manufacturer’s instructions. Two probes were prepared for GhARM. The first probe (175 bp) was made with forward (NF1) and reverse (RR1) primers (Table S4), and encompassed the ARM repeat region. The second probe (226 bp) was made with forward (RF1) and reverse (5′ExonR1) primers (Table S4), and preceded the ARM repeats. The membranes were hybridized with 10 pM (f.c.) of the heat-denatured DIG-labelled probe in pre-warmed ULTRAhyb™ Ultrasensitive Hybridization Buffer (Ambion) for 20 h at 42 °C, with constant agitation. Hybridized probes were detected on the membranes with a DIG Luminescence Detection Kit (Roche Diagnostics).

Real-time quantitative RT-PCR analysis

Cotton cell suspensions were treated with 5 μg/ml CWD V. dahliae elicitor. The elicitation of the cell suspensions was terminated by the isolation of RNA from the suspensions at 0 (calibrator), 2, 4, 6, 8, and 10 h. RNA was isolated from cotton cell suspensions with an RNeasy® Plant Mini Kit (Qiagen) and digested with an RNase-free DNase (Promega) to remove any carry-over DNA contamination. Transcriptor Reverse Transcriptase (25 U) (Roche Diagnostics) was used to reverse transcribe 1 μg RNA to cDNA in a 10 μl reaction according to the manufacturer’s instructions. Relative quantification real-time PCR (RT-qPCR) was performed with 18S as a reference gene using a LightCycler® FastStart DNA MasterPLUS SYBR Green I kit (Roche Diagnostics) on a RotorGene 3000 (Corbett Research, Sydney, Australia), using 2 μl cDNA as the template. For the GhARM RT-qPCR, a 175 bp fragment was amplified with 0.2 μM each of the forward (NF1) and reverse (RR1) primers (Table S5). The PCR cycling conditions consisted of an initial denaturation at 95 °C for 10 min, followed by 40 cycles of 95 °C for 10 s, 60 °C for 8 s, 72 °C for 10 s. The RT-qPCR data were processed according to the standard curve method [18].
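For readers unfamiliar with standard-curve quantification, the calculation can be sketched as follows. This is one common formulation and not necessarily the exact procedure of [18]: each gene's standard curve is assumed to be linear, Ct = slope × log10(quantity) + intercept; the target quantity is normalised to the reference gene (18S here) and then expressed relative to the 0 h calibrator. All slopes, intercepts and Ct values below are hypothetical and are not data from this study.

```python
def quantity_from_ct(ct: float, slope: float, intercept: float) -> float:
    """Invert a linear standard curve, Ct = slope * log10(quantity) + intercept."""
    return 10 ** ((ct - intercept) / slope)


def relative_expression(sample: dict, calibrator: dict,
                        target_curve: tuple, reference_curve: tuple) -> float:
    """Reference-normalised target quantity of a sample relative to the calibrator."""
    def normalised(cts: dict) -> float:
        target = quantity_from_ct(cts["target"], *target_curve)
        reference = quantity_from_ct(cts["reference"], *reference_curve)
        return target / reference

    return normalised(sample) / normalised(calibrator)


# Hypothetical standard curves given as (slope, intercept).
TARGET_CURVE = (-3.32, 35.0)      # illustrative curve for the target gene
REFERENCE_CURVE = (-3.32, 25.0)   # illustrative curve for the 18S reference

# Hypothetical mean Ct values for the calibrator (0 h) and one treated sample.
calibrator_0h = {"target": 24.00, "reference": 12.00}
sample_8h = {"target": 27.32, "reference": 12.00}

fold_8h = relative_expression(sample_8h, calibrator_0h, TARGET_CURVE, REFERENCE_CURVE)
print(round(fold_8h, 3))
```

With a slope of −3.32 (close to 100 % amplification efficiency), a target Ct that is 3.32 cycles later at unchanged reference Ct corresponds to a tenfold lower relative transcript level.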

Results

Analysis of the full-length GhARM sequence

The full genomic sequence of GhARM (GenBank accession number HQ630673) was obtained from genome walking reactions and internal PCRs (Fig. S1, supplementary online data). The size of the genomic sequence containing the GhARM gene was 4131 bp. The 4131 bp sequence included a 2125 bp GhARM promoter that is interrupted by an upstream retrotransposon at 987 bp (Fig. S1).

The putative transcription start site of GhARM (Fig. S1) was determined from the second 5′ RACE reaction. The putative transcription end site was not obtained experimentally, since the 3′ RACE product ended prematurely (within the translated region of the gene) and subsequent attempts at 3′ RACE only produced products with sequences that were homologous, but not identical to GhARM. The translation start and end sites were determined by in silico translation of the transcript from the ATG initiation codon to the in-frame stop codon. GhARM did not contain any introns. The full-length GhARM cDNA transcript was 1780 bp, which consisted of a 1713 bp open reading frame (ORF) and a 67 bp 5′-UTR. Although the full 3′-UTR of the GhARM transcript was not experimentally determined by 3′ RACE, it could be deduced from the three polyA termination signals following the TAA translation stop site (Fig. S1).

Amino acid sequence and domain analysis of GhARM

In silico analysis of GhARM revealed that it encodes an ARM repeat-containing protein with a length of 570 aa and an estimated MW of 62 kDa (pI 9.33). A putative signal sequence was also found in GhARM, with a signal cleavage site (VRS-FV) between aa 42 and 43 (Fig. S1). Therefore, the mature GhARM protein, after cleavage of the signal peptide, would be 528 aa long, with a calculated MW of 58 kDa (pI 9.4). Analysis of potential post-translational modifications revealed two potential glycosylation and seven possible myristoylation sites in the aa sequence of GhARM (Fig. S1). GhARM contains three consecutive ARM repeats within a large Armadillo-type fold that encompasses most of the protein (Fig. 1). A Leu-rich region is evident towards the N-terminal portion of GhARM (Fig. S1).
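The reported lengths are mutually consistent, which can be verified with simple arithmetic. The sketch below uses only the values stated in the text (67 bp 5′-UTR, 1713 bp ORF, 570 aa precursor, cleavage after residue 42); the variable names are illustrative, and the transcript length excludes the 3′-UTR, which was not determined experimentally.

```python
# Values reported in the Results.
UTR5_BP = 67            # 5'-UTR length
ORF_BP = 1713           # open reading frame, including the stop codon
PROTEIN_AA = 570        # precursor protein length
SIGNAL_PEPTIDE_AA = 42  # residues removed at the predicted cleavage site

# Transcript length as reported (5'-UTR plus ORF, without the 3'-UTR).
transcript_bp = UTR5_BP + ORF_BP

# An ORF is a whole number of codons; the last codon is the stop codon.
total_codons, remainder = divmod(ORF_BP, 3)
coding_codons = total_codons - 1

# Mature protein length after signal peptide cleavage.
mature_aa = PROTEIN_AA - SIGNAL_PEPTIDE_AA

print(transcript_bp, total_codons, coding_codons, mature_aa)
```

The 1713 bp ORF therefore corresponds to 570 sense codons plus one stop codon, matching the 570 aa precursor, and removal of the 42-residue signal peptide gives the 528 aa mature protein.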

Fig. 1

Schematic representation of GhARM indicating genomic architecture and conserved domain analysis. GhARM contains three ARM repeats within a large Armadillo-type fold. SP signal peptide, ARM Armadillo repeat. The fully annotated nucleotide and amino acid sequences are presented in Fig. S1

A homology sequence comparison between GhARM and the four most homologous plant ARM repeat-containing proteins, identified from a BLAST search of the NCBI database, is shown in Fig. 2. The high degree of homology was not only confined to the region of the three identified ARM repeats in GhARM. The phylogenetic analysis suggests that the ARM proteins from Populus trichocarpa (XP_002321152) and Ricinus communis (XP_002525337) are the closest orthologs of GhARM (Fig. 3).

Fig. 2

Amino acid sequence alignment of GhARM with other reported ARM repeat domain-containing proteins from plants. Identical aa (asterisk), conserved substitutions (colon), semi-conserved substitutions (dot) and gaps (hyphen) are indicated in the alignment. The three predicted ARM repeat motifs in GhARM are underlined in green and are indicated separately above the alignment with arrows and numbers. The conserved aa residues of each ARM repeat are indicated in yellow (Val/Ile/Leu), pink (Gly), blue (hydrophobic) and red (hydrophilic). (Color figure online)

Fig. 3

Phylogenetic analysis of GhARM with homologous plant ARM proteins. Neighbour-joining analysis was performed using MEGA5. Bootstrap values from 500 bootstrap replicates are shown next to the branches. GenBank protein accession numbers: Populus trichocarpa: XP_002321152, Vitis vinifera: XP_002271314, Ricinus communis: XP_002525337, Arabidopsis thaliana: NP_178638, Oryza sativa (Japonica cultivar): NP_001062641, O. sativa (Indica cultivar): EAZ08227, Sorghum bicolor: XP_002467629, Medicago truncatula: ABN08829, Zea mays: NP_001146226, Triticum aestivum: ACS92633, Physcomitrella patens: XP_001772489

Molecular modelling

To investigate the potential tertiary structure of GhARM, Phyre, a comparative 3-D modelling program, was used (Fig. 4). The GhARM structure prediction was based on its homology to a human β-catenin protein from the ARM repeat family (PDB code 1JDH, domain ‘A’). The GhARM protein was predicted to have 14 % identity to the β-catenin, with an E-value of 3.6e−25 and an estimated precision of 100 % (Fig. 4).

Fig. 4

Molecular modelling of the ARM repeat domain of GhARM. Helices are represented by round red arrows and beta sheets are represented by flat blue arrows. The grey loop regions link the secondary structures together. The N- and C-termini are labelled. (Color figure online)

Southern blot analysis

Two Southern blots were carried out to determine the copy number of GhARM, and both produced multiple bands, suggesting that GhARM is a low-copy gene that forms part of a multi-gene family. The first Southern made use of a 175 bp probe that hybridized to the ARM repeats of GhARM (Fig. 5). A second Southern, with a 226 bp probe complementary to a region outside of the ARM repeats (Fig. S2) was also performed to confirm that the multiple bands were a true indication of the GhARM copy number and not the result of a non-specific probe binding to other ARM repeat-containing family members in G. hirsutum.

Fig. 5

Southern blot analysis of GhARM. Cotton genomic DNA (30 μg) was digested with: (1) SacI, (2) XbaI, (3) HindIII and (4) EcoRI

Promoter analysis

A 2125 bp promoter sequence was obtained from genome walking upstream of GhARM. However, as a 777 bp LINE retrotransposon interrupted the GhARM promoter, only 986 bp following the TAG translation end site of the transposon was analysed. The promoter was examined for cis-acting elements that are potentially involved in regulating the expression of the gene (Table 1). These included defence and elicitor-responsive, hormone-responsive and abiotic stress-responsive cis-elements. Putative light-responsive, tissue/organ-specific, sugar-responsive and other miscellaneous cis-elements were also identified in the promoter (not shown in Table 1).
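The kind of motif search performed by PlantCARE and PLACE can be illustrated with a simple pattern scan. The sketch below is not the algorithm used by either database; it searches both strands of a sequence for one commonly cited W-box consensus, TTGAC(C/T), which is assumed here for illustration. The demonstration sequence is invented and is not part of the GhARM promoter.

```python
import re

# Assumed W-box consensus TTGAC[C/T]; the lookahead allows overlapping matches.
W_BOX = re.compile(r"(?=(TTGAC[CT]))")
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an unambiguous DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_w_boxes(seq: str) -> list[tuple[int, str, str]]:
    """Return (0-based start on the forward strand, strand, motif) for each hit."""
    seq = seq.upper()
    hits = [(m.start(), "+", m.group(1)) for m in W_BOX.finditer(seq)]
    for m in W_BOX.finditer(reverse_complement(seq)):
        motif = m.group(1)
        # Convert the reverse-strand coordinate back to the forward strand.
        start = len(seq) - m.start() - len(motif)
        hits.append((start, "-", motif))
    return sorted(hits)


# Invented 29 nt sequence with two forward-strand and one reverse-strand W-box.
DEMO_PROMOTER = "AATTGACCGGATCAGTCAAGTTTGACTAA"

for start, strand, motif in find_w_boxes(DEMO_PROMOTER):
    print(start, strand, motif)
```

Scanning both strands matters because cis-elements are functional in either orientation; curated databases additionally attach experimental annotations to each motif, which a plain pattern search cannot provide.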

Table 1 Analysis of putative cis-elements in the promoter of GhARM

Real-time quantitative RT-PCR expression analysis

To further investigate the expression pattern of GhARM following elicitation with the CWD V. dahliae elicitor, cotton cell suspensions were treated with the elicitor over a 10 h period. The transcription of GhARM was monitored with RT-qPCR (Fig. 6).

Fig. 6

Expression of GhARM in cotton cell suspensions treated with a cell wall-derived (CWD) V. dahliae elicitor. Cotton cell suspensions were treated with 5 μg/ml CWD V. dahliae elicitor and RNA was isolated from treated suspensions at the given time points. Error bars represent the SEM of two biological repeats and two technical repeats (n = 4). Significant differences at P < 0.05 between the treated samples and the control (0 h) are indicated with asterisks

A bi-phasic repression of GhARM was observed in response to elicitation with the CWD V. dahliae elicitor over a 10 h period (Fig. 6). During the first phase of repression, GhARM transcript levels dropped to an average of 0.531-fold (roughly half of the original transcript levels) after only 1 h of exposure to the CWD V. dahliae elicitor. However, 1 h later (2 h after elicitor exposure), the GhARM transcript levels had recovered and returned to normal (pre-elicitation) levels. During the second, stronger phase of transcriptional repression, GhARM transcript levels exhibited maximal repression after 8 h of exposure to the elicitor, at which point they were reduced to an average of 0.069-fold of normal levels.
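Fold values below 1 are easier to compare when converted to percent repression or to a log2 scale. The sketch below applies both conversions to the two values reported above, reading them as transcript levels relative to the 0 h calibrator (an interpretation of the text, with 1.0 taken as the pre-elicitation level); the function names are illustrative.

```python
import math


def percent_repression(fold: float) -> float:
    """Percentage decrease relative to a calibrator level of 1.0."""
    return (1.0 - fold) * 100.0


def log2_fold(fold: float) -> float:
    """Fold level on a log2 scale; negative values indicate repression."""
    return math.log2(fold)


# Mean fold levels reported in the Results for the two repression phases.
reported = {"first phase (1 h)": 0.531, "second phase (8 h)": 0.069}

for label, fold in reported.items():
    print(f"{label}: {percent_repression(fold):.1f} % repression, "
          f"log2 = {log2_fold(fold):.2f}")
```

On this reading, the first phase corresponds to a reduction of roughly 47 % and the second to a reduction of roughly 93 %, which makes the difference in strength between the two phases explicit.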

Discussion

GhARM encodes an Armadillo repeat protein with three consecutive ARM repeats

While ARM repeat genes are well characterized in animal systems, where they have diverse roles ranging from cell adhesion to transcriptional regulation [1, 2], only a few have been studied in plants. As a result, little is known about the function of these genes in plants, which may well be equally diverse. In this study, we characterized the first full-length genomic sequence of a gene encoding an ARM repeat protein (GhARM) in cotton that is down-regulated in response to a V. dahliae elicitor. The GhARM protein contains three tandem ARM repeats within an Armadillo-type fold. Phylogenetic analysis by neighbour-joining indicated that ARM repeat proteins from Populus trichocarpa and Ricinus communis are the closest orthologs of GhARM (Fig. 3).

An analysis of the GhARM gene structure revealed that it lacks intervening intronic sequences, a feature typical of ARM genes from other plant species. However, in contrast to these simple gene structures, the more distantly related leaf and flower related (LFR) ARM repeat gene from A. thaliana has seven introns in its ORF [19].

The three tandem ARM repeat domains are the distinguishing feature of GhARM. These repeats are positioned towards the N terminus of a large Armadillo-type fold that encompasses most of the protein (Fig. 1). GhARM displays a high overall homology to ARM proteins from several diverse plant species (Fig. 2), which is not restricted to the three ARM repeats. Four amino acid residues (Val/Ile/Leu, Gly, any hydrophobic, and any hydrophilic aa, typically at positions 4, 8, 16 and 34, respectively) that are largely conserved in ARM domains [22] are present in each ARM repeat of GhARM as well as those of the other homologous plant ARM proteins.

The predicted 3-D structure of GhARM (Fig. 4) reflects the highly conserved structure characteristic of ARM repeat domains. Each ARM repeat motif consists of three α-helices that are in turn connected by short loops [20]. These helices form a compact helical bundle that packs together with consecutive ARM repeat helical bundles to form a distinctive right-handed superhelical twist (Fig. 4). ARM repeat domains are involved in mediating protein–protein interactions, by means of a shallow groove that runs along the length of the superhelix and that provides the binding surface [20]. ARM repeats always occur in tandem and are characteristically imperfect, i.e. non-homologous, although the size of the repeat (~42 aa) and its hydrophobic nature are largely conserved [20]. The lack of sequence conservation of ARM repeats is observed not only between ARM repeat proteins but also between the ARM repeats of the same protein [1, 3]. Mudgil et al. [3] proposed that this sequence diversity may be the result of positive, rather than random, selection, such as that found in other repeat families such as the LRR- and resistance (R)-proteins. The divergence of these ARM repeat regions would certainly account for the variety of binding partners and the diverse intracellular signalling processes mediated by the ARM motif. Another important factor that contributes to the diversity of cellular processes mediated by ARM repeat proteins is the presence of an additional functional domain, usually one associated with ubiquitination and proteasomal degradation [2, 3]. Interestingly, no additional functional domains were detected in GhARM other than the ARM repeats. The only other characterized ARM repeat protein that lacks any additional discernible domains, LFR, was recently reported in A. thaliana [19].

The GhARM amino acid sequence contains numerous potential phosphorylation sites corresponding to various protein kinases (not shown). The frequency of putative phosphorylation sites in GhARM suggests that it potentially associates with, and is phosphorylated by, protein kinases. It is important to note that two ARM repeat proteins, namely ARC1 and NtPUB4, associate with certain receptor-like kinases (RLKs) via their ARM repeat domains. ARC1 interacts with an S-receptor-like kinase (SRK) to mediate self-incompatibility in the Brassicaceae [7, 21] and NtPUB4 associates with CHRK1 (chitinase-related receptor-like kinase) to mediate specific plant developmental processes [10]. Taken together, these findings indicate that the potential association of GhARM with protein kinase(s), and its role in signalling events downstream of pathogen perception, merit further investigation.

Copy number analysis

The multiple bands from the Southern blot analysis indicate that GhARM is a low copy gene that forms part of a multi-gene family within the G. hirsutum genome (Fig. 5). The presence of multiple GhARM gene copies may account for the multiple homologous products that were obtained from the 3′ RACE reactions. These homologous products probably represented gene copies of GhARM that have undergone independent evolution since the duplication event.

Promoter analysis

The examined gene promoter contained putative cis-elements that bind transcription factors (TFs) known for their involvement in defence responses against invading phytopathogens (Table 1). These include W-boxes, which bind WRKY TFs; GT-1-binding sites; and a P-box, which is involved in elicitor- or light-mediated phenylalanine ammonia-lyase (PAL) gene activation. Furthermore, the promoter contains several putative cis-elements that bind DNA-binding with one finger (DOF)-related, MYB and MYC TFs, which are all potentially involved in defence responses [23, 24]. Numerous putative gibberellin (GA)-responsive cis-elements were also identified. GA has long been recognized for its role in plant developmental processes [25, 26]. However, recent data indicate that GA also plays a crucial role in defence signalling [27].

Expression pattern of GhARM in response to elicitation

The down-regulated transcription of GhARM following elicitation with the V. dahliae elicitor (Fig. 6) suggests that GhARM may function as a negative regulator in cotton, the repression of which could contribute to the activation of defence responses. The bi-phasic pattern can be interpreted as resulting from a second signalling event that followed the first transient response [28]. Perception of microbe-associated molecular patterns (MAMPs) leads to the activation of signal cascades, which can then lead to increased levels of secondary signal molecules through which defence responses are controlled and amplified in both positive and negative feedback loops [28, 29]. Similar bi-phasic modulation of defence-related genes was reported as part of the onset of systemic acquired resistance and was considered indicative of two signalling events [28].

There are currently only a few published reports on the involvement of ARM repeat proteins in plant defence, all involving PUB-ARM proteins, signifying the importance of ubiquitination for resistance of plants to pathogens. PUB17 from A. thaliana and its tobacco homolog ACRE276, which each contain five ARM repeats and a U-box, are positive regulators of the hypersensitive response (HR) and defence across the Solanaceae and Brassicaceae families [9]. In contrast, SPL11 from O. sativa, a PUB-ARM with six ARM repeats, is a negative regulator of plant cell death and defence, which likely functions in basal resistance rather than R-gene-mediated defence signalling [11]. Furthermore, PUB22, PUB23, and PUB24 in A. thaliana act as negative regulators of MAMP-triggered immunity in response to several distinct MAMPs [30]. Recently, PUB12 and PUB13 were reported to associate via their ARM domains with the FLS2 receptor-like kinase and to attenuate immune signalling through its degradation [31]. Furthermore, the pub12 and pub13 mutants displayed elevated immune responses to flagellin treatment.

GhARM lacks a U-box and is therefore unlikely to have a direct role in ubiquitination. However, its decreased transcription following elicitation opens an interesting avenue for functional investigation of its potential signalling role during the cotton–V. dahliae interaction [32].