Introduction

Sorghum bicolor (L.) Moench is an important staple food crop with high potential for ethanol production. Sorghum is grown mostly under rainfed conditions and is exposed to abiotic stresses at different stages of its life cycle, although it can cope with many of them (Badigannavar et al. 2018; Sanchez et al. 2002). With ongoing climate change, driven principally by global warming, the pressure on crop production in water-limited environments is expected to increase. In India, the low yields of rabi (post-rainy season) sorghum grown on residual soil moisture are mainly due to poor adoption of varieties improved for tolerance to abiotic stresses such as drought. Genetic improvement of sorghum cultivars can further increase their yields under rainfed conditions (Ongom et al. 2016).

Drought resistance in sorghum is a complex trait influenced by many genes that encode various factors contributing to drought tolerance (Ali et al. 2011). A high-quality sorghum reference genome sequence has been developed, identifying 34,211 genes, which can support future molecular studies (McCormick et al. 2018). Genes conferring drought resistance can provide a foundation for the genetic improvement of sorghum productivity under water-deficit conditions, and understanding the drought tolerance mechanism and the expression of drought resistance genes can help breed high-yielding, drought-tolerant varieties. Plants overcome drought stress by modifying several physiological and biochemical mechanisms. Drought and desiccation tolerance in plants are correlated with the presence of considerable quantities of compatible osmolytes, antioxidative enzymes, and specific desiccation tolerance proteins (Damame et al. 2016; Pardo et al. 2020; Oliver et al. 2020).

Late embryogenesis abundant (LEA) proteins are thermostable, highly hydrophilic proteins induced by abiotic stress. They help prevent the crystallization of cellular components under water deficit, and they function as hydration buffers and ion chelators that protect macromolecules or stabilize membranes at the onset of dehydration. In stress-tolerant sorghum genotypes, LEA proteins play a critical role in stabilizing cell membranes (Vinocur and Altman 2005). Under drought stress, levels of reactive oxygen species (ROS) increase. To protect the hydrophobic components of enzymes from ROS derivatives, amino acids such as glycine, glutamine, glutamic acid, threonine, asparagine, serine, and aspartic acid are synthesized (Ogbaga et al. 2016), and LEA proteins are rich in these amino acids. Nagaraju et al. (2019) identified 68 LEA genes belonging to eight families in Sorghum bicolor, the majority of which are intronless. Abdel-Ghany et al. (2020) identified 180 sorghum genes that were differentially expressed in response to drought, most of which (70%) were up-regulated under drought stress; the earliest genes to be up-regulated encode transcription factors. LEA genes were among the most strongly up-regulated genes in RNA-sequencing-based differential expression studies of drought stress (Johnson et al. 2014; Fracasso et al. 2016; Abdel-Ghany et al. 2020). Under mild to severe drought stress, 11 LEA genes were up-regulated, and their expression decreased on re-watering (Deng-feng et al. 2019). These results indicate an important role for LEA protein genes under drought stress in sorghum.

The wild sorghum genotype IS-18,909 possesses exceptional drought and heat tolerance (Hinge et al. 2015; Gokul et al. 2014). Because it can tolerate extreme drought stress, this genotype can be considered a good gene source for combating various abiotic stresses. In view of these facts, the present study investigated genes encoding late embryogenesis abundant (LEA) proteins during drought stress in sorghum.

Materials and methods

Collection of plant materials and RNA isolation

The present investigation was conducted at the State Level Biotechnology Centre, Mahatma Phule Krishi Vidyapeeth (MPKV), Rahuri, District Ahmednagar, India. Seeds of the wild sorghum genotype IS-18,909 (Sorghum bicolor (L.) Moench subsp. verticilliflorum (Steud.) de Wet ex Wiersema & J. Dahlb.) were obtained from the Senior Sorghum Breeder, All India Coordinated Sorghum Improvement Project (AICSIP), MPKV, Rahuri. Seeds were germinated on vermiculite under laboratory conditions. Fifteen-day-old seedlings were subjected to PEG-6000-induced drought stress at an osmotic potential of −0.4 MPa, following Michel and Kaufmann (1973). Leaves from control and stressed seedlings were used immediately for total RNA isolation with the Qiagen RNeasy Mini Kit. RNA integrity was confirmed by formaldehyde agarose (FA) gel electrophoresis (0.8% agarose), and total RNA was quantified spectrophotometrically and diluted to working concentration.
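The Methods cite Michel and Kaufmann (1973) for the PEG-6000 stress treatment but report neither the PEG concentration nor the solution temperature. The sketch below evaluates the empirical Michel and Kaufmann equation (C in g PEG-6000 per kg water, T in °C, result in bar) and inverts it by bisection to find the concentration that gives a target potential. The 25 °C temperature, the −0.4 MPa example target and the 0–400 g/kg search range are assumptions rather than values from this study. The conversion uses 1 MPa = 10 bar.

```python
# Michel and Kaufmann (1973) empirical equation for PEG-6000.
# ASSUMPTIONS: C in g PEG-6000 per kg water, T in deg C, result in bar;
# the temperature and target used in the example are illustrative only.

BAR_PER_MPA = 10.0  # 1 MPa = 10 bar


def peg_osmotic_potential_bar(conc_g_per_kg: float, temp_c: float) -> float:
    """Osmotic potential (bar) of a PEG-6000 solution."""
    c, t = conc_g_per_kg, temp_c
    return (-1.18e-2 * c
            - 1.18e-4 * c ** 2
            + 2.67e-4 * c * t
            + 8.39e-7 * c ** 2 * t)


def peg_concentration_for(target_bar: float, temp_c: float,
                          lo: float = 0.0, hi: float = 400.0,
                          tol: float = 1e-6) -> float:
    """Bisection for the concentration (g/kg water) giving target_bar.

    The potential falls monotonically with concentration at ordinary
    laboratory temperatures, so bisection on [lo, hi] converges.
    """
    def excess(c: float) -> float:
        return peg_osmotic_potential_bar(c, temp_c) - target_bar

    if excess(lo) < 0 or excess(hi) > 0:
        raise ValueError("target potential is outside the searched range")
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if excess(mid) > 0:  # still less negative than the target
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


if __name__ == "__main__":
    temp_c = 25.0                       # assumed solution temperature
    target_bar = -0.4 * BAR_PER_MPA     # -0.4 MPa expressed in bar
    conc = peg_concentration_for(target_bar, temp_c)
    print(f"{conc:.1f} g PEG-6000 per kg water at {temp_c} °C")
```

Bisection is used instead of solving the quadratic in C directly because it stays valid if the temperature terms change the sign of the coefficients. The equation is empirical and should not be extrapolated far beyond its calibrated range.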

cDNA synthesis by RT-PCR (reverse transcription PCR)

The single-step GeNei AMV RT-PCR kit was used to synthesize cDNA from sorghum leaf RNA, which was then amplified using Hot-start Taq DNA polymerase. Reverse transcription was carried out at 50 °C for 1 h, after which the Hot-start Taq DNA polymerase was activated at 95 °C for 15 min. The cDNA was amplified with 40 cycles of denaturation at 94 °C for 1 min, annealing at 50–60 °C (Table 1) for 1 min, and extension at 72 °C for 1 min 30 s in a thermal cycler (Eppendorf Mastercycler Gradient, Germany). The final extension was carried out at 72 °C for 1 min, followed by a hold at 4 °C until retrieval. The 10 µL RT-PCR reaction mixture consisted of the components provided with the single-step GeNei AMV RT-PCR kit, 25 pmol of each gene-specific primer, and 5 ng of template RNA. RT-PCR products were analyzed by 1.2% agarose gel electrophoresis in TBE buffer, and images were captured with a gel documentation system (FluorChem™, Alpha Innotech, USA). The gene-specific cDNA bands (LEA1: 547 bp; LEA3: 817 bp) were excised from the gel with separate sterile blades, and the DNA was eluted using the HiPer™ Minispin Gel Extraction Kit.
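The reaction above uses 5 ng of template RNA in 10 µL. The following sketch shows the spectrophotometric arithmetic for preparing that template. It assumes the standard approximation that one A260 unit corresponds to about 40 µg/mL of single-stranded RNA, and it uses a C1V1 = C2V2 dilution. The absorbance reading, dilution factor and 5 ng/µL working concentration are hypothetical.

```python
# ASSUMPTIONS: 1 A260 unit ~ 40 ug/mL ssRNA (standard approximation);
# the reading, dilution factor and working concentration are hypothetical.

UG_PER_ML_PER_A260_RNA = 40.0


def rna_ng_per_ul(a260: float, dilution_factor: float = 1.0) -> float:
    """RNA concentration in ng/uL (numerically equal to ug/mL)."""
    return a260 * UG_PER_ML_PER_A260_RNA * dilution_factor


def dilution(stock: float, working: float,
             final_ul: float) -> tuple[float, float]:
    """C1*V1 = C2*V2: volumes (uL) of stock and water for final_ul."""
    if not 0 < working <= stock:
        raise ValueError("working concentration must be >0 and <= stock")
    v_stock = working * final_ul / stock
    return v_stock, final_ul - v_stock


if __name__ == "__main__":
    stock = rna_ng_per_ul(0.25, dilution_factor=50)  # hypothetical reading
    # A 5 ng/uL working stock lets 1 uL supply the 5 ng template.
    v_rna, v_water = dilution(stock, working=5.0, final_ul=100.0)
    print(f"stock {stock:.0f} ng/uL: {v_rna:.2f} uL RNA + {v_water:.2f} uL water")
```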

Table 1 Gene-specific primers designed for genes encoding desiccation tolerance proteins
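The primer sequences in Table 1 are not reproduced here. For orientation only, the sketch below computes two common rule-of-thumb melting temperatures: the Wallace rule and the basic GC-content formula. It also applies the frequently used heuristic of annealing about 5 °C below Tm. This is not necessarily how the Table 1 primers or the 50–60 °C annealing range were chosen, and the primer sequence is hypothetical.

```python
# Rule-of-thumb primer Tm estimates (generic approximations).
# ASSUMPTION: the primer below is hypothetical, not taken from Table 1.


def base_counts(primer: str) -> tuple[int, int]:
    """Return (A+T count, G+C count) for a DNA primer."""
    s = primer.upper()
    at = s.count("A") + s.count("T")
    gc = s.count("G") + s.count("C")
    if at + gc != len(s):
        raise ValueError("primer must contain only A, C, G, T")
    return at, gc


def tm_wallace(primer: str) -> float:
    """Wallace rule, Tm = 2(A+T) + 4(G+C); intended for short oligos."""
    at, gc = base_counts(primer)
    return 2.0 * at + 4.0 * gc


def tm_basic(primer: str) -> float:
    """Basic GC formula for longer primers: 64.9 + 41(G+C - 16.4)/N."""
    _, gc = base_counts(primer)
    return 64.9 + 41.0 * (gc - 16.4) / len(primer)


if __name__ == "__main__":
    primer = "ATGCATGCATGCATGCATGC"  # hypothetical 20-mer
    tm = tm_basic(primer)
    # Heuristic starting point: anneal about 5 deg C below Tm.
    print(f"Tm ~ {tm:.1f} C; trial annealing ~ {tm - 5:.1f} C")
```

For a 20-mer the two rules disagree noticeably, which is one reason a gradient of annealing temperatures is often tested empirically.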

Cloning of cDNA fragments

The gel-eluted LEA1 and LEA3 gene-specific cDNA fragments were cloned into the vector pTZ57R/T (2886 bp). Ligation reactions were prepared on ice with the InsTAclone™ PCR Cloning Kit, and the ligation mixture was incubated overnight at 4 °C. Competent cells of Escherichia coli strain JM109 were transformed with this ligation mixture and incubated overnight at 37 °C on Luria-Bertani (LB) agar plates containing ampicillin (100 µg/mL) and nalidixic acid (30 µg/mL). To obtain distinct single colonies, transformed colonies were streaked and incubated overnight at 37 °C on LB-ampicillin (100 µg/mL) agar plates. Single colonies from these plates were sub-cultured in 5 mL of LB broth containing nalidixic acid and incubated overnight at 37 °C. The transformed E. coli culture was pelleted and resuspended in 0.5 mL of 60% glycerol, 0.5 mL of LB broth, and 90 µL of DMSO. The cells were snap-frozen in liquid nitrogen (−196 °C) and stored at −80 °C until further use.
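The cloning step combines a PCR insert with the 2886 bp pTZ57R/T vector. The sketch below shows the usual molar-ratio arithmetic for the ligation. It assumes a 3:1 insert:vector ratio and 50 ng of vector, both illustrative choices rather than reported values. It also computes the final glycerol and DMSO levels implied by the stated storage mix of 0.5 mL of 60% glycerol, 0.5 mL of LB and 90 µL of DMSO.

```python
# ASSUMPTIONS: 3:1 insert:vector molar ratio and 50 ng vector are
# illustrative; vector/insert sizes and stock volumes come from the text.


def insert_ng(vector_ng: float, insert_bp: int, vector_bp: int,
              molar_ratio: float = 3.0) -> float:
    """Insert mass giving molar_ratio insert molecules per vector.

    For double-stranded DNA, mass scales with length, so
    ng insert = ng vector * (insert bp / vector bp) * ratio.
    """
    return vector_ng * insert_bp / vector_bp * molar_ratio


def final_percent(solute_ml: float, total_ml: float) -> float:
    """Final % (v/v) of a solute after mixing."""
    return 100.0 * solute_ml / total_ml


if __name__ == "__main__":
    for name, bp in (("LEA1", 547), ("LEA3", 817)):
        print(name, f"{insert_ng(50.0, bp, 2886):.1f} ng insert per 50 ng vector")

    # 0.5 mL of 60% glycerol + 0.5 mL LB + 0.09 mL DMSO
    total = 0.5 + 0.5 + 0.09
    print(f"glycerol {final_percent(0.5 * 0.60, total):.1f}% v/v, "
          f"DMSO {final_percent(0.09, total):.1f}% v/v")
```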

Confirmation of cloned fragments by PCR amplification and restriction analysis

Plasmid DNA was isolated from 5 mL cultures of transformed E. coli in LB broth using the GeneJET Plasmid Miniprep Kit (MBI Fermentas Life Sciences). All recombinant plasmids were double-digested with EcoRI and BamHI (M/s Bangalore Genei Ltd.) at 37 °C for 1 h. Plasmid DNA from each recombinant clone was also PCR amplified. Products of both PCR amplification and restriction digestion were analyzed by 1.2% agarose gel electrophoresis (Fig. 1).
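A double digest with EcoRI and BamHI is expected to release the insert from the recombinant plasmid. This should give one vector-sized band and one insert-sized band on the gel. The sketch below computes fragment sizes for any set of cut positions on a circular molecule. The plasmid size (vector plus LEA1 insert) is an assumption, and the cut coordinates are hypothetical placeholders rather than mapped sites.

```python
# ASSUMPTIONS: recombinant size = vector + insert; cut coordinates are
# hypothetical, since real positions depend on the polylinker and the
# insert orientation.


def circular_fragments(length: int, cuts: list[int]) -> list[int]:
    """Fragment lengths (largest first) for cut positions on a circle.

    Each fragment runs from one cut to the next; the last one wraps
    around the origin back to the first cut.
    """
    if not cuts:
        raise ValueError("an uncut circular plasmid yields no linear fragments")
    pos = sorted({c % length for c in cuts})
    frags = [b - a for a, b in zip(pos, pos[1:])]
    frags.append(length - pos[-1] + pos[0])
    return sorted(frags, reverse=True)


if __name__ == "__main__":
    plasmid_bp = 2886 + 547          # pTZ57R/T carrying the LEA1 insert
    ecori, bamhi = 100, 100 + 577    # hypothetical sites flanking the insert
    print(circular_fragments(plasmid_bp, [ecori, bamhi]))
```

A single cut returns the full plasmid length as one linear fragment. This corresponds to the band expected if only one of the two enzymes cuts.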

Fig. 1

Restriction digestion of cloned cDNAs (LEA1: 547 bp; LEA3: 817 bp) in the pTZ57R/T (2886 bp) vector

Sequencing analysis of cloned and eluted fragments

Cloned fragments in the PCR cloning vector pTZ57R/T (2886 bp) (pTZ57R/T::500, pTZ57R/T::806 bp) were custom-sequenced in both directions by M/s Bangalore Genei Ltd. using the universal sequencing primers M13F and M13R. Sequencing results were analyzed using ChromasLite 2.01 software. To study their phylogenetic relationships, the LEA1 and LEA3 sequences obtained by cloning were evaluated by BLAST homology search. Sequence data were submitted to GenBank (www.ncbi.nlm.nih.gov) through BankIt, and accession numbers KJ637318.1 and KT030731.1 were assigned to LEA1 (547 bp) and LEA3 (816 bp), respectively. Consensus motifs of the different LEA groups identified previously (Dure 2001; Battaglia et al. 2008; Pedrosa et al. 2015) were searched to classify the deduced LEA protein sequences.
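Reads from M13F and M13R cover the insert from opposite strands, so the reverse read must be reverse-complemented before the two reads are combined. This simplified sketch joins the reads at their longest exact overlap. It is not the ChromasLite workflow used here, and it ignores base-call quality and mismatches. The 30 bp insert is hypothetical.

```python
# Simplified read merging: exact suffix/prefix overlap only.
# ASSUMPTION: no sequencing errors in the overlap; the insert is hypothetical.

_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence (A/C/G/T/N)."""
    return seq.translate(_COMPLEMENT)[::-1]


def merge_reads(fwd: str, rev: str, min_overlap: int = 10) -> str:
    """Join fwd with reverse_complement(rev) at the longest exact
    suffix/prefix overlap of at least min_overlap bases."""
    if min_overlap < 1:
        raise ValueError("min_overlap must be at least 1")
    rc = reverse_complement(rev)
    for k in range(min(len(fwd), len(rc)), min_overlap - 1, -1):
        if fwd[-k:] == rc[:k]:
            return fwd + rc[k:]
    raise ValueError("no overlap of at least min_overlap bases")


if __name__ == "__main__":
    insert = "ATGGCGTACCTAGTTCAGAGCTTGCAAGGT"  # hypothetical 30 bp insert
    fwd_read = insert[:20]
    rev_read = reverse_complement(insert[10:])  # as read from the M13R side
    print(merge_reads(fwd_read, rev_read))
```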

Analysis of sorghum LEA amino acid sequence

General features of the LEA proteins, such as molecular weight, isoelectric point, amino acid (aa) composition, molar extinction coefficient, and half-life, were predicted using ExPASy ProtParam (http://web.expasy.org/protparam/). Motifs of the LEA1 and LEA3 proteins were identified using MotifScan (http://myhits.isb-sib.ch/cgi-bin/motif_scan) (Sigrist et al. 2010). The hydrophilicity/hydrophobicity of the LEA1 and LEA3 proteins was analyzed using ProtScale (Sweet et al. 1983; Gasteiger et al. 2005).
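ProtScale plots a sliding-window average of a residue scale, and ProtParam's GRAVY is the mean Kyte-Doolittle value over the whole sequence. The sketch below reproduces both calculations under stated assumptions: the Kyte-Doolittle scale, a 9-residue window with equal weights, and only the 20 standard residues. The server settings used in this study are not reported, and the peptide is hypothetical.

```python
# ASSUMPTIONS: Kyte-Doolittle scale, 9-residue equal-weight window,
# 20 standard residues only; the peptide is hypothetical.

KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(seq: str, window: int = 9) -> list[tuple[int, float]]:
    """(1-based centre position, mean score) for each full window."""
    if window % 2 == 0 or window > len(seq):
        raise ValueError("window must be odd and no longer than the sequence")
    scores = [KYTE_DOOLITTLE[aa] for aa in seq.upper()]
    half = window // 2
    return [(i + half + 1, sum(scores[i:i + window]) / window)
            for i in range(len(scores) - window + 1)]


def gravy(seq: str) -> float:
    """Grand average of hydropathicity: mean score over the sequence."""
    return sum(KYTE_DOOLITTLE[aa] for aa in seq.upper()) / len(seq)


if __name__ == "__main__":
    peptide = "MGEEGKTRKEQLGTEGYQEMGRKGGL"  # hypothetical, glycine-rich
    print(f"GRAVY = {gravy(peptide):.3f}")
    for pos, score in hydropathy_profile(peptide):
        print(pos, round(score, 2))
```

Negative GRAVY values, such as those reported for LEA1 and LEA3, indicate an overall hydrophilic protein.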

The secondary structures of LEA1 and LEA3 were predicted using PredictProtein (https://www.predictprotein.org/). Their topological structural characteristics were then determined using the online service SMART (Letunic et al. 2012), and their three-dimensional (3D) structures were estimated by homology modeling with the online service SWISS-MODEL (Bienert et al. 2017). 3D models of LEA1 and LEA3 were constructed using Swiss-PDB Viewer (Guex et al. 2009; Marchler-Bauer et al. 2017) and Phyre2 (Kelley et al. 2015), and were assessed using Verify_3D (http://services.mbi.ucla.edu/Verify_3D/). SPIDER2 (Sequence-based Prediction of Local and Nonlocal Structural Features for Proteins) (http://sparks-lab.org/server/SPIDER2/) was used to predict helix-forming regions.
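Predicted secondary-structure percentages, such as those reported for LEA1 in the Results, are per-residue fractions of each state. The following sketch assumes the prediction has already been reduced to one character per residue: H for helix, E for strand and L for loop or other. PredictProtein's own output format differs and would need parsing first. The 41/2/138 split in the example is inferred from the reported LEA1 percentages and is not stated in the text.

```python
# ASSUMPTION: one state code per residue (H helix, E strand, L loop/other);
# real PredictProtein output must first be converted to this form.

from collections import Counter


def ss_percentages(states: str) -> dict[str, float]:
    """Percentage of residues in each state, rounded to two decimals."""
    if not states:
        raise ValueError("empty prediction")
    counts = Counter(states)
    unknown = set(counts) - {"H", "E", "L"}
    if unknown:
        raise ValueError(f"unexpected state codes: {sorted(unknown)}")
    n = len(states)
    return {s: round(100.0 * counts[s] / n, 2) for s in ("H", "E", "L")}


if __name__ == "__main__":
    # Counts inferred from the reported LEA1 percentages, not given directly:
    # 41 helix + 2 strand + 138 loop = 181 residues.
    lea1_like = "H" * 41 + "E" * 2 + "L" * 138
    print(ss_percentages(lea1_like))
```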

Results

In the present study, genes encoding LEA proteins expressed under drought stress in sorghum were investigated. Gene-specific primers were designed, and the full-length cDNAs of these genes were cloned and sequenced for further characterization.

cDNA sequence homology analysis

In BLAST (megablast) alignment, the 547 bp LEA1 insert showed 100% query coverage with 99% and 95% homology (gaps = 0%) to the sorghum LEA mRNA, complete cds (DQ855277.1), and the Zea mays LEA1 mRNA, complete cds (EU961259.1), respectively. This confirmed that the 547 bp cloned PCR fragment contains the full-length coding sequence of the LEA1 gene.

A BLAST homology search with the 816 bp LEA3 cDNA insert showed 100% query coverage with six LEA3 gene accessions. These comprised five sorghum LEA3 genes (97.55–99.51% homology with no gaps) and one Pogonatherum LEA3 gene (84.8% homology with 7% gaps). The insert also showed 99.18–100% homology with 75–89% coverage (gaps = 12%) to three sorghum LEA3 cds, and 90.76–93.0% homology with 38–52% coverage (gaps = 12%) to three of five Zea mays LEA3 cds. The remaining two Zea mays accessions showed 86.3–89.1% homology with 19–36% coverage (gaps = 0–1%). This confirmed that the 817 bp cloned PCR fragment contains the complete coding sequence of the LEA3 gene.
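BLAST summarizes each hit with two numbers. Percent identity is the number of identical positions divided by the aligned length, gaps included. Query coverage is the share of query positions covered by any HSP. The following sketch implements both definitions, assuming HSPs are given as 1-based inclusive query intervals. Overlapping HSPs are merged so that no base is counted twice. The example values are hypothetical.

```python
# ASSUMPTION: HSPs are 1-based inclusive (start, end) query intervals;
# example values are hypothetical.


def percent_identity(identities: int, alignment_length: int) -> float:
    """Identical positions as a percentage of aligned length (gaps included)."""
    return 100.0 * identities / alignment_length


def query_coverage(query_length: int, hsps: list[tuple[int, int]]) -> float:
    """Percentage of query positions covered by the union of HSP intervals."""
    covered = 0
    cur_start, cur_end = None, None
    for start, end in sorted(hsps):
        if cur_end is None or start > cur_end + 1:
            if cur_end is not None:
                covered += cur_end - cur_start + 1
            cur_start, cur_end = start, end
        else:  # overlapping or adjacent: extend the current block
            cur_end = max(cur_end, end)
    if cur_end is not None:
        covered += cur_end - cur_start + 1
    return 100.0 * covered / query_length


if __name__ == "__main__":
    # hypothetical HSP: 816 nt query aligned over positions 1-612
    print(query_coverage(816, [(1, 612)]), round(percent_identity(605, 612), 2))
```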

LEA1 and LEA3 protein homology analysis

In blastp analysis, the 181 amino acid LEA1 protein showed complete query coverage in 44 hits, and another 59 entries showed over 90% coverage. Only one sorghum entry, LEA B19.3, showed 69.1% homology (125 of 181 aa residues), with a 44 amino acid gap (between residues 44 and 87) and 12 mismatches.

In blastp analysis, the 271 amino acid LEA3 protein showed complete query coverage in 43 hits, five of which were from sorghum and exhibited 73.4–74.9% homology. These five top homologous sorghum entries matched 199–203 aa residues, and all of them showed two gap regions, one of 31 amino acids (between residues 21 and 51) and one of 37 amino acids (between residues 186 and 222). Another sorghum entry showed over 90% coverage and 71.2% homology.
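The blastp percentages reported for LEA1 and LEA3 can be reproduced by simple arithmetic if "homology" is taken as matched residues divided by the full query length, where matched residues are the query length minus gap residues minus mismatches. This reading is inferred from the numbers rather than stated in the text. On that basis, 181 - 44 - 12 = 125 gives 69.1%, and 271 - 31 - 37 = 203 gives 74.9%. A match of 199 residues, the lower end of the quoted range, gives 73.4%.

```python
# ASSUMPTION (inferred, not stated in the text): reported "homology" =
# (query length - gap residues - mismatches) / query length.
# Gap regions are 1-based inclusive query intervals.


def matched_residues(query_len: int, gaps: list[tuple[int, int]],
                     mismatches: int = 0) -> int:
    """Query residues aligned without a gap or mismatch."""
    gap_len = sum(end - start + 1 for start, end in gaps)
    return query_len - gap_len - mismatches


def percent_of_query(query_len: int, gaps: list[tuple[int, int]],
                     mismatches: int = 0) -> float:
    """Matched residues as a percentage of the full query length."""
    return 100.0 * matched_residues(query_len, gaps, mismatches) / query_len


if __name__ == "__main__":
    # LEA1 vs LEA B19.3: 44 aa gap (residues 44-87) and 12 mismatches
    print(round(percent_of_query(181, [(44, 87)], mismatches=12), 1))
    # LEA3 vs top sorghum hits: gaps at residues 21-51 and 186-222
    print(round(percent_of_query(271, [(21, 51), (186, 222)]), 1))
```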

LEA protein conserved domains and motifs analysis

The LEA1 protein showed an LEA domain from residue 1 to 177, with a gap between residues 48 and 108 (Suppl. Figure 1a and Fig. 2a). The typical LEA1-specific internal 20-mer motif (TRKEQ[L/M]G[T/E]EGY[Q/K]EMGRKGG[L/E]) is present in three copies between amino acid positions 95 and 154, while two other LEA1-specific 20-mer motifs are each present in a single copy (Table 2). The typical LEA1 N-terminal motif (TVVPGGTGGKSLEAQE[H/N]LAE) is located at residues 20–39, and the LEA1 C-terminal motif (D[K/E]SGGERA[A/E][E/R]EGI[E/D]IDESK[F/Y]) at residues 158–177.
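The bracket notation used for the LEA1 motifs, for example TRKEQ[L/M]G[T/E]EGY[Q/K]EMGRKGG[L/E], maps directly onto regular-expression character classes. The sketch below converts such consensus strings and reports every match with 1-based positions, including overlapping ones. It illustrates the idea and is not the search procedure used in this study. The test peptide is hypothetical.

```python
# Consensus motifs with [X/Y] alternatives -> Python regular expressions.
# ASSUMPTION: the peptide in the example is hypothetical.

import re


def consensus_to_regex(consensus: str) -> str:
    """Turn e.g. 'TRKEQ[L/M]G' into the regex 'TRKEQ[LM]G'."""
    compact = consensus.replace(" ", "")
    return re.sub(r"\[([A-Z/]+)\]",
                  lambda m: "[" + m.group(1).replace("/", "") + "]",
                  compact)


def find_motif(seq: str, consensus: str) -> list[tuple[int, int]]:
    """1-based inclusive (start, end) of every, possibly overlapping, match."""
    pattern = re.compile(f"(?=({consensus_to_regex(consensus)}))")
    return [(m.start(1) + 1, m.end(1)) for m in pattern.finditer(seq.upper())]


LEA1_INTERNAL = "TRKEQ[L/M]G[T/E]EGY[Q/K]EMGRKGG[L/E]"

if __name__ == "__main__":
    peptide = "MA" + "TRKEQLGTEGYQEMGRKGGL" + "SS" + "TRKEQMGEEGYKEMGRKGGE"
    print(consensus_to_regex(LEA1_INTERNAL))
    print(find_motif(peptide, LEA1_INTERNAL))
```

The lookahead `(?=(...))` lets `finditer` report matches that share residues, which a plain pattern would skip.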

Table 2 Evaluation of consensus motifs in amino acid (aa) sequences from different LEA groups
Fig. 2

(a) Conserved motifs and structural features of the LEA1 protein (AID50187), shown below the position ruler of the query. Motif scan legend: 1: Amidation; 2: CK2_Phospho_Site; 3: Myristyl; 4: PKC_Phospho_Site; 5: Small Hydrophilic Plant Seed; 6: LEA 5. (b) Conserved motifs and structural features of the LEA3 protein (KT030731.1), shown below the position ruler of the query. Motif scan legend: 1: ASN_Glycosylation; 2: CK2_Phospho_Site; 3: Myristyl; 4: PKC_Phospho_Site; 5: TYR_Phospho_Site; 6: CAP160; 7: CRA_rpt; 8: LEA_4 (89–121; 122–165); 9: Oleosin; 10: CRA_rpt; 11: LEA 4 (89–132); 12: LEA 4 (133–176); 13: NUMOD3

The 271 amino acid LEA3 protein showed two overlapping LEA domains, spanning residues 89–132 and 122–165 (Suppl. Figure 1b and Fig. 2b). The typical LEA3-specific internal 11-mer sequence (ATEAAKQKASE) described by Dure (2001) is present in five copies between amino acid positions 100 and 153. The SYKAGETKGRKT motif is present twice, and two additional LEA3-specific 11-mer motifs are each present in a single copy. Further, this protein appears to belong to the LEA3-D-7 subgroup (Table 2).
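Copies of the LEA3 11-mer are often imperfect, so an exact search can miss them. This sketch scans every 11-residue window and keeps those within a chosen number of substitutions (Hamming distance) of ATEAAKQKASE. It does not handle insertions or deletions. The mismatch threshold is an assumption, and the peptide is hypothetical.

```python
# ASSUMPTIONS: copies differ from the consensus only by substitutions;
# the mismatch threshold and the peptide are hypothetical.


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def approx_occurrences(seq: str, motif: str,
                       max_mismatches: int = 2) -> list[tuple[int, int, int]]:
    """1-based (start, end, mismatches) for windows within max_mismatches."""
    seq, motif = seq.upper(), motif.upper()
    k = len(motif)
    hits = []
    for i in range(len(seq) - k + 1):
        d = hamming(seq[i:i + k], motif)
        if d <= max_mismatches:
            hits.append((i + 1, i + k, d))
    return hits


if __name__ == "__main__":
    peptide = "MS" + "ATEAAKQKASE" + "GG" + "ATEAAKEKASE"  # hypothetical
    print(approx_occurrences(peptide, "ATEAAKQKASE", max_mismatches=1))
```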

Structural features of the 181 aa LEA1 protein

Conserved domain analysis of the LEA1 protein revealed two distinct domains, spanning residues 1–40 and 109–177, of the LEA5 superfamily type (small hydrophilic plant seed protein), separated by a 60 amino acid gap (Fig. 2a). MotifScan identified the following sites:

- seven casein kinase II phosphorylation sites, each a tetrapeptide;
- five protein kinase C phosphorylation sites, four of them tripeptides;
- six N-myristoylation sites, each six amino acid residues long (at 24–29, 93–98, 109–114, 133–138, 152–157, and 161–166);
- a glycine-rich region (88–162 aa);
- two amidation sites (at residues 128–131 and 148–151).
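MotifScan reports sites such as PKC and CK2 phosphorylation, N-myristoylation and amidation by matching PROSITE patterns. The sketch below gives a simplified version of that matching. It translates five PROSITE patterns (PS00005, PS00006, PS00008, PS00009 and PS00001) into regular expressions and allows overlapping hits. MotifScan also applies pattern-specific rules and profiles, so counts from this code may differ from those reported. The test peptides are hypothetical.

```python
# Simplified PROSITE pattern scan (raw pattern matches, overlaps allowed).
# ASSUMPTION: no PROSITE context rules are applied; peptides are hypothetical.

import re

PROSITE_REGEX = {
    "PKC_PHOSPHO_SITE": r"[ST].[RK]",                  # [ST]-x-[RK]
    "CK2_PHOSPHO_SITE": r"[ST].{2}[DE]",               # [ST]-x(2)-[DE]
    "MYRISTYL": r"G[^EDRKHPFYW].{2}[STAGCN][^P]",      # G-{EDRKHPFYW}-x(2)-[STAGCN]-{P}
    "AMIDATION": r".G[RK][RK]",                        # x-G-[RK]-[RK]
    "ASN_GLYCOSYLATION": r"N[^P][ST][^P]",             # N-{P}-[ST]-{P}
}


def scan_prosite(seq: str) -> dict[str, list[tuple[int, int]]]:
    """1-based inclusive (start, end) hits for each pattern."""
    seq = seq.upper()
    hits = {}
    for name, rx in PROSITE_REGEX.items():
        pattern = re.compile(f"(?=({rx}))")  # lookahead keeps overlaps
        hits[name] = [(m.start(1) + 1, m.end(1)) for m in pattern.finditer(seq)]
    return hits


if __name__ == "__main__":
    for name, sites in scan_prosite("MSGAAASAKTAADNASAGRKA").items():
        print(name, sites)
```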

SWISS-MODEL analysis of the LEA1 protein (AID50187; GenBank KJ637318.1) revealed two alternative pairs of LEA5 family domains. Both pairs share a common Domain I (residues 1–47), combined with either Domain II-A of 55 aa (residues 80–134) or Domain II-B of 60 aa (residues 118–177). There are two internal repeats of 26 amino acids, at residues 92–117 and 132–157. The protein structure is highly complex, with a low-complexity region only between residues 158 and 172. In 3D structure analysis using Phyre2, 94% of the LEA1 protein could not be meaningfully modeled, and the model is highly speculative because 66% of the sequence was predicted to be disordered (Fig. 3a and b).

Fig. 3

(a, b) 3D structure of the LEA1 protein (AID50187) predicted using SWISS-MODEL (a) and Phyre2 (b); (c, d) 3D structure of the LEA3 protein (KT030731.1) predicted using SWISS-MODEL (c) and Phyre2 (d)

Structural features of the 271 aa LEA3 protein

Conserved domain analysis of the LEA3 protein revealed two partially overlapping LEA4 superfamily domains, spanning residues 89–132 and 122–165 (Fig. 2b). MotifScan identified the following sites:

- five protein kinase C phosphorylation sites, four of them tripeptides;
- a single tyrosine kinase phosphorylation site at the end of the second LEA domain (residues 161–168);
- five N-myristoylation sites outside the LEA region, each six amino acid residues long (at 55–60, 82–87, 211–216, 234–239, and 246–251);
- a threonine-rich region (99–147 aa);
- an N-glycosylation site at residues 249–253.

The protein also has an oleosin motif at 80–96 aa (which has a role in membrane desiccation tolerance), a CRA_rpt motif at 137–169 aa, and a CAP160 motif at 222–247 aa.

SWISS-MODEL analysis of the LEA3 protein (KT030731.1) generated seven alternative models, most of which exhibited helical structure (Fig. 3c and d). A transmembrane domain-like region is present at residues 26–48, and low-complexity regions occur only at residues 103–165 and 248–255. In 3D structure analysis using Phyre2, 94% of the LEA3 protein formed an alpha-helical structure (excluding both terminal regions), with 26% of the protein predicted to be disordered, mostly in the terminal regions. The protein also showed 15.85% identity with apolipoprotein A-1.

Hydrophilicity analysis of the 181 aa sorghum LEA1 protein

The LEA1 protein (KJ637318.1) consisted of 181 amino acids, with a molecular weight of 19.65 kDa and a pI of 6.4, and contained LEA5 superfamily domains (Fig. 4a). This protein is highly disordered, with the region spanning residues 19–122 lacking any regular secondary structure. Secondary structure analysis predicted 76.2% loop, 22.65% helix, and 1.1% strand. The most abundant amino acids were glycine (28; 15.5%) and glutamic acid (23; 12.7%), followed by leucine (14; 7.7%), arginine (13; 7.2%), and alanine (12; 6.6%). There were comparatively fewer hydroxyl-containing amino acids, serine (12 residues; 6.6%) and threonine (10 residues; 5.5%), as well as lysine and glutamine (11 each; 6.1%). The least represented amino acids were Trp (absent), Cys (3), Phe (3), Asp (3), Tyr (4), and Asn (4); all of these except Asp and Asn are hydrophobic or aromatic. There were 26 negatively charged residues (Asp + Glu) and 24 positively charged residues (Arg + Lys). According to ProtParam, the LEA1 protein had an aliphatic index of 57.18 and a grand average of hydropathicity (GRAVY) of -0.865. Its instability index (II) of 51.43 exceeds the ProtParam threshold of 40, so this protein is classified as unstable.
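The composition-derived values above follow standard definitions. The sketch below assumes ProtParam's aliphatic index formula, X(Ala) + 2.9 X(Val) + 3.9 [X(Ile) + X(Leu)] with X in mole percent. It also applies ProtParam's rule that an instability index below 40 predicts a stable protein. The peptide is hypothetical. The usage example also checks that the reported 26 negative and 24 positive residues correspond to the 27.6% charged-residue content cited in the Discussion.

```python
# ASSUMPTIONS: ProtParam's aliphatic index definition and its
# instability threshold of 40; the peptide is hypothetical.

from collections import Counter


def mole_percent(seq: str) -> dict[str, float]:
    """Mole percent of each residue type in the sequence."""
    counts = Counter(seq.upper())
    return {aa: 100.0 * n / len(seq) for aa, n in counts.items()}


def aliphatic_index(seq: str) -> float:
    """X(Ala) + 2.9 X(Val) + 3.9 [X(Ile) + X(Leu)], X in mole percent."""
    x = mole_percent(seq)
    return (x.get("A", 0.0) + 2.9 * x.get("V", 0.0)
            + 3.9 * (x.get("I", 0.0) + x.get("L", 0.0)))


def charge_counts(seq: str) -> tuple[int, int]:
    """(negative Asp+Glu, positive Arg+Lys) residue counts."""
    c = Counter(seq.upper())
    return c["D"] + c["E"], c["R"] + c["K"]


def predicted_stable(instability_index: float) -> bool:
    """ProtParam rule: II below 40 predicts a stable protein."""
    return instability_index < 40.0


if __name__ == "__main__":
    print(aliphatic_index("AAVVILDEKR"), charge_counts("AAVVILDEKR"))
    # Reported LEA1 figures: 26 negative + 24 positive of 181 residues
    print(f"charged: {100 * (26 + 24) / 181:.1f}%")
    print(predicted_stable(51.43), predicted_stable(37.61))
```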

Fig. 4

(a) Prediction of hydrophobic domains of LEA 1 protein (AID50187); (b) Prediction of hydrophobic domains of LEA 3 protein (KT030731.1)

Hydrophilicity analysis of the 271 aa sorghum LEA3 protein

The LEA3 protein (KT030731.1) consisted of 271 amino acids, with a molecular weight of 29.2 kDa and a theoretical pI of 8.59. Almost the entire protein (97.8%) was predicted to form an alpha-helical structure, with 2.2% forming loops and no disordered regions (Fig. 4b). It has a high content of alanine (38; 14.0%) and threonine (34; 12.5%). Lysine (29 residues) was the next most abundant amino acid, followed by glutamine (8.5%), glutamic acid (7.4%), and serine (7.0%). The least represented amino acids were Trp (2), Pro (2), Tyr (3), Cys (3), and Phe (4). There were 32 negatively charged residues (Asp + Glu) and 35 positively charged residues (Arg + Lys). According to ProtParam, the LEA3 protein had an aliphatic index of 57.79 and a GRAVY of -0.738 (Fig. 4b). With an instability index (II) of 37.61, below the threshold of 40, this protein was classified as stable.

Discussion

Under climate change, water stress will be a major factor limiting crop production in this century. Transcriptome profiling of drought-stressed sorghum plants at various stages revealed the complex nature of the drought response. The expression of 10,727 genes was significantly altered, and 75% of these reverted to their original expression levels once the stress was removed (Varoquaux et al. 2019).

LEA proteins, including dehydrins, are small, highly hydrophilic proteins that have a role in imparting desiccation tolerance in plants (Close 1996; Cuming 1999; Nagaraju et al. 2019). They are protective proteins that prevent the denaturation of cellular components and crystallization-induced damage (Abdul et al. 2021) and stabilize membranes and proteins against dehydration (Fracasso et al. 2016). They accumulate in vegetative tissues and recalcitrant seeds, where they protect phospholipid membranes. Grasses tolerate desiccation stress during growth through various conserved mechanisms shared with the seed dehydration process, with overlapping patterns of LEA expression under both conditions (Pardo et al. 2020). Owing to their high content of polar amino acid residues, LEA proteins are thought to provide preferential hydration to intracellular macromolecules (Damame et al. 2016). On further dehydration, they would provide a layer of their own hydroxylated residues that interact with surface proteins, acting as replacement water (Close 1996).

LEA genes have been classified into eight groups based on their conserved domains and homology (Hunault and Jaspard 2010), whereas 32 maize LEA genes were classified into nine groups and are distributed throughout the genome via transposition (Li and Cao 2016). Nagaraju et al. (2019) further identified a total of 68 LEA genes in sorghum, distributed across all ten chromosomes, with chromosomes 1, 2, and 3 being hot spots. Most sorghum LEA genes are intronless or have few introns, and most LEA proteins are basic, with chloroplast subcellular localization. Promoter analysis revealed abiotic stress-responsive, biotic stress-responsive, hormone-responsive, and development-responsive cis-elements. Gene expression analysis revealed tissue-specific expression, with higher expression in stems than in roots and leaves. Most LEA family members were up-regulated in at least one tissue under different stress conditions (Nagaraju et al. 2019).

In the present study, the putative sorghum LEA1 (547 bp) and LEA3 (817 bp) cDNAs were cloned, and their physicochemical and structural properties were predicted. They were found to encode proteins of 181 (19.65 kDa) and 271 amino acids, matching the LEA1 and LEA3 consensus sequence motifs, respectively. LEA1 and LEA3 proteins are characterized by repeating 20-mer and 11-mer amino acid motifs, respectively (Amara et al. 2014). LEA1 proteins are seed embryo-specific, whereas LEA3 proteins are abscisic acid (ABA)-inducible during specific developmental stages and under stress conditions, and they interact with membranes during desiccation (Amara et al. 2014).

The sorghum LEA1 protein has a high content of glycine (15.5%) and charged amino acids (27.6%) and a coiled structure, in agreement with the typical LEA1 description reviewed by Battaglia et al. (2008). As per Zhang et al. (2014), the LEA1 protein (AID50187) consists of 181 amino acids (19.65 kDa), with two distinct LEA5-type domains separated by a 60 amino acid gap. It has a glycine-rich nonapeptide LEA signature with a consensus pattern in the N-terminal region. It also has a highly disordered structure lacking regular secondary structure, comprising two internal 26-amino-acid repeats. LEA3 proteins are mostly devoid of secondary structure, being largely in a random coil conformation in solution (Gasteiger et al. 2005). The LEA1 protein had an aliphatic index of 57.18 and a GRAVY of -0.865. The LEA3 protein (KT030731.1) consisted of 271 amino acids (29.2 kDa), with an alpha-helical structure, two partially overlapping LEA4-type domains, and a transmembrane domain-like region. The LEA3 protein had an aliphatic index of 57.79 and a GRAVY of -0.738.

Although both LEA proteins are rich in hydrophilic amino acids and contain two LEA domains, they differ in two respects. The two domains are widely separated in LEA1 but overlapping in LEA3, and LEA1 is largely disordered whereas LEA3 is helical. A high proportion of random-coil structure may account for their exceptional water-binding capacity. Conserved segments, in turn, give rise to amphipathic α-helices that form lipid-binding domains and can thus associate with and protect lipid aggregates and the hydrophobic domains of proteins. This suggests that the LEA1 protein may play a dehydrin-like role in protecting cells from stress.

LEA proteins comprise a diverse collection of multifunctional proteins that play a major role in desiccation tolerance and seed longevity. Each LEA protein group or family harbors one or more copies of unique domains. In one study, transgenic tobacco and maize plants expressing the maize LEA14tv gene exhibited enhanced drought tolerance (Minh et al. 2019). Understanding these genes will provide further opportunities to elucidate the molecular mechanisms underlying drought tolerance in sorghum. More research is therefore needed to determine the contribution of these LEA proteins to drought tolerance and whether they could be used to impart drought tolerance.