Introduction

The advance in sequencing technology has led to an exponential increase in genomic, transcriptomic, proteomic, and metabolomic data in the postgenomic era. Although most of these data are deposited in repositories and shared among researchers, for the most part they are not fully explored (Nabi et al. 2020). For example, uncharacterized protein families (UPFs) and proteins containing domains of unknown function (DUFs), which are collectively termed DUF families (Mistry et al. 2021), represent about 24% (4795/19,632) of the protein families currently stored in the Protein Family (Pfam) database (version 35.0) (Mistry et al. 2021). DUFs are protein domains with relatively conserved amino acid sequences and uncharacterized functions (Bateman et al. 2010). They are temporarily named with the prefix DUF followed by a number, such as DUF1 and DUF2. Once their functional properties are characterized, they are appropriately renamed or merged with other well-known domains (Bateman et al. 2010). For example, DUF1 and DUF2, frequently present in bacterial signaling proteins, were renamed the GGDEF and EAL domains, respectively, after their functions were elucidated (Simm et al. 2004). As of 2021, the functions of 1132 DUF or UPF families had been identified, and their identifiers in the Pfam database were changed (Mistry et al. 2021).

Initially, DUFs were often overlooked because they were generally not essential in standard mutant screens and were only discovered in a limited number of genomes (Jaroszewski et al. 2009). However, studies have shown that many of the most prevalent DUFs found in bacteria are also present in animals and plants (Goodacre et al. 2013). For instance, the DUF143-containing protein RsfA from bacteria can interact with ribosomal protein L14 to block protein synthesis. This interaction was verified in vitro using the homologs of RsfA/L14 from other bacteria, from the mitochondria of yeast and human, and from the chloroplasts of maize (Häuser et al. 2012). The RsfA homologs ATP25 from yeast and DG238 from Arabidopsis were localized in mitochondria and chloroplasts, respectively, and function in mitochondrial and chloroplast development (Zeng et al. 2008; Wang et al. 2017). Therefore, eukaryotic DUF143 proteins may have evolved from bacterial RsfA and may be functionally conserved in organelles. Evolutionary history analyses of prevalent DUFs, such as DUF143, could help to elucidate their functions and roles in plant adaptation.

Studies have shown that DUFs play essential roles in physiological processes in plants, including growth, development, and responses to biotic and abiotic stresses (Table 1). This aspect has been addressed in a previous review (Lv et al. 2023). In the present review, we systematically summarize the identification of DUF genes and the roles of DUF-containing proteins as transferases, pleiotropic factors, ABA-responsive factors, and interaction partners, and we point out potential future directions.

Table 1 The DUFs involved in diverse biological processes in plants

Identification and characterization of DUF genes

Phenotype-driven approaches

Phenotype-driven approaches begin with genetic screening for mutants of interest and then proceed to the identification of the genes causing the observed phenotypes. Some genes encoding proteins with DUF domains were identified by this approach. For example, the Young Leaf Chlorosis 1 (YLC1) gene, encoding a protein with a DUF3353 domain, was isolated by map-based cloning from a rice mutant with a chlorosis phenotype and decreased chlorophyll and lutein contents at the early stage of plant development. The function of YLC1 in chlorophyll and lutein biosynthesis was then verified using complementation analysis and RNA interference. In addition, the expression levels of genes related to chlorophyll biosynthesis and photosynthesis were significantly altered in the ylc1 mutant, implying a potential role of YLC1 in regulating chlorophyll biosynthesis and photosynthesis (Zhou et al. 2013). Similarly, Young Seedling Stripe1 (YSS1), encoding a protein belonging to the DUF3727 superfamily, was identified through map-based cloning as the gene responsible for the striated leaves at the seedling stage of a rice mutant. Further investigation revealed an abnormal chloroplast structure in the yss1 mutant, which was completely rescued in YSS1-overexpressing transgenic plants (Zhou et al. 2017). In addition, a floral mutant with wrinkled petals and an exposed stigma was identified in a screen of mungbean (Vigna radiata). Stigma Exposed 1 (SE1), a gene encoding a DUF1005 protein, was identified through map-based cloning. Histological observation revealed irregular cell shape in petals and increased cell length in the stigma of the se1 mutant, indicating a role of SE1 in regulating cell shape and size. This was further confirmed by the heterologous expression of SE1 in Arabidopsis (Lin et al. 2020). With advances in mutant creation, sequencing technologies, bioinformatics, and phenotype screening, it will become much easier and faster to identify and characterize DUF genes by phenotype-driven approaches.

Knowledge- and data-driven approaches

The advent of next-generation sequencing and sophisticated computational algorithms enabled the fast identification and functional prediction of genes. Studying DUF genes through knowledge- and data-driven approaches generally involves some intelligent guesswork by searching for sequence and structural homologs and determining expression profiles. These predictions provide a starting point for further genetic and biochemical studies.

The function of a gene can be predicted by identifying homologous genes with known functions. For example, two DUF288-containing proteins, STELLO1 (STL1) and STELLO2 (STL2), were shown to regulate cellulose synthesis by aiding the assembly of cellulose synthase complexes in Arabidopsis (Zhang et al. 2016). Their homologous genes in cotton, namely GhSTL1-GhSTL4, were identified by BLAST searches with the DUF288 domain against the upland cotton genome database. The role of GhSTLs in cellulose synthesis was validated by generating GhSTLs-silenced plants. The cellulose content and fiber length were reduced in the GhSTLs-silenced plants compared to wild-type plants, indicating that GhSTLs are involved in cellulose synthesis in cotton (Guo et al. 2022). Besides sequence similarity, identifying the structural homology of a protein to other well-characterized proteins can also provide insights into its function and roles within the biological system. Through structural genomics, the structures of hundreds of DUF proteins were examined, and two thirds of them may belong to functionally well-characterized protein families, which offers a first hypothesis about their functions (Bateman et al. 2010; Goodacre et al. 2013; Mudgal et al. 2015).

The expression profile of a gene under various conditions in the cell or in the whole organism can also give clues to gene function. DUF4228 genes responded to various abiotic stresses in Arabidopsis (Yang et al. 2020), soybean (Leng et al. 2021), and cotton (Lv et al. 2022) at the transcriptional level, indicating their functions in plant tolerance to abiotic stresses. This implication was later validated by several experiments. For example, the heterologous expression of MsDUF4228 in tobacco negatively impacted the responses of seedlings to osmotic stress (Wang et al. 2018). Overexpression of GmDUF4228-70 increased the resistance of soybeans to salt and drought stress (Leng et al. 2021). Silencing of GhDUF4228-67 reduced salt tolerance in cotton (Lv et al. 2022). Some of the OsDUF668 genes were triggered by Avr9/Cf-9 recognition and strongly expressed in response to rice blast disease and mechanical wounding, indicating that these Avr9/Cf-9-triggered OsDUF668 genes may confer resistance to biotic stresses in rice (Zhong et al. 2019). Similarly, a total of 28 DUF966 genes were identified in wheat, and some of them were strongly induced by salt stress, implying a role of TaDUF966 in salt tolerance. This was experimentally confirmed by the fact that TaDUF966-9B knockout plants showed increased sensitivity to salt stress (Zhou et al. 2020). The expression of GhRDUF4D, a DUF1117 gene, was increased upon Verticillium dahliae infection in upland cotton, indicating a role in defense against V. dahliae (Zhao et al. 2021). Consistent with the suggestion from expression data, V. dahliae resistance was significantly enhanced in GhRDUF4D transgenic Arabidopsis and weakened in GhRDUF4D-silenced cotton plants (Zhao et al. 2021). A total of 12 DUF221 domain-containing protein (DDP) genes were identified in the genome of tomato. Among them, SlDDP6, SlDDP11, and SlDDP12 were downregulated under salt stress, whereas SlDDP1, SlDDP2, SlDDP3, SlDDP4, SlDDP7, SlDDP8, and SlDDP10 were upregulated, suggesting the involvement of these DUF221 genes in responses to abiotic stress (Waseem et al. 2021), although further investigations are required.
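As a rough illustration of how expression data can be turned into up- and downregulation calls of the kind described above, the following minimal Python sketch classifies genes by the log2 fold change between a control and a stress-treated sample. The gene names, expression values, pseudocount, and twofold threshold are hypothetical choices made for illustration; they are not taken from the cited studies, which would also rely on biological replicates and statistical testing.

```python
import math


def log2_fold_change(control, treated, pseudocount=1.0):
    """Return log2(treated / control); the pseudocount avoids division by zero."""
    return math.log2((treated + pseudocount) / (control + pseudocount))


def classify_genes(expression, threshold=1.0):
    """Label each gene 'up', 'down', or 'unchanged'.

    expression maps a gene name to a (control, treated) pair of expression
    values; threshold is the absolute log2 fold change required for a call
    (1.0 corresponds to a twofold change).
    """
    calls = {}
    for gene, (control, treated) in expression.items():
        lfc = log2_fold_change(control, treated)
        if lfc >= threshold:
            calls[gene] = "up"
        elif lfc <= -threshold:
            calls[gene] = "down"
        else:
            calls[gene] = "unchanged"
    return calls


# Hypothetical expression values (control, salt-treated); not real data.
toy_expression = {
    "geneA": (10.0, 45.0),
    "geneB": (40.0, 8.0),
    "geneC": (20.0, 22.0),
}
calls = classify_genes(toy_expression)
for gene, call in calls.items():
    print(gene, call)
```

A single-sample fold change like this only ranks candidates; it does not establish significance, which is why such calls are treated as a starting point for genetic validation.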

Instead of monitoring a single gene, RNA-seq can measure the expression of tens of thousands of genes at once. By identifying differentially expressed genes under various conditions, researchers have been able to associate genes with physiological processes and gain insight into gene functions. For example, from the available RNA-seq datasets of Arabidopsis leaves exposed to S-nitroso-L-cysteine, 231 upregulated and 206 downregulated DUF genes were identified, indicating roles of these genes in nitro-oxidative stress responses (Nabi et al. 2020). Among these genes, the involvement of AtDUF569 in nitro-oxidative stress responses was validated experimentally (Nabi et al. 2020). Clustering genes by expression pattern can provide an additional layer of information for gene function prediction. For example, an Oryza sativa stress-responsive DUF740 protein (OsSRDP) was characterized for its role in rice resistance to abiotic and biotic stresses (Jayaraman et al. 2023). Further investigation of the expression data showed that two genes (LOC_Os05g09640 and LOC_Os06g50370) were co-expressed with OsSRDP under abiotic and biotic stresses, indicating that these two genes are candidate interaction partners of OsSRDP and may be coordinately regulated (Jayaraman et al. 2023).
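One simple way to look for co-expressed genes is to correlate expression profiles across conditions. The sketch below, written in Python with only the standard library, ranks genes by their Pearson correlation with a bait gene. The profile values, the gene labels, and the 0.9 cutoff are hypothetical, and the cited study may have used a different co-expression method.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length profiles."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("profiles must have the same length (at least 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant profile carries no co-expression signal
    return cov / math.sqrt(var_x * var_y)


def coexpressed_with(bait, profiles, min_r=0.9):
    """Return (gene, r) pairs with r >= min_r relative to the bait, best first."""
    reference = profiles[bait]
    hits = []
    for gene, profile in profiles.items():
        if gene == bait:
            continue
        r = pearson(reference, profile)
        if r >= min_r:
            hits.append((gene, r))
    return sorted(hits, key=lambda item: item[1], reverse=True)


# Hypothetical expression profiles across four stress conditions; not real data.
profiles = {
    "bait": [1.0, 2.0, 4.0, 8.0],
    "gene1": [2.0, 4.0, 8.0, 16.0],  # rises together with the bait
    "gene2": [8.0, 4.0, 2.0, 1.0],   # falls as the bait rises
    "gene3": [3.0, 3.1, 2.9, 3.0],   # nearly flat
}
hits = coexpressed_with("bait", profiles)
for gene, r in hits:
    print(gene, round(r, 3))
```

Correlation across a handful of conditions is weak evidence by itself; co-expressed genes identified this way remain candidates until an interaction or shared regulation is shown experimentally.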

DUF-containing proteins function as transferases in plant cell wall formation

The cell wall is an intricate and crucial component of plant cells (Fig. 1a). It needs to be strong to resist internal and external pressures while remaining flexible to allow cell growth (Temple et al. 2022). Polysaccharides are fundamental for the proper structure and function of cell walls (Fig. 1b). Many DUF genes have been shown to participate in the synthesis and modification of polysaccharides as transferases.

Fig. 1 The roles of DUF-containing proteins in regulating the synthesis and modification of plant cell wall components. a Model of the plant cell. The cell wall is shown as the closed brown circle. b The main polysaccharides in the plant cell wall are cellulose and hemicellulose, which are polymerized forms of β-glucan and galactomannan. c Various DUF-containing proteins function as transferases in the synthesis and modification of polysaccharides. The DUF domains are indicated as blue boxes, and the transmembrane domains (TMD) are indicated as the gray boxes

Synthesis of plant cell wall components

Cellulose is the major component in both the primary and secondary cell walls. Two DUF266-containing proteins participated in cellulose biosynthesis, probably as glycosyltransferases (GTs) (Fig. 1c). The loss-of-function mutation in a DUF266 gene, brittle culm 10 (BC10), reduced the cellulose content in rice (Zhou et al. 2009). Overexpression of PdDUF266A significantly increased cellulose content in Populus (Yang et al. 2017).

Pectin is a major constituent of the primary cell wall and, along with hemicelluloses, forms a matrix in which cellulose is embedded. Rhamnogalacturonan-I (RG-I), a pectic polysaccharide, is likely present in the primary cell walls of all vascular plants (Cankar et al. 2014). The Pectic ArabinoGalactan synthesis-Related Protein (PAGR), containing a highly conserved DUF246 domain, may function as a glycosyltransferase in the biosynthesis of pectic RG-I arabinogalactans in Arabidopsis (Stonebloom et al. 2016) (Fig. 1c).

Modification of plant cell wall components

Xylan, a major component of hemicellulose, can be O-acetylated at multiple positions of its backbone residues. Some members of the DUF231 family were proven to modify xylan as O-acetyltransferases (Fig. 1c). For example, the eskimo1/trichome birefringence-like 29 (esk1/tbl29) mutant had significantly decreased xylan acetyltransferase activity and xylan 2-O- and 3-O-monoacetylation (Xiong et al. 2013; Yuan et al. 2013). Eight genes closely related to TBL29 in the DUF231 family, namely TBL3, TBL28, and TBL30-TBL35, were investigated for their functions in xylan acetylation. Double mutants (tbl3 tbl31, tbl32 tbl33, tbl34 tbl35) had a significantly lower level of xylan acetylation than the tbl29 mutant (Yuan et al. 2016a, b, c). Moreover, triple mutants (tbl29 tbl3 tbl31, tbl29 tbl32 tbl33, tbl29 tbl34 tbl35) showed a much more dramatic decrease in xylan acetylation than the double mutants, indicating functional redundancy of these genes in xylan acetylation (Yuan et al. 2016a, b, c). The xylan acetyltransferase activity of TBL29 and its homologs (TBL3, TBL28, and TBL30-TBL35) has been biochemically demonstrated in Arabidopsis (Urbanowicz et al. 2014; Zhong et al. 2017).

Besides xylan acetylation, several members of the DUF231 family were also shown to catalyze the acetylation of xyloglucan (Fig. 1c). Arabidopsis ALTERED XYLOGLUCAN 4 (AXY4) acetylated xyloglucan in tissues other than seeds, and its paralog AXY4L acetylated xyloglucan in seeds (Gille et al. 2011). Another enzyme, Xyloglucan Backbone 6-O-Acetyltransferase 1 (XyBAT1), can acetylate xyloglucan in rice, tomato, and Brachypodium distachyon (Liu et al. 2016; Zhong et al. 2020). Specifically, AXY4 and AXY4L may acetylate the side chains of xyloglucan, while BdXyBAT1 acetylated the glucan backbone of xyloglucan (Liu et al. 2016). In addition, mannan and pectin are generally acetylated by acetyltransferases, including mannan O-acetyltransferase1 (MOAT1), MOAT2, MOAT3, MOAT4, and TBL10 (Stranne et al. 2018; Zhong et al. 2018). The above results indicate the conserved function of the DUF231 family in the acetylation of polysaccharides.

Besides acetylation, cell wall polysaccharides can be methylated, which may be critical for cell expansion (Levesque-Tremblay et al. 2015) and signal responses (Mizukami et al. 2016). Members of the DUF579 family may methylate polysaccharides as methyltransferases. Five proteins of the DUF579 family have been implicated in 4-O-methylation in Arabidopsis: GLUCURONOXYLAN METHYLTRANSFERASE1 (GXM1), GXM2, and GXM3 act on xylan (Lee et al. 2012), and ArabinoGalactan Methyltransferase1 (AGM1) and AGM2 act on arabinogalactan (Temple et al. 2019) (Fig. 1c).

It would be interesting to investigate the functions of uncharacterized proteins in the DUF231 and DUF579 families in the acetylation and methylation of polysaccharides, considering the conserved functions of characterized proteins in these families.

The pleiotropy of DUF genes as regulatory factors

Some DUF genes have pleiotropic effects on the growth, development, and stress tolerance of plants. A DUF538 gene, SMALLER TRICHOMES with VARIABLE BRANCHES (SVB), was expressed at a significantly higher level in an Arabidopsis mutant with defective trichomes than in the wild-type plant. T-DNA insertion in SVB resulted in smaller trichomes with variable branches (Marks et al. 2009). Besides trichome development, SVB also modulates plant growth together with SVB-like (SVBL) through the transcriptional regulation of GLABRA1, a hub gene for trichome development (Yu et al. 2021). In addition, SVB may also be involved in endoplasmic reticulum (ER) stress tolerance through signal transduction as a putative phosphoinositide-binding protein (Yu and Kanehara 2020). The BYPASS1-LIKE (B1L) protein from the DUF793 family is also a multifunctional regulator. B1L interacts with 14-3-3λ and TRANSTHYRETIN-LIKE (TTL) to co-regulate seedling growth and freezing tolerance via the C-REPEAT BINDING FACTOR (CBF) pathway (Chen et al. 2019, 2020a). Moreover, B1L modulates lateral root initiation via exocytic vesicular trafficking-mediated PIN-FORMED (PIN) recycling in Arabidopsis (Yang et al. 2022) (Fig. 2b). The Stress Induced DUF1644 Protein 301 (OsSIDP301) is a negative regulator of both salt stress tolerance and grain size in rice (Ge et al. 2022). OsSIDP301 regulates the tolerance of rice to salt stress through the abscisic acid (ABA) signaling pathway and affects grain size by influencing cell expansion in spikelet hulls (Ge et al. 2022). Sc, from the DUF1618 family, functions as a pollen-essential factor. In Sc-j/Sc-i (japonica allele/indica allele) hybrids, the high expression of Sc-i in sporophytic cells suppressed the expression of Sc-j in pollen, leading to transmission ratio distortion. Knocking out one or two of the three Sc-i copies by CRISPR/Cas9 rescued Sc-j expression and pollen fertility (Shen et al. 2017) (Fig. 2d). In addition, Sc (#51, Os03g0247300) also responded to salt, drought, and cold stress in rice, suggesting a role of Sc in abiotic stress tolerance (Wang et al. 2014).

Fig. 2 The roles of several DUF-containing proteins in controlling plant growth and development. a SVB is involved in regulating the formation of leaf trichomes in Arabidopsis. b B1L and c OsSGL are involved in regulating root development in Arabidopsis and rice, respectively. d Sc is involved in regulating pollen fertility in rice. The SVB, B1L, OsSGL, and Sc proteins contain conserved DUF538, DUF793, DUF1645, and DUF1618 domains, respectively

The importance of Stress tolerance and Grain Length (OsSGL), a DUF1645-containing protein, has been revealed by several studies. OsSGL was upregulated by a wide spectrum of abiotic stresses. The overexpression and heterologous expression of OsSGL increased the expression of antioxidative and stress-responsive genes and enhanced drought tolerance in rice and Arabidopsis (Cui et al. 2016). Moreover, the overexpression of OsSGL also altered an array of other traits, including increased grain length, grain weight, and grain number per panicle and a more extensive root system (Fig. 2c), probably via a cytokinin signaling pathway (Cui et al. 2016; Wang et al. 2016). In addition, both overexpression and silencing of OsSGL reduced the starch content of the grain (Liu et al. 2022). OsSGL was shown to inhibit the expression of starch-biosynthesis-related genes as a regulatory suppressor. Further, the interaction of OsSUS1 with OsSGL alleviated the transcriptional repression by OsSGL. Therefore, OsSGL functions as a regulator in controlling grain yield and quality (Liu et al. 2022).

The responses of plants to biotic and abiotic stresses are deployed at the expense of growth (Huot et al. 2014). Therefore, balancing the trade-off between defense and growth is crucial. Some DUF genes may play roles in this trade-off by reallocating resources through phytohormonal crosstalk (Ning et al. 2017). For example, Zea mays Auxin-Regulated Protein 1 (ZmAuxRP1) from the DUF966 family can increase the biosynthesis of auxin (IAA) and inhibit the biosynthesis of benzoxazinoids, which are potent secondary metabolites that contribute to allelopathy and defense (Ye et al. 2019). Upon pathogen infection, the expression of the resistant ZmAuxRP1 allele was transiently decreased so that plants were able to allocate more resources to benzoxazinoid biosynthesis and fewer to IAA biosynthesis. This arrested root growth but enhanced the resistance of maize to pathogens. When the pathogen attack was averted, the expression of ZmAuxRP1 returned to the level required for normal growth (Ye et al. 2019). Similarly, overexpression of AtAuxRP3, another DUF966 gene, increased the endogenous IAA level in Arabidopsis and reduced tolerance to NaCl and osmotic stress. Transcriptomic analysis showed increased expression of a variety of development-related genes and decreased expression of many stress-responsive genes in the AtAuxRP3 overexpression lines (Shen et al. 2019).

The functions of DUF genes mediated by the ABA signaling pathway

ABA is recognized as a “stress hormone” because of its critical role in mediating adaptive responses to various stresses, especially drought and salinity (Nambara and Marion-Poll 2005). Many DUF genes contribute to stress resistance through ABA-dependent pathways. For example, two RING-DUF1117 E3 ubiquitin ligase genes, AtRDUF1 and AtRDUF2, were upregulated by both ABA and drought stress. Single- and double-deletion mutants of these two genes exhibited reduced ABA-dependent drought resistance in Arabidopsis (Kim et al. 2012). Arabidopsis thaliana mpo1 homolog in plants (AtMHP1) encodes a protein from the DUF962 family. The mhp1 mutant was hypersensitive to salt stress and ABA, indicating that MHP1 may be involved in the ABA-dependent salt stress response pathway in Arabidopsis (Zheng et al. 2021). A homolog of AtMHP1 in poplar, Metabolism of PHS to Odd-numbered FA 1 (PtoMPO1), also encodes a protein containing the DUF962 domain. The heterologous expression of PtoMPO1 in Arabidopsis greatly reduced the sensitivity of the mhp1 mutant to salt stress and ABA (Zheng et al. 2022). The Bifunctional nucleases in Basal Defense response 1 (AtBBD1) from the DUF151 family and a cytoplasmic S40 protein (AtS40-1) from the DUF584 family were upregulated by ABA and enhanced drought and salt tolerance, respectively, through ABA signaling in Arabidopsis (Huque et al. 2021; Wang et al. 2022). The DUF966-stress repressive gene 2 (OsDSR2) negatively regulated tolerance to salt and simulated drought stresses in an ABA-dependent manner by downregulating the expression of ABA- and stress-responsive genes (Luo et al. 2014, 2023). Similarly, OsSIDP301 from the DUF1644 family negatively regulated the tolerance of rice to salt stress through the ABA signaling pathway (Ge et al. 2022).

ABA can suppress seed germination by preventing cell wall loosening of the embryo and inhibiting water uptake (Xi et al. 2010). Overexpression of MsDUF, a DUF4228 gene in Medicago sativa, induced the expression levels of ABA synthesis genes, increased the accumulation of ABA, and caused a reduction in seed germination (Wang et al. 2018). Similarly, AtS40.4 encoding a DUF584-containing protein negatively regulated seed germination and seedling growth by the ABA signaling pathway in Arabidopsis (Shi et al. 2021). F-box/DUF295 Brassiceae specific 2 (FDB2), belonging to the DUF295 family, conferred ABA insensitivity during seed germination and post-germination growth in Arabidopsis (Gong et al. 2022). In the presence of ABA, overexpression of FDB2 increased seed germination and seedling growth, while fdb2 mutants showed opposite phenotypes (Gong et al. 2022).

Interactions of DUF-containing proteins with other proteins

Instead of being static, proteins often change their configuration over time and frequently interact with other molecules to perform biological roles. Some DUFs perform their biological roles through protein interactions (Table 2). For instance, the DUF593 domain at the C-terminus of Zea mays floury1 (ZmFL1) binds to the 22-kD α-zein to facilitate its localization, which is critical for the formation of vitreous endosperm (Holding et al. 2007). The DUF724 domain at the C-terminal regions of DUF724-containing proteins (AtDuf3, AtDuf5, AtDuf7) engages with microtubules or actin filaments, which may play a role in RNA transport in the plant cell (Cao et al. 2010). The DUF827 domains in two coiled-coil proteins, weak chloroplast movement under blue light 1 (WEB1) and plastid movement impaired 2 (PMI2), provide a protein-protein interaction surface for forming the WEB1–PMI2 complex, which is essential for chloroplast movement response (Kodama et al. 2011). The DUF581 domain interacts with SUCROSE-NON-FERMENTING1-RELATED KINASE 1 (SnRK1) and serves as a bridge between SnRK1 and the DUF581-containing proteins while regulating stress signaling in Arabidopsis (Nietzsche et al. 2014). The DUF1620 domain and the WD40 repeat motif at the C-terminus and the N-terminus of the restoration of fertility complex 3 (RFC3) can interact with RF5 and Glycine-Rich Protein 162 (GRP162), respectively, to form the RFC complex, which is required for the restoration of fertility in Hong-Lian (HL) rice (Qin et al. 2016). The C-terminal DUF3755 domain of DIVARICATA AND RADIALIS INTERACTING FACTOR 1 (PtrDRIF1) is necessary and sufficient for the interactions of PtrDRIF1 with the homeodomain (HD) proteins (PtrWOX13c and PtrKNAT7). The formed heterotrimer may affect wood formation by mediating vascular cambium cell division and lignocellulose deposition in Populus trichocarpa (Petzold et al. 2018).

Table 2 DUF-containing proteins and their interacting proteins in plants

Conclusions and perspectives

Over the past 20 years, the number of DUF genes discovered in the plant kingdom has considerably increased. This number will further increase with the sequencing of new species and resequencing of other species. Exploring this treasure trove will provide a wealth of data for solving biological puzzles.

Model plants Arabidopsis and rice have been the focus of most studies on the biological characterization of numerous DUF genes. Since orthologs in the same DUF family often have biological functions that are identical among species (Leng et al. 2021; Lv et al. 2022), findings from rice and Arabidopsis may be transferable to other species. The phylogenetic analysis of well-characterized genes in model plants among species may provide insights into their biological functions in non-model plants.

Previous genome-wide identification of DUF genes showed that each DUF family normally has multiple duplicated gene members in one genome. Single mutants of any of these DUF genes exhibited no dramatic variation in phenotype, whereas double, triple, or even quadruple mutants exhibited distinct phenotypic variation compared to wild-type plants. This suggests that genes containing the same DUF domain may be functionally redundant (Cao et al. 2010; Kim et al. 2012; Mewalal et al. 2016). On the one hand, this makes it particularly difficult to study the function of DUF genes. On the other hand, characterizing the function of one DUF gene can give clues to the functions of other genes in the same DUF family.

DUF proteins can perform various molecular functions in plant growth and stress responses, acting as transferases, ABA-responsive factors, and regulatory factors. The architecture of the cell wall is important for plant morphogenesis and for the tolerance of plants to external stresses. For example, a DUF246 family glycosyltransferase-like gene, PAGR, is involved in the biosynthesis of pectic arabinogalactans and affects male fertility in Arabidopsis (Stonebloom et al. 2016). In addition, AhDGR2, a DUF642 gene, affects the structure and composition of cell walls and causes hypersensitivity to salt and ABA in Arabidopsis. Whether other DUF genes related to cell wall formation affect plant morphogenesis and stress resistance, as PAGR and AhDGR2 do, will also be interesting to explore. Studies have shown that several DUF genes may function as ABA-responsive factors in abiotic stress tolerance (Zheng et al. 2021; Wang et al. 2022). ZmAuxRP1, a stalk rot resistance-related gene from the DUF966 family, regulates plant growth by influencing IAA biosynthesis (Ye et al. 2019), indicating that it may be involved in the IAA signaling pathway. Because of the importance of signaling pathways in plant growth and stress responses, whether DUF genes encoding transferases and regulatory factors (e.g., OsSGL, SVB) act through signaling pathways (ABA, IAA, JA, SA, etc.) or hormone crosstalk (Fig. 3) deserves further investigation.

Fig. 3 Proposed model for DUF genes involved in growth regulation and stress tolerances in plants. The developmental cues or various stimuli (biotic and abiotic stresses) trigger the expression of DUF genes through an ABA-dependent pathway in plants. Other signaling molecules, besides ABA, that mediate the function of DUF genes should be elucidated in future studies. The DUF families contain at least one conserved DUF. The structures and functions of DUFs should be elucidated using various biotechnologies (genetics, biochemistry and molecular biology, structural biology, omics, and bioinformatics) and information techniques (machine learning). The DUF genes regulate the growth and stress tolerances of plants by influencing the expressions of developmental, defense, and stress-responsive genes

Currently, at the genetic level, the functions of DUF genes in controlling plant growth, development, and stress responses have mainly been investigated by analyzing phylogeny, chromosomal locations, gene structures, motif compositions, gene duplications, predicted cis-elements, and expression profiles, as well as the phenotypic, physiological, and transcriptional differences between wild-type and transgenic plants. At the protein level, it is commonly thought that the functional units of proteins are represented by domains, which often have unique structures and roles. It is also the respective DUF domain that defines each DUF protein family. Therefore, the functions of specific DUF domains deserve more investigation. Such studies could begin by mining the core motifs of the DUF domains through amino acid substitutions and deletions (Wang et al. 2017) and then dissecting their functions using gene editing (Fig. 3). It was also suggested that individual DUF domains may have unique functions determined by their structure (Bateman et al. 2010). To the best of our knowledge, crystal structures have been solved for only a few DUF domains (PatG-DUFsp, DUF1110) in plants (Mann et al. 2014; Harada et al. 2016). In the future, the functions and structures of specific DUF domains need to be resolved to fuel the characterization of proteins containing these domains. In addition, most biological processes are mediated through protein interactions. Therefore, to systematically study the functions of DUFs and the biological networks in which they are involved, diverse approaches at the protein level should be applied (Harada et al. 2016; Liu et al. 2022).

Continuous innovation in algorithms and computational frameworks will enable the integration of relevant multi-omics data, including phenomics, genomics, epigenomics, transcriptomics, proteomics, proteogenomics, interactomics, ionomics, and metabolomics, across experiments to deepen insights, broaden horizons, and generate new hypotheses for DUF studies. The computationally generated hypotheses can then be tested efficiently with advanced technologies in functional genomics. The resulting high-quality experimental data will in turn improve the algorithms that generate new hypotheses. Though the future is bright, there is a gap between computing and biology. To bridge this gap and harness the power of biotechnologies and information techniques (e.g., machine learning) in DUF studies, multidisciplinary collaboration among geneticists, biologists, data analysts, computer scientists, and researchers in other related disciplines is required (Fig. 3). As more and more DUFs are deciphered, our understanding of the intricate mechanisms underlying biological processes in the plant kingdom will be enhanced.