Introduction

Diversity is the most important prerequisite for designing a breeding program for any crop. Various researchers have already highlighted the significance of diversity assessment in crop improvement. The variation among breeding materials can be exploited either directly in the form of a variety or through the selection of parents for another breeding program. Domestication and breeding bottlenecks (targeting only a few traits such as yield and its contributing traits, and exploiting a few improved lines as parents, leading to mass replacement of landraces by a few improved modern cultivars) resulted in a rapid reduction of variation under cultivation (Tenaillon et al. 2004). Owing to the narrow genetic base of high-yielding maize cultivars, breeding for biotic as well as abiotic stress tolerance has become challenging, and the rapidly changing climate worsens the situation further (Prasanna 2012; Warburton et al. 2008). Rapid climate change, together with global population growth and production constraints, is increasing the demand for maize; hence there is an immense need to improve various agronomic and economically important traits (Xiao et al. 2017). The development of climate-resilient genotypes is therefore a priority to combat global climate change and increase production and productivity.

For sustainable improvement in crop productivity and the desired genetic gain, there is a need for continuous research on, creation of, and subsequent deployment of novel diversity in crops (Smith et al. 2015). Along similar lines, various researchers have assessed diversity in different groups of maize germplasm collections (Shehata et al. 2009; Nepolean et al. 2013; Sserumaga et al. 2014; Ertiro et al. 2017; Adu et al. 2019a, b). Dependency on a limited number of ancestral populations (Yu et al. 2007) as well as human selection (Van Heerwaarden et al. 2012) may have contributed significantly to the lower genetic variation in present elite maize germplasm compared with the progenitor populations (Tarter et al. 2004; Le Clere et al. 2005; Lu et al. 2009; Liu et al. 2016). Landraces and wild progenitors are considered the largest repository of favorable allelic combinations for the improvement of modern maize cultivars and for enriching the genetic base of existing breeding programs (Goodman 1990; Xiao et al. 1996; Lia et al. 2009). Teosinte, the wild progenitor of maize, is reported to carry desirable gene combinations for valuable traits, including tolerance to various biotic (Niazi et al. 2014; Chavan and Smith, 2014; Bernal et al. 2015; Joshi et al. 2021a; Adhikari et al. 2021b; Corona et al. 2021b) and abiotic stresses, as well as quality traits (Kumar et al. 2020; Sahoo et al. 2021). Teosintes are grasses that share the genus Zea with maize and are grouped into two sections, viz. Luxuriantes and Zea (Doebley and Iltis 1980). Two annual teosinte species, Z. luxurians and Z. nicaraguensis, along with two perennial teosintes, Z. perennis and Z. diploperennis, are placed in section Luxuriantes, whereas three annual teosintes, Z. mays subsp. parviglumis, Z. mays subsp. mexicana and Z. mays subsp. huehuetenangensis, are grouped in section Zea along with cultivated maize (Z. mays subsp. mays).
All teosinte species are diploid with the same chromosome number as maize (2n = 20), except Z. perennis, which is tetraploid (2n = 40). Of the many teosintes, Zea mays subsp. parviglumis is considered the closest relative of maize and is highly adapted to its distinctive local environment. Teosinte (parviglumis) and maize are distinct from each other morphologically (Iltis 2000) as well as at the molecular level (Adhikari et al. 2019), but they are reported to be cross-compatible, and fertile progenies have been recovered by various workers (Singh et al. 2017; Kumar et al. 2019; Adhikari et al. 2020). Zea mays subsp. parviglumis is therefore considered the preferred choice for the enhancement of maize germplasm as well as for the domestication of wild adaptive alleles. Hence, an attempt was made to cross teosinte with maize to introgress desirable diversity for agronomic as well as yield-contributing traits.

To exploit genetic diversity successfully for crop improvement through breeding, knowledge of the level of diversity in the germplasm set is a must (Hallauer et al. 1988; Kage et al. 2013; Xu et al. 2013). This knowledge helps breeders develop inbreds with broad genetic variability through the selection of diverse parental combinations (Semagn et al. 2012; Ertiro et al. 2017), establish heterotic groups, and generate source materials for the breeding program (Legesse et al. 2007). Although a number of marker systems are available for diversity assessment, namely phenological, morphological, biochemical and molecular (Govindaraj et al. 2015; Adu et al. 2019a, b), molecular marker-based analysis is the most preferred because it is independent of developmental stage and unaffected by environmental fluctuation (Smith and Smith 1992; Westman and Kresovich 1997; Govindaraj et al. 2015). In previous diversity assessments in maize, SSR markers were reported to be more informative (Yuan et al. 2000; Warburton et al. 2002; Pinto et al. 2003; Inghelandt et al. 2010; Shayanowako et al. 2018; Adu et al. 2019a, b). Owing to desirable features such as their multi-allelic nature, high variability (Tautz 1989; Schug et al. 1998; Xu et al. 2013), abundance and even distribution throughout the genome (Liu et al. 1996; Senior et al. 1996; Matsuoka et al. 2002; Wu et al. 2010; Xu et al. 2013), reproducibility (Vos et al. 1995; Senior and Heun 1993), and co-dominant nature, SSRs have become the marker of choice for genetic analysis in crops (Gupta and Varshney 2000). In addition to diversity assessment, they are also considered the best markers for assigning lines to heterotic groups (Enoki et al. 2002).

Although teosinte is genetically polymorphic, it has been exploited only to a limited extent for genetic resource creation, diversification of cultivated maize, and maize germplasm enhancement (Liu et al. 2016; Adhikari et al. 2021a; Joshi et al. 2021b; Corona et al. 2021a). There is an urgent need to exploit variation from teosinte by incorporating it into breeding programs, thereby providing better opportunities for the selection of elite, diversified maize lines. Therefore, the present experiment was planned with teosinte in a crossing program with maize to introgress diverse alleles that were lost from maize during domestication. Morphological and molecular diversity assessments were then carried out to determine the diversity of the teosinte-derived maize lines. Because graphical genotyping makes the visualization of allelic introgression feasible (Young and Tanksley 1989), the parent-wise level of allelic introgression in the progeny can be visualized easily. The objective of the present study was to investigate the level of genetic diversity among 100 teosinte-derived maize lines using genetic parameters and microsatellite markers, and to cluster the lines based on their genetic relatedness. Further, graphical genotyping of the derived maize lines was carried out to determine the parental allelic contribution.

Materials and methods

Generation of material

The experimental material for the present investigation was derived from the wild progenitor teosinte (Z. mays ssp. parviglumis) and the maize inbred line DI-103. In the crossing program, the maize inbred was used as the seed (female) parent and teosinte as the pollen parent to produce F1s; the BC1F1 generation was then produced by one backcross with the maize inbred as the recurrent parent. Subsequently, selfing was carried out for four generations to produce 100 BC1F5 lines, coded MT-1 to MT-100.
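The crossing scheme above implies simple textbook expectations for the derived lines, sketched below in Python. This is an illustration only, not a calculation from the study: it assumes both parents are fully homozygous at the loci considered (a simplification for an outcrossing wild accession), no selection during line development, and Mendelian segregation. The function names are hypothetical.

```python
def expected_donor_fraction(backcrosses: int) -> float:
    """Expected share of the donor (teosinte) genome after the F1 and
    `backcrosses` backcrosses to the recurrent parent.
    The F1 carries 1/2 donor genome; each backcross halves it on average.
    Selfing does not change this expectation."""
    return 0.5 ** (backcrosses + 1)


def expected_heterozygosity(backcrosses: int, selfings: int) -> float:
    """Expected share of heterozygous loci among loci polymorphic between
    the parents. The F1 is heterozygous at all such loci (assumption:
    homozygous parents); each backcross to an inbred parent and each
    selfing generation halves the expected heterozygosity."""
    return 0.5 ** (backcrosses + selfings)


if __name__ == "__main__":
    # BC1F5: one backcross, then four generations of selfing
    print(f"Expected donor genome:    {expected_donor_fraction(1):.2%}")
    print(f"Expected heterozygosity:  {expected_heterozygosity(1, 4):.3%}")
```

Under these assumptions a BC1F5 line is expected to carry on average one quarter of the donor genome; individual lines vary around that expectation, which is what graphical genotyping makes visible.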

Experimental design and recording procedure

The present study evaluated 100 BC1F5 lines along with both parents in a randomized complete block design with two replications in the 2018–2019 Kharif seasons. In both replications, each line was planted in a single row 2 m long, with rows 75 cm apart. Data were recorded for fourteen agro-morphological traits, i.e., days to anthesis (DA), days to silking (DS), anthesis–silking interval (ASI), flag leaf length (FLL), flag leaf width (FLW), plant height (PH), ear per plant (E/P), node bearing first ear (NBE), ear length (EL), ear diameter (ED), kernel rows per ear (KR/E), kernels per row (K/R), test weight (TW) and grain yield per plant (GY/P). DA and DS were recorded as the number of days from sowing to the day when anthers and silks, respectively, appeared in 50% of the plants in a row. ASI was recorded as the difference between DA and DS. FLL, FLW and PH were measured, and NBE and E/P were counted, on five randomly tagged plants per line and averaged within each replication. EL, ED, KR/E, and K/R were recorded by averaging the values of five randomly selected ears harvested from the five randomly tagged plants of each line.

DNA extraction

Genomic DNA of each line was extracted from young leaves of 30-day-old plants by the CTAB (cetyl trimethyl ammonium bromide) method (Doyle and Doyle 1990) with slight modifications. After RNase A (10 μg/ml) treatment at 37°C for half an hour, the DNA was purified with propanol, and the purified DNA was dissolved in TE buffer. The quality and quantity of DNA were checked by electrophoresis of stock DNA in a 0.8% agarose gel and with a spectrophotometer (Systronics PC Based Double Beam Spectrophotometer 2202), respectively. The stock DNA was diluted to a working concentration of 200 ng/μl and stored at -20°C for subsequent PCR amplification.

SSR marker assay

A total of 168 SSR markers covering the entire maize genome were selected from the maize database (http://maize.gdb) to evaluate polymorphism between the parents, i.e., maize and teosinte. Around 46% of the markers (76) were polymorphic and were used for genotyping the 100 BC1F5 lines. Standardized PCR amplifications were performed in a 13.8 μl reaction mixture containing 3 μl (200 ng/μl) genomic DNA, 0.35 μl dNTP mix (2.5 mM each), 0.25 μl Taq DNA polymerase (3 U/μl), 1.5 μl 10X reaction buffer with 15 mM MgCl2, 1.5 μl each forward and reverse primer (40 ng/μl) and 7.2 μl deionised water. Thermal cycling was performed in SureCycler 8800 (Agilent Technology) and Prima 96 plus (Himedia) thermal cyclers. The amplification program consisted of initial denaturation at 94°C for 5 min, followed by 30 cycles of denaturation at 94°C for 40 s, annealing at 55°C to 70°C (depending on the primer) for 40 s, and elongation at 72°C for 1 min, followed by a final extension at 72°C for 10 min. PCR products were stored at 4°C until use. The PCR products were mixed with 2 μl of 6X loading dye and resolved in a horizontal electrophoresis assembly using a 3% agarose gel. A 100 bp ladder was loaded in each row as a reference. After running for 2–3 h at a constant voltage of 100 V, the gel was visualized and imaged under UV light in an Alpha Imager.

Data scoring and statistical analysis

For each agro-morphological trait, the mean value of the evaluated plants was calculated and subjected to analysis of variance (ANOVA) in STPR-3 to estimate the variation among the BC1F5 maize lines. Variability parameters such as genotypic coefficients of variation (GCV) (Burton 1952), phenotypic coefficients of variation (PCV) (Burton 1952), broad-sense heritability (h2b) (Lush 1949), and genetic advance in percent of the mean (GAM) (Comstock et al. 1952) were estimated using Microsoft Excel. The SSR marker data were recorded in binary format, where ‘1’ denotes the presence of a specific allele at a locus and ‘0’ denotes its absence. Band sizes were estimated against a 100 bp standard marker. The presence and absence of bands in all 100 teosinte-parviglumis derived BC1F5 maize lines for the 76 primers were used to build a binary data matrix in an Excel sheet. The genetic diversity at each marker was described by summary statistics including the number of alleles per locus, major allele frequency, minor allele frequency, gene diversity, heterozygosity, and polymorphism information content (PIC), computed with PowerMarker version 3.25 (Liu and Muse 2005). The similarity between pairs of teosinte-derived maize lines was calculated using Jaccard’s similarity coefficient (Jaccard 1908). Cluster analysis was carried out based on a neighbor-joining algorithm using the UPGMA (unweighted pair group method with arithmetic averages) in PAST (PAleontological STatistics) software (Hammer et al. 2001), and a dendrogram was generated from the dissimilarity matrix. Further, principal coordinate analysis (PCoA) was performed in PAST to complement the clustering or grouping patterns revealed by the dendrogram. IciMapping 4.2 software was used to create linkage groups for the SSR markers polymorphic between the two parents.
The genotypic data for the polymorphic SSR markers on the 100 teosinte-derived maize lines were analyzed in IciMapping 4.2 using the group command at the default LOD score of 3.0. The sequence, compare, and ripple commands were then used to find the correct order of the markers within each linkage group. The resulting marker order and positions (cM) were supplied to GGT 2.5 (Berloo 2008) to determine the introgression in each line and to construct graphical genotypes.
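The similarity and clustering steps described in this section can be illustrated with a minimal pure-Python sketch. It is not the PAST implementation used in the study: it only shows how Jaccard's coefficient is computed from 0/1 band profiles and how average-linkage (UPGMA) merging proceeds up to a dissimilarity cutoff. The line names and band profiles are made-up toy data, and the function names are hypothetical.

```python
from itertools import combinations


def jaccard_similarity(a, b):
    """Jaccard coefficient for two 0/1 band profiles:
    bands shared by both / bands present in at least one."""
    both = sum(1 for x, y in zip(a, b) if x == 1 and y == 1)
    either = sum(1 for x, y in zip(a, b) if x == 1 or y == 1)
    return both / either if either else 1.0


def upgma_clusters(profiles, max_dissimilarity):
    """Average-linkage (UPGMA) agglomeration on 1 - Jaccard distances.
    Merging stops once the closest pair of clusters is farther apart
    than `max_dissimilarity`; the remaining clusters are returned."""
    names = list(profiles)
    dist = {
        frozenset((p, q)): 1.0 - jaccard_similarity(profiles[p], profiles[q])
        for p, q in combinations(names, 2)
    }
    clusters = [[n] for n in names]

    def average_distance(c1, c2):
        # UPGMA: unweighted mean of all pairwise distances between members
        total = sum(dist[frozenset((p, q))] for p in c1 for q in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > 1:
        (i, j), d = min(
            (((i, j), average_distance(clusters[i], clusters[j]))
             for i, j in combinations(range(len(clusters)), 2)),
            key=lambda item: item[1],
        )
        if d > max_dissimilarity:
            break
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return clusters


if __name__ == "__main__":
    # Toy band profiles (hypothetical), one 0/1 entry per scored allele
    toy = {
        "A": [1, 1, 0, 0, 1, 0],
        "B": [1, 1, 0, 0, 0, 1],
        "C": [0, 0, 1, 1, 0, 1],
        "D": [0, 0, 1, 1, 1, 0],
    }
    # A similarity cutoff of 0.45 corresponds to a dissimilarity of 0.55
    print(upgma_clusters(toy, max_dissimilarity=0.55))
```

Cutting the tree at a Jaccard similarity of 0.45, as done for the dendrogram in this study, is equivalent to stopping the merging at a dissimilarity of 0.55.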

Results and discussion

Morphological diversity

ANOVA revealed significant variance among the 100 inbred lines for all the characters under evaluation, indicating sufficient variation among the lines that could be exploited in further maize improvement programs (Table 1). Similarly, significant variation for yield-contributing traits in maize germplasm was observed in previous studies (Unay et al. 2004; Wattoo et al. 2009; Zare et al. 2011; Kumar et al. 2012; Shahrokhi and Khorasani 2013; Sarac and Nedelea 2013). For DA and DS, the teosinte-derived maize lines exhibited a wide range of 47–68 days and 44–67 days, respectively. Among the lines, MT-95 required the minimum of 47 days to anthesis, whereas the earliest genotypes for silking were MT-40 and MT-95, which reached the silking stage in only 44 days. In contrast, MT-11 was late in both anthesis (68 days) and silking (65.5 days). A nearly similar range for anthesis (47–67 days) and silking duration (46–63 days) in the progeny of a teosinte-derived maize population was reported by Magoja (1991). Flowering duration is considered most critical in terms of abiotic stress resistance (Westgate and Bassetti 1990; Edmeades et al. 1993a, b; Edmeades et al. 1997; Sah et al. 2020). Therefore, lines with short anthesis and silking duration are considered the best material for abiotic stress resistance breeding, particularly for water-limited environments. ASI varied among lines from -4 to +5 days with a mean of 2.51 days. Out of 100 lines, four (MT-1, MT-68, MT-70, and MT-93) showed an ASI of 0, indicating anthesis and silking on the same day, whereas 16 lines showed a 1-day ASI. In addition, 40 lines showed protogynous behavior. A reduction in ASI is considered an important adaptive parameter under drought (Bolanos and Edmeades 1993; Bolaños 1996; Edmeades et al. 1997; Banziger et al. 2000).
Westgate (1997) observed that water deficiency at the flowering stage in drought-susceptible genotypes may delay silking, which subsequently results in a longer ASI. As a consequence of delayed silking or a longer ASI, fertilization fails, resulting in an increased kernel abortion rate (Westgate and Bassetti 1990). Therefore, lines with a short ASI as well as protogynous behavior can be utilized in further maize drought improvement programs.

Table 1 Analysis of variance (ANOVA) for different characters in parents (Maize (DI-103), Teosinte) and their BC1F5 maize lines

FLL and FLW varied from 9.46–50.59 cm and 2.16–7.68 cm, respectively. The top three genotypes for FLL were MT-33 (57.58 cm), MT-98 (53.72 cm), and MT-15 (50.59 cm), whereas for FLW, MT-57 (7.68 cm), MT-16 (7.65 cm), and MT-30 (97.5 cm) ranked highest. The lines MT-9 (9.46 cm) and MT-50 (2.17 cm) are considered the best in terms of reduced FLL and FLW, respectively. Reduced leaf area is considered an important trait, particularly under high-density planting, because it allows more light penetration, especially around the ear, which facilitates translocation of photosynthetic assimilates to the ear (Lambert et al. 2014; Huang et al. 2017). In contrast, a larger leaf area causes shading and reduces light intensity within the leaf canopy under high plant density, thereby resulting in lower grain yield (Lambert et al. 2014). Plant height ranged from 89.5–229.33 cm; the maximum height (229.33 cm) was recorded for MT-17 and the minimum (89.5 cm) for MT-87. Shorter plant height is associated with shorter internodes, which may inhibit penetration of solar radiation into the canopy even at low plant density (Lambert et al. 2014). As a consequence, the source potential for supplying assimilates to the developing reproductive structures is reduced because of intra-plant competition for light (Sangoi et al. 1997). Taller plants may have longer internodes, which can reduce intra-plant shading (Lambert et al. 2014); as a consequence, all leaves receive direct exposure to sunlight, resulting in greater photosynthate accumulation and yield enhancement. Owing to reduced shading, this feature could be utilized for high-density planting of maize. A dense plant population requires more water, which may lead to water stress (Downey 1971; Tetio-Kagho and Gardner 1988); however, sufficient water availability is undoubtedly responsible for improved yield under higher plant density.
Therefore, in water-deficient areas, planting maize at higher plant density can lead to complete yield loss, particularly if water stress coincides with the tasselling and silking stages (Herrero and Johnson 1981; Edmeades et al. 1993a, b). Lines with taller height along with a shorter ASI (MT-46) or protogynous behavior (MT-39, MT-24, MT-16) could therefore perform well under high-density planting even in the absence of adequate water supply. In contrast, lines with shorter height and reduced leaf area (MT-50) are better adapted to high plant density and limited water availability (Ackerson 1983; Sangoi et al. 1997).

In the developed population, the average E/P and NBE varied from 1 to 5.5 and from 2.67 to 7.50, respectively. Of the 100 lines, 90% were prolific, of which 27% possessed > 3 E/P, and most lines bore 2 to 2.5 E/P. Prolificacy is a desired trait in maize as it contributes significantly to yield enhancement (Motto and Moll 1983). Drought-adapted maize genotypes are characterized by prolificacy (Edmeades et al. 1993a, b; Ribaut et al. 1997). In drought-prone areas, a relatively low plant density is preferred so that adequate resources are available to each plant to meet the minimum threshold for grain set (Merlos et al. 2015). Prolificacy is an indicator of reproductive plasticity (Sarquís et al. 1998), which is defined as the ability of the plant to maintain the ratio between available resources and grain yield (Vega et al. 2000). Prolific maize shows reproductive plasticity by adjusting kernel number to the available resources and is thereby able to maintain yield across a wider range of densities. Ross et al. (2020) showed that yield was maintained in prolific maize hybrids under reduced plant density (2 plants m⁻²), whereas a 25% yield penalty was recorded in nonprolific ones. Therefore, owing to their high reproductive plasticity, prolific lines can maintain optimum yield under stress by adjusting kernel number to the available resources irrespective of plant density, and are best suited to water-stressed environments (Ross et al. 2020). The highest ear placement (node 7.5) was recorded for MT-9, whereas the lowest (2.6) was recorded for MT-76. Ear placement in the present population is considerably lower than in maize hybrids, where it ranged from 9–11 (Subedi and Ma 2005). The ear-bearing node, together with internode length, determines ear height in maize.
Neither too high nor too low an ear height is desirable because each has its constraints; the ideal height lies somewhere in between, depending on the target environment. If the ear is placed too low, it is unfavorable for yield and makes the plant difficult to harvest, whereas ear placement that is too high makes the stalk prone to bending and breakage. Lodging is an important problem in maize. One possible means of imparting lodging tolerance is lower ear placement, which increases the plant height to ear height ratio and consequently lowers the center of gravity of the plant (Josephson and Kincer 1977; Li et al. 2007). Through lodging tolerance, reduced ear placement also indirectly favors efficient mechanical harvesting (Josephson and Kincer 1977). EL and ED in the teosinte-derived maize population varied from 5.5–19.17 cm and 0.82–7.17 cm, respectively. Genotypes MT-24 and MT-49 produced the longest ears, and MT-10 was the best for ED. In the studied population, KR/E ranged from 2.67–16 and K/R from 3.5–44.83; MT-64 and MT-56 ranked highest for KR/E and K/R, respectively. TW varied widely, from 98.20–229 g, and MT-58 showed the highest TW. GY/P ranged from 6.67–98.33 g, with the maximum yield recorded for MT-25.

Estimation of variability parameters in teosinte derived maize population

Variability parameters such as genotypic coefficients of variation (GCV), phenotypic coefficients of variation (PCV), broad-sense heritability (h2b), and genetic advance in percent of the mean (GAM) are presented in Table 2. For all the studied traits, estimates of PCV were slightly higher than GCV. The difference between PCV and GCV indicates environmental influence; in the studied population, good correspondence between GCV and PCV was observed for all characters except yield, indicating that the observed variation is mainly due to genotypic differences without much environmental influence. Similar results were reported by Bello et al. (2012) for various agro-morphological traits in maize. Sivasubramanian and Madhavamenon (1973) categorized GCV and PCV values into three classes: low (0–10%), moderate (10–20%), and high (20% and above). High values of PCV and GCV (> 20%) were observed for ASI, FLL, FLW, E/P, EL, ED, KR/E, K/R, and GY/P, whereas moderate (10–20%) estimates were recorded for PH, NBE and TW, indicating substantial variability and ample scope for trait improvement through selection. These observations confirm the findings of Rafiq et al. (2004) and Akbar et al. (2008). On the other hand, low variation among genotypes for anthesis and silking duration was reflected by low PCV and GCV (< 10%) estimates. Our results agree with those of Bello et al. (2012), who recorded low PCV and GCV estimates for anthesis and silking duration in maize. Heritability in the broad sense (h2b) was classified as low (< 50%), medium (50–75%), and high (> 75%) as suggested by Robinson (1966). Broad-sense heritability estimates ranged from 38.42% for grain yield per plant to 96.71% for test weight. High heritability was recorded for DS, ASI, FLL, FLW, PH, E/P, EL, ED, K/R, and TW, whereas DA, NBE, and KR/E showed moderate heritability.
This indicates minimal environmental influence on the expression of these traits. The high heritability observed in the present study for EL (Noor et al. 2010), PH (Aminu and Izge 2012; Bello et al. 2012; Anshuman et al. 2013), NBE (Bello et al. 2012; Anshuman et al. 2013), TW (Noor et al. 2010), and ED (Anshuman et al. 2013) agrees with the findings of earlier workers. The lowest heritability was recorded for grain yield, consistent with its polygenic nature. Yield-contributing traits tend to display higher heritability than yield itself (Messmer et al. 2009; Peng et al. 2011). Therefore, it is possible to select and combine yield-related traits for the development of a genotype with superior performance (Robinson et al. 1951). Genetic advance indicates the degree to which a trait is improved under a specific selection pressure. Compared with either estimate alone, the combined estimation of heritability and genetic advance is considered more reliable (Johnson et al. 1955; Shinde et al. 2010; Nwangburuka and Denton 2012; Meshram et al. 2013). Genetic advance as percent of the mean (GAM) was classified into three categories, viz. low (0–10%), moderate (10–20%), and high (≥ 20%), following Johnson et al. (1955) and Falconer and Mackay (1996). GAM ranged from 10.77% (DA) to 76.18% (ASI). Estimates of GAM were high for all the studied traits except DA and DS, for which moderate GAM was observed. High heritability coupled with high genetic advance, considered the best condition for selection, was recorded for ASI, FLL, FLW, PH, E/P, EL, ED, K/R, and TW. This situation also points to additive gene effects governing these traits and hence offers reliable maize improvement through selection for them. Our findings are in close agreement with those of Mahmood et al. (2004), Peiffer et al. (2014), Bekele and Rao (2014), Kinfe and Tsehaye (2015), and Rahman et al. (2015).
Bello et al. (2012) recorded high heritability along with high genetic advance for PH and NBE. The moderate expected genetic advance for anthesis and silking duration may be compensated for by their moderate and high heritability estimates, respectively. Lower GAM for DA and DS was also recorded by Mahmood et al. (2004) and Ogunniyan and Olakojo (2014).
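For readers who wish to reproduce estimates of this kind, the following Python sketch shows one commonly used way to derive GCV, PCV, broad-sense heritability and GAM from the ANOVA mean squares of a randomized complete block design. It is an illustration under stated assumptions rather than the exact spreadsheet used here: genotypic variance is taken as (MS genotype − MS error) / replications, phenotypic variance as genotypic plus error variance, and the selection differential k = 2.06 corresponds to 5% selection intensity. The function names and the example mean squares are hypothetical.

```python
import math


def variability_parameters(ms_genotype, ms_error, replications, grand_mean, k=2.06):
    """Return GCV, PCV, broad-sense heritability and GAM (all in percent).

    Assumptions (not taken from the paper): plot-basis variance components
    from an RCBD ANOVA and k = 2.06 (5% selection intensity).
    """
    var_g = max((ms_genotype - ms_error) / replications, 0.0)  # genotypic variance
    var_e = ms_error                                            # error variance
    var_p = var_g + var_e                                       # phenotypic variance
    h2 = var_g / var_p                                          # heritability as a fraction
    genetic_advance = k * math.sqrt(var_p) * h2
    return {
        "GCV": math.sqrt(var_g) / grand_mean * 100,
        "PCV": math.sqrt(var_p) / grand_mean * 100,
        "h2b": h2 * 100,
        "GAM": genetic_advance / grand_mean * 100,
    }


def classify(value, low_limit, high_limit):
    """Three-class label, e.g. limits 10 and 20 for GCV, PCV and GAM,
    or 50 and 75 for heritability, as used in the text."""
    if value < low_limit:
        return "low"
    if value < high_limit:
        return "moderate"
    return "high"


if __name__ == "__main__":
    # Hypothetical mean squares for one trait, two replications
    est = variability_parameters(ms_genotype=90.0, ms_error=10.0,
                                 replications=2, grand_mean=100.0)
    for name, value in est.items():
        print(f"{name}: {value:.2f}%")
    print("GAM class:", classify(est["GAM"], 10, 20))
```

Because phenotypic variance includes the error term, PCV can never be smaller than GCV under this model, and a small gap between the two indicates limited environmental influence.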

Table 2 Statistical parameters of fourteen agro-morphological traits in teosinte derived maize population

Molecular diversity

Polymorphism and allelic diversity of microsatellite markers

Molecular profiling with 76 polymorphic microsatellite loci detected 377 alleles, with an average of 5 alleles per locus (Table 3). The number of alleles per locus ranged from 2 to 8. The maximum of 8 alleles was recorded for phi10918, followed by 7 for umc1988, bnlg1144, phi121, umc1279, and umc234, and the minimum of 2 alleles for bnlg197. The estimates obtained in our experiment differ from those of previous SSR-based molecular studies on maize inbred lines (van Inghelandt et al. 2010; Yang et al. 2011; Wasala and Prasanna 2013; Nikhou and Ebrahimi 2013; Li et al. 2014; Lanes et al. 2014; Sserumaga et al. 2014; Abdel-Rahman et al. 2016; Vega-Alvarez et al. 2017). The number of alleles detected in the present study is lower than 675 alleles (Yang et al. 2011), 471 alleles (Lanes et al. 2014), and 649 alleles (Vega-Alvarez et al. 2017), but much higher than 104 alleles (Legesse et al. 2007), 145 alleles (Xiao et al. 2017), 48 alleles (Maniruzzaman et al. 2018), 191 alleles (Shayanowako et al. 2018) and 288 alleles (Adu et al. 2019a, b). Possible reasons for the differences in allele number between the present and previous studies include the genetic materials targeted, the number of markers, and the polymorphism detection methodologies employed. The average number of alleles per locus recorded in the present study is higher than in the earlier findings of Legesse et al. (2007), Wietholter (2008), Wasala and Prasanna (2013), and Li et al. (2014), who observed means of 3.85, 2.7, 3.85 and 2.45 alleles, respectively. Similarly, molecular diversity analysis of 27 maize inbred lines based on 10 SSR markers resulted in 23 polymorphic alleles with an average of 2.3 alleles per locus (Abdel-Rahman et al. 2016). Gene diversity, estimated according to Nei (1973), ranged from 0.30 to 0.50 with a mean of 0.48.
For almost all markers, gene diversity was on the higher side (> 0.40), except for bnlg197, which showed the minimum gene diversity of 0.30. The mean gene diversity observed in this study was higher than the 0.22 reported by Adu et al. (2019a, b) but lower than the 0.65 recorded by Sserumaga et al. (2014). Markers with a greater number of alleles were observed to exhibit higher gene diversity.

Table 3 Allelic variations revealed by SSR marker in teosinte derived maize population

The product length varied from 80 bp for umc1622, bnlg197, bnlg389, umc1215, umc1546, umc1428, umc2635 and umc1673 to 600 bp for umc2392. Wang et al. (2013) detected fragment sizes of 206–299 bp for SSR markers, whereas a wider range of 62–230 bp was recorded by Senior et al. (1998). The difference in product length between alleles ranged from 10 bp (umc1622, y1SS, umc1215, phi070, phi089, umc2635, phi121, umc1279, phi067, umc1152, phi054) to 400 bp (umc2392). The major allele frequency of the 76 SSR markers averaged 0.58 per marker, ranging from 0.50 (bnlg1144, umc1869, phi10918, umc1171, umc2307, bnlg1600, bnlg1371, umc1127, umc1304, umc1673, umc1152, umc1053, bnlg1250) to 0.82 (bnlg197), whereas the minor allele frequency ranged from 0.20 to 0.50 with an average of 0.42. Our results agree well with earlier findings based on SSR markers in maize inbred lines (Sserumaga et al. 2014; Adu et al. 2019a, b). Adu et al. (2019a, b) recorded major allele frequencies from 0.50 to 0.99 and minor allele frequencies from 0.01 to 0.50. For most SSR markers the major allele frequency exceeded the minor allele frequency, whereas 17% of the markers showed equal frequencies (0.5) for the two alternative forms. The heterozygosity (H) values ranged from 0.55 (umc1245) to 1.00 (bnlg1144, umc1171, bnlg1600, umc1127), with an average of 0.58. The heterozygosity recorded in the present experiment is very high compared with previous studies (Sserumaga et al. 2014; Adu et al. 2019a, b): Adu et al. (2019a, b) reported heterozygosity ranging from 0.00 to 0.20 with a mean of 0.07, and a similar range of 0.00 to 0.20 was recorded by Sserumaga et al. (2014). The moderate to high H values observed for most markers, together with the high major allele frequencies, indicate a high level of genetic diversity in the studied population.

PIC, also known as the discriminating power of a marker, indicates how strongly a marker can differentiate individuals based on the number and distribution of alleles at the respective marker locus. PIC varied from 0.29 (bnlg197) to 0.86 (bnlg615 and umc1726) with an average of 0.64; this wide range reflects high allelic variation at the marker loci and its wide distribution in the teosinte-derived maize population. The mean PIC value in the present experiment is higher than the 0.59 reported by Senior et al. (1998). These PIC estimates are in close agreement with the findings of Shehata et al. (2009), Sserumaga et al. (2014), Gazal et al. (2016), and Adu et al. (2019a, b). PIC values ranging from 0.32 to 0.85 with a mean of 0.68 were recorded by Adu et al. (2019a, b); similarly, an average PIC of 0.61 was reported by Sserumaga et al. (2014). Since most of the markers exhibited a high PIC value (0.60 to 0.86), the SSR markers are well suited to genetic diversity and relationship studies owing to their strong discriminatory power. According to the PIC guideline of Botstein et al. (1980), all the SSR markers except three (umc1988, umc1245, bnlg197) were highly informative, as they showed PIC > 0.5.
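The per-marker statistics discussed above follow standard formulas, sketched below in Python for illustration. Gene diversity is computed as 1 − Σp², and PIC as 1 − Σp² − ΣΣ 2p_i²p_j² over allele pairs (Botstein et al. 1980); the informativeness classes are the commonly quoted Botstein thresholds. This sketch is not the PowerMarker implementation, the function names are hypothetical, and the four-allele frequencies in the usage example are made up.

```python
from itertools import combinations


def gene_diversity(freqs):
    """Expected heterozygosity: 1 - sum of squared allele frequencies."""
    return 1.0 - sum(p * p for p in freqs)


def pic(freqs):
    """Polymorphism information content:
    1 - sum(p_i^2) - sum over allele pairs i<j of 2 * p_i^2 * p_j^2."""
    squared_sum = sum(p * p for p in freqs)
    pair_term = sum(2.0 * (p * p) * (q * q) for p, q in combinations(freqs, 2))
    return 1.0 - squared_sum - pair_term


def informativeness(pic_value):
    """Marker class by PIC value, using the usual Botstein thresholds."""
    if pic_value > 0.5:
        return "highly informative"
    if pic_value > 0.25:
        return "reasonably informative"
    return "slightly informative"


if __name__ == "__main__":
    # Two alleles with a major allele frequency of 0.82 (as reported for bnlg197)
    print(round(gene_diversity([0.82, 0.18]), 2))
    # Hypothetical four-allele marker with unequal frequencies
    four = [0.4, 0.3, 0.2, 0.1]
    print(round(gene_diversity(four), 3), round(pic(four), 3), informativeness(pic(four)))
```

For a two-allele marker with a major allele frequency of 0.82, the gene diversity formula gives approximately 0.30, in line with the value reported for bnlg197.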

Clustering analysis and grouping

The unweighted pair group method with arithmetic averages (UPGMA) cluster analysis at a Jaccard similarity coefficient of 0.45 grouped the 102 lines (100 teosinte-derived maize lines and both parents) into 14 clusters based on 76 SSR markers (Table 4, Fig. 1). Clustering into a large number of groups indicates the presence of sufficient genetic diversity among the teosinte-derived maize lines at the molecular level. The number of clusters observed in the teosinte-derived maize population is higher than that recorded in other studies targeting maize germplasm (Enoki et al. 2002; Patto et al. 2004; Adu et al. 2019a, b). Genetic dissimilarity varied from 0.327 to 0.784. Genotypes MT-18 and MT-32, both in cluster 14, were the most similar (67.4%), having the minimum dissimilarity value. The most diverse lines were teosinte (cluster 1) and MT-36 (cluster 13), with the maximum dissimilarity value of 0.784. The distribution of lines among the 14 clusters was not homogeneous. The largest group, cluster 14, contained 49 lines (~ 48%), followed by cluster 10 with 17 lines (~ 17%). Cluster 12 comprised 7 lines, whereas clusters 7 and 13 consisted of 6 lines each. Four lines were placed in cluster 11, 3 lines each in clusters 6 and 9, and 2 lines in cluster 4. Clusters 1, 2, 3, 5 and 8 each contained a single line. Teosinte clustered independently from maize as well as from the teosinte-derived maize lines; owing to one backcross with the maize parent, the derived lines are closer to the maize parent. The independent clustering of teosinte reflects the profound differences in plant and inflorescence architecture between teosinte and maize (Iltis 2000). Intensive investigations have been carried out to reveal the genetic basis of the morphological distinction between maize and teosinte. Beadle (1939) estimated that four or five major loci were involved in maize domestication.
QTL mapping revealed the complex genetic architecture and the involvement of many loci in these trait differences (Doebley et al. 1992), a few of which have been mapped to the underlying genes and cloned, such as teosinte glume architecture1 (tga1) (Dorweiler et al. 1993), teosinte branched1 (tb1) (Doebley et al. 1995) and grassy tiller1 (gt1) (Whipple et al. 2011). The tb1 gene is responsible for suppression of axillary bud growth on the main stem and for female inflorescence development in maize (Doebley 2004). Another gene selected during domestication is gt1, which leads to the limited number of large ears in maize in contrast to the numerous small ears of teosinte (Wills et al. 2018). A further well-characterized maize domestication gene is tga1, responsible for the naked kernels exposed on the maize ear, in contrast to the kernels encased in a hardened fruitcase in teosinte. Both tb1 and gt1 are overexpressed in maize (Studer 2011; Wills 2013), whereas in the case of tga1 a single amino acid change altered protein structure and function in maize (Wang et al. 2005).
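The grouping step of the cluster analysis described above can be sketched as follows. This is a minimal pure-Python illustration, not the software used in this study: it computes Jaccard dissimilarity between 0/1 allele presence profiles and merges groups by average linkage (UPGMA) until the closest pair of groups is more dissimilar than a cutoff. The line names, the profiles and the reading of "0.45 Jaccard similarity" as a dissimilarity cutoff of 0.55 are assumptions made for the example.

```python
def jaccard_dissimilarity(x, y):
    """1 - Jaccard similarity for two 0/1 allele presence vectors."""
    shared = sum(1 for a, b in zip(x, y) if a and b)   # allele present in both
    either = sum(1 for a, b in zip(x, y) if a or b)    # allele present in at least one
    return 0.0 if either == 0 else 1.0 - shared / either


def upgma_clusters(profiles, cutoff):
    """Average-linkage (UPGMA) grouping of lines.

    Merging stops once the closest pair of clusters is more dissimilar
    than `cutoff`; the remaining clusters are returned as lists of names.
    """
    names = list(profiles)
    dist = {(a, b): jaccard_dissimilarity(profiles[a], profiles[b])
            for a in names for b in names}
    clusters = [[n] for n in names]

    def average_distance(c1, c2):
        # UPGMA distance: mean of all pairwise distances between members
        return sum(dist[a, b] for a in c1 for b in c2) / (len(c1) * len(c2))

    while len(clusters) > 1:
        d, i, j = min((average_distance(clusters[i], clusters[j]), i, j)
                      for i in range(len(clusters))
                      for j in range(i + 1, len(clusters)))
        if d > cutoff:
            break
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters


# Hypothetical allele presence/absence profiles for three lines
profiles = {
    "line_A": [1, 1, 0, 1, 0, 0],
    "line_B": [1, 1, 0, 1, 1, 0],
    "line_C": [0, 0, 1, 0, 1, 1],
}
groups = upgma_clusters(profiles, cutoff=0.55)
print(groups)  # [['line_A', 'line_B'], ['line_C']]
```

In this toy example the first two lines share three of four scored alleles (dissimilarity 0.25) and are grouped, while the third line remains a separate cluster, analogous to the independent clustering of teosinte reported above.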

Table 4 Clustering patterns of 102 lines including both parents, viz. maize (DI-103) and teosinte, and 100 BC1F5 teosinte-derived maize lines
Fig. 1 Dendrogram produced using Jaccard’s similarity coefficient and the unweighted pair group method with arithmetic average (UPGMA) clustering method based on SSR data of 100 teosinte-derived maize lines along with both parents

To understand and confirm the dynamics of the derived population, we further carried out principal coordinate analysis (PCoA). Based on the pairwise genetic distance matrix among the 102 lines (100 teosinte-derived maize lines and both parents), a clear distinction between the teosinte-derived maize lines and teosinte could be visualized, in concordance with the biplot (Fig. 2).
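For readers unfamiliar with PCoA, the core computation on a pairwise distance matrix can be sketched as below. This is a simplified pure-Python illustration under stated assumptions, not the procedure or software used in this study: it applies Gower double-centring and extracts only the first principal coordinate by power iteration, and it assumes the dominant eigenvalue is positive (true for Euclidean distances, not guaranteed for every dissimilarity measure). The function name `pcoa_first_axis` and the example matrix are hypothetical.

```python
import math


def pcoa_first_axis(dist, iterations=500):
    """First principal coordinate from a symmetric distance matrix (list of lists)."""
    n = len(dist)
    # Gower double-centring of -0.5 * squared distances
    a = [[-0.5 * dist[i][j] ** 2 for j in range(n)] for i in range(n)]
    row_mean = [sum(r) / n for r in a]
    grand_mean = sum(row_mean) / n
    b = [[a[i][j] - row_mean[i] - row_mean[j] + grand_mean for j in range(n)]
         for i in range(n)]

    # Power iteration for the dominant eigenvector of the centred matrix
    v = [float(i + 1) for i in range(n)]   # deterministic, non-constant start vector
    eigenvalue = 0.0
    for _ in range(iterations):
        w = [sum(b[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:                    # all points coincide
            break
        v = [x / norm for x in w]
        eigenvalue = norm                  # converges to the dominant eigenvalue
    # Coordinates are the eigenvector scaled by the square root of its eigenvalue
    return [math.sqrt(eigenvalue) * x for x in v]


# Hypothetical distances between three points lying on a line at 0, 1 and 3
example = [[0.0, 1.0, 3.0],
           [1.0, 0.0, 2.0],
           [3.0, 2.0, 0.0]]
coords = pcoa_first_axis(example)
print([round(c, 3) for c in coords])
```

In this toy case the differences between the returned coordinates reproduce the input distances; with real marker data, two or three axes are usually plotted and the proportion of variation explained by each axis is reported.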

Fig. 2 Biplot representing the distribution pattern of 100 teosinte-derived maize lines

Molecular characterization

Linkage analysis using the SSR data revealed ten linkage groups in maize (Table 5, Fig. 3). Among the 100 teosinte-derived maize lines, the maximum allelic contribution from the maize parent was recorded in MT-26 (65.2%), followed by MT-7 (64.6%) and MT-36 (64%), and the minimum of 27.6% was observed in MT-63. The maximum contribution from the teosinte parent was observed in MT-19 (59.4%), followed by MT-99 (55.8%), whereas the lowest (21.3%) was found in MT-44 (Supplementary Table 1). Among the derived maize lines, MT-44 showed the maximum heterozygosity (37.5%) with 14 heterozygous segments, followed by MT-3 with 34.7% heterozygosity and 17 heterozygous segments. MT-26, with 3 heterozygous segments, showed the minimum heterozygosity (2.4%). Theoretically, heterozygosity is reduced by one-half with each generation of self-pollination, decreasing to a low level (~ 3%) after 5 generations of inbreeding. The heterozygosity observed in advanced inbred progeny, known as residual heterozygosity (RH), does not always comply with the expectations of Mendelian segregation, in that excessive RH is observed in some genomic regions. In a maize nested association mapping (NAM) population comprising 25 recombinant inbred line (RIL) populations that had undergone more than 5 cycles of selfing, a higher level of heterozygosity was observed in pericentromeric regions than in telomeric regions across all populations and chromosomes, possibly resulting from the selective preservation of heterozygosity due to pseudo-overdominance of heterosis for yield QTLs in recombination-inhibited regions (McMullen et al. 2009; Eichten et al. 2011). In the present case, the retention of a high degree of heterozygosity in teosinte-derived maize lines even after four generations of selfing may be due to linked genomic regions (pericentromeric and telomeric regions) or to diverse genomic assemblage followed by distorted segregation.
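The theoretical expectation referred to above can be made concrete with a short calculation. The sketch below assumes strict selfing, neutral and unlinked loci and no selection, so that the heterozygous fraction is halved in every generation; the function name `expected_heterozygosity` is hypothetical, and the starting values are textbook assumptions rather than estimates from this study.

```python
def expected_heterozygosity(initial, selfing_generations):
    """Expected heterozygous fraction after repeated selfing.

    initial: heterozygous fraction before selfing starts (0 to 1).
    Each generation of selfing is assumed to halve this fraction.
    """
    return initial * 0.5 ** selfing_generations


# F1 (heterozygous at all segregating loci) after 5 generations of selfing
print(expected_heterozygosity(1.0, 5))  # 0.03125

# BC1F1 (heterozygous at half of the segregating loci) after 4 generations of selfing
print(expected_heterozygosity(0.5, 4))  # 0.03125
```

Both scenarios give an expectation of about 3%, against which the observed heterozygosity of individual lines (2.4% to 37.5%) can be compared.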
During manual pollination there is a chance of cross-contamination, which would increase heterozygosity; however, utmost care was taken during controlled pollination to avoid contamination. The maximum recombination (52%) was recorded in MT-40, whereas MT-81 showed the least recombination (30%). The graphical representation (Supplementary Fig. 1a–j) indicates teosinte allelic introgression in the derived maize lines, which is also reflected in their morphological diversity. Similarly, Kumar et al. (2019) carried out graphical genotyping to assess parental allelic introgression in five teosinte-derived maize BC1F4 lines and reported 34.1% to 53.4% teosinte and 34.1% to 54.5% maize allelic introgression. Graphical representation of introgressed lines has also been performed in other crops, such as potato (van Eck et al. 2017) and wheat (Riar et al. 2012; Todorovska et al. 2016; Nataraj et al. 2018). Genome size (GS) variation has been well documented in maize. Among cultivated landraces and inbred lines, GS varies by at least 30% (Diez et al. 2013). GS values varied widely among individuals, from 0.948 for an individual from the landrace ‘Palomero Legitimo’ (MEXI211) to 1.299 for an individual from the landrace ‘Olote colorado’ (OAXA522). Intra-specific genome size variation is largely caused by differences in the amount of heterochromatin (McClintock 1978). The average GS value per plant was 1.111, but the averages differed between wild and cultivated samples; the average GS of teosintes (1.129) was significantly larger (P < 0.001, Kruskal–Wallis test) than that of cultivated maize (1.095). Reports indicate that maize and teosinte genomes vary both in gene content (Swanson-Wagner et al. 2010) and in transposable element (TE) complement (Wang and Dooner 2006).
The most prominent causes of genome shrinkage are illegitimate recombination, transposon-derived unequal homologous recombination and double-strand break repair (Schubert and Vu 2016), which promote evolutionary novelties and reproductive barriers and hence the evolution of new species (Pellicer et al. 2018). Organisms with bigger genomes tend to have more genes, more and longer introns, and more transposable elements than organisms with smaller genomes. Therefore, there is a possibility of recovering genes that were lost during the course of domestication through teosinte allelic introgression into maize.

Table 5 Allelic contributions of teosinte derived maize lines
Fig. 3 Linkage group-wise mean graphical representation of allelic position in 100 teosinte-derived maize lines: (-) = absent allele, a = maize allele, b = teosinte allele, (H) = heterozygous allele
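Using the genotype codes defined in the caption of Fig. 3, the parental allelic contribution of a single line can be tallied as in the following minimal sketch. The function name `allelic_contribution`, the example calls and the choice to express each class as a percentage of all markers (including absent calls) are assumptions made for illustration; the exact denominator used for Table 5 is not stated here and may differ.

```python
from collections import Counter

# Genotype codes as defined in the Fig. 3 caption
LABELS = {"a": "maize", "b": "teosinte", "H": "heterozygous", "-": "absent"}


def allelic_contribution(calls):
    """Percentage of marker calls in each genotype class for one line.

    calls: sequence of codes, one per marker ("a", "b", "H" or "-").
    Percentages are taken over all markers, including absent calls.
    """
    if not calls:
        raise ValueError("no marker calls supplied")
    unknown = set(calls) - set(LABELS)
    if unknown:
        raise ValueError(f"unexpected genotype codes: {sorted(unknown)}")
    counts = Counter(calls)
    return {label: 100.0 * counts[code] / len(calls)
            for code, label in LABELS.items()}


# Hypothetical line scored at 10 markers
example_calls = list("aaaaabbbH-")
contribution = allelic_contribution(example_calls)
print(contribution)
# {'maize': 50.0, 'teosinte': 30.0, 'heterozygous': 10.0, 'absent': 10.0}
```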

These teosinte-introgressed maize lines are superior to maize in several respects, such as flowering time and behavior, ear number, test weight and yield. These lines could therefore be excellent material for researchers, offering new opportunities to undertake complementary multi-location, multi-year trials for yield and agronomic performance, response to abiotic and biotic stresses, and quality traits important to the maize community. There is a strong possibility of identifying lines with traits of interest to breeders, such as resistance to various diseases and insects, drought tolerance, nitrogen fixation and improved protein content, as teosinte is reported to possess these traits (Niazi et al. 2014; Bernal et al. 2015; Kumar et al. 2020; Sahoo et al. 2021; Joshi et al. 2021a).

Conclusion

The wide range between the minimum and maximum values and the high estimates of variability parameters for all the studied traits confirm the existence of sufficient variation among the derived maize lines, making them potential material for future maize breeding programs. The large number of alleles detected in this study (377), with an average of 5 per primer, along with high PIC values, indicates that the SSR marker system is a reliable and powerful tool for evaluating genetic polymorphism and relationships among genotypes. The high allelic variation revealed by SSR markers in the teosinte-derived maize population and the genetic dissimilarity among lines show that teosinte introgressed substantial genetic variability into maize. Cluster analysis allocated the 102 lines (100 teosinte-derived maize lines, maize (DI-103) and teosinte) into 14 genetic groups. The large number of clusters indicates the uniqueness of the lines in terms of molecular makeup. Genotypes clustered together are less diverse; therefore, the possibility of obtaining desirable recombinants is greater when crossing genotypes that belong to different clusters. Graphical genotyping displayed a considerable extent of teosinte allelic introgression in the maize lines, which also leads to wider variation in morphological traits. Such variation in the maize germplasm provides a better opportunity for breeders to improve traits of interest through parent selection, hybridization, and recombination of desirable genotypes.