Introduction

Soybean (Glycine max L.) is an annual, self-pollinated species with a genome size of 1115 Mbp (Schmutz et al. 2010). The species is believed to have originated from wild soybean (Glycine soja), considering that both have 20 chromosome pairs (2n = 40), hybridize easily, exhibit normal meiotic chromosome pairing, and generate viable fertile hybrids (Kim et al. 2010). The exact region of origin of soybean is still unknown, but southern China, the Yellow River valley of central China, northeastern China, and several other regions are all candidate sources because G. soja grows naturally in far eastern Russia, China, Korea and Japan (Carter et al. 2004).

Glycine max is generally considered to have been domesticated from its wild relative (G. soja) 6000–9000 years ago in China (Carter et al. 2004). It may have been introduced to Korea and then to Japan approximately 2000 years ago, to North America in 1765, and to Central and South America during the first half of the last century. Several independent analyses found a severe genetic bottleneck during soybean domestication (Xu et al. 2002; Hyten et al. 2006). There is supporting evidence for both single and multiple domestication events (Hymowitz and Kaizuma 1981; Gai et al. 2000; Xu and Gai 2003), which were accompanied by a reduction in genetic diversity and by the loss of useful traits retained in wild relatives. This reduction of genetic diversity is common in crops that have been subjected to strong selective pressure on genes controlling traits of agronomic importance during domestication and subsequent episodes of selective breeding (e.g., maize; Vigouroux et al. 2002).

The largest resource of soybean germplasm is the Asian landraces of G. max, which are the most immediate result of domestication (Hyten et al. 2007). Selection, hybridization and breeding from these landraces resulted in the release of improved cultivars in the USA (Gizlice et al. 1994). These first cultivars developed in the USA were introduced and planted in Brazil during the 1960s and 1970s. With the growing importance of soybean, breeders began crossing these cultivars among themselves and with other sources, obtaining the first Brazilian cultivars, such as Industrial, Santa Rosa and Campos Gerais (Hiromoto and Vello 1986). Thus, the current Brazilian soybean germplasm pool, as defined by Hiromoto and Vello (1986), is the result of several cycles of selection and effective recombination among a relatively small number of selections from the USA cultivars.

Frequent selection, population admixture, and crossing among a small number of cultivars in Brazilian soybean breeding programs can reduce genetic diversity and affect the patterns of linkage disequilibrium (LD). To date, few genetic studies have determined the patterns of LD in tropical soybean genotypes. Priolli et al. (2014), using 142 SSR markers and 94 accessions (cultivars and breeding material) obtained from the EMBRAPA soybean and USP/ESALQ germplasm, which represent soybean breeding lines of public and private institutions, suggested that LD across the soybean genome decays over approximately 12 cM. In self-pollinated species such as soybean, where recombination is less effective than in outcrossing species, LD declines more slowly (Flint-Garcia et al. 2003). Nonetheless, the germplasm that makes up a collection plays a key role in LD variation, because the extent of LD is influenced by the level of genetic variation captured by the target population (Soto-Cerda et al. 2013). In soybean, a highly variable pattern of LD has been reported in multiple populations, with variability among genomic regions (Hyten et al. 2007). Given the highly variable levels of LD decay reported for soybean landraces and elite cultivars (Hyten et al. 2007; Zhou et al. 2015) and the demands of dense marker sets, it is necessary to determine LD in tropical soybean cultivars of Brazil that represent the range of photoperiod/temperature latitudinal adaptation as defined by a maturity group (MG) Roman numeral designation.

Most processes studied in population genetics, such as domestication, selection, founding events and population subdivision, can affect LD decay; however, population structure (admixture) and the mating system of the species (selfing versus outcrossing) influence patterns of LD particularly strongly (Flint-Garcia et al. 2003). Pairwise LD is known to increase with selfing and can extend very far in highly selfed organisms (Nordborg 2000). For this reason, assuming that individuals in a sample are fully outcrossing may result in spurious inference of population structure in partially selfing populations, as suggested by Falush et al. (2003). To correct spurious evidence for admixture in the presence of partial self-fertilization, Gao et al. (2007) implemented a model that accommodates partial selfing and corrects the inference of population structure in self-pollinating species such as soybean. On the other hand, LD decay should be predicted from the present-day mating system with caution, because the mating system may have changed significantly (Flint-Garcia et al. 2003). For example, G. max and its ancestor, G. soja, differ significantly in their outcrossing rates. The self-pollinating G. max has an outcrossing rate of approximately 1%, whereas G. soja outcrosses at an average rate of 13% (Fujita et al. 1997). The greater amount of outcrossing in G. soja increases the effective recombination rate, leading to the prediction of an 11-fold lower extent of LD in G. soja as compared to G. max (Flint-Garcia et al. 2003).
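The roughly 11-fold figure can be checked with a short calculation. The sketch below assumes the standard approximation that partial selfing scales the effective recombination rate by (1 − F), with equilibrium inbreeding coefficient F = s / (2 − s) for selfing rate s. This formula is not stated in the text and is used here only for illustration; the function names are hypothetical.

```python
def inbreeding_coefficient(selfing_rate: float) -> float:
    """Equilibrium inbreeding coefficient F = s / (2 - s) under partial selfing."""
    return selfing_rate / (2.0 - selfing_rate)


def effective_recombination_factor(selfing_rate: float) -> float:
    """Factor (1 - F) by which selfing scales the effective recombination rate
    (assumed approximation, not taken from the text)."""
    return 1.0 - inbreeding_coefficient(selfing_rate)


# Outcrossing rates quoted in the text: about 1% for G. max, 13% for G. soja.
factor_max = effective_recombination_factor(1.0 - 0.01)
factor_soja = effective_recombination_factor(1.0 - 0.13)
ratio = factor_soja / factor_max

print(f"G. max  (1 - F): {factor_max:.4f}")
print(f"G. soja (1 - F): {factor_soja:.4f}")
print(f"ratio soja / max: {ratio:.1f}")
```

Under this assumption the ratio falls between 11 and 12, in line with the prediction cited from Flint-Garcia et al. (2003).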

In this study we genotyped 169 tropical soybean genotypes using high-throughput genotyping with SNP markers. The overall goal was to analyze linkage disequilibrium blocks in a collection of tropical soybean genotypes of Brazil. Our specific goals were (1) to estimate population structure and assess population relatedness, and (2) to detect the patterns of LD blocks.

Materials and methods

Plant material

A total of 169 soybean cultivars in commercial use in Brazil were genotyped (Table S1). These cultivars represent the core cultivars used by Brazilian farmers from the 1990s to the 2010s, and some of them were important progenitors in Brazilian soybean breeding programs. Additionally, the cultivars were chosen to represent a range of materials developed for the Brazilian production area and the range of photoperiod/temperature latitudinal adaptation as defined by a maturity group (MG) Roman numeral designation (Table S1).

DNA extraction and SNPs genotyping

Genomic DNA was extracted from leaf tissue collected from a mix of ten plants of each accession, using the DNeasy Plant Kit (Qiagen). A total of 6000 single nucleotide polymorphisms (SNPs) were genotyped in the 169 cultivars with the Infinium iSelect HD Custom Genotyping BARCSoySNP6K (Illumina Inc., San Diego, CA, USA) on the Illumina iScan platform. Genotyping was conducted by Deoxi Biotechnology Ltda®, in Araçatuba, São Paulo, Brazil. After eliminating redundant SNPs, non-polymorphic SNPs, and SNPs with heterozygous alleles considered as missing data, a total of 4949 SNPs remained. In addition, markers with MAF < 0.1 were removed from the genotype data set, leaving 3780 SNPs for the population structure, coancestry and LD analyses.
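The minor allele frequency (MAF) filter can be illustrated with a minimal sketch. It assumes genotypes are stored as two-letter calls (for example 'AA' or 'GG') and that heterozygous or absent calls are skipped as missing. The data layout, function names and toy markers are hypothetical and are not the pipeline used in the study.

```python
from collections import Counter


def minor_allele_frequency(calls):
    """MAF for one SNP from diploid calls such as 'AA' or 'GG'.

    Heterozygous calls (e.g. 'AG') and None are treated as missing
    (an assumption of this sketch). Returns 0.0 for monomorphic or
    fully missing markers.
    """
    counts = Counter()
    for call in calls:
        if call is None or len(call) != 2 or call[0] != call[1]:
            continue  # missing or heterozygous call: skipped
        counts[call[0]] += 2  # a homozygous line carries two copies
    total = sum(counts.values())
    if total == 0 or len(counts) < 2:
        return 0.0
    return min(counts.values()) / total


def filter_by_maf(genotypes, threshold=0.1):
    """Keep markers whose MAF is at least `threshold` (MAF < 0.1 is removed)."""
    return {snp: calls for snp, calls in genotypes.items()
            if minor_allele_frequency(calls) >= threshold}


# Toy data: three hypothetical markers.
toy = {
    "snp_a": ["AA", "AA", "GG", "GG", "AG", None],  # balanced alleles
    "snp_b": ["CC"] * 19 + ["TT"],                  # rare minor allele
    "snp_c": ["TT"] * 6,                            # monomorphic
}
kept = filter_by_maf(toy)
print(sorted(kept))
```

In the toy data only the first marker passes the filter, because the second has a rare minor allele and the third is monomorphic.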

Linkage disequilibrium

The LD parameter r2, which measures the degree of LD between pairs of SNPs, was calculated for each chromosome using TASSEL 4.0. The LD decay graph was plotted as r2 versus physical distance (Mbp) for all intra-chromosomal comparisons, using nonlinear regression as described by Remington et al. (2001). The expected value of r2 was estimated according to the following equation:

$$E(r^{2} ) = \left[ {\frac{10 + C}{(2 + C)(11 + C)}} \right]\left[ {1 + \frac{{(3 + C)(12 + 12C + C^{2} )}}{n(2 + C)(11 + C)}} \right]$$

where r2 is the squared correlation coefficient, n is the sample size, and C is a model coefficient for the distance variable (Hill and Weir 1988). The LD decay curve was fitted to predicted r2 values between adjacent markers using the model of Hill and Weir (1988). This model was implemented to determine LD decay as a function of distance using the ‘nlm’ function in R. To determine the baseline r2 value, the critical value of LD decay was set at 50% of the initial r2 value, following Mamidi et al. (2011) and Wen et al. (2015).
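The fitting step can be sketched in Python as follows. This is an illustration under stated assumptions, not the R code used in the study: C is taken to be the product of a single fitted parameter (here called rho, a hypothetical name) and the distance between markers, a log-spaced grid search stands in for the ‘nlm’ optimiser, and the data are synthetic values generated from the model itself.

```python
def expected_r2(c: float, n: int) -> float:
    """Hill and Weir (1988) expectation of r^2 for coefficient C and sample size n."""
    first = (10.0 + c) / ((2.0 + c) * (11.0 + c))
    second = 1.0 + ((3.0 + c) * (12.0 + 12.0 * c + c * c)) / (
        n * (2.0 + c) * (11.0 + c))
    return first * second


def fit_rho(distances, r2_values, n):
    """Least-squares estimate of rho, assuming C = rho * distance.

    A log-spaced grid search replaces a nonlinear optimiser for simplicity.
    """
    best_rho, best_sse = None, float("inf")
    for step in range(-300, 301):
        rho = 10.0 ** (step / 100.0)
        sse = sum((obs - expected_r2(rho * d, n)) ** 2
                  for d, obs in zip(distances, r2_values))
        if sse < best_sse:
            best_rho, best_sse = rho, sse
    return best_rho


def half_decay_distance(rho, n, upper=1e6):
    """Distance at which E(r^2) falls to 50% of its value at distance 0."""
    target = 0.5 * expected_r2(0.0, n)
    low, high = 0.0, upper
    for _ in range(200):  # bisection; E(r^2) decreases with distance
        mid = 0.5 * (low + high)
        if expected_r2(rho * mid, n) > target:
            low = mid
        else:
            high = mid
    return 0.5 * (low + high)


# Synthetic example: 169 lines, distances in Mb, hypothetical true rho of 0.5.
n_lines = 169
distances_mb = [0.1, 0.5, 1.0, 2.0, 5.0, 10.0, 20.0]
observed = [expected_r2(0.5 * d, n_lines) for d in distances_mb]

rho_hat = fit_rho(distances_mb, observed, n_lines)
half_mb = half_decay_distance(rho_hat, n_lines)
print(f"fitted rho: {rho_hat:.3f}; half-decay distance: {half_mb:.2f} Mb")
```

With real data, the observed r2 values for all intra-chromosomal marker pairs would replace the synthetic list, and a proper optimiser would normally be preferred to the grid search.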

Linkage disequilibrium blocks analysis

The pairwise estimates D’ and r2 were calculated by chromosome. LD blocks were estimated by Solid Spine of LD using the software Haploview 4.2 (Barrett et al. 2005). This internally developed method of Haploview searches for a “spine” of strong LD running from one marker to another along the legs of the triangle in the LD chart. A cutoff of 1% was used, meaning that if addition of a SNP to a block resulted in a recombinant allele at a frequency exceeding 1%, then that SNP was not included in the block.
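For reference, the two pairwise statistics can be computed from haplotype frequencies as in the sketch below. It assumes inbred lines, so that each line contributes one two-locus haplotype, and uses a hypothetical coding in which upper and lower case letters denote the two alleles at each locus. It illustrates the textbook definitions of D' and r2 and does not reproduce Haploview's block algorithm.

```python
def pairwise_ld(haplotypes):
    """Return (D, D', r^2) for two biallelic loci.

    `haplotypes` is a list of 2-character strings, one per inbred line:
    'AB', 'Ab', 'aB' or 'ab' (hypothetical coding for this sketch).
    """
    n = len(haplotypes)
    p_a = sum(h[0] == "A" for h in haplotypes) / n
    p_b = sum(h[1] == "B" for h in haplotypes) / n
    p_ab = sum(h == "AB" for h in haplotypes) / n
    d = p_ab - p_a * p_b  # raw disequilibrium coefficient
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if d_max == 0 or denom == 0:
        raise ValueError("both loci must be polymorphic")
    return d, abs(d) / d_max, d * d / denom


# Toy sample of ten lines in which the 'aB' haplotype is never observed.
sample = ["AB"] * 6 + ["ab"] * 3 + ["Ab"]
d, d_prime, r2 = pairwise_ld(sample)
print(f"D = {d:.3f}, D' = {d_prime:.3f}, r2 = {r2:.3f}")
```

In this toy sample only three of the four possible haplotypes occur, so D' reaches its maximum while r2 stays below 1, which is why the two statistics are usually reported together.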

Population structure

Population structure and inbreeding coefficients at population level were estimated under the Markov Chain Monte Carlo (MCMC) algorithm for the generalized Bayesian clustering method implemented in InStruct software (Gao et al. 2007). This method does not assume Hardy–Weinberg equilibrium within loci, and the expected genotype frequencies are estimated based on rates of inbreeding or selfing.

To infer population structure and population selfing rates in soybean, we used mode two of InStruct (Gao et al. 2007). We ran one independent MCMC chain for each number of groups (K) from 2 to 10, without prior population information, with a burn-in of 5000 and a run length of 50000 iterations. The best estimate of K was the one with the lowest Deviance Information Criterion (DIC) among the nine values of K evaluated (Gao et al. 2007). Hierarchical F statistics were used to estimate the proportion of genetic variance explained by MG class and by company of origin, using ancestry estimates for K = 9, and were calculated with the hierfstat R package (Goudet 2005).

Molecular coancestry

Strong relatedness among families, subpopulations and populations can cause spurious associations when it is not accounted for in the association mapping model. Relatedness between subpopulations was estimated using Reynolds genetic distance (ϴ), which is given by ϴij = −ln (1 − Fst) for subpopulations i and j (Reynolds et al. 1983), where Fst corresponds to genetic differentiation among subpopulations. Pairwise molecular coancestry between the nine subpopulations of tropical soybean previously obtained with InStruct was estimated in Arlequin 3.5 (Excoffier and Lischer 2010) using a total of 3780 SNP markers.
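The transformation from Fst to Reynolds distance is a one-line calculation. The sketch below applies it to a few hypothetical Fst values, which are illustrative and not results from this study.

```python
import math


def reynolds_distance(fst: float) -> float:
    """Reynolds genetic distance, theta = -ln(1 - Fst)."""
    if not 0.0 <= fst < 1.0:
        raise ValueError("Fst must lie in [0, 1)")
    return -math.log(1.0 - fst)


# Hypothetical pairwise Fst values between subpopulations.
for fst in (0.05, 0.20, 0.30):
    print(f"Fst = {fst:.2f} -> theta = {reynolds_distance(fst):.3f}")
```

For small Fst the distance is close to Fst itself, and it grows faster than Fst as differentiation increases.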

Results

Tropical soybean genotyping

The BARCSoySNP6K provided high coverage of the tropical soybean genome. On average, 247.5 SNP markers were found per chromosome, ranging from 198 (chromosome 1) to 323 (chromosome 8). For each chromosome, we estimated the ratio between the number of SNPs and the chromosome length in cM. On average, one SNP marker was found every 0.48 cM, ranging from one every 0.33 cM (chromosome 4) to one every 0.60 cM (chromosome 17) (Table 1). Chromosome 8 had the most markers (323), with an average spacing of 0.49 cM. In contrast, chromosome 1 had the fewest (198), with an average spacing of 0.55 cM. This shows that the SNPs identified with the Illumina Infinium genotyping platform were well distributed throughout the tropical soybean genome.

Table 1 Distribution of SNPs markers and linkage disequilibrium blocks in the 20 chromosomes (Chr) in cultivars of tropical soybean

Some loci were heterozygous. Heterozygosity ranged from 3% (BRSMT Crixás, CD 205, P98Y70 and Celeste) to 41% (BMX Titan RR), with a mean of 9% among the 169 cultivars. Seventy-six percent of the cultivars (129) had less than 10% heterozygosity, 20% (33) had between 10 and 30%, and 4% had more than 35% (Fig. 1).

Fig. 1 Frequency of observed heterozygosity in 169 cultivars of soybean, using 4949 SNP markers

Population structure and molecular coancestry

The genetic structure of the 169 tropical soybean cultivars was estimated using a Bayesian clustering approach to infer the number of strongly differentiated genetic subpopulations. According to the lowest DIC value obtained from the Bayesian clustering analysis implemented in InStruct, the most probable number of subpopulations was 9 (Fig. 2; Fig. S1). The estimated membership coefficients indicate that 43% of all genotypes have more than 50% membership in their respective groups. Each subpopulation (K = 9) contained admixed cultivars that come from different soybean breeding programs of Brazil (Table 2). Among the nine subpopulations, none had individuals exclusively from one company or maturity group (Tables 3, 4; Table S1). This result confirms the shared genetic base among the public and private soybean breeding programs in Brazil.

Fig. 2 Bar plot of the estimated population structure of 169 cultivars of soybean (K = 9). The y-axis is the subgroup membership, and the x-axis is the genotype. The groups go from G1 to G9 from left to right

Table 2 Sub-population structure with number of cultivar and selfing rates by group obtained for 169 cultivars of tropical soybean
Table 3 Distribution of cultivars in each subgroup based on population structure and maturity groups of improved soybean tropical lines
Table 4 Distribution of cultivars in each subgroup based on population structure and companies of improved soybean tropical lines

Based on the alleles of 4949 SNP markers and the nine subpopulations obtained with InStruct, the average molecular coancestry among pairwise subpopulation comparisons was 0.234 in the tropical soybean collection as a whole. Approximately 60% of the pairwise coancestry estimates were lower than 0.23 (mean = 0.196), 30% ranged from 0.24 to 0.3 (mean = 0.264), and 10% were higher than 0.31 (mean = 0.332) (Fig. 3). According to these estimates, relatedness among the subpopulations of the tropical soybean collection was mostly moderate.

Fig. 3 Global pairwise molecular coancestry estimates of the 169 tropical soybean cultivars that represent nine subpopulations of Brazil

LD blocks analysis and LD-decay by chromosome

The 3780 SNPs with MAF > 0.1 distributed over the soybean genome allowed the identification of 941 LD blocks in the tropical soybean material, with 3086 SNPs constituting the haplotype LD blocks (62% of total SNPs) (Table 1). On average there were 47.05 blocks per chromosome, ranging from 32 (chromosome 1) to 74 (chromosome 18) (Table 1). The number of SNPs in each block ranged from 2 to 9, with an average of 2.69 SNPs per block. Among the LD blocks, 64% contained two or three markers, and less than 3% contained seven or more SNPs (Fig. S2). Block lengths were very similar across chromosomes, and most blocks were between 51 and 500 kb long. Blocks longer than 500 kb were absent or present in a very low proportion. There was no relationship between the number of SNP markers and the number of LD blocks, indicating that these blocks are randomly located in the genome. The average block length was 252.4 kb, ranging from 1 kb (chromosome 4) to 499 kb (chromosome 11). More than 70% of LD blocks were shorter than 200 kb (Fig. S3). The summed length of the LD blocks was 237,535 kb, which represents about 20% of the soybean genome (1.1 Gb).
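The summary figures for the blocks follow from simple ratios, which the short calculation below reproduces from the totals reported in the text. The variable names are illustrative, and the genome length of 1.1 Gb is the value quoted in the text.

```python
n_blocks = 941
n_chromosomes = 20
total_block_length_kb = 237_535
genome_length_kb = 1_100_000  # 1.1 Gb, as quoted in the text

blocks_per_chromosome = n_blocks / n_chromosomes
mean_block_length_kb = total_block_length_kb / n_blocks
genome_fraction = total_block_length_kb / genome_length_kb

print(f"blocks per chromosome: {blocks_per_chromosome:.2f}")
print(f"mean block length: {mean_block_length_kb:.1f} kb")
print(f"fraction of genome in blocks: {genome_fraction:.1%}")
```

The genome fraction comes out slightly above one fifth, which the text rounds to about 20%.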

To understand the specific LD block patterns in cultivated tropical soybean, we used Haploview to carry out an LD analysis. The analysis used 3780 SNPs, and a nonlinear regression model was used to estimate LD decay. LD decayed to an r2 value of 0.2 at approximately 8.5 Mb (Fig. 4).

Fig. 4 LD decay among 169 cultivars of tropical soybean

Discussion

The BARCSoySNP6K genotyping assay is a promising method for characterizing soybean genetic diversity and linkage disequilibrium, and for constructing high-resolution linkage maps to improve the soybean whole-genome sequence assembly (Song et al. 2013). In soybean, the Illumina Infinium assay provides a significantly higher SNP genotyping capacity than the Illumina GoldenGate platform, and Infinium assays with successful allele calls for nearly 50 k SNPs have been reported by Song et al. (2013). In addition, several SNP-based marker assays have been developed and validated in soybean, including the Axiom SoyaSNP array for ~180,000 SNPs (Lee et al. 2015a, b) and the NJAU 355K SoySNP array (Wang et al. 2016). Our study is the first application of the Infinium BARCSoySNP6K to tropical soybean. The resulting dataset demonstrates low to moderate coverage of the tropical soybean genome with this 6 k SNP assay and will assist genome-wide association studies and high-resolution genetic linkage mapping of important traits. This assay is currently being applied in genome-wide association studies of important agronomic traits of soybean (Akond et al. 2013; Lee et al. 2015a, b; Contreras-Soto et al. 2017).

Our SNP genotyping represents the first genome-wide study of tropical soybean cultivars. Our results showed that SNPs were well distributed across the chromosomes and that heterozygosity varied among the 169 soybean cultivars. On average, we found 9% heterozygosity among the cultivars, which may be considered moderately high relative to other studies in soybean (Hyten et al. 2010). In addition, it represents an important source of genetic diversity and adaptive evolution.

Some breeding methods for soybean, such as single seed descent or backcross strategies, impose selection on plants that maintain variable levels of heterozygosity during the early generations of the breeding cycle. Haun et al. (2011), following single seed descent generations in soybean, demonstrated that heterozygous loci may segregate, resulting in genetic heterogeneity within released accessions. Genetic theory predicts, on average, a halving of heterozygous loci with every self-pollination following a given cross. However, heterozygosity may be retained at higher rates if loci confer desirable and selectable phenotypes (Gore et al. 2009), as in the case of continuous selection of soybean cultivars in Brazil for different traits.
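The halving rule can be written as H_t = H_0 × (1/2)^t, where t is the number of selfing generations after the cross. The sketch below tabulates this expectation in the absence of selection, assuming an F1 that is heterozygous at all segregating loci (H_0 = 1); the function name and starting value are illustrative.

```python
def expected_heterozygosity(h0: float, generations: int) -> float:
    """Expected heterozygosity after repeated selfing, without selection."""
    return h0 * 0.5 ** generations


# Fraction of segregating loci still heterozygous, starting from an F1.
for t in range(0, 9, 2):
    print(f"after {t} selfing generations: {expected_heterozygosity(1.0, t):.4f}")
```

After six generations of selfing fewer than 2% of the initially heterozygous loci are expected to remain heterozygous, so markedly higher values point to selection, seed mixtures or other departures from this simple model.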

The varieties used in this study went through dozens of generations of self-fertilization, so each individual plant is at or very close to 100% homozygosity. Heterozygosity, in this case, reflects a mix of different homozygous genotypes at loci that were heterozygous when the single plant was selected to produce the breeding line that later advanced as a new commercial variety. The level of heterozygosity also depends on the method used to produce the genetic seed of new varieties: the pure line method results in a lower level of heterozygosity than the bulk method. Our results reveal high levels of heterozygosity in some tropical soybean cultivars (BMX Titan RR = 41%; NIDERA A 6411 RR = 37%), which may be useful for promoting genetic variability within the genetic base of soybean in Brazil. Indeed, a recent study using phenotypic and molecular data (SSR markers) verified the existence of genetic variability among RR soybean® cultivars from public and private soybean breeding companies of Brazil (Villela et al. 2014).

According to the DIC values, our population structure analysis supported the existence of nine subpopulations that come from different breeding programs of Brazil. Nearly half of the cultivars were considered admixed because their degree of membership within a subpopulation was <0.5. Although 169 cultivars were used in this study, we were only able to obtain the pedigrees for 89 cultivars (Table S1). Because of the Variety Protection Act of 1997 in Brazil, many breeders have not made public the pedigrees of released cultivars, especially recently released varieties. However, our results reveal a shared genetic base among the public and private soybean breeding programs in Brazil and show the high genetic relatedness that exists among commercial cultivars. The same Variety Protection Act has a clause called “breeders right”: breeders in Brazil have the right to use any commercial variety for crossing, regardless of whether it is protected and of where it originates. This allows germplasm to be shared between breeding programs.

A previous study by Hiromoto and Vello (1986) indicated that Brazilian soybean has a narrow genetic base, with only four ancestors (CNS, S-100, Roanoke and Tokyo) representing approximately 48% of the overall genetic base. Wysmierski and Vello (2013), evaluating 444 cultivars available in the database of the National Cultivar Registry of the Ministry of Agriculture, Livestock and Food Supply of Brazil, showed an increase in the number of ancestors over time (1971–2009). However, the same four main ancestors remained the most important throughout 1971–2009 and contribute more than half (55.3%) of the genetic base. The cumulative relative genetic contribution of ancestors increased from 46.6 to 57.6%, indicating that the genetic base of Brazilian soybean is still narrow despite the incorporation of new ancestors.

Company of origin and MG may be the principal determinants of population structure within the soybean germplasm collection (Tables 3, 4); however, because the improved tropical lines share a common genetic base and origin, this structure is difficult to explain. Soybeans are classified into 13 MGs, from very early to very late (000, 00, 0, I, II, III, IV, V, VI, VII, VIII, IX and X), based on the response to temperature and photoperiod across latitudes. Our collection represents MG IV to IX and showed an admixed population structure among the nine groups. Bandillo et al. (2015), evaluating soybean accessions of diverse MGs from the USDA Germplasm Resources Information Network (GRIN) database, reported that nearly two-thirds of the accessions in the USDA soybean germplasm collection are admixed. Specifically, more than 90% of accessions from America and Europe are admixed. This probably helps to confirm the admixed genetic structure of tropical soybean, which was developed from the narrow genetic base of the United States. In fact, previous studies demonstrate that the top five ancestors of Brazilian germplasm are exactly the same as the top five ancestors of the soybean genetic base of the southern United States (Wysmierski and Vello 2013).

At the same time, the proportion of individuals from each company and MG was not equal across the nine subpopulations, which indicates different degrees of allelic diversity across populations, similar to the results reported by Bandillo et al. (2015) for the USDA soybean germplasm. As expected, cultivars from each company were mostly admixed across all subpopulations (Table 4). Bandillo et al. (2015) indicate that the analysis of this result is complicated by the fact that ancestors of American soybean, the origin of most of the tropical soybean germplasm (Hiromoto and Vello 1986), contributed at different pedigree levels, coupled with the fact that the American soybean germplasm resulted from a severe population bottleneck when soybeans were introduced to North America (Gizlice et al. 1994) and consequently to Brazil (Hiromoto and Vello 1986). Consequently, company of origin and MG should explain only a small part of the genetic variation of tropical soybean.

Hierarchical F statistics, calculated using ancestry estimates for K = 9, showed that the genetic differentiation explained by MG (~5%) was higher than that explained by company (~3%). Similar values of genetic differentiation for MG (MG 000 to X), using ancestry estimates for K = 5, have been reported by Bandillo et al. (2015) for the USDA-GRIN soybean collection. Although the amount of total variation explained is small, these results suggest that population structure in the germplasm collection of Brazil is driven more by MG than by the company of origin of the cultivars.

To date, no information exists about LD decay in improved tropical soybean lines adapted to Brazil. In addition, most studies conducted in soybean have used accessions from the U.S. Department of Agriculture (USDA) Soybean Germplasm Resources Information Network (GRIN) database (www.arsgrin.gov). Compared with GRIN soybean germplasm of similar MG (Wen et al. 2015; Vuong et al. 2015), our improved tropical soybean showed more extensive LD, that is, slower LD decay (Fig. 4). The difference in LD patterns may be attributed to the low genome coverage of markers and the smaller number of genotypes used in the present study. As noted by Song et al. (2015) for soybean, most studies of LD have been limited in sample size and/or in the number of loci analyzed, so it is probably necessary to evaluate tropical soybean germplasm with a greater number of markers.

We found that LD declined below 0.2 at ~8.5 Mb (Fig. 4). In improved cultivars that represent public and private breeding programs for the north central United States (MG 0 and early I), LD declined below 0.1 at 7.0 Mb, 5.9 Mb and 8 Mb in studies conducted in 2005, 2006 and 2013, respectively (Mamidi et al. 2011, 2014). In elite cultivars from a single Canadian breeding program, LD dropped below 0.1 at ~2.8 Mb (Bastien et al. 2014). Hyten et al. (2007) reported that LD declined to 0.1 at 574 kb in North American elite cultivars. Highly variable patterns of LD have thus been reported in multiple soybean populations, and photoperiod sensitivity (maturity) has been proposed as a factor that may have increased LD in soybean, because it resulted in population subdivision among elite soybean cultivars (Hyten et al. 2007). Bastien et al. (2014) suggest that their less extensive LD likely reflects the broader scope of their genotypes, which comprised genetically modified, conventional, and food-type soybeans belonging to Maturity Groups 000 to II. In contrast, the cultivars in our tropical soybean collection were highly related, which may explain the more extensive LD relative to other studies of soybean germplasm. It is not surprising to find high levels of LD in cultivars with high genetic relatedness. In fact, the stringent cleistogamy and relatively long generation time of soybean suggested that there would be high LD in the soybean genome (Lam et al. 2010).

LD is known to increase with selfing and can extend very far in highly selfed organisms (Nordborg 2000). Nordborg and Donnelly (1997) showed that the degree of selfing of a species is related to its effective recombination rate. This is because recombination is less effective in selfing species, where individuals are more likely to be homozygous at a given locus than in outcrossing species. In the current study, tropical soybean cultivars showed a selfing rate of s = 0.966 (data not shown). This relationship between recombination rate and selfing extends to LD, because effective recombination is severely reduced in highly selfing species such as soybean, and consequently LD is more extensive.

Cultivars contain specific sequence blocks in their chromosomes, which may be associated with phenotypic variation artificially selected over many generations of breeding (Kim et al. 2014; Song et al. 2015). The current study identified extensive LD, with a set of 941 LD blocks that included most of the SNPs (3086, or 62% of the total number of SNPs). Song et al. (2015) recently provided the first high-resolution haplotype maps based on the largest sample size and the largest number of loci reported in soybean thus far, and found that the extent of LD and the average haplotype block size were greater in the North American cultivar population than in wild and landrace populations. Our results were similar to those reported for North American cultivars, which probably corroborates that the extensive use of a small number of elite genotypes in Brazilian breeding programs further reduces genetic variability. In fact, domestication and artificial selection have led to extensive LD and haplotype structure.

This study provides the first comprehensive genome-wide SNP genotyping data for tropical soybean and explored approximately 20% of the soybean genome, considering that the summed length of the LD blocks was 237,535 kb. According to Schmutz et al. (2010), the soybean genome is about 1.1–1.15 Gb, which means that this study used approximately one marker per 222 kb to evaluate LD decay and LD haplotype blocks in tropical soybean. Our results showed small differences in the length and number of LD blocks and demonstrate that LD blocks shorter than 500 kb predominate in cultivated soybean of Brazil. Lam et al. (2010) reported that LD blocks shorter than 20 kb were more frequent in wild soybeans than in cultivated soybeans, and indicated that the LD block size of wild soybeans was about half that of cultivated soybeans. The genetic material used in this study may therefore explain the relatively long LD blocks reported here.

Our results of high genetic relatedness and population structure in tropical soybean cultivars demonstrate that the self-fertilizing nature of soybean, which results in high inbreeding and thus reduced recombination, may have promoted low genome diversity and high LD in tropical soybean. According to Lam et al. (2010), the presence of high LD in the soybean genome indicates that soybean would serve as a good model for studying the genomes of crops with extreme LD. Additionally, the information provided by the present study about population structure, genetic relatedness, and the location and distribution of LD haplotype blocks in the cultivated soybean genome can facilitate the identification of genes of interest. For breeding applications, the high LD identified in the tropical soybean genome indicates that marker-assisted breeding and association mapping studies are better choices for soybean improvement, whereas map-based cloning using genetic populations will be challenging.