Introduction

Pedigree information provides breeders with relevant data for decision-making because it allows them to maintain high levels of variability in the progeny, increase the heterosis effect, and reduce the levels of inbreeding (Tarn et al. 1992). Luo et al. (2000) emphasized that information about parental genotypes is fundamental in distinguishing recombinant and parental genotypic classes for linkage analysis. For instance, pedigree information is used for multiple quantitative trait loci (QTL) mapping (Bink et al. 2002), association mapping (Simko et al. 2004; Malosetti et al. 2007; D’hoop et al. 2010; Baldwin et al. 2011) as well as for studies on extreme resistance inheritance (Song et al. 2005). Pedigree information is also highly relevant for genetic relationship estimations (Demeke et al. 1996; Sun et al. 2003; Braun and Wenzel 2004), some of which show that full siblings and half siblings clustered closely, indicating that estimated relationships were consistent with known pedigrees (Hosaka et al. 1994; Demeke et al. 1996; Isenegger et al. 2001).

In general, pedigree information is collected either by researchers, breeders, or breeding companies themselves. Today, a large amount of pedigree data on potato cultivars is available on a Web interface (Berloo et al. 2007). Although the data is continually updated, the database does not always provide the pedigree for some cultivars (i.e., orphans, old varieties, or unavailable information). The authors also stated that different sources of pedigree data may sometimes cause conflicting parental compositions or refer to certain cultivars under different names. Involuntary errors in pollination, seed harvest, or labeling may also generate errors. Berloo et al. (2007) thereby elucidated the need for reliable genetic tools in confirming existing pedigree data.

Since the 1980s, parentage analysis of plants has been widely expanded not only from the development of methods for assessing the parentage of individual progeny, but also from the increase in the diversity and availability of molecular markers. While allozymes (Ellstrand 1984; Smouse and Meagher 1994; Krauss 1994) as well as dominant molecular markers (RFLP, RAPD) (Lewis and Snow 1992; Milligan and McMurry 1993) and co-dominant molecular markers (Ashley and Dow 1994; Dow and Ashley 1996; Double et al. 1997; Buteler et al. 1997) have widely been applied to parentage studies in plants, it seems that simple sequence repeats (SSR) or microsatellites constitute the preferred markers to perform such parentage assessment research (Ashley 2010). This is particularly due to their high discriminatory power, co-dominant transmission, and reproducible properties (Tautz and Renz 1984; Tautz 1989; Li et al. 2002). Numerous original papers reported that these hyper-variable DNA sequences are an excellent source of genetic markers for potato cultivar fingerprinting (Veilleux et al. 1995; Kawchuk et al. 1996; Provan et al. 1996; Ghislain et al. 2004; Moisan-Thiery et al. 2005; Reid et al. 2009; Reid et al. 2011) as they provide the highest DI (diversity index) over dominant markers such as RAPD, AFLP, and RFLP markers (Milbourne et al. 1997; McGregor et al. 2000).

The earliest and conceptually simplest technique of parentage analysis is the simple exclusion method introduced by Ellstrand (1984). Based on Mendelian rules of inheritance, this technique uses genotypic incompatibilities between tested parents and their offspring in order to reject particular parent-offspring hypotheses. More precisely, the analysis is based on a strict examination of genotypes and the exclusion is declared when an obligatory parental-inherited allele observed in the offspring is not found in the alleged parent. This method has been extensively used in many various plant studies such as gene flow estimation analyses (Hamrick and Schnabel 1985; Ellstrand et al. 1989), including extension to both relative male and female fertility estimations and mating system examinations (Broyles and Wyatt 1991; Snow and Lewis 1993), genealogy reconstruction (Lewis and Snow 1992; Milligan and McMurry 1993), and paternity testing (Grattapaglia et al. 2004). Today, this technique is mainly used in kinship testing in humans, in particular for paternity testing (Chakraborty and Stivers 1996). However, it is recognized that the simple exclusion method presents a certain number of limitations. Firstly, this method requires the acquisition of the genotype of one “confirmed” parent (for example the mother) in order to test the other so-called alleged parent (Devlin et al. 1988). Secondly, this analysis is usually hindered by genotyping errors, null alleles, or mutations appearing between parents and children which contribute to false exclusions for a given locus as observed in humans (Chakraborty et al. 1996; Gunn et al. 1997; Junge et al. 2006) and plants (Grattapaglia et al. 2004). Finally, this method is rapidly rendered impractical when the pool of candidate parents becomes too large due to the high number of loci needed to yield a single non-excluded parent or parent-pair assignment to an offspring (Jones and Ardren 2003). 
Under the strict exclusion rule, a single mismatch is enough to exclude a candidate parent. However, many exclusion programs allow the user to specify the number of mismatches that are necessary to validate the exclusion, leading to a less stringent method that accommodates the difficulties imposed by mutations or scoring errors (Jones and Ardren 2003). For instance, the standard practice in human paternity testing requires the exclusion of a minimum of three loci for up to 12 SSR markers before a test can be declared negative (Junge et al. 2006). To our knowledge, no study has achieved parentage testing using the simple exclusion method, such as kinship testing, with the potato. This method appears to be promising for the development of a pedigree validation program and can easily be implemented despite its known limitations. Indeed, due to polysomic inheritance of the potato and following the possibilities given by the segregation of the alleles from a parent to the offspring (Luo et al. 2000; Hackett and Luo 2003), the application of such a method is theoretically possible. With regard to the phenomenon of double reduction, which is one of the most important features of auto-tetrasomic inheritance (Mather 1936), it was first assumed that it occurs sporadically in the cultivated tetraploid potato (Haynes and Douches 1993). However, recent studies have shown variable results for the evaluation of the proportion of double reduction, which may range up to 10% (Slater et al. 2014). Therefore, the phenomenon of double reduction in the cultivated tetraploid potato may have a potential impact on parentage analysis.

Another approach introduced by Meagher (1986), the maximum-likelihood method, sometimes called the most likely parent (or likelihood ratio) method, is preferably used when genotypes are available for a set of offspring and a set of potential parents within a defined population from which the parent is most likely to be identified. The maximum-likelihood method is an inferential method which is generally chosen after applying the simple exclusion method, specifically in instances where complete exclusion is not possible (Jones and Ardren 2003). In this parentage assessment, the ratio divides the likelihood of an individual being the parent of a given offspring by the likelihood of these individuals being unrelated (Meagher and Thompson 1986). To compare the likelihoods of different relationships, a log-likelihood score (LOD) is calculated for each possible parent as the logarithm to the base e of the likelihood ratio (Meagher and Thompson 1986), providing a null, positive, or negative likelihood parental score. After an exhaustive evaluation of all genetically compatible parents, the individual with the highest LOD is chosen as the true parent, while all other parents are discarded. Contrary to the simple exclusion method, the maximum-likelihood method allows paternity assignment for a higher number of offspring (Devlin et al. 1988; Smouse and Meagher 1994) and usually allows for some degree of transmission errors due to genotype misreading or mutation (Jones and Ardren 2003). However, some limitations are also found with the maximum-likelihood method. Firstly, the loci used must be unlinked and unambiguously inherited (Smouse and Meagher 1994; Buteler et al. 1997). Secondly, the method is only accurate when the entire set of potential parents has been sampled (Marshall et al. 1998).
Thirdly, parentage remains ambiguous when multiple parent–offspring relationships obtain the highest nonzero likelihood (Jones and Ardren 2003), typically leading to an arbitrary choice of the parent with the highest LOD score as the true parent, with little consideration to the relative likelihoods of other possible parents. Nevertheless, it is now possible to mitigate this effect by setting a threshold which allows the rejection of compatible genotypes, by either using random genotypes taken from the population sharing the same allele frequency (Meagher 1986) or by using simulation procedures (Marshall et al. 1998; Gerber et al. 2000). Other studies have emphasized that a statistical bias in favor of homozygotes exists, therefore favoring a homozygous putative parent that will always result in a higher LOD score for a given locus than would that of a heterozygous individual (Devlin et al. 1988; Smouse and Meagher 1994; Buteler et al. 1997). Finally, the model provides no explanation for when LOD scores result in a tie (Devlin et al. 1988; Schnabel 1998). However, most of these limitations can be overcome when the number of potential parental candidates is low or the number of genetic markers is increased (Buteler et al. 1997; Schnabel 1998). With regard to the potato, this inferential method seems to be an interesting approach to investigate parental relationships, particularly when parental pedigree information is unavailable (e.g., orphans and old varieties) or uncertain. As with the simple exclusion method, the possibilities given by the segregation of the alleles from a parent to the offspring in autotetraploid potatoes should not disrupt the inferential methodology.
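The scoring and ranking logic described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the per-locus likelihood ratios are hypothetical positive numbers supplied by the caller (how they are derived is not shown here), loci are treated as unlinked so that the ratios multiply, and the function names `lod_score`, `rank_candidates`, and `has_tie_for_first` are invented for this sketch.

```python
import math


def lod_score(likelihood_ratios):
    """Natural log of the product of per-locus likelihood ratios.

    Assumes unlinked loci, so the multilocus ratio is the product of the
    per-locus ratios and its logarithm is the sum of their logarithms.
    Every ratio must be strictly positive.
    """
    return sum(math.log(ratio) for ratio in likelihood_ratios)


def rank_candidates(ratios_by_candidate):
    """Return (candidate, LOD) pairs sorted by decreasing LOD."""
    scores = {name: lod_score(r) for name, r in ratios_by_candidate.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


def has_tie_for_first(ranking, tol=1e-9):
    """Flag the ambiguous case where the two best candidates share a LOD."""
    return len(ranking) > 1 and abs(ranking[0][1] - ranking[1][1]) <= tol


# Hypothetical per-locus likelihood ratios for three candidate parents.
example = {
    "cand_A": [2.0, 1.5, 3.0],   # more likely than a random individual
    "cand_B": [1.0, 1.0, 1.0],   # exactly as likely as a random individual
    "cand_C": [0.5, 0.8, 1.2],   # less likely than a random individual
}
ranking = rank_candidates(example)
for name, lod in ranking:
    print(f"{name}: LOD = {lod:.3f}")
```

A positive, null, or negative LOD corresponds to a candidate that is more likely, equally likely, or less likely to be the parent than an unrelated individual; the tie check makes the ambiguity discussed above explicit instead of silently picking one candidate.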

This study investigated these two approaches with the aim of identifying the need for the development of reliable programs. A pedigree validation program could be constructed by implementing kinship testing based on the simple exclusion method, whereas for varieties whose pedigree information is either unknown or uncertain, a potential parent assignment program could be applied through the use of inferential analysis such as the LOD method. The reliability of these selected methods is examined in this paper using pedigree information provided by either the literature or the breeders. The efficiency of the LOD method was evaluated by comparing it to another inferential method, the genetic distance ranking (GDR) method, which uses Jaccard’s coefficient (Jaccard 1908) as an arithmetic measure of genetic distance between two individuals to generate a ranking of the most related varieties.

Material and Methods

For a detailed view of the experimental procedure used in this study, see the schematic view in Fig. 1.

Fig. 1 Schematic view of experimental procedure

Plant Materials and Genomic DNA Isolation

Analyses were carried out on 577 European potato cultivars, including triploid or tetraploid varieties that were provided by several breeders and growers. Plant material and genotype data were collected with the best intentions by donors and the authors, respectively. The final results are published as such. The authors are not responsible for any potential inaccuracy except for those arising from genotyping analyses. In particular, legal issues or disputes raised by owners of germplasm cannot be based on data or conclusions disclosed in this publication.

Genomic DNA was isolated from germ, tuber, or leaf material using the QuickPick Plant DNA kit and Pickpen 1-M from Bio-Nobile (Turku, Finland). The concentration of the DNA solutions was quantified using a ND-3300 NanoDrop spectrofluorimeter (Thermo Scientific; Waltham, MA, USA). A subsample of each concentrated DNA solution was diluted to 2 ng/μl, which was found to be the suitable concentration for optimizing the multiplexed-PCRs.

Microsatellite Markers

The 17 SSR primers used in the present study (Table 1) were taken from the literature (Veilleux et al. 1995; Provan et al. 1996; Milbourne et al. 1998; Ghislain et al. 2004; Feingold et al. 2005; Ghislain et al. 2009). They were chosen based on the polymorphic information content (PIC) or diversity index (DI), the length of the obtained products and also on the similarity of their optimal annealing temperatures. Twenty-six primer pairs were a priori preferentially selected based on annealing temperatures ranging from 53 to 57 °C. This range included the highest number of primers available. However, only 17 primer pairs were kept after an intensive preliminary testing phase (see “Supplementary material” for more details), which resulted in the combination of these 17 primer pairs in three sets of multiplexed-PCR.

Table 1 Multiplexed SSR sets and data analysis. Locus, core repeat, primer sequences (5′-3') with fluorochrome label, source of primer, concentration of markers for each primer mixture (conc. in mix), number of distinct peaks (NP), number of allelic phenotypes (NAP), frequency of most common allelic phenotypes (FMCAP), allele size range, and DI were estimated from 548 European potato cultivars for each multiplexed SSR set

Multiplexed-PCR Amplification

Amplification of the 17 loci was performed in two sets of six loci and one set of five loci, using a Kapa2G HotStart Multilocus Amplification Kit (Kapa Biosystems, Boston, MA, USA). The 5′ end of the forward primer of each pair was labeled with a fluorescent dye (Fam, Hex or Ned dyes; Table 1). The PCR amplification mixture, in a total reaction volume of 25 μl, contained 1.5× Kapa2G HotStart PCR Buffer, 0.2 mM of each dNTP, 0.1–0.2 μM of each primer (Applied Biosystems), 1 unit of KAPA2G Fast Hotstart DNA Polymerase and 20 ng of template DNA. All amplifications were carried out under the same conditions (annealing temperature of 53 °C) on a 9800 Fast Thermal Cycler (Applied Biosystems). The PCR profile was as follows: after an initial denaturing step of 2 min at 95 °C, the reaction consisted of 29 cycles of 15 s at 95 °C, 30 s at 53 °C (Ta), 12 s at 72 °C, and one cycle of 2 min at 72 °C for final extension. The completed reaction products were held at 4 °C until the electrophoresis analysis.

Analysis of Amplicons

The PCR products were diluted in water at a ratio of 1:15. A volume of 1 μl of the diluted solution was transferred into 14 μl of a denaturing mix (1% of the ROX-labeled molecular weight marker GS-400HD Rox and 99% of formamide), then denatured at 95 °C for 5 min. Each of the three sets of multiplexed-PCR was independently run on a 4-capillary ABI 3100-avant Genetic Analyzer (Applied Biosystems). The denatured samples (15 μl) were separated using a standard polyacrylamide gel (POP4; Applied Biosystems) and run following standard parameters (Injection_voltage: 5.0 kV; Injection_time: 8 s; Run_voltage: 15.0 kV; Run_time: 1250 s). Estimations of fragment sizes were enabled by including a number of tracks containing the 21 GS-400HD Rox-labeled DNA size-standard fragments (Life Technologies; Carlsbad, CA, USA), covering a size range of 50–400 bp. The lengths of the PCR products were determined using GeneMapper Software 4.0 (Applied Biosystems) and used to construct a multilocus allelic phenotype for each cultivar. A database was then compiled, grouping the fingerprints of the 577 varieties.

Estimates of Genetic Similarity by the UPGMA Clustering

Genetic similarity between all potato cultivars was estimated by the UPGMA (unweighted pair group method with arithmetic mean) clustering method using Jaccard’s coefficient (Jaccard 1908). Distinct peaks were scored as either 1 or 0, for presence or absence, for each cultivar accession and for all 17 loci. All varieties sharing identical genetic fingerprints (duplicates), as revealed by genetic similarity estimation, were removed beforehand from the analysis in order to obtain a reliable measure of marker informativeness and to perform the parentage analysis under optimal conditions. Out of 577 varieties, 29 duplicates were discarded from the database, and analyses were thus conducted on 548 varieties. Binary data were then exported to MATLAB V5 software (The Math Works Inc., Natick, MA, USA) as a matrix of 548 varieties × the total number of distinct alleles (a peak was considered as a distinct allele). A similarity matrix was computed by pair-wise comparisons according to the number of shared peaks, and a frequency distribution of pairwise Jaccard similarities for all 548 varieties was generated.
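As a minimal illustration of the pairwise comparison described above, the following Python sketch computes Jaccard's coefficient from presence/absence vectors. It is not the MATLAB code used in the study; the variety names and the six-peak profiles are hypothetical, and the function names are chosen for this example only.

```python
def jaccard(profile_a, profile_b):
    """Jaccard coefficient of two equal-length 0/1 peak vectors.

    shared peaks / peaks present in at least one of the two profiles.
    """
    shared = sum(1 for a, b in zip(profile_a, profile_b) if a == 1 and b == 1)
    union = sum(1 for a, b in zip(profile_a, profile_b) if a == 1 or b == 1)
    return shared / union if union else 0.0


def pairwise_similarities(profiles):
    """Return {(name_i, name_j): similarity} for every unordered pair."""
    names = list(profiles)
    return {
        (p, q): jaccard(profiles[p], profiles[q])
        for i, p in enumerate(names)
        for q in names[i + 1:]
    }


# Hypothetical binary fingerprints (1 = peak present, 0 = peak absent).
profiles = {
    "V1": [1, 1, 0, 1, 0, 0],
    "V2": [1, 0, 0, 1, 1, 0],
    "V3": [0, 0, 1, 0, 1, 1],
}
similarities = pairwise_similarities(profiles)
for pair, value in similarities.items():
    print(pair, round(value, 3))
```

With n varieties the dictionary holds n(n − 1)/2 entries, which is the set of values summarized in a frequency distribution of pairwise similarities.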

Measure of Markers’ Informativeness

To obtain a measure of the utility of the marker systems, a DI was calculated from fingerprint data for each marker according to Nei’s statistic (Nei 1973; Nei and Li 1979): $DI = 1 - \sum_g p_g^2$, where $p_g$ is the frequency of an individual allelic phenotype, assuming that each SSR reveals one locus (Provan et al. 1996; Feingold et al. 2005).
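The diversity index can be computed directly from the list of allelic phenotypes observed at one marker. The sketch below is illustrative only: the phenotype strings are hypothetical letter codes, and `diversity_index` is a name chosen for this example.

```python
from collections import Counter


def diversity_index(phenotypes):
    """DI = 1 - sum of squared allelic-phenotype frequencies at one marker."""
    total = len(phenotypes)
    counts = Counter(phenotypes)
    return 1.0 - sum((count / total) ** 2 for count in counts.values())


# Hypothetical allelic phenotypes of four cultivars at a single SSR marker.
observed = ["AB", "AB", "AC", "BCD"]
di = diversity_index(observed)
print(di)  # frequencies 0.5, 0.25, 0.25 give 1 - 0.375 = 0.625
```

A marker at which every cultivar shows the same phenotype gives DI = 0, and the value approaches 1 as phenotypes become more numerous and more evenly distributed.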

Parentage Analysis

  1. Kinship testing method

    The kinship testing method was developed based on the simple exclusion method. Using all the pedigree information disclosed either by the literature or by breeders, we performed kinship testing on 67 hybridized offspring for which the genetic information of both parents was available in our database.

    The simple exclusion method was based on a strict examination of allelic phenotypes between the tested parents and their offspring according to the Mendelian rules of inheritance. We first examined whether all alleles present in the offspring were also found in the alleged parents and then ensured that the distribution of inherited alleles was correctly balanced in the offspring for each given locus. During this step, if one of these two conditions was not satisfied, the test accumulated one segregation mismatch (SM) for each violation (see examples in Table 2). The SM is defined as a mismatch appearing between the allelic phenotype of tested parents and a hybrid, which is supposedly due to either allelic mutation, segregation distortion, null alleles, or genotyping errors. To minimize false exclusion effects in such a method, the test was adapted by adding a genetic distance parameter in the form of Jaccard’s coefficient as defined during the UPGMA process. This allowed us to formulate a logical test as follows (Table 2): if the genetic distance between the hybrid and each alleged parent (GD[HYB ‐ PAR]) is less than the average of the genetic distance between the hybrid (or the alleged parents) and the varietal genetic pool (GD[HYB ‐ POOL] or GD[PAR ‐ POOL]) (made up of 548 varieties), the test is considered negative (exclusion is declared). In addition, exclusion is also declared by examining the position value of both tested parents within the most likely parent ranking (RankPAR as performed during GDR assessment) for each kinship test, when the resulting position value is under a threshold of 0.647. The value of the exclusion threshold should be defined as the minimum probability of being a parent and is thus established based on the position of the lowest ranked alleged second parent among all 67 kinship tests that presented no SM (for this study: RankPAR: 0.647 = 193rd position of 546).
We assumed that both parents could be ranked in the first two positions (1st position out of 547 for the first parent, 1st position out of 546 for the second parent) meaning that the second parent may be ranked as the second most likely parent. The analysis also focused on the correspondence between the number of SM and the result (exclusion or not) of the adapted test for each assessment.
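The two per-locus checks can be sketched as follows. This is one possible reading, not the authors' implementation, and it rests on assumptions that go beyond the text: each parent is taken to be tetraploid and to transmit a gamete showing one or two distinct alleles (one distinct allele also covers double reduction or duplicated dosage), a locus contributes one SM if either check fails, and the position value is assumed to be 1 − rank/N, a formula chosen only because it reproduces the reported 0.647 for the 193rd position of 546. All function names are hypothetical.

```python
from itertools import combinations


def possible_gametes(parent, offspring, max_distinct=2):
    """Yield sets of 1..max_distinct parental alleles that the offspring shows."""
    shared = sorted(parent & offspring)
    for size in range(1, max_distinct + 1):
        for combo in combinations(shared, size):
            yield set(combo)


def locus_is_compatible(offspring, parent1, parent2):
    """Check both conditions at one locus (allelic phenotypes given as sets)."""
    # Condition 1: every offspring allele is found in at least one parent.
    if not offspring <= (parent1 | parent2):
        return False
    # Condition 2: one gamete per parent must jointly explain the offspring.
    return any(
        g1 | g2 == offspring
        for g1 in possible_gametes(parent1, offspring)
        for g2 in possible_gametes(parent2, offspring)
    )


def count_segregation_mismatches(offspring_loci, parent1_loci, parent2_loci):
    """Number of loci violating either condition (assumed SM counting rule)."""
    return sum(
        not locus_is_compatible(o, p1, p2)
        for o, p1, p2 in zip(offspring_loci, parent1_loci, parent2_loci)
    )


def position_value(rank, n_candidates):
    """Assumed conversion of a rank into a position value between 0 and 1."""
    return 1.0 - rank / n_candidates


# Hypothetical three-locus example with letter-coded peaks.
hybrid = [{"A", "B"}, {"A", "B", "C"}, {"A", "E"}]
parent1 = [{"A", "C"}, {"A", "B", "C", "D"}, {"A", "B"}]
parent2 = [{"B", "D"}, {"E"}, {"C", "D"}]
sm = count_segregation_mismatches(hybrid, parent1, parent2)
print(sm)
```

In this example the first locus is compatible, the second is unbalanced (all three offspring alleles would have to come from the first parent), and the third shows an allele absent from both parents.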

    Table 2 Examples of genetic profile examination and conditions of the kinship testing method for three different scenarios, where the hybrid (HYB); the parent (PAR); the allelic phenotype (peaks were assigned letter codes instead of actual size) for each individual SSR marker (M); the genetic distance based on Jaccard’s coefficient (GD); the genetic distance mean of the hybrid with all other individuals in the pool (GD[HYB-POOL]); the genetic distance between the hybrid and the alleged parent (GD[HYB-PAR]); the genetic distance mean of the alleged parent with all other individuals in the pool (GD[PAR-POOL]); the position value of the tested parent within the most likely parent ranking (RankPAR); and the number of segregation mismatches (SM) are given
  2. Inferential analysis

    Two methods were developed to establish a parentage analysis based on a ranking of the most likely parent: the maximum-likelihood (LOD) method and the genetic distance ranking (GDR) method. For both methods, inferential analysis was performed for all of the 548 distinct varieties present in our database, each being considered as a single entry. Thus, for both methods, 548 assessments in total were conducted. For each assessment, 547 entries were inferred successively as a potential parent for the analyzed variety and then ranked in descending order of the most likely parent criterion.

    The LOD method involved calculating a logarithm of the likelihood ratio (LOD) for each possible parent, which provided either a null, positive, or negative likelihood parental score (LOD score). Due to the polysomic inheritance of the potato and consequently to the constraints of the specific locus dosage during the scoring of polymorphic fragments, we used the likelihood ratio statistics originally developed to analyze amplified fragment length polymorphism (AFLP) markers (Gerber et al. 2000) to perform this inferential analysis. Because these LOD formulas are based on dominant markers, they could be applied directly to an allelic phenotype system. The formulas of the likelihood ratio were then written in MATLAB V5 software (The Math Works Inc., Natick, MA, USA) (see appendix for details) to automate the process for each parent/offspring combination ((548² − 548)/2 = 149878 combinations). For each of the 548 assessments, inference of parentage was performed on a ranking of the most probable varieties according to decreasing LOD scores.

    The GDR method used the Jaccard’s coefficient as an arithmetic measure of genetic distance between two individuals to generate a ranking of the most related varieties. The similarity matrix computed during the estimates of genetic similarity by the UPGMA clustering was therefore used to perform the GDR method and each of the 149878 combinations was also considered as a potential parent/offspring combination. Inference was thus performed on a ranking of the most related varieties based on a decrease in the Jaccard’s coefficient.
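A genetic distance ranking of this kind can be sketched as below. The fingerprints are hypothetical sets of peak labels, the function names are invented for this example, and ties are broken alphabetically, which is an arbitrary choice made here rather than something stated in the text.

```python
def jaccard_sets(peaks_a, peaks_b):
    """Jaccard coefficient of two sets of peak labels."""
    union = peaks_a | peaks_b
    return len(peaks_a & peaks_b) / len(union) if union else 0.0


def gdr_ranking(target, fingerprints):
    """Rank every other variety by decreasing similarity to the target."""
    scores = [
        (name, jaccard_sets(fingerprints[target], peaks))
        for name, peaks in fingerprints.items()
        if name != target
    ]
    # Highest coefficient first; names break ties (assumption of this sketch).
    return sorted(scores, key=lambda item: (-item[1], item[0]))


# Hypothetical fingerprints: H is the variety being assessed.
fingerprints = {
    "H": {"a", "b", "c", "d"},
    "P1": {"a", "b", "c", "e"},
    "P2": {"a", "d", "f"},
    "X": {"g", "h"},
}
gdr = gdr_ranking("H", fingerprints)
for name, score in gdr:
    print(name, round(score, 2))
```

Running one such ranking per variety gives one assessment per entry, each listing all remaining entries as candidate parents.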

    In order to gain a clear view of the reliability of both methods, we carried out a parental ranking comparison between both methods based on the number of varieties for which the pedigree information of one parent was known and the corresponding genetic data was available in our database. Of the 548 assessments, 208 met these two criteria. The comparison study therefore focused on 208 assessments. We then evaluated the reliability of inferring the “alleged parent” based on both the first ranked candidate and the top 30 of the parental ranking. Additional analyses were also conducted on the ranking distribution of the first position of the “alleged parent” to determine if both methods classed the “alleged parent” in the same way.
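The comparison criterion (how often the alleged parent appears first, or within the top k, of a ranking) can be expressed as a short function. The data below are hypothetical and far smaller than the 208 assessments of the study; `rank_of` and `top_k_rate` are names chosen for this sketch.

```python
def rank_of(alleged_parent, ranking):
    """1-based position of the alleged parent in an ordered candidate list."""
    return ranking.index(alleged_parent) + 1


def top_k_rate(assessments, k):
    """Fraction of assessments whose alleged parent is ranked within the top k."""
    hits = sum(1 for ranking, parent in assessments if parent in ranking[:k])
    return hits / len(assessments)


# Hypothetical assessments: (ordered candidate list, alleged parent).
assessments = [
    (["P1", "X", "Y"], "P1"),
    (["X", "P2", "Y"], "P2"),
    (["X", "Y", "P3"], "P3"),
    (["P4", "X", "Y"], "P4"),
]
print(top_k_rate(assessments, 1))
print(top_k_rate(assessments, 2))
print(top_k_rate(assessments, 3))
```

Applying the same function to the rankings produced by two methods, with the same list of alleged parents, gives directly comparable top-1 and top-k rates.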

Results

Estimates of Genetic Similarity by the UPGMA Clustering

A pairwise similarity matrix based on Jaccard’s coefficient was generated, which allowed the drawing of a frequency distribution of pairwise similarities for all 548 varieties (Fig. 2). The average Jaccard similarity between all 548 potato cultivars with these markers was 0.447. The 16 most genetically distant pairs shared a similarity of 0.15–0.20, whereas the 5 closest pairs shared a similarity of 0.80–0.85. As a whole, the pair values reflected good variability in the cultivar pool with a normal distribution (extreme cases, that is, varieties that were too close or too distant, were not abundant).

Fig. 2 Frequency distribution of pairwise comparisons for the 548 potato varieties of the database

Measure of Markers’ Informativeness

A total of 17 pairs of primers were used to amplify potato microsatellites ranging from 82 to 252 bp. The number of unique allelic phenotypes per marker across all 548 fingerprinted potatoes averaged 52.24, ranging from 14 to 108, with the highest number for STI0036 (Table 1). The diversity index (DI) varied in this study from 0.442 (STG0006) to 0.970 (STI0036) across all 548 fingerprinted genotypes. Mean DI for the 17 SSR markers was 0.851. The highest values were shown for STM1052 and STI0036. Nevertheless, these markers did not necessarily present the highest number of alleles (an allele corresponded to a distinct peak). Indeed, STWAX-2 showed the highest number of alleles (17 alleles) in our study. The number of alleles across all fingerprinted genotypes averaged 10.06, ranging from 7 to 17. In total, the 17 microsatellite loci produced 171 alleles. The use of these three multiplexed-PCR sets is thus an extremely powerful tool to differentiate cultivars, with the best DI for the first set. Using the calculation based on the frequency of the most common allelic phenotypes of each marker as an upper limit, the chance of two varieties yielding identical profiles based on these 17 markers gave an infinitesimally small value. The upper limit is based on the presented frequencies of most common allelic phenotypes (see Table 1): 0.308 × 0.086 × … × 0.204 = 6.9 × 10⁻¹².

Parentage Analysis

  1. Kinship testing method

    The kinship tests were conducted on 67 lines, which corresponded to the number of tests for which the genetic profile of the offspring and both of their parents (referring to the available pedigree information) were available in the genetic database. The results showed that 61 out of 67 kinship tests were “not excluded”. In all 6 excluded tests, only one of the parents failed to meet the validation criteria (see Table 3); no test excluded both parents. The number of segregation mismatches (SM) was counted for each test. Of the 61 “not excluded” tests, 42 tests did not show any SM while the remaining 19 presented between 1 and 5 SM. The 6 “excluded” tests reported between 3 and 8 SM. This examination thus revealed that the number of SM alone cannot be decisive for the outcome of the test, since tests with between 3 and 5 SM can be categorized as either “excluded” or “not excluded”. The reliability of the material source as the cause of the exclusion can be directly rejected if a variety is excluded in a particular test while this same variety is not excluded in another test. This is for example the case of test no. 47, which excluded the variety Charlotte as the parent, while test no. 29 confirmed that the same variety Charlotte is the presumed parent of the tested hybrid. Similarly, test no. 5 excluded the variety Premiere as the second parent while test no. 50 confirmed both parents of the hybrid Premiere.

    Table 3 Kinship testing data where the kinship test number (no. test); the hybrid name (HYB); the parents name (PAR); the genetic distance mean of hybrid with all other individuals in the pool (GD[HYB-POOL]); the genetic distance between hybrid and alleged parent (GD[HYB-PAR]); the genetic distance mean of alleged parent with all other individuals in the pool (GD[PAR-POOL]); the position value of tested parent within the most likely parent ranking (RankPAR); the number of segregation mismatches (SM no.); and the test result are given
  2. Inferential analysis

    a. The maximum-likelihood method

      The frequency distribution of all parent/offspring combinations followed a Gaussian curve where the mean estimated LOD score was close to 0. The LOD scores ranged from a minimum of −15 to a maximum of 30. A LOD score of zero implied that the alleged parent was as likely to be a parent of an offspring as a randomly selected individual would be. A positive LOD score implied that the alleged parent was more likely to be a parent of an offspring than a randomly selected individual would be. The maximum and minimum LOD scores were surprisingly high but had a very low frequency (0.77% for values ≤ −10 and ≥ +10) and were negligible in terms of occurrence. High LOD scores (≥ +10) were due to a combination of pairs of varieties having both a close genetic similarity and a vast genetic distance between said pairs and the other pairs in the varietal genetic pool. Regarding the ranking of the most likely parent, the 1st ranked entry across all 548 assessments averaged a LOD score of 13.99 while the last ranked entry averaged −12.87. For the 1st ranked entry, no LOD scores resulted in a tie.

    b. The genetic distance ranking method

      Since this inferential method used the Jaccard coefficient as the genetic distance parameter, the frequency distribution of all parent/offspring combinations was the same as that generated during the UPGMA process, taking into account that each pair is considered here as a parent/offspring combination. This frequency distribution also followed a Gaussian curve where the mean estimated similarity between all the potato cultivars was 0.447. The similarity values ranged from a minimum of 0.169 to a maximum of 0.818. With regard to the ranking of the most likely parent, the 1st ranked entry across all 548 assessments averaged a Jaccard coefficient of 0.685 while the last ranked entry averaged 0.260.

    c. Comparison in the ranking of the two inferential analysis methods used

      The comparison for the ranking of the most likely parent between both methods was conducted on 208 entries, which corresponded to the number of varieties where one parent was known and present in our database. The positional ranking of the alleged parent, whether in the first position or within the top 30, differed according to the method (Fig. 3). Indeed, the “alleged parent” had a 27.40% probability (57 assessments of 208) of being ranked first using the LOD method as compared to 24.52% (51 assessments of 208) using the GDR method. Moreover, the LOD method gave a better result if we consider that the “alleged parent” has a 54.81% chance of being in the top 5 of the most likely parent ranking, as compared to 52.40% for the GDR method (see limit on Fig. 3). However, after this top 5 “virtual limit”, the trend is clearly reversed in favor of the GDR method, which has the best probability of placing the “alleged parent” in the top 30 (LOD: 79.33%; GDR: 84.13%). The ranking distribution of the 1st position also showed that the two methods did not systematically class the “alleged parent” in the same way. Indeed, among the assessments that inferred the expected parent in the 1st position (57 for LOD and 51 for GDR), both methods arrived at the same result (same inferred parent in the 1st position) for only 39 assessments.
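The first-position figures quoted above can be re-derived from the reported counts. The short check below uses only numbers stated in the text (57, 51, 39, and 208); the split into "LOD only", "GDR only", and "either method" is simple set arithmetic added here for illustration, and the variable names are hypothetical.

```python
def percent(count, total):
    """Percentage rounded to two decimals."""
    return round(100 * count / total, 2)


TOTAL = 208          # assessments with one known parent in the database
LOD_FIRST = 57       # alleged parent ranked first by the LOD method
GDR_FIRST = 51       # alleged parent ranked first by the GDR method
BOTH_FIRST = 39      # alleged parent ranked first by both methods

lod_pct = percent(LOD_FIRST, TOTAL)
gdr_pct = percent(GDR_FIRST, TOTAL)

# Inclusion-exclusion on the two sets of first-position hits.
lod_only = LOD_FIRST - BOTH_FIRST
gdr_only = GDR_FIRST - BOTH_FIRST
either = LOD_FIRST + GDR_FIRST - BOTH_FIRST

print(lod_pct, gdr_pct)
print(lod_only, gdr_only, either, percent(either, TOTAL))
```

Read this way, the two methods are partly complementary at the first position: each one places the alleged parent first in some assessments where the other does not.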

    Fig. 3 Top 30 frequency ranking for GDR and LOD methods

Discussion

Since the early 1990s, the use of genetic markers, in particular SSR, has been well documented for the potato, and these markers constitute useful tools for cultivar identification (Kawchuk et al. 1996; Ghislain et al. 2004; Moisan-Thiery et al. 2005; Ghislain et al. 2009; Reid et al. 2009; Reid et al. 2011; Côté et al. 2013; Karaagac et al. 2014) as well as for linkage analysis (Bradshaw et al. 1998; Milbourne et al. 1998; Song et al. 2005) and genetic relationship studies (Provan et al. 1996; Schneider and Douches 1997; Ashkenazi et al. 2001; Braun and Wenzel 2004). Our study extends the scope of polymorphic SSR markers in the potato, as the parentage analysis methods developed in this study have shown that SSR markers are also promising in supporting pedigree information. Pedigree confirmation finds applications in several areas. For instance, this tool can be used as an objective criterion of variety validity, given that genetic heritage is an indisputable proof of variety authenticity. This may also be helpful in plant breeding, particularly in selecting parental lines. Indeed, parental choice based on a certified pedigree is essential for precise estimation of breeding values such as the maximization of the heterosis effect and the reduction in the levels of inbreeding (Tarn et al. 1992). Many studies also emphasized the importance of pedigree information in linkage analysis (Luo et al. 2000), such as quantitative trait loci (QTL) (Bradshaw et al. 1998; Bink et al. 2002), association mapping (Simko et al. 2004; Malosetti et al. 2007; D’hoop et al. 2010; Baldwin et al. 2011), and extreme resistance inheritance (Song et al. 2005). Finally, this tool may also be highly relevant for studies on genetic relationship estimations (Hosaka et al. 1994; Demeke et al. 1996; Sun et al. 2003), which have often faced problems of unreliable pedigree information (Braun and Wenzel 2004).

In our study, a large panel of microsatellite markers was tested, from which 17 loci were selected on the basis of their diversity index (DI) or polymorphic information content (PIC) and their suitability for use in a multiplexed PCR. After an intensive preliminary testing phase, we set up three multiplexed SSR sets theoretically capable of discriminating all potato cultivars. However, several varieties shared the same fingerprint, and we considered two possible explanations for this finding. Firstly, some varieties were called by different names yet had the same allelic phenotype and could therefore be considered either synonyms (e.g., the pairs Rosa/Platte de Florenville; Duke of York/Eersteling; Cleopatra/Margarita; Ratte/Nagleiner Kipfler; Bernadette/Dali; Elfe/Marabel; Jose/Wilja) or somaclonal mutants (e.g., Diamant is a mutant of Cardinal; Famosa is a mutant of Estima; Margaritta is a mutant of Picasso). Secondly, a human error such as a sampling or handling error could also explain such a finding. This analysis allowed us to support the information provided by the potato pedigree database (Berloo et al. 2007) and by the breeders in relation to certain similar varieties (completely identical according to our 17 SSR markers) bearing different names. With regard to the informativeness of the selected markers, the high DI values revealed a large variability in allelic phenotypes. In general, the mean of these values was higher than the means observed in similar studies using the same markers (Provan et al. 1996; Milbourne et al. 1997; Ghislain et al. 2004; Rios et al. 2007). The high values in our study could be explained by either a larger number of fingerprinted cultivars or a wide genetic variability within the sampled pool. A similar study based on the construction of a microsatellite database of potato varieties (Reid et al. 2011) showed that the markers with the highest DI values systematically presented the largest number of alleles. In our study, this was not the case: the marker STI0036 had the highest mean DI value but presented only 13 alleles (the maximum number of alleles, 17, was found for the marker STWAX-2). As a final point, both the compiled database, which grouped a large number of varieties, and the high number of informative SSR markers used in this study appeared, in our opinion, to be pivotal in performing a reliable parentage analysis.
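The observation that the marker with the highest DI need not carry the most alleles can be illustrated with a minimal sketch. It assumes the commonly used definition DI = 1 − Σ p_i², where p_i is the frequency of the i-th distinct allelic phenotype at a locus; this definition, the function name `diversity_index`, and the fragment sizes below are illustrative assumptions, not values taken from this study.

```python
from collections import Counter

def diversity_index(phenotypes):
    """Return 1 - sum(p_i^2), with p_i the frequency of the i-th distinct
    allelic phenotype at one marker (assumed definition of DI)."""
    counts = Counter(phenotypes)
    total = sum(counts.values())
    return 1.0 - sum((n / total) ** 2 for n in counts.values())

# Hypothetical scores at one SSR locus: each cultivar is recorded as the set
# of fragment sizes (bp) that are present, without allele dosage.
locus_scores = [
    frozenset({180, 186}),
    frozenset({180, 186}),
    frozenset({180, 192, 198}),
    frozenset({186}),
]

di = diversity_index(locus_scores)
# Number of distinct alleles (fragments) seen at the locus.
n_alleles = len(set().union(*locus_scores))
print(round(di, 3), n_alleles)
```

Under this definition, DI depends on how evenly the cultivars are spread over the observed phenotypes rather than on the allele count alone, which is consistent with a 13-allele marker exceeding a 17-allele marker.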

To our knowledge, no study has attempted parentage analysis using such molecular tools in the potato. As a first step in this direction, our initial strategy was to develop a kinship testing method, based on the simple exclusion method (Ellstrand 1984), in order to highlight errors and abnormalities present in the available pedigree information. The application of such a method was theoretically possible given the polysomic inheritance of the potato and the possible segregations of alleles from a parent to the offspring (Luo et al. 2000; Hackett and Luo 2003). To overcome the constraint of false exclusions at given loci, current practice requires more than three exclusion loci before a test is reported as negative (Grattapaglia et al. 2004; Junge et al. 2006). From our point of view, this approach is too restrictive because it is based on an empirical value, which takes into account neither the exact number of loci used in the study nor the ploidy level. Indeed, since the majority of tested potatoes are tetraploid and our DNA fingerprinting methodology was based on a large number of markers, it was assumed that the probability of finding allelic mutations or segregation distortion between the parent and offspring at a given locus increases. The adaptation of the simple exclusion method, with the integration of a Jaccard similarity coefficient, allowed us to circumvent the problem of false exclusion in a polyploid parentage study. The new criteria were based on the assumption that the genetic similarity between the hybrid and the tested parent(s) was most likely higher than the average genetic similarity between each of them (parents and hybrid) and the tested varietal pool. In addition, the inclusion of a ranking position parameter in the logical test strengthened its result, since it appeared that relatives were closely ranked. 
Taking these criteria into account, the kinship testing method confirmed 61 of the 67 tests based on data collected in the literature. Although these validation tests confirmed the majority of the disclosed pedigree information, the exclusion cases found in our study must be considered with caution, because the results came from a method still under development and the number of exclusions depended strictly on the availability of the three related varieties in the database (genetic data from the offspring and both parents were all required in order to perform the test). In any case, we proposed three categories to explain the exclusion cases. The first category, “wrong pedigree information”, covers tests where the analyzed variety is clearly not the progeny of the presumed parents (or vice versa). The second category, “human error”, is a situation encountered when the tested individual (parent or offspring) is not the expected one because of genotyping errors, involuntary errors in pollination, seed harvest, or labeling, or accidental mixing in the breeder’s collection. The third category concerns problems of “name redundancy”, which arise from varieties originating from different locations yet bearing the same name (i.e., the tested individual had the same name as an unrelated variety and was thus attributed to another parental couple). This is, for instance, the case for Laura, a variety from Austria, whose pedigree is different from that of the Laura variety from Germany. In other words, an exclusion occurred when the genetic fingerprint of one or more tested individuals differed from the expected one. This could arise from a mistake in the publication of the new variety’s pedigree, an uncertain variety origin, or a human error leading to erroneous genetic data.
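The logic of the adapted kinship test described above can be sketched as follows. This is a minimal illustration, not the implementation used in the study: it assumes fingerprints are stored as sets of scored fragments pooled over all loci, and the rank cutoff `max_rank`, the function names, and the toy fingerprints are hypothetical.

```python
def jaccard(a, b):
    """Jaccard similarity between two sets of scored fragments."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def mean_similarity(name, pool):
    """Average similarity between one variety and every other variety in the pool."""
    others = [frags for other, frags in pool.items() if other != name]
    return sum(jaccard(pool[name], frags) for frags in others) / len(others)

def rank_of(candidate, target, pool):
    """1-based rank of `candidate` when all other varieties are sorted by
    decreasing similarity to `target`."""
    scores = {n: jaccard(pool[target], f) for n, f in pool.items() if n != target}
    ordered = sorted(scores, key=scores.get, reverse=True)
    return ordered.index(candidate) + 1

def kinship_supported(offspring, parent, pool, max_rank=3):
    """Assumed criteria: the offspring-parent similarity exceeds the average
    similarity of each of them to the pool, and the parent ranks near the top.
    `max_rank` is a hypothetical cutoff, not a value from the study."""
    sim = jaccard(pool[offspring], pool[parent])
    return (sim > mean_similarity(offspring, pool)
            and sim > mean_similarity(parent, pool)
            and rank_of(parent, offspring, pool) <= max_rank)

# Toy pool: integers stand for scored SSR fragments (hypothetical data).
pool = {
    "child": {1, 2, 3, 4, 5},
    "mother": {1, 2, 3, 6},
    "father": {3, 4, 5, 7},
    "unrelated1": {1, 8, 9, 10},
    "unrelated2": {8, 11, 12, 13},
}

print(kinship_supported("child", "mother", pool))
print(kinship_supported("child", "unrelated1", pool))
```

Unlike a fixed count of exclusion loci, a criterion of this kind tolerates a few mismatching fragments between parent and offspring, because it compares the pair against the background similarity of the varietal pool.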

The second strategy undertaken in this study was to implement an inferential parentage analysis using a large pool of fingerprinted varieties. Two different methods were applied: one based on a common genetic distance (GDR method) and the other on probabilistic tools (LOD method). For the LOD method, we used the formulas based on dominant loci (Gerber et al. 2000) in order to overcome the constraint of determining locus-specific dosage when scoring polymorphic fragments (Provan et al. 1996). In our case, this approach seemed easier to implement than, for example, the one suggested by Esselink et al. (2004), since we no longer had to define any extra haplotype or extra information in the genotypic model when unambiguous evidence of null alleles was available. The formulas used for dominant markers have been successfully applied to a polyploid model (Rodzen et al. 2004) and showed suitable performance here in parentage inference for the potato. For each of the two methods, inferential analysis was conducted on all of the 548 varieties present in our database. For each variety, each method produced a ranking of the 547 candidate parents in decreasing order of the generated values (i.e., Jaccard’s coefficient for the GDR method or LOD score for the LOD method). The reliability of both inferential methods was assessed by comparing the rank assigned to a likely parent with the available pedigree information. Of the 548 varieties, the constraints of information availability allowed for only 208 evaluations. Our results revealed that, while the two methods did not consistently rank the “alleged parent” in the same way, the LOD method was more likely than the GDR method to place a true parent in the first position among the many possible candidates. 
In fact, this method seemed to be more appropriate since the likelihood is calculated from segregation probabilities (Meagher and Thompson 1986), while the GDR method is based on an arithmetic calculation (Jaccard 1908). Although the LOD method gave better results for the ranking criterion, the chance of finding the alleged parent in the first position remained relatively low (27.40%); interestingly, the probability of retrieving this alleged parent within the top five ranks was twice as high (54.81%). This first investigation of inferential parentage analysis in the potato revealed that the varieties with the highest scores were not necessarily the true parents; consequently, none of the other 340 inferred most likely candidate parents, which were not supported by prior pedigree information, could be assigned as a true parent with certainty. Direct examination of the ranking results for both methods indicated a notable presence of siblings in the ranking, which may have been genetically close either to the analyzed variety or to the inferred parent. They were thus considered as interfering candidates. Indeed, the presence of other family members in the pool of candidate parents poses a serious challenge to parentage assessment since it may disrupt the parental ranking (Jones and Ardren 2003). Although strategies exist that account for some of the effects of family structure on the assignment of kinship, for instance by including a confidence threshold (Marshall et al. 1998; Gerber et al. 2000), additional work is needed before it can be determined with certainty which techniques are most sensitive in studies involving relatives, especially for crop plants. In our potato parentage study, the tested varieties could not be regarded as a sample from a natural population because of the bias caused by varietal selection. The number of family members other than parents, such as full or half siblings, present in the sampled pool was therefore substantial. 
Examination of the ranking also revealed a less efficient ranking for many modern varieties that share a parent widely used in breeding, such as Agria. As a consequence, these varieties tended to cluster together, which also disrupted the ranking. For these reasons, the use of inferential analysis alone to assign a true parent should be treated with caution in such a study. The inferential method nevertheless remains a promising tool for assigning parents, especially if a genotypic examination (or known sibling examination) of the best-ranked parental candidates is performed as a complement. This complementary step could increase the probability of ranking the true parent in the first position by rejecting potential interfering candidates.
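The evaluation of an inferential ranking against known pedigrees can be sketched as below for the distance-based (GDR) case. This is a hedged illustration only: it assumes candidates are ranked by decreasing Jaccard coefficient over sets of scored fragments, and the variety labels, fingerprints, and function names are hypothetical. The LOD formulas of Gerber et al. (2000) are not reproduced here.

```python
def jaccard(a, b):
    """Jaccard similarity between two sets of scored fragments."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def rank_candidates(target, pool):
    """All other varieties, ordered by decreasing similarity to `target`."""
    scores = {n: jaccard(pool[target], f) for n, f in pool.items() if n != target}
    return sorted(scores, key=scores.get, reverse=True)

def top_k_rate(evaluations, pool, k):
    """Share of (offspring, alleged parent) pairs in which the alleged parent
    appears among the k best-ranked candidates."""
    hits = sum(1 for offspring, parent in evaluations
               if parent in rank_candidates(offspring, pool)[:k])
    return hits / len(evaluations)

# Toy fingerprints (hypothetical data): integers stand for scored fragments.
pool = {
    "A": {1, 2, 3, 4},
    "B": {1, 2, 3, 5},
    "C": {1, 2, 6, 7},
    "D": {3, 4, 8},
    "E": {10, 11, 12, 13},
}

# Hypothetical (offspring, alleged parent) pairs taken from a pedigree record.
evaluations = [("A", "B"), ("A", "D"), ("D", "C")]

print(rank_candidates("A", pool))
print(top_k_rate(evaluations, pool, 1), top_k_rate(evaluations, pool, 2))
```

Reporting the hit rate for several values of k, as done here for the first and first two positions, mirrors the comparison between first-position and top-five retrieval; a close relative of the offspring with a higher score would push the alleged parent down such a list.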

In conclusion, the parentage analysis investigated in this study, based on kinship testing and inferential methods, proved particularly promising for the development of pedigree validation or parental assignment programs. Although the inferential methods and the exclusion method are based on different approaches, both appear promising for supporting existing potato pedigree data, whether in breeders’ catalogs or in online potato pedigree databases, for instance by detecting abnormalities (e.g., typing errors, synonymous names). Moreover, the kinship testing method may be used as a complementary tool for assigning parents with inferential methods, especially when the presence of interfering candidates disrupts the parental ranking. These two approaches may also be helpful for studies of genetic diversity between individuals or varieties using genetic fingerprinting data, as mentioned in specific reviews (Devlin et al. 1988; Mohammadi and Prasanna 2003; Ge et al. 2011). Indeed, our results showed that siblings were closely clustered, confirming that the close relationships between cultivars estimated by genetic fingerprinting were consistent with their pedigrees. Such results therefore support the hypothesis that reliable pedigree information can guide otherwise uncertain interpretations of both clustering and estimates of genetic similarity in the potato, as pointed out in different studies (Hosaka et al. 1994; Braun and Wenzel 2004). Finally, supporting pedigree information can also be a helpful tool in plant breeding, as it allows breeders to make informed decisions regarding the selection of germplasm for crossing schemes, in particular for the creation of new varieties.