Introduction

Molecular markers have proved to be valuable tools for characterizing and evaluating genetic diversity within and between species and populations. Marker systems differ in their information content, which depends on polymorphism. The concept of polymorphism is used to define genetic variation in a population, and it has been studied extensively in recent years in several established scientific disciplines, for example, genetics, ecology, zoology, and microbiology (Mukherjee et al. 2010; Muneer et al. 2011; Rajkumar et al. 2011). For the practical design of molecular genetic studies, a few questions must be considered. How difficult will it be to find usefully polymorphic loci? How many markers are needed? How polymorphic must each marker be? These questions can be answered by measuring the information content of the markers. There are two measures of the quality or informativeness of a polymorphism as a genetic marker: heterozygosity (H) and polymorphic information content (PIC). Since its first application by Botstein et al. (1980), PIC has become the most widely applied formula in genetic studies for measuring the information content of molecular markers. To illustrate its wide application, we surveyed DNA fingerprinting publications of the last 20 years. This search revealed that more than one thousand published papers utilized the PIC formula.

Materials and Methods

The heterozygosity of a locus is defined as the probability that an individual is heterozygous for the locus in the population (Liu 1998) and can be calculated as:

$$ H = 1 - \sum\limits_{i = 1}^{l} {P_{i}^{2} } $$

where $P_{i}$ is the frequency of the ith allele among a total of l alleles. PIC refers to the value of a marker for detecting polymorphism within a population, depending on the number of detectable alleles and the distribution of their frequencies; thus, it provides an estimate of the discriminating power of the marker. The PIC value of an l-allele locus can be calculated as

$$ \mathrm{PIC} = 1 - \sum\limits_{i = 1}^{l} {P_{i}^{2} } - \sum\limits_{i = 1}^{l - 1} {\sum\limits_{j = i + 1}^{l} {2P_{i}^{2} P_{j}^{2} } } $$

where $P_{i}$ and $P_{j}$ are the population frequencies of the ith and jth alleles. According to Guo and Elston (1999), PIC is defined as the probability that the marker genotype of a given offspring will allow deduction, in the absence of crossing over, of which of the two marker alleles of the affected parent it received. In other words, PIC is a modification of the heterozygosity measure that subtracts from the H value an additional probability that an individual in a linkage analysis does not contribute information to the study (Speer 1999).
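The two formulas above can be sketched in a few lines of Python. This is only an illustrative sketch of the equations as written, not the code of the online tool; the function names `heterozygosity` and `pic` are hypothetical, and the allele frequencies in the usage lines are made-up values chosen to sum to 1.

```python
from typing import Sequence


def heterozygosity(freqs: Sequence[float]) -> float:
    """H = 1 - sum of squared allele frequencies."""
    return 1.0 - sum(p * p for p in freqs)


def pic(freqs: Sequence[float]) -> float:
    """PIC = H - sum over allele pairs i < j of 2 * P_i^2 * P_j^2."""
    squares = [p * p for p in freqs]
    cross = 0.0
    for i in range(len(squares) - 1):
        for j in range(i + 1, len(squares)):
            cross += 2.0 * squares[i] * squares[j]
    return heterozygosity(freqs) - cross


# Hypothetical three-allele locus (illustrative frequencies only).
example = [0.5, 0.3, 0.2]
print(round(heterozygosity(example), 4))
print(round(pic(example), 4))
```

For these illustrative frequencies, H is 0.62 and PIC is about 0.548, which shows that PIC is always at most H because the subtracted pairwise term is non-negative.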

Results and Discussion

For the accurate design of genetic studies, such estimates must be calculated to describe the informativeness of the markers, but presently there are no easily accessible calculators for that purpose. To simplify the work of molecular studies, we have developed an online tool (http://w3.georgikon.hu/pic/english/default.aspx) to facilitate the calculation of H and PIC values. This program, PICcalc, can calculate these values from manually entered allelic frequencies or from an uploaded file containing binary data. The latter option allows the user to calculate the values for a given number of loci from a simply prepared text file, so that PIC and H can be estimated for a primer or a set of primers in any genetic marker system that yields binary data.
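As a rough illustration of the binary-data route, the following Python sketch computes per-locus values from a presence/absence matrix. Everything about the input layout is an assumption made for this example (rows are individuals, columns are loci, values are whitespace-separated 0/1); the file format actually accepted by PICcalc is not described here. The sketch also assumes that the band frequency f and its complement 1 - f are used directly as the two allele frequencies, and all function names are hypothetical.

```python
from typing import List, Tuple


def parse_binary_matrix(text: str) -> List[List[int]]:
    """Parse whitespace-separated 0/1 rows (assumed layout: one individual per row)."""
    rows = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        row = [int(token) for token in line.split()]
        if any(value not in (0, 1) for value in row):
            raise ValueError("only 0 and 1 are allowed in binary data")
        rows.append(row)
    if not rows:
        raise ValueError("no data rows found")
    if len({len(row) for row in rows}) != 1:
        raise ValueError("all rows must have the same number of loci")
    return rows


def band_frequencies(rows: List[List[int]]) -> List[float]:
    """Frequency of band presence (1) for each locus, i.e. each column."""
    n = len(rows)
    return [sum(column) / n for column in zip(*rows)]


def h_and_pic(f: float) -> Tuple[float, float]:
    """H and PIC for a two-allele locus with assumed frequencies f and 1 - f."""
    p, q = f, 1.0 - f
    h = 1.0 - p * p - q * q
    return h, h - 2.0 * p * p * q * q


# Made-up data: four individuals scored at three loci.
sample = """\
1 0 1
1 1 0
0 1 1
1 0 1
"""

for locus, f in enumerate(band_frequencies(parse_binary_matrix(sample)), start=1):
    h, p = h_and_pic(f)
    print(f"locus {locus}: f={f:.2f} H={h:.4f} PIC={p:.4f}")
```

Averaging the per-locus values over all bands produced by one primer would give a single figure per primer; whether the tool aggregates in this way is likewise an assumption.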

Dominant and codominant markers are routinely used in molecular genetic studies. For multilocus methods (e.g., AFLP, ISSR, RAPD), it is presumed in theory that fragments of equal length amplify from corresponding loci and that each fragment represents a single, dominant locus with two possible alleles (presence/absence). The maximum value of PIC and H for dominant markers is 0.5, since two alleles per locus are assumed and both measures depend on the number and frequency of the alleles (Henry 1997; De Riek et al. 2001; Bolaric et al. 2005). To account for this feature of dominant markers, the program provides a dedicated link for calculations with these kinds of markers.
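A short worked check of the two-allele case may make the bound concrete. Writing f for the frequency of band presence and 1 - f for absence (notation introduced here for illustration only), the heterozygosity formula gives

$$ H = 1 - f^{2} - (1 - f)^{2} = 2f(1 - f) \le \frac{1}{2} $$

with equality at f = 0.5. Under the same assumption, the general PIC formula given in Materials and Methods reduces to

$$ \mathrm{PIC} = 2f(1 - f) - 2f^{2} (1 - f)^{2} $$

which equals 0.375 at f = 0.5. The bound of 0.5 for PIC therefore appears to correspond to the simplified dominant-marker form $\mathrm{PIC} = 1 - [f^{2} + (1 - f)^{2}]$, which coincides with H; that this is the form applied by the dedicated link is an assumption here, not something stated in the text.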

The additional utilities made available on the Web site are free to use and make data evaluation faster and easier. This simple online tool provides an easy way to compute PIC and H from binary data or from allelic frequencies.