Introduction

Despite the fact that genomics is still a young discipline of knowledge, its achievements are not to be underestimated, especially in the field of human genetics and recognition of the basics of genetic diseases, diagnostics of genetic defects (Ahn et al. 2013) or identification of changes arising in the genetic material of tissues subjected to malignant transformation (Chung et al. 2006). The development of analytical approaches progressing along with the development of technology and increasing knowledge about sequences of genomes allowed extending the achievements of genomics beyond the framework of experimental applications. Currently, genomics is being heavily engaged in the explanation of complex mechanisms and relationships occurring within a genome in both physiological and pathological conditions and searching for answers that may explain still unknown aspects of its functioning. Following the reduction in costs of genome-wide genetic analysis, genomics has also entered the area of animal genetics, especially in the aspects of biodiversity (Twito et al. 2007; Kijas et al. 2009), animal production (Hayes et al. 2009a; Zhang et al. 2013), susceptibility to diseases (Zhang et al. 2012; Kizilkaya et al. 2013) or identification of genetic factors underlying observed phenotypical traits (Cargill et al. 2008; Ren et al. 2011).

Much has already been said about the use of methods of genomics for assessing the livestock breeding value or identification of genomic regions associated with different production traits (Hayes et al. 2009a; Saatchi et al. 2011; Weber et al. 2012). However, the application of genomics in terms of pure biology, aiming at the identification of the basics of functioning, variability and structure of the genome of these animals, is less popular and often underestimated. Since animal production cannot take place without a series of purely biological processes occurring in cells or tissues, basic research in this area should be within the scope of quantitative geneticists. Moreover, the correct estimation of the breeding value of animals based on molecular markers cannot be performed without extensive knowledge about complexity of the genome.

So far, the most widely used tools in studies on livestock genomes have been genotyping microarrays. They allow a relatively quick, reliable and inexpensive determination of genotypes at a large number of single nucleotide polymorphisms (SNPs), which are a primary source of genetic variation. The microarrays are designed to describe the genetic variation within a genome of interest in the best possible way by exploiting linkage disequilibrium (Matukumalli et al. 2009; Kranis et al. 2013). Currently, the most advanced genotyping tools in animal genomics are becoming available for cattle and allow the analysis of about 800,000 SNPs in parallel, which means that, within each Mb of genomic sequence, the genetic variation is described by about 300 markers (Rincon et al. 2011). This gives a detailed insight into the genome of the species and allows for a detailed analysis of its variability, rearrangements and structure.

In view of the growing number of studies on livestock genomics, in this work, we undertook a review of various applications of genomics in terms of population genetics, biodiversity, the structure of the genome and phenomena occurring therein. In this paper, we focused on applications deviating from the prevailing trend of genomic selection, describing the basic research on the complexity of the genome of farm animals.

Studies on linkage disequilibrium (LD)

In the current era of genome-wide association studies, the knowledge of linkage disequilibrium (LD) between markers is important in order to establish the number of markers necessary for genomic selection, efficient association studies and fine mapping of genetic diseases (Pritchard and Przeworski 2001; Espigolan et al. 2013). LD is defined as the non-random association of alleles at two or more loci and is influenced by, inter alia, population history and its evolution (Ardlie et al. 2002; Khatkar et al. 2008).

Studies on LD throughout a genome can be used to reflect population history, breeding systems and patterns of geographic subdivision, while LD in specific genomic regions gives an opportunity to learn more about the history of natural selection, gene conversion, mutations and other factors that cause gene-frequency evolution (Slatkin 2008). In animal populations, these allelic associations are also extremely valuable in localising genes affecting quantitative traits (quantitative trait loci, QTL) and are necessary to detect associations between a QTL and a marker (Pritchard and Przeworski 2001; Du et al. 2007). A local recombination rate is one of the main factors influencing LD. Regions with a low recombination rate, like the Y chromosome, parts of the X chromosome and regions near the centromere in autosomes, are characterised by high LD extent. On the other hand, small LD extent between two loci is typical for regions with a high recombination rate, such as euchromatin and small regions known as recombination hotspots (Jeffreys et al. 2001).

A wide variety of statistics have been proposed to measure LD. D′ and r², each with different statistical properties, are the two measures most commonly used to evaluate LD between biallelic markers (Hill and Robertson 1968; Hill 1981; Valdar et al. 2006; Bohmanova et al. 2010). Both parameters range from 0 (no disequilibrium) to 1 (complete disequilibrium), but their interpretation is slightly different. For biallelic markers, D′ takes the value of 1 if at least one allele at each locus is completely associated with an allele at the other locus, in other words, if one or more of the four possible haplotypes are absent. D′ values are less than 1 if all four possible haplotypes are present. The extent of LD based on D′ is the most useful for representing historical recombination patterns and is very helpful in understanding long-range LD. One disadvantage of this measure is that it tends to be inflated in small samples and in the presence of rare or low-frequency alleles. The other LD measure, r², represents the squared correlation between alleles at two loci and is more useful for predicting the power of association mapping. For a pair of biallelic loci, r² is equal to 1 (known as perfect LD) if only two haplotypes are present within a population. r² is less susceptible to allele frequency fluctuations than D′, but it is not completely independent of them. r² appears to be elevated when the average minor allele frequency (MAF) is either too low or too high (Du et al. 2007; Khatkar et al. 2008).
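The two measures can be illustrated with a minimal sketch that computes D, D′ and r² from haplotype and allele frequencies using the standard textbook definitions. The function name and the example frequencies are hypothetical and are not taken from any of the cited studies.

```python
def ld_measures(p_ab, p_a, p_b):
    """Return (D, D_prime, r2) for two biallelic loci.

    p_ab : frequency of the haplotype carrying allele A (locus 1) and allele B (locus 2)
    p_a  : frequency of allele A at locus 1
    p_b  : frequency of allele B at locus 2
    """
    d = p_ab - p_a * p_b  # raw disequilibrium coefficient
    # D' rescales |D| by the largest value it could take given the allele frequencies
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = abs(d) / d_max if d_max > 0 else 0.0
    # r2 is the squared correlation between alleles at the two loci
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    r2 = d * d / denom if denom > 0 else 0.0
    return d, d_prime, r2


# Hypothetical example: haplotypes AB = 0.5, Ab = 0.0, aB = 0.3, ab = 0.2.
# One haplotype is absent, so D' is 1 while r2 stays well below 1.
three_haplotypes = ld_measures(p_ab=0.5, p_a=0.5, p_b=0.8)

# Hypothetical example: only AB = 0.6 and ab = 0.4 are present, so r2 is also 1.
two_haplotypes = ld_measures(p_ab=0.6, p_a=0.6, p_b=0.6)
```

With three haplotypes present, the sketch returns D′ of 1 and r² of 0.25, which mirrors the difference in interpretation between the two measures.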

LD studies have shown that LD in livestock populations is much more extensive than in humans, which can be explained by the small effective population size and stronger selection typical for livestock populations (McRae et al. 2002). A validation study by Khatkar et al. (2008) on Australian Holstein–Friesian cattle suggests that, for the accurate estimation of D′ or for any analysis based on the D′ matrix (like the construction of LD maps), a sample of 400 or more individuals is required. In contrast, r² can be accurately estimated with a smaller sample of 75 individuals. They also reported that LD spans over 40 kb when measured as r² and over 8.2 Mb when measured as D′. The mean LD among syntenic SNPs measured by r² and D′ amounted to 0.024 and 0.189, respectively, in the studied Holstein cattle population. Espigolan et al. (2013) investigated LD using 446,986 markers in Nellore cattle, and reported that the average r² and D′ across the genome were equal to 0.17 and 0.52, respectively. In the study by Bohmanova et al. (2010), D′ = 0.72 and r² = 0.20 were observed in North American Holstein cattle between markers 40–60 kb apart. Qanbari et al. (2010a) obtained similar results for 810 German Holstein–Friesian cattle genotyped by the Illumina Bovine SNP50K BeadChip. Using a panel of 40,854 SNPs, the authors created a second-generation LD map in this population and presented a mean value of r² = 0.30 ± 0.32 in pairwise distances of <25 kb, which dropped to 0.20 ± 0.24 at 50–75 kb. Marques et al. (2008), who analysed 505 SNPs on chromosome 14, estimated LD (r² = 0.2) in Holstein cattle using markers separated by less than 100 kb. Similar results were presented by McKay et al. (2007) on the basis of 2,670 SNPs. Using a panel of 54,000 SNPs, Silva et al. (2010) genotyped 25 Gyr bulls and obtained a mean LD equal to 0.21 (r²) between adjacent markers.
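Several of the figures above are means of r² within classes of inter-marker distance. A minimal sketch of such a summary is given below; the function name, the bin edges and the toy values are hypothetical and do not reproduce any of the cited data sets.

```python
from collections import defaultdict


def mean_r2_by_distance(pairs, bin_edges_kb):
    """Average r2 within distance classes.

    pairs        : iterable of (distance_kb, r2) for SNP pairs on the same chromosome
    bin_edges_kb : ascending upper edges of the distance classes, in kb
    Returns {upper_edge: mean r2} for the classes that contain at least one pair.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for distance, r2 in pairs:
        for edge in bin_edges_kb:
            if distance < edge:  # the first class whose upper edge exceeds the distance
                sums[edge] += r2
                counts[edge] += 1
                break
    return {edge: sums[edge] / counts[edge] for edge in bin_edges_kb if counts[edge]}


# Hypothetical toy data: (distance in kb, r2)
toy_pairs = [(10, 0.4), (20, 0.2), (60, 0.25), (70, 0.15), (300, 0.05)]
decay = mean_r2_by_distance(toy_pairs, bin_edges_kb=[25, 50, 75, 500])
```

Pairs lying beyond the last edge are ignored in this sketch, so the choice of edges determines how far the decay curve is followed.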

In the domestic horse, McCue et al. (2012) estimated genome-wide LD within and across different breeds. The authors reported that LD was higher within a breed than across breeds. They also observed that LD declined more rapidly in the Quarter and Mongolian horse than in other studied breeds, with r² values dropping below 0.2 within the first 50–100 kb. On the other hand, LD was clearly the highest in the Thoroughbred, where the r² value did not drop below 0.2 until 400 kb, and remained higher than in other breeds until approximately 1,200 kb. Similar results were reported by Corbin et al. (2010), who evaluated the extent and distribution of LD in a sample of 817 Thoroughbreds. Using 34,848 autosomal SNP markers, the authors found that the LD was relatively high between closely positioned markers (>0.6 at 5 kb) and extended over long distances, with the average r² value maintained above non-syntenic levels for SNPs up to 20 Mb apart.

LD levels between markers have also been studied in the genomes of pig breeds. Du et al. (2007) used 4,500 markers to estimate r² in six commercial lines of pigs and observed that, for all pairs of SNPs that are approximately 3 cM apart, the average r² was equal to 0.1. Ai et al. (2013) reported that the LD extent across populations is much shorter in Chinese pig breeds than in western pigs. With the threshold of r² = 0.3, LD extends to 10.5 kb among Chinese pigs and to 125 kb among western breeds. These findings are comparable to a report of Amaral et al. (2008) that was based on the data of 371 SNPs. The authors established that LD extended up to 2 cM in European breeds and up to 0.05 cM in Chinese pigs. Using an SNP panel, Badke et al. (2012) identified the average r² between adjacent SNPs across all chromosomes for Landrace (r² = 0.36), Yorkshire (r² = 0.39), Hampshire (r² = 0.44) and Duroc (r² = 0.46) pigs. For Landrace and Yorkshire, the presented values were lower than those reported by Uimari and Tapio (2011), who used the same genotyping platform and obtained average r² values of 0.43 and 0.46 for adjacent markers in the Finnish Landrace and Yorkshire populations, respectively.

García-Gámez et al. (2012) presented an analysis of the extent of LD in Spanish Churra sheep using 43,784 SNPs distributed across the autosomal genome. The authors reported that, for SNPs distanced up to 10 kb, the average r² was equal to 0.329 and for markers separated by 200–500 kb, the average r² was reduced to 0.061. Using the Illumina Ovine SNP50 BeadChip, Miller et al. (2011) examined the extent of genome-wide LD within a population of bighorn sheep (Ovis canadensis) and found that high levels of LD persist over 4 Mb. Similar studies were conducted by Usai et al. (2010), who analysed 51,446 SNPs in Sarda rams and showed an average r² value of 0.072 for SNPs separated by at least 1,000 kb. These studies showed a substantially lower LD in the sheep when compared with a wide range of cattle breeds, including dairy and beef cattle (Villa-Angulo et al. 2009).

The differences in the published extent of LD occur because the estimate of LD depends on various factors. Such factors include the history and structure of the analysed population, sample size, marker type (microsatellites or SNPs), density and distribution of markers, the method used for haplotype reconstruction and the strictness of SNP filtering (thresholds for MAF and Hardy–Weinberg equilibrium). It is important to note that a population characterised by short-range LD will require a higher marker density than a population with extensive LD, where fewer markers will be required to obtain the same power to detect association (Meadows et al. 2008).

In summary, LD is an important tool which provides valuable information for selecting SNPs for association and genome selection studies and helps to unravel the recombination history of a population.

Runs of homozygosity

Thanks to the availability of high-density SNP arrays, it is also possible to examine the genome of an animal to identify runs of homozygosity (ROHs). ROHs are contiguous homozygous regions of a DNA sequence where the two haplotypes inherited from the parents are identical. ROHs differ in length: longer segments reflect inbreeding to a recent ancestor, while shorter ones are associated with inbreeding from more distant generations. Consequently, the length and frequency of ROHs may give information regarding an animal’s ancestry and the history of its population (Purfield et al. 2012).

The criteria for ROH identification are, however, still not precisely defined. Authors use different thresholds for the minimum number of SNPs in an ROH and for its minimum length, and some researchers allow a small proportion of heterozygous genotypes within ROHs, which may arise as a result of genotyping errors (Ku et al. 2011). Long ROHs make it possible to identify consanguinity: the longer the ROH segments present in a genome, the higher the chance that recent inbreeding occurred within the pedigree (Kirin et al. 2010). On the other hand, remarkably long ROHs are sometimes present in outbred populations (Gibson et al. 2006). Over successive generations, repeated meioses break up chromosomal segments, which causes long ROHs to decay into short ones. Due to the limitations of the pedigree recording process, these short ROHs may not be reflected by the pedigree of an animal (McQuillan et al. 2008).
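The three criteria mentioned above (minimum number of SNPs, minimum length and the number of tolerated heterozygous genotypes) can be made concrete with a minimal single-chromosome scan. This is only an illustrative sketch: the function name, the 0/1/2 genotype coding, the default behaviour and the toy data are assumptions, missing genotypes are not handled, and published tools use more elaborate sliding-window rules.

```python
def find_roh(positions, genotypes, min_snps, min_length_bp, max_het=0):
    """Scan one chromosome for runs of homozygosity.

    positions : SNP positions in bp, sorted in ascending order
    genotypes : one code per SNP, 0 or 2 = homozygous, 1 = heterozygous (assumed coding)
    max_het   : heterozygous calls tolerated inside a run (possible genotyping errors)
    Returns a list of (start_bp, end_bp, n_snps).
    """
    runs = []

    def emit(first, last):
        # Drop tolerated heterozygotes from the end of the run before reporting it
        while last > first and genotypes[last] == 1:
            last -= 1
        n_snps = last - first + 1
        length = positions[last] - positions[first]
        if n_snps >= min_snps and length >= min_length_bp:
            runs.append((positions[first], positions[last], n_snps))

    start = None  # index of the first SNP of the current run
    hets = 0      # heterozygotes tolerated so far in the current run
    for i, g in enumerate(genotypes):
        if g == 1:
            if start is not None and hets < max_het:
                hets += 1  # tolerate this call and keep extending the run
                continue
            if start is not None:
                emit(start, i - 1)  # the run is broken by this heterozygote
            start, hets = None, 0
        elif start is None:
            start, hets = i, 0  # a homozygous SNP opens a new run
    if start is not None:
        emit(start, len(genotypes) - 1)
    return runs


# Hypothetical toy chromosome with eight SNPs
toy_positions = [100, 200, 300, 400, 500, 600, 700, 800]
toy_genotypes = [0, 0, 2, 1, 2, 2, 1, 0]
strict = find_roh(toy_positions, toy_genotypes, min_snps=2, min_length_bp=100)
tolerant = find_roh(toy_positions, toy_genotypes, min_snps=2, min_length_bp=100, max_het=1)
```

On the toy data, the strict setting reports two short runs, whereas tolerating a single heterozygote joins them into one longer run, which shows how strongly the reported ROHs depend on the chosen criteria.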

In human populations, the analysis of ROHs is presented as a tested and valid method of identifying kinship, and may inform about the susceptibility of an individual to recessive diseases (Gibson et al. 2006; McQuillan et al. 2008; Hildebrandt et al. 2009; Kirin et al. 2010).

ROHs may also be utilised in animal genetics as an estimator of inbreeding levels, which can be used for the assessment of inbreeding depression. In addition, inbreeding estimates obtained conventionally from pedigree data, according to many authors (Ron et al. 1996; Carothers et al. 2006), can be incorrect due to errors and insufficient pedigree depth. These pedigree errors are generated mainly because of an improper recording procedure, mismothering and misidentification of animals. What is more, the results of inbreeding coefficients calculated from pedigree may not reflect the true levels of inbreeding, so the presented approach of ROH utilisation may seem appropriate.

Many authors have described high correlations between FROH (calculated by dividing the total length of an individual’s ROHs by the length of the autosomal genome covered by SNPs, excluding centromeres) and pedigree-based inbreeding coefficients. Hamzić (2011) noted that the strongest correlations of FROH with pedigree inbreeding coefficients were obtained for ROH cut-off lengths of 4 Mb, with a correlation ranging from 0.619 for Norwegian Red up to 0.705 for Tyrol Grey. Purfield et al. (2012) obtained similar results in their study on various cattle populations and reported a strong correlation of 0.75. The results of Ferencakovic et al. (2011) agreed with those of other authors and showed that Austrian Fleckvieh cattle were characterised by a high correlation (0.68) between an inbreeding coefficient calculated from ROHs longer than 4 Mb and pedigree-based estimates. These results are consistent with the studies conducted on humans.
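The definition of FROH given above translates directly into a short calculation. The sketch below is illustrative only: the function name, the segment coordinates and the genome length are hypothetical, and the length cut-off is exposed as a parameter because the cited studies used different values.

```python
def f_roh(roh_segments, covered_genome_bp, min_length_bp=0):
    """Genomic inbreeding coefficient based on runs of homozygosity.

    roh_segments      : list of (start_bp, end_bp) for one individual
    covered_genome_bp : length of the autosomal genome covered by SNPs
    min_length_bp     : only ROHs at least this long are counted
    """
    total = sum(end - start
                for start, end in roh_segments
                if end - start >= min_length_bp)
    return total / covered_genome_bp


# Hypothetical individual with a 5 Mb and a 2 Mb ROH on a 100 Mb genome
toy_segments = [(0, 5_000_000), (10_000_000, 12_000_000)]
f_all = f_roh(toy_segments, covered_genome_bp=100_000_000)
f_long = f_roh(toy_segments, covered_genome_bp=100_000_000, min_length_bp=4_000_000)
```

Raising the cut-off restricts the estimate to long ROHs, which, as discussed earlier in this section, reflect more recent inbreeding.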

Various breeds of cattle show different average ROH lengths in their genome. Purfield et al. (2012) showed that the largest mean portion of the genome classified as ROH was identified for Angus and Hereford breeds (198.6 and 198.7 Mb, respectively; approximately 8 % of their genome) and for other breeds, such as Holstein, Holstein–Friesian, Friesian, Limousin and Simmental, it ranged from 80.58 to 93.48 Mb (almost 3.2–3.7 % of their genome). Moreover, the three most homozygous animals had approximately 700 Mb covered by ROHs, which represented nearly a quarter of their genome.

To conclude, the proportion of the genome covered in long ROHs provides a good indication of the inbreeding levels of an animal and may be utilised as a new tool to determine autozygosity that was derived from recent or distant ancestors.

Selection signatures

Animal domestication and modern animal breeding are closely associated with strong artificial selection, which leads to the genetic improvement of animal production traits and to the fixation in the population of favourable traits associated with different aspects of animal production (e.g. behaviour, longevity or resistance to disease). Any type of selection (natural or artificial) leads to changes in the frequency of genetic variants associated with a trait under selection. Thanks to the LD across a genome, regions under selection can be detected by the analysis of allele frequency spectra of genome-wide SNPs, which reflect the frequency of a selected variant through physical linkage. The most common approach in the identification of selection signatures is the analysis of differences in allele or haplotype frequencies between populations with different levels of selected traits. In general, most of the computational methods used for the identification of selection signatures are based on comparison of the distribution of allelic frequencies by calculating population genetic statistics that are a function of allelic or genotypic frequencies. For example, FST (Weir et al. 2005; Wilkinson et al. 2013) and LD (Przeworski 2002; Kim and Nielsen 2004; Ennis 2007) measures have been used. Additionally, specific significance tests for detecting selection signatures have been proposed (Fay and Wu 2000; Kim and Stephan 2002; Voight et al. 2006; Stella et al. 2010), and some of them allow the study of selection signatures in single populations (Stella et al. 2010). Another method, the extended haplotype homozygosity (EHH) test proposed by Sabeti et al. (2002) and modified by Qanbari et al. (2010b), identifies loci under selection by estimating the age of core haplotypes. The age is established by assessing the decay of the association between core haplotypes and alleles at various distances from the locus. The method identifies regions with an unusually long haplotype range and a high frequency in a population (Qanbari et al. 2011).
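To illustrate how a statistic based on allelic frequencies can flag differentiated loci, the following minimal sketch computes FST for a single biallelic SNP in two populations from expected heterozygosities, FST = (HT − HS) / HT. This simple, equally weighted form is an assumption made for illustration; the cited studies may have used other estimators with sample-size corrections. The function name and the frequencies are hypothetical.

```python
def fst_two_populations(p1, p2):
    """FST for one biallelic SNP from the allele frequencies in two populations.

    Uses FST = (HT - HS) / HT with equal population weights, where HT is the
    expected heterozygosity of the pooled population and HS is the mean
    expected heterozygosity within populations.
    """
    p_bar = (p1 + p2) / 2
    h_t = 2 * p_bar * (1 - p_bar)
    if h_t == 0:
        return 0.0  # the SNP is monomorphic in both populations
    h_s = (2 * p1 * (1 - p1) + 2 * p2 * (1 - p2)) / 2
    return (h_t - h_s) / h_t


# Hypothetical allele frequencies in two breeds for three SNPs
toy_snps = {"snp_1": (0.9, 0.1), "snp_2": (0.5, 0.45), "snp_3": (1.0, 0.0)}
toy_fst = {name: fst_two_populations(p1, p2) for name, (p1, p2) in toy_snps.items()}
```

SNPs with strongly diverged frequencies, such as the first and third in the toy data, obtain high values, and clusters of such SNPs are what a genome scan for selection signatures looks for.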

By using different computational approaches, several studies aiming at the identification of genomic regions under selection in different populations have been performed. Most of them were concerned with cattle as a species most widely subjected to genomic selection, which generates a large amount of data for population genetics. By the analysis of the allele frequency distribution between dairy and beef cattle breeds in Japan, Hosokawa et al. (2012) identified 11 candidate regions associated with different types of production distributed on eight different autosomes. The regions extended over several hundred kb, ranging from 314 kb on BTA13 to 1.8 Mb on BTA26. Within the regions, the authors identified candidate genes, including those previously associated with meat quality and milk yield traits, like IGF1 or STAT1. By using a similar approach, but employing a simulation for significance testing, Hayes et al. (2009b) identified 15 regions of the genome differentially selected in dairy and beef cattle breeds. Most of these regions were located on BTA20 near the locus of GHR (growth hormone receptor), a gene with large effects on protein content in milk from dairy cattle (Blott et al. 2003) and on BTA6, in the proximity of the ABCG2 gene, which harbours a polymorphism affecting milk protein content (Cohen-Zinder et al. 2005). The analysis of FST-based genetic diversity in Australian cattle breeds revealed 129 SNPs that have highly divergent FST values between the studied breeds and Bovine HapMap data (Barendse et al. 2009). The authors identified 12 genomic regions that had additive effects on traits like: residual feed intake, beef yield or intramuscular fatness measured in Australian cattle. The FST estimate was also used to detect signatures of diversifying selection in 13 porcine breeds. The signatures were found in regions associated with traits related to breed standard criteria, such as coat colour and ear morphology (Wilkinson et al. 2013). 
By using the parametric composite log likelihood (CLL) of the differences in allelic frequencies between five different cattle breeds selected for milk production, Stella et al. (2010) detected 699 putative selection signatures. The largest CLL was observed on BTA6 and corresponded to the KIT gene, which is responsible for the piebald phenotype present in four of the five breeds studied. Moreover, large CLLs were present at the site of the potassium channel-related genes on BTA14, -16 and -25, as well as within integrins (BTA18 and 19) and serine-/arginine-rich splicing factors (BTA20 and 23). By using the EHH, which detects selection by measuring the characteristics of haplotypes within a single population, in Holstein cattle, Qanbari et al. (2010b) identified 12 core haplotypes expected to be under strong positive selection. The haplotypes were associated with a panel of genes, including FABP3, CLPN3, SPERT, HTR2A5, ABCE1, BMP4 and PTGER2. This panel comprises some interesting candidate genes and QTL, representing a broad range of economically important traits, such as milk yield and composition, as well as reproductive and behavioural traits.

Regions of the genome that were subjected to selection during a breed’s history can also be detected by the identification of so-called ‘selective sweeps’. These are regions of a genome showing a reduction or even elimination of nucleotide variation, which arises during the fixation of alleles under strong positive selection. By analysing the minor allele frequency of SNPs included in the Bovine SNP50 assay (Illumina) in 14 diverse cattle breeds, Ramey et al. (2013) found 28 genomic regions on 15 different chromosomes, of which 23 were breed-specific and five were shared among two to seven breeds. The regions encompassed several genes which could not be connected with the enrichment of any specific metabolic pathway. Employing a hidden Markov model-based test, which detects selection within a single population by studying local variations in the allele frequency spectrum along a genome, Boitard and Rocha (2013) revealed three candidate regions under selection on BTA2, -7 and -11 in the Blonde d’Aquitaine breed. The region on chromosome 2 encompassed the GDF8 gene (myostatin, MSTN), a known inhibitor of muscle growth.

The studies on selection signatures can be an important step in the recognition of biological factors affecting physiology and production in farm animals. The selected regions may contain or harbour the functional elements responsible for the development of desired traits and, thus, may help to identify the metabolic processes behind selected traits.

Copy number variation

In recent years, much research has been focused on copy number variants (CNVs), which are a type of structural variation of a genome and are considered to be an important source of genetic diversity, constituting approximately 10 % of the human genome (Orozco et al. 2009). They occur when deletions, duplications or insertions of DNA fragments from 1 kbp to 1 Mbp take place (Feuk et al. 2006; Redon et al. 2006). Regions of CNVs may encompass active genes or groups of genes, as well as promoters, enhancers or other functionally important sequences (Henrichsen et al. 2009; Schrider and Hahn 2010). Moreover, CNVs can arise owing to different molecular mechanisms, such as non-allelic homologous recombination (NAHR), non-homologous end joining (NHEJ), replication slippage and retrotransposition. The most common mechanism in humans is NAHR and the least common is retrotransposition (Kidd et al. 2008; Conrad et al. 2010).

When it comes to their presence in a genome, these variations are common in a range of organisms, not only in humans (Sebat et al. 2004; Conrad et al. 2006; McCarroll et al. 2006; Redon et al. 2006) but also in animals, including mice (Graubert et al. 2007; She et al. 2008), chimpanzees (Perry et al. 2006, 2008), rhesus macaques (Lee et al. 2008), cows (Liu et al. 2010), dogs (Chen et al. 2009; Nicholas et al. 2009), chickens (Griffin et al. 2008), fruit flies (Dopman and Hartl 2007; Emerson et al. 2008), Caenorhabditis elegans (Maydan et al. 2010), as well as in plants, such as maize (Springer et al. 2009), Arabidopsis thaliana (Ossowski et al. 2008) and even fungi, such as Saccharomyces cerevisiae (Carreto et al. 2008). They can also vary between individuals within a species (Schrider and Hahn 2010).

As natural diversity, they can arise de novo in an organism (somatic CNV) or be a result of disruptions in the recombination process in germ cells, which makes them heritable. However, the presence of CNVs in a genome may not be neutral for an organism. Numerous research projects have shown that these variations influence phenotypic features, complex bases of behaviour, susceptibility/resistance to diseases (e.g. autism, autoimmune diseases), as well as the occurrence of genetic disorders in humans (Buckland 2003; Gonzalez et al. 2005; Aitman et al. 2006; Autism Genome Project Consortium 2007; Fanciulli et al. 2007; Yang et al. 2007; Schaschl et al. 2009). There are a couple of mechanisms through which CNVs affect genes and their expression patterns. It can be simply via dosage effect, which may concern a single gene, a set of adjacent genes (e.g. DiGeorge syndrome, Potocki–Lupski syndrome), as well as allele combinations in the case of complex diseases, particularly those of the central nervous system (Henrichsen et al. 2009). Moreover, CNVs can alter sequences regulating gene expression, like enhancers (McCarroll et al. 2006; Nguyen et al. 2006) or promoters. Such extensive genome rearrangements may lead to the exposure of recessive alleles (when a deletion of the dominant gene takes place) or even to the inactivation of some genes (when a deletion within a gene takes place). Therefore, some diseases may result not from changes of copy numbers of a given CNV, but from a structural alteration in a fragment of a genome, causing a disruption of a metabolic pathway, regardless of the gene dosage (Henrichsen et al. 2009).

Copy number variations can be identified with the use of a wide range of techniques, such as FISH (fluorescent in situ hybridisation), CGH (comparative genomic hybridisation), aCGH (array comparative genomic hybridisation), Southern blotting, PFGE (pulsed-field gel electrophoresis), MAPH (multiplex amplifiable probe hybridisation), MLPA (multiplex ligation-dependent probe amplification), PRT (paralogue ratio test) and qPCR (quantitative polymerase chain reaction). What is more, the cutting-edge methods of analysis as well as advanced computational techniques enable CNV identification at a genome-wide scale using high-throughput genome scan technologies like NGS (next-generation sequencing) or genotyping microarrays (SNP microarrays). To infer copy number changes from a microarray analysis, the combination of the two measures of signal intensities may be used: LRR (log R ratio) and BAF (B allele frequency). A significant deviation from the expected distribution of these parameters implies an incorrect number of copies of a given allele (Wang et al. 2007). When it comes to livestock species, some significant advances have also been made lately. First of all, the construction of low CNV resolution maps for cattle, horse, goat, sheep, pig, dog, chicken, duck and turkey gave us an insight into their genomes and showed that these variations are widespread in these species. Moreover, like in humans, CNVs have been associated with different phenotypes and susceptibility to diseases, as well as developmental disorders, e.g. several pigmentation (white coat in horse, pig and sheep) and morphological (late feathering and pea comb in chicken) traits, osteopetrosis, anhidrotic ectodermal dysplasia, copper toxicosis, intersexuality and cone degeneration (reviewed by Clop et al. 2012).
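The use of the LRR and BAF signals described above can be sketched as follows. LRR is computed as the base-2 logarithm of the observed to the expected total signal intensity, and a segment is then labelled from its mean LRR and from the presence of BAF values near 0.5, which indicate heterozygous two-copy genotypes. The cut-offs, the width of the heterozygous band and the function names are hypothetical choices made for illustration; they are not taken from Wang et al. (2007), and published callers rely on more elaborate statistical models.

```python
import math


def log_r_ratio(r_observed, r_expected):
    """LRR: log2 of the observed over the expected total signal intensity."""
    return math.log2(r_observed / r_expected)


def classify_segment(lrr_values, baf_values, loss_cutoff=-0.3, gain_cutoff=0.2):
    """Label a segment of consecutive SNPs as 'loss', 'gain' or 'normal'.

    The cut-offs and the 0.4-0.6 heterozygous band are illustrative assumptions.
    """
    mean_lrr = sum(lrr_values) / len(lrr_values)
    # BAF near 0.5 is expected for heterozygous SNPs with two copies
    has_het_band = any(0.4 <= baf <= 0.6 for baf in baf_values)
    if mean_lrr <= loss_cutoff and not has_het_band:
        return "loss"
    if mean_lrr >= gain_cutoff and not has_het_band:
        return "gain"
    return "normal"


# Hypothetical segments
deleted = classify_segment([-0.6, -0.5, -0.7], [0.0, 1.0, 0.02])
duplicated = classify_segment([0.3, 0.35, 0.25], [0.33, 0.67, 0.0])
diploid = classify_segment([0.0, 0.05, -0.02], [0.5, 0.0, 1.0])
```

Combining the two signals is what makes the call more robust: a drop in LRR alone could be noise, whereas a drop accompanied by the disappearance of the heterozygous BAF band is consistent with a single-copy state.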

The first small-scale analysis in cattle was carried out on two Hereford and three Holstein individuals by Liu et al. (2008). It identified 25 CNVs on 16 autosomes, with a size ranging from 28.7 to 396.8 kb and an average size of around 127.8 kb. The next step came with the appearance of the Bovine SNP50 BeadChip, which allowed for the detection of bovine CNVs by high-throughput genotyping of different breeds. The analysis showed that there were differences in the frequency of CNVs between breeds (African, composite and Bos indicus breeds had a higher frequency than Bos taurus breeds) (Matukumalli et al. 2009). The next studies on bovine CNVs were carried out in parallel by Bae et al. (2010) and Fadista et al. (2010). With the use of the Bovine SNP50 BeadChip and custom aCGH, respectively, they constructed two comprehensive CNV maps. However, the size ranges obtained for CNV regions (CNVRs) differed: 50–200 kb (Bae et al. 2010) and 1.7 kb–2 Mb (Fadista et al. 2010). Nonetheless, in both studies, losses were approximately two to three times more frequent than gains. In 2011, Hou et al. performed research on 539 cows belonging to 21 modern breeds, which enabled them to identify 682 candidate CNVRs that covered 139.9 Mb (i.e. nearly 4.60 % of the bovine genome). Among these 682 CNVRs, there were 370 losses, 216 gains and 96 with both (loss and gain in the same region). CNVs were most abundant on chromosomes 1 and 6, as well as in pericentromeric and subtelomeric regions of chromosomes. In summary, these results show that around 50 % of bovine CNVRs may be common to different breeds as well as individuals; however, when CNVR frequencies are taken into account, the existing differences are significant, implying that these structural variations could have participated in the process of breed differentiation (Matukumalli et al. 2009; Liu et al. 2010; Seroussi et al. 2010; Hou et al. 2011). Furthermore, bovine CNVRs may encompass between about 300 and 500 genes (Bae et al. 2010; Fadista et al. 2010; Liu et al. 2010), of which at least 19 are engaged in human diseases. Moreover, CNV regions contain about 110 QTL (Fadista et al. 2010). Overall, these results suggest that copy number variations may have an impact on traits of economic interest.
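The CNVR counts quoted above, including regions showing both loss and gain, can be illustrated with a minimal sketch that merges overlapping CNV calls from several animals into regions. It assumes that a CNVR is the union of overlapping calls on the same chromosome, which is a common convention but is not spelled out in the text; the function name and the coordinates are hypothetical.

```python
def merge_cnvs(cnvs):
    """Merge overlapping CNV calls into CNV regions (CNVRs).

    cnvs : list of (chromosome, start_bp, end_bp, kind), kind being 'gain' or 'loss'
    Returns a list of (chromosome, start_bp, end_bp, kind), where kind is
    'gain', 'loss' or 'both' when the merged calls disagree.
    """
    regions = []
    for chrom, start, end, kind in sorted(cnvs):
        last = regions[-1] if regions else None
        if last and last[0] == chrom and start <= last[2]:
            last[2] = max(last[2], end)  # extend the current region
            if last[3] != kind:
                last[3] = "both"         # loss and gain in the same region
        else:
            regions.append([chrom, start, end, kind])
    return [tuple(region) for region in regions]


# Hypothetical calls pooled from several animals
toy_calls = [
    ("1", 100, 200, "loss"),
    ("1", 150, 300, "gain"),
    ("1", 400, 500, "loss"),
    ("2", 100, 200, "gain"),
    ("1", 450, 480, "loss"),
]
toy_regions = merge_cnvs(toy_calls)
```

Because merging depends on how calls overlap, differences in platform resolution alone can change the number and size of the reported CNVRs.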

The first analyses of CNVs on a genome scale in the horse were reported by two teams: Doan et al. (2012), with the use of a custom-designed whole-exome tiling array, and Dupuis et al. (2013), with the use of the Illumina Equine SNP50 BeadChip. The study by Doan et al. was carried out on 16 horses of different breeds (e.g. Andalusian, Vanner, Miniature, Quarter Horse, Shire) and a grey donkey (Equus asinus). The number of detected CNVs was 2,368, with a size range of 197 bp–3.5 Mb and a mean size of 99.4 kb. Among these CNVs, there were 1,509 gains and 859 losses. A total of 438 CNVs were present in single horses (not shared with the others). When it comes to chromosomal distribution, CNVs were detected on each autosome and the X chromosome; however, some chromosomes (12, 17, 23) were enriched with CNVs (15.1 %, 9.1 %, 8.2 %, respectively). Moreover, the copy number variations encompassed 1,707 genes, of which 559 exist as CNVs in humans (Doan et al. 2012). Dupuis et al. (2011) first performed a genome-wide association study on 234 horses with recurrent laryngeal neuropathy (RLN) and 228 breed-matched controls. The data were then also used to detect copy number variants and their possible associations with RLN. In total, 2,797 CNVs were detected for 477 horses, with an average size of 229 kb. Most of the CNVs (86 %) were observed in only four or fewer horses (i.e. <1 %). None of them were significantly associated with RLN (Dupuis et al. 2013).

Despite improvements in genome analysis methods, the platforms used to discover CNVs in domestic animals are not sufficiently precise: their low resolution prevents them from detecting small CNVs. Moreover, results cannot be easily compared because of technical differences between platforms, and these technical issues can lead to false-negative and false-positive results (reviewed by Cantsilieris and White 2013), which is why confirmation with alternative methods is usually required. Furthermore, genomics of livestock species encounters additional obstacles when CNV platforms and genome assemblies are not available (e.g. camel, dromedary, alpaca, goat) (Clop et al. 2012). In that case, cross-species analyses must be carried out, which may affect their sensitivity (Fontanesi et al. 2010, 2011). However, high-throughput sequencing methods may help to solve these issues owing to their lower bias compared with SNP arrays or aCGH, their ability to identify larger numbers of CNVs in a single experiment and their applicability to any species (even one without a known genome sequence). Unfortunately, these methods are computationally demanding, and their results can also be influenced by technical issues (Alkan et al. 2011).

Hitherto, association studies carried out in domestic animals have concerned mainly Mendelian traits. The next very challenging step in animal genomics will be to identify associations between different CNV genotypes and complex phenotypes such as economic traits (e.g. fatness, milk production) or susceptibility to cancer and infectious diseases, which are important from the point of view of veterinarians and animal breeders (Clop et al. 2012).

Genetic differentiation and breed assignment

The idea of assigning individuals to their breed of origin comes from population genetic investigations, such as analysing genetic diversity and structure, evaluating the amount of genetic exchange between populations, identifying immigrants and detecting hidden population structures (Negrini et al. 2009). Genetic markers can be used to identify and verify the origin of individuals when genetic heterogeneity amongst populations is sufficient (Wilkinson et al. 2011). The development of assignment methods would make it possible to allocate animals and animal products to their breed of origin, for example, when the required documentation is lost or when the external features of animals cannot be evaluated (Wilkinson et al. 2011; Gurgul et al. 2013). Moreover, genetic identification can help resolve issues such as the contribution of source populations to mixed fisheries, the identification of migrant individuals, the structure and levels of diversity amongst populations, and the tracking of trade routes of poached animals (Wilkinson et al. 2011).

SNP chips are highly informative but are relatively costly to produce. Moreover, they are computationally expensive to analyse. Fortunately, the number of markers can be reduced by screening them according to their information content, so as to create reduced panels for population genetic analyses. Several statistical methods can be used to determine which genetic markers contain the most information to discriminate among populations (Wilkinson et al. 2011). Wilkinson et al. (2011) compared marker selection methods (delta, Wright’s FST, Weir and Cockerham’s FST and PCA) for selecting population-informative SNP loci. The aim of their study was to determine the lowest number of SNP markers from the Bovine SNP50 BeadChip required for the effective and confident assignment of individual genotypes to European cattle breeds. All of the studied SNP selection methods yielded reduced marker panels capable of breed identification, but the power of assignment varied clearly between methods. The pairwise Wright’s FST slightly outperformed the other methods in the individual assignment analysis, although delta, pairwise W&C’s FST and PCA also performed reasonably well in terms of assignment success rates (Wilkinson et al. 2011). Gurgul et al. (2013) used 120 SNP markers included in the Bovine SNP50 BeadChip genotyping assay (Illumina), which were recommended for parentage testing and pedigree verification in worldwide cattle populations. The results obtained were not completely satisfactory, and the authors suggested that the studied markers are not the best tool for breed discrimination, especially when the reference populations are small. It was also suggested that the informativeness of markers and their power of discrimination between breeds may be higher for SNPs located in genes responsible for animals’ physiological properties (Gurgul et al. 2013). Nishimura et al. (2013), using Wright’s FST values, identified highly differentiated SNPs between Japanese Black (JB) and Holstein cattle. Twenty SNPs from the top 100 SNPs with high FST values (over 0.61) were selected for primer design, followed by the genotyping of F1 animals. Two of these SNPs were difficult to genotype and were excluded; the remaining 18, located more than 30 Mb apart, were used for breed assignment and allowed all examined samples to be correctly assigned either to JB or to F1 and Holstein. The authors determined the number of SNPs that should be used for the assignment tests by examining the assignment error rate for each number of SNPs used in the linear discriminant formula (Nishimura et al. 2013). Several statistical approaches have thus been developed to select the markers with the highest discrimination power between populations. Nevertheless, the results obtained depend strongly on the degree of differentiation between the specific populations studied, which determines the discrimination power and informativeness of the markers (Gurgul et al. 2013).
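The ranking of SNPs by differentiation described above can be sketched for the simplest case of two populations and biallelic markers. The following is a minimal illustration, not the procedure of Wilkinson et al. (2011) or Nishimura et al. (2013): it assumes that allele frequencies are already known for each population, weights the two populations equally, and uses hypothetical marker names and function names.

```python
from typing import Callable


def delta(p1: float, p2: float) -> float:
    """Absolute allele frequency difference between two populations."""
    return abs(p1 - p2)


def wright_fst(p1: float, p2: float) -> float:
    """Wright's FST = (HT - HS) / HT for one biallelic locus, two populations.

    Assumes equal population weights (a simplification for this sketch).
    """
    p_bar = (p1 + p2) / 2
    h_t = 2 * p_bar * (1 - p_bar)  # expected heterozygosity, pooled
    if h_t == 0:
        return 0.0  # locus is fixed for the same allele in both populations
    h_s = (2 * p1 * (1 - p1) + 2 * p2 * (1 - p2)) / 2  # mean within-population
    return (h_t - h_s) / h_t


def top_markers(
    freqs: dict[str, tuple[float, float]],
    n: int,
    score: Callable[[float, float], float] = wright_fst,
) -> list[str]:
    """Return the n marker names with the highest differentiation score."""
    ranked = sorted(freqs, key=lambda snp: score(*freqs[snp]), reverse=True)
    return ranked[:n]


if __name__ == "__main__":
    # Hypothetical allele frequencies in two breeds.
    freqs = {"snp_a": (0.9, 0.1), "snp_b": (0.5, 0.5), "snp_c": (0.6, 0.4)}
    print(top_markers(freqs, n=2))
    print(top_markers(freqs, n=2, score=delta))
```

Passing the score as a parameter makes it straightforward to compare selection criteria on the same data, which is the kind of comparison the cited studies report.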

Allocation tests are used to assign individuals of unknown breed to their breed of origin. Some of these tests are implemented in freely available software such as GeneClass or Structure, which integrate different algorithms for assigning individuals to their breeds or identifying first-generation migrants and enable calculation of the associated probabilities. Negrini et al. (2009) compared the Bayesian (Rannala and Mountain 1997; Pritchard et al. 2000; Baudouin and Lebrun 2000) and frequency-based (Paetkau et al. 1995) methods implemented in the GeneClass 2 and Structure 2.2 software for breed assignment. In the reallocation tests, the methods implemented in Structure performed better than those in GeneClass: the percentage of correct assignments was 96 % and 85 %, respectively. However, the methods implemented in GeneClass showed a higher correct assignment rate when allocating animals treated as unknowns to a reference dataset. In the authors’ opinion, these results showed that SNPs are suitable markers for assigning individuals to reference breeds and that Structure 2.2 and GeneClass 2 can be complementary tools for assessing breed integrity (Negrini et al. 2009). Wilkinson et al. (2011) suggested that the method of Rannala and Mountain (1997) is more effective for individual assignment than other methods. However, the authors pointed out that, if the levels of genetic differentiation between reference populations are high, the method of Paetkau et al. (1995) is equally effective. Gurgul et al. (2013) applied the Bayesian (Rannala and Mountain 1997) and frequency-based (Paetkau et al. 1995) methods in their allocation tests and found that, for some breeds, the worse performance of the Bayesian method was compensated by the relatively better performance of the frequency-based method (Gurgul et al. 2013).
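The logic of a frequency-based allocation test can be illustrated with a short sketch. It is written in the spirit of the method of Paetkau et al. (1995) but does not reproduce GeneClass or Structure: it assumes biallelic SNPs, Hardy-Weinberg proportions and independent loci, and the frequency floor used to avoid zero likelihoods, the breed names and the function names are all assumptions made for this example.

```python
import math


def genotype_log_likelihood(
    genotype: list[int], allele_freqs: list[float], floor: float = 0.01
) -> float:
    """Log-likelihood of a multilocus genotype in one reference population.

    genotype[i] is the number of copies (0, 1 or 2) of the counted allele at
    locus i; allele_freqs[i] is that allele's frequency in the population.
    """
    log_lik = 0.0
    for copies, p in zip(genotype, allele_freqs):
        # Keep frequencies away from 0 and 1 so that one unseen allele does
        # not exclude a population outright (the floor is an assumption).
        p = min(max(p, floor), 1 - floor)
        q = 1 - p
        # Hardy-Weinberg genotype probabilities for 0, 1 and 2 copies.
        prob = (q * q, 2 * p * q, p * p)[copies]
        log_lik += math.log(prob)
    return log_lik


def assign_breed(
    genotype: list[int], breed_freqs: dict[str, list[float]]
) -> tuple[str, dict[str, float]]:
    """Assign an individual to the breed with the highest log-likelihood."""
    scores = {
        breed: genotype_log_likelihood(genotype, freqs)
        for breed, freqs in breed_freqs.items()
    }
    return max(scores, key=scores.get), scores


if __name__ == "__main__":
    # Hypothetical reference allele frequencies at three SNPs.
    reference = {"breed_A": [0.9, 0.8, 0.1], "breed_B": [0.2, 0.3, 0.7]}
    best, scores = assign_breed([2, 2, 0], reference)
    print(best, scores)
```

The Bayesian approach of Rannala and Mountain (1997) differs in how population allele frequencies are estimated rather than in this final comparison of likelihoods, so the same decision step would apply with different frequency estimates.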

Even though SNP markers are extensively used in scientific and commercial applications, the methods that use SNPs for breed recognition and the assignment of individuals are not yet sufficiently developed and tested. However, recent research on the use of SNPs for breed assignment has shown promising results and suggests that studies of this kind should be continued (Gurgul et al. 2013).

Summary

In this review, we have presented a variety of applications of high-throughput genome analysis methods in livestock studies, together with the most up-to-date research in this area. The article focuses mainly on genotyping microarrays and gives detailed insight into the most interesting and popular applications of data obtained from the available genotyping platforms. We have shown that animal genomics is developing dynamically and provides interesting results that may find broader application, e.g. as a model for studies in other species, including humans. New possibilities are now being opened by next-generation sequencing methods, which allow genomes to be studied at single-base-pair resolution. This will stimulate the further evolution of animal genomics and, in conjunction with present knowledge and the achievements of transcriptomics, proteomics and biochemistry, will bring us closer to understanding the biological mechanisms that shape economically important traits of farm animals.