1 The Water Buffalo Genome

There are two types of Asian water buffalo, the river type and the swamp type, which are classified into distinct subspecies, Bubalus bubalis bubalis and Bubalus bubalis carabanensis, respectively. The two subspecies were derived from independent domestication of different ancestral populations of wild Asian buffalo, Bubalus arnee (Cockrill 1981; Kumar et al. 2007a; Yindee et al. 2010; Nagarajan et al. 2015; Wang et al. 2017a; Colli et al. 2018; Sun et al. 2020). Their divergence predated domestication and was likely caused by isolation following the Pleistocene glaciations (Wang et al. 2017a; Sun et al. 2020). River buffaloes were domesticated in the western Indian subcontinent about 6000 years ago (6 Kyr BP) (Kumar et al. 2007b; Nagarajan et al. 2015), and swamp buffaloes in the China-Indochina border region between 3000 and 7000 years ago (Zhang et al. 2016; Wang et al. 2017a). Following domestication, river buffaloes migrated westwards through southwestern Asia to the Mediterranean basin, while swamp buffaloes spread to south- and north-eastern Asia (Colli et al. 2018; Zhang et al. 2020). There is extensive differentiation between the types in terms of morphology, behavior, and current geographical distribution (Macgregor 1941; Cockrill 1974; Zhang et al. 2020). The two types are genetically differentiated, as revealed by allozymes (Amano et al. 1980; Barker et al. 1997a), microsatellite markers (Barker et al. 1997b; Zhang et al. 2011; Kumar et al. 2006), single-nucleotide polymorphisms (SNPs) (Colli et al. 2018), and whole-genome sequence data (Dutta et al. 2020; Sun et al. 2020). The two types are also differentiated by mitochondrial DNA and Y-chromosome variation (Zhang et al. 2006; Yindee et al. 2010; Zhang et al. 2016; Wang et al. 2017a; Sun et al. 2020).

Both river and swamp buffaloes have five sub-metacentric autosomes, while the remaining autosomes and the sex chromosomes are acrocentric (Iannuzzi and Di Meo 2009). The diploid chromosome number (2n) differs between the two types: 50 in the river karyotype and 48 in the swamp karyotype (Ulbrich and Fischer 1967; Fischer and Ulbrich 1968). The one-pair difference in chromosome number is the result of a rearrangement creating swamp buffalo chromosome 1 from a tandem fusion translocation between the telomere of river buffalo chromosome BBU4p and the centromere of BBU9 (Fig. 2.1) (Di Berardino and Iannuzzi 1981; Iannuzzi and Di Meo 2009; Luo et al. 2020). Fluorescent in situ hybridization (FISH) and C-banding studies (Di Meo et al. 1995; Tanaka et al. 1999) suggested that this fusion caused the loss of a large portion of heterochromatin and satellite DNA from BBU9, and of the nucleolus organizer region (NOR) present at the telomere of BBU4p (Fig. 2.1). In river buffalo, six NORs are found at the telomeres of chromosomes 3p, 4p, 6, 21, 23, and 24, while only five NORs are found in the swamp buffalo on chromosomes 4p (which corresponds to river buffalo chromosome 3p), 6, 20, 22, and 23 as a result of the tandem fusion translocation (Degrandi et al. 2014). As all chromosome arm pairs are conserved between the two subspecies, river × swamp crosses show 2n = 49 and are fertile, although possibly with reduced fertility due to unbalanced chromosome segregation during meiosis (Iannuzzi and Di Meo 2009).

Fig. 2.1

Tandem fusion translocation between the centromere and the telomere of river buffalo chromosomes 9 and 4p, respectively, originating swamp buffalo chromosome 1. The loss of NORs following the fusion is also indicated. (Modified with BioRender.com from Iannuzzi and Di Meo (2009))

Initial information on the general organization of domestic buffalo chromosomes and the location of genes on them came from comparison with cattle. Cytogenetic C-banding studies provided the arm-by-arm matching between water buffalo and cattle chromosomes, while FISH enabled cattle-derived genes and microsatellite loci to be mapped to water buffalo chromosomes (Amaral et al. 2008; Iannuzzi and Di Meo 2009; Michelizzi et al. 2010). A first-generation whole-genome radiation hybrid (RH) map was obtained by mapping 2621 cattle-derived markers onto a river buffalo-hamster panel of hybrid cells (Amaral et al. 2008). This showed that there was good conservation of synteny between cattle and buffalo genomes within linkage groups (Michelizzi et al. 2010).

2 Reference Genome Sequences

A first draft of the domestic buffalo whole-genome sequence, Bbu_2.0-alpha, was obtained by Illumina paired-end sequencing from a female Murrah river buffalo (Tantia et al. 2011). The genome sequence, which had 17-19X read depth, was assembled by alignment with the bovine genome and was deposited in the NCBI Short Read Archive (SRA) in 2009 under Accession Numbers SRX016621 and SRX015182. The first de novo buffalo genome assembly was created by the International Buffalo Genome Consortium (Williams et al. 2017). A total of 242 Gb of raw sequence data derived from Olimpia, a highly homozygous female Mediterranean buffalo, was obtained by combined high-throughput sequencing on the Illumina Genome Analyzer IIx System and Roche 454 FLX Titanium platforms, with approximately 70X genome coverage. This high-coverage assembly, UMD_CASPUR_WB_2.0 (GenBank Accession Number GCF_000471725.1; Table 2.1), was 2.83 Gb in size, with a 21.94 kb contig N50 and 21,711 protein-coding genes that were annotated based on RNA-seq data from 30 tissues (NCBI SRA project PRJNA207334).
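
The contig and scaffold N50 values quoted for these assemblies follow the usual definition: the length of the sequence at which the cumulative length, counted from the longest sequence downwards, first reaches half of the total assembly size. The Python sketch below illustrates this definition; the function name and the toy lengths are hypothetical and are not taken from any of the assemblies described here.

```python
def n50(lengths):
    """Return the N50 of a collection of contig or scaffold lengths."""
    ordered = sorted(lengths, reverse=True)   # longest sequence first
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        # The first sequence that brings the running sum to half of the
        # total assembly length defines the N50.
        if running >= half_total:
            return length
    raise ValueError("lengths must not be empty")


# Hypothetical contig lengths (in kb), for illustration only.
toy_contigs = [80, 70, 50, 40, 30, 20, 10]
print(n50(toy_contigs))
```

Because N50 depends only on the length distribution, it measures contiguity and says nothing about whether the sequences are correctly assembled.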

Table 2.1 Summary information on the water buffalo genome assemblies currently available in either the National Center for Biotechnology Information (NCBI) or in the National Genomics Data Center of the China National Center for Bioinformation (CNCB-NGDC)

The river buffalo reference genome was further improved using PacBio long-read sequence data from the DNA of the same individual used for UMD_CASPUR_WB_2.0, which were combined with Chicago- and Hi-C-based chromatin interaction maps (Cairns et al. 2016) to scaffold a 69X de novo chromosome-level sequence (Low et al. 2019). Indels were corrected using additional Illumina paired-end sequencing. The resulting assembly, UOA_WB_1, has a total length of 2.66 Gb and contains 509 scaffolds with an N50 of 117.2 Mb (Table 2.1). About 1.53 Gb (58%) of the sequence is haplotype-resolved and has only 383 gaps. The 29 chromosome-level scaffolds are ordered consistently with the buffalo whole-genome radiation hybrid (RH) map and show high conservation of synteny with the homologous chromosomes of Bos taurus genome version UMD3.1 (Fig. 2.2). Annotation of UOA_WB_1 was produced by incorporating information on 3462 buffalo transcripts from GenBank, 1013 buffalo GenBank protein sequences, 50,553 and 13,381 RefSeq proteins from human and cattle, respectively, and RNA sequencing data from more than 50 different buffalo tissue types (Low et al. 2019). This high-contiguity assembly constitutes the current domestic buffalo reference genome (GenBank RefSeq GCF_003121395.1), and the annotation can be browsed with the NCBI Genome Data Viewer tool (https://www.ncbi.nlm.nih.gov/genome/gdv/?org=bubalus-bubalis&group=bovidae).

Fig. 2.2

Circular plot of water buffalo chromosomes mapping to B. taurus genome. (From Low et al. (2019))

The trio-binning approach (Koren et al. 2018), in which parental information is used to phase sequence data prior to assembly, has recently been used to create a haplotype-resolved assembly of the Murrah buffalo (Ananthasayanam et al. 2020). PacBio long reads and 10X Genomics linked reads at 166X coverage, with an additional 802 Gb of optical mapping data, were phased using 274 Gb of paired-end data from the parents. Chromosome-level assemblies of the paternal and maternal genomes, with 25 scaffolds and N50 values of 117.48 Mb (sire haplotype) and 118.51 Mb (dam haplotype), were achieved; however, this genome is not presently annotated. Recently, a combined PacBio and Hi-C sequencing approach was used to create the first reference genome sequence for swamp buffalo, CUSA_SWP (CNCB-NGDC Accession Number GWHAAJZ00000000), estimated to be 2.63 Gb in size (Luo et al. 2020). The chromosome-scale scaffolds of this assembly have an N50 of 117.3 Mb (Table 2.1), representing the 24 swamp buffalo chromosomes and covering 97.5% of the genome.

The availability of high-quality genomic sequences for both river and swamp buffaloes has made it possible to compare the genome arrangement of the two subspecies and to evaluate their relationship from an evolutionary perspective. Analysis of the sequence similarity confirmed that swamp buffalo chromosome 1 arose from the fusion of river buffalo chromosomes 4p and 9. Comparison of the sequences also allowed divergence times to be calculated from 6429 single-copy ortholog sequences, which showed that the common ancestor of swamp and river buffaloes dates back to 1.1–3.5 million years ago (Luo et al. 2020). This suggests that glacial events known to have occurred at that time may have played a role in separating ancestral buffalo populations on either side of the present-day India-Myanmar border, leading to geographical isolation and genetic divergence. Expanded genes were found in both subspecies, 179 in swamp and 261 in river-type, which may account for the phenotypic differences, particularly in relation to muscle growth and environmental adaptation.

3 Genetic Diversity

The first genetic studies on buffalo were based on variations in protein-coding loci (i.e., allozymes) and microsatellite loci (Barker et al. 1997a, b) in the nuclear genome. Direct sequencing of the control region or of the cytochrome b (Cytb) gene was used to assess maternally inherited mitochondrial DNA (Lau et al. 1998), and sequences from the non-recombining part of Y-chromosome have been used more recently to study the inheritance of the paternal genome (Yindee et al. 2010). High-throughput sequencing and the availability of whole-genome sequence information have made it possible to discover a large number of single-nucleotide polymorphism (SNP) markers and to devise marker panels (Iamartino et al. 2017) for cost-effective genotyping of large numbers of animals to investigate genome diversity and for genome-wide association studies (GWAS).

3.1 Microsatellite Markers

Being highly polymorphic, microsatellite loci that contain Short-Tandem Repeats (STRs) were among the first markers to be widely used for the characterization of the nuclear genomic diversity in both river and swamp buffaloes. Although a panel of 30 microsatellite loci was proposed by ISAG-FAO (FAO 2004), most studies have used different or partially overlapping marker sets. This, together with the difficulties in harmonizing allele scoring of STRs across laboratories, prevented a global scale characterization of water buffalo diversity.

In domestic species, diversity levels are higher close to domestication centers and decrease as populations move away from them. This pattern of diversity is also seen for both water buffalo subspecies. At the local level, microsatellite-based studies showed that for both buffalo types the highest diversity was found close to the respective likely domestication centers, with a gradual decrease along the dispersal routes. The expected heterozygosity (HE) of river buffalo, assessed using STR loci, decreases from HE = 0.71–0.78 for the Murrah, Nili-Ravi, and Kundi group from the Indian subcontinent (Kumar et al. 2006; Vijh et al. 2008) to HE = 0.58–0.68 for the Mediterranean population from Italy (Moioli et al. 2001), which is consistent with the supposed center of domestication in the Indian subcontinent. Similarly, the highest heterozygosity in swamp buffaloes is found in Thailand (HE = 0.573), close to the likely domestication center across the China-Indochina border, although overall swamp-type populations display little differentiation from each other all over southeast Asia (Zhang et al. 2011).
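
For reference, expected heterozygosity at a locus is commonly computed as one minus the sum of the squared allele frequencies (Nei's gene diversity), and then averaged over loci. The Python sketch below illustrates that calculation on made-up microsatellite allele sizes. The function names and toy data are assumptions for illustration only; the cited studies may have used a small-sample correction that is not shown here.

```python
from collections import Counter


def expected_heterozygosity(alleles):
    """H_E = 1 - sum(p_i^2) for one locus, given the observed allele copies."""
    counts = Counter(alleles)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no alleles observed at this locus")
    return 1.0 - sum((count / total) ** 2 for count in counts.values())


def mean_expected_heterozygosity(loci):
    """Average the per-locus H_E over a list of loci."""
    values = [expected_heterozygosity(alleles) for alleles in loci]
    return sum(values) / len(values)


# Hypothetical allele sizes (bp) at two STR loci; each diploid animal
# contributes two allele copies per locus.
locus_a = [150, 150, 152, 154]
locus_b = [200, 200, 200, 200]   # monomorphic locus, H_E = 0
print(expected_heterozygosity(locus_a))
print(mean_expected_heterozygosity([locus_a, locus_b]))
```

A population that has lost alleles along a dispersal route has more skewed allele frequencies, a larger sum of squares, and therefore a lower HE, which is the pattern described above.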

A recent meta-analysis made use of model-based clustering of microsatellite marker data from both river and swamp buffaloes (Zhang et al. 2020), which showed a clear difference between the two types and the occurrence of some degree of introgression from the river-type gene pool into swamp buffaloes in Far Southwest China, in Thailand, and to a lesser extent, in the Malaysian peninsula and the Philippines.

3.2 SNP (Single-Nucleotide Polymorphism) Diversity

SNPs represent the most frequent type of mutation occurring genome-wide. Comparison of whole-genome sequences (WGS) in mammalian livestock has shown that ca. 30 million SNP loci can be found on average within a species (Alberto et al. 2018; Júnior et al. 2020; Luo et al. 2020). Most SNPs are biallelic, which makes them ideal for the development of array-based genotyping panels and simplifies data analysis.

Cattle SNP arrays have been evaluated for use in buffalo (Michelizzi et al. 2011; Wu et al. 2013; Borquis et al. 2014; Pérez-Pardal et al. 2018). Most loci on the cattle panel can be genotyped successfully in buffalo; however, only a small percentage of cattle SNPs are polymorphic in buffalo, e.g., only 926 loci (1.71%) out of the ca. 50K markers included in the Illumina BovineSNP50 BeadChip and only 16,580 (2.41%) out of the 777K SNPs in the Illumina BovineHD BeadChip (Michelizzi et al. 2011; Borquis et al. 2014). This shows that the cross-species applicability of SNP arrays is low and decreases as evolutionary divergence time increases, and that most polymorphisms targeted by SNP panels are of recent evolutionary origin.

A buffalo-specific SNP panel including 89,988 SNPs, 5799 probes for quality control and 1784 gender identification probes (Iamartino et al. 2017) is commercially available as the Axiom® Buffalo Genotyping Array (Thermo Fisher). The SNP loci were originally selected to have an even genome-wide distribution using the bovine UMD 3.1 genome to estimate their position but have recently been remapped to the new version of the buffalo reference genome (Low et al. 2019). SNPs were initially identified from whole-genome sequences of Mediterranean, Murrah, Jaffarabadi, and Nili-Ravi buffaloes. As these are all river-type breeds, only ca. 22.74% of the loci are polymorphic in swamp buffaloes (Colli et al. 2018). The 90K panel has also been evaluated in Lowland Anoa (Bubalus depressicornis) and Cape buffalo (Syncerus caffer), resulting in 7652 (7.8%) and 3239 (3.3%) out of 89,988 loci respectively polymorphic in these species (Iamartino et al. 2017).

A multi-species SNP panel has been developed by the European IMAGE consortium to reduce genotyping costs. The IMAGE panel includes ca. 11K buffalo SNP markers, selected to be evenly distributed genome-wide, to be informative in river and swamp populations by including ancestral polymorphisms, and also to include functional, mtDNA, and Y-chromosome variants (Crooijmans et al. in preparation). Many loci are shared with the Axiom 90K panel to make data integration and meta-analysis possible.

The Axiom Buffalo Genotyping Array has been used to study diversity, linkage disequilibrium, effective population size, and the genome-wide distribution of runs of homozygosity (ROH), and to carry out GWAS for milk production, reproduction, and disease traits (de Camargo et al. 2015; Colli et al. 2018; Liu et al. 2018; Deng et al. 2019; Du et al. 2019; Guzman et al. 2019). Using genotype data from 20,463 loci that were polymorphic in both river and swamp buffaloes, Colli et al. (2018) evaluated genetic variation, population structure, and gene flow in 30 buffalo populations worldwide. Three distinct gene pools were found, corresponding to individuals of pure river, pure swamp, and admixed river × swamp ancestries. The occurrence of two independent domestication events was confirmed, and several links between populations were identified that were consistent with human migrations, importation, and crossbreeding to improve performance. Association studies have identified candidate genes that are either trait-specific or have pleiotropic effects, which can be used by breeders to select animals carrying advantageous alleles to improve production traits, particularly milk fat yield and milk protein percentage (Liu et al. 2018). Candidate genes likely to affect milk yield, fat yield, and protein yield have been reported in two genomic regions, one harboring the genes MFSD14A, SLC35A3, and PALMD and the other harboring RGS22 and VPS13B (Liu et al. 2018). These regions corresponded to locations on cattle chromosomes BTA3 and BTA14 where QTLs influencing milk performance have been reported in dairy cattle (Harder et al. 2006; Wibowo et al. 2008). The comparison of data from cattle and buffalo suggested that different alleles affect milk traits, for fat production in particular, in the two species (de Camargo et al. 2015; Liu et al. 2018).

Runs of homozygosity (ROHs) are contiguous genomic regions where an individual has inherited the same segment on both chromosomes from a common ancestor (Ceballos et al. 2018). Inbreeding at the genomic level can therefore be estimated from the frequency and length of ROHs. Additionally, ROHs can indicate regions impacted by selective pressure over time. The distribution of ROHs over the genome, their abundance, and their length are determined by a number of factors including region-specific recombination rate, GC content, selective pressures, and demographic events (McQuillan et al. 2008; Bosse et al. 2014). The length of the ROH regions is in general inversely proportional to the time passed since historic demographic or selection events, with longer ROHs corresponding to recent inbreeding and shorter ROHs arising from ancient demographic changes such as bottlenecks or founder effects (Cardoso et al. 2018). Genomic patterns of runs of homozygosity have been studied in several livestock species to find signatures of natural or human-mediated selection, and to identify advantageous genetic variants that may have become fixed or almost fixed as a consequence (Peripolli et al. 2017). The availability of the Axiom buffalo genotyping array has facilitated studies on the occurrence of ROHs genome-wide in water buffalo populations. Runs of homozygosity have been studied at the local level in Iranian Azeri and Khuzestani, and in Brazilian Murrah river buffaloes (Ghoreishifar et al. 2020; Nascimento et al. 2021), and also in a set of 15 river and 15 swamp buffalo populations worldwide (Macciotta et al. 2021). Particularly long ROHs, up to 10 Mb with mean ROH length per animal of 4.28 ± 1.85 Mb, were found in all Brazilian Murrah individuals, most likely resulting from recent strong selection for milk production traits in this population (Nascimento et al. 2021).
Conversely, Iranian breeds showed ROHs about 1 Mb long in all animals, but the total number and total length varied considerably between individuals, suggesting the occurrence of both recent and past inbreeding events which have affected the populations (Ghoreishifar et al. 2020). In both studies, genes and QTLs found within ROH islands bore signatures of selection and were associated with selected traits and functions including body size, muscle and bone development, immune response, milk traits (milk yield, milk fat yield and percentage, milk protein yield and percentage), coat color and pigmentation, reproduction and morphology (Ghoreishifar et al. 2020; Nascimento et al. 2021).
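
The link between ROHs and genomic inbreeding can be illustrated with a deliberately simplified Python sketch: consecutive homozygous SNP calls are grouped into runs, short runs are discarded, and the ROH-based inbreeding coefficient (often written F_ROH) is taken as the summed ROH length divided by the length of genome covered by markers. The genotype coding, the thresholds, and the function names are assumptions made for this example. The cited studies used dedicated software with additional settings, such as tolerance for missing or heterozygous calls and SNP density limits, which are not reproduced here.

```python
def _keep_run(run, rohs, min_snps, min_length_bp):
    """Store a run as an ROH if it has enough SNPs and spans enough bp."""
    if len(run) >= min_snps and run[-1] - run[0] >= min_length_bp:
        rohs.append((run[0], run[-1]))


def find_rohs(positions, genotypes, min_snps=3, min_length_bp=1_000_000):
    """Return (start_bp, end_bp) for each qualifying run of homozygous SNPs.

    positions: SNP coordinates in bp, sorted along one chromosome.
    genotypes: 0 or 2 for the two homozygous classes, 1 for heterozygous
               (an assumed coding; no missing data are handled).
    """
    rohs = []
    run = []  # positions of the homozygous SNPs in the current run
    for pos, genotype in zip(positions, genotypes):
        if genotype != 1:
            run.append(pos)
            continue
        # A heterozygous call ends the current run.
        _keep_run(run, rohs, min_snps, min_length_bp)
        run = []
    _keep_run(run, rohs, min_snps, min_length_bp)
    return rohs


def f_roh(rohs, genome_length_bp):
    """Fraction of the genome covered by ROHs."""
    return sum(end - start for start, end in rohs) / genome_length_bp


# Toy example: eight SNPs spaced 0.5 Mb apart on a hypothetical 10 Mb chromosome.
positions = [0, 500_000, 1_000_000, 1_500_000, 2_000_000, 2_500_000, 3_000_000, 3_500_000]
genotypes = [0, 0, 2, 0, 1, 2, 2, 1]
runs = find_rohs(positions, genotypes)
print(runs, f_roh(runs, 10_000_000))
```

In this toy data only the first four SNPs form a run that passes both thresholds; the two homozygous calls between the heterozygous sites are too few to count as an ROH.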

ROH analysis across buffalo types identified >18K homozygous regions overall (Macciotta et al. 2021). Swamp-type buffalo populations had more ROHs per animal and higher genomic inbreeding. Although differences found between river and swamp populations may have been partly affected by ascertainment bias in the SNPs on the array, a convergent signature of selection was found in both river and swamp buffaloes on chromosome 2, where a large ROH island spans genes involved in adaptation to the environment and reproduction (Macciotta et al. 2021).

4 Nuclear Genome Diversity

Huge amounts of whole-genome resequencing data are being produced using high-throughput sequencing platforms. WGS data have been published for water buffaloes from many countries: Bangladesh, Bengal, China, Egypt, India (multiple breeds), Indonesia, Iran, Iraq, Italy, Laos, Myanmar, Nepal, Pakistan, the Philippines, Vietnam, and Thailand (Dutta et al. 2020; El-Khishin et al. 2020; Luo et al. 2020; Mintoo et al. 2020; Sun et al. 2020). Based on 5–10X resequencing data from swamp- and river-type animals from 14 Asian and one European country, ca. 33.5 million SNPs were identified overall for the two subspecies, while 18.7 and 23.7 million SNPs were found to be polymorphic within swamp and river buffalo populations, respectively (Luo et al. 2020). Genetic variation within subspecies has been estimated by calculating nucleotide diversity (θπ), a statistic frequently used in population genetics, which corresponds to the average number of differences per nucleotide site found when the DNA sequences are compared pairwise among all individuals in the sampled set. Overall, swamp buffaloes have lower genetic diversity than river buffaloes, with nucleotide diversity values of ca. θπ = 0.0018 and θπ = 0.0027, respectively. Among the river type, the Italian Mediterranean population has a lower-than-average nucleotide diversity (ca. θπ = 0.002) compared to buffaloes from Pakistan, Iran, and Iraq (ca. θπ = 0.0028) and to Indian breeds (ca. θπ = 0.0025; Luo et al. 2020). This reduction of variability is consistent with the loss of diversity as populations move from their geographic origin and alleles are lost. Gene flow has been seen in the regions of Southeast Asia where the two subspecies hybridize naturally (Luo et al. 2020). Introgression from river buffalo into swamp populations is now used widely to improve productivity, especially in Laos, Myanmar, and the Philippines.
Interestingly, in Yunnan Province of China, river buffaloes displayed genomic introgression from swamp buffalo (Luo et al. 2020).
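
The definition of nucleotide diversity given above (the average number of differences per site over all pairwise sequence comparisons) can be written as a short Python sketch. The function name and the toy sequences are hypothetical. Published estimates are computed from genome-wide variant calls, usually in sliding windows, rather than from aligned strings as done here.

```python
from itertools import combinations


def nucleotide_diversity(sequences):
    """Average pairwise differences per site among aligned sequences."""
    if len(sequences) < 2:
        raise ValueError("at least two sequences are required")
    length = len(sequences[0])
    if length == 0 or any(len(seq) != length for seq in sequences):
        raise ValueError("sequences must be aligned, equal-length and non-empty")
    pairs = list(combinations(sequences, 2))
    # Count mismatching sites for every pair of sequences.
    total_differences = sum(
        sum(a != b for a, b in zip(seq1, seq2)) for seq1, seq2 in pairs
    )
    return total_differences / (len(pairs) * length)


# Three hypothetical aligned haplotypes of 10 bp each.
toy_haplotypes = ["ACGTACGTAC", "ACGTACGTAT", "ACGAACGTAC"]
print(nucleotide_diversity(toy_haplotypes))
```

A value of θπ = 0.0027, as reported for river buffalo, therefore means that two randomly chosen sequences differ at roughly 2.7 sites per 1000 bp on average.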

Genome-wide analyses have identified hundreds of genes harboring differential selection signatures in swamp and river buffaloes (Dutta et al. 2020; Luo et al. 2020), with a set of 67 genes under selection that are common to both groups (Luo et al. 2020). Pathway and gene ontology analyses of genes under selection in the swamp buffalo identified a significant enrichment in muscle- and cardiac-related functions, nerve development, and behavior, which may account for physical strength and endurance, and docile temperament (Luo et al. 2020). Selective sweeps have also been found in two starch-digestion enzymes genes, which suggests a possible adaptation to reduce rumen acidosis induced by the starch-rich feed traditionally supplied to swamp buffaloes used for traction in several Asian countries (Luo et al. 2020). In the case of river buffaloes, selection signatures have been reported in genes related to body size, fecundity, fetal growth, birth size, and milk production. Selective sweeps common to both subspecies suggest the occurrence of convergent evolution affecting body size, immune response to pathogens, and behavioral changes related to the “domestication syndrome” (Dutta et al. 2020; Luo et al. 2020).

5 Functional Variation

Putative sequence variants affecting a range of phenotypes have been proposed by screening the candidate genes in the water buffalo genome. This strategy identified variations related to coat color, milk production, reproductive performance, and diseases, although the validation of these associations is difficult.

In swamp buffaloes, the most frequent coat color is solid dark gray. Solid white animals are not uncommon with a 10% frequency in some populations. White seems to be dominant over the dark variant (Rife and Buranamanas 1959; Rife 1962), but so far variants that control the white coat have not been identified. Swamp buffalo bulls with a white spotted coat, a phenotype found only in animals from Tana Toraja in Indonesia, are highly prized for ceremonial purposes. Two independent loss of function mutations in the microphthalmia-associated transcription factor (MITF) gene are associated with the white spotted coat of swamp buffalo (Yusnizar et al. 2015). Specific allelic variants have been reported that affect milk production and growth traits. Milk protein percentage in Mediterranean buffalo has been associated with a C > T transition in the signal transducer and activator of transcription 5A (STAT5A) gene and a non-synonymous point mutation in the insulin-like growth factor 2 (IGF2) gene which also has an effect on average daily gain (Abo-Al-Ela et al. 2014; Coizet et al. 2018). SNPs affecting milk yield, fat and protein percentage and yield have been identified within the growth hormone receptor (GHR) gene in Egyptian buffaloes, in which animals carrying the AA homozygous genotype at both missense mutations 380G > A and 836T > A displayed higher productive performance (El-Komy et al. 2020).

The identification of genetic variants related to reproduction and fertility is particularly important for the efficient implementation of breeding plans. In male buffaloes from China variations at two genes, luteinizing hormone beta polypeptide (LHB) and gonadotropin-releasing hormone receptor (GnRHR) were recently found to affect semen quality traits, i.e., volume of the ejaculate and quality of the sperm cells (Cheng et al. 2017; Wang et al. 2017b, 2020). In females, pregnancy rate and susceptibility to anestrus are influenced by variants occurring in the melatonin receptor 1A (MTNR1A) gene and the cytochrome P450 aromatase gene (CYP19A1), respectively (El-Bayomi et al. 2017; Pandey et al. 2019). Susceptibility to disease is a trait that potentially has a significant impact on the viability of livestock production. Susceptibility to tuberculosis and mastitis have been associated with specific mutations. In Mediterranean buffaloes, a G > A transition at nucleotide position 4467 in the 3′-UTR (untranslated region) of the interferon gamma (IFNG) gene disrupts the target site for micro-RNA miR-125b, which seems to be responsible for an increased susceptibility to Mycobacterium bovis (Iannaccone et al. 2018). In Egyptian buffaloes, a C > A transversion in exon 27 of the complement component 3 (C3) gene has been shown to have a significant association with milk somatic cell score, which is correlated with mastitis (El-Halawany et al. 2017).

The heritable defect transverse hemimelia (TH) causes abnormal development of the terminal portion of the limbs, which has varying degrees of severity. A case-control study based on whole-genome sequence data suggests that TH is an oligogenic trait and 13 putative candidate genes have been identified. In particular, variants in wingless-type MMTV integration site family, Member 7a (WNT7A) and SWI/SNF Related, Matrix Associated, Actin Dependent Regulator (SMARCA4) genes in the homozygous state are associated with the extreme forms of the phenotype (Whitacre et al. 2017).

6 Uniparental Genomes Diversity

6.1 Mitochondrial Genome

Mitochondrial DNA (mtDNA) is present in a high number of copies per cell and is maternally inherited. The mitochondrial genome is haploid and hence does not recombine, but nevertheless has a high mutation rate because of the absence of repair mechanisms. As a result of these features, mtDNA variation has been used to investigate diversity and phylogenies of the maternal lineage in many animal species (Lenstra et al. 2012). The current water buffalo mtDNA reference sequence (GenBank Accession Number NC_006295) is from a swamp-type buffalo belonging to the Haikou breed from China. The swamp buffalo mitochondrial genome has a total size of 16,359 bp, and includes the displacement loop (which is non-coding but contains the origin of replication of the H-strand), two genes coding for 12S and 16S ribosomal RNAs (rRNA), 22 genes for transfer RNAs (tRNA), and 13 genes encoding proteins involved in the mitochondrial electron transport chain and ATP synthesis (Fig. 2.3). Partial copies of varying length of the mtDNA genome have been integrated into the nuclear genome in eukaryotic species in the course of their evolution (Richly and Leister 2004). These nuclear-mitochondrial transpositions (NUMTs) have not been investigated in detail in water buffalo; however, a BLAST search (https://blast.ncbi.nlm.nih.gov/Blast.cgi) for sequences occurring in both mtDNA and nuclear genome reference sequences returns 24 hits, with percentages of identity between 74.89 and 100% and lengths of 31–5346 bp. The longest uninterrupted hit of 6430 bp is found on chromosome X, and covers positions 9991–16,359 of the mtDNA reference. The water buffalo mtDNA control region, or displacement loop (D-loop), is 909 bp long, similar to that of other buffalo species, e.g., the tamaraw Bubalus mindorensis (922–925 bp) and the lowland anoa Bubalus depressicornis (927 bp), and also of cattle, Bos taurus (909 bp) and Bos indicus (912 bp), but is shorter than that of pig (1175 bp), sheep (1179 bp), and goat (1212 bp) due to the lack of repetitive motifs.

Fig. 2.3

Water buffalo mitochondrial DNA organization based on the current B. bubalis mtDNA Reference Sequence (GenBank Accession Number NC_006295). Protein, tRNA, and rRNA coding genes are grouped and color coded as follows: purple, tRNAs; red, rRNAs; blue, NADH dehydrogenase; dark green, cytochrome oxidase; light green, ATP synthase; orange, cytochrome b. (The picture was created with the GenomeVx tool (http://wolfe.ucd.ie/GenomeVx/))

The first mtDNA investigations, based on the analysis of D-loop and Cytb sequence data, highlighted a deep divergence between swamp and river buffalo female lineages and the presence of a hotspot of diversity in animals from southeast Asia, which pointed to domestication in this region (Tanaka et al. 1996; Kikkawa et al. 1997; Lau et al. 1998). The analysis of the entire control region sequence clearly suggested different maternal origin and independent domestication of river- and swamp-type buffaloes (Kumar et al. 2007a; Lei et al. 2007). Large-scale screening of control region variation (Zhang et al. 2016) identified five frequently found swamp buffalo haplogroups, SA1, SA2, SB1, SB2, and SB3, and three rare and highly divergent additional haplogroups, SC, SD, and SE (Figs. 2.4 and 2.5) (Sun et al. 2020). In river buffalo, four haplogroups were found, R1, R2, R3, and R4 (Zhang et al. 2016; Sun et al. 2020). Three of these (R1, R2, and R3) are spread across the whole distribution area of river buffalo from the Asian continent to the Mediterranean basin (Figs. 2.4 and 2.5). The R1 haplogroup is the most frequent (75.4%), followed by R2 (16.4%), while R3 is relatively rare (8.2%) (Fig. 2.4) (Kumar et al. 2007a; Zhang et al. 2016). The R4 haplogroup has only been identified in Italian animals so far (Fig. 2.5; Sun et al. 2020). For both river and swamp buffaloes, mtDNA haplogroup frequencies vary slightly across the distribution areas (Figs. 2.4 and 2.5), following the general trend of a progressive loss of diversity moving away from the domestication centers. The occurrence of rare and highly divergent variants points to the repeated post-domestication introgression of wild buffalo lineages into domestic buffalo (Nagarajan et al. 2015; Wang et al. 2017a), which has also been seen in several livestock species (Bonfiglio et al. 2010; Colli et al. 2015).

Fig. 2.4

Distribution and frequency of river and swamp buffalo mtDNA haplogroups in different areas of the world based on control region data. The inset shows the phylogenetic relationships between the haplogroups. (From Zhang et al. (2020))

Fig. 2.5

Phylogenetic networks of (a) river and swamp buffalo complete mitochondrial genomes, and (b) Y-chromosome haplotype variation based on 520 SNPs. Gray-shadowed rectangles identify different haplogroups. Edge widths are proportional to the number of mismatches between the joined haplotypes. Geographical provenance of the samples is shown in the top panel. (Modified from Sun et al. (2020))

Phylogeographic patterns of variation and the demographic history of water buffalo have been investigated using mtDNA data. These data show that swamp buffalo populations are strongly partitioned geographically with little gene flow, despite their phenotypic uniformity. In contrast, river buffaloes have a lower haplotypic diversity and a weaker phylogeographic structure (Zhang et al. 2016; Wang et al. 2017a), which may be partly due to human-mediated effects, including selection for productivity and exchange of animals for the improvement of milk production. Swamp buffaloes in southern China/northern Indochina straddling the Mekong River have the highest diversity, suggesting this region as the likely center of domestication of the swamp type, which is estimated to have occurred ca. 3–7 Kya (Zhang et al. 2016; Wang et al. 2017a). Diversity of river buffaloes is highest in India and gradually decreases westward towards the Mediterranean (Nagarajan et al. 2015), supporting the hypothesis of domestication in the Indian subcontinent. This gradual decline suggests that, after domestication, the westward migration of river buffalo towards southwestern Asia and the Mediterranean occurred progressively and without major bottlenecks (Zhang et al. 2016), in agreement with microsatellite- (Moioli et al. 2001; Kumar et al. 2006; Vijh et al. 2008) and SNP-based evidence (Colli et al. 2018).

Whole mitogenomes of swamp buffalo show that Pleistocene glacial periods, before 11 Kya, played a role in shaping the phylogeny and demographic history of the subspecies. Divergence between river- and swamp-type mitochondrial lineages has been estimated to have occurred at the beginning of a glacial period (900–860 Kya). Within swamp buffaloes, the two major macro-lineages diverged at the time of the second glacial event 200–130 Kya (Wang et al. 2017a). Interestingly, most present-day swamp buffalo mitochondrial genomes originated from two ancestors dating back to the Last Glacial Maximum 26–19 Kya and further differentiated during the Holocene warm period 11–6 Kya. A substantial bottleneck was observed at 7–3 Kya, corresponding to a reduction in effective population size that occurred at the time of domestication (Wang et al. 2017a).

6.2 Y-Chromosome

The non-recombining part of the Y-chromosome, i.e., the portion lacking a homologous counterpart on the X-chromosome, provides information on the paternal lineage. Low variability of the Y-chromosome has made the identification of Y-chromosome polymorphisms difficult in most livestock species (Hellborg and Ellegren 2004). As a consequence, Y-chromosome variation has been analyzed by sequencing short gene fragments or by typing a few microsatellite loci (Zhang et al. 2006, 2016; Yindee et al. 2010; Wang et al. 2018). Microsatellite-based studies of buffalo Y-chromosome diversity initially exploited cattle STR loci with limited success. Out of 40 cattle loci tested in Asian swamp buffaloes, only 12 could be satisfactorily amplified and only seven were polymorphic (Wang et al. 2018). Together, these seven loci described nine haplotypes and four haplogroups, i.e., Y1, Y2, Y3, and Y4. Haplogroup Y1 was the most frequent (83.4%) across southeastern Asia, while Y2 and Y3 had discontinuous distributions and varying frequencies. The high frequency of Y1 around the Yangtze Valley suggested this region as the domestication center. Interestingly, the rare and highly divergent Y4 was found only on Hainan Island, and was interpreted as introgression from an ancient wild Asian buffalo population native to the island (Wang et al. 2018).

Studies of the Y-chromosome genes of water buffalo have produced sequence data from specific genes, e.g., DEAD-Box Helicase 3 Y-linked (DBY), Zinc finger Y-chromosomal protein (ZFY), and Sex-determining Region Y (SRY) gene segments (Zhang et al. 2006; Yindee et al. 2010; Zhang et al. 2016). Sequencing of SRY identified a single SNP difference between river and swamp buffalo at position 202 of the coding region. PCR genotyping of this SNP was used to reveal male-mediated introgression of river-type buffalo into swamp buffalo in Southern China (Zhang et al. 2006). A 2310-bp Y-chromosome fragment was sequenced in 495 males (450 swamp and 45 river) from 35 populations spanning most of southeastern and central Asia (Zhang et al. 2016). Among swamp buffaloes, sequence analysis identified only nine SNPs defining 11 haplotypes, of which four were frequent and seven rare. Only one SNP, defining two haplotypes, was found in river buffalo (Zhang et al. 2016). Nevertheless, the geographical distribution of the haplogroups and the values of haplotype diversity in swamp buffaloes corroborated the hypothesis of domestication around the border between China and Indochina (Zhang et al. 2016).
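Haplotype diversity values such as those mentioned above are commonly computed with Nei's unbiased estimator, H = n/(n − 1) × (1 − Σ p_i²), where n is the number of sampled chromosomes and p_i the frequency of haplotype i. The cited studies may have used a different estimator or software, so the following Python sketch is only an illustration of the general calculation. The function name and the toy samples are hypothetical and do not reproduce any published buffalo data.

```python
from collections import Counter


def haplotype_diversity(haplotypes):
    """Nei's unbiased haplotype diversity: n/(n-1) * (1 - sum(p_i^2)).

    `haplotypes` is a list with one haplotype label per sampled male.
    """
    n = len(haplotypes)
    if n < 2:
        raise ValueError("need at least two sampled chromosomes")
    counts = Counter(haplotypes)
    # Sum of squared haplotype frequencies (probability of drawing the
    # same haplotype twice, with replacement).
    sum_sq = sum((c / n) ** 2 for c in counts.values())
    return n / (n - 1) * (1.0 - sum_sq)


# Hypothetical toy samples, for illustration only.
uniform_sample = ["H1"] * 10                         # a single haplotype
mixed_sample = ["H1"] * 5 + ["H2"] * 3 + ["H3"] * 2  # three haplotypes

h_uniform = haplotype_diversity(uniform_sample)
h_mixed = haplotype_diversity(mixed_sample)
```

A population carrying a single haplotype yields H = 0, whereas a sample in which every individual carries a different haplotype yields H = 1. Comparing H across sampling regions is one way to look for the decline in diversity away from a putative domestication center.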

Despite the lack of a buffalo Y-chromosome reference assembly, sequencing of whole genomes of about 90 male buffaloes from different geographical areas produced Y-chromosome data (Fig. 2.5) (Sun et al. 2020). A total of 520 Y-chromosome specific SNPs were detected in these data, which identified a clear subdivision of the paternal lineages into swamp-type (YS) and river-type (YR) clades. Two haplogroups were identified within each clade. Swamp-type haplogroups YS1 and YS2 are spread across the whole distribution area of the subspecies, although with varying frequencies, whereas river-type haplogroups are geographically separated, with YR1 detected in southern Asia and YR2 in Italy. YR clade haplotypes have also been identified in swamp individuals, indicating male-mediated river buffalo introgression into swamp buffalo populations as a result of crossbreeding plans to improve productivity (Fig. 2.5). Y-chromosome data confirm the clear genetic subdivision between river and swamp paternal lineages, indicating that they originated from different wild ancestral populations.

7 Conclusions

Y-chromosome specific loci and mitochondrial genome variation suggest that swamp and river buffaloes were domesticated from different ancestral populations that had already diverged due to prolonged isolation following glacial events during the early Pleistocene period. Likely domestication sites of river and swamp buffaloes were the northwestern Indian subcontinent and the China/Indochina border region, respectively (Kumar et al. 2007b; Yindee et al. 2010; Nagarajan et al. 2015; Zhang et al. 2016). This hypothesis has been confirmed by microsatellite loci (Kumar et al. 2006; Vijh et al. 2008; Zhang et al. 2011, 2020) and genome-wide SNP analysis (Colli et al. 2018). River buffaloes have fewer haplotypes and lower overall diversity than swamp buffaloes for both mtDNA and Y-chromosome markers, which contrasts with the higher genomic diversity estimated in river buffaloes than in swamp buffaloes from whole-genome sequence data (Luo et al. 2020). These differences may be explained by taking demographic effects into account. Inference drawn from whole-genome sequences showed that around 100 thousand years ago river buffalo ancestors experienced a population expansion four times larger than that of swamp buffalo ancestors (Luo et al. 2020). This expansion of the ancestral river buffalo likely led to increased diversity, of which only a small fraction was sampled when domestication occurred, resulting in a reduction in uniparental lineages while nuclear genomic diversity was maintained in the background. Another explanation may be that, in swamp buffalo, domestication captured a proportion of diversity from a wild population with high variability of uniparental markers but generally lower nuclear genomic diversity. Following domestication, swamp buffalo variation increased due to further introgression from wild populations (Zhang et al. 2016).