Introduction

With around 400 breeds recognized worldwide, the dog (Canis lupus familiaris) is an incomparable species in terms of phenotypic variability. After centuries of selection, each breed possesses a breed-specific gene pool resulting in unique phenotypes such as body size, morphology, coat, and behavior. However, the fixation of favored alleles by selection, crossing, and genetic drift has also led to an increased frequency of deleterious alleles, mainly recessive, responsible for hereditary diseases (Switonski 2014).

As observed in humans, dogs are prone to monogenic and complex diseases. Around 800 canine monogenic diseases have been described: the causal gene mutation is known for at least 318 disorders, and about 543 diseases are considered models for human diseases (accessed on December 28, 2021, https://omia.org/home). Notably, several canine monogenic disorders are characterized by a breed-specific distribution (Switonski 2014), with lower frequency in populations with a wide gene pool (characteristic of mixed-breed dogs) than within narrow purebred populations. Indeed, the incidence of certain genetic diseases is higher in some pure breeds than in others, and their prevalence can reach very high levels in a single breed (Zierath et al. 2017). To date, most research has focused on monogenic diseases because of their relatively simple molecular background and their large numbers observed in pure breeds. In 1989, Evans and colleagues (Evans et al. 1989) reported the first gene mutation causing hemophilia B in dogs (OMIA#000438-9615). Since then, many mutations have been identified for hundreds of disorders, mostly inherited in an autosomal recessive pattern (Switonski 2014), such as lethal acrodermatitis (OMIA#002146-9615) and hereditary nasal parakeratosis (OMIA#001373-9615).

Canine epidemiological studies determined that a combination of genetic, environmental, and lifestyle factors causes most complex diseases, including several congenital defects and adult-onset diseases (Donner et al. 2018). The most common orthopedic condition diagnosed in dogs, canine hip dysplasia (OMIA#000473-9615), is an example of a complex disease with high prevalence in large and giant breeds, characterized by a relatively low heritability and a phenotypic expression strongly influenced by environmental factors such as caloric intake during growth (King 2017). Diabetes is another example of a common complex disease, for which segregation analysis suggested a polygenic mode of inheritance in Australian Terriers; indeed, no evidence for a large effect of a single gene was found in diabetic dogs (Mui et al. 2020). Similarly, Addison’s disease (AD), or primary hypoadrenocorticism, is a polygenic autoimmune disorder characterized by adrenal insufficiency that follows the destruction of the adrenal cortex. Although the risk for AD is higher in some breeds such as the Bearded Collie, the disorder is commonly observed in both purebred and mixed-breed dogs (Gershony et al. 2020).

Here we review the most relevant studies on dog–human translational genomics and summarize the data accessible from publicly available canine databases. The aim is to offer a comprehensive overview of the knowledge and resources currently available to researchers in the field.

Dog as translational model

In human disease research, the murine model is the most extensively used (Rosenthal and Brown 2007) to unveil mechanisms underlying disorders, to test the efficacy of drugs (Justice and Dhillon 2016), and to study the fundamentals of cancer biology (Gordon et al. 2009). However, this species shows large physiological differences when compared to humans, such as size and metabolic rate. Body size is correlated with different life-history traits; indeed, the two species show large differences in reproductive parameters such as age at reproductive maturity, length of gestation, litter size, birth interval, fraction of energy devoted to reproduction, and life expectancy (Perlman 2016). Also, murine and human cells differ in mitochondrial density (Hulbert and Else 2005), metabolic rate, and fatty acid composition of their membrane phospholipids (Hulbert 2008). The differences between humans and mice also reflect interactions with environmental factors (e.g., food sources) and other species (e.g., microbiota and pathogens) (Perlman 2016). Moreover, mice are kept in artificial and highly controlled conditions; therefore, it is hard to replicate the environmental elements that may play a crucial role in determining the occurrence of the disease. Conversely, in dogs, many diseases show pathophysiological and clinical features similar to the human counterpart. This is the case, for instance, for some canine cancers (LeBlanc and Mazcko 2020), whose prevalence is high in some pure breeds because of genetic bottlenecks (Capodanno et al. 2022). It has been estimated that nearly 27% of purebred dogs die of cancer (Dobson 2013), with a morbidity rate over ten times higher than in humans (Capodanno et al. 2022).

Notably, dogs share the same environment with humans; therefore, they are exposed to millions of antigens (Dow 2020) and to chemical risk factors for human diseases, such as cigarette smoke and pesticides (John and Said 2017). Moreover, the canine gut microbiome is more similar to the human microbiome than that of other species such as mice or pigs. In fact, the dog and human microbiomes respond in a highly similar way to dietary changes, suggesting that dog studies may be predictive of the results obtained in humans (Coelho et al. 2018).

Thus, the dog represents a more reliable, naturally occurring model for translational comparative studies than mice, also applicable in clinical trials for chemotherapy and immunotherapy (Tsamouri et al. 2021). Other advantages of using canine animal models, as reported by Bujak et al. (2018), are summarized in Fig. 1. Many known cases of breed-specific cancer susceptibility, or apparent protection from a particular disorder, have been reported. Golden Retriever, Boxer, French Bulldog, Boston Terrier, and Rat Terrier breeds display an increased risk of developing central nervous system cancers, while some breeds such as Cocker Spaniel and Doberman Pinscher are at low risk (Song et al. 2013). Among the types of cancer, glial tumors more frequently affect brachycephalic breeds, whereas dolichocephalic breeds are more prone to meningiomas (Song et al. 2013). In translational oncology studies, the dog is considered a useful model for human osteosarcoma. Indeed, canine osteosarcoma is another common example of cancer with breed-specific susceptibility and relatively high incidence. It is considered the most common primary bone tumor in dogs (Davis and Ostrander 2014), often observed in breeds characterized by large body size and long limbs such as the Leonberger (Letko et al. 2020) and the Great Dane (Dobson 2013).

Fig. 1 Why use the domestic dog as a model for human translational medicine studies

Successful examples of human–dog translational genomics

Decades of studies using the dog genome have led to the identification of hundreds of pathogenic mutations. More than 500 canine disorders have been proposed as counterparts of human diseases (Switonski 2014). The most common dog diseases studied in the past 30 years for which the causal mutation for the human counterpart has been identified are reported in Supplementary Table 1. Among the pure breeds used as a model, the Labrador Retriever proved to be the most valuable, enabling researchers to identify single nucleotide variants (SNVs) responsible for at least 28 canine disorders (Fig. 2). Golden Retriever, German Shepherd, Beagle, and Border Collie are other important breeds for clinical research that allowed the discovery of genetic variants for 16, 15, 14, and 12 canine disorders, respectively. More information on the contribution of single breeds to medical research is provided in Supplementary Tables 1 and 2, and Fig. 2.

Fig. 2 Dog breeds used as a model to discover mutations associated with genetic disorders for which the human counterpart has been identified or proposed. Breeds for which fewer than 4 diseases were studied are not reported but are listed in Supplementary Table 2

The main disorders for which the genetic variants have been identified in both dogs and humans can be classified into 16 clusters (Fig. 3 and Supplementary Table 3). Eye and ear disorders are the most extensively studied group of diseases (more than 50 disorders), followed by neurological (n = 45), musculoskeletal (n = 40), metabolic (n = 32), cutaneous (n = 27), lysosomal (n = 20), and blood diseases (n = 18). The remaining groups of disorders (i.e., immune disorders, cancer, and respiratory disorders) account only for 8 studied diseases, followed by gastrointestinal and neuromuscular diseases (n = 5) and heart and reproductive disorders (n = 3).

Fig. 3 The main types of disorders for which the genetic variants have been identified in dogs and humans

In the following subsections, we summarize the main findings for the disease classes in which the mutations causing the human counterpart have been identified.

Eye and ear diseases

Retinitis pigmentosa (RP) is a blinding eye disorder affecting approximately two million people worldwide. A class of common canine disorders known as progressive retinal atrophies (PRA) proved to be an appropriate model for the study of this human ocular disease (Bunel et al. 2019), as dogs affected with PRA develop symptoms similar to RP. Recently, a whole-genome sequencing (WGS) study of affected Lapponian Herders revealed a missense variant (g.5648046C>T, c.3176G>A, p.R1059H) in the intraflagellar transport 122 gene (IFT122) (OMIA#002320-9615), which impairs the protein function. This finding points to IFT122 as a potential candidate gene for identifying the biological mechanisms underlying human RP (Kaukonen et al. 2021).

Mutations in the lipoxygenase homology PLAT domains 1 gene (LOXHD1) (OMIA#002336-9615) are known to be involved in hearing loss in humans and mice. The gene was studied in Rottweiler dogs affected by non-syndromic hearing loss, an important medical problem whose causes are still unknown. As observed in other species, the affected dogs carried a missense variant (g.44806821G>C, c.5747G>C, p.G1914A) in LOXHD1, a gene essential for cochlear hair cell function (Hytönen et al. 2021a).

Neurological diseases

Dogs affected by degenerative myelopathy, which is caused by two SNVs (c.118G>A, p.E40K; c.52A>T, p.Thr18Ser) in the superoxide dismutase type 1 gene (SOD1) (OMIA#000263-9615) (Zeng et al. 2014), were proposed as an animal model for human amyotrophic lateral sclerosis. A study of Dutch Markiesje dogs led to the identification of a frameshift mutation at the fourth codon in SOD1. The shifted coding sequence generates a stop codon at the tenth codon. This mutation is responsible for an autosomal recessive form of juvenile paroxysmal dyskinesia, laying the basis for the study of human juvenile progressive spastic tetraplegia and axial hypotonia (Mandigers et al. 2021).

The Parson Russell Terrier is a breed prone to developing a peculiar neurodegenerative disorder characterized by the onset of severe seizures at about 4 months of age that sometimes evolves into fatal status epilepticus. Analysis of the pitrilysin metallopeptidase 1 gene (PITRM1) (OMIA#002324-9615) from affected dogs identified a homozygous deletion of 6 nucleotides (g.32188565_32188570del, c.175_180del, p.L59_S60del) likely responsible for the disease. This result suggests the dog as a model for neurodegenerative disease with mitochondrial respiratory deficiency and severe epileptic encephalopathy (Hytönen et al. 2021a).

Musculoskeletal diseases

In humans, the most common genetic muscle disorders are the congenital muscular dystrophies and the congenital myopathies (Bönnemann et al. 2014), which are caused by mutations in the collagen VI (COL6), the laminin subunit alpha 2 (LAMA2), the LARGE xylosyl- and glucuronyltransferase (LARGE), the selenoprotein N (SEPN1), and the ryanodine receptor 1 (RYR1) genes (Shelton et al. 2021). Similarly, dogs affected by muscular dystrophy showed mutations in the collagen type VI alpha 3 chain (COL6A3) (OMIA#002274-9615) (g.48007994C>T, c.6210+1G>A; g.48014962G>A, c.4726C>, p.R1576*) (in Labrador Retrievers) (Bolduc et al. 2020), in the collagen type VI alpha 1 chain (COL6A1) (c.289C>T, p.Q97*) (OMIA#001967-9615) (in Landseer dogs) (Steffen et al. 2015), in LARGE (g.30357716C>T, c.1363C>T, p.R455*) (OMIA#002460-9615) (in Labrador Retrievers) (Shelton et al. 2022), and in LAMA2 (g.67883271G>A, c.3285G>A, p.W1095*; g.67734331-67736575del) (OMIA#002459-9615) (in Staffordshire terriers) (Shelton et al. 2022).

Metabolic diseases

Diabetes mellitus is a metabolic disorder commonly diagnosed in dogs. SNVs affecting immune response and cytokine genes were associated with increased susceptibility to the disease in several breeds (Catchpole et al. 2013; Short et al. 2007; 2009; 2010). In humans, mutations in the proopiomelanocortin gene (POMC) are linked to obesity and consequently to a higher risk of obesity-related diseases such as type 2 diabetes (Farooqi et al. 2006). A recent study on Labrador Retrievers reported a 14-bp deletion in the POMC gene (g.19431807_19431821del, p.E188fs) (OMIA#001258-9615) that is involved in food motivation and obesity (Raffan et al. 2016). However, no association was found between the presence of this mutation and canine diabetes mellitus (Davison et al. 2017).

Cutaneous diseases

Canine junctional epidermolysis bullosa is characterized by ulcers of the skin, footpads, oral mucosa, and gastrointestinal tract, and often requires euthanasia of the affected dog. In Australian Shepherds, the disorder was associated with a missense mutation in the laminin subunit beta 3 gene (LAMB3) (g.8286613A>G, c.1174T>C, p.(C392R)) (OMIA#002269-9615), which leads to a recessive form of junctional epidermolysis bullosa (Kiener et al. 2020). Similarly, genetic variants in LAMB3 and other genes, such as the laminin subunit alpha 3 gene (LAMA3) and the laminin subunit gamma 2 gene (LAMC2), have also been identified in human epidermolysis bullosa (Has et al. 2020).

Lysosomal diseases

Among the neurodegenerative lysosomal storage diseases, canine GM2-gangliosidosis is a fatal disorder caused by a 3-base pair deletion in the hexosaminidase subunit beta gene (HEXB) (g.57243656_57243658del, c.849_851del, p.L284del) (OMIA#001462-9615). There is no cure for the disease, and affected dogs are euthanized. The mutation was found in affected Shiba Inu dogs, suggesting the dog as a possible model for this disorder in humans (Wang et al. 2018).

Blood diseases

Hereditary methemoglobinemia is a rare autosomal recessive disorder in animals characterized by a deficiency of nicotinamide adenine dinucleotide (NADH)-cytochrome b5 reductase (CYB5R3), which leads to an increase in the concentration of oxidized hemoglobin. Clinical manifestations in dogs include cyanosis of the oral mucous membranes, tongue, and skin (Shino et al. 2018). The disorder was studied in Pomeranian dogs, in which researchers found a missense mutation (c.580A>C, p.I194L) in the cytochrome b5 reductase 3 gene (CYB5R3) (OMIA#002131-9615) associated with the disorder (Otsuka-Yamasaki et al. 2021). Similarly, human hereditary methemoglobinemia is due to CYB5R3 gene variants causing a missense mutation or a truncated protein because of the presence of premature termination codons and incorrect exon–intron splicing sites (Percy and Lappin 2008).

Immunological diseases

Hereditary myeloperoxidase deficiency is a rare human disorder (estimated frequency of 1/1000 to 1/4000 people) (Parry et al. 1981) caused by mutations in the myeloperoxidase gene (MPO), which lead to a reduction or absence of myeloperoxidase activity in neutrophils and monocytes (Kizaki et al. 1994) and increase susceptibility to fungal infections such as candidiasis (Aratani et al. 1999). In an affected Italian hound dog, a homozygous mutation in the MPO gene (c.1987C>T, p.R663*) (OMIA#002028-9615) was found, leading to a premature termination codon (Gentilini et al. 2016).

Cancer

Pediatric colorectal polyps are recorded in hamartomatous polyposis syndrome, an autosomal dominant human genetic disorder, characterized by gastrointestinal polyps and symptoms such as rectal bleeding and prolapse (Bronner 2003). Canine colorectal hamartomatous polyposis and ganglioneuromatosis was studied in a Great Dane puppy showing clinical signs of hematochezia, tenesmus, and rectal prolapse (Bemelmans et al. 2011). Genetic analysis revealed a duplication in the phosphatase and tensin homolog gene (PTEN) (OMIA#001515-9615). This mutation also characterizes human Cowden syndrome (OMIM#158350), where patients are at greater risk of colon, breast, thyroid, and endometrial cancer (Bemelmans et al. 2011).

A chromosomal translocation producing a breakpoint cluster region protein-tyrosine-protein kinase gene hybrid (BCR-ABL) (OMIA#002299-9615) was reported as a cause of lethal chronic monocytic leukemia in a mixed-breed dog. A similar mutation leads to the “Philadelphia” chromosome abnormality recognized in human chronic myelogenous leukemia (Cruz Cardona et al. 2011).

Respiratory diseases

Canine idiopathic pulmonary fibrosis (CIPF), which shares several clinical and pathological features with human idiopathic pulmonary fibrosis, has been proposed as a possible disease model for human patients. CIPF is a chronic and progressive fibrotic disorder affecting the lungs of dogs, with a particularly high incidence in the West Highland White Terrier breed (Heikkilä et al. 2011). Although the general causes triggering the disease are still unknown, a recent genome-wide association study (GWAS) identified genetic risk factors located in the cleavage and polyadenylation specificity factor subunit 7 (CPSF7) and the succinate dehydrogenase complex assembly factor 2 (SDHAF2) genes (Piras et al. 2020).

Gastrointestinal diseases

Human Imerslund-Gräsbeck syndrome is a condition caused by congenital cobalamin malabsorption, characterized by an autosomal recessive pattern of inheritance. Two different mutations in the cubilin gene (CUBN) (p.P1297L and p.P337L) have been identified in human patients (Aminoff et al. 1999; Storm et al. 2011). Similarly, the canine disease is caused by a mutation in the cubilin gene (c.8392delC, p.Gln2798Argfs) (OMIA#001786-9615) identified in Komondor dogs exhibiting failure to thrive, inappetence, vomiting and/or diarrhea, and weakness due to selective cobalamin malabsorption (Fyfe et al. 2018).

Neuromuscular diseases

Congenital myasthenic syndromes are rare neuromuscular diseases occurring in dogs, characterized by the disruption of signal transmission across the neuromuscular junction, leading to skeletal muscle weakness. The disorder has been studied in a few breeds. The first research was carried out in Old Danish Pointing Dogs, where a missense mutation (g.1484906G>A, c.85G>A, p.V29M) in the choline O-acetyltransferase gene (CHAT) (OMIA#002072-9615) was identified (Proschowsky et al. 2007). More recently, the disorder was studied in Labrador Retrievers. Affected dogs were homozygous for a missense variant in the collagen like tail subunit of asymmetric acetylcholinesterase gene (COLQ) (c.1010T>C, p.I337T) (OMIA#001928-9615) (Rinz et al. 2014). Similarly, the disorder in Golden Retrievers is associated with a COLQ mutation (g.27175559G>A, c.880G>A, p.G294) (Tsai et al. 2020). Congenital myasthenic syndromes are also observed in humans. Some patients were found to be homozygous for an identical COLQ mutation (c.1010T>C) (Matlik et al. 2014).

Heart diseases

Several forms of human cardiomyopathy are caused by variants in the phospholamban gene (PLN) (Van der Heijden and Hassink 2013). Familial dilated cardiomyopathy is one of the most commonly diagnosed cardiac disorders, characterized by cardiac enlargement and decreased myocardial function (Wilcox and Hershberger 2018). The canine counterpart was studied in Welsh Springer Spaniels, where the disease is characterized by a highly penetrant dilated cardiomyopathy resulting in sudden death. Affected dogs showed a missense mutation in the PLN gene (g.58588129C>T, c.26G>A, p.R9H) (OMIA#002195-9615) (Yost et al. 2019) identical to that in humans (Medeiros et al. 2011).

Reproductive diseases

Among the causes of gonadal carcinogenesis, infertility, and sterility, disorders of sex development are one of the most important classes studied (Nowacka-Woszuk et al. 2020). In human patients with disorders/differences of sex development and infertility, insertions and deletions in the nuclear receptor subfamily 5 group A member 1 gene (NR5A1) (p.Ser4*, p.Cys55Ser, p.Met78Leu, and p.Met98Glyfs*45) were described (Fabbri-Scallet et al. 2020). Moreover, NR5A1 deletions were observed in patients with XY disorders of sex development with female external genitalia, no uterus, and dysgenetic testes (Shojaei et al. 2017). Similar mutations were observed in dogs; indeed, a large deletion of four exons of the NR5A1 gene (OMIA#002296-9615) was identified in a Yorkshire Terrier with rudimentary penis, hypospadias, bilateral cryptorchidism, and spermatogenically inactive testes (Nowacka-Woszuk et al. 2020).

Dog genomic data handling and available resources

In human genetics, the 1000 Genomes Project (1KGP) represents a milestone in providing a global picture of human genetic variability. The 1KGP consortium made their data publicly available to the worldwide scientific community committed to the study of human genetic variation through freely accessible public databases (Siva 2008).

Several projects aimed at providing canine genomic databases and catalogs, as has been done for human genome studies, have been completed or are still ongoing (Table 1). Improvements in next-generation sequencing (NGS) technologies have produced an enormous amount of genomic data from domestic dogs and wild gray wolves, often freely available for evolutionary, zootechnical, or medical research.

Table 1 Projects and resources available for dogs’ genomics research

Dog 10K genomes project (dog10K)

This project aims to coordinate the global effort on genome sequencing in dogs and to build a data collection consisting of the genomes of 10,000 dogs. To reduce sequencing costs, the project currently plans to sequence each sample at ≅ 4X coverage. Despite the low sequencing coverage, combining data across thousands of samples should allow highly accurate genotype imputation for each sample (Wang et al. 2019).
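The trade-off between coverage and genotype accuracy can be illustrated with a back-of-the-envelope calculation. The sketch below is not part of the Dog10K pipeline; it assumes, for illustration only, that read depth at a site is Poisson distributed, that reads sample the two alleles of a heterozygous site with equal probability, and that there are no sequencing errors. Under these assumptions the reads carrying each allele are independent Poisson variables with mean c/2 (c being the mean coverage), so the probability of seeing both alleles at least once is (1 - exp(-c/2))^2. The function name is hypothetical.

```python
import math


def prob_both_alleles_seen(mean_coverage: float) -> float:
    """Probability that both alleles of a heterozygous site are each
    covered by at least one read (idealized Poisson model, no errors)."""
    per_allele_mean = mean_coverage / 2.0
    # P(at least one read for a given allele) = 1 - exp(-c/2);
    # the two alleles are treated as independent.
    return (1.0 - math.exp(-per_allele_mean)) ** 2


if __name__ == "__main__":
    for coverage in (4.0, 10.0, 24.0):
        p = prob_both_alleles_seen(coverage)
        print(f"{coverage:>4.0f}X: P(both alleles observed) = {p:.4f}")
```

Under this simplified model, about a quarter of heterozygous sites would not show both alleles at 4X, which gives an intuition for why low-coverage data depend on imputation from many samples, whereas at higher coverage the probability approaches 1.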

Dog Aging Project (DAP)

The DAP started in 2016 and is an example of a geroscience project that shares some parallels with the Golden Retriever Lifetime Study, although with much greater phenotypic diversity and with the collection and analysis of big data. Indeed, the project is based on the retrieval of dog data through the coordinated efforts of owners, veterinarians, researchers, and volunteers. DAP relies on these communities to follow 10,000 dogs across the USA over a 10-year period with the goal of unveiling the biological, lifestyle, and environmental factors that maximize dogs’ health and longevity. The governance is based at the University of Washington and Texas A&M College of Veterinary Medicine and Biomedical Sciences. Several studies were carried out by the project; the most recent found that canine cognitive dysfunction, a canine counterpart of human Alzheimer’s disease, correlates with amyloid-beta 42 levels in the dog brain (Urfer et al. 2021).

Dog Biomedical Variant Database Consortium (DBVDC)

This database consists of a list of functionally annotated SNPs detected through WGS of 582 dogs from 126 breeds and 8 wolves.

The main technical details are listed below:

  • Approximate sequencing coverage: about 24X (minimum coverage 10X).

  • Genomic variants identified: 23,133,692 SNPs and 10,048,038 short indels (including 93% undescribed variants).

  • Variant effect classification: 247,141 SNPs and 99,562 short indels have impact on 11,267 protein-coding genes.

  • Loss-of-function variants discovered: each genome contains heterozygous loss-of-function variants in 30 potentially embryonic lethal genes and 97 genes associated with developmental disorders.

The catalog has already been used to unravel the genetic background for more than 50 inherited disorders and traits. The DBVDC is updated approximately every 3–6 months with new sequencing runs.

Dog Genome SNP Database (DoGSD)—iDOG

The DoGSD is a web-based, open-access resource (comprising around 19 million high-quality whole-genome SNPs), created as a repository for information on dog and wolf genome variation (CNCB-NGDC Members and Partners 2021). The database provides the research community with a variety of data services; its main feature is the SNP detector and visualization tool. In addition, DoGSD incorporates information such as SNP annotation, summary lists of SNPs located in genes, sampling location, and breed information.

The main resources available from the website are described below.

  • Genomes

    The reference genomes for dog, dhole (Cuon alpinus), and wolf can be accessed via FTP (file transfer protocol) along with annotated genes and proteins.

  • Breeds, Disease, and Genotype–Phenotype pairs (G2P)

    Breeds section contains phenotypic information for 481 breeds curated from different kennel clubs (AKC, CKC, UKC, and FCI) and Wikipedia. Breed-specific diseases are also reported.

    Disease section reports information on 806 canine diseases retrieved from different databases (OMIA, CIDD, and The Dog Place website). Each disease is accompanied by a short description of the clinical signs, the associated gene(s) (if known), and the related literature.

    Genotype–Phenotype pairs (G2P) is an SNP detector and visualization tool containing 71 million non-redundant SNPs called from 722 individual samples, and 6 million ancient SNPs from 27 individual samples. The information includes chromosome, position, dbSNP rsid, breed, disease trait, effect allele, OR value, P value, reported gene symbol, and PMID.

  • Gene expression

    This section contains data from 62 gene expression profiles (RNA-Seq) projects and 1198 experiments collected from NCBI. Analyses were carried out on different tissues and cell lines.

  • Single cell

    This section reports single-cell RNA sequencing data from post-mortem hippocampus brain tissue from a 6-month-old Beagle, divided into the following three sections.

    Single cell cluster. Uniform Manifold Approximation and Projection of transcript profiles from 105,057 hippocampus cells. For each record, the associated cluster, type of cell, gene name, log2 fold change, P value, and adjusted P value are reported.

    Gene marker explorer. Heatmap for 345 genes differentially expressed in 9 cell types from hippocampus. For each gene, the cell type and the mean differentially expressed gene value are reported.

    Gene in situ map. Map of 11 genes in hippocampus tissues obtained by in situ hybridization.

Golden Retriever Lifetime Study (GRLS)—Morris Animal Foundation

The GRLS is one of the largest canine health studies in the USA and aims to investigate the complex associations between dietary, genetic, and environmental risk factors influencing cancer and other important diseases in dogs. The project is based on retrieving health, environmental, and behavioral data from more than 3000 purebred Golden Retrievers. The database is updated annually via online questionnaires completed by dog owners, physical examinations, and collection of biological samples by primary care veterinarians (Guy et al. 2015). The collected data have already been used to study the effect of inbreeding on fertility (Chu et al. 2019), and the relationship between the timing of spay/neuter and the development of obesity and non-traumatic orthopedic injury (Simpson et al. 2019).

CanFam6: domestic dog reference genome

CanFam6 is currently the most up-to-date canine reference genome provided by the International Consortium of Canine Genome Sequencing (Wang et al. 2019), obtained by sequencing a female Boxer on a PacBio Sequel system. The breed was chosen following an analysis of 60 dog breeds which demonstrated that the Boxer is one of the breeds with the least amount of variation in its genome (Jagannathan et al. 2021).

The National Human Genome Research Institute (NHGRI) Dog Genome Project

The NHGRI Dog Genome Project collection consists of canine DNA samples, health histories, and pedigrees. The goal of the project is the identification of genetic variants associated with inherited diseases, morphological traits, and behavior. Three databases can be freely downloaded from the project’s webpage:

  • Locus Specific Genotypes

    The provided data consist of two Variant Call Format (VCF) files containing biallelic variants on chromosome 15. One file combines information from the WGS of 1161 dogs from 230 breeds, as well as from 141 indigenous and village dogs, while the second file contains the genotypes of 86 wild canids.

  • Genome-Wide Variant Discovery

    A single VCF file containing 91 million SNVs obtained from the WGS of 722 canids: 668 domestic dogs (528 from 144 established breeds, 36 mixed-breed dogs or dogs of unknown breed, and 104 village and feral dogs from diverse localities) and 54 wild canids from five species (Andean fox, coyote, dhole, golden jackal, and gray wolf) (Plassais et al. 2019).

  • SNP-based Population Studies

    A PLINK formatted file including genotypes from >150,000 SNPs in 1356 dogs and 9 wild canids (Parker et al. 2017).
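As a rough illustration of how the VCF resources listed above might be inspected, the following sketch filters biallelic SNVs from VCF-formatted text using only the Python standard library. It is a minimal example under stated assumptions, not the project's own tooling: the records are toy data invented for the example, the function and type names are hypothetical, and real files of this size are usually compressed and are better handled with dedicated VCF libraries or command-line tools.

```python
from typing import Iterable, Iterator, NamedTuple


class Variant(NamedTuple):
    chrom: str
    pos: int
    ref: str
    alt: str


def biallelic_snvs(lines: Iterable[str]) -> Iterator[Variant]:
    """Yield biallelic single-nucleotide variants from VCF text lines."""
    for line in lines:
        # Skip blank lines, meta-information (##) and the header (#CHROM).
        if not line.strip() or line.startswith("#"):
            continue
        # The first five tab-separated VCF columns are CHROM, POS, ID, REF, ALT.
        chrom, pos, _id, ref, alt = line.rstrip("\n").split("\t")[:5]
        if "," in alt:
            continue  # more than one ALT allele: not biallelic
        if len(ref) == 1 and len(alt) == 1 and ref in "ACGT" and alt in "ACGT":
            yield Variant(chrom, int(pos), ref, alt)


# Toy records (hypothetical positions, for illustration only).
toy_vcf = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "chr15\t1000\t.\tA\tG\t50\tPASS\t.",    # biallelic SNV: kept
    "chr15\t2000\t.\tC\tT,G\t50\tPASS\t.",  # multiallelic: skipped
    "chr15\t3000\t.\tGA\tG\t50\tPASS\t.",   # deletion: skipped
]

snvs = list(biallelic_snvs(toy_vcf))
print(snvs)
```

For an on-disk file, the same generator could be fed from an open text handle (for example one returned by `gzip.open(path, "rt")` for a compressed file), so that records are streamed rather than loaded into memory.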

OMIA—Online Mendelian Inheritance In Animals

Online Mendelian Inheritance in Animals (OMIA) is a catalog of inherited disorders, other traits, and genes in 319 animal species, edited by scientific staff of the University of Sydney (Lenffer et al. 2006). Information is stored in a free-access database that accounts for a total of 829 canine disorders/traits, divided into 4 sections: Mendelian trait/disorder; Mendelian trait/disorder with likely causal variant(s) known; likely causal variants; and potential models for human traits (accessed on December 28, 2021). Each disorder/trait is accompanied by a descriptive sheet containing information on the gene(s) involved (if known), possibly relevant human trait(s) and/or gene(s), mode of inheritance, molecular basis, main clinical features, prevalence, breeds commonly affected, and relevant literature.

Sequence Read Archive (SRA)—NCBI

The Sequence Read Archive is the largest international public repository of next-generation sequence data, hosted on NCBI servers (Leinonen et al. 2011). It stores BAM (Binary Alignment Map) and FASTQ files from the primary analysis phase of sequencing. The database stores billions of SRA records from thousands of species, including canine data from the aforementioned projects. However, searching for data can often be more time-consuming than retrieving files from the other “dog-specific” databases. Moreover, the data often lack detailed notes on phenotype, which can sometimes be retrieved from the related literature. Detailed information on the available data is reported in Table 2.

Table 2 Dog data available from SRA-NCBI (accessed on January 17, 2022, https://www.ncbi.nlm.nih.gov/sra)
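Because searching SRA by hand can be slow, a programmatic query may help. The sketch below only builds a search URL for the NCBI E-utilities ESearch endpoint and does not contact the server. The helper name and default values are hypothetical, and the field tags used in the query (`[Organism]`, `[Strategy]`) are assumptions that should be checked against the SRA search documentation before use.

```python
from urllib.parse import urlencode

# Public NCBI E-utilities ESearch endpoint.
EUTILS_ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_sra_search_url(organism: str, strategy: str = "WGS",
                         retmax: int = 20) -> str:
    """Build an ESearch URL listing SRA records for an organism.

    The query syntax is an assumption for illustration; adjust the
    field tags to match the SRA advanced-search fields.
    """
    term = f'"{organism}"[Organism] AND "{strategy}"[Strategy]'
    params = {"db": "sra", "term": term, "retmax": retmax, "retmode": "json"}
    return f"{EUTILS_ESEARCH}?{urlencode(params)}"


url = build_sra_search_url("Canis lupus familiaris")
print(url)
```

The resulting URL can be opened in a browser or fetched with any HTTP client; the returned record identifiers still need to be cross-checked against the related publications to recover phenotype information, as noted above.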

Main limitations of translational research from dogs to humans

Although canine clinical trials can be extremely beneficial when translated to human clinical studies, this approach is not exempt from limitations. Sebbag and Mochel (2020) pointed out the following issues when using dogs for translational research:

  • Dogs are expensive to purchase and to maintain in specific housing.

  • Molecular tool kits specific to dogs are limited compared with those for laboratory rabbits and rodents.

  • The use of dogs as experimental animal models is rightly subject to important ethical and public‐perception considerations.

  • The mixed genetic background (crossbreed) of most subjects affects the variability in disease characteristics such as severity and response to therapy. By contrast, inbred strains with highly reduced genetic variability, as well as knockout and transgenic strains, are available for rodents and rabbits.

  • Clinical trials involving dogs are limited by rigorous ethical review, unpredictable case enrollment, variability in disease phenotype among individual dogs, economic challenges (e.g., owner ability to provide care), and owner compliance.

Conclusions and perspectives

Innovative models for medical research are strongly required. In fact, in many areas of clinical research, laboratory rodents continue to be used as test subjects, despite the wide anatomo-physiological divergence that exists with humans (Sebbag and Mochel 2020). Although studies on mouse models have been suitable to unveil the basic physiopathological mechanisms of several disorders, such models cannot reproduce the complex biology behind some diseases, such as cancer, especially its recurrence and metastasis (Capodanno et al. 2022). On the other hand, hundreds of canine diseases share features with the human counterpart, such as protein and gene homology and pathophysiological mechanisms of initiation and progression. Furthermore, the two species often share drug targets, drug resistance, and potential prognostic and diagnostic biomarkers. This evidence points to the dog as the most suitable model for many human diseases (Tsamouri et al. 2021).

In view of the clinical and molecular similarities between canine and human diseases, different branches of translational medicine, such as comparative oncology, aim to study spontaneously occurring diseases in dogs to provide a more reliable model for human cancer (Capodanno et al. 2022). Through preclinical studies involving canine patients with spontaneous diseases, the expertise of veterinarians, physicians, and basic science researchers may soon be integrated under the umbrella of the One Health Initiative (Sebbag and Mochel 2020). A deeper understanding of canine pathology will benefit the clinical application of treatments in both dogs and humans.

Numerous projects are ongoing to create large, publicly available canine datasets with the aim of speeding up clinical research efforts. However, caution is needed when recording phenotypes to provide highly reliable data for clinical association studies. Most of the available genomic datasets are accompanied by limited phenotypic information, accounting only for breed type without taking into consideration other important records, such as dietary factors, environment, and clinical history of the dog. Although these data can be very useful in zootechnical research related to genetic-based improvement, they may be of limited use in the field of clinical research.