Introduction

Hearing impairment (HI) is a highly heterogeneous and common sensory disorder (Vona et al. 2020). The three major types of HI are conductive, sensorineural, and mixed (both conductive and sensorineural). Conductive HI is due to reduced ability of the external ear, middle ear, or both, to conduct sound, whereas sensorineural HI can be due to cochlear dysfunction, damage to stereocilia, or problems associated with vestibulocochlear nerve transmission to and from the brain and inner ear.

Congenital HI occurs in 1–2 per 1000 newborns globally (Vos et al. 2019; Vona et al. 2020). More than half of newborns who fail hearing screening have no identifiable risk factors and are presumed to have genetic HI (Vos et al. 2019). Of the genetic HI cases, 30% are syndromic and 70% are nonsyndromic (NS). For NSHI the modes of inheritance are autosomal recessive (AR) (~ 77%); autosomal dominant (AD) (~ 22%); X-linked (~ 1%); and mitochondrial (< 1%) (Irshad et al. 2005). ARNSHI is usually sensorineural, prelingual/congenital, severe to profound, non-progressive, and affects all frequencies. In contrast, ADNSHI is usually progressive, post-lingual, and mild to profound, often affecting the middle to high frequencies. To date, > 120 genes have been identified for NSHI (Adadey et al. 2020), with the majority of the genes implicated in ARNSHI. The vast majority of ARNSHI genes were localized and identified through the study of consanguineous pedigrees.

Characteristics of ARNSHI in consanguineous and outbred populations

For ARNSHI, both parents are expected to be carriers of causal variants. Their hearing impaired children can be either homozygous or compound heterozygous, depending on whether they inherit the same or different causal variants from each parent. When both parents are carriers of causal ARNSHI variants in the same gene, on average, ¼ of their children will be hearing impaired and ½ will be causal variant carriers. Children of parents who are either carriers of causal variants or have HI due to different ARNSHI genes do not have an increased risk of being hearing impaired. Offspring of two deaf parents that carry causal variants in the same gene will all have HI. For consanguineous pedigrees, a rare causal HI variant is more likely to enter the pedigree once. For example, for a pedigree segregating ARNSHI where the parents of the affected children are first cousins, the causal variant is more likely to enter the pedigree through one of their two shared grandparents than twice through both the maternal and paternal lineages (Fig. 1a). Therefore, for consanguineous pedigrees, it is usual to observe causal homozygous variants. In contrast, for an outbred pedigree, the causal variant will enter the pedigree twice through both a maternal and paternal grandparent (Fig. 1b). Although compound heterozygous variants are usually observed in outbred pedigrees, homozygous causal variants can also be detected, particularly those that are more frequent within a population, for example, GJB2 p.Gly12fs, which has an allele frequency of 1% in non-Finnish Europeans (Genome Aggregation Database Consortium et al. 2020).
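The segregation ratios described above can be checked by enumerating the four equally likely allele combinations that two parents can transmit. The following is a minimal sketch; the function name and the allele labels ("A" for wild type, "a" for a causal variant in the same gene) are illustrative assumptions, and full penetrance is assumed.

```python
from fractions import Fraction
from itertools import product


def offspring_distribution(parent1, parent2):
    """Return the probability of a child carrying 0, 1, or 2 causal variants.

    Each parent is a two-character string such as "Aa", where "a" denotes a
    causal variant and "A" the wild-type allele (labels are illustrative).
    """
    counts = {}
    # Each parent transmits one of two alleles with equal probability.
    for allele1, allele2 in product(parent1, parent2):
        n_causal = (allele1 + allele2).count("a")
        counts[n_causal] = counts.get(n_causal, 0) + 1
    total = sum(counts.values())
    return {n: Fraction(c, total) for n, c in counts.items()}


# Two carrier parents
carriers = offspring_distribution("Aa", "Aa")
p_affected = carriers.get(2, Fraction(0))    # two causal variants
p_carrier = carriers.get(1, Fraction(0))     # one causal variant
p_wild_type = carriers.get(0, Fraction(0))   # no causal variant

# Among unaffected children, the probability of being a carrier
p_carrier_given_unaffected = p_carrier / (p_carrier + p_wild_type)

print(p_affected, p_carrier, p_carrier_given_unaffected)  # 1/4 1/2 2/3

# Two affected parents with causal variants in the same gene
print(offspring_distribution("aa", "aa"))  # every child has two causal variants
```

The 2/3 carrier probability among unaffected children is the same quantity that later limits how much linkage information an unaffected child can provide.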

Fig. 1

ARNSHI in consanguineous and outbred pedigrees. Solid symbols represent affected individuals, clear symbols represent unaffected individuals. Squares are males and circles are females. Two parallel lines between parents indicate consanguinity. a Pedigree of a consanguineous family showing entrance of a causal allele once from a shared great-grandparent (indicated by a green arrow), which is more likely than the variant entering twice from the maternal and paternal lineages of the carrier parents (indicated by blue arrows). The variant could enter through either shared great-grandparent; one was selected for the purpose of illustration. b An outbred pedigree with a single affected child where the causal variant enters the pedigree twice through both a maternal and paternal grandparent (green arrows). The variant entering through one random grandparent is shown as an example; however, entrance of the variant could be from either grandparent. c An outbred pedigree with two branches with affected children showing that ARNSHI variants must enter three times (blue, green, and orange arrows). The variants could have also entered through the spouses of the individuals shown with arrows. d Pedigree showing several consanguineous matings within a pedigree which can lead to multiple branches with affected children

Outbred pedigrees segregating ARNSHI are usually nuclear, since to observe more than one branch with hearing impaired members, causal variants would have to be introduced more than twice to the pedigree. For example, to observe cousins from an outbred pedigree with ARNSHI, the sibling carrier parents would have to have children with spouses that are also causal variant carriers (Fig. 1c). For families that hail from populations where consanguinity is a common practice, it is not unusual to have several consanguineous matings within a single pedigree and for multiple branches of the family to have affected children (Fig. 1d). In addition, some populations where consanguinity is practiced also tend to have large families, which also impacts the ability to successfully identify the causal variant.

Early methods of ARNSHI gene identification: positional cloning via linkage analysis and homozygosity mapping

The first methods used to identify underlying genes in Mendelian diseases included positional and functional cloning (Fig. 2). Functional cloning relies on knowledge of a dysfunctional protein associated with disease, for example discovered via biochemical assays. For HI, the latter technique has only been successful in the study of deaf mice (Wang et al. 1998; Smith and Van Camp 1999). Conversely, positional cloning is a technique that focuses on the localization of the disease gene along the chromosome, without any prior knowledge needed on the gene’s product or function. Positional cloning has been the most successful technique used to identify novel HI genes in humans, including ARNSHI, especially before the availability of next-generation sequencing (NGS) (Friedman et al. 1995; Chaib et al. 1996; Wang et al. 1998; Yasunaga et al. 1999). In the process of positional cloning, early studies would generate a genetic map via genotyping panels of short tandem repeat polymorphic (STRP) (aka microsatellite) markers and the application of statistical methods (Collin et al. 2008) and later, when they became available, microarray panels of single-nucleotide polymorphism (SNP) markers (Basit et al. 2011). For ARNSHI, two approaches were then mainly used to map genomic loci linked to disease, i.e., linkage analysis and homozygosity mapping.

Fig. 2

A comparison of traditional positional cloning and NGS approaches to ARNSHI gene identification. a Traditional positional cloning with the identification of OTOF shown as an example. In the identification of OTOF, a small region of chromosome 2 was mapped via linkage analysis and a physical map was constructed using yeast artificial chromosomes (YACs), bacterial artificial chromosomes (BACs) and P1-derived artificial chromosomes (PACs). Following transcript mapping of all genes and expressed sequence tags (ESTs) in the region, candidate gene OTOF was identified. Further sequencing analysis of OTOF revealed the causal variant. b Recent approaches evaluate next-generation sequencing data via variant filtering based on plausible inheritance model, variant frequency, etc., to identify causal genes and variants

Linkage analysis to identify ARNSHI loci

In traditional parametric linkage analysis, genetic maps of STRPs or SNPs, together with information on mode of inheritance, penetrance, and allele frequencies, are used to localize Mendelian disease loci. This technique is based on the fact that genetic variants which are physically close on a chromosome segregate together during meiosis. Therefore, a genomic locus containing a causal variant can be statistically linked to disease by interrogating nearby markers, and the strength of evidence for linkage (or lack thereof) can be evaluated through the estimation of a logarithm of the odds (LOD) score. Due to high levels of locus heterogeneity for NSHI, the study of multiple families will not lead to significant results, even when methods that allow for heterogeneity are applied (Ott 1983). Therefore, it was important to be able to analyze families that can independently establish linkage, i.e., LOD score ≥ 3.0 (Morton 1955), which was later revised to a LOD score ≥ 3.3 (Lander and Kruglyak 1995). For outbred pedigrees, it is nearly impossible to reach the significance threshold using a single pedigree. Because they are unphased, it is impossible to determine from which two grandparents the carrier parents received a causal variant. Therefore, the first affected child provides no linkage information and families with only a single hearing impaired child are uninformative (Fig. 3a). For (1) a rare causal variant; (2) a marker in perfect linkage disequilibrium (LD) (Θ = 0) with the causal variant; or (3) multiple markers that form a rare haplotype that tags the causal variant, each additional affected child adds 0.6 to the LOD score and each additional unaffected child adds 0.125. Therefore, for example, a nuclear outbred pedigree would need to have six affected children available for study to obtain a LOD score of 3.0 (Fig. 3b), an event which is unlikely to be observed.
In contrast, phase information is available for a consanguineous pedigree, since for a rare causal variant, the probability is higher that it entered the pedigree once, for example, through one of the great-grandparents for a first-cousin union (Fig. 1a) or through the great-great-grandparents for a second-cousin union, than through two founder pedigree members. Since phase information is available, not only is there linkage information for the first affected child that is obtained from the meioses received from the mother and father, but in the case of a first-cousin union the meioses from the grandparents to the parents are also informative (Fig. 3c). Therefore, when analysis is performed either directly using (1) the rare causal variant; (2) a marker in perfect LD with the causal variant; or (3) multiple markers that form a rare haplotype by tagging the causal variant, a consanguineous pedigree with a single affected individual who is the offspring of a first-cousin mating can provide a LOD score of 1.2 (Fig. 3d) and if the child is the offspring of a second-cousin union the pedigree can provide a LOD score of 1.8. It can be observed that each informative meiosis for this situation provides a LOD score of 0.3. For either of these pedigree structures, each additional affected child adds 0.6 and each unaffected child adds 0.125 to the LOD score (Fig. 3e). The reason why an unaffected child provides so little linkage information compared to an affected child is that it cannot be determined whether or not they are a disease variant carrier; the LOD score calculation must be made using 2/3 probability that they are a causal variant carrier and 1/3 probability that they are homozygous wild type. In contrast, for an affected child, the probability is 1.0 that they are carriers of two causal variants (Fig. 3f) (Ott et al. 2015).
Allele frequencies will also impact the LOD scores with the LOD decreasing with increasing allele frequencies, because as the causal variant allele frequency increases so too does the probability that it entered the pedigree more than once (i.e., through two founders) with more possibilities for the variant to enter a second time when consanguineous parents are more distantly related. It should be noted that consanguineous pedigrees will not provide any additional information when they segregate Mendelian traits with modes of inheritance other than AR.
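The LOD contributions quoted above are numerically consistent with simple closed forms at Θ = 0: log10(2) ≈ 0.3 per informative meiosis, log10(4) ≈ 0.6 per affected child (two meioses), and log10(4/3) ≈ 0.125 per unaffected child. The sketch below reproduces the worked numbers under those assumptions (rare, fully penetrant causal variant, perfectly informative marker). The function names are illustrative, the meiosis counts of 4 and 6 are inferred by dividing the quoted LOD scores by 0.3, and the sketch is not a substitute for a linkage analysis program.

```python
import math

LOD_PER_MEIOSIS = math.log10(2)         # ~0.301 per informative meiosis
LOD_PER_AFFECTED = math.log10(4)        # ~0.602, two informative meioses
LOD_PER_UNAFFECTED = math.log10(4 / 3)  # ~0.125, carrier status is uncertain


def max_lod_outbred(n_affected, n_unaffected=0):
    """Maximum LOD for a nuclear outbred pedigree at theta = 0.

    The first affected child only establishes phase and contributes nothing.
    """
    if n_affected < 1:
        raise ValueError("at least one affected child is required")
    return ((n_affected - 1) * LOD_PER_AFFECTED
            + n_unaffected * LOD_PER_UNAFFECTED)


def max_lod_consanguineous(meioses_first_affected, n_affected=1, n_unaffected=0):
    """Maximum LOD when the variant entered once through a common ancestor.

    meioses_first_affected: informative meioses for the first affected child,
    e.g. 4 for a first-cousin union and 6 for a second-cousin union (values
    inferred from the LOD scores of 1.2 and 1.8 quoted in the text).
    """
    if n_affected < 1:
        raise ValueError("at least one affected child is required")
    return (meioses_first_affected * LOD_PER_MEIOSIS
            + (n_affected - 1) * LOD_PER_AFFECTED
            + n_unaffected * LOD_PER_UNAFFECTED)


print(round(max_lod_outbred(6), 2))          # six affected siblings, outbred
print(round(max_lod_consanguineous(4), 2))   # first-cousin union, one affected
print(round(max_lod_consanguineous(6), 2))   # second-cousin union, one affected
print(round(max_lod_consanguineous(4, n_affected=3, n_unaffected=2), 2))
```

Under these assumptions a first-cousin pedigree with three affected and two unaffected children already approaches a LOD of about 2.7, whereas an outbred nuclear family needs six affected children to reach 3.0.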

Fig. 3

Linkage analysis in ARNSHI pedigrees. Solid symbols represent affected individuals, clear symbols represent unaffected individuals. Squares are males and circles are females. A double line between parents indicates consanguinity. a An outbred pedigree with a single affected child is unphased and uninformative for linkage as it is unknown which grandparent is a carrier of the variant. b A nuclear outbred pedigree showing affected children with carrier parents. In this case, the first affected individual is uninformative; however, each additional affected sibling can contribute a maximum of 0.6 to the LOD score, and 6 affected children are necessary to obtain a LOD score of 3.0. c Example of a pedigree providing information from meioses from the grandparents to the parents in a first-cousin union (blue arrows). Each informative meiosis provides a LOD score of 0.3. d Pedigree with one affected individual who is the offspring of a first-cousin mating. This pedigree will provide a LOD score of 1.2 due to the informative meioses from the grandparents to the parents (blue arrows) and the meioses from the parents to the affected child (green arrows). Each meiosis adds 0.3 to the LOD score. e Each affected child adds 0.6 (green arrows) to the LOD score and each unaffected child adds 0.125 (light-green arrows) to the LOD score. f Pedigree showing the probability of the causal variant genotype of each child. The probabilities are not influenced by consanguinity

Homozygosity mapping to identify ARNSHI loci

Another approach which was used to analyze genotype data to map ARNSHI loci is homozygosity mapping, which examines the genome for runs of homozygosity (ROH) (Lander and Botstein 1987). For this method to detect homozygous causal variants, they must be surrounded by a ROH, making it an ideal approach to study consanguineous pedigrees. However, it can also be used to study outbred families. In the latter case, detection of homozygous regions has been successful when the parents are cryptically distantly related and are carriers of the same causal variant. If a causal variant is homozygous but the parents only share a very distant common ancestor, the ROH can be too small to detect. Homozygosity mapping can be performed on data obtained from a single affected individual. However, this will usually reveal many regions of homozygosity, with a greater number of ROH if the parents are more closely related. The number of candidate ROH can be reduced by analyzing data from multiple affected and unaffected family members. Unlike linkage analysis, most homozygosity mapping methods (Seelow et al. 2009) do not provide statistical evidence of a region containing a causal variant.
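The idea of scanning for ROH, and of narrowing the candidate regions with a second affected relative, can be illustrated with a toy example. This is a simplified sketch, not the algorithm of any published tool: genotypes are assumed to be coded as alternate-allele counts (0, 1, 2) at ordered markers, run length is measured in markers rather than physical or genetic distance, and genotyping errors are not tolerated. All names and data are hypothetical.

```python
def runs_of_homozygosity(genotypes, min_markers=3):
    """Return (start, end) index pairs (inclusive) of homozygous runs.

    genotypes: alternate-allele counts at ordered markers
               (0 or 2 = homozygous, 1 = heterozygous).
    """
    runs = []
    start = None
    for i, g in enumerate(genotypes):
        if g in (0, 2):
            if start is None:
                start = i          # a new run begins here
        else:
            if start is not None and i - start >= min_markers:
                runs.append((start, i - 1))
            start = None           # a heterozygous call ends the run
    if start is not None and len(genotypes) - start >= min_markers:
        runs.append((start, len(genotypes) - 1))
    return runs


def shared_homozygosity(affected):
    """Merge several affected relatives into one genotype track.

    A marker stays homozygous only if every affected individual is
    homozygous for the same allele; otherwise it is recoded as 1 so
    that it breaks any run.
    """
    merged = []
    for calls in zip(*affected):
        same = len(set(calls)) == 1 and calls[0] in (0, 2)
        merged.append(calls[0] if same else 1)
    return merged


# Hypothetical genotypes for two affected siblings at ten ordered markers
child1 = [0, 2, 2, 0, 1, 2, 2, 2, 0, 1]
child2 = [0, 2, 2, 0, 1, 2, 0, 2, 0, 1]

print(runs_of_homozygosity(child1))                                  # [(0, 3), (5, 8)]
print(runs_of_homozygosity(shared_homozygosity([child1, child2])))   # [(0, 3)]
```

In this toy data the single individual shows two candidate regions, and requiring both affected siblings to be homozygous for the same alleles leaves one.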

Candidate gene and variant identification

After an ARNSHI locus was mapped to a chromosomal location via linkage analysis or homozygosity mapping, or both, the search for the causal variant within this region began (Friedman et al. 1995; Wang et al. 1998). Before the draft or completed sequence from the Human Genome Project was available, a “physical map” was generated to pinpoint candidate genes within the mapped region. This physical map consisted of sets of overlapping DNA clones, such as yeast artificial chromosomes (YACs), that spanned the critical region (Fig. 2 and Table 1). This process became obsolete after the completion of the Human Genome Project, when genes were mapped to the full human genome. Candidate genes within the region mapped to disease were then selected to undergo sequencing step by step. This was a slow, laborious, and expensive process: since regions were almost always > 1 Mb and contained many genes, it would often take years to identify an ARNSHI gene.

Table 1 ARNSHI loci and genes

Analysis of next-generation sequence data to identify ARNSHI genes

In the past decade, exome sequencing (ES) and whole-genome sequencing (WGS) have rapidly become more accessible and cost-effective, and are currently the preferred methods of studying families that segregate Mendelian traits (Fig. 2). Identified variants can be annotated using, for example, ANNOVAR, which allows annotation with a large number of bioinformatic tools and resources such as Combined Annotation Dependent Depletion (CADD) (Liu et al. 2016), population-specific allele frequencies from the Genome Aggregation Database (gnomAD) (Genome Aggregation Database Consortium et al. 2020), variant classification information from ClinVar (Landrum et al. 2018), and custom datasets such as large-scale screens for HI in mice (Ingham et al. 2019). These annotations have proven very effective in aiding the identification of causal variants.

Copy number variants (CNVs), a type of structural variant (SV), can also be identified using NGS. Other SVs, such as translocations and gene fusions, which are often large (50 bases to > 1 kilobase), have been harder to identify, although currently evolving strategies for their identification have seen some success in cancer and neurological diseases using long-read sequencing, as opposed to the standard short-read sequencing commonly used for exomes. Certain ARNSHI genes are known to commonly harbor pathogenic CNVs, such as OTOA and STRC (Shearer et al. 2014).

Considering all the above advantages, research strategies for ARNSHI cohorts have increasingly taken the approach of analyzing exome sequence data without performing linkage analysis. The trend is increasingly observed for genes identified from 2014 onwards (Table 1). With the development of more effective variant calling tools, and further accessibility to cost-effective ES, more samples from individuals with HI are being exome sequenced in search of novel, rare coding variants with potentially high impact on function.

Prior to the analysis of sequence data, the most plausible inheritance models for the pedigrees should be determined. For ARNSHI, variants are generally filtered, retaining those with a minor allele frequency (MAF) of, for example, ≤ 0.005 in every population that are either homozygous or potentially compound heterozygous, with the exception of a few known population-enriched variants with a higher MAF, such as p.Gly12fs in GJB2 (Chakchouk et al. 2019). When variants detected via ES are potentially compound heterozygous, the genotypes of the parents are necessary to determine if the variants are in trans or cis. On rare occasions, parental genotyping may show that one of the ARNSHI variants has arisen de novo, in which case phasing of variants may require additional follow-up (e.g., long-read sequencing and ddPCR). For each family, if DNA samples from all family members have not undergone NGS, the segregation of the identified variant with the affection status of the family members needs to be verified using, for example, Sanger sequencing. Lastly, due to the high interaction among proteins of the ear sensory epithelia and hair cells, digenic and polygenic inheritance have also been described for HI (Schrauwen et al. 2018; Khalil et al. 2020). The definition of digenic inheritance in the literature can be variable, but for HI, genes have often been reported in a classic digenic model, for example, where two trans heterozygous variants in two genes are required for the expression of a phenotype (Schrauwen et al. 2018).
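The filtering logic described above (rare in every population, homozygous or potentially compound heterozygous, with an exemption for known population-enriched variants, followed by parental phasing) can be sketched as follows. This is an illustrative sketch only: the record layout, field names, gene labels GENE_A and GENE_B, and variant descriptions other than GJB2 p.Gly12fs are hypothetical, and real pipelines work on annotated VCF data with many additional filters.

```python
MAF_THRESHOLD = 0.005
# Known population-enriched pathogenic variants exempt from the MAF filter
# (the text names GJB2 p.Gly12fs as one example).
MAF_EXEMPT = {("GJB2", "p.Gly12fs")}


def passes_frequency(variant):
    """True if the variant is rare in every population or is exempt."""
    if (variant["gene"], variant["change"]) in MAF_EXEMPT:
        return True
    return all(maf <= MAF_THRESHOLD for maf in variant["pop_maf"].values())


def recessive_candidates(variants):
    """Return genes with a homozygous or >= 2 heterozygous rare variants."""
    by_gene = {}
    for v in variants:
        if passes_frequency(v):
            by_gene.setdefault(v["gene"], []).append(v)
    candidates = {}
    for gene, kept in by_gene.items():
        n_het = sum(1 for v in kept if v["genotype"] == "het")
        if any(v["genotype"] == "hom" for v in kept):
            candidates[gene] = "homozygous"
        elif n_het >= 2:
            candidates[gene] = "potential compound heterozygous"
    return candidates


def phase_from_parents(a, b):
    """Classify two heterozygous variants as 'trans', 'cis', or 'undetermined'.

    Uses hypothetical boolean flags 'in_mother' and 'in_father' that record
    whether each parent carries the variant.
    """
    origin_a = {p for p in ("mother", "father") if a["in_" + p]}
    origin_b = {p for p in ("mother", "father") if b["in_" + p]}
    if len(origin_a) != 1 or len(origin_b) != 1:
        return "undetermined"  # de novo, or carried by both parents
    return "trans" if origin_a != origin_b else "cis"


# Hypothetical variants from one affected child
variants = [
    {"gene": "GENE_A", "change": "c.1A>G", "genotype": "het",
     "pop_maf": {"pop1": 0.001, "pop2": 0.0},
     "in_mother": True, "in_father": False},
    {"gene": "GENE_A", "change": "c.2C>T", "genotype": "het",
     "pop_maf": {"pop1": 0.0, "pop2": 0.002},
     "in_mother": False, "in_father": True},
    {"gene": "GENE_B", "change": "c.3G>A", "genotype": "hom",
     "pop_maf": {"pop1": 0.0001, "pop2": 0.02},   # too common in pop2
     "in_mother": True, "in_father": True},
    {"gene": "GJB2", "change": "p.Gly12fs", "genotype": "hom",
     "pop_maf": {"pop1": 0.01},                   # exempt from MAF filter
     "in_mother": True, "in_father": True},
]

print(recessive_candidates(variants))
print(phase_from_parents(variants[0], variants[1]))  # trans
```

Returning "undetermined" rather than guessing reflects the point made above: when a variant is absent from both parents, phasing needs additional follow-up.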

Linkage analysis and homozygosity mapping in the NGS era

Although linkage analysis can also be performed using either exome or whole-genome sequence data, this type of analysis is rarely performed to localize causal variants (Wang et al. 2015). Linkage analysis is sometimes performed on candidate causal variants that were identified through WGS and ES, using genotype data for all available informative pedigree members, to provide statistical evidence of potential involvement of the variant in ARNSHI etiology. Since the introduction of NGS, homozygosity mapping has been performed using WGS and ES data and may give clues to the regions where the causal variant lies (Wakeling et al. 2019).

Overview of ARNSHI loci and genes identified and the role of consanguinity

There are over 100 ARNSHI loci (designated by DFNB followed by a number), but for the purpose of this article, we will concentrate on those for which a gene has been identified and have DFNB and OMIM (McKusick 2007) numbers (Table 1). There are several reasons for concentrating on loci for which the gene has been discovered: (1) for loci for which the gene is unknown, the mapped region may be incorrect; (2) more than one DFNB number is sometimes assigned to the same locus/gene; and (3) the family used to identify the DFNB locus was later determined to have a syndromic HI. We only report here on genes with an OMIM number because the validity of these genes has been assessed. Additional information on gene–disease relations can be obtained by accessing ClinGen (Rehm et al. 2015), a consortium focused on curating the strength of gene–disease relations including ARNSHI genes. Although not discussed here, there have also been several candidate ARNSHI genes reported for which additional evidence is needed to irrefutably link them to ARNSHI.

Three years after the first ARNSHI locus, DFNB1A, was mapped to chr13q12.11 in two consanguineous Tunisian pedigrees (Guilford et al. 1994), the first ARNSHI gene, GJB2, was identified in a large consanguineous Pakistani family (Kelsell et al. 1997). The global prevalence of GJB2 variants leading to ARNSHI is 21.3% (Chan and Chang 2014). Inarguably, GJB2 is the major cause of congenital ARNSHI worldwide and there is region-specific enrichment of certain pathogenic variants, for example, p.Gly12fs in Europe (Zelante et al. 1997), p.Trp24X in the Indian subcontinent (Santos et al. 2005), and p.Arg143Trp in Ghana (Adadey et al. 2020).

The successful identification of ARNSHI genes primarily in Pakistani families (41%; N = 30) is due to the high rates of consanguinity, where ~ 60% of all marriages are between first cousins (can be as high as 70% in certain remote provinces), relatively large family sizes (3.5 live births per female), and dedicated local scientists (Ullah et al. 2017). Other countries where novel ARNSHI genes have been frequently first reported include Iran (12.3%; N = 9); Turkey (10.9%; N = 8); India (10.9%; N = 8); and the Netherlands (8.2%; N = 6) (Table 2). Furthermore, 16 (21.9%; 16/73) ARNSHI genes were first identified through the study of families from more than one population. Overall, 93% (68/73) of the ARNSHI loci and 92% (67/73) of the ARNSHI genes were first identified in consanguineous families. The most recent report of a new ARNSHI gene, CLRN2, was identified through the study of a consanguineous Iranian family (Vona et al. 2021). Besides GJB2, a number of genes that were discovered in consanguineous families have been shown to also play a role in ARNSHI in outbred populations, for example, SLC26A4 (Chen et al. 2016), CDH23 (Astuto et al. 2002), and STRC (Vona et al. 2015).

Table 2 Countries used to identify ARNSHI genes and loci

The role of non-consanguineous families in identification of ARNSHI genes cannot be discounted. For example, DFNB18A was identified in a consanguineous Indian family but the gene for this locus, MYO7A, was identified through the study of a non-consanguineous Chinese family. Three ARNSHI loci were first mapped in non-consanguineous families, and the corresponding genes identified in the same families: OTOG (DFNB18B) in a Dutch and a Spanish family; WBP2 (DFNB107) in a Chinese family; and ESRP1 (DFNB109) and SPNS2 (DFNB115) both in separate European–American families (Table 1). Although the ARNSHI locus DFNB3 was mapped through the study of non-consanguineous families, the gene, MYO15A, was identified by studying both the non-consanguineous Balinese families that were used to map DFNB3 and consanguineous Indian families (Table 1).

The introduction of NGS expedited novel ARNSHI gene identification

In 2009, a seminal article was published, showing proof of principle of detecting causal variants for Mendelian traits using ES (Ng et al. 2010). Since 2010, 32 novel ARNSHI genes (43.8%; N = 32/73) have been discovered using NGS (Table 1).

NGS was first used in 2010 to identify TPRN (DFNB79) in a consanguineous Pakistani family using a custom targeted capture to interrogate the DFNB79 interval. DFNB79 was mapped using linkage analysis via microsatellite markers in a consanguineous Pakistani family and this family was also used to discover TPRN (Khan et al. 2010). In 2012, the next three ARNSHI genes were identified using NGS (Table 1). CABP2 was discovered by studying three consanguineous Iranian families using a custom capture array that targeted the genes in the DFNB93 interval that was previously identified using SNPs and linkage. TSPEAR (DFNB98) was identified via linkage analysis and ES using DNA samples obtained from a consanguineous Iranian family. Lastly, OTOGL (DFNB84B) was discovered by performing linkage analysis using SNP markers in a consanguineous Turkish family followed by ES.

A number of ARNSHI genes were identified through performing linkage mapping followed by ES in consanguineous families (Table 1). More recently, this evolved to the use of ES only, without prior linkage analysis, the first example being the identification of EPS8 in a consanguineous Algerian family (Behlouli et al. 2014). Although most ARNSHI genes are still identified through the study of consanguineous pedigrees, NGS has facilitated gene identification in outbred families, which mainly have ARNSHI due to compound heterozygous variants, for example, WBP2 and ESRP1 (Buniello et al. 2016; Rohacek et al. 2017). As described above, outbred families provide little linkage information, and therefore, in most circumstances, they cannot be used to map the ARNSHI locus to a genetic region, which was necessary for the positional candidate approach. With the advent of NGS, it has become possible to perform variant filtering and identify causal genes without prior knowledge of the genetic region containing the causal variant. Currently, the detection of multiple families with variants in the same gene, which may include smaller families, is considered important in establishing gene-disease validity in Mendelian disease (DiStefano et al. 2019). The detection of multiple families and individuals is aided by decreasing NGS costs and resources such as GeneMatcher that connect scientists with interests in the same gene (Sobreira et al. 2015).

ES advanced gene identification for ARNSHI loci that were mapped and remained undiscovered after a decade, for example, GAB1 (DFNB26), CDC14A (DFNB32), ADCY1 (DFNB44), and S1PR2 (DFNB68) (Table 1). Notably, the same consanguineous Pakistani pedigree was used to map DFNB26 (Riazuddin et al. 2000) and identify the gene for this locus, GAB1, 18 years later (Yousaf et al. 2018).

Since the identification of TSPEAR in 2012, > 60% of all ARNSHI genes were discovered utilizing ES. Around 37% of all genes (27/73) were identified before exome sequence data were first used in analysis of ARNSHI. Although linkage studies and homozygosity mapping continue to be valuable tools in gene identification, since 2014, 9 genes (12.3%) were identified without the need for the former (Table 1).

With the decreasing costs of WGS, it can be readily implemented to identify novel ARNSHI genes. In 2019, CLDN9 was the first ARNSHI gene identified using WGS (Sineni et al. 2019). In comparison to exome data, whole-genome sequence data provides more uniform read depth coverage, accurate copy number variant evaluation, and an assessment of the entire genome. However, it remains difficult to interpret single-nucleotide variants and smaller insertions and deletions outside of coding regions.

The utility of RNA NGS to prioritize and identify ARNSHI genes

NGS also advanced the possibility to study the inner ear transcriptome, and more recently single cell RNA sequencing (scRNA-seq) analyses have become more easily accessible. scRNA-seq, as the name indicates, involves the sequence analysis of RNAs per single cell in a tissue. This can be done via fluorescence-activated cell sorting (FACS), microfluidics (chip or droplet methods), and other techniques. Microfluidic partitioning technologies have recently advanced this field significantly as they allow barcoding of RNAs per single cell and the possibility to perform massive parallel scRNA sequencing in various tissues, including inner ear tissues (Kolla et al. 2020).

scRNA-seq has been especially helpful in studying cells of the cochlear and vestibular epithelium during inner ear development. Recently, Kolla et al. (2020) characterized the developing mouse inner ear sensory epithelium using massive parallel scRNA-seq, a resource which the authors made publicly available for other researchers as well. This dataset can be accessed by other scientists to quickly check spatiotemporal expression of novel human candidate genes in the different cell types of the inner ear sensory epithelium, without performing immunohistochemistry and in situ hybridization experiments. This study also found genes that were previously not known to be expressed in mouse hair or prosensory cells, i.e., Rprm, Cd164l2, Ccer2, and Gng8.

scRNA-seq and RNA-seq have also been useful in detecting genes with low expression levels in certain tissues. The first transcriptome of rat cochleae aided in discovering genes that were previously not known to be expressed in certain developmental stages (Cho et al. 2002). More recently, scRNA-seq data from inner hair, outer hair and Deiters' cells facilitated the annotation of new exons in Mendelian HI genes (Ranum et al. 2019).

Unfortunately, expression data on human inner ear tissues are limited due to the challenges in obtaining tissue. One study used NGS to study tissues from the human cochleae and the vestibular system obtained during trans-labyrinthine and trans-cochlear approaches to tumors of the skull base (Schrauwen et al. 2016). Another study profiled microRNA expression in the developing human cochleovestibular nerve and otic vesicles via NGS (Chadly et al. 2018). Further studies of human tissues utilizing scRNA-seq are important to gain insight into inner ear specific transcripts and their spatial expression pattern. One such ongoing study created a single cell map of the developing human cochlea (Yu et al. 2019).

Last, open access tools such as gEAR (Orvis et al. 2020) aid in the exploration and visualization of inner ear expression data generated from various independent research groups and help to prioritize potential novel human HI genes. Other databases with useful inner ear or early craniofacial expression data include the Shared Harvard Inner-Ear Laboratory Database (SHIELD) (Shen et al. 2015) and the Gene Expression Omnibus (GEO) (Barrett et al. 2012).

Animal models in the aid of ARNSHI identification

Animal models of hearing loss have been instrumental in validating human ARNSHI genes, in understanding the function of these genes in the hearing system, and in identifying candidate genes which can be used to screen human families for causal variation. One such example is Gipc3, a gene in which variants underlie progressive sensorineural HI and audiogenic seizures in mice (Charizopoulou et al. 2011). Once it was identified as a HI gene in mice, screening of human families revealed that its human ortholog GIPC3 was the underlying cause of ARNSHI DFNB15 (Charizopoulou et al. 2011). Other examples in which known animal HI genes aided in human ARNSHI gene identification include MYO15A (Wang et al. 1998) and S1PR2 (Santos-Cortez et al. 2016).

Future directions

Current genetic diagnostic testing for HI using exome or custom capture has a ~ 37–39% diagnostic rate in the United States (Sloan-Heggen et al. 2016; Sheppard et al. 2018). The diagnostic rate differs based on family history and ancestry. For patients from the US, the diagnostic rates depended on family history of HI and the mode of inheritance: 50% ADHI; 41% ARHI; and 37% no family history. The diagnostic success rate by ancestry in one US study was 38% for Europeans; 63% for Asians; 72% for Middle Easterners; and 26% for Africans (Sloan-Heggen et al. 2016). These differences in success rates by ancestry are impacted by the genetic diversity of the population and the studies on HI that have been performed. For ARNSHI, gnomAD was interrogated by ancestry to evaluate the frequencies of known pathogenic and likely pathogenic variants (Chakchouk et al. 2019). For Latinos and African/African–Americans, the prevalence of HI due to known ARNSHI variants is 26.1 and 5.2 affected per 100,000 individuals, respectively, which is much lower than 96.9 affected per 100,000 individuals for Ashkenazi Jews. This low prevalence might be attributable to the fact that most genetic studies on HI include few Latinos or individuals of African ancestry (Mittal et al. 2018).
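One common way to translate pathogenic allele frequencies into an expected number of affected individuals per 100,000 is to assume Hardy-Weinberg proportions within each gene. The sketch below illustrates that arithmetic only; it is an assumption made here for illustration and is not necessarily the procedure used in the cited gnomAD analysis. It ignores consanguinity, incomplete penetrance, and undiscovered variants, and the function name and the second example's frequencies are hypothetical.

```python
def expected_affected_per_100k(allele_freqs_by_gene):
    """Expected AR-affected individuals per 100,000 under Hardy-Weinberg.

    allele_freqs_by_gene: {gene: [frequency of each pathogenic allele]}
    Within a gene, homozygotes and compound heterozygotes together have
    frequency (sum of pathogenic allele frequencies) squared.
    """
    total = 0.0
    for freqs in allele_freqs_by_gene.values():
        q = sum(freqs)       # combined pathogenic allele frequency
        total += q * q       # two pathogenic alleles in the same gene
    return total * 100_000


# A single variant at 1% allele frequency, the value quoted earlier for
# GJB2 p.Gly12fs in non-Finnish Europeans: about 10 homozygotes per 100,000.
print(round(expected_affected_per_100k({"GJB2": [0.01]}), 2))

# Hypothetical frequencies for two genes, each with several rare alleles
example = {"GENE_A": [0.002, 0.001], "GENE_B": [0.0005, 0.0005]}
print(round(expected_affected_per_100k(example), 2))
```

Because the expected prevalence scales with the square of the combined allele frequency, populations in which few pathogenic variants have been catalogued will appear to have a low prevalence of ARNSHI due to known variants.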

The understanding of the genetic spectrum of causal variation in ARNSHI is crucial in diagnostic testing. Studying ethnically/racially diverse populations is important to discover novel genes and variants, since some may be ancestry specific. In the context of ARNSHI, there are populations with high consanguinity that have not been studied but may be informative to identify novel ARNSHI genes, for example, Sudan (63% consanguinity and 4.3 live births per woman), Mauritania (47% consanguinity and 4.79 live births per woman), and isolated Egyptian Nubians (80% consanguinity and 3.3 live births per woman) (Saha et al. 1990; Tadmouri et al. 2009; Anwar et al. 2014; Romdhane et al. 2019). Studies of some populations may also be limited due to geopolitics and/or inaccessibility to the scientific community, for example, Sudan and North Korea. Overall, there is an imminent need to study diverse populations, especially those from sub-Saharan Africa, to improve our understanding of the genetic spectrum of ARNSHI.

Genomic technologies are constantly improving in accuracy and affordability. Newer technologies such as long-read sequencing may be able to capture complex structural variants or regions of the genome not well assessed via short-read sequencing and improve pathogenic variant discovery.

Last, the interrogation of multi-omic datasets including transcriptomics, epigenomics, proteomics, and metabolomics will improve our assessment of possible pathogenic variation. In addition, large-scale animal model phenotyping projects and databases, such as The Zebrafish Information Network (ZFIN; for zebrafish), FlyBase (for Drosophila melanogaster), and the International Mouse Phenotyping Consortium (IMPC) and Mouse Genome Informatics (MGI), also provide helpful links to animal HI genes to potentially aid in identifying human ARNSHI genes.

In conclusion, NGS has advanced the identification of novel ARNSHI genes in the last decade. Consanguineous pedigrees remain invaluable resources to identify new ARNSHI genes. To further advance our knowledge on the genetic architecture of ARNSHI, research should focus on novel sequencing techniques and the study of diverse and understudied populations, such as those in Africa.