Introduction

The organ of Corti acts as a mammalian phonoreceptor and consists of hair cells acting as mechanical receptors, together with at least four kinds of supporting cells. They form a very refined and sophisticated structure together with the nerve fibers projecting on the hair cells. Such a tissue organization has attracted the interest of biologists for a long time and has been closely observed morphologically. In addition, because an abnormality in the organ of Corti can cause hearing loss and degrade the quality of life, its clinical research has great significance. Generally, damaged hair cells are not regenerated in mammals; therefore, to date, loss of hair cells inevitably leads to permanent hearing loss. However, the finding that the avian hair cells can regenerate has accelerated studies on the mechanisms for hair cell regeneration, providing the prospect of the regeneration of auditory function. In particular, the mechanism that determines the fate of hair cells is considered to be the most important clinical issue, and several important findings have been reported regarding gene expression cascades and signal transduction [1, 2].

However, many details remain to be determined, especially about the cooperative regulation between morphogenesis and molecular events for cell differentiation. As described below, the complexity of the tissue structure and the continuing occurrence of developmental processes along the spiral organ of Corti seem to make it difficult to understand these issues. In the spiral cochlea, developmental events start at a certain position on the long axis of the spiral (apical, medial, or basal part) and then proceed along the long axis in destined direction(s). Therefore, various developmental stages of auditory epithelium continuously exist along the spiral axis at any time in the developing organ of Corti. In addition to the longitudinal axis, timing of the cell differentiation differs depending on the mediolateral (radial) position of the auditory epithelium; the differentiation of inner hair cells precedes that of outer hair cells [3]. This means that an individual developmental event proceeds in a very small confined area, and conversely, many events corresponding to different developmental stages run concurrently at respective regions in a whole organ of Corti. Therefore, this review will summarize the current knowledge concerning the developmental events in the organ of Corti, with a focus on their spatiotemporal dynamics, which will help to provide a deeper insight into its developmental mechanism. In the first section, the morphology of the organ of Corti will be outlined. The next two sections deal with the fate determination of auditory epithelial cells and the subsequent formation of the tissue architecture of the auditory epithelium, highlighting several important topics during each developmental process. In illustrating the molecular events involved in the cell differentiation, their spatiotemporal relationship with tissue morphogenesis is emphasized as much as possible. 
In addition, unanswered questions concerning the morphogenesis of the organ of Corti will be discussed in these sections. The answers to these questions could aid our further understanding of the mechanism that establishes the complex tissue architecture of the organ of Corti.

Tissue architecture of the organ of Corti

The organ of Corti is the auditory receptor in the cochlea of the inner ear, which is situated in a cavity inside the temporal bone that forms the bony labyrinth. The bony labyrinth consists of three parts: the medial cochlea, the lateral semicircular canals, and the vestibule connecting them. Inside the bony labyrinth, a continuous epithelial sac forms the membranous labyrinth, which contains the highly differentiated sensory epithelia responsible for the senses of hearing and equilibrium. There are six sensory organs in the inner ear: three ampullary crests in the individual semicircular canals; the maculae of the saccule and utricle in the vestibule, which detect equilibrium; and the organ of Corti on the bottom floor of the cochlea, which senses sound. The epithelium of the membranous labyrinth outside the sensory organs is a monolayer, while the sensory region is a two-layered epithelium consisting of supporting cells and hair cells. Supporting cells are columnar, lie on the basement membrane, and reach the luminal surface of the epithelial sheet to fill the space between the hair cells. Hair cells are embedded in a “salt and pepper” pattern among the supporting cells; they are not in contact with the basement membrane but sit on the cell bodies of the supporting cells, constituting the upper layer of the double-layered epithelium. In the sensory epithelium, hair cells, which bear a large number of stereocilia on their apical surface, perform the central function of sensory reception as mechanical receptors.

The organ of Corti is a long band-shaped epithelium constituting the bottom of the cochlear duct, which coils around the modiolus. There are two types of hair cells, inner hair cells and outer hair cells, which form a line in one row and three rows, respectively, from the bottom of the cochlea towards the apex (Fig. 1). The stereocilia are arranged in a V shape or W shape on the apical surface of hair cells, and the direction of the stereocilia is aligned with the central tip of the V that faces the outside of the spiral. A gelatinous structure, called the tectorial membrane, which covers the auditory epithelium, touches the stereocilia. When sound reaches the inner ear, the organ of Corti on the bottom of the cochlear duct moves via oscillation of the perilymph fluid in the scala tympani. The stereocilia are then bent by the tectorial membrane, leading to a change of membrane permeability of the stereocilia and depolarization of the hair cells, which is transmitted to the cochlear nerve via the synapse [4]. The cell body of the cochlear nerve is in the spiral ganglion near the modiolar axis. The nerve fiber passes through the bottom of the organ of Corti, and afferent fibers mainly project to inner hair cells, while efferent fibers project to outer hair cells [5].

Fig. 1

A schematic diagram showing the structure of the mature organ of Corti of perinatal mice. Inner phalangeal cells (IPhC, purple) and inner hair cells (IHC, deep green) align at the modiolar side. Inner pillar cells (IPC, yellow) and outer pillar cells (OPC, pink) form the (inner) tunnel of Corti. In the more lateral, outer regions, Deiter’s cells (DC, light blue) and outer hair cells (OHC, pale green) align in three rows. The gelatinous tectorial membrane (TM, beige) covers the organ of Corti. Blue: inner sulcus cells (ISC); gray: Hensen’s cells (HC). The same color scheme is also applied to Figs. 2 and 3. Modified from a figure by Kelley et al. [6]

Fig. 2

a View of luminal surface of a mouse organ of Corti (P0) stained with fluorescent phalloidin. Outer hair cells form a line in three rows in a staggered pattern and are separated from each other by supporting cells. Cellular boundary and stereocilia on hair cells are intensely stained by phalloidin. b Schematic diagram of image a. Colors represent same cell types as in Fig. 1

Fig. 3

3D diagram showing tissue structure of the organ of Corti. Outer hair cell (pale green), subjacent Deiter’s cell (light blue), and its phalangeal process form a Y-shape. Colors represent same cell types as in Fig. 1

Viewed from the luminal surface, the hair cells and supporting cells of the auditory epithelium of the organ of Corti form a characteristic mosaic pattern [7], in which each hair cell is surrounded by supporting cells; therefore, the hair cells are not in contact with each other (Fig. 2). There are at least four types of supporting cells in the auditory epithelium: inner phalangeal cells, on which the inner hair cells are placed; inner and outer pillar cells, which separate the inner hair cell region from the outer hair cell region; and Deiter’s cells or outer phalangeal cells, which support and separate the outer hair cells. The inner and outer hair cells are flask-shaped and cylinder-shaped, and rest on an inner phalangeal cell and a Deiter’s cell, respectively. The lateral side of the outer hair cells is in contact with Deiter’s cells in immature stages, but becomes surrounded by perilymphatic fluid in the mature organ of Corti. Deiter’s cells have a narrow phalangeal process extending upward from the cell body; its top end reaches the luminal surface of the auditory epithelium, filling the space among the apical surfaces of the hair cells and forming tight junctions with them. This luminal surface of the auditory epithelium is named the reticular plate, and separates the endolymph in the cochlear duct (the middle floor) from the perilymph that fills the basolateral surroundings of the auditory epithelial cells and the scala tympani [8]. The phalangeal process of a Deiter’s cell does not extend straight upward to form a tight junction with the hair cell that it supports, but is inclined apically and laterally to form a tight junction with a hair cell located one to three cells away in the apical direction and in the adjacent outer row [9, 10]. By contrast, the outer hair cells are inclined toward the base of the cochlea. As a result, an outer hair cell, a Deiter’s cell, and its phalangeal process together form a Y-shaped structure (Fig. 3).
The inner and the outer pillar cells separate the inner region, composed of the inner phalangeal cells and the inner hair cells, from the outer region, composed of Deiter’s cells and the outer hair cells. In a mature organ of Corti, a tunnel is formed between the inner and outer pillar cells. On the outer side of the Deiter’s cells and the outer hair cells, several rows of Hensen’s cells and short Claudius’ cells cover the adjoining outer sulcus up to the stria vascularis. On the modiolar side, border cells are located immediately adjacent to the inner hair cells and inner phalangeal cells. Further inside the border cells is a groove called the inner sulcus, which follows the spiral limbus, and its surface is covered with a single layer of inner sulcus cells.

Development of the organ of Corti

Developmental events of the mouse auditory epithelium can be classified as follows: formation of the prosensory epithelium, differentiation of cells constituting the auditory epithelium, and organization of the epithelial tissue structure. This section will introduce previously known information on these developmental events, focusing especially on understanding them from a morphological point of view. The first half subsection illustrates the fate determination process of the prosensory domain and its constituting epithelial cells, in which undifferentiated, homogeneous cells receive signaling molecules and start to differentiate into various cell types with different shapes and functions. The latter half subsection deals with the process of tissue organization, in which individual cell types behave to form the fine structure of the auditory epithelium.

After the basic structure of the auditory epithelium is formed, further differentiation processes, such as stereocilia formation on the apical surface of hair cells, go on. While many of these cell differentiation processes are essential to acquire hearing function, these will not be discussed in this review because of limited space.

Fate determination of auditory epithelial cells

Mapping of the prosensory epithelium

Sensory organs of the inner ear are derived from the otic placode as shown in Fig. 4 [11].

Fig. 4

Formation of the otic vesicle. Frontal sections of the occipital region are illustrated

The otic placode is a thickening of the epidermis that appears on the side of the neck at embryonic day (E)8.5–8.75, corresponding to the 9–12 somite stage, in mice. The otic placode invaginates to form an otic pit and is then pinched off from the epidermis. The resulting epithelial vesicle is called an otocyst. The otocyst receives signal molecules from adjacent tissues and forms a complex membranous labyrinth with six sensory epithelia. In this process, signals secreted from the hindbrain and the notochord regulate the regionalization. Exposure to high concentrations of WNT ligands secreted from the hindbrain dorsalizes the dorsomedial part of the otocyst [12]. By contrast, SHH secreted from the basal plate of the hindbrain and notochord ventralizes the otocyst [13, 14]. The ventral end of the otocyst protrudes at E11 and begins to form a cochlear duct (Fig. 5). The cochlear duct continues to extend until the perinatal period, and eventually reaches 1 and 3/4 turns in mice. From the rest of the otocyst, the equilibrium organs and endolymphatic duct are formed from the lateral and dorsal side of the otocyst, respectively. The bone matter of the bony labyrinth is formed from the mesenchyme around the membranous labyrinth. In the elongating cochlear duct, signal proteins and transcription factors, such as JAG1, SOX2, WNTs, and BMP4, which are involved in the fate determination of the auditory epithelium, begin to be expressed in a site-specific manner (Fig. 6). The area destined to become the sensory epithelium is called the prosensory domain. SOX2 is a transcription factor expressed in the prosensory domain and is essential for its formation, as this domain does not form in SOX2-deficient mice [15]. For the expression of SOX2, expression of the Notch ligand, JAG1, in the prosensory domain is required [16]. Notch receptor is widely expressed in the otic placode-derived tissues [17]. FGF20 [18] and FGFR1 [19] are also involved in this process.
The region outside the SOX2-positive region, which becomes the outer sulcus, expresses BMP4, which is an important morphogen defining the lateral boundary of the prosensory domain [20]. The modiolar region, opposite to the BMP4-positive region across the prosensory domain, is a thick pseudo-stratified epithelium called Kölliker’s organ, which is seen only in fetal stages [21]. In Kölliker’s organ, several WNT proteins, including WNT5a, WNT7a, and WNT7b, are dynamically expressed during development [22, 23]. In addition, a transgenic mouse with a Wnt reporter gene, Lgr5-EGFP, whose expression is induced by canonical WNT signaling, revealed the activity of canonical WNT signaling in the Kölliker’s organ of the developing cochlea [24]. Munnamalai et al. reported that WNT signals can promote cell differentiation into inner hair cells by downregulating BMP4 function and suggested that WNT signaling specifies the inner boundary of the prosensory domain by antagonizing BMP4 activity [25]. Later in development, many epithelial cells in the Kölliker’s organ are lost by apoptosis and the tissue becomes a thin single-cell-layered epithelium called the inner sulcus [26, 27].

Fig. 5

Development of the membranous labyrinth of the inner ear. Dark blue: endolymphatic duct; green: six prosensory epithelia

Fig. 6

Region specification of the cochlear epithelium and the development of the organ of Corti. Cross-section views of the cochlear duct are shown. The right side is the modiolar side and the left side is the lateral side of the cochlea. Yellow cells indicate neuronal cell bodies, which form a nascent spiral ganglion. The organ of Corti is derived from the cochlear prosensory domain, which expresses the transcription factor SOX2. JAG1 is needed for the expression of SOX2. BMP4 and WNT signals restrict the outer and inner boundaries of the prosensory domain, respectively. In the prosensory region, the cell lineage that includes inner phalangeal and inner hair cells differentiates first, and the inner hair cells secrete FGF8. Figures are adapted and modified from studies by Fekete et al. [28] and Basch et al. [2]

After the above regionalization of the floor of the cochlear duct, two important events proceed sequentially in opposite directions along the cochlear duct, one from the apex and the other from the base. The first one, starting from the apex, is the cell cycle exit of the epithelial cells of the prosensory domain. The second one, starting from the base, is the fate decision and differentiation of auditory epithelial cells. These events are explained in the following paragraphs.

Cells exit the cell cycle from the apex

Cells of the prosensory domain are actively dividing from E10 to E12 [29]. Before the final division, the prosensory domain forms a pseudo-stratified epithelium with interkinetic nuclear migration, a vertical movement of the nucleus within the cell that accompanies the cell division cycle, as in other pseudo-stratified epithelia such as the neuroepithelium. In these pseudo-stratified epithelia, cell nuclei in the M phase are at the luminal side of the epithelium, and those in the S phase are at the bottom [30]. Cells in the prosensory domain undergo the final cell division from E12 to E14 and exit from the cell cycle. Cell cycle exit proceeds from the apex toward the cochlear base over about 2.5 days [29]. Cell cycle exit is caused by the expression of the cyclin-dependent kinase inhibitor p27(KIP1) in prosensory epithelial cells [31, 32]. Because p27(KIP1) inhibits the cyclin-dependent kinase required for cell cycle progression, cells expressing p27(KIP1) exit from the cell cycle. The expression of p27(KIP1) starts at the cochlear apex, extends toward the base, and covers all SOX2-positive cells by E14. After the cell cycle exit, interkinetic migration is no longer observed in the prosensory domain and the nuclei accumulate at the bottom of the epithelium. The epithelial cells outside the prosensory domain continue cell division in the Kölliker’s organ and in the lateral strial region, and their cell nuclei are scattered throughout the epithelial depth [21].
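
As a rough worked example of the apex-to-base wave described above, the following sketch estimates when cells at a given position leave the cell cycle. It assumes, purely for illustration, that the wave advances linearly along the duct; the function name, the normalized position scale (0.0 = apex, 1.0 = base), and the default values (onset at E12, about 2.5 days to traverse the duct) are hypothetical conveniences based loosely on the approximate figures in the text, not measured parameters.

```python
def cell_cycle_exit_day(position, start_day=12.0, duration_days=2.5):
    """Estimate the embryonic day of cell cycle exit at a cochlear position.

    position: normalized distance along the duct (0.0 = apex, 1.0 = base).
    start_day and duration_days are illustrative assumptions; real
    progression along the cochlea need not be linear.
    """
    if not 0.0 <= position <= 1.0:
        raise ValueError("position must lie between 0.0 (apex) and 1.0 (base)")
    # Linear interpolation: the apex exits first, the base exits last.
    return start_day + duration_days * position


if __name__ == "__main__":
    for label, pos in [("apex", 0.0), ("middle", 0.5), ("base", 1.0)]:
        print(f"{label}: about E{cell_cycle_exit_day(pos):.2f}")
```

Under these assumptions the apex exits the cell cycle first and the base last, which is the reverse of the order in which differentiation proceeds.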

Cell differentiation occurs from the base of the helix

In contrast to cell cycle exit, differentiation of hair cells and supporting cells from undifferentiated prosensory epithelial cells starts from the mid-basal region of the cochlear duct. For the initiation of the differentiation of prosensory epithelial cells, the spiral ganglion is considered to play a significant role, because the developmental disruption of the spiral ganglion disturbs the precedence of cell differentiation in the cochlear base [33,34,35,36,37]. In particular, the importance of SHH has been reported. SHH is secreted from the spiral ganglion [38], and its expression gradually decreases from the base of the cochlea at the time of hair cell differentiation. Knockout of Shh in the spiral ganglion, or of the SHH receptor SMO in the organ of Corti, leads to the precocious differentiation of hair cells [39, 40]. These results indicated that SHH inhibits premature differentiation of hair cells, and downregulation of SHH expression induces the differentiation of prosensory epithelial cells from the cochlear base.

ATOH1 is indispensable for fate determination of auditory epithelial cells

In the early prosensory domain, equivalent homogeneous epithelial cells differentiate into hair cells and supporting cells, with different functions and morphologies. How is this differentiation regulated? In asymmetric cell division, unequal distribution of differentiation determinant factors to daughter cells turns equivalent undifferentiated cells into two different kinds of cells. The asymmetric cell division of Drosophila neural stem cells is an example of this phenomenon. In this asymmetric division, the Numb protein, an inhibitor of Notch signaling, is unequally distributed to daughter cells, and the difference in Notch activity between the two daughter cells leads to the different cell fates: a neural cell and a neural stem cell [41]. Eddison et al. observed a higher Numb expression level in hair cells than in supporting cells in the chicken auditory epithelium [42], but subsequent analysis confirmed that the asymmetric localization of Numb did not contribute to fate determination of prosensory cells [43]. Another lineage-tracing analysis using fluorescently labeled prosensory cells showed that the ratio of daughter cell fates (hair cell vs. supporting cell) was not constant, supporting the hypothesis that the fate decision of prosensory epithelial cells is made after the final cell division [44]. What mechanism determines the cell fate of hair cells or supporting cells after the final division? This is the same question as when and how hair cells and supporting cells acquire cell-specific gene expression patterns. Previous studies have shown that the most important factor in hair cell differentiation is ATOH1, a bHLH-type transcription factor. The expression of ATOH1 is considered to be necessary and sufficient for the development of hair cells, because hair cells are completely absent in Atoh1 null-mutant mice, whereas ectopic hair cells are produced upon forced, ectopic expression of ATOH1 [45].
The expression of ATOH1 begins at E13 in the organ of Corti in mice and declines after birth [46]. Factors responsible for inducing and initiating the expression of ATOH1 in prosensory epithelial cells have not been determined, but the significance of SHH as a negative regulator, and canonical WNT signaling as a positive regulator, has been suggested [38, 40, 47, 48].

Cell fate decision by the Notch lateral inhibition system

In the fate determination of sensory epithelial cells, Notch signaling plays an important role in enhancing the difference in the expression levels of ATOH1 between future hair cells and supporting cells [49, 50]. This mechanism is termed lateral inhibition of Notch signaling and is an evolutionarily conserved cell–cell interaction system between adjacent cells, which makes homogeneous adjacent cells adopt different cell fates. Notch signaling starts with the binding of membrane-bound Notch ligands to Notch receptors on the adjacent cell membrane. Ligand binding to the Notch receptor triggers the cleavage of the intracellular domain of the Notch receptor (NICD). NICD translocates into the nucleus and transactivates its target genes. In the simple model of Notch lateral inhibition, the initial cell population expresses the Notch ligand, Notch receptor, and the Notch target gene homogeneously. Among these, the target gene of Notch signaling, whose encoded protein downregulates the expression of the Notch ligand and suppresses a specific cellular differentiation, is thought to play a critical role. If the expression levels of these Notch signaling components are equal across a homogeneous cell population, the levels of Notch activity in these cells should be the same. However, there is a small cell-to-cell variation (noise) in the expression levels of these components. Therefore, the expression level of the Notch ligand in some cells can be a little higher than in others by chance. The increased Notch ligands send more signals to adjacent cells, in which stimulated expression of the Notch target gene results in a suppression of cell differentiation and of the expression of Notch ligands. On the other hand, the initial cell with the high expression level of Notch ligand receives less Notch signaling because of the decreased expression of Notch ligand in the adjacent cells, and differentiates into a particular fate.
According to this relationship, neighboring cells differentiate into two kinds of cells: one that abundantly expresses Notch ligands and sends Notch signals to its neighbors, and another that expresses less Notch ligand and receives more Notch signals from neighboring cells. Thus, this is the mechanism by which a differentiating cell inhibits surrounding cells from differentiating in the same way as itself [51]. In the inner ear, the Notch receptor, Notch ligands including JAG2 and DLL1, Notch target genes such as Hes and Hey, and ATOH1 are expressed in the prosensory cells and are mainly involved in lateral inhibition during the cell fate determination. In a subset of prosensory cells, expression levels of JAG2 and DLL1 are elevated and the Notch signal is sent to the adjacent cells, which then express HES and/or HEY transcriptional repressors. These are bHLH-type transcriptional repressors specific for the supporting cells in the auditory epithelium. They bind to the Atoh1 promoter and suppress its transcription. By contrast, in the nascent hair cells, the expression level of HES or HEY declines as they receive less Notch signaling from adjacent nascent supporting cells, and ATOH1 expression is enhanced and maintained by both the absence of HES/HEY repressors and ATOH1 auto-regulatory enhancement. In this way, the cell fates of hair cells and supporting cells are determined and a cell-specific gene expression pattern is acquired by each population [50, 52,53,54].
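
The feedback loop described above can be illustrated numerically. The following is a minimal toy simulation and not a model taken from the studies cited here: it places cells on a one-dimensional ring, lets each cell's Notch activity be driven by the mean ligand level of its two neighbors, and lets Notch activity repress the cell's own ligand. The function name, the ring geometry, the Hill-type response functions, and all parameter values (`a`, `b`, `k`, `h`, the time step) are assumptions chosen only to show how small random differences are amplified into a pattern.

```python
import random


def simulate_lateral_inhibition(n_cells=12, steps=20000, dt=0.02, seed=0):
    """Euler-integrate a toy Notch lateral-inhibition model on a ring of cells.

    Returns two lists (notch, ligand) with one value per cell, each in [0, 1].
    All parameters are illustrative assumptions, not measured quantities.
    """
    rng = random.Random(seed)
    a, b = 0.01, 100.0  # hypothetical thresholds of the two response curves
    k, h = 2, 2         # hypothetical Hill exponents

    # Nearly homogeneous start: every cell is similar apart from small noise.
    notch = [rng.uniform(0.0, 0.1) for _ in range(n_cells)]
    ligand = [rng.uniform(0.0, 0.1) for _ in range(n_cells)]

    for _ in range(steps):
        new_notch, new_ligand = [], []
        for i in range(n_cells):
            # Mean ligand presented by the two neighbors (index -1 wraps around).
            neighbor_ligand = 0.5 * (ligand[i - 1] + ligand[(i + 1) % n_cells])
            # Neighbor ligand activates Notch in this cell.
            activation = neighbor_ligand ** k / (a + neighbor_ligand ** k)
            # Notch activity (via a repressor target gene) lowers own ligand.
            repression = 1.0 / (1.0 + b * notch[i] ** h)
            new_notch.append(notch[i] + dt * (activation - notch[i]))
            new_ligand.append(ligand[i] + dt * (repression - ligand[i]))
        notch, ligand = new_notch, new_ligand
    return notch, ligand


if __name__ == "__main__":
    notch, ligand = simulate_lateral_inhibition()
    # High-ligand, low-Notch cells play the role of nascent hair cells here.
    fates = ["hair-cell-like" if d > 0.5 else "supporting-cell-like" for d in ligand]
    for i, (n, d, fate) in enumerate(zip(notch, ligand, fates)):
        print(f"cell {i:2d}  notch={n:.2f}  ligand={d:.2f}  {fate}")
```

In this sketch a cell that ends up with high ligand keeps its neighbors at high Notch activity and low ligand, so high-ligand cells do not sit next to each other. This mirrors the mosaic of hair cells separated by supporting cells, although the real epithelium is two-dimensional and involves several ligands and target genes.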

It has been revealed that Notch signaling plays two different roles in cochlear development. In the early specification of the prosensory domain, Notch works in the “lateral induction” mode, in which Notch signaling is activated in a positive feedback system, and cells send and receive Notch signals to each other in equal strength [55]. By contrast, in the later stage of fate decision of hair cells and supporting cells in the sensory epithelium, Notch signaling works in the above-mentioned “lateral inhibition” mode. The way in which Notch signaling works depends on the type of ligand (JAG1-2 and DLL1-3) and the strength of the signaling in each system, which can be regulated by modifiers, such as fringe proteins [55,56,57]. As described above, JAG1 contributes to the lateral induction in the specification of the prosensory epithelium in the early stage, while JAG2 and DLL1 participate in the lateral inhibition in the fate determination of hair cells and supporting cells in the later stage.

In addition to Notch signaling, other factors are also involved in the expression regulation of ATOH1. For example, SOX2 suppresses the expression of ATOH1 and promotes the expression of PROX1, which is a transcription factor specific to nascent supporting cells [58, 59]. In fact, SOX2 expression becomes restricted to the supporting cells in later stages.

Timing and mechanism of the differentiation of inner hair cells and outer hair cells

The types and arrangement of hair cells and supporting cells of the equilibrium sensory organs are relatively simple; however, they are more complex in the organ of Corti. The cell differentiation of the inner region of the auditory epithelium, including inner hair cells and inner phalangeal cells, precedes that of the outer region, including outer hair cells and Deiter’s cells [57, 60]. Cells in the inner region are first specified with the expression of LFNG (lunatic fringe homolog) at the boundary region between the Kölliker’s organ and the prosensory domain at E13.5 [57]. LFNG expression is thought to be important for their specification. Fringe proteins, comprising lunatic fringe (LFNG), manic fringe (MFNG), and radical fringe (RFNG), are glycosyltransferases that extend the sugar chain of the Notch receptor, thus modifying its affinity for its ligand. Fringe lowers the efficiency of Notch signaling from the JAG ligand [61, 62]. Cells in the inner region express LFNG; therefore, these cells receive a low level of Notch signal, whereas Kölliker’s organ cells that do not express LFNG receive a moderate Notch signal. It is thought that this difference determines the reactivity to a signal that promotes the differentiation into the cell lineage of the sensory epithelium, because the experimental lowering of the level of Notch signaling received by Kölliker’s organ cells led to these cells forming excessive inner hair cells/inner phalangeal cells [57]. After the fate of the inner cell lineage has been determined, the lateral inhibition mechanism of Notch signaling plays a role among these cells to decide whether they develop into inner hair cells or inner phalangeal cells. Once the differentiation of inner hair cells begins, these cells become the source of FGF8. In situ hybridization showed that Fgf8 is detected only in inner hair cells at E15.5.
FGF8 acts on FGFR3 expressed in adjacent nascent pillar cells on the lateral side, to induce these cells to express the Hey2 gene [63, 64]. While FGF8 can induce the expression of HEY2 in pillar cells, independent of Notch signaling, Notch signaling can also induce HEY2 expression in these cells [65]. When both signals were inhibited, the nascent pillar cells no longer expressed HEY2, and differentiated not into pillar cells, but into outer hair cells and Deiter’s cells. This result indicated that HEY2 is essential for pillar cell differentiation [65]. BMP4 secreted from the outer sulcus region suppresses FGF8’s action such that epithelial cells in the outer hair cell region do not differentiate into pillar cells [66]. Finally, outer hair cells and Deiter’s cells differentiate in the area outside the pillar cells. This process requires FGF20/FGFR1 signaling. In mice mutated for the Fgf20 gene, which is expressed in the prosensory epithelial region, or the Fgfr1 gene, the Notch lateral inhibition mechanism does not work and epithelial cells in the lateral part of the auditory epithelium remain undifferentiated, while inner hair cells differentiate [3, 67,68,69]. In addition to the above-mentioned genes, the significance of other genes in the development of outer hair cells has been reported. In Emx2 KO mice, no outer hair cells are generated, while two lines of inner hair cells appear [70]. In conditional KO mice for the Jag1 gene, in which the gene is eliminated in the inner ear as early as the otic placode period, only outer hair cells are lost [71]. Although these results showed the importance of these molecules in the generation of outer hair cells, their mechanism of action has yet to be determined. Once the cell fate is specified into outer hair cells and Deiter’s cells, the subsequent lateral inhibition of Notch signaling and the divergence of ATOH1 expression levels take place in determining the fate between the outer hair cells and the Deiter’s cells [52, 54].
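
The redundant control of HEY2 in nascent pillar cells described above can be summarized as simple Boolean logic. The sketch below is only a schematic restatement of the reported inhibition experiment [65], with hypothetical function names; it treats each signal as fully on or off and ignores signal strength, timing, and the influence of BMP4.

```python
def hey2_expressed(fgf8_active: bool, notch_active: bool) -> bool:
    """HEY2 is induced if either FGF8 or Notch signaling is active."""
    return fgf8_active or notch_active


def nascent_pillar_cell_fate(fgf8_active: bool, notch_active: bool) -> str:
    """Schematic fate of a nascent pillar cell (hypothetical helper)."""
    if hey2_expressed(fgf8_active, notch_active):
        return "pillar cell"
    # Without HEY2, the cells adopt fates of the outer region instead.
    return "outer hair cell or Deiter's cell"


if __name__ == "__main__":
    for fgf8 in (True, False):
        for notch in (True, False):
            fate = nascent_pillar_cell_fate(fgf8, notch)
            print(f"FGF8={fgf8!s:5}  Notch={notch!s:5}  ->  {fate}")
```

Only the condition in which both inputs are off changes the outcome, which is why inhibiting either pathway alone is not expected to abolish pillar cell differentiation in this simplified picture.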

Tissue organization of the auditory epithelium

Observation of hair cells and supporting cells during organogenesis

Hair cells are characterized by actin-based cellular protrusions, stereocilia, at the top of the cell. However, as the stereocilia form in the late stage of embryogenesis, hair cells and supporting cells in the developing organ of Corti cannot be distinguished by the presence of stereocilia. Therefore, it is necessary to distinguish these cells by other means to investigate their behavior in the early stage of differentiation. The expression of the above-mentioned transcription factor, ATOH1, is specific to hair cells, and anti-ATOH1 antibodies label only a small number of cells in the prosensory epithelium at E14.5 [72]. However, the expression of ATOH1 is not exclusively confined to nascent hair cells, but is detected in some supporting cells in the early stages, while such ATOH1-labeling of supporting cells decreases at the cochlear base by E15 [58, 73]. Currently, detecting the expression of MYO6 (myosin VI) is an established method to identify hair cells at the earliest stage (Fig. 7). Although MYO6 expression is required for postnatal synapse formation on the hair cell body, morphogenesis of stereocilia, and hearing ability, MYO6 is not involved in the fate determination process of hair cells and its genetic deletion does not affect the fate of hair cells [74,75,76,77]. In contrast to ATOH1, which has a relatively broad expression pattern in the early stage, MYO6 is expressed exclusively in nascent hair cells from the early stage. Although MYO6-positive cells in the otocyst at E13.5 in the study by Xiang et al. [78] could not be definitely identified, in the developing prosensory epithelium at E14.5 or E15.5, the expression of MYO6 is confined to a row of inner hair cells. The MYO6 expression begins near the cochlear base and spreads toward the apex until the neonatal stage [40, 79,80,81,82,83]. Outer hair cells later become MYO6-positive, also from the base of the cochlea. The epithelium at the most apical region is negative for MYO6 before birth.
While the reported timings of the initiation of MYO6 expression in outer hair cells show some variation, the reports consistently indicate that the prosensory epithelium with MYO6-positive outer hair cells has the architecture of a stratified epithelium, namely, hair cells in the upper layer and supporting cells in the lower layer.

Fig. 7

Initiation of MYO6 expression in cochlear hair cells. Fluorescent immunostaining for MYO6 of the organ of Corti at E16.5 (green, a–c) and E18.5 (cyan, d). Cell nuclei are stained with 2-(4-amidinophenyl)-1H-indole-6-carboxamidine (DAPI) (magenta). Mature hair cells are positioned on the supporting cells and express MYO6 robustly at embryonic day (E)18.5. At E16.5, however, MYO6 is not detected in all hair cells; it is detected in the inner and outer hair cells of the basal turn (base), and in the inner hair cells of the mid turn (mid), and is not detected in the apical turn (ap)

Another strategy, staining F-actin (filamentous actin) with fluorescent phalloidin, has been widely used to identify hair cells in the surface preparation of auditory epithelium [60]. With this method, hair cells with stereocilia can be clearly identified by their high level of staining. At earlier stages, before stereocilia form, the circumferences of the hair cells are already strongly stained with phalloidin; therefore, they can be distinguished from surrounding supporting cells by viewing the epithelium from the luminal surface (Fig. 8). Using this method, a row of inner hair cells can be identified as early as approximately E14.5–E15 at the cochlear base. At this stage, epithelial cells in the outside region seem to be uniform and the outer hair cells cannot be identified; however, they become distinguishable around E15 to E16 as circumferential phalloidin staining becomes stronger [60, 84]. Furthermore, outer hair cells can also be discriminated from supporting cells based on their vertical position in the epithelium. The mature organ of Corti comprises a bilayer epithelium, and hair cells placed on the supporting cell bodies are not in contact with the basement membrane. The cell body of a prosensory epithelial cell in the immature stages is vertically elongated and very narrow; therefore, only the nuclear location can be clearly identified under optical microscopes. Cell nuclei before the final division are scattered throughout the immature prosensory epithelium. After the final division, the nuclei of the epithelial cells accumulate in the bottom layer of the epithelium. Thereafter, a subset of nuclei seems to emerge in the upper part of the epithelial tissue near the cochlear duct lumen. Scattered cell nuclei in the upper and deeper parts gradually align into two flat layers, and the auditory epithelium establishes a two-cell-layered stratified epithelium [60]. 
Around the time of this vertical cell rearrangement, cells in the upper layer lose contact with the basement membrane, although the precise timing is not clear, and differentiate into outer hair cells. In contrast, inner hair cell nuclei can hardly be identified by their vertical position, because their position is not luminal, but deep, similar to that of supporting cells in the epithelium. Instead, their alignment in one row parallel to the longitudinal axis of the cochlear spiral enables their identification when viewed from above. These morphological features of inner and outer hair cells can be found in more apical, immature regions than the expression of known markers such as MYO6. Taken together, hair cell fate seems to be determined while the postmitotic nuclei of prosensory epithelial cells still remain in the deep layer of the prosensory epithelium. Once hair cell fate is decided, outer hair cell nuclei move to the luminal layer and the cell body leaves the basal layer. In addition, both inner and outer hair cells accumulate F-actin at the cell circumference, and then start to express MYO6 above the detection limit. To estimate the speed of differentiation, time-lapse imaging of a live organ of Corti or comparison of fixed samples at different stages would be useful. Our observation of fixed mouse samples indicated that the number of inner hair cells identifiable by their morphology and phalloidin accumulation is 450–550 and 500–550 at E16.5 and E17.5, respectively. In addition, there are about 100 and 50 inner hair cells between the most apical identifiable inner hair cell and the most apical identifiable outer hair cell at E16.5 and E17.5, respectively ([85] and unpublished data). Although live image analyses would provide more accurate data, it can be roughly estimated that the differentiation process from a uniform immature epithelium to the stratified epithelium with identifiable outer hair cells takes about 2 days.

Fig. 8

Confocal images of the surface preparation of immature prosensory epithelia at embryonic day (E)17.5 stained with fluorescent phalloidin (white) and 2-(4-amidinophenyl)-1H-indole-6-carboxamidine (DAPI) (magenta). The upper panels indicate xz section images, and the lower panels indicate xy section images at the apical plane. a shows a more apical, immature region than b. The blue circle in the lower image indicates an inner hair cell, and its nucleus is labeled with a blue circle in the upper panel. Inner hair cells could be identified in region a by their alignment. Inner and outer hair cells could be identified in region b by their intense phalloidin staining, as well as their alignment. Inner hair cells in region b are positive for MYO6 (data not shown)

Planar cell polarity (PCP) signal controls convergent extension of auditory epithelium

Epithelial cells usually have an apical–basal polarity, which is a structural basis for epithelial functions, such as barrier function and the directional transport through the epithelial sheet. In many cases, epithelial tissues in vivo have an additional polarity, planar cell polarity (PCP), which corresponds to the planar information, such as proximal/distal and anterior/posterior, along the epithelial plane perpendicular to the apico-basal axis. Many proteins involved in the formation of PCP have been identified and are called PCP signaling molecules [86]. PCP signaling responds to the planar information in the tissue and polarizes the localization of intracellular molecules to reflect the planar information in individual epithelial cells. In addition, PCP signals also function to control a directional tissue movement, termed convergent extension. Convergent extension is defined as a developmental cell movement that narrows the width and elongates the length of the tissue [87]. Convergent extension has been studied particularly in body axis elongation in Xenopus development. During this process, spindle-shaped mesodermal cells intercalate with each other in a mediolateral orientation, resulting in the elongation of the whole tissue along the midline and mediolateral convergence. Convergent extension is also observed in epithelial tissue. A typical example is germ band extension during Drosophila development [88]. In germ band extension, individual epithelial cells do not exhibit a spindle shape with mediolateral stretching, but they are polygonal and rearrange their position to change the gross shape of the tissue. This cell rearrangement is achieved by dynamic remodeling of cell–cell junctions, which consists of degeneration and regeneration of the cell–cell adhesion structure (Fig. 9). Actin and non-muscle myosin II (NMII) accumulate at the cell–cell junctions along the direction of convergence, causing them to shrink. 
This shrinkage forms a “rosette,” in which more than four cells meet at a vertex. After rosette formation, a new cell–cell junction is formed from the vertex in the direction of tissue extension (perpendicular to the preceding junctional degeneration), and the rosette dissociates. By repeating this process, the tissue changes its aspect ratio in terms of cell number, converging in width and extending in length as a whole. Thus, the directional degeneration and regeneration of cell–cell junctions drives the convergent extension of the whole tissue [89, 90].
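
As a rough illustration of how repeated intercalation can change the aspect ratio of a tissue without changing its cell number, the following Python sketch treats the epithelium as a rectangular grid of cell labels and interleaves neighbouring rows. It is a toy model under stated assumptions, not a description of junction mechanics: the grid representation and the names `intercalate_once`, `shape`, and `cell_count` are hypothetical, and the rosette intermediate and the role of actomyosin are not modelled.

```python
from typing import List, Tuple

# Each inner list is one row of cells running along the extension axis.
# The number of rows stands in for tissue width (hypothetical representation).
Tissue = List[List[str]]


def intercalate_once(tissue: Tissue) -> Tissue:
    """Merge each pair of adjacent rows into one row by interleaving their cells.

    Toy stand-in for one round of cell intercalation: the tissue becomes
    half as wide (fewer rows) and twice as long (more cells per row).
    """
    if len(tissue) % 2 != 0:
        raise ValueError("this sketch needs an even number of rows")
    merged: Tissue = []
    for upper, lower in zip(tissue[0::2], tissue[1::2]):
        row: List[str] = []
        for a, b in zip(upper, lower):
            row.extend([a, b])  # cells from the two rows alternate along the length
        merged.append(row)
    return merged


def shape(tissue: Tissue) -> Tuple[int, int]:
    """Return (width in rows, length in cells per row)."""
    return (len(tissue), len(tissue[0]))


def cell_count(tissue: Tissue) -> int:
    """Total number of cells in the tissue."""
    return sum(len(row) for row in tissue)


# A 4-row by 6-cell sheet, intercalated twice.
tissue = [[f"c{r}{c}" for c in range(6)] for r in range(4)]
after = intercalate_once(intercalate_once(tissue))

print(shape(tissue), "->", shape(after))
print(cell_count(tissue), "cells before,", cell_count(after), "cells after")
```

In this sketch the sheet narrows from four rows to one and lengthens from 6 to 24 cells per row, while the set of cells is unchanged, which is the defining feature of convergent extension described above.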

Fig. 9

Cell–cell junction remodeling in the convergent extension of the epithelial tissue. Degeneration of the cell–cell junctions between two cells (shown in red a) leads to the assembly of four cells including two cells on either side of the original junction and two cells at the ends of the junction at the degenerated point (vertex a, shown in black). Similarly, degeneration of a continuous cell–cell junction connecting three cells (shown in red c) induces the assembly of two cells from both ends of the junction and three cells lying side-by-side with the degenerating junction, resulting in the meeting of five cells at the vertex. The arrangement of five or more cells is called a “rosette.” Thus, degeneration of a continuous junction connecting four cells (shown in red b) results in the formation of a rosette with six cells. Once a rosette is formed or four cells meet at a vertex, a new junction (shown in blue) is formed from the vertex in the direction perpendicular to the degenerated junction. This forms a cycle of cell–cell junction remodeling and leads to cellular rearrangement in the epithelial tissue. By repeating such a cellular rearrangement in a specific orientation, the gross shape of the tissue converges and extends without cell number changes

Functional analysis of PCP signaling factors has revealed their developmental functions in mammals. In the inner ear, elongation of the organ of Corti, arrangement of the auditory epithelial cells, and the orientation of the stereocilia on hair cells are affected by deficiencies in PCP signaling. PCP signaling-deficient mice exhibit short and wide organs of Corti and excessive hair cell rows, showing that these defects are induced by the inhibition of convergent extension [82, 83, 91,92,93]. Detailed analysis of the normal morphogenesis of the auditory epithelium indicates that the tissue elongation process is consistent with the definition of convergent extension. Namely, after the final cell division, the total number of epithelial cells does not increase throughout the tissue elongation process [60]. In addition, rosette structures are observed in the immature auditory epithelium, and inhibition of NMII function suppresses convergence and elongation [84, 94]. These results confirm that elongation of the auditory epithelium is a result of convergent extension.

Issues involved in the convergent extension of the auditory epithelium

Although convergent extension is considered to be responsible for elongation of the developing organ of Corti, many issues about its progression remain to be elucidated. The first point is the timing of the convergent extension movement, that is, how it progresses in relation to the structural changes of the auditory epithelium and cell differentiation in the epithelium. According to morphological observations, the organ of Corti elongates from E14 to E19, and particularly from E16 to E17 [60]. In addition, Chacon-Heszele et al. reported that many typical rosettes were observed in the undifferentiated prosensory epithelium at E14–E15, and that they decreased as development proceeded [84]. Their data also show rosette-like epithelial cell meetings in the auditory epithelium at E15.5, in which outer hair cells can be clearly identified by F-actin accumulation. This means that the cellular rearrangement is ongoing in the stratified auditory epithelium. It is likely that cell motility during cell rearrangement differs between Deiters’ cells, which adhere to the basement membrane, and hair cells, which do not. However, the mechanism for the formation and dissociation of rosettes in such a heterogeneous, stratified epithelium has not been elucidated in the organ of Corti, or in any other system during convergent extension. In addition, the mechanism that coordinates the formation of the characteristic mosaic pattern of epithelial cells with the convergence and extension of the whole tissue also remains unknown. This topic will be dealt with in the next section.

The second point concerns the distribution of NMII. NMII is concentrated at the shortening cell–cell junctions and is considered to be involved in rosette formation in Drosophila [89, 90], and it is required for convergent extension of the auditory epithelium in mice [94]. Mouse NMII comprises three isoforms, NMIIA, NMIIB, and NMIIC, which are the gene products of Myh9, Myh10, and Myh14, respectively. In the developing auditory epithelium, the expression of NMIIB and NMIIC has been reported [94, 95]. However, these reports did not focus on the process of rosette formation and dissociation; therefore, it is unclear whether the NMII proteins accumulate at the shortening cell–cell junctions in rosettes. Further analysis is needed to clarify the mode of action of NMII in the developing auditory epithelium.

The last point is the initiation mechanism of convergent extension. Convergent extension in the auditory epithelium is thought to start after the final cell division; however, we do not know the factor(s) that prompt the epithelium to begin convergent extension or guide its direction. With regard to this mechanism, an interesting observation was reported. In the normal immature prosensory epithelium, elongation of individual cells parallel to the longitudinal axis of the cochlear spiral is observed, as well as rosette formation, at E14.5 [84]. Similar cellular extension is also observed in Drosophila epithelial cells at the beginning of germ band extension [96, 97]. This cellular stretching is caused by the pulling force originating from posterior midgut invagination (Fig. 10). This stretching is transient and disappears as cell intercalation begins. In a mutant embryo in which the midgut invagination is ablated, the cellular stretching and the consequent germ band extension were markedly reduced. Interestingly, while the incidence of cell intercalation and the growth of cell–cell junctions were normal, the orientation of newly forming junctions was affected in this mutant. This pulling force is, therefore, required for convergent extension in orienting the growth of new cell–cell junctions [98]. Although the mechanism and significance of the cellular extension in the immature auditory epithelium are not known, the cellular orientation is randomized in p120 catenin KO mice, which cannot accomplish convergent extension because of a defect in the intracellular localization of cadherin. Dlg1 KO mice also exhibit defects in convergent extension and directional elongation of individual cells in the immature region of the auditory epithelium ([85] and our unpublished data). Further analyses are necessary to reveal the significance of this cellular stretching and the mechanism that promotes the convergent extension.

Fig. 10

Germ band extension in a Drosophila embryo and the preceding transient stretching of germ band epithelial cells. Coincident with gastrulation in a Drosophila embryo, the lateral epidermis, called the germ band, elongates by cell intercalation. Preceding this tissue movement, germ band epithelial cells stretch in the anterior–posterior direction due to a stretching force from midgut invagination at the tip of the germ band. The stretching is transient and is not seen once the germ band extension begins

Mosaic pattern formation of six differentiated auditory epithelial cells

As mentioned above, the formation of a specific mosaic pattern is another factor to be considered in understanding the tissue construction of the organ of Corti. Recently, several studies have focused on mosaic formation. In the undifferentiated cochlear epithelium, the average number of cells adjacent to another cell is six in the sensory epithelium, inner sulcus, or outer sulcus. This structure is called a honeycomb; it is the most common form of uniform epithelial tissue and persists throughout development in the inner and outer sulcus. Only in the auditory epithelium, however, does the number of adjacent cells decrease with differentiation, and one hair cell comes to be surrounded by four to five supporting cells in later stages [60]. During this process, the original honeycomb-patterned epithelial cells change to a checkerboard pattern, in which the junctions between cells converge in the horizontal or vertical directions, and then to the final characteristic mosaic pattern (Fig. 2), with oblique cell–cell junctions between Deiters’ cells [99]. The reason why such a change occurs only in the auditory epithelial region and the mechanism that determines the number of supporting cells surrounding hair cells remain poorly understood. Recently, however, the involvement of nectin molecules in the recognition of adjacent cells in this developmental process has been reported. Nectins are a membrane protein family with an immunoglobulin-like domain, which are involved in cell–cell adhesion at adherens junctions. There are four isoforms, nectin-1–4. Nectins have homophilic and heterophilic binding ability; however, heterophilic binding is stronger [100]. In the nascent auditory epithelium, nectin-1 and nectin-3 are mainly expressed in hair cells and supporting cells, respectively. In mice knocked out for either nectin gene, aberrant contact between two hair cells increased, and the formation of a normal mosaic pattern was inhibited [101]. 
Furthermore, spontaneous mosaic formation has been mimicked in a co-culture of two colonies of HEK293 cells that are transfected with nectin-1 or nectin-3. The results suggest that differential nectin expression in hair cells and supporting cells enables them to rearrange their positions such that nectin-1-positive cells are surrounded by nectin-1-negative cells. Although it has not been investigated thoroughly whether the cell rearrangement for convergent extension and that for mosaic formation are related processes, convergent extension proceeds independently of nectins [102]. It would be interesting to determine whether rosette formation and dissociation accomplish convergent extension before the cell type-specific rearrangement forms the mosaic pattern, or whether rosette formation and dissociation proceed under the control of cell–cell affinity by means of nectin selectivity. Live monitoring of the cellular dynamics of each cell type during cell rearrangement would answer this question.
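
The idea that stronger heterophilic binding favors an alternating arrangement can be sketched with a simple adhesion score. The Python example below is a minimal illustration under stated assumptions, not a model from the cited studies: the square grid with four neighbours per cell, the function name `adhesion_score`, and the numerical binding strengths are all hypothetical, and only the ordering (heterophilic stronger than homophilic) is taken from the text above.

```python
from typing import List

# Hypothetical relative binding strengths; only their ordering
# (heterophilic > homophilic) comes from the text.
HETEROPHILIC = 2.0
HOMOPHILIC = 1.0

# "H" = nectin-1-expressing cell (hair cell-like),
# "S" = nectin-3-expressing cell (supporting cell-like).
Grid = List[List[str]]


def adhesion_score(grid: Grid) -> float:
    """Sum the binding strength over all horizontally and vertically adjacent pairs."""
    total = 0.0
    rows, cols = len(grid), len(grid[0])
    for r in range(rows):
        for c in range(cols):
            # Look only right and down so that each neighbouring pair is counted once.
            for dr, dc in ((0, 1), (1, 0)):
                rr, cc = r + dr, c + dc
                if rr < rows and cc < cols:
                    same_type = grid[r][c] == grid[rr][cc]
                    total += HOMOPHILIC if same_type else HETEROPHILIC
    return total


# Two arrangements of the same 8 "H" and 8 "S" cells on a 4 x 4 grid.
checkerboard = [["H" if (r + c) % 2 == 0 else "S" for c in range(4)] for r in range(4)]
segregated = [["H"] * 4 if r < 2 else ["S"] * 4 for r in range(4)]

print("checkerboard:", adhesion_score(checkerboard))
print("segregated:  ", adhesion_score(segregated))
```

With these assumed values, the alternating arrangement scores higher than the segregated one because every contact is heterophilic, which is in line with the observation that nectin-1-positive cells come to be surrounded by nectin-1-negative cells.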

The mechanism of the change from a pseudo-stratified monolayer epithelium to a stratified epithelium is unknown

The auditory epithelium changes its structure from a pseudo-stratified epithelium, with three to five nuclei stacking in the vertical direction in the immature stages, to a bilayered stratified epithelium of hair cells in the upper layer and supporting cells in the lower one as development proceeds [21]. There are some examples of developmental stratification of simple monolayer epithelia, such as the esophageal epithelium, corneal epithelium, and granulosa cells of the ovarian follicle [103]. However, these stratifications seem substantially different from that in the auditory epithelium, because they are caused by cell proliferation, and basal cells retain their mitotic capacity as stem cells after stratification, which is completely different from the mitotic quiescence of epithelial cells in the organ of Corti. Nevertheless, the mechanism for the detachment and repositioning of differentiating nascent hair cells from the basal lamina to the luminal layer of the auditory epithelium might have something in common with the stratification process of the above organs. For example, BMP4, WNT, SOX2, and p63 are involved in the process of stratification of esophageal and retinal epithelia [104,105,106]. Of these, p63 is a transcription factor specific to the basal cells of the stratified epithelium, and is essential for the stratification process in these tissues. The primary function of p63 in the basal cells of stratified epithelia is thought to be maintaining the proliferative capacity of basal cells. However, some reports suggested an association of p63-related epithelial stratification with a change in the keratin expression pattern in the esophagus [105] or with integrin expression in mammary glands [107], which implies the involvement of p63 in epithelial construction by regulating cell–cell or cell–matrix adhesion. Recently, Terrinoni et al. reported that p63 is expressed in the developing cochlea and induces the expression of ATOH1, PROX1, and HES5, which regulate the differentiation of auditory epithelial cells, and that disruption of the p63 gene induces the appearance of extra inner/outer hair cells in the organ of Corti [108]. However, it remains unclear whether p63 in the developing auditory epithelium is involved in the regulation of cellular adhesion molecules, as in the above-mentioned stratifying epithelia. Future validation of p63 function in the development of the organ of Corti might provide an interesting insight into the mechanism of epithelial stratification.

Conclusion

Cochlear organogenesis starts as an epithelial sac and proceeds via the sequential occurrence of the final mitosis of prosensory epithelial cells; their fate determination and differentiation; intraepithelial cell rearrangement; and tissue elongation into a long spiral form. During this process, these various events, corresponding to different developmental stages, occur in parallel along the long spiral cochlear duct and are regulated by many signaling molecules, their receptors, and transcription factors, such as WNTs, FGFs, BMP, SHH, and Notch, each of which plays its role in a finely tuned spatiotemporal manner. To date, the central roles of Notch signaling in the cell fate determination of the sensory epithelium and of PCP signaling in the convergent extension of the organ of Corti have been established. However, the mechanism generating the cochlear-specific complexity, namely, the concurrency of epithelial stratification, convergent extension, and mosaic formation in the prosensory epithelium, remains to be elucidated. Live-imaging observation of the developmental process of the auditory epithelium might provide useful results that would help to answer these questions. Future investigations using such techniques are needed to acquire novel insights into the tissue construction mechanism during development.