Optical communications over fiber networks are the backbone of our current communication infrastructure, and the essential network components and subsystems have been described in detail in Part A of this Springer Handbook. Fiber-optic networks have scaled in capacity over the years, first by adding multiple wavelength channels, and most recently by increasing the spectral efficiency of the transmitted signals through advanced modulation techniques enabled by the introduction of digital coherent transceivers [10.1]. This made it possible to use pulse shaping in combination with multilevel complex formats to optimize the use of the optical spectrum, and to transmit independent signals on the two orthogonal polarizations of the electric field propagating in a single-mode fiber (SMF) (see Chaps. 47 for a detailed description). In this way, a single SMF is able to provide around \({\mathrm{50}}\,{\mathrm{Tb/s}}\) for distances up to \({\mathrm{10000}}\,{\mathrm{km}}\). This high capacity, which was once considered much higher than required for telecommunications, is now becoming the bottleneck of cloud-centric information-technology communication networks. It is therefore of paramount importance to identify and develop new technologies to increase the capacity of fiber-optic communication systems by 2 to 3 orders of magnitude, while simultaneously reducing the cost per bit of the transmitted data. As current state-of-the-art fiber-optic systems operate close to the theoretical capacity limit [10.1], this can only be achieved by increasing the number of parallel spatial channels, which is the aim of research in space-division multiplexing (SDM) [10.2, 10.3, 10.4].

The simplest approach to overcome the capacity limit of SMFs is using traditional single-mode systems operated in parallel. However, this approach results in costly and inflexible optical networks.

Significant cost reductions can be expected by better integrating components at various levels, in particular by sharing components whenever possible or placing multiple devices in a single planar waveguide circuit or in a common free-space subsystem.

Integration offers great opportunities for cost reduction, most remarkably for transponders, which represent a significant expense in a transmission system and require large numbers of wavelength and spatial channels (presently an independent line card is used for each wavelength channel).

Optical amplifiers supporting multiple spatial channels also offer great potential for integration, ranging from simple parallel erbium-doped amplifiers that share some components and control boards to advanced pump-sharing schemes such as cladding-pumped amplifiers, where a common pump is guided in the fiber cladding and the signals to be amplified are guided in the fiber cores.

Optical switches, which are the basic building blocks of optical networks, can also be modified to support systems with multiple spatial channels. In particular, the concept of joint switching means that a beam-steering-based switching element (for example microelectromechanical system (MEMS) mirror arrays, or liquid crystal on silicon (LCOS) pixel arrays) can be used to switch multiple spatial channels simultaneously.

At the physical transmission level, consisting of the optical fiber, it is also possible to utilize waveguide structures that support multiple spatial modes, either by using single cores that support multiple modes, or by adding multiple cores in the common cladding of an optical fiber. While the single-mode fiber transmission channel is very well known in terms of both the linear (chromatic dispersion, polarization-mode dispersion, and Rayleigh scattering) and nonlinear (Kerr effect, stimulated Raman scattering, and stimulated Brillouin scattering) effects, the characterization of the same propagation effects can become fairly complex if multiple modes are involved, as all possible intermodal interactions have to be accounted for. Modeling propagation effects in fibers supporting multiple modes is an imperative step towards the assessment of the transmission capacity in SDM systems.

1 Basic Concepts in Space-Division Multiplexed Optical Networks

Adding spatial channels to traditional wavelength-division multiplexed (WDM) networks can significantly affect the overall network complexity, as both the wavelength and space dimensions can be used for the routing of optical signals. Current network architectures are also often constrained by reliability requirements: for example, networks are designed such that a single failure at any component level can always be overcome by either a protection scheme (where a second independent physical path is available) or a restoration scheme (where a second independent path is configured when needed). Using multiple independent spatial paths provides more flexibility to overcome failures by appropriately designing and operating the network.

Finally, at a physical point-to-point link level, fibers supporting multiple light paths, like multimode and multicore fibers, can be utilized to reduce the component count by integrating functionalities in the spatial domain.

Routing in optical networks, network reliability, and point-to-point link engineering are not independent of each other, and the three domains have to be optimized jointly, which makes SDM networking a formidable challenge with no obvious winning network architecture, but rather with multiple possible solutions that depend on the specific network requirements.

In this chapter, we focus on SDM point-to-point links (Sect. 10.2) and routing in SDM networks (Sect. 10.8), as these two topics are key for understanding SDM networks.

2 SDM Point-to-Point Links

An SDM point-to-point link is schematically described in Fig. 10.1. It includes the following main components: A bank of transmitters Tx\({}_{n,m}\), a multiplexing device that encodes \(N\) spatial channels and \(M\) wavelength channels into the physical spatial and wavelength channels, an SDM fiber possibly followed by \(K\) sections, each consisting of an optical amplifier and an SDM fiber span. At the end of the transmission fiber, a demultiplexing device splits all transmitted channels by spatial and wavelength channels, and the individual channels are received by a bank of single-mode receivers Rx\({}_{n,m}\). The subscripts \(n\leq N\) and \(m\leq M\) in Fig. 10.1 are the indices of the spatial and wavelength channels, respectively.

Fig. 10.1 Point-to-point combined WDM/SDM transmission

Note that spatial and wavelength channels are not equivalent: Wavelength channels are formed by arbitrarily allocating portions of the continuous optical spectrum, whereas spatial channels are in a one-to-one correspondence with a discrete set of modes or cores of the transmission fiber. This has important implications for optical switching, as we will see in Sect. 10.8. Additionally, there is a fundamental difference between spatial and wavelength channels concerning crosstalk: Optical fibers do not introduce linear crosstalk between different wavelength channels (nonlinear effects can potentially do so, but we will neglect nonlinear effects at this stage), whereas linear crosstalk can be present between spatial channels, particularly if the spatial channels are arranged very closely, as in multicore fibers, where light can couple between neighboring cores, or in multimode fibers, where the modes spatially overlap and coupling can be produced by fiber imperfections. In practice, the presence of crosstalk between spatial channels is not necessarily detrimental for optical transmission. In fact, the crosstalk between spatial channels can be undone using digital signal processing (DSP) techniques, which are similar to the multiple-input-multiple-output (MIMO) algorithms used in wireless networks. A major advantage of optical fibers, as compared to the wireless channel, is that they have very low loss, and often also a very low difference in loss between modes, resulting in an almost unitary MIMO channel that, according to information theory, provides \(N\) times the capacity of a single channel, where \(N\) is the number of spatial channels. MIMO transmission is particularly effective if the received signals are detected using polarization-diverse coherent receivers (PD-Rxs), which are able to measure amplitude and phase for both polarizations of the received spatial channels. 
In fact, in the case where all modes guided by the fiber under test are detected using PD-Rxs, the optical field impinging upon the receiver is fully known and propagation-induced impairments (most effectively unitary linear effects) can be compensated for in the digital domain. The MIMO DSP that is necessary to undo linear impairments is a generalization of the \(2\times 2\) MIMO implemented in commercially available digital coherent transceivers (Chap. 6): \(2N\times 2N\) MIMO is required for \(N\) coupled spatial modes, and additionally a larger equalizer memory (number of taps) is needed because, as discussed in the next section, the group velocity difference between modes in multimode optical fibers is several orders of magnitude larger than the group velocity difference between the two polarizations of a single-mode fiber. The design of SDM fibers is therefore often optimized to make the group velocities of all the fiber modes as close to each other as possible. Note that in the case of multimode fiber, this optimization is very similar to that performed on commercial OM3 and OM4 multimode fibers used for short-reach interconnects, where the difference in group velocities between modes can limit transmission reach and bandwidth [10.5, 10.6, 10.7].
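The statement that an almost unitary MIMO channel provides \(N\) times the capacity of a single channel can be checked numerically. The following is a minimal sketch, assuming equal power on every input, the same signal-to-noise ratio on every channel, and full channel knowledge at the receiver; the matrix size, the SNR value, and the helper names `random_unitary` and `mimo_capacity` are illustrative choices and do not come from the chapter.

```python
import numpy as np


def random_unitary(dim, rng):
    """Draw a random unitary matrix from the QR decomposition of a complex Gaussian matrix."""
    z = rng.normal(size=(dim, dim)) + 1j * rng.normal(size=(dim, dim))
    q, _ = np.linalg.qr(z)
    return q


def mimo_capacity(h, snr):
    """Capacity in bit/s/Hz of channel matrix h for equal input powers and per-channel SNR `snr`."""
    singular_values = np.linalg.svd(h, compute_uv=False)
    return float(np.sum(np.log2(1.0 + snr * singular_values**2)))


rng = np.random.default_rng(0)
n_channels = 6   # hypothetical: 3 spatial modes x 2 polarizations
snr = 100.0      # hypothetical linear SNR (20 dB)

h = random_unitary(n_channels, rng)      # fully mixed, lossless channel
c_mimo = mimo_capacity(h, snr)
c_single = float(np.log2(1.0 + snr))     # one isolated channel at the same SNR

print(c_mimo / c_single)                 # ratio of MIMO capacity to single-channel capacity
```

Because all singular values of a unitary matrix equal one, the mixing introduced by the fiber does not reduce the capacity in this idealized model; mode-dependent loss would make the singular values unequal and lower the result.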

Coherent MIMO transmission over SDM fiber can maximize the transmission capacity of the fiber close to the theoretical limit imposed by information theory under any crosstalk condition. The crosstalk in the \(n\)-th mode \(X_{\mathrm{t}}(n)\) is defined as

$$\displaystyle X_{\mathrm{t}}(n)=\frac{\sum_{m\neq n}P_{m,n}}{P_{n,n}}\;,$$
(10.1)

where \(P_{m,n}\) is the power transfer matrix between an excited mode \(m\) and a mode \(n\) received after transmission through the fiber. The added complexity of the MIMO DSP, even if within the realm of modern ASIC technology, is often undesirable, and alternatively SDM fibers can be designed so that the crosstalk between spatial channels is reduced to levels where electronic crosstalk mitigation is no longer required. In Table 10.1 we report the maximum acceptable crosstalk level for an added system penalty (degradation of the quality factor \(Q\) of the transmitted signal) of 1 and \({\mathrm{3}}\,{\mathrm{dB}}\), respectively, at a bit-error ratio (BER) of \(10^{-3}\) [10.8], which using state-of-the-art hard-decision forward error correction (FEC) is sufficient to obtain error-free performance (post-FEC BER \(<10^{-12}\)).

Table 10.1 Maximum acceptable crosstalk level for an additive system penalty obtained at a BER of \(10^{-3}\) for quadrature phase-shift keying (QPSK) modulation and various quadrature-amplitude modulation (QAM) formats (after [10.8])

This is of particular interest in the case of a system based on the use of multicore fibers where distances over \({\mathrm{10000}}\,{\mathrm{km}}\) have been achieved without MIMO processing. We will refer to this technique as transmission in the low-crosstalk regime. Note that the low-crosstalk regime is challenging not only for the fiber, but also for all the in-line optical components, connectors and splices, which also have to meet the low-crosstalk requirement. This is sometimes a limiting factor for the level of integration that can be achieved.
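As an illustration of the crosstalk definition in (10.1), the sketch below evaluates \(X_{\mathrm{t}}(n)\) for every received mode from a power transfer matrix. The function name `crosstalk_db` and the \(3\times 3\) matrix are hypothetical and only serve to show the bookkeeping of the indices (rows are excited modes \(m\), columns are received modes \(n\)).

```python
import numpy as np


def crosstalk_db(p):
    """Return X_t(n) in dB for each received mode n, following Eq. (10.1).

    p[m, n] is the power coupled from excited mode m into received mode n.
    """
    p = np.asarray(p, dtype=float)
    wanted = np.diag(p)                 # P_{n,n}
    unwanted = p.sum(axis=0) - wanted   # sum over m != n of P_{m,n}
    return 10.0 * np.log10(unwanted / wanted)


# Hypothetical 3-mode example: each mode leaks a fraction 1e-3 of its power
# into each of the other two modes.
p = np.array([[1.0, 1e-3, 1e-3],
              [1e-3, 1.0, 1e-3],
              [1e-3, 1e-3, 1.0]])

print(crosstalk_db(p))
```

Each received mode collects \(2\times 10^{-3}\) of unwanted power relative to its own signal, that is, about \(-27\,\mathrm{dB}\) of accumulated crosstalk, which is the quantity to be compared against the limits of Table 10.1.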

Alternatively, it is also possible to combine MIMO-based transmission with the low-crosstalk regime: since not all modes show the same amount of coupling with each other, the spatial channels can be separated into groups, with strong coupling within each group and little crosstalk between groups. This technique is used in multimode fibers for distances smaller than \({\mathrm{100}}\,{\mathrm{km}}\) and also in multicore fibers with few-mode cores (few-mode multicore fibers), where MIMO is performed across the modes of each core but not across the modes of different cores, and distances up to \({\mathrm{1000}}\,{\mathrm{km}}\) have been demonstrated [10.9, 10.10].
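The benefit of restricting MIMO processing to groups can be estimated by counting equalizer filters. The sketch below assumes that a \(2N\times 2N\) equalizer needs one filter per matrix element and that all groups have the same size; the core and mode counts are hypothetical and the helper `mimo_filter_count` is introduced only for this illustration.

```python
def mimo_filter_count(n_spatial_modes):
    """Number of filters in a full 2N x 2N equalizer (two polarizations per spatial mode)."""
    return (2 * n_spatial_modes) ** 2


cores = 7            # hypothetical number of cores
modes_per_core = 3   # hypothetical number of spatial modes per core

# One equalizer spanning all cores and modes.
full = mimo_filter_count(cores * modes_per_core)

# One independent equalizer per core, with no processing across cores.
grouped = cores * mimo_filter_count(modes_per_core)

print(full, grouped, full / grouped)
```

With equal group sizes the saving equals the number of groups, here a factor of 7, provided that the crosstalk between groups stays below the level that the modulation format tolerates.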

In the following sections, we discuss in more detail the components of a point-to-point SDM link.

2.1 SDM Fibers

The single-mode fiber has been the workhorse of high-capacity long-distance communications for over three decades. It is also the most common and lowest-cost glass fiber (Chap. 2), mostly because of its favorable optical properties, like low loss and large bandwidth (tens of \(\mathrm{THz}\)). A common way to increase the capacity of optical links is therefore the use of single-mode fiber ribbons. Fiber ribbons can be spliced with commercial ribbon splicers and connectorized, therefore offering advantages in high fiber-count cables, and cables with over 3000 fibers are commercially available. Recently, it has been proposed to increase the capacity of the fiber-optic channel using the following fiber types:

  • Multicore fibers: fibers with multiple cores

  • Multimode fibers: fibers with cores supporting multiple modes.

Both fiber types are well known and have been proposed for various applications, like sensing, endoscopy, and short-reach interconnects, but a significant effort was recently devoted to optimizing the fiber design to support long-distance transmission.

2.1.1 Multicore Fibers

Since low-crosstalk multicore fibers (MCFs) have been successfully engineered, various techniques to further reduce the crosstalk between cores have been investigated. These include, for example, the use of trenches and holes around the cores and the use of heterogeneous cores, where neighboring cores are designed with different propagation constants. Additionally, bidirectional transmission can be used, where neighboring cores carry signals propagating in opposite directions. Nevertheless, the maximum number of cores in multicore fibers is limited by the cladding diameter. Increasing the cladding diameter substantially above the standard diameter of \({\mathrm{125}}\,{\mathrm{{\upmu}m}}\) results in more fragile fibers with reduced reliability, and practical cladding diameters for long-distance communication are limited to around \({\mathrm{250}}\,{\mathrm{{\upmu}m}}\). Additionally, a larger core density can be achieved in multicore fibers by reducing the effective area of the cores, which however reduces the transmission performance of the fiber. Multicore fibers with more than 30 cores [10.11] have been demonstrated for ultrahigh-capacity transmission over distances up to 1645 km, and longer distances can be achieved by reducing the core count to 12 cores, where distances up to \({\mathrm{8800}}\,{\mathrm{km}}\) have been demonstrated [10.12]. Multicore fibers can be spliced using splicers designed to support polarization-maintaining fibers, but the nonstandard cladding diameters require customized fiber holders. Also, the core alignment is typically less accurate than in the case of single-mode fiber splices, particularly for cores located far from the fiber center, which are susceptible to fiber rotation errors, resulting in slightly larger splicing losses (up to \({\mathrm{1}}\,{\mathrm{dB}}\)). Multicore fibers can be connectorized and prototype connectors have been demonstrated [10.13].

Alternatively, multicore fibers can be operated in the so-called coupled-core regime, where the distance between cores is made small, such that the cores become coupled. The coupled-core multicore fiber will therefore behave more like a microstructured fiber supporting multiple modes. It is important to point out that coupled-core fibers show an optimum core spacing, where the properties of the modes are such that the fiber perturbations cause strong random coupling between the modes, while the modal group-velocity difference is still moderately small. This effect is described in detail in Sect. 10.3.3.

2.1.2 Multimode Fibers

Multimode fibers (MMFs) are fibers with a single core that either is larger in diameter or has a larger refractive index than that of an SMF, such that the fiber can support more than one mode. The modes can then potentially be used as independent transmission channels, and fibers with 100 or more modes can be easily produced. Various refractive index designs of the core have been proposed to support SDM, like step-index and multistep index profiles, graded-index profiles, and ring-core fibers.

Two design strategies have been pursued for multimode fibers. The first strategy is to minimize the group delay difference between the supported fiber modes. For few modes this is possible by using multistep index fibers, for example a depressed-cladding design in the case of three spatial modes [10.14]. For larger numbers of modes, cores with a graded-index profile provide an optimum solution, and fibers with up to 45 modes at 1550 nm wavelength have been demonstrated using \({\mathrm{50}}\,{\mathrm{{\upmu}m}}\) core diameters [10.5, 10.6, 10.7]. These fibers are very similar in design and manufacturing to the commercially available graded-index fibers optimized for 850 nm wavelength, referred to as OM3 and OM4 fibers in commercial products, which are widely used for short-reach interconnects, for example in datacenters. Graded-index multimode fibers have a typical residual group delay difference of around \({\mathrm{100}}\,{\mathrm{ps/km}}\) [10.15, 10.16], which for long-distance communication that can reach up to \({\mathrm{10000}}\,{\mathrm{km}}\) would result in up to \({\mathrm{1}}\,{\mathrm{{\upmu}s}}\) of maximum delay between modes. Fortunately, the crosstalk between modes strongly reduces this effect, as explained in Sect. 10.5.1, such that impulse response widths of the order of tens of nanoseconds are experimentally observed. Using low group delay multimode fibers, transmission distances up to \({\mathrm{4500}}\,{\mathrm{km}}\) have been demonstrated using 3 spatial modes [10.17], whereas experiments with up to 45 spatial modes [10.18, 10.19] have been reported for shorter fiber lengths. The second design strategy aims to reduce the coupling between modes and use the modes as independent transmission channels. The main goal in this case is to increase the difference in phase velocity between modes by optimizing the index profile. The phase velocity of a mode describes the speed at which the optical phase front travels in the fiber (a formal definition is given in Sect. 10.3). 
This approach can yield a significant crosstalk reduction, particularly between nearly degenerate modes. However, fiber manufacturing imperfections, geometrical deformations like bending and twisting, and Rayleigh scattering limit the achievable crosstalk reduction, with the smallest reported crosstalk levels being of the order of \(-30\) to \({\mathrm{-40}}\,{\mathrm{dB/km}}\), and the longest transmission distance demonstrated being about \({\mathrm{50}}\,{\mathrm{km}}\). Low-crosstalk fibers are therefore of interest for short-distance applications (\({<}\,{\mathrm{100}}\,{\mathrm{km}}\)).
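The delay figures quoted for graded-index fibers can be reproduced with a short calculation. The first part is the worst-case arithmetic for uncoupled modes, where the delay grows linearly with distance. The second part is a sketch under an added assumption: for strongly coupled modes the delay spread is taken to grow with the square root of the distance, with a coupling length of \({\mathrm{1}}\,{\mathrm{km}}\) chosen purely for illustration (it is not a value given in this chapter).

```python
import math

dgd_ps_per_km = 100.0   # residual group delay difference quoted in the text
length_km = 10_000.0    # long-haul distance quoted in the text

# Uncoupled modes: the delay accumulates linearly with distance.
worst_case_ps = dgd_ps_per_km * length_km


def coupled_spread_ps(dgd_ps_per_km, length_km, coupling_length_km):
    """Delay spread for strongly coupled modes, assuming square-root growth with distance."""
    return dgd_ps_per_km * math.sqrt(length_km * coupling_length_km)


# Hypothetical coupling length of 1 km, used only to show the order of magnitude.
coupled_ps = coupled_spread_ps(dgd_ps_per_km, length_km, coupling_length_km=1.0)

print(worst_case_ps * 1e-6, "us without coupling")
print(coupled_ps * 1e-3, "ns with strong coupling (assumed model)")
```

Under these assumptions the spread drops from \({\mathrm{1}}\,{\mathrm{{\upmu}s}}\) to about \({\mathrm{10}}\,{\mathrm{ns}}\), which has the same order of magnitude as the experimentally observed impulse response widths mentioned above.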

We note that alternative fiber designs aiming to avoid degeneracy between modes have been proposed, for example by using elliptical cores [10.20, 10.21, 10.22], or high-contrast ring-core fiber designs [10.23], which allow all fiber modes to be accessed and used without any MIMO digital signal processing.

2.2 Mode Multiplexers and Fiber Couplers

Mode multiplexers (MMUXs) are devices that couple multiple single-mode fibers into the modes of a multimode fiber. There are two types of MMUXs: The so-called mode-selective device directly couples a given single-mode fiber to a specific fiber mode, whereas the nonmode-selective device associates different single-mode fibers with different orthogonal linear combinations of modes. For coherent MIMO transmission a nonmode-selective MMUX can be utilized with no disadvantage, whereas in some specific cases, for example transmission with low crosstalk between mode groups, or modal delay compensation, mode-selective MMUXs with high modal selectivity are required.

Since fiber modes are orthogonal to each other (the definition of mode orthogonality is provided in Sect. 10.3), it is theoretically possible to separate modes in a lossless way, similarly to the way in which a polarizing beam splitter can separate two polarization components, or a diffraction grating can separate different wavelengths. Numerous techniques have been proposed and demonstrated for realizing mode multiplexers [10.25, 10.26, 10.27, 10.28, 10.29, 10.30, 10.31, 10.32, 10.33]. In what follows, we describe two of the most promising techniques.

2.2.1 The Photonic Lantern

Photonic lanterns (PLs) provide an adiabatic transition between \(N\) single-mode fibers and a step-index multimode fiber [10.24, 10.31, 10.32, 10.34, 10.35, 10.36]. The device is manufactured in a glass processor, starting from \(N\) single-mode fibers inserted in a low-refractive-index capillary. The composite structure is then continuously reduced in size (tapered) in the glass processor so that the cores of the single-mode fibers vanish and the claddings form the new core, whereas the low-refractive-index capillary forms the new cladding (Fig. 10.2a).

Fig. 10.2 (a) Principle of the fiber-based photonic lantern. (b) Photonic lantern implemented using laser-inscribed waveguides. (c) Core arrangement matching linearly polarized (LP) modes of multimode fibers with 3, 6, 10, and 15 spatial modes [10.24]

The end section of the PL can then be directly spliced to a multimode fiber with a matching core geometry. In order to map the correct modes between the single-mode and the multimode end, the SMFs have to be arranged into specific spatial patterns (Fig. 10.2c). Arranging the fibers according to these patterns is easily achieved for MMUXs with up to six spatial modes, where the single-mode fibers self-arrange in the capillary. Photonic lanterns supporting a larger number of modes are fabricated more easily by using a drilled low-refractive-index preform instead of the capillary, where each fiber is held in the correct location by the drilled holes. In this way MMUXs with up to 15 spatial modes have been demonstrated [10.37]. Mode-selective PLs can be achieved by starting from nonidentical single-mode fibers, either by slightly changing the core diameter or the refractive index difference between the core and cladding of the single-mode fibers [10.31, 10.32, 10.38]. In this way, for example, the single-mode fiber with the lowest effective refractive index can be matched with the fiber mode that also has the lowest effective refractive index, and a mapping between modes and the input SMFs can be achieved. Photonic lanterns are of interest because devices with low insertion loss and low mode-dependent loss (MDL) can be achieved. For example, an MMUX supporting three spatial modes can have an insertion loss of less than \({\mathrm{0.5}}\,{\mathrm{dB}}\) and less than \({\mathrm{3}}\,{\mathrm{dB}}\) MDL (in short, MDL is defined as the power ratio in dB between the least and most attenuated linear combinations of modes; see Sect. 10.5.2 for a more extended description). The corresponding values for six and ten modes are slightly larger, but PL technology is still the best performing in terms of loss.
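The short definition of MDL given above translates directly into a singular-value computation: the least and most attenuated linear combinations of modes correspond to the largest and smallest singular values of the field transfer matrix of the device. The following sketch illustrates this; the function name `mdl_db` and the example matrix are hypothetical.

```python
import numpy as np


def mdl_db(h):
    """Mode-dependent loss in dB: power ratio between the least and most attenuated mode combinations."""
    s = np.linalg.svd(np.asarray(h, dtype=complex), compute_uv=False)
    # Singular values are field amplitudes, so the power ratio in dB uses 20*log10.
    return float(20.0 * np.log10(s.max() / s.min()))


# Hypothetical three-mode device: two mode combinations pass without loss,
# the third loses half of its power.
h = np.diag([1.0, 1.0, np.sqrt(0.5)])

print(mdl_db(h))
```

For this example the result is about \({\mathrm{3}}\,{\mathrm{dB}}\). Multiplying `h` by any unitary matrix on either side leaves the value unchanged, which is why MDL is a property of the device and not of the mode basis in which it is measured.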

Photonic lanterns can also be realized by using femtosecond laser-inscribed three-dimensional waveguides (Fig. 10.2b), where waveguides are written into a glass substrate and brought close to each other, arranged as shown in Fig. 10.2c, so that the waveguides almost merge [10.39]. In particular, there is an alternative design for the laser-inscribed PL, referred to as taper velocity couplers [10.40], where the inscribed waveguides are tapered within the coupling section. MMUXs fabricated this way are commercially available, and offer the advantage of being compatible with planar waveguide technology, as the input waveguides can be arranged in any desired geometry.

2.2.2 The Multiplane Light Conversion Mode Multiplexer

Multiplane light conversion has been proposed as a universal mode-converting device capable of transforming any input mode set into any output mode set [10.33]. This functionality is achieved by having the light traverse multiple phase-only transmission masks, each followed by a free-space transmission section. The number of required masks depends on the desired transformation, but fortunately, one of the transformations of greatest interest in mode-division multiplexing, namely the transformation from spatially separated spots into Laguerre–Gaussian beams (which closely approximate the modes of a graded-index multimode fiber, as will be seen in the following sections), can be achieved with fewer than ten phase masks. Also, in practice, multiplane devices are implemented using a reflective geometry, where light is reflected multiple times between two planes: one plane consists of a substrate containing multiple phase masks, and the second is a high-reflectivity mirror. A schematic arrangement for a mode multiplexer is shown in Fig. 10.3.

Fig. 10.3 Multiplane light converter configured as a mode multiplexer: Each single-mode fiber (SMF) on the left-hand side will couple to a particular mode of the multimode fiber (MMF) on the right-hand side [10.41]

Using this technology, MMUXs with large mode selectivity have been realized with up to 45 spatial modes [10.41, 10.42], and proof-of-principle devices with up to 210 spatial modes using only 7 phase masks have been demonstrated [10.43]. Multiplane-based MMUXs offer great flexibility, but to date products and prototypes have not matched the low loss of photonic lanterns. However, since they are theoretically lossless elements, lower-loss devices can be expected by using coatings with higher reflectivity and improved mask fabrication and designs.
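The structure of a multiplane light converter, a sequence of phase-only masks each followed by free-space propagation, and the reason why it is theoretically lossless can be illustrated with a scalar simulation. The sketch below is not a mask design: the masks are random placeholders, the grid, mask spacing, and input spot are hypothetical, and propagation uses the angular-spectrum method on a periodic grid. It only verifies that each step is a pure phase operation, so that the total power is conserved.

```python
import numpy as np


def propagate(field, dx, wavelength, distance):
    """Angular-spectrum free-space propagation of a sampled scalar field (lengths in micrometers)."""
    n = field.shape[0]
    f = np.fft.fftfreq(n, d=dx)
    fx, fy = np.meshgrid(f, f, indexing="ij")
    # With dx > wavelength / sqrt(2) every sampled spatial frequency is propagating,
    # so the transfer function below is a pure phase factor.
    kz = 2.0 * np.pi * np.sqrt(1.0 / wavelength**2 - fx**2 - fy**2)
    return np.fft.ifft2(np.fft.fft2(field) * np.exp(1j * kz * distance))


rng = np.random.default_rng(1)
n, dx, wavelength, gap = 64, 2.0, 1.55, 5000.0   # hypothetical grid and mask spacing
n_masks = 7                                      # mask count taken from the 210-mode example

x = (np.arange(n) - n / 2) * dx
xx, yy = np.meshgrid(x, x, indexing="ij")
field = np.exp(-((xx - 20.0) ** 2 + yy**2) / (2.0 * 8.0**2)).astype(complex)  # one input spot

power_in = np.sum(np.abs(field) ** 2)
for _ in range(n_masks):
    mask = np.exp(1j * rng.uniform(0.0, 2.0 * np.pi, size=(n, n)))  # placeholder phase-only mask
    field = propagate(field * mask, dx, wavelength, gap)
power_out = np.sum(np.abs(field) ** 2)

print(power_out / power_in)
```

In a real device the masks are optimized so that each input spot is mapped onto one fiber mode; the loss observed in practice then comes from mirror reflectivity, mask fabrication, and imperfect mode matching rather than from the principle itself.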

2.3 SDM Amplifiers

Optical amplifiers that support multiple spatial paths can be designed in several ways. A simple collocation of multiple erbium-doped fibers in a single module can already provide significant cost savings by sharing components and control electronics. Additionally, it is possible to share pump lasers and use arrays of integrated optics components like taps and detectors for power level monitoring. A higher level of integration is possible when using multimode or multicore erbium-doped fibers. It is then possible to extend the functionality of free-space optics-based components like for example optical isolators or gain-equalizing filters to support multiple modes or cores without the need for mode multiplexers or fan-in/fan-out devices in the case of multimode- and multicore fibers, respectively.

Multimode erbium-doped fibers typically require a precisely controlled doping profile and/or a specific modal excitation of the pump laser in order to achieve a similar gain for each mode of interest [10.44, 10.45, 10.46, 10.47]. The situation is more advantageous in multicore doped fibers, where pump lasers are coupled into the individual doped cores of the multicore fiber. The gain and noise figure of core-pumped multicore amplifiers are expected to be comparable to those of traditional single-mode erbium-doped amplifiers, but the pump couplers will require at least one set of fan-in devices to couple the pump lasers.

A particularly promising alternative approach to building optical amplifiers with multiple spatial channels is based on cladding pumping. In cladding pumping, a multimode high-power pump laser at 980 nm wavelength is coupled to the cladding of the amplifying fiber, which results in a homogeneous illumination of the entire fiber cross-section. The cores, which are erbium-doped, absorb the pump light and amplify the guided signals (Fig. 10.4a,b).

Fig. 10.4a,b Cladding pump amplifier scheme: (a) Side-pumping arrangement where a multimode pump is coupled into the cladding of a multicore fiber to homogeneously illuminate the cladding; (b) cross-section of multicore fiber showing the different fiber regions and the low-index coating used to confine the pump light

The multimode pump can be coupled to the cladding by using the so-called side-pumping configuration, which consists of bringing a tapered multimode fiber carrying the pump light into contact with the external cladding of the amplifying fiber. Coupling efficiency of \(> {\mathrm{80}}\%\) can be achieved without using traditional dichroic combiners.

Cladding pumping can be applied to both multicore and multimode fibers. In multicore fibers, all cores are illuminated by the pump light, which can produce homogeneous gain. This is advantageous compared to core pumping, where for each core the pump light has to be independently combined and coupled [10.48]. In order to minimize the gain variation between cores in cladding-pumped amplifiers, the core properties have to be precisely matched. Alternatively, variable gain attenuators acting on each core separately can be used in combination with a dynamic gain equalizing filter, for example using an LCOS-based device [10.49]. Furthermore, cladding-pumped amplifiers are typically operated in a nonsaturated output power regime, and therefore can naturally provide constant gain independently of the input power, which is desirable in optical networks to reduce the effect of transients caused by partial network failures. A drawback of operating the amplifier in a nonsaturated regime is, however, the reduced power efficiency of the cladding-pumped amplifier, which can in principle be overcome by means of pump-recycling schemes [10.50] able to reuse the pump light that is not absorbed by the cores at the end of the amplifying fiber (which otherwise is just dumped into free space and absorbed).

Cladding pumping is also advantageous for multimode amplifiers: A homogeneous gain for all modes can be achieved by using cladding pumping in combination with a simple doping profile with constant erbium concentration across the whole core region [10.51, 10.52].

2.4 SDM Transceivers

Transceivers for SDM differentiate themselves from state-of-the-art single-mode digital coherent transceivers in two ways.

The first main difference is that additional DSP capabilities are required to support multiple modes, and a longer memory is necessary to accommodate larger group-delay differences in the MIMO DSP. Also, in order to simultaneously process multiple MIMO channels, a higher integration density of high-speed analog-to-digital converters (ADCs) is required. Note that the complexity of a conventional SMF digital coherent transceiver is dominated by the chromatic dispersion compensation (approximately \({\mathrm{30}}\%\)) and forward-error correction sections (approximately \({\mathrm{25}}\%\)), whereas the \(2\times 2\) MIMO section contributes relatively little complexity (approximately \({\mathrm{10}}\%\)). Therefore, if the complexity of the MIMO DSP hypothetically grows by one order of magnitude, the overall DSP complexity increases only by a factor of about 2 per mode (see [10.53] for a detailed analysis).
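The factor of about 2 follows from simple arithmetic on the complexity shares quoted above. In the sketch below the shares for chromatic dispersion compensation, FEC, and MIMO are the approximate values from the text; lumping the remaining \({\mathrm{35}}\%\) into a single unchanged term is an assumption made for this illustration.

```python
# Approximate complexity shares of a conventional single-mode coherent DSP (from the text).
cd_share = 0.30     # chromatic dispersion compensation
fec_share = 0.25    # forward error correction
mimo_share = 0.10   # 2 x 2 MIMO equalizer
other_share = 1.0 - cd_share - fec_share - mimo_share   # assumed to stay unchanged

mimo_growth = 10.0  # hypothetical tenfold growth of the MIMO section per mode

# Per-mode complexity relative to the single-mode transceiver.
relative_complexity = cd_share + fec_share + other_share + mimo_growth * mimo_share

print(relative_complexity)
```

The unchanged sections contribute 0.9 and the enlarged MIMO section contributes 1.0, giving a per-mode complexity of about 1.9 times that of a conventional transceiver.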

The second main difference is that transceivers in a conventional WDM system require a tunable laser for each transceiver in order to cover the full wavelength channel map. Once the spatial dimension is added, it is in principle possible to share a single tunable laser among multiple spatial channels [10.54]. This is of particular interest in MIMO-based channels, where all spatial channels are treated as a single end-to-end MIMO transmission channel [10.55].

Additionally, the large count of basic components required in an SDM transceiver, like high-speed ADCs and digital-to-analog converters (DACs), modulators, and optical coherent receivers, offers a great cost-saving potential through the integration of large pools of transceivers.

3 Fiber Modes

The modes of an optical fiber are the solutions of Maxwell's equations for electromagnetic waves traveling along the propagation axis \(z\) of a dielectric waveguide characterized by some transversal refractive index profile. These solutions are invariant in the \(z\)-direction up to a phase term \(\exp(\mathrm{j}\beta_{n}z)\), where \(n\) is the mode number and \(\beta_{n}\) is the corresponding propagation constant. A single-mode electromagnetic field generated by a monochromatic source at the angular frequency \(\omega_{0}\) can be expressed as

$$\displaystyle\begin{aligned}\displaystyle\boldsymbol{E}_{n}(x,y,z,t)&\displaystyle=\mathrm{Re}\left[\boldsymbol{F}_{n}(x,y,\omega_{0})\mathrm{e}^{\mathrm{j}(\beta_{n}z-\omega_{0}t)}\right],\\ \displaystyle\boldsymbol{H}_{n}(x,y,z,t)&\displaystyle=\mathrm{Re}\left[\boldsymbol{G}_{n}(x,y,\omega_{0})\mathrm{e}^{\mathrm{j}(\beta_{n}z-\omega_{0}t)}\right],\end{aligned}$$
(10.2)

where \(\boldsymbol{E}_{n}\) and \(\boldsymbol{H}_{n}\) are real-valued three-dimensional vectors that represent the electric and magnetic field, respectively. The complex-valued vectors \(\boldsymbol{F}_{n}\) and \(\boldsymbol{G}_{n}\) are the corresponding lateral modal field distributions, and they are solutions of the Helmholtz equation in the form [10.56, 10.57]

$$\begin{aligned}\Updelta_{\mathrm{T}}\boldsymbol{F}_{n}+\frac{\omega^{2}_{0}}{c^{2}}n^{2}(x,y)\boldsymbol{F}_{n} & =\beta_{n}^{2}\boldsymbol{F}_{n}\;,\end{aligned}$$
(10.3)
$$\begin{aligned}\Updelta_{\mathrm{T}}\boldsymbol{G}_{n}+\frac{\omega^{2}_{0}}{c^{2}}n^{2}(x,y)\boldsymbol{G}_{n} & =\beta_{n}^{2}\boldsymbol{G}_{n}\;,\end{aligned}$$
(10.4)

where \(n(x,y)\) is the transversal refractive index profile, \(c\) is the speed of light in a vacuum, and \(\Updelta_{\mathrm{T}}=\partial^{2}/\partial x^{2}+\partial^{2}/\partial y^{2}\) is the transversal Laplace operator. Note that the first of the two equations is only exact if \(\nabla\cdot\boldsymbol{E}=0\), whereas in general \(\nabla\cdot\boldsymbol{E}=-2\nabla\left[\log(n)\right]\cdot\boldsymbol{E}\), as follows from \(\nabla\cdot\boldsymbol{D}=0\) (\(\boldsymbol{D}\) is the electric displacement vector). The condition \(\nabla\cdot\boldsymbol{E}=0\) is fulfilled exactly for step-wise constant refractive index profiles, and only approximately for slowly varying refractive index profiles.
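The divergence relation quoted above can be made explicit with a short derivation, sketched here under the assumption of a source-free, isotropic medium with \(\boldsymbol{D}=\epsilon_{0}n^{2}\boldsymbol{E}\):

$$\begin{aligned}0=\nabla\cdot\boldsymbol{D}&=\epsilon_{0}\nabla\cdot\left(n^{2}\boldsymbol{E}\right)=\epsilon_{0}\left[n^{2}\,\nabla\cdot\boldsymbol{E}+\nabla\left(n^{2}\right)\cdot\boldsymbol{E}\right],\\ \nabla\cdot\boldsymbol{E}&=-\frac{\nabla\left(n^{2}\right)}{n^{2}}\cdot\boldsymbol{E}=-2\nabla\left[\log(n)\right]\cdot\boldsymbol{E}\;.\end{aligned}$$

The right-hand side vanishes wherever the refractive index is locally constant, which is why the condition holds inside each region of a step-wise constant profile.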

In most cases that are of practical relevance, the fiber modes have to be calculated with numerical methods, and they fulfill the orthogonality condition [10.56]

$$\displaystyle\iint\mathrm{d}x\mathrm{d}y\left(\boldsymbol{F}_{n}\times\boldsymbol{G}_{m}^{\ast}\right)\cdot\boldsymbol{\hat{z}}=2\mathcal{N}_{n}^{2}\delta_{n,m}\;,$$
(10.5)

where \(\boldsymbol{\hat{z}}\) is a unit vector pointing in the propagation direction, \(\delta_{n,m}\) is the Kronecker delta, and \(\mathcal{N}_{n}\) is a normalization factor that is discussed later in the chapter. The orthogonality condition is used to decompose any electric and magnetic field at the fiber input characterized by the lateral profiles \(\boldsymbol{\mathcal{F}}_{\mathrm{fs}}(x,y)\) and \(\boldsymbol{\mathcal{G}}_{\mathrm{fs}}(x,y)\), respectively, as a linear combination of the fiber modes by using the following relations

$$\displaystyle\begin{aligned}\displaystyle\boldsymbol{\mathcal{F}}_{\mathrm{fb}}(x,y)&\displaystyle=\sum_{n=1}^{N}a_{n}\boldsymbol{F}_{n}(x,y)\;,\\ \displaystyle\boldsymbol{\mathcal{G}}_{\mathrm{fb}}(x,y)&\displaystyle=\sum_{n=1}^{N}a_{n}\boldsymbol{G}_{n}(x,y)\;,\\ \displaystyle a_{n}&\displaystyle=\iint\mathrm{d}x\mathrm{d}y\frac{\boldsymbol{\mathcal{F}}_{\mathrm{fs}}\times\boldsymbol{G}_{n}^{\ast}}{2\mathcal{N}_{n}^{2}}\cdot\boldsymbol{\hat{z}}\\ \displaystyle&\displaystyle=\iint\mathrm{d}x\mathrm{d}y\frac{\boldsymbol{F}_{n}^{\ast}\times\boldsymbol{\mathcal{G}}_{\mathrm{fs}}}{2\mathcal{N}_{n}^{2}}\cdot\boldsymbol{\hat{z}}\;,\end{aligned}$$
(10.6)

where the subscripts fs and fb refer to the input field in free space and to the guided field in the fiber, respectively.

Different modes have different propagation constants, unless they are degenerate. It is customary to represent each propagation constant as a Taylor expansion around the central frequency \(\omega_{0}\)

$$\displaystyle\begin{aligned}\displaystyle\beta_{n}&\displaystyle=\beta_{n,0}+\beta_{n,1}(\omega-\omega_{0})+\frac{1}{2}\beta_{n,2}(\omega-\omega_{0})^{2}\\ \displaystyle&\displaystyle\quad+\frac{1}{6}\beta_{n,3}(\omega-\omega_{0})^{3}+\dots\end{aligned}$$
(10.7)

The different terms have the following physical interpretation. The first term is related to the phase velocity \(\nu_{\mathrm{ph},n}\),

$$\displaystyle\nu_{\mathrm{ph},n}=\frac{\omega_{0}}{\beta_{n,0}}\;,$$
(10.8)

which describes the speed of the phase front of the propagating field. When two modes travel at the same phase velocity, energy can be exchanged and the coupled light is kept in phase over a long propagation distance. For pairs of modes with different phase velocities it is possible to define a beat length,

$$\displaystyle L_{n,m}=\frac{2\uppi}{|\beta_{n,0}-\beta_{m,0}|}\;,$$
(10.9)

so that the accumulated phase difference between the two modes is \(\Updelta\phi=2\uppi z/L_{n,m}\), and the two modes are periodically in phase with the period being equal to the beat length. The second term of (10.7) yields the group velocity

$$\displaystyle\nu_{\mathrm{gr},n}=\frac{1}{\beta_{n,1}}\;,$$
(10.10)

which describes the speed of a light pulse traveling in mode \(n\). The arrival time difference for two pulses traveling in modes \(n\) and \(m\), respectively, of a fiber of length \(\ell\) is known as the differential group delay (DGD)

$$\displaystyle\mathrm{DGD}_{n,m}=\ell|\beta_{n,1}-\beta_{m,1}|\;.$$
(10.11)

The third term of (10.7) is related to the chromatic dispersion (CD) coefficient \(\mathrm{CD}_{n}=-(2\uppi c/\lambda_{0}^{2})\beta_{n,2}\), where \(\lambda_{0}=2\uppi c/\omega_{0}\). Chromatic dispersion is responsible for the intramodal frequency dependence of the group velocity, which causes temporal pulse broadening even in the case when only a single mode is transmitted.
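As a numerical illustration of (10.8)–(10.11), the following minimal sketch evaluates the phase velocity, beat length, group velocity, and DGD for two modes. The effective and group indices, the wavelength, and the fiber length are hypothetical values chosen only for illustration.

```python
import math

C = 299_792_458.0          # speed of light in vacuum, m/s
wavelength = 1550e-9       # assumed carrier wavelength, m
omega0 = 2 * math.pi * C / wavelength

# Hypothetical effective and group indices of two modes n and m.
n_eff = {"n": 1.4480, "m": 1.4470}
n_grp = {"n": 1.4650, "m": 1.4660}

beta0 = {k: omega0 * v / C for k, v in n_eff.items()}      # rad/m
beta1 = {k: v / C for k, v in n_grp.items()}               # s/m

v_phase = {k: omega0 / beta0[k] for k in beta0}            # (10.8)
v_group = {k: 1.0 / beta1[k] for k in beta1}               # (10.10)
beat_length = 2 * math.pi / abs(beta0["n"] - beta0["m"])   # (10.9)

fiber_length = 1000.0                                      # m
dgd = fiber_length * abs(beta1["n"] - beta1["m"])          # (10.11)

print(f"beat length: {beat_length * 1e3:.3f} mm")
print(f"DGD over 1 km: {dgd * 1e9:.3f} ns")
```

In this example an effective-index difference of \(10^{-3}\) gives a beat length of the order of a millimeter, whereas a group-index difference of the same size accumulates a DGD of a few nanoseconds per kilometer.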

3.1 Modes of a Step-Index Fiber

The simplest optical fibers are based on a step-index profile, where the refractive index in polar coordinates is given by

$$\displaystyle n(r)=\begin{cases}n_{c},&\quad\mathrm{if}\;0<r\leq a\;,\\ n_{0},&\quad\mathrm{if}\;r> a\;,\end{cases}$$
(10.12)

where \(a\) is the fiber core radius, and \(r=\sqrt{x^{2}+y^{2}}\). For step-index fibers the solution of (10.3) and (10.4) is known to be given by the Bessel function \(J_{n}(r)\) in the core region, and by the modified Bessel function of the second kind \(K_{n}(r)\) in the cladding region. Even if the solutions are known within each refractive index region, numerical methods have to be used to solve the boundary condition problem between core and cladding. The solutions for the first four mode groups of a step-index fiber are shown in Fig. 10.5.

Fig. 10.5 First four mode groups of a step-index multimode fiber

The fundamental mode HE\({}_{11}\) is degenerate in polarization, meaning that two modes with orthogonal polarizations, but with the same lateral profile and propagation constants are supported by the fiber. The next modes TM\({}_{01}\) and TE\({}_{01}\) are nondegenerate, and their polarization is position-dependent. The following HE\({}_{21}\) mode is two-fold degenerate and can be represented as a linear combination of polarizations, by using circularly polarized light resulting in a ring-shaped intensity profile (similar to TM\({}_{01}\) and TE\({}_{01}\)), but with a phase term of the form \(\exp(\pm\mathrm{j}\varphi)\), where \(\varphi\) is the angle in polar coordinates. We note that fiber modes are often referred to as orbital angular momentum (OAM) modes [10.23]. However, not all fiber modes can be represented as OAM modes, in particular TM\({}_{0n}\) and TE\({}_{0n}\) are not compatible with OAM modes [10.58], and also the concept of OAM modes breaks down when the effective index contrast is such as to break the degeneracy of the HE\({}_{nm}\) modes [10.59, 10.60].

In the relevant regime of weak guidance [10.61], where the index contrast is smaller or much smaller than \({\mathrm{1}}\%\), the modes of a step-index fiber can be grouped in nearly degenerate mode groups (TM\({}_{01}\), TE\({}_{01}\), and the two-fold degenerate HE\({}_{21}\) mode form a group). Modes belonging to the same group couple almost immediately due to fiber imperfections, and therefore do not propagate as individual modes in real fibers. Moreover, in the regime of weak guidance all modes are approximately linearly polarized (LP) [10.61]. Each LP mode is identified by two integers \(n\) and \(m\) and is denoted as LP\({}_{nm}\), where \(n\) characterizes the azimuthal dependence \(\exp(\pm\mathrm{j}n\varphi)\) and sets the number of \(2\uppi\) phase changes for a rotation around the fiber axis, whereas \(m\) characterizes the radial dependence of the mode amplitude and sets the number of zero-crossings increased by one in the radial direction. Note that LP modes can equivalently be expressed either in terms of the functions \(\exp(\pm\mathrm{j}n\varphi)\), or in terms of the functions \(\cos(n\varphi)\) and \(\sin(n\varphi)\). Figure 10.6 shows the first 32 modes of a weakly guiding step-index fiber in the \(\cos(n\varphi)/\sin(n\varphi)\) representation. The main advantage of this representation is that the mode lateral profile functions are real-valued and can be plotted using a simple color map. In the figure red and blue indicate positive and negative amplitude values, respectively.

Fig. 10.6 First 32 modes of a step-index multimode fiber

Modes with \(n\neq 0\) are four-fold degenerate, as they are degenerate with respect to polarization and with respect to their azimuthal characteristics. Modes with \(n=0\) are two-fold degenerate, as they are only degenerate with respect to polarization. The number of LP modes that are guided by a step-index fiber depends on the normalized frequency \(V\), which is defined as

$$\displaystyle V=\frac{2\uppi a}{\lambda_{0}}\sqrt{n_{\mathrm{c}}^{2}-n_{0}^{2}}\;.$$
(10.13)

Fibers with the same \(V\) number guide the same number of modes and the mode profiles are characterized by the same dependence on the normalized radial coordinate \(r/a\). Step-index fibers are single-mode if \(V<2.405\). The cut-off frequencies of high-order modes are enumerated in Table 10.2.

Table 10.2 Cut-off frequencies of the modes of a step-index fiber

According to (10.13), the number of modes of a step-index fiber can be increased either by increasing the core radius or by increasing the difference in refractive index \(n_{\mathrm{c}}-n_{0}\).
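The dependence of the mode count on \(V\) can be illustrated with a minimal sketch. The fiber parameters below are hypothetical, and the cut-off values hard-coded in `LP_CUTOFFS` are the standard Bessel-zero values for weakly guiding fibers, included here as an assumption for illustration only.

```python
import math

# Cut-off V values of the lowest LP modes (standard weakly guiding values,
# rounded to three decimals); keys are (azimuthal n, radial m).
LP_CUTOFFS = {
    (0, 1): 0.0, (1, 1): 2.405, (2, 1): 3.832,
    (0, 2): 3.832, (3, 1): 5.136, (1, 2): 5.520,
}
V_TABLE_LIMIT = 6.380  # next cut-off (LP41); the table is incomplete beyond it


def normalized_frequency(core_radius, wavelength, n_core, n_clad):
    """V number of a step-index fiber, as in (10.13)."""
    return 2 * math.pi * core_radius / wavelength * math.sqrt(n_core**2 - n_clad**2)


def guided_lp_modes(v):
    """Guided LP modes and the mode count including all degeneracies."""
    if v >= V_TABLE_LIMIT:
        raise ValueError("cut-off table only covers V < 6.380")
    modes = [nm for nm, v_cut in LP_CUTOFFS.items() if v > v_cut]
    # n = 0 modes are two-fold degenerate, n != 0 modes are four-fold degenerate.
    total = sum(2 if n == 0 else 4 for n, _ in modes)
    return modes, total


# Hypothetical fiber parameters at 1550 nm; only the core radius changes.
for radius in (4.1e-6, 8.2e-6):
    v = normalized_frequency(radius, 1.55e-6, 1.450, 1.445)
    modes, total = guided_lp_modes(v)
    print(f"a = {radius * 1e6:.1f} um, V = {v:.3f}, LP modes = {modes}, total = {total}")
```

Doubling the core radius doubles \(V\), which in this example moves the fiber from single-mode operation to guiding the LP\({}_{01}\), LP\({}_{11}\), LP\({}_{21}\), and LP\({}_{02}\) modes.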

3.2 Modes of a Graded-Index Fiber

Graded-index fibers have a nearly parabolic refractive index profile that is truncated at the core/cladding boundary. In order to reduce coupling between the last guided mode-group and the cladding modes, an additional low refractive index trench around the nearly parabolic core is added [10.5, 10.6]. The modes of a graded-index fiber with an ideally parabolic profile are Laguerre–Gaussian modes, where the radial term of the modal profile is a combination of a Gaussian function and a Laguerre polynomial, whereas the azimuthal term is still of the form \(\exp(\mathrm{j}n\varphi)\). As shown in Fig. 10.7, the modes of a graded-index fiber are divided into groups: all the modes of a given group are degenerate and each added group shows one additional degenerate mode. Although the modes of a graded-index fiber look similar to those of a step-index fiber, they have a different sequential order when sorted by the magnitude of their propagation constants. Also, the radial field extension, that in step-index fibers is mostly confined within the core for all modes, in graded-index fibers is proportional to the mode order, so that higher order modes have a significantly larger modal diameter. The propagation constants of the mode groups of an ideal graded-index fiber are equally spaced, and all modes have nearly the same group velocity, which makes the graded-index fiber the multimode fiber with the smallest theoretical DGD between pulses propagating in different modes. In practice, the DGD of a graded-index fiber is determined by the accuracy of the refractive index profile and by the dispersion properties of the core and cladding materials. Additionally, the modes of a graded-index fiber have the property of being invariant under spatial Fourier transformation. 
The spatial Fourier transform configuration can be achieved, for example, by placing the end face of the fiber in the back focal plane of a lens, so that the Fourier transform appears in the front focal plane of the same lens. For a graded-index fiber, this configuration reproduces a scaled version of the modes, which is not generally the case for other sets of guided modes such as the modes of a step-index fiber.
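The group structure described above, in which each added group contributes one more degenerate mode than the previous one, can be tallied with a short sketch. It counts spatial modes only (polarization is not included), and the function names are hypothetical.

```python
def modes_in_group(group: int) -> int:
    """Number of degenerate spatial modes in mode group `group` (1-based)
    of an ideal parabolic-index fiber: each group adds one mode."""
    return group


def spatial_modes_up_to(groups: int) -> int:
    """Total number of spatial modes in the first `groups` mode groups."""
    return sum(modes_in_group(g) for g in range(1, groups + 1))


for g in range(1, 9):
    print(f"group {g}: {modes_in_group(g)} modes, {spatial_modes_up_to(g)} in total")
```

The running total follows \(g(g+1)/2\), so eight groups give 36 spatial modes, which matches the mode count quoted for Fig. 10.7.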

Fig. 10.7 The first 36 modes of a graded-index multimode fiber

3.3 Modes of Multicore Fibers: The Concept of Supermodes

Multicore fibers consist of multiple cores that are placed close to each other in a common cladding. The modes of a multicore fiber can be calculated by analyzing the modes of the ensemble of cores. The resulting modal field distributions spread across all cores and are referred to as supermodes. An example of supermodes for a three-core fiber is shown in Fig. 10.8a,b.

Fig. 10.8a,b Modes of a three-core multicore fiber supporting six vector modes. (a) The intensity profile, (b) direction of the electric field indicated by arrows. Arrows arranged in a circle indicate circular polarization, and the position of the arrow relative to the circle indicates the overall relative phase

When the coupling between cores is weak, the modes of a multicore fiber can be calculated using coupled-mode theory [10.62], where the supermodes are approximated by linear combinations of the modes of the individual cores (core modes), and the coupling coefficients depend on the overlap between the core modes [10.62, 10.63]. This approximation is useful to qualitatively study the systematic coupling between the cores. Note that both modal representations, supermodes and core modes, can be used to understand the coupling behavior of multicore fibers, however the supermodes representation has the advantage that it provides exact solutions for any core configuration including the case of strong core-mode overlap.
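The coupled-mode picture can be illustrated for three identical cores placed on an equilateral triangle. The sketch below is a scalar, single-polarization model; the propagation constant `beta_core` and the coupling coefficient `kappa` are hypothetical values, and the candidate supermodes are simply checked to be eigenvectors of the coupling matrix.

```python
import math

beta_core = 5.87e6   # hypothetical propagation constant of an isolated core, rad/m
kappa = 50.0         # hypothetical core-to-core coupling coefficient, rad/m

# Coupled-mode matrix for three identical, equally coupled cores.
M = [[beta_core if i == j else kappa for j in range(3)] for i in range(3)]


def matvec(m, v):
    """Product of a 3x3 matrix with a 3-vector."""
    return [sum(m[i][j] * v[j] for j in range(3)) for i in range(3)]


# Candidate supermodes as linear combinations of the three core modes,
# each paired with its expected propagation constant.
supermodes = {
    "in-phase": ([1 / math.sqrt(3)] * 3, beta_core + 2 * kappa),
    "antisymmetric": ([1 / math.sqrt(2), -1 / math.sqrt(2), 0.0], beta_core - kappa),
    "mixed": ([1 / math.sqrt(6), 1 / math.sqrt(6), -2 / math.sqrt(6)], beta_core - kappa),
}


def residual(vec, beta):
    """Largest deviation of M*vec from beta*vec (zero for an exact eigenvector)."""
    return max(abs(a - beta * b) for a, b in zip(matvec(M, vec), vec))


for name, (vec, beta) in supermodes.items():
    print(f"{name}: beta = {beta:.1f} rad/m, residual = {residual(vec, beta):.2e}")
```

In this simplified model the in-phase supermode is separated from a degenerate pair by \(3\kappa\), so the supermode spacing scales directly with the overlap between the core modes.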

In practice, the modal properties of the multicore fibers depend strongly on the distance between the cores, and three distinct regimes can be identified:

The first is the weak coupling regime, where the refractive index variations between the cores of the fiber structure produce an equivalent propagation-constant difference between the cores that is bigger than the difference between the propagation constants of the supermodes. In this case, the cores behave as independent waveguides, showing a random-like coupling between neighboring cores, and the coupling between the cores is then described by the coupled power theory [10.64, 10.65]. The weak coupling regime is typically observed in multicore fibers where the core spacing is significantly larger than the core-mode field diameter.

The second regime is the supermode regime, where the difference in propagation constant between the supermodes is much bigger than the propagation constant variations introduced by the refractive index variations between the cores or external fiber perturbations like twists and bends. In this case, the supermodes are stable and can propagate unperturbed, and the fiber behaves similarly to a conventional multimode fiber, which shows a weak random-like coupling between the supermodes. The supermode regime is typically observed in multicore fibers where the core spacing is comparable to or smaller than the core-mode field diameter.

The third regime can be observed when the difference in the propagation constant between the supermodes is comparable to the propagation constant variation induced by refractive index variation and the external fiber perturbations. In this case, the supermodes couple strongly and mix continuously. We refer to this regime as the strong coupling regime, and to the fibers operated in this regime as coupled-core multicore fibers. These fibers offer the advantage of showing very narrow impulse responses for all coupled cores, typically one order of magnitude (or more) narrower than impulse responses achievable in graded-index multimode fibers with an equivalent number of modes (Sect. 10.5.1). The strong coupling regime is typically observed in multicore fibers where the core spacing is comparable to or slightly larger than the core-mode field diameter.

4 Representation of Modes in Fibers

We start by considering the case of a single-mode fiber, which for simplicity we assume to be a step-index fiber, where the fundamental mode is HE\({}_{11}\). We express the three-dimensional real-valued electric field vector \(\boldsymbol{E}(x,y,z,t)\) introduced in the previous section as

$$\displaystyle\begin{aligned}\displaystyle\boldsymbol{E}(x,y,z,t)&\displaystyle=\mathrm{Re}\left\{\left[\frac{\boldsymbol{F}_{\mathrm{HE}_{11x}}(x,y,\omega_{0})}{{\mathcal{N}}_{\mathrm{HE}_{11}}(\omega_{0})}E_{x}(z,t)\right.\right.\\ \displaystyle&\displaystyle\qquad\left.\left.+\frac{\boldsymbol{F}_{\mathrm{HE}_{11y}}(x,y,\omega_{0})}{{\mathcal{N}}_{\mathrm{HE}_{11}}(\omega_{0})}E_{y}(z,t)\right]\mathrm{e}^{-\mathrm{j}\omega_{0}t}\right\},\end{aligned}$$
(10.14)

where \(\boldsymbol{F}_{\mathrm{HE}_{11x}}\) and \(\boldsymbol{F}_{\mathrm{HE}_{11y}}\) are the lateral profile functions of the fundamental mode aligned with the \(x\)- and \(y\)-directions, respectively. Note that the specific choice of the \(x\)- and \(y\)-directions is immaterial to this discussion, as any other pair of orthogonal directions would be equally suitable. The terms \(E_{x}\) and \(E_{y}\) are the corresponding complex envelopes of the field and the normalization coefficient \({\mathcal{N}}_{\mathrm{HE}_{11}}\) is introduced to ensure that the power in Watts that is carried by the \(x\)-oriented (\(y\)-oriented) mode is \(|E_{x}|^{2}\) (\(|E_{y}|^{2}\)). We note that the mode lateral profile functions are evaluated at \(\omega_{0}\) owing to the fact that in all cases of practical relevance the bandwidth of the individual complex envelopes is sufficiently small to ignore the dependence of \(\boldsymbol{F}_{n}\) on frequency. The form of (10.14) shows that the complex envelopes \(E_{x}\) and \(E_{y}\) are not exactly the \(x\) and \(y\) polarization components of the electric field, as follows from the fact that the HE\({}_{11}\) lateral profile function possesses a nonzero component in the \(z\)-direction. The situation simplifies in the relevant case of weakly guiding fibers [10.61], where the fundamental mode LP\({}_{01}\) is linearly polarized and its longitudinal component is negligible. In this case, \(E_{x}\) and \(E_{y}\) characterize fully and independently the \(x\) and \(y\) polarization components of the electric field (which is indeed a two-dimensional vector in the \(xy\) plane) and the field vector can be expressed as

$$\displaystyle\begin{aligned}\displaystyle\boldsymbol{E}(x,y,z,t)&\displaystyle=\mathrm{Re}\left\{\left[\frac{\boldsymbol{F}_{\mathrm{LP}_{01x}}(x,y,\omega_{0})}{{\mathcal{N}}_{\mathrm{LP}_{01}}(\omega_{0})}E_{x}(z,t)\right.\right.\\ \displaystyle&\displaystyle\quad+\left.\left.\frac{\boldsymbol{F}_{\mathrm{LP}_{01y}}(x,y,\omega_{0})}{{\mathcal{N}}_{\mathrm{LP}_{01}}(\omega_{0})}E_{y}(z,t)\right]\mathrm{e}^{-\mathrm{j}\omega_{0}t}\right\}\\ \displaystyle&\displaystyle=\frac{F_{\mathrm{LP}_{01}}(x,y,\omega_{0})}{{\mathcal{N}}_{\mathrm{LP}_{01}}(\omega_{0})}\mathrm{Re}\left\{[E_{x}(z,t)\boldsymbol{\hat{x}}\vphantom{x^{j}}\right.\\ \displaystyle&\displaystyle\left.\quad+E_{y}(z,t)\boldsymbol{\hat{y}}]\mathrm{e}^{-\mathrm{j}\omega_{0}t}\right\},\end{aligned}$$
(10.15)

where the second equality relies on the fact that the lateral profile of the fundamental mode is real-valued, and \(\boldsymbol{F}_{\mathrm{LP}_{01x}}=F_{\mathrm{LP}_{01}}\boldsymbol{\hat{x}}\), \(\boldsymbol{F}_{\mathrm{LP}_{01y}}=F_{\mathrm{LP}_{01}}\boldsymbol{\hat{y}}\), where \(\boldsymbol{\hat{x}}\) and \(\boldsymbol{\hat{y}}\) are unit vectors pointing in the \(x\)- and \(y\)-directions, respectively. It is convenient to introduce the two-dimensional vector \(\boldsymbol{E}(z,t)\) defined as

$$\displaystyle\boldsymbol{E}(z,t)=\left(\begin{matrix}E_{x}(z,t)\\ E_{y}(z,t)\\ \end{matrix}\right).$$
(10.16)

This vector is related to the Jones vector introduced in Sect. 10.4.1. It provides a full characterization of the optical field not only in the weakly guiding approximation, where its components are in a one-to-one correspondence with the \(x\) and \(y\) polarization components of the field, but also in the most general case where the longitudinal component is non-negligible. Note that the same vector notation is used, yet with no risk of confusion, to denote both the three-dimensional vectors \(\boldsymbol{E}(x,y,z,t)\) and \(\boldsymbol{F}(x,y,\omega_{0})\), and the two-dimensional vector \(\boldsymbol{E}(z,t)\). The extension of this representation to the case of higher order modes is straightforward, but requires some clarifications. Let us consider the group of the nearly degenerate modes HE\({}_{21\mathrm{a}}\) and HE\({}_{21\mathrm{b}}\), and the transverse modes TE\({}_{01}\) and TM\({}_{01}\), whose excitation yields the field

$$\displaystyle\begin{aligned}\displaystyle\boldsymbol{E}(x,y,z,t)&\displaystyle=\mathrm{Re}\left\{\left[\frac{\boldsymbol{F}_{\mathrm{HE}_{21\mathrm{a}}}(x,y,\omega_{0})}{{\mathcal{N}}_{\mathrm{HE}_{21}}(\omega_{0})}E_{\mathrm{HE}_{21\mathrm{a}}}(z,t)\right.\right.\\ \displaystyle&\displaystyle\enskip\left.\left.+\frac{\boldsymbol{F}_{\mathrm{HE}_{21\mathrm{b}}}(x,y,\omega_{0})}{{\mathcal{N}}_{\mathrm{HE}_{21}}(\omega_{0})}E_{\mathrm{HE}_{21\mathrm{b}}}(z,t)\right.\right.\\ \displaystyle&\displaystyle\enskip\left.\left.+\frac{\boldsymbol{F}_{\mathrm{TE}_{01}}(x,y,\omega_{0})}{{\mathcal{N}}_{\mathrm{TE}_{01}}(\omega_{0})}E_{\mathrm{TE}_{01}}(z,t)\right.\right.\\ \displaystyle&\displaystyle\enskip\left.\left.+\frac{\boldsymbol{F}_{\mathrm{TM}_{01}}(x,y,\omega_{0})}{{\mathcal{N}}_{\mathrm{TM}_{01}}(\omega_{0})}E_{\mathrm{TM}_{01}}(z,t)\right]\mathrm{e}^{-\mathrm{j}\omega_{0}t}\right\}.\\ \displaystyle\end{aligned}$$
(10.17)

In the weakly guiding approximation, the lateral profile functions of the HE, TE, and TM modes can be used to express those of the LP\({}_{11}\) group as follows

$$\begin{aligned}\frac{\boldsymbol{F}_{\mathrm{LP}_{11\mathrm{a}x}}}{{\mathcal{N}}_{\mathrm{LP}_{11}}} & =\frac{1}{\sqrt{2}}\left(\frac{\boldsymbol{F}_{\mathrm{HE}_{21\mathrm{a}}}}{{\mathcal{N}}_{\mathrm{HE}_{21}}}+\frac{\boldsymbol{F}_{\mathrm{TM}_{01}}}{{\mathcal{N}}_{\mathrm{TM}_{01}}}\right),\end{aligned}$$
(10.18)
$$\begin{aligned}\frac{\boldsymbol{F}_{\mathrm{LP}_{11\mathrm{a}y}}}{{\mathcal{N}}_{\mathrm{LP}_{11}}} & =\frac{1}{\sqrt{2}}\left(\frac{\boldsymbol{F}_{\mathrm{HE}_{21\mathrm{b}}}}{{\mathcal{N}}_{\mathrm{HE}_{21}}}-\frac{\boldsymbol{F}_{\mathrm{TE}_{01}}}{{\mathcal{N}}_{\mathrm{TE}_{01}}}\right),\end{aligned}$$
(10.19)
$$\begin{aligned}\frac{\boldsymbol{F}_{\mathrm{LP}_{11\mathrm{b}x}}}{{\mathcal{N}}_{\mathrm{LP}_{11}}} & =\frac{1}{\sqrt{2}}\left(\frac{\boldsymbol{F}_{\mathrm{HE}_{21\mathrm{b}}}}{{\mathcal{N}}_{\mathrm{HE}_{21}}}+\frac{\boldsymbol{F}_{\mathrm{TE}_{01}}}{{\mathcal{N}}_{\mathrm{TE}_{01}}}\right),\end{aligned}$$
(10.20)
$$\begin{aligned}\frac{\boldsymbol{F}_{\mathrm{LP}_{11\mathrm{b}y}}}{{\mathcal{N}}_{\mathrm{LP}_{11}}} & =\frac{1}{\sqrt{2}}\left(\frac{\boldsymbol{F}_{\mathrm{HE}_{21\mathrm{a}}}}{{\mathcal{N}}_{\mathrm{HE}_{21}}}-\frac{\boldsymbol{F}_{\mathrm{TM}_{01}}}{{\mathcal{N}}_{\mathrm{TM}_{01}}}\right),\end{aligned}$$
(10.21)

(where we dropped the dependence on \(x\), \(y\), and \(\omega_{0}\) for ease of notation) with the result

$$\begin{aligned}\begin{aligned}\displaystyle\boldsymbol{E}(x,y,z,t)&\displaystyle=\mathrm{Re}\left\{\left[\frac{\boldsymbol{F}_{\mathrm{LP}_{11\mathrm{a}x}}}{{\mathcal{N}}_{\mathrm{LP}_{11}}}E_{\mathrm{LP}_{11\mathrm{a}x}}(z,t)\right.\right.\\ \displaystyle&\displaystyle\qquad\qquad+\left.\left.\frac{\boldsymbol{F}_{\mathrm{LP}_{11\mathrm{a}y}}}{{\mathcal{N}}_{\mathrm{LP}_{11}}}E_{\mathrm{LP}_{11\mathrm{a}y}}(z,t)\right.\right.\\ \displaystyle&\displaystyle\qquad\qquad+\left.\left.\frac{\boldsymbol{F}_{\mathrm{LP}_{11\mathrm{b}x}}}{{\mathcal{N}}_{\mathrm{LP}_{11}}}E_{\mathrm{LP}_{11\mathrm{b}x}}(z,t)\right.\right.\\ \displaystyle&\displaystyle\qquad\qquad+\left.\left.\frac{\boldsymbol{F}_{\mathrm{LP}_{11\mathrm{b}y}}}{{\mathcal{N}}_{\mathrm{LP}_{11}}}E_{\mathrm{LP}_{11\mathrm{b}y}}(z,t)\right]\mathrm{e}^{-\mathrm{j}\omega_{0}t}\right\}\\ \displaystyle&\displaystyle=\mathrm{Re}\left(\left\{\frac{F_{\mathrm{LP}_{11\mathrm{a}}}}{{\mathcal{N}}_{\mathrm{LP}_{11}}}[E_{\mathrm{LP}_{11\mathrm{a}x}}(z,t)\boldsymbol{\hat{x}}\right.\right.\\ \displaystyle&\displaystyle\qquad\qquad+\left.\left.E_{\mathrm{LP}_{11\mathrm{a}y}}(z,t)\boldsymbol{\hat{y}}]\right.\right.\\ \displaystyle&\displaystyle\qquad\qquad+\left.\left.\frac{F_{\mathrm{LP}_{11\mathrm{b}}}}{{\mathcal{N}}_{\mathrm{LP}_{11}}}[E_{\mathrm{LP}_{11\mathrm{b}x}}(z,t)\boldsymbol{\hat{x}}\right.\right.\\ \displaystyle&\displaystyle\qquad\qquad+\left.\left.\vphantom{\frac{\boldsymbol{F}_{\mathrm{LP}_{11\mathrm{a}y}}}{{\mathcal{N}}_{\mathrm{LP}_{11}}}}E_{\mathrm{LP}_{11\mathrm{b}y}}(z,t)\boldsymbol{\hat{y}}]\right\}\mathrm{e}^{-\mathrm{j}\omega_{0}t}\right),\end{aligned}\end{aligned}$$
(10.22)

where the complex envelopes of the linearly polarized modes are obtained from those of the true fiber modes through

$$\displaystyle\begin{aligned}\displaystyle\left[\begin{matrix}E_{\mathrm{LP}_{11\mathrm{a}x}}(z,t)\\ E_{\mathrm{LP}_{11\mathrm{a}y}}(z,t)\\ E_{\mathrm{LP}_{11\mathrm{b}x}}(z,t)\\ E_{\mathrm{LP}_{11\mathrm{b}y}}(z,t)\end{matrix}\right]&\displaystyle=\frac{1}{\sqrt{2}}\left[\begin{matrix}1&0&1&0\\ 0&1&0&-1\\ 0&1&0&1\\ 1&0&-1&0\end{matrix}\right]\\ \displaystyle&\displaystyle\quad\times\left[\begin{matrix}E_{\mathrm{HE}_{21\mathrm{a}}}(z,t)\\ E_{\mathrm{HE}_{21\mathrm{b}}}(z,t)\\ E_{\mathrm{TM}_{01}}(z,t)\\ E_{\mathrm{TE}_{01}}(z,t)\end{matrix}\right]\end{aligned}$$
(10.23)

as follows from substituting (10.18)–(10.21) into (10.22) and comparing the result with (10.17). The second equality in (10.22) shows that each of the complex envelopes in the LP representation provides a complete characterization of a space and polarization mode. All together, the four complex envelopes give a complete description of the field, and they are related to the generalized Jones representation presented in Sect. 10.4.3.
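The change of basis in (10.23) can be checked numerically with a minimal sketch. The matrix entries are those of (10.23); the example excitation (only TM\({}_{01}\), with unit power) and the helper names are hypothetical.

```python
import math

S = 1 / math.sqrt(2)
# Transformation (10.23): rows are LP11ax, LP11ay, LP11bx, LP11by;
# columns are HE21a, HE21b, TM01, TE01.
T = [
    [S, 0, S, 0],
    [0, S, 0, -S],
    [0, S, 0, S],
    [S, 0, -S, 0],
]


def to_lp(envelopes):
    """Map complex envelopes (HE21a, HE21b, TM01, TE01) to the LP11 basis."""
    return [sum(T[i][j] * envelopes[j] for j in range(4)) for i in range(4)]


def power(envelopes):
    """Total power carried by a set of complex envelopes."""
    return sum(abs(e) ** 2 for e in envelopes)


def is_unitary(matrix, tol=1e-12):
    """Check that the rows of a real matrix are orthonormal."""
    size = len(matrix)
    return all(
        abs(sum(matrix[i][j] * matrix[k][j] for j in range(size)) - (i == k)) < tol
        for i in range(size) for k in range(size)
    )


# Hypothetical example: only TM01 is excited, with unit power.
vector_modes = [0, 0, 1 + 0j, 0]
lp_modes = to_lp(vector_modes)
print(lp_modes, power(lp_modes), is_unitary(T))
```

Because the matrix is unitary, the total power is the same in both representations; a pure TM\({}_{01}\) excitation appears as equal-power contributions to LP\({}_{11\mathrm{a}x}\) and LP\({}_{11\mathrm{b}y}\) with opposite signs.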

We can now move to the most general case of \(2N\) modes, where the factor of two accounts either for the degeneracy of a spatial mode (as is the case for HE\({}_{11}\) or HE\({}_{21}\)), or for the fact that in all cases of practical relevance a fiber cannot guide only one out of two quasi-degenerate modes (as is the case for TE\({}_{01}\) and TM\({}_{01}\)). In the weakly guiding approximation, the factor of two accounts simply for polarization degeneracy. By suitably sorting the guided modes, we can express the electric field as

$$\displaystyle\boldsymbol{E}(x,y,z,t)=\mathrm{Re}\left[\sum_{n=1}^{2N}\frac{\boldsymbol{F}_{n}(x,y,\omega_{0})}{{\mathcal{N}}_{n}(\omega_{0})}E_{n}(z,t)\mathrm{e}^{-\mathrm{j}\omega_{0}t}\right],$$
(10.24)

where the term \(E_{n}(z,t)\) is the complex envelope of the field in the \(n\)-th mode and the vector \(\boldsymbol{F}_{n}(x,y,\omega_{0})\) is the corresponding mode lateral profile. As specified already in the examples illustrated above, the normalization coefficients \({\mathcal{N}}_{n}(\omega_{0})\) are introduced to ensure that the power in Watts that is carried by the \(n\)-th mode is given by \(|E_{n}(z,t)|^{2}\) [10.66]. Note that a different normalization could be assumed (for instance, in [10.67] the power in Watts is given by \(|E_{n}(z,t)|^{2}/(2Z_{0})\), where \(Z_{0}=\sqrt{\mu_{0}/\epsilon_{0}}\) is the impedance of vacuum). Their expression follows from the mode orthogonality condition (10.5), which we recall here for convenience,

$$\displaystyle\int\mathrm{d}x\mathrm{d}y\left(\boldsymbol{F}_{n}\times\boldsymbol{G}_{m}^{\ast}\right)\cdot\hat{\boldsymbol{z}}=2\mathcal{N}_{n}^{2}\delta_{n,m}\;.$$
(10.25)

The orthogonality condition implies the following equation, which is often found in the literature [10.66, 10.67, 10.68],

$$\displaystyle\int\mathrm{d}x\mathrm{d}y\left(\boldsymbol{F}_{n}\times\boldsymbol{G}_{m}^{\ast}+\boldsymbol{F}_{m}^{\ast}\times\boldsymbol{G}_{n}\right)\cdot\hat{\boldsymbol{z}}=4\mathcal{N}_{n}^{2}\delta_{n,m}\;,$$
(10.26)

and in the weakly guiding approximation simplifies to

$$\displaystyle\frac{n_{\mathrm{eff}}}{2Z_{0}}\mathrm{Re}\left(\int\mathrm{d}x\mathrm{d}y\,\boldsymbol{F}_{n}\cdot\boldsymbol{F}_{m}^{\ast}\right)=\delta_{n,m}\mathcal{N}_{n}^{2}\;,$$
(10.27)

where \(n_{\mathrm{eff}}\) is the effective refractive index of the fundamental mode. With the field expression introduced in (10.24), in the ideal case of a perfectly circular fiber, in the absence of loss, mode coupling and nonlinear propagation effects, the complex envelopes evolve according to the simple evolution equation in the frequency domain

$$\displaystyle\frac{\partial\tilde{E}_{n}(z,\omega)}{\partial z}=\mathrm{j}\beta_{n}(\omega)\tilde{E}_{n}(z,\omega)\;,$$
(10.28)

where by the tilde we denote a (frequency) Fourier transform according to the definition

$$\displaystyle\tilde{E}_{n}(z,\omega)=\int E_{n}(z,t)\exp(\mathrm{j}\omega t)\mathrm{d}t\;.$$
(10.29)
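Since (10.28) is a first-order linear equation in \(z\), its solution is \(\tilde{E}_{n}(z,\omega)=\tilde{E}_{n}(0,\omega)\exp[\mathrm{j}\beta_{n}(\omega)z]\). The sketch below applies this phase factor to a few spectral samples, using the truncated expansion (10.7). The coefficients and the input spectrum are hypothetical, and the frequency argument is taken as the offset \(\omega-\omega_{0}\), which is an assumption of this example.

```python
import cmath
import math


def beta(d_omega, coeffs):
    """Truncated Taylor expansion (10.7); d_omega is the offset from omega_0."""
    b0, b1, b2, b3 = coeffs
    return b0 + b1 * d_omega + b2 * d_omega**2 / 2 + b3 * d_omega**3 / 6


def propagate(spectrum, d_omegas, coeffs, z):
    """Solution of (10.28): each spectral sample acquires the phase beta*z."""
    return [e * cmath.exp(1j * beta(w, coeffs) * z) for e, w in zip(spectrum, d_omegas)]


# Hypothetical coefficients: beta0 in rad/m, beta1 in s/m, beta2 in s^2/m, beta3 in s^3/m.
coeffs = (5.87e6, 4.9e-9, -2.17e-26, 0.0)
d_omegas = [2 * math.pi * f for f in (-10e9, 0.0, 10e9)]
spectrum_in = [0.5 + 0j, 1.0 + 0j, 0.5j]
spectrum_out = propagate(spectrum_in, d_omegas, coeffs, z=1000.0)

power_in = sum(abs(e) ** 2 for e in spectrum_in)
power_out = sum(abs(e) ** 2 for e in spectrum_out)
print(power_in, power_out)
```

In this idealized case the propagation only changes the phase of each spectral component, so the power in every component is preserved.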

An important feature of (10.24) is that the effect of perturbations is captured through the dependence of the complex envelopes on the longitudinal coordinate, while the modes used in the expansion are those of the unperturbed fiber. An alternative approach, that is not further discussed in this chapter, relies on using perturbed local modes [10.69].

It is worth pointing out that the true fiber modes form a complete orthogonal basis for representing locally the lateral profile of the field propagating in the fiber, and therefore any other orthogonal basis obtained from a unitary transformation of the true fiber modes lateral profile functions works as well [10.70]. However, since the resulting lateral profile functions are not in general fiber modes, the evolution of the corresponding complex envelopes is described by coupled equations even in the ideal case of an unperturbed fiber. Equations (10.18)–(10.21) and (10.23) can be interpreted as an example of this change of basis. Indeed, it is well known that LP modes are only true modes within the weakly guiding approximation, whereas in reality they couple during propagation, not only in fibers with high-index-contrast, but also in weakly guiding fibers where the accumulated effects of the small modal birefringence cannot be ignored [10.71].

In the remainder of this section we review the Jones and Stokes formalisms [10.72, 10.73, 10.74], which are widely used for the study of polarization-related phenomena in single-mode fibers, and discuss their generalization to the multimode case.

4.1 Jones and Stokes Formalism for Single-Mode Fibers

Jones calculus was originally proposed to describe polarized light by means of two-dimensional vectors [10.72]. Indeed, as discussed in the previous section, the vector \(\boldsymbol{E}(z,t)\) defined in (10.16) provides a complete description of the electric field propagating in a single-mode fiber, and the physical interpretation of its two components differs slightly depending on whether the fiber is weakly guiding or not. The corresponding Jones vector is defined as the Fourier transform of \(\boldsymbol{E}(z,t)\) normalized to have unit modulus, namely

$$\displaystyle|e(z,\omega)\rangle=\frac{\tilde{\boldsymbol{E}}(z,\omega)}{|\tilde{\boldsymbol{E}}(z,\omega)|}=\left(\begin{aligned}\displaystyle&\displaystyle e_{x}(z,\omega)\\ \displaystyle&\displaystyle e_{y}(z,\omega)\\ \displaystyle\end{aligned}\right),$$
(10.30)

where we use the bra–ket notation to denote a Jones vector \(|e\rangle\). By the bra \(\langle e(z,\omega)|\) we denote the Hermitian adjoint of the field Jones vector (i. e., the complex conjugate row vector), so that the unit-modulus condition can be expressed as \(\langle e|e\rangle=|e_{x}|^{2}+|e_{y}|^{2}=1\), and the scalar product between two Jones vectors is given by \(\langle u|e\rangle=u_{x}^{\ast}e_{x}+u_{y}^{\ast}e_{y}\).

We now move to introducing the Stokes representation of the electric field. This is an alternative description based on the use of real-valued three-dimensional vectors and it is isomorphic to the Jones representation [10.71]. If we denote by \(\boldsymbol{e}\) the Stokes vector corresponding to the Jones vector \(|e\rangle\), its three components are defined as

$$\displaystyle\begin{aligned}\displaystyle e_{1}&\displaystyle=|e_{x}|^{2}-|e_{y}|^{2}\;,\\ \displaystyle e_{2}&\displaystyle=2\mathrm{Re}\left(e_{x}^{\ast}e_{y}\right),\\ \displaystyle e_{3}&\displaystyle=2\mathrm{Im}\left(e_{x}^{\ast}e_{y}\right).\end{aligned}$$
(10.31)

The length of the Stokes vector can be evaluated to be \(|\boldsymbol{e}|=\sqrt{e_{1}^{2}+e_{2}^{2}+e_{3}^{2}}=\langle e|e\rangle=1\), where the second equality follows from the normalization of \(|e\rangle\) (in some cases the Jones vector is not normalized to have unit modulus, with the result that the length of the Stokes vector equals the optical power). The ensemble of all possible polarization states spans the surface of a sphere of unit radius in Stokes space, which is famously known as the Poincaré sphere. An alternative expression of the Stokes vector, which turns out to be highly convenient for the generalization of the Stokes formalism to the multimode fiber case, is the one based on the Pauli spin matrices, which we recall here for convenience

$$\displaystyle\begin{aligned}\displaystyle\sigma_{1}&\displaystyle=\left(\begin{matrix}1&0\\ 0&-1\\ \end{matrix}\right),\quad\sigma_{2}=\left(\begin{matrix}0&1\\ 1&0\\ \end{matrix}\right),\\ \displaystyle\sigma_{3}&\displaystyle=\left(\begin{matrix}0&-\mathrm{j}\\ \mathrm{j}&0\\ \end{matrix}\right).\end{aligned}$$
(10.32)

Note that the original definition of the Pauli matrices in quantum mechanics differs from the above by a circular permutation of the matrix subscripts. With the use of the Pauli matrices, (10.31) can be re-expressed as

$$\displaystyle e_{n}=\langle e|\sigma_{n}|e\rangle\;,\quad n\in\{1,2,3\}\;,$$
(10.33)

and by formally collecting the Pauli matrices into a column vector which we denote by \(\boldsymbol{\sigma}\), the Stokes vector can be expressed in the following compact form

$$\displaystyle\boldsymbol{e}=\langle e|\boldsymbol{\sigma}|e\rangle\;.$$
(10.34)
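
As a quick numerical check of (10.31) and (10.33), the following minimal sketch computes the Stokes vector of a Jones vector in both ways and confirms that the two agree and that the result has unit length. It uses plain Python complex numbers; the helper names (`normalize`, `stokes_direct`, `stokes_pauli`) and the sample state are illustrative assumptions and are not part of the formalism.

```python
import math

# Pauli matrices in the ordering of (10.32), stored as nested tuples.
SIGMA = (
    ((1, 0), (0, -1)),
    ((0, 1), (1, 0)),
    ((0, -1j), (1j, 0)),
)

def normalize(jones):
    """Scale a Jones vector (ex, ey) to unit modulus."""
    norm = math.sqrt(sum(abs(c) ** 2 for c in jones))
    return tuple(c / norm for c in jones)

def stokes_direct(jones):
    """Stokes components from the explicit expressions in (10.31)."""
    ex, ey = jones
    cross = ex.conjugate() * ey
    return (abs(ex) ** 2 - abs(ey) ** 2, 2 * cross.real, 2 * cross.imag)

def stokes_pauli(jones):
    """Stokes components as expectation values <e|sigma_n|e>, as in (10.33)."""
    out = []
    for s in SIGMA:
        val = sum(jones[i].conjugate() * s[i][k] * jones[k]
                  for i in range(2) for k in range(2))
        out.append(val.real)  # the imaginary part vanishes for Hermitian sigma_n
    return tuple(out)

e = normalize((1 + 0j, 0.5 + 0.5j))  # arbitrary sample state
s_direct = stokes_direct(e)
s_pauli = stokes_pauli(e)
length = math.sqrt(sum(c * c for c in s_direct))
```

For this sample state both routines return the same three components, and `length` equals one to within rounding error.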

Another relevant relation between the Jones and Stokes representations has to do with the projection operator. This is a \(2\times 2\) matrix returning the projection on a given Jones vector \(|e\rangle\) of the Jones vector to which it is applied,

$$\displaystyle|e\rangle\langle e|=\frac{1}{2}\left(\mathbf{I}+\boldsymbol{e}\cdot\boldsymbol{\sigma}\right).$$
(10.35)

Here, by \(\mathbf{I}\) we denote the \(2\times 2\) identity matrix (in what follows we will use the same symbol to denote the \(M\times M\) identity matrix as well, where \(M\) is an arbitrary integer), and the scalar product between a Stokes vector and the Pauli matrix vector stands for the linear combination \(\boldsymbol{e}\cdot\boldsymbol{\sigma}=e_{1}\sigma_{1}+e_{2}\sigma_{2}+e_{3}\sigma_{3}\). The equivalence between (10.33) and (10.35) follows from the equality

$$\displaystyle\langle e|\sigma_{n}|e\rangle=\mathrm{tr}\left(\sigma_{n}|e\rangle\langle e|\right),$$
(10.36)

and from the trace-orthogonality of the Pauli matrices, that is

$$\displaystyle\mathrm{tr}\left(\sigma_{n}\sigma_{m}\right)=2\delta_{n,m}\;,$$
(10.37)

where \(\delta_{n,m}\) is the Kronecker delta. A useful consequence of (10.35) is

$$\displaystyle|\langle u|v\rangle|^{2}=\frac{1}{2}\left(1+\boldsymbol{u}\cdot\boldsymbol{v}\right),$$
(10.38)

which shows that orthogonal states of polarization, for which \(\langle u|v\rangle=0\), are antiparallel in Stokes space, namely \(\boldsymbol{u}\cdot\boldsymbol{v}=-1\).
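
Relation (10.38) is easy to verify numerically. The sketch below, which assumes unit-modulus Jones vectors and uses hypothetical helper names (`stokes`, `inner`, `dot`), compares the two sides of (10.38) for a pair of sample states and checks that two orthogonal states map to antiparallel Stokes vectors.

```python
import math

def stokes(jones):
    """Stokes vector of a unit-modulus Jones vector, per (10.31)."""
    ex, ey = jones
    cross = ex.conjugate() * ey
    return (abs(ex) ** 2 - abs(ey) ** 2, 2 * cross.real, 2 * cross.imag)

def inner(u, v):
    """Scalar product <u|v> of two Jones vectors."""
    return sum(a.conjugate() * b for a, b in zip(u, v))

def dot(a, b):
    """Ordinary scalar product of two real Stokes vectors."""
    return sum(x * y for x, y in zip(a, b))

r2 = 1 / math.sqrt(2)
u = (r2 + 0j, r2 * 1j)        # a circular state
u_perp = (r2 + 0j, -r2 * 1j)  # the orthogonal circular state
v = (1 + 0j, 0j)              # linear state along x

lhs = abs(inner(u, v)) ** 2                        # left-hand side of (10.38)
rhs = 0.5 * (1 + dot(stokes(u), stokes(v)))        # right-hand side of (10.38)
overlap_perp = abs(inner(u, u_perp))               # vanishes for orthogonal states
dot_perp = dot(stokes(u), stokes(u_perp))          # equals -1 (antiparallel)
```

Here `lhs` and `rhs` coincide, while the two circular states have zero overlap in Jones space and a Stokes-space scalar product of \(-1\).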

4.2 Polarization Coupling and Unitary Propagation in Single-Mode Fibers

Manufacturing imperfections and mechanical stress that are always present in real fibers are responsible for the fact that orthogonal polarization modes couple during propagation in single-mode fibers. In the absence of polarization-dependent loss (PDL), polarization-mode coupling can be conveniently described in Jones space by means of a unitary matrix \(\mathbf{U}\) defined through the following input-output relation,

$$\displaystyle|\tilde{e}(z,\omega)\rangle=\mathrm{e}^{-\frac{\alpha}{2}z}\mathbf{U}(z,\omega)|\tilde{e}(0,\omega)\rangle\;,$$
(10.39)

where the term \(\exp(-\alpha z/2)\) describes polarization-averaged loss. This term is immaterial to the present analysis and will be dropped in what follows. The unitary property \(\mathbf{U}(z,\omega)\mathbf{U}^{\dagger}(z,\omega)=\mathbf{I}\) implies the following form for the evolution equation of \(\mathbf{U}(z,\omega)\),

$$\displaystyle\frac{\mathrm{d}\mathbf{U}(z,\omega)}{\mathrm{d}z}=\mathrm{j}\mathbf{B}(z,\omega)\mathbf{U}(z,\omega)\;,$$
(10.40)

where \(\mathbf{B}(z,\omega)\) is a Hermitian matrix. Indeed, by differentiating both sides of the equality \(\mathbf{U}\mathbf{U}^{\dagger}=\mathbf{I}\), one obtains \((\mathrm{d}\mathbf{U}/\mathrm{d}z)\mathbf{U}^{\dagger}=-[(\mathrm{d}\mathbf{U}/\mathrm{d}z)\mathbf{U}^{\dagger}]^{\dagger}\), which implies that \((\mathrm{d}\mathbf{U}/\mathrm{d}z)\mathbf{U}^{\dagger}\) is anti-Hermitian, and hence can be expressed as \(\mathrm{j}\mathbf{B}\), where \(\mathbf{B}\) is Hermitian. Since the Pauli matrices form a basis for traceless Hermitian matrices, the above can be conveniently re-expressed as

$$\displaystyle\frac{\mathrm{d}\mathbf{U}(z,\omega)}{\mathrm{d}z}=\mathrm{j}\left[\beta_{0}(z,\omega)\mathbf{I}+\frac{1}{2}\boldsymbol{\beta}(z,\omega)\cdot\boldsymbol{\sigma}\right]\mathbf{U}(z,\omega)\;,$$
(10.41)

where by \(\beta_{0}\) we denote the propagation constant of the fundamental mode, whose third-order Taylor expansion defined in (10.7) yields the terms describing polarization-averaged phase delay, group delay, and chromatic dispersion. The traceless matrix \(\boldsymbol{\beta}(z,\omega)\cdot\boldsymbol{\sigma}\) accounts for the local (\(z\)-dependent) polarization-mode coupling that occurs during propagation (including its frequency dependence), where \(\boldsymbol{\beta}\) is a three-dimensional real-valued vector, which we refer to as the birefringence vector [10.73, 10.75] (this definition relaxes the use of the term birefringence vector, which normally is restricted to the frequency derivative of \(\boldsymbol{\beta}\)). The simple relation between \(\boldsymbol{\beta}\) and \(\mathbf{B}\) is \(\beta_{n}=\mathrm{tr}(\sigma_{n}\mathbf{B})\).
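
For completeness, the relation between \(\boldsymbol{\beta}\) and \(\mathbf{B}\) can be checked in one line. Writing \(\mathbf{B}=\beta_{0}\mathbf{I}+\frac{1}{2}\boldsymbol{\beta}\cdot\boldsymbol{\sigma}\) as in (10.41), and using the fact that the Pauli matrices are traceless together with (10.37), one finds

$$\displaystyle\mathrm{tr}\left(\sigma_{n}\mathbf{B}\right)=\beta_{0}\,\mathrm{tr}\left(\sigma_{n}\right)+\frac{1}{2}\sum_{m=1}^{3}\beta_{m}\,\mathrm{tr}\left(\sigma_{n}\sigma_{m}\right)=\frac{1}{2}\sum_{m=1}^{3}2\beta_{m}\delta_{n,m}=\beta_{n}\;.$$

By the same argument, \(\mathrm{tr}(\mathbf{B})=2\beta_{0}\).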

A closed-form solution for (10.41) in the general case does not exist; however, the evolution of \(\mathbf{U}\) over a short fiber section of length \(\Updelta z\) is given by the following expression

$$\displaystyle\begin{aligned}\displaystyle\mathbf{U}(z+\Updelta z,\omega)&\displaystyle\simeq\exp\left[\mathrm{j}\beta_{0}(z,\omega)\Updelta z\right]\\ \displaystyle&\displaystyle\quad\times\exp\left[\frac{\mathrm{j}}{2}\,\boldsymbol{\beta}(z,\omega)\cdot\boldsymbol{\sigma}\,\Updelta z\right]\mathbf{U}(z,\omega)\;,\end{aligned}$$
(10.42)

which is customarily employed in numerical simulations, where a fiber is modeled as a concatenation of multiple short sections (waveplates). If the birefringence vector is \(z\)-independent, namely \(\boldsymbol{\beta}(z,\omega)=\boldsymbol{\beta}(\omega)\), the above expression can be readily modified to evaluate the fiber transfer matrix \(\mathbf{U}\), with the result

$$\displaystyle\begin{aligned}\displaystyle\mathbf{U}(z,\omega)&\displaystyle=\exp\left[\mathrm{j}\int_{0}^{z}\beta_{0}(\zeta,\omega)\mathrm{d}\zeta\right]\\ \displaystyle&\displaystyle\quad\times\exp\left[\frac{\mathrm{j}}{2}\,\boldsymbol{\beta}(\omega)\cdot\boldsymbol{\sigma}\,z\right],\end{aligned}$$
(10.43)

which ensures the initial condition \(\mathbf{U}(0,\omega)=\mathbf{I}\). This case is of no practical relevance (in reality the fiber birefringence varies rapidly along the propagation axis); however, the form of (10.43) is interesting, since also in the most general case \(\mathbf{U}\) can be expressed in the same form,

$$\displaystyle\mathbf{U}(z,\omega)=\exp\left[\mathrm{j}\phi_{0}(z,\omega)\right]\exp\left[\frac{\mathrm{j}}{2}\,\boldsymbol{r}(z,\omega)\cdot\boldsymbol{\sigma}\right],$$
(10.44)

where \(\phi_{0}\) accounts for the accumulated phase and \(\boldsymbol{r}\) for the accumulated effect of the fiber birefringence from the input to the generic position \(z\). Indeed, up to the phase factor \(\exp(\mathrm{j}\phi_{0})\), a unitary matrix can in general be expressed as \(\exp(\mathrm{j}\mathbf{H}/2)\), where \(\mathbf{H}\) is a traceless Hermitian matrix that in turn can be expanded in terms of the Pauli matrices as \(\mathbf{H}=\boldsymbol{r}\cdot\boldsymbol{\sigma}\). A useful alternative expression for \(\mathbf{U}\) follows from the eigenvector analysis of the matrix exponential appearing in (10.44), which yields

$$\displaystyle\mathbf{U}=\mathrm{e}^{\mathrm{j}\phi_{0}}\left(\mathrm{e}^{\mathrm{j}r/2}|r\rangle\langle r|+\mathrm{e}^{-\mathrm{j}r/2}|r_{\perp}\rangle\langle r_{\perp}|\right),$$
(10.45)

where \(r=|\boldsymbol{r}|\), and by \(|r\rangle\) and \(|r_{\perp}\rangle\), we denote the two orthogonal Jones vectors corresponding to the (unit-length) Stokes vectors \(\pm\boldsymbol{r}/r\). Equation (10.45) shows that \(|r\rangle\) and \(|r_{\perp}\rangle\) are the two eigenstates of \(\mathbf{U}\) and that, apart from the common phase factor \(\exp(\mathrm{j}\phi_{0})\), their eigenvalues are equal to \(\exp(\mathrm{j}r/2)\) and \(\exp(-\mathrm{j}r/2)\), respectively. Equation (10.45) is consistent with a general property of unitary matrices of any dimension of having orthogonal eigenvectors with unit-modulus eigenvalues. A detailed derivation of (10.45) can be found in [10.73]. The derivation relies essentially on two properties of the Pauli matrices, \(\sigma_{n}^{2}=\mathbf{I}\) and \(\sigma_{n}\sigma_{m}=-\sigma_{m}\sigma_{n}\) for \(n\neq m\), which yield \((\boldsymbol{r}\cdot\boldsymbol{\sigma})^{2k}=|\boldsymbol{r}|^{2k}\mathbf{I}\). Use of the latter equality in the power expansion of the matrix exponential yields \(\exp(\mathrm{j}\,\boldsymbol{r}\cdot\boldsymbol{\sigma}/2)=\cos(|\boldsymbol{r}|/2)\mathbf{I}+\mathrm{j}\sin(|\boldsymbol{r}|/2)(\boldsymbol{r}\cdot\boldsymbol{\sigma})/|\boldsymbol{r}|\). Equation (10.45) is finally obtained by considering (10.35) and the subsequent discussion.
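
The waveplate model of (10.42) can be sketched in a few lines. The example below is only an illustration under stated assumptions: the common phase \(\exp(\mathrm{j}\beta_{0}\Updelta z)\) is dropped, each section is evaluated with the closed form of the \(2\times 2\) matrix exponential (half-angle cosine and sine), and the birefringence samples are independent Gaussian draws chosen purely for demonstration rather than a calibrated fiber model. All function names are hypothetical.

```python
import math
import random

def matmul(a, b):
    """Product of two 2x2 complex matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def dagger(a):
    """Hermitian adjoint of a 2x2 matrix."""
    return [[a[j][i].conjugate() for j in range(2)] for i in range(2)]

def waveplate(beta, dz):
    """exp(j * beta.sigma * dz / 2) in closed form, for the matrices of (10.32)."""
    b = math.sqrt(sum(x * x for x in beta))
    if b == 0.0:
        return [[1 + 0j, 0j], [0j, 1 + 0j]]
    n1, n2, n3 = (x / b for x in beta)
    cos_h, sin_h = math.cos(b * dz / 2), math.sin(b * dz / 2)
    # n.sigma = [[n1, n2 - j n3], [n2 + j n3, -n1]]
    return [[cos_h + 1j * sin_h * n1, 1j * sin_h * (n2 - 1j * n3)],
            [1j * sin_h * (n2 + 1j * n3), cos_h - 1j * sin_h * n1]]

def fiber_matrix(betas, dz):
    """Concatenate short sections as in (10.42); later sections multiply from the left."""
    u = [[1 + 0j, 0j], [0j, 1 + 0j]]
    for beta in betas:
        u = matmul(waveplate(beta, dz), u)
    return u

def max_abs_diff(a, b):
    """Largest entry-wise deviation between two 2x2 matrices."""
    return max(abs(a[i][j] - b[i][j]) for i in range(2) for j in range(2))

identity = [[1 + 0j, 0j], [0j, 1 + 0j]]
random.seed(0)
dz = 0.01

# Randomly varying birefringence: the concatenation remains unitary.
betas = [[random.gauss(0.0, 1.0) for _ in range(3)] for _ in range(500)]
u_random = fiber_matrix(betas, dz)
unitarity_error = max_abs_diff(matmul(u_random, dagger(u_random)), identity)

# Uniform birefringence: the sections collapse into one exponential, as in (10.43).
beta_const = [0.3, -1.1, 0.7]
u_sections = fiber_matrix([beta_const] * 200, dz)
u_single = waveplate(beta_const, 200 * dz)
uniform_error = max_abs_diff(u_sections, u_single)
```

Both `unitarity_error` and `uniform_error` stay at the level of floating-point rounding, which illustrates that the concatenation preserves unitarity and that, for \(z\)-independent birefringence, it reproduces the single exponential of (10.43).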

The evolution of the field polarization state can be conveniently described also in Stokes space, where the overall effect of unitary fiber propagation is rotation of the field Stokes vector, as follows from the invariance of the Stokes vector length (this invariance is not trivially the consequence of the normalization involved in the definition of the Jones vectors but rather the result of power conservation during unitary propagation). If we denote by \(\mathbf{R}\) the \(3\times 3\) rotation matrix isomorphic to the unitary Jones matrix \(\mathbf{U}\), the input-output relation \(\boldsymbol{e}_{\mathrm{out}}=\mathbf{R}\boldsymbol{e}_{\mathrm{in}}=\langle e_{\mathrm{in}}|\mathbf{U}^{\dagger}\boldsymbol{\sigma}\mathbf{U}|e_{\mathrm{in}}\rangle\) yields the following simple relation [10.73]

$$\displaystyle\mathbf{R}\boldsymbol{\sigma}=\mathbf{U}^{\dagger}\boldsymbol{\sigma}\mathbf{U}\;,$$
(10.46)

which connects \(\mathbf{U}\) and \(\mathbf{R}\). The matrix \(\mathbf{R}\) is also referred to as the Müller matrix. The known evolution equation for the field Stokes vector is obtained by differentiating the expression \(\boldsymbol{e}=\mathrm{tr}\left(\boldsymbol{\sigma}|e\rangle\langle e|\right)\), which yields

$$\displaystyle\frac{\partial\boldsymbol{e}}{\partial z}=\mathrm{tr}\left[\boldsymbol{\sigma}\,\mathrm{j}\frac{(\boldsymbol{\beta}\cdot\boldsymbol{\sigma})(\boldsymbol{e}\cdot\boldsymbol{\sigma})-(\boldsymbol{e}\cdot\boldsymbol{\sigma})(\boldsymbol{\beta}\cdot\boldsymbol{\sigma})}{2}\right]=\boldsymbol{\beta}\times\boldsymbol{e}\;,$$
(10.47)

where the first equality follows from using (10.41) and the second requires using some of the Pauli matrices algebra. Equation (10.47) provides an intuitive interpretation of the local birefringence vector \(\boldsymbol{\beta}\). Indeed, it shows \(\boldsymbol{\beta}\) to be the local rotation axis that characterizes the trajectory drawn by the tip of the field Stokes vector on the Poincaré sphere, as the field propagates along the fiber, as illustrated in Fig. 10.9a,b. In the case of uniform birefringence the trajectory simplifies to a circle and the motion on this circle is described by the matrix \(\mathbf{R}(z)=\exp(z\boldsymbol{\beta}\times)\), where by \(\boldsymbol{\beta}\times\) we denote the matrix operator that, if applied to the vector \(\boldsymbol{s}\), performs the vector product \(\boldsymbol{\beta}\times\boldsymbol{s}\), namely

$$\displaystyle\boldsymbol{\beta}\times=\left(\begin{matrix}0&-\beta_{3}&\beta_{2}\\ \beta_{3}&0&-\beta_{1}\\ -\beta_{2}&\beta_{1}&0\\ \end{matrix}\right).$$
(10.48)

The expression of \(\mathbf{R}(z)\) shows that the rotation angle is \(|\boldsymbol{\beta}|z\) and the rotation axis is \(\boldsymbol{\hat{\beta}}=\boldsymbol{\beta}/|\boldsymbol{\beta}|\), thereby implying that the two orthogonal states whose Stokes vectors are parallel and antiparallel to \(\boldsymbol{\beta}\) are propagation eigenstates. This argument is useful to clarify the isomorphic relation existing between the general unitary Jones matrix \(\mathbf{U}=\exp(\mathrm{j}\phi_{0})\exp(\mathrm{j}\boldsymbol{r}\cdot\boldsymbol{\sigma}/2)\) and the Müller matrix \(\mathbf{R}=\exp(\boldsymbol{r}\times)\) (note that the phase shift \(\phi_{0}\) is immaterial in the Stokes representation, consistent with the definition of the Stokes vector itself).
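
The circular trajectory obtained for uniform birefringence can be illustrated numerically. The sketch below propagates a Jones vector with the closed form of \(\exp(\mathrm{j}\,\boldsymbol{\beta}\cdot\boldsymbol{\sigma}z/2)\) and monitors the corresponding Stokes vector; the values of \(\boldsymbol{\beta}\) and of the input state are arbitrary assumptions, and the helper names are hypothetical. Only properties that do not depend on the sense of rotation are checked: the projection onto \(\boldsymbol{\hat{\beta}}\) is conserved, and the Stokes vector returns to its initial value after a distance \(2\uppi/|\boldsymbol{\beta}|\).

```python
import math

def propagate(beta, z, jones):
    """Apply exp(j * beta.sigma * z / 2) to a Jones vector (closed form)."""
    b = math.sqrt(sum(x * x for x in beta))
    n1, n2, n3 = (x / b for x in beta)
    cos_h, sin_h = math.cos(b * z / 2), math.sin(b * z / 2)
    ex, ey = jones
    return ((cos_h + 1j * sin_h * n1) * ex + 1j * sin_h * (n2 - 1j * n3) * ey,
            1j * sin_h * (n2 + 1j * n3) * ex + (cos_h - 1j * sin_h * n1) * ey)

def stokes(jones):
    """Stokes vector of a Jones vector, per (10.31)."""
    ex, ey = jones
    cross = ex.conjugate() * ey
    return (abs(ex) ** 2 - abs(ey) ** 2, 2 * cross.real, 2 * cross.imag)

beta = (1.0, 2.0, 2.0)      # uniform birefringence vector, |beta| = 3 (arbitrary units)
b = 3.0
axis = tuple(x / b for x in beta)
e_in = (1 + 0j, 0j)         # input state with Stokes vector (1, 0, 0)

# Nine equally spaced positions covering one full revolution, z = 0 ... 2*pi/|beta|.
period = 2 * math.pi / b
samples = [stokes(propagate(beta, k * period / 8, e_in)) for k in range(9)]
projections = [sum(s_i * a_i for s_i, a_i in zip(s, axis)) for s in samples]
half_turn = samples[4]      # rotation by pi around the axis
full_turn = samples[8]      # rotation by 2*pi: back to the input Stokes vector
```

For these values the projection onto the rotation axis remains equal to \(1/3\) at all positions, the half-turn point is \((-7/9,4/9,4/9)\), and the full-turn point coincides with the input Stokes vector.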

Fig. 10.9a,b

Trajectory of the field Stokes vector on the Poincaré sphere. (a) If the birefringence vector \(\boldsymbol{\beta}\) is constant along the fiber, the Stokes vector rotates around \(\boldsymbol{\beta}\), namely the trajectory is a circle, the rotation axis is \(\boldsymbol{\hat{\beta}}=\boldsymbol{\beta}/|\boldsymbol{\beta}|\), and the angular velocity is \(|\boldsymbol{\beta}|\). (b) In the general case of varying birefringence, the Stokes vector trajectory can be approximated by means of infinitesimal rotations around the local birefringence vector \(\boldsymbol{\beta}(z)\)

4.3 Generalized Jones and Stokes Formalism

The example of the four-mode field discussed in Sect. 10.4 suggests that the complex envelopes \(E_{n}(z,t)\) provide a complete description of the field, although their physical interpretation differs slightly depending on whether the fiber is weakly guiding or not. The generalized Jones vector \(|e(z,\omega)\rangle\), often referred to as the field hyperpolarization vector, is hence constructed by stacking the Fourier transforms of the individual complex envelopes on top of each other, and by normalizing the resulting \(2N\)-dimensional column vector to have unit modulus, formally identically to the definition used in (10.30) for the single-mode case [10.76, 10.77, 10.78],

$$\displaystyle\boldsymbol{E}(z,t)=\left(\begin{matrix}E_{1}(z,t)\\ E_{2}(z,t)\\ \vdots\\ E_{2N}(z,t)\end{matrix}\right),\quad|e(z,\omega)\rangle=\frac{\tilde{\boldsymbol{E}}(z,\omega)}{|\tilde{\boldsymbol{E}}(z,\omega)|}\;.$$
(10.49)

The symbol \(\boldsymbol{E}\), which was previously used to denote a two-dimensional column vector, here denotes a \(2N\)-dimensional column vector.

The generalization of the Stokes representation is less straightforward and entails a generalization of the Pauli matrix formalism. A convenient starting point is (10.35), which shows that the Stokes representation of a single-mode field is related to the expansion of the projection operator \(|e\rangle\langle e|\) in terms of the Pauli matrices. Since \(|e\rangle\langle e|\) is a \(2N\times 2N\) Hermitian matrix for \(N> 1\) as well as for \(N=1\), (10.35) can be generalized into

$$\displaystyle|e\rangle\langle e|=\frac{1}{2N}\left(\mathbf{I}+\boldsymbol{e}\cdot\boldsymbol{\Lambda}\right),$$
(10.50)

where \(\boldsymbol{e}\) is the generalized Stokes vector corresponding to \(|e\rangle\) and \(\boldsymbol{\Lambda}\) is a vector collecting the generalized Pauli matrices \(\Lambda_{n}\), which must be traceless Hermitian matrices fulfilling the following trace-orthogonality condition,

$$\displaystyle\mathrm{tr}\{\Lambda_{m}\Lambda_{n}\}=2N\delta_{n,m}\;.$$
(10.51)

Matrices of this type form a basis for all \(2N\times 2N\) traceless Hermitian matrices (a recursive algorithm to construct the matrices \(\Lambda_{n}\) for any number of modes is illustrated in the appendix of [10.78]). These have \(D=4N^{2}-1\) degrees of freedom, as follows from the fact that the elements on the main diagonal are real-valued and the off-diagonal elements are complex-conjugate in pairs. The subtraction of one accounts for the zero-trace constraint. These considerations imply that the generalized Stokes vectors are \(D\)-dimensional and real-valued, where \(D\) is hence the dimensionality of the generalized Stokes space. Note, however, that the region spanned by the Stokes vectors is only \((4N-2)\)-dimensional, like the hyperpolarization space (the \(2N\) complex-valued entries of \(|e\rangle\) minus the unit-magnitude constraint and the common phase of the hyperpolarization vector).
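
As a worked example of this counting, consider a fiber supporting the LP\({}_{01}\) and LP\({}_{11}\) mode groups, that is, \(2N=6\) modes in total (\(N=3\)). In this case

$$\displaystyle D=4N^{2}-1=4\cdot 9-1=35\;,\qquad 4N-2=10\;,$$

so that the generalized Stokes vectors have 35 real-valued components, while the region that they span is only 10-dimensional. For \(N=1\) the same expressions return \(D=3\) and \(4N-2=2\), consistent with the fact that the polarization states of a single-mode fiber cover the two-dimensional surface of the Poincaré sphere in the three-dimensional Stokes space.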

Equations (10.50) and (10.51) imply the following generalized properties

$$\begin{aligned}e_{n} & =\langle e|\Lambda_{n}|e\rangle,\quad n\in\{1,\dots,D\}\;,\end{aligned}$$
(10.52)
$$\begin{aligned}\boldsymbol{e} & =\langle e|\boldsymbol{\Lambda}|e\rangle\;,\end{aligned}$$
(10.53)
$$\begin{aligned}|\langle u|v\rangle|^{2} & =\frac{1}{2N}\left(1+\boldsymbol{u}\cdot\boldsymbol{v}\right)\;.\end{aligned}$$
(10.54)

Note that the length of the generalized Stokes vectors is given by \(|\boldsymbol{e}|=\sqrt{2N-1}\), as follows from (10.54) for \(u=v\). Another important consequence of the same equation is that Stokes vectors corresponding to orthogonal Jones vectors (for which \(\langle u|v\rangle=0\)) are characterized by the relation \(\boldsymbol{u}\cdot\boldsymbol{v}=-1\), which in the multidimensional case does not imply that \(\boldsymbol{u}\) and \(\boldsymbol{v}\) are antiparallel. In fact, since their magnitude is not 1, one can define the angle \(\alpha\) formed by two Stokes vectors corresponding to orthogonal Jones vectors through the equality \(\boldsymbol{u}\cdot\boldsymbol{v}=(2N-1)\cos(\alpha)=-1\). We note that this result does not change by normalizing the generalized Stokes vectors to have unit length, as is done in [10.79].
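
These properties can be checked numerically for \(2N=4\). The sketch below does not implement the recursive construction of [10.78]; instead it assumes, purely for illustration, a basis formed by the 15 Kronecker products of the \(2\times 2\) identity and the Pauli matrices (excluding the identity itself), which are traceless, Hermitian, and fulfill (10.51). The helper names and the two sample states are likewise assumptions.

```python
import itertools
import math

I2 = ((1, 0), (0, 1))
PAULI = (((1, 0), (0, -1)), ((0, 1), (1, 0)), ((0, -1j), (1j, 0)))

def kron(a, b):
    """Kronecker product of two 2x2 matrices, giving a 4x4 matrix."""
    return [[a[i // 2][j // 2] * b[i % 2][j % 2] for j in range(4)]
            for i in range(4)]

# Assumed basis for 2N = 4: all products A (x) B with A, B in {I, sigma_1..3},
# dropping the first one, I (x) I. This leaves D = 15 matrices.
factors = (I2,) + PAULI
LAMBDA = [kron(a, b) for a, b in itertools.product(factors, repeat=2)][1:]

def trace_product(a, b):
    """tr(a b) for two 4x4 matrices."""
    return sum(a[i][k] * b[k][i] for i in range(4) for k in range(4))

def stokes(jones):
    """Generalized Stokes vector e_n = <e|Lambda_n|e>, as in (10.52)."""
    return [sum(jones[i].conjugate() * lam[i][k] * jones[k]
                for i in range(4) for k in range(4)).real
            for lam in LAMBDA]

def dot(a, b):
    """Ordinary scalar product of two real vectors."""
    return sum(x * y for x, y in zip(a, b))

two_n = 4
u = [0.5 + 0j, 0.5j, -0.5 + 0j, 0.5 + 0j]   # unit-modulus sample state
v = [0.5 + 0j, 0.5j, 0.5 + 0j, -0.5 + 0j]   # a state orthogonal to u
su, sv = stokes(u), stokes(v)
length_u = math.sqrt(dot(su, su))                               # sqrt(2N - 1)
dot_uv = dot(su, sv)                                            # -1 for orthogonal states
angle_deg = math.degrees(math.acos(dot_uv / (two_n - 1)))       # angle alpha of the text
```

With these choices `length_u` equals \(\sqrt{3}\), `dot_uv` equals \(-1\), and the angle between the two Stokes vectors is \(\arccos(-1/3)\), roughly \(109.5^{\circ}\) rather than \(180^{\circ}\).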

5 Mode Coupling and Unitary Propagation in SDM Fibers

In the absence of MDL, mode coupling in a fiber that supports \(2N\) modes is described by a unitary \(2N\times 2N\) matrix \(\mathbf{U}(z,\omega)\), whose evolution obeys the equation,

$$\displaystyle\frac{\mathrm{d}\mathbf{U}(z,\omega)}{\mathrm{d}z}=\mathrm{j}\mathbf{B}(z,\omega)\mathbf{U}(z,\omega)\;,$$
(10.55)

which is identical to (10.40), provided that the symbol \(\mathbf{B}\) denotes a \(2N\times 2N\) Hermitian matrix. The individual terms of \(\mathbf{B}\) account for the coupling between pairs of modes, whereas blocks of \(\mathbf{B}\) describe the coupling within and between groups of degenerate modes. An illustration is presented in Fig. 10.10.

Fig. 10.10

The matrix \(\mathbf{B}\) describing linear coupling in a fiber that supports propagation of LP\({}_{01}\) and LP\({}_{11}\) mode groups. The \(2\times 2\) block \(\mathbf{B}_{01}\) accounts for polarization coupling within the fundamental mode, while the \(4\times 4\) block \(\mathbf{B}_{11}\) accounts for mode coupling within the LP\({}_{11}\) group. The \(2\times 4\) block \(\mathbf{K}\) and its Hermitian adjoint \(\mathbf{K}^{\dagger}\) describe intergroup mode coupling

The matrix \(\mathbf{B}\) can be expanded in terms of the generalized Pauli matrices, thereby rendering (10.55) into

$$\displaystyle\begin{aligned}\displaystyle\frac{\mathrm{d}\mathbf{U}(z,\omega)}{\mathrm{d}z}&\displaystyle=\mathrm{j}\bigg[\beta_{0}(z,\omega)\mathbf{I}+\frac{1}{2N}\boldsymbol{\beta}(z,\omega)\cdot\boldsymbol{\Lambda}\bigg]\mathbf{U}(z,\omega)\;,\end{aligned}$$
(10.56)

where \(\beta_{0}(z,\omega)\) has the meaning of the mode-averaged propagation constant, whereas the \(D\)-dimensional vector \(\boldsymbol{\beta}(z,\omega)\) accounts for the mismatch between the various propagation constants, as well as for the local mode coupling [10.80, 10.81]. An alternative form of (10.55), which is often encountered in the literature, is obtained by accounting separately for the propagation constants of the individual modes,

$$\displaystyle\frac{\mathrm{d}\mathbf{U}(z,\omega)}{\mathrm{d}z}=\mathrm{j}\left[\mathbf{B}_{0}+\frac{1}{2N}\boldsymbol{b}(z,\omega)\cdot\boldsymbol{\Lambda}\right]\mathbf{U}(z,\omega)\;,$$
(10.57)

where \(\mathbf{B}_{0}\) denotes a diagonal matrix whose nonzero elements are the propagation constants of the individual modes, and where the vector \(\boldsymbol{b}(z,\omega)\) only accounts for the mode coupling caused by the fiber perturbations. Clearly, this description is only appropriate in the case where the spatial modes used as a basis for the field lateral profile are also true fiber modes. In this case, in the absence of coupling (\(\boldsymbol{b}=0\)) (10.57) yields \(\mathbf{U}=\exp(\mathrm{j}\,\mathbf{B}_{0}z)\). If the spatial modes assumed for the field lateral profile expansion are not true fiber modes (as is rigorously the case in the LP representation), then \(\mathbf{B}_{0}\) is nondiagonal [10.71, 10.80, 10.82] and it accounts for the deterministic and periodic coupling that occurs between the spatial modes of the basis.

Similarly to the single-mode case, there is no closed-form solution for (10.57), except when the generalized birefringence vector is independent of \(z\). In this situation (10.42)–(10.44) apply also to the case of multiple modes, provided that the quantity \(\boldsymbol{\sigma}/2\) be replaced with \(\boldsymbol{\Lambda}/2N\). A major difference between the single-mode and the multimode case stems from the fact that while the matrix \(\boldsymbol{r}\cdot\boldsymbol{\sigma}\) admits two orthogonal eigenvectors, the matrix \(\boldsymbol{r}\cdot\boldsymbol{\Lambda}\) admits \(2N\) orthogonal eigenvectors, and hence the matrix \(\mathbf{U}\) can be expanded as

$$\displaystyle\mathbf{U}=\mathrm{e}^{\mathrm{j}\phi_{0}}\sum_{n=1}^{2N}\mathrm{e}^{\mathrm{j}\phi_{n}}|p_{n}\rangle\langle p_{n}|$$
(10.58)

where \(|p_{n}\rangle\) and \(\phi_{n}\) are the \(n\)-th eigenstate of \(\boldsymbol{r}\cdot\boldsymbol{\Lambda}\) and the corresponding eigenvalue, respectively, and where \(\sum_{n}\phi_{n}=0\).

The Stokes-space representation of unitary evolution in the case of multiple-mode propagation is formally identical to the one discussed for single-mode fibers, and the main differences have to do with the increased dimensionality of the generalized Stokes space. Indeed, a unitary \(2N\times 2N\) Jones matrix \(\mathbf{U}\) corresponds to a norm-preserving \(D\times D\) transformation \(\mathbf{R}\) in the \(D\)-dimensional Stokes space, which can still be interpreted as a rotation, yet on a hypersphere, thereby failing to provide an intuitive description of the Stokes vector evolution. The relation connecting \(\mathbf{U}\) and \(\mathbf{R}\) is obtained from (10.46), by replacing the Pauli matrices with their generalized version, while the evolution equation for the generalized Stokes vector becomes

$$\displaystyle\begin{aligned}\displaystyle\frac{\partial\boldsymbol{e}}{\partial z}&\displaystyle=\mathrm{tr}\left[\boldsymbol{\Lambda}\,\mathrm{j}\frac{(\boldsymbol{\beta}\cdot\boldsymbol{\Lambda})(\boldsymbol{e}\cdot\boldsymbol{\Lambda})-(\boldsymbol{e}\cdot\boldsymbol{\Lambda})(\boldsymbol{\beta}\cdot\boldsymbol{\Lambda})}{2N}\right]\\ \displaystyle&\displaystyle=\boldsymbol{\beta}\times\boldsymbol{e}\;,\end{aligned} $$
(10.59)

where the first equality follows from (10.56) and the second relies on the generalization of the vector product to the multidimensional case [10.78]. The \(k\)-th component of the generalized vector product between vectors \(\boldsymbol{A}\) and \(\boldsymbol{B}\) is defined as

$$\displaystyle(\boldsymbol{A}\times\boldsymbol{B})_{k}=\sum_{i,j}f_{i,j,k}A_{i}B_{j}\;,$$
(10.60)

where by \(f_{i,j,k}\) we denote the structure constants

$$\displaystyle f_{i,j,k}=\frac{\mathrm{j}}{(2N)^{2}}\mathrm{tr}\,[\Lambda_{k}(\Lambda_{i}\Lambda_{j}-\Lambda_{j}\Lambda_{i})]\;.$$
(10.61)

Equation (10.59) is formally identical to the dynamic equation (10.47) obtained for single-mode propagation, and in principle it can be used for numerical simulations, just like in the single-mode case. Also, similarly to the case of a single-mode fiber, the Jones matrix \(\mathbf{U}=\exp(\mathrm{j}\boldsymbol{r}\cdot\boldsymbol{\Lambda}/2N)\) is isomorphic to \(\mathbf{R}=\exp(\boldsymbol{r}\times)\), where by \(\boldsymbol{r}\times\) we denote the \(D\times D\) matrix operator that returns the vector product \(\boldsymbol{r}\times\boldsymbol{s}\), when applied to the vector \(\boldsymbol{s}\). The expression of \(\boldsymbol{r}\times\) follows from (10.60). It is interesting to note that the propagation matrix \(\mathbf{R}\) cannot pull a legitimate Stokes vector out of the manifold of the legitimate Stokes vectors, and it can be shown that only Stokes vectors corresponding to the eigenstates of \(\boldsymbol{r}\cdot\boldsymbol{\Lambda}\) are eigenstates of \(\mathbf{R}\).

5.1 Modal Dispersion

The term modal dispersion is used to address two distinct phenomena. One is the modal dependence of the field group velocity, and the other is the frequency dependence of the random coupling process.

In the case of single-mode fibers, where the two polarizations of the fundamental mode are perfectly degenerate, modal dispersion is referred to as polarization-mode dispersion (PMD) and is a manifestation of the frequency dependence of the fiber random birefringence. In the case of multimode fiber structures, modal dispersion arises primarily from the group velocity mismatch existing between the fiber modes, but its properties are profoundly influenced by the regime of coupling that characterizes the multimode propagation. In all cases modal dispersion introduces a delayed channel response which needs to be equalized at the receiver end by means of MIMO techniques, thereby increasing the complexity of the MIMO-DSP receiver. In what follows, we review the formalism developed for the study of PMD in single-mode fibers and discuss its generalization to the case of SDM fibers.

5.1.1 Polarization-Mode Dispersion in Single-Mode Fibers

The unitary condition \(\mathbf{U}(z,\omega)\mathbf{U}^{\dagger}(z,\omega)=\mathbf{I}\) implies that the equation describing the frequency dependence of \(\mathbf{U}\) is of the same form as (10.41) (which describes its \(z\)-dependence), namely

$$\displaystyle\frac{\partial\mathbf{U}(z,\omega)}{\partial\omega}=\mathrm{j}\left[\tau_{0}(z,\omega)\mathbf{I}+\frac{1}{2}\boldsymbol{\tau}(z,\omega)\cdot\boldsymbol{\sigma}\right]\mathbf{U}(z,\omega)\;.$$
(10.62)

The meaning of \(\tau_{0}\) and \(\boldsymbol{\tau}\) is easily understood when they do not depend on frequency and hence (10.62) has the following simple solution

$$\begin{aligned}\mathbf{U}(z,\omega) & =\exp(\mathrm{j}\tau_{0}\omega)\exp\left(\frac{\mathrm{j}}{2}\boldsymbol{\tau}\cdot\boldsymbol{\sigma}\,\omega\right)\mathbf{U}(z,0)\end{aligned}$$
(10.63)
$$\begin{aligned} & \begin{aligned}\displaystyle&\displaystyle=\mathrm{e}^{\mathrm{j}\tau_{0}\omega}(\mathrm{e}^{\mathrm{j}\tau\omega/2}|\tau\rangle\langle\tau|+\mathrm{e}^{-\mathrm{j}\tau\omega/2}|\tau_{\perp}\rangle\\ \displaystyle&\displaystyle\quad\times\langle\tau_{\perp}|)\mathbf{U}(z,0)\;,\end{aligned}\end{aligned}$$
(10.64)

where the second equality follows from the discussion related to (10.45). Here by \(|\tau\rangle\) and \(|\tau_{\perp}\rangle\), we denote the Jones vectors that correspond to the Stokes vectors \(\pm\hat{\tau}=\pm\boldsymbol{\tau}/\tau\), with \(\tau=|\boldsymbol{\tau}|\). This form indicates that a polarized input signal characterized by a state vector \(|p\rangle\) such that \(\mathbf{U}(z,0)|p\rangle=|\tau\rangle\) or by an orthogonal state \(|p_{\perp}\rangle\) such that \(\mathbf{U}(z,0)|p_{\perp}\rangle=|\tau_{\perp}\rangle\) is simply delayed by \(\tau_{0}+\tau/2\) or \(\tau_{0}-\tau/2\), respectively, at propagation distance \(z\), namely

$$\begin{aligned}f(t)|p\rangle & \to f\left(t-\tau_{0}-\frac{\tau}{2}\right)|\tau\rangle\;,\end{aligned}$$
(10.65)
$$\begin{aligned}f(t)|p_{\perp}\rangle & \to f\left(t-\tau_{0}+\frac{\tau}{2}\right)|\tau_{\perp}\rangle.\end{aligned}$$
(10.66)

The polarization states \(|p\rangle\) and \(|p_{\perp}\rangle\) are known as principal states of polarization (PSPs) and the relative delay \(\tau\) that they accumulate during propagation is known as the differential group delay (DGD) (it is customary to refer to \(|\tau\rangle\) and \(|\tau_{\perp}\rangle\) as the slow and the fast PSPs, respectively, consistent with the fact that \(|\tau\rangle\) is delayed with respect to \(|\tau_{\perp}\rangle\)). The vector \(\boldsymbol{\tau}\), which as discussed provides a complete characterization of the PSPs, is famously known as the PMD vector.

The effect of PMD on arbitrarily polarized input states can be more conveniently described by introducing the distinction between input and output PSPs (this distinction is often erroneously ignored in the literature, however it becomes unnecessary if one assumes that no coupling occurs at \(\omega=0\) (\(\mathbf{U}(z,0)=\mathbf{I}\)), or equivalently if the Jones vectors are expressed in a rotating reference frame where this is the case). Indeed, \(|p\rangle\) and \(|p_{\perp}\rangle\) should be more correctly referred to as the input PSPs, whereas \(|\tau\rangle\) and \(|\tau_{\perp}\rangle\) should be referred to as the output PSPs. Using the simple relation existing between them, (10.64) can be re-expressed in the following form

$$\displaystyle\mathbf{U}(z,\omega)=\mathrm{e}^{\mathrm{j}\tau_{0}\omega}\left(\mathrm{e}^{\mathrm{j}\tau\omega/2}|\tau\rangle\langle p|+\mathrm{e}^{-\mathrm{j}\tau\omega/2}|\tau_{\perp}\rangle\langle p_{\perp}|\right),$$
(10.67)

which can be used to see that an input signal characterized by the state vector \(|u\rangle\), during propagation splits into two replicas that are separated in time by the DGD,

$$\displaystyle\begin{aligned}\displaystyle f(t)|u\rangle&\displaystyle\to\langle p|u\rangle f\left(t-\tau_{0}-\frac{\tau}{2}\right)|\tau\rangle\\ \displaystyle&\displaystyle\quad\,+\langle p_{\perp}|u\rangle f\left(t-\tau_{0}+\frac{\tau}{2}\right)|\tau_{\perp}\rangle\;.\end{aligned}$$
(10.68)

The two replicas are polarized along the output PSPs, whereas their amplitudes are equal to the projections of the input signal state vector onto the input PSPs. Equation (10.68) reduces to (10.65) or (10.66) if \(|u\rangle=|p\rangle\) or \(|u\rangle=|p_{\perp}\rangle\), respectively.
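
The first-order picture of (10.68) lends itself to a simple numerical illustration. The sketch below computes the power fractions and the delays of the two replicas for a given input state; the function name `first_order_pmd` and all numerical values (input PSPs at \(\pm 45^{\circ}\), a 5 ns average delay, a 10 ps DGD, a linear input state at \(30^{\circ}\)) are hypothetical and chosen only for demonstration.

```python
import math

def inner(u, v):
    """Scalar product <u|v> of two Jones vectors."""
    return sum(a.conjugate() * b for a, b in zip(u, v))

def first_order_pmd(u, p, p_perp, tau0, tau):
    """Power fractions and delays of the two replicas, following (10.68)."""
    slow = (abs(inner(p, u)) ** 2, tau0 + tau / 2)       # emerges along |tau>
    fast = (abs(inner(p_perp, u)) ** 2, tau0 - tau / 2)  # emerges along |tau_perp>
    return slow, fast

# Hypothetical input PSPs: linear states at +45 and -45 degrees.
r2 = 1 / math.sqrt(2)
p = (r2 + 0j, r2 + 0j)
p_perp = (r2 + 0j, -r2 + 0j)

# Hypothetical input signal state: linear polarization at 30 degrees.
angle = math.radians(30)
u = (math.cos(angle) + 0j, math.sin(angle) + 0j)

(slow_power, slow_delay), (fast_power, fast_delay) = first_order_pmd(
    u, p, p_perp, tau0=5e-9, tau=10e-12)
```

The two power fractions add up to one, as required by the unitarity of the propagation, and the two delays differ by the DGD. When the input state coincides with one of the input PSPs, one of the two fractions vanishes and a single delayed replica is recovered, as in (10.65) and (10.66).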

We recall that (10.67) and (10.68) were derived under the assumption that the PMD vector does not depend on frequency. The resulting description of PMD is hence an approximation usually referred to as a first-order PMD picture. Assessing the accuracy of this approximation requires studying the statistical properties of the PMD vector, which are briefly reviewed in what follows.

The PMD vector evolution equation is obtained in two steps. We first equate the two expressions for \(\partial^{2}\mathbf{U}/\partial z\partial\omega\) obtained from (10.41) and (10.62), with the result

$$\begin{aligned}\frac{\partial\tau_{0}}{\partial z} & =\frac{\partial\beta_{0}}{\partial\omega}\;,\end{aligned}$$
(10.69)
$$\begin{aligned}\frac{\partial\boldsymbol{\tau}}{\partial z}\cdot\boldsymbol{\sigma} & =\frac{\partial\boldsymbol{\beta}}{\partial\omega}\cdot\boldsymbol{\sigma}+\mathrm{j}\frac{(\boldsymbol{\beta}\cdot\boldsymbol{\sigma})(\boldsymbol{\tau}\cdot\boldsymbol{\sigma})-(\boldsymbol{\tau}\cdot\boldsymbol{\sigma})(\boldsymbol{\beta}\cdot\boldsymbol{\sigma})}{2}\;.\end{aligned}$$
(10.70)

The first equation describes the accumulation of the polarization-averaged delay. The second can be further simplified by tracing out the Pauli matrices, with the same procedure illustrated in (10.47). The result is the famous PMD dynamic equation

$$\displaystyle\frac{\partial\boldsymbol{\tau}}{\partial z}=\frac{\partial\boldsymbol{\beta}}{\partial\omega}+\boldsymbol{\beta}\times\boldsymbol{\tau}\;.$$
(10.71)

The dependence of the birefringence vector on propagation distance renders the evolution of the PMD vector nontrivial. Most importantly, since \(\boldsymbol{\beta}\) is random in nature (it describes random mode coupling), the PMD vector \(\boldsymbol{\tau}\) is also random. The statistical properties of the birefringence vector of single-mode fibers have been accurately characterized in the past decade, and a well-established result is that its typical correlation length ranges from a few meters to a few hundreds of meters [10.75], implying that thousands of independent contributions accumulate over typical fiber lengths in metro and long-haul systems. This simple argument, in conjunction with the central-limit theorem, legitimates the description of the PMD vector evolution in terms of a three-dimensional Brownian motion [10.83]. That is, the three components of the PMD vector are independent and identically distributed Gaussian variables, and its length—the DGD—is characterized by a Maxwellian probability density function (plotted in Fig. 10.11a). The mean PMD vector length (or, equivalently, the mean DGD) is proportional to the square-root of the propagation distance

$$\displaystyle\langle\tau(z)\rangle=\kappa_{\mathrm{PMD}}\sqrt{z}\;,$$
(10.72)

where by angled brackets we denote ensemble averaging, and where the proportionality coefficient \(\kappa_{\mathrm{PMD}}\) is the familiar PMD coefficient (note that the mean value of the DGD is frequency-independent, as it follows from the stationarity of the PMD process with respect to frequency). The PMD coefficient is customarily specified in units of \(\mathrm{ps/\sqrt{km}}\) and typical values range from \({\mathrm{0.01}}\,{\mathrm{ps/\sqrt{km}}}\) in modern low-PMD fibers to \({\mathrm{0.5}}\,{\mathrm{ps/\sqrt{km}}}\) in installed vintage systems [10.84]. We stress that the square-root growth of the mean DGD results from the random nature of the birefringence vector \(\boldsymbol{\beta}\), while the details of the birefringence statistics are not relevant, as long as the fiber length exceeds by some orders of magnitude the birefringence correlation length. Just to mention one relevant example, it is worth pointing out that all the work carried out by Galtarossa's group [10.75, 10.82, 10.84, 10.85] relies on the assumption that circular birefringence is absent everywhere along the fiber, thereby implying that the third component of \(\boldsymbol{\beta}\) vanishes. In this case, the second term at the right-hand side of (10.71) is the one responsible for lifting the PMD vector out of the equatorial plane in Stokes space, thereby making the assumption of vanishing circular birefringence immaterial, with the result that all of the described properties of the PMD vector are not affected by this detail of the model. Equation (10.72) can be expressed in the following equivalent form,

$$\displaystyle\langle\tau^{2}(z)\rangle=\kappa^{2}z\;,$$
(10.73)

where \(\kappa=\kappa_{\mathrm{PMD}}\sqrt{3\uppi/8}\).
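The Brownian-motion picture and the factor \(\sqrt{3\uppi/8}\) relating \(\kappa\) to \(\kappa_{\mathrm{PMD}}\) can be checked with a short Monte Carlo experiment. The sketch below is a simplified illustration: it accumulates independent isotropic Gaussian increments rather than integrating (10.71), and the ensemble size, number of sections, and variable names are arbitrary assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n_fibers, n_sections = 20000, 40       # hypothetical ensemble size and section count
kappa = 1.0                            # rms coefficient of (10.73), arbitrary units
dz = 1.0 / n_sections                  # section length; total length z = 1

# Each section adds an independent Gaussian increment with variance
# kappa^2 * dz / 3 per component, so that <tau^2> = kappa^2 * z.
steps = rng.normal(scale=kappa * np.sqrt(dz / 3), size=(n_fibers, n_sections, 3))
tau_vec = np.cumsum(steps, axis=1)     # PMD vector after each section
dgd = np.linalg.norm(tau_vec, axis=2)  # DGD = length of the PMD vector

mean_dgd = dgd.mean(axis=0)
rms_dgd = np.sqrt((dgd**2).mean(axis=0))

ratio = rms_dgd[-1] / mean_dgd[-1]                       # theory: sqrt(3*pi/8)
growth = mean_dgd[-1] / mean_dgd[n_sections // 4 - 1]    # theory: sqrt(4) = 2
print(ratio, np.sqrt(3 * np.pi / 8), growth)
```

Quadrupling the length doubles the mean DGD, as (10.72) predicts, and the ratio between the rms and mean DGD approaches the Maxwellian value.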

The random nature of PMD manifests itself also through the frequency dependence of the PMD vector, which is key to assessing the accuracy of the first-order approximation. This dependence is conveniently characterized by means of the two-frequency correlation function of the PMD vector [10.86, 10.87, 10.88], whose expression is

$$\displaystyle\langle\boldsymbol{\tau}(z,\omega)\cdot\boldsymbol{\tau}(z,\omega+\Omega)\rangle=3\frac{1-\mathrm{e}^{-\frac{\Omega^{2}\langle\tau^{2}(z)\rangle}{3}}}{\Omega^{2}}$$
(10.74)

The derivation of (10.74) is straightforward if one uses the tools of stochastic calculus [10.88]. The same result can also be obtained by approximating the fiber with a finite number \(N\) of constant-birefringence plates and then by taking the limit \(N\to\infty\), which is the approach used in the work where (10.74) was first presented [10.86]. Note that the derivation of the autocorrelation function is performed by assuming a first-order expansion of the birefringence vector \(\boldsymbol{\beta}(z,\omega+\Omega)\simeq\boldsymbol{\beta}(z,\omega)+\Omega(\partial\boldsymbol{\beta}/\partial\omega)(z,\omega)\). A similar assumption underpins the derivation of the generalized PMD vector autocorrelation function (ACF) in the multimode case. Inspecting the plot of (10.74) in Fig. 10.11 shows that the PMD vector ACF reduces to one half of its peak value at the angular frequency difference \(\Omega_{\mathrm{3dB}}\simeq 2.18/\sqrt{\langle\tau^{2}\rangle}\), which suggests that for smaller differences two PMD vectors are highly correlated with each other and hence the frequency dependence of the PMD vector is negligible. The corresponding frequency difference \(B=\Omega_{\mathrm{3dB}}/2\uppi\simeq 0.347/\sqrt{\langle\tau^{2}\rangle}\) is often used as a definition of the PMD bandwidth, with the idea that the first-order PMD approximation only applies to the transmission of signals whose bandwidth does not exceed the PMD bandwidth. It is worth pointing out that in the case of single-mode fiber systems, this is almost always the case, for single-channel bandwidths of the order of a few tens of \(\mathrm{GHz}\). As an example, consider a 1000 km link: for a legacy fiber with a PMD coefficient \(\kappa_{\mathrm{PMD}}={\mathrm{0.1}}\,{\mathrm{ps/\sqrt{km}}}\), the PMD bandwidth is \(B\simeq{\mathrm{100}}\,{\mathrm{GHz}}\), and it increases to \(B\simeq{\mathrm{1}}\,{\mathrm{THz}}\) in the case of a low-PMD fiber with \(\kappa_{\mathrm{PMD}}={\mathrm{0.01}}\,{\mathrm{ps/\sqrt{km}}}\). The situation is substantially different in the case of multimode fibers, as is discussed in the next section.
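The numbers in this example follow from chaining (10.72), (10.73), and the definition of the PMD bandwidth. A minimal sketch of the arithmetic is given below; the function name `pmd_bandwidth_hz` is an illustrative choice, not a standard routine.

```python
import math

def pmd_bandwidth_hz(kappa_pmd_ps_per_sqrt_km, length_km):
    """PMD bandwidth B ~ 0.347 / sqrt(<tau^2>) for a given PMD coefficient and length."""
    mean_dgd_ps = kappa_pmd_ps_per_sqrt_km * math.sqrt(length_km)   # (10.72)
    rms_dgd_ps = mean_dgd_ps * math.sqrt(3 * math.pi / 8)           # (10.73)
    return 0.347 / (rms_dgd_ps * 1e-12)

for kappa_pmd in (0.1, 0.01):          # legacy and low-PMD fiber, in ps/sqrt(km)
    b_ghz = pmd_bandwidth_hz(kappa_pmd, 1000.0) / 1e9
    print(f"kappa_PMD = {kappa_pmd} ps/sqrt(km): B = {b_ghz:.0f} GHz")
```

For the two PMD coefficients quoted in the text, this evaluates to roughly 100 GHz and 1 THz, respectively.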

Fig. 10.11

(a) The probability density function of the DGD normalized to its root-mean-square value. (b) Normalized autocorrelation function of the PMD vector

To conclude this section, we remind the reader that PMD is a unitary effect and hence, unlike PDL [10.89], does not imply a fundamental system information capacity loss. For this reason, its effect can in principle be fully compensated for in the digital domain at the receiver of a polarization-multiplexed coherent system. The complexity of the necessary DSP scales with the magnitude of the system PMD (the differential delay that needs to be accommodated in time-domain equalization algorithms [10.90]), or equivalently with the PMD bandwidth (the resolution required in frequency-domain equalization algorithms [10.91]).

5.1.2 Generalization of the PMD Formalism

The derivation of (10.62) relies solely on the unitary nature of the Jones matrix \(\mathbf{U}(z,\omega)\). Its extension to the case of multimode fiber structures is thereby straightforward, and the resulting equation can be expressed as

$$\displaystyle\frac{\partial\mathbf{U}(z,\omega)}{\partial\omega}=\mathrm{j}\left[\tau_{0}(z,\omega)\mathbf{I}+\frac{1}{2N}\boldsymbol{\tau}(z,\omega)\cdot\boldsymbol{\Lambda}\right]\mathbf{U}(z,\omega)\;,$$
(10.75)

where \(\tau_{0}\) is now the mode-averaged group delay, and \(\boldsymbol{\tau}\) is a \(D\)-dimensional real-valued vector that generalizes the PMD vector and that is referred to as the mode dispersion (MD) vector [10.78]. Its evolution equation is also derived with the same procedure described in the single-mode case and the result is identical to (10.71), provided that the symbol \(\times\) is used to denote the generalized vector product. A major difference with respect to the single-mode case is due to the phase and group velocity mismatch existing between the various fiber modes. As pointed out in the discussion of (10.56) and (10.57), this mismatch is captured by the generalized birefringence vector, which can be conveniently expressed as the sum of two contributions,

$$\displaystyle\boldsymbol{\beta}(z,\omega)=\boldsymbol{\beta}_{\mathrm{d}}(\omega)+\boldsymbol{\beta}_{\mathrm{r}}(z,\omega)\;,$$
(10.76)

where the term \(\boldsymbol{\beta}_{\mathrm{d}}\) is the deterministic content of \(\boldsymbol{\beta}\) accounting for the propagation constants mismatch (which is constant along the fiber, unless some specific special fiber design is considered), while the term \(\boldsymbol{\beta}_{\mathrm{r}}\) models random coupling between modes. Moreover, if the spatial modes used for representing the field lateral profile are not true fiber modes, \(\boldsymbol{\beta}_{\mathrm{d}}\) must also account for the deterministic coupling between them. With the formalism of (10.57), \(\boldsymbol{\beta}_{\mathrm{d}}\) can be extracted using \(\beta_{\mathrm{d,}n}=\mathrm{tr}(\Lambda_{n}\mathbf{B}_{0})/2N\), where \(n=1\dots D\). As an example, Fig. 10.12 illustrates the case of a coupled-core three-core fiber where the spatial-modes basis consists of the fundamental modes of the individual cores (they are not true fiber modes—the true fiber modes are supermodes, as discussed in Sect. 10.3.3). In this case, one can compute [10.80] \(\boldsymbol{\beta}_{\mathrm{d}}=2b\sqrt{N}(\hat{e}_{10}+\hat{e}_{16}+\hat{e}_{18}+\hat{e}_{24}+\hat{e}_{26}+\hat{e}_{32})\), where \(\boldsymbol{\hat{e}}_{j}\), \(j=1\dots 35\) is a unit vector in the \(j\)-th direction of the generalized Stokes space.

Fig. 10.12

State vector \(\boldsymbol{E}\) in a three-core fiber, where the field is represented in the basis of the fundamental modes of the individual fiber cores, and the matrix \(\mathbf{B}_{0}\) describing the deterministic coupling between them

Using (10.76), the MD vector evolution equation reads as

$$\displaystyle\frac{\partial\boldsymbol{\tau}}{\partial z}=\frac{\mathrm{d}\boldsymbol{\beta}_{\mathrm{d}}}{\mathrm{d}\omega}+\frac{\partial\boldsymbol{\beta}_{\mathrm{r}}}{\partial\omega}+(\boldsymbol{\beta}_{\mathrm{d}}+\boldsymbol{\beta}_{\mathrm{r}})\times\boldsymbol{\tau}\;.$$
(10.77)

The term \(\partial\boldsymbol{\beta}_{\mathrm{r}}/\partial\omega\), which accounts for the frequency dependence of the perturbations, contributes negligibly to the evolution of the MD vector compared with the term \(\mathrm{d}\boldsymbol{\beta}_{\mathrm{d}}/\mathrm{d}\omega\), which accounts for the deterministic walk-off between nondegenerate modes, and can hence be ignored. The simplified evolution equation,

$$\displaystyle\frac{\partial\boldsymbol{\tau}}{\partial z}=\frac{\mathrm{d}\boldsymbol{\beta}_{\mathrm{d}}}{\mathrm{d}\omega}+\left(\boldsymbol{\beta}_{\mathrm{d}}+\boldsymbol{\beta}_{\mathrm{r}}\right)\times\boldsymbol{\tau}\;,$$
(10.78)

shows that the local contribution to the MD vector \(\mathrm{d}\boldsymbol{\beta}_{\mathrm{d}}/\mathrm{d}\omega\) is constant along the fiber, while the overall \(z\)-dependent birefringence vector \(\boldsymbol{\beta}_{\mathrm{d}}(\omega)+\boldsymbol{\beta}_{\mathrm{r}}(z,\omega)\) rotates the MD vector as it accumulates along the fiber. This dynamics suggests that in the multimode case the statistics of the MD vector depend on the effectiveness with which the MD vector is randomized by the random birefringence, with different results in the two relevant regimes of weak and strong mode coupling.

Like in the single-mode fiber case, an intuitive interpretation of the MD vector can be gained from the first-order picture. In fact, the PSP expansion of the channel transfer matrix \(\mathbf{U}\) in (10.67) is generalized to the multimode case in the following form,

$$\displaystyle\mathbf{U}(z,\omega)=\mathrm{e}^{\mathrm{j}t_{0}\omega}\sum_{n=1}^{2N}\mathrm{e}^{\mathrm{j}t_{n}\omega}|\tau_{n}\rangle\langle p_{n}|\;,$$
(10.79)

where the output principal states (PSs) \(|\tau_{n}\rangle\) are the \(2N\) orthogonal eigenstates of the matrix \(\boldsymbol{\tau}\cdot\boldsymbol{\Lambda}\) (they are related to the input PSs \(|p_{n}\rangle\) through \(|\tau_{n}\rangle=\mathbf{U}(z,0)|p_{n}\rangle\)), and the corresponding delays are measured relative to the mode-averaged group delay \(t_{0}\), so that \(\sum_{n}t_{n}=0\). Thus, an input signal characterized by the state vector \(|u\rangle\) splits, as a result of propagation, into \(2N\) replicas, each delayed by \(t_{0}+t_{n}\),

$$\displaystyle f(t)|u\rangle\to\sum_{n=1}^{2N}\langle p_{n}|u\rangle f(t-t_{0}-t_{n})|\tau_{n}\rangle\;.$$
(10.80)

The analytical extraction of the mode delays, which are the eigenvalues of the matrix \((\boldsymbol{\tau}\cdot\boldsymbol{\Lambda})/2N\), in the multimode case is not as straightforward as in the single-mode case (where \(t_{1}=\tau/2\) and \(t_{2}=-\tau/2\)), however the mode delays are related to the MD vector through the following simple relation

$$\displaystyle\tau^{2}=2N\sum_{n=1}^{2N}t_{n}^{2}\;.$$
(10.81)
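Relation (10.81) can be verified numerically for an arbitrary MD vector. The sketch below assumes that the generalized Pauli matrices are traceless, Hermitian, and normalized so that \(\mathrm{tr}(\Lambda_{i}\Lambda_{j})=2N\delta_{ij}\), and that \(D=4N^{2}-1\); both assumptions are consistent with (10.81) and with the single-mode case, but the specific basis built by the hypothetical helper `generalized_pauli` (a rescaled Gell-Mann-type construction) need not coincide with the ordering used in this chapter.

```python
import numpy as np

def generalized_pauli(m):
    """Return m*m - 1 traceless Hermitian m x m matrices with tr(L_i L_j) = m * delta_ij.

    Assumed normalization, with m = 2N.
    """
    mats = []
    for a in range(m):
        for b in range(a + 1, m):
            sym = np.zeros((m, m), dtype=complex)
            sym[a, b] = sym[b, a] = 1.0
            asym = np.zeros((m, m), dtype=complex)
            asym[a, b], asym[b, a] = -1j, 1j
            mats += [sym * np.sqrt(m / 2), asym * np.sqrt(m / 2)]
    for k in range(1, m):
        diag = np.diag([1.0] * k + [-float(k)] + [0.0] * (m - k - 1)).astype(complex)
        mats.append(diag * np.sqrt(m / (k * (k + 1))))
    return np.array(mats)

n_modes = 3                                  # hypothetical: N = 3 spatial modes
m = 2 * n_modes                              # 2N, including polarization
Lam = generalized_pauli(m)                   # shape (D, 2N, 2N), D = 4N^2 - 1
rng = np.random.default_rng(2)
tau_vec = rng.normal(size=Lam.shape[0])      # arbitrary MD vector, arbitrary units

delay_matrix = np.tensordot(tau_vec, Lam, axes=1) / m   # (tau . Lambda) / 2N
t_n = np.linalg.eigvalsh(delay_matrix)                  # the 2N mode delays

lhs = tau_vec @ tau_vec                      # tau^2
rhs = m * np.sum(t_n**2)                     # 2N * sum_n t_n^2, as in (10.81)
print(lhs, rhs, t_n.sum())
```

The delays sum to zero because the basis matrices are traceless, and for \(m=2\) the construction reduces to the ordinary Pauli matrices, for which the delays are \(\pm\tau/2\).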

Within the first-order picture, the most relevant quantity is the largest differential group delay (LDGD, also referred to as the delay spread [10.92]), which is defined as the difference between the largest and the smallest of the \(2N\) delays. The LDGD is the time interval that needs to be accommodated at the MIMO-DSP receiver, and it therefore directly affects the receiver complexity [10.93]. In this framework the statistics of the LDGD are of primary importance, as LDGD fluctuations might cause system outages if not properly accounted for in the receiver design. These considerations underpinned early studies of MD in SDM fibers, which were focused primarily on characterizing the probability density function of the LDGD [10.77, 10.78]. More recently, however, it became clear that the first-order picture can accurately describe MD in fibers with negligible mode coupling, whereas it is fundamentally inconsistent in the most relevant case of SDM fibers with strong mode coupling [10.80, 10.81, 10.94]. These two cases are discussed in what follows.

5.1.3 Modal Dispersion in the Regime of Weak Mode Coupling

Weak coupling between modes results from a large mismatch between the modes' propagation constants. In this regime modal dispersion manifests itself primarily in the form of modal walk-off, where distinct groups of quasi-degenerate modes accumulate a differential delay that increases proportionally to the propagation distance. Using the Stokes-space formalism, this result emerges from (10.78), which by setting \(\boldsymbol{\beta}_{\mathrm{r}}=0\), yields \(\boldsymbol{\tau}=(\mathrm{d}\boldsymbol{\beta}_{\mathrm{d}}/\mathrm{d}\omega)z\) (this simple result follows from the fact that \(\boldsymbol{\beta}_{\mathrm{d}}\) and \(\mathrm{d}\boldsymbol{\beta}_{\mathrm{d}}/\mathrm{d}\omega\) are parallel vectors, as discussed in [10.80]). In this case, the first-order approximation is legitimate for signals within whose bandwidth the term \(\mathrm{d}\boldsymbol{\beta}_{\mathrm{d}}/\mathrm{d}\omega\) does not vary significantly. In particular, in the familiar case of two uncoupled groups of degenerate modes, this expression of the MD vector can be shown to produce two distinct delays, whose absolute difference is equal to the differential group delay \(\smash{L|v_{g,1}^{-1}-v_{g,2}^{-1}|}\), where \(L\) is the fiber length and \(v_{g,1}\) and \(v_{g,2}\) denote the group velocities of the two mode groups. Modal dispersion within the two groups of modes adds to the much larger intergroup dispersion, implying an almost negligible effect on the MIMO-DSP complexity, which depends primarily on the intergroup differential delay. This regime includes transmission in LP\({}_{01}\) and LP\({}_{11}\) mode groups of weakly guiding fibers, under the simplifying assumption of perfect degeneracy of the LP\({}_{11}\) modes.

Obviously, the regime of weak mode coupling evolves into a regime of intermediate coupling, and eventually of strong coupling, as propagation distance increases. The analysis of this transition and its consequences for the fiber modal dispersion are rather complex and go beyond the purpose of this review. Recent studies on this subject can be found in [10.80, 10.94, 10.95, 10.96, 10.97].

5.1.4 Modal Dispersion in the Regime of Strong Mode Coupling

Modes with similar propagation constants get strongly coupled over relatively short propagation distances, as a result of the fiber's perturbations. In this situation, the effect of the random birefringence vector \(\boldsymbol{\beta}_{\mathrm{r}}\) is dominant and the most relevant properties of the MD vector can be derived by neglecting the deterministic birefringence vector \(\boldsymbol{\beta}_{\mathrm{d}}\) in (10.78). The simplified equation,

$$\displaystyle\frac{\partial\boldsymbol{\tau}}{\partial z}=\frac{\mathrm{d}\boldsymbol{\beta}_{\mathrm{d}}}{\mathrm{d}\omega}+\boldsymbol{\beta}_{\mathrm{r}}\times\boldsymbol{\tau}\;,$$
(10.82)

differs from the PMD vector evolution equation in the forcing term, which is deterministic. Note that because of the many uncorrelated rotations of the accumulating MD vector driven by the random birefringence vector, the orientation of \(\mathrm{d}\boldsymbol{\beta}_{\mathrm{d}}/\mathrm{d}\omega\) is immaterial, and the same argument used in the single-mode case can hence be used here to conclude that the MD vector evolves as a Gaussian vector too (indeed, direct measurements of the generalized birefringence vector statistics are not available yet, however the observed mode-coupling dynamics indicate that the modal content of the transmitted field in the regime of strong mode coupling is randomized over a few meters, suggesting that the correlation length of the generalized birefringence vector is of the same order or smaller than in single-mode fibers). Its modulus follows the chi distribution with \(D\) degrees of freedom (its square modulus follows the chi-squared distribution), and its mean-square value grows linearly with propagation distance, namely \(\langle\tau^{2}\rangle=\kappa^{2}z\) (however, the dependence of the MD coefficient \(\kappa\) on the fiber design and perturbation statistics is rather complex [10.80]). We remind the reader that a random variable \(Y\) is chi-square-distributed with \(D\) degrees of freedom if it results from the sum of the squares of \(D\) identically distributed and zero-mean independent Gaussian variables \(X_{n}\): \(Y=\sum_{n=1}^{D}X_{n}^{2}\). The probability density function of \(\tau\) is plotted for several values of \(N\) in Fig. 10.13a.
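The narrowing of the distribution of \(\tau\) as the number of modes grows can be illustrated by sampling Gaussian vectors directly. The sketch below assumes \(D=4N^{2}-1\) (consistent with \(D=3\) for the single-mode case and \(D=35\) for \(N=3\)); the sample size and the helper name `chi_mean_over_rms` are arbitrary choices for illustration.

```python
import math
import numpy as np

def chi_mean_over_rms(d):
    """Mean of a chi-distributed modulus with d degrees of freedom, divided by its rms value."""
    log_mean = 0.5 * math.log(2.0) + math.lgamma((d + 1) / 2) - math.lgamma(d / 2)
    return math.exp(log_mean) / math.sqrt(d)

rng = np.random.default_rng(1)
results = {}
for n_modes in (1, 3, 6):                    # hypothetical numbers of spatial modes N
    d = 4 * n_modes**2 - 1                   # assumed dimension D of the Stokes space
    tau = rng.normal(size=(20000, d))        # D i.i.d. zero-mean Gaussian components
    modulus = np.linalg.norm(tau, axis=1) / math.sqrt(d)   # normalized to the rms value
    results[n_modes] = (modulus.mean(), chi_mean_over_rms(d), modulus.std())
    print(n_modes, d, results[n_modes])
```

The sampled mean agrees with the chi-distribution value, and the spread of the normalized modulus shrinks as \(D\) increases, so that in fibers with many modes the MD vector length is close to its rms value with high probability.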

Fig. 10.13

(a) The probability density function of the MD vector modulus normalized to its root-mean-square value for various numbers of spatial modes. (b) Normalized autocorrelation function of the MD vector (the single-mode case corresponds to \(D=3\))

A major difference between the single-mode and the multimode case is in the fact that while the PMD vector length scales with the strength of the perturbations, the length of the MD vector scales with the modulus of the deterministic birefringence vector derivative \(|\mathrm{d}\boldsymbol{\beta}_{\mathrm{d}}/\mathrm{d}\omega|\), which can be greater by orders of magnitude, depending on the deterministic walk-off between the fiber modes. An important consequence of this difference is that the MD bandwidth can be correspondingly smaller than the PMD bandwidth. Indeed, the MD vector autocorrelation function has the following form,

$$\displaystyle\langle\boldsymbol{\tau}(z,\omega+\Omega)\cdot\boldsymbol{\tau}(z,\omega)\rangle=\frac{D}{\Omega^{2}}\left[1-\mathrm{e}^{-\frac{\Omega^{2}\langle\tau^{2}(z)\rangle}{D}}\right],$$
(10.83)

and the MD bandwidth is \(B_{\mathrm{MD}}\simeq 0.2\sqrt{D/\langle\tau^{2}\rangle}\), as obtained by inspection of Fig. 10.13b (this expression can also be obtained by multiplying the PMD bandwidth by \(\sqrt{D/3}\)). It should be noted at this point that, while measurements of the PMD vector and its statistics are routinely performed in traditional single-mode systems, the experimental characterization of the MD vector in SDM systems is more involved [10.99] and therefore the system modal dispersion is typically characterized by exploiting the concept of the intensity impulse response (IIR). This is defined as the mode-averaged output power that is measured by exciting a single mode at the fiber input with a spectrally flat signal of bandwidth \(B\). In formulae, we define the matrix \(\mathbf{H}(t)\) whose \((j,k)\) element \(H_{j,k}(t)\) is the signal received in the \(j\)-th mode when the \(k\)-th mode was excited,

$$\displaystyle\mathbf{H}(t)=\int_{-B/2}^{B/2}\mathbf{U}(L,\omega)\mathrm{e}^{-\mathrm{j}\omega t}\frac{\mathrm{d}\omega}{2\uppi}\;,$$
(10.84)

so that the IIR can be expressed as

$$\displaystyle I(t)=\frac{1}{2N}\sum_{j=1}^{2N}\sum_{k=1}^{2N}|H_{j,k}(t)|^{2}\;.$$
(10.85)

Here the sum over \(j\) is the total output power that is measured when the \(k\)-th mode is excited, while the sum over \(k\) performs the mode averaging. If the probing signal bandwidth is sufficiently larger than the MD bandwidth (by one or more orders of magnitude), it can be shown [10.81] that the IIR is deterministic and practically independent of \(B\). Most importantly, its temporal profile is Gaussian and the mean-square duration is very simply related to the mean-square length of the MD vector (or, equivalently, to the MD bandwidth), namely

$$\begin{aligned}I(t) & =I_{0}\exp\left(-\frac{t^{2}}{2T^{2}}\right),\end{aligned}$$
(10.86)
$$\begin{aligned}T^{2} & =\frac{\langle\tau^{2}\rangle}{4N^{2}}=\frac{\kappa^{2}z}{4N^{2}},\end{aligned}$$
(10.87)

where \(I_{0}\) is a normalization coefficient immaterial to the present discussion. The Gaussian shape of the IIR has been observed in various experiments [10.100, 10.53, 10.98] and can be reproduced in simulations. Figure 10.14a,b presents a comparison between the measured and simulated IIR for the coupled-core three-core fiber used in [10.98]. The measured IIR mean-square duration of about \({\mathrm{0.25}}\,{\mathrm{ns^{2}}}\) at a propagation distance of \({\mathrm{1000}}\,{\mathrm{km}}\) corresponds to an MD bandwidth of approximately \({\mathrm{400}}\,{\mathrm{MHz}}\), a value much smaller than typical WDM channel bandwidths used today in commercial systems. We remind the reader that, in contrast, typical PMD bandwidth values for the same link length are of the order of several hundreds of GHz (as seen in Sect. 10.5.1 Polarization-Mode Dispersion in Single-Mode Fibers).
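The two steps used in this discussion, evaluating the IIR from a sampled transfer matrix via (10.84) and (10.85) and converting its mean-square duration into an MD bandwidth via (10.87), can be sketched as follows. This is an illustrative outline under stated assumptions: the transfer matrix is assumed to be available on a uniform angular-frequency grid, \(D=4N^{2}-1\) is assumed, and the function names are hypothetical.

```python
import numpy as np

def intensity_impulse_response(U, omega, t):
    """Evaluate (10.84)-(10.85) by direct summation.

    U: array of shape (F, M, M), the transfer matrix sampled at omega (M = 2N).
    omega: uniform angular-frequency grid of shape (F,), in rad/s.
    t: times of shape (T,), in s.
    """
    d_omega = omega[1] - omega[0]
    kernel = np.exp(-1j * np.outer(t, omega)) * d_omega / (2 * np.pi)   # shape (T, F)
    H = np.einsum('tf,fjk->tjk', kernel, U)                              # H(t), shape (T, M, M)
    return np.sum(np.abs(H)**2, axis=(1, 2)) / U.shape[1]                # mode-averaged power

def md_bandwidth_hz(T2_s2, n_modes):
    """MD bandwidth from the IIR mean-square duration T^2 (in s^2), using (10.87)."""
    d = 4 * n_modes**2 - 1                   # assumed dimension D of the Stokes space
    tau2 = 4 * n_modes**2 * T2_s2            # <tau^2> = 4 N^2 T^2
    return 0.2 * np.sqrt(d / tau2)

# Three-core fiber example from the text: T^2 of about 0.25 ns^2 at 1000 km
print(md_bandwidth_hz(0.25e-18, 3) / 1e6, "MHz")
```

With \(N=3\) and \(T^{2}=0.25\,\mathrm{ns^{2}}\) the second function returns a value close to the 400 MHz quoted above.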

Fig. 10.14a,b

The mean-square width of the intensity impulse response versus propagation distance for the coupled-core three-core fiber of [10.98]. The inset shows the intensity impulse response for the right-most data point. (a) and (b) present experimental [10.98] and simulation [10.81] results, respectively. The dashed curve in (b) is a plot of (10.87) (the relation between the MD coefficient \(\kappa\) and the fiber characteristics is discussed in [10.80])

The above argument shows the inadequacy of the first-order approximation to characterize the MD of SDM fibers for medium-to-long-reach transmission, where modes undergo strong coupling, and at the same time clarifies that a correct approach to designing the MIMO-DSP receiver must rely on the knowledge of the IIR duration. Strategies to reduce the receiver complexity include pursuing the reduction of the fiber MD through fiber design optimization. This approach means studying the dependence of the MD coefficient \(\kappa\) on the fiber characteristics (core number/geometry and/or refractive index profile), as well as on the statistics of the fiber perturbations. This is a rather challenging task and only a limited number of preliminary investigations are available in the literature [10.80, 10.85, 10.94].

5.2 Stokes-Space Analysis of Mode-Dependent Loss and its Impact on Information Capacity

Mode-dependent loss is a nonunitary propagation effect and as such it is responsible for impairing the capacity of SDM systems [10.101, 10.102, 10.103, 10.104, 10.55]. The Stokes-space formalism has proven to be a convenient tool for the modeling of MDL and its impact on system performance. If we denote by \(S\) the average transmit power per mode and by \(\mathbf{Q}\) the coherency matrix of the propagated amplification noise, the channel spectral efficiency in the absence of channel state information can be expressed as

$$\displaystyle C=\log_{2}\left[\det\left(\mathbf{I}+S\mathbf{Q}^{-1/2}\mathbf{U}\mathbf{U}^{\dagger}\mathbf{Q}^{-1/2}\right)\right].$$
(10.88)

The matrix \(\mathbf{U}\mathbf{U}^{\dagger}\) (which, in the absence of MDL, would equal the identity matrix) is Hermitian and can be expressed in terms of the generalized Pauli matrices,

$$\displaystyle\mathbf{U}\mathbf{U}^{\dagger}=\gamma_{0}\left(\mathbf{I}+\boldsymbol{\Gamma}\cdot\boldsymbol{\Lambda}\right),$$
(10.89)

where \(\gamma_{0}\) is the mode-averaged gain and the Stokes vector \(\boldsymbol{\Gamma}\) is the MDL vector that generalizes the familiar PDL vector used in the single-mode fiber case. In the regime of strong mode coupling and large signal-to-noise ratio (SNR), the average spectral efficiency reduction per mode induced by MDL has been shown to be [10.103, 10.104]

$$\displaystyle\frac{C_{0}-\langle C\rangle}{2N}=\frac{\langle\Gamma^{2}\rangle}{3\ln(2)}\;,$$
(10.90)

where \(C_{0}\) is the spectral efficiency of a perfect link, where the received SNR equals the ratio between the mode-averaged signal and the noise powers. The accuracy of (10.90) is excellent for SNR values larger than \({\mathrm{10}}\,{\mathrm{dB}}\) [10.104]. A simple method for measuring \(\langle\Gamma^{2}\rangle\) is presented in [10.103].
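Expression (10.88) is straightforward to evaluate numerically for a given channel matrix. The sketch below is a toy illustration: the random unitary matrix, the diagonal gain profile used to emulate MDL, the white-noise coherency matrix, and the function name `spectral_efficiency` are all assumptions, not a model of a real link.

```python
import numpy as np

def spectral_efficiency(U, Q, S):
    """Evaluate (10.88): log2 det(I + S Q^{-1/2} U U^dagger Q^{-1/2})."""
    w, V = np.linalg.eigh(Q)                        # Q is Hermitian positive definite
    Q_inv_sqrt = (V / np.sqrt(w)) @ V.conj().T      # Q^{-1/2} from the eigendecomposition
    M = np.eye(U.shape[0]) + S * Q_inv_sqrt @ U @ U.conj().T @ Q_inv_sqrt
    _, logdet = np.linalg.slogdet(M)                # numerically stable log-determinant
    return logdet / np.log(2)

rng = np.random.default_rng(3)
m, S = 6, 100.0                                     # hypothetical: 2N = 6, 20 dB SNR
A = rng.normal(size=(m, m)) + 1j * rng.normal(size=(m, m))
W, _ = np.linalg.qr(A)                              # random unitary matrix (no MDL)
gains = np.array([1.3, 1.2, 1.1, 0.9, 0.8, 0.7])    # toy power gains with unit mean
U_mdl = np.diag(np.sqrt(gains)) @ W                 # same channel with MDL added

C0 = spectral_efficiency(W, np.eye(m), S)           # MDL-free reference
C = spectral_efficiency(U_mdl, np.eye(m), S)        # with MDL, same mode-averaged gain
print(C0, C, (C0 - C) / m)
```

In the MDL-free case the result reduces to \(2N\log_{2}(1+S)\), and introducing unequal gains at fixed mode-averaged gain lowers the spectral efficiency, in line with the discussion of (10.90).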

A quantity which is often used as a figure of merit in the analysis of MDL is the power ratio between the least and the most attenuated hyperpolarization states, which is given by

$$\displaystyle\rho_{\mathrm{dB}}=10\log_{10}\left(\frac{1+\lambda_{\max}}{1+\lambda_{\min}}\right),$$
(10.91)

where \(\lambda_{\max}\) and \(\lambda_{\min}\) denote the largest and smallest eigenvalues of \(\boldsymbol{\Gamma}\cdot\boldsymbol{\Lambda}\) (note that the corresponding loss/gain values that are measured in experiments are \(\gamma_{0}(1+\lambda_{\max})\) and \(\gamma_{0}(1+\lambda_{\min})\), respectively). Interestingly, in the regime of small-to-moderate MDL the mean-square length of the MDL vector is related to this quantity by the following simple relation,

$$\displaystyle\langle\rho_{\mathrm{dB}}^{2}\rangle=\frac{10^{2}}{\ln^{2}(10)}f(N)\langle\Gamma^{2}\rangle$$
(10.92)

with

$$\displaystyle f(N)=4\frac{(N-1)^{2}+24.7(N-1)+16.14}{0.2532(N-1)^{2}+7.401(N-1)+16.14}\;.$$
(10.93)

This connects the average MDL-induced spectral efficiency reduction per mode (10.90) with the mean-square MDL expressed in logarithmic units,

$$\displaystyle\frac{C_{0}-\langle C\rangle}{2N}=\frac{\ln^{2}(10)}{300\ln(2)f(N)}\langle\rho_{\mathrm{dB}}^{2}\rangle\;.$$
(10.94)

This expression does not depend on the specific way in which the in-line amplifiers are operated, as discussed in [10.104].
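Relations (10.92) to (10.94) can be collected into a few helper functions that convert a measured rms MDL in dB into a spectral-efficiency penalty. The sketch below transcribes the formulas as printed; the function names and the 3 dB example value are illustrative assumptions.

```python
import math

def f_modes(n):
    """Function f(N) of (10.93)."""
    x = n - 1
    return 4 * (x**2 + 24.7 * x + 16.14) / (0.2532 * x**2 + 7.401 * x + 16.14)

def mean_square_gamma(rms_mdl_db, n):
    """Invert (10.92) to obtain <Gamma^2> from the rms MDL in dB."""
    return math.log(10)**2 / (100 * f_modes(n)) * rms_mdl_db**2

def se_loss_per_mode(rms_mdl_db, n):
    """Average spectral-efficiency loss per mode from (10.94), in (bit/s)/Hz."""
    return math.log(10)**2 / (300 * math.log(2) * f_modes(n)) * rms_mdl_db**2

for n in (1, 3, 6):                          # hypothetical numbers of spatial modes
    print(n, f_modes(n), se_loss_per_mode(3.0, n))   # 3 dB rms MDL as an example
```

For \(N=1\) the function returns \(f(1)=4\), which is the value expected from \(\rho_{\mathrm{dB}}\simeq(10/\ln 10)\,2\Gamma\) in the small-PDL limit of a single-mode fiber.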

6 SDM Transmission Experiments

Numerous transmission experiments have been performed over multimode fibers with the numbers of spatial modes ranging from 3 to 45 [10.105, 10.106, 10.107, 10.108, 10.109, 10.14, 10.18, 10.19]. Also, multicore fibers have been studied experimentally in detail for many possible core arrangements up to 36 cores and spatial multiplicity (number of cores \(\times\) number of modes) larger than 100.

Space-division multiplexed transmission experiments are very equipment-intensive: A typical SDM transmission experiment for six spatial channels is shown in Fig. 10.15.

Fig. 10.15

Space-division multiplexed transmission experiment supporting six spatial channels. Triangles represent erbium-doped fiber amplifiers (EDFAs), PBS is a polarizing beam splitter, DSO is a digital storage oscilloscope, ECL is an external-cavity tunable laser, DFB is a distributed feedback laser, DN-MZM is a double-nested Mach–Zehnder modulator, and PD-CRX is a polarization-diverse coherent receiver

The transmitter generates a traditional WDM signal, where odd and even wavelength channels are modulated separately by two double-nested Mach–Zehnder modulators (DN-MZMs). The modulators are driven with four independent signals carrying the underlying transmission pattern, for example QPSK, 16-QAM, or 64-QAM, generated by high-speed digital-to-analog converters (DACs). The pseudo-random patterns are chosen such that the cross-correlation peaks between patterns are significantly smaller than the autocorrelation peaks. This is required to properly identify the timing of the received channels, and to evaluate their performance using digital signal processing [10.121]. Additional copies of the signal are generated and decorrelated using fiber delays such that each mode and polarization carries a locally independent signal. The decorrelated signals are then injected into a six-fold recirculating loop arrangement, which is used to emulate long-distance experiments (often in SDM experiments only limited lengths of prototype fibers are available).

The loop arrangement is similar to a traditional SMF loop, except that it consists of six loops which have to be adjusted to a path-length difference of typically within \({\mathrm{1}}\,{\mathrm{cm}}\), corresponding to a time delay of \({\mathrm{50}}\,{\mathrm{ps}}\). The loop contains amplifiers to overcome the fiber loss and the loss of the additional loop components, loop switches (that are used to open and close the loop during the loading and recirculation time, respectively), combiners and splitters (to inject and extract the light from the loop), and finally programmable gain equalizing filters (denoted as blockers in Fig. 10.15), to maintain a flat spectrum after each recirculation.

The signals extracted from the loops are captured by an array of polarization-diverse coherent receivers (PD-CRXs), which extract the amplitude and phase of all modes and polarizations, so that the optical field after transmission is fully known. Note that it is necessary to measure all modes and polarizations in the same time window, therefore a digital storage oscilloscope (DSO) with 24 real-time channels is required for a transmission with 6 spatial modes (alternatively, time-multiplexed receiver schemes, where subsets of modes are delayed by single-mode fibers, have been proposed to reduce the number of ports that are necessary in the DSO [10.109, 10.122]).
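The two numbers quoted for the loop setup, the delay associated with a 1 cm path-length difference and the 24 oscilloscope channels, follow from simple bookkeeping. In the sketch below the group index of 1.468 is an assumed typical value for silica fiber, and the function names are illustrative.

```python
C_VACUUM = 299_792_458.0        # speed of light in vacuum, m/s

def path_delay_ps(length_m, group_index=1.468):
    """Propagation delay over a fiber length, assuming a typical silica group index."""
    return length_m * group_index / C_VACUUM * 1e12

def dso_channels(spatial_modes, polarizations=2, quadratures=2):
    """Real-time channels needed to capture every mode, polarization, and quadrature."""
    return spatial_modes * polarizations * quadratures

print(path_delay_ps(0.01))      # delay of a 1 cm path-length difference, in ps
print(dso_channels(6))          # channels for six spatial modes
```

Each coherent receiver delivers an in-phase and a quadrature signal for each of two polarizations, which is why six spatial modes occupy \(6\times 2\times 2=24\) channels.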

The resulting signals are stored in the DSO, and subsequently processed by applying MIMO-DSP techniques, similar to the methods presented in Chap. 6.

Some representative results of MIMO-based transmission in multimode and coupled-core fibers are summarized in Table 10.3. By the terms capacity and spectral efficiency in the table, and more in general in this review of experimental results, we refer to the largest achieved transmission rate, and to the same quantity divided by the total transmission bandwidth, respectively.

Table 10.3 Summary of relevant MIMO-based transmission results in SDM fibers

The longest transmission distances and highest spectral-efficiency-distance products were demonstrated in CC-MCFs, confirming the advantages of the strong coupling regime. The maximum experimental capacity demonstrated in MIMO-SDM transmission surpasses the largest reported values for single-mode fibers. In particular, the largest spectral efficiency demonstrated is as high as \({\mathrm{202}}\,{\mathrm{(bit/s)/Hz}}\), which is well above the nonlinear Shannon limit for single-mode fibers [10.1, 10.136], which is \({\mathrm{26.5}}\,{\mathrm{(bit/s)/Hz}}\) for a fiber length of \({\mathrm{27}}\,{\mathrm{km}}\). This indicates that mode-multiplexed transmission over a few-mode fiber (FMF), that is, a fiber that supports 10 or fewer modes, has the technical potential to be considered as a replacement for single-mode fibers.

Transmission results for some representative multicore fiber transmission experiments are summarized in Table 10.4.

Table 10.4 Summary of relevant SDM transmission in uncoupled multicore fibers

Multicore fibers, especially in combination with few-mode cores, can achieve spatial multiplicities larger than 100, providing an impressive transmission capacity in excess of \({\mathrm{10}}\,{\mathrm{Pb/s}}\), although only for distances shorter than \({\mathrm{100}}\,{\mathrm{km}}\). Longer distances up to \({\mathrm{8800}}\,{\mathrm{km}}\) can be achieved using single-mode cores at a notable capacity of \({\mathrm{520}}\,{\mathrm{Tb/s}}\), which is of particular interest for submarine transmission, where multiple parallel paths can achieve superior performance under a constraint of limited power [10.131].

7 Nonlinear Effects in SDM Fibers

In the previous sections we only considered linear effects in multimode fiber propagation. However, the transmission capacity of multimode systems, just like in the single-mode counterpart [10.1], is ultimately limited by nonlinear effects. The theory of nonlinearities in multimode fibers is challenging as all possible interactions between all involved modes have to be considered. Nonlinear multimode propagation is described by the coupled nonlinear Schrödinger equations [10.137, 10.66, 10.67]. If, for ease of discussion, we neglect loss and mode-dependent chromatic dispersion, the equations can be expressed as follows

$$\displaystyle\begin{aligned}\displaystyle\frac{\partial\boldsymbol{E}}{\partial z}&\displaystyle=\mathrm{j}\mathbf{B}_{0}\boldsymbol{E}-\mathbf{B}_{1}\frac{\partial\boldsymbol{E}}{\partial t}-\mathrm{j}\frac{\beta_{2}}{2}\frac{\partial^{2}\boldsymbol{E}}{\partial t^{2}}\\ \displaystyle&\displaystyle\quad+\mathrm{j}\gamma\sum_{h,k,m,n=1}^{2N}C_{nhkm}E_{h}^{\ast}E_{k}E_{m}\boldsymbol{\hat{u}}_{n}\;,\end{aligned}$$
(10.95)

where \(\mathbf{B}_{0}=\mathbf{B}(z,\omega_{0})\) and \(\mathbf{B}_{1}=\partial\mathbf{B}(z,\omega_{0})/\partial\omega\) account for random mode coupling and intermodal walk-off, respectively, \(\beta_{2}\) is the mode-averaged chromatic dispersion coefficient, \(\gamma\) is the nonlinearity coefficient defined for single-mode fibers [10.138], and \(\boldsymbol{\hat{u}}_{n}\) denotes a \(2N\)-dimensional column vector whose \(n\)-th component is equal to one and the others to zero. The nonlinearity coefficients \(C_{nhkm}\) involve overlap integrals between the modes' lateral profile functions, and their expressions can be found in [10.66] and references therein. As can be seen in (10.95), the Kerr nonlinearity produces a total of \((2N)^{4}\) coefficients (\((2N)^{3}\) coefficients per mode) that have to be considered in the study of nonlinear effects. This can be a challenging task, especially when the modal properties vary strongly between modes, and in general only detailed numerical simulations will provide representative results [10.137]. In contrast, when all modal properties are similar, as in the case of strongly coupled fibers, theoretical results have indicated a significant advantage for strongly coupled SDM fibers over equivalent single-mode fibers [10.139, 10.66]. In the following two sections we briefly describe nonlinear experimental work performed in few-mode fibers and coupled-core multicore fibers.
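To give a feeling for how quickly the bookkeeping in (10.95) grows, the following sketch simply evaluates the two counts quoted above, \((2N)^{4}\) in total and \((2N)^{3}\) per mode, for a few values of \(N\). The function name is an illustrative choice, and the count is the nominal one from the text, before any reduction by symmetry.

```python
def kerr_coefficient_count(n_spatial_modes: int) -> tuple[int, int]:
    """Nominal number of C_nhkm coefficients: (total, per polarization mode)."""
    dim = 2 * n_spatial_modes  # two polarizations per spatial mode
    return dim**4, dim**3


for n in (1, 3, 6):
    total, per_mode = kerr_coefficient_count(n)
    print(f"N={n}: {total} coefficients in total, {per_mode} per mode")
```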

7.1 Impact of Nonlinearities in the Strong-Coupling Regime

Coupled-core fibers are interesting for the study of nonlinear effects because all fiber modes have similar modal properties in terms of effective area and propagation coefficients, and hence the electric field propagates in the regime of strong mode coupling described earlier in this chapter. One important consequence of this situation is that the nonlinear term that appears in the propagation equation (10.95) can be drastically simplified by taking into account the fact that the length-scale on which random mode coupling is effective is by orders of magnitude smaller than typical nonlinear length-scales. The simplified propagation equation, which is known as the multicomponent Manakov equation [10.140], is in the form

$$\displaystyle\frac{\partial\boldsymbol{E}}{\partial z}=-\beta_{1}\frac{\partial\boldsymbol{E}}{\partial t}-\mathrm{j}\frac{\beta_{2}}{2}\frac{\partial^{2}\boldsymbol{E}}{\partial t^{2}}+\mathrm{j}\gamma\kappa|\boldsymbol{E}|^{2}\boldsymbol{E}\;,$$
(10.96)

where \(\beta_{1}\) is the inverse group velocity common to all modes, and where the nonlinearity appears through the total optical power only, consistent with the fact that the electric field is isotropically distributed in the \(2N\)-dimensional hyperpolarization space. As can be seen, the \((2N)^{4}\) nonlinearity coefficients \(C_{nhkm}\) are replaced by a single coefficient \(\kappa\), which is given by [10.140]

$$\displaystyle\kappa=\sum_{h,n}\frac{C_{nhhn}+C_{nhnh}}{2N(2N+1)}.$$
(10.97)

Equations (10.96) and (10.97) describe nonlinear propagation in the most general case of \(2N\) strongly coupled modes. In the specific case of coupled-core fibers, which is considered in this section, (10.97) can be further simplified, with the result [10.66]

$$\displaystyle\kappa\gamma=\frac{1}{3}\frac{8}{2N+1}\gamma_{0}\;,$$
(10.98)

where \(\gamma_{0}\) is the nonlinearity coefficient of a single-mode fiber with the same radius and refractive-index profile as the individual cores (for \(N=1\), (10.98) yields \(\gamma\kappa=\frac{8}{9}\gamma_{0}\), the nonlinearity coefficient of the well-known Manakov equation describing nonlinear propagation in single-mode fibers with random polarization coupling [10.141, 10.142]).
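As a quick numerical illustration of (10.98), the sketch below evaluates the ratio \(\gamma\kappa/\gamma_{0}=8/[3(2N+1)]\) for a few core counts, using exact fractions so that the \(N=1\) value of \(8/9\) is recovered without rounding. The function name and the chosen values of \(N\) are illustrative only.

```python
from fractions import Fraction


def manakov_coefficient_ratio(n_cores: int) -> Fraction:
    """gamma*kappa / gamma_0 for a coupled-core fiber, following (10.98)."""
    return Fraction(8, 3 * (2 * n_cores + 1))


for n in (1, 2, 4, 7):
    ratio = manakov_coefficient_ratio(n)
    print(f"N={n}: gamma*kappa/gamma_0 = {ratio} = {float(ratio):.3f}")
```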

The scaling of \(\kappa\) with the number of modes is key to understanding the improved tolerance of coupled-core multicore fibers to nonlinear distortions. This can be easily seen by expressing the nonlinear term as \(\gamma\kappa|\boldsymbol{E}|^{2}\sim 4\gamma_{0}/3\sum_{n}|E_{n}|^{2}/2N\), which shows that the various modes can be considered as sources of nonlinear noise whose power is proportional to \(1/(2N)^{2}\). Since they carry independent signals, the total nonlinear noise power results from the sum of the individual contributions and hence it scales like \(\sim 2N\times 1/(2N)^{2}=1/2N\), thereby reducing with the number of strongly coupled modes supported by the fiber [10.139, 10.66]. A formal characterization of the nonlinear interference noise can be found in [10.143]. Note that while (10.98) is an analytical result derived specifically for coupled-core multicore fibers, the scaling \(\gamma\kappa\sim 1/N\) is a more general characteristic of fibers operating in the regime of strong mode mixing. The simple argument underpinning this statement is that random mode coupling distributes the power transmitted in each mode equally between all modes, with the result that on average the nonlinearity must be proportional to the mode-averaged power, which is equal to \(|\boldsymbol{E}|^{2}/2N\).
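The power-equalization argument can be checked numerically with a minimal sketch. Here strong random coupling is modeled, as an assumption, by a \(2N\times 2N\) unitary matrix drawn uniformly at random (via a QR decomposition of a complex Gaussian matrix); unit power is launched into a single mode and the output power per mode is averaged over many realizations. The function names, the number of trials, and the seed are illustrative choices.

```python
import numpy as np


def haar_unitary(dim, rng):
    """Random unitary matrix, uniformly distributed, from a QR decomposition."""
    z = rng.standard_normal((dim, dim)) + 1j * rng.standard_normal((dim, dim))
    q, r = np.linalg.qr(z)
    phases = np.diagonal(r) / np.abs(np.diagonal(r))
    return q * phases  # rescale each column by a phase to remove the QR bias


def mean_output_power(n_modes, trials=2000, seed=0):
    """Average power per polarization mode for unit power launched into mode 0."""
    rng = np.random.default_rng(seed)
    dim = 2 * n_modes
    launch = np.zeros(dim, dtype=complex)
    launch[0] = 1.0
    acc = np.zeros(dim)
    for _ in range(trials):
        acc += np.abs(haar_unitary(dim, rng) @ launch) ** 2
    return acc / trials


powers = mean_output_power(n_modes=4)
print(powers)        # every entry is close to 1/(2N) = 0.125
print(powers.sum())  # the total power is conserved by the unitary coupling
```

Under this model each mode carries, on average, the fraction \(1/2N\) of the launched power, which is the mode-averaged power entering the argument above.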

The superior tolerance of coupled-core multicore fibers to nonlinear distortions, as analytically predicted in [10.139, 10.66] and seen in early simulation work [10.144], has recently been confirmed in transmission experiments performed with a four-core fiber [10.117]. The results of an experimental comparison between a single-mode fiber [10.145] and a four-core coupled-core fiber [10.146] with nominally identical cores and the same span length are shown in Fig. 10.16a,b.

Fig. 10.16a,b

Transmission performance comparison between a single-mode fiber and a 4-core coupled-core fiber with identical length and core design for a WDM signal with 15 channels at a baudrate of \({\mathrm{30}}\,{\mathrm{GBd}}\) and a channel spacing of \({\mathrm{33.33}}\,{\mathrm{GHz}}\). (a) Quality factor \(Q\) as a function of the launch power for distances of 2200, 4400, and \({\mathrm{6600}}\,{\mathrm{km}}\), for a 16 QAM signal. (b) \(Q\) factor as a function of distance for QPSK, 16 QAM, and 64 QAM modulated signals

The pure-silica core design yields an ultralow-loss, large-effective-area, high-performance fiber of the kind typically utilized in submarine links (see also Chap. 2 for more detail). Figure 10.16a shows the quality factor \(Q\) as a function of the launch power per wavelength channel in a recirculating-loop system with 110-km-long spans. As can be clearly seen, the optimum launch power for the coupled-core fiber is about \({\mathrm{2}}\,{\mathrm{dB}}\) larger, indicating a better tolerance to nonlinearities, which results in \(Q\) factors that are about \({\mathrm{0.8}}\,{\mathrm{dB}}\) larger. Figure 10.16b compares the launch-power-optimized \(Q\) factors as functions of the propagation distance in the same recirculating-loop experiment, and for different modulation formats. The results clearly show that for all tested formats and distances up to \({\mathrm{10000}}\,{\mathrm{km}}\) the coupled-core fiber outperforms the equivalent single-mode fiber.

7.2 Impact of Nonlinearities in Few-Mode Fibers

Nonlinear effects in few-mode fibers differ from those in single-mode fibers, as the modal properties allow for phase-matching conditions that are forbidden in single-mode fibers (see also Chap. 9 for a description of Kerr nonlinearities in single-mode fibers). For example, four-wave mixing is strongly suppressed in nonzero-dispersion single-mode fibers, because of the impact of chromatic dispersion on the phase-matching condition. In few-mode fibers, however, modal dispersion can compensate for chromatic dispersion, and therefore strong four-wave mixing can be observed. The effect can be better understood by considering cross-phase modulation, where the intensity fluctuations of a signal traveling in one mode can imprint a phase on a second signal traveling in another mode. If both signals travel at the same group velocity, the interaction length for this effect becomes long, and a strong effect can be observed. As the group velocity depends on wavelength and mode, in low-DGD few-mode fibers (like optimized GI fibers) it is possible to find conditions where two different modes at two different wavelengths have a matched group delay. This effect was experimentally observed in a fiber with three spatial modes [10.147] and a length of \({\mathrm{5}}\,{\mathrm{km}}\), confirming that the effect does not degrade significantly even in the presence of perturbations along the fiber. Similar experiments were also reported for fully nondegenerate four-wave mixing [10.148, 10.149], also confirming that four-wave mixing effects in few-mode fibers are non-negligible and can cause significant penalties for mode-multiplexed MIMO-based transmission.
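The group-delay matching condition can be illustrated with a first-order estimate. Assuming, for simplicity, that both modes share the same chromatic dispersion coefficient \(D\) and that the group delay per unit length varies linearly with wavelength, a differential group delay between the two modes is compensated at a wavelength separation equal to the DGD divided by \(D\). The sketch below evaluates this relation; the function name and the numerical values are hypothetical and are not taken from the experiments cited above.

```python
def matched_wavelength_offset_nm(dgd_ps_per_km: float, dispersion_ps_per_nm_km: float) -> float:
    """Wavelength separation at which two modes have equal group delay.

    First-order model (an assumption): tau_mode(lambda) = tau_mode(lambda_0) + D * (lambda - lambda_0),
    with the same D for both modes.
    """
    return dgd_ps_per_km / dispersion_ps_per_nm_km


# Illustrative numbers only: 60 ps/km of DGD and D = 20 ps/(nm km)
print(matched_wavelength_offset_nm(60.0, 20.0))  # 3.0 (nm)
```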

As for the modeling of nonlinear propagation in few-mode fibers, we note that a similar simplification of the coupled NLSEs as in the case of coupled-core multicore fibers is obtained by taking into account the fact that modes belonging to the same group of quasi-degenerate modes mix strongly during propagation. The result is a set of coupled multicomponent Manakov equations, which in the case of two mode groups denoted a and b can be expressed in the following form,

$$\begin{aligned} & \begin{aligned}\displaystyle\frac{\partial\boldsymbol{E}_{\mathrm{a}}}{\partial z}&\displaystyle=-\beta_{1,\mathrm{a}}\frac{\partial\boldsymbol{E}_{\mathrm{a}}}{\partial t}-\mathrm{j}\frac{\beta_{2,\mathrm{a}}}{2}\frac{\partial^{2}\boldsymbol{E}_{\mathrm{a}}}{\partial t^{2}}\\ \displaystyle&\displaystyle\quad+\mathrm{j}\gamma\left(\kappa_{\mathrm{a}}|\boldsymbol{E}_{\mathrm{a}}|^{2}+\kappa_{\mathrm{ab}}|\boldsymbol{E}_{\mathrm{b}}|^{2}\right)\boldsymbol{E}_{\mathrm{a}}\;,\end{aligned}\end{aligned}$$
(10.99)
$$\begin{aligned} & \begin{aligned}\displaystyle\frac{\partial\boldsymbol{E}_{\mathrm{b}}}{\partial z}&\displaystyle=-\beta_{1,\mathrm{b}}\frac{\partial\boldsymbol{E}_{\mathrm{b}}}{\partial t}-\mathrm{j}\frac{\beta_{2,\mathrm{b}}}{2}\frac{\partial^{2}\boldsymbol{E}_{\mathrm{b}}}{\partial t^{2}}\\ \displaystyle&\displaystyle\quad+\mathrm{j}\gamma\left(\kappa_{\mathrm{ab}}|\boldsymbol{E}_{\mathrm{a}}|^{2}+\kappa_{\mathrm{b}}|\boldsymbol{E}_{\mathrm{b}}|^{2}\right)\boldsymbol{E}_{\mathrm{b}}\;,\end{aligned}\end{aligned}$$
(10.100)

where \(\boldsymbol{E}_{\mathrm{a}}\) and \(\boldsymbol{E}_{\mathrm{b}}\) are state vectors of dimensions \(2N_{\mathrm{a}}\) and \(2N_{\mathrm{b}}\), respectively, which describe the electric field in the two mode groups. The coupled Manakov equations, derived in [10.150] for arbitrary values of \(N_{\mathrm{a}}\) and \(N_{\mathrm{b}}\), and in [10.137] for \(N_{\mathrm{a}}=N_{\mathrm{b}}=1\), can be used for the analytical study of intergroup nonlinear effects [10.143] under the assumption of negligible linear coupling between mode groups. However, if the propagating groups couple to a non-negligible extent, (10.99) and (10.100) must be supplemented with additional terms that account for linear intermodal crosstalk, which reduces their analytical tractability significantly [10.151, 10.152, 10.153, 10.66, 10.95].
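For readers who wish to experiment numerically with (10.99) and (10.100), the following is a minimal sketch of a symmetric split-step Fourier integrator, a standard numerical technique that is not described in this chapter. It neglects loss and linear coupling between the groups, exactly as the equations above do. The function names, the parameter dictionary, and all numerical values are hypothetical and given in arbitrary units.

```python
import numpy as np


def linear_step(E, beta1, beta2, omega, h):
    # With numpy's FFT sign convention, d/dt becomes multiplication by j*omega, so
    # -beta1*d/dt - j*(beta2/2)*d^2/dt^2 becomes -j*beta1*omega + j*(beta2/2)*omega^2.
    H = np.exp((-1j * beta1 * omega + 0.5j * beta2 * omega**2) * h)
    return np.fft.ifft(np.fft.fft(E, axis=1) * H, axis=1)


def coupled_manakov_step(Ea, Eb, dt, dz, p):
    """One symmetric split step; Ea and Eb have shapes (2*Na, T) and (2*Nb, T)."""
    omega = 2 * np.pi * np.fft.fftfreq(Ea.shape[1], d=dt)
    Ea = linear_step(Ea, p["beta1_a"], p["beta2_a"], omega, dz / 2)
    Eb = linear_step(Eb, p["beta1_b"], p["beta2_b"], omega, dz / 2)
    # The nonlinear step depends only on the total power in each group and is
    # a pure phase rotation, so it can be applied exactly.
    Pa = np.sum(np.abs(Ea) ** 2, axis=0)
    Pb = np.sum(np.abs(Eb) ** 2, axis=0)
    Ea = Ea * np.exp(1j * p["gamma"] * (p["kappa_a"] * Pa + p["kappa_ab"] * Pb) * dz)
    Eb = Eb * np.exp(1j * p["gamma"] * (p["kappa_ab"] * Pa + p["kappa_b"] * Pb) * dz)
    Ea = linear_step(Ea, p["beta1_a"], p["beta2_a"], omega, dz / 2)
    Eb = linear_step(Eb, p["beta1_b"], p["beta2_b"], omega, dz / 2)
    return Ea, Eb


# Hypothetical example: group a with N_a = 1 and group b with N_b = 2
rng = np.random.default_rng(1)
T, dt, dz = 256, 1.0, 0.1
Ea = rng.standard_normal((2, T)) + 1j * rng.standard_normal((2, T))
Eb = rng.standard_normal((4, T)) + 1j * rng.standard_normal((4, T))
params = dict(beta1_a=0.0, beta1_b=0.5, beta2_a=-1.0, beta2_b=-1.2,
              gamma=0.01, kappa_a=0.9, kappa_b=0.6, kappa_ab=0.5)

energy_in = np.array([np.sum(np.abs(Ea) ** 2), np.sum(np.abs(Eb) ** 2)])
for _ in range(50):
    Ea, Eb = coupled_manakov_step(Ea, Eb, dt, dz, params)
energy_out = np.array([np.sum(np.abs(Ea) ** 2), np.sum(np.abs(Eb) ** 2)])
print(energy_in, energy_out)  # the energy of each group is conserved
```

Because loss and intergroup linear coupling are neglected, the energy in each mode group is conserved separately, which provides a simple sanity check of the integrator.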

8 Routing in SDM Networks

The signals transmitted over an SDM link are typically associated with a spatial channel \(s_{n}\) and a wavelength channel \(\lambda_{m}\), where the indices \(n\) and \(m\) identify the respective spatial and wavelength channel. Note that spatial channels are defined either as modes of an optical waveguide, as physically separated light-paths using multiple waveguides, or as a combination of the two.

In conventional wavelength-multiplexed networks, wavelength is used to optically route signals when traversing a network node. For each wavelength of an ingress fiber it is possible to select an egress fiber, as long as the wavelength channel of the egress fiber has not been assigned to another incoming signal at the same wavelength. This limitation is referred to as wavelength blocking and makes the initial network configuration and the subsequent channel provisioning (adding new channel routes in a live network) mathematically more complex, increases the blocking probability, and therefore reduces the capacity of the network [10.154].
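Wavelength blocking can be illustrated with a toy model of a node that keeps track of which wavelengths are already in use on each egress fiber. This is only a sketch of the constraint described above, not a model of any real switch; the class and method names are hypothetical.

```python
class WavelengthRoutedNode:
    """Toy node: a wavelength can be used at most once per egress fiber."""

    def __init__(self, n_egress_fibers: int):
        self.in_use = [set() for _ in range(n_egress_fibers)]

    def route(self, wavelength: int, egress: int) -> bool:
        """Try to route a signal; return False if the request is blocked."""
        if wavelength in self.in_use[egress]:
            return False  # wavelength blocking
        self.in_use[egress].add(wavelength)
        return True


node = WavelengthRoutedNode(n_egress_fibers=2)
first = node.route(wavelength=5, egress=0)   # accepted
second = node.route(wavelength=5, egress=0)  # blocked: wavelength already assigned on this fiber
third = node.route(wavelength=5, egress=1)   # accepted on the other egress fiber
print(first, second, third)  # True False True
```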

For spatial channels similar limitations may occur, for example, when spatial channels are implemented by using distinct spatial modes. In contrast, if the spatial modes are carried by spatially separated waveguides, or separated by a spatial mode multiplexer, the individual spatial channels are all equivalent and can be switched between each other with no restriction.

In the general case, optical networks can be built based on nodes that are capable of switching any wavelength from any spatial channel coming from any direction, to any wavelength and to any spatial channel going to any direction (here we define directions as geographically separate routes and spatial channels as parallel-running channels, either in a single fiber like a multicore fiber, or in multiple single-mode fibers hosted in a single conduit or cable). The complexity of such a node in terms of physical implementation and dynamic operation (traffic provisioning) is undesirably large, and such a node is not cost-effective. It is therefore necessary to limit the complexity by forming logical units of switching, to reduce the logical channel number to around 100 channels for each direction. This can be achieved in various ways [10.155], and in the next sections the three most promising approaches to the bundling of wavelength/spatial channels are reviewed.

8.1 Parallel Single-Mode Systems

The first approach consists of duplicating conventional single-mode WDM systems and operating them in parallel. This approach is not cost-effective, but represents the baseline against which alternative approaches are to be compared. The most relevant limitation of this approach is that it is not possible to share resources between the duplicated systems, which may be responsible for significant blocking probability and under-utilization of resources.

8.2 Spatial Superchannels

In this approach the spatial channels are bundled together in a fixed number \(N\), and components are used that can perform the equivalent single-mode operation on all \(N\) channels at the same time. The term spatial superchannel was coined [10.156] in reference to spectral superchannels, where multiple adjacent wavelength channels are bundled to form a single spectrally wider transmission channel. Spatial superchannels look similar to a traditional single-mode system in terms of operation, except that the capacity is increased by a factor of \(N\). This concept is particularly attractive because wavelength-selective switches supporting spatial superchannels can be implemented using the joint switching architecture, where a single switching element can be reused to switch \(N\) channels in parallel, therefore effectively increasing the switch capacity of the switching element. This principle is shown in Fig. 10.17a,b, where a tilt mirror is used to switch light between one input and two output superchannels, by tilting the mirror such that the light is reflected from the input to the desired output.

Fig. 10.17a,b

Switching multiple spatial channels with a single switching element: Groupwise switching versus interleaved switching geometry

Figure 10.17a shows a groupwise switching arrangement, which requires an \(N\)-times larger tilt angle compared to a traditional single-mode switch, whereas Fig. 10.17b shows the advantageous interleaved switching arrangement, where the superchannels can be switched by using tilt angles that are comparable to the single-mode switch. Note that the overall switch size of a spatial superchannel switch is larger than the single-mode counterpart, as it needs to accommodate the larger required optical aperture and a more complex lens design. However, the switch element, which is often the main factor limiting the number of channels that can be switched, can stay the same size. Figure 10.17a,b shows the principle of a simple switch, but the same idea can also be used to build wavelength-selective switches, where light is separated in wavelength in the out-of-plane direction and the single mirror is replaced with a mirror array [10.155].
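The logical behavior of joint switching can be captured in a small toy model: the switch holds one setting per wavelength, and that single setting steers all \(N\) spatial channels of the superchannel together. The sketch below assumes, for illustration, that the spatial index of a signal is preserved between input and output; the class name, method names, and numbers are hypothetical and do not describe a specific device.

```python
class JointSwitchWSS:
    """Toy wavelength-selective switch with one steering element per wavelength."""

    def __init__(self, n_spatial: int, n_wavelengths: int, n_outputs: int):
        self.n_spatial = n_spatial
        self.n_outputs = n_outputs
        self.setting = [0] * n_wavelengths  # output superchannel chosen per wavelength

    def set_output(self, wavelength: int, output: int) -> None:
        if not 0 <= output < self.n_outputs:
            raise ValueError("unknown output superchannel")
        self.setting[wavelength] = output

    def route(self, spatial: int, wavelength: int) -> tuple[int, int]:
        """Return (output superchannel, spatial index) for an incoming signal."""
        return self.setting[wavelength], spatial

    def switching_elements(self) -> int:
        return len(self.setting)  # independent of the number of spatial channels


wss = JointSwitchWSS(n_spatial=7, n_wavelengths=96, n_outputs=2)
wss.set_output(wavelength=10, output=1)
print([wss.route(s, 10) for s in range(7)])  # all seven spatial channels go to output 1
print(wss.switching_elements())              # 96 elements, versus 7 * 96 if switched independently
```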

Optical networks based on the concept of spatial superchannels can be implemented with multicore fibers, where the number \(N\) of parallel channels coincides with the number of cores. The spatial superchannel architecture is of interest also for multimode fibers, where it is required to transmit all fiber modes in a common link, so that MIMO processing can be used to compensate for propagation-induced mode coupling.

Additionally, a spatial superchannel can also be used to logically bundle multiple single-mode fibers and therefore constitutes a very promising architecture for all possible SDM fiber types.

The main drawback of the spatial superchannel architecture is that there is no simple way to increase the number of parallel spatial channels \(N\) composing the superchannel once the network is deployed and operated.

8.3 Space-Routed Networks

An alternative way to build SDM optical networks is to completely drop the wavelength dimension for the switching domain and utilize pure spatial switching based on traditional switches that are wavelength transparent. This solution offers several advantages: All channels are equivalent and therefore no wavelength blocking is observed, which can dramatically simplify the network reconfiguration. Also, space switches are much easier to build and typically have lower loss compared to wavelength-selective switches. Furthermore, the local add/drop ports of the nodes become significantly simpler, as they are equivalent to ports carrying traffic from fibers coming from different directions.

The disadvantage of this solution is that it requires fibers with completely uncoupled spatial channels, and therefore it is not compatible with MIMO-based multimode transmission. The solution is therefore particularly attractive for single-mode and uncoupled multicore fibers. Space-routed networks also require transceivers capable of generating signals that occupy the whole transmission band. Such full-band transceivers are expected to be more economical, as they offer a larger potential for integration. However, the network can suffer from granularity issues if the desired link capacity is small compared to the capacity of the full-band transceiver.
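The granularity issue can be made concrete with a small calculation: since capacity can only be provisioned in units of one full-band transceiver, a small demand leaves most of that unit unused. The sketch below computes the number of transceivers needed and the resulting fill ratio; the function name and the capacities used are hypothetical, illustrative values.

```python
import math


def full_band_provisioning(demand_tbps: float, transceiver_capacity_tbps: float) -> tuple[int, float]:
    """Return (number of full-band transceivers, fraction of provisioned capacity used)."""
    n_transceivers = max(1, math.ceil(demand_tbps / transceiver_capacity_tbps))
    fill = demand_tbps / (n_transceivers * transceiver_capacity_tbps)
    return n_transceivers, fill


# Illustrative numbers only: a 5 Tb/s demand served by a 20 Tb/s full-band transceiver
print(full_band_provisioning(5.0, 20.0))  # (1, 0.25)
```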

9 Conclusion

Space-division multiplexing addresses the technologies needed to scale the link and network capacities of current optical communication systems. The main proposed solutions include new fiber types, optical amplifiers, and optical switches.

As an alternative to standard single-mode fibers, multimode fibers and multicore fibers offer effective ways to increase the spatial multiplicity of optical fibers, at the expense of more complex linear and nonlinear transmission effects that we reviewed in detail in the chapter. The linear transmission effects in a system supporting \(N\) spatial modes can be mitigated by using \(2N\,{\times}\,2N\) MIMO digital signal processing, which is a generalization of the \(2\,{\times}\,2\) MIMO processing used in conventional single-mode digital coherent transmission. Nonlinear impairments are typically moderately increased compared to the single-mode case, when the spatial modes are weakly coupled, whereas a reduction of nonlinear effects can be observed in the regime of strong mode mixing, like for example in the case of coupled-core multicore fibers.

Cladding- and core-pumped optical multicore and multimode amplifiers offer a sizeable potential for cost reduction by significantly reducing the number of required optical elements per amplified spatial channel.

Optical switches supporting multiple modes or spatial channels can be effectively implemented by using joint-switching architectures, which dramatically increase the switching capacity of the switching element by acting on all spatial channels at the same time.

Space-division multiplexing also enables multiple new network architectures. Even though no single one-size-fits-all SDM architecture is currently emerging, the technologies that are being investigated have the potential to offer a significant cost advantage over parallelizing conventional single-mode fiber-based systems. The optimum solution will depend on the targeted application and in particular on the required link capacity and network granularity.