Introduction

Optics is a well-researched discipline, explored for more than two centuries by a galaxy of scientific luminaries from all parts of the globe and taught by renowned scientists in prestigious schools all over the world. The topic of this paper, the “quantum theory of paraxial ray optics according to Dietrich Marcuse” [1, 2] is, therefore, somewhat of an outlier for being virtually unknown and ignored in the vast optics literature. Its present re-appraisal is warranted because of its conceptual connection to a new class of interferometers with certain unique characteristics. Although invented in 1995 [3], the significance of this class of interferometers has become apparent only recently in the light of new developments [4] in plasma diagnostics applied to the Dense Plasma Focus (DPF) [5,6,7].

The DPF is a well-known scalable laboratory plasma fusion device that produces an intense burst of fusion neutrons when operated with deuterium or deuterium–tritium. It is useful for high-fidelity neutron radiography of fast events [8], such as sub-critical implosions studied in nuclear weapons research. Its use with other gas mixtures has been suggested for producing short-lived radionuclides suitable for medical imaging [9, 10], for nanotechnology [11, 12] and as an intense source of fast ions [13].

The recent developments refer to what appear to be sub-millimetre-scale filamentary structures embedded within a plasma a few millimetres in size, revealed in differential interferometry (also known as lateral shearing interferometry) images [4]. They are observed to be oriented along axial as well as azimuthal directions. While such filaments have long been observed [7] in visible, soft-x-ray and extreme ultra-violet imaging in DPF operation with an admixture of heavy gases in deuterium, their observation using refractivity-based diagnostics in pure deuterium operation confirms them to be associated with spatial modulation of density. This has a special significance.

Spatial gradients of density are likely to be associated with those of pressure, in view of the fact that heat flow would tend to smooth out temperature gradients over such small dimensions. Spatial gradients of pressure cannot be sustained over time scales necessary for observation unless they are counterbalanced by equal measures of complementary spatially modulated magnetic force density. This implies that the current in the plasma flows along special pathways of sub-millimetre spatial width.

This has significant implications for understanding the complex physics of the DPF [7]. It suggests that the current that sustains the plasma flows, at least partially, in a network of axially and azimuthally oriented channels analogous to the vascular network of the circulatory system in a complex living organism such as an animal or a tree. This is an altogether new paradigm. Sooner or later, this aspect will demand a diagnostic technique that can visualize and make measurements on such a network of sub-millimetre-scale filamentary structures using optical techniques that are sensitive to spatial modulation of plasma refractivity.

This paper is motivated partly by the realization that conventional differential interferometry suffers from an inherent handicap (discussed later) that would limit its suitability for deeper investigations into this aspect [4] and that a far better alternative exists but is not widely known in the optics and plasma diagnostics community. This, however, requires appreciation of some significant insights from Dietrich Marcuse [1, 2], which explains the title.

The “quantum theory of paraxial ray optics according to Dietrich Marcuse” (DM) [1, 2] has no overt relation with the quantum theory of light. Rather, it is a demonstration of a formal similarity between the transition from classical to quantum mechanics of point particles on the one hand and recovery of wave optics from ray optics on the other. Ray optics can be derived [14, 15] from Fermat’s Principle of Least Time (PLT) and classical mechanics can be derived from Hamilton’s Principle of Least Action (PLA) [16]. DM demonstrates [1] that PLT can be cast in a form analogous to PLA.

This transition is made possible by a change of variables. The variable ‘time’ is replaced by the coordinate z along the principal optic axis – it might be thought of as ‘generalized time’. The coordinate space is reduced to the two dimensions (x,y) on the wavefront – a constant-phase surface. The ‘generalized velocities’ are therefore replaced with \(x^{\prime} = dx/dz \equiv \delta_{x} \left( {x,y} \right)\); \(y^{\prime} = dy/dz \equiv \delta_{y} \left( {x,y} \right)\). These are the inclination angles (in radians) of a ray located at the point (x,y) on the wavefront with respect to the optic axis. The “Lagrangian” turns out [1, 2, 17] to be the space-dependent refractive index of the medium multiplied by the derivative of the path length with respect to z. The Euler equations of the variational problem [1, 2, 16, 17] turn out to be equivalent to the ray equations of geometrical optics.

Using this Lagrangian, ‘generalized momenta’ are defined as variables conjugate to the coordinates and the Hamiltonian functional [16] is related to the Lagrangian by a Legendre transformation with ‘generalized momenta’ becoming the new independent variables in place of ‘generalized velocities’. Expressed in terms of the generalized momenta, the Hamiltonian of ray optics [1, 2, 17] has an algebraic form similar to the relativistic Dirac Hamiltonian of point particles. DM then constructs the corresponding “non-relativistic” limit and shows that it corresponds to the paraxial approximation of ray optics. The Hamilton–Jacobi equations corresponding to this “non-relativistic” Hamiltonian turn out [1, 2, 17] to be equivalent to the eikonal equation of ray optics [14, 15, 17].

The following cautionary quote from DM [1] needs a special mention: “The generalized momentum of the ray must not be confused with photon momentum. The two have nothing in common. The photon momentum describes actual mechanical momentum that is carried by the photon, while the generalized momentum of the light ray was derived purely formally from the Hamiltonian formalism of ray optics”. This disclaimer is revisited later in the paper.

Having established the correspondence between ray optics and the classical mechanics of point particles, DM posits that just as classical mechanics is a limiting case of quantum mechanics as the Planck constant tends to zero, ray optics is a limiting case of wave optics as the wavelength tends to zero. Therefore, just as quantum mechanics was historically “derived” from classical mechanics by replacing conjugate momenta with operators, it should be possible to re-derive the wave equation from ray optics treating the conjugate momenta of optical rays as operators. This is the basic idea of the “quantum theory of paraxial ray optics according to Dietrich Marcuse”.

In this theory, the equivalent of the relativistic Klein–Gordon (K–G) equation [18, 19] turns out [1, 2] to be identical with the scalar wave equation of optics [14, 15] with Planck’s constant replaced by the wavelength. This “discovery of the equivalence of wave optics with the quantum theory of rays” [1] enables him to “immediately use all the well-known results of quantum mechanics and apply them to the quantum theory of ray optics”. He derives [1] the Schrödinger equation as the “paraxial approximation for the reduced wave equation”. “The wave function ψ of the quantum mechanics of rays is the usual scalar wave function of wave optics. In our present theory, it assumes the additional interpretation as a probability amplitude”. “The wave function describes the probability of finding a ray in a field of light rays whose statistical state is characterized by ψ”. From this, the notion of expectation values for measurements of coordinates and momenta of rays can be defined. This leads to the “uncertainty principle of ray optics”, which states that the product of the uncertainties in the simultaneous measurement of coordinates and momenta of rays must be greater than a lower bound related to the wavelength.

Operation of DM’s “uncertainty principle of ray optics” is most readily appreciated in the example of differential interferometry that was used [4] for detecting the presence of filamentary structures in the plasma focus. Differential interferometry works by splitting a collimated light beam emerging from a phase object (such as a network of sub-millimetre-size filaments within a few millimetre-size plasma) into two laterally displaced but overlapping beams. Therefore, every object point on the nearly planar test wavefront forms two image points, shifted by \(\pm \Delta x\) with respect to the optic axis. Interference fringes are formed in the overlap region with an intensity modulation in the x direction – straight fringes aligned in the y direction. All rays on the wavefront are thus deliberately deflected in the ± x-directions and the additional deflection caused by the phase object results in a pattern of deviation of the fringes. This deviation pattern is a measure of the pattern of local inclination of rays in the wavefront. But the x-coordinate of the deviated ray on the phase object is known only to within an error of \(\pm \Delta x\). The measurement of ray deviations imposed by the phase object has an inbuilt scale related to the ray deviations imposed by the interferometer and is therefore intrinsically linked with this spatial error.

The differential interferometer experiment [4] that apparently suggests the existence of a network of filament-like structures in the dense plasma focus cannot make credible measurements on the spatial features of these filamentary structures because the lateral displacement of interfering beams would be more than the diameter of the filaments. Information about the spatial structure of the filaments would thus be smeared away. This is a fundamental handicap of the differential interferometry technique that has its origin in the “uncertainty principle of ray optics according to Dietrich Marcuse”.

Many researchers have alluded to an uncertainty principle of optics applied to interferometry [20,21,22,23,24]. Their concern, however, is about the implications of Heisenberg’s uncertainty principle for microtopography using multiple beam interference. DM’s use of the “quantum theory of rays” in deriving the “uncertainty principle of ray optics” is clearly conceptually unrelated to their work. Hence the present emphasis on “uncertainty principle of ray optics according to Dietrich Marcuse”.

This paper recapitulates development of the quantum theory of paraxial ray optics as reported by DM [1] and asks two further questions: Would it be possible to introduce a geometric phase [25,26,27,28,29] into this “quantum theory of ray optics” by adiabatically transporting the “quantum state of the ray” around a cyclic path in some parameter space? If so, what would be its consequences?

The answer to the first question turns out to be affirmative. A new class of interferometers is revealed as the answer to the second question. These interferometers compare the original test wavefront with its copy that has undergone an adiabatic cyclic transport around a closed planar curve. They circumvent, but do not violate, the “uncertainty principle of ray optics according to Dietrich Marcuse”. The comparison of a wavefront with its cyclically transported copy does not amount to a simultaneous measurement. It involves the additional assumption that the wavefront does not change appreciably over the time required for cyclic transport. This condition may be satisfied in many practical situations, where these interferometers potentially represent a practical solution to a measurement problem. Relaxation of the requirement of simultaneous measurement for slowly evolving phase objects removes the limitations placed by the “uncertainty principle of ray optics according to Dietrich Marcuse” by introducing an additional controllable parameter in the proportionality between the fringe shift and the ray deflection, while maintaining a one-to-one correspondence between points on the phase object and those on the interferogram.

This additional parameter is related to the length of the cyclic path traversed by the “quantum state of the ray”. Skew rays travel a different distance around the cyclic path as compared to the parallel rays. The cyclic path brings the “quantum state of the ray” (almost) to its starting point. The image of a point on the wavefront is thus another single point – not a pair of image points as in lateral shearing interferometry that adheres to the requirement of simultaneous measurements. The sensitivity for detection of a fainter phase object, such as a smaller ray deflection caused by modulation of plasma density by a finer filament, can be increased by increasing the length of the cyclic path, provided that imperfections in the light source, the incident beam optics, the optical elements of the interferometer and their mutual alignment do not introduce comparable errors or artifacts and the light source and the phase object do not undergo an appreciable change in the time required for cyclic transport. They may thus represent a way forward in developing plasma diagnostic techniques for studying the filamentary structure of dense magnetized plasmas with adequate detection threshold for small sized filaments without compromising on the spatial resolution. This new class of interferometers may have other applications as well, but that aspect is incidental to the theme of this paper and is briefly commented upon later in the paper.

This paper is organized into the following seven sections: (1) Introduction; (2) Revisiting “quantum theory of paraxial ray optics according to Dietrich Marcuse”; (3) Geometric phase in the “quantum theory of paraxial ray optics”; (4) Three examples of a new class of interferometers; (5) Experimental demonstration; (6) Brief comments on potential applications; (7) Summary and conclusions.

Revisiting “quantum theory of paraxial ray optics according to Dietrich Marcuse”

The purpose of this section is to recapitulate the essential logic underlying the “quantum theory of paraxial ray optics according to Dietrich Marcuse”, avoiding the many digressions in his treatise related to the broader scope of his work. Its main motivation is to demonstrate that it is distinct from the quantum theory of light and yet both share an underlying common structure. Its intended takeaway is the concept of the “quantum state of an optical ray” – a notion that is found in neither classical nor modern treatises on optics and which has no relation with the quantum state of a photon, which is related to its polarization. This notion is shown to be a key logical link between ray optics and a new class of interferometers.

An important issue concerns the necessity for the quantum theory of physics. This necessity was rooted in several experiments in physics during the early part of the twentieth century [18, 19]. The experimental discovery of particle-like properties of electromagnetic waves and wave-like properties of particles [18, 19] needed to be related to experimentally well-established theories of classical electromagnetism and classical mechanics. That need was the driving motivation behind the development of the quantum theory of physics. The “quantum theory of paraxial ray optics according to Dietrich Marcuse” differs from the quantum theory of physics in this respect. There is no experiment in optics that requires the “quantum theory of paraxial ray optics according to Dietrich Marcuse” to be formulated in a manner similar to quantum physics.

The justification for the “quantum theory of paraxial ray optics according to Dietrich Marcuse” lies in certain inferences that are more easily accessible through an analogy with quantum mechanics rather than directly from the electromagnetic theory of light. These inferences serve as the optics designer’s tool box (as DM demonstrates [1]) rather than a contribution to the elucidation of optics by itself. In this sense, the “quantum theory of paraxial ray optics according to Dietrich Marcuse” is a desirable, but not necessary, exercise – more of a matter of choice. That is perhaps the reason why it is seldom referred to in optics literature.

The argument begins with Fermat’s Principle of Least Time (PLT), which states that in going from point A to point B, light follows a path that has the least time of propagation. This can be cast as a variational problem in terms of the phase velocity \(v_{ph} \equiv c/n\left( {x,y,z} \right)\) of light in a spatially non-uniform medium of refractive index \(n\left( {x,y,z} \right)\) and the element of path length \(ds = \sqrt {dx^{2} + dy^{2} + dz^{2} } = dz\sqrt {\left( {dx/dz} \right)^{2} + \left( {dy/dz} \right)^{2} + 1}\):

$$\delta c^{ - 1} \int\limits_{A}^{B} {n\left( {x,y,z} \right)} ds = \delta c^{ - 1} \int\limits_{A}^{B} {n\left( {x,y,z} \right)} dz\sqrt {x^{{\prime}{2}} + y^{{\prime}{2}} + 1} = 0$$
(1)

Renaming coordinate z along the optic axis as ‘generalized time’, \(x^{\prime} \equiv dx/dz\) and \(y^{\prime} \equiv dy/dz\) as ‘generalized velocities’, (1) can be formally written as

$$\delta \int\limits_{A}^{B} {L\left( {x,y,x^{\prime},y^{\prime},z} \right)dz} = 0$$
(2)

This is formally identical with the variational principle of Hamiltonian mechanics known as the Principle of Least Action, where the integrand L in (2) is known as the Lagrangian in classical mechanics and, for the ray optics case described by (1), is given by

$$L\left( {x,y,x^{\prime},y^{\prime},z} \right) = n\left( {x,y,z} \right)\sqrt {x^{{\prime}{2}} + y^{{\prime}{2}} + 1}$$
(3)

The solution of the variational problem is well known [16]:

$$\frac{d}{dz}\frac{\partial L}{{\partial x^{\prime}}} - \frac{\partial L}{{\partial x}} = 0;\,\frac{d}{dz}\frac{\partial L}{{\partial y^{\prime}}} - \frac{\partial L}{{\partial y}} = 0$$
(4)

Introducing ‘generalized momenta’ as variables canonically conjugate [16] to coordinates x and y as

$$p_{x} = \frac{\partial }{{\partial x^{\prime}}}L\left( {x,y,x^{\prime},y^{\prime},z} \right);\,p_{y} = \frac{\partial }{{\partial y^{\prime}}}L\left( {x,y,x^{\prime},y^{\prime},z} \right)$$
(5)

a new functional of coordinates and canonical momenta, the Hamiltonian, is defined by the Legendre transformation

$$H\left( {x,y,p_{x} ,p_{y} } \right) \equiv p_{x} x^{\prime} + p_{y} y^{\prime} - L\left( {x,y,x^{\prime},y^{\prime},z} \right)$$
(6)

Using (3) and (5) one can solve for the generalized velocities \(x^{\prime},y^{\prime}\) and obtain

$$x^{\prime} = \frac{{p_{x} }}{{\sqrt {n^{2} - p_{x}^{2} - p_{y}^{2} } }};\,y^{\prime} = \frac{{p_{y} }}{{\sqrt {n^{2} - p_{x}^{2} - p_{y}^{2} } }}$$
(7)
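
As a brief check of (7), the intermediate step can be written out explicitly. This is a sketch that assumes only the Lagrangian (3) and the definition (5) of the generalized momenta:

$$p_{x} = \frac{{n\,x^{\prime}}}{{\sqrt {x^{{\prime}{2}} + y^{{\prime}{2}} + 1} }};\,p_{y} = \frac{{n\,y^{\prime}}}{{\sqrt {x^{{\prime}{2}} + y^{{\prime}{2}} + 1} }}\; \Rightarrow \;n^{2} - p_{x}^{2} - p_{y}^{2} = \frac{{n^{2} }}{{x^{{\prime}{2}} + y^{{\prime}{2}} + 1}}$$

so that \(\sqrt {x^{{\prime}{2}} + y^{{\prime}{2}} + 1} = n/\sqrt {n^{2} - p_{x}^{2} - p_{y}^{2} }\), and multiplying \(p_{x} /n\) and \(p_{y} /n\) by this factor gives the two expressions in (7).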

Substituting in (6), the functional form of the Hamiltonian is obtained as

$$H\left( {x,y,p_{x} ,p_{y} } \right) = - \sqrt {n^{2} - p_{x}^{2} - p_{y}^{2} }$$
(8)
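
The substitution leading to (8) can be sketched as follows, assuming the velocities (7) are inserted into the Legendre transformation (6) and the Lagrangian (3) is evaluated on them, which gives \(L = n^{2} /\sqrt {n^{2} - p_{x}^{2} - p_{y}^{2} }\):

$$H = \frac{{p_{x}^{2} + p_{y}^{2} }}{{\sqrt {n^{2} - p_{x}^{2} - p_{y}^{2} } }} - \frac{{n^{2} }}{{\sqrt {n^{2} - p_{x}^{2} - p_{y}^{2} } }} = - \sqrt {n^{2} - p_{x}^{2} - p_{y}^{2} }$$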

DM observes [1] that this “Hamiltonian of ray optics has some resemblance to the relativistic energy of a point particle with rest mass m0”:

$$E = c\sqrt {m_{0}^{2} c^{2} + p_{x}^{2} + p_{y}^{2} + p_{z}^{2} }$$
(9)

A short mathematical exercise can be used to derive the Hamilton–Jacobi equations using Eq. (2) which can be shown to be equivalent to the eikonal equation [14, 15, 17] of geometrical optics – an exercise that, however, is a digression from this discussion and will not be pursued further.

A paraxial approximation to Hamiltonian optics can be derived from the observation that for rays that are nearly parallel to the optic axis, \(x^{\prime} \ll 1;y^{\prime} \ll 1\). Equations (5) then imply that \(p_{x} \ll n;p_{y} \ll n\). Paraxial approximation also implies that the space dependent part of the refractive index is small compared with its spatially uniform part:

$$n\left( {x,y,z} \right) \approx n_{0} - \Delta n\left( {x,y,z} \right);\left| {\Delta n} \right|_{\max } \ll n_{0}$$
(10)

For the paraxial approximation, (8) can be approximated as

$$H\left( {x,y,p_{x} ,p_{y} } \right) \approx \frac{{p_{x}^{2} + p_{y}^{2} }}{{2n_{0} }} + \Delta n\left( {x,y,z} \right) - n_{0}$$
(11)
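
The approximation in (11) can be made explicit with a first-order binomial expansion. The sketch below assumes \(p_{x}^{2} + p_{y}^{2} \ll n^{2}\) and the index decomposition (10), and drops terms of order \(\left( {p_{x}^{2} + p_{y}^{2} } \right)^{2} /n^{3}\) and \(\left( {p_{x}^{2} + p_{y}^{2} } \right)\Delta n/n_{0}^{2}\):

$$- \sqrt {n^{2} - p_{x}^{2} - p_{y}^{2} } = - n\sqrt {1 - \frac{{p_{x}^{2} + p_{y}^{2} }}{{n^{2} }}} \approx - n + \frac{{p_{x}^{2} + p_{y}^{2} }}{2n} \approx \frac{{p_{x}^{2} + p_{y}^{2} }}{{2n_{0} }} + \Delta n\left( {x,y,z} \right) - n_{0}$$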

DM observes [1]: “The paraxial Hamiltonian of geometrical optics has a very close correspondence to the nonrelativistic Hamiltonian of the mechanics of point particles”:

$$H = \frac{{p_{x}^{2} + p_{y}^{2} + p_{z}^{2} }}{2m} + V$$
(12)

“The ray optics problem has one dimension less than does the corresponding problem of point particles. The particle potential V is replaced in a very logical way by the index of refraction of the optical medium. The difference in sign of the two potential terms is immaterial… The additive constant n0 has no physical significance, since any potential is determined only up to an arbitrary constant.”

It is a well-known result [1, 14, 15] that geometrical optics can be derived from the electromagnetic theory of light, or its scalar form used in wave optics, by neglecting terms proportional to the wavelength of light. Classical mechanics bears a similar relationship with quantum mechanics in the limit of Planck’s constant tending to zero. During the development of quantum mechanics, introduction of a wavefunction to represent de Broglie’s matter waves was a novelty; in the case of optics, the wave nature of light is already a well-established fact. Since quantum mechanics was historically “derived” from classical mechanics by replacing generalized momenta with differential operators that operated on the wavefunction [19], it should be possible to recover wave optics from ray optics using a similar procedure.

DM proceeds [1] on this program with the remarks “Quantization of a physical theory is accomplished by replacing all of the variables by operators. In wave mechanics, the coordinates retain their meaning as numbers but the canonically conjugate variables – the momenta become differential operators”.

$$p_{x} = - i\kappa \frac{\partial }{\partial x};\,p_{y} = - i\kappa \frac{\partial }{\partial y}$$
(13)

These operators act on the wavefunction. “The time coordinate of mechanics is now replaced by the z coordinate. The unit “time” in Planck’s constant must therefore also be replaced by a length coordinate…”. The “constant κ, which takes the place of \(\hbar\), must, instead, have the dimension of the Hamiltonian times length. The Hamiltonian.. is dimensionless, so that κ must have the dimension of length”. “The energy or Hamilton operator is expressed by a time derivative in ordinary quantum mechanics”. “…[U]sing again the correspondence between the time variable and the length variable z” leads to the relation

$$H = i\kappa \frac{\partial }{\partial z}$$
(14)

Squaring this operator equation and applying it to (8) leads to the equivalent of the Klein–Gordon equation [18, 19] in this quantum theory:

$$\frac{{\partial^{2} \psi }}{{\partial x^{2} }} + \frac{{\partial^{2} \psi }}{{\partial y^{2} }} + \frac{{\partial^{2} \psi }}{{\partial z^{2} }} + \frac{{n^{2} }}{{\kappa^{2} }}\psi = 0$$
(15)
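
The intermediate step can be sketched as follows, assuming the square of the Hamiltonian (8), \(H^{2} = n^{2} - p_{x}^{2} - p_{y}^{2}\), is applied to the wave function ψ with the operator replacements (13) and (14):

$$- \kappa^{2} \frac{{\partial^{2} \psi }}{{\partial z^{2} }} = n^{2} \psi + \kappa^{2} \left( {\frac{{\partial^{2} \psi }}{{\partial x^{2} }} + \frac{{\partial^{2} \psi }}{{\partial y^{2} }}} \right)$$

Dividing by \(\kappa^{2}\) and collecting all terms on one side gives \(\nabla^{2} \psi + \left( {n^{2} /\kappa^{2} } \right)\psi = 0\), which has the form of a reduced scalar wave equation with wavenumber \(n/\kappa\).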

Comparing (15) with the reduced scalar wave equation of wave optics [1, 14, 15] derived from electromagnetic theory of light leads to the result \(\kappa = \lambda_{0} /2\pi\), where \(\lambda_{0}\) is the free space wavelength of light.

The paraxial approximation of the reduced wave equation is obtained [1] using (13) and (14) with (11) and is shown to have an algebraic form identical with the Schrödinger equation. Then DM [1] makes a crucial observation: “The wave function ψ of the quantum mechanics of rays is the usual scalar wave function of wave optics. In our present theory, it assumes the additional interpretation as a probability amplitude. The wave function describes the state of a statistical ensemble of rays. Its absolute square value \(\overline{P} = \left| {\psi \left( {x,y,z} \right)} \right|^{2}\) assumes the meaning of the probability density for finding a light ray inside the unit area in the x, y plane at the position z”. “The probability interpretation works only for the paraxial ray theory. The wave function of the Klein–Gordon equation cannot be interpreted as a probability amplitude”.
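
For orientation, the paraxial equation can be written out explicitly. The form below is a sketch reconstructed here by inserting the operators (13) and (14) into the paraxial Hamiltonian (11); it is not a quotation from [1]:

$$i\kappa \frac{\partial \psi }{\partial z} = - \frac{{\kappa^{2} }}{{2n_{0} }}\left( {\frac{{\partial^{2} \psi }}{{\partial x^{2} }} + \frac{{\partial^{2} \psi }}{{\partial y^{2} }}} \right) + \left[ {\Delta n\left( {x,y,z} \right) - n_{0} } \right]\psi$$

Set beside the Schrödinger equation \(i\hbar \,\partial \psi /\partial t = - \left( {\hbar^{2} /2m} \right)\nabla^{2} \psi + V\psi\), the correspondence reads \(\hbar \to \kappa\), \(t \to z\), \(m \to n_{0}\) and \(V \to \Delta n - n_{0}\), in two transverse dimensions instead of three.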

This observation leads to [1] the derivation of “the uncertainty principle as it applies to our quantum theory of light rays”:

$$\Delta x\Delta p_{x} \ge \frac{{\lambda_{0} }}{4\pi };\,\Delta y\Delta p_{y} \ge \frac{{\lambda_{0} }}{4\pi }$$
(16)

where the uncertainties are defined using the probabilistic interpretation of the paraxial wave function:

$$\Delta x = \left[ {\left\langle {\left( {x - \left\langle x \right\rangle } \right)^{2} } \right\rangle } \right]^{1/2} ;\,\Delta p_{x} = \left[ {\left\langle {\left( {p_{x} - \left\langle {p_{x} } \right\rangle } \right)^{2} } \right\rangle } \right]^{1/2}$$
(17)
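
As an illustration of (16) and (17), consider a one-dimensional Gaussian paraxial wave function. The half-width w is a symbol introduced only for this example, and \(\kappa = \lambda_{0} /2\pi\) is assumed. Evaluating the expectation values with \(p_{x} = - i\kappa \,\partial /\partial x\) gives

$$\psi \left( x \right) \propto \exp \left( { - \frac{{x^{2} }}{{w^{2} }}} \right)\; \Rightarrow \;\Delta x = \frac{w}{2},\;\Delta p_{x} = \frac{\kappa }{w},\;\Delta x\Delta p_{x} = \frac{\kappa }{2} = \frac{{\lambda_{0} }}{4\pi }$$

In this case the bound is met with equality: narrowing the Gaussian in x widens its spread of ray momenta in inverse proportion.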

The implications of this uncertainty principle for the plasma diagnostics experiment [4] based on differential interferometry have already been mentioned in the Introduction.
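
To give a feel for the magnitudes involved in (16), the following minimal Python sketch evaluates the lower bound on the spread of ray momentum for a given positional spread. It assumes the paraxial free-space case, in which the ray momentum is approximately the ray inclination in radians. The function name `min_inclination_spread`, the 532 nm wavelength and the listed positional spreads are hypothetical values chosen only for illustration; they are not taken from [1] or [4].

```python
import math


def min_inclination_spread(wavelength_m, delta_x_m):
    """Smallest spread of p_x allowed by Delta_x * Delta_p_x >= lambda_0 / (4*pi).

    In the paraxial free-space case p_x is approximately the ray inclination
    in radians, so the result can be read as an angular spread.
    """
    if wavelength_m <= 0 or delta_x_m <= 0:
        raise ValueError("wavelength and positional spread must be positive")
    return wavelength_m / (4.0 * math.pi * delta_x_m)


# Illustrative values only (assumed, not from the paper).
wavelength = 532e-9  # metres
for delta_x in (1e-3, 1e-4, 1e-5):  # metres
    bound = min_inclination_spread(wavelength, delta_x)
    print(f"delta_x = {delta_x:.0e} m -> delta_p_x >= {bound:.3e} rad")
```

The bound scales inversely with the positional spread, so each tenfold reduction in the positional spread raises the minimum spread of ray inclination tenfold.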

One may legitimately inquire into the relation between the propagation vector \(\vec{k} \equiv \hat{x}k_{x} + \hat{y}k_{y} + \hat{z}\sqrt {k^{2} - k_{x}^{2} - k_{y}^{2} }\), \(k \equiv 2\pi /\lambda_{0}\) in the electromagnetic theory of light, the concept of ray in geometrical optics and the momentum in the “quantum theory of paraxial ray optics according to Dietrich Marcuse”. In the paraxial approximation in free space, they refer to the same aspect of light. For example, \(\delta_{x} \approx k_{x} /k \approx p_{x} ;\;\delta_{y} \approx k_{y} /k \approx p_{y}\). They differ only in their extrapolation to a wider context. Paraxial Hamiltonian optics is in fact widely used in the design of charged particle optics where the rays are actually classical trajectories of material particles. The mechanical momentum \(\hbar \vec{k}\) of the photon is proportional to the ray momentum in DM’s theory. The cautionary quote by DM referred to earlier therefore needs to be considered in the contemporary context of his treatise.

It would be imprudent to use the above remarks to dismiss DM’s theory as no different from the quantum theory of physics. The multi-fold similarity between the “quantum theory of paraxial ray optics according to Dietrich Marcuse” and the quantum theory of non-relativistic point particles suggests that a ray can reasonably be associated with a “quantum state” labelled by its coordinates and momenta at a particular instant of “generalized time” – a certain place on the optical path of a paraxial beam. The notion of “the quantum state of a ray” then serves as a bridge between two distinctly different disciplines with important practical consequences. For example, a curious mind might wonder whether such a quantum state would acquire a geometric phase [25,26,27,28,29] if it were to be “adiabatically transported” along a cyclic path in some parameter space following Berry’s logic [20]. Such transport would be an exercise in geometrical optics, not quantum mechanics. This question is considered in the next section.

Geometric phase in the “quantum theory of paraxial ray optics”

Geometric phase [25,26,27,28,29] pertains to an abstract geometrical aspect of physical systems, known as anholonomy, whereby a system traces a cyclic path enclosing an area (i.e. not retracing its path away from and back towards its initial state) in some parameter space and returns to an initial set of parameters, but nevertheless undergoes a distinct change that depends on the cyclic path in the parameter space. Several kinds of systems and parameter spaces exhibit [26] manifestations of the geometric phase. The parameter spaces [26] include the Poincaré sphere, space–time, real space, momentum space, 7-dimensional phase space (consisting of 4 dimensions of space–time and 3 of momentum space) in diverse fields such as quantum mechanics, classical mechanics, solid state physics, optics and even commonplace occurrences such as the tumbling cat [29, 30] and parallel parking of a car [29]. Berry [25] has led the efforts in understanding this phenomenon, which, for a quantum system, is a result of adiabaticity and anholonomy in the cyclic transport of a quantum state in some parameter space. Adiabaticity means that the quantum state is changed minimally or not at all during the cyclic transport.

In the case of the “quantum state of the ray”, an adiabatic cyclic transport should mean that the ray is moved parallel to itself in a closed curve so that it comes back to the same point in space making the same angle with respect to the optic axis. A perfect right-angle prism used in retroreflection mode is known to reflect a ray parallel to itself regardless of its inclination, with the optic axis shifted laterally and inverted. Two perfect right-angle prisms with their hypotenuse faces optically facing each other should then transport a ray back to (or for sufficiently small ray inclination, very close to) its starting position maintaining its inclination.

A ray might then acquire an additional phase which depends on the magnitude of the cyclic path and which could be measured by comparing a test wavefront with its cyclically and adiabatically transported copy. Adiabatic transport would imply that there would be negligible relative lateral displacement between the wavefront being tested and its cyclically transported copy, unlike a lateral shearing interferometer, so that there would be a negligible loss of spatial resolution. The dependence of this phase shift on the length of the cyclic path would introduce a new control variable, which could be used to tune the sensitivity of phase measurement. This inference of the possibility of introducing a controllable phase shift proportional to ray inclination, without compromising on spatial resolution, by using optical properties of a perfect right-angle prism in retroreflection mode is the crucial link between the “quantum theory of paraxial ray optics” and a new class of interferometers. This inference is not readily accessible in conventional optics without invoking the “quantum theory of ray optics” [1] with its attendant notion of the quantum state of a ray, along with Berry’s theory [20] of the geometric phase. It is also not accessible in the quantum theory of photons or the electromagnetic theory of light.

Note that if the prisms have an error in their right angle, the ray would not be transported exactly parallel to itself and that would constitute non-adiabatic transport.

Figure 1 illustrates the concept of the cyclic and adiabatic transport of a ray.

Fig. 1 Illustration of adiabatic, cyclic transport of the “quantum state of a ray”

A distinction is made between the device coordinate system, denoted by \(\left( {x_{1} ,x_{2} ,x_{3} } \right)\) and the wavefront coordinate system, denoted by (x, y, z) – the last one being along the optic axis of the wavefront which may be folded in the \(\left( {x_{1} ,x_{2} ,x_{3} } \right)\) space. Figure 1 shows four mutually perpendicular plane reflecting surfaces, represented by the square ABCD (called the reference square), lying in the \(x_{1} x_{3}\) plane. The diagonal AC lies along the x1 axis. Its length \(\boldsymbol{\mathcal{D}} \equiv AC\) plays a central role in the discussion. Two rays, R1 and R2, both lying in the plane of the figure and starting from coincident points \(P,P^{\prime}\) on AB, are traced, obeying the laws of reflection at each surface.

Ray R1 makes an angle of \({\pi \mathord{\left/ {\vphantom {\pi 4}} \right. \kern-0pt} 4}\) with the normal to the surface AB. It therefore repeatedly traces out the cyclic rectangular path \(PQRSPQRS....\). This path can be thought of as a folded cyclic optic axis. It is this property that ensures that every point on the planar wavefront would have a corresponding single image point when it is cyclically transported in the manner described above.

The skew ray R2 makes a small angle \(\delta \equiv x^{\prime}\) radians with respect to R1 in the \(x_{1} x_{3}\) plane. It traces out an open path \(P^{\prime}Q^{\prime}R^{\prime}S^{\prime}P^{\prime\prime}Q^{\prime\prime}...\). At all times, the angle between the two rays remains constant, “preserving its momentum \(p_{x}\)” during the cyclic transport.

From Fig. 1, it is clear that the length of the cyclic optical path of ray R1 equals \(2\boldsymbol{\mathcal{D}}\). Every ray that is parallel to ray R1 also has the cyclic optical path length \(2\boldsymbol{\mathcal{D}}\). The image \(I_{P}\) of its starting point P (which lies on side AB), formed by reflection in surfaces BC, CD, DA and AB lies at a distance \(2\boldsymbol{\mathcal{D}}\) on the line QP-extended, at its intersection with the line \(Q^{\prime\prime}P^{\prime\prime}\)-extended. Dropping a perpendicular to the line QP-extended from P to intersect the line \(Q^{\prime\prime}P^{\prime\prime}\)-extended at T, it is clear that \(PT = 2\boldsymbol{\mathcal{D}}\tan \delta\). In the triangle \(PTP^{\prime\prime}\), \(\angle P^{\prime\prime}PT = {\pi \mathord{\left/ {\vphantom {\pi 4}} \right. \kern-0pt} 4}\) by construction, \(\angle P^{\prime\prime}TP = {\pi \mathord{\left/ {\vphantom {\pi 2}} \right. \kern-0pt} 2} - \delta\) and \(\angle TP^{\prime\prime}P = {\pi \mathord{\left/ {\vphantom {\pi 4}} \right. \kern-0pt} 4} + \delta\). Therefore,

$$\frac{{PP^{\prime\prime}}}{{\sin \left( {\pi /2 + \delta } \right)}} = \frac{{P^{\prime\prime}T}}{{\sin \left( {\pi /4} \right)}} = \frac{PT}{{\sin \left( {\pi /4 + \delta } \right)}} = \frac{{2\boldsymbol{\mathcal{D}}\tan \delta }}{{\sin \left( {\pi /4 + \delta } \right)}}$$
(18)

The path travelled by ray R2 has a length \(P^{\prime\prime}T + 2\boldsymbol{\mathcal{D}}\sec \delta\). Therefore, the path difference \(\Delta\) between the rays R1 and R2 is

$$\Delta = 2\boldsymbol{\mathcal{D}}\left\{ {\frac{{\tan \delta \sin \left( {\pi /4} \right)}}{{\sin \left( {\pi /4 + \delta } \right)}} + \sec \delta - 1} \right\} \approx 2\boldsymbol{\mathcal{D}}\delta {\text{ for small }}\delta$$
(19)

The lateral displacement \(PP^{\prime\prime}\) of the skew ray R2 is

$$PP^{\prime\prime} = \frac{{2\boldsymbol{\mathcal{D}}\tan \delta \sin \left( {\pi /2 + \delta } \right)}}{{\sin \left( {\pi /4 + \delta } \right)}} \approx 2\sqrt 2 \boldsymbol{\mathcal{D}}\delta {\text{ for small }}\delta$$
(20)
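The small-angle limits in Eqs. (19) and (20) can be cross-checked by tracing a ray numerically around the reference square of Fig. 1. The Python sketch below is illustrative only: the function name `trace_one_cycle`, the placement of the square (centre at the origin, diagonal AC along \(x_{1}\)), the starting point at the midpoint of AB and the numerical values of \(\boldsymbol{\mathcal{D}}\) and \(\delta\) are assumptions made here, not taken from the original work. It applies the law of reflection at the four sides and reports the excess path length and the displacement along AB after one cycle.

```python
import math

def trace_one_cycle(D, delta, a=None):
    """Trace a ray once around a square mirror arrangement with diagonal D.

    Assumed placement: vertices A(-D/2, 0), B(0, D/2), C(D/2, 0), D(0, -D/2),
    so every side satisfies |x1| + |x3| = D/2. The ray starts at
    P = (-a, D/2 - a) on side AB, inclined at delta radians to the
    reference ray R1, which runs along +x1. The sides are assumed to be hit
    in the order BC, CD, DA, AB (true for small delta, away from corners).
    Returns (path_length, displacement_along_AB, direction_after_cycle).
    """
    h = D / 2.0
    if a is None:
        a = h / 2.0                      # midpoint of AB
    s2 = math.sqrt(2.0)
    # Outward unit normals of BC, CD, DA, AB; each side obeys n . p = c.
    normals = [(1/s2, 1/s2), (1/s2, -1/s2), (-1/s2, -1/s2), (-1/s2, 1/s2)]
    c = h / s2
    start = (-a, h - a)
    p = start
    d = (math.cos(delta), math.sin(delta))
    length = 0.0
    for n in normals:
        n_dot_d = n[0] * d[0] + n[1] * d[1]
        t = (c - (n[0] * p[0] + n[1] * p[1])) / n_dot_d   # distance to side
        p = (p[0] + t * d[0], p[1] + t * d[1])
        length += t
        # Law of reflection: d -> d - 2 (d . n) n
        d = (d[0] - 2 * n_dot_d * n[0], d[1] - 2 * n_dot_d * n[1])
    shift = math.hypot(p[0] - start[0], p[1] - start[1])
    return length, shift, d

D = 0.10          # diagonal of the reference square in metres (assumed)
delta = 1.0e-4    # ray inclination in radians (assumed)

L1, _, _ = trace_one_cycle(D, 0.0)            # reference ray R1
L2, shift, d_out = trace_one_cycle(D, delta)  # skew ray R2

print("cyclic path of R1:", L1, "expected 2D =", 2 * D)
print("path difference:", L2 - L1, "small-angle estimate:", 2 * D * delta)
print("displacement:", shift, "small-angle estimate:", 2 * math.sqrt(2) * D * delta)
print("direction after one cycle:", d_out)
```

Under these assumptions the traced ray returns to AB with its direction unchanged, which is the numerical counterpart of the statement that the inclination is preserved during the cyclic transport.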

The phase difference between the two rays is \({{\sim 4\pi \boldsymbol{\mathcal{D}}\delta } \mathord{\left/ {\vphantom {{\sim 4\pi \boldsymbol{\mathcal{D}}\delta } {\lambda_{0} }}} \right. \kern-0pt} {\lambda_{0} }}\). Measurement of \(\delta\) requires that \(2\boldsymbol{\mathcal{D}}\delta\) not be too large a multiple of the wavelength; otherwise the interferogram would be too dense to evaluate in practice. This implies that the lateral displacement of the skew ray would be at most one order of magnitude larger than the wavelength, negligibly small in comparison with the size of the phase object, while its inclination is kept intact during the cyclic transport. This would satisfy the requirement that the cyclic transport of the “quantum state of the ray” be adiabatic.

This also implies that the natural scale of measurement of the ray inclination is of the order of \(\sim {{\lambda_{0} } \mathord{\left/ {\vphantom {{\lambda_{0} } {2\boldsymbol{\mathcal{D}}}}} \right. \kern-0pt} {2\boldsymbol{\mathcal{D}}}}\). This scale depends on the experimental design parameter \(\boldsymbol{\mathcal{D}}\), which could be increased arbitrarily in principle, provided imperfections in the light source, collimating optics, interferometer components and their alignment do not introduce artifacts \(\sim {{\lambda_{0} } \mathord{\left/ {\vphantom {{\lambda_{0} } {2\boldsymbol{\mathcal{D}}}}} \right. \kern-0pt} {2\boldsymbol{\mathcal{D}}}}\).
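To give a feel for the magnitudes involved, the following Python sketch evaluates the natural inclination scale \({\lambda_{0}}/{2\boldsymbol{\mathcal{D}}}\), the corresponding phase shift and the lateral displacement for a few values of \(\boldsymbol{\mathcal{D}}\). The wavelength (the red He–Ne line) and the values of \(\boldsymbol{\mathcal{D}}\) are assumptions chosen for illustration, and the function names are hypothetical.

```python
import math

def inclination_scale(wavelength, D):
    """Inclination (radians) for which 2*D*delta equals one wavelength."""
    return wavelength / (2.0 * D)

def phase_shift(wavelength, D, delta):
    """Phase difference 4*pi*D*delta/lambda0 between skew and reference rays."""
    return 4.0 * math.pi * D * delta / wavelength

def lateral_displacement(D, delta):
    """Small-angle lateral displacement 2*sqrt(2)*D*delta of the skew ray."""
    return 2.0 * math.sqrt(2.0) * D * delta

wavelength = 632.8e-9            # metres, assumed He-Ne wavelength
for D in (0.05, 0.10, 0.50):     # metres, assumed diagonals
    delta_1 = inclination_scale(wavelength, D)
    fringes = phase_shift(wavelength, D, delta_1) / (2.0 * math.pi)
    shift_in_wavelengths = lateral_displacement(D, delta_1) / wavelength
    print(D, delta_1, fringes, shift_in_wavelengths)
```

In this sketch an inclination equal to the natural scale corresponds to one fringe and to a lateral displacement of \(\sqrt 2\) wavelengths, independent of \(\boldsymbol{\mathcal{D}}\), which is consistent with the displacement staying within about one order of magnitude of the wavelength for interferograms of practical fringe density.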

These observations form the basis of a new class of interferometers. The next section describes three interferometer configurations based on adiabatic transport of the “quantum state” of a ray preserving its initial inclination with respect to the optic axis.

Three examples of a new class of interferometers

This section introduces a new class of interferometers for performing visualization and measurements on a faint phase object. The phase object is defined as a constant-phase surface perpendicular to the optic axis of a collimated beam that has passed through and emerged as a paraxial beam from a medium of weakly non-uniform refractive index into a region of uniform refractive index containing reflecting and partially reflecting surfaces constituting an interferometer. Each point \(\left( {x,y} \right)\) on the nearly planar wavefront of a phase object has an associated ray with inclination angles \(\delta_{x} \left( {x,y} \right),\delta_{y} \left( {x,y} \right)\) radians with respect to the optic axis. This wavefront is compared with its copies which have undergone cyclic adiabatic transport around a closed, planar curve that lies in the plane defined by the optic axis and the x-axis. This cyclic transport preserves the inclination of a skew ray with respect to the optic axis in the plane containing the planar cyclic path. Points on every copy have a one-to-one correspondence with points on the phase object. In every copy, a skew ray associated with a general point \(\boldsymbol{\mathcal{G}}\left( {x,y} \right)\) on the phase object undergoes a phase change proportional to \(2\boldsymbol{\mathcal{D}}\delta\) for the ray inclination \(\delta \equiv x^{\prime}\) radians and the length \(2\boldsymbol{\mathcal{D}}\) of the planar cyclic path that lies on a plane perpendicular to the y axis. It also undergoes a lateral displacement \(\Delta x \approx 2\sqrt 2 \boldsymbol{\mathcal{D}}\delta\) in the plane of the cyclic path, that is at the most one order of magnitude larger than the wavelength.

This class of interferometers is thus distinguished by four properties:

  1. The phase object is completely external to the interferometer assembly, a characteristic shared with lateral shearing interferometry. By contrast, a Mach–Zehnder interferometer must have the phase object within its optical path.

  2. The interferometer is sensitive to the local phase gradient of the phase object and not to the phase. A small-sized feature in a larger phase object would thus be more prominently emphasized within the contours of the whole object.

  3. There is an experimental control parameter that can be varied to increase the sensitivity to finer features.

  4. There is a one-to-one correspondence between points on the phase object and those on the interferogram.

Three examples are described below.

Example A – a Fabry–Perot interferometer with mirrors replaced with retroreflectors

A Fabry–Perot (FP) interferometer has two partially reflecting plane mirrors parallel to each other. A new kind of interferometer results when these mirrors are replaced with two right angle prisms in partial retroreflection mode with their hypotenuse faces parallel (see Fig. 2).

Fig. 2

Schematic of Example A – an interferometer based on the Fabry–Perot interferometer. M1, M2, M3 and M4 are four flat reflecting surfaces. M1 and M2 have a reflectance R and transmittance T at 45° angle of incidence. M3 & M4 have 100% reflectance at 45° angle of incidence. C and S are ground glass screens or similar diffusive surfaces. The mirrors lie on a reference square with diagonal \(\boldsymbol{\mathcal{D}}\) in the plane of the figure

Since precise parallelism between the mirrors is known to be very critical for the FP interferometer and glass right angle prisms reflect using total internal reflection, it would be necessary to replace the 4 reflecting surfaces of the two right angle prisms by 4 mutually perpendicular flat partially reflecting surfaces (called “mirrors” for convenience, although they could be cube beam splitters or some other kind of reflector including right angle prisms in beam steering mode) M1, M2, M3 and M4 (see Fig. 2) supported on suitably custom-designed kinematic mounts.

As these mirrors are surrogates for the reflecting surfaces of right-angle retroreflecting prisms, the kinematic mounts need to be designed in a special way to enable their accurate alignment with the reference square ABCD in Fig. 1. Each mount must hold two mirrors such that their normals make an angle of precisely \({\pi \mathord{\left/ {\vphantom {\pi 2}} \right. \kern-0pt} 2}\) radians, any deviations being correctible. The imaginary intersection of the reflecting surfaces, equivalent to the roof edge of a glass prism, must be perpendicular to the plane containing the reference square ABCD in Fig. 1, any deviations being correctible. The vector bisecting the angle between the normals represents the hypotenuse face of a glass prism. This must be aligned with the diagonal of the reference square ABCD. The mount must have precision translational degrees of freedom that enable the two mirrors to accurately coincide with the reference square.

The relative separation between the roof edges of two such prism-equivalent assemblies is the diagonal \(\boldsymbol{\mathcal{D}} \equiv AC\) of the reference square in Fig. 1. This would have a minimum value constrained by the mechanical dimensions of the kinematic mounts. Larger values would need appropriate additional translational degrees of freedom.

Mirrors M1 and M2 are supported on one prism-equivalent kinematic mount assembly \({\mathbb{P}}_{1}\). They define one corner of the reference square. Assembly \({\mathbb{P}}_{1}\) is aligned such that the diagonal of the reference square is perpendicular to the optic axis of the input beam. This diagonal is called the reference diagonal. The roof edge of \({\mathbb{P}}_{1}\) is aligned parallel with the y-axis of the wavefront coordinate system and is exactly perpendicular to the plane containing the reference square.

Mirrors M3 and M4 are supported on another prism-equivalent kinematic mount assembly \({\mathbb{P}}_{2}\), that can be moved with respect to \({\mathbb{P}}_{1}\) along the reference diagonal, effectively providing an adjustable diagonal distance \(\boldsymbol{\mathcal{D}}\). The roof edge of \({\mathbb{P}}_{2}\) is not exactly perpendicular to the plane of the reference square but makes a slightly smaller angle \({{\left( {\pi - \alpha } \right)} \mathord{\left/ {\vphantom {{\left( {\pi - \alpha } \right)} 2}} \right. \kern-0pt} 2}\) that is precisely adjustable.

Both \({\mathbb{P}}_{1}\) and \({\mathbb{P}}_{2}\) are initially adjusted to have their respective mirrors perpendicular to each other within a very low error, using auxiliary optical techniques. As an example, the retroreflected image of a moiré grating would have a different spatial frequency, the difference being proportional to the angular error. The reciprocal of the fringe spacing of the moiré pattern between the grating and its retroreflected image is thus a measure of the angular error. Mirrors M1 and M2 have a reflection coefficient R and transmission coefficient T and M3 and M4 have a reflection coefficient of 100%, for 45° incidence in each case. The input wavefront of intensity \(I_{0}\) emerging from a phase object is incident on mirror M1. A portion of intensity \(RI_{0}\) is reflected out of the interferometer and can be recorded on the screen C as a calibration reference for the intensity distribution of the input wavefront. A portion of intensity \(TI_{0}\) is transmitted through M1 and is incident on M2. A portion \(T^{2} I_{0}\) is transmitted through M2 and falls on the screen S – this shall be referred to as the first output beamlet. A portion of intensity \(RTI_{0}\) is reflected by M2 and is reflected fully by M3 and M4 and is again incident on M1. The portion \(R^{2} TI_{0}\) is reflected by M1 and is incident on M2. Out of this, a portion \(R^{2} T^{2} I_{0}\) is transmitted by M2 and is incident on the screen S – this shall be called the second output beamlet. Continuing in this manner, a portion of intensity \(R^{3} TI_{0}\) is reflected by M2, M3 and M4 with a portion \(R^{4} TI_{0}\) reflected by M1 and incident on M2, out of which a portion \(R^{4} T^{2} I_{0}\) is transmitted by M2 to form the third output beamlet.

Thus, in every pass along the cyclic path, a portion of the input beam entering the interferometer leaks out and falls on the screen S. Each successive beamlet acquires a phase \(\varphi\) in one cycle. The amplitudes of the successive beamlets, equal to square root of their intensities multiplied with a complex phase factor, are added by the superposition principle, producing an interference pattern on the screen S.

$$\begin{aligned} {\mathbb{A}}_{Output} & = \sqrt {T^{2} I_{0} } + \sqrt {R^{2} T^{2} I_{0} } \cdot \exp \left( {i\varphi } \right) + \sqrt {R^{4} T^{2} I_{0} } \cdot \exp \left( {2i\varphi } \right) \cdots \\ & = T\sqrt {I_{0} } \left\{ {1 + R \cdot \exp \left( {i\varphi } \right) + R^{2} \cdot \exp \left( {2i\varphi } \right) \cdots } \right\} = \frac{{T\sqrt {I_{0} } }}{{1 - R \cdot \exp \left( {i\varphi } \right)}} \\ \end{aligned}$$
(21)

The interferogram thus has the intensity pattern

$$\begin{aligned} {\mathbb{I}}_{Output} & = \left| {{\mathbb{A}}_{Output} } \right|^{2} = \frac{{T\sqrt {I_{0} } }}{{1 - R \cdot \exp \left( {i\varphi } \right)}} \cdot \frac{{T\sqrt {I_{0} } }}{{1 - R \cdot \exp \left( { - i\varphi } \right)}} \\ & = \frac{{T^{2} }}{{\left( {1 - R} \right)^{2} }}\frac{{I_{0} }}{{1 + {\mathbb{F}}\sin^{2} \left( {{\varphi \mathord{\left/ {\vphantom {\varphi 2}} \right. \kern-0pt} 2}} \right)}};\quad {\mathbb{F}} \equiv \frac{4R}{{\left( {1 - R} \right)^{2} }} \gg 1 \\ \end{aligned}$$
(22)

Equation (22) is the well-known Airy distribution that represents high-finesse fringes similar to a Fabry–Perot interferometer, with finesse factor \({\mathbb{F}} \equiv {{4R} \mathord{\left/ {\vphantom {{4R} {\left( {1 - R} \right)^{2} }}} \right. \kern-0pt} {\left( {1 - R} \right)^{2} }} \gg 1\). Its bright fringes represent a contour map of \(\varphi = 2m\pi ,m = 0,1,2 \cdots\).
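The summation of beamlets that leads to the Airy distribution can be checked numerically. The Python sketch below adds a finite number of beamlet amplitudes, as in Eq. (21), and compares the resulting intensity with the closed form of Eq. (22). The values R = 0.9 and T = 0.1 (a lossless surface with T = 1 − R), the unit input intensity, the number of beamlets and the function names are assumptions made for illustration.

```python
import cmath
import math

def beamlet_sum_intensity(R, T, I0, phi, n_beamlets):
    """Intensity of a finite superposition of output beamlets.

    Beamlet k has amplitude T*sqrt(I0)*R**k and phase k*phi.
    """
    amplitude = 0j
    for k in range(n_beamlets):
        amplitude += T * math.sqrt(I0) * (R ** k) * cmath.exp(1j * k * phi)
    return abs(amplitude) ** 2

def airy_intensity(R, T, I0, phi):
    """Closed-form Airy distribution with finesse factor F = 4R/(1-R)^2."""
    F = 4.0 * R / (1.0 - R) ** 2
    return (T ** 2 / (1.0 - R) ** 2) * I0 / (1.0 + F * math.sin(phi / 2.0) ** 2)

R, T, I0 = 0.9, 0.1, 1.0      # assumed reflectance, transmittance, input intensity
for phi in (0.0, 0.1, 1.0, math.pi):
    summed = beamlet_sum_intensity(R, T, I0, phi, 500)
    closed = airy_intensity(R, T, I0, phi)
    print(phi, summed, closed)
```

With 500 beamlets the neglected tail of the geometric series is far below floating-point resolution for R = 0.9, so the two columns should agree closely; fewer beamlets would be needed for lower reflectance.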

The phase distribution on the screen S can be calculated for the case of \(\alpha = 0\) with reference to Fig. 3.

Fig. 3

Construction of virtual images of an arbitrary point \(\boldsymbol{\mathcal{G}}\) on the phase object by successive reflections from mirrors M2, M3, M4 and M1. The projection of \(\boldsymbol{\mathcal{G}}\) on the screen by a ray parallel to the optic axis is \(\boldsymbol{\mathcal{G}}^{0}\). A skew ray starting from point \(\boldsymbol{\mathcal{G}}\) produces the first output beamlet that meets the screen at \(\boldsymbol{\mathcal{G}}^{\prime}\). After cyclic transport around the four mirrors, the ray appears to emerge from the virtual image \(\boldsymbol{\mathcal{I}_{4}}\) and meets the screen in \(\boldsymbol{\mathcal{G}}^{\prime\prime}\). The screen is viewed along the thick vertical arrow

The phase object lies at a distance \(\boldsymbol{\mathcal{P}}\) from the reference diagonal and at a distance \(\boldsymbol{\mathcal{P}} + \boldsymbol{\mathcal{Q}}\) from the screen along the optic axis. A conical pencil of rays, representing all possible rays that might be emitted from the general point \(\boldsymbol{\mathcal{G}}\left( {x,y} \right)\) on the phase object, would get reflected from surfaces M2, M3, M4 and M1 in succession. At each reflection, the reflected rays would form a truncated cone with its virtual apex at a point behind the reflecting plane, that is known as the virtual image of the object point. The line joining the object point and its image is bisected by the reflecting surface. The virtual images of \(\boldsymbol{\mathcal{G}}\left( {x,y} \right)\) from M2, M3, M4 and M1 are shown as \(\boldsymbol{\mathcal{I}}_{1}\), \(\boldsymbol{\mathcal{I}}_{2}\), \(\boldsymbol{\mathcal{I}}_{3}\) and \(\boldsymbol{\mathcal{I}}_{4}\). The projection of \(\boldsymbol{\mathcal{G}}\) on the screen by a ray parallel to the optic axis is marked as \(\boldsymbol{\mathcal{G}}^{0}\). The first output beamlet travels a distance \(\boldsymbol{\mathcal{GG}}^{0} = \boldsymbol{\mathcal{P}} + \boldsymbol{\mathcal{Q}} - {x \mathord{\left/ {\vphantom {x {\sqrt 2 }}} \right. \kern-0pt} {\sqrt 2 }}\) from the \(\boldsymbol{\mathcal{G}}\left( {x,y} \right)\) to the screen. The second output beamlet travels a distance \(2\boldsymbol{\mathcal{D}} + \boldsymbol{\mathcal{P}} + \boldsymbol{\mathcal{Q}} - {x \mathord{\left/ {\vphantom {x {\sqrt 2 }}} \right. \kern-0pt} {\sqrt 2 }}\) from \(\boldsymbol{\mathcal{I}}_{4}\) to the screen. A ray starting from \(\boldsymbol{\mathcal{G}}\) making angle \(\delta\) radians with the optic axis meets the screen at \(\boldsymbol{\mathcal{G}}^{\prime}\) as part of the first output beamlet. 
After cyclic transport, this ray meets the screen in point \(\boldsymbol{\mathcal{G}}^{\prime\prime}\) as part of the second output beamlet apparently emerging from the virtual image \(\boldsymbol{\mathcal{I}}_{4}\). Following the same geometrical argument as in the previous section, it can be shown that the path difference \(\boldsymbol{\mathcal{GG}}^{\prime} - \boldsymbol{\mathcal{GG}}^{0} \approx \left( {\boldsymbol{\mathcal{P}} + \boldsymbol{\mathcal{Q}}} \right)\delta\) and \(\boldsymbol{\mathcal{I}}_{4} \boldsymbol{\mathcal{G}}^{\prime\prime} - \boldsymbol{\mathcal{GG}}^{0} \approx \left( {2\boldsymbol{\mathcal{D}} + \boldsymbol{\mathcal{P}} + \boldsymbol{\mathcal{Q}}} \right)\delta\). The phase difference between the skew rays of the second and the first output beamlets is therefore \(\Delta \varphi = {{4\pi \boldsymbol{\mathcal{D}}\delta } \mathord{\left/ {\vphantom {{4\pi \boldsymbol{\mathcal{D}}\delta } {\lambda_{0} }}} \right. \kern-0pt} {\lambda_{0} }}\). The shift \(\boldsymbol{\mathcal{G}}^{\prime}{\boldsymbol{\mathcal{G}}^{0}} \approx \sqrt 2 \left( {\boldsymbol{\mathcal{P}} + \boldsymbol{\mathcal{Q}}} \right)\delta\) and \(\boldsymbol{\mathcal{G}}^{\prime\prime}{\boldsymbol{\mathcal{G}}^{0}} \approx \sqrt 2 \left( {2\boldsymbol{\mathcal{D}} + \boldsymbol{\mathcal{P}} + \boldsymbol{\mathcal{Q}}} \right)\delta\). Therefore, \(\boldsymbol{\mathcal{G}}^{\prime\prime}\boldsymbol{\mathcal{G}}^{\prime} \approx 2\sqrt 2 \boldsymbol{\mathcal{D}}\delta\).

The optical path difference between rays parallel to the optic axis belonging to the second and first beamlets is \(2\boldsymbol{\mathcal{D}}\) throughout the projection of the collimated beam on the screen. This contributes a spatially uniform phase difference \(\varphi_{0} = {{4\pi \boldsymbol{\mathcal{D}}} \mathord{\left/ {\vphantom {{4\pi \boldsymbol{\mathcal{D}}} {\lambda_{0} }}} \right. \kern-0pt} {\lambda_{0} }}\), producing a uniform intensity pattern on the screen.

It is possible to create fringes perpendicular to the y axis by tilting the roof edge of the prism-equivalent assembly \({\mathbb{P}}_{2}\) in the \(x_{1} x_{2}\) plane by a very small angle \({\alpha \mathord{\left/ {\vphantom {\alpha 2}} \right. \kern-0pt} 2}\). In this case, the path traced out by rays parallel to the optic axis does not lie in a plane but rather forms a helix with its axis along the \(\hat{x}_{2}\) direction with a pitch related to the tilt angle. So long as this angle is “sufficiently small”, the transport around the four mirrors can still be considered cyclic and adiabatic.

A formal derivation of the phase difference \(\varphi\) between output beamlets for the case of \(\alpha \ne 0;\alpha \ll 1\) is given in the Appendix using a matrix method [31, 32]. The following results are obtained:

$$\varphi \approx \left( {{{2\pi } \mathord{\left/ {\vphantom {{2\pi } {\lambda_{0} }}} \right. \kern-0pt} {\lambda_{0} }}} \right)\left( {2\boldsymbol{\mathcal{D}} + y\alpha - 2\boldsymbol{\mathcal{D}}\delta_{x} } \right)$$
(23)

The intensity distribution (22) then becomes a fringe pattern, with bright fringe positions \(y_{m}\) given by

$$\left( {{{2\pi } \mathord{\left/ {\vphantom {{2\pi } {\lambda_{0} }}} \right. \kern-0pt} {\lambda_{0} }}} \right)\left( {2\boldsymbol{\mathcal{D}} + y_{m} \alpha - 2\boldsymbol{\mathcal{D}}\delta_{x} } \right) = 2m\pi ,\;m = 0,1,2 \cdots ;y_{m} = \frac{{\lambda_{0} m}}{\alpha } + 2\boldsymbol{\mathcal{D}}\frac{{\delta_{x} }}{\alpha } - \frac{{2\boldsymbol{\mathcal{D}}}}{\alpha }$$
(24)

The fringe order m is the integer nearest to \({{2\boldsymbol{\mathcal{D}}} \mathord{\left/ {\vphantom {{2\boldsymbol{\mathcal{D}}} {\lambda_{0} }}} \right. \kern-0pt} {\lambda_{0} }}\). The fringe separation is

$$\Delta y \equiv y_{m + 1} - y_{m} = \frac{{\lambda_{0} }}{\alpha }$$
(25)

The fringe deviation in units of fringe separation is

$$\Delta f \equiv \frac{{y_{m} - \left. {y_{m} } \right|_{{\delta_{x} = 0}} }}{\Delta y} = \frac{{2\boldsymbol{\mathcal{D}}}}{{\lambda_{0} }}\delta_{x}$$
(26)

With \(\alpha = 0\), the intensity distribution (22), with the phase \(\varphi\) given by (23), becomes a contour map of \(\left| {\delta_{x} } \right|\).
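The fringe relations (24) to (26) can be illustrated with a short numerical example. In the Python sketch below, the wavelength, the tilt parameter \(\alpha\), the diagonal \(\boldsymbol{\mathcal{D}}\) and the ray inclination \(\delta_{x}\) are assumed values chosen only for illustration, and the function name is hypothetical.

```python
def bright_fringe_position(m, wavelength, alpha, D, delta_x):
    """Position y_m of the m-th bright fringe, from
    2*D + y_m*alpha - 2*D*delta_x = m*wavelength."""
    return (m * wavelength + 2.0 * D * delta_x - 2.0 * D) / alpha

wavelength = 632.8e-9   # metres, assumed He-Ne wavelength
alpha = 1.0e-3          # radians, assumed tilt parameter
D = 0.10                # metres, assumed diagonal
delta_x = 1.0e-6        # radians, assumed ray inclination

m0 = round(2.0 * D / wavelength)   # fringe order nearest to 2D/lambda0

spacing = (bright_fringe_position(m0 + 1, wavelength, alpha, D, delta_x)
           - bright_fringe_position(m0, wavelength, alpha, D, delta_x))
deviation = (bright_fringe_position(m0, wavelength, alpha, D, delta_x)
             - bright_fringe_position(m0, wavelength, alpha, D, 0.0)) / spacing

print("fringe order:", m0)
print("fringe separation:", spacing, "lambda0/alpha:", wavelength / alpha)
print("fringe deviation:", deviation, "2*D*delta_x/lambda0:", 2.0 * D * delta_x / wavelength)
```

The fringe separation depends only on \(\alpha\), while the deviation in units of fringe separation depends only on \(\boldsymbol{\mathcal{D}}\), so in this picture the two can be set independently.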

Example B – modification of a Michelson interferometer

The Michelson interferometer (MI) has a beam splitter at 45° to the optic axis of the input beam, which transmits about half the beam along the optic axis of the input beam and reflects the rest at 90° to the input beam. Mirrors reflect these beams back along their original directions. The beam splitter then transmits the initially reflected beam and reflects the initially transmitted beam, which coincide and create an interference pattern depending upon the path difference between the two arms of the interferometer.

It turns out that merely replacing mirrors of a Michelson interferometer by right-angle prisms does not turn it into another example of the new class of interferometers. The second example illustrated in Fig. 4 has a beam splitter at 45° to the optic axis of the input beam as in the MI. Two retro-reflecting right-angle-prism-equivalent optical assemblies consisting of mirrors M1, M2, M3 and M4 are placed at the positions similar to those of mirrors in the MI. But the input beam reflected by the beam splitter does not reach these mirrors. It is only the transmitted beam that participates in the interference pattern. The optical system in Fig. 4 turns out to be equivalent to that in Fig. 2 as far as ray tracing is concerned.

Fig. 4

Example B: A modification of the Michelson Interferometer. The Michelson interferometer consists of a beam splitter at 45° to the optic axis and two mirrors which reflect the two resulting beams back along their path. The figure shows the two mirrors of a Michelson interferometer replaced by retro-reflecting right-angle-prism-equivalent assemblies consisting of mirrors M1, M2, M3, M4. But the beam splitter is oriented at 90° to the beam splitter of a Michelson interferometer. The optical system consisting of virtual images of M1 and M2 formed by the beam splitter and the mirrors M3 and M4 is equivalent to that illustrated in Fig. 2

Unlike the system in Fig. 2, the mirrors M1 and M2 have a 100% reflectance at 45° incidence. Unlike the MI whose beam splitter has both reflectance and transmittance ~ 0.5, the beam splitter in Fig. 4 has a reflectance R ~ 0.9 and transmittance T ~ 0.1 at 45° incidence. The rays make multiple passes along the folded cyclic path bounded by the mirrors M1… M4 and the beam splitter. After each pass, a portion leaks out of the beam splitter and falls on the screen as a series of beamlets.

Figure 4 illustrates that the optical system consisting of the virtual images of M1 and M2 and mirrors M3 and M4 is equivalent to the one shown in Fig. 2. The cyclic path followed by incoming rays in Fig. 2 is folded by the beam splitter into another cyclic path shown by thick rays in Fig. 4. A ray tracing exercise similar to the earlier one yields exactly the same results: when the M3-M4 assembly is tilted in the \(x_{1} x_{2}\) plane by a small angle \({\alpha \mathord{\left/ {\vphantom {\alpha 2}} \right. \kern-0pt} 2}\), high finesse fringes are formed perpendicular to the y-axis, and their deviation is proportional to the ray deflection.

Example C – based on a moiré interferometer

An attempt [33] to circumvent the uncertainty principle of optics by modifying diffraction limits on the moiré deflectometry technique [34] was found [35] to contain an error, whose resolution led to the patent [3] mentioned earlier and discovery of a new class of interferometers.

The idea revolves around the fact that a diffraction grating with the pitch of its rulings comparable to the wavelength diffracts the incoming test beam into several diffraction orders which are well separated. When the diffraction orders \(\pm 1\) are reflected back along their path using mirrors, they produce an interference pattern at the position of the grating with a pitch exactly equal to the pitch of the grating. A moiré pattern between the grating and the interference pattern projected on it contains information on the changes impressed on the optical beam. This moiré pattern can be transmitted or reflected by the grating if the grating is made by ruling lines on a metal film coated on an optical flat.

If one of the recombining beams passes through a phase object, the phase information impressed on that beam is manifested as deviation of the moiré pattern [35]. This configuration [33] is illustrated in Fig. 5.

Fig. 5

Diffraction compensation scheme proposed in Ref [33] using a single strongly-diffracting grating. BS is a beam splitter, M1 and M2 are mirrors. GG is a ground glass screen kept at an elevation different from that of the mirrors. The grating is tilted such that it makes a small angle with the plane of the figure. Then, the beams reflected from M1 and M2 are partially reflected by the grating towards the matt screen on the right side and partially transmitted and can be viewed using a beam splitter and a matt screen kept on the lower left side of the figure. The beams \(a_{1}\), \(a_{0}\), \(a_{ - 1}\) are singly diffracted while the beams \(a_{ - 1} a_{1}\), \(a_{ - 1} a_{0}\), \(a_{ - 1} a_{ - 1}\), \(a_{1} a_{1}\), \(a_{1} a_{0}\) and \(a_{1} a_{ - 1}\) are doubly diffracted. A phase object kept between the grating and one of the mirrors will produce an interferogram with fringe deviation proportional to the phase introduced

Reproduced with permission from Fig. 3 of Ref [33].

When the mirrors in Fig. 5 are replaced with right-angle-prism-equivalent optical assemblies, the configuration mentioned in Indian Patent no. 183095 results (Fig. 6).

Fig. 6

Reproduced from Indian Patent No. 183095 (public domain). 1 Housing, 2 Light source, 3 Beam splitter, 4 Collimated beam, 5A, 5B Optical arrangements for inspection of opaque or transparent test objects. 6A, 6B, shutters to choose the test object. 7 wedge prism for angular calibration. 8A, 8B identical transmission gratings of pitch d. 9A, 9B, 10A, 10B and 12A, 12B are pairs of plane mirrors at right angles to each other. 11- screen for viewing the moiré pattern. O is the origin of the coordinate system. XX’- is the x-axis. Z marks the z-axis. The Y axis is perpendicular to the plane of the figure. The ruling of gratings 8A and 8B make angles \(\pm {\alpha \mathord{\left/ {\vphantom {\alpha 2}} \right. \kern-0pt} 2}\) with respect to the Y-axis. The angle between the normal to the grating and the diffraction orders \(\pm 1\) is \(\theta_{0} = \sin^{ - 1} \left( {{{\lambda_{0} } \mathord{\left/ {\vphantom {{\lambda_{0} } d}} \right. \kern-0pt} d}} \right) < {\pi \mathord{\left/ {\vphantom {\pi 2}} \right. \kern-0pt} 2}\). The roof edges of prisms 9 and 10 are at a distance D from the origin

The grating 8A splits the input beam into diffracted beams of orders − 1, 0 and + 1. The prisms 9 and 10 retroreflect the  − 1 and + 1 orders so that they are incident on the grating 8B at precisely the angle of diffraction of grating 8A. The interference pattern between these recombinant beams has a pitch exactly equal to the pitch of grating 8B. By tilting grating 8A by angle \({\alpha \mathord{\left/ {\vphantom {\alpha 2}} \right. \kern-0pt} 2}\) and grating 8B by angle \({{ - \alpha } \mathord{\left/ {\vphantom {{ - \alpha } 2}} \right. \kern-0pt} 2}\) in the XY plane, a moiré pattern is produced that is projected on screen 11. The zeroth order beam retroreflected by prism 12 is also projected on screen 11 to serve as an intensity normalization reference. A ray diagram and its theoretical calculations are described in detail in the patent [3] and shall not be repeated here. The main results are as follows:

  1. The interference pattern produced by the superposition of retroreflected beams of diffraction orders + 1 and − 1 at the grating 8B has a pitch exactly identical to the pitch of the grating.

  2. The normalized intensity distribution of the moiré pattern is given by

    $$\frac{{{\mathbb{I}}_{{\text{moiré}}} }}{{{\mathbb{I}}_{\max } }} = \cos^{2} \left\{ {2\pi \left( {\frac{2\sin \left( \alpha \right)}{d}} \right)\left( {y - D\delta_{x} \sec^{2} \theta_{0} {\text{cosec}} \left( \alpha \right)} \right)} \right\}$$
    (27)

  3. The pitch of bright fringes is

    $$\Delta y = \frac{d}{4\sin \alpha }.$$
    (28)

  4. The fringe deviation in units of fringe separation is

    $$h = \frac{2D}{d}\delta_{x} \left( {x,y} \right)\sec^{2} \theta_{0} \quad {\text{for}}\quad \theta_{0} < {\pi \mathord{\left/ {\vphantom {\pi 2}} \right. \kern-0pt} 2}$$
    (29)
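The relation between the intensity distribution (27) and the fringe pitch (28) can be illustrated numerically. The Python sketch below evaluates Eq. (27) as written and checks that it repeats after one pitch \(\Delta y\). All numerical values (grating pitch, wavelength, tilt \(\alpha\), distance D and inclination \(\delta_{x}\)) and the function names are assumptions chosen for illustration; they are not taken from the patent.

```python
import math

def moire_intensity(y, d, alpha, D, delta_x, theta0):
    """Normalized moire intensity of Eq. (27) at height y."""
    shift = D * delta_x / (math.cos(theta0) ** 2 * math.sin(alpha))
    argument = 2.0 * math.pi * (2.0 * math.sin(alpha) / d) * (y - shift)
    return math.cos(argument) ** 2

def fringe_pitch(d, alpha):
    """Pitch of bright fringes, Eq. (28)."""
    return d / (4.0 * math.sin(alpha))

wavelength = 632.8e-9                  # metres, assumed He-Ne wavelength
d = 10.0e-6                            # metres, assumed grating pitch
theta0 = math.asin(wavelength / d)     # first-order diffraction angle
alpha = 1.0e-2                         # radians, assumed tilt parameter
D = 0.50                               # metres, assumed roof-edge distance
delta_x = 1.0e-6                       # radians, assumed ray inclination

pitch = fringe_pitch(d, alpha)
y_bright = D * delta_x / (math.cos(theta0) ** 2 * math.sin(alpha))

print("pitch:", pitch)
print("intensity at shifted bright fringe:", moire_intensity(y_bright, d, alpha, D, delta_x, theta0))
for y in (0.0, 1.0e-4, 3.0e-4):
    print(y, moire_intensity(y, d, alpha, D, delta_x, theta0),
          moire_intensity(y + pitch, d, alpha, D, delta_x, theta0))
```

In this sketch the bright fringe that sits at y = 0 for an undeflected ray moves to \(y = D\delta_{x} \sec^{2} \theta_{0} {\text{cosec}} \left( \alpha \right)\) when the ray is inclined, which is how the ray deflection appears as a fringe deviation.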

Experimental demonstration

This section begins with a historical note in order to provide context. Originally, Ref [33] was intended only as a theoretical exercise. At the referee’s insistence, an experimental demonstration was hastily organized. After observation of moiré fringes as predicted [33], the revised manuscript was submitted within the deadline for resubmission. However, just after acceptance, it was realized that the moiré pattern did not respond to ray deflections impressed on the input beam. This led to re-examination of the theory. An error was found and reported [35] as a corrigendum to Ref [33]. A modification of the experiment that implemented the theoretical premise of Ref [33] was actually performed and was reported as a “note added in proof” [35]. The details of the experiment were not published at that time as the emphasis switched to filing of a patent before its novelty could be extinguished by such publication. That experimental demonstration is described in this section for the first time.

A 0.5 W He–Ne laser beam was expanded to \(\approx 15\;{\text{mm}}\) diameter using off-the-shelf beam expansion and spatial filtering accessories. Its collimation was checked using two moiré gratings of pitch 0.11 mm separated by a variable distance up to 75 cm. The grating referred to in Fig. 5 had 100 grooves per mm on an aluminium coating on ultra-flat glass, so it could be used in both transmission and reflection modes. The \(\pm 1\) diffraction orders were well-separated 50 cm beyond the grating. These were retroreflected back on the grating using two glass right-angle prisms of 50 × 50 mm faces mounted on translation stages. The grating was tilted at an angle in the plane containing the rulings and the input beam. The retroreflected diffracted beams were re-diffracted by the grating in reflection mode. These were incident on a matt screen made of tracing paper mounted between the two prisms at a higher elevation, producing 6 doubly-diffracted beams as described in Fig. 5.
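The statement that the \(\pm 1\) orders were well separated 50 cm beyond the grating can be checked with the grating equation \(\theta_{0} = \sin^{ - 1} \left( {\lambda_{0} /d} \right)\). The Python sketch below uses the grating density, distance and beam diameter quoted above; the wavelength of 632.8 nm (the common red He–Ne line) is an assumption, since the wavelength used is not stated, and the function name is hypothetical.

```python
import math

def first_order_angle(wavelength, pitch):
    """First-order diffraction angle (radians) at normal incidence."""
    return math.asin(wavelength / pitch)

wavelength = 632.8e-9        # metres, assumed He-Ne wavelength
pitch = 1.0e-3 / 100         # metres, from 100 grooves per mm
distance = 0.50              # metres beyond the grating
beam_diameter = 15.0e-3      # metres

theta0 = first_order_angle(wavelength, pitch)
offset = distance * math.tan(theta0)   # lateral offset of each first order

print("first-order angle (degrees):", math.degrees(theta0))
print("offset from zero order (m):", offset)
print("orders separated:", offset > beam_diameter)
```

Under these assumptions each first-order beam is displaced from the zero order by roughly twice the beam diameter at 50 cm, consistent with the orders being well separated.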

A 1.5 mm diameter metal rod was placed in the path of the input beam, and its shadows in the central two beams on the matt screen were made to coincide by precisely moving the two prisms using micrometer-controlled translation stages. The best achievable overlap between the shadows of the rod was incomplete, a result of manufacturing tolerances in the right angle of the prisms. Moiré fringes were observed when the grating was tilted in its plane by a small angle. They were recorded on negative colour film using a film camera. Figure 7 shows the scanned image of its positive print.

Fig. 7

Moiré fringes. (a) As photographed on colour film and scanned. (b) Converted to grayscale, enhanced and marked with two lines showing the hard boundaries of the shadows of a rod placed in the path of the input beam

The fringe separation could be altered by either tilting the grating in its plane or by tilting one of the prisms in the plane normal to its hypotenuse face. The fringes were displaced when the plume of a lighted candle was inserted in the input beam showing their sensitivity to an external phase object.

The surprising observation, however, is the one shown in Fig. 7: the fringes have a faint presence in the shadow region. These faint fringes are deviated below and above the original fringe position to the left and right of the centre of the shadow region. Apparently, the deflection of rays caused by diffraction from the edges of the object is being observed as fringe deviation. A lateral shearing interferometer cannot produce fringe deviation caused by diffraction from an edge.

This is, unfortunately, the only result from those experiments that survives to this day. Therefore, many legitimate questions concerning this work remain (and are likely to remain) unanswered. This work was discontinued shortly after the patent was filed.

Enquiry into the question of what would happen if the angle of diffraction is exactly \(\pi/2\) led to the configuration described as example B. Its analysis led to the configuration of example A. That was prepared as a patent application and submitted to a patent attorney but was not pursued further.

Brief comments on potential applications

This class of interferometers has unique properties already mentioned in the previous section. While practical applications based on these special characteristics would be determined largely by ingenuity and inventiveness, certain general observations can be made. Applications can be broadly categorised into two classes: those that emphasize optical metrology and testing, and those that emphasize characterization of unknown phase objects.

These interferometers are crucially dependent on the perfection of the right angle of the retroreflecting prism. Measuring the error in a right-angle prism is therefore an important first application. One could make an interferometer of type C using a grating and two prism assemblies made with four front-coated mirrors. These can be manually adjusted to obtain the best overlap between the middle two doubly-diffracted beams, where the deviation of fringes in the shadow region of a long and narrow opaque object placed in the input beam is clearly visible. This instrument can then be used as a primary standard for testing glass prisms for angular error. For this purpose, a spatially-filtered collimated beam retroreflected from the test prism can be used as the input beam. A perfect prism would produce moiré fringes which are continuous across the shadow of its roof edge. Any deviation would indicate the type of correction that is needed. With experience, a standardized optical shop procedure for the manufacture of right-angle prisms with very low error can be developed.

Such low-error prisms can be used with a type B interferometer that can be set up for testing and manufacturing right-angle prisms with even lower error. These can then be used in place of prism assembly \({\mathbb{P}}_{2}\) and even to manufacture a pair of cube beam splitters which can replace the prism assembly \({\mathbb{P}}_{1}\) in the type A interferometer. Such a type A interferometer would then be composed of two monolithic high-precision optical components instead of a set of complex optomechanical mounts.

The second category of applications for characterizing unknown phase objects can be discussed most fruitfully using two examples.

One example is a microscopic biomedical sample that is most often a smear on a glass slide. By using a front-coated mirror as the glass slide, light reflected from the smeared sample becomes an unknown phase object. The light could be either in the form of a collimated beam reflected from the sample as a paraxial beam or a collimated beam focused to a small area on the sample and reflected as a divergent beam. The latter produces a Fourier transform of the spatial variation imposed by the sample. Both can be expanded into a paraxial beam of a larger diameter and passed through the type A interferometer, which is now made using just two monolithic high precision optical components, one of which can be moved.

The two-dimensional interferogram produced by the paraxial beam can be recorded with a high-pixel-count digital camera either as a contour map of the absolute magnitude of the ray inclination or as a fringe deviation pattern. This can be done at different separations of the two prisms. The resulting data characterizing the sample is unique in many respects:

  • Different biomolecules would have different molar refractivity. Their spatially non-uniform concentration would thus impart a corresponding non-uniformity on the reflected light beam – both in the case of a paraxial reflected beam and a divergent reflected beam.

  • The interference pattern is sensitive to the gradient of optical phase. Thus, spatially abrupt changes in biomolecule concentrations, denoting membrane boundaries, would produce a more noticeable change in the interference pattern than a more diffuse concentration pattern.

  • One could, in principle, divide the reflected beam into two and deploy two type A interferometers looking at ray inclinations \(\delta_{x} \left( {x,y} \right)\) and \(\delta_{y} \left( {x,y} \right)\) of the phase object. By recording the interferograms on high-pixel-count CCD cameras, quantitative information about these inclinations would be obtained over a wide dynamic range because of a variable \(\boldsymbol{\mathcal{D}}\) at the same spatial magnification.

  • This information can be either in coordinate space or Fourier space pertaining to the sample. This has significant implications which require a separate research effort.

  • This data could potentially provide a closer view of intracellular activity in live cells without modifying them as in fluorescence microscopy.
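The point made in the list above about sensitivity to the phase gradient can be illustrated with a one-dimensional toy profile. The sketch uses the standard paraxial relation between ray inclination and phase gradient, \(\delta_{x} = (\lambda/2\pi)\,\partial\phi/\partial x\). The smooth step profile, its height, the two boundary widths and the wavelength are all hypothetical choices made for illustration and are not taken from any measurement.

```python
import math

# Assumption: illustrative probe wavelength (red He-Ne line)
WAVELENGTH_M = 632.8e-9


def phase_step(x_m, height_rad, width_m):
    """Hypothetical smooth phase step; a smaller width_m means a sharper boundary."""
    return 0.5 * height_rad * (1.0 + math.tanh(x_m / width_m))


def ray_inclination(phase_fn, x_m, dx_m=1e-9, wavelength_m=WAVELENGTH_M):
    """Paraxial ray inclination (lambda / 2 pi) * d(phase)/dx, using a central difference."""
    dphi_dx = (phase_fn(x_m + dx_m) - phase_fn(x_m - dx_m)) / (2.0 * dx_m)
    return wavelength_m * dphi_dx / (2.0 * math.pi)


def peak_inclination(height_rad, width_m, half_span_m=50e-6, samples=2001):
    """Largest absolute inclination found on a uniform grid spanning the boundary."""
    step = 2.0 * half_span_m / (samples - 1)
    xs = [-half_span_m + i * step for i in range(samples)]
    profile = lambda x: phase_step(x, height_rad, width_m)
    return max(abs(ray_inclination(profile, x)) for x in xs)


# Same total phase change, two different boundary widths
sharp = peak_inclination(height_rad=1.0, width_m=1e-6)
diffuse = peak_inclination(height_rad=1.0, width_m=10e-6)

print(f"peak inclination, sharp boundary:   {sharp:.3e} rad")
print(f"peak inclination, diffuse boundary: {diffuse:.3e} rad")
```

For the same total phase change, the peak inclination scales inversely with the boundary width, so in this toy model the sharper boundary produces a tenfold larger signal than the diffuse one.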

The second example is that of plasma diagnostics. A collimated beam emerging from the dense plasma focus can be split into many parts, each being fed to a separate type A interferometer. Using different values of \(\boldsymbol{\mathcal{D}}\), one can map the refractivity gradients \(\delta_{x} \left( {x,y} \right)\) and \(\delta_{y} \left( {x,y} \right)\) over many scales giving an unprecedented insight into relations between phenomena occurring at different spatial scales. These involve formation of plasmoidal, toroidal, lobular and filamentary substructures within and outside the dense plasma core [7]. More importantly, this would provide the first quantitative information on the density structure of the low-density region that surrounds the dense core where most of the magnetic energy resides. Such information is currently not accessible by any refractivity-based diagnostics.

People with relevant expertise might like to inquire whether such right-angle prisms can be devised for paraxial optics of charged or neutral particles, and what would happen if similar interferometers could be constructed using them. Note that this question becomes comprehensible only when one invokes the analogy between DM’s quantum theory of rays and classical trajectories of paraxial charged particle beams.

Summary and conclusions

This paper revisits a new class of interferometers invented in 1995 [3] but not reported to the scientific community since then. Contemporary relevance of this invention is brought out in reference to recent advances in plasma diagnostics [4]. The conceptual basis of this invention is more easily accessible through theoretical insights provided by the work of Dietrich Marcuse [1, 2] on a quantum theory of ray optics than through the classical approaches to optics. Essential features of this work are reviewed briefly. The notion of quantum state of a ray in the context of geometrical optics of paraxial beams is brought out. This is defined in terms of the coordinates \(\left( {x,y} \right)\) and ray inclinations \(\delta_{x} \left( {x,y} \right),\delta_{y} \left( {x,y} \right)\) with respect to the optic axis at any point on the nearly planar constant-phase surface of a paraxial beam.

Following Berry’s theory of geometric phase [25], the question of introducing a geometric phase by cyclic adiabatic transport of the quantum state of a ray in coordinate space is considered. It is recognized that two perfect right-angle prisms can transport a ray, without changing its inclination with respect to the optic axis, around a nearly closed cyclic curve that does not retrace its path. This cyclic transport confers a phase proportional to the ray inclination (in radians) and the length of the cyclic path under paraxial conditions.
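The proportionality stated above can be given a rough numerical feel. The sketch below assumes, purely for illustration, that the constant of proportionality is the wavenumber \(2\pi/\lambda\), so that the phase is \((2\pi/\lambda)\,\delta\,\ell\) for ray inclination \(\delta\) and cyclic path length \(\ell\). This summary does not state the constant, so the exact expression should be taken from the main text. The wavelength and the path lengths are likewise hypothetical.

```python
import math

# Assumption: illustrative probe wavelength (red He-Ne line)
WAVELENGTH_M = 632.8e-9


def transport_phase(inclination_rad, path_length_m, wavelength_m=WAVELENGTH_M):
    """Assumed form only: phase = (2 pi / lambda) * inclination * cyclic path length."""
    return 2.0 * math.pi * inclination_rad * path_length_m / wavelength_m


def inclination_per_fringe(path_length_m, wavelength_m=WAVELENGTH_M):
    """Ray inclination giving a 2 pi phase (one fringe) under the assumed form."""
    return wavelength_m / path_length_m


# Hypothetical cyclic path lengths, in metres
for path_length_m in (0.01, 0.1, 1.0):
    delta = inclination_per_fringe(path_length_m)
    print(f"path {path_length_m:5.2f} m -> one fringe per {delta * 1e6:.2f} microradian")
```

Under this assumed form, lengthening the cyclic path lowers the inclination needed to shift the pattern by one fringe, which is the sense in which a variable path length acts as a sensitivity control.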

A new class of interferometers is defined where a paraxial test wavefront is compared with its copy subjected to adiabatic and cyclic transport. This class of interferometers has the following properties:

  1. The phase object is completely external to the interferometer assembly.

  2. The interferometer is sensitive to the local phase gradient of the phase object and not to the phase.

  3. There is an experimental control parameter that can be varied to increase the sensitivity to finer features.

  4. There is a one-to-one correspondence between points on the phase objects and those on the interferogram.

Three examples of such interferometers are described, along with an experimental demonstration of one that was carried out in 1995 but never reported before.

A brief commentary on potential applications suggests the possibility of advances in optical metrology applied to optical shop testing and in the quantitative characterization of unknown phase objects, such as microscopic biomedical samples and dense magnetized plasmas. Features of such phase objects that are not accessible by conventional optical techniques can be characterized by such interferometers.