Introduction

These notes, echoing a lecture given at the Strasbourg–Zurich seminar in October 2017, are written to serve as an introduction to 2-dimensional quantum Yang–Mills theory and to the results obtained in the last five to ten years about its so-called large N limit.

Quantum Yang–Mills theory, at least in the flavour that we will describe, combines differential geometric and probabilistic ideas. We would like to think, and hope to convince the reader, that this is less a complication than a source of beauty and enjoyment.

Some parts of our presentation will rely more distinctly on a probabilistic or a differential geometric background. We will however always try to keep technicalities aside and to favour explanation over demonstration. This is thus not, in the purest sense, a mathematical text: there will be essentially no proof. On the other hand, we will give fairly detailed examples of some computations that, we hope, are typical of the theory and illustrate it.

Slightly different in aim and content, but also introductory, the notes [26], written jointly with Ambar Sengupta, can serve as a counterpoint, or complement, to the present text.

These notes are split into three parts. In the first, we explain the nature of the Yang–Mills holonomy process, which is the main object of interest of the theory. We do it from two perspectives, one differential geometric, and one probabilistic. This leads us to the definition of Wilson loop expectations, which are the most important numerical quantities of the theory.

In the second part, we discuss several approaches to the computation of Wilson loop expectations, and illustrate them on several examples. The large N limit of the theory makes a first appearance in this section, and we derive by hand some concrete instances of the Makeenko–Migdal equations which are the subject of the third part. We also included in the second part a discussion of the holonomy process on the sphere, and of the Douglas–Kazakov phase transition.

In the third part, we describe the Makeenko–Migdal equations. In keeping with the style of these notes, we do not offer a proof of these equations, but we describe as carefully as we can Makeenko and Migdal’s original derivation of them. Then, we discuss the amount of information carried by these equations and illustrate their power in the computation of the so-called master field, that is the large N limit of Wilson loop functionals.

1 Quantum Yang–Mills Theory on Compact Surfaces

1.1 The Holonomy Process and the Yang–Mills Action

The central object of study of quantum 2-dimensional Yang–Mills theory is a collection of random unitary matrices indexed by the class \({\mathcal L}_{m}(M)\) of Lipschitz continuous loops based at some point m on a compact surface M. This collection of random variables is called the Yang–Mills holonomy process and it is denoted by

$$\displaystyle \begin{aligned} (H_{\ell})_{\ell\in {\mathcal L}_{m}(M)} \end{aligned} $$
(1)

The idea of this collection of random variables arose, along a fairly convoluted path, from physical considerations relating to the description of certain kinds of fundamental interactions.Footnote 1 It is, fortunately, not necessary to be familiar with the original motivation of Yang and Mills to understand what the Yang–Mills holonomy process is.

In very broad terms, the basic data of the theory is a compact surface M (for example a disk, a sphere, a cylinder, a torus) and a compact matrix group G (for example U(1), SO(3), U(N)). From this data, an infinite dimensional space of connections can be builtFootnote 2, on which an infinite dimensional symmetry group, the gauge group actsFootnote 3, with infinite dimensional quotient, and one of the fundamental maps of the theory is the holonomy map

$$\displaystyle \begin{aligned}\text{hol} : \{\text{connections}\}\big/\{\text{gauge transformations}\} \longrightarrow \mbox{Maps}({\mathcal L}_{m}(M),G)\big/\,G\end{aligned}$$

On the right-hand side, the action of G on the space of maps from \({\mathcal L}_{m}(M)\) to G is by conjugation. Leaving this action aside, note that the distribution of the holonomy process (1) is a probability measure on the space \(\mbox{Maps}({\mathcal L}_{m}(M),G)\). We will call this space the space of holonomies.

One property that makes the holonomy map so important is that it is injective. It is thus legitimate to say that a connection is well described by its holonomy.

Another fundamental map of the theory is the Yang–Mills action \(S_{{\mathsf{YM}}}\), which is a non-negative functional traditionally defined on the space of connections, but that can also be defined on the space of holonomies, so that the situation is

$$\displaystyle \begin{aligned} \begin{array}{ccc} \{\text{connections}\} & \overset{\text{hol}}{\longrightarrow} & \{\text{holonomies}\} \\ S_{{\mathsf{YM}}}\searrow & & \swarrow S_{{\mathsf{YM}}} \\ & [0,+\infty) & \end{array} \end{aligned} $$

(2)

The Yang–Mills measure is heuristically described as the Boltzmann probability measure, on the space of connections or on the space of holonomies, associated with the Yang–Mills action. The typical formula that one finds in the literature is

$$\displaystyle \begin{aligned} {\; \mathrm{d}}\mu_{{\mathsf{YM}}}(\omega)=\frac{1}{Z} \, e^{-\frac{1}{2T}S_{{\mathsf{YM}}}(\omega)}\, {\; \mathrm{d}}\omega \end{aligned} $$
(3)

where T is a positive real parameter called the coupling constant. Here, ω is meant to stand for a connection or for a holonomy, depending on one’s preferred point of view. This expression is however plagued with difficulties: on the infinite dimensional spaces where the Yang–Mills measure is supposed to live, there is no Lebesgue-like reference measure that could reasonably play the role of dω, and even if there were, one would not expect the Yang–Mills measure to be absolutely continuous with respect to it; moreover, because of the action of the gauge group, the most sensible value for the normalisation constant would be Z = +∞; and finally, one does not expect a typical ω in the sense of the Yang–Mills measure to be regular enough to have a finite Yang–Mills action.

One of the goals of the 2-dimensional quantum Yang–Mills theory is to find a way of sorting out these difficulties and to construct rigorously a probability measure that can honestly be called the Yang–Mills measure. The situation may look rather desperate, but it is uplifting to realise that after replacing the space of connections, or holonomies, by a space of real-valued functions on [0, 1] and the Yang–Mills action by the square of the Sobolev \(H^{1}\) norm, the analogous problem is almost just as ill-posed but has a very well-known solution, namely the Wiener measure. The main difference between the Wiener and the Yang–Mills cases is the presence in the latter of the gauge symmetry. Symmetry can however be a nuisance or a guide, and it turns out to be possible, in Yang–Mills theory, to make gauge symmetry an ally rather than a foe.
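
For comparison, the heuristic formula analogous to (3) in this toy situation would read

$$\displaystyle \begin{aligned} {\; \mathrm{d}}\mathsf{W}(b)=\frac{1}{Z}\, e^{-\frac{1}{2}\int_{0}^{1}|\dot b(t)|^{2}{\; \mathrm{d}} t}\, {\; \mathrm{d}} b \end{aligned}$$

an expression which suffers from the first and third of the difficulties listed above (there is no Lebesgue-like measure db on a space of functions, and Brownian paths do not have a finite Sobolev norm), but not from the second, and which nevertheless admits a perfectly rigorous incarnation, the Wiener measure.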

We will now describe more precisely the three maps appearing in the diagram (2). The holonomy map and the Yang–Mills action on the space of connections are differential geometric in nature. We start by describing them, and then turn to the Yang–Mills action on the space of holonomies. It would be unfair to say that the content of Section 1.2 can safely be completely ignored: we will refer to it later, in particular in Section 3.2. However, it is certainly possible to skip it at first reading and to jump to Section 1.3.

1.2 The Yang–Mills Action: Connections

In this section, we assume from the reader some familiarity with the differential geometry of principal bundles. We give brief reminders of the main definitions, but this is of course not the place for a complete exposition. For details, and although some might find it too Bourbakist in style, we recommend the second chapter of the first volume of the classical opus by Kobayashi and Nomizu [21].

1.2.1 The Yang–Mills Action

Although we are concerned in this text with compact surfaces, we will describe the Yang–Mills action in the more general context of compact Riemannian manifolds of arbitrary dimension—this is no more difficult.

Let M be a compact connected Riemannian manifold. Let G be a compact Lie group with Lie algebra \({\mathfrak g}\). Assume that \({\mathfrak g}\) is endowed with a scalar product 〈⋅, ⋅〉 that is invariant under the adjoint representation \(\mathrm {{Ad}}:G\to {\mathrm {GL}}({\mathfrak g})\).Footnote 4 Let π : P → M be a principal G-bundle over M.Footnote 5 Let \({\mathcal A}\) denote the space of connections on P. It is an affine subspace of the space of \({\mathfrak g}\)-valued differential 1-forms on P. For every connection \(\omega \in {\mathcal A}\), the curvature of ω is the form \(\Omega =d\omega +\frac {1}{2}[\omega \wedge \omega ]\).Footnote 6 This \({\mathfrak g}\)-valued 2-form on P vanishes on vertical vectors and is G-equivariant. It can thus be seen as a 2-form on M with values in the adjoint bundle Ad(P). Using the Hodge operator of the Riemannian structure of M, one can form the (Ad(P) ⊗Ad(P))-valued form of top degree Ω ∧ ⋆ Ω on M. Contracting this form with the Euclidean structure of Ad(P) induced by the invariant scalar product on \({\mathfrak g}\) yields the real-valued differential form of top degree 〈 Ω∧ ⋆ Ω〉. This form can be integratedFootnote 7 to yield the Yang–Mills action of ω:

$$\displaystyle \begin{aligned} S_{{\mathsf{YM}}}(\omega)=\frac{1}{2}\int_{M} \langle \Omega \wedge {\star}\Omega\rangle\end{aligned} $$
(4)

In words, the Yang–Mills action of a connection is nothing more than one half of the squared \(L^{2}\) norm of its curvature.Footnote 8

Let us describe locally, in coordinates, the scalar function that is integrated over M to compute \(S_{{\mathsf{YM}}}(\omega)\). For this, let us consider an open subset U of M on which there exist local coordinates \(x_{1},\ldots,x_{n}\) on M and over which P is trivial. Let us choose a sectionFootnote 9 σ : U → P of P over U. Let us define \(A=\sigma^{*}\omega\). Then in the local coordinates on U, the 1-form A writes \(A_{1}{\; \mathrm{d}} x_{1}+\ldots+A_{n}{\; \mathrm{d}} x_{n}\), where \(A_{1},\ldots,A_{n}\) are maps from U to \({\mathfrak g}\). Then \(F=\sigma^{*}\Omega\) writes

$$\displaystyle \begin{aligned}F=\sum_{1\leqslant i<j\leqslant n} \big(\partial_{i}A_{j}-\partial_{j}A_{i}+[A_{i},A_{j}]\big){\; \mathrm{d}} x_{i}\wedge \mathrm{d} x_{j}\end{aligned}$$

and the contribution of U to the Yang–Mills action of ω is

$$\displaystyle \begin{aligned}\frac{1}{2}\int_{U}\langle \Omega \wedge {\star}\Omega\rangle=\frac{1}{2}\sum_{1\leqslant i<j\leqslant n} \int_{U} \big\| \partial_{i}A_{j}-\partial_{j}A_{i}+[A_{i},A_{j}]\big\|{}^{2} {\; \mathrm{d}}\text{vol}(x)\end{aligned}$$

where dvol(x) is the Riemannian volume measure on M, and ∥⋅∥ is the Euclidean norm on \({\mathfrak g}\) associated with the invariant scalar product 〈⋅, ⋅〉. The analogy with the squared Sobolev \(H^{1}\) norm should be even more obvious in this expression.
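
For instance, when G = U(1), so that all the brackets vanish, and M is 2-dimensional with local coordinates \(x_{1},x_{2}\), this contribution reduces to

$$\displaystyle \begin{aligned}\frac{1}{2}\int_{U} \big\| \partial_{1}A_{2}-\partial_{2}A_{1}\big\|{}^{2} {\; \mathrm{d}}\text{vol}(x)\end{aligned}$$

which is, up to normalisation, the classical energy functional of an electromagnetic field on U.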

1.2.2 Gauge Transformations

The gauge group, which we denote by \({\! \mathcal {J}}\), is the group of G-equivariant diffeomorphisms of P over the identity of M.Footnote 10 It acts by pull-back on \({\mathcal A}\) and a routine verification shows that it leaves \(S_{{\mathsf{YM}}}\) invariant. Thus, the Yang–Mills action descends to a function

$$\displaystyle \begin{aligned}S_{{\mathsf{YM}}}:{\mathcal A}\big/{\! \mathcal{J}} \to [0,\infty)\end{aligned}$$

the study of which is the subject of classical Yang–Mills theory.

Let us display the formulas which give, through a local section of P, the action of the gauge group on a connection and its curvature. These formulas are indeed useful, and ubiquitous in the literature. Let j : P → P be a gauge transformation. Let σ : U → P be a local section of P over an open subset U of M. Then there exists a unique function g : U → G such that for every x ∈ U, one has j(σ(x)) = σ(x)g(x). Then, letting j act on a connection ω yields the new connection \(j\cdot\omega=j^{*}\omega\) and transforms on one hand A into

$$\displaystyle \begin{aligned}A^{g}=\sigma^{*}(j\cdot\omega)=g^{-1}Ag+g^{-1}{\; \mathrm{d}} g\end{aligned}$$

and on the other hand F into

$$\displaystyle \begin{aligned}F^{g}=g^{-1}Fg\end{aligned}$$

This formula explains the invariance of the Yang–Mills action: without trying to be perfectly precise, one can say that the action of a gauge transformation conjugates the curvature at each point of M by some element of G, and thus leaves its Euclidean norm unchanged.
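
For a matrix group, for which the curvature computed above can be written \(F={\; \mathrm{d}} A+A\wedge A\), this invariance can also be checked by a direct computation: using \({\; \mathrm{d}}(g^{-1})=-g^{-1}({\; \mathrm{d}} g)g^{-1}\), all the terms involving dg cancel and one finds

$$\displaystyle \begin{aligned}F^{g}={\; \mathrm{d}} A^{g}+A^{g}\wedge A^{g}=g^{-1}({\; \mathrm{d}} A+A\wedge A)g=g^{-1}Fg\end{aligned}$$

in accordance with the formula displayed above.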

1.2.3 Some Questions of Classical Yang–Mills Theory

Let us mention, without giving any details, a few examples of the questions that arise in the study of the Yang–Mills action.

  • The set \(S_{{\mathsf {YM}}}^{-1}(0)\) is the moduli space of flat connections, that is, the quotient of the set of flat connections by the action of the gauge group. It is a finite-dimensional orbifold with a rich geometric structure, the study of which is both an old and an active subject of investigation [11, 12, 20, 28, 29, 45, 46].

  • The Yang–Mills action can be understood as arising, through appropriate reformulation and generalisation, from a Lagrangian formulation of Maxwell’s equations of the electromagnetic field. The critical points of the Yang–Mills action are thus of special interest: they are, in a sense, the classical physical fields of Yang–Mills theory. They are called Yang–Mills connections and a milestone in their study in the 2-dimensional case is [1].

  • When M is 4-dimensional, the Yang–Mills action is conformally invariant, in the sense that it depends on the Riemannian metric on M only through its conformal class. There is an extensive literature devoted to Yang–Mills connections on 4-dimensional manifolds [18]. Looking for self-dual Yang–Mills connections on \({\mathbb R}^{4}\) that are invariant by translation in two directions, for example, leads to the study of Hitchin equations and Higgs bundles [19].

  • From a physical point of view, the Yang–Mills action of a connection is an appropriate measure of its non-triviality. From an analytical point of view, however, it turns out that a natural way of measuring a connection is its Sobolev \(H^{1}\) norm.Footnote 11 The Yang–Mills action is controlled by the \(H^{1}\) norm, but not conversely. A flat connection, that is, a connection with Yang–Mills action 0, can be given an arbitrarily large \(H^{1}\) norm by an appropriate gauge transformation. A beautiful theorem of Karen Uhlenbeck states that level sets of the Yang–Mills action, that is, the sets of the form \(\{S_{{\mathsf {YM}}}\leqslant c\}\), \(c\in {\mathbb R}_{+}\), are sequentially weakly compact in \(H^{1}\) up to gauge transformation: from any sequence of connections with bounded Yang–Mills action, one can extract a subsequence which, after suitable gauge transformation of each term, converges weakly in \(H^{1}\) [44].

  • The Yang–Mills action gives rise to a gradient flow, which formally is the solution of the differential equation \(\partial _{t}\omega _{t}=-\nabla _{\omega _{t}}S_{{\mathsf {YM}}}\). This is the Yang–Mills flow [36]. There is currently an active investigation of stochastic perturbations of this flow in cases where M is 2- or 3-dimensional [4, 41].

1.2.4 The Holonomy Map

A fundamental construction associated with a connection is that of the holonomy, or parallel transport, that it induces. For every continuous and piecewise smooth curve c : [0, 1] → M, the parallel transport along c determined by the connection ω is the G-equivariant mapping \(\text{hol}(\omega ,c):P_{c_{0}}\to P_{c_{1}}\) which to every point p of \(P_{c_{0}}\) associates the endpoint of the unique continuous curve \(\tilde c:[0,1]\to P\) such that \(\tilde c_{0}=p\), \(\pi \circ \tilde c=c\) and for all t ∈ [0, 1] at which c is differentiable, \(\omega (\dot {\tilde {c}}_{t})=0\).

This parallel transport enjoys the following properties, which are of fundamental importance.

  • It is unaffected by a change of parametrisation of the curve.

  • If c : [0, 1] → M is a curve and \(c^{-1}\) denotes the same curve traced backwards, that is, \(c^{-1}_{t}=c_{1-t}\), then \(\text{hol}(\omega,c^{-1})=\text{hol}(\omega,c)^{-1}\).

  • If c and c′ are two curves such that \(c_{1}=c^{\prime }_{0}\), so that the concatenation cc′ is well defined, then hol(ω, cc′) = hol(ω, c′) ∘hol(ω, c).

It will be useful to understand a bit more concretely how this parallel transport can be computed, and how it gives rise to a holonomy in the sense that we gave to this word in Section 1.1.

Assume that the range of the curve c lies in an open subset U of M over which the fibre bundle P is trivial.Footnote 12 Let σ : U → P be a section of P over U. Set \(A=\sigma^{*}\omega\). It is a 1-form on U with values in \({\mathfrak g}\). The solution of the differential equation

$$\displaystyle \begin{aligned} \dot h_{t}=-A(\dot c_{t})h_{t}, \ h_{0}=1_{G} \end{aligned} $$
(5)

is a curve h : [0, 1] → G which starts from the unit element \(1_{G}\). The endpoint of this curve computes the parallel transport along c determined by ω in the sense that

$$\displaystyle \begin{aligned}\text{hol}(\omega,c)(\sigma(c_{0}))=\sigma(c_{1})h_{1}\end{aligned}$$

This relation is illustrated in Figure 1.

Fig. 1

The difference between the horizontal lift of c starting at \(\sigma(c_{0})\), denoted in this picture by \(\tilde c\), and σ(c), the image of c by the local section σ, is measured by the function h which solves (5)

Let us introduce the notation

$$\displaystyle \begin{aligned}\text{hol}_{\sigma}(\omega,c)=h_{1}\end{aligned} $$

the holonomy of ω along c read in the section σ. This object has the drawback of depending on the choice of a local section of the bundle, but the great advantage of being fairly concrete, namely an element of G, that is, in many situations, a matrix.
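
To make this description concrete, here is a minimal numerical sketch, not taken from these notes, of the computation of \(\text{hol}_{\sigma}(\omega,c)\) by integration of the differential equation (5), for a made-up \({\mathfrak u}(2)\)-valued 1-form A on the plane and a circle traced once; the particular connection and all the names below are illustrative choices.

```python
# Minimal numerical sketch (not from these notes): integrating the holonomy
# ODE (5) for a made-up u(2)-valued 1-form A = A_1 dx_1 + A_2 dx_2 on the
# plane, along a circle traced once.  The connection below is an arbitrary
# illustrative choice.
import numpy as np
from scipy.linalg import expm

def A1(x):
    return 1j * np.array([[x[1], 0.3], [0.3, -x[1]]])     # skew-Hermitian

def A2(x):
    return 1j * np.array([[0.0, x[0]], [x[0], 0.0]])       # skew-Hermitian

def holonomy(c, cdot, n_steps=2000):
    """Approximate h_1, where dh/dt = -A(c'(t)) h(t) and h_0 = I."""
    h = np.eye(2, dtype=complex)
    dt = 1.0 / n_steps
    for k in range(n_steps):
        t = (k + 0.5) * dt
        x, v = c(t), cdot(t)
        a = A1(x) * v[0] + A2(x) * v[1]       # A(c'(t)), an element of u(2)
        h = expm(-a * dt) @ h                 # one step, staying in U(2)
    return h

c = lambda t: np.array([np.cos(2 * np.pi * t), np.sin(2 * np.pi * t)])
cdot = lambda t: 2 * np.pi * np.array([-np.sin(2 * np.pi * t), np.cos(2 * np.pi * t)])
h1 = holonomy(c, cdot)
print(np.allclose(h1.conj().T @ h1, np.eye(2)))    # True: h_1 is unitary
```

Since the exponential of a small multiple of a skew-Hermitian matrix is unitary, the approximation of h stays in the group at every step.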

If \(j\in {\! \mathcal {J}}\) is a gauge transformation of P, recall from Section 1.2.2 that \(j\cdot\omega=j^{*}\omega\) is the pull-back of ω by the diffeomorphism j of P. Then the holonomy of j ⋅ ω along c is related to that of ω by the relation

$$\displaystyle \begin{aligned}\text{hol}(j\cdot \omega,c)=j_{|P_{c_{1}}}^{-1}\circ \text{hol}(\omega,c)\circ j_{|P_{c_{0}}}\end{aligned}$$

Through the local section σ : U → P, and letting g : U → G be the function such that j(σ(x)) = σ(x)g(x) for every x ∈ U, this relation takes the more explicit form

$$\displaystyle \begin{aligned} \text{hol}_{\sigma}(j\cdot \omega,c)=g_{c_{1}}^{-1}\text{hol}_{\sigma}(\omega,c)g_{c_{0}} \end{aligned} $$
(6)

It follows from (6) that for every loop ℓ on M, that is, every curve which ends at its starting point, the conjugacy class of \(\text{hol}_{\sigma}(\omega,\ell)\) is not affectedFootnote 13 by a gauge transformation of ω.

More generally, given a base point m on M, and denoting by \({\mathcal L}_{m}^{\infty }(M)\) the class of piecewise smooth loops on M based at m, the orbit of

$$\displaystyle \begin{aligned}(\text{hol}_{\sigma}(\omega,\ell) : \ell\in {\mathcal L}_{m}^{\infty}(M))\in \mbox{Maps}({\mathcal L}_{m}^{\infty}(M),G)\end{aligned}$$

under the action of G by simultaneous conjugation is not affected by a gauge transformation of ω. This explains how a connection modulo gauge transformation defines a holonomy modulo conjugation.

The following result makes precise the statement that the horizontal arrow of (2) is injective.

Theorem 1.1

Let m be a point of M. Let σ be a section of P in a neighbourhood of m. For any two connections ω and ω′ on P, the following assertions are equivalent.

  1.

    There exists a gauge transformation \(j\in {\! \mathcal {J}}\) such that j ⋅ ω = ω′.

  2.

    There exists g ∈ G such that for every loop \(\ell \in {\mathcal L}_{m}^{\infty }(M)\), the equality \(\text{hol}_{\sigma}(\omega',\ell)=g^{-1}\,\text{hol}_{\sigma}(\omega,\ell)\,g\) holds.

1.3 The Yang–Mills Action: Holonomies

We will now give an alternative description of the Yang–Mills action that is less classical and, most importantly, specific to the 2-dimensional case. To give an idea of the nature of this second description, let us pursue the analogy with the Wiener measure and the Sobolev \(H^{1}\) norm. Consider a smooth function \(b:[0,1]\to {\mathbb R}\) with b(0) = 0. The squared \(H^{1}\) norm of b can be expressed at least in the following two ways:

$$\displaystyle \begin{aligned} \|b\|{}_{H^{1}}^{2} =\int_{0}^{1} |\dot b(t)|{}^{2} {\; \mathrm{d}} t=\sup_{0\leqslant t_{0}< t_{1}<\ldots<t_{n}\leqslant 1} \sum_{k=1}^{n} \frac{|b(t_{k})-b(t_{k-1})|{}^{2}}{t_{k}-t_{k-1}} \end{aligned} $$
(7)

The integral expression corresponds to the description of the Yang–Mills action that we gave in the last section and is very similar to (4). We will now give another description, similar to the second, more combinatorial one.
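
The fact that the supremum in (7) does not exceed the integral is an instance of the Cauchy–Schwarz inequality: for every subdivision and every k,

$$\displaystyle \begin{aligned}|b(t_{k})-b(t_{k-1})|^{2}=\bigg|\int_{t_{k-1}}^{t_{k}}\dot b(t){\; \mathrm{d}} t\bigg|^{2}\leqslant (t_{k}-t_{k-1})\int_{t_{k-1}}^{t_{k}}|\dot b(t)|^{2}{\; \mathrm{d}} t\end{aligned}$$

so that each sum is bounded by the integral; and since b is smooth, sums along finer and finer subdivisions converge to the integral, whence the equality.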

1.3.1 Holonomies

The main algebraic property of the holonomy of a connection, already mentioned in Section 1.2.4, is that it is a multiplicative map from \({\mathcal L}_{m}^{\infty }(M)\) to G. Let us formulate this in a slightly different way.

Recall that M is a compact Riemannian manifold and G a compact Lie group. Let \({\mathcal P}(M)\) denote the set of all Lipschitz continuousFootnote 14 paths on M, two paths being identified if they differ only by an increasing change of parametrisation. Let us call a function \(h:{\mathcal P}(M)\to G\) multiplicative if it satisfies the following two properties.

  • For every path c, letting \(c^{-1}\) denote the same path traced backwards, one has \(h(c^{-1})=h(c)^{-1}\).

  • For all paths c and c′ such that c finishes where c′ starts, so that the concatenated path cc′ is defined, one has h(cc′) = h(c′)h(c).

More generally, given a subset P of \({\mathcal P}(M)\), we say that a function h : P → G is multiplicative if it satisfies the above conditions whenever all the paths involved belong to the subset P.

Let us denote by \(\mathrm {Mult}({\mathcal P}(M),G)\) (resp. by Mult(P, G)) the subset of \(\mathrm {Maps}({\mathcal P}(M),G)\) (resp. of Maps(P, G)) formed by all multiplicative maps.

There is an action of the gauge group Maps(M, G) on \(\mbox{Mult}({\mathcal P}(M),G)\) defined as follows. Consider g : M → G and a multiplicative map \(h:{\mathcal P}(M)\to G\). For every path c starting at \(c_{0}\) and finishing at \(c_{1}\), define

$$\displaystyle \begin{aligned} (g\cdot h)(c)=g_{c_{1}}^{-1}h(c)g_{c_{0}} \end{aligned} $$
(8)

an equation that should be compared with (6). It is not difficult to check that the map g ⋅ h is still multiplicative.
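
Indeed, if c and c′ are two paths such that the concatenation cc′ is defined, then \(c_{1}=c^{\prime}_{0}\) and

$$\displaystyle \begin{aligned}(g\cdot h)(cc')=g_{c^{\prime}_{1}}^{-1}h(c')h(c)g_{c_{0}}=\big(g_{c^{\prime}_{1}}^{-1}h(c')g_{c^{\prime}_{0}}\big)\big(g_{c_{1}}^{-1}h(c)g_{c_{0}}\big)=(g\cdot h)(c')\,(g\cdot h)(c)\end{aligned}$$

and the compatibility with the inversion of paths is checked in the same way.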

Let m be a point of M. A multiplicative function can be restricted to \({\mathcal L}_{m}(M)\) and the action of Maps(M, G) on this restricted map reduces to the action of G by conjugation. The following fact may seem surprising at first sight, but it is not difficult to prove.

Proposition 1.2

For all m  M, the restriction map

$$\displaystyle \begin{aligned}\mathit{\mbox{Mult}}({\mathcal P}(M),G)\big/ \mathit{\mbox{Maps}}(M,G) \longrightarrow \mathrm{Mult}({\mathcal L}_{m}(M),G)\big/ G\end{aligned}$$

is a bijection.

We call either side of this bijection the space of holonomies. Thanks to the multiplicativity and the gauge symmetry, a holonomy can either be seen as a group-valued function on the set of all paths, or on the set of all loops based at some reference point m on M.

1.3.2 Graphs on Surfaces

We will now assume that M is a 2-dimensional manifold: it is thus a compact surface. We announced an expression of the Yang–Mills action similar to the rightmost term of (7): the role of subdivisions of the interval [0, 1] will be played by graphs on M. This will be the occasion of a first encounter with this notion that is central to the construction of the 2-dimensional Yang–Mills measure.

Let us call edge an element of \({\mathcal P}(M)\) that is injective — note that this does not depend on the way in which the path is parameterised. A graph is a finite set of edges, stable under the reversal map \(e\mapsto e^{-1}\), and in which any two edges either form a pair \(\{e,e^{-1}\}\), or meet, if at all, at some of their endpoints.

The vertices of a graph are the endpoints of its edges. The faces of a graph are the connected components of the complement in M of the union of its edges. A graph is conveniently described as a triple \({\mathbb G}=({\mathbb V},{\mathbb E},{\mathbb F})\) consisting of a set of vertices, a set of edges and a set of faces, but it is in fact entirely determined by the set \({\mathbb E}\) of its edges.

A crucial additional assumption is that every face of a graph must be homeomorphic to a disk. This guarantees that the 1-skeleton of the graph correctly represents the topology of the surface, to the extent that a 1-dimensional object can represent a 2-dimensional one.

1.3.3 The Yang–Mills Action

Let \({\mathbb G}\) be a graph on our compact surface M. We will denote by \({\mathcal P}({\mathbb G})\) the set of paths that can be constructed as concatenations of edges of \({\mathbb G}\). To each face F of \({\mathbb G}\), we can associate in an almost unequivocal way a loop ∂F that winds exactly once around F. To give a perfectly rigorous definition of this loop is less simple than one might expect, but there is nothing counterintuitive in it. It is only almost well defined because there is no preferred starting point for this loop. However, if \(h:{\mathcal P}({\mathbb G})\to G\) is a multiplicative function, then the conjugacy class of the element h(∂F) of G is well defined. In particular, the Riemannian distance, in G, between the element h(∂F) and the unit element \(1_{G}\), is well defined.Footnote 15 This distance is, moreover, not affected by the action of an element of the gauge group Maps(M, G) on h.

We can now define the Yang–Mills action on the space of holonomies by setting, for all \(h\in \mathrm {Mult}({\mathcal P}(M),G)\),

$$\displaystyle \begin{aligned} S_{{\mathsf{YM}}}(h)=\sup\bigg\{\sum_{F\in {\mathbb F}} \frac{d_{G}(1_{G},h(\partial F))^{2}}{\text{area}(F)}\ :\ {\mathbb G} \text{ graph on } M\bigg\} \end{aligned} $$
(9)

where the area of a face F is computed using the Riemannian structure on M.

It is manifest from this expression that, in the case where M is a surface, the only part of the Riemannian structure on M that is used in the definition of the Yang–Mills action is the Riemannian volume, in this case the Riemannian area. This is of course also true, be it in a slightly less apparent way, of the definition (4).

Proposition 1.3

Assume that M is 2-dimensional. Then the definitions (4) and (9) of the Yang–Mills action agree. More precisely, for every connection ω inducing a holonomy h, the equality \(S_{{\mathsf{YM}}}(\omega)=S_{{\mathsf{YM}}}(h)\) holds.

1.4 The Yang–Mills Holonomy Process

We will now explain how to construct the Yang–Mills holonomy process. Although the definition of this process is derived, at a heuristic level, from the Yang–Mills action, the process and the action are logically unrelated. We can thus start afresh, from a compact surface M on which we have a Riemannian structure, or at least a measure of area, and a compact Lie group G, on the Lie algebra of which we have an invariant scalar product.

1.4.1 The Configuration Space of Lattice Yang–Mills Theory

One piece of information that we need to retain from the previous sections is the notion of graph on our surface M (see Section 1.3.2). Let us choose a graph \({\mathbb G}=({\mathbb V},{\mathbb E},{\mathbb F})\) on M. The configuration space associated with a graph \({\mathbb G}\) on our surface M is the manifold

$$\displaystyle \begin{aligned}{\mathcal C}_{{\mathbb G}}=\{g=(g_{e})_{e\in {\mathbb E}} \in G^{{\mathbb E}} : \forall e\in {\mathbb E}, g_{e^{-1}}=g_{e}^{-1}\}=\mathrm{Mult}({\mathbb E},G)\end{aligned}$$

of all ways of assigning an element of G to each oriented edge, in a way that is consistent with the orientation reversal.

Recall that we denote by \({\mathcal P}({\mathbb G})\) the set of paths that can be constructed as concatenations of edges of \({\mathbb G}\). The configuration space \({\mathcal C}_{{\mathbb G}}\) is naturally in one-to-one correspondence with the set \(\mathrm {Mult}({\mathcal P}({\mathbb G}),G)\) of all multiplicative maps from \({\mathcal P}({\mathbb G})\) to G.

Choosing an orientation of \({\mathbb G}\), that is, a subset \({\mathbb E}^{+}\subset {\mathbb E}\) containing exactly one element in each pair \(\{e,e^{-1}\}\), allows one to realise the configuration space in the slightly less canonical, but easier to handle, way

$$\displaystyle \begin{aligned}{\mathcal C}_{{\mathbb G}}=G^{{\mathbb E}^{+}}\end{aligned}$$

This makes it easy, for instance, to endow \({\mathcal C}_{{\mathbb G}}\) with a probability measure, namely the Haar measure on \(G^{{\mathbb E}^{+}}\). The invariance of the Haar measure on the compact group G under the inverse map \(x\mapsto x^{-1}\) implies that this measure on \({\mathcal C}_{{\mathbb G}}\) does not depend on the choice of orientation. We denote it by dg.

Every path \(c\in {\mathcal P}({\mathbb G})\) can be uniquely written as a concatenation of edges \(c=e_{1}^{\epsilon _{1}}\ldots e_{n}^{\epsilon _{n}}\) with \(e_{1},\ldots ,e_{n}\in {\mathbb E}^{+}\) and \(\epsilon_{1},\ldots,\epsilon_{n}\in\{-1,1\}\). To such a path \(c=e_{1}^{\epsilon _{1}}\ldots e_{n}^{\epsilon _{n}}\) we associate a holonomy map

$$\displaystyle \begin{aligned} h_{c}: {\mathcal C}_{{\mathbb G}} & \longrightarrow G \\ g & \longmapsto g_{e_{n}}^{\epsilon_{n}}\ldots g_{e_{1}}^{\epsilon_{1}} \end{aligned} $$
(10)

Our goal is to endow the configuration space \({\mathcal C}_{{\mathbb G}}\) with an interesting probability measure, so as to make the collection of maps \((h_{c})_{c\in {\mathcal P}({\mathbb G})}\) into a collection of G-valued random variables.

1.4.2 The Driver–Sengupta Formula

In order to define this probability measure, we need to introduce the heat kernel on G, or more accurately the fundamental solution of the heat equation. The invariant scalar product on the Lie algebra \({\mathfrak g}\) determines a bi-invariant Riemannian structure on G, and a Laplace-Beltrami operator Δ. We consider the function \(p:{\mathbb R}^{*}_{+}\times G \to {\mathbb R}^{*}_{+}\) that is the unique positive solution of the heat equation \((\partial _{t}-\frac {1}{2} \Delta )p=0\) with initial condition \(p(t,x){\; \mathrm {d}} x\Rightarrow \delta _{1_{G}}\) as t → 0. We use the notation \(p_{t}(x)=p(t,x)\). A crucial property of this function is that, for all t > 0 and all x, y ∈ G, we have \(p_{t}(yxy^{-1})=p_{t}(x)\). We refer to this property as the invariance under conjugation of the heat kernel.

We mentioned at the end of Section 1.3.3 that, in the 2-dimensional setting, the Yang–Mills action depends on a Riemannian structure of the surface M only through the Riemannian area that it induces. We will denote by |F| the area of a Borel subset F of M.

Given a face F of our graph, recall that we denote by ∂F a path that goes once around this face in the positive direction. Recall also that this path is ill-defined because there is no preferred vertex on the boundary of F from which to start it. However, this indeterminacy only results in an indeterminacy up to conjugation for the holonomy map \(h_{\partial F}\). Thanks to the invariance under conjugation of the heat kernel, the function \(g\mapsto p_{t}(h_{\partial F}(g))\) is still well defined on \({\mathcal C}_{{\mathbb G}}\) for every t > 0.

We can now write the formula which is the basis of the definition of the 2-dimensional Yang–Mills measure. It is due to Bruce Driver in the case where M is the plane, or a disk, and to Ambar Sengupta when M is an arbitrary compact surface. Recall that T is a positive real parameter of the measure. We define, on \({\mathcal C}_{{\mathbb G}}\), the probability measure

$$\displaystyle \begin{aligned} \mathrm{d}\mu^{{\mathbb G},T}_{{\mathsf{YM}}}(g)=\frac{1}{Z({\mathbb G},T)} \prod_{F\in {\mathbb F}} p_{T|F|}(h_{\partial F}(g)){\; \mathrm{d}} g \end{aligned} $$
(DS)

Here, \(Z({\mathbb G},T)\) is the normalisation constant that makes \(\mu ^{{\mathbb G},T}_{{\mathsf {YM}}}\) a probability measure on \({\mathcal C}_{{\mathbb G}}\).

The gauge group \(\mbox{Maps}({\mathbb V},G)\) acts on the configuration space \({\mathcal C}_{{\mathbb G}}\) by a formula analogous to (8), and the measure \(\mu ^{{\mathbb G},T}_{{\mathsf {YM}}}\) is invariant under this action. Indeed, this action preserves the reference measure dg and transforms the holonomy along loops, in this case along boundaries of faces, by conjugation, which leaves the value of the fundamental solution of the heat equation on these holonomies unchanged.Footnote 16

1.4.3 Invariance Under Subdivision

Starting from a graph \({\mathbb G}\) on our surface M, we built the configuration space \({\mathcal C}_{{\mathbb G}}\) and endowed, thanks to the Driver–Sengupta formula, this space with a probability measure, the lattice 2-dimensional Yang–Mills measure on \({\mathbb G}\). In doing so, we automatically produced a collection

$$\displaystyle \begin{aligned}(h_{c})_{c\in {\mathcal P}({\mathbb G})} \ \text{ or } \ (h_{\ell})_{\ell\in {\mathcal L}_{m}({\mathbb G})}\end{aligned}$$

of G-valued random variables.Footnote 17

The property of this construction that makes it so extremely pleasant is the fact that it is invariant under subdivision.

To articulate this fundamental property, let us say that a graph \({\mathbb G}_{2}\) is finer than a graph \({\mathbb G}_{1}\) if \({\mathbb G}_{2}\) can be obtained from \({\mathbb G}_{1}\) by subdividing and adding edges. More precisely, \({\mathbb G}_{2}\) is finer than \({\mathbb G}_{1}\) if \({\mathbb E}_{1}\subset {\mathcal P}({\mathbb G}_{2})\): each edge of \({\mathbb G}_{1}\) is a path in \({\mathbb G}_{2}\). When this happens, there is a natural map

$$\displaystyle \begin{aligned} {\mathcal C}_{{\mathbb G}_{2}} & \longrightarrow {\mathcal C}_{{\mathbb G}_{1}} \\ g^{(2)} & \longmapsto \big(h^{(2)}_{e}(g^{(2)})\big)_{e\in {\mathbb E}_{1}} \end{aligned} $$

where each edge e of \({\mathbb G}_{1}\) is seen as a path in \({\mathbb G}_{2}\) and thus assigned a holonomy by the configuration g (2).

The main result of 2-dimensional lattice Yang–Mills theory is the following.

Theorem 1.4

Let \({\mathbb G}_{1}\) and \({\mathbb G}_{2}\) be two graphs on M. Assume that \({\mathbb G}_{2}\) is finer than \({\mathbb G}_{1}\). Then for all T > 0, the equality \(Z({\mathbb G}_{1},T)=Z({\mathbb G}_{2},T)\) holds and the push-forward of the measure \(\mu ^{{\mathbb G}_{2},T}_{{\mathsf {YM}}}\) by the natural map \({\mathcal C}_{{\mathbb G}_{2}}\to {\mathcal C}_{{\mathbb G}_{1}}\) is the measure \(\mu ^{{\mathbb G}_{1},T}_{{\mathsf {YM}}}\).

This theorem is so important that we are going to give an idea of the mechanism of its proof.

Proof

The first observation is that one can always go from a graph to a finer graph by an appropriate succession of elementary operations consisting either in adding a new vertex in the middle of an existing edge or in adding a new edge between two existing vertices. We need to understand why neither of these elementary operations affects the partition function, nor essentially transforms the measure.

The subdivision of an edge e into two new edges e′ and e″ amounts, in the integral defining the partition function and in the expression defining the discrete Yang–Mills measure, to the replacement of every occurrence of the integration variable \(g_{e}\) by the product of the two new variables \(g_{e''}g_{e'}\). The invariance by translation of the Haar measure ensures that this does not affect the result of any computation.

The case of the addition of a new edge is more interesting. This edge e splits a face F into two faces \(F_{1}\) and \(F_{2}\), the boundaries of which are of the form ea and \(be^{-1}\) for some paths a and b. Observe that ba is a loop going along the boundary of F. In the computation of the partition function of the Yang–Mills measure on the finer graph, or of the integral of any functional on the configuration space of the coarser graph with respect to the image of the discrete Yang–Mills measure on the finer graph, we find an integral of a product of many factors, among which the two factors

$$\displaystyle \begin{aligned}p_{T|F_{1}|}\big(h_{a}(g)g_{e}\big)\ p_{T|F_{2}|}\big(g_{e}^{-1}h_{b}(g)\big)\end{aligned}$$

contain the only two occurrences of the integration variable \(g_{e}\). We can thus easily integrate with respect to \(g_{e}\), using the convolution property of the heat kernel, namely the equality \(p_{t}*p_{s}=p_{t+s}\), to find these two factors replaced by

$$\displaystyle \begin{aligned}p_{T(|F_{1}|+|F_{2}|)}\big(h_{a}(g)h_{b}(g)\big)=p_{T|F|}\big(h_{ba}(g)\big)=p_{T|F|}\big(h_{\partial F}(g)\big)\end{aligned}$$

We are thus left with the partition function, or the integral of our functional, relative to the coarser graph. □
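
To see the convolution mechanism of this proof at work on the simplest possible example, here is a small numerical sketch, not taken from these notes, for the group G = U(1), with the scalar product on its Lie algebra for which the Laplacian is \(\frac{{\mathrm{d}}^{2}}{{\mathrm{d}}\theta^{2}}\); the heat kernel is then the Fourier series \(p_{t}(e^{i\theta})=\sum_{n\in{\mathbb Z}}e^{-n^{2}t/2}e^{in\theta}\).

```python
# Illustrative sketch (not from these notes): the convolution property
# p_t * p_s = p_{t+s} used above, checked numerically for G = U(1).
import numpy as np

def heat_kernel_u1(t, theta, n_max=50):
    """Truncated Fourier series of the heat kernel on U(1)."""
    n = np.arange(-n_max, n_max + 1)
    return np.real(np.sum(np.exp(-n ** 2 * t / 2) * np.exp(1j * n * theta)))

def convolution(t, s, theta, n_grid=2000):
    """(p_t * p_s)(e^{i theta}): the Haar integral over x = e^{i phi} of
    p_t(x) p_s(x^{-1} e^{i theta}), computed with the measure d(phi)/(2 pi)."""
    phi = np.linspace(0.0, 2 * np.pi, n_grid, endpoint=False)
    values = [heat_kernel_u1(t, p) * heat_kernel_u1(s, theta - p) for p in phi]
    return np.mean(values)

theta = 0.7
print(convolution(0.3, 0.5, theta), heat_kernel_u1(0.8, theta))
# the two printed values agree up to truncation and quadrature error
```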

The partition function \(Z({\mathbb G},T)\), which is now promoted to a function of T alone, is a very interesting object. Let us give without proof an expression of this function. We use the notation \([a,b]=aba^{-1}b^{-1}\) for the commutator of two elements a and b of G.

Proposition 1.5

Assume that M is a surface of genus g without boundary. Then for all T > 0, the partition function of the 2-dimensional Yang–Mills theory on M is given by

$$\displaystyle \begin{aligned}Z_{M}(T)=\int_{G^{2g}}p_{T|M|}([a_{1},b_{1}]\ldots [a_{g},b_{g}]){\; \mathrm{d}} a_{1}\mathrm{d} b_{1}\ldots \mathrm{d} a_{g} \mathrm{d} b_{g}\end{aligned}$$
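
To give an idea of what this expression yields, let us mention, without carrying out the computation, that for the torus (g = 1), expanding \(p_{T|M|}\) along the characters of G and using the standard identity \(\int_{G}\chi_{\alpha}(axa^{-1}y){\; \mathrm{d}} a=\chi_{\alpha}(x)\chi_{\alpha}(y)/d_{\alpha}\), where \(d_{\alpha}\) and \(\chi_{\alpha}\) denote the dimension and the character of an irreducible representation α (see Section 2.1 for this notation), one finds

$$\displaystyle \begin{aligned}Z_{M}(T)=\sum_{\alpha} e^{-\frac{c_{2}(\alpha)T|M|}{2}}\end{aligned}$$

the sum running over the irreducible representations of G; for a surface of genus g, the same computation produces an extra factor \(d_{\alpha}^{2-2g}\) in each term.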

1.4.4 The Continuum Limit

Up to some conceptually inessential but technically annoying complications, the invariance by subdivision of the discrete theory allows one to take the limit of the discrete measures as the graphs on the surface become infinitely fine. The technical complications have to do with the fact that, because two edges of two distinct graphs can intersect in a rather pathological way, it is not always true that given two graphs, there exists a third graph that is finer than these two graphs. The net effect of this complication is the persistence, in the theorem asserting the existence and uniqueness of the Yang–Mills holonomy process, of a continuity condition. We say that a sequence of paths \((c_{n})_{n\geqslant 1}\) on M converges to a path c with fixed endpoints if all paths c, c 1, c 2, … start at the same point and finish at the same (possibly different) point, and if the sequence of the paths \((c_{n})_{n\geqslant 1}\) parameterised at unit speed converges uniformly to c.

Theorem 1.6

(The Yang–Mills holonomy process, [23, 40]) Let M be a compact surface endowed with a smooth Footnote 18 measure of area. Let G be a compact Lie group, the Lie algebra of which is endowed with an invariant scalar product. There exists a collection of G-valued random variables \((H_{c})_{c\in {\mathcal P}(M)}\) such that

  • for every graph \({\mathbb G}=({\mathbb V},{\mathbb E},{\mathbb F})\), the distribution of \((H_{e})_{e\in {\mathbb E}}\) is the measure \(\mu ^{{\mathbb G},T}_{{\mathsf {YM}}}\),

  • whenever a sequence \((c_{n})_{n\geqslant 1}\) of paths converges with fixed endpoints to a path c, the sequence of random variables \((H_{c_{n}})_{n\geqslant 1}\) converges in probability to H c.

Moreover, any two collections of G-valued random variables with these properties have the same distribution.

The Yang–Mills holonomy process \((H_{c})_{c\in {\mathcal P}(M)}\) is invariant in distribution under the action of the gauge group. This means that for every function g : M → G, the following equality in distribution holds:

$$\displaystyle \begin{aligned} \Big(g(\overline{c})^{-1}H_{c}g(\underline{c})\Big)_{c\in {\mathcal P}(M)}\stackrel{(d)}{=}(H_{c})_{c\in {\mathcal P}(M)} \end{aligned} $$
(11)

where \( \underline {c}\) and \(\overline {c}\) denote respectively the starting and finishing point of a path c. In particular, the distribution of \(H_{c}\) is uniform on G for every path c that is not a loop. Of course, this huge collection of uniform random variables is correlated in a complicated way, in particular to allow the random variables associated with loops to have non-uniform distributions.

The holonomy process also enjoys a property of invariance under area-preserving maps of M: if ϕ : M → M is an area-preserving diffeomorphism, then ϕ preserves the class \({\mathcal P}(M)\) and the family \((H_{\phi (c)})_{c\in {\mathcal P}(M)}\) has the same distribution as the family \((H_{c})_{c\in {\mathcal P}(M)}\). This is because the Driver–Sengupta formula depends only on the combinatorial structure of the graph under consideration, and on the areas of its faces. This is consistent with the fact that the Yang–Mills action, which we originally defined on a Riemannian manifold by (4), depends, if the manifold is 2-dimensional, on the Riemannian structure only through the Riemannian area. We already mentioned this important point in relation with the expression (9) of the Yang–Mills action.

1.4.5 The Structure of the Holonomy Process

The structure of the Yang–Mills holonomy process can be described fairly concretely provided one understands the structure of the set of loops on a graph.

Let us consider a graph \({\mathbb G}\) on M and a vertex m of this graph. We denote naturally by \({\mathcal L}_{m}({\mathbb G})\) the set of loops in \({\mathbb G}\) based at m. The operation of concatenation makes \({\mathcal L}_{m}({\mathbb G})\) a monoid, with unit element the constant loop at m. Each element ℓ of this monoid has an ‘inverse’ \(\ell^{-1}\), but it is not true, unless ℓ is already the constant loop, that \(\ell\ell^{-1}\) is the constant loop. In order to make \({\mathcal L}_{m}({\mathbb G})\) a group, in which \(\ell^{-1}\) is truly the inverse of ℓ, it is natural to introduce on it the backtracking equivalence relation, for which two loops are equivalent if one can go from one to the other by successively erasing or inserting sub-loops of the form \(ee^{-1}\), where e is an edge of the graph.

Each equivalence class of loops contains a unique loop of shortest length, which is also the unique reduced loop in this class, where by a reduced loop we mean one without any sub-loop of the form ee −1.

Moreover, concatenation is compatible with this equivalence relation and the quotient monoid is a group. This quotient monoid can be more concretely described as the set \({\mathcal L}_{m}^{\mathrm {red}}({\mathbb G})\) of reduced loops endowed with the operation of concatenation-followed-by-reduction.

With this group of reduced loops in hand, we can make several observations.

  • Each element g of the configuration space \({\mathcal C}_{{\mathbb G}}\) induces, by the holonomy map, a map \({\mathcal L}_{m}^{\mathrm {red}}({\mathbb G})\to G\), which sends a loop ℓ to \(h_{\ell}(g)\). This map is a group homomorphism, and the map

    $$\displaystyle \begin{aligned}{\mathcal C}_{{\mathbb G}} \longrightarrow \mathrm{Hom}({\mathcal L}_{m}^{\mathrm{red}}({\mathbb G}),G)\end{aligned}$$

    is onto. Moreover, it descends to a bijection

    $$\displaystyle \begin{aligned}{\mathcal C}_{{\mathbb G}}/\mbox{Maps}({\mathbb V},G) \,{\mathop{\kern 0pt\longrightarrow}\limits_{}^{\sim}}\, \mathrm{Hom}({\mathcal L}_{m}^{\mathrm{red}}({\mathbb G}),G)/G\end{aligned}$$

    where the action on the left is that of the gauge group, and the action on the right is that of G by conjugation.

  • Let Γ denote the 1-skeleton of the graph, that is, the union of the ranges of its edges. The map \({\mathcal L}_{m}^{\mathrm {red}}({\mathbb G})\to \pi _{1}(\Gamma ,m)\) which simply sends a reduced loop to its homotopy class is an isomorphism.

  • The group \({\mathcal L}_{m}^{\mathrm {red}}({\mathbb G})\), being isomorphic to the fundamental group of a graph, or of a 1-dimensional complex, is a free group. The rank of this group is equal to \(|{\mathbb E}|-|{\mathbb V}|+1=|{\mathbb F}|-\chi (M)+1=|{\mathbb F}|+2g-1\), where χ(M) is the Euler characteristic of M and g its genus.

It is useful to recognise that the free group \({\mathcal L}_{m}^{\mathrm {red}}({\mathbb G})\) admits nice bases.Footnote 19 Let us call lasso around a face F of \({\mathbb G}\) any loop of the form \(c.\partial F.c^{-1}\), where c is a path from m to a vertex on the boundary of F, and ∂F is a loop going once around F.

It is now quite easy to describe the holonomy process. Let us begin with the case of the plane, or the disk.

Proposition 1.7

Assume that M is a disk or the plane. Let \({\mathbb G}\) be a graph on M. The free group \({\mathcal L}_{m}^{\mathrm {red}}({\mathbb G})\) admits a basis \(\{\lambda _{F}: F\in {\mathbb F}\}\) such that

  • for each face F, the loop λ F is a lasso around F,

  • under the lattice Yang–Mills measure \(\mu ^{{\mathbb G},T}_{{\mathsf {YM}}}\), the random variables \((H_{\lambda _{F}} : F\in {\mathbb F})\) are independent, each \(H_{\lambda _{F}}\) being distributed according to the measure \(p_{T|F|}(g){\; \mathrm{d}} g\).

In a sense, the holonomy process has independent increments distributed according to the fundamental solution of the heat equation: it can be described as a ‘Brownian motion on G indexed by loops’ on the disk, or on the plane. The role of time is played by area, and increments occur along faces of the graph, or lassos, instead of intervals of time.

In the case of a closed surface, the situation is slightly different. In this case, the most natural presentation of the group \({\mathcal L}_{m}^{\mathrm {red}}({\mathbb G})\) is not as a free group (which it is), but with one generator too many, and one relation.

Proposition 1.8

Assume that M is a closed surface of genus g. Let \({\mathbb G}\) be a graph on M. Set \(r=|{\mathbb F}|\) . The free group \({\mathcal L}_{m}^{\mathrm {red}}({\mathbb G})\) admits a presentation

$$\displaystyle \begin{aligned}{\mathcal L}_{m}^{\mathrm{red}}({\mathbb G})=\big\langle \lambda_{F_{1}},\ldots,\lambda_{F_{r}}, a_{1},b_{1},\ldots,a_{g},b_{g} \ \big|\ [a_{1},b_{1}]\ldots [a_{g},b_{g}]=\lambda_{F_{1}}\ldots \lambda_{F_{r}}\big\rangle\end{aligned}$$

where

  • the loops \(\lambda _{F_{1}},\ldots ,\lambda _{F_{r}}\) are lassos around the r faces of \({\mathbb G}\),

  • the homotopy classes of the loops a 1, b 1, …, a g, b g generate π 1(M, m),

  • for every test function \(f:G^{2g+r}\to {\mathbb C}\) , one has

    $$\displaystyle \begin{aligned} \int_{{\mathcal C}_{{\mathbb G}}} f\big(H_{\lambda_{F_{1}}},\ldots,H_{\lambda_{F_{r}}},H_{a_{1}},H_{b_{1}},\ldots,H_{a_{g}},H_{b_{g}}\big) {\; \mathrm{d}}\mu^{{\mathbb G},T}_{{\mathsf{YM}}} \\ =\frac{1}{Z_{M}(T)}\int_{G^{2g+r-1}} f(z_{1},\ldots,z_{r},a_{1},b_{1},\ldots,a_{g},b_{g})\, p_{T|F_{1}|}(z_{1})\ldots p_{T|F_{r}|}(z_{r}) \\ {\; \mathrm{d}} z_{1}\ldots {\; \mathrm{d}} z_{r-1}\, {\; \mathrm{d}} a_{1}{\; \mathrm{d}} b_{1}\ldots {\; \mathrm{d}} a_{g} {\; \mathrm{d}} b_{g} \end{aligned} $$

    (12)

    where in the last integral, \(z_{r}\) stands for

    $$\displaystyle \begin{aligned}z_{r}=(z_{r-1}\ldots z_{1}[a_{g},b_{g}]\ldots [a_{1},b_{1}])^{-1}\end{aligned}$$

Let us try to spell out the probabilistic content of this result. The presentation of the group \({\mathcal L}_{m}^{\mathrm {red}}({\mathbb G})\) that we chose splits it into a homotopically trivial part, giving rise to the random variables \(H_{\lambda _{1}},\ldots ,H_{\lambda _{r}}\), and a system of generators of the fundamental group of M, associated with the random variables \(H_{a_{1}},H_{b_{1}},\ldots ,H_{a_{g}},H_{b_{g}}\). A particular role is played by the homotopically trivial loop \(C=[a_{1},b_{1}]\ldots[a_{g},b_{g}]\).

  • The distribution of the random variable \(H_{C}\) is such that for every continuous test function \(\tilde f:G\to {\mathbb C}\),

    $$\displaystyle \begin{aligned}\int_{{\mathcal C}_{{\mathbb G}}} \tilde f(H_{C}){\; \mathrm{d}}\mu^{{\mathbb G},T}_{{\mathsf{YM}}}=Z_{M}(T)^{-1}\int_{G^{2g}}(\tilde f p_{T|M|})([a_{1},b_{1}]\ldots [a_{g},b_{g}]){\; \mathrm{d}} a_{1} {\; \mathrm{d}} b_{1}\ldots {\; \mathrm{d}} a_{g} {\; \mathrm{d}} b_{g}\end{aligned}$$

    This does not seem to be a particularly well-known distribution. It need not have a density with respect to the Haar measure: for instance if G = U(N), it is supported by the Haar-negligible subgroup SU(N). However, it is, by definition, absolutely continuous with respect to the distribution of the product of g independent commutators of independent uniformly distributed random variables, and this distribution, for example if G = SU(N) and provided \(g\geqslant 2\), is absolutely continuous with respect to the Haar measure. It is also possible to write a Fourier series for this distribution, but it involves Littlewood–Richardson coefficients, or more generally an understanding of the tensor product of irreducible representations of G.

  • Conditional on \(H_{C}\), the families \((H_{\lambda _{1}},\ldots ,H_{\lambda _{r}})\) and \((H_{a_{1}},H_{b_{1}},\ldots ,H_{a_{g}},H_{b_{g}})\) are independent. It is also true that the random variables

    $$\displaystyle \begin{aligned}(H_{\lambda_{1}},\ldots,H_{\lambda_{r}})\text{ mod }G \ \ \ \text{ and } \ \ \ (H_{a_{1}},H_{b_{1}},\ldots,H_{a_{g}},H_{b_{g}})\text{ mod }G\end{aligned}$$

    with values in \(G^{r}/G\) and \(G^{2g}/G\), where G acts by conjugation, are independent conditional on \(H_{C}\) mod G, that is, conditional on the conjugacy class of \(H_{C}\).

On a surface of genus g, the probabilistic backbone of the holonomy process can thus be described as consisting of a segment of a Brownian motion on G of length T|M| and 2g independent Haar distributed random variables on G, jointly conditioned on the final point of the Brownian motion being equal to the product of the g commutators of the uniform random variables taken in pairs.

The case where M is a sphere is special, in the sense that it involves no uniform random variables, but only a Brownian bridge on G going from \(1_{G}\) to \(1_{G}\) in a time equal to T times the total area of the sphere.

1.5 Wilson Loop Expectations

A different approach to the description of the distribution of the Yang–Mills holonomy process consists in identifying a natural class of scalar, gauge-invariant, functionals of this process, the distribution of which is hoped to contain as much information as possible. The most natural class of such functionals is that of Wilson loop functionals, which are indeed the most important scalar observables of the theory. A Wilson loop functional is constructed by choosing a certain number of loops \(\ell_{1},\ldots,\ell_{n}\) on M, then the same number of conjugation-invariant functions \(\chi _{1},\ldots ,\chi _{n}:G\to {\mathbb C}\) and by forming the product

$$\displaystyle \begin{aligned} \chi_{1}(H_{\ell_{1}})\ldots \chi_{n}(H_{\ell_{n}}) \end{aligned} $$
(13)

When G is a group of matrices, the simplest choice of conjugation-invariant function is the trace. The Wilson loop expectations, which play in this theory the role of n-point functions, are the numbers

$$\displaystyle \begin{aligned} {\mathbb E}[\mathrm{Tr}(H_{\ell_{1}})\ldots \mathrm{Tr}(H_{\ell_{n}})] \end{aligned} $$
(14)

the computation of which is a seemingly endless subject of reflection. We will discuss in the next section a few concrete examples of computation of such numbers. For the time being, let us say a word about the amount of information that they carry.

Suppose we know the collection of all the numbers (14), or more generally the expectation of all functionals of the form (13). Then we know the joint distribution of all random variables of the form \(\chi(H_{\ell})\) where ℓ is a loop and \(\chi :G\to {\mathbb C}\) is an invariant function. Since G is compact, invariant functions separate conjugacy classes and we know, in fact, the joint distribution of the conjugacy classes of all variables \(H_{\ell}\). This is certainly an important piece of information. However, the form of the action of the group of gauge transformations on the collection of holonomies, as given by (11), indicates that this action preserves more than just the individual conjugacy classes of the holonomies. Indeed, if \(\ell_{1},\ldots,\ell_{n}\) are based at the same point, then it is the orbit of \((H_{\ell _{1}},\ldots ,H_{\ell _{n}})\) under the operation of simultaneous conjugation

$$\displaystyle \begin{aligned}(h_{1},\ldots,h_{n})\mapsto (gh_{1}g^{-1},\ldots,gh_{n}g^{-1})\end{aligned}$$

that is gauge-invariant. To grasp the geometric meaning of this invariance, it is useful to take a concrete example for G, say G = SU(N) or even G = SO(3). In these groups, knowing the individual conjugacy classes of a collection of elements amounts to knowing their eigenvalues, that is, in the case of SO(3), the angles of the rotations. On the other hand, to know the orbit of these elements under simultaneous conjugation requires the additional knowledge of the relative positions of their eigenspaces, or for rotations, the relative positions of their axes.

The main question is then the following. Is it the case that the Wilson loop expectations describe not only the individual conjugacy classes of the G-valued random variables that constitute the Yang–Mills process, but also the simultaneous conjugacy class of all variables associated with the loops based at some point m of M? In more precise terms, is it true that the algebra of functions on \({\mathcal A}/{\! \mathcal {J}}\) generated by Wilson loop functionals separates points? If not, it cannot be said that the Wilson loop functionals constitute a complete set of gauge-invariant scalar observables.

The answer turns out to depend entirely on the group G, and it does not seem to be known in all cases, even for compact Lie groups.Footnote 20 The property that G must have for the answer to be positive is the following.Footnote 21

Definition 1

(Property W) We say that a group G has the property W if for any \(n\geqslant 2\) and any two collections \(x_{1},\ldots,x_{n}\) and \(x^{\prime }_{1},\ldots ,x^{\prime }_{n}\) of elements of G, the assumption that every word in \(x_{1},\ldots,x_{n}\) and their inverses is conjugate to the same word in \(x^{\prime }_{1},\ldots ,x^{\prime }_{n}\) and their inverses implies the existence of an element y of G such that \(x^{\prime }_{1}=yx_{1}y^{-1}, \ldots , x^{\prime }_{n}=yx_{n}y^{-1}\).

Since this long definition is maybe not very pleasant to read, let us word it differently. We are comparing two relations between n-tuples \((x_{1},\ldots,x_{n})\) and \((x^{\prime }_{1},\ldots ,x^{\prime }_{n})\) of elements of G. The first is the relation of simultaneous conjugation

$$\displaystyle \begin{aligned} \exists y\in G, \ x^{\prime}_{1}=yx_{1}y^{-1}, \ldots, x^{\prime}_{n}=yx_{n}y^{-1} \end{aligned} $$
(SC)

The second could be called lexical conjugation and holds exactly when

$$\displaystyle \begin{aligned} \text{every word in }x_{1},\ldots,x_{n}\text{ is conjugate to the same word in }x^{\prime}_{1},\ldots,x^{\prime}_{n} \end{aligned} $$
(LC)

where a word in a certain set of letters can involve these letters and their inverses. Let us also consider a third property, that of individual conjugation

$$\displaystyle \begin{aligned} \exists y_{1},\ldots,y_{n} \in G, \ x^{\prime}_{1}=y_{1}x_{1}y_{1}^{-1}, \ldots, x^{\prime}_{n}=y_{n}x_{n}y_{n}^{-1} \end{aligned} $$
(IC)

In any group, one has the chain of implications

$$\displaystyle \begin{aligned}\mbox{({SC})} \Rightarrow \mbox{({LC})} \Rightarrow \mbox{({IC})}\end{aligned}$$

Unless the group G has very special properties (for instance that of being abelian), the second implication is not an equivalence, and the property (IC) is much weaker than the property (LC). For the group G to have the property W means that the properties (SC) and (LC) are equivalent. The proof of the following result can be found in [22], see also [10, 39].

Theorem 1.9

Any Cartesian product of special orthogonal, orthogonal, special unitary, unitary and symplectic groups has the property W.

It is known that some non-compact groups fail to have the property W. However, it seems not to be known whether this equivalence holds, for instance, for spin groups.
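
To see on a concrete example how much weaker (IC) is than (LC), one can take, in G = SO(3), for x = x′ = y′ the rotation of angle π/2 about the z-axis, and for y the rotation of angle π/2 about the x-axis. The pairs (x, y) and (x′, y′) certainly satisfy (IC), since x, x′ and y, y′ are rotations of the same angles. However, the product xy is a rotation of angle 2π/3, whereas x′y′ is a rotation of angle π, so that already the word xy witnesses the failure of (LC): the angles of the individual rotations agree, but the relative positions of their axes do not.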

2 Computation of Wilson Loop Expectations

In this section, we will give a few concrete examples of computations with the Yang–Mills holonomy process, with an eye to its so-called large N limit, that is, its behaviour when the group G is taken to be U(N) with an appropriately scaled invariant scalar product on its Lie algebra, and N tends to infinity.Footnote 22

The basis of virtually any computation in 2-dimensional Yang–Mills theory is the Driver–Sengupta formula (DS). This formula can be combined with an expression of the heat kernel on G, for example its Fourier expansion, and lead to very concrete calculations. It is also possible to use a more dynamical approach to the heat kernel, either analytic or probabilistic, by seeing it as the solution of the heat equation or, almost equivalently, as the density of the distribution of the Brownian motion on G. We will illustrate these possibilities on a few examples in the simplest case where M is the plane, and then turn to the much more complicated case where M is the 2-dimensional sphere. For the sake of simplicity, we will assume in this section that the coupling constant T that appears in (DS) is equal to 1.

2.1 The Brownian Motion on the Unitary Group

In order to be as concrete as possible, and because we are interested in the large N limit, we will in this section choose G = U(N), the unitary group of rank N. As indicated earlier (see Footnote 4), we endow the Lie algebra of U(N), which is the space \({\mathfrak u}(N)\) of N × N skew-Hermitian matrices, with the scalar product \(\langle X,Y\rangle=N\mathrm{Tr}(X^{*}Y)\). In the Euclidean space \(({\mathfrak u}(N),\langle \cdot , \cdot \rangle )\), we consider a linear Brownian motion \((K_{t})_{t\geqslant 0}\), use it to form the stochastic differential equation

$$\displaystyle \begin{aligned} dU_{t}=U_{t}{\; \mathrm{d}} K_{t} - \frac{1}{2}U_{t}{\; \mathrm{d}} t\ , \ \ U_{0}=I_{N} \end{aligned} $$
(15)

and call the unique solution to this equation the Brownian motion on U(N).

Using the notation Tr for the usual trace of an N × N matrix and \(\mathrm {tr}=\frac {1}{N}\mathrm {Tr}\) for its normalised trace, the usual rules of stochastic calculus take, in this matricial context, the following nice form: for every N × N matrix A, measurable with respect to \(\sigma (K_{s} : s\leqslant t)\), we have

$$\displaystyle \begin{aligned} {\; \mathrm{d}} K_{t} A {\; \mathrm{d}} K_{t} = - \mathrm{tr}(A) {\; \mathrm{d}} t \ \ \text{ and } \ \ {\; \mathrm{d}} K_{t} \, \mathrm{tr}(A {\; \mathrm{d}} K_{t}) =-\frac{1}{N^{2}} A {\; \mathrm{d}} t \end{aligned} $$
(16)

These relations can be used to check that \({\; \mathrm {d}}(U_{t}U_{t}^{*})=0\), so that the trajectories of the process U stay almost surely, as expected, in U(N).

The density of the distribution of U t with respect to the normalised Haar measure on U(N) is the function p t appearing in the Driver–Sengupta formula, and that we described in Section 1.4.2.

It will be useful to know the Fourier series of this function \(p_{t}:{\mathrm U}(N)\to {\mathbb R}\). To describe it, let us introduce the set \(\widehat {\mathrm U}(N)\) of equivalence classes of irreducible representations (or irreps) of U(N). For every \(\alpha \in \widehat {\mathrm U}(N)\), let us denote by d α the degree of α, that is, the dimension of the space on which U(N) acts through α. Let us also denote by \(\chi _{\alpha }:{\mathrm U}(N)\to {\mathbb C}\) the character of α, and by c 2(α) the quadratic Casimir number of α, that is, the non-negative real number such that

$$\displaystyle \begin{aligned}\Delta \chi_{\alpha}=-c_{2}(\alpha)\chi_{\alpha}\end{aligned}$$

The Fourier series of the heat kernel is then

$$\displaystyle \begin{aligned} p_{t}=\sum_{\alpha\in \widehat {\mathrm U}(N)} e^{-\frac{c_{2}(\alpha)t}{2}} d_{\alpha}\chi_{\alpha} \end{aligned} $$
(17)

and there is nothing specific to U(N) in this formula.

It is however possible, in the case of U(N), to write explicitly each of its ingredients. Indeed, the set of irreps of U(N) is conveniently labelled by non-increasing sequences of N integers \(\lambda =(\lambda _{1}\geqslant \ldots \geqslant \lambda _{N})\), called dominant weights. The dimension and quadratic Casimir number of the irrep with highest weight λ are given by the formulas

$$\displaystyle \begin{aligned} d_{\lambda}=\prod_{1\leqslant i< j \leqslant N} \frac{\lambda_{i}-\lambda_{j}+j-i}{j-i} \ \text{ and } \ N c_{2}(\lambda)=\sum_{1\leqslant i \leqslant N} \lambda_{i}^{2}+\sum_{1\leqslant i < j \leqslant N} (\lambda_{i}-\lambda_{j}) \end{aligned} $$
(18)

The character of this representation is given, up to a power of the determinant, by a Schur function, but we will not need its explicit formula.
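To make the formulas (18) completely concrete, here is a small computational sketch (written in Python; the function name and the sample weights are our own choices and are not part of the notes) which evaluates d λ and c 2(λ) directly from (18).

```python
from itertools import combinations

def dim_and_casimir(lam):
    """Dimension d_lambda and Casimir c_2(lambda) of the irrep of U(N)
    with highest weight lam = (lam_1 >= ... >= lam_N), from formula (18)."""
    N = len(lam)
    d = 1.0
    for i, j in combinations(range(N), 2):          # pairs i < j (0-based)
        d *= (lam[i] - lam[j] + (j - i)) / (j - i)
    Nc2 = sum(l * l for l in lam) + sum(lam[i] - lam[j] for i, j in combinations(range(N), 2))
    return d, Nc2 / N

# The natural representation of U(4), of highest weight (1,0,0,0):
# dimension 4 and Casimir number 1.
print(dim_and_casimir((1, 0, 0, 0)))
# The irreps of U(4) with highest weights (2,0,0,0) and (1,1,0,0), of
# dimensions 10 and 6, which appear in the computation of Section 2.2.1.
print(dim_and_casimir((2, 0, 0, 0)), dim_and_casimir((1, 1, 0, 0)))
```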

We are now equipped to make some computations with the Yang–Mills holonomy process.

2.2 The Simple Loop on the Plane

2.2.1 Using Harmonic Analysis

Let us consider, on the plane, a loop ℓ that is a simple loop going once around a domain of area t (see, if needed, Figure 2). The partition function of the Yang–Mills model on the plane is equal to 1 and the Driver–Sengupta formula (DS) tells us that for every continuous test function \(f:{\mathrm U}(N)\to {\mathbb C}\), we have

$$\displaystyle \begin{aligned}{\mathbb E}[f(H_{\ell})]=\int_{{\mathrm U}(N)} f(x)p_{t}(x){\; \mathrm{d}} x \end{aligned}$$

In other words, H ℓ has the same distribution as U t, the value at time t of the Brownian motion on U(N) defined in the previous section.

Fig. 2
figure 2

A simple loop on the plane

Using the Fourier expansion (17) and the classical orthogonality relations between characters, we find, for every irrep α of U(N) acting on the vector space V α, the equality

$$\displaystyle \begin{aligned} {\mathbb E}[\alpha(H_{\ell})]=e^{-\frac{c_{2}(\alpha)t}{2}}\, \mathrm{id}_{V{}_{\alpha}} \end{aligned}$$

which holds in End(V α). In particular, since the usual trace is, on U(N), the character of the natural representation, which has highest weight (1, 0, …, 0), dimension N and quadratic Casimir 1, we find

$$\displaystyle \begin{aligned} {\mathbb E}[H_{\ell}]=e^{-\frac{t}{2}} I_{N} \ \ \text{ and } \ \ {\mathbb E}[\mathrm{tr} (H_{\ell})]=e^{-\frac{t}{2}} \end{aligned} $$
(19)

Suppose now that we want to compute the expectation of \(\mathrm {tr}(H_{\ell }^{2})\), which is also the expectation of \(\mathrm {tr}(H_{\ell ^{2}})\), where \(\ell ^{2}\) is the loop ℓ gone along twice. From the Driver–Sengupta formula and the Fourier expansion of the heat kernel, we get the expression

$$\displaystyle \begin{aligned}{\mathbb E}[\mathrm{tr}(H_{\ell}^{2})]=\sum_{\lambda\in \widehat{\mathrm U}(N)} e^{-\frac{c_{2}(\lambda)t}{2}} d_{\lambda} \int_{{\mathrm U}(N)} \mathrm{tr}(x^{2})\chi_{\lambda}(x){\; \mathrm{d}} x\end{aligned}$$

In order to go further, we need to know that, at least when \(N\geqslant 2\),

$$\displaystyle \begin{aligned}\mathrm{Tr}(x^{2})=\chi_{(2,0,\ldots,0)}(x)-\chi_{(1,1,0\ldots,0)}(x)\end{aligned}$$

Using again the orthogonality of characters, we find, after some reordering of the terms,

$$\displaystyle \begin{aligned} {\mathbb E}[\mathrm{tr}(H_{\ell}^{2})]=e^{-t} \Big(\cosh \frac{t}{N}-N\sinh \frac{t}{N}\Big) \end{aligned} $$
(20)
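These identities can also be probed by simulation. The sketch below (written in Python with numpy and scipy; the discretisation scheme, the parameters and the function names are our own choices and are not part of the notes) approximates the solution of (15) by multiplying exponentials of the increments of K, which keeps each trajectory exactly in U(N), and compares Monte Carlo averages with (19) and (20).

```python
import numpy as np
from scipy.linalg import expm

def brownian_unitary(N, t, steps=100, rng=None):
    """One sample of U_t: a geometric Euler scheme for the SDE (15), obtained by
    multiplying exponentials of increments of the Brownian motion K on u(N)
    equipped with <X,Y> = N Tr(X*Y).  A sketch, not an optimised simulator."""
    rng = rng if rng is not None else np.random.default_rng()
    dt = t / steps
    U = np.eye(N, dtype=complex)
    for _ in range(steps):
        # Hermitian increment M with E[Tr(M^2)] = N^2 dt, so that dK = i M / sqrt(N)
        A = rng.normal(scale=np.sqrt(dt / 2), size=(N, N)) \
            + 1j * rng.normal(scale=np.sqrt(dt / 2), size=(N, N))
        M = (A + A.conj().T) / np.sqrt(2)
        U = U @ expm(1j * M / np.sqrt(N))
    return U

N, t, samples = 10, 0.5, 200
rng = np.random.default_rng(1)
tr1, tr2 = [], []
for _ in range(samples):
    U = brownian_unitary(N, t, rng=rng)
    tr1.append(np.trace(U) / N)
    tr2.append(np.trace(U @ U) / N)
# Monte Carlo averages against the exact values (19) and (20); agreement holds
# only up to statistical and discretisation errors.
print(np.mean(tr1).real, np.exp(-t / 2))
print(np.mean(tr2).real, np.exp(-t) * (np.cosh(t / N) - N * np.sinh(t / N)))
```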

It is possible to go further down this road, by systematically writing the function x↦tr(x n) as a linear combination of characters. This is what Philippe Biane did to determine the large N limit of the non-commutative distribution of the Brownian motion on the unitary group. The simplest non-trivial case is the large N limit of (20):

$$\displaystyle \begin{aligned} \lim_{N\to \infty}{\mathbb E}[\mathrm{tr}(H_{\ell}^{2})]=e^{-t}(1-t) \end{aligned} $$
(21)

The general formula is nice enough, at least in the limit when N tends to infinity, to be quoted explicitly. It was discovered independently by Philippe Biane and Eric Rains, who formulated it in terms of the Brownian motion on U(N) rather than the Yang–Mills holonomy process.

Theorem 2.1

(Biane [2], Rains [37]) With the current notation, and for every integer \(n\geqslant 1\),

$$\displaystyle \begin{aligned} \lim_{N\to \infty} {\mathbb E}[\mathrm{tr}(H_{\ell}^{n})]=e^{-\frac{nt}{2}}\sum_{k=0}^{n-1} \frac{(-t)^{k}}{k!} n^{k-1}\binom{n}{k+1} \end{aligned} $$
(22)

It must be said that this result already appeared, without proof, in Isadore Singer’s seminal paper on the large N limit of the Yang–Mills holonomy field [42].Footnote 23
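The right-hand side of (22) is straightforward to evaluate. The following short sketch (in Python, our own illustration) computes it and checks the cases n = 1 and n = 2 against (19) and (21).

```python
from math import comb, exp, factorial

def biane_moment(n, t):
    """Right-hand side of (22): the large N limit of E[tr(H_ell^n)] for a
    simple loop of area t on the plane."""
    s = sum((-t) ** k / factorial(k) * n ** (k - 1) * comb(n, k + 1) for k in range(n))
    return exp(-n * t / 2) * s

t = 0.7
print(biane_moment(1, t), exp(-t / 2))           # agrees with (19)
print(biane_moment(2, t), exp(-t) * (1 - t))     # agrees with (21)
```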

One of Biane’s aims in [2] was to prove the following theorem concerning the limit as N tends to infinity of the Brownian motion on U(N) as a stochastic process. This convergence result is stated in the language of free probability, a theory presented in detail in the book of Alexandru Nica and Roland Speicher [34].

Theorem 2.2

(Biane [2]) As N tends to infinity, the Brownian motion on U(N) converges in non-commutative distribution, as a process, towards a unitary non-commutative process \((u_{t})_{t\geqslant 0}\) with free stationary multiplicative increments such that for all integer \(n\geqslant 0\) and all real \(t\geqslant 0\), the expectation of \(u_{t}^{n}\) and that of \((u_{t}^{*})^{n}\) are given by the right-hand side of (22).

2.2.2 Using Stochastic Calculus

Let us illustrate, on the same example of a simple loop on the plane, the dynamical approach to the same computations, based on the use of Itō’s formula. The general principle of these computations is to see quantities such as the left-hand sides of (19) and (20) as functions of t, and to write a differential equation that they satisfy. Recall that t, in our current notation, is the area of the disk enclosed by the simple loop ℓ. A variation of t can thus be described, in geometrical terms, as a variation of the area of the unique face enclosed by ℓ.

As a first example, let us use (15) and Itō’s formula to find

$$\displaystyle \begin{aligned}\frac{d}{dt}{\mathbb E}[\mathrm{tr}(H_{\ell})]=\frac{d}{dt}{\mathbb E}[\mathrm{tr}(U_{t})]=-\frac{1}{2}{\mathbb E}[\mathrm{tr}(U_{t})]\end{aligned}$$

which, together with the information \({\mathbb E}[\mathrm {tr}(U_{0})]=1\), yields immediately (19).

Let us apply the same strategy to the computation of \({\mathbb E}[\mathrm {tr}(H_{\ell }^{2})]={\mathbb E}[\mathrm {tr}(U_{t}^{2})]\). The computation is more interesting and involves the first of the two rules (16). We find

$$\displaystyle \begin{aligned} \frac{d}{dt}{\mathbb E}[\mathrm{tr}(U_{t}^{2})]=-{\mathbb E}[\mathrm{tr}(U_{t}^{2})]-{\mathbb E}[\mathrm{tr}(U_{t})^{2}] \end{aligned} $$
(23)

and see a function of t pop up that we were initially not interested in, namely \({\mathbb E}[\mathrm {tr}(U_{t})^{2}]\). The only way out left to us is to push forward: we compute the derivative with respect to t of this new function, using now the second rule of (16):

$$\displaystyle \begin{aligned} \frac{d}{dt}{\mathbb E}[\mathrm{tr}(U_{t})^{2}]=-\frac{1}{N^{2}}{\mathbb E}[\mathrm{tr}(U_{t}^{2})]-{\mathbb E}[\mathrm{tr}(U_{t})^{2}] \end{aligned} $$
(24)

All’s well that ends well: (23) and (24) form a closed system of ordinary differential equations that is easily solved and from which we recover, in particular, (20). As a bonus, we get

$$\displaystyle \begin{aligned} {\mathbb E}[\mathrm{tr}(H_{\ell})^{2}]=e^{-t} \Big(\cosh \frac{t}{N}-\frac{1}{N}\sinh \frac{t}{N}\Big) \end{aligned} $$
(25)

The only change with respect to (20) is the replacement of N by \(\frac {1}{N}\) in front of the hyperbolic sine, with the effect that

$$\displaystyle \begin{aligned} \lim_{N\to \infty}{\mathbb E}[\mathrm{tr}(H_{\ell})^{2}]=e^{-t}=\lim_{N\to \infty}{\mathbb E}[\mathrm{tr}(H_{\ell})]^{2} \end{aligned} $$
(26)

This is an instance of a general factorisation property which was observed, among others, by Feng Xu [47], and which is a consequence of the concentration, in the limit where N tends to infinity, of the spectra of the random matrices that we are considering.
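Since the system formed by (23) and (24) is linear with constant coefficients, it can also be solved mechanically. The following sketch (in Python with numpy and scipy, our own illustration) solves it with a matrix exponential and recovers (20) and (25) numerically.

```python
import numpy as np
from scipy.linalg import expm

N, t = 3, 0.9
# The closed system (23)-(24) for y1 = E[tr(U_t^2)] and y2 = E[tr(U_t)^2],
# with initial condition y1(0) = y2(0) = 1.
A = np.array([[-1.0, -1.0],
              [-1.0 / N**2, -1.0]])
y = expm(t * A) @ np.array([1.0, 1.0])
print(y[0], np.exp(-t) * (np.cosh(t / N) - N * np.sinh(t / N)))      # (20)
print(y[1], np.exp(-t) * (np.cosh(t / N) - np.sinh(t / N) / N))      # (25)
```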

2.3 Yin . . .

Let us consider the slightly more complicated loop ℓ depicted on Figure 3. This loop goes once around a domain of area s + t and then once around a smaller domain of area t contained in the first one.

Fig. 3
figure 3

The loop ℓ goes first once along the larger circle (the edge a) and then once along the smaller circle (the edge b). The loop ab is equivalent to the concatenation of ab⁻¹, b and b. The loops ab⁻¹ and b are essentially simple loops surrounding disjoint domains

Let us apply the Driver–Sengupta formula in this case. We denote a generic element of the configuration space U(N)2 by (x a, x b), in relation with our labelling by a and b of the two edges of the graph formed by ℓ. Thus, for every continuous test function \(f:{\mathrm U}(N)\to {\mathbb C}\), we have

$$\displaystyle \begin{aligned}{\mathbb E}[f(H_{\ell})]=\int_{{\mathrm U}(N)^{2}}f(x_{b}x_{a}) p_{s}(x_{b}^{-1}x_{a})p_{t}(x_{b}){\; \mathrm{d}} x_{a}{\; \mathrm{d}} x_{b}\end{aligned}$$

Note that, according to (10), the discrete holonomy map is order-reversing, so that the loop ℓ = ab gives rise to the map \(h_{\ell }(x_{a},x_{b})=x_{b}x_{a}\).

The change of variables \((y,z)=(x_{b}^{-1}x_{a},x_{b})\) preserves the Haar measure on U(N)2 and we have

$$\displaystyle \begin{aligned} {\mathbb E}[f(H_{\ell})]=\int_{{\mathrm U}(N)^{2}}f(z^{2}y) p_{s}(y)p_{t}(z){\; \mathrm{d}} y {\; \mathrm{d}} z \end{aligned} $$
(27)

This corresponds to the fact, explained in the caption of Figure 3, that the loop ℓ can be written as \(\ell _{1}\ell _{2}^{2}\), where ℓ 1 goes around the moon-shaped domain sitting between the two disks, and ℓ 2 goes around the small circle of area t. These loops enclose disjoint domains, and although ℓ 1 is not strictly speaking self-intersection free, they are essentially simple, in the sense that they can be approximated by simple loops.

From this graphical decomposition of ℓ, or from (27), we infer that H ℓ has the distribution of \(V_{t}^{2}U_{s}\), where U and V  are independent Brownian motions on U(N).Footnote 24 Using the independence, the fact that the expectation of U s is \(e^{-\frac {s}{2}}I_{N}\) (see (19)), and (20), we find

$$\displaystyle \begin{aligned} {\mathbb E}[\mathrm{tr}(H_{\ell})]=e^{-\frac{s}{2}-t} \Big(\cosh \frac{t}{N}-N\sinh \frac{t}{N}\Big) \end{aligned} $$
(28)

and, letting N tend to infinity,

$$\displaystyle \begin{aligned} \lim_{N\to\infty}{\mathbb E}[\mathrm{tr}(H_{\ell})]=e^{-\frac{s}{2}-t} (1-t) \end{aligned} $$
(29)

We succeeded in computing the expectation of tr(H ℓ), but we did so by taking advantage of the favourable circumstances, namely the fact that the word \(V_{t}^{2}U_{s}\) is a very simple one, with two independent Brownian motions appearing one after the other (and not, for example, as U sV tU sV t), and the fact that the expectation of U s is a very simple matrix.

A more systematic approach is possible, by looking at \({\mathbb E}[\mathrm {tr}(V_{t}^{2}U_{s})]\) as a function of s and t and by using Itō’s formula to compute its partial derivatives. One finds

$$\displaystyle \begin{aligned} & \partial_{s}{\mathbb E}[\mathrm{tr}(V_{t}^{2}U_{s})]=-\frac{1}{2}{\mathbb E}[\mathrm{tr}(V_{t}^{2}U_{s})]\\ &\partial_{t}{\mathbb E}[\mathrm{tr}(V_{t}^{2}U_{s})]=-{\mathbb E}[\mathrm{tr}(V_{t}^{2}U_{s})]-{\mathbb E}[\mathrm{tr}(V_{t})\mathrm{tr}(V_{t}U_{s})] \end{aligned} $$

Once again, a function appears that we were not considering at first. Let us apply the same treatment to this new function:

$$\displaystyle \begin{aligned} & \partial_{s}{\mathbb E}[\mathrm{tr}(V_{t})\mathrm{tr}(V_{t}U_{s})]=-\frac{1}{2}{\mathbb E}[\mathrm{tr}(V_{t})\mathrm{tr}(V_{t}U_{s})]\\ &\partial_{t}{\mathbb E}[\mathrm{tr}(V_{t})\mathrm{tr}(V_{t}U_{s})]=-{\mathbb E}[\mathrm{tr}(V_{t})\mathrm{tr}(V_{t}U_{s})]-\frac{1}{N^{2}}{\mathbb E}[\mathrm{tr}(V_{t}^{2}U_{s})] \end{aligned} $$

It is possible to solve this system and to recover (28).
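For the reader who wishes to check this, here is a symbolic sketch (in Python with sympy, our own illustration). It verifies that (28), together with the companion closed form for E[tr(V t)tr(V tU s)] obtained by solving the same system (this second formula is not displayed in the text), satisfies the four differential equations above.

```python
import sympy as sp

s, t, N = sp.symbols('s t N', positive=True)

# F = E[tr(V_t^2 U_s)], given by (28); G = E[tr(V_t) tr(V_t U_s)], obtained by
# solving the same system (its closed form is not displayed in the text).
F = sp.exp(-s/2 - t) * (sp.cosh(t/N) - N * sp.sinh(t/N))
G = sp.exp(-s/2 - t) * (sp.cosh(t/N) - sp.sinh(t/N) / N)

checks = [
    sp.diff(F, s) + F / 2,            # partial_s F = -F/2
    sp.diff(F, t) + F + G,            # partial_t F = -F - G
    sp.diff(G, s) + G / 2,            # partial_s G = -G/2
    sp.diff(G, t) + G + F / N**2,     # partial_t G = -G - F/N^2
]
print([sp.simplify(c) for c in checks])   # [0, 0, 0, 0]
```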

An interesting observation is the fact that the linear combination \(2\partial _{s}-\partial _{t}\) of partial derivatives is particularly simple:

$$\displaystyle \begin{aligned} &(2\partial_{s}-\partial_{t}){\mathbb E}[\mathrm{tr}(V_{t}^{2}U_{s})]={\mathbb E}[\mathrm{tr}(V_{t})\mathrm{tr}(V_{t}U_{s})] \end{aligned} $$
(30)
$$\displaystyle \begin{aligned} \text{and } & (2\partial_{s}-\partial_{t}){\mathbb E}[\mathrm{tr}(V_{t})\mathrm{tr}(V_{t}U_{s})]=\frac{1}{N^{2}}{\mathbb E}[\mathrm{tr}(V_{t}^{2}U_{s})] \end{aligned} $$
(31)

These are instances of the Makeenko–Migdal equations that we will discuss in greater detail in the next section. Before that, let us study another example.

2.4 . . . And Yang

Let us now consider the eight-shaped loop ℓ drawn on Figure 4. The Driver–Sengupta formula yields, with the by now usual notation, and taking the inversion of the order into account,

$$\displaystyle \begin{aligned}{\mathbb E}[f(H_{\ell})]=\int_{{\mathrm U}(N)^{6}} f(x_{f}x_{e}x_{d}x_{c}x_{b}x_{a}) p_{s}(x_{a}x_{c}x_{e})p_{t}(x_{f}x_{b}x_{d})p_{u}(x_{c}^{-1}x_{f})p_{v}(x_{a}^{-1}x_{d}){\; \mathrm{d}} x\end{aligned}$$
Fig. 4
figure 4

An eight-shaped loop ℓ on the plane. The letters s, t, u, v in the faces indicate the areas of the faces. The other letters label the edges of the graph. The loop ℓ can be decomposed, as we did for the heart-shaped loop, as a product of lassos enclosing pairwise disjoint domains: abcdef = (ad⁻¹)(dbf)(f⁻¹c)(da⁻¹)(aec)(c⁻¹f). Here, by a lasso, we mean a loop of the form clc⁻¹, where c is a path starting from the starting point of our loop and l is a simple loop. In this particular case, the path c is always the constant path

The appropriate change of variables is dictated by the geometry of the loop, more precisely by a decomposition into a product of lassos, one of which is given in the caption of Figure 4. Accordingly, let us set

$$\displaystyle \begin{aligned}(y,z,g,h,e,f)=(x_{c}x_{e}x_{a},x_{f}x_{b}x_{d},x_{c}x_{f}^{-1},x_{d}^{-1}x_{a},x_{e},x_{f})\end{aligned}$$

This change of variables preserves the Haar measure on U(N)6.Footnote 25 Thus, we find

$$\displaystyle \begin{aligned}{\mathbb E}[f(H_{\ell})]=\int_{{\mathrm U}(N)^{4}} f(g^{-1}yh^{-1}gzh) p_{s}(y)p_{t}(z)p_{u}(g)p_{v}(h){\; \mathrm{d}} g {\; \mathrm{d}} h {\; \mathrm{d}} y {\; \mathrm{d}} z\end{aligned}$$

after integrating with respect to e and f which do not appear in the integrand. Thus, considering four independent Brownian motions G, H, Z, Y  on U(N), we find the equality in distribution

$$\displaystyle \begin{aligned} H_{\ell} \,{\mathop{\kern 0pt=}\limits_{}^{\text{dist.}}}\, G_{u}^{-1}Y_{s}H_{v}^{-1}G_{u}Z_{t}H_{v} \end{aligned} $$
(32)

The quantity \({\mathbb E}[\mathrm {tr}(H_{\ell })]\) appears now as a function of the four real parameters s, t, u, v and we can use stochastic calculus to differentiate it with respect to each of them. In fact, using the first assertion of (19), which in the language of Brownian motion reads \({\mathbb E}[Y_{s}]=e^{-\frac {s}{2}}I_{N}\) and \({\mathbb E}[Z_{t}]=e^{-\frac {t}{2}}I_{N}\), we can simplify the problem to

$$\displaystyle \begin{aligned}{\mathbb E}[\mathrm{tr}(H_{\ell})]=e^{-\frac{s+t}{2}}{\mathbb E}[\mathrm{tr}(G_{u}^{-1}H_{v}^{-1}G_{u}H_{v})]\end{aligned}$$

The expectation in the right-hand side of this equality is a symmetric function of u and v. Using stochastic calculus, we find

$$\displaystyle \begin{aligned} \partial_{u}{\mathbb E}[\mathrm{tr}(G_{u}^{-1}H_{v}^{-1}G_{u}H_{v})]=-{\mathbb E}[\mathrm{tr}(G_{u}^{-1}H_{v}^{-1}G_{u}H_{v})]+{\mathbb E}[\mathrm{tr}(H_{v}^{-1})\mathrm{tr}(H_{v})] \end{aligned} $$
(33)

The new function \({\mathbb E}[\mathrm {tr}(H_{v}^{-1})\mathrm {tr}(H_{v})]\) of v can in turn be computed using Itō’s formula, since it is equal to 1 when v = 0 and satisfies the differential equation

$$\displaystyle \begin{aligned}\partial_{v}{\mathbb E}[\mathrm{tr}(H_{v}^{-1})\mathrm{tr}(H_{v})]=-{\mathbb E}[\mathrm{tr}(H_{v}^{-1})\mathrm{tr}(H_{v})]+\frac{1}{N^{2}}\end{aligned}$$

whose solution is

$$\displaystyle \begin{aligned} {\mathbb E}[\mathrm{tr}(H_{v}^{-1})\mathrm{tr}(H_{v})]=\frac{1}{N^{2}}(1-e^{-v})+e^{-v} \end{aligned} $$
(34)

Substituting into (33) and solving, we finally find

$$\displaystyle \begin{aligned} {\mathbb E}[\mathrm{tr}(H_{\ell})]=e^{-\frac{s+t}{2}} \Big(e^{-u}+e^{-v}-e^{-(u+v)}+\frac{1}{N^{2}}(1-e^{-u})(1-e^{-v})\Big) \end{aligned} $$
(35)

and, letting N tend to infinity,

$$\displaystyle \begin{aligned} \lim_{N\to\infty}{\mathbb E}[\mathrm{tr}(H_{\ell})]=e^{-\frac{s+t}{2}} \big(e^{-u}+e^{-v}-e^{-(u+v)}\big) \end{aligned} $$
(36)

We did these computations without taking great care of a possible geometric meaning of the successive steps. Anticipating our discussion of the Makeenko–Migdal equations, it is interesting to check that

$$\displaystyle \begin{aligned} (\partial_{u}+\partial_{v}-\partial_{s}-\partial_{t}){\mathbb E}[\mathrm{tr}(H_{\ell})]=e^{-\frac{s+t}{2}}\Big(e^{-(u+v)}+\frac{1}{N^{2}}(1-e^{-(u+v)})\Big)={\mathbb E}[\mathrm{tr}(H_{\ell'})\mathrm{tr}(H_{\ell''})] \end{aligned} $$
(37)

where ℓ′ and ℓ″ are the loops drawn on Figure 5.

Fig. 5
figure 5

The loops ℓ′ and ℓ″ are obtained from ℓ by an operation that will feature prominently in Section 3

Perhaps even more interesting than the fact that (37) holds, which after all is a consequence of Theorem 3.1, is the observation that (37) does not seem to be easily guessed from (32) and Itō’s formula. More precisely, Itō’s formula allows us to give an expression of the left-hand side of (37) and it is not obvious that this expression coincides with the right-hand side of (37). We take this as a sign that the Makeenko–Migdal equations carry information that is non-trivial in practice.
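The identity (37) can at least be confirmed symbolically from (35). Here is a minimal sketch (in Python with sympy, our own illustration).

```python
import sympy as sp

s, t, u, v, N = sp.symbols('s t u v N', positive=True)

# The Wilson loop expectation (35) for the eight-shaped loop.
E = sp.exp(-(s + t)/2) * (sp.exp(-u) + sp.exp(-v) - sp.exp(-(u + v))
                          + (1 - sp.exp(-u)) * (1 - sp.exp(-v)) / N**2)

# The combination of derivatives and the right-hand side appearing in (37).
lhs = sp.diff(E, u) + sp.diff(E, v) - sp.diff(E, s) - sp.diff(E, t)
rhs = sp.exp(-(s + t)/2) * (sp.exp(-(u + v)) + (1 - sp.exp(-(u + v))) / N**2)
print(sp.simplify(lhs - rhs))   # 0
```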

2.5 The Case of the Sphere: A Not So Simple Loop

Computations involving the Yang–Mills holonomy process on the sphere, although in principle based on the same formulas as in the case of the plane, are in general much more complicated. This can be explained by the fact that, as we indicated in Section 1.4.5, the stochastic core of the Yang–Mills holonomy process on a sphere is a Brownian bridge on U(N), or on the compact Lie group G, instead of a Brownian motion.

In this section, we are going to illustrate some of the difficulties that one meets when working on a sphere. The first is that the partition function is not equal to 1 anymore. Instead, according to (1.5), it is given, on a sphere of total area T, by

$$\displaystyle \begin{aligned}Z_{S^{2}}(T)=p_{T}(I_{N})=\|p_{\frac{T}{2}}\|{}^{2}_{L^{2}({\mathrm U}(N))}=\sum_{\alpha\in \widehat {\mathrm U}(N)} e^{-\frac{T}{2} c_{2}(\alpha)} d_{\alpha}^{2}\end{aligned}$$

This is also an expression in which nothing is specific to U(N): it is valid for any compact Lie group.Footnote 26
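For a small group, this sum can be evaluated by brute force. The sketch below (in Python, our own illustration; the truncation of the set of dominant weights is arbitrary) computes Z S²(T) for U(2) from the formulas (18).

```python
from itertools import product
from math import exp

def Z_sphere(T, N, L=30):
    """Truncated partition function Z_{S^2}(T) for U(N): the sum of
    exp(-T c_2(lambda)/2) d_lambda^2 over dominant weights with entries in
    [-L, L], computed from the formulas (18).  Brute force, for small N only."""
    total = 0.0
    for lam in product(range(L, -L - 1, -1), repeat=N):
        if any(lam[i] < lam[i + 1] for i in range(N - 1)):
            continue                      # keep only non-increasing sequences
        d = 1.0
        for i in range(N):
            for j in range(i + 1, N):
                d *= (lam[i] - lam[j] + j - i) / (j - i)
        c2 = (sum(l * l for l in lam)
              + sum(lam[i] - lam[j] for i in range(N) for j in range(i + 1, N))) / N
        total += exp(-T * c2 / 2) * d * d
    return total

print(Z_sphere(T=3.0, N=2))
```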

The most basic question about the Yang–Mills holonomy process on the sphere is the analogue of the question that we treated in Section 2.2, namely to compute the expectation of the normalised trace of the holonomy along a simple loop ℓ enclosing a domain of area t. The Driver–Sengupta formula yields the following expression for this expectation:

$$\displaystyle \begin{aligned} {\mathbb E}[\mathrm{tr}(H_{\ell})]=\frac{1}{Z_{S^{2}}(T)} \int_{{\mathrm U}(N)} \mathrm{tr}(x) p_{t}(x)p_{T-t}(x^{-1}){\; \mathrm{d}} x \end{aligned} $$
(38)

Using the Fourier expansion of the heat kernel, one finds

$$\displaystyle \begin{aligned} {\mathbb E}[\mathrm{tr}(H_{\ell})]=\frac{1}{Z_{S^{2}}(T)} \sum_{\lambda,\mu\in \widehat{\mathrm U}(N)} e^{-c_{2}(\lambda)\frac{t}{2}-c_{2}(\mu)\frac{T-t}{2}} d_{\lambda} d_{\mu} \int_{{\mathrm U}(N)} \mathrm{tr}(x)\chi_{\lambda}(x)\chi_{\mu}(x^{-1}){\; \mathrm{d}} x \end{aligned}$$

The integral can be computed thanks to Pieri’s rule: it is equal to 0 unless μ is obtained from λ by adding 1 to exactly one component, in which case it is equal to 1. We write \(\lambda \nearrow \mu \) when this happens. Thus,

$$\displaystyle \begin{aligned} {\mathbb E}[\mathrm{tr}(H_{\ell})]=\frac{1}{Z_{S^{2}}(T)} \sum_{\lambda\in \widehat{\mathrm U}(N)} e^{-c_{2}(\lambda)\frac{T}{2}} d_{\lambda}^{2}\bigg[\frac{1}{N}\sum_{\mu : \lambda \nearrow \mu} \frac{d_{\mu}}{d_{\lambda}}\, e^{-(c_{2}(\mu)-c_{2}(\lambda))\frac{T-t}{2}}\bigg] \end{aligned} $$
(39)

It seems difficult to give an expression of \({\mathbb E}[\mathrm {tr}(H_{\ell })]\) much simpler than (38) or (39) which, as is hardly necessary to emphasise, is much more complicated than the one that we obtained in the case of the plane.Footnote 27

It is, however, possible to analyse the limit of this quantity as N tends to infinity. A first step in this direction is based on the realisation that, Pieri’s rule being simple, the quantity between square brackets in (39), which we denote by f 1(λ), is a finite sum and can be written explicitly using (18):

This suggests associating with the highest weight λ the decreasing sequence l = (l 1 > … > l N) of half-integers defined by

$$\displaystyle \begin{aligned}l_{i}=\lambda_{i}+\frac{N-2i+1}{2}\end{aligned}$$

so that

Let us now introduce the probability measure π N,T on \(\widehat {\mathrm U}(N)\) such that for every highest weight λ, one has

$$\displaystyle \begin{aligned}\pi_{N,T}(\{\lambda\})\propto e^{-c_{2}(\lambda)\frac{T}{2}}d_{\lambda}^{2}\end{aligned}$$

Then (39) can be written more compactly as

$$\displaystyle \begin{aligned} {\mathbb E}[\mathrm{tr}(H_{\ell})]=\int_{\widehat {\mathrm U}(N)} f_{1}(\lambda) {\; \mathrm{d}} \pi_{N,T}(\lambda) \end{aligned} $$
(40)

Moreover, there exists for each integer \(n\geqslant 2\) a function f n on \(\widehat {\mathrm U}(N)\), not very different from f 1, whose integral against π N,T yields \({\mathbb E}[\mathrm {tr}(H_{\ell }^{n})]\).

We would like to express the fact that, as N tends to infinity, the measure π N,T concentrates on a few highest weights, characterised by a certain limiting shape. One unpleasant feature of (40) in this respect is that the set on which the integral is taken, namely \(\widehat {\mathrm U}(N)\), depends on N. It is thus not straightforward to formulate a concentration result. One classical and efficient way around this problem is to associate with each highest weight λ its empirical measure (Figure 6)

$$\displaystyle \begin{aligned}\hat\mu_{\lambda}=\frac{1}{N}\sum_{i=1}^{N} \delta_{\frac{l_{i}}{N}}=\frac{1}{N}\sum_{i=1}^{N} \delta_{\frac{1}{N}(\lambda_{i}+\frac{N-2i+1}{2})}\end{aligned}$$
Fig. 6
figure 6

With N = 9, the highest weight λ = (5, 4, 4, 2, 2, 1, 0, −2, −4) drawn in the style of a Young diagram, and its empirical measure. Each dot represents \( \frac {1}{9}\) of mass and any two dots are separated by a multiple of \( \frac {1}{9}\)

Pushing the probability measure π N,T forward by the map \(\lambda \mapsto \hat \mu _{\lambda }\) yields a probability measure, which we denote by ΠN,T, on the set of probability measures on the real line. It is possible to predict the behaviour of this probability as N tends to infinity by writing c 2(λ) and dλ in terms of the empirical measure of λ. Up to some inessential terms (see [25, Eq. (24)] for complete expressions), one finds

$$\displaystyle \begin{aligned}c_{2}(\lambda)&\simeq N^{2} \int_{{\mathbb R}} x^{2} {\; \mathrm{d}}\hat\mu_{\lambda}(x) \ \ \text{ and } \\ &d_{\lambda}^{2} \simeq \exp \bigg[- N^{2} \int_{\{(x,y)\in {\mathbb R}^{2}, x\neq y\}} -\log |x-y| {\; \mathrm{d}}\hat\mu_{\lambda}(x) {\; \mathrm{d}}\hat\mu_{\lambda}(y)\bigg] \end{aligned} $$

Introducing, for every probability measure μ, the quantity

$$\displaystyle \begin{aligned}\mathcal J_{T}(\mu)=\int_{\{(x,y)\in {\mathbb R}^{2}, x\neq y\}} -\log |x-y| {\; \mathrm{d}}\mu(x) {\; \mathrm{d}}\mu(y)+\frac{T}{2}\int_{{\mathbb R}} x^{2}{\; \mathrm{d}}\mu(x)\end{aligned}$$

we see that the probability measure ΠN,T assigns to any probability measure μ that is the empirical measure of a highest weight a mass proportional to

$$\displaystyle \begin{aligned}\Pi_{N,T}(\{\mu\})\propto \exp(-N^{2}\mathcal J_{T}(\mu))\end{aligned}$$

In the large N limit, it seems plausible that ΠN,T will concentrate on the minimisers, or even better, on the unique minimiser of the functional \(\mathcal J_{T}\). This turns out to be true, with a little twist that we will explain, and which contributes to making the story much more interesting than it already is. Let us summarise the main results on which one can ground a rigorous analysis of the situation.

  • Minimising the functional \(\mathcal J_{T}\) on the space of all probability measures on \({\mathbb R}\) is one of the simplest examples of a rich and well-developed theory which is, for example, exposed in the book of Edward Saff and Vilmos Totik [38]. This is also a very common problem in random matrix theory. Indeed, the unique minimiser of \(\mathcal J_{T}\) is Wigner’s semi-circular distribution with variance \(\frac {1}{T}\):

    $$\displaystyle \begin{aligned} \sigma_{1/T}({\; \mathrm{d}} x)=\frac{T}{2\pi}\sqrt{\frac{4}{T}-x^{2}}\; {\mathbf 1}_{\big[-\frac{2}{\sqrt{T}},\frac{2}{\sqrt{T}}\big]}(x){\; \mathrm{d}} x \end{aligned} $$
    (41)
  • The fact that the measure ΠN,T concentrates, as N tends to infinity, to the minimiser of \(\mathcal J_{T}\) is a special case of a principle of large deviations proved by Alice Guionnet and Mylène Maïda in [16]. However, the minimiser of \(\mathcal J_{T}\) that one must consider is not the absolute minimiser on the set of all probability measures on \({\mathbb R}\). Indeed, for all \(N\geqslant 1\), the measure ΠN,T is supported by the set of empirical measures of highest weights of U(N), which form a rather special set of probability measures. A distinctive feature of these measures is that they are atomic, with atoms of mass \(\frac {1}{N}\) spaced by integer multiples of \(\frac {1}{N}\). Weak limits, as N tends to infinity, of such measures can only be absolutely continuous with respect to the Lebesgue measure on \({\mathbb R}\), with a density not exceeding 1: a class of probability measures that we will denote by \(\mathcal L({\mathbb R})\). The result of Guionnet and Maïda asserts that the measure ΠN,T concentrates exponentially fast, as N tends to infinity, around the unique minimiser \(\mu _{T}^{*}\) of \(\mathcal J_{T}\) on the closed set \(\mathcal L({\mathbb R})\).

  • The problem of minimising \(\mathcal J_{T}\) under the constraint of having a density not exceeding 1 is a problem that is, in principle, just as well understood as the unconstrained problem. The book [38] contains results ensuring the existence and uniqueness of the minimiser, and others allowing one to determine its support. In fact, the measure σ 1∕T given by (41), and which is the absolute minimiser of \(\mathcal J_{T}\), is absolutely continuous with a maximal density of \(\sqrt {T}/\pi \), so that it belongs to \(\mathcal L({\mathbb R})\) provided \(T\leqslant \pi ^{2}\). For T > π 2, the constraint becomes truly restrictive, and one must make do with a probability measure which is, in \(\mathcal L({\mathbb R})\), the best available substitute for σ 1∕T. The actual determination of this minimiser \(\mu ^{*}_{T}\) is, depending on one’s background, a more or less elementary exercise in Riemann–Hilbert theory, and involves manipulating elliptic functions. The density of \(\mu _{T}^{*}\) for T > π 2 is represented on Figure 7. An exact expression of this density can be found in [25, Eq. (37)].

    Fig. 7
    figure 7

    For T > π 2, the absolute minimiser of the functional \(\mathcal J_{T}\) does not belong to the class of probabilities on \({ \mathbb R}\) with a density not exceeding 1. The minimiser within this class is represented on the right. Its density is identically equal to 1 on an interval in the middle of its support, and given by elliptic functions outside this interval

Having established the exponential concentration, as N tends to infinity, of the measure ΠN,T around \(\mu ^{*}_{T}\), it is possible to come back to our initial problem of computing \({\mathbb E}[\mathrm {tr}(H_{\ell })]\). After noticing that f 1(λ) can be expressed as a functional \(F_{1}(\hat \mu _{\lambda })\) of the empirical measure of λ, it can be guessed that \({\mathbb E}[\mathrm {tr}(H_{\ell })]\) is related to \(F_{1}(\mu ^{*}_{T})\). Antoine Dahlqvist and James Norris were the first to rigorously and successfully pursue this line of reasoning, and to obtain the following remarkably elegant result.

Theorem 2.3

(Dahlqvist–Norris [5]) Let ρ T denote the density of the minimiser \(\mu ^{*}_{T}\). Then, for every integer \(n\geqslant 1\), one has

$$\displaystyle \begin{aligned} \lim_{N\to \infty}{\mathbb E}[\mathrm{tr}(H_{\ell}^{n})]=\lim_{N\to \infty}{\mathbb E}[\mathrm{tr}(H_{\ell}^{-n})]=\frac{1}{n\pi} \int_{{\mathbb R}} \cosh \Big(\frac{nx}{2}(T-2t)\Big) \sin (n\pi \rho_{T}(x)){\; \mathrm{d}} x \end{aligned} $$
(42)
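In the subcritical regime \(T\leqslant \pi ^{2}\), the density ρ T is simply that of the semi-circular distribution σ 1∕T, and the integral in (42) is easy to evaluate numerically. Here is a minimal sketch (in Python with numpy, our own illustration, valid only under the assumption \(T\leqslant \pi ^{2}\)).

```python
import numpy as np

def wilson_sphere(n, t, T, m=100001):
    """Numerical evaluation of the right-hand side of (42), assuming T <= pi^2,
    in which case rho_T is the semicircular density of variance 1/T (whose
    maximum sqrt(T)/pi does not exceed 1, so the constraint is not active)."""
    R = 2.0 / np.sqrt(T)                                   # edge of the support
    x = np.linspace(-R, R, m)
    rho = (T / (2.0 * np.pi)) * np.sqrt(np.maximum(4.0 / T - x**2, 0.0))
    f = np.cosh(n * x * (T - 2.0 * t) / 2.0) * np.sin(n * np.pi * rho)
    return np.sum((f[:-1] + f[1:]) / 2.0) * (x[1] - x[0]) / (n * np.pi)   # trapezoid rule

T = 5.0                                                    # subcritical: T < pi^2
print(wilson_sphere(1, 1e-4, T))                           # a tiny loop: close to 1
print(wilson_sphere(1, T / 2, T), wilson_sphere(2, T / 2, T))   # the 'equatorial' loop
```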

To conclude this long discussion of the simple loop on the sphere, let us mention another result for the statement of which we have all the concepts at hand. Our description of the behaviour of the measure ΠN,T suggests that the partition function itself is dominated by the contribution of the highest weights that have an empirical measure close to \(\mu ^{*}_{T}\). This is indeed true, and the fact that the shape of \(\mu ^{*}_{T}\) changes suddenly when T crosses the critical value π 2 gives rise to a phase transition, in this case of third order, first discovered by Douglas and Kazakov, and named after them. It was first proved rigorously, in a slightly different but equivalent language, by Karl Liechty and Dong Wang in [27], and by Mylène Maïda and the author in [25].

Theorem 2.4

(Douglas–Kazakov phase transition) The free energy of the Yang–Mills model on a sphere of total area T is given by

$$\displaystyle \begin{aligned}F(T)=\lim_{N\to \infty} \frac{1}{N^{2}}\log Z_{S^{2}}(T)=\frac{T}{24}+\frac{3}{2}-\mathcal J_{T}(\mu^{*}_{T})\end{aligned}$$

The function F is of class C 2 on (0, ∞) and smooth on (0, ∞) ∖{π 2}. The third derivative of F admits a jump discontinuity at π 2.

This phase transition is not one that is easily detected numerically, as Figure 8 shows.

Fig. 8
figure 8

The graphs of T ↦ F(T) (on the left) and of T ↦ F (3)(T) near T = π 2 (on the right)

3 The Makeenko–Migdal Equations

3.1 First Approach

It is now time that we discuss the equations discovered by Yuri Makeenko and Alexander Migdal and which give their title to these notes. These equations are a powerful tool for the study of the Wilson loop expectations of which we gave a few examples in the previous section. They are related to the approach that we called dynamical, in which an expectation of the form \({\mathbb E}[\mathrm {tr}(H_{\ell })]\), where ℓ is some nice loop on a surface M, is seen as a function of the areas of the faces cut by ℓ on the surface M. The Makeenko–Migdal equations give a remarkably elegant expression of the alternating sum of the derivatives of \({\mathbb E}[\mathrm {tr}(H_{\ell })]\) with respect to the areas of the four faces that surround a generic point of self-intersection of ℓ. This expression is of the form \({\mathbb E}[\mathrm {tr}(H_{\ell '})\mathrm {tr}(H_{\ell ''})]\), where ℓ′ and ℓ″ are two loops obtained from ℓ by a very simple operation at this point of self-intersection. This operation consists in taking the two incoming strands of ℓ at this point and connecting them with the two outgoing strands in the ‘other’ way, the way that is not realised by ℓ, see Figure 9.

Fig. 9
figure 9

On the left, we see a loop ℓ around a generic self-intersection point. The dotted and dashed parts of ℓ can be arbitrarily complicated, and can meet many times outside the small region of the surface that we are focusing on. It is nevertheless true that after escaping this small region through the North-East corner (resp. North-West corner), the first time ℓ comes back is through the South-East corner (resp. South-West corner). This is why the ‘desingularisation’ operation illustrated on the right produces exactly two loops, that we call ℓ′ and ℓ″

On this figure, we see four faces around the self-intersection point, which need not be pairwise distinct. We denote their areas by t 1, t 2, t 3, t 4 as indicated on Figure 9. The Makeenko–Migdal equation in this case reads

$$\displaystyle \begin{aligned} \bigg(\frac{\partial}{\partial t_{1}}-\frac{\partial}{\partial t_{2}}+\frac{\partial}{\partial t_{3}}-\frac{\partial}{\partial t_{4}}\bigg){\mathbb E}[\mathrm{tr}(H_{\ell})]={\mathbb E}[\mathrm{ tr}(H_{\ell^{\prime}})\mathrm{tr}(H_{\ell^{\prime\prime}})] \end{aligned} $$
(MM)

The relation (30), that we derived earlier in an elementary way, is an instance of this equation.

The relation (MM) would become particularly useful if we could combine it with a result saying that \({\mathbb E}[\mathrm {tr}(H_{\ell '})\mathrm {tr}(H_{\ell ''})]={\mathbb E}[\mathrm {tr}(H_{\ell '})]{\mathbb E}[\mathrm {tr}(H_{\ell ''})]\). A crucial fact is that this equality, which is of course false in general, becomes true in the large N limit in all cases where this limit has been studied, that is, on the plane and on the sphere. It corresponds to a concentration phenomenon, namely to the fact that the complex-valued random variable tr(H ℓ) converges, in the large N limit, to a deterministic complex, indeed real, number Φ(ℓ). This behaviour is expected to occur on any compact surface, and the function \(\Phi :{\mathcal L}(M)\to {\mathbb R}\), whose existence has so far been proved when M is the plane or the sphere, is called the master field.

In the large N limit, the Makeenko–Migdal equation (MM) becomes a kind of differential equation satisfied by this master field Φ:

$$\displaystyle \begin{aligned} \bigg(\frac{\partial}{\partial t_{1}}-\frac{\partial}{\partial t_{2}}+\frac{\partial}{\partial t_{3}}-\frac{\partial}{\partial t_{4}}\bigg)\Phi(\ell)=\Phi(\ell^{\prime})\Phi(\ell^{\prime\prime}) \end{aligned} $$
(MM ∞)

On the plane, we will see that this equation, together with the very simple equation (19), essentially characterises the function Φ.

3.2 Makeenko and Migdal’s Proof

Makeenko and Migdal discovered the relation (MM), and the extensions that we will describe later, by doing a very clever integration by parts in the functional integral with respect to the Yang–Mills measure (see (3)) that defines a Wilson loop expectation:

$$\displaystyle \begin{aligned}{\mathbb E}[\mathrm{tr}(H_{\ell})]=\frac{1}{Z}\int_{{\mathcal A}} \mathrm{tr}(\text{hol}(\omega,\ell)) e^{-\frac{1}{2}S_{{\mathsf{YM}}}(\omega)}{\; \mathrm{d}} \omega \end{aligned}$$

or instead, as we will explain, in a closely related integral (see [31] and [9]). That this integration by parts performed in an ill-defined integral yields as a final product a perfectly meaningful formula makes Makeenko and Migdal’s original derivation all the more intriguing. It is described in mathematical language in the introduction of [24], but this derivation is so beautiful that we reproduce its description here.

The finite-dimensional prototype of the so-called Schwinger–Dyson equations, obtained by integration by parts in functional integrals, is the fact that for every smooth function \(f:{\mathbb R}^{n}\to {\mathbb R}\) with bounded differential, and for every \(h\in {\mathbb R}^{n}\), the equality

$$\displaystyle \begin{aligned}\int_{{\mathbb R}^{n}} d_{x}f(h) e^{-\frac{1}{2}\|x\|{}^{2}}\; dx=\int_{{\mathbb R}^{n}}\langle x,h\rangle f(x)e^{-\frac{1}{2}\|x\|{}^{2}}\; dx\end{aligned}$$

holds. This equality ultimately relies on the invariance under translation of the Lebesgue measure on \({\mathbb R}^{n}\) and it can be proved by writing

$$\displaystyle \begin{aligned}0=\frac{d}{dt}_{|t=0}\int_{{\mathbb R}^{n}}f(x+th)e^{-\frac{1}{2}\|x+th\|{}^{2}}\; dx\end{aligned}$$

In our description of the Yang–Mills measure μ YM (see (3)), we mentioned that the measure dω on the space \({\mathcal A}\) of connections was meant to be a kind of Lebesgue measure, invariant under translations. This is the key to the derivation of the Schwinger–Dyson equations, as we will now explain. In what follows, we will use the differential geometric language introduced in Section 1.2.

Let \(\psi :{\mathcal A}\to {\mathbb R}\) be an observable, that is, a function. In general, we are interested in the integral of ψ with respect to the measure μ YM. The tangent space to the affine space \({\mathcal A}\) is the linear space Ω1(M) ⊗Ad(P). To say that the measure dω is translation invariant means that for every element η of this linear space,

$$\displaystyle \begin{aligned}0=\frac{d}{dt}_{|t=0}\int_{{\mathcal A}}\psi(\omega+t\eta)e^{-\frac{1}{2}S_{{\mathsf{YM}}}(\omega+t\eta)}{\; \mathrm{d}}\omega\end{aligned}$$

and the Schwinger–Dyson equations follow in their abstract form

$$\displaystyle \begin{aligned} \int_{{\mathcal A}}d_{\omega}\psi(\eta) {\; \mathrm{d}}\mu_{{\mathsf{YM}}}(\omega)=\frac{1}{2}\int_{{\mathcal A}}\psi(\omega)d_{\omega}S_{{\mathsf{YM}}}(\eta){\; \mathrm{d}}\mu_{{\mathsf{YM}}}(\omega) \end{aligned} $$
(43)

The directional differential of the Yang–Mills action is well known (see for example [3]) and most easily expressed using the covariant exterior differential d ω : Ω0(M) ⊗Ad(P) → Ω1(M) ⊗Ad(P) defined by d ωα =  + [ω ∧ α]. It is given by

$$\displaystyle \begin{aligned}d_{\omega}S_{{\mathsf{YM}}}(\eta)=2\int_{M}\langle \eta \wedge d^{\omega} *\!\Omega\rangle\end{aligned}$$

The problem is now to apply this formula to a well-chosen observable ψ and to differentiate in the right direction.

Given a loop ℓ on M, Makeenko and Migdal applied (43) to the observable defined by choosing a skew-Hermitian matrix \(X\in {\mathfrak u}(N)\) and setting, for all \(\omega \in {\mathcal A}\),

$$\displaystyle \begin{aligned} \psi_{X}(\omega)=\mathrm{Tr}(X\,\text{hol}(\omega,\ell)) \end{aligned} $$
(44)

To make this definition perfectly meaningful, one needs to choose a reference point in the fibre of P over the base point of ℓ: we will assume that such a point has been chosen and fixed, and compute holonomies with respect to this point.

Let us choose a parametrisation ℓ : [0, 1] → M of ℓ. The directional derivative of the observable ψ X in the direction of a 1-form η ∈ Ω1(M) ⊗Ad(P) is given by

$$\displaystyle \begin{aligned} d_{\omega}\psi_{X}(\eta)=-\int_{0}^{1}\mathrm{Tr}\left(X\,\text{hol}(\omega,\ell_{[s,1]})\eta(\dot \ell(s)) \text{hol}(\omega,\ell_{[0,s]})\right) \; ds \end{aligned} $$
(45)

where we denote by \(\ell _{[a,b]}\) the restriction of ℓ to the interval [a, b].Footnote 28

One must now choose the direction of differentiation η. Let us assume that ℓ is a nice loop which, around each point of self-intersection, looks like the left half of Figure 9. Let us assume that for some s 0 ∈ (0, 1), we have ℓ(s 0) = ℓ(0) and \(\det (\dot \ell (0),\dot \ell (s_{0}))=1\). Makeenko and Migdal choose for η a distributional 1-form supported at the self-intersection point ℓ(0), which one could write asFootnote 29

$$\displaystyle \begin{aligned}\forall m\in M, \forall v\in T_{m}M, \; \eta_{m}(v)=\delta_{m,\ell(0)} \det(\dot \ell(0),v) X\end{aligned}$$

with \(\det (\dot \ell (0),v)\) denoting the determinant of the two vectors \(\dot \ell (0)\) and v. With this choice of η, the directional derivative of ψ X is given by

$$\displaystyle \begin{aligned} d_{\omega}\psi_{X}(\eta)=-\mathrm{Tr}\left(X\,\text{hol}(\omega,\ell_{[s_{0},1]}) X\, \text{hol}(\omega,\ell_{[0,s_{0}]})\right)=-\mathrm{Tr}\left(X\,\text{hol}(\omega,\ell') X\, \text{hol}(\omega,\ell'')\right)\end{aligned} $$
(46)

where ℓ′ and ℓ″ are the loops defined on the right of Figure 9. Recall that \({\mathfrak u}(N)\) is endowed with the invariant scalar product 〈X, Y 〉 = −NTr(XY ). The directional derivative of the Yang–Mills action is thus given by

$$\displaystyle \begin{aligned}d_{\omega}S_{{\mathsf{YM}}}(\eta)=-2\langle X, (d^{\omega}\!*\!\Omega)(\dot \ell(0))\rangle=-2N\mathrm{Tr}\left(X d^{\omega}\!*\!\Omega(\dot \ell(0))\right)\end{aligned}$$

or so it seems from a naive computation. We shall soon see that this expression needs to be reconsidered. For the time being, our Schwinger–Dyson equation reads

$$\displaystyle \begin{aligned} \int_{{\mathcal A}}\mathrm{Tr}\left(X\,\text{hol}(\omega,\ell') X\, \text{hol}(\omega,\ell'')\right){\; \mathrm{d}}\mu_{{\mathsf{YM}}}(\omega)= N\int_{{\mathcal A}}\mathrm{Tr}(X\,\text{hol}(\omega,\ell)) \mathrm{Tr}(X\, d^{\omega}\! * \! \Omega(\dot \ell(0))) {\; \mathrm{d}}\mu_{{\mathsf{YM}}}(\omega)\end{aligned} $$
(SD X)

Let us add the equalities (SDX ) obtained by letting X take all the values \(X_{1},\ldots ,X_{N^{2}}\) of an orthonormal basis of \({\mathfrak u}(N)\). With the scalar product which we chose, the relationsFootnote 30

$$\displaystyle \begin{aligned} \sum_{k=1}^{N^{2}}\mathrm{Tr}(X_{k}AX_{k}B)=-\frac{1}{N}\mathrm{Tr}(A)\mathrm{Tr}(B) \mbox{ and } \sum_{k=1}^{N^{2}}\mathrm{Tr}(X_{k}A)\mathrm{Tr}(X_{k}B)=-\frac{1}{N}\mathrm{Tr}(AB) \end{aligned} $$
(47)

hold for any two matrices A and B, so that we find

$$\displaystyle \begin{aligned} \int_{{\mathcal A}}\mathrm{tr}\left(\text{hol}(\omega,\ell')\right)\mathrm{tr}\left(\text{hol}(\omega,\ell'')\right){\; \mathrm{d}}\mu_{{\mathsf{YM}}}(\omega)= \int_{{\mathcal A}}\mathrm{tr}\left(\text{hol}(\omega,\ell)\, d^{\omega}\! * \! \Omega(\dot \ell(0))\right){\; \mathrm{d}} \mu_{{\mathsf{YM}}}(\omega).\end{aligned} $$

The left-hand side of this equation is the right-hand side of (MM). The last and most delicate heuristic step is to interpret the right-hand side of this equation. For this, we must understand the term \(d^{\omega }*\!\Omega (\dot \ell (0))\) and we do this by combining two facts: the fact that d ω acts by differentiation in the horizontal direction and the fact that ∗ Ω computes the holonomy along infinitesimal rectangles. We must also remember that this term comes from the computation of the exterior product of the distributional form η with the form d ω ∗  Ω. It turns out that, instead of a derivative in the horizontal direction with respect to s at s = 0, we should think of the difference between the values at 0⁺ and at 0⁻, which we denote by Δ|s=0.

With all this preparation and, it must be said, a small leap of faith, the right-hand side of the Schwinger–Dyson equation can finally be drawn as follows:

This is indeed the left-hand side of the Makeenko–Migdal equation (MM).
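Since the relations (47) carry the whole computation, the reader may find it reassuring to test them numerically. The sketch below (in Python with numpy, our own illustration) builds an orthonormal basis of \({\mathfrak u}(N)\) for the scalar product 〈X, Y 〉 = −NTr(XY ) and checks both identities on random matrices.

```python
import numpy as np

def onb_uN(N):
    """An orthonormal basis of u(N) for the scalar product <X,Y> = -N Tr(XY)."""
    basis = []
    for j in range(N):                                   # diagonal directions
        E = np.zeros((N, N), dtype=complex); E[j, j] = 1j
        basis.append(E / np.sqrt(N))
    for j in range(N):                                   # off-diagonal directions
        for k in range(j + 1, N):
            X = np.zeros((N, N), dtype=complex); X[j, k], X[k, j] = 1, -1
            basis.append(X / np.sqrt(2 * N))
            Y = np.zeros((N, N), dtype=complex); Y[j, k], Y[k, j] = 1j, 1j
            basis.append(Y / np.sqrt(2 * N))
    return basis

N = 4
rng = np.random.default_rng(0)
A = rng.normal(size=(N, N)) + 1j * rng.normal(size=(N, N))
B = rng.normal(size=(N, N)) + 1j * rng.normal(size=(N, N))
X = onb_uN(N)

sum1 = sum(np.trace(x @ A @ x @ B) for x in X)
sum2 = sum(np.trace(x @ A) * np.trace(x @ B) for x in X)
print(np.allclose(sum1, -np.trace(A) * np.trace(B) / N))   # first relation of (47)
print(np.allclose(sum2, -np.trace(A @ B) / N))             # second relation of (47)
```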

3.3 The Equations, Their Merits and Demerits

The strategy of proof described in the previous section can be used, and was used by Makeenko and Migdal, to derive equations slightly more general than (MM). Let us indeed consider a collection ℓ 1, …, ℓ n of loops on the surface M. We assume that these loops are nice and in generic position, in the sense that every crossing between two portions of these loops, be they two portions of the same loop or portions of two different loops, is a simple transverse intersection. Around such a crossing, we see, as before, four faces of the graph cut on M by ℓ 1, …, ℓ n, and we label the areas of these faces t 1, t 2, t 3, t 4 as indicated on Figures 9 and 10. The Makeenko–Migdal equations express the alternating sum of the derivatives with respect to t 1, t 2, t 3, t 4 of \({\mathbb E}[\mathrm {tr}(H_{\ell _{1}})\ldots \mathrm {tr}(H_{\ell _{n}})]\). The equations come in two variants, depending on whether the crossing is between two strands of the same loop (let us call this the case I) or between strands of two distinct loops (the case II). In the case II, where the crossing is between strands of two distinct loops, say ℓ 1 and ℓ 2, the same desingularisation operation explained at the beginning of Section 3.1 gives rise to one new loop ℓ 12, as explained in Figure 10.

Fig. 10
figure 10

When performed at a crossing of two distinct loops ℓ 1 and ℓ 2, the operation of reconnecting the incoming and outgoing strands in the other way that is consistent with orientation produces, from ℓ 1 and ℓ 2, one bigger loop that we denote by ℓ 12

Calling, in all cases, ℓ 1 the loop containing the South-West – North-East strand, one should replace the observable ψ X defined in (44) by

$$\displaystyle \begin{aligned}\psi_{X}(\omega)=\mathrm{Tr}(X\text{hol}(\omega,\ell_{1}))\mathrm{Tr}(\text{hol}(\omega,\ell_{2}))\ldots \mathrm{Tr}(\text{hol}(\omega,\ell_{n}))\end{aligned}$$

Then the directional derivative of ψ X is given by

$$\displaystyle \begin{aligned}d_{\omega}\psi_{X}(\eta)=\left|\!\!\begin{array}{ll} \mathrm{Tr}\big(X\,\text{hol}(\omega,\ell')X\,\text{hol}(\omega,\ell'')\big)\mathrm{Tr}(\text{hol}(\omega,\ell_{2}))\ldots \mathrm{Tr}(\text{hol}(\omega,\ell_{n})) & \text{(case I)}\\ {} \mathrm{Tr}(X\,\text{hol}(\omega,\ell_{1}))\mathrm{Tr}(X\,\text{hol}(\omega,\ell_{2}))\mathrm{Tr}(\text{hol}(\omega,\ell_{3}))\ldots \mathrm{Tr}(\text{hol}(\omega,\ell_{n}))& \text{(case II)} \end{array}\right.\end{aligned}$$

Then, the key to the computation is, as always, given by the equations (47). The final result, with the current notation, is the following.

Theorem 3.1

(Makeenko–Migdal equations) Let ℓ 1, …, ℓ n be nice loops on M in generic position. Consider a crossing point of two strands of ℓ 1 (case I) or of one strand of ℓ 1 and one strand of ℓ 2 (case II). Let t 1, t 2, t 3, t 4 denote the areas of the four faces around this crossing point, as illustrated on Figures 9 and 10. Then, with the notation of these figures,

$$\displaystyle \begin{aligned} \bigg(\frac{\partial}{\partial t_{1}}-\frac{\partial}{\partial t_{2}}+\frac{\partial}{\partial t_{3}}-\frac{\partial}{\partial t_{4}}\bigg){\mathbb E}[\mathrm{tr}(H_{\ell_{1}})\mathrm{tr}(H_{\ell_{2}})\ldots \mathrm{tr}(H_{\ell_{n}})]={\mathbb E}[\mathrm{tr}(H_{\ell'})\mathrm{tr}(H_{\ell''})\mathrm{tr}(H_{\ell_{2}})\ldots \mathrm{tr}(H_{\ell_{n}})] \end{aligned}$$

in the case I, and

$$\displaystyle \begin{aligned} \bigg(\frac{\partial}{\partial t_{1}}-\frac{\partial}{\partial t_{2}}+\frac{\partial}{\partial t_{3}}-\frac{\partial}{\partial t_{4}}\bigg){\mathbb E}[\mathrm{tr}(H_{\ell_{1}})\mathrm{tr}(H_{\ell_{2}})\ldots \mathrm{tr}(H_{\ell_{n}})]=\frac{1}{N^{2}}\,{\mathbb E}[\mathrm{tr}(H_{\ell_{12}})\mathrm{tr}(H_{\ell_{3}})\ldots \mathrm{tr}(H_{\ell_{n}})] \end{aligned}$$

in the case II.

It is understood that if two of the four faces around the crossing under consideration are identical, then the corresponding derivative should be taken twice. Moreover, in the case where \(M={\mathbb R}^{2}\) , any term corresponding to the derivative with respect to the area of the unbounded face should be ignored.

Makeenko and Migdal’s original paper on this subject is [31]. The first mathematical proof of the equations was given in [24]. It was rather long and convoluted, and restricted to the case where the surface M is the plane \({\mathbb R}^{2}\). Three very short and elegant proofs of the equations were then given, still for the case of the plane, by Bruce Driver, Brian Hall and Todd Kemp in [8]. Immediately after, the same team joined by Franck Gabriel proved in [7] that the equations hold on any compact surface. There is little point in reproducing here the content of these beautiful papers. Let us simply emphasise that the fundamental computations remain those summarised in (47).

In addition to their simplicity, the Makeenko–Migdal equations have one major quality which is the fact that the collection of loops appearing in the right-hand side has one crossing less compared with the original collection of loops. Indeed, the operation of desingularisation replaces the crossing where it takes place by a tangential contact which, to the price of an arbitrarily small deformation of the loops, can be suppressed. This suggests the possibility of a recursive computation of Wilson loop expectations. We will explain in the next section that it is indeed possible to use the Makeenko–Migdal equations to set up a recursive computation of the large N limit of Wilson loop expectations.

What the Makeenko–Migdal equations do not do, however, is give a simple formula for the derivative of a Wilson loop expectation with respect to the area of a single face of the graph traced by a given configuration of loops. Only very special linear combinations of these derivatives are accessible. Of course, unless one is working on the plane, the total area of the surface is prescribed and the best one could hope for is a formula describing the variation of the Wilson loop expectations under an arbitrary variation of the areas of the faces that preserves the total area. However, this is, in general, not given by the Makeenko–Migdal equations, see for example Figure 11.

Fig. 11
figure 11

Consider this configuration of two loops on a sphere. It has five faces and three vertices. Moreover, of the three instances of the Makeenko–Migdal equations, two compute the same linear combination of derivatives. There is no hope that the Makeenko–Migdal equations alone will allow one to compute the corresponding Wilson loop expectation

It is, in fact, not too difficult to understand what information is available in the Makeenko–Migdal equations. Let us consider n loops ℓ 1, …, ℓ n on our surface M. Let F 1, …, F r denote the faces of the graph traced by these loops. Let us identify a vector (c 1, …, c r) of the vector space \({\mathbb R}^{r}\) with the linear combination of derivatives

$$\displaystyle \begin{aligned}c_{1}\frac{\partial}{\partial |F_{1}|}+\ldots +c_{r}\frac{\partial}{\partial |F_{r}|}\end{aligned}$$

acting on Wilson loop expectations. Let us define the linear subspace \(M\subset {\mathbb R}^{r}\) generated by the linear combinations given by the Makeenko–Migdal equations applied at each crossing of the loops ℓ 1, …, ℓ n. This subspace M is of course contained in the hyperplane \({\mathbb R}^{r}_{0}\) of equation c 1 + … + c r = 0. Every element of \({\mathbb R}^{r}\) can naturally be identified with a function on M that is constant on each face of the graph. To each loop ℓ i, we can associate the unique element \(\mathsf {n}_{\ell _{i}}\) of \({\mathbb R}^{r}_{0}\) which, as a function on M, varies by 1 across ℓ i Footnote 31 and is constant across every other loop. This function is a substitute for the winding number of the loop ℓ i on the surface M.

It is not difficult to check that, for an element of \({\mathbb R}^{r}\), being orthogonal to the subspace M (for the standard scalar product) is equivalent to having a constant jump across every loop, the constant possibly depending on the loop. A more formal statement is the following. We denote by 1 the vector (1, …, 1).

Proposition 3.2

In \({\mathbb R}^{r}\) , one has the equality of linear subspaces

$$\displaystyle \begin{aligned}M=\mathrm{Vect}(\mathsf{1},\mathsf{n}_{\ell_{1}},\ldots,\mathsf{n}_{\ell_{n}})^{\perp}\end{aligned}$$

In particular, \(\dim M=\dim {\mathbb R}^{r}_{0}-n\).

The greater the number of loops, the worse the situation. Even with a single loop, we see that not all the information about the Wilson loop expectations is contained in the Makeenko–Migdal equations.

It is time to turn to a case where things improve drastically, namely the large N limit of the Wilson loop expectations.

3.4 The Master Field on Compact Surfaces

We saw in Section 2 that when G = U(N), Wilson loop expectations tend to take simpler forms in the limit where N tends to infinity (compare for example (20) and (21)). We also observed some instances of a property of factorisation, see for example (26). The factorisation is due to a phenomenon of concentration, with the effect that, as N tends to infinity, and provided one scales the scalar product on \({\mathfrak u}(N)\) correctly (which we did), the Wilson loop functionals, that is, the normalised traces of the random holonomies, become deterministic. The limit is thus a number depending on a loop, and this function is relatively simple, at least when one is working on the plane, because it satisfies, and is essentially determined by, the Makeenko–Migdal equations.

The main theorem of convergence is the following.

Theorem 3.3

(Master field) Let M be either the plane \({\mathbb R}^{2}\) or the sphere S 2. For each \(N\geqslant 1\), let \((H_{N,\ell })_{\ell \in {\mathcal L}(M)}\) be the Yang–Mills holonomy process on M with structure group G = U(N), and with scalar product \(\langle X, Y\rangle = N\mathrm{Tr}(X^{*}Y)\) on \({\mathfrak u}(N)\). Then for every loop \(\ell \in {\mathcal L}(M)\), the convergence of complex-valued random variables

$$\displaystyle \begin{aligned} \mathrm{tr}(H_{N,\ell}) \,{\mathop{\kern 0pt\longrightarrow}\limits_{N\to \infty}^{P}}\, \Phi(\ell) \end{aligned} $$
(48)

holds in probability, towards a deterministic real limit.

This theorem was proved in [24] in the case of the plane, and in [5] in the case of the sphere, see also [17]. In the case of the plane, which is simpler, it is also known that the convergence occurs quickly, in the sense that the series \(\sum _{N\geqslant 1} \mathrm {Var}(\mathrm {tr}(H_{N,\ell }))\) converges. Thus, the convergence (48) holds almost surely. The conclusion is also known to be true if one replaces the unitary group by the special unitary group, the special orthogonal group, or the symplectic group.

It is expected that Theorem 3.3 is true on any compact surface, but a proof of this fact still has to be given.

In any case, when this theorem holds, the aforementioned asymptotic factorisation takes place, in the sense that for all loops 1, …, n,

$$\displaystyle \begin{aligned}\lim_{N\to \infty}{\mathbb E}[\mathrm{tr}(H_{\ell_{1}})\ldots \mathrm{tr}(H_{\ell_{n}})]{=}\lim_{N\to \infty}{\mathbb E}[\mathrm{tr}(H_{\ell_{1}})]\ldots \lim_{N\to \infty}{\mathbb E}[\mathrm{tr}(H_{\ell_{n}})]{=}\Phi(\ell_{1})\ldots \Phi(\ell_{n})\end{aligned} $$

The function \(\Phi :{\mathcal L}(M)\to {\mathbb R}\) which appears in (48) is called the master field. This is a continuous function with respect to the convergence of loops with fixed endpoints (see the beginning of Section 1.4.4) and it satisfies, crucially, the Makeenko–Migdal equation (MM ), which is all that there is left of the full set of equations stated in Theorem 3.1 as N tends to infinity.

Theorem 3.4

Assume that M is either the plane \({\mathbb R}^{2}\) or the sphere S 2. The function \(\Phi :{\mathcal L}(M)\to {\mathbb R}\) is the unique function that is continuous, invariant under area-preserving diffeomorphisms, satisfies the Makeenko–Migdal equation (MM ∞), and is such that for every simple loop ℓ enclosing a domain of area t, one has, depending on whether M is the plane or a sphere of total area T,

$$\displaystyle \begin{aligned} \Phi(\ell)=e^{-\frac{t}{2}} \end{aligned} $$
(M = ℝ 2)

or

$$\displaystyle \begin{aligned} \Phi(\ell)=\frac{1}{\pi} \int_{{\mathbb R}} \cosh \Big(\frac{x}{2}(T-2t)\Big) \sin (\pi \rho_{T}(x)){\; \mathrm{d}} x \end{aligned} $$
(M = S2)

3.5 A Value of the Master Field on the Plane

As a conclusion to these notes, we give an example of computation of a value of the master field Φ on the plane, and choose an example that is not listed at the end of [24]. We choose the loop ℓ represented on the left half of Figure 12.

Fig. 12
figure 12

We are interested in computing Φ(ℓ). The strategy is to use the Makeenko–Migdal equations to compute \(\partial _{u}\Phi (\ell )\). When u = 0, the two inner windings of ℓ disentangle, and ℓ becomes identical to ℓ 0. This loop ℓ 0 is similar to the loop that we studied in Section 2.3, and becomes exactly this loop when t 2 = 0. Our first task is thus to compute \(\partial _{t_{2}} \Phi (\ell _{0})\)

Although we did not include this in our description of the function Φ on the plane \({\mathbb R}^{2}\), it is not difficult to check that the derivative of the value of Φ on any loop with respect to the area of a face adjacent to the unbounded face is equal to \(-\frac {1}{2}\) times the value of Φ on this loop. This factor \(-\frac {1}{2}\) comes of course from the stochastic differential equation (15) satisfied by the Brownian motion on U(N).

Given the value of Φ on simple loops and (29), the Makeenko–Migdal equation applied to the vertex of ℓ 0 that is marked in Figure 12 yields

$$\displaystyle \begin{aligned}(2\partial_{s}-\partial_{t_{2}})\Phi(\ell_{0})=(-1-\partial_{t_{2}})\Phi(\ell_{0})=e^{-\frac{s}{2}-t_{1}-t_{2}}(1-t_{1})\end{aligned}$$

whose solution is

$$\displaystyle \begin{aligned}\Phi(\ell_{0})=e^{-\frac{s}{2}-t_{1}-t_{2}}(1-t_{1})(1-t_{2})\end{aligned}$$

If we can determine \(\partial _{u}\Phi (\ell )\) explicitly, we are done, since Φ(ℓ 0) is exactly the value of Φ(ℓ) at u = 0. Applying the Makeenko–Migdal equations at the three marked vertices in Figure 12 yields the derivatives \((\partial _{s_{1}}+\partial _{s_{2}}-\partial _{t_{2}})\Phi (\ell )\), \((\partial _{s_{1}}+\partial _{s_{2}}-\partial _{t_{1}})\Phi (\ell )\), and \((\partial _{t_{1}}+\partial _{t_{2}}-\partial _{s_{2}}-\partial _{u})\Phi (\ell )\). Adding the three expressions and using the fact that \(\partial _{s_{1}}\Phi (\ell )=\partial _{s_{2}}\Phi (\ell )=-\frac {1}{2}\Phi (\ell )\), we find

$$\displaystyle \begin{aligned}\Big(-\frac{3}{2}-\partial_{u}\Big)\Phi(\ell)=e^{-\frac{s_{1}+s_{2}}{2}-t_{1}-t_{2}-\frac{3u}{2}}(3-t_{1}-t_{2}-3u)\end{aligned}$$

and finally

$$\displaystyle \begin{aligned} \Phi(\ell)=e^{-\frac{s_{1}+s_{2}}{2}-(t_{1}+t_{2})-\frac{3u}{2}}\Big(\frac{3u^{2}}{2}+(t_{1}+t_{2}-3)u+(1-t_{1})(1-t_{2})\Big) \end{aligned} $$
(49)

Evaluating this expression with s 1 = s 2 = t 1 = t 2 = 0 yields the large N limit of the third moment of the unitary Brownian motion at time u, as expressed by (22) with n = 3. This is consistent with the fact that shrinking all faces but the face of area u reduces to a loop winding three times around a simple domain of area u.
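As a final consistency check (a symbolic sketch in Python with sympy, our own illustration rather than part of the original computation), one can verify that (49) satisfies the differential equation displayed above, reduces to Φ(ℓ 0) at u = 0, and agrees with (22) for n = 3 when s 1 = s 2 = t 1 = t 2 = 0.

```python
import sympy as sp

s1, s2, t1, t2, u = sp.symbols('s1 s2 t1 t2 u', nonnegative=True)

# Formula (49) for Phi(ell).
Phi = sp.exp(-(s1 + s2)/2 - (t1 + t2) - 3*u/2) * (
    sp.Rational(3, 2)*u**2 + (t1 + t2 - 3)*u + (1 - t1)*(1 - t2))

# It satisfies the differential equation coming from the three
# Makeenko-Migdal relations applied at the marked vertices ...
rhs = sp.exp(-(s1 + s2)/2 - t1 - t2 - 3*u/2) * (3 - t1 - t2 - 3*u)
print(sp.simplify(-sp.Rational(3, 2)*Phi - sp.diff(Phi, u) - rhs))            # 0

# ... reduces to Phi(ell_0) at u = 0 ...
print(sp.simplify(Phi.subs(u, 0)
                  - sp.exp(-(s1 + s2)/2 - t1 - t2)*(1 - t1)*(1 - t2)))        # 0

# ... and at s1 = s2 = t1 = t2 = 0 gives the n = 3 case of (22).
print(sp.simplify(Phi.subs({s1: 0, s2: 0, t1: 0, t2: 0})
                  - sp.exp(-3*u/2)*(1 - 3*u + sp.Rational(3, 2)*u**2)))       # 0
```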