The question we want to address now is: is there anything deeper behind the phenomenological facts (1), (2), and (3) discussed in the first chapter and the formalization of Sect. 3.4?

An appealing attempt to answer that question and justify the formalism based on the spectral theory is due to von Neumann [Neu32] (and subsequently extended by Birkhoff and von Neumann). This chapter will review quickly the elementary content of those ideas, adding however several modern results (see also [Var07, Mor18] for a similar approach and [Red98] for an extensive technical account on quantum lattice theory and applications).

4.1 Lattices in Classical and Quantum Mechanics

This section introduces the mathematical notion of lattice, which will be used later to construct a bridge between classical and quantum systems.

4.1.1 A Different Viewpoint on Classical Mechanics

Let us start by analyzing Classical Mechanics (CM). Consider a classical Hamiltonian system described on a symplectic manifold ( Γ, ω), where \(\omega = \sum _{k=1}^n dq^k\wedge dp_k\) in any system of local symplectic coordinates q 1, …, q n, p 1, …, p n. The state of the system at time t is a point s ∈ Γ, in local coordinates s ≡ (q 1, …, q n, p 1, …, p n), whose evolution \({\mathbb R} \ni t \mapsto s(t)\) solves the Hamiltonian equations of motion. Always in local symplectic coordinates, they read

$$\displaystyle \begin{aligned}\frac{dq^k}{dt} = \frac{\partial h(t,q,p)}{\partial p_k}\:, \quad \frac{dp_k}{dt} = -\frac{\partial h(t,q,p)}{\partial q^k} \:, \qquad k=1,\ldots, n \:,\end{aligned}$$

h being the Hamiltonian function of the system, depending on the reference frame. Every physical elementary property E that the system may possess at a certain time t, i.e. which can be true or false at that time, can be identified with a subset E ⊂ Γ. The property is true if s ∈ E and false if s ∉ E. From this point of view, the standard set operations ∩, ∪, ⊂, ¬ (where ¬E := Γ ∖ E from now on is the complementation) have a logical interpretation:

  1. (i)

    E ∩ F corresponds to the property “E AND F”,

  2. (ii)

    E ∪ F corresponds to the property “E OR F”,

  3. (iii)

    ¬E corresponds to the property “NOT E”,

  4. (iv)

    E ⊂ F means “E IMPLIES F”.

In this context,

  1. (v)

    Γ is the property which is always true

  2. (vi)

    \(\varnothing \) is the property which is always false.

This identification is possible because, as is well known, the logical connectives define the same algebraic structure as the set-theory operations.
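The dictionary (i)-(vi) can be checked mechanically on a toy model. The following is only a minimal sketch: the finite set `GAMMA` and the properties `E`, `F` are hypothetical stand-ins for a phase space and its elementary properties, chosen for illustration.

```python
# Toy "phase space": a finite set of labelled states (an assumption made
# purely for illustration; a real phase space is a manifold).
GAMMA = frozenset(range(8))

def NOT(E):
    """Complement in GAMMA: the property 'NOT E'."""
    return GAMMA - E

def holds(E, s):
    """The property E is true in the state s exactly when s lies in E."""
    return s in E

E = frozenset({0, 1, 2, 3})   # hypothetical property E
F = frozenset({2, 3, 4})      # hypothetical property F

# Intersection, union and complement act like AND, OR and NOT in every state.
and_or_not_ok = all(
    holds(E & F, s) == (holds(E, s) and holds(F, s))
    and holds(E | F, s) == (holds(E, s) or holds(F, s))
    and holds(NOT(E), s) == (not holds(E, s))
    for s in GAMMA
)

# Inclusion corresponds to implication: "E AND F" IMPLIES "F".
implication_ok = (E & F) <= F

# De Morgan's law holds for sets exactly as for the logical connectives.
de_morgan_ok = NOT(E | F) == NOT(E) & NOT(F)
```

Here the always-true property is `GAMMA` itself and the always-false one is the empty `frozenset()`.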

As soon as we admit the possibility of constructing statements including countably many disjunctions or conjunctions, we can move into abstract measure theory and interpret states as probability Dirac measures supported on a single point. To this end, we initially restrict the class of possible elementary properties to the Borel σ-algebra of Γ, \({\mathcal{B}}(\Gamma)\). For various reasons this class of sets seems to be sufficiently large to describe the physics (in particular \({\mathcal{B}}(\Gamma)\) contains the pre-images of measurable sets under continuous functions). A state at time t, s ∈ Γ, can be viewed as a Dirac measure, δ s, supported on s itself. If \(E \in {\mathcal{B}}(\Gamma)\), δ s(E) = 0 if s ∉ E and δ s(E) = 1 if s ∈ E.

If we do not have a perfect knowledge of the system, as for instance it happens in statistical mechanics, the state μ at time t is a proper probability measure on \({\mathcal{B}}(\Gamma)\), which now is allowed to attain all values in [0, 1]. If \(E \in {\mathcal{B}}(\Gamma)\) is an elementary property of the physical system, μ(E) denotes the probability that the property E is true for the system at time t.
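As a small illustration of the two kinds of state, one may compare a Dirac measure with a genuinely probabilistic measure on a finite toy phase space. The set `GAMMA`, the weights and the helper names below are assumptions introduced only for this sketch.

```python
from fractions import Fraction

GAMMA = frozenset(range(4))          # toy finite phase space (assumption)

def dirac(s):
    """Dirac measure supported at the point s."""
    return lambda E: 1 if s in E else 0

def measure_from_weights(weights):
    """Probability measure on subsets of GAMMA built from point weights."""
    assert sum(weights.values()) == 1
    return lambda E: sum(weights[s] for s in E)

E = frozenset({1, 2})                # hypothetical elementary property

delta = dirac(2)                     # sharp state: the system is at s = 2
mu = measure_from_weights({0: Fraction(1, 2), 1: Fraction(1, 4),
                           2: Fraction(1, 4), 3: Fraction(0)})

# A Dirac measure only takes the values 0 and 1 ...
sharp_values = (delta(E), delta(GAMMA - E))
# ... whereas a statistical state assigns a probability in [0, 1].
prob_E = mu(E)
```

With these weights the property `E` is certainly true in the sharp state, while it has probability 1/2 in the statistical state.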

Remark 4.1

The evolution equation of μ in statistical mechanics is given by the well-known Liouville equation associated with the Hamiltonian flow. In that case μ is proportional to the natural symplectic volume of Γ, Ω = ω ∧⋯ ∧ ω (n-times, where 2n = dim( Γ)). In fact we have μ = ρ Ω, where the non-negative function ρ is the so-called Liouville density satisfying the famous Liouville equation. In symplectic local coordinates that equation reads

$$\displaystyle \begin{aligned}\frac{\partial \rho(t,q,p)}{\partial t} +\sum_{k=1}^n \left(\frac{\partial \rho}{\partial q^k} \frac{\partial h}{\partial p_k}-\frac{\partial \rho}{\partial p_k} \frac{\partial h}{\partial q^k}\right)=0\:.\end{aligned}$$

We shall not deal any further with this equation in this book. \(\blacksquare \)

More complicated classical quantities of the system can be described by Borel measurable functions \(f: \Gamma \to {\mathbb R}\). Measurability is a good requirement as it permits one to perform physical operations like computing, for instance, the expectation value (at a given time) when the state is μ:

$$\displaystyle \begin{aligned}\langle f \rangle_\mu = \int_\Gamma f \:d\mu\:.\end{aligned}$$

Also elementary properties can be described by measurable functions, in fact they are identified faithfully with Borel measurable functions g :  Γ →{0, 1}. The Borel set E g associated to g is g −1({1}) and in fact \(g=\chi _{E_g}\).

A generic physical quantity, a measurable function \(f: \Gamma \to {\mathbb R}\), is completely determined by the class of Borel sets (elementary properties) \(E^{(f)}_B := f^{-1}(B)\) where \(B \in {\mathcal {B}}({\mathbb R})\). The meaning of \(E^{(f)}_B\) is

$$\displaystyle \begin{aligned} \begin{array}{rcl} E^{(f)}_B =\mbox{``the value of}\ f\ \mbox{belongs to}\ B\mbox{''} {}\end{array} \end{aligned} $$
(4.1)

It is possible to prove [Mor18] that the map \( {\mathcal {B}}({\mathbb R}) \ni B \mapsto E^{(f)}_B\) permits one to reconstruct the function f. The sets \(E^{(f)}_B := f^{-1}(B)\) form a σ-algebra as well and the class of sets \(E^{(f)}_B\) satisfies the following elementary properties when B ranges in \({\mathcal {B}}({\mathbb R})\).

  1. (Fi)

    \(E^{(f)}_{\mathbb R} = \Gamma \),

  2. (Fii)

    \(E^{(f)}_B \cap E^{(f)}_C =E^{(f)}_{B\cap C}\),

  3. (Fiii)

    If \(N \subset {\mathbb N}\) and \(\{B_k\}_{k\in N} \subset {\mathcal {B}}({\mathbb R})\) satisfies \(B_j \cap B_k = \varnothing \) for k ≠ j, then

    $$\displaystyle \begin{aligned}\cup_{j \in N} E^{(f)}_{B_j} = E^{(f)}_{\cup_{j\in N}B_j}\:.\end{aligned}$$

These conditions just say that \( {\mathcal {B}}({\mathbb R}) \ni B \mapsto E^{(f)}_B \in \mathcal {B}(\Gamma )\) is a homomorphism of σ-algebras. Notice in particular that, keeping (Fi) and (Fiii), requirement (Fii) can be replaced by \(E^{(f)}_{{\mathbb R} \setminus B} = \Gamma \setminus E^{(f)}_B\), as the reader can immediately prove.
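Properties (Fi)-(Fiii), and the fact that f can be reconstructed from the family of its pre-images, are easy to test on a finite model. In the sketch below the phase space `GAMMA`, the quantity `f` and the finite set `VALUES` (which plays the role of the real line) are all hypothetical choices made for illustration.

```python
GAMMA = frozenset(range(6))            # toy finite phase space (assumption)
f = {s: s % 3 for s in GAMMA}          # hypothetical quantity with values 0, 1, 2
VALUES = frozenset(f.values())         # plays the role of the real line here

def E_f(B):
    """Elementary property 'the value of f belongs to B', i.e. f^{-1}(B)."""
    return frozenset(s for s in GAMMA if f[s] in B)

B, C = frozenset({0, 1}), frozenset({1, 2})

fi_ok = E_f(VALUES) == GAMMA                          # (Fi)
fii_ok = E_f(B) & E_f(C) == E_f(B & C)                # (Fii)

disjoint = [frozenset({0}), frozenset({1}), frozenset({2})]
union_of_images = frozenset().union(*map(E_f, disjoint))
image_of_union = E_f(frozenset().union(*disjoint))
fiii_ok = union_of_images == image_of_union           # (Fiii)

complement_ok = E_f(VALUES - B) == GAMMA - E_f(B)     # equivalent form of (Fii)

# f is recovered from B -> E_f(B): f(s) is the unique v with s in E_f({v}).
f_rebuilt = {s: next(v for v in VALUES if s in E_f({v})) for s in GAMMA}
```

The last line mirrors, in this finite setting only, the statement that the map B ↦ E^(f)_B determines f.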

We observe that our model of classical elementary properties can also be viewed as another mathematical structure, when referring to the notion of lattice we are about to introduce.

4.1.2 The Notion of Lattice

We remind the reader that in a partially ordered set (X, ≥) (or poset), if Y ⊂ X, the symbol \(\sup Y\) denotes, if it exists, the smallest element x of X such that x ≥ y for every y ∈ Y . Similarly, the symbol \(\inf Y\) denotes, if it exists, the largest element x of X such that y ≥ x for every y ∈ Y .

Definition 4.2

A partially ordered set (X, ≥) is a lattice when, for any a, b ∈ X,

  1. (a)

    \(\sup \{a,b\}\) exists in X, and is called join a ∨ b;

  2. (b)

    \(\inf \{a,b\}\) exists in X, and is called meet a ∧ b.

(The poset is not required to be totally ordered.) \(\blacksquare \)

Remark 4.3

  1. (a)

    In the concrete cases where \(X={\mathcal {B}}({\mathbb R})\) or \(X= \mathcal {B}(\Gamma )\), ≥ is nothing but ⊃ and thus ∨ means ∪ and ∧ has the meaning of ∩.

  2. (b)

    In the general case ∨ and ∧ turn out to be associative, so it makes sense to write a 1 ∨⋯ ∨ a n and a 1 ∧⋯ ∧ a n in a lattice. Moreover they are commutative so

    $$\displaystyle \begin{aligned}a_1 \vee \cdots \vee a_n = a_{\pi(1)} \vee \cdots \vee a_{\pi(n)}\quad \mbox{and}\quad a_1 \wedge \cdots \wedge a_n = a_{\pi(1)} \wedge \cdots \wedge a_{\pi(n)}\end{aligned}$$

    for every permutation π : {1, …, n}→{1, …, n}.

    The absorption laws are moreover valid: a ∨ (a ∧ b) = a and a ∧ (a ∨ b) = a.

  3. (c)

    It is easy to prove that in a lattice a ≥ b iff a ∨ b = a (equivalently a ∧ b = b). \(\blacksquare \)
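A concrete lattice which is not a family of sets may help to fix the ideas of Definition 4.2 and Remark 4.3. The sketch below uses the divisors of 12 ordered by divisibility, an example chosen here for illustration and not taken from the text, where the join is the least common multiple and the meet is the greatest common divisor.

```python
from math import gcd

N = 12
X = [d for d in range(1, N + 1) if N % d == 0]   # divisors of 12

def geq(a, b):
    """Order relation a >= b, here meaning 'b divides a'."""
    return a % b == 0

def join(a, b):
    """sup{a, b}: the least common multiple."""
    return a * b // gcd(a, b)

def meet(a, b):
    """inf{a, b}: the greatest common divisor."""
    return gcd(a, b)

# Absorption laws of Remark 4.3 (b).
absorption_ok = all(join(a, meet(a, b)) == a and meet(a, join(a, b)) == a
                    for a in X for b in X)

# Remark 4.3 (c): a >= b iff a v b = a iff a ^ b = b.
order_ok = all(geq(a, b) == (join(a, b) == a) == (meet(a, b) == b)
               for a in X for b in X)

# The order is only partial: 4 and 6 are not comparable.
not_total = not geq(4, 6) and not geq(6, 4)
```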

Definition 4.4

A lattice (X, ≥) is said to be:

  1. (a)

    distributive if ∨ and ∧ distribute over one another: for any a, b, c ∈ X,

    $$\displaystyle \begin{aligned} \begin{array}{rcl} a\vee (b\wedge c) = (a\vee b)\wedge (a\vee c) \:,\quad a\wedge (b\vee c) = (a\wedge b)\vee (a\wedge c) \:; \end{array} \end{aligned} $$
  2. (b)

    bounded if it admits a minimum 0 and a maximum 1, called bottom and top;

  3. (c)

    orthocomplemented if bounded and equipped with a mapping X ∋ a↦¬a, where ¬a is the orthocomplement of a, such that:

    1. (i)

      a ∨¬a = 1 for any a ∈ X,

    2. (ii)

      a ∧¬a = 0 for any a ∈ X,

    3. (iii)

      ¬(¬a) = a for any a ∈ X,

    4. (iv)

      a ≥ b implies ¬b ≥¬a for any a, b ∈ X;

  4. (d)

    complete (resp. σ-complete), if every (countable) set \(\{a_j\}_{j\in J} \subset X\) admits supremum \(\vee_{j\in J} a_j\) and infimum \(\wedge_{j\in J} a_j\).

A lattice with properties (a), (b) and (c) is called a Boolean algebra. A Boolean algebra satisfying (d) with \(J= {\mathbb N}\) is a Boolean σ -algebra.

A sublattice is a subset X 0 ⊂ X inheriting the lattice structure from X, in the following precise sense: the infimum and the supremum of any pair of elements of X 0 must exist in X 0 and coincide with the corresponding infimum and supremum in X. Referring to bounded sublattices and orthocomplemented sublattices, the top, the bottom and the orthocomplement of the substructure must coincide, by definition, with those in the larger structure. \(\blacksquare \)

It is easy to prove De Morgan’s laws for an orthocomplemented lattice [Red98, Mor18] just applying the relevant definitions.

Proposition 4.5

If (X, ≥, 0, 1, ¬) is an orthocomplemented lattice and A ⊂ X is finite then, with an obvious notation,

$$\displaystyle \begin{aligned}\neg \vee_{a\in A}a = \wedge_{a\in A} \neg a \quad \mathit{\mbox{and}}\quad \neg \wedge_{a\in A}a = \vee_{a\in A} \neg a\:.\end{aligned}$$

If A is infinite, the terms on either side exist or do not exist simultaneously. If they do, the formula holds.

Definition 4.6

If X, Y are lattices, a map h : X → Y is a lattice homomorphism when

$$\displaystyle \begin{aligned}h(a \vee_X b) = h(a) \vee_Y h(b) \:, \: \: \:h(a \wedge_X b) = h(a) \wedge_Y h(b)\:,\:\:\:a,b \in X\end{aligned}$$

(with the obvious notations.) If X and Y are bounded, a homomorphism h is further required to satisfy

$$\displaystyle \begin{aligned}h({\mathbf{0}}_X) = {\mathbf{0}}_Y \:, \quad h({\mathbf{1}}_X) = {\mathbf{1}}_Y \:.\end{aligned}$$

If X and Y are orthocomplemented, in addition,

$$\displaystyle \begin{aligned} h(\neg_X a) = \neg_Y h(a)\:.\end{aligned}$$

If X, Y are complete (σ-complete), h is further required to satisfy (with \(J={\mathbb N}\) in the σ-complete case)

$$\displaystyle \begin{aligned}h(\vee_{j\in J} a_j) = \vee_{j\in J} h(a_j) \:, \quad h(\wedge_{j\in J} a_j) = \wedge_{j\in J} h(a_j) \quad \mbox{if}\ \{a_j\}_{j\in J} \subset X\:.\end{aligned}$$

In all cases (bounded, orthocomplemented, (σ-)complete lattices, Boolean (σ-) algebras) if h is bijective it is called isomorphism. \(\blacksquare \)

It is clear that, just because it is a concrete σ-algebra, the lattice of the elementary properties of a classical system is a lattice which is distributive, bounded (here \(0=\varnothing \) and 1 =  Γ), orthocomplemented (the orthocomplement being the set complement in Γ) and σ-complete. Moreover, as the reader can easily prove, the above map, \({\mathcal {B}}({\mathbb R}) \ni B \mapsto E^{(f)}_B \in \mathcal {B}(\Gamma )\), is also a homomorphism of Boolean σ-algebras.

Remark 4.7

Given an abstract Boolean σ-algebra X, does there exist a concrete σ-algebra of sets that is isomorphic to it? The Loomis-Sikorski theorem [Sik48] gives an answer. It guarantees that every Boolean σ-algebra is isomorphic to a quotient Boolean σ-algebra \(\Sigma /\mathcal {N}\), where Σ is a concrete σ-algebra of sets on a measurable space and \(\mathcal {N}\subset \Sigma \) is closed under countable unions; moreover, \(\varnothing \in \mathcal {N}\), and \(A \in \mathcal {N}\) for any A ∈ Σ with \(A \subset N \in \mathcal {N}\). The equivalence relation is A ∼ B iff \((A\cup B) \setminus (A \cap B) \in \mathcal {N}\), for any A, B ∈ Σ. It is easy to see that the coset space \(\Sigma /\mathcal {N}\) inherits the structure of Boolean σ-algebra from Σ with respect to the (well-defined) partial order [A] ≥ [B] if A ⊃ B, A, B ∈ Σ.

In the simpler case of an abstract Boolean algebra, the celebrated Stone’s representation theorem [Sto36] proves that it is always isomorphic to a concrete algebra of sets. \(\blacksquare \)

4.2 The Non-Boolean Logic of QM

It is evident that the classical-like picture illustrated in Sect. 4.1 is untenable for quantum systems. The deep reason is that there are pairs of elementary properties E, F of quantum systems which are incompatible. Here, an elementary property is an observable which, if measured by means of a corresponding experimental apparatus, can only attain two values: 0 if it is false or 1 if it is true. For instance, E = “the component S x of the spin of the electron is ħ∕2” and F = “the component S y of the spin of the electron is ħ∕2”. There is no physical instrument capable of establishing whether E AND F is true or false. We conclude that some elementary observables of quantum systems cannot be combined by means of the standard logical connectives. The model of the Borel σ-algebra seems not to be appropriate for quantum systems. However, one could try to use some form of lattice structure different from the classical one.

4.2.1 The Lattice of Quantum Elementary Observables

The fundamental ideas of von Neumann were the following two.

  1. (N1)

    Given a quantum system, there is a complex separable Hilbert space H such that the elementary observables—the ones which only assume values in {0, 1}—are represented faithfully by elements of \({\mathcal {L}}({\mathsf H})\), the orthogonal projectors in \({\mathfrak B}({\mathsf H})\).

  2. (N2)

    Two elementary observables P, Q are compatible if and only if they commute as projectors.

Remark 4.8

  1. (a)

    As we shall see later, (N1) has to be modified for quantum systems admitting superselection rules. For the moment we stick to the above version of (N1).

  2. (b)

    Separability will play a crucial role in several technical constructions. This technical requirement could actually be omitted, and proved to hold later for specific quantum systems (e.g., elementary particles) as a consequence of specific physical requirements. However we shall assume it from the beginning. \(\blacksquare \)

Let us analyse the reasons for von Neumann’s postulates. First of all we observe that \({\mathcal {L}}({\mathsf H})\) is in fact a lattice, if one remembers the relation between orthogonal projectors and closed subspaces stated in Proposition 3.16 and equips the set of closed subspaces with the natural ordering relation given by set-theoretic inclusion.

Referring to Notation 3.18, if \(P,Q \in {\mathcal {L}}({\mathsf H})\), we write P ≥ Q if and only if P(H) ⊃ Q(H). As announced, it turns out that \(({\mathcal {L}}({\mathsf H}), \geq )\) is a lattice and, in particular, it enjoys the following properties.

Proposition 4.9

Let H be a complex (not necessarily separable) Hilbert space. For every \(P \in {\mathcal {L}}({\mathsf H})\), define ¬P := I − P (the orthogonal projector onto \(P({\mathsf H})^\perp\) according to Proposition 3.16). Then \(({\mathcal {L}}({\mathsf H}), \geq , 0, I, \neg )\) is a bounded, orthocomplemented, complete (so also σ-complete) lattice which is not distributive if dim(H) ≥ 2.

More precisely,

  1. (i)

    P ∨ Q is the orthogonal projector onto \(\overline {P({\mathsf H})+Q({\mathsf H})}\).

    The analogue holds for a set \(\{P_j\}_{j \in J}\subset {\mathcal {L}}({\mathsf H})\), namely \(\vee_{j\in J} P_j\) is the orthogonal projector onto \(\overline {\mathit{\mbox{span}}\{P_j({\mathsf H})\}_{j\in J}}\).

  2. (ii)

    P ∧ Q is the orthogonal projector onto P(H) ∩ Q(H).

    The analogue holds for a set \(\{P_j\}_{j \in J}\subset {\mathcal {L}}({\mathsf H})\), namely \(\wedge_{j\in J} P_j\) is the orthogonal projector onto \(\cap_{j\in J} P_j({\mathsf H})\).

  3. (iii)

    The bottom and top elements are respectively 0 and I.

  4. (iv)

    Referring to (i) and (ii), if \(J={\mathbb N}\)

    $$\displaystyle \begin{aligned} \begin{array}{rcl} \vee_{n\in {\mathbb N}} P_n = \mathit{\mbox{s-}}\lim_{k \to +\infty} \vee_{n\leq k} P_n \quad \mathit{\mbox{and}}\quad \wedge_{n\in {\mathbb N}} P_n = \mathit{\mbox{s-}}\lim_{k \to +\infty} \wedge_{n\leq k} P_n \qquad {}\end{array} \end{aligned} $$
    (4.2)

    where “s-” indicates that the limits are computed in the strong operator topology.

Proof

The fact that \(\mathcal {L}({\mathsf H})\) is a lattice is evident when we interpret it as a poset of closed subspaces. It is clear that \(\sup \{P({\mathsf H}), Q({\mathsf H})\} = \overline {P({\mathsf H})+Q({\mathsf H})}\) if \(P,Q\in \mathcal {L}({\mathsf H})\), since \(\overline {P({\mathsf H})+Q({\mathsf H})}\) is a closed subspace containing both P(H) and Q(H), and every closed subspace containing these subspaces must also contain \(\overline {P({\mathsf H})+Q({\mathsf H})}\) by linearity and definition of closure. It is clear that \(\inf \{P({\mathsf H}), Q({\mathsf H})\} = P({\mathsf H})\cap Q({\mathsf H})\) if \(P,Q\in \mathcal {L}({\mathsf H})\), since the closed subspace P(H) ∩ Q(H) is contained in both P(H) and Q(H), and every closed subspace contained in both P(H) and Q(H) must be contained in the closed subspace P(H) ∩ Q(H). A trivial extension of the same arguments proves (i) and (ii). It is evident that \(\mathcal {L}({\mathsf H})\) is bounded with said top and bottom. The fact that ¬P := I − P (that is, the orthogonal projector onto \(P({\mathsf H})^\perp\) as established in Proposition 3.16 (b)) is an orthocomplement can be immediately proved by direct inspection using the properties of the orthogonal complement ⊥ presented in Sect. 2.1.2 and in Proposition 3.16. Failure of distributivity for \(\dim ({\mathsf H})\geq 2\) immediately arises from the analogue for \({\mathsf H}={\mathbb C}^2\) that we are going to prove. Let {e 1, e 2} be the standard basis of \({\mathbb C}^2\) and define the subspaces H 1 := span{e 1}, H 2 := span{e 2}, H 3 := span{e 1 + e 2}. Finally P 1, P 2, P 3 respectively denote the orthogonal projectors onto these spaces. By direct inspection one sees that P 1 ∧ (P 2 ∨ P 3) = P 1 ∧ I = P 1 and (P 1 ∧ P 2) ∨ (P 1 ∧ P 3) = 0 ∨ 0 = 0, so that P 1 ∧ (P 2 ∨ P 3) ≠ (P 1 ∧ P 2) ∨ (P 1 ∧ P 3). To end the proof, let us prove (4.2). Consider the former limit. \(P := \mbox{s-}\lim_{k\to +\infty} \vee_{n\leq k} P_n\) exists in \(\mathcal {L}({\mathsf H})\) in view of Proposition 3.20, since \(\vee_{n\leq k} P_n\) projects onto larger and larger subspaces as k increases.
We want to prove that the limit P coincides with the projector onto \(\overline {\mbox{span}\{P_n({\mathsf H})\}_{n\in {\mathbb N}}}\) denoted by \(\vee _{n\in {\mathbb N}}P_n\) in (i). It is clear that \(\vee_{n\leq k} P_n \leq P\) by definition of P, as it holds that

$$\displaystyle \begin{aligned}\langle x| \vee_{n\leq k} P_nx\rangle \leq \sup_{k\in {\mathbb N}}\langle x| \vee_{n\leq k} P_nx\rangle = \lim_{k \to +\infty}\langle x| \vee_{n\leq k} P_nx\rangle = \langle x|Px\rangle\:,\end{aligned}$$

so P(H) contains all subspaces \((\vee_{n\leq k} P_n)({\mathsf H})\) and also each single P n(H). So P(H) contains their finite span, by linearity, and also the closure of the span, because P(H) is closed. Hence \(P({\mathsf H}) \supset \overline {\mbox{span}\{P_n({\mathsf H})\}_{n\in {\mathbb N}}}\). On the other hand, if x ∈ P(H), then \(x=\lim _{k \to +\infty } \vee _{n\leq k} P_nx \in \overline {\mbox{span}\{P_n({\mathsf H})\}_{n\in {\mathbb N}}}\), hence \(P({\mathsf H}) \subset \overline {\mbox{span}\{P_n({\mathsf H})\}_{n\in {\mathbb N}}}\). We conclude that \(P({\mathsf H}) = \overline {\mbox{span}\{P_n({\mathsf H})\}_{n\in {\mathbb N}}}\). By (i), this is the same as saying \(P= \vee _{n\in {\mathbb N}}P_n\). The proof of the second formula in (4.2) is identical barring trivial changes. □
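The counterexample to distributivity used in the proof can also be checked numerically. The sketch below works in the real plane instead of \({\mathbb C}^2\), which is enough for these three projectors. As a computational shortcut, not part of the proof above, the meet is approximated by a high power of PQ, relying on the identity of Proposition 4.12 stated later in this section, and the join is obtained from De Morgan's law of Proposition 4.5. The helper names and the number of iterations are assumptions of this sketch.

```python
import math

def matmul(A, B):
    """Product of two square matrices given as nested lists."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def neg(P):
    """Orthocomplement I - P."""
    n = len(P)
    return [[(1.0 if i == j else 0.0) - P[i][j] for j in range(n)]
            for i in range(n)]

def meet(P, Q, steps=200):
    """Approximate the meet of P and Q by the power (PQ)^steps."""
    PQ = matmul(P, Q)
    R = PQ
    for _ in range(steps - 1):
        R = matmul(R, PQ)
    return R

def join(P, Q):
    """Join from De Morgan's law: NOT(NOT P meet NOT Q)."""
    return neg(meet(neg(P), neg(Q)))

def close(A, B, tol=1e-9):
    """Entrywise comparison of two matrices up to a tolerance."""
    return all(math.isclose(a, b, abs_tol=tol)
               for ra, rb in zip(A, B) for a, b in zip(ra, rb))

P1 = [[1.0, 0.0], [0.0, 0.0]]      # projector onto span{e1}
P2 = [[0.0, 0.0], [0.0, 1.0]]      # projector onto span{e2}
P3 = [[0.5, 0.5], [0.5, 0.5]]      # projector onto span{e1 + e2}
ZERO = [[0.0, 0.0], [0.0, 0.0]]

lhs = meet(P1, join(P2, P3))                  # P1 meet (P2 join P3)
rhs = join(meet(P1, P2), meet(P1, P3))        # (P1 meet P2) join (P1 meet P3)
```

Up to rounding, `lhs` reproduces P 1 while `rhs` reproduces the zero projector, in agreement with the computation in the proof.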

4.2.2 Part of Classical Mechanics is Hidden in QM

To go on, the crucial observation is that \(({\mathcal {L}}({\mathsf H}), \geq , 0, I, \neg )\) contains lots of Boolean σ-algebras, and precisely the maximal sets of pairwise compatible projectors. These σ-algebras in the quantum context could be interpreted as made of classical observables at least concerning mutual relations.

Proposition 4.10

Let H be a complex separable Hilbert space and consider the lattice of orthogonal projectors \(({\mathcal {L}}({\mathsf H}), \geq , 0, I, \neg )\).

Assume that \({\mathcal {L}}_0 \subset {\mathcal {L}}({\mathsf H})\) is a maximal subset of pairwise commuting elements (i.e. if \(Q \in \mathcal {L}({\mathsf H})\) commutes with every \(P\in \mathcal {L}_0\) then \(Q\in \mathcal {L}_0\) ). Then \({\mathcal {L}}_0\) contains 0 and I, and it is ¬-closed. Furthermore, when equipped with the restriction of the lattice structure of \(({\mathcal {L}}({\mathsf H}), \geq , 0, I, \neg )\) , it becomes a Boolean σ-algebra (in particular the supremum and the infimum of sequences of elements computed in \(\mathcal {L}_0\) coincide with the corresponding sup and inf in the whole \(\mathcal {L}({\mathsf H})\) ). Finally, if \(P,Q \in {\mathcal {L}}_0\) ,

  1. (i)

    P ∨ Q = P + Q − PQ ,

  2. (ii)

    P ∧ Q = PQ.

Proof

\({\mathcal {L}}_0\) contains both 0 and I because \({\mathcal {L}}_0\) is maximally commutative, and it is ¬-closed: ¬P = I − P commutes with every element of \(\mathcal {L}_0\) if \(P\in \mathcal {L}_0\), so \(\neg P \in \mathcal {L}_0\) due to the maximality condition. Taking advantage of the associativity of ∨ and ∧, and using (iv) in Proposition 4.9, the \(\sup \) and \(\inf \) of a sequence of projectors \(\{P_n\}_{n\in {\mathbb N}} \subset {\mathcal {L}}_0\) commute with the elements of \({\mathcal {L}}_0\), since every element \(\vee_{n\leq k} P_n\) and \(\wedge_{n\leq k} P_n\) does by direct application of (i) and (ii). Maximality implies that these limit projectors belong to \({\mathcal {L}}_0\). Finally (i) and (ii) prove by direct inspection that ∨ and ∧ are mutually distributive. Let us prove (ii) and (i) to conclude. If PQ = QP, PQ is an orthogonal projector and PQ(H) = QP(H) ⊂ P(H) ∩ Q(H). On the other hand, if x ∈ P(H) ∩ Q(H) then Px = x and x = Qx so that PQx = x and thus P(H) ∩ Q(H) ⊂ PQ(H) and (ii) holds. To prove (i) observe that \(\overline {P({\mathsf H})+Q({\mathsf H})}^\perp = (P({\mathsf H})+Q({\mathsf H}))^\perp \). By linearity, \((P({\mathsf H})+Q({\mathsf H}))^\perp = P({\mathsf H})^\perp \cap Q({\mathsf H})^\perp\). Therefore \( \overline {P({\mathsf H})+Q({\mathsf H})}= (\overline {P({\mathsf H})+Q({\mathsf H})}^\perp )^\perp = (P({\mathsf H})^\perp \cap Q({\mathsf H})^\perp )^\perp \). Using (ii), and the fact that I − R is the orthogonal projector onto \(R({\mathsf H})^\perp\), this can be rephrased as P ∨ Q = I − (I − P)(I − Q) = I − (I − P − Q + PQ) = P + Q − PQ. □
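A simple instance of Proposition 4.10 is the family of projectors of \({\mathbb C}^3\) which are diagonal in the standard basis; it is a maximal commuting family, since a projector commuting with all of them must itself be diagonal. The sketch below stores such projectors through their diagonals (an encoding chosen here for brevity) and checks that formulas (i) and (ii) turn the correspondence with subsets of {0, 1, 2} into an isomorphism of Boolean algebras.

```python
from itertools import combinations

DIM = 3
INDICES = frozenset(range(DIM))

def proj(S):
    """Diagonal of the projector onto span{e_i : i in S}."""
    return tuple(1 if i in S else 0 for i in range(DIM))

def times(P, Q):
    """Product of two diagonal matrices, stored through their diagonals."""
    return tuple(p * q for p, q in zip(P, Q))

def join(P, Q):
    """Formula (i): P + Q - PQ, valid because P and Q commute."""
    return tuple(p + q - p * q for p, q in zip(P, Q))

def neg(P):
    """Orthocomplement I - P."""
    return tuple(1 - p for p in P)

subsets = [frozenset(c) for r in range(DIM + 1)
           for c in combinations(sorted(INDICES), r)]

# Join, meet and orthocomplement correspond to union, intersection, complement.
iso_ok = all(
    join(proj(S), proj(T)) == proj(S | T)
    and times(proj(S), proj(T)) == proj(S & T)
    and neg(proj(S)) == proj(INDICES - S)
    for S in subsets for T in subsets
)

# Inside this commuting family the meet distributes over the join.
distributive_ok = all(
    times(proj(S), join(proj(T), proj(U)))
    == join(times(proj(S), proj(T)), times(proj(S), proj(U)))
    for S in subsets for T in subsets for U in subsets
)
```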

Remark 4.11

  1. (a)

    Every set of pairwise commuting orthogonal projectors can be completed to a maximal set as an elementary application of Zorn’s lemma. However, since the commutativity property is not transitive, there are many possible maximal subsets of pairwise commuting elements in \({\mathcal {L}}({\mathsf H})\) with non-empty intersection.

  2. (b)

    As a consequence of the proposition, the symbols ∨, ∧ and ¬ have the same properties in \({\mathcal {L}}_0\) as the connectives of classical logic OR, AND and NOT. Moreover P ≥ Q can be interpreted as “Q IMPLIES P”. \(\blacksquare \)

There have been and still are many attempts to interpret ∨ and ∧ as connectives of a new non-distributive logic when dealing with the whole \({\mathcal {L}}({\mathsf H})\): a quantum logic. The first noticeable proposal was due to Birkhoff and von Neumann [BivN36]. Nowadays there are lots of quantum logics [BeCa81, Red98, EGL09], all regarded with suspicion by physicists. Indeed, the most difficult issue in the physical operational interpretation of these connectives is to take into account the fact that they put together incompatible propositions, which cannot be measured simultaneously. An interesting interpretative attempt, due to Jauch, relies upon an identity discovered by von Neumann. For the proof we will use the machinery of spectral theory and produce an original proof. More elementary proofs appear in [Red98] and [Mor18], based on technical propositions we did not discuss in these lectures.

Proposition 4.12

In a Hilbert space H , for every \(P,Q \in {\mathcal {L}}({\mathsf H})\) and x ∈ H ,

$$\displaystyle \begin{aligned} \begin{array}{rcl} (P \wedge Q)x = \lim_{n \to +\infty} (PQ)^nx {}\end{array} \end{aligned} $$
(4.3)

Proof

Fix x ∈H and (uniquely) decompose it as

$$\displaystyle \begin{aligned} \begin{array}{rcl} x=x_0 + y\:,\; \mbox{where}\ x_0 \in (P \wedge Q)({\mathsf H}) = P({\mathsf H}) \cap Q({\mathsf H})\ \mbox{and}\ y \in (P({\mathsf H}) \cap Q({\mathsf H}))^\perp.\\{}\end{array} \end{aligned} $$
(4.4)

Consider the sequence of operators A 1 := P, A 2 := QP, A 3 := PQP, A 4 := QPQP, ⋯ We want to prove that

$$\displaystyle \begin{aligned} \begin{array}{rcl} A_ny \to 0 {}\:.\end{array} \end{aligned} $$
(4.5)

This would conclude the proof because A n x 0 = x 0, since x 0 ∈ P(H) and x 0 ∈ Q(H) so that Px 0 = Qx 0 = x 0, and thus A n x → x 0 + 0 = x 0. Finally, the sequence \(\{(QP)^nx\}_{n\in {\mathbb N}}\) is a subsequence of \(\{A_nx\}_{n\in {\mathbb N}}\) and thus it converges to the same limit x 0 = (P ∧ Q)x; interchanging the roles of P and Q and using P ∧ Q = Q ∧ P, we obtain (4.3).

To prove (4.5), observe that the sequence of operators applied to y, \(\{A_ny\}_{n\in {\mathbb N}}\), satisfies

$$\displaystyle \begin{aligned}||A_{n+1}y|| \leq ||A_n y||\:,\end{aligned}$$

since either A n+1 = PA n or A n+1 = QA n and ||P||, ||Q||≤ 1. The non-increasing sequence \(\{||A_ny||\}_{n \in {\mathbb N}}\) must therefore admit a limit in view of elementary results of calculus. If we found a subsequence of \(\{A_n y\}_{n\in {\mathbb N}}\) converging to 0 we would prove that also ||A n y||→ 0 as n → +∞ which, in turn, would entail (4.5). The following lemma concludes the proof.

Lemma 4.13

The subsequence \(\{A_{2n+1}y\}_{n\in {\mathbb N}}\) tends to 0 as n → +∞.

Proof Consider the subsequence of operators \(\{A_{2n+1}\}_{n \in {\mathbb N}}\). Remembering that PP = P, we have

$$\displaystyle \begin{aligned}A_3 = PQP =: B, \:\: A_5 = PQPQP = (PQP)^2 =B^2, \:\: A_7 = PQPQPQP = (PQP)^3= B^3\:, \cdots \end{aligned}$$
$$\displaystyle \begin{aligned}\cdots, A_{2n+1} = B^n, \cdots \end{aligned}$$

Notice that

  1. (1)

    \(B^*=(PQP)^* = P^*Q^*P^* = PQP= B \in {\mathfrak B}({\mathsf H})\),

  2. (2)

    ||B||≤||P||||Q||||P||≤ 1,

  3. (3)

    σ(B) ⊂ [−||B||, ||B||] (Proposition 3.47),

  4. (4)

    σ(B) ⊂ [0, +∞) (Proposition 3.46) as \(\langle z|Bz\rangle = \langle Pz|QPz\rangle = \langle Pz|QQPz\rangle = ||QPz||^2 \geq 0\).

Collecting these results, we have from the spectral theory

$$\displaystyle \begin{aligned}B^nz = \int_{[0,1]} \lambda^n dP^{(B)}(\lambda)z\quad \mbox{if}\ z\in {\mathsf H}\:.\end{aligned}$$

Since \(\lambda^n \to \chi_{\{1\}}(\lambda)\) pointwise for λ ∈ [0, 1] as n → +∞, exploiting Proposition 3.29 (b), we conclude that

$$\displaystyle \begin{aligned} \begin{array}{rcl} B^nz \to Ez := P^{(B)}_{\{1\}}z\quad \mbox{as}\ n\to +\infty\ \mbox{and}\ z\in {\mathsf H}.{}\end{array} \end{aligned} $$
(4.6)

With the same argument we can also prove that

$$\displaystyle \begin{aligned} \begin{array}{rcl} C^nz \to Fz := P^{(C)}_{\{1\}}z\quad \mbox{as}\ n\to +\infty\ \mbox{and}\ z\in {\mathsf H},\end{array} \end{aligned} $$
(4.7)

where we have defined the other sequence of operators (which is not a subsequence of \(\{A_n\}_{n\in {\mathbb N}}\))

$$\displaystyle \begin{aligned}C := QPQ, \: C^2 = (QPQ)^2 = QPQPQ, \: C^3 = (QPQ)^3 = QPQPQPQ\:, \cdots \:.\end{aligned}$$

We now prove the identity of orthogonal projectors E = F. To this end, notice that

$$\displaystyle \begin{aligned}(PQP)^n(QPQ)^m(PQP)^lz= (PQP)^{n+m+l+1}z\:,\end{aligned}$$

which implies EFE = E. (To prove it, take first the limit as m → +∞ using the continuity of \((PQP)^n\), next the limit as l → +∞ using the continuity of \((PQP)^nF\) and eventually the limit as n → +∞.) Swapping the role of P and Q we also have FEF = F. From EFE = E we obtain

$$\displaystyle \begin{aligned} 0=\langle z| (E-EFE-EFE + EFE)z\rangle= \langle z| (E^2-EFE-EFE + EFE)z\rangle =\langle z|(E-EF)(E-FE)z\rangle\end{aligned}$$
$$\displaystyle \begin{aligned}= \langle z|(E-FE)^*(E-FE)z\rangle = ||(E-FE)z||{}^2 \; \mbox{for}\ z\in {\mathsf H}.\end{aligned}$$

Hence E = FE. Starting from FEF = F, with the same argument, we find F = EF. Putting together these results, we find F = E as wanted, since \(F = F^* = (EF)^* = FE = E\).

To go on, observe that, by construction of E and F, PE = E and QF = F, so that

$$\displaystyle \begin{aligned}E({\mathsf H}) = F({\mathsf H})\subset P({\mathsf H})\cap Q({\mathsf H})\:.\end{aligned}$$

If we apply the result to the sequence B n y in (4.6) with y in (4.4), we obtain

$$\displaystyle \begin{aligned} \begin{array}{rcl} A_{2n+1}y = B^ny \to Ey \in P({\mathsf H})\cap Q({\mathsf H})\:.{}\end{array} \end{aligned} $$
(4.8)

However we also have that

$$\displaystyle \begin{aligned} \begin{array}{rcl} A_{2n+1}y = B^ny \to Ey \in (P({\mathsf H})\cap Q({\mathsf H}))^\perp {}\end{array} \end{aligned} $$
(4.9)

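because \((P({\mathsf H}) \cap Q({\mathsf H}))^\perp\) is closed and every A 2n+1 y belongs to \((P({\mathsf H}) \cap Q({\mathsf H}))^\perp\). Indeed, if s ∈ P(H) ∩ Q(H), then Ps = Qs = s and, since \(A_{2n+1} = B^n\) is selfadjoint, \(\langle s|A_{2n+1} y\rangle = \langle A_{2n+1}s|y\rangle = \langle s|y\rangle = 0\) because \(y \in (P({\mathsf H}) \cap Q({\mathsf H}))^\perp\) by (4.4).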

The only possibility permitted by (4.8) and (4.9) is A 2n+1 y → 0. □

As said above, the lemma ends the proof. □

Remark 4.14

  1. (a)

    The proof actually proves the stronger fact:

    $$\displaystyle \begin{aligned}Px,\:\: QP x, \:\: PQP x, \:\: QPQP x, \:\: PQPQP x,\:\: \cdots \to (P \wedge Q)x \quad \forall x \in {\mathsf H}\:. \end{aligned}$$

    We also have

    $$\displaystyle \begin{aligned}Qx,\:\: PQ x, \:\: QPQ x, \:\: PQPQ x, \:\: QPQPQ x,\:\: \cdots \to (P \wedge Q)x \quad \forall x \in {\mathsf H}\:, \end{aligned}$$

    since P ∧ Q = Q ∧ P.

  2. (b)

    Notice that the result holds in particular if P and Q do not commute, so they are incompatible elementary observables. The right-hand side of the formula above can be interpreted as the consecutive and alternated measurement of an infinite sequence of elementary observables P and Q. As

    $$\displaystyle \begin{aligned}||(P \wedge Q)x||{}^2 = \lim_{n \to +\infty} ||(PQ)^nx||{}^2 \quad \mbox{for every}\ P,Q \in {\mathcal{L}}({\mathsf H})\ \mbox{and}\ x \in {\mathsf H},\end{aligned}$$

    the probability that P ∧ Q is true for a state represented by the unit vector x ∈H is the probability that the infinite sequence of consecutive alternated measurements of P and Q produces the outcome “true” at each step.\(\blacksquare \)
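The convergence stated in Proposition 4.12 and Remark 4.14 can be observed numerically for a pair of non-commuting projectors. The following is a sketch in \({\mathbb R}^3\) with two planes meeting along span{e 1}; the matrices, the vector `x` and the number of iterations are hypothetical choices made for illustration.

```python
import math

def apply(M, v):
    """Apply a matrix (nested lists) to a vector (list)."""
    return [sum(m * c for m, c in zip(row, v)) for row in M]

def norm(v):
    """Euclidean norm of a vector."""
    return math.sqrt(sum(c * c for c in v))

# Two planes in R^3 meeting along span{e1}; P and Q do not commute.
P = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 0.0]]   # onto span{e1, e2}
Q = [[1.0, 0.0, 0.0], [0.0, 0.5, 0.5], [0.0, 0.5, 0.5]]   # onto span{e1, e2 + e3}

x = [1.0, 2.0, 3.0]
x0 = [1.0, 0.0, 0.0]          # component of x in P(H) ∩ Q(H) = span{e1}
y = [0.0, 2.0, 3.0]           # x - x0, orthogonal to P(H) ∩ Q(H)

# A_1 y = Py, A_2 y = QPy, A_3 y = PQPy, ...: apply P and Q alternately.
norms, v = [], y
for n in range(1, 61):
    v = apply(P if n % 2 == 1 else Q, v)
    norms.append(norm(v))

# (PQ)^n x for n = 60: apply Q first, then P, sixty times.
w = x
for _ in range(60):
    w = apply(P, apply(Q, w))
```

The list `norms` is non-increasing and tends to 0, as in (4.5), while `w` approaches `x0`, as in (4.3).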

Exercise 4.15

Prove that, if \(P,Q\in \mathcal {L}({\mathsf H})\), then \(P+Q \in \mathcal {L}({\mathsf H})\) if and only if P and Q project onto orthogonal subspaces.

Solution

If P and Q project onto orthogonal subspaces then PQ = QP = 0 (Proposition 3.17), so that \(\mathcal {L}({\mathsf H}) \ni P\vee Q = P+Q-PQ = P+Q\) due to Proposition 4.10. Suppose conversely that \(P+Q\in \mathcal {L}({\mathsf H})\). Therefore \((P+Q)^2 = P+Q\). In other words, \(P^2 + Q^2 + PQ + QP = P + Q\), namely P + Q + PQ + QP = P + Q so that we end up with PQ = −QP. Applying P on the right, we obtain PQP = −QP and applying P on the left we produce PQP = −PQP. Hence PQP = 0. From PQP = −QP, we also have QP = 0 and also PQ = 0 by taking the adjoint. Proposition 3.17 implies that P and Q project onto orthogonal subspaces.
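The statement of the exercise can be illustrated with the three projectors of the real plane already used in the proof of Proposition 4.9. This is only a numerical check on two pairs, with helper names introduced for the sketch; it is not a substitute for the argument above.

```python
def matmul(A, B):
    """Product of two square matrices given as nested lists."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def add(A, B):
    """Entrywise sum of two matrices."""
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

def is_projector(A, tol=1e-12):
    """Check that a real matrix is symmetric and idempotent."""
    n = len(A)
    A2 = matmul(A, A)
    symmetric = all(abs(A[i][j] - A[j][i]) <= tol
                    for i in range(n) for j in range(n))
    idempotent = all(abs(A2[i][j] - A[i][j]) <= tol
                     for i in range(n) for j in range(n))
    return symmetric and idempotent

P1 = [[1.0, 0.0], [0.0, 0.0]]      # projector onto span{e1}
P2 = [[0.0, 0.0], [0.0, 1.0]]      # projector onto span{e2}, orthogonal to P1
P3 = [[0.5, 0.5], [0.5, 0.5]]      # projector onto span{e1 + e2}

orthogonal_case = is_projector(add(P1, P2))       # P1 P2 = 0
non_orthogonal_case = is_projector(add(P1, P3))   # P1 P3 != 0
```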

4.2.3 A Reason Why Observables Are Selfadjoint Operators

We are in a position to clarify why, in this context, observables are PVMs on \(\mathcal {B}({\mathbb R})\) and therefore they are also selfadjoint operators in view of the spectral integration and disintegration procedure, since PVMs on \(\mathcal {B}({\mathbb R})\) are in one-to-one correspondence with selfadjoint operators. Exactly as in CM, an observable A can be viewed as a collection of elementary YES-NO observables \(\{P_E\}_{E\in {\mathcal {B}}({\mathbb R})}\) labeled on the Borel sets E of \({\mathbb R}\). Exactly as for the classical quantities in (4.1), we can say that the meaning of P E is

$$\displaystyle \begin{aligned} \begin{array}{rcl} P_E =\mbox{``the value of the observable belongs to } E\mbox{''} {}\:.\end{array} \end{aligned} $$
(4.10)

Assuming, as is obvious, that all those elementary observables are pairwise compatible, we can complete \(\{P_E\}_{E\in {\mathcal {B}}({\mathbb R})}\) to a maximal set of compatible elementary observables \(\mathcal {L}_0\) and we can work in there forgetting Quantum Theory. We therefore expect that they also satisfy the same properties (Fi)-(Fiii) of the classical quantities. Notice that (Fi)-(Fiii) immediately translate into

  1. (i)’

    \(P_{\mathbb R}=I\),

  2. (ii)’

    \(P_E \wedge P_F = P_{E\cap F}\),

  3. (iii)’

    If \(N \subset {\mathbb N}\) and \(\{E_k\}_{k\in N} \subset {\mathcal {B}}({\mathbb R})\) satisfies \(E_j \cap E_k = \varnothing \) for k ≠ j, then

    $$\displaystyle \begin{aligned}\vee_{j \in N} P_{E_j}= P_{\cup_{j\in N}E_j}\:.\end{aligned}$$

Next, taking Proposition 4.10 into account (in particular Propositions 4.9 (iv) and 4.10 (iv), for (iii) below), these properties become

  1. (i)

    \(P_{\mathbb R}=I\),

  2. (ii)

    \(P_E P_F = P_{E\cap F}\),

  3. (iii)

    If \(N \subset {\mathbb N}\) and \(\{E_k\}_{k\in N} \subset {\mathcal {B}}({\mathbb R})\) satisfies \(E_j \cap E_k = \varnothing \) for k ≠ j, then

    $$\displaystyle \begin{aligned}\sum_{j \in N} P_{E_j}x= P_{\cup_{j\in N}E_j}x \quad \mbox{for every}\ x\in {\mathsf H}.\end{aligned}$$

    (The presence of x is due to the fact that the convergence of the series if N is infinite is in the strong operator topology as declared in the last statement of Proposition 4.9.)

In other words we have just found Definition 3.21, specialized to a PVM on \({\mathbb R}\): observables in QM (viewed as collections of elementary propositions labelled over the Borel sets of \({\mathbb R}\)) are PVMs on \({\mathbb R}\). We also know that PVMs on \({\mathbb R}\) are associated in a 1-1 way to selfadjoint operators, in view of the results presented in the previous chapter. Indeed, integrating the function \(\imath : {\mathbb R} \ni r \mapsto r\in {\mathbb R}\) with respect to P we have the normal operator

$$\displaystyle \begin{aligned}A_P = \int_{{\mathbb R}} r\: dP(r) \end{aligned}$$

according to Theorem 3.24. This operator is selfadjoint because the integrand function is real-valued (Theorem 3.24 (c)). Finally, Theorem 3.40 proves that P is the unique PVM associated to the operator A P and the support of P is σ(A P). The operator A P encapsulates all information of the PVM \(\{P_E\}_{E\in {\mathcal {B}}({\mathbb R})}\), i.e. of the associated observable A as a collection of elementary propositions labelled over the Borel sets of \({\mathbb R}\).
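In finite dimension the correspondence between PVMs and selfadjoint operators reduces to the eigen-decomposition, and properties (i) and (ii) can be verified directly. The sketch below assumes a hypothetical observable on \({\mathbb R}^3\) with eigenvalues 1, 2, 3; Borel sets are modelled as Python predicates, and the names `EIGENPAIRS`, `pvm` and `close` are illustrative only.

```python
import math

s = 1 / math.sqrt(2)
# Hypothetical observable on R^3: eigenvalues paired with orthonormal eigenvectors.
EIGENPAIRS = [(1.0, [s, s, 0.0]), (2.0, [s, -s, 0.0]), (3.0, [0.0, 0.0, 1.0])]

def pvm(E):
    """P_E: sum of the eigenprojectors whose eigenvalue satisfies the predicate E."""
    return [[sum(u[i] * u[j] for lam, u in EIGENPAIRS if E(lam))
             for j in range(3)] for i in range(3)]

def matmul(a, b):
    """Product of two 3x3 matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def close(a, b, tol=1e-12):
    """Entrywise comparison of two 3x3 matrices."""
    return all(abs(a[i][j] - b[i][j]) < tol for i in range(3) for j in range(3))

E = lambda r: r <= 2.0                 # E = (-inf, 2]
F = lambda r: r >= 2.0                 # F = [2, +inf)
E_and_F = lambda r: E(r) and F(r)      # E ∩ F = {2}

identity = [[float(i == j) for j in range(3)] for i in range(3)]
normalized = close(pvm(lambda r: True), identity)             # P_R = I
multiplicative = close(matmul(pvm(E), pvm(F)), pvm(E_and_F))  # P_E P_F = P_{E∩F}

# Finite-dimensional analogue of the integral of r dP(r): sum of eigenvalue * eigenprojector.
A = [[sum(lam * u[i] * u[j] for lam, u in EIGENPAIRS) for j in range(3)]
     for i in range(3)]
print(normalized, multiplicative, A)
```

The matrix A recovered at the end plays the role of \(A_P\): its spectrum {1, 2, 3} is exactly the set of values where the PVM is supported.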

We conclude that, adopting von Neumann’s framework, in QM observables are naturally described by selfadjoint operators, whose spectra coincide with the set of values attained by the observables.

4.3 Recovering the Hilbert Space Structure: The “Coordinatization” Problem

A reasonable question to ask is whether there are better reasons for choosing to describe quantum systems via a lattice of orthogonal projectors, other than the kill-off argument “it works”. To tackle the problem we start by listing special properties of the lattice of orthogonal projectors, whose proofs are elementary. The notion of orthomodularity shows up below. It is a weaker version of distributivity of the ∨ with respect to ∧, that we know to be untenable on \(\mathcal {L}({\mathsf H})\). A second notion is that of atom. (See [Red98] for a concise discussion on these properties and a list of alternative and equivalent reformulations of orthomodularity condition.)

Definition 4.16

If \((\mathcal {L}, \geq , \mathbf {0}, \mathbf {1})\) is a bounded lattice, \(a\in {\mathcal {L}}\setminus \{\mathbf { 0}\}\) is called an atom if p ≤ a implies p = 0 or p = a. \(\blacksquare \)

The following theorem collects all relevant properties of the special lattice \(\mathcal {L}({\mathsf H})\), simultaneously defining them. These definitions may actually apply to a generic orthocomplemented lattice.

Theorem 4.17

In the bounded, orthocomplemented, σ-complete lattice \({\mathcal {L}}({\mathsf H})\) of Propositions 4.9 and 4.10 , the orthogonal projectors onto one-dimensional spaces are the only atoms of \({\mathcal {L}}({\mathsf H})\) . Moreover \({\mathcal {L}}({\mathsf H})\) satisfies these additional properties:

  1. (i)

    separability (for H separable): if \(\{P_a\}_{a\in A} \subset {\mathcal {L}}({\mathsf H})\setminus \{0\}\) satisfies \(P_i \leq \neg P_j\) for \(i \neq j\) , then A is at most countable;

  2. (ii1)

    atomicity : for any \(P\in {\mathcal {L}}({\mathsf H})\setminus \{0\}\) there exists an atom A with A ≤ P;

  3. (ii2)

    atomisticity : for every \(P\in {\mathcal {L}}({\mathsf H})\setminus \{0\}\) , \(P = \vee \{A \leq P \:|\: A \mbox{ is an atom of } \mathcal {L}({\mathsf H})\}\);

  4. (iii)

    orthomodularity : P ≤ Q implies Q = P ∨ ((¬P) ∧ Q);

  5. (iv)

    covering property : if \(A,P\in {\mathcal {L}}({\mathsf H})\) , with A an atom, satisfy A ∧ P = 0, then

    1. (1)

      P ≤ A ∨ P with P ≠ A ∨ P, and

    2. (2)

      P ≤ Q ≤ A ∨ P implies Q = P or Q = A ∨ P;

  6. (v)

    irreducibility : only 0 and I commute with every element of \({\mathcal {L}}({\mathsf H})\).

Proof

Everything has an immediate elementary proof. The only pair of properties which are not completely trivial are orthomodularity and irreducibility. The former immediately arises from the observation that P ≤ Q is equivalent to PQ = QP = P (Proposition 3.17) so that, in particular, P and Q commute. Embedding them in a maximal set of pairwise commuting projectors, we can use Proposition 4.10:

$$\displaystyle \begin{aligned} P \vee ((\neg P) \wedge Q) &= P \vee ((I-P)Q) = P\vee (Q-P) = P + (Q-P) - P(Q-P)\\ &= P +Q-P -P +P =Q\:. \end{aligned} $$

Irreducibility can easily be proved observing that if \(P\in {\mathcal {L}}({\mathsf H})\) commutes with all projectors onto one-dimensional subspaces, \(Px = \lambda _x x\) for every x ∈H. Thus \(P(x+y) = \lambda _{x+y}(x+y)\) but also \(Px + Py = \lambda _x x + \lambda _y y\) and thus \((\lambda _x - \lambda _{x+y})x = (\lambda _{x+y} - \lambda _y)y\), which entails \(\lambda _x = \lambda _y\) if x ⊥ y. If N ⊂H is a Hilbert basis, \(Pz = \sum _{x\in N} \langle x|z\rangle \lambda x = \lambda z\) for some fixed \(\lambda \in {\mathbb C}\). Since \(P = P^* = PP\), we conclude that either λ = 0 or λ = 1, i.e. either P = 0 or P = I, as wanted. □

Actually, each of the listed properties admits a physical operational interpretation (e.g. see [BeCa81]). So, based on the experimental evidence of quantum systems, we could try to prove, in the absence of any Hilbert space, that elementary propositions with experimental outcome in {0, 1} form a poset. More precisely, we could attempt to find a bounded, orthocomplemented σ-complete lattice that verifies conditions (i)–(v) above, and then try to prove this lattice is described by the orthogonal projectors of a Hilbert space. This is known as the coordinatization problem [BeCa81], which can be traced back to von Neumann’s first works on the subject.

The partial order relation of elementary propositions can be defined in various ways. But it will always correspond to the logical implication, in some way or another. Starting from [Mac63] a number of approaches (either of essentially physical nature, or of formal character) have been developed to this end: in particular, those making use of the notion of (quantum) state, which we will see in a short while for the concrete case of propositions represented by orthogonal projectors. The object of the theory is now [Mac63] the pair \((\mathcal {O}, {\mathcal {S}})\), where \(\mathcal {O}\) is the class of observables and \({\mathcal {S}}\) the one of states. The elementary propositions form a subclass \({\mathcal {L}}\) of \(\mathcal {O}\) equipped with a natural poset structure \(({\mathcal {L}}, \geq )\) (also satisfying a weaker version of some of the conditions (i)–(v)). A state \(s\in {\mathcal {S}}\), in particular, defines the probability \(m_s(P)\) that P is true for every \(P\in {\mathcal {L}}\) [Mac63]. As a matter of fact, if \(P,Q \in {\mathcal {L}}\), P ≥ Q means by definition that the probability \(m_s(P) \geq m_s(Q)\) for every state \(s\in {\mathcal {S}}\). More difficult is to justify that the poset thus obtained is a lattice, i.e. that it admits a greatest lower bound P ∧ Q and a least upper bound P ∨ Q for every P, Q. There are several proposals, very different in nature, to introduce this lattice structure (see [BeCa81] and [EGL09] for a general treatise) and make the physical meaning explicit in terms of measurement outcome. See Aerts in [EGL09] for an abstract but operational viewpoint and [BeCa81, §21.1] for a summary on several possible ways to introduce the lattice structure on the partially ordered sets.

If we accept the lattice structure on elementary propositions of a quantum system, then we may define the operation of orthocomplementation by the familiar logical/physical negation. An apparent problem is the abstract definition of the notion of compatible propositions, since this notion makes explicit use of the structure of \(\mathcal {L}({\mathsf H})\) as a set of operators. Actually, this notion is also general and can be defined for generic orthocomplemented lattices.

Definition 4.18

Let \(({\mathcal {L}}, \geq , \mathbf {0}, \mathbf {1}, \neg )\) be an orthocomplemented lattice and consider two elements \(a,b \in {\mathcal {L}}\).

  1. (a)

    They are said to be orthogonal, written a ⊥ b, if ¬a ≥ b (or equivalently ¬b ≥ a).

  2. (b)

    They are said to be commuting, if a = c 1 ∨ c 3 and b = c 2 ∨ c 3 with c i ⊥ c j if i ≠ j.

\(\blacksquare \)

Remark 4.19

  1. (a)

    These notions of orthogonality and compatibility make sense because, a posteriori, they turn out to be the usual ones when propositions are interpreted via projectors.

Proposition 4.20

Let H be a Hilbert space and think of \({\mathcal {L}}({\mathsf H})\) as an orthocomplemented lattice. Two elements \(P,Q \in {\mathcal {L}}({\mathsf H})\)

  1. (i)

    are orthogonal in the sense of Definition 4.18 if and only if they project onto mutually orthogonal subspaces, which is equivalent to saying PQ = QP = 0;

  2. (ii)

    commute in accordance with Definition 4.18 if and only if PQ = QP.

Proof

  1. (i)

    ¬P ≥ Q is equivalent to \(Q({\mathsf H}) \subset P({\mathsf H})^\perp \); in turn, this is the same as PQ = QP = 0 by Proposition 3.17.

  2. (ii)

    Assume that \(P = P_1 \vee P_3\) and \(Q = P_2 \vee P_3\) where \(P_iP_j = 0\) if i ≠ j so that, in particular, \(P_i\) and \(P_j\) commute. Therefore, embedding the \(P_j\) in a maximal set of commuting projectors \(\mathcal {L}_0\), in view of Proposition 4.10 we have \(P = P_1 + P_3 - P_1P_3 = P_1 + P_3\) and \(Q = P_2 + P_3 - P_2P_3 = P_2 + P_3\) and also PQ = QP since \(P_i\) and \(P_j\) commute. If conversely PQ = QP, the required decomposition comes from choosing \(P_3 := PQ\), \(P_1 := P(I-Q)\), \(P_2 := Q(I-P)\).

  1. (b)

    It is not difficult to prove [BeCa81, Mor18] that, in an orthocomplemented lattice \(\mathcal {L}\), p, q commute if and only if the intersection of all orthocomplemented sublattices containing both p and q (an orthocomplemented sublattice in its own right) is Boolean. \(\blacksquare \)
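The decomposition used in the proof of Proposition 4.20 (ii) can be checked on a small case. This is a minimal sketch, assuming \({\mathbb R}^3\) in place of H and two hypothetical commuting projectors; the helper names are illustrative.

```python
import math

def projector(vectors):
    """Orthogonal projector onto the span of orthonormal real vectors."""
    n = len(vectors[0])
    return [[sum(v[i] * v[j] for v in vectors) for j in range(n)]
            for i in range(n)]

def matmul(a, b):
    """Product of two 3x3 matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def combine(a, b, sign):
    """Entrywise a + sign * b."""
    return [[a[i][j] + sign * b[i][j] for j in range(3)] for i in range(3)]

def close(a, b, tol=1e-12):
    """Entrywise comparison of two 3x3 matrices."""
    return all(abs(a[i][j] - b[i][j]) < tol for i in range(3) for j in range(3))

s = 1 / math.sqrt(2)
u1, u2, e3 = [s, s, 0.0], [s, -s, 0.0], [0.0, 0.0, 1.0]
P = projector([u1, e3])
Q = projector([u2, e3])
I = [[float(i == j) for j in range(3)] for i in range(3)]
ZERO = [[0.0] * 3 for _ in range(3)]

commute = close(matmul(P, Q), matmul(Q, P))

# The choice made in the proof: P3 := PQ, P1 := P(I - Q), P2 := Q(I - P).
P3 = matmul(P, Q)
P1 = matmul(P, combine(I, Q, -1))
P2 = matmul(Q, combine(I, P, -1))

pairwise_orthogonal = all(close(matmul(a, b), ZERO)
                          for a, b in [(P1, P2), (P1, P3), (P2, P3)])
decomposes = close(combine(P1, P3, +1), P) and close(combine(P2, P3, +1), Q)
print(commute, pairwise_orthogonal, decomposes)
```

Since the three pieces are pairwise orthogonal, their joins coincide with their sums, which is why checking \(P = P_1 + P_3\) and \(Q = P_2 + P_3\) suffices here.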

Now, fully fledged with an orthocomplemented lattice and the notion of compatible propositions, we can attach a physical meaning (an interpretation backed by experimental evidence) to the requests that the lattice be orthocomplemented, complete, atomistic, irreducible and that it have the covering property [BeCa81]. Under these hypotheses and assuming there exist at least four pairwise-orthogonal atoms, Piron ([Pir64, JaPi69],[BeCa81, §21], Aerts in [EGL09]) used projective geometry techniques to show that the lattice of quantum propositions can be canonically identified with the closed (in a generalized sense) subsets of a Hilbert space of sorts. In the latter:

  1. (a)

    the field is replaced by a division ring (typically not commutative) equipped with an involution, and

  2. (b)

    there exists a certain non-singular Hermitian form associated with the involution.

It has been conjectured by many people (see [BeCa81]) that if the lattice is also orthomodular and separable, the division ring can only be picked among \({\mathbb R},{\mathbb C}\) or \({\mathbb H}\) (quaternion algebra).

More recently Solèr [Sol95] first and then Holland [Hol95] and Aerts–van Steirteghem [AeSt00] have found sufficient hypotheses, in terms of the existence of infinite orthogonal systems, for this to happen. These results are usually quoted as Solèr’s theorem. Under these hypotheses, if the ring is \({\mathbb R}\) or \({\mathbb C}\), we obtain precisely the lattice of orthogonal projectors of the separable Hilbert space. In the case of \({\mathbb H}\), one gets a similar generalized structure (see, e.g., [GMP13, GMP17]).

In all these arguments irreducibility is not really crucial: if property (v) fails, the lattice can be split into irreducible sublattices [Jau78, BeCa81]. Physically speaking this situation is natural in the presence of superselection rules, of which more later.

An evident issue arises here: why do physicists not know of quantum systems described on real or quaternionic Hilbert spaces?

This is a longstanding problem which was recently solved, at least for the physical description of elementary relativistic systems [MoOp17, MoOp19]. It seems that the complex structure is just a sort of accident imposed by relativistic symmetry.

Remark 4.21

It is worth stressing that the covering property in Theorem 4.17 is crucial. Indeed there are other lattice structures relevant in physics verifying all the remaining properties in the aforementioned theorem. Remarkably the family of so-called causally closed sets in a general spacetime satisfies all said properties but the covering law (see, e.g. [Cas02]). This obstruction prevents one from endowing a spacetime with a natural (generalized) Hilbert structure, while it suggests ideas towards a formulation of quantum gravity. \(\blacksquare \)

4.4 Quantum States as Probability Measures and Gleason’s Theorem

As commented in Remark 3.66, the probabilistic interpretation of quantum states is not well defined because there is no true probability measure in view of the fact that there are incompatible observables. The idea is to redefine the notion of probability on a bounded, orthocomplemented, σ-complete lattice like \({\mathcal {L}}({\mathsf H})\) instead of on a σ-algebra. The study of these generalized measures is the final goal of this section.

4.4.1 Probability Measures on \({\mathcal {L}}({\mathsf H})\)

Exactly as in CM, where the generic states are probability measures on the Boolean lattice \({\mathcal{B}}(\Gamma)\) of the elementary properties of the system (Sect. 4.1), we can think of states of a quantum system as σ-additive probability measures on the non-Boolean lattice of the elementary observables \({\mathcal {L}}({\mathsf H})\). A state is therefore a map \(\rho : \mathcal {L}({\mathsf H}) \ni P \mapsto \rho (P) \in [0,1]\) that satisfies ρ(I) = 1 and the σ-additivity requirement

$$\displaystyle \begin{aligned}\rho \left( \vee_{n\in {\mathbb N}} P_{n}\right) = \sum_{n\in {\mathbb N}} \rho (P_{n})\:, \end{aligned}$$

where the sequence \(\{P_n\}_{n\in {\mathbb N}} \subset \mathcal {L}({\mathsf H})\) is made of mutually exclusive elementary propositions, i.e., simultaneously compatible (P i P j = P j P i) and independent (P i ∧ P j = 0 if i ≠ j). In other words, since P i ∧ P j = P i P j when the projectors commute, the said condition can be equivalently stated by requiring that P i P j = P j P i = 0 for i ≠ j, also written P i ⊥ P j if i ≠ j. Making use of associativity of ∨ and Proposition 4.10 (i), we have

$$\displaystyle \begin{aligned}\vee_{n\leq k} P_{n} = \sum_{n=0}^k P_{n}\:.\end{aligned}$$

Next, exploiting Proposition 4.9 (iv), we can write the projector \(\vee _{n\in {\mathbb N}} P_{n}\) in a more effective way:

$$\displaystyle \begin{aligned}\vee_{n\in {\mathbb N}} P_{n} = \mbox{s-}\lim_{k\to +\infty} \vee_{n\leq k} P_{n} = \mbox{s-}\lim_{k\to +\infty} \sum_{n=0}^k P_{n} = \mbox{s-}\sum_{n \in {\mathbb N}}P_n\:.\end{aligned}$$

(As usual “s-” denotes the limit in the strong operator topology.) The σ-additivity requirement can be rephrased as

$$\displaystyle \begin{aligned} \begin{array}{rcl} \rho \left(\mbox{s-}\sum_{n\in {\mathbb N}} P_{n}\right) = \sum_{n\in {\mathbb N}} \rho (P_{n}){}\end{array} \end{aligned} $$
(4.11)

where the sequence \(\{P_n\}_{n\in {\mathbb N}} \subset \mathcal {L}({\mathsf H})\) satisfies P n P m = 0 for n ≠ m. Notice that simple additivity is subsumed just assuming that P n = 0 for all n excluding a finite subset of \({\mathbb N}\).

Remark 4.22

  1. (a)

    The class \(\{P_n\}_{n\in {\mathbb N}}\) can always be embedded in a maximal set of commuting elementary observables \(\mathcal {L}_0\) that has the structure of a Boolean σ-algebra. A quantum state ρ restricted to \(\mathcal {L}_0\) is a standard Kolmogorov probability measure. Its quantum nature relies on the peculiarity that it acts also on projectors which are not contained in a common Boolean σ-algebra, namely incompatible elementary observables.

  2. (b)

    This is the most general notion of quantum state. The issue remains open about the existence of sharp states associating either 0 or 1 and not intermediate values to every elementary proposition, as the non-probabilistic states in phase space do. If they exist, they must be a special case of these probability measures. We shall see that actually sharp states do not exist in quantum theories, in the Hilbert space formulation, differently from classical theories. In this sense quantum theory is intrinsically probabilistic. \(\blacksquare \)

We address now two fundamental questions.

  1. (1)

    Do quantum states as above exist?

    The answer is positive: if ψ ∈H and ||ψ|| = 1, the map \(\rho _\psi : \mathcal {L}({\mathsf H}) \ni P \mapsto \langle \psi |P\psi \rangle \in [0,1]\) satisfies the requirement as the reader immediately proves: \(\rho _\psi (I) = \langle \psi |\psi \rangle = 1\) and (4.11) is valid simply because the inner product is continuous. It is worth stressing that, as expected from elementary formulations, \(\rho _\psi \) depends on ψ only up to a phase. In fact, \(\rho _{a\psi } = \rho _\psi \) if \(a\in {\mathbb C}\) with |a| = 1.

  2. (2)

    Are unit vectors, up to phase, the unique quantum states?

    The answer is negative and quite articulate. The rest of this section is mainly devoted to answering this question properly. To do it, we need to focus on a particular class of operators called trace-class operators because they play a central role in a celebrated characterization due to Gleason of the aforementioned measures. To define trace-class operators we need two ingredients, the polar decomposition theorem and the class of compact operators.
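The positive answer to question (1) can be illustrated numerically. The sketch below assumes \({\mathsf H} = {\mathbb C}^2\), a hypothetical unit vector ψ and a hypothetical projector P; the function name `expectation` is illustrative. It checks additivity on the orthogonal pair P, I − P and the invariance under a phase change of ψ.

```python
import cmath

def expectation(psi, P):
    """rho_psi(P) = <psi|P psi>, with the inner product antilinear in the left entry."""
    n = len(psi)
    P_psi = [sum(P[i][j] * psi[j] for j in range(n)) for i in range(n)]
    return sum(psi[i].conjugate() * P_psi[i] for i in range(n)).real

psi = [0.6 + 0j, 0.8j]                    # unit vector: 0.36 + 0.64 = 1
P = [[0.5, 0.5], [0.5, 0.5]]              # projector onto span{(1, 1)/sqrt(2)}
not_P = [[0.5, -0.5], [-0.5, 0.5]]        # I - P, orthogonal to P

a = cmath.exp(0.7j)                       # a phase, |a| = 1
a_psi = [a * c for c in psi]

p = expectation(psi, P)
q = expectation(psi, not_P)
p_phase = expectation(a_psi, P)
print(p, q, p_phase)
```

The two probabilities sum to \(\rho _\psi (I) = 1\), as additivity on P ∨ ¬P = I requires, and the value on P is unchanged when ψ is multiplied by the phase a.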

4.4.2 Polar Decomposition

Complex numbers z ≠ 0 can be decomposed in a product z = u|z| of a positive number, the absolute value |z|, and the phase u, with |u| = 1. Bounded (actually closed) operators A ≠ 0 can be analogously decomposed as composition A = U|A| of their absolute value |A|, which is positive, and a “partial isometry” U with ||U|| = 1. To explain how this decomposition works we need a preliminary result.

Proposition 4.23

Let H be a Hilbert space and A : H →H a positive operator : 〈x|Ax〉≥ 0 for every x ∈H. There exists a unique positive operator B : H →H such that \(A = B^2\) . This operator is bounded and commutes with every operator in \({\mathfrak B}({\mathsf H})\) commuting with A.

It is called the square root of A and is denoted by \(\sqrt {A}\).

Proof

We remind the reader that a positive operator T : H →H is necessarily in \({\mathfrak B}({\mathsf H})\) and selfadjoint in view of (3) Exercise 2.43. As \(A \in {\mathfrak B}({\mathsf H})\) is selfadjoint, \(A = \int _{\sigma (A)} \lambda \, dP^{(A)}(\lambda )\) by Theorem 3.40. Moreover σ(A) ⊂ [0, +∞) as proved in Proposition 3.46. So \(B := \int _{\sigma (A)} \sqrt {\lambda } dP^{(A)}(\lambda )\) is selfadjoint positive (using the same proof as for Proposition 3.46) and

$$\displaystyle \begin{aligned}BB = \int_{\sigma(A)} \sqrt{\lambda} dP^{(A)}(\lambda)\int_{\sigma(A)} \sqrt{\lambda} dP^{(A)}(\lambda) = \int_{\sigma(A)} \lambda dP^{(A)}(\lambda) =A\:, \end{aligned}$$

by Proposition 3.29 (d) as all operators are in \({\mathfrak B}({\mathsf H})\). If \(B' \in {\mathfrak B}({\mathsf H})\) is positive and B′B′ = A we have \(\int _{[0,+\infty )} r^2 dP^{(B')}(r) = A = \int _{[0,+\infty )} r^2 dP^{(B)}(r)\), that is \(\int _{[0,+\infty )} s\, dQ'(s) = A = \int _{[0,+\infty )} s\, dQ(s)\), where we have defined \(Q^{\prime }_E := P^{(B')}_{\phi ^{-1}(E)}\) and \(Q_E := P^{(B)}_{\phi ^{-1}(E)}\) and the homeomorphism \(\phi : [0,+\infty ) \ni r \mapsto r^2 \in [0,+\infty )\) according to Proposition 3.33 (f). The uniqueness of the spectral measure of a selfadjoint operator (extending Q and Q′ on \(\mathcal {B}({\mathbb R})\) in the simplest way, i.e. \(Q_{1E} := Q_{E\cap [0,+\infty )}\)) implies that \(Q = Q' = P^{(A)}\) so that \(P^{(B)}_E = Q_{\phi (E)}= Q^{\prime }_{\phi (E)} = P^{(B')}_E\). Hence B = B′.

To conclude, observe that if \(D^*=D\in {\mathfrak B}({\mathsf H})\) commutes with A, then D commutes with \(A^n\) and hence with \(e^{itA}\) for every \(t\in {\mathbb R}\) as a consequence of Exercise 3.64. Proposition 3.69 entails that D commutes with the spectral measure of A and thus with every operator \(s(A) = \int _{\sigma (A)} s\, dP^{(A)}\), where s is a simple function. Approximating essentially bounded functions f with simple functions according to Proposition 3.29 (c), we extend the result to operators f(A). In particular D commutes with \(\sqrt {A}\) (the square-root function being bounded on σ(A), which is compact). If \(D \in {\mathfrak B}({\mathsf H})\) is not selfadjoint, the previous argument holds true for the selfadjoint operators \(\frac {1}{2}(D+D^*)\) and \(\frac {1}{2i}(D-D^*)\). Hence, it holds for \(D = \frac {1}{2}(D+D^*) + i\,\frac {1}{2i}(D-D^*)\). □
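In finite dimension the construction of the square root through the spectral measure becomes a sum over eigenvalues. The following is a minimal sketch, assuming a real symmetric positive 2×2 matrix in place of A; the function names `sqrt_psd_2x2` and `matmul2` are illustrative.

```python
import math

def sqrt_psd_2x2(M):
    """Square root of a real symmetric positive 2x2 matrix via its spectral decomposition."""
    (a, b), (_, d) = M
    mean, radius = (a + d) / 2, math.hypot((a - d) / 2, b)
    lam_plus, lam_minus = mean + radius, mean - radius   # the two eigenvalues
    if radius == 0.0:                                    # M is a multiple of the identity
        return [[math.sqrt(a), 0.0], [0.0, math.sqrt(a)]]
    gap = lam_plus - lam_minus
    # Spectral projectors: P_plus = (M - lam_minus I) / gap and P_minus = I - P_plus.
    P_plus = [[(a - lam_minus) / gap, b / gap], [b / gap, (d - lam_minus) / gap]]
    P_minus = [[1 - P_plus[0][0], -P_plus[0][1]], [-P_plus[1][0], 1 - P_plus[1][1]]]
    # max(..., 0.0) guards against a tiny negative eigenvalue produced by rounding.
    r_plus, r_minus = math.sqrt(lam_plus), math.sqrt(max(lam_minus, 0.0))
    return [[r_plus * P_plus[i][j] + r_minus * P_minus[i][j] for j in range(2)]
            for i in range(2)]

def matmul2(a, b):
    """Product of two 2x2 matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

A = [[2.0, 1.0], [1.0, 2.0]]        # eigenvalues 3 and 1, hence positive
B = sqrt_psd_2x2(A)
B_squared = matmul2(B, B)
print(B, B_squared)
```

The function replaces each eigenvalue λ by √λ while keeping the spectral projectors, which mirrors the definition of B in the proof; squaring the result returns A up to rounding.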

Definition 4.24

If \(A\in {\mathfrak B}({\mathsf H})\) for a Hilbert space H, the absolute value of A is the operator \(|A|:= \sqrt {A^*A}\). \(\blacksquare \)

We are ready for the polar decomposition theorem. An extensive discussion, also applied to closed unbounded operators, appears in [Mor18].

Theorem 4.25 (Polar Decomposition)

Let \(A\in {\mathfrak B}({\mathsf H})\) for a Hilbert space H . There is a unique pair \(P \in {\mathfrak B}({\mathsf H})\) , \(U \in {\mathfrak B}({\mathsf H})\) such that

  1. (a)

    A = UP (called polar decomposition of A),

  2. (b)

    P is positive,

  3. (c)

    U vanishes on Ker(A) and is isometric on Ran(P).

Moreover, P = |A| and Ker(U) = Ker(A) = Ker(P).

Proof

Let us start by observing that A and |A| have the same kernel, since we have \(||\,|A|x||^2 = \langle |A|x\,|\,|A|x\rangle = \langle x\,|\,|A|^2 x\rangle = \langle x|A^*Ax\rangle = \langle Ax|Ax\rangle = ||Ax||^2\). Hence, on \(Ker(A)^\perp = Ker(|A|)^\perp = \overline {Ran(|A|^*)}= \overline {Ran(|A|)}\) they are injective. So define U : Ran(|A|) →H by means of \(Uy := A|A|^{-1}y\) if y ∈ Ran(|A|). With this definition, we have A = U|A| no matter how we extend U outside Ran(|A|). Now notice that \(||Ax||^2 = ||U|A|x||^2 = ||\,|A|x||^2\) as established above. This formula proves that U is isometric on Ran(|A|) and, with the standard argument based on the polarization formula, we also have that 〈Uu|Uv〉 = 〈u|v〉 provided u, v ∈ Ran(|A|). The operator U is in particular continuous and can be extended on \(\overline {Ran(|A|)}\) by continuity, remaining isometric there. Since \({\mathsf H} = Ker(A) \oplus Ker(A)^\perp = Ker(A) \oplus Ker(|A|)^\perp = Ker(A) \oplus \overline {Ran(|A|^*)}= Ker(A) \oplus \overline {Ran(|A|)}\), if we define U = 0 on Ker(A), we have constructed an operator \(U\in {\mathfrak B}({\mathsf H})\) such that, together with P := |A|, all requirements (a), (b) and (c) are valid and also Ker(U) = Ker(A) = Ker(P). In particular Ker(U) cannot contain non-vanishing vectors orthogonal to Ker(A), i.e. in \(\overline {Ran(|A|)}\), since U is isometric thereon. Suppose conversely that there exist \(U', P'\in {\mathfrak B}({\mathsf H})\) satisfying (a), (b) and (c). From A = U′P′, we have \(A^* = (P')^*(U')^* = P'(U')^*\) and thus \(A^*A = P'(U')^*U'P' = P'P' = (P')^2\) (where we have used the fact that, since U′ is isometric on Ran(P′), for every x, y ∈H, we have \(\langle x|P'P'y\rangle = \langle P'x|P'y\rangle = \langle U'P'x|U'P'y\rangle = \langle x|P'(U')^*U'P'y\rangle \), so that \(P'P' = P'(U')^*U'P'\)). Since P′ is positive, we have \(P'= \sqrt {A^*A} = |A|\) by uniqueness of the square root. As A is injective on Ran(|A|), the formula A = U′|A| implies \(U'y = A|A|^{-1}y = Uy\) if y ∈ Ran(|A|). As before, since U′ is bounded, U′ = U on \(\overline {Ran(|A|)}\) by continuity. Finally U = U′ also on \(\overline {Ran(|A|)}^\perp = Ker(|A|)\) since both vanish by hypothesis there. Summing up, U = U′. □

Remark 4.26

Observe that if A ≠ 0, U cannot vanish. Since ||Ux||≤||x|| by construction and ||Ux|| = ||x|| on a non-trivial subspace (Ran(P) ≠ {0} if A ≠ 0), we conclude that ||U|| = 1. \(\blacksquare \)

Another related technically useful notion is that of partial isometry.

Definition 4.27

If H is a Hilbert space, an operator \(U \in {\mathfrak B}({\mathsf H})\) that restricts to an isometry on \(K_1 := Ker(U)^\perp \) is called a partial isometry with initial space K 1 and final space K 2 := Ran(U). \(\blacksquare \)

Evidently the U in the polar decomposition A = UP is a partial isometry with initial space \(Ker(A)^\perp \).

Exercise 4.28

Prove that if \(U\in {\mathfrak B}({\mathsf H})\) is a partial isometry with initial space K 1 and final space K 2, then K 2 is closed.

Solution

First of all, if \(y \in \overline {Ran(U)} = \overline {K_2}\), there is a sequence of vectors x n ∈H with Ux n → y. Decomposing \(x_n = x^{\prime }_n +x^{\prime \prime }_n\) with respect to the standard decomposition \(Ker(U)^\perp \oplus Ker(U)\), we can omit the part \(x^{\prime \prime }_n \in Ker(U)\) since \(Ux^{\prime \prime }_n=0\), and we are allowed to assume \(Ux^{\prime }_n \to y\). Since U acts isometrically on \(x^{\prime }_n\), and the sequence of \(Ux^{\prime }_n\)s is Cauchy, the sequence of the \(x^{\prime }_n\)s must be Cauchy as well. By continuity of U, \(y=U(\lim _{n\to +\infty }x^{\prime }_n) \in Ran(U)\). Therefore \(\overline {Ran(U)} = Ran(U)\), namely K 2 is closed.

Exercise 4.29

Prove that \(U\in {\mathfrak B}({\mathsf H})\) is a partial isometry with initial space K 1 if and only if \(U^*U\) is the orthogonal projector onto K 1.

Solution

If U is a partial isometry with initial space \(K_1 = Ker(U)^\perp \), then 〈Ux|Uy〉 = 〈x|y〉 for x, y ∈ K 1. However, since H = K 1 ⊕ Ker(U), we can extend by linearity this formula to 〈Ux|Uy〉 = 〈x|y〉 for x ∈ K 1 and y ∈H. This is equivalent to \(\langle U^*Ux|y\rangle = \langle x|y\rangle \) for x ∈ K 1 and y ∈H, namely \(U^*Ux = x\) if x ∈ K 1. On the other hand, \(U^*Ux = 0\) if \(x\in Ker(U)=K_1^\perp \). In other words, \(U^*U : K_1 \oplus K_1^\perp \ni x+y \mapsto x+0 \in K_1 \oplus K_1^\perp \), so that it coincides with the orthogonal projector onto K 1. If, conversely, \(U\in {\mathfrak B}({\mathsf H})\) is such that \(U^*U\) is the orthogonal projector onto the closed subspace K 1, we have that \(\langle Ux|Uy\rangle = \langle U^*Ux|y\rangle = \langle x|y\rangle \) for x, y ∈ K 1, so that U is an isometry on it. Furthermore Ux = 0 is equivalent to \(||Ux||^2 = 0\), that is \(\langle x|U^*Ux\rangle = 0\). Since \(U^*U\) is idempotent and selfadjoint, this is equivalent to \(\langle U^*Ux|U^*Ux\rangle = 0\), namely \(||U^*Ux|| = 0\). We have proved that \(Ker(U)= U^*U({\mathsf H})^\perp = K_1^\perp \). In other words, \(K_1 = Ker(U)^\perp \). In summary, U is a partial isometry with initial space K 1.

Exercise 4.30

Prove that if \(U\in {\mathfrak B}({\mathsf H})\) is a partial isometry with initial space K 1 and final space K 2, then \(U^*\) is a partial isometry with initial space K 2 and final space K 1. Consequently, \(UU^*\) is the orthogonal projector onto K 2.

Solution

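From the previous exercise, \(U^*(Ux) = x\) if x ∈ K 1, where Ux ∈ K 2. Since ||Ux|| = ||x||, we have obtained that \(U^*\) is isometric on \(K_2 = Ran(U) = Ker(U^*)^\perp \) (recall that Ran(U) is closed by Exercise 4.28). We furthermore have \(Ran(U^*) = K_1\): the identity \(U^*(Ux) = x\) gives \(K_1 \subset Ran(U^*)\), while \(Ran(U^*) \subset Ker(U)^\perp = K_1\). The last statement immediately follows from the previous exercise noticing that \((U^*)^* = U\).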
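The polar decomposition and the statements of Exercises 4.29 and 4.30 can be followed on a small worked example (chosen here for illustration, with \({\mathsf H} = {\mathbb C}^2\) and canonical basis \(e_1, e_2\)). Take

$$\displaystyle \begin{aligned} A = \begin{pmatrix} 0 & 2\\ 0 & 0\end{pmatrix}, \qquad A^*A = \begin{pmatrix} 0 & 0\\ 0 & 4\end{pmatrix}, \qquad |A| = \sqrt{A^*A} = \begin{pmatrix} 0 & 0\\ 0 & 2\end{pmatrix}\:. \end{aligned}$$

Here Ker(A) is spanned by \(e_1\) and Ran(|A|) by \(e_2\). Setting \(Ue_2 := A|A|^{-1}e_2 = e_1\) and \(Ue_1 := 0\) gives

$$\displaystyle \begin{aligned} U = \begin{pmatrix} 0 & 1\\ 0 & 0\end{pmatrix}, \qquad U|A| = A, \qquad U^*U = \begin{pmatrix} 0 & 0\\ 0 & 1\end{pmatrix}, \qquad UU^* = \begin{pmatrix} 1 & 0\\ 0 & 0\end{pmatrix}\:. \end{aligned}$$

So U is a partial isometry that is not unitary: \(U^*U\) projects onto the initial space \(Ker(A)^\perp \), spanned by \(e_2\), and \(UU^*\) projects onto the final space Ran(U), spanned by \(e_1\). The order of the factors matters, since in this example |A|U = 0 ≠ A.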

4.4.3 The Two-Sided ∗-Ideal of Compact Operators

We give here the definition of compact operator on a Hilbert space. However the definition is much more general and can be given for operators \(A \in {\mathfrak B}(X, Y)\) with X, Y normed spaces, preserving many properties of these types of bounded operators (see, e.g., [Mor18]).

Definition 4.31

Let H be a Hilbert space. An operator \(A \in {\mathfrak B}({\mathsf H})\) is said to be compact if \(\{Ax_n\}_{n\in {\mathbb N}}\) admits a convergent subsequence whenever \(\{x_n\}_{n \in {\mathbb N}} \subset {\mathsf H}\) is bounded. The class of compact operators on H is indicated by \({\mathfrak B}_\infty ({\mathsf H})\). \(\blacksquare \)

Example 4.32

  1. (1)

    As an example, every operator \(A \in {\mathfrak B}({\mathsf H})\) such that Ran(A) is a finite-dimensional subspace of H is necessarily compact. In fact, let us identify Ran(A) with \({\mathbb C}^n\) for n given by the (finite) dimension of Ran(A) by fixing a Hilbert basis of Ran(A) (which coincides with \(\overline {Ran(A)}\), since all finite-dimensional subspaces are closed, the proof being elementary). If \(\{x_n\}_{n\in {\mathbb N}} \subset {\mathsf H}\) is bounded, i.e. ||x n||≤ C for all \(n\in {\mathbb N}\) and some (finite) constant C > 0, then ||Ax n||≤||A||C for \(n \in {\mathbb N}\). The vectors Ax n are therefore contained in the closed ball in \({\mathbb C}^n\) of radius ||A||C and centred at the origin, which is necessarily compact. Hence \(\{Ax_n\}_{n\in {\mathbb N}}\) admits a convergent subsequence. An example of such type of compact operator is a finite linear combination of operators \(A_{x,y} : {\mathsf H} \ni z \mapsto \langle x|z\rangle y\), for x, y ∈H fixed.

  2. (2)

    If \(A \in {\mathfrak B}({\mathsf H})\) and \(P \in \mathcal {L}({\mathsf H})\) is an orthogonal projector onto a finite-dimensional subspace, then \(AP \in {\mathfrak B}_\infty ({\mathsf H})\). In fact, if e 1, …, e n is an orthonormal basis of P(H), we have

    $$\displaystyle \begin{aligned}Ran(AP) = \left\{ \left.\sum_{j=1}^n c_j f_j \:\right|\: c_j \in {\mathbb C}\:, j=1,\ldots, n\right\}\:,\end{aligned}$$

    where f j := Ae j for j = 1, …, n. Therefore Ran(AP) has dimension ≤ n and AP is compact due to (1). \(\blacksquare \)

We summarize below the most important properties of compact operators on Hilbert spaces.

Theorem 4.33

Let H be a Hilbert space and focus on the set of compact operators \({\mathfrak B}_\infty ({\mathsf H})\). \(A \in {\mathfrak B}({\mathsf H})\) is compact if and only if |A| is compact.

Furthermore \({\mathfrak B}_\infty ({\mathsf H})\) is:

  1. (a)

    a linear subspace of \({\mathfrak B}({\mathsf H})\) ;

  2. (b)

    a two-sided ∗-ideal of \({\mathfrak B}({\mathsf H})\) , i.e.,

    1. (i)

      \(AB, BA \in {\mathfrak B}_\infty ({\mathsf H})\) if \(B\in {\mathfrak B}({\mathsf H})\) and \(A\in {\mathfrak B}_\infty ({\mathsf H})\) ,

    2. (ii)

      \(A^*\in {\mathfrak B}_\infty ({\mathsf H})\) if \(A\in {\mathfrak B}_\infty ({\mathsf H})\).

  3. (c)

    a \(C^*\)-algebra (without unit if H is not finite-dimensional) with respect to the structure induced by \({\mathfrak B}({\mathsf H})\) . In particular \({\mathfrak B}_\infty ({\mathsf H})\) is a closed subspace of \({\mathfrak B}({\mathsf H})\).

Proof

The first statement immediately arises from the definition of compact operator and formula \(||\,|A|x||^2 = \langle |A|x\,|\,|A|x\rangle = \langle x\,|\,|A|^2 x\rangle = \langle x|A^*Ax\rangle = \langle Ax|Ax\rangle = ||Ax||^2\), that implies that \(\{|A|x_n\}_{n\in {\mathbb N}}\) is Cauchy if and only if \(\{Ax_n\}_{n\in {\mathbb N}}\) is Cauchy.

  1. (a)

    Fix \(a,b\in {\mathbb C}\), \(A,B \in {\mathfrak B}_\infty ({\mathsf H})\), and a bounded sequence \(\{x_n\}_{n\in {\mathbb N}}\). Extract a subsequence \(\{x_{n_k}\}_{k\in {\mathbb N}}\) such that \(Ax_{n_k} \to y\) as k → +∞. \(\{x_{n_k}\}_{k\in {\mathbb N}}\) is bounded, so that there is a subsequence \(\{x_{{n_k}_h}\}_{h\in {\mathbb N}}\) such that \(Bx_{{n_k}_h} \to z\) as h → +∞. By construction \((aA+bB) x_{{n_k}_h} \to ay + bz\) as h → +∞. Hence, aA + bB is compact.

  2. (b)

    The fact that AB and BA are compact if \(A \in {\mathfrak B}_\infty ({\mathsf H})\) and \(B\in {\mathfrak B}({\mathsf H})\) is an immediate consequence of the fact that B is bounded and the definition of compact operator. The fact that \(A^*\) is compact if A is compact now immediately follows from the first statement and the polar decomposition (Theorem 4.25). In fact, from A = U|A| we have \(A^* = |A|U^*\); since |A| is compact and \(U^* \in {\mathfrak B}({\mathsf H})\), \(A^*\) is compact as well.

  3. (c)

    Let us prove that \({\mathfrak B}_\infty ({\mathsf H})\) is a Banach space with respect to the operator norm, since the remaining requirements for defining a \(C^*\)-algebra are valid because of (a) and (b). Let \({\mathfrak B}({\mathsf H}) \ni A = \lim _{i\to +\infty } A_i\) with \(A_i\in {\mathfrak B}_\infty ({\mathsf H})\). Take a bounded sequence \(\{x_n\}_{n\in {\mathbb N}}\) in H: \(||x_n|| \leq C\) for any n. We want to prove the existence of a convergent subsequence of \(\{Ax_n\}\). Using a hopefully clear notation, we build recursively a family of subsequences:

    $$\displaystyle \begin{aligned} \begin{array}{rcl} \{x_n\}\supset \{x_n^{(1)}\}\supset \{x_n^{(2)}\} \supset \cdots {}\end{array} \end{aligned} $$
    (4.12)

    such that, for any i = 1, 2, …, \(\{x_n^{(i+1)}\}\) is a subsequence of \(\{x_n^{(i)}\}\) with \(\{A_{i+1}x_n^{(i+1)}\}\) convergent. This is always possible, because any \(\{x_n^{(i)}\}\) is bounded by C, being a subsequence of \(\{x_n\}\), and \(A_{i+1}\) is compact by assumption. We claim that \(\{Ax_i^{(i)}\}\) is the subsequence of \(\{Ax_n\}\) that will converge. From the triangle inequality

    $$\displaystyle \begin{aligned}||Ax_i^{(i)} - Ax_k^{(k)}|| \leq ||Ax_i^{(i)} - A_nx_i^{(i)}|| + ||A_nx_i^{(i)} - A_nx_k^{(k)}|| +||A_nx_k^{(k)} - Ax_k^{(k)}||\:.\end{aligned}$$

    With this estimate,

    $$\displaystyle \begin{aligned}||Ax_i^{(i)} - Ax_k^{(k)}|| \leq ||A-A_n||(||x_i^{(i)}|| +||x_k^{(k)}||) + ||A_nx_i^{(i)}-A_nx_k^{(k)}||\end{aligned}$$
    $$\displaystyle \begin{aligned}\leq 2C||A-A_n|| +||A_nx_i^{(i)}-A_nx_k^{(k)}||\:.\end{aligned}$$

    Given 𝜖 > 0, if n is large enough then \(2C||A - A_n|| \leq \epsilon /2\), since \(A_n \to A\). Fix n and take r ≥ n. Then \(\{A_n(x_p^{(r)})\}_p\) is a subsequence of the convergent sequence \(\{A_n(x_p^{(n)})\}_p\). Consider the sequence \(\{A_n(x_p^{(p)})\}_p\), for p ≥ n: it picks up the “diagonal” terms of all those subsequences, each of which is a subsequence of the preceding one by (4.12); moreover, it is still a subsequence of the convergent sequence \(\{A_n(x_p^{(n)})\}_p\), so it, too, converges (to the same limit). We conclude that if i, k ≥ n are large enough, then \(||A_nx_i^{(i)}-A_nx_k^{(k)}|| \leq \epsilon /2\). Hence if i, k are large enough then \(||Ax_i^{(i)} - Ax_k^{(k)}|| \leq \epsilon /2 +\epsilon /2 =\epsilon \). This finishes the proof, for we have produced a Cauchy subsequence in the Banach space H, which must converge in the space.

    To end the proof of (c), we notice that I cannot be compact if H is infinite-dimensional: no orthonormal sequence \(\{u_n\}_{n\in {\mathbb N}}\) admits a convergent subsequence, because \(||u_n - u_m||^2 = 2\) for n ≠ m, so the bounded sequence \(\{Iu_n\}_{n\in {\mathbb N}}\) has no convergent subsequence.

To conclude this essential summary of properties of compact operators on Hilbert spaces, we state and prove the version of the spectral theorem for selfadjoint compact operators due to Hilbert and Schmidt. (An alternative proof of this classical theorem can be found in [Mor18].)

Theorem 4.34 (Hilbert-Schmidt Decomposition)

Let H be a Hilbert space and consider \(T^*=T \in {\mathfrak B}({\mathsf H})\) a compact operator with T ≠ 0. The following facts hold.

  1. (a)

    \(\sigma(T) \setminus \{0\} = \sigma_p(T) \setminus \{0\}\), so that, if \(0 \in \sigma(T)\), either \(0 \in \sigma_p(T)\) or 0 is the unique element of \(\sigma_c(T)\).

  2. (b)

    \(\sigma(T)\) is finite or countable. In the latter case 0 is the unique accumulation point of \(\sigma_p(T)\).

  3. (c)

    There exists \(\lambda \in \sigma_p(T)\) with \(||T|| = |\lambda|\).

  4. (d)

    If \(\lambda \in \sigma_p(T) \setminus \{0\}\), the λ-eigenspace has dimension \(d_\lambda < +\infty\).

  5. (e)

    The spectral decomposition

    $$\displaystyle \begin{aligned} \begin{array}{rcl} T x = \sum_{n \in N} \lambda_n \langle u_n| x \rangle u_n \quad \forall x \in {\mathsf H}\: \end{array} \end{aligned} $$
    (4.13)

    holds (the ordering is irrelevant), for a finite ( \(N\subsetneq {\mathbb N}\) ) or countable ( \(N={\mathbb N}\) ) Hilbert basis of eigenvectors \(\{u_n\}_{n\in N}\) of \(\overline {Ran(T)}\), where \(\lambda_n \in \sigma_p(T)\) is the eigenvalue of \(u_n\).

  6. (f)

    If \(N= {\mathbb N}\) and the ordering of the \(u_n\) is such that \(|\lambda_n| \geq |\lambda_{n+1}|\), then

    $$\displaystyle \begin{aligned} \begin{array}{rcl} T= \sum_{n =0}^{+\infty} \lambda_n \langle u_n| \: \rangle u_n \:,{} \end{array} \end{aligned} $$
    (4.14)

    in the uniform operator topology.

Proof

  1. (a)

    Take \(\lambda \in \sigma_c(T) \setminus \{0\}\), assuming that it exists. Due to Proposition 3.3, for every natural number n > 0 there is \(x_n \in {\mathsf H}\) with \(||x_n|| = 1\) and \(||Tx_n-\lambda x_n||< \frac {2}{n}\). In particular, if \(P^{(T)}\) is the PVM of T, we can always fix \(x_n\) in the closed subspace \(P^{(T)}_{[\lambda -1/n, \lambda +1/n]}({\mathsf H})\). This subspace is not trivial because of (d) Theorem 3.40, as it contains the non-trivial subspace \(P^{(T)}_{(\lambda -1/n, \lambda +1/n)}({\mathsf H})\). Consequently,

    $$\displaystyle \begin{aligned} ||x_n-x_m|| &= |\lambda|{}^{-1}\: ||\lambda x_n-\lambda x_m|| \leq |\lambda|{}^{-1}\: ||\lambda x_n - Tx_n -\lambda x_m+ Tx_m|| \\ &\quad + |\lambda|{}^{-1}\: ||T x_n-Tx_m|| \:. \end{aligned} $$

    Hence, for m ≥ n (so that \(\frac{2}{n} + \frac{2}{m} \leq \frac{4}{n}\)),

    $$\displaystyle \begin{aligned} ||x_n-x_m||\leq \frac{4}{|\lambda| n} + \frac{1}{|\lambda|} ||T x_n-Tx_m|| \:,\end{aligned}$$

    so that

    $$\displaystyle \begin{aligned} \begin{array}{rcl} ||T x_n-Tx_m|| \geq|\lambda| ||x_n-x_m|| - \frac{4}{n} {}\:.\end{array} \end{aligned} $$
    (4.15)

    Moreover, since \(\lambda \in \sigma_c(T)\) and invoking Proposition 3.29 (c), we have as m → +∞

    $$\displaystyle \begin{aligned} P^{(T)}_{[\lambda-1/m, \lambda +1/m]}x_n &= \int_{\mathbb R} \chi_{[\lambda-1/m, \lambda +1/m]} dP^{(T)}x_n \to \int_{\mathbb R} \chi_{\{\lambda\}} dP^{(T)}x_n \\ &= P^{(T)}_{\{\lambda\}}x_n= 0x_n =0\:. \end{aligned} $$

    This fact has the implication that, if n is fixed, then \(\langle x_n|x_m\rangle \to 0\) as m → +∞, because \(\langle x_n|x_m\rangle = \langle x_n|P^{(T)}_{[\lambda -1/m, \lambda +1/m]} x_m\rangle = \langle P^{(T)}_{[\lambda -1/m, \lambda +1/m]} x_n|x_m\rangle \to 0\). Hence, we also have that \(||x_n - x_m||^2 = 2 - 2 Re \langle x_n|x_m\rangle \to 2\) as m → +∞. In summary, looking at (4.15), if n is sufficiently large such that

    $$\displaystyle \begin{aligned} \frac{4}{n} < \frac{|\lambda|\sqrt{2}}{4}\:, \end{aligned}$$

    we can always take m so large that \(||x_n-x_m|| \geq \frac {\sqrt {2}}{2}\), obtaining

    $$\displaystyle \begin{aligned}||T x_n-Tx_m|| \geq|\lambda|\frac{\sqrt{2}}{2} -|\lambda| \frac{\sqrt{2}}{4} = |\lambda| \frac{\sqrt{2}}{4}\:.\end{aligned}$$

    Even if the sequence \(\{x_n\}_{n\in {\mathbb N}}\) is bounded (because ||x n|| = 1 for every \(n\in {\mathbb N}\)), its image \(\{Tx_n\}_{n\in {\mathbb N}}\) cannot contain Cauchy subsequences since \(||T x_n-Tx_m|| \geq |\lambda | \frac {\sqrt {2}}{4} \) if n and m are sufficiently large. This is impossible because T is compact. The only possibility is λ = 0 concluding the proof of (a).

  2. (b)

    Suppose that \(\sigma_p(T) \ni \lambda_n \to a \neq 0\) as n → +∞ for some sequence of pairwise distinct eigenvalues \(\lambda_n\). Consider eigenvectors \(x_n\) with \(Tx_n = \lambda_n x_n\) and \(||x_n|| = 1\). Since \(x_n \perp x_m\) if n ≠ m (Proposition 3.13 (d)) and \(\lambda_n \to a\) as n → +∞, we have

    $$\displaystyle \begin{aligned}||Tx_n-Tx_m||{}^2 = ||\lambda_n x_n - \lambda_m x_m||{}^2 =|\lambda_n|{}^2 + |\lambda_m|{}^2 \geq 2|a|{}^2-\epsilon\end{aligned}$$

    for \(n, m > N_\epsilon\) with n ≠ m. If |a| > 0, taking \(\epsilon = |a|^2\), we conclude that the sequence \(\{Tx_n\}_{n\in {\mathbb N}}\) cannot admit Cauchy subsequences, since \(||Tx_n - Tx_m||^2 \geq |a|^2 > 0\) for sufficiently large n ≠ m, whereas it should admit one, since T is compact and \(\{x_n\}_{n \in {\mathbb N}}\) is bounded. In summary, an accumulation point a ≠ 0 does not exist. Now remember that \(\sigma(T)\), and hence \(\sigma_p(T)\), are contained in \([-||T||, ||T||]\) (Proposition 3.47). In every compact set \([-||T||, -1/n] \cup [1/n, ||T||]\) for \(n\in {\mathbb N}\) with 1∕n < ||T||, there are finitely many (possibly none) elements of \(\sigma_p(T)\), otherwise there would be an accumulation point in that set, and this is forbidden since the set does not contain 0. We have found that \(\sigma_p(T)\) is either finite or countable and, in the latter case, 0 is the only accumulation point.

  3. (c)

    Since \(\sup \{|\lambda | \:|\: \lambda \in \sigma (T)\} = \sup \{|\lambda | \:|\: \lambda \in \sigma _p(T)\} = ||T||\) (Proposition 3.47), and ||T||≠ 0 cannot be an accumulation point of σ p(T), there must be λ ∈ σ p(T) with |λ| = ||T||.

  4. (d)

    If \(\lambda \in \sigma_p(T) \setminus \{0\}\), define \({\mathsf H}_\lambda\) as the corresponding eigenspace of T and let \(\{x_j\}_{j\in J}\) be a Hilbert basis of \({\mathsf H}_\lambda\). As a consequence \(||Tx_j - Tx_k||^2 = |\lambda|^2 ||x_j - x_k||^2 = 2|\lambda|^2\) if j ≠ k. So \(\{Tx_j\}_{j\in J}\) cannot admit a Cauchy subsequence when J is not finite, in spite of \(\{x_j\}_{j\in J}\) being bounded and T compact. We conclude that J is finite, namely \(\dim ({\mathsf H}_\lambda ) <+\infty \).

  5. (e)

    We assume \(N = {\mathbb N}\) since the finite case is trivial. Consider a collection of sets \(E_n \subset \sigma_p(T)\) with \(n\in {\mathbb N}\) such that every set \(E_n\) is finite, \(E_{n+1} \supset E_n\) and \(\cup _{n\in {\mathbb N}}E_n = \sigma _p(T)\). Notice that \(\sigma_p(T) = \sigma(T)\), possibly up to the point \(0 \in \sigma_c(T)\), that however does not play any role in the following because \(P^{(T)}_{\{\lambda \}}=0\) if \(\lambda \in \sigma_c(T)\) as we know by Theorem 3.40. The sequence of functions \(\chi _{E_n} \imath \) tends pointwise to \(\chi _{\sigma _{p}(T)} \imath \) and is bounded by the constant ||T|| since \(\sigma(T) \subset [-||T||, ||T||]\). Applying Proposition 3.29 (c), we have, since \(P^{(T)}\) is concentrated on the eigenvalues,

    $$\displaystyle \begin{aligned}Tx = \int_{{\mathbb R}}\imath\: dP^{(T)}x = \lim_{n\to +\infty} \int_{{\mathbb R}} \chi_{E_n} \imath\: dP^{(T)}x =\lim_{n\to +\infty} \sum_{\lambda \in E_n} \lambda P_{\{\lambda\}}^{(T)}x = \sum_{\lambda \in \sigma_p(T)} \lambda P_{\{\lambda\}}^{(T)}x\:,\end{aligned}$$

    where in the final formula the procedure we adopt to enumerate the eigenvalues does not matter because the sets \(E_n\) are chosen arbitrarily. If we fix an orthonormal basis \(N_\lambda = \{u^{(\lambda )}_j\}_{j=1,\ldots , d_\lambda }\) in every eigenspace \(P^{(T)}_{\{\lambda \}}({\mathsf H})\) with λ ≠ 0 (if λ = 0 is an eigenvalue it does not give contribution to the total sum defining Tx), so that \(P_{\{\lambda \}}^{(T)}= \sum _{j=1}^{d_\lambda } \langle u^{(\lambda )}_j| \:\: \rangle u^{(\lambda )}_j\), we can rearrange the formula as

    $$\displaystyle \begin{aligned}Tx = \sum_{\lambda \in \sigma_p(T)}\sum_{j=1}^{d_\lambda} \lambda \langle u^{(\lambda)}_j|x \rangle u^{(\lambda)}_j\:.\end{aligned}$$

    According to Lemma 2.8, since the vectors in the sum are pairwise orthogonal, the sum can be rearranged arbitrarily and written into the form where the pairwise-orthogonal u n are the vectors in the union of bases \(\cup _{\lambda \in \sigma _p(T)\setminus \{0\}} N_\lambda \),

    $$\displaystyle \begin{aligned}T x = \sum_{n \in {\mathbb N}} \lambda_n \langle u_n| x \rangle u_n \quad \forall x \in {\mathsf H}\:,\end{aligned}$$

    with \(Tu_n = \lambda_n u_n\). Observe that, from the formula above, the set of orthonormal vectors \(u_n\) spans the whole range of T and also its closure, so that they form a Hilbert basis of \(\overline {Ran(T)}\). The proof of (e) is over.

  6. (f)

    Suppose again that \(N={\mathbb N}\), otherwise everything becomes trivial. In this case 0 must be the unique limit point of the \(\lambda_n\) in view of (b). Ordering the eigenvectors so that \(|\lambda_{n+1}| \leq |\lambda_n|\), consider the operators \(T_N := \sum _{n=0}^{N-1} \lambda _n \langle u_n| \:\: \rangle u_n\). Then

    $$\displaystyle \begin{aligned} ||(T-T_{N})x||{}^2 &= \left|\left|\sum_{n=N}^{+\infty} \lambda_n \langle u_n| x \rangle u_n \right|\right|{}^2 = \sum_{n=N}^{+\infty} |\lambda_n|{}^2 |\langle u_n| x \rangle |{}^2 \\ &\quad \leq |\lambda_{N}|{}^2\sum_{n=N}^{+\infty} |\langle u_n| x \rangle |{}^2 \leq |\lambda_{N}|{}^2||x||{}^2\:, \end{aligned} $$

    Hence, taking square roots, dividing by ||x|| and taking the sup over the vectors x with ||x|| ≠ 0,

    $$\displaystyle \begin{aligned}||T-T_N|| \leq |\lambda_{N}|\to 0 \quad \mbox{if}\ N\to +\infty.\end{aligned} $$

    We have proved that (4.14) is valid in the uniform operator topology, completing the proof.

Example 4.35

Let us come back to the Hamiltonian operator H of the harmonic oscillator discussed in (3) Example 3.43. It turns out that \(H^{-1} \in {\mathfrak B}_\infty ({\mathsf H})\). Since \(0 \notin \sigma(H)\), necessarily \(H^{-1} = R_0(H)\) (the resolvent operator for λ = 0), hence \(H^{-1}\in {\mathfrak B}({\mathsf H})\). Moreover, applying Corollary 3.53,

$$\displaystyle \begin{aligned}\sigma(H^{-1}) = \overline{\left\{ \left.\frac{1}{\hbar \omega(n+1/2)}\:\right| n=0,1,2,\ldots\right\}} = \{0\} \cup \left\{ \left.\frac{1}{\hbar \omega(n+1/2)}\:\right| n=0,1,2,\ldots\right\}\:,\end{aligned}$$

where the points \(\frac {1}{\hbar \omega (n+1/2)}\) are in the point spectrum as they are isolated (Theorem 3.40). Using the same argument as in the proof of (f) of Theorem 4.34, we have that

$$\displaystyle \begin{aligned}H^{-1} = \lim_{N\to +\infty} \sum_{n=0}^{N} \frac{1}{\hbar \omega(n+1/2)} \langle \psi_n| \cdot \rangle\psi_n\end{aligned} $$

where the \(\psi_n\) are the eigenvectors of H, according to (3) Example 3.43, and the limit is in the uniform operator topology. Since the operators after the limit symbol are of finite rank and thus compact, applying Theorem 4.33 (c), we have that also \(H^{-1}\) is compact. The same result actually holds true for \(H^{-\alpha}\) with α > 0.
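
The uniform convergence used in this example can be checked numerically. The following is only a minimal sketch, assuming units in which \(\hbar \omega = 1\) (a choice made here purely to keep the numbers simple); the function names are illustrative and do not come from the text. Since the difference between \(H^{-1}\) and its partial sum over n = 0, …, N is diagonal in the basis of the \(\psi_n\), its operator norm is the largest remaining eigenvalue, the one with index N + 1.

```python
# Assumption: units with hbar * omega = 1, so the eigenvalues of H are n + 1/2.

def eigenvalue_inverse_H(n):
    """n-th eigenvalue 1/(n + 1/2) of H^{-1} for the harmonic oscillator."""
    return 1.0 / (n + 0.5)

def tail_norm(N):
    """Operator norm of H^{-1} minus its partial sum over n = 0..N.

    The difference is diagonal in the eigenbasis, so its norm equals the
    largest eigenvalue left out of the partial sum, i.e. the one with n = N + 1.
    """
    return eigenvalue_inverse_H(N + 1)

cutoffs = (0, 10, 100, 1000)
tail_norms = [tail_norm(N) for N in cutoffs]

for N, value in zip(cutoffs, tail_norms):
    print(N, value)
```

The printed norms decrease to zero as N grows, which is exactly the statement that the finite-rank partial sums converge to \(H^{-1}\) in the uniform operator topology.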

4.4.4 Trace-Class Operators

Let us finally introduce an important family of compact operators: the operators of trace class. As a matter of fact, these operators \(A : {\mathsf H} \to {\mathsf H}\) are those which admit a well-defined trace

$$\displaystyle \begin{aligned}tr(A) = \sum_{u \in N} \langle u|Au\rangle\:,\end{aligned}$$

where \(N \subset {\mathsf H}\) is a Hilbert basis and tr(A) does not depend on the choice of the Hilbert basis. This notion of trace is evidently the direct generalization of the analogous notion in finite-dimensional vector spaces. This family of compact operators will play a decisive role in the characterization of the class of quantum states.

The traditional procedure to introduce them (see, e.g., [Mor18]) passes through Hilbert-Schmidt operators, or even Schatten-class operators. However, since these types are not of great relevance in our concise presentation, we shall follow a much more direct route.

We start with a definition which becomes illuminating if we think of the trace as an integration procedure: we should deal with absolutely integrable functions to make effective the notion of integral. The same happens for the trace.

Definition 4.36

If H is a Hilbert space, \({\mathfrak B}_1({\mathsf H})\subset {\mathfrak B}({\mathsf H})\) denotes the set of trace-class or nuclear operators, i.e. the operators \(T\in {\mathfrak B}({\mathsf H})\) satisfying

$$\displaystyle \begin{aligned} \begin{array}{rcl} \sum_{z\in M} \langle z| |T| z \rangle <+\infty {}\end{array} \end{aligned} $$
(4.16)

for some Hilbert basis M ⊂H. \(\blacksquare \)

A technical proposition is in order after an important remark concerning alternative definitions of \({\mathfrak B}_1({\mathsf H})\).

Remark 4.37

A weaker version of condition (b) below, namely,

$$\displaystyle \begin{aligned}\sum_{u\in N}|\langle u|Tu \rangle| <+\infty \quad \mbox{for every Hilbert basis}\ N,\end{aligned}$$

is equivalent to \(T\in {\mathfrak B}_1({\mathsf H})\) in complex Hilbert spaces [Mor18] (but not in real Hilbert spaces). This condition is sometimes adopted as an alternative definition of \({\mathfrak B}_1({\mathsf H})\) in complex Hilbert spaces.\(\blacksquare \)

Proposition 4.38

Let H be a complex Hilbert space. Then for every \(T\in {\mathfrak B}_1({\mathsf H})\)

  1. (a)

    for every Hilbert basis \(N \subset {\mathsf H}\),

    $$\displaystyle \begin{aligned}||T||{}_1 := \sum_{u\in N} \langle u| |T| u \rangle <+\infty\end{aligned}$$

    and \(||T||_1\) does not depend on N.

  2. (b)

    For every Hilbert basis \(N \subset {\mathsf H}\),

    $$\displaystyle \begin{aligned}\sum_{u\in N}|\langle u|Tu \rangle| \leq ||T||{}_1 < +\infty\:.\end{aligned}$$
  3. (c)

    T, |T| and \(\sqrt {|T|}\) belong to \({\mathfrak B}_\infty ({\mathsf H})\).

Proof

  1. (a)

    From Definition 4.36,

    $$\displaystyle \begin{aligned}+\infty > \sum_{z\in M} \langle z| |T| z \rangle = \sum_{z\in M} \left\langle \sqrt{|T|} z\left| \sqrt{|T|} \right.z \right\rangle = \sum_{z\in M}\left|\left|\sqrt{|T|}z\right|\right|{}^2 = \sum_{z\in M}\sum_{u\in N}\left|\left\langle u\left|\sqrt{|T|}\right. z\right\rangle\right|{}^2\end{aligned}$$
    $$\displaystyle \begin{aligned}= \sum_{z\in M}\sum_{u\in N}\left|\left\langle \left. \sqrt{|T|} u\right| z\right\rangle\right|{}^2 = \sum_{u\in N}\sum_{z\in M}\left|\left\langle \left.\sqrt{|T|} u\right| z\right\rangle\right|{}^2 = \sum_{u\in N}\left|\left|\sqrt{|T|} u\right|\right|{}^2 = \sum_{u\in N} \langle u| |T| u \rangle\:.\end{aligned}$$

    The crucial passage is swapping the sums \(\sum_{z\in M}\sum_{u\in N} \to \sum_{u\in N}\sum_{z\in M}\). This exchange is allowed by interpreting the double sum as an integral with respect to the product of the counting measures on N and M, and using the Fubini-Tonelli theorem. Observe that only countably many terms \(|\langle u|\sqrt {|T|} z\rangle |{ }^2\) of the Cartesian product N × M do not vanish, so that one can restrict to countable, hence σ-finite, subsets of N and M, and their product measure can be defined.

  2. (b)

    Making use of the polar decomposition of T (Theorem 4.25), we have

    $$\displaystyle \begin{aligned}\sum_{u\in N}|\langle u|Tu \rangle| = \sum_{u\in N}|\langle u|U|T|u \rangle| =\sum_{u\in N}\left|\left\langle u\left|U\sqrt{|T|}\sqrt{|T|}u \right.\right\rangle\right| = \sum_{u\in N}\left|\left\langle \sqrt{|T|} U^*u\left|\sqrt{|T|}u\right. \right\rangle\right| \end{aligned} $$
    $$\displaystyle \begin{aligned} \begin{array}{rcl} \leq \sum_{u\in N}\left|\left| \sqrt{|T|} U^*u\right|\right| \left|\left|\sqrt{|T|}u \right|\right| \leq \sqrt{\sum_{u\in N}\left|\left| \sqrt{|T|} U^*u\right|\right|{}^2} \sqrt{\sum_{u\in N}\left|\left|\sqrt{|T|}u \right|\right|{}^2}\leq C \sqrt{||T||{}_1}\:, {}\\\end{array} \end{aligned} $$
    (4.17)

    where

    $$\displaystyle \begin{aligned}C:= \sqrt{\sum_{u\in N}\left|\left| \sqrt{|T|} U^*u\right|\right|{}^2} = \sqrt{\sum_{u\in N} \langle u| U|T|U^* u \rangle} \:.\end{aligned}$$

    Let us study the value of C, proving that it is finite. We start by noticing that \(U|T|U^*\) is positive and thus coincides with \(|U|T|U^*|\). On the other hand, \(U|T|U^*\in {\mathfrak B}_1({\mathsf H})\) since it satisfies (4.16) for a Hilbert basis that we are going to construct. First observe that \(U^*\) is a partial isometry according to Exercise 4.30, so that it is an isometry on the closed subspace \(K = Ker(U^*)^\perp\). If L is a Hilbert basis of K, the vectors \(U^*v\) for v ∈ L are an orthonormal system in \(\overline {Ran(U^*)}\) and this system can always be completed to a Hilbert basis M of H. In summary,

    $$\displaystyle \begin{aligned}+ \infty > ||T||{}_1 =\sum_{z\in M} \langle z| |T| z\rangle = \sum_{v\in L} \langle U^*v| |T| U^*v\rangle + \sum_{z\in M\setminus U^*(L)} \langle z| |T| z\rangle \geq \sum_{v\in L} \langle v| U|T|U^* |v\rangle\end{aligned}$$
    $$\displaystyle \begin{aligned} = \sum_{v\in N'} \langle v| U|T|U^* |v\rangle = || U|T|U^*||{}_1\:, \end{aligned}$$

    where, in the last line, we have completed the basis L of K with a Hilbert basis L′ of \(K^\perp = Ker(U^*)\), obtaining a Hilbert basis N′ = L ∪ L′ of H, so that \(\langle v|U|T|U^* v\rangle = \langle U^*v||T|U^* v\rangle = 0\) when v ∈ L′. Since we have in this way established that \(U|T|U^*\in {\mathfrak B}_1({\mathsf H})\), the value of \(||U|T|U^*||_1 \leq ||T||_1\) must be independent of the used basis and we can conclude that

    $$\displaystyle \begin{aligned}C = \sqrt{ \sum_{u\in N} \langle u| U|T|U^* |u\rangle } \leq \sqrt{||T||{}_1}\:.\end{aligned}$$

    Inserting in (4.17), we finish the proof of (b), \(\sum _{u\in N}|\langle u|Tu \rangle | \leq \sqrt {||T||{ }_1}\sqrt {||T||{ }_1} = ||T||{ }_1 < +\infty \:.\)

  3. (c)

    Consider a Hilbert basis \(M \subset {\mathsf H}\). If \(T\in {\mathfrak B}_1({\mathsf H})\), we have \(||T||_1 = \sum _{u\in M} \left |\left |\sqrt {|T|} u\right |\right |{ }^2 <+\infty \:.\) As a consequence, the elements u ∈ M such that \(\left |\left |\sqrt {|T|} u\right |\right |\neq 0\) form a finite or countable subset \(\{u_n\}_{n\in N}\). We henceforth assume \(N={\mathbb N}\), the finite case being trivial. Consider the compact operator \(\sqrt {|T|}P_N\) (see (2) Example 4.32), where \(P_N = \sum _{n=0}^{N-1} \langle u_n| \:\: \rangle u_n\). We have

    $$\displaystyle \begin{aligned}\left|\left|\left(\sqrt{|T|} - \sqrt{|T|}P_N\right)x\right|\right| = \left|\left|\sum_{n=N}^{+\infty}\langle u_n|x\rangle \sqrt{|T|} u_n \right|\right| \leq \sum_{n=N}^{+\infty} |\langle u_n|x\rangle| \left|\left|\sqrt{|T|} u_n\right|\right|\end{aligned}$$
    $$\displaystyle \begin{aligned}\leq \sqrt{\sum_{n=N}^{+\infty} |\langle u_n|x\rangle|{}^2}\sqrt{\sum_{n=N}^{+\infty} \left|\left|\sqrt{|T|} u_n\right|\right|{}^2}\leq ||x|| \sqrt{\sum_{n=N}^{+\infty} \left|\left|\sqrt{|T|} u_n\right|\right|{}^2}\:.\end{aligned}$$

    Hence

    $$\displaystyle \begin{aligned}\left|\left|\sqrt{|T|} - \sqrt{|T|}P_N\right|\right| \leq \sqrt{\sum_{n=N}^{+\infty} \left|\left|\sqrt{|T|} u_n\right|\right|{}^2}\:.\end{aligned}$$

    The right-hand side vanishes as N → +∞ because the series \(\sum _{n=0}^{+\infty } \left |\left |\sqrt {|T|} u_n\right |\right |{ }^2\) converges to \(||T||_1 < +\infty\). Since \(\sqrt {|T|}P_N \in {\mathfrak B}_\infty ({\mathsf H})\) and this space is closed in the uniform topology, it being a \(C^*\)-algebra in \({\mathfrak B}({\mathsf H})\) (Theorem 4.33), we have \(\sqrt {|T|} \in {\mathfrak B}_\infty ({\mathsf H})\). Since \({\mathfrak B}_\infty ({\mathsf H})\) is a two-sided ideal (Theorem 4.33 again) we have both that \(|T| = \sqrt {|T|} \sqrt {|T|} \in {\mathfrak B}_\infty ({\mathsf H})\), and \(T= U |T| \in {\mathfrak B}_\infty ({\mathsf H})\), where we have used the polar decomposition of T, so that \(U\in {\mathfrak B}({\mathsf H})\).
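
Statements (a) and (b) can be sanity-checked in the simplest possible setting, \({\mathsf H} = {\mathbb C}^2\), where every operator is of trace class. The sketch below is an illustration under stated assumptions, not part of the proof: the matrix T, the two-angle parametrization of orthonormal bases, and all function names are hypothetical choices made here. For a 2 × 2 matrix the trace norm is the sum of the two singular values \(s_1, s_2\), and it is computed from the elementary identities \(s_1^2+s_2^2 = \sum_{ij}|T_{ij}|^2\) and \(s_1 s_2 = |\det T|\). The basis independence of \(\sum_u ||Su||^2\), the step behind (a), is tested with S = T standing in for \(\sqrt{|T|}\).

```python
import cmath
import math

def basis(theta, phi):
    """Orthonormal basis of C^2 labelled by two angles (illustrative choice)."""
    c, s = math.cos(theta), math.sin(theta)
    e = cmath.exp(1j * phi)
    return [(c, e * s), (-e.conjugate() * s, c)]

def apply(T, x):
    """Apply the 2x2 matrix T (nested tuples) to the vector x."""
    return tuple(T[i][0] * x[0] + T[i][1] * x[1] for i in range(2))

def inner(x, y):
    """Inner product <x|y>, antilinear in the left entry as in the text."""
    return sum(complex(a).conjugate() * b for a, b in zip(x, y))

def trace_norm(T):
    """||T||_1 = s1 + s2, via (s1 + s2)^2 = sum |T_ij|^2 + 2 |det T|."""
    frob2 = sum(abs(T[i][j]) ** 2 for i in range(2) for j in range(2))
    det = T[0][0] * T[1][1] - T[0][1] * T[1][0]
    return math.sqrt(frob2 + 2 * abs(det))

T = ((1 + 2j, 3), (0.5j, -1))
bases = [basis(0.1 * k, 0.7 * k) for k in range(20)]

# Step behind (a): sum_u ||S u||^2 takes the same value in every basis.
square_sums = [
    sum(inner(apply(T, u), apply(T, u)).real for u in N) for N in bases
]

# Statement (b): sum_u |<u|Tu>| never exceeds ||T||_1.
diagonal_sums = [sum(abs(inner(u, apply(T, u))) for u in N) for N in bases]
norm1 = trace_norm(T)

print(max(square_sums) - min(square_sums))
print(max(diagonal_sums), norm1)
```

In infinite dimensions the content of the proposition is that these sums converge at all; the finite-dimensional check only illustrates the basis independence and the inequality.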

The general properties of \({\mathfrak B}_1({\mathsf H})\) are listed in the next proposition.

Proposition 4.39

Let H be a Hilbert space. Then \({\mathfrak B}_1({\mathsf H})\) satisfies the following properties.

  1. (a)

    \({\mathfrak B}_1({\mathsf H})\) is a subspace of \({\mathfrak B}({\mathsf H})\) and a two-sided ∗-ideal, namely

    1. (i)

      \(AT, TA \in {\mathfrak B}_1({\mathsf H})\) if \(T\in {\mathfrak B}_1({\mathsf H})\) and \(A\in {\mathfrak B}({\mathsf H})\) ,

    2. (ii)

      \(T^*\in {\mathfrak B}_1({\mathsf H})\) if and only if \(T\in {\mathfrak B}_1({\mathsf H})\).

  2. (b)

    \(||\:\:||_1\) is a norm making \({\mathfrak B}_1({\mathsf H})\) a Banach space and satisfying

    1. (i)

      \(||TA||_1 \leq ||A||\: ||T||_1\) and \(||AT||_1 \leq ||A||\: ||T||_1\) if \(T\in {\mathfrak B}_1({\mathsf H})\) and \(A\in {\mathfrak B}({\mathsf H})\),

    2. (ii)

      \(||T||_1 = ||T^*||_1\) if \(T\in {\mathfrak B}_1({\mathsf H})\).

Proof

  1. (a)

    (We closely follow the proof of [ReSi80].) First of all, observe that |aA| = |a||A| for \(a \in {\mathbb C}\) so that, to prove that \({\mathfrak B}_1({\mathsf H})\) is a vector space, it suffices to check that \(A+B\in {\mathfrak B}_1({\mathsf H})\) for \(A,B\in {\mathfrak B}_1({\mathsf H})\). Let U, V, and W be the partial isometries arising from the polar decompositions of A + B, A, and B: A + B = U|A + B|, A = V|A|, B = W|B|. As a consequence, if N is a Hilbert basis of H,

    $$\displaystyle \begin{aligned}\sum_{u\in N} \langle u||A+B|u\rangle = \sum_{u\in N} \langle u| U^*(A+B)u\rangle \leq \sum_{u\in N}|\langle u| U^*V |A|u\rangle|+ \sum_{u\in N}|\langle u| U^*W |B|u\rangle|\:.\end{aligned}$$

    However,

    $$\displaystyle \begin{aligned} \sum_{u\in N}|\langle u| U^*V |A|u\rangle| \leq \sum_{u\in N}||\sqrt{|A|}V^*Uu|| ||\sqrt{|A|}u|| \leq \sqrt{\sum_{u\in N}||\sqrt{|A|}V^*Uu||{}^2} \sqrt{\sum_{u\in N}||\sqrt{|A|}u||{}^2}\:.\end{aligned}$$

    The same argument is valid for B. Hence, if we can prove that

    $$\displaystyle \begin{aligned} \begin{array}{rcl} \sum_{u\in N}||\sqrt{|A|}V^*Uu||{}^2\leq tr(|A|)\:,{}\end{array} \end{aligned} $$
    (4.18)

    we can conclude that

    $$\displaystyle \begin{aligned}\sum_{u\in N} \langle u||A+B|u\rangle\leq tr(|A|)+ tr(|B|) <+\infty\:,\end{aligned}$$

    establishing that \(A+B \in {\mathfrak B}_1({\mathsf H})\) as wanted. To show (4.18) we need only to prove that

    $$\displaystyle \begin{aligned}tr(U^*V|A|V^*U) \leq tr(|A|)\:.\end{aligned}$$

    Referring to a Hilbert basis N ∋ u whose elements satisfy either u ∈ Ker(U) or \(u \in Ker(U)^\perp\), we see that

    $$\displaystyle \begin{aligned}tr(U^*V|A|V^*U) \leq tr(V|A|V^*)\:.\end{aligned}$$

    Iterating the procedure for \(tr(V|A|V^*)\), using a Hilbert basis N ∋ u whose elements satisfy either \(u \in Ker(V^*)\) or \(u \in Ker(V^*)^\perp\), we also conclude that

    $$\displaystyle \begin{aligned}tr(V|A|V^*) \leq tr(|A|)\:,\end{aligned}$$

    proving our assertion.

    • (a)(i) Since Proposition 3.55 is valid, exploiting the fact that \({\mathfrak B}_1({\mathsf H})\) is a linear space, we have only to prove that \(UT, TU \in {\mathfrak B}_1({\mathsf H})\) if \(T\in {\mathfrak B}_1({\mathsf H})\) and \(U\in {\mathfrak B}({\mathsf H})\) is unitary. Observe that \(|UT|^2 = T^*U^*UT = |T|^2\), so |UT| = |T| and thus tr(|UT|) = tr(|T|) < +∞, proving that \(UT\in {\mathfrak B}_1({\mathsf H})\). Similarly \(|TU|^2 = U^*T^*TU = U^*|T|^2 U\), so that \(|TU| = U^*|T|U\) (because this operator is positive and its square is \(U^*|T|^2 U\)). Therefore we have \(tr(|TU|) = tr(U^*|T|U) = \sum_{u\in N} \langle Uu||T|Uu\rangle = tr(|T|) < +\infty\) (because \(\{Uu\}_{u\in N}\) is a Hilbert basis if N is, since U is unitary) and so \(TU\in {\mathfrak B}_1({\mathsf H})\).

    • (a)(ii) Let T = U|T| be the polar decomposition of T. Therefore \(T^* = |T|U^*\) and \(|T^*|^2 = TT^* = U|T|^2 U^*\). Since \(U|T|U^* U|T|U^* = U|T|^2 U^*\), because \(U^*U\) is the orthogonal projector onto \(\overline {Ran(|T|)}\) (Theorem 4.25 and Exercise 4.29), we conclude that \(|T^*| = U|T|U^*\). Now (i) implies that \(T^* \in {\mathfrak B}_1({\mathsf H})\) if \(T\in {\mathfrak B}_1({\mathsf H})\). Since \((T^*)^* = T\) we have also that \(T^*\in {\mathfrak B}_1({\mathsf H})\) entails \(T\in {\mathfrak B}_1({\mathsf H})\).

  2. (b)

    If \(a\in {\mathbb C}\) and \(A\in {\mathfrak B}_1({\mathsf H})\), we find

    $$\displaystyle \begin{aligned}||aA||{}_1 = \sum_{u\in N}\langle u||aA|u\rangle = \sum_{u\in N}\langle u||a| |A|u\rangle= |a|\sum_{u\in N}\langle u||A|u\rangle= |a| ||A||{}_1\:.\end{aligned}$$

    Proving (a), we have established that \(||A + B||_1 \leq ||A||_1 + ||B||_1\) for \(A,B\in {\mathfrak B}_1({\mathsf H})\), so that \(||\:\:||{ }_1 : {\mathfrak B}_1({\mathsf H})\to {\mathbb C}\) is a seminorm. On the other hand, if \(||A||_1 = 0\) it means that \(\sum_{u\in N} \langle u||A|u\rangle = 0\) for every Hilbert basis N. Since every unit vector \(x \in {\mathsf H}\) can be completed to a basis, this implies in particular that \(||\sqrt {|A|}x||{ }^2 =\langle x||A|x\rangle =0\) and thus \(|A|x= \sqrt {|A|}^2x =0\) for every \(x \in {\mathsf H}\), so that \(||Ax||^2 = \langle Ax|Ax\rangle = || |A| x||^2 = 0\) for every \(x \in {\mathsf H}\), meaning that A = 0. Hence \(||\:\:||{ }_1 : {\mathfrak B}_1({\mathsf H})\to {\mathbb C}\) is a norm. The proof of the fact that the norm makes \({\mathfrak B}_1({\mathsf H})\) a Banach space can be found in [Scha60].

    • (b)(i) It is sufficient to check that \(||AT||_1 \leq ||A||\: ||T||_1\). Indeed, assuming it, from (ii), whose proof is independent from the present one, we have \(||TA||_1 = ||A^*T^*||_1 \leq ||A^*||\: ||T^*||_1 = ||A||\: ||T||_1\). Let us prove \(||AT||_1 \leq ||A||\: ||T||_1\). Consider the polar decompositions T = U|T| and AT = W|AT|, so that \(|AT| = W^*(AT) = W^*AU|T|\). Putting \(S = W^*AU\), we have, exploiting the usual Hilbert basis N of eigenvectors of the selfadjoint positive compact operator |T|, with eigenvalues \(\lambda_u \geq 0\),

      $$\displaystyle \begin{aligned}||AT||{}_1 = tr(|AT|) = tr(S|T|) = \sum_{u\in N} \langle u|S|T|u\rangle = \sum_{u\in N} \lambda_u \langle u|S u\rangle \leq \sum_{u\in N} |\lambda_u \langle u|S u\rangle| \end{aligned}$$
      $$\displaystyle \begin{aligned} \leq \sum_{u\in N} \lambda_u |\langle u|S u\rangle| \leq \sum_{u\in N}\lambda_u ||S|| = ||S|| ||T||{}_1\:. \end{aligned}$$

      Since W and U are partial isometries, ||S|| ≤ ||A||, proving that \(||AT||_1 \leq ||A||\: ||T||_1\).

    • (b)(ii) The proof of (a)(ii) established that \(|T^*| = U|T|U^*\). Making use of a Hilbert basis N whose elements belong either to \(Ker(U^*)\) or to \(Ker(U^*)^\perp\), we immediately have \(||T^*||_1 = \sum_{u\in N} \langle U^*u||T|U^*u\rangle = ||T||_1\).
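
The inequalities of Proposition 4.39 can be tested numerically on 2 × 2 complex matrices. The following is a minimal sketch under assumptions made only for illustration: the matrices A and T and all function names are hypothetical, and the singular values \(s_1 \geq s_2\) are obtained from \(s_1^2+s_2^2 = \sum_{ij}|M_{ij}|^2\) and \(s_1 s_2 = |\det M|\), so that the operator norm is \(s_1\) and the trace norm is \(s_1+s_2\).

```python
import math

def matmul(A, B):
    """Product of two 2x2 matrices given as nested tuples."""
    return tuple(
        tuple(sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2))
        for i in range(2)
    )

def add(A, B):
    """Entrywise sum of two 2x2 matrices."""
    return tuple(tuple(A[i][j] + B[i][j] for j in range(2)) for i in range(2))

def adjoint(A):
    """Conjugate transpose: (A^*)_{ij} is the complex conjugate of A_{ji}."""
    return tuple(
        tuple(complex(A[j][i]).conjugate() for j in range(2)) for i in range(2)
    )

def singular_values(M):
    """Singular values s1 >= s2 of a 2x2 matrix from its norm and determinant."""
    frob2 = sum(abs(M[i][j]) ** 2 for i in range(2) for j in range(2))
    det = abs(M[0][0] * M[1][1] - M[0][1] * M[1][0])
    gap = math.sqrt(max(frob2 ** 2 - 4 * det ** 2, 0.0))
    return math.sqrt((frob2 + gap) / 2), math.sqrt(max((frob2 - gap) / 2, 0.0))

def trace_norm(M):
    return sum(singular_values(M))

def op_norm(M):
    return singular_values(M)[0]

A = ((0, 2 - 1j), (1j, 4))
T = ((1 + 2j, 3), (0.5j, -1))

triangle_ok = trace_norm(add(A, T)) <= trace_norm(A) + trace_norm(T) + 1e-12
left_ideal_ok = trace_norm(matmul(A, T)) <= op_norm(A) * trace_norm(T) + 1e-12
right_ideal_ok = trace_norm(matmul(T, A)) <= op_norm(A) * trace_norm(T) + 1e-12
adjoint_gap = abs(trace_norm(adjoint(T)) - trace_norm(T))

print(triangle_ok, left_ideal_ok, right_ideal_ok, adjoint_gap)
```

The small tolerance only absorbs floating-point rounding; it plays no role in the mathematical statements.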

We are now in a position to introduce the central mathematical tool of this section, i.e. the notion of trace of a trace-class operator, listing and proving its main properties with direct interest to quantum physics.

Proposition 4.40

Let H be a Hilbert space and focus on the space of operators \({\mathfrak B}_1({\mathsf H})\). If \(N \subset {\mathsf H}\) is a Hilbert basis, the map

$$\displaystyle \begin{aligned} \begin{array}{rcl} {\mathfrak B}_1({\mathsf H}) \ni T \mapsto tr(T) := \sum_{u\in N} \langle u|Tu\rangle \:, {} \end{array} \end{aligned} $$
(4.19)

is well defined, the sum can be rearranged and does not depend on the choice of N.

The complex number tr(T) is called the trace of T and satisfies the following further properties.

  1. (a)

    tr(aA + bB) = a tr(A) + b tr(B) for every \(a,b\in {\mathbb C}\) and \(A, B \in {\mathfrak B}_1({\mathsf H})\).

  2. (b)

    \(tr(A^*)= \overline {tr(A)}\) for every \(A\in {\mathfrak B}_1({\mathsf H})\).

  3. (c)

    tr(AB) = tr(BA) if \(A\in {\mathfrak B}_1({\mathsf H})\) and \(B\in {\mathfrak B}({\mathsf H})\).

  4. (d)

    For every \(A\in {\mathfrak B}_1({\mathsf H})\) ,

    1. (i)

      \(|tr(A)| \leq tr(|A|) = ||A||_1\),

    2. (ii)

      \(||A|| \leq tr(|A|) = ||A||_1\).

  5. (e)

    If \(A^*=A \in {\mathfrak B}_1({\mathsf H})\) then

    $$\displaystyle \begin{aligned}tr(A) = \sum_{\lambda \in \sigma_p(A)} d_\lambda \lambda\end{aligned}$$

    where \(d_\lambda\) is the dimension of the λ-eigenspace and we assume +∞ ⋅ 0 = 0.

  6. (f)

    If \(U\in {\mathfrak B}({\mathsf H})\) is a bijective operator (in particular unitary), then \(tr(UAU^{-1}) = tr(A)\) for every \(A\in {\mathfrak B}_1({\mathsf H})\).

  7. (g)

    If A ≥ 0 and \(A\in {\mathfrak B}_1({\mathsf H})\) , then tr(A) ≥ 0.

Proof

First of all we notice that \(\sum_{u\in N} \langle u|Tu\rangle\) converges absolutely due to Proposition 4.38 (b), so that it can be rearranged. Let us prove that the sum is even independent of the basis N. Since T = A + iB with \(A=\frac {1}{2}(T+T^*)\) and \(B=\frac {1}{2i}(T-T^*)\), where A and B are selfadjoint and belong to \({\mathfrak B}_1({\mathsf H})\) because of Proposition 4.39 (a), it is enough to demonstrate the assertion for the case \(T = T^*\), simply exploiting the linearity of the trace ((a) below, whose proof does not depend on the present argument). If \(T^*=T\in {\mathfrak B}({\mathsf H})\), we can decompose it as \(T = T_+ - T_-\) where \(T_+ := \int _{[0,+\infty )} \imath dP^{(T)} = TP^{(T)}_{[0,+\infty )}\) and \(T_- := -\int _{(-\infty ,0)} \imath dP^{(T)} =-TP^{(T)}_{(-\infty ,0)}\). Since \(T\in {\mathfrak B}_1({\mathsf H})\), also \(T_\pm \in {\mathfrak B}_1({\mathsf H})\) due to (a) Proposition 4.39. Since \(T_\pm \geq 0\), exploiting again the linearity of the trace, to complete the proof it is sufficient to establish it in the case \(T^*=T\in {\mathfrak B}_{1}({\mathsf H})\) with T ≥ 0. In this case however T = |T| and therefore Proposition 4.38 (a) proves that \(tr(T) = \sum_{u\in N} \langle u|Tu\rangle = \sum_{u\in N} \langle u||T|u\rangle\) does not depend on N, concluding the proof.

(a) and (b) Observing that aA + bB, \(A^* \in {\mathfrak B}_1({\mathsf H})\) if \(A,B\in {\mathfrak B}_1({\mathsf H})\) due to Proposition 4.39 (a), the proofs of statements (a) and (b) immediately arise from elementary properties of inner products, using the fact that \(\langle u|(aA + bB)u\rangle = a\langle u|Au\rangle + b\langle u|Bu\rangle\) and \(\langle u|Au\rangle = \overline {\langle u|A^*u\rangle }\).

(c) It is sufficient to prove the statement with \(A^*=A\in {\mathfrak B}_1({\mathsf H})\) and \(B\in {\mathfrak B}({\mathsf H})\), since we can always decompose a generic \(A\in {\mathfrak B}_1({\mathsf H})\) into a linear combination of a pair of selfadjoint trace-class operators \(\frac {1}{2}(A+A^*)\) and \(\frac {1}{2i}(A-A^*)\), taking advantage of Proposition 4.39 (a) and finally exploiting the linearity of the trace map. So let us stick to \(A^*=A\in {\mathfrak B}_1({\mathsf H})\) and \(B\in {\mathfrak B}({\mathsf H})\). We know that \(AB, BA \in {\mathfrak B}_1({\mathsf H})\) by Proposition 4.39 (a). Moreover, we compute the traces with respect to a Hilbert basis obtained by completing the Hilbert basis of \(\overline {Ran(A)}\) made of eigenvectors of A according to Theorem 4.34 (e), noticing that \(A\in {\mathfrak B}_\infty ({\mathsf H})\) from Proposition 4.38 (c). Notice that the elements added to the initial basis do not give contribution to the trace as they belong to \(Ker(A) = Ker(A^*)\), so we can ignore them in the sums below.

$$\displaystyle \begin{aligned}tr(AB) = \sum_{n\in N} \langle u_n|ABu_n \rangle =\sum_{n\in N} \langle Au_n|Bu_n \rangle = \sum_{n\in N} \overline{\lambda_n} \langle u_n|Bu_n \rangle = \sum_{n\in N} \lambda_n \langle u_n|Bu_n \rangle\:,\end{aligned}$$

where we have used \(\sigma (A) \subset {\mathbb R}\) since \(A = A^*\). Similarly

$$\displaystyle \begin{aligned}tr(BA) = \sum_{n\in N} \langle u_n|BAu_n \rangle = \sum_{n\in N} \langle u_n|Bu_n \rangle \lambda_n = \sum_{n\in N} \lambda_n \langle u_n|Bu_n \rangle = tr(AB)\:.\end{aligned}$$

(d) First of all take advantage of the polar decomposition A = U|A|. Here |A| is compact due to Proposition 4.38 (c). Since |A| is positive, hence selfadjoint in view of (3) in Exercise 2.43, there is a Hilbert basis N of eigenvectors of |A| obtained by completing that in Theorem 4.34 (e). We have

$$\displaystyle \begin{aligned}|tr(A)| = \left| \sum_{u\in N}\langle u|U |A| u \rangle\right|= \left| \sum_{u\in N}\langle u|U u \rangle \lambda_u\right| \leq \sum_{u\in N} |\lambda_u|\:|\langle u|U u \rangle|\:.\end{aligned}$$

Next observe that \(|\lambda_u| = \lambda_u\) because |A| ≥ 0 and \(|\langle u|Uu\rangle| \leq ||u||\:||Uu|| \leq 1\cdot||Uu|| \leq ||u|| = 1\) (||U|| ≤ 1 since it is a partial isometry). Hence

$$\displaystyle \begin{aligned}|tr(A)| \leq \sum_{u\in N} \lambda_u = \sum_{u\in N} \langle u ||A| u\rangle = tr |A| = ||A||{}_1\:.\end{aligned}$$

The second statement is obvious. Since \(A \in {\mathfrak B}_\infty ({\mathsf H})\) (Proposition 4.38 (c)), there is \(\lambda \in \sigma_p(A)\) such that |λ| = ||A|| because of Theorem 4.34 (c). On the other hand from (e), whose proof is independent of this argument, \(||A||_1 \geq |\lambda| = ||A||\).

(e) Since \(A^*=A\in {\mathfrak B}_\infty ({\mathsf H})\), there is a Hilbert basis of eigenvectors of A obtained by completing that in Theorem 4.34 (e). Computing the trace using this basis, taking Theorem 4.34 (d) into account, we immediately have the thesis.

(f) Exploiting (c), we immediately have \(tr(UAU^{-1}) = tr((UA)U^{-1}) = tr(U^{-1}UA) = tr(A)\).

(g) The proof is evident from the definition of trace. □

Remark 4.41

It is easy to prove that (c) can be generalized to

$$\displaystyle \begin{aligned}tr(T_1\cdots T_n) = tr(T_{\pi(1)} \cdots T_{\pi(n)})\end{aligned}$$

if at least one of the \(T_k\) belongs to \({\mathfrak B}_1({\mathsf H})\), the remaining ones are in \({\mathfrak B}({\mathsf H})\), and

$$\displaystyle \begin{aligned}\pi : \{1,\ldots, n\}\to \{1,\ldots, n\}\end{aligned}$$

is a cyclic permutation. The elementary proof arises by decomposing π in a product of 2-cycles and finally using (c) recursively, redefining A and B appearing in (c) at every action of the elementary cyclic permutations. The formula is recalled by saying that the trace is cyclic. \(\blacksquare \)
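As a quick sanity check of (c) and of the cyclic property recalled in this remark, the following minimal sketch compares traces of products of three 2×2 complex matrices (in finite dimension every operator is of trace class). The matrices `T1`, `T2`, `T3` and the helpers `matmul`, `trace` and `trace_of_product` are illustrative choices made here, not objects taken from the text.

```python
def matmul(A, B):
    """Product of two matrices stored as nested lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]


def trace(A):
    """Sum of the diagonal entries of a square matrix."""
    return sum(A[i][i] for i in range(len(A)))


def trace_of_product(*factors):
    """Trace of the ordered product of the given matrices."""
    product = factors[0]
    for factor in factors[1:]:
        product = matmul(product, factor)
    return trace(product)


# Arbitrary (hypothetical) test matrices on C^2.
T1 = [[1, 2j], [0, 3]]
T2 = [[0, 1], [1, 1j]]
T3 = [[2, 0], [1j, -1]]

# Cyclic permutations of the factors give the same trace ...
t123 = trace_of_product(T1, T2, T3)
t231 = trace_of_product(T2, T3, T1)
t312 = trace_of_product(T3, T1, T2)

# ... whereas a non-cyclic rearrangement in general does not.
t213 = trace_of_product(T2, T1, T3)
```

The last line illustrates why the hypothesis that π be cyclic cannot be dropped: for these matrices the value `t213` differs from the common value of the other three.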

Example 4.42

Consider the Hamiltonian operator H of the harmonic oscillator discussed in (3) Example 3.43. Then \(H^{-2} \in {\mathfrak B}_1({\mathsf H})\). The proof is easy: since 0 ∉ σ(H), it must be \(H^{-2} = R_0(H^2)\) (the resolvent operator for λ = 0), hence \(H^{-2}\in {\mathfrak B}({\mathsf H})\). Moreover \(H^{-2} \geq 0\) because its spectrum is positive:

$$\displaystyle \begin{aligned}\sigma(H^{-2}) =\overline{\left\{ \left.\frac{1}{\hbar^2 \omega^2(n+1/2)^2}\:\right| n=0,1,2,\ldots\right\}}\:.\end{aligned}$$

Finally, computing \(||H^{-2}||_1\) using the Hilbert basis of eigenvectors of H, we have

$$\displaystyle \begin{aligned}||H^{-2}||{}_1 = \sum_{n=0}^{+\infty}\frac{1}{\hbar^2 \omega^2(n+1/2)^2} <+\infty\:.\end{aligned}$$

The same result actually holds true for \(H^{-\alpha}\) with α > 1.
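The series in this example can be evaluated explicitly. A minimal numerical sketch follows, assuming units in which ħω = 1 (an assumption made here only for illustration) and using the standard identity \(\sum_{n\geq 0}(n+1/2)^{-2} = \pi^2/2\), which follows from \(\sum_{n\geq 0}(2n+1)^{-2} = \pi^2/8\). The function name `trace_norm_partial` is hypothetical.

```python
import math


def trace_norm_partial(n_terms, hbar_omega=1.0):
    """Partial sum of the eigenvalues of H^{-2} for the harmonic oscillator.

    The n-th eigenvalue of H is hbar_omega * (n + 1/2), so the n-th
    eigenvalue of H^{-2} is 1 / (hbar_omega * (n + 1/2))**2.
    """
    return sum(1.0 / (hbar_omega * (n + 0.5)) ** 2 for n in range(n_terms))


# Closed form of the full series when hbar_omega = 1.
closed_form = math.pi ** 2 / 2

# The partial sums increase towards the closed form; the tail after
# N terms is of order 1/N.
partial = trace_norm_partial(100_000)
```

For a general value of ħω the result is simply rescaled by \((\hbar\omega)^{-2}\).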

4.4.5 The Mathematical Notion of Quantum State and Gleason’s Theorem

We have constructed all the mathematical machinery to pursue the description of quantum states in terms of probability measures on \(\mathcal {L}({\mathsf H})\) as discussed in Sect. 4.4.1. According to the discussion in that section, we can give the following general definition.

Definition 4.43

Let H be a Hilbert space. A quantum probability measure on H is a map \(\rho : {\mathcal {L}}({\mathsf H}) \to [0,1]\) such that the following requirements are satisfied.

  1. (1)

    ρ(I) = 1 .

  2. (2)

    If \(\{P_n\}_{n \in {\mathbb N}}\subset {\mathcal {L}}({\mathsf H})\) satisfies \(P_kP_h = 0\) when h ≠ k for \(h,k \in {\mathbb N}\), then

    $$\displaystyle \begin{aligned} \begin{array}{rcl} \rho\left( \mbox{s-}\sum_{n\in {\mathbb N}}P_n\right) = \sum_{n \in {\mathbb N}} \rho(P_n)\:.{}\end{array} \end{aligned} $$
    (4.20)

The convex set of quantum probability measures in H will be denoted by \(\mathcal {M}({\mathsf H})\). \(\blacksquare \)

The last statement refers to the evident fact that \(\lambda \rho _1 +(1-\lambda ) \rho _2 \in \mathcal {M}({\mathsf H})\) if λ ∈ [0, 1] and \(\rho _1,\rho _2 \in \mathcal {M}({\mathsf H})\). This result extends trivially to a finite convex combination

$$\displaystyle \begin{aligned}\rho = \sum_{k=1}^{n} p_{k} \rho_{k}\:,\end{aligned}$$

where \(p_k \in [0,1]\) and \(\sum _{k=1}^n p_k =1\), which defines an element of \(\mathcal {M}({\mathsf H})\) if all \(\rho _k\in \mathcal {M}({\mathsf H})\).

Remark 4.44

We stress that in these notes the term quantum state corresponds to the mathematical notion of quantum probability measure. We prefer to explicitly use the latter in mathematical statements because the former is used ambiguously in physics, where quantum states are confused with quantum state operators, that we will introduce shortly. This confusion is usually harmless, but becomes significant when dealing with superselection rules, see later. \(\blacksquare \)

As already observed in Sect. 4.4.1, unit vectors ψ ∈H define, up to phase, quantum probability measures by \(\rho_\psi(P) := \langle \psi|P\psi\rangle\) for every \(P\in \mathcal {L}({\mathsf H})\). This is not the only case, since finite convex combinations of quantum probability measures are quantum probability measures as well, as just said. Suppose in particular that \(\langle \psi_k|\psi_h\rangle = \delta_{hk}\) and consider the finite convex combination

$$\displaystyle \begin{aligned}\rho = \sum_{k=1}^{n} p_{k} \rho_{\psi_k}\:,\end{aligned}$$

where \(p_k \in [0,1]\) and \(\sum _{k=1}^n p_k =1\). By direct inspection, completing the finite orthonormal system \(\{\psi_k\}_{k=1,\ldots,n}\) to a full Hilbert basis of H, one quickly proves that, defining

$$\displaystyle \begin{aligned} \begin{array}{rcl} T = \sum_{k=1}^{n} p_{k} \langle \psi_k|\:\: \rangle \psi_k{}\:,\end{array} \end{aligned} $$
(4.21)

ρ(P) can be computed as

$$\displaystyle \begin{aligned}\rho(P) = tr(T P)\:, \quad P \in {\mathcal{L}}({\mathsf H})\:.\end{aligned}$$

In particular, it turns out that T is in \({\mathfrak B}_1({\mathsf H})\), it satisfies T ≥ 0 (so it is selfadjoint due to (3) in Exercise 2.43) and tr(T) = 1. As a matter of fact, (4.21) is just the spectral decomposition of T, whose spectrum is \(\{p_k\}_{k=1,\ldots,n}\) (together with 0 when the vectors \(\psi_k\) do not span H). This result is general.

Proposition 4.45

Let H be a Hilbert space and define the convex subset of \({\mathfrak B}_1({\mathsf H})\) of quantum state operators

$$\displaystyle \begin{aligned}{\mathcal{S}}({\mathsf H}) := \{T \in {\mathfrak B}_1({\mathsf H})\:|\: T\geq 0\:, tr(T) =1\}\:.\end{aligned}$$

If \(T\in {\mathcal {S}}({\mathsf H})\) , the map

$$\displaystyle \begin{aligned}\rho_T : {\mathcal{L}}({\mathsf H}) \ni P \mapsto tr(T P) = tr(PT) \end{aligned}$$

is well defined and \(\rho _T \in \mathcal {M}({\mathsf H})\).

Proof

Observe that tr(TP) = tr(PT) is valid in view of Proposition 4.40 (c). The trace-class operator T is positive, hence selfadjoint, so the eigenvalues λ belong to [0, +∞). Furthermore, according to Proposition 4.40 (e), \(1= tr(T) = \sum _{\lambda \in \sigma _p(T)} d_\lambda \lambda \) and thus λ ∈ [0, 1]. Exploiting in particular Theorem 4.34 (e), since \(T\in {\mathfrak B}_\infty ({\mathsf H})\) by Proposition 4.38,

$$\displaystyle \begin{aligned} tr(TP) = \sum_{n\in M} \langle u_n| TPu_n\rangle = \sum_{n\in M} \lambda_n \langle u_n| Pu_n\rangle \leq \sum_{n\in M} \lambda_n || u_n|| ||Pu_n|| \leq \sum_{n\in M} \lambda_n= 1\:,\end{aligned}$$

where \(M\subset {\mathbb N}\) and \(\{u_n\}_{n\in M}\) is a Hilbert basis of \(\overline {Ran(T)}\), which can be completed to a Hilbert basis of H by adding a Hilbert basis of \(Ker(T) = Ran(T)^\perp\); these added vectors, however, do not contribute to the traces, as the reader immediately proves. On the other hand, since T ≥ 0,

$$\displaystyle \begin{aligned}0 \leq \sum_{n\in M}\langle Pu_n |TPu_n\rangle = tr(PTP)= tr(TPP)= tr(TP)\end{aligned}$$

and, trivially, tr(IT) = tr(T) = 1. Let us prove that the map \(\mathcal {L}({\mathsf H})\ni P \mapsto tr(PT)\) is σ-additive to conclude that it fulfils Definition 4.43. If \(\{P_n\}_{n\in {\mathbb N}} \subset \mathcal {L}({\mathsf H})\) satisfies \(P_nP_m = 0\) for n ≠ m, taking advantage of a Hilbert basis of H completing the Hilbert basis of \(\overline {Ran(T)}\) made of eigenvectors of T as said in Theorem 4.34 (e),

$$\displaystyle \begin{aligned}tr\left(T\mbox{s-}\sum_{n\in {\mathbb N}} P_n \right) = \sum_{l\in M} \left\langle u_l\left| T\sum_{n\in {\mathbb N}} P_nu_l \right. \right\rangle = \sum_{l\in M} \sum_{n\in {\mathbb N}} \left\langle u_l\left| TP_nu_l \right. \right\rangle = \sum_{l\in M} \sum_{n\in {\mathbb N}} \lambda_l \left\langle u_l\left|P_nu_l \right. \right\rangle\:.\end{aligned}$$

In other words, since \(\langle u_l|P_nu_l\rangle = \langle u_l|P_nP_nu_l\rangle = \langle P_nu_l|P_nu_l\rangle\),

$$\displaystyle \begin{aligned}tr\left(T\mbox{s-}\sum_{n\in {\mathbb N}} P_n \right) = \sum_{l\in M} \sum_{n\in {\mathbb N}} \lambda_l ||P_nu_l||{}^2 \:. \end{aligned}$$

Applying the Fubini-Tonelli theorem, since \(\lambda_l||P_nu_l||^2 \geq 0\), the sums can be exchanged:

$$\displaystyle \begin{aligned}tr\left(T\mbox{s-}\sum_{n\in {\mathbb N}} P_n \right) = \sum_{n\in {\mathbb N}} \sum_{l\in M} \lambda_l ||P_nu_l||{}^2 = \sum_{n\in {\mathbb N}} tr(TP_n)\:,\end{aligned}$$

proving σ-additivity. □
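The construction (4.21) and the map \(\rho_T\) of Proposition 4.45 can be illustrated in finite dimension. The sketch below works in \({\mathbb C}^3\) with two orthonormal vectors and weights chosen arbitrarily for the example; the helper names (`outer`, `matmul`, `trace`, `expectation`) are hypothetical. It compares tr(TP) with the convex combination \(\sum_k p_k\langle \psi_k|P\psi_k\rangle\).

```python
import math


def outer(psi):
    """Matrix of the rank-one operator <psi| . >psi: entries psi_i * conj(psi_j)."""
    return [[a * complex(b).conjugate() for b in psi] for a in psi]


def matmul(A, B):
    """Product of two matrices stored as nested lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]


def trace(A):
    """Sum of the diagonal entries of a square matrix."""
    return sum(A[i][i] for i in range(len(A)))


def expectation(psi, P):
    """Inner product <psi| P psi>, antilinear in the left entry."""
    P_psi = [sum(P[i][j] * psi[j] for j in range(len(psi))) for i in range(len(P))]
    return sum(complex(a).conjugate() * b for a, b in zip(psi, P_psi))


s = 1 / math.sqrt(2)
psis = [[1.0, 0.0, 0.0], [0.0, s, s]]  # orthonormal vectors psi_1, psi_2
weights = [0.25, 0.75]                 # p_1, p_2 with p_1 + p_2 = 1

# T = sum_k p_k <psi_k| . >psi_k, as in (4.21).
projs = [outer(psi) for psi in psis]
T = [[sum(p * Pk[i][j] for p, Pk in zip(weights, projs)) for j in range(3)]
     for i in range(3)]

# An elementary observable: the projector onto the span of (1, 1, 0)/sqrt(2).
P = outer([s, s, 0.0])

rho_trace = trace(matmul(T, P)).real
rho_convex = sum(p * expectation(psi, P).real for p, psi in zip(weights, psis))
```

Both numbers agree and lie in [0, 1], as the proposition predicts for this particular choice of data.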

Remark 4.46

Actually, with a little change, remembering that \(\mathcal {L}({\mathsf H})\) is complete and not only σ-complete and that the notion of trace does not need the Hilbert space’s separability, the proof can be extended to prove that \(\mathcal {L}({\mathsf H}) \ni P \mapsto tr(TP)\) is completely additive. In other words, if \(\{P_k\}_{k\in K} \subset \mathcal {L}({\mathsf H})\) is such that \(P_kP_h = 0\) if k ≠ h and K has any cardinality (when H is not separable), then

$$\displaystyle \begin{aligned}tr\left(T (\vee_{k\in K} P_k) \right) = \sum_{k\in K} tr(TP_k)\:,\end{aligned}$$

where the sum is understood as the supremum of the sums over finite subsets \(K_0 \subset K\).

The very remarkable fact is that the measures \(\rho_T\) defined by these operators exhaust \(\mathcal {M}({\mathsf H})\) if H is separable with dimension ≠ 2, as established by Gleason in 1957; his celebrated theorem [Gle57] will be adapted to these lectures (see [Ham03] and [Dvu92] for general treatises on the subject).

Theorem 4.47 (Gleason’s Theorem)

Let H be a Hilbert space of finite dimension ≠ 2, or infinite-dimensional and separable. The set of quantum probability measures \(\rho \in \mathcal {M}({\mathsf H})\) is in one-to-one correspondence with the set of quantum-state operators \(T\in {\mathcal {S}}({\mathsf H})\) . The bijection is such that

$$\displaystyle \begin{aligned}tr(T P) = \rho(P)\quad \mathit{\mbox{ for every}}\ P \in {\mathcal{L}}({\mathsf H}),\end{aligned}$$

and preserves the convex structures of the two sets. Finally, quantum probability measures separate elements in \(\mathcal {L}({\mathsf H})\) because quantum-state operators do so.

Comments on the Proof

The only very hard part of Gleason’s theorem is the existence claim, and we will not try to address it here (see [Dvu92, Ham03]). The remaining statements are quite easy. It is evident by the trace’s linearity that the convex structures are preserved. The T associated to ρ is unique for the following elementary reason. Any other T′ of trace class such that ρ(P) = tr(T′P) for any \(P\in {\mathcal {L}}({\mathsf H})\) must also satisfy 〈x|(T − T′)x〉 = 0 for any x ∈H. If x = 0 this is clear, while if x ≠ 0 we may complete the vector x∕||x|| to a basis, in which \(tr((T-T')P_x) = 0\) reads \(||x||^{-2}\langle x|(T-T')x\rangle = 0\), where \(P_x\) is the projector onto span{x}. By (3) in Exercise 2.43, we obtain T − T′ = 0.Footnote 1 The fact that quantum-state operators separate the elements of \(\mathcal {L}({\mathsf H})\) is quite obvious since, if tr(TP) = tr(TP′) for all \(T\in {\mathcal {S}}({\mathsf H})\), we have in particular 〈x|Px〉 = 〈x|P′x〉, where we have chosen T = 〈x|⋅〉x for every x ∈H with ||x|| = 1. As before, this implies that P = P′.

Remark 4.48

  1. (a)

    Imposing \(\dim {\mathsf H} \neq 2 \) is mandatory, due to a well-known counterexample. Identifying H with \({\mathbb C}^2\), one-dimensional projectors \(P_{\mathbf {n}}\) correspond one-to-one with unit vectors \(\mathbf {n} =(n_1,n_2,n_3)^t \in {\mathbb R}^3\) by means of \(P_{\mathbf {n}} = \frac {1}{2}\left (I + \sum _{j=1}^3 n_j \sigma _j\right )\), where \(\sigma_j\) are the standard Pauli matrices. Observe that we have \(P_{\mathbf {n}} \perp P_{\mathbf {n}'}\) if and only if \(\mathbf {n} = -\mathbf {n}'\). If \(\mathbf {m} \in {\mathbb R}^3\) is a fixed unit vector, the map \(\rho (P_{\mathbf {n}}):= \frac {1}{2}\left (1 + \sum _{j=1}^3 (n_j m_j)^3\right )\) uniquely extends to a quantum probability measure on \(\mathcal {L}({\mathbb C}^2)\) by additivity, as the reader immediately proves. However, there is no T as in Gleason’s theorem such that \(tr(TP_{\mathbf {n}}) = \rho (P_{\mathbf {n}})\) for every one-dimensional orthogonal projector \(P_{\mathbf {n}}\). This is because imposing this formula leads to \(\sum _{j=1}^3 n_jT_j = \sum _{j=1}^3 n^3_jm_j^3\) for a fixed unit vector \(\mathbf {m} := (m_1, m_2, m_3)^t\) and all unit vectors \(\mathbf {n}\). It is easy to prove that this is impossible for every choice of the constants \(T_j = tr(T\sigma_j)\).

  2. (b)

    Particles with spin 1∕2, like electrons, admit a Hilbert space – in which the observable spin is defined – of dimension 2. The same occurs to the Hilbert space on which the polarisation of light is described (cf. helicity of photons). When these systems are described in full, however, for instance when including degrees of freedom relative to position or momentum, they are representable on a separable Hilbert space of infinite dimension.

  3. (c)

    Gleason’s theorem extends to real and quaternionic Hilbert spaces in accordance to Solér’s theorem, to formulate quantum theories. However this extension is technically complicated especially in the second case, and it involves subtle problems related with the notion of trace. These have been fixed [MoOp18] only recently. \(\blacksquare \)
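The two-dimensional counterexample of item (a) can be probed numerically. The sketch below fixes the unit vector m = (1, 0, 0), an arbitrary choice made for illustration, and uses hypothetical names throughout. It first checks additivity on the orthogonal pair \(P_{\mathbf n}\), \(P_{-\mathbf n}\), then reads off the only candidate constants \(T_j\) from the three coordinate axes and shows that the resulting linear formula \(\frac{1}{2}(1+\sum_j n_jT_j)\) fails on a diagonal direction.

```python
import math


def rho(n, m):
    """The measure of item (a) evaluated on the projector P_n."""
    return 0.5 * (1.0 + sum((nj * mj) ** 3 for nj, mj in zip(n, m)))


m = (1.0, 0.0, 0.0)  # fixed unit vector (illustrative choice)

# Additivity on an orthogonal pair: P_n + P_{-n} = I, so the values sum to 1.
n = (0.6, 0.0, 0.8)
minus_n = tuple(-c for c in n)
total = rho(n, m) + rho(minus_n, m)

# If rho(P_n) = tr(T P_n) = (1 + sum_j n_j T_j) / 2 held for all unit n,
# evaluating on the coordinate axes would force T_j = 2 * rho(P_{e_j}) - 1.
axes = [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)]
T_candidates = [2.0 * rho(e, m) - 1.0 for e in axes]

# Compare the forced linear formula with rho on a diagonal direction.
s = 1 / math.sqrt(2)
d = (s, s, 0.0)
linear_prediction = 0.5 * (1.0 + sum(dj * Tj for dj, Tj in zip(d, T_candidates)))
actual = rho(d, m)
mismatch = abs(linear_prediction - actual)
```

The non-zero `mismatch` reflects the fact that \(\mathbf n \mapsto \sum_j n_j^3m_j^3\) is not the restriction of a linear functional to the unit sphere.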

Gleason’s characterization of quantum states has an important consequence, discussed explicitly by Bell in 1966 [Bel66] (but already known to Specker in 1960). It proves that there are no sharp states in QM, i.e. probability measures assigning 1 to some elementary observables and 0 to the remaining ones, contrary to what happens in CM. In this sense QM is intrinsically probabilistic: unlike CM, it does not admit sharp measures. We state Bell’s theorem below and prove it through a procedure different from, but mathematically equivalent to, Bell’s original argument.

Theorem 4.49 (Bell’s Theorem)

Let H be a Hilbert space of finite dimension > 2, or infinite-dimensional and separable. There is no quantum probability measure \(\rho : {\mathcal {L}}({\mathsf H}) \to [0,1]\) , in the sense of Definition 4.43 , such that \(\rho ({\mathcal {L}}({\mathsf H})) = \{0,1\}\).

Proof

Define \({\mathbb S} := \{x\in {\mathsf H} \:|\: ||x||=1\}\) endowed with the topology induced by H, and let \(T\in {\mathfrak B}_1({\mathsf H})\) be the representative of ρ using Gleason’s theorem. The map \(f_\rho : {\mathbb S} \ni x \mapsto \langle x|Tx\rangle = \rho (\langle x| \:\: \rangle x) \in {\mathbb C}\) is continuous because T is bounded. We have \(f_\rho ({\mathbb S}) \subset \{0,1\}\), where {0, 1} is equipped with the topology induced by \({\mathbb C}\). Since \({\mathbb S}\) is connected (because path-connected, as the reader can prove easily) its image must be connected, too. So either \(f_\rho ({\mathbb S})=\{0\}\) or \(f_\rho ({\mathbb S})=\{1\}\). In the first case T = 0 which is impossible because tr(T) = 1, in the second case tr(T) > 2 which is similarly impossible. □

This negative result can be strengthened physically, or so it seems, by the Kochen-Specker theorem (Theorem 5.5) we shall discuss shortly. It produces no-go theorems within certain attempts to explain QM in terms of CM based on so-called hidden variables. Actually Theorem 4.49 has the same physical content as the Kochen-Specker theorem and can be applied to more general situations. We will also prove an alternative form in Theorem 5.2 below.

Remark 4.50

In view of Proposition 4.45 and Theorem 4.47, when dealing with Hilbert spaces with physical meaning, we could assume that H has finite dimension or is separable so that we automatically identify the set of σ-additive quantum probability measures \(\mathcal {M}({\mathsf H})\) with the set of quantum states \({\mathcal {S}}({\mathsf H})\). (We can simply disregard the quantum measures in a two-dimensional H which are not represented by elements of \({\mathcal {S}}({\mathsf H})\), especially taking (b) Remark 4.48 into account.) However, as most of the subsequent propositions are valid for the elements of \({\mathcal {S}}({\mathsf H})\) even if H does not fulfil Gleason’s hypotheses, we will always deal with the class of quantum-state operators \({\mathcal {S}}({\mathsf H})\) without restrictions on H. When H is not separable, the elements of \({\mathcal {S}}({\mathsf H})\) define completely additive (see Remark 4.46) probability measures on \(\mathcal {L}({\mathsf H})\) which satisfy a stronger requirement than σ-additivity, and define a proper subset of \(\mathcal {M}({\mathsf H})\) [Dvu92, Ham03]. (If H is separable, the two notions of additivity coincide.) It is possible to reformulate Gleason’s theorem for Hilbert spaces of dimension ≠ 2 (separable or not), proving that completely additive probability measures correspond one-to-one with unit-trace, positive trace-class operators [Dvu92, Ham03]. \(\blacksquare \)

We are in a position to state some definitions of interest to physicists, especially the distinction between pure and mixed states, so we proceed to analyze the structure of the space of the quantum-state operators. We remind the reader that, if C is a convex set in a vector space, e ∈ C is called extremal if it cannot be written as e = λx + (1 − λ)y, with λ ∈ (0, 1), x, y ∈ C ∖{e}. We have the following simple result.

Proposition 4.51

Let H be a Hilbert space. Then

  1. (a)

    The extremal points of the convex set \({\mathcal {S}}({\mathsf H})\) are those of the form: \(\rho_\psi := \langle \psi|\:\:\rangle \psi\) for every vector ψ ∈H with ||ψ|| = 1. (This sets up a bijection between extremal-state operators and elements of the complex projective space \(P{\mathsf H}\).)

    Under the hypotheses of Gleason’s theorem, the extremal points of \({\mathcal {S}}({\mathsf H})\) are in one-to-one correspondence with the extremal points of \(\mathcal {M}({\mathsf H})\).

  2. (b)

    Any quantum state operator \(T \in {\mathcal {S}}({\mathsf H})\) is a linear combination of extremal quantum-state operators, including infinite combinations in the strong operator topology. In particular there is always a decomposition

    $$\displaystyle \begin{aligned}T = \sum_{u\in M} p_u\langle u| \:\: \rangle u\:,\end{aligned}$$

    where M is a Hilbert basis of H made of eigenvectors of T, \(p_u \in [0,1]\) for any u ∈ M, and

    $$\displaystyle \begin{aligned}\sum_{u\in M}p_u=1\:.\end{aligned}$$

Proof

We start by proving (b). The expansion is a trivial consequence of Theorem 4.34 (e), since trace-class operators are compact because of (c) Proposition 4.38. Next observe that T is positive, hence selfadjoint, so that its eigenvalues \(p_u\) belong to [0, +∞). M is obtained by completing the Hilbert basis of \(\overline {Ran(T)}\) by adding a Hilbert basis of Ker(T). Furthermore, according to Proposition 4.40 (e), \(1 = tr(T) = \sum_{u\in M} p_u\) and hence also \(p_u \in [0,1]\).

(a) Consider \(T\in {\mathcal {S}}({\mathsf H})\) and refer to the expansion used in the proof of (b), \(T = \sum_{u\in M} p_u\langle u|\:\:\rangle u\). If T is not a one-dimensional orthogonal projector there are at least two different \(u_1\) and \(u_2\) with \(p_{u_1}>0\) and \(1-p_{u_1} \ge p_{u_2} >0\). As a consequence, T decomposes as a convex combination \(T= p_{u_1} T_1 + (1- p_{u_1}) T_2\) for

$$\displaystyle \begin{aligned} T_1 = \langle u_1| \:\: \rangle u_1 \quad \mbox{and} \quad T_2 := \sum_{u\neq u_1} \frac{p_u}{1- p_{u_1}}\langle u| \:\: \rangle u\:. \end{aligned}$$

Notice that (i) \(T_1 \neq T_2\), (ii) \(T_1, T_2 \neq 0\), (iii) \(T_1,T_2 \in {\mathfrak B}_1({\mathsf H})\) by construction, (iv) they are selfadjoint, (v) \(T_1, T_2 \geq 0\) and (vi) \(tr(T_1) = tr(T_2) = 1\), so \(T_1\) and \(T_2\) belong to \({\mathcal {S}}({\mathsf H})\). We conclude that T cannot be extremal. To complete the proof, let us prove that P = 〈ψ| 〉ψ, with ||ψ|| = 1, does not admit non-trivial convex decompositions. Suppose that

$$\displaystyle \begin{aligned}P = \lambda T_1 + (1-\lambda)T_2 \quad \mbox{for}\ \lambda \in (0,1)\ \mbox{and}\ T_1,T_2 \in {\mathcal{S}}({\mathsf H}). \end{aligned}$$

We want to prove that \(T_1 = T_2 = P\). As a consequence of the hypothesis, if \(P^\perp = I - P\),

$$\displaystyle \begin{aligned}0 = P^\perp P = \lambda P^\perp T_1 + (1-\lambda)P^\perp T_2\:,\end{aligned}$$

so that

$$\displaystyle \begin{aligned}0 = \lambda tr(P^\perp T_1) + (1-\lambda)tr(P^\perp T_2) = \lambda tr(P^\perp T_1P^\perp ) + (1-\lambda)tr(P^\perp T_2P^\perp)\:.\end{aligned}$$

Since λ, (1 − λ) > 0 and both \(P^\perp T_jP^\perp \geq 0\) for j = 1, 2, it must be \(tr(P^\perp T_1P^\perp) = tr(P^\perp T_2P^\perp) = 0\). Since \(T_j \geq 0\), if N is a Hilbert basis of \(P^\perp({\mathsf H}) = P({\mathsf H})^\perp\), the said conditions can be rephrased as \(\sum _{u\in N} ||\sqrt {T_j}u||{ }^2=0\), so that \(T_jP^\perp = 0\) and, taking the adjoint, \(P^\perp T_j = 0\) because \(T_j=T_j^*\). Decomposing \(T_j = PT_jP + P^\perp T_jP + PT_jP^\perp + P^\perp T_jP^\perp\), we conclude that

$$\displaystyle \begin{aligned}T_j = PT_jP = t_j\langle \psi|\:\:\rangle \psi\end{aligned}$$

for some \(t_j \in {\mathbb C}\) and j = 1, 2. The condition \(tr(T_j) = 1\) fixes \(t_j = 1\). □

Exercise 4.52

Consider \(T \in {\mathcal {S}}({\mathsf H})\). Prove that

  1. (i)

    \(T^2 \leq T\) (i.e. \(\langle x|T^2x\rangle \leq \langle x|Tx\rangle\) for all x ∈H);

  2. (ii)

    T is extremal if and only if \(T^2 = T\).

Solution

By decomposition of T along the Hilbert basis of eigenvectors of \(T\in {\mathcal {S}}({\mathsf H})\), we have \(T^2 = \sum _{u\in N} p_u^2 \langle u| \:\:\rangle u\). Since \(p_u \in [0,1]\), it follows that \(0\leq p^2_u\leq p_u\) so that \(\langle x|T^2x\rangle \leq \langle x|Tx\rangle\) for all x ∈H. Since \(tr(T^2)=\sum _{u\in N} p^2_u\), if \(T^2 = T\) is valid, so that \(\sum _{u\in N} (p^2_u-p_u)=0\) with \(p^2_u-p_u\leq 0\), we conclude that \(p_u=p_u^2\) for all u, so that \(p_u = 0\) or \(p_u = 1\). Since \(\sum_{u\in N} p_u = 1\), this is possible only if all \(p_u\) vanish but one, which takes the value 1. In other words T = 〈u| 〉u, which is extremal by Proposition 4.51 (a). Conversely, if T is extremal then T = 〈u| 〉u for some unit vector u, and evidently \(T^2 = T\).
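The criterion of this exercise is easy to test on 2×2 examples. The following sketch, with hypothetical helper names and two arbitrarily chosen state operators on \({\mathbb C}^2\), computes \(tr(T^2)\) for a projector onto the span of (1, 1)∕√2 and for the operator I∕2.

```python
def matmul(A, B):
    """Product of two matrices stored as nested lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]


def trace(A):
    """Sum of the diagonal entries of a square matrix."""
    return sum(A[i][i] for i in range(len(A)))


def trace_of_square(T):
    """tr(T^2), which equals the sum of the squared eigenvalues p_u^2."""
    return trace(matmul(T, T))


# Extremal example: the projector <u| . >u with u = (1, 1)/sqrt(2).
pure = [[0.5, 0.5], [0.5, 0.5]]

# Non-extremal example: the operator I/2.
mixed = [[0.5, 0.0], [0.0, 0.5]]

is_idempotent_pure = matmul(pure, pure) == pure     # T^2 = T
is_idempotent_mixed = matmul(mixed, mixed) == mixed  # T^2 != T
square_trace_pure = trace_of_square(pure)
square_trace_mixed = trace_of_square(mixed)
```

In the second case \(tr(T^2) < tr(T) = 1\), consistently with \(T^2 \leq T\) and \(T^2 \neq T\).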

Exercise 4.53

Prove that the quantum probability measure \(\rho : \mathcal {L}({\mathsf H}) \to [0,1]\) associated to \(T\in {\mathcal {S}}({\mathsf H})\) according to Proposition 4.45 satisfies the so-called Jauch-Piron property: if ρ(P) = ρ(Q) = 0 is true for \(P,Q \in \mathcal {L}({\mathsf H})\), then ρ(P ∨ Q) = 0.

Solution

tr(TP) = 0 can be rewritten as \(\sum _{u\in N}||\sqrt {T}Pu||{ }^2=0\) for every Hilbert basis N ⊂H. Choose N by completing a Hilbert basis of P(H): the formula entails \(\sqrt {T}x=0\) if x belongs to that basis of P(H), and hence for every x ∈ P(H) in view of the linearity and continuity of \(\sqrt {T}\). As a consequence \(Tx=\sqrt {T} \sqrt {T} x= 0\) for x ∈ P(H). The same result is true when replacing P by Q. Every vector in (P ∨ Q)(H) is the limit of linear combinations of vectors in P(H) and Q(H). Hence Tx = 0 if x ∈ (P ∨ Q)(H) by the linearity and continuity of T. Computing tr(T(P ∨ Q)) using a Hilbert basis which completes a Hilbert basis of (P ∨ Q)(H) by adding a Hilbert basis of \(((P\vee Q)({\mathsf H}))^\perp\), we immediately find tr(T(P ∨ Q)) = 0, namely ρ(P ∨ Q) = 0.

4.4.6 Physical Interpretation

Proposition 4.51 allows us to introduce some notions and terminology relevant in physics.

  1. (a)

    First of all, extremal elements in \({\mathcal {S}}({\mathsf H})\) are usually said to describe pure states by physicists. We shall denote their set by \({\mathcal {S}}_p({\mathsf H})\).

  2. (b)

    Non-extremal quantum state operators are called statistical operators or also density matrices. They are said to describe mixed states, mixtures or non-pure states.

  3. (c)

    If

    $$\displaystyle \begin{aligned}\psi = \sum_{i\in I}a_i\phi_i\:,\end{aligned}$$

    with I finite or countable (and the series converges in the topology of H in the second case), where the vectors \(\phi_i \in {\mathsf H}\) are all non-null and \(0\neq a_i\in {\mathbb C}\), physicists call the state operator 〈ψ| 〉ψ a coherent superposition of the state operators \(\langle \phi_i|\:\:\rangle \phi_i/||\phi_i||^2\).

  4. (d)

    The possibility of creating pure states by non-trivial combinations of vectors associated to other pure states is called, in the jargon of QM, superposition principle of (pure) states .

  5. (e)

    There is however another type of superposition of states. If \(T\in {\mathcal {S}}({\mathsf H})\) satisfies:

    $$\displaystyle \begin{aligned}T = \sum_{i\in I} p_i T_i\end{aligned}$$

    with I finite, \(T_i \in {\mathcal {S}}({\mathsf H})\), \(0 \neq p_i \in [0,1]\) for any i ∈ I, and \(\sum_i p_i = 1\), the state operator T is said to describe an incoherent superposition of the states described by the operators \(T_i\) (possibly pure).

  6. (f)

    If ψ, ϕ ∈H satisfy ||ψ|| = ||ϕ|| = 1 the following terminology is very popular: the complex number 〈ψ|ϕ〉 is the transition amplitude or probability amplitude of the state operator 〈ϕ| 〉ϕ on the state operator 〈ψ| 〉ψ. Moreover the non-negative real number \(|\langle \psi|\phi\rangle|^2\) is the transition probability of the state operator 〈ϕ| 〉ϕ on the state operator 〈ψ| 〉ψ.

We make some comments about these notions. Consider the extremal state operator \(T_\psi \in {\mathcal {S}}_p({\mathsf H})\), written \(T_\psi = \langle \psi|\:\:\rangle \psi\) for some ψ ∈H with ||ψ|| = 1. What we want to emphasise is that this extremal state operator is also an orthogonal projector \(P_\psi := \langle \psi|\:\:\rangle \psi\), so it must correspond to an elementary observable of the system (an atom using the terminology of Theorem 4.17). The naive and natural interpretationFootnote 2 of that observable is this: “the system’s state is the pure state given by the vector ψ”. We can therefore interpret the square modulus of the transition amplitude 〈ϕ|ψ〉 as follows. If ||ϕ|| = ||ψ|| = 1, as the definition of transition amplitude imposes, \(tr(T_\psi P_\phi) = |\langle \phi|\psi\rangle|^2\), where \(T_\psi := \langle \psi|\:\:\rangle \psi\) and \(P_\phi = \langle \phi|\:\:\rangle \phi\). Using (4) we conclude:

\(|\langle \phi|\psi\rangle|^2\) is the probability that the state, given (at time t) by the vector ψ, following a measurement (at time t) on the system becomes determined by ϕ.

Notice \(|\langle \phi|\psi\rangle|^2 = |\langle \psi|\phi\rangle|^2\), so the transition probability of the state determined by ψ on the state determined by ϕ coincides with the analogous probability where the vectors are swapped. This fact is, a priori, highly non-evident in physics.

4.4.7 Post-measurement States: The Meaning of the Lüders-von Neumann Postulate

Since we have introduced a new notion of state, the axiom concerning the collapse of the state (Sect. 3.4) must be upgraded to encompass all state operators of \({\mathcal {S}}({\mathsf H})\). The standard formulation of QM assumes the following axiom (introduced by von Neumann and generalized by Lüders) about what occurs to the physical system, in a state described by the operator \(T \in {\mathcal {S}}({\mathsf H})\) at time t, when subjected to the measurement of an elementary observable \(P\in {\mathcal {L}}({\mathsf H})\), if the latter is true (so in particular tr(TP) > 0, prior to the measurement). We are referring to non-destructive testing, also known as indirect measurement or first-kind measurement, where the physical system examined (typically a particle) is not absorbed/annihilated by the instrument. It is an idealised version of the actual processes used in labs, which can be modelled in this way only in part.

Collapse of the State: General Formulation

If the quantum system is in the state described by \(T \in {\mathcal {S}}({\mathsf H})\) at time t and proposition \(P\in {\mathcal {L}}({\mathsf H})\) is true after a measurement at time t, the system’s state immediately afterwards is described by

$$\displaystyle \begin{aligned} \begin{array}{rcl} T_P := \frac{PTP}{tr(T P)}\:. {}\end{array} \end{aligned} $$
(4.22)

In particular, if T is pure and determined by the unit vector ψ, the state immediately after measurement is still pure, and determined by:

$$\displaystyle \begin{aligned} \begin{array}{rcl} \psi_P =\frac{P\psi}{||P\psi||}\:.{}\end{array} \end{aligned} $$
(4.23)

(Obviously, in either case \(T_P\) and \(\psi_P\) define states. In the former, in fact, \(T_P\) is positive of trace class, with unit trace, while in the latter \(||\psi_P|| = 1\).)
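The consistency between (4.22) and (4.23) for pure states can be checked on a small example. The sketch below works in \({\mathbb C}^3\); the vector ψ, the projector P onto the span of the first two basis vectors, and all helper names are illustrative assumptions. It verifies that PTP∕tr(TP), computed for T = 〈ψ| 〉ψ, coincides with the rank-one operator built from Pψ∕||Pψ||.

```python
import math


def outer(psi):
    """Matrix of the rank-one operator <psi| . >psi: entries psi_i * conj(psi_j)."""
    return [[a * complex(b).conjugate() for b in psi] for a in psi]


def matmul(A, B):
    """Product of two matrices stored as nested lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]


def trace(A):
    """Sum of the diagonal entries of a square matrix."""
    return sum(A[i][i] for i in range(len(A)))


def collapse(T, P):
    """Post-measurement state operator PTP / tr(TP), as in (4.22)."""
    prob = trace(matmul(T, P)).real
    if prob <= 0:
        raise ValueError("tr(TP) must be strictly positive")
    PTP = matmul(matmul(P, T), P)
    return [[entry / prob for entry in row] for row in PTP]


r = 1 / math.sqrt(3)
psi = [r, 1j * r, r]                   # unit vector in C^3
P = [[1, 0, 0], [0, 1, 0], [0, 0, 0]]  # projector onto span{e_1, e_2}

# General rule (4.22) applied to the pure state operator T = <psi| . >psi.
T_P = collapse(outer(psi), P)

# Pure-state rule (4.23): psi_P = P psi / ||P psi||.
P_psi = [sum(P[i][j] * psi[j] for j in range(3)) for i in range(3)]
norm = math.sqrt(sum(abs(c) ** 2 for c in P_psi))
psi_P = [c / norm for c in P_psi]

max_diff = max(abs(T_P[i][j] - outer(psi_P)[i][j])
               for i in range(3) for j in range(3))
```

Up to rounding, `max_diff` vanishes, so in this example the two prescriptions define the same state.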

The postulate has an important characterization. Suppose that the initial state is described by \(T\in {\mathcal {S}}({\mathsf H})\), we measure \(P\in \mathcal {L}({\mathsf H})\) and we want to know the probability to measure \(Q\in \mathcal {L}({\mathsf H})\). This is a problem of conditional probability. In general, if Q is not compatible with P, i.e. if P and Q do not commute, the rules to handle conditional probability are different from the classical ones, as physicists know very well. However, if we deal with compatible elementary observables, we expect that the quantum rules and the classical ones coincide, by including these observables in a maximal set of commuting elementary observables as we already did elsewhere. In particular, let us assume Q ≤ P. In this case P ∧ Q = PQ = QP = Q (Proposition 3.19), so the classical rule of conditional probability is expected to hold with an obvious meaning of the symbols,

$$\displaystyle \begin{aligned}{\mathbb P}_T(Q|P) = \frac{{\mathbb P}_T(P\wedge Q)}{{\mathbb P}_T(P)} = \frac{{\mathbb P}_T(Q)}{{\mathbb P}_T(P)}\:.\end{aligned}$$

This requirement, if assumed, completely characterizes the post-measurement state and implies that the Lüders-von Neumann postulate holds, as established in the following proposition.

Proposition 4.54

Let \(T\in {\mathcal {S}}({\mathsf H})\) be a quantum state operator for a Hilbert space H and suppose that, for \(P\in \mathcal {L}({\mathsf H})\) , tr(TP) > 0. There exists exactly one other quantum state operator \(T'\in {\mathcal {S}}({\mathsf H})\) such that

$$\displaystyle \begin{aligned} \begin{array}{rcl} tr(T'Q) = \frac{tr(TQ)}{tr(TP)} \quad \mathit{\mbox{for every}}\ Q\in \mathcal{L}({\mathsf H})\ \mathit{\mbox{with}}\ Q\leq P.{}\end{array} \end{aligned} $$
(4.24)

Moreover,

$$\displaystyle \begin{aligned}T' = \frac{PTP}{tr(TP)}\:.\end{aligned}$$

Proof

One immediately proves that \(T' := tr(TP)^{-1}PTP\) satisfies the condition. Let us prove the converse statement. If \(x\in {\mathsf H}_0 := P({\mathsf H})\) has unit norm, consider the orthogonal projector \(Q_x := \langle x|\:\:\rangle x\). Since \(Q_x \leq P\), condition (4.24) reads \(tr(T'Q_x) = tr(TP)^{-1}tr(TQ_x)\). Computing traces by completing x to a basis of H, we have \(\langle x|T'x\rangle - tr(TP)^{-1}\langle x|Tx\rangle = 0\) and, since x = Px, it can be rearranged to \(\langle x|T'x\rangle - tr(TP)^{-1}\langle Px|TPx\rangle = 0\), so that

$$\displaystyle \begin{aligned} \begin{array}{rcl} \langle x|(T'- tr(TP)^{-1} PTP) x\rangle=0\quad \mbox{for every}\ x\in {\mathsf H}_0{}\:.\end{array} \end{aligned} $$
(4.25)

Now observe that condition (4.24) for Q = P leads to tr(T′P) = 1. Taking also advantage of the cyclic property of the trace and PP = P, we have tr(T′P) = tr(PT′P) = 1. On the other hand, using the decomposition \(T' = PT'P + P^\perp T'P^\perp + P^\perp T'P + PT'P^\perp\) (where \(P^\perp := I - P\)), the normalization condition tr(T′) = 1 implies \(1 = tr(PT'P) + tr(P^\perp T'P^\perp)\). Comparing the results obtained, we conclude that \(tr(P^\perp T'P^\perp) = 0\), namely \(tr(P^\perp \sqrt {T'}\sqrt {T'}P^\perp )= \sum _{u\in N}||\sqrt {T'}u||{ }^2=0\), where N is a Hilbert basis of \(P^\perp({\mathsf H})\). We have found that \(T'P^\perp = 0\) and also, taking the adjoint, \(P^\perp T' = 0\). Coming back to the decomposition \(T' = PT'P + P^\perp T'P^\perp + P^\perp T'P + PT'P^\perp\), we realize that T′ = PT′P. Since the same is true of \(\frac {PTP}{tr(TP)}\), we can restrict our analysis to the Hilbert space \({\mathsf H}_0 := P({\mathsf H})\), since both operators vanish on the orthogonal of \({\mathsf H}_0\) and their images are contained in \({\mathsf H}_0\) viewed as a Hilbert space. In this regard, Proposition 2.5 implies that (4.25) is equivalent to \((T' - tr(TP)^{-1}PTP)z = 0\) when \(z\in {\mathsf H}_0\). Since, as we said, both operators vanish on the orthogonal of \({\mathsf H}_0\), we have that \(T'y = tr(TP)^{-1}PTPy\) for every y ∈H, proving our assertion. □
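Condition (4.24) can be tested numerically on a small example. In the sketch below the diagonal state operator T, the projector P and the projectors Q and `Q_outside` are arbitrary illustrative choices in \({\mathbb C}^3\), and the helper names are hypothetical. The code checks the identity for a projector Q ≤ P and shows that it may fail when Q is not dominated by P.

```python
import math


def outer(psi):
    """Matrix of the rank-one operator <psi| . >psi: entries psi_i * conj(psi_j)."""
    return [[a * complex(b).conjugate() for b in psi] for a in psi]


def matmul(A, B):
    """Product of two matrices stored as nested lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]


def trace(A):
    """Sum of the diagonal entries of a square matrix."""
    return sum(A[i][i] for i in range(len(A)))


T = [[0.5, 0.0, 0.0], [0.0, 0.3, 0.0], [0.0, 0.0, 0.2]]  # mixed state operator
P = [[1, 0, 0], [0, 1, 0], [0, 0, 0]]                    # measured proposition

s = 1 / math.sqrt(2)
Q = outer([s, s, 0.0])                          # projector with Q <= P
Q_outside = [[0, 0, 0], [0, 0, 0], [0, 0, 1]]   # projector orthogonal to P

# Q <= P is equivalent to PQ = Q.
PQ = matmul(P, Q)
q_below_p = max(abs(PQ[i][j] - Q[i][j]) for i in range(3) for j in range(3)) < 1e-12

prob_P = trace(matmul(T, P)).real
T_prime = [[entry / prob_P for entry in row] for row in matmul(matmul(P, T), P)]

# Both sides of (4.24) for Q <= P.
lhs = trace(matmul(T_prime, Q)).real
rhs = trace(matmul(T, Q)).real / prob_P

# The same expressions for a projector that is not below P.
lhs_outside = trace(matmul(T_prime, Q_outside)).real
rhs_outside = trace(matmul(T, Q_outside)).real / prob_P
```

Here `lhs` and `rhs` agree, while `lhs_outside` and `rhs_outside` differ, which illustrates why the classical rule is only required for Q ≤ P.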

Conditional probability is an articulated part of quantum logic (quantum conditional and quantum conditional probability), with profound differences from its classical counterpart and with open issues. See [Red98] for a technical account.

Remark 4.55

  1. (a)

    Measuring a property of a physical quantity goes through the interaction between the system and an instrument (supposed to be macroscopic and obeying the laws of classical physics). Quantum Mechanics, in its standard formulation, does not establish what a measuring instrument is, it only says they exist; nor is it capable of describing the interaction of instrument and quantum system set out in the Lüders-von Neumann postulate discussed above. Several viewpoints and conjectures exist on how to complete the physical description of the measuring process; these are called, in the slang of QM, collapse/reduction of the state or of the wavefunction, and are also described in terms of decoherence (see [BLPY16, Lan17] for complete discussions and references).

  2. (b)

    Measuring instruments are commonly employed to prepare a system in a certain pure state. Theoretically speaking the preparation of a pure state is carried out like this. A finite collection of compatible propositions \(P_1, \ldots, P_n\) is chosen so that the projection subspace of \(P_1\wedge \cdots \wedge P_n = P_1\cdots P_n\) is one-dimensional. In other words \(P_1\cdots P_n = \langle \psi|\:\:\rangle \psi\) for some vector with ||ψ|| = 1. The existence of such propositions is seen in practically all quantum systems used in experiments. (From a theoretical point of view these are atomic propositions.) Then the \(P_i\) are simultaneously measured on several identical copies of the physical system of concern (e.g., electrons), whose initial states, though, are unknown. If for one system the measurements of all propositions are successful, the post-measurement state is determined by the vector ψ, and the system was prepared in that particular pure state.

    Normally each projector \(P_i\) belongs to the PVM \(P^{(A_i)}\) of an observable \(A_i\) whose spectrum is made of isolated points (thus a pure point spectrum according to Definition 3.44) and \(P_i = P^{(A_i)}_{\{\lambda _i\}}\) with \(\lambda_i \in \sigma_p(A_i)\). We will come back to this issue in Sect. 6.2.2.

  3. (c)

    Let us finally explain how to obtain non-pure states from pure ones in practice. Consider \(q_1\) identical copies of system S prepared in the pure state associated to \(\psi_1\), \(q_2\) copies of S prepared in the pure state associated to \(\psi_2\) and so on, up to \(\psi_n\). If we mix these systems, each one will be in the non-pure state \(T = \sum _{i=1}^n p_i \langle \psi _i|\:\:\rangle \psi _i\:,\) where \(p_i := q_i/\sum _{j=1}^n q_j\). In general, \(\langle \psi_i|\psi_j\rangle\) is not zero if i ≠ j, so the above expression for T is not the decomposition with respect to an eigenvector basis for T. This procedure may seem to suggest the existence of two different types of probability, one intrinsic and due to the quantum nature of the state associated to \(\psi_i\); the other epistemic, and encoded in the probability \(p_i\). But this is not true: once a non-pure state has been created, as above, there is no way, within QM, to distinguish the states forming the mixture. For example, the same state operator T could have been obtained by mixing pure states other than those determined by the \(\psi_i\). In particular, one could have used those in the eigenvector decomposition of T. For physics, no kind of measurement would distinguish the two mixtures. \(\blacksquare \)
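Item (c) can be made concrete on \(\mathbb C^2\). The sketch below, in plain Python with hypothetical sample vectors and helper names (projector, mixture), mixes two non-orthogonal pure states with equal weights and then rebuilds the same state operator from its eigenvector decomposition; the eigenvalues \((1 \pm 1/\sqrt 2)/2\) and the angle π/8 are specific to this example.

```python
import math

def projector(psi):
    """Matrix of the rank-one projector onto the real unit vector psi."""
    return [[a * b for b in psi] for a in psi]

def mixture(weights, vectors):
    """State operator sum_i p_i <psi_i| >psi_i as a list of rows."""
    n = len(vectors[0])
    T = [[0.0] * n for _ in range(n)]
    for p, psi in zip(weights, vectors):
        proj = projector(psi)
        for i in range(n):
            for j in range(n):
                T[i][j] += p * proj[i][j]
    return T

r = 1.0 / math.sqrt(2.0)

# Ensemble 1: equal numbers of copies in two NON-orthogonal pure states.
T_first = mixture([0.5, 0.5], [[1.0, 0.0], [r, r]])

# Ensemble 2: orthonormal eigenvectors of the same operator, weighted by
# its eigenvalues (values worked out by hand for this example).
c, s = math.cos(math.pi / 8), math.sin(math.pi / 8)
T_second = mixture([(1 + r) / 2, (1 - r) / 2], [[c, s], [-s, c]])

max_diff = max(abs(T_first[i][j] - T_second[i][j])
               for i in range(2) for j in range(2))
print(T_first, T_second, max_diff)
```

The two ensembles are prepared in different ways, yet they give the same matrix up to rounding, so every probability tr(TP) computed from them coincides.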

To conclude this quick discussion about measurements in Quantum Theories, it is fundamental to stress that the Lüders-von Neumann postulate refers to an extremely idealized notion of measurement. Similarly, the notion of observable viewed as the integral of a PVM, albeit representing a fundamental theoretical notion, appears to be a rigid idealization of concrete measurement instruments. Realistic quantum instruments are nowadays described through a mature and sophisticated mathematical theory based on the notion of POVMs (positive-operator valued measures) generalising our familiar PVM, and completely positive maps. We suggest [BLPY16] as a modern review on the subject.

4.4.8 Composite Systems in Elementary QM: The Use of Tensor Products

If a quantum system S described on the Hilbert space H contains two independent parts, S 1 and S 2, respectively described on the Hilbert spaces H 1 and H 2, we are committed to assuming at least the following three requirements.

  1. (A)

    The elementary propositions P i of each subsystem, the elements of \(\mathcal {L}({\mathsf H}_i)\) for i = 1, 2, must be (1-1) identified with corresponding elementary propositions \(P_i^{\prime }\) on the full system, i.e., elements of \(\mathcal {L}({\mathsf H})\).

  2. (B)

    Any pair of elementary propositions, one for each independent subsystem, viewed as elements of \(\mathcal {L}({\mathsf H})\) must be compatible.

  3. (C)

    For every couple of states \(T_1\in \mathcal {S}({\mathsf H}_1)\), \(T_2\in \mathcal {S}({\mathsf H}_2)\), there is a state \(T\in \mathcal {S}({\mathsf H})\) such that \(tr(TP^{\prime }_1)= tr(T_1P_1)\) and \(tr(TP^{\prime }_2)= tr(T_2P_2)\) for every \(P_1\in \mathcal {L}({\mathsf H}_1)\) and \(P_2\in \mathcal {L}({\mathsf H}_2)\).

(C) says that we can fix states on S 1 and S 2 independently: for every choice of two independent states on the two parts of the system, there is a state of the overall system which embodies those choices.

A natural way to implement these requirements in elementary QM is assuming that the whole system is described on the Hilbert tensor product H = H 1 ⊗H 2 (with further factors in case S 1 and S 2 do not exhaust the total system S, but further independent parts S 3 etc. are present) so that, in particular, the space of states is \(\mathcal {S}({\mathsf H}) = \mathcal {S}({\mathsf H}_1\otimes {\mathsf H}_2)\).

We quote here some basic technical facts [Mor18] regarding the tensor product of Hilbert spaces, useful when dealing with composite systems and leading to the assumption made above.

  1. (1)

    (From Sect. 2.1.4.) The Hermitian inner product 〈⋅|⋅〉 on H 1 ⊗H 2 is the unique Hermitian inner product such that, with obvious notation,

    $$\displaystyle \begin{aligned} \begin{array}{rcl}\langle \psi_1\otimes \psi_2| \phi_1\otimes \phi_2 \rangle = \langle \psi_1 | \phi_1 \rangle_1 \langle \psi_2 | \phi_2 \rangle_2\quad \mbox{for every}\ \phi_i,\psi_i \in {\mathsf H}_i\ \mbox{and}\ i=1,2.\\\end{array} \end{aligned} $$
    (4.26)
  2. (2)

    (From Proposition 10.32 in [Mor18]) If \(A_i \in {\mathfrak B}({\mathsf H}_i)\) i = 1, 2, there is a unique operator \(A_1\otimes A_2 \in {\mathfrak B}({\mathsf H}_1\otimes {\mathsf H}_2)\) called the tensor product of A 1 and A 2 such that

    $$\displaystyle \begin{aligned} \begin{array}{rcl} A_1\otimes A_2(\psi_1\otimes \psi_2) = (A_1\psi_1)\otimes (A_2\psi_2) \quad \mbox{for every}\ \psi_i \in {\mathsf H}_i\ \mbox{and}\ i=1,2\quad \quad {} \vspace{-2pt}\end{array} \end{aligned} $$
    (4.27)

    and it turns out that

    $$\displaystyle \begin{aligned} \begin{array}{rcl} ||A_1\otimes A_2|| = ||A_1||{}_1\: ||A_2||{}_2\;.\end{array} \end{aligned} $$
    (4.28)

    Moreover, A i ≥ 0 imply A 1 ⊗ A 2 ≥ 0.

  3. (3)

    It is easy to prove that if furthermore \(A_i\in {\mathfrak B}_1({\mathsf H}_i)\) then \(A_1\otimes A_2 \in {\mathfrak B}_1({\mathsf H}_1\otimes {\mathsf H}_2)\) and tr(A 1 ⊗ A 2) = tr(A 1)tr(A 2).

The stated facts lead straightforwardly to the following proposition.

Proposition 4.56

If H 1, H 2 are Hilbert spaces, the following results are valid.

  1. (a)

    The map

    $$\displaystyle \begin{aligned} \begin{array}{rcl} {\mathfrak B}({\mathsf H}_1) \ni A_1 \mapsto A_1 \otimes I_2 \in {\mathfrak B}({{\mathsf H}}_1 \otimes {{\mathsf H}}_2)\vspace{-2pt}\end{array} \end{aligned} $$
    (4.29)

    is an injective and norm-preserving unital ∗-algebra homomorphism. Furthermore,

    $$\displaystyle \begin{gathered} \sigma(A_1 \otimes I_2)= \sigma (A_1)\:, \quad \sigma_p(A_1 \otimes I_2)= \sigma_p(A_1)\:, \quad \sigma_c(A_1 \otimes I_2)= \sigma_c(A_1) \:,\\ \sigma_r(A_1 \otimes I_2)= \sigma_r(A_1)\:. \end{gathered} $$

    A similar statement holds replacing 1 with 2.

  2. (b)

    The map

    $$\displaystyle \begin{aligned} \begin{array}{rcl} \mathcal{L}({\mathsf H}_1) \ni P_1\mapsto P_1\otimes I_2\in \mathcal{L}({\mathsf H}_1\otimes {\mathsf H}_2){}\end{array} \end{aligned} $$
    (4.30)

    is well defined and is an injective homomorphism of orthocomplemented lattices. A similar statement holds when replacing 1 by 2.

  3. (c)

    The map

    $$\displaystyle \begin{aligned} \begin{array}{rcl} \mathcal{S}({\mathsf H}_1) \times \mathcal{S}({\mathsf H}_2) \ni (T_1,T_2) \mapsto T_1 \otimes T_2 \in \mathcal{S}({\mathsf H}_1\otimes{\mathsf H}_2){}\end{array} \end{aligned} $$
    (4.31)

    is well defined and T := T 1 ⊗ T 2 satisfies

    $$\displaystyle \begin{aligned}tr(T A_1\otimes A_2) =tr(T_1A_1) tr(T_2A_2)\:, \quad \mathit{\mbox{for}} \quad A_i\in {\mathfrak B}({\mathsf H}_i)\:,i=1,2\:.\end{aligned}$$

    In particular,

    $$\displaystyle \begin{gathered} tr(T(P_1\otimes I_2))= tr(T_1P_1)\quad \mathit{\mbox{ and }}\quad tr(T(I_1\otimes P_2))= tr(T_2P_2) \\ \mathit{\mbox{for}} \quad P_i\in \mathcal{L}({\mathsf H}_i)\:,i=1,2\:. \end{gathered} $$

Sketch of Proof

(a) is a consequence of (2) and (1). In particular, \((A_1\otimes I_2)^* = A_1^*\otimes I_2\) arises from \(\langle \psi \otimes \phi |A_1\otimes I_2 (\psi ' \otimes \phi ')\rangle = \langle A^*_1\otimes I_2(\psi \otimes \phi )|\psi ' \otimes \phi '\rangle \), which is valid due to (1) and (2), and from the fact that linear combinations of elements ψ ⊗ ϕ are dense in H 1 ⊗H 2, also using the boundedness of the operators involved. The identities between the various parts of the spectrum easily arise from (2), the linearity of operators, and the direct application of the relevant definitions. (b) is a consequence of the relevant definitions, the continuity of the operators and in particular Proposition 4.12. (c) is a consequence of (3) and the comment before the remark in Sect. 2.1.4.

Items (b) and (c) show that the tensor product yields a practical implementation of the requirements (A)–(C). In fact, (A) the elementary propositions of a subsystem are viewed as elementary propositions on the full system under the injective homomorphism (4.30). Moreover, (B) elementary propositions of two independent subsystems are always compatible because

$$\displaystyle \begin{aligned}(P_1\otimes I_2) (I_1\otimes P_2) = P_1\otimes P_2 =(I_1\otimes P_2)(P_1\otimes I_2)\quad \mbox{for}\ P_i\in \mathcal{L}({\mathsf H}_i)\ \mbox{and}\ i=1,2\:. \end{aligned}$$

There are natural extensions of these results to the case A 1, A 2 densely defined and selfadjoint, but we shall not enter the details here [Mor18]. The fact that (C) is valid is now trivial. All these results generalize to the case of a finite, and to some extent, countable number of subsystems.
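For finite-dimensional factors the tensor product of operators is the Kronecker product of matrices, so requirements (B) and (C) can be checked directly. The following is a minimal sketch in plain Python; the sample matrices T1, T2, P1, P2 and the helpers kron, matmul, trace are illustrative assumptions.

```python
def matmul(a, b):
    """Product of two square matrices stored as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def trace(a):
    """Sum of the diagonal entries."""
    return sum(a[i][i] for i in range(len(a)))

def kron(a, b):
    """Kronecker product of square matrices a (n x n) and b (m x m)."""
    n, m = len(a), len(b)
    return [[a[i // m][j // m] * b[i % m][j % m] for j in range(n * m)]
            for i in range(n * m)]

I2 = [[1.0, 0.0], [0.0, 1.0]]
T1 = [[0.7, 0.1], [0.1, 0.3]]      # sample state on H_1 = C^2
T2 = [[0.3, 0.1], [0.1, 0.7]]      # sample state on H_2 = C^2
P1 = [[1.0, 0.0], [0.0, 0.0]]      # proposition of subsystem 1
P2 = [[0.5, 0.5], [0.5, 0.5]]      # proposition of subsystem 2

T = kron(T1, T2)                   # product state on C^2 (x) C^2 = C^4

# (B): propositions of different subsystems commute.
left = matmul(kron(P1, I2), kron(I2, P2))
right = matmul(kron(I2, P2), kron(P1, I2))
commute = all(abs(left[i][j] - right[i][j]) < 1e-12
              for i in range(4) for j in range(4))

# (C): the product state reproduces the probabilities of each part.
marginal_1 = trace(matmul(T, kron(P1, I2)))    # compare with tr(T1 P1)
marginal_2 = trace(matmul(T, kron(I2, P2)))    # compare with tr(T2 P2)
joint = trace(matmul(T, kron(P1, P2)))         # compare with the product
print(commute, marginal_1, marginal_2, joint)
```

In this product state the joint probability factorizes into the product of the two marginals, in agreement with item (c) of Proposition 4.56.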

Remark 4.57

The use of the tensor product of Hilbert spaces to formalize the notion of independent subsystems is a possibility usually exploited in elementary QM. However, this is not the only possibility and sometimes it is impossible to adopt that description. We will come back to this issue in Sect. 6.4. \(\blacksquare \)

Example 4.58

  1. (1)

    An electron possesses an electric charge in addition to the spin. That is another internal quantum observable Q with two values ± e, where \(e \approx 1.6 \times 10^{-19}\) C is the value of the elementary electrical charge. So there are two types of electrons: proper electrons, whose internal state of charge is an eigenvector of Q with eigenvalue − e, and positrons, whose internal state of charge is an eigenvector of Q with eigenvalue e. The simplest version of the internal Hilbert space of the electrical charge is therefore \({\mathsf H}_c\) which, again, is isomorphic to \(\mathbb C^2\). With this representation \(Q = e\sigma_3\). The full Hilbert space of an electron must therefore contain a factor \({\mathsf H}_s\otimes {\mathsf H}_c\). Obviously this is by no means sufficient to describe an electron, since we must include the observables describing at least the position of the electron. The three observables describing the Cartesian coordinates of the position of an electron in the rest space \({\mathbb R}^3\) of an inertial reference frame are represented in \(L^2({\mathbb R}^3, d^3x)\) as we already know. The final space is therefore \(L^2({\mathbb R}^3, d^3x)\otimes {{\mathsf H}}_s\otimes {{\mathsf H}}_c\). Alternatively, the non-internal part of the state of the electron can be represented in the \(L^2\) space associated with the momentum operators, the momentum picture introduced in Example 3.74. With this choice the total Hilbert space of an electron is \(L^2({\mathbb R}^3, d^3k)\otimes {{\mathsf H}}_s\otimes {{\mathsf H}}_c\) where the momentum operator is a multiplication. These two descriptions are unitarily equivalent (under the Fourier-Plancherel transform) and choosing one or the other is just a matter of convenience.

  2. (2)

    Composite systems are in particular systems made of many (either identical or not) particles. If we have a pair of particles respectively described on the Hilbert space H 1 and H 2, the full system is described on H 1 ⊗H 2. Notice that, in the finite-dimensional case, the dimension of the final space is the product of the components’ dimensions. In CM the system would instead be described on a phase space which is the Cartesian product of the two phase spaces. In that case the dimension would be the sum, rather than the product, of the dimensions of the component spaces. \(\blacksquare \)

4.5 General Interplay of Quantum Observables and Quantum States

This section focuses on the interplay of general observables and states, and proves that formulas familiar to physicists are well motivated by the rigorous formalism.

4.5.1 Observables, Expectation Values, Standard Deviations

When dealing with mixed states, Definitions (3.43) and (3.45) for the expectation value \(\langle A\rangle_\psi\) and the standard deviation \(\Delta A_\psi\) of an observable A referred to the pure state defined by \(\langle \psi|\:\:\rangle \psi\) with ||ψ|| = 1, are no longer valid. Extended natural definitions can be stated referring to the probability measure associated to both the mixed state defined by \(T\in {\mathcal {S}}({\mathsf H})\) and the observable A, more precisely its PVM \(P^{(A)}\). In practice, we can define

$$\displaystyle \begin{aligned} \mu_T^{(A)} : \mathcal{B}(\sigma(A))\ni E \mapsto tr(P_E^{(A)}T)\in [0,1] \end{aligned} $$
(4.32)

interpreted as the probability that the outcome of a measurement of A falls in E when the quantum state is represented by \(T\in {\mathcal {S}}({\mathsf H})\).

In particular, if T is pure, so that \(T = \langle \psi|\:\:\rangle \psi\) for some unit vector \(\psi \in {\mathsf H}\), we find again the probability already seen in (3.42),

$$\displaystyle \begin{aligned} \mu_T^{(A)}(E) = ||P_E^{(A)}\psi||{}^2 = \mu^{(A)}_{\psi,\psi}(E)\:. \end{aligned}$$

The proof is trivial: just complete {ψ} to a Hilbert basis of H and compute the trace. Adopting the definition of \(\mu _T^{(A)}\) introduced in (4.32),

  1. (a)

    the expectation value of A with respect to the state described by T is defined as

    $$\displaystyle \begin{aligned} \begin{array}{rcl} \langle A \rangle_T := \int_{\sigma(A)} \lambda \: d\mu_T^{(A)}(\lambda) \:,{}\end{array} \end{aligned} $$
    (4.33)

    provided the function \(\sigma (A) \ni \lambda \to \lambda \in {\mathbb R}\) is in \(L^1(\sigma (A),\mu _T^{(A)})\);

  2. (b)

    the standard deviation is defined as

    $$\displaystyle \begin{aligned} \begin{array}{rcl} \Delta A_T := \sqrt{\int_{\sigma(A)} (\lambda - \langle A \rangle_T)^2\: d\mu_T^{(A)}(\lambda)} = \sqrt{\int_{\sigma(A)} \lambda^2 \: d\mu_T^{(A)}(\lambda)- \langle A \rangle^2_T}\:, \\{}\end{array} \end{aligned} $$
    (4.34)

    provided \(\sigma (A) \ni \lambda \to \lambda \in {\mathbb R}\) is in \(L^2(\sigma (A),\mu _T^{(A)})\). (Notice \(L^2(\sigma (A),\mu _T^{(A)}) \subset L^1(\sigma (A),\mu _T^{(A)})\) since the measure is finite.)

4.5.2 Relation with the Formalism Used in Physics

The next proposition establishes that the usual formal results handled by physicists (see formulas in (2)-(4) below) are valid under suitable conditions on the domains. With reference to the domain issues in (2) and (3) below we observe that \(D(A^2)= \Delta _{\imath ^2}\subset \Delta _{\imath } = D(A)=D(|A|)\).

Proposition 4.59

Let H be a Hilbert space, \(T\in {\mathcal {S}}({\mathsf H})\) a quantum state operator and \(A : D(A) \to {\mathsf H}\), densely defined, an observable (i.e. \(A = A^*\)). The following facts hold.

  1. (1)

    \(\mu _T^{(A)}\) as in (4.32) is a well-defined probability measure on \(\mathcal {B}(\sigma (A))\).

  2. (2)

    If Ran(T) ⊂ D(A) and \(|A|T \in {\mathfrak B}_1({\mathsf H})\) (always valid if \(A \in {\mathfrak B}({\mathsf H})\) ), then

    1. (a)

      \(\langle A\rangle_T\) is well defined,

    2. (b)

      \(\langle A\rangle_T = tr(AT)\).

  3. (3)

    If Ran(T) ⊂ D(A 2) and \(|A|T, A^2T \in {\mathfrak B}_1({\mathsf H})\) (always valid if \(A \in {\mathfrak B}({\mathsf H})\) ), then

    1. (a)

      \(\Delta A_T\) is well defined,

    2. (b)

      \(\Delta A_T = \sqrt {tr(A^2T)- \left (tr(AT)\right )^2}\).

  4. (4)

    Assume that \(T = \langle \psi|\:\:\rangle \psi\) with ||ψ|| = 1.

    1. (a)

      If \(\psi \in D(A)\) then the hypotheses in (2) are valid and \(\langle A\rangle_T = \langle \psi|A\psi\rangle\),

    2. (b)

      If \(\psi \in D(A^2)\) then the hypotheses in (3) are valid and \(\Delta A_T = \sqrt {\langle \psi |A^2 \psi \rangle - \langle \psi |A \psi \rangle ^2}\).

Proof

  1. (1)

    Taking the definition of PVM into account, the proof is a trivial adaptation of the proof of Proposition 4.45.

  2. (2)(a)

    Let us assume Ran(T) ⊂ D(A) and \(|A|T \in {\mathfrak B}_1({\mathsf H})\), which are automatically true if \(A\in {\mathfrak B}({\mathsf H})\). As already stressed, D(|A|) = D(A), so Ran(T) ⊂ D(A) = D(|A|) is true and both AT, |A|T are well defined under said hypotheses. Next, the polar decomposition theorem for (unbounded) selfadjoint operators A = U|A| (immediately obtained from the spectral decomposition in the three cases with |A| and \(U := \mbox{sign}(A)\in {\mathfrak B}({\mathsf H})\) defined spectrally) implies \(AT = U|A|T\in {\mathfrak B}_1({\mathsf H})\), because \(U\in {\mathfrak B}({\mathsf H})\) and \({\mathfrak B}_1({\mathsf H})\) is a two-sided ideal. Now, referring to the Borel σ-algebra on \(\sigma (A) \subset {\mathbb R}\), we can construct a sequence of real simple functions

    $$\displaystyle \begin{aligned} s_n= \sum_{i_n\in\mathcal{I}_n}c^{(n)}_{i_n}\chi_{E_{i_n}^{(n)}}: \sigma(A) \to {\mathbb R}\quad \mbox{with } c_{i_n}^{(n)}\in{\mathbb R},\ \mbox{and}\ \mathcal{I}_n\mbox{ finite} \end{aligned}$$

    which satisfies

    $$\displaystyle \begin{aligned} 0\leq |s_n| \leq |s_{n+1}| \leq |\imath|\:, \quad s_n \to \imath\quad \mbox{pointwise as}\ n\to +\infty, \end{aligned} $$
    (4.35)

    where \(\imath : \sigma (A) \ni \lambda \mapsto \lambda \in {\mathbb R}\). By direct application of the given definitions, if

    $$\displaystyle \begin{aligned} A_n := \int_{\sigma(A)} s_n dP^{(A)}=\sum_{i_n\in\mathcal{I}_n}c^{(n)}_{i_n}P^{(A)}_{E_{i_n}^{(n)}} \in {\mathfrak B}({\mathsf H})\:, \end{aligned}$$

    exploiting Proposition 3.29 (c), monotone convergence and Lebesgue’s dominated convergence, we have both

    $$\displaystyle \begin{aligned} \langle \psi| A_n \psi\rangle \to \langle \psi |A \psi\rangle\:, \quad \langle \psi| |A_n| \psi\rangle \to \langle \psi ||A| \psi\rangle \quad \forall \psi \in D(A)\quad \mbox{as}\ n \to +\infty \end{aligned} $$
    (4.36)

    and also

    $$\displaystyle \begin{aligned} |\langle \psi| A_n \psi\rangle | \leq \langle \psi| |A_n| \psi\rangle \leq \langle \psi| |A| \psi\rangle \:.{} \end{aligned} $$
    (4.37)

    On the other hand, if M is a Hilbert basis of H obtained by completing a Hilbert basis N of \(Ker(T)^\perp\) made of eigenvectors of T according to Theorem 4.34 (e), then, taking advantage of the cyclic property of the trace, we have both

    $$\displaystyle \begin{aligned} \begin{aligned} tr(A_nT) &= tr\left(\sum_{i_n\in\mathcal{I}_n}c^{(n)}_{i_n} P^{(A)}_{E^{(n)}_{i_n}}T\right)=\sum_{i_n\in\mathcal{I}_n}c^{(n)}_{i_n} tr(P^{(A)}_{E^{(n)}_{i_n}}T)= \sum_{i_n\in\mathcal{I}_n}c^{(n)}_{i_n} tr(TP^{(A)}_{E^{(n)}_{i_n}})=\\ &=\sum_{i_n\in\mathcal{I}_n}c^{(n)}_{i_n} \mu_T^{(A)}(E^{(n)}_{i_n})=\int_{\sigma(A)}s_n d\mu_T^{(A)} \end{aligned} \end{aligned} $$
    (4.38)

    and similarly

    $$\displaystyle \begin{aligned} tr(|A_n| T) = \int_{\sigma(A)} |s_n| \: d\mu_T^{(A)}\:. \end{aligned} $$
    (4.39)

    Looking at the formula (4.39), by the monotone convergence theorem

    $$\displaystyle \begin{aligned} tr(|A_n| T)=\int_{\sigma(A)} |s_n|(\lambda) \: d\mu_T^{(A)}(\lambda) \to \int_{\sigma(A)} |\lambda| \: d\mu_T^{(A)}(\lambda)\: \end{aligned}$$

    as n → +∞, and simultaneously

    $$\displaystyle \begin{aligned}tr(|A_n|T) = \sum_{u\in N} s(u) \langle u| |A_n| u\rangle \to \sum_{u\in N} s(u) \langle u| |A| u\rangle = tr(|A|T)\:,\end{aligned}$$

    where s(u) ≥ 0 are the eigenvalues of T, again by monotone convergence and (4.36). Putting all together, we find

    $$\displaystyle \begin{aligned} tr(|A|T) = \int_{\sigma(A)} |\lambda| \: d\mu_T^{(A)}(\lambda)\:. \end{aligned} $$

    We have in particular established that the integral in the right-hand side is finite (because the left-hand side exists by hypothesis) and thus \(\langle A\rangle_T\) is well defined.

  3. (2)(b)

    Let us look at the formula in (4.38). From dominated convergence, taking (4.35) into account, we obtain as n → +∞

    $$\displaystyle \begin{aligned} tr(A_nT)=\int_{\sigma(A)} s_n(\lambda) \: d\mu_T^{(A)} \to \int_{\sigma(A)} \lambda \: d\mu_T^{(A)}\:. \end{aligned} $$

    On the other hand,

    $$\displaystyle \begin{aligned}tr(A_nT) = \sum_{u\in N} \langle u| A_n u\rangle s(u) \to \sum_{u\in N} \langle u| A u\rangle s(u)= tr(AT)\:,\end{aligned} $$

    where we have once again applied the dominated convergence theorem allowed by (4.37). Putting everything together we get

    $$\displaystyle \begin{aligned}tr(AT) = \int_{\sigma(A)} \lambda \: d\mu_T^{(A)}(\lambda) =: \langle A\rangle_T\:,\end{aligned} $$

    concluding the proof of 2.(b).

  4. (3)

    The proof is strictly analogous to that of (2) by noticing that the hypotheses of (3) imply those of (2) and that \(L^2(\sigma (A),\mu _T^{(A)}) \subset L^1(\sigma (A),\mu _T^{(A)})\) because \(\mu _T^{(A)}\) is finite.

  5. (4)

    The claim reduces to trivial subcases of (2) and (3), in particular by completing {ψ} to a Hilbert basis of H to compute the various traces.
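In finite dimension all domain conditions of Proposition 4.59 hold automatically, and the identities in (2) and (3) can be verified by hand or by machine. The sketch below is plain Python with hypothetical sample data: an observable on \(\mathbb C^2\) with eigenvalues 2 and −1 and a mixed state T. It computes \(\langle A\rangle_T\) and \(\Delta A_T\) once from the spectral measure (4.32) and once from the trace formulas.

```python
import math

def matmul(a, b):
    """Product of two square matrices stored as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def trace(a):
    """Sum of the diagonal entries."""
    return sum(a[i][i] for i in range(len(a)))

# PVM of the sample observable: eigenvalue -> spectral projector.
spectrum = {
    2.0: [[0.5, 0.5], [0.5, 0.5]],       # projector onto (1, 1)/sqrt(2)
    -1.0: [[0.5, -0.5], [-0.5, 0.5]],    # projector onto (1, -1)/sqrt(2)
}

# A = sum over eigenvalues of lambda * P_lambda.
A = [[sum(lam * proj[i][j] for lam, proj in spectrum.items())
      for j in range(2)] for i in range(2)]

T = [[0.7, 0.1], [0.1, 0.3]]             # sample mixed state

# Spectral measure mu_T({lambda}) = tr(P_lambda T), as in (4.32).
mu = {lam: trace(matmul(proj, T)) for lam, proj in spectrum.items()}

# Definitions (4.33) and (4.34): integrals against mu_T.
mean_spectral = sum(lam * p for lam, p in mu.items())
second_moment = sum(lam ** 2 * p for lam, p in mu.items())
std_spectral = math.sqrt(second_moment - mean_spectral ** 2)

# Trace formulas of Proposition 4.59, items (2) and (3).
mean_trace = trace(matmul(A, T))
std_trace = math.sqrt(trace(matmul(matmul(A, A), T)) - mean_trace ** 2)
print(mu, mean_spectral, mean_trace, std_spectral, std_trace)
```

The two routes agree up to rounding, which is the finite-dimensional content of the proposition; the unbounded case needs the domain hypotheses stated there.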

Example 4.60

Let us consider a quantum spinless particle of mass m > 0, living on the real line, whose Hamiltonian operator

$$\displaystyle \begin{aligned}H = \mbox{s-}\sum_{n=0}^{+\infty} \hbar\omega (n+ 1/2)\langle \psi_n |\:\: \rangle \psi_n\end{aligned} $$

is that of a harmonic oscillator (see (3) Example 3.43). In this case \({\mathsf H}= L^2({\mathbb R}, dx)\). If the system is in contact with a heat bath at (absolute) temperature \((k_B\beta)^{-1} > 0\) (\(k_B\) being Boltzmann’s constant), its state is mixed and is described by the statistical operator

$$\displaystyle \begin{aligned}T_\beta = Z^{-1}_\beta e^{-\beta H}\:, \end{aligned}$$

where expanding the trace in the Hilbert basis of eigenvectors ψ n of H gives

$$\displaystyle \begin{aligned} \begin{array}{rcl} Z_\beta = tr(e^{-\beta H}) = \sum_{n=0}^{+\infty} e^{-\beta \hbar \omega (n +1/2)}= \frac{e^{-\beta \hbar \omega/2}}{1- e^{-\beta \hbar \omega} }\:,{}\end{array} \end{aligned} $$
(4.40)

the so-called canonical partition function. In other words

$$\displaystyle \begin{aligned}T_\beta =\mbox{s-} \sum_{n=0}^{+\infty} \frac{e^{-\beta \hbar \omega (n +1/2)}}{Z_\beta} \langle \psi_n |\:\: \rangle \psi_n\:.\end{aligned}$$

It is easy to check that \(T_\beta \in {\mathcal {S}}({\mathsf H})\). Furthermore the elements of Ran(T β) have the form

$$\displaystyle \begin{aligned} \sum_{n=0}^{+\infty} e^{-\beta \hbar \omega (n +1/2)} c_n \psi_n\quad \mbox{with }\ \sum_{n=0}^{+\infty}|c_n|{}^2 <+\infty.\end{aligned}$$

It is therefore evident that \(Ran(T_\beta) \subset D(H^m)\) for m = 1, 2, … and that \(|H|T_\beta = HT_\beta\) and \(H^2T_\beta \in {\mathfrak B}_1({\mathsf H})\). For instance

$$\displaystyle \begin{aligned}H^mT_\beta = \mbox{s-} \sum_{n=0}^{+\infty} \frac{(\hbar \omega)^m e^{-\beta \hbar \omega (n +1/2)}(n+1/2)^m}{Z_\beta} \langle \psi_n |\:\: \rangle \psi_n\:,\end{aligned}$$

so that \(H^mT_\beta \in {\mathfrak B}_1({\mathsf H})\) with

$$\displaystyle \begin{aligned}||H^mT_\beta || = \sup_{n\in {\mathbb N}} \frac{(\hbar \omega)^m e^{-\beta \hbar \omega (n +1/2)}(n+1/2)^m}{Z_\beta}= \frac{(\hbar \omega)^m e^{-\beta \hbar \omega/2}}{2^mZ_\beta}\quad \mbox{(the last equality if}\ \beta\hbar\omega \geq m\ln 3\mbox{)}\:, \end{aligned}$$
$$\displaystyle \begin{aligned}||H^mT_\beta||{}_1 = \sum_{n=0}^{+\infty} \frac{(\hbar \omega)^m e^{-\beta \hbar \omega (n +1/2)}(n+1/2)^m}{Z_\beta} < +\infty\:.\end{aligned}$$

Therefore we can apply Proposition 4.59. For instance

$$\displaystyle \begin{aligned}\langle H\rangle_{T_\beta} = tr(HT_\beta) = \frac{\hbar \omega}{Z_\beta}\sum_{n=0}^{+\infty} e^{-\beta \hbar \omega (n +1/2)}(n+1/2) = -\frac{1}{Z_\beta} \frac{d}{d\beta} Z_\beta = -\frac{d}{d\beta} \ln Z_\beta\:,\end{aligned}$$

where in the penultimate passage we have moved the derivative in β inside the sum, as allowed by standard elementary theorems of calculus, since the series converges and the derivatives’ series converges uniformly. \(\blacksquare \)
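The identities of this example lend themselves to a quick numerical check. The sketch below is plain Python in units where ħω = 1; the value β = 0.7, the truncation at 2000 terms and the finite-difference step are arbitrary choices made for illustration. It compares the closed form (4.40) with the truncated series and compares \(tr(HT_\beta)\) with \(-\frac{d}{d\beta}\ln Z_\beta\).

```python
import math

def partition_closed(beta):
    """Closed form (4.40) of Z_beta in units where hbar * omega = 1."""
    return math.exp(-beta / 2) / (1 - math.exp(-beta))

def partition_series(beta, n_max=2000):
    """Truncated series for Z_beta = tr(exp(-beta H))."""
    return sum(math.exp(-beta * (n + 0.5)) for n in range(n_max))

def mean_energy_series(beta, n_max=2000):
    """Truncated series for tr(H T_beta)."""
    weighted = sum((n + 0.5) * math.exp(-beta * (n + 0.5))
                   for n in range(n_max))
    return weighted / partition_series(beta, n_max)

def mean_energy_derivative(beta, h=1e-5):
    """-d/dbeta ln Z_beta approximated by a central finite difference."""
    upper = math.log(partition_closed(beta + h))
    lower = math.log(partition_closed(beta - h))
    return -(upper - lower) / (2 * h)

beta = 0.7   # hypothetical inverse temperature, in units of 1/(hbar * omega)
z_series = partition_series(beta)
z_closed = partition_closed(beta)
e_series = mean_energy_series(beta)
e_derivative = mean_energy_derivative(beta)
print(z_series, z_closed, e_series, e_derivative)
```

The two values of the partition function agree to rounding accuracy, and the two values of the mean energy agree within the error of the finite difference.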