
4.1 Inaccessible Conceptual Variables and Quantum Theory

The statistical literature is full of discussions on how to do inference, but contains very little on the choice of question to do inference on in some given situation. These different questions may be conflicting, even complementary. In the following sections I will start by formalizing a way in which the discussion of such complementary questions may be addressed in the extreme case where it is only possible to raise one out of many different possible questions at a time. Each such question will be an epistemic question ‘What is θ?’ for some e-variable θ, and I will assume that the epistemic process ends by giving some information about θ, in the simplest case a complete specification: θ = u k.

The concept of an epistemic process is taken to be very wide in this book. In addition to statistical questions concerning a parameter θ, we can think of questions like: How many sun hours will there be here tomorrow? At the outset, to address this epistemic question will involve meteorological expertise and a lot of data from similar situations, but tomorrow the question can be answered by just counting the number of sun hours. Both these processes will be seen as epistemic processes.

However, when it comes to the parameters/e-variables of the epistemic processes, I will often make more specific assumptions. I will then take generalized experiments as point of departure, that is, I assume that there in each setting exist data z and a context τ such that the assumptions (1) and (2) of Sect. 3.2 are satisfied. For the most part, z and τ will be implicit in the discussion, but they will be there. The e-variables are also assumed to be associated with some actor (observer) or with a group of communicating actors.

So far I have assumed that each conceptual variable relevant to an epistemic process is accessible, that is, it can be estimated or given a value with arbitrary accuracy by any experiment. In Helland (2006, 2008, 2010) several situations with inaccessible conceptual variables were described (see also below), and it was indicated that such situations in special cases could form a link to important parts of quantum theory. I consider this way of thinking to be essential as a step towards obtaining a unification of epistemic science, and also as an attempt to give an alternative background for the—from a statistical point of view and also from the layman’s point of view—very formal language that one finds in textbooks and in scientific publications, both within quantum physics and in the mathematical traditions developed from this. In the following sections a less formal approach will be presented. Compared to my earlier publications, the discussion here will hopefully give both a simpler and a more complete treatment of my approach towards quantum mechanics.

In statistics, the parameter concept is connected to a hypothetical population of items. My e-variables are intended also for situations where we have a single item or a few items, and a human subject or a group of subjects use these variables in making statements about the item(s). This is crucial for my epistemic interpretation of quantum mechanics, an interpretation which I also share with the Bayesian quantum foundation school (QBism); see below.

The concept of an e-variable will be very important in this book. Recall that it is any conceptual variable used in the epistemic process. In the same way as a parameter in statistics is any property of a hypothetical population, an e-variable can in principle be any property of a population, a single unit or a group of units. Like a parameter, it is a theoretical variable before the epistemic process, but after the process the observer or the set of communicating observers in question have some information about the value of the e-variable. An e-variable is a property of a unit or a set of units. A modern view of quantum theory and particle physics, see Kuhlmann (2013), reduces everything to properties and relations.

Quantum theory has a long history starting with the work of several eminent physicists in the beginning of the previous century, via the formalization made by von Neumann (1932) to the rather intense debate on quantum foundation that we see today. Several good books on quantum theory exist, for instance Ballentine (1998). Interpretations of the theory have been given by many authors, but it has also been argued that no interpretation is needed; see Fuchs and Peres (2000). Several authors have derived quantum theory from a few explicit or implicit physical assumptions; see Hardy (2001), Chiribella et al. (2010), Masanes (2010), Fields (2011), Fivel (2012) and Casinelli and Lahti (2016). There is also a group of quantum foundation researchers working towards a link with Bayesian inference; see Caves et al. (2002), Schack (2006), Timpson (2008), Fuchs (2010), Fuchs and Schack (2011) and Fuchs et al. (2013). The use of quantum information theory in the exploration of the foundation has also recently proved to be very useful, see Fuchs (2002). The present work has much in common with these schools, but I find it fruitful to maintain a broader link to statistics, in particular to allow a broader view on statistical inference than just the Bayesian view. In this way I will argue for a foundation which is purely epistemological: A general approach for going from experienced data to information about the nature behind these data. I will discuss connections to the quantum Bayesian interpretation later; see also Chap. 1.

One very obvious case of an inaccessible conceptual variable arises in connection with counterfactual reasoning. Assume a single medical patient and let the doctors have the choice between two mutually exclusive treatments. Let θ i be the time for this patient until recovery when treatment i is used (i = 1, 2), and let ϕ = (θ 1, θ 2). Then θ 1 or θ 2 can be predicted before the treatment is applied, and each of them can be determined precisely after some time period, but ϕ is inaccessible, that is, there is no procedure by which ϕ can be given a value with arbitrary accuracy at any time for a single patient by any medical doctor, by any scientist or by any observer. This can be amended by considering, instead of one patient, large homogeneous groups of patients, as is done in standard statistical texts, but in practice there is a limit to how homogeneous a group of patients can be. And such concepts may be of interest for a single patient, too.

Here are two other examples of inaccessible conceptual variables:

  • We want to measure some quantity θ 1 with a very accurate apparatus which is so fragile that it is destroyed after a single measurement. There is another quantity θ 2 which can only be found by dismantling the apparatus, after which it cannot be repaired. The vector ϕ = (θ 1, θ 2) is again inaccessible.

  • Assume that two questions are to be asked to a single individual at some given moment, and that we know that the answer will depend on the order in which the questions are posed. Let the e-variable (θ 1, θ 2) be the answers when the questions are posed in one order, and let the answers be (θ 3, θ 4) when the questions are posed in the opposite order. Then the vector ϕ = (θ 1, θ 2, θ 3, θ 4) is inaccessible.

Now go to the quantum mechanical situation. It is well known that the position θ 1 of a particle can be measured accurately in some experiments and its momentum θ 2 can be measured accurately in other experiments, but that the vector ϕ = (θ 1, θ 2) is inaccessible. Similarly, the spin vector ϕ of a particle is inaccessible, but its component θ a in some fixed, determined direction a is possible to measure in a suitable experiment.

In general, let ϕ = (θ 1, θ 2) be inaccessible. Then different experimental settings are needed to measure θ 1 and θ 2. In the words of Niels Bohr, the variables θ 1 and θ 2, which I call e-variables, are complementary. The concept of complementarity was crucial to Bohr, and it has been crucial to the foundation of quantum mechanics even before its formal apparatus was developed. In the same way, the concept of an inaccessible conceptual variable will be crucial in the further development of this book.

From a statistical point of view: Inaccessible parameters also occur in linear models of non-full rank, often used in the case of unbalanced data, cp. Searle (1971), and in the analysis of designed experiments where only some contrasts can be estimated. Also, in regression models where the number of variables by necessity is larger than the number of observations, the regression parameter is an inaccessible parameter. In my opinion a more complete theory of statistical inference is definitely obtained if we allow for inaccessible conceptual variables.

It is a crucial fact that the inaccessible conceptual variables ϕ are abstract variables in some mathematical space and that operations such as group actions may be made on this space. This is the case with the counterfactual example above, where a group action such as a change of time scale can be made. See the summary of group theory in Appendix B. In general, let ϕ vary in a set Φ. Then the group of endomorphisms on Φ is the group of all possible transformations of elements of Φ. This group always exists from a mathematical point of view. In my later approach towards quantum mechanics, I will choose a fixed subgroup G of the group of endomorphisms acting on the space Φ of inaccessible conceptual variables. Important subgroups of G again, are the groups G a, where G a corresponds to all transformations of the values of θ a = θ a(ϕ) in the space Θa.

What is important to note, however, is that I will not regard the inaccessible conceptual variables as physical variables, and they do not take concrete values, so I am not developing a hidden variable theory of the kind that has been much debated in the physical literature over the years. Also, the e-variables/ parameters are not hidden variables, but closely connected to the epistemic process. Note that the parameters of statistics exist only in our minds.

Historically, an example of a hidden variable theory is David Bohm’s dual wave-particle theory, and John Bell (see Bell 1987) proved that this theory is non-local. In fact, Bell proved much more. His famous theorem states that any realistic theory consistent with quantum mechanics must be non-local. This result has been very important in discussions among physicists in recent years. Bell’s theorem is proved using what is called the Einstein-Podolsky-Rosen experiment and Bell’s inequality, concepts which will be discussed later in this book. One point for me here is that I do not want to develop a non-local theory, that is, a theory where communication is made by signals traveling faster than the speed of light. I am then instead forced to take a closer look at the concept of realism. This has also been done recently in a very convincing way by Nisticò and Sestito (2011). In that paper they take as a point of departure the criterion of reality as formulated by Einstein et al. (1935):

Criterion of Reality

If, without in any way disturbing a system, we can predict the value of a physical quantity, then there exists an element of physical reality corresponding to this physical quantity.

Following arguments from Bohr’s discussion of Einstein et al. (1935) they make the case for a strict interpretation of this criterion:

Strict Interpretation

To ascribe reality to P, the measurement of an observable whose outcome would allow for the prediction of P, must actually be performed.

Nisticò and Sestito (2011) go on and formulate an extension of quantum correlation which is consistent with the strict interpretation, and using this they show that Bell’s argument and several related arguments in the literature fail when realism is interpreted in this strict way. Thus the possibility turns out to be open to interpret the non-locality theorems in the physical literature as arguments supporting the strict criterion of reality, rather than as a violation of locality.

Since the present book is theoretical and not experimental, I will have to modify Nisticò and Sestito’s requirement of strict interpretation slightly: ‘…a description of how the measurement can be actually performed, must be given.’ It is important that my conceptual variables are thought of as defined by one person or a group of persons, and as tied to the experimental data that he/she/they are able to obtain.

In other papers, Bell’s theorem is interpreted as saying that quantum physics must necessarily violate either the principle of locality or counterfactual definiteness. Counterfactual definiteness is defined as the ability to speak with meaning of definiteness of results of measurements that have not been performed (i.e., the ability to assure the existence of objects, and properties of objects, even when they have not been measured). In this book it is crucial that I do not assume counterfactual definiteness. All my conceptual variables are assumed to be defined by some person(s), and these conceptual variables will not necessarily be such that results of measurements not performed will have meaning. Here is a simple example: At first sight, one of the statements ‘I have something on my lap’ and ‘I do not have anything on my lap’ must be true. But if I am standing, neither of these statements is true. The logical status of statements must depend on the context.

In my formulation, I will look upon the accessible e-variables as variables connected with experiments which can actually be imagined to be performed by some person. This person will have a certain context for his experiment. It is possible that another person, who has no communication with the first one, has a different context and uses different e-variables to formulate his observations, therefore getting seemingly conflicting predictions. But as soon as communication is restored, there must be no conflict any more. To make this precise: The two persons must then make non-conflicting predictions if they agree on a common context, and they must agree on observed results as long as they both have observed results.

In this chapter I will assume ideal experiments, where I will not distinguish between data and corresponding e-variables. I will come back to more realistic experiments with data in the next chapter.

4.2 The Maximal Symmetrical Epistemic Setting

I proceed to discuss a setting from which I will show that essential parts of the formalism of quantum mechanics can be derived under certain technical conditions. From my point of view this is nothing but a special situation with an inaccessible conceptual variable, where I focus upon accessible sub conceptual variables and where symmetry is introduced by natural group actions. The purpose at this particular point is not to derive all aspects of quantum mechanics, only enough to see that the e-variable concept is useful also in this connection, so that we can obtain an interpretation where there is a link to the ordinary statistical theory of estimation. The rest of this chapter will involve some technical discussions, and may be just skimmed on a first reading of the book. The results of these discussions are crucial, however: To which extent can the Hilbert space formalism be derived from simple assumptions in the epistemic process setting? The next chapter will begin by simply assuming this formalism.

Let in general ϕ be an inaccessible conceptual variable taking values in some topological space Φ, and let λ a = λ a(ϕ) be accessible functions for a belonging to some index set \(\mathcal {A}\). I will repeat that a conceptual variable is accessible if it in the given context can be estimated with arbitrary accuracy by some experiment. In other words, the λ a’s are e-variables. Technically I will without further mention assume that all functions defined on Φ are Borel-measurable. To begin with, I will assume that the functions λ a are maximal, and also that there is a one-to-one functional relation between them. This is made precise below. In general, transformations of Φ by group elements g may be defined.

Assumption 4.1

  a) Consider the partial ordering defined by α ≤ β iff α = f(β) for some function f. Under this partial ordering each λ a(ϕ) is maximally accessible, that is, (1) λ a(ϕ) is accessible, an e-variable; (2) if λ a(ϕ) = f(μ(ϕ)) for a non-invertible function f, then μ(ϕ) is inaccessible.

  b) For a ≠ b there is an invertible transformation g ab such that λ b(ϕ) = λ a(g ab(ϕ)).

Note that the partial ordering in a) is consistent with accessibility: If β is accessible and α = f(β), then α is accessible. Also, ϕ is an upper bound under this partial ordering. The existence of maximal accessible conceptual variables follows then from Zorn’s lemma.

To be clear, no summation convention is used in b). This assumption induces a one-to-one functional relation between λ a and λ b.

In this abstract setting, the inaccessible variable ϕ and the accessible variables λ a(ϕ) can be anything. However, to begin with it might be useful to have the following physical example in mind: Let ϕ be the spin or angular momentum vector for a particle, then focus upon some direction a in space, and let λ a(ϕ) be the spin or angular momentum component in that direction. Let the group elements g consist of rotations of ϕ in space. It is useful to think through what Assumption 4.1 means in this setting. An important point of departure is that ϕ only exists in our minds. A closer discussion of this example will be given below.

Consider in general the situation where the vector ϕ = (λ 1, λ 2) is inaccessible. Then the statement that λ 1 and λ 2 are maximally accessible is equivalent to the statement that they are complementary in Niels Bohr’s sense. The concept of complementarity is extremely important in quantum mechanics. In Sect. 6.3 I will discuss the concept in other contexts as well.

Below, I will often single out a particular index \(0\in \mathcal {A}\). Then (a), given (b), can be formally weakened to the assumption that λ 0(ϕ) is maximally accessible, and b) can be weakened to the existence for all a of an invertible transformation g 0a such that λ a(ϕ) = λ 0(g 0a(ϕ)). Take \(g_{ab}=g_{0a}^{-1}g_{0b}\).

In the example above with counterfactual medical treatments, we can take λ a = θ 1, λ b = θ 2, ϕ = (λ a, λ b) and g ab((λ a, λ b)) = (λ b, λ a). In general, when the transformation of Assumption 4.1b) exists, it is usually easy to see how it can be chosen.
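As a minimal sketch of the relation λ b(ϕ) = λ a(g ab(ϕ)) in the counterfactual example, one can model ϕ as an ordinary pair and g ab as the swap of its components. The numerical values and the function names `lam_a`, `lam_b` and `g_ab` are assumptions made only for illustration; in the text ϕ is inaccessible for a single patient and never takes a concrete value.

```python
# Hypothetical illustration: phi is modelled as a pair (theta1, theta2) of
# recovery times; in the text phi itself is inaccessible for a single patient.

def lam_a(phi):
    """lambda^a: recovery time under treatment 1 (first component)."""
    return phi[0]

def lam_b(phi):
    """lambda^b: recovery time under treatment 2 (second component)."""
    return phi[1]

def g_ab(phi):
    """The swap g_ab((x, y)) = (y, x); it is its own inverse."""
    return (phi[1], phi[0])

# Made-up values, used only to exercise the identities.
phi = (12.0, 30.0)

# lambda^b(phi) = lambda^a(g_ab(phi))
relation_holds = lam_b(phi) == lam_a(g_ab(phi))

# g_ab is invertible, since applying it twice gives the identity.
is_involution = g_ab(g_ab(phi)) == phi
```

The swap is one simple choice of g ab; the sketch only checks that this choice satisfies the functional relation of Assumption 4.1b).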

Even though ϕ is inaccessible, it is possible to operate on ϕ with functions, in particular group actions. The group of endomorphisms on Φ, that is, all transformations of elements ϕ, always exists from a mathematical point of view, and one can imagine many subgroups of this group. Some of these will now be defined.

Definition 4.8

For each a, let \(\tilde {G}^a\) be the group of endomorphisms on Λa, the space upon which λ a varies. For \(\tilde {g}^a \in \tilde {G}^a\) let g a be any transformation on Φ for which \(\tilde {g}^a \lambda ^a (\phi )=\lambda ^a (g^a \phi )\).

Note that this makes sense in the spin/angular momentum case: For some fixed integer or half-integer number j, the possible values of λ a(ϕ) are −j, −j + 1, …, j − 1, j. Start with some vector ϕ, and fix a plane through this vector. Then by suitable rotations in this plane, λ a(ϕ) will change from one of these values to another arbitrary value. λ a(ϕ) is fixed when this plane is rotated with fixed ϕ.

It is easily verified in general that

  1. For fixed \(\tilde {g}^a\) the transformations g a form a group.

  2. For fixed a the transformations g a form a group G a.

In simpler terms, the group G a is the group transforming values of λ a into the same or other values of this e-variable, and the corresponding group \(\tilde {G}^a\) is the group of all transformations of these values.

The group G a can be characterized as follows. Let a function η on Φ be called permissible (Helland 2010) with respect to a group H if η(ϕ 1) = η(ϕ 2) implies η(hϕ 1) = η(hϕ 2) for all h ∈ H. Then G a is the maximal group under which λ a is permissible.
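The permissibility condition can be checked by brute force on a small finite space. The following sketch assumes a hypothetical four-point space Φ and a hypothetical two-valued variable `eta`; neither is taken from the text at this point. It collects all bijections of Φ under which `eta` is permissible.

```python
from itertools import permutations

PHI = (1, 2, 3, 4)  # hypothetical finite space Phi

def eta(phi):
    """Hypothetical two-valued variable: 1 on {1, 2}, 0 on {3, 4}."""
    return 1 if phi in (1, 2) else 0

def is_permissible(eta, group, space):
    """eta(p1) == eta(p2) must imply eta(h p1) == eta(h p2) for all h."""
    for h in group:
        for p1 in space:
            for p2 in space:
                if eta(p1) == eta(p2) and eta(h[p1]) != eta(h[p2]):
                    return False
    return True

# All 24 bijections of PHI, each stored as a dict phi -> h(phi).
all_perms = [dict(zip(PHI, image)) for image in permutations(PHI)]

# Keep the transformations under which eta is permissible on their own.
maximal = [h for h in all_perms if is_permissible(eta, [h], PHI)]
```

In this toy case `maximal` consists of the bijections that map each level set of `eta` onto a level set, and `eta` is permissible with respect to that collection but not with respect to the full permutation group.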

Obvious consequences of Definition 4.8 are that \(\tilde {G}^a\) is transitive over Λa and that G a is transitive over Φ.

Now single out a fixed index \(0\in \mathcal {A}\).

Definition 4.9

Let G be the group of transformations generated by G 0 and the transformations g 0a, \(a\in \mathcal {A}\).

It is easily verified that \(G^a = g_{0a}^{-1} G^0 g_{0a}.\) Together with \(g_{ab}=g_{0a}^{-1}g_{0b}\) this implies that G also is the group generated by G a, \(a\in \mathcal {A}\) and g ab, \(a,b \in \mathcal {A}\).

Now I want to introduce the further

Assumption 4.2

  a) The group G is a locally compact topological group, and satisfies weak assumptions such that an invariant measure on Φ exists (see Appendix B).

  b) The group generated by products of elements of G a, G b, …; \(a,b,\ldots \in \mathcal {A}\) is equal to G.

Assumption 4.2a) is a technical one, needed in the next section. Note that G is defined in terms of transformations upon Φ, so that the topology must be introduced in terms of these transformations. Technically this can be achieved by assuming Φ to be a metric space with metric d, and letting g n → g for instance if \(\sup _{\phi } d(g_n(\phi ), g(\phi )) \to 0\). Concerning Assumption 4.2b), it follows from \(g^a g^b \ldots =g_{0a}^{-1}g^{0}g_{0a}\,g_{0b}^{-1}g^{0^{\prime }}g_{0b}\ldots \), where g a ∈ G a, g b ∈ G b, … and \(g^{0}, g^{0^{\prime }},\ldots \in G^0\), that the group of products is contained in G. That it is equal to G, is an assumption on the richness of the index set \(\mathcal {A}\) or the richness of G 0.

The setting described here, where Assumptions 4.1 and 4.2 are satisfied, includes many quantum mechanical situations including spins and systems of spins. I will call it the maximal symmetrical epistemic setting. Later I will also sketch a macroscopical situation where the assumptions of the maximal symmetrical epistemic setting are satisfied. However, the focus in the present book will be quantum-mechanical.

An important special case is when each Λa is discrete. Then \(\tilde {G}^a\) is the group of permutations of elements of Λa, and G a is the group of all transformations on Φ which induce permutation of Λa. In this situation I will later define a state of the system as a focused question: “What is the value of λ a?” together with a definite answer: “λ a = u k”. Under an additional technical assumption on the group structure, I will prove that this leads to a link to the ordinary Hilbert space formalism of quantum mechanics.

Example 4.15

Model the spin vector of a particle such as the electron by a vector ϕ, an inaccessible conceptual variable. More generally, we can let ϕ denote the total spin/angular momentum vector for any particle or system of particles. Let Φ be the sphere corresponding to a fixed norm ∥ϕ∥. Let G be the group of rotations of this vector.

Next, choose a direction a in space, and focus upon the spin component in this direction:

$$\displaystyle \begin{aligned}\zeta^a =\|\phi\|\mathrm{cos}(\phi,a).\end{aligned}$$

Associate ζ a(ϕ) with the group F a of rigid rotations around a together with a reflection in a plane through the origin perpendicular to a. This is the largest subgroup of G with respect to which ζ a is a permissible subvariable. (For a closer discussion of the concept of permissible subparameter; see Helland 2010, Chapter 3.) The actions of the group \(\tilde {F}^a\) on ζ a are just a change of sign together with the identity.

Finally, introduce model reduction of the kind discussed in Sect. 2.2: The orbits of \(\tilde {F}^a\) as acting on ζ a are given as two-point sets {±c} together with the single point 0. A maximal model reduction is to one such orbit. Later I will give arguments to the effect that we want to reduce to a set of orbits indexed by an integer or half-integer j, and that we will let this reduced set of orbits be

$$\displaystyle \begin{aligned}-j, -j+1,\ldots,j-1,j,\end{aligned}$$

this together with ∥ϕ∥² = j(j + 1).
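For concreteness, the reduced set of values and the accompanying norm condition can be enumerated with exact rational arithmetic. This is a small sketch; the function names `spin_values` and `squared_norm` are illustrative choices and not notation from the text.

```python
from fractions import Fraction

def spin_values(j):
    """Return the list -j, -j+1, ..., j-1, j for integer or half-integer j >= 0."""
    j = Fraction(j)
    if j < 0 or (2 * j).denominator != 1:
        raise ValueError("j must be a non-negative integer or half-integer")
    return [-j + k for k in range(int(2 * j) + 1)]

def squared_norm(j):
    """The accompanying constraint ||phi||^2 = j(j + 1)."""
    j = Fraction(j)
    return j * (j + 1)

half = spin_values(Fraction(1, 2))   # two values: -1/2 and 1/2
one = spin_values(1)                 # three values: -1, 0, 1
```

For each admissible j the list has 2j + 1 entries, so j = 1∕2 gives the two-valued case.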

Now fix j and let λ a be the conceptual variable ζ a reduced to this set of orbits of \(\tilde {F}^a\). This is assumed to be the maximal accessible e-variable. Define the transformations g ab, and define the groups \(\tilde {G}^a\), G a and G as in the maximal epistemic setting. It is easy to see that the group G is as before. The group \(\tilde {G}^a\) is the group of permutations of values of λ a. The group elements g a can be seen as products of two kinds of elements. The first kinds are rotations around a. The second kinds are suitable rotations of each ϕ in a plane through a and ϕ.

We can prove that the general assumptions of this section are satisfied. In the case j = 0 we must define G = G 0 to be the trivial group. Otherwise G is the group of all rotations of vectors ϕ, which is obviously compact and satisfies Assumption 4.2a). Here is an argument leading to the proof of Assumption 4.2b): Given a and b, a transformation g ab sending λ a(ϕ) onto λ b(ϕ) can be obtained by a reflection in a plane P perpendicular to a plane containing the two vectors a and b, where P contains the mid-line between a and b. More precisely: Let d be the orthogonal to the midline between a and b in the plane containing the two vectors, let λ d be the spin component along d and let g d be the group element changing sign of λ d. Then g ab = g d.
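The geometric construction of g ab can be checked numerically for the unreduced components ζ a = ∥ϕ∥cos(ϕ, a). The sketch below assumes unit direction vectors a and b and treats ϕ as an ordinary vector in three dimensions, which is an illustrative simplification; the particular numbers are made up. The reflection that changes the sign of the component along d is written as ϕ − 2(ϕ·d)d.

```python
import math

def dot(u, v):
    return sum(x * y for x, y in zip(u, v))

def normalize(u):
    n = math.sqrt(dot(u, u))
    return tuple(x / n for x in u)

def reflect(phi, d):
    """Change the sign of the component of phi along the unit vector d."""
    c = 2 * dot(phi, d)
    return tuple(p - c * x for p, x in zip(phi, d))

# Hypothetical directions and spin vector, chosen only for illustration.
a = normalize((1.0, 0.0, 0.0))
b = normalize((1.0, 2.0, 2.0))
phi = (0.3, -1.1, 0.7)

# d is orthogonal to the midline a + b and lies in the plane of a and b.
d = normalize(tuple(x - y for x, y in zip(a, b)))

component_b = dot(phi, b)                     # component of phi along b
component_a_after = dot(reflect(phi, d), a)   # component along a after g_d
```

Up to rounding, `component_b` and `component_a_after` agree, which is the relation λ b(ϕ) = λ a(g ab(ϕ)) before the reduction to discrete values. The same reflection also carries the direction a onto b.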

The case with one orbit and c = 1∕2 corresponds to electrons and other spin 1∕2 particles. The direction defined by a = 0 is some arbitrary fixed direction.

In general, the assumptions of this section may be motivated in a similar manner: First, a conceptual variable ζ a = ζ a(ϕ) is introduced for each a through a chosen focusing, and a suitable group acting on ζ a is defined. Then λ a is defined as a reduction of ζ a to a set of orbits of this group. The essence of Assumption 4.1 is that it is this λ a which is maximally accessible. This may be regarded as the quantum hypothesis.

This reasoning works for variables like spin and angular momentum, in general for many discrete e-variables. For theoretical position ξ and theoretical momentum π of a particle, let ϕ = (ξ, π). Then one can again introduce groups and group elements, and the assumptions of this section, except Assumption 4.2b), are satisfied for this case. A special discussion of continuous e-variables is carried out in Sect. 5.2 below.

4.3 The Toy Model of Spekkens

Discussions of the interpretation of quantum mechanics have taken place nearly since its introduction at the beginning of the last century. In particular, researchers have disagreed on how the quantum state should be interpreted. Should it be seen as a real state of nature (the ontic view) or does it only represent our knowledge of some focused aspect of nature (the epistemic view)? In my opinion, some synthesis here should be sought, but one should start with an observer and the epistemic process connected to this observer in his particular context. This will give an easy interpretation of the collapse of the wave packet during measurement, and it will also solve paradoxes like that of Schrödinger’s cat and that of Wigner’s friend. These aspects will be further discussed later in the book after Born’s formula and the Schrödinger equation have been introduced and motivated from my point of view. The ontic interpretation arises in my world view from a hypothetical situation where all potential observers communicate and arrive at a common context.

As it stands now, however, the quantum community is divided. Recently there have appeared in the literature certain no-go theorems which seem to support the ontic view. All these theorems are deeply founded, but they rely on certain assumptions. Under these assumptions they show that a pure epistemic view leads to inconsistencies. In particular Pusey et al. (2012) take as a point of departure a certain assumption of separability. This is weakened by Hall (2011) to an assumption of compatibility. Hardy (2012) introduced a different assumption of ontic indifference. The common denominator of these papers is that they show that under the specific assumptions the probability distributions over the ontic states corresponding to different quantum states cannot overlap. See also my discussion of the PBR theorem in Sect. 1.4. A crucial assumption is that the properties of the system can be defined by some state concept.

The toy model of Spekkens (2007) is based on a principle that restricts the amount of knowledge an observer can have about reality. A wide variety of quantum phenomena were found to have analogues within this toy theory, and this can be taken as an argument in favour of the epistemic view of quantum states.

In the simplest version of the toy model, we have one elementary system. This system can be in one of the four ontic states 1, 2, 3 or 4, but our knowledge of this is in principle restricted. We can only know one of the following six epistemic states: (a) The ontic state is 1 or 2; (b) it is 3 or 4; (c) it is 1 or 3; (d) it is 2 or 4; (e) it is 1 or 4; or (f) it is 2 or 3. These are the epistemic states of maximal knowledge.

The ontic base of the state (a) is {1, 2}, etc. If the intersection of the ontic bases of a pair of epistemic states is empty, then those states are said to be disjoint. Thus (a) and (b) are disjoint, (c) and (d) are disjoint, and (e) and (f) are disjoint. There is a correspondence with certain basis vectors of the two-dimensional complex Hilbert space, where disjointness corresponds to orthogonality in the Hilbert space. For those who know the Bloch sphere representation of that Hilbert space, the pairs of disjoint epistemic states can be pictured on the intersections of three orthogonal axes with that sphere.
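The six epistemic states and their disjointness relations are small enough to enumerate directly. A minimal sketch, with the labels (a) to (f) taken from the listing in the text and the variable names chosen only for illustration:

```python
from itertools import combinations

ONTIC = (1, 2, 3, 4)

# Labels (a)-(f) follow the listing in the text.
EPISTEMIC = {
    "a": frozenset({1, 2}), "b": frozenset({3, 4}),
    "c": frozenset({1, 3}), "d": frozenset({2, 4}),
    "e": frozenset({1, 4}), "f": frozenset({2, 3}),
}

# The six states are exactly the two-element subsets of the ontic states.
all_pairs = {frozenset(p) for p in combinations(ONTIC, 2)}

# Two epistemic states are disjoint when their ontic bases do not intersect.
disjoint_pairs = sorted(
    (x, y)
    for x, y in combinations(sorted(EPISTEMIC), 2)
    if not (EPISTEMIC[x] & EPISTEMIC[y])
)
```

The enumeration returns exactly the three disjoint pairs named in the text, one for each of the three axes in the Bloch sphere picture.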

Transformations of the epistemic states correspond to permutations of the ontic states. Thus the underlying group is the permutation group of four symbols, which has 24 elements. Each permutation induces a map between the epistemic states. In the Hilbert space correspondence, the even permutations correspond to unitary transformations, and the odd permutations correspond to anti-unitary transformations.
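The counting statements above, and the fact that every permutation of the ontic states induces a map between the six epistemic states, can be verified by enumeration. The sketch below is an illustration only; the helper names `is_even` and `induced_map` are assumptions, and the correspondence with unitary and anti-unitary operators is not modelled.

```python
from itertools import permutations

ONTIC = (1, 2, 3, 4)
STATES = {
    "a": frozenset({1, 2}), "b": frozenset({3, 4}),
    "c": frozenset({1, 3}), "d": frozenset({2, 4}),
    "e": frozenset({1, 4}), "f": frozenset({2, 3}),
}
LABEL = {base: name for name, base in STATES.items()}

def is_even(perm):
    """Parity via the inversion count of the image tuple perm."""
    inversions = sum(
        1
        for i in range(len(perm))
        for j in range(i + 1, len(perm))
        if perm[i] > perm[j]
    )
    return inversions % 2 == 0

def induced_map(perm):
    """Map each epistemic label to the label of its image under perm."""
    act = dict(zip(ONTIC, perm))
    return {name: LABEL[frozenset(act[s] for s in base)]
            for name, base in STATES.items()}

perms = list(permutations(ONTIC))
even_perms = [p for p in perms if is_even(p)]
```

Here a permutation is stored as the tuple of images of 1, 2, 3, 4. Each induced map is a bijection on the labels (a) to (f), and half of the 24 permutations are even.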

In my terminology, the system can be described by an inaccessible conceptual variable ϕ which is a vector whose three components are accessible e-variables:

$$\displaystyle \begin{aligned}\phi =(\lambda^a,\lambda^c, \lambda^e).\end{aligned}$$

Here λ i is the indicator of the event that the epistemic state is i. Each λ i takes the value 1 or 0. If λ a = 1, say, the ontic state is either 1 or 2; if λ a = 0, it is either 3 or 4. A complete knowledge of ϕ is equivalent to a knowledge of the ontic state, which is impossible in the Spekkens toy model.

Each λ i is a maximal accessible e-variable. The event λ a = 1 is taken into the event λ a = 0 by the even permutation g a = (13)(24), written in cycle notation. This together with the identity generates the group G a. Similarly G c and G e are generated. The e-variable λ c is taken into the e-variable λ a by the even permutation g ca = (123)(4). This permutation can also be written as g af = g be = g db = g fc = g ed if obvious new e-variables are introduced. Similarly, the group elements g ac, g ae, g ea, g ce and g ec are even permutations. The group G is the group of all even permutations. All assumptions of the maximal symmetrical epistemic setting are satisfied except Assumption 4.2b). Thus the Spekkens toy model cannot be seen as a special case of the maximal symmetrical epistemic setting, but the simplest case of the toy model is closely related to this.
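The cycle-notation claims in this paragraph can be checked mechanically. The sketch below reads a cycle (123) as 1 → 2 → 3 → 1, which is the convention under which the stated images come out as in the text. It also closes the two named permutations under composition; taking only these two as generators is an assumption of the sketch and not the text's definition of G.

```python
BASE = {"a": {1, 2}, "b": {3, 4}, "c": {1, 3},
        "d": {2, 4}, "e": {1, 4}, "f": {2, 3}}

def from_cycles(*cycles):
    """Build a permutation of {1, 2, 3, 4} as a dict from disjoint cycles."""
    perm = {s: s for s in (1, 2, 3, 4)}
    for cycle in cycles:
        for i, s in enumerate(cycle):
            perm[s] = cycle[(i + 1) % len(cycle)]
    return perm

def image(perm, name):
    """Label of the epistemic state obtained by applying perm to state `name`."""
    target = {perm[s] for s in BASE[name]}
    return next(k for k, v in BASE.items() if v == target)

def compose(p, q):
    """The permutation 'first q, then p'."""
    return {s: p[q[s]] for s in q}

g_a = from_cycles((1, 3), (2, 4))
g_ca = from_cycles((1, 2, 3))

# Close {g_a, g_ca} under composition to get the group they generate.
group = {tuple(sorted(g.items())) for g in (g_a, g_ca)}
while True:
    new = {tuple(sorted(compose(dict(p), dict(q)).items()))
           for p in group for q in group}
    if new <= group:
        break
    group |= new
```

With these definitions `image(g_a, "a")` is `"b"` and `image(g_ca, "c")` is `"a"`, and the generated group has 12 elements, which is consistent with the statement that G is the group of all even permutations.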

The next simplest case of the Spekkens toy model consists of two elementary systems. The main requirement from one system carries over: If one has maximal knowledge, then for every system, at every time, the amount of knowledge one possesses about the ontic state of the system at that time must equal the amount of knowledge one lacks. The following discussion is very brief and presupposes a knowledge of Spekkens (2007). There are sixteen ontic states: 1 ⋅ 1, 1 ⋅ 2, …, 4 ⋅ 4. It turns out that the valid epistemic states are of two types: The uncorrelated states exemplified by (a) 1 ⋅ 1, 1 ⋅ 2, 2 ⋅ 1 or 2 ⋅ 2, and the correlated states exemplified by (e) 1 ⋅ 1, 2 ⋅ 2, 3 ⋅ 3 or 4 ⋅ 4.

Turning to my terminology, the state (a) can be represented by the event λ a = 1, where λ a is the indicator of the epistemic state (a), an e-variable. The event λ a = 0 does not represent an epistemic state of maximal knowledge, however, so the following trick is called for: Let λ 1 = (λ a, λ b, λ c, λ d), where λ a is the indicator of the epistemic state (a), λ b is the indicator of the epistemic state (b): 3 ⋅ 1, 4 ⋅ 1, 3 ⋅ 2 or 4 ⋅ 2, λ c is the indicator of the epistemic state 1 ⋅ 3, 1 ⋅ 4, 2 ⋅ 3 or 2 ⋅ 4, and λ d is the indicator of the epistemic state 3 ⋅ 3, 3 ⋅ 4, 4 ⋅ 3 or 4 ⋅ 4. Allow λ 1 to take the values (1, 0, 0, 0), (0, 1, 0, 0), (0, 0, 1, 0) or (0, 0, 0, 1). Then λ 1 is a maximal e-variable, and the events λ 1 = u k are all valid epistemic states in the Spekkens toy model. A similar trick can be made for the correlated epistemic states.
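The construction of λ 1 can be made concrete with a short sketch. The four ontic bases are those listed in the paragraph above; the encoding of an ontic state i ⋅ j as a pair `(i, j)` and the names `states` and `lambda_1` are assumptions made for illustration.

```python
from itertools import product

# The sixteen ontic states i.j, encoded as pairs (i, j).
ontic = set(product((1, 2, 3, 4), repeat=2))

low, high = (1, 2), (3, 4)
states = {
    "a": set(product(low, low)),    # 1.1, 1.2, 2.1, 2.2
    "b": set(product(high, low)),   # 3.1, 4.1, 3.2, 4.2
    "c": set(product(low, high)),   # 1.3, 1.4, 2.3, 2.4
    "d": set(product(high, high)),  # 3.3, 3.4, 4.3, 4.4
}

def lambda_1(ontic_state):
    """The vector (lambda^a, lambda^b, lambda^c, lambda^d)."""
    return tuple(int(ontic_state in states[k]) for k in "abcd")

values = {lambda_1(s) for s in ontic}
print(sorted(values, reverse=True))
```

Since the four ontic bases partition the sixteen ontic states, λ 1 only takes the four one-hot values, and each of them corresponds to one of the uncorrelated epistemic states.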

In Spekkens (2007) transformations between the epistemic states are discussed in terms of permutations between the ontic states. It turns out that there are permutations that take each of λ a, λ b, λ c, λ d and λ e (with an obvious definition of the last one) into each of the others in this set. The transformations taking λ 1 into other similar maximal e-variables are transformations of four-vectors whose components are transformed by permutations. The transformations g 1 etc. can be written in a similar form, though at the outset they are only given by permutations of the components of the four-vector. Thus the group G is a subgroup of the group of transformations of four-vectors whose components are transformed by permutations of 16 elements, that is, a finite group. The assumptions of the maximal symmetrical epistemic setting are satisfied except Assumption 4.2b).

The fact that the Spekkens toy model has a valid epistemic interpretation and is strongly related to many phenomena of quantum mechanics, together with the fact that there is a link between this model and a modification of the maximal symmetrical epistemic setting, gives a strong indication that possible logical difficulties in interpreting the maximal symmetrical epistemic setting and relating it to quantum mechanics can be overcome. In the next section the formal apparatus of quantum mechanics will be reproduced from the maximal symmetrical epistemic setting under a certain technical condition.

4.4 The Hilbert Space Formulation

Take again as a point of departure the maximal symmetrical epistemic setting. The crucial step now towards the formalism of quantum mechanics is to define a Hilbert space, that is, a complete inner product space which serves as a state space in the formalism (see Appendix B). In ordinary quantum mechanics all observables are identified with operators on such a Hilbert space and every state is identified with a unit vector in the Hilbert space or more generally with a ray proportional to a unit vector. There is a large abstract general theory on this, well known to physicists, but largely unknown to statisticians and many other professionals. My goal here is to rederive this theory from the assumptions of the maximal symmetrical epistemic setting and possibly further assumptions. This may serve as introducing other scientists to the theory. The section is somewhat technical. It can be skimmed at the first reading, but it is in some sense essential for what I feel should be a way to understand ordinary quantum theory. However, it must only be seen as one out of several possible approaches towards the Hilbert space formalism. Several deep theories with a similar purpose exist; see references in Sect. 4.1.

4.4.1 Quantum Reconstruction

Fix \(0\in \mathcal {A}\), and let H be the Hilbert space

$$\displaystyle \begin{aligned}H=\{f\in L^2(\Phi,\rho):\ f(\phi)=\tilde{f}(\lambda^0(\phi))\ \text{for}\ \text{some}\ \tilde{f}\}.\end{aligned}$$

Here \(L^2(\Phi,\rho)\) is the set of all complex functions f on Φ such that \(\int_\Phi |f(\phi)|^2\rho(d\phi)<\infty\). Two functions f 1 and f 2 are identified if \(\int_\Phi |f_1(\phi)-f_2(\phi)|^2\rho(d\phi)=0\). From now on I will assume that the λ a’s are discrete. Then H is separable. If the λ a’s take d different values, H is d-dimensional. Since all separable Hilbert spaces are isomorphic, it is enough to arrive at the quantum formulation on this H.

Lemma 4.1

The values \(u_k^a\) of λ a can always be arranged such that \(u_k^a=u_k\) is the same for each a (k = 1, 2, …).

Proof

By Assumption 4.1

$$\displaystyle \begin{aligned}\{\phi:\lambda^b=u_k^b\}=\{\phi:\lambda^a(g_{ab}\phi)=u_k^b\}=g_{ba}(\{\phi:\lambda^a(\phi)=u_k^b\}).\end{aligned}$$

The sets in brackets on the left-hand side here are disjoint with union Φ. But then the sets in brackets on the right-hand side are disjoint with union g ba( Φ) =  Φ, and this implies that \(\{u_k^b\}\) gives all possible values of λ a.

Now I am able to formulate the main result of this section. In the next subsection, I will prove this result under an additional technical assumption. An open question is to find exactly the conditions under which this theorem is valid.

Theorem 4.1

  a) For every a and u k, there is a vector |a;k〉 ∈ H associated with the indicator function I(λ a(ϕ) = u k). The mapping I(λ a(ϕ) = u k) → |a;k〉 is invertible in the sense that |a;k〉 ≠ |b;j〉 for all a, b, j, k except in the trivial case a = b, j = k. This inequality is interpreted to mean that there is no phase factor \(e^{i\gamma}\) such that \(|a;k\rangle = e^{i\gamma}|b;j\rangle\).

  b) For each a the vectors |a;k〉 form an orthonormal basis for H.

This gives us the possibility to interpret the vectors |a;k〉, corresponding to the indicators I(λ a(ϕ) = u k), as follows:

  (1) The question ‘What is the value of λ a?’ has been focused on.

  (2) Through an epistemic process we have obtained the answer ‘λ a = u k’.

When a ket vector is defined, a corresponding bra vector can be defined. The operator corresponding to λ a can be defined as

$$\displaystyle \begin{aligned}A^a =\sum_k u_k |a;k\rangle\langle a;k|.\end{aligned}$$

In the maximal setting this has non-degenerate eigenvalues.
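As a minimal numerical sketch of this definition, take d = 2 with the values −1 and +1 and a hypothetical orthonormal basis for the kets |a;k〉. The basis, the values and the helper names `outer` and `apply` are assumptions made only for illustration; they are not derived from the construction in this section.

```python
import math

def outer(v):
    """The projector |v><v| as a nested list."""
    return [[vi * vj.conjugate() for vj in v] for vi in v]

def apply(matrix, v):
    """Matrix times vector."""
    return [sum(row[j] * v[j] for j in range(len(v))) for row in matrix]

s = 1 / math.sqrt(2)
kets = [[s, s], [s, -s]]   # hypothetical orthonormal basis |a;1>, |a;2>
u = [-1.0, 1.0]            # hypothetical values u_1, u_2

d = len(u)
# A^a = sum_k u_k |a;k><a;k|
A = [[sum(u[k] * outer(kets[k])[i][j] for k in range(d)) for j in range(d)]
     for i in range(d)]

# Each |a;k> should be an eigenvector of A^a with eigenvalue u_k.
for k in range(d):
    print(apply(A, kets[k]), [u[k] * x for x in kets[k]])
```

Because the two values are distinct, each eigenvalue belongs to a single basis vector, which is the non-degeneracy referred to above.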

4.4.2 Proof Under an Extra Assumption

Let U be the left regular representation of G on L 2( Φ, ρ): U(g)f(ϕ) = f(g −1 ϕ). It is well known that this is a unitary representation. We will seek a corresponding representation of G on the smaller space H.
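The two defining properties of U can be illustrated on a finite toy case. The sketch below assumes that Φ has four points, that ρ is counting measure, and that G is the full permutation group of Φ; these choices and the names `inverse`, `compose`, `U` and `inner` are hypothetical and serve only as illustration.

```python
from itertools import permutations

PHI = (0, 1, 2, 3)                 # assumed finite space Phi
group = list(permutations(PHI))    # g[x] is the image of x under g

def inverse(g):
    inv = [0] * len(g)
    for x, gx in enumerate(g):
        inv[gx] = x
    return tuple(inv)

def compose(g, h):
    """(g h)(x) = g(h(x))."""
    return tuple(g[h[x]] for x in PHI)

def U(g, f):
    """(U(g) f)(phi) = f(g^{-1} phi); f is a tuple of values indexed by phi."""
    g_inv = inverse(g)
    return tuple(f[g_inv[x]] for x in PHI)

def inner(f1, f2):
    """Inner product in L^2 with counting measure."""
    return sum(a.conjugate() * b for a, b in zip(f1, f2))

f1 = (1 + 2j, 0.5, -1j, 3.0)
f2 = (2.0, 1j, 1 - 1j, -0.5)

is_unitary = all(
    abs(inner(U(g, f1), U(g, f2)) - inner(f1, f2)) < 1e-12 for g in group
)
is_homomorphism = all(
    U(g, U(h, f1)) == U(compose(g, h), f1) for g in group for h in group
)
print(is_unitary, is_homomorphism)
```

The inverse in the definition of U is what makes U(g)U(h) = U(gh) hold; without it the order of the factors would be reversed.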

In the following, recall that upper indices as in g a indicate variables related to a particular λ a, here a group element of G a. Also recall that 0 is a fixed index in \(\mathcal {A}\). Lower indices as in g ab have to do with the relation between two different e-variables λ a and λ b.

Proposition 4.2

  a) A (multivalued) representation V of G on the Hilbert space H can always be found.

  b) There is an extended group \(G^{\prime}\) such that V is a univalued representation of \(G^{\prime}\) on H.

  c) There is a homomorphism \(G^{\prime}\rightarrow G^0\) such that \(V(g^{\prime})=U(g^0)\). If \(g^{\prime}\ne e^{\prime}\) in \(G^{\prime}\), then \(g^0\ne e\) in \(G^0\).

Proof

a) For each a and for g a ∈ G a define V (g a) = U(g 0a)U(g a)U(g a0). Then V (g a) is an operator on H, since it is equal to U(g 0a g a g a0), and g 0a g a g a0 ∈ G 0 = g 0a G a g a0. For a product g a g b g c with g a ∈ G a, g b ∈ G b and g c ∈ G c we define V (g a g b g c) = V (g a)V (g b)V (g c), and similarly for all elements of G that can be written as a finite product of elements from different subgroups.

Let now g and h be any two elements in G such that g can be written as a product of elements from G a, G b and G c, and similarly h (the proof is similar for other cases.) It follows that V (gh) = V (g)V (h) on these elements, since the last factor of g and the first factor of h either must belong to the same subgroup or to different subgroups; in both cases the product can be defined by the definition of the previous paragraph. In this way we see that V is a representation on the set of finite products, and since these generate G by Assumption 4.2b) it is a representation of G.

Since different representations of g as a product may give different solutions, we have to include the possibility that V may be multivalued.

b) Assume as in (a) that we have a multivalued representation V of G. Define a larger group \(G^{\prime}\) as follows: If g a g b g c = g d g e g f, say, with g k ∈ G k for all k, we define \(g_1^{\prime }=g^a g^b g^c \) and \(g_2^{\prime }=g^d g^e g^f\). Let \(G^{\prime}\) be the collection of all such new elements that can be written as a formal product of elements g k ∈ G k. The product is defined in the natural way, and the inverse by for example (g a g b g c)−1 = (g c)−1(g b)−1(g a)−1. By Assumption 4.2b), the group \(G^{\prime}\) generated by this construction must be at least as large as G. It is clear from the proof of a) that V also is a representation of the larger group \(G^{\prime}\) on H, now a one-valued representation.

c) Consider the case where \(g^{\prime}=g^a g^b g^c\) with g k ∈ G k. Then by the proof of a):

$$\displaystyle \begin{aligned}V(g^{\prime}) = U(g_{0a}) U(g^a)U(g_{a0}) U(g_{0b}) U(g^b)U(g_{b0})U(g_{0c}) U(g^c)U(g_{c0}) \end{aligned}$$
$$\displaystyle \begin{aligned}=U(g_{0a}g^a g_{a0} g_{0b}g^b g_{b0} g_{0c} g^c g_{c0}) =U(g^0),\end{aligned}$$

where g 0 ∈ G 0. The group element g 0 is unique since the decomposition \(g^{\prime}=g^a g^b g^c\) is unique for \(g^{\prime}\in G^{\prime}\). The proof is similar for other decompositions. By the construction, the mapping \(g^{\prime}\rightarrow g^0\) is a homomorphism.

Assume now that \(g^0=e\) and \(g^{\prime}\ne e^{\prime}\). Since \(U(g^0)\tilde {f}(\lambda ^0(\phi ))=\tilde {f}(\lambda ^0((g^0)^{-1}(\phi )))\), it follows from \(g^0=e\) that \(U(g^0)=I\) on H. But then from what has just been proved, \(V(g^{\prime})=I\), and since V is a univalued representation, it follows that \(g^{\prime}=e^{\prime}\), contrary to the assumption.

Assumption 4.3

  a) U is an irreducible representation of every cyclic subgroup of the group \(\tilde {G}^0\) on H other than the trivial group, and the dimension d of H is larger than or equal to 2.

  b) The representation V of the whole group G is really multivalued on the elements g ab.

  c) When finding an orthonormal basis for H of the form \(f_k(\phi )=\tilde {f}_k(\lambda ^0(\phi ))\), one can choose \(\tilde {f}_k\) and \(\tilde {f}_j\) in such a way that there exists a λ 1 such that \(\tilde {f}_k (\tilde {g}^{0} \lambda _{1}) \ne \tilde {f}_j (\lambda _1)\) for all \(\tilde {g}^{0} \in \tilde {G}^{0}\) in the sense that the two sides cannot be made equal by introducing a phase factor.

Now choose an orthonormal basis for H: f 1, …, f d where \(f_k(\phi )=\tilde {f}_k(\lambda ^0(\phi ))\), and where the interpretation of f k is that λ 0 = u k. Write |0;k〉 = f k(ϕ).

Lemma 4.2

For every k and every \(g^0\in G^0\), \(g^0\ne e\), we have \(U(g^0)f_k\ne f_k\) in the sense that the two functions cannot be made equal by multiplying with a phase factor.

Proof

Let d ≥ 2. Assume that there exist γ, k and \(g^0\ne e\) such that \(U(g^0)f_k=e^{i\gamma}f_k\). Then \(e^{i\gamma/2}f_k\) spans a one-dimensional subspace of H which is invariant under the cyclic group generated by g 0, contrary to the assumption of irreducibility.

Introduce now the assumption that the representation V really is multivalued. Let \(g_{0a1}^{\prime }\) and \(g_{0a2}^{\prime }\) be two different elements of the group \(G^{\prime}\), both corresponding to g 0a of G. Define \(g_a^{\prime }=(g_{0a1}^{\prime })^{-1}g_{0a2}^{\prime }\). Then \(g_a^{\prime }\ne e^{\prime }\) in \(G^{\prime}\). By the homomorphism of Proposition 4.2c), let \(g_a^{\prime }\rightarrow g_a^0\). Then \(g_a^0\ne e\) in G 0. Now define

$$\displaystyle \begin{aligned}|a;k\rangle = \tilde{f}_k(\lambda^0(g_a^0\phi))=U((g_a^0)^{-1})|0;k\rangle.\end{aligned}$$

Proof of Theorem 4.1 Under Assumption 4.3

By Lemma 4.2, |a;k〉≠|0;k〉. Here and below, inequality of state vectors is interpreted to mean that they can not be made equal by introducing a phase factor.

Next let j ≠ k. I will prove that the basis functions f 1, …, f d can be chosen so that |a;k〉≠|0;j〉 for all a. To this end, choose \(\tilde {f}_j\) and \(\tilde {f}_k\) in such a way that there exists a \(\lambda _0^j\) such that \(\tilde {f}_k(\tilde {g}^0\lambda ^{j}_{0})\ne \tilde {f}_j(\lambda _0^j)\) for all \(\tilde {g}^{0}\in \tilde {G}^{0}\). Then for any fixed g, \(\tilde {f}_{kg}\) defined by \(\tilde {f}_{kg}(\lambda ^0(\phi ))=\tilde {f}_k(\lambda ^0(g\phi ))\) is different from \(\tilde {f}_j\), and |a;k〉≠|0;j〉 for all a.

The proof that |a;k〉≠|b;j〉 (except in the trivial case a = b, k = j) holds under Assumption 4.3, is a straightforward extension.

The vectors |0;k〉 are chosen to be an orthonormal basis for H. Since |a;k〉 = U|0;k〉 for some unitary U, it follows that the vectors |a;k〉 form an orthonormal basis.

Several authors have approached the foundation of quantum mechanics through group representation theory, one example being Mirman (1995), another Smilga (2017).

Assumption 4.3 is not satisfied for the spin/angular momentum case. For this case the group \(\tilde {G}^0\) is too small to generate all states |a;k〉 from |0;k〉. I will show directly in the next chapter that Theorem 4.1 holds in general for the case of spin/angular momentum (Corollary 5.1 of Sect. 5.2). But this also means that it must be possible to weaken Assumption 4.3 in some way.

Theorem 4.1, saying that the question-and-answer pairs can be put in one-to-one correspondence with some states in a concrete Hilbert space, must hold under a set of assumptions including the spin/angular momentum case.

In the qubit case (Hilbert space dimension 2; answers to questions: −1 and + 1), the question-and-answer pairs can be put in one-to-one correspondence with all state vectors in the Hilbert space. (Sect. 5.1.1 and Proposition 5.4 of Sect. 5.2.)

4.4.3 The Interpretation Argument

In the previous subsections I started with a setting where questions-and-answers could be defined, and arrived at Hilbert space unit vectors. Now start by assuming the Hilbert space formulation, and let |ψ〉 be an arbitrary unit vector in the Hilbert space H. This vector is of course the eigenvector of many operators in H. Assume that one can find such an operator A such that

  1. A is physically meaningful, i.e., can be associated with an e-variable λ.

  2. |ψ〉 is an eigenvector corresponding to a non-degenerate eigenvalue u of A.

Then |ψ〉 can be interpreted as a question ‘What is the value of λ?’ together with a definite answer ‘λ = u’. The non-degenerate eigenvalue case corresponds to the maximal epistemic setting.

4.5 The General Symmetrical Epistemic Setting

Go back to the definition of the maximal symmetrical epistemic setting. Let again ϕ be the inaccessible conceptual variable and let λ a = λ a(ϕ) for \(a\in \mathcal {A}\) be the maximal accessible conceptual variables satisfying Assumption 4.1. Let the corresponding induced groups G a and G satisfy Assumptions 4.2 and 4.3 (or some weaker assumption which can replace Assumption 4.3). Finally, let t a for each a be an arbitrary function on the range of λ a, and assume that we instead of focusing on λ a, focus on θ a = t a(λ a) for each \(a\in \mathcal {A}\). I will call this the symmetrical epistemic setting; the e-variables θ a are no longer maximal.

Let the Hilbert space be as in Sect. 4.4.1 with an orthonormal basis redefined to be \(\{|a^{\prime };j\rangle \}\) for each a. Let {u j} be the values of λ a, and let \(u_k^a\) be the values of θ a = t a(λ a). Define \(C_k^a=\{j: t^a(u_j)=u_k^a\}\), let \(V_k^a\) be the space spanned by \(\{|a^{\prime };j\rangle ; j\in C_k^a\}\) and let \(\Pi _k^a\) be the projection upon \(V_k^a\). Finally, let |a;k〉 be any unit vector in \(V_k^a\).

Interpretation of the State Vector |a;k〉

(1) The question: ‘What is the value of θ a ?’ has been posed. (2) We have obtained the answer \(\theta ^a =u_k^a\) . Both the question and the answer are contained in the state vector.

From this we may define the operator connected to the e-variable θ a:

$$\displaystyle \begin{aligned} A^a =\sum_k u_k^a \varPi_k^a = \sum_j t^a (u_j )|a';j\rangle\langle a';j|. {} \end{aligned} $$
(4.1)

Then A a is no longer necessarily an operator with distinct eigenvalues, but A a is still Hermitian: \(A^{a\dagger }=A^a\).
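The two expressions for A a in (4.1) can be compared in a small sketch. It assumes d = 4, hypothetical values u j = 1, 2, 3, 4, a hypothetical function t a(u) = u mod 2, and that the basis vectors |a′;j〉 are represented by the standard basis vectors; all names in the code are illustrative only.

```python
u = [1, 2, 3, 4]           # hypothetical values u_j of lambda^a

def t(x):
    """Hypothetical function t^a, so that theta^a = t^a(lambda^a)."""
    return x % 2

d = len(u)
theta_values = sorted({t(x) for x in u})    # the values u_k^a
# C_k^a = {j : t^a(u_j) = u_k^a}, here keyed by the value u_k^a itself.
C = {v: [j for j in range(d) if t(u[j]) == v] for v in theta_values}

def projector(indices):
    """Projection upon the span of the standard basis vectors in `indices`."""
    return [[1 if i == j and i in indices else 0 for j in range(d)]
            for i in range(d)]

Pi = {v: projector(C[v]) for v in theta_values}

# Left and right expressions of (4.1) in this basis.
lhs = [[sum(v * Pi[v][i][j] for v in theta_values) for j in range(d)]
       for i in range(d)]
rhs = [[t(u[i]) if i == j else 0 for j in range(d)] for i in range(d)]

print(lhs == rhs, {v: len(C[v]) for v in theta_values})
```

In this example each value of θ a corresponds to a two-dimensional eigenspace, so the eigenvalues are degenerate while the operator remains Hermitian.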

Interpretation of the Operator A a

This gives all possible states and all possible values corresponding to the accessible e-variable θ a.

The general decomposition \(A^a=\sum _k u_k^a \Pi _k^a\) will be important in Sect. 5.4 and Sect. 5.7, and will be further discussed there.

The projectors |a;k〉〈a;k| and hence the ket vectors |a;k〉 are no longer uniquely determined by A a: They can be transformed arbitrarily by unitary transformations in each space corresponding to one eigenvalue. As long as the focus is only on θ a, or A a, I will redefine |a;k〉 by allowing it to be subject to such transformations. These transformed eigenvectors all still correspond to the same eigenvalue, that is, the same observed value of θ a and they give the same operators A a. In particular, in the maximal symmetric epistemic setting I will allow an arbitrary constant phase factor in the definition of the |a;k〉’s.

A more precise state interpretation is then to let the whole vector space of such transformed vectors |a;k〉 represent a question-and-answer pair. This will be treated more thoroughly in Sect. 5.7.3.

As an example of the general construction, assume that λ a is a vector: \(\lambda ^a =(\theta ^{a_1},\ldots ,\theta ^{a_m})\). The different θ’s may be connected to different subsystems. This example is highly relevant when considering several observers. One single observer may have access to just a few subsystems. In addition he has his own context. From this context he may define his accessible and inaccessible conceptual variables. In the same way a group of observers may through verbal communication arrive at a common context, and from this context one may define their accessible and inaccessible conceptual variables. Assume that these observers together observe a particular physical system.

Assumption 4.4

For a given physical system at some particular time one can imagine an observer or a group of communicating observers for which the assumptions of the symmetrical epistemic setting are satisfied. In some cases all possible observers agree on the physical observations, and these then describe an objective property of the system.

So far I have kept the same groups G a and G when going from λ a to θ a = t a(λ a), that is from the maximal symmetrical epistemic setting to the general symmetrical epistemic setting. This implies that the (large) Hilbert space will be the same. A special case occurs if t a is a reduction to an orbit of G a. This is the kind of model reduction discussed in Sect. 2.2. Then the construction of the previous sections can also be carried out with a smaller group action acting just upon an orbit, resulting then in a smaller Hilbert space. In the example \(\lambda ^a =(\theta ^{a_1},\ldots ,\theta ^{a_m})\) it may be relevant to consider one Hilbert space for each subsystem. Then one can write a state vector corresponding to λ a as

$$\displaystyle \begin{aligned}|a;k\rangle =|a_1;k_1\rangle \otimes \ldots\otimes |a_m;k_m\rangle\end{aligned}$$

in an obvious notation, where a = (a 1, …, a m) and k = (k 1, …, k m). The large Hilbert space is however the correct space to use when the whole system is considered. In this Hilbert space the subsystem ket vectors will have degenerate eigenvalues and correspond to the general symmetrical epistemic setting.
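For m = 2 the tensor product of two subsystem kets can be sketched as follows. The two-dimensional subsystem vectors and the helper name `kron` are hypothetical and chosen only for illustration.

```python
import math

def kron(v, w):
    """Kronecker (tensor) product of two coordinate vectors."""
    return [vi * wj for vi in v for wj in w]

s = 1 / math.sqrt(2)
ket_a1 = [1.0, 0.0]   # hypothetical |a_1;k_1>
ket_a2 = [s, s]       # hypothetical |a_2;k_2>

ket = kron(ket_a1, ket_a2)   # |a;k> = |a_1;k_1> (x) |a_2;k_2>
norm = math.sqrt(sum(abs(x) ** 2 for x in ket))
print(len(ket), norm)
```

The product of two unit vectors is again a unit vector, now in the four-dimensional space that describes the whole system.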

At any time we can also imagine non-communicating observers. Then for each particular observer the assumptions of the general symmetrical setting may be assumed to apply. Particular state vectors in each observer’s Hilbert space may be linear combinations of primitive state vectors in the form of a tensor product. These are called entangled states when they cannot be reduced to a primitive form, and play an important role in many areas of quantum physics.
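For two two-dimensional subsystems there is a simple standard criterion for whether a state can be reduced to a primitive (product) form: writing the state as c00|0〉⊗|0〉 + c01|0〉⊗|1〉 + c10|1〉⊗|0〉 + c11|1〉⊗|1〉, it is a product state exactly when c00 c11 − c01 c10 = 0. The sketch below applies this criterion; the example states and the function name `is_product_state` are assumptions made for illustration.

```python
import math

def is_product_state(psi, tol=1e-12):
    """psi = (c00, c01, c10, c11); product state iff c00*c11 - c01*c10 == 0."""
    c00, c01, c10, c11 = psi
    return abs(c00 * c11 - c01 * c10) < tol

s = 1 / math.sqrt(2)
product_state = (0.5, 0.5, 0.5, 0.5)   # tensor product of (s, s) with (s, s)
bell_state = (s, 0.0, 0.0, s)          # a standard entangled example

print(is_product_state(product_state), is_product_state(bell_state))
```

The criterion is specific to the two-by-two case; for larger subsystems one would look at the rank of the coefficient matrix instead.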

Assumption 4.4 is assumed for a large class of physical systems. Through the imagined observers the constructions of this chapter may be carried out, and for each case a Hilbert space may be constructed.

This is the connection between my theory and the formal quantum theory defined in textbooks. I will claim that the theory defined by having the maximal symmetrical epistemic setting as a point of departure is from one point of view more intuitive than the ordinary formal theory.