1 Introduction

Whether the unitary time evolution in an isolated macroscopic quantum system can describe the phenomenon of thermalization or, equivalently, the approach to thermal equilibrium is an essential question in the foundations of statistical mechanics. Since there are several different formulations of thermalization, we shall first make clear what we precisely mean by thermalization in the present work. Consider a many-body quantum system with Hamiltonian \(\hat{H}\) and take a pure initial state \(|\Phi (0)\rangle \) in which the energy is sharply distributed around some value E. We say that the system with this initial state thermalizes if the measurement result of any macroscopic observable \(\hat{A}\) in the time-evolved state \(|\Phi (t)\rangle \) after a sufficiently long and typical time \(t>0\) is indistinguishable (with probability very close to one) from the microcanonical average \(\langle \hat{A}\rangle ^\textrm{MC}_E\). Note that we are dealing with the outcome of a single quantum mechanical measurement of \(\hat{A}\) in the state \(|\Phi (t)\rangle \) rather than the quantum mechanical expectation value \(\langle \Phi (t)|\hat{A}|\Phi (t)\rangle \) or any other averaged quantities. Therefore, thermalization formulated in this manner guarantees that the result of a single experiment at a sufficiently late time is predicted precisely by equilibrium statistical mechanics.Footnote 1

Our ultimate goal is to rigorously establish the presence of thermalization in the above strong form in a realistic macroscopic quantum system with a realistic nonequilibrium initial state. This, however, seems to be a formidably difficult problem at present. In the present paper, we report a partial result toward this goal, namely, a complete proof that a low-density free fermion chain exhibits thermalization in the above sense, albeit for a restricted class of observables [1].

The study of thermalization in isolated macroscopic quantum systems goes back to the early days of quantum mechanics [2], but considerable progress has been made in the present century, partly motivated by modern ultracold atom experiments [3,4,5,6,7]. It is now generally agreed that a sufficiently complex many-body quantum system can thermalize through unitary time evolution alone [8, 9].

An important theoretical concept in the study of thermalization is the energy eigenstate thermalization hypothesis (ETH). It was first introduced (implicitly) by von Neumann in 1929 [2, 10] as an essential assumption for his quantum ergodic theorem. See [8, 11] for the relation between von Neumann’s ETH and the modern version of ETH proposed in [12, 13]. Another key theoretical concept is a large effective dimension of the initial state. It was first pointed out by Tasaki in 1998 [14] (without explicitly introducing the notion of the effective dimension) that one can show the presence of equilibration if the effective dimension is large enough. It is known that one can prove the presence of thermalization by assuming either (i) some (strong) form of ETH [2, 10, 15,16,17], (ii) some form of ETH and a large enough effective dimension [14, 18,19,20,21], or (iii) an effective dimension almost as large as the total dimension [17, 22].

It is strongly believed that the assumptions in the above scenarios (i), (ii), and (iii) are satisfied in a large class of sufficiently complex quantum systems and their realistic (nonequilibrium) initial states. However, it is extremely difficult, if not impossible, to justify these assumptions rigorously for concrete models. As far as we know, there have been no concrete and nontrivial examples of quantum systems with short-range interaction in which the presence of thermalization was justified according to these scenarios without relying on any unproven assumptions. We note that an example based on a different mechanism is discussed in [23].

It is interesting, on the other hand, that there have been many examples of quantum systems in which the absence of thermalization was rigorously established. A well-known example is an integrable system, where the system relaxes not to the equilibrium state but to a state corresponding to an ensemble characterized by its local integrals of motion. The absence of thermalization in such systems with local integrals of motion is a long-established property [24, 25], and has recently been studied in detail in terms of the generalized Gibbs ensemble [26, 27]. Another example is a system with many-body localization: a spin system with many-body localization has random interactions or a random magnetic field, and this randomness prevents its thermalization, as in the case of Anderson localization [28,29,30,31,32,33]. Recently, a more exotic system was found in which most initial states thermalize while some do not. This phenomenon was first observed in experiments with cold atoms [34], and, independently of this experiment, a general theoretical framework covering such phenomena was proposed [35, 36]. Such states were later named quantum many-body scar states, and have attracted interest across broad research fields [37,38,39,40,41,42,43]. Furthermore, it has even been shown that the problem of thermalization is, in general, undecidable [44].

The goal of this paper is to present a nontrivial and rigorous concrete example of thermalization (in a restricted sense) that does not rely on any unproven assumptions. We first develop a general theory of unitary time evolution in a low-density lattice gas that satisfies two crucial assumptions, and establish the presence of thermalization with respect to the number operator for any macroscopic region, assuming that the initial nonequilibrium state is generated randomly. The derivation is based on the above-mentioned scenario (iii), which requires the effective dimension of the initial state to be almost as large as the total Hilbert space dimension. We then prove that the two assumptions are indeed satisfied in the simplest model, namely, the free fermion chain with suitable parameters. Although a free fermion system does not exhibit full-fledged thermalization, i.e., the approach to thermal equilibrium from an arbitrary nonequilibrium state (with almost fixed energy), it does thermalize in our setting, where the initial state is sufficiently complex. We should note that we are here using the notion of thermalization in a phenomenological manner, in the sense that we focus only on macroscopically observable features and do not pay attention to microscopic mechanisms. More precisely, thermalization in our example is essentially indistinguishable from that observed in a realistic gas, provided that a macroscopic observer measures only the density of particles in a given region (and the coarse-grained momentum distribution).Footnote 2 See Sect. 4 for a related discussion, and [45,46,47] for detailed numerical studies of closely related problems.

It is important to note, however, that our general theory should apply to non-integrable models as well, in which one expects full-fledged thermalization to take place. In fact, the key assumption in our theory is about the particle distribution in energy eigenstates, which may be regarded as a very restrictive form of ETH. The other assumption is the absence of degeneracy in the energy spectrum of the model, which appears highly natural and plausible in complex many-body systems. Interestingly, if we assume the absence of degeneracy, we can justify the first assumption about the particle distribution for a wider class of lattice gas models, including interacting ones. It is an intriguing problem whether one can find non-integrable models in which our assumptions can be fully justified.

Before going into details of our theory, let us state precisely what we can prove for free fermion chains. Consider a system of N fermions on the chain \(\{1,\ldots ,L\}\), where we fix the density \(\rho =N/L\) and make N and L large. We take the standard Hamiltonian with uniform nearest-neighbor hopping

$$\begin{aligned} \hat{H}=\sum _{x=1}^L\bigl \{e^{i\theta }\,\hat{c}^\dagger _x\hat{c}_{x+1}+e^{-i\theta }\,\hat{c}^\dagger _{x+1}\hat{c}_x\bigr \}, \end{aligned}$$
(1.1)

where the phase \(\theta \in \mathbb {R}\) is introduced (artificially) to break the reflection symmetry. See Sect. 3.1 for notations and details. In the most standard model with \(\theta =0\), most energy eigenvalues are degenerate because of the reflection symmetry (which brings the wave number k to \(-k\)). It is likely that the degeneracies are lifted by a nonzero phase \(\theta \). We assume that the parameters are properly chosen so that all the energy eigenvalues of \(\hat{H}\) are nondegenerate. In fact, we prove in Sect. 3.2 that the model is free from degeneracy under some conditions. For example, it suffices to set \(\theta =(4N+2L)^{-(L-1)/2}\) provided that L is an odd prime.
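The effect of the phase \(\theta \) on degeneracies can be illustrated numerically on a very small chain. The sketch below is only an illustration, not the proof of Sect. 3.2. It assumes periodic boundary conditions (\(\hat{c}_{L+1}=\hat{c}_1\)), under which Fourier transforming (1.1) gives single-particle energies \(2\cos (k+\theta )\) with \(k=2\pi n/L\), so that each N-particle eigenvalue is a sum of N distinct single-particle energies. The function names are hypothetical.

```python
import itertools
import math

def many_body_energies(L, N, theta):
    """Sorted N-fermion energies of (1.1) on a ring of L sites (assumed periodic: c_{L+1} = c_1).

    Assumes the single-particle energies 2 cos(k + theta), k = 2 pi n / L, obtained by
    Fourier transforming (1.1); each eigenvalue sums N distinct single-particle energies.
    """
    eps = [2 * math.cos(2 * math.pi * n / L + theta) for n in range(L)]
    return sorted(sum(eps[n] for n in occ) for occ in itertools.combinations(range(L), N))

def min_gap(L, N, theta):
    """Smallest spacing between consecutive many-body eigenvalues (0 signals a degeneracy)."""
    E = many_body_energies(L, N, theta)
    return min(b - a for a, b in zip(E, E[1:]))

L, N = 7, 2                                    # L an odd prime
theta = (4 * N + 2 * L) ** (-(L - 1) / 2)      # the choice quoted above, 22^(-3) here
print("theta = 0:", min_gap(L, N, 0.0))        # reflection pairs k, -k are degenerate
print("theta =", theta, ":", min_gap(L, N, theta))
```

At \(\theta =0\) the smallest gap vanishes up to rounding, because k and \(-k\) carry the same energy, while the tiny phase \(\theta =22^{-3}\) already separates all 21 levels.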

We choose the initial state \(|\Phi (0)\rangle \) randomly from the subspace of states in which all fermions are in the half-chain \(\{1,\ldots ,(L-1)/2\}\). This corresponds to the infinite temperature equilibrium state confined in the half-chain. We then denote by \(|\Phi (t)\rangle =e^{-i\hat{H}t}|\Phi (0)\rangle \) the state at time \(t>0\). We let \(\hat{N}_\textrm{left}\) be the operator that counts the number of fermions on the half-chain \(\{1,\ldots ,(L-1)/2\}\). Then, our main result is as follows:

Theorem 1.1

When N (or L) is sufficiently large and \(\rho =N/L\le 1/5\), the following is true with probability larger than \(1-e^{-(\rho /3)N}\) (where the probability is that for the choice of \(|\Phi (0)\rangle \)). There exists a sufficiently long time \(T>0\) and a subset (a collection of intervals) \(G\subset [0,T]\) with \(\mu (G)/T\ge 1-e^{-(\rho /4)N}\) (where \(\mu (G)\) denotes the total length of the intervals in G) such that if one measures \(\hat{N}_\textrm{left}\) in \(|\Phi (t)\rangle =e^{-i\hat{H}t}|\Phi (0)\rangle \) at any \(t\in G\) the measurement result \(N_\textrm{left}\) satisfies

$$\begin{aligned} \Bigl |\frac{N_\textrm{left}}{N}-\frac{1}{2}\Bigr |\le \varepsilon _0(\rho ), \end{aligned}$$
(1.2)

with probability larger than \(1-e^{-(\rho /4)N}\) (where the probability is that for quantum measurement). Here we set \(\varepsilon _0(\rho )=\sqrt{\frac{3}{2}\rho }\).

The factors \(e^{-(\rho /3)N}\) and \(e^{-(\rho /4)N}\) are essentially negligible if \(\rho N\gg 1\). The theorem then states that, almost certainly, the measurement result of \(\hat{N}_\textrm{left}/N\) at a sufficiently large and typical time is close to its equilibrium value 1/2 within precision \(\varepsilon _0(\rho )\). Since the measurement result of \(\hat{N}_\textrm{left}/N\) is 1 in the initial state \(|\Phi (0)\rangle \), this establishes irreversible behavior (or the approach to thermal equilibrium) with respect to the observable \(\hat{N}_\textrm{left}\). We should note that our result is not limited to the single specific observable \(\hat{N}_\textrm{left}\). In fact, the main theorem, Theorem 2.4, is stated for the number operator of any macroscopic region. As we have already stressed, it is crucial that we are dealing with the result of a single projective measurement of \(\hat{N}_\textrm{left}\) in the state \(|\Phi (t)\rangle \), rather than its quantum mechanical average \(\langle \Phi (t)|\hat{N}_\textrm{left}|\Phi (t)\rangle \).

We must note, however, that the precision \(\varepsilon _0(\rho )\) in (1.2) is a function of the density \(\rho \), and may not be small. One needs to consider a system with low density in order to have high precision. For example, \(\rho \simeq 10^{-4}\) for \(\varepsilon _0\simeq 10^{-2}\), or \(\rho \simeq 0.04\) for \(\varepsilon _0=1/4\). This density-dependence of the precision \(\varepsilon _0(\rho )\) is a major shortcoming of the present theory, which reflects our strategy to base the theory only on mild verifiable assumptions. We nevertheless stress that our theorem establishes thermalization in a certain sense without relying on any unproven assumptions.
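The numerical values quoted above follow from \(\varepsilon _0(\rho )=\sqrt{3\rho /2}\), the case \(\gamma =1/2\) of the general formula (2.17). A minimal helper (the names are hypothetical) evaluates the precision and inverts it to find the density needed for a target precision:

```python
import math

def eps0(rho, gamma=0.5):
    """Precision eps_0(rho) = sqrt(6 gamma (1 - gamma) rho) of (2.17).

    For the half-chain (gamma = 1/2) this is sqrt(3 rho / 2), as in Theorem 1.1.
    """
    return math.sqrt(6.0 * gamma * (1.0 - gamma) * rho)

def density_for_precision(eps, gamma=0.5):
    """Invert (2.17): the density rho at which eps_0(rho) equals eps."""
    return eps**2 / (6.0 * gamma * (1.0 - gamma))

for rho in (1e-4, 0.04, 0.2):
    print(f"rho = {rho:g}: eps_0 = {eps0(rho):.4f}")
print("rho needed for eps_0 = 1/4:", density_for_precision(0.25))   # about 0.0417
```

Because \(\varepsilon _0\) scales as \(\sqrt{\rho }\), each additional decimal digit of precision costs a factor of 100 in density.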

The present paper is organized as follows. In Sect. 2, we state our main thermalization theorem for a general lattice gas satisfying two assumptions, namely, Assumptions 2.1 and 2.2. Then in Sect. 3, we prove that these two assumptions are indeed satisfied in free fermion chains with suitable parameters.

In Appendix A, we discuss the extension of our general theory to a model in which the energy spectrum is moderately degenerate. In Appendix B, we present two classes of models (one of which includes non-integrable models) in which we can justify Assumption 2.2 about the particle distribution in energy eigenstates, assuming that the energy eigenvalues are nondegenerate. We stress that Assumption 2.2 is indeed an essential nontrivial assumption in our theory. In Appendix C, we present some concrete estimates of the effective dimensions of some non-random initial states in the free fermion chain and, with the help of these estimates, we prove that some non-random initial states indeed thermalize. Finally, in Appendix D, we briefly discuss a possible extension of our result to finite temperature states.

2 General Results

Here we describe general lattice gas systems, state the necessary assumptions, and prove the main low-density thermalization theorem. The new observation about the effective dimension is summarized in Theorem 2.3.

2.1 Setting and Main Assumptions

Let \(\Lambda \) be a lattice with L sites, and consider a system of N fermions on \(\Lambda \).Footnote 3 A typical example is the chain \(\Lambda =\{1,\ldots ,L\}\). We take the thermodynamic convention (except in Appendix C), in which we fix the density \(\rho \), choose L and N such that \(N/L\simeq \rho \), and make L and N sufficiently large. Our results are meaningful in the low-density regime, where \(\rho \) is sufficiently small.

Let \(\mathcal{H}_\textrm{tot}\) be the Hilbert space of the system with N particles on the lattice \(\Lambda \). The dimension \(D_\textrm{tot}\) of \(\mathcal{H}_\textrm{tot}\) is given by

$$\begin{aligned} D_\textrm{tot}=\left( {\begin{array}{c}L\\ N\end{array}}\right) \sim e^{L\,S(\rho )}, \end{aligned}$$
(2.1)

where the relation \(F(L)\sim G(L)\) means

$$\begin{aligned} \lim _{L\uparrow \infty }\frac{1}{L}\log \frac{F(L)}{G(L)}=0, \end{aligned}$$
(2.2)

and

$$\begin{aligned} S(p)=-p\log p-(1-p)\log (1-p), \end{aligned}$$
(2.3)

is the binomial entropy. The final expression in (2.1) follows from Stirling's formula.
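The relation \(\sim \) in (2.1) and (2.4) only controls the exponential rate, as defined in (2.2). A short numerical sketch (helper names are hypothetical) shows the per-site gaps \(\frac{1}{L}\log (D_\textrm{tot}/e^{L\,S(\rho )})\) and \(\frac{1}{L}\log (D_1/e^{(L/2)S(2\rho )})\) shrinking as L grows, for odd L and the actual finite-size density \(N/L\):

```python
import math

def log_binom(n, k):
    """Natural log of the binomial coefficient C(n, k), computed with log-gamma."""
    return math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)

def S(p):
    """Binomial entropy S(p) of (2.3), for 0 < p < 1."""
    return -p * math.log(p) - (1 - p) * math.log(1 - p)

def rate_gaps(L, rho):
    """Per-site gaps (1/L) log(D_tot / e^{L S}) and (1/L) log(D_1 / e^{(L/2) S(2.)})
    for odd L; both should tend to 0 as L grows, in the sense of (2.2)."""
    N = round(rho * L)
    r = N / L                                   # actual density of the finite system
    gap_tot = (log_binom(L, N) - L * S(r)) / L
    gap_1 = (log_binom((L - 1) // 2, N) - (L / 2) * S(2 * r)) / L
    return gap_tot, gap_1

for L in (1001, 10001, 100001):
    print(L, rate_gaps(L, 0.1))
```

The gaps decay like \((\log L)/L\), reflecting the square-root prefactor in Stirling's formula that the relation \(\sim \) deliberately ignores.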

We decompose the lattice \(\Lambda \) disjointly into two parts as \(\Lambda =\Lambda _1\cup \Lambda _2\), where \(|\Lambda _1|=(L-1)/2\) and \(|\Lambda _2|=(L+1)/2\) when L is odd, and \(|\Lambda _1|=|\Lambda _2|=L/2\) when L is even. Throughout the present paper, we denote by |S| the number of elements in a set S. Let \(\mathcal{H}_1\) denote the nonequilibrium subspace where all particles are in the sublattice \(\Lambda _1\) and hence \(\Lambda _2\) is empty. The dimension of \(\mathcal{H}_1\) is

$$\begin{aligned} D_1=\left( {\begin{array}{c}\frac{L-1}{2}\\ N\end{array}}\right) \sim e^{(L/2)S(2\rho )}, \end{aligned}$$
(2.4)

where we assumed L is odd (but the result is essentially the same for even L). We denote by \(\hat{P}_1\) the projection onto the subspace \(\mathcal{H}_1\).

Let \(\hat{H}\) be the Hamiltonian of the system. We assume that \(\hat{H}\) preserves the particle number, and denote by \(|\Psi _j\rangle \in \mathcal{H}_\textrm{tot}\) with \(j=1,\ldots ,D_\textrm{tot}\) its normalized eigenstate (with N particles) corresponding to the energy eigenvalue \(E_j\). We make two essential assumptions about energy eigenvalues and eigenstates.

Assumption 2.1

The energy eigenvalues \(E_1,\ldots ,E_{D_\textrm{tot}}\) of \(\hat{H}\) are nondegenerate.

It is believed that the energy eigenvalues of a quantum many-body system are, in general, nondegenerate unless there are special reasons (such as symmetry) that cause degeneracy. In other words, it is likely that accidental degeneracies can always be lifted by adding an appropriate small perturbation to the Hamiltonian. It is, however, not at all easy to turn this intuition into a proof for a concrete class of models. In Sect. 3.2, we shall prove that some free fermion models on a chain are indeed free from degeneracy. See Theorems 3.1 and 3.2.

Assumption 2.2

For any \(j=1,\ldots ,D_\textrm{tot}\), the energy eigenstate \(|\Psi _j\rangle \) satisfies

$$\begin{aligned} \langle \Psi _j|\hat{P}_1|\Psi _j\rangle \le 2^{-N}. \end{aligned}$$
(2.5)

Here \(\langle \Psi _j|\hat{P}_1|\Psi _j\rangle \) is the probability to find all particles in \(\Lambda _1\) in the state \(|\Psi _j\rangle \). Note that one gets the probability \(2^{-N}\) if each particle independently chooses between \(\Lambda _1\) and \(\Lambda _2\) with probability 1/2. The bound (2.5) is reasonable since the hardcore constraint (at most one particle per site) further reduces this probability. We expect the bound (2.5) to hold for a large class of interacting quantum lattice gases, but, for the moment, we are able to prove it only for a class of non-interacting fermions (Sect. 3.3 and Appendix B.2) and for systems of interacting fermions or hardcore bosons on a double lattice with special symmetry (Appendix B.1).
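Since \(\sum _j\langle \Psi _j|\hat{P}_1|\Psi _j\rangle ={\text {Tr}}[\hat{P}_1]=D_1\), the average of the left-hand side of (2.5) over all eigenstates is exactly \(D_1/D_\textrm{tot}\), the infinite-temperature probability that all particles sit in \(\Lambda _1\). The sketch below (function name hypothetical) computes this ratio exactly for a few small odd L and checks that it lies below \(2^{-N}\). This is only a necessary consistency check on (2.5), not a proof, since (2.5) must hold for every eigenstate individually.

```python
from fractions import Fraction
from math import comb

def all_in_lambda1_probability(L, N):
    """Exact D_1 / D_tot = C((L-1)//2, N) / C(L, N) for odd L: the fraction of N-particle
    hardcore configurations with every particle in Lambda_1, where |Lambda_1| = (L-1)/2."""
    return Fraction(comb((L - 1) // 2, N), comb(L, N))

for L, N in [(11, 2), (51, 10), (101, 20)]:
    p = all_in_lambda1_probability(L, N)
    print(L, N, float(p), 2.0**-N, p <= Fraction(1, 2**N))
```

Writing the ratio as \(\prod _{i=0}^{N-1}\frac{(L-1)/2-i}{L-i}\) shows why it falls below \(2^{-N}\): every factor is smaller than 1/2, which is the hardcore effect mentioned above.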

We also note that Assumption 2.2 is reminiscent of the strong ETH in the sense that it is an assertion about every energy eigenstate. It is, however, much weaker than the standard ETH, since it concerns a single observable rather than all macroscopic observables, and it requires only the inequality (2.5) rather than an equality.

In what follows, we first show that, under Assumption 2.2, a random initial state has an extremely large effective dimension with high probability (Theorem 2.3). Then, by combining Assumption 2.1 and the largeness of effective dimension, we conclude that this initial state thermalizes (Theorem 2.4).

2.2 Initial State and Its Effective Dimension

Let \(|\Phi (0)\rangle \in \mathcal{H}_\textrm{tot}\) be the normalized initial state of the system. We define the effective dimension \(D_\textrm{eff}\) of \(|\Phi (0)\rangle \) by

$$\begin{aligned} D_\textrm{eff}=\biggl (\sum _{j=1}^{D_\textrm{tot}}\bigl |\langle \Phi (0)|\Psi _j\rangle \bigr |^4\biggr )^{-1}, \end{aligned}$$
(2.6)

which quantifies the effective number of energy eigenstates that constitute the state \(|\Phi (0)\rangle \). It holds in general that \(1\le D_\textrm{eff}\le D_\textrm{tot}\). It is known that an initial state whose effective dimension \(D_\textrm{eff}\) is almost as large as \(D_\textrm{tot}\) generically exhibits thermalization, provided that the energy eigenvalues are nondegenerate. See Section 6 of [17]. (See Appendix A for necessary modifications when there are degeneracies.) It is strongly believed that a realistic nonequilibrium initial state of a non-integrable many-body quantum system has an effective dimension almost as large as the total Hilbert space dimension.Footnote 4 See [48,49,50] for systematic and convincing numerical studies.Footnote 5 However, it seems to be formidably difficult to prove this expectation rigorously. The currently available general lower bound on \(D_\textrm{eff}\) shows only that it is moderately large [51]. Our major task is to construct an initial state \(|\Phi (0)\rangle \) that is far from equilibrium and has a large effective dimension \(D_\textrm{eff}\).
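Definition (2.6) can be evaluated directly from the amplitudes of \(|\Phi (0)\rangle \) in the energy eigenbasis. A minimal sketch (function name hypothetical) shows the two extremes \(D_\textrm{eff}=1\) and \(D_\textrm{eff}=D_\textrm{tot}\), plus an intermediate case:

```python
import numpy as np

def effective_dimension(amplitudes):
    """D_eff of (2.6): inverse of sum_j |<Phi(0)|Psi_j>|^4 for a normalized state
    given by its amplitudes in the energy eigenbasis."""
    w = np.abs(np.asarray(amplitudes)) ** 2     # weights |<Psi_j|Phi(0)>|^2, summing to 1
    return 1.0 / np.sum(w**2)

D = 1000
single = np.zeros(D); single[0] = 1.0                   # one eigenstate: D_eff = 1
uniform = np.full(D, 1.0 / np.sqrt(D))                  # equal weight on all: D_eff = D
half = np.zeros(D); half[: D // 2] = np.sqrt(2.0 / D)   # equal weight on half: D_eff = D/2
print(effective_dimension(single), effective_dimension(uniform), effective_dimension(half))
```

For equal weights on m eigenstates one gets exactly \(D_\textrm{eff}=m\), which motivates reading \(D_\textrm{eff}\) as the effective number of eigenstates in \(|\Phi (0)\rangle \).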

To realize such an initial state with large \(D_\textrm{eff}\), we choose \(|\Phi (0)\rangle \) randomly from the subspace \(\mathcal{H}_1\). To be precise, denoting by \(\{|\Xi _j\rangle \}_{j=1,\ldots ,D_1}\) an arbitrary orthonormal basis of \(\mathcal{H}_1\), we prepare an initial state as \(|\Phi (0)\rangle =\sum _{j=1}^{D_1}c_j|\Xi _j\rangle \), where the coefficients \(c_j\in \mathbb {C}\) satisfy \(\sum _j|c_j|^2=1\) and are drawn randomly according to the uniform measure on the unit sphere in the \(D_1\)-dimensional complex space. Such \(|\Phi (0)\rangle \) describes a nonequilibrium state in which all particles are confined in the sublattice \(\Lambda _1\), while the state restricted to \(\Lambda _1\) is in thermal equilibrium at infinite temperature. In this state, the infinite temperature state in \(\Lambda _1\) borders a vacuum in \(\Lambda _2\). Therefore, we can interpret the present initial state as a limiting case of a nonequilibrium state in which two equilibrium states with different pressures are in contact with each other.
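The uniform measure on the unit sphere can be sampled by normalizing a vector of independent complex Gaussians, a standard technique that relies on unitary invariance. The sketch below (names and sizes hypothetical, with \(D_1=8\) far smaller than in the text) uses it to check the moment formula (2.8) from the proof of Theorem 2.3 by Monte Carlo, for a fixed vector \(|\Xi \rangle \) of norm 1/2:

```python
import numpy as np

def random_states(n, dim, rng):
    """n independent unit vectors drawn uniformly from the sphere in C^dim
    (normalized i.i.d. complex Gaussian vectors)."""
    z = rng.normal(size=(n, dim)) + 1j * rng.normal(size=(n, dim))
    return z / np.linalg.norm(z, axis=1, keepdims=True)

rng = np.random.default_rng(0)
D1, samples = 8, 200_000
phis = random_states(samples, D1, rng)          # each row: coefficients c_j of one |Phi(0)>
xi = 0.5 * random_states(1, D1, rng)[0]         # a fixed vector |Xi> in H_1 with norm 1/2

overlaps = phis.conj() @ xi                     # <Phi(0)|Xi> for every sample
mc = np.mean(np.abs(overlaps) ** 4)
exact = 2 / (D1 * (D1 + 1)) * np.linalg.norm(xi) ** 4   # right-hand side of (2.8)
print(mc, exact)
```

The Monte Carlo estimate agrees with (2.8) to within its statistical error of a fraction of a percent.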

We then have the following essential result, which is the main new observation in the present paper.

Theorem 2.3

Suppose that Assumption 2.2 is valid and that \(\rho \le 1/5\). Then, for sufficiently large N, one has

$$\begin{aligned} \frac{D_\textrm{tot}}{D_\textrm{eff}}\le e^{\rho N}, \end{aligned}$$
(2.7)

with probability larger than \(1-e^{-(\rho /3)N}\).

Here the probability is that for the random choice of the initial state \(|\Phi (0)\rangle \). We thus see that, when \(\rho \) is small, the effective dimension \(D_\textrm{eff}\) is almost as large as \(D_\textrm{tot}\) with probability very close to one. We shall see in Sect. 2.3 below that the upper bound (2.7) implies thermalization in a certain sense.

Proof of Theorem 2.3

It is well known (and can easily be shown) that for any \(|\Xi \rangle \in \mathcal{H}_1\), one has

$$\begin{aligned} \overline{\bigl |\langle \Phi (0)|\Xi \rangle \bigr |^4}=\frac{2}{D_1(D_1+1)}\,\Vert |\Xi \rangle \Vert ^4, \end{aligned}$$
(2.8)

where the bar on the left-hand side denotes the average over the random choice of \(|\Phi (0)\rangle \). See, e.g., [52]. Noting that \(\langle \Phi (0)|\Psi _j\rangle =\langle \Phi (0)|\hat{P}_1|\Psi _j\rangle \) and that \(\hat{P}_1|\Psi _j\rangle \in \mathcal{H}_1\), we find from (2.6) and (2.8) that

$$\begin{aligned} \overline{D_\textrm{eff}^{-1}} =\sum _{j=1}^{D_\textrm{tot}}\overline{\bigl |\langle \Phi (0)|\hat{P}_1|\Psi _j\rangle \bigr |^4} =\frac{2}{D_1(D_1+1)}\sum _{j=1}^{D_\textrm{tot}}\Vert \hat{P}_1|\Psi _j\rangle \Vert ^4. \end{aligned}$$
(2.9)

By using the assumed bound (2.5), which is written as \(\Vert \hat{P}_1|\Psi _j\rangle \Vert ^2\le 2^{-N}\), we find

$$\begin{aligned} \overline{D_\textrm{eff}^{-1}}&\le \frac{2}{D_1(D_1+1)2^N}\sum _{j=1}^{D_\textrm{tot}}\Vert \hat{P}_1|\Psi _j\rangle \Vert ^2 =\frac{2}{D_1(D_1+1)2^N}{\text {Tr}}[\hat{P}_1] =\frac{2}{(D_1+1)2^N}, \end{aligned}$$
(2.10)

where we noted that \({\text {Tr}}[\hat{P}_1]=D_1\). Recalling (2.1) and (2.4), we see that

$$\begin{aligned} D_\textrm{tot}\overline{D_\textrm{eff}^{-1}}\le \frac{2D_\textrm{tot}}{2^ND_1}\sim \exp [L\,S(\rho )-\tfrac{L}{2}S(2\rho )-N\log 2]=e^{g(\rho )L}, \end{aligned}$$
(2.11)

with

$$\begin{aligned} g(\rho )&=S(\rho )-\tfrac{1}{2}S(2\rho )-\rho \log 2 \nonumber \\&=-(1-\rho )\log (1-\rho )+\frac{1-2\rho }{2}\log (1-2\rho ) \nonumber \\&=\frac{\rho ^2}{2}+\frac{\rho ^3}{2}+\frac{7\rho ^4}{12}+\cdots <\frac{2}{3}\rho ^2. \end{aligned}$$
(2.12)

Here the final inequality is verified for \(\rho \in [0,1/5]\) with the aid of a numerical evaluation. We can rewrite the estimate (2.11) into the bound

$$\begin{aligned} D_\textrm{tot}\overline{D_\textrm{eff}^{-1}}\le \exp [\tfrac{2}{3}\rho ^2L]=\exp [\tfrac{2}{3}\rho N], \end{aligned}$$
(2.13)

provided that L (or N) is sufficiently large. Theorem 2.3 then follows from Markov's inequality as follows. Let p be the probability that \(D_\textrm{tot}D_\textrm{eff}^{-1}\) is larger than \(e^{\rho N}\). Then, since \(D_\textrm{tot}D_\textrm{eff}^{-1}\) is nonnegative, we have \(D_\textrm{tot}\overline{D_\textrm{eff}^{-1}}\ge p\,e^{\rho N}\), which, together with (2.13), implies \(p\le e^{-(\rho /3)N}\). \(\square \)
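The numerical step behind the last inequality in (2.12) can be reproduced with a short script. It is a grid check under the stated assumptions, not a proof: it evaluates \(g(\rho )/\rho ^2\) on a fine grid in \((0,1/5]\), confirms that the first two lines of (2.12) agree, and compares with the series in the third line for small \(\rho \). The helper names are ours.

```python
import numpy as np

def g(rho):
    """g(rho) from the second line of (2.12)."""
    return -(1 - rho) * np.log1p(-rho) + 0.5 * (1 - 2 * rho) * np.log1p(-2 * rho)

def S(p):
    """Binomial entropy (2.3), for 0 < p < 1."""
    return -p * np.log(p) - (1 - p) * np.log(1 - p)

rho = np.linspace(1e-4, 0.2, 100_001)           # fine grid on the interval (0, 1/5]
ratio = g(rho) / rho**2
print("max of g/rho^2 on the grid:", ratio.max(), "vs 2/3")

# the first line of (2.12) must agree with the second line
first_line = S(rho) - 0.5 * S(2 * rho) - rho * np.log(2)
print("max |line 1 - line 2|:", np.max(np.abs(g(rho) - first_line)))

# the series in the third line of (2.12), truncated at rho^4
series = rho**2 / 2 + rho**3 / 2 + 7 * rho**4 / 12
```

Since every coefficient of the series is positive, \(g(\rho )/\rho ^2\) increases with \(\rho \), so its largest value on \([0,1/5]\) is attained at \(\rho =1/5\), where it is about 0.632, safely below 2/3.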

One may prefer a statement for a definite (i.e., non-random) initial state rather than one for (the majority of) random initial states. In Appendix C, we discuss a non-random initial state whose effective dimension is almost as large as \(D_\textrm{tot}\), as in Theorem 2.3.

2.3 Time Evolution and Thermalization

Let us now consider the state obtained from the initial state \(|\Phi (0)\rangle \) by the unitary time evolution, i.e.,

$$\begin{aligned} |\Phi (t)\rangle =e^{-i\hat{H}t}|\Phi (0)\rangle =\sum _{j=1}^{D_\textrm{tot}}e^{-iE_jt}|\Psi _j\rangle \langle \Psi _j|\Phi (0)\rangle . \end{aligned}$$
(2.14)

We expect that, for sufficiently large and typical t, the time-evolved state \(|\Phi (t)\rangle \) describes (in a certain physical sense) the thermal equilibrium at infinite temperature. See the next subsection.

To examine the property of the state \(|\Phi (t)\rangle \), we take an arbitrary subset \(\Gamma \) of \(\Lambda \) such that \(|\Gamma |=\gamma L\), where \(\gamma \) is a constant of order 1, and measure the proportion of particles in \(\Gamma \). We shall prove that, for sufficiently large and typical time t, the proportion is close to its equilibrium value, \(\gamma \), with probability very close to one. This type of statement has been shown in the literature for initial states with extremely large effective dimensions [17, 22], and we follow the standard idea. Our precise statement is as follows.

Theorem 2.4

We fix the (small) density \(\rho >0\), and take sufficiently large L and N such that \(N/L\simeq \rho \). We consider a system of N particles on the lattice \(\Lambda \) such that \(|\Lambda |=L\) and let \(\hat{H}\) be the Hamiltonian. Suppose that Assumption 2.1 about nondegeneracy is valid and also that the effective dimension \(D_\textrm{eff}\) is large enough to satisfy the bound (2.7). (This is guaranteed by Theorem 2.3 to be extremely likely.) Take any \(\Gamma \subset \Lambda \) such that \(|\Gamma |=\gamma L\), and let \(\hat{N}_\Gamma \) be the operator that counts the number of particles in \(\Gamma \). Then there exists a constant \(T>0\) and a subset (a collection of intervals) \(G\subset [0,T]\) with

$$\begin{aligned} \frac{\mu (G)}{T}\ge 1-e^{-(\rho /4)N}, \end{aligned}$$
(2.15)

where \(\mu (G)\) is the total length of the intervals in G. Suppose that one performs a measurement of the number operator \(\hat{N}_\Gamma \) in the state \(|\Phi (t)\rangle \) with arbitrary \(t\in G\). Then, with probability larger than \(1-e^{-(\rho /4)N}\), the measurement result \(N_\Gamma \) satisfies

$$\begin{aligned} \Bigl |\frac{N_\Gamma }{N}-\gamma \Bigr |\le \varepsilon _0(\rho ), \end{aligned}$$
(2.16)

where the precision is given by

$$\begin{aligned} \varepsilon _0(\rho )=\sqrt{6\gamma (1-\gamma )\rho }. \end{aligned}$$
(2.17)

Here the probability is that for the quantum mechanical measurement. Suppose that N is sufficiently large so that \(e^{-(\rho /4)N}\) is negligibly small. Then the theorem guarantees that (2.16) almost certainly holds for almost all t in [0, T]. The bound (2.16) states that the observed proportion \(N_\Gamma /N\) is close to its equilibrium value, \(\gamma \). Recalling that the initial state \(|\Phi (0)\rangle \) is a nonequilibrium state in which all particles are in \(\Lambda _1\), we have established that the system thermalizes only by means of unitary time evolution (2.14).

We must note, however, that the precision in the relation (2.16) is given by \(\varepsilon _0(\rho )\), which is a function of \(\rho \) as in (2.17) and may not be small. In fact, we need to make the density \(\rho \) sufficiently low to achieve high precision. If one demands that the precision \(\varepsilon _0(\rho )\) be, for example, of order \(10^{-2}\), then \(\rho \) should be of order \(10^{-4}\). This density dependence of the precision and the resulting limitation to dilute gases are the major shortcomings of the present theory, which come from our strategy to base the whole theory on mild verifiable assumptions, namely, Assumptions 2.1 and 2.2.

We should also remark that our criterion for thermal equilibrium deals only with the number of particles in an arbitrary macroscopic region. We have proved the presence of thermalization, but only with respect to this rather restricted criterion. This again reflects the limitation arising from our mild assumptions. Although we expect that thermalization for other macroscopic quantities reflecting single-particle properties can be established by a straightforward extension of the present analysis, we are far from treating quantities that involve particle-particle correlations. See the discussion at the end of Sect. 4.

Proof of Theorem 2.4

The proof consists essentially of a combination of standard arguments found in the literature. For \(\varepsilon >0\), let \(\hat{P}^{\Gamma ,\varepsilon }_\textrm{neq}\) denote the projection operator onto the subspace of \(\mathcal{H}_\textrm{tot}\) determined by

$$\begin{aligned} \biggl |\frac{\hat{N}_\Gamma }{N}-\gamma \biggr |\ge \varepsilon . \end{aligned}$$
(2.18)

Clearly, the expectation value \(\langle \Phi (t)|\hat{P}^{\Gamma ,\varepsilon _0(\rho )}_\textrm{neq}|\Phi (t)\rangle \) is the probability that the measurement result of \(\hat{N}_\Gamma \) in \(|\Phi (t)\rangle \) does not satisfy the relation (2.16). From (2.14), we see that

$$\begin{aligned} \langle \Phi (t)|\hat{P}^{\Gamma ,\varepsilon }_\textrm{neq}|\Phi (t)\rangle =\sum _{j,j'=1}^{D_\textrm{tot}}e^{i(E_j-E_{j'})t} \langle \Phi (0)|\Psi _{j}\rangle \langle \Psi _j|\hat{P}^{\Gamma ,\varepsilon }_\textrm{neq}|\Psi _{j'}\rangle \langle \Psi _{j'}|\Phi (0)\rangle . \end{aligned}$$
(2.19)

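Since we assumed that the energy eigenvalues \(E_j\) are nondegenerate, the off-diagonal terms with \(j\ne j'\) in (2.19) oscillate with nonzero frequencies \(E_j-E_{j'}\) and vanish upon long-time averaging. The long-time average of \(\langle \Phi (t)|\hat{P}^{\Gamma ,\varepsilon }_\textrm{neq}|\Phi (t)\rangle \) is therefore expressed in terms of a single sum as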

$$\begin{aligned} \lim _{T\uparrow \infty }\frac{1}{T}\int _0^Tdt\,\langle \Phi (t)|\hat{P}^{\Gamma ,\varepsilon }_\textrm{neq}|\Phi (t)\rangle&=\sum _{j=1}^{D_\textrm{tot}}\bigl |\langle \Phi (0)|\Psi _{j}\rangle \bigr |^2\langle \Psi _j|\hat{P}^{\Gamma ,\varepsilon }_\textrm{neq}|\Psi _j\rangle \nonumber \\&\le \sqrt{\biggl (\sum _{j=1}^{D_\textrm{tot}}\bigl |\langle \Phi (0)|\Psi _{j}\rangle \bigr |^4\biggr ) \biggl (\sum _{j=1}^{D_\textrm{tot}}\langle \Psi _j|\hat{P}^{\Gamma ,\varepsilon }_\textrm{neq}|\Psi _j\rangle ^2\biggr ) }\nonumber \\&\le \sqrt{ D_\textrm{tot}D_\textrm{eff}^{-1}\,\langle \hat{P}^{\Gamma ,\varepsilon }_\textrm{neq}\rangle _\infty }, \end{aligned}$$
(2.20)

where we defined the canonical average at infinite temperature by

$$\begin{aligned} \langle \cdots \rangle _\infty =\frac{{\text {Tr}}_{\mathcal{H}_\textrm{tot}}[\cdots ]}{D_\textrm{tot}}. \end{aligned}$$
(2.21)

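In (2.20), the second line follows from the Cauchy-Schwarz inequality, and the final expression follows from (2.6) and (2.21) by noting \(\langle \Psi _j|\hat{P}^{\Gamma ,\varepsilon }_\textrm{neq}|\Psi _j\rangle ^2\le \langle \Psi _j|\hat{P}^{\Gamma ,\varepsilon }_\textrm{neq}|\Psi _j\rangle \) and \(\sum _{j}\langle \Psi _j|\hat{P}^{\Gamma ,\varepsilon }_\textrm{neq}|\Psi _j\rangle ={\text {Tr}}_{\mathcal{H}_\textrm{tot}}[\hat{P}^{\Gamma ,\varepsilon }_\textrm{neq}]\).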
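The chain of inequalities (2.20) holds for any orthonormal eigenbasis, any normalized state, and any projection. The toy check below uses a random unitary in place of the true eigenbasis and a projection that is diagonal in the configuration basis, like \(\hat{P}^{\Gamma ,\varepsilon }_\textrm{neq}\); all sizes and names are hypothetical. It compares the diagonal-ensemble value in the first line of (2.20) with the bound in its last line.

```python
import numpy as np

rng = np.random.default_rng(1)
D, k = 60, 10   # toy total dimension and rank of the projection (hypothetical sizes)

# Hypothetical "energy eigenbasis": columns of a random unitary from a QR decomposition
A = rng.normal(size=(D, D)) + 1j * rng.normal(size=(D, D))
Psi, _ = np.linalg.qr(A)

P_diag = np.zeros(D); P_diag[:k] = 1.0          # projection onto k configuration-basis states
phi = rng.normal(size=D) + 1j * rng.normal(size=D)
phi /= np.linalg.norm(phi)                      # normalized initial state

w = np.abs(Psi.conj().T @ phi) ** 2             # |<Psi_j|Phi(0)>|^2
p = np.sum(P_diag[:, None] * np.abs(Psi) ** 2, axis=0)   # <Psi_j|P|Psi_j>
lhs = np.sum(w * p)                             # first line of (2.20)
D_eff = 1.0 / np.sum(w**2)                      # definition (2.6)
rhs = np.sqrt(D / D_eff * (k / D))              # last line of (2.20), with <P>_inf = k/D
print(lhs, rhs)
```

In the toy model, as in the proof, a large \(D_\textrm{eff}\) together with a small \(\langle \hat{P}\rangle _\infty \) forces the long-time average to be small.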

Below we prove the large-deviation type upper bound

$$\begin{aligned} \langle \hat{P}^{\Gamma ,\varepsilon }_\textrm{neq}\rangle _\infty \le C\exp \Bigl [-\frac{\varepsilon ^2}{3\gamma (1-\gamma )}N\Bigr ] =C\exp \Bigl [-2\rho \Bigl (\frac{\varepsilon }{\varepsilon _0(\rho )}\Bigr )^2N\Bigr ], \end{aligned}$$
(2.22)

with a constant \(C>1\), assuming that N is sufficiently large and \(\varepsilon \) is sufficiently small. Note that the right-hand side reduces to \(Ce^{-2\rho N}\) if we set \(\varepsilon =\varepsilon _0(\rho )\). Recalling (2.7), we find that the right-hand side of (2.20) with \(\varepsilon =\varepsilon _0(\rho )\) is bounded from above by \(\sqrt{C}\,e^{-(\rho /2)N}\). This means that there is a sufficiently large T such that the finite-time average satisfies

$$\begin{aligned} \frac{1}{T}\int _0^Tdt\,\langle \Phi (t)|\hat{P}^{\Gamma ,\varepsilon _0(\rho )}_\textrm{neq}|\Phi (t)\rangle \le e^{-(\rho /2)N}. \end{aligned}$$
(2.23)

To rewrite the obtained relation into the form of Theorem 2.4, we apply Markov's inequality. We let G be the set of times at which (2.16) is satisfied with probability larger than \(1-e^{-(\rho /4)N}\):

$$\begin{aligned} G=\bigl \{t\in [0,T]\,\bigl |\,\langle \Phi (t)|\hat{P}^{\Gamma ,\varepsilon _0(\rho )}_\textrm{neq}|\Phi (t)\rangle \le e^{-(\rho /4)N}\bigr \}. \end{aligned}$$
(2.24)

The property of G stated in the theorem is fulfilled by construction. It remains to verify (2.15) for the above G. For this, it suffices to note that

$$\begin{aligned} \frac{1}{T}\int _0^Tdt\,\langle \Phi (t)|\hat{P}^{\Gamma ,\varepsilon _0(\rho )}_\textrm{neq}|\Phi (t)\rangle \ge \frac{1}{T}\int _{t\in [0,T]\backslash G}dt\,e^{-(\rho /4)N} =\frac{T-\mu (G)}{T}\,e^{-(\rho /4)N}, \end{aligned}$$
(2.25)

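which, combined with (2.23), gives \((T-\mu (G))/T\le e^{-(\rho /2)N}/e^{-(\rho /4)N}=e^{-(\rho /4)N}\), i.e., \(\mu (G)\ge (1-e^{-(\rho /4)N})\,T\). This implies the desired (2.15). \(\square \)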
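The Markov-inequality step (2.24)–(2.25) uses only that the integrand is nonnegative. The sketch below illustrates it with the time integral replaced by an average over hypothetical samples \(f(t_i)\ge 0\) on a uniform time grid: the fraction of "good" times with \(f(t_i)\le b\) is at least \(1-\bar f/b\). The samples are synthetic and the function name is ours.

```python
import random


def markov_check(samples, threshold):
    """Return (fraction of samples <= threshold, Markov lower bound 1 - mean/threshold).

    The samples are nonnegative stand-ins for <Phi(t_i)|P_neq|Phi(t_i)> on a
    uniform time grid; the fraction plays the role of mu(G)/T in (2.24)-(2.25).
    """
    if any(f < 0 for f in samples):
        raise ValueError("Markov's inequality needs nonnegative samples")
    mean = sum(samples) / len(samples)
    good = sum(1 for f in samples if f <= threshold) / len(samples)
    return good, 1 - mean / threshold


rng = random.Random(1)
samples = [rng.random() ** 8 for _ in range(10_000)]  # synthetic, mostly small values
print(markov_check(samples, threshold=0.2))
```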

Derivation of (2.22)

We shall be brief since the derivation is standard and elementary. Let \(\hat{P}_M\) be the projection onto the subspace with \(\hat{N}_\Gamma =M\). It is clear that

$$\begin{aligned} \hat{P}^{\Gamma ,\varepsilon }_\textrm{neq}=\mathop {\sum _{M}}_{(|M/N-\gamma |\ge \varepsilon )}\hat{P}_M, \end{aligned}$$
(2.26)

and

$$\begin{aligned} \langle \hat{P}_M\rangle _\infty \sim \exp \left[ \gamma L\,S\left( \frac{M}{\gamma L}\right) +(1-\gamma )L\,S\left( \frac{N-M}{(1-\gamma )L}\right) -L\,S \left( \frac{N}{L}\right) \right] . \end{aligned}$$
(2.27)

When \(|M/N-\gamma |=\varepsilon \) or, equivalently, \(M/N-\gamma =\pm \varepsilon \), the first two arguments of \(S(\cdot )\) in the above expression read

$$\begin{aligned} \frac{M}{\gamma L}=\Bigl (1\pm \frac{\varepsilon }{\gamma }\Bigr )\rho ,\quad \frac{N-M}{(1-\gamma )L}=\Bigl (1\mp \frac{\varepsilon }{1-\gamma }\Bigr )\rho . \end{aligned}$$
(2.28)

Since \(\langle \hat{P}_M\rangle _\infty \) is sharply peaked around \(M=\gamma N\) (i.e., \(M/(\gamma L)=\rho \)) and decreases away from it, the probability that \(|M/N-\gamma |\ge \varepsilon \) coincides, to leading exponential order, with the probability that \(|M/N-\gamma |\simeq \varepsilon \). We thus have

$$\begin{aligned} \langle \hat{P}^{\Gamma ,\varepsilon }_\textrm{neq}\rangle _\infty&\sim \max _\pm \exp \left[ \gamma L\,S\left( \left( 1\pm \tfrac{\varepsilon }{\gamma }\right) \rho \right) +(1-\gamma )L\,S\left( \left( 1\mp \tfrac{\varepsilon }{1-\gamma }\right) \rho \right) -L\,S(\rho )\right] \nonumber \\&=\exp \biggl [-\Bigl \{\frac{1}{2}\frac{1}{\gamma (1-\gamma )}\frac{\rho }{1-\rho }\varepsilon ^2+O(\varepsilon ^3)\Bigr \}\,L\biggr ]. \end{aligned}$$
(2.29)

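Since \(\rho L/(1-\rho )=N/(1-\rho )\ge N\), the exponent in (2.29) is at least \(\varepsilon ^2N/\{2\gamma (1-\gamma )\}\) up to the \(O(\varepsilon ^3)\) correction. For sufficiently large L and small \(\varepsilon \), the margin between the factors 1/2 and 1/3 absorbs both the \(O(\varepsilon ^3)\) term and the subexponential prefactor hidden in \(\sim \), and we obtain the inequality (2.22). \(\square \)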
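For moderate sizes, \(\langle \hat{P}^{\Gamma ,\varepsilon }_\textrm{neq}\rangle _\infty \) can be evaluated exactly. Assuming hard-core particles (at most one per site), all N-particle configurations are equally likely at infinite temperature, so \(\hat{N}_\Gamma \) is hypergeometrically distributed; this is the exact content of (2.26) and (2.27) before the entropy approximation. The sketch below assumes \(\gamma L\) is an integer. The comparison constant \(C=2\) for \(\gamma =1/2\) is our choice, justified by Hoeffding's tail bound \(2e^{-2\varepsilon ^2N}\) for sampling without replacement; it is not a value taken from the text.

```python
from fractions import Fraction
from math import comb, exp


def p_neq_infinite_T(L, N, gamma, eps):
    """Exact infinite-temperature probability that |N_Gamma/N - gamma| >= eps.

    Hard-core particles are assumed, so all comb(L, N) configurations are
    equally likely and N_Gamma is hypergeometric. gamma and eps are Fractions
    so that the boundary case |M/N - gamma| = eps is decided exactly.
    """
    size = gamma * L
    if size.denominator != 1:
        raise ValueError("gamma * L must be an integer")
    size = int(size)
    count = sum(comb(size, M) * comb(L - size, N - M)
                for M in range(N + 1)
                if abs(Fraction(M, N) - gamma) >= eps)
    return count / comb(L, N)  # exact integer ratio converted to float


L, N = 1000, 100
gamma = Fraction(1, 2)
for eps in (Fraction(1, 10), Fraction(1, 5)):
    exact = p_neq_infinite_T(L, N, gamma, eps)
    # right-hand side of (2.22) with the (assumed) constant C = 2
    bound = 2 * exp(-float(eps) ** 2 * N / (3 * float(gamma * (1 - gamma))))
    print(float(eps), exact, bound)
```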

2.4 Nature of the Final State

As we have noted several times, we expect that the state \(|\Phi (t)\rangle \) with sufficiently large and typical t represents (with certain limited accuracy) the thermal equilibrium state of the whole system at infinite temperature. Here we briefly explain why the infinite temperature state, rather than a finite temperature state, is the destination of the relaxation process.

Let us assume in general that the Hamiltonian is written as

$$\begin{aligned} \hat{H}=\hat{H}_1+\hat{H}_2+{\varDelta }\hat{H}, \end{aligned}$$
(2.30)

where \(\hat{H}_1\) and \(\hat{H}_2\) act only on \(\Lambda _1\) and \(\Lambda _2\), respectively, and \({\varDelta }\hat{H}\) is the interaction Hamiltonian between \(\Lambda _1\) and \(\Lambda _2\). We assume that \(\hat{H}_1\) and \(\hat{H}_2\) are almost identical and that \({\varDelta }\hat{H}\) is small compared with them.Footnote 6 We shall use the standard convention that \(\hat{H}_1|\Phi _\textrm{vac}\rangle =\hat{H}_2|\Phi _\textrm{vac}\rangle ={\varDelta }\hat{H}|\Phi _\textrm{vac}\rangle =0\), where \(|\Phi _\textrm{vac}\rangle \) is the state with no particles. Then we see from energy conservation that

$$\begin{aligned} \langle \Phi (t)|\hat{H}|\Phi (t)\rangle =\langle \Phi (0)|\hat{H}|\Phi (0)\rangle \simeq \langle \Phi (0)|\hat{H}_1|\Phi (0)\rangle \simeq \frac{{\text {Tr}}_{\mathcal{H}_1}[\hat{H}_1]}{{\text {Tr}}_{\mathcal{H}_1}[\hat{1}]}, \end{aligned}$$
(2.31)

where we recalled that \(|\Phi (0)\rangle \) is drawn randomly from \(\mathcal{H}_1\). In a standard lattice gas model at low density, we expect from extensivity that

$$\begin{aligned} \frac{{\text {Tr}}_{\mathcal{H}_1}[\hat{H}_1]}{{\text {Tr}}_{\mathcal{H}_1}[\hat{1}]}\simeq \frac{{\text {Tr}}_{\mathcal{H}_\textrm{tot}}[\hat{H}]}{{\text {Tr}}_{\mathcal{H}_\textrm{tot}}[\hat{1}]}\simeq N\epsilon _\infty , \end{aligned}$$
(2.32)

where \(\epsilon _\infty \) is the energy per particle in the equilibrium state at infinite temperature. We thus see

$$\begin{aligned} \langle \Phi (t)|\hat{H}|\Phi (t)\rangle \simeq N\epsilon _\infty , \end{aligned}$$
(2.33)

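i.e., \(|\Phi (t)\rangle \) has almost the same energy as the equilibrium state of the whole system at infinite temperature. For the free fermion chain of Sect. 3, this can be checked explicitly: the hopping Hamiltonian (3.1) and its restriction to \(\Lambda _1\) have vanishing diagonal elements in the basis of particle configurations, so both traces in (2.32) vanish and \(\epsilon _\infty =0\). In summary, if the initial state \(|\Phi (0)\rangle \) has an almost saturating effective dimension, then the state after time evolution \(|\Phi (t)\rangle \) represents thermal equilibrium at infinite temperature.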
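For the chain (3.1), the infinite-temperature energy can also be read off from the spectrum (3.7): averaging \(E_{\varvec{k}}\) over all \(\binom{L}{N}\) eigenstates gives \({\text {Tr}}_{\mathcal{H}_\textrm{tot}}[\hat{H}]/D_\textrm{tot}\), and the single-particle energies \(2\cos (k+\theta )\) sum to zero over \(\mathcal{K}\). A minimal brute-force check, with arbitrarily chosen small L, N, and \(\theta \) (the function name is ours):

```python
import itertools
import math


def infinite_T_energy(L, N, theta):
    """Tr[H]/D_tot for the free fermion chain, computed from the eigenvalues (3.7)."""
    half = (L - 1) // 2
    ks = [2 * math.pi * nu / L for nu in range(-half, half + 1)]
    energies = [2 * sum(math.cos(k + theta) for k in combo)
                for combo in itertools.combinations(ks, N)]
    return sum(energies) / len(energies)


print(infinite_T_energy(11, 3, 0.3))  # zero up to rounding error
```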

3 Free Fermion on the Chain

In this section, we discuss our concrete example, namely the one-dimensional system of free fermions. We shall show that the model satisfies Assumptions 2.1 and 2.2 if we take a suitable setting.

3.1 Energy Eigenstates and Eigenvalues

We consider the chain \(\Lambda =\{1,2,\ldots ,L\}\), where L is odd. We denote the sites as \(x,y,\ldots \in \Lambda \). Let \(\hat{c}_x\) and \(\hat{c}^\dagger _x\) be the annihilation and creation operators, respectively, of the fermion at site \(x\in \Lambda \). They satisfy the canonical anticommutation relations \(\{\hat{c}_x,\hat{c}_y\}=0\) and \(\{\hat{c}_x,\hat{c}^\dagger _y\}=\delta _{x,y}\) for any \(x,y\in \Lambda \), where \(\{\hat{A},\hat{B}\}=\hat{A}\hat{B}+\hat{B}\hat{A}\). We denote by \(|\Phi _\textrm{vac}\rangle \) the state with no particles.

We take the standard Hamiltonian

$$\begin{aligned} \hat{H}=\sum _{x=1}^L\bigl \{e^{i\theta }\,\hat{c}^\dagger _x\hat{c}_{x+1}+e^{-i\theta }\,\hat{c}^\dagger _{x+1}\hat{c}_x\bigr \}, \end{aligned}$$
(3.1)

where we set the hopping amplitude to be unity for convenience. We introduced the artificial phase factor \(\theta \in [0,2\pi )\) in order to avoid degeneracy. We impose the periodic boundary condition and identify \(\hat{c}_{L+1}\) with \(\hat{c}_1\).

The Hamiltonian \(\hat{H}\) is readily diagonalized in terms of the plane wave states. Setting the k-space as

$$\begin{aligned} \mathcal{K}=\Bigl \{\frac{2\pi }{L}\nu \,\Bigl |\,\nu =0,\pm 1,\ldots ,\pm \frac{L-1}{2}\Bigr \}, \end{aligned}$$
(3.2)

we define the creation operator

$$\begin{aligned} \hat{a}^\dagger _k=\frac{1}{\sqrt{L}}\sum _{x=1}^Le^{ikx}\,\hat{c}^\dagger _x, \end{aligned}$$
(3.3)

for \(k\in \mathcal{K}\). It holds that \(\{\hat{a}^\dagger _k,\hat{a}_{k'}\}=\delta _{k,k'}\). One can show from the basic anticommutation relations that

$$\begin{aligned}{}[\hat{H},\hat{a}^\dagger _k]=2\cos (k+\theta )\,\hat{a}^\dagger _k. \end{aligned}$$
(3.4)

Let \(\varvec{k}=(k_1,\ldots ,k_N)\) denote a collection of N elements in \(\mathcal{K}\) such that \(k_j<k_{j+1}\) for \(j=1,\ldots ,N-1\), and define

$$\begin{aligned} |\Psi _{\varvec{k}}\rangle =\hat{a}^\dagger _{k_1}\hat{a}^\dagger _{k_2}\cdots \hat{a}^\dagger _{k_N}|\Phi _\textrm{vac}\rangle . \end{aligned}$$
(3.5)

From (3.4) we readily see that \(|\Psi _{\varvec{k}}\rangle \) is an energy eigenstate, i.e.,

$$\begin{aligned} \hat{H}|\Psi _{\varvec{k}}\rangle =E_{\varvec{k}}|\Psi _{\varvec{k}}\rangle , \end{aligned}$$
(3.6)

where the energy eigenvalue is

$$\begin{aligned} E_{\varvec{k}}=2\sum _{j=1}^N\cos (k_j+\theta ). \end{aligned}$$
(3.7)

Since the states (3.5) are orthonormal and there are \(\binom{L}{N}\) of them, which equals the dimension of the N-particle Hilbert space, these are all the energy eigenstates and eigenvalues.
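The commutator (3.4) reduces to a statement about single-particle wave functions: writing \(\varphi _k(x)=e^{ikx}/\sqrt{L}\) for the coefficients in (3.3), one needs \(e^{i\theta }\varphi _k(x+1)+e^{-i\theta }\varphi _k(x-1)=2\cos (k+\theta )\,\varphi _k(x)\). The sketch below checks this numerically. Labelling the sites \(0,\ldots ,L-1\) instead of \(1,\ldots ,L\) is our implementation choice and only multiplies \(\varphi _k\) by a constant phase; the function names are ours.

```python
import cmath
import math


def hop(psi, theta):
    """Single-particle action of H in (3.1) with periodic boundary conditions:
    (h psi)(x) = e^{i theta} psi(x+1) + e^{-i theta} psi(x-1)."""
    L = len(psi)
    return [cmath.exp(1j * theta) * psi[(x + 1) % L]
            + cmath.exp(-1j * theta) * psi[(x - 1) % L] for x in range(L)]


def max_residual(L, theta):
    """Largest |h phi_k - 2 cos(k + theta) phi_k| over all k in the set (3.2)."""
    half = (L - 1) // 2
    worst = 0.0
    for nu in range(-half, half + 1):
        k = 2 * math.pi * nu / L
        phi = [cmath.exp(1j * k * x) / math.sqrt(L) for x in range(L)]
        eps_k = 2 * math.cos(k + theta)
        worst = max(worst, max(abs(a - eps_k * b) for a, b in zip(hop(phi, theta), phi)))
    return worst


print(max_residual(7, 0.3))  # zero up to rounding error
```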

3.2 Justification of Assumption 2.1

We prove two theorems for the free fermion chain that justify Assumption 2.1 about the absence of degeneracy in the energy eigenvalues. Note that the free fermion model on the continuous interval always has degenerate many-body energy eigenvalues. The degeneracy cannot be lifted by the flux insertion (which corresponds to the phase \(\theta \)). The following results on nondegeneracy essentially rely on the lattice nature of the model.

The first theorem rules out the degeneracy for most values of \(\theta \).Footnote 7

Theorem 3.1

(Nondegeneracy of \(E_{\varvec{k}}\) for most \(\theta \)) Let L be an arbitrary odd prime and N be an arbitrary integer with \(0<N\le L\). Except for a finite number of \(\theta \in [0,2\pi )\), one has \(E_{\varvec{k}}\ne E_{\varvec{k}'}\) whenever \(\varvec{k}\ne \varvec{k}'\), i.e., the energy eigenvalues \(E_{\varvec{k}}\) are nondegenerate.

The theorem, in particular, implies that if one draws \(\theta \) randomly from a continuous distribution (for instance, uniformly from \([0,2\pi )\)), then with probability one all the energy eigenvalues \(E_{\varvec{k}}\) are nondegenerate. The second theorem allows one to choose a model free from degeneracy without relying on a probabilistic choice.

Theorem 3.2

(Nondegeneracy of \(E_{\varvec{k}}\) for small \(|\theta |\ne 0\)) Let L be an arbitrary odd prime and N be an arbitrary integer with \(0<N\le L\). For any \(\theta \ne 0\) such that

$$\begin{aligned} |\theta |\le \frac{1}{(4N+2L)^{(L-1)/2}}, \end{aligned}$$
(3.8)

one has \(E_{\varvec{k}}\ne E_{\varvec{k}'}\) whenever \(\varvec{k}\ne \varvec{k}'\), i.e., the energy eigenvalues \(E_{\varvec{k}}\) are nondegenerate.

One thus knows that the model with, say, \(\theta =(4N+2L)^{-(L-1)/2}\) is free from degeneracy.

As we noted after Assumption 2.1, it is expected that the energy eigenvalues of a quantum many-body system are generically nondegenerate when there is no reason (like symmetry) to cause degeneracy. Even for a model of free fermions, we expect that possible degeneracy can be lifted by tuning some parameters, like the flux \(\theta \) or the site-dependent potential or hopping amplitude. However, it turns out that demonstrating nondegeneracy rigorously is very difficult in general. That is why we considered an artificial setting where the system size L is a prime number. In this case, the absence of degeneracy can be demonstrated by using number-theoretic results, as we shall see below.

We also note that the absence of degeneracy was rigorously established in a disordered free fermion chain. See Appendix A of [54].

To prove Theorems 3.1 and 3.2, it is convenient to introduce the standard occupation number description. For a given N-tuple \(\varvec{k}=(k_1,\ldots ,k_N)\), we define the corresponding occupation numbers \(\varvec{n}=(n_{-(L-1)/2},\ldots ,n_{(L-1)/2})\) as

$$\begin{aligned} n_\nu ={\left\{ \begin{array}{ll} 1,&{}\text {if }2\pi \nu /L=k_j \text { for some }j;\\ 0,&{}\text {otherwise}, \end{array}\right. } \end{aligned}$$
(3.9)

where \(\nu =0,\pm 1,\ldots ,\pm (L-1)/2\). One clearly has

$$\begin{aligned} \sum _{\nu =-(L-1)/2}^{(L-1)/2}n_\nu =N. \end{aligned}$$
(3.10)

By using the occupation numbers, the energy eigenstate (3.5) and the energy eigenvalue (3.7) are written as

$$\begin{aligned} |\Psi _{\varvec{n}}\rangle =\Biggl (\prod _{\nu =-(L-1)/2}^{(L-1)/2}(\hat{a}^\dagger _{2\pi \nu /L})^{n_\nu }\Biggr )|\Phi _\textrm{vac}\rangle , \end{aligned}$$
(3.11)

and

$$\begin{aligned} E_{\varvec{n}}=2\sum _{\nu =-(L-1)/2}^{(L-1)/2}n_\nu \,\cos \Bigl (\frac{2\pi }{L}\nu +\theta \Bigr ), \end{aligned}$$
(3.12)

respectively. By defining “complex energy” by

$$\begin{aligned} \mathcal{E}_{\varvec{n}}=\sum _{\nu =-(L-1)/2}^{(L-1)/2}n_\nu \,\zeta ^\nu , \end{aligned}$$
(3.13)

with

$$\begin{aligned} \zeta =e^{i 2\pi /L}, \end{aligned}$$
(3.14)

we can express the energy eigenvalue (3.12) as

$$\begin{aligned} E_{\varvec{n}}=2{\text {Re}}[e^{i\theta }\mathcal{E}_{\varvec{n}}]. \end{aligned}$$
(3.15)
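The identity (3.15) is easy to confirm by brute force over all occupation vectors for a small chain. In the sketch below, L, N, and \(\theta \) are arbitrary illustrative values and the helper names are ours.

```python
import cmath
import itertools
import math


def energy(n, L, theta):
    """E_n from (3.12); n lists occupations for nu = -(L-1)/2, ..., (L-1)/2."""
    half = (L - 1) // 2
    return 2 * sum(occ * math.cos(2 * math.pi * nu / L + theta)
                   for nu, occ in zip(range(-half, half + 1), n))


def complex_energy(n, L):
    """Complex energy (3.13) with zeta = exp(2 pi i / L) as in (3.14)."""
    half = (L - 1) // 2
    zeta = cmath.exp(2j * math.pi / L)
    return sum(occ * zeta ** nu for nu, occ in zip(range(-half, half + 1), n))


L, N, theta = 7, 3, 0.4
worst = 0.0
for n in itertools.product((0, 1), repeat=L):
    if sum(n) != N:
        continue  # keep only N-particle occupations, cf. (3.10)
    rhs = 2 * (cmath.exp(1j * theta) * complex_energy(n, L)).real  # (3.15)
    worst = max(worst, abs(energy(n, L, theta) - rhs))
print(worst)  # zero up to rounding error
```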

Let us state two number-theoretic lemmas,Footnote 8 which play essential roles in the proofs of Theorems 3.1 and 3.2. Recall that L is an odd prime and \(\zeta \) is defined as in (3.14).

Lemma 3.3

For any \(m_1,\ldots ,m_{L-1}\in \mathbb {Z}\) such that \(m_\mu \ne 0\) for some \(\mu \), one has

$$\begin{aligned} \sum _{\mu =1}^{L-1}m_\mu \,\zeta ^\mu \ne 0. \end{aligned}$$
(3.16)

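Here, it is crucial that the sum runs from 1 to \(L-1\) rather than from 1 to L: otherwise (3.16) would fail, since \(\sum _{\mu =1}^L\zeta ^\mu =0\) (take \(m_\mu =1\) for all \(\mu \)). The lemma is a straightforward consequence of the classical result by Gauss known as the irreducibility of the cyclotomic polynomial of prime index: the minimal polynomial of \(\zeta \) over \(\mathbb {Q}\) is \(1+x+\cdots +x^{L-1}\), of degree \(L-1\), so \(1,\zeta ,\ldots ,\zeta ^{L-2}\), and hence \(\zeta ,\zeta ^2,\ldots ,\zeta ^{L-1}\), are linearly independent over \(\mathbb {Q}\). See, e.g., Chapter 12, Section 2 of [56], and also Chapter 13, Section 2 of [57] or Section 3.2 of [58].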

The following lemmaFootnote 9 provides an explicit lower bound for \(|\sum _{\mu =1}^{L-1}m_\mu \,\zeta ^\mu |\).

Lemma 3.4

For any \(m_1,\ldots ,m_{L-1}\in \mathbb {Z}\) such that \(m_\mu \ne 0\) for some \(\mu \), one has

$$\begin{aligned} \biggl |\sum _{\mu =1}^{L-1}m_\mu \,\zeta ^\mu \biggr |\ge \biggl (\sum _{\mu =1}^{L-1}|m_\mu |\biggr )^{-(L-3)/2}. \end{aligned}$$
(3.17)

Proof

The lemma is proved by using standard facts about the field norm and algebraic integers. See, e.g., [58]. Let \(\alpha =\sum _{\mu =1}^{L-1}m_\mu \,\zeta ^\mu \in \mathbb {Z}[\zeta ]\subset \mathbb {Q}[\zeta ]\) and

$$\begin{aligned} \sigma _j(\alpha )=\sum _{\mu =1}^{L-1}m_\mu \,e^{i2\pi j\mu /L}, \end{aligned}$$
(3.18)

be its conjugates, where \(j=1,\ldots ,L-1\). Note that \(\sigma _1(\alpha )=\alpha \), \(\sigma _j(\alpha )=\{\sigma _{L-j}(\alpha )\}^*\), and \(|\sigma _j(\alpha )|\le M\) with \(M=\sum _{\mu =1}^{L-1}|m_\mu |\). Let \(N:\mathbb {Q}[\zeta ]\rightarrow \mathbb {Q}\) denote the field norm of \(\mathbb {Q}[\zeta ]\) (not to be confused with the particle number N). By definition, we have

$$\begin{aligned} N(\alpha )=\prod _{j=1}^{L-1}\sigma _j(\alpha )=\prod _{j=1}^{(L-1)/2}|\sigma _j(\alpha )|^2. \end{aligned}$$
(3.19)

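Since j is coprime to L, the map \(\mu \mapsto j\mu \) (mod L) permutes \(\{1,\ldots ,L-1\}\); hence each \(\sigma _j(\alpha )\) is a sum of the form (3.16) with permuted, not identically vanishing coefficients, and Lemma 3.3 guarantees \(\sigma _j(\alpha )\ne 0\) for all j. We thus see that \(N(\alpha )>0\). Note that \(\alpha \) is an algebraic integer, and hence so are its conjugates \(\sigma _j(\alpha )\) and the norm \(N(\alpha )\). It is known that an algebraic integer that is rational must be an integer. Since \(N(\alpha )\in \mathbb {Q}\), we see \(N(\alpha )\in \mathbb {Z}\) and hence \(N(\alpha )\ge 1\). Combining this bound with (3.19) and \(|\sigma _j(\alpha )|\le M\), we obtain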

$$\begin{aligned} |\alpha |^2\ge \biggl (\prod _{j=2}^{(L-1)/2}\bigl |\sigma _j(\alpha )\bigr |^2\biggr )^{-1}\ge \frac{1}{M^{L-3}}. \end{aligned}$$
(3.20)

\(\square \)
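Lemmas 3.3 and 3.4 can be probed numerically for small primes by enumerating all integer vectors \((m_1,\ldots ,m_{L-1})\) with entries in a bounded range. This is only a finite sanity check, not a proof; the function name and the enumeration range are our choices.

```python
import cmath
import itertools
import math


def lemma_3_4_ratio(L, bound):
    """Smallest |sum_mu m_mu zeta^mu| * M**((L-3)/2) over nonzero integer vectors
    (m_1, ..., m_{L-1}) with entries in [-bound, bound], where M = sum_mu |m_mu|.

    Lemma 3.4 predicts a value >= 1 when L is prime.
    """
    zeta = cmath.exp(2j * math.pi / L)
    powers = [zeta ** mu for mu in range(1, L)]
    worst = math.inf
    for m in itertools.product(range(-bound, bound + 1), repeat=L - 1):
        M = sum(abs(x) for x in m)
        if M == 0:
            continue
        alpha = sum(c * z for c, z in zip(m, powers))
        worst = min(worst, abs(alpha) * M ** ((L - 3) / 2))
    return worst


print(lemma_3_4_ratio(5, 2), lemma_3_4_ratio(7, 1))
# Extending the sum to mu = L breaks Lemma 3.3: all coefficients equal to 1 give zero.
print(abs(sum(cmath.exp(2j * math.pi * mu / 5) for mu in range(1, 6))))
```

Both ratios come out equal to 1 (up to rounding), attained by a single nonzero coefficient, so the bound (3.17) is tight in these small cases.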

We are now ready to prove our physics theorems.

Proof of Theorem 3.1

We first show \(\mathcal{E}_{\varvec{n}}\ne \mathcal{E}_{\varvec{n}'}\) if \(\varvec{n}\ne \varvec{n}'\), where both \(\varvec{n}\) and \(\varvec{n}'\) are occupation numbers for N particle energy eigenstates. In other words, the complex energy eigenvalues are nondegenerate. From (3.13), we find

$$\begin{aligned} \mathcal{E}_{\varvec{n}}-\mathcal{E}_{\varvec{n}'}=\sum _{\nu =-(L-1)/2}^{(L-1)/2}(n_\nu -n'_\nu )\,\zeta ^\nu . \end{aligned}$$
(3.21)

We claim that there is at least one \(\nu \) such that \(n_{\nu }-n'_{\nu }=0\). Indeed, if this were not the case, we would have \(n_\nu +n'_\nu =1\) for every \(\nu \), and summing over the L values of \(\nu \) would give \(L=2N\), which is impossible since L is odd. Let \(\nu _0\) be such a \(\nu \), i.e., \(n_{\nu _0}-n'_{\nu _0}=0\). Noting that the right-hand side of (3.21) does not contain the term proportional to \(\zeta ^{\nu _0}\), we rewrite it as

$$\begin{aligned} \mathcal{E}_{\varvec{n}}-\mathcal{E}_{\varvec{n}'}=\zeta ^{\nu _0}\sum _{\mu =1}^{L-1}m_\mu \,\zeta ^\mu , \end{aligned}$$
(3.22)

with \(m_\mu =n_{\nu _0+\mu }-n'_{\nu _0+\mu }\), where we used the “periodic boundary condition”, \(\nu =\nu +L\), for the index. Since \(m_\mu \) is not identically zero (because \(\varvec{n}\ne \varvec{n}'\)), we see \(\mathcal{E}_{\varvec{n}}-\mathcal{E}_{\varvec{n}'}\ne 0\) from Lemma 3.3.

Now, the statement of the theorem follows easily. Take any \(\varvec{n}\) and \(\varvec{n}'\) with \(\varvec{n}\ne \varvec{n}'\). Since \(e^{i\theta }(\mathcal{E}_{\varvec{n}}-\mathcal{E}_{\varvec{n}'})\ne 0\), (3.15) implies that the two energy eigenvalues \(E_{\varvec{n}}\) and \(E_{\varvec{n}'}\) are degenerate only at the two values of \(\theta \in [0,2\pi )\) for which the real part of \(e^{i\theta }(\mathcal{E}_{\varvec{n}}-\mathcal{E}_{\varvec{n}'})\) vanishes. Since there are \(D_\textrm{tot}\) distinct \(\varvec{n}\)’s, and hence \(D_\textrm{tot}(D_\textrm{tot}-1)/2\) pairs, the N-particle energy eigenvalues exhibit degeneracy at most at \(D_\textrm{tot}(D_\textrm{tot}-1)\) different values of \(\theta \). Except for these finitely many points in the interval \([0,2\pi )\), the Hamiltonian has no degeneracy.  \(\square \)
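The content of Theorem 3.1 can be seen on a small example. The sketch below, with our own helper names and the illustrative choices L = 5, N = 2, \(\theta =0.3\), computes the spectrum (3.7), shows the degeneracy at \(\theta =0\) and its absence at a generic \(\theta \), and checks that the complex energies (3.13) are pairwise distinct, as shown at the start of the proof.

```python
import cmath
import itertools
import math


def spectrum(L, N, theta):
    """Sorted many-body energies (3.7) of the free fermion chain."""
    half = (L - 1) // 2
    ks = [2 * math.pi * nu / L for nu in range(-half, half + 1)]
    return sorted(2 * sum(math.cos(k + theta) for k in combo)
                  for combo in itertools.combinations(ks, N))


def min_gap(levels):
    """Smallest spacing between consecutive sorted levels."""
    return min(b - a for a, b in zip(levels, levels[1:]))


def min_complex_separation(L, N):
    """Smallest |E_n - E_n'| of the complex energies (3.13) over distinct occupations."""
    zeta = cmath.exp(2j * math.pi / L)
    half = (L - 1) // 2
    values = [sum(zeta ** nu for nu in combo)
              for combo in itertools.combinations(range(-half, half + 1), N)]
    return min(abs(u - v) for u, v in itertools.combinations(values, 2))


print(min_gap(spectrum(5, 2, 0.0)))   # degenerate at theta = 0
print(min_gap(spectrum(5, 2, 0.3)))   # nondegenerate at a generic theta
print(min_complex_separation(5, 2))   # complex energies are pairwise distinct
```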

Proof of Theorem 3.2

Consider the model with \(\theta =0\). Because of the reflection symmetry \(\cos ((2\pi /L)\nu )=\cos (-(2\pi /L)\nu )\), we see from (3.12) that the energy eigenvalue \(E_{\varvec{n}}\) depends only on \(n_0\) and \(n_\nu +n_{-\nu }\) for \(\nu =1,\ldots ,(L-1)/2\). In particular, we get the same energy for \(n_\nu =1\), \(n_{-\nu }=0\) and \(n_\nu =0\), \(n_{-\nu }=1\). This means that \(E_{\varvec{n}}\) is in general degenerate, with a degree of degeneracy at most \(2^N\). We call such a degeneracy a trivial degeneracy.

We shall show that, in the model with \(\theta =0\), there are no degeneracies other than the trivial ones.Footnote 10 Take occupation numbers \(\varvec{n}\) and \(\varvec{n}'\) for N particles such that \(n_\nu +n_{-\nu }\ne n'_\nu +n'_{-\nu }\) for some \(\nu \) (including \(\nu =0\), for which the condition reads \(n_0\ne n'_0\)). Such a pair of eigenvalues \(E_{\varvec{n}}\) and \(E_{\varvec{n}'}\) is not related by a trivial degeneracy. Since \(\zeta ^*=\zeta ^{-1}\), we see from (3.13) and (3.15) with \(\theta =0\) that

$$\begin{aligned} E_{\varvec{n}}-E_{\varvec{n}'}=\mathcal{E}_{\varvec{n}}+(\mathcal{E}_{\varvec{n}})^*-\mathcal{E}_{\varvec{n}'}-(\mathcal{E}_{\varvec{n}'})^* =\sum _{\nu =-(L-1)/2}^{(L-1)/2}\tilde{n}_\nu \,\zeta ^\nu , \end{aligned}$$
(3.23)

where we set \(\tilde{n}_\nu =n_\nu +n_{-\nu }-n'_\nu -n'_{-\nu }\). Noting that \(\sum _{\nu =-(L-1)/2}^{(L-1)/2}\zeta ^\nu =0\), we rewrite (3.23) as

$$\begin{aligned} E_{\varvec{n}}-E_{\varvec{n}'}= \sum _{\nu =-(L-1)/2}^{(L-1)/2}(\tilde{n}_\nu -\tilde{n}_0)\,\zeta ^\nu =\sum _{\mu =1}^{L-1}m_\mu \,\zeta ^\mu , \end{aligned}$$
(3.24)

where

$$\begin{aligned} m_\mu ={\left\{ \begin{array}{ll} \tilde{n}_\mu -\tilde{n}_0,&{}\mu =1,\ldots ,\frac{L-1}{2};\\ \tilde{n}_{\mu -L}-\tilde{n}_0,&{}\mu =\frac{L+1}{2},\ldots ,L-1. \end{array}\right. } \end{aligned}$$
(3.25)

We shall see at the end of the proof that \(m_\mu \ne 0\) for some \(\mu \). Then, noting that

$$\begin{aligned} \sum _{\mu =1}^{L-1}|m_\mu |\le \sum _{\nu =-(L-1)/2}^{(L-1)/2}\{|\tilde{n}_\nu |+|\tilde{n}_0|\}\le 4N+2L, \end{aligned}$$
(3.26)

we find from Lemma 3.4 that

$$\begin{aligned} |E_{\varvec{n}}-E_{\varvec{n}'}|\ge \frac{1}{(4N+2L)^{(L-3)/2}}. \end{aligned}$$
(3.27)

This, in particular, means that the energy eigenvalues \(E_{\varvec{n}}\) and \(E_{\varvec{n}'}\) are not degenerate.

We shall now examine the effect of nonzero \(\theta \). We make the \(\theta \)-dependence of the energy eigenvalues explicit by writing \(E^{(\theta )}_{\varvec{n}}\) instead of \(E_{\varvec{n}}\).

Suppose for some \(\varvec{n}\ne \varvec{n}'\) that \(E^{(0)}_{\varvec{n}}=E^{(0)}_{\varvec{n}'}\), i.e., \({\text {Re}}\mathcal{E}_{\varvec{n}}={\text {Re}}\mathcal{E}_{\varvec{n}'}\). By what we have just shown, this degeneracy must be a trivial one. Since \(\mathcal{E}_{\varvec{n}}\ne \mathcal{E}_{\varvec{n}'}\) (see the proof of Theorem 3.1 above), we must have \({\text {Im}}\mathcal{E}_{\varvec{n}}\ne {\text {Im}}\mathcal{E}_{\varvec{n}'}\). Recalling that (3.15) implies \(E^{(\theta )}_{\varvec{n}}=2\cos \theta \,{\text {Re}}\mathcal{E}_{\varvec{n}}-2\sin \theta \,{\text {Im}}\mathcal{E}_{\varvec{n}}\), we see \(E^{(\theta )}_{\varvec{n}}\ne E^{(\theta )}_{\varvec{n}'}\) for any \(\theta \ne 0,\pi \). Trivial degeneracies are thus completely lifted.

Since we have shown that trivial degeneracies are lifted for any \(\theta \) with \(0<|\theta |<\pi \), we look for a sufficient condition under which no additional (nontrivial) degeneracy is created when \(\theta \) is varied slightly from 0. We observe from (3.12) that the resulting change in each energy eigenvalue is bounded as

$$\begin{aligned} |E^{(\theta )}_{\varvec{n}}-E^{(0)}_{\varvec{n}}|&\le 2\sum _{\nu =-(L-1)/2}^{(L-1)/2}n_\nu \,\biggl |\cos \Bigl (\frac{2\pi }{L}\nu +\theta \Bigr )-\cos \Bigl (\frac{2\pi }{L}\nu \Bigr )\biggr | \nonumber \\&<2\sum _{\nu =-(L-1)/2}^{(L-1)/2}n_\nu \,|\theta |=2N\,|\theta |, \end{aligned}$$
(3.28)

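for any \(\varvec{n}\) such that (3.10) holds. Hence, by the triangle inequality, \(|E^{(\theta )}_{\varvec{n}}-E^{(\theta )}_{\varvec{n}'}|>|E^{(0)}_{\varvec{n}}-E^{(0)}_{\varvec{n}'}|-4N|\theta |\), and we find from (3.27) that no additional degeneracy can be generated if

$$\begin{aligned} 2\times 2N\,|\theta |\le \frac{1}{(4N+2L)^{(L-3)/2}}. \end{aligned}$$
(3.29)

This is satisfied if the condition (3.8) in the theorem is valid, since then \(4N|\theta |<(4N+2L)\,|\theta |\le (4N+2L)^{-(L-3)/2}\).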

It remains to prove that \(m_\mu \ne 0\) for some \(\mu \), where \(m_\mu \) is defined in (3.25). To this end, we assume \(m_\mu =0\) for all \(\mu \). First, suppose \(\tilde{n}_0=0\). We then have \(\tilde{n}_\nu =0\) for all \(\nu \), but this contradicts the basic assumption that \(n_\nu +n_{-\nu }\ne n'_\nu +n'_{-\nu }\) for some \(\nu \). Next, suppose \(\tilde{n}_0\ne 0\). We then have \(n_\nu +n_{-\nu }-n'_\nu -n'_{-\nu }=\tilde{n}_0\ne 0\) for any \(\nu \ne 0\). Summing over \(\nu \ne 0\) gives \(\sum _{\nu \ne 0}n_\nu -\sum _{\nu \ne 0}n'_\nu =\frac{L-1}{2}\tilde{n}_0\). On the other hand, the constraint \(\sum _{\nu }n_\nu =\sum _{\nu }n'_\nu =N\) on the total particle number implies \(\sum _{\nu \ne 0}n_\nu -\sum _{\nu \ne 0}n'_\nu =-(n_0-n'_0)=-\tilde{n}_0/2\). The two relations give \((L/2)\,\tilde{n}_0=0\), contradicting \(\tilde{n}_0\ne 0\). \(\square \)
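Theorem 3.2 can be illustrated by evaluating the spectrum (3.7) at the explicit value \(\theta ^*=(4N+2L)^{-(L-1)/2}\), which saturates (3.8). The sketch below does this for two small, arbitrarily chosen sizes; the helper names are ours, and a positive minimal gap is of course only a finite-size check, not a proof.

```python
import itertools
import math


def spectrum(L, N, theta):
    """Sorted many-body energies (3.7) of the free fermion chain."""
    half = (L - 1) // 2
    ks = [2 * math.pi * nu / L for nu in range(-half, half + 1)]
    return sorted(2 * sum(math.cos(k + theta) for k in combo)
                  for combo in itertools.combinations(ks, N))


def min_gap(levels):
    """Smallest spacing between consecutive sorted levels."""
    return min(b - a for a, b in zip(levels, levels[1:]))


for L, N in [(5, 2), (7, 3)]:
    theta_star = (4 * N + 2 * L) ** (-(L - 1) / 2)  # saturates condition (3.8)
    print(L, N, theta_star, min_gap(spectrum(L, N, theta_star)))
```

The gaps are small because \(\theta ^*\) is small: the trivial degeneracies of the \(\theta =0\) model are split only by an amount proportional to \(\sin \theta ^*\).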

3.3 Justification of Assumption 2.2

We shall demonstrate that Assumption 2.2 about the particle distribution in the energy eigenstates is valid in the present free fermion chain. As in Sect. 2.1, we decompose the chain \(\Lambda =\{1,\ldots ,L\}\) into disjoint subsets as \(\Lambda =\Lambda _1\cup \Lambda _2\) with \(|\Lambda _1|=(L-1)/2\) and \(|\Lambda _2|=(L+1)/2\). An obvious choice is \(\Lambda _1=\{1,2,\ldots ,(L-1)/2\}\), but any subset of this size works equally well.

Let us decompose the creation operator \(\hat{a}^\dagger _k\) defined in (3.3) as

$$\begin{aligned} \hat{a}^\dagger _k=\hat{b}^\dagger _{1,k}+\hat{b}^\dagger _{2,k}, \end{aligned}$$
(3.30)

where

$$\begin{aligned} \hat{b}^\dagger _{\alpha ,k}=\frac{1}{\sqrt{L}}\sum _{x\in \Lambda _\alpha }e^{ikx}\,\hat{c}^\dagger _x, \end{aligned}$$
(3.31)

with \(\alpha =1,2\). Note that \(\{\hat{b}^\dagger _{1,k},\hat{b}_{1,k'}\}\) with \(k\ne k'\) is not necessarily vanishing. From (3.5), we obviously have

$$\begin{aligned} \hat{P}_1|\Psi _{\varvec{k}}\rangle =\hat{b}^\dagger _{1,k_1}\hat{b}^\dagger _{1,k_2}\ldots \hat{b}^\dagger _{1,k_N}|\Phi _\textrm{vac}\rangle , \end{aligned}$$
(3.32)

and hence

$$\begin{aligned} \langle \Psi _{\varvec{k}}|\hat{P}_1|\Psi _{\varvec{k}}\rangle&=\langle \Phi _\textrm{vac}|\hat{b}_{1,k_N}\ldots \hat{b}_{1,k_2}\hat{b}_{1,k_1} \hat{b}^\dagger _{1,k_1}\hat{b}^\dagger _{1,k_2}\ldots \hat{b}^\dagger _{1,k_N}|\Phi _\textrm{vac}\rangle \nonumber \\&\le \Vert \hat{b}_{1,k_1}\hat{b}^\dagger _{1,k_1}\Vert \,\langle \Phi _\textrm{vac}|\hat{b}_{1,k_N}\ldots \hat{b}_{1,k_2} \hat{b}^\dagger _{1,k_2}\ldots \hat{b}^\dagger _{1,k_N}|\Phi _\textrm{vac}\rangle . \end{aligned}$$
(3.33)

Here we used the basic property \(\langle \Psi |\hat{A}|\Psi \rangle \le \Vert \hat{A}\Vert \langle \Psi |\Psi \rangle \) of the operator norm of an arbitrary operator \(\hat{A}\). Noting that

$$\begin{aligned} \Vert \hat{b}_{1,k}\hat{b}^\dagger _{1,k}\Vert \le \frac{1}{2}, \end{aligned}$$
(3.34)

for any \(k\in \mathcal{K}\) (as we shall show below), applying (3.33) repeatedly N times gives \(\langle \Psi _{\varvec{k}}|\hat{P}_1|\Psi _{\varvec{k}}\rangle \le 2^{-N}\), which is the desired bound (2.5).

To estimate the norm \(\Vert \hat{b}_{1,k}\hat{b}^\dagger _{1,k}\Vert \), we first note by an explicit calculation that \(\{\hat{b}_{1,k}, \hat{b}^\dagger _{1,k}\}=|\Lambda _1|/L=p\) with \(p=(L-1)/(2L)<1/2\). Then, using \(\hat{b}_{1,k}\hat{b}_{1,k}=0\), we find \((\hat{b}_{1,k}\hat{b}^\dagger _{1,k})^2=\hat{b}_{1,k}(p-\hat{b}_{1,k}\hat{b}^\dagger _{1,k})\hat{b}^\dagger _{1,k}=p\,\hat{b}_{1,k}\hat{b}^\dagger _{1,k}\), so the self-adjoint operator \(\hat{b}_{1,k}\hat{b}^\dagger _{1,k}\) has eigenvalues 0 and p only. Since \(\hat{b}_{1,k}\hat{b}^\dagger _{1,k}|\Phi _\textrm{vac}\rangle =p\,|\Phi _\textrm{vac}\rangle \), this means \(\Vert \hat{b}_{1,k}\hat{b}^\dagger _{1,k}\Vert =p\), which implies (3.34).
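The bound (2.5) can also be checked directly for a small chain. The squared norm of (3.32) equals the determinant of the Gram matrix \(G_{ij}=\{\hat{b}_{1,k_i},\hat{b}^\dagger _{1,k_j}\}=L^{-1}\sum _{x\in \Lambda _1}e^{i(k_j-k_i)x}\), a standard identity for products of fermionic creation operators, and Hadamard's inequality for positive semidefinite matrices gives \(\det G\le p^N\), matching the repeated use of (3.33). The sketch below takes \(\Lambda _1=\{1,\ldots ,(L-1)/2\}\) and uses a hand-written complex determinant (our choice, to stay within the standard library); all names are ours.

```python
import cmath
import itertools
import math


def det(A):
    """Determinant of a complex square matrix by Gaussian elimination with partial pivoting."""
    A = [row[:] for row in A]
    n = len(A)
    result = complex(1.0)
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(A[r][col]))
        if A[pivot][col] == 0:
            return complex(0.0)
        if pivot != col:
            A[col], A[pivot] = A[pivot], A[col]
            result = -result
        result *= A[col][col]
        for r in range(col + 1, n):
            factor = A[r][col] / A[col][col]
            for c in range(col, n):
                A[r][c] -= factor * A[col][c]
    return result


def max_prob_all_in_lambda1(L, N):
    """Max over eigenstates (3.5) of <Psi_k|P_1|Psi_k> = det G, with Lambda_1 = {1, ..., (L-1)/2}."""
    half = (L - 1) // 2
    ks = [2 * math.pi * nu / L for nu in range(-half, half + 1)]
    sites1 = range(1, half + 1)
    best = 0.0
    for combo in itertools.combinations(ks, N):
        G = [[sum(cmath.exp(1j * (kj - ki) * x) for x in sites1) / L for kj in combo]
             for ki in combo]
        best = max(best, det(G).real)  # G is Hermitian and PSD, so det G is real and >= 0
    return best


L = 9
p = (L - 1) / (2 * L)
for N in (1, 2, 3):
    print(N, max_prob_all_in_lambda1(L, N), p ** N, 2.0 ** -N)
```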

It is clear that the above justification of Assumption 2.2 applies to a much more general class of free fermion systems. The only requirement is that there is a decomposition corresponding to (3.30) of the creation operator for single-particle energy eigenstate with the property (3.34). See Appendix B.2 for a class of examples.

4 Discussion

We developed in Sect. 2 a general theory for thermalization in low-density lattice gases. Under the two essential assumptions, namely, Assumption 2.2 about the particle distribution in energy eigenstates and Assumption 2.1 about nondegeneracy of energy eigenvalues, we have shown that the system exhibits thermalization (in a restricted sense) when the initial state is drawn randomly from the Hilbert space \(\mathcal{H}_1\) in which all particles are confined in the half-lattice \(\Lambda _1\). The essential observation, which is summarized in Theorem 2.3, is that Assumption 2.2 implies the lower bound (2.7) on the effective dimension of the initial state. Combined with standard arguments, the lower bound implies the desired statement about thermalization.

Then, in Sect. 3, we justified Assumptions 2.1 and 2.2 for a class of free fermion chains without relying on any unproven assumptions. Free fermion models, which have infinitely many conserved quantities, are often referred to as examples of systems that fail to thermalize. One might then be puzzled to see that we have established thermalization in free fermion chains. The essential point lies in the choice of the initial state \(|\Phi (0)\rangle \). In a non-interacting fermion model with translation invariance, for example, the momentum distribution does not change under the unitary time evolution. Thus the system never thermalizes if it starts from a state with a non-thermal momentum distribution. In our case, the momentum distribution is thermal from the beginning because the initial state is chosen to be a thermal state but with particles confined in the sublattice \(\Lambda _1\). In the language of generalized Gibbs ensembles, our generalized ensemble with the random initial state is characterized by local integrals of motion taking the same values as the equilibrium ones.

Although free fermion chains are the only examples in which we can fully justify our two assumptions, we stress that our general theory should apply to much more general models, most of which are non-integrable. Non-integrable models are believed to exhibit robust thermalization from an arbitrary realistic nonequilibrium initial state. When applied to such models, our thermalization theorem is expected to describe a partial aspect of thermalization exhibited by the model. We might say that our theory is general and broad enough to cover not only full-fledged thermalization in non-integrable systems but also (rather trivial) thermalization in free fermion chains.

As we have already discussed after Assumption 2.1, it is believed that energy eigenvalues are nondegenerate in a generic non-integrable model. Therefore, let us focus on Assumption 2.2, which asserts that the probability of finding all particles in the sublattice \(\Lambda _1\) does not exceed \(2^{-N}\) in any energy eigenstate as in (2.5). By accepting the assumption of nondegeneracy as a plausible one, we have two additional classes of models in which we can prove (2.5), as presented in Appendix B.

Assumption 2.2 is reminiscent of the (strong) ETH in the sense that we postulate that every energy eigenstate exhibits a more or less uniform particle distribution. Although we are able to prove the bound (2.5) only for limited models, we expect that it is valid for all (or for a great majority of) energy eigenstates of a generic macroscopic quantum system. The bound does not hold, for example, in a state where a macroscopic number of particles form a big cluster and move together, but such a state cannot be an energy eigenstate of a model with short-range interactions. We, in particular, note that the average of the probability \(\langle \Psi _j|\hat{P}_1|\Psi _j\rangle \) over all the energy eigenstates is

$$\begin{aligned} \frac{1}{D_\textrm{tot}}\sum _{j=1}^{D_\textrm{tot}}\langle \Psi _j|\hat{P}_1|\Psi _j\rangle =\frac{1}{D_\textrm{tot}}{\text {Tr}}[\hat{P}_1]=\frac{D_1}{D_\textrm{tot}}\sim 2^{-N}e^{-(L/2)\rho ^2}, \end{aligned}$$
(4.1)

and is much smaller than \(2^{-N}\).
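The estimate (4.1) can be checked with exact binomial coefficients. The sketch below assumes hard-core (fermionic) occupation, so that \(D_1=\binom{(L-1)/2}{N}\) and \(D_\textrm{tot}=\binom{L}{N}\) as in the free fermion chain; L and N are arbitrary illustrative values and the function name is ours. Each factor \((|\Lambda _1|-j)/(L-j)\) of the ratio is below 1/2, so the ratio is strictly smaller than \(2^{-N}\).

```python
from math import comb, exp


def avg_prob_lambda1(L, N):
    """D_1 / D_tot with |Lambda_1| = (L-1)/2, the left-hand side of (4.1)."""
    return comb((L - 1) // 2, N) / comb(L, N)


L, N = 1001, 50
rho = N / L
exact = avg_prob_lambda1(L, N)
# compare 2^N * D_1/D_tot with the factor e^{-(L/2) rho^2} in (4.1)
print(exact * 2 ** N, exp(-(L / 2) * rho ** 2))
```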

In Sect. 2, we only discussed thermalization in the sense that the fraction of particles in a macroscopic region \(\Gamma \) approaches its equilibrium value \(\gamma \). It is, however, clear from the proof that our method automatically extends to other criteria for thermal equilibrium. Let \(\hat{P}_\textrm{neq}\) be the projection operator onto the nonequilibrium subspace of \(\mathcal{H}_\textrm{tot}\) determined by a certain criterion for thermal equilibrium. If the canonical expectation value of \(\hat{P}_\textrm{neq}\) at infinite temperature satisfies

$$\begin{aligned} \langle \hat{P}_\textrm{neq}\rangle _\infty \le e^{-\kappa N}=e^{-\kappa \rho \,L}, \end{aligned}$$
(4.2)

with a constant \(\kappa \) such that \(\kappa >\rho \), then we can prove, exactly as in Theorem 2.4, that the expectation value \(\langle \Phi (t)|\hat{P}_\textrm{neq}|\Phi (t)\rangle \) is extremely small for sufficiently large typical t, i.e., the system exhibits thermalization. Although we do not go into details, we expect that the assumption (4.2) is valid if one defines \(\hat{P}_\textrm{neq}\) to be the projection onto the space where the total energy in a macroscopic region differs considerably from its expectation value in \(\langle \cdot \rangle _\infty \).

We note, however, that if one employs a criterion of thermal equilibrium that involves, say, particle-particle correlations, then the assumption (4.2) with \(\kappa >\rho \) for the corresponding nonequilibrium projection is never valid, and our theorem says nothing. This shortcoming is tied to the restriction to low densities and is the price of our strategy of basing the theory on mild assumptions.