1 Introduction and main results

This paper follows our prior work [15] and continues the study of probability theory beyond—but also subsuming—the Banach space setting. In the present work, we estimate sums of independent random variables in several ways, under very primitive mathematical assumptions that suffice to state our results. The setting is as follows.

Definition 1.1

A metric semigroup is defined to be a semigroup \(({\mathscr {G}}, \cdot )\) equipped with a metric \(d_{\mathscr {G}}: {\mathscr {G}}\times {\mathscr {G}}\rightarrow [0,\infty )\) that is translation-invariant. In other words,

$$\begin{aligned} d_{\mathscr {G}}(ac,bc) = d_{\mathscr {G}}(a,b) = d_{\mathscr {G}}(ca,cb)\ \forall a,b,c \in {\mathscr {G}}. \end{aligned}$$

(Equivalently, \(({\mathscr {G}},d_{\mathscr {G}})\) is a metric space equipped with an associative binary operation such that \(d_{\mathscr {G}}\) is translation-invariant.) Similarly, one defines a metric monoid and a metric group.
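To make the definition concrete, here is a minimal numerical sketch (ours, for illustration only and not part of the formal development) that spot-checks the translation invariance of \(d_{\mathscr {G}}\) for the circle group \(\mathbb {R}/\mathbb {Z}\) with the arc-length metric, a compact abelian Lie group covered by our setting; all helper names are our own.

```python
import numpy as np

def mul(x, y):
    """Group law on R/Z: addition mod 1."""
    return (x + y) % 1.0

def d(x, y):
    """Arc-length metric on the circle R/Z."""
    delta = abs(x - y) % 1.0
    return min(delta, 1.0 - delta)

rng = np.random.default_rng(0)
for _ in range(1000):
    a, b, c = rng.random(3)
    # d(ac, bc) = d(a, b) = d(ca, cb), up to floating-point error:
    assert abs(d(mul(a, c), mul(b, c)) - d(a, b)) < 1e-12
    assert abs(d(mul(c, a), mul(c, b)) - d(a, b)) < 1e-12
```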

Metric groups are ubiquitous in probability theory and functional analysis, and subsume all normed linear spaces as well as compact and (connected) abelian Lie groups as special cases. More modern examples of recent interest are mentioned presently.

Now suppose \((\varOmega , {\mathscr {A}}, \mu )\) is a probability space and \(X_1, \ldots , X_n \in L^0(\varOmega ,{\mathscr {G}})\) are \({\mathscr {G}}\)-valued random variables. Fix \(z_0, z_1 \in {\mathscr {G}}\) and define for \(1 \leqslant j \leqslant n\):

$$\begin{aligned} \begin{aligned} S_j&:= X_1 X_2 \cdots X_j, \quad U_j := \max _{1 \leqslant i \leqslant j} d_{\mathscr {G}}(z_1, z_0 S_i),\\ Y_j&:= d_{\mathscr {G}}(z_0, z_0 X_j), \quad M_j := \max _{1 \leqslant i \leqslant j} Y_i. \end{aligned} \end{aligned}$$
(1.1)
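For intuition, the following sketch (ours) computes the quantities in Eq. (1.1) in the simplest special case \({\mathscr {G}} = (\mathbb {R}, +)\) with \(d_{\mathscr {G}}(x,y) = |x-y|\) and \(z_0 = z_1 = 0\), where \(S_j\) is a partial sum, \(U_j\) the running maximum of the \(|S_i|\), \(Y_j = |X_j|\), and \(M_j\) the running maximum of the \(Y_i\).

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.standard_normal(10)           # X_1, ..., X_n
S = np.cumsum(X)                      # S_j = X_1 + ... + X_j
U = np.maximum.accumulate(np.abs(S))  # U_j = max_{i <= j} |S_i|
Y = np.abs(X)                         # Y_j = |X_j|
M = np.maximum.accumulate(Y)          # M_j = max_{i <= j} Y_i
print(U[-1], M[-1])                   # the quantities U_n and M_n
```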

In this paper we discuss bounds that govern the behavior of \(U_n\)—and consequently, of sums \(S_n\) of independent \({\mathscr {G}}\)-valued random variables \(X_j\)—in terms of the variables \(X_j\), and even \(Y_j\) or \(M_j\). We are interested in a variety of bounds: (a) one-sided geometric tail estimates; (b) approximate two-sided bounds for tail probabilities; (c) approximate two-sided bounds for moments; and (d) comparison of moments. For instance, is it possible to obtain bounds for \({\mathbb {E}}_\mu [U_n^p]^{1/p}\) in terms of the tail distribution for \(U_n\), or in terms of \({\mathbb {E}}_\mu [U_n^q]^{1/q}\) for \(p,q > 0\)? The latter question has been well-studied in the literature for Banach spaces, and universal bounds that grow at the “correct” rate have been obtained for all \(q \gg 0\). We explore the question of obtaining correctly growing universal constants for metric semigroups, which include not only normed linear spaces and inner product spaces, but also all connected abelian and compact Lie groups. Our results show that the universal constants in such inequalities do not depend on the semigroup in question.

1.1 Motivations

Our motivations in developing probability theory in such general settings are both modern and classical. An increasing number of modern-day theoretical and applied settings require mathematical frameworks that go beyond Banach spaces. For instance, data and random variables may take values in manifolds such as (real or complex) Lie groups. Compact or abelian Lie groups also commonly feature in the literature, including permutation groups and other finite groups, lattices, orthogonal groups, and tori. In fact every abelian, Hausdorff, metrizable, topologically complete group \(G\) admits a translation-invariant metric [17], though this fails to hold for cancellative semigroups [18]. Certain classes of amenable groups are also metric groups (see [14] for more details). Other modern examples arise in the study of large networks and include the space of graphons with the cut norm, which arises naturally out of combinatorics and is related to many applications [21]. In a parallel vein, the space of labelled graphs \({\mathscr {G}}(V)\) on a fixed vertex set \(V\) is a 2-torsion metric group (see [12, 13]), hence does not embed into a normed linear space.
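To illustrate the last example, the following sketch (ours) encodes labelled graphs on a fixed vertex set as 0/1 edge vectors, with edgewise XOR as the group law; for simplicity we substitute the normalized Hamming distance on edge sets for the cut norm, which is also translation-invariant.

```python
import itertools
import numpy as np

n_vertices = 5
edges = list(itertools.combinations(range(n_vertices), 2))  # potential edges

def rand_graph(rng):
    return rng.integers(0, 2, size=len(edges))  # one bit per potential edge

def op(g, h):
    return g ^ h  # symmetric difference of edge sets (XOR)

def d(g, h):
    return float(np.mean(g ^ h))  # normalized Hamming distance

rng = np.random.default_rng(2)
g, h, k = (rand_graph(rng) for _ in range(3))
assert np.all(op(g, g) == 0)                         # 2-torsion: each element is its own inverse
assert abs(d(op(g, k), op(h, k)) - d(g, h)) < 1e-15  # translation invariance
```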

With the above settings in mind, in this paper we develop novel techniques for proving maximal inequalities—as well as comparison results between tail distributions and various moments—for sums of independent random variables taking values in the aforementioned groups, which need not be Banach spaces.

At the same time, we also have theoretical motivations in mind when developing probability theory on non-linear spaces such as \({\mathscr {G}}(V)\) and beyond. Throughout the past century, the emphasis in probability has shifted somewhat from proving results on stochastic convergence to obtaining sharper and stronger bounds on random sums, in increasingly weak settings. A celebrated achievement of probability theory has been to develop a rigorous and systematic framework for studying the behavior of sums of (independent) random variables; see e.g. [20]. In this vein, we unify our results on graph space with those in the Banach space literature, by proving them in a more primitive mathematical framework encompassing both of these (and other) settings. In particular, our results apply to compact/abelian/discrete Lie groups, as well as normed linear spaces.

For example, maximal inequalities by Hoffmann–Jørgensen, Lévy, Ottaviani–Skorohod, and Mogul’skii require merely the notions of a metric and a binary associative operation to state them. Thus one only needs a separable metric semigroup \({\mathscr {G}}\) rather than a Banach space to state these inequalities. However, note that working in a metric semigroup raises technical questions. For instance, the lack of an identity element means one has to specify how to compute magnitudes of \({\mathscr {G}}\)-valued random variables (before trying to bound or estimate them); also, it is not apparent how to define truncations of random variables. The lack of inverses, norms, or commutativity implies in particular that one cannot rescale or subtract random variables.

In the present work, we explain how to overcome these challenges. We also hope to show that the approach of working with arbitrary metric semigroups turns out to be richly rewarding in (i) obtaining the above (and other) results for non-Banach settings; (ii) unifying these results with the existing Banach space results in order to hold in the greatest possible generality; and (iii) further strengthening these unified versions where possible.

1.2 Organization and results

We now describe the organization and contributions of the present paper. In Sect. 2 we prove the Mogul’skii–Ottaviani–Skorohod inequalities for all metric semigroups \({\mathscr {G}}\). As an application, we show Lévy’s equivalence for stochastic convergence in metric semigroups.

In Sect. 3, we come to our main goal in this paper, of estimating and comparing moments and tail probabilities for sums of independent \({\mathscr {G}}\)-valued random variables. Our main tool is a variant of Hoffmann–Jørgensen’s inequality for metric semigroups, which is shown in recent work [15]. The relevant part for our purposes is now stated.

Theorem 1.1

(Khare and Rajaratnam [15]) Notation as in Definition 1.1 and Equation (1.1). Suppose \(X_1, \ldots , X_n \in L^0(\varOmega ,{\mathscr {G}})\) are independent. Fix scalars

$$\begin{aligned} n_1, \ldots , n_k \in {\mathbb {N}}, \quad t_1, \ldots , t_k, s \in [0,\infty ), \end{aligned}$$

and define

$$\begin{aligned} I_0 := \left\{ 1 \leqslant i \leqslant k : {\mathbb {P}}_\mu \left( U_n \leqslant t_i \right) ^{n_i - \delta _{i1}} \leqslant 1/n_i! \right\} , \end{aligned}$$

where \(\delta _{i1}\) denotes the Kronecker delta. Now if \(\sum _{i=1}^k n_i \leqslant n+1\), then:

$$\begin{aligned}&{\mathbb {P}}_\mu \left( U_n> (2 n_1 - 1) t_1 + 2 \sum _{i=2}^k n_i t_i + \left( \sum _{i=1}^k n_i - 1 \right) s \right) \\&\quad \leqslant {\mathbb {P}}_\mu \left( U_n \leqslant t_1 \right) ^{\mathbf{1}_{1 \notin I_0}} \prod _{i \in I_0} {\mathbb {P}}_\mu \left( U_n> t_i \right) ^{n_i} \prod _{i \notin I_0} \frac{1}{n_i!} \left( \frac{{\mathbb {P}}_\mu \left( U_n> t_i \right) }{{\mathbb {P}}_\mu \left( U_n \leqslant t_i \right) } \right) ^{n_i}\\&\qquad + {\mathbb {P}}_\mu \left( M_n > s \right) . \end{aligned}$$

We remark that Theorem 1.1 generalizes the original Hoffmann–Jørgensen inequality in three ways: (i) mathematically, it strengthens the state of the art even for real-valued variables; (ii) it unifies previous results by Johnson and Schechtman [10], Klass and Nowicki [16], and Hitczenko and Montgomery-Smith [6] in the Banach space literature; and (iii) it holds in the most primitive setting needed to state it, and is thereby applicable also to e.g. Lie groups.

We now discuss several ways in which to estimate the size of sums of independent \({\mathscr {G}}\)-valued random variables, for metric semigroups \({\mathscr {G}}\). We present two results in this section, corresponding to two of the estimation techniques discussed in the introduction. (For a third result, see Theorem 3.1.)

The first approach, informally speaking, uses the Hoffmann–Jørgensen inequality to generalize an upper bound for \({\mathbb {E}}_\mu [\Vert S_n\Vert ^p]\) in terms of the quantiles of \(\Vert S_n\Vert \) as well as \({\mathbb {E}}_\mu [M_n^p]\)—but now in the “minimal” framework of metric semigroups. More precisely, we show that controlling the behavior of \(X_n\) is equivalent to controlling \(S_n\) or \(U_n\), for all metric semigroups.

Theorem A

Suppose \(A \subset {\mathbb {N}}\) is either \({\mathbb {N}}\) or \(\{ 1, \ldots , N \}\) for some \(N \in {\mathbb {N}}\). Suppose \(({\mathscr {G}}, d_{\mathscr {G}})\) is a separable metric semigroup, \(z_0, z_1 \in {\mathscr {G}}\), and \(X_n \in L^0(\varOmega ,{\mathscr {G}})\) are independent for all \(n \in A\). If \(\sup _{n \in A} d_{\mathscr {G}}(z_1, z_0 S_n) < \infty \) almost surely, then for all \(p \in (0,\infty )\),

$$\begin{aligned} {\mathbb {E}}_\mu \left[ \sup _{n \in A} d_{\mathscr {G}}(z_0, z_0 X_n)^p \right]< \infty \quad \Longleftrightarrow \quad {\mathbb {E}}_\mu \left[ \sup _{n \in A} d_{\mathscr {G}}(z_1, z_0 S_n)^p \right] < \infty . \end{aligned}$$

This result extends [7, Theorem 3.1] by Hoffmann–Jørgensen to the “minimal” framework of metric semigroups. The proofs of Theorem A and the next result use the notion of the quantile functions, or decreasing rearrangements, of \({\mathscr {G}}\)-valued random variables:

Definition 1.2

Suppose \(({\mathscr {G}}, d_{\mathscr {G}})\) is a metric semigroup, and \(X : (\varOmega , {\mathscr {A}}, \mu ) \rightarrow ({\mathscr {G}},{\mathscr {B}}_{\mathscr {G}})\). We define the decreasing (or non-increasing) rearrangement of X to be the right-continuous inverse \(X^*\) of the function \(t \mapsto {\mathbb {P}}_\mu \left( d_{\mathscr {G}}(z_0, z_0 X) > t \right) \), for any \(z_0 \in {\mathscr {G}}\). In other words, \(X^*\) is the real-valued random variable defined on [0, 1] with the Lebesgue measure, as follows:

$$\begin{aligned} X^*(t) := \sup \{ y \in [0,\infty ) : {\mathbb {P}}_\mu \left( d_{\mathscr {G}}(z_0, z_0 X)> y \right) > t \}. \end{aligned}$$

Note that \(X^*\) has exactly the same law as \(d_{\mathscr {G}}(z_0, z_0 X)\). Moreover, if \(({\mathscr {G}}, \Vert \cdot \Vert )\) is a normed linear space, then \(d_{\mathscr {G}}(z_0, z_0 X)\) can be replaced by \(\Vert X\Vert \), and often papers in the literature refer to \(X^*\) as the decreasing rearrangement of \(\Vert X\Vert \) instead of X itself. The convention that we adopt above is slightly weaker.
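For an empirical law, \(X^*\) is read off from order statistics. The following sketch (ours) implements Definition 1.2 for a sample of \(d_{\mathscr {G}}(z_0, z_0 X)\) and checks it against the exact value \(X^*(t) = -\log t\) for an \(\mathrm {Exp}(1)\) variable.

```python
import numpy as np

def dec_rearrangement(sample, t):
    """Empirical X*(t) = sup{ y >= 0 : P(X > y) > t } for a sample of the
    nonnegative variable d(z_0, z_0 X); attained at an order statistic."""
    s = np.sort(sample)[::-1]      # decreasing order statistics
    k = int(np.floor(t * len(s)))  # P(X > y) > t iff more than t*N samples exceed y
    return s[min(k, len(s) - 1)]

rng = np.random.default_rng(3)
sample = rng.exponential(size=100_000)
# For Exp(1), P(X > y) = exp(-y), so X*(0.1) should be close to -log(0.1) ~ 2.303:
print(dec_rearrangement(sample, 0.1), -np.log(0.1))
```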

The second approach provides another estimate on the size of \(S_n\) through its moments, by comparing \(\Vert S_n \Vert _q\) to \(\Vert S_n \Vert _p\)—or more precisely, \({\mathbb {E}}_\mu [U_n^q]^{1/q}\) to \({\mathbb {E}}_\mu [U_n^p]^{1/p}\)—for \(0 < p \leqslant q\). Moreover, the constants of comparison are universal, valid for all abelian semigroups and all finite sequences of independent random variables, and depend only on a threshold:

Theorem B

Given \(p_0 > 0\), there exist universal constants \(c = c(p_0), c' = c'(p_0) > 0\) depending only on \(p_0\), such that for all choices of (a) separable abelian metric semigroups \(({\mathscr {G}}, d_{\mathscr {G}})\), (b) finite sequences of independent \({\mathscr {G}}\)-valued random variables \(X_1, \ldots , X_n\), (c) \(q \geqslant p \geqslant p_0\), and (d) \(\epsilon \in (-q,\log (16)]\), we have

$$\begin{aligned}&\ {\mathbb {E}}_\mu [U_n^q]^{1/q}\\&\quad \leqslant c \frac{q}{\max (p, \log (\epsilon +q))} \left( {\mathbb {E}}_\mu [U_n^p]^{1/p} + M_n^*(e^{-q}/8)\right) + c {\mathbb {E}}_\mu [M_n^q]^{1/q}\\&\quad \leqslant c' \frac{q}{\max (p, \log (\epsilon +q))} \left( {\mathbb {E}}_\mu [U_n^p]^{1/p} + {\mathbb {E}}_\mu [M_n^q]^{1/q}\right) \quad \text { if } \epsilon \geqslant \min (1, e-p_0). \end{aligned}$$

Moreover, we may choose

$$\begin{aligned} c'(p_0) = c(p_0) \cdot \left( 8^{1/p_0}e + \max \left( 1, \frac{\log (\epsilon +p_0)}{p_0}\right) \right) . \end{aligned}$$

Theorem B extends a host of results in the Banach space literature, including results by Johnson–Schechtman–Zinn [11], Hitczenko [5], and Hitczenko and Montgomery-Smith [6] (see also [20, Theorem 6.20] and [19, Proposition 1.4.2]). Theorem B also yields the correct order of the constants as \(q \rightarrow \infty \), as discussed by Johnson et al. in loc. cit., where they extend previous work on Khinchin’s inequality by Rosenthal [24]. Moreover, while all of these previous results are shown for Banach spaces, Theorem B holds additionally for all compact Lie groups, finite abelian groups and lattices, and spaces of labelled and unlabelled graphs.

2 Lévy’s equivalence in metric semigroups

In this section we prove:

Theorem 2.1

(Lévy’s Equivalence) Suppose \(({\mathscr {G}}, d_{\mathscr {G}})\) is a complete separable metric semigroup, \(X_n : (\varOmega , {\mathscr {A}}, \mu ) \rightarrow ({\mathscr {G}}, {\mathscr {B}}_{\mathscr {G}})\) are independent, \(X \in L^0(\varOmega , {\mathscr {G}})\), and \(S_n\) is defined as in (1.1). Then

$$\begin{aligned} S_n \longrightarrow X\ a.s.~\mathbb {P}_\mu \quad \Longleftrightarrow \quad S_n {\mathop {\longrightarrow }\limits ^{P}} X. \end{aligned}$$

Moreover, if the sequence \(S_n\) does not converge as above, then it diverges almost surely.

Special cases of this result have been shown in the literature. For instance, [2, §9.7] considers \({\mathscr {G}}= \mathbb {R}^n\). The more general case of a separable Banach space \({\mathbb {B}}\) was shown by Itô–Nisio [9, Theorem 3.1], as well as by Hoffmann-Jørgensen and Pisier [8, Lemma 1.2]. The most general version in the literature to date is by Tortrat, who proved the result for a complete separable metric group in [25]. Thus Theorem 2.1 comes closest to assuming only the minimal structure necessary to state the result (as well as to prove it).
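A toy simulation (ours, purely illustrative) shows the theorem at work on the compact group \(\mathbb {R}/\mathbb {Z}\): random signs on geometrically decaying steps give partial products that are pathwise Cauchy, so almost sure convergence and convergence in probability visibly coincide.

```python
import numpy as np

rng = np.random.default_rng(4)
paths, N, n0 = 1000, 40, 20
signs = rng.choice([-1.0, 1.0], size=(paths, N))
steps = signs * 2.0 ** -np.arange(1, N + 1)  # d(0, X_n) = 2^{-n} is summable
S = np.cumsum(steps, axis=1) % 1.0           # partial products S_n in R/Z

def d(x, y):
    delta = np.abs(x - y) % 1.0
    return np.minimum(delta, 1.0 - delta)

# sup_{m > n0} d(S_{n0}, S_m) along each path is at most sum_{k > n0} 2^{-k} = 2^{-n0}:
tail_sup = d(S[:, [n0 - 1]], S[:, n0:]).max(axis=1)
print(tail_sup.max(), "<=", 2.0 ** -n0)
```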

In order to prove Theorem 2.1, we first study basic properties of metric semigroups. Note that for a metric group, the following is standard; see [17], for instance.

Lemma 2.1

If \(({\mathscr {G}}, d_{\mathscr {G}})\) is a metric (semi)group, then the translation-invariance of \(d_{\mathscr {G}}\) implies the “triangle inequality”:

$$\begin{aligned} d_{\mathscr {G}}(y_1 y_2, z_1 z_2) \leqslant d_{\mathscr {G}}(y_1, z_1) + d_{\mathscr {G}}(y_2, z_2)\ \forall y_i, z_i \in {\mathscr {G}}, \end{aligned}$$
(2.1)

and in turn, this implies that each (semi)group operation is continuous.

If instead \({\mathscr {G}}\) is a group equipped with a metric \(d_{\mathscr {G}}\), then except for the last two statements, any two of the following assertions imply the other two:

  1.

    \(d_{\mathscr {G}}\) is left-translation invariant: \(d_{\mathscr {G}}(ca, cb) = d_{\mathscr {G}}(a,b)\) for all \(a,b,c \in {\mathscr {G}}\). In other words, left-multiplication by any \(c \in {\mathscr {G}}\) is an isometry.

  2.

    \(d_{\mathscr {G}}\) is right-translation invariant.

  3.

    The inverse map \(: {\mathscr {G}}\rightarrow {\mathscr {G}}\) is an isometry. Equivalently, the triangle inequality (2.1) holds.

  4.

    \(d_{\mathscr {G}}\) is invariant under all inner/conjugation automorphisms.
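The following sketch (ours) verifies the first three assertions simultaneously on a concrete non-abelian example, the symmetric group \(S_4\) with the Hamming metric \(d(s,t) = \#\{ i : s(i) \ne t(i) \}\), which is bi-invariant; the fourth assertion then follows from the first two.

```python
import itertools
import numpy as np

perms = [np.array(p) for p in itertools.permutations(range(4))]

def compose(s, t):
    return s[t]  # (s . t)(i) = s(t(i))

def inv(s):
    out = np.empty_like(s)
    out[s] = np.arange(len(s))
    return out

def d(s, t):
    return int(np.sum(s != t))  # Hamming metric on permutations

rng = np.random.default_rng(5)
for _ in range(500):
    a, b, c = (perms[i] for i in rng.integers(len(perms), size=3))
    assert d(compose(c, a), compose(c, b)) == d(a, b)  # (1) left invariance
    assert d(compose(a, c), compose(b, c)) == d(a, b)  # (2) right invariance
    assert d(inv(a), inv(b)) == d(a, b)                # (3) inversion is an isometry
```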

In order to show Theorem 2.1 for metric semigroups, we recall the following preliminary result from [14], and will use it below without further reference.

Proposition 2.1

[14] Suppose \(({\mathscr {G}}, d_{\mathscr {G}})\) is a metric semigroup, and \(a,b \in {\mathscr {G}}\). Then

$$\begin{aligned} d_{\mathscr {G}}(a,ba) = d_{\mathscr {G}}(b,b^2) = d_{\mathscr {G}}(a,ab) \end{aligned}$$
(2.2)

is independent of \(a \in {\mathscr {G}}\). Moreover, a metric semigroup \({\mathscr {G}}\) is either a metric monoid, or the set of non-identity elements in a metric monoid \({\mathscr {G}}'\); these cases occur if and only if the number of idempotents in \({\mathscr {G}}\) is one or zero, respectively. Furthermore, the metric monoid \({\mathscr {G}}'\) is (up to a monoid isomorphism) the unique smallest element in the class of metric monoids containing \({\mathscr {G}}\) as a sub-semigroup.

Remark 1

In the sequel, we denote—when required—the unique metric monoid containing a given metric semigroup \({\mathscr {G}}\) by \({\mathscr {G}}' := {\mathscr {G}}\cup \{ 1' \}\). Note that the idempotent \(1'\) may already be in \({\mathscr {G}}\), in which case \({\mathscr {G}}= {\mathscr {G}}'\). One consequence of Proposition 2.1 is that instead of working with metric semigroups, one can use the associated monoid \({\mathscr {G}}'\). (In other words, the (non)existence of the identity is not an issue in many such cases.) This helps simplify other calculations. For instance, what would be a lengthy, inductive (yet straightforward) computation now becomes much simpler: for nonnegative integers \(k, l\) and \(z_0, z_1, \ldots , z_{k+l} \in {\mathscr {G}}\), the triangle inequality (2.1) implies:

$$\begin{aligned} d_{\mathscr {G}}(z_0 \cdots z_k, z_0 \cdots z_{k+l})&= d_{{\mathscr {G}}'}(1', \prod _{i=1}^l z_{k+i}) \leqslant \sum _{i=1}^l d_{{\mathscr {G}}'}(1', z_{k+i}) \\&= \sum _{i=1}^l d_{\mathscr {G}}(z_0, z_0 z_{k+i}). \end{aligned}$$

2.1 The Mogul’skii inequalities and proof of Lévy’s equivalence

Like Lévy’s equivalence (Theorem 2.1) and the Hoffmann–Jørgensen inequality (Theorem 1.1), many other maximal and minimal inequalities can be formulated using only the notions of a distance function and of a semigroup operation. We now extend to metric semigroups two inequalities by Mogul’skii, which were used in [22] to prove a law of the iterated logarithm in normed linear spaces. The following result will be useful in proving Theorem 2.1.

Proposition 2.2

(Mogul’skii–Ottaviani–Skorohod inequalities) Suppose \(({\mathscr {G}}, d_{\mathscr {G}})\) is a separable metric semigroup, \(z_0, z_1 \in {\mathscr {G}}\), \(a,b \in [0,\infty )\), and \(X_1, \ldots , X_n \in L^0(\varOmega ,{\mathscr {G}})\) are independent. Then for all integers \(1 \leqslant m \leqslant n\),

$$\begin{aligned} {\mathbb {P}}_\mu \left( \min _{m \leqslant k \leqslant n} d_{\mathscr {G}}(z_1, z_0 S_k) \leqslant a \right) \cdot&\min _{m \leqslant k \leqslant n} {\mathbb {P}}_\mu \left( d_{\mathscr {G}}(S_k, S_n) \leqslant b \right) \\&\qquad \leqslant {\mathbb {P}}_\mu \left( d_{\mathscr {G}}(z_1, z_0 S_n) \leqslant a + b \right) ,\\ {\mathbb {P}}_\mu \left( \max _{m \leqslant k \leqslant n} d_{\mathscr {G}}(z_1, z_0 S_k) \geqslant a \right) \cdot&\min _{m \leqslant k \leqslant n} {\mathbb {P}}_\mu \left( d_{\mathscr {G}}(S_k, S_n) \leqslant b \right) \\&\qquad \leqslant {\mathbb {P}}_\mu \left( d_{\mathscr {G}}(z_1, z_0 S_n) \geqslant a - b \right) . \end{aligned}$$

These inequalities strengthen [22, Lemma 1] from normed linear spaces to arbitrary metric semigroups. Also note that the second inequality generalizes the Ottaviani–Skorohod inequality to all metric semigroups. Indeed, sources such as [2, § 9.7.2] prove this result in the special case \({\mathscr {G}}= (\mathbb {R}^n, +), z_0 = z_1 = 0, m=1, a = \alpha + \beta , b = \beta \), with \(\alpha , \beta > 0\).
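As a quick Monte Carlo sanity check (ours, not a proof), the second inequality can be tested in the special case \({\mathscr {G}} = (\mathbb {R},+)\), \(z_0 = z_1 = 0\), \(m = 1\), with Gaussian steps:

```python
import numpy as np

rng = np.random.default_rng(6)
paths, n, a, b = 200_000, 20, 2.0, 1.0
X = rng.standard_normal((paths, n))
S = np.cumsum(X, axis=1)

lhs_max = (np.abs(S).max(axis=1) >= a).mean()  # P(max_k |S_k| >= a)
min_k = min((np.abs(S[:, -1] - S[:, k]) <= b).mean() for k in range(n))
rhs = (np.abs(S[:, -1]) >= a - b).mean()       # P(|S_n| >= a - b)
print(lhs_max * min_k, "<=", rhs)
```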

We omit the proof of Proposition 2.2 for brevity as it involves standard arguments. Using this result, one can now prove Theorem 2.1. The idea is to use the approach in [2]; however, it needs to be suitably modified in order to work in the current level of generality.

Proof of Theorem 2.1

The forward implication is easily verified in the more general setting of a separable metric space; see e.g. [2, Section 9.2]. Conversely, we claim that if \(S_n\) converges in probability to \(X\), then \(S_n\) is Cauchy almost everywhere. Given \(\epsilon ,\eta > 0\), the assumption and definitions imply that there exists \(n_0 \in {\mathbb {N}}\) such that

$$\begin{aligned} {\mathbb {P}}_\mu \left( d_{\mathscr {G}}(S_m,X) \geqslant \epsilon /8 \right) < \frac{\eta }{2(1 + \eta )}, \quad \forall m \geqslant n_0. \end{aligned}$$

This implies that \(\displaystyle {\mathbb {P}}_\mu \left( d_{\mathscr {G}}(S_m,S_n) \geqslant \epsilon / 4 \right) < \frac{\eta }{1 + \eta }\) for all \(n \geqslant m \geqslant n_0\). Now define \(S'_i := \prod _{j=1}^i X_{n_0+j}\). Fix \(n > n_0\) and apply Proposition 2.2 to \(\{ X_{n_0+i} : 1 \leqslant i \leqslant n - n_0 \}\) with \(m=1, a = \epsilon /2, b = \epsilon /4\), and \(z_0 = z_1\):

$$\begin{aligned}&{\mathbb {P}}_\mu \left( \max _{n_0 + 1 \leqslant m \leqslant n} d_{\mathscr {G}}(S_{n_0}, S_m) \geqslant \epsilon /2 \right) = {\mathbb {P}}_\mu \left( \max _{1 \leqslant i \leqslant n - n_0} d_{{\mathscr {G}}'}(z_0, z_0 S'_i) \geqslant \epsilon /2 \right) \\&\quad \leqslant \frac{{\mathbb {P}}_\mu \left( d_{{\mathscr {G}}'}(z_0, z_0 S'_{n-n_0}) \geqslant \epsilon /4 \right) }{1 - \max _{1 \leqslant i \leqslant n-n_0} {\mathbb {P}}_\mu \left( d_{{\mathscr {G}}'}(S'_i,S'_{n-n_0}) \geqslant \epsilon /4 \right) } < \frac{\eta / (1 + \eta )}{1 - \eta /(1+\eta )} = \eta . \end{aligned}$$

Now define \(Q_{n_0} := \sup _{n > n_0} d_{\mathscr {G}}(S_{n_0}, S_n)\) and \(\delta _{n_0} := \sup _{n> m > n_0} d_{\mathscr {G}}(S_m,S_n)\). Then \(\delta _{n_0} \leqslant 2 Q_{n_0}\); moreover, taking the limit of the above inequality as \(n \rightarrow \infty \) yields:

$$\begin{aligned} {\mathbb {P}}_\mu \left( Q_{n_0} \geqslant \epsilon / 2 \right) \leqslant \eta \quad \implies \quad {\mathbb {P}}_\mu \left( \delta _{n_0} \geqslant \epsilon \right) \leqslant \eta . \end{aligned}$$

But then \({\mathbb {P}}_\mu \left( \sup _{n > m} d_{\mathscr {G}}(S_m,S_n) \geqslant \epsilon \right) \leqslant \eta \) for all \(m > n_0\). Thus, \(S_n\) is Cauchy almost everywhere. Since \({\mathscr {G}}\) is complete, the result now follows from [2, Lemma 9.2.4]; the almost sure limit equals \(X\) because \(S_n {\mathop {\longrightarrow }\limits ^{P}} X\). Finally, since the \(X_n\) are independent, the convergence of the sequence \(S_n\) is a tail event. In particular, it has probability zero or one by the Kolmogorov 0–1 law, concluding the proof.

We remark for completeness that the other Lévy equivalence has been addressed in [1, 3, 25] for various classes of topological groups. See also [23] for a variant in discrete completely simple semigroups, [2, 9] for Banach space versions, and [14] for a version over any normed abelian metric (semi)group.

3 Measuring the magnitude of sums of independent random variables

We now prove Theorems A and B using the Hoffmann–Jørgensen inequality in Theorem 1.1. Recall that the Banach space version of this inequality is extremely important in the literature and is widely used in bounding sums of independent Banach space-valued random variables. With Theorem 1.1 in hand, an immediate application is to obtain the first such bounds for general metric semigroups \({\mathscr {G}}\). We also provide uniformly good \(L^p\)-bounds and tail probability bounds on sums \(S_n\) of independent \({\mathscr {G}}\)-valued random variables.

3.1 An upper bound by Hoffmann–Jørgensen

In this subsection we prove Theorem A. The proof uses basic properties of decreasing rearrangements (see Definition 1.2), which we record here and use below, possibly without reference.

Proposition 3.1

Suppose \(X, Y : (\varOmega , {\mathscr {A}}, \mu ) \rightarrow [0,\infty )\) are random variables, and

$$\begin{aligned} x,\alpha ,\beta ,\gamma > 0, \quad t \in [0,1]. \end{aligned}$$
  1.

    \(X^*(t) \leqslant x\) if and only if \({\mathbb {P}}_\mu \left( X > x \right) \leqslant t\).

  2.

    \(X^*(t)\) is decreasing in \(t \in [0,1]\) and increasing in \(X \geqslant 0\).

  3.

    \((X/x)^*(t) = X^*(t)/x\).

  4.

    Suppose \({\mathbb {P}}_\mu \left( X> x \right) \leqslant \beta {\mathbb {P}}_\mu \left( Y > \gamma x \right) \) for all \(x>0\). Then for all \(p \in (0,\infty )\) and \(t \in (0,1)\),

    $$\begin{aligned} {\mathbb {E}}_\mu [Y^p] \geqslant \beta ^{-1} \gamma ^p {\mathbb {E}}_\mu [X^p], \qquad {\mathbb {E}}_\mu [X^p] \geqslant t X^*(t)^p. \end{aligned}$$
  5.

    Fix finitely many tuples of positive constants \((\alpha _i, \beta _i, \gamma _i, \delta _i)_{i=1}^N\), and real-valued nondecreasing functions \(f_i\) such that for all \(x>0\) there exists at least one i such that

    $$\begin{aligned} f_i({\mathbb {P}}_\mu \left( X> \alpha _i x \right) ) \leqslant \beta _i {\mathbb {P}}_\mu \left( Y > \gamma _i x \right) ^{\delta _i}. \end{aligned}$$
    (3.1)

    Then

    $$\begin{aligned} X^*(t) \leqslant \max _{1 \leqslant i \leqslant N} \frac{\alpha _i}{\gamma _i} Y^*\left( (f_i(t)/\beta _i)^{1/\delta _i}\right) . \end{aligned}$$
    (3.2)

    If on the other hand (3.1) holds for all i, then

    $$\begin{aligned} X^*(t) \leqslant \min _{1 \leqslant i \leqslant N} \frac{\alpha _i}{\gamma _i} Y^*\left( (f_i(t)/\beta _i)^{1/\delta _i}\right) . \end{aligned}$$

Proof

These properties are shown using the definitions via straightforward arguments, and so we omit the proofs, except for the final part. By assumption there exists at least one i such that if \({\mathbb {P}}_\mu \left( X> \alpha _i x \right) > t\) for some t, then \(\beta _i {\mathbb {P}}_\mu \left( Y> \gamma _i x \right) ^{\delta _i} > f_i(t)\) since \(f_i\) is nondecreasing. For this choice of i, we obtain:

$$\begin{aligned} \alpha _i^{-1} \left\{ y : \ {\mathbb {P}}_\mu \left( X> y \right)> t \right\} \subset&\ \gamma _i^{-1} \left\{ y : \ \beta _i {\mathbb {P}}_\mu \left( Y> y \right) ^{\delta _i}> f_i(t) \right\} \\ =&\ \gamma _i^{-1} \left\{ y : \ {\mathbb {P}}_\mu \left( Y> y \right) > (f_i(t)/\beta _i)^{1/\delta _i} \right\} \end{aligned}$$

(where we only consider \(y \geqslant 0\)). Therefore for all \(t \in [0,1]\),

$$\begin{aligned} \left\{ y \geqslant 0 : \ {\mathbb {P}}_\mu \left( X> y \right)> t \right\} \quad \subset \quad \bigcup _{i=1}^N\ \frac{\alpha _i}{\gamma _i} \left\{ y \geqslant 0 : \ {\mathbb {P}}_\mu \left( Y> y \right) > (f_i(t)/\beta _i)^{1/\delta _i} \right\} . \end{aligned}$$

Taking the supremum of both sides yields Eq. (3.2). If on the other hand Eq. (3.1) holds for all i, then the preceding inclusion holds with the union replaced by intersection. Now taking the supremum of both sides yields Eq. (3.2) with maximum replaced by minimum (since each set in the intersection is an interval containing 0).
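For a quick numerical illustration (ours) of parts 1 and 4 of Proposition 3.1, take \(X \sim \mathrm {Exp}(1)\), for which \(X^*(t) = -\log t\) and \({\mathbb {E}}_\mu [X^2] = 2\) are available in closed form.

```python
import numpy as np

t, p = 0.25, 2.0
x_star = -np.log(t)        # exact: P(X > y) = e^{-y}, so X*(t) = -log(t)
second_moment = 2.0        # E[X^2] = 2 for Exp(1)
assert np.exp(-x_star) <= t + 1e-12     # part 1 with x = X*(t): P(X > x) <= t
assert second_moment >= t * x_star**p   # part 4: E[X^p] >= t * X*(t)^p, 2 >= ~0.48
print("checks pass")
```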

Using Proposition 3.1, we now show one of the main results in this paper.

Proof of Theorem A

Note for all n that

$$\begin{aligned} d_{\mathscr {G}}(z_0, z_0 X_n) \leqslant d_{\mathscr {G}}(z_1, z_0 S_{n-1}) + d_{\mathscr {G}}(z_1, z_0 S_n), \end{aligned}$$

from which we obtain

$$\begin{aligned} d_{\mathscr {G}}(z_0, z_0 X_n)^p \leqslant 2^{p+1} \sup _{n \in A} d_{\mathscr {G}}(z_1, z_0 S_n)^p. \end{aligned}$$

Taking first the supremum over \(n \in A\) and then the expectation proves the backward implication. Conversely, we first show that controlling sums of \({\mathscr {G}}\)-valued \(L^p\) random variables in probability (i.e., in \(L^0\)) allows us to control these sums in \(L^p\) as well, for \(p>0\). Namely, we make the following claim:

Suppose \(({\mathscr {G}}, d_{\mathscr {G}})\) is a separable metric semigroup, \(p \in (0,\infty )\), and \(X_1, \ldots , X_n \in L^p(\varOmega ,{\mathscr {G}})\) are independent. Now fix \(z_0, z_1 \in {\mathscr {G}}\) and let \(S_k, U_n, M_n\) be as in Definition 1.1 and Eq. (1.1). Then,

$$\begin{aligned} {\mathbb {E}}_\mu [U_n^p] \leqslant 2^{1 + 2p} ({\mathbb {E}}_\mu [M_n^p] + U_n^*(2^{-1-2p})^p). \end{aligned}$$

Note that the claim is akin to the upper bound by Hoffmann–Jørgensen that bounds \({\mathbb {E}}_\mu [\Vert S_n \Vert ^p]\) in terms of \({\mathbb {E}}_\mu [M_n^p]\) and the quantiles of \(\Vert S_n \Vert \) for Banach space-valued random variables (see [7, proof of Theorem 3.1] and [4, Lemma 3.1]). We omit its proof for brevity, as a similar statement is asserted in [20, Proposition 6.8]. Given the claim, define:

$$\begin{aligned} \begin{aligned} t_n :=&U_n^*(2^{-1-2p}) \quad (n \in A),\qquad U_A := \sup _{n \in A} d_{\mathscr {G}}(z_1, z_0 S_n), \\ M_A :=&\ \sup _{n \in A} d_{\mathscr {G}}(z_0, z_0 X_n), \qquad \quad t_A := U_A^*(2^{-1-2p}), \end{aligned} \end{aligned}$$
(3.3)

as above, where we also use the assumption that \(U_A < \infty \) almost surely. Now for all \(n \in A\), compute using the above claim and elementary properties of decreasing rearrangements:

$$\begin{aligned} {\mathbb {E}}_\mu [U_n^p] \leqslant 2^{1+2p} {\mathbb {E}}_\mu [M_n^p] + 2 (4 t_n)^p \leqslant 2^{1+2p} {\mathbb {E}}_\mu [M_A^p] + 2 (4 t_A)^p. \end{aligned}$$

This concludes the proof if A is finite; for \(A = {\mathbb {N}}\), use the monotone convergence theorem for the increasing sequence \(0 \leqslant U_n^p \rightarrow U_A^p\).
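As a Monte Carlo sanity check (ours) of the displayed claim in the preceding proof, consider the scalar case \({\mathscr {G}} = (\mathbb {R},+)\), \(z_0 = z_1 = 0\), \(p = 2\):

```python
import numpy as np

rng = np.random.default_rng(7)
paths, n, p = 100_000, 10, 2.0
X = rng.standard_normal((paths, n))
S = np.cumsum(X, axis=1)
U_n = np.abs(S).max(axis=1)       # U_n = max_k |S_k|
M_n = np.abs(X).max(axis=1)       # M_n = max_j |X_j|

t = 2.0 ** (-1 - 2 * p)           # the quantile level 2^{-1-2p}
u_star = np.quantile(U_n, 1 - t)  # empirical U_n^*(t): the (1-t)-quantile
lhs = np.mean(U_n ** p)
rhs = 2 ** (1 + 2 * p) * (np.mean(M_n ** p) + u_star ** p)
print(lhs, "<=", rhs)
```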

3.2 Two-sided bounds and \(L^p\) norms

We now formulate and prove additional results that control tail behavior for metric semigroups and monoids—specifically, \(M_A, U_n, U_n^*\). This includes proving our other main result, Theorem B. We begin by setting notation.

Definition 3.1

Suppose \({\mathscr {G}}\) is a metric semigroup.

  1.

    Given \(X_n \in L^0(\varOmega ,{\mathscr {G}})\) as above, for all n in a finite or countable set A, define the random variable \(\ell _X = \ell _{(X_n)} : \mathbb {R} \rightarrow [0,\infty ]\) via:

    $$\begin{aligned} \ell _X(t) := {\left\{ \begin{array}{ll} \inf \{ y> 0\ : \ \sum _{n \in A} {\mathbb {P}}_\mu \left( d_{\mathscr {G}}(z_0, z_0 X_n) > y \right) \leqslant t \}, &{}\quad \text {if } t \in [0,1],\\ 0, &{}\quad \text {otherwise.} \end{array}\right. } \end{aligned}$$

    As indicated in [6, §2], one then has:

    $$\begin{aligned} \mathbb {P}(\ell _X> x) = \sum _{n \in A} {\mathbb {P}}_\mu \left( d_{\mathscr {G}}(z_0, z_0 X_n) > x \right) , \end{aligned}$$

    where \(\mathbb {P}\) is the Lebesgue measure on [0, 1].

  2.

Two families of variables \(P(t)\) and \(Q(t)\) are said to be comparable, denoted by \(P(t) \approx Q(t)\), if there exist constants \(c_1, c_2 > 0\) such that \(c_1^{-1} P(t) \leqslant Q(t) \leqslant c_2 P(t)\) uniformly over all \(t\). The \(c_i\) are called the “constants of approximation”. For the remaining definitions, assume \(({\mathscr {G}}, 1_{\mathscr {G}}, d_{\mathscr {G}})\) is a separable metric monoid.

  3.

    Given \(t \geqslant 0\) and a random variable \(X \in L^0(\varOmega , {\mathscr {G}})\), define its truncation to be:

    $$\begin{aligned} X(t) := {\left\{ \begin{array}{ll} 1_{\mathscr {G}}, \quad &{}\quad \text{ if } d_{\mathscr {G}}(1_{\mathscr {G}}, X) > t,\\ X, &{} \quad \text{ otherwise. } \end{array}\right. } \end{aligned}$$
  4.

    Given variables \(X_1, \ldots , X_n : \varOmega \rightarrow {\mathscr {G}}\), and \(r \in (0,1)\), define:

    $$\begin{aligned} U'_n(r) := \max _{1 \leqslant k \leqslant n} d_{\mathscr {G}}(1_{\mathscr {G}}, \prod _{i=1}^k X_i(\ell _X(r))). \end{aligned}$$
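To fix ideas, here is a sketch (ours) of \(\ell _X\) and of \(U'_n(r)\) for real-valued steps, i.e. \({\mathscr {G}} = (\mathbb {R},+)\) with identity \(0\); the bisection routine and all names are our own.

```python
import numpy as np

def ell_X(samples, t, hi=1e6, iters=80):
    """inf{ y > 0 : sum_n P(|X_n| > y) <= t }, with each tail probability
    estimated empirically; `samples` has shape (paths, n)."""
    lo = 0.0
    for _ in range(iters):  # bisection on the decreasing summed-tail function
        mid = (lo + hi) / 2
        tail_sum = np.mean(np.abs(samples) > mid, axis=0).sum()
        lo, hi = (lo, mid) if tail_sum <= t else (mid, hi)
    return hi

def U_prime(x_row, cutoff):
    """U'_n(r) along one path: steps with |X_i| > cutoff are truncated to the
    identity element 0 before taking the running maximum."""
    truncated = np.where(np.abs(x_row) > cutoff, 0.0, x_row)
    return np.abs(np.cumsum(truncated)).max()

rng = np.random.default_rng(8)
X = rng.standard_normal((50_000, 8))
cut = ell_X(X, t=0.1)
print("l_X(0.1) =", cut, " U'_8(0.1) on one path =", U_prime(X[0], cut))
```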

The following estimate on tail behavior compares \(U_n\) with its decreasing rearrangement.

Theorem 3.1

Given \(p_0 > 0\), there exist universal constants of approximation (depending only on \(p_0\)), such that for all \(p \geqslant p_0\), separable abelian metric monoids \(({\mathscr {G}}, 1_{\mathscr {G}}, d_{\mathscr {G}})\), and finite sequences \(X_1, \ldots , X_n\) of independent \({\mathscr {G}}\)-valued random variables (for any \(n \in {\mathbb {N}}\)),

$$\begin{aligned} {\mathbb {E}}_\mu [U_n^p]^{1/p} \approx U_n^*(e^{-p}/4) + {\mathbb {E}}[\ell _X^p]^{1/p} \approx (U'_n(e^{-p}/8))^* (e^{-p}/4) + {\mathbb {E}}[\ell _X^p]^{1/p}, \end{aligned}$$

where \(U_n\) and \(U'_n\) were defined in Eq. (1.1) and Definition 3.1 respectively.

For real-valued X, the expression \({\mathbb {E}}[|X|^p]^{1/p}\) is also denoted by \(\Vert X \Vert _p\) in the literature.

To show Theorem 3.1, we require some preliminary results which provide additional estimates to govern tail behavior, and which we now collect before proving the theorem. As these preliminaries are often extensions to metric semigroups of results in the Banach space literature, we will sketch or omit their proofs now.

The first result obtains two-sided bounds to control the behavior of the “maximum magnitude” \(M_A\) (cf. Eq. (3.3)).

Proposition 3.2

Suppose \(\{ X_n : n \in A \}\) is a (finite or countably infinite) sequence of independent random variables with values in a separable metric semigroup \(({\mathscr {G}}, d_{\mathscr {G}})\).

  1.

    For all \(t \in (0,1)\), \(\ell _X(2t) \leqslant \ell _X(t/(1-t)) \leqslant M_A^*(t) \leqslant \ell _X(t)\).

  2.

    Suppose \(X_n \in L^p(\varOmega ,{\mathscr {G}})\) for some \(p>0\) (and for all \(n \in A\)). For all \(t > 0\), define:

    $$\begin{aligned} \varPsi _X(t) := p \sum _{n \in A} \int _{\ell _X(t)}^\infty u^{p-1} {\mathbb {P}}_\mu \left( d_{\mathscr {G}}\left( z_0, z_0 X_n\right) > u \right) \ du. \end{aligned}$$

    Then, \(\displaystyle \frac{t \ell _X(t)^p + \varPsi _X(t)}{1+t} \leqslant {\mathbb {E}}_\mu [M_A^p] \leqslant \ell _X(t)^p + \varPsi _X(t)\).

Proof

The first part follows the proof of [6, Proposition 1] (using a special case of Equation (3.2)). For the second, follow the arguments for showing [4, Lemma 3.2]; see also [20, Lemma 6.9].

We next discuss a consequence of Hoffmann-Jørgensen’s inequality for metric semigroups, Theorem 1.1, which can be used to bound the \(L^p\)-norms of the variables \(U_n\)—or more precisely, to relate these \(L^p\)-norms to the tail distributions of \(U_n\) via \(U_n^*\).

Lemma 3.1

(Notation as in Definition 1.1 and Eq. (1.1)) There exists a universal positive constant \(c_1\) such that for any \(0 \leqslant t \leqslant s \leqslant 1/2\), any separable metric semigroup \(({\mathscr {G}}, d_{\mathscr {G}})\) with elements \(z_0, z_1\), and any sequence of independent \({\mathscr {G}}\)-valued random variables \(X_1, \ldots , X_n\),

$$\begin{aligned} U_n^*(t) \leqslant c_1 \frac{\log (1/t)}{\max \{ \log (1/s), \log \log (4/t) \}} \left( U_n^*(s) + M_n^*(t/2)\right) . \end{aligned}$$

Proof

We begin by writing down a consequence of Theorem 1.1:

$$\begin{aligned}&{\mathbb {P}}_\mu \left( U_n> (3K-1)t \right) \\&\qquad \leqslant \frac{1}{K!} \left( \frac{{\mathbb {P}}_\mu \left( U_n> t \right) }{{\mathbb {P}}_\mu \left( U_n \leqslant t \right) } \right) ^K + {\mathbb {P}}_\mu \left( M_n> t \right) , \quad \forall t > 0,\ \forall K,n \in {\mathbb {N}}.\nonumber \end{aligned}$$
(3.4)

If \({\mathbb {P}}_\mu \left( U_n > t \right) \leqslant 1/2\), then \({\mathbb {P}}_\mu \left( U_n \leqslant t \right) \geqslant 1/2\), and so the right-hand side of (3.4) is further dominated by

$$\begin{aligned} 2 \max \left\{ {\mathbb {P}}_\mu \left( M_n> t \right) , \frac{1}{K!} (2 {\mathbb {P}}_\mu \left( U_n > t \right) )^K \right\} . \end{aligned}$$

Now carry out the steps mentioned in the proof of [6, Corollary 1]. \(\square \)
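As a Monte Carlo sanity check (ours) of inequality (3.4), again in the scalar case \({\mathscr {G}} = (\mathbb {R},+)\), with \(K = 2\), \(t = 5\), and \(n = 10\) Gaussian steps:

```python
import math
import numpy as np

rng = np.random.default_rng(9)
paths, n, K, t = 500_000, 10, 2, 5.0
X = rng.standard_normal((paths, n))
S = np.cumsum(X, axis=1)
U = np.abs(S).max(axis=1)
M = np.abs(X).max(axis=1)

p_U = (U > t).mean()                 # P(U_n > t)
lhs = (U > (3 * K - 1) * t).mean()   # P(U_n > (3K-1) t)
rhs = (p_U / (1 - p_U)) ** K / math.factorial(K) + (M > t).mean()
print(lhs, "<=", rhs)
```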

The final preliminary result is proved by adapting the proofs of [6, Lemma 3 and Corollary 2] to metric monoids.

Proposition 3.3

Suppose \(({\mathscr {G}}, 1_{\mathscr {G}}, d_{\mathscr {G}})\) is a separable metric monoid and \(X_1, \ldots , X_n : \varOmega \rightarrow {\mathscr {G}}\) is a finite sequence of independent \({\mathscr {G}}\)-valued random variables. For \(r \in (0,1)\), define:

$$\begin{aligned} U''_n(r) := \max _{1 \leqslant k \leqslant n} d_{\mathscr {G}}(1_{\mathscr {G}}, \prod _{i=1}^k X_i'(\ell _X(r))), \end{aligned}$$

where \(X'_i(t)\) equals \(1_{\mathscr {G}}\) if \(d_{\mathscr {G}}(1_{\mathscr {G}}, X_i) \leqslant t\), and \(X_i\) otherwise.

  1.

Then \(U''_n(r)\) may be expressed as the sum of “disjoint” random variables \(V_k\) for \(k \in {\mathbb {N}}\). In other words, \(\varOmega \) can be partitioned into measurable subsets \(E_k\) such that \(V_k = U''_n(r)\) on \(E_k\) and \(1_{\mathscr {G}}\) otherwise. Moreover, the \(V_k\) may be chosen such that \(V_k^*(t) \leqslant k \cdot \ell _X(t (k-1)! / r^{k-1})\).

  2.

    Given the assumptions, for all \(p \in (0,\infty )\),

    $$\begin{aligned} {\mathbb {E}}_\mu [U''_n(r)^p]^{1/p} \leqslant 2 e^{2^p r/p} {\mathbb {E}}[\ell _X^p]^{1/p}. \end{aligned}$$

With the above results in hand, we can now prove Theorem 3.1.

Proof of Theorem 3.1

Compute using the triangle inequality (2.1) and Remark 1:

$$\begin{aligned} d_{\mathscr {G}}(1_{\mathscr {G}}, X_k) \leqslant d_{\mathscr {G}}(1_{\mathscr {G}}, S_{k-1}) + d_{\mathscr {G}}(1_{\mathscr {G}}, S_k) \leqslant 2 U_n. \end{aligned}$$

Hence \(M_n \leqslant 2 U_n\). Now compute for \(p \geqslant p_0\), using Propositions 3.1 and 3.2:

$$\begin{aligned} {\mathbb {E}}_\mu [U_n^p]^{1/p}&\geqslant \frac{1}{2} {\mathbb {E}}_\mu [M_n^p]^{1/p} \geqslant 2^{-1-p_0^{-1}} {\mathbb {E}}[\ell _X^p]^{1/p},\\ {\mathbb {E}}_\mu [U_n^p]^{1/p}&\geqslant \left( e^{-p}/8\right) ^{1/p} U_n^*(e^{-p}/8) \geqslant 8^{-p_0^{-1}} e^{-1} U_n^*(e^{-p}/4). \end{aligned}$$

Hence there exists a constant \(0 < c_1 = c_1(p_0)\) such that:

$$\begin{aligned} {\mathbb {E}}_\mu [U_n^p]^{1/p} \geqslant c_1^{-1} (U_n^*(e^{-p}/4) + {\mathbb {E}}[\ell _X^p]^{1/p}). \end{aligned}$$

This yields one inequality; another one is obtained using Proposition 3.2 as follows:

$$\begin{aligned} {\mathbb {P}}_\mu \left( U_n \ne U'_n(e^{-p}/8) \right)&\leqslant \mathbb {P}(M_n> \ell _X(e^{-p}/8)) \leqslant {\mathbb {P}}_\mu \left( M_n > M_n^*(e^{-p}/8) \right) \\&\leqslant e^{-p}/8. \end{aligned}$$

Now if \({\mathbb {P}}_\mu \left( U'_n(e^{-p}/8)> y \right) > \eta \) for some \(\eta \in [\frac{e^{-p}}{8},1]\), then by the reverse triangle inequality,

$$\begin{aligned} {\mathbb {P}}_\mu \left( U_n> y \right) \geqslant&\ {\mathbb {P}}_\mu \left( U_n> y,\ U_n = U'_n(e^{-p}/8) \right) \\ \geqslant&\ {\mathbb {P}}_\mu \left( U'_n(e^{-p}/8)> y \right) - {\mathbb {P}}_\mu \left( U_n \ne U'_n(e^{-p}/8) \right) > \eta - \frac{e^{-p}}{8}. \end{aligned}$$

Hence by definition and the above calculations,

$$\begin{aligned} U'_n(e^{-p}/8)^*(\eta ) \leqslant U_n^*(\eta - e^{-p}/8). \end{aligned}$$
(3.5)

Applying this with \(\eta = e^{-p}/4\),

$$\begin{aligned} U'_n(e^{-p}/8)^*(e^{-p}/4) \leqslant U_n^*(e^{-p}/8) \leqslant e 8^{1/p} {\mathbb {E}}_\mu [U_n^p]^{1/p} \leqslant e 8^{1/p_0} {\mathbb {E}}_\mu [U_n^p]^{1/p}. \end{aligned}$$

Hence as above, there exists a constant \(0 < c_2 = c_2(p_0)\) such that:

$$\begin{aligned} {\mathbb {E}}_\mu [U_n^p]^{1/p} \geqslant c_2^{-1} ( U'_n(e^{-p}/8)^*(e^{-p}/4) + {\mathbb {E}}[\ell _X^p]^{1/p}). \end{aligned}$$

This proves the second of the four claimed inequalities. The remaining arguments can now be shown by suitably adapting the proof of [6, Theorem 3].

Finally, we use Theorem 3.1 to prove our remaining main result.

Proof of Theorem B

Using Proposition 2.1, let \({\mathscr {G}}'\) denote the smallest metric monoid containing \({\mathscr {G}}\). Thus the \(X_k\) are a sequence of independent \({\mathscr {G}}'\)-valued random variables, and we may assume henceforth that \({\mathscr {G}}= {\mathscr {G}}'\). Compute using Proposition 3.2, and the fact that \(X^*\) and X have the same law for the real-valued random variable \(X = M_n\):

$$\begin{aligned} {\mathbb {E}}[\ell _X^q] =&\ \int _0^{1/2} \ell _X(2t)^q \cdot 2 dt \leqslant 2 \int _0^{1/2} M^*_n(t)^q\ dt \leqslant 2 \int _0^1 M^*_n(t)^q\ dt = 2 \mathbb {E}[(M^*_n)^q]\\ =&\ 2 {\mathbb {E}}_\mu [M_n^q]. \end{aligned}$$

Using this computation, as well as Lemma 3.1 and Theorem 3.1 for \({\mathscr {G}}'\), we compute:

$$\begin{aligned}&{\mathbb {E}}_\mu [U_n^q]^{1/q}\\&\quad \leqslant c'_1 ({\mathbb {E}}[\ell _X^q]^{1/q} + U_n^*(e^{-q}/4))\\&\quad \leqslant c'_1 \cdot 2^{1/q} {\mathbb {E}}_\mu [M_n^q]^{1/q} + c'_1 c_1 \frac{\log (4e^q)}{\max (\log (4e^p), \log \log (16e^q))} (U_n^*(e^{-p}/4) \\&\qquad + M_n^*(e^{-q}/8))\\&\quad \leqslant c'_1 \cdot 2^{1/q} {\mathbb {E}}_\mu [M_n^q]^{1/q} + c'_1 c_1 \frac{\log (4e^q)}{\max (\log (4e^p), \log (\epsilon + q))} (c_2 {\mathbb {E}}_\mu [U_n^p]^{1/p} \\&\qquad + M^*_n(e^{-q}/8)) \end{aligned}$$

since \(\epsilon \in (-q, \log (16)]\). There are now two cases: first if \(e^p \geqslant \epsilon +q\), then

$$\begin{aligned} \frac{\log (4e^q)}{\max (\log (4e^p), \log (\epsilon + q))} \leqslant \frac{q + \log (4)}{p + \log (4)} \leqslant \frac{q}{p} = \frac{q}{\max (p, \log (\epsilon + q))}. \end{aligned}$$

On the other hand, if \(e^p < \epsilon + q\) then set \(C := 1 + \frac{\log (4)}{p_0}\) and note that \(Cq \geqslant q + \log (4)\). Therefore,

$$\begin{aligned} \frac{\log (4e^q)}{\max (\log (4e^p), \log (\epsilon + q))} \leqslant \frac{q + \log (4)}{\log (\epsilon + q)} \leqslant \frac{C q}{\log (\epsilon + q)} = C \frac{q}{\max (p, \log (\epsilon + q))}. \end{aligned}$$

Using the above analysis now yields:

$$\begin{aligned}&\ {\mathbb {E}}_\mu [U_n^q]^{1/q}\\&\quad \leqslant c'_1 \cdot 2^{1/q} {\mathbb {E}}_\mu [M_n^q]^{1/q}\\&\qquad + c'_1 c_1 \left( 1 + \frac{\log (4)}{p_0} \right) \frac{q}{\max (p, \log (\epsilon +q))} (c_2 {\mathbb {E}}_\mu [U_n^p]^{1/p} + M^*_n(e^{-q}/8)). \end{aligned}$$

Setting \(c := c'_1 \max (2^{1/p_0}, c_1(1 + \log (4)/p_0), c_1 c_2 (1 + \log (4)/p_0))\), we obtain the first inequality claimed in the statement of the theorem.

To show the second inequality, we first verify that if \(\epsilon \geqslant \min (1, e - p_0)\), then the function \(f(x) := x/\log (\epsilon +x)\) is strictly increasing on \((p_0,\infty )\). Now compute:

$$\begin{aligned} \frac{q}{\max (p, \log (\epsilon +q))} =&\ \min \left( \frac{q}{p}, \frac{q}{\log (\epsilon +q)} \right) \geqslant \min \left( 1, \frac{q}{\log (\epsilon +q)} \right) \\ \geqslant&\ \min \left( 1, \frac{p_0}{\log (\epsilon +p_0)} \right) . \end{aligned}$$

Next, use Proposition 3.1 to show: \(M^*_n(e^{-q}/8) \leqslant {\mathbb {E}}_\mu [M_n^q]^{1/q} (8e^q)^{1/q} \leqslant 8^{1/p_0} e {\mathbb {E}}_\mu [M_n^q]^{1/q}\). Using the previous two facts, we now complete the proof of the second inequality by beginning with the first inequality:

$$\begin{aligned}&\ {\mathbb {E}}_\mu [U_n^q]^{1/q}\\&\quad \leqslant c \frac{q}{\max (p, \log (\epsilon +q))} ( {\mathbb {E}}_\mu [U_n^p]^{1/p} + M_n^*(e^{-q}/8)) + c {\mathbb {E}}_\mu [M_n^q]^{1/q}\\&\quad \leqslant c \frac{q}{\max (p, \log (\epsilon +q))} ({\mathbb {E}}_\mu [U_n^p]^{1/p} + 8^{1/p_0} e {\mathbb {E}}_\mu [M_n^q]^{1/q}) + c \cdot 1 \cdot {\mathbb {E}}_\mu [M_n^q]^{1/q}\\&\quad \leqslant c \frac{q}{\max (p, \log (\epsilon +q))} \left( {\mathbb {E}}_\mu [U_n^p]^{1/p} + 8^{1/p_0} e {\mathbb {E}}_\mu [M_n^q]^{1/q} \right. \\&\qquad \left. + \max (1, \frac{\log (\epsilon +p_0)}{p_0}) {\mathbb {E}}_\mu [M_n^q]^{1/q} \right) . \end{aligned}$$

The second inequality in the theorem now follows.