1 Introduction

The concept of entropy is fundamental in both statistical physics and information theory. It plays an important role in applying information-theoretic ideas to combinatorial problems [17]. Many results of this kind were reviewed by Radhakrishnan [21] and Galvin [10]. An entropy approach is often used in studies of graph colorings [11, 12]. Applications of entropy as a combinatorial tool are typically based on the Shannon entropy and its conditional form. Meanwhile, other entropic functions have been found useful in various questions [2]. The Rényi entropy [25] and the Tsallis–Havrda–Charvát (THC) entropy [14, 29] are especially important extensions of the Shannon entropy. In principle, such entropic functions may have combinatorial or computational applications. For instance, both have been used in the global thresholding approach to image processing [27].

The main goal of this study is to develop an entropy-based approach to counting problems using the Tsallis–Havrda–Charvát entropies. The paper is organized as follows. In Sect. 2, we recall properties of the THC entropies and prove a useful statement. In Sect. 3, we obtain THC-entropy versions of some combinatorial results related to the so-called Shearer lemma. In particular, we consider an upper estimate for the maximum possible cardinality of a family of k-subsets of a given set whose members obey certain restrictions. In Sect. 4, we derive a one-parameter family of upper bounds on permanents of square (0, 1)-matrices. This family is an extension of the Brégman theorem. We give an example illustrating the utility of the presented extension.

2 Definitions and Properties of the THC \(\alpha \)-Entropies

In this section, we briefly recall definitions of the Tsallis–Havrda–Charvát entropies and related conditional entropies. Required properties of these entropic functionals are discussed as well. Let the discrete random variable X take values in the finite set \({\varOmega }_{X}\). The non-extensive entropy of strictly positive degree \(\alpha \ne 1\) is defined by [29]

$$\begin{aligned} H_{\alpha }(X):=\frac{1}{1-\alpha }\left( \sum _{{\,}x\in {\varOmega }_{X}} p(x)^{\alpha } - 1\right) . \end{aligned}$$
(1)

With the factor \(\left( 2^{1-\alpha }-1\right) ^{-1}\) instead of \((1-\alpha )^{-1}\), this entropic form was considered by Havrda and Charvát [14]. In non-extensive statistical mechanics, the entropy (1) is known as the Tsallis entropy. It is instructive to rewrite (1) as

$$\begin{aligned} H_{\alpha }(X)=-\sum _{x\in {\varOmega }_{X}}p(x)^{\alpha }{\,}\ln _{\alpha }p(x) =\sum _{x\in {\varOmega }_{X}}p(x){\,}\ln _{\alpha }\left( \frac{1}{p(x)}\right) . \end{aligned}$$
(2)

Assuming \(\xi >0\), the so-called \(\alpha \)-logarithm is defined as

$$\begin{aligned} \ln _{\alpha }(\xi )= \left\{ \begin{array}{ll} \frac{\xi ^{1-\alpha }-1}{1-\alpha }\>, &{} \text { for } \alpha >0,\ \alpha \ne 1, \\ \ln \xi , &{} \text { for } \alpha =1. \end{array}\right. \end{aligned}$$
(3)

In the limit \(\alpha \rightarrow 1\), the entropy (1) gives the standard Shannon entropy

$$\begin{aligned} H_{1}(X)=-\sum _{x\in {\varOmega }_{X}}p(x){\,}\ln {p(x)} \ . \end{aligned}$$
(4)

For all real \(q\in [0,1]\), we write the binary THC entropy

$$\begin{aligned} h_{\alpha }(q):=- q^{\alpha }\ln _{\alpha }(q)-(1-q)^{\alpha }\ln _{\alpha }(1-q). \end{aligned}$$
(5)

For \(q\in (0,1)\), this function is concave and obeys \(h_{\alpha }(q)=h_{\alpha }(1-q)\). The THC entropies inherit several natural properties of the Shannon entropy. The maximal value of (1) is equal to \(\ln _{\alpha }|{\varOmega }_{X}|\) and is reached for the uniform distribution. For \(\alpha \ge 1\), the joint THC entropy of two random variables obeys [9]

$$\begin{aligned} H_{\alpha }(X,Y)\le {H}_{\alpha }(X)+H_{\alpha }(Y). \end{aligned}$$
(6)
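As a minimal numerical illustration of the definitions above (a sketch, not part of the original text; it assumes NumPy and arbitrary randomly drawn distributions), the following snippet implements the \(\alpha \)-logarithm (3) and the entropy (1), and checks the maximal value \(\ln _{\alpha }|{\varOmega }_{X}|\) together with the subadditivity (6).

```python
import numpy as np

def ln_alpha(x, a):
    """alpha-logarithm (3), defined for x > 0 and a > 0."""
    return np.log(x) if a == 1 else (x**(1 - a) - 1) / (1 - a)

def thc_entropy(p, a):
    """THC entropy (1) of a probability vector p."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]
    return -(p @ np.log(p)) if a == 1 else (np.sum(p**a) - 1) / (1 - a)

rng = np.random.default_rng(0)
a = 2.5
p = rng.dirichlet(np.ones(6))                       # a random distribution on 6 points
print(thc_entropy(p, a) <= ln_alpha(6, a))          # maximal value ln_alpha|Omega_X|  -> True

pxy = rng.dirichlet(np.ones(12)).reshape(3, 4)      # a random joint distribution p(x, y)
lhs = thc_entropy(pxy.ravel(), a)                   # H_alpha(X, Y)
rhs = thc_entropy(pxy.sum(axis=1), a) + thc_entropy(pxy.sum(axis=0), a)
print(lhs <= rhs)                                   # subadditivity (6) for alpha >= 1 -> True
```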

In applications of information-theoretic methods, the notion of conditional entropy is widely used [6]. Let us define the functional

$$\begin{aligned} H_{1}(X|y)=-\sum \nolimits _{x}p(x|y){\,}\ln {p}(x|y), \end{aligned}$$

in which the sum is taken over \(x\in {\varOmega }_{X}\). The entropy of X conditional on knowing Y is defined as [6]

$$\begin{aligned} H_{1}(X|Y):=\sum \nolimits _{y} p(y){\,}H_{1}(X|y) =-\sum \nolimits _{x}\sum \nolimits _{y} p(x,y){\,}\ln {p}(x|y), \end{aligned}$$
(7)

where \(p(x|y)=p(x,y)/p(y)\). When the range of summation is clear from the context, we will omit symbols such as \({\varOmega }_{X}\) and \({\varOmega }_{Y}\).

In the literature, two kinds of the conditional THC entropy have been discussed [9]. These forms are respectively inspired by the two expressions given in (2). The first form is defined as [9]

$$\begin{aligned} H_{\alpha }(X|Y):=\sum \nolimits _{y}p(y)^{\alpha }H_{\alpha }(X|y), \end{aligned}$$
(8)

where

$$\begin{aligned} H_{\alpha }(X|y):=\frac{1}{1-\alpha }\left( \sum \nolimits _{x} p(x|y)^{\alpha }-1\right) =-\sum \nolimits _{x} p(x|y)^{\alpha }{\,}\ln _{\alpha }p(x|y) \end{aligned}$$
(9)

and \(\alpha \ne 1\) is strictly positive. The conditional entropy (8) is, up to a factor, the quantity originally introduced by Daróczy [7]. For any \(\alpha >0\), this conditional entropy obeys the chain rule written as [7]

$$\begin{aligned} H_{\alpha }(X,Y)=H_{\alpha }(X|Y)+H_{\alpha }(Y). \end{aligned}$$
(10)

Due to nonnegativity of \(H_{\alpha }(X|Y)\), for all \(\alpha >0\) we also have

$$\begin{aligned} H_{\alpha }(X,Y)\ge {H}_{\alpha }(Y). \end{aligned}$$

The chain rule (10) can be extended to more than two variables. Up to reordering of random variables, this result is expressed as [9]

$$\begin{aligned} H_{\alpha }(X_{1},X_{2},\ldots ,X_{n}) =\sum \nolimits _{j=1}^{n}H_{\alpha }(X_{j}|X_{j-1},\ldots ,X_{1}), \end{aligned}$$
(11)

where \(\alpha >0\). In the case \(\alpha =1\), we obtain the chain rule with the standard conditional entropy (7). This property turns out to be essential in the entropic approach to counting. The second form of the conditional THC entropy is introduced as [9]

$$\begin{aligned} {\widetilde{H}}_{\alpha }(X|Y):=\sum \nolimits _{y}p(y){\,}H_{\alpha }(X|y). \end{aligned}$$
(12)

Although the quantity (12) does not satisfy the chain rule, it has found use in several questions [9, 22]. Its definition is based on a formulation that seems more appropriate in the context of dynamical systems and generalized entropy rates [8, 24, 28]. We also have \({\widetilde{H}}_{\alpha }(X|Y)\le {H}_{\alpha }(X|Y)\) for \(\alpha \in (0,1)\) and \({\widetilde{H}}_{\alpha }(X|Y)\ge {H}_{\alpha }(X|Y)\) for \(\alpha \in (1,\infty )\). For \(\alpha =1\), the \(\alpha \)-entropies (8) and (12) both coincide with (7).
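The following short sketch (an illustration with NumPy and a randomly drawn joint distribution, not taken from the paper) implements both conditional forms (8) and (12) and checks the chain rule (10) as well as the stated ordering between them.

```python
import numpy as np

def thc_entropy(p, a):
    p = np.asarray(p, dtype=float)
    p = p[p > 0]
    return -(p @ np.log(p)) if a == 1 else (np.sum(p**a) - 1) / (1 - a)

def cond_daroczy(pxy, a):
    """Conditional entropy H_alpha(X|Y), definition (8); rows of pxy index x, columns index y."""
    py = pxy.sum(axis=0)
    return sum(py[y]**a * thc_entropy(pxy[:, y] / py[y], a)
               for y in range(pxy.shape[1]) if py[y] > 0)

def cond_tilde(pxy, a):
    """Conditional entropy tilde-H_alpha(X|Y), definition (12)."""
    py = pxy.sum(axis=0)
    return sum(py[y] * thc_entropy(pxy[:, y] / py[y], a)
               for y in range(pxy.shape[1]) if py[y] > 0)

rng = np.random.default_rng(1)
pxy = rng.dirichlet(np.ones(20)).reshape(4, 5)      # a random joint distribution of (X, Y)
for a in (0.5, 1.0, 2.0, 3.0):
    chain = np.isclose(thc_entropy(pxy.ravel(), a),
                       cond_daroczy(pxy, a) + thc_entropy(pxy.sum(axis=0), a))
    order = (cond_tilde(pxy, a) <= cond_daroczy(pxy, a) if a < 1
             else cond_tilde(pxy, a) >= cond_daroczy(pxy, a))
    print(a, bool(chain), order)                    # the chain rule (10) and the stated ordering
```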

When the entropic approach is applied to counting, several properties of the conditional entropy are required. One of these properties is the chain rule. The standard conditional entropy also satisfies

$$\begin{aligned} H_{1}(X|Y_{1},\ldots ,Y_{n-1},Y_{n}) \le {H}_{1}(X|Y_{1},\ldots ,Y_{n-1}). \end{aligned}$$
(13)

Thus, conditioning on more variables can only reduce the conditional entropy. This relation is sometimes required in counting [21]. Another very useful property of the standard conditional entropy is formulated as follows. Let \(Y\mapsto {f}(Y)\) be a function whose domain contains the support of the random variable Y. Then we have [21]

$$\begin{aligned} H_{1}\bigl (X\big |f(Y)\bigr )\ge {H}_{1}(X|Y). \end{aligned}$$
(14)

We shall now establish analogous properties for the conditional \(\alpha \)-entropies.

Proposition 1

Let X and \(Y_{1},\ldots ,Y_{n}\) be discrete random variables, where \(n\ge 1\). For \(\alpha \ge 1\), the conditional entropy (8) satisfies

$$\begin{aligned} H_{\alpha }(X|Y_{1},\ldots ,Y_{n-1},Y_{n}) \le {H}_{\alpha }(X|Y_{1},\ldots ,Y_{n-1}). \end{aligned}$$
(15)

For \(\alpha >0\), the conditional entropy (12) satisfies

$$\begin{aligned} {\widetilde{H}}_{\alpha }(X|Y_{1},\ldots ,Y_{n-1},Y_{n}) \le {\widetilde{H}}_{\alpha }(X|Y_{1},\ldots ,Y_{n-1}). \end{aligned}$$
(16)

Let \(Y\mapsto {f}(Y)\) be a function of random variable Y. For \(\alpha \ge 1\), the conditional entropy (8) satisfies

$$\begin{aligned} H_{\alpha }\bigl (X\big |f(Y)\bigr )\ge {H}_{\alpha }(X|Y). \end{aligned}$$
(17)

For \(\alpha >0\), the conditional entropy (12) satisfies

$$\begin{aligned} {\widetilde{H}}_{\alpha }\bigl (X\big |f(Y)\bigr )\ge {\widetilde{H}}_{\alpha }(X|Y). \end{aligned}$$
(18)

Proof

The results (15) and (16) were proved in [9, 23] and [24], respectively. Let us proceed to (17) and (18). Since the standard case is known, we assume \(\alpha \ne 1\). To each value u of the function, we assign the subset \(\omega _{u}\subseteq {\varOmega }_{Y}\) such that

$$\begin{aligned} \omega _{u}:=\bigl \{y:{\>}y\in {\varOmega }_{Y},{\>}f(y)=u\bigr \}. \end{aligned}$$

Then the probabilities are written as

$$\begin{aligned} p(x,u)=\sum _{y\in \omega _{u}}p(x,y),\, \qquad p(u)=\sum _{y\in \omega _{u}}p(y). \end{aligned}$$
(19)

The left-hand side of (17) is represented as

$$\begin{aligned} H_{\alpha }\bigl (X\big |f(Y)\bigr ) =\sum _{u\in {\varOmega }_{f(Y)}} p(u)^{\alpha }H_{\alpha }(X|u). \end{aligned}$$
(20)

Replacing \(p(u)^{\alpha }\) with p(u), we obtain the expression for \({\widetilde{H}}_{\alpha }\bigl (X\big |f(Y)\bigr )\). For strictly positive \(\alpha \ne 1\) and \(\xi \ge 0\), we introduce the function

$$\begin{aligned} \eta _{\alpha }(\xi )=\frac{\xi ^{\alpha }-\xi }{1-\alpha }. \end{aligned}$$

In terms of this function, we now write

$$\begin{aligned} p(u)^{\alpha }H_{\alpha }(X|u)= \sum _{x\in {\varOmega }_{X}}p(u)^{\alpha }{\,}\eta _{\alpha }\bigl (p(x|u)\bigr ) \>. \end{aligned}$$

As \(\eta _{\alpha }^{\prime \prime }(\xi )\le 0\) for the considered values of \(\alpha \), the function \(\xi \mapsto \eta _{\alpha }(\xi )\) is concave. For fixed x and u, we set the numbers \(\lambda _{y}=p(y)/p(u)\) and \(\xi _{y}=p(x,y)/p(y)=p(x|y)\), so that

$$\begin{aligned} \sum _{y\in \omega _{u}}\lambda _{y}=\frac{p(u)}{p(u)}=1 \ , \qquad \sum _{y\in \omega _{u}}\lambda _{y}{\,}\xi _{y}=\frac{p(x,u)}{p(u)}=p(x|u), \end{aligned}$$

according to (19). By Jensen’s inequality, we then obtain

$$\begin{aligned}&\displaystyle p(u){\>}\eta _{\alpha }\bigl (p(x|u)\bigr )\ge p(u)\sum _{y\in \omega _{u}}\lambda _{y}{\,}\eta _{\alpha }(\xi _{y}) =\sum _{y\in \omega _{u}}p(y){\>}\eta _{\alpha }\bigl (p(x|y)\bigr ) \>. \end{aligned}$$
(21)
$$\begin{aligned}&\displaystyle p(u)^{\alpha }{\>}\eta _{\alpha }\bigl (p(x|u)\bigr )\ge p(u)^{\alpha }\sum _{y\in \omega _{u}}\lambda _{y}{\,}\eta _{\alpha }(\xi _{y}) =\sum _{y\in \omega _{u}}p(u)^{\alpha -1}p(y){\>}\eta _{\alpha }\bigl (p(x|y)\bigr ) \>. \qquad \end{aligned}$$
(22)

Summing (21) with respect to \(x\in {\varOmega }_{X}\), for all the considered values of \(\alpha \) one gets

$$\begin{aligned} p(u){\,}H_{\alpha }(X|u)\ge \sum _{y\in \omega _{u}}p(y){\,}H_{\alpha }(X|y). \end{aligned}$$

The latter leads to (18) after summing with respect to \(u\in {\varOmega }_{f(Y)}\). For all \(y\in \omega _{u}\) and \(\alpha >1\), we have \(p(u)\ge {p}(y)\) and \(p(u)^{\alpha -1}p(y)\ge {p}(y)^{\alpha }\). Combining this with (22) and summing with respect to \(x\in {\varOmega }_{X}\), we obtain

$$\begin{aligned} p(u)^{\alpha }H_{\alpha }(X|u)\ge \sum _{y\in \omega _{u}}p(y)^{\alpha }H_{\alpha }(X|y). \end{aligned}$$

Summing this with respect to \(u\in {\varOmega }_{f(Y)}\) completes the proof of (17). \(\square \)

Note that the standard case \(\alpha =1\) of (17) and (18) can be proved by repeating the above reasoning with the concave function \(\xi \mapsto -\xi \ln \xi \). In the mentioned parameter ranges, the conditional THC entropies (8) and (12) therefore do not increase under conditioning on more. The result (15) has made it possible to derive a one-parameter extension of the entropic Bell inequalities originally given in [3]. Using (14), one can deduce a property useful in the entropic approach to Brégman's theorem [20, 21]. We shall now formulate a similar statement for the \(\alpha \)-entropies.
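A small numerical sanity check of Proposition 1 may be helpful; the sketch below (NumPy, a random joint distribution of \((X,Y_{1},Y_{2})\), and a hypothetical merging function f are our own choices) verifies (15)-(18) on one instance.

```python
import numpy as np

def thc_entropy(p, a):
    p = np.asarray(p, dtype=float)
    p = p[p > 0]
    return -(p @ np.log(p)) if a == 1 else (np.sum(p**a) - 1) / (1 - a)

def cond(pxy, a, tilde=False):
    """H_alpha(X|Y) as in (8), or the tilde-version (12); pxy has shape (|X|, |Y|)."""
    py = pxy.sum(axis=0)
    w = py if tilde else py**a
    return sum(w[y] * thc_entropy(pxy[:, y] / py[y], a)
               for y in range(pxy.shape[1]) if py[y] > 0)

rng = np.random.default_rng(2)
p = rng.dirichlet(np.ones(3 * 4 * 2)).reshape(3, 4, 2)    # joint distribution of (X, Y1, Y2)
p_x_y1y2 = p.reshape(3, 8)                                # treat (Y1, Y2) as a single variable
p_x_y1 = p.sum(axis=2)                                    # marginalize over Y2
f = np.array([0, 0, 1, 1])                                # hypothetical f(Y1): merge {0,1} and {2,3}
p_x_fy1 = np.stack([p_x_y1[:, f == u].sum(axis=1) for u in (0, 1)], axis=1)

for a in (0.5, 1.0, 2.0):
    for tilde in (False, True):
        if not tilde and a < 1:
            continue                                      # (15) and (17) are stated for alpha >= 1
        print(a, tilde,
              cond(p_x_y1y2, a, tilde) <= cond(p_x_y1, a, tilde),   # (15) / (16)
              cond(p_x_fy1, a, tilde) >= cond(p_x_y1, a, tilde))    # (17) / (18)
```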

Corollary 2

Let the support \({\varOmega }_{Y}\) of random variable Y be partitioned into m mutually disjoint sets \(\omega _{j}\) as

$$\begin{aligned} {\varOmega }_{Y}=\bigcup _{j=1}^{m}\omega _{j}. \end{aligned}$$

Let \(\varpi _{j}\subseteq {\varOmega }_{X}\) be defined as

$$\begin{aligned} \varpi _{j}:=\bigl \{x:{\>}x\in {\varOmega }_{X},{\>}y\in \omega _{j},{\>}p(x|y)\ne 0\bigr \} \>. \end{aligned}$$

If \(\varpi _{j}\ne \varpi _{k}\) for all \(j\ne {k}\), then

$$\begin{aligned} H_{\alpha }(X|Y)&\le \sum _{j=1}^{m} {\mathrm {Pr}}[Y\in \omega _{j}]^{\alpha }{\>}\ln _{\alpha }|\varpi _{j}|&(1\le \alpha <\infty )\ , \end{aligned}$$
(23)
$$\begin{aligned} {\widetilde{H}}_{\alpha }(X|Y)&\le \sum _{j=1}^{m} {\mathrm {Pr}}[Y\in \omega _{j}]{\>}\ln _{\alpha }|\varpi _{j}|&(0<\alpha <\infty ). \end{aligned}$$
(24)

Proof

Let us take the function \(Y\mapsto {f}_{\omega }(Y)\) such that \(f_{\omega }(y)=\varpi _{j}\) for each \(y\in \omega _{j}\). It then follows from (17) and (18) that

$$\begin{aligned}&H_{\alpha }(X|Y)\le {H}_{\alpha }\bigl (X\big |f_{\omega }(Y)\bigr )= \sum _{j=1}^{m} {\mathrm {Pr}}[Y\in \omega _{j}]^{\alpha } H_{\alpha }(X|\varpi _{j})&(1\le \alpha <\infty ), \end{aligned}$$
(25)
$$\begin{aligned}&{\widetilde{H}}_{\alpha }(X|Y)\le {\widetilde{H}}_{\alpha }\bigl (X\big |f_{\omega }(Y)\bigr )= \sum _{j=1}^{m} {\mathrm {Pr}}[Y\in \omega _{j}]{\,}H_{\alpha }(X|\varpi _{j})&(0<\alpha <\infty ). \end{aligned}$$
(26)

The quantity \(H_{\alpha }(X|\varpi _{j})\) is represented as the sum

$$\begin{aligned} H_{\alpha }(X|\varpi _{j})=\sum _{x\in \varpi _{j}}\eta _{\alpha }\bigl (p(x|\omega _{j})\bigr ) \>. \end{aligned}$$
(27)

The sum of \(p(x|\omega _{j})\) over \(x\in \varpi _{j}\) is equal to 1, whence the term (27) does not exceed \(\ln _{\alpha }|\varpi _{j}|\). Combining this fact with (25) and (26) completes the proof. \(\square \)

Using Corollary 2, we will obtain upper bounds on conditional \(\alpha \)-entropies in some combinatorial problems. To do so, we have to estimate not only cardinalities \(|\varpi _{j}|\), but also probabilities \({\mathrm {Pr}}[Y\in \omega _{j}]\). From this viewpoint, the inequality (24) seems to be more appropriate.
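As an illustration of Corollary 2 (a toy setup of our own, not from the paper), the sketch below partitions the support of Y into two blocks with distinct sets \(\varpi _{j}\) and evaluates the bounds (23) and (24) numerically.

```python
import numpy as np

def thc_entropy(p, a):
    p = np.asarray(p, dtype=float)
    p = p[p > 0]
    return -(p @ np.log(p)) if a == 1 else (np.sum(p**a) - 1) / (1 - a)

def ln_alpha(x, a):
    return np.log(x) if a == 1 else (x**(1 - a) - 1) / (1 - a)

# A hand-made joint distribution on Omega_X = Omega_Y = {0,1,2,3}: for y in {0,1} the conditional
# p(x|y) is supported on {0,1}, for y in {2,3} it is supported on {1,2,3}, so varpi_1 != varpi_2.
pxy = np.array([[0.10, 0.12, 0.00, 0.00],
                [0.15, 0.08, 0.05, 0.10],
                [0.00, 0.00, 0.10, 0.12],
                [0.00, 0.00, 0.08, 0.10]])
blocks = [[0, 1], [2, 3]]
py = pxy.sum(axis=0)
sizes = [np.count_nonzero(pxy[:, om].sum(axis=1)) for om in blocks]   # cardinalities |varpi_j|

for a in (0.5, 1.0, 2.0):
    H_tilde = sum(py[y] * thc_entropy(pxy[:, y] / py[y], a) for y in range(4))
    bound_tilde = sum(py[om].sum() * ln_alpha(s, a) for om, s in zip(blocks, sizes))
    checks = [H_tilde <= bound_tilde]                                  # (24), any alpha > 0
    if a >= 1:
        H = sum(py[y]**a * thc_entropy(pxy[:, y] / py[y], a) for y in range(4))
        bound = sum(py[om].sum()**a * ln_alpha(s, a) for om, s in zip(blocks, sizes))
        checks.append(H <= bound)                                      # (23), alpha >= 1
    print(a, checks)
```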

3 Shearer’s Lemma and Intersections of k-Element Sets

In this section, we will examine some questions connected with the Shearer lemma [5]. The properties of the THC entropies lead to a number of inequalities with interesting combinatorial applications. We first note the following.

Proposition 3

Let \(X=(X_{1},\ldots ,X_{n})\) be a random variable taking values in the set \(S=S_{1}\times \cdots \times {S}_{n}\), where each coordinate \(X_{j}\) is a random variable taking values in \(S_{j}\). For all \(\alpha \ge 1\), we have

$$\begin{aligned} H_{\alpha }(X)\le \sum _{j=1}^{n} H_{\alpha }(X_{j}). \end{aligned}$$
(28)

Proof

The claim (28) immediately follows by induction from (6). \(\square \)

The result (28) is a straightforward extension of Proposition 15.7.2 of the book [1]. From it, we can obtain several corollaries. The first of them is stated as follows.

Corollary 4

Let \({\mathcal {F}}\) be a family of subsets of the set \(\{1,\ldots ,n\}\), and let \(q_{j}\) denote the fraction of members of \({\mathcal {F}}\) that contain j. For all \(\alpha \ge 1\), we have

$$\begin{aligned} \ln _{\alpha }|{\mathcal {F}}|\le \sum _{j=1}^{n} h_{\alpha }(q_{j}). \end{aligned}$$
(29)

Proof

To each set \(F\in {\mathcal {F}}\), we assign its characteristic vector v(F), which is a binary n-tuple. Let \(X=(X_{1},\ldots ,X_{n})\) be the random variable taking values in \(\{0,1\}^{n}\) such that

$$\begin{aligned} {\mathrm {Pr}}\bigl [X=v(F)\bigr ]=|{\mathcal {F}}|^{-1} \qquad \forall {\,}F\in {\mathcal {F}}\ , \end{aligned}$$
(30)

whence \(H_{\alpha }(X)=\ln _{\alpha }|{\mathcal {F}}|\). The random variable \(X_{j}\) takes values in \(\{0,1\}\) and equals the j-th entry of the characteristic vector. By the definition of \(q_{j}\), the entropy of \(X_{j}\) is equal to \(h_{\alpha }(q_{j})\). Combining this with (28) completes the proof. \(\square \)
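A quick numerical check of Corollary 4 (with a hypothetical family, namely all 3-subsets of \(\{1,\ldots ,6\}\); NumPy is assumed) can be done as follows.

```python
from itertools import combinations
import numpy as np

def ln_alpha(x, a):
    return np.log(x) if a == 1 else (x**(1 - a) - 1) / (1 - a)

def h_alpha(q, a):
    """Binary THC entropy (5)."""
    return sum(-t**a * ln_alpha(t, a) for t in (q, 1 - q) if t > 0)

n = 6
family = [frozenset(c) for c in combinations(range(1, n + 1), 3)]     # all 3-subsets of {1,...,6}
q = [sum(j in F for F in family) / len(family) for j in range(1, n + 1)]
for a in (1.0, 1.5, 3.0):
    print(a, ln_alpha(len(family), a) <= sum(h_alpha(qj, a) for qj in q))   # inequality (29) -> True
```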

The result (29) provides an upper estimate for the maximum possible cardinality of a family of subsets. It is an \(\alpha \)-entropy version of the basic lemma proved in [16]. The authors of [16] used tools of information theory to study families of k-subsets satisfying some restrictions. We will further apply (29) to a specific family of k-element subsets of the set \(\{1,\ldots ,n\}\). Suppose that a family \({\mathcal {G}}=\{G_{1},\ldots ,G_{m}\}\) of m such subsets obeys the implication

$$\begin{aligned} \hbox {``}\{i,j\}\ne \{s,t\}\hbox {''}{\,}\Longrightarrow {\,} \hbox {``}G_{i}\cap {G}_{j}\ne {G}_{s}\cap {G}_{t}\hbox {''}. \end{aligned}$$
(31)

That is, no two pairwise intersections of the k-subsets coincide. We aim to estimate the cardinality of this family from above. Let us begin with an auxiliary result.

Lemma 5

For \(\alpha \in [1,3.67]\), the function \(\lambda \mapsto {h}_{\alpha }(\lambda ^{2})/\lambda \) is concave for \(\lambda \in \left[ 0,1/\sqrt{2}\right] \).

Proof

We leave out the case \(\alpha =1\), for which the concavity was reported in [16] for all \(\lambda \in [0,1]\). In the proof, we will use the following generalization of Bernoulli's inequality (see, e.g., Section 2.4 of the book [19]). For \(-1<x\ne 0\), one has

$$\begin{aligned} (1+x)^{r}&>1+rx \qquad \qquad \qquad \qquad (r\notin [0,1]) \ , \end{aligned}$$
(32)
$$\begin{aligned} (1+x)^{r}<1+rx \qquad \qquad \qquad \qquad (0<r<1). \end{aligned}$$
(33)

For \(\alpha >1\), we can write the expression

$$\begin{aligned} (\alpha -1){\>}\frac{h_{\alpha }(\lambda ^{2})}{\lambda }= -\lambda ^{2\alpha -1}+\frac{1-(1-\lambda ^{2})^{\alpha }}{\lambda }. \end{aligned}$$
(34)

The term \(-\lambda ^{2\alpha -1}\) is concave with respect to \(\lambda \) for all \(\alpha >1\). We will show concavity of the second term on the right-hand side of (34). Let us use the second-derivative test. For an arbitrary twice-differentiable function \(\xi \mapsto {f}(\xi )\), one has the general expression

$$\begin{aligned} \frac{{\mathrm {d}}^{2}}{{\mathrm {d}}\lambda ^{2}} {\>}\frac{f(\lambda ^{2})}{\lambda }= \frac{2}{\lambda ^{3}} {\,}\Bigl (2\xi ^{2}f^{\prime \prime }(\xi )-\xi {f}^{\prime }(\xi )+f(\xi )\Bigr ), \end{aligned}$$
(35)

where \(\xi =\lambda ^{2}\). Substituting \(f_{\alpha }(\xi )=1-(1-\xi )^{\alpha }\) finally gives

$$\begin{aligned} 2\xi ^{2}f_{\alpha }^{\prime \prime }(\xi )-\xi {f}_{\alpha }^{\prime }(\xi )+f_{\alpha }(\xi )= 1+(1-\xi )^{\alpha -2} \bigl (-1+c_{1}\xi +c_{2}\xi ^{2} \bigr ). \end{aligned}$$
(36)

The coefficients in (36) are calculated as

$$\begin{aligned} c_{1}=2-\alpha ,\, \qquad c_{2}=-1+3\alpha -2\alpha ^2=\frac{1}{8}-2\left( \alpha -\frac{3}{4}\right) ^{2}. \end{aligned}$$
(37)

We will show that the quantity (36) is not positive for \(\alpha \in [1,3.67]\) and \(\lambda \in \left[ 0,1/\sqrt{2}\right] \). Let us consider separately the cases \(\alpha \in [1,2]\) and \(\alpha \in [2,3.67]\).

Taking the intervals \(\alpha \in [1,2]\) and \(\xi \in [0,1]\), we have

$$\begin{aligned} (1-\xi )^{2-\alpha }\le 1-(2-\alpha )\xi =1-c_{1}\xi \qquad (1\le \alpha \le 2). \end{aligned}$$
(38)

This formula is based on (33) with \(x=-\xi \) and \(r=2-\alpha \). Due to (38), we can rewrite and bound (36) as

$$\begin{aligned} (1-\xi )^{\alpha -2} \Bigl ((1-\xi )^{2-\alpha } -1+c_{1}\xi +c_{2}\xi ^{2} \Bigr ) \le (1-\xi )^{\alpha -2}c_{2}\xi ^{2}\le 0 \ , \end{aligned}$$
(39)

where \(c_{2}\le 0\) for \(\alpha \in [1,2]\) by (37). In this case, the concavity actually holds for all \(\lambda \in [0,1]\).

The case \(\alpha \ge 2\) is more complicated to analyze. Here, we introduce the nonnegative parameter \(\beta =\alpha -2\). The condition that (36) is not positive is then rewritten as

$$\begin{aligned} (1-\xi )^{\beta } \bigl (1+\beta \xi +\gamma \xi ^{2} \bigr )-1=:F_{\beta }(\xi )\ge 0, \end{aligned}$$

where \(\gamma =-c_{2}=3+5\beta +2\beta ^{2}\). This inequality is to be proved for \(\xi =\lambda ^{2}\le 1/2\).

We begin with the case \(\beta \in [0,1]\). Using the polynomial \(p_{\beta }(\xi )=1+\beta \xi +\gamma \xi ^{2}\), we write the derivative

$$\begin{aligned} \frac{{\mathrm {d}}{F}_{\beta }}{{\mathrm {d}}\xi }=(1-\xi )^{\beta -1} \Bigl ((1-\xi )p_{\beta }^{\prime }(\xi )-\beta \,p_{\beta }(\xi ) \Bigr ). \end{aligned}$$

Doing simple calculations, we easily obtain

$$\begin{aligned} (1-\xi )p_{\beta }^{\prime }(\xi )-\beta \,p_{\beta }(\xi )= \xi \Bigl (\bigl (2\gamma -\beta -\beta ^{2}\bigr )-\gamma (2+\beta )\xi \Bigr ). \end{aligned}$$
(40)

As \(\xi \ge 0\), the derivative \({\mathrm {d}}{F}_{\beta }/{\mathrm {d}}\xi \) is not negative, whenever

$$\begin{aligned} \xi \le \frac{2\gamma -\beta -\beta ^{2}}{\gamma (2+\beta )} =\frac{6+9\beta +3\beta ^{2}}{(3+5\beta +2\beta ^{2})(2+\beta )}. \end{aligned}$$
(41)

For \(\beta \in [0,1]\), the right-hand side of (41) monotonically decreases with \(\beta \) from 1 at \(\beta =0\) to 0.6 at \(\beta =1\). Hence the condition (41) is satisfied for all \(\xi \in [0,1/2]\), so the function \(F_{\beta }(\xi )\) does not decrease there. Combining this with \(F_{\beta }(0)=0\), we finally get \(F_{\beta }(\xi )\ge 0\) for all \(\beta \in [0,1]\) and \(\xi \in [0,1/2]\).

For \(\beta \ge 1\), we apply \((1-\xi )^{\beta }\ge 1-\beta \xi \) due to (32). Thus, the quantity of interest obeys

$$\begin{aligned} F_{\beta }(\xi )\ge (1-\beta \xi ) \bigl ( 1+\beta \xi +\gamma \xi ^{2} \bigr )-1 \ge \bigl (\gamma -\beta ^{2}\bigr )\xi ^{2}-\beta \gamma \xi ^{3}. \end{aligned}$$

The latter is not negative, whenever \(\gamma -\beta ^{2}\ge \beta \gamma \xi \). Due to \(\xi \le 1/2\), we can focus on the inequality \(2\bigl (\gamma -\beta ^{2}\bigr )-\beta \gamma \ge 0\), or

$$\begin{aligned} 6+7\beta -3\beta ^{2}-2\beta ^{3}\ge 0. \end{aligned}$$
(42)

Inspecting the roots of the polynomial, we see that the condition (42) holds for all \(\beta \in [0,1.67]\), though we use it only for \(\beta \in [1,1.67]\). This completes the proof for \(\alpha \in [3,3.67]\). \(\square \)

We have shown concavity of the function \(\lambda \mapsto {h}_{\alpha }(\lambda ^{2})/\lambda \) for \(\alpha \in [1,3.67]\) and \(\lambda \in \left[ 0,1/\sqrt{2}\right] \). When \(\alpha \in [1,2]\), the concavity actually holds for \(\lambda \in [0,1]\). We now formulate the desired estimate as follows.
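The concavity stated in Lemma 5 can also be probed numerically; the grid check below (a sketch, not a proof, with an arbitrary grid and tolerance) inspects the second differences of \(\lambda \mapsto {h}_{\alpha }(\lambda ^{2})/\lambda \) over the relevant ranges.

```python
import numpy as np

def ln_alpha(x, a):
    return np.log(x) if a == 1 else (x**(1 - a) - 1) / (1 - a)

def h_alpha(q, a):
    return sum(-t**a * ln_alpha(t, a) for t in (q, 1 - q) if t > 0)

lam = np.linspace(0.01, 1 / np.sqrt(2), 400)
ok = True
for a in np.linspace(1.0, 3.67, 28):
    g = np.array([h_alpha(l * l, a) / l for l in lam])
    ok &= bool(np.all(np.diff(g, n=2) <= 1e-9))       # second differences of a concave function
print(ok)                                             # -> True on this grid
```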

Proposition 6

Let \({\mathcal {G}}=\{G_{1},\ldots ,G_{m}\}\) be a family of m k-subsets of the set \(\{1,\ldots ,n\}\), and let \({\mathcal {G}}\) satisfy the property (31). Let \(\lambda _{j}\) denote the proportion of members of \({\mathcal {G}}\) that contain j. If the precondition

$$\begin{aligned} \lambda _{j}\le \frac{1}{\sqrt{2}} \end{aligned}$$
(43)

holds for all \(j\in \{1,\ldots ,n\}\), then for all \(\alpha \in [1,3.67]\),

$$\begin{aligned} \ln _{\alpha }\left( {\begin{array}{c}m\\ 2\end{array}}\right) \le {k}{\>}\frac{h_{\alpha }(\lambda ^{2})}{\lambda } \ , \qquad \lambda :=\sum _{j=1}^{n}\frac{\lambda _{j}}{k}{\>}\lambda _{j}. \end{aligned}$$
(44)

Proof

Let us consider pairwise intersections of members of \({\mathcal {G}}\). By (31), these intersections are pairwise distinct, so they form a family of \(\left( {\begin{array}{c}m\\ 2\end{array}}\right) \) distinct subsets of \(\{1,\ldots ,n\}\). Each \(j\in \{1,\ldots ,n\}\) appears in a proportion

$$\begin{aligned} q_{j}^{*}=\left( {\begin{array}{c}m\\ 2\end{array}}\right) ^{-1}\left( {\begin{array}{c}\lambda _{j}m\\ 2\end{array}}\right) =\lambda _{j}^{2}-\frac{\lambda _{j}(1-\lambda _{j})}{m-1}. \end{aligned}$$

In this ratio, the denominator gives the number of all pairwise intersections; the numerator is the number of those pairwise intersections that contain j. The binary entropy (5) is concave for \(q\in (0,1)\) and reaches its maximum at the point \(q=1/2\). Hence, it does not decrease on \((0,1/2)\). The precondition (43) then gives

$$\begin{aligned} h_{\alpha }(q_{j}^{*})\le {h}_{\alpha }(\lambda _{j}^{2}). \end{aligned}$$

Combining this with Corollary 4, for \(\alpha \ge 1\) we obtain

$$\begin{aligned} \ln _{\alpha }\left( {\begin{array}{c}m\\ 2\end{array}}\right) \le \sum _{j=1}^{n}h_{\alpha }(\lambda _{j}^{2}) =k\sum _{j=1}^{n}\frac{\lambda _{j}}{k}{\>} \frac{h_{\alpha }(\lambda _{j}^{2})}{\lambda _{j}}. \end{aligned}$$
(45)

We further use \(\sum _{j=1}^{n}\lambda _{j}/k=1\), which holds since each member of \({\mathcal {G}}\) has exactly k elements, and the concavity of the function \(\lambda \mapsto {h}_{\alpha }(\lambda ^{2})/\lambda \). Combining (45) with the Jensen inequality completes the proof. \(\square \)

We have obtained an implicit upper bound on \(m=|{\mathcal {G}}|\) in terms of \(k=|G_{j}|\) and the average proportion \(\lambda \) of sets containing a particular element. Our result is a parametric extension of one of the statements proved in [16]. It also differs in the following two respects. First, the precondition (43) is now imposed. Second, the formula (44) is more explicit in the sense that no unknown asymptotically small terms appear. For a prescribed value of \(\lambda \in \left[ 0,1/\sqrt{2}\right] \), one could optimize the bound with respect to the parameter \(\alpha \). The authors of [16] also consider families of k-sets in which the intersection of no two is contained in a third. Such estimates are connected with one of the questions raised by Erdős.
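For a concrete feel of Proposition 6, the sketch below uses a small hypothetical family of four 3-subsets of \(\{1,\ldots ,6\}\) whose pairwise intersections are distinct; it verifies (31) and (43) by brute force and then compares the two sides of (44).

```python
from itertools import combinations
import numpy as np

def ln_alpha(x, a):
    return np.log(x) if a == 1 else (x**(1 - a) - 1) / (1 - a)

def h_alpha(q, a):
    return sum(-t**a * ln_alpha(t, a) for t in (q, 1 - q) if t > 0)

G = [frozenset(s) for s in ({1, 2, 3}, {3, 4, 5}, {5, 6, 1}, {2, 4, 6})]
inters = [g1 & g2 for g1, g2 in combinations(G, 2)]
assert len(set(inters)) == len(inters)                    # condition (31): all intersections differ

n, k, m = 6, 3, len(G)
lam_j = np.array([sum(j in g for g in G) / m for j in range(1, n + 1)])
assert np.all(lam_j <= 1 / np.sqrt(2))                    # precondition (43)
lam = np.sum(lam_j**2) / k                                # the quantity lambda from (44)
for a in (1.0, 2.0, 3.5):
    lhs = ln_alpha(m * (m - 1) / 2, a)                    # ln_alpha of binomial(m, 2)
    rhs = k * h_alpha(lam**2, a) / lam
    print(a, lhs <= rhs)                                  # the bound (44) -> True
```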

The statement of Proposition 3 admits a further extension. In the case of Shannon entropies, an extension of this kind was proved by Shearer [5]. It is often referred to as the Shearer lemma [12, 15, 21]. Its generalization in terms of the THC entropies is stated as follows.

Proposition 7

Let \(X=(X_{1},\ldots ,X_{n})\) be a random variable taking values in the set \(S=S_{1}\times \cdots \times {S}_{n}\), where each coordinate \(X_{j}\) is a random variable taking values in \(S_{j}\). For a subset I of \(\{1,\ldots ,n\}\), let X(I) denote the random variable \((X_{j})_{j\in {I}}\). Suppose that \({\mathcal {G}}\) is a family of subsets of \(\{1,\ldots ,n\}\) and each \(j\in \{1,\ldots ,n\}\) belongs to at least k members of \({\mathcal {G}}\). For \(\alpha \ge 1\), we then have

$$\begin{aligned} k{\,}H_{\alpha }(X)\le \sum _{G\in {\mathcal {G}}} H_{\alpha }\bigl (X(G)\bigr ). \end{aligned}$$
(46)

Proof

Following [21], we will apply the chain rule. Using (11), for \(\alpha \ge 1\) we have

$$\begin{aligned} H_{\alpha }(X)&=\sum _{j=1}^{n} H_{\alpha }\bigl (X_{j}\bigm |(X_{k}:{\,}k<j)\bigr ), \nonumber \\ H_{\alpha }\bigl (X(G)\bigr )&=\sum _{j\in {G}} H_{\alpha }\bigl (X_{j}\bigm |(X_{k}:{\,}k\in {G},{\,}k<j)\bigr ) \\&\ge \sum _{j\in {G}} H_{\alpha }\bigl (X_{j}\bigm |(X_{k}:{\,}k<j)\bigr ). \nonumber \end{aligned}$$
(47)

The step (47) follows from (15), since the string \((X_{k}:{\,}k<j)\) contains all the elements of \((X_{k}:{\,}k\in {G},{\,}k<j)\). Summing (47) with respect to all \(G\in {\mathcal {G}}\) gives

$$\begin{aligned} \sum _{G\in {\mathcal {G}}} H_{\alpha }\bigl (X(G)\bigr )\ge k\sum _{j=1}^{n} H_{\alpha }\bigl (X_{j}\bigm |(X_{k}:{\,}k<j)\bigr ), \end{aligned}$$
(48)

because each \(j\in \{1,\ldots ,n\}\) belongs to at least k members of \({\mathcal {G}}\) and all terms are nonnegative. By the chain rule, the right-hand side of (48) equals \(k{\,}H_{\alpha }(X)\), which completes the proof. \(\square \)

The statement of Proposition 7 is a THC-entropy extension of the Shearer lemma. A related geometric picture was described in [21]. Interesting geometric applications are also discussed in [1]. An immediate consequence of (46) is stated as follows.

Corollary 8

Let N be a finite set, and let \({\mathcal {F}}\) be a family of subsets of N. Let \({\mathcal {G}}=\{G_{1},\ldots ,G_{m}\}\) be a family of subsets of N such that each element of N appears in at least k members of \({\mathcal {G}}\). For each \(1\le {j}\le {m}\), we define \({\mathcal {F}}_{j}:=\{F\cap {G}_{j}:{\>}F\in {\mathcal {F}}\}\). For \(\alpha \ge 1\), we then have

$$\begin{aligned} k{\,}\ln _{\alpha }|{\mathcal {F}}|\le \sum _{j=1}^{m} \ln _{\alpha }|{\mathcal {F}}_{j}|. \end{aligned}$$
(49)

The claim follows from (46) applied to a random variable X distributed uniformly over the characteristic vectors of members of \({\mathcal {F}}\), since then \(H_{\alpha }(X)=\ln _{\alpha }|{\mathcal {F}}|\) and \(H_{\alpha }\bigl (X(G_{j})\bigr )\le \ln _{\alpha }|{\mathcal {F}}_{j}|\). For \(\alpha =1\), the formula (49) reduces to a result originally proved in [5]. Some applications of the latter were also described in [5]. Of course, applications of this kind can further be considered on the basis of (49). In some cases, a one-parameter family of relations may give a stronger bound. An explicit example of this situation is given by upper bounds on permanents of square (0, 1)-matrices.
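A toy check of Corollary 8 (the instance below, with N of five elements and a covering family of five 3-subsets, is our own choice) can be written as follows.

```python
from itertools import combinations
import numpy as np

def ln_alpha(x, a):
    return np.log(x) if a == 1 else (x**(1 - a) - 1) / (1 - a)

N = range(1, 6)
F = [frozenset(c) for c in combinations(N, 2)]            # all 2-subsets of N
G = [frozenset(s) for s in ({1, 2, 3}, {3, 4, 5}, {5, 1, 2}, {2, 3, 4}, {4, 5, 1})]
k = min(sum(j in g for g in G) for j in N)                # every element is covered k = 3 times
traces = [{f & g for f in F} for g in G]                  # the families F_j = {F ∩ G_j}
for a in (1.0, 2.0, 4.0):
    print(a, k * ln_alpha(len(F), a) <= sum(ln_alpha(len(t), a) for t in traces))   # (49) -> True
```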

4 Upper Bounds on Permanents of (0, 1)-Matrices

In this section, we will derive a one-parameter family of upper bounds on the permanent of a square (0, 1)-matrix. The well-known upper bound on permanents was conjectured by Minc [18] and later proved by Brégman [4]. Brégman's proof is based on the duality theorem of convex programming and properties of doubly stochastic matrices. A short elementary proof of this result was given by Schrijver [26]. Schrijver also mentioned an upper bound for permanents of arbitrary nonnegative matrices. A similar proof with randomization is explained in [1]. Developing an approach with randomization, Radhakrishnan presented an entropy-based proof [20]. Our aim is to study this question using the THC entropies. First, we recall preliminary facts. Let \({\mathsf {A}}=\bigl [\bigl [a(i,j)\bigr ]\bigr ]\) be a nonnegative \(n\times {n}\)-matrix, and let \(S_{n}\) denote the set of all permutations on \(\{1,\ldots ,n\}\). The permanent of \({\mathsf {A}}\) is defined as

$$\begin{aligned} {\mathrm {per}}({\mathsf {A}}):=\sum _{\sigma \in {S}_{n}}\prod _{i=1}^{n} a\bigl (i,\sigma (i)\bigr ). \end{aligned}$$
(50)
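For small matrices, the definition (50) can be evaluated directly by brute force; the sketch below (NumPy, an arbitrary \(3\times 3\) example) does exactly this.

```python
from itertools import permutations
import numpy as np

def permanent(A):
    """Permanent via the defining sum (50); brute force, feasible only for small n."""
    n = A.shape[0]
    return sum(int(np.prod([A[i, s[i]] for i in range(n)])) for s in permutations(range(n)))

A = np.array([[1, 1, 0],
              [0, 1, 1],
              [1, 0, 1]])
print(permanent(A))   # -> 2: exactly two permutations pick a one from every row
```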

We further consider matrices with elements \(a(i,j)\in \{0,1\}\). By \(S\subseteq {S}_{n}\), we mean the set of permutations \(\sigma \) such that \(a\bigl (i,\sigma (i)\bigr )=1\) for all \(i\in \{1,\ldots ,n\}\). It is obvious that \({\mathrm {per}}({\mathsf {A}})=|S|\). It is assumed that the matrix contains no all-zero rows, since otherwise its permanent is certainly zero. We claim the following.

Proposition 9

Let \({\mathsf {A}}\) be an \(n\times {n}\) (0, 1)-matrix with \({\mathrm {per}}({\mathsf {A}})\ne 0\), and let \(r_{i}\ne 0\) be the number of ones in the i-th row \((i=1,\ldots ,n)\). For all \(\alpha \ge 1\), the permanent of \({\mathsf {A}}\) obeys the inequality

$$\begin{aligned} \ln _{\alpha }\bigl ({\mathrm {per}}({\mathsf {A}})\bigr ) \le \sum _{i=1}^{n}\frac{1}{r_{i}}{\,}\sum _{j=1}^{r_{i}}\ln _{\alpha }(j). \end{aligned}$$
(51)

Proof

Let \(\sigma \) be a random permutation chosen uniformly from S. We then have the value \(H_{\alpha }(\sigma )={\ln _{\alpha }}{|S|}\), which coincides with the left-hand side of (51). We will show that, for \(\alpha \ge 1\), the entropy \(H_{\alpha }(\sigma )\) does not exceed the right-hand side of (51). Let us choose a random permutation \(\tau \in {S}_{n}\) uniformly. Using the chain rule (11), for each permutation \(\tau \) we can write

$$\begin{aligned} H_{\alpha }(\sigma )&=H_{\alpha }\Bigl ({\sigma }{\bigl (\tau (1)\bigr )}\Bigr )+ H_{\alpha }\Bigl ({\sigma }{\bigl (\tau (2)\bigr )}\Bigm |{\sigma }{\bigl (\tau (1)\bigr )}\Bigr ) \nonumber \\&\qquad +\cdots +H_{\alpha }\Bigl ({\sigma }{\bigl (\tau (n)\bigr )}\Bigm |{\sigma }{\bigl (\tau (1)\bigr )},\ldots ,{\sigma }{\bigl (\tau (n-1)\bigr )}\Bigr ) \end{aligned}$$
(52)
$$\begin{aligned}&\le {\widetilde{H}}_{\alpha }\Bigl ({\sigma }{\bigl (\tau (1)\bigr )}\Bigr )+ {\widetilde{H}}_{\alpha }\Bigl ({\sigma }{\bigl (\tau (2)\bigr )}\Bigm |{\sigma }{\bigl (\tau (1)\bigr )}\Bigr ) \nonumber \\&\quad \quad +\cdots +{\widetilde{H}}_{\alpha }\Bigl ({\sigma }{\bigl (\tau (n)\bigr )}\Bigm |{\sigma }{\bigl (\tau (1)\bigr )},\ldots ,{\sigma }{\bigl (\tau (n-1)\bigr )}\Bigr ). \end{aligned}$$
(53)

Here, the inequality (53) holds for \(\alpha \ge 1\), since \(H_{\alpha }\le {\widetilde{H}}_{\alpha }\) in this range. To the given permutation \(\tau \) and index \(i\in \{1,\ldots ,n\}\), we assign the integer \(k(\tau ,i)\in \{1,\ldots ,n\}\) such that

$$\begin{aligned} k(\tau ,i):=\tau ^{-1}(i) \ , \qquad {\sigma }{\bigl (\tau (k)\bigr )}=\sigma (i). \end{aligned}$$

Summing (53) over all \(\tau \in {S}_{n}\), we further obtain

$$\begin{aligned} |S_{n}|{\,}H_{\alpha }(\sigma )&\le \sum _{\tau \in {S}_{n}}\left\{ {\widetilde{H}}_{\alpha }\Bigl ({\sigma }{\bigl (\tau (1)\bigr )}\Bigr )+ {\widetilde{H}}_{\alpha }\Bigl ({\sigma }{\bigl (\tau (2)\bigr )}\Bigm |{\sigma }{\bigl (\tau (1)\bigr )}\Bigr ) +\cdots \right. \nonumber \\&\quad \left. \cdots +{\widetilde{H}}_{\alpha }\Bigl ({\sigma }{\bigl (\tau (n)\bigr )}\Bigm |{\sigma }{\bigl (\tau (1)\bigr )},\ldots ,{\sigma }{\bigl (\tau (n-1)\bigr )}\Bigr ) \right\} \nonumber \\&=\sum _{\tau \in {S}_{n}}\sum _{i=1}^{n} {\widetilde{H}}_{\alpha }\Bigl (\sigma (i)\Bigm |{\sigma }{\bigl (\tau (1)\bigr )},\ldots ,{\sigma }{\bigl (\tau (k-1)\bigr )}\Bigr ). \end{aligned}$$
(54)

At the last step, we gather the contributions of different \(\sigma (i)\) separately. For the given \(\sigma \in {S}\), \(\tau \in {S}_{n}\), and \(i\in \{1,\ldots ,n\}\), we define \(R_{i}(\sigma ,\tau )\) to be the set of those column indices that differ from \({\sigma }{\bigl (\tau (1)\bigr )},\ldots ,{\sigma }{\bigl (\tau (k-1)\bigr )}\), with \(k=k(\tau ,i)\), and give 1's in the i-th row [20]. By the definition of \(r_{i}\), we have \(\big |R_{i}(\sigma ,\tau )\big |\le {r}_{i}\). Using (24), we then rewrite (54) as

$$\begin{aligned} |S_{n}|{\,}H_{\alpha }(\sigma )\le \sum _{i=1}^{n}\sum _{\tau \in {S}_{n}} \sum _{j=1}^{r_{i}}{\,}\underset{\sigma }{{\mathrm {Pr}}}\Bigl [\bigl |R_{i}(\sigma ,\tau )\bigr |=j\Bigr ]\ln _{\alpha }(j). \end{aligned}$$

Dividing this relation by \(|S_{n}|\) and taking into account the uniform distribution of \(\tau \in {S}_{n}\), we immediately obtain

$$\begin{aligned} H_{\alpha }(\sigma )\le \sum _{i=1}^{n} \sum _{j=1}^{r_{i}}{\,}\underset{\sigma ,\tau }{{\mathrm {Pr}}}\Bigl [\bigl |R_{i}(\sigma ,\tau )\bigr |=j\Bigr ]\ln _{\alpha }(j). \end{aligned}$$
(55)

We now recall a principal observation of [20] that

$$\begin{aligned} \underset{\sigma ,\tau }{{\mathrm {Pr}}}\Bigl [{\bigl |R_{i}(\sigma ,\tau )\bigr |}=j\Bigr ] =\frac{1}{r_{i}}. \end{aligned}$$

Combining this with (55) completes the proof. \(\square \)

The statement of Proposition 9 gives a one-parameter family of upper bounds on permanents. In the limit \(\alpha \rightarrow 1^{+}\), the relation (51) reduces to the previously known result [1]

$$\begin{aligned} {\mathrm {per}}({\mathsf {A}})\le \prod _{i=1}^{n} (r_{i}!)^{1/r_{i}}. \end{aligned}$$
(56)

It was conjectured in [18] and then proved in several ways [4, 20, 26]. This result can naturally be reformulated as an upper bound on the number of perfect matchings in a bipartite graph [10, 21].
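The bound (51) is easy to test numerically; the sketch below (NumPy, a random \(6\times 6\) (0, 1)-matrix with ones on the diagonal, chosen only to guarantee a nonzero permanent) compares (51) with the exact permanent and also checks the limiting bound (56).

```python
from itertools import permutations
from math import factorial
import numpy as np

def ln_alpha(x, a):
    return np.log(x) if a == 1 else (x**(1 - a) - 1) / (1 - a)

def permanent(A):
    n = A.shape[0]
    return sum(int(np.prod([A[i, s[i]] for i in range(n)])) for s in permutations(range(n)))

rng = np.random.default_rng(3)
A = (rng.random((6, 6)) < 0.6).astype(int)
np.fill_diagonal(A, 1)                                        # guarantees per(A) != 0 and r_i != 0
r = A.sum(axis=1)
per = permanent(A)
for a in (1.0, 1.5, 2.0):
    rhs = sum(sum(ln_alpha(j, a) for j in range(1, int(ri) + 1)) / ri for ri in r)
    print(a, ln_alpha(per, a) <= rhs)                         # the bound (51) -> True
print(per <= np.prod([factorial(int(ri))**(1 / ri) for ri in r]))   # Bregman bound (56) -> True
```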

We now consider the significance of the one-parameter bound (51). It is instructive to consider a concrete example. Let the \(n\times {n}\)-matrix \({\mathsf {A}}\) have elements \(a(1,j)=1\) for all \(j=1,\ldots ,n\) and \(a(i,j)=\delta (i,j)\) for \(i=2,\ldots ,n\). That is, our matrix is obtained from the identity \(n\times {n}\)-matrix by filling its first row with ones. We then have \({\mathrm {per}}({\mathsf {A}})=1\). On the other hand, we have \(r_{1}=n\) and \(r_{2}=\cdots =r_{n}=1\). Let us compare the values of the bounds (51) and (56). It is easy to apply (51) in the case \(\alpha =2\), since \(\ln _{2}(\xi )=1-1/\xi \) due to (3). For \(\alpha =2\), the upper bound (51) gives

$$\begin{aligned} 1-\frac{1}{{\mathrm {per}}({\mathsf {A}})}\le \frac{1}{n}{\,}\sum _{j=1}^{n}\left( 1-\frac{1}{j}\right) =1-\frac{{\mathcal {H}}_{n}}{n}. \end{aligned}$$
(57)

By \({\mathcal {H}}_{n}\), we denote the n-th harmonic number [13]. It is well known that the asymptotic expansion of this number for large n is written as [13]

$$\begin{aligned} {\mathcal {H}}_{n}=\ln {n}+\gamma +O(1/n), \end{aligned}$$

where \(\gamma \) is the Euler–Mascheroni constant. From (57), we immediately obtain

$$\begin{aligned} {\mathrm {per}}({\mathsf {A}})\le \frac{n}{{\mathcal {H}}_{n}}=\frac{n}{\ln {n}}\left\{ 1+O\left( \frac{1}{\ln {n}}\right) \right\} . \end{aligned}$$
(58)

Substituting the same collection of numbers \(r_{i}\) into (56) gives

$$\begin{aligned} {\mathrm {per}}({\mathsf {A}})\le (n!)^{1/n}=\frac{n}{e}\left\{ 1+O\left( \frac{1}{n}\right) \right\} . \end{aligned}$$
(59)

At the last step, we used the Stirling approximation. For sufficiently large n, the upper bound (58) is significantly stronger than (59). On the other hand, both bounds are very far from the actual value of the permanent. Nevertheless, this example shows the relevance of the result (51) proved for \(\alpha \ge 1\).
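The numbers behind this comparison can be reproduced with a few lines (a sketch; the values of n are arbitrary): for the matrix above the exact permanent is 1, while the right-hand sides of (58) and (59) grow roughly as \(n/\ln {n}\) and n / e, respectively.

```python
from math import exp, lgamma

for n in (10, 100, 1000):
    harmonic = sum(1.0 / j for j in range(1, n + 1))          # the harmonic number H_n
    bound_alpha2 = n / harmonic                               # right-hand side of (58)
    bound_bregman = exp(lgamma(n + 1) / n)                    # (n!)^(1/n), right-hand side of (59)
    print(n, round(bound_alpha2, 1), round(bound_bregman, 1)) # the exact permanent equals 1
```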

We can further ask whether such bounds extend to values \(\alpha \in (0,1)\). The corresponding result can be obtained by an immediate extension of Schrijver's proof [26]. We have the following statement.

Proposition 10

Let \({\mathsf {A}}\) be an \(n\times {n}\) (0, 1)-matrix with \({\mathrm {per}}({\mathsf {A}})\ne 0\), and let \(r_{i}\ne 0\) be the number of ones in the i-th row \((i=1,\ldots ,n)\). For \(\alpha \in (0,1)\), the permanent of \({\mathsf {A}}\) obeys the inequality

$$\begin{aligned} -\ln _{\alpha }\left( \frac{1}{{\mathrm {per}}({\mathsf {A}})}\right) \le \sum _{i=1}^{n}\frac{1}{r_{i}}{\,}\sum _{j=1}^{r_{i}}\ln _{\alpha }(j). \end{aligned}$$
(60)

Proof

For convenience, we introduce the function

$$\begin{aligned} g_{\alpha }(\xi ):=\frac{\xi ^{\alpha }-\xi }{\alpha -1} =-\xi {\,}\ln _{\alpha }\left( \frac{1}{\xi }\right) , \end{aligned}$$

where \(\xi >0\) and \(\alpha \in (0,1)\). For these values of \(\alpha \), the function \(\xi \mapsto {g}_{\alpha }(\xi )\) is convex. Due to the Jensen inequality, we have

$$\begin{aligned} g_{\alpha }\left( \frac{1}{r}{\,}\sum _{k=1}^{r} \xi _{k}\right) \le \frac{1}{r}{\,}\sum _{k=1}^{r} g_{\alpha }(\xi _{k}). \end{aligned}$$
(61)

We will prove (60) by induction on n. For \(n=1\), the result is trivial. Suppose that the claim is already proved for \((n-1)\times (n-1)\)-matrices. The permanent of an \(n\times {n}\)-matrix can be decomposed as

$$\begin{aligned} {\mathrm {per}}({\mathsf {A}})=\sum _{\begin{array}{c} k=1 \\ a(i,k)=1 \end{array}}^{n} {\mathrm {per}}\bigl ({\mathsf {A}}(i,k)\bigr ). \end{aligned}$$
(62)

Here, the submatrix \({\mathsf {A}}(i,k)\) is obtained from \({\mathsf {A}}\) by eliminating the i-th row and the k-th column. Combining (61) with (62) gives

$$\begin{aligned} {\mathrm {per}}({\mathsf {A}}){\,}(-1){\,}\ln _{\alpha }\left( \frac{r_{i}}{{\mathrm {per}}({\mathsf {A}})}\right) = r_{i}{\,}g_{\alpha }\left( \frac{{\mathrm {per}}({\mathsf {A}})}{r_{i}}\right) \le \sum _{\begin{array}{c} k=1 \\ a(i,k)=1 \end{array}}^{n} g_{\alpha }\Bigl \{{\mathrm {per}}{\bigl ({\mathsf {A}}(i,k)\bigr )}\Bigr \} {\>}. \end{aligned}$$
(63)

From the definition of the \(\alpha \)-logarithm, we have the identity

$$\begin{aligned} \ln _{\alpha }(r\xi )=\ln _{\alpha }(r)+r^{1-\alpha }\ln _{\alpha }(\xi ). \end{aligned}$$
(64)

Summing (63) with respect to \(i\in \{1,\ldots ,n\}\), we therefore obtain

$$\begin{aligned}&{\mathrm {per}}({\mathsf {A}})\left\{ -\sum _{i=1}^{n}\ln _{\alpha }(r_{i}) -n{\,}\ln _{\alpha }\left( \frac{1}{{\mathrm {per}}({\mathsf {A}})}\right) \right\} \le \sum _{i=1}^{n}r_{i}{\,}g_{\alpha }\left( \frac{{\mathrm {per}}({\mathsf {A}})}{r_{i}}\right) \end{aligned}$$
(65)
$$\begin{aligned}&\le \sum _{i=1}^{n}\sum _{\begin{array}{c} k=1 \\ a(i,k)=1 \end{array}}^{n} {\mathrm {per}}{\bigl ({\mathsf {A}}(i,k)\bigr )}{\,}(-1){\,}\ln _{\alpha }\left( \frac{1}{{\mathrm {per}}\bigl ({\mathsf {A}}(i,k)\bigr )}\right) \end{aligned}$$
(66)
$$\begin{aligned}&=\sum _{\sigma \in {S}}\sum _{i=1}^{n} (-1){\,}\ln _{\alpha }\left( \frac{1}{{\mathrm {per}}\bigl \{{\mathsf {A}}\bigl (i,\sigma (i)\bigr )\bigr \}}\right) . \end{aligned}$$
(67)

To prove (65), we used (64) and the relation \(n\le \sum _{i=1}^{n}r_{i}^{1-\alpha }\), which holds for \(\alpha \in (0,1)\) because every \(r_{i}\ge 1\). To justify (67), we note the following fact. In the double sum (67), the number of terms from any pair (i, k) equals the number of those \(\sigma \in {S}\) for which \(\sigma (i)=k\). The latter number is \({\mathrm {per}}{\bigl ({\mathsf {A}}(i,k)\bigr )}\) for \(a(i,k)=1\), and zero otherwise. We now apply the induction hypothesis to each \({\mathrm {per}}{\bigl \{{\mathsf {A}}{\bigl (i,\sigma (i)\bigr )}\bigr \}}\) in (67). The left-hand side of (65) is no greater than

$$\begin{aligned}&\sum _{\sigma \in {S}}\sum _{i=1}^{n} \left\{ \sum _{\begin{array}{c} \ell \ne {i} \\ a(\ell ,\sigma (i))=0 \end{array}} \frac{1}{r_{\ell }}{\,}\sum _{j=1}^{r_{\ell }}\ln _{\alpha }(j) {\,}+\sum _{\begin{array}{c} \ell \ne {i} \\ a(\ell ,\sigma (i))=1 \end{array}} \frac{1}{r_{\ell }-1}{\,}\sum _{j=1}^{r_{\ell }-1}\ln _{\alpha }(j) \right\} \nonumber \\&=\sum _{\sigma \in {S}}\sum _{\ell =1}^{n} \left\{ \sum _{\begin{array}{c} i\ne \ell \\ a(\ell ,\sigma (i))=0 \end{array}} \frac{1}{r_{\ell }}{\,}\sum _{j=1}^{r_{\ell }}\ln _{\alpha }(j) {\,}+\sum _{\begin{array}{c} i\ne \ell \\ a(\ell ,\sigma (i))=1 \end{array}} \frac{1}{r_{\ell }-1}{\,}\sum _{j=1}^{r_{\ell }-1}\ln _{\alpha }(j) \right\} \end{aligned}$$
(68)
$$\begin{aligned}&=\sum _{\sigma \in {S}}\sum _{\ell =1}^{n} \left\{ \frac{n-r_{\ell }}{r_{\ell }}{\,}\sum _{j=1}^{r_{\ell }}\ln _{\alpha }(j) {\,}+\sum _{j=1}^{r_{\ell }-1}\ln _{\alpha }(j)\right\} . \end{aligned}$$
(69)

In the step (68), we change the order of summation. The step (69) is justified as follows. First, the number of indices i such that \(i\ne \ell \) and \({a}{\bigl (\ell ,\sigma (i)\bigr )}=0\) is equal to \((n-r_{\ell })\). Second, the number of indices i such that \(i\ne \ell \) and \({a}{\bigl (\ell ,\sigma (i)\bigr )}=1\) is equal to \((r_{\ell }-1)\). These observations allow us to compute the sums with respect to i and arrive at (69). Adding the term \({\mathrm {per}}({\mathsf {A}})\sum _{1\le \ell \le {n}}\ln _{\alpha }(r_{\ell })\) to both (65) and (69), we immediately obtain

$$\begin{aligned} {\mathrm {per}}({\mathsf {A}}){\,}(-n){\,}\ln _{\alpha }\left( \frac{1}{{\mathrm {per}}({\mathsf {A}})}\right) \le {\mathrm {per}}({\mathsf {A}})\sum _{\ell =1}^{n}\frac{n}{r_{\ell }}{\,}\sum _{j=1}^{r_{\ell }}\ln _{\alpha }(j). \end{aligned}$$

Dividing the latter relation by \(n{\,}{\mathrm {per}}({\mathsf {A}})\) gives (60) and completes the proof. \(\square \)

In the limit \(\alpha \rightarrow 1^{-}\), the result (60) also reproduces the bound (56). In this regard, it is a proper extension of (51) to the parameter range \(\alpha \in (0,1)\). Together, the bounds (51) and (60) cover all values \(\alpha >0\).
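As a closing numerical check (a sketch with a hypothetical \(4\times 4\) matrix of our own choice), the snippet below evaluates both sides of (60) for \(\alpha \in (0,1)\) and of (51) for \(\alpha >1\) against the exact permanent.

```python
from itertools import permutations
import numpy as np

def ln_alpha(x, a):
    return np.log(x) if a == 1 else (x**(1 - a) - 1) / (1 - a)

def permanent(A):
    n = A.shape[0]
    return sum(int(np.prod([A[i, s[i]] for i in range(n)])) for s in permutations(range(n)))

A = np.array([[1, 1, 0, 0],
              [1, 1, 1, 0],
              [0, 1, 1, 1],
              [0, 0, 1, 1]])
per, r = permanent(A), A.sum(axis=1)
for a in (0.3, 0.7, 1.5):
    rhs = sum(sum(ln_alpha(j, a) for j in range(1, int(ri) + 1)) / ri for ri in r)
    lhs = -ln_alpha(1 / per, a) if a < 1 else ln_alpha(per, a)
    print(a, lhs <= rhs)                                      # (60) for alpha < 1, (51) for alpha > 1
```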