1 Introduction

The concept of entropy is fundamental in both statistical physics and information theory. It plays an important role in applying information-theoretic ideas to combinatorial problems [17]. Many results of this kind were reviewed by Radhakrishnan [21] and Galvin [10]. An entropy approach is often used in studies of graph colorings [11, 12]. Applications of entropy as a combinatorial tool are typically based on the Shannon entropy and its conditional form. Meanwhile, other entropic functions have been found useful in various questions [2]. The Rényi entropy [25] and the Tsallis–Havrda–Charvát (THC) entropy [14, 29] are especially important extensions of the Shannon entropy. In principle, such entropic functions may have combinatorial or computational applications. For instance, both have been used in the global thresholding approach to image processing [27].

The main goal of this study is to develop an entropy-based approach to counting problems using the Tsallis–Havrda–Charvát entropies. The paper is organized as follows. In Sect. 2, we recall properties of the THC entropies and prove a useful statement. In Sect. 3, we obtain THC-entropy versions of some combinatorial results related to the so-called Shearer lemma. In particular, we consider an upper estimate for the maximum possible cardinality of a family of k-subsets of a given set whose members obey certain restrictions. In Sect. 4, we derive a one-parameter family of upper bounds on permanents of square (0, 1)-matrices. This family is an extension of the Brégman theorem. We give an example illustrating the utility of the presented extension.

2 Definitions and Properties of the THC \(\alpha \)-Entropies

In this section, we briefly recall definitions of the Tsallis–Havrda–Charvát entropies and related conditional entropies. Required properties of these entropic functionals are discussed as well. Let the discrete random variable X take values in the finite set \({\varOmega }_{X}\). The non-extensive entropy of strictly positive degree \(\alpha \ne 1\) is defined by [29]

$$\begin{aligned} H_{\alpha }(X):=\frac{1}{1-\alpha }\left( \sum _{{\,}x\in {\varOmega }_{X}} p(x)^{\alpha } - 1\right) . \end{aligned}$$
(1)

With the factor \(\left( 2^{1-\alpha }-1\right) ^{-1}\) instead of \((1-\alpha )^{-1}\), this entropic form was considered by Havrda and Charvát [14]. In non-extensive statistical mechanics, the entropy (1) is known as the Tsallis entropy. It is instructive to rewrite (1) as

$$\begin{aligned} H_{\alpha }(X)=-\sum _{x\in {\varOmega }_{X}}p(x)^{\alpha }{\,}\ln _{\alpha }p(x) =\sum _{x\in {\varOmega }_{X}}p(x){\,}\ln _{\alpha }\left( \frac{1}{p(x)}\right) . \end{aligned}$$
(2)

Assuming \(\xi >0\), the so-called \(\alpha \)-logarithm is defined as

$$\begin{aligned} \ln _{\alpha }(\xi )= \left\{ \begin{array}{ll} \frac{\xi ^{1-\alpha }-1}{1-\alpha }\>, &{} \text { for } \alpha >0,\ \alpha \ne 1, \\ \ln \xi , &{} \text { for } \alpha =1. \end{array}\right. \end{aligned}$$
(3)

In the limit \(\alpha \rightarrow 1\), the entropy (1) gives the standard Shannon entropy

$$\begin{aligned} H_{1}(X)=-\sum _{x\in {\varOmega }_{X}}p(x){\,}\ln {p(x)} \ . \end{aligned}$$
(4)

For all real \(q\in [0,1]\), we write the binary THC entropy

$$\begin{aligned} h_{\alpha }(q):=- q^{\alpha }\ln _{\alpha }(q)-(1-q)^{\alpha }\ln _{\alpha }(1-q). \end{aligned}$$
(5)

For \(q\in (0,1)\), this function is concave and obeys \(h_{\alpha }(q)=h_{\alpha }(1-q)\). The THC entropies inherit several natural properties of the Shannon entropy. The maximal value of (1) is equal to \(\ln _{\alpha }|{\varOmega }_{X}|\) and is reached for the uniform distribution. For \(\alpha \ge 1\), the joint THC entropy of two random variables obeys [9]

$$\begin{aligned} H_{\alpha }(X,Y)\le {H}_{\alpha }(X)+H_{\alpha }(Y). \end{aligned}$$
(6)
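As a minimal numerical illustration of the definitions above (a sketch, not part of the original text; it assumes NumPy and arbitrary randomly drawn distributions), the following snippet implements the \(\alpha \)-logarithm (3) and the entropy (1), and checks the maximal value \(\ln _{\alpha }|{\varOmega }_{X}|\) together with the subadditivity (6).

```python
import numpy as np

def ln_alpha(x, a):
    """alpha-logarithm (3), defined for x > 0 and a > 0."""
    return np.log(x) if a == 1 else (x**(1 - a) - 1) / (1 - a)

def thc_entropy(p, a):
    """THC entropy (1) of a probability vector p."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]
    return -(p @ np.log(p)) if a == 1 else (np.sum(p**a) - 1) / (1 - a)

rng = np.random.default_rng(0)
a = 2.5
p = rng.dirichlet(np.ones(6))                       # a random distribution on 6 points
print(thc_entropy(p, a) <= ln_alpha(6, a))          # maximal value ln_alpha|Omega_X|  -> True

pxy = rng.dirichlet(np.ones(12)).reshape(3, 4)      # a random joint distribution p(x, y)
lhs = thc_entropy(pxy.ravel(), a)                   # H_alpha(X, Y)
rhs = thc_entropy(pxy.sum(axis=1), a) + thc_entropy(pxy.sum(axis=0), a)
print(lhs <= rhs)                                   # subadditivity (6) for alpha >= 1 -> True
```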

In applications of information-theoretic methods, the notion of conditional entropy is widely used [6]. Let us define the functional

$$\begin{aligned} H_{1}(X|y)=-\sum \nolimits _{x}p(x|y){\,}\ln {p}(x|y), \end{aligned}$$

in which the sum is taken over \(x\in {\varOmega }_{X}\). The entropy of X conditional on knowing Y is defined as [6]

$$\begin{aligned} H_{1}(X|Y):=\sum \nolimits _{y} p(y){\,}H_{1}(X|y) =-\sum \nolimits _{x}\sum \nolimits _{y} p(x,y){\,}\ln {p}(x|y), \end{aligned}$$
(7)

where \(p(x|y)=p(x,y)/p(y)\). When the range of summation is clear from the context, we will omit symbols such as \({\varOmega }_{X}\) and \({\varOmega }_{Y}\).

In the literature, two kinds of the conditional THC entropy have been discussed [9]. These forms are respectively inspired by the two expressions given in (2). The first form is defined as [9]

$$\begin{aligned} H_{\alpha }(X|Y):=\sum \nolimits _{y}p(y)^{\alpha }H_{\alpha }(X|y), \end{aligned}$$
(8)

where

$$\begin{aligned} H_{\alpha }(X|y):=\frac{1}{1-\alpha }\left( \sum \nolimits _{x} p(x|y)^{\alpha }-1\right) =-\sum \nolimits _{x} p(x|y)^{\alpha }{\,}\ln _{\alpha }p(x|y) \end{aligned}$$
(9)

and \(\alpha \ne 1\) is strictly positive. The conditional entropy (8) is, up to a factor, the quantity originally introduced by Daróczy [7]. For any \(\alpha >0\), this conditional entropy obeys the chain rule written as [7]

$$\begin{aligned} H_{\alpha }(X,Y)=H_{\alpha }(X|Y)+H_{\alpha }(Y). \end{aligned}$$
(10)

Due to nonnegativity of \(H_{\alpha }(X|Y)\), for all \(\alpha >0\) we also have

$$\begin{aligned} H_{\alpha }(X,Y)\ge {H}_{\alpha }(Y). \end{aligned}$$

The chain rule (10) can be extended to more than two variables. Up to reordering of random variables, this result is expressed as [9]

$$\begin{aligned} H_{\alpha }(X_{1},X_{2},\ldots ,X_{n}) =\sum \nolimits _{j=1}^{n}H_{\alpha }(X_{j}|X_{j-1},\ldots ,X_{1}), \end{aligned}$$
(11)

where \(\alpha >0\). In the case \(\alpha =1\), we obtain the chain rule with the standard conditional entropy (7). This property turns out to be essential in the entropic approach to counting. The second form of the conditional THC entropy is introduced as [9]

$$\begin{aligned} {\widetilde{H}}_{\alpha }(X|Y):=\sum \nolimits _{y}p(y){\,}H_{\alpha }(X|y). \end{aligned}$$
(12)

Although the quantity (12) does not satisfy the chain rule, it has found use in several questions [9, 22]. Its definition is based on a formulation that seems more appropriate in the context of dynamical systems and generalized entropy rates [8, 24, 28]. We also have \({\widetilde{H}}_{\alpha }(X|Y)\le {H}_{\alpha }(X|Y)\) for \(\alpha \in (0,1)\) and \({\widetilde{H}}_{\alpha }(X|Y)\ge {H}_{\alpha }(X|Y)\) for \(\alpha \in (1,\infty )\). For \(\alpha =1\), the \(\alpha \)-entropies (8) and (12) both coincide with (7).
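The following short sketch (an illustration with NumPy and a randomly drawn joint distribution, not taken from the paper) implements both conditional forms (8) and (12) and checks the chain rule (10) as well as the stated ordering between them.

```python
import numpy as np

def thc_entropy(p, a):
    p = np.asarray(p, dtype=float)
    p = p[p > 0]
    return -(p @ np.log(p)) if a == 1 else (np.sum(p**a) - 1) / (1 - a)

def cond_daroczy(pxy, a):
    """Conditional entropy H_alpha(X|Y), definition (8); rows of pxy index x, columns index y."""
    py = pxy.sum(axis=0)
    return sum(py[y]**a * thc_entropy(pxy[:, y] / py[y], a)
               for y in range(pxy.shape[1]) if py[y] > 0)

def cond_tilde(pxy, a):
    """Conditional entropy tilde-H_alpha(X|Y), definition (12)."""
    py = pxy.sum(axis=0)
    return sum(py[y] * thc_entropy(pxy[:, y] / py[y], a)
               for y in range(pxy.shape[1]) if py[y] > 0)

rng = np.random.default_rng(1)
pxy = rng.dirichlet(np.ones(20)).reshape(4, 5)      # a random joint distribution of (X, Y)
for a in (0.5, 1.0, 2.0, 3.0):
    chain = np.isclose(thc_entropy(pxy.ravel(), a),
                       cond_daroczy(pxy, a) + thc_entropy(pxy.sum(axis=0), a))
    order = (cond_tilde(pxy, a) <= cond_daroczy(pxy, a) if a < 1
             else cond_tilde(pxy, a) >= cond_daroczy(pxy, a))
    print(a, bool(chain), order)                    # the chain rule (10) and the stated ordering
```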

When the entropic approach is applied to counting, several properties of the conditional entropy are required. One of these properties is the chain rule. The standard conditional entropy also satisfies

$$\begin{aligned} H_{1}(X|Y_{1},\ldots ,Y_{n-1},Y_{n}) \le {H}_{1}(X|Y_{1},\ldots ,Y_{n-1}). \end{aligned}$$
(13)

Thus, conditioning on more variables can only reduce the conditional entropy. This relation is sometimes required in counting [21]. Another very useful property of the standard conditional entropy is formulated as follows. Let \(Y\mapsto {f}(Y)\) be a function whose domain contains the support of the random variable Y. Then we have [21]

$$\begin{aligned} H_{1}\bigl (X\big |f(Y)\bigr )\ge {H}_{1}(X|Y). \end{aligned}$$
(14)

We shall now establish analogous properties for the conditional \(\alpha \)-entropies.

Proposition 1

Let X and \(Y_{1},\ldots ,Y_{n}\) be discrete random variables, where \(n\ge 1\). For \(\alpha \ge 1\), the conditional entropy (8) satisfies

$$\begin{aligned} H_{\alpha }(X|Y_{1},\ldots ,Y_{n-1},Y_{n}) \le {H}_{\alpha }(X|Y_{1},\ldots ,Y_{n-1}). \end{aligned}$$
(15)

For \(\alpha >0\), the conditional entropy (12) satisfies

$$\begin{aligned} {\widetilde{H}}_{\alpha }(X|Y_{1},\ldots ,Y_{n-1},Y_{n}) \le {\widetilde{H}}_{\alpha }(X|Y_{1},\ldots ,Y_{n-1}). \end{aligned}$$
(16)

Let \(Y\mapsto {f}(Y)\) be a function of random variable Y. For \(\alpha \ge 1\), the conditional entropy (8) satisfies

$$\begin{aligned} H_{\alpha }\bigl (X\big |f(Y)\bigr )\ge {H}_{\alpha }(X|Y). \end{aligned}$$
(17)

For \(\alpha >0\), the conditional entropy (12) satisfies

$$\begin{aligned} {\widetilde{H}}_{\alpha }\bigl (X\big |f(Y)\bigr )\ge {\widetilde{H}}_{\alpha }(X|Y). \end{aligned}$$
(18)

Proof

The results (15) and (16) were proved in [9, 23] and [24], respectively. Let us proceed to (17) and (18). Since the standard case is known, we assume \(\alpha \ne 1\). To each value u of the function, we assign the subset \(\omega _{u}\subseteq {\varOmega }_{Y}\) such that

$$\begin{aligned} \omega _{u}:=\bigl \{y:{\>}y\in {\varOmega }_{Y},{\>}f(y)=u\bigr \}. \end{aligned}$$

Then the probabilities are written as

$$\begin{aligned} p(x,u)=\sum _{y\in \omega _{u}}p(x,y),\, \qquad p(u)=\sum _{y\in \omega _{u}}p(y). \end{aligned}$$
(19)

The left-hand side of (17) is represented as

$$\begin{aligned} H_{\alpha }\bigl (X\big |f(Y)\bigr ) =\sum _{u\in {\varOmega }_{f(Y)}} p(u)^{\alpha }H_{\alpha }(X|u). \end{aligned}$$
(20)

Replacing \(p(u)^{\alpha }\) with p(u), we obtain the expression for \({\widetilde{H}}_{\alpha }\bigl (X\big |f(Y)\bigr )\). For strictly positive \(\alpha \ne 1\) and \(\xi \ge 0\), we introduce the function

$$\begin{aligned} \eta _{\alpha }(\xi )=\frac{\xi ^{\alpha }-\xi }{1-\alpha }. \end{aligned}$$

In terms of this function, we now write

$$\begin{aligned} p(u)^{\alpha }H_{\alpha }(X|u)= \sum _{x\in {\varOmega }_{X}}p(u)^{\alpha }{\,}\eta _{\alpha }\bigl (p(x|u)\bigr ) \>. \end{aligned}$$

As \(\eta _{\alpha }^{\prime \prime }(\xi )\le 0\) for the considered values of \(\alpha \), the function \(\xi \mapsto \eta _{\alpha }(\xi )\) is concave. For fixed x and u, we set the numbers \(\lambda _{y}=p(y)/p(u)\) and \(\xi _{y}=p(x,y)/p(y)=p(x|y)\), so that

$$\begin{aligned} \sum _{y\in \omega _{u}}\lambda _{y}=\frac{p(u)}{p(u)}=1 \ , \qquad \sum _{y\in \omega _{u}}\lambda _{y}{\,}\xi _{y}=\frac{p(x,u)}{p(u)}=p(x|u), \end{aligned}$$

according to (19). By Jensen’s inequality, we then obtain

$$\begin{aligned}&\displaystyle p(u){\>}\eta _{\alpha }\bigl (p(x|u)\bigr )\ge p(u)\sum _{y\in \omega _{u}}\lambda _{y}{\,}\eta _{\alpha }(\xi _{y}) =\sum _{y\in \omega _{u}}p(y){\>}\eta _{\alpha }\bigl (p(x|y)\bigr ) \>. \end{aligned}$$
(21)
$$\begin{aligned}&\displaystyle p(u)^{\alpha }{\>}\eta _{\alpha }\bigl (p(x|u)\bigr )\ge p(u)^{\alpha }\sum _{y\in \omega _{u}}\lambda _{y}{\,}\eta _{\alpha }(\xi _{y}) =\sum _{y\in \omega _{u}}p(u)^{\alpha -1}p(y){\>}\eta _{\alpha }\bigl (p(x|y)\bigr ) \>. \qquad \end{aligned}$$
(22)

Summing (21) with respect to \(x\in {\varOmega }_{X}\), for all the considered values of \(\alpha \) one gets

$$\begin{aligned} p(u){\,}H_{\alpha }(X|u)\ge \sum _{y\in \omega _{u}}p(y){\,}H_{\alpha }(X|y). \end{aligned}$$

The latter leads to (18) after summing with respect to \(u\in {\varOmega }_{f(Y)}\). For all \(y\in \omega _{u}\) and \(\alpha >1\), we have \(p(u)\ge {p}(y)\) and \(p(u)^{\alpha -1}p(y)\ge {p}(y)^{\alpha }\). Combining this with (22) and summing with respect to \(x\in {\varOmega }_{X}\), we obtain

$$\begin{aligned} p(u)^{\alpha }H_{\alpha }(X|u)\ge \sum _{y\in \omega _{u}}p(y)^{\alpha }H_{\alpha }(X|y). \end{aligned}$$

Summing this with respect to \(u\in {\varOmega }_{f(Y)}\) completes the proof of (17). \(\square \)

Note that the standard case \(\alpha =1\) of (17) and (18) can be proved by repeating the above reasoning with the concave function \(\xi \mapsto -\xi \ln \xi \). In the mentioned parameter ranges, the conditional THC entropies (8) and (12) therefore do not increase under conditioning on more. The result (15) has made it possible to derive a one-parameter extension of the entropic Bell inequalities originally given in [3]. Using (14), one can deduce a property useful in the entropic approach to Brégman's theorem [20, 21]. We shall now formulate a similar statement for the \(\alpha \)-entropies.
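A small numerical sanity check of Proposition 1 may be helpful; the sketch below (NumPy, a random joint distribution of \((X,Y_{1},Y_{2})\), and a hypothetical merging function f are our own choices) verifies (15)-(18) on one instance.

```python
import numpy as np

def thc_entropy(p, a):
    p = np.asarray(p, dtype=float)
    p = p[p > 0]
    return -(p @ np.log(p)) if a == 1 else (np.sum(p**a) - 1) / (1 - a)

def cond(pxy, a, tilde=False):
    """H_alpha(X|Y) as in (8), or the tilde-version (12); pxy has shape (|X|, |Y|)."""
    py = pxy.sum(axis=0)
    w = py if tilde else py**a
    return sum(w[y] * thc_entropy(pxy[:, y] / py[y], a)
               for y in range(pxy.shape[1]) if py[y] > 0)

rng = np.random.default_rng(2)
p = rng.dirichlet(np.ones(3 * 4 * 2)).reshape(3, 4, 2)    # joint distribution of (X, Y1, Y2)
p_x_y1y2 = p.reshape(3, 8)                                # treat (Y1, Y2) as a single variable
p_x_y1 = p.sum(axis=2)                                    # marginalize over Y2
f = np.array([0, 0, 1, 1])                                # hypothetical f(Y1): merge {0,1} and {2,3}
p_x_fy1 = np.stack([p_x_y1[:, f == u].sum(axis=1) for u in (0, 1)], axis=1)

for a in (0.5, 1.0, 2.0):
    for tilde in (False, True):
        if not tilde and a < 1:
            continue                                      # (15) and (17) are stated for alpha >= 1
        print(a, tilde,
              cond(p_x_y1y2, a, tilde) <= cond(p_x_y1, a, tilde),   # (15) / (16)
              cond(p_x_fy1, a, tilde) >= cond(p_x_y1, a, tilde))    # (17) / (18)
```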

Corollary 2

Let the support \({\varOmega }_{Y}\) of random variable Y be partitioned into m mutually disjoint sets \(\omega _{j}\) as

$$\begin{aligned} {\varOmega }_{Y}=\bigcup _{j=1}^{m}\omega _{j}. \end{aligned}$$

Let \(\varpi _{j}\subseteq {\varOmega }_{X}\) be defined as

$$\begin{aligned} \varpi _{j}:=\bigl \{x:{\>}x\in {\varOmega }_{X},{\>}y\in \omega _{j},{\>}p(x|y)\ne 0\bigr \} \>. \end{aligned}$$

If \(\varpi _{j}\ne \varpi _{k}\) for all \(j\ne {k}\), then

$$\begin{aligned} H_{\alpha }(X|Y)&\le \sum _{j=1}^{m} {\mathrm {Pr}}[Y\in \omega _{j}]^{\alpha }{\>}\ln _{\alpha }|\varpi _{j}|&(1\le \alpha <\infty )\ , \end{aligned}$$
(23)
$$\begin{aligned} {\widetilde{H}}_{\alpha }(X|Y)&\le \sum _{j=1}^{m} {\mathrm {Pr}}[Y\in \omega _{j}]{\>}\ln _{\alpha }|\varpi _{j}|&(0<\alpha <\infty ). \end{aligned}$$
(24)

Proof

Let us take the function \(Y\mapsto {f}_{\omega }(Y)\) such that \(f_{\omega }(y)=\varpi _{j}\) for each \(y\in \omega _{j}\). It then follows from (17) and (18) that

$$\begin{aligned}&H_{\alpha }(X|Y)\le {H}_{\alpha }\bigl (X\big |f_{\omega }(Y)\bigr )= \sum _{j=1}^{m} {\mathrm {Pr}}[Y\in \omega _{j}]^{\alpha } H_{\alpha }(X|\varpi _{j})&(1\le \alpha <\infty ), \end{aligned}$$
(25)
$$\begin{aligned}&{\widetilde{H}}_{\alpha }(X|Y)\le {\widetilde{H}}_{\alpha }\bigl (X\big |f_{\omega }(Y)\bigr )= \sum _{j=1}^{m} {\mathrm {Pr}}[Y\in \omega _{j}]{\,}H_{\alpha }(X|\varpi _{j})&(0<\alpha <\infty ). \end{aligned}$$
(26)

The quantity \(H_{\alpha }(X|\varpi _{j})\) is represented as the sum

$$\begin{aligned} H_{\alpha }(X|\varpi _{j})=\sum _{x\in \varpi _{j}}\eta _{\alpha }\bigl (p(x|\omega _{j})\bigr ) \>. \end{aligned}$$
(27)

The sum of \(p(x|\omega _{j})\) over \(x\in \varpi _{j}\) is equal to 1, whence the term (27) does not exceed \(\ln _{\alpha }|\varpi _{j}|\). Combining this fact with (25) and (26) completes the proof. \(\square \)

Using Corollary 2, we will obtain upper bounds on conditional \(\alpha \)-entropies in some combinatorial problems. To do so, we have to estimate not only cardinalities \(|\varpi _{j}|\), but also probabilities \({\mathrm {Pr}}[Y\in \omega _{j}]\). From this viewpoint, the inequality (24) seems to be more appropriate.
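As an illustration of Corollary 2 (a toy setup of our own, not from the paper), the sketch below partitions the support of Y into two blocks with distinct sets \(\varpi _{j}\) and evaluates the bounds (23) and (24) numerically.

```python
import numpy as np

def thc_entropy(p, a):
    p = np.asarray(p, dtype=float)
    p = p[p > 0]
    return -(p @ np.log(p)) if a == 1 else (np.sum(p**a) - 1) / (1 - a)

def ln_alpha(x, a):
    return np.log(x) if a == 1 else (x**(1 - a) - 1) / (1 - a)

# A hand-made joint distribution on Omega_X = Omega_Y = {0,1,2,3}: for y in {0,1} the conditional
# p(x|y) is supported on {0,1}, for y in {2,3} it is supported on {1,2,3}, so varpi_1 != varpi_2.
pxy = np.array([[0.10, 0.12, 0.00, 0.00],
                [0.15, 0.08, 0.05, 0.10],
                [0.00, 0.00, 0.10, 0.12],
                [0.00, 0.00, 0.08, 0.10]])
blocks = [[0, 1], [2, 3]]
py = pxy.sum(axis=0)
sizes = [np.count_nonzero(pxy[:, om].sum(axis=1)) for om in blocks]   # cardinalities |varpi_j|

for a in (0.5, 1.0, 2.0):
    H_tilde = sum(py[y] * thc_entropy(pxy[:, y] / py[y], a) for y in range(4))
    bound_tilde = sum(py[om].sum() * ln_alpha(s, a) for om, s in zip(blocks, sizes))
    checks = [H_tilde <= bound_tilde]                                  # (24), any alpha > 0
    if a >= 1:
        H = sum(py[y]**a * thc_entropy(pxy[:, y] / py[y], a) for y in range(4))
        bound = sum(py[om].sum()**a * ln_alpha(s, a) for om, s in zip(blocks, sizes))
        checks.append(H <= bound)                                      # (23), alpha >= 1
    print(a, checks)
```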

3 Shearer’s Lemma and Intersections of k-Element Sets

In this section, we will examine some questions connected with the Shearer lemma [5]. The properties of the THC entropies lead to a number of inequalities with interesting combinatorial applications. We first note the following.

Proposition 3

Let \(X=(X_{1},\ldots ,X_{n})\) be a random variable taking values in the set \(S=S_{1}\times \cdots \times {S}_{n}\), where each coordinate \(X_{j}\) is a random variable taking values in \(S_{j}\). For all \(\alpha \ge 1\), we have

$$\begin{aligned} H_{\alpha }(X)\le \sum _{j=1}^{n} H_{\alpha }(X_{j}). \end{aligned}$$
(28)

Proof

The claim (28) immediately follows by induction from (6). \(\square \)

The result (28) is a straightforward extension of Proposition 15.7.2 of the book [1]. From it, we can obtain several corollaries. The first of them is stated as follows.

Corollary 4

Let \({\mathcal {F}}\) be a family of subsets of the set \(\{1,\ldots ,n\}\), and let \(q_{j}\) denote the fraction of members of \({\mathcal {F}}\) that contain j. For all \(\alpha \ge 1\), we have

$$\begin{aligned} \ln _{\alpha }|{\mathcal {F}}|\le \sum _{j=1}^{n} h_{\alpha }(q_{j}). \end{aligned}$$
(29)

Proof

To each set \(F\in {\mathcal {F}}\), we assign its characteristic vector v(F), which is a binary n-tuple. Let \(X=(X_{1},\ldots ,X_{n})\) be the random variable taking values in \(\{0,1\}^{n}\) such that

$$\begin{aligned} {\mathrm {Pr}}\bigl [X=v(F)\bigr ]=|{\mathcal {F}}|^{-1} \qquad \forall {\,}F\in {\mathcal {F}}\ , \end{aligned}$$
(30)

whence \(H_{\alpha }(X)=\ln _{\alpha }|{\mathcal {F}}|\). The random variable \(X_{j}\) takes values in \(\{0,1\}\) and equals the j-th entry of the characteristic vector. By the definition of \(q_{j}\), the entropy of \(X_{j}\) is equal to \(h_{\alpha }(q_{j})\). Combining this with (28) completes the proof. \(\square \)
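A quick numerical check of Corollary 4 (with a hypothetical family, namely all 3-subsets of \(\{1,\ldots ,6\}\); NumPy is assumed) can be done as follows.

```python
from itertools import combinations
import numpy as np

def ln_alpha(x, a):
    return np.log(x) if a == 1 else (x**(1 - a) - 1) / (1 - a)

def h_alpha(q, a):
    """Binary THC entropy (5)."""
    return sum(-t**a * ln_alpha(t, a) for t in (q, 1 - q) if t > 0)

n = 6
family = [frozenset(c) for c in combinations(range(1, n + 1), 3)]     # all 3-subsets of {1,...,6}
q = [sum(j in F for F in family) / len(family) for j in range(1, n + 1)]
for a in (1.0, 1.5, 3.0):
    print(a, ln_alpha(len(family), a) <= sum(h_alpha(qj, a) for qj in q))   # inequality (29) -> True
```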

The result (29) provides an upper estimate for the maximum possible cardinality of a family of subsets. It is an \(\alpha \)-entropy version of the basic lemma proved in [16]. The authors of [16] used tools of information theory to study families of k-subsets satisfying some restrictions. We will further apply (29) to a specific family of k-element subsets of the set \(\{1,\ldots ,n\}\). Suppose that a family \({\mathcal {G}}=\{G_{1},\ldots ,G_{m}\}\) of m such subsets obeys the implication

$$\begin{aligned} \hbox {``}\{i,j\}\ne \{s,t\}\hbox {''}{\,}\Longrightarrow {\,} \hbox {``}G_{i}\cap {G}_{j}\ne {G}_{s}\cap {G}_{t}\hbox {''}. \end{aligned}$$
(31)

That is, no two pairwise intersections of the k-subsets coincide. We aim to estimate the cardinality of this family from above. Let us begin with an auxiliary result.

Lemma 5

For \(\alpha \in [1,3.67]\), the function \(\lambda \mapsto {h}_{\alpha }(\lambda ^{2})/\lambda \) is concave for \(\lambda \in \left[ 0,1/\sqrt{2}\right] \).

Proof

We leave out the case \(\alpha =1\), for which the concavity was reported in [16] for all \(\lambda \in [0,1]\). In the proof, we will use the following generalization of Bernoulli's inequality (see, e.g., Section 2.4 of the book [19]). For \(-1<x\ne 0\), one has

$$\begin{aligned} (1+x)^{r}&>1+rx \qquad \qquad \qquad \qquad (r\notin [0,1]) \ , \end{aligned}$$
(32)
$$\begin{aligned} (1+x)^{r}<1+rx \qquad \qquad \qquad \qquad (0<r<1). \end{aligned}$$
(33)

For \(\alpha >1\), we can write the expression

$$\begin{aligned} (\alpha -1){\>}\frac{h_{\alpha }(\lambda ^{2})}{\lambda }= -\lambda ^{2\alpha -1}+\frac{1-(1-\lambda ^{2})^{\alpha }}{\lambda }. \end{aligned}$$
(34)

The term \(-\lambda ^{2\alpha -1}\) is concave with respect to \(\lambda \) for all \(\alpha >1\). We will show concavity of the second term on the right-hand side of (34). Let us use the second-derivative test. For an arbitrary twice-differentiable function \(\xi \mapsto {f}(\xi )\), one has the general expression

$$\begin{aligned} \frac{{\mathrm {d}}^{2}}{{\mathrm {d}}\lambda ^{2}} {\>}\frac{f(\lambda ^{2})}{\lambda }= \frac{2}{\lambda ^{3}} {\,}\Bigl (2\xi ^{2}f^{\prime \prime }(\xi )-\xi {f}^{\prime }(\xi )+f(\xi )\Bigr ), \end{aligned}$$
(35)

where \(\xi =\lambda ^{2}\). Substituting \(f_{\alpha }(\xi )=1-(1-\xi )^{\alpha }\) finally gives

$$\begin{aligned} 2\xi ^{2}f_{\alpha }^{\prime \prime }(\xi )-\xi {f}_{\alpha }^{\prime }(\xi )+f_{\alpha }(\xi )= 1+(1-\xi )^{\alpha -2} \bigl (-1+c_{1}\xi +c_{2}\xi ^{2} \bigr ). \end{aligned}$$
(36)

The coefficients in (36) are calculated as

$$\begin{aligned} c_{1}=2-\alpha ,\, \qquad c_{2}=-1+3\alpha -2\alpha ^2=\frac{1}{8}-2\left( \alpha -\frac{3}{4}\right) ^{2}. \end{aligned}$$
(37)

We will show that the quantity (36) is not positive for \(\alpha \in [1,3.67]\) and \(\lambda \in \left[ 0,1/\sqrt{2}\right] \). Let us consider separately the cases \(\alpha \in [1,2]\) and \(\alpha \in [2,3.67]\).

Taking the intervals \(\alpha \in [1,2]\) and \(\xi \in [0,1]\), we have

$$\begin{aligned} (1-\xi )^{2-\alpha }\le 1-(2-\alpha )\xi =1-c_{1}\xi \qquad (1\le \alpha \le 2). \end{aligned}$$
(38)

This formula is based on (33) with \(x=-\xi \) and \(r=2-\alpha \). Due to (38), we can rewrite and bound (36) as

$$\begin{aligned} (1-\xi )^{\alpha -2} \Bigl ((1-\xi )^{2-\alpha } -1+c_{1}\xi +c_{2}\xi ^{2} \Bigr ) \le (1-\xi )^{\alpha -2}c_{2}\xi ^{2}\le 0 \ , \end{aligned}$$
(39)

where \(c_{2}\le 0\) for \(\alpha \in [1,2]\) by (37). In this case, the concavity actually holds for all \(\lambda \in [0,1]\).

The case \(\alpha \ge 2\) is more complicated to analyze. Here, we introduce the nonnegative parameter \(\beta =\alpha -2\). The condition that (36) is not positive is then rewritten as

$$\begin{aligned} (1-\xi )^{\beta } \bigl (1+\beta \xi +\gamma \xi ^{2} \bigr )-1=:F_{\beta }(\xi )\ge 0, \end{aligned}$$

where \(\gamma =-c_{2}=3+5\beta +2\beta ^{2}\). This inequality is to be proved for \(\xi =\lambda ^{2}\le 1/2\).

We begin with the case \(\beta \in [0,1]\). Using the polynomial \(p_{\beta }(\xi )=1+\beta \xi +\gamma \xi ^{2}\), we write the derivative

$$\begin{aligned} \frac{{\mathrm {d}}{F}_{\beta }}{{\mathrm {d}}\xi }=(1-\xi )^{\beta -1} \Bigl ((1-\xi )p_{\beta }^{\prime }(\xi )-\beta \,p_{\beta }(\xi ) \Bigr ). \end{aligned}$$

Doing simple calculations, we easily obtain

$$\begin{aligned} (1-\xi )p_{\beta }^{\prime }(\xi )-\beta \,p_{\beta }(\xi )= \xi \Bigl (\bigl (2\gamma -\beta -\beta ^{2}\bigr )-\gamma (2+\beta )\xi \Bigr ). \end{aligned}$$
(40)

As \(\xi \ge 0\), the derivative \({\mathrm {d}}{F}_{\beta }/{\mathrm {d}}\xi \) is not negative, whenever

$$\begin{aligned} \xi \le \frac{2\gamma -\beta -\beta ^{2}}{\gamma (2+\beta )} =\frac{6+9\beta +3\beta ^{2}}{(3+5\beta +2\beta ^{2})(2+\beta )}. \end{aligned}$$
(41)

For \(\beta \in [0,1]\), the right-hand side of (41) monotonically decreases with \(\beta \) from 1 at \(\beta =0\) to 0.6 at \(\beta =1\). Hence the condition (41) is satisfied for all \(\xi \in [0,1/2]\), so the function \(F_{\beta }(\xi )\) does not decrease there. Combining this with \(F_{\beta }(0)=0\), we finally get \(F_{\beta }(\xi )\ge 0\) for all \(\beta \in [0,1]\) and \(\xi \in [0,1/2]\).

For \(\beta \ge 1\), we apply \((1-\xi )^{\beta }\ge 1-\beta \xi \) due to (32). Thus, the quantity of interest obeys

$$\begin{aligned} F_{\beta }(\xi )\ge (1-\beta \xi ) \bigl ( 1+\beta \xi +\gamma \xi ^{2} \bigr )-1 \ge \bigl (\gamma -\beta ^{2}\bigr )\xi ^{2}-\beta \gamma \xi ^{3}. \end{aligned}$$

The latter is not negative, whenever \(\gamma -\beta ^{2}\ge \beta \gamma \xi \). Due to \(\xi \le 1/2\), we can focus on the inequality \(2\bigl (\gamma -\beta ^{2}\bigr )-\beta \gamma \ge 0\), or

$$\begin{aligned} 6+7\beta -3\beta ^{2}-2\beta ^{3}\ge 0. \end{aligned}$$
(42)

Inspecting the roots of the polynomial, we see that the condition (42) holds for all \(\beta \in [0,1.67]\), though we use it only for \(\beta \in [1,1.67]\). This completes the proof for \(\alpha \in [3,3.67]\). \(\square \)

We have shown concavity of the function \(\lambda \mapsto {h}_{\alpha }(\lambda ^{2})/\lambda \) for \(\alpha \in [1,3.67]\) and \(\lambda \in \left[ 0,1/\sqrt{2}\right] \). When \(\alpha \in [1,2]\), the concavity actually holds for \(\lambda \in [0,1]\). We now formulate the desired estimate as follows.
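The concavity stated in Lemma 5 can also be probed numerically; the grid check below (a sketch, not a proof, with an arbitrary grid and tolerance) inspects the second differences of \(\lambda \mapsto {h}_{\alpha }(\lambda ^{2})/\lambda \) over the relevant ranges.

```python
import numpy as np

def ln_alpha(x, a):
    return np.log(x) if a == 1 else (x**(1 - a) - 1) / (1 - a)

def h_alpha(q, a):
    return sum(-t**a * ln_alpha(t, a) for t in (q, 1 - q) if t > 0)

lam = np.linspace(0.01, 1 / np.sqrt(2), 400)
ok = True
for a in np.linspace(1.0, 3.67, 28):
    g = np.array([h_alpha(l * l, a) / l for l in lam])
    ok &= bool(np.all(np.diff(g, n=2) <= 1e-9))       # second differences of a concave function
print(ok)                                             # -> True on this grid
```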

Proposition 6

Let \({\mathcal {G}}=\{G_{1},\ldots ,G_{m}\}\) be a family of m k-subsets of the set \(\{1,\ldots ,n\}\), and let \({\mathcal {G}}\) satisfy the property (31). Let \(\lambda _{j}\) denote the proportion of members of \({\mathcal {G}}\) that contain j. If the precondition

$$\begin{aligned} \lambda _{j}\le \frac{1}{\sqrt{2}} \end{aligned}$$
(43)

holds for all \(j\in \{1,\ldots ,n\}\), then for all \(\alpha \in [1,3.67]\),

$$\begin{aligned} \ln _{\alpha }\left( {\begin{array}{c}m\\ 2\end{array}}\right) \le {k}{\>}\frac{h_{\alpha }(\lambda ^{2})}{\lambda } \ , \qquad \lambda :=\sum _{j=1}^{n}\frac{\lambda _{j}}{k}{\>}\lambda _{j}. \end{aligned}$$
(44)

Proof

Let us consider pairwise intersections of members of \({\mathcal {G}}\). By (31), these intersections are pairwise distinct, so they form a family of \(\left( {\begin{array}{c}m\\ 2\end{array}}\right) \) distinct subsets of \(\{1,\ldots ,n\}\). Each \(j\in \{1,\ldots ,n\}\) appears in a proportion

$$\begin{aligned} q_{j}^{*}=\left( {\begin{array}{c}m\\ 2\end{array}}\right) ^{-1}\left( {\begin{array}{c}\lambda _{j}m\\ 2\end{array}}\right) =\lambda _{j}^{2}-\frac{\lambda _{j}(1-\lambda _{j})}{m-1}. \end{aligned}$$

In this ratio, the denominator gives the number of all pairwise intersections; the numerator is the number of those pairwise intersections that contain j. The binary entropy (5) is concave for \(q\in (0,1)\) and reaches its maximum at the point \(q=1/2\). Hence, it does not decrease on \((0,1/2)\). The precondition (43) then gives

$$\begin{aligned} h_{\alpha }(q_{j}^{*})\le {h}_{\alpha }(\lambda _{j}^{2}). \end{aligned}$$

Combining this with Corollary 4, for \(\alpha \ge 1\) we obtain

$$\begin{aligned} \ln _{\alpha }\left( {\begin{array}{c}m\\ 2\end{array}}\right) \le \sum _{j=1}^{n}h_{\alpha }(\lambda _{j}^{2}) =k\sum _{j=1}^{n}\frac{\lambda _{j}}{k}{\>} \frac{h_{\alpha }(\lambda _{j}^{2})}{\lambda _{j}}. \end{aligned}$$
(45)

We further use \(\sum _{j=1}^{n}\lambda _{j}/k=1\), which holds since each member of \({\mathcal {G}}\) has exactly k elements, and the concavity of the function \(\lambda \mapsto {h}_{\alpha }(\lambda ^{2})/\lambda \). Combining (45) with the Jensen inequality completes the proof. \(\square \)

We have obtained an implicit upper bound on \(m=|{\mathcal {G}}|\) in terms of \(k=|G_{j}|\) and the average proportion \(\lambda \) of sets containing a particular element. Our result is a parametric extension of one of the statements proved in [16]. It also differs in the following two respects. First, the precondition (43) is now imposed. Second, the formula (44) is more explicit in the sense that no unknown asymptotically small terms appear. For a prescribed value of \(\lambda \in \left[ 0,1/\sqrt{2}\right] \), one could optimize the bound with respect to the parameter \(\alpha \). The authors of [16] also consider families of k-sets in which the intersection of no two is contained in a third. Such estimates are connected with one of the questions raised by Erdős.
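For a concrete feel of Proposition 6, the sketch below uses a small hypothetical family of four 3-subsets of \(\{1,\ldots ,6\}\) whose pairwise intersections are distinct; it verifies (31) and (43) by brute force and then compares the two sides of (44).

```python
from itertools import combinations
import numpy as np

def ln_alpha(x, a):
    return np.log(x) if a == 1 else (x**(1 - a) - 1) / (1 - a)

def h_alpha(q, a):
    return sum(-t**a * ln_alpha(t, a) for t in (q, 1 - q) if t > 0)

G = [frozenset(s) for s in ({1, 2, 3}, {3, 4, 5}, {5, 6, 1}, {2, 4, 6})]
inters = [g1 & g2 for g1, g2 in combinations(G, 2)]
assert len(set(inters)) == len(inters)                    # condition (31): all intersections differ

n, k, m = 6, 3, len(G)
lam_j = np.array([sum(j in g for g in G) / m for j in range(1, n + 1)])
assert np.all(lam_j <= 1 / np.sqrt(2))                    # precondition (43)
lam = np.sum(lam_j**2) / k                                # the quantity lambda from (44)
for a in (1.0, 2.0, 3.5):
    lhs = ln_alpha(m * (m - 1) / 2, a)                    # ln_alpha of binomial(m, 2)
    rhs = k * h_alpha(lam**2, a) / lam
    print(a, lhs <= rhs)                                  # the bound (44) -> True
```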

The statement of Proposition 3 admits a further extension. In the case of Shannon entropies, an extension of this kind was proved by Shearer [5]. It is often referred to as the Shearer lemma [12, 15, 21]. Its generalization in terms of the THC entropies is stated as follows.

Proposition 7

Let \(X=(X_{1},\ldots ,X_{n})\) be a random variable taking values in the set \(S=S_{1}\times \cdots \times {S}_{n}\), where each coordinate \(X_{j}\) is a random variable taking values in \(S_{j}\). For a subset I of \(\{1,\ldots ,n\}\), let X(I) denote the random variable \((X_{j})_{j\in {I}}\). Suppose that \({\mathcal {G}}\) is a family of subsets of \(\{1,\ldots ,n\}\) and each \(j\in \{1,\ldots ,n\}\) belongs to at least k members of \({\mathcal {G}}\). For \(\alpha \ge 1\), we then have

$$\begin{aligned} k{\,}H_{\alpha }(X)\le \sum _{G\in {\mathcal {G}}} H_{\alpha }\bigl (X(G)\bigr ). \end{aligned}$$
(46)

Proof

Following [21], we will apply the chain rule. Using (11), for \(\alpha \ge 1\) we have

$$\begin{aligned} H_{\alpha }(X)&=\sum _{j=1}^{n} H_{\alpha }\bigl (X_{j}\bigm |(X_{k}:{\,}k<j)\bigr ), \nonumber \\ H_{\alpha }\bigl (X(G)\bigr )&=\sum _{j\in {G}} H_{\alpha }\bigl (X_{j}\bigm |(X_{k}:{\,}k\in {G},{\,}k<j)\bigr ) \\&\ge \sum _{j\in {G}} H_{\alpha }\bigl (X_{j}\bigm |(X_{k}:{\,}k<j)\bigr ). \nonumber \end{aligned}$$
(47)

The step (47) follows from (15), since the string \((X_{k}:{\,}k<j)\) contains all the elements of \((X_{k}:{\,}k\in {G},{\,}k<j)\). Summing (47) with respect to all \(G\in {\mathcal {G}}\) gives

$$\begin{aligned} \sum _{G\in {\mathcal {G}}} H_{\alpha }\bigl (X(G)\bigr )\ge k\sum _{j=1}^{n} H_{\alpha }\bigl (X_{j}\bigm |(X_{k}:{\,}k<j)\bigr ), \end{aligned}$$
(48)

because each \(j\in \{1,\ldots ,n\}\) belongs to at least k members of \({\mathcal {G}}\) and all terms are nonnegative. By the chain rule, the right-hand side of (48) equals \(k{\,}H_{\alpha }(X)\), which completes the proof. \(\square \)

The statement of Proposition 7 is a THC-entropy extension of the Shearer lemma. A related geometric picture was described in [21]. Interesting geometric applications are also discussed in [1]. An immediate consequence of (46) is stated as follows.

Corollary 8

Let N be a finite set, and let \({\mathcal {F}}\) be a family of subsets of N. Let \({\mathcal {G}}=\{G_{1},\ldots ,G_{m}\}\) be a family of subsets of N such that each element of N appears in at least k members of \({\mathcal {G}}\). For each \(1\le {j}\le {m}\), we define \({\mathcal {F}}_{j}:=\{F\cap {G}_{j}:{\>}F\in {\mathcal {F}}\}\). For \(\alpha \ge 1\), we then have

$$\begin{aligned} k{\,}\ln _{\alpha }|{\mathcal {F}}|\le \sum _{j=1}^{m} \ln _{\alpha }|{\mathcal {F}}_{j}|. \end{aligned}$$
(49)

The claim follows from (46) applied to a random variable X distributed uniformly over the characteristic vectors of members of \({\mathcal {F}}\), since then \(H_{\alpha }(X)=\ln _{\alpha }|{\mathcal {F}}|\) and \(H_{\alpha }\bigl (X(G_{j})\bigr )\le \ln _{\alpha }|{\mathcal {F}}_{j}|\). For \(\alpha =1\), the formula (49) reduces to a result originally proved in [5]. Some applications of the latter were also described in [5]. Of course, applications of this kind can further be considered on the basis of (49). In some cases, a one-parameter family of relations may give a stronger bound. An explicit example of this situation is given by upper bounds on permanents of square (0, 1)-matrices.
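A toy check of Corollary 8 (the instance below, with N of five elements and a covering family of five 3-subsets, is our own choice) can be written as follows.

```python
from itertools import combinations
import numpy as np

def ln_alpha(x, a):
    return np.log(x) if a == 1 else (x**(1 - a) - 1) / (1 - a)

N = range(1, 6)
F = [frozenset(c) for c in combinations(N, 2)]            # all 2-subsets of N
G = [frozenset(s) for s in ({1, 2, 3}, {3, 4, 5}, {5, 1, 2}, {2, 3, 4}, {4, 5, 1})]
k = min(sum(j in g for g in G) for j in N)                # every element is covered k = 3 times
traces = [{f & g for f in F} for g in G]                  # the families F_j = {F ∩ G_j}
for a in (1.0, 2.0, 4.0):
    print(a, k * ln_alpha(len(F), a) <= sum(ln_alpha(len(t), a) for t in traces))   # (49) -> True
```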

4 Upper Bounds on Permanents of (0, 1)-Matrices

In this section, we will derive a one-parameter family of upper bounds on the permanent of a square (0, 1)-matrix. The well-known upper bound on permanents was conjectured by Minc [18] and later proved by Brégman [4]. Brégman's proof is based on the duality theorem of convex programming and properties of doubly stochastic matrices. A short elementary proof of this result was given by Schrijver [26]. Schrijver also mentioned an upper bound for permanents of arbitrary nonnegative matrices. A similar proof with randomization is explained in [1]. Developing an approach with randomization, Radhakrishnan presented an entropy-based proof [20]. Our aim is to study this question using the THC entropies. First, we recall preliminary facts. Let \({\mathsf {A}}=\bigl [\bigl [a(i,j)\bigr ]\bigr ]\) be a nonnegative \(n\times {n}\)-matrix, and let \(S_{n}\) denote the set of all permutations on \(\{1,\ldots ,n\}\). The permanent of \({\mathsf {A}}\) is defined as

$$\begin{aligned} {\mathrm {per}}({\mathsf {A}}):=\sum _{\sigma \in {S}_{n}}\prod _{i=1}^{n} a\bigl (i,\sigma (i)\bigr ). \end{aligned}$$
(50)
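For small matrices, the definition (50) can be evaluated directly by brute force; the sketch below (NumPy, an arbitrary \(3\times 3\) example) does exactly this.

```python
from itertools import permutations
import numpy as np

def permanent(A):
    """Permanent via the defining sum (50); brute force, feasible only for small n."""
    n = A.shape[0]
    return sum(int(np.prod([A[i, s[i]] for i in range(n)])) for s in permutations(range(n)))

A = np.array([[1, 1, 0],
              [0, 1, 1],
              [1, 0, 1]])
print(permanent(A))   # -> 2: exactly two permutations pick a one from every row
```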

We further consider matrices with elements \(a(i,j)\in \{0,1\}\). By \(S\subseteq {S}_{n}\), we mean the set of permutations \(\sigma \) such that \(a\bigl (i,\sigma (i)\bigr )=1\) for all \(i\in \{1,\ldots ,n\}\). It is obvious that \({\mathrm {per}}({\mathsf {A}})=|S|\). It is assumed that the matrix contains no all-zero rows, since otherwise its permanent is certainly zero. We claim the following.

Proposition 9

Let \({\mathsf {A}}\) be an \(n\times {n}\) (0, 1)-matrix with \({\mathrm {per}}({\mathsf {A}})\ne 0\), and let \(r_{i}\ne 0\) be the number of ones in the i-th row \((i=1,\ldots ,n)\). For all \(\alpha \ge 1\), the permanent of \({\mathsf {A}}\) obeys the inequality

$$\begin{aligned} \ln _{\alpha }\bigl ({\mathrm {per}}({\mathsf {A}})\bigr ) \le \sum _{i=1}^{n}\frac{1}{r_{i}}{\,}\sum _{j=1}^{r_{i}}\ln _{\alpha }(j). \end{aligned}$$
(51)

Proof

Let \(\sigma \) be a random permutation chosen uniformly from S. We then have the value \(H_{\alpha }(\sigma )={\ln _{\alpha }}{|S|}\), which coincides with the left-hand side of (51). We will show that, for \(\alpha \ge 1\), the entropy \(H_{\alpha }(\sigma )\) does not exceed the right-hand side of (51). Let us choose a random permutation \(\tau \in {S}_{n}\) uniformly. Using the chain rule (11), for each permutation \(\tau \) we can write

$$\begin{aligned} H_{\alpha }(\sigma )&=H_{\alpha }\Bigl ({\sigma }{\bigl (\tau (1)\bigr )}\Bigr )+ H_{\alpha }\Bigl ({\sigma }{\bigl (\tau (2)\bigr )}\Bigm |{\sigma }{\bigl (\tau (1)\bigr )}\Bigr ) \nonumber \\&\qquad +\cdots +H_{\alpha }\Bigl ({\sigma }{\bigl (\tau (n)\bigr )}\Bigm |{\sigma }{\bigl (\tau (1)\bigr )},\ldots ,{\sigma }{\bigl (\tau (n-1)\bigr )}\Bigr ) \end{aligned}$$
(52)
$$\begin{aligned}&\le {\widetilde{H}}_{\alpha }\Bigl ({\sigma }{\bigl (\tau (1)\bigr )}\Bigr )+ {\widetilde{H}}_{\alpha }\Bigl ({\sigma }{\bigl (\tau (2)\bigr )}\Bigm |{\sigma }{\bigl (\tau (1)\bigr )}\Bigr ) \nonumber \\&\quad \quad +\cdots +{\widetilde{H}}_{\alpha }\Bigl ({\sigma }{\bigl (\tau (n)\bigr )}\Bigm |{\sigma }{\bigl (\tau (1)\bigr )},\ldots ,{\sigma }{\bigl (\tau (n-1)\bigr )}\Bigr ). \end{aligned}$$
(53)

Here, the inequality (53) holds for \(\alpha \ge 1\), since \(H_{\alpha }\le {\widetilde{H}}_{\alpha }\) in this range. To the given permutation \(\tau \) and index \(i\in \{1,\ldots ,n\}\), we assign the integer \(k(\tau ,i)\in \{1,\ldots ,n\}\) such that

$$\begin{aligned} k(\tau ,i):=\tau ^{-1}(i) \ , \qquad {\sigma }{\bigl (\tau (k)\bigr )}=\sigma (i). \end{aligned}$$

Summing (53) over all \(\tau \in {S}_{n}\), we further obtain

$$\begin{aligned} |S_{n}|{\,}H_{\alpha }(\sigma )&\le \sum _{\tau \in {S}_{n}}\left\{ {\widetilde{H}}_{\alpha }\Bigl ({\sigma }{\bigl (\tau (1)\bigr )}\Bigr )+ {\widetilde{H}}_{\alpha }\Bigl ({\sigma }{\bigl (\tau (2)\bigr )}\Bigm |{\sigma }{\bigl (\tau (1)\bigr )}\Bigr ) +\cdots \right. \nonumber \\&\quad \left. \cdots +{\widetilde{H}}_{\alpha }\Bigl ({\sigma }{\bigl (\tau (n)\bigr )}\Bigm |{\sigma }{\bigl (\tau (1)\bigr )},\ldots ,{\sigma }{\bigl (\tau (n-1)\bigr )}\Bigr ) \right\} \nonumber \\&=\sum _{\tau \in {S}_{n}}\sum _{i=1}^{n} {\widetilde{H}}_{\alpha }\Bigl (\sigma (i)\Bigm |{\sigma }{\bigl (\tau (1)\bigr )},\ldots ,{\sigma }{\bigl (\tau (k-1)\bigr )}\Bigr ). \end{aligned}$$
(54)

At the last step, we gather the contributions of different \(\sigma (i)\) separately. For the given \(\sigma \in {S}\), \(\tau \in {S}_{n}\), and \(i\in \{1,\ldots ,n\}\), we define \(R_{i}(\sigma ,\tau )\) to be the set of those column indices that differ from \({\sigma }{\bigl (\tau (1)\bigr )},\ldots ,{\sigma }{\bigl (\tau (k-1)\bigr )}\), with \(k=k(\tau ,i)\), and give 1's in the i-th row [20]. By the definition of \(r_{i}\), we have \(\big |R_{i}(\sigma ,\tau )\big |\le {r}_{i}\). Using (24), we then rewrite (54) as

$$\begin{aligned} |S_{n}|{\,}H_{\alpha }(\sigma )\le \sum _{i=1}^{n}\sum _{\tau \in {S}_{n}} \sum _{j=1}^{r_{i}}{\,}\underset{\sigma }{{\mathrm {Pr}}}\Bigl [\bigl |R_{i}(\sigma ,\tau )\bigr |=j\Bigr ]\ln _{\alpha }(j). \end{aligned}$$

Dividing this relation by \(|S_{n}|\) and taking into account the uniform distribution of \(\tau \in {S}_{n}\), we immediately obtain

$$\begin{aligned} H_{\alpha }(\sigma )\le \sum _{i=1}^{n} \sum _{j=1}^{r_{i}}{\,}\underset{\sigma ,\tau }{{\mathrm {Pr}}}\Bigl [\bigl |R_{i}(\sigma ,\tau )\bigr |=j\Bigr ]\ln _{\alpha }(j). \end{aligned}$$
(55)

We now recall a principal observation of [20] that

$$\begin{aligned} \underset{\sigma ,\tau }{{\mathrm {Pr}}}\Bigl [{\bigl |R_{i}(\sigma ,\tau )\bigr |}=j\Bigr ] =\frac{1}{r_{i}}. \end{aligned}$$

Combining this with (55) completes the proof. \(\square \)

The statement of Proposition 9 gives a one-parameter family of upper bounds on permanents. In the limit \(\alpha \rightarrow 1^{+}\), the relation (51) reduces to the previously known result [1]

$$\begin{aligned} {\mathrm {per}}({\mathsf {A}})\le \prod _{i=1}^{n} (r_{i}!)^{1/r_{i}}. \end{aligned}$$
(56)

It was conjectured in [18] and then proved in several ways [4, 20, 26]. This result can naturally be reformulated as an upper bound on the number of perfect matchings in a bipartite graph [10, 21].
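The bound (51) is easy to test numerically; the sketch below (NumPy, a random \(6\times 6\) (0, 1)-matrix with ones on the diagonal, chosen only to guarantee a nonzero permanent) compares (51) with the exact permanent and also checks the limiting bound (56).

```python
from itertools import permutations
from math import factorial
import numpy as np

def ln_alpha(x, a):
    return np.log(x) if a == 1 else (x**(1 - a) - 1) / (1 - a)

def permanent(A):
    n = A.shape[0]
    return sum(int(np.prod([A[i, s[i]] for i in range(n)])) for s in permutations(range(n)))

rng = np.random.default_rng(3)
A = (rng.random((6, 6)) < 0.6).astype(int)
np.fill_diagonal(A, 1)                                        # guarantees per(A) != 0 and r_i != 0
r = A.sum(axis=1)
per = permanent(A)
for a in (1.0, 1.5, 2.0):
    rhs = sum(sum(ln_alpha(j, a) for j in range(1, int(ri) + 1)) / ri for ri in r)
    print(a, ln_alpha(per, a) <= rhs)                         # the bound (51) -> True
print(per <= np.prod([factorial(int(ri))**(1 / ri) for ri in r]))   # Bregman bound (56) -> True
```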

We now consider the significance of the one-parameter bound (51). It is instructive to consider a concrete example. Let the \(n\times {n}\)-matrix \({\mathsf {A}}\) have elements \(a(1,j)=1\) for all \(j=1,\ldots ,n\) and \(a(i,j)=\delta (i,j)\) for \(i=2,\ldots ,n\). That is, our matrix is obtained from the identity \(n\times {n}\)-matrix by filling its first row with ones. We then have \({\mathrm {per}}({\mathsf {A}})=1\). On the other hand, we have \(r_{1}=n\) and \(r_{2}=\cdots =r_{n}=1\). Let us compare the values of the bounds (51) and (56). It is easy to apply (51) in the case \(\alpha =2\), since \(\ln _{2}(\xi )=1-1/\xi \) due to (3). For \(\alpha =2\), the upper bound (51) gives

$$\begin{aligned} 1-\frac{1}{{\mathrm {per}}({\mathsf {A}})}\le \frac{1}{n}{\,}\sum _{j=1}^{n}\left( 1-\frac{1}{j}\right) =1-\frac{{\mathcal {H}}_{n}}{n}. \end{aligned}$$
(57)

By \({\mathcal {H}}_{n}\), we denote the n-th harmonic number [13]. It is well known that the asymptotic expansion of this number for large n is written as [13]

$$\begin{aligned} {\mathcal {H}}_{n}=\ln {n}+\gamma +O(1/n), \end{aligned}$$

where \(\gamma \) is the Euler–Mascheroni constant. From (57), we immediately obtain

$$\begin{aligned} {\mathrm {per}}({\mathsf {A}})\le \frac{n}{{\mathcal {H}}_{n}}=\frac{n}{\ln {n}}\left\{ 1+O\left( \frac{1}{\ln {n}}\right) \right\} . \end{aligned}$$
(58)

Substituting the same collection of numbers \(r_{i}\) into (56) gives

$$\begin{aligned} {\mathrm {per}}({\mathsf {A}})\le (n!)^{1/n}=\frac{n}{e}\left\{ 1+O\left( \frac{1}{n}\right) \right\} . \end{aligned}$$
(59)

At the last step, we used the Stirling approximation. For sufficiently large n, the upper bound (58) is significantly stronger than (59). On the other hand, both bounds are very far from the actual value of the permanent. Nevertheless, this example shows the relevance of the result (51) proved for \(\alpha \ge 1\).
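The numbers behind this comparison can be reproduced with a few lines (a sketch; the values of n are arbitrary): for the matrix above the exact permanent is 1, while the right-hand sides of (58) and (59) grow roughly as \(n/\ln {n}\) and n / e, respectively.

```python
from math import exp, lgamma

for n in (10, 100, 1000):
    harmonic = sum(1.0 / j for j in range(1, n + 1))          # the harmonic number H_n
    bound_alpha2 = n / harmonic                               # right-hand side of (58)
    bound_bregman = exp(lgamma(n + 1) / n)                    # (n!)^(1/n), right-hand side of (59)
    print(n, round(bound_alpha2, 1), round(bound_bregman, 1)) # the exact permanent equals 1
```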

We can further ask whether such bounds extend to values \(\alpha \in (0,1)\). The corresponding result can be obtained by an immediate extension of Schrijver's proof [26]. We have the following statement.

Proposition 10

Let \({\mathsf {A}}\) be an \(n\times {n}\) (0, 1)-matrix with \({\mathrm {per}}({\mathsf {A}})\ne 0\), and let \(r_{i}\ne 0\) be the number of ones in the i-th row \((i=1,\ldots ,n)\). For \(\alpha \in (0,1)\), the permanent of \({\mathsf {A}}\) obeys the inequality

$$\begin{aligned} -\ln _{\alpha }\left( \frac{1}{{\mathrm {per}}({\mathsf {A}})}\right) \le \sum _{i=1}^{n}\frac{1}{r_{i}}{\,}\sum _{j=1}^{r_{i}}\ln _{\alpha }(j). \end{aligned}$$
(60)

Proof

For convenience, we introduce the function

$$\begin{aligned} g_{\alpha }(\xi ):=\frac{\xi ^{\alpha }-\xi }{\alpha -1} =-\xi {\,}\ln _{\alpha }\left( \frac{1}{\xi }\right) , \end{aligned}$$

where \(\xi >0\) and \(\alpha \in (0,1)\). For these values of \(\alpha \), the function \(\xi \mapsto {g}_{\alpha }(\xi )\) is convex. Due to the Jensen inequality, we have

$$\begin{aligned} g_{\alpha }\left( \frac{1}{r}{\,}\sum _{k=1}^{r} \xi _{k}\right) \le \frac{1}{r}{\,}\sum _{k=1}^{r} g_{\alpha }(\xi _{k}). \end{aligned}$$
(61)

We will prove (60) by induction on n. For \(n=1\), the result is trivial. Suppose that the claim is already proved for \((n-1)\times (n-1)\)-matrices. The permanent of an \(n\times {n}\)-matrix can be decomposed as

$$\begin{aligned} {\mathrm {per}}({\mathsf {A}})=\sum _{\begin{array}{c} k=1 \\ a(i,k)=1 \end{array}}^{n} {\mathrm {per}}\bigl ({\mathsf {A}}(i,k)\bigr ). \end{aligned}$$
(62)

Here, the submatrix \({\mathsf {A}}(i,k)\) is obtained from \({\mathsf {A}}\) by eliminating the i-th row and the k-th column. Combining (61) with (62) gives

$$\begin{aligned} {\mathrm {per}}({\mathsf {A}}){\,}(-1){\,}\ln _{\alpha }\left( \frac{r_{i}}{{\mathrm {per}}({\mathsf {A}})}\right) = r_{i}{\,}g_{\alpha }\left( \frac{{\mathrm {per}}({\mathsf {A}})}{r_{i}}\right) \le \sum _{\begin{array}{c} k=1 \\ a(i,k)=1 \end{array}}^{n} g_{\alpha }\Bigl \{{\mathrm {per}}{\bigl ({\mathsf {A}}(i,k)\bigr )}\Bigr \} {\>}. \end{aligned}$$
(63)

From the definition of the \(\alpha \)-logarithm, we have the identity

$$\begin{aligned} \ln _{\alpha }(r\xi )=\ln _{\alpha }(r)+r^{1-\alpha }\ln _{\alpha }(\xi ). \end{aligned}$$
(64)

Summing (63) with respect to \(i\in \{1,\ldots ,n\}\), we therefore obtain

$$\begin{aligned}&{\mathrm {per}}({\mathsf {A}})\left\{ -\sum _{i=1}^{n}\ln _{\alpha }(r_{i}) -n{\,}\ln _{\alpha }\left( \frac{1}{{\mathrm {per}}({\mathsf {A}})}\right) \right\} \le \sum _{i=1}^{n}r_{i}{\,}g_{\alpha }\left( \frac{{\mathrm {per}}({\mathsf {A}})}{r_{i}}\right) \end{aligned}$$
(65)
$$\begin{aligned}&\le \sum _{i=1}^{n}\sum _{\begin{array}{c} k=1 \\ a(i,k)=1 \end{array}}^{n} {\mathrm {per}}{\bigl ({\mathsf {A}}(i,k)\bigr )}{\,}(-1){\,}\ln _{\alpha }\left( \frac{1}{{\mathrm {per}}\bigl ({\mathsf {A}}(i,k)\bigr )}\right) \end{aligned}$$
(66)
$$\begin{aligned}&=\sum _{\sigma \in {S}}\sum _{i=1}^{n} (-1){\,}\ln _{\alpha }\left( \frac{1}{{\mathrm {per}}\bigl \{{\mathsf {A}}\bigl (i,\sigma (i)\bigr )\bigr \}}\right) . \end{aligned}$$
(67)

To prove (65), we used (64) and the relation \(n\le \sum _{i=1}^{n}r_{i}^{1-\alpha }\), which holds for \(\alpha \in (0,1)\) because every \(r_{i}\ge 1\). To justify (67), we note the following fact. In the double sum (67), the number of terms from any pair (i, k) equals the number of those \(\sigma \in {S}\) for which \(\sigma (i)=k\). The latter number is \({\mathrm {per}}{\bigl ({\mathsf {A}}(i,k)\bigr )}\) for \(a(i,k)=1\), and zero otherwise. We now apply the induction hypothesis to each \({\mathrm {per}}{\bigl \{{\mathsf {A}}{\bigl (i,\sigma (i)\bigr )}\bigr \}}\) in (67). The left-hand side of (65) is no greater than

$$\begin{aligned}&\sum _{\sigma \in {S}}\sum _{i=1}^{n} \left\{ \sum _{\begin{array}{c} \ell \ne {i} \\ a(\ell ,\sigma (i))=0 \end{array}} \frac{1}{r_{\ell }}{\,}\sum _{j=1}^{r_{\ell }}\ln _{\alpha }(j) {\,}+\sum _{\begin{array}{c} \ell \ne {i} \\ a(\ell ,\sigma (i))=1 \end{array}} \frac{1}{r_{\ell }-1}{\,}\sum _{j=1}^{r_{\ell }-1}\ln _{\alpha }(j) \right\} \nonumber \\&=\sum _{\sigma \in {S}}\sum _{\ell =1}^{n} \left\{ \sum _{\begin{array}{c} i\ne \ell \\ a(\ell ,\sigma (i))=0 \end{array}} \frac{1}{r_{\ell }}{\,}\sum _{j=1}^{r_{\ell }}\ln _{\alpha }(j) {\,}+\sum _{\begin{array}{c} i\ne \ell \\ a(\ell ,\sigma (i))=1 \end{array}} \frac{1}{r_{\ell }-1}{\,}\sum _{j=1}^{r_{\ell }-1}\ln _{\alpha }(j) \right\} \end{aligned}$$
(68)
$$\begin{aligned}&=\sum _{\sigma \in {S}}\sum _{\ell =1}^{n} \left\{ \frac{n-r_{\ell }}{r_{\ell }}{\,}\sum _{j=1}^{r_{\ell }}\ln _{\alpha }(j) {\,}+\sum _{j=1}^{r_{\ell }-1}\ln _{\alpha }(j)\right\} . \end{aligned}$$
(69)

In the step (68), we change the order of summation. The step (69) is justified as follows. First, the number of indices i such that \(i\ne \ell \) and \({a}{\bigl (\ell ,\sigma (i)\bigr )}=0\) is equal to \((n-r_{\ell })\). Second, the number of indices i such that \(i\ne \ell \) and \({a}{\bigl (\ell ,\sigma (i)\bigr )}=1\) is equal to \((r_{\ell }-1)\). These observations allow us to compute the sums with respect to i and arrive at (69). Adding the term \({\mathrm {per}}({\mathsf {A}})\sum _{1\le \ell \le {n}}\ln _{\alpha }(r_{\ell })\) to both (65) and (69), we immediately obtain

$$\begin{aligned} {\mathrm {per}}({\mathsf {A}}){\,}(-n){\,}\ln _{\alpha }\left( \frac{1}{{\mathrm {per}}({\mathsf {A}})}\right) \le {\mathrm {per}}({\mathsf {A}})\sum _{\ell =1}^{n}\frac{n}{r_{\ell }}{\,}\sum _{j=1}^{r_{\ell }}\ln _{\alpha }(j). \end{aligned}$$

Dividing the latter relation by \(n{\,}{\mathrm {per}}({\mathsf {A}})\) gives (60) and completes the proof. \(\square \)

In the limit \(\alpha \rightarrow 1^{-}\), the result (60) also reproduces the bound (56). In this regard, it is a proper extension of (51) to the parameter range \(\alpha \in (0,1)\). Together, the bounds (51) and (60) cover all values \(\alpha >0\).
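As a closing numerical check (a sketch with a hypothetical \(4\times 4\) matrix of our own choice), the snippet below evaluates both sides of (60) for \(\alpha \in (0,1)\) and of (51) for \(\alpha >1\) against the exact permanent.

```python
from itertools import permutations
import numpy as np

def ln_alpha(x, a):
    return np.log(x) if a == 1 else (x**(1 - a) - 1) / (1 - a)

def permanent(A):
    n = A.shape[0]
    return sum(int(np.prod([A[i, s[i]] for i in range(n)])) for s in permutations(range(n)))

A = np.array([[1, 1, 0, 0],
              [1, 1, 1, 0],
              [0, 1, 1, 1],
              [0, 0, 1, 1]])
per, r = permanent(A), A.sum(axis=1)
for a in (0.3, 0.7, 1.5):
    rhs = sum(sum(ln_alpha(j, a) for j in range(1, int(ri) + 1)) / ri for ri in r)
    lhs = -ln_alpha(1 / per, a) if a < 1 else ln_alpha(per, a)
    print(a, lhs <= rhs)                                      # (60) for alpha < 1, (51) for alpha > 1
```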