1 Introduction

One of the extensions of convexity developed in the last century is the class of strongly convex functions, a subclass of convex functions (see [20] and, for more recent contributions, [10, 11, 18]).

Let us recall that a function \(f\colon [a,b]\subseteq \mathbb{R}\rightarrow \mathbb{R} \) is strongly convex with modulus \(c>0\) if

$$ f(\lambda x+(1-\lambda )y)\leq \lambda f(x)+(1-\lambda )f(y)-c \lambda (1-\lambda )(x-y)^{2} $$
(1.1)

for all \(x,y\in \lbrack a,b]\) and \(\lambda \in \lbrack 0,1]\).

A function f that satisfies (1.1) with \(c=0\), i.e.,

$$ f(\lambda x+(1-\lambda )y)\leq \lambda f(x)+(1-\lambda )f(y), $$
(1.2)

is convex in the usual sense. Obviously, strong convexity implies convexity, but the reverse implication is not true in general. For example, a linear function is convex but is not strongly convex.
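For illustration, the following minimal numerical sketch tests definition (1.1) on random points; the particular choices \(f(x)=x^{2}\) (strongly convex with modulus \(c=1\)) and a linear function (not strongly convex for any \(c>0\)) are assumptions made only for this example.

```python
import random

def satisfies_1_1(f, c, a, b, trials=10_000):
    """Test inequality (1.1) for f with modulus c on random points of [a, b]."""
    for _ in range(trials):
        x, y = random.uniform(a, b), random.uniform(a, b)
        lam = random.random()
        lhs = f(lam * x + (1 - lam) * y)
        rhs = lam * f(x) + (1 - lam) * f(y) - c * lam * (1 - lam) * (x - y) ** 2
        if lhs > rhs + 1e-9:          # small tolerance for rounding errors
            return False
    return True

print(satisfies_1_1(lambda t: t * t, c=1.0, a=-5, b=5))       # True
print(satisfies_1_1(lambda t: 3 * t + 1, c=0.1, a=-5, b=5))   # False: linear, hence not strongly convex
```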

Compared with convex functions, strongly convex functions possess stronger versions of the analogous properties. One of their useful characterizations is given in the following lemma (see [23, p. 268], [11, 20], and the references therein).

Lemma 1

A function \(f\colon [a,b]\rightarrow \mathbb{R} \) is strongly convex with modulus \(c>0\) iff the function \(g\colon \lbrack a,b]\rightarrow \mathbb{R} \) defined by \(g(x)=f(x)-cx^{2}\) is convex.

We further use the well-known theorem proved by Stolz [19, p. 25].

Theorem 1

(Stolz)

Let \(f\colon [a,b]\rightarrow \mathbb{R} \) be a convex function. Then f is continuous on \((a,b)\) and has finite left and right derivatives at each point of \((a,b)\). Both \(f_{-}^{\prime}\) and \(f_{+}^{\prime}\) are nondecreasing on \((a,b)\). Moreover, for all \(x,y\in (a,b)\), \(x< y\), we have

$$ f_{-}^{\prime}(x)\leq f_{+}^{\prime}(x)\leq f_{-}^{\prime}(y)\leq f_{+}^{ \prime}(y). $$

Strongly convex functions satisfy a corresponding Jensen-type inequality, which was proved in [20].

Theorem 2

Let a function \(f\colon (a,b)\rightarrow \mathbb{R} \) be strongly convex with modulus \(c>0\). Suppose \(\boldsymbol{x}=\left ( x_{1},\ldots,x_{n}\right ) \in (a,b)^{n}\) and \(\boldsymbol{a}=(a_{1},\ldots,a_{n})\) is a nonnegative n-tuple such that \(A_{n}={\textstyle \sum \nolimits _{i=1}^{n}} a_{i}>0\) with \(\bar{x}=\frac{1}{A_{n}}{\textstyle \sum \nolimits _{i=1}^{n}} a_{i}x_{i}\). Then

$$ f\left ( \bar{x}\right ) \leq \frac{1}{A_{n}}\sum _{i=1}^{n}a_{i}f \left ( x_{i}\right ) -\frac{c}{A_{n}}{\displaystyle \sum \limits _{i=1}^{n}} a_{i}(x_{i}-\bar{x})^{2}. $$
(1.3)

It is easily seen that for \(c=0\), inequality (1.3) becomes the Jensen inequality for convex functions:

$$ f\left ( \bar{x}\right ) \leq \frac{1}{A_{n}}\sum _{i=1}^{n}a_{i}f \left ( x_{i}\right ) . $$
(1.4)

Inequality (1.3) provides a better upper bound for \(f\left ( \bar{x}\right ) \) because the term \(\frac{c}{A_{n}}{\textstyle \sum \nolimits _{i=1}^{n}} a_{i}(x_{i}-\bar{x})^{2}\) is nonnegative. Thus (1.3) improves (1.4) and can be regarded as its stronger variant.
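A short numerical sketch of (1.3) follows; the strongly convex function \(f(x)=x^{2}+e^{x}\) with modulus \(c=1\) and the random data are assumptions made only for this check.

```python
import math, random

f, c = (lambda t: t * t + math.exp(t)), 1.0   # strongly convex with modulus c = 1

x = [random.uniform(-2, 2) for _ in range(6)]
a = [random.uniform(0, 1) for _ in range(6)]
A = sum(a)
xbar = sum(ai * xi for ai, xi in zip(a, x)) / A

rhs = (sum(ai * f(xi) for ai, xi in zip(a, x)) / A
       - c / A * sum(ai * (xi - xbar) ** 2 for ai, xi in zip(a, x)))
print(f(xbar) <= rhs + 1e-12)   # True: (1.3) holds
```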

Another Jensen-type inequality was established by Mercer [17]. Given a convex function \(f\colon (a,b)\rightarrow \mathbb{R} \) with \(m,M\in (a,b)\), \(m< M\), for \(\boldsymbol{x}=\left ( x_{1},\ldots,x_{n}\right ) \in \lbrack m,M]^{n}\) and a nonnegative n-tuple \(\boldsymbol{a}=(a_{1},\ldots,a_{n})\) such that \(A_{n}={\textstyle \sum \nolimits _{i=1}^{n}} a_{i}>0\) with \(\bar{x}=\frac{1}{A_{n}}{\textstyle \sum \nolimits _{i=1}^{n}} a_{i}x_{i}\), the Jensen–Mercer inequality states that

$$ f\left ( m+M-\bar{x}\right ) \leq f(m)+f(M)-\frac{1}{A_{n}}{\displaystyle \sum _{i=1}^{n}} a_{i}f(x_{i}). $$
(1.5)
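For illustration, here is a brief numerical sketch of (1.5); the convex function \(f(t)=e^{t}\), the interval \([m,M]=[0,2]\), and the random data are assumptions used only for this check.

```python
import math, random

f, m, M = math.exp, 0.0, 2.0      # a convex function and an interval chosen for the sketch

x = [random.uniform(m, M) for _ in range(6)]
a = [random.uniform(0, 1) for _ in range(6)]
A = sum(a)
xbar = sum(ai * xi for ai, xi in zip(a, x)) / A

lhs = f(m + M - xbar)
rhs = f(m) + f(M) - sum(ai * f(xi) for ai, xi in zip(a, x)) / A
print(lhs <= rhs + 1e-12)   # True: the Jensen-Mercer inequality (1.5) holds
```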

Numerous improvements and generalizations of (1.5) have been obtained since. Here we accentuate two such results. In [15] the authors proved that for a convex function \(f\colon (a,b)\rightarrow \mathbb{R}\), \(\boldsymbol{x}=\left ( x_{1},\ldots,x_{n}\right ) \in \lbrack m,M]^{n}\), where \(m,M\in (a,b)\), \(m< M\), and a nonnegative n-tuple \(\boldsymbol{a}=(a_{1},\ldots,a_{n})\) such that \(A_{n}={\textstyle \sum \nolimits _{i=1}^{n}} a_{i}>0\), we have the inequalities

$$\begin{aligned} & f(c)+f^{\prime}(c)\left ( m+M-c-\dfrac{1}{A_{n}}\sum _{i=1}^{n}a_{i}x_{i}\right ) \\ & \hspace{0.5cm} \leq f(m)+f(M)-\dfrac{1}{A_{n}}\sum _{i=1}^{n}a_{i}f(x_{i}) \\ & \hspace{0.5cm} \leq f(d)+f^{\prime}(m)(m-d)+f^{\prime}(M)(M-d)-\dfrac {1}{A_{n}} \sum _{i=1}^{n}a_{i}f^{\prime}(x_{i})(x_{i}-d) \end{aligned}$$
(1.6)

for all \(c,d\in \lbrack m,M]\).

Furthermore, the following variant of the Jensen–Mercer inequality was proved in [18] for strongly convex functions.

Theorem 3

Let \(f\colon (a,b)\rightarrow \mathbb{R} \) be a strongly convex function with modulus \(c>0\), and let \(m,M\in (a,b)\), \(m< M\). Let \(\boldsymbol{x}=\left ( x_{1},\ldots,x_{n}\right ) \in \lbrack m,M]^{n}\), and let \(\boldsymbol{a}=(a_{1},\ldots,a_{n})\) be a nonnegative n-tuple such that \({\textstyle \sum \nolimits _{i=1}^{n}} a_{i}=1\) with \(\bar{x}={\textstyle \sum \nolimits _{i=1}^{n}} a_{i}x_{i}\). Let \(\lambda _{i}\in \lbrack 0,1]\), \(i\in \{1,\ldots,n\}\). Then

$$\begin{aligned} f(m+M-\bar{x}) & \leq f(m)+f(M)-\sum _{i=1}^{n}a_{i}f(x_{i}) \\ & -c\left [ 2(M-m)^{2}\sum _{i=1}^{n}a_{i}\lambda _{i}(1-\lambda _{i})+\sum _{i=1}^{n}a_{i}(\bar{x}-x_{i})^{2}\right ] . \end{aligned}$$
(1.7)

For some recent results on the Jensen–Mercer inequality, see [1–3, 9, 12–14, 16, 24].

Aiming at new improvements and at elaborating on the existing results, we divide the paper into five sections. In Section 1 we recall a few results needed later: some concerning strongly convex functions and some well-known facts about convex functions. Sections 2 and 3 deal with the Jensen and Jensen–Mercer inequalities, both generalized by means of strongly convex functions. In Section 4 we discuss applications to the Csiszár strong f-divergences introduced in [10], for which we provide new estimates, together with estimates for several particular divergences of this type. We also derive new estimates for the Shannon entropy. Section 5 deals with new Chebyshev-type inequalities.

2 The Jensen-type inequalities

We start this section with important properties of strongly convex functions, which are direct consequences of the characterizations given in Lemma 1 and Theorem 1.

Lemma 2

Let \(f\colon [a,b]\rightarrow \mathbb{R} \) be a strongly convex function with modulus \(c>0\). Then it is continuous on \((a,b)\) and has finite left and right derivatives at each point of \((a,b)\). Both \(f_{-}^{\prime}\) and \(f_{+}^{\prime}\) are nondecreasing on \((a,b)\). Moreover, for all \(x,y\in (a,b)\), \(x< y\), we have

$$ f_{-}^{\prime}(x)-2cx\leq f_{+}^{\prime}(x)-2cx\leq f_{-}^{\prime}(y)-2cy \leq f_{+}^{\prime}(y)-2cy. $$
(2.1)

If f is differentiable, then \(f^{\prime}\) is strongly increasing on \((a,b)\), i.e., for all \(x,y\in (a,b)\), \(x< y\),

$$ f^{\prime}(x)+2c(y-x)\leq f^{\prime}(y). $$
(2.2)

Proof

Let id denote the identity function, i.e., \(id(t)=t\) for all \(t\in \lbrack a,b]\). Since f is strongly convex with modulus \(c>0\), by Lemma 1 the function \(g=f-c\cdot id^{2}\) is convex. The first part of the statement now follows by applying Theorem 1 to the convex function g and noting that \(g_{\pm}^{\prime}(x)=f_{\pm}^{\prime}(x)-2cx\) for \(x\in (a,b)\).

If f is differentiable, then \(f^{\prime}(x)=f_{-}^{\prime}(x)=f_{+}^{\prime }(x)\) and \(f^{\prime}(y)=f_{-}^{\prime}(y)=f_{+}^{\prime}(y)\), and (2.1) implies (2.2). □

Bearing in mind the previous lemma, for a strongly convex function \(f\colon [a,b]\rightarrow \mathbb{R} \) and \(x\in (a,b)\), by \(f^{\prime}(x)\) we mean an arbitrary element of the interval \([f_{-}^{\prime}(x),f_{+}^{\prime }(x)]\). If f is differentiable, then \(f^{\prime}(x)=f_{-}^{\prime}(x)=f_{+}^{\prime}(x)\).

Furthermore, for a strongly convex function \(f\colon [a,b]\rightarrow \mathbb{R} \) with modulus \(c>0\), we have

$$ f(x)\geq f(y)+f^{\prime}(y)(x-y)+c(x-y)^{2} $$
(2.3)

for all \(x,y\in (a,b)\). This inequality is an easy consequence of the characterization of convex functions via support lines (see [21, Theorem 1.6]) applied to the convex function \(g=f-c\cdot id^{2}\).
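As a sanity check, here is a small numerical sketch of (2.3); the strongly convex function \(f(t)=t^{2}+e^{t}\) with modulus \(c=1\) (so that \(g(t)=f(t)-t^{2}=e^{t}\) is convex) is an assumption made for the sketch.

```python
import math, random

f  = lambda t: t * t + math.exp(t)
df = lambda t: 2 * t + math.exp(t)
c  = 1.0

ok = all(
    f(x) >= f(y) + df(y) * (x - y) + c * (x - y) ** 2 - 1e-9
    for _ in range(10_000)
    for x, y in [(random.uniform(-2, 2), random.uniform(-2, 2))]
)
print(ok)   # True: (2.3) holds at every sampled pair
```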

A generalization and an improvement of the Jensen inequality (1.3) for strongly convex functions are given in the following theorem.

Theorem 4

Let \(f\colon (a,b)\rightarrow \mathbb{R} \) be a strongly convex function with modulus \(c>0\). Suppose \(\boldsymbol{x}=\left ( x_{{1}},\ldots,x_{n}\right ) \in (a,b)^{n}\) and \(\boldsymbol{a}=(a_{1},\ldots,a_{n})\) is a nonnegative n-tuple with \(A_{n}=\sum _{i=1}^{n}a_{i}>0\). Let \(\bar{x}=\frac{1}{A_{n}}\sum _{i=1}^{n}a_{i}x_{i}\) and \(\hat {x}_{i}=(1-\lambda _{i})\bar{x}+\lambda _{i}x_{i}\), \(\lambda _{i}\in \lbrack 0,1]\), \(i\in \{1,\ldots,n\}\). Then

$$\begin{aligned} 0 & \leq \left \vert \frac{1}{A_{n}}\sum _{i=1}^{n}a_{i}\left \vert f(x_{i})-f( \hat{x}_{i})-c(1-\lambda _{i})^{2}(\bar{x}-x_{i})^{2}\right \vert - \frac{1}{A_{n}}\sum _{i=1}^{n}a_{i}(1-\lambda _{i})\left \vert f^{ \prime}(\hat{x}_{i})\right \vert \left \vert \bar{x}-x_{i}\right \vert \right \vert \\ & \leq \frac{1}{A_{n}}\sum _{i=1}^{n}a_{i}f(x_{i})-\frac{1}{A_{n}} \sum _{i=1}^{n}a_{i}f(\hat{x}_{i}) \\ & -\frac{1}{A_{n}}\sum _{i=1}^{n}a_{i}(1-\lambda _{i})f^{\prime}( \hat{x}_{i})(\bar{x}-x_{i})-\frac{c}{A_{n}}\sum _{i=1}^{n}a_{i}(1-\lambda _{i})^{2}(\bar{x}-x_{i})^{2}. \end{aligned}$$
(2.4)

Proof

Applying the triangle inequality \(\left \vert \left \vert u\right \vert -\left \vert v\right \vert \right \vert \leq \left \vert u-v\right \vert \) to (2.3), we get

$$\begin{aligned} & \left \vert \left \vert f(x)-f(y)-c(x-y)^{2}\right \vert -\left \vert f^{\prime}(y)\right \vert \left \vert (x-y)\right \vert \right \vert \\ & \leq \left \vert f(x)-f(y)-c(x-y)^{2}-f^{\prime}(y)(x-y)\right \vert \\ & =f(x)-f(y)-c(x-y)^{2}-f^{\prime}(y)(x-y). \end{aligned}$$
(2.5)

Setting \(y=\hat{x}_{i}\) and \(x=x_{i}\), \(i\in \{1,\ldots,n\}\), from (2.5) we have

$$\begin{aligned} & \left \vert \left \vert f(x_{i})-f(\hat{x}_{i})-c(1-\lambda _{i})^{2}( \bar {x}-x_{i})^{2}\right \vert -(1-\lambda _{i})\left \vert f^{ \prime}(\hat{x}_{i})\right \vert \left \vert \bar{x}-x_{i}\right \vert \right \vert \\ & \leq \left \vert f(x_{i})-f(\hat{x}_{i})-(1-\lambda _{i})f^{\prime}( \hat {x}_{i})(\bar{x}-x_{i})-c(1-\lambda _{i})^{2}(\bar{x}-x_{i})^{2} \right \vert \\ & =f(x_{i})-f(\hat{x}_{i})-(1-\lambda _{i})f^{\prime}(\hat{x}_{i})( \bar {x}-x_{i})-c(1-\lambda _{i})^{2}(\bar{x}-x_{i})^{2}. \end{aligned}$$

Now multiplying by \(a_{i}\), summing over i, \(i=1,\ldots,n\), and then dividing by \(A_{n}=\sum _{i=1}^{n}a_{i}>0\), we get

$$\begin{aligned} & \frac{1}{A_{n}}\sum _{i=1}^{n}a_{i}\left \vert \left \vert f(x_{i})-f( \hat {x}_{i})-c(1-\lambda _{i})^{2}(\bar{x}-x_{i})^{2}\right \vert -(1- \lambda _{i})\left \vert f^{\prime}(\hat{x}_{i})\right \vert \left \vert \bar{x}-x_{i}\right \vert \right \vert \\ & \leq \frac{1}{A_{n}}\sum _{i=1}^{n}a_{i}\left \vert f(x_{i})-f( \hat{x}_{i})-(1-\lambda _{i})f^{\prime}(\hat{x}_{i})(\bar{x}-x_{i})-c(1- \lambda _{i})^{2}(\bar{x}-x_{i})^{2}\right \vert \\ & =\frac{1}{A_{n}}\sum _{i=1}^{n}a_{i}f(x_{i})-\frac{1}{A_{n}}\sum _{i=1}^{n}a_{i}f(\hat{x}_{i}) \\ & -\frac{1}{A_{n}}\sum _{i=1}^{n}a_{i}(1-\lambda _{i})f^{\prime}( \hat{x}_{i})(\bar{x}-x_{i})-\frac{c}{A_{n}}\sum _{i=1}^{n}a_{i}(1-\lambda _{i})^{2}(\bar{x}-x_{i})^{2}. \end{aligned}$$
(2.6)

By the triangle inequality (\(\left \vert \sum _{i=1}^{n}a_{i}z_{i}\right \vert \leq \sum _{i=1}^{n}a_{i}\left \vert z_{i}\right \vert \)), we also have

$$\begin{aligned} & \left \vert \frac{1}{A_{n}}\sum _{i=1}^{n}a_{i}\left \vert f(x_{i})-f( \hat {x}_{i})-c(1-\lambda _{i})^{2}(\bar{x}-x_{i})^{2}\right \vert - \frac{1}{A_{n}}\sum _{i=1}^{n}a_{i}(1-\lambda _{i})\left \vert f^{\prime}(\hat{x}_{i})\right \vert \left \vert \bar{x}-x_{i}\right \vert \right \vert \\ & \leq \frac{1}{A_{n}}\sum _{i=1}^{n}a_{i}\left \vert \left \vert f(x_{i})-f(\hat{x}_{i})-c(1-\lambda _{i})^{2}(\bar{x}-x_{i})^{2}\right \vert -(1-\lambda _{i})\left \vert f^{\prime}(\hat{x}_{i})\right \vert \left \vert \bar{x}-x_{i}\right \vert \right \vert . \end{aligned}$$
(2.7)

Now combining (2.6) and (2.7), we get (2.4). □

The following corollary is a direct consequence of Theorem 4.

Corollary 1

Let \(f\colon (a,b)\rightarrow \mathbb{R} \) be a strongly convex function with modulus \(c>0\). Suppose \(\boldsymbol{x}=\left ( x_{{1}},\ldots,x_{n}\right ) \in (a,b)^{n}\) and \(\boldsymbol{a}=(a_{1},\ldots,a_{n})\) is a nonnegative n-tuple with \(A_{n}=\sum _{i=1}^{n}a_{i}>0\) and \(\bar{x}=\frac{1}{A_{n}}\sum _{i=1}^{n}a_{i}x_{i}\). Then

$$\begin{aligned} 0 & \leq \left \vert \frac{1}{A_{n}}\sum _{i=1}^{n}a_{i}\left \vert f(x_{i})-f( \bar{x})-c(x_{i}-\bar{x})^{2}\right \vert -\left \vert f^{\prime }( \bar{x})\right \vert \cdot \frac{1}{A_{n}}\sum _{i=1}^{n}a_{i}\left \vert x_{i}-\bar{x}\right \vert \right \vert \\ & \leq \frac{1}{A_{n}}\sum _{i=1}^{n}a_{i}f(x_{i})-f(\bar{x})- \frac{c}{A_{n}}\sum _{i=1}^{n}a_{i}(x_{i}-\bar{x})^{2}. \end{aligned}$$
(2.8)

Proof

Setting \(\lambda _{i}=0\), \(i=1,\ldots,n\), from (2.4) we get

$$\begin{aligned} 0 & \leq \left \vert \frac{1}{A_{n}}\sum _{i=1}^{n}a_{i}\left \vert f(x_{i})-f( \bar{x})-c(\bar{x}-x_{i})^{2}\right \vert -\frac{1}{A_{n}}\sum _{i=1}^{n}a_{i} \left \vert f^{\prime}(\bar{x})\right \vert \left \vert \bar {x}-x_{i} \right \vert \right \vert \\ & \leq \frac{1}{A_{n}}\sum _{i=1}^{n}a_{i}f(x_{i})-\frac{1}{A_{n}} \sum _{i=1}^{n}a_{i}f(\bar{x}) \\ & -\frac{1}{A_{n}}\sum _{i=1}^{n}a_{i}f^{\prime}(\bar{x})(\bar{x}-x_{i})-\frac{c}{A_{n}}\sum _{i=1}^{n}a_{i}(\bar{x}-x_{i})^{2}. \end{aligned}$$
(2.9)

Note that

$$ \frac{1}{A_{n}}\sum _{i=1}^{n}a_{i}f^{\prime}(\bar{x})(x_{i}- \bar{x})=f^{\prime}(\bar{x})\frac{1}{A_{n}}\sum _{i=1}^{n}a_{i}(x_{i}- \bar{x})=0. $$
(2.10)

Now combining (2.9) and (2.10), we get (2.8). □
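The chain of inequalities in (2.8) can also be checked numerically; the following sketch uses the assumed function \(f(t)=t^{2}+e^{t}\) with \(c=1\) and random points and weights.

```python
import math, random

f  = lambda t: t * t + math.exp(t)
df = lambda t: 2 * t + math.exp(t)
c  = 1.0

x = [random.uniform(0, 3) for _ in range(7)]
a = [random.uniform(0, 1) for _ in range(7)]
A = sum(a)
xbar = sum(ai * xi for ai, xi in zip(a, x)) / A

middle = abs(
    sum(ai * abs(f(xi) - f(xbar) - c * (xi - xbar) ** 2) for ai, xi in zip(a, x)) / A
    - abs(df(xbar)) * sum(ai * abs(xi - xbar) for ai, xi in zip(a, x)) / A
)
upper = (sum(ai * f(xi) for ai, xi in zip(a, x)) / A - f(xbar)
         - c / A * sum(ai * (xi - xbar) ** 2 for ai, xi in zip(a, x)))
print(0 <= middle <= upper + 1e-12)   # True: (2.8) holds
```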

Finally, in a similar manner, we obtain an inequality that serves as a counterpart of the Jensen inequality (1.3).

Theorem 5

Let \(f\colon (a,b)\rightarrow \mathbb{R} \) be a strongly convex function with modulus \(c>0\). Suppose \(\boldsymbol{x}=\left ( x_{{1}},\ldots,x_{n}\right ) \in (a,b)^{n}\) and \(\boldsymbol{a}=(a_{1},\ldots,a_{n})\) is a nonnegative n-tuple with \(A_{n}=\sum _{i=1}^{n}a_{i}>0\) and \(\bar{x}=\frac{1}{A_{n}}\sum _{i=1}^{n}a_{i}x_{i}\). Let \(\lambda _{i}\in \lbrack 0,1]\), \(i\in \{1,\ldots,n\}\). Then

$$\begin{aligned} & \frac{1}{A_{n}}\sum _{i=1}^{n}a_{i}f(x_{i})-\frac{1}{A_{n}}\sum _{i=1}^{n}a_{i}f((1-\lambda _{i})\bar{x}+\lambda _{i}x_{i}) \\ & \leq \frac{1}{A_{n}}\sum _{i=1}^{n}a_{i}(1-\lambda _{i})f^{\prime}(x_{i})(x_{i}-\bar{x})-\frac{c}{A_{n}}\sum _{i=1}^{n}a_{i}(1-\lambda _{i})^{2}(\bar{x}-x_{i})^{2}. \end{aligned}$$
(2.11)

Proof

Applying (2.3) with \(x=(1-\lambda _{i})\bar{x}+\lambda _{i}x_{i}\) and \(y=x_{i}\), \(i\in \{1,\ldots,n\}\), we have

$$ f((1-\lambda _{i})\bar{x}+\lambda _{i}x_{i})-f(x_{i})\geq f^{\prime}(x_{i})(1-\lambda _{i})(\bar{x}-x_{i})+c(1-\lambda _{i})^{2}(\bar{x}-x_{i})^{2}. $$

Now multiplying by \(a_{i}\), summing over i, \(i=1,\ldots,n\), and then dividing by \(A_{n}>0\), we get

$$\begin{aligned} & \frac{1}{A_{n}}\sum _{i=1}^{n}a_{i}f((1-\lambda _{i})\bar{x}+ \lambda _{i}x_{i})-\frac{1}{A_{n}}\sum _{i=1}^{n}a_{i}f(x_{i}) \\ & \geq \frac{1}{A_{n}}\sum _{i=1}^{n}a_{i}(1-\lambda _{i})f^{\prime}(x_{i})(\bar{x}-x_{i})+\frac{c}{A_{n}}\sum _{i=1}^{n}a_{i}(1-\lambda _{i})^{2}( \bar {x}-x_{i})^{2}, \end{aligned}$$

which is equivalent to (2.11). □

Again, a direct consequence of Theorem 5 follows by setting \(\lambda _{i}=0\) for \(i=1,\ldots,n\).

Corollary 2

Let \(f\colon (a,b)\rightarrow \mathbb{R} \) be a strongly convex function with modulus \(c>0\). Suppose \(\boldsymbol{x}=\left ( x_{{1}},\ldots,x_{n}\right ) \in (a,b)^{n}\) and \(\boldsymbol{a}=(a_{1},\ldots,a_{n})\) is a nonnegative n-tuple with \(A_{n}=\sum _{i=1}^{n}a_{i}>0\) and \(\bar{x}=\frac{1}{A_{n}}\sum _{i=1}^{n}a_{i}x_{i}\). Then

$$\begin{aligned} & 0\leq \frac{1}{A_{n}}\sum _{i=1}^{n}a_{i}f(x_{i})-f(\bar{x}) \\ & \leq \frac{1}{A_{n}}\sum _{i=1}^{n}a_{i}f^{\prime}(x_{i})x_{i}- \frac{1}{A_{n}^{2}}\sum _{i=1}^{n}a_{i}x_{i}\sum _{i=1}^{n}a_{i}f^{\prime}(x_{i})-\frac{c}{A_{n}}\sum _{i=1}^{n}a_{i}(\bar{x}-x_{i})^{2}. \end{aligned}$$
(2.12)

Remark 1

Our results generalize and improve the main results obtained in [7, 8], which were related to convex functions.

3 The Jensen–Mercer-type inequalities

We embark on further investigation of the Jensen–Mercer inequality (1.5). Along the way, we generalize and improve results (1.6) from [15] and (1.7) from [18].

Theorem 6

Let a function \(f\colon (a,b)\rightarrow \mathbb{R} \) be strongly convex with modulus \(c>0\), and let \(m,M\in (a,b)\), \(m< M\), and \(\lambda _{i}\in \lbrack 0,1]\), \(i\in \{1,\ldots,n\}\). Suppose \(\boldsymbol{x}=\left ( x_{1},\ldots,x_{n}\right ) \in \lbrack m,M]^{n}\) and \(\boldsymbol{a}=(a_{1},\ldots,a_{n})\) is a nonnegative n-tuple with \(A_{n}={\textstyle \sum \nolimits _{i=1}^{n}} a_{i}>0\) and \(\bar{x}=\frac{1}{A_{n}}\sum _{i=1}^{n}a_{i}x_{i}\). Then

$$\begin{aligned} & f(d)+f^{\prime}(d)\left ( m+M-d-\bar{x}\right ) +\frac{c}{A_{n}} \sum _{i=1}^{n}a_{i}(m+M-d-x_{i})^{2} \\ & +\frac{2c(M-m)^{2}}{A_{n}}\sum _{i=1}^{n}a_{i}\lambda _{i}(1- \lambda _{i}) \\ & \leq f(m)+f(M)-\frac{1}{A_{n}}\sum _{i=1}^{n}a_{i}f(x_{i}) \\ & \leq f(e)+f^{\prime}(m)(m-e)+f^{\prime}(M)(M-e)-\frac{1}{A_{n}} \sum _{i=1}^{n}a_{i}f^{\prime}(x_{i})(x_{i}-e) \\ & -\frac{c}{A_{n}}\sum _{i=1}^{n}a_{i}(M-x_{i})(3x_{i}-2e-M)+c(m-e)^{2} \end{aligned}$$
(3.1)

for all \(d,e\in \lbrack m,M]\).

Proof

Let \(\lambda _{i}\in \lbrack 0,1]\), \(x_{i}\in \lbrack m,M]\), and \(y_{i}=m+M-x_{i}\), \(i\in \{1,\ldots,n\}\). Then we can write as convex combinations:

$$\begin{aligned} x_{i} & =\lambda _{i}m+(1-\lambda _{i})M, \\ y_{i} & =(1-\lambda _{i})m+\lambda _{i}M,\text{ \ \ }i\in \{1,\ldots,n\}. \end{aligned}$$

Applying (1.1) twice, we have

$$\begin{aligned} f(m+M-x_{i}) & =f((1-\lambda _{i})m+\lambda _{i}M) \\ & \leq (1-\lambda _{i})f(m)+\lambda _{i}f(M)-c\lambda _{i}(1-\lambda _{i})(M-m)^{2} \\ & =f(m)+f(M)-\lambda _{i}f(m)+\lambda _{i}f(M)-f(M)-c\lambda _{i}(1- \lambda _{i})(M-m)^{2} \\ & =f(m)+f(M)-\left [ \lambda _{i}f(m)+(1-\lambda _{i})f(M)\right ] -c \lambda _{i}(1-\lambda _{i})(M-m)^{2} \\ & \leq f(m)+f(M)-f(\lambda _{i}m+(1-\lambda _{i})M)-2c\lambda _{i}(1- \lambda _{i})(M-m)^{2} \\ & =f(m)+f(M)-f(x_{i})-2c\lambda _{i}(1-\lambda _{i})(M-m)^{2}. \end{aligned}$$

Further, applying (2.3), we get

$$ f(d)+f^{\prime}(d)(m+M-x_{i}-d)+c(m+M-x_{i}-d)^{2}\leq f(m+M-x_{i}), $$

which, combined with the previous inequality, implies

$$\begin{aligned} & f(d)+f^{\prime}(d)(m+M-x_{i}-d)+c(m+M-x_{i}-d)^{2} \\ & \leq f(m+M-x_{i}) \\ & \leq f(m)+f(M)-f(x_{i})-2c\lambda _{i}(1-\lambda _{i})(M-m)^{2}. \end{aligned}$$
(3.2)

Furthermore, for \(x_{i},e\in \lbrack m,M]\), \(i\in \{1,\ldots,n\}\), by (2.3) we have

$$\begin{aligned} f(m)-f(e) & \leq f^{\prime}(m)(m-e)+c(m-e)^{2}, \\ f(M)-f(x_{i}) & \leq f^{\prime}(M)(M-x_{i})+c(M-x_{i})^{2}. \end{aligned}$$
(3.3)

Using (3.3), we have

$$\begin{aligned} & f(m)+f(M)-f(x_{i})-2c\lambda _{i}(1-\lambda _{i})(M-m)^{2} \\ & =f(e)+f(m)-f(e)+f(M)-f(x_{i})-2c\lambda _{i}(1-\lambda _{i})(M-m)^{2} \\ & \leq f(e)+f^{\prime}(m)(m-e)+f^{\prime}(M)(M-x_{i}) \\ & +c(m-e)^{2}+c(M-x_{i})^{2}-2c\lambda _{i}(1-\lambda _{i})(M-m)^{2} \\ & =f(e)+f^{\prime}(m)(m-e)+f^{\prime}(M)(M-e)-f^{\prime}(M)(x_{i}-e) \\ & +c(m-e)^{2}+c(M-x_{i})^{2}-2c\lambda _{i}(1-\lambda _{i})(M-m)^{2}. \end{aligned}$$
(3.4)

Since \(f^{\prime}\) is strongly increasing and \(x_{i}\leq M\), by (2.2) we have \(-f^{\prime}(M)\leq -f^{\prime}(x_{i})-2c(M-x_{i})\), i.e.,

$$\begin{aligned} & f(e)+f^{\prime}(m)(m-e)+f^{\prime}(M)(M-e)-f^{\prime}(M)(x_{i}-e) \\ & +c(m-e)^{2}+c(M-x_{i})^{2}-2c\lambda _{i}(1-\lambda _{i})(M-m)^{2} \\ & \leq f(e)+f^{\prime}(m)(m-e)+f^{\prime}(M)(M-e)-\left [ f^{\prime}(x_{i})+2c(M-x_{i})\right ] (x_{i}-e) \\ & +c(m-e)^{2}+c(M-x_{i})^{2}-2c\lambda _{i}(1-\lambda _{i})(M-m)^{2}. \end{aligned}$$
(3.5)

Combining (3.4) and (3.5), we get

$$\begin{aligned} & f(m)+f(M)-f(x_{i})-2c\lambda _{i}(1-\lambda _{i})(M-m)^{2} \\ & \leq f(e)+f^{\prime}(m)(m-e)+f^{\prime}(M)(M-e)-\left [ f^{\prime}(x_{i})+2c(M-x_{i})\right ] (x_{i}-e) \\ & +c(m-e)^{2}+c(M-x_{i})^{2}-2c\lambda _{i}(1-\lambda _{i})(M-m)^{2} \\ & =f(e)+f^{\prime}(m)(m-e)+f^{\prime}(M)(M-e) \\ & -f^{\prime}(x_{i})(x_{i}-e)-2c(M-x_{i})(x_{i}-e) \\ & +c(m-e)^{2}+c(M-x_{i})^{2}-2c\lambda _{i}(1-\lambda _{i})(M-m)^{2}. \end{aligned}$$
(3.6)

Finally, from (3.2) and (3.6) we have

$$\begin{aligned} & f(d)+f^{\prime}(d)(m+M-x_{i}-d)+c(m+M-x_{i}-d)^{2} \\ & \leq f(m)+f(M)-f(x_{i})-2c\lambda _{i}(1-\lambda _{i})(M-m)^{2} \\ & \leq f(e)+f^{\prime}(m)(m-e)+f^{\prime}(M)(M-e) \\ & -f^{\prime}(x_{i})(x_{i}-e)-2c(M-x_{i})(x_{i}-e) \\ & +c(m-e)^{2}+c(M-x_{i})^{2}-2c\lambda _{i}(1-\lambda _{i})(M-m)^{2}. \end{aligned}$$

Multiplying it by \(a_{i}\), summing over \(i,i=1,\ldots,n\), and then dividing by \(A_{n}>0\), we get (3.1). □

Remark 2

In particular, if we set \(A_{n}=1\) and \(d=m+M-\bar{x}\), then the first inequality in (3.1) reduces to (1.7) from [18]; hence (3.1) generalizes that result. Furthermore, our result (3.1) improves (1.6) from [15].

As an easy consequence of the previous theorem, we get the following inequality of the Jensen–Mercer type.

Corollary 3

Let the assumptions of Theorem 6 hold. Then

$$\begin{aligned} & f(m+M-\bar{x})+\frac{c}{A_{n}}\sum _{i=1}^{n}a_{i}(\bar{x}-x_{i})^{2}+\frac{2c(M-m)^{2}}{A_{n}}\sum _{i=1}^{n}a_{i}\lambda _{i}(1-\lambda _{i}) \\ & \leq f(m)+f(M)-\frac{1}{A_{n}}\sum _{i=1}^{n}a_{i}f(x_{i}) \\ & \leq f(\bar{x})+f^{\prime}(m)(m-\bar{x})+f^{\prime}(M)(M-\bar{x})- \frac {1}{A_{n}}\sum _{i=1}^{n}a_{i}f^{\prime}(x_{i})(x_{i}-\bar{x}) \\ & -\frac{c}{A_{n}}\sum _{i=1}^{n}a_{i}(M-x_{i})(3x_{i}-2\bar{x}-M)+c(m- \bar {x})^{2}. \end{aligned}$$
(3.7)

Proof

Choosing \(e=\bar{x}=\frac{1}{A_{n}}\sum _{i=1}^{n}a_{i}x_{i}\) and \(d=m+M-\bar{x}\), from (3.1) we get (3.7). □
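The following sketch checks the first inequality in (3.7) numerically. As in the proof of Theorem 6, \(\lambda _{i}=(M-x_{i})/(M-m)\) is used, and the function \(f(t)=t^{2}+e^{t}\) with \(c=1\) on \([m,M]=[0,3]\) is an assumed example.

```python
import math, random

f = lambda t: t * t + math.exp(t)
c, m, M = 1.0, 0.0, 3.0

x = [random.uniform(m, M) for _ in range(6)]
a = [random.uniform(0, 1) for _ in range(6)]
A = sum(a)
xbar = sum(ai * xi for ai, xi in zip(a, x)) / A
lam = [(M - xi) / (M - m) for xi in x]   # so that x_i = lam_i*m + (1 - lam_i)*M

lhs = (f(m + M - xbar)
       + c / A * sum(ai * (xbar - xi) ** 2 for ai, xi in zip(a, x))
       + 2 * c * (M - m) ** 2 / A * sum(ai * li * (1 - li) for ai, li in zip(a, lam)))
rhs = f(m) + f(M) - sum(ai * f(xi) for ai, xi in zip(a, x)) / A
print(lhs <= rhs + 1e-12)   # True: the first inequality in (3.7) holds
```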

4 Applications to strong f-divergences and the Shannon entropy

Let \(\mathcal{P}_{n}=\left \{ \mathbf{p}=(p_{1},\ldots,p_{n})\colon p_{1},\ldots,p_{n}>0,{\textstyle \sum \nolimits _{i=1}^{n}} p_{i}=1\right \} \) be the set of all complete finite discrete probability distributions. The restriction to positive distributions is only for convenience. If we take \(p_{i}=0\) for some \(i\in \left \{ 1,\ldots,n\right \} \), then in the following results we need to interpret undefined expressions as \(f(0)=\lim _{t\rightarrow 0+}f(t)\), \(0f\left ( \frac{0}{0}\right ) =0\), and \(0f\left ( \frac{e}{0}\right ) =\lim _{\varepsilon \rightarrow 0+}\varepsilon f \left ( \dfrac{e}{\varepsilon}\right ) =e\lim _{t\rightarrow \infty} \frac{f(t)}{t}\), \(e>0\).

I. Csiszár [5] introduced an important class of statistical divergences by means of convex functions.

Definition 1

Let \(f\colon (0,\infty )\rightarrow \mathbb{R} \) be a convex function, and let \(\mathbf{p,q}\in \mathcal{P}_{n}\). The Csiszár f-divergence is defined as

$$ D_{f}(\mathbf{q},\mathbf{p})=\sum \limits _{i=1}^{n}p_{i}f\left ( \frac{q_{i}}{p_{i}}\right ) . $$
(4.1)
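A small computational sketch of definition (4.1) follows; the generating function \(f(t)=t\ln t\), which yields the Kullback–Leibler divergence, and the two distributions are assumptions chosen for illustration.

```python
import math

def csiszar_divergence(q, p, f):
    """D_f(q, p) = sum_i p_i * f(q_i / p_i) for positive distributions p and q."""
    return sum(pi * f(qi / pi) for qi, pi in zip(q, p))

p = [0.2, 0.3, 0.5]
q = [0.1, 0.4, 0.5]
print(csiszar_divergence(q, p, lambda t: t * math.log(t)))   # Kullback-Leibler divergence of q from p
```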

The f-divergence has deep and fruitful applications in various branches of science (see, e.g., [4, 22] and the references therein), and it appears in the following Csiszár–Körner inequality (see [6]).

Theorem 7

Let \(\mathbf{p,q}\in \mathcal{P}_{n}\). If \(f\colon (0,\infty )\rightarrow \mathbb{R} \) is a convex function, then

$$ 0\leqslant D_{f}(\mathbf{q},\mathbf{p})-f\left ( 1\right ) . $$
(4.2)

Remark 3

If f is normalized, i.e., \(f(1)=0\), then from (4.2) it follows that

$$ 0\leqslant D_{f}(\mathbf{q},\mathbf{p})\text{ \ \ with \ \ }D_{f}(\mathbf{q},\mathbf{p})=0\text{ \ \ if and only if \ \ } \mathbf{\mathbf{q}}=\mathbf{\mathbf{p.}} $$
(4.3)

Two distributions q and p are very similar if \(D_{f}(\mathbf{q},\mathbf{p})\) is very close to zero.

Recently, in [10] a new concept of f-divergences was introduced: when (4.1) is defined for a strongly convex function f, it is denoted by \(\tilde{D}_{f}(\mathbf{q},\mathbf{p})\) and is referred to as a strong f-divergence. Accordingly, the following improvement of the Csiszár–Körner inequality for strong f-divergences was obtained in [10].

Theorem 8

Let \(\mathbf{p,q}\in \mathcal{P}_{n}\). If \(f\colon (0,\infty )\rightarrow \mathbb{R} \) is a strongly convex function with modulus \(c>0\), then

$$ 0\leqslant \tilde{D}_{f}(\mathbf{q},\mathbf{p})-f\left ( 1\right ) -c \tilde{D}_{\chi ^{2}}(\mathbf{q},\mathbf{p}), $$
(4.4)

where \(\tilde{D}_{\chi ^{2}}(\mathbf{q},\mathbf{p})={\textstyle \sum \limits _{i=1}^{n}} p_{i}\left ( \frac{q_{i}}{p_{i}} \right ) ^{2}-1\).

Remark 4

Here \(\tilde{D}_{\chi ^{2}}(\mathbf{q},\mathbf{p})={\textstyle \sum \limits _{i=1}^{n}} p_{i}\left ( \frac{q_{i}}{p_{i}} \right ) ^{2}-1\) denotes the strong chi-squared distance obtained for the strongly convex function \(f(x)=(x-1)^{2}\) with modulus \(c=1\).

Additionally, if \(f(1)=0\), then from (4.4) we have

$$ 0\leqslant c\tilde{D}_{\chi ^{2}}(\mathbf{q},\mathbf{p}) \leqslant \tilde{D}_{f}(\mathbf{q},\mathbf{p}). $$
(4.5)

Inequalities (4.4) and (4.5) improve (4.2) and (4.3).
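A brief numerical sketch of (4.4) and (4.5) follows; the normalized, strongly convex generator \(f(t)=(t-1)^{2}+t\ln t\) with modulus \(c=1\) and the two distributions are assumed here only for the check.

```python
import math

f = lambda t: (t - 1) ** 2 + t * math.log(t)   # f(1) = 0; strongly convex with modulus c = 1
c = 1.0

p = [0.2, 0.3, 0.5]
q = [0.1, 0.4, 0.5]
ratios = [qi / pi for qi, pi in zip(q, p)]

D_f   = sum(pi * f(r) for pi, r in zip(p, ratios))          # strong f-divergence
D_chi = sum(pi * r ** 2 for pi, r in zip(p, ratios)) - 1    # strong chi-squared distance
print(0 <= c * D_chi <= D_f + 1e-12)   # True: (4.5), and hence (4.4), holds for this data
```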

We further use the results from the previous sections to prove new estimates for strong f-divergences.

Corollary 4

Let \(\mathbf{p,q}\in \mathcal{P}_{n}\), \(r_{i}=1-\lambda _{i}\left ( 1-\frac{q_{i}}{p_{i}}\right )\), and \(\lambda _{i}\in \lbrack 0,1]\), \(i\in \{1,\ldots,n\}\). Let \(f\colon (0,\infty )\rightarrow \mathbb{R} \) be a strongly convex function with modulus \(c>0\). Then

$$\begin{aligned} 0 & \leq \left \vert \sum _{i=1}^{n}p_{i}\left \vert f\left ( \frac{q_{i}}{p_{i}}\right ) -f\left ( r_{i}\right ) -c(1-\lambda _{i})^{2}\left ( 1- \frac{q_{i}}{p_{i}}\right ) ^{2}\right \vert -\sum _{i=1}^{n}(1- \lambda _{i})\left \vert f^{\prime}\left ( r_{i}\right ) \right \vert \left \vert p_{i}-q_{i}\right \vert \right \vert \\ & \leq \tilde{D}_{f}(\mathbf{q},\mathbf{p})-\sum _{i=1}^{n}p_{i}f(r_{i}) \\ & -\sum _{i=1}^{n}(1-\lambda _{i})f^{\prime}(r_{i})\left ( p_{i}-q_{i} \right ) -c\sum _{i=1}^{n}(1-\lambda _{i})^{2} \frac{\left ( p_{i}-q_{i}\right ) ^{2}}{p_{i}}. \end{aligned}$$
(4.6)

In particular, we have

$$\begin{aligned} 0 & \leq \left \vert \sum _{i=1}^{n}p_{i}\left \vert f\left ( \frac{q_{i}}{p_{i}}\right ) -f(1)-c\left ( \frac{q_{i}}{p_{i}}-1\right ) ^{2} \right \vert -\left \vert f^{\prime}(1)\right \vert \cdot \sum _{i=1}^{n} \left \vert p_{i}-q_{i}\right \vert \right \vert \\ & \leq \tilde{D}_{f}(\mathbf{q},\mathbf{p})-f(1)-c\tilde{D}_{\chi ^{2}}(\mathbf{q},\mathbf{p}), \end{aligned}$$
(4.7)

where \(\tilde{D}_{\chi ^{2}}(\mathbf{q},\mathbf{p})={\textstyle \sum \nolimits _{i=1}^{n}} p_{i}\left ( \frac{q_{i}}{p_{i}}\right ) ^{2}-1\).

If, in addition, f is normalized, then

$$ 0\leq \sum _{i=1}^{n}p_{i}\left \vert f\left ( \frac{q_{i}}{p_{i}} \right ) -c\left ( \frac{q_{i}}{p_{i}}-1\right ) ^{2}\right \vert \leq \tilde{D}_{f}(\mathbf{q},\mathbf{p})-c\tilde{D}_{\chi ^{2}}(\mathbf{q}, \mathbf{p}). $$
(4.8)

Proof

Applying (2.4) to \(x_{i}=\frac{q_{i}}{p_{i}}\), \(a_{i}=p_{i}\) with \(\bar{x}=\frac{1}{A_{n}}\sum _{i=1}^{n}a_{i}x_{i}=\sum _{i=1}^{n}q_{i}=1\) and \(\hat{x}_{i}=(1-\lambda _{i})\bar{x}+\lambda _{i}x_{i}=(1-\lambda _{i})+ \lambda _{i}\frac{q_{i}}{p_{i}}=1-\lambda _{i}\left ( 1- \frac{q_{i}}{p_{i}}\right ) =r_{i}\), \(i\in \{1,\ldots,n\}\), we get

$$\begin{aligned} 0 & \leq \left \vert \sum _{i=1}^{n}p_{i}\left \vert f\left ( \frac{q_{i}}{p_{i}}\right ) -f\left ( r_{i}\right ) -c(1-\lambda _{i})^{2}\left ( 1- \frac{q_{i}}{p_{i}}\right ) ^{2}\right \vert -\sum _{i=1}^{n}p_{i}(1- \lambda _{i})\left \vert f^{\prime}\left ( r_{i}\right ) \right \vert \left \vert 1-\frac{q_{i}}{p_{i}}\right \vert \right \vert \\ & \leq \sum _{i=1}^{n}p_{i}f\left ( \frac{q_{i}}{p_{i}}\right ) - \sum _{i=1}^{n}p_{i}f\left ( r_{i}\right ) \\ & -\sum _{i=1}^{n}p_{i}(1-\lambda _{i})f^{\prime}\left ( r_{i}\right ) \left ( 1-\frac{q_{i}}{p_{i}}\right ) -c\sum _{i=1}^{n}p_{i}(1- \lambda _{i})^{2}\left ( 1-\frac{q_{i}}{p_{i}}\right ) ^{2}, \end{aligned}$$

which is equivalent to (4.6).

If \(\lambda _{i}=0\), \(i=1,\ldots,n\), then \(r_{i}=1\), \(i=1,\ldots,n\), and from (4.6) we get (4.7). If, in addition, f is normalized, i.e., \(f(1)=0\), then (4.7) implies (4.8). □

Corollary 5

Let \(\lambda _{i}\in \lbrack 0,1]\), \(i\in \{1,\ldots,n\}\), and let \(\mathbf{p,q}\in \mathcal{P}_{n}\). Suppose \(f\colon (0,\infty )\rightarrow \mathbb{R} \) is a strongly convex function with modulus \(c>0\). Then

$$\begin{aligned} & \tilde{D}_{f}(\mathbf{q},\mathbf{p})-\sum _{i=1}^{n}p_{i}f\left ( 1- \lambda _{i}\left ( 1-\frac{q_{i}}{p_{i}}\right ) \right ) \\ & \leq \sum _{i=1}^{n}(1-\lambda _{i})f^{\prime}\left ( \frac{q_{i}}{p_{i}}\right ) \left ( q_{i}-p_{i}\right ) -c\sum _{i=1}^{n}p_{i}(1- \lambda _{i})^{2}\left ( 1-\frac{q_{i}}{p_{i}}\right ) ^{2}. \end{aligned}$$
(4.9)

In particular,

$$ \tilde{D}_{f}(\mathbf{q},\mathbf{p})-f\left ( 1\right ) \leq \sum _{i=1}^{n}f^{\prime}\left ( \frac{q_{i}}{p_{i}}\right ) \left ( q_{i}-p_{i} \right ) -c\tilde{D}_{\chi ^{2}}(\mathbf{q},\mathbf{p}). $$
(4.10)

If, in addition, f is normalized, then

$$ 0\leq \tilde{D}_{f}(\mathbf{q},\mathbf{p})\leq \sum _{i=1}^{n}f^{ \prime}\left ( \frac{q_{i}}{p_{i}}\right ) \left ( q_{i}-p_{i}\right ) -c\tilde{D}_{\chi ^{2}}(\mathbf{q},\mathbf{p}). $$
(4.11)

Proof

Applying (2.11) to \(x_{i}=\frac{q_{i}}{p_{i}}\) and \(a_{i}=p_{i}\) with \(\bar{x}=\sum _{i=1}^{n}a_{i}x_{i}=\sum _{i=1}^{n}q_{i}=1\), we get

$$\begin{aligned} & \sum _{i=1}^{n}p_{i}f\left ( \frac{q_{i}}{p_{i}}\right ) -\sum _{i=1}^{n}p_{i}f\left ( 1-\lambda _{i}+\lambda _{i}\frac{q_{i}}{p_{i}} \right ) \\ & \leq \sum _{i=1}^{n}p_{i}(1-\lambda _{i})f^{\prime}\left ( \frac{q_{i}}{p_{i}}\right ) \left ( \frac{q_{i}}{p_{i}}-1\right ) -c\sum _{i=1}^{n}p_{i}(1-\lambda _{i})^{2}\left ( 1-\frac{q_{i}}{p_{i}}\right ) ^{2}, \end{aligned}$$

which is equivalent to (4.9).

Choosing \(\lambda _{i}=0\), \(i=1,\ldots,n\), from (4.9) we get (4.10). Further, for a normalized function f, (4.10) implies (4.11). □

Corollary 6

Let \(f\colon (0,\infty )\rightarrow \mathbb{R} \) be a strongly convex function with modulus \(c>0\). Let \(\mathbf{p,q}\in \mathcal{P}_{n}\) with \(\frac{q_{i}}{p_{i}}\in \lbrack m,M]\), \(0< m< M\), and \(\lambda _{i}\in \lbrack 0,1]\), \(i\in \{1,\ldots,n\}\). Then

$$\begin{aligned} & f(d)+f^{\prime}(d)\left ( m+M-d-1\right ) +c\sum _{i=1}^{n}p_{i} \left ( m+M-d-\frac{q_{i}}{p_{i}}\right ) ^{2} \\ & +2c(M-m)^{2}\sum _{i=1}^{n}p_{i}\lambda _{i}(1- \lambda _{i}) \\ & \leq f(m)+f(M)-\tilde{D}_{f}(\mathbf{q},\mathbf{p}) \\ & \leq f(e)+f^{\prime}(m)(m-e)+f^{\prime}(M)(M-e)-\sum _{i=1}^{n}p_{i}f^{\prime}\left ( \frac{q_{i}}{p_{i}}\right ) \left ( \frac{q_{i}}{p_{i}}-e\right ) \\ & -c\sum _{i=1}^{n}p_{i}\left ( M-\frac{q_{i}}{p_{i}}\right ) \left ( 3 \frac{q_{i}}{p_{i}}-2e-M\right ) +c(m-e)^{2} \end{aligned}$$
(4.12)

for all \(d,e\in \lbrack m,M]\).

In particular,

$$\begin{aligned} & f(m+M-1)+c\tilde{D}_{\chi ^{2}}(\mathbf{q},\mathbf{p})+2c(M-m)^{2}\sum _{i=1}^{n}p_{i}\lambda _{i}(1-\lambda _{i}) \\ & \leq f(m)+f(M)-\tilde{D}_{f}(\mathbf{q},\mathbf{p}) \\ & \leq f(1)+f^{\prime}(m)(m-1)+f^{\prime}(M)(M-1)-\sum _{i=1}^{n}f^{ \prime }\left ( \frac{q_{i}}{p_{i}}\right ) \left ( q_{i}-p_{i} \right ) \\ & -c\sum _{i=1}^{n}p_{i}\left ( M-\frac{q_{i}}{p_{i}}\right ) \left ( 3 \frac{q_{i}}{p_{i}}-2-M\right ) +c(m-1)^{2}. \end{aligned}$$
(4.13)

If, in addition, f is normalized, then

$$\begin{aligned} & f(m+M-1)+c\tilde{D}_{\chi ^{2}}(\mathbf{q},\mathbf{p})+2c(M-m)^{2}\sum _{i=1}^{n}p_{i}\lambda _{i}(1-\lambda _{i}) \\ & \leq f(m)+f(M)-\tilde{D}_{f}(\mathbf{q},\mathbf{p}) \\ & \leq f^{\prime}(m)(m-1)+f^{\prime}(M)(M-1)-\sum _{i=1}^{n}f^{\prime} \left ( \frac{q_{i}}{p_{i}}\right ) \left ( q_{i}-p_{i}\right ) \\ & -c\sum _{i=1}^{n}p_{i}\left ( M-\frac{q_{i}}{p_{i}}\right ) \left ( 3 \frac{q_{i}}{p_{i}}-2-M\right ) +c(m-1)^{2}. \end{aligned}$$
(4.14)

Proof

Applying (3.1) to \(x_{i}=\frac{q_{i}}{p_{i}}\) and \(a_{i}=p_{i}\) with \(\bar{x}=\frac{1}{A_{n}}\sum _{i=1}^{n}a_{i}x_{i}=\sum _{i=1}^{n}q_{i}=1\), we get (4.12).

In a particular case, for \(e=1\) and \(d=m+M-1\), from (4.12) we get (4.13). If, in addition, \(f(1)=0\), then (4.13) implies (4.14). □

Applying the previous corollaries to the corresponding generating strongly convex function f, we derive new estimates for some well-known divergences, which are particular cases of the strong f-divergence. Here we consider a few of the most commonly used divergences.

Example 1

The strong Kullback–Leibler divergence of \(\mathbf{p,q}\in \mathcal{P}_{n}\) is defined by

$$ \tilde{D}_{KL}(\mathbf{\mathbf{q},\mathbf{p}})={\displaystyle \sum \limits _{i=1}^{n}} q_{i}\ln \left ( \frac{q_{i}}{p_{i}}\right ) , $$
(4.15)

where the generating function is \(f(t)=t\ln t\) for \(t\in (0,\infty )\). Fix \(l>0\). Since \(f^{\prime \prime}(t)=\frac{1}{t}\), we have \(f^{\prime \prime }\geqslant \frac{1}{l}\) on \([m,l]\), \(0< m< l\), so the function \(f|_{[m,l]}\) is strongly convex with modulus \(c=\frac{1}{2l}\) (recall that a twice differentiable function satisfying \(f^{\prime \prime}\geqslant 2c\) is strongly convex with modulus c, since \(f-c\cdot id^{2}\) is then convex).

Applying inequalities (4.6), (4.8), (4.9), (4.11), (4.12), and (4.14) to \(f(t)=t\ln t\) with \(c=\frac{1}{2l}\), we may derive new estimates for the strong Kullback–Leibler divergence \(\tilde{D}_{KL}(\mathbf{\mathbf{q},\mathbf{p}})\).
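As an illustration of how such estimates can be evaluated, the following sketch checks (4.11) for the strong Kullback–Leibler divergence; the distributions and the choice \(l=\max _{i}q_{i}/p_{i}\) (so that all ratios lie in \((0,l]\) and \(c=\frac{1}{2l}\)) are assumptions made for this example.

```python
import math

p = [0.2, 0.3, 0.5]
q = [0.1, 0.4, 0.5]
ratios = [qi / pi for qi, pi in zip(q, p)]
l = max(ratios)           # all ratios lie in (0, l]
c = 1 / (2 * l)           # modulus of strong convexity of f(t) = t*log(t) on (0, l]

f  = lambda t: t * math.log(t)
df = lambda t: math.log(t) + 1

D_KL  = sum(pi * f(r) for pi, r in zip(p, ratios))
D_chi = sum(pi * r ** 2 for pi, r in zip(p, ratios)) - 1
upper = sum(df(r) * (qi - pi) for r, qi, pi in zip(ratios, q, p)) - c * D_chi
print(0 <= D_KL <= upper + 1e-12)   # True: (4.11) holds
```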

Example 2

The strong squared Hellinger divergence of \(\mathbf{p,q}\in \mathcal{P}_{n}\) is defined by

$$ \tilde{D}_{h^{2}}(\mathbf{q,p})=\sum _{i=1}^{n}(\sqrt{p_{i}}-\sqrt{q_{i}})^{2}, $$

where the generating function is \(f(t)=\left ( \sqrt{t}-1\right ) ^{2}\) for \(t\in (0,\infty )\). Fix \(l>0\). Since \(f^{\prime \prime}(t)=\frac{1}{2\sqrt{t^{3}}}\), we have \(f^{\prime \prime}\geqslant \frac{1}{2\sqrt{l^{3}}}\) on \([m,l]\), \(0< m< l\), and the function \(f|_{[m,l]}\) is strongly convex with modulus \(c=\frac{1}{4\sqrt{l^{3}}}\).

Applying inequalities (4.6), (4.8), (4.9), (4.11), (4.12), and (4.14) to \(f(t)=\left ( \sqrt{t}-1\right ) ^{2}\) with \(c=\frac{1}{4\sqrt{l^{3}}}\), we may derive new estimates for the strong squared Hellinger divergence \(\tilde{D}_{h^{2}}(\mathbf{\mathbf{q},\mathbf{p}})\).

Example 3

The strong Bhattacharya distance of \(\mathbf{p,q}\in \mathcal{P}_{n}\) is defined by

$$ \tilde{D}_{B}(\mathbf{q,p})=-{\displaystyle \sum \limits _{i=1}^{n}} \sqrt{p_{i}q_{i}}, $$

where the generating function is \(f(t)=-\sqrt{t}\) for \(t\in (0,\infty )\). Fix \(l>0\). Since \(f^{\prime \prime}(t)=\frac{1}{4\sqrt{t^{3}}}\), we have \(f^{\prime \prime}\geqslant \frac{1}{4\sqrt{l^{3}}}\) on \([m,l]\), \(0< m< l\), and the function \(f|_{[m,l]}\) is strongly convex with modulus \(c=\frac{1}{8\sqrt{l^{3}}}\).

Applying inequalities (4.6), (4.7), (4.8), (4.9), (4.10), (4.11), and (4.12) to \(f(t)=-\sqrt{t}\) with \(c=\frac{1}{8\sqrt{l^{3}}}\), we may derive new estimates for the strong Bhattacharya distance \(\tilde{D}_{B}(\mathbf{\mathbf{q},\mathbf{p}})\).

Example 4

The strong Jeffreys distance of \(\mathbf{p,q}\in \mathcal{P}_{n}\) is defined by

$$ \tilde{D}_{J}(\mathbf{q,p})={\displaystyle \sum \limits _{i=1}^{n}} (q_{i}-p_{i})\ln \frac{q_{i}}{p_{i}}=\tilde{D}_{KL}(\mathbf{\mathbf{q},\mathbf{p}})+\tilde{D}_{KL}(\mathbf{p,q}), $$

where the generating function is \(f(t)=(t-1)\ln t\) for \(t\in (0,\infty )\). Fix \(l>0\). Since \(f^{\prime \prime}(t)=\frac{t+1}{t^{2}}\), we have \(f^{\prime \prime}\geqslant \frac{l+1}{l^{2}}\) on \([m,l]\), \(0< m< l\), and the function \(f|_{[m,l]}\) is strongly convex with modulus \(c=\frac{l+1}{2l^{2}}\).

Applying inequalities (4.6), (4.8), (4.9), (4.11), (4.12), and (4.14) to \(f(t)=(t-1)\ln t\) with \(c=\frac{l+1}{2l^{2}}\), we may derive new estimates for the strong Jeffreys distance \(\tilde{D}_{J}(\mathbf{\mathbf{q},\mathbf{p}})\).

Example 5

The strong Jensen–Shannon divergence of \(\mathbf{p,q}\in \mathcal{P}_{n}\) is defined by

$$\begin{aligned} \tilde{D}_{JS}(\mathbf{q},\mathbf{p}) & =\frac{1}{2}\left [ \sum \limits _{i=1}^{n}q_{i}\ln \frac{2q_{i}}{p_{i}+q_{i}}+\sum \limits _{i=1}^{n}p_{i}\ln \frac{2p_{i}}{p_{i}+q_{i}}\right ] \\ & =\frac{1}{2}\left [ \tilde{D}_{KL}\left ( \mathbf{\mathbf{q},}\frac{\mathbf{\mathbf{p+q}}}{2}\right ) +\tilde{D}_{KL}\left ( \mathbf{\mathbf{p},}\frac{\mathbf{\mathbf{p+q}}}{2}\right ) \right ] , \end{aligned}$$

where the generating function is \(f(t)=\frac{1}{2}\left ( t\ln \frac{2t}{1+t}+\ln \frac{2}{1+t}\right ) \) for \(t\in (0,\infty )\). Fix \(l>0\). Since \(f^{\prime \prime}(t)=\frac{1}{2t(1+t)}\), we have \(f^{\prime \prime}\geqslant \frac{1}{2l(1+l)}\) on \([m,l]\), \(0< m< l\), and the function \(f|_{[m,l]}\) is strongly convex with modulus \(c=\frac{1}{4l(1+l)}\).

Applying inequalities (4.6), (4.8), (4.9), (4.11), (4.12), and (4.14) to \(f(t)=\frac{1}{2}\big ( t\ln \frac{2t}{1+t}+ \ln \frac{2}{1+t}\big ) \) with \(c=\frac{1}{4l(1+l)}\), we may derive new estimates for the strong Jensen–Shannon divergence \(\tilde{D}_{JS}(\mathbf{\mathbf{q},\mathbf{p}})\).

We now consider the Shannon entropy [25], defined for a random variable X in terms of its probability distribution p as

$$ S(\mathbf{p})={\displaystyle \sum \limits _{i=1}^{n}} p_{i}\ln \frac{1}{p_{i}}=-{\displaystyle \sum \limits _{i=1}^{n}} p_{i}\ln p_{i}. $$
(4.16)

It quantifies the uncertainty encoded in p and satisfies the relation

$$ 0\leqslant S(\mathbf{p})\leqslant \ln n. $$
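For completeness, here is a tiny computational sketch of (4.16) and of these elementary bounds (the distribution is an arbitrary example):

```python
import math

def shannon_entropy(p):
    """S(p) = -sum_i p_i * ln(p_i) for a positive probability vector p."""
    return -sum(pi * math.log(pi) for pi in p)

p = [0.2, 0.3, 0.5]
S = shannon_entropy(p)
print(S, 0 <= S <= math.log(len(p)))   # value of S(p) and the bound 0 <= S(p) <= ln n
```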

Using the results from the previous sections, we obtain new estimates for the Shannon entropy.

Corollary 7

Let \(l>0\), and let \(\mathbf{p}\in \mathcal{P}_{n}\) be such that \(\frac{1}{p_{1}},\ldots,\frac{1}{p_{n}}\in (0,l]\). Let \(\bar{p}_{i}=n-\lambda _{i}\left ( n-\frac{1}{p_{i}}\right ) \), \(\lambda _{i}\in \lbrack 0,1]\), \(i\in \{1,\ldots,n\}\). Then

$$\begin{aligned} S(\mathbf{p}) & \leq S(\mathbf{p})+\left \vert \sum _{i=1}^{n}p_{i} \left \vert \ln (p_{i}\bar{p}_{i})-\frac{(1-\lambda _{i})^{2}}{2l^{2}}\left ( n- \frac{1}{p_{i}}\right ) ^{2}\right \vert -\sum _{i=1}^{n} \frac{p_{i}}{\bar{p}_{i}}(1-\lambda _{i})\left \vert n-\frac{1}{p_{i}}\right \vert \right \vert \\ & \leq \sum _{i=1}^{n}p_{i}\ln \bar{p}_{i}+\sum _{i=1}^{n} \frac{p_{i}}{\bar{p}_{i}}(1-\lambda _{i})\left ( n-\frac{1}{p_{i}} \right ) -\frac{1}{2l^{2}}\sum _{i=1}^{n}p_{i}(1-\lambda _{i})^{2}\left ( n-\frac{1}{p_{i}} \right ) ^{2}. \end{aligned}$$
(4.17)

In particular, we have

$$\begin{aligned} S(\mathbf{p}) & \leq S(\mathbf{p})+\left \vert \sum _{i=1}^{n}p_{i} \left \vert \ln p_{i}\bar{p}_{i}-\frac{1}{2l^{2}}\left ( n- \frac{1}{p_{i}}\right ) ^{2}\right \vert -\sum _{i=1}^{n} \frac{p_{i}}{\bar{p}_{i}}\left \vert n-\frac {1}{p_{i}}\right \vert \right \vert \\ & \leq \sum _{i=1}^{n}p_{i}\ln \bar{p}_{i}+\sum _{i=1}^{n} \frac{p_{i}}{\bar {p}_{i}}\left ( n-\frac{1}{p_{i}}\right ) - \frac{1}{2l^{2}}\sum _{i=1}^{n}p_{i}\left ( n-\frac{1}{p_{i}}\right ) ^{2}. \end{aligned}$$
(4.18)

Proof

Applying (2.4) to the function \(f(t)=-\ln t\), \(t\in (0,l]\), strongly convex with modulus \(c=\frac{1}{2l^{2}}\), and \(x_{i}=\frac{1}{p_{i}}\) and \(a_{i}=p_{i}\) with \(\bar{x}=\frac{1}{A_{n}}\sum _{i=1}^{n}a_{i}x_{i}=\sum _{i=1}^{n}p_{i}\frac{1}{p_{i}}=n\) and \(\hat{x}_{i}=(1-\lambda _{i})\bar {x}+\lambda _{i}x_{i}=(1-\lambda _{i})n+ \lambda _{i}\frac{1}{p_{i}}=n-\lambda _{i}\left ( n-\frac{1}{p_{i}} \right ) =\bar{p}_{i}\), \(i\in \{1,\ldots,n\}\), we get

$$\begin{aligned} 0 & \leq \left \vert \sum _{i=1}^{n}p_{i}\left \vert -\ln \frac{1}{p_{i}}+\ln \bar{p}_{i}-\frac{(1-\lambda _{i})^{2}}{2l^{2}}\left ( n- \frac{1}{p_{i}}\right ) ^{2}\right \vert -\sum _{i=1}^{n}p_{i}(1- \lambda _{i})\left \vert \frac{1}{\bar{p}_{i}}\right \vert \left \vert n-\frac{1}{p_{i}}\right \vert \right \vert \\ & \leq -\sum _{i=1}^{n}p_{i}\ln \frac{1}{p_{i}}+\sum _{i=1}^{n}p_{i} \ln \bar{p}_{i}+\sum _{i=1}^{n}\frac{p_{i}}{\bar{p}_{i}}(1-\lambda _{i}) \left ( n-\frac{1}{p_{i}}\right ) \\ & -\frac{1}{2l^{2}}\sum _{i=1}^{n}p_{i}(1-\lambda _{i})^{2}\left ( n- \frac{1}{p_{i}}\right ) ^{2}, \end{aligned}$$

which is equivalent to (4.17).

Choosing \(\lambda _{i}=0\), \(i=1,\ldots,n\), from (4.17) we get (4.18). □

Corollary 8

Let \(l>0\), let \(\mathbf{p}\in \mathcal{P}_{n}\) be such that \(\frac{1}{p_{1}},\ldots,\frac{1}{p_{n}}\in (0,l]\), and let \(\lambda _{i}\in \lbrack 0,1]\), \(i\in \{1,\ldots,n\}\). Then

$$\begin{aligned} & \sum _{i=1}^{n}p_{i}^{2}(1-\lambda _{i})\left ( \frac{1}{p_{i}}-n \right ) +\frac{1}{2l^{2}}\sum _{i=1}^{n}p_{i}(1-\lambda _{i})^{2} \left ( \frac{1}{p_{i}}-n\right ) ^{2} \\ & +\sum _{i=1}^{n}p_{i}\ln \left ( (1-\lambda _{i})n+ \frac{\lambda _{i}}{p_{i}}\right ) \\ & \leq S(\mathbf{p}). \end{aligned}$$
(4.19)

In particular, we have

$$ \ln n+1-n\sum _{i=1}^{n}p_{i}^{2}+\frac{1}{2l^{2}}\sum _{i=1}^{n}p_{i} \left ( \frac{1}{p_{i}}-n\right ) ^{2}\leq S(\mathbf{p}). $$
(4.20)

Proof

Applying (2.11) to the strongly convex function \(f(t)=-\ln t\), \(t\in (0,l]\), with modulus \(c=\frac{1}{2l^{2}}\), and to \(x_{i}=\frac{1}{p_{i}}\) and \(a_{i}=p_{i}\) with \(\bar{x}=\frac{1}{A_{n}}\sum _{i=1}^{n}a_{i}x_{i}=\sum _{i=1}^{n}p_{i} \frac{1}{p_{i}}=n\), we get

$$\begin{aligned} & -\sum _{i=1}^{n}p_{i}\ln \frac{1}{p_{i}}+\sum _{i=1}^{n}p_{i}\ln \left ( (1-\lambda _{i})n+\frac{\lambda _{i}}{p_{i}}\right ) \\ & \leq -\sum _{i=1}^{n}p_{i}(1-\lambda _{i})\left ( \frac{1}{p_{i}} \right ) ^{-1}\left ( \frac{1}{p_{i}}-n\right ) -\frac{1}{2l^{2}} \sum _{i=1}^{n}p_{i}(1-\lambda _{i})^{2}\left ( \frac{1}{p_{i}}-n\right ) ^{2}, \end{aligned}$$

which is equivalent to (4.19). If we choose \(\lambda _{i}=0\), \(i=1,\ldots,n\), then (4.19) implies (4.20). □
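A quick numerical sketch of the lower bound (4.20) follows; the distribution and the choice \(l=1/\min _{i}p_{i}\) (so that \(\frac{1}{p_{i}}\in (0,l]\) for every i) are assumptions made for this check.

```python
import math

p = [0.2, 0.3, 0.5]
n = len(p)
l = 1 / min(p)            # ensures 1/p_i in (0, l] for every i

S = -sum(pi * math.log(pi) for pi in p)
lower = (math.log(n) + 1 - n * sum(pi ** 2 for pi in p)
         + 1 / (2 * l ** 2) * sum(pi * (1 / pi - n) ** 2 for pi in p))
print(lower <= S + 1e-12)   # True: (4.20) holds
```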

Corollary 9

Let \(0< m< l\), let \(\mathbf{p}\in \mathcal{P}_{n}\) be such that \(\frac{1}{p_{1}},\ldots,\frac{1}{p_{n}}\in \lbrack m,l]\), and let \(\lambda _{i}\in \lbrack 0,1]\), \(i\in \{1,\ldots,n\}\). Then

$$\begin{aligned} & \frac{1}{2l^{2}}\sum _{i=1}^{n}p_{i}\left ( m+l-d-\frac{1}{p_{i}} \right ) ^{2}-\frac{1}{d}\left ( m+l-d-n\right ) \\ & +\frac{(l-m)^{2}}{l^{2}}\sum _{i=1}^{n}\lambda _{i}(1-\lambda _{i})+ \ln \frac{ml}{d} \\ & \leq S(\mathbf{p}) \\ & \leq \frac{1}{m}(e-m)+\frac{1}{l}(e-l)+\sum _{i=1}^{n}p_{i}^{2} \left ( e-\frac{1}{p_{i}}\right ) +\ln \frac{ml}{e} \\ & -\frac{1}{2l^{2}}\sum _{i=1}^{n}p_{i}\left ( l-\frac{1}{p_{i}} \right ) \left ( \frac{3}{p_{i}}-2e-l\right ) + \frac{(m-e)^{2}}{2l^{2}} \end{aligned}$$
(4.21)

for all \(d,e\in \lbrack m,l]\).

In particular, we have

$$\begin{aligned} & \frac{1}{2l^{2}}\sum _{i=1}^{n}p_{i}\left ( n-\frac{1}{p_{i}} \right ) ^{2}+\frac{(l-m)^{2}}{l^{2}}\sum _{i=1}^{n}\lambda _{i}(1- \lambda _{i})+\ln \frac{ml}{m+l-n} \\ & \leq S(\mathbf{p}) \\ & \leq \frac{1}{m}(n-m)+\frac{1}{l}(n-l)+\sum _{i=1}^{n}p_{i}^{2} \left ( n-\frac{1}{p_{i}}\right ) +\ln \frac{ml}{n} \\ & -\frac{1}{2l^{2}}\sum _{i=1}^{n}p_{i}\left ( l-\frac{1}{p_{i}} \right ) \left ( \frac{3}{p_{i}}-2n-l\right ) + \frac{(m-n)^{2}}{2l^{2}}. \end{aligned}$$
(4.22)

Proof

Applying (3.1) to the strongly convex function \(f(t)=-\ln t\), \(t\in (0,l]\), with modulus \(c=\frac{1}{2l^{2}}\) and to \(x_{i}=\frac{1}{p_{i}}\) and \(a_{i}=p_{i}\) with \(\bar{x}=\frac{1}{A_{n}}\sum _{i=1}^{n}a_{i}x_{i}=\sum _{i=1}^{n}p_{i} \frac{1}{p_{i}}=n\), we get

$$\begin{aligned} & -\ln d-\frac{1}{d}\left ( m+l-d-n\right ) +\frac{1}{2l^{2}}\sum _{i=1}^{n}p_{i}\left ( m+l-d-\frac{1}{p_{i}}\right ) ^{2} \\ & +\frac{(l-m)^{2}}{l^{2}}\sum _{i=1}^{n}\lambda _{i}(1-\lambda _{i}) \\ & \leq -\ln m-\ln l+\sum _{i=1}^{n}p_{i}\ln \frac{1}{p_{i}} \\ & \leq -\ln e-\frac{1}{m}(m-e)-\frac{1}{l}(l-e)+\sum _{i=1}^{n}p_{i} \left ( \frac{1}{p_{i}}\right ) ^{-1}\left ( \frac{1}{p_{i}}-e\right ) \\ & -\frac{1}{2l^{2}}\sum _{i=1}^{n}p_{i}\left ( l-\frac{1}{p_{i}} \right ) \left ( l-\frac{3}{p_{i}}+2e\right ) + \frac{(m-e)^{2}}{2l^{2}}, \end{aligned}$$

which is equivalent to (4.21). Choosing \(e=n\) and \(d=m+l-n\), from (4.21) we get (4.22). □

5 New bounds for the Chebyshev functional

One of the fundamental inequalities in probability is the discrete Chebyshev inequality, which we quote in the following form (see [21]).

Theorem 9

Let \(\boldsymbol{a}=(a_{1},\ldots,a_{n})\) be a nonnegative n-tuple with \(A_{n}=\sum _{i=1}^{n}a_{i}>0\), and let \(\boldsymbol{p}=(p_{1},\ldots,p_{n})\) and \(\boldsymbol{q}=(q_{1},\ldots,q_{n})\) be monotonic real n-tuples in the same direction. Then

$$ \frac{1}{A_{n}}\sum _{i=1}^{n}a_{i}p_{i}q_{i}-\frac{1}{A_{n}^{2}} \sum _{i=1}^{n}a_{i}p_{i}\sum _{i=1}^{n}a_{i}q_{i}\geq 0. $$
(5.1)

If p and q are monotonic in the opposite direction, then we have the reverse inequality of (5.1).

Many papers study the Chebyshev functional \(T(\boldsymbol{a;p,q})\) associated with the Chebyshev inequality (5.1), given by

$$ T(\boldsymbol{a;p,q})=A_{n}\sum _{i=1}^{n}a_{i}p_{i}q_{i}-\sum _{i=1}^{n}a_{i}p_{i}\sum _{i=1}^{n}a_{i}q_{i}, $$
(5.2)

and in the normalized form as

$$ \bar{T}(\boldsymbol{p,q})=\frac{1}{n}\sum _{i=1}^{n}p_{i}q_{i}- \frac{1}{n^{2}}\sum _{i=1}^{n}p_{i}\sum _{i=1}^{n}q_{i}. $$
(5.3)

By (5.1) we have

$$ T(\boldsymbol{a;p,q})\geq 0\text{ \ \ and \ \ }\bar{T}( \boldsymbol{p,q})\geq 0. $$

Using the results from Section 2, we obtain improvements of the Chebyshev inequality (5.1), i.e., new bounds for the Chebyshev functionals (5.2) and (5.3) without the assumption of monotonicity.

Corollary 10

Let \(\boldsymbol{a}=(a_{1},\ldots,a_{n})\) be a nonnegative n-tuple with \(A_{n}=\sum _{i=1}^{n}a_{i}>0\), and let \(\boldsymbol{p}=(p_{1},\ldots,p_{n})\) and \(\boldsymbol{q}=(q_{1},\ldots,q_{n})\) be real n-tuples with \(\bar{p}=\frac {1}{A_{n}}\sum _{i=1}^{n}a_{i}p_{i}\) and \(P_{n}=\sum _{i=1}^{n}p_{i}\). Let f be a strongly convex function with modulus \(c>0\), defined on an interval containing \(p_{1},\ldots,p_{n}\) and \(\bar{p}\), such that \(q_{i}=f^{\prime}(p_{i})\), \(i\in \{1,\ldots,n\}\). Then

$$\begin{aligned} 0 & \leq \frac{c}{A_{n}}\sum _{i=1}^{n}a_{i}(p_{i}-\bar{p})^{2} \\ & \leq \left \vert \frac{1}{A_{n}}\sum _{i=1}^{n}a_{i}\left \vert f(p_{i})-f(\bar{p})-c(p_{i}-\bar{p})^{2}\right \vert -\left \vert f^{\prime}( \bar {p})\right \vert \cdot \frac{1}{A_{n}}\sum _{i=1}^{n}a_{i}\left \vert (p_{i}-\bar{p})\right \vert \right \vert \\ & +\frac{c}{A_{n}}\sum _{i=1}^{n}a_{i}(p_{i}-\bar{p})^{2} \\ & \leq \frac{1}{A_{n}}\sum _{i=1}^{n}a_{i}f(p_{i})-f(\bar{p}) \\ & \leq T(\boldsymbol{a;p,q})-\frac{c}{A_{n}}\sum _{i=1}^{n}a_{i}( \bar{p}-p_{i})^{2}\leq T(\boldsymbol{a;p,q}). \end{aligned}$$
(5.4)

In particular, we have

$$\begin{aligned} 0 & \leq \frac{c}{n}\sum _{i=1}^{n}\left ( p_{i}-\frac{P_{n}}{n} \right ) ^{2} \\ & \leq \left \vert \frac{1}{n}\sum _{i=1}^{n}\left \vert f(p_{i})-f\left ( \frac{P_{n}}{n}\right ) -c\left ( p_{i}-\frac{P_{n}}{n} \right ) ^{2}\right \vert -\left \vert f^{\prime}\left ( \frac{P_{n}}{n}\right ) \right \vert \cdot \frac{1}{n}\sum _{i=1}^{n} \left \vert p_{i}-\frac{P_{n}}{n}\right \vert \right \vert \\ & +\frac{c}{n}\sum _{i=1}^{n}\left ( p_{i}-\frac{P_{n}}{n}\right ) ^{2} \\ & \leq \frac{1}{n}\sum _{i=1}^{n}f(p_{i})-f\left ( \frac{P_{n}}{n} \right ) \\ & \leq \bar{T}(\boldsymbol{p,q})-\frac{c}{n}\sum _{i=1}^{n}\left ( p_{i}- \frac{P_{n}}{n}\right ) ^{2}\leq \bar{T}(\boldsymbol{p,q}). \end{aligned}$$
(5.5)

Proof

Combining inequalities (2.8) and (2.12), we have

$$\begin{aligned} 0 & \leq \frac{c}{A_{n}}\sum _{i=1}^{n}a_{i}(x_{i}-\bar{x})^{2} \\ & \leq \left \vert \frac{1}{A_{n}}\sum _{i=1}^{n}a_{i}\left \vert f(x_{i})-f(\bar{x})-c(x_{i}-\bar{x})^{2}\right \vert -\left \vert f^{\prime}( \bar {x})\right \vert \cdot \frac{1}{A_{n}}\sum _{i=1}^{n}a_{i}\left \vert (x_{i}-\bar{x})\right \vert \right \vert \\ & +\frac{c}{A_{n}}\sum _{i=1}^{n}a_{i}(x_{i}-\bar{x})^{2} \\ & \leq \frac{1}{A_{n}}\sum _{i=1}^{n}a_{i}f(x_{i})-f(\bar{x}) \\ & \leq \frac{1}{A_{n}}\sum _{i=1}^{n}a_{i}f^{\prime}(x_{i})x_{i}- \frac{1}{A_{n}^{2}}\sum _{i=1}^{n}a_{i}x_{i}\sum _{i=1}^{n}a_{i}f^{\prime}(x_{i})-\frac{c}{A_{n}}\sum _{i=1}^{n}a_{i}(\bar{x}-x_{i})^{2} \\ & \leq \frac{1}{A_{n}}\sum _{i=1}^{n}a_{i}f^{\prime}(x_{i})x_{i}- \frac{1}{A_{n}^{2}}\sum _{i=1}^{n}a_{i}x_{i}\sum _{i=1}^{n}a_{i}f^{\prime}(x_{i}). \end{aligned}$$
(5.6)

Setting \(f^{\prime}(x_{i})=q_{i}\) and \(x_{i}=p_{i}\), \(i\in \{1,\ldots,n\}\) and using (5.6), we get (5.4).

If we set \(a_{i}=\frac{1}{n}\), \(i=1,\ldots,n\), then \(\bar{p}=\frac{1}{n}\sum _{i=1}^{n}p_{i}=\frac{P_{n}}{n}\), where \(P_{n}=\sum _{i=1}^{n}p_{i}\). Now inequality (5.5) immediately follows from (5.4). □
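Finally, here is a numerical sketch of the normalized chain (5.5); as in the proof of Corollary 10, \(q_{i}=f^{\prime}(p_{i})\) is used, and the function \(f(t)=t^{2}+e^{t}\) with \(c=1\) together with the random data are assumptions made only for the check.

```python
import math, random

f  = lambda t: t * t + math.exp(t)     # strongly convex with modulus c = 1
df = lambda t: 2 * t + math.exp(t)
c  = 1.0

p = [random.uniform(0.1, 2.0) for _ in range(6)]
q = [df(pi) for pi in p]               # q_i = f'(p_i), as in the proof of Corollary 10
n = len(p)
pbar = sum(p) / n                      # equals P_n / n

var_term   = c / n * sum((pi - pbar) ** 2 for pi in p)
abs_term   = abs(sum(abs(f(pi) - f(pbar) - c * (pi - pbar) ** 2) for pi in p) / n
                 - abs(df(pbar)) * sum(abs(pi - pbar) for pi in p) / n)
jensen_gap = sum(f(pi) for pi in p) / n - f(pbar)
T_bar      = sum(pi * qi for pi, qi in zip(p, q)) / n - sum(p) * sum(q) / n ** 2

checks = [0 <= var_term,
          var_term <= abs_term + var_term + 1e-12,
          abs_term + var_term <= jensen_gap + 1e-12,
          jensen_gap <= T_bar - var_term + 1e-12,
          T_bar - var_term <= T_bar + 1e-12]
print(all(checks))   # True: the chain in (5.5) holds for this data
```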