1 Introduction

Let C be a convex subset of a real vector space V. A function \(f:C\rightarrow \mathbb {R}\) is said to be convex if whenever \(u,v\in C\) and \(\alpha \in \left[ 0,1\right] \) we have

$$\begin{aligned} f\left( \alpha u+\left( 1-\alpha \right) v\right) \le \alpha f\left( u\right) +\left( 1-\alpha \right) f\left( v\right) . \end{aligned}$$

The set of positive integers will be denoted by \(\mathbb {N}_{+}\).

Let the set I denote either \(\left\{ 1,\ldots ,n\right\} \) for some \(n\ge 1\) or \(\mathbb {N}_{+}\). We say that the numbers \(\left( p_{i}\right) _{i\in I}\) represent a discrete probability distribution if \(p_{i}\ge 0\) \(\left( i\in I\right) \) and \(\sum \nolimits _{i\in I}p_{i}=1\); the distribution is called positive if \(p_{i}>0\) \(\left( i\in I\right) \).

Jensen’s inequality is one of the most important inequalities regarding convex functions.

The following discrete and integral versions of Jensen’s inequality are well known (see [10]).

Theorem 1.1

(discrete Jensen inequality for finite sums) Let C be a convex subset of a real vector space V, and let \(f:C\rightarrow \mathbb {R}\) be a convex function. If \(p_{1},\ldots ,p_{n}\) represent a discrete probability distribution, and \(v_{1},\ldots ,v_{n}\in C\), then

$$\begin{aligned} f\left( \sum \limits _{i=1}^{n}p_{i}v_{i}\right) \le \sum \limits _{i=1} ^{n}p_{i}f\left( v_{i}\right) . \end{aligned}$$
(1.1)
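Inequality (1.1) can be illustrated by a quick numerical sanity check (not part of the formal argument); the convex function \(f(x)=x^{2}\) and the data below are arbitrary illustrative choices.

```python
# Check of the discrete Jensen inequality (1.1) for f(x) = x**2;
# the weights p and points v are arbitrary illustrative choices.
f = lambda x: x ** 2

p = [0.2, 0.3, 0.5]   # a discrete probability distribution
v = [-1.0, 0.5, 2.0]  # points of the convex domain C = R

lhs = f(sum(pi * vi for pi, vi in zip(p, v)))   # f of the mean
rhs = sum(pi * f(vi) for pi, vi in zip(p, v))   # mean of f
print(lhs <= rhs)  # prints True
```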

Theorem 1.2

(integral Jensen inequality) Let \(\varphi \) be an integrable function on a probability space \(\left( X,\mathcal {A},\mu \right) \) taking values in an interval \(C\subset \mathbb {R}\). Then \( {\displaystyle \int \nolimits _{X}} \varphi d\mu \) lies in C. If f is a convex function on C such that \(f\circ \varphi \) is \(\mu \)-integrable, then

$$\begin{aligned} f\left( {\displaystyle \int \limits _{X}} \varphi d\mu \right) \le {\displaystyle \int \limits _{X}} f\circ \varphi d\mu . \end{aligned}$$
(1.2)

The following refinement of the discrete Jensen inequality for finite sums can be found in [4]. Its special case \(n=2\) originates from the earlier paper [5].

Theorem 1.3

Let C be a convex subset of a real vector space V, and let \(f:C\rightarrow \mathbb {R}\) be a convex function. If \(p_{1},\ldots ,p_{n}\) represent a discrete probability distribution, \(q_{1},\ldots ,q_{n}\) represent a positive discrete probability distribution, and \(v_{1},\ldots ,v_{n}\in C\), then

$$\begin{aligned}{} & {} f\left( \sum \limits _{i=1}^{n}p_{i}v_{i}\right) +\min _{1\le i\le n} \frac{p_{i}}{q_{i}}\left( \sum \limits _{i=1}^{n}q_{i}f\left( v_{i}\right) -f\left( \sum \limits _{i=1}^{n}q_{i}v_{i}\right) \right) \le \sum \limits _{i=1}^{n}p_{i}f\left( v_{i}\right) \end{aligned}$$
(1.3)
$$\begin{aligned}{} & {} \quad \le f\left( \sum \limits _{i=1}^{n}p_{i}v_{i}\right) +\max _{1\le i\le n}\frac{p_{i}}{q_{i}}\left( \sum \limits _{i=1}^{n}q_{i}f\left( v_{i}\right) -f\left( \sum \limits _{i=1}^{n}q_{i}v_{i}\right) \right) . \end{aligned}$$
(1.4)

The previous result can also be found in paper [9], when C is an interval in \(\mathbb {R}\).

Inequality (1.3) is a refinement of the discrete Jensen inequality by means of the so-called discrete Jensen gap

$$\begin{aligned} \sum \limits _{i=1}^{n}q_{i}f\left( v_{i}\right) -f\left( \sum \limits _{i=1} ^{n}q_{i}v_{i}\right) . \end{aligned}$$
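As a numerical sanity check of the two-sided refinement (1.3)-(1.4) (an illustration, not part of the formal argument), the snippet below uses \(f(x)=x^{2}\) with arbitrary illustrative data.

```python
# Check that the refinement (1.3)-(1.4) brackets sum_i p_i f(v_i);
# f(x) = x**2, and p, q, v are arbitrary illustrative choices.
f = lambda x: x ** 2

p = [0.1, 0.4, 0.5]   # discrete probability distribution
q = [0.3, 0.3, 0.4]   # positive discrete probability distribution
v = [0.0, 1.0, 4.0]

gap_q = sum(qi * f(vi) for qi, vi in zip(q, v)) - \
        f(sum(qi * vi for qi, vi in zip(q, v)))   # discrete Jensen gap for q
base = f(sum(pi * vi for pi, vi in zip(p, v)))
target = sum(pi * f(vi) for pi, vi in zip(p, v))

lo = base + min(pi / qi for pi, qi in zip(p, q)) * gap_q   # left side of (1.3)
hi = base + max(pi / qi for pi, qi in zip(p, q)) * gap_q   # right side of (1.4)
print(lo <= target <= hi)  # prints True
```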

Motivated by Theorem 1.3, we obtain new refinements of both the discrete and the integral Jensen inequalities, also using the Jensen gap. If \(p_{i}>0\) \(\left( i=1,\ldots ,n\right) \), then it is easy to see that the two inequalities (1.3) and (1.4) are equivalent, so in this paper we concentrate only on refinements of the discrete and the integral Jensen inequalities similar to inequality (1.3). We first show that inequality (1.3) can be extended to the case where \(q_{1},\ldots ,q_{n}\) are only nonnegative, and we give its form for infinite sums. This result allows us to prove a new refinement of the integral Jensen inequality. Paper [4] deals only with discrete inequalities for finite sums, while paper [9] also contains versions of Theorem 1.3 for integrals, but only on Borel sets of compact intervals, not in general measure spaces. While the proofs of the discrete inequalities in [9] carry over essentially unchanged to the integral inequalities there, this is not the case for the integral inequality given here. As applications, we give refinements of various inequalities that can be proved by different forms of Jensen's inequality: inequalities for norms, quasi-arithmetic means, Hölder's inequality and f-divergences in information theory.

2 Preliminary results

We first give a version of the discrete Jensen inequality for series. The author has not found this form in the literature (although it is probably not new), so we prove it.

Theorem 2.1

(discrete Jensen inequality for series) Let C be a closed convex subset of a real normed space V, and let \(f:C\rightarrow \mathbb {R}\) be a convex function. If \(p_{1},p_{2},\ldots \) represent a discrete probability distribution, \(v_{1},v_{2},\ldots \in C\) are such that the series \(\sum \nolimits _{i=1}^{\infty }p_{i}v_{i}\) and \(\sum \nolimits _{i=1}^{\infty }p_{i}f\left( v_{i}\right) \) are convergent, and f is lower semicontinuous, then

$$\begin{aligned} f\left( \sum \limits _{i=1}^{\infty }p_{i}v_{i}\right) \le \sum \limits _{i=1} ^{\infty }p_{i}f\left( v_{i}\right) . \end{aligned}$$
(2.1)

Proof

We can suppose that \(p_{1}>0\). By the discrete Jensen inequality for finite sums,

$$\begin{aligned} f\left( \sum \limits _{i=1}^{n}p_{i}v_{i}/\sum \limits _{i=1}^{n} p_{i}\right) \le \sum \limits _{i=1}^{n}p_{i}f\left( v_{i}\right) / \sum \limits _{i=1}^{n}p_{i},\quad n\in \mathbb {N}_{+}. \end{aligned}$$
(2.2)

Since

$$\begin{aligned} \sum \limits _{i=1}^{n}p_{i}v_{i}/\sum \limits _{i=1}^{n}p_{i}\in C,\quad n\in \mathbb {N}_{+}, \end{aligned}$$

\(\sum \nolimits _{i=1}^{\infty }p_{i}v_{i}\) is convergent and C is closed, \(\sum \nolimits _{i=1}^{\infty }p_{i}v_{i}\in C\). It now follows from (2.2) and the convergence of \(\sum \nolimits _{i=1}^{\infty }p_{i}f\left( v_{i}\right) \) that

$$\begin{aligned} \liminf \limits _{n\rightarrow \infty }f\left( \sum \limits _{i=1}^{n}p_{i} v_{i}/\sum \limits _{i=1}^{n}p_{i}\right) \le \sum \limits _{i=1}^{\infty }p_{i}f\left( v_{i}\right) , \end{aligned}$$

and hence the result follows from the lower semicontinuity of f.

The proof is complete. \(\square \)
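Theorem 2.1 can be illustrated numerically (an illustration, not part of the proof) with \(p_{i}=2^{-i}\), \(v_{i}=i\) and \(f(x)=x^{2}\), for which the series have the closed forms \(\sum p_{i}v_{i}=2\) and \(\sum p_{i}v_{i}^{2}=6\); the truncation level is an arbitrary choice.

```python
# Truncated check of the series Jensen inequality (2.1) with
# f(x) = x**2 (convex and continuous, hence lower semicontinuous),
# p_i = 2**-i and v_i = i; closed forms of the sums: 2 and 6.
f = lambda x: x ** 2
N = 60  # truncation level; the tails beyond N are negligible

p = [2.0 ** -i for i in range(1, N + 1)]
v = [float(i) for i in range(1, N + 1)]

mean = sum(pi * vi for pi, vi in zip(p, v))    # ~ 2
rhs = sum(pi * f(vi) for pi, vi in zip(p, v))  # ~ 6
print(f(mean) <= rhs)  # prints True
```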

Next, we obtain an extension of Theorem 1.3 for infinite sums, and show that the two inequalities in Theorem 1.3 are equivalent in the sense that either one follows from the other.

Theorem 2.2

  1. (a)

    Let C be a closed convex subset of a real normed space V, and let \(f:C\rightarrow \mathbb {R}\) be a convex function. If \(p_{1},p_{2},\ldots \) represent a discrete probability distribution, \(q_{1},q_{2},\ldots \) represent a positive discrete probability distribution, \(v_{1},v_{2},\ldots \in C\) such that the series \(\sum \nolimits _{i=1}^{\infty }p_{i}v_{i}\), \(\sum \nolimits _{i=1} ^{\infty }q_{i}v_{i}\), \(\sum \nolimits _{i=1}^{\infty }p_{i}f\left( v_{i}\right) \ \)and \(\sum \nolimits _{i=1}^{\infty }q_{i}f\left( v_{i}\right) \) are convergent and f is lower semicontinuous, then

    $$\begin{aligned} f\left( \sum \limits _{i=1}^{\infty }p_{i}v_{i}\right) +\inf _{i\ge 1} \frac{p_{i}}{q_{i}}\left( \sum \limits _{i=1}^{\infty }q_{i}f\left( v_{i}\right) -f\left( \sum \limits _{i=1}^{\infty }q_{i}v_{i}\right) \right) \le \sum \limits _{i=1}^{\infty }p_{i}f\left( v_{i}\right) . \end{aligned}$$
    (2.3)
  2. (b)

    If \(p_{i}>0\) \(\left( i=1,\ldots ,n\right) \) in Theorem 1.3, then inequalities (1.3) and (1.4) are equivalent.

Proof

(a) Let

$$\begin{aligned} s:=\inf _{i\ge 1}\frac{p_{i}}{q_{i}}. \end{aligned}$$

It is easy to check that

$$\begin{aligned} s+\sum \limits _{i=1}^{\infty }q_{i}\left( \frac{p_{i}}{q_{i}}-s\right) =1. \end{aligned}$$

By using this and the discrete Jensen inequality for series, we obtain that

$$\begin{aligned}{} & {} sf\left( \sum \limits _{i=1}^{\infty }q_{i}v_{i}\right) +\sum \limits _{i=1} ^{\infty }q_{i}\left( \frac{p_{i}}{q_{i}}-s\right) f\left( v_{i}\right) \\{} & {} \quad \ge f\left( s\sum \limits _{i=1}^{\infty }q_{i}v_{i}+\sum \limits _{i=1}^{\infty }q_{i}\left( \frac{p_{i}}{q_{i}}-s\right) v_{i}\right) =f\left( \sum \limits _{i=1}^{\infty }p_{i}v_{i}\right) , \end{aligned}$$

which gives the inequality.

(b) It follows from Theorem 1.3 (by interchanging the roles of the two probability distributions) that

$$\begin{aligned} f\left( \sum \limits _{i=1}^{n}q_{i}v_{i}\right) +\min _{1\le i\le n} \frac{q_{i}}{p_{i}}\left( \sum \limits _{i=1}^{n}p_{i}f\left( v_{i}\right) -f\left( \sum \limits _{i=1}^{n}p_{i}v_{i}\right) \right) \le \sum \limits _{i=1}^{n}q_{i}f\left( v_{i}\right) , \end{aligned}$$

and hence

$$\begin{aligned} \min _{1\le i\le n}\frac{q_{i}}{p_{i}}\left( \sum \limits _{i=1}^{n} p_{i}f\left( v_{i}\right) -f\left( \sum \limits _{i=1}^{n}p_{i}v_{i}\right) \right) \le \sum \limits _{i=1}^{n}q_{i}f\left( v_{i}\right) -f\left( \sum \limits _{i=1}^{n}q_{i}v_{i}\right) . \end{aligned}$$
(2.4)

Since

$$\begin{aligned} \frac{1}{\min \limits _{1\le i\le n}\frac{q_{i}}{p_{i}}}=\max _{1\le i\le n}\frac{p_{i}}{q_{i}}, \end{aligned}$$

we have that

$$\begin{aligned} \sum \limits _{i=1}^{n}p_{i}f\left( v_{i}\right) \le f\left( \sum \limits _{i=1}^{n}p_{i}v_{i}\right) +\max _{1\le i\le n}\frac{p_{i}}{q_{i} }\left( \sum \limits _{i=1}^{n}q_{i}f\left( v_{i}\right) -f\left( \sum \limits _{i=1}^{n}q_{i}v_{i}\right) \right) . \end{aligned}$$

The proof is complete. \(\square \)
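A truncated numerical check of inequality (2.3) (an illustration, not part of the proof): with \(p_{i}=2^{-i}\), \(q_{i}=\frac{2}{3}\left( \frac{1}{3}\right) ^{i-1}\) and \(v_{i}=1/i\) we have \(\inf _{i\ge 1}p_{i}/q_{i}=p_{1}/q_{1}=3/4>0\) while \(\sup _{i\ge 1}p_{i}/q_{i}=\infty \), so only the lower refinement applies (cf. Remark 2.3 (c)).

```python
# Truncated check of (2.3) with f(x) = x**2, p_i = (1/2)**i,
# q_i = (2/3)(1/3)**(i-1), v_i = 1/i; here p_i/q_i = (3/4)(3/2)**(i-1),
# so inf p_i/q_i = 3/4 while sup p_i/q_i is infinite.
f = lambda x: x ** 2
N = 120  # truncation level; the tails beyond N are negligible

p = [0.5 ** i for i in range(1, N + 1)]
q = [(2.0 / 3.0) * (1.0 / 3.0) ** (i - 1) for i in range(1, N + 1)]
v = [1.0 / i for i in range(1, N + 1)]

s = min(pi / qi for pi, qi in zip(p, q))  # = p_1/q_1 = 3/4
gap_q = sum(qi * f(vi) for qi, vi in zip(q, v)) - \
        f(sum(qi * vi for qi, vi in zip(q, v)))
lhs = f(sum(pi * vi for pi, vi in zip(p, v))) + s * gap_q
rhs = sum(pi * f(vi) for pi, vi in zip(p, v))
print(lhs <= rhs)  # prints True
```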

Remark 2.3

  1. (a)

    If

    $$\begin{aligned} \inf _{i\ge 1}\frac{p_{i}}{q_{i}}=0 \end{aligned}$$

    (for example, one of the numbers \(p_{1},p_{2},\ldots \) is 0), then (2.3) is just the discrete Jensen inequality, and therefore Theorem 2.2 (a) is really interesting in the case where

    $$\begin{aligned} \inf _{i\ge 1}\frac{p_{i}}{q_{i}}>0. \end{aligned}$$
    (2.5)

    A similar statement applies to inequality (1.3).

  2. (b)

    If (2.5) is satisfied, and the series \(\sum \nolimits _{i=1}^{\infty }p_{i}v_{i}\) and \(\sum \nolimits _{i=1}^{\infty }p_{i}f\left( v_{i}\right) \) are absolutely convergent, then the series \(\sum \nolimits _{i=1}^{\infty }q_{i}v_{i}\) and \(\sum \nolimits _{i=1}^{\infty } q_{i}f\left( v_{i}\right) \) are also absolutely convergent. Therefore if V is a complete space, then they are also convergent.

  3. (c)

    Condition (2.5) by itself does not provide a finite upper bound on the expression \(\sum \nolimits _{i=1}^{\infty }p_{i}f\left( v_{i}\right) \), since it is possible that

    $$\begin{aligned} \sup _{i\ge 1}\frac{p_{i}}{q_{i}}=\infty . \end{aligned}$$

    But if

    $$\begin{aligned} 0<\inf _{i\ge 1}\frac{p_{i}}{q_{i}}\le \sup _{i\ge 1}\frac{p_{i}}{q_{i}} <\infty , \end{aligned}$$

    then

    $$\begin{aligned} \sum \limits _{i=1}^{\infty }p_{i}f\left( v_{i}\right) \le f\left( \sum \limits _{i=1}^{\infty }p_{i}v_{i}\right) +\sup _{i\ge 1}\frac{p_{i}}{q_{i} }\left( \sum \limits _{i=1}^{\infty }q_{i}f\left( v_{i}\right) -f\left( \sum \limits _{i=1}^{\infty }q_{i}v_{i}\right) \right) \end{aligned}$$
    (2.6)

    holds too, and in this case inequalities (2.3) and (2.6) are also equivalent.

  4. (d)

    Inequality (2.3) also compares two different discrete Jensen gaps:

    $$\begin{aligned} 0\le \inf _{i\ge 1}\frac{p_{i}}{q_{i}}\left( \sum \limits _{i=1}^{\infty } q_{i}f\left( v_{i}\right) -f\left( \sum \limits _{i=1}^{\infty }q_{i} v_{i}\right) \right) \le \sum \limits _{i=1}^{\infty }p_{i}f\left( v_{i}\right) -f\left( \sum \limits _{i=1}^{\infty }p_{i}v_{i}\right) . \end{aligned}$$

3 Main results

The first result shows that neither the positivity of the probability distribution \(q_{1},\ldots ,q_{n}\) in Theorem 1.3 nor the positivity of the probability distribution \(q_{1},q_{2},\ldots \) in Theorem 2.2 (a) is an essential condition.

Theorem 3.1

  1. (a)

    Let J be a nonempty subset of \(\left\{ 1,\ldots ,n\right\} \). Let C be a convex subset of a real normed space V, and let \(f:C\rightarrow \mathbb {R}\) be a continuous convex function. If \(p_{1},\ldots ,p_{n}\) and \(\left( q_{j}\right) _{j\in J}\) represent positive discrete probability distributions, and \(v_{1},\ldots ,v_{n}\in C\), then

    $$\begin{aligned} f\left( \sum \limits _{i=1}^{n}p_{i}v_{i}\right) +\min _{j\in J}\frac{p_{j} }{q_{j}}\left( \sum \limits _{j\in J}q_{j}f\left( v_{j}\right) -f\left( \sum \limits _{j\in J}q_{j}v_{j}\right) \right) \le \sum \limits _{i=1}^{n} p_{i}f\left( v_{i}\right) . \nonumber \\ \end{aligned}$$
    (3.1)
  2. (b)

    Let J be either a nonempty finite subset of \(\mathbb {N}_{+}\) or an infinite subset of \(\mathbb {N}_{+}\). Let C be a closed convex subset of a real Banach space V, and let \(f:C\rightarrow \mathbb {R}\) be a continuous convex function. If \(p_{1},p_{2},\ldots \) and \(\left( q_{j}\right) _{j\in J}\) represent positive discrete probability distributions, and \(v_{1},v_{2},\ldots \in C\) such that the series \(\sum \nolimits _{i=1}^{\infty }p_{i}v_{i}\), \(\sum \nolimits _{i=1}^{\infty }p_{i}f\left( v_{i}\right) \ \)are absolutely convergent, and the series \(\sum \nolimits _{j\in J}q_{j}v_{j} \), \(\sum \nolimits _{j\in J}q_{j}f\left( v_{j}\right) \) are convergent, then

    $$\begin{aligned} f\left( \sum \limits _{i=1}^{\infty }p_{i}v_{i}\right) +\inf _{j\in J} \frac{p_{j}}{q_{j}}\left( \sum \limits _{j\in J}q_{j}f\left( v_{j}\right) -f\left( \sum \limits _{j\in J}q_{j}v_{j}\right) \right) \le \sum \limits _{i=1}^{\infty }p_{i}f\left( v_{i}\right) .\nonumber \\ \end{aligned}$$
    (3.2)
  3. (c)

    If \(V=\mathbb {R}\), so that C is an interval in \(\mathbb {R}\), then both (a) and (b) remain true without the continuity of f.

Proof

We first prove part (b).

(b) Let \(K:=\mathbb {N}_{+}\setminus J\), and let

$$\begin{aligned} c:=\sum \limits _{i\in K}p_{i}\text { and }s:=\inf _{j\in J}\frac{p_{j} }{q_{j}}. \end{aligned}$$

If \(J=\mathbb {N}_{+}\), then Theorem 2.2 (a) can be applied, while if \(s=0\), then (3.2) is obvious, and hence we can suppose that \(J\ne \mathbb {N}_{+}\) (in this case \(c>0\)) and \(s>0\).

For every \(0<\varepsilon <\min \left( 1,\frac{c}{s}\right) \) define

$$\begin{aligned} \begin{array}{l} \widehat{q}_{i}\left( \varepsilon \right) :=q_{i}-\varepsilon q_{i}\text { if }i\in J\\ \widehat{q}_{i}\left( \varepsilon \right) :=\frac{\varepsilon }{c}p_{i}\text { if }i\in K \end{array}. \end{aligned}$$

Then \(\left( \widehat{q}_{i}\left( \varepsilon \right) \right) _{i=1}^{\infty }\) represents a positive discrete probability distribution for any possible \(\varepsilon \).

Since the series \(\sum \nolimits _{i=1}^{\infty }p_{i}v_{i}\) and \(\sum \nolimits _{i=1}^{\infty }p_{i}f\left( v_{i}\right) \) are absolutely convergent and the space is complete, the series

$$\begin{aligned} \sum \limits _{i\in K}p_{i}v_{i}\text { and }\sum \limits _{i\in K}p_{i}f\left( v_{i}\right) \end{aligned}$$

are also convergent (it is only really interesting if K is an infinite set). It now follows from the convergence of the series \(\sum \nolimits _{j\in J} q_{j}v_{j} \ \)and \(\sum \nolimits _{j\in J}q_{j}f\left( v_{j}\right) \) that

$$\begin{aligned} \sum \limits _{i=1}^{\infty }\widehat{q}_{i}\left( \varepsilon \right) v_{i}=\left( 1-\varepsilon \right) \sum \limits _{j\in J}q_{j}v_{j} +\frac{\varepsilon }{c}\sum \limits _{i\in K}p_{i}v_{i} \end{aligned}$$
(3.3)

and

$$\begin{aligned} \sum \limits _{i=1}^{\infty }\widehat{q}_{i}\left( \varepsilon \right) f\left( v_{i}\right) =\left( 1-\varepsilon \right) \sum \limits _{j\in J}q_{j}f\left( v_{j}\right) +\frac{\varepsilon }{c}\sum \limits _{i\in K}p_{i}f\left( v_{i}\right) \end{aligned}$$
(3.4)

for all \(0<\varepsilon <\min \left( 1,\frac{c}{s}\right) \).

Based on the previous two statements, we can apply Theorem 2.2 (a) which gives that

$$\begin{aligned}{} & {} f\left( \sum \limits _{i=1}^{\infty }p_{i}v_{i}\right) +\inf _{j\ge 1} \frac{p_{j}}{\widehat{q}_{j}\left( \varepsilon \right) }\left( \sum \limits _{i=1}^{\infty }\widehat{q}_{i}\left( \varepsilon \right) f\left( v_{i}\right) -f\left( \sum \limits _{i=1}^{\infty }\widehat{q}_{i}\left( \varepsilon \right) v_{i}\right) \right) \nonumber \\{} & {} \quad \le \sum \limits _{i=1}^{\infty } p_{i}f\left( v_{i}\right) . \end{aligned}$$
(3.5)

Let

$$\begin{aligned} s\left( \varepsilon \right) :=\inf _{j\ge 1}\frac{p_{j}}{\widehat{q} _{j}\left( \varepsilon \right) }\text { for }0<\varepsilon <\min \left( 1,\frac{c}{s}\right) , \end{aligned}$$

and let \(\delta >0\) be fixed.

Then there exist \(\overline{j}\in J\) and \(0<\overline{\varepsilon }<\min \left( 1,\frac{c}{s}\right) \) such that

$$\begin{aligned} s\le \frac{p_{\overline{j}}}{q_{\overline{j}}}<\frac{p_{\overline{j}} }{\widehat{q}_{\overline{j}}\left( \varepsilon \right) }<s+\delta ,\quad 0<\varepsilon \le \overline{\varepsilon }. \end{aligned}$$
(3.6)

Since \(0<\varepsilon <\frac{c}{s}\), for every \(i\in K\) we have

$$\begin{aligned} \frac{p_{i}}{\widehat{q}_{i}\left( \varepsilon \right) }=\frac{p_{i}}{\frac{\varepsilon }{c}p_{i}}=\frac{c}{\varepsilon }>s. \end{aligned}$$
(3.7)

It follows from (3.6) and (3.7) that

$$\begin{aligned} s\le s\left( \varepsilon \right)<s+\delta ,\quad 0<\varepsilon \le \overline{\varepsilon }, \end{aligned}$$

which implies

$$\begin{aligned} \lim \limits _{\varepsilon \rightarrow 0}s\left( \varepsilon \right) =s. \end{aligned}$$

Therefore (3.5) implies the result, by using (3.3), (3.4) and the continuity of f.

(a) The proof is a simplified version of the proof of part (b).

(c) Let the left-hand endpoint of C be \(a\ge -\infty \) and the right-hand endpoint be \(b\le \infty \). Assume \(a\in C\) and f is not continuous at a. Our previous considerations show that a problem can only occur if \(\sum \nolimits _{j\in J}q_{j}v_{j}=a\). But this is only possible if \(v_{j}=a\) \(\left( j\in J\right) \). In this case, however,

$$\begin{aligned} \sum \limits _{j\in J}q_{j}f\left( v_{j}\right) -f\left( \sum \limits _{j\in J}q_{j}v_{j}\right) =0. \end{aligned}$$

If \(b\in C\) and f is not continuous at b, the proof is similar.

The proof is complete. \(\square \)
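As a numerical sanity check of (3.1) with a proper index subset \(J\subset \left\{ 1,\ldots ,n\right\} \) (an illustration, not part of the proof; \(f(x)=x^{2}\) and the data are arbitrary choices):

```python
# Check of (3.1) with n = 4 and a proper subset J; indices are 0-based
# in the code, so J = [1, 3] stands for the index set {2, 4}.
f = lambda x: x ** 2

p = [0.1, 0.2, 0.3, 0.4]   # positive distribution on {1,...,4}
J = [1, 3]                  # 0-based indices of the subset J
q = {1: 0.6, 3: 0.4}        # positive distribution on J
v = [2.0, -1.0, 0.5, 3.0]

gap_J = sum(q[j] * f(v[j]) for j in J) - f(sum(q[j] * v[j] for j in J))
lhs = f(sum(pi * vi for pi, vi in zip(p, v))) + \
      min(p[j] / q[j] for j in J) * gap_J
rhs = sum(pi * f(vi) for pi, vi in zip(p, v))
print(lhs <= rhs)  # prints True
```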

Remark 3.2

  1. (a)

    Theorem 3.1 is an essential generalization of Theorem 1.3.

  2. (b)

    Assume \(J=\left\{ i_{1},\ldots ,i_{k}\right\} \) where \(1\le i_{1}<\cdots <i_{k}\le n\). Then (3.1) can be written in the following form

    $$\begin{aligned} f\left( \sum \limits _{i=1}^{n}p_{i}v_{i}\right) +\min _{1\le j\le k} \frac{p_{i_{j}}}{q_{i_{j}}}\left( \sum \limits _{j=1}^{k}q_{i_{j}}f\left( v_{i_{j}}\right) -f\left( \sum \limits _{j=1}^{k}q_{i_{j}}v_{i_{j}}\right) \right) \le \sum \limits _{i=1}^{n}p_{i}f\left( v_{i}\right) . \end{aligned}$$

    Inequality (3.2) can be rewritten in a similar form, and in this case J can be a finite or an infinite set.

  3. (c)

    It is easy to check that

    $$\begin{aligned} \min _{j\in J}\frac{p_{j}}{q_{j}}\le 1\text { and }\inf _{j\in J}\frac{p_{j} }{q_{j}}\le 1. \end{aligned}$$
  4. (d)

    Remark 2.3 (d) also applies here.

We are now able to state and prove the analogue of Theorem 3.1 for the integral Jensen inequality.

Theorem 3.3

Let \(\left( X,\mathcal {A}\right) \) be a measurable space with probability measures \(\mu \) and \(\nu \) having the following property

$$\begin{aligned} 0<s:=\inf _{\left\{ A\in \mathcal {A}\mid \nu \left( A\right) >0\right\} } \frac{\mu \left( A\right) }{\nu \left( A\right) }. \end{aligned}$$
(3.8)

Assume \(\varphi :X\rightarrow \mathbb {R}\) is a measurable function taking values in an interval \(C\subset \mathbb {R}\) such that \(\varphi \) is \(\mu \)- and \(\nu \)-integrable. If f is a convex function on C such that \(f\circ \varphi \) is \(\mu \)- and \(\nu \)-integrable, then

$$\begin{aligned} f\left( {\displaystyle \int \limits _{X}} \varphi d\mu \right) +s\left( {\displaystyle \int \limits _{X}} \left( f\circ \varphi \right) d\nu -f\left( {\displaystyle \int \limits _{X}} \varphi d\nu \right) \right) \le {\displaystyle \int \limits _{X}} \left( f\circ \varphi \right) d\mu . \end{aligned}$$
(3.9)

Proof

(i) Assume first that \(\varphi \) is a simple function on X, which means that it has only finitely many different values. If \(\left\{ c_{1},\ldots ,c_{k}\right\} \subset C\) is the set of distinct values of \(\varphi \), then the sets \(A_{i}:=\left\{ x\in X\mid \varphi \left( x\right) =c_{i}\right\} \) \(\left( i=1,\ldots ,k\right) \) are pairwise disjoint elements of \(\mathcal {A}\).

In this case inequality (3.9) can be written in the form

$$\begin{aligned}{} & {} f\left( \sum \limits _{i=1}^{k}c_{i}\mu \left( A_{i}\right) \right) +s\left( \sum \limits _{i=1}^{k}f\left( c_{i}\right) \nu \left( A_{i}\right) -f\left( \sum \limits _{i=1}^{k}c_{i}\nu \left( A_{i}\right) \right) \right) \nonumber \\{} & {} \quad \le \sum \limits _{i=1}^{k}f\left( c_{i}\right) \mu \left( A_{i}\right) . \end{aligned}$$
(3.10)

It follows from (3.8) that \(\mu \left( A\right) =0\) implies \(\nu \left( A\right) =0\), and hence we can suppose \(\mu \left( A_{i}\right) >0\) \(\left( i=1,\ldots ,k\right) \). According to this and

$$\begin{aligned} s\le \min _{\left\{ j\in \left\{ 1,\ldots ,k\right\} \mid \nu \left( A_{j}\right) >0\right\} }\frac{\mu \left( A_{j}\right) }{\nu \left( A_{j}\right) }, \end{aligned}$$

inequality (3.10) is an immediate consequence of Theorem 3.1 (a).

(ii) Now assume that f is continuous.

It is well known (see [6]) that there exists a sequence \(\left( \varphi _{n}\right) _{n=1}^{\infty }\) of \(\mathcal {A}\)-measurable simple functions defined on X such that \(\left| \varphi _{1}\right| \le \left| \varphi _{2}\right| \le \ldots \le \left| \varphi _{n}\right| \le \ldots \le \left| \varphi \right| \) and \(\varphi _{n}\left( x\right) \rightarrow \varphi \left( x\right) \) for each \(x\in X\). It follows that \(\varphi _{n}\left( x\right) \in C\) \(\left( x\in X\right) \) and \(\varphi _{n}\) is \(\mu \)- and \(\nu \)-integrable for all \(n\in \mathbb {N}_{+}\).

By part (i), we obtain that

$$\begin{aligned}{} & {} f\left( \int \limits _{X}\varphi _{n}d\mu \right) +s\left( \int \limits _{X} \left( f\circ \varphi _{n}\right) d\nu -f\left( \int \limits _{X}\varphi _{n} d\nu \right) \right) \nonumber \\{} & {} \quad \le \int \limits _{X}\left( f\circ \varphi _{n}\right) d\mu ,\quad n\in \mathbb {N}_{+}. \end{aligned}$$
(3.11)

The dominated convergence theorem implies that

$$\begin{aligned} \int \limits _{X}\varphi _{n}d\mu \rightarrow \int \limits _{X}\varphi d\mu \text { and }\int \limits _{X}\varphi _{n}d\nu \rightarrow \int \limits _{X}\varphi d\nu . \end{aligned}$$

By the continuity of f,

$$\begin{aligned} f\left( \varphi _{n}\left( x\right) \right) \rightarrow f\left( \varphi \left( x\right) \right) ,\quad x\in X. \end{aligned}$$

Since f is convex and continuous, it is either monotonic on C or it has a global minimum at an interior point \(t_{0}\) of C (see [10]). In the first case

$$\begin{aligned} \min \left( f\circ \varphi _{1},f\circ \varphi \right) \le f\circ \varphi _{n} \le \max \left( f\circ \varphi _{1},f\circ \varphi \right) ,\quad n\in \mathbb {N}_{+}, \end{aligned}$$

while in the second case

$$\begin{aligned} f(t_{0})\le f\circ \varphi _{n}\le \max \left( f\circ \varphi _{1},f\circ \varphi \right) ,\quad n\in \mathbb {N}_{+}. \end{aligned}$$

It follows from our previous observations that we can again apply the dominated convergence theorem showing that

$$\begin{aligned} \int \limits _{X}f\circ \varphi _{n}d\mu \rightarrow \int \limits _{X}f\circ \varphi d\mu \text { and }\int \limits _{X}\left( f\circ \varphi _{n}\right) d\nu \rightarrow \int \limits _{X}\left( f\circ \varphi \right) d\nu . \end{aligned}$$

Now the result comes from inequality (3.11).

(iii) Finally, assume f is not continuous at one or both endpoints of the interval C (a convex function on an interval can be discontinuous only at its endpoints).

Then one can show that there exists a decreasing sequence \(\left( f_{n}\right) _{n=1}^{\infty }\) of convex functions defined on C such that \(f_{n}\) is continuous and \(f_{n}\circ \varphi \) is \(\mu \)- and \(\nu \)-integrable for all \(n\in \mathbb {N}_{+}\), and also \(f_{n}\left( t\right) \rightarrow f\left( t\right) \) for each \(t\in C\). In this case the sequence \(\left( f_{n}\circ \varphi \right) _{n=1}^{\infty }\) is decreasing too, and \(f_{n}\left( \varphi \left( x\right) \right) \rightarrow f\left( \varphi \left( x\right) \right) \) for each \(x\in X\), and hence Beppo Levi’s theorem shows that

$$\begin{aligned} \int \limits _{X}f_{n}\circ \varphi d\mu \rightarrow \int \limits _{X}f\circ \varphi d\mu \text { and }\int \limits _{X}\left( f_{n}\circ \varphi \right) d\nu \rightarrow \int \limits _{X}\left( f\circ \varphi \right) d\nu . \end{aligned}$$

Now, we can apply part (ii).

The proof is complete. \(\square \)
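For a finite set X, the infimum in (3.8) is a minimum over finitely many subsets and can be computed by enumeration; the snippet below checks (3.9) in this setting with \(f(x)=x^{2}\) and illustrative discrete measures (a sanity check, not part of the proof).

```python
# Finite check of (3.9): X = {0,1,2} with discrete probability
# measures mu, nu; s of (3.8) is found by enumerating all nonempty
# subsets A with nu(A) > 0. All data are illustrative choices.
from itertools import chain, combinations

f = lambda x: x ** 2
mu = [0.5, 0.3, 0.2]
nu = [0.2, 0.4, 0.4]
phi = [1.0, -2.0, 3.0]   # values of the measurable function
X = range(3)

subsets = chain.from_iterable(combinations(X, r) for r in range(1, 4))
s = min(sum(mu[i] for i in A) / sum(nu[i] for i in A)
        for A in subsets if sum(nu[i] for i in A) > 0)

gap_nu = sum(nu[i] * f(phi[i]) for i in X) - f(sum(nu[i] * phi[i] for i in X))
lhs = f(sum(mu[i] * phi[i] for i in X)) + s * gap_nu
rhs = sum(mu[i] * f(phi[i]) for i in X)
print(lhs <= rhs)  # prints True
```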

Let \(\left( X,\mathcal {A}\right) \) be a measurable space. The unit mass at \(x\in X\) (the Dirac measure at x) is denoted by \(\varepsilon _{x}\). The set of all subsets of X is denoted by \(P\left( X\right) \).

Remark 3.4

  1. (a)

    As we have seen in the proof (or directly from condition (3.8)), the measure \(\nu \) is absolutely continuous with respect to the measure \(\mu \). Then \(\nu \) has a Radon–Nikodym derivative (or density) \(q:X\rightarrow \mathbb {R}\) with respect to \(\mu \). Using this, condition (3.8) and inequality (3.9) can be written in the following form:

    $$\begin{aligned} 0<s:=\inf _{\left\{ A\in \mathcal {A}\mid \int \limits _{A}qd\mu >0\right\} } \frac{\mu \left( A\right) }{\int \limits _{A}qd\mu } \end{aligned}$$

    and

    $$\begin{aligned} f\left( {\displaystyle \int \limits _{X}} \varphi d\mu \right) +s\left( {\displaystyle \int \limits _{X}} \left( f\circ \varphi \right) qd\mu -f\left( {\displaystyle \int \limits _{X}} \varphi qd\mu \right) \right) \le {\displaystyle \int \limits _{X}} \left( f\circ \varphi \right) d\mu . \end{aligned}$$
  2. (b)

    It is easy to see that \(s\le 1\).

  3. (c)

    Let \(C\subset \mathbb {R}\) be an interval, and let \(f:C\rightarrow \mathbb {R}\) be a convex function. Assume \(p_{1},\ldots ,p_{n}\) represent a positive discrete probability distribution, \(q_{1},\ldots ,q_{n}\) represent a discrete probability distribution, and \(v_{1},\ldots ,v_{n}\in C\). Consider the discrete measures

    $$\begin{aligned} \mu :=\sum \limits _{i=1}^{n}p_{i}\varepsilon _{i}\text { and }\nu :=\sum \limits _{i=1}^{n}q_{i}\varepsilon _{i} \end{aligned}$$

    on the set of all subsets of \(\left\{ 1,\ldots ,n\right\} \). Then obviously

    $$\begin{aligned} 0<s:=\min _{\left\{ A\subset \left\{ 1,\ldots ,n\right\} \mid \nu \left( A\right) >0\right\} }\frac{\mu \left( A\right) }{\nu \left( A\right) }, \end{aligned}$$

    and if \(\varphi \) is defined on \(\left\{ 1,\ldots ,n\right\} \) by \(\varphi \left( i\right) :=v_{i}\), then Theorem 3.3 gives that

    $$\begin{aligned} f\left( \sum \limits _{i=1}^{n}p_{i}v_{i}\right) +s\left( \sum \limits _{i=1} ^{n}q_{i}f\left( v_{i}\right) -f\left( \sum \limits _{i=1}^{n}q_{i} v_{i}\right) \right) \le \sum \limits _{i=1}^{n}p_{i}f\left( v_{i}\right) . \end{aligned}$$

    This is not as sharp a result as Theorem 3.1 (a), since

    $$\begin{aligned} s\le \min _{\left\{ j\in \left\{ 1,\ldots ,n\right\} \mid q_{j}>0\right\} }\frac{p_{j}}{q_{j}}, \end{aligned}$$

    but it is quite natural, since \(\varphi \) is now a simple function, while in the general case \(\varphi \) must be approximated by a sequence of simple functions.

  4. (d)

    We can also see from (c) that the measure \(\mu \) is in general not absolutely continuous with respect to the measure \(\nu \).

  5. (e)

    Similarly to Remark 2.3 (d), a comparison of different integral Jensen gaps can be obtained here:

    $$\begin{aligned} 0\le s\left( {\displaystyle \int \limits _{X}} \left( f\circ \varphi \right) d\nu -f\left( {\displaystyle \int \limits _{X}} \varphi d\nu \right) \right) \le {\displaystyle \int \limits _{X}} \left( f\circ \varphi \right) d\mu -f\left( {\displaystyle \int \limits _{X}} \varphi d\mu \right) . \end{aligned}$$

It is worth noting the following version of the previous theorem.

Corollary 3.5

Let \(\left( X,\mathcal {A}\right) \) be a measurable space with a \(\sigma \)-finite measure \(\xi \) on \(\mathcal {A}\), and let p, \(q:X\rightarrow \mathbb {R}\) be positive and \(\xi \)-integrable functions such that

$$\begin{aligned} \int \limits _{X}pd\xi =\int \limits _{X}qd\xi =1 \end{aligned}$$
(3.12)

and

$$\begin{aligned} 0<s:=\inf _{\left\{ A\in \mathcal {A}\mid \xi \left( A\right) >0\right\} } \frac{\int \limits _{A}pd\xi }{\int \limits _{A}qd\xi }. \end{aligned}$$
(3.13)

Assume \(\varphi :X\rightarrow \mathbb {R}\) is a measurable function taking values in an interval \(C\subset \mathbb {R}\) for which \(\varphi p\) and \(\varphi q\) are \(\xi \)-integrable functions. If f is a convex function on C such that \(\left( f\circ \varphi \right) p\) and \(\left( f\circ \varphi \right) q\) are \(\xi \)-integrable, then

$$\begin{aligned} f\left( {\displaystyle \int \limits _{X}} \varphi pd\xi \right) +s\left( {\displaystyle \int \limits _{X}} \left( f\circ \varphi \right) qd\xi -f\left( {\displaystyle \int \limits _{X}} \varphi qd\xi \right) \right) \le {\displaystyle \int \limits _{X}} \left( f\circ \varphi \right) pd\xi .\nonumber \\ \end{aligned}$$
(3.14)

Proof

Define the measures \(\mu \) and \(\nu \) on \(\mathcal {A}\) by

$$\begin{aligned} \mu \left( A\right) := {\displaystyle \int \limits _{A}} pd\xi \text { and }\nu \left( A\right) := {\displaystyle \int \limits _{A}} qd\xi . \end{aligned}$$

By (3.12), \(\mu \) and \(\nu \) are probability measures, and hence

$$\begin{aligned} {\displaystyle \int \limits _{X}} \varphi pd\xi \in C\text { and } {\displaystyle \int \limits _{X}} \varphi qd\xi \in C. \end{aligned}$$

It is also known from the theory of integration that

$$\begin{aligned} {\displaystyle \int \limits _{X}} \varphi d\mu = {\displaystyle \int \limits _{X}} \varphi pd\xi \text { and } {\displaystyle \int \limits _{X}} \varphi d\nu = {\displaystyle \int \limits _{X}} \varphi qd\xi . \end{aligned}$$

Theorem 3.3 can be applied.

The proof is complete. \(\square \)
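Corollary 3.5 can be checked numerically by a midpoint Riemann sum on \(X=\left[ 0,1\right] \) with \(\xi \) the Lebesgue measure (an illustration, not part of the proof). Computing the exact s of (3.13) is hard, so the snippet uses the cruder bound \(\inf _{X}p/q\le s\) of Remark 3.6 (c); since the Jensen gap is nonnegative, (3.14) remains valid with this smaller constant. The densities and functions below are illustrative choices.

```python
# Riemann-sum check of (3.14) on X = [0,1], xi = Lebesgue measure,
# with densities p(x) = 1, q(x) = (1+x)/1.5, phi(x) = x and f = exp;
# s is replaced by the smaller bound inf_X p/q = 0.75 (Remark 3.6 (c)).
import math

n = 20000
xs = [(k + 0.5) / n for k in range(n)]   # midpoint grid on [0,1]
w = 1.0 / n                              # weight of each grid cell

p = lambda x: 1.0
q = lambda x: (1.0 + x) / 1.5
phi = lambda x: x
f = math.exp

s_bound = min(p(x) / q(x) for x in xs)   # ~ 0.75, attained near x = 1

mean_p = sum(phi(x) * p(x) * w for x in xs)
mean_q = sum(phi(x) * q(x) * w for x in xs)
int_fp = sum(f(phi(x)) * p(x) * w for x in xs)
int_fq = sum(f(phi(x)) * q(x) * w for x in xs)

lhs = f(mean_p) + s_bound * (int_fq - f(mean_q))
rhs = int_fp
print(lhs <= rhs)  # prints True
```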

Remark 3.6

  1. (a)

    The measures \(\mu \) and \(\nu \) are absolutely continuous with respect to \(\xi \), and the functions p and q are the Radon–Nikodym derivatives of \(\mu \) and \(\nu \) with respect to \(\xi \), respectively.

  2. (b)

    If the measure \(\xi \) is not \(\sigma \)-finite, then there is no positive and integrable function on X.

  3. (c)

    If

    $$\begin{aligned} 0<\inf _{X}\frac{p}{q} \end{aligned}$$
    (3.15)

    then obviously

    $$\begin{aligned} \inf _{X}\frac{p}{q}\le s, \end{aligned}$$

    which implies that (3.13) holds too; but (3.15) does not follow from (3.13) in general. Although the result obtained under condition (3.15) is weaker, it is still of interest, because (3.15) is easier to check than (3.13).

4 Applications

The first application relates to norms.

Proposition 4.1

Let \(\left( V,\left\| \cdot \right\| \right) \) be a real Banach space, and let \(\alpha \ge 1\).

  1. (a)

    If \(p_{1},\ldots ,p_{n}\) and \(q_{i_{1}},\ldots ,q_{i_{k}}\) represent positive discrete probability distributions, where \(1\le i_{1}<i_{2}<\cdots <i_{k}\le n\), and \(v_{1},\ldots ,v_{n}\in V\), then

    $$\begin{aligned} \left\| \sum \limits _{i=1}^{n}p_{i}v_{i}\right\| ^{\alpha }+\min _{1\le j\le k}\frac{p_{i_{j}}}{q_{i_{j}}}\left( \sum \limits _{j=1}^{k}q_{i_{j} }\left\| v_{i_{j}}\right\| ^{\alpha }-\left\| \sum \limits _{j=1} ^{k}q_{i_{j}}v_{i_{j}}\right\| ^{\alpha }\right) \le \sum \limits _{i=1} ^{n}p_{i}\left\| v_{i}\right\| ^{\alpha }. \end{aligned}$$
  2. (b)

    Assume \(p_{1},p_{2},\ldots \) represent a positive discrete probability distribution, and \(v_{1},v_{2},\ldots \in V\) such that the series \(\sum \nolimits _{i=1}^{\infty }p_{i}v_{i}\) is absolutely convergent.

(b\(_{1}\)):

If \(q_{i_{1}},\ldots ,q_{i_{k}}\) represent a positive discrete probability distribution, where \(1\le i_{1}<i_{2}<\cdots <i_{k} \), then

$$\begin{aligned} \left\| \sum \limits _{i=1}^{\infty }p_{i}v_{i}\right\| ^{\alpha } +\min _{1\le j\le k}\frac{p_{i_{j}}}{q_{i_{j}}}\left( \sum \limits _{j=1} ^{k}q_{i_{j}}\left\| v_{i_{j}}\right\| ^{\alpha }-\left\| \sum \limits _{j=1}^{k}q_{i_{j}}v_{i_{j}}\right\| ^{\alpha }\right) \le \sum \limits _{i=1}^{\infty }p_{i}\left\| v_{i}\right\| ^{\alpha }. \end{aligned}$$
(b\(_{2}\)):

If \(q_{i_{1}},q_{i_{2}},\ldots \) represent a positive discrete probability distribution, where \(1\le i_{1}<i_{2}<\cdots \), and the series \(\sum \limits _{j=1}^{\infty }q_{i_{j}}v_{i_{j}}\) is absolutely convergent, then

$$\begin{aligned} \left\| \sum \limits _{i=1}^{\infty }p_{i}v_{i}\right\| ^{\alpha } +\inf _{j\ge 1}\frac{p_{i_{j}}}{q_{i_{j}}}\left( \sum \limits _{j=1}^{\infty }q_{i_{j}}\left\| v_{i_{j}}\right\| ^{\alpha }-\left\| \sum \limits _{j=1}^{\infty }q_{i_{j}}v_{i_{j}}\right\| ^{\alpha }\right) \le \sum \limits _{i=1}^{\infty }p_{i}\left\| v_{i}\right\| ^{\alpha }. \end{aligned}$$

Proof

It is well known that the function \(f:V\rightarrow \mathbb {R}\), \(f\left( x\right) :=\left\| x\right\| ^{\alpha }\) is continuous and convex.

  1. (a)

    Theorem 3.1 (a) can be applied.

  2. (b)

    In this case the series \(\sum \nolimits _{i=1}^{\infty }p_{i}\left\| v_{i}\right\| ^{\alpha }\) and \(\sum \nolimits _{j=1}^{\infty }q_{i_{j}}\left\| v_{i_{j}}\right\| ^{\alpha }\) are convergent, and hence the result follows from Theorem 3.1 (b).

The proof is complete. \(\square \)
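Since the proposition is stated abstractly, a concrete finite-dimensional instance may help the reader; the following snippet (our own illustration, not part of the paper) checks part (a) numerically in \(\mathbb {R}^{2}\) with the Euclidean norm, \(\alpha =2\), and arbitrarily chosen weights and vectors.

```python
import math

# Illustrative numerical check of Proposition 4.1 (a) in (R^2, Euclidean norm)
# with alpha = 2; the weights and vectors are an arbitrary choice of ours.
alpha = 2.0
p = [0.1, 0.2, 0.3, 0.4]                      # p_1,...,p_n (n = 4)
v = [(1.0, 0.0), (0.0, 2.0), (-1.0, 1.0), (2.0, 2.0)]

idx = [0, 2, 3]                               # subsequence i_1 < i_2 < i_3
q = [0.5, 0.25, 0.25]                         # q_{i_1},...,q_{i_k}

def norm(x):
    return math.hypot(x[0], x[1])

def comb(weights, vecs):                      # convex combination in R^2
    return tuple(sum(w * x[c] for w, x in zip(weights, vecs)) for c in (0, 1))

lhs_main = norm(comb(p, v)) ** alpha
m = min(p[i] / qj for i, qj in zip(idx, q))   # min_j p_{i_j} / q_{i_j}
bracket = (sum(qj * norm(v[i]) ** alpha for i, qj in zip(idx, q))
           - norm(comb(q, [v[i] for i in idx])) ** alpha)
rhs = sum(pi * norm(x) ** alpha for pi, x in zip(p, v))

assert bracket >= 0                           # Jensen gap of the sub-system
assert lhs_main + m * bracket <= rhs + 1e-12  # the refined inequality
```

The middle term is nonnegative by Theorem 1.1 applied to the sub-distribution, so the inequality is indeed a refinement of the plain convexity estimate.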

Remark 4.2

Paper [4] contains the special case \(k=n\) of Proposition 4.1 (a).

The second application concerns quasi-arithmetic means.

Let \(C\subset \mathbb {R}\) be an interval, and let \(g:C\rightarrow \mathbb {R}\) be a continuous and strictly monotone function.

If \(p_{1},\ldots ,p_{n}\) represent a discrete probability distribution and \(v_{1},\ldots ,v_{n}\in C\), then the weighted quasi-arithmetic mean is defined by

$$\begin{aligned} A_{g}\left( \textbf{v},\textbf{p}\right) :=g^{-1}\left( \sum \limits _{i=1} ^{n}p_{i}g\left( v_{i}\right) \right) . \end{aligned}$$
(4.1)

If \(\left( X,\mathcal {A},\mu \right) \) is a probability space, and \(\varphi :X\rightarrow C\) is a measurable function such that \(g\circ \varphi \) is \(\mu \)-integrable on X, then

$$\begin{aligned} M_{g}\left( \varphi ,\mu \right) :=g^{-1}\left( \int \limits _{X}g\circ \varphi d\mu \right) \end{aligned}$$
(4.2)

is called the quasi-arithmetic mean (integral g-mean) of \(\varphi \). Of course (4.2) contains (4.1) as a special case.
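Definition (4.1) can be sketched in a few lines of code; the function and variable names below are our own, and the three classical means arise from the choices \(g\left( t\right) =t\), \(g=\ln \) and \(g\left( t\right) =1/t\).

```python
import math

# A minimal sketch of the weighted quasi-arithmetic mean (4.1); the function
# and variable names are our own, not from the paper.
def quasi_arithmetic_mean(v, p, g, g_inv):
    """Return g^{-1}( sum_i p_i g(v_i) ) for a probability vector p."""
    assert abs(sum(p) - 1.0) < 1e-12
    return g_inv(sum(pi * g(vi) for pi, vi in zip(p, v)))

v = [1.0, 4.0, 16.0]
p = [0.5, 0.25, 0.25]

arithmetic = quasi_arithmetic_mean(v, p, lambda t: t, lambda t: t)
geometric = quasi_arithmetic_mean(v, p, math.log, math.exp)      # g = ln
harmonic = quasi_arithmetic_mean(v, p, lambda t: 1 / t, lambda t: 1 / t)

# Classical ordering of the three means.
assert harmonic <= geometric <= arithmetic
```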

Proposition 4.3

Let \(C\subset \mathbb {R}\) be an interval, and let g be a continuous and strictly monotone function on C.

  1. (a)

    Let \(p_{1},\ldots ,p_{n}\) and \(q_{i_{1}},\ldots ,q_{i_{k}}\) represent positive discrete probability distributions, where \(1\le i_{1}<i_{2}<\cdots <i_{k}\le n\), and let \(v_{1},\ldots ,v_{n}\in C\). If either g is strictly increasing and concave or g is strictly decreasing and convex, then

    $$\begin{aligned}{} & {} g^{-1}\left( \sum \limits _{i=1}^{n}p_{i}g\left( v_{i}\right) \right) +\min _{1\le j\le k}\frac{p_{i_{j}}}{q_{i_{j}}}\left( \sum \limits _{j=1} ^{k}q_{i_{j}}v_{i_{j}}-g^{-1}\left( \sum \limits _{j=1}^{k}q_{i_{j}}g\left( v_{i_{j}}\right) \right) \right) \nonumber \\{} & {} \quad \le \sum \limits _{i=1}^{n}p_{i}v_{i}. \end{aligned}$$
    (4.3)

    If either g is strictly increasing and convex or g is strictly decreasing and concave, then the reverse inequality is satisfied.

  2. (b)

    Let \(\left( X,\mathcal {A}\right) \) be a measurable space with probability measures \(\mu \) and \(\nu \) having the property (3.8). Assume \(\varphi :X\rightarrow \mathbb {R}\) is a measurable function taking values in C such that \(\varphi \) and \(g\circ \varphi \) are \(\mu \)- and \(\nu \)-integrable. If either g is strictly increasing and concave or g is strictly decreasing and convex, then

    $$\begin{aligned} g^{-1}\left( {\displaystyle \int \limits _{X}} g\circ \varphi d\mu \right) +s\left( {\displaystyle \int \limits _{X}} \varphi d\nu -g^{-1}\left( {\displaystyle \int \limits _{X}} g\circ \varphi d\nu \right) \right) \le {\displaystyle \int \limits _{X}} \varphi d\mu . \end{aligned}$$
    (4.4)

If either g is strictly increasing and convex or g is strictly decreasing and concave, then the reverse inequality is satisfied.

Proof

If g is strictly increasing and concave or strictly decreasing and convex, then the function \(g^{-1}\) is continuous and convex.

  1. (a)

    It comes from Theorem 3.1 (a).

  2. (b)

    We can apply Theorem 3.3.

The proof is complete. \(\square \)

Remark 4.4

Of course, part (a) of the previous result can also be formulated for infinite sums in a way analogous to Proposition 4.1 (b).

We consider two special cases of the previous result.

Example

  1. (a)

    Choose \(C=] 0,\infty [ \) and \(g=\ln \) in Proposition 4.3 (a). Then (4.3) gives that

    $$\begin{aligned} \prod \limits _{i=1}^{n}v_{i}^{p_{i}}+\min _{1\le j\le k}\frac{p_{i_{j}} }{q_{i_{j}}}\left( \sum \limits _{j=1}^{k}q_{i_{j}}v_{i_{j}}-\prod \limits _{j=1}^{k}v_{i_{j}}^{q_{i_{j}}}\right) \le \sum \limits _{i=1}^{n} p_{i}v_{i}, \end{aligned}$$

    which involves weighted arithmetic and geometric means, and refines the classical inequality between them.

  2. (b)

    Choose \(C=] 0,\infty [ \) and \(g:] 0,\infty [ \rightarrow \mathbb {R}\), \(g\left( t\right) :=t^{\alpha }\) \(\left( \alpha \ne 0\right) \) in Proposition 4.3 (b).

The mean defined by the function

$$\begin{aligned} \left( {\displaystyle \int \limits _{X}} \varphi ^{\alpha }d\mu \right) ^{\frac{1}{\alpha }} \end{aligned}$$

   is called the \(\alpha \)th power mean (Hölder mean). It is usual to extend it for \(\alpha =0\) by

$$\begin{aligned} \exp \left( {\displaystyle \int \limits _{X}} \ln \circ \varphi d\mu \right) . \end{aligned}$$

In this case (4.4) gives that for \(\alpha \in ] -\infty ,0[ \cup ] 0,1[ \)

$$\begin{aligned} \left( {\displaystyle \int \limits _{X}} \varphi ^{\alpha }d\mu \right) ^{\frac{1}{\alpha }}+s\left( {\displaystyle \int \limits _{X}} \varphi d\nu -\left( {\displaystyle \int \limits _{X}} \varphi ^{\alpha }d\nu \right) ^{\frac{1}{\alpha }}\right) \le {\displaystyle \int \limits _{X}} \varphi d\mu , \end{aligned}$$

while for \(\alpha \in [ 1,\infty [ \) the reverse inequality holds. For \(\alpha =0\) we have

$$\begin{aligned} \exp \left( {\displaystyle \int \limits _{X}} \ln \circ \varphi d\mu \right) +s\left( {\displaystyle \int \limits _{X}} \varphi d\nu -\exp \left( {\displaystyle \int \limits _{X}} \ln \circ \varphi d\nu \right) \right) \le {\displaystyle \int \limits _{X}} \varphi d\mu . \end{aligned}$$
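The discrete inequality of part (a) is easy to test numerically; the following snippet (an illustration of ours, with arbitrarily chosen data) verifies the refined arithmetic-geometric mean inequality for one instance.

```python
import math

# Numerical sanity check of the refined arithmetic-geometric mean inequality
# of Example (a); the concrete data are an arbitrary illustrative choice.
p = [0.2, 0.3, 0.5]
v = [1.0, 2.0, 8.0]
idx = [1, 2]                                   # subsequence i_1 < i_2
q = [0.4, 0.6]

geo_full = math.prod(vi ** pi for pi, vi in zip(p, v))   # prod v_i^{p_i}
m = min(p[i] / qj for i, qj in zip(idx, q))              # min p_{i_j}/q_{i_j}
sub_arith = sum(qj * v[i] for i, qj in zip(idx, q))
sub_geo = math.prod(v[i] ** qj for i, qj in zip(idx, q))
arith_full = sum(pi * vi for pi, vi in zip(p, v))

lhs = geo_full + m * (sub_arith - sub_geo)
assert sub_arith >= sub_geo                    # AM-GM for the sub-system
assert lhs <= arith_full + 1e-12               # the refinement
```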

In the following result we obtain refinements of Hölder’s inequality. If \(\left( X,\mathcal {A},\mu \right) \) is a measure space, then \(L^{\alpha }\left( \mu \right) \) \(\left( \alpha \ge 1\right) \) denotes the vector space of all complex-valued measurable functions f on X for which \(\left| f\right| ^{\alpha }\) is \(\mu \)-integrable.

Proposition 4.5

Assume \(\alpha \), \(\beta >1\) with \(\frac{1}{\alpha }+\frac{1}{\beta }=1\).

  1. (a)

    If \(p_{1},\ldots ,p_{n}\) and \(q_{i_{1}},\ldots ,q_{i_{k}}\) are positive numbers, where \(1\le i_{1}<i_{2}<\cdots <i_{k}\le n\), and \(u_{1},\ldots ,u_{n}\in \mathbb {C}\), \(v_{1},\ldots ,v_{n}\in \mathbb {C}\), then

    $$\begin{aligned}{} & {} \sum \limits _{i=1}^{n}p_{i}\left| u_{i}\right| \left| v_{i}\right| \\{} & {} \qquad +\min _{1\le j\le k}\frac{p_{i_{j}}}{q_{i_{j}}}\left( \left( \sum \limits _{j=1}^{k}q_{i_{j}}\left| u_{i_{j}}\right| ^{\alpha }\right) ^{\frac{1}{\alpha }}\left( \sum \limits _{j=1}^{k}q_{i_{j}}\left| v_{i_{j} }\right| ^{\beta }\right) ^{\frac{1}{\beta }}-\sum \limits _{j=1}^{k} q_{i_{j}}\left| u_{i_{j}}\right| \left| v_{i_{j}}\right| \right) \\{} & {} \quad \le \left( \sum \limits _{i=1}^{n}p_{i}\left| u_{i}\right| ^{\alpha }\right) ^{\frac{1}{\alpha }}\left( \sum \limits _{i=1}^{n}p_{i}\left| v_{i}\right| ^{\beta }\right) ^{\frac{1}{\beta }}. \end{aligned}$$
  2. (b)

    Let \(\left( X,\mathcal {A}\right) \) be a measurable space with measures \(\xi \) and \(\eta \), and let u, \(v:X\rightarrow \mathbb {C}\) be measurable functions such that \(u\in L^{\alpha }\left( \xi \right) \cap L^{\alpha }\left( \eta \right) \), \(v\in L^{\beta }\left( \xi \right) \cap L^{\beta }\left( \eta \right) \). If

    $$\begin{aligned} 0<s:=\inf _{\left\{ A\in \mathcal {A}\mid \int \limits _{A}\left| v\right| ^{\beta }d\eta >0\right\} }\frac{\int \limits _{A}\left| v\right| ^{\beta }d\xi }{\int \limits _{A}\left| v\right| ^{\beta }d\eta }, \end{aligned}$$
    (4.5)

    then

    $$\begin{aligned}{} & {} \int \limits _{X}\left| u\right| \left| v\right| d\xi +s\left( \left( \int \limits _{X}\left| u\right| ^{\alpha }d\eta \right) ^{\frac{1}{\alpha }}\left( \int \limits _{X}\left| v\right| ^{\beta }d\eta \right) ^{\frac{1}{\beta }}- {\displaystyle \int \limits _{X}} \left| u\right| \left| v\right| d\eta \right) \end{aligned}$$
    (4.6)
    $$\begin{aligned}{} & {} \quad \le \left( \int \limits _{X}\left| u\right| ^{\alpha }d\xi \right) ^{\frac{1}{\alpha }}\left( \int \limits _{X}\left| v\right| ^{\beta } d\xi \right) ^{\frac{1}{\beta }}. \end{aligned}$$
    (4.7)

Proof

We only prove (b); (a) can be proved similarly, using Theorem 3.1 (a) instead of Theorem 3.3.

(b) It follows from (4.5) that \(\int \limits _{X}\left| v\right| ^{\beta }d\xi >0\) and \(\int \limits _{X}\left| v\right| ^{\beta }d\eta >0\).

Define the measure \(\mu \) on \(\mathcal {A}\) having density \(\left| v\right| ^{\beta }/\int \limits _{X}\left| v\right| ^{\beta }d\xi \) with respect to \(\xi \), that is

$$\begin{aligned} \mu \left( A\right) :=\frac{1}{\int \limits _{X}\left| v\right| ^{\beta }d\xi }\cdot \int \limits _{A}\left| v\right| ^{\beta }d\xi ,\quad A\in \mathcal {A}. \end{aligned}$$

The measure \(\nu \) on \(\mathcal {A}\) is defined similarly by

$$\begin{aligned} \nu \left( A\right) :=\frac{1}{\int \limits _{X}\left| v\right| ^{\beta }d\eta }\cdot \int \limits _{A}\left| v\right| ^{\beta }d\eta ,\quad A\in \mathcal {A}. \end{aligned}$$

Then \(\mu \) and \(\nu \) are probability measures on \(\mathcal {A}\), and \(u\in L^{\alpha }\left( \xi \right) \cap L^{\alpha }\left( \eta \right) \) shows that the function

$$\begin{aligned} \varphi :=\left| u\right| ^{\alpha }\left| v\right| ^{-\beta } \end{aligned}$$

is \(\mu \)- and \(\nu \)-integrable. The function \(f:[ 0,\infty [ \rightarrow \mathbb {R}\), \(f(t)=-t^{1/\alpha }\) is strictly convex, and since \(uv\in L^{1}\left( \xi \right) \cap L^{1}\left( \eta \right) \), we have that

$$\begin{aligned} f\circ \varphi =-\left| u\right| \left| v\right| ^{-\beta /\alpha } \end{aligned}$$

is also \(\mu \)- and \(\nu \)-integrable. Theorem 3.3 can therefore be applied to the functions f and \(\varphi \) and the measures \(\mu \) and \(\nu \) introduced above, and hence

$$\begin{aligned} -\left( {\displaystyle \int \limits _{X}} \varphi d\mu \right) ^{\frac{1}{\alpha }}+s\left( {\displaystyle \int \limits _{X}} -\varphi ^{1/\alpha }d\nu +\left( {\displaystyle \int \limits _{X}} \varphi d\nu \right) ^{\frac{1}{\alpha }}\right) \le {\displaystyle \int \limits _{X}} -\varphi ^{1/\alpha }d\mu . \end{aligned}$$
(4.8)

By considering the relationship between \(\mu \)- and \(\xi \)- as well as \(\nu \)- and \(\eta \)-integrals, (4.8) can be rewritten in the form

$$\begin{aligned}{} & {} \left( \frac{1}{\int \limits _{X}\left| v\right| ^{\beta }d\xi }\cdot \int \limits _{X}\left| u\right| ^{\alpha }d\xi \right) ^{\frac{1}{\alpha }} \\{} & {} \qquad -s\frac{\int \limits _{X}\left| v\right| ^{\beta }d\eta }{\int \limits _{X}\left| v\right| ^{\beta }d\xi }\left( -\frac{1}{\int \limits _{X}\left| v\right| ^{\beta }d\eta }\cdot {\displaystyle \int \limits _{X}} \left| u\right| \left| v\right| ^{-\beta /\alpha }\left| v\right| ^{\beta }d\eta \right. \\{} & {} \qquad \left. +\left( \frac{1}{\int \limits _{X}\left| v\right| ^{\beta }d\eta }\cdot {\displaystyle \int \limits _{X}} \left| u\right| ^{\alpha }d\eta \right) ^{\frac{1}{\alpha }}\right) \\{} & {} \quad \ge \frac{1}{\int \limits _{X}\left| v\right| ^{\beta }d\xi }\cdot \int \limits _{X}\left| u\right| \left| v\right| ^{-\beta /\alpha }\left| v\right| ^{\beta }d\xi , \end{aligned}$$

from which we obtain (4.6)–(4.7) by a simple calculation.

The proof is complete. \(\square \)
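For part (a) a direct numerical verification is straightforward; the snippet below (an illustration of ours, with arbitrarily chosen positive weights, which need not sum to 1) checks the refined discrete Hölder inequality for one instance with \(\alpha =3\), \(\beta =3/2\).

```python
# Illustrative numerical check of the refined Hoelder inequality of
# Proposition 4.5 (a); the concrete data are our own choice.
alpha, beta = 3.0, 1.5
assert abs(1 / alpha + 1 / beta - 1.0) < 1e-12   # conjugate exponents

p = [1.0, 2.0, 0.5, 1.5]         # positive weights (need not sum to 1)
u = [1.0, 0.5, 2.0, 1.0]
v = [2.0, 1.0, 1.0, 3.0]
idx = [0, 3]                     # subsequence i_1 < i_2
q = [2.0, 1.0]

def hoelder(weights, us, vs):
    """Right-hand side of Hoelder's inequality for the given weights."""
    a = sum(w * abs(x) ** alpha for w, x in zip(weights, us)) ** (1 / alpha)
    b = sum(w * abs(y) ** beta for w, y in zip(weights, vs)) ** (1 / beta)
    return a * b

lhs_sum = sum(w * abs(x) * abs(y) for w, x, y in zip(p, u, v))
m = min(p[i] / qj for i, qj in zip(idx, q))
sub_gap = (hoelder(q, [u[i] for i in idx], [v[i] for i in idx])
           - sum(qj * abs(u[i]) * abs(v[i]) for i, qj in zip(idx, q)))
rhs = hoelder(p, u, v)

assert sub_gap >= -1e-12          # Hoelder gap of the sub-system
assert lhs_sum + m * sub_gap <= rhs + 1e-12
```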

Remark 4.6

  1. (a)

    Obviously, starting from Theorem 3.1 (b), we could make a statement analogous to (a) for infinite sums.

  2. (b)

    If \(v\ne 0\) \(\eta \)-a.e. on X, then

    $$\begin{aligned} s=\inf _{\left\{ A\in \mathcal {A}\mid \eta \left( A\right) >0\right\} } \frac{\int \limits _{A}\left| v\right| ^{\beta }d\xi }{\int \limits _{A} \left| v\right| ^{\beta }d\eta }. \end{aligned}$$

Finally, some applications to information theory are presented.

Throughout the rest of the paper probability measures P and Q are defined on a fixed measurable space \(\left( X,\mathcal {A}\right) \). It is also assumed that P and Q are continuous with respect to a \(\sigma \)-finite measure \(\xi \) on \(\mathcal {A}\). The Radon–Nikodym derivatives of P and Q with respect to \(\xi \) are denoted by p and q, respectively. These densities are \(\xi \)-almost everywhere uniquely determined.

Introduce the set of functions

$$\begin{aligned} F:=\left\{ f:] 0,\infty [ \rightarrow \mathbb {R}\mid f\text { is convex}\right\} , \end{aligned}$$

and define for every \(f\in F\) the function

$$\begin{aligned} f^{*}:] 0,\infty [ \rightarrow \mathbb {R},\quad f^{*}\left( t\right) :=tf\left( \frac{1}{t}\right) . \end{aligned}$$

If \(f\in F\), then either f is monotone or there exists a point \(t_{0} \in ] 0,\infty [ \) such that f is decreasing on \(] 0,t_{0}[ \) and increasing on \([ t_{0},\infty [ \). This implies that the limit

$$\begin{aligned} \lim \limits _{t\rightarrow 0+}f\left( t\right) \end{aligned}$$

exists in \(] -\infty ,\infty ] \), and

$$\begin{aligned} f\left( 0\right) :=\lim \limits _{t\rightarrow 0+}f\left( t\right) \end{aligned}$$

extends f into a convex function on \([ 0,\infty [ \).

It is well known that for every \(f\in F\) the function \(f^{*}\) also belongs to F, and therefore

$$\begin{aligned} f^{*}\left( 0\right) :=\lim \limits _{t\rightarrow 0+}f^{*}\left( t\right) =\lim \limits _{u\rightarrow \infty }\frac{f\left( u\right) }{u}. \end{aligned}$$

The important notion of f-divergences was introduced in [2, 3], and independently in [1].

Definition 4.7

For every \(f\in F\) we define the f-divergence of P and Q by

$$\begin{aligned} D_{f}\left( P,Q\right) := {\displaystyle \int \limits _{X}} q\left( \omega \right) f\left( \frac{p\left( \omega \right) }{q\left( \omega \right) }\right) d\xi \left( \omega \right) , \end{aligned}$$

where the following conventions are used

$$\begin{aligned} 0f\left( \frac{x}{0}\right) :=xf^{*}\left( 0\right) \text { if }x>0,\quad 0f\left( \frac{0}{0}\right) =0f^{*}\left( 0\right) :=0. \end{aligned}$$
(4.9)
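When X is finite and \(\xi \) is the counting measure, Definition 4.7 reduces to \(D_{f}\left( P,Q\right) =\sum _{i}q_{i}f\left( p_{i}/q_{i}\right) \); the following sketch (our own illustration, with names of our choosing) implements this discrete case together with the conventions (4.9), and instantiates it with \(f\left( t\right) =t\ln t\), i.e. the Kullback–Leibler divergence.

```python
import math

# Minimal sketch of Definition 4.7 for finite discrete distributions, i.e.
# X = {0,...,n-1} and xi = counting measure, so D_f(P,Q) = sum_i q_i f(p_i/q_i).
# The function names are our own.
def f_divergence(p, q, f, f_star_at_0):
    """Discrete f-divergence with the conventions (4.9)."""
    total = 0.0
    for pi, qi in zip(p, q):
        if qi > 0:
            total += qi * f(pi / qi)
        elif pi > 0:                     # 0 f(x/0) := x f*(0) for x > 0
            total += pi * f_star_at_0
        # pi == qi == 0 contributes 0 by convention
    return total

# Kullback-Leibler divergence: f(t) = t log t (with f(0) := 0 by the limit),
# and f*(0) = lim_{u -> oo} f(u)/u = +oo.
def kl(p, q):
    return f_divergence(p, q, lambda t: t * math.log(t) if t > 0 else 0.0,
                        math.inf)

p = [0.5, 0.3, 0.2]
q = [0.4, 0.4, 0.2]
d = kl(p, q)
assert d >= 0.0                          # basic inequality (4.11): f(1) = 0
```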

Remark 4.8

  1. (a)

    For every \(f\in F\) the perspective \(\hat{f}:] 0,\infty [ \times ] 0,\infty [ \rightarrow \mathbb {R}\) of f is defined by

    $$\begin{aligned} \hat{f}\left( x,y\right) :=yf\left( \frac{x}{y}\right) . \end{aligned}$$

    Then (see [11]) \(\hat{f}\) is also a convex function. It is proved in [12] that (4.9) is the unique rule leading to convex and lower semicontinuous extension of \(\hat{f}\) to the set

    $$\begin{aligned} \left\{ \left( x,y\right) \in \mathbb {R}^{2}\mid x,y\ge 0\right\} . \end{aligned}$$
  2. (b)

    Since \(f^{*}\left( 0\right) \in ] -\infty ,\infty ] \), Lemma 2.8 in [7] shows that \(D_{f}\left( P,Q\right) \) exists in \(] -\infty ,\infty ] \) and

    $$\begin{aligned} D_{f}\left( P,Q\right) = {\displaystyle \int \limits _{\left( q>0\right) }} f\left( \frac{p\left( \omega \right) }{q\left( \omega \right) }\right) dQ\left( \omega \right) +f^{*}\left( 0\right) P\left( q=0\right) . \end{aligned}$$
    (4.10)

It follows that if P is continuous with respect to Q, then

$$\begin{aligned} D_{f}\left( P,Q\right) = {\displaystyle \int \limits _{\left( q>0\right) }} f\left( \frac{p\left( \omega \right) }{q\left( \omega \right) }\right) dQ\left( \omega \right) . \end{aligned}$$

The basic inequality (see [8])

$$\begin{aligned} D_{f}\left( P,Q\right) \ge f\left( 1\right) \end{aligned}$$
(4.11)

is one of the key properties of f-divergences.

In the next result we refine this inequality.

Proposition 4.9

Assume densities p and q are positive functions such that

$$\begin{aligned} 0<s:=\inf _{\left\{ A\in \mathcal {A}\mid P\left( A\right)>0\right\} } \frac{Q\left( A\right) }{P\left( A\right) }=\inf _{\left\{ A\in \mathcal {A}\mid \xi \left( A\right) >0\right\} }\frac{\int \limits _{A}qd\xi }{\int \limits _{A}pd\xi }, \end{aligned}$$
(4.12)

and \(\frac{p}{q}\) is a P-integrable function. If \(f\in F\) such that \(f\circ \frac{p}{q}\) is P- and Q-integrable, then

$$\begin{aligned} f\left( 1\right) +s\left( {\displaystyle \int \limits _{X}} \left( f\circ \frac{p}{q}\right) dP-f\left( {\displaystyle \int \limits _{X}} \frac{p}{q}dP\right) \right) \le D_{f}\left( P,Q\right) . \end{aligned}$$
(4.13)

Proof

It can be obtained easily from Theorem 3.3 by choosing \(\mu =Q\), \(\nu =P\) and \(\varphi =\frac{p}{q}\).

As we have mentioned in Remark 3.4 (a), the measure P is continuous with respect to Q, and hence by Remark 4.8 (b),

$$\begin{aligned} {\displaystyle \int \limits _{X}} \left( f\circ \frac{p}{q}\right) dQ=D_{f}\left( P,Q\right) . \end{aligned}$$

The proof is complete. \(\square \)
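Inequality (4.13) can also be tested numerically in the finite discrete setting; below is a sketch of ours with \(\xi \) the counting measure on a three-point set, \(f\left( t\right) =t\ln t\) (so \(f\left( 1\right) =0\)), and positive densities chosen arbitrarily. For finite X with positive p, the mediant inequality shows that the infimum in (4.12) is attained on singletons.

```python
import math

# Hypothetical finite-X illustration of Proposition 4.9 with xi = counting
# measure, f(t) = t log t and positive densities p, q (our own choice of data).
p = [0.5, 0.3, 0.2]
q = [0.4, 0.4, 0.2]

f = lambda t: t * math.log(t)

# By the mediant inequality, the infimum in (4.12) over all sets A is
# attained on singletons, so s = min_i q_i / p_i here.
s = min(qi / pi for pi, qi in zip(p, q))

ratio = [pi / qi for pi, qi in zip(p, q)]             # p/q
D_f = sum(qi * f(r) for qi, r in zip(q, ratio))       # D_f(P,Q)
int_f_dP = sum(pi * f(r) for pi, r in zip(p, ratio))  # int f(p/q) dP
int_dP = sum(pi * r for pi, r in zip(p, ratio))       # int p/q dP

lhs = f(1.0) + s * (int_f_dP - f(int_dP))
assert int_f_dP - f(int_dP) >= -1e-12                 # Jensen gap w.r.t. P
assert lhs <= D_f + 1e-12                             # refinement (4.13)
```

Since the Jensen gap with respect to P is nonnegative, the left-hand side of (4.13) dominates the plain bound \(f\left( 1\right) \) of (4.11).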

Remark 4.10

  1. (a)

    Since P is continuous with respect to Q, P has a Radon–Nikodym derivative \(u:X\rightarrow \mathbb {R}\) with respect to Q. By the almost everywhere uniqueness of the Radon–Nikodym derivative, \(p=uq\) \(\xi \)-a.e. Using this, (4.13) can be written in other forms:

    $$\begin{aligned} f\left( 1\right) +s\left( {\displaystyle \int \limits _{X}} \left( f\circ \frac{p}{q}\right) pd\xi -f\left( {\displaystyle \int \limits _{X}} \frac{p^{2}}{q}d\xi \right) \right) \le D_{f}\left( P,Q\right) \end{aligned}$$

    or

    $$\begin{aligned} f\left( 1\right) +s\left( {\displaystyle \int \limits _{X}} \left( f\circ u\right) udQ-f\left( {\displaystyle \int \limits _{X}} u^{2}dQ\right) \right) \le D_{f}\left( P,Q\right) . \end{aligned}$$
  2. (b)

    As we have mentioned in Remark 3.6 (c), if \(0<\inf _{X}\frac{q}{p}\), then condition (4.12) is satisfied as well.