A generalized Cauchy-Schwarz inequality via the Gibbs variational formula

Let $\Omega$ be a non-empty finite set. If $X$ is a random variable taking values in $\Omega$, the Shannon entropy ${\bf H}[X]$ of $X$ is defined as

$$ {\bf H}[X] := -\sum_{\omega \in \Omega} {\bf P}[X = \omega] \log {\bf P}[X = \omega]. $$
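In concrete terms, this quantity is easy to compute numerically; the following few lines of Python are only an illustrative sketch (the helper name `shannon_entropy` is not from any standard library), using the natural logarithm convention of this note.

```python
import math

def shannon_entropy(dist):
    """Shannon entropy H[X] = -sum_omega P[X = omega] * log P[X = omega] (natural log)."""
    return -sum(p * math.log(p) for p in dist.values() if p > 0)

# a random variable taking three values with the indicated probabilities
print(shannon_entropy({"a": 0.5, "b": 0.25, "c": 0.25}))  # 1.5 * log(2), approx 1.0397
```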
There is a nice variational formula that lets one compute logs of sums of exponentials in terms of this entropy:
Lemma 1 (Gibbs variational formula) Let $f \colon \Omega \to {\bf R}$ be a function. Then

$$ \log \sum_{\omega \in \Omega} \exp(f(\omega)) = \sup_X \left( {\bf E} f(X) + {\bf H}[X] \right), \qquad (1) $$

where the supremum ranges over all random variables $X$ taking values in $\Omega$.
Proof: Note that shifting $f$ by a constant affects both sides of (1) the same way, so we may normalize $\sum_{\omega \in \Omega} \exp(f(\omega)) = 1$. Then $\exp(f)$ is now the probability distribution of some random variable $Y$, and the inequality can be rewritten as

$$ 0 = \sup_X \left( \sum_{\omega \in \Omega} {\bf P}[X = \omega] \log {\bf P}[Y = \omega] + {\bf H}[X] \right). $$
But this is precisely the Gibbs inequality. (The expression inside the supremum can also be written as $-D_{KL}(X \| Y)$, where $D_{KL}$ denotes the Kullback-Leibler divergence. One can also interpret this inequality as a special case of the Fenchel–Young inequality relating the conjugate convex functions $x \mapsto e^x$ and $y \mapsto y \log y - y$.)
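As a quick numerical sanity check of Lemma 1 (a throwaway sketch, not part of the argument), one can verify that the left-hand side of (1) equals the value of the expression inside the supremum at the Gibbs distribution ${\bf P}[X = \omega] \propto \exp(f(\omega))$, and that randomly chosen distributions never exceed it:

```python
import math, random

random.seed(0)
Omega = range(5)
f = {w: random.uniform(-2.0, 2.0) for w in Omega}

def entropy(p):
    return -sum(q * math.log(q) for q in p.values() if q > 0)

def functional(p):
    # E f(X) + H[X] for a random variable X with distribution p on Omega
    return sum(p[w] * f[w] for w in Omega) + entropy(p)

lhs = math.log(sum(math.exp(f[w]) for w in Omega))

# the Gibbs distribution P[X = w] proportional to exp(f(w)) attains the supremum
Z = sum(math.exp(f[w]) for w in Omega)
gibbs = {w: math.exp(f[w]) / Z for w in Omega}
print(lhs, functional(gibbs))  # these two numbers agree

# randomly chosen distributions never exceed the left-hand side
for _ in range(1000):
    weights = [random.random() for _ in Omega]
    p = {w: weights[w] / sum(weights) for w in Omega}
    assert functional(p) <= lhs + 1e-12
```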

In this note I would like to use this variational formula (which is also known as the Donsker-Varadhan variational formula) to give another proof of the following inequality of Carbery.
Theorem 2 (Generalized Cauchy-Schwarz inequality) Let $n \geq 0$, let $Z, X_1, \dots, X_n$ be finite non-empty sets, and let $\pi_i \colon Z \to X_i$ be functions for each $i = 1, \dots, n$. Let $K \colon Z \to (0,+\infty)$ and $f_i \colon X_i \to (0,+\infty)$ be positive functions for each $i = 1, \dots, n$. Then

$$ \sum_{z \in Z} K(z) \prod_{i=1}^n f_i(\pi_i(z)) \leq Q \prod_{i=1}^n \Big( \sum_{x_i \in X_i} f_i(x_i)^{n+1} \Big)^{1/(n+1)} $$

where $Q$ is the quantity

$$ Q := \Big( \sum_{(z_0, z_1, \dots, z_n) \in \Delta} K(z_0) K(z_1) \cdots K(z_n) \Big)^{1/(n+1)} $$

where $\Delta \subset Z^{n+1}$ is the set of all tuples $(z_0, z_1, \dots, z_n)$ such that $\pi_i(z_i) = \pi_i(z_0)$ for $i = 1, \dots, n$.

Thus for instance, the inequality is a trivial identity for $n = 0$. When $n = 1$, the inequality reads

$$ \sum_{z \in Z} K(z) f_1(\pi_1(z)) \leq \Big( \sum_{z_0, z_1 \in Z: \pi_1(z_1) = \pi_1(z_0)} K(z_0) K(z_1) \Big)^{1/2} \Big( \sum_{x_1 \in X_1} f_1(x_1)^2 \Big)^{1/2} $$

which is easily proven by Cauchy-Schwarz, while for $n = 2$ the inequality reads

$$ \sum_{z \in Z} K(z) f_1(\pi_1(z)) f_2(\pi_2(z)) \leq \Big( \sum_{z_0, z_1, z_2 \in Z: \pi_1(z_1) = \pi_1(z_0), \pi_2(z_2) = \pi_2(z_0)} K(z_0) K(z_1) K(z_2) \Big)^{1/3} \Big( \sum_{x_1 \in X_1} f_1(x_1)^3 \Big)^{1/3} \Big( \sum_{x_2 \in X_2} f_2(x_2)^3 \Big)^{1/3} $$
which can also be proven by elementary means. However even for $n = 3$, the existing proofs require the “tensor power trick” in order to reduce to the case when the $f_i$ are step functions (in which case the inequality can be proven elementarily, as discussed in the above paper of Carbery).
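As a sanity check on the $n = 2$ case as formulated above, one can test the inequality on random data by brute force. The following sketch is only illustrative: the sets, projections and function values are arbitrary placeholders, and the code simply computes both sides of the displayed $n = 2$ inequality.

```python
import random

random.seed(1)
Z = list(range(8))                               # the finite set Z
X1, X2 = list(range(3)), list(range(4))          # the finite sets X_1, X_2
pi1 = {z: random.choice(X1) for z in Z}          # pi_1 : Z -> X_1
pi2 = {z: random.choice(X2) for z in Z}          # pi_2 : Z -> X_2
K  = {z: random.uniform(0.1, 1.0) for z in Z}    # positive K : Z -> (0, infinity)
f1 = {x: random.uniform(0.1, 1.0) for x in X1}   # positive f_1 : X_1 -> (0, infinity)
f2 = {x: random.uniform(0.1, 1.0) for x in X2}   # positive f_2 : X_2 -> (0, infinity)

lhs = sum(K[z] * f1[pi1[z]] * f2[pi2[z]] for z in Z)

# Q^3 = sum over (z0, z1, z2) with pi_1(z1) = pi_1(z0) and pi_2(z2) = pi_2(z0)
Q_cubed = sum(K[z0] * K[z1] * K[z2]
              for z0 in Z for z1 in Z for z2 in Z
              if pi1[z1] == pi1[z0] and pi2[z2] == pi2[z0])

rhs = (Q_cubed * sum(v ** 3 for v in f1.values())
               * sum(v ** 3 for v in f2.values())) ** (1.0 / 3.0)

assert lhs <= rhs + 1e-9
print(lhs, rhs)
```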

We now prove this inequality. We write $K = e^k$ and $f_i = e^{\phi_i}$ for some functions $k \colon Z \to {\bf R}$ and $\phi_i \colon X_i \to {\bf R}$. If we take logarithms in the inequality to be proven, multiply by $n+1$, and apply Lemma 1, the inequality becomes

$$ (n+1) \sup_{W} \Big( {\bf E} \Big[ k(W) + \sum_{i=1}^n \phi_i(\pi_i(W)) \Big] + {\bf H}[W] \Big) \leq \sup_{(W_0,\dots,W_n)} \Big( {\bf E} \big[ k(W_0) + \dots + k(W_n) \big] + {\bf H}[W_0,\dots,W_n] \Big) + \sum_{i=1}^n \sup_{U_i} \Big( (n+1) {\bf E} \phi_i(U_i) + {\bf H}[U_i] \Big) $$

where $W$ ranges over random variables taking values in $Z$, $(W_0,\dots,W_n)$ range over tuples of random variables taking values in $\Delta$, and $U_1,\dots,U_n$ range over random variables taking values in $X_1,\dots,X_n$ respectively. Comparing the suprema, the claim now reduces to
Lemma 3 (Conditional expectation computation) Let $W$ be a $Z$-valued random variable. Then there exists a $\Delta$-valued random variable $(W_0, W_1, \dots, W_n)$, where each $W_i$ has the same distribution as $W$, and

$$ {\bf H}[W_0, W_1, \dots, W_n] = (n+1) {\bf H}[W] - \sum_{i=1}^n {\bf H}[\pi_i(W)]. $$
Proof: We induct on $n$. When $n = 0$ we just take $W_0 = W$. Now suppose that $n \geq 1$, and the claim has already been proven for $n-1$, thus one has already obtained a tuple $(W_0, \dots, W_{n-1})$ with each $W_i$ having the same distribution as $W$, with $\pi_i(W_i) = \pi_i(W_0)$ for $i = 1, \dots, n-1$, and

$$ {\bf H}[W_0, \dots, W_{n-1}] = n {\bf H}[W] - \sum_{i=1}^{n-1} {\bf H}[\pi_i(W)]. $$

By hypothesis, $W_0$ has the same distribution as $W$, so $\pi_n(W_0)$ has the same distribution as $\pi_n(W)$. For each value $y$ attained by $\pi_n(W_0)$, we can take conditionally independent copies of $(W_0, \dots, W_{n-1})$ and $W$ conditioned to the events $\pi_n(W_0) = y$ and $\pi_n(W) = y$ respectively, and then concatenate them to form a tuple $(W_0, \dots, W_{n-1}, W_n)$ in $\Delta$, with a further copy $W_n$ of $W$ that is conditionally independent of $(W_0, \dots, W_{n-1})$ relative to $\pi_n(W_0)$. One can then use the entropy chain rule to compute

$$ {\bf H}[W_0, \dots, W_n] = {\bf H}[W_0, \dots, W_{n-1}] + {\bf H}[W_n \mid W_0, \dots, W_{n-1}] = {\bf H}[W_0, \dots, W_{n-1}] + {\bf H}[W_n \mid \pi_n(W_0)] = {\bf H}[W_0, \dots, W_{n-1}] + {\bf H}[W] - {\bf H}[\pi_n(W)] $$
and the claim now follows from the induction hypothesis.
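To illustrate the chain rule computation above, here is a small numerical check (again only a sketch, with an arbitrary distribution and arbitrary projections) that the coupling described in the proof satisfies the entropy identity of Lemma 3 in the case $n = 2$: $W_1$ and $W_2$ are copies of $W_0$ that are conditionally independent given $\pi_1(W_0)$ and $\pi_2(W_0)$ respectively.

```python
import math, random
from collections import defaultdict

random.seed(2)
Zset = list(range(6))
pi1 = {z: z % 2 for z in Zset}        # pi_1 : Z -> {0, 1}
pi2 = {z: z % 3 for z in Zset}        # pi_2 : Z -> {0, 1, 2}

weights = [random.random() for _ in Zset]
p = {z: weights[z] / sum(weights) for z in Zset}   # an arbitrary law for W

def entropy(dist):
    return -sum(q * math.log(q) for q in dist.values() if q > 0)

def pushforward(pi):
    out = defaultdict(float)
    for z, q in p.items():
        out[pi[z]] += q
    return dict(out)

m1, m2 = pushforward(pi1), pushforward(pi2)        # laws of pi_1(W), pi_2(W)

# joint law of (W_0, W_1, W_2): W_1, W_2 are copies of W_0, conditionally
# independent given pi_1(W_0) and pi_2(W_0) respectively, as in the proof
joint = {}
for z0 in Zset:
    for z1 in Zset:
        if pi1[z1] != pi1[z0]:
            continue
        for z2 in Zset:
            if pi2[z2] != pi2[z0]:
                continue
            joint[(z0, z1, z2)] = p[z0] * (p[z1] / m1[pi1[z0]]) * (p[z2] / m2[pi2[z0]])

lhs = entropy(joint)
rhs = 3 * entropy(p) - entropy(m1) - entropy(m2)
print(lhs, rhs)   # the two sides agree up to rounding
```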

With a little more effort, one can replace the finite sets here by more general measure spaces (and use differential entropy in place of Shannon entropy) to recover Carbery’s inequality in full generality; we leave the details to the interested reader.
