Article

Structural Properties of the Wyner–Ziv Rate Distortion Function: Applications for Multivariate Gaussian Sources †

by
Michail Gkagkos
1 and
Charalambos D. Charalambous
2,*
1
Department of Electrical and Computer Engineering, Texas A&M University, College Station, TX 77843, USA
2
Department of Electrical and Computer Engineering, University of Cyprus, P.O. Box 20537, CY-1678 Nicosia, Cyprus
*
Author to whom correspondence should be addressed.
† Preliminary results of this paper were published in the proceedings of the 2021 IEEE International Symposium on Information Theory (ISIT), Melbourne, Victoria, Australia.
Entropy 2024, 26(4), 306; https://doi.org/10.3390/e26040306
Submission received: 4 January 2024 / Revised: 7 March 2024 / Accepted: 20 March 2024 / Published: 29 March 2024

Abstract:

The main focus of this paper is the derivation of the structural properties of the test channels of Wyner's operational information rate distortion function (RDF), $\bar{R}(\Delta_X)$, for arbitrary abstract sources and, subsequently, the derivation of additional properties for a tuple of multivariate, correlated, jointly independent and identically distributed Gaussian random variables, $\{X_t, Y_t\}_{t=1}^{\infty}$, $X_t: \Omega \to \mathbb{R}^{n_x}$, $Y_t: \Omega \to \mathbb{R}^{n_y}$, with average mean-square error at the decoder and the side information, $\{Y_t\}_{t=1}^{\infty}$, available only at the decoder. For the tuple of multivariate correlated Gaussian sources, we construct optimal test channel realizations which achieve the informational RDF $\bar{R}(\Delta_X) = \inf_{\mathcal{M}(\Delta_X)} I(X;Z|Y)$, where $\mathcal{M}(\Delta_X)$ is the set of auxiliary RVs $Z$ such that $P_{Z|X,Y} = P_{Z|X}$, $\hat{X} = f(Y,Z)$, and $\mathbf{E}\{||X - \hat{X}||^2\} \le \Delta_X$. We show the following fundamental structural properties: (1) Optimal test channel realizations that achieve the RDF satisfy the conditional independence $P_{X|\hat{X},Y,Z} = P_{X|\hat{X},Y} = P_{X|\hat{X}}$ and $\mathbf{E}\{X|\hat{X},Y,Z\} = \mathbf{E}\{X|\hat{X}\} = \hat{X}$. (2) Similarly, for the conditional RDF, $R_{X|Y}(\Delta_X)$, when the side information is available to both the encoder and the decoder, we show the equality $\bar{R}(\Delta_X) = R_{X|Y}(\Delta_X)$. (3) We derive the water-filling solution for $R_{X|Y}(\Delta_X)$.

1. Introduction, Problem Statement, and Main Results

1.1. The Wyner and Ziv Lossy Compression Problem and Generalizations

Wyner and Ziv [1] derived an operational information definition for the lossy compression problem in Figure 1 with respect to a single-letter fidelity of reconstruction. The joint sequence of random variables (RVs) $\{(X_t, Y_t): t = 1, 2, \ldots\}$ takes values in sets of finite cardinality, $\{\mathcal{X}, \mathcal{Y}\}$, and is generated independently according to the joint probability distribution $P_{X,Y}$. Wyner [2] generalized [1] to RVs $\{(X_t, Y_t): t = 1, 2, \ldots\}$ that take values in abstract alphabet spaces $\{\mathcal{X}, \mathcal{Y}\}$ and hence include continuous-valued RVs.
(A) Switch "A" Closed: When the side information $\{Y_t: t = 1, 2, \ldots\}$ is available non-causally at both the encoder and the decoder, Wyner [2] (see also Berger [3]) characterized the infimum of all achievable operational rates (denoted by $\bar{R}_1(\Delta_X)$ in [2]), subject to a single-letter fidelity with average distortion less than or equal to $\Delta_X \in [0, \infty)$. The rate is given by the single-letter operational information theoretic conditional RDF:

$$R_{X|Y}(\Delta_X) = \inf_{\mathcal{M}_0(\Delta_X)} I(X; \hat{X}|Y) \in [0, \infty], \quad \Delta_X \in [0, \infty) \tag{1}$$
$$= \inf_{P_{\hat{X}|X,Y}:\ \mathbf{E}\{d_X(X,\hat{X})\} \le \Delta_X} I(X; \hat{X}|Y), \tag{2}$$

where $\mathcal{M}_0(\Delta_X)$ is the set specified by

$$\mathcal{M}_0(\Delta_X) = \Big\{ \hat{X}: \Omega \to \hat{\mathcal{X}} : P_{X,Y,\hat{X}} \ \text{is the joint measure on} \ \mathcal{X} \times \mathcal{Y} \times \hat{\mathcal{X}}, \ \mathbf{E}\{d_X(X,\hat{X})\} \le \Delta_X \Big\}, \tag{3}$$

and $\hat{X}$ is the reproduction of $X$. Here, $I(X; \hat{X}|Y)$ is the conditional mutual information between $X$ and $\hat{X}$ conditioned on $Y$, and $d_X(\cdot,\cdot)$ is the fidelity criterion between $x$ and $\hat{x}$. The infimum in (1) is over all elements of $\mathcal{M}_0(\Delta_X)$ with induced joint distributions $P_{X,Y,\hat{X}}$ of the RVs $(X, Y, \hat{X})$ such that the marginal distribution $P_{X,Y}$ is the fixed joint distribution of the source $(X, Y)$. This problem is equivalent to (2) [4].
(B) Switch "A" Open: When the side information is available non-causally only at the decoder, Wyner [2] characterized the infimum of all achievable operational rates (denoted by $R(\Delta_X)$ in [2]), subject to a single-letter fidelity with average distortion less than or equal to $\Delta_X$. The rate is given by the single-letter operational information theoretic RDF, expressed in terms of an auxiliary RV $Z: \Omega \to \mathcal{Z}$:

$$\bar{R}(\Delta_X) = \inf_{\mathcal{M}(\Delta_X)} \big\{ I(X; Z) - I(Y; Z) \big\} \in [0, \infty], \quad \Delta_X \in [0, \infty) \tag{4}$$
$$= \inf_{\mathcal{M}(\Delta_X)} I(X; Z|Y), \tag{5}$$

where $\mathcal{M}(\Delta_X)$ is specified by the set of auxiliary RVs $Z$, defined as

$$\mathcal{M}(\Delta_X) = \Big\{ Z: \Omega \to \mathcal{Z} : P_{X,Y,Z,\hat{X}} \ \text{is the joint measure on} \ \mathcal{X} \times \mathcal{Y} \times \mathcal{Z} \times \hat{\mathcal{X}}, \ P_{Z|X,Y} = P_{Z|X}, \ \exists \ \text{a measurable function} \ f: \mathcal{Y} \times \mathcal{Z} \to \hat{\mathcal{X}}, \ \hat{X} = f(Y, Z), \ \mathbf{E}\{d_X(X,\hat{X})\} \le \Delta_X \Big\}. \tag{6}$$

Wyner's realization of the joint measure $P_{X,Y,Z,\hat{X}}$ induced by the RVs $(X, Y, Z, \hat{X})$ is illustrated in Figure 2, where $Z$ is the output of the "test channel" $P_{Z|X}$. Clearly, $\bar{R}(\Delta_X)$ involves two strategies, i.e., $f(\cdot,\cdot)$ and $P_{Z|X,Y} = P_{Z|X}$, which makes it a much more complex problem than $R_{X|Y}(\Delta_X)$ (which involves only $P_{\hat{X}|X,Y}$).
Throughout [2], the following assumption is imposed.
Assumption 1. 
$I(X;Y) < \infty$ (see [2]).
Wyner [2] considered scalar-valued jointly Gaussian RVs $(X, Y)$ with square-error distortion and constructed the optimal realizations $\hat{X}$ and $(Z, \hat{X})$ and the function $f(Y, Z)$ from the sets $\mathcal{M}_0(\Delta_X)$ and $\mathcal{M}(\Delta_X)$, respectively. He also showed that these realizations achieve the characterizations of the RDFs $R_{X|Y}(\Delta_X)$ and $\bar{R}(\Delta_X)$, respectively, and that the two rates are equal, i.e., $\bar{R}(\Delta_X) = R_{X|Y}(\Delta_X)$.
(C) Marginal RDF: If there is no side information $\{Y_t: t = 1, 2, \ldots\}$, or the side information is independent of the source $\{X_t: t = 1, 2, \ldots\}$, the RDFs $R_{X|Y}(\Delta_X)$ and $\bar{R}(\Delta_X)$ degenerate to the marginal RDF $R_X(\Delta_X)$, defined by

$$R_X(\Delta_X) = \inf_{P_{\hat{X}|X}:\ \mathbf{E}\{d_X(X,\hat{X})\} \le \Delta_X} I(X; \hat{X}) \in [0, \infty], \quad \Delta_X \in [0, \infty). \tag{7}$$

(D) Gray's Lower Bound: A lower bound on $R_{X|Y}(\Delta_X)$ is given by Gray in [4] [Theorem 3.1]. This bound connects $R_{X|Y}(\Delta_X)$ with the marginal RDF and the mutual information between $X$ and $Y$ as follows:

$$R_{X|Y}(\Delta_X) \ge R_X(\Delta_X) - I(X;Y). \tag{8}$$

Clearly, the lower bound is trivial for values of $\Delta_X \in [0, \infty)$ such that $R_X(\Delta_X) - I(X;Y) < 0$.

1.2. Main Contributions of the Paper

We first consider Wyner's [2] RDFs $R_{X|Y}(\Delta_X)$ and $\bar{R}(\Delta_X)$ for arbitrary RVs $(X, Y)$ defined on abstract alphabet spaces, and we derive structural properties of the realizations that achieve the two optimal test channels. Subsequently, we generalize Wyner's [2] results to multivariate-valued jointly Gaussian RVs $(X, Y)$. In other words, we construct the optimal multivariate-valued realizations $\hat{X}$ and $(\hat{X}, Z)$ and the function $f(Y, Z)$ which achieve the RDFs $R_{X|Y}(\Delta_X)$ and $\bar{R}(\Delta_X)$, respectively; in the literature, this is often called achievability of the converse coding theorem. In addition, we use the realizations to prove the equality $\bar{R}(\Delta_X) = R_{X|Y}(\Delta_X)$ and to derive the water-filling solution. Along the way, we verify that, for scalar-valued RVs $(X, Y)$, our results reproduce Wyner's [2] RDFs and optimal realizations. However, to our surprise, the existing results from the literature [[5], Theorem 4 and Abstract, and [6], Theorem 3A], which deal with the more general multivariate-valued remote sensor problem (the RDF of the remote sensor problem is a generalization of Wyner's RDF $\bar{R}(\Delta_X)$, with the encoder observing a noisy version of the RVs generated by the source), do not degenerate to Wyner's [2] RDFs when specialized to scalar-valued RVs (we verify this in Remark 5, by also checking the correction suggested in https://tiangroup.engr.tamu.edu/publications/, accessed on 3 January 2024). In Section 1.3, we give a detailed account of the main results of this paper. We emphasize that preliminary results of this paper appeared in [7], mostly without the details of the proofs. This paper extends [7] and contains complete proofs of the preliminary results of [7], which in some cases are lengthy (see, for example, Section 4, the proofs of Theorems 3–5, and Corollaries 1 and 2).

1.3. Problem Statement and Main Results

(a) We consider a tuple of jointly independent and identically distributed (i.i.d.) arbitrary RVs $(X^n, Y^n) = \{(X_t, Y_t): t = 1, 2, \ldots, n\}$ defined on abstract alphabet spaces, and we derive the following results.
(a.1) Lemma 1: An achievable lower bound on the conditional mutual information $I(X; \hat{X}|Y)$, which strengthens Gray's lower bound (8) [[4], Theorem 3.1].
(a.2) Theorem 2: Structural properties of the optimal reproduction $\hat{X}$, which achieves a lower bound on $R_{X|Y}(\Delta_X)$ for mean-square error distortion. Theorem 2 strengthens the conditions, given by Wyner [2] [Remarks, p. 65], for the equality $R_{X|Y}(\Delta_X) = \bar{R}(\Delta_X)$ to hold (see Remark 1). However, for finite-alphabet-valued sources with Hamming distortion, it may be the case that $R_{X|Y}(\Delta_X) < \bar{R}(\Delta_X)$, as pointed out by Wyner and Ziv [1] [Section 3] for the doubly symmetric binary source.
(b) We consider a tuple of jointly i.i.d. multivariate Gaussian RVs $(X^n, Y^n) = \{(X_t, Y_t): t = 1, 2, \ldots, n\}$, with respect to the square-error fidelity, as defined below:

$$X_t: \Omega \to \mathbb{R}^{n_x} = \mathcal{X}, \quad Y_t: \Omega \to \mathbb{R}^{n_y} = \mathcal{Y}, \quad t = 1, 2, \ldots, n, \tag{9}$$
$$X_t \in N(0, Q_X), \quad Y_t \in N(0, Q_Y), \tag{10}$$
$$Q_{(X_t, Y_t)} = \mathbf{E}\left\{ \begin{pmatrix} X_t \\ Y_t \end{pmatrix} \begin{pmatrix} X_t \\ Y_t \end{pmatrix}^T \right\} = \begin{pmatrix} Q_X & Q_{X,Y} \\ Q_{X,Y}^T & Q_Y \end{pmatrix}, \tag{11}$$
$$P_{X_t, Y_t} = P_{X,Y} \ \text{a multivariate Gaussian distribution}, \tag{12}$$
$$\hat{X}_t: \Omega \to \mathbb{R}^{n_x} = \hat{\mathcal{X}}, \quad t = 1, 2, \ldots, n, \tag{13}$$
$$D_X(x^n, \hat{x}^n) = \frac{1}{n} \sum_{t=1}^{n} ||x_t - \hat{x}_t||^2_{\mathbb{R}^{n_x}}, \tag{14}$$

where $n_x, n_y$ are arbitrary positive integers, $X \in N(0, Q_X)$ means $X$ is a Gaussian RV with zero mean and covariance matrix $Q_X$, and $||\cdot||^2_{\mathbb{R}^{n_x}}$ is the Euclidean distance on $\mathbb{R}^{n_x}$. To give additional insight, we often consider the following realization of the side information (the condition $DD^T \succ 0$ ensures $I(X;Y) < \infty$, and hence Assumption 1 is respected):

$$Y_t = C X_t + D V_t, \tag{15}$$
$$V_t \in N(0, Q_V), \tag{16}$$
$$C \in \mathbb{R}^{n_y \times n_x}, \quad D \in \mathbb{R}^{n_y \times n_y}, \quad DD^T \succ 0, \quad Q_V = I_{n_y}, \tag{17}$$
$$V^n \ \text{independent of} \ X^n, \tag{18}$$

where $I_{n_y}$ denotes the $n_y \times n_y$ identity matrix. For the above specification of the source and distortion criterion, we derive the following results (a minimal numerical sketch of the source model follows the list).
(b.1) Theorems 3 and 4: Structural properties of the optimal realization of $\hat{X}$ that achieves $R_{X|Y}(\Delta_X)$, and its closed-form expression.
(b.2) Theorem 5: Structural properties of the optimal realization $\hat{X} = f(Y, Z)$, which achieves $\bar{R}(\Delta_X)$, and the closed-form expression of $\bar{R}(\Delta_X)$.
(b.3) A proof that $\bar{R}(\Delta_X)$ and $R_{X|Y}(\Delta_X)$ coincide: Calculation of the distortion region over which Gray's lower bound (8) holds with equality.
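The following minimal numerical sketch (ours, not from the paper; the matrices $Q_X$, $C$, $D$ are illustrative assumptions) samples the source and side-information model (15)–(18) and confirms the induced second moments $Q_{X,Y} = Q_X C^T$ and $Q_Y = C Q_X C^T + DD^T$ used later in Theorem 3:

```python
import numpy as np

rng = np.random.default_rng(0)
n_x, n_y, n_samples = 3, 2, 500_000

# Illustrative model parameters (our assumptions).
Q_X = np.diag([3.0, 2.0, 1.0])
C = rng.standard_normal((n_y, n_x))
D = np.eye(n_y)                        # D D^T = I > 0, so Assumption 1 holds

# Sample the source (15)-(18): Y_t = C X_t + D V_t, V_t ~ N(0, I).
X = rng.multivariate_normal(np.zeros(n_x), Q_X, size=n_samples)
V = rng.standard_normal((n_samples, n_y))
Y = X @ C.T + V @ D.T

# Empirical second moments match Q_{X,Y} = Q_X C^T and Q_Y = C Q_X C^T + D D^T.
Q_XY_emp = X.T @ Y / n_samples
Q_Y_emp = Y.T @ Y / n_samples
print(np.allclose(Q_XY_emp, Q_X @ C.T, atol=0.05))
print(np.allclose(Q_Y_emp, C @ Q_X @ C.T + D @ D.T, atol=0.05))
```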
In Remark 4, we consider the tuple of scalar-valued, jointly Gaussian RVs $(X, Y)$ with square-error distortion and verify that our optimal realizations of $\hat{X}$ and the closed-form expressions for $R_{X|Y}(\Delta_X)$ and $\bar{R}(\Delta_X)$ are identical to Wyner's [2] realizations and RDFs.
We emphasize that our methodology differs from past studies in that we focus on the structural properties of the realizations of the test channels that achieve the characterizations of the two RDFs (i.e., verification of the converse coding theorem). Our derivations are generic and bring new insight into the construction of realizations that induce the optimal test channels of other distributed source coding problems (i.e., establishing the achievability of the converse coding theorem).

1.4. Additional Generalizations of the Wyner–Ziv [1] and Wyner [2] RDFs

Below, we discuss additional generalizations of Wyner and Ziv [1] and Wyner’s [2] RDFs.
(A) Draper and Wornell [8] Distributed Remote Source Coding Problem: Draper and Wornell [8] generalized the RDF $\bar{R}(\Delta_X)$ to the case where the source to be estimated at the decoder, $S: \Omega \to \mathcal{S}$, is not directly observed at the encoder. Rather, the encoder observes a RV $X: \Omega \to \mathcal{X}$ (which is correlated with $S$), while the decoder observes another RV as side information, $Y: \Omega \to \mathcal{Y}$, which provides information on $(S, X)$. The aim is to reconstruct $S$ at the decoder by $\hat{S}: \Omega \to \hat{\mathcal{S}}$, subject to an average distortion $\mathbf{E}\{d_S(S, \hat{S})\} \le \Delta_S$, by a function $\hat{S} = f(Y, Z)$. The RDF for this problem, called the distributed remote source coding problem, is defined by [8]

$$\bar{R}^{PO}(\Delta_S) = \inf_{\mathcal{M}^{PO}(\Delta_S)} I(X; Z|Y) \in [0, \infty], \tag{19}$$

where $\mathcal{M}^{PO}(\Delta_S)$ is specified by the set of auxiliary RVs $Z$, defined as

$$\mathcal{M}^{PO}(\Delta_S) = \Big\{ Z: \Omega \to \mathcal{Z} : P_{S,X,Y,Z,\hat{S}} \ \text{is the joint measure on} \ \mathcal{S} \times \mathcal{X} \times \mathcal{Y} \times \mathcal{Z} \times \hat{\mathcal{S}}, \ P_{Z|S,X,Y} = P_{Z|X}, \ \exists \ \text{a measurable function} \ f^{PO}: \mathcal{Y} \times \mathcal{Z} \to \hat{\mathcal{S}}, \ \hat{S} = f^{PO}(Y, Z), \ \mathbf{E}\{d_S(S, \hat{S})\} \le \Delta_S \Big\}. \tag{20}$$

Clearly, if $S = X$ a.s. (almost surely), then $\bar{R}^{PO}(\Delta_S)$ degenerates to $\bar{R}(\Delta_X)$ (this implies that the optimal test channel that achieves the characterization of the RDF $\bar{R}^{PO}(\Delta_S)$ should degenerate to the optimal test channel that achieves the characterization of the RDF $\bar{R}(\Delta_X)$). For scalar-valued jointly Gaussian RVs $(S, X, Y, Z, \hat{X})$ with square-error distortion, Draper and Wornell [8] [Equation (3) and Appendix A.1] derived the characterization of the RDF $\bar{R}^{PO}(\Delta_S)$ and constructed the optimal realization $\hat{S} = f^{PO}(Y, Z)$ that achieves it.
In [5,6], the authors investigated the RDF $\bar{R}^{PO}(\Delta_S)$ of [8] for multivariate jointly Gaussian RVs $(S, X, Y, Z, \hat{X})$ with square-error distortion and derived a characterization of the RDF $\bar{R}^{PO}(\Delta_S)$ in [[5], Theorem 4] and [[6], Theorem 3A] (see [[6], Equation (26)]). However, it will become apparent in Remark 5 that, when $S = X$ almost surely (a.s.), and hence $\bar{R}^{PO}(\Delta_S) = \bar{R}(\Delta_X)$, the RDFs given in [[5], Theorem 4] and [[6], Theorem 3A] do not produce Wyner's [2] value. We also show in Remark 5 that the same technical issues occur for the correction suggested in https://tiangroup.engr.tamu.edu/publications/ (accessed on 3 January 2024). Similarly, when $S = X$ a.s. and $Y = X$ a.s., [[5], Theorem 4] and [[6], Theorem 3A] do not produce the classical RDF $R_X(\Delta_X)$ of the Gaussian source $X$.
(B) Additional Literature Review: The formulation of Figure 1 has been generalized to other multiterminal or distributed lossy compression problems, such as relay networks, sensor networks, etc., under various code formulations and assumptions. Oohama [9] analyzed lossy compression problems for a tuple of scalar correlated Gaussian memoryless sources with the square-error distortion criterion and determined the rate-distortion region in the special case when one source provides partial side information to the other. Furthermore, Oohama [10] analyzed separate lossy compression problems for $L+1$ scalar correlated Gaussian memoryless sources, when $L$ of the sources provide partial side information at the decoder for the reconstruction of the remaining source, and gave a partial answer to the rate-distortion region. Additionally, ref. [10] proved that the problem of [10] includes, as a special case, the additive white Gaussian CEO problem analyzed by Viswanathan and Berger [11]. Extensions of [10] were derived by Ekrem and Ulukus [12] and Wang and Chen [13], where an outer bound on the rate region is derived for the vector Gaussian multiterminal source. Additional works are [14,15,16] and the references therein.
The vast literature on multiterminal or distributed lossy compression of jointly Gaussian sources with square-error distortion (including the references mentioned above) is often confined to scalar-valued correlated RVs. Moreover, as is easily verified, little emphasis is given in the literature to the structural properties of the realizations of the RVs that induce the optimal test channels achieving the characterizations of the RDFs.
The rest of the paper is organized as follows. In Section 2, we review Wyner's [2] operational definition of lossy compression. We also state a fundamental theorem on mean-square estimation that we use throughout the paper in the analysis of (b). The main theorems are presented in Section 3; some of the proofs, including the structural properties, are given in Section 4. Connections between our results and the past literature are provided in Section 5, together with a simulation that shows the gap between the two rates.

2. Preliminaries

In this section, we review the Wyner [2] source coding problems with fidelity of Figure 1. We begin with the notation, which closely follows [2].

2.1. Notation

Let $\mathbb{Z} = \{\ldots, -1, 0, 1, \ldots\}$ denote the set of all integers, $\mathbb{N} = \{0, 1, 2, \ldots\}$ the set of natural numbers, and $\mathbb{Z}_+ = \{1, 2, \ldots\}$. For $n \in \mathbb{Z}_+$, denote the finite subset $\mathbb{Z}_n = \{1, 2, \ldots, n\}$. Denote the real numbers by $\mathbb{R}$ and the sets of nonnegative and strictly positive real numbers by $\mathbb{R}_+ = [0, \infty)$ and $\mathbb{R}_{++} = (0, \infty)$, respectively.
For any matrix $A \in \mathbb{R}^{p \times m}$, $(p, m) \in \mathbb{Z}_+ \times \mathbb{Z}_+$, we denote its kernel by $\ker(A)$, its transpose by $A^T$, and, for $m = p$, its trace by $\mathrm{trace}(A)$; $\mathrm{diag}\{A\}$ denotes the matrix with diagonal entries $A_{ii}$, $i \in \mathbb{Z}_p$, and zeros elsewhere. The determinant of a square matrix $A$ is denoted by $\det(A)$. The identity matrix with dimensions $p \times p$ is designated $I_p$. Denote an arbitrary set or space by $\mathcal{U}$ and the product space formed by $n$ copies of it by $\mathcal{U}^n = \times_{t=1}^n \mathcal{U}$; $u^n \in \mathcal{U}^n$ denotes an $n$-tuple $u^n = (u_1, u_2, \ldots, u_n)$, where $u_k \in \mathcal{U}$, $k \in \mathbb{Z}_n$, are its coordinates. Denote a probability space by $(\Omega, \mathcal{F}, \mathbb{P})$. For a sub-sigma-field $\mathcal{G} \subseteq \mathcal{F}$ and $A \in \mathcal{F}$, denote by $\mathbb{P}(A|\mathcal{G})$ the conditional probability of $A$ given $\mathcal{G}$; i.e., $\mathbb{P}(A|\mathcal{G}) = \mathbb{P}(A|\mathcal{G})(\omega)$, $\omega \in \Omega$, is a measurable function on $\Omega$.
On the above probability space, consider two real-valued random variables (RVs) $X: \Omega \to \mathcal{X}$, $Y: \Omega \to \mathcal{Y}$, where $(\mathcal{X}, \mathcal{B}(\mathcal{X}))$, $(\mathcal{Y}, \mathcal{B}(\mathcal{Y}))$ are arbitrary measurable spaces. The measure (or joint distribution if $\mathcal{X}, \mathcal{Y}$ are Euclidean spaces) induced by $(X, Y)$ on $\mathcal{X} \times \mathcal{Y}$ is denoted by $P_{X,Y}$ or $\mathbf{P}(dx, dy)$, and its marginals on $\mathcal{X}$ and $\mathcal{Y}$ by $P_X$ and $P_Y$, respectively. The conditional measure of the RV $X$ conditioned on $Y$ is denoted by $P_{X|Y}$ or $\mathbf{P}(dx|y)$ when $Y = y$ is fixed. Similarly, consider three real-valued RVs $X: \Omega \to \mathcal{X}$, $Y: \Omega \to \mathcal{Y}$, $Z: \Omega \to \mathcal{Z}$. We say that the RVs $(Y, Z)$ are conditionally independent given the RV $X$ if $P_{Y,Z|X} = P_{Y|X} P_{Z|X}$ a.s. (almost surely), or equivalently $P_{Z|X,Y} = P_{Z|X}$ a.s.; the qualification a.s. is often omitted. We denote this conditional independence by the Markov chain (MC) $Y \leftrightarrow X \leftrightarrow Z$.
Finally, for RVs $X, Y$, etc., $H(X)$ denotes the differential entropy of $X$, $H(X|Y)$ the conditional differential entropy of $X$ given $Y$, and $I(X;Y)$ the mutual information between $X$ and $Y$, as defined in standard books on information theory [17,18]. We use $\log(\cdot)$ to denote the natural logarithm. The notation $X \in N(0, Q_X)$ means $X$ is a Gaussian distributed RV with zero mean and covariance $Q_X \succeq 0$, where $Q_X \succeq 0$ (resp. $Q_X \succ 0$) means $Q_X$ is positive semidefinite (resp. positive definite). We denote the covariance of $X$ and $Y$ by

$$Q_{X,Y} = \mathrm{cov}(X, Y). \tag{21}$$

We denote the covariance of $X$ conditioned on $Y$ by

$$Q_{X|Y} = \mathrm{cov}(X, X|Y) = \mathbf{E}\Big\{ \big( X - \mathbf{E}\{X|Y\} \big)\big( X - \mathbf{E}\{X|Y\} \big)^T \Big\} \quad \text{if} \ (X, Y) \ \text{is jointly Gaussian}, \tag{22}$$

where the second equality is due to a property of jointly Gaussian RVs.

2.2. Mean-Square Estimation of Conditionally Gaussian RVs

Below, we state a well-known property of conditionally Gaussian RVs from [19], which we use in our derivations.
Proposition 1. 
Conditionally Gaussian RVs [19]. Consider a pair of multivariate RVs $X = (X_1, \ldots, X_{n_x})^T: \Omega \to \mathbb{R}^{n_x}$ and $Y = (Y_1, \ldots, Y_{n_y})^T: \Omega \to \mathbb{R}^{n_y}$, $(n_x, n_y) \in \mathbb{Z}_+ \times \mathbb{Z}_+$, defined on some probability space $(\Omega, \mathcal{F}, \mathbb{P})$. Let $\mathcal{G} \subseteq \mathcal{F}$ be a sub-$\sigma$-algebra. Assume the conditional distribution of $(X, Y)$ conditioned on $\mathcal{G}$, i.e., $\mathbf{P}(dx, dy|\mathcal{G})$, is $\mathbb{P}$-a.s. (almost surely) Gaussian, with conditional means

$$\mu_{X|\mathcal{G}} = \mathbf{E}\{X|\mathcal{G}\}, \quad \mu_{Y|\mathcal{G}} = \mathbf{E}\{Y|\mathcal{G}\}, \tag{23}$$

and conditional covariances

$$Q_{X|\mathcal{G}} = \mathrm{cov}(X, X|\mathcal{G}), \quad Q_{Y|\mathcal{G}} = \mathrm{cov}(Y, Y|\mathcal{G}), \tag{24}$$
$$Q_{X,Y|\mathcal{G}} = \mathrm{cov}(X, Y|\mathcal{G}). \tag{25}$$

Then, the vectors of conditional expectations $\mu_{X|Y,\mathcal{G}} = \mathbf{E}\{X|Y,\mathcal{G}\}$ and the matrices of conditional covariances $Q_{X|Y,\mathcal{G}} = \mathrm{cov}(X, X|Y, \mathcal{G})$ are given, $\mathbb{P}$-a.s., by the following expressions (if $Q_{Y|\mathcal{G}} \succ 0$, the inverse exists and the pseudoinverse is $Q_{Y|\mathcal{G}}^{\dagger} = Q_{Y|\mathcal{G}}^{-1}$):

$$\mu_{X|Y,\mathcal{G}} = \mu_{X|\mathcal{G}} + Q_{X,Y|\mathcal{G}}\, Q_{Y|\mathcal{G}}^{\dagger} \big( Y - \mu_{Y|\mathcal{G}} \big), \tag{26}$$
$$Q_{X|Y,\mathcal{G}} = Q_{X|\mathcal{G}} - Q_{X,Y|\mathcal{G}}\, Q_{Y|\mathcal{G}}^{\dagger}\, Q_{X,Y|\mathcal{G}}^T. \tag{27}$$

If $\mathcal{G}$ is the trivial information, i.e., $\mathcal{G} = \{\Omega, \emptyset\}$, then $\mathcal{G}$ is removed from the above expressions.
Note that, if $\mathcal{G} = \{\Omega, \emptyset\}$, then (26) and (27) reduce to the well-known conditional mean and conditional covariance of $X$ conditioned on $Y$.
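To make Proposition 1 concrete, the following minimal sketch (ours; the dimensions and the random covariance are illustrative assumptions) verifies (26) and (27) for trivial $\mathcal{G}$ by Monte Carlo, using the orthogonality of the estimation error:

```python
import numpy as np

rng = np.random.default_rng(0)

# Joint Gaussian (X, Y) with X in R^3, Y in R^2 (illustrative dimensions).
n_x, n_y = 3, 2
A = rng.standard_normal((n_x + n_y, n_x + n_y))
Q = A @ A.T                      # joint covariance of (X, Y), positive definite
Q_X, Q_XY, Q_Y = Q[:n_x, :n_x], Q[:n_x, n_x:], Q[n_x:, n_x:]

# Conditional mean/covariance per (26)-(27) with trivial G:
#   E{X|Y} = Q_XY Q_Y^{-1} Y,  Q_{X|Y} = Q_X - Q_XY Q_Y^{-1} Q_XY^T.
K = Q_XY @ np.linalg.inv(Q_Y)
Q_X_given_Y = Q_X - K @ Q_XY.T

# Monte Carlo check: the error X - E{X|Y} is orthogonal to Y, and its
# covariance matches Q_{X|Y}.
samples = rng.multivariate_normal(np.zeros(n_x + n_y), Q, size=200_000)
X, Y = samples[:, :n_x], samples[:, n_x:]
err = X - Y @ K.T
print(np.allclose(err.T @ Y / len(Y), 0, atol=0.05))          # orthogonality
print(np.allclose(err.T @ err / len(err), Q_X_given_Y, atol=0.05))
```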
For Gaussian RVs, we make use of the following properties.
Proposition 2. 
Let $X: \Omega \to \mathbb{R}^n$, $n \in \mathbb{Z}_+$, $X \in N(0, Q_X)$, $Q_X \succeq 0$, $S \in \mathbb{R}^{n_1 \times n}$, $n_1 \in \mathbb{Z}_+$, and denote by $\mathcal{F}_X$ and $\mathcal{F}_{SX}$ the $\sigma$-algebras generated by the RVs $X$ and $SX$, respectively. The following hold.
(a) $\mathcal{F}_{SX} \subseteq \mathcal{F}_X$.
(b) $\mathcal{F}_{SX} = \mathcal{F}_X$ if and only if $\ker(Q_X) = \ker(S Q_X)$.
Proof. 
This is well-known in measure theory, see [20]. □
Proposition 3. 
Let $X: \Omega \to \mathbb{R}^n$, $n \in \mathbb{Z}_+$, $X \in N(0, Q_X)$, $Q_X \succeq 0$, $\mathrm{rank}(Q_X) = n_1$, $n_1 \in \mathbb{Z}_+$, $n_1 < n$. Then, there exists a linear transformation $S \in \mathbb{R}^{n_1 \times n}$ such that, if $X_1: \Omega \to \mathbb{R}^{n_1}$, $X_1 = SX$, then $X_1 \in N(0, Q_{X_1})$, $Q_{X_1} \succ 0$, and $\mathcal{F}_X = \mathcal{F}_{X_1}$.
Proof. 
This is well-known in probability theory, see [20]. □

2.3. Wyner’s Coding Theorems with Side Information at the Decoder

For the sake of completeness, we introduce certain results from Wyner's work [2] that we use in this paper. On a probability space $(\Omega, \mathcal{F}, \mathbb{P})$, consider a tuple of jointly i.i.d. RVs $(X^n, Y^n) = \{(X_t, Y_t): t \in \mathbb{Z}_n\}$,

$$X_t: \Omega \to \mathcal{X}, \quad Y_t: \Omega \to \mathcal{Y}, \quad t \in \mathbb{Z}_n, \tag{28}$$

with induced distribution $P_{X_t, Y_t} = P_{X,Y}$, $\forall t$. Consider also the measurable function $d_X: \mathcal{X} \times \hat{\mathcal{X}} \to [0, \infty)$, for a measurable space $\hat{\mathcal{X}}$. Let

$$\mathcal{I}_M = \{0, 1, \ldots, M-1\}, \quad M \in \mathbb{Z}_+, \tag{29}$$

be a finite set.
A code $(n, M, D_X)$, when switch "A" is open (see Figure 1), is defined by two measurable functions, the encoder $F_E$ and the decoder $F_D$, with average distortion, as follows:

$$F_E: \mathcal{X}^n \to \mathcal{I}_M, \quad F_D: \mathcal{I}_M \times \mathcal{Y}^n \to \hat{\mathcal{X}}^n, \tag{30}$$
$$\frac{1}{n} \mathbf{E}\Big\{ \sum_{t=1}^{n} d_X(X_t, \hat{X}_t) \Big\} = D_X, \tag{31}$$

where $\hat{X}^n$ is again a sequence of RVs, $\hat{X}^n = F_D(F_E(X^n), Y^n) \in \hat{\mathcal{X}}^n$. A non-negative rate-distortion pair $(R, \Delta_X)$ is said to be achievable if, for every $\epsilon > 0$ and $n$ sufficiently large, there exists a code $(n, M, D_X)$ such that

$$M \le 2^{n(R + \epsilon)}, \quad D_X \le \Delta_X + \epsilon. \tag{32}$$

Let $\mathcal{R}$ denote the set of all achievable pairs $(R, \Delta_X)$ and define, for $\Delta_X \ge 0$, the infimum of all achievable rates by

$$R(\Delta_X) = \inf_{(R, \Delta_X) \in \mathcal{R}} R. \tag{33}$$

If for some $\Delta_X$ there is no $R < \infty$ such that $(R, \Delta_X) \in \mathcal{R}$, then set $R(\Delta_X) = +\infty$. For arbitrary abstract spaces, Wyner [2] characterized the infimum of all achievable rates $R(\Delta_X)$ by the single-letter RDF $\bar{R}(\Delta_X)$ given by (5) and (6), in terms of an auxiliary RV $Z: \Omega \to \mathcal{Z}$. Wyner's realization of the joint measure $P_{X,Y,Z,\hat{X}}$ induced by the RVs $(X, Y, Z, \hat{X})$ is illustrated in Figure 2, where $Z$ is the output of the "test channel" $P_{Z|X}$. Wyner proved the following coding theorems.
Theorem 1. 
Wyner [[2], Theorems, pp. 64–65]. Suppose Assumption 1 holds.
(a) Converse Theorem. For any $\Delta_X \ge 0$, $R(\Delta_X) \ge \bar{R}(\Delta_X)$.
(b) Direct Theorem. If the conditions stated in [[2], pp. 64–65, (i), (ii)] hold, then $R(\Delta_X) \le \bar{R}(\Delta_X)$, $0 \le \Delta_X < \infty$.
In Figure 1, when switch "A" is closed and the tuple of jointly independent and identically distributed RVs $(X^n, Y^n)$ is defined as in Section 2.3, Wyner [2] generalized Berger's [3] characterization of all achievable pairs $(R, \Delta_X)$ from finite alphabet spaces to abstract alphabet spaces.
A code $(n, M, D_X)$, when switch "A" is closed (see Figure 1), is defined as in Section 2.3, with the encoder $F_E$ replaced by

$$F_E: \mathcal{X}^n \times \mathcal{Y}^n \to \mathcal{I}_M. \tag{34}$$

Let $\mathcal{R}_1$ denote the set of all achievable pairs $(R, \Delta_X)$, again as defined in Section 2.3. For $\Delta_X \ge 0$, define the infimum of all achievable rates by

$$\bar{R}_1(\Delta_X) = \inf_{(R, \Delta_X) \in \mathcal{R}_1} R. \tag{35}$$

Wyner [2] characterized the infimum of all achievable rates $\bar{R}_1(\Delta_X)$ by the single-letter RDF $R_{X|Y}(\Delta_X)$ given by (1) and (3). The coding theorems are given by Theorem 1, with $R(\Delta_X)$ and $\bar{R}(\Delta_X)$ replaced by $\bar{R}_1(\Delta_X)$ and $R_{X|Y}(\Delta_X)$, respectively; that is, $\bar{R}_1(\Delta_X) = R_{X|Y}(\Delta_X)$ (using Wyner's notation [[2], Appendix A.1]). These coding theorems generalize earlier work of Berger [3] for finite alphabet spaces. Wyner also derived a fundamental lower bound on $\bar{R}(\Delta_X)$ in terms of $\bar{R}_1(\Delta_X)$, as stated in the next remark.
Remark 1. 
Wyner [[2], Remarks, p. 65].
(A) For $Z \in \mathcal{M}(\Delta_X)$, $\hat{X} = f(Y, Z)$ and $P_{Z|X,Y} = P_{Z|X}$. Then, by a property of conditional mutual information and the data processing inequality,

$$I(X; Z|Y) = I(X; Z, f(Y, Z)|Y) \ge I(X; \hat{X}|Y) \ge R_{X|Y}(\Delta_X), \tag{36}$$

where the last inequality holds since $\hat{X} \in \mathcal{M}_0(\Delta_X)$ (see [[2], Remarks, p. 65]). Moreover, minimizing (36) over $Z \in \mathcal{M}(\Delta_X)$ gives

$$\bar{R}(\Delta_X) \ge R_{X|Y}(\Delta_X). \tag{37}$$

(B) Inequality (37) holds with equality, i.e., $\bar{R}(\Delta_X) = R_{X|Y}(\Delta_X)$, if the $\hat{X} \in \mathcal{M}_0(\Delta_X)$ which achieves $I(X; \hat{X}|Y) = R_{X|Y}(\Delta_X)$ can be generated as in Figure 2 with $I(X; Z|Y) = I(X; \hat{X}|Y)$. This occurs if and only if $I(X; Z|\hat{X}, Y) = 0$, and follows from the identity and lower bound

$$I(X; Z|Y) = I(X; Z, \hat{X}|Y) = I(X; Z|Y, \hat{X}) + I(X; \hat{X}|Y) \tag{38}$$
$$\ge I(X; \hat{X}|Y), \tag{39}$$

where the inequality holds with equality if and only if $I(X; Z|\hat{X}, Y) = 0$.

3. Main Theorems and Discussion

In this section, we state the main results of this paper. These are the achievable lower bounds of Lemma 1 and Theorem 2, which hold for RVs defined on general abstract alphabet spaces, and Theorems 4 and 5, which hold for multivariate Gaussian RVs.

3.1. Side Information at Encoder and Decoder for an Arbitrary Source

We start with the following achievable lower bound on the conditional mutual information $I(X; \hat{X}|Y)$, which appears in the definition of $R_{X|Y}(\Delta_X)$ in (1); this strengthens Gray's lower bound (8) [[4], Theorem 3.1].
Lemma 1. 
Achievable lower bound on conditional mutual information. Let $(X, Y, \hat{X})$ be a triple of arbitrary RVs taking values in the abstract spaces $\mathcal{X} \times \mathcal{Y} \times \hat{\mathcal{X}}$, with distribution $P_{X,Y,\hat{X}}$ whose joint marginal is the fixed distribution $P_{X,Y}$ of $(X, Y)$. Then, the following hold.
(a) The inequality

$$I(X; \hat{X}|Y) \ge I(X; \hat{X}) - I(X; Y) \tag{40}$$

holds. Moreover, the equality

$$I(X; \hat{X}|Y) = I(X; \hat{X}) - I(X; Y) \in [0, \infty) \tag{41}$$

holds if and only if

$$P_{X|\hat{X},Y} = P_{X|\hat{X}} \ \text{a.s., or equivalently,} \ Y \leftrightarrow \hat{X} \leftrightarrow X \ \text{is a MC}. \tag{42}$$

(b) If $Y \leftrightarrow \hat{X} \leftrightarrow X$ is a Markov chain, then the equality

$$R_{X|Y}(\Delta_X) = R_X(\Delta_X) - I(X; Y), \quad \forall \Delta_X \in \mathcal{D}_C(X|Y), \tag{43}$$

holds, i.e., for all $\Delta_X$ that belong to a strictly positive set $\mathcal{D}_C(X|Y) \subseteq [0, \infty)$.
Proof. 
See Appendix A.1. □
The next theorem, which holds for arbitrary RVs, is further used to derive the characterization of $R_{X|Y}(\Delta_X)$ for multivariate Gaussian RVs.
Theorem 2. 
Achievable lower bound on conditional mutual information and mean-square error estimation.
(a) Let $(X, Y, \hat{X})$ be a triple of arbitrary RVs on the abstract spaces $\mathcal{X} \times \mathcal{Y} \times \hat{\mathcal{X}}$, with distribution $P_{X,Y,\hat{X}}$ whose joint marginal is the fixed distribution $P_{X,Y}$ of $(X, Y)$.
Define the conditional mean of $X$ conditioned on $(\hat{X}, Y)$ by

$$\bar{X}^{cm} = \mathbf{E}\{X|Y, \hat{X}\} = e(Y, \hat{X}), \tag{44}$$

for some measurable function $e: \mathcal{Y} \times \hat{\mathcal{X}} \to \mathcal{X}$.
(1) The inequality holds:

$$I(X; \hat{X}|Y) \ge I(X; \bar{X}^{cm}|Y). \tag{45}$$

(2) The equality $I(X; \hat{X}|Y) = I(X; \bar{X}^{cm}|Y)$ holds if any one of the conditions (i) or (ii) holds:

$$\text{(i)} \quad \bar{X}^{cm} = \hat{X} \ \text{a.s.}; \tag{46}$$
$$\text{(ii)} \quad \text{for a fixed} \ y \in \mathcal{Y}, \ \text{the function} \ e(y, \cdot): \hat{\mathcal{X}} \to \mathcal{X}, \ e(y, \hat{x}) = \bar{x}^{cm}, \ \text{uniquely defines} \ \hat{x}, \ \text{i.e.,} \ e(y, \cdot) \ \text{is an injective function on the support of} \ \hat{x}. \tag{47}$$

(b) In part (a), let $(X, Y, \hat{X})$ be a triple of arbitrary RVs on $\mathcal{X} \times \mathcal{Y} \times \hat{\mathcal{X}} = \mathbb{R}^{n_x} \times \mathbb{R}^{n_y} \times \mathbb{R}^{n_x}$, $(n_x, n_y) \in \mathbb{Z}_+ \times \mathbb{Z}_+$.
For all measurable functions $(y, \hat{x}) \mapsto g(y, \hat{x}) \in \mathbb{R}^{n_x}$, the mean-square error satisfies

$$\mathbf{E}\big\{ ||X - g(Y, \hat{X})||^2_{\mathbb{R}^{n_x}} \big\} \ge \mathbf{E}\big\{ ||X - \mathbf{E}\{X|Y, \hat{X}\}||^2_{\mathbb{R}^{n_x}} \big\}, \quad \forall g(\cdot). \tag{48}$$
Proof. 
See Appendix A.2. □

3.2. Side Information at Encoder and Decoder for Multivariate Gaussian Source

The characterizations of the RDFs $R_{X|Y}(\Delta_X)$ and $\bar{R}(\Delta_X)$ for a multivariate Gaussian source are encapsulated in Theorems 3–5, which are proved in Section 4. These theorems include the structural properties of the optimal test channels or realizations of $(\hat{X}, Z)$, which induce the joint distributions that achieve the RDFs; the closed-form expressions of the RDFs are based on water-filling. The realization of the optimal test channel of $R_{X|Y}(\Delta_X)$ is shown in Figure 3.
The following theorem gives a parametric realization of the optimal test channel that achieves the characterization of the RDF $R_{X|Y}(\Delta_X)$.
Theorem 3. 
Characterization of $R_{X|Y}(\Delta_X)$ by a test channel realization. Consider the RDF $R_{X|Y}(\Delta_X)$ defined by (1), for the multivariate Gaussian source with mean-square error distortion defined by (9)–(18). The following hold.
(a) The optimal realization $\hat{X}$ that achieves $R_{X|Y}(\Delta_X)$ is parametrized by the matrices $(H, Q_W)$ and represented by

$$\hat{X} = H\big( X - Q_{X,Y} Q_Y^{-1} Y \big) + Q_{X,Y} Q_Y^{-1} Y + W \tag{49}$$
$$= H\big( X - Q_{X,Y} Q_Y^{-1} Y \big) + Q_{X,Y} Q_Y^{-1} Y + H \Psi, \quad \text{if} \ H^{-1} \ \text{exists}, \tag{50}$$

where

$$H Q_{X|Y} = Q_{X|Y} H^T = Q_{X|Y} - \Sigma_\Delta \succeq 0, \tag{51}$$
$$W \ \text{independent of} \ (X, Y), \quad W \in N(0, Q_W), \tag{52}$$
$$Q_W = H Q_{X|Y} - H Q_{X|Y} H^T = H \Sigma_\Delta = \Sigma_\Delta - \Sigma_\Delta Q_{X|Y}^{-1} \Sigma_\Delta = \Sigma_\Delta H^T \succeq 0, \tag{53}$$
$$W = H \Psi, \quad \Psi \in N(0, Q_\Psi), \quad Q_\Psi = \Sigma_\Delta H^{-1} = H^{-1} \Sigma_\Delta, \quad \text{if} \ H^{-1} \ \text{exists}, \tag{54}$$
$$\Sigma_\Delta = \mathbf{E}\big\{ (X - \hat{X})(X - \hat{X})^T \big\}, \tag{55}$$
$$Q_{\hat{X}|Y} = Q_{X|Y} - \Sigma_\Delta \succeq 0, \tag{56}$$
$$Q_{X|Y} = Q_X - Q_{X,Y} Q_Y^{-1} Q_{X,Y}^T \succeq 0, \quad Q_{X,Y} = Q_X C^T, \quad Q_Y = C Q_X C^T + D D^T. \tag{57}$$

Moreover, the optimal parametric realization of $\hat{X}$ satisfies the following structural properties:

$$\text{(i)} \quad P_{X|\hat{X},Y} = P_{X|\hat{X}}, \quad \text{if} \ Q_X \succeq \Sigma_\Delta, \tag{58}$$
$$\text{(ii)} \quad \mathbf{E}\{X|Y\} = \mathbf{E}\{\hat{X}|Y\}, \quad \text{if} \ Q_X \succeq \Sigma_\Delta, \tag{59}$$
$$\text{(iii)} \quad \mathrm{cov}(X, \hat{X}|Y) = \mathrm{cov}(\hat{X}, \hat{X}|Y), \quad \text{if} \ Q_{X|Y} \succeq \Sigma_\Delta, \tag{60}$$
$$\text{(iv)} \quad \mathbf{E}\{X|\hat{X}, Y\} = \mathbf{E}\{X|\hat{X}\} = \hat{X}, \quad \text{if} \ Q_{X|Y} \succeq \Sigma_\Delta. \tag{61}$$

(b) The RDF $R_{X|Y}(\Delta_X)$ is given by

$$R_{X|Y}(\Delta_X) = \inf_{\substack{\Sigma_\Delta \succeq 0, \ Q_{X|Y} - \Sigma_\Delta \succeq 0, \\ \mathrm{trace}(\Sigma_\Delta) \le \Delta_X}} \frac{1}{2} \log \max\Big\{ 1, \det\big( Q_{X|Y} \Sigma_\Delta^{-1} \big) \Big\}. \tag{62}$$
Proof. 
The proof is given in Section 4. □
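As a sanity check of Theorem 3a (ours, not from the paper; the matrices $Q_X$, $C$, $D$ and the choice $\Sigma_\Delta = \delta I_{n_x}$ are illustrative assumptions), the following sketch builds $(H, Q_W)$ from (51) and (53) and verifies algebraically that the realization (49) attains error covariance $\Sigma_\Delta$ as in (55):

```python
import numpy as np

rng = np.random.default_rng(1)
n_x, n_y = 3, 2

# Source model (9)-(18): Y = C X + D V (illustrative matrices).
Q_X = np.diag([4.0, 2.0, 1.0])
C = rng.standard_normal((n_y, n_x))
D = np.eye(n_y)

Q_XY = Q_X @ C.T                                          # (57)
Q_Y = C @ Q_X @ C.T + D @ D.T                             # (57)
Q_X_given_Y = Q_X - Q_XY @ np.linalg.inv(Q_Y) @ Q_XY.T    # (57)

# Pick a distortion matrix commuting with Q_{X|Y}: Sigma = delta * I.
lam_min = np.linalg.eigvalsh(Q_X_given_Y).min()
Sigma = 0.5 * lam_min * np.eye(n_x)       # ensures Q_{X|Y} - Sigma > 0

H = np.eye(n_x) - Sigma @ np.linalg.inv(Q_X_given_Y)      # (51)
Q_W = H @ Sigma                                           # (53)

# Error covariance of X - X_hat for the realization (49):
#   X - X_hat = (I - H)(X - E{X|Y}) - W, so
#   cov = (I - H) Q_{X|Y} (I - H)^T + Q_W, which should equal Sigma (55).
I_H = np.eye(n_x) - H
err_cov = I_H @ Q_X_given_Y @ I_H.T + Q_W
print(np.allclose(err_cov, Sigma))        # True
```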
The next theorem gives additional structural properties of the optimal test channel realization of Theorem 3 and uses these properties to characterize the RDF $R_{X|Y}(\Delta_X)$ via a water-filling solution.
Theorem 4. 
Characterization of $R_{X|Y}(\Delta_X)$ via the water-filling solution. Consider the RDF $R_{X|Y}(\Delta_X)$ defined by (1), for the multivariate Gaussian source with mean-square error distortion defined by (9)–(18), and its characterization in Theorem 3. The following hold.
(a) The matrices of the parametric realization of $\hat{X}$,

$$\{\Sigma_\Delta, Q_{X|Y}, H, Q_W\}, \ \text{have spectral decompositions with respect to the same unitary matrix} \ U, \ U U^T = I_{n_x}, \ U^T U = I_{n_x}, \tag{63}$$

where the realization coefficients are

$$Q_W = H \Sigma_\Delta = U \mathrm{diag}(\sigma^2_{W_1}, \ldots, \sigma^2_{W_{n_x}}) U^T, \quad \Sigma_\Delta = U \mathrm{diag}(\delta_1, \ldots, \delta_{n_x}) U^T, \tag{64}$$
$$H = I_{n_x} - Q_{X|Y}^{-1} \Sigma_\Delta = U \mathrm{diag}(h_1, \ldots, h_{n_x}) U^T, \quad Q_{X|Y} = U \mathrm{diag}(\lambda_1, \ldots, \lambda_{n_x}) U^T, \tag{65}$$
$$\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_{n_x} > 0, \quad \delta_1 \ge \delta_2 \ge \cdots \ge \delta_{n_x} > 0, \tag{66}$$
$$\sigma^2_{W_1} \ge \sigma^2_{W_2} \ge \cdots \ge \sigma^2_{W_{n_x}} \ge 0, \quad h_1 \ge h_2 \ge \cdots \ge h_{n_x} \ge 0, \quad \sigma^2_{W_i} = h_i \delta_i, \quad h_i = 1 - \frac{\delta_i}{\lambda_i}, \tag{67}$$

and the eigenvalues $\sigma^2_{W_i}$ and $h_i$ are given by

$$\sigma^2_{W_i} = \frac{\min(\lambda_i, \delta_i)\big( \lambda_i - \min(\lambda_i, \delta_i) \big)}{\lambda_i}, \quad h_i = \frac{\lambda_i - \min(\lambda_i, \delta_i)}{\lambda_i}, \quad \sum_{i=1}^{n_x} \min(\lambda_i, \delta_i) = \Delta_X. \tag{68}$$

Moreover, if $\sigma^2_{W_i} = 0$, then $h_i = 0$, and vice versa.
(b) The RDF $R_{X|Y}(\Delta_X)$ is given by the water-filling solution

$$R_{X|Y}(\Delta_X) = \frac{1}{2} \log \max\Big\{ 1, \det\big( Q_{X|Y} \Sigma_\Delta^{-1} \big) \Big\} = \frac{1}{2} \sum_{i=1}^{n_x} \log \frac{\lambda_i}{\delta_i}, \tag{69}$$

where

$$\mathbf{E}\big\{ ||X - \hat{X}||^2_{\mathbb{R}^{n_x}} \big\} = \mathrm{trace}(\Sigma_\Delta) = \sum_{i=1}^{n_x} \delta_i = \Delta_X, \quad \delta_i = \begin{cases} \mu, & \text{if} \ \mu < \lambda_i, \\ \lambda_i, & \text{if} \ \mu \ge \lambda_i, \end{cases} \tag{70}$$

and $\mu \in (0, \infty)$ is a Lagrange multiplier (obtained from the Karush–Kuhn–Tucker conditions).
(c) Figure 3 depicts the parallel channel scheme that realizes the optimal $\hat{X}$ of parts (a) and (b), which achieves $R_{X|Y}(\Delta_X)$.
(d) If $X$ and $Y$ are independent, or $Y$ is replaced by a RV that generates the trivial information, i.e., the $\sigma$-algebra of $Y$ is $\sigma\{Y\} = \{\Omega, \emptyset\}$ (or $C = 0$ in (15)), then (a)–(c) hold with $Q_{X|Y} = Q_X$, $Q_{X,Y} = 0$, and $R_{X|Y}(\Delta_X) = R_X(\Delta_X)$, i.e., the RDF reduces to the marginal RDF of $X$.
Proof. 
The proof is given in Section 4. □
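A minimal numerical sketch (ours) of the reverse water-filling in (69) and (70), with the water level $\mu$ found by bisection; the eigenvalues are illustrative assumptions. The classical case $R_X(\Delta_X)$ of Remark 3 uses the same routine with the eigenvalues of $Q_X$ in place of those of $Q_{X|Y}$.

```python
import numpy as np

def water_filling_rate(lams, Delta_X, tol=1e-12):
    """Water-filling for R_{X|Y}(Delta_X) in Theorem 4b (our sketch).

    lams: eigenvalues lambda_i of Q_{X|Y}; Delta_X: distortion budget.
    Returns (rate in nats, per-component distortions delta_i), where
    delta_i = min(mu, lambda_i) and sum_i delta_i = Delta_X, per (70).
    """
    lams = np.asarray(lams, dtype=float)
    assert 0 < Delta_X <= lams.sum(), "Delta_X outside (0, trace Q_{X|Y}]"
    lo, hi = 0.0, lams.max()
    while hi - lo > tol:                  # bisection on the water level mu
        mu = 0.5 * (lo + hi)
        if np.minimum(mu, lams).sum() < Delta_X:
            lo = mu
        else:
            hi = mu
    delta = np.minimum(mu, lams)
    rate = 0.5 * np.sum(np.log(lams / delta))    # (69)
    return rate, delta

# Example: eigenvalues of Q_{X|Y} (illustrative numbers).
rate, delta = water_filling_rate([4.0, 2.0, 0.5], Delta_X=1.5)
print(rate, delta)   # components with lambda_i <= mu are not reproduced
```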
The proof of Theorem 4 (see Section 4) is based on the identification of structural properties of the test channel distribution. Some of the implications are briefly described below.
Conclusion 1: The construction and the structural properties of the optimal test channel $P_{X|\hat{X},Y}$ that achieves the water-filling characterization of the RDF $R_{X|Y}(\Delta_X)$ of Theorems 3 and 4 are not documented elsewhere in the literature.
(i) Structural properties (58) and (61) strengthen Gray's inequality [[4], Theorem 3.1] (see (8)) to an equality. That is, structural property (58) implies that Gray's [[4], Theorem 3.1] lower bound (8) holds with equality on a strictly positive surface (see Gray [4] for the definition) $\Delta_X \in \mathcal{D}_C(X|Y) \subseteq [0, \infty)$, i.e.,

$$R_{X|Y}(\Delta_X) = R_X(\Delta_X) - I(X; Y), \quad \forall \Delta_X \in \mathcal{D}_C(X|Y) = \Big\{ \Delta_X \in [0, \infty): \frac{\Delta_X}{n_x} \le \lambda_{n_x} \Big\}. \tag{71}$$

The set $\mathcal{D}_C(X|Y)$ excludes values of $\Delta_X \in [0, \infty)$ for which water-filling is active in (69) and (70).
By the realization of the optimal reproduction $\hat{X}$, it follows that the subtraction of the equal quantities $\mathbf{E}\{X|Y\} = \mathbf{E}\{\hat{X}|Y\}$ at the encoder and decoder does not affect the information measure.
Theorem 4 parts (a) and (b) are obtained with the aid of Theorem 3 and Hadamard's inequality, which shows that $Q_{X|Y}$ and $\Sigma_\Delta$ have the same eigenvectors.
(ii) Structural properties of the realizations of Theorems 3 and 4: The matrices $\{\Sigma_\Delta, Q_{X|Y}, H, Q_W\}$ are nonnegative symmetric and have a spectral decomposition with respect to the same unitary matrix $U$, $U U^T = I_{n_x}$ [21]. This implies that the test channel is equivalently represented by parallel additive Gaussian noise channels (subject to pre-processing and post-processing at the encoder and decoder).
(iii) In Remark 4, we show that the realization of the optimal $\hat{X}$ in Figure 3, which achieves the RDF of Theorem 4, degenerates to Wyner's [2] optimal realization, which attains the RDF $R_{X|Y}(\Delta_X)$, for the tuple of scalar-valued, jointly Gaussian RVs $(X, Y)$ with square-error distortion.

3.3. Side Information Only at Decoder for Multivariate Gaussian Source

Theorem 5 gives the optimal test channel that achieves the characterization of the RDF $\bar{R}(\Delta_X)$ and further states that there is no loss of compression rate when the side information is available only at the decoder. That is, although in general $\bar{R}(\Delta_X) \ge R_{X|Y}(\Delta_X)$, an optimal reproduction $\hat{X} = f(Y, Z)$ of $X$, where $f(\cdot,\cdot)$ is linear, is constructed such that the inequality holds with equality.
Theorem 5. 
Characterization and water-filling solution of $\bar{R}(\Delta_X)$. Consider the RDF $\bar{R}(\Delta_X)$ defined by (5) for the multivariate Gaussian source with mean-square error distortion defined by (9)–(18). Then, the following hold.
(a) The characterization of the RDF $\bar{R}(\Delta_X)$ satisfies

$$\bar{R}(\Delta_X) \ge R_{X|Y}(\Delta_X), \tag{72}$$

where $R_{X|Y}(\Delta_X)$ is given in Theorem 4b.
(b) The optimal realization $\hat{X} = f(Y, Z)$, which achieves the lower bound in (72), i.e., $\bar{R}(\Delta_X) = R_{X|Y}(\Delta_X)$, is represented by

$$\hat{X} = f(Y, Z) \tag{73}$$
$$= \big( I_{n_x} - H \big) Q_{X,Y} Q_Y^{-1} Y + Z, \tag{74}$$
$$Z = H X + W, \tag{75}$$
$$(H, Q_W) \ \text{given by (51)–(57), and (63) holds}. \tag{76}$$

Moreover, the following structural properties hold:
(1) The optimal test channel satisfies

$$\text{(i)} \quad P_{X|\hat{X},Y,Z} = P_{X|\hat{X},Y} \overset{(\alpha)}{=} P_{X|\hat{X}}, \quad \text{where} \ (\alpha) \ \text{holds if} \ Q_X \succeq \Sigma_\Delta, \tag{77}$$
$$\text{(ii)} \quad \mathbf{E}\{X|\hat{X}, Y, Z\} = \mathbf{E}\{X|\hat{X}, Y\} \overset{(\beta)}{=} \mathbf{E}\{X|\hat{X}\} \overset{(\gamma)}{=} \hat{X}, \quad \text{where} \ (\beta), (\gamma) \ \text{hold if} \ Q_{X|Y} \succeq \Sigma_\Delta, \tag{78}$$
$$\text{(iii)} \quad P_{Z|X,Y} = P_{Z|X}. \tag{79}$$

(2) Structural property (2) of Theorem 4a holds.
Proof. 
It is given in Section 4. □
The proof of Theorem 5 is based on the derivation of the structural properties and on Theorem 4. Some implications are discussed below.
Conclusion 2: The optimal reproduction $\hat{X} = f(Y, Z)$ and the test channel distribution $P_{X|\hat{X},Y,Z}$, which achieve $\bar{R}(\Delta_X)$ of Theorem 5, are not reported in the literature.
(i) From structural property (1) of Theorem 5, i.e., (77), it follows that the lower bound $\bar{R}(\Delta_X) \ge R_{X|Y}(\Delta_X)$ is achieved by the realization $\hat{X} = f(Y, Z)$ of Theorem 5b; i.e., for a given $Y = y$, $\hat{X}$ uniquely defines $Z$.
(ii) If $X$ is independent of $Y$, or $Y$ generates trivial information, then the RDFs $\bar{R}(\Delta_X) = R_{X|Y}(\Delta_X)$ degenerate to the classical RDF of the source $X$, i.e., $R_X(\Delta_X)$, as expected. This is easily verified from (73) and (76), i.e., $Q_{X,Y} = 0$, which implies $\hat{X} = Z$.
For scalar-valued RVs $X: \Omega \to \mathbb{R}$, $Y: \Omega \to \mathbb{R}$, $X \in N(0, \sigma_X^2)$, with $X$ independent of $Y$, the optimal realization reduces to

$$\hat{X} = Z = \Big( 1 - \frac{\Delta_X}{\sigma_X^2} \Big) X + \sqrt{\Big( 1 - \frac{\Delta_X}{\sigma_X^2} \Big) \Delta_X}\; \bar{W}, \quad \bar{W} \in N(0, 1), \quad \sigma_X^2 \ge \Delta_X, \tag{80}$$
$$Q_{\hat{X}} = Q_Z = \sigma_{\hat{X}}^2 = \sigma_X^2 - \Delta_X \ge 0, \tag{81}$$

as expected.
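A quick numerical check of (80) and (81) (ours; the values of $\sigma_X^2$ and $\Delta_X$ are illustrative assumptions):

```python
import numpy as np

# Scalar sanity check of (80)-(81).
sigma_X2, Delta_X = 4.0, 1.0
H = 1.0 - Delta_X / sigma_X2
Q_W = H * Delta_X                     # variance of the additive noise term

Q_Z = H**2 * sigma_X2 + Q_W
print(np.isclose(Q_Z, sigma_X2 - Delta_X))                 # (81): True
# Distortion: E(X - X_hat)^2 = (1-H)^2 sigma_X^2 + Q_W = Delta_X.
print(np.isclose((1 - H)**2 * sigma_X2 + Q_W, Delta_X))    # True
# Rate: I(X; X_hat) = 0.5 log(sigma_X^2 / Delta_X) nats.
print(0.5 * np.log(sigma_X2 / Delta_X))
```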
(iii) In Remark 4, we show that the realization of the optimal $\hat{X} = f(Y, Z)$, which achieves the RDF $\bar{R}(\Delta_X)$ of Theorem 5, degenerates to Wyner's [2] realization that attains the RDF $\bar{R}(\Delta_X)$ for the tuple of scalar-valued, jointly Gaussian RVs $(X, Y)$ with square-error distortion.

4. Proofs of Theorems 3–5

In this section, we derive the statements of Theorems 3–5 by making use of Theorem 2 (which holds for general abstract alphabet spaces) and restricting attention to multivariate jointly Gaussian $(X, Y)$.

4.1. Side Information at Encoder and Decoder

For jointly Gaussian RVs $(X, Y, \hat{X})$, in the next theorem we identify simple sufficient conditions for the lower bound of Theorem 2 to be achievable.
Theorem 6. 
Sufficient conditions for the lower bounds of Theorem 2 to be achievable. Consider the statement of Theorem 2 for a triple of jointly Gaussian RVs $(X, Y, \hat{X})$ on $\mathbb{R}^{n_x} \times \mathbb{R}^{n_y} \times \mathbb{R}^{n_x}$, $(n_x, n_y) \in \mathbb{Z}_+ \times \mathbb{Z}_+$, i.e., $P_{X,Y,\hat{X}} = P^G_{X,Y,\hat{X}}$, with joint marginal the fixed Gaussian distribution $P_{X,Y} = P^G_{X,Y}$ of $(X, Y)$.
Then,

$$\bar{X}^{cm} = \mathbf{E}\{X|Y, \hat{X}\} = e^G(Y, \hat{X}), \tag{82}$$
$$e^G(Y, \hat{X}) = \mathbf{E}\{X|Y\} + \mathrm{cov}(X, \hat{X}|Y)\, \mathrm{cov}(\hat{X}, \hat{X}|Y)^{\dagger} \big( \hat{X} - \mathbf{E}\{\hat{X}|Y\} \big). \tag{83}$$

Moreover, the following hold.
Case (i). $\mathrm{cov}(\hat{X}, \hat{X}|Y) \succ 0$, that is, $\mathrm{rank}(Q_{\hat{X}|Y}) = n_x$. Condition (84) is sufficient for $I(X; \hat{X}|Y) = I(X; \bar{X}^{cm}|Y)$:

$$\bar{X}^{cm} = \mathbf{E}\{X|Y, \hat{X}\} = e^G(Y, \hat{X}) = \hat{X} \ \text{a.s.} \tag{84}$$

In addition, Conditions 1 and 2 below are sufficient for (84) to hold:

$$\text{Condition 1.} \quad \mathbf{E}\{X|Y\} = \mathbf{E}\{\hat{X}|Y\}, \tag{85}$$
$$\text{Condition 2.} \quad \mathrm{cov}(X, \hat{X}|Y)\, \mathrm{cov}(\hat{X}, \hat{X}|Y)^{-1} = I_{n_x}. \tag{86}$$

Case (ii). $\mathrm{cov}(\hat{X}, \hat{X}|Y) \succeq 0$ but not $\mathrm{cov}(\hat{X}, \hat{X}|Y) \succ 0$; that is, $\mathrm{rank}(Q_{\hat{X}|Y}) = n_1 < n_x$. Condition (87) is sufficient for $I(X; \hat{X}|Y) = I(X; \bar{X}^{cm}|Y)$:

$$e^G(\cdot,\cdot) \ \text{defined by (83) satisfies (47)}. \tag{87}$$

In addition, a sufficient condition for (87) to hold is that, for a fixed $Y = y \in \mathcal{Y}$, the $\sigma$-algebras satisfy $\mathcal{F}_{\hat{X}} = \mathcal{F}_{e^G(y, \hat{X})}$.
Proof. 
Note that identity (83) follows from Proposition 1, (26), by replacing $Y$ with $\hat{X}$ and letting $\mathcal{G}$ be the information generated by $Y$. Consider Case (i): if (84) holds, then $I(X; \hat{X}|\bar{X}^{cm}, Y) = 0$. By (83), Conditions 1 and 2 are sufficient for (84) to hold. Consider Case (ii): sufficient condition (87) follows from Theorem 2 and implies $I(X; \hat{X}|\bar{X}^{cm}, Y) = 0$. The statement below (87) follows from Proposition 2. □
We now turn our attention to the optimization problem $R_{X|Y}(\Delta_X)$ defined by (1) for the multivariate Gaussian source with mean-square error distortion defined by (9)–(18). In the next lemma, we derive a preliminary parametrization of the optimal reproduction distribution $P_{\hat{X}|X,Y}$ of the RDF $R_{X|Y}(\Delta_X)$.
Lemma 2. 
Preliminary parametrization of the optimal reproduction distribution of $R_{X|Y}(\Delta_X)$. Consider the RDF $R_{X|Y}(\Delta_X)$ defined by (1) for the multivariate Gaussian source, i.e., $P_{X,Y} = P^G_{X,Y}$, with mean-square error distortion defined by (9)–(18).
(a) For every joint distribution $P_{X,Y,\hat{X}}$ there exists a jointly Gaussian distribution, denoted by $P^G_{X,Y,\hat{X}}$, with marginal the fixed distribution $P^G_{X,Y}$, which minimizes $I(X; \hat{X}|Y)$ and satisfies the average distortion constraint, with $d_X(x, \hat{x}) = ||x - \hat{x}||^2_{\mathbb{R}^{n_x}}$.
(b) The conditional reproduction distribution of the RDF $R_{X|Y}(\Delta_X)$ is $P_{\hat{X}|X,Y} = P^G_{\hat{X}|X,Y}$ and is induced by the parametric realization of $\hat{X}$ (in terms of $H, G, Q_W$),

$$\hat{X} = H X + G Y + W, \tag{88}$$
$$H \in \mathbb{R}^{n_x \times n_x}, \quad G \in \mathbb{R}^{n_x \times n_y}, \tag{89}$$
$$W \in N(0, Q_W), \quad Q_W \succeq 0, \tag{90}$$
$$W \ \text{independent of} \ (X, Y), \tag{91}$$

and $\hat{X}$ is a Gaussian RV.
(c) $R_{X|Y}(\Delta_X)$ is characterized by the optimization problem

$$R_{X|Y}(\Delta_X) = \inf_{\mathcal{M}^G_0(\Delta_X)} I(X; \hat{X}|Y), \quad \Delta_X \in [0, \infty), \tag{92}$$

where $\mathcal{M}^G_0(\Delta_X)$ is specified by the set

$$\mathcal{M}^G_0(\Delta_X) = \Big\{ \hat{X}: \Omega \to \hat{\mathcal{X}} : \text{(88)–(91) hold, and} \ \mathbf{E}\big\{ ||X - \hat{X}||^2_{\mathbb{R}^{n_x}} \big\} \le \Delta_X \Big\}. \tag{93}$$

(d) If there exists $(H, G, Q_W)$ such that (84) or (87) holds, then a further lower bound on $R_{X|Y}(\Delta_X)$ is achieved in the subset $\mathcal{M}^{G,o}_0(\Delta_X) \subseteq \mathcal{M}^G_0(\Delta_X)$ defined by

$$\mathcal{M}^{G,o}_0(\Delta_X) = \Big\{ \hat{X}: \Omega \to \hat{\mathcal{X}} : \text{(88)–(91) hold, (84) or (87) holds, and} \ \mathbf{E}\big\{ ||X - \hat{X}||^2_{\mathbb{R}^{n_x}} \big\} \le \Delta_X \Big\}, \tag{94}$$

and the corresponding characterization of the RDF is

$$R_{X|Y}(\Delta_X) = \inf_{\mathcal{M}^{G,o}_0(\Delta_X)} I(X; \hat{X}|Y), \quad \Delta_X \in [0, \infty). \tag{95}$$

Proof. 
(a) This is omitted, since it is similar to the classical unconditional RDF $R_X(\Delta_X)$ of a Gaussian message $X \in N(0, Q_X)$. (b) By (a), the conditional distribution $P^G_{\hat{X}|X,Y}$ is such that its conditional mean is linear in $(X, Y)$, its conditional covariance is nonrandom, i.e., constant, and, for fixed $(X, Y) = (x, y)$, $P^G_{\hat{X}|X,Y}$ is Gaussian. Such a distribution is induced by the parametric realization (88)–(91). (c) This follows from parts (a) and (b). (d) This follows from Theorem 6 and (48), due to the achievability of the lower bounds. □
In the next theorem, we identify the optimal triple $(H, G, Q_W)$ such that (84) or (87) holds (i.e., we establish its existence), characterize the RDF by $R_{X|Y}(\Delta_X) = \inf_{\mathcal{M}^{G,o}_0(\Delta_X)} I(X; \hat{X}|Y)$, and construct a realization $\hat{X}$ that achieves it.
Theorem 7. 
Characterization of the RDF $R_{X|Y}(\Delta_X)$. Consider the RDF $R_{X|Y}(\Delta_X)$, defined by (1), for the multivariate Gaussian source with mean-square error distortion defined by (9)–(18). The characterization of the RDF $R_{X|Y}(\Delta_X)$ is

$$R_{X|Y}(\Delta_X) = \inf_{\mathcal{Q}(\Delta_X)} I(X; \hat{X}|Y) \tag{96}$$
$$= \inf_{\mathcal{M}^{G,o}_0(\Delta_X)} I(X; \hat{X}|Y) \tag{97}$$
$$= \inf_{\mathcal{Q}(\Delta_X)} \frac{1}{2} \log \det\big( Q_{X|Y} \Sigma_\Delta^{-1} \big), \tag{98}$$

where

$$\mathcal{Q}(\Delta_X) = \Big\{ \Sigma_\Delta \succeq 0 : Q_{X|Y} - \Sigma_\Delta \succeq 0, \ \mathrm{trace}(\Sigma_\Delta) \le \Delta_X \Big\}, \tag{99}$$
$$\Sigma_\Delta = \mathbf{E}\big\{ (X - \hat{X})(X - \hat{X})^T \big\}, \tag{100}$$
$$Q_{X|Y} = Q_X - Q_{X,Y} Q_Y^{-1} Q_{X,Y}^T, \tag{101}$$
$$Q_{X,Y} = Q_X C^T, \quad Q_Y = C Q_X C^T + D D^T. \tag{102}$$

The realization of the optimal reproduction $\hat{X} \in \mathcal{M}^{G,o}_0(\Delta_X)$, which achieves $R_{X|Y}(\Delta_X)$, is given in Theorem 3a and also satisfies the properties stated in Theorem 3a, (i)–(iv).
Proof. 
See Appendix A.3. □
Remark 2. 
Structural properties of the optimal realization of Theorem 4a. For the characterization of the RDF $R_{X|Y}(\Delta_X)$ of Theorem 7, which is achieved by the $\hat{X}$ defined in Theorem 3a in terms of the matrices $\Sigma_\Delta, Q_{X|Y}, H, Q_W$, we show in Corollary 2 the statements of Theorem 4a, i.e.,

$$\text{(i)} \quad H = H^T \succeq 0, \tag{103}$$
$$\text{(ii)} \quad \{\Sigma_\Delta, Q_{X|Y}, H, Q_W\} \ \text{have spectral representations with respect to the same unitary matrix} \ U, \ U U^T = I_{n_x}. \tag{104}$$
To prove the structural property of Remark 2, we use the next corollary, which is a degenerate case of [[22], Lemma 2] (i.e., of the structural properties of the test channel of the Gorbunov and Pinsker [23] nonanticipatory RDF of Markov sources).
Corollary 1. 
Structural properties of the realization of the optimal $\hat{X}$ of Theorem 4a. Consider the characterization of the RDF $R_{X|Y}(\Delta_X)$ of Theorem 7. Suppose $Q_{X|Y} \succ 0$ and $\Sigma_\Delta \succ 0$ commute, that is,

$$Q_{X|Y} \Sigma_\Delta = \Sigma_\Delta Q_{X|Y}. \tag{105}$$

Then,

$$\text{(1)} \ H = I_{n_x} - \Sigma_\Delta Q_{X|Y}^{-1} = H^T, \ Q_W = \Sigma_\Delta H^T = \Sigma_\Delta H = H \Sigma_\Delta = Q_W^T \succeq 0; \quad \text{(2)} \ \{\Sigma_\Delta, Q_{X|Y}, H, Q_W\} \ \text{have spectral decompositions with respect to the same unitary matrix} \ U, \ U U^T = I_{n_x}, \ U^T U = I_{n_x}; \tag{106}$$

that is, the following hold:

$$Q_{X|Y} = U \mathrm{diag}\{\lambda_1, \ldots, \lambda_{n_x}\} U^T, \quad \lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_{n_x} > 0, \tag{107}$$
$$\Sigma_\Delta = U \mathrm{diag}\{\delta_1, \ldots, \delta_{n_x}\} U^T, \quad \delta_1 \ge \delta_2 \ge \cdots \ge \delta_{n_x} \ge 0, \tag{108}$$
$$H = U \mathrm{diag}\Big\{ 1 - \frac{\delta_1}{\lambda_1}, \ldots, 1 - \frac{\delta_{n_x}}{\lambda_{n_x}} \Big\} U^T, \tag{109}$$
$$Q_W = U \mathrm{diag}\Big\{ \Big( 1 - \frac{\delta_1}{\lambda_1} \Big) \delta_1, \ldots, \Big( 1 - \frac{\delta_{n_x}}{\lambda_{n_x}} \Big) \delta_{n_x} \Big\} U^T, \quad \text{with} \ \Big( 1 - \frac{\delta_k}{\lambda_k} \Big) \delta_k \ge 0. \tag{110}$$
Proof. 
See Appendix A.4. □
In the next corollary, we re-express the realization of $\hat{X}$ of Theorem 4a, which characterizes the RDF of Theorem 7, using a translation of $X$ and $\hat{X}$ by subtracting their conditional means with respect to $Y$, making use of the property $\mathbf{E}\{X|Y\} = \mathbf{E}\{\hat{X}|Y\}$ of (59). This is the realization shown in Figure 3.
Corollary 2. 
Equivalent characterization of $R_{X|Y}(\Delta_X)$. Consider the characterization of the RDF $R_{X|Y}(\Delta_X)$ of Theorem 7 and the realization of $\hat{X}$ of Theorems 3a and 4a. Define the translated RVs

$$X' = X - \mathbf{E}\{X|Y\} = X - Q_{X,Y} Q_Y^{-1} Y, \tag{111}$$
$$\hat{X}' = \hat{X} - \mathbf{E}\{X|Y\} = \hat{X} - Q_{X,Y} Q_Y^{-1} Y. \tag{112}$$

Let

$$Q_{X|Y} = U \mathrm{diag}\{\lambda_1, \ldots, \lambda_{n_x}\} U^T, \quad U U^T = I_{n_x}, \quad U^T U = I_{n_x}, \quad \lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_{n_x}, \tag{113}$$
$$\bar{X} = U^T X', \quad \bar{\hat{X}} = U^T \hat{X}'. \tag{114}$$

Then,

$$\hat{X}' = H X' + W, \tag{115}$$
$$I(X; \hat{X}|Y) = I(X'; \hat{X}') = I(U^T X'; U^T \hat{X}'), \tag{116}$$
$$\mathbf{E}\big\{ ||X - \hat{X}||^2_{\mathbb{R}^{n_x}} \big\} = \mathbf{E}\big\{ ||X' - \hat{X}'||^2_{\mathbb{R}^{n_x}} \big\} = \mathbf{E}\big\{ ||U^T X' - U^T \hat{X}'||^2_{\mathbb{R}^{n_x}} \big\} = \mathrm{trace}(\Sigma_\Delta), \tag{117}$$

where $(H, Q_W)$ are given in Theorem 3a.
Further, the characterization of the RDF $R_{X|Y}(\Delta_X)$ (98) satisfies the following equalities and inequality:

$$R_{X|Y}(\Delta_X) = \inf_{\mathcal{Q}(\Delta_X)} I(X; \hat{X}|Y) = \inf_{\mathcal{Q}(\Delta_X)} \frac{1}{2} \log \max\big\{ 1, \det\big( Q_{X|Y} \Sigma_\Delta^{-1} \big) \big\} \tag{118}$$
$$= \inf_{\mathbf{E}\{||X' - \hat{X}'||^2_{\mathbb{R}^{n_x}}\} \le \Delta_X} I(X'; \hat{X}') \tag{119}$$
$$= \inf_{\mathbf{E}\{||U^T X' - U^T \hat{X}'||^2_{\mathbb{R}^{n_x}}\} \le \Delta_X} I(U^T X'; U^T \hat{X}') \tag{120}$$
$$\ge \inf_{\mathbf{E}\{||U^T X' - U^T \hat{X}'||^2_{\mathbb{R}^{n_x}}\} \le \Delta_X} \sum_{t=1}^{n_x} I(\bar{X}_t; \bar{\hat{X}}_t). \tag{121}$$

Moreover, the inequality (121) is achieved if $Q_{X|Y} \succ 0$ and $\Sigma_\Delta \succ 0$ commute; that is, if (105) holds, then

$$R_{X|Y}(\Delta_X) = \inf_{\sum_{i=1}^{n_x} \delta_i \le \Delta_X} \frac{1}{2} \sum_{i=1}^{n_x} \log \max\Big\{ 1, \frac{\lambda_i}{\delta_i} \Big\}, \tag{122}$$

where

$$\mathrm{diag}\Big\{ \mathbf{E}\big\{ (U^T X' - U^T \hat{X}')(U^T X' - U^T \hat{X}')^T \big\} \Big\} = \mathrm{diag}\{\delta_1, \delta_2, \ldots, \delta_{n_x}\}. \tag{123}$$
Proof. 
By Theorem 3a,

$$\hat{X} = H X + \big( I_{n_x} - H \big) Q_{X,Y} Q_Y^{-1} Y + W \tag{124}$$
$$= H\big( X - Q_{X,Y} Q_Y^{-1} Y \big) + Q_{X,Y} Q_Y^{-1} Y + W \tag{125}$$
$$\Longrightarrow \hat{X} - Q_{X,Y} Q_Y^{-1} Y = H\big( X - Q_{X,Y} Q_Y^{-1} Y \big) + W \tag{126}$$
$$\Longrightarrow \hat{X}' = H X' + W. \tag{127}$$

The last equation establishes (115). By the properties of conditional mutual information and the properties of the optimal realization $\hat{X}$, the following equalities hold:

$$I(X; \hat{X}|Y) = I\big( X - Q_{X,Y} Q_Y^{-1} Y;\ \hat{X} - Q_{X,Y} Q_Y^{-1} Y \,\big|\, Y \big) \tag{128}$$
$$= I(X'; \hat{X}'|Y), \quad \text{by (112)} \tag{129}$$
$$= H(\hat{X}'|Y) - H(\hat{X}'|Y, X') \tag{130}$$
$$= H(\hat{X}') - H(\hat{X}'|Y, X'), \quad \text{by the independence of} \ (X', W) \ \text{and} \ Y \tag{131}$$
$$= H(\hat{X}') - H(\hat{X}'|X'), \quad \text{by the independence of} \ W \ \text{and} \ Y \ \text{for fixed} \ X' \tag{132}$$
$$= I(X'; \hat{X}') \tag{133}$$
$$= I(U^T X'; U^T \hat{X}') \tag{134}$$
$$= I(\bar{X}_1, \bar{X}_2, \ldots, \bar{X}_{n_x}; \bar{\hat{X}}_1, \bar{\hat{X}}_2, \ldots, \bar{\hat{X}}_{n_x}) \tag{135}$$
$$\ge \sum_{t=1}^{n_x} I(\bar{X}_t; \bar{\hat{X}}_t), \quad \text{by the mutual independence of} \ \bar{X}_t, \ t = 1, 2, \ldots, n_x. \tag{136}$$

Moreover, inequality (136) holds with equality if the pairs $(\bar{X}_t, \bar{\hat{X}}_t)$, $t = 1, 2, \ldots, n_x$, are jointly independent. The average distortion is then given by

$$\mathbf{E}\big\{ ||X - \hat{X}||^2_{\mathbb{R}^{n_x}} \big\} = \mathbf{E}\big\{ ||X - \hat{X} - Q_{X,Y} Q_Y^{-1} Y + Q_{X,Y} Q_Y^{-1} Y||^2_{\mathbb{R}^{n_x}} \big\} \tag{137}$$
$$= \mathbf{E}\big\{ ||X' - \hat{X}'||^2_{\mathbb{R}^{n_x}} \big\}, \quad \text{by (112)} \tag{138}$$
$$= \mathbf{E}\big\{ ||U^T X' - U^T \hat{X}'||^2_{\mathbb{R}^{n_x}} \big\} = \mathrm{trace}(\Sigma_\Delta), \quad \text{by} \ U U^T = I_{n_x}. \tag{139}$$

By Corollary 1, if (105) holds, that is, $Q_{X|Y} \succ 0$ and $\Sigma_\Delta \succ 0$ satisfy $Q_{X|Y} \Sigma_\Delta = \Sigma_\Delta Q_{X|Y}$ (i.e., commute), then (106)–(108) hold, and by (115) we obtain

$$\bar{\hat{X}} = U^T \hat{X}' = U^T (H X' + W) = U^T H U U^T X' + U^T W \tag{140}$$
$$= (U^T H U) \bar{X} + U^T W, \quad \text{where} \ U^T H U \ \text{is diagonal and} \ U^T W \ \text{has independent components}. \tag{141}$$

Hence, if (105) holds, then the lower bound in (136) holds with equality, because the pairs $(\bar{X}_t, \bar{\hat{X}}_t)$, $t \in \mathbb{Z}_{n_x}$, are jointly independent. Moreover, if (105) holds, then from, say, (118), the expressions (122) and (123) are obtained. The above equations establish all claims. □
Proposition 4. 
Theorem 4 is correct.
Proof. 
By invoking Corollary 2, Theorem 7, and the convexity of $R_{X|Y}(\Delta_X)$ given by (122), we arrive at the statements of Theorem 4, which completely characterize the RDF $R_{X|Y}(\Delta_X)$ and construct a realization of the optimal $\hat{X}$ that achieves it. □
Next, we discuss the degenerate case, in which the statements of Theorems 3, 4, and 7 reduce to the RDF $R_X(\Delta_X)$ of a Gaussian RV $X$ with square-error distortion. We illustrate that the identified structural property of the realization matrices $\Sigma_\Delta, Q_{X|Y}, H, Q_W$ leads to the well-known water-filling solution.
Remark 3. 
Degenerate case of Theorem 7 and of the realization $\hat{X}$ of Theorem 4a. Consider the characterization of the RDF $R_{X|Y}(\Delta_X)$ of Theorem 7 and the realization of $\hat{X}$ of Theorem 3a, and assume $X$ and $Y$ are independent, or $Y$ generates the trivial information, i.e., the $\sigma$-algebra of $Y$ is $\sigma\{Y\} = \{\Omega, \emptyset\}$, or $C = 0$ in (15)–(18).
(a) By the definitions of $Q_{X,Y}$ and $Q_{X|Y}$, then

$$Q_{X,Y} = 0, \quad Q_{X|Y} = Q_X. \tag{142}$$

Substituting (142) into the expressions of Theorem 7, the RDF $R_{X|Y}(\Delta_X)$ reduces to

$$R_{X|Y}(\Delta_X) = R_X(\Delta_X) = \inf_{\mathcal{Q}(\Delta_X)} I(X; \hat{X}) \tag{143}$$
$$= \inf_{\mathcal{Q}^m(\Delta_X)} \frac{1}{2} \log \det\big( Q_X \Sigma_\Delta^{-1} \big), \tag{144}$$

where

$$\mathcal{Q}^m(\Delta_X) = \Big\{ \Sigma_\Delta \succeq 0 : Q_X - \Sigma_\Delta \succeq 0, \ \mathrm{trace}(\Sigma_\Delta) \le \Delta_X \Big\}, \tag{145}$$

and the optimal reproduction $\hat{X}$ reduces to

$$\hat{X} = \big( I_{n_x} - \Sigma_\Delta Q_X^{-1} \big) X + W, \quad Q_X \succeq \Sigma_\Delta, \tag{146}$$
$$Q_W = \big( I_{n_x} - \Sigma_\Delta Q_X^{-1} \big) \Sigma_\Delta \succeq 0. \tag{147}$$

Thus, $R_X(\Delta_X)$ is the well-known RDF of a multivariate memoryless Gaussian RV $X$ with square-error distortion.
(b) For the RDF $R_X(\Delta_X)$ of part (a), it is known [24] that $\Sigma_\Delta$ and $Q_X$ have a spectral decomposition with respect to the same unitary matrix, that is,

$$Q_X = U \Lambda_X U^T, \quad \Sigma_\Delta = U \Delta U^T, \quad U U^T = I, \tag{148}$$
$$\Lambda_X = \mathrm{diag}\{\lambda_{X,1}, \ldots, \lambda_{X,n_x}\}, \quad \Delta = \mathrm{diag}\{\delta_1, \ldots, \delta_{n_x}\}, \tag{149}$$

where the entries of $(\Lambda_X, \Delta)$ are in decreasing order.
Define

$$X^p = U^T X, \quad \hat{X}^p = U^T \hat{X}, \quad W^p = U^T W. \tag{150}$$

Then, a parallel channel realization of the optimal reproduction $\hat{X}^p$ is obtained as follows:

$$\hat{X}^p = H X^p + W^p, \tag{151}$$
$$H = I_{n_x} - \Delta \Lambda_X^{-1} = \mathrm{diag}\Big\{ 1 - \frac{\delta_1}{\lambda_{X,1}}, \ldots, 1 - \frac{\delta_{n_x}}{\lambda_{X,n_x}} \Big\}, \tag{152}$$
$$Q_{W^p} = H \Delta = \mathrm{diag}\Big\{ \Big( 1 - \frac{\delta_1}{\lambda_{X,1}} \Big) \delta_1, \ldots, \Big( 1 - \frac{\delta_{n_x}}{\lambda_{X,n_x}} \Big) \delta_{n_x} \Big\}. \tag{153}$$

The RDF $R_X(\Delta_X)$ is then computed from the reverse water-filling equations as follows:

$$R_X(\Delta_X) = \frac{1}{2} \sum_{i=1}^{n_x} \log \frac{\lambda_{X,i}}{\delta_i}, \tag{154}$$

where

$$\sum_{i=1}^{n_x} \delta_i = \Delta_X, \quad \delta_i = \begin{cases} \mu, & \text{if} \ \mu < \lambda_{X,i}, \\ \lambda_{X,i}, & \text{if} \ \mu \ge \lambda_{X,i}, \end{cases} \tag{155}$$

and $\mu \in [0, \infty)$ is a Lagrange multiplier (obtained from the Karush–Kuhn–Tucker conditions).

4.2. Side Information Only at Decoder

In general, when the side information is available only at the decoder, the achievable operational rate $R(\Delta_X)$ is greater than the achievable operational rate $\bar{R}_1(\Delta_X)$ when the side information is available at both the encoder and the decoder [2]. By Remark 1, $\bar{R}(\Delta_X) \ge R_{X|Y}(\Delta_X)$, and equality holds if $I(X; Z|\hat{X}, Y) = 0$.
In view of the characterization of $R_{X|Y}(\Delta_X)$ and the realization of the optimal reproduction $\hat{X}$ of Theorem 3, which is presented in Figure 3, we observe that we can re-write (49) as follows:

$$\hat{X} = H X + \big( I_{n_x} - H \big) Q_{X,Y} Q_Y^{-1} Y + W \tag{156}$$
$$= \big( I_{n_x} - H \big) Q_{X,Y} Q_Y^{-1} Y + Z \tag{157}$$
$$= f(Y, Z), \tag{158}$$
$$Z = H X + W, \tag{159}$$
$$H = I_{n_x} - \Sigma_\Delta Q_{X|Y}^{-1}, \quad Q_W = H \Sigma_\Delta, \ \text{as defined by (51)–(63)}, \tag{160}$$
$$P_{Z|X,Y} = P_{Z|X}, \quad (\hat{X}, Y) \ \text{uniquely define} \ Z, \ \text{which implies} \ I(X; Z|\hat{X}, Y) = 0. \tag{161}$$
Proposition 5. 
Theorem 5 is correct.
Proof. 
From the above realization of $\hat{X} = f(Y, Z)$, we have the following. (a) By Wyner (see Remark 1), the inequalities (36) and (37) hold, and the equalities hold if $I(X; Z|\hat{X}, Y) = 0$. That is, for any $\hat{X} = f(Y, Z)$, by the properties of conditional mutual information,

$$I(X; Z|Y) \overset{(\alpha)}{=} I(X; Z, \hat{X}|Y) \tag{162}$$
$$\overset{(\beta)}{=} I(X; Z|\hat{X}, Y) + I(X; \hat{X}|Y) \tag{163}$$
$$\overset{(\gamma)}{\ge} I(X; \hat{X}|Y), \tag{164}$$

where $(\alpha)$ is due to $\hat{X} = f(Y, Z)$, $(\beta)$ is due to the chain rule of mutual information, and $(\gamma)$ is due to $I(X; Z|\hat{X}, Y) \ge 0$. Hence, (72) is obtained (as in Wyner [2] for a tuple of scalar jointly Gaussian RVs). (b) Equality holds in (164) if there exists an $\hat{X} = f(Y, Z)$ such that $I(X; Z|\hat{X}, Y) = 0$ and the average distortion is satisfied. Taking $\hat{X} = f(Y, Z) = (I_{n_x} - H) Q_{X,Y} Q_Y^{-1} Y + Z$, where $Z = g(X, W)$ is specified by (156)–(160), then $I(X; Z|\hat{X}, Y) = 0$ and the average distortion is satisfied. Since the realization (156)–(160) is identical to the realization (73)–(76), part (b) is also shown. (c) This follows directly from the optimal realization. □

5. Connection with Other Works and Simulations

In this section, we illustrate that, for the special case of scalar-valued jointly Gaussian RVs $(X, Y)$, our results reproduce Wyner's [2] results. In addition, we show that the characterizations of the RDFs of the more general problems considered in [5,6] (i.e., where a noisy version of the source is available at the encoder) do not reproduce Wyner's [2] results. Finally, we present simulations.

5.1. Connection with Other Works

Remark 4. 
The degenerate case to Wyner's [2] optimal test channel realizations. We now verify that, for the tuple of scalar-valued, jointly Gaussian RVs $(X, Y)$ with the square-error distortion specified below, our optimal realizations of $\hat{X}$ and closed-form expressions for $R_{X|Y}(\Delta_X)$ and $\bar{R}(\Delta_X)$ are identical to Wyner's [2] realizations and RDFs (see Figure 4). Let us define:

$$X: \Omega \to \mathcal{X} = \mathbb{R}, \quad Y: \Omega \to \mathcal{Y} = \mathbb{R}, \quad \hat{X}: \Omega \to \hat{\mathcal{X}} = \mathbb{R}, \tag{165}$$
$$d_X(x, \hat{x}) = (x - \hat{x})^2, \tag{166}$$
$$X \in N(0, \sigma_X^2), \quad \sigma_X^2 > 0, \quad Y = \alpha X + U, \tag{167}$$
$$U \in N(0, \sigma_U^2), \quad \sigma_U^2 > 0, \quad \alpha > 0. \tag{168}$$

(a) RDF $R_{X|Y}(\Delta_X)$: By Theorem 4a applied to (165)–(168), we obtain

$$Q_X = \sigma_X^2, \quad Q_{X,Y} = \alpha \sigma_X^2, \quad Q_Y = \sigma_Y^2 = \alpha^2 \sigma_X^2 + \sigma_U^2, \quad Q_{X|Y} = c\, \sigma_U^2, \quad c = \frac{\sigma_X^2}{\alpha^2 \sigma_X^2 + \sigma_U^2}, \tag{169}$$
$$H = 1 - \frac{\Delta_X}{Q_{X|Y}} = \frac{c \sigma_U^2 - \Delta_X}{c \sigma_U^2} = a, \quad Q_{X,Y} Q_Y^{-1} = c \alpha, \quad H Q_{X,Y} Q_Y^{-1} = a c \alpha, \tag{170}$$
$$W = H \Psi = a \Psi, \quad Q_\Psi = H^{-1} \Delta_X = \frac{\Delta_X}{a} = \frac{c \sigma_U^2 \Delta_X}{c \sigma_U^2 - \Delta_X}, \quad c \sigma_U^2 - \Delta_X > 0. \tag{171}$$

Moreover, by Theorem 4b, the optimal reproduction $\hat{X} \in \mathcal{M}_0(\Delta_X)$ and the RDF $R_{X|Y}(\Delta_X)$ are

$$\hat{X} = a( X - c \alpha Y ) + c \alpha Y + a \Psi, \quad c \sigma_U^2 - \Delta_X > 0, \tag{172}$$
$$R_{X|Y}(\Delta_X) = \begin{cases} \frac{1}{2} \log \frac{c \sigma_U^2}{\Delta_X}, & 0 < \Delta_X < c \sigma_U^2, \\ 0, & \Delta_X \ge c \sigma_U^2. \end{cases} \tag{173}$$

This shows that our realization of Figure 3 degenerates to Wyner's [2] realization of Figure 4a.
(b) RDF $\bar{R}(\Delta_X)$: By Theorem 5b applied to (165)–(168), and using the calculations (169)–(172), we obtain

$$\hat{X} = f(Y, Z) = c \alpha (1 - a) Y + Z, \quad \text{by (172) and (175)}, \tag{174}$$
$$Z = a X + W = a( X + \Psi ), \quad \text{with} \ (a, \Psi) \ \text{defined in (170) and (171)}, \tag{175}$$

$$\bar{R}(\Delta_X) = R_{X|Y}(\Delta_X) = \text{(173)}, \quad \text{by evaluating} \ I(X; Z) - I(Y; Z) \ \text{using (4) and (175)}.$$

This shows that our value of $\bar{R}(\Delta_X)$ and optimal realization $\hat{X} = f(Y, Z)$ reproduce Wyner's optimal realization and the value of $\bar{R}(\Delta_X)$ given in [2] (i.e., Figure 4b).
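A minimal numerical check (ours; the parameter values are illustrative assumptions) that the test channel of (175) attains $\bar{R}(\Delta_X) = I(X; Z) - I(Y; Z) = R_{X|Y}(\Delta_X)$ in (173):

```python
import numpy as np

# Scalar check of Remark 4 (sigma_X, sigma_U, alpha, Delta_X illustrative).
sigma_X2, sigma_U2, alpha, Delta_X = 2.0, 1.0, 0.7, 0.3
Q_Y = alpha**2 * sigma_X2 + sigma_U2
Q_X_given_Y = sigma_X2 - (alpha * sigma_X2)**2 / Q_Y      # = c * sigma_U^2
assert 0 < Delta_X < Q_X_given_Y

a = (Q_X_given_Y - Delta_X) / Q_X_given_Y                 # H = a in (170)
Q_W = a * Delta_X                                         # Q_W = H Delta_X, (53)

# Test channel Z = a X + W, (175); evaluate I(X;Z) - I(Y;Z) in nats.
Q_Z = a**2 * sigma_X2 + Q_W
Q_Z_given_X = Q_W
Q_Z_given_Y = a**2 * Q_X_given_Y + Q_W
I_XZ = 0.5 * np.log(Q_Z / Q_Z_given_X)
I_YZ = 0.5 * np.log(Q_Z / Q_Z_given_Y)

R_bar = I_XZ - I_YZ
R_cond = 0.5 * np.log(Q_X_given_Y / Delta_X)              # (173)
print(np.isclose(R_bar, R_cond))                          # True
```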
In the following remark, we show that, when $S = X$ a.s., the realization of the auxiliary RV $Z$ used in the proofs in [5,6] to show the converse coding theorem does not coincide with Wyner's realization [2]. Also, their realizations do not reproduce Wyner's RDF (this observation is also verified for the modified realization given, without proof, in the correction note at https://tiangroup.engr.tamu.edu/publications/, accessed on 3 January 2024). The deficiency of the realizations in [5,6] in showing the converse was first pointed out in [7], using an alternative proof.
Remark 5. 
Optimal test channel realization of [5,6]
(a) The derivation of [[5], Theorem 4] uses the following representation of RVs (see [[5], Equation (4)], adapted to our notation using (19)):
\[ X=\big(K_{xs}K_{sy}+K_{xy}\big)Y+K_{xs}N_1+N_2,\qquad S=K_{sy}Y+N_1, \]
where $N_1$ and $N_2$ are independent Gaussian RVs with zero mean, $N_1$ is independent of $Y$, and $N_2$ is independent of $(S,Y)$.
To reduce [5,6] to the Wyner and Ziv RDF, we set $X=S$ a.s., which then implies $K_{xs}=I$, $N_2=0$ a.s., and $K_{xy}=0$. According to the derivation of the converse in [[5], Theorem 4] (see [[5], 3 lines above Equation (32)], in our notation), the optimal realization of the auxiliary RV $Z_T$ used to achieve the RDF is
\[ Z_T=U^{\mathrm{T}}X+N_3, \]
where $Q_{X|Y}=U\,\mathrm{diag}(\lambda_1,\ldots,\lambda_n)\,U^{\mathrm{T}}$, $U$ is a unitary matrix, and $N_3\in N(0,Q_{N_3})$, with $Q_{N_3}$ a diagonal covariance matrix whose elements are given below. (For the value of $\sigma_{3,i}^2$, we use the one given, without derivation, in the correction note at https://tiangroup.engr.tamu.edu/publications/ (accessed on 3 January 2024), where it is stated that the $\sigma_{3,i}^2$ appearing in the derivation of [[5], proof of Theorem 4] should be multiplied by $\lambda_i$.)
\[ \sigma_{3,i}^2=\frac{\min(\lambda_i,\delta_i)\,\lambda_i}{\lambda_i-\min(\lambda_i,\delta_i)},\qquad \sum_{i=1}^{n}\min(\lambda_i,\delta_i)=\Delta_X. \]
(b) It is easy to verify that the above realization of $Z_T$, which uses the correction of footnote 6, is precisely the realization given in [[6], Theorem 3A].
(c) Special Case: For scalar-valued RVs, the auxiliary RV $Z_T$ reduces to
\[ Z_T=X+N_3,\qquad N_3\in N\Big(0,\ \frac{\Delta_X Q_{X|Y}}{Q_{X|Y}-\Delta_X}\Big),\qquad Q_{X|Y}>\Delta_X. \]
Now, we examine whether the realization (179) corresponds to Wyner’s realization and induces Wyner’s RDF. Recall that Wyner’s [2] RDF, denoted by $R_{X;Z|Y}(\Delta_X)$ and corresponding to the auxiliary RV $Z$, is given by
\[ Z=HX+W,\qquad H=\frac{Q_{X|Y}-\Delta_X}{Q_{X|Y}},\qquad W\in N(0,H\Delta_X), \]
\[ R_{X;Z|Y}(\Delta_X)=I(X;Z|Y)=\frac{1}{2}\log\frac{Q_{X|Y}}{\Delta_X},\qquad \Delta_X\le Q_{X|Y}. \]
Clearly, the two realizations (179) and (180) are different. Let $\hat{R}_{X;Z_T|Y}(\Delta_X)$ denote the RDF corresponding to the realization $Z_T$. Then $\hat{R}_{X;Z_T|Y}(\Delta_X)$ can be computed from $I(X;Z_T|Y)=I(X;Z_T)-I(Y;Z_T)=-H(Z_T|X)+H(Z_T|Y)$, where $H(\cdot|\cdot)$ denotes conditional differential entropy. Then, by using
\[ Q_{Z_T|X}=Q_{N_3}=\frac{\Delta_X Q_{X|Y}}{Q_{X|Y}-\Delta_X}, \]
\[ Q_{Z_T|Y}=Q_{N_3}+Q_{X|Y}=\frac{Q_{X|Y}^2}{Q_{X|Y}-\Delta_X}, \]
it is straightforward to show that
\[ \hat{R}_{X;Z_T|Y}(\Delta_X)=-H(Z_T|X)+H(Z_T|Y) \]
\[ =-\frac{1}{2}\log\Big(2\pi e\,\frac{\Delta_X Q_{X|Y}}{Q_{X|Y}-\Delta_X}\Big)+\frac{1}{2}\log\Big(2\pi e\,\frac{Q_{X|Y}^2}{Q_{X|Y}-\Delta_X}\Big),\qquad \Delta_X<Q_{X|Y}. \]
However, we note that (i) unlike Wyner’s RDF given in (181), which gives $R_{X;Z|Y}(\Delta_X)=0$ at $\Delta_X=Q_{X|Y}$, the corresponding $\hat{R}_{X;Z_T|Y}(\Delta_X)=+\infty$ at $\Delta_X=Q_{X|Y}$; and (ii) Wyner’s test channel realization is $Z=HX+W$, $H=\frac{Q_{X|Y}-\Delta_X}{Q_{X|Y}}$, $W\in N(0,H\Delta_X)$, which is different from the test channel realization in (179). In particular, if $Q_{X|Y}=\Delta_X$, then $H=0$, $W\in N(0,0)$, and $Z=0$ a.s. On the other hand, for the test channel in (179), if $Q_{X|Y}=\Delta_X$, then $N_3\in N(0,+\infty)$, and thus the variance of $Z_T$ in (179) is not zero.
Further, in Proposition 6, we prove that for the multi-dimensional source, the test channel realization in (179) does not achieve the RDF when water-filling is active, i.e., when at least one component of the source is not reproduced.
(d) Special Case: Classical RDF. The classical RDF is obtained as a special case if we assume $X$ and $Y$ are independent, or $Y$ generates the trivial information $\{\Omega,\emptyset\}$; i.e., $Y$ is nonrandom. Clearly, in this case, the RDF $\hat{R}_{X;Z_T|Y}(\Delta_X)$ should degenerate to the classical RDF of the source $X$, i.e., $R_X(\Delta_X)$, and it should hold that $\hat{X}=Z_T$. However, for this case, (179) gives $Q_{Z_T}=Q_X+\frac{\Delta_X Q_X}{Q_X-\Delta_X}=\frac{Q_X^2}{Q_X-\Delta_X}$, which is fundamentally different from Wyner’s degenerate, and correct, value $Q_{\hat{X}}=Q_Z=\max\{0,\,Q_X-\Delta_X\}$.
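To make the comparison in (c) and (d) concrete, the following sketch (our construction; the value of $Q_{X|Y}$ is assumed) evaluates both scalar test channels. It illustrates that, while the induced rates coincide for $\Delta_X<Q_{X|Y}$, the noise variance of (179) diverges as $\Delta_X\to Q_{X|Y}$, whereas the output variance of Wyner’s channel (180) vanishes, consistent with points (i) and (ii) above.
```python
# Compare the scalar test channels (179) and (180) under an assumed Q_{X|Y}.
import numpy as np

Q = 1.0                                        # Q_{X|Y} (assumed value)
for Delta in [0.25, 0.5, 0.9, 0.999]:
    # Realization (179): Z_T = X + N_3
    Q_N3 = Delta * Q / (Q - Delta)             # variance of N_3
    I_ZT = 0.5 * np.log((Q_N3 + Q) / Q_N3)     # I(X;Z_T|Y), cf. (182)-(185)
    # Wyner's realization (180): Z = H X + W
    H = (Q - Delta) / Q
    var_Z_given_Y = H**2 * Q + H * Delta       # variance of Z given Y
    I_Z = 0.5 * np.log(Q / Delta)              # Wyner's RDF (181)
    print(f"Delta={Delta}: I_ZT={I_ZT:.4f}, I_Z={I_Z:.4f}, "
          f"Var(N_3)={Q_N3:.2f} (diverges), Var(Z|Y)={var_Z_given_Y:.4f} (vanishes)")
```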
Proposition 6. 
When $S=X$ a.s., Wyner’s [2] auxiliary RV $Z$ and the auxiliary RV $Z_T$ given in (177), i.e., the degenerate case of [5,6] (with the correction of footnote 6), are not related by an invertible function. As a result, the RDFs computed from the two realizations are different.
Proof. 
Recall that, if the two auxiliary RVs $Z_T$ and $Z$ are not related by an invertible function, i.e., there is no $f(\cdot)$ with $Z=f(Z_T)$ such that $f$ is invertible and both $f$ and its inverse are measurable, then, in general, $I(X;Z_T)-I(Y;Z_T)\neq I(X;Z)-I(Y;Z)$. It was shown earlier in this paper (and also in [7]) that, for the multivariate Wyner RDF, the auxiliary RV takes the form
\[ Z=HX+W,\qquad W\in N(0,Q_W), \]
where $Q_W=H\Sigma_\Delta=U\,\mathrm{diag}(\sigma_{w,1}^2,\ldots,\sigma_{w,n}^2)\,U^{\mathrm{T}}$, $\Sigma_\Delta=U\,\mathrm{diag}(\delta_1,\ldots,\delta_n)\,U^{\mathrm{T}}$, $H=I-Q_{X|Y}^{-1}\Sigma_\Delta=U\,\mathrm{diag}(h_1,\ldots,h_n)\,U^{\mathrm{T}}$, and $Q_{X|Y}=U\,\mathrm{diag}(\lambda_1,\ldots,\lambda_n)\,U^{\mathrm{T}}$, where $U$ is a unitary matrix. The eigenvalues $\sigma_{w,i}^2$ and $h_i$ are given by
\[ \sigma_{w,i}^2=\frac{\min(\lambda_i,\delta_i)\,\big(\lambda_i-\min(\lambda_i,\delta_i)\big)}{\lambda_i}, \]
\[ h_i=\frac{\lambda_i-\min(\lambda_i,\delta_i)}{\lambda_i}, \]
where $\sum_{i=1}^{n}\min(\lambda_i,\delta_i)=\Delta_X$. Hence, Equations (186), (187), and (188) imply that if $\sigma_{w,i}^2=0$ then $h_i=0$, and vice versa. Such zero values correspond to compression with water-filling. On the other hand, from (177) and (178), if water-filling is active, then $\sigma_{3,i}^2=\frac{\lambda_i^2}{\lambda_i-\lambda_i}=+\infty$. Moreover, by comparing Equation (187) with (178) and (188) with (177), it is straightforward to show that $f(\cdot)=HU$. If $HU$ is not an invertible matrix, then $I(X;Z_T)-I(Y;Z_T)\neq I(X;Z)-I(Y;Z)$.
By (188), it is easy to show that if $\min(\lambda_i,\delta_i)=\lambda_i$ for some $i$, then $h_i=0$ and $HU$ is not invertible. This implies $I(X;Z_T)-I(Y;Z_T)\neq I(X;Z)-I(Y;Z)$. □
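A minimal numerical sketch of this argument follows (our construction; the eigenvalues are borrowed from the example of Section 5.2, and $U$ is set to the identity for simplicity). When water-filling is active on one component, the corresponding $h_i$ vanishes and $HU$ is singular.
```python
# Illustrate Proposition 6: active water-filling makes HU singular.
import numpy as np

lam = np.array([0.7538, 0.2])               # eigenvalues of Q_{X|Y} (assumed)
delta = np.array([0.35, 0.2])               # delta_2 = lambda_2: water-filling active
h = (lam - np.minimum(lam, delta)) / lam    # eigenvalues of H, cf. (188)
U = np.eye(2)                               # any unitary U; identity suffices here
HU = np.diag(h) @ U
print("h =", h, " det(HU) =", np.linalg.det(HU))  # det = 0 => HU not invertible
```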

5.2. Simulations

In this section, we provide an example to illustrate the gap between the classical RDF $R_X(\Delta_X)$ defined in (154) and the conditional RDF $R_{X|Y}(\Delta_X)$ of (69), and to verify the validity of Gray’s lower bound (8). Note that Theorem 5 shows that $R_{X|Y}(\Delta_X)=\overline{R}(\Delta_X)$; hence, the plot of $\overline{R}(\Delta_X)$ is omitted. For the evaluation, we pick the joint covariance matrix (11) given by
\[ Q_{(X,Y)}=\begin{bmatrix} 2.5000 & 1.1250 & 0.4750 & 0.6125\\ 1.1250 & 0.8125 & 0.2750 & 0.3063\\ 0.4750 & 0.2750 & 0.1525 & 0.1625\\ 0.6125 & 0.3063 & 0.1625 & 0.2031 \end{bmatrix},\qquad X:\Omega\to\mathbb{R}^2,\quad Y:\Omega\to\mathbb{R}^2. \]
In order to compute the rates, we first have to find $Q_X$, $Q_Y$, $Q_{X,Y}$, and $Q_{X|Y}$. From the definition of $Q_{(X,Y)}$ given in (11), the covariance of $X$, the covariance of $Y$, and the cross-covariance of $X$ and $Y$ are
\[ Q_X=\begin{bmatrix}2.5000 & 1.1250\\ 1.1250 & 0.8125\end{bmatrix},\qquad Q_Y=\begin{bmatrix}0.1525 & 0.1625\\ 0.1625 & 0.2031\end{bmatrix},\qquad Q_{X,Y}=\begin{bmatrix}0.4750 & 0.6125\\ 0.2750 & 0.3063\end{bmatrix}. \]
Then, the conditional covariance $Q_{X|Y}$, which appears in $R_{X|Y}(\Delta_X)$, is computed from (27). Using the singular value decomposition (SVD), we calculate the eigenvalues of $Q_{X|Y}$; for this case, they are $\{0.7538,\,0.2\}$. The eigenvalues of $Q_X$ are determined similarly. Finally, the eigenvalues of $Q_X$ and $Q_{X|Y}$ are passed to the water-filling solution to compute $R_X(\Delta_X)$ and $R_{X|Y}(\Delta_X)$, respectively.
The classical RDF, the conditional RDF, and Gray’s lower bound for the joint covariance above are illustrated in Figure 5. It is clear that $R_{X|Y}(\Delta_X)$ is smaller, and, as the distortion $\Delta_X$ increases, the gap between the classical and conditional RDFs becomes larger. Gray’s lower bound is achievable for some positive distortion values, as given in (71), i.e., for $\Delta_X\in\{\Delta_X\in[0,\infty):\Delta_X\le n_x\lambda_{n_x}\}$. Recall that the set of eigenvalues of $Q_{X|Y}$ is $\{0.7538,\,0.2\}$, so the lower bound is achievable for $\Delta_X\le 2\cdot 0.2=0.4$; i.e., for these values, $R_{X|Y}(\Delta_X)=R_X(\Delta_X)-I(X;Y)$. A numerical sketch of this computation is given below.
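The following Python sketch is our construction, assuming the standard reverse water-filling form of the Gaussian RDF (rates in nats); the helper name waterfill_rate is ours, not from the paper. It recomputes the quantities plotted in Figure 5 from the joint covariance above.
```python
# Recompute R_X, R_{X|Y}, and Gray's lower bound via reverse water-filling.
import numpy as np

def waterfill_rate(eigs, Delta):
    """Reverse water-filling: delta_i = min(theta, lambda_i), sum_i delta_i = Delta."""
    lo, hi = 0.0, float(np.max(eigs))
    for _ in range(200):                        # bisection on the water level theta
        theta = 0.5 * (lo + hi)
        if np.minimum(theta, eigs).sum() < Delta:
            lo = theta
        else:
            hi = theta
    d = np.minimum(theta, eigs)
    return 0.5 * np.sum(np.log(eigs / d))

Q = np.array([[2.5000, 1.1250, 0.4750, 0.6125],
              [1.1250, 0.8125, 0.2750, 0.3063],
              [0.4750, 0.2750, 0.1525, 0.1625],
              [0.6125, 0.3063, 0.1625, 0.2031]])
QX, QY, QXY = Q[:2, :2], Q[2:, 2:], Q[:2, 2:]
QX_Y = QX - QXY @ np.linalg.inv(QY) @ QXY.T      # conditional covariance, cf. (27)
print("eigenvalues of Q_{X|Y}:", np.round(np.linalg.eigvalsh(QX_Y), 4))
I_XY = 0.5 * np.log(np.linalg.det(QX) / np.linalg.det(QX_Y))  # I(X;Y)
for Delta in [0.1, 0.2, 0.4, 0.8]:
    RX = waterfill_rate(np.linalg.eigvalsh(QX), Delta)
    RX_Y = waterfill_rate(np.linalg.eigvalsh(QX_Y), Delta)
    print(f"Delta={Delta}: R_X={RX:.4f}, R_X|Y={RX_Y:.4f}, Gray bound={RX - I_XY:.4f}")
```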

6. Conclusions

We derived nontrivial structural properties of the optimal test channel realizations that achieve the optimal test channel distributions in the characterizations of the RDFs of a tuple of multivariate, jointly independent and identically distributed Gaussian random variables with mean-square error fidelity, for two cases: first, when the side information is available at both the encoder and the decoder, and second, when it is available only at the decoder. Using the realizations of the optimal test channels, we showed that side information available at both the encoder and the decoder does not achieve better compression than side information available only at the decoder.

Author Contributions

M.G. and C.D.C. contributed to the conceptualization, methodology, and writing of the manuscript. All authors have read and agreed to the published version of the manuscript.

Funding

The work of M.G. and C.D. Charalambous was co-funded by the European Regional Development Fund and the Republic of Cyprus through the Research and Innovation Foundation (Project: EXCELLENCE/1216/0296).

Data Availability Statement

Data are contained within the article.

Conflicts of Interest

The authors declare no conflicts of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript; or in the decision to publish the results.

Appendix A

Appendix A.1. Proof of Lemma 1

(a) By the chain rule of mutual information,
\[ I(X;\hat{X},Y)=I(X;Y|\hat{X})+I(X;\hat{X}) \]
\[ \phantom{I(X;\hat{X},Y)}=I(X;\hat{X}|Y)+I(X;Y). \]
Since $I(X;Y|\hat{X})\ge 0$, it follows from the above that
\[ I(X;\hat{X})\le I(X;\hat{X}|Y)+I(X;Y) \]
\[ \Longrightarrow\quad I(X;\hat{X}|Y)\ge I(X;\hat{X})-I(X;Y). \]
The above shows (40). Moreover, the inequality holds with equality if and only if $I(X;Y|\hat{X})=0$, and this quantity is zero if and only if $P_{X|\hat{X},Y}=P_{X|\hat{X}}$. Alternatively, we note the following:
\[ I(X;\hat{X}|Y)=\mathbf{E}\left[\log\frac{P_{X|\hat{X},Y}}{P_{X|Y}}\right]=\mathbf{E}\left[\log\frac{P_{X|\hat{X},Y}}{P_{X|Y}}\cdot\frac{P_X}{P_X}\right]=\mathbf{E}\left[\log\frac{P_{X|\hat{X},Y}}{P_X}-\log\frac{P_{X|Y}}{P_X}\right]=\mathbf{E}\left[\log\frac{P_{X|\hat{X}}}{P_X}-\log\frac{P_{X|Y}}{P_X}\right] \]
if and only if $P_{X|\hat{X},Y}=P_{X|\hat{X}}$.
This completes the statement of equality of (40); i.e., it establishes equality (41). (b) Consider a test channel $P_{X|\hat{X},Y}$ such that $\mathbf{E}\{||X-\hat{X}||^2_{\mathbb{R}^{n_x}}\}\le\Delta_X$, i.e., $\hat{X}\in\mathcal{M}_0(\Delta_X)$, and such that $P_{X|\hat{X},Y}=P_{X|\hat{X}}$, for $\Delta_X\in\mathcal{D}_C(X|Y)\subseteq[0,\infty)$. Taking the infimum of both sides of (41) over $\hat{X}\in\mathcal{M}_0(\Delta_X)$ such that $P_{X|\hat{X},Y}=P_{X|\hat{X}}$, then (43) is obtained on a nontrivial surface, i.e., for $\Delta_X\in\mathcal{D}_C(X|Y)$, which exists due to the continuity and convexity of $R_X(\Delta_X)$ for $\Delta_X\in(0,\infty)$. This completes the proof.

Appendix A.2. Proof of Theorem 2

(a) (1) By properties of conditional mutual information [18],
\[ I(X;\hat{X}|Y)\overset{(\alpha)}{=}I(X;\hat{X},\bar{X}^{cm}|Y) \]
\[ \overset{(\beta)}{=}I(X;\hat{X}|\bar{X}^{cm},Y)+I(X;\bar{X}^{cm}|Y) \]
\[ \overset{(\gamma)}{\ge}I(X;\bar{X}^{cm}|Y), \]
where $(\alpha)$ is due to $\bar{X}^{cm}$ being a function of $(Y,\hat{X})$ and a well-known property of mutual information [18]; $(\beta)$ is due to the chain rule of mutual information [18]; and $(\gamma)$ is due to $I(X;\hat{X}|\bar{X}^{cm},Y)\ge 0$. Hence, inequality (45) is shown. (2) If (i) holds, i.e., $\hat{X}=\bar{X}^{cm}$ a.s., then $I(X;\hat{X}|\bar{X}^{cm},Y)=0$, and hence inequality (45) becomes an equality. If (ii) holds, then, since for fixed $y\in\mathcal{Y}$ the function $e(y,\cdot):\hat{\mathcal{X}}\to\mathcal{X}$, $e(y,\hat{x})=\bar{x}^{cm}$, uniquely defines $\hat{x}$, we again have $I(X;\hat{X}|\bar{X}^{cm},Y)=0$, and inequality (45) becomes an equality.
(b) The inequality (48) is well known due to the orthogonal projection theorem.

Appendix A.3. Proof of Theorem 7

Consider the realization (88). We identify the triple $(H,G,Q_W)$ such that (84) or (87) holds; i.e., we characterize the set $\mathcal{M}_0^{G,o}(\Delta_X)$.
Case (i). $\mathrm{cov}(\hat{X},\hat{X}|Y)\succ 0$; that is, $\mathrm{rank}(Q_{\hat{X}|Y})=n_x$. By Theorem 6, Case (i), we seek the triple $(H,G,Q_W)$ such that (84) holds, i.e., $\bar{X}^{cm}=\hat{X}$ a.s. Recall that Conditions 1 and 2 of Theorem 6 are sufficient for $\hat{X}=\bar{X}^{cm}$.
Condition 1, i.e., (85). The left-hand side of (85) is given by (this follows from mean-square estimation theory, or an application of (26) with $\mathcal{G}=\{\Omega,\emptyset\}$)
\[ \mathbf{E}[X|Y]=\mathbf{E}[X]+\mathrm{cov}(X,Y)\,\mathrm{cov}(Y,Y)^{-1}\big(Y-\mathbf{E}[Y]\big)=\mathrm{cov}(X,Y)\,\mathrm{cov}(Y,Y)^{-1}Y \]
\[ =Q_{X,Y}Q_Y^{-1}Y \]
\[ =Q_XC^{\mathrm{T}}Q_Y^{-1}Y\qquad\text{by the model (15)–(18)}. \]
Similarly, the right-hand side of (85) is given by
\[ \mathbf{E}[\hat{X}|Y]=\mathbf{E}[\hat{X}]+\mathrm{cov}(\hat{X},Y)\,\mathrm{cov}(Y,Y)^{-1}\big(Y-\mathbf{E}[Y]\big)=\mathrm{cov}(\hat{X},Y)\,\mathrm{cov}(Y,Y)^{-1}Y \]
\[ =\big(HQ_{X,Y}+GQ_Y\big)Q_Y^{-1}Y \]
\[ =\big(HQ_XC^{\mathrm{T}}+GQ_Y\big)Q_Y^{-1}Y\qquad\text{by (15)–(18)}. \]
Equating (A9) and (A12), then
\[ \mathbf{E}[X|Y]=\mathbf{E}[\hat{X}|Y] \]
\[ \Longrightarrow\quad Q_{X,Y}Q_Y^{-1}Y=\big(HQ_{X,Y}+GQ_Y\big)Q_Y^{-1}Y\qquad\text{by (A12)} \]
\[ \Longrightarrow\quad G=\big(I-H\big)Q_{X,Y}Q_Y^{-1} \]
\[ \phantom{\Longrightarrow\quad} G=\big(I-H\big)Q_XC^{\mathrm{T}}Q_Y^{-1}\qquad\text{by (15)–(18)}. \]
Hence, $G$ is obtained, and the reproduction is represented by
\[ \hat{X}=HX+\big(I-H\big)Q_{X,Y}Q_Y^{-1}Y+W, \]
\[ \mathbf{E}[\hat{X}|Y]=Q_{X,Y}Q_Y^{-1}Y=\mathbf{E}[X|Y], \]
\[ \hat{X}-\mathbf{E}[\hat{X}|Y]=HX-HQ_{X,Y}Q_Y^{-1}Y+W. \]
Condition 2, i.e., (86). To apply (86), the following calculations are needed.
\[ Q_{X|Y}=\mathrm{cov}(X,X|Y)=\mathbf{E}\Big[\big(X-\mathbf{E}[X|Y]\big)\big(X-\mathbf{E}[X|Y]\big)^{\mathrm{T}}\Big] \]
\[ =Q_X-Q_{X,Y}Q_Y^{-1}Q_{X,Y}^{\mathrm{T}} \]
\[ =Q_X-Q_XC^{\mathrm{T}}Q_Y^{-1}CQ_X\qquad\text{by (15)–(18)}, \]
\[ \mathrm{cov}(X,\hat{X}|Y)=\mathbf{E}\Big[\big(X-\mathbf{E}[X|Y]\big)\big(\hat{X}-\mathbf{E}[\hat{X}|Y]\big)^{\mathrm{T}}\Big] \]
\[ =\mathbf{E}\Big[\big(X-\mathbf{E}[X|Y]\big)\big(\hat{X}-\mathbf{E}[X|Y]\big)^{\mathrm{T}}\Big]\qquad\text{by (A19)} \]
\[ =\mathbf{E}\Big[\big(X-\mathbf{E}[X|Y]\big)\hat{X}^{\mathrm{T}}\Big]\qquad\text{by orthogonality} \]
\[ =Q_XH^{\mathrm{T}}-Q_{X,Y}Q_Y^{-1}Q_{Y,X}H^{\mathrm{T}}\qquad\text{by (A18), (A19)} \]
\[ =Q_XH^{\mathrm{T}}-Q_XC^{\mathrm{T}}Q_Y^{-1}CQ_XH^{\mathrm{T}}\ \text{by (15)–(18)}=\big(Q_X-Q_XC^{\mathrm{T}}Q_Y^{-1}CQ_X\big)H^{\mathrm{T}} \]
\[ =Q_{X|Y}H^{\mathrm{T}}, \]
\[ \mathrm{cov}(\hat{X},\hat{X}|Y)=\mathbf{E}\Big[\big(\hat{X}-\mathbf{E}[\hat{X}|Y]\big)\big(\hat{X}-\mathbf{E}[\hat{X}|Y]\big)^{\mathrm{T}}\Big]=HQ_XH^{\mathrm{T}}+Q_W-HQ_{X,Y}Q_Y^{-1}Q_{Y,X}H^{\mathrm{T}}\qquad\text{by (A20)} \]
\[ =HQ_XH^{\mathrm{T}}+Q_W-HQ_XC^{\mathrm{T}}Q_Y^{-1}CQ_XH^{\mathrm{T}}\ \text{by (15)–(18)}=H\big(Q_X-Q_XC^{\mathrm{T}}Q_Y^{-1}CQ_X\big)H^{\mathrm{T}}+Q_W \]
\[ =HQ_{X|Y}H^{\mathrm{T}}+Q_W. \]
By Condition 2, and (A28) and (A31),
\[ \mathrm{cov}(X,\hat{X}|Y)\,\mathrm{cov}(\hat{X},\hat{X}|Y)^{-1}=I_{n_x}\quad\Longleftrightarrow\quad Q_{X|Y}H^{\mathrm{T}}\big(HQ_{X|Y}H^{\mathrm{T}}+Q_W\big)^{-1}=I_{n_x} \]
\[ \Longrightarrow\quad Q_W=Q_{X|Y}H^{\mathrm{T}}-HQ_{X|Y}H^{\mathrm{T}} \]
\[ \Longrightarrow\quad Q_W=\big(I_{n_x}-H\big)Q_{X|Y}H^{\mathrm{T}}. \]
It remains to show that $Q_W=Q_W^{\mathrm{T}}$. This follows shortly, by identifying the equation for $H$ as follows. Conditions 1 and 2 imply
\[ \Sigma_\Delta=\mathrm{cov}(X,X|Y,\hat{X}) \]
\[ =\mathrm{cov}(X,X|Y)-\mathrm{cov}(X,\hat{X}|Y)\,\mathrm{cov}(\hat{X},\hat{X}|Y)^{-1}\,\mathrm{cov}(X,\hat{X}|Y)^{\mathrm{T}}\qquad\text{by Proposition 1, (26)} \]
\[ =\mathrm{cov}(X,X|Y)-\mathrm{cov}(X,\hat{X}|Y)^{\mathrm{T}}\qquad\text{by (86)} \]
\[ =Q_{X|Y}-HQ_{X|Y}\qquad\text{by (A28)}, \]
\[ \Longrightarrow\quad HQ_{X|Y}=Q_{X|Y}-\Sigma_\Delta \]
\[ \Longrightarrow\quad H=I-\Sigma_\Delta Q_{X|Y}^{-1}. \]
By (A39), it then follows from (A33) that $Q_W=Q_W^{\mathrm{T}}$. From the specification of $G$, the equation for $Q_W$ given by (A33) and (A34), and $HQ_{X|Y}$, $H$ given by (A39) and (A40), the realization of Theorem 4(a) follows for the case $Q_{X|Y}-\Sigma_\Delta\succ 0$. Properties (58)–(61) are easily verified.
Case (ii). $\mathrm{cov}(\hat{X},\hat{X}|Y)\succeq 0$ but not $\mathrm{cov}(\hat{X},\hat{X}|Y)\succ 0$; that is, $\mathrm{rank}(Q_{\hat{X}|Y})=n_1<n_x$. We can verify that the realization stated in Theorem 4(a) is such that Condition (87) holds. By (83) and the above calculations, we have
\[ \bar{X}^{cm}=e^G(Y,\hat{X})=\mathbf{E}[X|Y]+\mathrm{cov}(X,\hat{X}|Y)\,\mathrm{cov}(\hat{X},\hat{X}|Y)^{\dagger}\big(\hat{X}-\mathbf{E}[\hat{X}|Y]\big) \]
\[ =\mathbf{E}[X|Y]+\big(Q_{X|Y}-\Sigma_\Delta\big)\big(Q_{X|Y}-\Sigma_\Delta\big)^{\dagger}\big(\hat{X}-\mathbf{E}[\hat{X}|Y]\big). \]
Since $Q_{\hat{X}|Y}=Q_{X|Y}-\Sigma_\Delta$, $\mathbf{E}[(\hat{X}-\mathbf{E}[\hat{X}|Y])(\hat{X}-\mathbf{E}[\hat{X}|Y])^{\mathrm{T}}]=Q_{X|Y}-\Sigma_\Delta$, and $\mathrm{rank}(L)=n_1$, where $L=(Q_{X|Y}-\Sigma_\Delta)(Q_{X|Y}-\Sigma_\Delta)^{\dagger}$, an application of Proposition 3 implies that Condition (87) holds. Thus, we have established Theorem 3(a) and properties (i)–(iv) stated under Theorem 4(a). Finally, (96)–(101) are obtained from the realization, and hence Theorem 3(b) is achievable.
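As a sanity check on the algebra above, the following sketch (our construction, with assumed commuting $Q_{X|Y}$ and $\Sigma_\Delta$ built from a shared basis) verifies the identities $HQ_{X|Y}=Q_{X|Y}-\Sigma_\Delta$ of (A39) and the symmetry $Q_W=Q_W^{\mathrm{T}}$ numerically.
```python
# Numerical check of (A33)-(A34), (A39)-(A40) for assumed commuting matrices.
import numpy as np

lam = np.array([0.7538, 0.2])          # eigenvalues of Q_{X|Y} (assumed)
delta = np.array([0.3, 0.1])           # eigenvalues of Sigma_Delta, delta_i <= lambda_i
theta = np.pi / 5
U = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])   # shared unitary basis
QXgY = U @ np.diag(lam) @ U.T
SigD = U @ np.diag(delta) @ U.T

H = np.eye(2) - SigD @ np.linalg.inv(QXgY)        # H of (A40)
QW = (np.eye(2) - H) @ QXgY @ H.T                 # Q_W of (A33)-(A34)
print("Q_W symmetric:", np.allclose(QW, QW.T))
print("H Q_{X|Y} = Q_{X|Y} - Sigma_Delta:", np.allclose(H @ QXgY, QXgY - SigD))
```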

Appendix A.4. Proof of Corollary 1

(a) This part is a special case of a related statement in [22]; however, we include it for completeness. By linear algebra [21], given two matrices $A\in\mathbb{S}_+^{k\times k}$, $B\in\mathbb{S}_+^{k\times k}$, the following statements are equivalent: (1) $AB$ is normal, and (2) $AB\succeq 0$, where $AB$ normal means $(AB)(AB)^{\mathrm{T}}=(AB)^{\mathrm{T}}(AB)$. Note that $AB$ is normal if and only if $AB=BA$; i.e., $A$ and $B$ commute. Let $A=U_AD_AU_A^{\mathrm{T}}$, $B=U_BD_BU_B^{\mathrm{T}}$, $U_AU_A^{\mathrm{T}}=I_k$, $U_BU_B^{\mathrm{T}}=I_k$; i.e., $A$ and $B$ admit spectral representations in terms of unitary matrices $U_A$, $U_B$ and diagonal matrices $D_A$, $D_B$. Then, $AB\succeq 0$ if and only if $A$ and $B$ commute, i.e., $AB=BA$, and $A$ and $B$ commute if and only if $U_A=U_B$.
Suppose (105) holds. Letting $A=Q_{X|Y}$, $B=\Sigma_\Delta$, then $A=U_AD_AU_A^{\mathrm{T}}$, $B=U_BD_BU_B^{\mathrm{T}}$, $U_AU_A^{\mathrm{T}}=I_{n_x}$, $U_BU_B^{\mathrm{T}}=I_{n_x}$, $U_A=U_B$. Since $Q_{X|Y}^{-1}=A^{-1}=U_AD_A^{-1}U_A^{\mathrm{T}}$, then $\Sigma_\Delta Q_{X|Y}^{-1}=Q_{X|Y}^{-1}\Sigma_\Delta$; i.e., they commute. Hence,
\[ H^{\mathrm{T}}=\big(I_{n_x}-\Sigma_\Delta Q_{X|Y}^{-1}\big)^{\mathrm{T}}=I_{n_x}-\big(Q_{X|Y}^{-1}\big)^{\mathrm{T}}\Sigma_\Delta^{\mathrm{T}}=I_{n_x}-Q_{X|Y}^{-1}\Sigma_\Delta=I_{n_x}-\Sigma_\Delta Q_{X|Y}^{-1}=H,\quad\text{since } Q_{X|Y}\text{ and }\Sigma_\Delta\text{ commute}. \]
By the definition of $Q_W$ given in Theorem 4(a), we have
\[ Q_W=\Sigma_\Delta H^{\mathrm{T}}=Q_W^{\mathrm{T}}=H\Sigma_\Delta. \]
Substituting (A43) into (A44), then
\[ Q_W=\Sigma_\Delta H. \]
Hence, $\{\Sigma_\Delta, Q_{X|Y}, H, Q_W\}$ are all elements of $\mathbb{S}_+^{n_x\times n_x}$, having spectral decompositions with respect to the same unitary matrix $U$, $UU^{\mathrm{T}}=I_{n_x}$.
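The linear-algebra fact invoked above is easy to illustrate numerically. The following sketch (our construction, with randomly generated example matrices) contrasts a $B$ that commutes with $A$ (a polynomial in $A$, hence sharing its eigenvectors) against a generic PSD $B$: only in the commuting case is $AB$ symmetric, i.e., normal.
```python
# Illustrate: for symmetric PSD A, B, the product AB is normal iff A, B commute.
import numpy as np

rng = np.random.default_rng(1)
M = rng.normal(size=(3, 3))
A = M @ M.T                                # a generic PSD matrix
B_commuting = A @ A + 2 * np.eye(3)        # polynomial in A => commutes with A
N = rng.normal(size=(3, 3))
B_generic = N @ N.T                        # generic PSD matrix, no shared basis

for name, B in [("commuting", B_commuting), ("generic", B_generic)]:
    P = A @ B
    print(name, "AB = BA:", np.allclose(P, B @ A),
          " AB symmetric:", np.allclose(P, P.T))
```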

References

  1. Wyner, A.; Ziv, J. The rate-distortion function for source coding with side information at the decoder. IEEE Trans. Inf. Theory 1976, 22, 1–10. [Google Scholar] [CrossRef]
  2. Wyner, A. The rate-distortion function for source coding with side information at the decoder-II: General sources. Inf. Control. 1978, 38, 60–80. [Google Scholar] [CrossRef]
  3. Berger, T. Rate Distortion Theory: A Mathematical Basis for Data Compression; Prentice-Hall: Englewood Cliffs, NJ, USA, 1971. [Google Scholar]
  4. Gray, R.M. A new class of lower bounds to information rates of stationary sources via conditional rate-distortion functions. IEEE Trans. Inf. Theory 1973, 19, 480–489. [Google Scholar] [CrossRef]
  5. Tian, C.; Chen, J. Remote Vector Gaussian Source Coding With Decoder Side Information Under Mutual Information and Distortion Constraints. IEEE Trans. Inf. Theory 2009, 55, 4676–4680. [Google Scholar] [CrossRef]
  6. Zahedi, A.; Ostergaard, J.; Jensen, S.H.; Naylor, P.; Bech, S. Distributed remote vector Gaussian source coding with covariance distortion constraints. In Proceedings of the 2014 IEEE International Symposium on Information Theory, Honolulu, HI, USA, 29 June–4 July 2014; pp. 586–590. [Google Scholar]
  7. Gkagkos, M.; Charalambous, C.D. Structural Properties of Test Channels of the RDF for Gaussian Multivariate Distributed Sources. In Proceedings of the 2021 IEEE International Symposium on Information Theory (ISIT), Melbourne, Australia, 12–20 July 2021; pp. 2631–2636. [Google Scholar] [CrossRef]
  8. Draper, S.C.; Wornell, G.W. Side information aware coding strategies for sensor networks. IEEE J. Sel. Areas Commun. 2004, 22, 966–976. [Google Scholar] [CrossRef]
  9. Oohama, Y. Gaussian multiterminal source coding. IEEE Trans. Inf. Theory 1997, 43, 1912–1923. [Google Scholar] [CrossRef]
  10. Oohama, Y. Rate-distortion theory for Gaussian multiterminal source coding systems with several side informations at the decoder. IEEE Trans. Inf. Theory 2005, 51, 2577–2593. [Google Scholar] [CrossRef]
  11. Viswanathan, H.; Berger, T. The quadratic Gaussian CEO problem. IEEE Trans. Inf. Theory 1997, 43, 1549–1559. [Google Scholar] [CrossRef]
  12. Ekrem, E.; Ulukus, S. An outer bound for the vector Gaussian CEO problem. In Proceedings of the 2012 IEEE International Symposium on Information Theory Proceedings, Cambridge, MA, USA, 1–6 July 2012; pp. 576–580. [Google Scholar]
  13. Wang, J.; Chen, J. Vector Gaussian Multiterminal Source Coding. IEEE Trans. Inf. Theory 2014, 60, 5533–5552. [Google Scholar] [CrossRef]
  14. Xu, Y.; Guang, X.; Lu, J.; Chen, J. Vector Gaussian Successive Refinement With Degraded Side Information. IEEE Trans. Inf. Theory 2021, 67, 6963–6982. [Google Scholar] [CrossRef]
  15. Renna, F.; Wang, L.; Yuan, X.; Yang, J.; Reeves, G.; Calderbank, R.; Carin, L.; Rodrigues, M.R.D. Classification and Reconstruction of High-Dimensional Signals From Low-Dimensional Features in the Presence of Side Information. IEEE Trans. Inf. Theory 2016, 62, 6459–6492. [Google Scholar] [CrossRef]
  16. Salehkalaibar, S.; Phan, B.; Khisti, A.; Yu, W. Rate-Distortion-Perception Tradeoff Based on the Conditional Perception Measure. In Proceedings of the 2023 Biennial Symposium on Communications (BSC), Montreal, QC, Canada, 4–7 July 2023; pp. 31–37. [Google Scholar]
  17. Gallager, R.G. Information Theory and Reliable Communication; John Wiley & Sons, Inc.: New York, NY, USA, 1968. [Google Scholar]
  18. Pinsker, M.S. The Information Stability of Gaussian Random Variables and Processes; Holden-Day, Inc.: San Francisco, CA, USA, 1964; Volume 133, pp. 28–30. [Google Scholar]
  19. Aries, A.; Liptser, R.; Shiryayev, A. Statistics of Random Processes II: Applications; Stochastic Modelling and Applied Probability; Springer: New York, NY, USA, 2013. [Google Scholar]
  20. van Schuppen, J. Control and System Theory of Discrete-Time Stochastic Systems; Number 923 in Communications and Control Engineering; Springer: Berlin/Heidelberg, Germany, 2021. [Google Scholar] [CrossRef]
  21. Horn, R.A.; Johnson, C.R. (Eds.) Matrix Analysis, 2nd ed.; Cambridge University Press: New York, NY, USA, 2013. [Google Scholar]
  22. Charalambous, C.; Charalambous, T.; Kourtellaris, C.; van Schuppen, J. Structural Properties of Nonanticipatory Epsilon Entropy of Multivariate Gaussian Sources. In Proceedings of the 2020 IEEE International Symposium on Information Theory, Los Angeles, CA, USA, 21–26 June 2020; pp. 586–590. [Google Scholar]
  23. Gorbunov, A.K.; Pinsker, M.S. Prognostic Epsilon Entropy of a Gaussian Message and a Gaussian Source. Probl. Inf. Transm. 1974, 10, 93–109. [Google Scholar]
  24. Ihara, S. Information Theory for Continuous Systems; World Scientific: Singapore, 1993. [Google Scholar]
Figure 1. The Wyner and Ziv [1] block diagram of lossy compression. If switch A is closed, then the side information is available at both the encoder and the decoder; if switch A is open, the side information is only available at the decoder.
Figure 2. Test channel when side information is only available to the decoder.
Figure 3. $R_{X|Y}(\Delta_X)$: A realization of the optimal reproduction $\hat{X}$ over parallel additive Gaussian noise channels of Theorem 4, where $h_i=1-\frac{\delta_i}{\lambda_i}\ge 0$, $i=1,\ldots,n_x$, are the diagonal elements of the spectral decomposition of the matrix $H=U\,\mathrm{diag}\{h_1,\ldots,h_{n_x}\}U^{\mathrm{T}}$, and $W_i\in N(0,h_i\delta_i)$, $i=1,\ldots,n_x$, is the additive noise introduced due to compression.
Figure 4. Wyner’s realizations of the optimal reproductions for the RDFs $R_{X|Y}(\Delta_X)$ and $\overline{R}(\Delta_X)$. (a) RDF $R_{X|Y}(\Delta_X)$: Wyner’s [2] optimal realization of $\hat{X}$ for the RDF $R_{X|Y}(\Delta_X)$ of (165)–(168). (b) RDF $\overline{R}(\Delta_X)$: Wyner’s [2] optimal realization $\hat{X}=f(Y,Z)$ for the RDF $\overline{R}(\Delta_X)$ of (165)–(168).
Figure 5. Comparison of classical RDF, R X ( Δ X ) , conditional RDF R X | Y ( Δ X ) = R ¯ ( Δ X ) , and Gray’s lower bound R X ( Δ X ) I ( X ; Y ) (solid green line).
