Information Dimension and the Degrees of Freedom
of the Interference Channel
Yihong Wu, Shlomo Shamai (Shitz) and Sergio Verdú
Abstract—The degrees of freedom of the K-user Gaussian interference channel determine the asymptotic growth of the maximal sum rate as a function of the signal-to-noise ratio. Subject to a very general sufficient condition on the cross-channel gains, we give a formula for the degrees of freedom of the scalar interference channel as a function of the deterministic channel matrix, which involves maximization of a sum of information dimensions over K scalar input distributions. Known special cases are recovered, and even generalized in certain cases with unified proofs.
Index Terms—Multiuser information theory, interference channel, information dimension, Shannon theory, sumset entropy inequalities, degrees of freedom, high-SNR channels.
I. INTRODUCTION
Consider a $K$-user real-valued memoryless Gaussian interference channel (IC) with a fixed deterministic channel matrix $\mathbf{H} = [h_{ij}]$ (known at the encoders and decoders), where at each symbol epoch the $i$th user transmits $X_i$ and the $i$th decoder receives
\[ Y_i = \sum_{j=1}^{K} \sqrt{\mathsf{snr}}\, h_{ij} X_j + N_i, \qquad (1) \]
where $\{X_i, N_i\}_{i=1}^K$ are independent with $\mathbb{E}[X_i^2] \le 1$ and $N_i \sim \mathcal{N}(0,1)$.
Denote the capacity region of (1) by $\mathcal{C}(\mathbf{H},\mathsf{snr})$ and the sum-rate capacity by
\[ \bar{C}(\mathbf{H},\mathsf{snr}) \triangleq \max\Big\{\sum_{i=1}^K R_i : R^K \in \mathcal{C}(\mathbf{H},\mathsf{snr})\Big\}. \qquad (2) \]
The degrees of freedom (DoF), or multiplexing gain, is the pre-log^1 of the sum-rate capacity in the high-SNR regime, defined by
\[ \mathsf{DoF}(\mathbf{H}) = \limsup_{\mathsf{snr}\to\infty} \frac{\bar{C}(\mathbf{H},\mathsf{snr})}{\frac{1}{2}\log\mathsf{snr}}. \qquad (3) \]
This work was partially supported by the National Science Foundation (NSF) under Grant CCF-1016625 and by the Center for Science of Information (CSoI), an NSF Science and Technology Center, under Grant CCF-0939370. This work was also supported by the U.S.-Israel Binational Science Foundation (BSF). The results of this paper were presented in part at the 2011 IEEE International Symposium on Information Theory, Saint Petersburg, Russia, July 31 – August 5, 2011 [1].
Y. Wu was with the Department of Electrical Engineering, Princeton University, Princeton, NJ 08544, USA. He is now with the Department of Electrical and Computer Engineering, University of Illinois at Urbana-Champaign, Urbana, IL 61801, USA (e-mail: yihongwu@illinois.edu). S. Verdú is with the Department of Electrical Engineering, Princeton University, Princeton, NJ 08544, USA (e-mail: verdu@princeton.edu). S. Shamai (Shitz) is with the Department of Electrical Engineering, Technion–IIT, Technion City, Haifa 32000, Israel (e-mail: sshlomo@ee.technion.ac.il).
^1 All logarithms in this paper are binary unless explicitly stated otherwise.
Note that the normalization in the definition of (3) is such that $\mathsf{DoF}(\mathbf{H}) = 1$ in the single-user case. Despite its suggestive terminology, $\mathsf{DoF}(\mathbf{H})$ need not be an integer in the multi-user case.
Determining the degrees of freedom has been an active and challenging research subject, in view of the fact that the maximal sum rate $\bar{C}(\mathbf{H},\mathsf{snr})$ of the Gaussian interference channel remains unknown. In [2] it is shown that
\[ \mathsf{DoF}(\mathbf{H}) \le \frac{K}{2} \qquad (4) \]
for fully-connected $\mathbf{H}$, i.e., $\mathbf{H}$ with no zero entries. Using Diophantine approximations, (4) is shown to be achievable for Lebesgue-almost every $\mathbf{H}$ [3]. The almost sure achievability of $\frac{K}{2}$ for the vector interference channel with varying channel gains has been shown in [4] using the technique of interference alignment. Sufficient conditions on individual $\mathbf{H}$ that guarantee equality in (4) are also given in [5], [6]. However, equality does not hold in general: for example, if all the entries of $\mathbf{H}$ are rational, then (4) is strict, as shown in [5] based on additive-combinatorial results and deterministic channel approximations. In addition to theoretical investigations, various communication and interference alignment schemes have been proposed (see, e.g., [7], for a comprehensive review).
The goal of this paper is to give a single-letter formula for $\mathsf{DoF}(\mathbf{H})$ via the maximization of a functional involving the information dimension (also known as Rényi dimension or entropy dimension) [8], under a general sufficient condition on the off-diagonals of $\mathbf{H}$ which, in particular, is satisfied almost surely or if the off-diagonals are algebraic. Moreover, the maximization gives a lower bound on $\mathsf{DoF}(\mathbf{H})$ for any channel matrix $\mathbf{H}$. This unified approach allows us to uncover new results, as well as recover previously known results obtained via different methods. We also give results for the more general case in which the rates are, unlike (2), not equally weighted.
The rest of the paper is organized as follows: Section II gives the necessary background on information dimension, which is the cornerstone of the main results presented in Section III. Our results on the degrees of freedom region are given in Section III-F. In Section IV we discuss self-similar distributions, which play an important role in constructing DoF-achieving input distributions. Proofs of the main results are given in Section V, while several technical lemmas are relegated to the appendix. The converse results for rational channel matrices are based on new sumset inequalities for linear combinations of independent group-valued random variables in Appendix E, which might be of independent interest.
II. INFORMATION DIMENSION
In this section we summarize the main properties of information dimension relevant to the current paper. In particular, the connection between the information dimension and the high-SNR asymptotics of mutual information with additive noise is presented in Section II-C. Other properties and further discussions can be found in [8], [9], [10], [11].
A. Definitions
A key concept in fractal geometry, the information dimension of a random variable measures the rate of growth of the entropy of its successively finer discretizations [8].
Definition 1. Let $X$ be a real-valued random variable. For $m \in \mathbb{N}$, denote the uniformly quantized version of $X$ by^2
\[ \langle X\rangle_m \triangleq \frac{\lfloor mX\rfloor}{m}. \qquad (5) \]
The information dimension of $X$ is defined as
\[ d(X) = \lim_{m\to\infty} \frac{H(\langle X\rangle_m)}{\log m}. \qquad (6) \]
When the limit in (6) does not exist, the $\liminf$ and $\limsup$ are called the lower and upper information dimensions of $X$, respectively, denoted by $\underline{d}(X)$ and $\overline{d}(X)$.
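Definition 1 lends itself to a direct numerical sanity check. The sketch below (function names are ours, using a Monte-Carlo plug-in entropy estimate under assumed samplers) computes the ratio $H(\langle X\rangle_m)/\log m$ at a fixed $m$; for an absolutely continuous input it should be close to 1, and for a discrete input close to 0.

```python
import math
import random
from collections import Counter

def plugin_entropy(samples):
    """Empirical (plug-in) Shannon entropy in bits."""
    n = len(samples)
    return -sum(c / n * math.log2(c / n) for c in Counter(samples).values())

def dimension_ratio(sampler, m, n=200_000):
    """Estimate H(<X>_m)/log m, whose limit in m is d(X) by (6)."""
    quantized = [math.floor(m * sampler()) for _ in range(n)]
    return plugin_entropy(quantized) / math.log2(m)

random.seed(0)
# X uniform on [0,1]: absolutely continuous, so d(X) = 1
r_cont = dimension_ratio(random.random, m=256)
# X uniform on {0, 1/2}: discrete, so d(X) = 0
r_disc = dimension_ratio(lambda: random.choice([0.0, 0.5]), m=256)
```

Only a finite-$m$ approximation, of course: the plug-in estimate is slightly biased downward, and the convergence in $m$ is logarithmic.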
Definition 1 can be readily extended to random vectors, where the floor function $\lfloor\cdot\rfloor$ is taken componentwise [8, p. 208]. Since $d(X)$ only depends on the distribution of $X$, we also denote $d(P_X) = d(X)$; a similar convention is traditionally followed with entropy and other information measures. The information dimension of $X$ is finite if and only if the mild condition
\[ H(\lfloor X\rfloor) < \infty \qquad (7) \]
is satisfied [8], [10]. A sufficient condition for finite information dimension is
\[ \mathbb{E}[\log(1+|X|)] < \infty, \qquad (8) \]
which is milder than the existence of $\mathbb{E}[X]$. Therefore (7) is satisfied for all random variables considered in this paper. Equivalent definitions of information dimension include:
• For an arbitrary integer $M \ge 2$, write the $M$-ary expansion of $X$ as
\[ X = \lfloor X\rfloor + \sum_{i\in\mathbb{N}} (X)_i M^{-i}, \qquad (9) \]
where the $i$th digit $(X)_i \triangleq \lfloor M^i X\rfloor - M\lfloor M^{i-1} X\rfloor$ is a discrete random variable taking values in $\{0,\ldots,M-1\}$. Then $\underline{d}(X)$ and $\overline{d}(X)$ coincide with the normalized lower and upper entropy rates of the process $\{(X)_j : j\in\mathbb{N}\}$.
• Denote by $B(x,\epsilon)$ the open ball^3 of radius $\epsilon$ centered at $x$. Then (see [12, Definition 4.2] and [10, Appendix A])
\[ d(X) = \lim_{\epsilon\downarrow 0} \frac{\mathbb{E}[\log P_X(B(X,\epsilon))]}{\log \epsilon}. \qquad (10) \]
^2 In the sequel, we will use the notation in (5) even if $m$ is not an integer.
^3 By the equivalence of norms on a finite-dimensional space, the value of the limit on the right side of (10) is independent of the underlying norm.
With Shannon entropy replaced by Rényi entropy of order $\alpha$ in Definition 1, the generalized notion of dimension of order $\alpha$^4 is defined similarly [14], [9]:
Definition 2 (Information dimension of order $\alpha$). Let $\alpha \in [0,\infty]$. The information dimension of $X$ of order $\alpha$ is defined as
\[ d_\alpha(X) = \lim_{m\to\infty} \frac{H_\alpha(\langle X\rangle_m)}{\log m}, \qquad (11) \]
where $H_\alpha(Y)$ denotes the Rényi entropy of order $\alpha$ of a discrete random variable $Y$ with probability mass function $P_Y$ defined on $\mathcal{Y}$:
\[ H_\alpha(Y) = \begin{cases} \log \dfrac{1}{\max_{y\in\mathcal{Y}} P_Y(y)}, & \alpha = \infty, \\[1ex] \dfrac{1}{1-\alpha}\, \log \displaystyle\sum_{y\in\mathcal{Y}} P_Y(y)^\alpha, & \alpha \ne 1, \infty. \end{cases} \qquad (12) \]
The $\liminf$ and $\limsup$ of the ratio in (11) are called the lower and upper information dimensions of $X$ of order $\alpha$, respectively, denoted by $\underline{d}_\alpha(X)$ and $\overline{d}_\alpha(X)$.
Note that $d_1(X) = d(X)$. As a consequence of the monotonicity of Rényi entropy, $d_\alpha$ decreases with $\alpha$. Moreover, $\alpha \mapsto d_\alpha(X)$ is continuous except possibly at $\alpha = 1$ [15]. For instance, if $X$ has a discrete-continuous mixed distribution, then $d_\alpha(X) = 1$ for all $\alpha < 1$, while $d(X)$ equals the weight of the continuous part (see Theorem 1). If $X$ has a Cantor distribution (see [16] or Section IV), then $d_\alpha(X) = \log_3 2$ for all $\alpha \in [0,\infty]$.
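The monotonicity just invoked is easy to observe directly from (12); a minimal sketch (function name ours):

```python
import math

def renyi_entropy(pmf, alpha):
    """Rényi entropy of order alpha, in bits, per (12); alpha = 1 is Shannon."""
    if alpha == 1:
        return -sum(p * math.log2(p) for p in pmf if p > 0)
    if alpha == math.inf:
        return -math.log2(max(pmf))
    return math.log2(sum(p ** alpha for p in pmf if p > 0)) / (1 - alpha)

pmf = [0.5, 0.25, 0.125, 0.125]
orders = [0, 0.5, 1, 2, math.inf]
entropies = [renyi_entropy(pmf, a) for a in orders]
# H_0 = log2(4) = 2 bits here, and the sequence is non-increasing in alpha
```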
B. Properties
The following are basic properties of information dimension [8], [10], the last three of which are inherited from Shannon entropy.
Lemma 1.
• \[ 0 \le d(X^n) \le n. \qquad (13) \]
• Scale-invariance: for all $a \ne 0$,
\[ d(aX^n) = d(X^n). \qquad (14) \]
• If $X^n$ and $Y^n$ are independent, then^5
\[ \max\{d(X^n), d(Y^n)\} \le d(X^n + Y^n) \qquad (15) \]
\[ \le d(X^n) + d(Y^n). \qquad (16) \]
• If $X^n$, $Y^n$, $Z^n$ are independent, then
\[ d(X^n+Y^n+Z^n) + d(Z^n) \le d(X^n+Z^n) + d(Y^n+Z^n). \qquad (17) \]
Proof. Appendix A.
To calculate the information dimension in (6) or (11), it is sufficient to restrict to an exponential subsequence, as a result of the following lemma.
^4 Also known as the $L_\alpha$ dimension or spectrum [13]. In particular, $d_2$ is also called the correlation dimension.
^5 The upper bound (16) can be improved by defining conditional information
Lemma 2. Let $\lambda > 1$. If $d_\alpha(X)$ exists, then
\[ d_\alpha(X) = \lim_{l\to\infty} \frac{H_\alpha(\langle X\rangle_{\lambda^l})}{l\log\lambda}. \qquad (18) \]
As shown in [8], the information dimension of a mixture of discrete and absolutely continuous distributions can be determined as follows (see [8, Theorems 1 and 3] or [9, Theorem 1, pp. 588–592]):
Theorem 1. Assume that the distribution of a scalar random variable $X$ is given by
\[ \nu = (1-\rho)\,\nu_d + \rho\,\nu_c, \qquad (19) \]
where $\nu_d$ is a discrete probability measure, $\nu_c$ is an absolutely continuous probability measure and $0 \le \rho \le 1$. Then
\[ d(X) = \rho. \qquad (20) \]
In particular, if $X$ has a density (with respect to Lebesgue measure), then $d(X) = 1$; if $X$ is discrete, then $d(X) = 0$.
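Theorem 1 can be checked by direct computation. Assuming, for illustration, that $\nu_d = \delta_0$ and $\nu_c$ is uniform on $[0,1]$, the pmf of $\langle X\rangle_m$ is explicit, and the ratio $H(\langle X\rangle_m)/\log m$ converges to $\rho$, albeit slowly (at rate $O(1/\log m)$):

```python
import math

def mixture_ratio(rho, m):
    """H(<X>_m)/log m for X = 0 with prob. 1-rho, uniform on [0,1] with prob. rho."""
    # <X>_m equals 0 with prob. (1-rho) + rho/m, and each of 1/m, ..., (m-1)/m
    # with prob. rho/m.
    p0 = (1 - rho) + rho / m
    h = -p0 * math.log2(p0) - (m - 1) * (rho / m) * math.log2(rho / m)
    return h / math.log2(m)

rho = 0.4
ratios = [mixture_ratio(rho, 2 ** k) for k in (10, 20, 30)]
# the ratios decrease monotonically toward d(X) = rho = 0.4
```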
When the distribution of $X$ has a singular component, its information dimension does not admit a simple formula in general. However, for the important class of self-similar singular distributions, the information dimension can be explicitly determined, as we show in Section IV-B.
C. Information dimension and high-SNR mutual information
The high-SNR asymptotics of mutual information with additive noise is governed by the input information dimension, as shown by the following result due to Guionnet and Shlyakhtenko [17].^6 In this paper we give a considerably simplified and self-contained proof of this result, based on a non-asymptotic bound on mutual information in Section V-A, from which Theorem 2 is an immediate consequence.
Theorem 2. Let $X$ be independent of $N$, which is standard normal. Denote
\[ I(X,\mathsf{snr}) \triangleq I(X; \sqrt{\mathsf{snr}}\,X + N). \qquad (21) \]
Then^7
\[ \lim_{\mathsf{snr}\to\infty} \frac{I(X,\mathsf{snr})}{\frac{1}{2}\log\mathsf{snr}} = d(X). \qquad (22) \]
Moreover, (22) holds verbatim in the vector case.
In view of Theorem 2, the information dimension $d(X)$ represents the single-user degrees of freedom when the input distribution is constrained to be $P_X$. Naturally, information dimension, as we will see, also appears in the characterization of degrees of freedom in the multi-user case. In the single-user case, (22) indicates that mutual information is maximized asymptotically by any absolutely continuous input distribution, in view of Theorem 1 in Section II-B. In the interference channel, as we will see in Section III, the situation is far more intricate, since analog inputs are deleterious from the viewpoint of interference.
^6 The original results in [17, Theorem 3.1] are proved assuming (8), which is in fact superfluous as shown in [18, Section 2.7]. Moreover, Theorem 2 holds even if $d(X) = \infty$, because $I(X,\mathsf{snr}) < \infty$ if and only if (7) holds [19, Theorem 6]. Also, [17] failed to point out that the "classical analogue" of the free entropy dimension they defined is, in fact, the information dimension.
^7 When the limit on the left side of (22) does not exist, (22) continues to hold with $\lim$ replaced by $\liminf$ (resp. $\limsup$) and $d$ replaced by $\underline{d}$ (resp. $\overline{d}$).
D. Information dimension under projection
Let $A \in \mathbb{R}^{m\times n}$ with $m \le n$. Then for any $X^n$,
\[ d(AX^n) \le \min\{d(X^n), \operatorname{rank}(A)\}. \qquad (23) \]
Understanding how the dimension of a measure behaves under projections is a basic problem in fractal geometry. It is well known that almost every projection preserves the dimension, be it Hausdorff dimension (Marstrand's projection theorem [20, Chapter 9]) or information dimension [12, Theorems 1.1 and 4.1]. However, computing the dimension for individual projections is, in general, difficult.
The preservation of information dimension of order $\alpha \in (1,2]$ under typical projections is shown in [12, Theorem 1], which is relevant to our investigation of the almost sure behavior of degrees of freedom.
Lemma 3 ([12]). Let $\alpha \in (1,2]$ and $m \le n$. Then for almost every $A \in \mathbb{R}^{m\times n}$,
\[ d_\alpha(AX^n) = \min\{d_\alpha(X^n), m\}. \qquad (24) \]
It is easy to see that (24) fails for information dimension ($\alpha = 1$) [12, p. 1041]: let $P_{X^n} = (1-\rho)\delta_0 + \rho Q$, where $Q$ is an absolutely continuous distribution on $\mathbb{R}^n$, $n > m$ and $\rho < 1$. Note that almost surely, $AX^n$ also has a mixed distribution with an atom at zero of mass $1-\rho$. Then $d(AX^n) = \rho m < \min\{d(X^n), m\} = \min\{\rho n, m\}$. Examples of nonpreservation of $d_\alpha$ for $0 \le \alpha < 1$ or $\alpha > 2$ are given in [12, Section 5].
A problem closely related to determining $\mathsf{DoF}(\mathbf{H})$ is to find the dimension difference of a product measure (i.e., the distribution of a pair of independent random variables) under two projections (cf. Remark 5 in Section III-G). Let $p, q, p', q'$ be non-zero real numbers. Then
\[ d(pX+qY) - d(p'X+q'Y) \le \tfrac{1}{2}\big(2\,d(pX+qY) - d(X) - d(Y)\big) \qquad (25) \]
\[ \le \tfrac{1}{2}, \qquad (26) \]
where (25) and (26) follow from (15) and (16), respectively. Therefore, the dimension of a two-dimensional product measure under two projections can differ by at most one half. Moreover, the next result shows that if the coefficients are rational, then the dimension difference is strictly less than one half. The proof is based on new entropy inequalities for linear combinations of independent random variables developed in Appendix E.
Theorem 3. Let $p, p', q, q'$ be non-zero integers. Then
\[ d(pX+qY) - d(p'X+q'Y) \le \frac{1}{2} - \epsilon(pq', p'q), \qquad (27) \]
where
\[ \epsilon(a,b) \triangleq \frac{1}{56(\lfloor\log|a|\rfloor + \lfloor\log|b|\rfloor) + 16}. \qquad (28) \]
In the special cases of $p = 2, q = 1$ and $p = 1, q = -1$ (with $p' = q' = 1$), the following improved bounds hold:
\[ d(2X+Y) - d(X+Y) \le \frac{3}{7} \qquad (29) \]
and
\[ d(X-Y) - d(X+Y) \le \frac{2}{5}. \qquad (30) \]
Proof. Appendix E.
Remark 1. The constants in (28) are, of course, not optimal. However, the logarithmic dependence on the coefficients is sharp, as shown by the following example: for any $p \in \mathbb{N}$, there exist independent $X$ and $Y$ such that $d(pX+Y) - d(X+Y) \ge \frac{1}{2} - \Theta\big(\frac{1}{\log p}\big)$. See the proof of Theorem 11 in Section V-E for the construction.
III. MAIN RESULTS
A. A general formula
The degrees of freedom of the $K$-user interference channel with channel matrix $\mathbf{H}$, defined in (3), is given by the following result.
Theorem 4. Let
\[ \mathsf{dof}(X^K,\mathbf{H}) \triangleq \sum_{i=1}^K \Bigg[ d\Big(\sum_{j=1}^K h_{ij}X_j\Big) - d\Big(\sum_{j\ne i} h_{ij}X_j\Big) \Bigg]. \qquad (31) \]
Then, for any $\mathbf{H}$,^8
\[ \mathsf{DoF}(\mathbf{H}) \ge \sup_{X^K}\, \mathsf{dof}(X^K,\mathbf{H}), \qquad (32) \]
where the supremum is over independent $X_1,\ldots,X_K$ such that
\[ H(\lfloor X_i\rfloor) < \infty \qquad (33) \]
and all information dimensions appearing in (31) exist. Equality holds in (32) except possibly for $\mathbf{H}$ whose off-diagonals are non-algebraic and belong to a set of zero Lebesgue measure.
Proof. Section V-B.
For a fixed channel matrix $\mathbf{H}$, we have been able to show that equality holds in (32) under the sufficient condition that all cross-channel gains are algebraic (e.g., rationals, quadratic irrationals, etc.).^9 This might be an artifact of the proof; we conjecture that equality in (32) in fact applies to any matrix $\mathbf{H}$. Nevertheless, in the three-user case, the condition for equality in (32) can be further relaxed. Since $\mathsf{DoF}(\mathbf{H})$ is invariant under row and column scaling, the channel matrix can be reduced to the following standard form [3, p. 12]:
\[ \mathbf{H} = \begin{bmatrix} g_1 & 1 & 1 \\ 1 & g_2 & 1 \\ 1 & h & g_3 \end{bmatrix}, \qquad (34) \]
where $h$ is the ratio between the product of the upper and lower off-diagonals, i.e.,
\[ h \triangleq \frac{\prod_{i<j} h_{ij}}{\prod_{i>j} h_{ij}}. \qquad (35) \]
Then, (32) holds with equality as long as $h$ is an algebraic number.
^8 In fact, the lower bound (32) holds even if $\mathsf{DoF}(\mathbf{H})$ in (3) is defined with a $\liminf_{\mathsf{snr}\to\infty}$. Therefore the limit exists whenever equality holds in (32).
^9 In Section V-B, we give a deterministic sufficient condition on the cross-channel gains which is more general than algebraicity and holds almost surely. See (112).
Remark 2. Condition (33) is much weaker than the average power constraint, since $\mathbb{E}[X^2] \le 1$ implies that $H(\lfloor X\rfloor) \le \mathbb{E}[1+|\lfloor X\rfloor|] \le 2$. In fact, as we show in Section V-B, in order for (32) to hold, the inputs need not satisfy any power constraint. As long as (33) is satisfied, $\mathsf{DoF}(\mathbf{H})$ does not change even if we allow infinite input power, in which case $\mathsf{snr}$ is simply a scale parameter no longer carrying the operational meaning of signal-to-noise ratio.
We proceed to offer some insight on the formula in Theorem 4. By the limiting characterization of the interference channel capacity region [21], [22, (2)], the sum-rate capacity is given by
\[ \bar{C}(\mathbf{H},\mathsf{snr}) = \lim_{n\to\infty} \frac{1}{n} \sup_{X_1^n,\ldots,X_K^n} \sum_{i=1}^K I(X_i^n; Y_i^n), \qquad (36) \]
where $X_i^n = [X_{i,1},\ldots,X_{i,n}]$ is the input of the $i$th user, and the supremum is over independent $X_1^n,\ldots,X_K^n$ satisfying the corresponding average cost constraints. Then,
\[ I(X_i^n; Y_i^n) = I(X_1^n,\ldots,X_K^n; Y_i^n) - I(X_1^n,\ldots,X_K^n; Y_i^n \mid X_i^n) \qquad (37) \]
\[ = I\Big(\sum_{j=1}^K h_{ij}X_j^n,\, \mathsf{snr}\Big) - I\Big(\sum_{j\ne i} h_{ij}X_j^n,\, \mathsf{snr}\Big), \qquad (38) \]
where $I(\cdot,\mathsf{snr})$ is defined in (21). Therefore the degrees of freedom admit the following limiting characterization:^10
\[ \mathsf{DoF}(\mathbf{H}) = \limsup_{\mathsf{snr}\to\infty}\, \lim_{n\to\infty}\, \sup_{X_1^n,\ldots,X_K^n} \frac{2}{n\log\mathsf{snr}} \sum_{i=1}^K \Big\{ I\Big(\sum_{j=1}^K h_{ij}X_j^n,\, \mathsf{snr}\Big) - I\Big(\sum_{j\ne i} h_{ij}X_j^n,\, \mathsf{snr}\Big) \Big\}. \qquad (39) \]
Our main result is the single-letterization of (39). Note that if the limit over $\mathsf{snr}$ were inside the supremum, then the desired result (32) would follow immediately in view of the high-SNR scaling of mutual information in Theorem 2. Indeed, employing i.i.d. input distributions yields (32). Thus the main difficulty in the converse proof lies in exchanging the supremum with the limits in (39), which amounts to proving that varying the input distribution with increasing SNR does not improve the degrees of freedom. To this end, we need a non-asymptotic version of Theorem 2, given in Section V-A.
^10 The second limit in (39) can be replaced by a supremum over $n \in \mathbb{N}$.
In the rest of the section, we assume that the cross-channel gains are algebraic, and we discuss some of the benefits of the information dimension characterization in Theorem 4.
B. Some simple special cases
As we illustrate next, the degrees of freedom of various channels can be obtained by specializing Theorem 4.
• Two-user IC:
\[ \mathsf{DoF}\begin{bmatrix} a & b \\ c & d \end{bmatrix} = \begin{cases} 0, & a = d = 0, \\ 2, & a \ne 0,\ d \ne 0,\ b = c = 0, \\ 1, & \text{otherwise}. \end{cases} \qquad (40) \]
• Many-to-one IC [23]: if $h_{ii} \ne 0$ for all $i$ and $h_{1i} \ne 0$ for some $i \ne 1$, then
\[ \mathsf{DoF}\begin{bmatrix} h_{11} & h_{12} & h_{13} & \cdots & h_{1K} \\ 0 & h_{22} & 0 & \cdots & 0 \\ \vdots & 0 & \ddots & \ddots & \vdots \\ \vdots & \vdots & \ddots & \ddots & 0 \\ 0 & \cdots & 0 & 0 & h_{KK} \end{bmatrix} = K - 1. \qquad (41) \]
To see this, assuming $h_{12} \ne 0$, we have
\[ \mathsf{dof}(X^K,\mathbf{H}) = d\Big(\sum_j h_{1j}X_j\Big) - d\Big(\sum_{j\ne 1} h_{1j}X_j\Big) + \sum_{j\ne 1} d(X_j) \]
\[ \le d\Big(\sum_j h_{1j}X_j\Big) + \sum_{j=3}^K d(X_j) \qquad (42) \]
\[ \le K - 1, \qquad (43) \]
where (42) is due to (15). The upper bound $K-1$ is attained by choosing $X_2,\ldots,X_K$ to be absolutely continuous.
• One-to-many IC [23]: if $h_{ii} \ne 0$ and $h_{i1} \ne 0$ for all $i$, then
\[ \mathsf{DoF}\begin{bmatrix} h_{11} & 0 & \cdots & 0 & 0 \\ h_{21} & h_{22} & \cdots & 0 & 0 \\ \vdots & \vdots & \ddots & & \vdots \\ h_{K1} & 0 & \cdots & 0 & h_{KK} \end{bmatrix} = K - 1. \qquad (44) \]
To verify (44), note that
\[ \mathsf{dof}(X^K,\mathbf{H}) = \sum_{j\ne 1} d(h_{j1}X_1 + h_{jj}X_j) - (K-2)\,d(X_1) \qquad (45) \]
\[ \le \sum_{j\ne 1} d(h_{j1}X_1 + h_{jj}X_j) \qquad (46) \]
\[ \le K - 1, \qquad (47) \]
attained by choosing $X_1$ discrete and the rest absolutely continuous.
• Multiple-access channel (MAC): if $\mathbf{H}$ is the all-one matrix, then $\mathsf{DoF}(\mathbf{H}) = 1$, because
\[ \mathsf{dof}(X^K,\mathbf{H}) = K\, d\Big(\sum_{j=1}^K X_j\Big) - \sum_{i=1}^K d\Big(\sum_{j\ne i} X_j\Big) \qquad (48) \]
\[ \le d\Big(\sum_{j=1}^K X_j\Big) \qquad (49) \]
\[ \le 1, \qquad (50) \]
where we have used the following additive-combinatorial result [24, p. 3]:
\[ (K-1)\, H\Big(\sum_{i=1}^K U_i\Big) \le \sum_{i=1}^K H\Big(\sum_{j\ne i} U_j\Big) \qquad (51) \]
with $\{U_i\}$ taking values on an arbitrary group. In view of Lemma 9, setting $U_j = \langle X_j\rangle_m$ in (51) and sending $m \to \infty$ yields (49). More generally, $\mathsf{DoF}(\mathbf{H}) = 1$ if $\operatorname{rank}(\mathbf{H}) = 1$. Another way to see that $\mathsf{DoF}(\mathbf{H}) = 1$ is to invoke the last property of $\mathsf{DoF}(\cdot)$ in Section III-D: denote by $\mathbf{H}'$ the upper-triangular all-one matrix. Then $\mathsf{DoF}(\mathbf{H}) \le \mathsf{DoF}(\mathbf{H}') = \sup_{X^K} d\big(\sum_j X_j\big) \le 1$.
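The sumset entropy inequality (51) can be spot-checked for integer-valued variables by exact pmf convolution; the following is only a numerical illustration (helper names ours), not the proof in [24]:

```python
import math
import random
from collections import defaultdict

def convolve(p, q):
    """pmf of U + V for independent integer-valued U ~ p, V ~ q (dicts: value -> prob)."""
    out = defaultdict(float)
    for u, pu in p.items():
        for v, qv in q.items():
            out[u + v] += pu * qv
    return dict(out)

def entropy(p):
    """Shannon entropy in bits of a pmf given as a dict."""
    return -sum(x * math.log2(x) for x in p.values() if x > 0)

def random_pmf(support, rng):
    w = [rng.random() for _ in support]
    return {v: x / sum(w) for v, x in zip(support, w)}

rng = random.Random(1)
K = 3
U = [random_pmf(range(4), rng) for _ in range(K)]

full = U[0]
for p in U[1:]:
    full = convolve(full, p)
lhs = (K - 1) * entropy(full)            # (K-1) * H(U_1 + ... + U_K)

rhs = 0.0
for i in range(K):
    partial = {0: 1.0}
    for j in range(K):
        if j != i:
            partial = convolve(partial, U[j])
    rhs += entropy(partial)              # sum_i H(sum_{j != i} U_j)
```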
C. Suboptimality of discrete-continuous mixtures
Mixed discrete-continuous input distributions are strictly suboptimal in general. In fact, they achieve at most one degree of freedom in the fully-connected case. To see this, denote for brevity $d(X_i) = \rho_i$ and assume that $\mathbf{H}$ is fully-connected. For each $i$, $\sum_{j=1}^K h_{ij}X_j$ has a discrete-continuous distribution, where the weight of the discrete component is equal to $\prod_{j=1}^K (1-\rho_j)$. Then
\[ \mathsf{dof}(X^K,\mathbf{H}) = \sum_{i=1}^K \Big[\Big(1 - \prod_{j=1}^K (1-\rho_j)\Big) - \Big(1 - \prod_{j\ne i} (1-\rho_j)\Big)\Big] = \sum_{i=1}^K \rho_i \prod_{j\ne i} (1-\rho_j) \qquad (52) \]
\[ \le 1. \qquad (53) \]
Therefore, to obtain more than one degree of freedom it is necessary to employ input distributions with singular components.
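The right side of (52) has a probabilistic reading: marking each user $i$ independently as "continuous" with probability $\rho_i$, it is precisely the probability that exactly one user is marked, which cannot exceed one. A quick check (helper names ours):

```python
import math
from itertools import product

def dof_mixture(rhos):
    """Right side of (52): sum_i rho_i * prod_{j != i} (1 - rho_j)."""
    K = len(rhos)
    return sum(rhos[i] * math.prod(1 - rhos[j] for j in range(K) if j != i)
               for i in range(K))

def prob_exactly_one(rhos):
    """P(exactly one of the independent Bernoulli(rho_i) equals 1), by enumeration."""
    return sum(math.prod(r if b else 1 - r for r, b in zip(rhos, bits))
               for bits in product([0, 1], repeat=len(rhos)) if sum(bits) == 1)

rhos = [0.3, 0.5, 0.8]
val = dof_mixture(rhos)   # approximately 0.38 for these weights, consistent with (53)
```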
As we will show later, singular distributions of information dimension one half are crucial in achieving the maximal degrees of freedom. In Section IV-C we give a family of such distributions $\{\mu_N\}_{N\ge 2}$ which are self-similar [16] and DoF-achieving in many cases. In fact, $\mu_N$ can be understood as the distribution of a random variable whose $N$-ary expansion has equiprobable even digits and zero odd digits. Then $d(\mu_N) = \frac{1}{2}$ in view of (9). Such a distribution is neither discrete nor continuous.
D. Properties of DoF(H)
The following are immediate consequences of Theorem 4 combined with the elementary properties of information dimension in Lemma 1:
• $\mathsf{DoF}(\mathbf{H})$ is invariant under row or column scaling [5, Lemma 1], in view of (14). To be more precise, for any $X^K$, $\mathsf{dof}(X^K,\mathbf{H})$ is invariant under row scaling by (14), but not under column scaling. However, by replacing each $X_i$ with its scaled version, $\sup_{X^K} \mathsf{dof}(X^K,\mathbf{H})$ is invariant under column scaling.
• $\mathsf{DoF}(\mathbf{H}) \le K$, with equality if and only if $\mathbf{H}$ is a diagonal matrix with no diagonal entry equal to zero. To see this, note that $\mathsf{dof}(X^K,\mathbf{H}) = K$ if and only if
\[ d\Big(\sum_{j=1}^K h_{ij}X_j\Big) = 1, \qquad (54) \]
\[ d\Big(\sum_{j\ne i} h_{ij}X_j\Big) = 0 \qquad (55) \]
for all $i$. On the one hand, (54)–(55) imply $h_{ii} \ne 0$. On the other hand, since
\[ d(X_i) \ge d\Big(\sum_j h_{ij}X_j\Big) - d\Big(\sum_{j\ne i} h_{ij}X_j\Big), \qquad (56) \]
we have $d(X_i) = 1$ for all $i$, and (55) implies $h_{ij} = 0$ for all $i \ne j$.
• Let $\operatorname{diag}(\mathbf{H}) = [h_{11},\ldots,h_{KK}]$. Then $\mathsf{DoF}(\mathbf{H}) \ge 0$, with equality if and only if $\operatorname{diag}(\mathbf{H}) = \mathbf{0}$. If $\operatorname{diag}(\mathbf{H}) \ne \mathbf{0}$, then $\mathsf{DoF}(\mathbf{H}) \ge 1$, which can be achieved by choosing one of the inputs to be absolutely continuous (unit dimension) and the rest discrete (zero dimension).
• Removing cross-links increases the degrees of freedom: let $\mathbf{H}'$ be obtained from $\mathbf{H}$ by setting some of the off-diagonal entries to zero. By (17), for any independent $X^K$, $\mathsf{dof}(X^K,\mathbf{H}) \le \mathsf{dof}(X^K,\mathbf{H}')$. Therefore $\mathsf{DoF}(\mathbf{H}) \le \mathsf{DoF}(\mathbf{H}')$. This can also be seen via a genie-aided converse argument, by providing the interfering messages to the relevant receivers.
E. Bounds and exact expressions
Next we prove that the number of degrees of freedom is upper bounded by $\frac{K}{2}$ under sufficient conditions more general than the fully-connected assumption in [2].
Theorem 5. Suppose that $\mathbf{H}$ satisfies the assumption in Theorem 4. Let $\pi$ be a fixed-point-free permutation of $\{1,\ldots,K\}$, i.e., $\pi(i) \ne i$ for all $i$. If $h_{\pi(i),i} \ne 0$ for each $i$, then
\[ \mathsf{DoF}(\mathbf{H}) \le \frac{K}{2}. \qquad (57) \]
Moreover, if $K$ is odd and $\pi$ is cyclic, i.e., there exist distinct elements $\{n_1,\ldots,n_K\}$ of $\{1,\ldots,K\}$ such that $\pi(n_i) = n_{i+1}$ for $i = 1,\ldots,K-1$ and $\pi(n_K) = n_1$, then $\mathsf{dof}(X^K,\mathbf{H}) = \frac{K}{2}$ if and only if for each $i$,
\[ d(X_i) = \frac{1}{2}, \qquad (58) \]
\[ d\Big(\sum_{j\ne i} h_{ij}X_j\Big) = \frac{1}{2}, \qquad (59) \]
\[ d\Big(\sum_{j=1}^K h_{ij}X_j\Big) = 1. \qquad (60) \]
Proof. Section V-C.
Remark 3. In the second part of Theorem 5, the assumption that $\pi$ is cyclic is not superfluous. To see this, consider $K = 5$, $\pi(1) = 2$, $\pi(2) = 1$, $\pi(3) = 4$, $\pi(4) = 5$, $\pi(5) = 3$ and $h_{ij} = 0$ except for $h_{ii}$ and $h_{\pi(i),i}$. Such a channel consists of a two-user IC decoupled from a three-user IC. Therefore, as the sum of the respective DoFs, $\mathsf{DoF}(\mathbf{H}) = 5/2$, which can be achieved by choosing $X_1$ to be absolutely continuous, $X_2$ discrete, and $X_3, X_4, X_5$ the corresponding DoF-achieving distributions for the three-user IC. Clearly in this case (58) is not necessary. The same also applies to the case of an even number of users.
Next we give various sufficient conditions on $\mathbf{H}$ that guarantee $\mathsf{DoF}(\mathbf{H}) = \frac{K}{2}$.
Theorem 6 ([3]). $\mathsf{DoF}(\mathbf{H}) = \frac{K}{2}$ for Lebesgue-a.e. $\mathbf{H}$.
Proof. Section V-D.
Recently, Stotz and Bölcskei [25], following up on an earlier version of this paper, obtained deterministic sufficient conditions on $\mathbf{H}$ that guarantee $\mathsf{DoF}(\mathbf{H}) = \frac{K}{2}$, which imply Theorem 6 as a special case. Using the achievability bound (32) via information dimension and the self-similar input distribution in Section IV, the proof in [26] invokes a recent result by Hochman [27] in fractal geometry, in lieu of the Diophantine approximation results used in the present paper as well as in [5], [3]. In particular, using results from [27] it is shown in [25, Theorem 2] that for the self-similar construction in (73), the information dimension formula (81) holds for almost every contraction ratio $r$ without requiring the open set condition, which can be difficult to verify. This shortcut offers simpler proofs of achievability results such as the following (see [25, Section VIII]):
Theorem 7. If the off-diagonal entries of $\mathbf{H}$ are rational and the diagonal entries are irrational and non-zero, then $\mathsf{DoF}(\mathbf{H}) = \frac{K}{2}$.
Theorem 7 generalizes the following known results, previously obtained using different Diophantine approximation results:
• [5, Theorem 1], which relies on the Thue–Siegel–Roth theorem [28] and requires the diagonal entries to be irrational algebraic numbers. Note that the set of real algebraic numbers is dense but countable; therefore, almost every real number is transcendental.
• [6, Theorem 1(2)], which assumes that the diagonal and off-diagonal entries are equal to one and $h$ respectively, with $h$ irrational. Upon scaling, this is equivalent to a channel matrix with unit off-diagonal entries and irrational diagonal entries $h^{-1}$.
Remark 4. Theorem 7 requires all diagonal entries to be irrational. Otherwise, $\frac{K}{2}$ degrees of freedom might be unattainable. For instance, consider the $3\times 3$ channel matrix $\mathbf{H}_\lambda$ in (65). Since DoF is invariant under row or column scaling, $\mathsf{DoF}(\mathbf{H}_\lambda)$ does not depend on $h_{11}$ or $h_{33}$ as long as they are both non-zero. Then Theorem 11 implies that $\mathsf{DoF}(\mathbf{H}_\lambda) < \frac{3}{2}$ if $h_{22}$ is rational, even if $h_{11}$ and $h_{33}$ are both irrational.
For $\mathbf{H}$ with non-zero rational coefficients it is known that $\mathsf{DoF}(\mathbf{H}) < \frac{K}{2}$ [5]. Next we give a more general sufficient condition for the strict inequality in the three-user case.
Theorem 8. Let $K = 3$ and let the off-diagonals of $\mathbf{H}$ be algebraic. If there exist distinct $i, j, k$ such that $h_{ij}$, $h_{ii}$, $h_{kj}$ and $h_{ki}$ are non-zero rationals, then $\mathsf{DoF}(\mathbf{H}) < \frac{3}{2}$.
Proof. Section V-E.
Examples of strict improvement over the bound $\mathsf{DoF}(\mathbf{H}) < \frac{3}{2}$ for certain $\mathbf{H}$ can be found in Section III-G. The following example, which is proved in Appendix F, illustrates that the condition in Theorem 8 is not superfluous:
\[ \mathsf{DoF}\begin{bmatrix} 1 & 2 & 0 \\ 0 & 1 & 2 \\ 2 & 0 & 1 \end{bmatrix} = \frac{3}{2}. \qquad (61) \]
F. Degrees of freedom region
The degrees of freedom region of the interference channel (1) is defined as the high-SNR limit of normalized capacity regions:
\[ \mathsf{DoF}(\mathbf{H}) \triangleq \Big\{ r^K : \lim_{\mathsf{snr}\to\infty} \rho\Big(r^K,\, \frac{\mathcal{C}(\mathbf{H},\mathsf{snr})}{\frac{1}{2}\log\mathsf{snr}}\Big) = 0 \Big\}, \qquad (62) \]
where $\rho(x, E) \triangleq \inf_{y\in E} \|x - y\|_2$ denotes the distance between the point $x$ and the set $E$ belonging to an arbitrary Euclidean space.
The degrees of freedom region exists and is characterized by the following result, whose proof is almost identical to that of Theorem 4:
Theorem 9. Suppose that $\mathbf{H}$ satisfies the assumption in Theorem 4. Then $\mathsf{DoF}(\mathbf{H})$ is the collection of all $r^K \in [0,1]^K$ such that for every probability vector $w^K$,
\[ \langle r^K, w^K\rangle \le \sup_{X^K} \sum_{i=1}^K w_i\, d\Big(\sum_{j=1}^K h_{ij}X_j\Big) - w_i\, d\Big(\sum_{j\ne i} h_{ij}X_j\Big), \qquad (63) \]
where the supremum is over independent $X_1,\ldots,X_K$ such that (33) is satisfied and all information dimensions appearing in (63) exist.
Analogous to Theorems 5 and 6, the following result shows that the degrees of freedom region coincides with a polyhedron for almost every channel matrix. See Fig. 1 for an illustration in the three-user case.
Theorem 10. Let $\mathbf{H}$ satisfy the assumption in Theorem 5. Then
\[ \mathsf{DoF}(\mathbf{H}) \subset \operatorname{co}\big(e_1, \ldots, e_K,\, \tfrac{1}{2}\mathbf{1},\, \mathbf{0}\big), \qquad (64) \]
where $\{e_1,\ldots,e_K\}$ is the standard basis and $\operatorname{co}$ denotes the convex hull. Moreover, (64) holds with equality for Lebesgue-a.e. $\mathbf{H}$.
Proof. Section V-C.
Fig. 1. DoF region in the three-user case: outer bound (64).
G. An example of lower-triangular channel matrix
In this section we consider a prototypical example of the following lower-triangular channel matrix [5, Section V], which captures the essential difficulty of the three-user interference channel:
\[ \mathbf{H}_\lambda \triangleq \begin{bmatrix} 1 & 0 & 0 \\ 1 & \lambda & 0 \\ 1 & 1 & 1 \end{bmatrix}. \qquad (65) \]
Since the off-diagonals of $\mathbf{H}_\lambda$ are integers, equality holds in (32), where the maximization can be further simplified as follows:
Theorem 11.
\[ \mathsf{DoF}(\mathbf{H}_\lambda) = 1 + \sup_{X_1, X_2}\, d(X_1 + \lambda X_2) - d(X_1 + X_2). \qquad (66) \]
Moreover,
1) $\mathsf{DoF}(\mathbf{H}_\lambda) = \mathsf{DoF}(\mathbf{H}_\lambda^{\mathsf{T}})$.
2) For all $\lambda \ne 0$,
\[ \mathsf{DoF}(\mathbf{H}_\lambda) = \mathsf{DoF}(\mathbf{H}_{\lambda^{-1}}). \qquad (67) \]
3) $\mathsf{DoF}(\mathbf{H}_\lambda) \ge 1$, with equality if and only if $\lambda = 0$ or $1$.
4) $\mathsf{DoF}(\mathbf{H}_\lambda) \le \frac{3}{2}$, with equality if and only if $\lambda$ is irrational.
5) For integers $\lambda = 2, 3, 4, \ldots$,
\[ \frac{3}{2} - \frac{1}{\lambda\log\lambda} \sum_{i=1}^{\lambda-1} \frac{i}{\lambda}\log\frac{\lambda}{i} \le \mathsf{DoF}(\mathbf{H}_\lambda) \qquad (68) \]
\[ \le \frac{3}{2} - \frac{1}{56\log\lambda + 16}. \qquad (69) \]
Therefore, as $\lambda \to \infty$ on $\mathbb{N}$,
\[ \mathsf{DoF}(\mathbf{H}_\lambda) = \frac{3}{2} - \Theta\Big(\frac{1}{\log\lambda}\Big). \qquad (70) \]
For $\lambda = 2$, (68)–(69) can be sharpened to
\[ 1.27 \approx 1 + \log_6\varphi \le \mathsf{DoF}(\mathbf{H}_2) \le \frac{10}{7}, \qquad (71) \]
where $\varphi = \frac{1+\sqrt{5}}{2}$ denotes the golden ratio, while for $\lambda = -1$ we obtain
\[ 1.10 \approx 1 + \frac{25\log 5 - 13\log 13 - 8}{18\log 3} \le \mathsf{DoF}(\mathbf{H}_{-1}) \le \frac{7}{5}. \qquad (72) \]
Proof. Section V-F.
Remark 5. Note that (66) deals with the maximal information dimension difference between projections of a product measure onto the line of angle $\frac{\pi}{4}$ and the line of an arbitrary angle $\tan^{-1}(\lambda)$. In view of this interpretation and the fact that $\tan^{-1}(\lambda^{-1}) = \frac{\pi}{2} - \tan^{-1}(\lambda)$, (67) becomes apparent. Since $\lambda$ is the channel gain of the direct link of the second user, one might expect at first glance that $\mathsf{DoF}(\mathbf{H}_\lambda)$ should be increasing in $|\lambda|$. However, (67) shows that this is not the case.
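For concreteness, the lower bound (68) is easy to evaluate (binary logarithms, per footnote 1); the sketch below shows it increasing toward $\frac{3}{2}$ as the integer $\lambda$ grows, in accordance with (70):

```python
import math

def dof_lower_bound(lam):
    """Lower bound (68) on DoF(H_lambda) for an integer lambda >= 2."""
    s = sum(i / lam * math.log2(lam / i) for i in range(1, lam))
    return 1.5 - s / (lam * math.log2(lam))

bounds = [dof_lower_bound(l) for l in (2, 4, 16, 256)]
# dof_lower_bound(2) = 1.5 - (1/2)*(1/2) = 1.25
```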
IV. SELF-SIMILAR DISTRIBUTIONS
In this section we discuss self-similar distributions, an important subclass of singular distributions, which play the central role of DoF-achieving input distributions in the proofs given in Section V. As shown in Theorem 5, input distributions with information dimension one half are necessary for achieving the maximal degrees of freedom $\frac{K}{2}$. Although, in view of (20), equal mixtures of discrete and absolutely continuous components also have information dimension one half, they achieve at most one degree of freedom, as proved in Section III-C. In order to overcome this bottleneck, it is necessary to employ singular input distributions.
A. Definitions
In general, a self-similar distribution is defined as the invariant measure of an iterated function system; see, e.g., [29], [30]. For our purposes in this paper we focus on the special case of homogeneous self-similar distributions, defined as follows:
Definition 3. A probability measure $\mu$ is called homogeneous self-similar with similarity ratio $r \in (0,1)$ if $\mu$ is the distribution of
\[ X = \sum_{k\ge 1} W_k\, r^{k-1}, \qquad (73) \]
where $\{W_k\}$ are i.i.d. copies of a real-valued nondeterministic random variable $W$ taking a finite number of values.
The distribution of $X$ is self-similar in view of the fact that
\[ X \stackrel{(d)}{=} rX + W, \qquad (74) \]
where $W$ is independent of $X$. It is easy to see that (73) and (74) are equivalent. When $W_k$ is binary, (73) is known as a Bernoulli convolution, an important object in fractal geometry and ergodic theory [31]. As a special case, (73) encompasses distributions defined by $M$-ary expansions with independent digits, in which case $r = \frac{1}{M}$ and $W$ is $\{0,\ldots,M-1\}$-valued. For example, the Cantor distribution [32], supported on the middle-third Cantor set, can be defined by (73) via its ternary expansion, which consists of independent digits taking values 0 or 2 equiprobably.
B. Information dimension
It is shown in [13] that the information dimension (or dimension of order $\alpha$) of self-similar measures always exists.^11 More general results on the information dimension of self-similar distributions are summarized in [18, Section 2.8].
Theorem 12 ([13, Theorem 1.1 and Lemma 2.2]). Let the distribution of $X$ be homogeneous self-similar. Then $d(X)$ exists and satisfies
\[ d(X) = \sup_{m\ge 1} \frac{H([X]_m)}{m}, \qquad (75) \]
where
\[ [x]_m \triangleq \langle x\rangle_{2^m}. \qquad (76) \]
Moreover, $d_\alpha(X)$ exists for all $\alpha \ge 0$.
In spite of the general existence result in [13], there is no known formula for the information dimension of self-similar distributions unless certain separation conditions are satisfied, the most common of which is the open set condition [16, p. 129]: there exists a nonempty bounded open set $U \subset \mathbb{R}$ such that $\bigcup_j F_j(U) \subset U$ and $F_i(U) \cap F_j(U) = \emptyset$ for $i \ne j$, where we have denoted $F_j(x) = rx + w_j$ for $j = 1,\ldots,m$. The following lemma (proved in Appendix B) gives a sufficient condition for a homogeneous self-similar distribution (or random variables of the form (73)) to satisfy the open set condition.
Lemma 4. Let X be defined in (73) with 0 < r < 1 and W ≜ {w_1, ..., w_m} ⊂ R. Then the open set condition is satisfied if

r ≤ m(W)/(m(W) + M(W)), (77)

where the minimum distance and the diameter of W are defined by

m(W) ≜ min_{w_i ≠ w_j ∈ W} |w_i − w_j| (78)

and

M(W) ≜ max_{w_i, w_j ∈ W} |w_i − w_j|, (79)

respectively.
The next result gives an upper bound on the information dimension of a self-similar homogeneous measure µ, which holds with equality if the open set condition is satisfied.
Theorem 13. For a homogeneous self-similar measure µ with similarity ratio r as defined in Definition 3,

d_α(µ) ≤ H_α(W)/log(1/r), α ≥ 0. (80)

Moreover, if µ satisfies the open set condition, then

d_α(µ) = H_α(W)/log(1/r), α ≥ 0. (81)

Proof. See Appendix C.
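As a numerical sanity check of (81) (ours, not the paper's), take r = 1/4 and W uniform on {0, 3/4}: the open set condition holds by Lemma 4 (m(W) = M(W) = 3/4, so (77) only requires r ≤ 1/2), and H(W)/log(1/r) = 1/2. The binary expansion of X then consists of i.i.d. bit pairs "00"/"11", so a plug-in estimate of H([X]_m)/m from samples recovers 1/2:

```python
import numpy as np

rng = np.random.default_rng(1)

# X = sum_{k>=1} W_k r^(k-1) with r = 1/4 and W uniform on {0, 3/4}.
r, depth, n = 0.25, 30, 200_000
w = rng.choice([0.0, 0.75], size=(n, depth))
x = w @ (r ** np.arange(depth))

m = 12
bins = np.floor(x * 2**m).astype(np.int64)   # dyadic quantization [X]_m of (76)
_, counts = np.unique(bins, return_counts=True)
p = counts / n
est = -(p * np.log2(p)).sum() / m            # plug-in estimate of H([X]_m)/m
assert abs(est - 0.5) < 0.02                 # matches H(W)/log(1/r) = 1/2
```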
11The results in [13] are considerably more general, encompassing
C. Self-similar distributions with half dimension
In this subsection, we construct a family of scalar self-similar distributions which play an important role in the achievability proofs of Theorems 7 and 11. Let N ≥ 2 be an integer. Let µ_N be the distribution of (73) with r = 1/N², w_j = N(j−1) and p_j = 1/N, j = 1, ..., N. Then µ_N is homogeneous self-similar and satisfies the open set condition, in view of Lemma 4. By Theorem 13, d(µ_N) = 1/2 for all N.
The DoF-achieving distributions in Theorems 7 and 11 are slight variants of µN.
Alternatively, µ_N can be defined via N-ary expansions as follows: let X be distributed according to µ_N. Then the odd digits in the N-ary expansion of X are independent and equiprobable, while the even digits are set to zero. In view of (9), the information dimension of µ_N is the normalized entropy rate of the digits, which is equal to 1/2.
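The parameters of µ_N can be checked directly against Lemma 4 and Theorem 13 (a sketch of ours, not from the paper): m(W) = N, M(W) = N(N−1), so (77) holds since 1/N² ≤ 1/N, and d(µ_N) = log N / log N² = 1/2 for every N:

```python
import math

for N in range(2, 8):
    r = 1.0 / N**2                                 # similarity ratio of mu_N
    W = [N * (j - 1) for j in range(1, N + 1)]     # w_j = N(j-1), p_j = 1/N
    gaps = [abs(a - b) for a in W for b in W if a != b]
    mW, MW = min(gaps), max(gaps)                  # m(W) = N, M(W) = N(N-1)
    assert r <= mW / (mW + MW)                     # open set condition via (77)
    d = math.log(N) / math.log(1.0 / r)            # Theorem 13: H(W)/log(1/r)
    assert abs(d - 0.5) < 1e-12                    # d(mu_N) = 1/2
print("d(mu_N) = 1/2 for N = 2,...,7")
```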
D. Convolutions of self-similar measures
Formula (32) for DoF(H) deals with convolutions of input distributions. It is therefore of interest to investigate the behavior of self-similar measures under convolutions. The next lemma shows that convolutions of homogeneous self-similar measures with a common similarity ratio are also self-similar (see also [33, p. 1341] and the references therein), but with possible overlaps which might violate the open set condition. Therefore (81) does not necessarily apply to convolutions.
Lemma 5. Let X and X′ be independent with homogeneous self-similar distributions of identical similarity ratio r. Then for any a, a′ ∈ R, the distribution of aX + a′X′ is also homogeneous self-similar with similarity ratio r.

Proof. According to (74), X and X′ satisfy X (d)= rX + W and X′ (d)= rX′ + W′, respectively, where W and W′ are both finitely valued and {X, W, X′, W′} are independent. Therefore aX + a′X′ (d)= r(aX + a′X′) + aW + a′W′, which implies the desired conclusion.

V. PROOFS
A. Auxiliary results
Before proceeding to the proof of Theorem 4, we present the following non-asymptotic bounds on mutual information, which, in view of the alternative definition of information dimension in (10), imply the high-SNR scaling law of mutual information in Theorem 2 as an immediate corollary. This argument is considerably simpler and more general than the original result in [17]. It also allows us to conclude that the capacity region of the interference channel (1) with non-Gaussian noise can differ only by absolute additive constants from its Gaussian counterpart, provided the noise has a well-behaved density such as the uniform or Laplace distribution (see Remark 6).
Lemma 6. Let B(x^n, ε) denote the ℓ∞-ball of radius ε centered at x^n. For any X^n and any snr > 0,

−8 ≤ (1/n) I(X^n, snr) − (1/n) E[ log 1/P_{X^n}(B(X^n, 1/√snr)) ] (82)
≤ 2. (83)

Proof. Appendix D.
To gain some insight on Lemma 6, we note that the bracketed term in (82) is close to the mutual information between X and itself contaminated by uniformly distributed noise. Consider the scalar case and let U be uniformly distributed on [−1/2, 1/2], independent of X. Then the density of Y = X + εU is given by p_Y(y) = (1/ε) P_X(B(y, ε/2)), and it is easy to verify that I(X;Y) = E[ log 1/P_X(B(Y, ε/2)) ], which is provably close to E[ log 1/P_X(B(X, ε)) ]. The desired result for Gaussian noise follows from the fact that the Gaussian density can be approximated by piecewise constant functions, incurring at most an additive constant difference in the mutual information.

Next we present several auxiliary results regarding the Shannon and Rényi entropy of quantized random variables. The first result states that if X is close to Y and the alphabet of X is well-separated, then the entropy of X is also close to that of Y.
Lemma 7. Let X and Y be real-valued discrete random variables such that |X − Y| ≤ ε almost surely. Suppose X takes values in a set X with minimum distance m(X) ≥ δ. Then

H(X) − H(Y) ≤ log(1 + ⌊2ε/δ⌋). (84)

In particular, if 2ε < δ, then H(X) ≤ H(Y). Moreover, (84) also holds for the Rényi entropy H_α for all α ≥ 0.

Proof. Inequality (84) follows from

H(X) − H(Y) ≤ H(X, Y) − H(Y) (85)
= H(X|Y) = H(X − Y | Y) (86)
≤ log(1 + ⌊2ε/δ⌋), (87)

since conditioned on Y the random variable X takes at most 1 + ⌊2ε/δ⌋ values. However, (86) in the above argument does not directly generalize to Rényi entropy due to the lack of a conditioning property for Rényi entropy. Instead, (84) can be deduced from the following: first note that by Jensen's inequality, if a distribution Q has support cardinality m, then Σ_u Q^α(u) ≤ m^{1−α} (resp. ≥ m^{1−α}) for α < 1 (resp. α > 1). Therefore,

H_α(X) ≤ H_α(X, Y) (88)
= (1/(1−α)) log Σ_y P_Y^α(y) Σ_x P_{X|Y=y}^α(x) (89)
≤ H_α(Y) + log(1 + ⌊2ε/δ⌋). (90)
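A quick exact check of (84) (our illustration, with arbitrary parameters): take X uniform on {0, ..., 9}, so δ = 1, and Y = 2⌊X/2⌋, so |X − Y| ≤ 1 = ε; then H(X) − H(Y) = log 10 − log 5 = 1 bit, within the bound log(1 + ⌊2ε/δ⌋) = log 3:

```python
import math
from collections import Counter

def entropy_bits(pmf):
    return -sum(p * math.log2(p) for p in pmf.values() if p > 0)

delta, eps = 1.0, 1.0
pX = {x: 0.1 for x in range(10)}        # X uniform on {0,...,9}, m(X) = delta = 1
pY = Counter()
for x, p in pX.items():
    pY[2 * (x // 2)] += p               # Y = 2*floor(X/2), |X - Y| <= 1 = eps
bound = math.log2(1 + math.floor(2 * eps / delta))     # right side of (84)
gap = entropy_bits(pX) - entropy_bits(pY)
assert gap <= bound + 1e-12             # 1.0 <= log2(3) = 1.585...
```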
The following lemma admits the same proof as Lemma 7.

Lemma 8. Let U^n and V^n be integer-valued random vectors. If −B_i ≤ U_i − V_i ≤ A_i almost surely, then

|H(U^n) − H(V^n)| ≤ Σ_{i=1}^n log(1 + A_i + B_i), (91)

which also holds for H_α for all α ≥ 0.
The next lemma shows that the entropy of quantized linear combinations is close to the entropy of linear combinations of quantized random variables, as long as the coefficients are integers. The proof is a simple application of Lemma 8.
Lemma 9. For any a^K ∈ Z^K and any X_1^n, ..., X_K^n,

| H( Σ_{j=1}^K a_j [X_j^n]_m ) − H( [ Σ_{j=1}^K a_j X_j^n ]_m ) | ≤ n log( 2 + 2 Σ_{j=1}^K |a_j| ), (92)

where [·]_m is defined in (76).

Lemma 10. For any h^K ∈ R^K and any X_1^n, ..., X_K^n,

H( [ Σ_{j=1}^K h_j X_j^n ]_m ) ≤ H( [ Σ_{j=1}^K h_j ⟨X_j^n⟩ ]_m ) + Σ_{j=1}^K H(⌊X_j^n⌋) + 1, (93)

where ⟨X_j^n⟩ ≜ X_j^n − ⌊X_j^n⌋ is the fractional part of X_j^n.

Proof. Define

V = 2^m Σ_{j=1}^K h_j ⟨X_j^n⟩, Z = 2^m Σ_{j=1}^K h_j ⌊X_j^n⌋. (94)

Then

H(⌊V + Z⌋) − H(⌊V⌋) ≤ H(⌊V + Z⌋ | ⌊V⌋) (95)
≤ 1 + H(⌊⌊V⌋ + Z⌋ | ⌊V⌋) (96)
≤ 1 + H(Z) (97)
≤ 1 + Σ_{j=1}^K H(⌊X_j^n⌋). (98)
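Inequality (92) can be verified exactly for small alphabets by enumeration (an illustration of ours; the supports and coefficients below are arbitrary):

```python
import itertools, math
from collections import Counter
from fractions import Fraction

def H(counter):
    tot = sum(counter.values())
    return -sum((v / tot) * math.log2(v / tot) for v in counter.values())

def quant(x, m):                     # [x]_m = floor(2^m x) / 2^m, per (76)
    return math.floor(x * 2**m) / 2.0**m

m, a = 4, [3, -2]                    # n = 1 letter, K = 2 users, integer coefficients
supports = [[Fraction(k, 7) for k in range(7)],
            [Fraction(k, 5) for k in range(5)]]
lhs, rhs = Counter(), Counter()
for x1, x2 in itertools.product(*supports):          # independent uniform inputs
    lhs[a[0] * quant(x1, m) + a[1] * quant(x2, m)] += 1
    rhs[quant(a[0] * x1 + a[1] * x2, m)] += 1
bound = math.log2(2 + 2 * sum(abs(c) for c in a))    # right side of (92), n = 1
assert abs(H(lhs) - H(rhs)) <= bound
```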
We state a Diophantine approximation result on manifolds due to Kleinbock and Margulis [34, Theorem A], which is useful for proving Theorems 4 and 6. The version below is stated in terms of the Diophantine approximation exponent introduced by Mahler [35], which is defined as follows:

Definition 4. Let m, n ∈ N. For x ∈ R^n, let

⟨x⟩ ≜ min_{z ∈ Z^n} ‖x − z‖∞ (99)

denote the ℓ∞-distance of x to the nearest integer vector. For A ∈ R^{n×m}, define ŵ(A) to be the supremum of w > 0 such that for all sufficiently large real numbers Q > 0, the set

{ q ∈ Z^m \ {0} : ⟨Aq⟩ ≤ Q^{−w}, ‖q‖∞ ≤ Q } (100)

is non-empty.
For any matrix A ∈ R^{n×m}, the Dirichlet approximation theorem [36, Theorem VI, p. 13] states that

ŵ(A) ≥ m/n, (101)

which holds with equality for Lebesgue-a.e. A ∈ R^{n×m}, due to the Borel–Cantelli lemma [36, Chapter VII]. The Kleinbock–Margulis theorem generalizes this result to manifolds.
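In the scalar case m = n = 1, Dirichlet's theorem says that for every Q there is some integer 1 ≤ q ≤ Q with ⟨qα⟩ ≤ 1/Q; this is easy to confirm by brute force (our sketch, using α = √2 as an arbitrary irrational):

```python
import math

def dist_to_Z(x):                 # <x>: distance of x to the nearest integer
    return abs(x - round(x))

alpha = math.sqrt(2)              # an arbitrary irrational; here m = n = 1
for Q in [10, 100, 1000, 10000]:
    best = min(dist_to_Z(q * alpha) for q in range(1, Q + 1))
    assert best <= 1.0 / Q        # Dirichlet: some q <= Q achieves <q*alpha> <= 1/Q
print("Dirichlet bound verified for alpha = sqrt(2)")
```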
Lemma 11 ([37, p. 823]). Let M be a non-degenerate manifold of R^k. Then ŵ(a) = 1/k and ŵ(a^T) = k for almost every a ∈ M.
Lemma 11 deals with homogeneous Diophantine approximation (approximating zero by linear forms with integer coefficients). In the course of proving Theorem 4, we will also need results on inhomogeneous Diophantine approximation (approximating an arbitrary point by linear forms), which are connected to the homogeneous case via the so-called transference theorems (see, e.g., [38], [39]). The following transference result is from [39, Lemma 3, p. 750], which is a direct consequence of [38, Theorem XVII.B, p. 99] (see also [36, Theorem VI, p. 318] for a more general version). Note that this result holds for any fixed matrix G.
Lemma 12. Let G ∈ R^{n×m}. Let κ = 2^{1−m−n} ((m+n)!)^2. Let X, Y > 0. Suppose the set

{ y ∈ Z^n \ {0} : ⟨G^T y⟩ < κ X^{−1}, ‖y‖∞ ≤ Y } (102)

is empty. Then for all z ∈ R^n, the set

{ x ∈ Z^m : ⟨Gx + z⟩ ≤ κ Y^{−1}, ‖x‖∞ ≤ X } (103)

is non-empty.
In the sequel we adopt the following conventional notation of sumsets and dilations:
A+B={a+b:a∈A, b∈B}, (104)
λA={λa:a∈A}. (105)
We need the following lemma on minimum distances:

Lemma 13. Let W ≜ {w_1, ..., w_m} ⊂ R and let r > 0 satisfy (77). Then for any n ∈ N,

m(W + rW + ··· + r^{n−1}W) ≥ r^{n−1} m(W). (106)

Moreover, W + rW + ··· + r^{n−1}W and W^n are in one-to-one correspondence.

Proof. Let a^n, b^n be distinct elements of W^n. Then k ≜ min{ i : a_i ≠ b_i, 1 ≤ i ≤ n } is well-defined. Note that |Σ_{i=1}^n (a_i − b_i) r^{i−1}| ≥ |a_k − b_k| r^{k−1} − Σ_{i=k+1}^n r^{i−1} |a_i − b_i|. Consequently,

m(W + rW + ··· + r^{n−1}W) ≥ min_{0 ≤ k ≤ n−1} { r^k m(W) − M(W) Σ_{i=k+1}^{n−1} r^i } = r^{n−1} m(W). (107)

To see that the above minimum is attained at k = n−1, note that the assumption r ≤ m(W)/(m(W) + M(W)) implies m(W)/M(W) ≥ r/(1−r), which is equivalent to r^k m(W) − M(W) Σ_{i=k+1}^{n−1} r^i ≥ r^{n−1} m(W) for all 0 ≤ k ≤ n−1.
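Lemma 13 is easy to probe numerically (our example; the set W below is arbitrary, and r is chosen at the boundary allowed by (77)):

```python
import itertools

W = [0.0, 1.0, 2.5]                     # m(W) = 1.0, M(W) = 2.5
r = 1.0 / 3.5                           # largest r allowed by (77): m(W)/(m(W)+M(W))
n = 4
sums = {sum(w * r**k for k, w in enumerate(tup))
        for tup in itertools.product(W, repeat=n)}   # W + rW + ... + r^(n-1)W
assert len(sums) == len(W) ** n         # one-to-one correspondence with W^n
s = sorted(sums)
min_gap = min(b - a for a, b in zip(s, s[1:]))
assert min_gap >= r ** (n - 1) * 1.0 - 1e-9          # the bound (106)
```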
B. General formula
In this section we prove Theorem 4. As a by-product of the proof, we also establish that the capacity regions of the Gaussian interference channel and of a particular deterministic interference channel differ by at most universal constants (see Remark 6).
Step 1. We start by proving the achievability side of the theorem, which holds for any channel matrix H. Choosing i.i.d. input distributions, we have

C̄(H, snr) ≥ Σ_{i=1}^K I(X_i; Y_i) (108)
= Σ_{i=1}^K [ I( Σ_{j=1}^K h_ij X_j, snr ) − I( Σ_{j≠i} h_ij X_j, snr ) ].

In view of Theorem 2, dividing both sides by (1/2) log snr, sending snr → ∞ and optimizing over independent X_1, ..., X_K subject to (33) yields (32).
Step 2. The rest of the subsection is devoted to proving the converse side of the theorem, except possibly for matrices H whose off-diagonal entries are non-algebraic and belong to a certain set of zero Lebesgue measure, regardless of the diagonal entries. To this end, we first state a number-theoretic condition in terms of the Diophantine approximation exponents, which is satisfied in either of the two cases.
Let ℓ ≜ K(K−1). Given a matrix H ∈ R^{K×K}, denote by h̆ the ℓ-dimensional vector consisting of the off-diagonal entries of H. Let b ∈ N and define the binomial coefficient

L_b ≜ C(ℓ+b, b). (109)

Let

P_{ℓ,b} = {f_1, ..., f_{L_b}} (110)

denote the collection of all ℓ-variate monomials of degree at most b, where f_1 ≡ 1. Define the column vector p_b(H) = (f_2(h̆), ..., f_{L_b}(h̆))^T ∈ R^{L_b − 1}.

Recalling the Diophantine approximation exponent ŵ(·) in Definition 4, we define

w_b(H) ≜ ŵ(p_b(H)), w′_b(H) ≜ ŵ(p_b(H)^T). (111)
We will assume the following condition on the cross-channel gains for the converse proof:

w_b(H) w′_{b+1}(H) → 1, b → ∞. (112)

This assumption warrants some explanation. First of all, as a consequence of (101), for any matrix H we have w_b(H) ≥ 1/(L_b − 1) and w′_{b+1}(H) ≥ L_{b+1} − 1. Note that L_{b+1} = ((ℓ+b+1)/(b+1)) L_b and ℓ = K(K−1). Then

liminf_{b→∞} w_b(H) w′_{b+1}(H) ≥ lim_{b→∞} (L_{b+1} − 1)/(L_b − 1) = 1. (113)

Therefore the assumption (112) amounts to requiring that the limsup, hence the limit, also equal one.
As mentioned earlier, sufficient conditions for (112) include the following:

1) Algebraic case: For i ≠ j, let h_ij be algebraic of degree r_ij. Consider the finite set

S = { Π_{i≠j} h_ij^{k_ij} : k_ij ∈ {0, ..., r_ij − 1}, i ≠ j }. (114)

Then there exists S_0 ⊂ S such that any point in S can be written as a linear combination of elements of S_0 with rational coefficients. Let r denote the rational dimension of S, i.e., the smallest cardinality of such an S_0. Then r ≤ Π_{i≠j} r_ij. Therefore, for all degrees b ≥ max r_ij, p_b(H) has rational dimension r. As a consequence of the Schmidt subspace theorem, we have [40, p. 2166]:

w_b(H) = 1/min{L_b − 1, r}, (115)
w′_{b+1}(H) = min{L_{b+1} − 1, r}. (116)

Hence w_b(H) w′_{b+1}(H) = 1 for all sufficiently large b.
2) Almost sure case: Since {p_b(G) : G ∈ R^{K×K}} is a non-degenerate manifold of R^{L_b − 1} (see, e.g., [37, p. 822]), Lemma 11 implies that

w_b(H) = 1/(L_b − 1), (117)
w′_{b+1}(H) = L_{b+1} − 1 (118)

hold for a.e. H ∈ R^{K×K}, which satisfies (112).
As mentioned in Section III-A, in the three-user case the channel matrix can be reduced to the standard form (34), where the off-diagonal entries are either one or h. Then the polynomials of the off-diagonal entries are in fact univariate polynomials of h. Consequently, we can abbreviate w_b(H) as w_b(h) = ŵ((h, ..., h^b)^T), which is the exponent for simultaneous approximation of the powers of a real number, and w′_b(H) as w′_b(h) = ŵ((h, ..., h^b)), which is the exponent for polynomial approximation. These are the central objects studied in [39], [40], [41]. In particular, by (101) and [41, Theorems 20 and 27], we have 1/b ≤ w_b(h) ≤ 2/b and b ≤ w′_b(h) ≤ 2b − 1. Therefore for any h,

1 ≤ liminf_{b→∞} w_b(h) w′_{b+1}(h) ≤ limsup_{b→∞} w_b(h) w′_{b+1}(h) ≤ 4,

and the condition (112) requires that the limit equal one.

Step 3. We proceed to show that (32) holds with equality, assuming the condition (112).
By the monotonicity of snr ↦ C̄(H, snr), we may restrict the limit in (3) to any increasing unbounded subsequence {snr_m}_{m≥1}, as long as the growth is no faster than exponential, i.e.,

sup_{m≥1} snr_{m+1}/snr_m < ∞. (119)

In particular, we choose snr_m = 4^m. Consequently, we can restrict to this subsequence in the limiting characterization in (39). Define

g(X^n, ε) ≜ E[ log 1/P_{X^n}(B(X^n, ε)) ] (120)
= E[ log 1/P( X^n ∈ B(X̄^n, ε) | X̄^n ) ], (121)

where X̄^n is an independent copy of X^n. By Lemma 6, for any X^n and any snr > 0,

−8n ≤ I(X^n, snr) − g(X^n, snr^{−1/2}) (122)
≤ 2n. (123)

By [10, (294) and (296)] (or [13, Lemma 2.3]), for any m ∈ N,

0 ≤ H([X^n]_m) − g(X^n, 2^{−m}) (124)
≤ 3n. (125)
Plugging (122)–(125) into (39), we have the following equivalent limiting characterization of DoF(H):

DoF(H) = limsup_{m→∞} lim_{n→∞} sup_{X_1^n, ..., X_K^n} (1/(nm)) Σ_{i=1}^K [ H( [Σ_j h_ij X_j^n]_m ) − H( [Σ_{j≠i} h_ij X_j^n]_m ) ], (126)

where X_1^n, ..., X_K^n are independent, such that H(⌊X_ik⌋) ≤ C for some C ≥ 0 and for all i = 1, ..., K, k = 1, ..., n. In view of Lemma 10, the right side of (126) is unchanged if we replace all inputs by their fractional parts. Therefore, it suffices to consider [0,1]-valued inputs in the maximization of (126). Moreover, by the scale-invariance of DoF(·), we can assume, without loss of generality, that ‖H‖∞ ≤ 1, i.e., |h_ij| ≤ 1 for all i, j.
For simplicity, in the remainder of the proof we abbreviate L_b and L_{b+1} by L and L′, respectively, and suppress the dependence of w_b(H) and w′_{b+1}(H) on H. By Definition 4, for any δ > 0, there exists Q_0 = Q_0(δ) > 0 such that for all Q ≥ Q_0,

{ q ∈ Z \ {0} : ⟨p_b(H) q⟩ ≤ Q^{−(1+δ) w_b}, |q| ≤ Q } = ∅ (127)

and

{ q ∈ Z^{L′−1} \ {0} : ⟨p_{b+1}(H)^T q⟩ ≤ Q^{−(1+δ) w′_{b+1}}, ‖q‖∞ ≤ Q } = ∅. (128)
Recall the standing assumption that ‖H‖∞ ≤ 1. Any finite set A induces a quantizer Q_A : R → A defined by

Q_A(x) = argmin_{a ∈ A} |x − a|, (129)

with ties broken arbitrarily. For an arbitrary positive integer N, define the following sets (subsets of a lattice):

V ≜ (1/N) { Σ_{i=1}^L q_i f_i(h̆) : q_i ∈ Z, |q_1| ≤ LN, max_{2≤i≤L} |q_i| ≤ N }, (130)

V′ ≜ (1/N) { Σ_{i=1}^{L′} q_i f_i(h̆) : q_i ∈ Z, |q_i| ≤ LN, i = 1, ..., L′ }. (131)

As a consequence of (127), we can upper bound the approximation error of the quantizer Q_V as follows. Recall the constant κ defined in Lemma 12, which, in the special case of m + n = L, is given by

κ = κ(b) = 2^{1−L} (L!)^2. (132)

Let N_0 = N_0(b, δ) = κ Q_0^{(1+δ) w_b} and Q = ⌈(N/κ)^{1/((1+δ) w_b)}⌉. Let N ≥ N_0 and, equivalently, Q ≥ Q_0. Hence for all integers |q| ≤ Q, we have ⟨p_b(H) q⟩ > Q^{−(1+δ) w_b}. Applying Lemma 12 with X = N, Y = Q, m = L−1, n = 1 yields the following: for all θ ∈ [0,1], there exists x ∈ Z^{L−1} such that ‖x‖∞ ≤ N and

⟨p_b(H)^T x + Nθ⟩ ≤ κ Y^{−1} ≤ κ (κ/N)^{1/((1+δ) w_b)}. (133)

By the assumption that ‖H‖∞ ≤ 1, we have |f_i(h̆)| ≤ 1 for all i, hence |p_b(H)^T x + Nθ| ≤ LN. Recall that ⟨x⟩ = min_{a ∈ Z} |x − a|. Therefore (133) implies that there exists q ∈ Z^L with |q_1| ≤ LN and |q_i| ≤ N for 2 ≤ i ≤ L, such that |N^{−1} Σ_{i=1}^L q_i f_i(h̆) − θ| ≤ (κ/N)^{1 + 1/((1+δ) w_b)}. In other words, for all N ≥ N_0, we have

sup_{θ ∈ [0,1]} |Q_V(θ) − θ| ≤ (κ/N)^{1 + 1/((1+δ) w_b)}. (134)
Using (128), we can lower bound the minimum distance of the set V′ as follows. For all Q ≥ Q_0, (128) implies that for any q ∈ Z^{L′} \ {0} such that |q_i| ≤ Q, 2 ≤ i ≤ L′, we have |Σ_{i=1}^{L′} q_i f_i(h̆)| ≥ Q^{−(1+δ) w′_{b+1}}. Since

V′ − V′ = (1/N) { Σ_{i=1}^{L′} q_i f_i(h̆) : q_i ∈ Z, |q_i| ≤ 2LN }, (135)

then as long as N ≥ Q_0/(2L), we have

m(V′) ≥ (1/N) (2LN)^{−(1+δ) w′_{b+1}}. (136)

Therefore we conclude that (134) and (136) hold simultaneously for all N ≥ max{N_0, Q_0/(2L)}.
By the multi-letter characterization (126), for any ε > 0 and all sufficiently large m, there exist n and independent X_1^n, ..., X_K^n with X_j^n = (X_{jt})_{t=1}^n ∈ [0,1]^n, such that

DoF(H) ≤ ε + (1/(mn)) Σ_{i=1}^K [ H( [Σ_j h_ij X_j^n]_m ) − H( [Σ_{j≠i} h_ij X_j^n]_m ) ]. (137)
The goal is to construct single-letter input distributions {X̂_1, ..., X̂_K} based on the multi-letter inputs X_1^n, ..., X_K^n, such that dof(X̂^K, H) dominates the right side of (137). To this end, fix N, to be chosen depending on m and b later. Set X̃_jt = Q_V(X_jt) ∈ V for all 1 ≤ t ≤ n. In view of (134), we have ‖X̃_j^n − X_j^n‖∞ ≤ (κ/N)^{1 + 1/((1+δ) w_b)} for all j. Next we show that H(Σ_j h_ij X̃_j^n) and H([Σ_j h_ij X_j^n]_m) are close. First note that

‖ Σ_{j≠i} h_ij X̃_j^n − [Σ_{j≠i} h_ij X_j^n]_m ‖∞ ≤ 2^{−m} + K (κ/N)^{1 + 1/((1+δ) w_b)}. (138)

Moreover, for all t, Σ_{j≠i} h_ij X̃_jt takes values in Σ_{j≠i} h_ij V ⊂ V′. We conclude from (136) that the minimum distance satisfies

m( Σ_{j≠i} h_ij V ) ≥ m(V′) ≥ (1/N) (2LN)^{−(1+δ) w′_{b+1}}. (139)

Set

N = ⌊ (1/(2L)) (2^m/(4K))^{1/((1+δ) w′_{b+1} + 1)} ⌋. (140)

Applying Lemma 7 yields

(1/n) H( Σ_{j≠i} h_ij X̃_j^n ) − (1/n) H( [Σ_{j≠i} h_ij X_j^n]_m )
≤ log( 1 + 2 (2^{−m} + K (κ/N)^{1 + 1/((1+δ) w_b)}) / ((1/N)(2LN)^{−(1+δ) w′_{b+1}}) ) (141)
≤ log( 3 + 2KN (κ/N)^{1 + 1/((1+δ) w_b)} / (2LN)^{−(1+δ) w′_{b+1}} ) (142)
≤ C(b, δ) + ( (1+δ) w′_{b+1} − 1/((1+δ) w_b) ) log N, (143)

where (142) follows from (140) since 2^{−m} ≤ (1/N)(2LN)^{−(1+δ) w′_{b+1}}, and in (143) we have defined

C(b, δ) ≜ log 3 + log( 16 K² L κ^{1 + 1/((1+δ) w_b)} (2L)^{(1+δ) w′_{b+1}} ),

which only depends on b and δ, since κ and L are functions of b only. Analogously, a similar application of Lemma 7 yields

(1/n) H( [Σ_j h_ij X_j^n]_m ) − (1/n) H( Σ_j h_ij X̃_j^n )
≤ log( 1 + 2 (2^{−m} + K (κ/N)^{1 + 1/((1+δ) w_b)}) / 2^{−m} ) (144)
≤ C(b, δ) + ( (1+δ) w′_{b+1} − 1/((1+δ) w_b) ) log N. (145)

Now we construct scalar input distributions which are self-similar (see Section IV) and DoF-achieving. First put

W_j = Σ_{k=1}^n X̃_jk r^{k−1}, (146)

where

r = 2^{−m}. (147)
Let {W_jt : t ≥ 0} be a sequence of i.i.d. copies of W_j. Set

X̂_j = Σ_{t≥0} W_jt r^{nt}. (148)

Note that each X̂_j has a homogeneous self-similar distribution with similarity ratio r^n. In view of Theorem 12, d(X̂_j) exists for all j. Moreover, by (148),

Σ_{j=1}^K h_ij X̂_j = Σ_{t≥0} ( Σ_{j=1}^K h_ij W_jt ) r^{nt} (149)

is also homogeneously self-similar with similarity ratio r^n, in view of Lemma 5. Note that M(Σ_j h_ij V) ≤ K M(V) ≤ K M([−2L, 2L]) = 4KL. Combining (139), (140) and (147), we have

r ≤ m( Σ_j h_ij V ) / ( 2 M( Σ_j h_ij V ) ). (150)

Set W ≜ Σ_{k=1}^n r^{k−1} V. Since X̃_jt ∈ V, we have

Σ_j h_ij W_j ∈ Σ_j h_ij W = Σ_{k=1}^n r^{k−1} ( Σ_j h_ij V ). (151)

In view of (150), applying Lemma 13 yields

m( Σ_j h_ij W ) ≥ r^{n−1} m( Σ_j h_ij V ). (152)

On the other hand, the diameter satisfies M(Σ_j h_ij W) ≤ K M(W) ≤ K Σ_{k=1}^n r^{k−1} M(V) ≤ 8KL. Therefore, in view of (150) and (152), our choice of r in (147) satisfies

r^n ≤ m( Σ_j h_ij W ) / ( 2 M( Σ_j h_ij W ) ). (153)
Invoking the sufficient condition in Lemma 4, we conclude that Σ_j h_ij X̂_j satisfies the open set condition. Hence, by Theorem 13, the information dimension is given by

d( Σ_j h_ij X̂_j ) = H( Σ_j h_ij W_j ) / log(1/r^n) = H( Σ_j h_ij W_j ) / (nm) = H( Σ_j h_ij X̃_j^n ) / (nm), (154)

where the last equality in (154) follows from the fact that Σ_{k=1}^n r^{k−1} (Σ_j h_ij V) and (Σ_j h_ij V)^n are in one-to-one correspondence with each other, in view of (146) and Lemma 13. Entirely analogously, we have

d( Σ_{j≠i} h_ij X̂_j ) = H( Σ_{j≠i} h_ij X̃_j^n ) / (nm). (155)
Assembling (140), (145) and (154), we obtain

d( Σ_j h_ij X̂_j ) ≥ (1/(nm)) H( [Σ_j h_ij X_j^n]_m ) − C(b, δ)/m − ( (1+δ) w′_{b+1} − 1/((1+δ) w_b) ) (log N)/m. (156)

Combining (140), (143) and (155) yields

d( Σ_{j≠i} h_ij X̂_j ) ≤ (1/(nm)) H( [Σ_{j≠i} h_ij X_j^n]_m ) + C(b, δ)/m + ( (1+δ) w′_{b+1} − 1/((1+δ) w_b) ) (log N)/m. (157)

Recall the notation dof defined in (31). Plugging (156)–(157) into (137) and summing over i, we obtain

DoF(H) ≤ dof(X̂^K, H) + ε + 2K C(b, δ)/m + ( (1+δ) w′_{b+1} − 1/((1+δ) w_b) ) (2K log N)/m. (158)

Note that according to (140), (log N)/m → 1/((1+δ) w′_{b+1} + 1) ≤ 1/((1+δ) w′_{b+1}) as m → ∞. Sending m ↑ ∞, then b ↑ ∞, then δ ↓ 0, and finally ε ↓ 0, we have

DoF(H) ≤ sup_{X^K} dof(X^K, H) + lim_{δ→0} lim_{b→∞} 2K ( 1 − 1/((1+δ)² w_b w′_{b+1}) )
= sup_{X^K} dof(X^K, H) + 2K ( 1 − 1/lim_{b→∞} w_b w′_{b+1} )
= sup_{X^K} dof(X^K, H),

where the last step follows from our assumption (112). This completes the proof of the converse.
Remark 6. Deterministic channel approximation has been employed to determine the capacity regions of various network communication problems within constant gaps; see, e.g., [42], [43]. As a by-product of the proof of Theorem 4, we establish a connection between the Gaussian interference channel

Y^K = √snr H X^K + N^K (159)

and the following deterministic interference channel

Ỹ^K = [H X^K]_m, (160)

where [·]_m, defined in (76), is applied componentwise, and E[X_i²] ≤ 1. It can be shown that for m ≈ (1/2) log snr, the capacity regions of (1) and (160) differ by at most universal constants. To see this, denote the capacity region of (160) by C̃_m(H). Assembling (122)–(125) as well as the multi-letter characterizations (36) and (38), we conclude that there exists a universal constant c (which is at most 22 bits), such that

d_H( C(H, snr), C̃_m(H) ) ≤ c (161)

holds uniformly for any snr > 0 and any H, where m and snr are related through m ≈ (1/2) log snr, and

d_H(A, B) ≜ max{ sup_{b∈B} inf_{a∈A} ‖a − b‖∞, sup_{a∈A} inf_{b∈B} ‖a − b‖∞ }.

Moreover, (161) also holds for the quantized interference channel with (160) replaced by Y^K = [[H]_m X^K]_m or Y^K = [[H]_m [X^K]_m]_m. Note that a related one-sided bound was given in [5, Lemma 4].
Furthermore, we note that for non-Gaussian additive noise with a well-behaved density, such as the uniform or Laplace distribution, the capacity region is only a constant gap away from the Gaussian one. This follows from the multi-letter characterization and the uniform bound on mutual information in Lemma 6 (see also Remark 10 in Appendix D).
Remark 7 (Rational channel matrices). For channel matrices with rational entries, the general formula can be proved without appealing to Diophantine approximation arguments. Essentially, we can simply construct X̂^K based on a quantized version of X^K. An elementary proof is as follows: since DoF is scale-invariant, we assume that h_ij ∈ Z without loss of generality. The gist of constructing X̂_j is to concatenate the bits from X_j^n. Let L be an integer such that L > 1 + max_{1≤i≤K} log(1 + Σ_{j=1}^K |h_ij|) and let M = n(m+L). Define

X̂_j = Σ_{l≥0} Z_jl 2^{−Ml}, (162)

where {Z_jl}_{l≥0} are independent copies of

Z_j = Σ_{k=1}^n 2^{−L} [X_jk]_m 2^{−(m+L)(k−1)}. (163)
Note that Z_j is a discrete random variable whose binary expansion has M bits, formed by concatenating the first m bits of each X_jk, each preceded by L zeros. Therefore, Z_j takes values in 2^{−M} {0, ..., 2^{M−L} − 1}. Note that each X̂_j has a homogeneous self-similar distribution with similarity ratio 2^{−M}. In view of Theorem 12, d(X̂_j) exists for all j. Moreover, by (162),

Σ_{j=1}^K h_ij X̂_j = Σ_{l≥0} ( Σ_{j=1}^K h_ij Z_jl ) 2^{−Ml} (164)

is also homogeneous self-similar with similarity ratio 2^{−M}, in view of Lemma 5. It is easy to verify that the distribution of (164) also satisfies the open set condition: let

W = 2^{−M} Σ_{j=1}^K h_ij {0, ..., 2^{M−L} − 1} (165)

denote the alphabet of Σ_{j=1}^K h_ij Z_j. Since h_ij ∈ Z, the minimum distance of W is given by m(W) = 2^{−M}, while the diameter is upper bounded by M(W) ≤ (2^{−L} − 2^{−M}) Σ_{j=1}^K |h_ij| ≤ 1/2. By Lemma 4, (164) satisfies the open set condition. Applying Theorem 13, we have

M · d( Σ_{j=1}^K h_ij X̂_j ) = H( Σ_{j=1}^K h_ij Z_j ) (166)
= H( Σ_{k=1}^n Σ_{j=1}^K 2^{−L} h_ij [X_jk]_m 2^{−(m+L)(k−1)} ) (167)
= H( Σ_{j=1}^K h_ij [X_j1]_m, ..., Σ_{j=1}^K h_ij [X_jn]_m ) (168)
= H( Σ_{j=1}^K h_ij [X_j^n]_m ), (169)

where (167) is due to (163), and (168) follows from the fact that the (m+L)-bit blocks in the summation of (167) do not overlap. Since h_ij ∈ Z, by Lemma 9,

| H( Σ_{j=1}^K h_ij [X_j^n]_m ) − H( [Σ_{j=1}^K h_ij X_j^n]_m ) | ≤ n log( 2 + 2 Σ_{j=1}^K |h_ij| ) (170)
≤ nL. (171)

Combining (169) and (171), we have

| (m+L) d( Σ_{j=1}^K h_ij X̂_j ) − (1/n) H( [Σ_{j=1}^K h_ij X_j^n]_m ) | ≤ L. (172)

Similarly, (172) holds if the summation over j excludes i. Plugging (172) into (137) and summing over i, we have

DoF(H) ≤ ((m+L)/m) Σ_{i=1}^K [ d( Σ_{j=1}^K h_ij X̂_j ) − d( Σ_{j≠i} h_ij X̂_j ) ] + ε + 2KL/m. (173)

By the arbitrariness of ε and m, the proof for rational H is complete.
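The non-overlap argument behind (166)–(169) can be verified by exhaustive enumeration for toy parameters (our sketch; K = 2, n = 2, m = 2, L = 4 and h = (1, 2) are arbitrary choices satisfying the constraints of Remark 7, computed with exact rational arithmetic):

```python
import itertools, math
from collections import Counter
from fractions import Fraction

def H(counter):
    tot = sum(counter.values())
    return -sum((v / tot) * math.log2(v / tot) for v in counter.values())

K, n, m, L, h = 2, 2, 2, 4, [1, 2]           # L > 1 + log2(1 + |h1| + |h2|) = 3
vals = [Fraction(v, 4) for v in range(4)]    # m-bit inputs, so [X_jk]_m = X_jk
lhs, rhs = Counter(), Counter()
for xs in itertools.product(vals, repeat=K * n):
    X = [xs[0:n], xs[n:2 * n]]               # X[j][k], all combinations equally likely
    Z = [sum(Fraction(1, 2**L) * X[j][k] * Fraction(1, 2**((m + L) * k))
             for k in range(n)) for j in range(K)]              # the map (163)
    lhs[h[0] * Z[0] + h[1] * Z[1]] += 1                         # argument of (166)
    rhs[tuple(h[0] * X[0][k] + h[1] * X[1][k] for k in range(n))] += 1   # (168)
assert len(lhs) == len(rhs)              # blocks do not overlap: identical collisions
assert abs(H(lhs) - H(rhs)) < 1e-9       # hence the entropies in (166)-(168) agree
```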
C. The K/2 upper bound
Proof of Theorem 5. Counting in two different ways, we have

2 dof(X^K, H) = Σ_{i=1}^K [ d( Σ_j h_ij X_j ) + d( Σ_j h_ij X_j ) − d( Σ_{j≠i} h_ij X_j ) − d( Σ_{j≠π(i)} h_{π(i)j} X_j ) ] (174)
≤ K + Σ_{i=1}^K [ d( Σ_j h_ij X_j ) − d( Σ_{j≠i} h_ij X_j ) − d(X_i) ] (175)
≤ K, (176)

where (175) follows from (13)–(15), and (176) is due to (16). If (174)–(176) hold with equality, then for all i, we have

1 = d( Σ_j h_ij X_j ) (177)
= d( Σ_{j≠i} h_{π(i)j} X_j ) + d(X_i) (178)
= d(X_{π(i)}) + d(X_i). (179)

Since π is cyclic, the system of linear equations

d(X_{π(i)}) + d(X_i) = 1, i = 1, ..., K (180)

has a unique solution: d(X_i) = 1/2 for all i, yielding the desired (58)–(60).
Remark 8. Based on the fact that, with nonzero off-diagonal entries, the degrees of freedom of the two-user IC are at most one, (57) can be shown, alternatively, using a method similar to [2, Proposition 1]. Consider user i and user π(i). Since h_{π(i),i} ≠ 0, we have

R_i + R_{π(i)} ≤ (1/2) log snr + o(log snr) (181)

even in the absence of interference from users other than i and π(i). Summing (181) over i gives (57).
Proof of Theorem 10. To prove (64), it is equivalent to show that for any d^K ∈ DoF(H) and any probability vector w^K,

⟨d^K, w^K⟩ ≤ max{ w_1, ..., w_K, 1/2 }. (182)

Repeating (174)–(176), we have

⟨d^K, w^K⟩ ≤ 1/2. (183)

Therefore (182) is indeed satisfied if w_i ≤ 1/2 for all i. Otherwise, there exists a unique w_i, say w_1 for simplicity, such that

w_1 > 1/2 > Σ_{j≠1} w_j. (184)

Then for any X^K,

Σ_{i=1}^K w_i d( Σ_j h_ij X_j ) − w_i d( Σ_{j≠i} h_ij X_j )
≤ w_1 − w_1 d( Σ_{j=2}^K h_1j X_j ) + Σ_{i=2}^K w_i d(X_i) (185)
≤ w_1 − Σ_{i=2}^K w_i [ d( Σ_{j=2}^K h_1j X_j ) − d(X_i) ] (186)
≤ w_1, (187)

where (185)–(187) follow from (16), (184) and (15), respectively. Note that the necessary and sufficient condition for (187) to hold with equality is d(X_j) = 0 for all j ≠ 1.
The proof of the almost sure equality in (64) is analogous to the proof of Theorem 6 in Section V-D.
D. Almost sure results on degrees of freedom
Proof of Theorem 6. The outline of the proof is as follows: following Theorem 4, we construct input distributions depending only on the off-diagonal entries of H such that (58) and (59) are (approximately) satisfied. Using the projection theorem [12, Theorem 1] with respect to the diagonal entries, the condition (60) (approximately) holds for Lebesgue-almost every choice of diagonal entries.
Let ℓ ≜ K(K−1). Recall that h̆ denotes the ℓ-dimensional vector formed by the off-diagonal entries of H. Let N, b ∈ N and r ∈ (0,1) be parameters to be specified later. We construct an input distribution µ which depends on h̆ and the parameters (N, b, r) as follows. Recall the set of monomials P_{ℓ,b} = {f_1, ..., f_L} defined in (110), where L = L_b is defined in (109). Let µ be the homogeneous self-similar distribution of similarity ratio r as per Definition 3, with p_j = N^{−L} and W = {w_1, ..., w_m} given by

W ≜ (1/N) { Σ_{i=1}^L q_i f_i(h̆) : q_i ∈ {0, ..., N−1} }, (188)

i.e., the set of all ℓ-variate polynomials evaluated at h̆, with degrees not exceeding b and coefficients valued in the arithmetic progression (1/N){0, ..., N−1}. Let X be distributed according to µ. Then X is of the self-similar form in (73), where {W_k} are i.i.d. and equiprobable on W.
Next we choose r properly so that µ satisfies the open set condition, which renders the formula (81) applicable. This part is analogous to the second step in the converse proof of Theorem 4. In view of Lemma 4, we proceed by giving lower and upper bounds on the minimum distance m(W) and the diameter M(W), which are defined in (78) and (79), respectively.
Recall that p_b(H) = (f_2(h̆), ..., f_L(h̆))^T ∈ R^{L−1} denotes the vector of non-constant monomials of h̆ with degree not exceeding b. By Lemma 11, ŵ(p_b(H)^T) = L − 1 for a.e. h̆. Therefore, for a.e. h̆, there exists N_0 = N_0(b, h̆) > 0 such that for all N ≥ N_0,

{ q ∈ Z^{L−1} \ {0} : ⟨p_b(H)^T q⟩ ≤ N^{−2L}, ‖q‖∞ ≤ N } = ∅. (189)

Therefore, as long as N ≥ N_0, we have

m(W) > N^{−2L}, (190)

which implies that |W| = N^L in particular. On the other hand,

M(W) ≤ 2 Σ_{i=1}^L |f_i(h̆)|. (191)

Defining

C = C(b, h̆) ≜ 4 Σ_{i=1}^L |f_i(h̆)|, (192)

we choose

r = 1/(C N^{2L}), (193)

which, in view of (190)–(192), satisfies

r ≤ m(W)/(2 M(W)) ≤ m(W)/(m(W) + M(W)). (194)

By Lemma 4, µ satisfies the open set condition. By Theorem 13, for all α ≥ 0, we have

d_α(µ) = log|W| / log(1/r) = (L log N)/(2L log N + log C). (195)

Next we upper bound the information dimension of the interference. Let X_1, ..., X_K be i.i.d. according to µ. By Lemma 5, the distribution of Σ_{j≠i} h_ij X_j is homogeneous self-similar with similarity ratio r. By Theorem 13,

d( Σ_{j≠i} h_ij X_j ) ≤ H( Σ_{j≠i} h_ij W_j ) / log(1/r), (196)

where the W_j's are i.i.d. copies of W. Next we upper bound the alphabet size of Σ_{j≠i} h_ij W_j: in view of (188), we have

Σ_{j≠i} h_ij W ⊂ (1/N) { Σ_{i=1}^{L′} q_i f_i(h̆) : q_i ∈ {0, ..., K(N−1)} }, (197)

where L′ = L_{b+1} per (109). Combining (196), (197) and (193) gives

d( Σ_{j≠i} h_ij X_j ) ≤ L′ log(K(N−1)) / log(1/r) (198)
≤ L′ (log K + log N) / (2L log N + log C). (199)

By the dimension preservation result in Lemma 3, for any fixed h̆ and any α ∈ (1, 2],

d_α( Σ_j h_ij X_j ) = min{ 1, d_α(X_i) + d_α( Σ_{j≠i} h_ij X_j ) } (200)

holds for Lebesgue-almost every h_ii. Then

d( Σ_j h_ij X_j ) ≥ d_α( Σ_j h_ij X_j ) (201)
= min{ 1, d_α(X_i) + d_α( Σ_{j≠i} h_ij X_j ) } (202)
≥ min{ 1, 2 d_α(µ) } (203)
= (2L log N)/(2L log N + log C), (204)

where
• (201): by the monotonicity of α ↦ d_α;
• (202): by (200);
• (203): by (15);
• (204): by (195).
Subtracting (199) from (204) and then summing over i, we conclude that

DoF(H) ≥ ( 2LK log N − L′K (log K + log N) ) / (2L log N + log C) (205)

holds for almost every H and every b, N ∈ N. Recall that C is given by (192), which does not depend on N. Therefore, sending N → ∞ first and then b → ∞ in (205), we have

DoF(H) = lim_{b→∞} K ( 1 − L′/(2L) ) = lim_{b→∞} K ( 1 − (ℓ+b+1)/(2(b+1)) ) (206)
= K/2, (207)

where in (206) we used the fact that L′ = ((ℓ+b+1)/(b+1)) L.
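The construction (188) and the choice of r in (193) can be sanity-checked numerically for a small instance (our sketch; K = 2, b = 1, N = 3 and the "generic-looking" h̆ = (√2−1, √3−1) are illustrative choices, not from the paper):

```python
import itertools, math

h = (math.sqrt(2) - 1, math.sqrt(3) - 1)     # off-diagonal vector for l = 2
f = [1.0, h[0], h[1]]                        # monomials of degree <= b = 1; L = 3
N, L = 3, len(f)
W = sorted({sum(q[i] * f[i] for i in range(L)) / N
            for q in itertools.product(range(N), repeat=L)})   # the set (188)
mW = min(b - a for a, b in zip(W, W[1:]))    # minimum distance m(W)
MW = W[-1] - W[0]                            # diameter M(W)
C = 4 * sum(abs(v) for v in f)               # C(b, h) as in (192)
r = 1.0 / (C * N ** (2 * L))                 # the choice (193)
assert len(W) == N ** L                      # |W| = N^L, cf. (190)
assert mW > N ** (-2.0 * L)                  # (190) holds for this h
assert r <= mW / (mW + MW)                   # open set condition via Lemma 4
print(f"m(W)={mW:.4f}, M(W)={MW:.4f}, r={r:.2e}")
```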
Remark 9. A variant of the Kleinbock–Margulis Diophantine approximation result (Lemma 11) is also used in [3] to construct a communication scheme, called real interference alignment, that achieves K/2 degrees of freedom almost surely. There, the Diophantine approximation theorem is invoked to lower bound the minimum distance in the multi-layered signal constellation. In a similar spirit, in our proof the role of Lemma 11 is to show that µ satisfies the open set condition almost surely, which leads to (195), i.e., the information dimension of each user is close to 1/2. To upper bound the