| title | chunk | source | category | tags | date_saved | instance |
|---|---|---|---|---|---|---|
| Asymptotic equipartition property | 2/3 | https://en.wikipedia.org/wiki/Asymptotic_equipartition_property | reference | science, encyclopedia | 2026-05-05T14:39:53.761478+00:00 | kb-cron |
$$\lim_{n\to\infty}\Pr\left[\,\left|-\frac{1}{n}\log p(X_1,X_2,\ldots,X_n)-\overline{H}_n(X)\right|<\varepsilon\right]=1\qquad\forall\varepsilon>0$$
where
$$\overline{H}_n(X)=\frac{1}{n}H(X_1,X_2,\ldots,X_n)$$
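The statement above can be checked numerically. The following is a minimal sketch, assuming a hypothetical non-stationary source made of independent Bernoulli variables whose success probabilities drift over time (the function names and the drift pattern are illustrative and not taken from the source). It compares $-\frac{1}{n}\log p(X_1,\ldots,X_n)$ with $\overline{H}_n(X)$, using natural logarithms.

```python
import math
import random


def bernoulli_entropy(q):
    """Entropy in nats of a Bernoulli(q) variable, using 0 ln 0 = 0."""
    if q in (0.0, 1.0):
        return 0.0
    return -q * math.log(q) - (1 - q) * math.log(1 - q)


def aep_gap(probs, rng):
    """Draw one sample with independent X_i ~ Bernoulli(probs[i]) and return
    | -(1/n) log p(X_1..X_n) - (1/n) H(X_1..X_n) |."""
    n = len(probs)
    log_p = 0.0
    for q in probs:
        x = rng.random() < q
        log_p += math.log(q if x else 1 - q)
    # Independence makes the joint entropy the sum of the marginal entropies.
    mean_entropy = sum(bernoulli_entropy(q) for q in probs) / n
    return abs(-log_p / n - mean_entropy)


def varying_probs(n):
    """Hypothetical non-stationary source: probabilities drift within [0.2, 0.8]."""
    return [0.5 + 0.3 * math.sin(0.01 * i) for i in range(n)]


rng = random.Random(0)
for n in (100, 10_000, 200_000):
    print(n, aep_gap(varying_probs(n), rng))
```

The printed gap should shrink as $n$ grows, which is what convergence in probability suggests for a single long sample.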
=== Applications ===
The asymptotic equipartition property for a non-stationary discrete-time independent process leads (among other results) to the source coding theorem for non-stationary sources (with independent output symbols) and the noisy-channel coding theorem for non-stationary memoryless channels.
== Measure-theoretic form ==
$T$ is a measure-preserving map on the probability space $\Omega$, with probability measure $\mu$. If $P$ is a finite or countable partition of $\Omega$, then its entropy is
$$H(P):=-\sum_{p\in P}\mu(p)\ln\mu(p)$$
with the convention that $0\ln 0=0$. We only consider partitions with finite entropy: $H(P)<\infty$.
If $P$ is a finite or countable partition of $\Omega$, then we construct a sequence of partitions by iterating the map:
$$P^{(n)}:=P\vee T^{-1}P\vee\dots\vee T^{-(n-1)}P$$
where $P\vee Q$ is the least upper bound partition, that is, the least refined partition that refines both $P$ and $Q$:
$$P\vee Q:=\{p\cap q:p\in P,q\in Q\}$$
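The join and the iterated partitions $P^{(n)}$ can be computed directly on a finite space. The sketch below assumes a hypothetical example that is not in the source: the rotation $x\mapsto x+1$ on $\mathbb{Z}_8$ with the uniform measure (which any bijection of a finite set preserves), and a two-cell partition. Partitions are represented as sets of `frozenset` cells, and empty intersections are dropped.

```python
import math


def join(P, Q):
    """P v Q: all non-empty intersections of a cell of P with a cell of Q."""
    return {p & q for p in P for q in Q if p & q}


def preimage(T, P, omega):
    """T^{-1}P: each cell p is replaced by {x : T(x) in p}."""
    cells = {frozenset(x for x in omega if T(x) in p) for p in P}
    return {c for c in cells if c}


def iterate(T, P, omega, n):
    """P^(n) = P v T^{-1}P v ... v T^{-(n-1)}P."""
    result, current = set(P), set(P)
    for _ in range(n - 1):
        current = preimage(T, current, omega)
        result = join(result, current)
    return result


def entropy(P, omega):
    """H(P) in nats under the uniform measure on omega."""
    total = len(omega)
    return -sum((len(p) / total) * math.log(len(p) / total) for p in P)


# Hypothetical example: rotation by 1 on Z_8, which preserves the uniform measure.
omega = range(8)
T = lambda x: (x + 1) % 8
P = {frozenset({0, 1, 2, 3}), frozenset({4, 5, 6, 7})}

for n in (1, 2, 3, 4):
    Pn = iterate(T, P, omega, n)
    print(n, len(Pn), round(entropy(Pn, omega), 4))
```

In this example the refinement stops once every cell is a single point, so $H(P^{(n)})$ stays bounded by $\ln 8$ and $\frac{1}{n}H(P^{(n)})$ tends to zero.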
Write $P(x)$ for the set in $P$ that contains $x$. So, for example, $P^{(n)}(x)$ is the $n$-letter initial segment of the $(P,T)$ name of $x$. Write $I_P(x)$ for the information (in nats) about $x$ that we recover by knowing which element of the partition $P$ contains $x$:
$$I_P(x):=-\ln\mu(P(x))$$
Similarly, the conditional information of partition $P$, conditional on partition $Q$, about $x$, is
$$I_{P|Q}(x):=-\ln\frac{\mu\big((P\vee Q)(x)\big)}{\mu\big(Q(x)\big)}$$
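As a small illustration of the two information functions, the sketch below evaluates them on a hypothetical four-point space with a non-uniform measure (the space, the measure, and the helper names are assumptions made for this example). It reads the ratio inside the logarithm of the conditional information as a ratio of the measures of the cells $(P\vee Q)(x)$ and $Q(x)$.

```python
import math


def cell(P, x):
    """P(x): the unique cell of the partition P that contains x."""
    return next(p for p in P if x in p)


def measure(mu, A):
    """mu(A) for a finite set A, with mu given as a dict of point masses."""
    return sum(mu[x] for x in A)


def info(P, x, mu):
    """I_P(x) = -ln mu(P(x)), in nats."""
    return -math.log(measure(mu, cell(P, x)))


def cond_info(P, Q, x, mu):
    """I_{P|Q}(x) = -ln( mu((P v Q)(x)) / mu(Q(x)) ), with (P v Q)(x) = P(x) & Q(x)."""
    joint = cell(P, x) & cell(Q, x)
    return -math.log(measure(mu, joint) / measure(mu, cell(Q, x)))


# Hypothetical four-point space with a non-uniform probability measure.
mu = {0: 0.1, 1: 0.2, 2: 0.3, 3: 0.4}
P = [frozenset({0, 1}), frozenset({2, 3})]
Q = [frozenset({0, 2}), frozenset({1, 3})]

for x in mu:
    print(x, round(info(P, x, mu), 4), round(cond_info(P, Q, x, mu), 4))
```

With these definitions the pointwise chain rule $I_{P\vee Q}(x)=I_Q(x)+I_{P|Q}(x)$ holds, which is the identity that the entropy chain rule averages over $\mu$.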
$h_T(P)$ is the Kolmogorov-Sinai entropy
$$h_T(P):=\lim_n\frac{1}{n}H(P^{(n)})=\lim_n E_{x\sim\mu}\left[\frac{1}{n}I_{P^{(n)}}(x)\right]$$
In other words, by definition, $\frac{1}{n}I_{P^{(n)}}$ converges to $h_T(P)$ in expectation. The SMB theorem states that when $T$ is ergodic, the convergence also holds in $L^1$.
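A numerical illustration may help here. The sketch below assumes a hypothetical two-state stationary Markov chain (the transition matrix is made up for the example), viewed as the shift map on sequence space with $P$ the partition by the first symbol, so that $P^{(n)}(x)$ is the cylinder set of the first $n$ symbols. It samples trajectories and compares $\frac{1}{n}I_{P^{(n)}}(x)$ with the entropy rate of the chain.

```python
import math
import random

# Hypothetical two-state chain; rows are transition probabilities.
A = [[0.9, 0.1],
     [0.4, 0.6]]
# Stationary distribution of a two-state chain: pi = (b, a) / (a + b)
# with a = A[0][1] and b = A[1][0].
PI = [A[1][0] / (A[0][1] + A[1][0]), A[0][1] / (A[0][1] + A[1][0])]


def entropy_rate():
    """h = -sum_i pi_i sum_j A_ij ln A_ij, in nats per symbol."""
    return -sum(PI[i] * A[i][j] * math.log(A[i][j])
                for i in range(2) for j in range(2))


def normalized_information(n, rng):
    """Sample x_0..x_{n-1} from the stationary chain and return
    (1/n) I_{P^(n)}(x) = -(1/n) ln mu(cylinder of the first n symbols)."""
    state = 0 if rng.random() < PI[0] else 1
    log_mu = math.log(PI[state])
    for _ in range(n - 1):
        nxt = 0 if rng.random() < A[state][0] else 1
        log_mu += math.log(A[state][nxt])
        state = nxt
    return -log_mu / n


rng = random.Random(0)
h = entropy_rate()
for n in (100, 10_000, 100_000):
    samples = [normalized_information(n, rng) for _ in range(5)]
    print(n, round(h, 4), [round(s, 4) for s in samples])
```

For large $n$ the sampled values should cluster around the entropy rate, consistent with the limit being a constant in the ergodic case. A simulation of this kind only suggests the convergence and does not distinguish between modes of convergence.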
If $T$ is not necessarily ergodic, then the underlying probability space splits into multiple subsets, each invariant under $T$. In this case, we still have $L^1$ convergence to some function, but that function is no longer constant.
When $T$ is ergodic, $\mathcal{I}$ (the σ-algebra of $T$-invariant sets) is trivial, and so the function
$$x\mapsto E\left[\lim_n I_{P|\vee_{k=1}^{n}T^{-k}P}\;\Big|\;\mathcal{I}\right]$$
simplifies into the constant function $x\mapsto E\left[\lim_n I_{P|\vee_{k=1}^{n}T^{-k}P}\right]$, which by definition equals $\lim_n H(P|\vee_{k=1}^{n}T^{-k}P)$, which equals $h_T(P)$ by a proposition.
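The source does not say which proposition is meant. One standard argument for the identity $h_T(P)=\lim_n H(P|\vee_{k=1}^{n}T^{-k}P)$, given here as a sketch under that assumption, combines the chain rule for entropy with the fact that $T$ preserves $\mu$, so that $H(T^{-1}Q)=H(Q)$ for any partition $Q$:

```latex
\begin{align*}
H\bigl(P^{(n)}\bigr)
  &= H\Bigl(\textstyle\bigvee_{k=1}^{n-1} T^{-k}P\Bigr)
     + H\Bigl(P \,\Big|\, \textstyle\bigvee_{k=1}^{n-1} T^{-k}P\Bigr)
     && \text{(chain rule)} \\
  &= H\bigl(P^{(n-1)}\bigr)
     + H\Bigl(P \,\Big|\, \textstyle\bigvee_{k=1}^{n-1} T^{-k}P\Bigr)
     && \text{(since } \textstyle\bigvee_{k=1}^{n-1} T^{-k}P = T^{-1}P^{(n-1)}\text{)} \\
  &= \sum_{j=0}^{n-1} H\Bigl(P \,\Big|\, \textstyle\bigvee_{k=1}^{j} T^{-k}P\Bigr)
     && \text{(induction; the } j=0 \text{ term is } H(P)\text{)}
\end{align*}
```

The summands are non-negative and non-increasing in $j$, because conditioning on a finer partition cannot increase entropy, so they converge. Dividing by $n$ and taking the Cesàro limit then gives $\lim_n\frac{1}{n}H(P^{(n)})=\lim_j H(P|\vee_{k=1}^{j}T^{-k}P)$.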
== Continuous-time stationary ergodic sources ==
Discrete-time functions can be interpolated to continuous-time functions. If such an interpolation $f$ is measurable, we may define the continuous-time stationary process accordingly as $\tilde{X}:=f\circ X$. If the asymptotic equipartition property holds for the discrete-time process, as in the i.i.d. or finite-valued stationary ergodic cases shown above, it automatically holds for the continuous-time stationary process derived from it by some measurable interpolation, that is,
$$-\frac{1}{n}\log p(\tilde{X}_0^{\tau})\to H(X)$$
where $n$ corresponds to the number of degrees of freedom in time $\tau$. $nH(X)/\tau$ and $H(X)$ are the entropy per unit time and per degree of freedom respectively, as defined by Shannon. An important class of such continuous-time stationary processes is the bandlimited stationary ergodic processes whose sample space is a subset of the continuous $\mathcal{L}_2$ functions. The asymptotic equipartition property holds if the process is white, in which case the time samples are i.i.d., or if there exists $T>1/(2W)$, where $W$ is the nominal bandwidth, such that the $T$-spaced time samples take values in a finite set, in which case we have a discrete-time finite-valued stationary ergodic process. Any time-invariant operation also preserves the asymptotic equipartition property, stationarity and ergodicity, and we may easily turn a stationary process into a non-stationary one without losing the asymptotic equipartition property by nulling out a finite number of time samples in the process.