--- title: "Asymptotic equipartition property" chunk: 2/3 source: "https://en.wikipedia.org/wiki/Asymptotic_equipartition_property" category: "reference" tags: "science, encyclopedia" date_saved: "2026-05-05T14:39:53.761478+00:00" instance: "kb-cron" ---

$$\lim_{n\to\infty}\Pr\left[\,\left|-\frac{1}{n}\log p(X_{1},X_{2},\ldots,X_{n})-\overline{H}_{n}(X)\right|<\varepsilon\right]=1\qquad\forall\varepsilon>0$$

where

$$\overline{H}_{n}(X)=\frac{1}{n}H(X_{1},X_{2},\ldots,X_{n})$$

=== Applications ===

The asymptotic equipartition property for non-stationary discrete-time independent processes leads us to (among other results) the source coding theorem for non-stationary sources (with independent output symbols) and the noisy-channel coding theorem for non-stationary memoryless channels.

== Measure-theoretic form ==

Let $T$ be a measure-preserving map on the probability space $\Omega$ with probability measure $\mu$. If $P$ is a finite or countable partition of $\Omega$, then its entropy is

$$H(P):=-\sum_{p\in P}\mu(p)\ln\mu(p)$$

with the convention that $0\ln 0=0$. We only consider partitions with finite entropy: $H(P)<\infty$.
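As a small illustration of the partition entropy just defined, the following sketch computes H(P) in nats from a list of cell measures. The function name `partition_entropy` and the three example partitions are assumptions made here for illustration; they do not come from the article. A countable partition is approximated by truncating it to finitely many cells.

```python
import math


def partition_entropy(masses):
    """Entropy in nats of a partition, given the measure of each cell.

    `masses` is a hypothetical input format: one non-negative number per
    cell of the partition, summing to 1 (up to rounding).
    """
    if any(m < 0 for m in masses):
        raise ValueError("cell measures must be non-negative")
    if abs(math.fsum(masses) - 1.0) > 1e-9:
        raise ValueError("cell measures must sum to 1")
    # Skipping cells of measure zero implements the convention 0 ln 0 = 0.
    return -math.fsum(m * math.log(m) for m in masses if m > 0)


# Four cells of equal measure.
uniform = partition_entropy([0.25] * 4)

# A cell of measure zero does not change the entropy.
with_null_cell = partition_entropy([0.5, 0.5, 0.0])

# Countable partition with cell measures 2^-k, truncated at k = 60.
# Its entropy is finite even though it has infinitely many cells.
geometric = partition_entropy([2.0 ** -k for k in range(1, 61)])

print(uniform, with_null_cell, geometric)
```

The geometric example shows why the finite-entropy condition is a real restriction only for countable partitions: a finite partition always has finite entropy, whereas a countable one may or may not.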
If $P$ is a finite or countable partition of $\Omega$, then we construct a sequence of partitions by iterating the map:

$$P^{(n)}:=P\vee T^{-1}P\vee\dots\vee T^{-(n-1)}P$$

where $P\vee Q$ is the least upper bound partition, that is, the least refined partition that refines both $P$ and $Q$:

$$P\vee Q:=\{p\cap q:p\in P,\,q\in Q\}$$

Write $P(x)$ for the set in $P$ that contains $x$. So, for example, $P^{(n)}(x)$ is the $n$-letter initial segment of the $(P,T)$ name of $x$.

Write $I_{P}(x)$ for the information (in nats) about $x$ that we can recover if we know which element of the partition $P$ contains $x$:

$$I_{P}(x):=-\ln\mu(P(x))$$

Similarly, the conditional information of partition $P$, conditional on partition $Q$, about $x$, is

$$I_{P|Q}(x):=-\ln\frac{\mu((P\vee Q)(x))}{\mu(Q(x))}$$

$h_{T}(P)$ is the Kolmogorov-Sinai entropy

$$h_{T}(P):=\lim_{n}\frac{1}{n}H(P^{(n)})=\lim_{n}E_{x\sim\mu}\left[\frac{1}{n}I_{P^{(n)}}(x)\right]$$

In other words, by definition, $\frac{1}{n}I_{P^{(n)}}$ converges to $h_{T}(P)$ in expectation. The Shannon-McMillan-Breiman (SMB) theorem states that when $T$ is ergodic, the convergence also holds in $L^{1}$.

If $T$ is not necessarily ergodic, then the underlying probability space would be split up into multiple subsets, each invariant under $T$.
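The definitions above can be checked numerically in the simplest ergodic case. The sketch below assumes a specific system that the article does not name at this point: the Bernoulli shift on binary sequences with product measure, where each letter is 1 with probability `q`, and the partition P splits points by their first letter. Under that assumption the cell of P^(n) containing x is fixed by the first n letters of x, its measure is a product of letter probabilities, and h_T(P) equals the entropy of a single letter. All function names are hypothetical.

```python
import math
import random


def name_prefix(rng, q, n):
    """First n letters of the (P, T)-name of a randomly drawn point x."""
    return [1 if rng.random() < q else 0 for _ in range(n)]


def information_per_letter(prefix, q):
    """(1/n) * I_{P^(n)}(x) in nats for the cell fixed by `prefix`."""
    ones = sum(prefix)
    zeros = len(prefix) - ones
    # Measure of the cell: q^ones * (1 - q)^zeros, handled in log space.
    log_mu = ones * math.log(q) + zeros * math.log(1.0 - q)
    return -log_mu / len(prefix)


def ks_entropy(q):
    """h_T(P) for the Bernoulli shift with the first-letter partition."""
    return -q * math.log(q) - (1.0 - q) * math.log(1.0 - q)


def mean_abs_deviation(rng, q, n, trials):
    """Monte Carlo estimate of E|(1/n) I_{P^(n)}(x) - h_T(P)|."""
    h = ks_entropy(q)
    total = 0.0
    for _ in range(trials):
        total += abs(information_per_letter(name_prefix(rng, q, n), q) - h)
    return total / trials


rng = random.Random(0)  # fixed seed so the run is repeatable
q = 0.3
h = ks_entropy(q)

# One long sample path: the per-letter information should be close to h.
single_path = information_per_letter(name_prefix(rng, q, 100_000), q)

# Estimated L1 distance to the constant h for a short and a longer prefix.
l1_short = mean_abs_deviation(rng, q, 100, 200)
l1_long = mean_abs_deviation(rng, q, 2_500, 200)

print(h, single_path, l1_short, l1_long)
```

The estimated L1 distance shrinking as n grows is what the SMB theorem predicts for an ergodic map; a simulation of this kind illustrates the statement but does not prove it.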
In this case, we still have $L^{1}$ convergence to some function, but that function is no longer a constant function.

When $T$ is ergodic, the σ-algebra $\mathcal{I}$ of $T$-invariant sets is trivial, and so the function

$$x\mapsto E\left[\lim_{n}I_{P|\vee_{k=1}^{n}T^{-k}P}\;{\big|}\;\mathcal{I}\right]$$

simplifies into the constant function $x\mapsto E\left[\lim_{n}I_{P|\vee_{k=1}^{n}T^{-k}P}\right]$, which, by definition, equals $\lim_{n}H(P|\vee_{k=1}^{n}T^{-k}P)$, which in turn equals $h_{T}(P)$ by a proposition.

== Continuous-time stationary ergodic sources ==

Discrete-time functions can be interpolated to continuous-time functions. If such an interpolation $f$ is measurable, we may define the continuous-time stationary process accordingly as $\tilde{X}:=f\circ X$. If the asymptotic equipartition property holds for the discrete-time process, as in the i.i.d. or finite-valued stationary ergodic cases shown above, it automatically holds for the continuous-time stationary process derived from it by some measurable interpolation, i.e.,

$$-\frac{1}{n}\log p(\tilde{X}_{0}^{\tau})\to H(X)$$

where $n$ is the number of degrees of freedom in time $\tau$. $nH(X)/\tau$ and $H(X)$ are the entropy per unit time and per degree of freedom, respectively, as defined by Shannon.

An important class of such continuous-time stationary processes is the bandlimited stationary ergodic process with the sample space being a subset of the continuous $\mathcal{L}_{2}$ functions.
The asymptotic equipartition property holds if the process is white, in which case the time samples are i.i.d., or if there exists $T>1/(2W)$, where $W$ is the nominal bandwidth, such that the $T$-spaced time samples take values in a finite set, in which case we have the discrete-time finite-valued stationary ergodic process.

Any time-invariant operation also preserves the asymptotic equipartition property, stationarity and ergodicity. We may also easily turn a stationary process into a non-stationary one without losing the asymptotic equipartition property by nulling out a finite number of time samples in the process.
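The last remark, on nulling out finitely many samples, can be illustrated at the level of the time samples. The sketch below makes assumptions that are not in the article: the samples are i.i.d. bits equal to 1 with probability `q`, and "nulling out" means forcing the first `k` samples to 0. The nulled sequence still has independent symbols but is no longer identically distributed, so it is non-stationary, and its average entropy per symbol is (n - k)/n times the single-letter entropy. All names are hypothetical, and logarithms are natural.

```python
import math
import random


def null_prefix(samples, k):
    """Force the first k samples to 0, leaving the rest unchanged."""
    return [0] * k + list(samples[k:])


def information_rate(nulled, q, k):
    """-(1/n) ln p(Y_1, ..., Y_n) for the nulled sequence Y."""
    total = 0.0
    # The first k symbols are deterministic (probability 1) and add nothing.
    for y in nulled[k:]:
        total -= math.log(q if y == 1 else 1.0 - q)
    return total / len(nulled)


def average_entropy(q, n, k):
    """(1/n) H(Y_1, ..., Y_n) for independent symbols, k of them constant."""
    h = -q * math.log(q) - (1.0 - q) * math.log(1.0 - q)
    return (n - k) / n * h


rng = random.Random(1)  # fixed seed so the run is repeatable
q, k, n = 0.3, 50, 100_000

x = [1 if rng.random() < q else 0 for _ in range(n)]  # stationary i.i.d. samples
y = null_prefix(x, k)                                 # non-stationary version

rate = information_rate(y, q, k)
hbar = average_entropy(q, n, k)
h = average_entropy(q, n, 0)  # entropy per symbol of the original process

print(rate, hbar, h)
```

Because only `k` terms are removed from a sum of `n`, the gap between `hbar` and `h` is of order k/n and vanishes as n grows, which is why a finite number of nulled samples does not affect the limit.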