kb/data/en.wikipedia.org/wiki/Information_theory-4.md

title chunk source category tags date_saved instance
Information theory 5/7 https://en.wikipedia.org/wiki/Information_theory reference science, encyclopedia 2026-05-05T03:56:37.735412+00:00 kb-cron
$$I(X^{n}\to Y^{n}) \triangleq \sum_{i=1}^{n} I(X^{i};Y_{i}|Y^{i-1}),$$

where $I(X^{i};Y_{i}|Y^{i-1})$ is the conditional mutual information $I(X_{1},X_{2},\ldots ,X_{i};Y_{i}|Y_{1},Y_{2},\ldots ,Y_{i-1})$. In contrast to mutual information, directed information is not symmetric. The quantity $I(X^{n}\to Y^{n})$ measures the number of bits of information that are transmitted causally from $X^{n}$ to $Y^{n}$. Directed information has many applications in problems where causality plays an important role, such as the capacity of channels with feedback, the capacity of discrete memoryless networks with feedback, gambling with causal side information, compression with causal side information, real-time control communication settings, and statistical physics.
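The asymmetry can be checked numerically on a toy example. The sketch below is only an illustration under stated assumptions: the helper names (`entropy`, `cond_mi`, `directed_information`) and the toy distribution are hypothetical and not part of the article. It takes two i.i.d. fair bits $X_1, X_2$, sets $Y_1 = 0$ and $Y_2 = X_1$ (a one-step delay), and evaluates the sum in the definition directly from the joint distribution.

```python
from collections import defaultdict
from itertools import product
from math import log2


def entropy(pmf, idxs):
    """Entropy in bits of the marginal over the coordinates listed in idxs."""
    marginal = defaultdict(float)
    for outcome, p in pmf.items():
        marginal[tuple(outcome[i] for i in idxs)] += p
    return -sum(p * log2(p) for p in marginal.values() if p > 0)


def cond_mi(pmf, a, b, c):
    """I(A;B|C) = H(A,C) + H(B,C) - H(A,B,C) - H(C), with index lists a, b, c."""
    return (entropy(pmf, a + c) + entropy(pmf, b + c)
            - entropy(pmf, a + b + c) - entropy(pmf, c))


def directed_information(pmf, src, dst):
    """Sum over i of I(src_1..src_i ; dst_i | dst_1..dst_{i-1})."""
    return sum(cond_mi(pmf, src[:i + 1], [dst[i]], dst[:i])
               for i in range(len(src)))


# Toy joint distribution (an assumption for illustration only).
# Each outcome is (x1, x2, y1, y2) with X1, X2 i.i.d. fair bits, Y1 = 0, Y2 = X1.
pmf = {(x1, x2, 0, x1): 0.25 for x1, x2 in product((0, 1), repeat=2)}
X, Y = [0, 1], [2, 3]  # coordinate positions of X^2 and Y^2 in each outcome

forward = directed_information(pmf, X, Y)   # I(X^2 -> Y^2)
backward = directed_information(pmf, Y, X)  # I(Y^2 -> X^2)
print(forward, backward)
```

In this toy case the forward direction carries one bit (the delayed copy of $X_1$) while the reverse direction carries none, even though the ordinary mutual information $I(X^2;Y^2)$ is the same whichever way it is written.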

=== Other quantities ===

Other important information theoretic quantities include the Rényi entropy and the Tsallis entropy (generalizations of the concept of entropy), differential entropy (a generalization of quantities of information to continuous distributions), and the conditional mutual information. Also, pragmatic information has been proposed as a measure of how much information has been used in making a decision.
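As a rough illustration of how the Rényi and Tsallis entropies generalize Shannon entropy, the sketch below uses their standard definitions, $H_\alpha = \frac{1}{1-\alpha}\log_2\sum_i p_i^\alpha$ and $S_q = \frac{1-\sum_i p_i^q}{q-1}$, neither of which is spelled out in the text above. The function names and the example distribution are assumptions chosen for the demonstration.

```python
from math import log, log2


def shannon_entropy(p):
    """Shannon entropy in bits."""
    return -sum(x * log2(x) for x in p if x > 0)


def renyi_entropy(p, alpha):
    """Renyi entropy of order alpha in bits; alpha = 1 is taken as the Shannon limit."""
    if alpha == 1:
        return shannon_entropy(p)
    return log2(sum(x ** alpha for x in p if x > 0)) / (1 - alpha)


def tsallis_entropy(p, q):
    """Tsallis entropy with natural-log normalization; q = 1 is the Shannon limit in nats."""
    if q == 1:
        return -sum(x * log(x) for x in p if x > 0)
    return (1 - sum(x ** q for x in p if x > 0)) / (q - 1)


p = [0.5, 0.25, 0.25]  # example distribution (assumption)
for alpha in (0, 0.5, 1, 2):
    print(alpha, renyi_entropy(p, alpha), tsallis_entropy(p, alpha))
```

Both families recover Shannon entropy as the order parameter tends to 1 (in bits for the Rényi form above, in nats for the Tsallis form), and the Rényi entropy does not increase as $\alpha$ grows.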

== Coding theory ==

Coding theory is one of the most important and direct applications of information theory. It can be subdivided into source coding theory and channel coding theory. Using a statistical description for data, information theory quantifies the number of bits needed to describe the data, which is the information entropy of the source.
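A small worked example may help make "the number of bits needed to describe the data" concrete. The four-symbol source and the prefix code below are hypothetical, chosen so that the probabilities are powers of two; in that special case a prefix code can meet the entropy exactly.

```python
from math import log2


def entropy_bits(probs):
    """Shannon entropy in bits per symbol."""
    return -sum(p * log2(p) for p in probs if p > 0)


# Hypothetical memoryless source and a matching prefix code (assumptions).
source = {"a": 0.5, "b": 0.25, "c": 0.125, "d": 0.125}
code = {"a": "0", "b": "10", "c": "110", "d": "111"}

H = entropy_bits(source.values())
avg_len = sum(p * len(code[s]) for s, p in source.items())
print(H, avg_len)
```

Here the entropy and the average codeword length both equal 1.75 bits per symbol. For general probabilities the average length of a symbol-by-symbol prefix code can exceed the entropy, by less than one bit for an optimal code.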

* Data compression (source coding): There are two formulations of the compression problem:
  * Lossless data compression: the data must be reconstructed exactly;
  * Lossy data compression: allocates the bits needed to reconstruct the data within a specified fidelity level, measured by a distortion function. This subset of information theory is called rate-distortion theory.
* Error-correcting codes (channel coding): While data compression removes as much redundancy as possible, an error-correcting code adds just the right kind of redundancy (i.e., error correction) needed to transmit the data efficiently and faithfully across a noisy channel.

This division of coding theory into compression and transmission is justified by the information transmission theorems, or source-channel separation theorems, which justify the use of bits as the universal currency for information in many contexts. However, these theorems hold only when one transmitting user wishes to communicate with one receiving user. In scenarios with more than one transmitter (the multiple-access channel), more than one receiver (the broadcast channel), intermediary "helpers" (the relay channel), or more general networks, compression followed by transmission may no longer be optimal.

For general sources and channels that are not necessarily stationary or ergodic, information-spectrum methods characterize coding limits using asymptotic distributions of information density rather than only single-letter entropies or mutual information. A related problem, channel resolvability, asks what rate is required for channel inputs to approximate a target output distribution; Han and Sergio Verdú connected this approximation problem to coding theorems for general channels.

Hayashi later derived general nonasymptotic and asymptotic formulas connecting channel resolvability and identification capacity, and applied these formulas to secrecy analysis for the wiretap channel.
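The contrast drawn earlier in this section, between compression removing redundancy and error-correcting codes adding it, can be sketched in a few lines. This is a minimal illustration, not a practical scheme: the 3-fold repetition code, the chosen bit flips, and the helper names are assumptions, and `zlib` stands in for a generic lossless compressor.

```python
import zlib


def encode_repetition(bits, r=3):
    """Channel coding: add redundancy by repeating every bit r times."""
    return [b for b in bits for _ in range(r)]


def decode_repetition(received, r=3):
    """Majority vote over each block of r received bits."""
    return [int(sum(received[i:i + r]) > r // 2)
            for i in range(0, len(received), r)]


message = [1, 0, 1, 1]
sent = encode_repetition(message)
received = sent.copy()
received[1] ^= 1  # simulate channel noise: one flipped bit in the first block
received[9] ^= 1  # and one flipped bit in the last block
decoded = decode_repetition(received)
print(decoded == message)

# Source coding: remove redundancy from highly repetitive data.
text = b"ab" * 500
packed = zlib.compress(text)
print(len(text), len(packed), zlib.decompress(packed) == text)
```

The repetition code corrects any single flipped bit per block at the cost of tripling the transmission length, while the compressor shrinks the repetitive input and still reconstructs it exactly.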

=== Source theory ===

Any process that generates successive messages can be considered a source of information. A memoryless source is one in which each message is an independent and identically distributed random variable, whereas the properties of ergodicity and stationarity impose less restrictive constraints. All such sources are stochastic. These terms are well studied in their own right outside information theory.

==== Rate ====

Information rate is the average entropy per symbol. For memoryless sources, this is merely the entropy of each symbol, while, in the case of a stationary stochastic process, it is:

$$r=\lim_{n\to\infty} H(X_{n}|X_{n-1},X_{n-2},X_{n-3},\ldots);$$

that is, the conditional entropy of a symbol given all the previous symbols generated. For the more general case of a process that is not necessarily stationary, the average rate is:

$$r=\lim_{n\to\infty}\frac{1}{n}H(X_{1},X_{2},\dots ,X_{n});$$

that is, the limit of the joint entropy per symbol. For stationary sources, these two expressions give the same result. For a pair of jointly distributed processes, the information rate is defined as:

$$r=\lim_{n\to\infty}\frac{1}{n}I(X_{1},X_{2},\dots ,X_{n};Y_{1},Y_{2},\dots ,Y_{n}).$$
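As a sanity check on this definition, consider the special case (an assumption made here for illustration) in which the pairs $(X_i, Y_i)$ are independent and identically distributed across $i$, as when an i.i.d. input is sent over a memoryless channel. The joint mutual information then splits into per-symbol terms:

$$\begin{aligned}
I(X_{1},\dots ,X_{n};Y_{1},\dots ,Y_{n}) &= H(Y_{1},\dots ,Y_{n}) - H(Y_{1},\dots ,Y_{n}|X_{1},\dots ,X_{n}) \\
&= \sum_{i=1}^{n} H(Y_{i}) - \sum_{i=1}^{n} H(Y_{i}|X_{i}) \\
&= \sum_{i=1}^{n} I(X_{i};Y_{i}) = n\,I(X_{1};Y_{1}),
\end{aligned}$$

so that $r = I(X_{1};Y_{1})$: in this case the rate reduces to the single-symbol mutual information.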

It is common in information theory to speak of the "rate" or "entropy" of a language. This is appropriate, for example, when the source of information is English prose. The rate of a source of information is related to its redundancy and how well it can be compressed, the subject of source coding.
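The agreement of the two entropy-rate expressions for stationary sources can be illustrated with a two-state Markov chain. The sketch below rests on stated assumptions: the transition matrix `P` is hypothetical, the chain is started from its stationary distribution `pi`, and, because the chain is Markov, conditioning on the whole past reduces to conditioning on the previous symbol only.

```python
from itertools import product
from math import log2

# Hypothetical transition matrix and its stationary distribution (pi P = pi).
P = [[0.9, 0.1], [0.5, 0.5]]
pi = [5 / 6, 1 / 6]


def H(dist):
    """Shannon entropy in bits of a probability vector."""
    return -sum(p * log2(p) for p in dist if p > 0)


def conditional_rate():
    """H(X_n | X_{n-1}) for the stationary chain: sum_i pi_i * H(P[i])."""
    return sum(pi[i] * H(P[i]) for i in range(len(pi)))


def joint_entropy_per_symbol(n):
    """(1/n) H(X_1, ..., X_n), by brute-force enumeration of all 2^n sequences."""
    total = 0.0
    for seq in product(range(2), repeat=n):
        p = pi[seq[0]]
        for a, b in zip(seq, seq[1:]):
            p *= P[a][b]
        total -= p * log2(p)
    return total / n


print("conditional form:", conditional_rate())
for n in (1, 2, 4, 8, 12):
    print(n, joint_entropy_per_symbol(n))
```

For this chain the joint entropy per symbol equals $\bigl(H(\pi) + (n-1)\,h\bigr)/n$, where $h$ is the conditional form, so it approaches $h$ as $n$ grows, with a gap that shrinks like $1/n$.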

=== Channel capacity ===

Communication over a channel is the primary motivation of information theory. However, channels often fail to produce an exact reconstruction of a signal; noise, periods of silence, and other forms of signal corruption often degrade quality. Consider the communication process over a discrete channel. A simple model of the process is shown below: