kb/data/en.wikipedia.org/wiki/Information_theory-2.md

| title | chunk | source | category | tags | date_saved | instance |
| --- | --- | --- | --- | --- | --- | --- |
| Information theory | 3/7 | https://en.wikipedia.org/wiki/Information_theory | reference | science, encyclopedia | 2026-05-05T03:56:37.735412+00:00 | kb-cron |

Intuitively, the entropy $H(X)$ of a discrete random variable $X$ is a measure of the amount of uncertainty associated with the value of $X$ when only its distribution is known. A high entropy indicates the outcomes are more evenly distributed, making the result harder to predict. For example, if one transmits 1000 bits (0s and 1s), and the value of each of these bits is known to the receiver (has a specific value with certainty) ahead of transmission, no information is transmitted. If, however, each bit is independently and equally likely to be 0 or 1, 1000 shannons of information (more often called bits) have been transmitted.
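The 1000-bit example can be checked numerically. The following is a minimal Python sketch, not a reference implementation: the helper name `entropy_bits` is an assumption introduced here for illustration, and multiplying the per-bit entropy by 1000 relies on the bits being independent, as the example states.

```python
import math

def entropy_bits(probs):
    """Shannon entropy in bits (shannons) of a discrete distribution."""
    # Zero-probability outcomes are skipped, using the convention 0 * log 0 = 0.
    return sum(-p * math.log2(p) for p in probs if p > 0)

known_bit = [1.0, 0.0]  # the receiver already knows the value with certainty
fair_bit = [0.5, 0.5]   # 0 and 1 are equally likely

n_bits = 1000
# Assumes independent bits, so the per-bit entropies simply add up.
info_known = n_bits * entropy_bits(known_bit)
info_fair = n_bits * entropy_bits(fair_bit)

print(info_known, info_fair)
```

Under these assumptions the first quantity is zero (nothing is learned) and the second is 1000 shannons.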

==== Properties ====

A key property of entropy is that it is maximized when all the messages in the message space are equiprobable. For a source with $n$ possible symbols, where $p_i = \frac{1}{n}$ for all $i$, the entropy is given by:

$$H(X) = \log_2(n)$$

This maximum value represents the most unpredictable state. For a source that emits a sequence of $N$ symbols that are independent and identically distributed (i.i.d.), the total entropy of the message is $N \cdot H$ bits. If the source data symbols are identically distributed but not independent, the entropy of a message of length $N$ will be less than $N \cdot H$.
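Both properties can be illustrated with a small Python sketch. The distributions below are made-up values chosen only for illustration, and `entropy_bits` is a hypothetical helper name. The first part compares a uniform source with a non-uniform one over the same 8 symbols; the second builds a two-symbol message whose symbols are each fair bits but are not independent.

```python
import math

def entropy_bits(probs):
    """Shannon entropy in bits of a discrete distribution given as probabilities."""
    return sum(-p * math.log2(p) for p in probs if p > 0)

n = 8
uniform = [1 / n] * n
# Illustrative non-uniform distribution over the same 8 symbols (sums to 1).
skewed = [0.5, 0.2, 0.1, 0.05, 0.05, 0.04, 0.03, 0.03]

h_uniform = entropy_bits(uniform)  # log2(8) = 3 bits, the maximum for n = 8
h_skewed = entropy_bits(skewed)    # smaller than 3 bits

# Two-symbol message (N = 2): each symbol alone is a fair bit (H = 1 bit),
# but the second symbol repeats the first with probability 0.9.
joint = {(0, 0): 0.45, (0, 1): 0.05, (1, 0): 0.05, (1, 1): 0.45}
h_message = entropy_bits(joint.values())  # below N * H = 2 bits

print(h_uniform, h_skewed, h_message)
```

The dependent message carries roughly 1.47 bits rather than 2, because knowing the first symbol already removes much of the uncertainty about the second.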

==== Units ====

The choice of the logarithmic base in the entropy formula determines the unit of entropy used:

* A base-2 logarithm (as shown in the main formula) measures entropy in bits per symbol. This unit is also sometimes called the shannon in honor of Claude Shannon.
* A natural logarithm (base $e$) measures entropy in nats per symbol. This is often used in theoretical analysis as it avoids the need for scaling constants (like $\ln 2$) in derivations.
* Other bases are also possible. A base-10 logarithm measures entropy in decimal digits, or hartleys, per symbol. A base-256 logarithm measures entropy in bytes per symbol, since $2^8 = 256$.
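To make the effect of the base concrete, the following Python sketch evaluates the entropy of one source in all four units. The source (a uniformly distributed byte) and the function name `entropy` are assumptions chosen for illustration; only the standard-library `math.log(x, base)` is used.

```python
import math

def entropy(probs, base):
    """Shannon entropy of a discrete distribution, using logarithms to `base`."""
    return sum(-p * math.log(p, base) for p in probs if p > 0)

# Illustrative source: one symbol drawn uniformly from 256 possible values.
uniform_byte = [1 / 256] * 256

h_bits = entropy(uniform_byte, 2)        # bits (shannons) per symbol
h_nats = entropy(uniform_byte, math.e)   # nats per symbol
h_hartleys = entropy(uniform_byte, 10)   # hartleys (decimal digits) per symbol
h_bytes = entropy(uniform_byte, 256)     # bytes per symbol

print(h_bits, h_nats, h_hartleys, h_bytes)
```

Changing the base only rescales the value: for instance, the result in nats equals the result in bits multiplied by $\ln 2$.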

==== Binary Entropy Function ====

The special case of information entropy for a random variable with two outcomes (a Bernoulli trial) is the binary entropy function. This is typically calculated using a base-2 logarithm, and its unit is the shannon. If one outcome has probability $p$, the other has probability $1 - p$. The entropy is given by:

$$H_{\mathrm{b}}(p) = -p \log_2 p - (1 - p) \log_2 (1 - p)$$

This function is depicted in the plot shown above, reaching its maximum of 1 bit when p = 0.5, corresponding to the highest uncertainty.
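A direct Python transcription of the binary entropy function is sketched below. The function name `binary_entropy` and the explicit handling of the endpoints $p = 0$ and $p = 1$ (using the convention $0 \log 0 = 0$) are choices made here for illustration.

```python
import math

def binary_entropy(p):
    """Binary entropy H_b(p) in shannons for a two-outcome random variable."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie in [0, 1]")
    if p in (0.0, 1.0):
        return 0.0  # limiting value, using the convention 0 * log 0 = 0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

# Tabulate a few values; the function is symmetric around p = 0.5.
for p in (0.0, 0.1, 0.5, 0.9, 1.0):
    print(f"H_b({p}) = {binary_entropy(p):.4f}")
```

The loop shows the function rising from 0 at the endpoints to its peak of 1 shannon at $p = 0.5$.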

=== Joint entropy ===

The joint entropy of two discrete random variables $X$ and $Y$ is merely the entropy of their pairing: $(X, Y)$. This implies that if $X$ and $Y$ are independent, then their joint entropy is the sum of their individual entropies. For example, if $(X, Y)$ represents the position of a chess piece ($X$ the row and $Y$ the column), then the joint entropy of the row of the piece and the column of the piece will be the entropy of the position of the piece.

$$H(X, Y) = \mathbb{E}_{X,Y}[-\log p(x, y)] = -\sum_{x, y} p(x, y) \log p(x, y)$$

Despite similar notation, joint entropy should not be confused with cross-entropy. The joint entropy of $n$ discrete random variables $X^n \triangleq (X_1, X_2, \ldots, X_n)$ is

$$H(X^n) = H(X_1, X_2, \ldots, X_n) = \mathbb{E}\left[-\log P_{X_1, \ldots, X_n}(X_1, \ldots, X_n)\right]$$

This can also be represented as a summation over their joint probability mass function:

$$H(X^n) = -\sum_{x_1} \cdots \sum_{x_n} P_{X_1, \ldots, X_n}(x_1, \ldots, x_n) \log P_{X_1, \ldots, X_n}(x_1, \ldots, x_n).$$

Thus, joint entropy is just a subcase of entropy where the random variable is a vector taking values in the product space.
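The chess-piece example can be worked through in Python by treating the pair (row, column) as a single random variable over the product space. This is only a sketch under stated assumptions: the piece is taken to be equally likely on any square of an 8×8 board, row and column are taken to be independent, and `entropy_bits` is a hypothetical helper name.

```python
import math
from itertools import product

def entropy_bits(probs):
    """Shannon entropy in bits of a discrete distribution given as probabilities."""
    return sum(-p * math.log2(p) for p in probs if p > 0)

rows = range(8)
cols = range(8)
# Assumption: every row and every column is equally likely.
p_row = {r: 1 / 8 for r in rows}
p_col = {c: 1 / 8 for c in cols}

# Assumption: row and column are independent, so p(x, y) = p(x) * p(y).
p_joint = {(r, c): p_row[r] * p_col[c] for r, c in product(rows, cols)}

h_row = entropy_bits(p_row.values())    # 3 bits
h_col = entropy_bits(p_col.values())    # 3 bits
h_pos = entropy_bits(p_joint.values())  # entropy of the pair (X, Y)

print(h_row, h_col, h_pos)
```

Because the two coordinates are independent here, the joint entropy of the position equals the sum of the row and column entropies, 6 bits in total.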

=== Conditional entropy (equivocation) ===

The conditional entropy or conditional uncertainty of $X$ given random variable $Y$ (also called the equivocation of $X$ about $Y$) is the average conditional entropy over $Y$:

$$H(X|Y) = \mathbb{E}_Y[H(X|y)] = -\sum_{y \in Y} p(y) \sum_{x \in X} p(x|y) \log p(x|y) = -\sum_{x, y} p(x, y) \log p(x|y).$$

Because entropy can be conditioned on a random variable or on that random variable being a certain value, care should be taken not to confuse these two definitions of conditional entropy, the former of which is in more common use. A basic property of this form of conditional entropy is that:

$$H(X|Y) = H(X, Y) - H(Y).$$
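The distinction between conditioning on a specific value and conditioning on the random variable, together with the identity $H(X|Y) = H(X, Y) - H(Y)$, can be checked on a small example. The joint distribution below is hypothetical, and the names `entropy_bits` and `cond_entropy_given_value` are assumptions introduced for this sketch.

```python
import math
from collections import defaultdict

def entropy_bits(probs):
    """Shannon entropy in bits of a discrete distribution given as probabilities."""
    return sum(-p * math.log2(p) for p in probs if p > 0)

# Hypothetical joint distribution p(x, y) over binary X and Y, keyed by (x, y).
p_xy = {(0, 0): 0.4, (1, 0): 0.2, (0, 1): 0.1, (1, 1): 0.3}

# Marginal distribution p(y), obtained by summing over x.
p_y = defaultdict(float)
for (x, y), p in p_xy.items():
    p_y[y] += p

def cond_entropy_given_value(y):
    """H(X | Y = y): entropy of X conditioned on one specific value of Y."""
    return entropy_bits(p / p_y[y] for (_, y2), p in p_xy.items() if y2 == y)

# H(X | Y): the average of H(X | Y = y), weighted by p(y).
h_x_given_y = sum(p_y[y] * cond_entropy_given_value(y) for y in p_y)

# The same quantity via the identity H(X | Y) = H(X, Y) - H(Y).
h_chain = entropy_bits(p_xy.values()) - entropy_bits(p_y.values())

print(h_x_given_y, h_chain)
```

For this distribution both routes give about 0.875 bits, while the per-value entropies $H(X|Y=0)$ and $H(X|Y=1)$ differ from each other and from that average.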