| title | chunk | source | category | tags | date_saved | instance |
|---|---|---|---|---|---|---|
| Information theory | 2/7 | https://en.wikipedia.org/wiki/Information_theory | reference | science, encyclopedia | 2026-05-05T03:56:37.735412+00:00 | kb-cron |
The landmark event establishing the discipline of information theory and bringing it to immediate worldwide attention was the publication of Claude Shannon's classic paper "A Mathematical Theory of Communication" in the Bell System Technical Journal in July and October 1948. Historian James Gleick rated the paper as the most important development of 1948, noting that the paper was "even more profound and more fundamental" than the transistor. Shannon came to be known as the "father of information theory". He outlined some of his initial ideas of information theory as early as 1939 in a letter to Vannevar Bush.
Prior to this paper, limited information-theoretic ideas had been developed at Bell Labs, all implicitly assuming events of equal probability. Harry Nyquist's 1924 paper, Certain Factors Affecting Telegraph Speed, contains a theoretical section quantifying "intelligence" and the "line speed" at which it can be transmitted by a communication system, giving the relation W = K log m (recalling the Boltzmann constant), where W is the speed of transmission of intelligence, m is the number of different voltage levels to choose from at each time step, and K is a constant. Ralph Hartley's 1928 paper, Transmission of Information, uses the word information as a measurable quantity, reflecting the receiver's ability to distinguish one sequence of symbols from any other, thus quantifying information as H = log S^n = n log S, where S is the number of possible symbols and n is the number of symbols in a transmission. The unit of information was therefore the decimal digit, which has since sometimes been called the hartley in his honor as a unit or measure of information. Alan Turing in 1940 used similar ideas as part of the statistical analysis of the breaking of the German Second World War Enigma ciphers.
Much of the mathematics behind information theory with events of different probabilities was developed for the field of thermodynamics by Ludwig Boltzmann and J. Willard Gibbs. Connections between information-theoretic entropy and thermodynamic entropy, including the important contributions by Rolf Landauer in the 1960s, are explored in Entropy in thermodynamics and information theory.
In Shannon's groundbreaking paper, the work for which had been substantially completed at Bell Labs by the end of 1944, Shannon for the first time introduced the qualitative and quantitative model of communication as a statistical process underlying information theory, opening with the assertion:
"The fundamental problem of communication is that of reproducing at one point, either exactly or approximately, a message selected at another point." With it came the ideas of:
The information entropy and redundancy of a source, and its relevance through the source coding theorem;
The mutual information, and the channel capacity of a noisy channel, including the promise of perfect loss-free communication given by the noisy-channel coding theorem;
The practical result of the Shannon–Hartley law for the channel capacity of a Gaussian channel; as well as
The bit, a new way of seeing the most fundamental unit of information.
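Hartley's pre-Shannon measure H = n log S, from the historical account above, can be checked numerically. The following is a minimal Python sketch; the function name `hartley_information` and the example alphabet sizes are illustrative assumptions, not taken from Hartley's paper.

```python
import math

def hartley_information(num_symbols: int, length: int, base: float = 10.0) -> float:
    """Hartley's measure H = n * log(S) for a message of n symbols,
    each chosen from S equally likely possibilities.
    With base 10 the result is in decimal digits (hartleys)."""
    if num_symbols < 1 or length < 0:
        raise ValueError("need num_symbols >= 1 and length >= 0")
    return length * math.log(num_symbols, base)

# A 5-letter message over a 26-letter alphabet (hypothetical example).
S, n = 26, 5
print(hartley_information(S, n))   # n * log10(S)
print(math.log10(S ** n))          # log10(S^n), the same quantity

# The same formula with base 2 gives bits: 8 binary symbols carry 8 bits.
print(hartley_information(2, 8, base=2.0))
```

Because every symbol is assumed equally likely, this measure depends only on the alphabet size and message length, which is the limitation Shannon's probabilistic treatment removed.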
== Quantities of information ==
Information theory is based on probability theory and statistics, where quantified information is usually described in terms of bits. Information theory often concerns itself with measures of information of the distributions associated with random variables. One of the most important measures is entropy, which forms the building block of many other measures. Entropy quantifies the information in a single random variable. Another useful concept is mutual information, defined on two random variables, which quantifies the dependence between those variables by comparing their conditional and unconditional distributions. Entropy is a property of the probability distribution of a random variable and gives a limit on the rate at which data generated by independent samples with the given distribution can be reliably compressed. Mutual information is a property of the joint distribution of two random variables and is the maximum rate of reliable communication across a noisy channel in the limit of long block lengths, when the channel statistics are determined by the joint distribution.
The choice of logarithmic base in the following formulae determines the unit of information entropy that is used. A common unit of information is the bit or shannon, based on the binary logarithm. Other units include the nat, which is based on the natural logarithm, and the decimal digit, which is based on the common logarithm.
In what follows, an expression of the form p log p is considered by convention to be equal to zero whenever p = 0. This is justified because
$\lim_{p \rightarrow 0^{+}} p \log p = 0$
for any logarithmic base.
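To make the zero-probability convention and the role of the logarithmic base concrete, here is a minimal Python sketch. The helper name `plogp` and the sample probabilities are illustrative assumptions.

```python
import math

def plogp(p: float, base: float = 2.0) -> float:
    """Return p * log_base(p), treating the value as 0 when p == 0."""
    if p < 0.0 or p > 1.0:
        raise ValueError("p must be a probability in [0, 1]")
    if p == 0.0:
        return 0.0  # convention, justified by the limit as p -> 0+
    return p * math.log(p, base)

# The term shrinks toward 0 as p approaches 0 from above.
for p in (1e-2, 1e-4, 1e-8):
    print(p, plogp(p))

# Changing the base only rescales the result: nats = bits * ln(2).
bits = -plogp(0.25, base=2.0)
nats = -plogp(0.25, base=math.e)
print(bits, nats, bits * math.log(2))
```

The last line illustrates that switching between bits, nats, and decimal digits multiplies every quantity by a constant factor, so the choice of unit does not affect comparisons between distributions.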
=== Entropy of an information source ===
Based on the probability mass function of a source, the Shannon entropy H, in units of bits per symbol, is defined as the expected value of the information content of the symbols. The amount of information conveyed by an individual source symbol $x_{i}$ with probability $p_{i}$ is known as its self-information or surprisal, $I(p_{i})$. This quantity is defined as:
$$I(p_{i}) = -\log_{2}(p_{i})$$
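As a quick numerical check of the definition above, the following Python sketch evaluates the surprisal for a few probabilities. The function name `surprisal_bits` and the chosen probabilities are illustrative assumptions.

```python
import math

def surprisal_bits(p: float) -> float:
    """Self-information I(p) = -log2(p) of an outcome with probability p, in bits."""
    if not 0.0 < p <= 1.0:
        raise ValueError("p must lie in (0, 1]")
    return -math.log2(p)

# A fair coin flip (p = 1/2) carries one bit.
print(surprisal_bits(0.5))
# One of eight equally likely symbols (p = 1/8) carries three bits.
print(surprisal_bits(0.125))
# A rare outcome carries more information than a common one.
print(surprisal_bits(0.01), surprisal_bits(0.99))
```

The sketch rejects p = 0 because the surprisal is unbounded there; the zero-probability convention applies to the product p log p, not to the surprisal on its own.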
A less probable symbol has a larger surprisal, meaning its occurrence provides more information. The entropy $H$ is the weighted average of the surprisal of all possible symbols from the source's probability distribution:
$$H(X) = \mathbb{E}_{X}[I(x)] = \sum_{i} p_{i} I(p_{i}) = -\sum_{i} p_{i} \log_{2}(p_{i})$$
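The entropy formula can be sketched directly in Python. This is a minimal illustration, not a reference implementation; the function name `shannon_entropy_bits` and the example distributions are assumptions made for the example.

```python
import math
from typing import Sequence

def shannon_entropy_bits(probs: Sequence[float]) -> float:
    """H(X) = -sum_i p_i * log2(p_i), in bits per symbol.
    Terms with p_i == 0 are skipped, matching the p log p = 0 convention."""
    if any(p < 0.0 for p in probs):
        raise ValueError("probabilities must be non-negative")
    if not math.isclose(sum(probs), 1.0, abs_tol=1e-9):
        raise ValueError("probabilities must sum to 1")
    return -sum(p * math.log2(p) for p in probs if p > 0.0)

fair_coin = [0.5, 0.5]
biased_coin = [0.9, 0.1]
four_symbols = [0.5, 0.25, 0.125, 0.125]

print(shannon_entropy_bits(fair_coin))     # about 1 bit per symbol
print(shannon_entropy_bits(biased_coin))   # less than 1 bit: outcomes are more predictable
print(shannon_entropy_bits(four_symbols))  # 0.5*1 + 0.25*2 + 2*(0.125*3) = 1.75 bits
```

The biased coin has lower entropy than the fair coin because its outcomes are more predictable, so on average each symbol conveys less information.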