Entropy is a concept in both thermodynamics (see thermodynamic entropy) and information theory. The two concepts do have something in common, although it takes a thorough understanding of both fields for the connection to become apparent.

Claude E. Shannon defined a measure of entropy (H = - Σ pi log pi) that, when applied to an information source, could determine the channel capacity required to transmit the source as encoded binary digits. Shannon's entropy measure came to be taken as a measure of the information contained in a message, as opposed to the portion of the message that is strictly determined (and hence predictable) by inherent structures, such as redundancy in language structure or statistical properties relating to the occurrence frequencies of letter or word pairs, triplets, and so on (see Markov chains).
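As an illustrative sketch (not part of the original text), Shannon's formula can be computed directly from a probability distribution; the function name and the choice of base-2 logarithm are assumptions made here for concreteness:

```python
from math import log2

def shannon_entropy(probs):
    """H = -sum(p * log2(p)) over a distribution, in bits.

    `probs` is assumed to be a sequence of probabilities summing to 1;
    zero-probability outcomes contribute nothing to the sum.
    """
    return sum(-p * log2(p) for p in probs if p > 0)

# A fair coin carries exactly 1 bit per toss:
print(shannon_entropy([0.5, 0.5]))  # → 1.0
```

Using base-2 logarithms makes the unit of entropy the bit; other bases simply rescale the result.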

Shannon's definition of entropy is closely related to thermodynamic entropy as defined by physicists and many chemists. Boltzmann and Gibbs did considerable work on statistical thermodynamics, which became the inspiration for adopting the term "entropy" in information theory. There are deeper relationships between thermodynamic and informational entropy: for example, Maxwell's demon uses information to reverse thermodynamic entropy, but acquiring that information exactly balances out the thermodynamic gain the demon would otherwise achieve.

In information theory, entropy is conceptually the actual amount of (information-theoretic) information in a piece of data. Entirely random byte data has the maximum possible entropy of 8 bits per byte, since no byte can be predicted from the ones that precede it. A long string of A's has an entropy of 0, since the next character will always be an 'A'. The entropy of English text is about 1.5 bits per character (try compressing it with the PPM compression algorithm!). The entropy rate of a data source is the average number of bits per symbol needed to encode it. Note that entropy does not simply equal the number of bits in the data:
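The figures above can be checked numerically. Here is a minimal sketch that estimates per-symbol entropy from observed character frequencies (a maximum-likelihood estimate; the helper name is illustrative):

```python
from collections import Counter
from math import log2

def estimated_entropy(data):
    """Estimate entropy in bits per symbol from observed frequencies."""
    counts = Counter(data)
    n = len(data)
    return sum(-(c / n) * log2(c / n) for c in counts.values())

print(estimated_entropy("AAAAAAAA"))         # → 0.0 (the next character is always 'A')
print(estimated_entropy(bytes(range(256))))  # → 8.0 (all 256 byte values equally likely)
```

A frequency estimate like this only captures order-0 structure; real English text measures closer to 4 bits per character this way, and the 1.5 bits/character figure only emerges once longer-range dependencies are modeled.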

  1. Many data bits may not convey information. For example, data structures often store information redundantly, or have identical sections regardless of the information in the data structure.
  2. The amount of entropy is not always an integer number of bits.
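To make the second point concrete, a small sketch (the 0.9 bias is an arbitrary illustrative value): a biased coin carries a fractional number of bits per toss.

```python
from math import log2

# Entropy of a coin landing heads with probability 0.9:
p = 0.9
h = -(p * log2(p) + (1 - p) * log2(1 - p))
print(round(h, 3))  # → 0.469 bits per toss, not an integer
```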

Entropy effectively gives the limit of the strongest lossless compression possible, which can be realized in theory by using the typical set, or in practice using Huffman, Lempel-Ziv, or arithmetic coding. The definition of entropy is based on a Markov model of text. For an order-0 source (each character is selected independently of the preceding characters), the entropy is:

H = - Σi pi log2 pi

where pi is the probability of character i. For a first-order Markov source (one in which the probability of each character depends on the immediately preceding character), the entropy rate is:

H(S) = - Σi pi Σj pi(j) log2 pi(j)

where i is a state (certain preceding characters) and pi(j) is the probability of j given i as the previous character(s).
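The entropy rate of such a source can be estimated from sample text by counting adjacent character pairs. This is a sketch under the assumption that state and transition probabilities are estimated empirically from the sample; the function name is illustrative:

```python
from collections import Counter
from math import log2

def markov_entropy_rate(text):
    """Estimate the entropy rate of a source where each character
    depends on the one before it, from a sample string."""
    pair_counts = Counter(zip(text, text[1:]))  # counts of (i, j) transitions
    state_counts = Counter(text[:-1])           # counts of preceding states i
    n = len(text) - 1
    h = 0.0
    for (i, j), c in pair_counts.items():
        p_i = state_counts[i] / n               # probability of state i
        p_ij = c / state_counts[i]              # probability of j given i
        h += -p_i * p_ij * log2(p_ij)
    return h

print(markov_entropy_rate("ABABABABAB"))  # → 0.0: each character fully determines the next
```

Perfectly alternating text has entropy rate 0 under this model even though its order-0 character frequencies look uniform, which is exactly the structure the conditional probabilities capture.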
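Returning to the compression claim above: a minimal Huffman-coding sketch (the tie-breaking scheme and names here are assumptions, not a reference implementation) shows the average code length landing between the entropy and entropy + 1 bits per symbol:

```python
import heapq
from collections import Counter
from math import log2

def huffman_code_lengths(text):
    """Return {symbol: code length in bits} for a Huffman code built from `text`."""
    freq = Counter(text)
    if len(freq) == 1:                  # degenerate one-symbol alphabet
        return {next(iter(freq)): 1}
    # Heap entries: (total weight, unique tiebreaker, {symbol: depth so far}).
    heap = [(w, i, {s: 0}) for i, (s, w) in enumerate(freq.items())]
    heapq.heapify(heap)
    tiebreak = len(heap)
    while len(heap) > 1:
        w1, _, d1 = heapq.heappop(heap)     # merge the two lightest subtrees;
        w2, _, d2 = heapq.heappop(heap)     # every symbol inside them gets one bit deeper
        merged = {s: d + 1 for s, d in {**d1, **d2}.items()}
        heapq.heappush(heap, (w1 + w2, tiebreak, merged))
        tiebreak += 1
    return heap[0][2]

text = "aaabbc"
lengths = huffman_code_lengths(text)
freq = Counter(text)
n = len(text)
avg = sum(freq[s] * lengths[s] for s in freq) / n
entropy = sum(-(c / n) * log2(c / n) for c in freq.values())
print(avg)  # → 1.5 bits/symbol, versus an entropy of about 1.459
```

Huffman assigns each symbol a whole number of bits, so it can overshoot the entropy by up to one bit per symbol; arithmetic coding avoids that gap by effectively assigning fractional bits.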