
Measuring information: Shannon's H

    In 1948 Claude Shannon developed a measure for the information content of a message or dataset: Shannon's information entropy (H):

    $$H(X) = -\sum_{i=1}^{n} P(x_i) \log_b P(x_i)$$

    Although different applications use different log bases (b), here we will use base 2. Hence:

    $$H(X) = -\sum_{i=1}^{n} P(x_i) \log_2 P(x_i)$$

    To be clear, H measures entropy, i.e., uncertainty (the 'non-information', if you want) rather than the information itself. Here are some examples where we assume equal probability for each possibility $x_i$ within example X.

    Example (X)             Possibilities (x_i)   Probability of each (P(x_i))   H(X)
    One day we will die     1                     1                              0
    Coin toss               2                     .5                             1
    Rock, paper, scissors   3                     .333...                        1.585
    4-sided die roll        4                     .25                            2
    5-sided die roll        5                     .20                            2.322
    6-sided die roll        6                     .166...                        2.585

    Note that as the number of possibilities increases, so does the entropy. Assuming that we all will die one day (P(we will die) = 1.0), there is zero entropy (no uncertainty; H = 0) for that case. The entropy of a coin toss (2 possibilities) is 1, the entropy of a 4-sided die roll is 2, etc.

    This makes intuitive sense. The uncertainty of a case with two possibilities is less than that of a case with 3, 4, or more possibilities.
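    To make this concrete, here is a minimal Python sketch (the helper name shannon_entropy is ours, not from any standard library) that computes H for a list of probabilities and reproduces the uniform cases from the table:

```python
import math

def shannon_entropy(probs, base=2):
    """Shannon's H for a list of probabilities that sum to 1."""
    return sum(-p * math.log(p, base) for p in probs if p > 0)

# Uniform cases from the table: n equally likely possibilities.
for n in (1, 2, 3, 4, 5, 6):
    print(n, round(shannon_entropy([1 / n] * n), 3))
# 1 0.0
# 2 1.0
# 3 1.585
# 4 2.0
# 5 2.322
# 6 2.585
```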

    All of the above cases, however, are rather trivial. Since the possibilities within each case all have the identical probability $P(x_i) = 1/n$ associated with them, the formula for H reduces to the simple negative of the log: $$H(X) = -\log_2 P(x_i) = -\log_2 \frac{1}{n} = \log_2 n$$ Below we have plotted $\log_2$ and H for 0 ≤ x ≤ 1.0.

    The choice of log2 is desirable because it indicates, for a dataset, how many binary (yes/no) choices we have to make to pin down a single case. Consider, for instance, a dataset with the following three variables:

    • Employment status: 50% of the cases (people) employed; 50% unemployed
    • Age: 50% old; 50% young
    • Hair color: 25% brown; 25% black; 25% blonde; 25% red
    The log2-based information entropy for this dataset equals 4.0, indicating that four yes/no questions are (on average) needed to determine the employment/age/hair-color profile of a case (person) in the dataset: one question for employment, one for age, and two for hair color.
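    If we assume the three variables are independent, the entropy of the joint profile is simply the sum of the per-variable entropies. A quick sketch (repeating the small helper from above):

```python
import math

def shannon_entropy(probs, base=2):
    return sum(-p * math.log(p, base) for p in probs if p > 0)

employment = [0.5, 0.5]                # employed / unemployed
age        = [0.5, 0.5]                # old / young
hair       = [0.25, 0.25, 0.25, 0.25]  # brown / black / blonde / red

# For independent variables, the joint entropy is the sum of the parts.
print(sum(shannon_entropy(v) for v in (employment, age, hair)))  # 1 + 1 + 2 = 4.0
```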

    This also means that in order to store the information of one person we will need at least four bits; i.e., four on/off or yes/no switches.
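    One hypothetical fixed-length code that achieves this four-bit minimum assigns one bit to employment, one bit to age, and two bits to hair color (the category-to-bits tables below are our own illustration):

```python
# Hypothetical 4-bit fixed-length code for one person's profile.
EMPLOYMENT = {"employed": "0", "unemployed": "1"}
AGE        = {"young": "0", "old": "1"}
HAIR       = {"brown": "00", "black": "01", "blonde": "10", "red": "11"}

def encode(employment, age, hair):
    return EMPLOYMENT[employment] + AGE[age] + HAIR[hair]

print(encode("employed", "old", "red"))  # 0111 -- exactly four bits
```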

    Things become more interesting when the probabilities are no longer equally divided, for instance when the dataset contains more employed than unemployed people, unequal percentages of hair colors, etc.
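    For instance, if 90% of the people in the dataset are employed, the employment variable carries far less than one bit of uncertainty. A quick check with the same helper:

```python
import math

def shannon_entropy(probs, base=2):
    return sum(-p * math.log(p, base) for p in probs if p > 0)

print(round(shannon_entropy([0.9, 0.1]), 3))  # 0.469 -- quite predictable
print(round(shannon_entropy([0.5, 0.5]), 3))  # 1.0   -- maximally uncertain
print(round(shannon_entropy([1.0]), 3))       # 0.0   -- no uncertainty at all
```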

    We have set this up below for you to play with. Play with the probabilities of the categories (make sure that they sum to 1.0 for each variable) and see what happens to H. You might even try to see what happens when the dataset contains only employed or only unemployed people, or only old or only young people, and what effect that has on H and thus on the number of bits needed to store the information of one person.

    [Interactive calculator: enter P(employed)/P(unemployed), P(young)/P(old), and P(brown)/P(black)/P(blonde)/P(red); the resulting H is displayed.]
