

Lesson 2: Modern information theory

# Compression codes

What is the limit of compression? Created by Brit Cruise.

## Want to join the conversation?

• Is audio data pseudo-random from a binary point of view? Is audio compression done by throwing away non-audible frequencies from a fast Fourier transform?
• A lot of audio data is reasonably compressible, which suggests that it would be a poor candidate for randomness (we shouldn't be able to compress "random" data). However, it would be possible to find audio signals that are good candidates for randomness.

For lossy compression, for both images and audio signals, you are basically correct in that they use the idea of transforming the signals into the frequency domain and discard frequencies that won't be noticed. However, the transform that is used is typically the Discrete Cosine Transform (DCT), which is similar to the Discrete Fourier Transform. DCT is used for .mp3 and .jpg files.

For more details on DCT see:
http://en.wikipedia.org/wiki/Discrete_cosine_transform

Hope this makes sense
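To make the frequency-domain idea concrete, here is a rough, unoptimized sketch of the DCT-II in plain Python. The signal, block size, and the cutoff of 4 coefficients are arbitrary choices for illustration, not what .mp3 or .jpg actually use:

```python
import math

def dct_ii(x):
    """Naive, unnormalized DCT-II: project a signal onto cosine frequencies."""
    N = len(x)
    return [sum(x[n] * math.cos(math.pi / N * (n + 0.5) * k) for n in range(N))
            for k in range(N)]

# A smooth, low-frequency signal: one full cycle of a cosine.
N = 32
signal = [math.cos(2 * math.pi * n / N) for n in range(N)]
coeffs = dct_ii(signal)

# Nearly all of the signal's energy lands in the first few coefficients,
# so the high-frequency ones can be discarded with little perceptible loss.
low_energy = sum(c * c for c in coeffs[:4])
high_energy = sum(c * c for c in coeffs[4:])
print(low_energy > 100 * high_energy)  # True: the energy is compacted
```

Real codecs work on blocks of samples (or 8x8 pixel tiles for JPEG) and then quantize the coefficients, but the energy-compaction effect shown here is the reason the transform helps.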
• "So for this to work, you need to introduce letter spaces, which cancel out savings during transmissions."
How much are letter spaces accounted for in compression codes? They don't seem to be accounted for when compressing the message from 2000 bits to 1750 bits, but they are accounted for when calculating savings. Letter spaces don't seem to be bits, but they give information by showing where one letter ends and another begins, so how would you measure them information-wise?

"...And Claude Shannon was the first to claim that the limit of compression would be the entropy of the message source."
This claim is intuitive, but if we compress D further to just 0 (which doesn't seem to create any problems, although that's not a Huffman coding anymore), wouldn't that compress the message to 1.5 bits per letter, which is smaller than the entropy? Or does that throw away information?
I think that compressing D to just 0 would cause some kind of problem, because it otherwise contradicts Shannon's claim and is thus counter-intuitive: the smallest average number of bits one letter takes up should be the number of bits of information one letter has, and not any smaller. Also, if I were right, the claim most likely wouldn't even be mentioned without stating, or at least implying, that it wasn't right.
• Noble,
Your confusion is understandable: you've been caught out by a parenthetical aside in the lecture.

Professor Brit says that Huffman coding is optimal. At one point he says, "for examples of the following type. You cannot beat it. Go ahead and try. For an example..." This is where the aside begins.

After this short introduction, he demonstrates an unsuccessful attempt to beat Huffman coding: he shortens the code for "D" to a single 0. The conclusion of the attempt is that it cannot work, because it generates ambiguity. He continues discussing the "reduced code" and concludes: "So for this to work (that is, to resolve the ambiguity), you'd need to introduce letter spaces, which cancel out any savings during transmission."

With that last phrase, he closed the aside. The attempt to beat Huffman coding was unsuccessful. The proposed code (the one that shortens "D" to a single 0) was not only worse than Huffman's, it was ineffective: the letter spaces would cancel out any savings.

The rest of the lecture is entirely separate from this unsuccessful attempt to beat Huffman coding (the proposed code does not even compress the information). The aside and the letter spaces are left behind, and Professor Brit goes on to explain compression with Huffman coding.

I hope it helps :)
• Does radio send information using Binary Digits? Or do the waves cause the receiver to vibrate at a frequency conducive for analog decompression into audio waves?
• Yes! It does now, but traditional radio was born out of the latter. Since FM can involve several frequencies at once, it allowed for developments such as stereo audio and subcarriers that transmit digital information through the waves (i.e., text info for digital receivers). Nowadays, through a service called "HD Radio" from iBiquity, digital audio data is sent alongside the standard analog signal for listeners whose digital receivers can decode the information. The digital signal has a much cleaner sound because the receiver is effectively making 100% yes-or-no decisions, throwing out all the low-level junk noise and distortion of the analog airwaves.
• what does "entropy" mean?
• Entropy is a measure of disorder or randomness. In information theory, it specifically refers to the average amount of information per message, which increases as the message source gets more random.

For example, the machine in the videos that is equally likely to give you any of four symbols has more entropy than a machine that gives different symbols at different rates. I'd suggest rewatching the Information Entropy video.

Entropy is often used in other areas of science, especially thermodynamics. It's the measurement of disorder or chaos in a system, and relates to the ability of a system to do useful work. Over time, entropy always increases in a closed system as the system becomes more disorganized.

For example, if you drop food colouring into a glass of water, the food colouring starts off fairly ordered (all in the spot where you dropped it). However, disorder and entropy increase over time: the food colouring slowly spreads out until it is dispersed evenly throughout the water.
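The information-theoretic meaning can be sketched in a few lines of Python; the two example distributions below are simply a uniform four-symbol source and a skewed one like the machine in the videos:

```python
import math

def entropy(probs):
    """Shannon entropy in bits: the average information per symbol."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

# Four equally likely symbols: maximum uncertainty for a 4-symbol source.
print(entropy([0.25, 0.25, 0.25, 0.25]))   # 2.0 bits per symbol

# A skewed source is more predictable, so it carries less information per symbol.
print(entropy([0.5, 0.25, 0.125, 0.125]))  # 1.75 bits per symbol
```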
• The probability distribution of the four letters in the video worked out well in that 2*12.5 = 25 and 2*25 = 50. But, what if the probabilities weren't such clean multiples of each other? What if you had probabilities of 0.41, 0.31, 0.17, and 0.11 respectively for A, B, C, and D? How do you group the nodes when their probabilities don't imply obvious divisions?
• Here's how you would generate a Huffman coding for the following nodes:
A=0.41, B=0.31, C=0.17, D=0.11

Find the two least probable nodes: C and D
Group them together and create a new node, whose probability is their sum:
We create: Node (C,D) with probability 0.17+ 0.11 = 0.28

Now we have:
A=0.41, B=0.31, (C,D)=0.28

Find the two least probable nodes: B and (C,D)
Group them together and create a new node, whose probability is their sum:
We create: Node (B,(C,D)) with probability 0.31+ 0.28 = 0.59

Now we have:
A=0.41, (B,(C,D))=0.59

Find the two least probable nodes: A and (B,(C,D))
Group them together and create a new node, whose probability is their sum:
We create: Node (A,(B,(C,D))) with probability 0.41+ 0.59 = 1.00

We have just created this Huffman coding tree:

```
        /\
       A  *
         / \
        *   B
       / \
      D   C
```

or with probabilities shown:
```
  (A,(B,(C,D)))=1.00
       /      \
  A=0.41   (B,(C,D))=0.59
             /      \
      (C,D)=0.28   B=0.31
        /    \
   D=0.11   C=0.17
```

If we use 0 for each left branch and 1 for each right branch the code is
A is 0 (length of 1 bit)
B is 11 (length of 2 bits)
C is 101 (length of 3 bits)
D is 100 (length of 3 bits)


The average bit size per symbol is:
0.41 * 1 + 0.31 * 2 + 0.17 * 3 + 0.11 * 3 = 1.87

The entropy of the probability distribution is:
-0.41* log_2(0.41) -0.31* log_2(0.31) -0.17* log_2(0.17) -0.11 * log_2(0.11) = 1.836

The difference between the entropy of the probability distribution and our average size
per symbol is caused by the probabilities not being powers of 2.

However, this is the best we can do using a prefix free code (assuming that we stick to binary
branching and only encode one symbol at a time), since the Huffman code is the provably
optimal prefix free code.

If instead we tried the distribution in the video with
A=1/2, B=1/8, C=1/8, D=1/4

The average bit size per symbol is:
1/2 * 1 + 1/4 * 2 + 1/8 * 3 + 1/8 * 3 = 1.75

The entropy of the probability distribution is:
-1/2* log_2(1/2) -1/4* log_2(1/4) -1/8 * log_2(1/8) -1/8 * log_2(1/8) = 1.75

Here, the average bit size per symbol actually matches the entropy of the probability
distribution (because we used nice powers of two for our probabilities).

So if our probabilities are powers of 2, our Huffman coding will be perfect; if not, it won't be, but we still won't be able to find a better prefix-free code (assuming that we stick to binary branching and only encode one symbol at a time).

Hope this makes sense
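The merge procedure described above can be sketched in Python with a priority queue. This is a minimal illustration of the algorithm, not production code; the tiebreaker counter is my own choice to keep the heap comparisons well defined:

```python
import heapq

def huffman_codes(probs):
    """Build Huffman codes by repeatedly merging the two least probable nodes."""
    # Heap entries: (probability, tiebreaker, {symbol: partial code so far}).
    heap = [(p, i, {sym: ""}) for i, (sym, p) in enumerate(probs.items())]
    heapq.heapify(heap)
    count = len(heap)
    while len(heap) > 1:
        p1, _, left = heapq.heappop(heap)    # least probable node
        p2, _, right = heapq.heappop(heap)   # second least probable node
        # Prepend a branch bit: 0 for the left subtree, 1 for the right.
        merged = {s: "0" + c for s, c in left.items()}
        merged.update({s: "1" + c for s, c in right.items()})
        heapq.heappush(heap, (p1 + p2, count, merged))
        count += 1
    return heap[0][2]

probs = {"A": 0.41, "B": 0.31, "C": 0.17, "D": 0.11}
codes = huffman_codes(probs)
avg = sum(probs[s] * len(codes[s]) for s in probs)
print(codes)  # same codes as derived above: A=0, B=11, C=101, D=100
print(avg)    # 1.87 bits per symbol on average
```

Running it on the 0.41/0.31/0.17/0.11 distribution reproduces the merges traced above (C with D, then that node with B, then A) and the 1.87-bit average.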
• This is really tricky stuff; Shannon was an incredibly deep thinker. I know this is not a question, but these videos are the best I have seen, and I am going to have to go over them quite a few times. I wonder if this material may be the foundation for explaining the weirdness of quantum mechanics in a more "realistic" way?
• Why can we number the edges in any order we want? Wouldn't this cause confusion on the other end?
• Your observation is correct, but the idea is that you always share the code (the numbering of the edges) with the receiver on the other side. Otherwise it would indeed cause confusion, as you said.
• What is the prefix used to say "This is a new character"?
• If you're using Huffman coding, you don't need a prefix to say "This is a new character," because Huffman coding produces a prefix-free code. Therefore, no code for any symbol will be at the beginning of the code for another symbol.

In the video, the code is A=1, B=011, C=010, and D=00. Try decoding the following message from left to right. You'll find there is no ambiguity!
0101011
The first digit is 0, which doesn't correspond to a code. The second digit is 1, so now we're working with 01, which doesn't correspond to a code either. The third digit is 0, so now we're working with 010 = C! The next digit is 1, which corresponds to A! And so on, to get CAB. You can see there is no need to signal "this is a new character" with a prefix-free code.
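The left-to-right walk described above can be made explicit in a few lines of Python, using the code from the video:

```python
def decode(bits, codes):
    """Decode left to right, emitting a symbol as soon as the buffer matches a code."""
    lookup = {code: sym for sym, code in codes.items()}
    out, buf = [], ""
    for bit in bits:
        buf += bit
        if buf in lookup:      # unambiguous, since no code is a prefix of another
            out.append(lookup[buf])
            buf = ""
    return "".join(out)

# The code from the video: no codeword is a prefix of another.
codes = {"A": "1", "B": "011", "C": "010", "D": "00"}
print(decode("0101011", codes))  # CAB
```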
• In the video, I do not get why we have to add spaces.
• That's a good question.
Think about what would happen if you wanted to send multiple symbols in a sequence without the spaces.
Would you still be able to tell where the coding of one symbol starts and another ends?

Delimiters are needed to determine what was transmitted.
• So this Huffman binary tree seems really good at storing information efficiently, without the use of separators (like spaces). Nevertheless, aren't separators really useful for quickly accessing a certain symbol?

Let me rephrase this question:
Consider the following code (using the encoding from the video):
``10000101001110110111``

This code translates to `addacbabba` if read from left to right. But what if you wanted to start somewhere in the middle, e.g. with the 5th symbol? Is there any way to systematically know where the 5th symbol starts (without decoding from left to right)?
• Since the symbols have different lengths you need to look at almost all of the bits to determine where the 5th symbol starts.

This should illustrate why we are forced to look at almost all of the bits to find out where the 5th symbol starts. Imagine we skipped over just 3 bits. Those 3 bits might have been:
1 symbol, e.g. 011
1 symbol plus part of another symbol, e.g. 000 (followed by 0 or 10 or 11)
2 symbols, e.g. 001
3 symbols, e.g. 111

So even skipping over just 3 bits makes it impossible to tell where the 5th symbol starts.
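This can be made concrete with a short Python sketch: the only way to get the bit offset of the 5th symbol is to decode everything before it. The code table follows the video's encoding, and the example message encodes "addacbabba":

```python
def symbol_offsets(bits, codes):
    """Return the bit offset where each decoded symbol starts.

    Finding the offset of the nth symbol requires a full left-to-right scan,
    because every earlier symbol has a variable length."""
    lookup = {code: sym for sym, code in codes.items()}
    offsets, buf, start = [], "", 0
    for i, bit in enumerate(bits):
        buf += bit
        if buf in lookup:
            offsets.append(start)   # this symbol began at bit index `start`
            start = i + 1
            buf = ""
    return offsets

codes = {"a": "1", "b": "011", "c": "010", "d": "00"}
bits = "10000101001110110111"       # "addacbabba" under the code above
offsets = symbol_offsets(bits, codes)
print(offsets)   # the 5th symbol starts at offsets[4], known only after scanning
```

Fixed-width codes (or explicit separators) trade away compression to make this kind of random access cheap, which is why file formats often store an index alongside compressed data.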