
### Course: Computer science theory > Unit 3

Lesson 2: Modern information theory

# Error correction

How can we communicate in the presence of noise? Created by Brit Cruise.

## Want to join the conversation?

• What happens if a parity bit gets corrupted?
(98 votes)
• If one of the data bits gets flipped, then two or three of the parity bits are wrong. If only one parity bit is wrong, then it must be the parity bit itself that was flipped. Note that he stated repeatedly that the system only corrects single errors.
(84 votes)
• What's the reason for the sequence of information and parity bits? Is there some logical reason for the sequence, or is it just what was used initially and has stuck ever since?
(13 votes)
• Indeed, there is a method behind the madness for which bit is used for data and which bit is used for parity in Hamming codes.

In general, a Hamming code uses a block of 2^m - 1 bits, with m bits used for parity
(where m is some integer >= 2)
The positions that are powers of 2 (1, 2, 4, 8, etc.) are the parity bits
The remaining positions are data bits
Each parity bit covers the bit positions that have its power of 2 set in their binary representation
This gives us the property that if we sum up the positions of the incorrect parity bits, we can identify the bit in error

e.g.
For the Hamming code presented in the video m=3
Thus we have a block of 2^3-1=7 bits with 3 parity bits
7 bits total-3 parity bits=4 data bits
This is called a (7,4) Hamming code

So bits 1, 2, 4 are parity bits (these correspond to p1, p2, p3 in the video)
Bits 3, 5, 6, 7 are data bits (these correspond to d1, d2, d3, d4 in the video)

Notice in the video that when p1 and p2 were incorrect, this caused d1 to be corrected.
Notice how we get the same result by noting the incorrect parity bits and summing their positions, then correcting the bit corresponding to the sum.
i.e. 1 + 2 = 3, so bit 3 is in error, and bit 3 corresponds to d1

e.g.
For m=4 we would have a 2^4-1=15 bit block and 4 parity bits
The parity bits would be bits 1,2,4,8
The data bits would be 3,5,6,7,9,10,11,12,13,14,15
If we found parity bits 1, 4, and 8 didn't match, we would calculate 1+4+8=13 and correct bit 13
If instead we found that only parity bit 4 didn't match, we would calculate a sum of 4 and know that parity bit 4 itself was incorrect

Whether one uses odd or even parity doesn't matter
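
The scheme described above can be sketched in Python (a minimal illustration using even parity; the function names `encode` and `correct` are my own, not from the video):

```python
# A minimal sketch of the (7,4) Hamming code described above, using even
# parity. Bit positions are 1..7; positions 1, 2, 4 hold the parity bits
# and positions 3, 5, 6, 7 hold the data bits d1..d4.

def encode(data):                        # data: [d1, d2, d3, d4]
    bits = [0] * 8                       # index 0 unused; positions 1..7
    for pos, d in zip([3, 5, 6, 7], data):
        bits[pos] = d
    for p in [1, 2, 4]:                  # each parity bit covers positions
        covered = [i for i in range(1, 8) if i & p and i != p]
        bits[p] = sum(bits[i] for i in covered) % 2
    return bits[1:]                      # the 7-bit codeword

def correct(codeword):                   # fixes any single-bit error
    bits = [0] + list(codeword)
    syndrome = 0
    for p in [1, 2, 4]:                  # sum the failed parity checks
        if sum(bits[i] for i in range(1, 8) if i & p) % 2 != 0:
            syndrome += p
    if syndrome:                         # syndrome = position of the error
        bits[syndrome] ^= 1
    return bits[1:]

word = encode([1, 0, 1, 1])
word[4] ^= 1                             # flip bit 5 (d2) in transmission
assert correct(word) == encode([1, 0, 1, 1])
```

Note how the decoder never has to distinguish cases by hand: summing the failed parity checks directly yields the position of the flipped bit, exactly as in the 1+2=3 example above.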

Hope this makes sense
(36 votes)
• Couldn't the parity bits be mis-punched?
(10 votes)
• Yes, the parity bits can be in error as well. An error correcting code should be able to detect which bit is in error, even if it is a parity bit.

The Hamming Code is arranged so that:
If only 1 parity bit doesn't match up, then it is the parity bit that is in error.
If more than 1 parity bit doesn't match up, it is a data bit that is in error (the combination of parity bits tells us which data bit is in error).
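
The two cases above can be checked numerically; a small sketch assuming the (7,4) code from the video (even parity, positions 1..7, parity bits at 1, 2, 4; `failed_checks` is my own helper name):

```python
# Recompute each parity check on a received (7,4) codeword. If exactly one
# check fails, the failing parity bit itself is the error; if two or more
# fail, a data bit is in error.

def failed_checks(bits7):
    bits = [0] + list(bits7)
    return [p for p in (1, 2, 4)
            if sum(bits[i] for i in range(1, 8) if i & p) % 2]

good = [0, 1, 1, 0, 0, 1, 1]            # a valid codeword
flipped_parity = good.copy()
flipped_parity[1] ^= 1                  # corrupt p2 (position 2)
flipped_data = good.copy()
flipped_data[2] ^= 1                    # corrupt d1 (position 3)

print(failed_checks(flipped_parity))    # one failed check: [2]
print(failed_checks(flipped_data))      # two failed checks: [1, 2]
```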

Hope this makes sense.
(20 votes)
• Is this the same method used to allow QR codes to be damaged, but still readable?
(10 votes)
• Yes, in the general sense that QR codes incorporate extra information, so that we can detect errors.

No, in the specific sense: QR codes do not use Hamming codes.
QR uses RS codes for error correction.
Hamming codes are linear error-correcting codes which use parity, while, as I understand it, RS codes view data as coefficients of polynomials over a finite field in order to correct errors. (Finite fields are a concept from abstract algebra.)

Hope this makes sense
(14 votes)
• What happens if two data bits get flipped in one set of four data bits?
(5 votes)
• That usually only happens on very unstable connections. To overcome this problem, you would probably have to include more redundancy (repeating the same information more times).
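The failure mode is worth seeing concretely. A quick illustration, assuming the (7,4) code from the video (even parity, parity bits at positions 1, 2, 4): with two flipped bits, the single-error corrector "repairs" the wrong position and makes things worse.

```python
# With two bit flips, the syndrome still points at exactly one position,
# but it is the wrong one, so the decoder silently miscorrects.

def correct(codeword):
    bits = [0] + list(codeword)
    syndrome = sum(p for p in (1, 2, 4)
                   if sum(bits[i] for i in range(1, 8) if i & p) % 2)
    if syndrome:
        bits[syndrome] ^= 1             # flip the bit the syndrome names
    return bits[1:]

sent = [0, 1, 1, 0, 0, 1, 1]            # a valid codeword
received = sent.copy()
received[2] ^= 1                        # flip position 3 (d1)
received[4] ^= 1                        # flip position 5 (d2)

print(correct(received) == sent)        # False: the decoder guessed wrong
```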
(7 votes)
• Is this the same system that is used in today's data storage devices? I wondered because the redundant bits take up 42% of the sequence and that seems a bit too much.
(7 votes)
• Typically, for data storage, CRCs (cyclic redundancy checks) are used for error detection, and Reed-Solomon codes are used for error correction.

These codes are used in data storage as they are well suited to handle random noise and burst errors (a contiguous chunk of errors), which tends to reflect the types of errors that occur in practice.
(1 vote)
• First of all, congratulations: this is an amazing video for learning the Hamming code, and I hope to see more of this kind. My question: I'm confused. At one point in the video the sequence is P1 D1 P2 D2 P3 D4, but later the sequence is P1 P2 D1 P3 D2 D3 D4. Following the discussion, I see that the latter is the correct one. Is this an error in the video, or am I missing something?
(5 votes)
• I think it is a small error in the video, but in the case shown, P1 D1 P2 D2 P3 D3 D4 happens to give the same result as P1 P2 D1 P3 D2 D3 D4.
Hope it helps.
(3 votes)
• @Brit: I know you've a lot planned, and that you've a lot to do, but I was wondering: is the course you're developing predicted to talk about the halting problem, or solving NP-complete problems in polynomial time, or anything like that?
(2 votes)
• Hey Kieran, yes the course I have planned will deal with those exact problems. However I'm currently unsure of the timeline...it's definitely on my list!
(4 votes)
• I don't see why we would want error-correcting codes, in the sense that they almost double the size of the message. We've probably come up with ultra-efficient error-correcting codes, but I'd rather just have a smaller error-detection code (for larger multi-byte sequences, a small hash), so the other side can request a retransmit, keeping the message size reasonable.
(1 vote)
• Error detection with re-transmission works fine if the time to retransmit the message is small, errors are infrequent, and we have access to the original message to retransmit it. This is true in many cases, and this scheme has been used in many network protocols, but there are some notable exceptions.

If we were communicating with a rover on Mars, where messages can take several minutes to arrive and interference is routine, error detection with retransmission wouldn't work well. An error-correction scheme would simply fix the small errors in the messages, whereas an error-detection scheme could result in the same message being retransmitted over and over again, causing hours of delay before it is perfectly transmitted.

CDs and hard drives also use error-correcting codes. These are cases where, if the data gets corrupted, we don't have access to the original message, so we use the error-correcting codes to reconstruct the original message/data. This is why a scratched CD can often still be played.

Hope this makes sense
(5 votes)
• Hello, is there a study that shows which languages are more efficient? For example, I have seen that some people from India can say a lot with fewer sounds, and somehow the same is true for some Asian languages. I have also seen this in comparisons between English and Spanish.
(2 votes)
• First we need to quantify what it means to be "efficient." Here, you mean "to say less, but have the same message come across." In that case, we have to go back a few videos: to the concept of message space.

We appropriately defined a message space to be s^n. In English, s is basically the 26 letters of the alphabet, the space character, and punctuation. n is the number of times you can use any of these, in combination. So to have the same message conveyed, you need s^n to remain constant, but we can raise or lower s as long as we do the opposite to n.

Therefore, in order to convey the same message in the most efficient manner possible, we need a language that minimizes n, and thus has the biggest s. This can be thought of as simply: the language with the most distinct letters is the most efficient -- you will be saying less. Of course, in practice many languages are further limited by their choice of grammatical structure, etc. but at that point this question becomes more of an applied sciences question.
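
The trade-off above is easy to check numerically (a back-of-the-envelope sketch; the alphabet sizes are illustrative, not from the videos):

```python
# To keep the message space s**n constant while changing the alphabet
# size s, the message length n must scale inversely with log(s).
import math

target = 26 ** 10                       # message space of 10 English letters

for s in [2, 26, 1000]:                 # binary, English, a large syllabary
    n = math.log(target) / math.log(s)  # length n such that s**n == target
    print(f"alphabet size {s:4}: need about {n:.1f} symbols")
```

A binary alphabet needs roughly 47 symbols to cover the same message space that 10 English letters do, which is the sense in which a bigger alphabet lets you "say less."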
(3 votes)

## Video transcript

Voiceover: Alice and Bob have figured out an amazing trick. They are exchanging messages by plucking a wire either hard or soft, to transmit a zero vs. a one. However, due to gusts of wind, false zeros or ones can occur during transmission, resulting in errors. Yet they've figured out a way to communicate error-free, even in the presence of noise. How could they do this? In the 1940s, Richard Hamming faced a similar problem while working at Bell Laboratories. Richard: At the Bell Telephone Laboratories, we do about 10% of the experiments on a computer, and about 90% in the laboratory. I expect that in time, we will do 90% on the computer, and 10% in the lab. Speed, cost, and effort favor the computer over the laboratory approach. At the time, computers stored information on punch cards, representing one vs. zero with hole vs. no hole. This system was error-prone, because it was common for cards to get bent or mis-punched in the first place. So holes could be missed, or extra holes could be accidentally punched, causing flipped bits. These errors would cause the entire system to halt, until the error location could be found and corrected manually. Hamming took it upon himself to devise a method which could automatically detect and correct single-bit errors without interrupting calculations. His solution was rooted in the intuitive idea of repetition, something we all do when faced with interference, or the chance that part of our message will be corrupted. His error-correcting codes were built on the simple concept of a parity bit. A parity bit is a single bit which is added to the end of a message, and indicates whether the number of ones in the message is even or odd. If a single error occurs, the receiver can detect it, because the parity bit will no longer match. However, to detect and correct single errors, Hamming needed to add more parity bits to identify the error location.
This leads to his seven-four code, which adds three parity bits to each block of four data bits as follows. First we start with the three parity bits, which can be represented by circles. These circles intersect to produce four regions. The four data bits are placed inside these regions in a specific order. To calculate the parity bits, we look at each circle one at a time, each containing three data bits. We determine the parity bit as before: add up the data bits, and if we get zero or two, the parity bit is zero, for even; if we get one or three, the parity bit is one, for odd. We do this for all circles, and end up with three parity bits to match the four data bits. These are then placed in a standard sequence as follows. Realize now, this system can automatically correct single errors with a simple rule. If a single error occurs, two or more of the parity bits will be incorrect, and wherever they intersect is the location of the error. This intersecting data bit is then flipped automatically, so that all parity bits are valid again. This is Alice and Bob's trick. The additional parity bits are known as redundant bits, because they don't carry any new information. All error-correction codes work this way. They all increase the size of the source messages slightly, in exchange for the ability to automatically correct errors. We also use error-correction codes for storage. For example, on a physical CD, the information is encoded using special codes to correct for chunks of errors caused by scratches or dust, which corrupt longer sequences of the zeros and ones stored on the surface. This is why you can scratch a CD and often it will still play perfectly. Claude Shannon used this idea of redundancy to redefine the capacity of a communication channel, because as the noise on your channel increases, we must increase the amount of redundancy to communicate error-free. This must then decrease the effective amount of information you can send per unit time.
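
The single parity bit the transcript describes can be sketched in a few lines (even parity, as in the video; the function names `add_parity` and `check` are my own):

```python
# The sender appends one bit so the total number of ones is even; the
# receiver re-checks it. A single flipped bit is detected (parity no
# longer matches) but, with one parity bit alone, it cannot be located.

def add_parity(message):
    return message + [sum(message) % 2]

def check(received):
    return sum(received) % 2 == 0       # True if the parity still matches

sent = add_parity([1, 0, 1, 1])         # -> [1, 0, 1, 1, 1]
print(check(sent))                      # True: no error detected

corrupted = sent.copy()
corrupted[2] ^= 1                       # a gust of wind flips one bit
print(check(corrupted))                 # False: error detected, not located
```

Locating (and thus correcting) the error is exactly what the extra parity bits of the seven-four code add on top of this.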