Main content

## Computer science

### Course: Computer science > Unit 3

Lesson 2: Modern information theory- Symbol rate
- Introduction to channel capacity
- Message space exploration
- Measuring information
- Origin of Markov chains
- Markov chain exploration
- A mathematical theory of communication
- Markov text exploration
- Information entropy
- Compression codes
- Error correction
- The search for extraterrestrial intelligence

© 2023 Khan AcademyTerms of usePrivacy PolicyCookie Notice

# Error correction

How can we communicate in the presence of noise? Created by Brit Cruise.

## Want to join the conversation?

- What happens is a parity bit gets corrupted?(96 votes)
- If one of the data bits gets flipped then two or three of the parity bits are wrong. if only one parity bit is wrong then it must be the parity bit that is flipped. note he stated repeatedly that the system only corrects single errors.(83 votes)

- What's the reason for sequence of information and parity bits at3:38? Is there some logical reason for the sequence or is it just what was used initially and so has stuck ever since?(13 votes)
- Indeed, there is a method behind the madness for which bit is used for data and which bit is used for parity in Hamming codes.

In general, a Hamming code uses a block of 2^m-1 bits with m bits used for parity

(where m is some integer >=2)

The powers of 2 (1,2,4,8,etc.) are the parity bits

The remaining bits are data bits

The parity bits cover the bit numbers that have have its power of 2 set in their binary representation

This gives us the property that if we sum up the incorrect parity bits we can identify the bit in error

e.g.

For the Hamming code presented in the video m=3

Thus we have a block of 2^3-1=7 bits with 3 parity bits

7 bits total-3 parity bits=4 data bits

This is called a (7,4) Hamming code

So bit 1,2,4 are parity bits (these correspond to p1,p2,p3 in the video)

Bits 3,5,6,7 are data bits (these correspond to d1,d2,d3,d4 in the video)

Notice at3:49in the video that when p1,p2 were incorrect this caused d1 to be corrected

Notice how we get the same result by noting the incorrect parity bits and summing them, and

then correcting the bit corresponding to the sum.

i.e. 1+2=3 bit 3 is in error, bit 3 corresponds to d1

e.g.

For m=4 we would hav a 2^4-1=15 bit block and 4 parity bits

The parity bits would be bits 1,2,4,8

The data bits would be 3,5,6,7,9,10,11,12,13,14,15

If we found parity bits 1,4 and 8 didn't match we would calculate 1+4+8=13, and correct bit 13

If instead we found that parity bit 4 didn't match we would calculate a sum of 4, and know that the parity bit 4 was incorrect

Whether one uses odd or even parity doesn't matter

Hope this makes sense(36 votes)

- Couldn't the parity bits be mis-punched?(10 votes)
- Yes, the parity bits can be in error as well. An error correcting code should be able to detect which bit is in error, even if it is a parity bit.

The Hamming Code is arranged so that:

If only 1 parity bit doesn't match up, then it is the parity bit that is in error.

If more than 1 parity bit doesn't match up, it is a data bit that is in error (the combination of parity bits tells us which data bit is in error).

Hope this makes sense.(20 votes)

- Is this the same method used to allow QR codes to be damaged, but still readable?(10 votes)
- Yes, in the general sense that QR codes incorporate extra information, so that we can detect errors.

No, in the specific sense, in that QR codes do not use Hamming Codes.

QR uses RS codes for error correction.

Hamming codes are linear error correcting codes which use parity while, as I understand it, RS views data as coefficients of polynomials in a finite field in order to correct errors. (finite fields are a concept from abstract algebra)

Hope this makes sense(14 votes)

- What happens if two data bits get flipped in one set of four data bits?(5 votes)
- That usually only happens in very unstable connections. To be able to overcome this problem, you probably have to include more redundancy (telling more times the same thing).(7 votes)

- Is this the same system that is used in today's data storage devices? I wondered because the redundant bits take up 42% of the sequence and that seems a bit too much.(7 votes)
- Typically, for data storage, CRC (cyclical redundancy codes) is used for error detection, and Reed-Solomon codes are used for error correction.

These codes are used in data storage as they are well suited to handle random noise and burst errors (a contiguous chunk of errors), which tends to reflect the types of errors that occur in practice.(1 vote)

- First at all, congratulations this an amazing video for learn Hamming Code and I hope to see more of this kind. My question: I'm confused In minute3:38the sequence is P1 D1 P2 D2 P3 D4, later on minute4:20the sequence is P1 P2 D1 P3 D2 D3 D4. But follow the discussion I see that the later is the correct one. Is this an error on the video or am I missing something?.(5 votes)
- I think it is a small error on the video, but in the case shown in the video, P1 D1 P2 D2 P3 D3 D4 are actually the same as P1 P2 D1 P3 D2 D3 D4.

Hope it helps.(3 votes)

- I don't see why we would want error correcting codes in the sense that it almost doubles the size of the message. We've probably come out with Ultra-efficient error correcting codes, but I'd rather just have a smaller error detection code (in larger multi-BYTE sequences, a small hash), so the other side can request a retransmit, making it something that is probable in size, and such.(1 vote)
- Error detection with re-transmission works fine if the time to retransmit the message is small, errors are infrequent, and we have access to the original message to retransmit it. This is true in many cases, and this scheme has been used in many network protocols, but there are some notable exceptions.

If we were communicating with a rover on mars where messages could take several minutes to reach the rover and interference is routine, an error detection with retransmission scheme wouldn't work. An error correction scheme would just fix the small errors in the messages, but the error detection scheme could result in the same message being re-transmitted over and over again causing hours of delays before it is perfectly transmitted.

CDs and hard drives also use error correcting codes. These are case where if the data gets corrupted we don't have access to the original message. We use the error correcting codes to reconstruct the original message/data. This is why a scratched CD can still be played.

Hope this makes sense(5 votes)

- Can you just amplify the signal so the noise can't corrupt it?(3 votes)
- yes but that only would work in alice and bobs case, the hole punching example wouldn't work that way because you can't punch a hole harder.(1 vote)

- hello, Is there a study that shows which languages are more efficient.? For example I have seen that some people from india can say a lot with less amount of noises (language) and some how is the same with some Asian languages. I have seen that also in a comparisons between english and spanish.(2 votes)
- First we need to quantify what it means to be "efficient." Here, you mean "to say less, but have the same message come across." In that case, we have to go back a few videos: to the concept of message space.

We appropriately defined a message space to be s^n. In English, s is basically the 26 letters of the alphabet, the space character, and punctuation. n is the number of times you can use any of these, in combination. So to have the same message conveyed, you need s^n to remain constant, but we can raise or lower s as long as we do the opposite to n.

Therefore, in order to convey the same message in the most efficient manner possible, we need a language that minimizes n, and thus has the biggest s. This can be thought of as simply: the language with the most distinct letters is the most efficient -- you will be saying less. Of course, in practice many languages are further limited by their choice of grammatical structure, etc. but at that point this question becomes more of an applied sciences question.(2 votes)

## Video transcript

Voiceover: Alice and Bob have
figured out an amazing trick. They are exchanging
messages by plucking a wire either hard or soft, to
transmit a zero vs. a one However due to gusts of
wind, false zeros or ones can occur during transmission,
resulting in errors. Though they've figured out a way to communicate error-free, even in the presence of noise. How could they do this? In the 1940s, Richard Hamming faced a similar problem, while working at Bell Laboratories. Richard: At the Bell
Telephone Laboratories, we do about 10% of the
experiments on a computer, and about 90% in the laboratory. I expect that in time, we
will do 90% on the computer, and 10% in the lab. Speed, cost, and effort favor the computer over the laboratory approach. At the time, the computers
used stored information on punch cards, representing one vs. zero with hole vs. no hole. This system was error
prone, because it was common for cards to get bent or
mis-punched in the first place. So holes could be missed, or no holes could be
accidentally punctured, causing flipped bits. These errors would cause
the entire system to halt, until the error location could be found and corrected manually. Hamming took it upon
himself to devise a method which could automatically
detect and correct single bit errors, without
interrupting calculations. His solution was rooted in the intuitive idea of repetition, something we all do when
faced with interference, or the chance that part of
our message will be corrupted. His error-correcting codes were built on the simple concept of a parity bit. A parity bit is a single bit which is added to the end of a message, and indicates whether the number of ones in the message is even or odd. If a single error occurs, the receiver could then detect it, because the parity bit
will no longer match. However to detect and
correct single errors, Hamming needed to add more parity bits to identify the error location. This leads to his seven-four code, which adds three parity bits to each block of four data bits as follows. First we start with the three parity bits, which can be represented by a circle. These circles intersect
to produce four regions. The four data bits are
placed inside these regions in a specific order. To calculate the parity bits, we look at each circle one at a time, each containing three data bits. We determine the parity bit as before. Add up the data bits, and
if we get zero or two, the parity bit is zero for even, and if we get one or three, the parity bit is one, for odd. We do this for all circles, and end up with three parity bits to match the four data bits. These are then placed in a standard sequence as follows. Realize now, this system
can automatically correct single errors with a simple rule. If a single error occurs, two or more of the parity
bits will be incorrect, and wherever they intersect is the location of the error. This intersecting data bit is
then flipped automatically, so that all parity bits are valid again. This is Alice and Bob's trick. The additional parity bits are known as redundant bits, because they don't carry
any new information. All error-correction codes work this way. They all increase the size of the source messages slightly, at the expense of automatically
correcting errors. We also use error correction
codes for storage, for example on a physical CD, the information is encoded
using special codes, to correct for chunks of errors caused by scratches or dust, which corrupt longer
sequences of zeros and ones stored on the surface. This is why you can scratch a CD and often it will still play perfectly. Claude Shannon used
this idea of redundancy to redefine the capacity
of a communication channel, because as the noise on
your channel increases, we must increase the amount of redundancy to communicate error-free. This must then decrease
the effective amount of information you can send per unit time.