If you're seeing this message, it means we're having trouble loading external resources on our website.

If you're behind a web filter, please make sure that the domains *.kastatic.org and *.kasandbox.org are unblocked.

Main content

### Course: Computer science theory>Unit 3

Lesson 2: Modern information theory

# A mathematical theory of communication

Claude Shannon demonstrated how to generate "english looking" text using Markov chains. Created by Brit Cruise.

## Want to join the conversation?

• At , how does Shannon decide where to separate the random letters into words? On a related note, I'm still not quite understanding how you could generate a two letter word using a third order approximation.
(20 votes)
• Good question!
Spaces can be treated like any other symbol (I simply didn't use any in my examples)

With a third order approximation you could a space as one of the symbols. This could result in two letter words, as follows:

We know that, "is" is almost always followed by " "
Which would result in many "IS" "" trigrams in that state
(31 votes)
• At , he creates pairs of possible letter combinations but repeats many tiles that begin with A, especially AA tiles. I don't understand why he includes so many tiles in the A cup and only three tiles in the B and C cups.
Is this because A is needed to occur more often for the pattern to look like English writing?
(7 votes)
• This is not about English, but rather a language Claude Shannon made up.

Second-Order Approximation:
Brit Cruise/Claude Shannon took this message:
acaaaabcbbaacc
broke it up into pairs of two:
ac|aa|aa|bc|bb|aa|cc
broke it up into pairs of two in a different way:
(Ignore a)|ca|aa|ab|cb|ba|ac|(Ignore c)
then took all the pairs and put them into cups depending on what letter they started with (I have them rearranged differently):
A Cup
ac
aa
aa
aa
aa
ab
ac
B Cup
bc
bb
ba
C Cup
cc
ca
cb

As you can see, it's a coincidence that one copy of every possible pair starting with B and C is listed one time and if the message had multiple of them, Brit Cruise and Claude Shannon would've listed multiple under the B and C cups.

I hope this clarifies that part of the video!
(19 votes)
• what are the uses of this?
(2 votes)
• One use of approximating English language could be to pre-test a person's English's comprehending. These random examples could ensure that the reader has never read text.

Also, artificial intelligence could be made with approximating English based off of text the computer has seen or heard previously. This might lead to computers having conversations with each other or humans.

I hope this gives you some ideas of practical uses of approximating English!
(7 votes)
• My second set of questions (see Part 1 below):

Part 2 - Questions about 3-gram algorithm and transitioning from one cup to another

First, is the 3rd word in a 3-gram algorithm dependent on the 1st word or a combination of the 1st and 2nd word? Given the following occurrences:

the man ate
the man ate
a man ate

So here the the 3rd word 'ate' is more likely to come from 'the man' then 'a man', correct? So you would put it in different cups like this:

Cup 'the man'
the man ate
the man ate

Cup 'a man'
a man ate

Last question: For both 2-gram and 3-gram algorithms, when you pick a phrase from a cup, what happens when the last word does not match another cup name? So from above, if I pick 'a man ate', the last word 'ate' does not match another cup name. In that case, I'd just have to pick a random cup, right?
(3 votes)
• Hey Paul!
Remember that to construct our cups (or states) we need some input text:

using your example: "the man ate the man ate a man ate"

Then we need to slide along and capture every possible pair of words.

in this case would need *5* cups for:
The man
man ate
ate the
ate a
a man

So, you will *never* have an occurrence where you pick a word that doesn't belong to another state. See above we have a "man ate" cup.

Let me know if this is clear!
(3 votes)
• What book does he keep marking in, i might want to read it.
(4 votes)
• It is Shannon's paper called "A Mathematical Theory of Communication". It is probably freely available online.
(1 vote)
• Brit, these Markov videos are so interesting it really made me want to learn a lot more about n-grams and how to apply them to English words! But I have so many question!! I have so many that I'm splitting them into two parts.

Part 1 - Terminology and 2-gram

First, the terminology. Does 2-gram mean the second word of word n, e.g. the 2-gram of 'the man' is 'man'? Or is better to just call it the 2nd word?

Next, the application of the n-gram algorithms. How would you apply a 2-gram algorithm to English words. So given the following word pairs:

a man
the man
hey man
a dog
the dog

Here, the n=2 refers to the words 'man' and 'dog' , correct? Consequently, we would have 3 cups: 'a', 'the' and 'hey', is that right? Giving the following cups:

Cup 'a'
a dog
a man

Cup 'the'
the dog
the man

Cup 'hey'
hey man

I'm pretty sure that's how you would do it.
(3 votes)
• Here's are answers to my own question:

Terminology:
n-gram - the slice of words
Order - the number of slice of words
Level - the units of the n-gram
Example: a letter-level n-gram with at 3rd order looks like:

dba
bbd

a word-level n-gram with 4th order looks like:

"jim is very happy"
"oven dog candy house"

Determining how many cups. Answer: the number of cups depends on how many unique slices of words. So if you have "the man ate food", with a word-level 2nd order n-gram you will have the following 3 cups:

the man
man ate
ate food
(3 votes)
• If synaptic nerves are electrical signals, wouldn't they produce radio signals, which in turn could be deciphered via technology even before the verbal enunciation of syntax?
(3 votes)
• Neurons do generate a magnetic field which can be detected by scientific equipment. There are several studies about predicting human decisions before they are conscious by monitoring the brain.
(2 votes)
• At , is "improvementagement" supposed to be a real word?
(2 votes)
• Nope, it's an example of jibberish which is starting to 'look like' english. It "looks like" english because it has the same statistical structure as english words.
(4 votes)
• How does selection in tri-grams (or higher order n-grams) work? With a 2nd order approximation, one writes the first letter in the pair, and uses the second as an index to select from the next "cup."

In a tri-gram, would the bins (cups) be arranged in a matrix? Where each bin is assigned two indices? In that case, the first letter one writes down, the second letter selects a column, the third letter selects a row, and one draws from the cup with said coordinate (i.e., row-column position)?
(3 votes)
• How do these Markov chains relate to Moore and Mealy machines?
(2 votes)

## Video transcript

Voiceover: Shannon had just finished developing his theories related to cryptography and therefore was well aware that human communication was a mix of randomness and statistical dependencies. Letters in our messages were obviously dependent on previous letters to some extent. In 1949, he published a groundbreaking paper, "A Mathematical Theory of Communication". In it, he uses Markov models as the basis for how we can think about communication. He starts with a toy example. Imagine you encounter a bunch of text written in an alphabet of A, B, and C. Perhaps you know nothing about this language, though you notice As seem to clump together, while Bs and Cs do not. He then shows that you could design a machine to generate similar-looking text, using a Markov chain. He starts off with a zeroth-order approximation, which means we just independently select each symbol A, B, or C at random, and form a sequence However, notice that this sequence doesn't look like the original. He shows then you could do a bit better with a first-order approximation, where the letters are chosen independently, but according to the probability of each letter in the original sequence. This is slightly better as As are now more likely, but it still doesn't capture much structure. The next step is key. A second-order approximation takes into account each pair of letters which can occur. In this case, we need three states. The first state represents all pairs which begin with A, the second all pairs that begin with B, and the third state all pairs that begin with C. Notice now that the A cup has many AA pairs, which makes sense, since the conditional probability of an A after an A is higher in our original message. We can generate a sequence using this second-order model easily as follows. We start anywhere and pick a tile, and we write down our output the first letter, and move to the cup defined by the second letter. Then we pick a new tile, and repeat this process indefinitely Notice that this sequence is starting to look very similar to the original message, because this model is capturing the conditional dependencies between letters. If we wanted to do even better, we could move to a third-order approximation, which takes into account groups of three letters, or "trigrams". In this case, we would need nine states. But next, Shannon applies this exact same logic to actual English text, using statistics that were known for letters, pairs, and trigrams, etc. He shows the same progression from zeroth-order random letters to first-order, second-order and third-order sequences. He then goes on and tries the same thing using words instead of letters, and he writes "the resemblance to ordinary English text "increases quite noticeably at each depth." Indeed, these machines were producing meaningless text, though they contained approximately the same statistical structure you'd see in actual English. Shannon then proceeds to define a quantitative measure of information, as he realizes that the amount of information in some message must be tied up in the design of the machine which could be used to generate similar-looking sequences. Which brings us to his concept of entropy.