Main content

## Computer science theory

### Course: Computer science theory > Unit 3

Lesson 2: Modern information theory- Symbol rate
- Introduction to channel capacity
- Message space exploration
- Measuring information
- Origin of Markov chains
- Markov chain exploration
- A mathematical theory of communication
- Markov text exploration
- Information entropy
- Compression codes
- Error correction
- The search for extraterrestrial intelligence

© 2023 Khan AcademyTerms of usePrivacy PolicyCookie Notice

# A mathematical theory of communication

Claude Shannon demonstrated how to generate "english looking" text using Markov chains. Created by Brit Cruise.

## Want to join the conversation?

- At2:58, how does Shannon decide where to separate the random letters into words? On a related note, I'm still not quite understanding how you could generate a two letter word using a third order approximation.(20 votes)
- Good question!

Spaces can be treated like any other symbol (I simply didn't use any in my examples)

With a**third order approximation**you could a space**as**one of the symbols. This could result in two letter words, as follows:

We know that, "is" is almost always followed by " "

Which would result in many "IS" "" trigrams in that state(31 votes)

- At1:34, he creates pairs of possible letter combinations but repeats many tiles that begin with A, especially AA tiles. I don't understand why he includes so many tiles in the A cup and only three tiles in the B and C cups.

Is this because A is needed to occur more often for the pattern to look like English writing?(7 votes)- This is not about English, but rather a language Claude Shannon made up.

Second-Order Approximation:

Brit Cruise/Claude Shannon took this message:

acaaaabcbbaacc

broke it up into pairs of two:

ac|aa|aa|bc|bb|aa|cc

broke it up into pairs of two in a different way:

(Ignore a)|ca|aa|ab|cb|ba|ac|(Ignore c)

then took all the pairs and put them into cups depending on what letter they started with (I have them rearranged differently):

A Cup

ac

aa

aa

aa

aa

ab

ac

B Cup

bc

bb

ba

C Cup

cc

ca

cb

As you can see, it's a coincidence that one copy of every possible pair starting with B and C is listed one time and if the message had multiple of them, Brit Cruise and Claude Shannon would've listed multiple under the B and C cups.

I hope this clarifies that part of the video!(19 votes)

- what are the uses of this?(2 votes)
- One use of approximating English language could be to pre-test a person's English's comprehending. These random examples could ensure that the reader has never read text.

Also, artificial intelligence could be made with approximating English based off of text the computer has seen or heard previously. This might lead to computers having conversations with each other or humans.

I hope this gives you some ideas of practical uses of approximating English!(7 votes)

- My second set of questions (see Part 1 below):

Part 2 - Questions about 3-gram algorithm and transitioning from one cup to another

First, is the 3rd word in a 3-gram algorithm dependent on the 1st word or a combination of the 1st and 2nd word? Given the following occurrences:

the man ate

the man ate

a man ate

So here the the 3rd word 'ate' is more likely to come from 'the man' then 'a man', correct? So you would put it in different cups like this:

Cup 'the man'

the man ate

the man ate

Cup 'a man'

a man ate

Last question: For both 2-gram and 3-gram algorithms, when you pick a phrase from a cup, what happens when the last word does not match another cup name? So from above, if I pick 'a man ate', the last word 'ate' does not match another cup name. In that case, I'd just have to pick a random cup, right?(3 votes)- Hey Paul!

Remember that to construct our cups (or states) we need some input text:

using your example: "the man ate the man ate a man ate"

Then we need to slide along and capture every possible pair of words.

in this case would need ***5*** cups for:

The man

man ate

ate the

ate a

a man

So, you will ***never*** have an occurrence where you pick a word that doesn't belong to another state. See above we have a "man ate" cup.

Let me know if this is clear!(3 votes)

- What book does he keep marking in, i might want to read it.(4 votes)
- It is Shannon's paper called "A Mathematical Theory of Communication". It is probably freely available online.(1 vote)

- Brit, these Markov videos are so interesting it really made me want to learn a lot more about n-grams and how to apply them to English words! But I have so many question!! I have so many that I'm splitting them into two parts.

Part 1 - Terminology and 2-gram

First, the terminology. Does 2-gram mean the second word of word n, e.g. the 2-gram of 'the man' is 'man'? Or is better to just call it the 2nd word?

Next, the application of the n-gram algorithms. How would you apply a 2-gram algorithm to English words. So given the following word pairs:

a man

the man

hey man

a dog

the dog

Here, the n=2 refers to the words 'man' and 'dog' , correct? Consequently, we would have 3 cups: 'a', 'the' and 'hey', is that right? Giving the following cups:

Cup 'a'

a dog

a man

Cup 'the'

the dog

the man

Cup 'hey'

hey man

I'm pretty sure that's how you would do it.(3 votes)- Here's are answers to my own question:

Terminology:

n-gram - the slice of words

Order - the number of slice of words

Level - the units of the n-gram

Example: a letter-level n-gram with at 3rd order looks like:

dba

bbd

a word-level n-gram with 4th order looks like:

"jim is very happy"

"oven dog candy house"

Determining how many cups. Answer: the number of cups depends on how many unique slices of words. So if you have "the man ate food", with a word-level 2nd order n-gram you will have the following 3 cups:

the man

man ate

ate food(3 votes)

- If synaptic nerves are electrical signals, wouldn't they produce radio signals, which in turn could be deciphered via technology even before the verbal enunciation of syntax?(3 votes)
- Neurons do generate a magnetic field which can be detected by scientific equipment. There are several studies about predicting human decisions before they are conscious by monitoring the brain.(2 votes)

- At3:27, is "improvementagement" supposed to be a real word?(2 votes)
- Nope, it's an example of jibberish which is starting to 'look like' english. It "looks like" english because it has the same statistical structure as english words.(4 votes)

- How does selection in tri-grams (or higher order n-grams) work? With a 2nd order approximation, one writes the first letter in the pair, and uses the second as an index to select from the next "cup."

In a tri-gram, would the bins (cups) be arranged in a matrix? Where each bin is assigned two indices? In that case, the first letter one writes down, the second letter selects a column, the third letter selects a row, and one draws from the cup with said coordinate (i.e., row-column position)?(3 votes) - How do these Markov chains relate to Moore and Mealy machines?(2 votes)

## Video transcript

Voiceover: Shannon had
just finished developing his theories related to
cryptography and therefore was well aware that human
communication was a mix of randomness and
statistical dependencies. Letters in our messages were obviously dependent on previous
letters to some extent. In 1949, he published
a groundbreaking paper, "A Mathematical Theory of Communication". In it, he uses Markov models as the basis for how we can think about communication. He starts with a toy example. Imagine you encounter a bunch of text written in an alphabet of A, B, and C. Perhaps you know nothing
about this language, though you notice As
seem to clump together, while Bs and Cs do not. He then shows that you
could design a machine to generate similar-looking text, using a Markov chain. He starts off with a
zeroth-order approximation, which means we just independently select each symbol A, B, or C at
random, and form a sequence However, notice that this sequence doesn't look like the original. He shows then you could do a bit better with a first-order approximation, where the letters are
chosen independently, but according to the
probability of each letter in the original sequence. This is slightly better
as As are now more likely, but it still doesn't
capture much structure. The next step is key. A second-order approximation
takes into account each pair of letters which can occur. In this case, we need three states. The first state represents
all pairs which begin with A, the second all pairs that begin with B, and the third state all
pairs that begin with C. Notice now that the A
cup has many AA pairs, which makes sense, since
the conditional probability of an A after an A is higher
in our original message. We can generate a sequence using this second-order
model easily as follows. We start anywhere and pick a tile, and we write down our
output the first letter, and move to the cup defined
by the second letter. Then we pick a new tile, and repeat this process indefinitely Notice that this sequence
is starting to look very similar to the original message, because this model is capturing the conditional dependencies
between letters. If we wanted to do even better, we could move to a
third-order approximation, which takes into account groups of three letters, or "trigrams". In this case, we would need nine states. But next, Shannon applies
this exact same logic to actual English text, using statistics that were known for letters,
pairs, and trigrams, etc. He shows the same
progression from zeroth-order random letters to
first-order, second-order and third-order sequences. He then goes on and tries
the same thing using words instead of letters, and he writes "the resemblance to ordinary English text "increases quite
noticeably at each depth." Indeed, these machines were producing meaningless text, though they contained approximately the same
statistical structure you'd see in actual English. Shannon then proceeds to define a quantitative measure of information, as he realizes that the
amount of information in some message must be tied up in the design of the
machine which could be used to generate similar-looking sequences. Which brings us to his concept of entropy.