Main content

## Computer science

### Unit 2: Lesson 1

Ancient cryptography- What is cryptography?
- The Caesar cipher
- Caesar Cipher Exploration
- Frequency Fingerprint Exploration
- Polyalphabetic cipher
- Polyalphabetic Exploration
- The one-time pad
- Perfect Secrecy Exploration
- Frequency stability property short film
- How uniform are you?
- The Enigma encryption machine
- Perfect secrecy
- Pseudorandom number generators
- Random Walk Exploration

© 2023 Khan AcademyTerms of usePrivacy PolicyCookie Notice

# Frequency stability property short film

Can you tell the difference between actions based upon flipping a coin and those based upon blind guessing or simulating randomness? This short video examines the frequency stability property. Created by Brit Cruise.

## Want to join the conversation?

- So, is it just a person's tendency to try to be random that ruins our attempt to be random? What makes a computer so much better at simulating a random coin flip?(163 votes)
- Whether or not the computer is random is not the point. What it is doing is coldly calculating probabilities using formulas, and simply not messing up because it has been pre-programmed.

The computer uses pseudorandom numbers to effectively model "randomness", which is really only a theoretical concept. Something can only be considered random relative to your methods of measuring "randomness", that is, the entropy of the medium. This is an example of something which can, by definition does, transcend human understanding.

The computer proceeds to draw pseudorandom numbers according to some distribution of what is called a "random variable", like the analogy of a function but to probability spaces...

- if you are drawing numbers from a distribution that is flat, such as pdf(x) = c over some interval [a, b] : This is called the uniform distribution, and every event is equally likely.

- (More Technical) if you draw from a bell curve pdf(x) = 1/(sigma*sqrt{2*pi}) * e^[(-x^2)/(2*sigma^2)] then your numbers are said to be normally distributed, with mean 0 and standard deviation sigma. This means things cluster around 0, the center, with some spread (=variation, whose square root is standard deviation sigma).

Fun fact: if you take the integral of this function, it converges but cannot be expressed with any mathematical instruments we already have, so a name was invented for it: erf(x). This is also just called the cumulative distribution function (cdf) of the normal. It looks like a logistic function (one asymptote to the left at y=0, then function slopes up to another asymptote at y=1 to the right.

Note a few things here: the probability of any one event is given by evaluating the pdf (probability density function) at an x-location. That is, the likelihood of something is governed by the height of the pdf curve at some location on the x-axis. In the case of the uniform, which is flat everywhere, you have no difference between x-axis locations. In the normal distribution, you are more likely to be clustered around the mean (0 in the above example), but still have rare instances of finding oneself 2 or even 3 standard deviations from the mean.

For reference: about two-thirds of cases fall within 1 standard deviation from the mean (that is, +/- 1); 95% are within 2 s.d. of the mean; 99.8% are within 3 s.d. of the mean. Rare events are said to fall along the "long tail" of the distribution.

Back to pseudorandomness. What a computer does is use some seed value, for instance the time of day in milliseconds, and begin transforming that number using a bunch of complicated functions.

More Technical: Most functions multiply your seed with a large prime number, then modulo by (i.e. take the remainder from) some medium-sized power of 10, like 10^20. Voila, you now have 19 pseudorandom digits! And that's only in base 10, in base 2 you have 10^20 ~ 2^66, i.e. 65 random bits.

Exercise: what does the cdf of the uniform distribution look like?

Exercise: expand the above equation for # random bits (hint: use the identity 10^3 ~ 2^10).

Exercise: explain the concept of a long tail as you would to a 6th grader.

Exercise: explain the additions to the formula for the normal pdf if you want to shift the mean from 0.

Exercise: provide as complicated an example as you can find of: a computation which you can do with 19 (base 10) digits.

(119 votes)

- What if a person checked 2 digit frequency instead of 3 digit. Woulndt the pattern still be obvious and you would need to check less data?(11 votes)
- One of the key reasons people are very bad at generating uniformly random numbers, is human perceive long sequences of repeated numbers or patterns to not be random.

Repeating a number just twice doesn't bother people, so people will be able to generate reasonably random numbers for sequences of length two, but when these sequences get longer and longer people will feel very reluctant to select them, because they don't "feel random" to them.

If I were to randomly select a sequence of 4 coin tosses, I should expect to find 4 heads 1/16 times (2^4=16 possibilities). If I asked 32 people in a classroom to give me a random sequence of 4 coin tosses, and these people gave me truly random sequences (which they won't be), I should expect 2 of them to give me a sequence with all heads (32*1/16=2). However, it is likely that less than 2 people would give a sequence with 4 heads, because a sequence of 4 heads in a row doesn't "feel random". This is something that any student or teacher can try with their class.

The bias people have will become clearer as we use longer and longer sequences. The bias may be detectable with a sequence of two numbers, but it will be smaller, which would mean you would need to look at more two digits sequences to confirm the bias exists and is not just randomly occurring. However, since the bias becomes more and more pronounced as the sequences get longer, we will need to look at less of the longer sequences to be confident that a bias exists and isn't just randomly occurring. So, it becomes a trade off between sequence length and number of sequences we need to look at: as we shorten the length of the sequences we need to check more of them. Without doing an empirical study on it, I would suspect that the optimal sequence length to uncover bias would be 3 or 4.

Hope this makes sense.(37 votes)

- Then what is random and how come a computer is better than a human. Does it mean that humans have favorite numbers and computers don't?(12 votes)
- This is a very difficult question, which touches on psychology and philosophy. Here is my understanding:

Humans are very good at recognising patterns. If you see a text on a piece of paper it doesn't matter if it's written by hand or printed, you will be able to recognise the patterns that we call letters, words, and sentences, and you can read the text. If you see a picture on a computer monitor this is really just a bunch of colours, but your mind recognises the patterns these colours make, and you know that the picture is, for example, a cat.

Random sequences, by definition, do not have a pattern. Yet, our mind cannot help but look for patterns, even if there is none. This is something we cannot "shut off". When we try to create something random, we know there shouldn't be a pattern, so we try to avoid anything we would see as a pattern. If a human is asked for a random sequence of 5 digits, almost no-one will say "33333", even though that sequence should be just as likely as "57194".

Now in your question you said that computers are better than humans at creating random sequences. This is not entirely true. Although humans are notoriously bad at it, computers are even worse. A computer is "deterministic"; this means that, given a specific input, the same program will always give the same result. In other words, there is nothing random at all about a computer. However, through clever mathematics and simple tricks, we can make sure the computer creates sequences which look random to humans, and can even pass mathematical tests for randomness.

The main point is that, although "true random" is not achievable by either human nor computer, it is often unnecessary. As long as the generated numbers are random enough to pass certain tests, it is good enough for most applications.

Note that there are some interesting philosophical questions related to this: is the universe itself random? If not, then nothing we do can ever create true random sequences.(31 votes)

- What are these basis humans have in generating random numbers? How might have they developed?(6 votes)
- I suspect that part of the difficulty with creating a random sequence of 1's and 0's is that there are only two digits available. One method I came up with that worked pretty well was to try to make a large, random base 10) number and then convert it to binary.

Here is my example:

Take a "random" decimal number with 18 digits: 984643451642321768

Convert it to binary: 110110101010001010000000000010011101100011100001101101101000

That seemed to be more random than any sequence of 1's and 0's that I typed with my eyes closed.

Now, I will take an 18 digit random number generated by Excel:

818348906403743000

Convert it into Binary:

110000010010010100100010101000001011101001100101010000101000

That one did not seem to be as random as the one I did by hand. I wonder if that is a result of Excel's Pseudo-Random number generator of if I just got lucky...

Happy Mathing!(11 votes)

- At1:20or so when Mr. Cruise talks about identifying true randomness through graphing the frequency of sequences of numbers (e.g. 110 and 010 occur a similar amount of times in true randomness) in the pattern "ABCDEF" are ABC, BCD, CDE, and DEF all sequences or are just ABC and DEF the sequences Brit is referring to?(5 votes)
- So this is just like the computer sequences we see in the Spy movies?

The code reminds me of it a little bit.(3 votes)- Not really, the computer sequences are mostly made up - there are no green numbers flashing on the screen any time your computer needs to make a calculation as the spy movies would have you believe. The idea and the principle behind frequency stability, as well as this code, were made hundreds of years before the computer.(4 votes)

- in0:14, brit said that in the first room, there was a man flipping a coin and switching on/off. Isn't the room dark and therefore the person can't see what he has landed? I really don't know.(3 votes)
- there was light over the mans hand otherwise the section of the vid would be all black(4 votes)

- How can frequency stability relate to cryptography?(3 votes)
- To have a perfect cipher, there must be complete randomness(3 votes)

- In the exercise following this video, one of the choices is "unfair coin".

What does it mean by "unfair coin"?(3 votes)- An unfair coin is one in which the odds of heads are not equal to the odds of tails. In other words, the coin lands on one side more often than it does the other.(3 votes)

- So witch one was the coin and witch one was not?(3 votes)
- Time 1.40.

The one above is not a coin. You can see it by looking at for example sequence 000 and 111, where it is not equally likely to occur, while when the coin is used all sequences are equally likely. This is the whole point! :)(2 votes)

## Video transcript

[TYPING] Consider the following. Imagine two rooms. [DOOR CLOSING] [DOOR CLOSING] Inside each room is a switch. [CLICK] [CLICK] In one room, there is a man
who flips his switch according to a coin flip. If he lands heads,
the switch is on. If he lands tails,
the switch is off. In the other room, a
woman switches her light based on a blind guess. She tries to simulate
randomness without a coin. Then we start a clock, and they
make their switches in unison. [CLICK] [CLICK] [CLICK] [CLICK] Can you determine
which light bulb is being switched
by a coin flip? [CLICK] [CLICK] [CLICK] [CLICK] The answer is yes, but how? [CLICK] [CLICK] [CLICK] And the trick is to think about
properties of each sequence rather than looking for
any specific patterns. For example, first,
we may try to count the number of 1's and 0's
which occur in each sequence. This is close, but
not enough since they will both seem fairly even. The answer is to count sequences
of numbers, such as runs of three consecutive switches. A true random sequence
will be equally likely to contain every
sequence of any length. This is called the
frequency stability property and is demonstrated
by this uniform graph. The forgery is now obvious. Humans favor certain sequences
when they make guesses, resulting in uneven patterns
such as we see here. One reason this
happens is because we make the mistake of
thinking certain outcomes are less random than others. But realize, there is no
such thing as a lucky number. There is no such thing
as a lucky sequence. If we flip a coin
10 times, it is equally likely to come
up all heads, all tails, or any other sequence
you can think of. [CLICK] [CRICKETS CHIRPING]