If you're seeing this message, it means we're having trouble loading external resources on our website.

If you're behind a web filter, please make sure that the domains *.kastatic.org and *.kasandbox.org are unblocked.

Main content

Polyalphabetic cipher

Brit introduces the polyalphabetic cipher, which creates a lighter fingerprint than the Caesar cipher. Created by Brit Cruise.

Want to join the conversation?

  • leaf green style avatar for user Solt Budavári
    Could someone please explain the meaning of the following sentences of this lesson?
    a)"Any time there is a differential in letter frequencies, a leak of information occurs. These differences are caused by repetition in the encrypted message."
    b)"[the codebraker] will need to go through and check the *frequency distributions of different intervals.*"

    * emphasis added by me
    Can someone explain these things by, say, decrypting the message seen in the video step-by-step?
    Thank you.
    (111 votes)
    Default Khan Academy avatar avatar for user
    • male robot hal style avatar for user Albert Lee
      Well, as in the video, with each code there is almost always some form of pattern, no matter how minor. Each pattern/unique difference is something that may stand out, which will then draw one's attention to that specific section.

      In the video, each difference in letter frequency assists the code-breaker "Eve" by hinting at which letters/numbers may have been modified the most. Eventually, a pattern will be deduced from the discrepancies between each tally amount of characters.

      The best code will have a lessened "fingerprint", or more minuscule and therefore harder to notice patterns. No code is completely fool-proof, unless in the obvious situation where the code-makers were just making stuff up or died without sharing the full message.
      (51 votes)
  • spunky sam blue style avatar for user mikegill32
    I love this series! I had to pause this to understand better.
    At if you do the decrypt math, it seems any of the letters that are being modified by +13 are off by one, so they should be MEET=FRFE, AT=NU, ELEPHANT=PQXCILSM, LAKE=YBVJ
    Am I missing something?
    (51 votes)
    Default Khan Academy avatar avatar for user
  • blobby green style avatar for user Ruchi
    I wasn't able to understand decrypting the message with the snake example. how do you determine the length of the word used to encrypt?
    (44 votes)
    Default Khan Academy avatar avatar for user
  • leaf grey style avatar for user Shlomo Fingerer
    Ok. Basically, me & you exchange in advance any word we want. That word is our "master plan". Every time you write me a letter, you take a look at our "master word", turn the word into a bunch of numbers, A=1 Z=26 etc. Then... you write your message, writing on top of every letter, our chosen "master word" over and over. Before you send it to me, you shift every letter in your message by the number represented by the "master letter" that you have written on top of your rough draft! No?
    (11 votes)
    Default Khan Academy avatar avatar for user
    • aqualine ultimate style avatar for user N. J. Pratt
      Yes that's correct!

      As an example, let’s say the message is, “Elephant Lake has been compromised."
      Then, let’s say the master word is “snake” as in the video at .
      The message would translate to numbers as follows:
      ELEPHANT LAKE HAS BEEN COMPROMISED.
      5 12 5 16 8 1 14 20 12 1 11 5 8 1 19 2 5 5 14 3 15 13 16 18 15 13 9 19 5 4.
      The ‘master word’ would translate to numbers as follows:
      SNAKE
      19 14 1 11 5
      Then, place the message over the "master word" or key, as it is commonly called, and shift, like this:
      E L E P H A N T L A K E
      5 12 5 16 8 1 14 20 12 1 11 5
      19 14 1 11 5 19 14 1 11 5 19 14
      24 26 6 1 13 20 2 21 23 6 4 19
      X Z F A M T B U W F D S

      H A S B E E N
      8 1 19 2 5 5 14
      1 11 5 19 14 1 11
      9 12 24 21 19 6 25
      I I L U S F Y

      C O M P R O M I S E D.
      3 15 13 16 18 15 13 9 19 5 4.
      5 19 14 1 11 5 19 14 1 11 5
      8 8 1 17 3 20 6 23 20 16 9
      H H A Q C T F W T P I.

      Thus, "Elephant Lake has been compromised," turns into "XZFAMTBU WFDS IIX USFY HHAQCTFWTPI." Quite unintelligible and difficult to decipher.
      (48 votes)
  • old spice man green style avatar for user eggie5
    I believe "snake" should code to: 19 14 1 11 5, i.e.: 14 not 13 for 'n'. I just discovered this while trying to implement a poly block cipher in ruby:
    shared_secret="snake"
    m="Meet me at elephant lake Make sure nobody follows you Bring the bomb".downcase!.delete!(" ")
    cipher=""

    m.chars.each_with_index do |c, i|
    shift = shared_secret[i % shared_secret.length].ord-96

    shifted_char = c.ord+shift
    shifted_char=(shifted_char % 122)+96

    cipher << shifted_char.chr
    end

    p m
    p cipher
    (17 votes)
    Default Khan Academy avatar avatar for user
  • leaf red style avatar for user Pradyumna Chakrapani
    How does one determine the leak of information & length of the word?
    (6 votes)
    Default Khan Academy avatar avatar for user
    • starky sapling style avatar for user James McKinney
      It is the differential in letter frequencies that is the leak. You can obtain this by counting how many times a number is used in the code. In a Ceasar cypher the frequency of each letter is used is it's fingerprint. @ it shows the fingerprint for English. We use some letter more frequently like 'E', 'A', and 'T'. 'E' in particular is used a lot more than the others. So when you see a letter in the code used a lot there is a good chance it is the letter 'E'. This gives you a place to start, and reduce the tedious task of trial-and-error.

      @ For a polyalphabetic cypher Brit explains that the length of the word is the key in a cracking the code.

      To find this you take letters at different intervals to build a subset of letters to analyze their frequency. i.e. start with the first letter then take every 3rd,4th, or 5th letter and build subsets. Analyze these subsets and look for the familiar fingerprint. If, for example, the length of the code word is 5 letters, the fingerprint should appear when you analyze every 5th letter (@). If the word is 3 letters long the fingerprint should appear when you analyze every 3rd letter.

      @ After you have determined the length of the word to be X letters long, you break up the code into subsets of every Xth letter (you should have X number of these subsets). You then need to perform a Ceasar cypher on each subset individually.
      (15 votes)
  • male robot hal style avatar for user Jacob
    Who is Brit Cruise?
    (4 votes)
    Default Khan Academy avatar avatar for user
  • male robot donald style avatar for user Zachary Mass
    So, if you used a code word that is has 26 letters in it (the same number that is in the alphabet) would the code then be unbreakable?
    (3 votes)
    Default Khan Academy avatar avatar for user
    • male robot hal style avatar for user Cameron
      No, the code word must be as long as the message to be unbreakable.

      The number of symbols you use in the alphabet does not matter as long as you choose from them randomly for each letter in your key (in computers we typically use only 2 symbols 1 and 0 for our "alphabet" in binary). If you choose an English phrase or some other non-random choice for the key, the code will be breakable,

      For frequency analysis, if we know how long the key is we do the following:
      -we draw some columns (# of columns = length of our key)
      -we write the cipher text letter by letter in each of the columns
      -once we go over the number of columns, we start a new row and start back at the first column and write the letters underneath the previous row
      -once all the letters have been written in columns, we calculate how often each cipher text letter occurs in each column
      -we expect that the cipher text letter will correspond to the plain text letter with a similar frequency in normal English usage

      So if we had a key of length 26 and a message of length 2600, we would have 100 rows. Suppose we measured the frequency of the first column, and found that "q" occurred ~8% of the time and "u" occured ~13% of the time. In plain English we expect that the letter "a" occurs ~8.2% of the time and "e" occurs ~12.7% of the time. So we can figure out that the "q" probably represents an "a" and the "u" represents an "e", so the 1st letter in the key has shifted everything up "q" places, so the 1st key letter is a "q".

      Now, if we tried the same thing with a key length of 2600 and a message length of 2600, we would only have one letter in each column (this will be true when the key length = message length). So if we try to calculate the frequency of each cipher text letter we get 100% for the letter in the column and 0% for all the other letters in the alphabet. This doesn't really tell us anything.

      Hope this makes sense
      (7 votes)
  • orange juice squid orange style avatar for user Lasker
    While it is true that for any given message a longer shift word will lead to a stronger code, wouldn't a more accurate way of expressing the strength of a cipher using this code be as a ratio between the length of the message and the length of the shift word? For example, A 40 character message with a four letter shift word should be about as difficult to break (10 instances of each Caesar cipher) as a 100 character message with a ten character shift word (10 instances again), right?
    (3 votes)
    Default Khan Academy avatar avatar for user
    • aqualine seed style avatar for user fleizach
      No, it is indeed the length of the shift word and not the ratio of the shift word to message length that provides the strength. In the above video, Brit shows that when a 5-letter interval is guessed, the result gives the characteristic letter frequency. That means he had to try a 1-letter interval, a 2-letter interval, and so on up to the 5-letter interval. In your example, a four letter shift word only requires a search of 1, 2, 3, and 4-letter intervals before the correct interval is found, but a ten letter shift word requires a codebreaker to search up to the 10-letter interval before we can break the message. This is more work so the longer shift word provides more security.
      (7 votes)
  • spunky sam blue style avatar for user Arjun Bahuguna
    Eve is really determined to get this Bob guy.
    (5 votes)
    Default Khan Academy avatar avatar for user

Video transcript

A strong cipher is one which disguises your fingerprint. To make a lighter fingerprint is to flatten this distribution of letter frequencies. By the mid-15th century, we had advanced to polyalphabetic ciphers to accomplish this. Imagine Alice and Bob shared a secret shift word. First, Alice converts the word into numbers according to the letter position in the alphabet. Next, this sequence of numbers is repeated along the message. Then each letter in the message is encrypted by shifting according to the number below it. Now she is using multiple shifts instead of a single shift across the message, as Caesar had done before. Then the encrypted message is sent openly to Bob. Bob decrypts the message by subtracting the shifts according to the secret word he also has a copy of. Now imagine a code breaker, Eve, intercepts a series of messages and calculates the letter frequencies. She will find a flatter distribution, or a lighter fingerprint. So how could she break this? Remember, code breakers look for information leak, the same as finding a partial fingerprint. Any time there is a differential in letter frequencies, a leak of information occurs. This difference is caused by repetition in the encrypted message. In this case, Alice's cipher contains a repeating code word. To break the encryption, Even would first need to determine the length of this shift word used, not the word itself. She will need to go through and check the frequency distribution of different intervals. When she checks the frequency distribution of every fifth letter, the fingerprint will reveal itself. The problem now is to break five Cesar Ciphers in a repeating sequence. Individually this is a trivial task, as we have seen before. The added strength of this cipher is the time taken to determine the length of the shift word used. The longer the shift word, the stronger the cipher.