Title | The Code Book: The Secret History of Codes and Code-breaking |
---|---|
Author | Simon Singh |
Genre | Other educational literature |
Series | |
Publisher | Other educational literature |
Year of publication | 0 |
ISBN | 9780007378302 |
Table 8 Repetitions and spacings in the ciphertext.
To identify whether the key is 2, 4, 5, 10 or 20 letters long, we need to look at the factors of all the other spacings. Because the keyword seems to be 20 letters or smaller, Table 8 lists those factors that are 20 or smaller for each of the other spacings. There is a clear propensity for a spacing divisible by 5. In fact, every spacing is divisible by 5. The first repeated sequence, E-F-I-Q, can be explained by a keyword of length 5 recycled nineteen times between the first and second encryptions. The second repeated sequence, P-S-D-L-P, can be explained by a keyword of length 5 recycled just once between the first and second encryptions. The third repeated sequence, W-C-X-Y-M, can be explained by a keyword of length 5 recycled four times between the first and second encryptions. The fourth repeated sequence, E-T-R-L, can be explained by a keyword of length 5 recycled twenty-four times between the first and second encryptions. In short, everything is consistent with a five-letter keyword.
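The factor-checking described above can be sketched in a few lines of Python. This is only an illustration, not a reproduction of Table 8: the spacings 95, 5, 20 and 120 are inferred from the text (a five-letter keyword recycled 19, 1, 4 and 24 times), and the function names `repeat_spacings`, `small_factors` and `common_lengths` are hypothetical.

```python
from collections import defaultdict


def repeat_spacings(ciphertext, length=4):
    """Map each repeated sequence of `length` letters to the gaps between its occurrences."""
    positions = defaultdict(list)
    for i in range(len(ciphertext) - length + 1):
        positions[ciphertext[i:i + length]].append(i)
    return {
        seq: [later - earlier for earlier, later in zip(pos, pos[1:])]
        for seq, pos in positions.items()
        if len(pos) > 1
    }


def small_factors(spacing, max_len=20):
    """Divisors of `spacing` from 2 up to `max_len`: the candidate keyword lengths."""
    return [k for k in range(2, max_len + 1) if spacing % k == 0]


def common_lengths(spacings, max_len=20):
    """Keyword lengths that divide every spacing."""
    factor_sets = [set(small_factors(s, max_len)) for s in spacings]
    return sorted(set.intersection(*factor_sets))


# Spacings inferred from the text: 5 x 19, 5 x 1, 5 x 4 and 5 x 24.
spacings = {"EFIQ": 95, "PSDLP": 5, "WCXYM": 20, "ETRL": 120}
print(common_lengths(spacings.values()))  # [5]
```

The spacing of 20 on its own allows keyword lengths of 2, 4, 5, 10 or 20; intersecting the factor lists of all four spacings leaves only 5.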
Assuming that the keyword is indeed 5 letters long, the next step is to work out the actual letters of the keyword. For the time being, let us call the keyword L1-L2-L3-L4-L5, such that L1 represents the first letter of the keyword, and so on. The process of encipherment would have begun with enciphering the first letter of the plaintext according to the first letter of the keyword, L1. The letter L1 defines one row of the Vigenère square, and effectively provides a monoalphabetic substitution cipher alphabet for the first letter of the plaintext. However, when it comes to encrypting the second letter of the plaintext, the cryptographer would have used L2 to define a different row of the Vigenère square, effectively providing a different monoalphabetic substitution cipher alphabet. The third letter of plaintext would be encrypted according to L3, the fourth according to L4, and the fifth according to L5. Each letter of the keyword is providing a different cipher alphabet for encryption. However, the sixth letter of the plaintext would once again be encrypted according to L1, the seventh letter of the plaintext would once again be encrypted according to L2, and the cycle repeats itself thereafter. In other words, the polyalphabetic cipher consists of five monoalphabetic ciphers, each monoalphabetic cipher is responsible for encrypting one-fifth of the entire message, and, most importantly, we already know how to cryptanalyse monoalphabetic ciphers.
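To make the cycling of the keyword concrete, here is a minimal Python sketch of Vigenère encipherment, assuming a letters-only message. The keyword LEMON and the function names are hypothetical, chosen purely for illustration; they are not taken from the text.

```python
import string

ALPHABET = string.ascii_uppercase


def vigenere_encrypt(plaintext, keyword):
    """Encrypt a letters-only message; letter i uses keyword letter i mod len(keyword)."""
    ciphertext = []
    for i, letter in enumerate(plaintext.upper()):
        # The keyword letter selects a row of the Vigenere square, i.e. a shift.
        shift = ALPHABET.index(keyword[i % len(keyword)].upper())
        ciphertext.append(ALPHABET[(ALPHABET.index(letter) + shift) % 26])
    return "".join(ciphertext)


def split_by_key_position(ciphertext, key_length):
    """Group the letters that were encrypted with the same keyword letter."""
    return [ciphertext[k::key_length] for k in range(key_length)]


# Encrypting a run of As exposes the keyword, because A is shifted straight to the key letter.
ciphertext = vigenere_encrypt("AAAAAAAAAA", "LEMON")
print(ciphertext)                            # LEMONLEMON
print(split_by_key_position(ciphertext, 5))  # ['LL', 'EE', 'MM', 'OO', 'NN']
```

Each of the five groups returned by `split_by_key_position` has been shifted by a single fixed amount, which is why each can be attacked as an ordinary monoalphabetic cipher.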
We proceed as follows. We know that one of the rows of the Vigenère square, defined by L1, provided the cipher alphabet to encrypt the 1st, 6th, 11th, 16th, … letters of the message. Hence, if we look at the 1st, 6th, 11th, 16th, … letters of the ciphertext, we should be able to use old-fashioned frequency analysis to work out the cipher alphabet in question. Figure 14 shows the frequency distribution of the letters that appear in the 1st, 6th, 11th, 16th, … positions of the ciphertext, which are W, I, R, E, …. At this point, remember that each cipher alphabet in the Vigenère square is simply a standard alphabet shifted by a value between 1 and 26. Hence, the frequency distribution in Figure 14 should have similar features to the frequency distribution of a standard alphabet, except that it will have been shifted by some distance. By comparing the L1 distribution with the standard distribution, it should be possible to work out the shift. Figure 15 shows the standard frequency distribution for a piece of English plaintext.
Figure 14 Frequency distribution for letters in the ciphertext encrypted using the L1 cipher alphabet (number of occurrences).
Figure 15 Standard frequency distribution (number of occurrences based on a piece of plaintext containing the same number of letters as in the ciphertext).
The standard distribution has peaks, plateaus and valleys, and to match it with the L1 cipher distribution we look for the most outstanding combination of features. For example, the three spikes at R-S-T in the standard distribution (Figure 15) and the long depression to its right that stretches across six letters from U to Z together form a very distinctive pair of features. The only similar features in the L1 distribution (Figure 14) are the three spikes at V-W-X, followed by the depression stretching six letters from Y to D. This would suggest that all the letters encrypted according to L1 have been shifted four places, or that L1 defines a cipher alphabet which begins E, F, G, H, …. In turn, this means that the first letter of the keyword, L1, is probably E. This hypothesis can be tested by shifting the L1 distribution back four letters and comparing it with the standard distribution. Figure 16 shows both distributions for comparison. The match between the major peaks is very strong, implying that it is safe to assume that the keyword does indeed begin with E.
Figure 16 The L1 distribution shifted back four letters (top), compared with the standard frequency distribution (bottom). All major peaks and troughs match.
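The comparison above is made by eye, matching peaks and depressions. As an alternative, the same idea can be automated by scoring every possible shift against typical English letter frequencies and keeping the best match. The sketch below does this with a simple sum of products; it is a swapped-in technique, not the visual method used in the text. The percentages in `ENGLISH_PERCENT` are approximate, commonly quoted values, not the data behind Figure 15, and the function names are hypothetical.

```python
# Approximate relative frequencies (per cent) of A..Z in English text (assumed values).
ENGLISH_PERCENT = [
    8.2, 1.5, 2.8, 4.3, 12.7, 2.2, 2.0, 6.1, 7.0, 0.15, 0.8, 4.0, 2.4,
    6.7, 7.5, 1.9, 0.1, 6.0, 6.3, 9.1, 2.8, 1.0, 2.4, 0.15, 2.0, 0.07,
]


def match_score(counts, shift):
    """Score how well `counts`, shifted back by `shift`, lines up with English."""
    return sum(counts[(i + shift) % 26] * ENGLISH_PERCENT[i] for i in range(26))


def best_shift(counts):
    """Return the shift (0 to 25) whose score is highest; `counts` lists the tallies for A..Z."""
    return max(range(26), key=lambda shift: match_score(counts, shift))


def key_letter(shift):
    """A shift of 4 corresponds to the row of the square beginning with E, and so on."""
    return chr(ord("A") + shift)


# Synthetic check: an ideal English distribution moved four places along the alphabet.
shifted = [ENGLISH_PERCENT[(j - 4) % 26] for j in range(26)]
print(best_shift(shifted), key_letter(best_shift(shifted)))  # 4 E
```

With only a fifth of a short ciphertext available for each keyword letter, the highest score can occasionally be misleading, so it is worth inspecting the runners-up as well.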
To summarise, searching for repetitions in the ciphertext has allowed us to identify the length of the keyword, which turned out to be five letters long. This allowed us to split the ciphertext into five parts, each one enciphered according to a monoalphabetic substitution as defined by one letter of the keyword. By analysing the fraction of the ciphertext that was enciphered according to the first letter of the keyword, we have been able to show that this letter, L1, is probably E. This process is repeated in order to identify the second letter of the keyword. A frequency distribution is established for the 2nd, 7th, 12th, 17th, … letters in the ciphertext. Again, the resulting distribution, shown in Figure 17, is compared with the standard distribution in order to deduce the shift.
Figure 17 Frequency distribution for letters in the ciphertext encrypted using the L2 cipher alphabet (number of occurrences).
Figure 18 The L2 distribution shifted back twelve letters (top), compared with the standard frequency distribution (bottom). Most major peaks and troughs match.
This distribution is harder to analyse. There are no obvious candidates for the three neighbouring peaks that correspond to R-S-T. However, the depression that stretches from G to L is very distinct, and probably corresponds to the depression we expect to see stretching from U to Z in the standard distribution. If this were the case, we would expect the three R-S-T peaks to appear at D, E and F, but the peak at E is missing. For the time being, we shall dismiss the missing peak as a statistical glitch, and go with our initial reaction, which is that the depression from G to L is a recognisably shifted feature. This would suggest that all the letters encrypted according to L2 have been shifted twelve places, or that L2 defines a cipher alphabet which begins M, N, O, P, … and that the second letter of the keyword, L2, is M. Once again, this hypothesis can be tested by shifting the L2 distribution back twelve letters and comparing it with the standard distribution. Figure 18 shows both distributions, and the match between the major peaks is very strong, implying that it is safe to assume that the second letter of the keyword is indeed M.
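The arithmetic that turns a matched feature into a keyword letter can be checked with a short Python sketch. The helper names `shift_between`, `key_letter` and `shift_letter` are hypothetical; the letters themselves are those discussed in the text.

```python
def shift_between(standard_letter, cipher_letter):
    """Number of places a feature has moved from the standard to the cipher distribution."""
    return (ord(cipher_letter) - ord(standard_letter)) % 26


def key_letter(shift):
    """The keyword letter is the first letter of the row shifted by `shift` places."""
    return chr(ord("A") + shift)


def shift_letter(letter, shift):
    """Move a single letter `shift` places along the alphabet, wrapping after Z."""
    return chr((ord(letter) - ord("A") + shift) % 26 + ord("A"))


# L2: the depression starting at U in the standard distribution starts at G.
shift = shift_between("U", "G")
print(shift, key_letter(shift))                    # 12 M
print([shift_letter(c, shift) for c in "RST"])     # ['D', 'E', 'F']

# L1: the R-S-T spikes appear at V-W-X.
print(key_letter(shift_between("R", "V")))         # E
```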
I shall not continue the analysis; suffice to say that analysing the 3rd, 8th, 13th, … letters implies that the third letter of the keyword is I, analysing the 4th, 9th, 14th, … letters implies that the fourth letter is L, and analysing the 5th, 10th, 15th, … letters implies that the fifth letter is Y. The keyword is EMILY. It is now possible to reverse the Vigenère cipher and complete the cryptanalysis. The first letter of the ciphertext is W, and it was encrypted according to the first letter of the keyword, E. Working backwards, we look at the Vigenère square, and find W in the row beginning with E, and then we find which letter is at the top of that column. The letter is s, which must make it the first letter of the plaintext. By repeating this