Cryptography, Information Theory, and Error-Correction. Aiden A. Bruen

Читать онлайн.
Название Cryptography, Information Theory, and Error-Correction
Автор произведения Aiden A. Bruen
Жанр Зарубежная компьютерная литература
Серия
Издательство Зарубежная компьютерная литература
Год выпуска 0
isbn 9781119582403



Скачать книгу

      Source: These percentages are based on the book “The Tragical History of Doctor Faustus”, by Christopher Marlowe, which is a book of approximately 100 000 characters that was chosen randomly from the Project Gutenberg.

      Frequency analysis can be used for cryptanalysis. However, one needs a lot of craft in its use, along with any information that can be gathered about the contents of the message and the sender.

      Now that the Vigenère cipher has been defined, we will show how to use the frequencies of letters in English, to break this cipher. We have two tasks.

       Find the keyword length.

       Find the keyword itself.

      We will use two fundamental principles in carrying out our tasks.

       “E” is the most frequent letter of the English language.

       Informally, written English tends to “repeat itself.” This means that the frequencies of a passage starting in position 1 are similar to the frequencies of the passage starting in position when we slide the text along itself.

      Once we obtain n, the key‐length, we can find the keyword itself. We do this by using the first fundamental property above. Namely, we exploit the statistics of the letters in English, or pairs of letters (i.e. digrams), or trigrams, etc.

      The second principle has two important interpretations. For the Babbage–Kasiski method, this means that if we find a repeated letter (or sequence of letters) in the cipher text there is a good chance that it comes from a given letter (or sequence of letters) in the plain text that has been enciphered by a given letter (or letters) in the key. Thus, there is a reasonable expectation that the distances between such repeated sequences of letters equals the key length or a multiple of the key length.

      The second principle has an important implication in terms of our second method, called the method of “coincidences,” as well. The basic idea is explained in an example below.

      The Babbage–Kasiski method

      To demonstrate this method for finding n, suppose we received the following cipher text:

EHMVL VDWLP WIWXW PMMYD PTKNF RHWXS
LTWLP OSKNF WDGNF DEWLP SOXWP HIWLL
EHMYD LNGPT EEUWE QLLSX TUP

      Our task is to search for repetitions in the above text. For a small cipher text, the brute‐force method is not too difficult. We focus on trigrams, and highlight some as follows:

images
EHM 60
WLP 25,15,40
XWP 39
MYD 45

      We note that with the exception of 39, 5 divides all of the distances. In fact, if we proceed with frequency analysis, it turns out that we can decipher this message with a key length of 5. The codeword is “ladel,” and the plain text is “Thor and the little mouse thought that they should douse the house with a thousand liters of lithium” (Who said that secret messages have to have a clear meaning!) Frequency analysis is used in our next example below.

      It is purely by chance that we had the repeated trigram WXP – this repeated trigram was not the result of the same three letters being enciphered by the same part of the keyword. This highlights the fact that the above method is probabilistic.

      The method of coincidences

      We will now use the second principle of “coincidences,” to find the length of the keyword. The sequence of plain text letters in positions 1 to n, n plus 1 to 2 n, 2 n plus 1, to 3 n, etc., should be approximately the same, especially if n is large. It follows that a similar result holds true for the corresponding sequences of cipher text letters. Thus, if we slide the cipher text along itself and count the number of coincidences (i.e. agreements) at each displacement, then on average the maximum number of coincidences will occur after an integer multiple of the keyword length n (i.e. the max occurs for some lamda n, lamda equals 1 comma 2 comma 3 comma ellipsis). This technique can be illustrated using the following example: We will first determine the period n and use it to determine the nature of the keyword from a cipher text passage:


VVHZK UHRGF HGKDK ITKDW EFHEV SGMPH
KIUWA XGSQX JQHRV IUCCB GACGF SPGLH
GFHHD MHZGF BSPSW SDSXR DFHEM OEPGI
QXKZW LGHZI