
Strengthening Julius Caesar: Vigenère ciphers

The weakness of the Julius Caesar system is that there are only 25 possible decrypts and so the cryptanalyst can try them all. Life can obviously be made more difficult for him if we increase the number of cases that must be tried before success can be assured. We can do this if, instead of shifting each letter by a fixed number of places in the alphabet, we shift the letters by a variable amount depending upon their position in the text. Of course there must be a rule for deciding the amount of the shift in each case, otherwise even an authorised recipient won’t be able to decrypt the message. A simple rule is to use several fixed shifts in sequence. For example, if instead of a fixed shift of 19 as was used in the message

COME AT ONCE

in the last chapter and which enciphered to

VHFX TM HGVX

we use two shifts, say 19 and 5, alternately, so that the first, third, fifth etc. letters are shifted 19 places and the second, fourth etc. are shifted 5 places then the cipher now becomes

VTFJ TY HSVJ.

If we replace the space character by Z in the message and use three shifts, say 19, 5 and 11, in sequence the plaintext becomes

COMEZATZONCE.

The cipher is now

VTXXELMEZGHP

and the key which provides the encipherment is 19-5-11. To read the message the recipient must use the decipherment key in which each of these three numbers is replaced by its complement (mod 26), i.e. by 7-21-15.
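The sequence of shifts and its complementary decipherment key can be checked with a few lines of Python. This is only a minimal sketch: the function names `shift_text` and `complement` are illustrative choices, and the text is assumed to consist of the capital letters A to Z only.

```python
from string import ascii_uppercase as ALPHABET


def shift_text(text, shifts):
    """Shift each letter of text by the given shifts, used in rotation (mod 26)."""
    out = []
    for i, letter in enumerate(text):
        k = shifts[i % len(shifts)]
        out.append(ALPHABET[(ALPHABET.index(letter) + k) % 26])
    return "".join(out)


def complement(shifts):
    """Return the decipherment key: each shift replaced by its complement mod 26."""
    return [(26 - k) % 26 for k in shifts]


key = [19, 5, 11]
cipher = shift_text("COMEZATZONCE", key)
print(cipher)                                # VTXXELMEZGHP
print(complement(key))                       # [7, 21, 15]
print(shift_text(cipher, complement(key)))   # COMEZATZONCE
```

Deciphering is thus the same operation as enciphering, applied with the complementary key.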

Even if a cryptanalyst suspected that a Julius Caesar system with three shifts being used sequentially was being employed he would have to try 75 or more combinations (one of the shifts might be 0). On such a short message as this there would be the possibility of more than one solution. If the message is too short to identify the three shifts independently, as we shall do in the example below, a ‘brute force’ method might have to be tried but, since this would involve

25 × 25 × 25 = 15 625

trials, it would only be used as a last resort. In the extreme case where the number of shifts used was equal to the number of letters in the message the message becomes ‘unbreakable’, unless there is some non-random feature to the sequence of shifts. Where there is no non-random feature, such as when the sequence of shifts has been generated by some ‘random number process’, we have what is known as a ‘one-time pad’ cipher, which we come to in Chapter 7.

This approach to strengthening the Julius Caesar cipher by means of several shifts has been used for some hundreds of years. Such systems are known under the name of Vigenère ciphers. Since most people find it easier to remember words rather than arbitrary strings of letters or numbers, Vigenère keys often take the form of a keyword. This reduces the number of possible keys of course but that is the price the cryptographer has to pay for easing the burden on his memory. The letters of the keyword are interpreted as numbers in the usual way (A = 0, B = 1, C = 2, ..., Z = 25) so that, for example, the keyword CHAOS would be equivalent to using the five shifts 2, 7, 0, 14 and 18 in sequence.

The keyword or numerical key would be written repeatedly above the plaintext and each plaintext letter moved the appropriate number of places to give the cipher. Thus if we enciphered COMEZATZONCE using Vigenère with the keyword CHAOS the layout would be

CHAOSCHAOSCH
COMEZATZONCE

and the resultant cipher is

EVMSRCAZCFEL.
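The keyword encipherment can be reproduced as follows. The sketch assumes capital letters only, and the names `keyword_to_shifts` and `vigenere_encipher` are introduced purely for illustration.

```python
def keyword_to_shifts(keyword):
    """Interpret each keyword letter as a shift: A = 0, B = 1, ..., Z = 25."""
    return [ord(letter) - ord("A") for letter in keyword]


def vigenere_encipher(plain, keyword):
    """Write the keyword repeatedly above the plaintext and shift each letter."""
    shifts = keyword_to_shifts(keyword)
    return "".join(
        chr((ord(letter) - ord("A") + shifts[i % len(shifts)]) % 26 + ord("A"))
        for i, letter in enumerate(plain)
    )


print(keyword_to_shifts("CHAOS"))                  # [2, 7, 0, 14, 18]
print(vigenere_encipher("COMEZATZONCE", "CHAOS"))  # EVMSRCAZCFEL
```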

A Vigenère cipher is a particular, and rather special, case of a polyalphabetic system in which, as the name implies, a number of different substitution alphabets are used rather than just one, as in simple substitution systems. The number of substitution alphabets used may be anything from 2 to many thousands; in the Enigma, for example, it is effectively 16 900, and these are simple substitutions, not Julius Caesar type shifted alphabets as in Vigenère ciphers, as we shall see in Chapter 9.

How to solve a Vigenère cipher

The first step in solving a Vigenère cipher is to determine the length of the key and, assuming that there is sufficient cipher text available, we do this by looking for repeated combinations of letters, polygraphs as they are called, and noting how far apart they are in the text. If these repetitions are genuine, that is if they are cipher versions of the same plaintext, then they will be separated by multiples of the length of the key, which should then be identified or, at least, reduced to one of a small number of possibilities. The longer the repeated polygraphs are the better the situation for the cryptanalyst, but even digraphs, two-letter combinations, can be helpful.

Example 3.1

A cipher message of 157 characters enciphered by a Vigenère cipher with Z used as ‘space’ is

HQEOT FNMKP ELTEL UEZSI KTFYG STNME GNDGL PUJCH QWFEX FEEPR PGKZY EHHQV PSRGN YGYSL EDBRX LWKPE ZMYPU EWLFG LESVR PGJLY QJGNY GYSLE XVWYP SRGFY KECVF XGFMV ZEGKT LQOZE LUIKS FYLXK HQWGI LF

(1) Find the length of the key.

(2) Find the key and decrypt the message.

Solution

(1) We examine the text and find that six digraphs occur three times or more, viz: EL at positions 11, 14 and 140; FY at positions 23, 119 and 146; GN at positions 31, 64 and 103; HQ at positions 1, 40, 58 and 151; LE at positions 70, 91 and 109; YG at positions 24, 66 and 105.

Further examination reveals that the digraph GN at positions 64 and 103 is in both cases the beginning of an eight-letter (‘octograph’) repeat:

GNYGYSLE

(these letters occupy positions 64 to 71 and 103 to 110 of the text above).

Eight-letter repeats are very unlikely to occur at random (but Jack Good’s experience referred to later in this chapter shows that even ‘very unlikely’ events do sometimes occur!) so we assume that this is almost certainly significant. We therefore find the distance between the octographs, which is (103 − 64) = 39, and since 39 = 3 × 13 we conjecture that the key has a length of either 3 or 13. We now look at the distances between repeats of the other digraphs such as the following:

EL at positions 11, 14 and 140 gives intervals of 3 and 126 (= 3 × 42);

HQ at positions 1, 40, 58 and 151 gives intervals 39, 18 and 93, all multiples of 3.

These indicate that 3 is by far the most likely length of the keyword. Assuming that this is so the next step is to find the key.
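The search for repeats and the examination of their spacing can be mechanised. The following is a minimal sketch, not a general-purpose tool: the helper names `repeat_positions` and `intervals` are illustrative, positions are counted from 1 as in the text, and taking the greatest common divisor of the intervals is one convenient way of summarising them (it works here because every repeat considered is genuine).

```python
from collections import defaultdict
from functools import reduce
from math import gcd

# Cipher text of Example 3.1 with the five-letter grouping removed.
CIPHER = "".join("""
HQEOT FNMKP ELTEL UEZSI KTFYG STNME GNDGL PUJCH QWFEX FEEPR
PGKZY EHHQV PSRGN YGYSL EDBRX LWKPE ZMYPU EWLFG LESVR PGJLY
QJGNY GYSLE XVWYP SRGFY KECVF XGFMV ZEGKT LQOZE LUIKS FYLXK
HQWGI LF
""".split())


def repeat_positions(text, n, min_count=2):
    """Map each n-letter polygraph seen at least min_count times to its 1-based positions."""
    found = defaultdict(list)
    for i in range(len(text) - n + 1):
        found[text[i:i + n]].append(i + 1)
    return {graph: pos for graph, pos in found.items() if len(pos) >= min_count}


def intervals(positions):
    """Distances between successive occurrences."""
    return [b - a for a, b in zip(positions, positions[1:])]


digraphs = repeat_positions(CIPHER, 2, min_count=3)
octographs = repeat_positions(CIPHER, 8)

print(octographs["GNYGYSLE"])      # [64, 103]
print(intervals(digraphs["HQ"]))   # [39, 18, 93]

# Intervals for the six digraphs named in the solution.
gaps = [d for g in ("EL", "FY", "GN", "HQ", "LE", "YG") for d in intervals(digraphs[g])]
print(reduce(gcd, gaps))           # 3
```

A single accidental repeat would spoil the greatest common divisor, so in practice it is safer to look at how many intervals each candidate length divides.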

(2) We now believe that three shifts were used; the first shift being applied to the 1st, 4th, 7th, ... letters; the second shift to the 2nd, 5th, 8th, ... letters and the third shift to the 3rd, 6th, 9th, ... letters. We therefore write the cipher out on a width of three columns and make a frequency count of the cipher letters in each of these three columns and we find Table 3.1. The numbers in the rows total 53, 52 and 52. If the frequencies were randomly distributed each of the numbers should be about 2, but we could reasonably expect a range from 0 to about 5 or 6. Of course the frequencies are far from random since each individual column consists of plaintext letters which have all been shifted by the same amount. The line of attack then is to look for unusually large frequencies in the hope of identifying the letters which are the enciphered versions of Z, the letter used to represent ‘space’, in the three rows of the table. In the first row we note that G occurs 13 times and that L, which occurs 7 times, is the next most frequent. If G is the encipherment of Z then the shift for the first row is 7 and L would be the encipherment of E, the letter 7 places before it in the alphabet. Since E is a high frequency letter this lends support to our belief that the first shift is 7, i.e. that the first number in the key is 7.

Table 3.1

Letter         A  B  C  D  E  F  G  H  I  J  K  L  M  N  O  P  Q  R  S  T  U  V  W  X  Y  Z

First shift    0  1  0  0  0  3 13  4  0  0  1  7  1  2  1  5  0  0  2  2  4  4  0  0  2  1
Second shift   0  0  0  0 13  6  0  0  3  2  2  1  2  3  0  0  6  3  3  1  0  0  2  1  3  1
Third shift    0  0  2  2  4  1  1  1  0  1  5  5  1  0  1  4  0  2  3  2  0  1  3  4  6  3
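The counts in Table 3.1 can be regenerated by splitting the cipher text into its three columns. This is a sketch under the assumption, made in the solution, that the key length is 3; the variable names are illustrative.

```python
from collections import Counter
from string import ascii_uppercase as ALPHABET

# Cipher text of Example 3.1 with the five-letter grouping removed.
CIPHER = "".join("""
HQEOT FNMKP ELTEL UEZSI KTFYG STNME GNDGL PUJCH QWFEX FEEPR
PGKZY EHHQV PSRGN YGYSL EDBRX LWKPE ZMYPU EWLFG LESVR PGJLY
QJGNY GYSLE XVWYP SRGFY KECVF XGFMV ZEGKT LQOZE LUIKS FYLXK
HQWGI LF
""".split())

KEY_LENGTH = 3  # assumed key length

# Column i holds the letters enciphered with the (i + 1)th shift.
columns = [CIPHER[i::KEY_LENGTH] for i in range(KEY_LENGTH)]
counts = [Counter(column) for column in columns]

print("        " + " ".join(f"{letter:>2}" for letter in ALPHABET))
for number, (column, count) in enumerate(zip(columns, counts), start=1):
    row = " ".join(f"{count[letter]:2d}" for letter in ALPHABET)
    print(f"shift {number} {row}   total {len(column)}")
```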

In the second row we see that E occurs 13 times, which makes it a good candidate for being the encipherment of Z, and this would imply that the second shift is 5. In this row the next most frequent cipher letters are F and Q, both of which occur 6 times and, shifting these back 5 places, we see that they would correspond to plaintext letters A and L respectively, which looks promising. On the other hand the cipher version of plaintext E would be J and this cipher letter only occurs twice in the second row whereas we might expect it to occur 5 times, since E accounts for about 10% of the letters in typical samples of English. The evidence, though not totally convincing, on balance indicates that the second number of the key is probably 5.

In the third row there is no letter of outstandingly high frequency, with Y, K and L, which occur 6, 5 and 5 times respectively, being the best contenders as the cipher equivalent of Z. We could try each of these in turn but an alternative approach is to write out the beginning of the cipher text, ignoring the spaces after each five-letter group, and using the assumed shifts of 7 and 5 to decrypt the first and second letters in each group of three. The third letter in each group we ‘decrypt’ as ‘/’ and we look to see if we can identify any incomplete words and so deduce the third number of the key. So we have
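The arithmetic behind these guesses for the first two rows is simple enough to check directly. In this sketch the helper names `shift_if_space`, `unshift` and `shift_on` are hypothetical; the first returns the shift implied by assuming that a given cipher letter stands for Z (‘space’).

```python
def shift_if_space(cipher_letter, space="Z"):
    """Shift implied if cipher_letter is the encipherment of the 'space' letter."""
    return (ord(cipher_letter) - ord(space)) % 26


def unshift(cipher_letter, k):
    """Move a cipher letter back k places to recover the plaintext letter."""
    return chr((ord(cipher_letter) - ord("A") - k) % 26 + ord("A"))


def shift_on(plain_letter, k):
    """Move a plaintext letter on k places to give the cipher letter."""
    return chr((ord(plain_letter) - ord("A") + k) % 26 + ord("A"))


print(shift_if_space("G"))                # 7  (first row: G taken as Z)
print(unshift("L", 7))                    # E
print(shift_if_space("E"))                # 5  (second row: E taken as Z)
print(unshift("F", 5), unshift("Q", 5))   # A L
print(shift_on("E", 5))                   # J
print(round(0.10 * 52))                   # 5  (E expected about 5 times in 52 letters)
```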

Cipher: HQEOTFNMKPELTELUEZSIKTFYGSTNMEGNDGLPUJCHQWFEXFEEPR
Plain:  AL/HO/GH/IZ/MZ/NZ/LD/MA/ZN/GH/ZI/ZG/NE/AL/YZ/YZ/IM

The first word looks as if it is ALTHOUGH and if this is so then plain letter T becomes cipher letter E, which implies a shift of 11, since E is 11 places ‘on’ from T (or, what is the same, 15 places ‘behind’ T) in the alphabet. With a shift of 11 the cipher letter corresponding to ‘space’ (i.e. to Z) would be K, which was one of our possibilities. We conclude that the third number in the key is 11, so that the three-figure encipherment key is 7-5-11, and the decipherment key is therefore 19-21-15, and this is confirmed by the full decrypt which is:

ALTHOUGH I AM AN OLD MAN NIGHT IS GENERALLY MY TIME
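Both the partial decrypt, with the unknown third shift shown as ‘/’, and the final decrypt can be reproduced with a short routine. This is a sketch only: the function name `decipher` and the use of `None` to mark an unknown shift are illustrative conventions, and Z is turned back into a space after deciphering.

```python
# Cipher text of Example 3.1 with the five-letter grouping removed.
CIPHER = "".join("""
HQEOT FNMKP ELTEL UEZSI KTFYG STNME GNDGL PUJCH QWFEX FEEPR
PGKZY EHHQV PSRGN YGYSL EDBRX LWKPE ZMYPU EWLFG LESVR PGJLY
QJGNY GYSLE XVWYP SRGFY KECVF XGFMV ZEGKT LQOZE LUIKS FYLXK
HQWGI LF
""".split())


def decipher(text, shifts):
    """Undo the encipherment shifts, used in rotation; None marks an unknown shift."""
    out = []
    for i, letter in enumerate(text):
        k = shifts[i % len(shifts)]
        if k is None:
            out.append("/")
        else:
            out.append(chr((ord(letter) - ord("A") - k) % 26 + ord("A")))
    return "".join(out)


# First 50 letters with only the first two shifts assumed known.
print(decipher(CIPHER[:50], [7, 5, None]))
# AL/HO/GH/IZ/MZ/NZ/LD/MA/ZN/GH/ZI/ZG/NE/AL/YZ/YZ/IM

# Full key 7-5-11, with Z shown as a space.
plain = decipher(CIPHER, [7, 5, 11]).replace("Z", " ")
print(plain[:51])
# ALTHOUGH I AM AN OLD MAN NIGHT IS GENERALLY MY TIME
```

Moving each letter back 7, 5 and 11 places is equivalent to moving it on by the decipherment key 19-21-15.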