Genetic algorithms in cryptography


Rochester Institute of Technology

RIT Scholar Works

Theses

Thesis/Dissertation Collections

7-1-2004

Genetic algorithms in cryptography

Bethany Delman


Genetic Algorithms in Cryptography

by

Bethany Delman

A Thesis Submitted in Partial Fulfillment of the Requirements for the Degree of Master of Science in Computer Engineering

Supervised by

Professor Dr. Muhammad Shaaban Department of Computer Engineering Kate Gleason College of Engineering

Rochester Institute of Technology Rochester, New York

July 2004

Approved By:

Dr. Muhammad Shaaban Professor

Primary Advisor

Dr. Stanislaw Radziszowski

Professor, Computer Science Department

Dr. Shanchieh Jay Yang

Thesis Release Permission Form

Rochester Institute of Technology Kate Gleason College of Engineering Master of Science, Computer Engineering

Title: Genetic Algorithms in Cryptography

I, Bethany Delman, hereby grant permission to the Rochester Institute of Technology to reproduce my print thesis in whole or in part. Any reproduction will not be for commercial use or profit.

I, Bethany Delman, additionally grant to the Rochester Institute of Technology Digital Media Library (RIT DML) the non-exclusive license to archive and provide electronic access to my thesis in whole or in part in all forms of media in perpetuity.

I understand that my work, in addition to its bibliographic record and abstract, will be available to the world-wide community of scholars and researchers through the RIT DML. I retain all other ownership rights to the copyright of the thesis. I also retain the right to use in future works (such as articles or books) all or part of this thesis. I am aware that the Rochester Institute of Technology does not require registration of copyrights for ETDs.

I hereby certify that, if appropriate, I have obtained and attached written permission statements from the owners of each third party copyrighted matter to be included in my thesis. I certify that the version I submitted is the same as that approved by my committee.

Abstract

Genetic algorithms (GAs) are a class of optimization algorithms. GAs attempt to solve problems through modeling a simplified version of genetic processes. There are many problems for which a GA approach is useful. It is, however, undetermined if cryptanalysis is such a problem.

Therefore, this work explores the use of GAs in cryptography. Both traditional cryptanalysis and GA-based methods are implemented in software. The results are then compared using the metrics of elapsed time and percentage of successful decryptions. A determination is made for each cipher under consideration as to the validity of the GA-based approaches found in the literature. In general, these GA-based approaches are typical of the field.

Of the genetic algorithm attacks found in the literature, totaling twelve, seven were re-implemented. Of these seven, only three achieved any success. The successful attacks were those on the transposition and permutation ciphers by Matthews [20], Clark [4], and Gründlingh and Van Vuuren [13], respectively. These attacks were further investigated in an attempt to improve or extend their success. Unfortunately, this attempt was unsuccessful, as was the attempt to apply the Clark [4] attack to the monoalphabetic substitution cipher and achieve the same or indeed any level of success.

Overall, the standard fitness equation genetic algorithm approach, and the scoreboard variant thereof, are not worth the extra effort involved. Traditional cryptanalysis methods are more successful, and easier to implement. While a traditional method takes more time, a faster unsuccessful attack is worthless. The failure of the genetic algorithm approach indicates that supplementary research into traditional cryptanalysis methods may be more

Figures

4.1 Substitution Cipher Family and Attacking Papers

4.2 Permutation Cipher Family and Attacking Papers

4.3 Knapsack Cipher Family and Attacking Papers

4.4 Vernam Cipher Family and Attacking Paper

6.1 Classical Substitution Attack

6.2 Classical Vigenere Attack

6.3 Classical Permutation Attack

6.4 Classical Transposition Attack

6.5 Effect of Number of Generations at Population Size 10

6.6 Effect of Number of Generations at Population Size 20

6.7 Effect of Number of Generations at Population Size 40

6.8 Effect of Number of Generations at Population Size 50

6.9 Effect of Population Size at 25 Generations

6.10 Effect of Population Size at 50 Generations

6.11 Effect of Population Size at 100 Generations

6.12 Effect of Population Size at 200 Generations

Glossary

block: A sequence of consecutive characters encoded at one time

block length: The number of characters in a block

chromosome: The genetic material of an individual - represents the information about a possible solution to the given problem

cipher: An algorithm for performing encryption (and the reverse, decryption) - a series of well-defined steps that can be followed as a procedure. Works at the level of individual letters, or small groups of letters

ciphertext: A text in the encrypted form produced by some cryptosystem. The convention is for ciphertexts to contain no white space or punctuation

crossover (mating): Crossover is the process by which two chromosomes combine some portion of their genetic material to produce a child or children

cryptanalysis: The analysis and deciphering of cryptographic writings or systems

cryptography: The process or skill of communicating in or deciphering secret writings or ciphers

cryptosystem: The package of all processes, formulae, and instructions for encoding and decoding messages using cryptography

decryption: Any procedure used in cryptography to convert ciphertext (encrypted data) into plaintext

digram: Sequence of two consecutive characters

encryption

fitness: The extent to which a possible solution successfully solves the given problem - usually a numerical value

generation: The average interval of time between the birth of parents and the birth of their offspring - in the genetic algorithm case, this is one iteration of the main loop of code

genetic algorithm (GA): Search/optimization algorithm based on the mechanics of natural selection and natural genetics

key: A relatively small amount of information that is used by an algorithm to customize the transformation of plaintext into ciphertext (during encryption) or vice versa (during decryption)

key length: The size of the key - how many values comprise the key

monoalphabetic: Using one alphabet - refers to a cryptosystem where each alphabetic character is mapped to a unique alphabetic character

mutation: Simulation of transcription errors that occur in nature with a low probability - a child is randomly changed from what its parents produced in mating

one-time pad: Another name for the Vernam cipher

order-based GA: A form of GA where the chromosomes represent permutations. Special care must be taken to avoid illegal permutations

plaintext: A message before encryption or after decryption, i.e., in its usual form which anyone can read, as opposed to its encrypted form

polyalphabetic: Using many alphabets - refers to a cipher where each alphabetic character can be mapped to one of many possible

population: The possible solutions (chromosomes) currently under investigation, as well as the number of solutions that can be investigated at one time, i.e., per generation

scoreboard: Method of determining the fitness of a possible solution - takes a small list of the most common digrams and trigrams and gives each chromosome a score based on how often these combinations occur

trigram: Sequence of three consecutive characters

unigram: Single character

Vigenere

Chapter 1

Introduction

The application of a genetic algorithm (GA) to the field of cryptanalysis is rather unique. Few works exist on this topic. This nontraditional application is investigated to determine the benefits of applying a GA to a cryptanalytic problem, if any. If the GA-based approach proves successful, it could lead to faster, more automated cryptanalysis techniques. However, since this area is so different from the application areas where GAs developed, the GA-based methods will likely prove to be less successful than the traditional methods.

The primary goals of this work are to produce a performance comparison between traditional cryptanalysis methods and genetic algorithm based methods, and to determine the validity of typical GA-based methods in the field of cryptanalysis.

The focus will be on classical ciphers, including substitution, permutation, transposition, knapsack and Vernam ciphers. The principles used in these ciphers form the foundation for many of the modern cryptosystems. Also, if a GA-based approach is unsuccessful on these simple systems, it is unlikely to be worthwhile to apply a GA-based approach to more complicated systems. A thorough search of the available literature found GA-based attacks on only these ciphers. In many cases, these ciphers are some of the simplest possible versions. For example, one of the knapsack systems attacked is the Merkle-Hellman knapsack scheme. In this scheme, a superincreasing sequence of length n is used as a public key. According to Shamir [23], a typical value for n is 100. In the GA-based attacks on this cipher, n was less than 16. This is almost an order of magnitude below the typical case,

GAs.

In other GA-based attacks, there is a huge gap in the knowledge needed to re-implement the attack and the information provided in the paper. For example, the values of $p$ and $h$ are vitally necessary for an implementation of the Chor-Rivest knapsack scheme. This system utilizes a finite field, $\mathbb{F}_q$, of characteristic $p$, where $q = p^h$, $p \ge h$, and the discrete logarithm problem is feasible. In the GA-based attack on this cipher, these values are not provided. This information hole means that this attack could have been run using trivial parameters, or parameters known to be easy to attack. This attack can not be considered remotely valid without knowing these parameters.

Many of the GA-based attacks also lack information required for comparison to the traditional attacks. The number of generations it took for a run of the GA to get to some state is not at all useful when attempting to compare and contrast attacks. The most coherent and seemingly valid attacks were re-implemented so that a consistent, reasonable set of metrics could be collected. These metrics include elapsed time required for the attack and the success percentage of each attack.

1.1 Document Overview

This thesis is broken into a number of chapters. Each chapter covers one aspect of the thesis in detail. Chapter 2 gives a brief introduction to genetic algorithms. The general form of the selection, reproduction and mutation operators are discussed, as is the general sequence of events for a GA-based approach. Chapter 3 introduces the necessary classical ciphers in detail. The algorithm for each cipher is given, in some cases including a small example to illustrate the process of using the given cryptosystem.

The next two chapters cover the different attacks found in the literature. Chapter 4 discusses the various GA-based attacks in detail. These attacks are covered in chronological

chapter, Chapter 5, details the classical attack on those ciphers chosen for further investigation. The ciphers were selected for further investigation due to the difficulty or lack thereof of the traditional attack method, as well as the clarity and quality of the GA-based work. For example, the Vernam cipher was not selected for further investigation. This cipher is unconditionally secure, meaning that an attack is successful only when there is a protocol or implementation failure. There is no basis for attacking this cipher when no mistakes have been made.

The next chapters detail the results of the traditional and GA-based attacks, as well as extensions to the GA-based attacks. Chapter 6 covers the results of the traditional attacks, and discusses those GA-based attacks that achieved some success. Chapter 7 discusses the extensions made to several of the GA-based attacks, as well as new work attempted.

Finally, Chapter 8 discusses the overall conclusions drawn from this work, as well as

Chapter 2

Genetic Algorithms

The genetic algorithm is a search algorithm based on the mechanics of natural selection and natural genetics [12]. As summarized by Tomassini [29], the main idea is that in order for a population of individuals to adapt to some environment, it should behave like a natural system. This means that survival and reproduction of an individual is promoted by the elimination of useless or harmful traits and by rewarding useful behavior. The genetic algorithm belongs to the family of evolutionary algorithms, along with genetic programming, evolution strategies, and evolutionary programming. Evolutionary algorithms can be considered as a broad class of stochastic optimization techniques. An evolutionary algorithm maintains a population of candidate solutions for the problem at hand. The population is then evolved by the iterative application of a set of stochastic operators. The set of operators usually consists of mutation, recombination, and selection or something very similar. Globally satisfactory, if sub-optimal, solutions to the problem are found in much the same way as populations in nature adapt to their surrounding environment.

Using Tomassini's terms [29], genetic algorithms (GAs) consider an optimization problem as the environment where feasible solutions are the individuals living in that environment. The degree of adaptation of an individual to its environment is the counterpart of the fitness function evaluated on a solution. Similarly, a set of feasible solutions takes the place

In general, a genetic algorithm begins with a randomly generated set of individuals. Once the initial population has been created, the genetic algorithm enters a loop [29]. At the end of each iteration, a new population has been produced by applying a certain number of stochastic operators to the previous population. Each such iteration is known as a generation [29].

A selection operator is applied first. This creates an intermediate population of n "parent" individuals. To produce these "parents", n independent extractions of an individual from the old population are performed [29]. The probability of each individual being extracted should be (linearly) proportional to the fitness of that individual. This means that above average individuals should have more copies in the new population, while below average individuals should have few to no copies present, i.e., a below average individual risks extinction.
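The selection step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `select_parents`, the toy population of bit strings, the bit-count fitness, and the uniform fallback for an all-zero population are hypothetical and do not come from the thesis.

```python
import random

def select_parents(population, fitness, n, rng=random):
    """Draw n parents independently, each with probability
    proportional to its fitness (fitness-proportional selection)."""
    weights = [fitness(ind) for ind in population]
    if sum(weights) <= 0:
        # Assumption: if no individual has positive fitness, pick uniformly.
        return [rng.choice(population) for _ in range(n)]
    return rng.choices(population, weights=weights, k=n)

# Hypothetical toy population: bit strings scored by their number of 1 bits.
population = ["0000", "0001", "0111", "1111"]
parents = select_parents(population, lambda s: s.count("1"), n=8,
                         rng=random.Random(1))
```

An individual with zero fitness ("0000" here) is never drawn, which is the extinction risk mentioned in the text, while "1111" is expected to appear most often.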

Once the intermediate population of "parents" (those individuals selected for reproduction) has been produced, the individuals for the next generation will be created through the application of a number of reproduction operators [29]. These operators can involve one or more parents. An operator that involves just one parent, simulating asexual reproduction, is called a mutation operator. When more than one parent is involved, sexual reproduction is simulated, and the operator is called recombination [29]. The genetic algorithm uses two reproduction operators - crossover and mutation.

To apply a crossover operator, parents are paired together. There are several different types of crossover operators, and the types available depend on what representation is used for the individuals. For binary string individuals, one-point, two-point, and uniform crossover are often used. For permutation or order-based individuals, order, partially

between the first and second cut points is copied from the first parent to the child [28]. The remaining places are filled using elements not occurring in this section, in the order that they occur in the second parent starting from the second cut point and wrapping around as needed. For partially mapped crossover, the section between the two cut points defines a series of swapping operations to be performed on the second parent [28]. Cycle crossover satisfies two conditions - every position of the child must retain a value found in the corresponding position of a parent, and the child must be a valid permutation. Each cycle, a random parent is selected. For example [28]:

Parent 1 is (h k c e f d b l a i g j) and parent 2 is (a b c d e f g h i j k l)

For position #1 in the child, 'h' is selected

Therefore, 'a' cannot be selected from parent 2 and must be chosen from parent 1

Since 'a' in parent 1 is above 'i' in parent 2, 'i' must also be selected from parent 1

So, if 'h' is selected from parent 1 for position #1, then 'a', 'i', 'j', and 'l' must also be selected from parent 1

The positions of these elements are said to form a cycle, since selection continues until the first element ('h' here) is reached again
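The cycle-tracing argument above can be checked mechanically. The following Python sketch is one possible reading of cycle crossover, not the implementation used in the thesis; the helper names `find_cycle` and `cycle_crossover` and the fixed random seed are assumptions made for illustration.

```python
import random

def find_cycle(p1, p2, start=0):
    """Return the positions of the cycle that contains position `start`."""
    pos_in_p1 = {gene: i for i, gene in enumerate(p1)}
    cycle, i = [], start
    while i not in cycle:
        cycle.append(i)
        # The value parent 2 holds at this position can no longer come from
        # parent 2, so jump to where parent 1 holds that value.
        i = pos_in_p1[p2[i]]
    return cycle

def cycle_crossover(p1, p2, rng):
    """Build one child; each cycle is copied whole from a randomly chosen parent."""
    child = [None] * len(p1)
    for start in range(len(p1)):
        if child[start] is None:
            source = rng.choice((p1, p2))
            for i in find_cycle(p1, p2, start):
                child[i] = source[i]
    return child

parent1 = list("hkcefdblaigj")
parent2 = list("abcdefghijkl")
first_cycle = find_cycle(parent1, parent2, 0)
print([parent1[i] for i in first_cycle])  # ['h', 'a', 'i', 'j', 'l']
child = cycle_crossover(parent1, parent2, random.Random(0))
```

Because every cycle is taken from a single parent, the child is always a valid permutation and every position holds the value of one of its parents at that position.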

After crossover, each individual has a small chance of mutation. The purpose of the mutation operator is to simulate the effect of transcription errors that can happen with a very low probability when a chromosome is mutated [29]. A standard mutation operator for binary strings is bit inversion. Each bit in an individual has a small chance of mutating into its complement, i.e. a '0' would mutate into a '1'.

In principle, the loop of selection-crossover-mutation is infinite [29]. However, it is usually stopped when a given termination condition is met. Some common termination conditions are [29]:

2. A satisfactory solution has been found

3. No improvement in solution quality has taken place for a certain number of generations

The different termination conditions are possible since a genetic algorithm is not guaranteed to converge to a solution. The evolutionary cycle can be summarized as follows [29]:

generation = 0
seed population
while not (termination condition) do
    generation = generation + 1
    calculate fitness
    selection
    crossover
    mutation
end while
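The cycle summarized above can be turned into a small runnable sketch. Everything specific here is an assumption chosen for illustration: bit-string individuals, one-point crossover, bit-inversion mutation, an even population size, the parameter defaults, and the `+ 1` added to the selection weights so that they stay positive. It is not the code used for the attacks in this thesis.

```python
import random

def run_ga(fitness, target, length=20, pop_size=30, max_generations=200,
           mutation_rate=0.02, seed=0):
    """Minimal GA over bit strings; stops when `target` fitness is reached
    or after `max_generations` generations."""
    rng = random.Random(seed)
    # seed population: random bit strings
    population = [[rng.randint(0, 1) for _ in range(length)]
                  for _ in range(pop_size)]
    best = max(population, key=fitness)
    generation = 0
    while generation < max_generations and fitness(best) < target:
        generation += 1
        # calculate fitness (shifted by 1 so every weight is positive)
        weights = [fitness(ind) + 1 for ind in population]
        # selection: fitness-proportional, with replacement
        parents = rng.choices(population, weights=weights, k=pop_size)
        next_population = []
        for a, b in zip(parents[0::2], parents[1::2]):
            # crossover: one-point
            cut = rng.randrange(1, length)
            for child in (a[:cut] + b[cut:], b[:cut] + a[cut:]):
                # mutation: each bit is inverted with a small probability
                child = [bit ^ 1 if rng.random() < mutation_rate else bit
                         for bit in child]
                next_population.append(child)
        population = next_population
        best = max(population + [best], key=fitness)
    return best, generation

# Hypothetical toy problem: maximize the number of 1 bits.
best, generations = run_ga(sum, target=20)
```

The two loop conditions correspond to two of the termination conditions listed earlier: a generation limit and a satisfactory solution.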

Typical application areas for genetic algorithms include:

Function Optimization

Graph Applications such as

- Coloring

- Paths

- Circuit Layout

Scheduling

Satisfiability

Game Strategies

Computing Models

Neural Networks

Lens Design

Many of these application areas are concerned with problems which are hard to solve but have easily verifiable solutions. Another trait common to these application areas is the equation style of fitness function. Cryptography and cryptanalysis could be considered to meet these criteria. However, cryptanalysis is not closely related to the typical GA application areas and, subsequently, fitness equations are difficult to generate. This makes

Chapter 3

Cryptography

In general, the genetic algorithm approach has only been used to attack fairly simple ciphers. Most of the ciphers attacked are considered to be classical, i.e. those created before 1950 A.D. [21] which have become well known over time. Ciphers attacked include:

Monoalphabetic Substitution cipher

Polyalphabetic Substitution cipher

Permutation cipher

Transposition cipher

Merkle-Hellman Knapsack cipher

Chor-Rivest Knapsack cipher

Vernam cipher

3.1 Monoalphabetic Substitution cipher

The simplest cipher of those attacked is the monoalphabetic substitution cipher. This cipher is described as follows:

Let the keys, $\mathcal{K}$, consist of all possible permutations of the symbols in $Z$

For each permutation $\pi \in \mathcal{K}$,

1. define the encryption function $e_\pi(x) = \pi(x)$

2. and define the decryption function $d_\pi(y) = \pi^{-1}(y)$, where $\pi^{-1}$ is the inverse permutation to $\pi$.

For example, define $Z$ to be the 26 letter English alphabet. Then, a random permutation $\pi$ could be (plaintext characters are in lower case and ciphertext characters in upper case) [27]:

a b c d e f g h i j k l m
X N Y A H P O G Z Q W B T

n o p q r s t u v w x y z
S F L R C V M U E K J D I

This defines the encryption function like so - $e_\pi(a) = X$, $e_\pi(b) = N$, etc. The decryption function, $d_\pi(y)$, is the inverse permutation, and can be obtained by switching the rows. $\pi^{-1}$ is, therefore:

A B C D E F G H I J K L M
d l r y v o h e z x w p t

N O P Q R S T U V W X Y Z
b g f j q n m u s k a c i

Using the encryption function, a plaintext of 'plain' becomes a ciphertext of 'LBXZS'. A ciphertext of 'YZLGHC' would decrypt to 'cipher'.
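The two tables can be exercised with a short Python sketch. The constant and function names are hypothetical; the key string is simply the ciphertext row of the table for a through z, and the inverse table is obtained by swapping the arguments to `str.maketrans`.

```python
import string

PLAIN = string.ascii_lowercase
# Ciphertext row of the example table: pi(a), pi(b), ..., pi(z).
CIPHER = "XNYAHPOGZQWBTSFLRCVMUEKJDI"

ENCRYPT_TABLE = str.maketrans(PLAIN, CIPHER)
DECRYPT_TABLE = str.maketrans(CIPHER, PLAIN)  # the inverse permutation

def encrypt(plaintext):
    """Apply pi to every letter of a lower-case plaintext."""
    return plaintext.lower().translate(ENCRYPT_TABLE)

def decrypt(ciphertext):
    """Apply the inverse permutation to every letter of an upper-case ciphertext."""
    return ciphertext.upper().translate(DECRYPT_TABLE)

print(encrypt("plain"))   # LBXZS
print(decrypt("YZLGHC"))  # cipher
```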

Monoalphabetic substitution is easy to recognize, since the frequency distribution of

course. This attribute allows associations between ciphertext and plaintext characters, i.e. if the most frequent plaintext character X occurred ten times, the ciphertext character X maps to will appear ten times.

3.2 Polyalphabetic Substitution cipher

This cipher is a more complex version of the substitution cipher. Instead of mapping the plaintext alphabet onto ciphertext characters, different substitution mappings are used on different portions of the plaintext [21]. The result is called polyalphabetic substitution. The simplest case consists of using different alphabets sequentially and repeatedly, so that the position of each plaintext character determines which mapping is applied to it. Under different alphabets, the same plaintext character is encrypted to different ciphertext characters, making frequency analysis harder.

This cipher has the following properties (assuming $m$ mappings):

The key space $\mathcal{K}$ consists of all ordered sets of $m$ permutations

These permutations are represented as $(k_1, k_2, \ldots, k_m)$

Each permutation $k_i$ is defined on the alphabet in use

Encryption of the message $x = (x_1, x_2, \ldots, x_m)$ is given by $e_k(x) = k_1(x_1) k_2(x_2) \ldots k_m(x_m)$, assuming the key $k = (k_1, k_2, \ldots, k_m)$

The decryption key associated with $e_k(x)$ is $d_k(y) = (k_1^{-1}, k_2^{-1}, \ldots, k_m^{-1})$.

The encryption and decryption functions could also be written as follows (assuming that the addition/subtraction operations occur modulo the size of the alphabet):

$e_K(x_1, x_2, \ldots, x_m) = (x_1 + k_1, x_2 + k_2, \ldots, x_m + k_m)$

$d_K(y_1, y_2, \ldots, y_m) = (y_1 - k_1, y_2 - k_2, \ldots, y_m - k_m)$

The most common polyalphabetic substitution cipher is the Vigenere cipher. This cipher encrypts $m$ characters at a time, making each plaintext character equivalent to $m$ alphabetic characters. An example of the Vigenere cipher is:

Suppose $m = 6$

Using the mapping A <-> 0, B <-> 1, ..., Z <-> 25

If the keyword $K$ = CIPHER, numerically (2, 8, 15, 7, 4, 17)

Suppose the plaintext is the string 'cryptosystem'

Convert the plaintext elements to residues modulo 26 - 2 17 24 15 19 14 18 24 18 19 4 12

Write the plaintext elements in groups of 6 and add the keyword modulo 26

 2 17 24 15 19 14 18 24 18 19  4 12
 2  8 15  7  4 17  2  8 15  7  4 17
 4 25 13 22 23  5 20  6  7  0  8  3

The alphabetic equivalent of the ciphertext string is: EZNWXFUGHAID.

To decrypt, the same keyword is used but is subtracted modulo 26 from the ciphertext
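The worked example can be reproduced with a brief Python sketch. The function name `vigenere` and its `decrypt` flag are hypothetical, and the sketch assumes the text contains only the letters A to Z.

```python
def vigenere(text, keyword, decrypt=False):
    """Add (or, when decrypting, subtract) the repeating keyword modulo 26.
    Assumes `text` and `keyword` contain letters only."""
    shifts = [ord(k) - ord("A") for k in keyword.upper()]
    sign = -1 if decrypt else 1
    out = []
    for i, ch in enumerate(text.upper()):
        x = ord(ch) - ord("A")                  # A <-> 0, ..., Z <-> 25
        y = (x + sign * shifts[i % len(shifts)]) % 26
        out.append(chr(y + ord("A")))
    return "".join(out)

ciphertext = vigenere("cryptosystem", "CIPHER")
print(ciphertext)                                    # EZNWXFUGHAID
print(vigenere(ciphertext, "CIPHER", decrypt=True))  # CRYPTOSYSTEM
```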

3.3 Permutation/Transposition ciphers

Although simple transposition ciphers change the dependencies between consecutive characters, they are fairly easily recognized since the frequency distribution of the characters is preserved [21]. This is true for both ciphers considered, the permutation cipher and the columnar transposition cipher. Both ciphers apply the same principles, but differ in how the transformation is applied. The permutation cipher is applied to blocks of characters

3.3.1 Permutation cipher

The idea behind a permutation cipher is to keep the plaintext characters unchanged, but alter their positions by rearrangement using a permutation [27].

This cipher is defined as:

Let $m$ be a positive integer, and $\mathcal{K}$ consist of all permutations of $\{1, \ldots, m\}$

For a key (permutation) $\pi \in \mathcal{K}$, define:

The encryption function $e_\pi(x_1, \ldots, x_m) = (x_{\pi(1)}, \ldots, x_{\pi(m)})$

The decryption function $d_\pi(y_1, \ldots, y_m) = (y_{\pi^{-1}(1)}, \ldots, y_{\pi^{-1}(m)})$

A small example, assuming $m = 6$, and the key is the permutation $\pi$:

$x$            1 2 3 4 5 6
$\pi(x)$       3 5 1 6 4 2

The first row is the value of $x$, and the second row is the corresponding value of $\pi(x)$. The inverse permutation, $\pi^{-1}$, is constructed by interchanging the two rows, and rearranging the columns so that the first row is in increasing order. Therefore, $\pi^{-1}$ is:

$x$            1 2 3 4 5 6
$\pi^{-1}(x)$  3 6 1 5 2 4

If the plaintext given is 'apermutation', it is first partitioned into groups of $m$ letters ('apermu' and 'tation') and then rearranged according to the permutation $\pi$. This yields 'EMAURP' and 'TOTNIA', making the ciphertext equal 'EMAURPTOTNIA'. The ciphertext is decrypted in a similar process, using the inverse permutation $\pi^{-1}$.
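A minimal Python sketch of this example follows. The key $\pi = (3, 5, 1, 6, 4, 2)$ used here is inferred from the inverse table and from the ciphertext given in the text; the function names are hypothetical, and the text length is assumed to be a multiple of $m$.

```python
def permute_blocks(text, perm):
    """Apply e(x_1..x_m) = (x_perm(1), ..., x_perm(m)) to each block of m letters.
    `perm` is 1-based; len(text) must be a multiple of len(perm)."""
    m = len(perm)
    if len(text) % m:
        raise ValueError("text length must be a multiple of the block length")
    blocks = (text[i:i + m] for i in range(0, len(text), m))
    return "".join("".join(block[p - 1] for p in perm) for block in blocks)

def invert(perm):
    """Swap the two rows of the table and re-sort the columns."""
    inverse = [0] * len(perm)
    for i, p in enumerate(perm, start=1):
        inverse[p - 1] = i
    return inverse

PI = [3, 5, 1, 6, 4, 2]
ciphertext = permute_blocks("apermutation", PI).upper()
print(invert(PI))                                      # [3, 6, 1, 5, 2, 4]
print(ciphertext)                                      # EMAURPTOTNIA
print(permute_blocks(ciphertext, invert(PI)).lower())  # apermutation
```

Decryption reuses the same routine with the inverse permutation, which mirrors the "similar process" described in the text.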

3.3.2 Transposition cipher

Another type of cipher in this category is the columnar transposition cipher. As described in [24], this cipher consists of writing the plaintext into a rectangle with a prearranged number of columns and transcribing it vertically by columns to yield the ciphertext. The key is

rectangle and the order in which to transcribe the columns [24]. This key is usually derived from an agreed keyword by assigning numbers to the individual letters according to their alphabetical order.

For example, if the keyword is SORCERY, the resulting numerical key would be 6 3 4 1 2 5 7 [24], where numbers are assigned to the keyword letters according to the alphabetical order of the letters. If we wish to encipher the message LASER BEAMS CAN BE MODULATED TO CARRY MORE INTELLIGENCE THAN RADIO WAVES, it would first be written out on a width of 7 under the numerical key. As is traditional, white spaces and punctuation are ignored in this and all ciphertexts used. The rectangle would look like this:

S O R C E R Y
6 3 4 1 2 5 7
L A S E R B E
A M S C A N B
E M O D U L A
T E D T O C A
R R Y M O R E
I N T E L L I
G E N C E T H
A N R A D I O
W A V E S Q R

Since the message length was not a multiple of the keyword length, the message does not fill the entire rectangle. Deciphering is simplified if the last row contains no blank cells, so two dummy letters are added (here, Q and R) to make the rectangle completely filled [24]. Enciphering consists of reading the ciphertext out vertically in the order of the numbered columns. It can be written in groups of five letters at the same time, if so desired [24]. The ciphertext is, therefore, ECDTM ECAER AUOOL EDSAM MERNE NASSO

The decipherer would count the number of letters in the message (63) and determine the length of the inscription rectangle by dividing this value by the length of the keyword. This makes the inscription rectangle 7 x 9 in this case. The numerical key is then written above the rectangle and the ciphertext characters entered into the diagram in the order of the key numbers [24]. The plaintext is read off by rows, as it was originally entered.
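The enciphering and deciphering procedure can be sketched in Python as follows. The function names are hypothetical, and two details are assumptions: repeated keyword letters are numbered left to right, and the dummy letters are drawn cyclically from the string "QR" to match the example.

```python
from itertools import cycle, islice

def numerical_key(keyword):
    """Number the keyword letters by alphabetical order (ties left to right)."""
    order = sorted(range(len(keyword)), key=lambda i: (keyword[i], i))
    key = [0] * len(keyword)
    for rank, col in enumerate(order, start=1):
        key[col] = rank
    return key

def column_order(keyword):
    """Column indices in the order in which they are transcribed."""
    key = numerical_key(keyword)
    return sorted(range(len(keyword)), key=lambda col: key[col])

def encipher(message, keyword, dummies="QR"):
    letters = [c for c in message.upper() if c.isalpha()]
    width = len(keyword)
    # Pad the last row with dummy letters so the rectangle is full.
    letters += islice(cycle(dummies), -len(letters) % width)
    rows = [letters[i:i + width] for i in range(0, len(letters), width)]
    return "".join(row[col] for col in column_order(keyword) for row in rows)

def decipher(ciphertext, keyword):
    width = len(keyword)
    height = len(ciphertext) // width
    grid = [[""] * width for _ in range(height)]
    for n, col in enumerate(column_order(keyword)):
        for r, ch in enumerate(ciphertext[n * height:(n + 1) * height]):
            grid[r][col] = ch
    return "".join("".join(row) for row in grid)

MESSAGE = ("LASER BEAMS CAN BE MODULATED TO CARRY MORE "
           "INTELLIGENCE THAN RADIO WAVES")
ciphertext = encipher(MESSAGE, "SORCERY")
print(numerical_key("SORCERY"))  # [6, 3, 4, 1, 2, 5, 7]
```

Deciphering with the same keyword returns the message with spaces removed and the two dummy letters still appended.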

3.4 Knapsack ciphers

Knapsack public-key encryption systems are based on the subset sum problem, an NP-complete problem. Given in this problem are a finite set $S \subset \mathbb{N}^+$ ($\mathbb{N}^+$ is the set of natural numbers $\{1, 2, \ldots\}$), and a target $t \in \mathbb{N}^+$. The question is whether there exists a subset $S' \subset S$ whose elements sum to $t$ [9]. For example:

If $S = \{1, 2, 7, 14, 49, 98, 343, 686, 2409, 2793, 16808, 17206, 117705, 117993\}$

And $t = 138457$

Then the subset $S' = \{1, 2, 7, 98, 343, 686, 2409, 17206, 117705\}$ is a solution.
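For a set this small, the example can be checked by exhaustive search. The Python sketch below is illustrative only; the function name `find_subset` is hypothetical, and the search examines up to $2^{|S|}$ subsets, which is exactly why the general problem becomes infeasible as the set grows.

```python
from itertools import combinations

S = [1, 2, 7, 14, 49, 98, 343, 686, 2409, 2793,
     16808, 17206, 117705, 117993]
t = 138457
GIVEN = [1, 2, 7, 98, 343, 686, 2409, 17206, 117705]

def find_subset(values, target):
    """Exhaustive search over all non-empty subsets; returns the first
    subset found whose elements sum to `target`, or None."""
    for size in range(1, len(values) + 1):
        for subset in combinations(values, size):
            if sum(subset) == target:
                return subset
    return None

print(sum(GIVEN) == t)    # True: the subset from the text is a solution
found = find_subset(S, t)
```

The subset returned by the search need not be the one quoted in the text, since an instance of the subset sum problem may have more than one solution.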

The basic idea of a knapsack cipher is to select an instance of the subset sum problem that is easy to solve, and then disguise it as an instance of the general subset sum problem which is hopefully difficult to solve [21]. The original (easy) knapsack set can serve as the private key, while the transformed knapsack set serves as the public key.

The Merkle-Hellman knapsack cipher is important as it was the first implementation of a public-key encryption scheme. Many variations on it have been suggested but most, including the original, have been demonstrated to be insecure [21]. A notable exception to

3.4.1 Merkle-Hellman Knapsack cipher

The Merkle-Hellman knapsack cipher attempts to disguise an easily solved instance of the subset sum problem, called a superincreasing subset sum problem, by modular multiplication and a permutation [21]. A superincreasing sequence is a sequence $(b_1, b_2, \ldots, b_n)$ of positive integers with the property that $b_i > \sum_{j=1}^{i-1} b_j$ for each $i$, $2 \le i \le n$.

The integer $n$ is a common system parameter. $(b_1, b_2, \ldots, b_n)$ is the superincreasing sequence and $M$ is the modulus, selected such that $M > b_1 + b_2 + \ldots + b_n$. $W$ is a random integer, such that $1 \le W \le M - 1$ and $\gcd(W, M) = 1$. $\pi$ is a random permutation of the integers $\{1, 2, \ldots, n\}$. The public key of A is $(a_1, a_2, \ldots, a_n)$, where $a_i = W b_{\pi(i)} \bmod M$, and the private key of A is $(\pi, M, W, (b_1, b_2, \ldots, b_n))$.

The steps in the transmission of a message $m$ are as follows (B is the sender, A is the receiver) [21]:

1. Encryption - B should do the following:

Obtain A's authentic public key $(a_1, a_2, \ldots, a_n)$

Represent the message $m$ as a binary string of length $n$, $m = m_1 m_2 \ldots m_n$

Compute the integer $c = m_1 a_1 + m_2 a_2 + \ldots + m_n a_n$

Send the ciphertext $c$ to A

2. Decryption - To recover the plaintext $m$ from $c$, A should do the following:

Compute $d = W^{-1} c \bmod M$

By solving a superincreasing subset sum problem, find the integers $r_1, r_2, \ldots, r_n$, $r_i \in \{0, 1\}$ such that $d = r_1 b_1 + r_2 b_2 + \ldots + r_n b_n$.

The message bits are $m_i = r_{\pi(i)}$, $i = 1, 2, \ldots, n$.

1. Let $n = 6$

2. Entity A chooses the superincreasing sequence (12, 17, 33, 74, 157, 316)

3. $M = 737$

4. $W = 635$

5. The permutation $\pi$ of $\{1, 2, 3, 4, 5, 6\}$ is defined as:

6. $\pi(1) = 3$, $\pi(2) = 6$, $\pi(3) = 1$, $\pi(4) = 2$, $\pi(5) = 5$, $\pi(6) = 4$

7. A's public key is the knapsack set (319, 196, 250, 477, 200, 559)

8. A's private key is $(\pi, M, W, (12, 17, 33, 74, 157, 316))$

9. To encrypt the message $m = 101101$, B computes $c$ as:

10. $c = 319 + 250 + 477 + 559 = 1605$

11. and sends it to A

1. To decrypt, A computes

2. $d = W^{-1} c \bmod M = 136$

3. and solves the superincreasing subset sum problem of:

4. $136 = 12 r_1 + 17 r_2 + 33 r_3 + 74 r_4 + 157 r_5 + 316 r_6$

5. to get $136 = 12 + 17 + 33 + 74$

6. Therefore, $r_1 = 1$, $r_2 = 1$, $r_3 = 1$, $r_4 = 1$, $r_5 = 0$, $r_6 = 0$

7. Applying the permutation $\pi$ produces the message bits:

8. $m_1 = r_3 = 1$, $m_2 = r_6 = 0$, $m_3 = r_1 = 1$, $m_4 = r_2 = 1$, $m_5 = r_5 = 0$, $m_6 = r_4 = 1$
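The example can be replayed with a short Python sketch using the parameters listed above. The function names are hypothetical, the permutation is stored as a 1-based list, and the modular inverse relies on `pow(W, -1, M)`, which requires Python 3.8 or later.

```python
def public_key(b, M, W, pi):
    """a_i = W * b_pi(i) mod M, with pi given as a 1-based list."""
    return [W * b[p - 1] % M for p in pi]

def encrypt(bits, a):
    """c = m_1 a_1 + ... + m_n a_n."""
    return sum(m * ai for m, ai in zip(bits, a))

def decrypt(c, b, M, W, pi):
    d = pow(W, -1, M) * c % M          # d = W^-1 * c mod M
    r = [0] * len(b)
    # Superincreasing subset sum: greedily take the largest element that fits.
    for i in reversed(range(len(b))):
        if b[i] <= d:
            r[i] = 1
            d -= b[i]
    if d != 0:
        raise ValueError("ciphertext does not decode")
    return [r[p - 1] for p in pi]      # m_i = r_pi(i)

b = [12, 17, 33, 74, 157, 316]
M, W = 737, 635
pi = [3, 6, 1, 2, 5, 4]

a = public_key(b, M, W, pi)
c = encrypt([1, 0, 1, 1, 0, 1], a)
print(a)                         # [319, 196, 250, 477, 200, 559]
print(c)                         # 1605
print(decrypt(c, b, M, W, pi))   # [1, 0, 1, 1, 0, 1]
```

The greedy loop works only because the private sequence is superincreasing; the same loop applied to the public key would not, in general, recover the message.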


3.4.2 Chor-Rivest Knapsack cipher

The Chor-Rivest cipher is the only known knapsack public-key system that does not use some form of modular multiplication to disguise an easy subset sum problem [21]. This cipher is by far the most sophisticated of those attacked by a genetic algorithm. When the parameters of this system are carefully chosen, and no information is leaked, there is no feasible attack on the Chor-Rivest system. It can not be covered in detail in a few pages, so only a summary of the system will be given. Further number theoretical algorithms are needed for the set-up and operation of the cipher.

Most of the parameters are based on a finite field $\mathbb{F}_q$ of characteristic $p$, where $q = p^h$, $p \ge h$ and for which the discrete logarithm problem is feasible [21]. $f(x)$ is a random monic irreducible polynomial of degree $h$ over $\mathbb{Z}_p$. The elements of $\mathbb{F}_q$ are represented as polynomials in $\mathbb{Z}_p[x]$ of degree less than $h$, with multiplication done modulo $f(x)$. $g(x)$ is a random primitive element of $\mathbb{F}_q$. For each ground field element $i \in \mathbb{Z}_p$, the discrete logarithm $a_i = \log_{g(x)}(x + i)$ of the field element $(x + i)$ to the base $g(x)$ must be found. The random permutation $\pi$ is on the set of integers $\{0, 1, 2, \ldots, p - 1\}$, and the random integer $d$ is selected such that $0 \le d \le p^h - 2$. $c_i$ is then computed as $c_i = (a_{\pi(i)} + d) \bmod (p^h - 1)$, $0 \le i \le p - 1$. A's public key is $((c_0, c_1, \ldots, c_{p-1}), p, h)$ and its private key is $(f(x), g(x), \pi, d)$.
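The last step of key generation, $c_i = (a_{\pi(i)} + d) \bmod (p^h - 1)$, is plain modular arithmetic and can be sketched in Python. The numbers below are the artificially small parameters of the worked example given later in this section; the discrete logarithms are copied from that example rather than computed, and the variable names are hypothetical.

```python
p, h = 7, 4
a = [1028, 1935, 2054, 1008, 379, 1780, 223]  # a_i = log_g(x)(x + i), as given
pi = [6, 4, 0, 2, 1, 5, 3]                    # pi(0), pi(1), ..., pi(6)
d = 1702

modulus = p**h - 1                            # 7^4 - 1 = 2400
c = [(a[pi[i]] + d) % modulus for i in range(p)]
print(c)  # [1925, 2081, 330, 1356, 1237, 1082, 310]
```

Computing the discrete logarithms themselves is the expensive part of key generation and is not attempted in this sketch.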

A message $m$ is transmitted as follows (B is the sender, A is the receiver) [21]:

1. Encryption - B should do the following:

Obtain A's authentic public key $((c_0, c_1, \ldots, c_{p-1}), p, h)$

Represent the message $m$ as a binary string of length $\lfloor \lg \binom{p}{h} \rfloor$, where $\binom{p}{h}$ is a binomial coefficient

Consider $m$ as the binary representation of an integer

Transform this integer into a binary vector $M = (M_0, M_1, \ldots, M_{p-1})$ of length $p$ having exactly $h$ 1's as follows:

(b) For% from 1 to_pdothefollowing:

(c) Ifm >

(p;')

then set _M^

*-l,m*-m-{p~li), I <- I - 1

(d) Otherwise, set_{A/,_i} < ₀

(e) Note:

()

= 1 for_n > 0,

(?)

=

0) forI > 1

Computec =

2_X^> ^c mo<^ (Ph ~ 1)

Sendtheciphertext ctoA

2. Decryption Torecovertheplaintext mfrom c, Ashoulddothefollowing:

Computer =

(c-hd) mod (ph ₁₎

Compute u(x) = g(x)r

mod f(x)

Compute _s(x) ~

u(x) + f(x), a monic polynomial ofdegree hover_{Zp (Zp}is

theset of integers {0, ...

,p 1}. Addition, subtraction, and multiplication in

Zp are performed modulo_p.)

Factor s(x) into linear factors over_Zp : _s(x) =Ylj=1(x ₊_tf), _where

tj G _Zp

Computea_binary vectorM = (M0,Ml5 . . .

,Mp_i) asfollows:

(a) The componentsofMthatare 1 have indices7r_1(tj), 1 < j < h.

(b) The remainingcomponents are0.

Themessage misrecovered fromM as follows:

(a) Setm

<-0,/ <- _h

(b) Fori from 1 to_pdothefollowing:

(c) if_{Mj_x} = _1, _then _{set m}

<-m+

(P7')

andI < _/ -1
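The transformation between the integer $m$ and the weight-$h$ binary vector $M$ can be sketched in Python as follows. This is only an illustrative sketch of the steps listed above: the function names `int_to_vector` and `vector_to_int` are hypothetical, and the range check on $m$ is an added assumption rather than part of the described procedure.

```python
from math import comb


def int_to_vector(m, p, h):
    """Map an integer m to a length-p binary vector with exactly h ones."""
    # Added safety check (an assumption, not part of the listed steps).
    if not 0 <= m < comb(p, h):
        raise ValueError("m must satisfy 0 <= m < C(p, h)")
    vector = []
    l = h
    for i in range(1, p + 1):
        threshold = comb(p - i, l)  # comb(n, 0) == 1 and comb(0, l) == 0 for l >= 1
        if m >= threshold:
            vector.append(1)
            m -= threshold
            l -= 1
        else:
            vector.append(0)
    return vector


def vector_to_int(vector, h):
    """Recover the integer m from a binary vector with exactly h ones."""
    p = len(vector)
    m, l = 0, h
    for i in range(1, p + 1):
        if vector[i - 1] == 1:
            m += comb(p - i, l)
            l -= 1
    return m
```

With $p = 7$ and $h = 4$, `int_to_vector(22, 7, 4)` yields the vector `[1, 0, 1, 1, 0, 0, 1]`, and `vector_to_int` maps it back to 22.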

An example using artificially small parameters would go as follows [21]:

1. Entity A selects $p = 7$ and $h = 4$

2. Entity A selects the irreducible polynomial $f(x) = x^4 + 3x^3 + 5x^2 + 6x + 2$ of degree 4 over $Z_7$. The elements of the finite field $\mathbb{F}_{7^4}$ are represented as polynomials in $Z_7[x]$ of degree less than 4, with multiplication done modulo $f(x)$

3. Entity A selects the random primitive element $g(x) = 3x^3 + 3x^2 + 6$

4. Entity A computes the following discrete logarithms:

- $a_0 = \log_{g(x)}(x) = 1028$

- $a_1 = \log_{g(x)}(x + 1) = 1935$

- $a_2 = \log_{g(x)}(x + 2) = 2054$

- $a_3 = \log_{g(x)}(x + 3) = 1008$

- $a_4 = \log_{g(x)}(x + 4) = 379$

- $a_5 = \log_{g(x)}(x + 5) = 1780$

- $a_6 = \log_{g(x)}(x + 6) = 223$

5. Entity A selects the random permutation $\pi$ on $\{0, 1, 2, 3, 4, 5, 6\}$ defined by $\pi(0) = 6$, $\pi(1) = 4$, $\pi(2) = 0$, $\pi(3) = 2$, $\pi(4) = 1$, $\pi(5) = 5$, $\pi(6) = 3$

6. Entity A selects the random integer $d = 1702$

7. Entity A computes:

- $c_0 = (a_6 + d) \bmod 2400 = 1925$

- $c_1 = (a_4 + d) \bmod 2400 = 2081$

- $c_2 = (a_0 + d) \bmod 2400 = 330$

- $c_3 = (a_2 + d) \bmod 2400 = 1356$

- $c_4 = (a_1 + d) \bmod 2400 = 1237$

- $c_5 = (a_5 + d) \bmod 2400 = 1082$

- $c_6 = (a_3 + d) \bmod 2400 = 310$

8. Entity A's public key is $((c_0, c_1, c_2, c_3, c_4, c_5, c_6), p = 7, h = 4)$, and its private key is $(f(x), g(x), \pi, d)$

9. To encrypt a message $m = 22$ for A, entity B does the following:

(a) Obtains A's authentic public key $((c_0, c_1, c_2, c_3, c_4, c_5, c_6), p, h)$

(b) Represents $m$ as a binary string of length 5: $m = 10110$ (Note that $\lfloor \lg \binom{7}{4} \rfloor = 5$).

(c) Transforms $m$ to the binary vector $M = (1, 0, 1, 1, 0, 0, 1)$ of length 7

(d) Computes $c = (c_0 + c_2 + c_3 + c_6) \bmod 2400 = 1521$

(e) Sends $c = 1521$ to A

10. To decrypt the ciphertext $c = 1521$, A does the following:

(a) Computes $r = (c - hd) \bmod 2400 = 1913$

(b) Computes $u(x) = g(x)^{1913} \bmod f(x) = x^3 + 3x^2 + 2x + 5$

(c) Computes $s(x) = u(x) + f(x) = x^4 + 4x^3 + x^2 + x$

(d) Factors $s(x) = x(x + 2)(x + 3)(x + 6)$ (so $t_1 = 0$, $t_2 = 2$, $t_3 = 3$, $t_4 = 6$).

(e) The components of $M$ that are 1 have indices $\pi^{-1}(0) = 2$, $\pi^{-1}(2) = 3$, $\pi^{-1}(3) = 6$, $\pi^{-1}(6) = 0$. Hence, $M = (1, 0, 1, 1, 0, 0, 1)$

(f) Transforms $M$ to the integer $m = 22$, recovering the plaintext
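The integer arithmetic of this example can be checked with a short Python sketch. It takes the discrete logarithms $a_i$ and the polynomial $s(x)$ as given in the example rather than computing them (no finite field arithmetic is implemented here), and it finds the linear factors of $s(x)$ by simply trying every element of $Z_7$, which is an illustrative shortcut and only suitable for such tiny parameters. All variable names are hypothetical.

```python
p, h = 7, 4
modulus = p**h - 1  # 2400

# Discrete logarithms a_0..a_6, permutation pi and shift d, as given in the example.
a = [1028, 1935, 2054, 1008, 379, 1780, 223]
pi = [6, 4, 0, 2, 1, 5, 3]
d = 1702

# Public key: c_i = (a_{pi(i)} + d) mod (p^h - 1)
c_pub = [(a[pi[i]] + d) % modulus for i in range(p)]

# Encryption of the vector M that represents m = 22.
M = [1, 0, 1, 1, 0, 0, 1]
c = sum(m_i * c_i for m_i, c_i in zip(M, c_pub)) % modulus

# First decryption step.
r = (c - h * d) % modulus

# s(x) = x^4 + 4x^3 + x^2 + x, coefficients listed from the constant term upward.
s = [0, 1, 1, 4, 1]


def s_eval(x):
    """Evaluate s(x) over Z_7."""
    return sum(coef * x**k for k, coef in enumerate(s)) % p


# (x + t) divides s(x) exactly when s(-t) = 0 in Z_7.
t = [tj for tj in range(p) if s_eval(-tj % p) == 0]

# Indices of the 1-components are pi^{-1}(t_j).
pi_inv = {pi[i]: i for i in range(p)}
recovered = [0] * p
for tj in t:
    recovered[pi_inv[tj]] = 1

print(c_pub, c, r, t, recovered)
```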

To illustrate how small the above parameters are, the parameters originally suggested for the Chor-Rivest system are [21]:

- $p = 197$ and $h = 24$

- $p = 211$ and $h = 24$

- $p = 3^5$ and $h = 24$


3.5 Vernam cipher

The Vernam cipher is a stream cipher defined on the alphabet $A = \{0, 1\}$. A binary message $m_1 m_2 \ldots m_t$ is operated on by a binary key string $k_1 k_2 \ldots k_t$ of the same length to produce a ciphertext string $c_1 c_2 \ldots c_t$, where $c_i = m_i \oplus k_i$, $1 \le i \le t$ [21]. If the key string is randomly chosen and never used again, this cipher is called a one-time system or one-time pad.
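As a minimal sketch of the bitwise operation $c_i = m_i \oplus k_i$, the following Python fragment encrypts and decrypts a short bit list. The function name `vernam` and the use of the `secrets` module to draw key bits are illustrative assumptions; the same function serves for both directions because XOR is its own inverse.

```python
import secrets


def vernam(bits, key):
    """XOR a list of message (or ciphertext) bits with a key of the same length."""
    if len(bits) != len(key):
        raise ValueError("key must be the same length as the message")
    return [b ^ k for b, k in zip(bits, key)]


message = [1, 0, 1, 1, 0]
key = [secrets.randbits(1) for _ in message]  # fresh random key, used once
ciphertext = vernam(message, key)
assert vernam(ciphertext, key) == message
```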

The one-time pad can be shown to be theoretically unbreakable [21]. If a cryptanalyst has a ciphertext string $c_1 c_2 \ldots c_t$ encrypted using a random, non-reused key string, the cryptanalyst can do no better than guess at the plaintext being any binary string of length $t$. It has been proven that an unbreakable Vernam system requires a random key of the same length as the message [21].

3.6 Comments

The above ciphers comprise all those cryptosystems attacked using a genetic algorithm in a fairly exhaustive search of the literature. The newest of these systems are the knapsack ciphers, with the Chor-Rivest system published in 1985, and the Merkle-Hellman system published in 1978 [21]. The other ciphers attacked are all older than that, in some cases by centuries. None of the modern ciphers, such as DES, AES, RSA, or Elliptic Curve, were even attempted using a genetic algorithm based approach. The only ciphers that are considered even remotely mathematically difficult are the knapsack systems, and even they are not true number theoretical protocols. This lack makes many of the genetic


Chapter 4

Relevant Prior Works

While there have been many papers written on genetic algorithms and their application to various problems, there are relatively few papers that apply genetic algorithms to cryptanalysis. The earliest papers that use a genetic algorithm approach for a cryptanalysis problem were written in 1993, almost twenty years after the primary paper on genetic algorithms by John Holland [12] [14]. These papers focus on the simplest classical and modern ciphers, and are cited frequently by later works.

Figures 4.1 through 4.4 show which ciphers are attacked by which authors.

[Figure 4.1: Substitution (Monoalphabetic, Polyalphabetic). Attacking papers: Spillman et al. 1993; Clark 1994; Clark et al. 1996; Clark & Dawson 1997; Clark & Dawson 1998; Grundlingh & Van Vuuren 2002]

[Permutation/Transposition (Permutation, Transposition). Attacking papers: Clark 1994; Matthews 1993; Grundlingh & Van Vuuren 2002]

Figure 4.2: Permutation Cipher Family and Attacking Papers

[Knapsack. Merkle-Hellman: Spillman 1993; Clark, Dawson, Bergen 1996; Kolodziejczuk 1997. Chor-Rivest: Yaseen, Sahasrabuddhe 1999]

Figure 4.3: Knapsack Cipher Family and Attacking Papers

[Figure 4.4: Vernam. Attacking paper: Lin, Kao 1995]

4.1 Substitution: Spillman et al. - 1993

The first paper published in 1993, by Spillman, Janssen, Nelson and Kepner [26], focuses on the cryptanalysis of a simple substitution cipher using a genetic algorithm. This paper selects a genetic algorithm approach so that a directed random search of the key space can occur. The paper begins with a review of genetic algorithms, covering the genetic algorithm life cycle, and the systems needed to apply a genetic algorithm. The systems are considered to be a key representation, a mating scheme, and a mutation scheme. The selected systems are then discussed. The key representation is an ordered list of alphabet characters, where the first character in the list is the plaintext character for the most frequent ciphertext character, the second character is the plaintext character for the second most frequent ciphertext character, and so on. The mating scheme randomly selects two keys for mating, weighted in favor of keys with a high fitness value. The two parent keys create two child keys by selecting from each slot in the parents the more frequent ciphertext character. The first child key is created by scanning left to right, and the second by scanning right to left. If the more frequent character is already in the child, then the character from the other parent is selected. The mutation scheme consists of swapping two characters in the key string. The fitness function uses unigrams and digrams, and is as follows:

$$\text{Fitness} = \left(1 - \sum_{i=1}^{26} \left\{ |SF[i] - DF[i]| + \sum_{j=1}^{26} |SDF[i,j] - DDF[i,j]| \right\} / 4 \right)^{8} \qquad (4.1.1)$$

The function represents a normalized error value. Larger fitness values represent smaller errors. Sensitivity to small differences is increased by raising the result to the 8th power, while sensitivity to large differences is decreased by the division of the summation terms by 4. $SF[i]$ is the standard frequency of character $i$ in English plaintext, while $DF[i]$ is the measured frequency of the decoded character $i$ in the ciphertext. This makes the first summation term equal to the sum of the differences between each character's standard frequency and the frequency of the ciphertext character that decodes to that character. $SDF$ is the standard frequency for a digram and $DDF$ the measured frequency. When the measured frequencies match the standard frequencies, the summation terms vanish and the fitness value equals one. This fitness function is bounded below by zero, but cannot evaluate to zero. Since high fitness values are more important, this does not affect the overall result.
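A minimal Python sketch of the fitness function (4.1.1) is given below. The helper names, the `alphabet` parameter, and the way frequencies are measured from the decoded text are illustrative assumptions; the standard frequency tables `std_uni` and `std_di` would have to be supplied from an English corpus, which is not done here.

```python
from collections import Counter
from itertools import product


def unigram_freq(text, alphabet):
    """Relative frequency of each alphabet character in text."""
    counts = Counter(ch for ch in text if ch in alphabet)
    total = sum(counts.values()) or 1
    return {ch: counts[ch] / total for ch in alphabet}


def digram_freq(text, alphabet):
    """Relative frequency of each adjacent character pair in text."""
    pairs = [x + y for x, y in zip(text, text[1:]) if x in alphabet and y in alphabet]
    counts = Counter(pairs)
    total = len(pairs) or 1
    return {x + y: counts[x + y] / total for x, y in product(alphabet, repeat=2)}


def fitness(decoded, std_uni, std_di, alphabet):
    """Normalized error raised to the 8th power, following (4.1.1)."""
    df = unigram_freq(decoded, alphabet)
    ddf = digram_freq(decoded, alphabet)
    error = sum(abs(std_uni[i] - df[i]) for i in alphabet)
    error += sum(abs(std_di[i + j] - ddf[i + j]) for i, j in product(alphabet, repeat=2))
    return (1 - error / 4) ** 8
```

A decoded text whose measured frequencies equal the standard tables produces an error of zero and therefore a fitness of one; any mismatch lowers the value.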

The complete algorithm used in this work is:

1. Generate a random population of key strings

2. Determine a fitness value for each key in the population

3. Do a biased random selection of parents

4. Apply the crossover operation to the parents

5. Apply the mutation process to the children

6. Scan the new population of key strings and update the list of the 10 "best" keys seen

The process stops after a fixed number of generations, at which point the best keys are used to decipher the ciphertext. Results are given for 100 generation runs of the algorithm, using populations of 10 and 50 keys, and mutation rates of 0.05, 0.10, 0.20, and 0.40. In the smaller population size, the exact key is found after searching less than $4 \times 10^{-23}$ of the key space, or after the examination of about 1000 keys. In the larger population size, only about 20 generations are needed to examine 1000 keys. The best mutation rates for this experiment were those less than 0.20.

4.1.1 Comments

Overall, this paper is a good choice for re-implementation and comparison. It is an early work, so does not compare its results to others, but provides a decent level of result reporting and discussion, making comparison possible. It presents the basic genetic algorithm concepts clearly, and uses examples and diagrams to clarify important points.

Typical results:

- The


- Visual inspection of the plaintext is required most of the time to determine the exact key

Re-implementation approach:

- The same parameters are re-used

- Additional parameters are used to expand the range of parameter values

Observed cryptanalytic usefulness:

- None - no exact keys found in testing

Comment on result discrepancy:

- Different measures of success were used

4.2 Transposition: Matthews - 1993

The second paper published in 1993 [20], by R. A. J. Matthews, uses an order-based genetic algorithm to attack a simple transposition cipher. The paper begins by discussing the archetypal genetic algorithm. This includes the typical genetic algorithm components, as well as the two characteristics a problem must have in order for it to be attackable by a genetic algorithm. The first characteristic is that a partially-correct chromosome must have a higher fitness than an incorrect chromosome. The second characteristic is an encoding for the problem that allows genetic operators to work. The next section of the paper discusses the cryptanalytic use of the genetic algorithm. It concludes that substitution, transposition, permutation, and other ciphers are attackable by a genetic algorithm, while ciphers such as DES (Data Encryption Standard), which is a Feistel cipher, are not attackable.

The cipher selected for attack in this work is a classical two-stage transposition cipher. Here, a key of length $N$ takes the form of a permutation of the integers 1 to $N$. The plaintext, $L$ characters long, is written beneath the key to form a matrix $N$ characters wide. The ciphertext is produced by reading off the columns in the order indicated by the integers in the key. Decryption consists of writing down the key, determining the length of the columns, writing the ciphertext back into the matrix in sequence, and reading across the columns. Breaking this cipher is typically a two-stage process as well. The stages are determining the length of the transposition sequence ($N$), and then finding the permutation of the $N$ integers.
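A columnar transposition of this kind can be sketched in Python as follows. The sketch assumes that the column labelled 1 in the key is read out first, the column labelled 2 second, and so on, and that a final incomplete row is left unpadded; both are assumptions made for illustration, and the function names are hypothetical.

```python
def encrypt(plaintext, key):
    """Write the plaintext row by row under the key, then read the columns in key order."""
    n = len(key)
    columns = [plaintext[j::n] for j in range(n)]  # column j holds characters j, j+n, ...
    order = sorted(range(n), key=lambda j: key[j])  # column labelled 1 first
    return "".join(columns[j] for j in order)


def decrypt(ciphertext, key):
    """Determine the column lengths, refill the columns in key order, read across the rows."""
    n = len(key)
    full, extra = divmod(len(ciphertext), n)
    lengths = [full + (1 if j < extra else 0) for j in range(n)]
    order = sorted(range(n), key=lambda j: key[j])
    columns = [""] * n
    pos = 0
    for j in order:
        columns[j] = ciphertext[pos:pos + lengths[j]]
        pos += lengths[j]
    rows = full + (1 if extra else 0)
    return "".join(
        columns[j][r] for r in range(rows) for j in range(n) if r < len(columns[j])
    )
```

When the text length is not a multiple of the key length, the columns have unequal lengths, which is why decryption must first work out how long each column is.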

The genetic algorithm system used to attack the transposition cipher is known as GENALYST. It incorporates all the standard genetic algorithm features, with the addition of the items needed for an order-based approach. The GENALYST system uses a fitness scoring system based on the number of times common English digrams and trigrams appear in the decrypted text. Ten digrams and trigrams were used - TH, HE, IN, ER, AN, ED, THE, ING, AND, and EEE. The presence of EEE resulted in a subtraction of fitness points, as it practically never appears in English. This process was checked for correlation between fitness value and correctness of chromosome, to improve confidence in GENALYST. GENALYST selects the fittest chromosomes using a roulette wheel approach based on normalized fitness values. This means that the probability of a chromosome getting selected is proportional to its normalized fitness value. "Elitism" is present to ensure that the fittest or most elite chromosomes found so far are always included in the breeding pool. After selection occurs, breeding is done with a proportion of the remaining chromosomes. The proportion used decreases linearly as the search continues, in order to help optimize the effectiveness of GENALYST. Breeding is done using the position based crossover method for order-based genetic algorithms. In this method, a random bit string of the same length as the chromosomes is used. The random bit string indicates which elements to take from parent #1 by a 1 in the string, and the elements omitted from parent #1 are added in the order they appear in parent #2. The second child is produced by taking elements from parent #2 indicated by a 0 in the string, and filling in from parent #1 as before. Two possible mutation types may then be applied. The first type is a simple two-point mutation where two random chromosome elements are swapped. The second type is a shuffle mutation, where the permutation


A test procedure is given to ensure that test texts have a true fitness value that is representative of English text. This is calculated mathematically, based on digram/trigram percentage frequency in the text, the fitness score given by GENALYST to that di- or tri-gram, and the summation over all di- and tri-grams checked. The text length is also important, as it is easier to attack texts with a length that is an integer multiple of the key length. Based on these considerations, a test text with a length of 181 characters was selected. This length is prime, but close to multiples of 6, 9, 10, and 12. This is a worst-case scenario since not all column lengths are the same, and multiple possibilities for the key length are present. A Monte Carlo search is run on the same text so that the genetic algorithm results can be compared to the results from a traditional search technique. In a Monte Carlo search, a series of guesses is made to determine the correct key, and the most elite chromosome is kept.

The breaking of a transposition cipher has two parts: finding the correct key length, and finding the correct permutation of the key. GENALYST was tested on both parts. For the first part, the test text was encrypted using a target key. GENALYST was then used to find the best decryption of the text, using seven different key lengths in the range of 6 to 12, which includes the length of the target key. The goal of this section was to see if the fittest of the seven chromosomes had the same length as the target key. The population size used was 20, and the number of generations was 25. The probability of crossover started at 0.8 and decreased to 0.5. Mutation probability started at 0.1 for both point mutation and shuffle mutation, increasing to 0.5 for point mutation and to 0.8 for shuffle mutation. The experiment was run five times each for three target key lengths, $K = 7, 9, 11$. GENALYST was successful in 13 of the 15 runs (87%). However, the random Monte Carlo algorithm performed equally well in this task, also producing a success rate of 87%. For the second part, the same parameters of population size and number of generations were used. GENALYST managed to completely break the cipher in one of the runs for $K = 7$ and one of the runs for $K = 9$.

Overall, GENALYST involves little more computational effort than


Therefore, a genetic algorithm approach is useful even when only a small advantage is produced. Using a larger gene pool and number of generations boosts GENALYST's performance significantly over the performance of Monte Carlo. The greatest improvement in GENALYST's performance comes from hybridizing the algorithm with other techniques, such as perming.

4.2.1 Comments

Overall, this is one of the best reference papers available. It is an early work, but is very complete. It covers background material and problem considerations well, and has a very balanced approach to considering the genetic algorithm as a possible solution. The parameters are clearly reported, as are the algorithms used. This is an excellent candidate for re-implementation and comparison.

Typical results:

- Key length correctly determined 87% of the time

- Correct key found 13.33% of the time

Re-implementation approach:

- Different text lengths used, same parameters re-used

- Additional parameters are used to expand the range of parameter values

Observed cryptanalytic usefulness:

- Low - low decryption percentage

Comment on result discrepancy:

- Percentages reported are based on number of tests and different numbers of


4.3 Merkle-Hellman Knapsack: Spillman - 1993

The third paper published in 1993, by R. Spillman [25], applies a genetic algorithm approach to a Merkle-Hellman knapsack system. This paper considers the genetic algorithm as simply another possibility for the cryptanalysis of knapsack ciphers. A knapsack cipher is based on the NP-complete problem of knapsack packing, which is an application of the subset sum problem. The problem statement is: "Given n objects each with a known volume and a knapsack of fixed volume, is there a subset of the n objects which exactly fills the knapsack?". The idea behind the knapsack cipher is that finding the subset that produces a given sum is hard, and this fact can be used to protect an encoding of information. There must also be a decoding algorithm that is relatively easy for the intended recipient to apply. An easy decoding algorithm can be made by using the easy knapsacks. An easy knapsack consists of a sequence of numbers that is superincreasing, i.e., each number is greater than the sum of the previous numbers. However, an easy knapsack does not protect the data if the elements of the knapsack are known. A Merkle-Hellman knapsack system converts an easy knapsack into a trapdoor knapsack. A trapdoor knapsack is not superincreasing, making it harder to attack. The Merkle-Hellman procedure to create a trapdoor knapsack sequence $A$ is as follows:

1. Given a simple knapsack sequence $A' = (a'_1, \ldots, a'_n)$

2. Select an integer $u > 2a'_n$

3. Select another integer $w$ such that $\gcd(u, w) = 1$

4. Find the inverse of $w \bmod u$

5. Construct the trapdoor sequence $A = wA' \bmod u$
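The five steps above, together with encoding and decoding, can be sketched in Python as follows. The toy sequence and the values of $u$ and $w$ used in the usage note are chosen purely for illustration, and the function names are hypothetical. The modular inverse uses the three-argument `pow` with a negative exponent, which requires Python 3.8 or later.

```python
from math import gcd


def make_trapdoor(easy, u, w):
    """Convert a superincreasing sequence into a trapdoor sequence A = w * A' mod u."""
    if u <= 2 * easy[-1]:
        raise ValueError("u must exceed twice the largest easy element")
    if gcd(u, w) != 1:
        raise ValueError("w must be coprime to u")
    w_inv = pow(w, -1, u)  # inverse of w modulo u
    return [(w * a) % u for a in easy], w_inv


def encode(bits, trapdoor):
    """Target sum of the trapdoor elements selected by the 1 bits."""
    return sum(a for b, a in zip(bits, trapdoor) if b)


def decode(target, easy, u, w_inv):
    """Map the target back to the easy knapsack and solve it greedily."""
    remaining = (target * w_inv) % u
    bits = [0] * len(easy)
    for idx in range(len(easy) - 1, -1, -1):
        if easy[idx] <= remaining:
            bits[idx] = 1
            remaining -= easy[idx]
    if remaining:
        raise ValueError("target is not a valid knapsack sum")
    return bits
```

For the toy sequence `[1, 2, 4, 9, 20]` with `u = 41` and `w = 7`, the trapdoor sequence is `[7, 14, 28, 22, 17]`; the bit pattern `[1, 0, 1, 0, 1]` encodes to 52 and decodes back to the same pattern.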

Once the easy knapsack has been converted into a trapdoor knapsack, both the easy knapsack and the value of $w^{-1}$ must be known to decode the data. The target sum for the easy knapsack is found by multiplying the target sum of the trapdoor knapsack and $w^{-1}$ together and then reducing it modulo $u$. This system can become a public-key cipher, with the trapdoor sequence the public key. The private key would be the superincreasing sequence and the values of $u$, $w$, and $w^{-1}$.

The next section of the paper introduces genetic algorithms using binary chromosomes. The applicable processes of selection, mating, and mutation are described. Selection is a random process which is biased so that the chromosomes with higher fitness values are more likely to be selected. The mating process is a simple crossover operation, i.e., bits $s + 1$ to $r$ of parent #1 are swapped with bits $s + 1$ to $r$ of parent #2. Mutation consists of switching a bit in the chromosome to its complement, i.e., a 0 becomes a 1.

The three systems that need to be defined for a genetic algorithm application are: representation scheme, mating process, and mutation process. The representation used here is that of binary chromosomes, which represent the summation terms of the knapsack, i.e., a 1 in the chromosome indicates that that term should be included when calculating the knapsack sum. The evaluation function is determined once a representation scheme has been selected. Here, the function measures how close a given sum of terms is to the target sum of the knapsack. Other properties included for this experiment are:

- Function range between 0 and 1, with 1 indicating an exact match

- Chromosomes that produce a sum greater than the target should have a lower fitness, in general, than those that produce a sum less than the target

- It should be difficult to produce a high fitness value

The actual evaluation function used depends on the relationship between the target sum and the chromosome sum. The function is determined as follows:

1. Calculate the maximum difference that could occur between a chromosome and the target sum

$$MaxDiff = \max(Target, FullSum - Target) \qquad (4.3.1)$$

2. Determine the $Sum$ of the current chromosome

3. If $Sum \le Target$ then

$$Fitness = 1 - \sqrt{|Sum - Target| / Target} \qquad (4.3.2)$$

4. If $Sum > Target$ then

$$Fitness = 1 - \sqrt[6]{|Sum - Target| / MaxDiff} \qquad (4.3.3)$$
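The evaluation function can be sketched in Python as follows. The function name and argument layout are hypothetical. The index of the root applied when the sum exceeds the target is exposed as the parameter `over_root` and defaults to 6; that default is an assumption made for this sketch, and the only requirement stated in the text is that overshooting sums are, in general, penalized more heavily than undershooting ones.

```python
def knapsack_fitness(bits, knapsack, target, over_root=6):
    """Fitness in [0, 1] of a binary chromosome; 1 indicates an exact match."""
    full_sum = sum(knapsack)
    max_diff = max(target, full_sum - target)
    total = sum(a for b, a in zip(bits, knapsack) if b)
    if total <= target:
        # Under (or equal to) the target: square root of the relative error.
        return 1 - (abs(total - target) / target) ** 0.5
    # Over the target: a higher-order root gives a larger penalty for the same ratio.
    return 1 - (abs(total - target) / max_diff) ** (1 / over_root)
```

Because both roots are applied to a ratio between 0 and 1, even a small relative error removes a large share of the fitness, which reflects the stated property that a high fitness value should be difficult to produce.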

As described above, the mating process is simple crossover. The mutation process consists of three different possibilities. About half the time, bits are randomly mutated as described above (switched to their complement). Next most likely is the swapping of a bit with a neighboring bit. The final possibility is a reversal of the bit order between two points.

A description of the complete algorithm is given, as well as results for one 15-item knapsack applied to a 5 character text. The characters are encoded as 8 bit ASCII characters. Each of the 5 sums (one per character) was attacked using the genetic algorithm, with an initial population of 20 random 15-bit binary strings.

4.3.1 Comments

This paper is reasonable for its length (about 10 pages) and year. A minimal level of experimentation is included but the results and parameters are decently reported. The introductory material and discussion are reasonable, although short. Re-implementation should not be difficult, and there are sufficient results to make a comparison possible. Overall, this paper is a reasonable choice for re-implementation and comparison.

Typical results:

- Used 5 ASCII characters, each of which was decrypted

Re-implementation approach:

- Not re-implemented

Observed cryptanalytic usefulness:

- None - the knapsack size is trivial

Comment on result discrepancy:

- No comment, not re-implemented

4.4 Substitution/Permutation: Clark - 1994

The only paper published in 1994, by Andrew Clark [4], includes the genetic algorithm as one of three optimization algorithms applied to cryptanalysis. The other two algorithms are tabu search and simulated annealing. The first section of the paper describes the two ciphers used - simple substitution and permutation. Next, suitability assessment is discussed. This section is where the fitness functions are developed. For the substitution cipher, the fitness function is

$$F_k = \alpha \sum_{i \in A} |SF[i] - DF[i]| + \beta \sum_{i,j \in A} |SDF[i][j] - DDF[i][j]| \qquad (4.4.1)$$

$A$ denotes the set of characters in the alphabet (A, B, C, ..., Z, space). $SF[i]$ is the relative frequency of character $i$ in English, while $DF[i]$ is the relative frequency of the decoded character $i$ in the message decrypted using the key $k$. $SDF[i][j]$ is the relative frequency of the digram $ij$ in English, while $DDF[i][j]$ is the relative frequency of that digram in the message decrypted using $k$. Varying $\alpha$ and $\beta$ allows one to favor either the unigram or digram frequencies. For the permutation cipher, the same approach was used as in [20] - the scoring table for digrams and trigrams.

The next three sections introduce the basic concepts of simulated annealing, genetic algorithms, and tabu search. The genetic algorithm section is fairly standard, and contains few specifics about the algorithm used. The final section in the paper discusses the results. It is fairly short, and does not give many details.

4.4.1 Comments

Overall, this paper is too short (5 pages) to be useful. It is well written, but lacks details. The tiny genetic algorithm section and few reported results make it difficult to re-implement or even to compare against. This paper is not a good candidate for further use.

Typical results:

- Convergence rates and computation time are the only reported results

- Parameters were not reported

Re-implementation approach:

- Standard parameter values used in other re-implementations were used

- Only the permutation attack was re-implemented, as a later work from this author was used for the substitution attack

Observed cryptanalytic usefulness:

- Medium - the correct key was found with a high success rate

Comment on result discrepancy:

- No real discrepancy, different combination of metrics used

4.5 Vernam: Lin, Kao - 1995

This is the only paper published in 1995. Written by Feng-Tse Lin and Cheng-Yan Kao, [18] describes a ciphertext-only attack on a Vernam cipher. The paper begins by introducing cryptography itself. The Vernam cipher is a one-time pad style system. Let $M = m_1, m_2, \ldots, m_n$ denote a plaintext bit stream and $K = k_1, k_2, \ldots, k_r$ a key bit stream. The Vernam cipher generates a ciphertext bit stream $C = E_k(M) = c_1, c_2, \ldots, c_n$, where $c_i = (m_i + k_i) \bmod p$ and $p$ is a base. Since this is a one-key (symmetric or private-key) cryptosystem, the decryption formula has the same key bit stream $K$, only $M = D_k(C) = m_1, m_2, \ldots$, where $m_i = (c_i - k_i) \bmod p$. The proposed approach is to determine $K$ from an intercepted ciphertext $C$, and use it to break the cipher.

Since the described cryptosystem is not the Vernam cipher, and there are many errors in this paper, both in notation and logic, this application needs no further discussion. The genetic algorithm approach is rendered invalid by the errors present in the reported approach and assumptions.

4.5.1 Comments

This paper is of below average quality. It is rather short, and was written for the IEEE International Conference on Systems, Man, and Cybernetics. There are some problems with English grammar, possibly due to translation from the original language. The systems are adequately described, but are incorrect. The cipher given in this paper is not
