• No results found

Genetic algorithms in cryptography

N/A
N/A
Protected

Academic year: 2019

Share "Genetic algorithms in cryptography"

Copied!
98
0
0

Loading.... (view fulltext now)

Full text

(1)

Rochester Institute of Technology

RIT Scholar Works

Theses

Thesis/Dissertation Collections

7-1-2004

Genetic algorithms in cryptography

Bethany Delman

Follow this and additional works at:

http://scholarworks.rit.edu/theses

This Thesis is brought to you for free and open access by the Thesis/Dissertation Collections at RIT Scholar Works. It has been accepted for inclusion in Theses by an authorized administrator of RIT Scholar Works. For more information, please [email protected].

Recommended Citation

(2)

Genetic Algorithms in Cryptography

by

Bethany Delman

A Thesis Submitted in Partial Fulfillment of the Requirements for the Degree of Master of Science in Computer Engineering

Supervised by

Professor Dr. Muhammad Shaaban Department of Computer Engineering Kate Gleason College of Engineering

Rochester Institute of Technology Rochester, New York

July 2004

ApprovedBy:

Dr. Muhammad Shaaban Professor

Primary Advisor

Dr. Stanislaw Radziszowski

Professor, Computer Science Department

7-?.I-o~ Dr. Shanchieh Jay Yang

(3)

Thesis Release Permission Form

Rochester Institute of Technology Kate Gleason College of Engineering Master of Science, Computer Engineering

Title: Genetic Algorithms in Cryptography

I, Bethany Delman, hereby grant permission to the Rochester Institute of Technology to

reproduce my print thesis in whole or in part. Any reproduction will not be for commercial

use or profit.

I, Bethany Delman, additionally grant to the Rochester Institute of Technology

Digi-tal Media Library (RIT DML) the non-exclusive license to archive and provide electronic

access to my thesis in whole or in part in all forms of media in perpetuity.

I understand that my work, in addition to its bibliographic record and abstract, will be

available to the world-wide community of scholars and researchers through the RIT DML.

I retain all other ownership rights to the copyright of the thesis. I also retain the right to

use in future works (such as articles or books) all or part of this thesis. I am aware that the

Rochester Institute of Technology does not require registration of copyrights for ETDs.

I hereby certify that, if appropriate, I have obtained and attached written permission

statements from the owners of each third party copyrighted matter to be included in my

thesis. I certify that the version I submitted is the same as that approved by my committee.

(4)

Abstract

Genetic algorithms (GAs) are a class of optimization algorithms. GAs attempt to solve

problems through modeling a simplified version of genetic processes. There are many

problemsforwhich aGAapproach isuseful. Itis, however, undeterminedifcryptanalysis

is such a problem.

Therefore, thiswork explores theuse ofGAs in cryptography. Both traditional crypt

analysis and GA-based methods are implemented in software. The results are then com pared using the metrics of elapsed time and percentage of successful decryptions. A de terminationis madeforeach cipher under consideration asto the validityoftheGA-based

approaches found in the literature. In general, these GA-based approaches are typical of thefield.

Ofthe genetic algorithm attacks found in the literature, totaling twelve, seven were

re-implemented. Ofthese seven, only three achieved any success. The successful attacks

werethose on the transpositionand permutation ciphersby Matthews [20],Clark [4], and Griindlingh and Van Vuuren [13], respectively. These attacks werefurther investigated in

anattempttoimproveor extendtheirsuccess. Unfortunately,thisattempt wasunsuccessful,

as wasthe attemptto apply the Clark [4] attack to the monoalphabetic substitution cipher

and achievethe same orindeed any level of success.

Overall, thestandard fitness equation genetic algorithm approach, and the scoreboard

variant thereof, are not worth the extra effortinvolved. Traditional cryptanalysis methods

are moresuccessful, and easierto implement. While atraditional methodtakesmore time, a faster unsuccessful attack is worthless. The failure ofthe genetic algorithm approach indicatesthat supplementary research into traditional cryptanalysis methods may be more

(5)

Contents

1 Introduction .... .... ... .... 1

1.1 Document Overview .... 2

2 Genetic Algorithms 4

3 Cryptography ... 9

3.1 Monoalphabetic Substitutioncipher 9

3.2 Polyalphabetic Substitution cipher 11

3.3 Permutation/Transposition ciphers 12

3.3.1 Permutationcipher 13

3.3.2 Transpositioncipher 13

3.4 Knapsackciphers 15

3.4.1 Merkle-Hellman Knapsackcipher 16

3.4.2 Chor-RivestKnapsackcipher 18

3.5 Vernam cipher 22

3.6 Comments 22

4 Relevant Prior Works 23

4.1 Spillmanetal. 1993 25

4.1.1 Comments 26

4.2 Matthews- 1993 27

4.2.1 Comments 30

(6)

4.3.1 Comments 33

4.4 Clark 1994 34

4.4.1 Comments 35

4.5 Lin, Kao 1995 35

4.5.1 Comments 36

4.6 Clark, Dawson, Bergen 1996 37

4.6.1 Comments 38

4.7 Clark, Dawson, Nieuwland 1996 39

4.7.1 Comments 41

4.8 Clark,Dawson - 1997 41

4.9 Kolodziejczyk

-1997 42

4.9.1 Comments 42

4.10 Clark,Dawson - 1998 43

4.10.1 Comments 44

4.11 Yaseen, Sahasrabuddhe - 1999 45

4.11.1 Comments 47

4.12 Grundlingh,Van Vuuren submitted 2002 47

4.12.1 Comments 50

4.13 Overall Comments 51

5 Traditional Cryptanalysis Methods 52

5.1 Distinguishingamongthe threeciphers 53

5.2 Monoalphabeticsubstitution cipher 53

5.3 Polyalphabetic substitution cipher 54

5.4 Permutation/Transpositionciphers 56

5.5 Comments 58

6 Attack Results . . 59

(7)

6.2 Classical Attack Results 61

6.3 Genetic Algorithm Attack Results 64

6.4 Comparison ofResults 71

7 ExtensionsandAdditions 74

7.1 Genetic Algorithm AttackExtensions 74

7.1.1 Transposition Cipher Attacks 75

7.1.2 Permutation Cipher Attack 76

7.1.3 Comments 77

7.2 Additions 77

7.2.1 Comments 79

8 Evaluationand Conclusion 81

8.1 Summary 81

8.2 Future Work 83

8.3 Conclusion 83

(8)

List

of

Figures

4. 1 Substitution CipherFamily andAttacking Papers 23

4.2 Permutation CipherFamilyandAttacking Papers 24

4.3 Knapsack CipherFamily andAttacking Papers 24

4.4 Vernam CipherFamily andAttackingPaper 24

6.1 Classical SubstitutionAttack 62

6.2 Classical Vigenere Attack 63

6.3 Classical Permutation Attack 64

6.4 Classical Transposition Attack 65

6.5 EffectofNumberofGenerations atPopulation Size 10 67 6.6 EffectofNumberofGenerations atPopulation Size 20 67 6.7 EffectofNumberofGenerations atPopulation Size 40 68 6.8 EffectofNumberofGenerations atPopulation Size 50 68

6.9 EffectofPopulation Sizeat25 Generations 69

6.10 EffectofPopulationSizeat50 Generations 69

6.11 EffectofPopulation Size at 100Generations 70

6.12 EffectofPopulation Sizeat200Generations 70

(9)

Glossary

block

block length

chromosome

cipher

ciphertext

crossover(mating)

cryptanalysis

cryptography

cryptosystem

decryption

digram

encryption

A sequence of consecutive characters encoded at one

time

Thenumberofcharactersin ablock

The genetic material ofa individual represents the in formation about a possible solutionto thegiven problem

Analgorithmfor performingencryption (andthe reverse,

decryption) a series of well-defined steps that can be followed as a procedure. Worksatthelevel ofindividual

letters,or small groups ofletters

A text in the encrypted form produced by some cryp

tosystem. Theconventionis forciphertextstocontain no

whitespaceor punctuation

Crossover is the process by which two chromosomes

combine some portion of their genetic material to pro

duce a child orchildren

The analysis and deciphering of cryptographic writings

orsystems

Theprocessorskill ofcommunicating inordeciphering

secret writings orciphers

Thepackage ofall processes, formulae, andinstructions for encoding anddecodingmessagesusingcryptography

Any procedure used in cryptography to convert

cipher-text(encrypted data)intoplaintext

Sequence oftwoconsecutive characters

(10)

fitness

generation

genetic algorithm(GA)

key keylength monoalphabetic mutation one-timepad order-basedGA plaintext polyalphabetic

The extent to which a possible solution successfully

solvesthe given problem

-usuallya numerical value

Theaverage interval oftimebetween thebirth of parents

andthebirth oftheiroffspring- in the

genetic algorithm

case, this isone iteration ofthe main loopof code

Search/optimization algorithm based on the mechanics

ofnatural selectionand natural genetics

A relativelysmall amount ofinformation that is used by

an algorithmtocustomizethe transformationof plaintext

intociphertext (duringencryption) or vice versa (during

decryption)

The size ofthekey- how

manyvaluescomprisethekey

Usingone alphabet

-referstoa cryptosystem where each

alphabetic character is mapped to a unique alphabetic

character

Simulation of transcription errors that occur in nature

with alowprobability

-a childis randomlychangedfrom

whatits parents producedin mating

Anothername forthe Vernamcipher

A form of GA where the chromosomes represent per

mutations. Special care must be taken to avoid illegal

permutations

A message beforeencryption or afterdecryption, i.e., in

its usual form which anyone can read, as opposedto its

encryptedform

Usingmany alphabets referstoacipher where each al

phabetic character canbe mapped toone ofmany possi

(11)

population

scoreboard

trigram uni gram Vigenere

The possible solutions (chromosomes) currently under

investigation,as well as the number of solutions thatcan

be investigatedat onetime, i.e., per generation

Methodofdetermining thefitness of a possible solution takes a small list ofthe most common digrams and tri-grams andgives each chromosome a scorebasedon how oftenthese combinationsoccur

Sequence ofthreeconsecutive characters Single character

(12)

Chapter

1

Introduction

The application of a genetic algorithm (GA)to the field of cryptanalysis is rather unique.

Few works exist on this topic. This nontraditional application is investigated todetermine

thebenefitsofapplyingaGAtoacryptanalyticproblem,ifany. IftheGA-based approach

proves successful, itcould lead tofaster, more automated cryptanalysis techniques. How ever, since this area is so different from the application areas where GAs developed, the

GA-based methods willlikelyprovetobe less successful than the traditionalmethods.

The primary goals of this work are to produce a performance comparison between

traditional cryptanalysis methods and genetic algorithmbased methods, and to determine

thevalidityoftypical GA-basedmethodsinthe fieldof cryptanalysis.

The focus will be on classical ciphers, includingsubstitution, permutation, transposi

tion, knapsackandVernam ciphers. Theprinciples used in theseciphers form thefounda

tionfor many ofthe modem cryptosystems. Also, ifaGA-based approach isunsuccessful

on these simplesystems, it is unlikely tobe worthwhileto apply aGA-based approach to

more complicated systems. A thorough search ofthe availableliterature found GA-based

attacks on only these ciphers. In many cases, these ciphers are some ofthe simplest pos

sible versions. For example, oneoftheknapsack systems attacked is theMerkle-Hellman

knapsack scheme. Inthis scheme,a superincreasingsequenceoflengthn isusedas apub

lic key. According to Shamir[23],atypicalvalue fornis 100. IntheGA-basedattacks on

this cipher,nwaslessthan 16. Thisisalmost anorder of magnitudebelowthetypical case,

(13)

GAs.

InotherGA-basedattacks,thereisahuge gap intheknowledgeneededtore-implement

theattack andtheinformationprovidedinthepaper. Forexample, thevalues ofpand hare

vitally necessary foran implementationoftheChor-Rivestknapsack scheme. Thissystem utilizesafinitefield, q,of characteristicp,whereq =

ph,p > h,andthediscrete logarithm

problemis feasible. In theGA-based attack onthis cipher, these values are not provided. This information holemeans that this attack couldhave been mnusing trivia] parameters,

orparameters knowntobe easytoattack. Thisattackcan notbeconsidered remotely valid withoutknowingtheseparameters.

ManyoftheGA-basedattacks alsolackinformationrequiredforcomparisonto the tra ditionalattacks. Thenumber of generations ittookfor a run oftheGAtogettosome state is notat all useful when attempting to compare andcontrast attacks. The most coherent

and seemingly valid attacks were re-implemented so that a consistent, reasonable set of metrics couldbecollected. These metricsincludeelapsed timerequired fortheattackand

thesuccess percentage of each attack.

1.1 Document Overview

Thisthesisis broken intoa number of chapters. Eachchaptercovers oneaspect ofthe thesis in detail. Chapter 2gives abrief introduction to genetic algorithms. Thegeneral formof

the selection,reproductionand mutationoperatorsarediscussed,asisthegeneral sequence ofeventsforaGA-based approach. Chapter 3 introduces thenecessaryclassical ciphersin detail. Thealgorithmforeach cipherisgiven, in some cases includinga small exampleto illustratetheprocess ofusingthegivencryptosystem.

Thenexttwochapters coverthedifferentattacksfound in theliterature. Chapter 4 dis

cussesthe variousGA-based attacks in detail. These attacks are coveredin chronological

(14)

chapter,Chapter5,detailstheclassical attack onthosecipherschosenfor furtherinvestiga

tion. Theciphers were selectedfor further investigation duetothedifficultyorlack thereof

ofthe traditional attack method, as well as theclarity and quality of theGA-based work.

Forexample, the Vernam cipherwas not selected for further investigation. This cipheris

unconditionally secure, meaningthat an attack is successful only when thereis a protocol or implementation failure. There is no basis for attacking this cipher when no mistakes

have been made.

The nextchapters detail theresults ofthe traditional and GA-based attacks, as well as

extensions to theGA-based attacks. Chapter6covers theresults ofthe traditional attacks,

anddiscussesthoseGA-based attacksthatachieved some success. Chapter 7 discussesthe

extensions madetoseveral ofthe GA-basedattacks,as well asnew work attempted.

Finally, Chapter 8 discusses the overall conclusions drawn from this work, as well as

(15)

Chapter

2

Genetic Algorithms

The genetic algorithm is a search algorithm based on the mechanics of natural selection

and natural genetics [12]. AssummarizedbyTomassini [29],themain ideais thatinorder fora population ofindividualsto adapttosomeenvironment, it shouldbehave likea natu

ral system. Thismeans thatsurvival andreproduction of anindividual is promoted bythe eliminationof uselessorharmfultraitsandbyrewardingusefulbehavior. Thegenetic algo rithm belongs to the familyofevolutionary algorithms, alongwith genetic programming, evolution strategies, and evolutionary programming. Evolutionary algorithms canbe con sidered as abroadclass of stochastic optimization techniques. An evolutionary algorithm maintains a population of candidate solutions fortheproblem at hand. The population is then evolvedby the iterativeapplication of a set ofstochastic operators. The set ofopera tors usuallyconsists ofmutation, recombination, and selectionor something very similar. Globallysatisfactory, ifsub-optimal, solutionsto the problem arefound in much thesame

way as populationsin natureadapt to theirsurroundingenvironment.

UsingTomassini 'sterms [29],geneticalgorithms(GAs)consider an optimization prob lem as the environment wherefeasible solutions aretheindividuals living in thatenviron

ment. The degreeof adaptation of anindividual toitsenvironmentisthecounterpart ofthe fitness functionevaluatedon asolution. Similarly, a setoffeasiblesolutionstakestheplace

(16)

In general, a genetic algorithm begins with a randomly generated set of individuals.

Once the initial population has beencreated, thegenetic algorithm enters aloop [29]. At

the end of each iteration, a new population has been producedby applying a certain num ber of stochastic operatorsto the previous population. Each such iteration is known as a

generation [29].

Aselection operatorisapplied first. Thiscreates an intermediatepopulationof n "par

ent"

individuals. To produce these "parents", n independent extractions of an individual from theold population are performed [29]. The probability of each individual being ex tracted shouldbe (linearly) proportional to the fitness ofthat individual. This means that above average individuals should have more copies in the new population, while below average individuals should have few tono copiespresent, i.e., abelow average individual risksextinction.

Oncetheintermediatepopulation of

"parents"

(those individualsselectedforreproduc

tion)has beenproduced, the individuals forthenext generation willbecreatedthrough the application of anumberof reproductionoperators[29]. Theseoperators caninvolve oneor

more parents. An operatorthat involves just oneparent, simulatingasexual reproduction, is called a mutation operator. Whenmorethan one parent isinvolved, sexual reproduction issimulated,andtheoperatoriscalledrecombination [29]. Thegenetic algorithmusestwo

reproduction operators

-crossover and mutation.

To apply a crossover operator, parents are paired together. There are several differ

ent types of crossover operators, and the types available depend on what representation is used forthe individuals. For binary string individuals, one-point, two-point, and uni formcrossover are often used. Forpermutation ororder-based individuals, order,partially

(17)

between thefirstand second cutpoints iscopiedfromthe firstparent to thechild [28]. The

remaining places are filled usingelements not occurring in this section, in the order that

they occurinthe secondparentstarting from the second cut pointand wrapping around as needed. For partially mapped crossover, the section between the two cut points definesa series ofswappingoperationstobe performed on the second parent [28]. Cyclecrossover satisfiestwo conditions every position ofthe child must retain a valuefound in the cor respondingposition of a parent, and the child must be a validpermutation. Eachcycle, a random parent isselected. Forexample [28]:

Parent 1 is (h kc ef d b 1 a i gj) and parent2is(abcdefghijkl)

Forposition#1 in the child, 'h' is selected

Therefore, 'a'

cannotbeselected fromparent2 and mustbe chosenfrom parent 1

Since 'a'

isabove

'i'

inparent 2, T mustalso beselectedfrom parent 1

So, if 'h'

isselected fromparent 1 forposition#1, then 'a', 'i\ 'j', and T mustalso beselectedfromparent 1

The positions ofthese elements are said to form acycle, since selection continues

untilthefirstelement

('h'

here) isreached again

After crossover, each individual has a small chance of mutation. The purpose ofthe

mutation operator is to simulatethe effect oftranscription errors that can happen with a very low probability when a chromosome is mutated [29]. A standard mutation operator forbinary stringsis bit inversion. Each bit in an individual hasa small chanceofmutating into itscomplementi.e. a

'0'

wouldmutateintoa T.

In principle, the loop of selection-crossover-mutation is infinite [29]. However, it is

usually stopped when a given termination condition is met. Some common termination

conditions are [29]:

(18)

2. A satisfactory solutionhas been found

3. No improvement in solution quality has taken placefora certain numberofgenera tions

The differentterminationconditions are possible since a genetic algorithm is notguar

anteed to converge to a solution. The evolutionary cycle can be summarized as follows [29]:

generation = 0

seed population

while not(termination condition)do generation = generation + 1

calculatefitness

selection crossover mutation endwhile

Typicalapplication areasforgenetic algorithms include:

Function Optimization

Graph Applicationssuchas

-Coloring

- Paths

- Circuit Layout

Scheduling

Satisfiability

Game Strategies

(19)

Computing Models

Neural Networks

Lens Design

Many ofthese application areas are concerned with problems which are hard to solve

but have easily verifiable solutions. Another trait common to these application areas is

the equation style offitness function. Cryptographyand cryptanalysis couldbe considered

to meet these criteria. However, cryptanalysis is not closely related to the typical GA

application areas and, subsequently, fitness equations are difficultto generate. This makes

(20)

Chapter 3

Cryptography

In general, the genetic algorithm approach has only been used to attack fairly simple ci

phers. Most oftheciphers attackedareconsideredtobe classical, i.e. thosecreatedbefore

1950 A.D. [21] whichhave becomewell knownovertime. Ciphers attacked include:

Monoalphabetic Substitutioncipher

Polyalphabetic Substitutioncipher

Permutationcipher

Transpositioncipher

Merkle-Hellman Knapsackcipher

Chor-RivestKnapsackcipher

Vernamcipher

3.1

Monoalphabetic Substitution cipher

Thesimplestcipherofthoseattackedisthemonoalphabetic substitution cipher. Thiscipher

is described asfollows:

(21)

Letthekeys, /C,consistof all possible permutations ofthesymbols in Z

Foreach permutationit G /C,

1. define theencryption function e7I(x) =

n(x)

2. and define the decryption function dn(y) =

Tt~l(y), where 7r_1

is the inverse

permutationton.

Forexample,define Ztobethe26 letter Englishalphabet. Then,a random permutation

7r couldbe (plaintextcharacters are in lowercase and ciphertext characters in uppercase)

[27]:

a b c d e f g h i j k 1 m

X N Y A H P 0 G Z Q W B T

n o P q r s t u V w X y z

S F L R C V M U E K J D I

This definestheencryptionfunction likeso

-en(a) = X,

e^(b) = N, etc. The

decryp

tionfunction,dn(y) istheinversepermutation,and canbeobtainedby switchingtherows.

7T"1

is,therefore:

A B C D E F G H I J K L M

d 1 r y v 0 h e z X w P t

N O P Q R S T U V W X Y Z

b g f j q n m u s k a c i

Usingtheencryptionfunction,aplaintext of

'plain'

becomesaciphertext of'LBXZS'.

Aciphertext of 'YZLGHC woulddecryptto 'cipher'.

Monoalphabetic substitution is easy to recognize, since the frequency distribution of

(22)

course. This attribute allows associations betweenciphertext andplaintext characters, i.e.

if the most frequent plaintext character X occurred ten times, the ciphertext character X

mapstowill appearten times.

3.2

Polyalphabetic

Substitution cipher

This cipher is a more complex version ofthe substitution cipher. Instead ofmapping the

plaintext alphabet ontociphertext characters, different substitution mappings areused on

differentportions oftheplaintext[21]. Theresultiscalled polyalphabetic substitution. The

simplest case consistsofusing different alphabetssequentially andrepeatedly, so thatthe

position of each plaintext characterdetermines which mapping is applied to it. Under dif

ferentalphabets, thesame plaintext characterisencryptedtodifferentciphertextcharacters,

makingfrequencyanalysis harder.

Thiscipherhasthe followingproperties (assuming m mappings):

Thekeyspace/Cconsists of all ordered sets of m permutations

Thesepermutations arerepresentedas (ki,k2,.

,km)

Eachpermutation ki is definedonthe alphabetinuse

Encryptionofthemessage x = (x1,x2, . . .

,xm) is givenby

ek(x) =

ki(xi)k2(x2) . . . km(xm), assumingthekey k = (fci,k2,. . . ,km)

The decryptionkey associated with ek(x) isdk(y) = (k^1,k2l ,. . .

,fc"1).

The encryption and decryption functions could also be written as follows (assuming

that theaddition/subtraction operationsoccur modulothe sizeofthealphabet):

eK(x1,x2, .,xm)

= (xi + ki,x2 +k2,...,xm + km)

dK(yi,yo.,...,ym) = (y\ - h,y2-k2,. . .

,ym

(23)

The most common polyalphabetic substitution cipher is the Vigenere cipher. Thisci

pher encrypts m characters at a time, making each plaintext character equivalent to m

alphabetic characters. Aexample ofthe Vigenerecipheris:

Supposem= 6

Usingthemapping A <-> 0,B <-^-1, . . .

,Z <-> 25

IfthekeywordAC = CIPHER,

numerically (2, 8, 15,7,4, 17)

Supposethe plaintextis thestring 'cryptosystem'

Converttheplaintext elements toresidues modulo26- 2 17 24 15 19 14 18 24 18 19

4 12

Writetheplaintext elements in groups of6and addthekeywordmodulo26

2 17 24 15 19 14 18 24 18 19 4 12

2 8 15 7 4 17 2 8 15 7 4 17

4 25 13 22 23 5 20 6 7 0 8 3

The alphabetic equivalent oftheciphertextstring is: EZNWXFUGHAID.

Todecrypt,thesamekeywordisusedbutissubtractedmodulo26 fromtheciphertext

3.3 Permutation/Transposition ciphers

Althoughsimple transpositionciphers changethedependencies betweenconsecutive char

acters, they are fairly easily recognized sincethe frequency distribution ofthe characters

ispreserved [21]. This istrue for both ciphersconsidered,the permutation cipher andthe

columnartransposition cipher. Both ciphers apply the same principles, but differ in how

the transformation is applied. The permutation cipher is applied to blocks of characters

(24)

3.3.1 Permutation cipher

The idea behind a permutation cipher is to keep the plaintext characters unchanged, but

altertheirpositionsby rearrangementusinga permutation [27].

Thiscipheris defined as:

Let mbe a positive integer, andACconsist of all permutations of{1,... ,m}

Forakey(permutation) -n E AC,define:

Theencryption functione7r(xi,. . . ,xm)

= (x^i),

. . ,x^m))

The decryption functiond^(yi ym) = (2/^-1(1),

,y*-Hm))

A smallexample,assuming m = 6,andthe

key isthepermutation n:

ic(l) 2

The first row is the value of i, and the second row is the corresponding value of

7r(z). The inverse permutation,

ir~x

is constructed by interchanging the two rows, and

rearranging the columns so that the first row is in increasing order, Therefore, 7r_1 is:

X 1 2 3 4 5 6

TX~l(x) 3 6 1 5 2 4

Ifthe plaintext given is 'apermutation', it is firstpartitioned into groups of m letters

('apermu'

and 'tation') and then rearranged according to the permutation ir. This yields

'EMAURP'

and 'TOTNIA', makingtheciphertextequal

'EMAURPTOTNIA'

. The

cipher-textis decrypted in asimilarprocess, usingtheinverse permutation7r_1.

3.3.2 Transposition cipher

Anothertypeof cipherinthiscategory isthecolumnartranspositioncipher. Asdescribedin

[24], thiscipher consistsofwritingtheplaintextintoarectangle with a prearranged number

of columns and transcribing it vertically by columns to yield the ciphertext. The key is

(25)

rectangle andtheorderinwhich to transcribe thecolumns [24]. Thiskey is usuallyderived

from an agreed keyword by assigning numbersto the individual letters according to their

alphabetical order.

For example, ifthe keyword is SORCERY, the resulting numerical key would be 6 3

4 12 5 7 [24], where numbers are assignedto the keyword letters accordingto the alpha

betical order ofthe letters. Ifwe wishtoencipherthe message LASER BEAMS CAN BE

MODULATED TO CARRY MOREINTELLIGENCETHAN RADIO WAVES, it would

first be written out on a width of7underthe numerical key. As is traditional,white spaces

and punctuation are ignored in thisand all ciphertexts used. The rectangle wouldlooklike

this:

S O R C E R Y

6 3 4 12 5 7

L A S E R B E

A M S C A N B

E M O D U L A

T E D T O C A

R R Y M O R E

I N T E L L I

G E N C E T H

A N R A D I O

WAVE S Q R

Sincethe message length was not amultiple ofthe keyword length, the message does

not fill the entire rectangle. Deciphering is simplified if the last row contains no blank

cells, so two dummy letters are added (here, Q and R) to make the rectangle completely

filled [24]. Enciphering consists ofreadingthe ciphertext out vertically in the orderofthe

numbered columns. Itcan bewritten ingroupsoffive letters atthesametime, ifsodesired

[24]. The ciphertext is, therefore, ECDTM ECAER AUOOL EDSAM MERNE NASSO

(26)

The decipherer would countthe number of letters in the message (63) and determine

thelength oftheinscription rectangle by dividing this value by the length ofthe keyword.

This makes theinscription rectangle 7 x 9 in thiscase. The numerical key is then written

above the rectangle and the ciphertext characters entered intothe diagram in the order of

thekeynumbers [24]. The plaintextisread offbyrows, as itwas originallyentered.

3.4 Knapsack ciphers

Knapsack public-key encryption systems are based on the subset sum problem, an

NP-complete problem. Given in thisproblem are afinite setS C N+ (N+ is the set of natural

numbers {1,2, . . . }), and a target t G N+. The question is whether thereexists a subset

S'

C S whose elements sumtot [9]. Forexample:

If5 = {1,2,7, 14,

49, 98, 343, 686, 2409, 2793,16808, 17206, 117705, 117993}

Andt = 138457

Then thesubsetS' = {1,

2, 7, 98, 343, 686, 2409, 17206,117705} isa solution.

Thebasic idea of aknapsackcipheris toselect an instance ofthe subset sum problem

thatis easy to solve, andthen disguise it asan instance ofthegeneral subset sum problem

whichishopefullydifficulttosolve [21]. Theoriginal (easy)knapsackset can serve asthe

privatekey, whilethe transformedknapsack set serves asthepublickey.

The Merkle-Hellman knapsack cipheris important as it was the first implementation

of apublic-key encryption scheme. Many variations on it have been suggested but most,

includingthe original,have been demonstratedtobe insecure [21]. A notableexception to

(27)

3.4.1 Merkle-HellmanKnapsack cipher

TheMerkle-Hellman knapsackcipher attemptstodisguisean easily solved instanceofthe

subset sum problem, called asuperincreasingsubset subproblem, by modularmultiplica

tion and apermutation [21]. A superincreasing sequenceis a sequence (bi,b2,. . .

,bn) of

positive integerswith theproperty that6j > 2^7=1 fy> freacni,2 <i <n.

The integern is a common system parameter. 6j, b2,. . .

,bn is the superincreasing se

quence and M isthe modulus, selected such thatM > 6] + b2 + . . . + bn. W isa random

integer, such that 1 < W < M 1 and gcd(H/, 7\/) = 1. 7r is a random permutation

ofthe integers {1,2,...,n}. The publickey ofA is (ai,a2,. . .

, an), where a; =

Wbv^

mod M, andtheprivatekeyofA is (n,M, W, (h,b2, . . . ,bn)).

The steps inthe transmission of a message m are as follows (B isthe sender, A isthe

receiver) [21]:

1. Encryption - B

shoulddothefollowing:

Obtain A's authenticpublickey (a1;a2, . . . ,an)

Representthemessage m as abinary stringoflength n,

m =

m\m2 . . .mn

Compute the integerc=

m\a\ + m2a2 + . . . + mnan

Sendtheciphertext cto A

2. Decryption Torecovertheplaintext mfrom c,Ashould dothefollowing:

Compute d = W~lc mod M

Bysolvingasuperincreasingsubset sumproblem,findtheintegersr-\,r2, . . . ,rn,

Ti G {0, 1} suchthatd =

r-\b\ +r2b2 + . . . +rnbn.

Themessagebitsarem, =

r^), i =

1,2, . . . ,n.

(28)

1. Letn = 6

2. EntityAchoosesthesuperincreasingsequence(12, 17,33, 74, 157,316)

3. M = 737

4. VU =635

5. Thepermutation n of{1,2, 3,4, 5,6} is definedas:

6. 7r(l) =

3,7r(2) =

6,tt(3) =

1,tt(4) =

2,tt(5) =5,tt(6) =4

7. A'spublickeyistheknapsackset (319,196,250, 477, 200,559)

8. A's privatekey is (n,M, W, (12,17,33,74, 157,316))

9. Toencryptthemessage m 101101,B computesc as:

10. c= 319+ 250+477+559 = 1605

1 1. and sends ittoA

1. Todecrypt, Acomputes

2. d = W~lc mod M = 136

3. andsolves the superincreasingsubset sum problemof:

4. 136 = 12r: + 17r2 +

33r3 +74r4 + 157r5 +316r6

5. toget 136 = 12 +17+ 33+ 74

6. Therefore,rx =

1,r2 =

1,r3 =

1,r4 =

1,r5 = 0,

r6 = 0

7. Applying the permutation n producesthe messagebits:

8. m1 = r3 =

l,ra2 = r6 =

0,m3 = r1 =

1, mA =

r2 = l,m5 = r5 =

0,m6 =

(29)

3.4.2 Chor-Rivest Knapsackcipher

The Chor-Rivest cipher is the only known knapsack public-key system that does not use

some form of modular multiplication to disguise an easy subset sum problem [21]. This

cipheris by farthemost sophisticated ofthose attacked by agenetic algorithm. When the

parameters ofthis system are carefully chosen, and no information is leaked, there is no

feasible attack on theChor-Rivest system. It can not be covered in detail in a few pages,

soonly asummary ofthesystem will be given. Further numbertheoretical algorithms are

neededfortheset-up and operation ofthecipher.

Most of the parameters are based on a finite field q, of characteristic p, where q =

ph,p > hand forwhich thediscrete logarithmproblem is feasible [21]. f(x) is arandom

monicirreducible polynomial ofdegree h overZp. The elements ofq arerepresented as

polynomials in Zp[x] of degree less than h, with multiplication done modulo f(x). g(x)

is a random primitive element ofq. Foreach ground field element i G Zp, the discrete

logarithm ai = log^^x +

i) ofthefieldelement (x + i) to the base g(x) mustbe found.

The random permutation n is on the set of integers {0, 1, 2,. . .

,p 1}, and the random

integerd is selected such that 0 < d < ph

2. c, is then computed as c* =

(a^j)

+ d)

mod (ph 1),0 < i < p 1. A's publickey is ((c0,Ci, . . .

,cp-i),p,h) andits privatekey

is (f(x),g(x),n,d).

A message mis transmittedasfollows (B isthesender,Aisthereceiver) [21]:

1. Encryption - B

shoulddothefollowing:

Obtain A'sauthentic public key ((c0,c\,. . .

,cp_i),p,h)

Represent the message m as abinary string oflength |lg (^)J, where

()

is a

binomialcoefficient

Considerm as thebinary representationof an integer

Transformthisinteger intoabinaryvectorM = (M0,

Mi,...

,Mp_i) oflength

p havingexactly h l's asfollows:

(30)

(b) For% from 1 topdothefollowing:

(c) Ifm >

(p;')

then set M^

*-l,m*-m-{p~li), I <- I - 1

(d) Otherwise, setA/,_i < 0

(e) Note:

()

= 1 forn > 0,

(?)

=

0) forI > 1

Computec =

2_X^> ^c mo<^ (Ph ~ 1)

Sendtheciphertext ctoA

2. Decryption Torecovertheplaintext mfrom c, Ashoulddothefollowing:

Computer =

(c-hd) mod (ph 1)

Compute u(x) = g(x)r

mod f(x)

Compute s(x) ~

u(x) + f(x), a monic polynomial ofdegree hoverZp (Zpis

theset of integers {0, ...

,p 1}. Addition, subtraction, and multiplication in

Zp are performed modulop.)

Factor s(x) into linear factors overZp : s(x) =Ylj=1(x +tf), where

tj G Zp

Computeabinary vectorM = (M0,Ml5 . . .

,Mp_i) asfollows:

(a) The componentsofMthatare 1 have indices7r_1(tj), 1 < j < h.

(b) The remainingcomponents are0.

Themessage misrecovered fromM as follows:

(a) Setm

<-0,/ <- h

(b) Fori from 1 topdothefollowing:

(c) ifMj_x = 1, then set m

<-m+

(P7')

andI < / -1

An exampleusing artificiallysmall parameters wouldgo asfollows [21]:

1. EntityAselectsp

= 7 andh = 4

2. Entity A selects the irreducible polynomial /(x) = x4

+ 3x3 + 5x2 + 6x + 2 of

degree 4overZr. Theelements ofthefinite field 7i are represented as polynomials

(31)

3. EntityA selectsthe random primitive elementg(x) = 3x3 + 3x2

+ 6

4. EntityA computesthefollowingdiscrete logarithms:

. o.o =

logg(:r)(x) = 1028

a2 = logg{x)(:r +

1) = 1935

. a2 = logs(l)(./; +

2) = 2054

. a3 = log9(l)(r +

3) = 1008

a., = log5(l)(x+

4) = 379

a5 = logs(l)(.T+

5) = 1780

. a6 = log3(l)(.r +

6) = 223

5. Entity A selectstherandom permutation n on {0, 1,2,3,4,5,6} definedby 7r(0) =

6,tt(1) =

4,tt(2) =

0,tt(3) =

2,tt(4) =

1,tt(5) =

5,tt(6) = 3

6. EntityA selectsthe randominteger d = 1702

7. EntityA computes:

Co = (a6+

d) mod 2400 = 1925

a = (a4+

d) mod 2400 = 2081

C2 = (o0+d) mod 2400 = 330

c3 = (a2+

d) mod 2400 = 1356

c4 = (ai +

d) mod 2400 = 1237

c5 = (a5+

d) mod 2400 = 1082

c6 = (a3+

d) mod 2400= 310

8. EntityA'spublickey is ((c0,Ci,c2, c3,r4, c5,c6),p = 7,h= 4),andits private

key is

(32)

9. Toencryptamessagem = 22 forA,

entity B doesthefollowing:

(a) Obtains A'sauthentic publickey ((c0,Ci, c2, c3,c4,c5, c$),p,h)

(b) Represents m as abinary string oflength 5: m = 10110 (Notethat [lg

Qj

=

5).

(c) Transforms mto thebinary vectorM = (1,

0, 1, 1,0,0,1) oflength 7

(d) Computes c = (c0 +

c2 +c3 +c6) mod 2400= 1521

(e) Sends c= 1521 toA

10. To decrypttheciphertext c= 1521,A doesthe following:

(a) Computes r = (c

-hd) mod 2400 =1913

(b) Computes u(x) =

#(x)(1913) mod /(.r) = x3

+ 3x2 +2x + 5

(c) Computes s(x) =

u(x) +f(x) = x4

+ 4x3 +x2

+x

(d) Factors s(x) = x(x +2)(x+ 3)(x+

6) (so*, =0,t2 =

2,*3 =

3,t4 = 6).

(e) Thecomponentsof7\/thatare 1 have indices 7r_1(0) =

2,7r-1(2) =

3,7r_I(3) =

6,ir-1

(6) = 0. Hence, M=

(1,0,1,1,0,0,1)

(f) Transforms M to theinteger m = 22,

recoveringtheplaintext

To illustrate how small the above parametersare, the parameters originally suggested

fortheChor-Rivestsystem are [21]:

p = 197 andh = 24

p = 211 andh = 24

p = 35 andh= 24

(33)

3.5

Vernam

cipher

The Vernam cipher is a stream cipher defined on the alphabet A = {0,1}. A binary

message niim2. . . mt is operated on by abinary key string kjk2. kt, ofthe same length

to produce aciphertext string cxc2. . . ct where c, = m, (b k,, 1 < i < t [21]. Ifthe key

string is randomly chosen and never used again, thiscipheris called aone-time system or

one-time pad.

The one-time pad can be shown tobe theoretically unbreakable [21]. Ifa cryptanalyst

has a ciphertext string c\c2...ct encrypted using a random, non-reused key string, the

cryptanalyst candonobetterthan guess attheplaintextbeingany binary stringoflength t.

It has been proven that an unbreakableVernam system requires a randomkey ofthesame

length asthemessage [21].

3.6

Comments

The above ciphers comprise all those cryptosystems attackedusing a genetic algorithm in

afairly exhaustive search oftheliterature. The newest ofthese systems are the knapsack

ciphers, with theChor-Rivest system published in 1985, and the Merkle-Hellman system

publishedin 1978 [21]. The other ciphers attacked are all systems olderthan that. In some

cases, centuries older. None ofthe modem ciphers, such as DES, AES, RSA, or Elliptic

Curve, were even attemptedusing a genetic algorithm based approach. The only ciphers

that are considered even remotely mathematically difficult are the knapsack systems, and

even they are nottrue numbertheoretical protocols. This lack makes many ofthe genetic

(34)

Chapter

4

Relevant Prior Works

While therehave been manypaperswritten on genetic algorithms and theirapplication to

variousproblems, therearerelatively fewpapersthatapplygenetic algorithmstocryptanal

ysis. Theearliest papers thatuse a genetic algorithm approach fora cryptanalysis problem

were writtenin 1993,almosttwenty years aftertheprimarypaperon genetic algorithmsby John Holland [12] [14]. These papersfocus on thesimplest classical and modem ciphers,

and are citedfrequentlybylaterworks.

Figures 4.1 through4.4 show whichciphers areattackedby which authors.

Substitution

Monoalphabetic Polyalphabetic

/

V

Spillmanet. al.1993

Clark1994

Clarket. al.1996

Clark&Dawson1997

C

Clark & Dawson

rundlingh&Van Vu 1998

uren2002

(35)

Permutation/Transposition

Permutation Transposition

1

\

Clark1994

Matthews1 993

[image:35.521.101.420.75.220.2]

Grundlingh&Van Vuuren2002

Figure 4.2: Permutation CipherFamily andAttacking Papers

Knapsack

Merkle-Hellman

Spillman1993

Clark, Dawson,Bergen 1996

Kolodziejczuk1 997

Chor-Rivest

[image:35.521.53.470.292.463.2]

Yaseen,Sahasrabuddhe 1999

Figure 4.3: Knapsack CipherFamily andAttacking Papers

Vernam

Lin,Kao1995

[image:35.521.207.314.536.616.2]
(36)

4.1

Substitution:

Spillman

et al. - 1993

The firstpaper published in 1993 by Spillman, Janssen, Nelson and Kepner [26], focuses

on thecryptanalysis of a simplesubstitution cipher usingagenetic algorithm. This paper

selects a genetic algorithm approach so that a directed random search of the key space

can occur. The paper begins with a review of genetic algorithms, covering the genetic

algorithm life cycle, and the systems needed to apply a genetic algorithm. The systems

are consideredto be a key representation, a mating scheme, and a mutation scheme. The

selected systems are then discussed. Thekey representation is an ordered list of alphabet

characters,wherethefirstcharacterinthelist istheplaintextcharacterforthemostfrequent

ciphertext character, the second character is the plaintext character for the second most

frequentciphertextcharacter,and so on. The matingschemerandomlyselectstwokeys for

mating,weightedin favorofkeyswith ahigh fitnessvalue. The twoparentkeyscreatetwo

childkeysbyselecting fromeach slotintheparentsthemorefrequentciphertext character.

Thefirstchildkey iscreatedbyscanning left to right,andthe secondbyscanning right to

left. Ifthemorefrequentcharacteris already inthe child, then thecharacterfromtheother

parent is selected. The mutation scheme consists ofswapping two characters in the key

string. Thefitness function uses unigrams anddigrams, andis asfollows:

26 26

Fitness = (1

-J]{|5F[i]

-DF[i\\ +

Y^\SDF[i,j]

-DDF[i,j]|}/4)8

(4.1.1)

i=i i=i

Thefunctionrepresents anormalizederror value. Largerfitnessvaluesrepresentsmaller

errors. Sensitivity to small differences is increased by raisingthe resultto the 8th power,

while sensitivityto large differences is decreased by thedivision ofthe summationterms

by 4. SF[i] is the standard frequency of character i in English plaintext, while DF[i] is

the measured frequency ofthe decoded characteri in the ciphertext. This makes the first

summationtermequal to thesum ofthedifferences betweeneach character's standardfre

quency andthe frequency ofthe ciphertextcharacterthatdecodes to thatcharacter. SDF

is thestandardfrequency foradigram andDDFthe measured frequency. Whenthemea

(37)

fitnessvalueequal one. Thisfitnessfunction is bounded belowbyzero, butcannotevaluate

tozero. Since high fitnessvalues aremoreimportant,thisdoes not effectthe overall result.

Thecomplete algorithm used in thisworkis:

1. Generatearandompopulationofkeystrings

2. Determine afitnessvalue foreachkey inthepopulation

3. Doabiased random selection of parents

4. Applythecrossover operation to theparents

5. Applythe mutationprocessto thechildren

6. Scanthenew population ofkeystrings andupdatethe listofthe 10"best"keysseen

Theprocessstops after afixed number ofgenerations, at whichpoint, thebest keysare

usedtodeciphertheciphertext. Resultsare givenfor 100generation runs ofthe algorithm,

using populations of 10 and50 keys, and mutation rates of 0.05, 0.10, 0.20, and 0.40. In

the smaller population size, the exactkey is found aftersearching less than 4x10 23 of

thekey space, or afterthe examination of about 1000 keys. In the larger population size,

only about20 generationsare needed to examine 1000keys. The best mutation rates for

thisexperiment were thoselessthan0.20.

4.1.1 Comments

Overall, thispaperis a good choice forre-implementation and comparison. It is an early

work, sodoesnotcompareits results to others,butprovidesadecent level ofresult report

ing and discussion, makingcomparison possible. It presents the basic genetic algorithm

conceptsclearly, anduses examplesanddiagrams to clarify importantpoints.

Typicalresults:

- The

(38)

- Visual inspection

ofthe plaintextis required mostofthe time todetermine the

exact key

Re-implementationapproach:

- The

sameparameters are re-used

- Additional

parameters areusedtoexpandtherangeof parameter values

Observedcryptanalytic usefulness:

- None

-noexactkeys found intesting

Commenton resultdiscrepancy:

- Differentmeasures ofsuccesswere used

4.2 Transposition: Matthews - 1993

The second paper published in 1993 [20], by R. A. J. Matthews, uses an order-based ge

netic algorithmto attack asimpletranspositioncipher. Thepaperbeginsby discussingthe

archetypal genetic algorithm. This includes the typical genetic algorithm components, as

well as the two characteristics a problem must have in orderfor it to be attackable by a

genetic algorithm. Thefirstcharacteristicisthatapartially-correctchromosomemusthave

ahigherfitnessthananincorrectchromosome. Thesecond characteristicisanencoding for

theproblem thatallows geneticoperators to work. The next sectionofthepaperdiscusses

the cryptanalytic use ofthegenetic algorithm. Itconcludesthatsubstitution,transposition,

permutation, and otherciphers areattackablebyagenetic algorithm,while ciphers such as

DES(DataEncryption Standard),which is aFeistelcipher, are not attackable.

Thecipher selectedforattack in thiswork isaclassical two-stagetransposition cipher.

Here, a key of length N takes the form of a permutation ofthe integers 1 to N. The

plaintext, L characterslong, iswrittenbeneaththekeyto forma matrix TVcharacters wide

(39)

thecolumnsintheorderindicatedbytheintegers in thekey. Decryptionconsists ofwriting

down the key, determining the length ofthecolumns, writing the ciphertext back intothe

matrixinsequence, andreadingacrossthecolumns. Breakingthiscipheristypically a

two-stage process as well. The stages are determining the lengthofthe transposition sequence

(TV), and thenfindingthe permutation oftheTV integers.

The genetic algorithmsystem used toattackthe transposition cipheris knownas

GEN-ALYST. It incorporates allthestandardgenetic algorithmfeatures, withthe addition ofthe

items needed foran order-based approach. The GENALYSTsystem usesa fitness scoring

system basedon the number oftimes commonEnglish digrams andtrigramsappearinthe

decrypted text. Ten digramandtrigramswere used

-TH, HE, IN, ER, AN, ED, THE, ING,

AND, andEEE. The presence ofEEE resultedin a subtraction offitnesspoints, as itprac

tically never appears in English. This process was checkedforcorrelation betweenfitness

valueand correctness ofchromosome, toimproveconfidencein GENALYST. GENALYST

selectsthefittestchromosomesusinga roulette wheel approachbasedon normalizedfitness

values. This means that the probability of a chromosomegetting selected is proportional

to its normalized fitness value.

"Elitism"

is present to ensure that the fittestor most elite

chromosomesfoundsofarare alwaysincluded inthebreedingpool. Afterselectionoccurs,

breeding is done with a proportion of theremaining chromosomes. The proportion used decreases linearly as the search continues, in order to help optimize the effectiveness of GENALYST. Breeding is done usingtheposition based crossover method fororder-based

genetic algorithms. In this method, a randombit string ofthe samelength as the chromo

somesis used. The randombit string indicateswhich elements to takefrom parent#1 bya

1 inthe string, andtheelementsomitted fromparent#1 are addedintheorderthey appear

inparent#2. The second childisproducedbytakingelements fromparent#2 indicatedby

a0 inthe string, andfilling in fromparent#1 asbefore. Twopossible mutation types may

then be applied. The firsttype isa simpletwo-point mutation wheretworandom chromo

some elements are swapped. The secondtypeis ashufflemutation, wherethepermutation

(40)

A testprocedure isgiven toensurethat test textshave atrue fitness value thatis repre

sentativeofEnglish text. This is calculated mathematically, based on digram/trigram per

centagefrequency in thetext,the fitnessscore givenby GENALYSTto thatdi- ortri-gram,

andthe summation over all di-and tri-gramschecked. The textlength is also important, as

it is easierto attacktexts with a lengththatis an integermultiple ofthekey length. Based

ontheseconsiderations,atesttextwith alengthof 181 characters was selected. This length

isprime, butclose tomultiples of6, 9, 10, and 12. This is a worst-case scenario since not

all column lengths arethe same, and multiple possibilities forthe key length are present.

A MonteCarlo search is run on the same text so that the genetic algorithm results can be

compared to the results from a traditional search technique. In a Monte Carlo search, a

series ofguesses is made to determine the correct key, and the most elite chromosome is

kept.

Thebreaking ofatranspositioncipherhastwoparts: findingthecorrectkeylength, and

finding thecorrect permutation ofthe key. GENALYSTwas tested on both parts. Forthe

first part, the test text was encrypted using a target key. GENALYST was then used to

find the best decryption ofthe text, using seven different keylengths in the range of 6 to

12, which includes the length of the targetkey. Thegoal of this section was tosee ifthe

fittestoftheseven chromosomeshadthesame lengthasthe targetkey. Thepopulation size

used was 20, and the number ofgenerations was 25. The probability ofcrossover started

at0.8 anddecreasedto0.5. Mutation probability started at0.1 for bothpoint mutation and

shufflemutation, increasingto 0.5 forpoint mutation and to 0.8 forshuffle mutation. The

experiment was runfivetimeseach forthree targetkeylengths, K = 7,

9,11. GENALYST

was successful in 13 ofthe 15 runs (87%). However, the random Monte Carlo algorithm

performed equally well in this task, alsoproducinga success rate of87%. Forthesecond

part, the sameparameters of population size and number ofgenerations were used. GEN

ALYSTmanaged tocompletely break thecipherin one ofthe runsfor K = 7 and one of

the runs for K = 9.

Overall, GENALYST involves little more computational effort than

(41)

Therefore, a genetic algorithm approach is useful even when only a small advantage is

produced. Using alarger gene pool and number of generationsboosts GENALYST's per

formance significantly overtheperformance ofMonte Carlo. Thegreatest improvementin

GENALYST's performance comesfrom hybridizing the algorithm with other techniques,

such as perming.

4.2.1 Comments

Overall, this is one ofthe best reference papers available. It is an early work, but is very

complete. Itcovers background material and problem considerations well, and has a very

balanced approachtoconsideringthegenetic algorithm as apossible solution. Theparam

eters are clearly reported, as are the algorithms used. This is an excellent candidate for

re-implementation andcomparison.

Typicalresults:

-Keylength correctly determined 87%ofthe time

- Correct

keyfound 13.33%ofthe time

Re-implementationapproach:

- Differenttextlengths

used, sameparameters re-used

- Additional parameters are usedtoexpandthe

range ofparameter values

Observedcryptanalytic usefulness:

- Low- low decryption

percentage

Commentonresultdiscrepancy:

-Percentages reported are based on number of tests and different numbers of

(42)

4.3

Merkle-Hellman

Knapsack: Spillman -

1993

The third paper published in 1993, by R. Spillman [25], applies a genetic algorithm ap

proach to aMerkle-Hellman knapsack system. This paperconsiders thegenetic algorithm

assimply anotherpossibility forthecryptanalysis ofknapsackciphers. Aknapsackcipher

is basedon theNP-complete problem ofknapsack packing, which is an application ofthe

subset sumproblem. Theproblem statement is: "Given n objects each with aknown vol

ume and aknapsack offixed volume, is there a subset ofthe n objects which exactly fills

theknapsack ?". The idea behind the knapsack cipheris that findingthe sum is hard, and

this fact can be used toprotect an encoding ofinformation. There must also be a decod

ingalgorithmthatis relatively easy for theintended recipientto apply. Aneasy decoding

algorithm can be made by using the easy knapsacks. An easy knapsackconsists of a se

quence of numbersthat issuperincreasing, i.e., each numberis greaterthan the sum ofthe

previous numbers. However, aneasy knapsack does not protectthe data ifthe elements of

the knapsack areknown. A Merkle-Hellman knapsack system converts an easy knapsack

into a trapdoorknapsack. A trapdoor knapsack is not superincreasing, making it harder

toattack. The Merkle-Hellmanprocedure tocreate atrapdoorknapsacksequence A is as

follows:

1. Givena simpleknapsacksequence A'

=

(a\, . . .

,a'n)

2. Selectan integeru > 2a'n

3. Selectanotherintegerw such thatgcd(u,w) 1

4. Find theinverseof w mod u

5. Constructthe trapdoorsequence A =

wA'

mod u

Once the easy knapsack has been converted into a trapdoor knapsack, both the easy

knapsack andthevalue ofw~l

must be known to decodethe data. Thetarget sum forthe

(43)

w_1

togetherand thenreducing it modulo u. This system canbecomeapublic-keycipher,

with the trapdoorsequencethe public key. The private key would be the superincreasing

sequenceandthevalues ofu, w, andw"1.

Thenext section ofthepaperintroducesgenetic algorithmsusingbinarychromosomes.

The applicable processes ofselection, mating, and mutation are described. Selection is a

random process which is biased so thatthe chromosomes with higher fitness values are

more likely to be selected. The matingprocess is a simple crossover operation, i.e., bits

s + 1 to rof parent#1 are swapped withbits s + 1 tor of parent#2. Mutationconsists of

switchingabit inthechromosometoitscomplement,i.e., a0 becomesa 1.

Thethree systemsthatneedto be defined fora genetic algorithm application are: rep

resentation scheme, matingprocess, and mutation process. The representation used here

isthatofbinary chromosomes,which representthesummation termsoftheknapsack, i.e.,

a 1 in the chromosome indicates that that term should be included when calculating the

knapsack sum. The evaluation function is determined once a representation scheme has

been selected. Here, thefunction measures howclose a given sum ofterms isto the target

sum oftheknapsack. Otherpropertiesincluded forthisexperiment are:

Function rangebetween 0and 1, with 1 indicating an exact match

Chromosomesthatproduceasum greaterthan the targetshouldhavealowerfitness,

ingeneral, than those thatproduce a sumlessthan the target

It shouldbe difficulttoproduce ahigh fitnessvalue

Theactualevaluationfunctionuseddependsontherelationship betweenthe target sum

andthechromosome sum.The function isdeterminedasfollows:

1. Calculatethe maximum differencethatcould occurbetween achromosome andthe

targetsum

MaxDiff =

m&x(Target,FullSum

-Target) (4.3.1)

(44)

2. Determinethe Sumofthecurrent chromosome

3. IfSum < Targetthen

Fitness= 1 \J\Sum Target\/Target (4.3.2)

4. IfSum > Target then

Fitness = 1 - {/\Sum

-Target]/MaxDiff (4.3.3)

As described above, the mating process is simple crossover. The mutation process

consists ofthree different possibilities. Abouthalfthe time, bitsare randomly mutated as

described above (switched to theircomplement). Next most likely is the swapping of a

bitwith a neighboring bit. The final possibility is a reversal ofthe bit order between two

points.

A description of the complete algorithm is given, as well as results for one 15-item

knapsack appliedto a5 charactertext. The charactersare encoded as 8 bit ASCIIcharac

ters. Eachofthe5 sums (onepercharacter) wasattacked usingthegenetic algorithm,with

an initial populationof20random 15-bitbinarystrings.

4.3.1 Comments

Thispaperisreasonablefor its length (about 10pages) and year. Aminimallevelofexperi

mentationis included buttheresultsandparameters aredecentlyreported. Theintroductory

material anddiscussion are reasonable, although short. Re-implementationshould not be

difficult, andtherearesufficient resultstomakea comparison possible. Overall,thispaper

isareasonable choiceforre-implementationandcomparison.

Typical results:

- Used 5 ASCII

characters,each ofwhich wasdecrypted

(45)

-Notre-implemented

Observedcryptanalyticusefulness:

- None

-theknapsacksize istrivial

Commenton resultdiscrepancy:

- No

comment, notre-implemented

4.4

Substitution/Permutation:

Clark - 1994

The only paper published in 1994, by AndrewClark [4], includes the genetic algorithm as

one ofthreeoptimizationalgorithmsappliedtocryptanalysis. Theothertwoalgorithms are

tabusearch and simulatedannealing. Thefirstsection ofthepaperdescribesthe twociphers

used

-simple substitution and permutation. Next, suitabilityassessmentis discussed. This

section is wherethe fitness functionsaredeveloped. Forthesubstitutioncipher, the fitness

function is

Fk =

(a|SF[t]

-DF[t\\+P^2J^\SDF[i\[i)

-DDF[t\\j]\) (4-4.1)

A denotes the set of characters in the alphabet (A, B,C,...,Z,space). SF[i] is the

relative frequency of characteri in English, while DF[i] is the relative frequency ofthe

decoded characteri in the message decrypted usingthekey k k. SDF[i\[j] is the relative

frequency ofthe digram ij in English, while DDF[i][j] is the relative frequency of that

digram in the message decrypted using k. Varying a and /?allows one tofavor eitherthe

unigram or digram frequencies. For the permutation cipher, the same approach was used

as in [20]

-the scoring tablefor digrams andtrigrams.

The next three sections introduce the basic concepts of simulated annealing, genetic

algorithms, and tabu search. The genetic algorithm section is fairly standard, and con

(46)

specifics aboutthealgorithm used. The final sectionin thepaperdiscusses theresults. It is

fairly short, anddoes not givemany details.

4.4.1 Comments

Overall, thispaper is too short (5 pages) tobe useful. It is well written, but lacks details.

Thetinygeneticalgorithm section andfewreported resultsmake it difficulttore-implement

or even tocompare against. This paperis nota good candidatefor furtheruse.

Typicalresults:

- Convergence

ratesandcomputation timearethe only reported results

Re-implementationapproach:

- Parameterswere not reported

- Standard

parameter values usedin other re-implementations were used

-Only the permutation attack was re-implemented as alater workfromthis au

thorwas usedforthe substitution attack

Observedcryptanalytic usefulness:

- Medium- the

correctkey wasfoundwith ahigh successrate

Comment on resultdiscrepancy:

- Noreal discrepancy, differentcombination of metrics used

4.5 Vernam:

Lin,

Kao

- 1995

This is the only paperpublished in 1995. By Feng-Tse Lin and Cheng-Yan Kao, [18] is a

ciphertext-only attack on aVernamcipher. Thepaperbegins by introducing cryptography

(47)

itself. The Vernam cipher is a one-time pad style system. Let M =

m\,m2, ...

,mn

denote a plaintextbit stream andK =

ky, k2, . . .

,kr akey bitstream. The Vernamcipher

generates a ciphertext bit stream C =

Ek(M) =

cuc2, ,cn, where a

= (m{ +

ki)

modp andp is a base. Since this is a one-key (symmetric orprivate-key) cryptosystem,

the decryption formula has the same key bit stream K, only M Dk(C) =

my, m2, . . .,

wherem{ = (ct

kt)mod p. Theproposed approachistodetermineK fromanintercepted

ciphertext C, anduseitto break thecipher.

Sincethedescribedcryptosystemis nottheVernam cipher,andtherearemanyerrorsin

this paper,both in notation and logic, this applicationneeds nofurtherdiscussion. The ge

neticalgorithm approach is rendered invalidby theerrors presentin thereported approach

and assumptions.

4.5.1 Comments

This paper is of below average quality. It is rather short, and was written forthe IEEE

International Conference on Systems, Man, and Cybernetics. There are some problems

with Englishgrammar,possibly dueto translationfromthe originallanguage. Thesystems

are adequately described, but are, however incorrect. The cipher given inthispaper is not

<

Figure

Figure 4.3: Knapsack Cipher Family and Attacking Papers
Figure 6. 1 : Classical Substitution Attack
Figure 6.2: Classical Vigenere Attack
Figure 6.3: Classical Permutation Attack
+7

References

Related documents

Therefore, field-based research trials were initiated to find atrazine alternatives were conducted in 2017 and 2018 in Fayetteville, Arkansas, by testing the tolerance of corn

This essay asserts that to effectively degrade and ultimately destroy the Islamic State of Iraq and Syria (ISIS), and to topple the Bashar al-Assad’s regime, the international

Since the amount of dye injected in the experimental procedures was at least half the standard amount used, the results may have clinical application in those infants where there is

[2, 4]. TB is prevalent and a serious national health problem in Pakistan. The smear microscopy is rapid and an inexpensive method. However, it lacks sensitivity and

In the British literature we identified only two publica- tions despite finding some evidence of local attempts to look at quality issues; the Society of Health Education and

Hence, we evaluated the expression levels of three main genes of NFκB family including NFκB1 , NFκB2 and RELA in patients with pterygium and

sulla base della nostra esperien- za di cinque anni la presenza di un medico (spesso come co-conduttore) è molto utile per discutere ad esempio dei sintomi

Total particulate emissions data (ginning and emission rates and corresponding emission factors) for the master trash systems are shown in.