3.2 Fred Cohen’s Formalization
3.2.3 Study and Basic Properties of Viral Sets
We now present Fred Cohen’s main theoretical results on the properties of viral sets. The proofs are omitted here so as not to frighten the non-mathematical reader, although they are essential for grasping the concepts involved. The interested reader will find them in [34, section 2.5] and [35]. We strongly recommend reading these two reference works if you wish to acquire deeper knowledge in the field of computer virology.
The first theorem asserts that any union of (a finite number of) viral sets is also a viral set.
Theorem 7
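\[ \forall M \in \mathcal{M},\ \forall U^{*} \subset P(I^{*})\ \bigl[\forall V \in U^{*}\ [(M, V) \in \mathcal{V}]\bigr] \Rightarrow \bigl[(M, \cup U^{*}) \in \mathcal{V}\bigr] \]
Here $P(I^{*})$ denotes the power set of $I^{*}$.¹⁰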
Proof. The proof is left to the reader as an exercise (see the end of the chapter). □
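As a sanity check on Theorem 7, the following minimal Python sketch works in a deliberately simplified model, which is an assumption of this example and not Cohen’s Turing-machine formalism: a machine is reduced to a hypothetical table `evolves` that maps each sequence to the set of sequences it writes when executed, and a non-empty set is deemed viral when each of its members produces at least one member of the set. The names `evolves` and `is_viral` are illustrative only.

```python
def is_viral(evolves, candidate):
    """Toy viral test: every member must produce at least one member of the set."""
    candidate = set(candidate)
    return bool(candidate) and all(
        evolves.get(v, set()) & candidate for v in candidate
    )


# Hypothetical machine: each sequence maps to the sequences it writes when run.
evolves = {
    "a": {"a"},  # a copies itself
    "b": {"c"},  # b and c produce each other
    "c": {"b"},
    "d": {"x"},  # d produces x, and x produces nothing
}

viral_sets = [{"a"}, {"b", "c"}]
assert all(is_viral(evolves, v) for v in viral_sets)

# The union of viral sets is again viral in this model.
union = set().union(*viral_sets)
print(is_viral(evolves, union))       # True
print(is_viral(evolves, {"d", "x"}))  # False: x has no offspring
```

The argument mirrors the exercise: a member of the union belongs to one of the viral sets, so it already has an offspring in that set, hence in the union.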
9 The transitive closure of a binary relation $R$ on a set $E$ (let us recall that such a relation can be described or defined by a subset of the Cartesian product of $E$ with itself) is the minimal transitive relation $R^{+}$ on $E$ that contains $R$. To be more precise, for any elements $x$ and $y$ of $E$, if $x R^{+} y$ then either $x R y$ or there exists $z \in E$ such that $x R^{+} z$ and $z R^{+} y$.
10 If $E$ denotes a set, $P(E)$ denotes the set of all subsets of $E$ (also called the power set). The reader will prove that $P(E)$ has cardinal number $2^{|E|}$ (hint: use the characteristic function).
This theorem has a rather strong consequence: if we consider two viral sets $V_1$ and $V_2$, any virus $v_2 \in V_2$ may evolve from a virus $v_1 \in V_1$. Moreover, this theorem makes it possible to prove the next proposition.
Proposition 8 (Largest viral set)
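\[
\begin{aligned}
\forall M \in \mathcal{M}\ \bigl[\, &[\exists V \subset I^{*}\ [(M, V) \in \mathcal{V}]] \Rightarrow [\exists U \subset I^{*} \text{ such that} \\
&\text{1. } [(M, U) \in \mathcal{V}] \text{ and} \\
&\text{2. } [\forall V \subset I^{*}\ [[(M, V) \in \mathcal{V}] \Rightarrow [\forall v \in V\ [v \in U]]]]\,]\,\bigr]
\end{aligned}
\]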
The set $U$ is called the largest viral set with respect to $M$ and is denoted $LVS(M)$.
Proof. Hint: the first item is easily proved by means of Theorem 7, while the second is proved by contradiction: assuming that item 2 is false leads to the conclusion that both $v \in U$ and $v \notin U$. □

The notion of “largest viral set”¹¹ thus enables us to consider all the viruses $v'$ that have evolved from a given virus $v$. In other words, $v \Rightarrow_M v'$ implies $v' \in LVS(M)$. Let us notice that $LVS(M)$ is the union of all viral sets with respect to $M$.
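Because $LVS(M)$ is the union of all viral sets, it can be computed on a finite example by pruning. The sketch below rests on a simplifying assumption that is not part of Cohen’s formalism: the machine is replaced by a hypothetical finite table `evolves` giving, for each sequence, the set of sequences it writes when executed, and a set counts as viral when every member has an offspring inside the set. The function name `largest_viral_set` is illustrative.

```python
def largest_viral_set(evolves, universe):
    """Greatest subset of `universe` in which every member produces a member.

    Toy model (an assumption of this sketch): `evolves[v]` is the set of
    sequences that v writes when executed. An empty result means that no
    viral set exists inside `universe`.
    """
    current = set(universe)
    while True:
        # Keep only the sequences that still have offspring inside the set.
        survivors = {v for v in current if evolves.get(v, set()) & current}
        if survivors == current:
            return current
        current = survivors


# Hypothetical finite machine used only for illustration.
evolves = {"a": {"a"}, "b": {"c"}, "c": {"b"}, "d": {"x"}}
universe = {"a", "b", "c", "d", "x"}

print(sorted(largest_viral_set(evolves, universe)))  # ['a', 'b', 'c']
```

No viral set is ever damaged by the pruning, since each of its members keeps an offspring inside it; the fixed point therefore contains every viral set. Here `x` is removed first because it produces nothing, and `d` follows because its only offspring has disappeared.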
Conversely, the notion of largest viral set suggests that of a smallest viral set with respect to a machine $M$. This supposes that there exists a non-empty viral set none of whose proper subsets is a viral set.
Definition 22 (Smallest viral set)
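A smallest viral set with respect to $M \in \mathcal{M}$, denoted $SVS(M)$, is defined by
\[
\begin{aligned}
\forall M \in \mathcal{M}\ \bigl[\forall V \subset I^{*}\ [(M, V) \in SVS(M)] \Leftrightarrow\ &\text{1. } [(M, V) \in \mathcal{V}] \text{ and} \\
&\text{2. } [\nexists U \subset V \text{ such that } [(M, U) \in \mathcal{V}]]\bigr]
\end{aligned}
\]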
It is obvious from the previous remarks that there may be many $SVS(M)$ for a given machine $M$. In fact, the viral property is defined over the subset lattice of $I^{*}$, that is to say the set of subsets of $I^{*}$ partially ordered by the inclusion relation; the existence of several non-empty smallest subsets is thus quite natural. In particular, it is reasonable to ask whether $SVS(M)$, with respect to a given machine $M$, may contain only one element, in other words, whether it may be a singleton. Indeed, under the partial ordering defined by subset inclusion, the smallest non-empty sets are precisely the singletons. The next theorem answers that question.

11 By largest, it is meant with respect to the partial ordering defined with respect to the
Theorem 8 There exists a machine $M \in \mathcal{M}$ with respect to which $SVS(M)$ is a singleton. In other words,
\[ \exists M \in \mathcal{M}\ \bigl[\exists V \subset I^{*} \text{ such that } [(M, V) \in SVS(M)] \text{ and } [|V| = 1]\bigr] \]
The singleton smallest viral set (singleton viral set for short) describes the practical case of simple, non-polymorphic viruses (they do not evolve). This case is the most frequent one, and also the one best known to the general public. As an example, Fred Cohen gave in his seminal thesis [34, pp 94-95] a simulation of such a machine, which has a singleton as smallest viral set (see the section devoted to the study projects at the end of the chapter).
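The coexistence of several smallest viral sets, one of them a singleton, can be observed on a small example. The following sketch is not Cohen’s simulation: it assumes a toy model in which a machine is a hypothetical table `evolves` from each sequence to the set of sequences it writes, and in which a non-empty set is viral when every member has an offspring inside the set. It then enumerates, by brute force, the viral sets that have no viral proper subset.

```python
from itertools import combinations


def is_viral(evolves, candidate):
    """Toy viral test: every member must produce at least one member of the set."""
    candidate = set(candidate)
    return bool(candidate) and all(
        evolves.get(v, set()) & candidate for v in candidate
    )


def nonempty_subsets(items):
    """Yield every non-empty subset of `items`, smallest sizes first."""
    items = sorted(items)
    for size in range(1, len(items) + 1):
        for combo in combinations(items, size):
            yield set(combo)


def smallest_viral_sets(evolves, universe):
    """Viral subsets of `universe` that contain no viral proper subset."""
    viral = [s for s in nonempty_subsets(universe) if is_viral(evolves, s)]
    return [v for v in viral if not any(u < v for u in viral)]


# Hypothetical machine: a copies itself, b and c alternate, d decays into x.
evolves = {"a": {"a"}, "b": {"c"}, "c": {"b"}, "d": {"x"}}
universe = {"a", "b", "c", "d", "x"}

print([sorted(s) for s in smallest_viral_sets(evolves, universe)])
# [['a'], ['b', 'c']]
```

The singleton `{'a'}` plays the role of a non-polymorphic virus, while `{'b', 'c'}` is a smallest viral set of size two. The enumeration is exponential in the size of the universe and is only meant for tiny examples.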
By reversing the previous theorem (contrapositive approach), it becomes possible to define a virus with respect to a given machine (a computing environment) as any sequence (in the sense of Turing machines) which is able to evolve with respect to that machine.
Corollary 1 For all machines $M \in \mathcal{M}$ and for all $u \in I^{*}$ we have:
\[ \bigl[[u \Rightarrow_M \{u\}] \Rightarrow [(M, \{u\}) \in \mathcal{V}]\bigr] \]
Proof. Hint: use the formal definition given in Figure 3.1. □

In fact, Fred Cohen proved a more general result, by considering finite viral sets which come in all sizes.
Theorem 9 (Smallest viral set of fixed size)
For any integer $i \in \mathbb{N}^{*}$, there exists a machine $M \in \mathcal{M}$ and a set $V \subset I^{*}$ such that
1. $[(M, V) \in SVS(M)]$ and
2. $[|V| = i]$.
This is thus the case where the viruses contained in such a viral set have a limited (bounded) and controlled infective power. In other words, each of these viruses has a fixed number of evolved forms.¹² Fred Cohen also illustrates this particular case by giving the detailed pseudo-code of a machine with respect to which the smallest viral set has size 4 [34, pp 95-97].
12 The reader may refer to [162] for a generalization of that result, which considers viruses
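The idea behind Theorem 9 can be illustrated with a ring of sequences, each evolving into the next. This sketch is not Cohen’s pseudo-code: it assumes a toy model where a machine is a hypothetical table from each sequence to the set of sequences it writes, and the names `ring_machine` and `is_smallest_viral_set` are invented for the example.

```python
from itertools import combinations


def is_viral(evolves, candidate):
    """Toy viral test: every member must produce at least one member of the set."""
    candidate = set(candidate)
    return bool(candidate) and all(
        evolves.get(v, set()) & candidate for v in candidate
    )


def ring_machine(size):
    """Hypothetical machine whose sequences v0 .. v(size-1) evolve in a ring."""
    return {f"v{k}": {f"v{(k + 1) % size}"} for k in range(size)}


def is_smallest_viral_set(evolves, candidate):
    """True if `candidate` is viral and none of its proper subsets is."""
    members = sorted(set(candidate))
    if not is_viral(evolves, members):
        return False
    for size in range(1, len(members)):
        for combo in combinations(members, size):
            if is_viral(evolves, combo):
                return False
    return True


machine = ring_machine(4)
print(is_smallest_viral_set(machine, machine.keys()))  # True
print(is_viral(machine, {"v0", "v1"}))                 # False: v1 evolves into v2
```

Any proper subset of the ring contains a sequence whose successor lies outside it, so the whole ring of size $i$ is a smallest viral set; `ring_machine(1)` gives back the singleton case of Theorem 8.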
More generally, the following theorem establishes the existence, for some Turing machine, of a countably infinite viral set (in other words, a set which is equipotent to the set of natural numbers $\mathbb{N}$).
Theorem 10 (Countably infinite viral set)
There exists a machine $M \in \mathcal{M}$ and a set $V \subset I^{*}$ such that
\[ [(M, V) \in \mathcal{V}] \text{ and } [|V| = |\mathbb{N}|] \]
Proof. The reader will refer to [34, pp 19-20] for the detailed proof of this theorem. Before doing so, the reader is strongly advised to read Section 2.6 of Fred Cohen’s thesis, which defines some essential tools required for that proof (abbreviated tables, which describe in a single statement a large set of states, inputs, outputs, next states and tape movements). Here is a sketch of the proof: consider a viral set in which each element evolves into another element which has one more symbol. This enables one to use the induction principle (in other words, the successor map $n \mapsto n + 1$ which builds the set of natural numbers). Hence the result. □

Fred Cohen also demonstrated this result by giving a practical implementation of such a machine, potentially producing a countably infinite viral set (see [34, pp 99-101]). The previous theorem may seem to be of theoretical interest only. This is definitely not the case: a very important corollary, with strong and fundamental consequences, can be derived from it.
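Corollary 2 Let us consider a machine $M \in \mathcal{M}$ as defined in Theorem 10. There exists a set $W \subset I^{*}$ such that
\[ [|W| = |\mathbb{N}|] \text{ and } \bigl[\forall w \in W\ [\nexists W' \subset W\ [w \Rightarrow_M W']]\bigr] \]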
Proof. The proof derives from that of Theorem 10 when considering a viral set which does not accept any smallest viral set with respect to that machine (see [34, pp 19-20]). □
The machine $M$ as defined in Theorem 10 thus accepts a countably infinite set of sequences which are not viruses (that is to say, these sequences do not satisfy the formal definition of Figure 3.1). As a consequence, there cannot exist a machine in $\mathcal{M}$ that determines whether a pair $(M, V)$ is of viral nature simply by enumerating either all the viruses (case of Theorem 10) or the set of all non-viral sequences with respect to $M$ (case of Corollary 2). We will consider again the extremely important consequences and implications of that corollary in Section 3.2.4.
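The enumeration argument can be made concrete with the construction sketched in the proof of Theorem 10, where each virus evolves into one that is one symbol longer. The following Python sketch is only an analogy, not Cohen’s machine: the evolution step `evolve`, the appended symbol and the seed `"0"` are all hypothetical choices.

```python
from itertools import islice


def evolve(sequence):
    """Hypothetical evolution step: the offspring is the parent plus one symbol."""
    return sequence + "1"


def viral_orbit(seed):
    """Lazily enumerate seed, evolve(seed), evolve(evolve(seed)), and so on."""
    current = seed
    while True:
        yield current
        current = evolve(current)


first = list(islice(viral_orbit("0"), 5))
print(first)  # ['0', '01', '011', '0111', '01111']

# Lengths grow by one at each step, so the n-th element is never repeated:
# the map n -> n-th element is injective and the orbit is countably infinite.
lengths = [len(s) for s in first]
print(lengths)  # [1, 2, 3, 4, 5]
```

The generator never terminates on its own, which is the practical face of the remark above: a procedure that merely lists the members of such a set can never finish, so it cannot serve as a decision procedure.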
Proposition 9 There exists a machine $M \in \mathcal{M}$ for which no sequence of symbols is viral with respect to $M$. In other words,
\[ \exists M \in \mathcal{M}\ \bigl[\nexists V \subset I^{*}\ [(M, V) \in \mathcal{V}]\bigr] \]
Proof. Just consider a machine which always halts without moving its tape head (see [34, page 20]). □
The machines $M$ of Proposition 9 correspond in fact to all environments “computing” or manipulating completely “inert” data, that is, data which definitely does not involve any execution process (text documents such as *.txt files, image files, audio files, and so on). This implies that a pure text document cannot be infected.
Conversely, Theorem 11 states that it is always possible to find a machine with respect to which an arbitrary sequence is a virus.
Theorem 11 For every sequence $v \in I^{*}$, there exists a machine $M \in \mathcal{M}$ such that
\[ [(M, \{v\}) \in \mathcal{V}] \]
In his proof, Fred Cohen effectively built such a machine (see [34, pp 21 and 101-103]). Let us notice that in this case, the machine $M$ is defined such that $SVS(M)$ is a singleton and such that $SVS(M) = LVS(M)$.
To conclude this section devoted to basic properties of viral sets, let us consider the next proposition which complements the two preceding results (Proposition 9 and Theorem 11).
Proposition 10 There exists a machine $M \in \mathcal{M}$ such that for all sequences $v \in I^{*}$ there exists a set $V \subset I^{*}$ such that
\[ \bigl[[v \in V] \text{ and } [(M, V) \in LVS(M)]\bigr] \]
Thus, for this machine, any sequence is a virus. The proof given by Fred Cohen [34, pp 22-23] is a constructive one.
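The three situations covered by Proposition 9, Theorem 11 and Proposition 10 can be contrasted in a few lines of Python. This is a sketch under a strong simplifying assumption, not Cohen’s constructions: a machine is modelled as a hypothetical function returning the set of sequences written when a given sequence is executed, and a non-empty set is viral when each member produces at least one member of the set.

```python
def is_viral(machine, candidate):
    """Toy viral test: every member must produce at least one member of the set."""
    candidate = set(candidate)
    return bool(candidate) and all(machine(v) & candidate for v in candidate)


def inert_machine(sequence):
    """Proposition 9 analogue: halts at once and writes nothing."""
    return set()


def make_dedicated_machine(virus):
    """Theorem 11 analogue: copies `virus` and ignores every other sequence."""
    def machine(sequence):
        return {sequence} if sequence == virus else set()
    return machine


def copying_machine(sequence):
    """Proposition 10 analogue: copies whatever sequence it executes."""
    return {sequence}


samples = ["", "0", "01", "hello"]
dedicated = make_dedicated_machine("hello")

print(any(is_viral(inert_machine, {s}) for s in samples))    # False
print([s for s in samples if is_viral(dedicated, {s})])      # ['hello']
print(all(is_viral(copying_machine, {s}) for s in samples))  # True
```

Whether a sequence is a virus thus depends entirely on the machine that interprets it: the same string `"hello"` is inert for the first machine and viral for the other two.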