4.2 General Aspects of Computer Infection Programs

4.2.5 Numerical Data and Indices

Statistics and numerical data concerning viruses are difficult both to find and to verify. Antivirus software publishers, who receive a huge number of reports about infections and malware attacks from their customers (data about the number of infections, types of viruses, infected files), are not inclined to reveal or publish this information. They publish the latest news about viruses (some information about the viruses released during the past month, and sometimes monthly statistics), but they never provide the underlying data that would enable thorough long-term analyses. Moreover, given the commercial stakes in this field, each publisher tends to use different parameters in its analyses, which makes comparisons difficult. In particular, each publisher has adopted a different malware naming convention, which makes analysis far more complex.[8]

Similarly, it is difficult to get a good idea of the number of infections and of their variants. The list of existing viruses varies from publisher to publisher, and the available figures may differ greatly. After analysis of the most reliable data available (which have been cross-checked and compared with the results of independent surveys and with the author's own virus database), the following figures can be considered:

• the total number of known viruses (including their variants) reached roughly 70,000 in January 2002. This figure had probably increased to more than 80,000 by January 2005;

• each month, between 800 and 1200 new viruses are discovered;

• in January 2002, computer infections were divided into the categories whose distribution is given in Figure 4.2 (note that the viruses classified under “miscellaneous” include all the other types of viruses and worms).

Another interesting aspect is measuring the impact of a computer infection. To the author's knowledge, there is no scale that allows us to assess the whole gamut of dangers that computer malware can cause, and to rank them from the most dangerous to the most innocuous. To fill this gap, we have defined and propose several indices designed to assess viral risk.

Fig. 4.2. Distribution of Malware (January 2002)

Definition 37 (Virulence)

The infectious index $I_0^v$ for a virus $v$ is a measure of the a priori risk. It is defined by

\[
I_0^v = \frac{\text{Number of files that are susceptible to being infected by } v}{\text{Total number of files in the system}}.
\]

The infection index $I_1^v$ for a virus $v$ is a measure of the a posteriori risk. It is defined by

\[
I_1^v = \frac{\text{Number of files infected by } v}{\text{Number of files that are susceptible to being infected by } v}.
\]

The virulence $V^v$ of a virus $v$ is then given by:

\[
V^v = I_0^v \times I_1^v = \frac{\text{Number of files infected by } v}{\text{Total number of files in the system}}.
\]

[8] Very recently (December 2004) the US Computer and Emergency Response Team (CERT) launched a project called Common Malware Enumeration (CME) which aims at normalizing malware naming

As for worms, the previous indices are defined on the basis of infected computers (regardless of the files).

All the indices $I_0^v$, $I_1^v$ and $V^v$ range from 0 to 1. The notion of a file that is susceptible to infection depends heavily on the virus considered. The total number of files only takes into account either executable files (for any type of executable virus) or documents (for document viruses). Likewise, for the total number of computers, we consider only computers running operating systems targeted by the worm. The purpose is to compare like with like: if one wishes to measure the infective power of a worm in Windows environments, it makes no sense to count computers running Unix.
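The three indices of Definition 37 can be computed directly from three counts. The following is a minimal sketch, not the author's tooling: the `InfectionCounts` container, the function names, and the convention of returning 0 when a denominator is zero are all assumptions made for illustration. For a worm, the same code applies with computers in place of files.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class InfectionCounts:
    """Counts for one virus v on one system (hypothetical container)."""

    total: int        # files of the relevant kind (executables or documents)
    susceptible: int  # files that v is able to infect
    infected: int     # files actually infected by v

    def __post_init__(self) -> None:
        # The definitions only make sense for nested, non-negative counts.
        if not 0 <= self.infected <= self.susceptible <= self.total:
            raise ValueError("expected 0 <= infected <= susceptible <= total")


def infectious_index(c: InfectionCounts) -> float:
    """A priori index I_0^v = susceptible / total (0 if total is 0: assumption)."""
    return c.susceptible / c.total if c.total else 0.0


def infection_index(c: InfectionCounts) -> float:
    """A posteriori index I_1^v = infected / susceptible (0 if none susceptible: assumption)."""
    return c.infected / c.susceptible if c.susceptible else 0.0


def virulence(c: InfectionCounts) -> float:
    """Virulence V^v = I_0^v * I_1^v, which equals infected / total."""
    return infectious_index(c) * infection_index(c)


if __name__ == "__main__":
    # Made-up figures: 2000 executables, 500 of them infectable, 100 infected.
    counts = InfectionCounts(total=2000, susceptible=500, infected=100)
    print(infectious_index(counts), infection_index(counts), virulence(counts))
```

With these made-up counts, the a priori index is 500/2000, the a posteriori index is 100/500, and their product reduces to 100/2000, which matches the closed form in the definition.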

Readers will notice that these indicators consider only the infection risk, regardless of the risk inherent in the final payload itself. These indices are rather easy to establish when it comes to viruses. Indeed, an analysis of all the files contained in a computer is sufficient to obtain, for instance, the number of files that are susceptible to infection, the total number of files on the system, or the number of infected files. When it comes to worms, getting accurate data is far more difficult. For instance, in the case of the Codered worm, no data were made public about the proportion of IIS Web servers that were still unpatched when the worm attacked. Similarly, accurate figures are not available concerning the total number of servers or computers used worldwide. Nevertheless, the above-mentioned indices allow us to better understand the risk from worms. Note that our purpose is to measure the relative risk inherent in a given infection, regardless of the potential action of antivirus software.

As illustrative examples, a worm like the Codered worm (a simple worm or I-worm) had a virulence close to 1, as shown by Equation (4.1) in Section 4.5.2. The Sapphire/Slammer worm, whose aggressive scanning caused some local shutdowns on the Internet which in turn limited the spread of the worm itself, had a virulence lower than that of the Codered worm, despite the fact that they both belong to the I-worm class. An email worm (also called mass-mailing worm) will very often have a virulence lower than that of any I-worm. Indeed, the proportion of hosts infected by this kind of worm remains relatively low.[9] The increased vigilance of users with respect to email attachments that are likely to contain malware tends to limit the risk. The situation is quite the opposite as far as software security flaws are concerned: most users, and even many system or network administrators, are not aware of dangerous security flaws in their systems. The virulence index thus provides a useful, though still approximate, way to rank viral hazard across the different malware classes.

[9] Even if some recent attacks, like those conducted by the MyDoom worm or the Netsky

Considering, on the one hand, the lessons learned from experiments and observations and, on the other hand, the parallel between biological and computer viruses drawn in Section 4.2.4, the following empirical definition can be laid down:

Proposition 14 The level of detectability of a computer infection program is inversely proportional to the length of the incubation period and proportional to the number of infections which occur in a system. In other words:

\[
\text{Detectability} = C \times \frac{\text{Number of viral copies}}{T_{\text{incubation}}},
\]

where $0 \le C \le 1$.

We consider here a complete incubation period. In other words, no antiviral alert has been triggered. The longer the incubation period, the lower the risk of the virus being detected. Conversely, the more copies the virus makes, the greater the risk of it being detected. The constant C covers a wide range of parameters: it reflects, for instance, mistakes made by the virus developer (presence of bugs and so on), the use of anti-antiviral techniques, the kind of final payload, etc. As a general rule, this empirical measure gives a fairly accurate picture of reality. This rule also has the advantage of measuring, though empirically, the global effects of the virus (the final payload is taken into account).

One can say roughly that the level of risk for a given system (e.g. a computer) to be infected by a given computer infection program varies inversely with the level of detectability of this infection.
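The relation in Proposition 14 can be sketched as a small function. This is an illustration only: the function name, the choice of days as the time unit, and all numerical values below are assumptions, and the text gives no procedure for estimating C. Since the result is not bounded by 1, it is best read as a relative score for comparing infections measured in the same units.

```python
def detectability(viral_copies: int, incubation_time: float, c: float) -> float:
    """Empirical detectability = C * (number of viral copies) / T_incubation.

    incubation_time is in arbitrary but consistent units (days here: assumption).
    """
    if not 0.0 <= c <= 1.0:
        raise ValueError("C must lie in [0, 1]")
    if viral_copies < 0:
        raise ValueError("number of viral copies cannot be negative")
    if incubation_time <= 0:
        raise ValueError("incubation time must be positive")
    return c * viral_copies / incubation_time


if __name__ == "__main__":
    # Hypothetical comparison with the same C for both programs.
    slow = detectability(viral_copies=10, incubation_time=90.0, c=0.5)
    fast = detectability(viral_copies=1000, incubation_time=2.0, c=0.5)
    print(f"slow spreader: {slow:.4f}, fast spreader: {fast:.1f}")
```

In this made-up comparison, the program that makes few copies over a long incubation period scores far lower than the one that replicates heavily in two days, in line with the qualitative statement above.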