MEHARI: A System for Analysing the Use of the Internet Services


Pedro J. Lizcano, Comisión Interministerial de Ciencia y Tecnología (CICYT), Rosario Pino, 14-16, 28020 Madrid, Spain

Arturo Azcorra, Universidad Carlos III de Madrid (UC3M), Butarque 15, 28911 Leganés (Madrid), Spain

Josep Solé-Pareta and Jordi Domingo-Pascual, Universitat Politècnica de Catalunya (UPC), Jordi Girona 1-3, 08034 Barcelona, Spain

Manuel Alvarez-Campana, Universidad Politécnica de Madrid (UPM), ETSI Telecomunicación, Ciudad Universitaria, 28040 Madrid, Spain

Abstract. This paper describes the MEHARI system for monitoring and analysing the traffic on National R&D Networks (NRNs). The main design requirement of the MEHARI system was to cope with the main challenges currently faced by most of these networks: service usage, charging schemes, network dimensioning and auditing procedures according to a given Acceptable Use Policy (AUP). The resulting MEHARI system consists of a low-cost hardware traffic capture platform and several traffic analysis software modules running on top of this platform. These modules can report information on the usage of Internet services, the main traffic origins and destinations, etc. The MEHARI system has been tested in a field trial on the Spanish NRN (RedIRIS). Some relevant data on the performance and efficiency of the system configuration used in the field trial are presented in this paper.

Keywords: Internet Services monitoring, Internet traffic analysis, National R&D Networks, Acceptable Use Policy auditing.

1. Introduction

The full development of the Information Society paradigm has led to the development of new IP infrastructures worldwide, driven by commercial interest and by different government initiatives for strategic reasons. The EU has also played a leading role in this matter with the TEN-34 and TEN-155 initiatives to foster trans-national collaboration among the different national research networks (NRNs). But these deployments face two major problems:

• In most cases, NRNs are funded by public money. Nevertheless, budgets cannot keep up with the rate of traffic growth that users demand. Therefore, a new funding model that leads to a use-responsible approach is needed.

• Market forces, in today's fully competitive scenarios, demand from these publicly funded networks a transparent acceptable use policy (AUP) to ensure that there are no distortions in the emerging Internet-based services business.

To cope with these two open issues, tools specifically tailored for auditing the AUP are required to monitor the actual use of publicly funded networks. The possible collaboration of European or NRN networking initiatives with their leading counterparts in North America (i.e., Internet 2 and Canarie) depends on having these tools available to ensure the coherence of the American and European AUPs concerning the traffic to be exchanged among them.

Furthermore, the above-mentioned full development of the Information Society also complicates the dimensioning, management, and operation of Internet backbones. These tasks cannot be successfully performed without precise and up-to-date knowledge about what is going on in the network. Thus, there are two different dimensions to this problem: one is the traffic perspective, and the other is the purpose usage angle. The former may need content analysis in cases in which application identification by port is not possible, and its results are reasonably objective. The latter definitely needs content analysis, and its results are based on heuristics and subjective techniques.

On the traffic analysis aspect, it is worth mentioning some recent studies on the characterisation of Internet traffic [1,2]. The traffic measurements obtained in these studies have revealed some interesting results about the Internet traffic patterns on IP/ATM backbones [3]. The main conclusion is, however, that Internet traffic is highly variable and unpredictable and the results of traffic analysis are therefore only valid in the short-term.

On the purpose usage aspect, different heuristic techniques may be combined according to the nature of the traffic classification desired. As content analysis is costly in terms of CPU, a balance has to be established between confidence in the objectivity of the results and the processing power required. It is recommended that heuristics be manually checked from time to time in order to make the analysis stricter or more relaxed, depending on the deviations between real traffic and the diagnosis performed by the tool.

This paper describes the design of the MEHARI system, a high-performance traffic processor for analysing the headers and contents of Internet traffic. The system characteristics are modularity, scalability, high performance, low cost, flexibility and adaptability. The MEHARI hardware is based on low-cost PC components, and the application processing may be performed on any UNIX machine. The MEHARI platform consists of PCs with STM-1 cards for traffic capture and processing. The pre-processed traffic is fed from the platform to the MEHARI traffic analysis software modules. The MEHARI capture and processing capabilities can be expanded modularly by increasing the number of hardware elements in the system configuration. The MEHARI analysis functionality can be extended or modified both by configuration and by adding new analysis software modules. Three examples of the type of information that can be obtained from the currently implemented traffic analysis modules are the following: 1) a heuristic classification of the traffic into academic, commercial, leisure and undetermined categories; 2) a classification of the traffic according to its origin and destination; and 3) the verification of the port assignment of applications. The system including these three modules has been field tested, and the results of the trials are also described in this paper, both in terms of functionality and performance.

Data provided by MEHARI may serve different purposes, some of which are:

• Network engineering to adjust capacity to traffic matrix and hourly profiles.

• Characterisation and analysis of user (or user group) traffic.

• Identification of the information services, information servers and client sites available on the network.

• Classification of the network traffic by application and also by purpose usage.

• Validating the appropriate usage policy.

• Billing and charging.

• Detection and identification of security threats and attacks [4].

The rest of the paper is structured as follows. Section 2 presents the functional architecture of the MEHARI system. Section 3 describes the application modules currently supported by the system and the reasoning behind their development. Section 4 provides some sample results of the field trial of the MEHARI system in a real network scenario: the Spanish NRN (RedIRIS). Finally, Section 5 summarises the main conclusions of our work.

2. MEHARI Functional Architecture

Figure 1 shows the functional architecture of the MEHARI system, which consists of the following subsystems:

• Traffic Capture Subsystem (TCSS): This functional block is responsible for capturing the traffic samples, which are subsequently analysed by other blocks of the MEHARI system. The TCSS can be configured to capture ATM cells either in a given list of VPI/VCI pairs or in promiscuous mode (all the VPI/VCI pairs). Cells are reassembled into AAL5 frames, which are periodically dumped to disk for further processing. Depending on the type of analysis to be performed, the user can select either dumping the whole AAL5 frame or just part of it (e.g. the first 48 bytes).


It should be noted that, although the MEHARI system was initially conceived for monitoring IP/ATM traffic, the TCSS does not make any assumption about the content of the AAL5 frame. This way, the system could be used in the future to analyse other protocols encapsulated over ATM. The TCSS has been designed so that it can be physically implemented on one or several capture machines, thus allowing the capture ratio to be increased as required. The capture platforms are standard PCs running FreeBSD. The capture itself is performed by two Fore ATM network interface cards (one for each transmission direction) running special firmware (OC3MON [3]). Because standard hardware components are used, the total cost of the system is quite low (approximately 6,000 euros per capture platform).
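The capture options described above (a VPI/VCI list or promiscuous mode, AAL5 reassembly, and dumping only the first bytes of each frame) can be illustrated with a minimal Python sketch. It is not the OC3MON firmware or the MEHARI code: the function name `reassemble`, the tuple layout of a cell and the sample data are assumptions made for illustration, and the sketch ignores AAL5 padding, trailer and CRC checking.

```python
from collections import defaultdict

SNAP_LEN = 48  # bytes kept per frame in header-only mode (example value from the text)


def reassemble(cells, channels=None, snap_len=None):
    """Rebuild AAL5 frames from (vpi, vci, is_last, payload) tuples.

    channels: set of (vpi, vci) pairs to capture, or None for promiscuous mode.
    snap_len: keep only the first snap_len bytes of each frame, or None for all.
    """
    buffers = defaultdict(bytearray)  # one reassembly buffer per virtual channel
    frames = []
    for vpi, vci, is_last, payload in cells:
        key = (vpi, vci)
        if channels is not None and key not in channels:
            continue  # channel not in the configured VPI/VCI list
        buffers[key] += payload
        if is_last:  # the cell header flags the final cell of an AAL5 frame
            frame = bytes(buffers.pop(key))
            frames.append((key, frame if snap_len is None else frame[:snap_len]))
    return frames


# Hypothetical interleaved cells from two virtual channels.
cells = [
    (0, 32, False, b"A" * 48),
    (0, 33, False, b"X" * 48),
    (0, 32, True, b"B" * 48),
    (0, 33, True, b"Y" * 48),
]
all_frames = reassemble(cells)  # promiscuous, whole frames
filtered = reassemble(cells, channels={(0, 32)}, snap_len=SNAP_LEN)
```

Truncating at reassembly time keeps the capture files small when only the protocol headers are needed by the analysis modules.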

[Figure 1 diagram. Labels: ATM Network, Monitoring Point, ATM Cells, Capture Platform (ATM 0 / ATM 1 interfaces), Traffic Capture Subsystem, Traffic Samples, Self-regulation, Traffic Analysis Subsystem, Preprocessing Module, IP Flows, Application Modules, MEHARI System.]

Figure 1. Functional Architecture of the MEHARI System.

• Traffic Analysis Subsystem (TASS): This functional block is responsible for analysing the traffic samples generated by the TCSS subsystem. The TASS contains the Pre-processing Module (PPM) and the Application Modules (APMs).

Because of the high traffic volume that can be transported on the ATM links that are monitored, the TASS must discard the traffic samples as soon as possible. This is precisely the reason for introducing the PPM, which pre-processes the traffic samples on the fly so that capture files are deleted once the parameters of interest have been extracted.

Note that the rate at which the traffic samples are pre-processed is crucial for the overall performance of the system. In the MEHARI project we concentrated on analysing IP flows (see Section 3.1), which led us to develop a somewhat CPU-intensive pre-processing of the traffic samples. Other applications of the MEHARI system will probably require less processing capacity. In any case, the TASS subsystem was designed so that it can be physically implemented on one or several machines, which allows the processing capacity to be increased as required by adding more PCs or workstations. In addition, a self-regulation mechanism adapts the capture ratio to the TASS processing capacity.

The traffic analysis is actually performed by the APMs. Different types of APMs, corresponding to different types of analysis, can be present in the TASS. Section 3 describes some of the APMs developed for the MEHARI system. It should be pointed out that the MEHARI system has been designed so that new APMs can easily be added. Besides, there is the possibility of using several PCs to perform a distributed analysis, so that the processing capacity can be matched to the requirements of the different APMs considered.
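The paper does not detail how the self-regulation mechanism mentioned in this section works. Purely as an illustration of the idea of adapting the capture ratio to the TASS processing capacity, the following Python sketch lowers the ratio when unprocessed capture files pile up and raises it again when the backlog clears. The function name, the thresholds and the halving policy are all assumptions, not the MEHARI implementation.

```python
def next_capture_ratio(ratio, backlog, low=2, high=8, step=0.1, floor=0.05):
    """Return the capture ratio for the next period.

    ratio:   current fraction of time (or traffic) being captured, 0..1.
    backlog: number of capture files still waiting to be pre-processed.
    All thresholds are hypothetical tuning parameters.
    """
    if backlog > high:   # analysis is falling behind: back off quickly
        return max(floor, ratio / 2)
    if backlog < low:    # analysis keeps up: recover slowly
        return min(1.0, ratio + step)
    return ratio         # within the comfort band: leave unchanged


# Hypothetical backlog observed at the end of six consecutive periods.
ratio = 1.0
history = []
for backlog in [1, 5, 9, 12, 4, 1]:
    ratio = next_capture_ratio(ratio, backlog)
    history.append(ratio)
```

A multiplicative decrease with an additive increase is only one possible policy; its appeal is that it reacts fast to overload while avoiding oscillation when capacity returns.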


3. Application Modules

This section describes the Application Modules (APMs) specifically developed for the MEHARI project. As mentioned in Section 2, these APMs are only an example of the different types of analysis that can be carried out by the MEHARI system. Our main purpose in describing the following APMs is to show the capabilities of the system and to provide a sample of the wide range of possibilities for which it could be considered.

3.1 Traffic Classification Module

The MEHARI project was conceived as the continuation of the traffic measurements and network usage characterisation studies carried out on the Spanish NRN (RedIRIS) in the CASTBA project. The results of CASTBA revealed a number of difficulties in characterising Internet traffic by using conventional traffic analysers. We found that a simple analysis based on protocol and TCP/UDP port numbers did not provide sufficient accuracy [5]. The reason is the existence of applications making use of unregistered ports and of registered ports used for purposes other than those assigned. This resulted in a large percentage of traffic (10-20%) whose nature could not be determined or, what is worse, an undetermined amount of traffic classified under the wrong categories. One of the main motivations of the MEHARI project was precisely to develop an Internet traffic monitoring and analysis tool to resolve these uncertainties. What started as a modest attempt to overcome the limitations of TCP/UDP port traffic analysis turned into the Traffic Classification Module, which has proved to be highly effective for characterising Internet traffic and services.

Figure 2 shows the structure of the MEHARI Traffic Classification Module (TCM). This APM analyses the data coming from the PPM, which is based on the concepts of IP flows and pattern recognition. The PPM performs the statistical aggregation of all the packets belonging to the same IP flow during a configurable period of time. An IP flow is defined as the aggregation of packets sharing the quadruple IP source address, TCP/UDP source port, IP destination address, and TCP/UDP destination port. Furthermore, the PPM explores the contents of the packets to determine the presence of specific patterns, which can be programmed by the user. As a result, the PPM periodically generates a file containing a summary of the IP flows detected and the number of times that each pattern has been detected in each flow.
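The aggregation performed by the PPM can be sketched in a few lines of Python. This is an illustrative sketch only: the names `FlowStats` and `aggregate`, the tuple layout of a packet and the two example patterns are assumptions, and real pre-processing would first parse IP and TCP/UDP headers out of the captured AAL5 frames.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class FlowStats:
    packets: int = 0
    byte_count: int = 0
    symptoms: dict = field(default_factory=lambda: defaultdict(int))


def aggregate(packets, patterns):
    """Aggregate the packets of one reporting period into IP flows.

    packets:  iterable of (src_ip, src_port, dst_ip, dst_port, payload).
    patterns: dict mapping a symptom name to the byte string to search for.
    """
    flows = defaultdict(FlowStats)
    for src_ip, src_port, dst_ip, dst_port, payload in packets:
        stats = flows[(src_ip, src_port, dst_ip, dst_port)]  # the flow quadruple
        stats.packets += 1
        stats.byte_count += len(payload)
        for name, needle in patterns.items():
            stats.symptoms[name] += payload.count(needle)  # non-overlapping hits
    return flows


# Hypothetical user-programmed patterns and sample packets.
patterns = {"edu": b".edu", "mp3": b".mp3"}
packets = [
    ("10.0.0.1", 1025, "192.0.2.7", 80, b"GET /lib.edu/index.html"),
    ("10.0.0.1", 1025, "192.0.2.7", 80, b"GET /song.mp3"),
    ("10.0.0.2", 1026, "192.0.2.7", 80, b"GET /a.mp3 /b.mp3"),
]
flows = aggregate(packets, patterns)
```

Once a packet has been folded into its flow record, the raw capture data are no longer needed, which is what allows the capture files to be deleted early.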

[Figure 2 diagram. Labels: Capture Files (incoming traffic), Capture Files (outgoing traffic), Preprocessing Module (PPM), Data base with patterns, IP Flows + symptoms, Correlation, Biflows + symptoms, Classification, Data base with heuristics, Summary Report Files, Traffic Classification Module (TCM).]

Figure 2. Traffic Classification Module (TCM).

The IP flows are further analysed by the TCM, which performs the correlation of the data corresponding to incoming and outgoing traffic. The result is a collection of bi-directional IP flows (biflows) with a weighted list of symptoms for each direction. The next step consists in applying a set of heuristics, which can easily be configured by the user. The application of the heuristics causes a biflow to be classified under a particular traffic category, which has been previously defined by the user. For example, in the field trial described in Section 4 of this paper, we considered four traffic categories, namely academic, commercial, leisure and undetermined. Finally, the TCM generates a summary file reporting the traffic volume under each traffic category considered. Section 4 shows an example of the traffic classification summary reports that can be obtained with the TCM.
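The correlation and classification steps of the TCM can be sketched as follows. The heuristics table, its weights, the symptom names and the "highest score wins" rule are hypothetical; the paper states only that heuristics are user-configurable and that symptoms are weighted per direction.

```python
from collections import defaultdict

# Hypothetical heuristics: symptom name -> (category, weight).
HEURISTICS = {
    "edu_domain": ("academic", 2.0),
    "shop_cart": ("commercial", 1.5),
    "mp3_file": ("leisure", 1.0),
}


def biflow_key(flow):
    """Direction-independent key, so both directions of a conversation match."""
    src_ip, src_port, dst_ip, dst_port = flow
    return tuple(sorted([(src_ip, src_port), (dst_ip, dst_port)]))


def correlate(incoming, outgoing):
    """Merge per-direction summaries {flow: (nbytes, symptoms)} into biflows."""
    biflows = {}
    for direction, flows in (("in", incoming), ("out", outgoing)):
        for flow, (nbytes, symptoms) in flows.items():
            entry = biflows.setdefault(
                biflow_key(flow), {"bytes": 0, "in": {}, "out": {}}
            )
            entry["bytes"] += nbytes
            for name, count in symptoms.items():
                entry[direction][name] = entry[direction].get(name, 0) + count
    return biflows


def classify(entry):
    """Pick the category with the highest weighted symptom score."""
    scores = defaultdict(float)
    for direction in ("in", "out"):
        for name, count in entry[direction].items():
            if name in HEURISTICS and count > 0:
                category, weight = HEURISTICS[name]
                scores[category] += weight * count
    return max(scores, key=scores.get) if scores else "undetermined"


def report(biflows):
    """Traffic volume in bytes per category, as in the TCM summary file."""
    totals = defaultdict(int)
    for entry in biflows.values():
        totals[classify(entry)] += entry["bytes"]
    return dict(totals)


incoming = {
    ("192.0.2.7", 80, "10.0.0.1", 1025): (9000, {"edu_domain": 3}),
    ("198.51.100.9", 80, "10.0.0.2", 1026): (4000, {"mp3_file": 2}),
}
outgoing = {
    ("10.0.0.1", 1025, "192.0.2.7", 80): (500, {}),
    ("10.0.0.3", 1027, "203.0.113.5", 25): (700, {}),
}
summary = report(correlate(incoming, outgoing))
```

Correlating before classifying matters because a symptom seen in one direction (for example in the server response) also determines the category of the bytes sent in the other direction.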

3.2 Traffic Origin/Destination Analysis Module

The Traffic Origin/Destination Analysis Module (TODM) performs a traffic classification based on the origin/destination Autonomous System (AS) of the bi-directional IP flows identified by the TCM module (see Subsection 3.1). The functional diagram of the TODM module is represented in Figure 3.

[Figure 3 diagram. Labels: Traffic Classification Module (TCM), IP Biflows, Identification of AS, AS Data base, IRR Processor, Official IRR Data Bases, Summary Report Files, Traffic Origin/Destination Analysis Module (TODM).]

Figure 3. Traffic Origin/Destination Analysis Module (TODM).

To determine the AS to which an IP source/destination address belongs, the TODM makes use of the information stored in the Internet Routing Registry (IRR) databases. This information is periodically downloaded by the IRR Processor Block, which generates a local AS database that is used by the AS Identification Block. The AS database includes the name of the organisation responsible for the different ASs as well as the identification of the subnetworks belonging to each AS.

The AS Identification Block analyses the IP addresses by looking them up in the local AS database. As a result, it periodically generates a statistics report file with the traffic volume exchanged between each subnetwork belonging to the network under study (e.g. RedIRIS) and the different ASs registered in the IRR databases. The TODM application module can be used, for example, to determine the fraction of traffic exchanged with a specific academic or commercial network, or to obtain a list of the most visited ASs. Other possible applications of the TODM module could be the following:

• Assessment of the external links usage: The system provides a set of reports showing the use of resources per sub-network of the monitored academic network. This information is essential for future network re-design, billing schemes, etc.

• Billing and charging for the use of network resources: The distribution of traffic according to sub-network or AS allows supporting billing and charging models based on destination where tariffs rely on source and destination IP address or mask, autonomous system number or chain (route).
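The lookup performed by the AS Identification Block, and the sub-network-to-AS traffic matrix it reports, can be sketched in Python with the standard `ipaddress` module. The table contents are invented (documentation prefixes and AS numbers), and the longest-prefix-match rule is an assumption about how overlapping registry entries would be resolved.

```python
import ipaddress
from collections import defaultdict

# Hypothetical local AS database: (prefix, AS number, organisation).
AS_DB = [
    (ipaddress.ip_network("192.0.2.0/24"), 64500, "Example University"),
    (ipaddress.ip_network("198.51.100.0/24"), 64501, "Example ISP"),
    (ipaddress.ip_network("198.51.100.128/25"), 64502, "Example Hosting"),
]

# Hypothetical sub-networks of the monitored network.
LOCAL_SUBNETS = [
    ipaddress.ip_network("10.1.0.0/16"),
    ipaddress.ip_network("10.2.0.0/16"),
]


def lookup_as(address):
    """Return the AS number of the longest matching prefix, or None."""
    ip = ipaddress.ip_address(address)
    matches = [(net.prefixlen, asn) for net, asn, _ in AS_DB if ip in net]
    return max(matches)[1] if matches else None


def local_subnet(address):
    """Return the monitored sub-network containing the address, or None."""
    ip = ipaddress.ip_address(address)
    return next((net for net in LOCAL_SUBNETS if ip in net), None)


def traffic_matrix(biflows):
    """Sum bytes per (local sub-network, remote AS) pair.

    biflows: iterable of (local_ip, remote_ip, nbytes).
    """
    matrix = defaultdict(int)
    for local_ip, remote_ip, nbytes in biflows:
        matrix[(local_subnet(local_ip), lookup_as(remote_ip))] += nbytes
    return dict(matrix)


biflows = [
    ("10.1.0.5", "192.0.2.10", 1000),
    ("10.1.3.9", "198.51.100.200", 300),
    ("10.2.0.1", "198.51.100.5", 50),
    ("10.1.0.5", "192.0.2.99", 250),
]
matrix = traffic_matrix(biflows)
```

A matrix of this shape is the common input for both applications listed above: summing a row gives the external usage of one sub-network, while pricing each cell by destination AS gives a destination-based charge.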

3.3 Internet Headers Analysis Module

The Internet Headers Analysis Module (IHM) provides a traffic classification based on the analysis of the three basic headers of the Internet protocol architecture: the IP packet header, the TCP/UDP segment headers, and the service PDU header. The structure of the IHM module is shown in Figure 4.

The main objective of the IHM is to differentiate as many traffic flows (packet sequences with the same source and destination) as possible, and then to check the coherence between the TCP/UDP port and the application within each flow. It differentiates between the traffic that performs according to the standards (the TCP or UDP port corresponds to standard assignment) and the traffic that does not. It provides statistics on the most visited local (inside the academic network) and remote (outside the academic network) servers.

The classification is a result of the verification of the services contained in the routed traffic. This verification is carried out by checking the coherence between the TCP/UDP ports and the service PDU header. In order to achieve this verification, we produced a dictionary of the main well-known services containing a set of patterns extracted from each service specification. The verification is server-oriented, which provides better performance and less consumption of computing resources. Some possible applications of the IHM module could be billing or auditing network usage, verification of the services routed by the network, recording unknown/unusual traffic, etc.

[Figure 4 diagram. Labels: Capture Files, Pre-filtering (IPnIP, ICMP, ...), Internet Header Verification (session oriented), Data base with patterns, Summary Report Files (% Verified traffic, % Pending traffic), Capture Files (unknown traffic), Unknown Traffic Processor, Summary Report Files (unknown traffic), Internet Header Analysis Module (IHM).]

Figure 4. Internet Header Analysis Module (IHM).

Some possible applications of the IHM module could be the following:

• Verification ratio: Traffic distribution classified as: verified (checked ok), pending (patterns not found in the dictionary), unknown (ports not included in the dictionary) and rejected (transport protocol different from TCP/UDP).

• Identification of the main local and remote servers: The verified traffic is classified according to the local or remote server addresses. This classification allows the main remote and local servers to be highlighted, either broken down by the service they provide or across all services.

• Classification of unknown traffic: The following reports can be useful for classifying the main servers and services that remain unknown to the network managers: reports on remote and local addresses sorted by bytes served, and reports on possible servers sorted by the number of possible clients connected to them. All these reports highlight addresses, volumes, transport protocols and application ports, enabling system auditors to inspect the traffic and infer the applications.
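The four verification outcomes listed above (verified, pending, unknown, rejected) follow directly from a dictionary lookup plus a pattern match, as the Python sketch below shows. The dictionary entries and regular expressions are illustrative assumptions, not the patterns used by the IHM.

```python
import re

TCP, UDP = 6, 17  # IP protocol numbers

# Hypothetical dictionary: (protocol, server port) -> patterns taken from
# the service specification.
DICTIONARY = {
    (TCP, 80): [
        re.compile(rb"^(GET|POST|HEAD) \S+ HTTP/\d\.\d"),
        re.compile(rb"^HTTP/\d\.\d \d{3}"),
    ],
    (TCP, 25): [
        re.compile(rb"^(HELO|EHLO|MAIL FROM:|RCPT TO:)", re.I),
        re.compile(rb"^220 "),
    ],
}


def verify(protocol, server_port, payload):
    """Check the coherence between the server port and the service PDU header."""
    if protocol not in (TCP, UDP):
        return "rejected"   # transport protocol other than TCP/UDP
    patterns = DICTIONARY.get((protocol, server_port))
    if patterns is None:
        return "unknown"    # port not included in the dictionary
    if any(p.search(payload) for p in patterns):
        return "verified"   # payload matches the service on that port
    return "pending"        # port known, but no pattern found


results = [
    verify(TCP, 80, b"GET /index.html HTTP/1.0\r\n"),
    verify(TCP, 80, b"\x00\x01binary data"),
    verify(TCP, 6969, b"anything"),
    verify(1, 0, b""),  # protocol 1 is ICMP
]
```

Keying the dictionary on the server port reflects the server-oriented verification mentioned in the text: only one endpoint of each flow needs to be looked up.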

4. MEHARI System Tests

The MEHARI system has been successfully tested on a real network scenario: the IP/ATM backbone of the Spanish NRN (RedIRIS). Two MEHARI system units were deployed at two different monitoring points of the RedIRIS backbone. One of the MEHARI units was installed in the RedIRIS central node in Madrid, according to the configuration shown in Figure 5. The other MEHARI unit was placed on Catalonia’s RedIRIS access network, with a network scenario very similar to the one used in the central node.

[Figure 5 diagram. Labels: GIGACOM Telefónica ATM Network, RedIRIS Regional Nodes, ATM Access Switch, Splitters, RedIRIS Core Router, STM-1 ATM Optical Interfaces (0, 1), Traffic Capture PC (FreeBSD), 100 BaseT Ethernet, NFS, Post-processing PC (LINUX), Internet (RedIRIS), Remote Access.]

Figure 5. Trial network configuration.

It must be stressed that the results presented below aim only at providing some examples of the MEHARI reporting capabilities. The figures included in these reports do not necessarily correspond to the actual values of the traffic parameters in the Spanish NRN. These values correspond to a monitoring process over a given, limited time period, and are therefore not fully representative.

Figure 6 shows some examples of the results obtained by the Traffic Classification Module (TCM) at the central node site in Madrid. For each of the 17 regional links the histogram represents the percentage of input traffic corresponding to the four categories of traffic considered in the MEHARI project (academic, leisure, commercial and undetermined). The results correspond to the average obtained during a capture period of approximately four and a half months (from 15 September 1998 to 2 February 1999).

[Figure 6 stacked bar chart. Axis: % Incoming bytes (0% to 100%); one bar per regional link (Link 1 to Link 17); series: Academic, Leisure, Commercial, Undetermined.]

Figure 6. Traffic classification by usage (per RedIRIS regional nodes).

Figure 7 shows the overall input traffic distribution for the 17 regional nodes of RedIRIS as a whole, for the same capture period as in Figure 6.

[Figure 7 pie chart, % Incoming bytes: Academic 77%, Leisure 18%, Commercial 3%, Undetermined 2%.]

Figure 7. Traffic classification by usage (global distribution).

For an example of the results provided by the Traffic Origin/Destination Analysis Module (TODM), see Figures 8 to 12. Figure 8 shows the main traffic origins of the RedIRIS traffic obtained from an input traffic capture made during a period of approximately four and a half months (from 15 September 1998 to 2 February 1999). The graphics show the percentage of input traffic to each of the 17 RedIRIS regional nodes and the following origins: other RedIRIS sites, the Spanish commercial Internet, the European research networks (TEN-34/155), and the rest of Internet (through the RedIRIS to USA link).

[Figure 8 stacked bar chart. Axis: % Bytes (Input traffic), 0% to 100%; one bar per RedIRIS regional node (17); series: RedIRIS, TEN-34/155, Ibernet, Rest of Internet (through USA).]

Figure 8. Main origins of the RedIRIS traffic (per RedIRIS regional nodes).

Note that, in both Figures 6 and 8, the fact that the distribution is given as a percentage of captured traffic hides the differences in the amount of traffic handled by the different regional nodes of RedIRIS.
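A small numerical example makes this point concrete. The byte counts below are invented for illustration and are not measurements from the field trial: two links with very different volumes show very different per-link percentages, and the network-wide share is dominated by the larger link rather than being the mean of the two percentages.

```python
# Hypothetical byte counts per regional link (not measured values).
links = {
    "Link A": {"academic": 900, "leisure": 100},
    "Link B": {"academic": 10, "leisure": 40},
}


def shares(counts):
    """Convert absolute byte counts into percentages of the total."""
    total = sum(counts.values())
    return {category: 100 * n / total for category, n in counts.items()}


# Per-link view, as in the per-node histograms.
per_link = {name: shares(counts) for name, counts in links.items()}

# Network-wide view: add up the bytes first, then take percentages.
global_counts = {}
for counts in links.values():
    for category, n in counts.items():
        global_counts[category] = global_counts.get(category, 0) + n
global_share = shares(global_counts)

# Averaging the per-link percentages ignores the volume of each link.
naive_mean = sum(s["academic"] for s in per_link.values()) / len(per_link)
```

Here the academic share is 90% on Link A and 20% on Link B; the unweighted mean is 55%, whereas the volume-weighted global share is about 86.7%, because Link A carries twenty times more traffic.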

Figure 9 shows the overall input traffic distribution for the 17 RedIRIS regional links for the same capture period as in Figure 8.

[Figure 9 pie chart. Total input traffic to RedIRIS; slices: RedIRIS, TEN-34/155, Ibernet, Rest of Internet (through USA); values as extracted: 26%, 21%, 12%, 41%.]

Figure 9. Main origins of the RedIRIS input traffic (for the whole RedIRIS network).

Figure 10 shows an attempt to measure the percentage of academic traffic on the link between RedIRIS and the USA. Note that Figure 10 is based on a pessimistic approach: only the traffic from the USA (input traffic to RedIRIS) originating at an institution declared as an academic or research centre in the Internet Routing Registry (IRR) is considered academic. The rest is considered commodity traffic. This result corresponds to one month's capture (between September and October 1998).

[Figure 10 bar chart. Input traffic; axis: % of captured traffic (0 to 5); one bar per RedIRIS regional node (17).]

Figure 10. Percentage of academic traffic found in the USA to RedIRIS link (according to the IRR description).

As regards traffic classification by commercial destinations, the TODM module provides separate statistics for each of the Spanish NRN regional nodes. Figure 11 shows the main commercial origins of the input traffic to the Catalan node. The capture period in this case goes from January to February 1999. In this figure only the 25 most visited sites are depicted: the rest (958 sites for the input traffic graphic and 990 for the output traffic graphic) are compiled in the last column. Note that the 25 most visited sites collect around 60 % of the captured traffic.
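The "top 25 plus rest" layout used in this kind of report is straightforward to derive from per-site byte counts. The sketch below is an illustration with invented site names and volumes; the function name `top_n_report` is an assumption, and the input is assumed to be non-empty.

```python
from collections import Counter


def top_n_report(bytes_by_site, n=25):
    """Return (label, percentage) rows for the n largest sites plus one
    final row that compiles all remaining sites."""
    total = sum(bytes_by_site.values())
    ranked = Counter(bytes_by_site).most_common(n)
    rows = [(site, 100 * nbytes / total) for site, nbytes in ranked]
    rest_sites = len(bytes_by_site) - len(ranked)
    rest_bytes = total - sum(nbytes for _, nbytes in ranked)
    rows.append((f"Other sub-networks: {rest_sites}", 100 * rest_bytes / total))
    return rows


# Hypothetical per-site byte counts.
sites = {"NET-A": 500, "NET-B": 300, "NET-C": 150, "NET-D": 50}
rows = top_n_report(sites, n=2)
```

Reporting the number of sites folded into the last row, as the figure does, lets the reader judge how concentrated the traffic is.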

[Figure 11 bar chart. Axis: % Bytes (Input traffic to Catalonia), 0% to 45%; one bar per commercial sub-network for the 25 most visited sites; last bar: Other Sub-Networks: 958.]

Figure 11. The 25 most visited commercial sites in Catalonia (from input traffic captures).

Another interesting result is the traffic classification by the most visited ASs of the European academic network (TEN-155). As for the main commercial origins, the TODM module provides separate statistics for each of the Spanish NRN regional nodes. Again, as an example, Figure 12 includes the statistics on the main visited ASs of the Catalan node for the same capture period as above (January and February 1999).

[Figure 12 bar chart. Axis: % Bytes (Input traffic to Catalonia), 0% to 20%; one bar per AS, starting with AS1275 (DFN-IP service and DFN customer networks), AS786 (The JANET IP Service), AS1653 (SUNET Swedish University Network) and AS1103 (SURFnet); last bar: Other ASs: 433.]

Figure 12. The 25 most visited TEN-155 ASs in Catalonia (from input traffic captures).

Figures 13 to 16 show some examples of the results provided by the Internet Headers Analysis Module (IHM). Figure 13 shows the percentage of verified traffic, pending traffic (no pattern found), unknown traffic (ports not registered) and rejected traffic (protocols other than TCP/UDP), for the traffic captured in the Catalan sub-network in January and February 1999.

[Figure 13 pie chart. Values and labels as extracted: 0.1 %, 84.9 %, 13.5 %, 1.5 %; Pending, Verified, Unknown, Rejected.]

Figure 13. Traffic verification results.

Figures 14 and 15 present one application of the tool: finding the most relevant servers. Host and application are determined for the 25 most visited remote and local servers respectively from the traffic captured during January and February 1999. In these graphics the input and output traffic are computed together. Note that remote servers collected 74 % of the total amount of captured traffic (see Figure 14). Local servers collected only 26%, the remainder of the captured traffic (see Figure 15). Also note that in both remote and local server graphics the last column represents the traffic collected by the rest of the servers, other than the 25 most visited. The number of these servers is 865,811 and 153,900 respectively.

[Figure 14 bar chart (caption not recovered). Remote Servers (IP address, TCP port, Service); logarithmic axis from 1.E+00 to 1.E+12; bars for the 25 most visited remote servers, mostly FTP (port 20) and HTTP (ports 80 and 8080); last bar: Other Servers: 865811. Total of traffic: 202,730 Mbytes (74% of the captured traffic).]

[Figure 15 bar chart. Local Servers (IP address, TCP port, Service); logarithmic axis from 1.E+00 to 1.E+11; bars for the 25 most visited local servers, covering NEWS, HTTP, SMTP, IRC, FTP and TELNET; last bar: Other Servers: 153900. Total of traffic: 71,991 Mbytes (26% of the captured traffic).]

Figure 15. The 25 most visited servers of Catalonia.

Figure 16 shows an example of the possible reports on the unknown traffic that can be provided by the IHM. It is an attempt to find possible servers by the number of their possible clients (number of sources sending traffic to that destination). The graphs in Figure 16 correspond to a traffic capture of a single day.

[Figure 16, two bar charts. Panels: Possible local servers (IP address, TCP/UDP) and Possible remote servers (IP address, TCP/UDP); each bar is labelled with an IP address, a port and a protocol number (6 or 17); axis: Number of possible clients, with scales 0 to 300 and 0 to 1800.]

Figure 16. Statistics of the unknown traffic.
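The "possible server" heuristic described for Figure 16 amounts to counting distinct sources per destination endpoint. A minimal Python sketch follows; the flow tuple layout, the `min_clients` threshold and the sample addresses are assumptions made for illustration.

```python
from collections import defaultdict


def possible_servers(flows, min_clients=2):
    """Rank destination endpoints by their number of distinct sources.

    flows: iterable of (src_ip, dst_ip, dst_port, protocol) for traffic
           whose port is not in the dictionary (unknown traffic).
    """
    clients = defaultdict(set)
    for src_ip, dst_ip, dst_port, protocol in flows:
        clients[(dst_ip, dst_port, protocol)].add(src_ip)
    ranked = [
        (endpoint, len(sources))
        for endpoint, sources in clients.items()
        if len(sources) >= min_clients
    ]
    return sorted(ranked, key=lambda item: item[1], reverse=True)


# Hypothetical unknown-traffic flows (protocol 6 = TCP, 17 = UDP).
flows = [
    ("10.0.0.1", "192.0.2.7", 6969, 6),
    ("10.0.0.2", "192.0.2.7", 6969, 6),
    ("10.0.0.1", "192.0.2.7", 6969, 6),  # repeated client counts once
    ("10.0.0.3", "192.0.2.7", 6969, 6),
    ("10.0.0.1", "198.51.100.9", 161, 17),
    ("10.0.0.2", "198.51.100.9", 161, 17),
    ("10.0.0.1", "203.0.113.5", 7001, 6),
]
servers = possible_servers(flows)
```

Counting distinct sources rather than packets or bytes is what separates a likely server (many clients on one port) from a single heavy point-to-point transfer.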

5. Conclusions

The flexibility of the IP protocol combined with broadband networks creates various difficulties for the dimensioning, management, and operation of this type of infrastructure. The permanent availability of detailed and updated information about network traffic is critical for network tuning, accounting, enforcement of acceptable use policy and security purposes.

The MEHARI platform allows data capture on STM-1 lines with price/performance ratios ten times better than commercial protocol analysers, while also supporting a much larger number of monitored VCIs. As capture is at the AAL5 level, it is independent of the higher level protocol, and may be customised to capture only part of the packets, or certain packet types. Furthermore, its modular design allows the traffic capture probes to be distributed throughout the network, consolidating pre-processed flow information at a central site.

MEHARI traffic processing modules currently provide information about conventional traffic statistics, the autonomous systems’ traffic matrix-by-user group, validation of the port-to-application assignment and heuristic detection of unacceptable traffic flows. They can be customised and/or extended to suit the changing traffic information requirements of network administrators.

The system field trial has been running 24 hours a day, 30 days a month for several months. The field trial results have been cross-checked to verify the correctness of objective measurements and the statistical validity of heuristic results on traffic types. These results give great confidence in the resilience and accuracy of the MEHARI system.

Current work is expected to improve the capabilities of the capture and pre-processing platform, in order to obtain better price/performance capture ratios. More traffic processing modules are also being developed, mainly targeted at security and charging applications.

Two practical applications of the MEHARI system are monitoring traffic to support billing and charging mechanisms for academic and corporate networks, and auditing compliance with the Acceptable Use Policy (AUP) of National R&D Networks.

Acknowledgements:

The MEHARI project was funded by the CICYT (Comisión Interministerial de Ciencia y Tecnología) under contracts TEL97-1897-E and TEL97-1893-E. The authors thank everyone who took part in the MEHARI project alongside them for their contributions to this work: Xavier Martínez, Carles Veciana, Albert Renom and Sergi Sales of UPC, David Larrabeiti of UC3M, and Julio Berrocal and Ana B. García of UPM. The authors also thank the staff at RedIRIS, Telefónica I+D and C4 (Centre de Computació i Comunicacions de Catalunya), who have been following the work done in this project, for their valuable comments and suggestions. Finally, many thanks to the OC3-MON development team for all their help in adapting the modified FORE firmware to the MEHARI capture modules.

References

[1] Kevin Thompson et al., “Wide-Area Internet Traffic Patterns and Characteristics”, IEEE Network, November/December 1997.

[2] M. Alvarez-Campana et al., “CASTBA: Medidas de Tráfico sobre la Red Académica Española de Banda Ancha”, TELECOM I+D, Madrid, October 1998.

[3] J. Apisdorf et al., “OC3MON: Flexible, Affordable, High-Performance Statistics Collection”, INET’97, Malaysia, June 1997.

[4] David Newman et al., “Intrusion Detection Systems, Suspicious Finds”, Data Communications, August 1998.

[5] M. Alvarez-Campana, A. Azcorra, J. Berrocal, D. Larrabeiti, J.I. Moreno, J.R. Pérez, “CASTBA: Internet Traffic Measurements over the Spanish ATM Network”, HP-OVUA Workshop, Rennes (France), April 1998.

Author Biographies:

Pedro J. Lizcano ([email protected]) holds a Master's degree in Telecommunication Engineering (1983) from the Universitat Politècnica de Catalunya (UPC, Barcelona). From 1983 to 1988 he was with the R&D Department at Telefónica de España, involved in the development of advanced communications systems such as a multi-access radio system for rural applications (MARD, fully industrialised by Telettra and deployed world-wide) and a second-generation X.25 packet switching system (TESYS-B, a major joint development between Telefónica industrial group companies and Fujitsu, deployed nation-wide, in which he led the hardware platform development). In 1988 he joined Telefónica R&D as the Packet Switching Division manager. From 1991 to 1993 he worked with AT&T Bell Laboratories in Denver (Colorado, USA) as Resident Visitor in a forward-looking work project dealing with SONET / SDH high speed
broadband interfaces. In 1993, back at Telefónica R&D, he led the ATM Systems Division, fully involved in the development of a B-ISDN platform (RECIBA) for benchmarking the emerging multimedia services. He also worked on a number of EURESCOM projects (concerning network synchronisation) and on RACE II and ACTS EU-programme projects (dealing with advanced networking and services). Since 1994 he has been working as an external expert and adviser with the EU Commission in programme definition, proposal selection and technical audit processes. Since 1996 he has held the positions of head of the areas dealing with International Development, Basic Technologies and Product Engineering. He is currently leading the Technological Innovation Programme area.

Since 1995 he has been a technology adviser at the Interministerial Commission of Science and Technology, leading a National R&D Programme on Telematic Services and Applications. He is the national representative in the European Networking Policy Group (ENPG, currently as Vice-President). He is also a member of the Board of Trustees of the International Computer Science Institute (ICSI) in Berkeley (California, USA). He has been a member of the IEEE for over 20 years and is a member of the Planetary Society.

Arturo Azcorra ([email protected]) was awarded an M.Sc. degree in Telecommunications from the Technical University of Madrid in 1986 and a Ph.D. from the same university in 1989. In 1990 he became an associate professor at the Department of Telematics Engineering, Technical University of Madrid, Spain. Since 1998 he has been lecturing at U. Carlos III of Madrid, Spain. Current research projects include broadband networks, multicast teleservices, intelligent agents, and IP/ATM convergence.

Josep Solé-Pareta ([email protected]) was awarded his Master's degree in Telecommunication Engineering in 1984, and his Ph.D. in Computer Science in 1991, both from the Universitat Politècnica de Catalunya (UPC). In 1984 he joined the Computer Architecture Department of UPC. Since 1992 he has been an Associate Professor with this department. He spent the summers of 1993 and 1994 at the Georgia Institute of Technology. He has participated in the Spanish R&D Programme for the development of broadband communications in Spain (PLANBA). He is a member of the Advanced Broadband Communications laboratory of UPC (http://www.ac.upc.es/CCABA). His current research interests are in Broadband Internet, ATM Networks and Optical Packet Networks, with emphasis on traffic engineering, traffic characterisation, traffic management and QoS provisioning. He is a member of the IEEE and the ACM (Sigcomm).

Jordi Domingo-Pascual ([email protected]) is an Associate Professor of Computer Science and Communications at the Universitat Politècnica de Catalunya (UPC) in Barcelona, where he received his engineering degree in Telecommunication (1982) and his Ph.D. degree in Computer Science (1987). In 1983 he joined the Computer Architecture Department. Since 1994 he has been co-founder and researcher at the Advanced Broadband Communications Center of the University (CABA), which participates in the Spanish National Host and in the PLANBA demonstrator. He was a visiting researcher at the International Computer Science Institute in Berkeley (California) for six months. His research topics are Broadband Communications and Applications. Since 1988 he has participated in RACE projects (Technology for ATD and EXPLOIT), in several Spanish Broadband projects (PLANBA: AFTER, TR1 and IRMEM), ACTS projects (INFOWIN, MICC, IMMP), and Spanish research projects (CASTBA, MEHARI, SABA). More detailed information may be found at http://www.ccaba.upc.es.

Manuel Alvarez-Campana ([email protected]) received his M.S. and Ph.D. degrees in Telecommunication Engineering from the Universidad Politécnica de Madrid. In 1989 he joined the Departamento de Ingeniería Telemática at the Universidad Politécnica de Madrid (DIT-UPM), where he has been an associate professor since 1996. He has participated in several projects within the framework of European and Spanish R&D programmes. His current research interests include design and analysis of broadband networks, integration of services, traffic characterisation, and network performance evaluation.