Agent Architecture in the Network Environment

8.3 Single Cell with Multi-Agent architecture

8.3.1 Agent Architecture in the Network Environment

Having reviewed the performance of a single-agent architecture in the previous section, we concluded that multiple sources of information have significant potential for improving the ability of IDSs to detect complex attacks. To address this, we propose an agent architecture similar to the one previously used in the abstract environment but enhanced with the ability to use a large input space. The ability to use several high-resolution inputs allows us to adapt the architecture to the requirements of a real-world application. To accomplish this, we slightly modified the original agent architecture used in the abstract environment (see Figure 8.3).

The basic architecture is composed of a single cell with a Congestion Sensor Agent (CSA), a Delay Sensor Agent (DSA), a Flow Sensor Agent (FSA) and a Decision Agent (DA). A variety of information sources is required to improve the detection capabilities [22, 112, 117]. The idea is that each sensor agent perceives different information depending on its capabilities, its operative task, and where it is deployed in the network [154].

One important reason to require a MAS approach instead of a single agent with multiple sources is that not all the features are available at a single point in the network. Flow and congestion information may be measured in a border router between the Internet and the Intranet, whilst delay information may only be available from an internal router. Besides, flood-based DDoS attacks are launched from several remotely controlled sources trying to exhaust a target’s key resource. A stand-alone IDS does not have all the information needed to accurately identify the sources and destinations of DDoS attacks when the sources use address spoofing to hide their identity.

8.3. Single Cell with Multi-Agent architecture Chapter 8. Network Environment

Figure 8.3: Agent Architecture

The CSA analyses link information on a particular node in the network. Specifically, this agent samples the link utilisation in bytes per second, the size of the queue in packets, and the number of packets dropped by the queue. This set of monitored information (link utilisation, queue size and packets dropped) will be referred to as the feature domain. A good placement for this agent is on the path between the protected service and the untrusted network. The closer the agent is to the protected service, the more data it can gather.

The DSA monitors TCP connections between nodes. DoS and DDoS attacks modify the normal behaviour of the network in many ways, and some of these changes can be spotted by analysing TCP information from connections in the path of the attack. This agent has the same internal structure as the CSA, but its feature domain is different. The features analysed for the TCP connections are the average number of ACK packets received, the average window size, and the average Round Trip Time (RTT).
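The two feature domains described above can be written down as small records. The following is only an illustrative sketch: the class names, field names, and the `normalise` helper (with its caller-supplied upper bounds) are assumptions made for the example and do not come from the original implementation.

```python
from dataclasses import dataclass, astuple


@dataclass(frozen=True)
class CongestionFeatures:
    """Feature domain of the CSA (field names are illustrative)."""
    link_utilisation_bytes_per_s: float
    queue_size_pkts: float
    dropped_pkts: float


@dataclass(frozen=True)
class DelayFeatures:
    """Feature domain of the DSA (field names are illustrative)."""
    avg_acks_received: float
    avg_window_size: float
    avg_rtt_s: float


def normalise(sample, upper_bounds):
    """Scale each feature to [0, 1] by an assumed upper bound, clipping outliers."""
    return tuple(
        min(max(value / bound, 0.0), 1.0)
        for value, bound in zip(astuple(sample), upper_bounds)
    )
```

Both records hold three variables, so the same feature extraction step can be applied to either sensor agent once the values are scaled to a common range.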

The task of the FSA is to analyse and summarise flow information. To perform this task the agent is divided into two logical sub-agents: the Flow Monitor (FM) and the Flow Aggregator (FA). The FM analyses the traffic flows that pass through the FSA. This sub-agent can be either hand-coded or a learning agent. If it were hand-coded, it would use a table with the flow information that represents an attack, in a similar fashion to a misuse IDS. For the tests that we perform, we use a learning agent that uses RL to learn which flows are abnormal. The feature domain of the agent is composed of the protocol number, the port number, and the average packet size of the flow. Using this information, the learning agent acting as FM learns which flows are normal traffic and which ones may lead to an attack. The second sub-agent is the FA. It aggregates flow information by keeping a flow table with the signals reported by the FM. The basic feature domain of the FA is the number of attack flows reported by the FM. It is possible to hold more information, such as the total number of flows, the number of non-attack flows, or the ratio of attack flows to non-attack flows.
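The split between FM and FA can be sketched as follows. This is a minimal illustration under stated assumptions: the FM shown here is the hand-coded variant (not the RL learner used in the tests), the signature table `ATTACK_TABLE` contains an arbitrary made-up entry, and all names are hypothetical.

```python
# Hypothetical signature table: (protocol, port) -> (min, max) average packet size.
# The single entry is arbitrary and only serves the example.
ATTACK_TABLE = {(17, 9999): (0.0, 64.0)}


def hand_coded_fm(protocol, port, avg_packet_size):
    """Hand-coded Flow Monitor: flag a flow if it matches a table entry."""
    size_range = ATTACK_TABLE.get((protocol, port))
    if size_range is None:
        return False
    low, high = size_range
    return low <= avg_packet_size <= high


class FlowAggregator:
    """Keeps a flow table with the latest signal reported by the FM."""

    def __init__(self):
        self.flow_table = {}  # flow id -> True if the FM flagged it as attack

    def report(self, flow_id, is_attack):
        self.flow_table[flow_id] = bool(is_attack)

    @property
    def attack_flows(self):
        return sum(self.flow_table.values())

    @property
    def total_flows(self):
        return len(self.flow_table)

    def attack_ratio(self):
        """Ratio of attack flows to non-attack flows."""
        normal = self.total_flows - self.attack_flows
        if normal == 0:
            return float("inf") if self.attack_flows else 0.0
        return self.attack_flows / normal
```

Replacing `hand_coded_fm` with a learning agent leaves the aggregator unchanged, since the FA only consumes the per-flow signal.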

In the network environment we use the same Decision Agent (DA) that we used in the abstract environment. The DA did not undergo any modification in its structure, functionality or operation. The diagram in Figure 8.3 shows a module called feature extraction. This module uses tile coding as a function approximation technique to map input data to state information. The tile coding parameters used in the tests based on this agent architecture were 32 tilings and 16 tiles per tiling. We used three input variables in each sensor agent.
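To make the feature extraction step concrete, the sketch below maps three inputs to one active tile per tiling. It rests on several assumptions that the text does not confirm: "16 tiles per tiling" is read as 16 tiles along each input dimension, inputs are pre-scaled to [0, 1], tilings are displaced by uniform offsets, and no hashing is used. The function name `active_tiles` is hypothetical.

```python
NUM_TILINGS = 32     # from the text
TILES_PER_DIM = 16   # assumption: 16 tiles along each input dimension


def active_tiles(inputs, num_tilings=NUM_TILINGS, tiles_per_dim=TILES_PER_DIM):
    """Return one active tile index per tiling for inputs scaled to [0, 1]."""
    side = tiles_per_dim + 1              # one extra tile absorbs the offset
    tiles_in_tiling = side ** len(inputs)
    indices = []
    for tiling in range(num_tilings):
        offset = tiling / num_tilings     # uniform offset, in tile widths
        flat = 0
        for value in inputs:
            value = min(max(value, 0.0), 1.0)
            coord = int(value * tiles_per_dim + offset)
            flat = flat * side + coord    # row-major index inside the tiling
        indices.append(tiling * tiles_in_tiling + flat)
    return indices


def estimate(weights, tiles):
    """Linear value estimate: sum of the weights of the active tiles."""
    return sum(weights.get(index, 0.0) for index in tiles)
```

With these assumptions every input vector activates exactly 32 tiles, nearby inputs share most of them, and distant inputs share none, which is what gives tile coding its local generalisation.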

The same RL of Signalling algorithm explained in Chapter 5 was developed using this agent architecture. Sensor agents (CSA, DSA and FA) send signals to their corresponding DA within the cell. The DA in turn generates an alarm or no-alarm signal to the network operator, or an action-signal to the next level in the hierarchy. Again, it is important to mention that agents do not directly interact with or change the environment; thus action selection is based on the maximisation of immediate rewards. The DA at the top of the hierarchy obtains its reward from the network operator. If the categorisation of the network state was correct, the operator gives a positive reward, and a negative reward otherwise. This same reward is passed down to the other DAs lower in the hierarchy and eventually to the SAs.
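The signalling loop of a single cell can be sketched as below. This is an illustration rather than the algorithm of Chapter 5: it uses a lookup table instead of tile coding, epsilon-greedy exploration, and rewards of +1 and -1, all of which are assumptions, and the names `SignalAgent` and `run_cell` are hypothetical. The point it shows is that the update has no next-state term, because action selection only maximises the immediate reward.

```python
import random
from collections import defaultdict


class SignalAgent:
    """Tabular agent that estimates the immediate reward of each action."""

    def __init__(self, actions, alpha=0.1, epsilon=0.1, rng=None):
        self.actions = list(actions)
        self.alpha = alpha
        self.epsilon = epsilon
        self.rng = rng or random.Random(0)
        self.q = defaultdict(float)   # (state, action) -> estimated reward
        self.last = None

    def act(self, state):
        if self.rng.random() < self.epsilon:
            action = self.rng.choice(self.actions)
        else:
            action = max(self.actions, key=lambda a: self.q[(state, a)])
        self.last = (state, action)
        return action

    def learn(self, reward):
        # Q(s, a) <- Q(s, a) + alpha * (r - Q(s, a)); no bootstrapping term.
        self.q[self.last] += self.alpha * (reward - self.q[self.last])


def run_cell(sensors, da, observations, is_attack):
    """One step: sensors signal the DA, the operator's reward is shared."""
    signals = tuple(s.act(obs) for s, obs in zip(sensors, observations))
    decision = da.act(signals)
    reward = 1.0 if (decision == "alarm") == is_attack else -1.0
    da.learn(reward)
    for sensor in sensors:
        sensor.learn(reward)          # same reward passed down to the SAs
    return decision, reward
```

In a hierarchy, the decision of a lower DA would play the role of a sensor signal for the DA above it, and the operator's reward would be passed down through every level in the same way.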

Both Figure 8.3 in this chapter and Figure 5.2 in Chapter 5 show an agent architecture that resembles a multi-layer Artificial Neural Network (ANN). The operation also looks very similar: input information is received and processed by perceptrons (sensor agents) and then fed forward to the next layer. In the next layer (DA) an output is generated. One would imagine that this problem would be better addressed using a simple ANN. That would be true if all the inputs were in the same location and the outputs of all the sensor agents (or perceptrons) to the DA were synchronised. In practice this is not possible. Sensor agents are not only gathering different information; they may also be located in different places within the network. To deploy an ANN we would need to synchronise the outputs of each perceptron so that they arrive at specific intervals at the next layer of the neural network. Such synchronisation of signals may be very difficult to accomplish in noisy and congested environments such as a network under a DoS attack. In contrast, the approach using RL works with asynchronous signals, is easier to deploy, and is more reliable under the loss of information from sensor agents. However, it is important to consider the delay between the moment the input information is processed by the sensor agents and the moment the final action-signal is generated by the top DA in the hierarchy.
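One way a DA could cope with asynchronous or missing signals is to keep the most recent signal from each sensor agent and treat old ones as unknown. The text does not describe how this is handled, so the sketch below is purely an assumption: the `SignalBuffer` class, the `max_age` parameter, and the use of `None` for a missing signal are all hypothetical.

```python
class SignalBuffer:
    """Holds the latest signal per sensor so the DA can act at any time."""

    def __init__(self, sensor_names, max_age):
        self.max_age = max_age
        self.latest = {name: None for name in sensor_names}  # name -> (signal, time)

    def receive(self, name, signal, now):
        """Store a signal whenever it arrives; no synchronisation is needed."""
        self.latest[name] = (signal, now)

    def state(self, now):
        """Joint DA state; stale or never-received signals become None."""
        joint = []
        for name in sorted(self.latest):
            entry = self.latest[name]
            if entry is None or now - entry[1] > self.max_age:
                joint.append(None)
            else:
                joint.append(entry[0])
        return tuple(joint)
```

Because a missing signal is simply another state value, the DA can still learn an action for it, whereas a feed-forward network would need every input to be present at the same instant.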

In document Multi-agent reinforcement learning for intrusion detection (Page 122-125)