Post-Intrusion Recovery Using Data Dependency Approach


T3A3 4:20 Proceedings of the 2001 IEEE

Workshop on Information Assurance and Security

United States Military Academy, West Point, NY, 5-6 June, 2001

Abstract-- Recovery of lost or damaged data in a post-intrusion detection scenario is a difficult task, since database management systems are not designed to deal with malicious committed transactions. The few existing methods developed for this purpose rely heavily on logs and require that the log never be purged. This causes the log to grow tremendously and, since scanning a huge log takes an enormous amount of time, recovery becomes a complex and prolonged process. In this research, we have used a data dependency approach to divide a log into multiple segments, each segment containing only related operations. During damage assessment and recovery, we identify and skip the parts of the log that contain unaffected operations, which accelerates the task. Through simulation we have validated the performance of our method.

Index Terms-- Information Warfare, Malicious Transaction, Data Dependency, Log Segmentation.

I. INTRODUCTION

Any computer system that is connected to a network is vulnerable to information attacks. In spite of all preventive measures, savvy intruders manage to sneak through and damage sensitive data. The initial damage later spreads to other parts of the database when a legitimate transaction updates valid data after reading damaged data. Damage may also spread for other reasons, such as system integrity checks, as described in [1] and [6]. Intrusion detection helps in identifying an attack, and a significant amount of work has been performed in this area; a few examples are described in [3], [5] and [10]. However, none of the existing intrusion detection methods guarantees that an attack will be detected immediately. Therefore, a major part of the database may have been affected by the time an attack is detected and the attacking transaction is identified. The situation becomes worse as time passes and, finally, it may be difficult, although not impossible, to recover the system. Hence, immediate and efficient damage assessment and fast and accurate recovery are important. In this research, we have developed a model that uses a data dependency approach to divide the log into multiple segments. During damage assessment, only a few of these segments are accessed instead of the entire log, which expedites the recovery process. We have developed a simulation model to test the performance of our model. The results show substantially lower log access times than the traditional log based method.

This work was supported in part by US AFOSR grant F49620-99-1-0235.

In the next section, we discuss related work. Our proposed model is described in Section III and the clustering algorithm is presented in Section IV. Section V presents the performance of our model as obtained through simulation. Section VI concludes the paper.

II. RELATED WORK

Traditional recovery methods [2], [4], [7] have been designed to perform recovery in case of media or system failures, but they lack the efficacy required to recover from effects of malicious committed transactions. In such situations, after the detection of an attack, effects of all transactions reading directly or indirectly from the malicious transaction along with that of the malicious transaction need to be undone. Then affected transactions must be re-executed to reflect the correct state of the database. Since an attack may be detected days or even months after its occurrence, the log must never be purged. Otherwise, information about the attacker and other valid but affected transactions will not be available. This requirement makes the log grow massively and searching the log during damage assessment and recovery incurs a long delay, which is unacceptable in many real-time applications.
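As an illustration of why the whole tail of the log must be scanned, the following is a minimal sketch of transaction-level damage assessment. It is not taken from any of the cited methods: the history format (one tuple per committed transaction, in serialization order) and the function name are assumptions made for this example.

```python
def affected_transactions(history, attacker_id):
    """Return IDs of transactions that read, directly or indirectly,
    from the attacker.

    history: list of (txn_id, read_set, write_set) in serialization order
             (hypothetical format, assumed for this sketch).
    """
    damaged_items = set()
    affected = []
    attack_seen = False
    for txn_id, reads, writes in history:
        if txn_id == attacker_id:
            attack_seen = True
            affected.append(txn_id)
            damaged_items |= writes
        elif attack_seen:
            if reads & damaged_items:
                # Read damaged data, so everything it wrote is suspect.
                affected.append(txn_id)
                damaged_items |= writes
            else:
                # Assumption: a value computed only from undamaged
                # inputs overwrites, and thereby refreshes, the item.
                damaged_items -= writes
    return affected


history = [
    ("T1", {"a"}, {"b"}),
    ("T2", set(), {"x"}),   # attacker
    ("T3", {"x"}, {"y"}),
    ("T4", {"a"}, {"c"}),
    ("T5", {"y"}, {"z"}),
    ("T6", {"c"}, {"x"}),
    ("T7", {"x"}, {"w"}),
]
print(affected_transactions(history, "T2"))
```

Every transaction after the attacker has to be inspected, whether or not it is related to the damage, which is the cost the clustering approach aims to avoid.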

In [8], Jajodia et al. have discussed recovery issues for defensive information warfare. Liu et al. [9] have presented algorithms that rewrite the transaction history by moving the attacking transaction and all affected transactions beyond the non-affected transactions. However, this process requires significant page I/O, since all transactions after and including the malicious transaction have to be read. In order to save log access time, researchers have proposed clustering the log using a transaction dependency approach and have shown that only one of the clusters needs to be accessed during damage assessment [13]. Instead of traditional transaction dependency, the researchers in [11] and [12] have used data dependency for recovery from malicious transactions. Rather than undoing all operations of affected transactions and then re-executing them, their approach undoes and redoes only the affected operations of those transactions. Nevertheless, they require that the log be accessed from the malicious transaction to the end in order to perform damage assessment and recovery.

Sani Tripathy and Brajendra Panda, Member, IEEE

Computer Science Department

University of North Dakota

Grand Forks, ND 58202

Email: {sani, panda}@cs.und.edu

In this research, we have developed an extended data dependency model and an algorithm that divides the log into several clusters. Only operations on data items that are inter-dependent are kept in one cluster. During damage assessment, only the relevant clusters need to be scanned. The damage assessment and recovery algorithms presented in [11] can directly use the clusters created by our method, whereas other existing algorithms can easily be modified for this purpose.

III. THE MODEL

This work is based on the assumption that the attacking transaction has already been detected by intrusion detection techniques. So, given an attacking transaction, our goal is to determine the affected ones quickly, stop new and executing transactions from accessing affected data, and then carry out the recovery process. We further assume that the scheduler produces a strict serializable history, and that the log is not modifiable by users (so that the log cannot be damaged). As transactions are executed, the log grows with time and is never purged. The log is stored in secondary storage, so every access to it requires a disk I/O. During recovery, we need to access the log to restore the database. To avoid unnecessary retrieval of the massive log, which results in a tremendous number of page I/Os, the clustering approach is followed. Next, we cite two of the definitions that were initially presented in [11], since they form the basis of our model.

Definition 1: A write operation wi[x] of a transaction Ti is dependent on a read operation ri[y] of Ti if wi[x] is computed using the value obtained from ri[y].

Definition 2: A data value v1 is dependent on data value v2 if the write operation that wrote v1 was dependent on a read operation on v2. Note that v1 and v2 may be two different versions of the same data item.

In our model, the operations on data items that, in accordance with Definition 2, are directly or indirectly dependent on each other are kept in the same cluster. Within a transaction, some operations may be independent of each other. Therefore, not all operations of an affected transaction are affected in case of an attack, and during recovery we need not re-execute all operations of a transaction but only its affected operations. Keeping this in mind, we store independent operations in different clusters. Another goal we have considered is determining the largest possible subset of unaffected data items and making them available as soon as possible. This reduces the risk of denial-of-service types of attacks. The following definitions help us achieve this goal by determining various possible dependency boundaries between data items.

Definition 3: For any two data items x and y, if the value of x may be used in calculating the value of y, then y can be influenced by x. This relation is denoted by x ||→ y. If the relation is bi-directional, i.e., either data item can influence the other, we denote the relationship as x ←||→ y.

We assume that any data item can be updated by using its previous value. Therefore, the relationship is reflexive. However, it is neither symmetric nor transitive.

Definition 4: A probability graph is a directed graph representing possible relationships among data items in a database. In such a graph, the data items are represented by nodes, and an edge between two nodes represents the can-influence relationship between them. An edge can be unidirectional or bi-directional.

Definition 5: A clique C comprises related data items such that 1) for each x ∈ C, ∃ a data item y such that either x ||→ y, or y ||→ x, or x ←||→ y, and 2) the probability graph plotted by taking all the data items in C is a connected one.

A clique guarantees that data items belonging to different cliques never affect each other. Therefore, during damage assessment, if we identify the clique containing the damage made by the attacker, items of other cliques can be immediately made available to users.
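Under Definition 5, cliques correspond to the connected components of the probability graph when edge direction is ignored. The sketch below illustrates this reading; the function name and input format are assumptions made for the example, and an item with no edge to any other item is treated as a clique of its own (relying on the reflexivity assumed above).

```python
from collections import defaultdict


def cliques(items, can_influence):
    """Group data items into cliques.

    items:         iterable of data item names
    can_influence: iterable of (x, y) pairs meaning x ||-> y
    (both are hypothetical input formats for this sketch)
    """
    neighbours = defaultdict(set)
    for x, y in can_influence:
        # Direction does not matter for connectivity.
        neighbours[x].add(y)
        neighbours[y].add(x)

    seen = set()
    result = []
    for start in items:
        if start in seen:
            continue
        component = set()
        stack = [start]
        while stack:  # iterative depth-first traversal
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(neighbours[node] - component)
        seen |= component
        result.append(component)
    return result


groups = cliques(
    ["a", "b", "c", "d", "e", "f"],
    [("a", "b"), ("c", "b"), ("d", "e")],
)
print(groups)
```

If the attacker's damage lies in the clique containing "a", the items "d", "e" and "f" can be released to users without scanning any log records.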

Definition 6: A critical link is a specific connecting node in a probability graph, removal of which may divide the graph into multiple disconnected graphs.

During damage assessment, if it is determined that a critical link has not been updated, then it is clear that data items on one side of the link have not affected items on the other side(s) of the link. However, it must be noted that determining whether a link has been updated adds overhead. A study needs to be done to determine the links for which this information should be kept; otherwise, the maintenance cost will exceed the expected benefit.
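Read as a graph property, a critical link in the sense of Definition 6 is what graph theory calls an articulation point. The following sketch finds such nodes by brute force (remove each node in turn and count components); it is an illustration under that interpretation, with hypothetical function names, and not a procedure given in this paper.

```python
def component_count(nodes, edges):
    """Number of connected components, ignoring edge direction."""
    adjacent = {n: set() for n in nodes}
    for x, y in edges:
        adjacent[x].add(y)
        adjacent[y].add(x)
    seen = set()
    count = 0
    for start in nodes:
        if start in seen:
            continue
        count += 1
        stack = [start]
        while stack:
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            stack.extend(adjacent[node] - seen)
    return count


def critical_links(nodes, edges):
    """Nodes whose removal splits the graph into more components."""
    baseline = component_count(nodes, edges)
    critical = []
    for candidate in nodes:
        remaining = [n for n in nodes if n != candidate]
        kept_edges = [(x, y) for x, y in edges if candidate not in (x, y)]
        if component_count(remaining, kept_edges) > baseline:
            critical.append(candidate)
    return critical


nodes = ["a", "b", "c", "d", "e"]
edges = [("a", "b"), ("b", "c"), ("c", "d"), ("c", "e"), ("d", "e")]
print(critical_links(nodes, edges))
```

The brute-force check is quadratic in the size of the graph; since the probability graph is built once from the schema rather than per transaction, that may be acceptable, but a linear-time articulation-point algorithm could be substituted.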

IV. LOG CLUSTERING

In the data dependency approach, as update operations of transactions are encountered, the affected data items are checked for data dependency and the operations are put in the appropriate cluster. Operations on the items of a clique may be spread over a number of clusters. Operations that are related to or dependent on each other are put in the same cluster.

When we consider recovery from malicious attacks, we cannot overlook the requirements for recovery from traditional failures. Hence, the basic operations for traditional recovery, such as undo and redo, must be taken care of. Since our algorithm stores only committed transactions in clusters, there is no need to carry out undo operations on clusters in case of transaction or media failures. Operations of all active transactions are stored in a temporary log, and clusters for those are determined periodically. In traditional recovery, the need for a redo operation arises when the operations of a transaction have not been flushed even after the commit point. In our method, the operations of any non-flushed committed transaction will be in the temporary log. We add a checkpoint in the temporary log to mark the last committed transaction that has been flushed, so that clusters can be created up to that point. We do not store commit operations of transactions in any cluster. Therefore, the transactions stored in clusters do not require undo and redo operations. To carry out those operations, we may need to refer to the temporary log.

The actual dependency of data items is hard to determine from transaction operations alone; transaction semantics must be considered for this purpose. Since transaction semantics are not available, the clustering algorithm, whose data structures are described in subsection A, relies on the following simplifying assumptions. First, a write operation on any data item is dependent on all preceding read operations appearing after the previous write operation (if any) of the same transaction. Secondly, if a write operation is immediately followed by another write operation, the second write operation is independent of any read operation.
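The two assumptions can be illustrated with a short sketch that derives, for each write of a single transaction, the reads it is taken to depend on. The operation format and the function name are hypothetical and chosen only for this example.

```python
def dependent_reads(ops):
    """Map each write of one transaction to the reads it depends on.

    ops: list of (kind, item) in execution order, kind is "r" or "w"
         (hypothetical format for this sketch).
    Returns a list of (written_item, [read_items]).
    """
    dependencies = []
    reads_since_last_write = []
    for kind, item in ops:
        if kind == "r":
            reads_since_last_write.append(item)
        elif kind == "w":
            # Assumption 1: the write depends on every read that
            # appeared after the previous write of this transaction.
            dependencies.append((item, reads_since_last_write))
            # Assumption 2 follows: a write directly after another
            # write sees an empty list and depends on no read.
            reads_since_last_write = []
    return dependencies


ops = [("r", "a"), ("r", "b"), ("w", "c"), ("r", "d"), ("w", "e"), ("w", "f")]
print(dependent_reads(ops))
```

Here the write to c depends on the reads of a and b, the write to e depends only on the read of d, and the write to f depends on no read.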

A. Data Structures

Cluster - data list (CDL): This is used to store the cluster IDs and the corresponding data items, so that related (directly or transitively) data items can be stored together. This table may be referred to in order to identify the relevant clusters to which the operations of a transaction belong. Each operation in TOLi is considered and the CDL is checked to find out whether the data item is already in a cluster. If it is, the same cluster ID is assigned; otherwise, a new cluster ID is assigned. It is guaranteed that no item belongs to more than one cluster. If a dependency is established between two items belonging to two different clusters, then those clusters are combined to form one cluster.

Transaction - cluster list (TCL): This structure is used to store the transaction IDs and corresponding cluster IDs. A single transaction may be spread over a large number of clusters. Therefore, to trace a transaction we need to store this information. In case an attacker is detected, this list will help in determining the affected clusters.

Transaction operation list for Ti (TOLi): This structure stores the operations of transaction Ti along with their corresponding cluster IDs. This is a temporary structure and is discarded once the last operation of the transaction is stored in a cluster.

B. Clustering Algorithm

1. If data structures TCL and CDL are not found then
       Set TCL = {} and CDL = {}
2. Scan each operation till the checkpoint of the log.
   For every operation Oi in the log
   2.1 Case Oi is Start
           Set TOLi = {}
       Case Oi is Read or Write
           Add Oi to TOLi
       Case Oi is Abort
           Delete TOLi
       Case Oi is Commit
           Add transaction ID to the Transaction ID entry of the table TCL;
           Delete all the read operations in TOLi which do not affect any write operation;
           Call assign_cluster;
           Delete TOLi

Procedure assign_cluster
// Identifies cluster for each operation in TOLi
1. Scan each operation until the end of the table TOLi
   For every operation Oi[x] in TOLi
   1.1 If Oi[x] is read, then
           If x belongs to a cluster, say Ck
               Update the cluster entry for Oi[x] in TOLi to Ck
           Else
               Assign a new cluster ID and update the cluster entry for Oi[x] in TOLi accordingly;
               Add the new cluster ID and data item to CDL;
               Add the new cluster ID to TCL
   1.2 If Oi[x] is write operation
           If x is in cluster Ck (as checked from CDL)
               Update cluster entry for Oi[x] in TOLi to Ck
           Else
               Assign a new cluster ID and update the cluster entry for Oi[x] in TOLi;
               Add the new cluster ID to CDL and TCL;
           If operation Oi[x] has dependent reads then
               Check all such dependent read operations in TOLi and their cluster IDs
               For each such different cluster ID
                   Merge all the clusters and put all entries of those in one cluster as given in CDL;
                   Update CDL, TCL and TOLi to reflect the changes
2. Add all operations of TOLi to their respective clusters

Next, we present two lemmas regarding transaction operations. Proofs of these lemmas are obvious; therefore, due to space constraints proofs are not provided here. Readers interested in proofs may want to contact the authors.

Lemma 1: Effect of every operation in the cluster is already in the database.

Lemma 2: No operation is stored in more than one cluster.

In the above algorithm, whenever a read operation on any item is encountered, the CDL is checked to determine the corresponding cluster. When a write operation is encountered, its dependency on the previous read operations is checked. If any dependency is established, all dependent operations are put in the same cluster. Once the clusters are created, pinpointing the damage becomes easier. Given the attacking transaction, we can determine the clusters that contain this transaction. Then the operations of the attacking transaction can be undone and the affected operations can be re-executed by using existing algorithms such as the one described in [11].
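The clustering procedure can be sketched in Python as follows. This is an illustrative reading of the algorithm rather than the authors' implementation: the log record format, the class and method names, the use of dictionaries for the CDL and TCL, and the ordering of merged clusters by log position are all assumptions of this sketch. Dependencies are derived with the two simplifying assumptions of this section.

```python
from itertools import count


class LogClusterer:
    def __init__(self):
        self.cdl = {}        # CDL: data item -> cluster ID
        self.tcl = {}        # TCL: transaction ID -> set of cluster IDs
        self.clusters = {}   # cluster ID -> [(seq, txn, kind, item)]
        self._new_id = count(1)

    def _cluster_of(self, item):
        # Look the item up in the CDL, creating a cluster if needed.
        if item not in self.cdl:
            cid = next(self._new_id)
            self.cdl[item] = cid
            self.clusters[cid] = []
        return self.cdl[item]

    def _merge(self, keep, drop):
        if keep == drop:
            return
        for item, cid in list(self.cdl.items()):
            if cid == drop:
                self.cdl[item] = keep
        # Keep operations in log order inside the merged cluster.
        self.clusters[keep] = sorted(self.clusters[keep] + self.clusters.pop(drop))
        for cids in self.tcl.values():
            if drop in cids:
                cids.discard(drop)
                cids.add(keep)

    def _assign_cluster(self, txn, tol):
        pending_reads, kept = [], []
        for seq, kind, item in tol:
            if kind == "read":
                pending_reads.append((seq, item))
                continue
            target = self._cluster_of(item)
            for read_seq, read_item in pending_reads:
                self._merge(target, self._cluster_of(read_item))
                kept.append((read_seq, "read", read_item))
            kept.append((seq, "write", item))
            pending_reads = []
        # Reads still pending affect no write and are dropped.
        self.tcl[txn] = set()
        for seq, kind, item in kept:
            cid = self.cdl[item]
            self.clusters[cid].append((seq, txn, kind, item))
            self.tcl[txn].add(cid)

    def process(self, log):
        tol = {}  # TOL per active transaction
        for seq, (txn, kind, item) in enumerate(log):
            if kind == "start":
                tol[txn] = []
            elif kind in ("read", "write"):
                tol[txn].append((seq, kind, item))
            elif kind == "abort":
                del tol[txn]
            elif kind == "commit":
                self._assign_cluster(txn, tol.pop(txn))


log = [
    ("T1", "start", None), ("T1", "read", "a"), ("T1", "write", "b"), ("T1", "commit", None),
    ("T2", "start", None), ("T2", "read", "c"), ("T2", "write", "d"), ("T2", "commit", None),
    ("T3", "start", None), ("T3", "read", "b"), ("T3", "write", "c"),
    ("T3", "read", "e"), ("T3", "commit", None),
    ("T4", "start", None), ("T4", "write", "z"), ("T4", "abort", None),
    ("T5", "start", None), ("T5", "write", "q"), ("T5", "commit", None),
]
clusterer = LogClusterer()
clusterer.process(log)
print(clusterer.tcl)
```

In this example T3 reads b and writes c, which merges the clusters created by T1 and T2, while T5 stays in a separate cluster. If T1 were identified as the attacker, `clusterer.tcl["T1"]` would name the only cluster that needs to be scanned, and the operations of T5 could be skipped.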

V. PERFORMANCE ANALYSIS

A simulation model was developed to compare the performance of the clustering approach with the traditional log based recovery method. The simulation program was executed in two phases. The first phase created a strict serializable history; the second phase performed clustering and damage assessment. The transactions were executed following a strict two-phase locking protocol. The transaction ID of the attacking transaction was pre-specified in the program. Traditional log based recovery techniques always scan the operations of all transactions from the point of attack to the end of the log, and recovery is then done accordingly. For this case, the total number of read and write operations of the transactions was calculated from the point of attack to the end of the log. For the clustering model, only those clusters containing the attacking transaction were accessed, and counts of the total number of affected and unaffected operations were taken.

The simulation was run in two main variations. The first examined the effect of the attacker's position in the log: the attacker ID was varied while the total number of transactions, the total number of data items, and the maximum number of items accessible by a transaction were kept fixed in each run of the program. The second examined the effect of the number of executed transactions: the total number of transactions was changed in each run while the attacker ID, the number of data items, and the maximum number of data items accessible by a transaction were kept fixed.

A. Calculation Methods

For the calculation of page access time, the following system-dependent parameters were used:

Space taken by a read operation record of a transaction in the log (RD) = 40 bytes,

Space taken by a write operation record of a transaction in the log (WR) = 60 bytes,

Page Size (PS) = 1024 bytes, and

Page Access Time (PT) = 20 milliseconds.

For traditional log based recovery methods, the recovery process begins by scanning all operations of transactions from the point of attack to the end of the log. To calculate the total page access time or log access time for this model, all read and write operations of transactions starting from the attacking transaction to the last transaction in the log were considered. Using the fixed space occupied by each of these operations as listed above, the size of the part of the log that needs to be scanned and hence the estimated total page access time was calculated. For the clustering approach, the total number of affected operations was calculated in each cluster from the attacking transaction up to the end of each cluster. The calculation of time for the damage assessment was done in the following manner.

Let R denote the total number of affected read operations and W the total number of affected write operations. Then the total page access time is calculated as follows:

Total space for read records = R * RD bytes
Total space for write records = W * WR bytes
Total amount of space needed to be scanned, T = R * RD + W * WR bytes
Number of pages needed, P = T / PS + 1
Total access time = P * PT

The above-mentioned steps were used to calculate the total log access time for the traditional log based methods. Similar calculation was done to calculate the access time for the clustering model, based on the total number of affected read and write operations. The number of pages to be read for each cluster was calculated and then summed up to calculate the total access time.
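The calculation can be written out as a short sketch using the parameter values listed above. The function names are hypothetical, and T / PS is interpreted as integer division, which is an assumption of this sketch since the formula does not state how fractional pages are handled.

```python
RD = 40    # bytes per read record in the log
WR = 60    # bytes per write record in the log
PS = 1024  # page size in bytes
PT = 20    # page access time in milliseconds


def access_time_ms(reads, writes):
    """Access time for scanning `reads` read records and `writes` write records."""
    total_bytes = reads * RD + writes * WR
    pages = total_bytes // PS + 1   # P = T / PS + 1, integer division assumed
    return pages * PT


def clustered_access_time_ms(per_cluster_counts):
    """Sum the per-cluster access times; each entry is (reads, writes)."""
    return sum(access_time_ms(r, w) for r, w in per_cluster_counts)


# 100 reads and 50 writes: T = 7000 bytes, P = 6 + 1 = 7 pages.
print(access_time_ms(100, 50))
# Two clusters: 700 bytes (1 page) and 1400 bytes (2 pages).
print(clustered_access_time_ms([(10, 5), (20, 10)]))
```

Because pages are counted per cluster, each accessed cluster contributes at least one page access regardless of how few affected operations it holds.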

B. Results

Figure 15.1 shows the access time comparison between our cluster based approach and the traditional log approach when the attacking transaction is varied. The other, fixed parameters were: 4000 total data items, 500 transactions, and a maximum of 40 data items accessible by a transaction. Since the transactions are interleaved, the access time is not directly dependent on the position of the attacking transaction; rather, it fluctuates as the attacker position changes.

Figure 15.2 was obtained by running the program with the same parameter values as before, except for the total number of data items, which was changed to 8000. Although the log access time was nearly the same as in the previous case, the access times for the cluster based model decreased drastically. The larger set of data items yielded fewer dependencies among the data items, thus giving rise to clusters with fewer affected operations.

Using the same parameter values as in the first case but changing the total number of transactions to 1000 incurred more dependencies among data items. Therefore, the cluster size also increased, resulting in a higher access time than before. Still, our model proved to be better than the traditional approach. Figure 15.3 displays the obtained result.

We ran the experiment again, fixing the total number of possible data items at 8000 and the maximum number of data items accessible by a transaction at 30, and setting the attacker ID to 950. This time, the total number of transactions executed was varied from 1000 to 1400 in increments of 100. The comparison result is illustrated in Figure 15.4.

VI. CONCLUSIONS

In a post-intrusion detection situation, fast and accurate recovery from malicious transactions is crucial for the survival of any information system. Since recovery algorithms require that the database log never be purged, the log grows out of proportion, and searching massive logs during damage assessment and recovery is very inefficient. In this paper, we have developed a model based on a data dependency approach to pre-determine possible paths of information flow in databases. This helps in determining the parts of the database that are not affected by an attack. We have presented an algorithm to cluster the log based on data dependency, thus grouping related operations together. Operations affecting each other directly or indirectly are stored in the same cluster. This enables us to skip various sections of the log during damage assessment and recovery. Through simulation, we compared the performance of our algorithm with the traditional method. The results confirm our claim that the proposed method accelerates the recovery process considerably. In situations where there are few dependencies among data items, the cluster sizes remain small, which results in dramatically lower access time than the traditional approach. However, even with the larger cluster sizes that result from more dependencies among data items, our model outperforms the traditional log based method.

VII. ACKNOWLEDGMENT

The authors wish to thank Dr. Robert L. Herklotz and Capt. Alex Kilpatrick for their support, which made this work possible.

VIII. REFERENCES

[1] P. Ammann, S. Jajodia, C. D. McCollum, and B. Blaustein, “Surviving information warfare attacks on databases,” In Proceedings of the 1997 IEEE Symposium on Security and Privacy, p.164-174, Oakland, CA, May 1997.

[2] P. A. Bernstein, V. Hadzilacos, and N. Goodman, “Concurrency Control and Recovery in Database Systems”, Addison-Wesley, Reading, MA, 1987.

[3] Leonard J. LaPadula, “State of the art in anomaly detection and reaction,” Technical Report, Center for Integrated Intelligence Systems, The Mitre Corporation, Bedford, MA, July 1999.

[4] R. Elmasri and S. B. Navathe, “Fundamentals of Database Systems”, Second Edition, Addison-Wesley, Menlo Park, CA, 1994.

[5] B. Mukherjee, L. T. Heberlein, and K. N. Levitt, “Network intrusion detection,” IEEE Network, Vol. 8, No. 3, p. 26-41, May/June 1994.

[6] R. Graubart, L. Schlipper, and C. McCollum, “Defending Database Management Systems Against Information Warfare Attacks”, Technical report, The MITRE Corporation, 1996.

[7] J. Gray and A. Reuter, “Transaction Processing: Concepts and Techniques”, Morgan Kaufmann, San Mateo, CA, 1993.

[8] S. Jajodia, C. D. McCollum and P. Ammann, “Trusted Recovery”, In Communications of the ACM, Vol. 42, No. 7, p. 71-75, July 1999.

[9] P. Liu, P. Ammann, and S. Jajodia, “Rewriting histories: recovering from malicious transactions,” Distributed and Parallel Databases, Vol. 8, No. 1, p. 7-40, January 2000.

[10] T. F. Lunt, “A Survey of Intrusion Detection Techniques”, Computers & Security, Vol. 12, No. 4, p. 405-418, June 1993.

[11] B. Panda and J. Giordano, “Reconstructing the Database after Electronic Attacks”, In Database Security XII: Status and Prospect, S. Jajodia (editor), Kluwer Academic Publishers, p. 143-156, 1999.

[12] B. Panda and J. Giordano, “An Overview of Post Information Warfare Data Recovery”, In Proceedings of the 1998 ACM Symposium on Applied Computing, Atlanta, GA, February 1998.

[13] S. Patnaik and B. Panda, “Dependency Based Logging for Database Survivability from Hostile Transactions”, In Proceedings of the 12th International Conference on Computer Applications in Industry and Engineering, Atlanta, GA, Nov. 1999.