SLBA: A Security Load-balancing Algorithm for Structured P2P Systems


Available at http://www.Jofcis.com


Wei MI∗, Chunhong ZHANG, Xiaofeng QIU

School of Information and Communication Engineering, Beijing University of Posts and Telecommunications, Beijing 100876, China

Abstract

Dynamic load balancing is a key adaptation mechanism often deployed in networking and computing systems. Numerous proposals exist for load balancing in peer-to-peer (P2P) networks, and all of them enhance the availability of the P2P system to some extent. However, little attention has been paid to the security threats introduced by load balancing. This paper analyzes the security vulnerabilities of typical DHT load balancing mechanisms and then proposes an algorithm that provides good performance without diluting security. Our algorithm, SLBA, achieves load balance through targeted interval ID generation and a load transfer algorithm with a higher convergence rate. It limits any fundamental decrease in security by basing each node's set of identifiers on a single certificate and by considering a security factor in load transfer among nodes. Performance evaluation shows that, compared to the classical algorithms, SLBA balances load effectively and significantly improves both the convergence rate and the security of load balancing.

Keywords: P2P; Load Balancing; Security

1 Introduction

Decentralized structured overlays and distributed hash tables (DHTs) proffer a unique vision of computing: a collection of computing and communication resources shared by active users. However, nodes are heterogeneous, the workload assigned to the system may be heavy-tailed, and node availability and churn rates may change over time. Load balancing is a key step towards adapting to these characteristics and ensuring reliability and availability.

There is a large body of literature on load balancing in DHT systems, and the proposed schemes can be broadly characterized as ID manipulating solutions and virtual server solutions. ID manipulating solutions [1-3] balance the load owned by nodes by carefully assigning and reassigning node IDs. To balance load, many node IDs need to be reassigned when a node joins or leaves the overlay, which results in transferring a lot of data. In virtual server solutions [4-9], each physical node runs multiple virtual DHT servers in proportion to its capacity. Virtual servers can also be created, deleted and transferred dynamically based on the changing load distribution.

⋆ This work is supported by China-Finland Cooperation Project (No. 2010DFA12780), National Key Program (No. 2011ZX03005-004-02) and Key Laboratory of Universal Wireless Communications, Ministry of Education.

∗ Corresponding author.

Email address: [email protected] (Wei MI).

However, these solutions have several deficiencies. First, the convergence rate is still low, which requires more detection messages and a higher detection overhead. Second, the algorithms rely on some fixed nodes to collect load information and generate the reassignment policy, which requires additional equipment and higher cost. Third, and most critically, they give up important security properties. A key issue in the operation of a P2P network is whether or not one assumes it may contain malicious nodes. A malicious node can subvert content or attempt to control particular portions of the identifier space.

In this paper, we analyze the security vulnerabilities of typical DHT load balancing solutions. Then, we propose SLBA, a security load balancing algorithm for DHTs that supports wide variation in skew, heterogeneity, and churn while retaining security. At a high level, SLBA works as follows: (1) at joining time, a unique semi-CA server uses targeted interval verifiable ID generation to generate a set of verifiable IDs for the node, which limits any fundamental decrease in security and greedily reduces discrepancies between capacity and load; (2) at run time, a node experiencing overload executes the security-aware load transfer algorithm, which significantly raises the convergence rate and the security of load transfer.

This paper proceeds as follows. In Section 2, we introduce the related work. In Section 3, we analyze the security vulnerabilities of load balancing solutions. In Section 4, we present the SLBA algorithm in detail. In Section 5, we evaluate the performance of SLBA. Finally, we conclude and present future work in Section 6.

2 Related Work

ID manipulating load balancing. These solutions balance the load owned by nodes by carefully assigning and reassigning node IDs. ID manipulation causes too much additional overhead if the load balancing requirement is stringent. To balance load, many node IDs need to be reassigned when a node joins or leaves the overlay, which results in migrating a lot of data. It also increases maintenance overhead, because the changes of node IDs require many messages to update routing tables.

Reference [1] proposes the use of the "power of two choices" paradigm to achieve better load balancing. Each object is hashed to d ≥ 2 IDs and is placed in the namespace of the least loaded node. Reference [2] introduces a scheme in which a physical host maintains a set of virtual servers that have overlapping links in the routing table. However, there will still be overloaded nodes. Reference [3] proposes an algorithm for ID space balancing. It assigns multiple positions of the ID space to every node, but chooses only one of those virtual nodes to become active at a time. However, this algorithm needs to adjust node IDs frequently, causing a higher load regulation overhead.
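As an illustration of the "power of two choices" placement described for [1], the following minimal Python sketch hashes each object to d candidate IDs and stores it on the least loaded owner. The salted SHA-1 hashing, the 16-bit ID space, and the names candidate_ids, successor and place are assumptions made for this example and are not taken from [1].

```python
import bisect
import hashlib

ID_BITS = 16  # size of the toy ID space (assumption)

def candidate_ids(key, d):
    """Hash key with d different salts to obtain d candidate IDs."""
    ids = []
    for salt in range(d):
        digest = hashlib.sha1(f"{salt}:{key}".encode()).digest()
        ids.append(int.from_bytes(digest, "big") % (1 << ID_BITS))
    return ids

def successor(node_ids, ident):
    """First node ID clockwise from ident; node_ids must be sorted."""
    i = bisect.bisect_left(node_ids, ident)
    return node_ids[i % len(node_ids)]

def place(key, node_ids, load, d=2):
    """Store key on the least loaded of the nodes owning its d candidate IDs."""
    owners = {successor(node_ids, c) for c in candidate_ids(key, d)}
    target = min(owners, key=lambda n: (load[n], n))  # ties broken by node ID
    load[target] += 1
    return target
```

With d = 1 the sketch degenerates to ordinary consistent hashing, which makes it easy to compare the resulting load spread against d = 2 on the same set of keys.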

Virtual server load balancing. In these solutions, each physical node runs a number of virtual servers proportional to its capacity, so the load of a physical node is the sum of the load segments owned by its virtual servers. Based on the changing load distribution, virtual servers can be created, deleted and transferred dynamically.


Both CAN [4] and Chord [5] achieve load balancing under the assumption that all nodes have equal capacity. CFS [6] simplifies load transfer by removing virtual servers, which may overload other nodes and lengthen the convergence time.

Reference [7] describes three load balancing algorithms based on virtual servers and directories: one-to-one, one-to-many, and many-to-many. They are designed for static heterogeneous networks and are extended to dynamic heterogeneous networks in [8]. The assignment of virtual nodes is typically performed by one or more directory nodes, which can become a single point of failure.

Reference [9] builds a k-ary tree on top of the P2P network, which is responsible for collecting and releasing node information as well as for the virtual server transfer strategy. The algorithm makes the network structure more complicated, so the balancing speed and fault tolerance are degraded.

Security models for load balancing. There are only a few studies on security models for load balancing in distributed systems. Based on the virtual server idea, reference [10] proposes k-Choices, a load balancing algorithm for structured overlays that retains the security afforded by verifiable IDs. However, it only considers the security of node joining; security vulnerabilities remain in load distribution collection and in load transfer strategic decisions and execution.

3 Security Vulnerabilities Analysis

To achieve load balancing, these solutions balance the namespace or adjust the number of documents per node by transferring virtual servers or by multi-hashing, but their emphasis differs.

ID manipulating solutions have two main characteristics: (1) they need to measure and calculate the changing load distribution; (2) they balance the load owned by nodes by carefully assigning and reassigning node IDs. Virtual server solutions have three main characteristics: (1) each node runs multiple virtual servers; (2) they need to measure and calculate the changing load distribution; (3) based on the changing load distribution, virtual servers can be created, deleted and transferred dynamically.

A key issue in P2P is whether or not one assumes the network may contain malicious nodes. A malicious node can subvert content or attempt to control particular portions of the namespace. Based on these main characteristics, our exposition focuses on the security vulnerabilities of load balancing policies.

Node ID generation and ID assignment. To achieve good performance, those solutions let nodes join as normal and reactively position nodes at arbitrary locations in the namespace. Arbitrarily choosing IDs forfeits an important security goal for P2P. Attacks that center around the falsification of a node's identifier are called Sybil [11] and ID mapping [12] attacks, and load balancing solutions may facilitate the execution of these attacks. Douceur [11] points out that having a logically central, trusted authority issue IDs is the only practical way to guarantee a one-to-one correspondence between IDs and physical entities.

Node Joins, Leaves and Churn. In virtual server solutions, when a node departs, it must take its log(N) virtual servers (VSs) with it, causing log(N) times more adjustments to be made. As a result, a churn attack [13] becomes easier to implement and more efficient.

Load distribution collection and data correctness and confidentiality. To balance load through valid load transfer strategic decisions, the system must guarantee that the collected load distribution is valid.

Load transfer strategic decisions and execution and nodes' security levels. During load transfer, the security level of the nodes involved in the transfer is important for transfer security. Each overlay node is also a node in the traditional network, with its own operating system and network protocols. If a node has low security, attackers may intrude into this weak node and then penetrate the whole P2P system through it. In P2P, the reputation [14] of a node can represent its security level. Reputation is a long-term evaluation: a node's behavior is restricted on the basis of the evaluation, or the reputation of the node is provided as a reference when choosing a node to cooperate with.

4 SLBA Design

Considering the security vulnerabilities of load balancing policies, we design a security load balancing algorithm for DHTs called SLBA. Based on virtual servers, SLBA includes a novel verifiable virtual ID generation scheme (targeted interval verifiable ID generation) and a secure virtual server transfer algorithm (the security-aware load transfer algorithm).

A. Targeted interval verifiable ID generation

Arbitrarily choosing IDs forfeits an important security goal for P2P, and having a logically central, trusted authority issue IDs is the only practical way to defend against ID attacks. We therefore propose that virtual IDs are generated by a central semi-CA server. This option is scalable, because each node contacts the server only when it joins, leaves, or transfers virtual servers.

In a DHT system, data is distributed under a uniformly random or Zipf query distribution [15]. Under both distributions, the proportion of data a node is responsible for strongly depends on the proportion of the hash space the node owns. Thus, hash space assignment during node join is very important for load balancing. We achieve namespace balance through a special ID generation mode and examine load balancing under uniformly random and Zipf queries.

To evaluate the degree of balance of the load distribution precisely, we need a mathematical measure. Suppose nodes are indexed from 1 to N. The capacity of node i is $C_i$. Each node's capacity can be estimated by the node itself or by the operator, using the same standard. $RSpace_i$ is the proportion of the hash space that node i actually owns. The proportion of the hash space that node i should own is

$$OSpace_i = \frac{C_i}{\sum_{k=1}^{N} C_k}.$$

We define a node's LB (Load Balancing Factor) as

$$LB_i = \frac{RSpace_i}{OSpace_i} - 1,$$

so an LB closer to zero means that the load distribution is more balanced.
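The LB definition above can be checked numerically with a few lines of Python. This is only an illustrative sketch: the function name load_balancing_factors and the sample capacities and hash space shares are assumptions chosen for the example.

```python
def load_balancing_factors(capacities, rspace):
    """Return LB_i = RSpace_i / OSpace_i - 1 for every node.

    capacities[i] is C_i; rspace[i] is the fraction of the hash space
    node i actually owns (the fractions should sum to 1).
    """
    total = sum(capacities)
    ospace = [c / total for c in capacities]  # share each node should own
    return [r / o - 1 for r, o in zip(rspace, ospace)]

# Node 0 owns twice its fair share, node 2 only half of it.
print(load_balancing_factors([1, 1, 2], [0.5, 0.25, 0.25]))  # [1.0, 0.0, -0.5]
```

A positive LB marks a node that owns more hash space than its capacity warrants, and a negative LB marks an under-used node.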

To make sure that a node join does not break the load balance, virtual IDs are generated according to the current load distribution. Suppose the nodes already in the overlay are indexed from 1 to N-1 and the joining node is N. When node N joins, it first sends a single unit of certified information to the semi-CA server, and the server runs targeted interval verifiable ID generation to generate virtual IDs for node N.

Procedure targeted interval verifiable ID generation
    if N = 1
        generate C_1 * V random virtual IDs for node N
    else
        Compute the hypothetical LB of nodes 1 to N-1 in the system:
            LB_i = RSpace_i / OSpace_i - 1;
        Create new virtual IDs for the new node N:
        while (RSpace_N < OSpace_N)
            Select node x:
                LB_x = max(LB_i);
            Select the biggest VID of node x:
                Space(VID) = max(Space(x's virtual IDs));
            Create a new virtual ID (xID):
                xID = rand(predecessor(VID), VID);
            Update LB_x, RSpace_x, RSpace_N:
                RSpace_x = RSpace_x - Space(xID);
                LB_x = RSpace_x / OSpace_x - 1;
                RSpace_N = RSpace_N + Space(xID);
        end while
    end if
    LB_N = RSpace_N / OSpace_N - 1;

In this algorithm, we first find the node x whose LB is largest and select its virtual server VID that owns the largest space. Then, we generate the new ID xID randomly within the targeted interval that virtual server VID is responsible for, that is, between predecessor(VID) and VID.
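The procedure can be sketched in Python as follows. This is a simplified illustration rather than the semi-CA server itself: certificate handling is omitted, and the 32-bit ring, the integer capacities, the seeded random generator and the class and method names (Overlay, join, rspace) are assumptions made for the example.

```python
import bisect
import random

RING = 2 ** 32  # size of the identifier space (assumption)

class Overlay:
    def __init__(self, seed=0):
        self.rng = random.Random(seed)
        self.vids = []      # sorted virtual IDs on the ring
        self.owner = {}     # virtual ID -> physical node
        self.capacity = {}  # physical node -> integer capacity C_i

    def _width(self, vid):
        """Number of ring positions between predecessor(vid) and vid."""
        i = bisect.bisect_left(self.vids, vid)
        pred = self.vids[i - 1]          # index -1 wraps around the ring
        return (vid - pred) % RING or RING

    def rspace(self, node):
        """RSpace: fraction of the ring owned by the node's virtual IDs."""
        return sum(self._width(v) for v in self.vids
                   if self.owner[v] == node) / RING

    def _add(self, vid, node):
        bisect.insort(self.vids, vid)
        self.owner[vid] = node

    def join(self, node, capacity, v=4):
        self.capacity[node] = capacity
        if not self.vids:
            # First node: C_1 * V random virtual IDs.
            for vid in self.rng.sample(range(RING), capacity * v):
                self._add(vid, node)
            return
        total = sum(self.capacity.values())
        ospace = {n: c / total for n, c in self.capacity.items()}
        others = [n for n in self.capacity if n != node]
        while self.rspace(node) < ospace[node]:
            # Node x with the largest hypothetical LB.
            x = max(others, key=lambda n: self.rspace(n) / ospace[n] - 1)
            # Virtual ID of x that covers the largest interval.
            vid = max((u for u in self.vids if self.owner[u] == x),
                      key=self._width)
            width = self._width(vid)
            if width < 2:
                break  # interval cannot be split any further
            # New ID strictly between predecessor(vid) and vid.
            self._add((vid - self.rng.randrange(1, width)) % RING, node)
```

For example, after `ov = Overlay(seed=1)` and three joins `ov.join("a", 2)`, `ov.join("b", 1)`, `ov.join("c", 1)`, node c stops acquiring IDs as soon as its share of the ring reaches its target share of 1/4.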

B. Security-aware load transfer algorithm

Once the load balance is broken, virtual IDs are transferred from heavily loaded nodes to lightly loaded nodes according to the current load distribution. However, in DHT systems the load distribution is unpredictable, so the key problem is how to find lightly loaded nodes quickly and correctly. Furthermore, there are also security vulnerabilities in load distribution collection and in load transfer strategic decisions and execution. For this reason, we propose the security-aware load transfer algorithm (SALT).

In SALT, we adopt ant colony optimization [16], which can quickly and correctly find candidate lightly loaded nodes in a system with unknown load distribution. In addition, both the load status and the reputation of nodes are used in the discovery of lightly loaded nodes, so SALT can achieve good load balancing while ensuring the security of load transfer.

Considering the real load skew, we define the node utilization rate $\mu$ as the ratio of the node's load L to its maximum carrying capacity C, that is, $\mu = L/C$. The system utilization rate can be described as

$$\bar{\mu} = \frac{\sum_{i=1}^{N} L_i}{\sum_{i=1}^{N} C_i}.$$

To assure QoS, we set a threshold LT for $\mu$ and prevent a node's $\mu$ from becoming higher than LT.

According to ant colony optimization, we also define some related terms and parameters.

Define 1 Pheromone

speed faster and the cost lower, the production and updating of pheromone are completed together with the DHT node routing table update process.

Define 2 Heuristic

We define the safety factor as the reputation of a node. The reputation (rep) is introduced in routing selection as a heuristic factor, which helps guarantee the security of load transfer.

Define 3 Forward probability $p_k(i, j)$

In DHT routing, the next hop is determined according to the pheromone ph and the heuristic rep. Suppose that node i receives ant k; it selects neighbor j as the next hop according to $p_k(i, j)$.

$$
p_k(i, j) =
\begin{cases}
\dfrac{ph(j)^{\alpha} \cdot rep(j)^{\beta}}{\sum_{u \in routTable(i)} ph(u)^{\alpha} \cdot rep(u)^{\beta}}, & ph(j) > 0 \ \&\ j \in routTable(i) - tabu(k) \\[2ex]
-\dfrac{|ph(j)|^{\alpha} \cdot rep(j)^{\beta}}{\sum_{u \in routTable(i)} |ph(u)|^{\alpha} \cdot rep(u)^{\beta}}, & ph(j) < 0 \ \&\ j \in routTable(i) - tabu(k) \\[2ex]
0, & \text{others}
\end{cases}
\tag{1}
$$

Where α and β are the relative importance factors of the pheromone and the heuristic; routTable(i) is the routing table of node i; tabu(k) is the taboo list of search ant k.
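A direct transcription of formula (1) into Python may help to see how the three cases interact. The sketch makes one assumption that the formula does not state: the normalizing sum uses |ph(u)| in both branches, because a negative pheromone raised to a fractional α is not real-valued. The function name forward_probability and the dictionary-based inputs are also assumptions of this example.

```python
def forward_probability(j, neighbors, tabu, ph, rep, alpha=1.0, beta=1.0):
    """Forward probability p_k(i, j) for a search ant at node i.

    neighbors is routTable(i), tabu is tabu(k), and ph and rep map node
    IDs to pheromone and reputation values.
    """
    if j not in neighbors or j in tabu or ph[j] == 0:
        return 0.0  # the "others" case
    # Assumption: absolute pheromone values in the normalizer of both branches.
    denom = sum(abs(ph[u]) ** alpha * rep[u] ** beta for u in neighbors)
    weight = abs(ph[j]) ** alpha * rep[j] ** beta / denom
    return weight if ph[j] > 0 else -weight

ph = {"a": 1.0, "b": 1.0}
rep = {"a": 1.0, "b": 3.0}
print(forward_probability("b", ["a", "b"], set(), ph, rep))  # 0.75
```

With equal pheromone on both neighbors, the higher reputation of b dominates the choice, which is the intended effect of using rep as the heuristic factor.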

Define 4 Constraints (s.t.)

When searching for lightly loaded nodes, the target nodes must meet the following constraints (s.t.):

$$
\text{s.t.}\quad
\begin{cases}
L(s) - Load\_transfer \le C(s) \cdot LT \\
\forall m \in M:\ L(m) + Load\_transfer(m) \le C(m) \cdot \bar{\mu} \\
\sum_{m=1}^{M} Load\_transfer(m) \ge Load\_transfer \\
\forall m \in M:\ rep(s) \le rep(m)
\end{cases}
\tag{2}
$$

Where Load_transfer is the total load that the source node s transfers out, and Load_transfer(m) is the load that node m receives. The constraints ensure that the overloaded node's µ drops to LT or below, that a light node's µ does not exceed $\bar{\mu}$, and that each light node's reputation is no lower than the source node's.
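The four constraints in (2) translate into a small feasibility check. The sketch below is illustrative only; the function name feasible, the dictionary layout and the sample numbers are assumptions, with received playing the role of the set M together with Load_transfer(m).

```python
def feasible(s, transfer, received, load, cap, rep, lt, mu_bar):
    """Check constraints (2) for source node s.

    transfer is the total load s gives away; received maps each target
    node m to the load it would take over.
    """
    source_ok = load[s] - transfer <= cap[s] * lt
    targets_ok = all(load[m] + x <= cap[m] * mu_bar
                     for m, x in received.items())
    covered = sum(received.values()) >= transfer
    trusted = all(rep[s] <= rep[m] for m in received)
    return source_ok and targets_ok and covered and trusted

load = {"s": 90, "m1": 20, "m2": 30}
cap = {"s": 100, "m1": 100, "m2": 100}
rep = {"s": 2, "m1": 3, "m2": 2}
print(feasible("s", 15, {"m1": 10, "m2": 5}, load, cap, rep, 0.8, 0.5))  # True
```

Lowering the reputation of m2 below that of s, or asking m1 alone to absorb more than its spare room under the system utilization rate, makes the same call return False.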

Once the capacity utilization of node FN reaches LT, FN runs the security-aware load transfer algorithm.

Procedure security-aware load transfer algorithm

1) Node FN generates the kth search ant (k is initialized to 1) and sets its antID, constraints s.t., tabu list, TTL, and the pheromone and heuristic list of visited nodes;

2) According to formula (1), node FN chooses the neighbor j whose $p_k(i, j)$ is the kth largest as the next hop and forwards search ant k to node j;

3) On receiving ant k, node j puts its ID, ph(j) and rep(j) into the tabu list and the pheromone and heuristic list; according to s.t., node j judges whether it is a valid candidate node;

a. if node j meets s.t., it generates a guide ant that returns to node FN directly, sets TTL=0, and stops forwarding search ant k.

b. if node j does not meet s.t. and all its neighbors are in the tabu list, it generates a guide ant that returns to node FN directly, sets TTL=0, and stops forwarding ant k.

c. if node j does not meet s.t. and not all its neighbors are in the tabu list, it sets TTL=TTL-1; if TTL=0, it generates a guide ant that returns to node FN directly; otherwise node j chooses the neighbor m whose $p_k(j, m)$ is the largest as the next hop, forwards ant k to it, and node m repeats step 3;

4) On receiving a guide ant, source node FN forms a list of all candidate nodes, selects the target node, and then performs the load transfer.

5) Source node FN calculates its new µ; if µ > LT, then k = k + 1 and go to step 1; otherwise the algorithm ends.
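The control flow of steps 1) to 5) can be sketched in Python under several simplifying assumptions that are not part of the procedure above: load is moved as a plain number rather than as virtual servers, pheromone values are positive and static, node FN itself is placed in the tabu list, and a node counts as a candidate when its reputation is at least that of FN and it has spare room below the system utilization rate. The names search_ant and salt are hypothetical.

```python
def search_ant(fn, k, route, ph, rep, is_candidate, ttl, alpha=1.0, beta=1.0):
    """Steps 1-3: walk search ant k from FN; return a candidate or None."""
    def ranked(i, tabu):
        # Neighbors of i outside the tabu list, best forward weight first.
        # With positive pheromone, ranking by ph^alpha * rep^beta equals
        # ranking by p_k(i, j), because the denominator is shared.
        free = [j for j in route[i] if j not in tabu]
        return sorted(free, key=lambda j: ph[j] ** alpha * rep[j] ** beta,
                      reverse=True)

    tabu = {fn}                  # assumption: the ant never revisits FN
    first = ranked(fn, tabu)
    if k > len(first):
        return None
    j = first[k - 1]             # step 2: k-th largest p_k(FN, j)
    while True:
        tabu.add(j)              # step 3: record the visit
        if is_candidate(j):      # 3a: constraints met, guide ant returns
            return j
        nxt = ranked(j, tabu)
        if not nxt:              # 3b: every neighbor already visited
            return None
        ttl -= 1                 # 3c: continue only while TTL lasts
        if ttl <= 0:
            return None
        j = nxt[0]

def salt(fn, route, ph, rep, load, cap, lt, mu_bar, ttl):
    """Steps 4-5: send ants until FN's utilization is no longer above LT."""
    k = 1
    while load[fn] > cap[fn] * lt and k <= len(route[fn]):
        excess = load[fn] - cap[fn] * lt

        def is_candidate(m):
            return rep[m] >= rep[fn] and load[m] < cap[m] * mu_bar

        m = search_ant(fn, k, route, ph, rep, is_candidate, ttl)
        if m is not None:
            moved = min(excess, cap[m] * mu_bar - load[m])
            load[fn] -= moved
            load[m] += moved
        k += 1
    return load
```

In this sketch each ant returns at most one candidate, so step 4 reduces to transferring to that node; a fuller implementation would collect candidates from several guide ants before choosing a target.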

5 Performance Evaluation

This section compares the load distribution and load balancing overhead of SLBA's targeted interval verifiable ID generation with those of typical virtual ID generation, and evaluates the convergence rate and load balancing security of the security-aware load transfer algorithm. Basic parameters are listed in Table 1.

Table 1: Basic experiment parameters

| Parameters | Description | Value |
|---|---|---|
| N | Node number | $2^{14}$ |
| Load | Rate of DHT put/get/remove operations (times per second per node) | [100, 5000] |
| $\bar{C}$ | The mean of system capacity | |
| C | Node's capacity | [0.5 $\bar{C}$, 2 $\bar{C}$] |
| V | The number of virtual servers per capacity in network initialization | log N, 2 log N, 4 log N |

A. Targeted interval verifiable ID generation

To evaluate the effect of load balancing, we compare the load distribution and load balancing overhead of SLBA's targeted interval verifiable ID generation with those of typical virtual ID generation in the Chord context. Our solution adopts a novel algorithm to generate virtual IDs, whereas typical virtual IDs are generated randomly. The ID manipulating solution can distribute load as evenly as the other two solutions, but it is not practical because of its overhead. It is therefore not compared here.

1) Load distribution

Fig.1 shows the maximum and minimum LB. Fig.2 shows the empirical CDF (Cumulative Distribution Function) of LB. As shown, with targeted interval verifiable ID generation the LBs are much closer to zero and the maximum LB is greatly reduced, which means that our ID generation can increase system capacity considerably. As V increases, the load balancing effect of both ID solutions improves.

2) Load balancing overhead

The virtual server solution has to increase the number of DHT nodes considerably, which increases routing and maintenance overhead. This overhead is proportional to the number of virtual servers, so it is evaluated with the average number of virtual servers per capacity.

Fig.3 shows the average number of virtual servers per capacity. The value for targeted interval verifiable ID generation is about 40% of that for typical virtual ID generation. So, compared with typical virtual IDs, our ID solution reduces the balancing overhead by about 60%.

Fig.1: The maximum and minimum LB

Fig.2: The empirical CDF of LB

Fig.3: The average of virtual server number per capacity

B. Security-aware load transfer algorithm

This section compares the convergence rate and load balancing security of the security-aware load transfer algorithm (SALT-algorithm), the O2O-algorithm [7] (One-to-One) and the M2M-algorithm [7] (Many-to-Many). SALT and O2O follow a fully distributed model, while M2M follows a centralized, semi-distributed model; all are typical virtual server solutions. Parameters are listed in Table 2.

To meet a stringent load balancing requirement, all solutions have to adjust the load by transferring virtual servers, which greatly increases the candidate node discovery overhead and the data migration overhead. Furthermore, reassigning load among nodes without considering the safety factor makes the transferred contents insecure. Consequently, we evaluate the convergence rate, the data migration overhead and the security factor increment. Here, the convergence rate is measured by the number of detection hops needed for candidate node discovery per heavy node.

Table 2: Balancing algorithm parameters

| Parameters | Description | Value |
|---|---|---|
| µ | Node utilization rate | [0,1] |
| rep | The quantified reputation | [1,5] |
| TTL | Maximal survival time of ants | $\log_2 N$ |
| α, β | Relative importance factor of pheromone or heuristic | [0,1] |
| dNum | Directory nodes number in M2M | $\log_2 N$ |

Fig.4: The convergence rate of load balancing

Fig.5: Security factor increment of load transfer

As shown in Fig.4 and Fig.5, the experimental results are as follows:

• The mean numbers of detection hops per heavy node for the SALT, O2O and M2M algorithms lie in the ranges [15,20], [25,45] and [110,150] respectively, indicating that the SALT algorithm makes the search for lightly loaded nodes more targeted. This is because the ant-based technique is an indirect form of collaboration that avoids blind search, so it improves the convergence rate.

• The security factor increment per data migration of our SALT algorithm is clearly larger than that of the other two algorithms. This is because, in SALT, candidate node discovery takes the safety factor into account, which makes a compromise between load balancing effect and load balancing security.

6 Conclusion and Further Work

This paper has analyzed the security vulnerabilities of typical DHT load balancing mechanisms and proposed a security load balancing algorithm for DHTs called SLBA. SLBA includes the targeted interval verifiable ID generation mode and the security-aware load transfer algorithm, which together provide good performance without diluting security. Performance evaluation shows that, compared to the classical algorithms, SLBA balances load effectively and significantly improves the convergence rate and the security of load balancing. In this paper, we only roughly analyze the security vulnerabilities of typical DHT load balancing mechanisms. Future work should examine how the introduced security mechanisms affect load balancing and overhead. We are also implementing SLBA and will test its performance in the future.

References

[1] J. Byers, J. Considine, "Simple load balancing for distributed hash tables", LNCS, 2003, pp. 80-87.

[2] P. Godfrey and I. Stoica, "Heterogeneity and load balance in distributed hash tables", INFOCOM 2005, 2005, vol. 1, pp. 595-606.

[3] D. Karger, M. Ruhl, "Simple efficient load balancing algorithms for peer-to-peer systems", Theory of Computing Systems, 2006, vol. 39, pp. 787-804.

[4] S. Ratnasamy, P. Francis, "A scalable content-addressable network", Proceedings of the 2001 Conference on Applications, Technologies, Architectures, and Protocols for Computer Communications, 2001, pp. 161-172.

[5] I. Stoica, R. Morris, "Chord: A scalable peer-to-peer lookup service for Internet applications", Proc. of the 2001 Conf. on Applications, Technologies, Architectures, and Protocols for Computer Communications, 2001, pp. 149-160.

[6] F. Dabek, M. F. Kaashoek, "Wide-Area Cooperative storage with CFS", ACM SIGOPS Operating Systems Review, 2001, vol. 35, no. 5, pp. 202-215.

[7] A. R. Karthik, K. Lakshminarayanan, "Load balancing in structured P2P systems", LNCS, 2003, pp. 68-79.

[8] B. Godfrey, K. Lakshminarayanan, "Load balancing in dynamic structured P2P systems", INFOCOM 2004, 2004, vol. 4, pp. 2253-2262.

[9] Y. Zhu, Y. Hu, "Efficient, proximity-aware load balancing for DHT based P2P systems", IEEE Trans. on Parallel and Distributed Systems, 2005, vol. 6, no. 4, pp. 349-361.

[10] J. Ledlie, M. Seltzer, "Distributed, secure load balancing with skew, heterogeneity and churn", INFOCOM, 2005, vol. 2, pp. 1419-1430.

[11] J. Douceur, "The Sybil Attack", Peer-to-Peer Systems, Springer, 2002, pp. 251-260.

[12] D. Cerri, A. Ghiono, "ID Mapping Attacks in P2P Networks", IEEE Globecom, 2005, vol. 3, pp. 6.

[13] E. Sit, R. Morris, "Security considerations for peer-to-peer distributed hash tables", Future Directions in Distributed Computing, Springer-Verlag, 2003, pp. 103-107.

[14] J. van Vroonhoven, "Peer to Peer Security", 4th Twente Student Conference on IT, Enschede, 2006.

[15] F. Bustamante and Y. Qiao, "Friendships that last: Peer lifespan and its role in P2P protocols", Eighth International Workshop on Web Content Caching and Distribution, Hawthorne, NY, October 2003.

[16] M. Dorigo, M. Birattari, T. Stutzle, "Ant colony optimization", Computational Intelligence Magazine, IEEE, 2006, vol. 1, pp. 28-39.