Enhanced Management Method of Storage Area Network (SAN) Server with Random Remote Backups

(1)

ELSEVIER

SCIENCE ~__DIRECT"

C O M P U T E R

MODELLING Mathematical and Computer Modelling 42 (2005) 947 958

www.elsevier.com/locate/mcm

E n h a n c e d M a n a g e m e n t M e t h o d of

Storage Area Network ( S A N ) Server

with R a n d o m R e m o t e Backups

S O N G - K Y O O K I M

Mobile Communication Division, Samsung Electronics 94-1 Imsudong, Oumi, Gyeongbook 730-350, South Korea

amang, kim~s amsung, c o m

(Received December 2003; revised and accepted June 2005)

A b s t r a c t - - A SAN (storage area network) is a mass storage solution, t h a t is, designed to provide a high-reliability and high-speed network over fiber optic link. This paper is dealing with the stochastic SAN server management method which is focused on reliability. The (remote) backup servers are hooked up by the long-haul network and replace broken main severs immediately. If the SAN s e r v e r s are represented as "machines", this system can be solved by using t h e stochastic maintenance model with main unreliable and random auxiliary spare (remote backup) machines, subject to random breakdowns, repairs, and two replacement policies: one for busy and another for idle or vacation periods. When the repair facility is not available because of the given conditions, auxiliary machines are being used for backups. Unlike existing models, the availability of auxiliary machines is changed for each activation of the system. Analytically tractable results are obtained by using a duality principle (which enables us to treat a more rudimentary system), semiregenerative analysis, and multivariate marked renewal processes. The results are demonstrated in the framework of optimized SAN server allocation problems with unreliable backup servers. @ 2005 Elsevier Ltd. All rights reserved.

K e y w o r d s - - S t o c h a s t i c maintenance, SAN management, Availability, Stochastic optimization, Closed queues, Duality principle.

1. I N T R O D U C T I O N

Optical networks provide enormous capacities in the networks and a c o m m o n infrastructure over which a variety of services can be delivered. Storage area network (SAN) is one of optical network applications to interconnect c o m p u t e r systems with other c o m p u t e r systems and peripheral equipments. A SAN architecture is identified by fibre channel and consists of one or more servers connected to storage devices via switching devices such as hubs, switches, and bridges. Con- nections between clients and storage devices are controlled by SAN servers t h a t should be very reliable and can s u p p o r t high-speed connections between storage a n d clients. SANs operate at bit rates ranging from 200 Mbps to 1 G b p s over fiber optic link. While the bit rate itself is relatively modest, the SAN is attractive because of the optical layer t h a t can s u p p o r t a huge number of such connections between two d a t a centers. Optical fibre offers much higher b a n d w i d t h t h a n copper cables and is less susceptible to various kinds of electromagnetic interference and other

(2)

948 STORAGE AREA NETWORK (SAN) SERVER

undesirable effects [1]. SAN server for resilience against disaster is a m a j o r issue of reliable network m a n a g e m e n t . We assume t h a t the remote backup SAN servers are geometrically separated with main SAN servers and hooked up by the long-haul networks over fiber optic links.

T h e stochastic model in this p a p e r is considered to improve the availability of SAN servers. Let us assume t h a t we have SAN and spare servers which are linked with long-haul networks. Once all m + 1 servers become intact, the system renders a checkup. Patching applications or upgrading the operating system is a very typical p a t t e r n of maintenance. During this period, the system borrows servers with the latest version of the o p e r a t i n g s y s t e m and applications from the t o t a l q u a n t i t y of from an external provider. Since the usage of external resources are r a n d o m because these are also unreliable. Our model provides decision factors such as the optimal control level, the n u m b e r of spare servers, failure rates, and so on, t h a t are beneficial for the SAN architecture. T h e mechanism t h a t is mentioned above can considered as the stochastic model t h a t consists of machines and a repair facility which fixes broken machines and goes on vacations when all machines are working properly. Unlike other existing models, the availability of auxiliary machines is changed for each activation and it considers the r a n d o m n u m b e r of external backups for m a i n t e n a n c e purposes beyond M / G / 1 closed queueing system.

If each SAN server is considered as "machines", the model forms a class of closed queueing systems with the initial quantity of m + 1 main unreliable machines, S r a n d o m auxiliary reserve machines, also called "super-reserve" machines. T h e concept of super-reserve machines is intoduced in [2]. Main machines represent the SAN servers and super-reserve machines means their remote backups. Main machines are subject to "exponential failures" and their repairs are rendered (in the F I F O order) by a single repair facility (also known as the " r e p a i r m a n " ) with generally distributed repair times. T h e r a n d o m super-reserve facility is "activated" and the re- p a i r m a n is idle or vacationing whenever the n u m b e r of m a i n machines is restored to its original quantity. Even t h o u g h the n u m b e r of super-reserve machines is r a n d o m , we assume t h a t the n u m b e r is fixed when the super-reserve machines are activated. During this time, dropped machines line up in the %vaiting room" until the whole super-reserve facility is exhausted and one more machine breaks down. Then, a r e p a i r m a n ' s busy period resumes and its cycle continues.

T h e classical model with m unreliable machines and no backup [3] is also described in terms of closed queueing systems, i.e., those with so-called "finite sources." and the r a n d o m availability of super-reserve machines makes our model different. T h e s u p p l e m e n t a r y variable m e t h o d and semi-regenerative analysis are used with additional results for semi-Markov processes and analytic solution is very appealing in optimization. T h e current m e t h o d o l o g y (without super-reserve facility) stems from [4-6] and further explored with super-reserve facility [2].

2. D U A L I T Y P R I N C I P L E : G E N E R A L

2.1. S t o c h a s t i c a l l y C o n g r u e n t M o d e l s

T h e duality principle we would like to begin with includes a n o t h e r model, which is more simple t h a n the m a i n one (Model 1) and to which we will refer as to Model 2. Model 2 is similar to Model 1, except t h a t it does not have the backup facility and idle periods. We rather associate it with r e p a i r m a n ' s vacations, which are distributed as regular repairs. However, upon his return, the r e p a i r m a n brings a b r a n d new machine, which replaces any one t h a t breaks down during his vacation trip if any such available. Otherwise, the new machine he brings in substitutes any other machine and in b o t h cases the old machine is disposed. Model 2 is directly connected with yet another model, which we will call Model 3. Model 3 is a regular multichannel queueing system, in notation, (GIo, G I ) / M / m / O (see A p p e n d i x A.1).

Model 2 a n d Model 3 are best observed in a parallel fashion [5]. T h e n u m b e r Zt of intact machines in Model 2 corresponds to the total n u m b e r of customers in Model 3. m - Zt gives the n u m b e r of defective machines waiting for repair in Model 2 and it also gives the n u m b e r

(3)

of available channel in Model 3. Whenever a machine of Model 2 breaks down, a customer of Model 3 is processed and leaves the system. A customer in Model 3 is called for service to occupy a channel just vacant, while in Model 2, the latter joins the queue of defective machines. Whenever a machine is completely repaired, it joins the main facility while this event corresponds to the arrival of a new customer from an outside source. If the channel is full, this c u s t o m e r gets lost. Understandably, the channel must have been filled up upon the previous arrival and there has been no service completion since then. Correspondingly, the r e p a i r m a n leaves on vacation when all machines become intact and upon his return, the r e p a i r m a n a t t e m p t s to replace a defective machine by a new one he carries out. If there is none available, he m a y return the new machine for refund. In any event, the status of the reliability system, in this case, is not altered. We call Models 2 and 3 "stochastically congruent." This idea has been previously utilized in [5,6], in connection with simpler models.

2.2. N o t a t i o n s

In this section, we will describe all the above models more formally. Denote by Z 1 the total n u m b e r of intact main machines at time t in Model 1. If a r e p a i r m a n fixes all machines completely, the total n u m b e r of main machines is restored to m + 1. Then, he goes on vacation until S + 1 machines break down, after which the r e p a i r m a n resumes his work. In other words, a busy period begins. As we already mentioned, backup machines are replaced when m a i n machines fail during r e p a i r m a n ' s idle period. Denote by S, the r a n d o m n u m b e r of external backups t h a t are used during idle periods. T h e P M F (probability mass function' of S (the n u m b e r of super-reserve backups) is

s ( n ) : = P { S = n } , n = 0 , 1 , . . , N ... (2.1) with the m e a n r - E[S] and Nmax is the m a x i m u m n u m b e r of externall backups with the control level N ( < N ... ). Let TO(= 0 ) , 7 1 , . . . be the successive m o m e n t s of repair completions. T h e r a n d o m variable 7n+1 -~-~ is supposed to have a P D F (probability distribution function),

A ( x ) : = P { ~ + l - ~n _< x}, x > 0, (2.2)

with m e a n a = E[7-,~+1 - T~] < oo. If upon the service completion, the totM n u m b e r of in- t a c t machine is greater t h a n m, the P M F of next service completion period is Ao(x) with the m e a n ao = f~+ xAo(dx). Each of the main machines breaks down independently of each other and of repairs, and according to the exponential distribution with p a r a m e t e r 0 < # < oo.

In Model 2, there is a m a x i m u m of m main served by a single repairman. T h e total number of main machines at t i m e t (> 0) is denoted by Zt. Unlike Model 1, there are no idle periods even when all m a i n machines become intact. Let T~ be the n t h repair completion. If the line of defective machines is nonempty, the r e p a i r m a n continues to repair a next machine immediately. W h e n the cumulative n u m b e r of intact machines becomes m at Tn, the r e p a i r m a n leaves the system and returns at T~+I with a b r a n d new machine. We assume t h a t his vacation is distributed as his regular service time. T h e new machine replaces a defective one, if by t h e n available, or otherwise- refund the new machine. We suppose t h a t the last action does not affect the status of the system in this particular problem setting. T h e r a n d o m variable T,~+I - Tn is stochastically equivalent to Tn+l --T~ of Model 1. T h e assumption a b o u t the failures distribution is the same as of Model 1 (i.e., exponential distribution with p a r a m e t e r #).

3. D U A L I T Y P R I N C I P L E :

C O N N E C T I O N B E T W E E N T H E M O D E L S

In case of the given n u m b e r of external backups S, it can be shown t h a t 7-1,72... and T1, T2, . . . are sequences of stopping times of the processes,

(4)

950 S T O R A G E A R E A N E T W O R K ( S A N ) S E R V E R and

( , t 2 , ( P ) x e e , ( Z t ; t > O ) ) E {0,1, . , m } , (3.1a)

respectively, and that these processes are semiregenerative relative to these sequences [5, p. 246]. Let ~ := Z 1 ,rn_ and Xn := ZT,~-, n 0, 1, 2 , . . . . and Consequently, (~1,"~[1, ( P X)xc E , (~n; ?~ = 1, 2 , . . . )) --* E

(3.2)

( f ~ , l l , ( P ~ ) x c E , ( X n ; n = l , 2 , . . . ) ) ~ E (3.2a) are embedded Markov chains (MC). Both Markov chains are stochastically equivalent and ergodic. Their limiting probabilities are expressed through the common invariant probability measure P = (P0, P1 ... P c ) . Since ((n) and (X,~) are stochastically equivalent, only one of them will be treated.

Let (ft1,£11, (P~)x~E, (Ytl;t > 0)) ~ E and (ft,tl, ( P ~ ) ~ S , (Yt;t > 0)) --~ E be the minimal semi-Markov processes associated with the above sequences of stopping times, respectively. In the case of the Model 1, we need to find the limiting probabilities,

y~ := lim P{Yt' = k} = PkMk k e E ,

(3.3)

x - - , ~ P M ' (cf., [7, p. 342]), where { a , [ ] k = 0 , 1 , . . . , m - 1 , (3.4) Mk := Ek[T1] = E ao(S + 1) + ( S ÷ 1). , k = m, m # j

and P M is the scalar product of P and M = (M0, M 1 , . . . , Mm) T, which equals

P M = f i + P m . E aoS + m---fi-- ' where

O~ = (1 - P c ) a + Pmao.

It is readily seen that the process Zt and Zt 1 on their busy periods are stochastically equivalent, or formally, P~{Zt = k} = P~{Zlt = k I t E Busy Period} (k E E). Furthermore,

P~ {Zt = k} = P z {Zlt = k l t c Busy Period} _ P~{Zlt = k , Z 1 C {0, 1 , . . . , r n } } P x { Z l t C {O, 1 , . . . , m } } 0, k = m + l , = = k } 1 _ p~{ztl = m + 1}, k < m . Therefore, P ~ { Z I = k } = ( 1 - P x { Z l t = m + l } ) P ~ { Z t = k } , k = 0 , 1 , . . . , m . (3.5)

On the other hand, breaking a period when Yt 1 = m in two intervals, of which the first one is an idle period and the second one is the repair period of all backup machines, we have

lira p x {t E Idle Period I S = n, Yt ] = m} = (n + 1 ) / m # = 1 .

(5)

From (3.3),

P,,,(n +

i)(I

+ a0m#)

lim P* {Yt 1 = m ] S = n} = m # ( a + P,,aon) + Pm(n + 1)" (3.r)

t ---+00

1 : } , ~ k ( n ) : = l i m t - ~ {Z~ k I s = k = 0 , 1 , . . ,m, be the

Let ~k := limt__+~ P ~ { Z 1 k 1 p x = n}, .

limiting probabilities and conditional probabilities of the process Zt 1. These probabilities exist, under the same conditions as those for the embedded process. Then, from (3.6),(3.7), it follows that

: .

(3.8)

~ m + l (7/) : lim P x { z 1 t = rn + ] I S = n} Pm(n + 1)

t---~oz m~(a + Pmaon) + Pm(n + 1)

From (3.7),(3.5) yield

= -- ~m+] (n)) 7rk, k = O, 1 , . . . , m , (3.9)

where 7rk := l i m t ~ P x { Z t = k} is subject to our consideration in Section 4. Because of conditional expectations, we have

and

,

i

[

Pm(S ÷ 1)

]

=

[~m+l (S)]

= ~

mp(~t + PmaoS) + P m ( S + 1) 7 r m + l E 1 l~,

[(1

1 71"k = -- ~ r n + l (S))] 71"k,

F r o m

(3.1), (3.10), and

(3.11)

yield

(3.10)

k = 0 , 1 , . . . , m . (3.11)

i

f i

s(n)P.,(n

+

1)

X , n + l = ~#(0, ÷ Pmaon) + P.~(n + 1) rt=O and

1 ~

S(~1,)TD~#(Ct

-~ Pmaon)

~k ~k

_n=O

mp(a + Pmaon) + P.z(n +

1)'

k = 0 , 1 , . . . , m .

4. S T O C H A S T I C

P R O C E S S

W I T H

C O N T I N U O U S T I M E P A R A M E T E R

We now turn to the continuous time parameter queueing process of Model 3. The below treatment is similar to that of Dshalalow [5] (see also [2]). We bring some details, for the sake of consistency. Let (Nt) be the counting process associated with the point process (Tn) and Vt := TN,+I - - t , t >_ O. {Vt; t _> 0} gives the residual time from time t to the next repair completion TN,+I. The process

(

a,u,( L~,(z~,E)L>0 ~(~,;~(~)),

v

)

~ : = E x ~ + ,

is weak Markov. Let (Pi i) be the Markov semigroup associated with (Zt, Vt) assumed to be absolutely continuous, i.e., with

P](k × [0, y ] ) : = P i { Z t = k , Vt C [0, y]},

(4.1)

and 7r~(t,u)du = P i { Z t = k, Vt E [ u , u + d u ) } , i , k C E, A ( x ) , j = O , . . . , m - 1 , Aj (x) = Ao (x) , j = m. ~, t c R+, (4.2)

(6)

952 STORAGE AREA NETWORK ( S A N ) SERVER

Since we are dealing with equilibrium, we do not need to consider the initial state. So, we are dropping the index (i) from 7r. To find the limiting distribution 7r of the process

(Zt),

we will start with the Kolmogorov differential equations, assuming t h a t

Aj(x)

(of

(3.2))

has the Radon- Nikodym density

aj(x),

which is pointwise continuous. From (4.2), the Kolmogorov equations arc as follows,

( 0 0 )

0 u 7rk(t, u) = -kpTrk(t, u) + (k + 1)~zTrk+l (t, u) (4.3)

+Trk_l(t,O)a(u),

k = 1 , . . . , m -

1;

(o0)

_Ot

_{O-u 7rk(t,u)}

_{= --mpTrrn(t,u) +}

_{pTrm(t,O)a°(u) + 7r,~_l(t,O)a(u),}

_{k =}

_m.

_(4.3a) T h e process

(Zt, Vt)

is aiso semiregenerative relative to the sequence To, T 1 , . . . , and its limiting distribution exists. Let

7rk(u) := t~eclim

7rk(t,u),Crk(O):=

j f

7rk(u)e-O~du,

Re(0) _> 0, k C E .

+

Letting t --~ ec in (4.3) and t h e n applying the Laplace transform, we have

(0 - ]¢#)Trk(O)

= 7rk(0) -- (]; + 1)#~k+1(0) -- OL(0)Tl'k--l(0), k ---- 0 , . . . , 7)2 -- 1; (4.4)

( 0 -- TFt~t)~rn(O ) = 7 r r n ( 0 ) -- 71"m_1(0 ) O~(0) -- 71"m(0 ) Oz0(0).

W i t h 0 + 0 in (4.4),

- ~ u ~ =~(o)

(k+l)U~k+~

~k_~(0).

S u m m a t i o n over k = 1 , . . . , n - 1(< m - 1) in (4.6) t h e n yields

(4.5)

(4.6) 7 r n - 1 ( 0 ) 7On - - - - , ~t ~- 1 ~ . . . , ~ 7 ~ . n # (4.7) From (4.6), Using (Tr, 1) = 1, we get

~m_l(o)

m ~ Tr0 = 1 - )___, 7rn. n = l

To find the unknowns 7rk(0) when k = 0, 1 , . . . , m, we use the approach in [3,4]. Denote (4.8)

(4.9)

pjk(t) : - PJ { z t = k I r~ > t}. Note t h a t

~ pjk(t) A(dt) = Pjk,

+

j,k c E,

(4.10)

are the transition probabilities of the embedded MC (Xn). We use the n a t u r a l assumption t h a t

P J { Z ~ = k l T l > t } = P J { Z t = k l T l > t + u } , V u > O .

Let

K3t(k

× [0, y]) :=

PJ{Zt = k, Vt c

[0, y], T1 > t}. Then, from the main convergence

theorem,

lim

P{(k × [O,y])= (~)-lj~EPj /~ KJt(k × [O,y])dt,

(7)

where

a = (1 - P m ) a + P m a o . By elementary probability arguments,

K ~ (k x [0, y]) :

Pjk

(t) [A (t + y) - A

(t)]

and then,

lim

P[(k

x [O,y]) = lim

Pi{Zt = k, Vt C

[O,y]}

t ----* O0 t--+ O~

: ( ~ , ) - - 1 E PJ

~ KJ(k

x [0, y ] ) d t dEE +

= (gt)-I E PJ £ pjk(t)[Aj(t + y) - Aj(t)] dt,

j@E +

= ( a ) - I E PJ

p;k(t)

aj (t + x) dx dt

j 6 E + = 0

(by Fubini's theorem),

= ( ~ ) - - 1 E

PJ

pjk(t)aj(t + x) dt dx.

= 0 j E E +

From (4.2) and (4.12),

~(x) = (~)-~ ~ Pj £ pj~(t)~j(t + ~)dt.

j E E +

When x -+ 0 in (4.13), due to (4.10) and Pk = Y-]~jeE

PjPjk,

j, k E E , y E N +

(4.12)

(4.13)

(4.15)

where P = { P 0 , . . . , P,z} from (A.1). For the process Z 1, the corresponding formulas yield (from Section 3), 7rm+ 1

: E

m#(g

+ Prnao n) 4-

Pm(n + 1)'

~

~(~).~(~

+ Pmao~)

~k = ~k , ~ ( a + P~a0~) + P ~ ( ~ + 1)' 7z=O k = 0 , 1 , . . . , m , (4.17) (4.16) along with (4.15). m 7to = 1 - E r r n ' k = 0, n ~ l

Pkl

7r k - - k = 1 , . . . , m , k#a '

where Pk from (A.1). From (4.7)-(4.9) and (4.14), we finally arrive at the limiting distribution rr,

Pk

~ k ( o ) = - - , k = o , . . . , , ~ (4.14)

(8)

954 STORAGE AREA NETWORK (SAN) SERVER

5. O P T I M A L I T Y O F T H E S A N S E R V E R M A N A G E M E N T

In this section, we will deal with a class of optimization problems t h a t arise in reliability. Let us formalize a pertinent optimization problem. Let a strategy, say Z , specify, ahead of the time, a set of acts we impose on the system, such as the choice of repair distribution, the number of main and external backups, statistics of failure rates dependent on the nmnber of backup machines t h a t enables us to spend more or less time on the maintenance and so on. On the other hand, a system can be subject to a set, say C, of cost functions. Denote by ¢ ( Z , C, t) the expected costs within [0, t], due to the strategy Z costs C and define the expected cumulative cost rate over an infinite horizon,

¢ (E, C) := t__+o~ 7 ¢ 1 i m 1 (Z, C, t) . (5.1)

We formulate our problem for a special case. Let

g(u)

denote the cost function associated with the system spending u units of time for the super-reserve machine usage during all idle periods t h a t fall into the period of time [0, t]. If {Yt~; t > 0} is the semi-Markov process (from Section 2.3) and

g(u)

is a linear function, i.e.,

g(u) = cu

where c is cost rate of idle periods, then f0 t 1{,~+~} (Y~)du gives the time spent by the system during all repairman's idle periods within [0, t], where 1A is the indicator function of a set A, and

Ug(t) = ]Ei [g ( fot l{~} (Y~) du) ] = cE~ [/ot l{rn} (Y~) du]

is the corresponding expected cost. By Fubini's theorem,

~0 t

Ug(t) = c

pi {y~ = m} du.

(5.2)

Let

fl(n)

be the reward rate for n intact main machines. Then, the expected reward for all

working machines in the interval [0, t] is

[f0t

]

[r~l

/ t

]

Ufl(t) = IE i

f l ( Z 1)

ds = E i

fl(k)

l I k } ( Z 1)

ds

kk=0

=0

(5.3)

(which by Pubini's theorem is)

m+l

t

= Z s l ( k ) f P {zl=k}as.

k=O

=0

If Lt denotes the number of failed machines waiting for repair, during a busy period, at time t, then it is £t = m + 1 Zt 1. Analogous to the above calculations, the expected cost due to failures during all busy periods within the time period [0, t] is

[jf0 t

] rn+l

/ ~

Uf2(i,t)

= E i

f2(L,)ds

= E f 2 ( m + l - k )

Pi{z~ = k } ds.

k=0

0

(5.4)

Since the Markov renewal function

Ri(t)

= Ei[Nt] ]Ei[~n~_-0 l[0,t](T~)] represents the total expected number of machines refurbished in the time interval [0, t], the functional

ldR i (t) = rR i (t)

gives the total reward for all machines completely processed in [0, t], if r is the common reward for each repaired machine.

(9)

The cumulative cost of the entire procedures involved in the reliability system of main and borrowed super-reserve machines, in the interval [0, t] is

¢ (Z, C, t) = Ug (t) + Ufl (t) -- U f2 (t) + LIR ~ (t)

m + l t

=~

{z: =~} du+ ~ fl(k)

{z~ =k} ds

k = O =0 m + 1 t

+ Z f ~ ( . ~ + l - k ) f , P~{zJ=k} d~+~R*(t)

k=0 =0 ~0 t m + l

f t p i

=~

P~{Z3=.~} d ~ + ~ (fi(k)+f~(m+l-k)).

{Z l=k} d~+~R~(t)

k=0 J s = 0

Among various constraints, it is reasonable to include t h a t the quantity of backup machines must be large enough as to provide some maintenance (or upgrade or retraining) of m + 1 main machines and this can be done as

1 { r e + l } ( Z 1 ) du > 1 " ( r e + l ) , ( 5 . 5 )

t ~ o o t =0

where F is an appropriate "reconciliation" function of m. The value on the left of (5.5) is the portion of a maintenance period within a unit time (we can call it shortly the maintenance rate). On the right-hand side, we set the minimal maintenance rate as a function F of internal

t

machines. In terms of our system, f£=0 l{m+l}(Z~)du is an "idle period" within [0, t], during which we assign the repairman to the above mentioned maintenance. In thc case of computer networks, an upgrade can be anything from installing a newer version of operating system and virus checking to patching the network server applications.

Now, we turn to convergence theorems for semiregenerative, semi-Markov, and Markov renewal processes,

(i)

lim

~ R i ( t ) -

1

t - ~ P M ' (ii) (iii)

-/o

1

lira 1 t p i {Z 1 k}

ds

7rk,

t---* oo t lira -1

f t pi {y~ = k} du _ __PkMk

t ~ o ~ t Jo P M '

to arrive at the objective function ¢ ( Z , C) which gives the total expected rate of all processes over an infinite horizon. In light of equations (i)-(iii), we have

m + i

PmMm

1

6 (Z, C) = t--,~olim 6 (Z, C, t) = c p ~ + r ~ - ~ + E (fl (k) + f2 (m + 1 - k)) 7r~. (5.6)

k = 0

With fi being linear functions, we have

r n + l

P ~ M m 1

¢ ( Z , C ) = c P M

+r~--M+ E [kh+(m+l)l]Tc~,

(5.7)

(10)

9 5 6 S T O R A G E A R E A N E T W O R K (SAN) S E R V E R

w h e r e h, l are relevant (cost) constant coefficients. Recall that f r o m (4.15)-(4.17), w e have

1 ~ S (n) P,,~ (n -F 1) 71-m+1 = 2_~ ?T~t (a -}- Pma0?%) Jr-

Prn (~ q- 1)'

t z = 0 s (n) rnp (5 + Pmaon)

E

rr k = rrk k = 0, l,...,m, and m 7r0 : 1 - ~ rr~, k = 0, Pk-1 k = 1 , . . . , r n . :rk - k~t> '

W i t h these a t t a c h m e n t s and (A.1)-(A.5), formula (5.7) for the objective function is complete.

6. O P T I M I Z A T I O N E X A M P L E S

In this section, we intend to d e m o n s t r a t e the t r a c t a b i l i t y of our results in Sections 2-6 on a special case of super-reserve machines. We allow an exponential distribution of repair times. Since our m e t h o d involves operating with an e m b e d d e d Markov process in a multichannel open queue with a finite buffer, we get back to Section 3 for some particular formulas.

6.1. M / M / m / 0 Queueing S y s t e m

T h e pertinent special formulas of Section 3 under these assumptions are as follows,

A ( x ) -- Ao (x) = i

- e -(1/a)x,

(6.1)

JfR

~°°o(1/a)e-(°+l/a)Z dx

a(O) = e - ° x d A ( x ) = _ 1 + = 1 ~-aO' 1 oe (rap (1 - z)) = 1 + a m # ( 1 - z)' (6.2)

Now, we can calculate the remaining p a r a m e t e r such as a~ from (3.3), B , from (3.2) to get Pk

(from (3.1)).

Once we get Pk, the probabilities rrk can be found by using formula (4.15). 6.2. S A N Server Allocation Optimization: Unreliable External Backups Servers

Let us consider the n u m b e r of external backup servers has a binomial distribution. In this case, the s y s t e m borrows N external backup servers but these backups are also unreliable at the m o m e n t of borrowing. If the probability of the intact backup servers is p (i.e., Bernoulli r a n d o m variable), the s u m of unreliable backups yields the binomial r a n d o m valuables. So,

(3.1)

yields

s ( n ) = ( N ) p n ( 1 - p) N - n , (6.3) and N 1 = E 7rm+ 1 [qOl+l(S)] = ~ (pl+l(n)8(n)" r~--O

We also specify the remaining of the five p r i m a r y cost functions are as follows,

f l ( n ) = G . n , f 2 ( n ) = B . n , (6.4)

where G is the reward for one SAN server, B is the p e n a l t y for one broken server during all busy periods all per unit time. F r o m (5.7) and (6.4), we have

r n + l P m M , ~ 1 ¢ ( Z ( N ) , C ) : t _ ~ o l i m ¢ ( Z ( N ) , C , t ) : c . - - + r . p M ~ + ~ - ~ ( f l ( k ) + f 2 ( m + l - k ) ) r r ~ k=0 m+l P m M , ~ 1

1

= c . p ~ ÷ r . ~ - - ~ + ~ ( G k + B ( m + l - k ) ) r r k k=O

(11)

Optimal Soultion of Servers 9.9255 . . . 9925 o 9 9245 09 9924 E O 99235 9 922,

... i ... i ...

5 10 15 20

The Number of Ser'cers with 7 channels (m=6)

F i g u r e 1. O p t i m i z a t i o n o f S A N s e v e r b a c k u p s .

w h e r e

m + l

- 1 1

_(6.5)

k - - 0

Finally, we arrive at the following expression for the objective function,

P,~Mm

1

¢ ( Z ( N ) , C ) = c .

p ~ + r . ~ + ( G - B ) 2 ~ + B ( m + l )

(6.6) Here, we use formulas (3.7), (4.15)-(4.17), and (6.6). Notice t h a t we have the p a r a m e t e r N vary. Even though we cannot see the m a x i m u m n u m b e r of backups N in (6.6), it is included in all ~ (see (3.10),(3.11)).

C o m p a r i n g (6.6) w i t h (5.7), w e identify h = G - B a n d l -- B. W e restrict the initial strategy of this m o d e l to one, w h i c h includes only the control level N of super-reserve machines. In other words, w e n e e d to find an N , such that

¢ ( Z ( N ) , C ) = min {¢ (Z ( N ) , C ) : N = 1 , 2 , . . . }. (6.7)

As an illustration, we take c = 2, G = 2, r = 4, and B = 1. Repair time distribution is exponential with m e a n a = 1.2 and the p a r a m e t e r # = 0.2. Take the t o t a l number of SAN servers as five.

Now, wc calculate ¢ ( Z ( N ) , C) and No t h a t gives a m i n i m u m for ¢ ( & ( N ) , C). In other words, the control level No stands for the available number of backup servers (super-reserve machines) which minimizes the total cost of this system. Below is a plot of ¢ ( Z ( N ) , C) for N C {1, 2 , . . . , 20}. Our calculation yields t h a t No = 5 for which the minimal cost equals 9.9233. It means t h a t we allocate our internal resources to seven m a i n (m + 1) machines and obtain the decision value No = 5 which is the n u m b e r of external backups under availability p ( = 0.2) for a backup server. T h e m a x i m u m number of external servers is needed to minimize the cost of this SAN model.

7. C O N C L U S I O N S

This p a p e r shows the analytical approach of the SAN m a n a g e m e n t m e t h o d by using the closed queueing system with flexible conditions. This approaches theoretical, b u t feasible to a p p l y real- world applications such as networked server allocation m a n a g e m e n t [8] and design of enhanced secure network infrastructure [9]. Analytically tractable results are o b t a i n e d by using a duality principle, semiregenerative analysis and multivariate marked renewal processes. I m p l e m e n t a t i o n of this model and comparison between the model and the actual d a t a will be the further research t h a t is related with this paper.

(12)

958 STORAGE AREA NETWORK (SAN) SERVER

A P P E N D I X A . I . E m b e d d e d P r o c e s s o f M o d e l 3

Model 3, as mentioned, is the conventional (GIo, G I ) / M / m / O (nlultichannel) queue. Recall t h a t such a s y s t e m is c h a r a c t e r i z e d by t h e g e n e r a l - i n d e p e n d e n t i n p u t (i.e., a renewal process), m parallel channels w i t h o u t a n y buffer. A c u s t o m e r enters a free channel available w i t h his service d e m a n d d i s t r i b u t e d e x p o n e n t i a l l y with p a r a m e t e r #. I n t e r - r e n e w a l times are d i s t r i b u t e d in ac- c o r d a n c e w i t h t h e P M F A(x) a n d Ao(x). Model 2, as we see it, is c o n g r u e n t t o Model 3, while Model 1 is dual with Model 2, so also with Model 3. T h e l a t t e r is a classical s y s t e m investigated b y Takacs [3]. T h e s t a t i o n a r y probabilities P = (P0, P ~ , . . . , Pro) for t h e e m b e d d e d process are k n o w n to satisfy t h e following formulas,

where B _{, ( k ) ( - 1 ) ~ - k}r r ~ k P k rr~ 1 k = O , . . . , m - I, (A.1) Bn = r = n (A.2)

(7)/

'

O~0 ~ Ozr I, r = 0 , ~ : = ~ a i (A.3) - - , r > I, l-'ii=0 1 - a~

/0

a (0) = e-°~A(du), a ° (0) e-°~Ao(du), (A.4)

o c~o

aT = a ( r # ) , a r = ( r # ) . (A.5) R E F E R E N C E S

1. R. Ramaswami and K.N. Sivarajan, Optical Networks: A Practical Perspective, American Press, San Diego,

CA, (2002).

2. s. Kim and J.H. Dshalalow, A versatile stochastic maintenance model with res. and super-reserve machines,

Methodology and Computing in App. Probability 5 (1), 59-84, (2003).

3. L. Takacs, Introduction to the Theory of Queues, Oxford University Press, New York, NY, (1962).

4. J.H. Dshalalow, On the multiserver queue with finite waiting room and controlled input, Adv. Appl. Prob. 17, 408-423, (1985).

5. J.H. Dshalalow, On single-server closed queues with priorities and state dependent parameters, Queueing

Systems 8, 23~254, (1991).

6. J.H. DshMalow, On a duality principle in processes of servicing machines with double control, J. of Appl.

Math. and Sire. 1 (3), 245-251, (1988).

7. E. Cinlar~ Introduction to Stochastic Processes, Prentice Hall, Englewood Cliffs, N J, (1975).

8. S. Kim, Enhanced networked server management with random remote backups, S P I E Proc. on Performance

and Control of N. G. Comm. Networks 5244, 106-114, (2003).

9. T. Kawno, S. Matsui, T. Yasue and C. Konno, Secure communiction infrastructure for mobile, Hitaeh Review 48 (1), 15-20, (1999).