An Improved Data Mining Technique Combined Apriori Algorithm with Ant Colony Algorithm and its Application

Li Guodong, Xia Kewen

School of Information Engineering

Hebei University of Technology, Tianjin, China

Tianjin 300401, China

E-mail:[email protected]

doi:10.4156/jdcta.vol5.issue8.27

Abstract

In this paper, the Apriori algorithm is improved and applied to the substation data mining process, and the ant colony algorithm is applied to find the optimal solution of reactive power allocation in substations. In this ant colony algorithm, the state transition probability formula is amended and the parameters are dynamically adjusted. The ant's choice of path to the next node is determined by the tabu table, which is formulated according to the confidence level obtained from data mining. The switching strategy of the capacitor sets is given by the online algorithm.

Keywords: Ant Colony Algorithm, Data Mining, Reactive Power Optimization

1. Introduction

The electric power system is a large-scale, nonlinear, interconnected system. It is difficult for power system operators to extract useful information from the continuously accumulating operating data. Data mining techniques can take full advantage of these operating data to reveal the principles and rules of the power system through association analysis, classification and prediction, clustering analysis, outlier analysis, and so on [1-3]. Data mining technology has been applied in many fields, such as credit card management and churn analysis. Most researchers focus on the study of data mining models [4-6]. The application of traditional data mining techniques continually faces new challenges in power systems, because an ever-increasing amount of data is produced at high rates and the analysis of the data often needs to be conducted in real time and under time constraints.

The ant colony algorithm (ACA) is a new method for solving combinatorial optimization problems [7]. In recent years, research on the ant colony algorithm has focused on improving the traditional algorithm for problems such as the traveling salesman problem (TSP), and on extending its application to other areas, such as data mining and knowledge discovery [8-10]. Paper [11] adjusts the ant colony pheromone adaptively under pheromone limits to further address the stagnation problem and improve the searching ability of ACA. Paper [12] applies ACA to the rapid microgrid power management problem under complex constraints and objectives, including environmental, fuel/resource availability, and economic considerations.

Reactive power plays an important role in supporting the real power flow by maintaining voltage stability and system reliability. The available reactive power capabilities of the system have to be deployed optimally so that bus voltages are kept within specified limits. The purpose of reactive power dispatch is to determine the proper amount and location of reactive support under several constraints. Paper [13] focuses on the voltage/reactive power problem, keeping the real power flows fixed to values determined from a base-case load flow analysis. In paper [14], optimal power dispatch is solved by particle swarm optimization with time-varying acceleration coefficients (TVAC-PSO), and a comprehensive model for reactive power pricing in an ancillary services market is proposed. Paper [15] presents an efficient genetic algorithm (GA) based reactive power optimization approach to minimize the total support cost from generators and reactive compensators.

This paper focuses on the problem of extracting useful data for effective decision-making in reactive power optimization. It describes the concepts and improvements of an association rule algorithm, the Apriori algorithm, and of the ant colony algorithm. The improved Apriori algorithm is applied to extract useful information for the ACA from the large amount of operating data generated during substation operation. An overall model based on the Apriori algorithm and the ant colony algorithm is established for reactive power optimization. An example power substation is used to illustrate the application of the proposed models in the automatic voltage and reactive power control system. Based on historical data, the proposed method is used to obtain the optimal operating conditions, which guide practical operation.

2. Data mining

2.1. Principle of Association Rules Method

An association rule is represented simply as $A \Rightarrow B$, where $A \subset I$, $B \subset I$, and $A \cap B = \varnothing$.

The support level of $A \Rightarrow B$ is

$$\mathrm{support}(A \Rightarrow B) = P(A \cup B)$$

The confidence level of $A \Rightarrow B$ is

$$\mathrm{confidence}(A \Rightarrow B) = P(B \mid A) = \frac{\mathrm{support\_count}(A \cup B)}{\mathrm{support\_count}(A)} \tag{1}$$

where $\mathrm{support\_count}(A \cup B)$ is the number of records that include $A \cup B$, and $\mathrm{support\_count}(A)$ is the number of records that include $A$.

The support level indicates the statistical importance of an association rule in the whole data set. The confidence level indicates the credibility of the association rule. Generally, the useful association rules are the ones with both a high support level and a high confidence level. The data mining process can be divided into two parts: (i) mining the large item sets whose support level is higher than the pre-set value; (ii) obtaining the association rules whose support level is higher than the pre-set minimum support frequency.
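As a minimal sketch of the two measures above, the following Python snippet obtains the support and the confidence of a rule by counting records. The record contents (`low_voltage`, `capacitor_on`, and so on) and the function names are hypothetical and serve only as illustration; they are not taken from the substation database.

```python
def support_count(records, itemset):
    """Number of records that contain every item of `itemset`."""
    return sum(1 for record in records if itemset <= record)


def support(records, a, b):
    """support(A => B) = P(A u B), estimated as a relative frequency."""
    return support_count(records, a | b) / len(records)


def confidence(records, a, b):
    """confidence(A => B) = support_count(A u B) / support_count(A), as in (1)."""
    count_a = support_count(records, a)
    if count_a == 0:
        return 0.0  # the rule never fires, so no confidence can be claimed
    return support_count(records, a | b) / count_a


# Hypothetical operating records, each one a set of observed items.
records = [
    frozenset({"low_voltage", "capacitor_on"}),
    frozenset({"low_voltage", "capacitor_on"}),
    frozenset({"low_voltage"}),
    frozenset({"normal_voltage"}),
]
A = frozenset({"low_voltage"})
B = frozenset({"capacitor_on"})

rule_support = support(records, A, B)
rule_confidence = confidence(records, A, B)
```

For this toy data, two of the four records contain both items and three contain `low_voltage`, so the support is 0.5 and the confidence is 2/3.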

2.2. Improved Apriori Algorithm

The Apriori algorithm, proposed by Agrawal in 1994, is recursive and includes two main steps:

(i) Generate the frequent K-item sets from the frequent (K-1)-item sets.

(ii) Calculate the support level of the candidate sets by scanning the database and pattern matching.
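The two steps above can be sketched in Python as follows. This is a minimal illustration of the classical level-wise Apriori procedure, not the improved method of this paper; the function names and the toy records are assumptions made for the example.

```python
from itertools import combinations


def count(records, itemset):
    """Scan the records and count those that contain `itemset`."""
    return sum(1 for record in records if itemset <= record)


def apriori(records, min_count):
    """Return {frequent itemset: support count} using level-wise search."""
    items = sorted({item for record in records for item in record})
    current = [frozenset({item}) for item in items
               if count(records, frozenset({item})) >= min_count]
    frequent = {}
    k = 1
    while current:
        for itemset in current:
            frequent[itemset] = count(records, itemset)
        k += 1
        previous = set(current)
        # Step (i): join frequent (k-1)-item sets into candidate k-item sets.
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # Prune candidates that have an infrequent (k-1)-subset.
        candidates = {c for c in candidates
                      if all(frozenset(sub) in previous
                             for sub in combinations(c, k - 1))}
        # Step (ii): scan the database to keep candidates with enough support.
        current = [c for c in candidates if count(records, c) >= min_count]
    return frequent


records = [
    frozenset("abc"), frozenset("ab"), frozenset("ac"),
    frozenset("bc"), frozenset("abc"),
]
frequent_sets = apriori(records, min_count=3)
```

With a minimum count of 3, the three single items and the three pairs are frequent, while the triple `{a, b, c}` appears only twice and is dropped. Every level requires another pass over all records, which is the repeated-scan cost discussed next.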

It can be concluded that in the Apriori algorithm the candidate set is too large and the database is scanned repeatedly. An improved method without these two drawbacks is applied to data mining in the historical database of the substations. It is described as follows:

(i) Preprocess the original data based on partition. The database of the substation is divided into nine zones according to the requirements on reactive power and bus voltages, and the mining focuses on the data outside the normal running zone. This saves time and speeds up access, because only the corresponding zones of the database are scanned rather than the whole database.

(ii) Classify with similarity search according to the operating conditions of the central substation. The association level of the selected data is improved to meet the requirements of practical operation.
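The partition step (i) can be illustrated with the sketch below. It assumes that the nine zones arise from classifying the bus voltage and the reactive power each as low, normal, or high; the limit values, the record fields `U` and `Q`, and the function names are hypothetical, since the paper does not list them.

```python
def band(value, lower, upper):
    """Classify a measurement as 'low', 'normal' or 'high'."""
    if value < lower:
        return "low"
    if value > upper:
        return "high"
    return "normal"


def zone(record, u_limits, q_limits):
    """Return one of the nine (voltage band, reactive power band) zones."""
    return (band(record["U"], *u_limits), band(record["Q"], *q_limits))


def abnormal_records(records, u_limits, q_limits):
    """Keep only the records that lie outside the normal running zone."""
    normal = ("normal", "normal")
    return [r for r in records if zone(r, u_limits, q_limits) != normal]


# Hypothetical history: U in kV, Q in Mvar (assumed units).
history = [
    {"U": 10.2, "Q": 0.1},   # normal voltage, normal reactive power
    {"U": 9.6, "Q": 0.1},    # low voltage
    {"U": 10.3, "Q": 1.4},   # high reactive power
]
selected = abnormal_records(history, u_limits=(10.0, 10.7), q_limits=(-0.5, 0.5))
```

Only the second and third records are passed on to mining, so later scans touch a subset of the database instead of all of it.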


3. Online optimal algorithm and overall model

3.1. Model of Ant Colony Algorithm

Let $m$ be the number of ants; $b_i(t)$ the number of ants at element $i$ at moment $t$; $\tau_{ij}(t)$ the information on path $(i, j)$ at moment $t$; and $d_{ij}$ ($i, j = 1, 2, \ldots, n$) the distance between cities $i$ and $j$. At the beginning, $\tau_{ij}(0) = C$ ($C$ is a constant).

When an ant $k$ ($k = 1, 2, \ldots, m$) is moving, it uses the information on the paths to choose the next path. The state transition probability of ant $k$ moving from city $i$ to city $j$ at moment $t$ is represented as

$$p_{ij}^{k}(t) = \begin{cases} \dfrac{[\tau_{ij}(t)]^{\alpha}\,[\eta_{ij}]^{\beta}}{\sum_{s \in allowed_k} [\tau_{is}(t)]^{\alpha}\,[\eta_{is}]^{\beta}}, & j \in allowed_k \\ 0, & \text{otherwise} \end{cases} \tag{2}$$

where $\alpha$ and $\beta$ weight the path information $\tau_{ij}$ and the heuristic visibility $\eta_{ij}$, respectively, and $allowed_k = \{0, 1, \ldots, n-1\} - tabu_k$ represents the cities that ant $k$ is allowed to choose in the next step. The artificial ants have a memory function: $tabu_k$ ($k = 1, 2, \ldots, m$) records the cities that ant $k$ has already visited, and it is updated dynamically as the evolutionary process proceeds. After a cycle of $n$ steps, the ant has passed all the cities. Each path traversed by an ant is a solution. The information on each path is updated as

$$\tau_{ij}(t+n) = (1-\rho)\,\tau_{ij}(t) + \Delta\tau_{ij}(t) \tag{3}$$

where

$$\Delta\tau_{ij}(t) = \sum_{k=1}^{m} \Delta\tau_{ij}^{k}(t) \tag{4}$$

$\rho \in [0, 1)$ is the volatile factor and $1-\rho$ is the information residual factor. $\Delta\tau_{ij}^{k}(t)$ is the information left by ant $k$ on the path between city $i$ and city $j$ and can be represented as

$$\Delta\tau_{ij}^{k}(t) = \begin{cases} \dfrac{Q}{L_k}, & \text{if ant } k \text{ passes path } (i, j) \\ 0, & \text{otherwise} \end{cases} \tag{5}$$

where $Q$ indicates the pheromone intensity and $L_k$ is the total length of the path that ant $k$ has passed in this cycle. After several cycles, the calculation ends when the stop condition is met.
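The basic model of this subsection can be sketched in Python as below. The sketch assumes the standard ant colony form of (2), with a heuristic visibility taken as the reciprocal of the distance, and it deposits pheromone symmetrically on each path; both choices, together with the function names and the three-city data, are assumptions made for illustration.

```python
def transition_probabilities(i, allowed, tau, eta, alpha, beta):
    """Probability of moving from city i to each allowed city, in the form of (2)."""
    weights = {j: (tau[i][j] ** alpha) * (eta[i][j] ** beta) for j in allowed}
    total = sum(weights.values())
    if total == 0.0:
        return {j: 0.0 for j in allowed}
    return {j: w / total for j, w in weights.items()}


def update_pheromone(tau, tours, lengths, rho, q):
    """Apply (3)-(5): evaporate, then add Q / L_k on every path of each tour."""
    n = len(tau)
    delta = [[0.0] * n for _ in range(n)]
    for tour, length in zip(tours, lengths):
        # Pair each city with its successor; the last city returns to the first.
        for i, j in zip(tour, tour[1:] + tour[:1]):
            delta[i][j] += q / length
            delta[j][i] += q / length  # assumption: paths are symmetric
    return [[(1.0 - rho) * tau[i][j] + delta[i][j] for j in range(n)]
            for i in range(n)]


# Hypothetical three-city example.
distances = [[0.0, 2.0, 4.0], [2.0, 0.0, 1.0], [4.0, 1.0, 0.0]]
eta = [[0.0 if d == 0.0 else 1.0 / d for d in row] for row in distances]
tau = [[1.0] * 3 for _ in range(3)]  # tau_ij(0) = C with C = 1

probs = transition_probabilities(0, [1, 2], tau, eta, alpha=1.0, beta=1.0)
new_tau = update_pheromone(tau, tours=[[0, 1, 2]], lengths=[7.0], rho=0.5, q=1.0)
```

With equal initial information, the ant at city 0 prefers the nearer city 1 (probability 2/3 against 1/3). After one tour of length 7, every path on the tour holds 0.5 + 1/7 units of information.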

3.2. Improvement of Ant Colony Algorithm

The improvements to the ant colony algorithm include:

(i) Selection of parameters: The parameters are dynamically adjusted. At the beginning, the parameters are set to small values to avoid "false positive feedback" and "solution loss". After a certain number of cycles, the parameters are increased to improve the solution quality.

(ii) Modification of the parameters: The state transition probability in (2) is modified according to the results of data mining. The higher the confidence level and the pheromone concentration of a path are, the greater the probability that ants choose it.

When ant $k$ passes path $(i, j)$, $\Delta\tau_{ij}^{k}(t)$ is represented as

$$\Delta\tau_{ij}^{k}(t) = (1 + p)\,\frac{Q}{L_k} \tag{6}$$

where $p$ is the confidence level. The tabu table is established according to the results of data mining, and it is updated after each ant's choice until the new optimal strategy is found.

(iii) Selection of paths: First, calculate the reactive power supplied by the capacitor sets in all the substations to establish all the working states. The probable strategies are found by comparing the reactive power shortfall with the calculated reactive power, and the strategies with a large difference are discarded. The remaining states are numbered, and their confidence levels are found through data mining.

Second, the path selection strategy of the basic ant colony algorithm is adjusted. The probability of the paths that ants choose is set to the confidence levels of the mined association rules, and the tabu table of probable choices is listed. The next path is determined from the tabu table without randomness. The initial tabu table is established from the results of the offline data mining.
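The improvements listed above can be sketched as follows. The sketch assumes that (6) scales the basic deposit Q / L_k by (1 + p), that a strategy is "probable" when its supplied reactive power lies within a tolerance of the shortfall, and that the next state is simply the non-tabu state with the highest mined confidence. The tolerance, the confidence values, and all function names are hypothetical.

```python
from itertools import product


def confidence_deposit(q, tour_length, p):
    """Information left by one ant, assuming the (1 + p) * Q / L_k form of (6)."""
    return (1.0 + p) * q / tour_length


def candidate_states(capacities, shortfall, tolerance):
    """Enumerate on/off states whose supplied reactive power is near the shortfall."""
    states = []
    for state in product((0, 1), repeat=len(capacities)):
        supplied = sum(c for c, e in zip(capacities, state) if e)
        if abs(supplied - shortfall) <= tolerance:
            states.append(state)
    return states


def next_state(states, confidence, tabu):
    """Pick the allowed state with the highest confidence, without randomness."""
    allowed = [s for s in states if s not in tabu]
    if not allowed:
        return None
    return max(allowed, key=lambda s: confidence.get(s, 0.0))


capacities = [24, 36, 24]  # kVar of three capacitor sets
states = candidate_states(capacities, shortfall=50, tolerance=15)
confidence = {(1, 0, 1): 0.9, (0, 1, 1): 0.7}  # hypothetical mined confidences

first = next_state(states, confidence, tabu=set())
second = next_state(states, confidence, tabu={first})
deposit = confidence_deposit(q=1.0, tour_length=10.0, p=0.5)
```

Four of the eight states survive the comparison with the shortfall. The state `(1, 0, 1)` is chosen first, and once it enters the tabu table the next choice falls on `(0, 1, 1)`.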

3.3. Overall Model

For a substation in centralized control mode in China, the proposed control strategy of switching capacitors for optimal allocation of reactive power is shown in Fig. 1. First, the association rules of the central station and the controlled stations are established based on historical databases. Second, the established results are compared with the measured data. Then the optimal solution is calculated according to the evaluation function, namely, the optimization goals.

Figure 1. Proposed strategy

[Figure 1 blocks: New data; Historical data; Preparation of the data; Data mining with Apriori; Rule and knowledge; Confidence level; The actual power grid; Ant colony algorithm; Optimization goals; The optimal strategy; Output. The blocks are grouped into an offline part and an online part.]

The proposed strategy can be divided into two parts: offline and online. The input of the offline part is the historical databases, and the output is the association rules and the confidence levels of the historical data calculated by the Apriori algorithm. The frequent items are mined according to the principle that their frequencies are not less than the pre-set minimum support frequency. Based on the frequent items, the corresponding strong association rules are obtained. The ant colony algorithm is used to find the optimal strategy of reactive power regulation, based on the association rules output by the offline part. The updated output of the offline part interacts with the online strategy.

3.4. Target Function

The power loss between two points $i$ and $j$ can be represented as

$$f_i = \gamma_i \left(\frac{P_{ij}}{U_i}\right)^{2} l_i \tag{7}$$

where $P_{ij}$ is the power transported between $i$ and $j$; $l_i$ is the length of the transmission line; and $\gamma_i$ is the related comprehensive coefficient. The total power loss can be represented as

$$F_1 = \sum_{i=1}^{n} f_i \tag{8}$$

The node voltage deviation is

$$f_{2j} = \left(\frac{U_j - U_j^{sp}}{\Delta U_j^{sp}}\right)^{2} \tag{9}$$

The total voltage deviation of all nodes is

$$F_2 = \sum_{j=1}^{n} \left(\frac{U_j - U_j^{sp}}{\Delta U_j^{sp}}\right)^{2} \tag{10}$$

where $n$ is the number of nodes except the slack bus nodes; $U_j^{sp}$ is the set value of the node voltage; and $\Delta U_j^{sp}$ is the set value of the maximum deviation of the node voltage.
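The voltage deviation terms can be computed as in the sketch below. It assumes the squared relative deviation form for (9) and (10); the voltages, set values, and function names are hypothetical.

```python
def voltage_deviation(u, u_set, du_max):
    """Squared deviation of one node voltage, relative to its maximum allowed deviation."""
    return ((u - u_set) / du_max) ** 2


def total_voltage_deviation(voltages, setpoints, max_deviations):
    """Sum of the node deviations over all nodes except the slack bus."""
    return sum(voltage_deviation(u, s, d)
               for u, s, d in zip(voltages, setpoints, max_deviations))


# Hypothetical 10 kV nodes with a maximum allowed deviation of 0.5 kV each.
voltages = [10.5, 10.0, 9.75]
setpoints = [10.0, 10.0, 10.0]
max_deviations = [0.5, 0.5, 0.5]

f2_total = total_voltage_deviation(voltages, setpoints, max_deviations)
```

A node term of exactly 1 means the node sits at its maximum allowed deviation, so terms above 1 indicate a violated voltage limit. Here the three terms are 1.0, 0.0 and 0.25, giving a total of 1.25.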

The mathematical model of the reactive power optimization can be represented as

$$\min C = \sum_{k \in N_k} \left(\lambda_1 F_1 + \lambda_2 F_2\right) \tag{11}$$

where $\lambda_1$ and $\lambda_2$ are the weight coefficients; $N_k$ is the set of numbers of the available capacitors; $E = [e_1, e_2, \ldots, e_n]^{T}$ is the vector of states of the available capacitors; and $F_1$ and $F_2$ are functions of $E$, with

$$e_i = \begin{cases} 1, & \text{capacitor } i \text{ is switched in} \\ 0, & \text{capacitor } i \text{ is disconnected} \end{cases}$$

The constraints can be represented as follows:

(i) The constraint of power balance

$$\begin{cases} P_i - U_i \sum_{j=1}^{n} U_j \left(G_{ij}\cos\delta_{ij} + B_{ij}\sin\delta_{ij}\right) = 0 \\ Q_i - U_i \sum_{j=1}^{n} U_j \left(G_{ij}\sin\delta_{ij} - B_{ij}\cos\delta_{ij}\right) = 0 \end{cases} \tag{12}$$

where $P_i$ is the injected active power; $Q_i$ is the injected reactive power; $U_i$ and $U_j$ are the node voltages; $G_{ij}$ is the conductance between $i$ and $j$; $B_{ij}$ is the susceptance between $i$ and $j$; and $\delta_{ij}$ is the electrical angle difference between $i$ and $j$.

(ii) The constraint of node voltage

$$\begin{cases} Q_{Ci}^{\min} \le Q_{Ci} \le Q_{Ci}^{\max} \\ U_i^{\min} \le U_i \le U_i^{\max} \\ \delta_{ij}^{\min} \le \delta_{ij} \le \delta_{ij}^{\max} \\ T_i^{\min} \le T_i \le T_i^{\max} \\ C_i^{\min} \le C_i \le C_i^{\max} \end{cases} \tag{13}$$

where $Q_{Ci}^{\min}$ and $Q_{Ci}^{\max}$ are the minimum and maximum available reactive power; $U_i^{\min}$ and $U_i^{\max}$ are the minimum and maximum voltage amplitudes of node $i$; $[T_i^{\min}, T_i^{\max}]$ is the adjustment range of adjustable transformer $i$, $i = 1, 2, \ldots, n$; $C_i$ is the switching frequency; and $C_i^{\min}$ and $C_i^{\max}$ are the limits of $C_i$. If $C_i$ reaches $C_i^{\max}$, the capacitor is disabled for the remaining time.

3.5. Calculation of Target Function

(i) Target function for the TSP method: The problem of reactive power optimization in substations can be regarded as a TSP. A capacitor set can be regarded as a city in the TSP method, and the switching state is the path between two cities. The function in (11) can be described as

$$\min\left(\sum_{i=1}^{n-1} ts\big(s(e_{i,i+1})\big) + ts\big(s(e_{n,1})\big)\right) \tag{14}$$

where $ts(s(e_{n,1}))$ represents the change of the target function if reactive power is injected at the newly added node $n$.

(ii) Constraint conditions: Because the constraint conditions of (13) are represented in the tabu table, the constraints on voltage and on the change of the transformer taps can be ignored.

The switching frequency of the capacitor sets satisfies $C_i^{\min} \le C_i \le C_i^{\max}$. If $C_i \ge C_i^{\max}$ and this lasts for a period of time, capacitor $C_i$ is not allowed to be switched again and its value is set to zero for the remaining time.
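The switching frequency limit can be enforced with a simple filter, sketched below under the assumption that each capacitor set carries a count of the switching operations already performed. The counts, limits, and function name are hypothetical.

```python
def allowed_capacitors(switch_counts, max_counts):
    """Indices of capacitor sets that have not yet reached their switching limit."""
    return [i for i, (count, limit) in enumerate(zip(switch_counts, max_counts))
            if count < limit]


# Hypothetical counts for three capacitor sets with a limit of 3 operations each.
switch_counts = [0, 3, 1]
max_counts = [3, 3, 3]

usable = allowed_capacitors(switch_counts, max_counts)
```

The second capacitor set has reached its limit, so only sets 0 and 2 remain available to the ants for the rest of the period.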

4. Case study

The improved algorithm is applied to an example system for which the daily operating data are available. Fig. 2 shows the simplified study example. A center substation (C, as in Fig. 1) has nineteen controlled substations: three 110 kV substations and sixteen 35 kV substations. All these substations are equipped with reactive compensators and on-load tap-changing transformers, as shown in Table 1.

The parameters are $\alpha = 0.5$, $\beta = 1$, $\rho = 0.4$ during the first quarter of the calculation period and $\alpha = 1$, $\beta = 3$, $\rho = 0.8$ later. $\Delta\tau_{ij}^{k}(t)$ is calculated by (6), so the information on the path is enlarged and the computational complexity is reduced, allowing the optimal solution to be found quickly.
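The two-stage parameter setting can be written as a small schedule, sketched below. The assignment of the three values to `alpha`, `beta` and `rho` is an assumption, as are the function name and the use of a cycle counter to mark the first quarter of the calculation period.

```python
def aca_parameters(cycle, total_cycles):
    """Return the parameter set for a given cycle (assumed mapping of the values)."""
    if cycle < total_cycles / 4:
        # Small values at the start, to limit premature positive feedback.
        return {"alpha": 0.5, "beta": 1.0, "rho": 0.4}
    # Larger values later, to sharpen the search around good solutions.
    return {"alpha": 1.0, "beta": 3.0, "rho": 0.8}


early = aca_parameters(cycle=10, total_cycles=100)
late = aca_parameters(cycle=25, total_cycles=100)
```

In a run of 100 cycles, cycles 0 to 24 use the small values and cycle 25 onward uses the larger ones.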

Fig. 3(a), (b), and (c) show the evaluation results when the reactive power difference of the 110 kV buses changes continuously. Here, (I) aims at the minimum net loss, that is, $\lambda_1 = 1$, $\lambda_2 = 0$ in (11); (II) aims at the minimum node voltage deviation, that is, $\lambda_1 = 0$, $\lambda_2 = 1$ in (11).

Figure 2. A real electric system

Table 1. The configuration of the compensated reactive power in the example substation

Node No.   Distance   Available Var
1          35 km      24 kVar
2          25 km      36 kVar
3          100 km     24 kVar
4          78 km      36 kVar
5          43 km      24 kVar
6          65 km      36 kVar
7          73 km      24 kVar
8          53 km      36 kVar
9          67 km      30 kVar
10         36 km      30 kVar
11         36 km      12 kVar
12         37 km      18 kVar
13         56 km      12 kVar
14         38 km      18 kVar
15         47 km      12 kVar
16         56 km      18 kVar
17         67 km      12 kVar
18         86 km      18 kVar
19         33 km      12 kVar
17/18      28 km      0 kVar

The evaluation function is

$$F = \sum_{k \in N_k} \left(\lambda_1 F_1 + \lambda_2 F_2\right) + \sum_{i \in NL} C_i f_{2i} \tag{15}$$

where $F_1$ is given in (8); $f_{2j}$ is given in (9); and $NL = \{\, j \mid f_{2j} > 1 \,\}$. If the node voltage exceeds the given maximum deviation voltage of the node, the corresponding coefficient $C_i$ is increased as a penalty.
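The evaluation function can be sketched as below for a single switching state. The sketch assumes that the penalty applies to the nodes whose voltage deviation term exceeds 1, that is, nodes beyond their maximum allowed deviation. The weights, penalty coefficients, and function name are hypothetical, and the outer sum over capacitor states is omitted for brevity.

```python
def evaluate(f1_total, f2_terms, lam1, lam2, penalties):
    """Weighted loss and voltage deviation plus penalties for violated nodes."""
    f2_total = sum(f2_terms)
    penalty = sum(c * f for c, f in zip(penalties, f2_terms) if f > 1.0)
    return lam1 * f1_total + lam2 * f2_total + penalty


# Hypothetical values: total loss 2.0, two nodes, the second one out of limits.
f2_terms = [0.25, 4.0]
penalties = [10.0, 10.0]

score_both = evaluate(2.0, f2_terms, lam1=1.0, lam2=1.0, penalties=penalties)
score_loss_only = evaluate(2.0, f2_terms, lam1=1.0, lam2=0.0, penalties=penalties)
```

Even when the weights select the net loss alone, as in strategy (I), the penalty keeps states with out-of-limit voltages from scoring well: the two scores are 46.25 and 42.0, both dominated by the penalty of 40.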

When the 35 kV bus coupler switch S1 is disconnected and the 110 kV bus coupler switch S2 is closed, the compensating results are shown in Fig. 3(a). When S1 is closed and S2 is disconnected, the compensating results are shown in Fig. 3(b). When both S1 and S2 are disconnected, the compensating results are shown in Fig. 3(c).

Figure 3. The comparison of reactive compensation

From Fig. 3, it can be concluded that the overall compensation result with the optimized strategy is better than that of the old switching method (III). The evaluation coefficient is equal to zero when the reactive power is fully compensated. The reactive power is over-compensated because the capacitors in Table 1 regulate reactive power in steps.


5. Conclusions

An example substation system is described to test the algorithm proposed in this paper. Experimental results show that the reactive power optimization method based on data mining can improve system efficiency and reduce power loss, and that it is of great significance for stable operation.

6. References

[1] Qi Luo, “Advancing Knowledge Discovery and Data Mining,” Knowledge Discovery and Data Mining, WKDD 2008, 23-24 Jan. 2008, pp. 3-5.

[2] Xindong Wu, “Data mining: artificial intelligence in data analysis,” Proceedings of the IEEE/WIC/ACM International Conference on Intelligent Agent Technology (IAT 2004), 2004, pp. 7-7.

[3] Aihua Li, Lingling Zhang, “A Study of the Gap from Data Mining to Its Application with Cases,” Business Intelligence and Financial Engineering, BIFE '09, International Conference on, 24-26 July 2009, pp. 464-467.

[4] J.A. Steele, J.R. McDonald, and C. D'Arcy, “Knowledge discovery in databases: applications in the electrical power engineering domain,” IT Strategies for Information Overload (Digest No: 1997/340), IEE Colloquium on, 3 Dec. 1997, pp. 8/1-8/4.

[5] Li Jianqiang, Niu Chenglin, Liu Jizhen, “Application of Data Mining Technique in Optimizing the Operation of Power Plants,” Journal of Power Engineering, Vol. 26, No. 6, pp. 830-835.

[6] E. Cesario, D. Talia, “Distributed Data Mining Models as Services on the Grid,” International Conference on Data Mining Workshops, 2008, pp. 486-495.

[7] Dingli Song, Bingru Yang, Zhen Peng, and Weiwei Fang, “Study of cost-sensitive ant colony data mining algorithm,” Industrial Mechatronics and Automation, ICIMA 2009, International Conference on, 15-16 May 2009, pp. 488-491.

[8] L. Admane, K. Benatchba, M. Koudil, M. Drias, S. Gharout, N. Hamani, “Using ant colonies to solve data-mining problems,” IEEE International Conference on Systems, Man and Cybernetics, 2004 (4): 3151-3157.

[9] P. S. Shelokar, V. K. Jayaraman, B. D. Kulkarni, “An ant colony classifier system: application to some process engineering problems,” Computers and Chemical Engineering, 2004 (28): 1577-1584.

[10] Wang Zhigang, Yang Lixi, Chen Genyong, “Ant Colony Algorithm for Distribution Network Planning,” Proceedings of the EPSA, 2002, 14(6): 73-76.

[11] Yi Shen, Mingxin Yuan, Yunfeng Bu, “Study on adaptive planning strategy using ant colony algorithm based on predictive learning,” Control and Decision Conference, 2009, pp. 3030-3035.

[12] C.M. Colson, M.H. Nehrir, C. Wang, “Ant colony optimization for microgrid multi-objective power management,” Power Systems Conference and Exposition, 2009, pp. 1-7.

[13] Ali Abdulhadi Noaman, “Concentric Circular Array Antenna Null Steering Synthesis by Using Modified Hybrid Ant Colony System Algorithm,” IJACT, Vol. 2, No. 2, pp. 144-157, 2010.

[14] Alaa Aljanaby, Ku Ruhana Ku-Mahamud, Norita Md. Norwawi, “Interacted Multiple Ant Colonies Optimization Framework: an Experimental Study of the Evaluation and the Exploration Techniques to Control the Search Stagnation,” IJACT, Vol. 2, No. 1, pp. 78-85, 2010.

[15] S. Janakiraman, V. Vasudevan, “ACO based Distributed Intrusion Detection System,” JDCTA, Vol. 3, No. 1, pp. 66-72, 2009.