Combinations of the Greedy Heuristic Method for Clustering Problems and Local Search Algorithms


In: A. Kononov et al. (eds.): DOOR 2016, Vladivostok, Russia, published at http://ceur-ws.org


Lev Kazakovtsev1,2,*, Alexander Antamoshkin1,2

1 Siberian State Aerospace University Named after Academician M.F. Reshetnev, Krasnoyarsk, Russian Federation

2 Siberian Federal University, Krasnoyarsk, Russian Federation

[email protected], [email protected]

Abstract. In this paper, we investigate the application of various versions of algorithms with a greedy agglomerative heuristic procedure to object clustering problems in continuous space, in combination with various local search methods. We propose new modifications of the greedy agglomerative heuristic algorithms with local search in the SWAP neighborhood for the p-medoid problems and with the j-means procedure for continuous clustering problems (p-median and k-means). The new modifications were applied to clustering problems in both continuous and discrete settings. Computational results with classical data sets and real data show the comparative efficiency of the new algorithms for middle-size problems only.

Keywords: p-median · k-means · p-medoids · Genetic algorithm · Heuristic optimization · Clustering

1 Introduction

Problems of automatic grouping of objects in a continuous space with a defined distance measure function between two points are the most widely applied clustering models. The k-means problem is the most popular clustering model, in which it is required to split N objects into k groups so that the sum of squared distances from the objects to the closest center of a group reaches its minimum. The centers (sometimes called centroids) are unknown points in the same space. Another popular clustering model is the p-median problem [9], which has a similar setting, but the sum of distances, instead of the sum of squared distances, has to be minimized. Thus, in the k-means problem, the measure of distance is the squared Euclidean distance. The similarity of the continuous p-median problem and the k-means problem was emphasized by many researchers [25, 12, 16, 13].

There exists a special class of discrete optimization problems operating with concepts of the continuous problems, called the p-medoid problem [19] or the discrete p-median problem [37]. Such clustering problems can be considered as location problems [34, 28] since the main parameters of such continuous and discrete problems are the coordinates of objects and the distances between them [10, 9]. Such problems arise in statistics (for example, problems of estimation), statistical data processing, signal or image processing and other engineering applications [20].

The formulation of the p-median problem [12, 7] is as follows:

$$\arg\min_{X_1,\dots,X_p \in \mathbb{R}^d} F(X_1,\dots,X_p) = \sum_{i=1}^{N} w_i \min_{j \in \{1,\dots,p\}} L(X_j, A_i). \qquad (1)$$

Here, $\{A_1,\dots,A_N\}$ is a set of known points (data vectors), $X_1,\dots,X_p$ are unknown points (centers, centroids), $w_i$ are weight coefficients (usually equal to 1), and $L(\cdot,\cdot)$ is the distance function in a d-dimensional space [14, 9]. In the most popular cases, $L$ is the squared Euclidean distance, the Euclidean metric or the Manhattan metric.
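As an illustration of objective (1), the following minimal Python sketch evaluates the weighted sum of distances from each data vector to its closest center. The function names (`objective`, `euclidean`, `squared_euclidean`) and the toy data are illustrative assumptions, not part of the original paper.

```python
import math

def euclidean(x, a):
    """Euclidean distance between two points given as tuples."""
    return math.sqrt(sum((xi - ai) ** 2 for xi, ai in zip(x, a)))

def squared_euclidean(x, a):
    """Squared Euclidean distance (the k-means distance measure)."""
    return sum((xi - ai) ** 2 for xi, ai in zip(x, a))

def objective(centers, data, weights=None, dist=euclidean):
    """Objective (1): sum over data vectors of w_i * distance to the closest center."""
    if weights is None:
        weights = [1.0] * len(data)  # unit weights, the usual case
    return sum(w * min(dist(x, a) for x in centers)
               for a, w in zip(data, weights))

# Hypothetical toy data: two well separated pairs of points.
data = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0), (10.0, 2.0)]
centers = [(0.0, 1.0), (10.0, 1.0)]
print(objective(centers, data))                          # 4.0
print(objective(centers, data, dist=squared_euclidean))  # 4.0
```

Passing a different `dist` switches between the p-median setting (Euclidean or Manhattan metric) and the k-means setting (squared Euclidean distance) without changing the rest of the computation.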

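The center of each cluster (group) is the solution of the 1-median (Weber) problem for this cluster. If the distance measure is the squared Euclidean distance ($l_2^2$), then the solution is [12]:

$$x_{j,k} = \frac{\sum_{i \in C_j} w_i a_{i,k}}{\sum_{i \in C_j} w_i} \quad \forall k = 1,\dots,d. \qquad (2)$$

Here, $X_j = (x_{j,1},\dots,x_{j,d})$, $A_i = (a_{i,1},\dots,a_{i,d})$, and $C_j$ is the set of indexes of the data vectors of the jth cluster.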

Such p-median, k-means and k-medoids problems are problems of global optimization: the objective function is non-convex [8]. Moreover, they are NP-hard [11, 2, 33, 40, 41, 15]. The most popular procedure for such problems, the ALA (alternating location-allocation) procedure, is based on the Lloyd algorithm [30], also known as the standard k-means procedure [32]. Nevertheless, many authors have offered faster methods based on this standard procedure [46, 1] for data sets and data streams. The ALA and similar procedures are algorithms that sequentially improve a known solution. They are not true local search algorithms in the strict sense since the new solution is not searched for in an $\varepsilon$-neighborhood of the known solution.

The modern literature offers many heuristic methods [36] for seeding the initial solutions (sets of centers) for the ALA procedure, which include various evolutionary methods and random search methods such as the k-means++ procedure [4].

A popular idea is the use of genetic algorithms (GAs) for improving the results of local search [18, 27, 39]. In the case of GAs, authors use various methods of coding the solutions which form a "population" of the evolutionary algorithm. Hosage and Goodchild [17] offered the first genetic algorithm for the p-median problem on a network. A rather precise but very slow algorithm based on a special greedy agglomerative heuristic was offered by Bozkaya, Zhang and Erkut in 2002 [6].

A rather precise and fast algorithm with the greedy agglomerative heuristic recombination procedure for the p-median problem on a network was offered by Alp, Erkut and Drezner in 2003 [3]. This algorithm was adapted for the continuous problems by Neema et al. in 2011 [39]. At each iteration, this algorithm generates initial solutions for the local search procedure. Thus, this approach becomes extremely slow as the number of clusters p grows. Another method of recombination was offered by Sheng and Liu in 2006 [43]. These algorithms work quicker; however, they are less precise. Also, Lim and Xu in 2003 offered a genetic algorithm based on recombination of subsets of centers of fixed length [29].


The genetic algorithm with greedy heuristic [3] does not use a mutation procedure, which is common for GAs [38].

In [21], the authors propose a GA with the greedy agglomerative heuristic using a floating point alphabet. Most GAs for p-median problems [39] use an integer alphabet, coding the initial solutions of the ALA procedure by the numbers of the corresponding data vectors. In [21], the authors encode the interim solutions as sets of points (centers) in the d-dimensional space, which are changed by the ALA procedure. Such a combination of the greedy agglomerative heuristic and the ALA procedure allows the algorithm to obtain more precise results.

In this paper, we propose a further modification of the genetic algorithm with greedy heuristic which combines the greedy agglomerative heuristic procedures with local search in wider neighborhoods such as the SWAP neighborhood.

2 Known algorithms

The ALA procedure includes two simple steps:

Algorithm 1. ALA procedure.

Required: data vectors $A_1, \dots, A_N$, k initial cluster centers $X_1, \dots, X_k$.

1. For each center $X_i$, determine its cluster $C_i$ as the subset of the data vectors for which this center $X_i$ is the closest one.

2. For each cluster $C_i$, recalculate its center $X_i$ (i.e., solve the Weber problem).

3. Repeat from Step 1 unless Steps 1 and 2 made no change in any cluster.
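The steps of Algorithm 1 can be sketched as follows. This is a minimal illustration that assumes unit weights and the squared Euclidean distance, so that the Weber problem of Step 2 reduces to the arithmetic mean of each cluster. The names `ala` and `max_iter`, and the choice to keep the old center of an empty cluster, are assumptions made for the sketch.

```python
def squared_euclidean(x, a):
    return sum((xi - ai) ** 2 for xi, ai in zip(x, a))

def ala(data, centers, max_iter=100):
    """Alternating location-allocation with squared Euclidean distance.

    Returns the final centers and, for each data vector, the index of its cluster.
    """
    centers = [tuple(c) for c in centers]
    assignment = None
    for _ in range(max_iter):
        # Step 1: allocate every data vector to its closest center.
        new_assignment = [
            min(range(len(centers)),
                key=lambda j: squared_euclidean(centers[j], a))
            for a in data
        ]
        # Step 3: stop when the allocation did not change.
        if new_assignment == assignment:
            break
        assignment = new_assignment
        # Step 2: relocate each center to the mean of its cluster.
        for j in range(len(centers)):
            members = [a for a, c in zip(data, assignment) if c == j]
            if members:  # an empty cluster keeps its old center (assumption)
                centers[j] = tuple(sum(col) / len(members)
                                   for col in zip(*members))
    return centers, assignment

# Hypothetical toy data set.
data = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0), (10.0, 2.0)]
centers, labels = ala(data, [(0.0, 0.0), (10.0, 0.0)])
print(centers)  # [(0.0, 1.0), (10.0, 1.0)]
print(labels)   # [0, 0, 1, 1]
```

For other distance measures, Step 2 would require a different solver for the Weber problem (for example, an iterative procedure for the Euclidean metric), which this sketch does not cover.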

Many papers propose approaches to improve the speed of the ALA algorithm [1], such as random sampling [35].

The GA with greedy heuristic [3, 23] with modifications [39, 21, 24] can be described as follows.

Algorithm 2. GA with greedy heuristic for p-median and p-medoid problems.

Required: population size $N_{POP}$.

1. Generate (randomly with equal probabilities or using the k-means++ procedure) $N_{POP}$ initial solutions $\chi_1, \dots, \chi_{N_{POP}} \subset \{A_1, \dots, A_N\}$, $|\chi_i| = p$ $\forall i = 1, \dots, N_{POP}$, which are sets of data vectors used as initial solutions for the ALA procedure. For each of them, run the ALA procedure and estimate the value $F_{fitness}(\chi_i)$ of the objective function (1) for the result of the ALA procedure; save these values to variables $f_1, \dots, f_{N_{POP}}$.

2. If the stop conditions are reached then STOP. The solution is the set of initial centers $\chi_{i^*}$ corresponding to the minimal value of $f_i$. To estimate the final solution, run the ALA procedure for $\chi_{i^*}$.

3. Choose randomly two indexes $k_1, k_2 \in \{1, \dots, N_{POP}\}$, $k_1 \ne k_2$.

4. Form an interim solution $\chi_c = \chi_{k_1} \cup \chi_{k_2}$.

5. If $|\chi_c| = p$ then go to Step 7.

6. Perform the greedy agglomerative heuristic procedure (Algorithm 3) for $\chi_c$.

7. If $\exists i \in \{1, \dots, N_{POP}\}: \chi_i = \chi_c$ then go to Step 2.

8. Choose an index $k_3 \in \{1, \dots, N_{POP}\}$. The authors of [21] use a simple tournament selection procedure: choose randomly $k_4, k_5 \in \{1, \dots, N_{POP}\}$; if $f_{k_4} > f_{k_5}$ then $k_3 = k_4$, otherwise $k_3 = k_5$.

9. Replace $\chi_{k_3}$ and the corresponding objective function value: $\chi_{k_3} = \chi_c$, $f_{k_3} = F_{fitness}(\chi_c)$. Go to Step 2.
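The recombination and replacement steps of Algorithm 2 (Steps 3, 4, 8 and 9) can be sketched in a few lines. In this illustration, solutions are assumed to be stored as frozensets of data vector indexes; the function names `recombine` and `tournament_replace`, the toy population, and the child solution passed in the usage example are hypothetical. The reduction of the interim solution to cardinality p (Step 6) is deliberately left out.

```python
import random

def recombine(population, rng=random):
    """Steps 3-4: pick two distinct parents and return the union of their centers."""
    k1, k2 = rng.sample(range(len(population)), 2)
    return population[k1] | population[k2]

def tournament_replace(population, fitness, child, child_fitness, rng=random):
    """Steps 8-9: the worse of two randomly chosen individuals is replaced.

    The problem is a minimization, so the larger objective value loses.
    Returns the index k3 of the replaced individual.
    """
    k4 = rng.randrange(len(population))
    k5 = rng.randrange(len(population))
    k3 = k4 if fitness[k4] > fitness[k5] else k5
    population[k3] = child
    fitness[k3] = child_fitness
    return k3

rng = random.Random(0)
population = [frozenset({0, 1}), frozenset({2, 3}), frozenset({1, 4})]
fitness = [7.0, 3.0, 5.0]

interim = recombine(population, rng)   # cardinality between p and 2p
# A reduced child and its objective value would come from Algorithm 3 and
# Algorithm 4; the values below are placeholders for illustration only.
replaced = tournament_replace(population, fitness, frozenset({0, 3}), 2.5, rng)
print(sorted(interim), replaced)
```

Because both tournament participants are drawn from the whole population, the best individual can also be replaced; the sketch follows the steps as listed and adds no elitism.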

The greedy agglomerative heuristic is realized as follows:

Algorithm 3. Greedy agglomerative heuristic.

Required: initial solution $\chi_c$.

1. For each $j \in \chi_c$ do:

1.1. Form a set $\chi' = \chi_c \setminus \{j\}$. For each data vector $A_i \in \{A_1, \dots, A_N\}$, choose the closest center $C'_i = \arg\min_{k \in \chi'} L(A_i, X_k)$. Form $|\chi'|$ sets of data vectors, for each of which the corresponding center is the closest one: $C^{clust}_k = \{i \in \{1, \dots, N\} \mid C'_i = k\}$; for each cluster $C^{clust}_k$, $k \in \chi'$, calculate its center $X'_k$, and then calculate $f'_j = \sum_{i=1}^{N} w_i \min_{k \in \chi'} L(X'_k, A_i)$.

1.2. Next iteration of loop 1.

2. Sort the set of pairs $(j, f'_j)$ in ascending order of $f'_j$ and form a set $E_{elim}$ of the first $N_{elim}$ indexes of data vectors from this sorted set. From this set $E_{elim}$, eliminate the indexes $i$ such that $\exists j \in E_{elim}: L(A_i, A_j) < L_{min}$, $j < i$. Here, $N_{elim} = [\sigma_e (|\chi_c| - p)] + 1$. The value of the parameter $L_{min}$ is equal to 0.1-0.5 of the average distance between two data vectors; we use 0.2 of the average value of $L(A_i, A_j)$. The parameter $\sigma_e$ determines the share of the excess elements eliminated at once from the interim solution. The value 0 means sequential deletion of elements (centers, centroids, medoids) one by one. The default value 0.2 makes the algorithm work faster, which is important if the value of p is comparatively large (p > 10).

3. Eliminate $E_{elim}$ from $\chi_c$: $\chi_c = \chi_c \setminus E_{elim}$, and return.
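The elimination loop described above can be illustrated with a simplified Python sketch for the discrete (medoid) setting. It makes several assumptions that differ from Algorithm 3 as stated: centers are data vector indexes that stay fixed (no recalculation of $X'_k$), weights are equal to 1, the $L_{min}$ filter of Step 2 is omitted, and the procedure is repeated until exactly p centers remain. The names `greedy_reduce`, `cost` and `sigma_e` are illustrative.

```python
import math

def dist(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def cost(data, centers):
    """Sum of distances from every data vector to its closest center (unit weights)."""
    return sum(min(dist(a, data[j]) for j in centers) for a in data)

def greedy_reduce(data, chi_c, p, sigma_e=0.0):
    """Remove centers from the interim solution until p of them remain.

    At each pass, the centers whose removal increases the objective the least
    are eliminated; sigma_e = 0 removes them one by one.
    """
    chi = set(chi_c)
    while len(chi) > p:
        # Step 1: objective value after eliminating each candidate j.
        scored = sorted((cost(data, chi - {j}), j) for j in chi)
        # Step 2: N_elim = [sigma_e * (|chi| - p)] + 1, capped at the excess.
        n_elim = min(int(sigma_e * (len(chi) - p)) + 1, len(chi) - p)
        # Step 3: eliminate the cheapest candidates.
        chi -= {j for _, j in scored[:n_elim]}
    return chi

# Hypothetical one-dimensional toy data with two natural groups.
data = [(0.0,), (1.0,), (10.0,), (11.0,)]
print(greedy_reduce(data, {0, 1, 2, 3}, p=2))  # {1, 3}
```

With `sigma_e=0`, each pass evaluates the objective once per remaining center, which is the source of the high computational cost discussed in the text; larger values trade accuracy for fewer passes.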


Algorithm 4. Calculating the objective function value $F_{fitness}(\chi)$.

Required: initial solution $\chi = \{X_1, \dots, X_p\}$.

1. Run the ALA procedure or the PAM procedure with the initial solution $\chi$ to get a new set of centers $\{X_1, \dots, X_p\}$.

2. Return $F_{fitness}(\chi) = \sum_{i=1}^{N} w_i \min_{j \in \{1, \dots, p\}} L(X_j, A_i)$.

Step 4 of Algorithm 2 generates an interim solution set with cardinality up to 2p, from which we sequentially eliminate elements (Step 6) until we reach cardinality equal to p. At each iteration, the value $F_{fitness}$ is calculated up to 2p times. Thus, the local search procedure is started up to $p^2$ times in each iteration of the greedy heuristic recombination. In the case of the k-medoid problem, the computational complexity increases. In this case, we must calculate

$$X_j = A_{i^*}, \quad i^* = \arg\min_{i \in C^{clust}_j} \sum_{m \in C^{clust}_j} \sum_{k=1}^{d} (a_{i,k} - a_{m,k})^2.$$

Instead of the ALA procedure, we can use the faster PAM procedure. Nevertheless, its complexity also depends on p and N.
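To make the additional cost of the medoid setting concrete, the sketch below finds the medoid of a single cluster by exhaustive comparison: the member that minimizes the sum of squared Euclidean distances to all other members. The function name `medoid_index` and the toy cluster are illustrative assumptions.

```python
def squared_euclidean(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def medoid_index(cluster):
    """Index of the cluster member with the smallest sum of squared
    distances to all members. Requires O(n^2 * d) operations for a
    cluster of n vectors of dimension d."""
    return min(
        range(len(cluster)),
        key=lambda i: sum(squared_euclidean(cluster[i], other)
                          for other in cluster),
    )

# Hypothetical cluster: the centroid would be (2.0, 0.0), which is not a
# data vector, while the medoid must be one of the members.
cluster = [(0.0, 0.0), (1.0, 0.0), (5.0, 0.0)]
print(medoid_index(cluster))  # 1
```

Unlike the centroid of formula (2), which needs one pass over the cluster, this search is quadratic in the cluster size, which explains why large clusters make the medoid variant expensive.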

3 Combination with alternative local search algorithms

The structure of Algorithm 4 can be described as three nested loops.

The first of them performs iterations of the global search strategy (for the GA, this is the execution of the evolutionary operators of random selection and the start of the crossover procedure; however, other metaheuristics can also be applied [22]). The second nested loop performs the iterations of the greedy heuristic procedure until a feasible solution is reached. The third nested loop within this procedure estimates the consequences of eliminating each of the elements of the intermediate solution.

The steps realizing the greedy agglomerative heuristic are launched in combination with one of the existing local search algorithms. Thus, for the continuous problems, it can be the ALA procedure or other two-step alternating location-allocation procedures, for example, the j-means or h-means procedure. Search in different types of neighborhoods can be applied to the discrete clustering problems. The PAM procedure (Partitioning Around Medoids, i.e. search in a neighborhood consisting of a given quantity of the closest data vectors), the ALA procedure (wider neighborhood: all data vectors of a cluster), or search in the wider SWAP or K-SWAP neighborhoods can be applied to the p-medoid problem.

For the p-median, p-medoids and k-means problems, we can use the ALA procedure as a local search procedure [44, 31]. The PAM procedure is a search algorithm in a neighborhood of each of the medoids, consisting of the solutions obtained by replacing a medoid with one of the $p_{PAM}$ closest data vectors. In our research, we used $p_{PAM} = 3$. However, experiments did not reveal any universal advantage of either method: PAM gives a good result quicker in the case of a small number of large clusters, while the ALA procedure is more efficient in the case of small clusters.

Meanwhile, there are many other efficient local search methods for the discrete clustering problems [45, 42, 26]. It was noted long ago [42] that very precise solutions can be obtained with algorithms based on search in the SWAP neighborhood, formed by replacing a medoid with any data vector. In many cases, search in this neighborhood yields exact solutions [26]. The computational complexity of one iteration of such a procedure is within $O(pN^2)$. Traditionally, this procedure is applied to rather small data sets (N < 5000). Larger neighborhoods such as K-SWAP (replacement of K medoids with other data vectors) are certainly capable of increasing the expected accuracy of the result obtained after a single start of such a procedure, but the time expenses grow so fast that even in the 2-SWAP neighborhood, search is possible for very small data sets only. In the case when K = p, search in this neighborhood degenerates into a full search and gives the exact solution.
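The growth of the neighborhood size can be checked with a short calculation. The sketch below counts the solutions reachable by replacing K of the p medoids with K data vectors that are not currently medoids; this counting convention and the function name `k_swap_neighborhood_size` are assumptions made for illustration. Each neighbor still has to be evaluated against all N data vectors, which is consistent with the $O(pN^2)$ estimate for K = 1.

```python
from math import comb

def k_swap_neighborhood_size(n, p, k=1):
    """Number of solutions obtained by exchanging k of the p medoids
    for k of the n - p data vectors that are not medoids."""
    return comb(p, k) * comb(n - p, k)

# Hypothetical problem sizes with p = 10 medoids.
for n in (1000, 5000, 100000):
    print(n,
          k_swap_neighborhood_size(n, 10, 1),
          k_swap_neighborhood_size(n, 10, 2))
```

For N = 5000 and p = 10, the SWAP neighborhood contains 49900 solutions, while the 2-SWAP neighborhood already contains 560139975, roughly four orders of magnitude more.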

The greedy genetic algorithms mentioned in this paper (including Algorithm 2) use the ALA procedure (as mentioned above, in the case of the p-medoid problem, the PAM procedure is also possible). In this research, we add search in the SWAP neighborhood to this procedure. Algorithm 5 can be used instead of the ALA procedure option in Algorithms 2 and 4.

Algorithm 5. Local search combination for the greedy heuristic algorithm.

Required: initial solution $\chi$ which is a set of centers, centroids or medoids.

Step 1. Run the ALA procedure (or the PAM procedure) from the initial solution $\chi$. Store the new value of $\chi$.

Step 2. If Step 1 did not improve the solution then STOP and return $\chi$.

Step 3. Form an array $I$ of the numbers $1, \dots, p$ and shuffle this array randomly.

Step 4. For each $i' \in \{1, \dots, p\}$ do:

Step 4.1. Store $i = I_{i'}$.

Step 4.2. Store $j^* = \arg\min_{j \in \{1, \dots, N\}} F((\chi \setminus \{X_i\}) \cup \{A_j\})$. Here, $X_i$ is the ith center, centroid or medoid in the solution $\chi$, and $A_j$ is the jth data vector.

Step 4.3. If $F((\chi \setminus \{X_i\}) \cup \{A_{j^*}\}) < F(\chi)$, then store $\chi = (\chi \setminus \{X_i\}) \cup \{A_{j^*}\}$ and go to Step 1.
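The alternation of an ALA-type procedure with search in the SWAP neighborhood can be sketched for the p-medoid problem as follows. This is a simplified illustration rather than the authors' implementation: it assumes unit weights and the Euclidean metric, uses a medoid version of the ALA procedure, assumes p < N, and stops when a full pass over the shuffled medoids finds no improving swap (a stopping rule chosen for the sketch). All function names are hypothetical.

```python
import math
import random

def dist(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def cost(data, medoids):
    return sum(min(dist(a, data[m]) for m in medoids) for a in data)

def ala_medoids(data, medoids):
    """Alternate allocation and medoid relocation while the cost decreases."""
    medoids = list(medoids)
    while True:
        clusters = [[] for _ in medoids]
        for i, a in enumerate(data):
            c = min(range(len(medoids)), key=lambda c: dist(a, data[medoids[c]]))
            clusters[c].append(i)
        new = [min(c, key=lambda i: sum(dist(data[i], data[k]) for k in c)) if c else m
               for c, m in zip(clusters, medoids)]
        if cost(data, new) >= cost(data, medoids):
            return medoids
        medoids = new

def swap_local_search(data, medoids, rng=random):
    chi = ala_medoids(data, medoids)                  # Step 1
    while True:
        order = list(range(len(chi)))                 # Step 3
        rng.shuffle(order)
        improved = False
        for i in order:                               # Step 4
            candidates = [j for j in range(len(data)) if j not in chi]
            best_j = min(candidates,                  # Step 4.2
                         key=lambda j: cost(data, chi[:i] + [j] + chi[i + 1:]))
            trial = chi[:i] + [best_j] + chi[i + 1:]
            if cost(data, trial) < cost(data, chi):   # Step 4.3
                chi = ala_medoids(data, trial)        # back to Step 1
                improved = True
                break
        if not improved:
            return chi

# Hypothetical one-dimensional data with two groups; both initial medoids
# start in the left group.
data = [(0.0,), (1.0,), (2.0,), (10.0,), (11.0,), (12.0,)]
result = swap_local_search(data, [0, 1], random.Random(0))
print(sorted(result), cost(data, result))  # [1, 4] 4.0
```

Every accepted swap strictly decreases the objective and the ALA step never increases it, so the loop terminates; the price is one full objective evaluation per candidate swap, which dominates the running time on large data sets.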

The steps of this algorithm combine search in the SWAP neighborhood (Steps 4-4.3) with the ALA procedure (Step 1). While the PAM procedure is a simple local search algorithm, the ALA procedure is not a true local search algorithm. However, it improves an existing solution step by step, similarly to a local search algorithm.

For the continuous problems (k-means, p-median), there exists the j-means procedure [16], similar to search in the SWAP neighborhood, whose scope is also restricted to comparatively small problems. This procedure reduces to replacing centers/centroids with one of the data vectors, with the subsequent continuation of the search by the standard ALA procedure. It is easy to note that this principle of search is realized by Algorithm 5. Thus, Algorithm 5 realizes the alternation of the ALA or PAM procedure with search in the SWAP neighborhood for the discrete problems and the j-means procedure for the continuous ones.


We will apply Algorithm 5 instead of Algorithm 4 for solving discrete and continuous problems as a part of the genetic algorithm with greedy heuristic (Algorithm 2). For testing purposes, we use real and generated data sets collected by the Speech and Image Processing Unit of the School of Computing of the University of Eastern Finland and by the UCI Machine Learning Repository, and also data from tests of EEE components for spaceships [20].

4 Computational experiments

In our experiments, we used a Depo X8Sti computer (6-core CPU Xeon X5650 2.67 GHz, 12 GB RAM). We launched each algorithm with each data set 30 times.

Some results of the computational experiments are provided in Fig. 1, Fig. 2 and Table 1. With the number of data vectors N > 10000, the attempts to apply Algorithm 5 as a part of the genetic algorithm with greedy heuristic were unsuccessful since a single start of Algorithm 5 required time exceeding the whole time limit for the GA.

The dynamics of the best known value of the objective function show that for small discrete problems (N < 1000), application of Algorithm 5 has an advantage in comparison with Algorithm 4. Moreover, for such problems, application of Algorithm 5 even as a separate algorithm, without use of the genetic algorithm, has an advantage in comparison with the genetic algorithm with greedy heuristic in combination with Algorithm 4.

Middle-size problems (N < 10000) show a different situation. In certain cases, the genetic algorithm with greedy heuristic shows the best results even without search in the SWAP neighborhood. In other cases, search in the SWAP neighborhood pays off, but application of this type of local search as a part of the genetic algorithm leads to further improvement of the result. Moreover, in certain cases, simpler recombination procedures of the genetic algorithm, such as the genetic algorithm with recombination of fixed length subsets [43], yield the best results in comparison with the GA with greedy heuristic. This is most evident for problems with Boolean and categorical data. Nevertheless, the obtained results are comparable in accuracy, and use of the GA with greedy heuristic can be competitive for problems with 1000 < N < 10000.

For the continuous problems (except the smallest ones with N < 1000), application of algorithms with greedy heuristic in combination with Algorithm 5 is quite justified in the majority of practical cases.

5 Conclusions

Application of search in the SWAP neighborhood (and probably wider neighborhoods) can be competitive in the case of small problems, but is not competitive for large-scale problems (N > 10000). Experiments show that this effect does not depend on the heuristic used for constructing the initial population. As a rule, obtaining an acceptable result is slowed down when the SWAP procedure is used. The exceptions are some problems with the Jaccard metric and the Hamming metric. Depending on the problem parameters (N, p, d, metric), the time limit and the accuracy requirements, various local search procedures are more or less efficient. The most important factor in the case of large-scale problems is the time consumption of a single run of the SWAP local search procedure.

Fig. 1. Comparative results of algorithms for p-medoids problems. 1: local SWAP search multistart, 2: GA with greedy heuristic (PAM as a local search procedure), 3: Algorithm 2 with elimination parameter $\sigma_e$ = 0 (in combination with SWAP search), 4: Algorithm 2 with $\sigma_e$ = 0.2 (in combination with SWAP search), 5: GA with fixed length subset recombination [43, 29], 6: PAM procedure multistart.


Fig. 2. Comparative results of algorithms: A) p-medoids problems, B) p-median problems. 1: local SWAP search multistart (j-means in the case of the p-median problem), 2: GA with greedy heuristic (PAM or ALA as a local search procedure), 3: Algorithm 2 with elimination parameter $\sigma_e$ = 0 (in combination with SWAP search or j-means), 4: Algorithm 2 with $\sigma_e$ = 0.2 (in combination with SWAP search or j-means), 5: GA with fixed length subset recombination [43], 6: ALA or PAM procedure multistart.


Table 1. Comparative results for p-median problems, 30 runs.

| Data set, parameter p and distance measure | Algorithm | Time, sec. | Average result | Std. deviation |
|---|---|---|---|---|
| Testing results for electronic chip 1526TL1, N=1234, d=120; p=14, $l_2^2$ | ALA multistart | 15 | 150,124869801 | 0,384203928 |
| | j-means multistart | 15 | 150,533299444 | 0,598587789 |
| | GA FL+ALA | 15 | 149,954679652 | 0,172789313 |
| | GA FL+j-means | 15 | 151,280175427 | 0,982922979 |
| | GA GH+ALA | 15 | 149,78736565* | 0,03157532* |
| | GA GH+j-means | 15 | 151,082443691 | 0,654212395 |
| 1526TL1; p=10, $l_2^2$ | ALA multistart | 15 | 198,375350991 | 0,018643710 |
| | j-means multistart | 15 | 198,426881563 | 0,044039446 |
| | GA FL+ALA | 15 | 198,377650812 | 0,024878118 |
| | GA FL+j-means | 15 | 198,450402498 | 0,032311263 |
| | GA GH+ALA | 15 | 198,359747028 | -14* |
| | GA GH+j-means | 15 | 198,35421865* | 0,0070903 |
| 1526TL1; p=6, $l_2^2$ | ALA multistart | 15 | 362,70701636* | 0* |
| | j-means multistart | 15 | 362,70401636* | 0* |
| | GA FL+ALA | 15 | 362,70401636* | 0* |
| | GA FL+j-means | 15 | 362,704156850 | 0,000344112 |
| | GA GH+ALA | 15 | 362,704051312 | 0* |
| | GA GH+j-means | 15 | 362,704051312 | 0* |
| UCI MopsiJoensuu, N=6014, d=2; p=10, $l_2$ | ALA multistart | 15 | 359,680203232 | 3,964320582 |
| | j-means multistart | 15 | 359,545287242 | 0,208756158 |
| | GA FL+ALA | 15 | 359,545250068 | 2,526439494 |
| | GA FL+j-means | 15 | 361,435624000 | 0,208770779 |
| | GA GH+ALA | 15 | 359,410460803 | 0,177992934 |
| | GA GH+j-means | 15 | 359,41036391* | 0* |
| UCI MopsiJoensuu; p=4, $l_2$ | ALA multistart | 15 | 596,825210394 | 0,000000442 |
| | j-means multistart | 15 | 596,825217410 | 0,000004148 |
| | GA FL+ALA | 15 | 596,82520843* | 0,000000388 |
| | GA FL+j-means | 15 | 596,825208927 | 0,000000574 |
| | GA GH+ALA | 15 | 596,825283111 | 0* |
| | GA GH+j-means | 15 | 596,825283111 | 0* |
| BIRCH-3, N=100000, d=2; p=100, $l_2^2$ | ALA multistart | 30 | 13 | 116786778766 |
| | j-means multistart | 3000 | 13 | 158613580914 |
| | GA GH+ALA | 30 | 13* | 21699776156* |
| | GA GH+j-means | 30 | - | - |
| BIRCH-3; p=50, $l_2^2$ | GA FL+ALA | 30 | 13 | 9545892119 |
| | GA FL+j-means | 30 | - | - |
| | GA GH+ALA | 30 | 13* | 0* |
| | GA GH+j-means | 30 | - | - |
| BIRCH-3; p=20, $l_2^2$ | GA FL+ALA | 30 | 14* | 0* |
| | GA FL+j-means | 30 | - | - |
| | GA GH+ALA | 30 | 14 | 0* |
| | GA GH+j-means | 30 | - | - |

Notation: GA FL is the GA with fixed length subset recombination [43], GA GH is the GA with greedy heuristic.

References

1. Ackermann, M.R. et al.: StreamKM: A Clustering Algorithm for Data Streams. J. Exp. Algorithmics 17, Article 2.4 (2012), DOI: 10.1145/2133803.2184450.

2. Aloise, D., Deshpande, A., Hansen, P., Popat, P.: NP-Hardness of Euclidean Sum-of-Squares Clustering. Machine Learning 75, 245-249 (2009), DOI: 10.1007/s10994-009-5103-0.

3. Alp, O., Erkut, E., Drezner, Z.: An Efficient Genetic Algorithm for the p-Median Problem. Annals of Operations Research 122(1-4), 21-42 (2003), DOI: 10.1023/A:1026130003508.

4. Arthur, D., Vassilvitskii, S.: k-Means++: The Advantages of Careful Seeding. Proceedings of the Eighteenth Annual ACM-SIAM Symposium on Discrete Algorithms (SODA), pp. 1027-1035. SIAM (2007).

5. Balcan, M.-F., Ehrlich, S., Liang, Y.: Distributed k-Means and k-Median Clustering on General Communication Topologies. Advances in Neural Information Processing Systems, pp. 1995-2003 (2013).

6. Bozkaya, B.A., Zhang, J., Erkut, E.: Genetic Algorithm for the p-Median Problem. In: Drezner, Z., Hamacher, H. (eds.): Facility Location: Applications and Theory, pp. 179-205, Springer, New York (2002).

7. Cooper, L.: An Extension of the Generalized Weber Problem. Journal of Regional Science 8(2), 181-197 (1968).

8. Cooper, L.: Location-Allocation Problem. Operations Research 11, 331-343 (1963).

9. Drezner, Z., Hamacher, H.: Facility Location: Applications and Theory, 460 p., Springer-Verlag, Berlin (2004).

10. Drezner, Z., Wesolowsky, G.O.: A Trajectory Method for the Optimization of the Multifacility Location Problem with lp Distances. Management Science 24, 1507-1514 (1978).

11. Drineas, P., Frieze, A., Kannan, R., Vempala, S., Vinay, V.: Clustering Large Graphs via the Singular Value Decomposition. Machine Learning 56(1-3), 9-33 (1999).

12. Farahani, R., Hekmatfar, M. (eds.): Facility Location: Concepts, Models, Algorithms and Case Studies, 549 p., Springer-Verlag, Berlin-Heidelberg (2009).

13. Har-Peled, S., Mazumdar, S.: Coresets for k-Means and k-Median Clustering and their Applications. Proc. 36th Annu. ACM Sympos. Theory Comput., pp. 291-300 (2003).

14. Hakimi, S.L.: Optimum Locations of Switching Centers and the Absolute Centers and Medians of a Graph. Operations Research 12(3), 450-459 (1964).

15. Hansen, P., Brimberg, J., Urosevic, D., Mladenovic, N.: Solving Large p-Median Clustering Problems by Primal-Dual Variable Neighborhood Search. Data Mining and Knowledge Discovery 19(3), 351-375 (2009).

16. Hansen, P., Mladenovic, N.: J-Means: a New Local Search Heuristic for Minimum Sum of Squares Clustering. Pattern Recognition 34(2), 405-413 (2001).

17. Hosage, C.M., Goodchild, M.F.: Discrete Space Location-Allocation Solutions from Genetic Algorithms. Annals of Operations Research 6, 35-46 (1986).

18. Houck, C.R., Joines, J.A., Kay, G.M.: Comparison of Genetic Algorithms, Random Restart, and Two-Opt Switching for Solving Large Location-Allocation Problems. Computers and Operations Research 23, 587-596 (1996).

19. Kaufman, L., Rousseeuw, P.J.: Finding Groups in Data: an Introduction to Cluster Analysis. Wiley, 368 p., New York (1990).

20. Kazakovtsev, L.A., Antamoshkin, A.N., Masich, I.S.: Fast Deterministic Algorithm for EEE Components Classification. IOP Conf. Series: Materials Science and Engineering 94, Article ID 012015, 10 p. (2015), DOI: 10.1088/1757-899X/94/1/012015.

21. Kazakovtsev, L.A., Antamoshkin, A.N.: Genetic Algorithm with Fast Greedy Heuristic for Clustering and Location Problems. Informatica 38(3), (2014).

22. Kazakovtsev, L.A., Antamoshkin, A.N.: Greedy Heuristic Method for Location Problems. SibSAU Vestnik 16(2), 317-325 (2015).

23. Kazakovtsev, L.A., Antamoshkin, A.N., Gudyma, M.N.: Parallelnyi algoritm dlya p-mediannoy zadachi [Parallel Algorithm for p-Median Problem]. Sistemy Upravleniya i Informatsionnye Tekhnologii 52(2.1), 124-128 (2013).

24. Kazakovtsev, L.A., Orlov, V.I., Stupina, A.A., Kazakovtsev, V.L.: Modified Genetic Algorithm with Greedy Heuristic for Continuous and Discrete p-Median Problems. Facta Universitatis, Series: Mathematics and Informatics 30(1), 89-106 (2015).

25. Klastorin, T.D.: The p-Median Problem for Cluster Analysis: A Comparative Test Using the Mixture Model Approach. Management Science 31(1), 84-95 (1985).

26. Kochetov, Yu.A.: Metody lokalnogo poiska dlya diskretnykh zadach razmescheniya [Local Search Methods for Discrete Location Problems]. Doctoral Thesis, 259 p., Sobolev Institute of Mathematics, Novosibirsk (2010).

27. Krishna, K., Murty, M.: Genetic K-Means Algorithm. IEEE Transactions on Systems, Man and Cybernetics - Part B 29, 433-439 (1999).

28. Liao, K., Guo, D.: A Clustering-Based Approach to the Capacitated Facility Location Problem. Transactions in GIS 12(3), 323-339 (2008).

29. Lim, A., Xu, Z.: A Fixed-Length Subset Genetic Algorithm for the p-Median Problem. Lecture Notes in Computer Science 2724, 1596-1597 (2003).

30. Lloyd, S.P.: Least Squares Quantization in PCM. IEEE Transactions on Information Theory 28, 129-137 (1982).

31. Lucasius, C.B., Dane, A.D., Kateman, G.: On K-Medoid Clustering of Large Data Sets with the Aid of a Genetic Algorithm: Background, Feasibility and Comparison. Analytica Chimica Acta 282, 647-669 (1993).

32. MacQueen, J.B.: Some Methods of Classification and Analysis of Multivariate Observations. Proceedings of the 5th Berkeley Symposium on Mathematical Statistics and Probability 1, 281-297 (1967).

33. Masuyama, S., Ibaraki, T., Hasegawa, T.: The Computational Complexity of the m-Center Problems on the Plane. The Transactions of the Institute of Electronics and Communication Engineers of Japan 64E, 57-64 (1981).

34. Meira, L.A.A., Miyazawa, F.K.: A Continuous Facility Location Problem and Its Application to a Clustering Problem. Proceedings of the ACM Symposium on Applied Computing (SAC '08), pp. 1826-1831, ACM, New York (2008), DOI: 10.1145/1363686.1364126.

35. Mishra, N., Oblinger, D., Pitt, L.: Sublinear Time Approximate Clustering. 12th SODA, pp. 439-447 (2001).

36. Mladenovic, N., Brimberg, J., Hansen, P.: The p-Median Problem: A Survey of Metaheuristic Approaches. European Journal of Operational Research 179(3), 927-939 (2007).

37. Moreno-Perez, J.A., Roda Garcia, J.L., Moreno-Vega, J.M.: A Parallel Genetic Algorithm for the Discrete p-Median Problem. Studies in Location Analysis 7, 131-141 (1994).

38. Muhlenbein, H., Schomisch, M., Born, J.: The Parallel Genetic Algorithm as Function Optimizer. Proceedings of the Fourth Conference on Genetic Algorithms, pp. 271-278, Morgan Kaufmann, San Mateo (1991).

39. Neema, M.N., Maniruzzaman, K.M., Ohgai, A.: New Genetic Algorithms Based Approaches to Continuous p-Median Problem. Netw. Spat. Econ. 11, 83-99 (2011), DOI: 10.1007/s11067-008-9084-5.

40. Resende, M.G.C.: Metaheuristic Hybridization with Greedy Randomized Adaptive Search Procedures. In: Chen, Zh.-L., Raghavan, S. (eds.): TutORials in Operations Research, pp. 295-319. INFORMS (2008).

41. Resende, M.G.C., Ribeiro, C.C., Glover, F., Marti, R.: Scatter Search and Path Relinking: Fundamentals, Advances, and Applications. In: Gendreau, M., Potvin, J.-Y. (eds.): Handbook of Metaheuristics, 2nd Edition, pp. 87-107, Springer (2010).

42. Resende, M., Werneck, R.: On the Implementation of a Swap-Based Local Search Procedure for the p-Median Problem. Proceedings of the Fifth Workshop on Algorithm Engineering and Experiments (ALENEX'03), pp. 119-127, SIAM, Philadelphia (2003).

43. Sheng, W., Liu, X.: A Genetic k-Medoids Clustering Algorithm. Journal of Heuristics 12, 447-466 (2006).

44. Sun, Zh., Fox, G., Gu, W., Li, Zh.: A Parallel Clustering Method Combined Information Bottleneck Theory and Centroid Based Clustering. The Journal of Supercomputing 69(1), 452-467 (2014), DOI: 10.1007/s11227-014-1174-1.

45. Teitz, M.B., Bart, P.: Heuristic Methods for Estimating the Generalized Vertex Median of a Weighted Graph. Operations Research 16, 955-961 (1968).

46. Zhang, T., Ramakrishnan, R., Livny, M.: BIRCH: An Efficient Data Clustering Method for Very Large Databases. Proceedings of the 1996 ACM SIGMOD International Conference on Management of Data (SIGMOD '96), pp. 103-114, ACM, New York (1996).