• No results found

High utility-itemset mining and privacy-preserving utility mining

N/A
N/A
Protected

Academic year: 2021

Share "High utility-itemset mining and privacy-preserving utility mining"

Copied!
7
0
0

Loading.... (view fulltext now)

Full text

(1)

Availableonlineatwww.sciencedirect.com

ScienceDirect

jo u rn al h o m e p a g e :w w w . e l s e v i e r . c o m / p i s c

High

utility-itemset

mining

and

privacy-preserving

utility

mining

Jerry

Chun-Wei

Lin

a,∗

,

Wensheng

Gan

a

,

Philippe

Fournier-Viger

b

,

Lu

Yang

a

,

Qiankun

Liu

a

,

Jaroslav

Frnda

c

,

Lukas

Sevcik

c

,

Miroslav

Voznak

c

aSchoolofComputerScienceandTechnologyHarbinInstituteofTechnologyShenzhenGraduateSchool,

Shenzhen,China

bSchoolofNaturalSciencesandHumanities,HarbinInstituteofTechnologyShenzhenGraduateSchool,

Shenzhen,China

cDepartmentofTelecommunications,FacultyofElectricalEngineeringandComputerScience,

VSB-TechnicalUniversityofOstrava,17.listopadu15,70800Ostrava-Poruba,CzechRepublic Received27October2015;receivedinrevisedform27October2015;accepted11November2015 Availableonline 10December2015

KEYWORDS Datamining; High-utilityitemset; Privacypreserving; PSOalgorithm; GAalgorithm; Evolutionary algorithms

Summary In recent decades, high-utility itemset mining (HUIM) has emerging a critical researchtopicsincethequantityandprofitfactorsarebothconcernedtominethehigh-utility itemsets(HUIs). Generally,dataminingiscommonlyusedtodiscoverinteresting anduseful knowledgefrommassive data.Itmay,however,lead toprivacythreatsif private orsecure information(e.g.,HUIs)arepublishedinthepublicplaceormisused.Inthispaper,wefocus ontheissuesofHUIMandprivacy-preservingutilitymining(PPUM),andpresenttwo evolution-aryalgorithmstorespectivelymineHUIsandhidethesensitivehigh-utilityitemsetsinPPUM. ExtensiveexperimentsshowedthatthetwoproposedmodelsfortheapplicationsofHUIMand PPUMcannotonlygeneratethehighqualityprofitableitemsetsaccordingtotheuser-specified minimumutilitythreshold,butalsoenablethecapabilityofprivacypreservingforprivateor secureinformation(e.g.,HUIs)inreal-wordapplications.

©2015PublishedbyElsevierGmbH.ThisisanopenaccessarticleundertheCCBY-NC-NDlicense (http://creativecommons.org/licenses/by-nc-nd/4.0/).

Thisarticleispartofaspecialissueentitled‘‘Proceedingsof the1stCzech-ChinaScientificConference2015’’.

Correspondingauthor.

E-mailaddress:jerrylin@ieee.org(J.C.-W.Lin).

Introduction

Withtherapidgrowthofinformationtechniquesand vari-ousapplications,theKnowledgeDiscoveryinDatabase(KDD) which is alsocalled data mining, has become a powerful technique and commonly be used to discover interesting and useful knowledge frommassive data. The discovered http://dx.doi.org/10.1016/j.pisc.2015.11.013

2213-0209/©2015PublishedbyElsevierGmbH.ThisisanopenaccessarticleundertheCCBY-NC-NDlicense(http://creativecommons.org/ licenses/by-nc-nd/4.0/).

(2)

knowledgefromdataminingcanbegenerallyclassifiedas frequent patterns(FIs)or association rules(ARs) (Agrawal et al., 1993a,b; Han et al., 2004), high-utility patterns (HUPs) (Ahmed etal.,2009;Chanetal.,2003; Liuetal., 2005;Yaoetal.,2004),sequentialpatterns(SPs)(Agrawal and Srikant, 1995; Pei et al., 2004), clustering (Berkhin, 2006) and classification (Kotsiantis, 2007), among others. Amongthem,association-rulemining(ARM)(Agrawaletal., 1993a,b;Han et al.,2004)is the fundamental knowledge whichcanbecommonlyusedtoanalyzethepurchasedata ofcustomers.However,ARMonlyconsiderswhetherornot theitemor itemsetispresent inatransaction.The other factorsinreal-lifeapplications,suchasweight,profit, quan-tity,riskorothermeasures,arenotconsideredinARM.Thus, high-utilityitemsetmining(HUIM)(Ahmedetal.,2009;Chan etal.,2003;Liuetal.,2005;Yaoetal.,2004)has emerg-inga criticalissue in recentyearssinceit canbeusedto revealprofitableitemsets(high-utilityitemsets)by consid-eringbothpurchasequantityanddistinctprofitfactors.

Due to quick proliferation of massive data from gov-ernment, corporations and organizations, the discovered knowledge may, however, implicitly contain confidential, privateor secureinformation(i.e.,personalidentification numbers,address information, socialsecuritynumbers,or creditcardnumbers),whichleadstoprivacythreatsifthey are misused (Agrawal and Srikant, 2000; Verykios et al., 2004).Generally, collaborationamongindustriescanwork together to share information for achieving higher bene-fitsandprofitsinbusiness.However,thesharedinformation canbeextractedandanalyzedbytheothercollaboratorsor competitors, thusdecreasing itsownbenefits andcausing thesecuritythreats.Privacy-preservingdatamining(PPDM) wasthusproposedtoaddresstheabovelimitationsby per-turbingtheoriginaldatabaseandproducingasanitizedone (Amiri,2007).Manyalgorithmshave beenextensively pro-posed tohide the private or secure information (i.e.,FIs or ARs) from the different type databases (Dunning and Kresman, 2013;Han andNg, 2007).As the similar consid-erationofPPDM,privacypreservingforhigh-utilityitemset mining(PPUM)hasalsobecomeanimportanttopicinrecent years. Fewer studies have addressed the issue of PPUM and mostof them areprocessed toreduce the qualityor delete transactions for hiding sensitive high-utility item-sets(SHUIs).SeveralrelatedalgorithmsforPPUMhavebeen extensively studied, such as the Hiding High-Utility Item-setFirst(HHUIF)algorithmandMaximumSensitiveItemsets ConflictFirst(MSICF)algorithmtohidetheSHUIs(Yehand Hsu,2010),theGA-basedalgorithmforhidingSHUIsthrough transactioninsertion(Linetal.,2014a),andtheFast Pertur-bationalgorithmUsingaTreestructureandTables(FPUTT) algorithm(KannimuthuandPremalatha,2015)tospeedthe sanitization process withan aided tree structure and the associated index table. The above approaches, however, areinsufficientsincewhetherPPDMorPPUMbelongstothe NP-hardproblem.Itisnecessarytohidethesensitive infor-mation but discover the required information in decision making.

Inthispaper,weaddresstheresearchissuesby propos-ingefficientalgorithmsappliedinHUIMandPPUM.Wefirstly proposeaPSO-basedalgorithmforHUIM,andsecondly pro-pose a GA-based approach to evaluate the effectiveness andefficiencyofPPUM.Keycontributionsofthispaperare

presentbelow.(1)Wepresenttwoevolutionaryalgorithms, the PSO-based algorithm for efficiently mining HUIs and theGA-basedprivacypreservingalgorithminPPUM.(2)Not onlytheminingperformanceforHUIM,butalsoatrade-off betweenminingperformanceanditsprivacypreservingfor theSHUIscan beensured.(3) Extensiveexperiments con-ductedonseveralreal-lifeand syntheticdatasets showed thatthetwoproposedmodelsforHUIMandtheapplications ofPPUMhavebetterresultsthanthepreviousworkswhether inHUIMorPPUM.

Background

High-utilityitemsetmining

The concept of high-utility itemset mining was first pro-posedby Chanetal. (2003),and themathematical mode wasformedbyYaoetal.(2004).Theproblemofhigh-utility itemset mining (HUIM) was defined to find the rare fre-quenciesitemsets butwithhigh profits(Yaoetal.,2004). Sincethedownwardclosurepropertyisnolongerkeptfor HUIM,aTwo-Phasemodel(Liuetal.,2005)waspresentedto keepthetransaction-weightedutilizationdownwardclosure (TWDC) property for discovering HUIs.Several tree-based algorithmswerealsoproposed,suchastheHUP-tree-based IHUP algorithm for mining HUIs in incrementaldatabases (Ahmedetal.,2009),theHUP-growthalgorithm(Linetal., 2011)to findHUIs without candidategeneration, andthe efficientUP-tree-basedtwomining algorithms, UP-growth (Tseng et al., 2010)and UP-growth+ (Tseng etal., 2013). Liuetal.then proposedtheHUI-Miner algorithm(Liuand Qu, 2012) tobuild utility-liststructures and todevelop a set-enumerationtreetodirectlyextractHUIswithouteither candidategenerationoranadditionaldatabaserescan.The improvedFHM algorithm wasfurtherdesigned by enhanc-ingtheHUI-Miner foranalyzing theco-occurrences among 2-itemsets(Fournier-Vigeretal.,2014).

Besides, many interesting other issues on HUIM have alsobeenextensivelystudied,suchasup-to-dateHUIs min-ing which aims at discovering recent HUIs which may be moreuseful andinteresting(Lin etal.,2015d);HUIs min-ingindynamicenvironment(e.g.,datainsertion(Linetal., 2014b), data deletion (Lin et al., 2015a), and data mod-ification (Lin et al., 2015c)), HUIs mining in stream data environment (Li et al., 2008); on-shelf HUIs mining (Lan etal.,2011);top-kHUIsmining(Wuetal.,2012);HUIs min-ing with multiple minimum utility thresholds (Lin et al., 2015b); HUIs mining with or without negative unit profit value(Fournier-Viger,2014).

Privacypreservingforhigh-utilityitemsetsmining

Althoughdataminingvarioustechniquescanbeusedtofind theimplicitinformation,theconfidentialorsecure informa-tionis,however,requiredtobehiddenbeforeitispublished in the public place or shared among the collaborators. Privacy-preservingdatamining(PPDM)hasthusarisenasan importanttopicinrecentyears(AgrawalandSrikant,2000; Han and Ng, 2007; Verykioset al., 2004). A novel recon-structionprocedurewaspresented toaccuratelyestimate the distributionof original data values, thus building the

(3)

classifierstocomparetheaccuracy betweenthesanitized dataandtheoriginalones(AgrawalandSrikant,2000). Sev-eralalgorithmsforhidingthesensitiveinformationofPPDM arestilldesignedinprogress,suchas(DunningandKresman, 2013;HanandNg,2007).Therearethreesideeffects com-monly usedin PPDM,such asthe side effects ofartificial cost(AC),missingcost(MC),andhidingfailure(Dunningand Kresman,2013;HanandNg,2007).Sincemoreuseful infor-mationinhigh-utilityitemsetsthaninthatofthefrequent itemsetsorsequentialpatterns,privacypreservingfor high-utilityitemsetsmining(PPUM)ismorerealisticandcritical thanPPDM(Linetal.,2014a;KannimuthuandPremalatha, 2015;YehandHsu,2010)ThemaintaskofPPUMistohide thesensitive high-utility itemsets (SHUIs). Yeh et al.first designedtheHidingHighUtilityItemsetFirst(HHUIF) algo-rithmandMaximumSensitiveItemsetsConflictFirst(MSICF) andMaximumSensitiveItemsetsConflictFirst(MSICF) algo-rithmsto hideSHUIs inPPUM (Yeh andHsu,2010). Inthe past,Linetal.firstdevelopedaGA-basedmethodtohide the user-specifiedSHUIs by inserting the dummy transac-tionstotheoriginaldatabases(Linetal.,2014a).Yunetal. thendevelopedatree-basedalgorithmcalledFast Perturba-tionalgorithmUsingaTreestructureandTables(FPUTT)for hidingSHUIsinPPUM(SKannimuthuandPremalatha,2015).

Proposed

evolutionary

algorithm

for

HUIM

In this section, we propose an efficient PSO-based high-utility itemset mining model, and compared with the state-of-the-artHUPEumu-GRAMalgorithm(Kannimuthuand

Premalatha,2014)toevaluatetheefficiencyofthe devel-opedalgorithm.Detailsaredescribedbelow.

BinaryPSOforHUIM

There are four processes to mine HUIs based on the binary PSO model (Kennedy and Eberhart, 1997), which are pre-processing, particle encoding, fitness evaluation andtheupdatingprocess.Inthepre-processingprocess,it firstdiscoveredthehightransaction-weightedutilization 1-itemsets(1-HTWUIs)based ontheTWUmodel (Liuetal., 2005).Toavoidtheinvalidgenerationsoftheparticles pro-ducingintheupdatingprocess,italsocompressesthevalid combinationsoftheitemsetsinthedatabaseasa maximal-patterntree(MP-tree).Itdetermineswhetherthecombined itemsetscangeneratethevalidcombinationor notinthe databasesbased on the OR and NOR operators. A simple MP-treeisshowninFig.1.

The items of 1-HTWUIs are sorted in alphabetic-ascending order corresponding to the jth position of a particleinthesecondparticleencodingprocess.Each par-ticleisencodedasthesetofbinaryvariablescorresponding tothesortedorderof1-HTWUIs.Thentheparticles calcu-latetheirownfitnesstofindthepbestandgbestparticles inthefitnessevaluation.Forthelastupdatingprocess,the particlesarecorrespondinglyupdatedbyvelocities,pbest, gbest, and the sigmoid function. The MP-tree structure is used to generate valid combinations of the particles, which can greatly reduce the computations of multiple databasescans.Ifthefitnessvalueoftheparticleisnoless thantheminimumutilityvalue,theitemsetcorresponding

c

b

f

d

f

d

f

d

c

NOR OR OR OR NOR null NOR OR OR null OR NOR OR null

root

b

c

d

f

Figure1 ThedevelopedMP-treestructure. to the particle will be concerned as a HUI and put into the set of HUIs. The fitness function value of particle is the same as the utility value of corresponding itemset as: fitness(pi)=u(X), in which X represent the itemset corresponding to the particle pi. The iteration repeated

untilreachingtheterminationcriteriaandthen thesetof HUIswillbediscovered.DetailsoftheproposedPSO-based HUIsminingalgorithmareshowninAlgorithm1.

Algorithm1(ProposedalgorithmforHUIM).

Input:D,aquantitativedatabase;ptable,aprofittable;ı, theminimumutilitythreshold;M,thenumberofparticlesof eachiteration.

Output:HUIs,asetofhigh-utilityitemsets 1find1-HTWUIs;

2 setm:=|1-HTWUIs|; 3initializep(t):=either1or0; 4initializev(t):=rand();

5 initializepbest(t)andgbest(t);

6whileterminationcriteriaisnotreacheddo 7 updatethev(t+1)ofMparticles; 8 updatethep(t+1)ofMparticles; 9 fori←1,Mparticlesdo 10iffitness(pi(t+1))≥ıthen 11HUIs←GItems(pi(t+1))∪HUIs; 12 findpbest(t+1)ofeachMparticle; 13 findgbest(t+1)amongM;

14settt+1; 15 returnHUIs;

ExperimentalresultsofbinaryPSOforHUIM

ThealgorithmsintheexperimentswereimplementedinC++ language,performingona PCwithanIntelCore2 i3-4160 CPUand4GBofRAM,runningthe64-bitMicrosoftWindows 7operatingsystem.Tworeal-worlddatasetscalledchessand mushroom(http://fimi.ua.ac.be/data/(2012)),areusedin theexperiments.

(4)

Figure2 Runtimeofthecomparedalgorithms.

Theproposedalgorithmwascomparedwiththe state-of-the-artevolutionaryalgorithmHUPEumu-GRAM(Kannimuthu

andPremalatha,2014)intermsofruntimeandthenumber ofHUIs.Notethattheperformed algorithmswereall per-formedfor10,000iterationsandthepopulationsizeis set as20.The experimentswereexecutedfivetimesand the resultsweretheaveragevaluesofthem.Theparametersof theproposedalgorithmaresetas:c1=c2(=2)andw(=0.9).

The results of the conducted experiments of runtime in two datasets are shown in Fig. 2. It can be seen that the proposed algorithm has better performance than the HUPEumu-GRAM algorithm in performance of runtime.The

reason is that the proposed algorithmonly generates the valid combinations of itemsets existing in the database, whichcangreatlyavoid thecomputationalproblemin the evolution process. The lower the minimumutility thresh-oldisset,themore1-HTWUIsarediscoveredandthemore combinations to find the promising HUIs in the evolution process.

Proposed

evolutionary

algorithm

for

PPUM

Asitwasmentionedabove,PPUMisacriticaltasktosecure thesensitivehigh-utilityitemsets.ForthepurposeofPPUM in variousapplications,it is necessary todevelop a novel PPUMapproachthantheconventionalmethods.Anefficient GA-basedalgorithmforPPUMispresentedbelow.

ProposedPPUMGAT+algorithm

The pseudo-code of the designed PPUMGAT+ algorithm is describedinAlgorithm2.

Algorithm2(ProposedPPUMGAT+algorithm).

Input:D,aquantitativedatabasewithitshigh-utilitysensitive itemset(HS);ptable,aprofittable;Su,theupper(minimum)

utilitythreshold;MDU,themaximaldeletedutility;M,the numberofchromosomesinapopulation;N,thenumberof iterationintheevolutionprocess.

Output:DB,asanitizeddatabase.

Calculatetheitemutilityvalueu(ij),thetransactionutility

valuetu(Ti),andthetotalutilityvalueTUDofthewhole

dataset.

1 setCandtrans=null; 2 foreachTqDdo

3 foreachsiHSdo

4 ifsi∈Tqandtu(Tq)≤ MDUthen

5 CandtransCandtransTq;

6 sortthetransactionsinCandtransinascending orderaccordingtotheirtu;

7 setm=0,sum=0;

8 foreachTqinCandtransdo

9 sum+=tu(Tq);

10 ifsumMDUthen

11 m=m+1;

12 setthesizeofachromosomeasm;

13 generateMchromosomesrandomlyfromCandtrans; 14 calculatethelowerutilitythresholdSlaccordingto

SuandMDUandfindHUIsandPUIs;

15 whileterminationcriteriaisnotreacheddo 16 foreachchromosomeciamongMchromosomesina

populationdo

17 performcrossoveroperation; 18 performmutationoperation; 19 evaluatefitness(ci);

20 selecttopM/2chromosomesinapopulation; 21 generateM/2chromosomesrandomlyfrom

Candtrans;

22 obtaintheoptimalchromosomeciwithminimal

fitnessvaluefromM;

23 deleteTqofcifromDBasDB’;

24 returnDB’;

Firstly,theutilityofeachitem,theutilityofeach trans-action,andtheutilityofthewholedatasetarecalculated byoriginaldatasetandthecorrespondingutilitytable.After that,eachtransactionisthendeterminedandprojectedifit containsanyofsensitivehigh-utilityitemsetsandits trans-actionutilitytuissmallerthanthegivenmaximaldeleted utilityMDUastheCandtrans(Lines1—5).Thensortingthe transactions of Candtrans in ascending order by their tu (Line6). Summingup thesorted transactions bytheir tu, andterminating thenumber of m transactions where the summedvalueisnolongersmallthanMDUandthenumberof missetasthesizeofachromosome(Lines7—12).Afterthat, generatinga populationwithM individualsrandomly from Candtrans andcalculating thelower supportthresholdSl

(5)

ofpre-largeconceptaccordingtotheuppersupport thresh-oldSuandMDU(Lines13—14).Afterthoseoperations,the

crossoverandmutationoperationsareperformedtoupdate thechromosomes(Lines17—18).Thefitnessofeach chromo-someisthenevaluatedbythedesignedfitnessfunction(Line 19).Thehalfofthecurrentchromosomeswithlowestfitness valuesisselectedasthechromosomesandanotherhalf chro-mosomesaregeneratedrandomlyfornextgeneration(Lines 20and21).Theiterationprocedureisprocessedrepeatedly untilthe termination criterion is achieved (Lines15—21). Afterthat,theoptimalchromosomewiththelowestfitness function is obtained and the transaction IDs in this chro-mosomeisselectedasthevictimtransactionsfordeletion, thushidingthesensitivehigh-utilityitemsets(Lines22and 23).Finally, thedatabase is sanitizedintoa non-sensitive database(Line24).

Pre-largeconcept

Tospeeduptheevaluatingprocessandreducethecostof timeandspaceintheevolutionprocess,thepre-large con-cept(Hongetal.,2001)isadoptedinthedesignedalgorithm toavoidthemultipledatabasescanseachtime.Itusesthe upper (Su) and the lower (Sl) support thresholds to keep

theunpromisingitemsetswhichmayhave highly probabil-itytobelargeitemsetsasthebuffertoavoidthemultiple databasescansfortransactiondeletion.Thisstrategyis sim-ple but can be used efficiently to update the discovered information.ForthepurposeofPPUMinthispaper,the sen-sitivehigh-utilityitemsetsarerequiredtobehiddenthrough transaction deletionin the sanitization process. The pre-largeconceptfortransactiondeletionisshownbelow.

Definition1. Asafetydeletedutilitybound(f)ofpre-large conceptindicatesthatanitemsetcannotbelargeaftersome transactionsaredeletedfromtheoriginaldatabasewithout databaserescanas:

f= (Su−Sl)×|D| Su (1) inwhich |D|is the sizeof theoriginal database, Su is the

uppersupportthresholdandSlisthelowersupportthreshold

whichbotharedefinedbyusers’preference.

In the designed PPUMGAT+ algorithm, the maximal deleted utility (MDU)which wasfirstly givenbyusers can beconsideredasthesafetyboundinthepre-largeconcept and totalwhole database utility (TUD)can beconsidered asthedatabasesizeintheoriginaldatabase.Basedonthe pre-largeconcept,wecancalculatetheSlvaluetokeepthe

setofunpromisingpre-largeutilityitemsets(PUIs).

Definition2. LetSubetheuppersupportthresholdwhich

canbedefinedbyusers’preference,andMDUisthemaximal deletedutilityobtainedfromthesensitivehigh-utility item-sets,andTUDisthetotalutilityintheoriginaldatabase.The

Slinthedesignedapproachismodifiedfromthepre-large

conceptanddefinedas:

Sl=Su× 1−MDU TUD (2) Afterdelete the transactions of one chromosome, the deletedutilityofeachHUIandPUIcanbeobtained,andthe realutilityofeachHUIandPUIcanbecalculated.The miss-ingcost(MC)onlyappearsinHUIsandtheartificialcost(AC) onlyoccursinPUIs. Thus,itiseasy todetermine whether eachitemsetofHUIsandPUIsisaHUI.

Experimentalresults

Substantialexperimentswereconductedonreal-lifechess dataset (http://fimi.ua.ac.be/data/(2012)) and synthetic T10I4D100Kdataset(whichwasgeneratedbyIBMQuest Syn-thetic Data Generation Code, http://www.Almaden.ibm. com/cs/quest/syndata.html (1994)) to verify the com-prehensive performance of the proposed algorithm. The implemented algorithm adopts the pre-large concept and an improved strategy to reduce the rescanning of origin database in the evolution process, and the comparison between ourtechniqueand thenaive GA-basedalgorithm (Holland, 1992) alsois given to present the advantage of thepre-largeconceptintheproposedalgorithmintermsof runningtime.Notethatthepercentageofsensitiveitemsets isdenotedassenper.Runtimeofthecomparedapproaches ontwodatasetsunderdifferentminimumutilitythresholds withafixedsenperofHUIsisshownasFig.3.

FromFig.3,itcanbeobservedthattheproposed algo-rithmwhichadoptsthepre-largeconceptoutperformsthe

(6)

naive GA-based algorithm w.r.t. variousminimum support thresholds in all datasets. From Fig. 3(a), it can also be obviouslyseenthattheproposedalgorithmistwiceorthree times faster than the naive GA-based algorithm. For the sparseT10I4D100KdatasetshowninFig.3(b),thedesigned approach is up to two orders of magnitude faster than thecomparedtechnique.The reasonis thatthepre-large concept is usedin the proposed algorithm to reduce the rescanningoforigin database,thedesignedapproachthus only scans the origin database once in the whole evolu-tionprocess.Besides, at the situationof afixed sensitive percentage,thenumberofHUIsisdecreasedwhenthe min-imumsupportthresholdisincreased,lesscomputationsare requiredduetolessnumberofSHUIsshouldbehidden.

Conclusion

Thefirstpartofthisworkistopresentanovelevolutionary algorithmforHUIM.BasedonthedesignedMP-tree,the PSO-basedalgorithmcanconsumelessruntimethantheprevious HUPEumu-GRAM algorithm.The second partof this workis

topresentaGA-basedPPUMalgorithm.The pre-large con-cept wasalsoextended toreduce the computations. The designedapproachwasuptotwoordersofmagnitudefaster thanthecomparednaiveGA-basedtechnique.

Conflict

of

interest

Theauthorsdeclarethatthereisnoconflictofinterest.

Acknowledgements

ThispublicationhasbeencreatedwithintheprojectSupport ofVˇSB-TUOactivitieswithChinawithfinancialsupportfrom the Moravian-Silesian Region and partially was supported by the grant SGS reg. no. SP2015/82 conducted at VSB-Technical University of Ostrava, Czech Republic, and was partiallysupportedbytheShenzhenPeacockProject,China, under grant KQC201109020055A, by the Tencent Project undergrantCCF-TencentRAGR20140114,bytheNatural Sci-entific ResearchInnovation Foundation in Harbin Institute ofTechnologyundergrantHIT.NSRIF.2014100,bythe Shen-zhen Strategic Emerging Industries Program under grant ZDSY20120613125016389,andby National NaturalScience FoundationofChina(NSFC)undergrantNo.61503092.

References

Agrawal,R.,Imielinski,T.,Swami,A.,1993a.Databasemining:a performanceperspective.IEEETrans.Knowl. DataEng. 5(6), 914—925,http://dx.doi.org/10.1109/69.250074.

Agrawal,R.,Imielinski, T.,Swami,A.,1993b. Mining association rules between sets of items in large database. In: The ACM SIGMODInternationalConferenceonManagementofData,pp. 207—216.

Agrawal,R.,Srikant,R.,1995.Mining sequentialpatterns.Paper presentedattheInternationalConferenceonDataEngineering. Agrawal, R., Srikant, R., 2000. Privacy-preserving data mining. ACMSIGMODRec.29(2),439—450,http://dx.doi.org/10.1145/ 335191.335438.

Ahmed,C.F.,Tanbeer,S.K.,Jeong,B.S.,Le,Y.K.,2009.Efficient tree structures for high utilitypattern miningin incremental databases.IEEETrans.Knowl.DataEng.21(12),1708—1721. Amiri, A., 2007. Dare to share: protecting sensitive knowledge

with datasanitization.Decis. Support Syst.43(1), 181—191, http://dx.doi.org/10.1016/j.dss.2006.08.007.

Berkhin, P., 2006. A survey of clustering data mining tech-niques.PaperpresentedattheGroupingMultidimensionalData, http://dx.doi.org/10.1007/3-540-28349-82.

Chan,R.,Yang,Q.,Shen,Y.D.,2003.MiningHighUtilityItemsets. In:IEEEInternationalConferenceonDataMining,pp.19—26. Dunning,L.A.,Kresman,R.,2013.Privacypreservingdatasharing

withanonymousIDassignment.IEEETrans.Inf.ForensicsSecur. 8(2),402—413,http://dx.doi.org/10.1109/TIFS.2012.2235831. Fournier-Viger,P.,2014.FHN:efficientminingofhigh-utility

item-setswithnegativeunitprofits.Adv.DataMin.Appl.,16—29. Fournier-Viger,P.,Wu,C.W.,Zida,S.,Tseng,V.S.,2014.FHM:faster

high-utilityitemsetminingusingestimatedutilityco-occurrence pruning.Found.Intell.Syst.8502,83—92.

Han, J., Pei, J., Yin, Y., Mao, R., 2004. Mining frequent pat-terns without candidate generation: a frequent-pattern tree approach.DataMin.Knowl.Discov.8(1),53—87.

Han,S.,Ng,W.K.,2007.Privacy-preservinggeneticalgorithmsfor rulediscovery.PaperpresentedattheInternationalConference on Data Warehousing and Knowledge Discovery, Regensburg, Germany.

Holland,J.H.,1992.AdaptationinNaturalandArtificialSystems. MITPress.

Hong,T.P.,Wang,C.Y., Tao,Y.H., 2001.Anewincrementaldata miningalgorithmusingpre-largeitemsets.Intell.DataAnal.5 (2),111—129.

Kannimuthu,S.,Premalatha,K.,2015.Afastperturbationalgorithm usingtreestructureforprivacypreservingutilitymining.Expert Syst. Appl. 42 (3), 1149—1165, http://dx.doi.org/10.1016/ j.eswa.2014.08.037.

Kennedy,J.,Eberhart,R.C.,1997.Adiscretebinaryversionofthe particleswarmalgorithm.In:IEEEInternationalConferenceon Systems,Man,andCybernetics,pp.4104—4108.

Kannimuthu, S., Premalatha, K., 2014. Discovery of high util-ity itemsets using genetic algorithm with ranked mutation. Appl.Artif.Intell.28(4),337—359,http://dx.doi.org/10.1080/ 08839514.2014.891839.

Kotsiantis,S.B.,2007. Supervised machine learning:a reviewof classification techniques. Paper presented at the The Con-ference on Emerging Artificial Intelligence Applications in ComputerEngineering:RealWordAISystemswithApplicationsin eHealth,HCI,InformationRetrievalandPervasiveTechnologies. Lan, G.C., Hong, T.P., Tseng, V.S., 2011. Discoveryof high util-ity itemsets from on-shelf time periods of products. Expert Syst. Appl. 38 (5), 5851—5857, http://dx.doi.org/10.1016/ j.eswa.2010.11.040.

Li, H.F., Huang, H.Y., Chen, Y.C., Liu, Y.J., Lee, S.Y., 2008. Fast and Memory Efficient Mining of High Utility Itemsets in Data Streams., pp. 881—886, http://dx.doi.org/10.1109/ icdm.2008.107.

Lin, C.W., Hong, T.P., Lu, W.H., 2011. An effective tree struc-tureformininghighutilityitemsets.ExpertSyst.Appl.38(6), 7419—7424,http://dx.doi.org/10.1016/j.eswa.2010.12.082. Lin,C.W.,Hong,T.P.,Wong,J.W.,Lan,G.C., Lin,W.Y.,2014a.A

GA-basedapproachtohidesensitivehighutilityitemsets.Sci. WorldJ.,2014.

Lin,J.C.W.,Gan,W.,Hong,T.P., Pan,J.S.,2014b.Incrementally updating high-utilityitemsetswithtransaction insertion. Adv. DataMin.Appl.,44—56.

Lin,C.W.,Hong,T.P.,Lan,G.C.,Wong,J.W.,Lin,W.Y.,2015a. Effi-cientupdatingofdiscoveredhigh-utilityitemsetsfortransaction deletionindynamicdatabases.Adv.Eng.Inform.29(1),16—27, http://dx.doi.org/10.1016/j.aei.2014.08.003.

(7)

Lin, J.C.W., Gan, W., Fournier-Viger, P., Hong, T.P., 2015b. Mining high-utility itemsets with multiple minimum utility thresholds.In:ACMInternationalConferenceonComputer Sci-ence & Software Engineering, pp. 9—17, http://dx.doi.org/ 10.1145/2790798.2790807.

Lin, J.C.W., Gan, W., Hong, T.P., 2015c. A fast updated algo-rithm to maintain the discovered high-utility itemsets for transaction modification. Adv. Eng. Inform. 29(3), 562—574, http://dx.doi.org/10.1016/j.aei.2015.05.003.

Lin,J.C.W.,Gan,W.,Hong,T.P.,Tseng,V.S.,2015d.Efficient algo-rithms for mining up-to-date high-utility patterns. Adv. Eng. Inform.29(3),648—661,http://dx.doi.org/10.1016/j.aei.2015. 06.002.

Liu,M.,Qu,J.,2012.Mininghighutilityitemsetswithoutcandidate generation.In:ACMInternationalConferenceonInformationand KnowledgeManagement,pp.55—64.

Liu,Y.,Liao,W.K.,Choudhary,A.,2005.Atwo-phasealgorithmfor fastdiscoveryofhighutilityitemsets.Paperpresentedatthe LectureNotesinComputerScience,http://dx.doi.org/10.1007/ 1143091979.

Pei,J.,Han,J.,Mortazavi-Asl,B.,Wang,J.,Pinto,H.,Chen,Q.,... Hsu,M.-C.,2004.Miningsequentialpatternsbypattern-growth:

thePrefixSpanapproach.IEEETrans.Knowl.DataEng.16(11), 1424—1440,http://dx.doi.org/10.1109/tkde.2004.77.

Tseng,V.S.,Shie,B.E.,Wu,C.W.,Yu,P.S.,2013.Efficientalgorithms for mininghighutility itemsetsfrom transactionaldatabases. IEEETrans.Knowl.DataEng.25,1772—1786.

Tseng,V.S., Wu,C.W.,Shie,B.E., Yu,P.S.,2010. UP-Growth:an efficientalgorithmforhighutilityitemsetmining.In:The16th ACMSIGKDDInternationalConferenceonKnowledgeDiscovery andDataMining,pp.253—262.

Verykios,V.S.,Bertino,E.,Fovino,I.N.,Provenza,L.P.,Saygin,Y., Theodoridis,Y.,2004.State-of-the-artinprivacypreservingdata mining. ACM SIGMOD Rec. 33(1), 50—57, http://dx.doi.org/ 10.1145/974121.974131.

Wu,C.W.,Shie,B.E.,Tseng,V.S.,Yu,P.S.,2012.Miningtop-Khigh utilityitemsets.In:The18thACMSIGKDDInternational Confer-enceonKnowledgeDiscoveryandDataMining,pp.78—86. Yao,H.,Hamilton,H.J.,Butz,C.J.,2004.Afoundationalapproach

tominingitemsetutilitiesfromdatabases.In:TheSIAM Interna-tionalConferenceonDataMining,pp.211—225.

Yeh,J.S.,Hsu,P.C.,2010.HHUIFandMSICF:novelalgorithmsfor privacy preserving utility mining. Expert Syst. Appl. 37 (7), 4779—4786,http://dx.doi.org/10.1016/j.eswa.2009.12.038.

www.sciencedirect.com w w w . e l s e v i e r . c o m / p i s c http://creativecommons.org/licenses/by-nc-nd/4.0/ (http://fimi.ua.ac.be/data/ http://www.Almaden.ibm.com/cs/quest/syndata.html Mining Mining 1708—1721. 19—26. 16—29. 83—92. 53—87. Privacy-preserving Adaptation 111—129. 4104—4108. Supervised 2014. 44—56. 55—64. 1772—1786. 253—262. 78—86. 211—225.

References

Related documents

tendencies of the rising capitalist order in his. presentation of Biprodas’s

As seen in this table, the effect of the microphone position, engine rotation speed, gear ratio, interaction of microphone position in engine rotation speed, the

Author Chaturvedi (2008) has confirmed that in prior researches the application of the tobacco dust as organic amendment provide the nutrients such as high quantity

• Has the cabinet manufacturer produced reliable data to show a stored endoscope may be directly used on a patient without reprocessing for a validated time.. • Which tests

Therefore a tradeoff can be illustrated between the implementation cost and the hydrological impact on the watershed based on the storm water management approach of using only LID

Conclusions: A progressive series of three double-balloon dilations for cricopharyngeus muscle dysfunction resulted in improved patient reported dysphagia symptom scores and

Sakai T, Matsuyama T, Sano M, Iida T: Identification of novel putative virulence factors, adhesin AIDA and type VI secretion system, in atypical strains of fish pathogenic

Kannieappan et al BMC Pregnancy and Childbirth 2013, 13 42 http //www biomedcentral com/1471 2393/13/42 RESEARCH ARTICLE Open Access Developing a tool for obtaining maternal