Availableonlineatwww.sciencedirect.com
ScienceDirect
jo u rn al h o m e p a g e :w w w . e l s e v i e r . c o m / p i s c
High
utility-itemset
mining
and
privacy-preserving
utility
mining
夽
Jerry
Chun-Wei
Lin
a,∗,
Wensheng
Gan
a,
Philippe
Fournier-Viger
b,
Lu
Yang
a,
Qiankun
Liu
a,
Jaroslav
Frnda
c,
Lukas
Sevcik
c,
Miroslav
Voznak
caSchoolofComputerScienceandTechnologyHarbinInstituteofTechnologyShenzhenGraduateSchool,
Shenzhen,China
bSchoolofNaturalSciencesandHumanities,HarbinInstituteofTechnologyShenzhenGraduateSchool,
Shenzhen,China
cDepartmentofTelecommunications,FacultyofElectricalEngineeringandComputerScience,
VSB-TechnicalUniversityofOstrava,17.listopadu15,70800Ostrava-Poruba,CzechRepublic Received27October2015;receivedinrevisedform27October2015;accepted11November2015 Availableonline 10December2015
KEYWORDS Datamining; High-utilityitemset; Privacypreserving; PSOalgorithm; GAalgorithm; Evolutionary algorithms
Summary In recent decades, high-utility itemset mining (HUIM) has emerging a critical researchtopicsincethequantityandprofitfactorsarebothconcernedtominethehigh-utility itemsets(HUIs). Generally,dataminingiscommonlyusedtodiscoverinteresting anduseful knowledgefrommassive data.Itmay,however,lead toprivacythreatsif private orsecure information(e.g.,HUIs)arepublishedinthepublicplaceormisused.Inthispaper,wefocus ontheissuesofHUIMandprivacy-preservingutilitymining(PPUM),andpresenttwo evolution-aryalgorithmstorespectivelymineHUIsandhidethesensitivehigh-utilityitemsetsinPPUM. ExtensiveexperimentsshowedthatthetwoproposedmodelsfortheapplicationsofHUIMand PPUMcannotonlygeneratethehighqualityprofitableitemsetsaccordingtotheuser-specified minimumutilitythreshold,butalsoenablethecapabilityofprivacypreservingforprivateor secureinformation(e.g.,HUIs)inreal-wordapplications.
©2015PublishedbyElsevierGmbH.ThisisanopenaccessarticleundertheCCBY-NC-NDlicense (http://creativecommons.org/licenses/by-nc-nd/4.0/).
夽 Thisarticleispartofaspecialissueentitled‘‘Proceedingsof the1stCzech-ChinaScientificConference2015’’.
∗Correspondingauthor.
E-mailaddress:jerrylin@ieee.org(J.C.-W.Lin).
Introduction
Withtherapidgrowthofinformationtechniquesand vari-ousapplications,theKnowledgeDiscoveryinDatabase(KDD) which is alsocalled data mining, has become a powerful technique and commonly be used to discover interesting and useful knowledge frommassive data. The discovered http://dx.doi.org/10.1016/j.pisc.2015.11.013
2213-0209/©2015PublishedbyElsevierGmbH.ThisisanopenaccessarticleundertheCCBY-NC-NDlicense(http://creativecommons.org/ licenses/by-nc-nd/4.0/).
knowledgefromdataminingcanbegenerallyclassifiedas frequent patterns(FIs)or association rules(ARs) (Agrawal et al., 1993a,b; Han et al., 2004), high-utility patterns (HUPs) (Ahmed etal.,2009;Chanetal.,2003; Liuetal., 2005;Yaoetal.,2004),sequentialpatterns(SPs)(Agrawal and Srikant, 1995; Pei et al., 2004), clustering (Berkhin, 2006) and classification (Kotsiantis, 2007), among others. Amongthem,association-rulemining(ARM)(Agrawaletal., 1993a,b;Han et al.,2004)is the fundamental knowledge whichcanbecommonlyusedtoanalyzethepurchasedata ofcustomers.However,ARMonlyconsiderswhetherornot theitemor itemsetispresent inatransaction.The other factorsinreal-lifeapplications,suchasweight,profit, quan-tity,riskorothermeasures,arenotconsideredinARM.Thus, high-utilityitemsetmining(HUIM)(Ahmedetal.,2009;Chan etal.,2003;Liuetal.,2005;Yaoetal.,2004)has emerg-inga criticalissue in recentyearssinceit canbeusedto revealprofitableitemsets(high-utilityitemsets)by consid-eringbothpurchasequantityanddistinctprofitfactors.
Due to quick proliferation of massive data from gov-ernment, corporations and organizations, the discovered knowledge may, however, implicitly contain confidential, privateor secureinformation(i.e.,personalidentification numbers,address information, socialsecuritynumbers,or creditcardnumbers),whichleadstoprivacythreatsifthey are misused (Agrawal and Srikant, 2000; Verykios et al., 2004).Generally, collaborationamongindustriescanwork together to share information for achieving higher bene-fitsandprofitsinbusiness.However,thesharedinformation canbeextractedandanalyzedbytheothercollaboratorsor competitors, thusdecreasing itsownbenefits andcausing thesecuritythreats.Privacy-preservingdatamining(PPDM) wasthusproposedtoaddresstheabovelimitationsby per-turbingtheoriginaldatabaseandproducingasanitizedone (Amiri,2007).Manyalgorithmshave beenextensively pro-posed tohide the private or secure information (i.e.,FIs or ARs) from the different type databases (Dunning and Kresman, 2013;Han andNg, 2007).As the similar consid-erationofPPDM,privacypreservingforhigh-utilityitemset mining(PPUM)hasalsobecomeanimportanttopicinrecent years. Fewer studies have addressed the issue of PPUM and mostof them areprocessed toreduce the qualityor delete transactions for hiding sensitive high-utility item-sets(SHUIs).SeveralrelatedalgorithmsforPPUMhavebeen extensively studied, such as the Hiding High-Utility Item-setFirst(HHUIF)algorithmandMaximumSensitiveItemsets ConflictFirst(MSICF)algorithmtohidetheSHUIs(Yehand Hsu,2010),theGA-basedalgorithmforhidingSHUIsthrough transactioninsertion(Linetal.,2014a),andtheFast Pertur-bationalgorithmUsingaTreestructureandTables(FPUTT) algorithm(KannimuthuandPremalatha,2015)tospeedthe sanitization process withan aided tree structure and the associated index table. The above approaches, however, areinsufficientsincewhetherPPDMorPPUMbelongstothe NP-hardproblem.Itisnecessarytohidethesensitive infor-mation but discover the required information in decision making.
Inthispaper,weaddresstheresearchissuesby propos-ingefficientalgorithmsappliedinHUIMandPPUM.Wefirstly proposeaPSO-basedalgorithmforHUIM,andsecondly pro-pose a GA-based approach to evaluate the effectiveness andefficiencyofPPUM.Keycontributionsofthispaperare
presentbelow.(1)Wepresenttwoevolutionaryalgorithms, the PSO-based algorithm for efficiently mining HUIs and theGA-basedprivacypreservingalgorithminPPUM.(2)Not onlytheminingperformanceforHUIM,butalsoatrade-off betweenminingperformanceanditsprivacypreservingfor theSHUIscan beensured.(3) Extensiveexperiments con-ductedonseveralreal-lifeand syntheticdatasets showed thatthetwoproposedmodelsforHUIMandtheapplications ofPPUMhavebetterresultsthanthepreviousworkswhether inHUIMorPPUM.
Background
High-utilityitemsetmining
The concept of high-utility itemset mining was first pro-posedby Chanetal. (2003),and themathematical mode wasformedbyYaoetal.(2004).Theproblemofhigh-utility itemset mining (HUIM) was defined to find the rare fre-quenciesitemsets butwithhigh profits(Yaoetal.,2004). Sincethedownwardclosurepropertyisnolongerkeptfor HUIM,aTwo-Phasemodel(Liuetal.,2005)waspresentedto keepthetransaction-weightedutilizationdownwardclosure (TWDC) property for discovering HUIs.Several tree-based algorithmswerealsoproposed,suchastheHUP-tree-based IHUP algorithm for mining HUIs in incrementaldatabases (Ahmedetal.,2009),theHUP-growthalgorithm(Linetal., 2011)to findHUIs without candidategeneration, andthe efficientUP-tree-basedtwomining algorithms, UP-growth (Tseng et al., 2010)and UP-growth+ (Tseng etal., 2013). Liuetal.then proposedtheHUI-Miner algorithm(Liuand Qu, 2012) tobuild utility-liststructures and todevelop a set-enumerationtreetodirectlyextractHUIswithouteither candidategenerationoranadditionaldatabaserescan.The improvedFHM algorithm wasfurtherdesigned by enhanc-ingtheHUI-Miner foranalyzing theco-occurrences among 2-itemsets(Fournier-Vigeretal.,2014).
Besides, many interesting other issues on HUIM have alsobeenextensivelystudied,suchasup-to-dateHUIs min-ing which aims at discovering recent HUIs which may be moreuseful andinteresting(Lin etal.,2015d);HUIs min-ingindynamicenvironment(e.g.,datainsertion(Linetal., 2014b), data deletion (Lin et al., 2015a), and data mod-ification (Lin et al., 2015c)), HUIs mining in stream data environment (Li et al., 2008); on-shelf HUIs mining (Lan etal.,2011);top-kHUIsmining(Wuetal.,2012);HUIs min-ing with multiple minimum utility thresholds (Lin et al., 2015b); HUIs mining with or without negative unit profit value(Fournier-Viger,2014).
Privacypreservingforhigh-utilityitemsetsmining
Althoughdataminingvarioustechniquescanbeusedtofind theimplicitinformation,theconfidentialorsecure informa-tionis,however,requiredtobehiddenbeforeitispublished in the public place or shared among the collaborators. Privacy-preservingdatamining(PPDM)hasthusarisenasan importanttopicinrecentyears(AgrawalandSrikant,2000; Han and Ng, 2007; Verykioset al., 2004). A novel recon-structionprocedurewaspresented toaccuratelyestimate the distributionof original data values, thus building the
classifierstocomparetheaccuracy betweenthesanitized dataandtheoriginalones(AgrawalandSrikant,2000). Sev-eralalgorithmsforhidingthesensitiveinformationofPPDM arestilldesignedinprogress,suchas(DunningandKresman, 2013;HanandNg,2007).Therearethreesideeffects com-monly usedin PPDM,such asthe side effects ofartificial cost(AC),missingcost(MC),andhidingfailure(Dunningand Kresman,2013;HanandNg,2007).Sincemoreuseful infor-mationinhigh-utilityitemsetsthaninthatofthefrequent itemsetsorsequentialpatterns,privacypreservingfor high-utilityitemsetsmining(PPUM)ismorerealisticandcritical thanPPDM(Linetal.,2014a;KannimuthuandPremalatha, 2015;YehandHsu,2010)ThemaintaskofPPUMistohide thesensitive high-utility itemsets (SHUIs). Yeh et al.first designedtheHidingHighUtilityItemsetFirst(HHUIF) algo-rithmandMaximumSensitiveItemsetsConflictFirst(MSICF) andMaximumSensitiveItemsetsConflictFirst(MSICF) algo-rithmsto hideSHUIs inPPUM (Yeh andHsu,2010). Inthe past,Linetal.firstdevelopedaGA-basedmethodtohide the user-specifiedSHUIs by inserting the dummy transac-tionstotheoriginaldatabases(Linetal.,2014a).Yunetal. thendevelopedatree-basedalgorithmcalledFast Perturba-tionalgorithmUsingaTreestructureandTables(FPUTT)for hidingSHUIsinPPUM(SKannimuthuandPremalatha,2015).
Proposed
evolutionary
algorithm
for
HUIM
In this section, we propose an efficient PSO-based high-utility itemset mining model, and compared with the state-of-the-artHUPEumu-GRAMalgorithm(Kannimuthuand
Premalatha,2014)toevaluatetheefficiencyofthe devel-opedalgorithm.Detailsaredescribedbelow.
BinaryPSOforHUIM
There are four processes to mine HUIs based on the binary PSO model (Kennedy and Eberhart, 1997), which are pre-processing, particle encoding, fitness evaluation andtheupdatingprocess.Inthepre-processingprocess,it firstdiscoveredthehightransaction-weightedutilization 1-itemsets(1-HTWUIs)based ontheTWUmodel (Liuetal., 2005).Toavoidtheinvalidgenerationsoftheparticles pro-ducingintheupdatingprocess,italsocompressesthevalid combinationsoftheitemsetsinthedatabaseasa maximal-patterntree(MP-tree).Itdetermineswhetherthecombined itemsetscangeneratethevalidcombinationor notinthe databasesbased on the OR and NOR operators. A simple MP-treeisshowninFig.1.
The items of 1-HTWUIs are sorted in alphabetic-ascending order corresponding to the jth position of a particleinthesecondparticleencodingprocess.Each par-ticleisencodedasthesetofbinaryvariablescorresponding tothesortedorderof1-HTWUIs.Thentheparticles calcu-latetheirownfitnesstofindthepbestandgbestparticles inthefitnessevaluation.Forthelastupdatingprocess,the particlesarecorrespondinglyupdatedbyvelocities,pbest, gbest, and the sigmoid function. The MP-tree structure is used to generate valid combinations of the particles, which can greatly reduce the computations of multiple databasescans.Ifthefitnessvalueoftheparticleisnoless thantheminimumutilityvalue,theitemsetcorresponding
c
b
f
d
f
d
f
d
c
NOR OR OR OR NOR null NOR OR OR null OR NOR OR nullroot
b
c
d
f
Figure1 ThedevelopedMP-treestructure. to the particle will be concerned as a HUI and put into the set of HUIs. The fitness function value of particle is the same as the utility value of corresponding itemset as: fitness(pi)=u(X), in which X represent the itemset corresponding to the particle pi. The iteration repeated
untilreachingtheterminationcriteriaandthen thesetof HUIswillbediscovered.DetailsoftheproposedPSO-based HUIsminingalgorithmareshowninAlgorithm1.
Algorithm1(ProposedalgorithmforHUIM).
Input:D,aquantitativedatabase;ptable,aprofittable;ı, theminimumutilitythreshold;M,thenumberofparticlesof eachiteration.
Output:HUIs,asetofhigh-utilityitemsets 1find1-HTWUIs;
2 setm:=|1-HTWUIs|; 3initializep(t):=either1or0; 4initializev(t):=rand();
5 initializepbest(t)andgbest(t);
6whileterminationcriteriaisnotreacheddo 7 updatethev(t+1)ofMparticles; 8 updatethep(t+1)ofMparticles; 9 fori←1,Mparticlesdo 10iffitness(pi(t+1))≥ıthen 11HUIs←GItems(pi(t+1))∪HUIs; 12 findpbest(t+1)ofeachMparticle; 13 findgbest(t+1)amongM;
14sett←t+1; 15 returnHUIs;
ExperimentalresultsofbinaryPSOforHUIM
ThealgorithmsintheexperimentswereimplementedinC++ language,performingona PCwithanIntelCore2 i3-4160 CPUand4GBofRAM,runningthe64-bitMicrosoftWindows 7operatingsystem.Tworeal-worlddatasetscalledchessand mushroom(http://fimi.ua.ac.be/data/(2012)),areusedin theexperiments.
Figure2 Runtimeofthecomparedalgorithms.
Theproposedalgorithmwascomparedwiththe state-of-the-artevolutionaryalgorithmHUPEumu-GRAM(Kannimuthu
andPremalatha,2014)intermsofruntimeandthenumber ofHUIs.Notethattheperformed algorithmswereall per-formedfor10,000iterationsandthepopulationsizeis set as20.The experimentswereexecutedfivetimesand the resultsweretheaveragevaluesofthem.Theparametersof theproposedalgorithmaresetas:c1=c2(=2)andw(=0.9).
The results of the conducted experiments of runtime in two datasets are shown in Fig. 2. It can be seen that the proposed algorithm has better performance than the HUPEumu-GRAM algorithm in performance of runtime.The
reason is that the proposed algorithmonly generates the valid combinations of itemsets existing in the database, whichcangreatlyavoid thecomputationalproblemin the evolution process. The lower the minimumutility thresh-oldisset,themore1-HTWUIsarediscoveredandthemore combinations to find the promising HUIs in the evolution process.
Proposed
evolutionary
algorithm
for
PPUM
Asitwasmentionedabove,PPUMisacriticaltasktosecure thesensitivehigh-utilityitemsets.ForthepurposeofPPUM in variousapplications,it is necessary todevelop a novel PPUMapproachthantheconventionalmethods.Anefficient GA-basedalgorithmforPPUMispresentedbelow.
ProposedPPUMGAT+algorithm
The pseudo-code of the designed PPUMGAT+ algorithm is describedinAlgorithm2.
Algorithm2(ProposedPPUMGAT+algorithm).
Input:D,aquantitativedatabasewithitshigh-utilitysensitive itemset(HS);ptable,aprofittable;Su,theupper(minimum)
utilitythreshold;MDU,themaximaldeletedutility;M,the numberofchromosomesinapopulation;N,thenumberof iterationintheevolutionprocess.
Output:DB,asanitizeddatabase.
Calculatetheitemutilityvalueu(ij),thetransactionutility
valuetu(Ti),andthetotalutilityvalueTUDofthewhole
dataset.
1 setCandtrans=null; 2 foreachTq∈Ddo
3 foreachsi∈HSdo
4 ifsi∈Tqandtu(Tq)≤ MDUthen
5 Candtrans←Candtrans∪Tq;
6 sortthetransactionsinCandtransinascending orderaccordingtotheirtu;
7 setm=0,sum=0;
8 foreachTqinCandtransdo
9 sum+=tu(Tq);
10 ifsum≤MDUthen
11 m=m+1;
12 setthesizeofachromosomeasm;
13 generateMchromosomesrandomlyfromCandtrans; 14 calculatethelowerutilitythresholdSlaccordingto
SuandMDUandfindHUIsandPUIs;
15 whileterminationcriteriaisnotreacheddo 16 foreachchromosomeciamongMchromosomesina
populationdo
17 performcrossoveroperation; 18 performmutationoperation; 19 evaluatefitness(ci);
20 selecttopM/2chromosomesinapopulation; 21 generateM/2chromosomesrandomlyfrom
Candtrans;
22 obtaintheoptimalchromosomeciwithminimal
fitnessvaluefromM;
23 deleteTqofcifromDBasDB’;
24 returnDB’;
Firstly,theutilityofeachitem,theutilityofeach trans-action,andtheutilityofthewholedatasetarecalculated byoriginaldatasetandthecorrespondingutilitytable.After that,eachtransactionisthendeterminedandprojectedifit containsanyofsensitivehigh-utilityitemsetsandits trans-actionutilitytuissmallerthanthegivenmaximaldeleted utilityMDUastheCandtrans(Lines1—5).Thensortingthe transactions of Candtrans in ascending order by their tu (Line6). Summingup thesorted transactions bytheir tu, andterminating thenumber of m transactions where the summedvalueisnolongersmallthanMDUandthenumberof missetasthesizeofachromosome(Lines7—12).Afterthat, generatinga populationwithM individualsrandomly from Candtrans andcalculating thelower supportthresholdSl
ofpre-largeconceptaccordingtotheuppersupport thresh-oldSuandMDU(Lines13—14).Afterthoseoperations,the
crossoverandmutationoperationsareperformedtoupdate thechromosomes(Lines17—18).Thefitnessofeach chromo-someisthenevaluatedbythedesignedfitnessfunction(Line 19).Thehalfofthecurrentchromosomeswithlowestfitness valuesisselectedasthechromosomesandanotherhalf chro-mosomesaregeneratedrandomlyfornextgeneration(Lines 20and21).Theiterationprocedureisprocessedrepeatedly untilthe termination criterion is achieved (Lines15—21). Afterthat,theoptimalchromosomewiththelowestfitness function is obtained and the transaction IDs in this chro-mosomeisselectedasthevictimtransactionsfordeletion, thushidingthesensitivehigh-utilityitemsets(Lines22and 23).Finally, thedatabase is sanitizedintoa non-sensitive database(Line24).
Pre-largeconcept
Tospeeduptheevaluatingprocessandreducethecostof timeandspaceintheevolutionprocess,thepre-large con-cept(Hongetal.,2001)isadoptedinthedesignedalgorithm toavoidthemultipledatabasescanseachtime.Itusesthe upper (Su) and the lower (Sl) support thresholds to keep
theunpromisingitemsetswhichmayhave highly probabil-itytobelargeitemsetsasthebuffertoavoidthemultiple databasescansfortransactiondeletion.Thisstrategyis sim-ple but can be used efficiently to update the discovered information.ForthepurposeofPPUMinthispaper,the sen-sitivehigh-utilityitemsetsarerequiredtobehiddenthrough transaction deletionin the sanitization process. The pre-largeconceptfortransactiondeletionisshownbelow.
Definition1. Asafetydeletedutilitybound(f)ofpre-large conceptindicatesthatanitemsetcannotbelargeaftersome transactionsaredeletedfromtheoriginaldatabasewithout databaserescanas:
f= (Su−Sl)×|D| Su (1) inwhich |D|is the sizeof theoriginal database, Su is the
uppersupportthresholdandSlisthelowersupportthreshold
whichbotharedefinedbyusers’preference.
In the designed PPUMGAT+ algorithm, the maximal deleted utility (MDU)which wasfirstly givenbyusers can beconsideredasthesafetyboundinthepre-largeconcept and totalwhole database utility (TUD)can beconsidered asthedatabasesizeintheoriginaldatabase.Basedonthe pre-largeconcept,wecancalculatetheSlvaluetokeepthe
setofunpromisingpre-largeutilityitemsets(PUIs).
Definition2. LetSubetheuppersupportthresholdwhich
canbedefinedbyusers’preference,andMDUisthemaximal deletedutilityobtainedfromthesensitivehigh-utility item-sets,andTUDisthetotalutilityintheoriginaldatabase.The
Slinthedesignedapproachismodifiedfromthepre-large
conceptanddefinedas:
Sl=Su× 1−MDU TUD (2) Afterdelete the transactions of one chromosome, the deletedutilityofeachHUIandPUIcanbeobtained,andthe realutilityofeachHUIandPUIcanbecalculated.The miss-ingcost(MC)onlyappearsinHUIsandtheartificialcost(AC) onlyoccursinPUIs. Thus,itiseasy todetermine whether eachitemsetofHUIsandPUIsisaHUI.
Experimentalresults
Substantialexperimentswereconductedonreal-lifechess dataset (http://fimi.ua.ac.be/data/(2012)) and synthetic T10I4D100Kdataset(whichwasgeneratedbyIBMQuest Syn-thetic Data Generation Code, http://www.Almaden.ibm. com/cs/quest/syndata.html (1994)) to verify the com-prehensive performance of the proposed algorithm. The implemented algorithm adopts the pre-large concept and an improved strategy to reduce the rescanning of origin database in the evolution process, and the comparison between ourtechniqueand thenaive GA-basedalgorithm (Holland, 1992) alsois given to present the advantage of thepre-largeconceptintheproposedalgorithmintermsof runningtime.Notethatthepercentageofsensitiveitemsets isdenotedassenper.Runtimeofthecomparedapproaches ontwodatasetsunderdifferentminimumutilitythresholds withafixedsenperofHUIsisshownasFig.3.
FromFig.3,itcanbeobservedthattheproposed algo-rithmwhichadoptsthepre-largeconceptoutperformsthe
naive GA-based algorithm w.r.t. variousminimum support thresholds in all datasets. From Fig. 3(a), it can also be obviouslyseenthattheproposedalgorithmistwiceorthree times faster than the naive GA-based algorithm. For the sparseT10I4D100KdatasetshowninFig.3(b),thedesigned approach is up to two orders of magnitude faster than thecomparedtechnique.The reasonis thatthepre-large concept is usedin the proposed algorithm to reduce the rescanningoforigin database,thedesignedapproachthus only scans the origin database once in the whole evolu-tionprocess.Besides, at the situationof afixed sensitive percentage,thenumberofHUIsisdecreasedwhenthe min-imumsupportthresholdisincreased,lesscomputationsare requiredduetolessnumberofSHUIsshouldbehidden.
Conclusion
Thefirstpartofthisworkistopresentanovelevolutionary algorithmforHUIM.BasedonthedesignedMP-tree,the PSO-basedalgorithmcanconsumelessruntimethantheprevious HUPEumu-GRAM algorithm.The second partof this workis
topresentaGA-basedPPUMalgorithm.The pre-large con-cept wasalsoextended toreduce the computations. The designedapproachwasuptotwoordersofmagnitudefaster thanthecomparednaiveGA-basedtechnique.
Conflict
of
interest
Theauthorsdeclarethatthereisnoconflictofinterest.
Acknowledgements
ThispublicationhasbeencreatedwithintheprojectSupport ofVˇSB-TUOactivitieswithChinawithfinancialsupportfrom the Moravian-Silesian Region and partially was supported by the grant SGS reg. no. SP2015/82 conducted at VSB-Technical University of Ostrava, Czech Republic, and was partiallysupportedbytheShenzhenPeacockProject,China, under grant KQC201109020055A, by the Tencent Project undergrantCCF-TencentRAGR20140114,bytheNatural Sci-entific ResearchInnovation Foundation in Harbin Institute ofTechnologyundergrantHIT.NSRIF.2014100,bythe Shen-zhen Strategic Emerging Industries Program under grant ZDSY20120613125016389,andby National NaturalScience FoundationofChina(NSFC)undergrantNo.61503092.
References
Agrawal,R.,Imielinski,T.,Swami,A.,1993a.Databasemining:a performanceperspective.IEEETrans.Knowl. DataEng. 5(6), 914—925,http://dx.doi.org/10.1109/69.250074.
Agrawal,R.,Imielinski, T.,Swami,A.,1993b. Mining association rules between sets of items in large database. In: The ACM SIGMODInternationalConferenceonManagementofData,pp. 207—216.
Agrawal,R.,Srikant,R.,1995.Mining sequentialpatterns.Paper presentedattheInternationalConferenceonDataEngineering. Agrawal, R., Srikant, R., 2000. Privacy-preserving data mining. ACMSIGMODRec.29(2),439—450,http://dx.doi.org/10.1145/ 335191.335438.
Ahmed,C.F.,Tanbeer,S.K.,Jeong,B.S.,Le,Y.K.,2009.Efficient tree structures for high utilitypattern miningin incremental databases.IEEETrans.Knowl.DataEng.21(12),1708—1721. Amiri, A., 2007. Dare to share: protecting sensitive knowledge
with datasanitization.Decis. Support Syst.43(1), 181—191, http://dx.doi.org/10.1016/j.dss.2006.08.007.
Berkhin, P., 2006. A survey of clustering data mining tech-niques.PaperpresentedattheGroupingMultidimensionalData, http://dx.doi.org/10.1007/3-540-28349-82.
Chan,R.,Yang,Q.,Shen,Y.D.,2003.MiningHighUtilityItemsets. In:IEEEInternationalConferenceonDataMining,pp.19—26. Dunning,L.A.,Kresman,R.,2013.Privacypreservingdatasharing
withanonymousIDassignment.IEEETrans.Inf.ForensicsSecur. 8(2),402—413,http://dx.doi.org/10.1109/TIFS.2012.2235831. Fournier-Viger,P.,2014.FHN:efficientminingofhigh-utility
item-setswithnegativeunitprofits.Adv.DataMin.Appl.,16—29. Fournier-Viger,P.,Wu,C.W.,Zida,S.,Tseng,V.S.,2014.FHM:faster
high-utilityitemsetminingusingestimatedutilityco-occurrence pruning.Found.Intell.Syst.8502,83—92.
Han, J., Pei, J., Yin, Y., Mao, R., 2004. Mining frequent pat-terns without candidate generation: a frequent-pattern tree approach.DataMin.Knowl.Discov.8(1),53—87.
Han,S.,Ng,W.K.,2007.Privacy-preservinggeneticalgorithmsfor rulediscovery.PaperpresentedattheInternationalConference on Data Warehousing and Knowledge Discovery, Regensburg, Germany.
Holland,J.H.,1992.AdaptationinNaturalandArtificialSystems. MITPress.
Hong,T.P.,Wang,C.Y., Tao,Y.H., 2001.Anewincrementaldata miningalgorithmusingpre-largeitemsets.Intell.DataAnal.5 (2),111—129.
Kannimuthu,S.,Premalatha,K.,2015.Afastperturbationalgorithm usingtreestructureforprivacypreservingutilitymining.Expert Syst. Appl. 42 (3), 1149—1165, http://dx.doi.org/10.1016/ j.eswa.2014.08.037.
Kennedy,J.,Eberhart,R.C.,1997.Adiscretebinaryversionofthe particleswarmalgorithm.In:IEEEInternationalConferenceon Systems,Man,andCybernetics,pp.4104—4108.
Kannimuthu, S., Premalatha, K., 2014. Discovery of high util-ity itemsets using genetic algorithm with ranked mutation. Appl.Artif.Intell.28(4),337—359,http://dx.doi.org/10.1080/ 08839514.2014.891839.
Kotsiantis,S.B.,2007. Supervised machine learning:a reviewof classification techniques. Paper presented at the The Con-ference on Emerging Artificial Intelligence Applications in ComputerEngineering:RealWordAISystemswithApplicationsin eHealth,HCI,InformationRetrievalandPervasiveTechnologies. Lan, G.C., Hong, T.P., Tseng, V.S., 2011. Discoveryof high util-ity itemsets from on-shelf time periods of products. Expert Syst. Appl. 38 (5), 5851—5857, http://dx.doi.org/10.1016/ j.eswa.2010.11.040.
Li, H.F., Huang, H.Y., Chen, Y.C., Liu, Y.J., Lee, S.Y., 2008. Fast and Memory Efficient Mining of High Utility Itemsets in Data Streams., pp. 881—886, http://dx.doi.org/10.1109/ icdm.2008.107.
Lin, C.W., Hong, T.P., Lu, W.H., 2011. An effective tree struc-tureformininghighutilityitemsets.ExpertSyst.Appl.38(6), 7419—7424,http://dx.doi.org/10.1016/j.eswa.2010.12.082. Lin,C.W.,Hong,T.P.,Wong,J.W.,Lan,G.C., Lin,W.Y.,2014a.A
GA-basedapproachtohidesensitivehighutilityitemsets.Sci. WorldJ.,2014.
Lin,J.C.W.,Gan,W.,Hong,T.P., Pan,J.S.,2014b.Incrementally updating high-utilityitemsetswithtransaction insertion. Adv. DataMin.Appl.,44—56.
Lin,C.W.,Hong,T.P.,Lan,G.C.,Wong,J.W.,Lin,W.Y.,2015a. Effi-cientupdatingofdiscoveredhigh-utilityitemsetsfortransaction deletionindynamicdatabases.Adv.Eng.Inform.29(1),16—27, http://dx.doi.org/10.1016/j.aei.2014.08.003.
Lin, J.C.W., Gan, W., Fournier-Viger, P., Hong, T.P., 2015b. Mining high-utility itemsets with multiple minimum utility thresholds.In:ACMInternationalConferenceonComputer Sci-ence & Software Engineering, pp. 9—17, http://dx.doi.org/ 10.1145/2790798.2790807.
Lin, J.C.W., Gan, W., Hong, T.P., 2015c. A fast updated algo-rithm to maintain the discovered high-utility itemsets for transaction modification. Adv. Eng. Inform. 29(3), 562—574, http://dx.doi.org/10.1016/j.aei.2015.05.003.
Lin,J.C.W.,Gan,W.,Hong,T.P.,Tseng,V.S.,2015d.Efficient algo-rithms for mining up-to-date high-utility patterns. Adv. Eng. Inform.29(3),648—661,http://dx.doi.org/10.1016/j.aei.2015. 06.002.
Liu,M.,Qu,J.,2012.Mininghighutilityitemsetswithoutcandidate generation.In:ACMInternationalConferenceonInformationand KnowledgeManagement,pp.55—64.
Liu,Y.,Liao,W.K.,Choudhary,A.,2005.Atwo-phasealgorithmfor fastdiscoveryofhighutilityitemsets.Paperpresentedatthe LectureNotesinComputerScience,http://dx.doi.org/10.1007/ 1143091979.
Pei,J.,Han,J.,Mortazavi-Asl,B.,Wang,J.,Pinto,H.,Chen,Q.,... Hsu,M.-C.,2004.Miningsequentialpatternsbypattern-growth:
thePrefixSpanapproach.IEEETrans.Knowl.DataEng.16(11), 1424—1440,http://dx.doi.org/10.1109/tkde.2004.77.
Tseng,V.S.,Shie,B.E.,Wu,C.W.,Yu,P.S.,2013.Efficientalgorithms for mininghighutility itemsetsfrom transactionaldatabases. IEEETrans.Knowl.DataEng.25,1772—1786.
Tseng,V.S., Wu,C.W.,Shie,B.E., Yu,P.S.,2010. UP-Growth:an efficientalgorithmforhighutilityitemsetmining.In:The16th ACMSIGKDDInternationalConferenceonKnowledgeDiscovery andDataMining,pp.253—262.
Verykios,V.S.,Bertino,E.,Fovino,I.N.,Provenza,L.P.,Saygin,Y., Theodoridis,Y.,2004.State-of-the-artinprivacypreservingdata mining. ACM SIGMOD Rec. 33(1), 50—57, http://dx.doi.org/ 10.1145/974121.974131.
Wu,C.W.,Shie,B.E.,Tseng,V.S.,Yu,P.S.,2012.Miningtop-Khigh utilityitemsets.In:The18thACMSIGKDDInternational Confer-enceonKnowledgeDiscoveryandDataMining,pp.78—86. Yao,H.,Hamilton,H.J.,Butz,C.J.,2004.Afoundationalapproach
tominingitemsetutilitiesfromdatabases.In:TheSIAM Interna-tionalConferenceonDataMining,pp.211—225.
Yeh,J.S.,Hsu,P.C.,2010.HHUIFandMSICF:novelalgorithmsfor privacy preserving utility mining. Expert Syst. Appl. 37 (7), 4779—4786,http://dx.doi.org/10.1016/j.eswa.2009.12.038.