An Adaptive Fusion Algorithm for Spam Detection
Congfu Xu
(1), Baojun Su
(1), Yunbiao Cheng
(1)[email protected],
{
freizsu,yunbiaoch
}
@gmail.com
Weike Pan
(2),(3)∗[email protected]
Li Chen
(3)[email protected]
(1)
Institute of Arti¿cial Intelligence, College of Computer Science, Zhejiang University
(2)College of Computer Science and Software Engineering, Shenzhen University
(3)
Department of Computer Science, Hong Kong Baptist University
Abstract
Spam detection has become a critical component in vari-ous online systems, like email services, advertising engines, social media sites, etc. In this paper, we take email services as an example, and present an adaptive fusion algorith-m for spaalgorith-m detection (AFSD), which is a general content-based approach and can be applied to non-email spam de-tection tasks with little additional effort. In our proposed algorithm, we (1) use n-grams of non-tokenized text strings to represent an email, (2) introduce a link function in or-der to convert the prediction scores of online learners to be more comparable ones, (3) train the online learners in a mistake-driven manner via “thick thresholding” to ob-tain high competitive online learners, and (4) design update rules to adaptively integrate the online learners to capture different aspects of spams. We study the prediction perfor-mance of AFSD on¿ve public competition datasets and one industry dataset, and observe that our algorithm achieves signi¿cantly better results than several state-of-the-art ap-proaches, including the champion solutions of the corre-sponding competitions.
Keywords: Spam Detection, Adaptive Fusion
1 Introduction
Spam detection or anti-spam has become a critical com-ponent in various online systems to ¿lter harmful infor-mation, e.g. false information in email or SNS services, malicious clicks in advertising engines, fake UGC (user generated content) in social networks, etc. Most commer-cial systems adopt a machine learning classi¿er, such as
0*The corresponding author is Weike Pan.
Naive Bayes, Logistic regression, or support vector ma-chines (SVMs), to detect the spams. However one single classi¿er may not be able to capture diverse aspects of s-pams, which may also change dynamically. As a response, we design a fusion algorithm based on a set of online learn-ers, instead of relying on one single base model. We use email spam detection as an example, while our algorithm is not limited to the email domain.
An email spam is de¿ned as an unsolicited email that is sent indiscriminately, directly or indirectly, by a sender having no current relationship with the recipient [7]. email spams will affect the employees’ working ef¿ciency and al-so cause bandwidth wastage. Besides email spams, there are more and more similar abuses in social medias [28] and mobile services [26]. We are surrounded by spams in our daily life, which motivates us to detect and¿lter them accu-rately.
There are a variety of popular methods for ¿ghting s-pams, such as DNS-based Blackhole Lists (DNSBL) [12], greylisting [15], spamtraps [19], extrusion [3], online ma-chine learning models [17], feature engineering [26], matrix factorization [28], etc. As spammers are becoming more sophisticated and constantly managing to outsmart “static” anti-spam methods, content-based approaches have shown promising accuracy in combating them. In this paper, we also focus on content-based approaches.
To overcome the limitations of one single machine learn-ing classi¿er, in this paper, we borrow ideas from informa-tion fusion community and devise an adaptive fusion algo-rithm for spam detection (AFSD). AFSD aims to build an integrated spam detector from a collection of lightweight online learners in an adaptive manner. As far as we know, AFSD holds the best AUC score on TREC spam competi-tion datasets.
2 Adaptive Fusion for Spam Detection
2.1 Problem De
¿
nition
We have some real-time arriving text (e.g. emails),
{(x, y)}, wherex∈Rd×1is the feature representation of a
certain email andy ∈ {1,0}is the label denoting whether the email is spam “1” or ham “0”. We also havekonline learners,fj(x;θ), j = 1, . . . , k, with the prediction of the
jth learner on emailxas follows, yj=fj(x;θ).
Our goal is to learn an adaptively integrated prediction mod-el,
f(x) ={fj(x;θ)}kj=1,
which minimizes the accumulated error during the whole online learning and prediction procedure. As far as we know, we are the¿rst in designing anadaptivefusion al-gorithm for real-time email spam detection.
2.2 Feature Representation
Various methods have been proposed to extract features from text, among which tokenization is probably the most popular one. However, tokenization may not obtain good results when facing spammers’ intentional obfuscation or good word attack, especially for the task of email spam de-tection. We thus drop tokenization and adoptn-grams of non-tokenized text strings, which is a simple yet effective method [10]. The feature space includes all n-character sub-strings of the training data. We construct a binary fea-ture vector for each email,
x= [xi]di=1∈ {1,0}d×1, (1)
wherexi indicates the existence of the correspondingith
feature, xi =
1, if theith feature exists
0, otherwise . (2)
Note that such representation is very ef¿cient for online learning and prediction environment.
2.3 The Algorithm
2.3.1 Link Function
Considering that the prediction scores of different online learners are usually in different ranges, we thus adopt a commonly used sigmoid function,
σ(z) = 1/(1 +e−z), (3)
to map raw prediction scores returned by online learners to a common range between0and1.
In order to make the scores from different online learner-s more comparable, we follow the approachelearner-s ulearner-sed in data normalization and introduce a bias parametery0and an off-set parameteryΔ in the link function to achieve effect of centering and scaling [1],
Pj(x) =σ yj−y0 yΔ =σ fj(x;θ)−y0 yΔ , (4) where different bias parameter values and offset parameter values shall be used for different online learners, which can be determined empirically via cross validation. In our ex-periments, we set the value of bias and offset empirically in order to make the scores of different base classi¿ers to be in a similar range, which will then make the scores more comparable.
2.3.2 Mistake-Driven Training of Online Learners We consider a quali¿ed online learner from four perspec-tives. First, it shall be a vector space model or can be trans-formed into a vector space model, since then the email text only needs to be processed once, and can be used for all on-line learners. Second, it shall be a lightweight classi¿er with acceptable accuracy, which will most likely help achieve high prediction accuracy in the¿nal algorithm. Third, the model parameters can be learned incrementally, since it will be trained in a mistake-driven manner to make the classi¿er more competitive. Fourth, the output of a model should be a score or probability. Any classi¿er meeting the above four requirements is acceptable in our adaptive fusion algorithm. In our AFSD algorithm, we implement eight online learners: Naive Bayes (NB), Not So Naive Bayes (N-SNB) [24], WINNOW [13], Balance Winnow (BWIN-NOW) [2], online Logistic Regression (LR) [10], HIT [20], Passive-Aggressive (PA) [8] and online Perceptron Algo-rithm with Margins (PAM) [23].
We use “thick thresholding” in our mistake-driven train-ing procedure of the online learners. For a labeled email instance, the model parameters of an online learner will not be updated if the email has been well classi¿ed. Otherwise, this email will be used to train the online learner until it has been well classi¿ed. An email is considered as not well classi¿ed by the online learnerjin the following cases,
Pj(x)≤0.75 ifxis spam
Pj(x)≥0.25 ifxis ham
, (5)
which means that an email is well classi¿ed only if the pre-diction score is larger than0.75or smaller than0.25. An email with prediction score of larger than 0.75is usually considered as a spam with high con¿dence, and similar for
that smaller than0.25as a ham with little uncertainty. The mechanism of “thick thresholding” will thus emphasize the dif¿cult-to-classify email instances, and¿nally produces a well trained online learner with prediction scores far away from the uncertain decision range, e.g. scores located in the range between0.25and 0.75. Models trained in this way, usually have high generalization ability, which is also observed in our experiments when we compare our online learners with the champion solutions of the corresponding datasets.
2.3.3 Adaptive Fusion of Online Learners
Once we have trained the online learners, we have to ¿nd a way to integrate them for ¿nal predition. We use w1, w2, . . . , wkto denote the weight of thosekonline
learn-ers. For any coming emailx, we calculate the¿nal predic-tion score via a weighted combinapredic-tion,
P(x) =k j=1 wjPj(x)/ k j=1 wj, (6)
where the weight of each online learner is initialized as1, and will be updated adaptively according to the correspond-ing online learner’s performance. Once we have learned the integrated prediction model, we can decide whether the e-mailxis spam or ham viaδ(P(x))with,
δ(z) =
1, ifz >0.5
0, otherwise . (7) During the training procedure of adaptive fusion, for an online learnerfj(x;θ), if its prediction is the same as that
of the ¿nal prediction, δ(Pj(x)) = δ(P(x)), the
corre-sponding weight, wj, will not be updated; otherwise, wj
will be updated adaptively. More speci¿cally, if an on-line learner makes a correct prediction while the integrat-ed model makes an incorrect printegrat-ediction,δ(Pj(x)) =yand
δ(P(x))= y, the weightwj will be increased, otherwise
wjwill be decreased, wj= wj+γΔw+, ifδ(Pj(x)) =y, δ(P(x))=y wj+γΔw−, ifδ(Pj(x))=y, δ(P(x)) =y (8) whereyis the true label of emailx,Δw+>0andΔw−<
0are the award and punishment on the weight, respectively, andγ is the learning rate. In our experiments, we¿x the awardΔw+ = 20, the punishmentΔw− = −1, and the
learning rate γ = 0.02. From Eq.(8), we can see that a classi¿er with correct prediction will be awarded with more weight, while a classi¿er with incorrect prediction will be punished with reduced weight.
Input: The real-time arriving text{(x, y)},konline learnersfj(x;θ),j= 1, . . . , k.
Output: The learnedk online learnersfj(x;θ)and
the corresponding weightwj,j= 1, . . . , k
Step 1. Feature extraction of the text;
Step 2. Mistake-driven training of each online learner as shown in Eq.(5);
Step 3. Adaptive fusion of online learners as shown in Eq.(8).
Figure 1: The algorithm ofAdaptive Fusion for Spam De-tection(AFSD).
Finally, we have the complete algorithm as shown in Fig-ure 1. The AFSD algorithm can be implemented ef¿ cient-ly, since (1) the extracted features can be used for all on-line learners (vector space models), (2) the onon-line learners are trained independently and thus can be implemented via multi-thread programming or in a distributed platform, and (3) the adaptive fusion procedure will not update the mod-el parameters of the trained online learners, but weight on-ly. Furthermore, both the model parameters learned in the mistake-driven training step and the weight learned in the adaptive fusion step can be updated online, which means that our fusion algorithm AFSD is actually an online learn-ing algorithm with the ability to receive the trainlearn-ing data on-the-Ày. Our experiments are also conducted in an online setting.
3 Experimental Results
3.1 Experimental Settings
Datasets The benchmarks used in our experiments con-sist of the commonly used TREC datasets from 2005 to 20071, the CEAS 2008 dataset2and the NETEASE dataset3. Speci¿cally, TREC05p, TREC06p, TREC06c, TREC07p, CEAS08 and NETEASE have 92189, 37822, 64620, 75419, 137705 and 208350 instances, respectively. There are four types of emails in the NETEASE dataset, including spam, advertisement, subscription and regular emails. For the NETEASE dataset, we convert the spam detection task into a binary classi¿cation task, spam vs. ham (advertisement, subscription and regular).
For each dataset, we use4-grams to extract features from character strings of an email and use binary coding to
rep-1http://trec.nist.gov/data/spam.html
2http://plg.uwaterloo.ca/ gvcormac/ceascorpus/
3This dataset is authorized from the largest email service provider in
China, NetEase (http://www.163.com/).
Table 1: (1-AUC )% scores of online learners.
WINNER NB NSNB WINNOW BWINNOW LR HIT PA PAM TREC05p .0190 .0197 .0073 .0356 .0152 .0150 .0108 .0137 .0137 TREC06p .0540 .0495 .0278 .0599 .0624 .0359 .0305 .0347 .0348 TREC06c .0023 .0137 .0003 .0061 .0023 .0006 .0003 .0006 .0006 TREC07p .0055 .0163 .0079 .0189 .0061 .0070 .0086 .0066 .0066 CEAS08 .0233 .0039 .0006 .0037 .0009 .0013 .0005 .0012 .0013 NETEASE - .0218 .0149 .0220 .0133 .0135 .0128 .0127 .0128 Total - .1249 .0588 .1462 .1002 .0733 .0635 .0695 .0698
Table 2: (1-AUC)% scores of our fusion algorithm AFSD and other fusion approaches. Note that we use NSNB as the best single online learner.
NSNB Bagging Voting AFSD TREC05p .0073 .0065 (+11.0%) .0070 (+ 4.1%) .0055 (+24.7%) TREC06p .0278 .0176 (+36.7%) .0193 (+30.6%) .0155 (+44.2%) TREC06c .0003 .0001 (+66.7%) .0002 (+33.3%) .0001 (+66.7%) TREC07p .0079 .0058 (+26.6%) .0060 (+24.1%) .0058 (+26.6%) CEAS08 .0006 .0006 .0006 .0004 (+33.3%) NETEASE .0149 .0095 (+36.2%) .0096 (+35.6%) .0092 (+38.3%) Total .0588 .0401 (+31.8%) .0427 (+27.4%) .0365 (+37.9%)
resent the corresponding feature’s existence. In order to re-duce the impact of long messages, we only keep the¿rst
3000characters of each message.
Evaluation Metrics In order to take the false positive rate into consideration, we use (1- AUC)% [11] in our evalua-tions, which is commonly used in email spam detection.
For evaluation, we use the standard TREC spam detec-tion evaluadetec-tion toolkit4, which ensures that all the results obtained by different approaches on the same datasets are comparable.
Note that the baseline of WINNER in our results refer-s to the champion refer-solutionrefer-s of the correrefer-sponding compe-titions, TREC05p [6], TREC06p [4], TREC07p [5], and CEAS085. The baseline53-Ensemble refers to the fusion algorithm with53base classi¿ers [16].
3.2 Study of Online Learners
To ensure reliable performance of AFSD, we must guar-antee the performance of each online learner. Furthermore, the predicability of each online learner is also very useful in the analysis of fusion approaches and the selection of a subset of online learners.
The results of (1-AUC)% are shown in Table 1, from which we can see that NSNB [24] has the best performance on TREC05p,TREC06p and TREC06c, HIT has the best performance on TREC06c and CEAS08, and PA has the best performance on NETEASE. Note that WINNER is best
4http://plg.uwaterloo.ca/ gvcormac/jig/ 5http://www.ceas.cc/2008/challenge/results.pdf
only on TREC07p, and BWINNOW is very close (.0061
compared to.0055). When we consider the total (1-AUC)% of all six datasets, the result of the champion solutions of each year is only slightly better than the worst online learner (WINNOW) in our AFSD algorithm, which shows that the online learners are very competitive. We think that the high prediction accuracy of our online learners are from the gen-eralization ability of online learners trained in the mistake-driven manner, which is described in Section 2.3.2.
From both Table 1, we can see that NSNB outperforms other online learners. Hence, we use NSNB as the best sin-gle online learner to compare with fusion approaches in the next subsection.
3.3 Study of Fusion Algorithms
We demonstrate the effectiveness of our AFSD algorith-m by coalgorith-mparing with the following approaches:
Best online learnerAs a baseline algorithm for compari-son, we use NSNB as the best single¿ltler.
Bagging We use the average prediction scores of online learners, δ(k
j=1Pj(x;θ)/k), where δ(z)is the same as
that in Eq.(7). Note that Bagging can be considered as a special case of AFSD when the weights of online learners are¿xed aswj = 1,j= 1, . . . , k.
Voting We use the majority votes of online learners, δ(k
j=1δ(Pj(x;θ))/k), whereδ(z)is the same as that in
Eq.(7).
The results of different fusion approaches are shown in Table 2 and Table 3.
From Table 2, we can see that Bagging improves the baseline algorithm (NSNB) on average by 31.8% on (1-AUC)%, Voting 27.4%, and AFSD 37.9%. Our proposed algorithm AFSD is signi¿cantly better than both Bagging and Voting, which clearly shows the effect of our adaptive fusion algorithm.
Table 3: (1-AUC)% scores of our proposed fusion algorithm AFSD and other approaches.
WINNER 53-Ensemble AFSD TREC05p .0190 .0070 .0055 (+71.1%) TREC06p .0540 .0200 .0155 (+71.3%) TREC06c .0023 - .0001 (+95.7%) TREC07p .0055 - .0058 CEAS08 .0233 - .0004 (+98.3%) Total .1041 - .0273 (+73.8%) From Table 3, we can see that AFSD outperforms the TREC champion solutions (WINNER) signi¿cantly on most datasets, and is only slightly worse than WINNER on TREC07p. The total ROC score of TREC champion so-lutions (WINNER) is .1041, while AFSD gives the total score .0273, which improves the WINNER’s ROC score by73.8%. AFSD also achieves better results using only8
classi¿ers than a recent ensemble classi¿er using53online learners [16].
3.4 Study of Weight on Online Learners
Generally, the more classi¿ers integrated, the slower the entire system would be when deployed. Moreover, increas-ing the number of online learners does not guarantee better prediction performance [27]. Therefore, it is important to select a relatively small subset of the online learners to be both ef¿cient and effective.
Our AFSD is able to achieve this goal. After train-ing, each online learner has a weight indicating its impor-tance on the¿nal results, e.g., on the TREC06p dataset: N-B (13.48), NSNN-B (6.26), WINNOW (12.14), N-BWINNOW (9.88), LR (5.38), HIT (12.48), PA (6.14) and PAM (5.72). During subset selection, we propose to select a highly weighted online learner with high priority. For example, NB and HIT will be selected if two online learners are needed and NB, HIT and Winnow will be selected if three online learners are needed.
The results of using different numbers of online learners are shown in Figure 2. We can see that if the number of online learners is smaller than4, the result is worse than that of using all those8online learners. And when4 to7
online learners are integrated, we can obtain better results than that of using all online learners. Our main observation from Figure 2 is that a selected subset of classi¿ers is able
to achieve comparable or even slightly better performance than using the whole set of classi¿ers.
2 3 4 5 6 7 8 0.005 0.006 0.007 0.008 0.009
Number of base classifiers
(1
í
AUC)%
Figure 2: The (1-AUC)% score of our proposed fusion al-gorithm AFSD with different numbers of online learners on TREC05p dataset.
4 Related Work
Spam Detection Spam detection as a critical compo-nent in various online systems, has attracted much atten-tion from both researchers and practiatten-tioners. Parallel to the categorization of recommender systems [21], spam de-tection algorithms can also be divided into content-based approaches [24] and collaborative-based approaches [28]. Most previous works belong to content-based approaches, while some recent works exploit social networks for spam detection [14, 26, 25] or spammer detection [28]. Our pro-posed AFSD algorithm belongs to the content-based class. Fusion of Base Classi¿ers Fusion algorithms or en-semble methods [22, 9, 18] have achieved great success in classi¿cation tasks. Rokach [22] categorized the en-semble methods from different dimensions. Dzeroski andß
ß
Zenko [9] showed that a well designed stacking based en-semble method can beat a best single method. Menahem et al., [18] proposed a three-layer stacking based ensemble learning method, which works well for multi-class classi¿ -cation problems. Lynam and Cormack [16] built a fused model by combining all the participants’ spam ¿lters of TREC, and also studied the fusion approaches thoroughly. They showed that a set of independently developed spam ¿lters may be combined in simple ways to provide substan-tially better performance than any individual¿lter. Those so-called simple fusion approaches [16] are similar to the bagging approaches in our experiments.
The differences between our proposed fusion algorithm, AFSD, and other fusion approaches can be identi¿ed at least from three aspects: (1) we introduce a link function with
the effect of “result scaling”, which makes the prediction scores of online learners more comparable; (2) we train our online learners in a mistake-driven manner, which allows us to obtain a series of highly competitive online learners; and (3) we design a fusion algorithm to adaptively integrate the predictions of online learners.
5 Conclusions and Future Work
In this paper, we present a spam detection algorithm,
Adaptive Fusion for Spam Detection(AFSD). We¿rst train online learners in a mistake-driven manner, and then fuse the predictions of online learners adaptively in an online learning manner. Our algorithm also inherits the effective-ness and ef¿ciency of online learning. Experimental results on¿ve public competition datasets and one industry dataset show that AFSD produces signi¿cantly better results than several state-of-the-art approaches, including the champion solutions of the corresponding competitions.
For future works, we are mostly interested in continuing our work in three aspects, including (1) designing strategies for automatic selection of a subset of base classi¿ers, (2) ap-plying our fusion algorithm to spam detection tasks in social media and mobile computing domains, and (3) studying the generalization ability of our proposed algorithm.
Acknowledgement
We thank the support of Natural Science Foundation of China No. 60970081 and No. 61272303, and Na-tional Basic Research Program of China (973 Plan) No. 2010CB327903.
References
[1] T. W. Anderson. An Introduction to Multivariate S-tatistica Analysis. Wiley-Interscience, third edition, 2003.
[2] Vitor R Carvalho and William W Cohen. Single-pass online learning: performance, voting schemes and on-line feature selection. InProceedings of International Conference on Knowledge Discovery and Data Min-ing, pages 8–13, 2006.
[3] Richard Clayton. Stopping spam by extrusion detec-tion. InProceedings of First Conference on Email and Anti-Spam (CEAS 2004), 2004.
[4] Gordon Cormack. Trec 2006 spam track overview. Technical report, University of Waterloo, Ontario, Canada, 2006.
[5] Gordon Cormack. Trec 2007 spam track overview. Technical report, University of Waterloo, Ontario, Canada, 2007.
[6] Gordon Cormack and Thomas Lynam. Trec 2005 s-pam track overview. Technical report, University of Waterloo Waterloo, Ontario, Canada, 2005.
[7] G.V. Cormack. Email spam ¿ltering: A systematic review. Foundations and Trends in Information Re-trieval, 1(4):335–455, 2007.
[8] Koby Crammer, Ofer Dekel, Joseph Keshet, Shai Shalev-Shwartz, and Yoram Singer. Online Passive-Aggressive Algorithms.Journal of Machine Learning Research, 7:551–585, 2006.
[9] Saso Dzeroski and Bernardß Zenko. Is combining clas-ß si¿ers with stacking better than selecting the best one?
Mach. Learn., 54(3):255–273, March 2004.
[10] Joshua Goodman. Online Discriminative Spam Fil-ter Training. InProceedings of Third Conference on Email and Anti-Spam, 2006.
[11] J.A. Hanley and B.J. McNeil. The meaning and use of the area under a receiver operating characteristic (ROC) curve.Radiology, 143(1):29, 1982.
[12] J. Jung and E. Sit. An empirical study of spam traf¿c and the use of DNS black lists. InProceedings of the 4th ACM SIGCOMM conference on Internet measure-ment, pages 370–375. ACM, 2004.
[13] J. Kivinen, MK Warmuth, and P. Auer. The perceptron algorithm versus winnow: linear versus logarithmic mistake bounds when few input variables are relevant* 1.Arti¿cial Intelligence, 97(1-2):325–343, 1997. [14] Ho-Yu Lam and Dit-Yan Yeung. A learning approach
to spam detection based on social networks. In Pro-ceedings of Fourth Conference on Email and Anti-Spam (CEAS 2007), pages 832–840, 2007.
[15] J.R. Levine. Experiences with greylisting. In Confer-ence on Email and Anti-Spam. Citeseer, 2005. [16] Thomas R. Lynam and Gordon V. Cormack. On-line
spam¿lter fusion. InProceedings of the 29th annu-al internationannu-al ACM SIGIR conference on Research and development in information retrieval - SIGIR ’06, New York, New York, USA, 2006. ACM Press. [17] Justin Ma, Lawrence K. Saul, Stefan Savage, and
Ge-offrey M. Voelker. Learning to detect malicious url-s. ACM Trans. Intell. Syst. Technol., 2(3):30:1–30:24, May 2011.
[18] Eitan Menahem, Lior Rokach, and Yuval Elovici. Troika - an improved stacking schema for classi¿ ca-tion tasks. Inf. Sci., 179(24):4097–4122, December 2009.
[19] M. Prince, B. Dahl, L. Holloway, A. Keller, and E. Langheinrich. Understanding how spammers steal your e-mail address: An analysis of the ¿rst six months of data from project honey pot. In Second Conference on Email and Anti-Spam, 2005.
[20] Haoliang Qi, Xiaoning He, Muyun Yang, Jun Li, Guo-hua Lei, and Sheng Li. Joint NLP Lab between HIT 2 at CEAS Spam-¿lter Challenge 2008. InProceedings of Fifth Conference on Email and Anti-Spam, number 999, 2008.
[21] Francesco Ricci, Lior Rokach, Bracha Shapira, and Paul B. Kantor, editors.Recommender Systems Hand-book. Springer, 2011.
[22] Lior Rokach. Taxonomy for characterizing ensem-ble methods in classi¿cation tasks: A review and annotated bibliography. Comput. Stat. Data Anal., 53(12):4046–4072, October 2009.
[23] D Sculley, Gabriel M Wachman, and Carla E Brodley. Spam Filtering using Inexact String Matching in Ex-plicit Feature Space with On-Line Linear Classi¿ers. InProceedings of Fifteenth Text REtrieval Conference (TREC 2006), number Section 2, 2006.
[24] Baojun Su and Congfu Xu. Not So Na¨Õve Online Bayesian Spam Filter. InProceedings of Twenty-First Innovative Applications of Arti¿cial Intelligence Con-ference, number 1, 2009.
[25] Chi-Yao Tseng and Ming-Syan Chen. Incremental svm model for spam detection on dynamic email so-cial networks. 2012 IEEE 15th International Con-ference on Computational Science and Engineering, 4:128–135, 2009.
[26] Qian Xu, Evan Xiang, Jiachun Du, Jieping Zhong, and Qiang Yang. Sms spam detection using content-less features.IEEE Intelligent Systems, 99(PrePrints), 2012.
[27] Z.H. Zhou, J. Wu, and W. Tang. Ensembling neural networks: Many could be better than all* 1. Arti¿cial intelligence, 137(1-2):239–263, 2002.
[28] Yin Zhu, Xiao Wang, ErHeng Zhong, Nathan Nan Li-u, He Li, and Qiang Yang. Discovering spammers in social networks. InAAAI, 2012.
Dr. Congfu Xu is currently an associate professor with the Col-lege of Computer Science, Zhejiang University, Hangzhou, China.
He received the Ph.D. degree in Computer Science from Zhejiang University, in 2000.
His current research interests include information fusion, data mining, arti¿cial intelli-gence, recommender systems, and sensor networks.
Baojun Su is currently working on search advertising in Microsoft STC-Beijing.
He received the M.S. de-gree in Computer Application Technology from Zhejiang U-niversity, Hangzhou, China, in 2011.
Yunbiao Cheng is currently working on Yahoo! Software Research & Development (Beijing) Co.
He received the M.S. de-gree in Computer Application Technology from Zhejiang U-niversity, Hangzhou, China, in 2013.
Dr. Weike Pan is currently a lec-turer with the College of Comput-er Science and Software EngineComput-er- Engineer-ing, Shenzhen University, Shen-zhen, China.
He received the Ph.D. degree in Computer Science and Engineering from the Hong Kong University of Science and Technology, Kowloon, Hong Kong, China, in 2012. His current research interests include transfer learning, recommender systems, and statistical machine learning.
Dr. Li Chen is an Assistant Pro-fessor in the Department of Com-puter Science at Hong Kong Baptist University. She obtained her PhD degree at Swiss Federal Institute of Technology in Lausanne (EPFL), and Bachelor and Master Degrees in Computer Science at Peking U-niversity, China. Her research terests are mainly in the areas of in-telligent Web technologies, recom-mender systems and e-commerce decision supports.