Spam-related cyber-crimes are among the fastest-growing threats to society. Spamming contributes to illegal earnings through the sale of various products and also spreads malware, which serves as a medium to steal confidential data from users' computers or to disrupt their functioning. Present methods to combat spamming have been largely reactive, serving only as temporary means to mask the spamming effect. The most effective way to prevent spam emails entirely is to stop spam at its source, that is, to detect the spammer. At present there are three important techniques used by spammers to send spam emails:
Abstract: An email client receives emails from different websites, portals and domains, many of which are advertisements. Receiving bulk amounts of email can cause serious damage, such as suspension of a particular email id. Most often an email client is exposed to a number of malicious senders by registering an email account with a web portal, which in turn sends a bulk amount of email. The email client needs a reliable way to differentiate useful emails from spam emails. One solution is to develop a decision-based system that can classify spam and non-spam emails. This survey gives an overview of different machine learning and deep learning algorithms for classifying spam and non-spam emails based on the received emails of an email client. Machine learning approaches such as support vector machines, the naive Bayesian classifier, artificial neural networks and logistic regression can be of great help in identifying spam emails. These approaches, together with decision trees, run tests on given sets of data (emails). After the source of a spam email is classified, a user can navigate to, block and report the source of the spam generator. Most of the time, spam emails are generated by autonomous sources called spam-bots.
A misclassified spam that arrives in a user's inbox is annoying. A misclassified ham that the user never sees may result in loss of business, productivity, opportunity, or time. Spammers actively attempt to defeat spam filters by substituting look-alike characters for letters, hiding random text in an email, misspelling words, including pictures that carry the advertisement, or embedding links in deceptively phrased emails. Their techniques change daily; therefore, any anti-spam technology must be able to adapt quickly. Automated spam-filtering methods that can learn to distinguish spam emails from ham emails, and can be trained in an updatable fashion, are of vital importance. A good anti-spam technique will have three characteristics: it will accurately classify spam and ham, it will be easily adaptable, and it will be easily scalable. Most current research in spam filtering concentrates on data mining approaches to the problem. From a data mining perspective, spam filtering is a classification problem in which the filtering system aims to distinguish spam from legitimate (ham) emails. Thus, classification algorithms that are widely used for pattern recognition can be applied to the spam problem.
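As a concrete illustration of a classifier that both learns and can be updated incrementally, here is a minimal sketch of a multinomial Naive Bayes spam filter; the toy training messages and class names are invented for illustration, not taken from any of the surveyed systems:

```python
import math
from collections import defaultdict

class NaiveBayesSpamFilter:
    """Multinomial Naive Bayes with incremental (updatable) training."""

    def __init__(self):
        self.word_counts = {"spam": defaultdict(int), "ham": defaultdict(int)}
        self.class_counts = {"spam": 0, "ham": 0}
        self.total_words = {"spam": 0, "ham": 0}
        self.vocab = set()

    def train(self, text, label):
        # Incremental update: just add the new message's counts.
        self.class_counts[label] += 1
        for word in text.lower().split():
            self.word_counts[label][word] += 1
            self.total_words[label] += 1
            self.vocab.add(word)

    def classify(self, text):
        # Bayes' rule in log space, with Laplace smoothing for unseen words.
        total_msgs = sum(self.class_counts.values())
        scores = {}
        for label in ("spam", "ham"):
            score = math.log(self.class_counts[label] / total_msgs)
            denom = self.total_words[label] + len(self.vocab)
            for word in text.lower().split():
                score += math.log((self.word_counts[label][word] + 1) / denom)
            scores[label] = score
        return max(scores, key=scores.get)

filt = NaiveBayesSpamFilter()
filt.train("cheap pills buy now", "spam")
filt.train("limited offer buy cheap", "spam")
filt.train("meeting agenda for tomorrow", "ham")
filt.train("project meeting notes attached", "ham")
print(filt.classify("buy cheap pills"))        # spam
print(filt.classify("notes from the meeting")) # ham
```

Because training only increments counters, new examples can be folded in at any time, which is exactly the "updatable" property the paragraph above calls for.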
This paper presented an analysis of modern spam from an age-comparative user perspective, integrating manual qualitative content coding and quantitative statistics. We aimed to clarify (i) the extent to which weapons of influence and life domains were represented in young vs. older users' spam emails and (ii) variations in the prevalence of weapons of influence and life domains by age demographic. Our study demonstrated the presence of some level of age-specific targeting in current spam campaigns. This knowledge is crucial in its potential for integration into the development of future spam mitigation solutions capable of detecting influence in emails and warning users in a demographic-targeted fashion, such as by considering age-specific vulnerabilities. Moving forward, we plan to leverage this manually labeled dataset of emails to develop machine learning classifiers that can detect influence in text.
Ali Çıltık et al. proposed spam e-mail filtering methods with high accuracy and low time complexity, using Turkish mails for their research. They used the PC-KIMMO system, a morphological analyzer, to extract the root forms of words as input and produce parses of words as output. Their method is based on the n-gram approach and a set of heuristics. They developed two models: a class-general model, which classifies a mail as spam or legitimate using Bayes' rule, and an e-mail-specific model, which determines the correct class of a message by comparing it against similar previous messages for a match. A third, refined model combines the two. Since Turkish has free word order, words are mapped to a fixed order for the n-gram model. This spam filtering method classifies the text content and raw content of emails, obtaining results from the categorization of the datasets. They faced increasing time complexity when handling larger numbers of words. The AdaBoost ensemble algorithm was used for comparison with their previous work. They performed extensive tests over various dataset sizes and numbers of initial words, and obtained high success rates in both Turkish and English.

A. G. López-Herrera et al. developed a multiobjective evolutionary algorithm (MOEA) for filtering spam, building on the concepts of dominance and the Pareto set. Their SPAM-NSGA-II-GP system uses the MOEA to learn a set of queries with good precision and recall, evaluated on the PUI datasets. Strong filtering rules (high recall and low precision) block as much spam as possible at the cost of also labeling some legitimate emails as spam, while weak filtering rules (high precision and low recall) label only a minimal portion of emails as spam.
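The n-gram idea underlying these models can be sketched as follows. This is a simplified illustration (the function names and toy corpora are ours, not Çıltık et al.'s exact class-general or e-mail-specific models): a message is scored by how many of its word n-grams were previously seen in each class's training corpus.

```python
from collections import Counter

def word_ngrams(text, n):
    """Return the list of word-level n-grams in a text."""
    words = text.lower().split()
    return [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]

def ngram_overlap_score(message, class_corpus, n=2):
    """Fraction of the message's n-grams seen in the class corpus."""
    corpus_grams = Counter()
    for doc in class_corpus:
        corpus_grams.update(word_ngrams(doc, n))
    msg_grams = word_ngrams(message, n)
    if not msg_grams:
        return 0.0
    return sum(1 for g in msg_grams if g in corpus_grams) / len(msg_grams)

spam_corpus = ["win a free prize now", "claim your free prize today"]
ham_corpus = ["see you at the meeting", "the meeting is at noon"]
msg = "claim a free prize"
print(ngram_overlap_score(msg, spam_corpus))  # higher overlap with spam
print(ngram_overlap_score(msg, ham_corpus))
```

A real system would smooth these counts and combine them probabilistically (e.g., via Bayes' rule, as in the class-general model), but the overlap score already shows why n-grams capture more context than single words.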
Several solutions to the spam problem involve detection and filtering of spam emails on the client side, and machine learning approaches have been used in the past for this purpose. Examples include Bayesian classifiers such as Naive Bayes, as well as C4.5, Ripper and Support Vector Machines (SVM), among others. In many of these approaches, Bayesian classifiers were observed to give good results, and so they have been widely used in spam filtering software. A number of techniques use clustering as part of their spam detection approach: clustering followed by KNN classification, clustering followed by KNN or BIRCH classification, and clustering followed by SVM classification. To the best of the authors' knowledge, clustering with association rules has not previously been used for spam detection.
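A minimal sketch of the KNN step used in the clustering-plus-KNN hybrids might look like the following. The cosine-similarity bag-of-words representation and the toy data are assumptions for illustration; in the surveyed hybrids, the training set is first clustered so that KNN compares the message against far fewer representatives:

```python
import math
from collections import Counter

def vectorize(text):
    # Bag-of-words term-frequency vector as a Counter.
    return Counter(text.lower().split())

def cosine(u, v):
    dot = sum(u[t] * v[t] for t in u if t in v)
    nu = math.sqrt(sum(c * c for c in u.values()))
    nv = math.sqrt(sum(c * c for c in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def knn_classify(message, training, k=3):
    """Majority vote among the k most similar training emails.
    training: list of (text, label) pairs."""
    vec = vectorize(message)
    neighbors = sorted(training,
                       key=lambda tl: cosine(vec, vectorize(tl[0])),
                       reverse=True)[:k]
    labels = [label for _, label in neighbors]
    return max(set(labels), key=labels.count)

training = [
    ("win free money now", "spam"),
    ("free prize claim now", "spam"),
    ("cheap loans win big", "spam"),
    ("lunch meeting tomorrow", "ham"),
    ("quarterly report attached", "ham"),
    ("schedule the team meeting", "ham"),
]
print(knn_classify("claim your free money", training))  # spam
```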
Class priors of ham and spam change over time, and assuming them static is not realistic. Over time, the writers of scams can change any number of features, such as the motive for the money transfer or the name and title of the person who is in need of help. Moreover, sections of the story or plea may change as well, such as when a paragraph of the message is removed or added. It is not uncommon to find that, over time, there is continual tweaking of a scam, where one part is changed while most parts are kept in common. Something of an arms race has emerged between spammers and the spam filters used to combat them. As filters are adapted to contend with today's types of spam emails, spammers alter, obfuscate and confuse filters by disguising their emails to look more like legitimate email. This dynamic nature of spam and scams means that any filter that is to remain successful over time in identifying spam must be updatable.
Today, emails are used for communication by many users. Emails are broadly classified as spam mails and non-spam mails. Spam is defined as bad, unwanted email sent with the purpose of spreading viruses, committing business fraud and causing harm to email users; non-spam mails are the regular emails that are useful to email users. According to surveys, email users today receive more spam than non-spam. In 1997, around 10% of the emails received by corporate networks were spam. The objective of email classification is to identify spam emails and prevent them from being delivered to users. Document classification techniques, in which documents are categorized into different predefined categories, have been applied to email classification with satisfactory results. In document classification, a document can be represented by the vector space model (VSM): each email is considered a vector of word terms. Since there are so many different words in emails, and not all classifiers can handle such high dimensionality, only a few powerful classification terms should be used.
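The VSM representation and term selection described above can be sketched as follows. The document-frequency-difference score used here is a crude stand-in for whatever feature-selection criterion a real system would use (e.g., information gain); the toy corpora are ours:

```python
from collections import Counter

def term_doc_freq(emails):
    # Count in how many documents each term appears.
    df = Counter()
    for text in emails:
        df.update(set(text.lower().split()))
    return df

def select_terms(spam, ham, k=5):
    """Keep the k terms whose document frequency differs most between
    the two classes; a crude stand-in for real feature selection."""
    df_s, df_h = term_doc_freq(spam), term_doc_freq(ham)
    vocab = set(df_s) | set(df_h)
    score = {t: abs(df_s[t] / len(spam) - df_h[t] / len(ham)) for t in vocab}
    return sorted(vocab, key=score.get, reverse=True)[:k]

def to_vector(text, terms):
    # Represent an email as a term-frequency vector over the kept terms.
    tf = Counter(text.lower().split())
    return [tf[t] for t in terms]

spam = ["free money now", "claim free prize", "free offer inside"]
ham = ["meeting at noon", "see the report", "meeting notes attached"]
terms = select_terms(spam, ham)
print(terms)
print(to_vector("free prize at the meeting", terms))
```

Reducing every email to a short fixed-length vector over the selected terms is what makes the high-dimensional VSM tractable for the classifiers mentioned above.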
Compromised machines on the Internet are generally referred to as bots, and the set of bots controlled by a single entity is called a botnet. Botnets are used for different purposes, such as mounting DDoS attacks, generating click fraud, stealing user passwords and identities, and sending spam email. Compromised machines are one of the key security threats on the Internet; those involved in spamming activities are commonly known as spam zombies. We have developed an effective solution for detecting spam zombies named SPOT. SPOT is designed around a powerful statistical tool called the Sequential Probability Ratio Test (SPRT), which has bounded false positive and false negative error rates. We also study two spam zombie detection algorithms based on the number and the percentage of spam messages originated by a machine.
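A minimal sketch of how an SPRT-based detector reaches a decision is shown below. The parameter values (theta0, theta1, alpha, beta) are illustrative assumptions, not SPOT's actual settings: the test accumulates a log-likelihood ratio over a machine's outgoing messages and stops as soon as it crosses either threshold.

```python
import math

def sprt_monitor(observations, theta0=0.2, theta1=0.8, alpha=0.01, beta=0.01):
    """Sequential Probability Ratio Test over a machine's outgoing
    messages (1 = spam, 0 = ham). theta0/theta1 are the assumed spam
    probabilities for a normal vs. compromised machine; alpha/beta
    bound the false positive / false negative rates."""
    upper = math.log((1 - beta) / alpha)   # accept H1: compromised
    lower = math.log(beta / (1 - alpha))   # accept H0: normal
    llr = 0.0
    for i, x in enumerate(observations, 1):
        if x:
            llr += math.log(theta1 / theta0)
        else:
            llr += math.log((1 - theta1) / (1 - theta0))
        if llr >= upper:
            return "compromised", i
        if llr <= lower:
            return "normal", i
    return "undecided", len(observations)

print(sprt_monitor([1, 1, 1, 1, 1, 1]))  # mostly spam -> compromised
print(sprt_monitor([0, 0, 0, 0, 0, 0]))  # mostly ham  -> normal
```

The appeal of SPRT here is exactly what the paragraph states: the error rates are bounded by alpha and beta, and the test typically decides after observing only a handful of messages.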
Email spam is a crucial problem in social networks, and many problems are created through spam. Spam is nothing but unwanted messages or mail that end users do not want in their mailboxes. Because of spam, system performance can be degraded and system accuracy affected. Electronic spamming is the sending of such unsolicited or unwanted messages. This project explains email spam and how spam can spoil the performance of a mailing system. Previous studies have presented many types of spam classifiers for detecting spam and non-spam mails.
ABSTRACT: With the internet becoming more accessible and our devices more mobile, people are spending an increasing amount of time on social networking sites. Among these, Twitter is one of the most popular microblogging services, and its popularity has attracted more and more spammers. Spammers send unwanted tweets to Twitter users to promote websites or services, leading to external phishing sites or malware downloads; this has become a huge issue for online safety and degrades the user experience. Although researchers have proposed a number of mechanisms to stop spammers, current solutions fail to detect Twitter spam precisely and effectively. To prevent these attacks, training tweets are added, and for real-time spam detection, 12 lightweight features for tweet representation, such as account age, number of followers, number of tweets and number of retweets, are extracted. Spam detection builds a binary classification model, which can be solved using machine learning algorithms. The system reports the impact of data-related factors, such as the spam-to-non-spam ratio, training data size, and data sampling, on detection performance. The system shows that spam detection is a big challenge; it bridges the gap in performance evaluation and focuses mainly on the data, features and model to identify genuine users and report spam.
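A sketch of the lightweight-feature approach might look like the following. The field names and the threshold rule are hypothetical stand-ins, not the exact 12 features or the trained classifier described above; the point is that each tweet is reduced to a small numeric vector that a binary classifier can consume:

```python
# Hypothetical feature extractor: field names are illustrative only.
def extract_features(tweet):
    return [
        tweet["account_age_days"],
        tweet["num_followers"],
        tweet["num_tweets"],
        tweet["num_retweets"],
        len(tweet["text"]),
        tweet["text"].count("http"),  # embedded links
    ]

def simple_rule_classifier(features):
    """Toy stand-in for a trained binary classifier: young accounts
    with few followers posting links are flagged as spam."""
    age, followers, _, _, _, links = features
    return "spam" if (age < 30 and followers < 50 and links > 0) else "ham"

tweet = {"account_age_days": 5, "num_followers": 12, "num_tweets": 40,
         "num_retweets": 0, "text": "Win a prize! http://spam.example"}
print(simple_rule_classifier(extract_features(tweet)))  # spam
```

In practice the hand-written rule would be replaced by a model learned from labeled tweets, but the feature-extraction step is the part that must stay lightweight for real-time detection.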
Recent years have seen increasing research on extracting and using temporal information in natural language applications. However, most of the work found in the literature has focused on identifying and understanding temporal expressions in newswire texts. In this paper we report our work on anchoring temporal expressions in a novel genre, emails. The highly under-specified nature of these expressions fits well with our constraint-based representation of time, Time Calculus for Natural Language (TCNL). We have developed and evaluated a Temporal Expression Anchorer (TEA), and the results show that it performs significantly better than the baseline and compares favorably with some closely related work.
Due to the existence of numerous merchant sites, forums, discussion groups and blogs, individual users are participating more actively and generating vast amounts of new data, a trend captured by terms such as "blog journalism", "consumer-generated media" and "user-generated content". As customer feedback on the Web influences other customers' decisions, it has become an important source of information for businesses to take into account when developing marketing and product development plans. It has been observed that user-generated content includes junk and unnecessary information known as "spam", whose identification is important to safeguard the interests of the large web community.
Email messages for the intervention group are based on self-help strategies endorsed by depression experts (see Table 1 for an overview). These self-help strategies were expanded into lengthier email messages. The emails aim to change participant behaviour rather than passively increase knowledge, and as such, their design was informed by theories of behaviour change, persuasion, health communication, and communication on the web [23-32]. Each message contains a brief overview of the strategy, a lengthier explanation of the strategy and why it would work, tips on how the strategy could be implemented, suggested solutions for barriers to implementing the strategy, and an appeal to commit to implementing the strategy. The content of the messages (e.g., strategy rationales, potential barriers and solutions) was derived from professional books and posts on depression forums by consumers [33-37]. Messages were refined further by a working group of mental health researchers (including consumer researchers). The messages have minimal tailoring to individual participant characteristics. For example, links to internet resources are tailored based on country of residence, and the wording advocating each strategy is tailored to the participant's familiarity with using each strategy, as indicated during the baseline questionnaire. The strategies are ordered from highest to lowest feasibility to carry out, based on rankings from the original expert study. This is consistent with clinical advice [39,40] to start homework with simple, feasible tasks and build up with time and success.
With the advent of the electronic mail system in the 1970s, a new opportunity for direct marketing using unsolicited electronic mail became apparent. In 1978, Gary Thuerk compiled a list of those on the Arpanet and then sent out a huge mailing publicising Digital Equipment Corporation (DEC, now Compaq) systems. The reaction from the Defense Communications Agency (DCA), who ran Arpanet, was very negative, and it was this negative reaction that ensured it was a long time before unsolicited e-mail was used again (Templeton, 2003). As long as the U.S. government controlled a major part of the backbone, most forms of commercial activity were forbidden (Hayes, 2003). However, in 1993, the Internet Network Information Center was privatized, and with no central government controls, spam, as it is now called, came into wider use.
Spammers always find new ways to get spammy content to the public. Very commonly this is accomplished by using email, social media, or advertisements. According to a 2011 report by the Messaging Anti-Abuse Working Group roughly 90% of all emails in the United States are spam. This is why we will be taking a more detailed look at email spam. Spam filters have been getting better at detecting spam and removing it, but no method is able to block 100% of it. Because of this, many different methods of text classification have been developed, including a group of classifiers that use a Bayesian approach. The Bayesian approach to spam filtering was one of the earliest methods used to filter spam, and it remains relevant to this day.
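The early Bayesian approach mentioned above can be sketched as follows: estimate a per-word spam probability from the two corpora, then combine the words of a message under an independence assumption. The clamping constants and toy corpora are our own illustrative choices, not any particular filter's settings:

```python
from collections import Counter

def word_spam_probs(spam_docs, ham_docs):
    """Per-word spamminess in the style of early Bayesian filters:
    P(spam | word), estimated from each word's frequency per corpus."""
    sc = Counter(w for d in spam_docs for w in d.lower().split())
    hc = Counter(w for d in ham_docs for w in d.lower().split())
    probs = {}
    for w in set(sc) | set(hc):
        s = sc[w] / max(1, len(spam_docs))
        h = hc[w] / max(1, len(ham_docs))
        probs[w] = s / (s + h)
    return probs

def combined_spam_prob(message, probs):
    # Naive combination of per-word probabilities (assumes independence).
    p_spam, p_ham = 1.0, 1.0
    for w in message.lower().split():
        p = probs.get(w, 0.5)        # unseen words are neutral
        p = min(max(p, 0.01), 0.99)  # clamp to avoid 0/1 extremes
        p_spam *= p
        p_ham *= (1 - p)
    return p_spam / (p_spam + p_ham)

spam = ["free money fast", "free offer win money"]
ham = ["meeting notes", "see you at the meeting"]
probs = word_spam_probs(spam, ham)
print(combined_spam_prob("free money", probs))    # close to 1
print(combined_spam_prob("meeting notes", probs)) # close to 0
```

Filters in this family stay relevant because the per-word tables are cheap to retrain as spammers shift vocabulary.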
2. To damage the reputation of some other target objects, e.g., the products of one's competitors (defaming spam). To fulfill the above objectives, the spammer generally takes one or both of these actions: (1) write undeserving positive reviews for the target objects in order to promote them; (2) write malicious negative reviews for the target objects to damage their reputation. Table 4 gives a clear view of type 1 spam. Spam reviews in regions 1, 3 and 5 are typically written by manufacturers of the product, or by persons with direct financial or other interests in the product; their goal is to promote the product. Although the opinions expressed in region 1 may be genuine, the reviewers do not disclose their conflict of interest. Note that good, bad and average products can be defined based on the average ratings given to them. Spam reviews in regions 2, 4, and 6 are likely to be written by competitors.
Social networking sites like Twitter, Facebook and LinkedIn are very popular. Twitter, an online social networking site, is one of the most visited sites, and many users communicate with each other through it. The rapidly growing social network Twitter has been infiltrated by a large amount of spam. As Twitter spam is not similar to traditional spam, such as email and blog spam, conventional spam filtering methods are not appropriate or effective for detecting it. Thus, many researchers have proposed schemes to detect spammers on Twitter.
The Association of Bureaux De Change Operators of Nigeria (ABCON) manages all independent bureaux de change operators in Nigeria. There are thousands of bureaux de change operators managed by the company, and all their transactions depend solely on ABCON. Currently, ABCON faces security threats from viruses, infected emails (spam), data loss or theft, and human error. All of these threats can be disastrous not only for ABCON but for its customers as well. ABCON deals with a lot of customers, and their critical information is stored in the organisation. Technology makes work faster and easier, but it also comes with disadvantages that can be used to cause harm. Due to the weaknesses in technology, ABCON needs an up-to-date disaster recovery model that will help it reduce the threats that may cause data loss in the organisation. Protecting the confidentiality, integrity and availability of customers' information is critical to ABCON.
There are many digest algorithms in the anti-spam field. In the distributed anti-spam mechanism DCC (Distributed Checksum Clearinghouse), there are two digests, Dig1 and Dig2, for each email. Dig1 is the MD5 value of the email body after removing simple characters such as commas and semicolons; Dig2 is the MD5 value of the set of special words in the email. Using the MD5 algorithm ensures that different emails have different digests, but it does not cope well with the usual spam attack strategies. For Dig1, if the spam attacker adds some additional information to the email, Dig1 will be entirely different. For Dig2, if the spam attacker exchanges the positions of some sentences in the email, Dig2 will be entirely different. So the digest algorithms in the DCC mechanism are not strong enough for the anti-spam field. CTPH (context-triggered piecewise hashing) is a text digest algorithm based on fragment hashes. This algorithm first divides the text into fragments, then calculates the hash values of all the fragments, and finally produces a character string composed of the hash values as the digest. CTPH determines the similarity of two texts by computing the edit distance between their digests. The CTPH algorithm can accurately identify similar texts with editing differences, so it has been widely used in computer forensics and the anti-spam field. However, this algorithm does not cope well with the usual spam attack strategy either: adding special characters after some sentences can make the CTPH digests of similar emails completely different.
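A fragment-hashing digest in the spirit of CTPH can be sketched as follows. Note this simplified version uses fixed-length fragments, whereas real CTPH picks fragment boundaries with a rolling hash (which is what makes it robust to insertions); the fragment length and the toy emails are our assumptions:

```python
import hashlib

def fragment_digest(text, frag_len=16):
    """Split the text into fixed-length fragments, hash each fragment,
    and keep one hex character per fragment as the digest."""
    frags = [text[i:i + frag_len] for i in range(0, len(text), frag_len)]
    return "".join(hashlib.md5(f.encode()).hexdigest()[0] for f in frags)

def edit_distance(a, b):
    # Classic single-row dynamic-programming Levenshtein distance.
    dp = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        prev, dp[0] = dp[0], i
        for j, cb in enumerate(b, 1):
            prev, dp[j] = dp[j], min(dp[j] + 1, dp[j - 1] + 1,
                                     prev + (ca != cb))
    return dp[len(b)]

d1 = fragment_digest("Buy cheap pills today, limited offer, click here now!")
d2 = fragment_digest("Buy cheap pills today, limited offer, click here!!!!!")
print(edit_distance(d1, d2))  # small distance -> similar emails
```

Because only the last fragment differs, the two digests are nearly identical, so the edit distance between them stays small even though a whole-message MD5 (as in DCC) would be completely different.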