Review on Crime Analysis and Prediction
Using Data Mining Techniques
Dr. M. Sreedevi [1], A. Harsha Vardhan Reddy [2], Ch. Venakata Sai Krishna Reddy [3]
Professor, Dept. of CSE, KLEF, Vaddeshwaram, Guntur, India [1]
B.Tech Scholar, Dept. of CSE, KLEF, Vaddeshwaram, Guntur, India [2]
B.Tech Scholar, Dept. of CSE, KLEF, Vaddeshwaram, Guntur, India [3]
ABSTRACT: Instead of focusing on the causes of crime, this review focuses mainly on crime factors. Data mining makes it possible to extract previously unknown, useful information from unstructured data. By combining computer science and criminal justice, we can develop a data mining procedure that helps solve crimes faster, and criminals can also be predicted from crime data. This paper explains various types of criminal analysis and crime prediction using several data mining techniques.
KEYWORDS: Data mining, machine learning, crime analysis, crime prediction
I. INTRODUCTION
The crime rate is increasing in day-to-day life, yet crime is hard to predict because it is neither systematic nor random. Prediction will not be 100% accurate, but it can reduce the crime rate to some extent. The main aim is to develop a more efficient crime pattern detection tool that identifies crime patterns effectively, although checking whether a particular crime fits a known pattern takes a lot of time. If a crime does not match any known pattern, a new pattern is created. When data about known crimes is available, the pattern for a particular place can be derived from it. Clustering is therefore applied to existing, known crimes, and it works well because it can also detect newer, previously unknown patterns in the future. Several methods and methodologies exist for crime analysis. Since ancient times Indian culture has given great importance to women, but nowadays their position is very different: crime against women has been increasing day by day in recent years. Police departments hold a huge number of datasets, and with data mining techniques these datasets can be used to predict crime to some extent.
Some major issues in day-to-day practice are:
● The growing volume of crime information must be stored and analysed.
● Investigating a crime takes a lot of time because of complex issues.
We can collect the datasets from the crime data available from the National Crime Records Bureau. This data contains some missing values as well as some wrong values.
Proper data preparation therefore includes cleaning and pre-processing. The data can then be classified into groups based on their characteristics; for clustering states and cities, we use the K-means algorithm to group records with the same characteristics.
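The grouping step can be illustrated with a minimal K-means sketch in Python. The region counts, the choice of k = 2, and the function name `kmeans` are assumptions made only for illustration; they are not data or settings taken from this review.

```python
import random

def kmeans(points, k, iterations=20, seed=0):
    """Plain K-means on equal-length numeric tuples; returns (centroids, labels)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # initial centroids picked from the data
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        for i, p in enumerate(points):
            labels[i] = min(
                range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])),
            )
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return centroids, labels

# Hypothetical (thefts, assaults) counts per region, for illustration only.
regions = [(10, 12), (11, 9), (9, 11), (95, 90), (100, 98), (97, 102)]
centroids, labels = kmeans(regions, k=2)
```

With these made-up counts the three low-crime regions end up in one cluster and the three high-crime regions in the other. On real data the number of clusters and the features would need to be chosen with care.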
Correlated crimes can be identified from the correlated dataset, which is then used to make predictions. If two crimes occur in a particular place, they are treated as correlated with each other; if such crimes have already occurred together in the past, we can predict future correlated crimes.
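One possible way to quantify how strongly two crime types move together is the Pearson correlation coefficient. The sketch below assumes monthly counts for two crime types in the same place; the figures and the choice of Pearson correlation are our illustrative assumptions, since the text above does not name a specific measure.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)  # undefined if either series is constant

# Hypothetical monthly counts for two crime types in one area.
burglary = [4, 6, 8, 10, 12]
vehicle_theft = [9, 13, 17, 21, 25]
r = pearson(burglary, vehicle_theft)
```

A value of r close to 1 suggests the two crime types rise and fall together, a value close to -1 suggests they move in opposite directions, and a value near 0 suggests no linear relation.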
Using linear regression, we can predict the various types of crimes against women.
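As a minimal sketch of this idea, the following Python code fits a straight line to yearly case counts by ordinary least squares and extrapolates one year ahead. The yearly figures and the function name `fit_line` are hypothetical and serve only to show the calculation.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical yearly counts of one crime type, for illustration only.
years = [2012, 2013, 2014, 2015]
cases = [100, 110, 120, 130]
slope, intercept = fit_line(years, cases)
predicted_2016 = slope * 2016 + intercept
```

Because the made-up counts grow by exactly 10 per year, the fitted slope is 10 and the extrapolated value for 2016 is 140. Real crime series are noisier, so such a prediction should be read as a trend estimate only.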
II. RELATED WORK
Text, Content & NLP-Based Method:
With growing internet usage, the data generated by social media and other web pages such as e-newspapers provides more information about crimes as they happen over time, and mining that data makes it easier to predict crime. Social media posts, tweets and news also reveal the opinions and judgements of various people on those crimes and how they happened.
The basic idea is that crimes occurring over time are related. Terrorist attacks at different places, for example, are likely to share similar characteristics and targeted areas. Areas where most crimes happen are called crime hotspots.
Crime is therefore not a random event; criminology and criminal studies have also stated that crime should not be considered a random event.
The occurrence of crime depends on many factors such as reason, situation and motive.
Some theories that are used to predict crime:
● Integrated Theory
● Biological Theories
● Psychological Theories
● Sociological Theories
● Conflict Theories
● Victimization Theories
● Choice Theories
Deep Learning Method
Deep learning, or deep neural networks, can work with noisy, discrete data and recognize patterns in a given dataset, which helps in understanding the relation between crimes. In the experiment described in [3], the deep neural network operates over a graph architecture.
Input parameters:
The input dataset, other required information, and the time slice for which the chance of a crime occurring is to be predicted.
Output:
Crime Patterns & Evidence Based Method:
This method was proposed by Bogahawatte and Adikari [4], who use clustering and classification for effective crime investigation and criminal identification. They describe a system named Intelligent Crime Investigation System (ICSIS) that can identify a criminal based on the evidence collected from the crime location.
METHOD: Crime patterns and evidence-based methods
INPUT: Burglary, Robbery, and Homicide
DATASET: Crime dataset for crime analysis by police in England and Wales from 1990 to 2011
PRE PROCESSING: Nil
FEATURE EXTRACTION: Filtering of dataset; outlier detection using distance operator (k-NN); Genetic Algorithm (GA) used for optimizing outlier detection
CLASSIFICATION/CLUSTERING: Classification was done using a Decision Tree with the GINI index; training and testing were done using stratified sampling
STRENGTH: Use of GA to optimize the distance operator parameters in clustering and to predict the cluster's members
WEAKNESS: The number of clusters in the clustering process needs to be optimized, and further optimization of the operator parameters based on classification using the Decision Tree technique needs to be done
OUTCOME: Effectiveness
Spatial & Geo-Location Based Methods:
This method, proposed by Huang, takes a different approach to criminal activity prediction, based on mining location-based social network interactions. Information is collected from people's geographical interactions and from the data they provide. The Haversine formula is used to calculate the distance between two points, i.e., the crime location and the venue location, and the result is shown using the Google Maps API and OpenStreetMap.
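The Haversine distance between a crime location and a venue location can be sketched in Python as follows. The function name `haversine_km` and the mean Earth radius of 6371 km are our assumptions; the reviewed work may use a different radius or unit.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, assumed for this sketch

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (latitude, longitude) points in degrees."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = radians(lat2 - lat1)
    dlambda = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlambda / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))

# One degree of longitude along the equator, as a quick sanity check.
one_degree_km = haversine_km(0.0, 0.0, 0.0, 1.0)
```

The formula treats the Earth as a sphere, so the result is an approximation that is usually adequate at city scale.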
METHOD: Spatial and geo-location based methods
INPUT: Geo-location and crime type
DATASET: SNAP Gowalla dataset, DataSF criminal dataset up to February 2015
PRE PROCESSING: Extraction of crime type like Assault, Robbery, Theft, Vandalism, Drug
FEATURE EXTRACTION: Geographical features, popularity, location category, neighbor entropy, social tightness density, crime location, venue from Foursquare
CLASSIFICATION/CLUSTERING: Random Forest (RF), Linear Regression (LR) and Support Vector Machine (SVM)
STRENGTH: Random split method utilized with 80% for training and 20% for testing in classification
WEAKNESS / OUTCOME: ... be more difficult or easier to be predicted
Prisoner Based Methods:
This method, proposed by Sheehy, is aimed at the treatment of mentally ill people inside prison. Mentally ill criminals are identified using their social security number together with all their personal criminal records and crime career records. The method describes and classifies criminals, such as misdemeanour offenders, based on their mental health.
METHOD: Prisoner based methods
INPUT: The Social Security Number (SSN) with all the criminal personal and crime career records
DATASET: Albemarle-Charlottesville Regional Jail (ACRJ), Jefferson Area Community Corrections (JACC) and Region Ten Community Services Board
PRE PROCESSING: A combination which includes the Social Security Number (SSN) and date was used to link the databases together
FEATURE EXTRACTION: Age, criminal history, employment history, crime type := "assault", "larceny", "supervision violations", "narcotics charges", "traffic violations", "driving while intoxicated"
CLASSIFICATION/CLUSTERING: Offenders are classified into three classes, namely "high", "medium" and "low", as levels of recidivism risk potential. Further, the mental health status of the inmates is categorized into two categories, "referred" and "not-referred"
STRENGTH: Analysis for the identification of the mentally ill felony
WEAKNESS: Statistical classification of criminals missing. Could have taken more features
OUTCOME: "Referred" individuals can be made to have a longer stay in jail than "not-referred" individuals
Communication Based Methods:
METHOD: Communication based methods
INPUT: Flow of communications/information links between two criminals (e.g., phone call records, messages, etc.), names of criminals/suspects, the type of crime, location and date of the crime
DATASET: Real-world communication records (DBLP, Enron email dataset, Nodobo mobile phone records dataset)
PRE PROCESSING: Creating the graph based on the data and then assigning weight to a vertex based on its number of communication attempts in the criminal graph
FEATURE EXTRACTION: The immediate leaders of lower-level criminals and the lower-level criminals themselves are extracted
CLASSIFICATION/CLUSTERING: Evaluation of the accuracy of the three systems by measuring their Recall, Precision, and Euclidean Distance
STRENGTH: Evaluated SIIMCO by comparing it experimentally with CrimeNet Explorer and LogAnalysis
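The pre-processing idea in the table above, weighting each vertex by its number of communication attempts, can be sketched in Python as shown below. The call records are hypothetical, and counting an attempt for both the caller and the receiver is our simplifying assumption; this is not the SIIMCO implementation.

```python
from collections import Counter

def vertex_weights(call_records):
    """Count communication attempts per person in an undirected communication graph."""
    weights = Counter()
    for caller, receiver in call_records:
        weights[caller] += 1    # attempt counted for the caller
        weights[receiver] += 1  # and for the receiver (assumption of this sketch)
    return weights

# Hypothetical phone call records between suspects A, B, C and D.
records = [("A", "B"), ("A", "C"), ("A", "D"), ("B", "C")]
weights = vertex_weights(records)
most_connected = weights.most_common(1)[0][0]
```

Here suspect A takes part in three of the four calls and therefore receives the highest weight, which would make A a candidate for closer inspection as an influential member.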
III. CRIME ANALYSIS PROCEDURE
Fig:1 Crime analysis procedure
➢ Criminals have certain properties, and their crime characteristics and crime careers may vary from one criminal to another. This type of information can be taken as the input dataset.
➢ The input dataset is given to a pre-processor, which performs pre-processing based on the requirements.
➢ Once pre-processing is complete, the features or attributes are extracted from the information. These may be text content from emails, crime factors for the day, criminal characteristics, geo-location of the criminals, etc.
➢ The pre-processed result is then given to a classification algorithm or a clustering algorithm, based on the requirements.
➢ The requirements may be anything from selecting crime-prone areas to predicting the criminal based on previous crime records.
➢ The classification algorithm works in a supervised learning manner, in which training and testing phases are required so that the classifier learns to identify new, unknown crime records.
Methodology:
Fig:3 -Methodology
Data Collection:
In this step we collect data from various sources such as news sites, blogs and social media. The collected data is unstructured and is stored in a database for future use. Object-oriented programming can be used here because it is easy to use and flexible.
Classification:
In this step we use the Naive Bayes algorithm, which is a supervised learning method. The algorithm classifies a news article into the crime type it fits best. The main advantages of the Naive Bayes classifier are that it is simple and converges more quickly than logistic regression, and it needs only a small amount of training data to estimate the classification parameters. Named Entity Recognition (NER) is used to find and classify elements in text into predefined categories such as person names, organizations and locations, which helps gather more detail about related crimes.
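A minimal sketch of such a classifier is given below: a multinomial Naive Bayes with add-one (Laplace) smoothing, written in plain Python. The four training headlines, the two crime labels and the whitespace tokenisation are illustrative assumptions; a real system would train on a much larger labelled corpus.

```python
from collections import Counter, defaultdict
from math import log

def train_naive_bayes(labelled_docs):
    """Collect class and word counts from (text, label) pairs."""
    class_counts = Counter()
    word_counts = defaultdict(Counter)
    vocabulary = set()
    for text, label in labelled_docs:
        words = text.lower().split()
        class_counts[label] += 1
        word_counts[label].update(words)
        vocabulary.update(words)
    return class_counts, word_counts, vocabulary

def classify(text, model):
    """Return the label with the highest log posterior under add-one smoothing."""
    class_counts, word_counts, vocabulary = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, float("-inf")
    for label in class_counts:
        score = log(class_counts[label] / total_docs)  # log prior
        total_words = sum(word_counts[label].values())
        for word in text.lower().split():
            if word not in vocabulary:
                continue  # ignore words never seen in training
            score += log((word_counts[label][word] + 1) / (total_words + len(vocabulary)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Hypothetical training headlines, for illustration only.
training = [
    ("masked men rob bank at gunpoint", "robbery"),
    ("jewellery shop robbed by armed gang", "robbery"),
    ("police seize drugs in night raid", "narcotics"),
    ("smuggler held with drugs at airport", "narcotics"),
]
model = train_naive_bayes(training)
```

For example, `classify("armed gang rob bank", model)` picks the robbery class because those words appear only in the robbery headlines, while `classify("drugs seized at airport", model)` picks narcotics.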
Pattern Identification:
Prediction:
This step uses a decision tree, which is simple to understand and interpret and works well with large datasets. It resembles a graph in which each internal node represents a test on an attribute and each branch represents an outcome of the test.
The tree has three types of nodes:
(i) Root Node: It has no incoming edges and zero or more outgoing edges.
(ii) Internal Node: It has exactly one incoming edge and two or more outgoing edges.
(iii) Leaf Node or End Node: It has exactly one incoming edge and no outgoing edges.
Decision tree learning is a supervised machine learning technique that builds a decision tree from a set of class-labelled training samples and uses a set of binary rules to calculate the class value.
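How a single binary rule might be chosen can be sketched as follows. The code assumes Gini impurity as the split measure (the criterion named for the evidence-based method earlier in this review) and a hypothetical four-record dataset; the function names are ours.

```python
from collections import Counter

def gini(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

def best_split(rows, labels, attribute):
    """Find the attribute value whose equality test gives the lowest weighted Gini impurity."""
    best_value, best_score = None, float("inf")
    for value in sorted({row[attribute] for row in rows}):
        left = [lab for row, lab in zip(rows, labels) if row[attribute] == value]
        right = [lab for row, lab in zip(rows, labels) if row[attribute] != value]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
        if score < best_score:
            best_value, best_score = value, score
    return best_value, best_score

# Hypothetical crime records: time of day as the only attribute.
rows = [{"time": "night"}, {"time": "night"}, {"time": "day"}, {"time": "day"}]
labels = ["burglary", "burglary", "theft", "theft"]
root_impurity = gini(labels)
split_value, split_score = best_split(rows, labels, "time")
```

In this toy dataset the impurity at the root is 0.5, and testing the time of day separates the two crime types completely, so the weighted impurity after the split is 0. A full decision tree repeats this choice recursively on each branch.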
Visualization:
This step uses a graphical representation in the form of a heat map. In the heat map, darker colours indicate low activity and brighter colours indicate high activity. The main advantages of heat maps are that the data of interest is easy to analyse and out-of-range data is automatically discarded.
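The numbers behind such a heat map can be sketched by counting incidents per grid cell, as in the Python code below. The coordinates, the cell size of 0.1 degrees and the function name `grid_counts` are illustrative assumptions; a plotting library would then map higher counts to brighter colours.

```python
from collections import Counter

def grid_counts(points, cell_size):
    """Count incidents per square grid cell; keys are (row, column) cell indices."""
    counts = Counter()
    for lat, lon in points:
        cell = (int(lat // cell_size), int(lon // cell_size))
        counts[cell] += 1
    return counts

# Hypothetical incident coordinates (latitude, longitude), for illustration only.
incidents = [(16.44, 80.62), (16.45, 80.61), (16.47, 80.63), (16.95, 80.15)]
cells = grid_counts(incidents, cell_size=0.1)
hottest_cell, hottest_count = cells.most_common(1)[0]
```

Three of the four made-up incidents fall into the same cell, which would appear as the brightest area of the map.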
IV. CONCLUSION
This survey is a theoretical study of several methods and methodologies for identifying crime and criminals, including text/NLP-based, crime pattern, geo-location, prisoner-based and communication-based methods. Data collection, classification, pattern identification, prediction and visualisation are the methodologies covered in this survey.
These are the data mining techniques studied in this survey for identifying criminals in society and thereby contributing to a better future to live in.
REFERENCES
1) H. Benjamin Fredrick David and A. Suruliandi, "Survey on Crime Analysis and Prediction Using Data Mining Techniques", ICTACT Journal on Soft Computing, April 2017, Volume: 07, Issue: 03.
2) D.E. Brown, "The regional crime analysis program (RECAP): A framework for mining data to catch criminals", In Proceedings of the IEEE International Conference on Systems, Man, and Cybernetics, Vol. 3, pp. 2848-2853, 1998.
3) Jazeem Azeez, D. John Aravindhar, "Hybrid Approach to Crime Prediction using Deep learning", 978-1-4799-8792-4/15/ 2015 IEEE.
4) Kaumalee Bogahawatte and Shalinda Adikari, "Intelligent Criminal Identification System", Proceedings of 8th IEEE International Conference on Computer Science and Education, pp. 633-638, 2013.
5) M. Sreedevi and G. Vijay Kumar, "Parallel and Distributed Approach for mining closed regular patterns on Incremental databases at user thresholds", ACM Digital Library, DOI: 10.1145/2677855.2677914, 2014.
6) Lakshmi Narasamma and M. Sreedevi, "Modeling of Tweet Summarization Systems using Data Mining Techniques: A Review Report", Indian Journal of Science and Technology, Vol 9(44), November 2016.
7) Mugdha Sharma, "Z-Crime: A Data Mining Tool for the Detection of Suspicious Criminal Activities based on the Decision Tree", International Conference on Data Mining and Intelligent Computing, pp. 1-6, 2014.
8) Kamal Taha and Paul D. Yoo, "SIIMCO: A Forensic Investigation Tool for Identifying the Influential Members of a Criminal Organization", IEEE Transactions on Information Forensics and Security, Vol. 11, No. 4, pp. 811-822, 2016.
9) A. Malathi and Dr. S. Santhosh Baboo, "An enhanced algorithm to predict a future crime using data mining", International Journal of Computer Applications, 21(1):1-6, May 2011.
10) Colleen McCue, "Data Mining and Predictive Analysis: Intelligence Gathering and Crime Analysis", Butterworth-Heinemann, 2014.
11) Manish Gupta, B. Chandra and M.P. Gupta, "Crime Data Mining for Indian Police Information System", Journal of Crime, Vol. 2, No. 6, pp. 43-54, 2006.
12) Shiju Sathyadevan, Devan M.S. and Surya Gangadharan S., "Crime analysis and prediction using data mining", 2014 First International Conference on Networks & Soft Computing (ICNSC2014), 2014.
14) Kevin Sheehy, Thomas Rehberger, Andrew O'Shea, William Hammond, Charlotte Blais, Michael Smith, K. Preston White, Neal Goodloe, "Evidence-based analysis of mentally ill individuals in the criminal justice system", 2016 IEEE Systems and Information Engineering Design Symposium (SIEDS), 2016.