# Data Analytics for Data Science, Big Data & Machine Learning

## Top PDF Data Analytics for Data Science, Big Data & Machine Learning:

### Big Data Management, Data Science and Data Analytics: What is it and Where— An Educational in Indian Perspective

Abstract: Large objects are normally treated as ‘big’. Data is raw information and content. Technology is changing rapidly, and social media is now very popular and has broken all geographical boundaries. Big data is a concept and procedure that deals with data sets so large that traditional data processing becomes difficult and conventional applications are eventually inadequate. Analysis, capture, sharing, storage, visualization, querying and other general data management tasks become important challenges. Hence data sets of great complexity and size suffer from this inadequacy. Business intelligence is a related branch, responsible for descriptive statistics on data with high information density to measure things, identify trends and so on. Data science deals with the quantitative analysis of data using methods of statistical learning. It is an approach that combines classical statistical methods and advances in computational systems with machine learning. This theoretical paper depicts current trends and issues in data science and big data. The paper also describes the potential and available programs in the field, and proposes possible programs.

### Data Science and Big Data Analytics pdf

Linear regression is an analytical technique used to model the relationship between several input variables and a continuous outcome variable. A key assumption is that the relationship between an input variable and the outcome variable is linear. Although this assumption may appear restrictive, it is often possible to properly transform the input or outcome variables to achieve a linear relationship between the modified input and outcome variables. Possible transformations will be covered in more detail later in the chapter. The physical sciences have well-known linear models, such as Ohm’s Law, which states that the electrical current flowing through a resistive circuit is linearly proportional to the voltage applied to the circuit. Such a model is considered deterministic in the sense that if the input values are known, the value of the outcome variable is precisely determined. A linear regression model is a probabilistic one that accounts for the randomness that can affect any particular outcome. Based on known input values, a linear regression model provides the expected value of the outcome variable based on the values of the input variables, but some uncertainty may remain in predicting any particular outcome. Thus, linear regression models are useful in physical and social science applications where there may be considerable variation in a particular outcome based on a given set of input values. After presenting possible linear regression use cases, the foundations of linear regression modeling are provided.
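The transformation idea can be illustrated with a minimal sketch. The data below is synthetic and the helper name `fit_line` is hypothetical, neither comes from the excerpt: the outcome grows exponentially with the input, so taking the logarithm of the outcome yields a relationship that ordinary least squares can fit.

```python
import numpy as np

# Synthetic data (an assumption for illustration): the outcome grows
# exponentially with the input, so the raw relationship is not linear.
rng = np.random.default_rng(0)
x = np.linspace(1.0, 10.0, 50)
y = 2.0 * np.exp(0.3 * x) * np.exp(rng.normal(0.0, 0.05, size=x.size))


def fit_line(inputs, outcome):
    """Ordinary least squares for outcome ~ intercept + slope * inputs."""
    design = np.column_stack([np.ones_like(inputs), inputs])
    coef, *_ = np.linalg.lstsq(design, outcome, rcond=None)
    return coef  # [intercept, slope]


# Taking the log of the outcome makes the relationship linear:
# log(y) = log(2.0) + 0.3 * x + noise
intercept, slope = fit_line(x, np.log(y))

# The fitted line gives the expected value of log(y) for a given x;
# individual observations still scatter around it.
residuals = np.log(y) - (intercept + slope * x)
print(f"intercept={intercept:.3f}, slope={slope:.3f}, "
      f"residual sd={residuals.std():.3f}")
```

The nonzero residual spread is the probabilistic part of the model: the line predicts an expected value, not each individual outcome.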

### Big Data Science and EXASOL as Big Data Analytics tool

Big data science will revolutionize the way businesses generate value from data. It provides the ability to create, deploy, and interact with production quality data science models right where the data is stored. In addition, by wrapping big data science in a standard SQL interface, EXASOL provides a smooth transition from traditional BI to big data science, both for analysts and for their SQL toolsets. In this paper we have discussed how big data science architectures result from the convergence of the following technologies: advanced in-memory, massive parallel processing, and in-database programming. This is the very reason why EXASOL is the perfect solution if anyone wants to build and create an agile and scalable big data science system.

### Machine Learning for Big Data Analytics

ABSTRACT: Big data is more than just a repository of and access to data. Big data analytics plays an imperative role in making sense of the data and capitalizing on it. But discerning and cultivating new types of machine learning algorithms is a substantial challenge. Scaling big data to a suitable dimensionality is an issue tackled in machine learning algorithms; there are also challenges in dealing with velocity, volume and more across all categories of machine learning algorithms. This paper probes the big data concept, which brings a pressing need for advanced data acquisition, management, and analysis mechanisms. It also presents the four phases of big data: generating data, acquiring data, storing this voluminous data, and then analysing it. The next part of the paper focuses on dealing with big data using machine learning (ML) and spotlights four ML methods, namely supervised learning, unsupervised learning, semi-supervised learning and reinforcement learning, and their impact on big data.

### A Study on Basics of Data mining, Machine Learning and Big data

Chanchal Yadav, Shullang Wang, Manoj Kumar, “Algorithm and Approaches to handle large Data- A Survey”, IJCSN, Vol 2, Issue 3, 2013, ISSN: 2277-5420, presents a review of various algorithms from 1994 to 2013 necessary for handling big data sets. It gives an overview of the architecture and algorithms used for large data sets. These algorithms define various structures and methods implemented to handle big data, and the paper lists various tools that were developed for analyzing them. It also describes the security issues, applications and trends associated with large data sets [9].

### Big Data Analytics on Indian Healthcare Data

Available online: https://edupediapublications.org/journals/index.php/IJR/

[9] S. Ibrahim, H. Jin, L. Lu, L. Qi, S. Wu and X. Shi, Evaluating Map Reduce on Virtual Machines: The Hadoop Case, Springer: Cloud Computing Lecture Notes in Computer Science, vol. 5931, pp. 519–528, (2009).

### Index Terms- Big data analytics, Machine Learning, Healthcare, Disease Detection, Medical Data Analysis.

Paper [7] surveys disease prediction in big data healthcare using an extended CNN. The concept is applied in the medical field for use in hospitals. It provides (i) high accuracy, (ii) high performance, and (iii) high convergence speed. A particular region is selected and its chronic diseases are analyzed. For structured data, useful features are extracted; for unstructured data, the CNN technique is used, so features are selected automatically. A novel CNN is proposed for the medical data, and a disease risk model is combined with this data. A characteristic behavior of the system is that it selects data by a previous term. Applying this previous term is possible but does not account for disease changes, because the disease level is not fixed and can change from one second to the next. The selected data is drawn from a large volume of data, and accuracy is improved by using a risk classification term. The proposed system aims to predict the risk of liver-related disease. The hospital dataset therefore relates to liver disease, and only structured data is collected from the liver disease information. The proposed system uses disease risk modeling to obtain its accuracy, but risk prediction depends on the different features of the medical data.

### Impact of Deep Learning in Big Data Analytics

New technologies enable us to collect more data than ever before. With an overwhelming amount of web-based, mobile, and sensor-generated data arriving at a terabyte and even zettabyte scale, new science and insights can be discovered from the highly detailed and domain-specific information, which can contain useful information about problems such as national intelligence, cyber security, fraud detection, financial trading, personalized medicine and treatments, personalized information and recommendations, and personalized athletic training. Machine learning algorithms, particularly deep learning (evolved from artificial neural networks), play a vital role in big data analysis. Deep learning algorithms extract high-level and complex abstractions by discovering intricate structure in large data sets. Deep learning techniques are nowadays the leading approaches to solving complex machine learning and pattern recognition problems such as speech and image understanding, semantic indexing, data tagging and fast information retrieval. This paper focuses on all aspects of big data analytics, with a particular emphasis on the analysis and learning of massive volumes of unstructured data and on developing effective and efficient large-scale learning algorithms.

### Big data analytics and organisational change The case of learning analytics

One of the most prevalent unintended effects of the introduction of LA was the fact that LA data led to changes in work and working practices, as already signalled by critical researchers of education, whose arguments I have summarised in the background literature. Indeed, also in the case studied, for example, teaching staff found that “it [LA] detracts from the job of educating” (I_Teaching_011) and introduces a host of different data-related activities which ultimately take away time they would otherwise spend teaching or interacting with students. Importantly, a number of interviewees have experienced what they called “the move towards e-learning” (I_Teaching_011), that is, an impression or encouragement they received that e-learning elements should be introduced even in face-to-face teaching, with some residential modules introducing two or three weeks of online classes with an explicit connection to “the move towards using the data that you get from e-learning” (I_Teaching_011). While it could be argued that the move towards e-learning can have other causes, such as savings, resourcing, and the immense profitability of distance learning programmes, the conviction with which some interviewees expressed their view that they were being almost forced to introduce distance learning components in their face-to-face modules seems to confirm the attribution of these changes to the LA system: “Maybe the data can strengthen them more to having more like more online programmes. Or also to have the campus based programmes to move closer to the distance learning approaches, I guess” (I_Academic_007). It has been pointed out that “The university seems to have become a lot more open to online learning as a way of engaging students, not as a way of just disseminating information. And I feel that part of that is to do with the ability to monitor the analytics and understand the students better” (I_Teaching_002 Follow-up).
One interviewee in particular, puzzled as to why she was asked to introduce a few weeks of distance learning into her residential course, arrived at the conclusion that it was due to the trackability and traceability of online actions as opposed to classroom activity.

### 1. Prototype for policy recommendation system based on aadhar data

Abstract- The project is carried out in the field of big data analytics, within computer science. Data analytics is the process of examining data sets in order to draw conclusions about the information they contain. Big data analytics refers to the techniques used to convert raw data into meaningful information, which helps in business analysis and forms a decision support system for the executives of an organization. Big data is a large and complex collection of data that cannot be processed using traditional tools. In the proposed work, a web application is designed to help the government and the public learn about government policies and the number of people using them. Citizens will learn which of the existing policies they are eligible for. The government will learn how many people are using the policies. The project is implemented using Hadoop. The general public can benefit from the various government policies and can proceed to the recommended ones.

### Similarity Based Prediction System using Machine Learning Algorithms in Big Data Analytics

Abstract: Big data is a noteworthy environment for maintaining the diversity of huge amounts of data. Big data uses machine learning algorithms to process large datasets that come from various places such as histories, weblogs, data repositories and data warehouses. Most existing data mining approaches may not be able to handle such large datasets. With data mining, big data lacks compatibility with database systems and analysis tools, and clustering and analyzing large datasets is a major issue. For this reason, the research work uses machine learning algorithms implemented in Hadoop to collect and process large amounts of structured, semi-structured or unstructured data in a reasonable amount of time. It also gives a more accurate prediction system and more accurate information. Using machine learning algorithms, computational cost and complexity are minimized. The overall research work is implemented in Hadoop with the Python programming language and is compared with some existing algorithms. The proposed work is tested with suitable parameters such as accuracy, Kappa T and Kappa M.

### Deep learning applications and challenges in big data analytics

Zhou et al. [49] describe how a Deep Learning algorithm can be used for incremental feature learning on very large datasets, employing denoising autoencoders [50]. Denoising autoencoders are a variant of autoencoders which extract features from corrupted input, where the extracted features are robust to noisy data and good for classification purposes. Deep Learning algorithms in general use hidden layers to contribute towards the extraction of features or data representations. In a denoising autoencoder, there is one hidden layer which extracts features, with the number of nodes in this hidden layer initially being the same as the number of features that would be extracted. Incrementally, the samples that do not conform to the given objective function (for example, their classification error is more than a threshold, or their reconstruction error is high) are collected and are used for adding new nodes to the hidden layer, with these new nodes being initialized based on those samples. Subsequently, incoming new data samples are used to jointly retrain all the features. This incremental feature learning and mapping can improve the discriminative or generative objective function; however, monotonically adding features can lead to having a lot of redundant features and overfitting of data. Consequently, similar features are merged to produce a more compact set of features. Zhou et al. [49] demonstrate that the incremental feature learning method quickly converges to the optimal number of features in a large-scale online setting. This kind of incremental feature extraction is useful in applications where the distribution of data changes with respect to time in massive online data streams. Incremental feature learning and extraction can be generalized for other Deep Learning algorithms, such as RBM [7], and makes it possible to adapt to a new incoming stream of large-scale online data.
Moreover, it avoids expensive cross-validation analysis in selecting the number of features in large-scale datasets.
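The building block described above, a one-hidden-layer denoising autoencoder, can be sketched in NumPy as follows. This is an illustration only and not the method of Zhou et al.: it covers the corrupt-and-reconstruct training step, not the incremental adding and merging of hidden nodes. The toy data, the masking-noise scheme, the hyperparameters and the function names are all assumptions.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def train_denoising_autoencoder(data, n_hidden, noise_prob=0.3,
                                lr=1.0, epochs=500, seed=0):
    """Train a one-hidden-layer denoising autoencoder with masking noise.

    Returns the learned parameters and the per-epoch mean squared
    reconstruction error, measured against the clean input.
    """
    rng = np.random.default_rng(seed)
    n_samples, n_features = data.shape
    w_enc = rng.normal(0.0, 0.1, size=(n_features, n_hidden))
    b_enc = np.zeros(n_hidden)
    w_dec = rng.normal(0.0, 0.1, size=(n_hidden, n_features))
    b_dec = np.zeros(n_features)
    history = []
    for _ in range(epochs):
        # Corrupt the input by zeroing a random subset of entries.
        mask = rng.random(data.shape) > noise_prob
        corrupted = data * mask
        # Encode the corrupted input, then try to rebuild the clean input.
        hidden = sigmoid(corrupted @ w_enc + b_enc)
        recon = sigmoid(hidden @ w_dec + b_dec)
        err = recon - data
        history.append(float(np.mean(err ** 2)))
        # Backpropagate the squared error through both sigmoid layers.
        d_recon = err * recon * (1.0 - recon) / n_samples
        d_hidden = (d_recon @ w_dec.T) * hidden * (1.0 - hidden)
        w_dec -= lr * hidden.T @ d_recon
        b_dec -= lr * d_recon.sum(axis=0)
        w_enc -= lr * corrupted.T @ d_hidden
        b_enc -= lr * d_hidden.sum(axis=0)
    return (w_enc, b_enc, w_dec, b_dec), history


def encode(data, params):
    """Map inputs to the hidden-layer features."""
    w_enc, b_enc, _, _ = params
    return sigmoid(data @ w_enc + b_enc)


# Toy data (hypothetical): two binary patterns, each repeated 20 times.
prototypes = np.array([[1, 1, 1, 1, 0, 0, 0, 0],
                       [1, 1, 0, 0, 0, 0, 1, 1]], dtype=float)
data = np.repeat(prototypes, 20, axis=0)

params, history = train_denoising_autoencoder(data, n_hidden=4)
features = encode(data, params)
print(f"reconstruction error: first epoch {history[0]:.3f}, "
      f"last epoch {history[-1]:.3f}")
```

In an incremental scheme of the kind the excerpt describes, samples whose reconstruction error stays high would be the candidates for initializing new hidden nodes; that step is left out here.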

### The Big Data analytics with Hadoop

1) Size: The name Big Data indicates that the data is big in size. Its growth rate is high compared with recent years, when data grew more slowly than it does now. The size of the data is breaking all boundaries and reaching the peak of storage capacity, because all of this data is stored for future use. Data sizes reach petabytes or zettabytes, which makes managing the data for further use by machines a difficult task.

### A New Competitive Opportunities in Various Sectors: Arising from Big Data

Online education has developed greatly in recent years and has a growing impact on the education sector. Digital learning is effectively a collection of data and analytics that can contribute to teaching and learning. Many students participate in online or mobile learning, where new data are created [13]. These new data, together with social networks, help students from different backgrounds connect with one another and understand core course concepts. Besides making education more personal and executive, new types of data also improve researchers’ ability to learn about learning. In this way big data can provide more opportunities for new learning experiences for children and young adults. Students can share information with educational institutions and thereby expand their knowledge and skills. Furthermore, educational institutes and universities are able to help and prepare their future students.

### Title: BIG DATA COMPUTING AND CLOUDS

Organizations are increasingly generating large volumes of data as a result of instrumented business processes, monitoring of user activity, web site tracking, sensors, finance, and accounting, among other reasons. With the advent of social network Web sites, users create records of their lives by daily posting details of activities they perform, events they attend, places they visit, pictures they take, and things they enjoy and want. This data deluge is often referred to as Big Data, a term that conveys the challenges it poses to existing infrastructure with respect to storage, management, interoperability, governance, and analysis of the data. In today’s competitive market, being able to explore data to understand customer behavior, segment the customer base, offer customized services, and gain insights from data provided by multiple sources is key to competitive advantage. Although decision makers would like to base their decisions and actions on insights gained from this data, making sense of data, extracting non-obvious patterns, and using these patterns to predict future behavior are not new topics. Knowledge Discovery in Data (KDD) aims to extract non-obvious information using careful and detailed analysis and interpretation. Data mining, more specifically, aims to discover previously unknown interrelations among apparently unrelated attributes of data sets by applying methods from several areas including machine learning, database systems, and statistics. Analytics comprises techniques of KDD, data mining, text mining, statistical and quantitative analysis, explanatory and predictive models, and advanced and interactive visualization to drive decisions and actions. Fig. 1 depicts the common phases of a traditional analytics workflow for Big Data. Data from various sources, including databases, streams, marts, and data warehouses, are used to build models.
The large volume and different types of the data can demand pre-processing tasks for integrating the data, cleaning it, and filtering it. The prepared data is used to train a model and to estimate its parameters. Once the model is estimated, it should be validated before its consumption. Normally this phase requires the use of the original input data and specific methods to validate the created model. Finally, the model is consumed and applied to data as it arrives. This phase, called model scoring, is used to generate predictions, prescriptions, and recommendations. The results are interpreted and evaluated, used to generate new models or calibrate existing ones, or are integrated with pre-processed data.
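The workflow phases described above (pre-processing, training, validation, scoring) can be sketched as a chain of small functions. This is a toy illustration under stated assumptions: the record layout, the two hypothetical sources, the filtering rule and the one-variable linear model are invented for the example and do not come from the paper.

```python
from statistics import mean


def preprocess(sources):
    """Integrate records from several sources, then clean and filter them."""
    merged = [row for source in sources for row in source]        # integrate
    cleaned = [r for r in merged
               if r["x"] is not None and r["y"] is not None]      # clean
    return [r for r in cleaned if r["x"] >= 0]                    # filter


def train(rows):
    """Estimate intercept and slope of y ~ intercept + slope * x."""
    xs = [r["x"] for r in rows]
    ys = [r["y"] for r in rows]
    x_bar, y_bar = mean(xs), mean(ys)
    slope = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
             / sum((x - x_bar) ** 2 for x in xs))
    return {"intercept": y_bar - slope * x_bar, "slope": slope}


def score(model, x):
    """Apply the estimated model to a newly arrived input value."""
    return model["intercept"] + model["slope"] * x


def validate(model, rows):
    """Mean absolute error of the model on held-out rows."""
    return mean(abs(score(model, r["x"]) - r["y"]) for r in rows)


# Hypothetical sources: a database extract and a stream, with one
# incomplete record and one record that the filter rule rejects.
database = [{"x": 0, "y": 1}, {"x": 1, "y": 3},
            {"x": 2, "y": None}, {"x": 3, "y": 7}]
stream = [{"x": -1, "y": 0}, {"x": 4, "y": 9}, {"x": 5, "y": 11}]

prepared = preprocess([database, stream])
training_rows, holdout_rows = prepared[:3], prepared[3:]

model = train(training_rows)
validation_error = validate(model, holdout_rows)
prediction = score(model, 10)
print(model, validation_error, prediction)
```

In a production setting the scoring step would run continuously on arriving data, and its evaluated results would feed back into retraining or recalibration, as the text notes.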

### Big Data Analytics Using Support Vector Machine Algorithm

Apache Sqoop is a CLI tool designed to transfer data between Hadoop and relational databases. Sqoop can import data from an RDBMS such as MySQL or Oracle Database into HDFS and then export the data back after it has been transformed using MapReduce. Sqoop can also import data into HBase or Hive. Sqoop connects to an RDBMS through its JDBC connector and relies on the RDBMS to describe the database schema for the data to be imported. Both import and export use MapReduce, which provides parallelism as well as fault tolerance. During import, Sqoop reads the table, row by row, into HDFS. Because import is performed in parallel, the output in HDFS is more than one file.
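Why a parallel import produces several output files can be shown with a small simulation. This is not Sqoop code and uses no Sqoop API: it is a hypothetical Python sketch that assumes the table has an integer `id` column, divides the id range among a chosen number of mappers, and lets each mapper write its own part file. The function names and the file-naming pattern are illustrative assumptions.

```python
def split_ranges(min_id, max_id, num_mappers):
    """Divide [min_id, max_id] into contiguous ranges, one per mapper."""
    total = max_id - min_id + 1
    base, extra = divmod(total, num_mappers)
    ranges, start = [], min_id
    for i in range(num_mappers):
        size = base + (1 if i < extra else 0)
        if size == 0:
            continue  # more mappers than rows: this mapper gets no work
        ranges.append((start, start + size - 1))
        start += size
    return ranges


def simulated_import(table, num_mappers):
    """Return a mapping of output file name to the rows one mapper read."""
    ids = [row["id"] for row in table]
    output = {}
    for i, (lo, hi) in enumerate(split_ranges(min(ids), max(ids), num_mappers)):
        # Each mapper reads only the rows in its own id range.
        output[f"part-m-{i:05d}"] = [r for r in table if lo <= r["id"] <= hi]
    return output


# Hypothetical table with ids 1..10, imported by four parallel mappers.
table = [{"id": i, "name": f"row{i}"} for i in range(1, 11)]
files = simulated_import(table, num_mappers=4)
for name, rows in files.items():
    print(name, [r["id"] for r in rows])
```

Each simulated mapper owns one slice of the table, so the import result is a set of files, not a single one.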

### APPLICATIONS OF BIG DATA ANALYTICS

Dell offers its own big data package. Their solution includes an automated facility to load and continuously replicate changes from an Oracle database to a Hadoop cluster to support big data analytics projects. Techniques such as natural language processing, machine learning and sentiment analysis are made accessible through straightforward search and powerful visualization to enable users to learn relationships between different data streams and leverage these for their businesses.

### Big Data Analytics: An Overview

Security and Public Safety: Since the tragic events of September 11, 2001, security research has gained much attention, especially given the increasing dependency of business and our global society on digital enablement. Researchers in computational science, information systems, social sciences, engineering, medicine, and many other fields have been called upon to help enhance our ability to fight violence, terrorism, cyber crimes, and other cyber security concerns. Critical mission areas have been identified where information technology can contribute, as suggested in the U.S. Office of Homeland Security’s report “National Strategy for Homeland Security,” released in 2002, including intelligence and warning, border and transportation security, domestic counter-terrorism, protecting critical infrastructure (including cyberspace), defending against catastrophic terrorism, and emergency preparedness and response. Intelligence, security, and public safety agencies are gathering large amounts of data from multiple sources, from criminal records of terrorism incidents, and from cyber security threats to multilingual open-source intelligence. Companies of different sizes are facing the daunting task of defending against cyber security threats and protecting their intellectual assets and infrastructure. Processing and analyzing security-related data, however, is increasingly difficult. A significant challenge in security IT research is the information stovepipe and overload resulting from diverse data sources, multiple data formats, and large data volumes. Current research on technologies for cyber security, counter-terrorism, and crime fighting applications lacks a consistent framework for addressing these data challenges.
Selected BI&A technologies such as criminal association rule mining and clustering, criminal network analysis, spatial-temporal analysis and visualization, multilingual text analytics, sentiment and affect analysis, and cyber attacks analysis and attribution should be considered for security informatics research.

### BIG DATA ANALYTICS: A PRIMER

6) Deep Learning: Deep learning (DL) refers to a family of approaches that have taken machine learning to a new level, helping computers make sense of vast amounts of data. Deep learning algorithms are used to train deep networks with large amounts of data. DL has become a major technology trend for big data and artificial intelligence [10].

7) Visual Analytics: Data visualization plays a major role in understanding and exploring data because there is much to gain when data is presented visually. Visual analytics is effective for work in geospatial domains and for multi-dimensional analysis.

### Big Data Analytics in Healthcare

Firstly, a platform for streaming data acquisition and ingestion is required, which has the bandwidth to handle multiple waveforms at different fidelities. Integrating these dynamic waveform data with static data from the EHR is a crucial component to provide situational and contextual awareness for the analytics engine. Enriching the data consumed by analytics not only makes the system more robust but also helps balance the sensitivity and specificity of the predictive analytics. The specifics of the signal processing will largely depend on the type of disease cohort under investigation. A variety of signal processing mechanisms can be utilized to extract a multitude of target features which are then consumed by a trained machine learning model to produce actionable insight.