Big Data Analytics and Optimization

Academic year: 2021


INTERNATIONAL SCHOOL OF ENGINEERING

http://www.insofe.edu.in

Big Data Analytics and Optimization

Certificate Program in Engineering Excellence

Certificate Programs in Accelerated Engineering

LIST OF COURSES

Essential Business Skills for a Data Scientist

Thinking Skills for Effective Architecture Design

Essential Engineering Skills in Big Data Analytics

Statistical Modeling for Predictive Analytics in Engineering and Business

Optimization and Decision Analysis

Engineering Big Data with R and Hadoop Ecosystem

Text Mining, Social Network Analysis and Natural Language Processing

Methods and Algorithms in Machine Learning

CSE 7110c

Essential Business Skills for a Data Scientist

This one-day module is offered independently to CXOs and senior managers across the globe and is highly appreciated as one of the most hands-on managerial introductions to data science. You learn to become a consumer of analytics, a role for which McKinsey has predicted unprecedented demand.

Day 1

 Why should we build models or use data to run a business: The edge of evidence over intuition

 What kind of models do data scientists build and where they do not work

 When you want a prediction

o How do you estimate how much to pay and how long to wait

o How do you precisely define for the teams what to deliver

o How do you evaluate how good their predictions are

 When does big unstructured data become really important

 When you want to build an analytics group

o What software or hardware should you invest in

o Several engagement models and the ideal teams for each

 Business plan: Each team develops and presents a complete business plan for setting up an analytics organization.

 Case analysis: Participants are divided into teams and given several high-level business problems. They must identify the prediction problems with high ROI and provide concise requirements.

CSE 7111c

Thinking Skills for Effective Architecture Design

This two-day module equips data scientists with the skills to design and architect practical, workable solutions. Participants also learn the skills needed to coordinate between business and technical teams.

Day 1

 Thinking tools

o Approximations and estimations

o Geometric visualization of data and models

o Probabilistic analysis of data and models

o Analyzing networks and graphs

o Analyzing transitions, Markov chains and unstructured data

o Estimating complexity of algorithms

 Choosing the right models and architecting a solution

o Structure and anatomy of models

o Problematic data and choosing the best experimentation

 Sources of errors in predictive models and techniques to minimize them

Day 2

 Interacting with technical and business teams

o Translating typical business problems into technical specifications

o Brainstorming and analyzing data and designing transformations

o Manual analysis of the models

 Case study: Participants will be given business problems. They need to:

o Translate them into specific technical solutions

o Brainstorm for data and design transformations

CSE 7412c

Essential Engineering Skills in Big Data Analytics

This eight-day module trains engineers in hands-on Big Data and analytics tools such as R, Hadoop, Hive and Pig. Students work on several real-world data sets.

Day 1

 Reading from Excel, CSV and other forms

 Data exploration (histograms, bar chart, box plot, line graph, scatter plot)

 Data story telling - The science, ggplot, bubble charts with multiple dimensions, gauge charts, treemap, heat map and motion charts

Day 2

 Advanced data pre-processing using Excel and spreadsheets

Day 3

 Lecture Session: Data preprocessing of structured data

 Lab Session:

o In R

o Handling missing values

o Binning

o Standardization

o Outlier/Noise

o PCA

o Type Conversion

Day 4

 Lecture Session: Introduction to Text mining, Text preprocessing

 Lab Session: Text preprocessing

o Write a Web crawler to collect data

o In R

o Find Unique words and count

o Handling numbers

o Punctuation

o Stop words

o Incorrect spellings

o Stemming

o Lemmatization and TxD computation

Day 5

 Lecture Session: unstructured data processing - audio and image processing

 Lab Session: unstructured data processing - audio and image processing

Day 6

 Lecture Session: Introduction - Big Data, Hadoop Applications

 Lab Session: Hadoop -

o 1. Different types of Installations.

o 3. Basic OS-Level commands.

Day 7

 Lecture Session:

o 1. Parallel and Distributed Computing.

o 2. Introduction to Algorithms.

o 3. Concurrent Algorithms.

o 4. Linux, Java rapid refresher

 Lab Session:

o 1. Java Hands On.

o 2. Reading Code, libraries etc.

o 3. R refresher

o 4. Python refresher

Day 8

 Lecture Session:

o 1. NOSQL

o 2. HDFS

o 3. CDH4 – HDFS

 Lab Session:

o 1. HDFS Command line

CSE 7302c

Statistical Modeling for Predictive Analytics in Engineering and Business

This six-day module aims to teach “how to think like a statistician”. “Statistical thinking will one day be as necessary for efficient citizenship as the ability to read and write”, wrote H. G. Wells in the year 1895. That day has arrived, with data analytics going mainstream (see “For Today’s Graduate, Just One Word: Statistics”, http://www.nytimes.com/2009/08/06/technology/06stats.html).

This course thoroughly trains candidates on the following techniques:

Day 1

 Computing the properties of an attribute: Central tendency (Mean, Median, Mode); Dispersion (Range, Variance, Standard Deviation); Expectations of a Variable; Moment Generating Functions

 Describing an attribute: Probability distributions (Discrete and Continuous) - Bernoulli, Binomial, Multinomial and Poisson distributions

 Describing the relationship between attributes: Covariance; Correlation; Chi-Square

Day 2

 Describing a single variable continued: Weibull, Geometric, Negative Binomial, Gamma and Exponential distributions; Special emphasis on Normal distribution; Central Limit Theorem

 Inferential statistics: How to learn about the population from a sample and vice versa; Sampling distributions; Confidence Intervals, Hypothesis Testing

 ANOVA

Day 3

 Regression (Linear, Multivariate Regression) in forecasting

 Analyzing and interpreting regression results

 Logistic Regression

Day 4

 Trend analysis and Time Series

 Cyclical and Seasonal analysis; Box-Jenkins method

 Smoothing; Moving averages; Auto-correlation; ARIMA; Holt-Winters method

Day 5

 Connectivity models (hierarchical clustering)

 Centroid models (k-means algorithm)

Day 6

 Bayesian analysis and Naïve Bayes classifier

CSE 7213c

Optimization and Decision Analysis

This module teaches linear and non-linear optimization models, namely Genetic Algorithms, Linear Programming and Goal Programming. The applications are drawn from problems in finance and operations.

Day 1

 Genetic Algorithms: The algorithm and the process

 Representing data for a Genetic Algorithm

 Why and how do Genetic Algorithms work?

Day 2

 Linear programming: Graphical analysis

 Sensitivity and Duality analyses

 Integer, binary programming; Applications, problem formulation and solving through R

Day 3

 Goal programming

 Data envelopment analysis

CSE 7304c

Engineering Big Data with R and Hadoop Ecosystem

Companies collect and store large amounts of data during daily transactions. This data is both structured and unstructured. The volume of data being collected has grown from megabytes to terabytes in the past few years and continues to grow at an exponential pace. Very large size, lack of structure and a rapid pace of growth characterize “Big Data”.

To analyze long-term trends and patterns in the data and provide actionable intelligence to managers, this data needs to be consolidated and processed with specialized techniques; those techniques form the core of the module.

From a tools perspective, this course introduces you to Hadoop. You will learn one of the most powerful combinations for working with Big Data, viz., “R and Hadoop”.

Day 1

 Lecture Session: Map Reduce, YARN

 Lab Session: Code samples of all covered components of Map Reduce

Day 2

 Lecture Session: Map Reduce Applications - Text Mining, Page Rank, Graph Processing

 Lab Session:

o 1. Map Reduce Applications Code Review

o 2. Hands-on assignment

Day 3

 Lecture Session: Hadoop ecosystem components - Pig, Hive, HBase, Sqoop

 Lab Session: Hands-on demo of the topics covered.

Day 4

 Lecture Session:

o Hadoop Ecosystem components - Mahout, Hama, Flume, Chukwa, Avro, Whirr, Hue, Oozie, Zookeeper.
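The Map Reduce model covered on Days 1 and 2 can be illustrated with a word count, the usual introductory example. The sketch below simulates the map, shuffle and reduce phases in memory with plain Python. It does not use Hadoop or any Hadoop API, and the function names `map_phase`, `shuffle` and `reduce_phase` and the sample lines are assumptions made for this illustration.

```python
from collections import defaultdict


def map_phase(line):
    """Emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Group emitted values by key, as a framework would between phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Combine all values for one key into a single count."""
    return key, sum(values)


# Sample input, invented for illustration only.
lines = ["big data big insight", "data at scale"]

mapped = [pair for line in lines for pair in map_phase(line)]
counts = dict(reduce_phase(k, v) for k, v in shuffle(mapped).items())
print(counts)
```

On a real cluster the map and reduce functions run on different machines and the framework performs the shuffle; only the division of work into the three phases carries over from this sketch.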

CSE 7206c

Text Mining, Social Network Analysis and Natural Language Processing

This module teaches two of the most important applications of analytics in high tech industries.

Text mining: Unstructured data comprises more than 80% of the stored business information (primarily as text). This helped text mining emerge as a leading-edge technology. This module describes practical techniques for text mining, including pre-processing (tokenization, part-of-speech tagging), document clustering and classification, information retrieval, search and sentiment extraction in a business context.

Predictive modeling with social network data: Social network mining is extremely useful in targeted marketing, on-line advertising and fraud detection. The course teaches how incorporating social media analysis can help improve the performance of predictive models.

Natural Language Processing:

By the end of the course, you will be able to answer questions like “how to classify or tag a document into a category”, “how to rank some people in a network as more likely customers than others”, etc.

In terms of techniques, the course teaches:

 Bag-of-words and Text Similarity measures

 Page Rank; Neighbor analysis on predictive modeling
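As a rough illustration of the first technique, the sketch below builds bag-of-words vectors for short documents and compares them with cosine similarity, one common text similarity measure. It is a minimal Python sketch rather than course material: the function names `bag_of_words` and `cosine_similarity` and the sample sentences are assumptions made for the example.

```python
import math
import re
from collections import Counter


def bag_of_words(text):
    """Lowercase the text, split it into word tokens, and count each token."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse count vectors (Counters)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty document is treated as similar to nothing
    return dot / (norm_a * norm_b)


# Sample documents, invented for illustration only.
doc1 = bag_of_words("the customer liked the product")
doc2 = bag_of_words("the customer returned the product")
doc3 = bag_of_words("hadoop stores large files")

sim_close = cosine_similarity(doc1, doc2)  # documents share most words
sim_far = cosine_similarity(doc1, doc3)    # documents share no words
print(round(sim_close, 3), sim_far)
```

Note that the first two documents score as highly similar even though their meanings differ, because raw word counts ignore word order and give frequent words such as "the" a large weight.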

This course uses packages like R, WEKA and R-Hadoop for demonstrating real-world examples.

Day 1

 Unstructured vs. semi-structured data; Fundamentals of information retrieval

 Properties of words; Vector space models; Creating Term-Document (TxD) matrices; Similarity measures

 Low-level processes (Sentence Splitting; Tokenization; Part-of-Speech Tagging; Stemming; Chunking)

 Text classification and feature selection: How to use Naïve Bayes classifier for text classification

Day 2

 Evaluation systems on the accuracy of text mining

 Sentiment Analysis

Day 3

 Natural Language Analysis

CSE 7305c

Methods and Algorithms in Machine Learning

This module discusses the principles and ideas underlying the current practice of data mining and introduces a powerful set of useful data analytics tools (such as K-Nearest Neighbors, Neural Networks, etc.). Real-world business problems are used for practice.

Day 1

 Rule-based knowledge: Logic of rules, evaluating rules, Rule induction and association rules

Day 2

 Construction of Decision Trees through simplified examples; Choosing the "best" attribute at each non-leaf node; Entropy; Information Gain

 Generalizing Decision Trees; Information Content and Gain Ratio; Dealing with numerical variables; Other measures of randomness

 Pruning a Decision Tree; Cost as a consideration; Unwrapping Trees as rules

Day 3

 Specialized decision trees (oblique trees)

 Ensemble and Hybrid models

 AdaBoost, Random Forests and Gradient boosting machines

Day 4

 K-Nearest Neighbor method

 Wilson editing and triangulations

 K-nearest neighbors in collaborative filtering, digit recognition

Day 5

 Motivation for Neural Networks and its applications

 Perceptron and Single Layer Neural Network, and hand calculations

 Learning in a Neural Net: Back propagation and conjugate gradient techniques

 Application of Neural Net in Face and Digit Recognition

Day 6

 Linear learning machines and Kernel methods in learning

 VC (Vapnik-Chervonenkis) dimension; Shattering power of models
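The Day 2 topic of choosing the "best" attribute at a decision tree node with entropy and information gain can be sketched as follows. This is a minimal Python illustration under stated assumptions: the helper names `entropy` and `information_gain` and the four-row toy data set are invented for the example and are not taken from the course.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())


def information_gain(rows, attribute, target):
    """Reduction in target entropy obtained by splitting rows on attribute."""
    base = entropy([r[target] for r in rows])
    groups = {}
    for r in rows:
        groups.setdefault(r[attribute], []).append(r[target])
    remainder = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    return base - remainder


# Toy data, invented for illustration only.
rows = [
    {"outlook": "sunny", "windy": "no", "play": "yes"},
    {"outlook": "sunny", "windy": "yes", "play": "yes"},
    {"outlook": "rainy", "windy": "no", "play": "no"},
    {"outlook": "rainy", "windy": "yes", "play": "no"},
]

gain_outlook = information_gain(rows, "outlook", "play")
gain_windy = information_gain(rows, "windy", "play")

# The attribute with the largest gain is chosen for the split.
best = max(["outlook", "windy"],
           key=lambda a: information_gain(rows, a, "play"))
print(best, gain_outlook, gain_windy)
```

In this toy data "outlook" separates the classes perfectly while "windy" tells nothing about the outcome, so a tree built with this criterion would split on "outlook" first.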

CSV 1103

Communication, Ethical and IP challenges for Analytics Professionals

This two-day module presents students with very complex, real-world, non-technical problems encountered in practice. On day 1, students are expected to write a persuasive document, and on day 2, to make a presentation.

In addition, on these two days, two expert speakers will give seminars and students will receive specific training in techno-persuasive writing, presentations and IP and confidentiality.

Day 1

 Creating a profile

 Professional Writing

 Industry exposure: A webinar by an industry expert about how they are using analytics in the real world

Day 2

 Data security, NDA, IP

 Presentation skills

 Industry exposure: A webinar by an industry expert about how they are using analytics in the real world
