## STAT 370: Probability and Statistics for Engineers

*North Carolina State University*

## [Section 002]

### Instructor: Hua Zhou

### Harrelson Hall 210

### 10:15AM–11:30AM, Jan 9, 2012

Today

• Introduction: What’s statistics and what does this course cover?

• Course logistics

• Q&A

What is Statistics?

The New York Times (Aug 5, 2009)

• Statistics, the science of data analysis, is the applied mathematics of the 21st century.

• This era is characterized by massive data sets, automated measurement, and raw computational power. It is the statistical analysis of these data sets which is the driving force behind internet search, online merchants, computational finance, weather forecasting, bioinformatics, and dozens of other fields.

• It is the most important and portable subject you will learn in your quantitative curriculum.

• Surveys of practicing engineers consistently show that one of their foremost academic regrets was not learning enough statistics, and the reasons for this are clear.

Statistics in Engineering

• Engineers apply physical and chemical laws and mathematics to design, develop, test, and supervise various products and services.

• Engineers perform tests to learn how things behave under stress, and at what point they might fail.

• As engineers perform experiments, they collect data that can be used to explain relationships better and to reveal information about the quality of products and services they provide.

What is the importance of statistics in the field of engineering?

1. Design of Experiments (DOE) uses **statistical techniques** to test and construct models of engineering components and systems.

2. Quality control and process control use **statistics** as a tool to manage conformance to specifications of manufacturing processes and their products.

3. Time and methods engineering uses **statistics** to study repetitive operations in manufacturing in order to set standards and find optimum (in some sense) manufacturing procedures.

4. Reliability engineering uses **statistics** to measure the ability of a system to perform its intended function (for its intended time) and has tools for improving performance.

5. Probabilistic design uses **statistics** to apply probability in product and system design.

http://en.wikipedia.org/wiki/Engineering_statistics

Some Interesting Video Clips about Statistics

• Joy of Statistics

http://www.open.ac.uk/openlearn/whats-on/the-joy-stats

• TED: Ideas worth spreading: about Statistics

http://www.ted.com/talks/arthur_benjamin_s_formula_for_changing_math_education.html

Course Objectives

The main objective in this class is to equip you with basic tools for (1) making sense of real data, (2) designing experiments, and (3) preparing for the higher level classes in machine learning, stochastic processes, and computational statistics.

• Master essential statistical terminology

• Numerical and graphical summaries of data

• Plan and analyze simple factorial designs

• Basic calculations in simple linear regression

• Calculate probabilities using basic probability distributions

Graphical Summary: Where are the Cancers?

What do you observe and why?


• Many are in the Great Plains and relatively few near the coasts (older people in these counties?)

• Most shaded counties are in rural areas (worse health centers? Less healthy diets? More exposure to harmful chemicals?)

What do you expect the map of counties with the lowest cancer death rates to look like?


Counties with small population sizes are more likely to be highlighted in both maps!

• Consider a county with just 100 residents. If there was only 1 death, the rate (0.01) is extremely high.
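The small-county effect can be checked with a short calculation. The sketch below assumes, purely for illustration, that every resident of every county has the same chance of dying of the disease; the rate 0.002 and the population sizes are made-up values, not numbers read from the maps. It computes how much the observed rate varies by chance alone.

```python
import math

def rate_sd(population, true_rate):
    """Standard deviation of the observed death rate (deaths / population)
    when each resident dies independently with probability true_rate."""
    return math.sqrt(true_rate * (1 - true_rate) / population)

def prob_zero_deaths(population, true_rate):
    """Probability that a county records no deaths at all."""
    return (1 - true_rate) ** population

TRUE_RATE = 0.002  # hypothetical common rate, chosen only for illustration

for population in (100, 10_000, 1_000_000):
    print(
        f"population {population:>9,}: "
        f"sd of observed rate = {rate_sd(population, TRUE_RATE):.5f}, "
        f"P(no deaths) = {prob_zero_deaths(population, TRUE_RATE):.3f}"
    )
```

Under these assumptions a county of 100 residents most often records no deaths at all (rate 0, among the lowest), while a single death gives a rate of 0.01, five times the assumed true rate. Small counties therefore show up at both extremes even when nothing about them differs.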

Numerical Summary of Data: Average/Mean

• Question: Which department of UNC, Chapel Hill produces students that earn the most on average 10 years after they got their degrees?

• Survey a certain number of graduates from UNC.

• A lot of departments are surveyed.

• Answer:

– Geography!!!!??????

– Michael Jordan (one extremely high earner pulls the department's mean up)

• The median is robust to outliers and is a better summary in this case
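A quick way to see the difference between the two summaries is to compute both on a small sample with one extreme value. The salaries below are invented for illustration and are not survey data.

```python
import statistics

# Hypothetical salaries (in dollars) for a small department; the last value
# stands in for a single extremely high earner. These numbers are made up.
salaries = [42_000, 45_000, 48_000, 50_000, 52_000, 55_000, 30_000_000]

mean_salary = statistics.mean(salaries)
median_salary = statistics.median(salaries)

print(f"mean:   {mean_salary:,.0f}")
print(f"median: {median_salary:,.0f}")
```

The mean lands in the millions although six of the seven graduates earn about 50,000, while the median stays at the middle value.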

Simple Linear Regression: Do taller people earn more?

• Is there a trend in earnings with respect to height?

• How do we find the best-fit line for this data?

• Is this line model good?

• How do we interpret the line model?

• Is this model adequate?
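One common answer to the "best fit" question is the least-squares line, which minimizes the sum of squared vertical distances from the points to the line. The sketch below uses the usual closed-form slope and intercept; the heights and earnings are hypothetical numbers chosen only to have something to fit, and `fit_line` is an illustrative helper name.

```python
def fit_line(x, y):
    """Return (intercept, slope) of the least-squares line y = a + b*x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Sum of cross-deviations and sum of squared x-deviations.
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    slope = s_xy / s_xx
    intercept = y_bar - slope * x_bar
    return intercept, slope

# Hypothetical heights (inches) and annual earnings (thousands of dollars).
heights = [62, 64, 66, 68, 70, 72, 74]
earnings = [38, 41, 40, 45, 47, 46, 52]

intercept, slope = fit_line(heights, earnings)
print(f"fitted line: earnings = {intercept:.2f} + {slope:.2f} * height")
```

The slope is read as the change in predicted earnings per additional inch of height. Whether the line is a good or adequate model is a separate question that the fit alone does not answer.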

Factorial Data Analysis: Diet Coke and Mentos

• This is from a previous course project for ST370

• It is interesting to know how the volume loss depends on the number of Mentos applied and the initial volume

• Conclusion: Percentage of volume loss depends on initial volume but not on the number of Mentos applied

Number of Mentos and Diet Coke on % soda volume loss

Values are recorded as proportions (0.565 = 56.5%).

| Initial volume | % volume lost, 4 Mentos | % volume lost, 8 Mentos |
|---|---|---|
| 591 mL | 0.565 | 0.57 |
| 591 mL | 0.526 | 0.577 |
| 591 mL | 0.54 | 0.558 |
| 1000 mL | 0.561 | 0.587 |
| 1000 mL | 0.532 | 0.539 |
| 1000 mL | 0.519 | 0.559 |
| 2000 mL | 0.475 | 0.537 |
| 2000 mL | 0.565 | 0.615 |
| 2000 mL | 0.537 | 0.5 |
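A first step in analyzing a factorial design like this is to average the replicates in each cell and then average over each factor separately. The sketch below does only that descriptive step, with the values transcribed from the slide's table. `marginal_means` is an illustrative helper name, and the output is a summary, not a significance test.

```python
from statistics import mean

# Proportion of volume lost, keyed by (initial volume in mL, number of Mentos).
# Values are transcribed from the slide's table; three bottles per combination.
loss = {
    (591, 4): [0.565, 0.526, 0.540],
    (591, 8): [0.570, 0.577, 0.558],
    (1000, 4): [0.561, 0.532, 0.519],
    (1000, 8): [0.587, 0.539, 0.559],
    (2000, 4): [0.475, 0.565, 0.537],
    (2000, 8): [0.537, 0.615, 0.500],
}

def marginal_means(data, factor_index):
    """Average all observations that share one level of the chosen factor
    (factor_index 0 = initial volume, 1 = number of Mentos)."""
    groups = {}
    for cell, values in data.items():
        groups.setdefault(cell[factor_index], []).extend(values)
    return {level: mean(values) for level, values in groups.items()}

# Mean loss for each of the six volume/Mentos combinations.
cell_means = {cell: mean(values) for cell, values in loss.items()}
volume_means = marginal_means(loss, 0)
mentos_means = marginal_means(loss, 1)

for (volume, mentos), value in sorted(cell_means.items()):
    print(f"{volume:>4} mL, {mentos} Mentos: {value:.3f}")
print("by volume:", {v: round(x, 3) for v, x in volume_means.items()})
print("by Mentos:", {m: round(x, 3) for m, x in mentos_means.items()})
```

Deciding whether a difference between these means is larger than bottle-to-bottle variation requires the formal factorial analysis covered later in the course.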

Calculating Probability: Roulette

Players bet $1 that the ball will land in a red (or black) slot and win $1 if it does. Let \(X_i\) be the winnings on the \(i\)-th day.

(a) What are the distribution, mean, and variance of \(X_i\)?

(b) Suppose you play once per day for 365 days. What does the CLT say about your average winnings?

(c) What is the probability that your average winnings over 365 days are positive?

Solution to the Roulette Problem

• (a) \(E(X) = -0.0526\), \(\mathrm{Var}(X) = 0.9972\)

• (b) The average payoff after 365 days is approximately normal with mean \(-0.0526\) and standard deviation \(0.0523\)

• (c) \(P(\bar{X} > 0) = 0.1573\)
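The three answers can be reproduced numerically. The sketch below assumes an American wheel with 38 slots, 18 of them red. The slide does not state this, but it is the setup that gives a mean of -0.0526. The normal CDF is built from `math.erf`.

```python
import math

def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

# Assumption: American wheel, 38 slots of which 18 are red.
p_win = 18 / 38
n_days = 365

# (a) X is +1 with probability p_win and -1 otherwise.
mean_x = 1 * p_win + (-1) * (1 - p_win)
var_x = 1 - mean_x ** 2  # E(X^2) = 1 because X is always +1 or -1

# (b) By the CLT the 365-day average is roughly normal with this spread.
sd_avg = math.sqrt(var_x / n_days)

# (c) Probability that the average is above zero under that normal model.
p_positive = 1 - normal_cdf((0 - mean_x) / sd_avg)

print(f"E(X) = {mean_x:.4f}, Var(X) = {var_x:.4f}")
print(f"sd of average = {sd_avg:.4f}")
print(f"P(average > 0) = {p_positive:.4f}")
```

Working with unrounded intermediate values gives a probability of about 0.157, which agrees with the 0.1573 above up to rounding of the mean and standard deviation.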

Some more examples

• […] claims (at least it used to claim) that it contains 1000 chips. Is this true? **(Inference)**

• What is the chance that a poker gambler gets a Royal Flush (*AKQJT of the same suit*) at Atlantic City? **(Probability)**

• Among a group of randomly chosen people, how likely is it for two of them to have the same birthday? **(Probability)**

• What is the relationship between Income and Years of Education? **(Linear Regression)**

• Design your own experiment, collect data, analyze data and draw conclusions. **(Design of Experiments)**
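The two probability questions above can be answered by direct counting. The sketch below assumes a 5-card hand dealt from a standard 52-card deck for the Royal Flush, and 365 equally likely, independent birthdays for the birthday question. `prob_shared_birthday` is an illustrative helper name.

```python
import math

def prob_shared_birthday(n_people, days=365):
    """P(at least two of n_people share a birthday), assuming birthdays are
    independent and uniform over `days` days (leap years ignored)."""
    p_all_distinct = 1.0
    for k in range(n_people):
        # The (k+1)-th person must avoid the k birthdays already taken.
        p_all_distinct *= (days - k) / days
    return 1 - p_all_distinct

# Four royal flushes (one per suit) among all 5-card hands from 52 cards.
p_royal_flush = 4 / math.comb(52, 5)

print(f"P(royal flush) = {p_royal_flush:.3e}")
for n in (10, 23, 50):
    print(f"P(shared birthday among {n}) = {prob_shared_birthday(n):.4f}")
```

With these assumptions, a group of 23 people is already more likely than not to contain a shared birthday.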

Course Requirement

Course Materials

• PowerPoint presentation + blackboard illustration

– Keep focused in class

• Optional Text: Selected material from Chapters 1-9, 11, 13, and 14

• Class Web Page

– http://www4.stat.ncsu.edu/~hzhou3/courses/st370-2012-Spring/

• WebAssign@NCSU (log in using unity ID)

– http://webassign.ncsu.edu

– Make sure you have the correct email address on file

• Software [*StatCrunch@NCSU*]

– http://statcrunch.stat.ncsu.edu

Optional Textbook

• ISBN: 9780470910610

• NCSU Bookstore: $122.50

• Amazon: $143.33

• We will cover selected topics in Chapters 1-9, 11, 13, and 14

Course grade (500pts)

• Homework 100pts

– Best 10 out of 12 assignments

• Midterm I 100pts

– Wed, Feb 29 (in class)

• Midterm II 100pts

– Mon, Apr 2 (in class)

• Quizzes 100pts

– Best 5 out of >6 quizzes

• Final Exam 100pts

– Monday, May 7 (8:00-11:00AM)

Final Letter Grade

• 98-100% A+, 92-97% A, 90-91% A-

• 88-89% B+, 82-87% B, 80-81% B-

• 78-79% C+, 72-77% C, 70-71% C-

• 68-69% D+, 62-67% D, 60-61% D-

• 0-59% F
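The grade bands translate directly into a lookup. The sketch below is one reading of the scale: it assumes a percentage earns the highest letter whose lower bound it reaches, so 97.9% maps to A. The slide lists whole-number ranges only and does not say how fractional percentages are handled, so that rule is an assumption.

```python
# (lower bound in percent, letter), listed from highest to lowest.
CUTOFFS = [
    (98, "A+"), (92, "A"), (90, "A-"),
    (88, "B+"), (82, "B"), (80, "B-"),
    (78, "C+"), (72, "C"), (70, "C-"),
    (68, "D+"), (62, "D"), (60, "D-"),
]

def letter_grade(percent):
    """Map a course percentage (0-100) to a letter using the bands above.
    Assumption: no rounding; the first lower bound reached decides."""
    for lower, letter in CUTOFFS:
        if percent >= lower:
            return letter
    return "F"

# Example: 412 of the 500 course points.
print(letter_grade(100 * 412 / 500))
```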

Homework and Quizzes

• Twelve homework assignments

– Assigned and due online @ WebAssign

• No late homework will be accepted

• Only your 10 best homework scores count; the lowest 2 homework scores will be dropped

• The semester homework score will count as 20% of the course grade

• At least five in-class quizzes will be given. Quiz scores will count as 20% of the course grade.

Exam Policy

• Midterms: Feb 29, Apr 2 (in class, closed book, closed notes)

• The final exam (8-11am, May 7) is cumulative

• A review session will be held prior to each exam

• All exams are required, with **no make-ups permitted.**

[If you have a conflict with another course, please notify me ASAP, and no later than 2 weeks before the test.]

Strategy to Succeed in this Class

• Focus in class: preview and review lecture notes, stay active and involved, take notes, answer questions

• Class attendance is expected and will influence your grade through quizzes (20% of the grade)

• Ask questions during class (especially if you cannot see, read, hear, or understand anything)

[Not the place to be shy!]

• Start homework early

• Make effective use of the office hours

• Do not count too heavily on the final to pull up your grade. The two midterms have the same weight as the final and may be easier because they are not cumulative

Classroom Policies

• Please come on time and do not start packing up before the class is over

• Please turn off cell phones

• Please refrain from conversation with your neighbor during the class, except when you are instructed to do so during in-class activities

Academic Integrity

• You may work together on homework assignments, but simply giving or receiving answers to or from another student is cheating

• Copying from another student’s paper is not allowed

• Using unauthorized materials during an exam is not allowed

• Falsifying data is not allowed