
Applied Data Analysis. Fall 2015


Academic year: 2021


Applied Data Analysis


Course information: Labs

Anna Walsdorff Mary Clare Roche


Lecture outline

1. Practice questions


Question 1

For women age 25-45 in the U.S. in 2005, with full-time jobs, the relationship between education (years of schooling completed) and personal income (dollars) can be summarized as follows:

            Education   Income
Mean        14.0        32,000
St. Dev.    2.4         26,000

The correlation is 0.34. Estimate the average income of those women who have finished high school, but have not gone on to college (12 years of education).


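Question 1 answer

Convert 12 years of education to standard units:

\[ \frac{12 - 14}{2.4} = -0.8333 \]

Multiply by the correlation to get the predicted income in standard units:

\[ 0.34 \cdot (-0.8333) = -0.2833 \approx -0.28 \]

Convert back to dollars:

\[ 32{,}000 + (-0.2833)(26{,}000) \approx 24{,}600 \]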
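The regression method used in this answer can be checked numerically. The sketch below is only an illustration: the function name `regression_estimate` is hypothetical, and the last step (converting the standard-unit prediction back to dollars) is the usual way to finish the calculation.

```python
def regression_estimate(x, mean_x, sd_x, mean_y, sd_y, r):
    """Predict the average y for a given x with the regression method."""
    z_x = (x - mean_x) / sd_x      # x in standard units
    z_y = r * z_x                  # predicted y in standard units
    return mean_y + z_y * sd_y     # back to the original units

# Numbers from Question 1: education (x) and income (y), r = 0.34.
estimate = regression_estimate(12, 14.0, 2.4, 32_000, 26_000, 0.34)
print(round(estimate))  # estimated average income in dollars
```

The estimate is below the overall mean of 32,000 but much closer to it than 0.83 SDs, because the correlation is weak.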


Question 2

For the first-year students at a certain university, the correlation between SAT scores and first-year GPA was 0.60. The scatter diagram is football-shaped. Predict the percentile rank for the first-year GPA for a student whose percentile rank on the SAT was

1. 90%

2. 30%

3. 50%


Question 2 answer

1. On the z-table, 90% translates to 1.28. 1.28 · 0.6 = 0.768. Going back to the z-table, 0.768 gives us about 78%.

2. On the z-table, 30% translates to −0.52. −0.52 · 0.6 = −0.312. Going back to the z-table, −0.312 gives us about 38%.

3. 50%
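The z-table lookups above can be reproduced with the standard library's `statistics.NormalDist`. This is a minimal sketch that assumes, as the question does, a football-shaped scatter diagram with both variables roughly normal; the function name `predicted_percentile` is hypothetical.

```python
from statistics import NormalDist

def predicted_percentile(sat_percentile, r):
    """Predicted GPA percentile rank for a given SAT percentile rank."""
    std_normal = NormalDist()                          # mean 0, SD 1
    z_sat = std_normal.inv_cdf(sat_percentile / 100)   # percentile -> z
    z_gpa = r * z_sat                                  # regression in standard units
    return 100 * std_normal.cdf(z_gpa)                 # z -> percentile

for p in (90, 30, 50):
    print(p, round(predicted_percentile(p, 0.60)))
```

In each case the predicted rank is pulled toward 50%, and a student at the median is predicted to stay at the median.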


Question 3

As part of their training, air force pilots make two practice landings with instructors and are rated on performance. The instructors discuss the ratings with the pilots after each landing. Statistical analysis shows that pilots who make poor landings the first time tend to do better the second time. Conversely, pilots who make good landings the first time tend to do worse the second time. The conclusion: criticism helps the pilots while praise makes them perform worse. As a result, instructors were ordered to criticize all landings, good or bad. Was this policy warranted by the facts?


Question 3 answer

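No, the air force is committing the regression fallacy. The results are probably due to the regression effect: in any test-retest situation, those who score unusually low the first time tend to score somewhat higher the second time, and those who score unusually high tend to score somewhat lower, whatever the instructors say in between.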
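A small simulation can make the regression effect concrete. The model below is an assumption for illustration only: each landing score is a stable skill level plus independent luck, and no praise or criticism enters the simulation at all.

```python
import random
from statistics import mean

random.seed(0)  # fixed seed so the run is repeatable

# Assumed model: score = skill + luck, with fresh luck on each landing.
pilots = []
for _ in range(10_000):
    skill = random.gauss(0, 1)
    first = skill + random.gauss(0, 1)
    second = skill + random.gauss(0, 1)
    pilots.append((first, second))

ranked = sorted(pilots)      # tuples sort by first-landing score
worst = ranked[:1_000]       # bottom 10% on the first landing
best = ranked[-1_000:]       # top 10% on the first landing

def avg_change(group):
    """Average (second - first) score within a group of pilots."""
    return mean(second - first for first, second in group)

print("worst first landings, average change:", avg_change(worst))
print("best first landings, average change:", avg_change(best))
```

Under this model the worst group improves on average and the best group declines on average, even though feedback plays no role.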


Question 4

An admissions officer is trying to choose between two methods of predicting first-year scores. One method has an r.m.s. error of 12. The other has an r.m.s. error of 7. Other things being equal, which should she choose? Why?


Question 4 answer

The one with the r.m.s. error of 7. Its predictions are typically off by about 7 points rather than about 12, so it is more accurate.


Question 5

At a certain college, the first-year GPAs average about 3.0, with a SD of about 0.5; they are correlated at about 0.6 with high-school GPA. Person A predicts first-year GPAs just using the average. Person B predicts first-year GPAs by regression, using the high-school GPAs. Which person makes the smaller r.m.s. error? Smaller by what factor?


Question 5 answer

Person B, who uses more information.

The r.m.s. error will be smaller by a factor of

\[ \sqrt{1 - r^2} = \sqrt{1 - 0.6^2} = 0.8 \]
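The comparison rests on two standard facts: predicting with the average gives an r.m.s. error equal to the SD of y, and predicting with the regression line gives an r.m.s. error of sqrt(1 − r²) times the SD of y. A short sketch with the numbers from Question 5 (variable names are illustrative):

```python
import math

sd_gpa = 0.5   # SD of first-year GPA
r = 0.6        # correlation with high-school GPA

rms_average = sd_gpa                              # Person A: always predicts 3.0
rms_regression = math.sqrt(1 - r**2) * sd_gpa     # Person B: regression line
factor = rms_regression / rms_average

print(rms_average, rms_regression, factor)
```

Person B's error is 0.8 times Person A's, or roughly 0.4 GPA points instead of 0.5.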


Question 6

Pearson and Lee obtained the following results for about 1,000 families:

            Husband height   Wife height
Mean        68.0             63.0
St. Dev.    2.7              2.5

r = 0.25

1. What percentage of the women were over 5’8”?

2. Of the women who were married to men of height 6 feet, what percentage were over 5’8”?


Question 6 answer

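1. In standard units, 5'8" (68 inches) is

\[ \frac{68 - 63}{2.5} = 2 \Rightarrow 2.28\% \]

2. A husband of 6 feet (72 inches) is, in standard units,

\[ \frac{72 - 68}{2.7} = 1.48 \]

The predicted wife height in standard units is

\[ 1.48 \cdot 0.25 = 0.37 \]

which in inches is

\[ 0.37 \cdot 2.5 + 63 = 63.9 \]

Then

\[ \frac{68 - 63.9}{2.5} = 1.64 \Rightarrow 5\% \]

Strictly, the spread of wife heights among these families is the r.m.s. error, \sqrt{1 - 0.25^2} \cdot 2.5 \approx 2.42, rather than 2.5. That gives z \approx 1.68 and about 4.6%, which rounds to the same 5%.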
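Both parts of Question 6 can be checked with `statistics.NormalDist`, assuming heights follow the normal curve. The sketch computes part 2 two ways: with the full SD of 2.5 in the last step, as the answer does, and with the r.m.s. error sqrt(1 − r²) · 2.5, which is the spread of wife heights among husbands of one fixed height. Variable names are illustrative.

```python
import math
from statistics import NormalDist

mean_h, sd_h = 68.0, 2.7   # husbands
mean_w, sd_w = 63.0, 2.5   # wives
r = 0.25
cutoff = 68.0              # 5'8" in inches

# Part 1: share of all wives taller than the cutoff.
share_all = 1 - NormalDist(mean_w, sd_w).cdf(cutoff)

# Part 2: wives of husbands who are 72 inches tall.
z_h = (72.0 - mean_h) / sd_h
pred_w = mean_w + r * z_h * sd_w              # regression estimate, inches
rms = math.sqrt(1 - r**2) * sd_w              # spread within the strip
share_with_sd = 1 - NormalDist(pred_w, sd_w).cdf(cutoff)   # as in the answer
share_with_rms = 1 - NormalDist(pred_w, rms).cdf(cutoff)   # using r.m.s. error

print(share_all, share_with_sd, share_with_rms)
```

With a correlation this weak the two versions of part 2 differ by well under one percentage point.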


Inference

Up to this point, I have not been clear about the difference between the true regression line and the estimated regression line. This is just like the difference between \mu_x and \bar{x}.

The true regression line is

\[ y_i = \beta_0 + \beta_1 x_i + \varepsilon_i \]

The estimated regression line is

\[ \hat{y}_i = a + b x_i = \hat{\beta}_0 + \hat{\beta}_1 x_i \]
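The distinction can be illustrated by simulating data from a known line and then estimating the line from the sample. Everything about the setup below is hypothetical: the true intercept and slope, the range of x, and the normal errors are chosen only for illustration.

```python
import random
from statistics import mean

random.seed(1)  # fixed seed so the run is repeatable

BETA0, BETA1 = 2.0, 0.5   # hypothetical true intercept and slope

# Draw a sample from the true line plus random error.
x = [random.uniform(0, 10) for _ in range(200)]
y = [BETA0 + BETA1 * xi + random.gauss(0, 1) for xi in x]

# Least-squares estimates a (intercept) and b (slope) from the sample.
x_bar, y_bar = mean(x), mean(y)
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
sxx = sum((xi - x_bar) ** 2 for xi in x)
b = sxy / sxx
a = y_bar - b * x_bar

print("true:     ", BETA0, BETA1)
print("estimated:", a, b)
```

The estimates land near the true values but not exactly on them, and a different sample would give different estimates.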


Return of the hypothesis test

b is an estimator so it must have a sampling distribution and a standard error. If it has those, we can perform hypothesis tests.

\[ H_0 : \beta_1 = 0 \qquad H_1 : \beta_1 \neq 0 \]


Have to take my word for it

The standard error for the regression coefficient is

\[ s_b = \frac{b \sqrt{1 - r^2}}{r \sqrt{n - 2}} = \frac{s_y \sqrt{1 - r^2}}{s_x \sqrt{n - 2}} \]


The test statistic

\[ t = \frac{b}{s_b} \]

is distributed as a t with n − 2 degrees of freedom.


Is the effect of income on contacts real?

            Contacts   Income
Mean        3.60       4.230
St. Dev.    2.27       3.328

\[ b = 0.78 \cdot \frac{2.27}{3.328} = 0.532 \]

\[ a = 3.60 - 0.532(4.32) = 1.3 \]


The test

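\[ s_b = \frac{b \sqrt{1 - r^2}}{r \sqrt{n - 2}} = \frac{0.532 \sqrt{1 - 0.78^2}}{0.78 \sqrt{10 - 2}} = 0.151 \]

\[ t = \frac{0.532}{0.151} \approx 3.52 \]

With 8 degrees of freedom, the two-sided p-value is below 0.01, and we reject the null hypothesis.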
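The arithmetic of the test can be reproduced directly from r, n, and the two SDs. The sketch below is only a check on the calculation: instead of computing a p-value it compares t with 2.306, the two-sided 5% critical value for 8 degrees of freedom from a standard t-table. Variable names are illustrative.

```python
import math

r, n = 0.78, 10
sd_contacts, sd_income = 2.27, 3.328

b = r * sd_contacts / sd_income                          # estimated slope
s_b = b * math.sqrt(1 - r**2) / (r * math.sqrt(n - 2))   # standard error of b
t = b / s_b                                              # test statistic

T_CRIT_DF8 = 2.306   # two-sided 5% critical value, 8 df (from a t-table)

print(round(b, 3), round(s_b, 3), t)
print("reject H0 at the 5% level:", abs(t) > T_CRIT_DF8)
```

The ratio comes out near 3.5, well beyond the critical value, so the slope is significantly different from zero at the 5% level.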


What did we learn?

• If you had majored in international relations, you would not need this class.
