consider the number of math classes taken by math 150 students. how can we represent the results in one number?

Academic year: 2021


ch 3: numerically summarizing data - center, spread, shape

3.1 measure of central tendency

or, give me one number that represents all the data

consider the number of math classes taken by math 150 students. how can we represent the results in one number?

average: add up all the numbers and divide by how many numbers there are

ex) suppose you score 71, 75, 84 on three tests. what is your test average? (the average is also called the mean)

average = (71+75+84)/3 = 230/3 ≈ 76.7

ex) for number of math classes, mean =

median: the middle number

ex) suppose you score on three tests 71,75,84. what is your median test score?

median is 75

interpretation: half the time the score is above 75, half the time the score is below 75

note: you must put data in ascending order to determine the median

ex) what is the median for: 75, 84, 71

ex) number of math classes: 0,0,0,0,0,0,1,1,1,1,1,1,1,1,1,1,1,1,1,2,2,2,2,2... median is

ex) heights of students (in inches):

59,61,62,64,64,64,65,66,66,66,67,68,68,69,70,70,71,71,73

what is the median height?

...find middle number: there are 19 numbers

(19+1)/2 = 10 ...so it's the number in the 10th position

...the median is 66

what do you do if there are two middle numbers? add together, divide by two (i.e. take the average)

..this will happen when there is an even amount of data

note that, using the "+1" method with 20 numbers, you would get (20+1)/2 = 10.5

...this means the median is between the 10th and 11th numbers, so take their average
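The "+1" position rule above can be sketched in a few lines of Python (Python is not part of the original notes, and the function name `median_by_position` is made up for illustration). It sorts the data, finds position (n+1)/2, and averages the two neighbors when the position ends in .5:

```python
def median_by_position(values):
    """Median via the "+1" rule: the middle sits at position (n + 1) / 2."""
    data = sorted(values)               # data must be in ascending order
    n = len(data)
    if n == 0:
        raise ValueError("no data")
    position = (n + 1) / 2              # 1-based position of the middle
    if position.is_integer():           # odd n: one middle number
        return data[int(position) - 1]
    lower = data[int(position) - 1]     # even n: two middle numbers
    upper = data[int(position)]
    return (lower + upper) / 2          # take their average


heights = [59, 61, 62, 64, 64, 64, 65, 66, 66, 66,
           67, 68, 68, 69, 70, 70, 71, 71, 73]
print(median_by_position(heights))        # 19 values, 10th position -> 66
print(median_by_position([75, 84, 71]))   # sorted: 71, 75, 84 -> 75
```
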


mode: most common number

ex) number of math classes: 1

ex) heights: two modes, 64 and 66

ex) test scores: no mode (all the same frequency)

Question: which of these should we use, and why?

ex) number of credits taken at BMCC among math150 students: 0,0,9,12,21,22,27,32,35,38,44,50,52,56

mean = 398/14 ≈ 28.4

median = (27+32)/2 = 29.5

mode = 0
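As a quick check of the three measures on the credits data, here is a minimal sketch using Python's standard `statistics` module (an addition for illustration, not something the notes use):

```python
from statistics import mean, median, multimode

credits = [0, 0, 9, 12, 21, 22, 27, 32, 35, 38, 44, 50, 52, 56]

print(round(mean(credits), 1))   # 398 / 14, rounded -> 28.4
print(median(credits))           # average of the 7th and 8th values -> 29.5
print(multimode(credits))        # most common value(s) -> [0]
```

Here the mode (0) is a poor summary of the typical student, while the mean and median land close to each other.
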

ex) there can be a problem with the mean

the average salary in this class is around $15,000

if Bill Gates (and his $1,000,000,000 salary) walks into the room,

the average salary is now around $35,000,000.

does this make us all millionaires? ...no

the median salary is still around $15,000, because at most you go to the next number on the list

"the Bill Gates effect"

Bill Gates' salary is an outlier: it is a value far away from most of the data

the average is not robust with respect to an outlier

the median is robust with respect to an outlier

robust: not affected by [also known as resistant]
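The "Bill Gates effect" can be reproduced with a small sketch. The notes do not give the class size or the individual salaries, so the 28 people at exactly $15,000 below are a made-up stand-in, chosen only because it lands near the figures quoted above:

```python
from statistics import mean, median

# ASSUMPTION: hypothetical class of 28 people, each earning $15,000.
# The real class size and salaries are not given in the notes.
class_salaries = [15_000] * 28
with_gates = class_salaries + [1_000_000_000]   # one outlier joins the room

print(mean(class_salaries), median(class_salaries))
print(round(mean(with_gates)), median(with_gates))
```

The mean jumps to roughly $34.5 million, while the median stays at $15,000: one extreme value moves the mean but barely touches the median.
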


3.2 Measures of Dispersion

how spread out is the data

because mean & median do not tell the whole story

ex) group of 5 men, heights

group 1: 5'8, 5'10, 5'11, 6', 5'9 ... in inches: 68,69,70,71,72

group 2: 4'6, 7'4, 4'2, 6'8, 6'6 ... in inches: 50,54,78,80,88

find mean:

group 1: (68+69+70+71+72)/5 = 350/5 = 70" (or 5'10)

group 2: (54+88+50+80+78)/5 = 350/5 = 70" (or 5'10)

- range

(highest) - (lowest)

ex) group #1: 72" - 68" = 4"

group #2: 88" - 50" = 38"

note: affected by an outlier

ex) our salary range is 30000-0 = 30000


standard deviation

ex) group 1 (inches) 68,69,70,71,72 mean = 70

ex) var = 4 ... st.dev. = 2

ex) st.dev. = 9 ... var = 81

standard deviation = √(variance)

you do:

ex) group #2: 54, 88, 50, 80, 78 ... mean = 70 find the standard deviation
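To compare the spread of the two groups, here is a minimal sketch with Python's `statistics` module. At this point the notes do not say whether to divide by the number of data values or by one less, so the sketch prints both versions (`pstdev` divides by n, `stdev` divides by n - 1):

```python
from statistics import pstdev, stdev

group1 = [68, 69, 70, 71, 72]
group2 = [54, 88, 50, 80, 78]

for name, heights in (("group 1", group1), ("group 2", group2)):
    spread = max(heights) - min(heights)       # range = highest - lowest
    print(name,
          "range:", spread,
          "st.dev. (divide by n):", round(pstdev(heights), 2),
          "st.dev. (divide by n-1):", round(stdev(heights), 2))
```

Both groups have mean 70, but group 2's standard deviation is roughly ten times larger, which is exactly the difference the mean alone hides.
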


            sample                    population
mean        x̄ "x-bar"                 µ "mu"
st.dev.     s                         σ "sigma"
variance    s²                        σ²
size        n                         N
            depends on your sample    fixed
            a "statistic"             a "parameter"

also: "data value" = x

the sample mean and the population mean are calculated in exactly the same way.

the difference is the kind of information each one gives you

note for standard dev:

for a population, divide by the number of data values (N)

for a sample, divide by the number of data values minus 1 (n - 1)

ex) find the standard deviation of the sample 7,10,16 (and the variance)
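The sample 7, 10, 16 can be worked step by step. This is a sketch of the usual procedure (squared distances from the mean, divided by n - 1 because it is a sample); the variable names are chosen for illustration:

```python
from math import sqrt

sample = [7, 10, 16]
n = len(sample)

x_bar = sum(sample) / n                              # 33 / 3 = 11
squared_devs = [(x - x_bar) ** 2 for x in sample]    # 16, 1, 25
variance = sum(squared_devs) / (n - 1)               # 42 / 2 = 21 (sample: n - 1)
st_dev = sqrt(variance)                              # square root of 21

print(x_bar, variance, round(st_dev, 2))
```
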


3.3 calculating that stuff from a table [extra credit material]

(measures of central tendency and dispersion)

or, what to do if we have only the table of data and not the raw data

ex)

what's the mean?

note: the table is an approximation, so the result will be an approximation

note: divide by 12, not 5, because 12 is the total frequency (e.g. 25 appears 7 times)

this is similar to a weighted mean

ex) get three scores, 80, 95, 70. what's the mean?...

but the first score is your hw grade (that counts 20%)

the second score is your midterm grade (that counts 30%)

the third score is your final exam grade (that counts 50%)

Formula for a weighted mean:

mean (x̄ or µ) = Σ x · rel.freq(x)
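Applied to the three scores above, the weighted-mean formula multiplies each score by its weight and adds the results. A minimal sketch, with the weights written as decimals:

```python
scores = [80, 95, 70]            # hw, midterm, final exam
weights = [0.20, 0.30, 0.50]     # how much each score counts (sums to 1)

plain_mean = sum(scores) / len(scores)                        # ignores the weights
weighted_mean = sum(x * w for x, w in zip(scores, weights))   # 16 + 28.5 + 35

print(round(plain_mean, 2), round(weighted_mean, 2))
```

The weighted mean (79.5) is lower than the plain mean (about 81.7) because the lowest score, the final exam, counts the most.
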


what's the standard deviation? [extra credit material]


measures of position

- rank (location)

ex) New York marathon, 12,635 people run, you finished 586th

your rank is 586 (out of 12635)

- percentile

you are above ? % of the data

percentile --> data value

ex) 3,7,9,12,15,15,16,18,19,21,24,26,28,29 (n=14)

find the 37th percentile:

rank = (n+1)(P/100), then find the data value

ex) find the 58th percentile

you do:

ex) find the 82nd percentile
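The percentile-to-data-value direction can be sketched as follows. The rank formula is the one given above. The notes do not say what to do when the rank is not a whole number, so averaging the two neighboring values (as was done for the median) is an assumption here, and `percentile_value` is a made-up name:

```python
import math

def percentile_value(sorted_data, p):
    """Return (rank, value) for the p-th percentile, rank = (n + 1)(P / 100)."""
    n = len(sorted_data)
    rank = (n + 1) * p / 100
    # ASSUMPTION: for a fractional rank, average the two neighboring values,
    # mirroring the median rule; ranks are clamped to stay inside 1..n.
    lo = min(max(math.floor(rank), 1), n)
    hi = min(max(math.ceil(rank), 1), n)
    value = (sorted_data[lo - 1] + sorted_data[hi - 1]) / 2
    return rank, value


data = [3, 7, 9, 12, 15, 15, 16, 18, 19, 21, 24, 26, 28, 29]   # n = 14
for p in (37, 58, 82):
    print(p, percentile_value(data, p))
```

For the 37th percentile the rank is 5.55, between the 5th and 6th values; both are 15, so the answer is 15 under any reasonable rounding rule.
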

data value --> percentile

ex) at what percentile is x=24? [recall: "x" means data value]

x=24 is above 10 data values (out of 14)

percentile: 10/14 = .71 or 71st percentile (above 71% of the data)

notation: the 71st percentile is 24

P71 = 24
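Going from a data value to its percentile is a count followed by a division. A minimal sketch of the calculation for x = 24 (the function name `percentile_of` is made up for illustration):

```python
def percentile_of(sorted_data, x):
    """Percent of the data values that lie strictly below x."""
    below = sum(1 for v in sorted_data if v < x)   # how many values x is above
    return 100 * below / len(sorted_data)


data = [3, 7, 9, 12, 15, 15, 16, 18, 19, 21, 24, 26, 28, 29]   # n = 14
print(round(percentile_of(data, 24)))   # 10 of 14 values are below 24 -> 71
```
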

note: for both problems, the middle step is to find the rank (position)

note: the "+1" formula has some glitches for small data sets. this comes from the fact that one data value represents a large chunk of your data set (e.g. if you have 20 numbers, each one represents 5%)


- quartile

break the data into four quartiles. they are marked off by: quarter point, half-way point, three-quarter point

- 5-number summary

min--Q1--Q2--Q3--max

Q1: data value after one quarter of the data. that's the same as P25 (the data value at the 25th percentile mark). it separates the first quartile and the second quartile

Q2 is in the 50th percentile position (then find the data value)

Q3 is in the 75th percentile position (then find the data value)

ex) 14,15,16,17,18,19,20,21,22 (n=9)

using the formula:

Q1 appears in which position? Q1 =

Q2 appears in which position? Q2 =

Q3 appears in which position? Q3 =

follow-up: in which quartile is x=19 ?
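The quartile positions for this example can be computed with the same "+1" formula. As with the median, averaging the two neighbors when a position ends in .5 is an assumption of this sketch, and the helper name `value_at_position` is made up:

```python
import math

def value_at_position(sorted_data, position):
    """Data value at a 1-based position; a .5 position averages its neighbors."""
    lo = math.floor(position)
    hi = math.ceil(position)
    return (sorted_data[lo - 1] + sorted_data[hi - 1]) / 2


data = [14, 15, 16, 17, 18, 19, 20, 21, 22]   # n = 9
n = len(data)

quartiles = {}
for name, fraction in (("Q1", 0.25), ("Q2", 0.50), ("Q3", 0.75)):
    position = (n + 1) * fraction             # 2.5, 5.0, 7.5
    quartiles[name] = value_at_position(data, position)

five_number_summary = (min(data), quartiles["Q1"], quartiles["Q2"],
                       quartiles["Q3"], max(data))
print(five_number_summary)
```

Under that assumption Q2 = 18 and Q3 = 20.5, so x = 19 falls between them, in the third quartile.
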

why do we need the "+1"? well, if we didn't have it then for Q2 we would calculate

(9)(.5) = 4.5

but we know that's not right, it's too low (the middle of 9 numbers is the 5th one)...the "+1" fixes that problem: (9+1)(.5) = 5

Boxplot

- a visual representation of the 5 number summary

- helps you see if the distribution is symmetric or skewed

this distribution shape is called "symmetric"

here are some other shapes (as seen with boxplots):


- z-score

"the number of standard deviations from the mean"

ex) there is an exam. the mean score is 77, you got an 85. is that good? how good?

it depends.

suppose the standard deviation is 4. how many standard dev's above the mean is your score?

you are 8 points above the mean...that is 2 standard deviations (since st.dev. is 4)

Jerry got an 88. how many standard deviations above the mean is his score?

what is each number called?

ex) find the z-score for 47 if µ=38, σ=5

what does that mean, in words?

... 1.8 standard deviations above the mean

ex) find the z-score for 68 if µ=78, σ=4

note that a positive z-score means your data value is above the mean and a negative z-score means your data value is below the mean

ex) which exam score is relatively better, a 75 when the class average was 68 and the standard deviation was 4, or an 89 when the class average was 76 and the standard deviation was 12? (use the z-score)

ex) find the data value which is 2 standard deviations above the mean if µ=32, σ=6

formula for x: x = µ + z·σ

same as the formula for z, but you solve for x

Formula:

for a z-score: z = (x - µ)/σ (population)

for a sample, same formula, different notation: z = (x - x̄)/s
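The z-score examples in this section can be checked with a short sketch of the two formulas (the function names `z_score` and `value_from_z` are made up for illustration):

```python
def z_score(x, mean, st_dev):
    """Number of standard deviations that x lies from the mean."""
    return (x - mean) / st_dev

def value_from_z(z, mean, st_dev):
    """Solve the z-score formula for x: x = mean + z * st_dev."""
    return mean + z * st_dev


print(z_score(85, 77, 4))       # your exam: 8 points above, st.dev. 4 -> 2.0
print(z_score(88, 77, 4))       # Jerry's exam -> 2.75
print(z_score(68, 78, 4))       # below the mean, so negative -> -2.5

# which exam score is relatively better?
print(z_score(75, 68, 4), round(z_score(89, 76, 12), 2))

print(value_from_z(2, 32, 6))   # 2 standard deviations above 32 -> 44
```

The 75 has the larger z-score (1.75 against about 1.08), so it is the relatively better score even though 89 is the higher raw number.
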
