Academic year: 2021


Population Mean

Mean = (sum of the values) / (number of values), if all values are equally likely.

To compute the population or sample mean:

1. Collect the data.

2. Sum all the values in the population/sample.

3. Divide the sum by the number of elements in the population/sample.
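A minimal Python sketch of these three steps. The function name `mean` and the data values are hypothetical; Python's built-in `statistics.mean` does the same job.

```python
def mean(values):
    """Arithmetic mean: sum the values, then divide by how many there are."""
    if len(values) == 0:
        raise ValueError("mean requires at least one value")
    total = sum(values)          # step 2: sum all the values
    return total / len(values)   # step 3: divide by the number of elements


data = [4, 8, 15, 16, 23, 42]    # step 1: collect the data (hypothetical values)
print(mean(data))                # prints 18.0
```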

Median

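The median is the center value that divides a sorted list of data (a data array) into two halves. With an odd number of values it is the middle value; with an even number it is the average of the two middle values.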
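A short sketch of the median rule, assuming the data fit in a Python list (`median` is a hypothetical helper; `statistics.median` behaves the same way):

```python
def median(values):
    """Middle value of the sorted data (average of the two middle values if n is even)."""
    ordered = sorted(values)     # the data array
    n = len(ordered)
    if n == 0:
        raise ValueError("median requires at least one value")
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]      # odd n: the single middle value
    return (ordered[mid - 1] + ordered[mid]) / 2   # even n: average the two middle values


print(median([7, 1, 3]))         # prints 3
print(median([4, 1, 3, 2]))      # prints 2.5
```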


Mode

The mode is the value in a data set that occurs most frequently.

Percentile

Location of the pth percentile in a data array:

i = (p/100)(n + 1), where p = the desired percentile and n = the number of values in the data set.

The pth percentile in a data array is a value that divides the data set into two parts. The lower segment contains at least p%, and the upper segment contains at least (100 – p)%, of the data. The 50th percentile is the median.
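The mode definition and the percentile location rule can be sketched in Python as follows. `modes` and `percentile` are hypothetical helpers. The notes do not say what to do when i is not a whole number, so this sketch assumes linear interpolation between the two neighbouring values and clamps locations outside 1..n; with those assumptions it matches the default "exclusive" method of Python's `statistics.quantiles`, which uses the same (n + 1) positioning.

```python
import math
from collections import Counter


def modes(values):
    """Every value tied for the highest frequency (a data set can have several modes)."""
    counts = Counter(values)
    if not counts:
        raise ValueError("mode requires at least one value")
    top = max(counts.values())
    return sorted(v for v, c in counts.items() if c == top)


def percentile(values, p):
    """pth percentile using the location rule i = (p/100)(n + 1).

    Assumption: a fractional location is linearly interpolated between
    neighbouring values, and locations outside 1..n are clamped.
    """
    data = sorted(values)                  # the data array
    n = len(data)
    if n == 0:
        raise ValueError("percentile requires at least one value")
    i = (p / 100) * (n + 1)                # 1-based location in the data array
    if i <= 1:
        return data[0]
    if i >= n:
        return data[-1]
    lower = math.floor(i)                  # whole-number part of the location
    frac = i - lower                       # fractional part used for interpolation
    return data[lower - 1] + frac * (data[lower] - data[lower - 1])


print(modes([1, 2, 2, 3, 3, 4]))           # prints [2, 3]
print(percentile([1, 2, 3, 4, 5], 25))     # prints 1.5
```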

Box and Whisker plots


2. Find the 25th percentile (first quartile, Q1), the 50th percentile (median), and the 75th percentile (third quartile, Q3).

3. Draw a box with its ends at Q1 and Q3. This box will contain the middle 50% of the data values in the population or sample.

4. Draw a vertical line through the box at the median. Half the data values in the box will be on either side of the median.

5. Calculate the interquartile range (IQR = Q3 – Q1).

Compute the lower limit for the box and whisker plot as Q1 – 1.5(Q3 – Q1) and the upper limit as Q3 + 1.5(Q3 – Q1). Any data values outside these limits are referred to as outliers.

6. Extend dashed lines (called the whiskers) from each end of the box to the lowest value (on the left) and the highest value (on the right) within the limits.

7. Mark any value outside the limits (an outlier) found in step 5 with an asterisk (*).
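The numbers needed to draw the plot can be computed with a sketch like the one below. `box_plot_summary` is a hypothetical helper and the data are made up; quartiles come from `statistics.quantiles`, whose default "exclusive" method places the pth percentile at (p/100)(n + 1), the same location rule used earlier in these notes.

```python
import statistics


def box_plot_summary(values):
    """Quartiles, IQR, limits, whisker ends and outliers for a box and whisker plot."""
    data = sorted(values)
    q1, med, q3 = statistics.quantiles(data, n=4)   # step 2: Q1, median, Q3
    iqr = q3 - q1                                   # step 5: interquartile range
    lower_limit = q1 - 1.5 * iqr
    upper_limit = q3 + 1.5 * iqr
    inside = [x for x in data if lower_limit <= x <= upper_limit]
    outliers = [x for x in data if x < lower_limit or x > upper_limit]
    return {
        "Q1": q1,
        "median": med,
        "Q3": q3,
        "IQR": iqr,
        "limits": (lower_limit, upper_limit),
        "whisker_ends": (min(inside), max(inside)),  # step 6: whiskers stop here
        "outliers": outliers,                        # step 7: mark these with *
    }


summary = box_plot_summary([1, 2, 3, 4, 5, 6, 7, 8, 9, 30])   # hypothetical data
print(summary["outliers"])   # prints [30]
```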

Range

Range = largest value – smallest value

Interquartile Range

IQR = Q3 – Q1

Variance

The population variance is the average of the squared distances of the data values from the mean.

The sample variance is the “average” (dividing by n – 1 instead of n) of the squared distances of the data values from the mean (the “residuals”).

Standard Deviation

The standard deviation is the positive square root of the variance.

Coefficient of Variation

CV = (standard deviation / mean) × 100%
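A sketch of the spread measures using Python's `statistics` module, which provides both divisors: `pvariance`/`pstdev` divide by N, `variance`/`stdev` divide by n – 1. The helper `spread` and the data are hypothetical, and the coefficient of variation is assumed to be (SD / mean) × 100% using the population SD; for a sample one would use the sample SD instead.

```python
import statistics


def spread(values):
    """Range, variance, standard deviation and coefficient of variation."""
    mean = statistics.mean(values)
    pop_sd = statistics.pstdev(values)
    return {
        "range": max(values) - min(values),                    # largest - smallest
        "population_variance": statistics.pvariance(values),  # divides by N
        "sample_variance": statistics.variance(values),       # divides by n - 1
        "population_sd": pop_sd,                              # sqrt of population variance
        "sample_sd": statistics.stdev(values),                # sqrt of sample variance
        "cv_percent": pop_sd / mean * 100,                    # CV = (SD / mean) x 100%
    }


print(spread([2, 4, 4, 4, 5, 5, 7, 9]))   # hypothetical data with mean 5
```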


Standardized Data Values (Z scores)

1. Compute the population mean and SD, or the sample mean and SD.

2. Use these formulas:

For populations: Z = (x – population mean) / population SD

For samples: Z = (x – sample mean) / sample SD
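Standardizing a list of values in Python might look like this. The helper `z_scores` and its `sample` flag are hypothetical; the flag only chooses which SD divisor is used.

```python
import statistics


def z_scores(values, sample=True):
    """Standardize each value: Z = (x - mean) / SD."""
    center = statistics.mean(values)
    # sample SD divides by n - 1; population SD divides by N
    sd = statistics.stdev(values) if sample else statistics.pstdev(values)
    return [(x - center) / sd for x in values]


print(z_scores([2, 4, 4, 4, 5, 5, 7, 9], sample=False))   # hypothetical data
```

Standardized values always average to zero, so large positive or negative Z values flag observations far from the mean.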

Using a Tree Diagram


Two events are independent if the occurrence of one event in no way influences the probability of the occurrence of the other event.

Probability Rules

Addition rule for any two events:

P(E1 or E2) = P(E1) + P(E2) – P(E1 and E2)

For two mutually exclusive events:

P(E1 or E2) = P(E1) + P(E2)

Conditional probability

P(E1 | E2) = P(E1 and E2) / P(E2)

It reads “the probability of event E1 given that event E2 has occurred.”
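The addition rule and the conditional probability formula can be checked by counting outcomes. This sketch assumes a single roll of a fair die as a hypothetical example; `prob` and `conditional` are illustrative helpers, and exact fractions avoid rounding noise. Note that in Python, `set.union` plays the role of "or" and `&` the role of "and".

```python
from fractions import Fraction

SAMPLE_SPACE = {1, 2, 3, 4, 5, 6}   # one roll of a fair die (hypothetical example)


def prob(event):
    """Classical probability: favourable outcomes / total outcomes."""
    return Fraction(len(event & SAMPLE_SPACE), len(SAMPLE_SPACE))


def conditional(e1, e2):
    """P(E1 | E2) = P(E1 and E2) / P(E2)."""
    return prob(e1 & e2) / prob(e2)


E1 = {2, 4, 6}   # roll is even
E2 = {4, 5, 6}   # roll is greater than 3

p_or = prob(E1) + prob(E2) - prob(E1 & E2)   # addition rule
print(p_or, prob(E1.union(E2)))              # prints 2/3 2/3
print(conditional(E1, E2))                   # prints 2/3
```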


Conditional Probability for Independent Events

P(E1 | E2) = P(E1)

and P(E2 | E1) = P(E2)
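The independence conditions can be tested directly on a small sample space. This sketch again assumes one roll of a fair die; the events and the helpers `prob`, `conditional` and `independent` are hypothetical.

```python
from fractions import Fraction

SAMPLE_SPACE = set(range(1, 7))   # one roll of a fair die (hypothetical example)


def prob(event):
    return Fraction(len(event & SAMPLE_SPACE), len(SAMPLE_SPACE))


def conditional(e1, e2):
    """P(E1 | E2) = P(E1 and E2) / P(E2)."""
    return prob(e1 & e2) / prob(e2)


def independent(e1, e2):
    """Both conditions: P(E1 | E2) = P(E1) and P(E2 | E1) = P(E2)."""
    return conditional(e1, e2) == prob(e1) and conditional(e2, e1) == prob(e2)


even = {2, 4, 6}
at_most_two = {1, 2}
greater_than_three = {4, 5, 6}

print(independent(even, at_most_two))          # prints True
print(independent(even, greater_than_three))   # prints False
```

Knowing the roll is at most two leaves the chance of "even" at 1/2, whereas knowing it is greater than three raises that chance to 2/3, so only the first pair is independent.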

Binomial (use R)

To find the probability of q or fewer successes, use pbinom(q, size, prob), where size = number of trials and prob = probability of a success on each trial.

To find the probability of exactly x successes, use dbinom(x, size, prob).


The expected value of a binomial random variable is number_of_trials × probability_of_success.
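For readers without R, the same binomial quantities can be sketched in Python. `binom_pmf` and `binom_cdf` are hypothetical helpers that compute what R's `dbinom` and `pbinom` (lower tail) return; the coin-toss numbers are illustrative.

```python
from math import comb


def binom_pmf(x, n, p):
    """P(X = x), the quantity R's dbinom(x, size = n, prob = p) returns."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)


def binom_cdf(q, n, p):
    """P(X <= q), the quantity R's pbinom(q, size = n, prob = p) returns."""
    return sum(binom_pmf(k, n, p) for k in range(q + 1))


n, p = 10, 0.5                     # hypothetical: 10 fair coin tosses
print(binom_pmf(5, n, p))          # P(exactly 5 heads)
print(binom_cdf(5, n, p))          # P(5 or fewer heads)
print(n * p)                       # expected value = n x p = 5.0
```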

Poisson: models the number of successes when the number of trials is very large and the probability of a success is very small.

λ = number_of_trials × prob_of_success = expected number of successes. Use R:

dpois(x, lambda) = probability of exactly x successes when the expected value is lambda.

Normal distribution

Use R: pnorm(q, mean, sd) gives the probability of a value less than or equal to q.
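The Poisson probability (dpois) and the normal probability (pnorm) described above have simple Python counterparts. `poisson_pmf` is a hypothetical helper; `statistics.NormalDist(mu, sigma).cdf(q)` is a standard-library call that gives P(X ≤ q), the role pnorm plays in R. All numbers are made up for illustration.

```python
import math
from statistics import NormalDist


def poisson_pmf(x, lam):
    """P(X = x) when the expected number of successes is lam (R: dpois(x, lam))."""
    return math.exp(-lam) * lam**x / math.factorial(x)


# Poisson setting: many trials, small probability of success
n, p = 1000, 0.002                 # hypothetical values
lam = n * p                        # expected number of successes
print(poisson_pmf(2, lam))         # P(exactly 2 successes)

# Normal probabilities: NormalDist(mu, sigma).cdf(q) = P(X <= q)
weights = NormalDist(mu=70, sigma=5)   # hypothetical population
print(weights.cdf(75))             # P(X <= 75), one SD above the mean
```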


Sampling Error

Sampling error = sample mean – population mean

Standard error = SD of the sampling error = population SD / square root of n. This is the SD of the sampling distribution of the sample mean.

To find probabilities associated with the sampling distribution of xbar for samples of size n from a population with a known mean and SD (valid if the population is normal or if n is large):

1. Compute the sample mean.

2. Define the sampling distribution:

Mean of the sample mean = population mean; SD of the sample mean = population SD / square root of n

3. Define the event of interest.

4. Express the event in terms of a Z value, Z = (sample mean – population mean) / (SD of the sample mean), and use pnorm to get the probability.
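A sketch of these steps for the event "sample mean is at most xbar". `prob_sample_mean_at_most` is a hypothetical helper and the population values are illustrative; `NormalDist().cdf` is the standard normal CDF, used here in place of R's pnorm.

```python
import math
from statistics import NormalDist


def prob_sample_mean_at_most(xbar, mu, sigma, n):
    """P(sample mean <= xbar) for samples of size n (normal population or large n)."""
    se = sigma / math.sqrt(n)          # step 2: SD of the sample mean (standard error)
    z = (xbar - mu) / se               # step 4: express the event as a Z value
    return z, NormalDist().cdf(z)      # standard normal CDF, the role pnorm plays in R


z, p = prob_sample_mean_at_most(xbar=103, mu=100, sigma=15, n=36)   # hypothetical values
print(z, p)
```

Here the standard error is 15 / 6 = 2.5, so a sample mean of 103 lies 1.2 standard errors above the population mean.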

Sample proportion

1. “Find” p (the true probability).

2. Find pbar (the sample proportion).

3. Find the SD of pbar:

If p is known: sqrt(p(1 – p) / n) [hypothesis testing]

If only pbar is known: sqrt(pbar(1 – pbar) / n) [confidence intervals]

4. Define the event of interest.

5. Find the Z value.

6. Use pnorm to get the probability.
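The sample-proportion steps can be sketched as follows for the event "pbar is at least some cutoff". The helper name `prob_pbar_at_least` and the numbers are hypothetical, and the normal approximation is assumed to be adequate for the sample size.

```python
import math
from statistics import NormalDist


def prob_pbar_at_least(cutoff, p, n):
    """P(pbar >= cutoff) using the normal approximation, SD of pbar = sqrt(p(1 - p) / n)."""
    sd_pbar = math.sqrt(p * (1 - p) / n)   # step 3: SD of pbar when p is known
    z = (cutoff - p) / sd_pbar             # step 5: Z value
    return 1 - NormalDist().cdf(z)         # step 6: upper-tail probability


print(prob_pbar_at_least(0.45, p=0.40, n=100))   # hypothetical values
```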


Confidence Interval Calculation

Point estimate ± (critical value (z or t)) × (standard error of the estimate)

Developing a confidence interval estimate for a population proportion

1. Define the population of interest and the variable from which to estimate the population proportion.

2. Determine the sample size and select a simple random sample.

3. Specify the level of confidence and obtain the critical value from qnorm or qt (in R).

4. Calculate pbar, the sample proportion.
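A sketch of a confidence interval for a population proportion. The listed steps stop at pbar; the remaining part here is an assumption that applies the general form "point estimate ± critical value × standard error" with the SD of pbar computed from pbar itself. `proportion_ci` is a hypothetical helper, and `NormalDist().inv_cdf` stands in for R's qnorm.

```python
import math
from statistics import NormalDist


def proportion_ci(successes, n, confidence=0.95):
    """Point estimate +/- z critical value x standard error, for a population proportion."""
    pbar = successes / n                                      # step 4: sample proportion
    z_crit = NormalDist().inv_cdf(1 - (1 - confidence) / 2)   # like qnorm(0.975) in R
    se = math.sqrt(pbar * (1 - pbar) / n)                     # SD of pbar estimated from pbar
    margin = z_crit * se
    return pbar - margin, pbar + margin


print(proportion_ci(40, 100))   # hypothetical: 40 successes in a sample of 100
```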


One-tailed test for a hypothesis about a population mean (SD known, large samples)

1. Specify the population value of interest.

2. Formulate the null hypothesis and the alternative hypothesis in terms of the population mean.

3. Specify the desired significance level.

4. Construct the rejection region.

5. Compute the test statistic.

6. Draw the conclusion.
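A sketch of an upper-tailed version of this test, assuming H0: mean ≤ mu0 against HA: mean > mu0. The helper `upper_tail_z_test` and all numbers are hypothetical; steps 1 to 3 correspond to choosing `mu0`, the hypotheses and `alpha`.

```python
import math
from statistics import NormalDist


def upper_tail_z_test(xbar, mu0, sigma, n, alpha=0.05):
    """H0: mu <= mu0 versus HA: mu > mu0, with the population SD known."""
    z_crit = NormalDist().inv_cdf(1 - alpha)   # step 4: rejection region is z > z_crit
    z = (xbar - mu0) / (sigma / math.sqrt(n))  # step 5: test statistic
    return z, z_crit, z > z_crit               # step 6: True means reject H0


print(upper_tail_z_test(xbar=104, mu0=100, sigma=15, n=36))   # hypothetical values
```

With these values z = 1.6, which falls short of the 5% critical value of about 1.645, so H0 is not rejected; a lower-tailed test would instead reject when z < –z_crit.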

© T. Lau 2007
