Introduction to Statistics and Data Fall/2017-18
What is statistics?
Webster’s Third New International Dictionary
“Statistics is science dealing with collection, analysis, interpretation and presentation of numerical data”
Uses mathematics and probability
Data
"facts or figures from which conclusions can be drawn"
−Before one can present and interpret information, there has to be a process of gathering and sorting data.
−Just as trees are the raw material from which paper is produced, so too can data be viewed as the raw material from which information is obtained.
Nihar Ranjan Roy
Data, Information and Statistics
Data collected on the weight of 20 individuals in your classroom:
Data                    | Information                             | Statistics
20, 21, 21.5, 24, 25 kg | 5 individuals in the 20-to-25-kg range  | Mean weight = 22.5 kg
28 kg, 30 kg, etc.      | 15 individuals in the 26-to-30-kg range | Median weight = 28 kg
Information
A good definition of information is "data that have been recorded, classified, organized, related, or interpreted within a framework so that meaning emerges".
Statistics
"a type of information obtained through mathematical operations on numerical data".
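The step from data to information to statistics can be sketched in a few lines of Python. This is only an illustration: the 20 weights below are hypothetical values chosen to fall into the two ranges named in the table, not the actual classroom data, so the resulting mean and median are not meant to match the slide.

```python
from statistics import mean, median

# Hypothetical weights (kg) for 20 individuals; NOT the slide's real data set.
weights = [20, 21, 21.5, 24, 25, 26, 26, 27, 27, 27.5,
           28, 28, 28, 28.5, 29, 29, 29.5, 30, 30, 30]

# Information: raw data organised into labelled ranges so that meaning emerges.
ranges = {"20-to-25 kg": (20, 25), "26-to-30 kg": (26, 30)}
information = {
    label: sum(low <= w <= high for w in weights)
    for label, (low, high) in ranges.items()
}

# Statistics: values obtained through mathematical operations on the data.
stats = {"mean": mean(weights), "median": median(weights)}

print(information)
print(stats)
```

The same list of numbers feeds both steps: grouping gives information, arithmetic gives statistics.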
Types of Statistics
Population:
−A collection of persons, objects or items of interest.
−When researchers gather data from the whole population for a given measurement of interest, they call it a census.
Sample:
−A portion of the whole
Statistics has two branches: Descriptive Statistics and Inferential Statistics
Descriptive Vs Inferential Statistics
Descriptive Statistics
Data are gathered on a group to describe or reach conclusions about the same group.
Example: Athletic Statistics
Inferential Statistics
Data are gathered from a sample, and the statistics are used to reach conclusions about the population from which the sample was taken.
Example: Market Research.
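A small simulation may make the distinction concrete. Everything here is an assumption made for illustration: the population is a made-up list of 10,000 monthly spending figures, and the sample size of 200 is arbitrary. Summarising the whole list is descriptive; using the sample mean as an estimate of the population mean is inferential.

```python
import random
from statistics import mean

rng = random.Random(42)  # fixed seed so the sketch is repeatable

# Hypothetical population: monthly spend of 10,000 customers.
population = [rng.gauss(50, 10) for _ in range(10_000)]

# Descriptive: summarise the group that was actually measured (here, everyone).
population_mean = mean(population)

# Inferential: measure only a sample and use it to estimate the population value.
sample = rng.sample(population, 200)
sample_mean = mean(sample)

print(f"population mean (census): {population_mean:.2f}")
print(f"sample mean (estimate):   {sample_mean:.2f}")
```

In market research the census figure is usually unavailable, which is why the sample estimate matters.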
Descriptive statistics allow you to characterize your data based on its properties. There are four major types of descriptive statistics:
1. Measures of Frequency:
− Count, Percent, Frequency
− Shows how often something occurs
− Use this when you want to show how often a response is given
2. Measures of Central Tendency
− Mean, Median, and Mode
− Locates the distribution by various points
− Use this when you want to show an average or most commonly indicated response
3. Measures of Dispersion or Variation
− Range, Variance, Standard Deviation
− Identifies the spread of scores by stating intervals
− Range = difference between the highest and lowest points
− Variance or Standard Deviation = based on the differences between observed scores and the mean
− Use this when you want to show how "spread out" the data are. It is helpful to know when your data are so spread out that it affects the mean
4. Measures of Position
− Percentile Ranks, Quartile Ranks
− Describes how scores fall in relation to one another. Relies on standardized scores
− Use this when you need to compare scores to a normalized score (e.g., a national norm)
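The four groups of measures above can be computed with Python's standard library. This is a minimal sketch on a hypothetical list of ten test scores; the percentile rank uses one common convention (the share of scores at or below a value), which is an assumption, since several definitions are in use.

```python
from collections import Counter
from statistics import mean, median, mode, pstdev, pvariance, quantiles

scores = [55, 60, 60, 65, 70, 70, 70, 80, 85, 95]  # hypothetical test scores

# 1. Measures of frequency: how often each score occurs.
frequency = Counter(scores)

# 2. Measures of central tendency.
centre = {"mean": mean(scores), "median": median(scores), "mode": mode(scores)}

# 3. Measures of dispersion (population variance and standard deviation).
spread = {
    "range": max(scores) - min(scores),
    "variance": pvariance(scores),
    "std_dev": pstdev(scores),
}

# 4. Measures of position: quartile cut points and a percentile rank.
q1, q2, q3 = quantiles(scores, n=4)
percentile_rank_80 = 100 * sum(s <= 80 for s in scores) / len(scores)

print(frequency, centre, spread, (q1, q2, q3), percentile_rank_80)
```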
Data???
What can be the possible form of data?
What operations can be performed on this data?
What does this data represent?
Relationships between two values? Interpretation
How to analyse this data?
Data Measurement
Not all measured data should be analysed the same way statistically.
Hence the need for levels of data measurement:
− Nominal
− Ordinal
− Interval
− Ratio
Nominal Level
Nominal — In nominal measurement the values just "name" the attribute uniquely.
−No ordering of the cases is implied.
−For example, a person's gender is nominal. It doesn’t matter whether you call them boys vs. girls or males vs. females or XY vs. XX chromosomes.
−Another example is religion – Catholic, Protestant, Muslim, etc.
Ordinal Level
Ordinal - A variable is ordinally measurable if ranking is possible for values of the variable.
−For example, a gold medal reflects superior performance to a silver or bronze medal in the Olympics. You can’t say a gold and a bronze medal average out to a silver medal, though.
−Preference scales are typically ordinal – how much do you like this cereal?
1 = Like it a lot, 2 = Somewhat like it, 3 = Neutral, 4 = Somewhat dislike it, 5 = Dislike it a lot
Interval Level
Interval - In interval measurement the distance between attributes does have meaning.
−Numerical data typically fall into this category
−For example, when measuring temperature (in Fahrenheit), the distance from 30 to 40 is the same as the distance from 70 to 80. The interval between values is interpretable.
Ratio Level
Ratio — in ratio measurement there is always a reference point that is meaningful (either 0 for rates or 1 for ratios)
−This means that you can construct a meaningful fraction (or ratio) with a ratio variable.
−In applied social research most "count" variables are ratio, for example, the number of clients in the past six months.
Cardinal Level
A Cardinal Number says how many of something there are, such as one, two, three, four, five.
−A Cardinal Number answers the question "How Many?"
−It does not have fractions or decimals, it is only used for counting.
Cardinal - A variable is cardinally measurable if a given interval between measures has a consistent meaning, i.e., if the measure corresponds to points along a straight line.
−For example, height, output, and income are cardinally measurable
Nominal level data
Numbers are used to classify or categorize
Example: Employment Classification
−1 for Educator
−2 for Construction Worker
−3 for Manufacturing Worker
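Because these codes are only labels, counting and finding the most frequent category are meaningful, while averaging the codes is not. The sketch below uses the three codes from the example; the list of responses is hypothetical.

```python
from collections import Counter

# Employment classification codes from the example; the numbers are only labels.
LABELS = {1: "Educator", 2: "Construction Worker", 3: "Manufacturing Worker"}

responses = [1, 3, 3, 2, 1, 3, 2, 3]  # hypothetical survey responses

# Meaningful for nominal data: classify and count.
counts = Counter(LABELS[code] for code in responses)
most_common_label, most_common_count = counts.most_common(1)[0]

# Not meaningful: the arithmetic mean of the codes names no category at all.
mean_of_codes = sum(responses) / len(responses)

print(counts)
print(most_common_label, most_common_count)
```

Here `mean_of_codes` works out to 2.25, which corresponds to no employment class. Renumbering the categories would change it without changing the data.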
Ordinal Level Data
Numbers are used to indicate rank or order
−Relative magnitude of numbers is meaningful
−Differences between numbers are not comparable
Example: Ranking productivity of employees
Example: Position within an organization
o 1 for President
o 2 for Vice President
o 3 for Plant Manager
o 4 for Department Supervisor
o 5 for Employee
Ordinal Level Data
Faculty and staff should receive preferential treatment for parking space.
1 = Strongly Agree, 2 = Agree, 3 = Neutral, 4 = Disagree, 5 = Strongly Disagree
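For a survey item like this one, responses can be counted and ranked, and a middle response can be found without treating the gaps between codes as equal. The sketch assumes the scale runs from 1 = Strongly Agree to 5 = Strongly Disagree; the responses are hypothetical.

```python
from collections import Counter
from statistics import median_low

# Assumed coding of the five-point scale.
SCALE = {1: "Strongly Agree", 2: "Agree", 3: "Neutral",
         4: "Disagree", 5: "Strongly Disagree"}

responses = [1, 2, 2, 3, 5, 4, 2, 1, 3]  # hypothetical answers

# Counting is meaningful, as for nominal data.
counts = Counter(responses)

# Ranking is meaningful too, so a median category exists.
# median_low always returns a value that actually occurs in the data.
middle = median_low(responses)

print({SCALE[code]: n for code, n in sorted(counts.items())})
print("median response:", SCALE[middle])
```

The median relies only on order, which is why it suits ordinal data better than the mean.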
Interval Level Data
Interval Level data - Distances between consecutive integers are equal
−Relative magnitude of numbers is meaningful
−Differences between numbers are comparable
−Location of origin, zero, is arbitrary
−Vertical intercept of unit of measure transform function is not zero
Example: Fahrenheit Temperature
Example: Monetary Utility
Ratio Level Data
Highest level of measurement
−Relative magnitude of numbers is meaningful
−Differences between numbers are comparable
−Location of origin, zero, is absolute (natural)
−Vertical intercept of unit of measure transform function is zero
Examples: Height, Weight, and Volume
Example: Monetary Variables, such as Profit and Loss, Revenues, Expenses, Financial ratios - such as P/E Ratio, Inventory Turnover, and Quick Ratio.
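The statements about the "vertical intercept of the unit of measure transform function" can be checked numerically. The sketch compares a Fahrenheit-to-Celsius conversion (non-zero intercept, interval level) with a pounds-to-kilograms conversion (zero intercept, ratio level). The function names and the sample values are chosen here for illustration only.

```python
def fahrenheit_to_celsius(f):
    # Interval scale: the transform has a non-zero intercept (32 F maps to 0 C).
    return (f - 32) * 5 / 9

def pounds_to_kilograms(lb):
    # Ratio scale: the transform passes through zero (0 lb is 0 kg).
    return lb * 0.45359237

# Differences stay comparable after a unit change on the interval scale.
diff_low = fahrenheit_to_celsius(40) - fahrenheit_to_celsius(30)
diff_high = fahrenheit_to_celsius(80) - fahrenheit_to_celsius(70)

# Ratios survive a unit change only on the ratio scale.
temp_ratio_f = 80 / 40
temp_ratio_c = fahrenheit_to_celsius(80) / fahrenheit_to_celsius(40)
weight_ratio_lb = 80 / 40
weight_ratio_kg = pounds_to_kilograms(80) / pounds_to_kilograms(40)

print(diff_low, diff_high)
print(temp_ratio_f, temp_ratio_c)
print(weight_ratio_lb, weight_ratio_kg)
```

"Twice as heavy" holds in any unit, but "twice as hot" depends on where the scale puts its zero.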
Ratio Level Data…
Parametric statistics – require that the data be interval or ratio
Nonparametric statistics – used if data are nominal or ordinal
−Nonparametric statistics can also be used to analyze interval or ratio data
Data Level, Operations, and Statistical Methods
Data Level | Meaningful Operations
Nominal    | Classifying and Counting
Ordinal    | All of the above plus Ranking
Interval   | All of the above plus Addition, Subtraction, Multiplication, and Division (including means, standard deviations, etc.)
Ratio      | All of the above
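The cumulative "all of the above plus ..." pattern in the table can be expressed as a small lookup. This is one possible encoding, not part of the original slides: the `Level` enum and the `meaningful_operations` helper are hypothetical names, and the operation lists simply restate the table.

```python
from enum import IntEnum

class Level(IntEnum):
    """Levels of data measurement, ordered from lowest to highest."""
    NOMINAL = 1
    ORDINAL = 2
    INTERVAL = 3
    RATIO = 4

# Operations that first become meaningful at each level, as listed in the table.
NEW_OPERATIONS = {
    Level.NOMINAL: ["classifying", "counting"],
    Level.ORDINAL: ["ranking"],
    Level.INTERVAL: ["addition", "subtraction", "multiplication", "division"],
    Level.RATIO: [],
}

def meaningful_operations(level):
    """Collect the operations for a level plus everything from the levels below it."""
    ops = []
    for lower in Level:
        if lower <= level:
            ops.extend(NEW_OPERATIONS[lower])
    return ops

for level in Level:
    print(level.name, meaningful_operations(level))
```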
Problem
Classify each of the following as nominal, ordinal, interval or ratio data.
1. The time required to produce each tire on an assembly line.
2. The number of quarts of milk a family drinks in a month.
3. The ranking of four machines in your plant after they have been designated as excellent, good, satisfactory, and poor.
4. The telephone area code of clients in the United States.
5. The age of each of your employees.
6. The dollar sales at the local pizza house each month.
7. An employee’s ID number.
8. The response time of an emergency unit
Answers: 1. Ratio 2. Ratio 3. Ordinal 4. Nominal 5. Ratio 6. Ratio
Problem
Classify the following as nominal, ordinal, interval or ratio data.
1. The ranking of a company by Fortune 500.
2. The number of tickets sold at a movie theatre on a given night.
3. The identification number of a questionnaire
4. Per capita income
5. The trade balance in dollars
6. Profit/loss in dollars
7. A company's tax identification