Introductory Statistics
ECON1005
I Descriptive Statistics (PART I)
■ Introduction
INTRODUCTION
■ What is Statistics?
■ Basic Definitions
What is Statistics?
• Statistics is a group of methods used to collect, organise, analyse, present and interpret data and to make decisions
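Branches of Statistics
• STATISTICS has two branches:
▫ DESCRIPTIVE STATISTICS (characterise attributes of sample &/or population)
▫ INFERENTIAL STATISTICS (use sample results to draw conclusions about a population)
Descriptive Statistics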
• Used to collect, organize, display and analyse data
• There are two types:
1. Numerical
● Involves the computation of a statistic (e.g. the average)
2. Graphical
● Involves representing the data using pictures such as charts and graphs
Inferential Statistics
• Uses sample results to make generalizations, inferences and predictions about a wider population
• There are two types:
▫ Estimation
● a sample statistic is used to estimate a population parameter
▫ Hypothesis Testing
Descriptive vs. Inferential Statistics
• Descriptive Statistics
▫ Collect
▫ Organize
▫ Summarize
▫ Display
▫ Analyze
• Inferential Statistics
▫ Predict and forecast values of a population
▫ Test hypotheses about values of a population
Basic Definitions I
• Population:
▫ A population is the collection of all items whose characteristics are being studied.
● N represents the population size
▫ Values calculated using population data are called parameters
• Sample:
▫ A sample is a portion of the population selected for study.
● n represents the sample size
▫ Values calculated using sample data are called statistics
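A minimal Python sketch of the parameter/statistic distinction, using the mean as the quantity of interest. The population values are hypothetical, and `random.sample` (simple random sampling without replacement) is only one possible way of selecting a sample; the names `mu` and `x_bar` are illustrative.

```python
import random
import statistics

# Hypothetical population: advertising expenditure (J$M) of every firm being studied
population = [7, 12, 9, 15, 22, 27, 11, 18, 24, 6, 13, 20]
N = len(population)                  # population size

# Simple random sample without replacement, seeded so the run is repeatable
random.seed(1)
sample = random.sample(population, k=4)
n = len(sample)                      # sample size

mu = statistics.mean(population)     # parameter: calculated from population data
x_bar = statistics.mean(sample)      # statistic: calculated from sample data

print(f"N = {N}, n = {n}")
print(f"population mean (parameter) = {mu:.2f}")
print(f"sample mean (statistic)     = {x_bar:.2f}")
```

Changing the seed changes the sample and, usually, the sample mean, while the population mean stays fixed; this is why a statistic is treated as an estimate of a parameter.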
Basic Definitions II
• Data:
▫ numbers or measurements that are collected
• Variables:
▫ characteristics or attributes that enable us to distinguish one individual from another
▫ they take on different values when different individuals are observed (e.g. height)
• Element:
Summarising & Describing Data
• Describing the observed patterns in data is an important part of statistics
• Distribution of a single variable
Describing Data
S – Shape
▫ What is the overall shape of the distribution? (symmetric or skewed / mounded or flat)
U – Unusual
▫ Are there any unusual points? (errors, outliers or influential points)
M – Middle
▫ Where is the centre of the distribution? (mean, median, mode)
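The three measures of the Middle can be computed with Python's standard `statistics` module. A minimal sketch; the data values are hypothetical, chosen only so that the three measures give different answers.

```python
import statistics

# Hypothetical observations of a single variable
values = [2, 3, 3, 5, 7, 10]

centre = {
    "mean": statistics.mean(values),      # sum of the values / number of values
    "median": statistics.median(values),  # middle value (average of the two middle values when n is even)
    "mode": statistics.mode(values),      # most frequently occurring value
}
for name, value in centre.items():
    print(f"{name:>6}: {value}")
```

For these values the mean (5) is larger than the median (4.0) because the single large value 10 pulls the mean upward, which is one reason to check for Unusual points before choosing a measure of the middle.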
Organizing and Graphing Data
■ Introduction
■ Frequency Distributions
■ Bar Charts
Introduction
• There are two main types of data:
▫ Quantitative
● This is information presented in the form of numbers, percentages or statistics
● It answers, in numerical terms, questions such as "how often" and "how many"
▫ Qualitative
Organizing Data
A frequency distribution lists all categories/classes and
the number of elements that belong to each
Frequency Distributions
• A frequency distribution:
▫ is a table in which measurements are tallied
▫ then records the frequency (total number of times) with which each item occurs
• Measurements are usually arranged in ascending or descending order
• A frequency distribution has 3 columns:
▫ the data categories or classes
▫ the tally column (for raw data)
▫ the corresponding frequencies
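The tallying process can be sketched in Python with `collections.Counter`. The `responses` list is hypothetical raw data, and `tally_marks` is an illustrative helper that writes a completed group of five as `||||/`, a plain-text stand-in for four strokes crossed by a fifth.

```python
from collections import Counter

# Hypothetical raw survey responses, recorded one at a time
responses = ["Yes", "No", "Yes", "Undecided", "Yes",
             "No", "Yes", "Yes", "No", "Undecided"]


def tally_marks(count):
    """Plain-text tally: '||||/' stands for a completed group of five."""
    groups, rest = divmod(count, 5)
    marks = ["||||/"] * groups
    if rest:
        marks.append("|" * rest)
    return " ".join(marks)


frequencies = Counter(responses)         # number of times each category occurs
categories = ["Yes", "No", "Undecided"]  # fixed row order for the table

print(f"{'CATEGORY':<10} {'TALLY':<12} FREQUENCY")
for category in categories:
    f = frequencies[category]
    print(f"{category:<10} {tally_marks(f):<12} {f}")
print(f"{'Total':<10} {'':<12} {sum(frequencies.values())}")
```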
Examples
• Quantitative
Ad. expenditure (J$M)   Tally   Number of Firms (Frequency)
5M - 9M                         10
10M - 14M                       25
15M - 19M                       15
20M - 24M                       19
25M - 29M                       8
Total                           77
• Qualitative
CATEGORY      TALLY   FREQUENCY
Yes                   23
No                    13
Undecided             4
Total                 40
Frequency Distribution Cont’d
• There are two main types of frequency distributions:
▫ Ungrouped data
▫ Grouped data
Ad. expenditure (J$M)   Tally   Number of Firms (Frequency)
5M - 9M                         10
10M - 14M                       25
15M - 19M                       15
20M - 24M                       19
25M - 29M                       8
Total                           77
Ages                    Tally   Frequency
• Class Intervals/Limits
▫ the smallest and largest values that can actually belong to each class
▫ each class has a lower class limit and an upper class limit
Ad. expenditure (J$M)   Tally   Number of Firms
5 – 9                           10
10 – 14                         25
15 – 19                         15
20 – 24                         19
25 – 29                         8
• Class Boundaries
▫ the numbers which separate classes
▫ found by taking the midpoint between the upper limit of one class and the lower limit of the next class
Ad. expenditure (J$M)   Class Boundaries   Tally   Number of Firms
5 – 9                                              10
10 – 14                 9.5 – 14.5                 25
15 – 19                                            15
20 – 24                                            19
25 – 29                                            8
Total                                              77
Lower class boundary for 2nd class (10 – 14): (9 + 10) / 2 = 9.5
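The boundary rule can be sketched in Python for the ad-expenditure classes. This assumes, as in this table, that the gap between consecutive classes is the same everywhere (here 1); the first class has no predecessor, so its lower boundary is taken half a gap below its lower limit. Variable names are illustrative.

```python
# Class limits for the ad-expenditure table (J$M)
limits = [(5, 9), (10, 14), (15, 19), (20, 24), (25, 29)]

# Gap between the upper limit of one class and the lower limit of the next (10 - 9 = 1);
# assumption: the gap is the same between every pair of consecutive classes
gap = limits[1][0] - limits[0][1]

boundaries = []
for i, (lower, upper) in enumerate(limits):
    if i == 0:
        lower_b = lower - gap / 2                 # no previous class: go half a gap below
    else:
        lower_b = (limits[i - 1][1] + lower) / 2  # midpoint of previous upper limit and this lower limit
    upper_b = upper + gap / 2                     # equals the next class's lower boundary
    boundaries.append((lower_b, upper_b))

for (lower, upper), (lb, ub) in zip(limits, boundaries):
    print(f"{lower} - {upper}: {lb} - {ub}")
```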
• Class Mark (Midpoint)
▫ found by taking the average of the class limits (or class boundaries)
Ad. expenditure (J$M)   Class Boundaries   Midpoints   Number of Firms
5 – 9                   4.5 – 9.5          7           10
10 – 14                 9.5 – 14.5                     25
15 – 19                 14.5 – 19.5                    15
20 – 24                 19.5 – 24.5                    19
25 – 29                 24.5 – 29.5                    8
Total                                                  77
Class 1 midpoint using class limits: (5 + 9) / 2 = 7
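A quick Python check that averaging the class limits and averaging the class boundaries give the same class marks for the ad-expenditure table (limits and boundaries copied from the tables above; list names are illustrative).

```python
limits = [(5, 9), (10, 14), (15, 19), (20, 24), (25, 29)]
boundaries = [(4.5, 9.5), (9.5, 14.5), (14.5, 19.5), (19.5, 24.5), (24.5, 29.5)]

# Class mark = average of the two limits, or equivalently of the two boundaries
mid_from_limits = [(lower + upper) / 2 for lower, upper in limits]
mid_from_boundaries = [(lb + ub) / 2 for lb, ub in boundaries]

print(mid_from_limits)
print(mid_from_boundaries)
```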
• Class Width
▫ also called class size or class length
▫ there are two ways of calculating it:
▫ Method 1: the difference between the corresponding (e.g. lower) limits of two consecutive classes
▫ Method 2: the difference between the upper and lower boundaries of a class
Ad. expenditure (J$M)   Class Boundaries   Midpoints   Number of Firms
5 – 9                   4.5 – 9.5          7           10
10 – 14                 9.5 – 14.5         12          25
15 – 19                 14.5 – 19.5        17          15
20 – 24                 19.5 – 24.5        22          19
25 – 29                 24.5 – 29.5        27          8
Total                                                  77
Using class limits (Classes 1 and 2): 10 - 5 = 5
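Both methods for the class width, applied to the ad-expenditure classes in a short Python sketch. The pitfall and the equal-width check are illustrative additions: the first shows why subtracting the limits of the same class is wrong, the second confirms every class has the same width.

```python
limits = [(5, 9), (10, 14), (15, 19), (20, 24), (25, 29)]
boundaries = [(4.5, 9.5), (9.5, 14.5), (14.5, 19.5), (19.5, 24.5), (24.5, 29.5)]

# Method 1: difference between corresponding limits of consecutive classes (lower limits here)
width_from_limits = limits[1][0] - limits[0][0]              # 10 - 5

# Method 2: difference between the two boundaries of one class
width_from_boundaries = boundaries[0][1] - boundaries[0][0]  # 9.5 - 4.5

# Pitfall: upper limit minus lower limit of the SAME class (9 - 5 = 4) understates the width
same_class_difference = limits[0][1] - limits[0][0]

# Every class should have the same width
all_widths = {ub - lb for lb, ub in boundaries}

print(width_from_limits, width_from_boundaries, same_class_difference, all_widths)
```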
Relative Frequency
• Found by dividing the frequency of a category/class by the sum of all frequencies
▫ The relative frequencies MUST sum to 1 (apart from rounding error)
▫ Sometimes expressed as a percentage
Ad. expenditure (J$M)   Class Boundaries   Number of Firms   Relative Frequency
5 – 9                   4.5 – 9.5          10                0.13
10 – 14                 9.5 – 14.5         25
15 – 19                 14.5 – 19.5        15
20 – 24                 19.5 – 24.5        19
25 – 29                 24.5 – 29.5        8
Total                                      77                1.00
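General Formula:
Relative frequency of a class = f / Σf
Percentage = Relative frequency × 100
where f is the frequency of the class and Σf is the sum of all frequencies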
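The general formula applied to the ad-expenditure frequencies in Python. A minimal sketch; it also shows why a rounded relative-frequency column may not total exactly 1.00 even though the unrounded values do.

```python
import math

classes = ["5 - 9", "10 - 14", "15 - 19", "20 - 24", "25 - 29"]
frequencies = [10, 25, 15, 19, 8]

total = sum(frequencies)                     # Σf
relative = [f / total for f in frequencies]  # f / Σf for each class

for label, f, rf in zip(classes, frequencies, relative):
    print(f"{label:<8} {f:>3} {rf:.2f} {rf * 100:5.1f}%")

# Unrounded relative frequencies sum to 1 (up to floating-point error) ...
print(math.isclose(sum(relative), 1.0))
# ... but values rounded to 2 d.p. need not add to exactly 1.00
print(round(sum(round(rf, 2) for rf in relative), 2))
```

With these frequencies the values rounded to two decimal places add to 0.99, so a rounded table may not total exactly 1.00.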
Constructing Frequency Distributions
1. The classes must be “mutually exclusive”: no element can belong to more than one class
2. Include every class, even if its frequency is zero
3. Make all classes the same width (open-ended classes may be unavoidable)
4. Aim for between 5 and 20 classes, depending on the range and number of data points
Consider the following data set:
2.3 4.2 2.8 6.7 4.7 1.6 2.0 1.4
1.0 2.8 1.8 5.2 6.0 5.2 3.5 1.0
3.6 5.1 1.9 7.3 2.5 5.6 3.3 3.4
2.9 3.0 1.8 2.1 3.1 2.8 2.1 4.3
7.1 4.9 1.6 2.2 4.5 6.3 2.7 8.3
a. Group these figures into a frequency distribution having the classes: 1.0 – 1.9, 2.0 – 2.9, 3.0 – 3.9, 4.0 – 4.9, 5.0 – 5.9, 6.0 – 6.9, 7.0 – 7.9 and 8.0 – 8.9
b. Calculate the class boundaries
c. Calculate the class midpoints
d. Calculate the class width
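One way to check answers to (a) to (d) is the Python sketch below. It assumes the 40 values are exactly those listed above (their order does not affect the counts) and that they are recorded to one decimal place, so each boundary lies 0.05 beyond a class limit. Counting against boundaries rather than limits means no value can fall between two classes. The names `table` and `half_gap` are illustrative.

```python
# The 40 observations from the data set above
data = [
    2.3, 4.2, 2.8, 6.7, 4.7, 1.6, 2.0, 1.4,
    1.0, 2.8, 1.8, 5.2, 6.0, 5.2, 3.5, 1.0,
    3.6, 5.1, 1.9, 7.3, 2.5, 5.6, 3.3, 3.4,
    2.9, 3.0, 1.8, 2.1, 3.1, 2.8, 2.1, 4.3,
    7.1, 4.9, 1.6, 2.2, 4.5, 6.3, 2.7, 8.3,
]

# (a) Classes 1.0 - 1.9, 2.0 - 2.9, ..., 8.0 - 8.9 (rounded to 1 d.p. to remove float noise)
classes = [(float(k), round(k + 0.9, 1)) for k in range(1, 9)]

# Data are recorded to 1 d.p., so consecutive classes are 0.1 apart
half_gap = 0.05

table = []
for lower, upper in classes:
    lb = round(lower - half_gap, 2)              # (b) lower class boundary
    ub = round(upper + half_gap, 2)              # (b) upper class boundary
    mid = round((lower + upper) / 2, 2)          # (c) class midpoint
    freq = sum(1 for x in data if lb <= x < ub)  # no observation lies exactly on a boundary
    table.append((lower, upper, lb, ub, mid, freq))

width = round(table[1][2] - table[0][2], 2)      # (d) difference between consecutive lower boundaries

print("Class      Boundaries    Midpoint  Frequency")
for lower, upper, lb, ub, mid, freq in table:
    print(f"{lower} - {upper}  {lb:.2f} - {ub:.2f}  {mid:<8}  {freq}")
print(f"Total frequency: {sum(row[5] for row in table)}; class width: {width}")
```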
Graphical Representation
• When presenting Quantitative Data use:
▫ histograms
▫ frequency polygons
▫ cumulative frequency polygons (ogives)
• When presenting Qualitative Data use:
▫ bar charts
▫ pie charts
Bar Chart
• A graphical way of presenting qualitative data
• Bars (columns) are separated from each other and have the same width
• Categories are placed on the horizontal axis and frequencies (or relative frequencies) on the vertical axis
Pie Chart
• A graphical way of presenting qualitative data
• A pie chart is a circle divided into portions that represent the relative frequencies belonging to different categories
• To construct a pie chart:
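A common way to turn relative frequencies into slices is to give each category a central angle equal to its relative frequency × 360°. The sketch below applies that rule to the Yes/No/Undecided example from earlier in these notes; it only computes the angles and leaves the drawing to whatever plotting tool is available.

```python
import math

# Category frequencies from the Yes/No/Undecided example
responses = {"Yes": 23, "No": 13, "Undecided": 4}
total = sum(responses.values())

angles = {}
for category, f in responses.items():
    relative = f / total                # relative frequency of the category
    angles[category] = f * 360 / total  # central angle = relative frequency x 360 degrees
    print(f"{category:<10} {relative:.3f}  {angles[category]:.1f} degrees")

# The slices together must make up the whole circle
assert math.isclose(sum(angles.values()), 360)
```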
Qualitative Example
• The following are the results for a third-year statistics course:
Grade   Frequency
A       41
B+      12
B       2
C       22
F       19
▫ Calculate the relative frequencies
▫ Construct a bar chart
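A text-only Python sketch of this exercise using the standard library: it computes each grade's relative frequency and prints a rough bar chart with one `#` per student. A proper bar chart would place the grades on the horizontal axis; printing each bar on its own line is purely a convenience of this sketch.

```python
grades = {"A": 41, "B+": 12, "B": 2, "C": 22, "F": 19}
total = sum(grades.values())

print(f"{'Grade':<6}{'f':>4}{'rel. f':>9}  Bar")
for grade, f in grades.items():
    rel = f / total  # relative frequency = f / total
    # One '#' per student; each grade gets its own separate bar (line)
    print(f"{grade:<6}{f:>4}{rel:>9.3f}  {'#' * f}")
print(f"{'Total':<6}{total:>4}{sum(f / total for f in grades.values()):>9.3f}")
```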
Histogram
• A graphical way of presenting quantitative data
• Data are divided into classes of equal width and the number of observations in each class is counted (this information is presented in a frequency table)
• Class is on the x-axis (horizontal)
▫ Can plot using either:
● Class Limits
● Class Boundaries
• Frequency (or relative frequency) is on the y-axis (vertical)
• Bars are drawn so that the base of each bar covers the class and the height of each bar represents the frequency
Figure 2 – plotted using class limits
4. Frequency Polygons
• A Frequency Polygon is a line graph joining the midpoints of the tops of the bars of a histogram
• To construct a frequency polygon:
▫ Plot the midpoint of each class (on the horizontal axis) against its corresponding frequency/relative frequency (on the vertical axis)
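A small Python sketch of where the frequency polygon's points come from, using the ad-expenditure boundaries and frequencies. It only computes the points and the straight-line segments joining them; drawing them is left to any plotting tool.

```python
boundaries = [(4.5, 9.5), (9.5, 14.5), (14.5, 19.5), (19.5, 24.5), (24.5, 29.5)]
frequencies = [10, 25, 15, 19, 8]

# The midpoint of each histogram bar's top edge is (class midpoint, frequency)
points = [((lb + ub) / 2, f) for (lb, ub), f in zip(boundaries, frequencies)]

# Joining consecutive points with straight line segments gives the frequency polygon
segments = list(zip(points, points[1:]))
for (x1, y1), (x2, y2) in segments:
    print(f"({x1}, {y1}) -> ({x2}, {y2})")
```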
Less Than Ogive
• Examines how many observations lie below a certain class boundary
• Plotted against the upper class boundaries
Using Frequencies
• The first value in the distribution is ALWAYS zero
• The last value in the distribution is ALWAYS the total number of observations
Using Relative Frequencies
• The first value in the distribution is ALWAYS zero
• The last value in the distribution is ALWAYS 1
Using Percentages
• The first value in the distribution is ALWAYS zero
• The last value in the distribution is ALWAYS 100
Ad. expenditure (J$M)   Upper Class Boundaries   Number of Firms   Cumulative Frequency
                        4.5
5 – 9                   9.5                      10
10 – 14                 14.5                     25
15 – 19                 19.5                     15
20 – 24                 24.5                     19
25 – 29                 29.5                     8
Total                   -                        77
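A minimal Python sketch of the less-than cumulative frequencies for the table above, using `itertools.accumulate` for the running total. The list names are illustrative; 4.5 is included as the starting boundary, matching the table.

```python
from itertools import accumulate

upper_boundaries = [4.5, 9.5, 14.5, 19.5, 24.5, 29.5]  # 4.5 is the lower boundary of the first class
frequencies = [10, 25, 15, 19, 8]

# Nothing lies below 4.5, so the running total starts at zero
less_than = [0] + list(accumulate(frequencies))
total = less_than[-1]

for b, cf in zip(upper_boundaries, less_than):
    print(f"less than {b:>4}: {cf:>2}  ({cf / total:.3f}, {100 * cf / total:5.1f}%)")
```

The printed columns correspond to the three versions listed above: frequencies (0 up to the total), relative frequencies (0 up to 1) and percentages (0 up to 100).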
More Than Ogive
• Examines how many observations lie above a certain class boundary
• Plotted against the upper class boundaries
Using Frequencies
• The first value in the distribution is ALWAYS the total number of observations
• The last value in the distribution is ALWAYS zero
Using Relative Frequencies
• The first value in the distribution is ALWAYS 1
• The last value in the distribution is ALWAYS zero
Using Percentages
• The first value in the distribution is ALWAYS 100
• The last value in the distribution is ALWAYS zero
Ad. expenditure (J$M)   Upper Class Boundaries   Number of Firms   More Than Cumulative Frequency
                        4.5
5 – 9                   9.5                      10
10 – 14                 14.5                     25
15 – 19                 19.5                     15
20 – 24                 24.5                     19
25 – 29                 29.5                     8
Total                   -                        77
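A minimal Python sketch of the more-than cumulative frequencies for the same boundaries. It uses the fact that the number of observations above a boundary equals the total minus the number below it; the list names are illustrative.

```python
from itertools import accumulate

boundaries = [4.5, 9.5, 14.5, 19.5, 24.5, 29.5]
frequencies = [10, 25, 15, 19, 8]

total = sum(frequencies)
less_than = [0] + list(accumulate(frequencies))  # observations below each boundary
# Observations above a boundary = all observations minus those below it
more_than = [total - cf for cf in less_than]

for b, cf in zip(boundaries, more_than):
    print(f"more than {b:>4}: {cf:>2}  ({100 * cf / total:5.1f}%)")
```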