Foundation of Quantitative Data Analysis
Part 1: Data manipulation and descriptive statistics with SPSS/Excel
HSRS #10 - October 17, 2013
Reference: A. Aczel, Complete Business Statistics, Chapters 1 and 2
Assignment #3: To replicate the classroom exercises.
D.B. Khang _ HSRS #10 - Page 1 Foundation of QDA - 1
Objectives
At the end of this lesson, you should be able to:
Understand the role of statistical analysis in empirical research
Use Excel and SPSS for data manipulation and the simplest statistical operations
Refresh the basic knowledge of probability theory needed to interpret the findings of statistical analysis properly
Statistical Analysis
• Data → Information → Knowledge → Decisions and actions
• Statistical analysis: a set of scientific methods used to analyze data in order to provide meaningful information for better understanding and decision making, through:
An approximation of the real world
Measurements of the errors of this approximation
Based on the data available and the purpose, we may classify statistical analysis as:
• Descriptive statistics: summarizing and presenting the (population or census) data in order:
To provide insights
To explain
To assess and evaluate
• Inferential statistics: analysis of the data available (from a sample, an experiment, etc.) to draw conclusions about a larger or unseen group (population, future events, etc.) in order:
To estimate and predict
To test hypotheses
To provide insights and
To explain
Types of data
• Non-metric (or qualitative) data:
Nominal – size of number is not related to the amount of the characteristic being measured
• Referring to names or attributes only
• Examples: brand, color, sex, professions, etc.
Ordinal – larger numbers indicate more (or less) of the characteristic measured, but not how much more (or less)
• Referring to ranking
• Examples: ranks, preferences, age groups, social classes, etc.
• Metric (or quantitative) data:
Interval – contains ordinal properties, and in addition, there are equal differences between scale points.
Storage of data for analysis
• Good storage of raw quantitative data is essential for meaningful manipulation, summary, presentation and analysis
• Most databases store data in table format
Rows are the data items or subjects
Columns are the measurements or values assigned (collected) to the items: variables
• Data stored in most databases is transferable to other formats
• Basic data management skills to be developed through practice:
Enter data into Excel and SPSS – provide explanations of variables and scores
Transfer data between these two platforms
Calculate new variables from existing data entered
• Practical tips:
Data should be coded numerically
Full documentation (meanings of variables and their values)
Consistency: data collection, storage and analysis
Manipulations of data stored are acceptable but should be transparent
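The practical tips above can be illustrated with a small Python sketch. It is only an illustration under stated assumptions: the variable names, value labels, scores and the threshold of 7 are hypothetical and do not come from the HBAT data set.

```python
# Hypothetical codebook: names, labels and codes are invented for illustration.
CODEBOOK = {
    "customer_type": {
        "label": "Length of time as a customer",
        "level": "ordinal",
        "values": {1: "less than 1 year", 2: "1 to 5 years", 3: "more than 5 years"},
    },
    "satisfaction": {
        "label": "Satisfaction score (0-10)",
        "level": "metric",
        "values": None,  # metric scores need no value labels
    },
}

# Rows are subjects, columns are variables; categories are stored as numeric codes.
ROWS = [
    {"id": 1, "customer_type": 2, "satisfaction": 8.2},
    {"id": 2, "customer_type": 1, "satisfaction": 5.7},
    {"id": 3, "customer_type": 3, "satisfaction": 7.4},
]


def decode(row, codebook=CODEBOOK):
    """Return a copy of the row with numeric codes replaced by their labels."""
    decoded = dict(row)
    for name, meta in codebook.items():
        if meta["values"] is not None:
            decoded[name] = meta["values"][row[name]]
    return decoded


def add_derived(row, threshold=7.0):
    """Compute a new 0/1 variable from an existing one, keeping the original value."""
    out = dict(row)
    out["satisfied"] = 1 if row["satisfaction"] >= threshold else 0  # assumed cut-off
    return out


print(decode(ROWS[0]))
print(add_derived(ROWS[1]))
```

Both functions return copies, so the raw data stay unchanged and every manipulation remains traceable.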
Classroom exercise 1
• Consider the data set HBAT.sav. Read the description of the data and try to understand the meaning of the variables in the data set.
• Identify the metric and the non-metric variables, and the meanings of the values of the variables.
• Save the file as an Excel file. Transfer the file back into an SPSS data file. Try to reformat both files for better readability.
Summarizing and presenting data
• Most often, data should be summarized and presented in sensible ways that support our objectives (that is, to provide insights, to explain or to evaluate)
• Options usually include:
Presenting summarized distributions: frequency tables, percentiles
Using some measures of central tendency as representative statistics: averages, medians, modes
Using some measures of variability: ranges, variances, standard deviations, inter-quartile ranges
Using other descriptive statistics: min, max, quartiles, skewness, kurtosis, etc.
Using tabulations and cross tabulations
Using graphs and diagrams: line graphs, bar charts, pie charts, frequency diagrams, histograms, box plots and other statistical graphs
• Most of these can be supported by Excel and SPSS.
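Outside Excel and SPSS, the same summary measures can be sketched with Python's standard `statistics` module. The scores below are hypothetical. Note that `statistics.quantiles` uses its default "exclusive" method here, so the quartiles may differ slightly from those reported by other packages.

```python
import statistics as st
from collections import Counter

# Hypothetical scores, invented for illustration only.
scores = [5.4, 6.1, 6.1, 6.8, 7.0, 7.3, 7.9, 8.2, 8.6, 9.1]

summary = {
    "n": len(scores),
    "mean": st.mean(scores),
    "median": st.median(scores),
    "mode": st.mode(scores),
    "min": min(scores),
    "max": max(scores),
    "range": max(scores) - min(scores),
    "variance": st.variance(scores),  # sample variance (n - 1 denominator)
    "std_dev": st.stdev(scores),      # sample standard deviation
}

# Quartiles: the three cut points that split the data into four groups.
q1, q2, q3 = st.quantiles(scores, n=4)
summary["iqr"] = q3 - q1

# Frequency table: how often each distinct value occurs.
frequency = Counter(scores)

for name, value in summary.items():
    print(f"{name:>8}: {value:.3f}")
```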
Classroom exercise 2
• Apply the descriptive statistical tools of SPSS/Excel to the variables X18 and X19 of the HBAT data set and interpret the results.
• Apply a pie chart to X1 and a histogram to X19.
• Draw the scatter graph of X18 and X19 and interpret the results.
• Draw the frequency tables of X1 and X2 and interpret the results.
• Apply cross tabulation to X1 and X2 and interpret the results.
• Apply cross tabulation with two layers to X1, X3 and X4 and interpret the results.
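To see what a frequency table and a cross tabulation compute, here is a minimal Python sketch. The coded pairs are hypothetical and are not taken from HBAT; `x1` and `x2` merely stand in for two non-metric variables.

```python
from collections import Counter

# Hypothetical coded observations (x1, x2), invented for illustration.
pairs = [(1, 0), (2, 1), (3, 1), (1, 0), (2, 0), (3, 1), (1, 1), (3, 1)]

freq_x1 = Counter(x1 for x1, _ in pairs)  # frequency table of x1
crosstab = Counter(pairs)                 # joint counts of each (x1, x2) combination

rows = sorted({x1 for x1, _ in pairs})
cols = sorted({x2 for _, x2 in pairs})


def format_crosstab():
    """Lay out the joint counts as a table with row totals."""
    lines = ["x1\\x2 " + " ".join(f"{c:>4}" for c in cols) + "  total"]
    for r in rows:
        cells = " ".join(f"{crosstab[(r, c)]:>4}" for c in cols)
        lines.append(f"{r:>5} {cells} {freq_x1[r]:>6}")
    return "\n".join(lines)


print(format_crosstab())
```

A `Counter` returns 0 for combinations that never occur, so empty cells need no special handling.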
Classroom exercise 3
• Create in Excel and SPSS a new variable: Z19 = (X19 - μ)/σ, where μ is the mean of X19 and σ is the standard deviation of X19.
• Apply descriptive statistical tools to Z19 and interpret the results.
• Draw the histograms of X19 and Z19 and interpret the results.
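The standardization in this exercise can also be sketched in Python. The values are hypothetical, and the sketch assumes the sample standard deviation (n - 1 denominator, as in Excel's STDEV.S); with the population version (STDEV.P) the scale of Z19 changes slightly.

```python
import statistics as st

# Hypothetical X19 scores, invented for illustration only.
x19 = [5.4, 6.1, 6.8, 7.0, 7.3, 7.9, 8.2, 9.1]

mu = st.mean(x19)      # mean of X19
sigma = st.stdev(x19)  # sample standard deviation of X19 (assumption, see above)

# Z19 = (X19 - mu) / sigma, computed value by value
z19 = [(x - mu) / sigma for x in x19]

print(f"mean of Z19 = {st.mean(z19):.6f}")
print(f"std  of Z19 = {st.stdev(z19):.6f}")
```

By construction Z19 has mean 0 and standard deviation 1, while the shape of its histogram is the same as that of X19.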
Note: Z19 is called the standardized variable of X19.
Review of probability and distribution
• Probability: defined on random events (occurrences)
Takes values between 0 and 1
Can be interpreted as limit of relative frequency (objective probability)
• Note: Often we may use also subjective probabilities, especially in decision making under uncertainty. Such probabilities simply mean the extent of our belief in the occurrence of uncertain events. However, most of statistics deals with objective interpretation based on random sampling of data!
• Random variable: the outcome of a measurement (or survey question) taken at random from a given population.
Usually we have only sample values of the variable.
A random variable can (only) be described by its distribution
The distribution of a random variable can be approximated from observed values using summary statistics, histograms, frequency tables or various charts
The distribution of a real random variable can also be approximated by a theoretical distribution such as the normal, uniform, Student's t or chi-square distribution
• Notation and examples
Probability: P(customer is from magazine industry) = 0.52
Random variable: X19 = customer satisfaction score
Combined: P(X19 >= 7.8) = ?
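A probability such as P(X19 >= 7.8) can be approximated in the two ways described above: by the relative frequency in the observed sample, or by a fitted theoretical distribution. The sketch below uses hypothetical scores, and the normal model is an assumption made for illustration, not a claim about the real X19.

```python
from statistics import NormalDist, mean, stdev

# Hypothetical X19 scores, invented for illustration only.
x19 = [5.4, 6.1, 6.8, 7.0, 7.3, 7.9, 8.2, 9.1, 6.5, 7.6]
threshold = 7.8

# 1) Relative frequency: share of observed values at or above the threshold.
p_empirical = sum(x >= threshold for x in x19) / len(x19)

# 2) Theoretical approximation: fit a normal distribution to the sample
#    (assumed model) and take the upper-tail probability.
model = NormalDist(mean(x19), stdev(x19))
p_normal = 1 - model.cdf(threshold)

print(f"relative frequency:   {p_empirical:.3f}")
print(f"normal approximation: {p_normal:.3f}")
```

The two estimates generally differ; how close they are depends on the sample size and on how well the normal model fits the data.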
A small challenge
• A two-headed coin, a two-tailed coin and an ordinary coin are placed in a bag. One of the coins is drawn at random and flipped; it comes up heads. What is the probability that there is a head on the other side of this coin?
• Solution: There are 6 sides, of which 3 are heads: one from the ordinary coin and two from the two-headed coin. Call them H1, H2 and H3.
Each side has an equal chance of coming up.
If you see H1, the other side is a tail; if you see H2 or H3, the other side is a head.
Once you see a head, the probability that it is H2 or H3, and hence that the other side is also a head, is 2/3.
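The counting argument can be checked by listing all six equally likely (visible side, hidden side) outcomes. This is a minimal Python sketch of that enumeration; the variable names are chosen for illustration.

```python
from fractions import Fraction

# Each coin is a pair of sides: two-headed, two-tailed, ordinary.
coins = [("H", "H"), ("T", "T"), ("H", "T")]

# Every side of every coin is equally likely to land face up,
# giving 6 outcomes recorded as (visible side, hidden side).
outcomes = []
for side_a, side_b in coins:
    outcomes.append((side_a, side_b))
    outcomes.append((side_b, side_a))

# Condition on the event "a head is visible".
shown_head = [o for o in outcomes if o[0] == "H"]
other_side_head = [o for o in shown_head if o[1] == "H"]

p = Fraction(len(other_side_head), len(shown_head))
print(p)  # 2/3
```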