Foundation of Quantitative Data Analysis

Part 1: Data manipulation and descriptive statistics with SPSS/Excel

HSRS #10 - October 17, 2013

Reference: A. Aczel, Complete Business Statistics, Chapters 1 and 2

Assignment #3: Replicate the classroom exercises.

D.B. Khang _ HSRS #10 - Page 1 Foundation of QDA - 1

Objectives

At the end of this lesson, you should be able to:

 Understand the role of statistical analysis in empirical research

 Use Excel and SPSS software for data manipulation and simple statistical operations

 Refresh the basic knowledge of probability theory needed to properly interpret the findings of statistical analysis

Statistical Analysis

• Data → information → knowledge → decisions and actions

• Statistical analysis: Set of scientific methods used to analyze the data in order to provide meaningful information for better understanding and decision making through

 An approximation of the real world

 Measurements of the errors of this approximation

Based on the data available and the purpose, statistical analysis may be classified as:

• Descriptive statistics: summarizing and presenting the (population or census) data in order:

 To provide insights

 To explain

 To assess and evaluate

• Inferential statistics: analysis of the data available (from a sample, an experiment, etc.) to draw conclusions about a larger or unseen group (population, future events, etc.) in order:

 To estimate and predict

 To test hypotheses

 To provide insights and

 To explain

Types of data

• Non-metric (or qualitative) data:

 Nominal – size of number is not related to the amount of the characteristic being measured

• Referring to names or attributes only

• Examples: brand, color, sex, professions, etc.

 Ordinal – larger numbers indicate more (or less) of the characteristic measured, but not how much more (or less)

• Referring to ranking

• Examples: ranks, preferences, age groups, social classes, etc.

• Metric (or quantitative) data:

 Interval – contains ordinal properties, and in addition, there are equal differences between scale points.

Storage of data for analysis

• Good storage of raw quantitative data is essential for meaningful manipulation, summary, presentation and analysis

• Most databases store data in table format

 Rows are the data items or subjects

 Columns are the variables: the measurements or values assigned to (collected for) the items

• Data stored in most databases is transferable

• Basic data management skills to be developed through practice:

 Enter data into Excel and SPSS – provide explanations of variables and scores

 Transfer data between these two platforms

 Calculate new variables from existing data entered

• Practical tips:

 Data should be coded numerically

 Full documentation (meanings of variables and their values)

 Consistency: data collection, storage and analysis

 Manipulation of stored data is acceptable but should be transparent

Classroom exercise 1

• Consider the data set HBAT.sav

 Read the description of the data and try to understand the meaning of the variables in the data set.

 Identify the metric and the non-metric variables, and the meanings of the values of the variables.

 Save the file as an Excel file. Transfer the file back into an SPSS data file. Try to reformat both files for better readability.

Summarizing and presenting data

• Most often, data should be summarized and presented in sensible ways that support our objectives (that is, to provide insights, to explain or to evaluate)

• Options usually include:

 Presenting summarized distributions: frequency tables, percentiles

 Using some measures of central tendency as representative statistics: averages, medians, modes

 Using some measures of variability: ranges, variances, standard deviations, inter-quartile ranges

 Using other descriptive statistics: min, max, quartiles, skewness, kurtosis, etc.

 Using tabulations and cross tabulations

 Using graphs and diagrams: line graphs, bar charts, pie charts, frequency diagrams, histograms, box plots and other statistical graphs

• Most of these can be supported by Excel and SPSS.
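As a rough illustration outside Excel and SPSS, the measures listed above can be computed with Python's standard `statistics` module. This is a minimal sketch: the `scores` values are invented for illustration and are not taken from HBAT. Quartile values depend on the interpolation method, so they may differ slightly from what Excel or SPSS reports.

```python
import statistics

# Hypothetical satisfaction scores, invented for illustration (not HBAT data).
scores = [5.5, 6.0, 6.0, 7.0, 7.5, 8.0, 9.0]

summary = {
    # Measures of central tendency
    "mean": statistics.mean(scores),
    "median": statistics.median(scores),
    "mode": statistics.mode(scores),
    # Measures of variability (sample versions, n - 1 divisor)
    "range": max(scores) - min(scores),
    "sample_variance": statistics.variance(scores),
    "sample_sd": statistics.stdev(scores),
    # Three cut points that split the data into four equal groups
    "quartiles": statistics.quantiles(scores, n=4),
}

for name, value in summary.items():
    print(name, value)
```

Here the mean and median coincide at 7.0 while the mode is 6.0, which is a small reminder that the three "representative" statistics need not agree.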

Classroom exercise 2

• Apply descriptive statistical tools of SPSS/Excel to the variables X₁₈ and X₁₉ of the HBAT data set and interpret the results.

• Apply a pie chart to X₁ and a histogram to X₁₉.

• Draw the scatter graph of X₁₈ and X₁₉ and interpret the results.

• Draw the frequency tables of X₁ and X₂ and interpret the results.

• Apply cross tabulation to X₁ and X₂ and interpret the results.

• Apply cross tabulation with two layers to X₁, X₃ and X₄ and interpret the results.
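To make the idea of a cross tabulation concrete, here is a minimal sketch in Python that counts how often each combination of two non-metric variables occurs. The variable names (`group`, `industry`) and the records are hypothetical, invented for illustration; they are not the HBAT variables.

```python
from collections import Counter

# Hypothetical records of (group, industry), invented for illustration.
records = [
    ("new", "magazine"), ("new", "newsprint"), ("old", "magazine"),
    ("old", "magazine"), ("new", "magazine"), ("old", "newsprint"),
]

# Count each (row category, column category) combination.
cells = Counter(records)
rows = sorted({group for group, _ in records})
cols = sorted({industry for _, industry in records})

# Print the table with a total for each row.
print("".ljust(8) + "".join(c.ljust(12) for c in cols) + "total")
for r in rows:
    counts = [cells[(r, c)] for c in cols]
    print(r.ljust(8) + "".join(str(n).ljust(12) for n in counts) + str(sum(counts)))
```

A two-layer cross tabulation follows the same pattern with three-element records, producing one such table per value of the layer variable.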

Classroom exercise 3

• Create in Excel and SPSS a new variable:

Z₁₉ = (X₁₉ – μ) / σ

where μ is the mean of X₁₉ and σ is the standard deviation of X₁₉

• Apply descriptive statistical tools to Z₁₉ and interpret the results

• Draw the histograms of X₁₉ and Z₁₉ and interpret the results

• Note: Z₁₉ is called the standardized variable of X₁₉
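The standardization can also be sketched in a few lines of Python, which may help when checking the Excel or SPSS result. The values below are invented stand-ins for X₁₉, not HBAT data, and the sketch assumes the sample standard deviation (n − 1 divisor); `statistics.pstdev` would give the population version instead.

```python
import statistics

# Hypothetical values standing in for X19 (invented, not HBAT data).
x = [5.5, 6.0, 6.0, 7.0, 7.5, 8.0, 9.0]

mu = statistics.mean(x)
sigma = statistics.stdev(x)  # assumption: sample SD with n - 1 divisor

# Standardize: subtract the mean, then divide by the standard deviation.
z = [(value - mu) / sigma for value in x]

print("mean of z:", round(statistics.mean(z), 10))
print("sd of z:  ", round(statistics.stdev(z), 10))
```

Whatever the original scale, the standardized variable has mean 0 and standard deviation 1 (up to rounding), and its histogram has the same shape as that of the original variable.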

Review of probability and distribution

• Probability: defined on random events (occurrences)

 Takes values between 0 and 1

 Can be interpreted as limit of relative frequency (objective probability)

• Note: We often also use subjective probabilities, especially in decision making under uncertainty. Such probabilities simply express the extent of our belief in the occurrence of uncertain events. However, most of statistics deals with the objective interpretation, based on random sampling of data.

• Random variable: the outcome of a measurement (or survey question) taken on an item drawn at random from a given population.

 Usually we can observe only sample values of the variable.

 A random variable can (only) be described by its distribution

 The distribution of a random variable can be approximated from observed values using summary statistics, histograms, frequency tables or various charts

 The distribution of a real random variable can also be approximated by theoretical distributions such as the normal, uniform, Student's t, chi-square, etc.

• Notation and examples

 Probability: P(customer is from magazine industry) = 0.52

 Random variable: X₁₉= customer satisfaction score

 Combined: P(X₁₉ ≥ 7.8) = ?
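Under the relative-frequency interpretation, a probability such as the one in the combined example can be estimated from a sample as the share of observations that satisfy the condition. A minimal sketch in Python, using invented scores rather than the HBAT values:

```python
# Hypothetical sample of satisfaction scores (invented, not HBAT data).
scores = [5.5, 6.0, 6.0, 7.0, 7.5, 8.0, 9.0, 8.2, 6.8, 7.9]
threshold = 7.8

# Relative frequency: observations meeting the condition / all observations.
hits = sum(1 for s in scores if s >= threshold)
p_hat = hits / len(scores)

print(f"Estimated P(X >= {threshold}) = {hits}/{len(scores)} = {p_hat}")
```

The result is only an estimate based on this sample; a different random sample would generally give a somewhat different value.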

A small challenge

• A two-headed coin, a two-tailed coin and an ordinary coin are placed in a bag. One of the coins is drawn at random and flipped; it comes up heads. What is the probability that there is a head on the other side of this coin?

• Solution:

 There are 6 sides, of which 3 are heads: one from the ordinary coin and two from the two-headed coin. Call them H1, H2 and H3.

 Each side has an equal chance of coming up.

 If you see H1, the other side is a tail; if you see H2 or H3, the other side is a head.

 Given that you see a head, the probability that it is H2 or H3, and hence that the other side is also a head, is 2/3.
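The counting argument can be checked by listing all six equally likely outcomes. The sketch below does this in Python with exact fractions; the representation of each coin as a pair of faces is just one convenient choice.

```python
from fractions import Fraction

# Each coin is a pair of faces: two-headed, two-tailed, ordinary.
coins = [("H", "H"), ("T", "T"), ("H", "T")]

# List every equally likely (face showing, face hidden) outcome:
# each coin can land with either of its two faces up.
outcomes = []
for a, b in coins:
    outcomes.append((a, b))
    outcomes.append((b, a))

# Keep only the outcomes consistent with what was observed (a head showing).
shown_heads = [o for o in outcomes if o[0] == "H"]
both_heads = [o for o in shown_heads if o[1] == "H"]

p = Fraction(len(both_heads), len(shown_heads))
print(p)  # 2/3
```

The common wrong answer of 1/2 comes from counting coins rather than sides: the two-headed coin is twice as likely as the ordinary coin to show a head.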