Foundation of Quantitative Data Analysis




Part 1: Data manipulation and descriptive statistics with SPSS/Excel

HSRS #10 - October 17, 2013

Reference: A. Aczel, Complete Business Statistics, Chapters 1 and 2

Assignment #3: Replicate the classroom exercises.

D.B. Khang _ HSRS #10 - Page 1 Foundation of QDA - 1

Objectives

At the end of this lesson, you should be able to:

Understand the role of statistical analysis in empirical research

Use Excel and SPSS for data manipulation and simple statistical operations

Refresh the basic probability theory needed to interpret the findings of statistical analysis properly


Statistical Analysis

Data Information  knowledge  decisions and actions

Statistical analysis: Set of scientific methods used to analyze the data in order to provide meaningful information for better understanding and decision making through

An approximation of the real world

Measurements of the errors of this approximation

Based on the data available and the purpose, statistical analysis may be classified as:

Descriptive statistics: summarizing and presenting the (population or census) data in order:

To provide insights

To explain

To assess and evaluate

Inferential statistics: analysis of the data available (from a sample, an experiment, etc.) to draw conclusions about a larger or unseen group (a population, future events, etc.) in order:

To estimate and predict

To test hypotheses

To provide insights and

To explain

Types of data

Non-metric (or qualitative) data:

Nominal – size of number is not related to the amount of the characteristic being measured

Referring to names or attributes only

Examples: brand, color, sex, professions, etc.

Ordinal – larger numbers indicate more (or less) of the characteristic measured, but not how much more (or less)

Referring to ranking

Examples: ranks, preferences, age groups, social classes, etc.

Metric (or quantitative) data:

Interval – contains ordinal properties, and in addition, there are equal differences between scale points.
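The measurement scale determines which summary statistics are meaningful. The sketch below illustrates this with small made-up values (they are assumptions for illustration, not course data): the mode for nominal data, the median for ordinal data and the mean for interval data.

```python
from statistics import mean, median, mode

# Hypothetical illustrative values, not taken from any course data set.
brand = ["A", "B", "A", "C", "A"]          # nominal: labels only
age_group = [1, 2, 2, 3, 1, 2, 3]          # ordinal: 1 = young, 2 = middle, 3 = old
temperature_c = [21.5, 23.0, 19.5, 22.0]   # interval: equal differences between scale points

# Nominal data: only counting is meaningful, so the mode is a valid summary.
brand_mode = mode(brand)

# Ordinal data: order is meaningful, so order-based statistics such as the median apply.
age_median = median(age_group)

# Interval data: differences are meaningful, so the mean can be used as well.
temp_mean = mean(temperature_c)

print(brand_mode, age_median, temp_mean)
```

Taking the mean of the brand codes would run without error if the brands were coded 1, 2, 3, but the result would have no interpretation. This is why the scale of each variable should be documented.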


Storage of data for analysis

Good storage of raw quantitative data is essential for meaningful manipulation, summary, presentation and analysis

Most databases store data in table format

Rows are the data items or subjects

Columns are the measurements or values assigned (collected) to the items: the variables

Data stored in most databases is transferable

Basic data management skills to be developed through practice:

Enter data into Excel and SPSS – provide explanations of variables and scores

Transfer data between these two platforms

Calculate new variables from existing data entered

Practical tips:

Data should be coded numerically

Full documentation (meanings of variables and their values)

Consistency: data collection, storage and analysis

Manipulation of stored data is acceptable but should be transparent
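The tips above can be sketched outside Excel and SPSS as well. The example below is only an illustration: the variable names (region, q1, q2, q_mean), the codes and the values are all hypothetical. It shows numeric coding, a codebook documenting the variables, a derived variable, and export to CSV, a plain format that both Excel and SPSS can import.

```python
import csv
import io

# Hypothetical codebook: documents each variable and the meaning of its codes.
codebook = {
    "region": {"label": "Sales region", "values": {1: "North", 2: "South"}},
    "q1": {"label": "Quality score (0-10)", "values": None},
    "q2": {"label": "Service score (0-10)", "values": None},
}

# Rows are subjects, columns are variables; all values are coded numerically.
rows = [
    {"id": 1, "region": 1, "q1": 8.0, "q2": 6.0},
    {"id": 2, "region": 2, "q1": 5.0, "q2": 7.0},
]

# A derived variable, computed transparently from the stored columns.
for row in rows:
    row["q_mean"] = (row["q1"] + row["q2"]) / 2

# Write the table as CSV text (to an in-memory buffer instead of a file).
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=list(rows[0].keys()))
writer.writeheader()
writer.writerows(rows)
csv_text = buffer.getvalue()

print(csv_text)
```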

Classroom exercise 1

Consider the data set HBAT.sav

Read the description of the data and try to understand the meaning of the variables in the data set.

Identify the metric and the non-metric variables, and the meanings of the values of the variables.

Save the file as an Excel file, then transfer it back into an SPSS data file. Try to reformat both files for better readability.


Summarizing and presenting data

Most often, data should be summarized and presented in sensible ways that support our objectives (that is, to provide insights, to explain or to evaluate)

Options usually include:

Presenting summarized distributions: frequency tables, percentiles

Using some measures of central tendency as representative statistics:

averages, medians, modes

Using some measures of variability: ranges, variances, standard deviations, inter-quartile ranges

Using other descriptive statistics: min, max, quartiles, skewness, kurtosis, etc.

Using tabulations and cross tabulations

Using graphs and diagrams: line graphs, bar charts, pie charts, frequency diagrams, histograms, box plots and other statistical graphs

Most of these can be supported by Excel and SPSS.
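For readers without SPSS at hand, the same measures can be sketched with Python's standard library. The scores below are made up for illustration. Note that `variance` and `stdev` use the sample (n - 1) formulas, and that `quantiles` uses one of several common quartile conventions, so other software may report slightly different quartiles.

```python
from statistics import mean, median, mode, quantiles, stdev, variance

# Hypothetical scores for illustration only.
scores = [5.0, 6.5, 7.0, 7.0, 8.0, 8.5, 9.0, 10.0]

summary = {
    # Measures of central tendency
    "mean": mean(scores),
    "median": median(scores),
    "mode": mode(scores),
    # Measures of variability
    "min": min(scores),
    "max": max(scores),
    "range": max(scores) - min(scores),
    "variance": variance(scores),   # sample variance (divides by n - 1)
    "std_dev": stdev(scores),       # sample standard deviation
}

# Quartiles and the inter-quartile range.
q1, q2, q3 = quantiles(scores, n=4)
summary["iqr"] = q3 - q1

for name, value in summary.items():
    print(f"{name}: {value:.3f}")
```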

Classroom exercise 2

Apply the descriptive statistical tools of SPSS/Excel to the variables X18 and X19 of the HBAT data set and interpret the results.

Apply a pie chart to X1 and a histogram to X19.

Draw the scatter graph of X18 and X19 and interpret the results.

Draw the frequency tables of X1 and X2 and interpret the results.

Apply cross tabulation to X1 and X2 and interpret the results.

Apply cross tabulation with two layers to X1, X3 and X4 and interpret the results.
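A cross tabulation simply counts how often each combination of values occurs. The sketch below builds one by hand for two coded variables. The observations are hypothetical and do not come from the HBAT data set.

```python
from collections import Counter

# Hypothetical coded observations: one (a, b) pair per subject.
pairs = [(1, 0), (1, 1), (2, 0), (2, 0), (3, 1), (1, 0), (3, 1), (2, 1)]

# Count each combination of values.
cell_counts = Counter(pairs)

# The distinct levels of each variable define the rows and columns.
row_levels = sorted({a for a, _ in pairs})
col_levels = sorted({b for _, b in pairs})

# Build the table; combinations that never occur get a count of 0.
crosstab = {a: {b: cell_counts[(a, b)] for b in col_levels} for a in row_levels}
row_totals = {a: sum(crosstab[a].values()) for a in row_levels}

for a in row_levels:
    print(a, crosstab[a], "total:", row_totals[a])
```

A two-layer cross tabulation follows the same idea with triples instead of pairs, producing one such table for each value of the third variable.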


Classroom exercise 3

Create in Excel and SPSS a new variable:

Z19 = (X19 - μ) / σ

where μ is the mean of X19 and σ is the standard deviation of X19

Apply descriptive statistical tools to Z19 and interpret the results

Draw the histograms of X19 and Z19 and interpret the results

Note: Z19 is called the standardized variable of X19
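The standardization can be sketched in a few lines. The values below are hypothetical, and the population standard deviation (`pstdev`) is an assumption made here; replace it with `stdev` for the sample (n - 1) version. Whichever is used, the standardized variable has mean 0, and its standard deviation computed with the same formula is 1.

```python
from statistics import mean, pstdev

# Hypothetical values standing in for a metric variable.
x = [5.0, 6.0, 7.0, 8.0, 9.0]

mu = mean(x)        # mean of the original variable
sigma = pstdev(x)   # population standard deviation (assumption; see lead-in)

# Standardize: subtract the mean, divide by the standard deviation.
z = [(value - mu) / sigma for value in x]

print("z-scores:", [round(v, 3) for v in z])
print("mean of z:", round(mean(z), 6))
print("std dev of z:", round(pstdev(z), 6))
```

Standardizing shifts and rescales the values but does not change the shape of the histogram.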

Review of probability and distribution

Probability: defined on random events (occurrences)

Takes values between 0 and 1

Can be interpreted as limit of relative frequency (objective probability)

Note: We may also use subjective probabilities, especially in decision making under uncertainty. Such probabilities simply express the extent of our belief in the occurrence of uncertain events. However, most of statistics deals with the objective interpretation, based on random sampling of data!
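The relative-frequency interpretation can be illustrated by simulation: as the number of trials grows, the observed share of "successes" settles near the underlying probability. This is a minimal sketch; the probability 0.5, the trial counts and the fixed seed are arbitrary choices for illustration.

```python
import random

# Fixed seed so that the run is reproducible.
rng = random.Random(0)

def relative_frequency(n_trials, p=0.5):
    """Share of n_trials independent trials in which an event of probability p occurs."""
    hits = sum(1 for _ in range(n_trials) if rng.random() < p)
    return hits / n_trials

# Relative frequency for an increasing number of trials.
freqs = {n: relative_frequency(n) for n in (10, 1000, 100000)}

for n, freq in freqs.items():
    print(f"{n:>6} trials: relative frequency = {freq:.4f}")
```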

Random variable: the outcome of a measurement (or survey question) taken on a subject drawn at random from a given population.

Usually we can have only sample values of the variables.

A random variable can (only) be described by its distribution

The distribution of a random variable can be approximated from observed values using summary statistics, histograms, frequency tables or various charts

The distribution of a real random variable can also be approximated by theoretical distributions such as the normal, uniform, Student's t, chi-square, etc.
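As a rough illustration of approximating an observed distribution by a theoretical one, the sketch below fits a normal distribution to a small hypothetical sample and compares the share of observations within one standard deviation of the mean with the share the normal model predicts (about 68%). The sample values are made up; with so few observations the two shares need not agree closely.

```python
from statistics import NormalDist

# Hypothetical sample values for illustration only.
sample = [5.4, 6.1, 6.8, 7.0, 7.2, 7.5, 7.9, 8.3, 8.8, 9.1]

# Fit a normal distribution using the sample mean and sample standard deviation.
fitted = NormalDist.from_samples(sample)
low = fitted.mean - fitted.stdev
high = fitted.mean + fitted.stdev

# Share of observed values within one standard deviation of the mean.
empirical_share = sum(1 for v in sample if low <= v <= high) / len(sample)

# Share predicted by the fitted normal distribution for the same interval.
normal_share = fitted.cdf(high) - fitted.cdf(low)

print(f"empirical: {empirical_share:.2f}, normal model: {normal_share:.4f}")
```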

Notation and examples

Probability: P(customer is from magazine industry) = 0.52

Random variable: X19 = customer satisfaction score

Combined: P(X19 ≥ 7.8) = ?
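With only sample values available, a probability of this kind is usually estimated by the relative frequency in the sample. The sketch below uses hypothetical scores, not the actual HBAT values, so the result is not the answer for the real data set.

```python
# Hypothetical satisfaction scores for illustration only.
scores = [5.4, 6.1, 6.8, 7.0, 7.2, 7.5, 7.9, 8.3, 8.8, 9.1]
threshold = 7.8

# Estimate P(X >= threshold) by the share of sample values at or above it.
count_at_or_above = sum(1 for s in scores if s >= threshold)
p_hat = count_at_or_above / len(scores)

print(f"Estimated P(X >= {threshold}) = {p_hat}")
```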


A small challenge

A two-headed coin, a two-tailed coin and an ordinary coin are placed in a bag. One of the coins is drawn at random and flipped; it comes up heads. What is the probability that there is a head on the other side of this coin?

Solution:

There are 6 sides, of which 3 are heads: one from the ordinary coin and 2 from the two-headed coin. Call them H1, H2 and H3.

Each side has an equal chance of coming up

If you see H1, the other side is a tail; if you see H2 or H3, the other side is a head.

Once you see a head, it is equally likely to be H1, H2 or H3, so the probability that it is H2 or H3 (and the other side is a head) is 2/3.
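The counting argument can be checked by enumerating the six equally likely (coin, side facing up) outcomes. This is a minimal sketch of the same reasoning, not a different method.

```python
from fractions import Fraction

# Each coin is a pair of sides: two-headed, two-tailed and ordinary.
coins = [("H", "H"), ("T", "T"), ("H", "T")]

# Every (coin, side facing up) combination is equally likely: 6 outcomes in total.
# Each outcome is stored as (side facing up, side facing down).
outcomes = []
for coin in coins:
    for up_index in (0, 1):
        outcomes.append((coin[up_index], coin[1 - up_index]))

# Condition on the observed event: a head is facing up.
heads_up = [o for o in outcomes if o[0] == "H"]

# Among those, count the outcomes where the hidden side is also a head.
both_heads = [o for o in heads_up if o[1] == "H"]

probability = Fraction(len(both_heads), len(heads_up))
print(probability)
```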
