
Some applications of data analysis


Academic year: 2020


(1)

Some Applications of Statistical Methods in Data Analysis

Assoc. Prof. Dr. Abdul Hamid b. Hj. Mar Iman, Former Director,

(2)

Forms of “statistical” relationship

• Correlation
• Contingency
• Cause-and-effect
  * Causal
  * Feedback
  * Multi-directional
  * Recursive

• The last two categories are normally dealt with

(3)

Statistical Data Analysis Methods – A Summary

| Scale of measurement | One-sample: independent sample | One-sample: single-treatment repeated measures | One-sample: multiple-treatment repeated measures | Two independent samples | K independent samples | Measures of association |
|---|---|---|---|---|---|---|
| Nominal | Binomial test; one-way contingency table | McNemar test | Cochran Q test | Two-way contingency table | Contingency table | Contingency coefficients |
| Ordinal | Runs test | Wilcoxon signed-rank test | Friedman test | Mann-Whitney test | Kruskal-Wallis test | Spearman rank correlation |
| Interval/ratio | Z- or t-test | … | … | … | … | … |

… of variance

(4)

One-Sample Test

McNemar Test: tests for change in a sample upon a “treatment”.

• Example. Two condominium projects, K and L. Respondents state their preference for K or L before and after “advertising”.

Hypothesis: advertising does not influence buyers to change their minds on product choice.

| Before \ After | Project L | Project K |
|---|---|---|
| Project K | A = 40 | B = 60 |
| Project L | C = 30 | D = 50 |

Cells A (K → L) and D (L → K) count the respondents who changed their minds.

(5)

One-Sample Test (contd.)

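Test statistic:

$$Q = \sum \frac{(O - E)^2}{E}, \qquad E = \frac{A+D}{2}$$

where the sum runs over the two cells that record a change (A and D), each with expected frequency E under H₀. Therefore,

$$Q = \frac{[A-(A+D)/2]^2}{(A+D)/2} + \frac{[D-(A+D)/2]^2}{(A+D)/2} = \frac{(A-D)^2}{A+D}$$

Thus,

$$Q = \frac{(40-50)^2}{40+50} = \frac{100}{90} = 1.11$$

$$\chi^2_{(2-1)(2-1);\,0.05} = 3.84$$

Since Q < 3.84, H₀ is not rejected: advertising has no significant influence on buyers' choice.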
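The McNemar calculation above can be checked with a few lines of standard-library Python. This is a minimal sketch: `mcnemar_q` is a hypothetical helper name, and the 5% critical value 3.84 is copied from the χ² table (1 d.o.f.) rather than computed.

```python
def mcnemar_q(a: int, d: int) -> float:
    """McNemar statistic Q = (A - D)^2 / (A + D).

    a: count in cell A (switched from K to L)
    d: count in cell D (switched from L to K)
    Cells B and C (no change of mind) do not enter the statistic.
    """
    if a + d == 0:
        raise ValueError("no respondents changed their minds; Q is undefined")
    return (a - d) ** 2 / (a + d)


# 5% critical value of chi-square with (2-1)(2-1) = 1 d.o.f., taken from tables
CHI2_CRIT_1DF = 3.84

q = mcnemar_q(a=40, d=50)
print(f"Q = {q:.2f}")  # Q = 1.11
print("Reject H0" if q > CHI2_CRIT_1DF else "H0 not rejected")
```

Only the two "change" cells matter, so swapping A and D gives the same Q.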

(6)

One-Sample Test (contd.)

Friedman Test: tests whether preferences are equal across alternatives with various characteristics.

• Example. Buyers' ranks of preference for three condominium types A, B and C.

• Hypothesis: buyers' preferences for the three condo types do not differ.

| Resp. | Type A | Type B | Type C |
|---|---|---|---|
| Man | 2 | 3 | 1 |
| Min | 1 | 2 | 3 |
| Lee | 1 | 3 | 2 |
| Ling | 3 | 1 | 2 |
| Dass | 1 | 2 | 3 |

(7)

One-Sample Test (contd.)

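Test statistic:

$$F_r = \frac{12}{nk(k+1)}\sum_{j=1}^{k} R_j^2 - 3n(k+1)$$

where n = number of respondents, k = number of condo types and R_j = sum of the ranks given to type j.
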
(8)

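One-Sample Test (contd.)

Rank sums: R_A = 8, R_B = 11, R_C = 11.

$$F_r = \frac{12}{5 \times 3\,(3+1)}\left[8^2 + 11^2 + 11^2\right] - 3 \times 5\,(3+1) = 1.2$$

$$\chi^2_{(3-1);\,0.05} = 5.99$$

H₀ not rejected. Buyers do not show different preferences among the three condo types.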
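The Friedman computation can be reproduced with a short standard-library Python sketch. `friedman_statistic` is a hypothetical helper. It assumes untied ranks within each row and applies no tie correction. The critical value 5.99 is the tabled χ² value used above.

```python
from typing import List, Sequence


def friedman_statistic(ranks: Sequence[Sequence[int]]) -> float:
    """Friedman F_r for an n x k table of within-row ranks (no ties assumed).

    Rows are respondents (n), columns are the alternatives being ranked (k).
    """
    n = len(ranks)
    k = len(ranks[0])
    # Rank sum R_j for each alternative (column)
    rank_sums: List[int] = [sum(row[j] for row in ranks) for j in range(k)]
    return 12 / (n * k * (k + 1)) * sum(r * r for r in rank_sums) - 3 * n * (k + 1)


# Ranks from the example; columns are Type A, Type B, Type C
ranks = [
    [2, 3, 1],  # Man
    [1, 2, 3],  # Min
    [1, 3, 2],  # Lee
    [3, 1, 2],  # Ling
    [1, 2, 3],  # Dass
]
CHI2_CRIT_2DF = 5.99  # chi-square, k - 1 = 2 d.o.f., alpha = 0.05 (from tables)

fr = friedman_statistic(ranks)
print(f"F_r = {fr:.2f}")  # F_r = 1.20
print("Reject H0" if fr > CHI2_CRIT_2DF else "H0 not rejected")
```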

(9)

One-Sample Test (contd.)

Repeated measures ANOVA: tests the outcome of a phenomenon under different conditions.

Example. Waiting time at junctions in the city area, used to determine the level of congestion at different times of the day.

Test statistic:

$$F = \frac{t/(m-1)}{r/[(n-1)(m-1)]}$$

where t = sum of squares due to treatment, r = residual sum of squares, m = number of treatments, n = number of observations per treatment (here, junctions).

Critical region based on:

$$F_{v_1,\,v_2;\,\alpha}, \qquad v_1 = m-1, \quad v_2 = (n-1)(m-1)$$

(10)

One-Sample Test (contd.)

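Waiting time at junction (min.)

| Junction | Morning | Noon | Evening | Row mean | Sum sq. about row mean (W_i) |
|---|---|---|---|---|---|
| Junction 1 | 4.00 | 5.00 | 6.00 | 5.00 | 2.00 |
| Junction 2 | 5.00 | 6.00 | 6.00 | 5.67 | 0.67 |
| Junction 3 | 6.00 | 7.00 | 8.00 | 7.00 | 2.00 |
| Junction 4 | 5.00 | 8.00 | 6.00 | 6.33 | 4.67 |
| Junction 5 | 5.00 | 4.00 | 9.00 | 6.00 | 14.00 |
| Column mean | 5.00 | 6.00 | 7.00 | 6.00 (M) | |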

(11)

One-sample test (contd.)

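$$T = \sum_{i=1}^{n}\sum_{j=1}^{m}(c_{ij} - M)^2 = 30$$

$$W = \sum_{i=1}^{n} W_i = \sum_{i=1}^{n}\sum_{j=1}^{m}(c_{ij} - \bar{c}_{i\cdot})^2 = 23.34$$

$$B = m\sum_{i=1}^{n}(\bar{c}_{i\cdot} - M)^2 = 6.65$$

$$t = n\sum_{j=1}^{m}(\bar{c}_{\cdot j} - M)^2 = 10$$

$$W = t + r \quad\Rightarrow\quad r = W - t = 13.34$$

where c_ij is the waiting time at junction i (i = 1, …, n) in period j (j = 1, …, m), c̄_i· is the row mean of junction i, c̄_·j is the column mean of period j, and M is the grand mean. As a check, T = W + B up to rounding.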

(12)

One-Sample Test (contd.)

$$F_c = \frac{10/(3-1)}{13.34/[(5-1)(3-1)]} \approx 3.0$$

$$F_{t;\,(3-1),\,(3-1)(5-1);\,0.05} = 4.46$$

H₀ not rejected. Congestion is quite the same at all three times of the day.
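The sums of squares and the F ratio above can be recomputed from the raw waiting times. This is a minimal standard-library sketch; `repeated_measures_anova` is a hypothetical helper, and the critical value 4.46 is copied from the F table rather than computed.

```python
# Waiting times (min.) from the table: rows = junctions (n = 5),
# columns = Morning, Noon, Evening (m = 3 treatments)
times = [
    [4, 5, 6],
    [5, 6, 6],
    [6, 7, 8],
    [5, 8, 6],
    [5, 4, 9],
]


def repeated_measures_anova(data):
    """Return (t, r, F) for a one-way repeated-measures ANOVA.

    t: treatment (column) sum of squares
    r: residual sum of squares = W - t, where W is the within-row SS
    """
    n, m = len(data), len(data[0])
    grand = sum(sum(row) for row in data) / (n * m)                   # M
    row_means = [sum(row) / m for row in data]                        # junction means
    col_means = [sum(row[j] for row in data) / n for j in range(m)]   # period means
    w = sum((x - rm) ** 2 for row, rm in zip(data, row_means) for x in row)
    t = n * sum((cm - grand) ** 2 for cm in col_means)
    r = w - t
    f = (t / (m - 1)) / (r / ((n - 1) * (m - 1)))
    return t, r, f


F_CRIT_2_8 = 4.46  # F table value for v1 = 2, v2 = 8, alpha = 0.05

t, r, f = repeated_measures_anova(times)
print(f"t = {t:.2f}, r = {r:.2f}, F = {f:.2f}")
print("Reject H0" if f > F_CRIT_2_8 else "H0 not rejected")
```

Working without intermediate rounding gives t = 10.00, r = 13.33 and F = 3.00. The slide values of 13.34 and 2.99 differ only because the row means were rounded to two decimals.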

(13)

Two-Sample Test

Two-way Contingency Table: tests whether two independent groups differ on a given characteristic.

• Hypothesis: the choice of house type is not related to location.

Test:

| Group | Inner suburbs | Outer suburbs | Total (R) |
|---|---|---|---|
| Terraced | 50 | 75 | 125 |
| Semi-detached | 30 | 25 | 55 |
| Total (C) | 80 | 100 | 180 |

$$Q = \sum_{i=1}^{r}\sum_{j=1}^{c}\frac{(O_{ij}-E_{ij})^2}{E_{ij}}$$

(14)

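Two-Sample Test (contd.)

• D.o.f. = (r-1)(c-1), where r = number of rows and c = number of columns.

$$E_{ij} = \frac{R_i C_j}{N}$$

| E_ij | Inner suburbs | Outer suburbs |
|---|---|---|
| Terraced | 125 × 80/180 = 55.6 | 125 × 100/180 = 69.4 |
| Semi-detached | 55 × 80/180 = 24.4 | 55 × 100/180 = 30.6 |

$$Q = \frac{(50-55.6)^2}{55.6} + \frac{(30-24.4)^2}{24.4} + \frac{(75-69.4)^2}{69.4} + \frac{(25-30.6)^2}{30.6} = 3.33$$

(Using the unrounded E_ij gives Q = 3.27.)

$$\chi^2_{(2-1)(2-1);\,0.05} = 3.84$$

Since Q < 3.84, H₀ is not rejected: the choice of house type is not related to location.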
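The same test can be written generically for any r × c table. This standard-library sketch uses a hypothetical helper, `chi_square_contingency`. It computes E_ij = R_i C_j / N without rounding and applies no Yates continuity correction. As a result, it returns 3.27 rather than the hand-rounded 3.33. The conclusion is unchanged.

```python
def chi_square_contingency(observed):
    """Pearson chi-square Q and d.o.f. for an r x c table of observed counts."""
    row_totals = [sum(row) for row in observed]           # R_i
    col_totals = [sum(col) for col in zip(*observed)]     # C_j
    n = sum(row_totals)                                   # N
    q = 0.0
    for i, row in enumerate(observed):
        for j, o in enumerate(row):
            e = row_totals[i] * col_totals[j] / n         # E_ij = R_i C_j / N
            q += (o - e) ** 2 / e
    dof = (len(observed) - 1) * (len(observed[0]) - 1)    # (r-1)(c-1)
    return q, dof


# Rows: Terraced, Semi-detached; columns: Inner suburbs, Outer suburbs
table = [[50, 75], [30, 25]]
q, dof = chi_square_contingency(table)
print(f"Q = {q:.2f}, d.o.f. = {dof}")  # Q = 3.27, d.o.f. = 1
```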

(15)

K Independent Test - Correlation

• “Co-exist”, e.g.
  * left shoe & right shoe, sleep & lying down, food & drink

• Indicates “some” co-existence relationship, e.g.
  * linearly associated (-ve or +ve)
  * co-dependent or independent

• But correlation has nothing to do with a cause-and-effect (C-A-E) relationship!

Example: After a field survey, you have the following data on the distance to work and the distance to the city for residents of the J.B. area. Interpret the results.
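The survey data for this example are not reproduced in these notes. The sketch below therefore shows how a correlation coefficient is computed and exercises it on toy inputs only. It covers Pearson's r for interval/ratio data and Spearman's rank correlation from the summary table, computed as Pearson's r on average ranks. The function names are hypothetical, and only the standard library is used. A high |r| still says nothing about cause and effect.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of positions i+1 .. j+1
        for k in range(i, j + 1):
            result[order[k]] = avg
        i = j + 1
    return result


def spearman_rho(x, y):
    """Spearman rank correlation: Pearson r computed on the ranks."""
    return pearson_r(ranks(x), ranks(y))


print(pearson_r([1, 2, 3, 4], [2, 4, 6, 8]))        # perfectly linear toy data
print(spearman_rho([1, 2, 3, 4], [10, 20, 30, 1]))
```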

(21)

Test yourselves!

Q1: Calculate the mean and variance (standard deviation) of the following data:

Q2: Calculate the mean price of the following low-cost houses in various localities across the country:

PRICE - RM ‘000:       130  137  128  390  140  241  342  143
SQ. M OF FLOOR:        135  140  100  360  175  270  200  170

PRICE - RM ‘000 (x):    36   37   38   39   40   41   42   43
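The extracted text does not make clear which data row belongs to Q1 and which to Q2. This sketch therefore applies Python's standard `statistics` module to the PRICE - RM ‘000 row only as an illustration. The same calls work for the other rows. Note the two conventions for variance: sample (divides by n - 1) and population (divides by n).

```python
from statistics import mean, pstdev, pvariance, stdev, variance

# PRICE - RM '000 row of the data above
price = [130, 137, 128, 390, 140, 241, 342, 143]

print(f"mean                 = {mean(price):.3f}")
print(f"sample variance      = {variance(price):.3f}")   # divides by n - 1
print(f"sample std. dev.     = {stdev(price):.3f}")
print(f"population variance  = {pvariance(price):.3f}")  # divides by n
print(f"population std. dev. = {pstdev(price):.3f}")
```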

(22)

Test yourselves!

Q3: From sample information, a population of housing estates is believed to have a “normal” distribution, X ~ N(155, 45). What general adjustment gives the Standard Normal Distribution for this population?
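The general adjustment is the z-transformation: subtract the mean and divide by the standard deviation. Written out, this assumes the usual convention that the second parameter of (155, 45) is the variance σ². If 45 is meant as the standard deviation, divide by 45 instead of √45.

$$Z = \frac{X - \mu}{\sigma} = \frac{X - 155}{\sqrt{45}} \sim N(0,\,1)$$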

Q4: Consider the following ROIs for two types of investment:

A: 3.6, 4.6, 4.6, 5.2, 4.2, 6.5
B: 3.3, 3.4, 4.2, 5.5, 5.8, 6.8

(23)

Test yourselves!

Q5: Find:

(AGE > “30-34”)

(AGE ≤ 20-24)

(24)

Test yourselves!

Q6: You are asked by a property marketing manager to ascertain whether or not distance to work and distance to the city are “equally” important factors influencing people’s choice of house location.

You are given the following data for the purpose of testing:

Explore the data as follows:

* Create histograms for both distances. Comment on the shape of the histograms. What is your conclusion?
* Construct a scatter diagram of both distances. Comment on the output. Explore the data and give some analysis.
* Set a hypothesis that the means of both distances are the same. Make
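For the hypothesis of equal means, one option for interval/ratio data is a t-test. This sketch assumes that both distances were recorded for the same respondents, which makes the observations paired. If the data are instead two independent samples, an independent two-sample t-test applies. `paired_t` is a hypothetical helper built on the standard library. The toy input is illustrative only, because the survey data are not reproduced here. Compare |t| with the tabled t value for n - 1 d.o.f.

```python
from math import sqrt
from statistics import mean, stdev


def paired_t(x, y):
    """Paired t statistic for H0: mean(x - y) = 0.

    x, y: the two distances for the same respondents, in the same order.
    Returns (t, degrees of freedom).
    """
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two paired samples of equal length >= 2")
    d = [a - b for a, b in zip(x, y)]  # per-respondent differences
    s_d = stdev(d)                     # sample std. dev. of the differences
    if s_d == 0:
        raise ValueError("all differences are equal; t is undefined")
    return mean(d) / (s_d / sqrt(len(d))), len(d) - 1


# Toy input only, not survey data
t_stat, dof = paired_t([5, 7, 9], [4, 5, 6])
print(f"t = {t_stat:.3f} with {dof} d.o.f.")
```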

(25)

Perception about Influence of New Neighbourhood

| Degree of perception \ Locality | Bblaut | Patau1 | Patau2 | Racha2 | Total |
|---|---|---|---|---|---|
| Not worried at all | 17 | 30 | 24 | 9 | 80 |
| Not so worried | 6 | 0 | 2 | 14 | 22 |
| Worried | 6 | 0 | 3 | 4 | 13 |
| Quite worried | 1 | 0 | 0 | 2 | 3 |
| So worried | 0 | 0 | 1 | 1 | 2 |
| Total | 30 | 30 | 30 | 30 | 120 |

Q7. You have surveyed a group of local people and asked them to express their feelings about a new project that will attract a new population and thus a new

(26)

Test yourselves! (contd.)

Q7: From your initial investigation, you believe that tenants of “low-quality” housing choose to rent particular flat units just to find shelter. In this context, these groups of people do not pay much attention to pertinent aspects of “quality of life” such as accessibility, good surroundings, security, and physical facilities in the living areas.

(a) Set out your research design and data analysis procedure to address the research issue.

(b) Test your hypothesis that low-income tenants do not
