• No results found

Analyzing Research Data Using Excel

N/A
N/A
Protected

Academic year: 2022

Share "Analyzing Research Data Using Excel"

Copied!
74
0
0

Loading.... (view fulltext now)

Full text

(1)

Analyzing Research Data Using Excel

© Fraser Health Authority, 2012

The Fraser Health Authority (“FH”) authorizes the use, reproduction and/or

modification of this publication for purposes other than commercial redistribution. In consideration for this authorization, the user agrees that any unmodified

reproduction of this publication shall retain all copyright and proprietary notices. If the user modifies the content of this publication, all FH copyright notices shall be removed, however FH shall be acknowledged as the author of the source publication.

Reproduction or storage of this publication in any form by any means for the purpose of commercial redistribution is strictly prohibited.

This publication is intended to provide general information only, and should not be relied on as providing specific healthcare, legal or other professional advice. The Fraser Health Authority, and every person involved in the creation of this publication, disclaims any warranty, express or implied, as to its accuracy, completeness or currency, and disclaims all liability in respect of any actions, including the results of any actions, taken or not taken in reliance on the information contained herein.

(2)

22

http://research.fraserhealth.ca/

(3)

Objectives

 To review key concepts and elements of quantitative research

 To explore the application of Excel in conducting a research project:

 Creating data files

 Creating data dictionary

 Linking research question to appropriate analysis

 To apply statistics to analytical data

(4)

44

Workshop Outline

09:00 - 09:15 Review of Quantitative Research 09:15 - 09:30 Measurement

09:30 - 10:30 Excel 101

Excel as database 10:30 - 10:45 Break

10:45 - 11:15 Using Excel to Clean/Explore Data 11:15 - 12:00 Using Excel to Analyze Data

(5)

Excel Pop Quiz

How many columns?

 2003: 256

 2007 & 2010: 16,384

How many rows?

 2003: 65,546

 2007 & 2010: 1,048,576 1,048,576

True or False

 You can conduct statistical analyses in Excel?

(6)

66

Quick Review of Quantitative

Research

(7)

Framework for Quantitative Research

Conduct literature review Develop rationale

Formulate research question

Generate objective(s) and/or hypothesis

Apply methods and conduct

Why do want to do this research?

What do others say?

What are knowledge gaps?

PICO Method

P = population / patient I = intervention

C = comparison O = outcome

Hypothesis

(Usually) statement of anticipated results

Objective

“Action” statement

Measurement Study Design

(8)

88

Research Question

(9)

Measurement: “Thinking in Numbers”

IDID GenderGender AgeAge Disease Disease Outcome Outcome

1 Male 59 Y

2 Female 52 Y

3 Male 53 N

4 Female 60 N

IDID GenderGender AgeAge Disease Disease Outcome Outcome

1 1 59 1

2 2 52 1

3 1 53 0

4 2 60 0

From this To this

(10)

1010

Types of Variables

Independent variable (IV)

Influences your outcome measure

Active (intervention) or Attribute (characteristic) Dependent variable (DV)

Influenced by the IV(s)

Usually represents outcome studied Confounders

Alternative explanation for an association between an exposure (IV) and an outcome (DV)

Not a focus of the study

Independently associated with the outcome

Associated with the exposure under study

(11)

Level of Measurement

Nominal

Example: gender

Data categorized into mutually exclusive and unordered groups

Can assign number codes but calculations would be meaningless (male=1; female=2)

Ordinal

Example: SES level: low, middle and high income

Data classified/categorized with implied order

Distance between data not always equal

Can't measure the magnitude or quantify the

difference between data: how much lower is middle from higher income?

(12)

1212

Interval: attributes measured on interval scales

Equal distance between each interval

Distance between scale numbers has meaning

Arbitrary zero point (e.g., temperature)

Ratio: similar to interval scale

Has true zero point

Clear definition of 0: There is none of the variable

Example: weight, salary ($0=$0).

Can make assumptions about the ratio of two measurements – 6 grams is twice as much as 3 grams

Level of Measurement

(13)

Level of Measurement & Acceptable Statistical Operations

Nominal Ordinal Interval Ratio

Frequency distribution Yes Yes Yes Yes

Mode Yes Yes Yes Yes

Median & percentile No Yes Yes Yes

Mean & standard deviation No No Yes Yes

(14)

1414

Excel 101

(15)

Objectives

 How to organize data in an MS Excel spreadsheet

 How to define variables

 How to code in preparation for analysis

(16)

1616

Terminologies

Data: Information that you collect

Dataset

Collection of data usually presented in tabular form

Columns represents variables

Rows represent members of the dataset

Spreadsheet

Computer application that facilitates use of datasets (enter data, analyses, sharing)

MS Excel is a spreadsheet program

(17)

Using Excel for a Research Study

To capture data

Facilitate data collection, minimize entry errors

To clean/explore/describe data

Starting point for analyses is “cleaning” raw data

Preliminary descriptive statistics

To analyze data

Using program “add-ins”

(18)

1818

Stages in preparing data for analysis

Interpret results Explore data

Create data file: Enter & clean data Collect data

Analyze data

(19)

Stages in preparing data for analysis

Good practice

 Design your spreadsheet keeping your statistical analyses in mind

 Use logic check to clean data

 Create data dictionary

 Consult with analyst

(20)

2020

Creating data file using spreadsheet

Each variable (e.g., ID #) represented by a column

Each participant is represented by a row

All the information for a single case is entered across one row only

the data in each column summarizes information on a particular variable

(21)

Creating Your Spreadsheet

Defining variables

Use descriptive, unique, names for variables

Use underscore (_) instead of space

Be consistent in naming especially with array variables

Variables that capture a pattern

Example: measuring blood pressure at regular intervals (e.g., Q30 min for 12 hours)

BP_time1 BP_time2 BP_time3

BP_30min BP_1hr BP_final

(22)

2222

Creating Your Spreadsheet

Each column captures only single piece of information

ID

Intervention_Used

1 2 3

Yes surgery Yes medication

No

ID Intervention_Used Intervention_Type 1

2 3

Surgery Medication Not applicable Yes

Yes No

(23)

Creating Your Spreadsheet

Use tools to facilitate entry: Add notes for data entry Insert  Comments

(24)

2424

Creating Your Spreadsheet

Use tools to facilitate entry: Colour-code columns Fill color icon

(25)

Creating Your Spreadsheet

For numeric variables, use Format to “force” entry into specific form

Date

Number of decimals

Highlight entire column to ensure consistent format applied

(26)

2626

Data Dictionaries

AKA code books

A “centralized repository of information about data such as meaning, relationships to other data, origin, usage, and format”

Develop data dictionary: Before collecting data

Components:

Variable name

Variable label

Type: nominal or interval

Values/coding for each variable

Always assign values for missing or non applicable cases

(27)

Sample Data Dictionary

Data Dictionary for Research Consults Database

Item Variable Label Type Coding/ entry instruction 1 ID_no ID number Numeric Enter unique number for each 

record/patient (1….100)

2 Gender Gender of participant String Enter corresponding number:

0= Female 1= Male

999= missing

3 Date_T1 Date at baseline  Date Specify format: mm/dd/yy 4 Education  Highest level of 

education

Numeric Enter corresponding number:

1= Grade 10 2= Grade 12

3= College Diploma 4= University

(28)

Data Structures

 Re-emphasize importance of keeping

analyses in mind when designing dataset

 Statistical packages require that data are entered in specific way in order to run

analytical steps

2828

(29)

Using Excel to Clean / Explore

Data

(30)

3030

Data Cleaning & Data Exploration

 Data at this stage is raw dataset

 Need to

 Clean: any entry errors, duplicate entries

 convert text variables into numeric variables

 Explore: any outliers

 Excel tools to facilitate cleaning and exploring

 Filter

 Sort

(31)

Data Cleaning & Data Exploration

Sources of data errors:

Missing: never leave blank

Assign a meaningful number for missing values

Consider coding for cases with non-applicable responses

Typing errors on data entry (e.g., age= 121)

Measurement errors (e.g., height) Identify errors:

Descriptive statistics for each variable

Minimum and maximum values

Means, medians and SD

(32)

3232

Our “Dataset” for the Remainder of

the Workshop

(33)

Gout and AMI Study

(34)

3434

Gout and AMI 2 Study

Research Question

What clinical factors are associated with the

development of AMI among elderly women with gout?

Objectives

To compare women with gout and no gout on clinical factors including age, BMI, uric acid level

To evaluate the correlation between clinical factors among elderly women with gout and without gout

(35)

Gout and AMI 2 Dataset

(36)

3636

Gout and AMI 2 Study

 Data dictionary (handout)

 13 Variables

 What are the continuous variables?

 What are the categorical variables?

 N = 200 subjects

(37)

Excel Tool: Filter

 Place cursor over the block by A1

 Data  Filter  AutoFilter (2003)

 Data  Sort & Filter

 Filter (2007)

(38)

Exercise 1

You are the analyst for the Gout and AMI 2 study.

A data dictionary was not implemented (GASP!) and there are entry errors.

 Clean the dataset by locating and finding entry errors using tools in Excel.

 Sample_worksheet_Exercise1.xls

3838

(39)

Summary

Important data considerations while conducting the study

 Designing the dataset

 Collecting the data

 Entering the data

 Cleaning/exploring raw data

 Applying statistics to analytical data

(40)

4040

Using Excel to Analyze Data

(41)

Using Excel to Analyze Data

 Analysis ToolPak: add-in to be installed in Excel

 Supplemental program that adds custom commands

 Descriptive statistics

 Analytical statistics

 T-tests

 Correlation

(42)

4242

Getting Started

1. Click Tools  Add-Ins

2. Select Analysis Tool Pack

(43)

 Statistics used to describe characteristics of study population/sample

 Not used to infer the properties of the population from which the sample was drawn

 For continuous variables

 Measures of central tendency: mean, median, mode

 Measures of variability (standard deviation)

 Shape of Distribution (skewness, kurtosis)

Descriptive Statistics

(44)

Level of Measurement & Acceptable Statistical Operations

Nominal Ordinal Interval Ratio

Frequency distribution Yes Yes Yes Yes

Mode Yes Yes Yes Yes

Median & percentile No Yes Yes Yes

Mean & standard deviation No No Yes Yes

4444

(45)

Descriptive Statistics

1.Click Tools  Data Analysis

2. Select Descriptive Statistics

(46)

4646

Example: Age

Input range

Select columns “$A:$E”

Group by columns

Click on Labels in 1st row

Output options

New worksheet

Summary statistics

(47)

Output for Descriptive Statistics: Age

(48)

4848

Exercise 2

Data file =

sample_worksheet_Masterfile.xls

Using Descriptive Statistics in Analysis ToolPak, obtain descriptive statistics for:

 Uric Acid

 BMI

(49)

Descriptive Statistics by Group

Use Pivot Tables

 Useful to obtain descriptive statistics by group

 For example, if you wanted to know the average and standard deviation of BMI for men and women

(50)

5050

Pivot Tables

Click Data  Pivot Table and Pivot Chart

(51)

Example Age by Gout Diagnosis

Step 1

 Where is data?

 What kind of report?

(52)

5252

Step 2

 Where is data you want to use?

 Drag pointer over entire dataset

Example Age by Gout Diagnosis

(53)

Step 3

 Where do you want to put the Pivot Table?

 Layout

This is where you tell Excel which groups you want to output your results by and for what variables

Example Age by Gout Diagnosis

(54)

5454

Step 3 Layout

Row represents your grouping variable (Gout_Dx)

Column variable you want to output according to groups

May need to drag several times for parameters

needed

Example Age by Gout Diagnosis

(55)

Example Age by Gout Diagnosis

No gout

Gout

(56)

5656

Exercise 3

Using Pivot Tables and Charts, obtain the

mean and standard deviation for uric acid

according to gout diagnosis

(57)

Analytic Statistics

Statistical procedures used to draw conclusions about a population from sample data

 Compare groups

T-tests

 Evaluate correlation

Correlation coefficients

 Evaluate association

Regression models

(58)

5858

Analytic Statistics: Considerations

Research question: Describe, compare or predict?

Levels of data measurement: nominal, ordinal or interval?

Are you comparing same or different subjects?

Number of experimental groups?

Is your data normally distributed?

What are the assumptions of the statistical test you would like to use?

(59)

Check Data Assumptions

What are the assumptions of the statistical test you would like to use?

Some common assumptions are:

The DV and IV will need to be measured on a certain level (e.g. continuous)

The population is normally distributed (not skewed)

(60)

6060

1. Research goal

2. Identify ID and DV

3. Describe level of the data 4. Identify the # & group

pairing groups

5. Check data assumptions

Type of Dependent Variable Data

Goal Continuous

Normal

Ordinal Non-normal

Categorical

Describe one group

Mean, SD Median, interquartile range

Proportion

Compare one group to a hypothetical value

One-sample t test Wilcoxon test Chi-square

Compare two unpaired groups

Unpaired t test Mann-Whitney test Fisher's test (chi-square for large samples) Compare two

paired groups

Paired t test Wilcoxon test McNemar's test

Compare three or more unmatched groups

One-way ANOVA Kruskal-Wallis test Chi-square test

Compare three or more matched groups

Repeated-measures ANOVA

Friedman test Cochrane Q

Quantify association between two variables

Pearson correlation Spearman correlation Contingency coefficients

Predict value from another measured variable

Simple linear regression or

Nonlinear regression

Nonparametric regression

Simple logistic regression

Predict value from several measured or binomial variables

Multiple linear regression or

Multiple nonlinear regression

Multiple logistic regression

Selecting Appropriate Statistical Test

Statistical decision tree (handout)

(61)

Exercise 4 (handout)

A pilot experiment designed to test the effectiveness of a new therapy to pain management for patients with chronic pain, conducted over a one year time period.

What is the goal?

What is the IV?

What is the DV?

How many groups?

Paired/matched or independent?

What is the level of measurement?

(62)

6262

Comparing groups: Mean differences

Independent Samples T-Test

 Comparison of the means of 2 non-paired groups

 Differences in pain levels between 2 groups (standard care and new intervention)

Paired Samples T-Test

 Comparison of means of 2 paired measures

 Differences in pain levels within groups

 Pre and post measurement, repeated measurement under different conditions

(63)

Comparing Groups

 Sort dataset by variable you are comparing

 Click Tools  Data analysis

 Three options for t-tests

(64)

6464

Is there a difference in age between gout and non-gout patients?

 Highlight age data for first group (no gout)

 Highlight age data for second group (gout)

 Output to new worksheet

(65)

Output

No gout Gout

(66)

6666

Paired Samples T-Test Procedure

Tools Data Analysis T-test: Paired Two Sample for Means Input 1 Range: DV at time 1 Input 2 Range: DV at time 2 Output Options

(67)

Exercise 5

Using Descriptive Statistics in Analysis ToolPak, compare patients with no gout versus gout according to:

 Uric Acid levels

 BMI

(68)

6868

Exercise 6

Using Descriptive Statistics in Analysis ToolPak to answer the following question:

Is there a difference in mean pain before

and after surgery?

(69)

Associate - Correlation

Allows an examination of the association between variables

Range: 0 to +1

Information about the strength of association

Information about the direction of the association (positive or negative)

A correlation coefficient of 0 =no relationship

A correlation coefficient of +1= positive relationship

A correlation coefficient of -1= negative relationship

(70)

7070

Evaluating Correlation

 Continuous variables

 Columns:

side-by-side

 Click Tools  Data Analysis

 Click Correlation

(71)

What is Correlation Between Age and Uric Acid Level?

 Highlight 2 columns

 Group by columns

 Labels in first row

 Output to new worksheet

(72)

7272

Output

(73)

Exercise 6

Using Descriptive Statistics in Analysis

ToolPak, evaluate the correlation between:

 Uric Acid and BMI

(74)

7474

Thank You!

References

Related documents

The Chart Options menu is where you enter the labels for your plot (Fig. 9) For Chart Title, type in Lineweaver-Burk Plot.. • Enter 1/S for the Value

Please rate the quality of the workshop: Strongly Disagree Somewhat Disagree Neutral Somewhat Agree Strongly Agree 1. The handouts and

Registration will include a 7-day backpack adventure, prizes, float plane, 1-night stay at Northern Rockies Lodge, breakfast/dinner at Northern Rockies Lodge, dip in the Liard

Figure 2 shows wiring connections necessary to power a Model 3051 and enable communications with a hand-held HART communicator.. For low-power transmitters, refer to the

Red Hat CloudForms OpenShift by Red Hat Other applications Handhelds Raw data Control tier Thousands of instances Device tier Millions of instances Sensors Control points

a) Identify Ms Excel tools used for data analysis b) Analyze data using the tools.. Ms Excel for Data Analysis.. • Ms Excel provides powerful tools that for data analysis

In the Create PivotTable dialog box, under Choose the data that you want to analyze section, do the following:.  Select Use an external data source

NOTE: Any schedule entries made before the upload will be deleted when the Excel file is uploaded – this process OVERWRITES any previous information.. The template can be