• No results found

Database preparation and Abstract/Manuscript 101

N/A
N/A
Protected

Academic year: 2021

Share "Database preparation and Abstract/Manuscript 101"

Copied!
52
0
0

Loading.... (view fulltext now)

Full text

(1)

Database preparation and

Abstract/Manuscript 101

Jennifer F. Tseng, MD, MPH

Chief, Division of Surgical Oncology, BIDMC BWH Residents’ Conference, Boston, MA

(2)

EXCELlent Colleagues

Bruce A. Barton, PhD Aimee Knoll, MS

Heena Santry, MD, FACS

(3)

Plan

• 10,000 foot view of starting your own database

• Pitfalls

• 10 step program for Excel (after Barton)

(4)

Conveyor Belt of Research

(5)

The Zen of Prospective Databases,

after Bruce Barton

• A prospective database is like a slow-moving conveyor belt

• At the beginning, you have the design of the product (hypothesis & study design)

• Once approved by Management (a.k.a., IRB, funding), then the conveyor belt starts up

• Every time you collect a piece of data, you place it on the conveyor belt

(6)

More Zen

• As the data move down the conveyor belt, it is slowly cleaned and polished

• The data are not ready for the “final step” (analysis) until the very end of the belt

(7)

Non-Zen

from Murray Brennan

• “It will take you 3 to 5 years before a

prospective database is of any use to you.”

– (At a Meet the Professors breakfast, circa 2005)

(8)

Basics of databases

• Why?

• How?

(9)

Basics of databases: Why?

• Hypotheses determine data points

– Write dummy methods, results, table shells • Research vs. Quality

• Individual vs. part of a joint or departmental effort

• Actively seek examples of similar databases to determine which fields are important

(10)

Basics of databases: How?

• Which program?

– Excel

– Access

– SPSS

– Filemaker Pro, Oracle, Sequel etc.

• Prospective vs. retrospective vs. hybrid

(11)

Basics of databases: How?

• Nitty-gritty

– Who will support ($, time)

– How will patients be captured?

• Point of entry

– Default maintenance

• Timing of updates

– The simpler the strategy, the more likely to succeed

• Test system for flaws prior to committing major time and resources

• Database will never be perfect

(12)

Basics of databases:

personnel

• Single most important individual: you, the PI

• It takes a village:

– Dedicated data entry staff

– Biostatistician

– Outcomes / Health services researchers

– Computer scientists

– Research residents, fellows

(13)

10 Step Program for Excel

• Excel: easy, available, readily export to SAS, STATA, SPSS

• Plenty of opportunities to get in trouble

1 3

(14)

Step 1: Column = variable

• Single row for column headings

– Column headings = variable names

– “Please enter your gender (check all that apply)” is not a good variable name

• Keep it short and sweet

– Sex

– Age

1 4

(15)

Columns: concise, unique variable names

ID Age_Time1 Age_Time2 1 64 64 2 63 65 3 57 58 4 71 74 ID Age in Months at First Timepoint Age in Months at Second Timepoint 1 64 64 2 63 65 3 57 58 4 71 74

(16)

Datasets are not tables:

Transposed data confuses statistical programs!

ID 123 456 789 234 567 890 Age 54 56 50 51 59 53 Gender M M M F F M Weight 174 172 197 141 135 160 Height 69 72 70 64 62 70 Score1 5 6 3 9 8 5 Score2 7 8 2 7 6 7 C1 Frequency % Age 1 14.29 Gender 1 14.29 Height 1 14.29 ID 1 14.29 Score1 1 14.29 Score2 1 14.29 Weight 1 14.29 Excel Dataset SAS Dataset

(17)

ID Age Gender Weight Height Score1 Score2 123 54 M 174 69 5 7 456 56 M 172 72 6 8 789 50 M 197 70 3 2 234 51 F 141 64 9 7 567 59 F 135 62 8 6 890 53 M 160 70 5 7 ID Frequency % 123 1 16.67 234 1 16.67 456 1 16.67 567 1 16.67 789 1 16.67 890 1 16.67 Gender Frequency % F 2 33.33 M 4 66.67 Excel Dataset SAS Dataset

(18)

Step 2: One item per cell

/ column

• Put only one item of information in a cell/column

• Examples

– For blood pressure, put 120 not 120 mmHg or 120/80 or 120? or 120 8AM

– Make another variable for additional information (diastolic; BP_time)

1 8

(19)

More variables make for simpler spreadsheet

ID Age and Gender

1 64F 2 63M 3 57F 4 71F 5 75M 6 65M ID Age Gender 1 64 F 2 63 M 3 57 F 4 71 F 5 75 M 6 65 M

(20)

Avoid putting too much information in one field ID Symptoms

1

Cough, fever (100.3), sore throat, fatigue

2 None

3

Cough, Sore throat, Sore muscles, Vomiting 4 Sore throat, fever (101.8)

5 None

6

Sore throat, weakness, fatigue

ID Symptom1 Symptom2 Symptom3

1 Cough Fever Sore throat

2 None . .

3 Cough Sore throat Sore muscles 4 Sore throat Fever .

5 None . .

(21)

Another example…

ID Pathology

1 SB Resection: Meckel's diverticulum with ectopic gastric mucosa. Focal fibrosis, regenerated

mucosa at suture site- possibly a healed ulcer. 2 SB Resection: Angiodysplasia; Lymph Nodes with

reactive hyperplasia; jejunal diverticulitis 3 SB Resection: Ulceration, acute and chronic

inflammation; Thickened submucosa

Researcher’s Conclusions: Normal: 0%; Superficial Ulcer/inflammation (not beyond submucosa): 43%; Deep Ulcer/inflammation (not beyond

muscularis propria): 14%; Transmural ulcer/inflammation: 14%; Meckel's diverticulum 14%; Angiodysplasia: 14%

(22)

Step 2A: Think numeric

• To a computer, there are two types of data: numeric and character

• Any non-numeric character, including a space, makes computer read as text not #

• Use numbers whenever possible (e.g. 1/0 rather than y, Y, Yes YES; n, N, No, NO)

2 2

(23)

Keep units of measurement consistent ID Temperature Height 1 38.8 181 2 37.1 175 3 98.6 72 4 39.0 170 5 36.9 152 6 37.0 161

(24)

If units cannot be consistent, create a variable to clarify

ID Temperature Temperature_Units Height Height_Units

1 38.8 Celsius 181 Centimeters 2 37.1 Celsius 175 Centimeters 3 98.6 Fahrenheit 72 Inches 4 39.0 Celsius 170 Centimeters 5 36.9 Celsius 152 Centimeters 6 37.0 Celsius 161 Centimeters

(25)

Maintain consistency with free text: Check capitalization and spelling!

(preferred: assign a number in codebook) ID Symptoms 1 Cough 2 None 3 COUGH 4 Sore throat 5 None 6 Sore throt Symptoms Frequency Cough 1 COUGH 1 None 2 Sore throat 1 Sore throt 1

(26)

Step 3: Define “missing”

• Indicators of missing or not available

– Use a number to indicate missing – like 999 – something that is not a possible measurement (like 99 for BP)

– Do not leave blank

– Do not use zero

– Do not use NA or UNK – see Step 2

2 6

(27)

Step 4: Format cells:

numbers

• Format cells with numbers in numeric format, not a “general” or “text” format (default).

• If leading zeros necessary (like an ID number of 01-001), then needs to be text (explain in code book; involve data analyst)

2 7

(28)
(29)
(30)

Step 4A: Format dates/times

• Dates and times – fraught with pitfalls for transfer

• Format in Excel to a valid date or date:time or time format

• Dates are stored as the number of days from some arbitrary date and time as the number of second from midnight

3 0

(31)

Step 5: Don’t use text formatting

• Computers can’t read colors or fonts

• Make a new column to modify or group

– Example: instead of highlighting all “done”

entries, make a column labeled “done” and enter 1 / 0

3 1

(32)

What not to do:

Statistical programs are colorblind

Var1 Var2 Var3 Var4 Var5 Var6 Var7 Var8

3/25/10 286-2 Left 5.5M 0.181 1139.0637 286-2 Right 5.5M 0.181 1122.2231 6/24/10 18.2-4 Left 3.0M 22 0.184 0.836% 1173.1429 5332%

(33)

Step 6: Data only

• Keep data area clean with just data

– Summary statistics go elsewhere—in another sheet or a clearly marked temporary copy

• SAS will try to read any statistics in the same columns as data

3 3

(34)

Step 7: Compare apples to apples

• Include only related items on a row, e.g. from one patient, or from one admission

• Multiple measurements may require

additional variables (columns); or potentially, additional rows.

3 4

(35)

Step 8: Study ID

• Always include ID information on every row for each patient

– Never include PHI in your spreadsheet

– Assign a “study ID” and use that on file

– If multiple rows of data for a patient, make sure that each row has an ID, date, and any other

information that will uniquely identifies that observation

3 5

(36)

Step 9: Codebook

• Create a codebook or data dictionary including each column in your data set

• Complete name for each column

– Example: TC = “Total Cholesterol (mg/dl)” • Include the meaning of each code

– Example: 1 = Low, 2=Normal, 3=High…

• Critical for data entry and data analysis

3 6

(37)

Create a codebook!

ID Pathology1 Pathology2 Pathology3

1 2 3 4 2 3 4 . 3 3 . . Variable Values Pathology 1= Normal 2= Superficial Ulcer 3= Deep Ulcer 4= Inflammation

(38)

Step 10: Name game

• Give spreadsheet a descriptive name and include your name and date (and a brief description) in Properties

– Example: Curren_AAS_092912.XLSX

– Not data.xlsx

3 8

(39)

Warnings and Pitfalls

• When in doubt, get IRB approval, amendment or waiver, even for “quality” projects

• Save your data on locked folders on secure servers with regular backups. Never on laptops. Never on portable media. Never on local drives.

• Access should be limited to IRB-approved coinvestigators.

• Do not email or otherwise share PHI in a non-secure fashion, even with coinvestigators.

(40)

Final thoughts

• When in doubt, consult a biostatistician

• Remember your hypotheses:

– Type 3 error: the right answer to the wrong question

– George Reed, PhD, UMass

• Never, never, never give up!

4 0

(41)

Part 2:

(42)

Abstract: Plan of attack

• Hypothesis first (& IRB and CITI training)

• Literature search (Pubmed; meetings; Google)

• Plan methods second

– Experiments; data sources

– Pitfalls

– Statistical tests

• Results

– Empty table shells

(43)

Abstracts for dummies

• Write dummy abstract

– Data-free

• Review with PI, colleagues, data analysts, biostatisticians

• Troubleshoot prior to experiments / programming

• Rewrite, re-review, re-circulate

(44)

Abstracts: good, bad, & ugly

• Good

– P.R. for resident, attending

– Meetings are fun

– Deadlines help productivity • Bad & ugly

– Abstract but no paper = wasted effort

– Scooped!

(45)

Abstracts: good, bad, & ugly

• Good

– P.R. for resident, attending

– Meetings are fun

– Deadlines help productivity • Bad & ugly

– Abstract but no paper = wasted effort

– Scooped!

(46)

Manuscript 101

(47)

Manuscript 101

• Never start with the introduction

“As we approach the millenium . . . .”

(48)

Manuscript 101

• Know your audience

– Journal; readership; state of the field

• Start with methods and results

– Methods (steps and numbers, not prose)

• Statistics

– Results

• Empty table shells first

• Limited prose – Outcomes

– Univariate

– Multivariate

(49)

Manuscript 101

• Introduction

– Brief is better

– 3 paragraphs max (can always expand later) 1. The problem

2. Current state of the field

(50)

Manuscript 101

Discussion: the SOAR Seven Paragraphs

1. What we’ve found

2. What others have found previously

3. How our results extend/better/change #2 4. Limitations of our work

5. BUT . . . Still important because 6. Summary

(51)

Support

– ASCO (PanCAN/Samuel Stroum) – American Surgical Association – Howard Hughes Medical Institute – American Cancer Society

– NIH (UMass CTSA) – BIDMC

– Philanthropy

• Pancreatic Cancer Alliance

• www.pancreaticalliance.org

• Verville, Maney family foundations • Grateful patients

(52)

References

Related documents

5 H SERIES TOPIC 35 Fractons Mathletics © 3P Learning Ltd  Word problems with fractions. (i) In a group of eighteen friends, one third are girls and one sixth of these girls

[r]

The causes include cold delivery room (temperature <25 0 C) and resuscitation corner, using air conditioner and/or fan, keeping the windows/doors open especially in winter,

The recent outbreak of novel coronavirus dis- ease-2019 (COVID-19), caused by the severe acute respiratory syndrome coronavirus 2 (SARS- CoV-2), emerged in Wuhan, China, in

Include Execution Range in Prices Display Show prices in the FX Board module with the execution range defined for the currency pair and size, giving you an immediate indication of

The VALUE statement defines a character or numeric format, PICTURE defines the appearance of numeric variables, and INVALUE defines input formats.. When using this format variables

 When there is large amounts of data in text format, and you want convert them to numeric format, you can do the following: On the Home tab, in the Cells group, select Convert Text

Year range in model includes a complete range for all systems with new or updated functionality... sensor 2010 ● Zero compensation of