• No results found

Creating a Database. Frank Friedenberg, MD

N/A
N/A
Protected

Academic year: 2021

Share "Creating a Database. Frank Friedenberg, MD"

Copied!
31
0
0

Loading.... (view fulltext now)

Full text

(1)

Creating a Database

(2)

Classic Data Management Flow

for Clinical Research

Scientific Hypotheses

Identify Specific Data Elements Required to Test Hypotheses

Data Acquisition Instruments (optional) (paper forms)

Computer Program for Data Collection and Organization (e.g. Access)

Output to Analytical Software (e.g. SPSS)

(3)

Define data set to acquire, create codes (e.g. F=0, M=1) Paper Data forms

Enter data into computer

Transfer to statistical and presentation software Do statistical analysis Prepare graphs

and tables

Data Management Process

Data quality control (check for missing, out of range and illogical responses)

(4)

Differences between a clinical and

research database

Clinical database (e.g. PACS®)

– Report oriented so data is displayed for clinical decision making (e.g. abdominal X-ray report)

– Emphasis on displaying or reporting of individual data rather than accumulating multiple records

– Not a lot of variables recorded

Research database

– Table oriented so that data is accumulated for eventual export to a statistical package for data analysis and

reporting

– Less emphasis on individual records

– Create Forms to represent tables that are storing information

(5)
(6)
(7)

Databases versus Excel

Excel has some capabilities to sort data but its primary

function is to create financial spreadsheets

Can be used for small and limited research data sets

and simple lists

Databases (e.g. ACCESS) are designed to collect, sort,

and manipulate data

Can create Data Quality Control features that ensure

valid data is entered and missing data is eliminated

A relational database allows for linking of an

unlimited number of tables and therefore an

unlimited number of variables

(8)

Id Name Age 10 Smith 50 11 Jones 55 12 Doe 60 ID Weight (lb) Weight (kg) 10 230 104.5 11 212 96.4 12 199 90.4 ID KCAL KCAL/kg 10 2400 23.1 11 2652 27.5 12 2350 25.9

Relational Database Example

ID V02 V02/kg 10 2.8 26.7

11 3.2 33.1

12 2.1 23.2

Subject Info Anthropometrics

(9)

Relationships Between Tables

(10)

Databases versus Excel

Databases are more suitable for importing

data from multiple sources

• Imports of different data types (e.g. SAS files and Dbase files) into different tables can be linked via common

identifiers such as subject ID

• Merging multiple data sources into Excel can be a challenge

(11)

How is a database organized?

• Multiple linked tables

• Tables store records (rows in the database)

• Tables have a collection of fields (these are the columns)

– Patient identifiers

(12)

Table = Records and Fields

Re

co

rd

s

Fields

(13)
(14)

Advantages of a Database

Collection of data in a centralized location

Data stored so as to appear to users in one

location

(15)

Sharing and Exchanging Data

Multiple users can access the same database

via a network

Can be local or over the internet

Access via a client application or web interface

Server allows remote access over the internet

from anywhere

(16)

Database Design Considerations

What to collect

What questions are to be answered?

Think of the data tables in your future publications

• Focus on the key data elements rather than collect as much as possible

What statistical package will be used?

May need to format the data to accommodate

program it will be exported to

– For example, gender can be recorded in the database as M or F but statistical package may require 0 and 1

(17)

What needs to be in the research database?

Research variables directly related to the

hypotheses being tested-

YES

Clinical measures used for screening-

MAYBE

– Blood work, ECG, medical history

Administrative data-

NO

– Contact information

(18)

Examples of Data Collection Forms

• Paper – then into Excel; tedious, mistakes common

• Scantron - Must scan in; many opportunities for mistakes

• Directly into Excel -tedious

• Direct computer entry via forms

• Allow real-time validity/completeness checks, prompts

(19)

Designing the Questions

• Try to collect continuous data – convert to categorical during analysis period. (e.g. age)

• Use validated scales/instruments

• Don’t build your own unless unavoidable

• Consider asking in more than one way a question concerning a critical variable under study. (allows for validity assessment)

• Avoid measurements that cluster at one position or one end of a scale

• E.g. measuring body temperature on healthy outpatients

(20)

What is Data Coding?

A group of letters, numbers, or symbols and

the rules that form a link to a specific

terminology (e.g. male =1, female =2)

Coded references should be incorporated into

a data dictionary

Dictionaries should be based on standardized

terminology

e.g. if doing a study on cirrhotics categorize

(21)

Why code?

• Forces data into analyzable structure, format

• Speeds data

input/transcription

• Vastly simplifies data analysis/reporting

(22)
(23)

Rules for Data Entry

• Decide which variables require a number or string code

– e.g. code subject’s name or MRN as a string variable

• Continuous values are entered directly

• Missing values must be different values from a real response

– Common formats are “99” or bullets “·” Don’t use “99” if the variable is a continuous data field!

– Don’t know is a response—do not leave blank. Have a code for “don’t know”

– Don’t use “0” for missing, just leave blank

(24)

Avoid open-ended questions in

subject/patient surveys

What is your gender?___________________

-correct responses could be man, woman, male, female

What is your level of education?__________

-the answers 9

th

grade, did not finish high school, and

no college education all are correct.

-provide specific choices for reply.

(25)
(26)

Be Careful With Scales

(27)
(28)

Data in Spreadsheet

Finding Errors

Subject ID Gender Age 1001 Male 52 1002 Male 54 103 Mael 65 1004 Female 54 5 Female 52 1006 Female 52 1007 Femele 75 1008 Male 48 1009 M 37 1010 Female 73 11 F 54

(29)
(30)

Exploratory Data Analysis

Explore why there is missing data

Is this a software or coding glitch?

Did you delete by accident? (always back-up)

Examine for unusual consistency /inconsistency

issues

Is every patient measured one month 6’10”

Good time to use histograms and calculate skewness

Find inadmissible ranges and codes

(31)

Attributes of Successful Data

Management

Attention to detail

Explicit structure and process

Robust designs

– Anticipate failures, lapses and mistakes

References

Related documents

6 Before Database System introduced, early manual system was used in 1950s and the previous year where data was stored as paper records. Therefore, this system is

z 1100 ACTIVE client records z 1000 UNDUPLICATED clients z 980 DOB fields have data. z 500 Marital Status fields have

Figure 7.1 shows statements in SQL to create a database called DVD Rentals, and define a table called MEMBER with fields Member Number, Name, Address, Post Code and Telephone

Image and data files stored on a separate file server and referenced by pointers in the relational database system.... Oracle RDBMS J2EE™

When such a database is properly set up, a user may, depending on the available connection software (the database driver), create tables, input data, and query data.. Click on

Database Systems Design Example Relations EXAMPLE 1 Bank Schema Branch each Account Depositor Loan Borrower.. Collection management loan process and accession records Biodiversity

» A platform for code analysis based on code property graphs stored in a graph database. » Extensible

The students will learn to use managed code to implement database objects, such as stored procedures, user-defined data types, user-defined functions, and