Understanding Unduplicated Count and Data Integration

(1)

National HMIS Conference

September 14th and 15th, 2004

Understanding Unduplicated Count and Data Integration

Presenters:

Loren Hoffmann, System Administrator

WI Statewide HMIS

Ray Allen, Executive Director

Community Technology Alliance

(2)

Topics to be covered:

Types of fields - and data quality

Statistical Considerations

An “unduplicated count”
• Overcounts
• Undercounts

(3)

Data Quality:

General tests:

Completeness
• NULL vs “something”

Validity
• Is the data valid?

(4)

Data Quality:

Types of data fields:

• Picklists; yes/no
• Text: numeric, alphanumeric
• Date

(5)

Data Quality: Picklists

Validity - must be item from picklist

Completeness - response/no response

Who updates the list?
• System administrator or user

(6)

Data Quality: Date field

Valid date; NULL value

Determining validity:
• Control by format on entry
  • mmddyyyy, e.g. 10152003
  • mmmddyyyy, e.g. Oct152004
• Is the date a valid date?
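
The two entry formats above can be checked mechanically. The following is a minimal sketch, assuming the entry arrives as a plain string and that English month abbreviations are in use; the function name `parse_entry_date` is hypothetical and not part of any HMIS product.

```python
from datetime import date, datetime
from typing import Optional

# Entry formats from the slide, keyed by their exact string length:
# mmddyyyy (8 characters) and mmmddyyyy (9 characters).
FORMATS = {8: "%m%d%Y", 9: "%b%d%Y"}

def parse_entry_date(text: str) -> Optional[date]:
    """Return a date if the entry matches a known format and is a real
    calendar date; return None for NULL, malformed, or impossible dates."""
    text = text.strip()
    fmt = FORMATS.get(len(text))
    if fmt is None:
        return None
    try:
        return datetime.strptime(text, fmt).date()
    except ValueError:
        # Wrong format or an impossible date such as February 30.
        return None
```

Checking the length first matters because the parser would otherwise accept shortened entries such as a one-digit month.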

(7)

Data Quality: Text

Little that can be easily validated on a large scale

(8)

Data Quality: Completeness

For a given field, how many NULLs are there?
• For the entire database
• For a specified period of time
• For a given agency/user
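
As an illustration of the three scopes above, here is a minimal sketch that counts NULLs for one field, optionally restricted to an agency or a date range. The record layout (dictionaries with `agency` and `entered` keys) is an assumption made for this example only.

```python
from datetime import date

def null_count(records, field, agency=None, start=None, end=None):
    """Return (nulls, total) for `field` among the records that match
    the optional agency and entry-date filters."""
    selected = [
        r for r in records
        if (agency is None or r["agency"] == agency)
        and (start is None or r["entered"] >= start)
        and (end is None or r["entered"] <= end)
    ]
    # Treat both a missing value and an empty string as NULL.
    nulls = sum(1 for r in selected if r.get(field) in (None, ""))
    return nulls, len(selected)

# Hypothetical sample data.
sample = [
    {"agency": "A", "entered": date(2004, 1, 5), "dob": None},
    {"agency": "A", "entered": date(2004, 6, 1), "dob": "10-15-1973"},
    {"agency": "B", "entered": date(2004, 6, 2), "dob": ""},
]
```

Calling `null_count(sample, "dob")` covers the entire database, while passing `agency` or `start`/`end` narrows the universe.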

(9)

Unduplicated Count and the Client Identifier

To generate an unduplicated count (or to merge systems), most HMIS systems create and/or generate a common client identifier.

(10)

(UN) Duplicate Counts

How does your system manage the “unique client” or “unduplicated client count”?
• You need to know the algorithm used
• Evaluate the data elements that are used

(11)

Un-duplicated Counts

Two possible errors

It is not magic or foolproof.

• Undercount the number of clients: the system counts two client record entries as a single client when they are really two clients.
• Overcount the number of clients: the system counts two client record entries as two clients when they are really the same client.

(12)

Unique Client Count

(13)

Unique Client Count

Put all the clients in the same room and count them;

Make a list of all the clients that you know;

(14)

Unique Client Count

Using:
• Client first name
• Client last name
• Client date of birth
• Client gender

(15)

Un-duplicated Counts

An example: how many clients?

Using first name, last name, gender, date of birth
• William Smith, male, 10-15-1973
• Bill Smith, male, 10-15-1973
• William Smith, male, no DOB
• Consider: address, race, HH members,
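
The answer to "how many clients?" depends on the matching rule. The sketch below runs the three records from the example through two hypothetical rules: an exact match on all four elements, and the same match after mapping nicknames to a formal name. The nickname table is a one-entry assumption for illustration, not a rule used by any particular HMIS.

```python
# Hypothetical nickname table; a real one would be far larger.
NICKNAMES = {"bill": "william"}

records = [
    ("William", "Smith", "male", "10-15-1973"),
    ("Bill", "Smith", "male", "10-15-1973"),
    ("William", "Smith", "male", None),  # no DOB recorded
]

def exact_key(rec):
    """Match only when all four elements are identical (ignoring case)."""
    first, last, gender, dob = rec
    return (first.lower(), last.lower(), gender, dob)

def nickname_key(rec):
    """Same as exact_key, but nicknames are mapped to the formal name."""
    first, last, gender, dob = rec
    first = first.lower()
    return (NICKNAMES.get(first, first), last.lower(), gender, dob)

def count_unique(recs, key):
    """Count distinct clients under the given matching rule."""
    return len({key(r) for r in recs})

print(count_unique(records, exact_key))     # 3
print(count_unique(records, nickname_key))  # 2
```

Neither rule resolves the record with no DOB. Treating it as the same William Smith risks an undercount, and treating it as a different person risks an overcount.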

(16)

Statistical Considerations

Defining the universe:

• Number of client records in the system vs.
• Number of ACTIVE client records vs.
• Number of UNDUPLICATED client records vs.
• Number of valid responses for a given data

(17)

Defining the Universe:

Example:
• 1200 client records
• 1100 ACTIVE client records
• 1000 UNDUPLICATED clients
• 980 DOB fields have data

(18)

Defining the Universe:

DOB example: 980 of 1000 had a valid date, therefore:
• If 70% of the 980 records are 18+ (adults), then the actual share of adults on the system is between 68% and 72% (margin of error is 2%)
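
The arithmetic behind this example can be sketched as a worst-case calculation, assuming the 20 clients without a valid DOB could be all minors or all adults. The function name is hypothetical. Under that assumption the strict bounds come out slightly tighter than the rounded 2% margin quoted above.

```python
def share_bounds(valid, total, share_among_valid):
    """Worst-case bounds on a share in the full universe when only
    `valid` of `total` records carry a usable value."""
    known = share_among_valid * valid   # adults among records with a DOB
    missing = total - valid             # records with no usable DOB
    low = known / total                 # none of the missing are adults
    high = (known + missing) / total    # all of the missing are adults
    return low, high

low, high = share_bounds(980, 1000, 0.70)
print(f"{low:.1%} to {high:.1%}")  # 68.6% to 70.6%
```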

(19)

Issue:

What do I do with conflicting answers for the same client? E.g. different race, DOB, or response to a question like:
• “Is client homeless?” with both a “yes” and a “no”
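
Before deciding how to resolve conflicts, it helps to find them. The following is a minimal sketch that only detects conflicting answers and does not choose between them; the `client_id` key and the field names are assumptions for illustration.

```python
from collections import defaultdict

def find_conflicts(records, fields):
    """Return {client_id: {field: set_of_values}} for every field where
    the same client has more than one distinct non-NULL answer."""
    seen = defaultdict(lambda: defaultdict(set))
    for rec in records:
        for field in fields:
            value = rec.get(field)
            if value is not None:
                seen[rec["client_id"]][field].add(value)
    conflicts = {}
    for client_id, by_field in seen.items():
        clashing = {f: vals for f, vals in by_field.items() if len(vals) > 1}
        if clashing:
            conflicts[client_id] = clashing
    return conflicts

# Hypothetical sample: client 1 has both a "yes" and a "no".
answers = [
    {"client_id": 1, "homeless": "yes", "race": "A"},
    {"client_id": 1, "homeless": "no", "race": "A"},
    {"client_id": 2, "homeless": "yes"},
]
```

How a flagged conflict is resolved (most recent answer, most frequent answer, manual review) remains a policy decision for each system.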

(20)

Coverage of Data

Database statistics vs the “universe”

• Determine the relevant universe

Ex. Emergency Shelters
• Men’s shelters
• Women’s shelters
• Family units

(21)

Merging Databases

Most HMIS systems are decentralized and will require some form of systems integration and/or data migration to obtain unduplicated counts, service utilization patterns and characteristics of homeless persons served.

(22)

11 County Region of Northern California

Population = 7,512,499

Geographic Area = 10,691 sq. miles

Equivalent in size to the state of Maryland

(23)

BACHIC

Bay Area Counties Homeless Information Collaborative

Mission:

To better enable policy makers, service agencies, and funders to understand and serve the needs of the homeless within the community

Goals:
• Obtain unduplicated regional count of homeless persons
• Identify prevalence of cross-county chronic homelessness
• Understand client movement across continuum boundaries
• Analyze service usage across continuums
• Inform funders about effectiveness of sponsored programs in the region
• Leverage HMIS learning and expertise across multiple

(24)

BACHIC

Product: Regional HMIS Data Warehouse

Outcomes:
• Better planning and resource management
• Clearer vision of the present and future needs of the homeless

Sponsored by the Charles and Helen Schwab Foundation

(25)

HMIS Implementations by County

[Figure legend: ServicePoint (all locally hosted except for Contra Costa and Monterey); Legacy System; MS Access; Metsys]

(26)

RHINO Data Collection

Regional Homeless Information Network

The BACHIC group agreed to the collection of:
• All Universal Data Elements
• All Program-Specific Data Elements

What each county has agreed to forward to RHINO:
• All Universal & Program Elements except for Protected Personal Information (PPI)

(27)

Data Warehouse and Counties

[Diagram labels: Regional HMIS Data Warehouse; San Francisco Metsys System; San Mateo Daisy/HOPE System; ServicePoint Stand-alone Locally Hosted System; ServicePoint Hosted Systems]

(28)

RHINO Design

[Diagram labels: County HMIS System; Data Entry; Extraction; Transformation; Standard Format; Encryption (SSH2); Internet; Regional HMIS Data Warehouse; BACHIC Reports]

(29)

Project Phases

Vision: Automated, Real-Time, Flexible, Accurate

[Diagram labels: Phase I; Phase II; Phase III; Phase IV; Hardware; Software; Testing; Validation of Design; Pilot of Santa Clara County; Implementation of Select Diverse Counties; All Other Counties; Growth]

(30)

Design of System

Minimize counties’ efforts
• Especially ongoing duties/obligations

Security, privacy

Multiple diverse HMIS systems
• Different stages of implementations
• Different data formats

Reporting
• Flexibility so as not to limit future reporting choices

Work flow
• Processes and procedures for resolving

[Diagram labels: Linux Server; SQL Server]

(31)

Transference of Data

Data from counties will be in CSV format (minimum requirement)

Minimum encryption (128 bit) using SSH2 (Secure Shell Version 2)

Double firewall for increased security

Future use of OpenSSL (open source software)

[Diagram labels: Regional HMIS Data Warehouse; County HMIS Systems]

(32)

Regional Unique Identifier

Required for de-duplication of customers within the 11 counties.

Uses information from personally identifiable data elements.

Uses a hash algorithm to encrypt the ID.

The key is created by the 11 counties and is unknown to the data warehouse team.
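
One way such an identifier could be produced is with a keyed hash. The sketch below is an assumption throughout: the slides do not name the hash algorithm, so HMAC-SHA-256 is swapped in for illustration, and the choice of fields, the normalization, and the function name `regional_id` are hypothetical.

```python
import hashlib
import hmac

def regional_id(first, last, dob, gender, key: bytes) -> str:
    """Derive a regional identifier from identifying elements using a
    secret key held by the counties (assumed scheme: HMAC-SHA-256)."""
    # Normalize so that case and stray spaces do not change the ID.
    normalized = "|".join(
        part.strip().lower() for part in (first, last, dob, gender)
    )
    return hmac.new(key, normalized.encode("utf-8"), hashlib.sha256).hexdigest()

# Hypothetical shared key, known to the counties only.
county_key = b"example-shared-secret"
id_a = regional_id("William", "Smith", "10-15-1973", "male", county_key)
id_b = regional_id(" william ", "SMITH", "10-15-1973", "Male", county_key)
```

Because every county applies the same key and normalization, the same person yields the same identifier in each county, while the warehouse team, lacking the key, cannot recompute identifiers from guessed names.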

(33)

Data Integrity

Before data is merged, it will be checked for the following:
• Each record/entity ID is unique
• Required data elements have some value
• Date formats are correct and values are reasonable
• Code values conform to HMIS Standard
  • Ex. Gender: Male=0, Female=1
• All data elements can be linked back to a unique
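
The first four checks above can be sketched as a single pass over incoming rows. This is a minimal illustration under stated assumptions: rows are dictionaries of strings as read from CSV, the field names and the `YYYY-MM-DD` date layout are hypothetical, and "reasonable" is taken to mean between 1900 and today. Only the gender codes come from the slide.

```python
from datetime import date, datetime

REQUIRED = ("record_id", "dob", "gender")  # hypothetical required elements
GENDER_CODES = {"0", "1"}                  # Male=0, Female=1, per the slide

def check_records(rows, today=None):
    """Return a list of (record_id, problem) tuples; an empty list
    means every row passed all checks."""
    today = today or date.today()
    problems = []
    seen_ids = set()
    for row in rows:
        rid = row.get("record_id")
        # Check 1: each record ID is unique.
        if rid in seen_ids:
            problems.append((rid, "duplicate record ID"))
        seen_ids.add(rid)
        # Check 2: required elements have some value.
        for field in REQUIRED:
            if not row.get(field):
                problems.append((rid, f"missing {field}"))
        # Check 3: date format is correct and the value is reasonable.
        dob = row.get("dob")
        if dob:
            try:
                parsed = datetime.strptime(dob, "%Y-%m-%d").date()
                if not date(1900, 1, 1) <= parsed <= today:
                    problems.append((rid, "unreasonable dob"))
            except ValueError:
                problems.append((rid, "bad dob format"))
        # Check 4: code values conform to the agreed code list.
        gender = row.get("gender")
        if gender and gender not in GENDER_CODES:
            problems.append((rid, "invalid gender code"))
    return problems
```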

(34)

Reporting

Scope Status Grid

Item                  As of 12/04
Counties (CoC’s)      11
Agencies              85
Emergency Shelters    47
Emergency Beds        8600
Etc…

Demographics
• Total client population:
• Age
• Race
• Ethnicity
• Adult client population
• Gender
• Income Sources
• Disabilities
• Family income group
• Top 10 last permanent

[Chart labels: Contra Costa 3%; Marin 2%; Monterey 3%; Napa 10%; Santa Clara 11%; Santa Cruz 8%; Solano 7%; Sonoma 9%; Alameda 15%]

(35)

Reporting

%                       Veteran Status
Chronically Homeless    Yes     No      Grand Total
Yes                     63%     45%     49%
No                      38%     55%     51%
Grand Total             100%    100%    100%

Demographics
• Families w/ children
• Single
• Veterans
• Chronically Homeless

Migration and Service Access
• Last permanent zip outside county
• Clients receiving shelter or other services in multiple counties

Program Effectiveness
• Reason for leaving
• Destination

• 15% of Adult Client Population are Veterans

Age Group       % of Total Client Population
17 and under    27%
18 – 30         17%
31 – 50         40%
51 – 61         8%
62 and over     2%
Unknown         7%

(36)

Contact Information

Community Technology Alliance

115 East Gish Road, Suite 222

San José, CA 95112

Ray Allen

Executive Director

(408) 437-8800
