National HMIS Conference
September 14th and 15th, 2004
Understanding Unduplicated
Count and Data Integration
Presenters:
Loren Hoffmann, System Administrator
WI Statewide HMIS
Ray Allen, Executive Director
Community Technology Alliance
Topics to be covered:
Types of fields - and data quality
Statistical Considerations
An “Unduplicated Count”
• Overcounts
• Undercounts
Data Quality:
General tests:
Completeness
• NULL vs “something”
Validity
• Is the data valid?
Data Quality:
Types of data fields:
• Picklists; yes/no
• Text - numeric, alphanumeric
• Date
Data Quality: Picklists
Validity - must be item from picklist
Completeness - response/no response
Who updates the list?
• System administrator or user
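The two picklist tests above can be sketched in a few lines of Python. The picklist contents and the function name `check_picklist` are illustrative assumptions, not taken from any particular HMIS product.

```python
# Hypothetical picklist for a yes/no question; a real list would be
# maintained by the system administrator or user.
YES_NO_PICKLIST = {"Yes", "No"}

def check_picklist(value, picklist):
    """Classify one response as 'missing', 'valid', or 'invalid'."""
    if value is None or value.strip() == "":
        return "missing"      # completeness: no response
    if value in picklist:
        return "valid"        # validity: the item is on the picklist
    return "invalid"          # a response that is not on the picklist

responses = ["Yes", "No", None, "Maybe", ""]
results = [check_picklist(r, YES_NO_PICKLIST) for r in responses]
```

Keeping "missing" separate from "invalid" lets completeness and validity be reported as two different measures.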
Data Quality: Date field
Valid date; NULL value
Determining Validity:
• Control by format on entry
• mmddyyyy: 10152003
• mmmddyyyy: Oct152004
• Is the date a valid date?
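As a minimal sketch, the two entry formats shown above can be checked with Python's standard library. The function name `parse_date` and the format table are assumptions made for illustration, and the month abbreviation check assumes an English locale.

```python
import re
from datetime import datetime

# Entry formats from the slide, each paired with a layout check and a
# strptime pattern. The regex enforces the exact layout; strptime then
# checks that the value is a real calendar date.
FORMATS = {
    "mmddyyyy": (re.compile(r"\d{8}"), "%m%d%Y"),
    "mmmddyyyy": (re.compile(r"[A-Za-z]{3}\d{6}"), "%b%d%Y"),
}

def parse_date(text, fmt_name):
    """Return a date if text is a valid date in the named format, else None.

    A NULL (None or empty) input also returns None, so callers that
    measure completeness should test for NULL before calling this.
    """
    if not text:
        return None
    layout, pattern = FORMATS[fmt_name]
    if not layout.fullmatch(text):
        return None
    try:
        return datetime.strptime(text, pattern).date()
    except ValueError:
        return None
```

The layout check matters because `strptime` alone accepts one-digit months and days, so a string such as `1152003` would otherwise be read as a date.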
Data Quality: Text
Little that can be easily validated on a
large scale
Data Quality: Completeness
For a given field, how many NULLs are there?
• For the entire database
• For a specified period of time
• For a given agency/user
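The three ways of slicing the NULL count listed above can be sketched as one function with optional filters. The record layout (`agency`, `entered`, `dob`) and the function name `null_count` are hypothetical.

```python
from datetime import date

# Hypothetical client records; None stands for a NULL field.
records = [
    {"agency": "A", "entered": date(2004, 1, 10), "dob": "1973-10-15"},
    {"agency": "A", "entered": date(2004, 2, 3), "dob": None},
    {"agency": "B", "entered": date(2004, 2, 20), "dob": "1980-05-01"},
    {"agency": "B", "entered": date(2003, 12, 1), "dob": None},
]

def null_count(rows, field, agency=None, start=None, end=None):
    """Return (number of NULLs in field, number of records examined).

    With no filters the whole database is examined; agency, start, and
    end narrow the count to one agency and/or one period of time.
    """
    selected = [
        r for r in rows
        if (agency is None or r["agency"] == agency)
        and (start is None or r["entered"] >= start)
        and (end is None or r["entered"] <= end)
    ]
    nulls = sum(1 for r in selected if r[field] is None)
    return nulls, len(selected)
```

For example, `null_count(records, "dob", agency="A")` examines only agency A, while `null_count(records, "dob")` covers every record.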
Unduplicated Count
and the Client identifier
• To generate an unduplicated count (or to merge systems), most HMIS systems create and/or generate a common client identifier.
(UN) Duplicate Counts
How does your system manage the
“unique client” or “unduplicated client
count”?
• You need to know the algorithm used
• Evaluate the data elements that are used
Un-duplicated Counts
Two possible errors
It is not magic or foolproof.
Undercount the number of clients:
• The system counts two client record entries as a single client when they really are two clients
Overcount the number of clients:
• The system counts two client record entries as two clients when they really are the same client
Unique Client Count
Put all the clients in the same room and
count them;
Make a list of all the clients that you
know;
Unique Client Count
Using:
• Client first name
• Client last name
• Client date of birth
• Client gender
Un-duplicated Counts
An example - how many clients?
Using first name, last name, gender,
date of birth
• William Smith, male, 10-15-1973
• Bill Smith, male, 10-15-1973
• William Smith, male, no DOB
• Consider: address, race, HH members,
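The example above can be run through two simple matching rules to show how the answer depends on the algorithm. This is only a sketch: the nickname table and both key functions are assumptions, not the method of any specific HMIS.

```python
records = [
    ("William", "Smith", "male", "10-15-1973"),
    ("Bill", "Smith", "male", "10-15-1973"),
    ("William", "Smith", "male", None),  # no DOB
]

# Hypothetical nickname table; a real system would need a far larger one.
NICKNAMES = {"bill": "william"}

def exact_key(rec):
    """Match only when all four fields are identical."""
    first, last, gender, dob = rec
    return (first.lower(), last.lower(), gender, dob)

def nickname_key(rec):
    """Same as exact_key, but map known nicknames to a formal name first."""
    first, last, gender, dob = rec
    first = first.lower()
    return (NICKNAMES.get(first, first), last.lower(), gender, dob)

exact_count = len({exact_key(r) for r in records})
nickname_count = len({nickname_key(r) for r in records})
```

Exact matching reports three clients and nickname matching reports two. Neither rule can decide whether the record with no DOB is the same person, which is why additional elements may need to be considered.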
Statistical Considerations
Defining the universe:
• Number of client records in the system vs.
• Number of ACTIVE client records vs.
• Number of UNDUPLICATED client records vs.
• Number of valid responses for a given data
Defining the Universe:
Example:
• 1200 client records
• 1100 ACTIVE client records
• 1000 UNDUPLICATED clients
• 980 DOB fields have data
Defining the Universe:
DOB example: 980 of 1000 had a valid date, therefore:
• If 70% of the 980 records are 18+ (adults), then the actual percentage of adults on the system is between 68% and 72% (margin of error is 2%, the share of records with no valid DOB)
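The arithmetic behind this example can be worked through directly. The sketch below assumes a worst-case reading: either none or all of the clients with a missing DOB are adults. The variable names are illustrative.

```python
total_clients = 1000          # unduplicated clients
valid_dob = 980               # clients with a valid DOB
adult_share_of_valid = 0.70   # 70% of the valid records are 18+

adults_known = round(adult_share_of_valid * valid_dob)  # 686 known adults
missing = total_clients - valid_dob                     # 20 clients with no DOB

# Worst cases: none, or all, of the clients with a missing DOB are adults.
low = adults_known / total_clients
high = (adults_known + missing) / total_clients
missing_rate = missing / total_clients
```

Under these assumptions the share of adults lies between 68.6% and 70.6%, which sits inside the rounded 68% to 72% range quoted above. The 2% figure is the share of records with no valid DOB.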
Issue:
What do I do with conflicting answers for the same client? E.g. different race, DOB, or response to a question like:
• “Is client homeless?” with both a “yes” and a “no”
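Before deciding how to resolve conflicts, it helps to find them. The sketch below only flags clients with disagreeing answers; it does not choose a winner. The row layout and the function name `find_conflicts` are assumptions.

```python
from collections import defaultdict

# Hypothetical (client_id, question, answer) rows from several agencies.
answers = [
    ("c1", "Is client homeless?", "yes"),
    ("c1", "Is client homeless?", "no"),
    ("c2", "Is client homeless?", "yes"),
    ("c2", "Is client homeless?", "yes"),
]

def find_conflicts(rows):
    """Return {(client_id, question): sorted answers} where answers disagree."""
    seen = defaultdict(set)
    for client_id, question, answer in rows:
        seen[(client_id, question)].add(answer)
    return {key: sorted(vals) for key, vals in seen.items() if len(vals) > 1}

conflicts = find_conflicts(answers)
```

How a flagged conflict is resolved (most recent answer, most frequent answer, manual review) remains a policy decision for the community.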
Coverage of Data
Database statistics vs the “universe”
• Determine the relevant universe
Ex. Emergency Shelters
• Men’s shelters
• Women’s shelters
• Family units
Merging Databases
Most HMIS systems are decentralized and
will require some form of systems
integration and/or data migration to obtain
unduplicated counts, service utilization
patterns and characteristics of homeless
persons served.
11 County Region of Northern California
Population = 7,512,499
Geographic Area = 10,691 sq. miles
Equivalent in size to the
state of Maryland
BACHIC
Bay Area Counties Homeless Information Collaborative
Mission:
To better enable policy makers, service agencies,
and funders to understand and service the needs of the
homeless within the community
Goals:
• Obtain unduplicated regional count of homeless persons
• Identify prevalence of cross-county chronic homelessness
• Understand client movement across continuum boundaries
• Analyze service usage across continuums
• Inform funders about effectiveness of sponsored programs in the region
• Leverage HMIS learning and expertise across multiple
BACHIC
Product: Regional HMIS Data Warehouse
Outcomes:
• Better planning and resource management
• Clearer vision of the present and future needs of the homeless
Sponsored by the Charles and Helen Schwab
Foundation
HMIS Implementations by County
ServicePoint - All locally hosted except for Contra Costa and Monterey
Legacy System
MS Access
Metsys
RHINO Data Collection
Regional Homeless Information Network
BACHIC group agreed to the collection of
• All Universal Data Elements
• All Program-Specific Data Elements
What each county has agreed to forward
RHINO
• All Universal & Program Elements except for Protected Personal Information (PPI)
Data Warehouse and Counties
[Diagram labels: Regional HMIS Data Warehouse; San Francisco Metsys System; San Mateo Daisy/HOPE System; ServicePoint Stand-alone Locally Hosted System; ServicePoint Hosted Systems]
RHINO Design
[Diagram labels: County HMIS System; Data Entry; Extraction; Transformation; Standard Format; Encryption (SSH2); INTERNET; Regional HMIS Data Warehouse; BACHIC Reports]
Project Phases
Vision: Automated, Real-Time, Flexible, Accurate
Phases: Phase I, Phase II, Phase III, Phase IV (Growth)
Steps: Hardware, Software, Testing, Validation of Design, Pilot of Santa Clara County, Implementation of Select Diverse Counties, All other Counties
Design of System
Minimize counties’ efforts
• Especially ongoing duties/obligations
Security, privacy
Multiple diverse HMIS systems
• Different stages of implementations
• Different data formats
Reporting
• Flexibility so as not to limit future reporting choices
Work flow
• Processes and procedures for resolving
Linux Server
SQL Server
Transference of Data
Data from counties will be in CSV format (minimum requirement)
Minimum encryption (128 bit) using SSH2 (Secure Shell Version 2)
Regional HMIS Data Warehouse
County HMIS Systems
Double firewall for
increased security
Future use of OpenSSL
(Open Source software)
Regional Unique Identifier
Required for de-duplication of customers within the 11 counties.
Derived from personally identifiable data elements.
Uses a hash algorithm to encrypt the ID.
Key is created by the 11 counties and is unknown to the data warehouse team.
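One way such an identifier could be built is with a keyed hash. The sketch below uses HMAC-SHA256 from Python's standard library as an assumed choice; the slides do not name the algorithm or list the data elements used, so the four fields and the function name `regional_id` are hypothetical.

```python
import hashlib
import hmac

def regional_id(first, last, dob, gender, key):
    """Derive a one-way identifier from identifying fields and a secret key."""
    # Normalize so differences in case or spacing do not change the ID.
    parts = [
        first.strip().lower(),
        last.strip().lower(),
        dob.strip(),
        gender.strip().lower(),
    ]
    message = "|".join(parts).encode("utf-8")
    return hmac.new(key, message, hashlib.sha256).hexdigest()

# Hypothetical key; in the design above it would be held by the counties
# and never shared with the data warehouse team.
county_key = b"example-key-shared-by-counties"

id_a = regional_id("William", "Smith", "10-15-1973", "male", county_key)
id_b = regional_id(" william ", "SMITH", "10-15-1973", "Male", county_key)
id_c = regional_id("William", "Smith", "10-15-1973", "male", b"another-key")
```

The same person entered in two counties yields the same identifier (`id_a` equals `id_b`), while a party without the key cannot reproduce it (`id_c` differs). Spelling variants such as nicknames still produce different identifiers.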
Data Integrity
Before data is merged, it will be checked for
the following:
• Each record/entity ID is unique
• Required data elements have some value
• Date formats are correct and values are reasonable
• Code values conform to HMIS Standard
  • Ex. Gender: Male=0, Female=1
• All data elements can be linked back to a unique
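Several of the checks above can be sketched as one pass over a batch of CSV-style rows. The column names, the ISO date format, the reasonable date range, and the fixed `today` default are all assumptions chosen to make the example reproducible. Only the two gender codes shown on the slide are included.

```python
from datetime import date, datetime

GENDER_CODES = {"0", "1"}                   # from the slide: Male=0, Female=1
REQUIRED = ("record_id", "gender", "dob")   # hypothetical column names

def check_records(rows, today=date(2004, 12, 31)):
    """Return a list of (record_id, problem) tuples for a batch of rows."""
    problems, seen_ids = [], set()
    for row in rows:
        rid = row.get("record_id", "")
        if rid in seen_ids:
            problems.append((rid, "duplicate record ID"))
        seen_ids.add(rid)
        for field in REQUIRED:
            if not row.get(field):
                problems.append((rid, f"missing {field}"))
        if row.get("gender") and row["gender"] not in GENDER_CODES:
            problems.append((rid, "gender code not in standard"))
        if row.get("dob"):
            try:
                dob = datetime.strptime(row["dob"], "%Y-%m-%d").date()
                if not date(1880, 1, 1) <= dob <= today:
                    problems.append((rid, "DOB out of reasonable range"))
            except ValueError:
                problems.append((rid, "DOB format incorrect"))
    return problems

rows = [
    {"record_id": "1", "gender": "0", "dob": "1973-10-15"},
    {"record_id": "1", "gender": "1", "dob": "1980-01-01"},
    {"record_id": "2", "gender": "M", "dob": "15/10/1973"},
    {"record_id": "3", "gender": "", "dob": "2090-01-01"},
]
problems = check_records(rows)
```

Returning a list of problems instead of stopping at the first one gives each county a full report to correct before the data is merged.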
Reporting
Scope Status Grid
Item                 As of 12/04
Counties (CoC’s)     11
Agencies             85
Emergency Shelters   47
Emergency Beds       8600
Etc…
Demographics
• Total client population:
• Age
• Race
• Ethnicity
• Adult client population
• Gender
• Income Sources
• Disabilities
• Family income group
• Top 10 last permanent
[Chart values by county]
Contra Costa   3%
Marin          2%
Monterey       3%
Napa           10%
Santa Clara    11%
Santa Cruz     8%
Solano         7%
Sonoma         9%
Alameda        15%
Reporting
% Veteran Status   Chronically Homeless
                   Yes     No      Grand Total
Yes                63%     45%     49%
No                 38%     55%     51%
Grand Total        100%    100%    100%
Demographics
• Families w/ children
• Single
• Veterans
• Chronically Homeless
Migration and Service Access
• Last permanent zip outside county
• Clients receiving shelter or other services in multiple counties
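Once records from every county carry a shared regional identifier, cross-county service use can be found by grouping on it. The sketch below assumes hypothetical `(regional_id, county)` rows; the identifiers and the function name are illustrative.

```python
from collections import defaultdict

# Hypothetical (regional_id, county) service records in the warehouse.
service_records = [
    ("id1", "Santa Clara"),
    ("id1", "Alameda"),
    ("id2", "Marin"),
    ("id2", "Marin"),
    ("id3", "Napa"),
]

def clients_in_multiple_counties(rows):
    """Return {regional_id: sorted counties} for clients seen in 2+ counties."""
    counties = defaultdict(set)
    for regional_id, county in rows:
        counties[regional_id].add(county)
    return {rid: sorted(c) for rid, c in counties.items() if len(c) > 1}

multi = clients_in_multiple_counties(service_records)
unduplicated_regional_count = len({rid for rid, _ in service_records})
```

The same grouping yields the unduplicated regional count: five service records here represent three distinct clients, one of whom was served in two counties.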
Program Effectiveness
• Reason for leaving
• Destination
• 15% of Adult Client Population are Veterans
Age Group       % of Total Client Population
17 and under    27%
18 – 30         17%
31 – 50         40%
51 – 61         8%
62 and over     2%
Unknown         7%