Query Log Analysis of an Electronic Health Record Search Engine
Lei Yang1, Qiaozhu Mei1,2, Kai Zheng1,3, David A. Hanauer4,5
1. School of Information  2. Department of EECS  3. School of Public Health, Department of Health Management and Policy  4. Comprehensive Cancer Center  5. Department of Pediatrics
Motivation of the Study
• There is a significant need to conduct full-text search in medical records (e.g., by clinicians, researchers, administrators)
• Very few electronic health records (EHRs) have full-text search functionality (Natarajan et al., IJMI 2010)
Information Retrieval in EHRs
- Retrieving Electronic Health Records
• Specialized language
• Rich, implicit intra-/inter-document structures
• Deep NLP/text mining is necessary
• Complicated information needs
What are We Going to Do?
• By studying the query log of an EHR search engine:
– What are users of an electronic health record search engine looking for?
– How different is this search task from Web search?
– How can we apply what we learn to build effective EHR search engines?
EMERSE
• EMERSE - Electronic Medical Record Search Engine
• Full-text search engine
• Widely used in UMHS since 2005 (and in the VA)
• Boolean keyword queries
• Semi-automatic query suggestion
• Collaborative search (Zheng et al., JAMIA 2011)
Descriptive Statistics of the Log Data
• 202,905 queries collected over 4 years (2006 - 2010)
• 533 users (medical professionals at UMHS)
• Each query log entry recorded the query string, user, and time stamp
• We employed effective heuristics to classify queries into three types
Types of Queries
• User typed-in query: 55%
• System suggested query: 26%
• Bundle query: 19%
Examples of Query Log
• A non-bundle query can be either a user typed-in query or a system suggested query.

User | Query              | Type       | Time                | IP Address
A    | Chemotherapy Terms | Bundle     | 10/06/2006 14:34:29 | 10.20.1.31
B    | infection          | Non-bundle | 09/19/2006          | 10.20.97.60
Temporal Patterns
(Figures: query volume by hour of the day and by day of the week, Mon - Sun)
What are users looking for?
Ok, that’s the query log
To Describe Users’ Information Needs
• Extract a category list from SNOMED Clinical Terms (SNOMED CT), the comprehensive clinical terminology.
• Categorize each query into one of the categories in the list.
• Use the query category distribution to describe users' information needs.
SNOMED CT Ontology Structure
(Diagram: ROOT with 19 top-level concepts as children, e.g., Procedure, Clinical finding, ...)
An Example
• Concept procedure = {t1, t2, t3, t4, t5}
• Concept clinical_finding = {t1, t6, t7, t8, t9}
• Query q = {t1, t6, t7, t8}
• |procedure ∩ q| = 1
• |clinical_finding ∩ q| = 4
• q is assigned to the category with the largest term overlap: clinical finding
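The assignment step above can be sketched in a few lines of Python: map a query to the top-level concept whose term set overlaps it most. The concept term sets here are the toy placeholders from the slide (t1, t2, ...), not real SNOMED CT data.

```python
# Sketch of categorization by maximum term overlap. The concept term
# sets are illustrative stand-ins, not actual SNOMED CT vocabularies.

def categorize(query_terms, concepts):
    """Return the concept name with the largest term overlap with the query."""
    overlaps = {name: len(terms & query_terms) for name, terms in concepts.items()}
    return max(overlaps, key=overlaps.get)

concepts = {
    "procedure": {"t1", "t2", "t3", "t4", "t5"},
    "clinical_finding": {"t1", "t6", "t7", "t8", "t9"},
}
q = {"t1", "t6", "t7", "t8"}
print(categorize(q, concepts))  # clinical_finding (overlap 4 vs. 1)
```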
Distribution of Query Category
• Clinical finding: 28.0%
• Pharmaceutical/biologic product: 12.2%
• Procedure
• Unknown
• Situation with explicit context: 8.9%
• Body structure: 6.5%
• Others: 22.3%

Top 3 Categories of Each Type of Query
• User typed-in query: Clinical finding 20.7%, Unknown 16.5%, Pharm./biologic product 13.1%
• System suggested query: Clinical finding 49.2%, Procedure 14.6%, Pharm./biologic product 17.2%
• Bundle query: Clinical finding 20.0%, Procedure 18.5%, Body structure 11.4%
How different is EHR search from Web search?
Diversified queries reflect diversified information needs.
Questions
• How are queries distributed?
– Are navigational queries also popular in EHR search?
– What are the most frequently used queries?
• How hard is the search task?
– Are EHR queries more complex to construct?
– How many query terms are outside the scope of formal biomedical vocabularies?
– What is the average length of search sessions?
Query Distribution
Long tail but no fat head
Conclusion:
Fewer navigational queries compared to Web search (e.g., “facebook”, “amazon”)
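The "long tail but no fat head" observation can be made concrete by measuring head mass: the share of total query volume captured by the top-k distinct queries. A minimal sketch, using a made-up toy log rather than the EMERSE data:

```python
from collections import Counter

# Toy check of the "long tail, no fat head" shape: what fraction of all
# query volume do the k most frequent distinct queries account for?
# `log` below is a hypothetical list of query strings, not the real log.

def head_mass(log, k=10):
    """Fraction of total query volume covered by the top-k distinct queries."""
    counts = Counter(log)
    top = sum(c for _, c in counts.most_common(k))
    return top / len(log)

log = ["chf", "lupus", "relapse", "infection", "chf", "anemia", "warfarin", "sepsis"]
print(head_mass(log, k=1))  # 0.25: the single most frequent query covers 25%
```

In Web search logs a few navigational queries (e.g., "facebook") dominate the head; a flat head-mass curve is what distinguishes the EHR log.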
Top Queries
Rank | Query                             | Type
1    | Cancer Staging Terms              | Bundle
2    | Chemotherapy Terms                | Bundle
3    | Radiation Therapy Terms           | Bundle
4    | Hormone Therapy Terms             | Bundle
5    | CTC Toxicities                    | Bundle
6    | GVHD Terms                        | Bundle
7    | Performance Status                | Bundle
8    | Growth Factors                    | Bundle
9    | Other Toxicity Modifying Regimen  | Bundle
11   | PEP risk factors                  | Bundle
12   | Cigarette/alcohol use             | Bundle
13   | comorbidities                     | Bundle
14   | Radiation                         | Typed-in
15   | lupus                             | Bundle
16   | Deep Tissue Injuries              | Bundle
17   | Life Threatening Infections       | Bundle
18   | CHF                               | Bundle
19   | relapse                           | Typed-in
Query Length
(Figure: average query length for all queries and by type - user typed-in, system suggested, bundle)
The average length of user typed-in queries (1.7 terms) is shorter than that of Web search queries (2.35 terms).
Conclusion: the information needs of EHR users are much harder to represent in keyword queries than those of Web search users.
Dictionary Coverage
Dictionary | Size of Vocabulary (terms) | Query Term Coverage
SNOMED CT  | 30,455                     | 41.6%
UMLS       | 797,941                    | 63.6%

Web search query term coverage: 85% - 90%
Unfortunately, even a meta-dictionary combining these vocabularies covers only 68.0% of query terms (e.g., adsr01 -> ?, anemaia -> anemia).
Conclusion: this low coverage indicates that query terms are more challenging to handle in an EHR search engine.
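The coverage numbers above come down to a simple computation: the share of query terms found in a controlled vocabulary. A minimal sketch, assuming a small illustrative vocabulary rather than the actual SNOMED CT or UMLS term lists:

```python
# Sketch of dictionary coverage: the fraction of query terms that appear
# in a controlled vocabulary. The vocabulary below is illustrative only.

def coverage(query_terms, vocabulary):
    """Fraction of (lowercased) query terms present in the vocabulary."""
    terms = [t.lower() for t in query_terms]
    hits = sum(1 for t in terms if t in vocabulary)
    return hits / len(terms)

vocab = {"anemia", "infection", "warfarin", "lupus"}
queries = ["anemia", "anemaia", "adsr01", "warfarin"]  # note the typo and the local code
print(coverage(queries, vocab))  # 0.5: misspellings and ad-hoc codes are missed
```

Misspellings ("anemaia") and institution-specific codes ("adsr01") are exactly the terms that fall outside formal vocabularies, which is why EHR coverage lags far behind the 85% - 90% seen in Web search.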
Session Analysis
• Definition
– A search session: a basic mission of a user to achieve a single information need.
User | Time     | Query
A    | 10:29:04 | warifin
A    | 10:29:08 | warfrin
A    | 10:29:11 | warfarin
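Session segmentation like this is commonly done by splitting a user's query stream wherever the gap between consecutive queries exceeds a threshold. A minimal sketch; the 30-minute cutoff is a common heuristic and an assumption here, not the paper's stated rule:

```python
from datetime import datetime, timedelta

# Sketch of time-gap session segmentation: consecutive queries from one
# user belong to the same session while the gap stays under a threshold.
# The 30-minute cutoff is an assumed heuristic, not the paper's rule.

def sessions(events, gap=timedelta(minutes=30)):
    """events: list of (timestamp, query) tuples for one user, sorted by time."""
    out, current = [], []
    for ts, q in events:
        if current and ts - current[-1][0] > gap:
            out.append(current)
            current = []
        current.append((ts, q))
    if current:
        out.append(current)
    return out

t = lambda s: datetime.strptime(s, "%H:%M:%S")
events = [(t("10:29:04"), "warifin"), (t("10:29:08"), "warfrin"),
          (t("10:29:11"), "warfarin"), (t("11:15:00"), "lupus")]
print(len(sessions(events)))  # 2: the three spelling reformulations, then "lupus"
```

The three "warfarin" variants land in one session, matching the intuition that rapid reformulations serve a single information need.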
Session Analysis
• Observations
• The average length of a session is 14.8 minutes, considerably longer than a typical Web search session.
• 27.7% of sessions end with a search bundle, compared to an overall bundle usage rate of 19.2%.
• Many users turn to bundles after a few unsuccessful attempts to formulate queries.
• A search task in an EHR is generally more difficult than a task in Web search.
What have we learned?
Query log analysis
Lessons Learned
• EHR search is much more challenging than Web search
– More complex information needs
– Longer queries, more noise
• Users have substantial difficulty formulating their queries
– Longer search sessions
– High adoption rate of system-generated queries
• Collaborative search might be the solution
– The effectiveness of collaborative search encourages further exploration of social information foraging techniques.
Future Work
• Query suggestion using large scale, heterogeneous language networks
• Useful presentation of search results
• Support result analysis and decision making
• Enhance collaborative search in EHR search engines
• ...
References
• [1] Natarajan K, Stein D, Jain S, Elhadad N. An analysis of clinical queries in an electronic health record search utility. Int J Med Inform. 2010 Jul;79(7):515-22. Epub 2010 Apr 24.
• [2] Zheng K, Mei Q, Hanauer DA. Collaborative search in electronic health records. Journal of the American Medical Informatics Association. 2011;18(3):282–91.