6.0 Evaluation
6.1.4 Data Collection
Table 13 summarizes the data collected for each study task. Note that the original scale items for the key UTAUT constructs assessed in the subjective assessment task (performance expectancy and effort expectancy) were empirically selected from the scale items of constructs in other models of technology acceptance and use (called root constructs). Only a few scale items were originally selected for each key construct, and not all of them were relevant in the context of the proposed experiment (e.g., the performance expectancy item “If I use the system, I will increase my chances of getting a raise”). Therefore, for each key construct, I selected a set of scale items from the corresponding root constructs that were relevant in the context of the proposed experiment.
Table 13. Data collected for each study task
Study Task / Data Collected
Background Questionnaire
  Current clinical position (e.g., resident)
  Length of time in current position (e.g., <1 year)
Patient Case Review
  Data collected for each patient case:
  Time-stamped interactions with application interface (e.g., tab selections, lab tests viewed)
  List of information selected to discuss during rounds
  Urgency decision accuracy (see Figure 20)
  Urgency decision confidence, rated from 1 (not confident at all) to 5 (extremely confident)
  Free-text rationale for urgency decision
  Time (in seconds) to review patient case (excludes verbal case presentation, see Figure 20)
  Audio-recording of verbal patient case presentation
  Moderator notes on interesting comments or behavior during case review
Subjective Assessments
  Data collected for “prediction only” and “explanation” displays:
  Selected UTAUT Root Construct Scale Items for Performance Expectancy³⁸ (Likert scale agreement):
    1. Using the system would enable me to accomplish tasks more quickly.
    2. Using the system would make it easier to do my job.
    3. Using the system would increase my productivity.
    4. I would find the system useful in my job.
  Selected UTAUT Root Construct Scale Items for Effort Expectancy³⁸ (Likert scale agreement):
    1. My interaction with the system would be clear and understandable.
    2. I would find the system easy to use.
    3. It would be easy for me to become skillful at using the system.
  Free-text feedback on the display (optional)
6.1.5 Data Analysis
Audio recordings of all verbal case presentations were transcribed verbatim and compiled with the urgency decision rationales and moderator notes for each case. Answers to the background questionnaire were summarized in a contingency table. Based on these responses, two levels of clinical experience (residents and fellows/attendings) were defined for use in analyses. The primary outcomes of interest were the impact of the user-centered explanation display on decision accuracy, decision confidence, case review efficiency, and provider perceptions of the pediatric ICU in-hospital mortality risk model. Analyses for each outcome are summarized in Table 14 and described in the following sections. P-values <0.05 were considered significant for all statistical analyses, which were carried out using Stata version 15.1.¹³³ Plots were generated using the Python packages seaborn version 0.9.0¹³⁴ and matplotlib version 3.0.3.¹³⁵
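As an illustration of the background questionnaire summary and of collapsing clinical positions into the two experience levels used in the analyses, the sketch below cross-tabulates position by time in current position. The participant records are constructed only to reproduce the counts reported in Table 15 (participant-level data are not shown in this section), and the function names are hypothetical.

```python
from collections import Counter

# Hypothetical records, built only so the counts match Table 15:
# (position, time in current position) for each of the 15 participants.
RECORDS = (
    [("Attending", "<1 year")]
    + [("Fellow", "<1 year")]
    + [("Fellow", "1 to <2 years")] * 5
    + [("Fellow", "2 to <3 years")]
    + [("Resident", "1 to <2 years")] * 2
    + [("Resident", "2 to <3 years")] * 5
)

TIME_LEVELS = ["<1 year", "1 to <2 years", "2 to <3 years"]


def contingency_table(records):
    """Cross-tabulate position by time in position; last column is the row total."""
    counts = Counter(records)
    positions = sorted({pos for pos, _ in records})
    table = {}
    for pos in positions:
        row = [counts[(pos, t)] for t in TIME_LEVELS]  # Counter returns 0 for empty cells
        table[pos] = row + [sum(row)]
    return table


def experience_level(position):
    """Collapse position into the two analysis levels described in the text."""
    return "resident" if position == "Resident" else "fellow/attending"


table = contingency_table(RECORDS)
levels = Counter(experience_level(pos) for pos, _ in RECORDS)
```

With these records, the two experience levels contain 7 residents and 8 fellows/attendings.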
Table 14. Summary of analyses examining the impact of the user-centered explanation display on outcomes
Outcome / Display comparison groups / Metric: analytic approach
Decision accuracy
  Display comparison groups: “No model”, “Prediction only”
  Urgency decision accuracy: proportion of correct decisions with 95% CI; logistic mixed effects analysis
  Precision and recall in selecting relevant information: visual review of violin plots
  Mentions of predictive model in rationales, transcripts, or notes: qualitative review to assist in interpretation of quantitative results
Decision confidence
  Display comparison groups: “No model”, “Prediction only”
  Urgency decision confidence: visual review of stacked bar charts; ordinal logistic mixed effects analysis
  Mentions of predictive model in rationales, transcripts, or notes: qualitative review to assist in interpretation of quantitative results
Case review efficiency
  Display comparison groups: “No model”, “Prediction only”
  Time to review patient case: descriptive statistics; log-linear mixed effects analysis
  Number of unique items viewed (computed from interactions data): descriptive statistics; Poisson mixed effects analysis
  Total number of items viewed (computed from interactions data): descriptive statistics; negative binomial mixed effects analysis
Provider perceptions
  Display comparison group: “Prediction only”
  UTAUT questionnaire responses: visual review of stacked bar charts
  Free-text feedback on displays and moderator notes: qualitative review for insights about participant perceptions of predictive model
Analysis of decision accuracy
Decision accuracy comprised participant accuracy in urgency decisions (i.e., identifying patients who need to be seen urgently) and in selecting relevant information to discuss with the rounding team. To evaluate urgency decision accuracy, the proportion of correct decisions with 95% CIs was calculated for each of the three displays, and a logistic mixed effects analysis of the relationship between urgency decision accuracy and display was performed. Display, case urgency (urgent, non-urgent), and participant experience (resident, attending/fellow) were included as fixed effects in the model (no interaction terms), and an intercept for participant was included as a random effect. To assess accuracy in selecting relevant information, participant precision and recall in selecting ‘relevant’ items were calculated, using the information items selected by a senior pediatric ICU attending with the “explanation” display as the gold standard. Precision and recall scores for each display were visualized using violin plots. Urgency decision rationales, case presentation transcripts, and moderator notes were reviewed for mentions of the predictive model tool and to assist in interpretation of the results.
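The two accuracy metrics described above can be computed as in the following sketch. It is illustrative only: the section does not state which confidence interval method was used for the proportions, so the Wilson score interval is assumed here, and the item names in the example are hypothetical. Precision is the share of a participant's selected items that the gold-standard attending also selected; recall is the share of gold-standard items that the participant selected.

```python
import math


def wilson_ci(successes, n, z=1.96):
    """Proportion with a Wilson score interval (z = 1.96 gives roughly 95%).

    Assumption: the CI method used in the study is not stated; Wilson is one common choice.
    """
    if n <= 0:
        raise ValueError("n must be positive")
    p = successes / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return p, max(0.0, center - half), min(1.0, center + half)


def precision_recall(selected, gold):
    """Precision and recall of one participant's selected items vs. a gold standard."""
    selected, gold = set(selected), set(gold)
    hits = len(selected & gold)  # items chosen by both the participant and the attending
    # Returning 0.0 for an empty set is a convention choice; precision is
    # otherwise undefined when nothing is selected.
    precision = hits / len(selected) if selected else 0.0
    recall = hits / len(gold) if gold else 0.0
    return precision, recall


# Hypothetical items: the attending's gold standard vs. one participant's selection.
gold = {"lab:lactate", "vitals:blood_pressure", "vitals:heart_rate", "vent:FiO2"}
selected = {"lab:lactate", "vitals:heart_rate", "lab:sodium"}
prec, rec = precision_recall(selected, gold)  # 2 of 3 selected are relevant; 2 of 4 found
```

For example, 5 correct decisions out of 10 give a proportion of 0.5 with a Wilson interval of about 0.237 to 0.763.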
Analysis of decision confidence
To assess the relationship between the display shown and participant-reported confidence in their urgency decision, confidence ratings for each of the displays were visualized in a stacked bar chart and an ordinal logistic mixed effects analysis was performed. Display, case urgency (urgent, non-urgent), and participant experience (resident, attending/fellow) were included as fixed effects in the model (no interaction terms), and an intercept for participant was included as a random effect. Urgency decision rationales, case presentation transcripts, and moderator notes were reviewed for mentions of the predictive model tool and to assist in interpretation of the results.
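For concreteness, the logistic model for urgency decision accuracy and the ordinal logistic model for decision confidence can be written with a shared form of linear predictor. This is a sketch under stated assumptions: the reference levels (“explanation” display, non-urgent case, fellow/attending) and the symbol names are illustrative, each model has its own coefficients and random-intercept variance, and the cutpoint sign convention follows the parameterization used by Stata's ordinal logit commands. The section does not report the exact specification.

```latex
% i indexes participants; j indexes the case responses of participant i.
% Assumed reference levels: "Explanation" display, non-urgent case, fellow/attending.
\eta_{ij} = \beta_1\,\mathrm{NoModel}_{ij} + \beta_2\,\mathrm{PredOnly}_{ij}
          + \beta_3\,\mathrm{Urgent}_{ij} + \beta_4\,\mathrm{Resident}_{i} + u_i,
\qquad u_i \sim \mathcal{N}(0, \sigma_u^2)

% Urgency decision accuracy (logistic mixed effects model)
\operatorname{logit} \Pr(\mathrm{Correct}_{ij} = 1) = \beta_0 + \eta_{ij}

% Urgency decision confidence (ordinal logistic mixed effects model, ratings 1 to 5)
\operatorname{logit} \Pr(\mathrm{Conf}_{ij} \le k) = \kappa_k - \eta_{ij},
\qquad k = 1, \dots, 4, \quad \kappa_1 < \kappa_2 < \kappa_3 < \kappa_4
```

Under this parameterization, a positive coefficient shifts ratings toward higher confidence relative to the reference level, and its exponential is the corresponding cumulative odds ratio.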
Analysis of case review efficiency
Case review efficiency comprised the time participants took to review each patient case and the amount of information they viewed, measured as the number of items (e.g., lab tests, vital signs) viewed during the case. Descriptive statistics were used to summarize case review time, the number of unique items viewed, and the total number of items viewed. To assess the relationship between the display shown and case review time, a log-linear mixed effects analysis was performed after it was determined that case review time followed a log-normal distribution.
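The efficiency metrics are derived from the time-stamped interaction logs listed in Table 13. The sketch below shows one way to compute them for a single case. The event schema (`review_start`, `view`, `review_end`, and the item labels) is hypothetical, since the log format is not described here. The log-transformed time is included on the assumption that the log-linear mixed effects model is a linear mixed model fitted to log review time; under that reading, exponentiated display coefficients are ratios of geometric mean review times.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    t: float        # seconds since the case was opened (hypothetical schema)
    kind: str       # e.g., "review_start", "view", "review_end"
    item: str = ""  # item identifier for "view" events, e.g., "lab:lactate"


def _first_time(events, kind):
    """Timestamp of the first event of the given kind."""
    t = next((e.t for e in events if e.kind == kind), None)
    if t is None:
        raise ValueError(f"no {kind!r} event in case log")
    return t


def case_metrics(events):
    """Review time, unique items viewed, and total items viewed for one case."""
    review_time = _first_time(events, "review_end") - _first_time(events, "review_start")
    views = [e.item for e in events if e.kind == "view"]
    return {
        "review_time_s": review_time,
        "log_review_time": math.log(review_time),  # outcome for the log-linear model
        "unique_items": len(set(views)),           # outcome for the Poisson model
        "total_items": len(views),                 # outcome for the negative binomial model
    }


# Hypothetical log: heart rate is viewed twice, so it counts once toward unique items.
events = [
    Event(0, "review_start"),
    Event(5, "view", "vitals:heart_rate"),
    Event(12, "view", "lab:lactate"),
    Event(20, "view", "vitals:heart_rate"),
    Event(95, "review_end"),
]
m = case_metrics(events)
```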
To assess the relationship between the display shown and the number of unique items viewed, a Poisson mixed effects analysis was performed. To assess the relationship between the display shown and the total number of items viewed, a negative binomial mixed effects analysis was performed after it was determined that the distribution of the total number of items viewed was overdispersed (mean = 33.0; variance = 206.3). For all three models, display, case urgency (urgent, non-urgent), participant experience (resident, attending/fellow), and case order (i.e., the order in which a participant saw the case) were included as fixed effects (no interaction terms), and an intercept for participant was included as a random effect.
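The overdispersion that motivated the negative binomial model can be checked with a variance-to-mean ratio, sketched below using the reported mean (33.0) and variance (206.3). A Poisson model assumes the variance equals the mean, so a ratio well above 1 points to overdispersion. The method-of-moments α assumes the common NB2 variance function Var = μ + αμ²; it ignores covariates and the participant random effect, so it is only a rough marginal diagnostic, not the dispersion estimated by the fitted model. The helper for raw counts uses the sample variance, which is an assumption about how the reported variance was computed.

```python
import statistics


def dispersion_stats(mean, variance):
    """Variance-to-mean ratio and a method-of-moments NB2 alpha.

    Poisson assumes Var = mean (ratio near 1). NB2 assumes
    Var = mu + alpha * mu**2, so alpha = (Var - mu) / mu**2.
    Marginal check only: covariates and random effects are ignored.
    """
    if mean <= 0:
        raise ValueError("mean must be positive")
    ratio = variance / mean
    alpha = (variance - mean) / mean**2
    return ratio, alpha


def dispersion_from_counts(counts):
    """Same check from raw per-case item counts (sample variance)."""
    return dispersion_stats(statistics.mean(counts), statistics.variance(counts))


# Reported summary for total items viewed: mean = 33.0, variance = 206.3
ratio, alpha = dispersion_stats(33.0, 206.3)
```

With the reported values, the variance is about 6.25 times the mean and α is about 0.16, well above the Poisson benchmark of equal mean and variance.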
Analysis of provider perceptions
Responses to the UTAUT scale items for the “explanation” and “prediction only” displays were visualized and compared using stacked bar charts. Free-text feedback on displays and moderator notes were qualitatively reviewed to assist in the interpretation of the UTAUT questionnaire responses and to identify additional insights about participant perceptions of the pediatric ICU in-hospital mortality risk model and the displays.
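Each bar in the stacked bar charts shows the distribution of responses to one UTAUT item for one display. A minimal sketch of the underlying tally follows. The five response labels are an assumption (the section says only “Likert scale agreement”), and the example responses are hypothetical.

```python
from collections import Counter

# Assumed 5-point agreement labels; the section says only "Likert scale agreement".
LEVELS = ("Strongly disagree", "Disagree", "Neutral", "Agree", "Strongly agree")


def stacked_shares(responses, levels=LEVELS):
    """Percentage of responses at each level, in plotting order (one stacked bar)."""
    if not responses:
        raise ValueError("no responses")
    counts = Counter(responses)
    unexpected = set(counts) - set(levels)
    if unexpected:
        raise ValueError(f"unexpected responses: {sorted(unexpected)}")
    n = len(responses)
    return [100 * counts[level] / n for level in levels]


def bar_segments(shares):
    """(start, length) of each stacked segment, accumulated in level order."""
    segments, start = [], 0.0
    for length in shares:
        segments.append((start, length))
        start += length
    return segments


# Hypothetical responses to "I would find the system useful in my job" for one display.
responses = ["Agree", "Agree", "Strongly agree", "Neutral"]
shares = stacked_shares(responses)
segments = bar_segments(shares)
```

The (start, length) pairs map onto the `bottom` and `height` arguments of matplotlib's `Axes.bar`, or onto `left` and `width` for horizontal bars drawn with `Axes.barh`, which is one way to render the chart.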
6.2 Results
A total of 15 participants were recruited for this study. Their responses to the background questionnaire on clinical experience are summarized in Table 15. Per the study design, each participant reviewed and provided responses for 6 patient cases. Due to a technical error, one participant was unable to complete one of their assigned cases, yielding a total of 89 participant responses for the patient cases (15 × 6 − 1). The breakdown of case responses by display and case urgency is shown in Table 16. In Sections 6.2.1 to 6.2.3, I describe the results of the analyses of decision accuracy and confidence, case review efficiency, and provider perceptions of the model, respectively.
Table 15. Summary of participant clinical experience
               Time in current position
Position       <1 year    1 to <2 years    2 to <3 years    Total
Attending      1          0                0                1
Fellow         1          5                1                7
Resident       0          2                5                7
Total                                                       15
Table 16. Participant responses by case urgency and display
                   Case urgency
Display            Non-urgent    Urgent    Total
No model           14            15        29
Prediction only    15            15        30
Explanation        15            15        30
Total              44            45        89