Data cleaning - Phase I Basic Model - Modeling and Improving Patient flow at an Emergency Depar

Chapter 3 Phase I Basic Model

3.2. Data cleaning

As mentioned earlier, raw data of patient records were provided by the hospital upon

request for this work. Data includes many information and time stamps, from patient

entering the ED till discharge, but not all of them were used for simulation purposes.

Besides that, since all data were recorded by the ED staffs in the information system, it

includes missing records and inaccurate information. Therefore, data cleaning was required

to trim data and make it ready for the data analysis. Data cleaning was done in three steps:

1. Deleting the missing and inaccurate records

Since the exported data were presented to us in an MS Excel format, some functions

were applied to clean the data from missing and inaccurate records. Filtering function was

applied to find all the blank cells in the required columns of data and the record of all

patients with missing data are entirely deleted. This process was repeated to clean all

incomplete patient records from the data file. For example, for some EMS arrival patient,

ESI levels were not entered by the triage nurse since the triage process happened in the

room instead of triage room. And probably the nurse forgot to enter the data into the system.

However, for some purposes such as finding patient arrival pattern and finding a total

number of a patient visiting the ED, all records, including those with missing data were

used since those missing data do not affect the result of data analysis. Figure 3-1 shows an

example of cleaning missing records in cell B950, K951, and L951. The entire records of

patients in rows 950 and 951 are deleted for analyzing arrival type and acuity level of

Figure 3-1 Data file with missing records

2. Adding index values for patients in certain categories

In the ED, each type of patient may receive a different type of care, have different

routing, and get a priority to receive a care. Therefore, each type of patient should be

separated and treated differently in the model as well. In the simulation model, each patient

has different attributes which distinguish it from the others. By assigning attributes to the

patient different routing policies and different sequence of the process can be applied to

the patients. In order to assign these index values accurately, it was required to use the

actual data. for this purpose, records of patient’s age group, acuity level, disposition type

and arrival type are selected and cleaned.

Since the ED allocate two different waiting areas to separate adults from pediatrics, it

is needed to find the percentage of each to implement in the model. In the data file records of patient age was followed by “year old” or “month old”. To change it to the number

the TRIM function was used to delete any possible extra spaces entered in each cell at the

end. Column E of Figure 3-2 is showing the TRIM function used. Column F of the same

Figure 3-2 is showing the LEN function applied to return the length of the string in each

cell, where in Column G the LEFT function was used to return 3 characters from the left

side of the text in each cell and it multiplied by 1 to change it to a number. In column, I of

Figure 3-2, IF and RIGHT functions were combined to check if the age was followed by “year old” or not. The IF function returns False if the record is followed by “year old” which means that patient is not an infant. In the last step in column J, IF function was used

to classify patients into 11 groups based on their age, by checking the content of column H

and column I. 0 to 1-year-old patients were considered as group 1, 1 to 10 years old patients

as group 2, 11 to 20 years old patients as group 3 and so on. Formulas were copied all the

Figure 3-2 Adding index value for age group

A similar procedure was taken to extract a proper format of acuity level records for

data analysis purpose. Figure 3-3 shows patient records of acuity level in column K and

the RIGHT function which is used to change the format of records in column L. The

Figure 3-3 Adding index value for acuity level

To easily categorize patients based on their disposition type in the data analysis part,

the records of disposition type were coded as follow: 1 for discharged, 2 for admitted, 3 for

left without being seen and 4 for transferred patients. Figure 3-4 shows how an IFS function

is applied in column P.

Figure 3-4 Adding index value for disposition type

The ED has two arrival points and patients arriving at each location follow a different

sequence of activities. In order to analyze the data of patient arrival, an index value was

three main group: walk-in patients, EMS arrivals, and LAW arrivals. VLOOKUP function

was applied to convert the arrival type records into the 3 arrival types: 1 representing EMS

arrivals, 2 for LAW arrivals and 3 for walk-in patients. The function and the output of using

the function in column C represented in Figure 3-5.

Figure 3-5 Adding index value for arrival method

3. Matching patient records from multiple sources

Since the data file was provided in different spreadsheets it was needed to aggregate

all records into one sheet for the data analysis. The VLOOKUP function was applied to

match patient records from different sheets by the identification number (MRN) and

aggregate all into one sheet. Figure 3-6 shows how the function was used to transfer all

transferred from different spreadsheets into one may shift. For example, in column AO the

imaging ordered time from different spreadsheet was copied by matching the MRN

number.

Figure 3-6 Matching data from different sources

In document Modeling and Improving Patient flow at an Emergency Department in a Local Hospital Using Discrete Event Simulation (Page 39-45)