Chapter 3 Phase I Basic Model
3.2. Data cleaning
As mentioned earlier, raw data of patient records were provided by the hospital upon
request for this work. Data includes many information and time stamps, from patient
entering the ED till discharge, but not all of them were used for simulation purposes.
Besides that, since all data were recorded by the ED staffs in the information system, it
includes missing records and inaccurate information. Therefore, data cleaning was required
to trim data and make it ready for the data analysis. Data cleaning was done in three steps:
1. Deleting the missing and inaccurate records
Since the exported data were presented to us in an MS Excel format, some functions
were applied to clean the data from missing and inaccurate records. Filtering function was
applied to find all the blank cells in the required columns of data and the record of all
patients with missing data are entirely deleted. This process was repeated to clean all
incomplete patient records from the data file. For example, for some EMS arrival patient,
ESI levels were not entered by the triage nurse since the triage process happened in the
room instead of triage room. And probably the nurse forgot to enter the data into the system.
However, for some purposes such as finding patient arrival pattern and finding a total
number of a patient visiting the ED, all records, including those with missing data were
used since those missing data do not affect the result of data analysis. Figure 3-1 shows an
example of cleaning missing records in cell B950, K951, and L951. The entire records of
patients in rows 950 and 951 are deleted for analyzing arrival type and acuity level of
Figure 3-1 Data file with missing records
2. Adding index values for patients in certain categories
In the ED, each type of patient may receive a different type of care, have different
routing, and get a priority to receive a care. Therefore, each type of patient should be
separated and treated differently in the model as well. In the simulation model, each patient
has different attributes which distinguish it from the others. By assigning attributes to the
patient different routing policies and different sequence of the process can be applied to
the patients. In order to assign these index values accurately, it was required to use the
actual data. for this purpose, records of patient’s age group, acuity level, disposition type
and arrival type are selected and cleaned.
Since the ED allocate two different waiting areas to separate adults from pediatrics, it
is needed to find the percentage of each to implement in the model. In the data file records of patient age was followed by “year old” or “month old”. To change it to the number
the TRIM function was used to delete any possible extra spaces entered in each cell at the
end. Column E of Figure 3-2 is showing the TRIM function used. Column F of the same
Figure 3-2 is showing the LEN function applied to return the length of the string in each
cell, where in Column G the LEFT function was used to return 3 characters from the left
side of the text in each cell and it multiplied by 1 to change it to a number. In column, I of
Figure 3-2, IF and RIGHT functions were combined to check if the age was followed by “year old” or not. The IF function returns False if the record is followed by “year old” which means that patient is not an infant. In the last step in column J, IF function was used
to classify patients into 11 groups based on their age, by checking the content of column H
and column I. 0 to 1-year-old patients were considered as group 1, 1 to 10 years old patients
as group 2, 11 to 20 years old patients as group 3 and so on. Formulas were copied all the
Figure 3-2 Adding index value for age group
A similar procedure was taken to extract a proper format of acuity level records for
data analysis purpose. Figure 3-3 shows patient records of acuity level in column K and
the RIGHT function which is used to change the format of records in column L. The
Figure 3-3 Adding index value for acuity level
To easily categorize patients based on their disposition type in the data analysis part,
the records of disposition type were coded as follow: 1 for discharged, 2 for admitted, 3 for
left without being seen and 4 for transferred patients. Figure 3-4 shows how an IFS function
is applied in column P.
Figure 3-4 Adding index value for disposition type
The ED has two arrival points and patients arriving at each location follow a different
sequence of activities. In order to analyze the data of patient arrival, an index value was
three main group: walk-in patients, EMS arrivals, and LAW arrivals. VLOOKUP function
was applied to convert the arrival type records into the 3 arrival types: 1 representing EMS
arrivals, 2 for LAW arrivals and 3 for walk-in patients. The function and the output of using
the function in column C represented in Figure 3-5.
Figure 3-5 Adding index value for arrival method
3. Matching patient records from multiple sources
Since the data file was provided in different spreadsheets it was needed to aggregate
all records into one sheet for the data analysis. The VLOOKUP function was applied to
match patient records from different sheets by the identification number (MRN) and
aggregate all into one sheet. Figure 3-6 shows how the function was used to transfer all
transferred from different spreadsheets into one may shift. For example, in column AO the
imaging ordered time from different spreadsheet was copied by matching the MRN
number.
Figure 3-6 Matching data from different sources