
Subjective data covers both responses from participants about their own opinions and notes made by observers about the participants they have been asked to observe.

Participants

When collecting data from participants, this is primarily done via questionnaires. In this section we focus on questionnaires that are physically filled out by the participants themselves.

Instructions

The key thing is to give the participants instructions not only about what is required from them, but also about why the information is needed. This may seem like an obvious statement, but the second point isn't always applied.

There's no better incentive than showing exactly how the responses collected can directly benefit the participant. For example, military participants aren't usually enthralled about filling in multiple questionnaires; but if the benefits are made clear, such as that the results will directly affect which body armor they will be using in the future, then they are much more willing to put in the time and answer the questions thoughtfully. There's nothing worse than receiving blank questionnaires, or questionnaires where no thought has gone into the answers: for example, exactly the same response checked throughout, or the neutral response always checked, because the participants were either too busy or not engaged.

Figure 2-2. Missing data responses for Ct values

Chapter 2 | Data Collection


Giving clear instructions about how to fill in the questionnaires properly will reduce the data formatting time immensely. Even with a well-planned and well-produced questionnaire, participants can still answer incorrectly in terms of the response data type required for the analysis. For example, if you are using a Likert-type scale, such as the one in Figure 2-3 for the likelihood of an event occurring, you need to define the responses you are expecting to see.

Figure 2-3. Likelihood of an event occurring response options

Figure 2-4. Likelihood of an event occurring responses

Figure 2-4 shows an example of some of the responses received that we would class as "incorrect" for use in the analysis, due to multiple options being chosen. "AC-AI" in the figure stands for "Almost Certain–Almost Impossible" on the scale in Figure 2-3, and so forth.

In the questionnaire provided, the question simply stated "What is the likelihood of the event occurring?" with the Likert scale from Figure 2-3 shown on its side at the edge of the page and a blank line for the participant's response. The first "incorrect" response could clearly be discounted from the analysis, as the participant had covered the entire response option space; however, the other responses are understandable, as they are within a close range of each other.

A lesson learned from this would be to either specifically state up front that only one likelihood can be chosen, or to provide check boxes with an instruction to check one option only.

A recommended extra for questions similar to the one in the previous paragraph is to include a confidence level. For example, in the questionnaire we included a five-point Likert confidence scale, from very confident to very unconfident, below each question. The advantage of this was that it captured the participants' confidence in the answers they chose, which aimed to improve the understanding of the analysis. You could see when someone really wasn't sure about an answer, and may have just chosen it because they knew they

Translating Statistics to Make Decisions 39

needed to note an answer, and also when someone was very confident about the response they had chosen. This isn't always applicable though, such as for questions about comfort, ease of use, preference, and so forth.

Repeats

It may be necessary to gather multiple questionnaire responses from the same participants over time, in which case this needs to be accounted for during the analysis. The number of questionnaires per participant needs to be decided during the design of experiments phase, depending on the relative gain of useful information; running a pilot study may improve this estimate.

While beneficial for answering the customer question, this approach can be burdensome for the participant, and thought needs to be given to that aspect. Imagine you were the participant having to answer all these questionnaires: could you give thoughtful answers each time?

We conduct multiple studies where we piggyback, as it were, onto military training exercises, so the participants perform their everyday routines without any interference from us except for handing out the questionnaires. From our point of view, we want to gather lots of information once we have added something extra to their training, such as different vehicles, storage, clothing, and so forth; from their point of view, they just want to get on with the training. We have to strike a balance between gaining our information and not annoying or distracting them, which would result in noncooperation or unrealistic responses. An easy starting point, as mentioned earlier, is making sure they understand the direct benefit to themselves of giving us this information, then making sure the questionnaires are as clear and concise as possible, and finally being a bit flexible. On the last point, for example, you don't want to stop them mid-fire with the "enemy"; postponing the questionnaire until just afterward will produce much better results, as they won't be rushing to get back to the action.

Observers

In some cases the subjective data will be collected by observers watching other people, the participants, carry out an experiment (or training, to use the military example). This can be done in addition to the participants directly filling in questionnaires themselves.


Instructions

Clear instructions need to be given to the observers as well as to the participants, as the observers need to know how their responses will fit into answering the key customer questions. They also need to know exactly what the participants will be doing.

The observers need to be made aware of the type of things to look out for, so that the notes they make can be applied to one or more of the key question areas. For example, is it of interest for them to record who speaks to whom, stress levels, whether anyone is not being included, and so forth?

Tied in to this is defining who the observers will be watching: will there be different observers per participant or per group of people, or will there be multiple observers generally watching the whole room?

If things such as stress levels are of interest, should they just be recorded in note form, or would it be preferable to record stress in a structured format, such as on a Likert scale, so there is comparison across both time and the observers? It's good practice to have the observers note some of the same things that the participants themselves have been asked, such as workload, as it's interesting to compare the observers' opinions with what the participants have actually recorded.

There is also the practical aspect of how long the observer can actually stay alert to what is going on. There should be planned breaks for the observers, with replacements coming in, so the experiment doesn't need to be paused and the flow disrupted.

Authority

The observers need to know their place, in a completely non-negative sense.

If their role is just to observe the room, then they need to be as invisible as possible. They must not engage with the participants by asking or answering questions, and the participants should be made aware that they are not to become involved with the observers. The observers also need to ensure that they stay impartial.

If the observer's role includes issuing and collecting questionnaires, or asking the participants the questions directly, then they need to take the lead with this aspect. For example, questionnaires may be missing from the results because the observers felt that, as the military participant was of very high rank, they didn't have the authority to tell them what to do. It needs to be made clear that in an experiment they have the authority to stop the participant, in line with a sensible stopping time, to gain the responses required. The participant may get annoyed, but with reiteration of the personal gain from providing the responses, it should pave the way to getting sensible responses rather than


a rush through to get it done or an outright dismissal of the questions. This boils down to having good communication between the observers and the participants, and it may mean changing observers around, as sometimes there are personality clashes that simply cannot be avoided.

Variation

There will be variation between the observed participants, and that is what we want to capture through the experiment. However, there will also be variation between the observers (there should always be multiple observers), and this we want to minimize as much as possible. A good starting point is to run a pilot study to highlight the type of things you want them to notice, collect, or do; this way they can practice as well as see the methods other observers use.

There are some statistics that can be used during analysis, or in the initial pilot, to measure inter-rater agreement and internal consistency; two common ones are Cohen's kappa and Cronbach's alpha. To be able to calculate these statistics, an observer ID should be recorded.
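As a sketch of these checks in R: the irr and psych packages provide the two statistics mentioned above, but the column names and ratings below are purely hypothetical.

```r
# Hypothetical observer ratings; assumes the irr and psych packages are installed
library(irr)    # provides kappa2() for Cohen's kappa
library(psych)  # provides alpha() for Cronbach's alpha

# Two observers rating the same six events on a 1-5 scale
ratings <- data.frame(Rater1 = c(3, 4, 2, 5, 3, 4),
                      Rater2 = c(3, 4, 3, 5, 2, 4))

# Inter-rater agreement between the two observers
kappa2(ratings)

# Internal consistency across related questionnaire items
items <- data.frame(Q1 = c(4, 5, 3, 4, 2),
                    Q2 = c(4, 4, 3, 5, 2),
                    Q3 = c(5, 4, 2, 4, 3))
alpha(items)
```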

Formatting

If the data collection instructions have not been very clear, then formatting the data for analysis can take a long time; sometimes it can take longer than the analysis itself.

If the data has been collected by hand, that is, on paper, it will then need to be inputted into a computer. Thought should be given to where these documents are stored to avoid them being lost, how they can be distinguished from each other (a naming convention on the top is recommended), and when they should be inputted as soft copies, for example at the end of each day, at the end of the experiment, and so forth.

Contrary to some advice given in statistics, entering the data from hard copy to soft copy is generally best done by one person. This greatly reduces the risk of data duplication and of inconsistent naming conventions, and it provides a single point of contact for questions. However, for quality assurance the data can also be entered independently by another person, depending on the time burden, and the two data sets then compared on the computer to check for possible errors from human input.
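That comparison step can be sketched in R as follows; the file names are hypothetical, and the check assumes both copies have the same dimensions and column order.

```r
# Read the two independently entered copies (hypothetical file names)
entry1 <- read.csv("DataEntryPerson1.csv")
entry2 <- read.csv("DataEntryPerson2.csv")

# Rows and columns where the two copies disagree;
# an empty result means the two entries match
which(entry1 != entry2, arr.ind = TRUE)
```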

If the data has already been collected in soft copy, then the task is to merge all the data and/or check for the same mistakes that could be made when inputting data from hard copies, which we will look at later. Make sure there are always backup files for all soft copy data; the worst thing would be to lose a whole trial's worth of data.


Generally speaking, the way software requires the data to be set up for analysis usually isn't pleasing to the eye, so sometimes there will be a "viewable" data set and an analysis data set. R requires data to be stacked, as in Figure 2-5 (the middle section has been deleted to save space), unless you are dealing with paired data, which is discussed in Chapter 3.

Figure 2-5. Example of stacked data

It's always better to start with the complete data set in this form, as it's very simple to subset out sections, such as per question, rather than to start with separate columns for questions and have to merge them later on.
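A minimal sketch of what stacked ("long") data looks like, using hypothetical column names, and how simple subsetting then becomes:

```r
# Hypothetical stacked data: one row per participant per question
data <- data.frame(ID = rep(1:3, times = 2),
                   Question = rep(c("Q1", "Q2"), each = 3),
                   Response = c(4, 5, 3, 2, 4, 4))

# Subsetting out a single question is then one line
q1 <- subset(data, Question == "Q1")
```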

Different software may require the data to be set out in different formats, so you need to be aware of what format is required as this will drive how you structure the soft copy data.

I also recommend using Excel to store the data in csv files, as Excel is accessible to almost all companies and files can be converted for those with older versions. It is easy to use, and csv files can be read by most software packages.


That said, no matter whether you are inputting the data from hard copies or checking already-entered soft copy data for errors or discontinuity, there are certain things you should be aware of:

• Get to know the software you will be using for analysis and what structure the data will need to be in.

• Make sure the data template is set up first, which means make sure you have all your column headers in place so you know where all the data fits.

• Ensure the column headings are kept as simple and clear as possible, as a lot of software will convert spaces and/or symbols to a full stop. This doesn't apply to group levels within a variable, as spaces and symbols can be included in these.

• Keep track of the data so you can avoid multiple entries or missing out data.

• If applicable, make sure the participants have an ID number; whether the questionnaires are anonymous or not, you may need to refer to it later on.

• Ensure you enter all the raw data rather than averages; any summary statistics can be calculated later using the software.

• Use the same value for missing data, whether it’s a blank space or specific text.

• If there are zeros in the recorded data, check whether they represent zeros or missing data and input appropriately.

• When repeating text, such as group names, copy and paste is best, as this way you avoid misspelling words, inserting an extra space, and so forth.

• If you are dragging a cell down to copy it in Excel and it contains a number, make sure it copied correctly: copy the cell contents as opposed to continuing the sequence.

• Save the document with an extension usable by the software; for example, R can read .xlsx files using certain packages, but it's much simpler and quicker to save a .csv file.

Once the data has been successfully inputted and checked, you can still double-check items within the software. The software will soon highlight an error if the data is in an unreadable format, if there are extra unnamed variables (usually caused by a space in a single cell), or if the data is unbalanced when it shouldn't be (because you haven't filled in all the data for one variable).


The following examples of formatting checks, including some errors that may be seen, are shown using R, so skip to the end of the chapter if you won't be using that software.

Reading in data to R will immediately show if the data set is complete.

data = read.csv("MyData.csv")

Error in file(file, "rt") : cannot open the connection
In addition: Warning message:
In file(file, "rt") :
  cannot open file 'MyData.csv': No such file or directory

Common causes of this error are that R is looking in a folder that doesn't contain the data, in which case you need to change the working directory, or that the data set was not saved with the extension R is asking for.
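A sketch of how to diagnose and fix the working directory issue; the folder path shown is hypothetical.

```r
getwd()                       # check which folder R is currently looking in
list.files()                  # see whether MyData.csv is actually there
setwd("C:/Experiment/Data")   # point R at the folder containing the file (hypothetical path)
data = read.csv("MyData.csv") # try reading the data again
```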

Once the data has been read successfully, check the class of your variables to verify they are in the format you expect; it may be that you need to tweak one. For example, if Day was recorded as 1, 2, 3, then R will assume it to be an integer, treated as continuous data in models, whereas you may want it to be a factor.

class(data$Gender); class(data$Day)
[1] "factor"
[1] "integer"

data$Day = factor(data$Day); class(data$Day)
[1] "factor"

You also can check the levels of factors to ensure there haven't been any spelling mistakes or missing/extra levels.

levels(data$Smoker); levels(data$Likert)
[1] "N" "y" "Y"
[1] "1" "12" "2" "3" "4" "5"

You can clearly see there has been a typo when entering the data for whether someone smokes or not (Smoker), and there has also been an error entering the Likert responses. With Smoker it is obvious that the "y" should be changed to a "Y" to tie in with the capital "N." However, with the Likert response you don't know whether the value of 12 should have been a 1 or a 2, so either the hard copies will need to be referred to, or the data may have to be excluded if there's no way of verifying the correct answer.
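One possible way to correct the Smoker typo in R, assuming the "y" responses should indeed be "Y":

```r
# Recode the lowercase "y" to the existing "Y" level,
# then drop the now-empty "y" level from the factor
data$Smoker[data$Smoker == "y"] <- "Y"
data$Smoker <- droplevels(data$Smoker)
levels(data$Smoker)
```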

With numerical data you can investigate the minimum and maximum values to ensure that they give the range you were expecting; an anomaly may be missed if only the mean is investigated.

min(data$Age); max(data$Age)
[1] 0
[1] 220

mean(data$Age); median(data$Age)
[1] 59.5
[1] 42

A quick look at the mean shows a sensible age, though it is a bit high if you also check the median. However, by including the minimum and maximum values we can see that there have obviously been some errors made in the data input. The age of 0 is clearly incorrect, but does it mean N/A, or has another number been missed before the zero? Likewise with 220: should it have been 22, 20, or neither? These would both need to be checked and changed if possible; otherwise they would need to be excluded from the analysis.

As a side note, if summary(data$Age) were used, this would show the minimum, first quartile, median, mean, third quartile, and maximum of the data; more on this in Chapter 4.

You will also be able to see whether all the data has been entered, especially if you have equal groups of participants.
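A sketch of that kind of check, assuming a grouping column named Group:

```r
length(data$Group)   # total number of responses entered
summary(data$Group)  # counts per group; a blank level reveals a missed label
```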

Although the length is correct for the number of responses we should have, the groups are not: one participant is not in a group when they should have been in Group D. The most likely explanation is that the last label was left off the spreadsheet by accident, which is easily rectified.

Summary

This chapter has shown the thought process behind collecting the data in terms of what will be collected, how it will be collected, who will be collecting it, and the possible problems that can occur within each if not given enough consideration. It also looked at the issues that can occur with data formatting.
