5.2 Data Management

5.2.1 Missing Data

After the pilot test, the formal data collection ran from mid-October 2016 to mid-December 2016. Through multiple approaches (mostly email dissemination) and two to three email reminders, 652 participants started the survey, meaning that they opened the survey link. Among them, 298 companies fully completed the survey, meaning that they filled it in through to the final page.

Missing data is one of the most pervasive problems in data analysis. It is fairly common for a respondent not to answer one or more survey questions, and as a result missing data can affect the results of the research (Hair et al. 2006). As Hair et al. (2006, p51-52) claimed, one key practical impact of missing data is a reduction in sample size when cases with missing data are excluded, possibly from an adequate sample to an inadequate one. Another impact is that non-random missing data can sometimes affect the normality of the distribution and may further bias the results (Hair et al. 2006, p51-52).

To avoid missing data, this study set all questions as compulsory in the online survey, so that participants could not move on without answering a question. This guarantees that no data are missing in completed responses. In other words, the cases with missing data in this study are the uncompleted survey responses. Qualtrics has a progress tracking function that shows what percentage of a survey a participant has completed. To find the cases with missing data, the researcher therefore filtered the responses marked as “100%” completed, which gave 298 responses out of 652 recorded cases. The remaining 354 responses, where participants started the survey but did not complete it, are the cases with missing data in this study. The researcher decided to exclude these uncompleted responses from the sample for further analysis, following the “complete case approach”, one of the popular methods for dealing with missing data, rather than the “all available subsets approach” (Hair et al. 2006, p53). The reasons for following the complete case approach and removing all uncompleted cases are discussed below from both practical and statistical perspectives.

From the practical data perspective, a closer look at the uncompleted cases shows that they have a very high level of missing data. The majority of these participants simply opened the survey link and then closed it without answering any of the survey questions. (This is consistent with current survey practice, in which people tend to ignore survey emails once they have identified them, because of their busy work.) As introduced in Chapter 4, the survey consists of seven sections, and the last two sections cover competitive priorities and business performance respectively, which are the IVs and DV in the moderation model. However, 90% of the uncompleted cases lack half or more of the data, and 93% of them had not started answering the questions about competitive priorities, let alone those about business performance. In other words, 93% of the uncompleted cases lack the key data on the independent variables. The remaining 7% answered the questions up to CPs (competitive priorities) but did not start the questions about BP (business performance), so they cannot be used for model analysis either. All of the uncompleted cases therefore lack key data, and the level of missing data is too high to be remedied. There is consequently no value in keeping the uncompleted responses, and they are excluded.

In addition, during the data collection period, the researcher followed up responses with a completion progress of 90% or above by contacting the participants directly to ask for the missing information and encourage them to complete the survey. The uncompleted cases that finally remained are therefore of very poor quality and lack a great deal of information. From the practical data perspective, these uncompleted cases should also be removed.
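The complete case filter described above can be sketched as follows. This is a minimal illustration rather than the procedure actually run in Qualtrics: the records and the `progress` field name are hypothetical stand-ins for the exported completion percentage, and only the final counts (652 started, 298 completed) come from this study.

```python
# Hypothetical export rows; "progress" stands in for the completion
# percentage tracked by the survey tool (the field name is an assumption).
responses = [
    {"id": "r1", "progress": 100},
    {"id": "r2", "progress": 0},
    {"id": "r3", "progress": 45},
    {"id": "r4", "progress": 100},
]


def complete_cases(records):
    """Complete case approach: keep only fully completed responses."""
    return [r for r in records if r["progress"] == 100]


kept = complete_cases(responses)
excluded = len(responses) - len(kept)
print(f"kept={len(kept)}, excluded={excluded}")

# The same arithmetic with the counts reported in this study.
started, completed = 652, 298
uncompleted = started - completed
print(f"uncompleted cases: {uncompleted}")
```

Because every question was compulsory, filtering on full completion is equivalent here to dropping every case with any missing answer.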

From the statistical perspective, removing these uncompleted responses could improve the reliability of data and analysis results. In detail:

1) Removing uncompleted cases is a fundamental way to avoid the impacts of missing data. Keeping the uncompleted cases for analysis would require remedying the missing data, for example by substituting the mean value (Hair et al. 2006, p50-54). Such remedies could increase the risk of producing biased results. Therefore, the author believes it is better to keep the original data for analysis.

2) Regarding sample size, even after removing the uncompleted cases, the research still has enough cases for further analysis (the minimum required sample size is 175, as discussed in section 4.4.2.3.4). Also, by observation, the missing data occur randomly in this study. Thus, removing them will not affect the distribution of the dataset or create bias (further confirmed by the normality test in chapter 6). Therefore, the uncompleted responses can be removed without concern about an inadequate sample.
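The two statistical points above can be illustrated with a small sketch. The item scores below are hypothetical and are not taken from the survey; they only show one well-known side effect of mean substitution, namely that it leaves the mean unchanged but understates the variance. The sample size check uses the figures reported in this study (298 completed cases against a minimum of 175).

```python
import statistics

# Hypothetical item scores; None marks an unanswered question.
scores = [2, 4, None, 6, None, 8]

# Complete case view: use only the observed values.
observed = [s for s in scores if s is not None]
mean_observed = statistics.mean(observed)

# Mean substitution: fill each gap with the mean of the observed values.
imputed = [mean_observed if s is None else s for s in scores]

var_observed = statistics.variance(observed)
var_imputed = statistics.variance(imputed)
print(f"variance of observed values: {var_observed:.2f}")
print(f"variance after mean substitution: {var_imputed:.2f}")

# Sample size check with the figures reported in this study.
MIN_SAMPLE_SIZE = 175   # minimum discussed in section 4.4.2.3.4
completed_cases = 298   # completed responses before further screening
sample_is_adequate = completed_cases >= MIN_SAMPLE_SIZE
print(f"adequate sample after removal: {sample_is_adequate}")
```

In this toy example the imputed series has a smaller variance than the observed values alone, which is one way a remedy can distort later analysis.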

As stated in Chapter 1, this research has two objectives: to explore the current reshoring status and to explore the moderation relationship. To reveal a full picture of the current reshoring status and achieve both objectives, this research has to set a higher completion requirement for acceptable responses. Some may argue that responses which answered all the questions in the first five sections of the survey could be included in the descriptive analysis for research objective one, even though they lack information on CPs and BP. However, the author believes it makes more sense to use the same dataset for both research objectives, rather than separate datasets with different sample sizes. In particular, the first research objective also needs information on CPs and BP as part of the description of the reshoring status. Therefore, it is better to take the completed cases for analysis and drop all the uncompleted cases in this study. In future research, however, the cases with missing data could be used for other purposes. For example, the cases which completed all the questions other than the CPs and BP sections could be used for further exploration of a specific aspect of the reshoring status. The cases which completed the questions up to CPs could be used for research exploring the key CPs of reshoring, or the relationships between CPs and location decisions.

After excluding the uncompleted cases, eight of the 298 completed responses were found to be duplicates from the same companies. They were removed, leaving 290 responses. The author then further filtered these based on the respondents’ awareness of company location strategy to remove unqualified responses, leaving a total of 272. Within the 272, a double check of the reliability of the responses identified a further three cases, which were removed. Therefore, 269 is the final sample size applied in the analysis of this research.
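The screening steps above can be summarised as a simple funnel. The counts at each stage are those reported in the text; the step labels are shorthand introduced here, and the number removed at the awareness screening stage (18) is derived from 290 and 272 rather than stated directly.

```python
# Cases remaining after each screening step, as reported in the text.
funnel = [
    ("started the survey", 652),
    ("fully completed", 298),
    ("duplicates removed", 290),
    ("awareness screening", 272),
    ("reliability check", 269),
]

# Number of cases removed at each step (difference between stages).
removed = [
    (funnel[i + 1][0], funnel[i][1] - funnel[i + 1][1])
    for i in range(len(funnel) - 1)
]

for step, count in removed:
    print(f"{step}: removed {count}")

final_sample_size = funnel[-1][1]
print(f"final sample size: {final_sample_size}")
```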

In document Exploring the impact of reshoring decisions on supply chain and business performance : evidence from 261 UK manufacturers (Page 153-157)