Research Quality - Decision Making in a Microservice Architecture


9.2 Research Quality

To reflect on the quality of the design research documented in this thesis, the guidelines by Hevner et al. [103] are used. The authors propose seven guidelines that quality design research should follow. The guidelines, their descriptions and their execution in this research are shown in Table 14. Comparing these guidelines with the present research shows that all have been addressed. The weaker points are mainly the limited number of participants in the case studies used for validation, and the fact that only marginal changes to the artifact were made within the scope of this research. Future changes are anticipated, but not included. The next step to improve on this is to implement the artifact in practice, evaluate it and make changes accordingly. Apart from these points, the guidelines have been adhered to, suggesting that the research overall is reliable.

Table 14 - Guidelines for Design Science - Adopted from [103]

| Guideline | Description | Execution |
| --- | --- | --- |
| 1: Design as an Artifact | Design-science research must produce a viable artifact in the form of a construct, a model, a method or an instantiation. | An artifact in the form of a decision-making framework for managing microservice challenges was developed. |
| 2: Problem Relevance | The objective of design-science research is to develop technology-based solutions to important and relevant business problems. | The motivation for this research in part directly originates from a business, and the research benefits businesses by helping manage technical challenges. |
| 3: Design Evaluation | The utility, quality, and efficacy of a design artifact must be rigorously demonstrated via well-executed evaluation methods. | Various methods of validation including interviews on the challenges and single case mechanism experiments for the artifact were conducted. |
| 4: Research Contributions | Effective design science research must provide clear and verifiable contributions in the areas of the design artifact, design foundations and/or design methodologies. | Practical and theoretical insights on microservices and decision-making were [...] decision-making framework that can be used in practice. |
| 5: Research Rigor | Design-science research relies upon the application of rigorous methods in both the construction and evaluation of the design artifact. | The DSM by Wieringa was followed to structure the research. This methodology guided the design and evaluation of the artifact. |
| 6: Design as a Search Process | The search for an effective artifact requires utilizing available means to reach desired ends while satisfying laws in the problem environment. | The validated design represents a first step in an iterative process of improvement. It is acknowledged that real-world use is needed to further assess its fitness for purpose, and possible future changes are also considered. |
| 7: Communication of Research | Design-science research must be presented effectively both to technology-oriented as well as management-oriented audiences. | The descriptions and representations of the artifact were made to be communicated effectively to both academics and professionals involved with decision-making in microservices. |

DECISION-MAKING IN A MICROSERVICE ARCHITECTURE PAGE 110/130

9.3 Validity and Reliability

When talking about research validity, Gregor and Hevner give a clear description: “validity means that the artifact works and does what it is meant to do; that it is dependable in operational terms in achieving its goals” [104]. Wieringa [4] describes different types of validity that are involved in design science research. Those applicable to this research, and in particular to its validation, are construct validity, descriptive validity, internal validity and external validity.

“A conceptual framework is a set of definitions of concepts, often called constructs” [4]. Construct validity is defined by Wieringa as “the degree to which the application of constructs to phenomena is warranted with respect to the research goals and questions” [4]. The constructs of the conceptual framework can be subject to certain threats to their validity, several of which Wieringa discusses. A first threat is inadequate definition: constructs should be defined clearly enough to distinguish instances of a concept from non-instances. Effort has been put into defining the concepts involved in the conceptual framework based on related literature. For instance, a meta-model was included in the artifact to clearly show the different entities and their interactions during decision making. Through these measures, inadequate definition is avoided as much as possible. Construct confounding is another threat, in which instances of use cases can be ambiguous. To mitigate this, usage requirements were included in the artifact. These describe the cases to which the artifact should be applicable, ruling out the cases to which it is not. They also help define the population to which the findings can be generalised. Another possible threat is mono-operation bias, in which the indicators defined for a concept do not fully capture it. This was avoided as much as possible, for instance by measuring indicators of usability, usefulness and decision quality in the case studies using measurement questions from academic research. This way, participants were not only asked directly to comment on these indicators; the value of each indicator was determined by multiple different measurements. Mono-method bias can also be at play here, for instance when all indicators are measured in the same way. This is why, besides the surveys in the case studies, qualitative feedback and details about the decision-making process were also gathered, to find out whether the views from both are aligned. That is not to say that qualitative feedback cannot introduce other biases, but the combination of both should mitigate this as much as possible. With these measures against threats to construct validity in place, the aim is to make the constructs as valid as possible.

Descriptive validity is relevant for assessing the support for descriptive inferences made in this research. The checklists that Wieringa [4] proposed for designing the validation methodology can be revisited after execution of the research to assess the descriptive validity of the inferences. These include questions such as whether the prepared data represent the same phenomena as the ‘raw’ results, and whether scientific peers would be able to make the same interpretations from the data presented. The measurements and descriptions of the outcomes have been prepared with descriptive validity in mind. In the interviews on challenges, for example, participants were asked whether the interpretations of their answers matched their beliefs (member checking). In addition, no data was removed during the case study, results were discussed qualitatively to give more context for interpretation (triangulation), and the statistical procedures applied to the data were straightforward and could be reproduced by others.

Internal validity refers to the support for any abductive inferences made in this research. Wieringa [4] describes that the single case mechanism case studies that were conducted can support abductive inference. Again, threats to internal validity are listed. A main concern is possible sampling influence that can introduce biases in the results. For example, there can be a selection effect, by which subjects participating in a case study act differently just because they have been chosen to participate. Apart from random selection of subjects, it is hard to control this effect. No direct indications were found during the case studies that participants altered their behaviour, but there is also no evidence to say definitively that this was not the case. Participants were not hesitant to voice concerns during the case study, indicating that they were unlikely to have been less critical in order to satisfy the researcher. Treatment must also be controlled, for instance by selecting subjects randomly, or at least as randomly as possible. Practitioners were asked about possible case studies and the teams involved, but after selection of the case studies, no deliberate selection between team members was made; all were asked and free to participate. The experimental setup was also made to resemble working conditions in practice closely, so as to reduce any responses caused by the setup alone. It is possible that subjects responded to the novelty of the artifact under study in these case studies. While no direct indications were found for this, there is a chance that participants responded more favourably to the artifact just because it was novel. Influences by the experimenter were also avoided as much as possible by being aware of the possible threats of experimenter expectation (bias through hope for a particular experiment outcome) and experimenter compensation (when experimenters treat subjects differently based on their interaction with the treatment). Awareness of these threats helped the experimenter avoid making the mistakes described unknowingly.

External validity comes into play when generalising findings beyond the cases in which the artifact has been tested. Wieringa describes requirements for this [4]. Validation cases were chosen that fulfilled the usage requirements outlined in the artifact design. When searching for cases, practitioners were explicitly asked for real-world cases that were currently being considered. This was done to ensure they were a representative sample of the target population, thus helping support analogous inference. The treatment was also kept as similar to the intended use in practice as possible, for instance by limiting only the number of requirements and alternatives considered, but not the decision-making process and options themselves. One notable inhibitor for validity is the limited number of participants during validation, which leaves less certainty about the generalisability of the findings. The artifact behaved as intended during the case studies. By controlling as many factors as possible to create an environment similar to the intended environment for use in practice, generalisations are supported as well as possible.

As for the question of how reliable the present research is, we must look at the extent to which the operations of the study can be repeated with the same results [105]. The research followed Wieringa’s DSM [4] as its design research protocol. The steps taken in this process have been described in as much detail as possible. The fact that the practical insights from the case study largely originate from Thales Naval does somewhat inhibit repeatability. While many practical insights are expected to also occur in similarly sized and structured organisations, every organisation is different. If this research were to be conducted in collaboration with another company, similar general findings would likely be found.

As Wieringa [4] states, constructs and inferences are never totally valid, since science is fallible. This is true for all described types of validity. No major threats to the types described have been found. Nevertheless, case study research can be limited in construct validity especially, because the measurements are inherently subjective to an extent [105]. Instruments such as multi-method measurements and multiple measurements per indicator have been put in place to limit subjectivity. Still, the number of participants was a notable limiting factor for validity. Overall, care has been taken to have as few threats to research validity and reliability as possible. It is expected that the general findings are well supported by the conducted research, and that they are replicable in future research.
