
3. State of the Art

3.2. Data Quality

3.2.3. Data Quality Dimensions

Since data quality touches many different aspects, it can be decomposed into data quality dimensions. Depending on the use case, only a subset of all known dimensions needs consideration, which breaks the problem of measuring quality down into smaller pieces. Even though regarding data quality as a multi-dimensional concept is a common view in the literature and a large number of dimensions have been proposed (see Appendix C), “there is no agreement on the set of dimensions characterizing data quality” [128]. There is also no consensus on what particular dimensions mean, which leads to multiple definitions of a single dimension. To some extent, this situation reflects the fact that quality is often considered in connection with a certain use case or application domain. Besides these differing views of quality dimensions, there are also different approaches to deriving them from a given use case. These are categorized into theoretical, empirical and intuitive approaches [128, 14].

In a theoretical approach, the modeled system is considered in a more abstract way, and a formal model is derived to detect and describe quality issues. An example of such an approach is the quality model of Wand and Wang. Because this model abstracts the development of an information system as the problem of mapping the real world onto the system, several artifacts of interest can be derived. These are design deficiencies, which refer to the errors shown in Figure 11 (incomplete representation, ambiguous representation and meaningless state), and operation deficiencies, which stand for inappropriate behaviour of the system. With these deficiencies at hand, one can define quality dimensions as shown in Table 9. The process-oriented models mentioned in the previous section are theoretical approaches as well.
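The three design deficiencies can be illustrated with a small sketch. It assumes the usual reading of the mapping view: a real-world state without any system state is an incomplete representation, a system state standing for several real-world states is an ambiguous representation, and a system state that no real-world state maps to is a meaningless state. The function name, the dictionary encoding of the mapping and the example states are hypothetical and are not taken from Wand and Wang.

```python
from collections import defaultdict


def design_deficiencies(real_world_states, system_states, mapping):
    """Classify design deficiencies of a representation mapping.

    `mapping` is a dict from a real-world state to the system state that
    represents it (hypothetical encoding; a missing key means the
    real-world state is not represented at all).
    """
    # Incomplete representation: real-world states with no system state.
    incomplete = {rw for rw in real_world_states if rw not in mapping}

    # Ambiguous representation: one system state represents several
    # real-world states.
    sources = defaultdict(set)
    for rw, state in mapping.items():
        sources[state].add(rw)
    ambiguous = {s: rws for s, rws in sources.items() if len(rws) > 1}

    # Meaningless state: system states that no real-world state maps to.
    meaningless = set(system_states) - set(sources)

    return incomplete, ambiguous, meaningless


# Illustrative example: a door with four real-world states and a system
# that can store three codes.
real_world = {"open", "closed", "locked", "broken"}
system = {"O", "C", "X"}
mapping = {"open": "O", "closed": "C", "locked": "C"}

incomplete, ambiguous, meaningless = design_deficiencies(real_world, system, mapping)
```

In this example, "broken" cannot be represented, "C" is ambiguous between "closed" and "locked", and "X" corresponds to no real-world state.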

Dimension | Description
Accuracy and Precision | “inaccuracy implies that the information system represents a real world state different from the one that should have been represented.”
Reliability | indicates “whether the data can be counted on to convey the right information; it can be viewed as correctness of data.”
Timeliness and Currency | refers to “the delay between a change of the real-world state and the resulting modification of the information system state.”
Completeness | is “the ability of an information system to represent every meaningful state of the represented real world system.”
Consistency | inconsistency of data values occurs if there is more than one state of the information system matching a state of the real-world system; therefore “inconsistency would mean that the representation mapping is one-to-many.”

Table 9: Quality dimensions derived from the quality model of Wand and Wang [143]

The empirical approach does not consider formal models but takes stakeholder opinions into account. In most cases, such approaches are based on a user survey, as in the method of Wang and Strong [144]. There, a survey performed in multiple steps led to a shortlisted catalogue of 19 quality dimensions grouped into four categories, shown in Figure 12.

When following an intuitive approach, data quality dimensions are defined “according to common sense and practical experience” [14]. A concrete example of this approach is given by Redman [123]. The corresponding data quality dimensions are listed in Table 10.

These three approaches and their prerequisites are summarized in Figure 13. Apart from the dimensions presented for the three approaches, an overview of all dimensions introduced in the considered literature can be found in Appendix C. To ease understanding, the dimension definitions or descriptions were normalized using a shared vocabulary for formulae, and consolidated where multiple dimensions share the same meaning.

Data Quality
Intrinsic Data Quality: Believability, Accuracy, Objectivity, Reputation
Contextual Data Quality: Value-added, Relevancy, Timeliness, Completeness, Appropriate amount of data
Representational Data Quality: Interpretability, Ease of understanding, Representational consistency, Concise representation
Accessibility Data Quality: Accessibility, Access security

Figure 12: Quality dimensions according to Wang and Strong [144]
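Because only a subset of dimensions is usually relevant for a given use case, the hierarchy of Figure 12 can be held as a simple lookup structure from which that subset is selected. The following is a minimal sketch. The grouping follows the commonly cited Wang and Strong framework, while the names `WANG_STRONG` and `select_dimensions` are hypothetical.

```python
# Categories and dimensions as listed in Figure 12 (grouping assumed to
# follow the Wang and Strong framework).
WANG_STRONG = {
    "Intrinsic": ["Believability", "Accuracy", "Objectivity", "Reputation"],
    "Contextual": ["Value-added", "Relevancy", "Timeliness", "Completeness",
                   "Appropriate amount of data"],
    "Representational": ["Interpretability", "Ease of understanding",
                         "Representational consistency",
                         "Concise representation"],
    "Accessibility": ["Accessibility", "Access security"],
}


def select_dimensions(relevant):
    """Return the chosen dimensions grouped by their category."""
    relevant = set(relevant)
    known = {d for dims in WANG_STRONG.values() for d in dims}
    unknown = relevant - known
    if unknown:
        raise ValueError(f"unknown dimensions: {sorted(unknown)}")
    selected = {}
    for category, dims in WANG_STRONG.items():
        chosen = [d for d in dims if d in relevant]
        if chosen:  # skip categories without any selected dimension
            selected[category] = chosen
    return selected


# Hypothetical use case that only cares about two dimensions.
selected = select_dimensions({"Accuracy", "Timeliness"})
```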

Type | Dimension | Description
Data value | Accuracy | “Distance between v and v′, considered as correct”
Data value | Completeness | “Degree to which values are present in a data collection”
Data value | Currency | “Degree to which a datum is up to date”
Data value | Consistency | “Coherence of the same datum, represented in multiple copies, or different data to respect integrity constraints and rules”
Data format | Appropriateness | “One format is more appropriate than another if it is more suited to the user needs”
Data format | Interpretability | “Ability of the user to interpret correctly values from their format”
Data format | Portability | “The format can be applied to as a wide set of situations as possible”
Data format | Format precision | “Ability to distinguish between elements in the domain that must be distinguished by users”
Data format | Format flexibility | “Changes in user needs and recording medium can be easily accommodated”
Data format | Ability to represent null values | “Ability to distinguish neatly (without ambiguities) null and default values from applicable values of the domain”
Data format | Efficient use of memory | “Efficiency in the physical representation. An icon is less efficient than a code”
Data format | Representation consistency | “Coherence of physical instances of data with their formats”

Table 10: Quality dimensions proposed by Redman [123] (cited from [14])
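Two of the data value dimensions in Table 10 lend themselves to a simple measurement. The sketch below shows one possible reading, not Redman's own formalization: completeness as the fraction of non-missing values, and currency as a check against a maximum age. The function names, the use of `None` for missing values and the age threshold are assumptions made for illustration.

```python
from datetime import datetime, timedelta


def completeness(records, fields):
    """Fraction of expected values that are present (not None)."""
    total = len(records) * len(fields)
    if total == 0:
        return 1.0  # assumption: an empty collection counts as complete
    present = sum(
        1 for record in records for field in fields
        if record.get(field) is not None
    )
    return present / total


def is_current(last_updated, now, max_age):
    """True if the datum was updated within `max_age` before `now`."""
    return now - last_updated <= max_age


# Illustrative data: one of four expected values is missing.
records = [
    {"name": "Ada", "email": "ada@example.org"},
    {"name": "Bob", "email": None},
]
score = completeness(records, ["name", "email"])

fresh = is_current(
    last_updated=datetime(2024, 1, 1),
    now=datetime(2024, 1, 3),
    max_age=timedelta(days=7),
)
```

Here `score` is 0.75 because three of the four expected values are present, and `fresh` is true because the datum is two days old against a seven-day limit.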

[Figure: each approach with its prerequisite and the resulting dimensions. Theoretical: model/formalization. Empirical: survey. Intuitive: experience/intuition.]

Figure 13: Approaches to derive quality dimensions to consider for a given domain and their prerequisites
