This section examines metadata from a functional perspective and illustrates the different roles of metadata in a complex decision environment. Managing metadata is critical for the success of decision environments such as data warehouses (Kimball et al., 1998; Marco, 2000) and for ensuring the quality of the data within them (Jarke et al., 2000). Because metadata serves many different needs in such environments, several attempts have been made to classify and categorize it. Kimball (1998) distinguishes between metadata supporting back-end data processing and metadata supporting front-end data delivery. Marco (1998) classifies metadata as technical metadata, useful for system and data administrators, and business metadata, which end-users find useful when using the data, applications and systems. Imhoff (2003) classifies metadata as business, technical and administrative, where administrative metadata includes data necessary to manage the overall performance of the decision environment, such as audit trails and performance metrics. These classifications do not fully explain the functionality of metadata. The taxonomy presented here (first introduced in Shankaranarayanan & Even, 2004) extends them and classifies metadata in a more granular manner. It is based upon six functionality categories: infrastructure, data model, process, quality, interface and administration. The metadata in each category is further classified along the previously defined perspectives: business versus technical and front-end versus back-end.
• Infrastructure metadata: Infrastructure metadata (Table 1) contains data on system components and abstracts the infrastructure of the information system. It is used primarily for system administration, maintenance and enhancements.
• Data model metadata: Data model metadata (Table 2), also known as the data dictionary, includes definitions of the data entities maintained and the relationships among them. The data dictionary captures storage information at different levels, such as databases, tables and fields. As a data integration solution, the data dictionary includes the modules necessary for “vocabulary mapping” across multiple user-groups or business units. It also includes the semantic layer necessary to translate source data elements to their data warehouse representation, and the business terms needed for end-users to interpret the data elements.
• Process metadata: Process metadata (Table 3) abstracts information on data generation, describing how data items were transferred from sources to targets and what manipulations were applied during transfer. Process metadata serves both technical and business users: IT professionals use it for activating and managing the ETL (extraction, transformation and loading) processes, while business managers use it to assess data sources and understand the manipulations applied. Shankaranarayanan et al. (2003) propose the information product map (IPMAP) as a technique for representing the processes in the manufacture of an information product by adopting the information product approach defined by Wang et al. (1998). The IPMAP allows the decision maker to visualize not only the widespread distribution of data and data processing resources, but also the flow of data elements and the sequence by which they were processed. It is discussed in detail with an example in section 4.

Table 1. Infrastructure metadata
Back-End, Business:
• Business identification of systems
Back-End, Technical:
• URLs
• Maintenance of hardware, OS and Database servers
• Network protocols and address configuration
• Database administration and configuration parameters
Front-End, Business: Similar to the above
Front-End, Technical: Similar to the above

Table 2. Data model metadata
Back-End, Business:
• Business interpretation of data items
Back-End, Technical:
• Data structure, which depends on the storage type (e.g., text files, RDBMS or data streams)
• Elements specific to RDBMS storage: tables, fields, indices, views, stored procedures and triggers
• Mapping source data elements to the data warehouse
Front-End, Business:
• “Semantic Layer” of naming and definitions of data items in “business” language
• Report contents and format
Front-End, Technical:
• Mapping data items to tables, fields, or file locations
• Data extraction syntax
• Syntax for joining multiple data sources
• Quality metadata: Quality metadata (Table 4) contains quality-related information on the actual data stored and helps with assessing the quality of the data. It includes factual measurements, such as the number of records stored, as well as data quality measures along dimensions such as accuracy, completeness and timeliness, using techniques suggested in Ballou et al. (1998) and Shankaranarayanan et al. (2003).
• Interface metadata: Interface metadata (Table 5), also known as reporting metadata, supports the delivery of data to end users. Interface metadata includes information on report templates used for delivering data, where fields are linked to one or more data elements. It may also include the dimension hierarchies associated with the data, template files used for displaying reports (for example, Cascading Style Sheets, or CSS, for Web outputs, or Formula-1 templates for Excel-like outputs) and configurations of reports that constitute “dashboards”, that is, collections of data views that support managerial needs.
• Administration metadata: Administration metadata (Table 6) includes data necessary for administering the decision environments and associated applications, supporting tasks such as security, authentication and usage tracking.

Table 3. Process metadata
Back-End, Business:
• Business interpretation of data transfers: source-target mapping, new fields, integration, aggregations and filtering
• Process charting (IPMAP)
• Data cleansing business rules
Back-End, Technical:
• ETL software, engines and APIs
• Implementation of business rules
• Data transfer schedule and monitoring
• Source/target schema adjustment
• Data cleansing utilities
Front-End, Business: N/A
Front-End, Technical: N/A

Table 4. Quality metadata
Back-End, Business:
• Definition of measurements
• Actual measurements
Back-End, Technical:
• Utilities for automated data quality assessment
Front-End, Business:
• Presentation of quality measurement
• Quality dimensions and reporting
Front-End, Technical:
• Making quality metadata available for reporting and data analysis utilities
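To make the quality metadata category more concrete, the following minimal Python sketch computes two of the measurements mentioned above for a small in-memory table: the number of records stored (a factual measurement) and a simple completeness ratio. The sample records, the field names and the definition of completeness used here (the share of non-missing values per field) are assumptions made for illustration; they are not the specific techniques of Ballou et al. (1998) or Shankaranarayanan et al. (2003).

```python
from typing import Any, Dict, List, Optional

# Hypothetical sample records; None marks a missing value.
records: List[Dict[str, Optional[Any]]] = [
    {"customer_id": 1, "name": "Ada", "city": "Boston"},
    {"customer_id": 2, "name": None, "city": "Lowell"},
    {"customer_id": 3, "name": "Lin", "city": None},
    {"customer_id": 4, "name": "Sam", "city": "Salem"},
]


def record_count(rows: List[Dict[str, Optional[Any]]]) -> int:
    """Factual measurement: number of records stored."""
    return len(rows)


def completeness(rows: List[Dict[str, Optional[Any]]], field: str) -> float:
    """Share of records whose value for `field` is present (not None)."""
    if not rows:
        return 0.0
    present = sum(1 for row in rows if row.get(field) is not None)
    return present / len(rows)


def quality_metadata(rows: List[Dict[str, Optional[Any]]],
                     fields: List[str]) -> Dict[str, Any]:
    """Collect the measurements into one quality-metadata entry."""
    return {
        "record_count": record_count(rows),
        "completeness": {f: completeness(rows, f) for f in fields},
    }


if __name__ == "__main__":
    print(quality_metadata(records, ["customer_id", "name", "city"]))
```

An entry of this kind could be stored alongside the data it describes and surfaced to end users at the front end; measures along other dimensions, such as accuracy or timeliness, would need additional inputs (reference values, timestamps) that this sketch does not model.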
The taxonomy presented in Tables 1 through 6 suggests that metadata captures the design choices and maintenance decisions associated with information systems. Hence, it may serve as a basis for evaluating the design of information systems and the performance of complex decision environments.
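One way to picture the taxonomy is as a three-way classification: each metadata item belongs to one functional category, one perspective (business or technical) and one tier (back-end or front-end). The Python sketch below encodes that structure and a simple lookup. The class names, the `select` helper and the choice of sample entries are illustrative assumptions; the entries themselves are taken from the tables in this section, with their cell placement as read from those tables.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional


class Category(Enum):
    INFRASTRUCTURE = "infrastructure"
    DATA_MODEL = "data model"
    PROCESS = "process"
    QUALITY = "quality"
    INTERFACE = "interface"
    ADMINISTRATION = "administration"


class Perspective(Enum):
    BUSINESS = "business"
    TECHNICAL = "technical"


class Tier(Enum):
    BACK_END = "back-end"
    FRONT_END = "front-end"


@dataclass(frozen=True)
class MetadataItem:
    description: str
    category: Category
    perspective: Perspective
    tier: Tier


B, T = Perspective.BUSINESS, Perspective.TECHNICAL
BE, FE = Tier.BACK_END, Tier.FRONT_END

# A few sample entries drawn from the tables in this section.
CATALOG: List[MetadataItem] = [
    MetadataItem("Business identification of systems",
                 Category.INFRASTRUCTURE, B, BE),
    MetadataItem("Database administration and configuration parameters",
                 Category.INFRASTRUCTURE, T, BE),
    MetadataItem("Business interpretation of data items",
                 Category.DATA_MODEL, B, BE),
    MetadataItem("Syntax for joining multiple data sources",
                 Category.DATA_MODEL, T, FE),
    MetadataItem("Data cleansing utilities", Category.PROCESS, T, BE),
    MetadataItem("Definition of measurements", Category.QUALITY, B, BE),
    MetadataItem("User vocabulary", Category.INTERFACE, B, FE),
    MetadataItem("Application and data security",
                 Category.ADMINISTRATION, T, BE),
]


def select(items: List[MetadataItem],
           category: Optional[Category] = None,
           perspective: Optional[Perspective] = None,
           tier: Optional[Tier] = None) -> List[MetadataItem]:
    """Return the items matching every criterion that is not None."""
    return [
        item for item in items
        if (category is None or item.category == category)
        and (perspective is None or item.perspective == perspective)
        and (tier is None or item.tier == tier)
    ]


if __name__ == "__main__":
    for item in select(CATALOG, perspective=B, tier=BE):
        print(item.category.value, "|", item.description)
```

Keeping the three dimensions as separate attributes, rather than nesting them, lets the same catalog be sliced by any combination, for example all back-end business metadata regardless of category.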
Table 5. Interface metadata
Back-End, Business: N/A
Back-End, Technical: N/A
Front-End, Business:
• User vocabulary
• Metaphors for data visualization
• Personalized aggregation and other computation definitions
• User-defined dimension hierarchies
• Report template/layout
• Template visibility and sharing
• Dashboard configuration
• Delivery setup
Front-End, Technical:
• Metadata on report templates including report fields and layouts
• Physical location of template files
• Mapping of report fields to data elements
• Format preferences
• Consolidating multiple reports into dashboards
• Tracking report delivery failures
• Personalizing data delivery formats and styles including metaphors for visualization

Table 6. Administration metadata
Back-End, Business:
• Usage privileges, usernames and passwords
• Groups and roles for business functions
• Legal limitation on data use
• Tracking use of data elements
• Documentation, on-line help and training aids
Back-End, Technical:
• Authentication interfaces
• Application and data security
Front-End, Business:
• Users and passwords
• Use privileges for data and tools
• Tracking use of applications
Front-End, Technical:
• Tracking report delivery failures
• Personalizing data delivery formats and styles
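As an illustration of how entries from the interface and administration tables might be stored and used together, the sketch below models a report template whose fields map to one or more data elements (interface metadata) and a role-based privilege table (administration metadata). All names here (`ReportTemplate`, `AccessPolicy`, `visible_fields`, the roles and the "table.column" element names) are hypothetical, and the rule that a field is shown only when every mapped element is permitted is an assumption made for the example, not something prescribed by the taxonomy.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class ReportTemplate:
    """Interface metadata: report fields mapped to data elements."""
    name: str
    # Report field label -> data elements ("table.column", hypothetical).
    field_map: Dict[str, List[str]]


@dataclass
class AccessPolicy:
    """Administration metadata: data elements each role may read."""
    role_privileges: Dict[str, Set[str]] = field(default_factory=dict)

    def allowed(self, role: str, element: str) -> bool:
        return element in self.role_privileges.get(role, set())


def visible_fields(template: ReportTemplate,
                   policy: AccessPolicy,
                   role: str) -> List[str]:
    """Report fields a role may see; every mapped element must be permitted."""
    return [
        label
        for label, elements in template.field_map.items()
        if all(policy.allowed(role, e) for e in elements)
    ]


template = ReportTemplate(
    name="Monthly sales",
    field_map={
        "Region": ["sales.region"],
        "Revenue": ["sales.amount"],
        "Margin": ["sales.amount", "costs.amount"],
    },
)

policy = AccessPolicy({
    "analyst": {"sales.region", "sales.amount"},
    "controller": {"sales.region", "sales.amount", "costs.amount"},
})

if __name__ == "__main__":
    print(visible_fields(template, policy, "analyst"))
    print(visible_fields(template, policy, "controller"))
```

Under these assumptions, the "Margin" field is hidden from the analyst role because one of its two underlying data elements is not covered by that role's privileges.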
Managing Metadata in Decision Environments
Well-designed and integrated metadata offers important benefits for managing complex decision environments. However, the complex functional requirements of metadata may pose significant integration challenges. Commercial off-the-shelf (COTS) software products for data warehouses offer metadata management capabilities, but each product focuses on a specific type of metadata, and no single product can comprehensively integrate and manage all of it. This motivates organizations to build a customized enterprise metadata repository.