
Design of Data Management Guideline for Open Data Implementation (Case Study in Indonesia)

Arry Akhmad Arman

Institut Teknologi Bandung

Jl. Ganesha 10 Bandung Indonesia 40132 Phone: +62-22-2502260


Gilang Ramadhan

Institut Teknologi Bandung

Jl. Ganesha 10 Bandung Indonesia 40132 Phone: +62-22-2502260


Muhammad Fajrin

Institut Teknologi Bandung

Jl. Ganesha 10 Bandung Indonesia 40132 Phone: +62-22-2502260


ABSTRACT

Open data is the idea that some data should be freely available to everyone to use and republish as they wish, without restrictions from copyright, patents or other mechanisms of control [8]. The goals of the open data movement are similar to those of other "Open" movements such as open source, open platform, open standard, and others.

In 2008 the Indonesian Government enacted Act Number 14 of 2008 on Public Information Openness. Following that Act, in 2010 the Indonesian Government issued Government Regulation Number 61 of 2010, which serves as an implementing guideline for Open Data in Indonesia.

Many initiatives have already been undertaken to push government institutions to become more open. The challenge is the lack of guidelines and practices in Data Management. Implementing Open Data without strong Data Management and Governance can create many risks.

To address this challenge, this research proposes a guideline to govern and manage Open Data activities. The approach is to adapt the DAMA-DMBOK guideline to the characteristics of Open Data, and several improvements to DAMA-DMBOK are proposed to meet Open Data needs. Although this research was triggered by the needs of the Indonesian Government, the solution can hopefully be applied by other parties as well.

CCS Concepts

• General and reference~Computing standards, RFCs and guidelines • Applied computing~Enterprise data management • Applied computing~E-government

Keywords

Open Data Guideline, Data Management, DAMA-DMBOK.

1. INTRODUCTION

The Indonesian Government holds a great deal of valuable data and information that could be shared. However, most of it is still difficult for the public to access and consume. Much of it is still stored in hardcopy form, and the majority is closed, hard to access, or scattered across various places. Along with the growing use of ICT in the Indonesian Government and the Open Government Indonesia initiative, the era of Open Data is beginning in Indonesia.

According to the World Bank, Open Data is data that is both technically open and legally open. Technically open means that the data is available in a standard format that can be retrieved, read, and processed by computer applications. Legally open means that the data has a license that allows it to be used freely and reused without restriction [8]. Open data can provide benefits in various fields. According to research conducted by McKinsey & Company in 2013, open data can have a positive impact on education, transport, electricity, oil and gas, healthcare, and finance. This impact is possible because people can more easily obtain and process the data to derive the information they need [3].

Besides its benefits, opening data also brings many challenges. For example, from the Data Security side, the internal database systems must be protected from attack. From the Data Quality side, the published data must be of good quality. Moreover, many other aspects must be considered, such as data format, metadata quality, and compliance with laws and regulations. To overcome these challenges, data management activities need to be conducted in the organization on a daily basis. Without strong data management, the benefits offered by open data will not be realized, and there will be many risks and threats if data management is not conducted properly.

One alternative for implementing data management is to follow the guidelines defined by the Data Management Association (DAMA), published as The DAMA Guide to the Data Management Body of Knowledge (DAMA-DMBOK). DMBOK provides a comprehensive treatment of data management. DMBOK Version 1 defines 10 data management function areas, and each function area contains activities for performing data management.

DAMA-DMBOK defines general data management activities that are not designed specifically for open data. Implementing open data requires additional activities, and these activities must comply with Indonesian laws related to open data. The goal of this research is to create Guidelines of Open Data Management that comply with the Open Data Principles and with the laws and regulations related to open data.

2. OPEN DATA

2.1 Definition

According to the Open Knowledge Foundation, Open Data is data that can be freely used, re-used, and redistributed by anyone. The definition of open data can be summarized as follows [7].

1) Availability and Access: the data must be available completely and at no more than a reasonable reproduction cost, preferably by downloading over the internet. The data must also be available in a convenient and modifiable form.

2) Re-use and Redistribution: the data must be provided under terms that permit re-use and redistribution, including intermixing with other datasets.

3) Universal Participation: everyone must be able to use, re-use and redistribute - there should be no discrimination against fields of endeavor or against persons or groups. For example, ‘non-commercial’ restrictions that would prevent ‘commercial’ use, or restrictions of use for certain purposes (e.g. only in education), are not allowed.

According to the World Bank, data is open if it satisfies both of the conditions below [8].

1) The data must be legally open, which means they must be placed in the public domain or under liberal terms of use with minimal restrictions.

2) The data must be technically open, which means they must be published in electronic formats that are machine readable and preferably non-proprietary, so that anyone can access and use the data using common, freely available software tools. Data must also be publicly available and accessible on a public server, without password or firewall restrictions.
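Because the World Bank definition is a conjunction of two conditions, it can be expressed as a simple check. The sketch below is a minimal illustration, assuming that an agency keeps its own allow-lists of acceptable licenses and machine-readable, non-proprietary formats; `OPEN_LICENSES`, `OPEN_FORMATS`, `DatasetRelease`, and the check functions are hypothetical names, and the list contents are examples only, not a recommended policy.

```python
from dataclasses import dataclass

# Hypothetical allow-lists for illustration; each agency would maintain its own.
OPEN_LICENSES = {"CC0-1.0", "CC-BY-4.0", "PDDL-1.0"}  # SPDX-style identifiers
OPEN_FORMATS = {"csv", "json", "xml"}  # machine readable and non-proprietary


@dataclass
class DatasetRelease:
    license_id: str       # license attached to the published dataset
    file_format: str      # format of the downloadable file, e.g. "csv"
    requires_login: bool  # True if a password or firewall blocks access


def is_legally_open(release: DatasetRelease) -> bool:
    """Public domain or liberal terms of use with minimal restrictions."""
    return release.license_id in OPEN_LICENSES


def is_technically_open(release: DatasetRelease) -> bool:
    """Machine-readable, non-proprietary format on a public server."""
    return (
        release.file_format.lower() in OPEN_FORMATS
        and not release.requires_login
    )


def is_open(release: DatasetRelease) -> bool:
    """Open only if BOTH conditions hold."""
    return is_legally_open(release) and is_technically_open(release)


if __name__ == "__main__":
    print(is_open(DatasetRelease("CC-BY-4.0", "CSV", False)))   # True
    print(is_open(DatasetRelease("CC-BY-4.0", "xlsx", False)))  # False: not in the allow-list
    print(is_open(DatasetRelease("CC0-1.0", "json", True)))     # False: behind a login
```

Keeping the legal and technical checks as separate functions mirrors the two conditions and makes it clear which one a rejected release failed.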

2.2 Open Data Principles

Data shall be considered open if it is made public in a way that complies with the principles below [6]:

1) Complete: All public data is made available. Public data is data that is not subject to valid privacy, security or privilege limitations.

2) Primary: Data is as collected at the source, with the highest possible level of granularity, not in aggregate or modified forms.

3) Timely: Data is made available as quickly as necessary to preserve the value of the data.

4) Accessible: Data is available to the widest range of users for the widest range of purposes.

5) Machine processable: Data is reasonably structured to allow automated processing.

6) Non-discriminatory: Data is available to anyone, with no requirement of registration.

7) Non-proprietary: Data is available in a format over which no entity has exclusive control.

8) License-free: Data is not subject to any copyright, patent, trademark or trade secret regulation. Reasonable privacy, security and privilege restrictions may be allowed.
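In practice, the eight principles can serve as a pre-publication checklist. The following is a minimal sketch, assuming a reviewer records a yes/no judgement per principle for each dataset; the `Principle` enum and `unmet_principles` function are hypothetical names used only for illustration.

```python
from enum import Enum


class Principle(Enum):
    """The eight open data principles listed above."""
    COMPLETE = "Complete"
    PRIMARY = "Primary"
    TIMELY = "Timely"
    ACCESSIBLE = "Accessible"
    MACHINE_PROCESSABLE = "Machine processable"
    NON_DISCRIMINATORY = "Non-discriminatory"
    NON_PROPRIETARY = "Non-proprietary"
    LICENSE_FREE = "License-free"


def unmet_principles(review: dict[Principle, bool]) -> list[Principle]:
    """Return the principles a reviewer has not confirmed for a dataset.

    A principle missing from the review counts as unmet, so unanswered
    items block publication instead of passing silently.
    """
    return [p for p in Principle if not review.get(p, False)]


if __name__ == "__main__":
    review = {p: True for p in Principle}
    review[Principle.NON_DISCRIMINATORY] = False  # e.g. download requires registration
    del review[Principle.TIMELY]                  # not yet assessed
    print([p.value for p in unmet_principles(review)])
    # ['Timely', 'Non-discriminatory']
```

Treating a missing answer as "unmet" is a deliberate design choice in this sketch: it keeps an incomplete review from being mistaken for a passing one.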

2.3 Open Data Lifecycle Model

Figure 1. Open Data Lifecycle Model [5] (data publisher, supply side: Select, Model, Publish; data consumer, demand side: Find, Integrate, Reuse; data management on the publisher side, with feedback from consumers to publishers)

Open Data Support defines the open data lifecycle shown in Figure 1, called the Linked Open Government Data lifecycle. The lifecycle is divided into two sides: Data Publisher and Data Consumer. As Figure 1 shows, data management takes place on the data publisher side, which consists of three phases [5].

1) Select

Several dimensions can be considered when selecting data to open:

a. transparency

b. legal requirements

c. relation to public task

d. current status of open publication

e. type of value

f. audience

2) Model

The purpose of the data modeling phase is to make data available in a structured, comprehensible, and machine-readable way. In this phase, data must be cleaned to improve its quality, and metadata is added to the dataset.

3) Publish

After a dataset has been selected and modeled, it must be uploaded to the open data portal or website.
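The three publisher-side phases can be read as a small pipeline. The sketch below is illustrative only, assuming datasets arrive as lists of string-valued rows with a `restricted` flag for data under privacy, security, or privilege limitations; the `Dataset` type, the `select`/`model`/`publish` functions, the trivial cleaning rule, and the metadata fields are all hypothetical, and the actual upload to a portal is not shown.

```python
import csv
import io
import json
from dataclasses import dataclass, field


@dataclass
class Dataset:
    name: str
    rows: list[dict[str, str]]
    restricted: bool = False  # subject to privacy/security/privilege limits
    metadata: dict[str, str] = field(default_factory=dict)


def select(candidates: list[Dataset]) -> list[Dataset]:
    """Select: keep only datasets that are not legally restricted."""
    return [d for d in candidates if not d.restricted]


def model(dataset: Dataset, description: str) -> Dataset:
    """Model: clean the rows, then attach descriptive metadata."""
    cleaned = []
    for row in dataset.rows:
        # Example cleaning rule: trim whitespace and drop fully empty rows.
        trimmed = {k.strip(): v.strip() for k, v in row.items()}
        if any(trimmed.values()):
            cleaned.append(trimmed)
    meta = {
        "title": dataset.name,
        "description": description,
        "record_count": str(len(cleaned)),
    }
    return Dataset(dataset.name, cleaned, dataset.restricted, meta)


def publish(dataset: Dataset) -> tuple[str, str]:
    """Publish: serialise to CSV plus a JSON metadata record for upload."""
    buf = io.StringIO()
    fieldnames = list(dataset.rows[0].keys()) if dataset.rows else []
    writer = csv.DictWriter(buf, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(dataset.rows)
    return buf.getvalue(), json.dumps(dataset.metadata)


if __name__ == "__main__":
    budget = Dataset("budget", [{" region ": " Bandung ", "amount": "10"},
                                {"region": "", "amount": ""}])
    payroll = Dataset("payroll", [], restricted=True)
    for ds in select([budget, payroll]):
        csv_text, meta_json = publish(model(ds, "Regional budget"))
        print(meta_json)
```

CSV and JSON were chosen here because both are machine-readable and non-proprietary, matching the technical-openness condition in Section 2.1.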

3. DATA MANAGEMENT

3.1 DAMA-DMBOK Framework

Data Management is not a new issue in computing. Several comprehensive references can be adopted for Data Management practice in an organization, and one of the most popular is DMBOK (Data Management Body of Knowledge) from DAMA (Data Management Association). An agency can adopt DMBOK fully or partially. All DMBOK components are important for managing internal data properly, but not all of them are essential for Open Data.

As mentioned before, one popular reference for Data Management is DMBOK, published by DAMA. DMBOK divides Data Management into 9 function areas, with Data Governance as an umbrella over all of them [2].

1) Data Governance, which focuses on planning, supervision, and control over data management and use.

2) Data Architecture Management, which defines the blueprint for managing data assets.

3) Data Development, which focuses on analysis, design, implementation, testing, deployment, and maintenance of data.

4) Data Operation Management, which focuses on providing support from data acquisition to purging.

5) Data Security Management, which focuses on ensuring privacy, confidentiality, and appropriate access.

6) Data Quality Management, which focuses on defining, monitoring, and improving data quality.

7) Reference and Master Data Management, which focuses on managing golden versions and replicas.

8) Data Warehousing and Business Intelligence Management, which focuses on enabling reporting and analysis.

9) Document and Content Management, which focuses on managing data found outside of databases.

10) Meta-Data Management, which focuses on integrating, controlling, and providing meta-data.

Figure 2. DAMA-DMBOK Function Areas [2]

3.2 Roles of Data Management Framework

In principle, Open Data is a subset of an organization's complete set of internal data: some internal data, raw or processed, will be designated as open data. The Data Management Framework therefore plays three different roles [1].

First, the role of the DMF (Data Management Framework) is to manage all internal data of an organization. This role is needed in every organization, whether or not it implements Open Data, yet some organizations still do not perform it: data flows automatically between applications and database systems without support from Data Management or from an organizational unit responsible for it. Second, the DMF supports the selection and preparation of Open Data; not every DMBOK area is needed for this role. Third, the DMF manages opened data as a service. It can easily be seen that every DMBOK area is needed to perform this role properly, except Data Warehouse and Business Intelligence Management, which is not part of Open Data itself but may be carried out on the user side.

Figure 3. Role of Data Management Framework [1] (nested DMF roles: managing all (internal) data, selecting and preparing open data, and managing open data as a service for users of data; open data is complete, primary, timely, accessible, machine processable, non-discriminatory, non-proprietary, and license-free)
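Because open data is a subset of internal data, the three roles in Figure 3 are nested. As a minimal sketch, assuming a dataset's position can be summarised by a hypothetical `DatasetStage` (internal only, candidate for opening, published), the roles that apply to it can be derived as below; `DmfRole`, `DatasetStage`, and `roles_for` are illustrative names, not part of DAMA-DMBOK.

```python
from enum import IntEnum


class DmfRole(IntEnum):
    ALL_INTERNAL_DATA = 1   # manage all internal data
    SELECT_AND_PREPARE = 2  # select and prepare data for opening
    OPEN_DATA_SERVICE = 3   # manage opened data as a service


class DatasetStage(IntEnum):
    INTERNAL = 1   # held internally only
    CANDIDATE = 2  # selected and being prepared as open data
    PUBLISHED = 3  # released to users as open data


def roles_for(stage: DatasetStage) -> set[DmfRole]:
    """Roles that apply to a dataset at a given stage.

    The roles are cumulative: a published dataset is still internal data
    and was previously selected and prepared, so roles 1 to 3 all apply.
    """
    return {role for role in DmfRole if role <= stage}


if __name__ == "__main__":
    for stage in DatasetStage:
        print(stage.name, sorted(r.name for r in roles_for(stage)))
```

Using integer-valued enums lets the nesting be expressed as a single comparison; a real framework would attach the DMBOK activities to each role rather than just the role label.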

4. LAWS IN INDONESIA RELATED TO OPEN DATA

4.1 Act Number 14 of 2008 about Public Information Openness

Under this Act, information that can be opened to the public as open government data is called public information. Public information is information that is created, stored, managed, sent, and received by public agencies. Act Number 14 of 2008 aims to guarantee citizens' right to know about public policy plans, public policy programs, the public policy decision-making process, and the reasons behind those decisions, and to:

1) encourage public participation in the public policy decision-making process;

2) increase active public participation in the public policy decision-making process;

3) create good governance that is transparent, effective and efficient, accountable, and responsible…

4.2 Act Number 43 of 2009 about Archives

This Act aims to establish a comprehensive and integrated national archival system. National archival institutions and public agencies need to build a national archival system that covers the management of both static and dynamic archives. The national archival system ensures the availability of authentic, complete, and reliable archives. Moreover, it makes it possible to identify archives that hold related information, preserving the integrity of archival information across all organizations.

4.3 Indonesian Government Regulation 61/2010 as Implementation of Act 14/2008

Act 14/2008 on Public Information Openness introduced a new legal regime in Indonesia that brings transparency principles into national life. The Act affects not only government agencies but also other organizations in Indonesia that receive government funds. To implement it in more detail, the Indonesian government issued a dedicated regulation pursuant to Act 14/2008. Under these rules, everyone in such organizations must become more transparent, responsible, and service-oriented in order to foster a better democratic life.

4.4 Indonesian Minister of Home Affairs Regulation 35/2010

This regulation governs the management of information and documentation in public agencies such as the Ministry of Home Affairs and local governments. The information it regulates is public information. Under this regulation, public information in the Indonesian Ministry of Home Affairs and local governments is open and accessible to all users of public information. Nevertheless, not all public information is accessible to every user: under Article 4 of this regulation, some public information is restricted and can only be accessed by privileged users.

5. ANALYSIS AND DESIGN

5.1 Mapping between Activity in DAMA-DMBOK and Role of Data Management Framework

DAMA-DMBOK defines 10 process areas for data management, all of which are general-purpose. To use DAMA-DMBOK for open data implementation, the process areas must first be classified by data management role.

Table 1. Mapping between process areas in DAMA-DMBOK and roles of the data management framework (✓ = role applies, - = role does not apply)

No. | Process Area | Role 1 | Role 2 | Role 3
1 | Data Governance | ✓ | ✓ | ✓
2 | Data Architecture Management | ✓ | ✓ | -
3 | Data Development | ✓ | ✓ | -
4 | Data Operation Management | ✓ | ✓ | ✓
5 | Data Security Management | ✓ | ✓ | ✓
6 | Reference and Master Data Management | ✓ | ✓ | -
7 | Data Warehouse and Business Intelligence Management | ✓ | ✓ | -
8 | Document and Content Management | ✓ | ✓ | -
9 | Metadata Management | ✓ | ✓ | ✓
10 | Data Quality Management | ✓ | ✓ | ✓

The data management framework has three main roles: role 1 for all (internal) data, role 2 for selecting and preparing open data, and role 3 for managing open data as a service. Based on Table 1, every activity in each process area can be classified under the role for all (internal) data. Nevertheless, not every activity can be classified under an open data role. Further analysis is therefore needed to examine the relationship between DAMA-DMBOK process areas and data management roles.

5.2 Relationship Analysis between Data Management and Open Data Lifecycle Model

The activities defined by DAMA-DMBOK are general activities, whereas open data implementation needs specific data management guidelines. An analysis is therefore needed to examine the relationship between data management activities and the open data lifecycle. Table 2 describes the relationship between the open data lifecycle model and data management activities.

Table 2. Relationship between the open data lifecycle model and DAMA-DMBOK

Stage of Open Data Lifecycle Model | Process Area of DAMA-DMBOK | Activity
Select | Data governance | Monitor and ensure regulatory compliance
Select | Data development | Select data for open data
Select | Data operations management | Archive, retain, and purge data
Select | Data security management | Define data security policy
Select | Data warehousing and business intelligence management | Process data for business intelligence
Select | Data quality management | Define data quality metrics
Model | Data governance | Develop and approve data policies, standards, and procedures
Model | Data development | Data conversion for open data
Model | Data operations management | Inventory and track data technology licenses
Model | Data security management | Define data security policy
Model | Reference and master data management | Define and maintain match rules
Model | Data warehousing and business intelligence management | Process data for business intelligence
Model | Metadata management | Define metadata standard
Model | Data quality management | Clean and correct data quality defects
Publish | Data governance | Identify and appoint data stewards
Publish | Data development | Publish data for open data
Publish | Reference and master data management | Define and maintain hierarchies and affiliations
Publish | Metadata management | Distribute and deliver metadata
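Table 2 is essentially a lookup from lifecycle stage to the DAMA-DMBOK activities an agency should perform. As a sketch, assuming the table rows are stored as plain tuples, a per-stage checklist and the process areas active in every stage can be derived as follows; `LIFECYCLE_ACTIVITIES`, `checklist`, and `areas_in_every_stage` are hypothetical names introduced for illustration.

```python
# Rows of Table 2 as (lifecycle stage, DAMA-DMBOK process area, activity).
LIFECYCLE_ACTIVITIES: list[tuple[str, str, str]] = [
    ("Select", "Data governance", "Monitor and ensure regulatory compliance"),
    ("Select", "Data development", "Select data for open data"),
    ("Select", "Data operations management", "Archive, retain, and purge data"),
    ("Select", "Data security management", "Define data security policy"),
    ("Select", "Data warehousing and business intelligence management",
     "Process data for business intelligence"),
    ("Select", "Data quality management", "Define data quality metrics"),
    ("Model", "Data governance",
     "Develop and approve data policies, standards, and procedures"),
    ("Model", "Data development", "Data conversion for open data"),
    ("Model", "Data operations management",
     "Inventory and track data technology licenses"),
    ("Model", "Data security management", "Define data security policy"),
    ("Model", "Reference and master data management",
     "Define and maintain match rules"),
    ("Model", "Data warehousing and business intelligence management",
     "Process data for business intelligence"),
    ("Model", "Metadata management", "Define metadata standard"),
    ("Model", "Data quality management", "Clean and correct data quality defects"),
    ("Publish", "Data governance", "Identify and appoint data stewards"),
    ("Publish", "Data development", "Publish data for open data"),
    ("Publish", "Reference and master data management",
     "Define and maintain hierarchies and affiliations"),
    ("Publish", "Metadata management", "Distribute and deliver metadata"),
]


def checklist(stage: str) -> list[str]:
    """Activities to carry out in one lifecycle stage, in table order."""
    return [f"{area}: {activity}"
            for s, area, activity in LIFECYCLE_ACTIVITIES if s == stage]


def areas_in_every_stage() -> set[str]:
    """Process areas that appear in all of Select, Model, and Publish."""
    stages = ["Select", "Model", "Publish"]
    per_stage = [{area for s, area, _ in LIFECYCLE_ACTIVITIES if s == stage}
                 for stage in stages]
    return set.intersection(*per_stage)


if __name__ == "__main__":
    for item in checklist("Publish"):
        print("[ ]", item)
    print(sorted(areas_in_every_stage()))
```

With the rows as listed, only data governance and data development appear in all three stages, which matches their prominence in the mappings that follow.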

5.3 Mapping between Activity in DAMA-DMBOK and Open Data Principles

Based on the mapping between DAMA-DMBOK activities and data management framework roles, we can identify the DAMA-DMBOK activities that contribute directly to open data implementation. These activities are classified under the role for selecting and preparing open data and the role for managing open data as a service. Activities classified only under the role for all (internal) data are excluded from this mapping because they are not directly related to open data implementation.

The activities related to open data implementation are mapped to the eight open data principles described earlier. One data management activity can be mapped to several open data principles. The purpose of this mapping is to show which open data principles each data management activity supports. Based on Table 3, most process areas are related to the first three open data principles: complete, primary, and timely. By contrast, the other open data principles are related to only some process areas, such as data governance, data development, and data operation management. These other principles depend largely on technical implementation and are hardly related to DAMA-DMBOK process areas.

Table 3. Mapping between process areas in DAMA-DMBOK and open data principles (C = complete, P = primary, T = timely, A = accessible, MP = machine processable, ND = non-discriminatory, NP = non-proprietary, LF = license-free; ✓ = related, - = not related)

No. | Process Area | C | P | T | A | MP | ND | NP | LF
1 | Data Governance | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓
2 | Data Architecture Management | ✓ | - | - | ✓ | ✓ | - | ✓ | -
3 | Data Development | ✓ | ✓ | ✓ | ✓ | ✓ | - | ✓ | -
4 | Data Operation Management | ✓ | ✓ | ✓ | ✓ | ✓ | - | ✓ | ✓
5 | Data Security Management | ✓ | ✓ | - | ✓ | ✓ | - | - | ✓
6 | Reference and Master Data Management | ✓ | ✓ | - | ✓ | ✓ | - | ✓ | -
7 | Data Warehousing and Business Intelligence Management | ✓ | ✓ | ✓ | - | - | - | - | -
8 | Document and Content Management | ✓ | ✓ | ✓ | - | - | - | - | -
9 | Metadata Management | ✓ | ✓ | ✓ | ✓ | ✓ | - | ✓ | -
10 | Data Quality Management | ✓ | ✓ | - | - | - | - | - | -
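Encoding Table 3 as a matrix makes it easy to ask which process areas support a given principle, and how many. The sketch below assumes the table values as transcribed above (1 = related, 0 = not related); `PRINCIPLES`, `TABLE_3`, `supporting_areas`, and `coverage` are hypothetical names, and the counts it produces are only as accurate as that transcription.

```python
PRINCIPLES = ["Complete", "Primary", "Timely", "Accessible",
              "Machine processable", "Non-discriminatory",
              "Non-proprietary", "License-free"]

# One row per process area, columns in PRINCIPLES order (transcribed from Table 3).
TABLE_3 = {
    "Data Governance":                      [1, 1, 1, 1, 1, 1, 1, 1],
    "Data Architecture Management":         [1, 0, 0, 1, 1, 0, 1, 0],
    "Data Development":                     [1, 1, 1, 1, 1, 0, 1, 0],
    "Data Operation Management":            [1, 1, 1, 1, 1, 0, 1, 1],
    "Data Security Management":             [1, 1, 0, 1, 1, 0, 0, 1],
    "Reference and Master Data Management": [1, 1, 0, 1, 1, 0, 1, 0],
    "Data Warehousing and Business Intelligence Management":
                                            [1, 1, 1, 0, 0, 0, 0, 0],
    "Document and Content Management":      [1, 1, 1, 0, 0, 0, 0, 0],
    "Metadata Management":                  [1, 1, 1, 1, 1, 0, 1, 0],
    "Data Quality Management":              [1, 1, 0, 0, 0, 0, 0, 0],
}


def supporting_areas(principle: str) -> list[str]:
    """Process areas marked as related to one principle, in table order."""
    col = PRINCIPLES.index(principle)
    return [area for area, row in TABLE_3.items() if row[col]]


def coverage() -> dict[str, int]:
    """Number of process areas supporting each principle."""
    return {p: len(supporting_areas(p)) for p in PRINCIPLES}


if __name__ == "__main__":
    for principle, count in coverage().items():
        print(f"{principle:20} {count}/{len(TABLE_3)}")
```

With the values as transcribed, `supporting_areas("Non-discriminatory")` returns only data governance, which shows how thinly some principles are covered by DAMA-DMBOK process areas.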

5.4 Mapping between Activity in DAMA-DMBOK and Laws in Indonesia Related to Open Data

DAMA-DMBOK is a generic framework intended to suit data management in any organization worldwide. Consequently, the data management activities defined in DAMA-DMBOK are also general. We therefore have to take Indonesian laws into account to make the designed guidelines more relevant to organizations in Indonesia.

Indonesia does not yet have a specific law that regulates open data implementation, so in this process we use the Indonesian laws related to data openness that were described in the previous section. Based on Table 4, not every process area can be mapped to Indonesian laws. In two process areas, data warehouse and business intelligence management and metadata management, none of the processes is related to any Indonesian law on data openness. This is because the Indonesian government does not yet have regulations that specifically address metadata, data warehousing, or business intelligence.

Table 4. Mapping between process areas in DAMA-DMBOK and laws in Indonesia related to open data (IG = Indonesian Government, IMHA = Indonesian Minister of Home Affairs; ✓ = related, - = not related)

No. | Process Area | Act 14/2008 | Act 43/2009 | IG Regulation 61/2010 | IMHA Regulation 35/2010
1 | Data Governance | ✓ | ✓ | ✓ | ✓
2 | Data Architecture Management | - | - | - | ✓
3 | Data Development | ✓ | ✓ | - | ✓
4 | Data Operation Management | ✓ | ✓ | ✓ | -
5 | Data Security Management | ✓ | ✓ | ✓ | ✓
6 | Reference and Master Data Management | ✓ | - | - | -
7 | Data Warehouse and Business Intelligence Management | - | - | - | -
8 | Document and Content Management | ✓ | ✓ | ✓ | -
9 | Metadata Management | - | - | - | -
10 | Data Quality Management | ✓ | ✓ | - | -
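The observation that two process areas have no legal basis can be checked mechanically from Table 4. This is a minimal sketch, assuming the table values as transcribed above; `LAWS`, `TABLE_4`, `legal_basis`, and `areas_without_legal_basis` are hypothetical names used for illustration.

```python
LAWS = ["Act 14/2008", "Act 43/2009",
        "IG Regulation 61/2010", "IMHA Regulation 35/2010"]

# 1 = related, 0 = not related, columns in LAWS order (transcribed from Table 4).
TABLE_4 = {
    "Data Governance":                                     [1, 1, 1, 1],
    "Data Architecture Management":                        [0, 0, 0, 1],
    "Data Development":                                    [1, 1, 0, 1],
    "Data Operation Management":                           [1, 1, 1, 0],
    "Data Security Management":                            [1, 1, 1, 1],
    "Reference and Master Data Management":                [1, 0, 0, 0],
    "Data Warehouse and Business Intelligence Management": [0, 0, 0, 0],
    "Document and Content Management":                     [1, 1, 1, 0],
    "Metadata Management":                                 [0, 0, 0, 0],
    "Data Quality Management":                             [1, 1, 0, 0],
}


def legal_basis(area: str) -> list[str]:
    """Laws and regulations that relate to a process area."""
    return [law for law, flag in zip(LAWS, TABLE_4[area]) if flag]


def areas_without_legal_basis() -> list[str]:
    """Process areas not covered by any of the listed laws."""
    return [area for area in TABLE_4 if not legal_basis(area)]


if __name__ == "__main__":
    print(areas_without_legal_basis())
    print(legal_basis("Data Architecture Management"))
```

Running it lists data warehouse and business intelligence management and metadata management, the same two areas identified in the text, so the guideline for those areas has to rely on DAMA-DMBOK and the open data lifecycle alone.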

5.5 Guideline Design

The format of the proposed Data Management Guideline is shown in Figure 4. It is adopted from the COBIT 5 Process Assessment Model (PAM). Although the format is adopted from COBIT 5 PAM, COBIT itself is not otherwise used in the guideline design. The table format of COBIT 5 PAM was chosen to present the data management guidelines because COBIT PAM defines processes clearly and is easy to understand.

The guideline is divided into four sections as follows.

1) General information, which contains:

a. the data management role, obtained from the relationship analysis between DAMA-DMBOK process areas and data management framework roles (Section 5.1)

b. a description of the activity

c. the purpose of the activity

d. the open data principles supported by the data management activity

2) Outcome of activity: the desired state after the activity has been performed.

3) Best practice: the steps that must be taken so that the outcome of the activity can be achieved. Best practices come from various sources, such as DAMA-DMBOK, the open data lifecycle model, and Indonesian laws related to Open Data.

4) Work products (input and output): the deliverables produced or required by an activity.

Figure 4. Data management guidelines design

Best practices and work products are linked to the other elements of the guideline. Best practices help achieve the outcomes expressed in the activity, and work products support both the outcomes and the best practices of the activity.
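One guideline entry with its four sections maps naturally onto a record type, which also makes the link between best practices and outcomes checkable. The sketch below is illustrative only: `GuidelineEntry` and `outcomes_without_practice` are hypothetical names, and the example entry's content (roles, outcomes, practices, work products) is invented for demonstration, apart from the process area and activity, which come from Table 2.

```python
from dataclasses import dataclass, field


@dataclass
class GuidelineEntry:
    # 1) General information
    process_area: str
    activity: str
    dmf_roles: list[int]   # 1 = all data, 2 = select/prepare, 3 = service
    description: str
    purpose: str
    principles: list[str]  # open data principles the activity supports
    # 2) Outcomes: desired states once the activity has been performed
    outcomes: list[str] = field(default_factory=list)
    # 3) Best practices, each mapped to the outcome it helps achieve
    best_practices: dict[str, str] = field(default_factory=dict)
    # 4) Work products consumed and produced by the activity
    inputs: list[str] = field(default_factory=list)
    outputs: list[str] = field(default_factory=list)

    def outcomes_without_practice(self) -> list[str]:
        """Outcomes that no best practice is linked to yet."""
        linked = set(self.best_practices.values())
        return [o for o in self.outcomes if o not in linked]


if __name__ == "__main__":
    # Hypothetical example content for illustration only.
    entry = GuidelineEntry(
        process_area="Data Governance",
        activity="Monitor and ensure regulatory compliance",
        dmf_roles=[1, 2],
        description="Check candidate datasets against applicable regulations",
        purpose="Avoid publishing restricted public information",
        principles=["Complete"],
        outcomes=["Restricted information is identified",
                  "Compliance decisions are recorded"],
        best_practices={"Classify each dataset before release":
                        "Restricted information is identified"},
        inputs=["Dataset inventory"],
        outputs=["Compliance checklist"],
    )
    print(entry.outcomes_without_practice())
```

The `outcomes_without_practice` helper reflects the relationship described above: every outcome should be reachable through at least one best practice, and an entry that fails this check is incomplete.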

6. CONCLUDING REMARKS

The government will gain many benefits by promoting Open Data implementation in government agencies as well as business sectors. Indonesia has Acts and Government Regulations related to Open Data, but they are not sufficient to guide detailed implementation or the Open Data management process.

This research shows the process of designing an Open Data Management Guideline that can serve as a reference for Open Data Management in Indonesian government agencies. Other governments can adopt the same steps to develop similar guidelines.

This research does not give strong attention to the security aspect. Further research could focus on strengthening the security aspect of the guideline by combining it with an information security management standard such as ISO/IEC 27000.

7. REFERENCES

[1] Arman, A. A., Sembiring, J., Suhardi. 2014. The importance of data management to support open data: case study in Indonesia. In Proceedings of the 8th International Conference on Theory and Practice of Electronic Governance (ICEGOV 2014), 504-504. DOI= http://dl.acm.org/citation.cfm?doid=2691195.2691292

[2] Data Management Association. 2009. The DAMA Guide to The Data Management Body of Knowledge (DAMA-DMBOK Guide), First Edition. Technics Publications, LLC.

[3] McKinsey & Company. 2013. Open data: Unlocking innovation and performance with liquid information. Retrieved on January 12, 2015 from McKinsey & Company: http://www.mckinsey.com/~/media/McKinsey/dotcom/Insights/Business%20Technology/Open%20data%20Unlocking%20innovation%20and%20performance%20with%20liquid%20information/MGI_OpenData_Full_report_Oct2013.ashx

[4] Ministry of Communication and Information of Republic of Indonesia. 2008. Act of the Republic of Indonesia Number 14 of 2008 on Public Information Openness. Retrieved on June 20, 2015 from Ministry of Communication and Information of Republic of Indonesia: https://ppidkemkominfo.files.wordpress.com/2012/12/act-of-the-republic-of-indonesia-number-14-of-2008-on-public-information-openness.pdf

[5] Open Data Support. 2014. The Linked Open Government Data & Metadata Life Cycle. Retrieved on May 20, 2015 from Open Data Support: http://www.slideshare.net/OpenDataSupport/the-linked-open-government-data-lifecycle

[6] Open Government Data. 2014. The Annotated 8 Principles of Open Government Data. Retrieved on January 15, 2015 from Open Government Data: http://opengovdata.org/

[7] Open Knowledge Foundation. 2012. Open Data Handbook. Retrieved on January 12, 2015 from Open Knowledge Foundation: http://opendatahandbook.org/en/index.html

[8] World Bank Group. 2014. Open Data Essentials. Retrieved on July 10, 2014 from World Bank Open Government Data Toolkit: http://data.worldbank.org/about/open-government-data-toolkit/knowledge-repository
