Design of Data Management Guideline
for Open Data Implementation
(case study in Indonesia)
Arry Akhmad Arman
Institut Teknologi Bandung Jl. Ganesha 10 Bandung
Indonesia 40132 Phone: +62-22-2502260
[email protected]
Gilang Ramadhan
Institut Teknologi Bandung Jl. Ganesha 10 Bandung Indonesia 40132 Phone: +62-22-2502260
[email protected]
Muhammad Fajrin
Institut Teknologi Bandung Jl. Ganesha 10 Bandung Indonesia 40132 Phone: +62-22-2502260
[email protected]
ABSTRACT
Open data is the idea that some data should be freely available to everyone to use and republish as they wish, without restrictions from copyright, patents or other mechanisms of control [8]. The goals of the open data movement are similar to those of other "Open" movements such as open source, open platform, open standard, and others.
In 2008, the Indonesian Government enacted Act Number 14 of 2008 about Public Information Openness. Following that Act, in 2010 the Government issued Government Regulation Number 61 of 2010, which serves as the implementing guideline for Open Data in Indonesia.
Many initiatives have already been conducted to push government institutions to be more open. The challenge is the lack of guidelines and practices in Data Management. Implementing Open Data without strong Data Management and Governance can create many risks.
Recognizing that challenge, this research proposes a guideline to govern and manage Open Data activities. The approach is to modify the DAMA-DMBOK guideline to fit the characteristics of Open Data, and several improvements to DAMA-DMBOK are proposed to meet Open Data needs. Although this research was triggered by the needs of the Indonesian Government, the solution can hopefully be applied by other parties.
CCS Concepts
• General and reference~Computing standards, RFCs and guidelines • Applied computing~Enterprise data management • Applied computing~E-government
Keywords
Open Data Guideline, Data Management, DAMA-DMBOK.
1. INTRODUCTION
The Indonesian Government holds a great deal of valuable data and information that could be shared. However, most of it is still difficult for society to access and consume. Much of it is still stored in hardcopy form, and the majority is closed or hard to access and scattered across various places. In line with the growing use of ICT in the Indonesian Government and the launch of the Open Government Indonesia initiative, the era of Open Data is beginning in Indonesia.
According to the World Bank, Open Data is data that is both technically open and legally open. Technically open means that the data is available in a standard format that can be retrieved, read, and processed by computer applications. Legally open means that the data carries a license that allows it to be used freely and reused without restriction [8]. Open data can provide benefits in various fields. According to research conducted by McKinsey & Company in 2013, open data can have a positive impact on education, transport, electricity, oil and gas, healthcare, and finance. This impact arises because people can more easily obtain and process data to get the information they need [3].
Besides providing a variety of benefits, opening data also poses many challenges. From the Data Security side, the internal database system must be protected from attack. From the Data Quality side, the data must be of good quality. There are many other aspects to consider, such as data format, metadata quality, and compliance with laws and regulations. To overcome these challenges, data management activities need to be carried out in the organization on a daily basis. Without strong data management, the benefits offered by open data will not be realized, and poorly conducted data management exposes the organization to many risks and threats.
One way to implement data management is to follow the guidelines defined by the Data Management Association (DAMA). DAMA publishes data management guidelines called The Guide to the Data Management Body of Knowledge (DMBOK), a very comprehensive treatment of data management. DMBOK Version 1 defines 10 data management function areas, each containing activities for performing data management.
DAMA-DMBOK defines general data management activities that are not designed specifically for open data. Implementing open data requires additional activities, and those activities must comply with the laws related to open data in Indonesia. The goal of this research is to create Guidelines of Open Data Management that comply with the Open Data Principles and with the laws and regulations related to open data.
2. OPEN DATA
2.1 Definition
According to the Open Knowledge Foundation, Open Data is data that can be freely used, reused, and redistributed by anyone. The definition of open data can be summarized as follows [7].
1) Availability and Access: the data must be available completely and at no more than a reasonable reproduction cost, preferably by downloading over the internet. The data must also be available in a convenient and modifiable form.
2) Re-use and Redistribution: the data must be provided under terms that permit re-use and redistribution, including the intermixing with other datasets.
3) Universal Participation: everyone must be able to use, re-use and redistribute; there should be no discrimination against fields of endeavor or against persons or groups. For example, ‘non-commercial’ restrictions that would prevent ‘commercial’ use, or restrictions of use for certain purposes (e.g. only in education), are not allowed.
The World Bank states that data is open if it satisfies both conditions below [8].
1) The data must be legally open, which means they must be placed in the public domain or under liberal terms of use with minimal restrictions.
2) The data must be technically open, which means they must be published in electronic formats that are machine readable and preferably non-proprietary, so that anyone can access and use the data using common, freely available software tools. Data must also be publicly available and accessible on a public server, without password or firewall restrictions.
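The two World Bank conditions can be read as a simple conjunction, and a minimal sketch of such a check follows. The `Dataset` fields, the list of licenses, and the list of formats are assumptions made for illustration only; they are not defined by the World Bank text, and a real agency would substitute its own criteria.

```python
from dataclasses import dataclass

# Hypothetical reference sets, chosen only to make the sketch runnable.
OPEN_LICENSES = {"CC0-1.0", "CC-BY-4.0", "PDDL-1.0"}
MACHINE_READABLE_FORMATS = {"csv", "json", "xml"}


@dataclass
class Dataset:
    title: str
    license_id: str       # license identifier attached to the dataset
    file_format: str      # publication format, e.g. "csv"
    requires_login: bool  # True if a password or firewall restricts access


def is_legally_open(ds: Dataset) -> bool:
    """Condition 1: public domain or liberal terms of use."""
    return ds.license_id in OPEN_LICENSES


def is_technically_open(ds: Dataset) -> bool:
    """Condition 2: machine-readable format on a publicly accessible server."""
    readable = ds.file_format.lower() in MACHINE_READABLE_FORMATS
    return readable and not ds.requires_login


def is_open(ds: Dataset) -> bool:
    """Data is open only if both conditions hold."""
    return is_legally_open(ds) and is_technically_open(ds)
```

Under these assumptions, a scanned PDF report with an open license would pass the legal check but fail the technical one, so it would not count as open data.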
2.2 Open Data Principles
Data shall be considered open if it is made public in a way that complies with the principles below [6]:
1) Complete, All public data is made available. Public data is data that is not subject to valid privacy, security or privilege limitations.
2) Primary, Data is as collected at the source, with the highest possible level of granularity, not in aggregate or modified forms.
3) Timely, Data is made available as quickly as necessary to preserve the value of the data.
4) Accessible, Data is available to the widest range of users for the widest range of purposes.
5) Machine processable, Data is reasonably structured to allow automated processing.
6) Non-discriminatory, Data is available to anyone, with no requirement of registration.
7) Non-proprietary, Data is available in a format over which no entity has exclusive control.
8) License-free, Data is not subject to any copyright, patent, trademark or trade secret regulation. Reasonable privacy, security and privilege restrictions may be allowed.
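The eight principles lend themselves to a checklist. The sketch below assumes that an assessor has already judged each principle for a dataset and recorded the result as a boolean; the function and variable names are hypothetical, and the sketch does not attempt to automate the judgment itself.

```python
from enum import Enum
from typing import Dict, List


class Principle(Enum):
    COMPLETE = "Complete"
    PRIMARY = "Primary"
    TIMELY = "Timely"
    ACCESSIBLE = "Accessible"
    MACHINE_PROCESSABLE = "Machine processable"
    NON_DISCRIMINATORY = "Non-discriminatory"
    NON_PROPRIETARY = "Non-proprietary"
    LICENSE_FREE = "License-free"


def unmet_principles(assessment: Dict[Principle, bool]) -> List[Principle]:
    """Return principles that are marked False or not assessed at all,
    in the order in which the principles are listed."""
    return [p for p in Principle if not assessment.get(p, False)]


def complies(assessment: Dict[Principle, bool]) -> bool:
    """A dataset complies only when every one of the eight principles is met."""
    return not unmet_principles(assessment)
```

Treating an unassessed principle as unmet is a deliberate design choice in this sketch: it keeps a partially reviewed dataset from being reported as open.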
2.3 Open Data Lifecycle Model
[Figure 1: lifecycle stages Select, Model, Publish on the Data Publisher (supply) side, supported by Data Management, and Find, Integrate, Reuse on the Data Consumer (demand) side, linked by Feedback.]
Figure 1. Open Data Lifecycle Model [5]
Open Data Support defines the open data lifecycle shown in Figure 1, called the Linked Open Government Data lifecycle. The lifecycle is divided into two sides: Data Publisher and Data Consumer. As Figure 1 shows, data management occurs on the data publisher side, which has 3 phases as follows [5].
1) Select
Several dimensions can be considered when selecting data to open, as follows.
a. transparency
b. legal requirements
c. relation to public task
d. current status of open publication
e. type of value
f. audience
2) Model
The purpose of the data modeling phase is to make data available in a structured, comprehensible, and machine-readable way. In this phase, data must be cleaned to increase its quality. Metadata is also added to the dataset.
3) Publish
After a dataset has been selected and modeled, it must be uploaded to the open data portal or website.
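The three publisher-side phases can be sketched as a small pipeline. This is an illustration under stated assumptions, not the lifecycle's official implementation: the `Candidate` type, the single boolean standing in for the "legal requirements" dimension, the cleaning rules, and the in-memory dictionary standing in for a portal are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Candidate:
    name: str
    rows: List[Dict[str, str]]
    legally_publishable: bool  # stands in for the "legal requirements" dimension
    metadata: Dict[str, str] = field(default_factory=dict)


def select(candidates: List[Candidate]) -> List[Candidate]:
    """Select phase: keep only datasets that pass the simplified legal check."""
    return [c for c in candidates if c.legally_publishable]


def model(candidate: Candidate) -> Candidate:
    """Model phase: clean the rows, then attach metadata to the dataset."""
    cleaned: List[Dict[str, str]] = []
    for row in candidate.rows:
        stripped = {key: value.strip() for key, value in row.items()}
        # Drop rows with empty fields and exact duplicates of earlier rows.
        if all(stripped.values()) and stripped not in cleaned:
            cleaned.append(stripped)
    metadata = {"title": candidate.name, "row_count": str(len(cleaned))}
    return Candidate(candidate.name, cleaned, candidate.legally_publishable, metadata)


def publish(candidate: Candidate, portal: Dict[str, Candidate]) -> None:
    """Publish phase: store the modeled dataset in a stand-in for the portal."""
    portal[candidate.name] = candidate
```

A publisher would run `publish(model(c), portal)` for each `c` returned by `select(...)`, so that nothing reaches the portal without having been both selected and modeled.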
3. DATA MANAGEMENT
3.1 DAMA-DMBOK Framework
Data Management is not a new issue in computing. Several comprehensive references can be adopted for Data Management practice in an organization. One of the most popular is the DMBOK (Data Management Body of Knowledge) published by DAMA (Data Management Association). An agency can adopt DMBOK fully or partially. All DMBOK components are important for managing internal data properly, but not all of them are equally important for Open Data.
DMBOK divides Data Management into 9 areas plus Data Governance as an umbrella for all areas [2].
1) Data Governance, which focuses on planning, supervision, and control over data management and use.
2) Data Architecture Management, which defines the blueprint for managing data assets.
3) Data Development, which focuses on analysis, design, implementation, testing, deployment, and maintenance of data.
4) Data Operation Management, which focuses on providing support from data acquisition to purging.
5) Data Security Management, which focuses on ensuring privacy, confidentiality, and appropriate access.
6) Data Quality Management, which focuses on defining, monitoring, and improving data quality.
7) Reference and Master Data Management, which focuses on managing golden versions and replicas.
8) Data Warehousing and Business Intelligence, which focuses on enabling reporting and analysis.
9) Document and Content Management, which focuses on storing, protecting, indexing, and enabling access to unstructured data such as documents and records.
10) Meta-Data Management, which focuses on integrating, controlling, and providing meta-data.
Figure 2. DAMA-DMBOK Function Areas [2]
3.2 Roles of Data Management Framework
In principle, Open Data is a subset of an organization's complete set of internal data: some internal data, raw or processed, will be designated as open data. The Data Management Framework (DMF) plays three different roles [1].
First, the DMF manages all internal data of an organization. This role is always needed, whether or not the organization implements Open Data, yet some organizations still do not perform it: data flows automatically between applications and database systems without any Data Management function or organizational unit to support it. Second, the DMF supports the selection and preparation of Open Data. Not all DMBOK areas are needed for this role. Third, the DMF manages all opened data as a service. All DMBOK areas are needed to perform this role properly, except Data Warehousing and Business Intelligence Management; that activity is not part of Open Data, although it may be carried out on the user side.
[Figure 3: the Data Management Framework (DMF) covers All Data and Open Data through three roles: DMF role for All (internal) Data, DMF role for Selecting and Preparing Open Data, and DMF role for Managing Open Data as a service. Open Data (Complete, Primary, Timely, Accessible, Machine Processable, Non-Discriminatory, Non-Proprietary, License-Free) is delivered to Users of Data.]
Figure 3. Role of Data Management Framework [1]
4. LAWS IN INDONESIA RELATED TO OPEN DATA
4.1 Act Number 14 of 2008 about Public Information Openness
Based on this Act, information that can be opened to the public as open government data is called public information. Public information is information that is created, stored, managed, sent, and received by public agencies. Act Number 14 of 2008 guarantees citizens' right to be informed about public policy plans, public policy programs, and the public policy decision-making process, including the reasons behind decisions. It also aims to:
1) encourage public participation in the public policy decision-making process;
2) increase active public participation in the public policy decision-making process;
3) create good governance that is transparent, effective and efficient, accountable and responsible…
4.2 Act Number 43 of 2009 about Archives
This Act was made to realize a comprehensive and integrated national archival system. National archival institutions and public agencies need to build a national archival system that covers the management of both static and dynamic archives. The national archival system and public agencies ensure the availability of authentic, complete, and reliable archives. Moreover, the national archival system is able to identify the existence of archives that hold related information, preserving the integrity of archival information across all organizations.
4.3 Indonesian Government Regulation 61/2010 as Implementation of Act 14/2008
Indonesian Act 14/2008 on Public Information Openness is a new legal regime in Indonesia that brings transparency principles to national life. The Act affects not only government agencies but also other organizations in Indonesia that receive government funds. To put the Act into effect, the Indonesian government issued a specific regulation based on Act 14/2008. Under these provisions, everyone in an organization must become more transparent, responsible, and service oriented in order to drive a better democratic life.
4.4 Indonesian Minister of Home Affairs Regulation 35/2010
This regulation governs the management of information and documentation in public agencies such as the Ministry of Home Affairs and local governments. The information it regulates is public information. Under this regulation, public information in the Indonesian Ministry of Home Affairs and local governments is open and accessible to all public information users. Nevertheless, not all public information is accessible to every user: under Article 4 of the regulation, some public information is restricted and can be accessed only by privileged users.
5. ANALYSIS AND DESIGN
5.1 Mapping between Activity in DAMA-DMBOK and Role of Data Management Framework
DAMA-DMBOK defines 10 process areas in the data management framework, all of which are general. To use DAMA-DMBOK in the implementation of open data, the activities must be classified by data management role.
Table 1. Mapping between process area in DAMA-DMBOK and role of data management framework
No. Process Area Role 1 Role 2 Role 3
1. Data Governance
2. Data Architecture Management -
3. Data Development -
4. Data Operation Management
5. Data Security Management
6. Reference and Master Data Management -
7. Data Warehouse and Business Intelligence Management -
8. Document and Content Management -
9. Metadata Management
10. Data Quality Management
The data management framework has three main roles: role 1 for all (internal) data, role 2 for selecting and preparing open data, and role 3 for managing open data as a service. Based on Table 1, all activities in every process area can be classified into the role for all (internal) data. However, not all activities can be classified into the open data roles. An analysis is therefore needed to examine the relationship between the DAMA-DMBOK process areas and the data management roles.
5.2 Relationship Analysis between Data Management and Open Data Lifecycle Model
The activities defined by DAMA-DMBOK are general. Open data implementation, however, needs specific data management guidelines. An analysis is needed to examine the relationship between data management activities and the open data lifecycle. Table 2 describes the relationship between the open data lifecycle model and data management activities.
Table 2. Relationship between open data lifecycle model and DAMA-DMBOK
Stage of Open Data Lifecycle Model | Process Area of DAMA-DMBOK | Activity
Select | Data governance | Monitor and ensure regulatory compliance
Select | Data development | Select data for open data
Select | Data operations management | Archive, retain, and purge data
Select | Data security management | Define data security policy
Select | Data warehousing and business intelligence management | Process data for business intelligence
Select | Data quality management | Define data quality metrics
Model | Data governance | Develop and approve data policies, standards, and procedures
Model | Data development | Data conversion for open data
Model | Data operations management | Inventory and track data technology licenses
Model | Data security management | Define data security policy
Model | Reference and master data management | Define and maintain match rules
Model | Data warehousing and business intelligence management | Process data for business intelligence
Model | Metadata management | Define metadata standard
Model | Data quality management | Clean and correct data quality defect
Publish | Data governance | Identify and appoint data stewards
Publish | Data development | Publish data for open data
Publish | Reference and master data management | Define and maintain hierarchies and affiliations
Publish | Metadata management | Distribute and deliver metadata
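The relationships in Table 2 can be held in a simple lookup structure so that a team can ask which activities belong to a lifecycle stage, or in which stages a process area takes part. The sketch below is one possible encoding; the variable and function names are hypothetical, and the entries are transcribed from Table 2.

```python
from typing import Dict, List, Tuple

# Lifecycle stage -> list of (process area, activity) pairs, following Table 2.
LIFECYCLE_ACTIVITIES: Dict[str, List[Tuple[str, str]]] = {
    "Select": [
        ("Data governance", "Monitor and ensure regulatory compliance"),
        ("Data development", "Select data for open data"),
        ("Data operations management", "Archive, retain, and purge data"),
        ("Data security management", "Define data security policy"),
        ("Data warehousing and business intelligence management",
         "Process data for business intelligence"),
        ("Data quality management", "Define data quality metrics"),
    ],
    "Model": [
        ("Data governance", "Develop and approve data policies, standards, and procedures"),
        ("Data development", "Data conversion for open data"),
        ("Data operations management", "Inventory and track data technology licenses"),
        ("Data security management", "Define data security policy"),
        ("Reference and master data management", "Define and maintain match rules"),
        ("Data warehousing and business intelligence management",
         "Process data for business intelligence"),
        ("Metadata management", "Define metadata standard"),
        ("Data quality management", "Clean and correct data quality defect"),
    ],
    "Publish": [
        ("Data governance", "Identify and appoint data stewards"),
        ("Data development", "Publish data for open data"),
        ("Reference and master data management",
         "Define and maintain hierarchies and affiliations"),
        ("Metadata management", "Distribute and deliver metadata"),
    ],
}


def activities_for(stage: str) -> List[str]:
    """List the activities carried out in one lifecycle stage."""
    return [activity for _, activity in LIFECYCLE_ACTIVITIES[stage]]


def stages_for(process_area: str) -> List[str]:
    """List the lifecycle stages in which a process area has an activity."""
    return [
        stage
        for stage, pairs in LIFECYCLE_ACTIVITIES.items()
        if any(area == process_area for area, _ in pairs)
    ]
```

For example, `stages_for("Metadata management")` returns the Model and Publish stages, which matches the rows of Table 2 in which metadata management appears.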
5.3 Mapping between Activity in DAMA-DMBOK and Open Data Principles
Based on the mapping between activities in DAMA-DMBOK and the roles of the data management framework, we can identify the DAMA-DMBOK activities that contribute directly to open data implementation. Those activities are classified into the role for selecting and preparing open data and the role for managing open data as a service. Activities classified only into the role for all (internal) data are not included in this mapping because they are not directly related to open data implementation.
The activities related to open data implementation are mapped to the 8 open data principles described in Section 2.2. One data management activity can be mapped to several principles. The purpose of this mapping is to show which open data principles each data management activity supports. Based on Table 3, most process areas are related only to the first three open data principles: complete, primary, and timely. The remaining principles are related to only some process areas, such as data governance, data development, and data operation management, because they depend on technical implementation and are hardly related to the DAMA-DMBOK process areas.
Table 3. Mapping between process area in DAMA-DMBOK and open data principles
No. | Process Area | Related Open Data Principles | Unrelated Open Data Principles (-)
1. | Data Governance | Complete, Primary, Timely, Accessible, Machine Processable, Non Discriminatory, Non Proprietary, License Free | none
2. | Data Architecture Management | Complete, Accessible, Machine Processable, Non Proprietary | Primary, Timely, Non Discriminatory, License Free
3. | Data Development | Complete, Primary, Timely, Accessible, Machine Processable, Non Proprietary | Non Discriminatory, License Free
4. | Data Operation Management | Complete, Primary, Timely, Accessible, Machine Processable, Non Proprietary, License Free | Non Discriminatory
5. | Data Security Management | Complete, Primary, Accessible, Machine Processable, License Free | Timely, Non Discriminatory, Non Proprietary
6. | Reference and Master Data Management | Complete, Primary, Accessible, Machine Processable, Non Proprietary | Timely, Non Discriminatory, License Free
7. | Data Warehousing and Business Intelligence Management | Complete, Primary, Timely | Accessible, Machine Processable, Non Discriminatory, Non Proprietary, License Free
8. | Document and Content Management | Complete, Primary, Timely | Accessible, Machine Processable, Non Discriminatory, Non Proprietary, License Free
9. | Metadata Management | Complete, Primary, Timely, Accessible, Machine Processable, Non Proprietary | Non Discriminatory, License Free
10. | Data Quality Management | Complete, Primary | Timely, Accessible, Machine Processable, Non Discriminatory, Non Proprietary, License Free
5.4 Mapping between Activity in DAMA-DMBOK and Laws in Indonesia Related to Open Data
The DAMA-DMBOK framework is a generic framework intended to suit any data management process in any organization in the world. The data management activities it defines are therefore also general. For that reason, we have to take Indonesian laws into account to make the designed guidelines more contextual for organizations in Indonesia.
Indonesia does not yet have a specific law that regulates open data implementation, so in this process we use the Indonesian laws related to data openness that were described in Section 4. Based on Table 4, not all process areas can be mapped to the Indonesian laws. In two process areas, none of the activities relates to any Indonesian law on data openness: data warehouse and business intelligence management, and metadata management. This occurs because the Indonesian government does not yet have regulations that specifically address metadata, data warehousing, and business intelligence.
Table 4. Mapping between process area in DAMA-DMBOK and laws in Indonesia related to open data
No. Process Area Act 14/2008 Act 43/2009 IG Regulation 61/2010 IMHA Regulation 35/2010
1. Data Governance
2. Data Architecture Management - - -
3. Data Development -
4. Data Operation Management -
5. Data Security Management
6. Reference and Master Data Management - - -
7. Data Warehouse and Business Intelligence Management - - - -
8. Document and Content Management -
9. Metadata Management - - - -
10. Data Quality Management - -
5.5 Guideline Design
The form of the proposed Data Management Guideline is shown in Figure 4. It is adopted from the COBIT 5 Process Assessment Model (PAM). Although the form is adopted from COBIT PAM, COBIT itself is not used in the design of these guidelines. The table form from COBIT 5 PAM was selected to present the data management guidelines because it defines processes clearly and is easy to understand.
The guideline is divided into four sections as follows.
1) General information, which contains:
a. role of data management, obtained from the relationship analysis between process areas in DAMA-DMBOK and roles of the data management framework (Section 5.1)
b. description of activity
c. purpose of activity
d. open data principles that are supported by the data management activity
2) Outcome of activity: the desired state after the activity has been done.
3) Best practice: steps that must be carried out to achieve the outcome of the activity. Best practices come from various sources such as DAMA-DMBOK, the open data lifecycle model, and the laws in Indonesia related to Open Data.
4) Work products (input and output): deliverables that are produced or required by an activity.
Figure 4. Data management guidelines design
Best practices and work products are related to the other elements of the guideline. Best practices help to achieve the outcomes stated for the activity. Work products support both the outcomes and the best practices of the activity.
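The four-section form described above can be represented as a record type, which makes it easy to check whether a guideline entry is complete. The sketch below is only an illustration: the class name, field names, and the example entry's wording are hypothetical and are not taken from the guideline itself.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class GuidelineEntry:
    """One data management activity, split into the four guideline sections."""
    # 1) General information
    activity: str
    roles: List[str]
    description: str
    purpose: str
    supported_principles: List[str]
    # 2) Outcome of activity
    outcomes: List[str] = field(default_factory=list)
    # 3) Best practice
    best_practices: List[str] = field(default_factory=list)
    # 4) Work products, input and output
    inputs: List[str] = field(default_factory=list)
    outputs: List[str] = field(default_factory=list)

    def missing_sections(self) -> List[str]:
        """Name the sections that are still empty so incomplete entries can be flagged."""
        checks = {
            "outcome": self.outcomes,
            "best practice": self.best_practices,
            "work product inputs": self.inputs,
            "work product outputs": self.outputs,
        }
        return [name for name, items in checks.items() if not items]


# Hypothetical example entry; the wording is illustrative only.
entry = GuidelineEntry(
    activity="Define metadata standard",
    roles=["Selecting and preparing open data"],
    description="Agree on the metadata fields every published dataset must carry.",
    purpose="Make published datasets easy to find and interpret.",
    supported_principles=["Accessible", "Machine processable"],
    outcomes=["A metadata standard is approved and documented."],
    best_practices=["Review existing metadata vocabularies before defining new fields."],
)
```

Calling `entry.missing_sections()` on this example reports that the work product inputs and outputs have not been filled in yet, which is the kind of gap a reviewer would want surfaced before the entry is adopted.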
6. CONCLUDING REMARK
Government will gain many benefits by pushing the implementation of Open Data in government agencies as well as business sectors. Indonesia has Acts and Government Regulations related to Open Data, but they are not enough to guide detailed implementation or the Open Data management process.
This research shows the process of designing an Open Data Management Guideline that can be used as a reference for Open Data Management in Indonesian Government agencies. The same steps can be adopted by other governments that need a similar guideline.
This research does not give strong attention to the security aspect. Further research can focus on enhancing the security aspect of the guideline by combining it with an Information Security Management Standard such as ISO/IEC 27000.
7. REFERENCES
[1] Arman, A. A., Sembiring, J., Suhardi. The importance of data management to support open data: case study in Indonesia. Proceedings of the 8th International Conference on Theory and Practice of Electronic Governance ICEGOV 2014. 504-504. DOI= http://dl.acm.org/citation.cfm?doid=2691195.2691292
[2] Data Management Association. 2009. The DAMA Guide to The Data Management Body of Knowledge (DAMA-DMBOK Guide) First Edition, Technics Publications, LLC.
[3] McKinsey & Company. 2013. Open data: Unlocking innovation and performance with liquid information. Retrieved on January 12, 2015 from McKinsey & Company: http://www.mckinsey.com/~/media/McKinsey/dotcom/Insights/Business%20Technology/Open%20data%20Unlocking%20innovation%20and%20performance%20with%20liquid%20information/MGI_OpenData_Full_report_Oct2013.ashx
[4] Ministry of Communication and Information of Republic of Indonesia. 2008. Act of the Republic of Indonesia Number 14 of 2008 on Public Information Openness. Retrieved on June 20, 2015 from Ministry of Communication and Information of Republic of Indonesia: https://ppidkemkominfo.files.wordpress.com/2012/12/act-of-the-republic-of-indonesia-number-14-of-2008-on-public-information-openness.pdf
[5] Open Data Support. 2014. The Linked Open Government Data & Metadata Life Cycle. Retrieved on May 20, 2015 from Open Data Support: http://www.slideshare.net/OpenDataSupport/the-linked-open-government-data-lifecycle
[6] Open Government Data. 2014. The Annotated 8 Principles of Open Government Data. Retrieved on January 15, 2015 from Open Government Data: http://opengovdata.org/
[7] Open Knowledge Foundation. 2012. Open Data Handbook. Retrieved on January 12, 2015 from Open Knowledge Foundation: http://opendatahandbook.org/en/index.html.
[8] World Bank Group. 2014. Open Data Essentials. Retrieved on July 10, 2014 from World Bank Open Government Data Toolkit: http://data.worldbank.org/about/open-government-data-toolkit/knowledge-repository.