“Free the Data”: E-Governance - the Role of Data and Information Technology
Gregory Curtin
Data, information technology, structured databases, the “deep web”…this is not generally the stuff of high-level policy discussion and governance debates. Make no mistake about it, though: in the 21st Century, variously called the Digital Century, the Information Age or even the Post-Information Age, the Networked Society, and numerous other tech-hopeful names, publicly accessible data and information technologies will be critical to the success of international, national, regional and local policies. Government administrators at all levels are finally recognizing this, and some are taking steps to realize the promise. President Obama’s new administration, for example, has led the charge and elevated technology to the national policy agenda. The Preamble to the new National Agenda for Technology exemplifies this:
President Obama and Vice President Biden understand the immense transformative power of technology and innovation and how they can improve the lives of Americans. They will work to ensure the full and free exchange of information through an open Internet and use technology to create a more transparent and connected democracy. They will encourage the deployment of modern communications infrastructure to improve America's competitiveness and employ technology to solve our nation's most pressing problems -- including improving clean energy, healthcare costs, and public safety.
Importantly, the states are following this lead, as evidenced by the National Association of State Chief Information Officers’ (NASCIO) call in June 2009 to make all state data transparent and open as well, followed in September 2009 by the publication of a report entitled “A Call to Action for State Government: Guidance for Opening the Doors to State Data.” In short, NASCIO committed to “collaborat[ing] with the Office of Management and Budget, Office of the Federal CIO and the General Services Administration to promote broader transparency of state data….[T]o promote the development of public data catalogs in the states, adopt a standard naming convention, meta data standards and to share best practices on making state data sets available to the public” (NASCIO 2009).
California is one of only two states that have already provided data source links to the state and local section of the federal Data.gov portal, a major step on the path toward regional data e-governance. The California state data site, located at http://www.ca.gov/data/, provides a range of raw data available for download in a variety of machine-readable formats.
There has never been a better time for technologists, policy makers and practitioners to work together to effect real change in government and to deliver tangible benefits to society in the form of economic growth, improved environmental and public health, and a better overall quality of life for all.
Set the Data Free
Any and all data created, collected, managed, or “touched” by a government agency or funded by taxpayer dollars should be made universally accessible to, and reusable by, any interested public or private party. The technologies to do this are available now and will only become more so; technology is no longer the issue or the problem. Security and privacy concerns can be addressed, and public agencies of all kinds and at all levels cannot continue to hold them out as reasons to stifle the opening of government data sources, thereby choking innovation. In one of his first formal memoranda, President Obama directed that “[government] should not keep information confidential merely because of…speculative or abstract fears” (Obama, Freedom of Information 2009).
The notion of transparency, long a cornerstone of democratic government, has gained new political currency as a result of the global financial and economic crisis. Observers of the cozy Wall Street-federal government relationship and the lack of oversight have called for “radical transparency” as the necessary medicine. Specifically, in the economic and financial arena, “[t]he revolution will be powered by data, which should be unshackled from the pages of regulatory filings and made more flexible and useful.” Extending the concept to each of the identified mega-region policy areas, all important data should be reported “…online—and in real time…uniformly tagged and exportable into any spreadsheet, database, widget, or Web page” (Roth 2009). The revolution, meaning the transformation of how government does business, will be powered by data.
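Roth’s description of data that are “uniformly tagged and exportable” can be made concrete with a minimal Python sketch. It assumes hypothetical spending records in which every record carries the same named fields; the field names and values are illustrative placeholders, not real government figures. Because the tagging is uniform, the same records serialize to CSV for a spreadsheet and to JSON for a database load or web widget, with no per-format special handling.

```python
import csv
import io
import json

# Hypothetical, uniformly tagged records: every record has the same named fields.
# The values are placeholders, not real government figures.
RECORDS = [
    {"agency": "Agency A", "fiscal_year": 2009, "program": "Transit", "amount_usd": 1250000},
    {"agency": "Agency B", "fiscal_year": 2009, "program": "Housing", "amount_usd": 480000},
]
FIELDS = ["agency", "fiscal_year", "program", "amount_usd"]


def to_csv(records, fields):
    """Serialize records as CSV text that any spreadsheet can open."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fields, lineterminator="\n")
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


def to_json(records):
    """Serialize the same records as JSON for a database, widget or web page."""
    return json.dumps(records, indent=2)


if __name__ == "__main__":
    print(to_csv(RECORDS, FIELDS))
    print(to_json(RECORDS))
```

Note that CSV carries no type information, so a consumer reading the spreadsheet export gets the numbers back as text; JSON preserves them as numbers.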
Immediately following the 2008 election of Barack Obama as President of the United States, the technology world was abuzz (or rather, a-“Twitter”) with the prospects for the new administration. As one observer noted just days after the election, “[t]he 2008 Presidential Election crowned the Internet as the king of all political media, ending the era of the television presidency that started with John F. Kennedy. Barack Obama’s pioneering use of social networking and other information technologies not only transformed campaign politics, but it could influence the way government and business work as well” (Wagner 2008). The new president and his administration did not disappoint. One of the first high-profile moves by the Obama administration was the appointment, on March 5, 2009, of the nation’s first-ever federal Chief Information Officer, or CIO: Vivek Kundra, formerly Chief Technology Officer for Washington, D.C. One of the new CIO’s first announcements was the creation of a website, Data.gov, with the stated purpose of making “a broad array of U.S. Government data available in downloadable formats” (http://www.data.gov/). To be sure, this may not seem a very “sexy” website announcement in this age of Facebook, MySpace, Twitter and the like, but it is nevertheless an important step for the new administration. The site, launched in May 2009, promises to serve as a model for open government data and data transparency that other governments will follow.
As further evidence of the administration’s direction, click around the revamped E-Government sections of the new WhiteHouse.gov website and you will run into a new tagline: “Powering America’s future with technology” (see, for instance, the newly launched Visualization to Understand Expenditures in Information Technology, or VUE-IT, feature accessible at http://www.whitehouse.gov/omb/e-gov/). The administration is certainly talking the talk, and the walk promises to follow.
Two of the first three official memoranda issued by the new President just a day after his inauguration dealt with new technologies and their place in a 21st Century democracy. The President’s Memorandum on Transparency and Open Government includes specific language that “[e]xecutive departments and agencies should harness new technologies to put information about their operations and decisions online and readily available to the public.” Further, the development of a formal Open Government Directive has been put under the purview of the new federal CIO, clearly aligning policy with the importance of data, information technology and the web in modern society. In a second presidential memorandum that same day on Freedom of Information, the President notes that “[all] agencies should use modern technology to inform citizens about what is known and done by their Government…” and further directs “the Director of the Office of Management and Budget to update guidance to the agencies to increase and improve information dissemination to the public, including through the use of new technologies…” (Obama, Freedom of Information, and Transparency and Open Government Memoranda, 2009).
A cross-disciplinary group of researchers from Princeton recently published a seminal article in the Yale Journal of Law and Technology framing the argument for opening up government data and capitalizing on the digital world: “If President Barack Obama’s new administration really wants to embrace the potential of Internet-enabled government transparency, it should follow a counter-intuitive but ultimately compelling strategy: reduce the federal role in presenting important government information to citizens.” The authors’ overarching point is that the federal government (the focus of their article) should open up its data rather than trying to manage it or deciding how best to package it for the public. “In order for public data to benefit from the same innovation and dynamism that characterize private parties’ use of the Internet, the federal government must re-imagine its role as an information provider…it should focus on creating a simple, reliable and publicly accessible infrastructure that ‘exposes’ the underlying data” (Robinson et al., 160). The same principle should apply to government actors at all levels, not just the federal government.
Steps to Setting the Data Free
According to the World Wide Web Consortium (W3C) E-government Interest Group, which has been preparing a charter document on improving access to government, publishing Open Government Data (referred to more broadly as Public Sector Information, or PSI) requires three fundamental steps: identify the data that one controls; represent that data in a way that people can use; and expose the data to the wider world (Alonso 2009).
The first part of this equation is for each agency to identify the data and information under its control and formally “approve” it for release. This formal approval identifies the originator of the data and, in the case of government agencies, signifies that it is official data and open for reuse. To be sure, the “creators” of new data should retain the right, and the duty, to validate data and approve it for release to the public. This approval, however, should be provided as expediently as possible, because the digital world of the web moves in real time. Unless there is an overriding reason related to identified security or privacy concerns (not the abstract security or privacy issues so often used to restrict the distribution of data and information), all public data must be freed. A critical and somewhat controversial issue tightly connected to this first step is the development of new legal and policy frameworks addressing the range of intellectual property and digital rights issues that have arisen around the sharing, publishing and reuse of data and information on the Internet. Addressing these issues in detail is beyond the scope of this paper. Work will be needed, however, to put in place policy and legal frameworks that encourage, if not require, that access to and reuse of data be open and transparent and based on identifiable and knowable standards or guidelines, and that any reuse be attributable to, and provide some value back to, the originator.
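The first step, identifying data and recording its approval, can be sketched as a minimal catalog record in Python. The field names, the identifier naming pattern and the release rule are assumptions for illustration, not a published metadata standard: a data set is treated as releasable once its originator has approved it and no specific, documented security or privacy restriction applies.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List, Optional


@dataclass
class DatasetRecord:
    """Hypothetical catalog entry for one agency data set (illustrative fields only)."""
    identifier: str                      # stable name under an assumed naming convention
    title: str
    originator: str                      # agency that created the data and vouches for it
    approved_on: Optional[date] = None   # None until the originator approves release
    restriction: Optional[str] = None    # a specific, documented security or privacy concern
    formats: List[str] = field(default_factory=list)

    def is_releasable(self) -> bool:
        """Approved by its originator and not blocked by a documented restriction."""
        return self.approved_on is not None and self.restriction is None


def public_catalog(records: List[DatasetRecord]) -> List[DatasetRecord]:
    """Keep only the records that can be published as official, open data."""
    return [r for r in records if r.is_releasable()]


# Hypothetical inventory for one agency.
INVENTORY = [
    DatasetRecord("agency-a/transit-ridership/2009", "Transit ridership", "Agency A",
                  approved_on=date(2009, 9, 1), formats=["csv", "xml"]),
    DatasetRecord("agency-a/building-permits/2009", "Building permits", "Agency A"),
    DatasetRecord("agency-a/clinic-visits/2009", "Clinic visits", "Agency A",
                  approved_on=date(2009, 9, 1),
                  restriction="contains individual patient identifiers"),
]

if __name__ == "__main__":
    for record in public_catalog(INVENTORY):
        print(record.identifier, record.formats)
```

In this sketch an unapproved record and a restricted record are both withheld, but for different, recorded reasons, which keeps any restriction tied to an identified concern rather than an abstract one.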
The second part of the open government data equation, representing the data in a way people can use, involves the technical aspects of freeing the data. It chiefly concerns creating the data in the first place in standard machine-readable formats so that they can be easily found (especially through web searches), accessed over the web, and reused. A rich body of technical literature on this topic predates the current situation: the exchange, sharing and integration of data from multiple sources and in multiple formats has long been a vexing issue for computer scientists, let alone for the hopeful end users of the data. The main point for this discussion is that now, when the political will is strong and the technological capabilities of the web are available, the chasm between technologists (computer scientists, engineers, programmers and the like) on one side and policy makers and government practitioners on the other must be bridged.
Even where agencies do have data and coding standards, those standards often reflect outdated approaches and needs rooted in old mainframe or early personal and business computing database requirements. They are often based on hardware (legacy computer systems) or specific technologies rather than driven by the desire to share, reuse, leverage, or mine data for information. New data exchange and data sharing standards need to be developed so that all new data collection activities will be useful for future sharing. All too often, public sector information and database projects use non-reusable, closed and proprietary databases, frequently because they were built in a vacuum, driven as much by technology requirements or opportunities as by policy outcomes. Projects and initiatives of this kind, while resting squarely on technological underpinnings, must be led and championed by the business side: in this case, the policy makers and domain area practitioners who will be charged with planning, evaluating and delivering public services across a mega-region. Ultimately it is they who will deliver on the triple bottom line.
There are a number of ways for data developers to prepare data and information for open access and reuse. In March 2009 the W3C E-Government Interest Group published a draft working paper on improving access to government, which includes a comprehensive section on preparing “open” government data. Emerging practices for data readability and reusability range from publishing standard data APIs (application programming interfaces), to ensuring that all data can be shared via RSS (Really Simple Syndication) or Atom feeds, to using new Semantic Web technologies such as RDF (Resource Description Framework). (For more on the technical aspects of achieving open government data, see Alonso, Improving Government Access through Use of the Web, 2009.) Additionally, the widespread use of Extensible Markup Language (XML) on the web as a standard way of presenting content and data has led to initiatives to develop XML-based standards for various categories of data. The emerging Extensible Business Reporting Language (XBRL), for example, has been developed for financial data and is being supported as a future requirement for both governments and public companies. A number of national and state governments have worked over the years to create XML standards for E-Government (see, for example, XML.gov for information on related efforts in government). The tools, languages and standards exist; data creators and managers simply have to adopt and apply them.
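As one concrete instance of the feed-based practices described above, the following Python sketch builds a minimal Atom (RFC 4287) feed announcing newly published data sets, using only the standard library’s xml.etree.ElementTree. The feed and entry identifiers, the publisher name, the landing-page URL and the download URL are hypothetical placeholders. Each entry carries an "alternate" link to a human-readable landing page and an "enclosure" link to the machine-readable file, so feed readers and aggregators can discover new data as it is released.

```python
import xml.etree.ElementTree as ET

ATOM_NS = "http://www.w3.org/2005/Atom"
ET.register_namespace("", ATOM_NS)  # serialize Atom as the default namespace


def atom(tag: str) -> str:
    """Qualify a tag name with the Atom namespace."""
    return f"{{{ATOM_NS}}}{tag}"


def build_feed(feed_id, title, publisher, updated, datasets):
    """Return an Atom feed (as a string) with one entry per published data set.

    `datasets` is a list of dicts with the hypothetical keys
    id, title, updated, page (landing page) and url (data file).
    Timestamps are RFC 3339 strings, as Atom requires.
    """
    feed = ET.Element(atom("feed"))
    ET.SubElement(feed, atom("id")).text = feed_id
    ET.SubElement(feed, atom("title")).text = title
    ET.SubElement(feed, atom("updated")).text = updated
    author = ET.SubElement(feed, atom("author"))
    ET.SubElement(author, atom("name")).text = publisher
    for ds in datasets:
        entry = ET.SubElement(feed, atom("entry"))
        ET.SubElement(entry, atom("id")).text = ds["id"]
        ET.SubElement(entry, atom("title")).text = ds["title"]
        ET.SubElement(entry, atom("updated")).text = ds["updated"]
        # Human-readable description of the data set.
        ET.SubElement(entry, atom("link"), rel="alternate", href=ds["page"])
        # The downloadable, machine-readable file itself.
        ET.SubElement(entry, atom("link"), rel="enclosure", href=ds["url"])
    return ET.tostring(feed, encoding="unicode")


if __name__ == "__main__":
    print(build_feed(
        "tag:data.example.org,2009:feed",
        "Newly published data sets",
        "Example State Data Office",
        "2009-09-15T00:00:00Z",
        [{"id": "tag:data.example.org,2009:ridership",
          "title": "Transit ridership, 2009",
          "updated": "2009-09-15T00:00:00Z",
          "page": "https://data.example.org/ridership",
          "url": "https://data.example.org/ridership-2009.csv"}],
    ))
```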
The third step is to open the data up to the wider world. This is potentially the most important, and valuable, step in the process. Assuming that governments and other public sector agencies have completed the first two steps, efforts should be made to make the data accessible and to actively disseminate it. In common web parlance, the content needs to be placed into the “clickstream” so that others can find it, reuse it, and ideally create new content and information around the original data that provide even greater value. Most public agencies, however, lack the ongoing resources and internal expertise needed to foster, over time, the dynamic giving, taking and shaping of data and information on the web. This is where new entities or structures, such as Centers of Excellence within specific mega-regions, can enter the process and provide immense value right from the start. Enabling this kind of open government data infrastructure will also stimulate new content, and potentially new economic activity, as “…private actors have demonstrated a remarkably strong desire and ability to make government data more available and useful for citizens—often by going to great lengths to reassemble data that government bodies already possess but are not sharing in a machine-readable format” (Robinson et al., 165).
Government Data, Transparency and the “Deep Web”
A recent New York Times article noted that sometime in the summer of 2008 Google’s search engine “added the one trillionth address to the list of Web pages it knows about.” This is an almost incomprehensible number, yet “beyond those trillion pages lies an even vaster Web of hidden data: financial information, shopping catalogs, flight schedules, medical research and all kinds of other material stored in databases that remain largely invisible to search engines” (Wright 2009). The deep web illustrates well the immense challenges and opportunities that data present in the 21st century.
A significant amount of deep web content with high commercial value has already been mined on the web, for example at consumer product sites, entertainment information and news sites, and meta-search sites for travel and real estate. However, as Madhavan et al. have pointed out, “in domains with more rarely searched content, the surfacing of high-quality form-based sites can have a huge impact. Prominent examples include governmental and NGO portals” (2009). Such public sector sites hold a tremendous amount of structured data and content on a dizzying array of subjects, but generally do a poor job of making that content available, if they make it available at all. As Wright continues, “[d]eep web technologies may eventually let businesses use data in new ways. For example, a health site could cross-reference data from pharmaceutical companies with the latest findings from medical researchers, or a local news site could extend its coverage by letting users tap into public records stored in government databases” (2009, emphasis added). One can imagine regional actors in mega-regions, formal or informal, mining the newly accessible deep web to find multiple official and unofficial data sources to track regional activities and trends.
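The idea of “surfacing” form-based sites can be illustrated with a deliberately simplified Python sketch: for a search form whose inputs are drop-down menus, generate one GET URL per combination of menu values, so that each result page becomes an ordinary, crawlable address. The form address, field names and values are hypothetical, and this brute-force enumeration is not the input-selection method Madhavan et al. describe; practical systems pick informative combinations rather than all of them, because the count grows multiplicatively with every added input.

```python
from itertools import product
from urllib.parse import urlencode


def surface_form(action_url, select_options, max_urls=100):
    """Enumerate GET URLs for a form whose inputs are all select menus.

    Each URL encodes one combination of menu values. `max_urls` caps the
    output, since the number of combinations multiplies with each input.
    """
    names = sorted(select_options)  # fixed field order gives stable URLs
    urls = []
    for values in product(*(select_options[n] for n in names)):
        if len(urls) >= max_urls:
            break
        urls.append(f"{action_url}?{urlencode(dict(zip(names, values)))}")
    return urls


# Hypothetical public-records search form with two drop-down menus.
FORM_ACTION = "https://records.example.org/search"
OPTIONS = {"county": ["Alameda", "Fresno"], "year": ["2008", "2009"]}

if __name__ == "__main__":
    for url in surface_form(FORM_ACTION, OPTIONS):
        print(url)
```

With the two hypothetical menus above, the sketch yields four URLs, the first being https://records.example.org/search?county=Alameda&year=2008.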
Much of the recent renewed support for data transparency has been driven by the current economic and financial crisis. As Christopher Cox, the former Chairman of the SEC, has noted, “The SEC was founded on the legal concept of disclosure and transparency….It was not a technological concept….Today, we have technology that was unimaginable in the early part of the 20th century, that can reify this idea in ways that are far more expansive and consequential” (quoted in Roth 2009). Implementing a Strategic Investment Framework on a mega-region basis with open and accessible data would be both more expansive and potentially more consequential.
On the industry side, Google launched a new search tool called Google Public Data in late April 2009. A user simply types a specific data type and location into the standard Google search box, for example [unemployment rate California], and the first search result is a special Google Public Data link to interactive data charts, along with additional datasets and variables that can be selected. The public data included so far is still fairly limited, including data from the US Census Bureau and the US Bureau of Labor Statistics. By September, Google was stepping up its moves into the public sector and open data, announcing on September 15, 2009 that it would launch in 2010 a dedicated Google government “cloud” for providing and managing government data and applications in a fully hosted Internet environment (the “cloud”) compliant with Federal Information Security Management Act (FISMA) requirements. Microsoft, Sun and other technology heavyweights have indicated plans for government-specific “cloud” computing initiatives for federal, state and local governments.
N.B. This is an excerpt from an article by Gregory Curtin to be published shortly in the journal “Public Works Management and Policy.”