The move to open knowledge: open data and open source

1.5 Contributions

1.5.1 The move to open knowledge: open data and open source

The idea of open knowledge is based upon the publication and subsequent exposure of open data to users in ways which make that data useful to them. It relies upon content and information which is freely available to anyone who is interested in it. Openness encompasses the freedom to use, re-use, modify and re-distribute data without any legal, technological or social restriction. The “Open Definition” sets out the key principles which define openness in relation to data and content. This definition holds that “knowledge is open if anyone is free to access, use, modify, and share it - subject, at most, to measures that preserve provenance and openness.” [176] In order to be open, data must be available as a whole and at no more than reasonable reproduction cost. The content must be published in a convenient and modifiable form, and this is usually facilitated by dissemination via the Internet. There must be provision for reuse and redistribution of the data which foresees and allows intermixing it with other content. Finally, there must be no discrimination in either the system of publication or the conditions of use. Commercial and non-commercial applications are equally valid. Everyone should be able to make use of the data for their own ends. There are various licensing schemes and agreements which are compatible to different extents with the definition of open data. These contain a range of terms covering content attribution, the subsequent sharing of derived or modified products based upon open data, and a requirement to keep resultant products licensed openly. The most liberal open data licenses are the Open Data Commons Public Domain Dedication and Licence [143] and the Open Data Commons Open Database License [142].

The related idea of open source usually refers to software. It means that the source code for that software is openly available, thus allowing for inspection, modification, enhancement and forking (the establishment of derivative software on an original codebase). The software may be redistributed freely. Most open source software is free of cost, but some applications do carry licensing fees. There is a wide variety of legal licensing frameworks from which the authors of open source software can choose in order to dictate how access to their code is facilitated and constrained. In general, open source licenses grant users permission to use the licensed software for any purpose they wish. Some open source licenses, which are known as “copyleft” licenses, stipulate that anyone who releases a modified program based on the source code of an original project must also release the source code for that underlying program alongside it. Moreover, some open source licenses stipulate that anyone who alters and shares a program with others must also share the original source code without charging a licensing fee for it. Common licenses for open source software include the GNU General Public License (GPL), the GNU Lesser General Public License (LGPL) and the Berkeley Software Distribution (BSD) license [56].

“Creativity flourished there because the Internet protected an innovation commons. The Internet’s very design built a neutral platform upon which the widest range of creators could experiment. The legal architecture surrounding it protected this free space so that culture and information - the ideas of our era - could flow freely and inspire an unprecedented breadth of expression. But this structural design is changing - both legally and technically. This shift will destroy the opportunities for creativity and innovation that the Internet originally engendered. The cultural dinosaurs of our recent past are moving to quickly remake cyberspace so that they can better protect their interests...”

[112]

There is a consistent argument throughout this thesis that legal data under the English legal system should be made available on an “open data” basis. Some limited progress has been made here thanks to the publication of case transcripts by the courts themselves on the web. The British and Irish Legal Information Institute (BAILII) [111] was established in the early years of the current century in an effort to improve general access to legal information. This is a significant step which is of central utility to this research, because the products that BAILII have published form the basis of the legal information used in the software which is presented here, under a bespoke licensing agreement for the research project. These sources nevertheless remain relatively limited in scope and disparate at present. It is also the case that the standard licensing terms used by organisations such as BAILII are not sufficiently liberal to render their publications “open data”. Most users are prohibited from modifying the sources and from including, parsing or presenting them, in part or as a whole, through derivative systems. In order for this situation to change, the movement towards open knowledge needs to gain impetus for both philosophical and practical reasons. This research would not have been possible without the bespoke licensing agreement with BAILII, since the text mining, presentation and summarisation techniques implemented in the software require extensive modification of and extrapolation from the underlying sources.

The idea that a person should be able to establish the current state of the law easily, and thereby to understand their rights and obligations under it, demands that it be possible for them to determine simply what the law is. The centrality of precedent in the English system means that a full appreciation of the law requires access to a broad range of information sources and to both historical and contemporary legal case reports and legislation. The currently dominant model is one of closed publication, in which legal information can only be accessed by the general public on an onerous subscription basis or through a solicitor. This model cannot be said to fulfil adequately the general requirement of access to and understanding of the law for everyone regardless of means. Thus there is a clear need to move towards publishing legal data on an open basis in order to improve access to the law and to justice.

Secondly, this thesis presents an argument that diversification in the legal software ecosystem will be greatly aided by a focus on open source software development. This is essentially a tactical decision, because it is very difficult to compete with large established software companies in this sector on a commercial, closed-source basis. The providers of legal software also tend to be information publishers in their own right, and the dominant companies here have been offering products to law students, lawyers and academics for many decades. There is therefore a severe disparity in resources and experience between these existing companies and any new companies which attempt to enter the marketplace. New entrants face a difficult task in securing access to legal information in the first place. It is also likely that the development effort needed to produce a competitive legal research product would require large-scale financial investment even before any new software could be released. The open source model, by contrast, offers a possibility to distribute development amongst a large and diverse pool of developers and legal experts on the basis that they are interested in the work and in the release of a final product. Established open source developments like LibreOffice [58], Ubuntu Linux [181] and the GNU Image Manipulation Program (GIMP) [167] (amongst others) demonstrate that it is possible to create and to support large and complex software products on an open source basis. Thus the open source focus in this thesis is essentially a decision which is motivated by a desire to release products for lawyers which quickly and comprehensively compete with established commercial platforms with minimal direct financial investment.

In document Corpus linguistics for the exploration of legal precedent (Page 33-36)