
Discretionary Version Control

Access Control for Versionable Documents

Diskretionär versionshantering

Accesskontroll för versionshanterade dokument

Johan Hellström

Rickard Hermansson

Degree Project in Computer Technology and Software

Development, First Level, 15 hp

Advisor at KTH: Anders Lindström

Examiner at KTH: Ibrahim Orhan

TRITA-STH: 2014:29

School of Technology and Health

136 40 Handen, Sweden


Abstract

A common problem in the workplace is sharing digital documents with coworkers. For some companies the problem extends to keeping the documents internal and backed up, and to controlling which people in the company have the right to read and revise certain documents.

This paper shows different systems and models for access control, version control, and distribution of the documents that can be used to create a system that solves these problems.

One requirement for this system was a user interface where users can upload, download and manage access to their documents. Another requirement was a service that handles version control for the documents, and a way to quickly connect and distribute the documents. The system also needed to be able to handle access control of the versioned documents on document level, referred to as "fine grained access control" in this paper.

These models and systems were evaluated based on aspects of the access control models, version control systems, and distribution systems and protocols. After the evaluation, appropriate selections were made to create a prototype to test the system as a whole.

The prototype ended up meeting the goals that Nordicstation set for the project, but only with basic functionality: retrieving any version from a document's history, controlling access to the documents at document level, and a simple web based user interface for managing the documents.

Keywords

Version control, access control, content management, document control, java content repository, webDAV, document sharing


Summary

Sharing documents easily with colleagues is something every company needs. These documents are often internal and must be kept within the company. Even within the company there may be a need to control who has the right to read or revise the documents.

This degree project report describes different techniques and models for access control, version control and distribution that can be used to implement a system that solves the problems mentioned.

One of the requirements for the system was a user interface where users can upload and download their documents. Further requirements were that the system should version control the documents and that users should be able to access the different versions. The system should also be able to handle access control on document level, which this report defines as "fine grained access control".

To design such a system, different techniques for access control, version control and distribution of the documents were investigated and evaluated. To test the system, a prototype was developed based on the chosen solutions.

The resulting prototype met the goals that Nordicstation set for the project, but only with basic functionality: support for retrieving different versions of a document, controlling access on document level, and a web based interface for administering the documents.

Keywords

Version control, access control, content management, document control, java content repository, webDAV, document sharing


Acknowledgements

We would like to thank Anders Lindström, our supervisor at KTH, for all the valuable advice and feedback during the writing process of this degree paper.

We would also like to thank Tobias Hultgren, our supervisor at Nordicstation, for helping us with providing the specifications for the resulting prototype. Some credit should also go to Tobias Österberg for his help with designing the system architecture used by the prototype.


Contents

1 Introduction
  1.1 Background
  1.2 Contributions
  1.3 Methodology
  1.4 Goals
  1.5 Delimitations
2 Prestudy of Existing Solutions and Models
  2.1 Access Control Models
    2.1.1 Flat Role Based Access Control
    2.1.2 Hierarchical Role Based Access Control
    2.1.3 Discretionary Access Control
  2.2 Version Control Systems
    2.2.1 Subversion
    2.2.2 Git
    2.2.3 Extended Version Control for WebDAV
    2.2.4 Java Content Repository Version Control
  2.3 Distribution platforms and protocols
    2.3.1 Web based Distributed Authoring and Versioning
    2.3.2 File Transfer Protocol
    2.3.3 Java Content Repository
    2.3.4 Apache Jackrabbit
    2.3.5 JBoss ModeShape
3 Evaluation of Existing Solutions and Models
  3.1 Evaluation of Access control models
    3.1.1 Flat Role Based Access Control
    3.1.2 Hierarchical Role Based Access Control
    3.1.3 Discretionary Access Control
  3.2 Evaluation of Version Control Systems
    3.2.1 Subversion
    3.2.2 Git
    3.2.3 WebDAV Version Control
    3.2.4 Java Content Repository Version Control
    3.2.5 Custom made Version Control System
  3.3 Evaluation of distribution platforms and protocols
    3.3.1 Web based Distributed Authoring and Versioning
    3.3.2 File Transfer Protocol
    3.3.3 Java Content Repository
    3.3.4 Apache Jackrabbit
    3.3.5 JBoss ModeShape
    3.3.6 Custom made distribution system
4 Resulting Prototype
  4.1 The Access Control
  4.2 The Version Control
  4.3 The Distribution
    4.3.1 The Watcher
  4.4 Evaluating the prototype
5 Analysis and Discussion
  5.1 Analysis
  5.2 Discussion
6 Conclusions
  6.1 Practical considerations of the prototype
  6.2 Future work

1 Introduction

This section will present the goals and background of this degree project.

1.1 Background

Nordicstation was in need of a system for sharing and collaborating on digital documents. These documents were confidential and had to be kept in an internal environment. In addition, the system needed to allow version control for documents, with access control on document level. Some employees were provided with tablets, and these are considered personal, meaning that employees are allowed to keep personal data on them. Nordicstation required a system where employees could also store their personal data safely and privately.

1.2 Contributions

The contributions of this paper are a deeper understanding of the systems and technologies available when developing a version control system for documents, as well as of how to implement fine grained access control for those documents.

The results of this paper can be used by companies who use or want to use version control for documents or by developers who want to implement access control on document level.

1.3 Methodology

To solve the problem of building a system for sharing and collaborating on digital documents with a focus on access control on document level, the following methods were used when conducting the work of the project:

• Research was conducted in the form of a prestudy of several solutions for different areas relating to the system.

• The solutions for the different areas were compared, evaluated and ranked on different aspects.

• A prototype was implemented using the solutions evaluated in the research stage.

• The research and the resulting prototype were analysed and evaluated by comparing the results to the goals of the paper.

1.4 Goals

The purpose of this degree project was to examine and evaluate models suitable for implementing a version control system targeting documents. The goal was to find models that enable fine grained access on document level, meaning that documents under version control should have different permissions for different users.

The goals are split into two sections, the theoretical (Evaluation) and the practical (Implementation). The criteria for the practical section were suggested by Nordicstation and are listed in the implementation goals.

Evaluation

• Finding suitable solutions for implementing access control on document level by comparing administration, and support for document level access control on different access control models.

• Finding suitable solutions for implementing version control for documents that are not necessarily text files by comparing the amount of APIs available, support for non-text file formats, options for conflict handling, and how detailed and thorough the documentation of different version control systems and methods is.

• Finding suitable solutions for implementing distribution of the documents by comparing platform independence, support for file access and version control, user and administration interface support, and how detailed and thorough the documentation of different distribution systems and methods is.

Implementation

• Choosing suitable solutions for implementing a prototype based on the access control model, version control system, and distribution system/methods evaluated, while taking into consideration the functionality requested by Nordicstation as listed below:

– A web based user interface where users can view, download, and upload documents, as well as allowing users to share folders and files with other users.

– A backend service that handles the version control on uploaded documents and makes sure the version controlled files are stored persistently.

– A module that allows WebDAV (see 2.3.1) access to the system for uploading folders and documents.

• Evaluating the prototype based on the implementation time, access control functionality, version control functionality and overall stability of the system.

1.5 Delimitations

The security aspects of the system were not evaluated. The resulting prototype had security based on Nordicstation's requirements; evaluating security was out of the scope of this project.

2 Prestudy of Existing Solutions and Models

This chapter will explain the theory behind the technologies and methods that are relevant to the system, as well as referencing existing solutions in the area.

2.1 Access Control Models

The purpose of access control models in general is to determine the extent of users' capabilities within a system, for example access to shared resources and the permission to alter them. The following subsections will discuss some existing models that can be used as a base when implementing access control.

2.1.1 Flat Role Based Access Control

This type of access control is called Role Based Access Control (RBAC). All actions available to a user are defined by the user's roles. Each user can have an arbitrary number of roles and each role may have an arbitrary number of users. Users can invoke permissions from multiple roles at the same time [1]. It is called flat since the roles do not affect each other. This is illustrated by figure 2.1.

Figure 2.1: The user has many roles, each role has permissions to perform one or more operations. The user will aggregate all permissions from each role.
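The aggregation of permissions described above can be sketched in a few lines of Python. This is only an illustration: the role names, permission names and users are hypothetical and do not come from the prototype.

```python
# Hypothetical role and permission names, chosen only for illustration.
ROLE_PERMISSIONS = {
    "developer": {"read_source", "write_source"},
    "reviewer": {"read_source", "comment"},
    "accountant": {"read_invoices"},
}

# Each user may hold any number of roles.
USER_ROLES = {
    "alice": {"developer", "reviewer"},
    "bob": {"accountant"},
}


def permissions_for(user):
    """Return the union of the permissions of every role the user holds."""
    permissions = set()
    for role in USER_ROLES.get(user, set()):
        permissions |= ROLE_PERMISSIONS.get(role, set())
    return permissions


def is_allowed(user, permission):
    """Flat RBAC check: roles never influence each other."""
    return permission in permissions_for(user)
```

Here alice aggregates the permissions of both of her roles, while a user without any role ends up with an empty permission set.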


2.1.2 Hierarchical Role Based Access Control

When using Hierarchical Role Based Access Control (HRBAC), the roles are placed in a hierarchy as seen in figure 2.2. Roles higher up in the hierarchy inherit permissions from those beneath. There is therefore no need for users to have multiple roles.

Figure 2.2: Roles are placed inside a hierarchical tree. The project manager, at the top of the tree, will inherit permissions from the roles beneath. The senior developer inherits permissions from the junior developer. Both the admin and junior developer are at the bottom of the tree and inherit from no one.
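Permission inheritance in a role hierarchy can be sketched as a recursive lookup. The hierarchy below loosely follows the roles named in figure 2.2, but the exact edges between the roles and all permission names are assumptions made for this example.

```python
# Role hierarchy loosely following figure 2.2; the exact edges and the
# permission names are assumptions made for this sketch.
JUNIORS = {
    "project_manager": ["admin", "senior_developer"],
    "senior_developer": ["junior_developer"],
    "admin": [],
    "junior_developer": [],
}

DIRECT_PERMISSIONS = {
    "project_manager": {"approve_release"},
    "admin": {"manage_users"},
    "senior_developer": {"merge_changes"},
    "junior_developer": {"read_source", "write_source"},
}


def effective_permissions(role):
    """Collect the role's own permissions plus those of every role beneath it."""
    permissions = set(DIRECT_PERMISSIONS.get(role, set()))
    for junior in JUNIORS.get(role, []):
        permissions |= effective_permissions(junior)
    return permissions
```

A user then only needs the single role "senior_developer" to also receive everything granted to "junior_developer".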

2.1.3 Discretionary Access Control

Discretionary Access Control (DAC) dictates user capabilities by mapping the permissions for resources in a table, commonly referred to as the access matrix [2]. There is one row per user, and one column per resource. The permissions for each resource are then stored in the cells. Table 2.1 shows a hypothetical example. One example of DAC in practice is the UNIX operating system [2]. A new file (resource) belongs to the user who created it, who is granted read and write access. It is then up to the owner to grant permissions to other users, i.e. to modify the access matrix.

Users   Document 1   Document 2   Document 3    Note 4
User1   Write        NULL         Read          Read, Write
User2   NULL         NULL         Read, Write   NULL

Table 2.1: User1 and User2 have been granted different permissions to the resources, in this scenario documents. NULL means that access is denied for that resource.

Access Control Lists

One way of implementing DAC would be to use Access Control Lists (ACL) [2]. Each resource has a list that contains users paired with their permissions for the resource. This makes it possible to pair permissions with user groups rather than single users without adding overhead, as would be the case when using an access matrix. Adding a group to an ACL only adds one entry to that list, whereas adding a group to the access matrix adds a whole new row with one cell for every resource.
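The difference between the two representations can be illustrated with the data from table 2.1. The sketch below is a minimal example: empty sets stand in for NULL, and the group name "group:editors" is hypothetical.

```python
# Access matrix from table 2.1: one row per user, one cell per resource.
# An empty set plays the role of NULL (access denied).
RESOURCES = ["Document 1", "Document 2", "Document 3", "Note 4"]
ACCESS_MATRIX = {
    "User1": {"Document 1": {"write"}, "Document 2": set(),
              "Document 3": {"read"}, "Note 4": {"read", "write"}},
    "User2": {"Document 1": set(), "Document 2": set(),
              "Document 3": {"read", "write"}, "Note 4": set()},
}


def matrix_to_acls(matrix):
    """Build one ACL per resource, keeping only the non-empty cells."""
    acls = {resource: {} for resource in RESOURCES}
    for user, row in matrix.items():
        for resource, permissions in row.items():
            if permissions:
                acls[resource][user] = set(permissions)
    return acls


def is_allowed(acls, user, resource, permission):
    """A missing ACL entry means that access is denied."""
    return permission in acls.get(resource, {}).get(user, set())


acls = matrix_to_acls(ACCESS_MATRIX)
# Granting a (hypothetical) group access to one document adds a single entry.
acls["Document 3"]["group:editors"] = {"read"}
```

The matrix stores eight cells for two users, while the ACLs only store the four granted entries plus the one added for the group.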

2.2 Version Control Systems

This section will discuss Version Control Systems (VCS). There are two dominating types of VCS, centralized and distributed [3].

Subversion (SVN) is an example of a centralized system. In centralized systems, the server keeps track of the different file versions, and the clients get copies of the version controlled files from the server. Version control with SVN will be discussed further in this section [4].

Git is an example of a distributed system. When clients get copies from the server in a distributed system they don’t just get the most recent versions, they get a copy of the whole repository, including all previous versions. Version control with Git will also be discussed further in this section [4].

2.2.1 Subversion

Subversion (SVN) is an open source VCS.

A repository in SVN is a file storage server that remembers each version of all files that have been changed by clients. When a repository is checked out, the local working copy consists of the files that have been added to the repository. The client handles the current files and the changes made to them. A Subversion repository has support for access control on repository level [5].

SVN has support for automatic merging and diffing of text files, and will try to merge binary files as well if not told to ignore them.

Subversion has support for several programming languages via several APIs [6], the primary APIs being implemented in C and Java. By the use of a wrapper called SWIG (Simplified Wrapper and Interface Generator), more languages are supported, including Python, Perl, Ruby, C#, and PHP. The languages supported with the help of the wrappers are "usable", meaning that they are not necessarily stable but most likely functional [7]. Outside of the wrapper, a Java API named SVNKit [8] and a .NET API named SharpSvn [9] exist. Documentation for the official APIs is available [10], but no official guides or extensive examples for different functionality are offered.

2.2.2 Git

Git is another open source VCS. The key aspects of Git are listed below, taken directly from the Pro Git book [3].

• Speed.

• Simple design.

• Strong support for non-linear development (thousands of parallel branches).

• Fully distributed.

• Able to handle large projects like the Linux kernel efficiently (speed and data size).

Git handles data differently from many other version control systems (including SVN), by thinking of the data as "a set of snapshots of a mini filesystem". This allows the system to be more efficient, since a file that has not changed does not need to be saved again; only a reference to the already stored file is kept. Combined with the fact that Git creates checksums for all files in the repository and uses them as references, this gives the system more integrity. Git handles almost everything in a local database, which means it is much faster than version control systems that perform these operations remotely.
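The idea of snapshots that reference content by checksum can be sketched as follows. The function blob_id reproduces how Git computes the id of file content in a SHA-1 repository (a "blob" header followed by the content); the SnapshotStore class is a deliberately simplified, hypothetical stand-in that ignores Git's tree and commit objects, compression and packing.

```python
import hashlib


def blob_id(content: bytes) -> str:
    """Object id Git assigns to file content in a SHA-1 repository."""
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()


class SnapshotStore:
    """Toy store: each snapshot maps file names to content ids."""

    def __init__(self):
        self.objects = {}    # content id -> bytes, each content stored once
        self.snapshots = []  # list of {file name: content id}

    def commit(self, files):
        """Store a snapshot of all files and return its version number."""
        snapshot = {}
        for name, content in files.items():
            object_id = blob_id(content)
            # Unchanged content hashes to an id that is already stored,
            # so only a reference is added to the new snapshot.
            self.objects.setdefault(object_id, content)
            snapshot[name] = object_id
        self.snapshots.append(snapshot)
        return len(self.snapshots) - 1

    def read(self, version, name):
        """Return the content of a file as it was in the given snapshot."""
        return self.objects[self.snapshots[version][name]]
```

Committing two snapshots in which only one of two files changed stores three objects in total, and the unchanged file is referenced by the same id in both snapshots.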

Git supports automatic merging and diffing of text files and will try to merge binary files if not told to ignore them.

Git has many APIs for many different programming languages through language bindings to libgit2 [11], a pure C implementation of the Git core methods. Languages included are: Ruby, .NET, Python, Lua, Perl, C++, Go, Erlang, Parrot, D, Objective-C, and Node.js. Other options include a Java API named JavaGit [12]. Documentation for libgit2 as well as many examples for using the API are available through the official libgit2 page [11].

2.2.3 Extended Version Control for WebDAV

RFC 3253 [13] introduces additional features for managing Web based Distributed Authoring and Versioning (WebDAV) resources under version control. Some of these features are listed below. Section 2.3.1 explains how WebDAV works for distribution, and also how WebDAV defines the resources and properties that are mentioned here.

• Version controlled resources can be checked in and out, and this can be thought of as locking and unlocking the resource. Each resource has a dead property indicating if it is checked in or out. Checking out a resource may yield an exclusive or shared lock depending on how the WebDAV server is configured. A new version is created once the resource has been checked in.

• It is possible to create branches (parallel versions) by using forks. A resource is divided into two separate copies, each with its own version history. These parallel versions can be merged later on. A new fork is made by setting a property when checking in a resource.

• Resources under version control must contain certain meta data (properties), such as comments, meant for describing the resource, and the display name of the resource creator. Comments can be thought of as commit messages.

• Resources under version control can be updated without creating a new version.

• It is possible to query revision history and meta data about resources, including

HTTP methods

Table 2.2 shows available HTTP methods relevant to the list above.

Method            Description
VERSION-CONTROL   Request to put resource under version control.
REPORT            Request information about a resource, based on the provided parameters.
LOCK              Lock a resource using an exclusive or shared lock.
CHECKOUT          Request to check out a resource so it can be edited.
CHECKIN           Request to check in a resource. This will produce a new version.
UNCHECKOUT        Request to check in a resource and discard the changes.
UPDATE            Update the content of a version controlled resource.
LABEL             Add a string label to a specific version, versions can be queried by labels.

Table 2.2: Descriptions of HTTP methods from RFC 3253.

The only official documentation available for the WebDAV versioning extension is the RFC 3253[13].
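Since the versioning extension is expressed as additional HTTP methods, a client can in principle use any HTTP library that accepts custom method names. The sketch below only builds the request objects for one edit cycle with Python's standard library and does not send anything; the URL is a hypothetical placeholder, and a real server may require additional headers or an XML body that are not shown here.

```python
from urllib.request import Request

# Hypothetical resource URL; no request is sent in this sketch.
DOCUMENT_URL = "http://dav.example.com/docs/report.docx"


def versioning_request(method, url=DOCUMENT_URL, body=None):
    """Build (but do not send) a request that uses the given HTTP method."""
    return Request(url, data=body, method=method)


def edit_cycle(new_content):
    """Requests for one edit: check out, upload the content, check in."""
    return [
        versioning_request("CHECKOUT"),
        versioning_request("PUT", body=new_content),
        versioning_request("CHECKIN"),  # the server creates a new version here
    ]
```

Each request could be passed to urllib.request.urlopen to send it to a server that supports RFC 3253.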

2.2.4 Java Content Repository Version Control

An alternative is to use the built in version control functionality of Java Content Repository, explained in section 2.3.3.

2.3 Distribution platforms and protocols

This section explains distribution platforms and protocols used for communicating the data from the user to the server and vice versa. This includes user interfaces and databases.

2.3.1 Web based Distributed Authoring and Versioning

WebDAV is an extension of HTTP, specified by RFC 4918, that contains features for clients to collaborate when handling files. The following subsections describe how WebDAV specifies resources, access control and versioning [14, 15].

Resource Property Model

WebDAV defines resources by properties, which are made of well formed XML key-value pairs. There are two types of properties: live and dead. Live properties are always checked by the server, meaning that their syntax and consistency will be checked. Dead properties are not checked by the server; that responsibility falls on the client. Resources are placed inside hierarchical HTTP name spaces, an example is given below.

There is a special type of resource called a collection. Collections are resources that refer directly to other resources (grouping them together), making it possible to perform operations on multiple resources.

Locking

It is possible to lock a resource, using either an exclusive lock or a shared lock. An exclusive lock means that the write permission is denied for everyone except the owner of the lock. A shared lock means that a flag will be raised indicating that the owner of the lock is currently writing; others are however still permitted to write.

HTTP methods

Table 2.3 shows available HTTP methods [15] relevant to the previous subsections.

Method      Description
PROPFIND    Retrieve resource properties.
PROPPATCH   Write resource properties.
LOCK        Lock resource using an exclusive or shared lock.
UNLOCK      Unlock a resource.
MKCOL       Create a new collection.
COPY        Copy a resource or collection.
MOVE        Move a resource or collection.
ACL         Write to the access control list for a resource.

Table 2.3: HTTP methods.

Discretionary access control

WebDAV uses ACLs for managing access to resources. Users, referred to as principals, are given permissions defined in Access Control Entries (ACE); the ACL is simply a list of ACEs. Users can be collected into groups, and it is possible to set permissions for groups in the same way as for single users.

2.3.2 File Transfer Protocol

The central task in a distribution system is moving files to and from the system, and a protocol commonly used for this purpose is the File Transfer Protocol (FTP). There are many APIs available for integrating FTP into a system, and there are also plenty of FTP clients available. FTP traffic can be encrypted using Transport Layer Security (TLS) [16] or Secure Sockets Layer (SSL), which is important if the traffic contains sensitive data.

Users who connect via FTP must have an account, comprised of a username and password. Administration consists of controlling which users have access to which directories and whether a user should be permitted to perform write operations (upload files).

FTP is well documented, and all functionality is described in RFC 959 [17]. Most APIs and clients come with their own documentation, some examples are the Apache Commons Net library [18] and the FileZilla Client [19]. There are many other alternatives not mentioned here.

2.3.3 Java Content Repository

Java Content Repository (JCR) is an API specification for data storage, targeting content management systems. The following subsections will discuss how JCR specifies repositories, access control and versioning [20].

Repositories & Workspaces

A JCR repository consists of an arbitrary number of workspaces with unique names. Each workspace has a graph of items, and the graph is hierarchical, starting with a root node.

Workspace items

Items can either be nodes or properties. Nodes can have an arbitrary number of child items. Properties can have an arbitrary number of values but no children. The actual content is stored as property values, such as binary and string values. Each item has a jcr:name property, a string pair consisting of the namespace and the local item name. A namespace is a Uniform Resource Identifier (URI). Each item has a jcr:path, a property specifying where the item is located within the workspace.

Node types

Every node has a primary type and an arbitrary number of mixin types. The primary type typically describes what the node is. The mixin types are used as metadata, and may, unlike the primary type, be added later during the node's life cycle instead of being defined at creation. JCR specifies some default node types, and new types can be created by developers if needed. Node types support inheritance, and can also be abstract (nodes can not be of this type).
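The item model described in the preceding subsections can be pictured with a small tree structure. The sketch below is a toy model written in Python and is not the javax.jcr API: the Node class, its methods and the property name "content" are invented for illustration, while the type names follow the naming style of JCR's built in node types.

```python
class Node:
    """Toy model of a JCR node; this is not the javax.jcr API."""

    def __init__(self, name, primary_type, parent=None):
        self.name = name
        self.primary_type = primary_type  # fixed when the node is created
        self.mixin_types = set()          # may grow during the node's life
        self.properties = {}              # property name -> list of values
        self.children = {}
        self.parent = parent

    def add_node(self, name, primary_type):
        """Create a child node and return it."""
        child = Node(name, primary_type, parent=self)
        self.children[name] = child
        return child

    def add_mixin(self, mixin_type):
        """Attach a mixin type after the node has been created."""
        self.mixin_types.add(mixin_type)

    @property
    def path(self):
        """Location of the node within the workspace, starting at the root."""
        if self.parent is None:
            return "/"
        return self.parent.path.rstrip("/") + "/" + self.name


root = Node("", "nt:unstructured")
docs = root.add_node("docs", "nt:folder")
report = docs.add_node("report.pdf", "nt:unstructured")
report.properties["content"] = [b"binary document data"]
report.add_mixin("mix:versionable")
```

The mixin is added after creation, which mirrors how a node can be placed under version control later in its life cycle.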

Access control

Users are given a session when accessing a workspace. The workspace contains the user privileges. Each node within the workspace has access control policies, and the user privileges are then matched against the policies inside the node. Hence JCR supports access control at node level.

Version control

Nodes under version control are called versionable nodes, and are defined by a mixin type. These nodes have a version history. All versions are accessible via the JCR API and it is possible to revert to previous versions. Versionable nodes can be checked in and out. Checking in an item will result in a new version and checking out will raise a flag indicating that the node is being edited. Conflicts can occur if content is changed by more than one session at the same time, in which case an exception will be thrown. It is up to the implementation to handle the conflict, since JCR is only a specification. It is possible to merge versionable nodes, as with most version control systems.
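The check-out and check-in cycle can be sketched with a small state machine. This is a toy model in Python, not the javax.jcr API, and it makes one simplifying assumption: a conflict is reported as soon as a second session tries to check out the node, which is only one of several ways an implementation could handle concurrent edits.

```python
class ConflictError(Exception):
    """Raised when two sessions try to edit the same node at once."""


class VersionableNode:
    """Toy model of check-out/check-in; this is not the javax.jcr API."""

    def __init__(self, content):
        self.content = content
        self.history = [content]  # version 0 is the initial content
        self.checked_out_by = None

    def checkout(self, session):
        """Flag the node as being edited by the given session."""
        if self.checked_out_by not in (None, session):
            raise ConflictError("node is being edited by another session")
        self.checked_out_by = session

    def checkin(self, session, content):
        """Store the new content as a new version and clear the flag."""
        if self.checked_out_by != session:
            raise ConflictError("node is not checked out by this session")
        self.content = content
        self.history.append(content)
        self.checked_out_by = None
        return len(self.history) - 1

    def restore(self, version):
        """Revert the current content to an earlier version."""
        self.content = self.history[version]
```

Every check-in appends to the history, so any earlier version stays retrievable after a restore.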

Observation

JCR specifies functionality for observing events. Events are triggered when there is a persistent change in the workspaces. Observation can either be asynchronous or journaled. Asynchronous events can be observed by listeners (applications) as they occur. Journaled observation allows applications to connect to the repository and receive information about events that occurred within a specific timespan. Events contain references to affected nodes.
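The two styles of observation can be contrasted in a short sketch. The Workspace class below is a hypothetical toy model, not the javax.jcr.observation API, and the event type strings are illustrative. To keep the example short, listeners are called directly when a change is saved, whereas a real repository would deliver asynchronous events on a separate thread.

```python
import time


class Workspace:
    """Toy event source; this is not the javax.jcr.observation API."""

    def __init__(self):
        self.listeners = []
        self.journal = []  # (timestamp, event type, node path)

    def add_listener(self, callback):
        """Register a callback that is told about each change as it occurs."""
        self.listeners.append(callback)

    def save(self, event_type, path, timestamp=None):
        """Record a persisted change and notify the registered listeners."""
        if timestamp is None:
            timestamp = time.time()
        event = (timestamp, event_type, path)
        self.journal.append(event)
        for callback in self.listeners:
            callback(event)

    def events_between(self, start, end):
        """Journaled observation: events with start <= timestamp <= end."""
        return [event for event in self.journal if start <= event[0] <= end]
```

A listener suits an application that is running continuously, while the journal lets an application that connects later catch up on what it missed.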

2.3.4 Apache Jackrabbit

Jackrabbit is an implementation of JCR [21] (see section 2.3.3), enabling developers to implement systems for storing digital content. It also offers a ready to deploy server, available as a standalone executable jar or as a .war file which developers can deploy on a web server such as Apache Tomcat. Repositories are defined using configuration files, specifying workspaces, how data is stored and versioned persistently, and how access control is handled. Developers can connect remotely to repositories via Remote Method Invocation (RMI) or locally via Java Naming and Directory Interface (JNDI). Jackrabbit comes with a server that enables WebDAV access [22]. There is an API that offers functionality beyond JCR, for example administrative features. The Jackrabbit API is intended for creating more complex systems. One example is the systems made by Jahia, who offer commercial solutions for content management [23].

2.3.5 JBoss ModeShape

ModeShape is a data store compliant with JCR [24]. Developers can use the JCR API or an included REST API for managing content. There is a WebDAV server available for accessing the repositories, which can be deployed on a JBoss EAP server. ModeShape is in many ways similar to Jackrabbit. The main difference is the additional REST API, which is useful when working with a language other than Java. It is however not possible to implement observation, as described in section 2.3.3, via REST. ModeShape is also made for developing more complex systems. One example is Magnolia, who decided to replace Jackrabbit with ModeShape, claiming it to be more flexible for their needs [25]. They offer a commercial content management system.

3 Evaluation of Existing Solutions and Models

This chapter evaluates the systems and models from the prestudy that can be used for building a version control system for sharing files and implementing access control.

The evaluations are split into three groups: Access Control, Version Control, and Distribution (user interface and backend for uploading and downloading the files).

The access control models are compared and evaluated on:

• Amount of administration required for the model to function properly.

• Possibility to support fine grained access control.

The version control systems are compared and evaluated on:

• Amount of Developer APIs available.

• Support for files other than text files.

• Options for conflict handling.

• Documentation availability and quality.

The distribution systems and protocols are compared and evaluated on:

• Platform independence.

• Built in support for access-/version-control.

• Built in support for user and administration interfaces.

• Documentation availability and quality.

3.1 Evaluation of Access control models

This section will evaluate the access control alternatives.

3.1.1 Flat Role Based Access Control

This type of access control is easy to administrate and roles can be modeled after an organization, which makes the relationship between roles and permissions less abstract. One example is giving the role "accountant" permission to access the organization's bank account. However, roles are less practical when it comes to fine grained access control. Let's say John and Jane are two project managers within the same organization, and both of them have the same role. John is responsible for developing documentation for product A, Jane has the same responsibility for product B. John should not be able to modify the documentation for product B, but he can, since permissions are based on the role. One solution to this problem would be to divide John and Jane into different groups, limiting John's permissions within group B and doing the same for Jane in group A. Doing this would break the model: roles would have different meanings in different contexts, making the system difficult to administrate.

3.1.2 Hierarchical Role Based Access Control

Hierarchical RBAC has the same advantages and drawbacks as flat RBAC. One additional problem arises since roles inherit permissions. Looking back at the example used when evaluating flat RBAC, if John and Jane have someone above them in the organization hierarchy, like a product supervisor, that role would automatically grant access to modify the documentation for both their products. The desirable permission would be some sort of read only access.

3.1.3 Discretionary Access Control

It is harder to administrate a DAC system, where permissions vary for each resource. Administration initially falls on the user who creates the resource. This means there are no centralised administrators for the system. Let's take the example of John and Jane from the evaluation of flat RBAC. John creates the documentation for product A and then grants read access to Jane, and Jane does the same for John regarding product B. Both parties now have the correct access. The administration that occurred only involved and affected the relevant parties. The only downside in this scenario appears if a third party needs access, as in the example used when evaluating hierarchical RBAC. If there is a supervisor who needs access to all products, then he or she must be granted permissions for each resource. This problem is illustrated in figure 3.1.

Figure 3.1: Each intersection indicates the need for administrating permissions. This is a trade off for the fine grained access control.
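The John and Jane scenario can be played through in code. This is a minimal sketch, assuming that only the owner of a document may change its ACL; the class and the document names are hypothetical.

```python
class Document:
    """A resource whose creator owns it and controls its ACL."""

    def __init__(self, name, owner):
        self.name = name
        self.owner = owner
        self.acl = {owner: {"read", "write"}}

    def grant(self, granter, user, permissions):
        """Add permissions for a user; only the owner may do this."""
        if granter != self.owner:
            raise PermissionError("only the owner may change the ACL")
        self.acl.setdefault(user, set()).update(permissions)

    def allows(self, user, permission):
        """A user without an ACL entry is denied access."""
        return permission in self.acl.get(user, set())


doc_a = Document("Documentation A", owner="John")
doc_b = Document("Documentation B", owner="Jane")
doc_a.grant("John", "Jane", {"read"})
doc_b.grant("Jane", "John", {"read"})

# A supervisor needs one grant per document, each made by that document's owner.
for doc in (doc_a, doc_b):
    doc.grant(doc.owner, "Supervisor", {"read"})
```

John and Jane can read but not modify each other's documentation, and the loop at the end shows the trade off: the number of grants for the supervisor grows with the number of resources.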

3.2 Evaluation of Version Control Systems

This section will evaluate the VCS alternatives. The version control systems are evaluated and compared to creating a version control system from scratch with the relevant functionality.

3.2.1 Subversion

Subversion has many APIs to choose from when developing towards it, as mentioned in section 2.2.1.

Because Subversion does not look at file extensions when versioning, it has no problem versioning files other than text files. However, if conflicts occur when not working with text files, merging is practically impossible and the conflict needs to be resolved by choosing either your own copy of the file or the other person's copy. Conflicts can be avoided by, for example, renaming the files that are uploaded to the server, which is a logical option since merging is not a possibility anyway.
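Avoiding conflicts by renaming uploads can be sketched as a small helper. The naming scheme below, which appends a counter in parentheses before the file extension, is an assumption made for this example and is not taken from Subversion or from the prototype.

```python
import os


def versioned_name(filename, existing_names):
    """Return a name that does not collide with already stored files."""
    if filename not in existing_names:
        return filename
    stem, extension = os.path.splitext(filename)
    counter = 1
    # Increase the counter until the candidate name is unused.
    while f"{stem} ({counter}){extension}" in existing_names:
        counter += 1
    return f"{stem} ({counter}){extension}"
```

With this approach two users who upload the same binary file never overwrite each other, at the cost of leaving the choice between the copies to the users.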

Because of the large number of APIs available, the documentation for each one varies. The availability of documentation for the official APIs is generally good, while its quality is only basic, with no examples or guides. SVN fits the system’s criterion of versioning files, but setting up a repository and connecting it to the system with the available APIs is not trivial.

3.2.2 Git

Git has a number of APIs available comparable to SVN, as mentioned in section 2.2.2. Like SVN, Git can handle all file types but has a harder time merging anything other than text files. This means the same logic can be used to handle conflicts in Git as in SVN.

The documentation available for the APIs is good. The documentation for libgit2, the pure C implementation, additionally includes some examples, which makes it better than average. While the API availability and documentation quality are good, a lot of research is still needed to fully understand the features and limitations of the APIs and to choose one to integrate into the system. Git shares the same issue as SVN: setting up a repository and connecting it to the system would be a lot of work.

3.2.3 WebDAV Version Control

Where systems such as SVN and Git provide developers with APIs that enable integration, WebDAV exposes HTTP methods, as seen in section 2.2.3. The HTTP methods are limited compared to the APIs for SVN and Git. The strength of the version control in WebDAV is that it is part of a standard and fully compatible with the access control.

WebDAV supports all file types. The conflict handling is similar to other version control systems in that it offers merging of files. Documentation is the weak point of WebDAV: the RFC is the only official documentation, and although it is very detailed, it is very hard to read and summarize.

The WebDAV version control does not outperform or offer additional functionality compared to the others, but it offers all the functionality required for the system.

3.2.4 Java Content Repository Version Control

JCR specifies how content of different kinds (including binary) can be stored in versionable nodes. Because conflicts are handled by throwing exceptions, it is easy to catch the exception, stop the commit, and warn the user. It is also possible to avoid conflicts, just as with the other systems, by changing the metadata of the document.

As mentioned in section 3.3.3 about distribution with JCR, the documentation is very good.


3.2.5 Custom made Version Control System

The problem with using an existing VCS is that it is difficult to customize and offers lots of functionality that is out of scope for the problems at hand. The following solution was designed together with Nordicstation, specifically designed to solve their needs.

Creating a VCS from scratch would mean that the functionality would be very limited to start with. The file storage could be built either by storing files directly in a file system or by choosing a database to hold the data. Such a system would use conflict avoidance: in a file based system, data is stored under a name based on the user that committed the file; in a database based system, tables are connected so that each file refers to the specific user. This means that the data would be versioned by time of upload. Hashing could also be implemented to compare checksums of files, so that the exact same file is not uploaded twice. This system would work like a versioned backup system, where the implementation for access control and distribution would handle the presentation of the files. For more details about a custom solution see section 3.3.6.
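As an illustration of versioning by time of upload combined with checksum comparison, consider the following minimal in-memory sketch in Python. It is not the prototype's code: the class `VersionStore` and its methods are hypothetical, SHA-256 is an assumed choice of hash function, and comparing only against the latest version is an assumption about how duplicate uploads would be detected.

```python
import hashlib
from datetime import datetime, timezone


class VersionStore:
    """Keeps every upload as a new timestamped copy, keyed by owner and file name."""

    def __init__(self):
        # (owner, name) -> list of (timestamp, sha256 digest, content), oldest first
        self._versions = {}

    def commit(self, owner, name, content, now=None):
        """Store content as a new version; return False if it equals the latest one."""
        digest = hashlib.sha256(content).hexdigest()
        history = self._versions.setdefault((owner, name), [])
        if history and history[-1][1] == digest:
            return False  # exact same file as the latest version, skip it
        stamp = now or datetime.now(timezone.utc)
        history.append((stamp, digest, content))
        return True

    def history(self, owner, name):
        """Upload timestamps of all versions, oldest first."""
        return [stamp for stamp, _, _ in self._versions.get((owner, name), [])]

    def get(self, owner, name, index=-1):
        """Content of one version; the latest by default."""
        return self._versions[(owner, name)][index][2]


store = VersionStore()
store.commit("john", "manual.docx", b"first draft")
store.commit("john", "manual.docx", b"first draft")   # rejected as a duplicate
store.commit("john", "manual.docx", b"second draft")
store.commit("jane", "manual.docx", b"first draft")   # other owner, no conflict
```

Because the key includes the committing user, two users uploading a file with the same name never overwrite each other, which is the conflict avoidance described above.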

3.3 Evaluation of distribution platforms and protocols

This subsection will evaluate the distribution alternatives. Distribution platforms and protocols are evaluated and compared to creating a distribution platform from scratch with the relevant functionality.

3.3.1 Web based Distributed Authoring and Versioning

WebDAV is an extension of HTTP and is therefore completely platform independent. Version control with support for version history and merging is available as an additional extension. ACLs are used for the access control, measuring up to the fine grained access defined by the goals. There is no out of the box user interface for managing WebDAV, but creating a custom made interface for managing a repository would be quite straightforward since all functionality is exposed as HTTP methods. The documentation consists mainly of RFC documents, which are very detailed but also hard to read compared to the JSR document for JCR.

3.3.2 File Transfer Protocol

The File Transfer Protocol is cross platform, and there are many well documented APIs available, making it easy to integrate. It offers no version control, and the integrated access control is limited to logging users in and out. Access is given to whole folders and all content inside, so there is no support for access control on file level.

Using FTP for uploading and downloading would have worked fine for the prototype, but WebDAV offers the same functionality and more, although WebDAV may be more complicated to integrate.


3.3.3 Java Content Repository

JCR is a Java specification; two cross platform implementations of the specification are evaluated in the next two subsections. JCR offers very detailed documentation of how version control and access control work. The version control supports version history and functionality to access all versions. ACLs are used for the access control, measuring up to the fine grained access defined by the goals. The specification describes how developers can establish session based connections to workspaces, which can be used for creating a user interface where content and access control can be managed.

3.3.4 Apache Jackrabbit

Jackrabbit is cross platform and can be deployed on a web server, making it accessible from anywhere. There is no user interface for managing the files under version control. Documentation on how to deploy Jackrabbit as a server is somewhat lacking: there is an official wiki [26], but most of the examples refer to older versions of Jackrabbit. There is, however, an active mailing list where users can ask for support, and the list is available as a forum [27]. Jackrabbit supports both version control and access control since it is fully compliant with JCR.

Problems occurred when trying to deploy and experiment with the APIs. Solving them required documentation or guides, and the lack of both made the problems impossible to solve without asking for community support.

3.3.5 JBoss ModeShape

ModeShape was compared with Jackrabbit since they are very similar. ModeShape has much more documentation, although it was lacking for users with no previous experience. ModeShape has a very active forum, which can sometimes replace documentation. It was not possible to tell from the documentation whether ModeShape was customizable enough to suit the system requirements. Experimenting with the API would involve setting up a server and configuring a repository, which would take a lot of time. That mistake had already been made with Jackrabbit.

3.3.6 Custom made distribution system

The problem with using an existing distribution system is that it might be hard to integrate with a VCS and provide the necessary access control.

The positive aspect of implementing a custom made system is that all requirements will be met. The implementation requirements made by Nordicstation, listed in section 1.4, were all considered when designing this custom made system. It is also the easiest system to implement. The complex part is designing a database suitable for implementing access control and version control.

Frontend

The system is managed through a web based interface. Users can download and upload files, manage privileges, browse version history and choose whom to share files with.

Using a web solution would make the system platform independent. The alternative would be to create several applications for different platforms (Windows, iOS, Android...) to communicate with the backend and enable the users to manage and download/upload files.

Backend

The backend handles the file versioning, and consists of two parts. One part is an application that searches for new files posted by users, and these files will be inserted into the data layer (versioned). The data layer is the second part, and an example of how it could be designed is shown in figure 3.2. It is designed to support versioning of files and also access control, see section 2.1.3 about DAC.

Figure 3.2: ER diagram for a possible database solution. Each user can own an arbitrary number of files, the user can then grant permissions to other users by inserting entries in the ACL table.
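A data layer of this kind could look roughly like the following sketch, written in Python with an in-memory SQLite database. Figure 3.2 is not reproduced here, so the table and column names (`users`, `files`, `versions`, `acl` and their fields) are assumptions made for illustration only and not the schema of the prototype.

```python
import sqlite3

SCHEMA = """
CREATE TABLE users (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL UNIQUE
);
CREATE TABLE files (
    id       INTEGER PRIMARY KEY,
    owner_id INTEGER NOT NULL REFERENCES users(id),
    path     TEXT NOT NULL,
    UNIQUE (owner_id, path)
);
CREATE TABLE versions (
    id          INTEGER PRIMARY KEY,
    file_id     INTEGER NOT NULL REFERENCES files(id),
    uploaded_at TEXT NOT NULL,
    content     BLOB NOT NULL
);
CREATE TABLE acl (
    file_id    INTEGER NOT NULL REFERENCES files(id),
    user_id    INTEGER NOT NULL REFERENCES users(id),
    permission TEXT NOT NULL CHECK (permission IN ('read', 'write')),
    PRIMARY KEY (file_id, user_id)
);
"""


def open_db():
    """Create an in-memory database with the assumed schema."""
    db = sqlite3.connect(":memory:")
    db.executescript(SCHEMA)
    return db


def readable_files(db, user_name):
    """Paths the user owns or has been granted access to through the acl table."""
    rows = db.execute(
        """
        SELECT f.path FROM files f
        JOIN users u ON u.name = ?
        LEFT JOIN acl a ON a.file_id = f.id AND a.user_id = u.id
        WHERE f.owner_id = u.id OR a.user_id IS NOT NULL
        ORDER BY f.path
        """,
        (user_name,),
    ).fetchall()
    return [path for (path,) in rows]


db = open_db()
db.execute("INSERT INTO users (id, name) VALUES (1, 'john'), (2, 'jane')")
db.execute("INSERT INTO files (id, owner_id, path) VALUES (1, 1, 'product_a.docx')")
# John grants Jane read access by inserting an entry in the ACL table.
db.execute("INSERT INTO acl (file_id, user_id, permission) VALUES (1, 2, 'read')")
```

Keeping versions in a separate table from files means that a new upload only adds a row, so earlier versions are never overwritten.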


System Architecture

Each user within the system has a personal folder. Users can only upload files to their own folder. To share files with others, the user must create a new group via the frontend. Users belonging to that group can then create a folder within their personal folder that matches the group name (group names are unique). All files posted in this folder will automatically be available to other members of the group. The personal folder is available via WebDAV, which is a convenient alternative to uploading via the web interface. However, the backend will erase files once they have been saved in the database, as described previously, so the folder should be seen as a way to upload files, not a way to access them. Figure 3.3 illustrates the workflow.

Figure 3.3: A user uploads a document via WebDAV, placing it in the HR folder. The backend system will find the file, save it to the database and erase it from the folder. All members of HR will be able to download the file using the web interface, assuming the ACL table allows them.


3.4 Comparisons and Choices for the prototype

The following subsection will present how the evaluated models and solutions match the comparison aspects of the goals presented in section 1.4.

Access Control

Table 3.1 shows compiled comparisons of functionality for the access control models evaluated.

Model               Administration   Fine Grained Access Control
Flat RBAC           Easy             Hard
Hierarchical RBAC   Easy             Hard
DAC                 Hard             Easy

Table 3.1: Access Control Model Comparison Table.

When it comes to the complexity of administrating a system, DAC is only slightly harder to administrate, because every file needs its own administration. This can be worked around by, for example, setting default values. DAC is the only model that gives enough control over user privileges to meet the requirement of access control on document level. Access control based on roles is not suitable for the system since there is no way to restrict access for users with the same role, or for roles with higher privileges.

Version Control

Table 3.2 shows compiled comparisons of functionality for the version control systems evaluated.

VCS          APIs available   Multiple file types   Conflict handling   Documentation
Subversion   Many             Yes                   Yes                 Good
Git          Many             Yes                   Yes                 Good
WebDAV       One              Yes                   Yes                 Ok
JCR          Several          Yes                   Yes                 Excellent
Custom       None             Yes                   Yes                 None

Table 3.2: Version Control System Comparison Table.

When choosing the VCS to use for the prototype, SVN and Git both had many APIs available while WebDAV and JCR had few. This question was not applicable to the custom solution, as its API would be written when implementing the system. One very important aspect of the custom made VCS is that it does not take very long to get up and running, because little functionality is required. The custom solution also allows continued development after the prototype is finished with little effort.

The drawbacks of implementing a solution that integrates either SVN or Git include the fact that setting up the repository and selecting an API that would fit the rest of the system would be both difficult and time consuming. This, and the fact that the additional functionality of those systems is not necessary for evaluating the prototype, means that a custom VCS is a very suitable fit.

Distribution

Table 3.3 shows compiled comparisons of functionality for the distribution systems evaluated.

System              Cross platform   Default AC   Default VCS   Detailed documentation
WebDAV              Yes              Yes          Yes           Ok
FTP                 Yes              No           No            Excellent
JCR                 Yes              Yes          Yes           Excellent
Apache Jackrabbit   Yes              Yes          No            Poor
JBoss ModeShape     Yes              Yes          No            Ok
Custom              Yes              Yes          No            None

Table 3.3: Distribution System Comparison Table.

When choosing a distribution system for the prototype, there were two deployable solutions, Jackrabbit and ModeShape. Both implement the JCR specification and both offer support for WebDAV clients. Jackrabbit has very poor documentation; ModeShape offered better documentation and had a very active forum for support. Neither proved easy to work with, and the problems that arose when experimenting with the servers were enough to discard them for the prototype.

FTP is easy to integrate since there are many APIs and it is a commonly used technique. However, it offers no version control and only limited access control.

Support for WebDAV clients was one of Nordicstation’s criteria for the prototype. Documentation for WebDAV is fairly good. WebDAV was used as an extra access point for uploading content to the prototype.

The custom made solution was chosen for distribution. A web based interface offers the most availability for users, independent of platform or device. Using a database was the simplest and most practical way of keeping the data within a system that Nordicstation controls. The main reason for creating a custom made solution is to ease the integration with the VCS and access control model.


4 Resulting Prototype

This section will discuss how the prototype was designed and implemented, and present an evaluation of its features.

4.1 The Access Control

The prototype uses an ACL for managing access control for versioned files. The ACL gives file owners the power to set privileges for other users on file level. As a consequence, the user interface features a lot of administrative functionality.

4.2 The Version Control

The prototype keeps a version history of every file that is uploaded, and versions are determined by a timestamp. Versioned files are stored in a database, as seen in figure 3.2 in the Evaluation chapter. Conflicts can never occur since each version is a new copy.

4.3 The Distribution

The web application lets users administrate their content. The key functionality implemented is listed below. The software architecture of the web application follows the Model-View-Controller (MVC) design pattern.

• Users have a view of the files at their disposal, divided into two categories: private and shared files. Each file has a version history, and any version can be downloaded.

• Users can control access to files they own. By adding another user to a file, the owner can set the permissions for that user on that file to either Read or Write.

• Users can create groups. A new group must have a unique name, and the user who created the group becomes its owner. The owner can add other users to the group, and an invitation is sent when doing so. Users who accept the invitation will have the group listed under shared files. The owner specifies the default permission for all files in the group when sending the invite. This permission can be changed for specific files once the user has accepted the invitation.

• Users can upload files so that they automatically become shared with a group. This is done by specifying the file path so that it matches the group name, for example /MyGroup/Thefile.txt. If there is a group with that name and the user is a member of that group, the file will become available to all group members.

• Users can upload files using a WebDAV client; these files are then collected by the Watcher application described in section 4.3.1.
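The group behaviour listed above, with a default permission set at invitation time that can later be changed per file, can be illustrated with a small sketch in Python. The class `Group` and its methods are hypothetical and do not reflect the prototype's code; that the group owner always holds Write permission is an assumption made to keep the example short.

```python
class Group:
    """A named group with per-member default permissions and per-file overrides."""

    def __init__(self, name, owner):
        self.name = name
        self.owner = owner
        self.pending = {}    # invited user -> default permission
        self.members = {}    # user who accepted -> default permission
        self.overrides = {}  # (user, file path) -> permission

    def invite(self, user, default_permission):
        # The owner chooses the default permission when sending the invitation.
        self.pending[user] = default_permission

    def accept(self, user):
        self.members[user] = self.pending.pop(user)

    def set_permission(self, user, path, permission):
        # Overrides are only possible once the invitation has been accepted.
        if user not in self.members:
            raise KeyError(f"{user} has not accepted an invitation to {self.name}")
        self.overrides[(user, path)] = permission

    def permission(self, user, path):
        if user == self.owner:
            return "Write"  # assumption: the owner always has full access
        if user not in self.members:
            return None
        return self.overrides.get((user, path), self.members[user])


group = Group("MyGroup", owner="john")
group.invite("jane", "Read")
group.accept("jane")
group.set_permission("jane", "/MyGroup/Thefile.txt", "Write")
```

A lookup first checks for a file specific override and only then falls back to the default, so most files in a group need no administration of their own.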


4.3.1 The Watcher

The watcher application is a small application that browses the repository for new files. Files that are found are saved in the database (versioned) and then removed from the file system. The purpose of this application is best illustrated by a scenario.

The same server that hosts the database and the web application has a WebDAV folder for each user, and the folder name must match the user name. Via a WebDAV client, the user can create new folders and post files within his or her folder. This is equivalent to uploading a file via the web application as described before. If a user posts a file to /UserId/MyGroup and is a member of a group with that name, then the file will be shared with that group.

The watcher application makes sure the file is versioned and updates the ACL. The server should schedule this application to run frequently. This is the only way WebDAV is incorporated into the solution. The Watcher has no knowledge of what type of client delivers the files, meaning it is not exclusive to WebDAV. This type of loose coupling makes the system more flexible and easier to expand upon.
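A single pass of such a polling application could be sketched as follows in Python. This is an illustration under stated assumptions, not the Watcher's actual code: the function `scan_once`, the `memberships` mapping and the `save` callback (standing in for the database insert and ACL update) are all hypothetical names.

```python
import tempfile
from pathlib import Path


def scan_once(root, memberships, save):
    """Collect every file under root/<user>/..., pass it to save(), then delete it.

    memberships maps a user name to the set of group names the user belongs to.
    save(user, group, relative_path, content) stands in for the database insert.
    Returns the number of files collected.
    """
    collected = 0
    for user_dir in sorted(p for p in Path(root).iterdir() if p.is_dir()):
        user = user_dir.name  # the folder name must match the user name
        for path in sorted(p for p in user_dir.rglob("*") if p.is_file()):
            relative = path.relative_to(user_dir)
            # A file is shared only if its top folder names a group the user is in.
            first = relative.parts[0] if len(relative.parts) > 1 else None
            group = first if first in memberships.get(user, set()) else None
            save(user, group, relative.as_posix(), path.read_bytes())
            path.unlink()  # the folder is an upload channel, not storage
            collected += 1
    return collected


with tempfile.TemporaryDirectory() as root:
    (Path(root) / "john" / "MyGroup").mkdir(parents=True)
    (Path(root) / "john" / "MyGroup" / "Thefile.txt").write_bytes(b"shared")
    (Path(root) / "john" / "notes.txt").write_bytes(b"private")
    saved = []
    scan_once(root, {"john": {"MyGroup"}}, lambda *args: saved.append(args))
```

The function only reads the file system, so it behaves the same whether the files arrived through WebDAV or any other client, which is the loose coupling described above. Running it frequently would be left to the server's scheduler.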

4.4 Evaluating the prototype

The following aspects were evaluated for the prototype:

• Time.

• The functionality of the access control.

• The functionality of the version control.

Time

The system described above was implemented in about 100 hours (2 persons at 50 hours each). This includes installing and configuring the development environments and tools, setting up version control for the prototype source code, designing the database, creating and configuring all the projects, actual development time, and testing the system functionality. While the development cycle was fast, this was only a prototype and much of the functionality required for a complete system was missing. In addition, the testing of the system was minimal and only crucial bugs were fixed.

Access Control

The fine grained access control of the system works, and was not difficult to implement with an ACL.

One trade off of choosing a DAC solution was that a lot of logic had to be implemented in the system to incorporate the fine grained access control. In addition, the administrative responsibilities scale with the number of users and files. A system with a large number of users would become very difficult to administrate if files need to be shared with a majority of the users.


Version Control

Using timestamps for determining versions proved to be simple and versatile, and it opened up the possibility of having different permissions based on time, for example to mark files as deprecated.

This kind of system does lack functionality such as merging of documents and differentiating versions, but the potential to further develop it is always there if the need arises.
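One way time based handling of versions could work is sketched below in Python. The prototype did not implement this; the functions `version_at` and `is_deprecated` and the idea of a single cutoff date are assumptions used only to illustrate what timestamps make possible.

```python
from bisect import bisect_right
from datetime import datetime


def version_at(timestamps, when):
    """Index of the newest version uploaded at or before `when`, or None.

    `timestamps` must be sorted oldest first, as in a version history.
    """
    index = bisect_right(timestamps, when) - 1
    return index if index >= 0 else None


def is_deprecated(timestamp, cutoff):
    """Treat every version uploaded before the cutoff as deprecated."""
    return timestamp < cutoff


history = [datetime(2014, 5, 1), datetime(2014, 5, 10), datetime(2014, 6, 1)]
current_in_mid_may = version_at(history, datetime(2014, 5, 15))
old_versions = [t for t in history if is_deprecated(t, datetime(2014, 5, 5))]
```

Since versions are ordered by upload time, both lookups need nothing more than the timestamps already stored with each version.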


5 Analysis and Discussion

The following subsections analyse and discuss the result of this degree project. One aspect of this degree project was how it relates to environmental science and occupational safety. Information security and privacy fall under this topic.

5.1 Analysis

The following subsections will analyse how the results reflect the goals for the theoretical evaluation and for the implementation (prototype).

Evaluation Goals

The primary goals for the project were to find suitable solutions for implementing a system for sharing documents and for that system to have access control on document level. In addition, the system was required to version control the shared documents and provide a distribution channel for these features.

When evaluating the access control models, DAC suited the goal of integrating access control on document level. This was because the other access control models did not in a practical sense allow for access control on document level.

Finding a suitable version control system was harder, as all of the evaluated choices had similar functionality. The chosen custom made system did, like most of the other systems that were evaluated, accomplish the goal of version controlling the documents. The custom made system was however the least complex for us to implement, and it was therefore also very quick to get up and running.

The goal of finding a suitable solution for distributing the documents was accomplished, but choosing a suitable solution from the ones we found was hard. A custom system suited the goals best because it was deemed the easiest to implement in combination with the previously chosen solutions.

Partial goals were to keep a version history for each document and to keep the data private. One of the advantages of having the documents stored in a relational database is that the documents become queryable. This would make implementing functionality to search for documents belonging to a certain user or group trivial. Encrypting the database to further ensure that data is kept private would also be quite simple. Other systems offer methods for differentiating versioned files, often to solve conflicts and merge files. Our solution is to avoid conflicts by not merging files at all. Because of this, our solution supports binary files, which unlike text files can not be merged.

Implementation Goals

The prototype has a web based user interface and a backend that handles the logic and stores the content persistently, two of the main goals. The final goal for the prototype was to enable WebDAV access. We achieved this by adding an extra module, The Watcher.


5.2 Discussion

The following subsections will discuss how the project progressed, problems that occurred and alternatives that could have been pursued under different circumstances.

Theoretical work

The first phase of this project was to find existing models and solutions for access control, version control and distribution.

Finding relevant literature and articles for access control proved to be fairly straightforward; there were plenty of relevant models to evaluate.

Finding version control systems that could be modified to fit the goals proved to be harder. The shortage of suitable alternatives is one reason why we presented a custom solution.

Research for the distribution platform was also fairly straightforward. We researched the systems and methods that seemed practical or were specifically requested by Nordicstation.

The research took longer than expected, taking time from implementing the prototype.

Practical work

We experienced almost no problems when implementing the prototype. The only obstacles we faced were problems with configuring the web server hosting the system, and these could always be resolved using official documentation. We had less time than expected for the prototype, and the user interface design became quite crude as a consequence.

Alternatives

Had time allowed, other choices may have been made to add functionality to the version control system, either by incorporating the alternatives explained in section 3 (Git or SVN) or by continuing to build the custom VCS. The JCR implementations, Jackrabbit and ModeShape, looked very strong on paper. The fact that they both offer version control, DAC and ready to deploy solutions for WebDAV makes them strong candidates for further research and evaluation.

Another part of the prototype could have been implemented differently: as an alternative to The Watcher, a module could react to events, which in this case would be users uploading documents. This would have made the system more resource friendly (using fewer system resources) and would have avoided the inherent delay of a polling module such as The Watcher. However, additional research would be needed to integrate a module with support for events with the rest of our system.

Ethical, Environmental, Social and Economical consequences of the project

Ethical

Integrity in the form of privacy is one important aspect of this degree paper. The proposed solution enables a user to protect his or her private and personally identifiable data from other users within the system. The system also minimizes the risk of leaking information to unauthorized users by only granting document access to specific users or groups of users within the organization.

Another important aspect is where the data is stored. The data can be kept private by deploying the proposed system within a closed network. An alternative would be to use a cloud service, such as Microsoft Azure or Google Cloud SQL. Choosing the latter alternative would result in private data being kept in an environment not exclusively controlled by the organization that wants to store the data, and would give the cloud service provider access to the stored documents, which it could potentially inspect or leak.

Environmental

Using a system that simplifies the usage of document management and distribution via IT reduces the need for printing on paper. This saves money and is beneficial for the environment since most paper is made from harvesting trees.

Social

A key aspect of the proposed system is to let users collaborate and also distribute their work. This project has had an emphasis on protecting the data and limiting access. It could just as easily be used the other way around, as an efficient way to spread content, such as digital art or literature, to a vast number of users.

Economical

As mentioned before, this system will reduce the need for printing, which will also lead to lower costs. Another aspect is efficiency: it is easier to develop through a version control system than, for example, by sending changes via e-mail. Imagine if software developers had to manually copy and paste changes made by other developers into their source code; it would be very time consuming and impossible to maintain. The same is true for any text document.


6 Conclusions

This paper presents suitable solutions, and also some alternatives to investigate in the future, for implementing a version control system with emphasis on access control. This work is beneficial for anyone in need of a version control system that is suitable for collaborating with other parties and that requires variable and restricted access.

The following subsections will discuss the practical considerations of the prototype and present suggestions for future work.

6.1 Practical considerations of the prototype

The developed prototype was designed to offer version control with emphasis on access control. In its current state the application is not suitable for use due to the lack of security. The web interface offers no more security than a simple log-in for the users. The prototype did however handle the versioning and sharing of documents with other users well. Further development to make the prototype more secure and user friendly would result in a system that could be put into production.

6.2 Future work

One aspect that this paper did not investigate was how the system scales for larger organisations. Is it possible to expand the solution to support multiple repositories or workspaces?

Another aspect to expand upon is security. Implementing encryption for the communication and storage of the documents is a major concern for most people and companies when storing confidential data.

6.3 Final Words

This paper answers the question of how to build a document sharing system with access control on document level, and which models and technologies best fit those criteria.

The result should therefore benefit anyone interested in building such a system, giving insight into these aspects so that they can build one without having to resort to proprietary products. This could help companies transition towards a paperless office.


References

[1] Sandhu, Ravi; Ferraiolo, David; Kuhn, Richard Rebensburg, Klaus, "The NIST model for role-based access control: towards a unified standard", Role-based access control: Proceedings of the fifth ACM workshop (RBAC ’00), 2000, pp. 47-63.

[2] Ferraiolo, David F.; Kuhn, D. Richard; Chandramouli, Ramaswamy, "Role-Based Access Control", Chapter 2, Artech House, eISBN: 9781596931145, Published 2007.

[3] Scott Chacon, "Pro Git (Expert’s Voice in Software Development)", Apress, ISBN-10: 1430218339, Published 2009.

[4] Petr Baudiš, "Current Concepts in Version Control Systems", http://arxiv.org/pdf/1405.3496v1.pdf, Link validated August 5, 2014.

[5] Ben Collins-Sussman; Brian W. Fitzpatrick; C. Michael Pilato, "Version Control with Subversion", Chapter 1, O’Reilly Media, ISBN 10:0-596-00448-6, Published 2004.

[6] Ben Collins-Sussman; Brian W. Fitzpatrick; C. Michael Pilato, "Version Control with Subversion", Preface, O’Reilly Media, ISBN 10:0-596-00448-6, Published 2004.

[7] Ben Collins-Sussman; Brian W. Fitzpatrick; C. Michael Pilato, "Version Control with Subversion", Chapter 8, O’Reilly Media, ISBN 10:0-596-00448-6, Published 2004.

[8] TMate Software, "SVNKit", http://svnkit.com/, Link validated May 19, 2014.

[9] CollabNet Inc, "SharpSvn", https://sharpsvn.open.collab.net, Link validated May 19, 2014.

[10] The Apache Software Foundation, "Apache Subversion Documentation", https://subversion.apache.org/docs, Link validated May 26, 2014.

[11] GitHub, "Libgit2", http://libgit2.github.com, Link validated May 19, 2014.

[12] JavaGit Project, "JavaGit", http://javagit.sourceforge.net, Link validated May 19, 2014.

[13] The Internet Engineering Task Force, "Versioning Extensions to WebDAV", http://www.ietf.org/rfc/rfc3253.txt, Published March 2002, Link validated May 19, 2014.

[14] The Internet Engineering Task Force, "HTTP Extensions for Web Distributed Authoring and Versioning", http://www.webdav.org/specs/rfc4918.html, Published June 2007, Link validated May 19, 2014.

[15] IEEE Computer Society, "WebDAV:Versatile Collaboration Multiprotocol", http://users.soe.ucsc.edu/ ejw/papers/dav-ic-2005-final.pdf, Published February 2005, Link validated May 19, 2014.

[16] The Internet Engineering Task Force, "Securing FTP with TLS", https://tools.ietf.org/html/rfc4217, Published October 2005, Link validated May 19, 2014.

[17] The Internet Engineering Task Force, "File Transfer Protocol", http://www.ietf.org/rfc/rfc959.txt, Published October 1985, Link validated May 19, 2014.

[18] The Apache Software Foundation, "Apache Commons Net", http://commons.apache.org/proper/commons-net/, Link validated May 19, 2014.

[19] The FileZilla Project, "FileZilla Documentation", https://wiki.filezilla-project.org/Documentation, Link validated May 19, 2014.

[20] Java Specification Requests, "JCR 2.0 Specification", https://jcp.org/en/jsr/detail?id=28, Published by 10 August 2009, Link validated May 19, 2014.

[21] The Apache Software Foundation, "Apache Jackrabbit", http://jackrabbit.apache.org, Link validated May 19, 2014.

[22] The Apache Software Foundation, "Jackrabbit JCR Server", http://jackrabbit.apache.org/jackrabbit-jcr-server.html, Link validated May 19, 2014.

[23] Jahia Solutions Group SA, "Jahia Embedded Frameworks", http://www.jahia.com/documentation-and-downloads/developers-techwiki/frameworks, Link validated May 19, 2014.

[24] JBoss, "ModeShape", http://modeshape.jboss.org, Link validated May 19, 2014.

[25] Magnolia International Ltd, "Magnolia - Replacing Jackrabbit with ModeShape", http://www.magnolia-cms.com/resources-directory/slideshows/mconf10-modeshape.html, Link validated May 19, 2014.

[26] The Apache Software Foundation, "Jackrabbit Wiki", http://wiki.apache.org/jackrabbit, Link validated May 19, 2014.

[27] The Apache Software Foundation, "Jackrabbit mailing list",
