Categorising posts into sections and using metadata.

Chapter 4 LaBLog, the laboratory blog: a new approach to open notebook science through the

4.2.1 The development of LaBLog Phase 1 – an introduction to blogging using the “beta-glu” blog.

4.2.1.2 Categorising posts into sections and using metadata.

In a paper laboratory notebook, experiments are arranged chronologically, and the pages of the lab book follow on from one another with no gaps allowed. A blog is different: the news feed usually shows the most recent post first, i.e. in reverse chronological order. A blog is also more flexible, as posts can be organised into sections that group them by type rather than by date. At the outset of using the blog, both the directed evolution project on β-glucuronidase (chapter 2) and the inhibitor investigations on β-galactosidase (chapter 3) were in progress concurrently. The posts were therefore initially categorised into two sections corresponding to the two projects, called “beta-glucuronidase” and “beta-galactosidase preparation and assays”. Every post was assigned to one of these sections. This made the work flow easier to follow, because the section identified which experiment a particular procedure belonged to, and selecting the relevant section ordered the news feed chronologically by experiment.
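The two orderings described above can be sketched in a few lines of Python. This is an illustration only, not the blog software's implementation: the `Post` class, its field names, the post titles and the dates are all hypothetical, and the sketch assumes that a section view lists posts oldest first.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Post:
    title: str    # hypothetical post title
    section: str  # the single section the post is filed under
    posted: date  # hypothetical posting date


# Invented example posts, filed under the two original section names.
posts = [
    Post("PCR of template", "beta-glucuronidase", date(2007, 3, 1)),
    Post("Enzyme preparation", "beta-galactosidase preparation and assays", date(2007, 3, 2)),
    Post("Gel of PCR product", "beta-glucuronidase", date(2007, 3, 5)),
]


def news_feed(posts):
    """Default blog view: every post, most recent first."""
    return sorted(posts, key=lambda p: p.posted, reverse=True)


def section_view(posts, section):
    """Posts from one section only, oldest first (an assumption of this sketch)."""
    selected = (p for p in posts if p.section == section)
    return sorted(selected, key=lambda p: p.posted)


feed_titles = [p.title for p in news_feed(posts)]
glu_titles = [p.title for p in section_view(posts, "beta-glucuronidase")]
print(feed_titles)
print(glu_titles)
```

Selecting a section removes the interleaved posts from the other project, so the remaining posts read in sequence much as consecutive lab book pages would.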

Shortly after the blog was initiated, the beta-galactosidase project was stopped. Newer posts on the beta-glucuronidase project were then categorised into new, more specific sections, which were added to the beta-glu blog as the research expanded, rather than being added to the “beta-glucuronidase” section. The number of sections increased to accommodate these posts, so the blog became difficult to navigate and posts became more difficult to assign (figure 4.2). This is a natural process as the user's mental model of the blog data evolves, and it is one that makes later reorganisation difficult. For an ordinary blog covering a wide variety of subjects this many sections might not have been a problem, but for a primary research notebook it was.

Figure 4.2: Close-up of a screenshot of the beta-glu blog showing the number of sections on the blog. These are listed under the heading “sections” and amount to 13 categories in total (beta-galactosidase..., Beta-glucuronidase, starting materials..., templates, Thermocycler programs, buffers, cell strain, Data, enzymes, materials, primers, product, and software discussions).

The main method of organising and categorising blog posts is through user-defined categories and tags composed of key-value pairs. Because users are free to define categories, there is a tendency to overload specific categories. While a post can have only one category, it can have multiple key-value pairs associated with it.
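A minimal sketch of this structure, assuming a post holds one section name and a dictionary of key-value pairs, is given below. The class, the `find` helper, the metadata keys (`type`, `technique`) and the example posts are hypothetical and are not taken from the blog software.

```python
from dataclasses import dataclass, field


@dataclass
class Post:
    title: str
    section: str                                  # exactly one category per post
    metadata: dict = field(default_factory=dict)  # any number of key-value pairs


def find(posts, section=None, **pairs):
    """Return posts in `section` (if given) whose metadata holds every given pair."""
    return [
        p for p in posts
        if (section is None or p.section == section)
        and all(p.metadata.get(key) == value for key, value in pairs.items())
    ]


# Invented example posts; keys and values are illustrative only.
posts = [
    Post("Taq polymerase", "enzymes", {"type": "material"}),
    Post("Error-prone PCR", "beta-glucuronidase", {"type": "procedure", "technique": "PCR"}),
    Post("Colony PCR", "beta-glucuronidase", {"type": "procedure", "technique": "PCR"}),
    Post("Ligation", "beta-glucuronidase", {"type": "procedure", "technique": "ligation"}),
]

pcr_posts = find(posts, section="beta-glucuronidase", technique="PCR")
procedures = find(posts, type="procedure")
print([p.title for p in pcr_posts])
print(len(procedures))
```

In this arrangement the section stays coarse, and the finer distinctions that would otherwise demand new sections are carried by the key-value pairs.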

Early use of the blog tended to overload the categories (in this case the category defined as “sections”) and did not use the tags effectively. Because the categorising and tagging were ineffective, it was difficult to link sequences of work together in a meaningful manner. For instance, at this stage experiments were still being written up in the paper lab book as well as on the blog. Because of this, the duplicated paper laboratory book page was selected as the key-value pair for tagging experiments. This was not a sensible tag, as laboratory book references do not help to categorise data effectively. A laboratory book reference may be useful if you have the book, but it tells an outside user nothing about the experiment, such as what type of experiment it is or what happened during it. Additionally, the set of book references is unbounded: the number of references increases as the number of experiments increases. The list of key-value pairs therefore grew without limit.
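The growth problem can be illustrated with a small sketch. The 100 posts, the format of the lab book references and the `experiment type` key are invented for the illustration. Only the idea is taken from the text: a key whose value is unique to each post cannot group posts, whereas a descriptive key can.

```python
def distinct_values(posts, key):
    """Collect the set of values that `key` takes across all posts."""
    return {post[key] for post in posts if key in post}


# Hypothetical metadata for 100 posts: each has its own lab book reference
# and one of three experiment types.
experiment_types = ["PCR", "digestion", "ligation"]
posts = [
    {"lab book ref": f"book 1, page {n}", "experiment type": experiment_types[n % 3]}
    for n in range(1, 101)
]

refs = distinct_values(posts, "lab book ref")
types = distinct_values(posts, "experiment type")
print(len(refs), len(types))
```

The number of distinct lab book references equals the number of posts, so filtering on any one of them returns a single post, while the descriptive key keeps a short list of values however many experiments are added.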

Within a complex set of experiments, such as those of the directed evolution, it was clear that a goal for the blog was to enable a researcher to follow one research “thread” among the many “threads” in progress. On the beta-glu blog, for example, this would mean following the experiments on the directed evolution separately from those on the inhibition assays. Early attempts to implement this tracking led to tagging the data as “sample parent” and “product”. These tags were an attempt to link experiments together and show how they were related through the procedure by which a sample is changed from a reagent (“sample parent”) to a product. In a series of experiments, the product of one procedure may become the reagent for another. Tagging in this way should therefore have made it easier to get an overall picture of how the samples fitted together and were used in each procedure and throughout the cycle. In practice this did not work, because the wrong value was assigned to the tag key. The key was “sample parent” or “product”, but the value assigned to each key was its associated paper lab book reference.

This meant that, as with using the lab book reference as a key-value pair, the list of values was unbounded and meaningless. Although a sample was now marked as a sample parent, a product or both, the tag still told the reader or user nothing useful about the data, such as exactly what it was, only that it linked to the paper reference. This did not help the research in general; it helped only with the transition towards recording experiments on the blog alone (though this was not happening at that stage).

As the new tagging system was used, it also became clear that it was causing a lot of confusion in the linking of sample parents and products. This confusion highlighted a fundamental flaw in the organisation. One example is reactions such as ligations, which combine two sample parents (in this example the ‘insert’ and the ‘backbone’) to produce one product. It was impossible to tag both the insert and the backbone as the sample parent, so further tags of sample parent 2 and, in some cases, sample parent 3 had to be added to work around the problem. Figure 4.3 shows a digestion reaction in which the tags sample parent, sample parent 2 and sample parent 3 appear above the method with three lab book references. This illustrates the confusion involved in tagging the data and putting it in the context of the overall experiment.

Figure 4.3: Screenshot of a post on the beta-glu blog in which the metadata tags and descriptors are useless for categorising the data because they are assigned what are, in essence, arbitrary numbers. These are lab book ref, sample parent, sample parent 2 and sample parent 3. A number of stray symbols are visible within the table. These are due to an upgrade of the HTML, which has led to the HTML rendering differently from when it was first entered.

The solution to this organisational problem came from the realisation that the different parts of an experiment could be connected using links between posts, while the post categories and key-value pairs should describe what the post represents. The connection between posts was therefore shown by links, and the data in a post could finally be categorised by what it represented.
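One way to picture this separation is sketched below, assuming each post records what it represents in a `kind` field and keeps a list of links to the posts it was derived from. The class, the field names, the `thread` function and the example posts are hypothetical. The sketch shows only that a ligation can link to both its insert and its backbone without needing numbered tags.

```python
from dataclasses import dataclass, field


@dataclass
class Post:
    title: str
    kind: str                                  # what the post represents
    links: list = field(default_factory=list)  # titles of the posts it was derived from


def thread(posts, title):
    """Follow links backwards from `title`, returning every post in its history."""
    by_title = {p.title: p for p in posts}
    seen, stack = [], [title]
    while stack:
        current = stack.pop()
        if current in seen:
            continue  # already visited via another link
        seen.append(current)
        stack.extend(by_title[current].links)
    return seen


# Invented example posts; the "kind" values are illustrative only.
posts = [
    Post("Insert DNA", "material"),
    Post("Backbone DNA", "material"),
    Post("Ligation", "procedure", links=["Insert DNA", "Backbone DNA"]),
    Post("Ligated plasmid", "product", links=["Ligation"]),
    Post("Inhibition assay", "procedure"),  # belongs to a different thread
]

history = thread(posts, "Ligated plasmid")
print(history)
```

Because any post may hold any number of links, a reaction with two or three starting samples is handled in the same way as one with a single parent, and following the links recovers one research thread while leaving unrelated posts out.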

In document Investigations into Open Notebook Science: directed evolution as a model for exploring online laboratory notebooks. (Page 172-175)