Semantic annotation of best-practices
Angelo Di Iorio, Davide Rossi, Fabio Vitali
Department of Computer Science and Engineering,
University of Bologna, ItalyAbstract
WikiRecPlay is a tool to record/replay/share/recommend sequences of interaction steps between users and web applications. It has been designed to support the sharing and automating of (part of) organizational best practices, that are unstructured, emergent pro-cesses that characterize knowledge-intensive tasks within organizations. WikiRecPlay is implemented as an extension to the Firefox web browser and uses a wiki as a back-end to store/retrieve sequences of interaction. The social dimension of the wiki promotes the sharing of knowledge between members of the organization but most of the know-how sharing with WikiRecPlay takes place via the integrated recommender that is used to suggest the available sequences stored in the wiki that have as starting point the web page that users are visiting. They can then browse the wiki page for a suggested sequence in order to obtain knowledge from other users or can enact one of the sequences that is automatically executed on their behalf. However, sequences stored in the wiki, even if they start from the same web page, can refer to different practice, can be sub-optimal or can be outdated. In this paper we study how to leverage on semantic wiki extensions in order to improve the quality of WikiRecPlay’s recommender trying to overcome the aforementioned issues.
Keywords Semantic Wikis, Enterprise Know-how, Web Replay, Knowledge sharing, WikiRecPlay
Processes, as usually intended by the Business Process Management (BPM) discipline, are structured entities for which a model exist (Aldin and de Cesare, 2001). Most of the BPM activities revolve around process models: process design, simulation, monitoring, enactment, etc. We call these processes
struc-tured processes, referring to the fact that they adhere to a well defined structure
represented in a process behaviour language or notation. In (Rossi and Vitali, 2009) we discussed how social software, by letting users freely interact,
sup-ports emergent coordination scenarios, where process are dynamically refined by the actors without being enforced to obey to a predefined process model.
In this context the business processes become more dynamic, flexible (and less defined) entities that we call organizational best practices, highlighting the fact that they are the result of refinements embodying the organization’s
This concept has been reiterated and refined, over the last few years, by
other researchers as well, and has many contact points with the concepts be-hind adaptive case management. In (Bohringer, 2011), for example, the con-cept of emergent case management is introduced along with a solution based on social software tools like microblogs. The case management paradigm is also the framework for handling dynamic processes with social software tools presented in (Bider et al., 2011) and for works such as (Rychkova and Nurcan, 2011) in which is presented a BPMN extension to cope with this less
struc-tured framework, an effort to find a common ground between strucstruc-tured and
WikiRecPlay is a tool (implemented as an extension to the Firefox web browser) that has been designed to support recording/replaying/sharing/rec-ommending sequences of interaction steps representing parts of best practices in a context in which individuals from an organization adopt social software tools deployed as software-as-a-service (SaaS) platforms1.
By publishing and processing information in blogs, microblogs, wiki, fo-rums; by using tagging services; by collaboratively editing documents the us-ers reach organizational goals following sequences of activities that have been
refined in previous interactions.
This how-to knowledge in WikiRecPlay is captured in the form of sequences of interactions that can be used as a documentation or can be en-acted on the behalf of the user. To make this sharing as effective as possible WikiRecPlay integrates a recommender system that suggests to the users the interaction sequences (stored in the wiki) to which they can be interested in. A recommender, however, is just as good as it is the ability of the user to take advantage of it. The original recommender integrated byWikiRecPlay is a very naïve one: the sequences suggested to the users are all and only those whose start page corresponds to the page they are visiting; moreover only the name and a brief author-edited description are available to character-ize them. This behaviour is sub-optimal for various reasons: (i) for the very same page it is possible that multiple sequences, dealing with very differ-ent practices, are available; (ii) a sequence can be outdated (e.g. because the 1 A demo version is available at http://www.cs.unibo.it/projects/wikirecplay/
Semantic annotation of best-practices 13 web application it refers to has been changed); (iii) no information is avail-able on how to choose among different sequences that have similar names/ descriptions (suggestion from other users and a reputation metric associated to the author could be useful).
The aim of the work presented in this paper is trying to overcome the aforementioned limits of the current proposal by adding a social dimension to the sequences stored in the wiki and let this kind of information emerge from the recommender. The structure of this presentation is as follows. Sec-tion 2 presents WikiRecPlay and its design goals, and provides a little details on its implementation. Section 3 introduces the semantic wiki extension that we propose to improve the social dimension of WikiRecPlay and its recom-mender. In section 4 we conclude and discuss our approach, in comparison to similar works.
2 WikiRecPlay: design goals, usage and implementation
WikiRecPlay is a Firefox extension that allows users to record and re-play sequences of web activities (interactions with web sites using a browser). The way users perform such activities has been subject to changes in the recent past: web applications are getting more interactive, ubiquitous and easy to use; the social dimension has become crucial: different users – with different skills and tools – share content easily and complete tasks together, in a new and spontaneous way. From a technical perspective, monolithic server-side applications are being replaced by Ajax-based ones that load and manipulate (pieces of) content client-side. WikiRecPlay has been designed to support us-ers in automating web interactions within this context.
In order to define what we wanted from WikiRecPlay we selected a number
of test cases build around known web applications2: GoogleDocs for its very
sophisticated interface and Ajax-based machinery; MediaWiki and WordPress for their relevance as social software tools; PizzaBo and JqueryUI for the large amount of highly dynamic client-side code.
WikiRecPlay has been built on an event-based model, in order to work on highly dynamic web pages: the application is able to record and re-play the
events occurring in the browser (mouse click, form filling, selections, etc.).
An alternative approach would have been to capture, store and replay HTTP 2 http://docs.google.com http://www.mediawiki.org http://wordpress.com
transactions but such an approach cannot cope with (client-side) dynamic pages, leaving all Ajax-based applications unsupported.
Fig. 2.1: The sidebar Fig. 2.2: Configuring sequence steps
Figure 2.1 shows the main interface of the WikiRecPlay client. The sidebar lists all loaded sequences and allows users to edit or replay each of them. A new sequence can be recorded and stored through the same interface.
Figure 2.2 shows the interface for editing a sequence. Once it has been recorded, in fact, its details appear in the ‘Step list’ panel. Each step can be
configured, moved or deleted separately.
Each step is associated to an event occurring on a page element. The in-terface shows a screenshot of the page highlighting that element with a red bordered rectangle, and allows users to decide:
• which event needs to be captured
• which information users are expected to provide
• how event-related data (like the content of a text field) will be set when
re-playing the same sequence. Three options are available: (i) default value, to use the values originally recorded, (ii) ask at start time, to make users pro-vide that information before playing the whole sequence, or (iii) ask during execution to let the application stop the sequence re-play and ask the user to provide the required data right before they are used.
All these data are automatically collected by WikiRecPlay when recording a sequence; users can update and customize them at any time.
Semantic annotation of best-practices 15
Fig. 2.3: Setting parameters to replay a sequence
Figure 2.3 shows a sample interface for inserting data when re-playing a sequence. Such a dialog is dynamically built by WikiRecPlay from the de-scription of [each step of] a registration.
WikiRecPlay also supports synchronization steps, these are steps that can be suspended until a given event occurs. The event does not have to be nec-essarily triggered on the same page and can be associated to different web applications, like the publication of some content on Twitter, the tagging of a photo on Facebook and so on. This mechanism makes it possible to replay sequences that need to synchronize with activities carried out by different users. In order to support this feature WikiRecPlay can halt a sequence until a
specified value appears in a RSS stream. By using tools like Dapper3 (a web
content extractor) and Pipes4 (an interactive feed aggregator and
manipula-tor), both from Yahoo, it is possible to capture and process complex events,
like a reply to a post in a forum, a tweet with a specific hashtag being pub -lished by a given user, and many others, and reduce them to simple entries in a RSS feed (the one that is monitored by WikiRecPlay to resume the replay of a sequence).
Last but not least, WikiRecPlay allows users to share sequences. Users can
store data in two places: in a local XML file or on a wiki (which the sidebar is configured to communicate to). Wikis make it possible not only to easily share
sequences but also to edit and improve them collaboratively.
WikiRecPlay is built on the standard Firefox extension mechanisms and, in particular, the XPCOM framework. The overall application follows the MVC (Model-View-Controller) design pattern. The sidebar (View) is in charge of capturing user‘s actions and sending them to the Controller, that actually stores and manipulate procedures by interacting with other components. An
internal XML format – whose details are not relevant here – has been defined
to describe sequences and steps and is used throughout the application. The main modules of WikiRecPlay are listed below:
Recorder: captures all the activities (DOM events) performed by the user.
Player: reads a sequence descriptor and replays it, in case asking input data to the user; the player exploits browser facilities to send HTTP requests and to parse responses. LocalStorageManager: saves sequence descriptors in a lo-cal repository, by epxloiting the browser storage space. WikiStorageManager: saves sequence descriptors on a wiki. This module is in charge of login to the wiki, posting data, retrieving sequences or updating them. It uses the
WikiGate-way API (Shanks, 2008) – that defines a set of operations exported by multiple wiki clones – so that WikiRecPlay is not bounded to a specific server-side plat -form. Validator: validates sequence descriptors, before saving and exporting them. This module actually communicates with a web-service exporting valida-tion features.
While a detailed description to the inner workings of WikiRecPlay is out the scope of this paper we want to highlight that one to the main problems we had to face is related to dynamic web pages: since most elements can be cre-ated/moved/deleted at any time it can be tricky (if at all possible) to associate events and current page elements; in several occasions we had to rely on smart heuristics to overcome these kind of problems.
Semantic annotation of best-practices 17
3 From WikiRecPlay to SemanticWikiRecPlay
One of the most important features of WikiRecPlay is the possibility of
shar-ing sequences. Users can store data in two places: in a local XML file or on a wiki, which the extension is configured to communicate to. Wikis make it
possible not only to easily share sequences but also to edit and improve them collaboratively.
The wiki-side part of WikiRecPlay is very basic. Each sequence is stored in a page that includes both a human-readable description of the sequence and the XML snippet processed by the extension. Whenever a user records and saves a new sequence, data are saved onto the corresponding wiki page through an Ajax request. A module called WikiStorageManager is in charge of login to the wiki, posting data, retrieving sequences or updating them. From then on, sequences can also be edited directly by editing the corresponding wiki pages. For instance, users can customize parameters, add descriptions of a sequence, provide documentation and so on.
The wiki also acts as back-end for the WikiRecPlay recommender module. Whenever a user visits a page, in fact, WikiRecPlay suggests her/him all se-quences whose start page corresponds to the page. The information provided about each sequence is raw (just the name and a brief textual description)
and retrieved on-the-fly from the wiki. That information might be partial and not sufficiently useful for the users. Just think about multiple sequences hav -ing similar names and descriptions: how to choose among them? How to be
sure that one fits our needs? How to recognize a faster and/or more precise
sequence for a given task? How to measure the reliability (and security) of a sequence?
Our idea is to improve this recommender module by exploiting semantic wikis. A semantic wiki is a wiki system enabling users to write semantic an-notations about wiki pages (more precisely, they are meant to write seman-tic data about a given domain, represented by wiki pages). Semanseman-tic wikis join together the advantages of the wiki open editing model, and those of semantically-enriched knowledge-bases. In our case, a semantic wiki can be exploited to let users annotate sequences with information that is available to
the recommender and shown to the final users when suggesting sequences.
Furthermore semantic descriptions can be exploited to search sequences re-lated to the current one, to search sequences of a given author, to mine data on sequences and so on.
In fact, we have developed a proof-of-concept implementation of the WikiRecPlay environment that uses a semantic wiki instead of a plain wiki,
and includes a recommender that surfs and summarizes the semantic informa-tion stored in the wiki back-end. We call it Semantic WikiRecPlay due to the newly added semantic layer. It is based on SemanticMediaWiki5, an extension
of MediaWiki with a new syntax to let users annotate content fragments, and an enhanced interface to manage semantic data. The technical robustness and the support of the community make Semantic MediaWiki particularly suitable for our needs.
There is also another good reason for choosing Semantic MediaWiki (SMW). Being an extension of MediaWiki, it can be integrated with other extensions in order to create a richer environment where to combine the WikiRecPlay semantic knowledge-base with other data. In particular, our architecture also includes the TagAsCategory extension6, that allows users to categorize content
without using explicitly the syntax of SMW. This extension, in fact, shows users a very simple interface to add new tags to a page. Behind the scenes, these tags are converted into page categories that can be surfed and searched as usual. The interesting point is that such data are directly integrated with semantic assertions and can generate further feedback and information about sequences, available to the recommender.
A very simple example helps us discussing the potentialities of this ap-proach. The following code snippet shows the source code of a page storing a WikiRecPlay sequence. Note that we omitted the operational description of the sequence, that is not relevant for the purposes of this discussion.
<sequenceDescriptor> […] </sequenceDescriptor> This sequence was created by [[author:: Angelo Di Iorio]].
It allows you to buy a bus pass on the [[company:: ATC]] website, valid for [[city:: Bologna]].
[[Category: Bus]][[Category: Ticket]]
The sequence allows users to buy a bus pass on the ATC website7, valid for
the city of Bologna. Such information is expressed as human-readable text but, above all, as semantic statements that can be extracted and processed separately. In fact, the previous page is rendered as follows:
6 http://www.mediawiki.org/wiki/Extension:TagAsCategory 7 http://www.atc.bo.it/
Semantic annotation of best-practices 19
Fig. 3.1: The wiki page storing a sequence to buy a bus pass, enriched with tags and semantic assertions
The bottom part of the page contains a factbox that summarizes statements on the sequence. These statements indicate the author, the name of the com-pany and the city a pass can be used for in a machine-redable format (extracted from the in-line syntax of SMW). The TagAsCategory extension adds a widget to easily add tags (categories) to the page: the current list includes ‘Tickets’ and ‘Bus’ but new ones can be added with just one click. Note also that other similar extensions, such as SelectCategoryTagCloud8 or CategorySuggest9,
also provide functionalities to show the cloud of tags, and suggest similar tags through auto-completion features. We plan to experiment these extensions too, but TagAsCategory is sufficiently power for the current proof-of-concept im -plementation.
The knowledge-base created by categories and semantic assertions is now available for advanced searching, retrieving and recommendation. In particu-lar, it is possible to exploit the SMW query language to look for and suggest sequences to the WikiRecPlay users. A very basic query is shown below: 8 http://www.mediawiki.org/wiki/Extension:SelectCategoryTagCloud
[[Category: Bus]] | ?city
| ?company | ?author
This query lists all pages (sequences) whose category is ‘Bus’ and, for each of them shows its author, and the city and the company the sequence is valid for. Figure 3.2 shows the screenshot of a page containing a table generated automatically by the query.
Fig. 3.2: The list of sequences that allows users to buy bus tickets, along with semanticproperties associated to the sequence
The integration of the results within a wiki page is only one possible outcome of the query execution. Similarly, queries can be launched by the WikiRecPlay recommender in order to suggest sequences to the users.
The list of semantic properties that can be set on a page, and that can be read by the recommender, is boundless: users might decide to tag sequences by their scope, by their application domain, by their limitations, and the list could go on and on.
These user-defined properties can also be sided by other pre-defined data,
that the same community agree on for characterizing sequences. For instance, sequences can be described in terms of reliability, stability and frequence of
Semantic annotation of best-practices 21
available for surfing and changes, some of these properties can also be gener -ated automatically and integr-ated in the overall knowledge-base.
A further class of properties that the WikiRecPlay recommender exploits are the ‘typed links’ . Semantic MediaWiki, in fact, allows users to character-ize links between pages. These links can express semantic relations between sequences. For instance, users can state that a sequence ‘specializes’ another one, or that ‘is faster than’, or that ‘replaces’ an older version, or that is simpy ‘related’ to other sequences. The network of relations created by such links can be again surfed by the recommender for suggesting new sequences.
This is only a preliminary and incomplete set of semantic information that can be added on sequences. We expect the WikiRecPlay community of users to come into play at this stage, supported by the open editing model of wikis, and the semantic and folksonomies facilities.
Notice also that MediaWiki includes some extensions to simplify the au-thoring process of such semantic information. The dilemma between com-pletely free editing environments and more structured ones for semantic data
is still unsolved (Di Iorio et al, 2008). For the purposes of this paper, it suffices to say that different tools successfully mitigate such difficulties. In particular,
we plan to integrate Semantic_Forms10 in our architecture. It is a MediaWiki
extention that generates forms for easily editing structured data within wiki
pages. It exploits templates, i.e. pre-defined structures which are dynamically filled by content and rendered as articles: forms are generated from templates,
whose data unit have been previously typed. Creating templates for
describ-ing sequences, it is then possible to create simplified interfaces for the final
Rating and social facilities complete the picture. One of the parameter that users can read to choose sequences (among those suggested by the recom-mender on the basis of their semantic characterization) is about the reputa-tion of the author and of the sequence itself. Even this informareputa-tion can be added to wiki pages through some MediaWiki extensions. Plenty of these extensions exist: W4G_Rating_Bar11 is a commercial solution powerful and
flexible; CommunityVoice12 and AjaxRatingScripts13 are freely available and
stable. These extensions allow users to add comments and to assign a vote to 10 http://www.mediawiki.org/wiki/Extension:Semantic_Forms
11 http://www.mediawiki.org/wiki/Extension:W4G_Rating_Bar 12 http://www.mediawiki.org/wiki/Extension:CommunityVoice 13 http://www.mediawiki.org/wiki/Extension:AjaxRatingScript
the page (i.e. to the sequence) that the WikiRecPlay recommender could
ag-gregate and show to the final users. The set of parameters to be measured, as well as ranges and possible values, is highly configurable. Ratings can then
provide users feedback on the popularity, liveliness, quality of the sequences and so on. From an implementation point of view, the integration of ratings facilities is not straightforward. These extensions, in fact, are still not natively
integrated with Semantic MediaWiki but it should not be difficult to translate
data at runtime and to populate semantic data.
Social capabilities could also be integrated in the whole framework. Social-Profile14, for instance, is another extension that allows wiki users to create their
own (social) profiles and to share content with other users. Statistics about the
reputation of the users, about the sequences they created, about their previous and current productivity are all useful data for the recommender and, eventu-ally, for the users that will choose the sequences to replay. This last extension is not yet integrated in our proof-of-concept implementation but represents a promising development direction we will investigate in the near future.
Finally, note that exportation capabilities of Semantic MediaWiki into RDF and OWL make the whole WikiRecPlay knowledge-base on sequences avail-able for advanced retrieving, searching, and reasoning. Such data can also be integrated with other sources of information, for instance to Linked Data silos or other Semantic Web resources.
The use of client-based tools to record and replay interactions sequences with web applications has been mainly developed in the context of web application testing - with tools such as Selenium15, RoboFox (Koesnandar et al., 2008),
Mugshot (Mickens et al., 2010) and others. A tool that uses these techniques to enable how-to knowledge sharing within organizations is CoScripter (Leshed et al., 2008) and its evolution ActionShot (Li et al., 2010). Like WikiRecPlay, CoScripter can use a wiki to promote the sharing of sequences of interactions (that are represented as sequences of item described using a natural language-like syntax). The CoScripter wiki allows users to freely tag pages correspond-ing to sequences and gives them a ratcorrespond-ing. Sequences can be sorted by criteria
such as “recent use”, “favorite”, “modification” or “creation date”. 14 http://www.mediawiki.org/wiki/Extension:SocialProfile
Semantic annotation of best-practices 23
In order to find a suitable sequence users have to perform a search from the
integrated sidebar or have to browse the wiki. No explicit recommender is cur-rently integrated in the system.
It is our opinion that, although the direction taken by CoScripter and simi-lar tools was very promising, the part of sequences recommendation, and the overall mechanisms to characterize sequences, is still under-designed and under-exploited. The overall social dimension of such tools could be pushed forward. WikiRecPlay had similar limitations.
That is why we explored a social semantic layer on top of the original platform: social web tools can leverage on WikiRecPlay to better support organizational best practices and WikiRecPlay can leverage on social
se-mantic web to increase its usefulness to the users. This mutual benefits loop
improves the usefulness of WikiRecPlay by adding it a (semantic-aware) social dimension.
In this paper we have shown a proof-of-concept semantic wiki (built on top of Semantic Media Wiki with selected extensions) that can be used as a back-end to WikiRecPlay in order to achieve our goals. Still, there is a lot of work to be done in this direction. We believe that preliminary experiments are promis-ing. We are now setting up a freely accessible installation of theWikiRecPlay back-end, around which building a lively community of users. The semantic dimension of WikiRecPlay can help us in such a challenge.
Aldin, L. and de Cesare, S. 2001. A literature review on business process modelling: new frontiers of reusability. Enterp. Inf. Syst., 5(3):359–383.
Bider, I., Johannesson, P. and Perjons, E. 2011. A strategy for merging social software with business process support. In Business Process Management Workshops, volume 66 of Lecture Notes in Business Information Processing, pages 372–383. Springer Berlin Heidelberg.
Bohringer, M. 2011. Emergent case management for ad-hoc processes: A solution based on microblogging and activity streams. In Business Process Management Workshops, vol-ume 66 of Lecture Notes in Business Information Processing, pages 384–395. Springer Berlin Heidelberg.
Di Iorio, A., Vitali, F. and Zacchiroli, S. 2008. Wiki content templating. In Proceedings of the 17th international conference on World Wide Web, WWW ’08, pages 615–624, New York, NY, USA. ACM.
Koesnandar, A., Elbaum, S., Rothermel, G., Hochstein, L., Scaffidi, C. and Stolee, K. T. 2008. Using assertions to help end-user programmers create dependable web macros. In Proceedings of the 16th ACM SIGSOFT International Symposium on Foundations