Practical Guidelines for Building Semantic eRecruitment
Malgorzata Mochol, Elena Paslaru Bontas Simperl Free University of Berlin
Takustr. 9, 14195 Berlin, Germany mochol, email@example.com
Abstract: This paper describes practical lessons learned in the project Knowledge Nets, which examines the technological feasibility of the upcoming Semantic Web and the business implications of using these technologies in specific market sectors. In developing a Human Resource ontology to inject semantics-awareness into current job portals, we investigated the potential of reusing the large amount of domain knowledge already available in ontology-like form in the eRecruitment domain as an input for the domain conceptualization. Our experiences confirm previous findings in the knowledge acquisition literature and in recent surveys of the state of the art in ontology engineering: 1) building ontology-based applications is still a tedious process, mainly due to the lack of mature tools and methods which can handle the requirements of real-world applications; and 2) using existing ontologies in new application contexts currently requires considerable effort, which should not be neglected by the engineering team. We use the insights gained during this project to derive a set of guidelines for developing Semantic Web applications in similar domains.
Key Words: Semantic Web, eRecruitment, Ontology Building, Reuse
Category: D.2.10, D.2.m, I.2.4.
Ontology engineering is currently evolving from a pure research topic to real-world applications. This trend is emphasized by the wide range of international projects with major industry involvement and by the increasing interest of small and medium-sized enterprises requesting consultancy in this domain. However, due to the well-known difficulties associated with ontology engineering activities, building and deploying ontologies at industrial scale has to be supported not only by elaborate methodologies (e.g. [2, 7]), technologies and tools, but also by extended case studies and comprehensive guidelines, which aid the engineering team in practical situations in their attempt to develop ontology-based applications. In this paper we aim to make a modest contribution to these efforts by reporting our practical experiences in building an ontology-based eRecruitment application. The project we present uses domain ontologies to improve the quality of conventional job search engines beyond what the multitude of keyword- and statistics-based algorithms can achieve. In developing the target application ontology we investigate the possibility of reusing the impressive body of domain knowledge available in the form of classifications, taxonomies and ontologies on the Web. Our experiences confirm previous findings in the ontology engineering literature: 1) building ontology-based applications is still a tedious process due to the lack of proven and tested support tools and methods; and 2) reusing existing ontologies within new application contexts currently requires effort that is potentially comparable to the cost of a new implementation.
The setting of our project is a typical use scenario for the deployment of Semantic Web technologies, considered w.r.t. the application type (i.e. semantic search) and the application domain (i.e. eRecruitment) [3, 6]. Given the complexity of the application domain, correlated with the lack of operational experience regarding ontologies at corporate level, practice-oriented case studies and guidelines for building Semantic Web applications for Human Resources are a core requirement for a serious impact of semantic technologies in this field. Therefore, a second goal of this paper is to use the experiences gained during the project to further elaborate existing best practices towards a list of recommendations for the eRecruitment domain, which, far from claiming completeness, might be useful for developing similar real-world applications.
The rest of the paper is organized as follows: Section 2 gives a brief overview of the application scenario. Section 3 describes the generation of the HR-ontology, while Section 4 introduces a preliminary set of guidelines for building semantic vertical retrieval applications according to the empirical findings of the project. We summarize the lessons learned with a brief conclusion in Section 5.
2 The Project Setting
The project Knowledge Nets approaches the impact of semantic technologies from the business and technical viewpoints in order to make predictions about the influence of these new technologies on markets, enterprises and individuals. For this purpose the project takes a closer look at particular market sectors and application scenarios. Every scenario includes a technological component, which makes use of the prospected availability of semantic technologies in a perspective of several years, and a deployment component, which assumes the availability of the required information in machine-readable form. The first scenario in this context was situated in the Human Resource (HR) domain, with the aim of analyzing the online job seeking and procurement processes with and without the usage of semantic technologies [1, 5, 9]. In a Semantic Web-based recruitment application the data exchange between employers, job applicants and job portals is based on a set of shared vocabularies describing domain-relevant terms: occupations, industrial sectors and skills. These commonly used vocabularies can be formally defined by means of a so-called Human Resource ontology (HR-ontology) (cf. Sec. 3). Given a rich and machine-processable representation of the domain of interest, the ontology forms the basis for the implementation of semantic matching techniques which compute semantic similarities between information resources, i.e. job postings and applicant profiles. In doing so, ontologies contribute to the realization of more powerful and flexible eRecruitment solutions which include advanced search and presentation facilities based on knowledge about the application domain. Furthermore, the analysis of the potential economic impacts showed that the main actors in the employment market (employers and job seekers) would both benefit from the realization of the ontology-based scenario.
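To make the idea of semantic matching more tangible, the following minimal sketch computes a similarity between the skills required by a job posting and those offered in an applicant profile. It is an illustration only: the toy taxonomy, the Wu-Palmer-style similarity measure and the function names are assumptions introduced here, not the matching technique used in the project.

```python
from typing import Dict, List, Optional

# Hypothetical toy skill taxonomy: child -> parent (None marks the root).
PARENT: Dict[str, Optional[str]] = {
    "Skill": None,
    "Programming": "Skill",
    "Java": "Programming",
    "Python": "Programming",
    "Languages": "Skill",
    "German": "Languages",
}


def path_to_root(concept: str) -> List[str]:
    """Return the chain concept -> ... -> root."""
    path = [concept]
    while PARENT[path[-1]] is not None:
        path.append(PARENT[path[-1]])
    return path


def depth(concept: str) -> int:
    """Depth of a concept in the taxonomy; the root has depth 1."""
    return len(path_to_root(concept))


def similarity(a: str, b: str) -> float:
    """Wu-Palmer-style similarity: 2 * depth(lcs) / (depth(a) + depth(b)),
    where lcs is the deepest concept that subsumes both a and b."""
    ancestors_b = set(path_to_root(b))
    lcs = next(c for c in path_to_root(a) if c in ancestors_b)
    return 2 * depth(lcs) / (depth(a) + depth(b))


def match_score(required: List[str], offered: List[str]) -> float:
    """Average, over the required skills, of the best similarity
    to any skill offered by the applicant."""
    if not required or not offered:
        return 0.0
    best = [max(similarity(r, o) for o in offered) for r in required]
    return sum(best) / len(best)


if __name__ == "__main__":
    posting = ["Java", "German"]     # skills required by a job posting
    profile = ["Python", "German"]   # skills offered by an applicant
    print(round(match_score(posting, profile), 3))
```

Unlike a keyword match, such a measure gives partial credit to an applicant who offers Python when Java is required, because both concepts share the parent Programming in the taxonomy.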
3 The HR Ontology
The need for comprehensive classification systems in the HR field was recognized at an early stage of the eRecruitment era by many interested parties. In particular, major governmental and international organizations strove for the emergence of standard classifications comprising unambiguous and well-documented descriptions of occupational titles, associated skills and qualifications. The result is an impressive inventory of classification systems, mostly with national impact, ready to be deployed in job portals to simplify the management of electronically available job postings and job seeker profiles and to encourage application interoperability. Standards such as O*NET (Occupational Net), ISIC (International Standard Industrial Classification of All Economic Activities), SOC (Standard Occupational Classification) or NAICS (North American Industry Classification System) provide a feasible basis for the development of eRecruitment information systems. At the same time they are valuable knowledge resources for the development of application-specific ontologies, which can inject domain semantics-awareness into classical solutions in this field.
Due to the high interoperability requirements in our application scenario, correlated with the complexity of the domain of interest, we decided to apply a reuse-oriented engineering strategy performed according to the following phases:
• Discovery of the reuse candidates: conducting a survey of potentially useful ontological sources.
• Evaluation of the ontological sources: analysis of the results from the previous step w.r.t. their domain, application relevance, quality and availability.
• Customization of the sources: extraction and integration of the relevant fragments of the (partially comprehensive) sources into the final HR ontology.
3.1 Discovery of the Reuse Candidates
In order to compile a list of existing ontologies or ontology-like structures which are potentially relevant for the HR domain, we carried out a comprehensive search with the help of currently available technologies for locating ontologies:
• Conventional search engines: pre-defined queries combining format and content descriptors, such as “filetype:xsd human resources” or “filetype:html occupation classification”.
• Ontology search engines & repositories: resorting to existing search engines and ontology repositories pointed out the immaturity of these technologies for the Semantic Web. However, we obtained better results by consulting collections of ontology-related resources such as controlled vocabularies (see footnote 2).
• Domain-related sites & organizations: this search strategy focused on international and national governmental institutions which might be involved in standardization efforts in the HR area. Discussions with domain experts, complemented by Internet research, led to the identification of several major players in this field: at national level the Federal Agency of Employment (Bundesagentur für Arbeit), at foreign level the American, Canadian, Australian and Swedish agencies, and at international level institutions like the United Nations (UN), the United Nations Educational, Scientific and Cultural Organization (UNESCO) or the HR-Consortium. These organizations are involved in standardization efforts whose results are publicly available in the form of lightweight HR-relevant ontologies.
The result of the discovery procedure was a list of approximately 24 resources covering both descriptions of the recruitment process and classifications of occupations, apprenticeships, skills or industrial sectors in English and German.
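The pre-defined queries mentioned above combine a format descriptor with a content descriptor. As a rough sketch of how such a query list could be generated systematically, the snippet below forms all combinations of a few formats and topics. The first two formats and topics echo the example queries quoted above; the remaining entries and the function name are illustrative assumptions.

```python
from itertools import product
from typing import List

# Format descriptors: "xsd" and "html" come from the example queries above,
# "owl" is an illustrative assumption.
FORMATS = ["xsd", "html", "owl"]

# Content descriptors: the first two come from the example queries above,
# the third is an illustrative assumption.
TOPICS = ["human resources", "occupation classification", "skills taxonomy"]


def build_queries(formats: List[str], topics: List[str]) -> List[str]:
    """Combine every format descriptor with every content descriptor."""
    return [f"filetype:{fmt} {topic}" for fmt, topic in product(formats, topics)]


if __name__ == "__main__":
    for query in build_queries(FORMATS, TOPICS):
        print(query)
```

Generating the combinations up front makes the search repeatable, although the resulting hits still have to be inspected manually.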
3.2 Evaluation of the Reuse Candidates
The engineering team decided to reuse the following resources (see footnote 3):
• HR-BA-XML: official German extension of Human Resource XML (HR-XML), the most widely used standard for process documents. HR-XML is a library of more than 75 interdependent XML schemas defining particular process transactions, as well as options and constraints regulating the correct usage of the XML elements.
• SOC (Standard Occupational Classification): classifies employees into occupational categories (23 major groups, 96 minor groups, and 449 occupations).
• BKZ: German version of the SOC, classifying employees into 5597 occupational categories according to occupational definitions.
• WZ2003: German standard classification of economic activities.
• NAICS (North American Industry Classification System): provides industry sector definitions for Canada, Mexico, and the United States to facilitate uniform economic studies across the borders of these countries.
• KOWIEN: a skill ontology which defines concepts representing competencies required to describe job position requirements and job applicant skills.
The selection of the 6 sources was performed manually, without the usage of a pre-defined methodology or evaluation framework. The documentation of the 24 potential reuse candidates was consulted in order to assess the relevance of the modelled domain to the application setting. The decision for or against a particular resource could be made very efficiently due to the small number of reuse candidates covering the same or similar domains and the simplicity of the evaluation, which focused on provenance and natural language aspects. For the German version of the ontology the BKZ and the WZ2003 were the natural choice for representing occupational categories and industrial sectors, respectively. The same applies to the English version, which reused the SOC and NAICS classifications. As for occupational classifications in the English language, the SOC system was preferred over alternatives like NOC or O*NET due to the availability of an official German translation (see footnote 4). The same applies to the choice between industry sector classifications: in contrast to ISIC, the NAICS system is provided with a German version and is used in various applications in the HR area.
Footnote 2: http://www.taxonomywarehouse.com
Footnote 3: http://www.hr-xml.org, www.bls.gov/soc, www.arbeitsamt.de/hst/markt/news/BKZ_alpha.txt, www.destatis.de/allg/d/klassif/wz2003.htm, www.census.gov/epcd/www/naics.html, www.kowien.uni-essen.de/publikationen/konstruktion.pdf
3.3 Customization and Integration of the Relevant Sources
The main challenge of the eRecruitment scenario was the adaptation of the 6 reusable ontologies to the technical requirements of the job portal. From a content-oriented perspective, 4 of the sources were included in full in the final setting, due to the generality of the application domain. A focus on a particular industrial sector or occupation category would require a customization of the source ontologies in the form of an extraction of the relevant fragments. To accomplish this task for the KOWIEN ontology we compiled a small conceptual vocabulary (of approx. 15 concepts) from various job portals and job procurement Web sites and matched these core concepts manually to the source ontology. To reuse the HR-BA-XML we manually examined the 75 interdependent XML schemas, which define various HR transactions, in order to identify those XML segments describing job position seekers and postings as relevant for our scenario.
The candidate sources varied w.r.t. the represented domain, the degree of formality and the granularity of the conceptualization. They were labeled using different natural languages and implemented in a broad range of formats: text files (BKZ, WZ2003), XML schemas (HR-XML, HR-BA-XML) and DAML+OIL (KOWIEN). While dealing with different natural languages complicated the process, human-readable concept names in various languages were required in order to make the ontology usable in different job portals and to avoid language-specific problems. Another important characteristic of the candidate ontologies was the absence (except for the KOWIEN ontology) of semantic relationships among concepts. Consequently, we had to focus on how vocabularies (concepts and relations) can be extracted and integrated into the target ontology. The usage of the target HR ontology in semantic matching tasks requires that it is represented in a highly formal representation language (the implementation was realized in OWL). We implemented dedicated translation tools to convert the semi-structured input formalisms to OWL. Furthermore, the text-based classification standards were inserted manually into the ontology with the help of a conventional ontology editor. The instance data, i.e. descriptions of job postings and applicants’ profiles, is stored in RDF using the vocabulary defined by the HR-ontology.
Footnote 4: http://www23.hrdc-drhc.gc.ca/2001/e/generic/matrix.pdf, www.onetcenter.org
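The translation of a text-based classification into OWL can be sketched roughly as follows. This is not the translation tool built in the project. The input layout (one "code, tab, label" pair per line), the rule that a class hierarchy can be inferred from code prefixes, the namespace http://example.org/hr# and all function names are assumptions made for this illustration. The output is OWL serialized in Turtle syntax and built with the standard library only.

```python
from typing import Dict, Optional

# Hypothetical namespace for the generated classes (an assumption).
PREFIXES = (
    "@prefix owl: <http://www.w3.org/2002/07/owl#> .\n"
    "@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .\n"
    "@prefix hr: <http://example.org/hr#> .\n"
)


def parse_classification(text: str) -> Dict[str, str]:
    """Parse 'code<TAB>label' lines into a code -> label mapping.
    Blank lines and lines without a tab are skipped."""
    entries: Dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        if "\t" not in line:
            continue
        code, label = line.split("\t", 1)
        entries[code.strip()] = label.strip()
    return entries


def parent_code(code: str, entries: Dict[str, str]) -> Optional[str]:
    """Return the longest proper prefix of `code` that is itself a known code
    (assumes the classification encodes its hierarchy in code prefixes)."""
    for end in range(len(code) - 1, 0, -1):
        if code[:end] in entries:
            return code[:end]
    return None


def to_turtle(entries: Dict[str, str], lang: str) -> str:
    """Emit one owl:Class per entry, with rdfs:subClassOf and a language-tagged label.
    Codes are assumed to be alphanumeric so that they form valid local names."""
    lines = [PREFIXES]
    for code in sorted(entries):
        label = entries[code].replace("\\", "\\\\").replace('"', '\\"')
        lines.append(f"hr:C{code} a owl:Class ;")
        parent = parent_code(code, entries)
        if parent is not None:
            lines.append(f"    rdfs:subClassOf hr:C{parent} ;")
        lines.append(f'    rdfs:label "{label}"@{lang} .')
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Made-up sample input, not taken from any real classification.
    sample = "1\tSample group\n11\tSample subgroup\n111\tSample occupation\n"
    print(to_turtle(parse_classification(sample), "en"))
```

Attaching a language tag to every label is one way to keep concept names in several natural languages side by side, which matches the multilingual requirement described above.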
As mentioned in the previous sections, major national and international institutions actively contributed to the development of various forms of classifications and taxonomies describing particular aspects of the eRecruitment domain. Reusing the most popular ones is likely to improve the interoperability between eRecruitment applications. Furthermore, reusing classification schemes like BKZ or WZ2003, which did not require any customization effort, resulted in significant cost savings while guaranteeing a comprehensive conceptualization of occupations and industrial sectors, respectively. Nevertheless, the reuse process was still associated with some challenges. The search for adequate reuse candidates could not be performed systematically in the absence of appropriate methodological and technological support. The evaluation of the candidate knowledge sources, though straightforward, proved to be a time-consuming task because of the wide range of “standards” available in the HR domain so far and the absence of tools simplifying the interaction of the domain experts with their content. Finally, ontology customization and integration tasks still require a considerable amount of manual work, even when using common representation languages like XML Schema or OWL. Though the manual selection of the relevant parts from the KOWIEN ontology and the HR-BA-XML standard was possible in our case, tools which assist the ontology engineer during this kind of task on real-world, large-scale ontologies are definitely required. This also applies to tools creating Semantic Web content from existing ontology-like structures: tools for translating between formal representation languages (e.g. from XML or DAML+OIL to OWL) and for extracting OWL conceptual structures from semi-structured documents (such as those textually describing taxonomies and glossaries).
According to our findings in the mentioned setting, we conclude this section with an (incomplete) list of guidelines for building Semantic Web retrieval applications in the Human Resources domain. As a starting point we used a set of domain-independent guidelines that emerged from the European project OntoWeb, which focus less on technical aspects and more on “issues that relate to the business environment that affects the deployment, integration and acceptance of the ontology-based application”. The initial checklist contains 13 items, which cover both organizational and ontology-specific issues. Since we did not encounter any problems related to the organizational setting (such as insufficient user involvement or licence problems), we elaborated the topics which relate directly to the ontology engineering process and adapted them to the HR domain (cf. Figure 1). The resulting list could be complemented with modelling guidelines, which are equally important in a complex domain such as HR (see footnote 6).
#  Item  Description

1  Ontology discovery
Finding an appropriate ontology is currently associated with considerable effort and depends on the level of expertise and intuition of the engineering team. In the absence of fully-fledged ontology repositories and mature ontology search engines the following strategies could be helpful:
(i) use conventional search engines with queries containing core concepts of the domain of interest and terms like ontology, classification, taxonomy, controlled vocabulary, glossary; e.g. “classification AND skills”;
(ii) identify institutions which might be interested in developing standards in the domain of interest and visit their Web sites in order to check whether they have published relevant resources;
(iii) large amounts of domain knowledge are currently available in terms of lightweight models, whose meaning is solely human-understandable and whose representation is in proprietary, sometimes unstructured formats. These conceptual structures can be translated to more formal ontologies if appropriate parsing tools are implemented, and are therefore a useful resource for building a new ontology;
(iv) dedicated libraries, repositories and search engines are still in their infancy. The majority of the ontologies stored in this form are currently not appropriate for the Human Resources domain. This also applies to other institutional areas such as eGovernment or eHealth.

2  Ontology evaluation
Due to the high number of classifications proposed for standardization in the HR domain, the evaluation methodology should take into consideration the high degree of content overlap between the reuse candidates and the impact of the originating organization in the field. Furthermore, the evaluation methodology should be aware of the following facts:
(i) a complete evaluation of the usability of the reuse candidates is extremely tedious, if not impossible. The same domain is covered to a similar extent by several ontologies, while there are no fundamental differences among them w.r.t. their suitability in a semantic job portal. Eliminating candidate ontologies which are definitely not relevant is sometimes more feasible than attempting a complete evaluation;
(ii) an important decision criterion is the provenance of the ontology, since this area is dominated by several emerging standards. Many standards maintained by international institutions such as the EU or the UN are likely to be available in various natural languages;
(iii) many high-quality standards are freely available;
(iv) as the majority of HR ontologies are hierarchical classifications, the evaluation process requires tools supporting various views upon the vocabulary of the evaluated sources;
(v) these considerations also apply to further application scenarios such as eGovernment and eHealth.

3  Ontology integration and merging
Existing HR ontologies have a considerable size, but a relatively simple structure. Adapt your integration methodology to their particularities:
(i) matching and merging ontologies with overlapping domains poses serious scalability and performance problems for available tools in this area. Nevertheless, using simple algorithms (e.g. linguistic and taxonomic matchers) considerably increases the efficiency of this activity;
(ii) the merging results are to be evaluated by human experts. Due to the size of the ontologies, the merging methodology should foresee a flexible and transparent involvement of the users during the process in order to reduce the complexity of the subsequent evaluation;
(iii) there is a need for dedicated tools extracting lightweight ontological structures from textual documents or Web sites;
(iv) the integration step requires means to translate between heterogeneous formats (XML to OWL and RDFS, database schemas to OWL and RDFS etc.);
(v) the customization of these structures w.r.t. particular domains of interest (e.g. an HR ontology for the chemical domain) causes additional effort, as all HR standards are domain-independent.

Figure 1: Preliminary Guidelines
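Guideline 3 (i) recommends simple linguistic matchers. A minimal sketch of such a matcher, assuming nothing more than string similarity between normalized concept labels, is given below. The use of Python's difflib, the threshold value and the function names are choices made for this illustration and are not taken from the project.

```python
from difflib import SequenceMatcher
from typing import Dict, List, Tuple


def normalize(label: str) -> str:
    """Lower-case a label, treat hyphens as spaces and collapse whitespace."""
    return " ".join(label.lower().replace("-", " ").split())


def label_similarity(a: str, b: str) -> float:
    """String similarity in [0, 1] between two normalized labels."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def match_concepts(
    core: List[str], source: List[str], threshold: float = 0.8
) -> Dict[str, List[Tuple[str, float]]]:
    """For each core concept, list the source labels whose similarity
    reaches the threshold, best candidate first."""
    result: Dict[str, List[Tuple[str, float]]] = {}
    for concept in core:
        scored = [(label, label_similarity(concept, label)) for label in source]
        hits = [(label, round(score, 2)) for label, score in scored if score >= threshold]
        result[concept] = sorted(hits, key=lambda hit: hit[1], reverse=True)
    return result


if __name__ == "__main__":
    # Made-up labels for illustration only.
    core_vocabulary = ["programming skills", "team work"]
    source_labels = ["Programming Skill", "Team-Work", "Language Skills"]
    for concept, candidates in match_concepts(core_vocabulary, source_labels).items():
        print(concept, "->", candidates)
```

A matcher of this kind only proposes candidates. In line with guideline 3 (ii), the proposed pairs would still have to be confirmed by a human expert.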
In this paper we gave a brief overview of a Semantic Web-based application scenario in the HR sector by describing the process of ontology development (based on the reuse paradigm) and by summarizing our experiences in a set of preliminary guidelines, whose focus is on the reuse of existing sources, from ontology discovery through evaluation to ontology integration and merging. Since Semantic Web technologies are maturing and moving beyond academic applications into broader industrial settings, practice-oriented case studies and guidelines for building applications based on these new Internet technologies are a fundamental requirement for the realization of a fully developed Semantic Web. A significant factor for the success of Semantic Web and ontology technologies in the industrial sectors is cost (time and money) reduction. With this in mind, we used the empirical findings gained in the Knowledge Nets project to give an overview of an exemplary approach, which has proved to be quite useful and cost-effective in eRecruitment applications, and to derive a set of guidelines for developing Semantic Web applications in similar domains. We picked eRecruitment as a potential early adopter scenario as a result of the recommendations of Knowledge Web, which define Human Resources as one of the key fields for the early uptake of Semantic Web technologies within the European industry. Our experiences show that the reuse of available ontologies cannot yet be performed optimally, as it still requires considerable manual work. This points to the need for tools that assist the ontology engineer, for instance during the selection of relevant parts from existing real-world, large-scale ontologies.
Footnote 6: Such guidelines are currently emerging as a result of the initiatives of the W3C Semantic Web Best Practices and Deployment Working Group.
Acknowledgements: This work is a result of the cooperation within the Semantic Web PhD-Network Berlin-Brandenburg and has been partially supported by the “Knowledge Nets” project, which is part of the InterVal - Berlin Research Centre for the Internet Economy, funded by the German Ministry of Research BMBF, and by the KnowledgeWeb - FP6 Network of Excellence.
1. C. Bizer, R. Heese, M. Mochol, R. Oldakowski, R. Tolksdorf, and R. Eckstein. The Impact of Semantic Web Technologies on Job Recruitment Processes. In International Conference Wirtschaftsinformatik (WI’05), 2005.
2. M. Fernandez, A. Gomez-Perez, and N. Juristo. Methontology: From ontological art towards ontological engineering. In Proc. of the AAAI’97 Spring Symposium on Ontological Engineering, 1997.
3. KnowledgeWeb European Project. Prototypical Business Use Cases (Deliverable D1.1.2 KnowledgeWeb FP6-507482), 2004.
4. T. Lau and Y. Sure. Introducing Ontology-based Skills Management at a large Insurance Company. In Modellierung 2002, Modellierung in der Praxis - Modellierung für die Praxis, pages 123–134, 2002.
5. M. Mochol, R. Oldakowski, and R. Heese. Ontology-based Recruitment Process. http://page.mi.fu-berlin.de/ mochol/papers/SemTech.pdf, 2004.
6. OntoWeb European Project. Successful scenarios for ontology-based applications (Deliverable D2.2 OntoWeb IST-2001-29243), 2002.
7. H. S. Pinto and J. P. Martins. Ontologies: How can They be Built? Knowledge and Information Systems, 6(4):441–464, 2002.
8. Y. Sure, A. Maedche, and S. Staab. Leveraging Corporate Skill Knowledge - From ProPer to OntoProper. In D. Mahling and U. Reimer, editors, Proc. of the 3rd International Conference on Practical Aspects of Knowledge Management, 2000.
9. R. Tolksdorf, M. Mochol, R. Heese, R. Eckstein, R. Oldakowski, and C. Bizer. Semantic-Web-Technologien im Arbeitsvermittlungsprozess. Wirtschaftsinformatik: Internetökonomie, 48(1):17–26, 2006.