Research area and mission: The general research area of the ILPS group is intelligent access to Internet information. The focus is on textual information, and the group’s mission is to develop intelligent access methods that exploit both content and structure features of web content, improving search, discovery, and retrieval tools for both objective and subjective content. To this end the group pursues three strategic research themes: information retrieval, language technology, and semi-structured data.
NABS code: 12.1: R&D related to Natural Sciences - financed from General University Funds
Program leader: Prof.dr. Maarten de Rijke
Starting date of the program: April 1, 2004; since 2004, ILPS has formed ISLA together with IAS and ISIS.
Affiliations outside the institute:
Within the university: Faculteit Maatschappij- en Gedragswetenschappen (ASCOR), Faculteit der Geesteswetenschappen (New Media), Institute for Logic, Language and Computation (ILLC), Center for Creation, Content and Technology (CCCT).
Research school: Dutch Research School for Information and Knowledge Systems (SIKS).
National: CWI, Twente University, Delft University, Eindhoven University, University of Groningen (Humanities), Radboud Universiteit Nijmegen, Tilburg University (TiCC), Free University Amsterdam (Computer Science, Linguistics), Wageningen.
International: University of Glasgow, Hogeschool Gent, Dublin City University, University of Sheffield, UNED Madrid, CNR-ISTI Pisa, University of Lisbon, University of Udine, University of Lugano, TU Dortmund, University of Edinburgh, Hasselt University, INRIA Paris, Oxford University, Warsaw University, University of Bolzano, Fraunhofer Institute, FBK Trento, EMLR Heidelberg, Johns Hopkins University, University of Maryland, Columbia University.
Industry/SMEs/government: Dienst Informatievoorziening Tweede Kamer, Elsevier, Yahoo! Research Barcelona, Google Mountain View, IBM Almaden, Microsoft Research Cambridge, Korps Landelijke Politiediensten, Nederlands Forensisch Instituut, Instituut voor Beeld en Geluid, Gemeentemuseum Den Haag, Favela Fabric, TrendLight, Talking Trends, Ministry of the Interior and Kingdom Relations, Ministry of Foreign Affairs, Koninklijke Bibliotheek Den Haag, Teezir, Textkernel, Gridline, Aggregate, Gemeente Amsterdam, Deutsche Welle, Algemeen Nederlands Persbureau, Dutch chapter of the Wikimedia Foundation.
Supported by: NWO, EU FP6, EU FP7, Elsevier, Korps Landelijke Politiediensten, SenterNovem, Philips, Favela Fabric, Gridline, TrendLight, Talking Trends, Platform Béta Techniek, Algemeen Nederlands Persbureau, Kieskompas BV, Hoppinger BV, Koninklijke Bibliotheek, Kennisland, Ministry of the Interior and Kingdom Relations, Ministry of Foreign Affairs.
B.1 Leadership
The management style is aimed at establishing a world-leading research group in the area of intelligent information access. Communication and control are aimed at transparency, collaboration, and a high degree of independence in all members of the program, senior or junior. Program members, and especially PhD students, are encouraged to broaden and deepen their expertise by following courses, engaging in joint research activities, conference visits, and internships.
All staff members maintain an “open door” policy, complemented by the use of collaborative software (“wikis” and collaborative note taking). In addition, there is a weekly group lunch and a bi-weekly seminar, all aimed at enhancing communication and internal quality control. All members of the group, including faculty and PhD students, regularly give internal research presentations to inform others about their latest research as well as to obtain feedback from group members who are not directly involved with their research.
Internal research collaborations are strongly encouraged, both by setting up task forces and through a financial reward scheme (“Most Valuable Player of the Year Award”); innovative initiatives that give rise to (externally) funded research are rewarded with a financial bonus. Participation in international evaluation exercises and online demonstrations provide additional instruments for motivation and innovation.
B.2 Strategy and policy
The main objective of this program is to develop methods for intelligent access to textual information, with a special focus on Internet information. A number of web-related developments guide our research. One is that the World Wide Web is changing from a publishing forum into a participation forum, where search is no longer restricted to documents, but may also target more specific “things” (people, locations, services, answers, sentiments, …) or larger aggregates (groups, communities, …). Another is the increasing amount of subjective information available on the Web, such as discussion forums and web logs (“blogs”). A third concerns the increasingly multilingual nature of the Web: a growing fraction of Internet content is authored in languages other than English, thus increasing the demand for automated ways of accessing foreign-language information. These developments raise a number of challenging scientific questions: What are effective methods to support search? How can information that is distributed across several pages be linked?
How can we search across different languages and automatically translate foreign content? What tools are most effective in this new setting? Within the ILPS group we believe that a more semantically informed approach is required than mere full text indexing. Our strategy is to exploit both content (of documents) and structure (document structure as well as linguistic, link, and site structure). Often, structure is not explicitly available and has to be recovered, through language technology, document analysis, collection analysis and link analysis.
Thus, the group’s mission is to develop intelligent access methods that exploit both content and structural features of web content, improving search, discovery, and retrieval tools for both objective and subjective content. To this end, the group pursues strategic research themes along three dimensions: information retrieval, language technology, and semi-structured data. Our research strategy is to cover theoretical, experimental, and applied aspects of these research themes. As much as possible, we take a task-based point of departure for our research, where tasks are either defined at worldwide community-based evaluation forums (such as TREC, CLEF, TAC and INEX) or through our increasing interactions with external (industrial, governmental, societal, academic) partners. Bearing a clear task in mind ensures that our research remains both focused and relevant to the widest possible audience, crossing the traditional boundaries of academic research.
The group’s short-term goals are organized around a number of themes, which, in turn, are covered by a number of ongoing demos and participations in evaluation forums. The themes include multidimensional markup, data exchange, foundations and benchmarking in entity retrieval, focused information access, social media search, and concurrent mining of text and time series. Many of our research outcomes are accompanied by Internet demos that illustrate our research to peer researchers and end users in an interactive fashion, in some cases attracting many tens of thousands of users. Our long-term aim is to build up and maintain a sustainable infrastructure in terms of algorithms, software infrastructure, and applied and theoretical insights concerning text and structure in the context of information access. To realize these plans and support these activities, the group attracts a mixture of second and third stream funding.
While the ILPS group’s research activities focus mainly on textual information, there are ongoing collaborations with other groups within ISLA that involve other modes such as image analysis, which has led to joint supervision of a PhD student, joint participations in TRECVid, joint collaborations with external partners in the intelligence domain, and joint leadership of (and participation in) the newly founded Center for Creation, Content and Technology (CCCT). The group’s research efforts feed into its teaching activities at all levels, ranging from the Bachelor level (web programming, data mining, and project information retrieval) to the Master level (Internet information, Language technology project), where much of the project-based work and lab sessions use, and are used by, the outcomes of our research activities.
B.3 Processes in research, internal and external collaboration
The group’s research processes are extensive and diverse due to the broad field of expertise that is required for tackling textual information access. Research themes are identified in both a bottom-up and top-down fashion, where individual researchers or research projects propose new themes that are natural follow-ups (or complements) to ongoing research activities, and senior members of the group identify and operationalize more long-term strategies and themes. Both are strongly task-driven. Themes are retired when goals have been achieved and/or available evaluation resources do not support answering additional research questions.
While we aim to build strong individual researchers, we are fully committed to teamwork, which increases productivity and generally creates new opportunities for everyone involved. All researchers in the group have a personal research agenda and are also part of group-wise research activities. To foster teamwork and sharing of expertise, the group uses wikis, mailing lists, and collaborative note taking facilities for communication purposes, and large-scale demos to provide focus and a clear sense of purpose and direction; the demos also serve as an internal quality control mechanism. The group’s collaborative research spirit is probably best illustrated by the substantial number of research publications that involve two or more senior staff members.
PhD students have standing weekly appointments with their supervisors. Postdocs have standing bi-weekly meetings with their project leaders, while multi-person projects meet weekly. We involve our postdocs in the supervision of PhD students, and our PhD students in the supervision of final Bachelor and Master projects, both for training purposes and to encourage the postdocs and PhD students to reflect on their own roles as “researchers in training.” There is intensive communication between the program leader and the PhD students to ensure their progress and personal development. We strive for all PhD students within the ILPS group to participate in competitive international doctoral consortia such as the SIGIR doctoral consortium where students receive critical feedback from a panel of international researchers on their research ideas. On top of that, and in view of the fact that many of our PhD students end up outside academia, all PhD students undergo a period of industrial internship.
In compliance with faculty and institute regulations a personal development and training program is set up and maintained for each member of the program. Additionally, PhD students participate in, and present their work at, the ISLA-wide colloquium.
Both individual and collaborative research efforts result in participations in worldwide collaborative evaluation efforts (e.g., TREC, CLEF, TAC, INEX, TRECVid) and in submissions to peer-reviewed conferences and journals. Contract research, and other research arising from interaction with external “problem holders,” is assessed in terms of evaluation criteria determined in agreement with the problem holders.
B.4 Academic reputation
The ILPS group is still relatively young. It joined the Informatics Institute in April 2004. At that time, the program leader and his group were transitioning from a group focused on applied logic to a group focused on information retrieval and access. As part of that process, for much of the review period the group placed a strong emphasis on workshop and conference publications (to rapidly make a name for itself in the new area in which it was trying to establish itself). In 2008, the group decided that it had completed its transition and changed its publication strategy to focus on highly competitive conference publications followed by journal publications. The ILPS group is internationally recognized as one of the leading research groups in the area of information retrieval and access; this is evident from the large number of citations of the research publications by group members, the number of invited presentations, its active role in a number of worldwide community-based evaluation efforts, and the large number of requests to review research publications and proposals, participate in editorial boards, and act as members of conference program and external PhD committees; see the appendix for a detailed list.
The individual research quality of members of the group has been recognized by NWO by awarding individual grants to three members of ILPS: De Rijke (Pionier), Ten Cate (Veni), and Afanasiev (Mozaïek).
In 2004, Marx won the Best Paper Award at PODS, the leading conference on principles of database systems. Ten Cate won the EACSL Ackermann best dissertation award 2006. Mishne won the ECIR 2005 best student paper award. Balog won the SIGIR 2007 best Doctoral Consortium Paper award. In 2008 a team led by Marx received the XML Holland Award for PoliDocs.nl and a team led by De Rijke received the WICOW 2008 Best Paper Award. In addition, the group has often managed to achieve top ranking results in community-based benchmarking efforts (CLEF, TREC, INEX).
The group has attracted very considerable funding from the Netherlands Organisation for Scientific Research (NWO); today, De Rijke is NWO’s largest individual “customer,” with 7 active NWO projects; in total, the group has secured 18 NWO-funded projects during the review period.
The scientific impact of the ILPS group is not only based on its research achievements but also on its active role in shaping the future research direction of the field. De Rijke has been a member of the Cross-Language Evaluation Forum (CLEF) steering committee that plays an important role in defining relevant research issues for the European (and global) information retrieval research community. De Rijke helped launch and organize the Question Answering track at CLEF; Mishne and De Rijke helped launch and organize the Blog track at TREC; De Rijke launched and organized the WebCLEF web retrieval track at CLEF; Balog is helping to launch the Entity track at TREC. Where legally possible we share the software we develop ourselves (as open source) as well as the data sets that we create for evaluation purposes.
Before joining ISLA, several members of the ILPS group (including the program leader De Rijke) were part of the Language and Information Technology (LIT) group which was part of the Institute for Logic, Language and Computation (ILLC). The LIT group was evaluated in the 2004 QANU (Quality Assurance Netherlands Universities) report. The report was very positive about the productivity, research direction and coherence of the group but also highlighted two shortcomings: the lack of sufficient permanent staff, especially the lack of a full professor, and the fact that the group might be more appropriately integrated within the Informatics Institute and the ISLA lab in particular. Both issues have been fully addressed: since April 2004, the group has been fully integrated within ISLA and as of February 2009 the group consists of three permanent staff members (1 full and 2 assistant professors).
B.5 Internal evaluation
As a consequence of the broad scientific area covered by the group, the research efforts are diverse, both methodologically and content-wise. The group’s methodological spectrum ranges from theoretical foundations, batch experiments and user studies to applications. The program ranges from large-scale information retrieval to language technology and semi-structured data. At the same time, statistical and data-driven methodologies are at the core of almost all research activities within the group, ensuring a substantial degree of cross-fertilization and collaboration. Except for the tenured staff members, no group member is involved with, or active in, all three areas, allowing them to develop their research in a focused manner. Nevertheless, we have effective measures in place to make sure that all group members are active in at least two areas, and that they take part in cross-disciplinary research activities.
To measure success and assure quality, the ILPS group uses well-founded academic mechanisms. These include citations, editorial positions, memberships in scientific boards, publications in top-ranked conferences and journals, keynotes, invited talks, involvement in (the organization of) conferences, accepted peer-reviewed project grants, and so on. These quality metrics are also discussed with individual group members as part of their yearly evaluation (“jaargesprek”). The yearly evaluations may, for example, lead to arrangements for improving scientific output, increasing the opportunities for personal growth, and so on, and for early stage PhD students they may result in termination of the contract. During the yearly evaluations, as well as in plenary group meetings, the overall working atmosphere, research climate, and facilities within the group are also discussed.
Together with the IAS and ISIS groups, the ILPS group contributes to ISLA-wide internal performance goals. These come in three groups: annual (at least five papers in top ranking conferences and high impact journals; top rank in at least one benchmarking effort; two significant demonstrators; coverage in the press; citations), bi-annual (at least one individual grant, i.e., Mozaïek, Veni, Vidi, Vici, or ERC; lead large-scale or EU project; spin-off), and long-term (internal ISLA-collaboration; five strategic partnerships in steady state; semi-permanent collaborations with other faculties and the Hogeschool van Amsterdam (University of Applied Sciences)).
B.6 External validation
The research being undertaken by the ILPS group mixes foundational, experimental and applied work. Where possible, our research is task-based. As a consequence, external validation and collaboration with external end users (industrial, governmental, societal, academics from other disciplines, …) not only comes naturally but is essential for providing us with data, evaluation metrics, and ground truth. In view of this, we maintain active working relations with a large number of academic teams and non-academic organizations, resulting in joint private-public evaluation efforts, joint projects funded by national or EU government, or direct contract funding. In such settings, our work receives validation by being integrated in our external partners’ software solutions or workflow.
The program maintains research links outside academia with a number of SMEs in the areas of search and language technology (Hippo, Ilse media, Irion, Q-Go, Textkernel, TrendLight, Gridline, Hoppinger, Favela Fabric, Kieskompas BV) and with various “problem owners” that provide input for the program’s research strategy (Elsevier, Ilse, Beeld en Geluid, Philips, Koninklijke Bibliotheek, Palga, Unilever, Algemeen Nederlands Persbureau, Korps Landelijke Politiediensten, Ministry of the Interior and Kingdom Relations, Ministry of Foreign Affairs, Government Communications Headquarters). These contacts result in knowledge transfer, data transfer, software transfer, joint research activities, BA and MSc graduation projects, and joint research proposals (EU, NWO and STW).
Live web demos, while labor and resource intensive, are an important and integral part of the program’s research policy, serving outreach, evaluation, and data collection purposes. Two specific examples stand out: MoodViews (actively maintained mid 2005–mid 2009) has attracted millions of visitors as well as worldwide press coverage; our VerkiezingsKijker search engine for the national elections in 2006 attracted over 125,000 users in three weeks.
After two years of preparation, a spin-off was launched in Q2 2009, called Talking Trends.
B.7 Researchers and other personnel
During this evaluation period, the program was still largely in its start-up phase, until recently operating with an incomplete team: due to market pressure and competition, hiring high-quality tenured staff proved to be a challenge. Until early 2009, the tenured staff of the group consisted of one full professor (De Rijke) and one assistant professor (Marx). In February 2009 Christof Monz joined ILPS as an assistant professor, bringing the group to its targeted level of tenured staffing and strengthening the group’s language technology and multilingual expertise. At full strength, the program wants to increase the number of PhD theses completed per year from 1–2 to around 3, which means that the total number of PhD students within the group should increase to between 12 and 15.
For recruiting, selection, training, and personal development opportunities, the program follows and uses the faculty, institute and research school policies and opportunities. The program attaches great value to mobility and exchange, and encourages (and supports) its members, and especially its junior members to take part in external training programs, doctoral consortia, internships, and working visits; to support these activities, the program typically uses external funding (exchange grants, industrial support). Recent PhD students and