

5.1.5 GoPIZ

To further explore semantic search over patents, we developed a prototype together with the Patent Information Centre (PIZ) Dresden. The GoPIZ system uses ontological background knowledge to classify full-text patents. Using the full text avoids the problem of text snippets that are too short for text mining, as encountered with the GoFreePatentsOnline system.

Like all current search engines, GoPIZ provides a simple keyword search interface, extended with semantic filtering by concepts identified in the text. For the GoPIZ prototype, a patent expert selected a set of more than 7700 patents with relevant text fields such as title, abstract, objective, advantages, independent claims, claims in English, French or German, and main claim. These cover nearly all text fields available for patents from the European Patent Office (EPO). The selection is based on an actual case in patent research: it is part of a patent application in the area of valve regulation in mechanical engineering.

The background knowledge was specifically created for this task. The knowledge network was semi-automatically generated and structured using a built-in ontology editor of the system.


With this system a non-expert created the GoPIZ taxonomy within a week. It covers more than 200 concepts, including synonyms, variations, and abbreviations.
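A taxonomy of this kind, with concepts carrying synonyms, variations, and abbreviations, could be represented and applied to text roughly as follows. This is a minimal sketch, not the GoPIZ implementation: the `Concept` class, the `annotate` function, and the example entries are all hypothetical, and simple whole-word matching stands in for whatever text mining the system actually uses.

```python
import re
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class Concept:
    """One taxonomy node; the class and its fields are hypothetical."""
    name: str
    labels: List[str] = field(default_factory=list)  # synonyms, variations, abbreviations
    children: List["Concept"] = field(default_factory=list)


def iter_concepts(root: Concept):
    """Yield the root and all of its descendants, depth first."""
    yield root
    for child in root.children:
        yield from iter_concepts(child)


def annotate(text: str, root: Concept) -> Set[str]:
    """Return the names of all concepts with at least one label found in the text."""
    found = set()
    for concept in iter_concepts(root):
        for label in [concept.name, *concept.labels]:
            # Whole-word, case-insensitive match of the label.
            if re.search(r"\b" + re.escape(label) + r"\b", text, re.IGNORECASE):
                found.add(concept.name)
                break
    return found


# Invented example entries, not taken from the GoPIZ taxonomy.
valve = Concept("valve", children=[
    Concept("solenoid valve", labels=["solenoid", "magnetic valve"]),
    Concept("check valve", labels=["non-return valve"]),
])

print(sorted(annotate("A solenoid actuates the valve.", valve)))
```

Because the synonym "solenoid" is attached to the concept "solenoid valve", the sentence is annotated with that concept even though the preferred name never occurs in it.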

The example shown for GoPIZ uses the query “air conditioning control system”, which returns 41 articles. To find all articles relating to “solenoid”, the user selects this concept in the induced ontology tree on the left side, which reduces the result to 3 articles. The full text is hidden by default and can be viewed by clicking on the required part. For example, to view the objective of the patent, the user clicks on the “Objective” heading. The text is then annotated, the identified concepts and keywords are highlighted, and the annotated text is added to the interface (Figure 5.1). Additionally, GoPIZ offers links to Wikipedia entries, for instance to learn more about solenoids. This additional information source explains that electro-mechanical solenoids can be used as a special type of relay in pneumatic or hydraulic valves.

Figure 5.1: GoPIZ with query “air conditioning control system” and selected concept “solenoid”, including the highlighted and annotated patent objective. Wikipedia links are available below the patent.

The biggest technical and algorithmic challenge of GoPIZ is the handling of full-text patents. Patents as used in GoPIZ are structured entries, but the fields containing the claims and descriptions are long free-text passages. These text fields alone may be longer than a typical full-text scientific article. This needs to be reflected in the system architecture. Therefore, the rendering of patent search results, especially of the more detailed claims, is done asynchronously and on demand, using AJAX with JSON messages. In the initial search result presented to the user, the claims are not rendered. Instead, a placeholder with the option to open the claims is shown, which reduces the rendering time.
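The on-demand delivery of long fields can be illustrated with a small server-side sketch. It assumes a hypothetical in-memory store and hypothetical message formats; the actual GoPIZ endpoints and JSON layout are not described in the text. The initial result entry carries only the title and placeholders, while a separate handler answers the asynchronous request for one field.

```python
import json

# Hypothetical in-memory store; the sample patent is invented.
PATENTS = {
    "P1": {
        "title": "Valve control for an air conditioning system",
        "objective": "Reduce the switching time of the valve.",
        "claims": "1. A control system comprising a solenoid valve.",
    },
}

# Long fields that are delivered only on request (an assumption of this sketch).
LAZY_FIELDS = ("objective", "claims")


def search_result_entry(patent_id):
    """Build the initial result entry: title plus placeholders for long fields."""
    patent = PATENTS[patent_id]
    return {
        "id": patent_id,
        "title": patent["title"],
        "placeholders": [name for name in LAZY_FIELDS if name in patent],
    }


def handle_field_request(request_body):
    """Answer an asynchronous request for one field with a JSON message."""
    request = json.loads(request_body)
    patent = PATENTS.get(request["id"], {})
    text = patent.get(request["field"])
    if text is None:
        return json.dumps({"ok": False, "error": "unknown patent or field"})
    return json.dumps(
        {"ok": True, "id": request["id"], "field": request["field"], "text": text}
    )


print(search_result_entry("P1"))
print(handle_field_request(json.dumps({"id": "P1", "field": "objective"})))
```

The design choice is that the cost of transferring and rendering a long field is paid only for the fields a user actually opens.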

For the induction of the tree for the filtering feature, all patents are pre-annotated with the background knowledge and cached. Therefore, no additional logic is required. The reduced presentation of patents also enables a more compact overview of the search results and lets the user scan them for relevant patents.
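With cached annotations, inducing the filter tree for a result set reduces to a lookup and a count. The following sketch assumes a hypothetical child-to-parent table and a hypothetical annotation cache; it counts, for each concept, the results annotated with that concept or one of its descendants.

```python
from collections import defaultdict

# Hypothetical taxonomy as child -> parent links (None marks a root).
PARENT = {
    "valve": None,
    "solenoid valve": "valve",
    "check valve": "valve",
    "relay": None,
}

# Hypothetical cache: patent id -> concepts found during pre-annotation.
ANNOTATION_CACHE = {
    "P1": {"solenoid valve"},
    "P2": {"check valve", "relay"},
    "P3": {"solenoid valve", "relay"},
    "P4": set(),
}


def induce_tree_counts(result_ids):
    """Count, per concept, the results annotated with it or with a descendant."""
    hits = defaultdict(set)
    for patent_id in result_ids:
        for concept in ANNOTATION_CACHE.get(patent_id, ()):
            # Walk up to the root so that every ancestor also covers this patent.
            node = concept
            while node is not None:
                hits[node].add(patent_id)
                node = PARENT[node]
    return {concept: len(ids) for concept, ids in hits.items()}


counts = induce_tree_counts(["P1", "P2", "P3"])
print(sorted(counts.items()))
```

Concepts that do not occur in the current result set never appear in the returned counts, so the displayed tree contains only nodes that can narrow the search.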

Following this pilot study and the resulting prototype, feedback was collected from the PIZ. The main comment was that the interface was more comfortable than current systems such as the patent search of the European Patent Office (EPO). Furthermore, the filtering system of GoPIZ, which includes the sub-concepts of a selected concept, helped to explore the patents.
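Filtering that includes sub-concepts can be sketched as follows. The taxonomy, the annotations, and the function names are hypothetical and serve only to illustrate the idea: selecting a concept keeps every result annotated with that concept or with any of its descendants.

```python
# Hypothetical taxonomy as parent -> children links.
CHILDREN = {
    "valve": ["solenoid valve", "check valve"],
    "solenoid valve": [],
    "check valve": [],
    "relay": [],
}

# Hypothetical pre-computed annotations: patent id -> concepts.
ANNOTATIONS = {
    "P1": {"solenoid valve"},
    "P2": {"check valve"},
    "P3": {"relay"},
}


def with_sub_concepts(concept):
    """Return the concept together with all of its descendants."""
    selected = {concept}
    stack = [concept]
    while stack:
        for child in CHILDREN.get(stack.pop(), []):
            if child not in selected:
                selected.add(child)
                stack.append(child)
    return selected


def filter_results(result_ids, concept):
    """Keep the results annotated with the concept or one of its sub-concepts."""
    wanted = with_sub_concepts(concept)
    return [pid for pid in result_ids if ANNOTATIONS.get(pid, set()) & wanted]


print(filter_results(["P1", "P2", "P3"], "valve"))           # ['P1', 'P2']
print(filter_results(["P1", "P2", "P3"], "solenoid valve"))  # ['P1']
```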

The PIZ made two main requests to improve the prototype. First, remove the step of gathering potentially relevant patents, which means that the system should have access to all patents. Second, integrate the existing patent classification systems. Unfortunately, both requests are beyond the scope of the prototype in this pilot study. Other requirements for a patent search system have been proposed by Tseng and Wu (2008). Their list consists of the following items:

Self-defined search result item display function: To assess relevance, the most used parts of a patent are the summary snippets, pictures, titles, and claims. However, most government patent search websites do not display these data on the search result page.

GoPIZ already includes these parts in the search results.

Suggested vocabulary feedback function: Provide a mechanism to suggest and extend search vocabulary if the initial search results are unsatisfactory.

GoPIZ provides the ontology to identify related concepts.

Word and command error correction function: Correct errors in keywords and search commands, especially for more complex Boolean queries.

GoPIZ allows complex queries to be constructed with the help of the ontology.

Correspondence system for frequently used vocabulary between languages: Provide built-in support for cross-language queries and search for common concepts in multiple languages, such as translations between traditional Chinese, simplified Chinese, and English.

This can be added to GoPIZ by adding non-English synonyms to the ontology.

Look-up system and mapping between patent classification schemes: Provide a helper to map classification numbers between different patent classification schemes, e.g. country-specific ones, to improve multi-lingual patent search.

Ontology and schema mapping is an open research topic.

These items show that current patent search systems need a customization feature and many convenience-oriented hints and links. GoPIZ already incorporates the first three of these feature requests.

Semantic patent search is a promising application for semantic technologies, which help to filter the large quantities of data connected to patent search. However, the data size is also a current limitation, as any serious effort is immediately confronted with scaling issues.
