
The research on creativity and the creative process is closely connected to the fields of knowledge management and knowledge engineering. The analysis of creativity always requires and also produces new knowledge. Hence, a knowledge management facility is needed for the process of analysing creativity and the creative process. Knowledge is usually managed via specially designed frameworks. These knowledge management frameworks play an important role in computer science and in many computer-based projects. A project community can use this type of framework to learn from, contribute to, and collectively increase and improve the community's knowledge [12].

Knowledge management, however, is required not only for analysis purposes but also as a foundation and support for the creation of novel products in industry and academia. Research on knowledge management emerged in large corporations, where the main aim was to preserve employees' knowledge in order to avoid a critical loss in case of high staff turnover or downsizing [101]. Knowledge is sometimes considered the most important asset of any company [16, 128].

The knowledge stored inside a knowledge management facility is subject to the following life cycle [111]:

1. Originate/create knowledge
2. Capture/acquire knowledge
3. Transform/organise knowledge
4. Deploy/access knowledge
5. Apply knowledge

These five steps cover all activities related to a knowledge management facility. Any software application designed as a knowledge management facility must support them.
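The five steps can be illustrated with a minimal Python sketch of an in-memory store whose methods map one-to-one onto the life-cycle stages. The class and method names (`KnowledgeStore`, `originate`, `capture`, and so on) are hypothetical, and modelling "apply" as passing the retrieved items to an arbitrary function is an assumption made purely for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class KnowledgeItem:
    content: str
    author: str
    tags: set = field(default_factory=set)


class KnowledgeStore:
    """Hypothetical minimal store; each method corresponds to one life-cycle step."""

    def __init__(self):
        self._items = []

    def originate(self, content, author):
        # 1. Originate/create: a contributor produces a new piece of knowledge
        return KnowledgeItem(content, author)

    def capture(self, item):
        # 2. Capture/acquire: persist the item and return its id
        self._items.append(item)
        return len(self._items) - 1

    def organise(self, item_id, *tags):
        # 3. Transform/organise: attach normalised metadata tags
        self._items[item_id].tags.update(t.lower() for t in tags)

    def access(self, tag):
        # 4. Deploy/access: retrieve all items carrying a tag
        return [item for item in self._items if tag.lower() in item.tags]

    def apply(self, tag, task):
        # 5. Apply: hand the retrieved knowledge to some task (any callable)
        return task(self.access(tag))
```

For example, `store.capture(store.originate("Use a thesaurus for synonym lookup", "alice"))` followed by `store.organise(0, "WSD")` makes the item retrievable via `store.access("wsd")`.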

The understanding of knowledge management has changed over the years. It started as something of a hype, but the topic has received less attention in recent years. In both academia and industry, it is increasingly regarded not as the only important or central element [41], but as one of many important pieces of the creation process [1].

2.6.1 Data Preparation and Retrieval

The exchange of knowledge and information via modern communication devices poses certain difficulties. A large knowledge database must provide search facilities to enable fast and easy access. These search facilities require the knowledge to be tagged with so-called metadata for categorisation. Information provided on the internet depends heavily on metadata, which simplifies the work of web crawlers. The Hypertext Markup Language (HTML) even contains language elements for specifying the keywords of a website.
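To illustrate how keyword metadata supports search, the following minimal Python sketch reads the keywords declared in HTML `<meta name="keywords" content="...">` elements, using the standard-library `html.parser` module, and builds an inverted index from each keyword to the pages tagged with it. The page names and keywords are hypothetical, and real crawlers combine such metadata with many other signals.

```python
from collections import defaultdict
from html.parser import HTMLParser


class KeywordParser(HTMLParser):
    """Collects the comma-separated values of <meta name="keywords" content="...">."""

    def __init__(self):
        super().__init__()
        self.keywords = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)  # tag and attribute names arrive lower-cased
        if tag == "meta" and (attrs.get("name") or "").lower() == "keywords":
            content = attrs.get("content") or ""
            self.keywords += [k.strip().lower() for k in content.split(",") if k.strip()]


def build_index(pages):
    """Map each keyword to the set of page names tagged with it (an inverted index)."""
    index = defaultdict(set)
    for name, html in pages.items():
        parser = KeywordParser()
        parser.feed(html)
        parser.close()
        for keyword in parser.keywords:
            index[keyword].add(name)
    return index
```

Looking up a tag is then a single dictionary access, which is what makes tagged content fast to search.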

The content of a knowledge database is seldom created by a single person. As a result, labelling differs between contributors, so that different terms end up being used for the same meaning. This problem requires techniques that identify the correct context of words.

Polysemy and Synonymy of Words

Polysemy and synonymy are serious problems in the domain of natural language processing. Many words in a language have more than one meaning, which complicates comparisons. A good example is the word “bank”, which can denote the sloping land beside a river or a financial institution. As a result, people looking for a financial institution with the aid of a search engine could receive information about a pleasant riverside spot in a nearby park. The phenomenon of a word having multiple meanings is known as polysemy.

The other, more important issue with regard to tagging information in databases is synonymy. Every language has sets of words with the same or a similar meaning, for example “thinking” and “contemplating”. Because such words are nearly interchangeable, it is very useful for a search to also return entries tagged with terms similar to the original query term. Considerable research has been undertaken to tackle these problems.
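A simple way to let a search cover synonymous tags is query expansion: before looking up a term, the query is widened to include its known synonyms. The sketch below assumes a small, hand-made synonym table (hypothetical; a real system might derive it from a thesaurus) and a tag index mapping each tag to document identifiers.

```python
# Hypothetical synonym table; a real system might build it from a thesaurus.
SYNONYMS = {
    "thinking": {"contemplating"},
    "contemplating": {"thinking"},
}


def expand_query(term, synonyms=SYNONYMS):
    """Return the query term together with all of its known synonyms."""
    term = term.lower()
    return {term} | synonyms.get(term, set())


def search(index, term, synonyms=SYNONYMS):
    """Collect documents tagged with the term or with any of its synonyms."""
    results = set()
    for t in expand_query(term, synonyms):
        results |= index.get(t, set())
    return results
```

Note that expansion only helps with synonymy; a polysemous term such as “bank” would still match documents about either sense.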

The ambiguity of words has led to the emergence of a new research area in computer science, called word sense disambiguation (WSD). The aim of WSD is to develop techniques that identify the meaning of words in a given text. This plays an important role in text mining and information retrieval [112]. WSD is therefore worth considering here, as it uses the same identification techniques that are used to handle metadata in knowledge databases. The first WSD approaches date back to the 1950s and relied on fairly poor techniques [62]. One technique used specialised databases containing terms for a specific, narrow topic. These data sets were often manually enriched with metadata. The disadvantage of this approach is obvious: it is impractical for texts covering diverse topics [127]. Furthermore, the method is very time-consuming.

Research on WSD quickly became connected to the domain of artificial intelligence (AI). The development of AI approaches began in the 1960s. These improved approaches tried to achieve language understanding with rule-based systems. Disambiguation was realised through large knowledge databases describing the semantics and syntax of a language, and the systems then produced semantic representations of the text in the data set. Some of these systems already exploited the relations between nouns and verbs, and some used ontologies or case-based approaches. The rules for the WSD process were quite simple and less precise than those of the following decades. Synonymy and polysemy in particular were difficult to handle with these approaches.

The following paragraphs introduce the most common techniques for addressing polysemy and synonymy. These techniques are useful both for tagging data and information in knowledge databases and for WSD in text.

Knowledge-based Approach

The knowledge-based technique relies on machine-readable dictionaries [134]. The approach achieves high precision within a defined topic area; however, such systems perform quite poorly on text from a different context. There are three types of knowledge sources: machine-readable dictionaries, machine-readable thesauri and machine-readable lexicons. Knowledge-based approaches appeared much later than the AI techniques because knowledge databases were lacking before the 1980s. The further development of these techniques led to the corpus-based approach.
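A classic example of a dictionary-based method is the simplified Lesk algorithm, which is not named in this section but illustrates the idea well: it chooses the sense whose dictionary definition (gloss) shares the most words with the surrounding sentence. The toy dictionary, glosses and stop-word list below are hypothetical stand-ins for a real machine-readable dictionary.

```python
import re

# Hypothetical toy machine-readable dictionary: word -> {sense id: gloss}.
DICTIONARY = {
    "bank": {
        "bank/finance": "an institution that accepts deposits of money and lends money to customers",
        "bank/river": "the sloping land alongside a river or lake",
    }
}
STOPWORDS = {"a", "an", "the", "of", "or", "and", "to", "that", "in", "on", "at", "i", "we"}


def tokens(text):
    """Lower-cased content words of a text."""
    return {w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS}


def simplified_lesk(word, sentence, dictionary=DICTIONARY):
    """Pick the sense whose gloss overlaps most with the sentence context."""
    context = tokens(sentence) - {word}
    best_sense, best_overlap = None, -1
    for sense, gloss in dictionary[word].items():
        overlap = len(context & tokens(gloss))
        if overlap > best_overlap:  # ties keep the first listed sense
            best_sense, best_overlap = sense, overlap
    return best_sense
```

The sketch also shows the weakness described above: precision depends entirely on how well the glosses cover the vocabulary of the text being disambiguated.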

Corpus-based Approach

The corpus-based approach is based on dictionaries and thesauri enriched with information gained from statistics over large corpora [78]. The field is divided into two categories: supervised techniques and unsupervised techniques. Supervised techniques disambiguate words very precisely, owing to their manually curated training data. Unfortunately, supervised methods are not practical for projects that cover a wide variety of topics. Unsupervised techniques often use machine-readable dictionaries or thesauri for processing the text.

Yarowsky developed an interesting method for unsupervised WSD. His approach assumes that, within a given text, an ambiguous word almost always occurs in only one of its senses (one sense per discourse). Furthermore, the approach considers the collocations in the corpus. Collocations are sequences of terms which often occur together; for instance, a noun is usually used with a certain set of verbs. The method assumes one sense per collocation, which means that nearby words help to determine the meaning of the ambiguous word. Starting from a small set of seed collocations for each target sense, the algorithm incrementally identifies further collocations for these senses. His approach performs almost as well as supervised techniques [135].
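The bootstrapping idea can be sketched in Python as follows. This is a heavily simplified illustration of the one-sense-per-collocation principle only: it omits Yarowsky's ranked decision list and the one-sense-per-discourse step, and the function name, thresholds and toy contexts are assumptions. Each occurrence of the ambiguous word is represented by the set of words around it.

```python
from collections import Counter, defaultdict


def bootstrap_senses(contexts, seeds, iterations=3, min_count=2):
    """Simplified Yarowsky-style bootstrapping (illustrative sketch).

    contexts: list of word sets surrounding each occurrence of one ambiguous word.
    seeds: dict mapping each sense to a small set of seed collocates.
    Returns one sense label (or None) per context.
    """
    collocates = {sense: set(words) for sense, words in seeds.items()}
    labels = [None] * len(contexts)
    for _ in range(iterations):
        # One sense per collocation: label a context by the sense whose
        # collocates it shares most, but only if that sense wins outright.
        for i, ctx in enumerate(contexts):
            scores = {sense: len(ctx & words) for sense, words in collocates.items()}
            top = max(scores.values())
            winners = [sense for sense, s in scores.items() if s == top]
            if top > 0 and len(winners) == 1:
                labels[i] = winners[0]
        # Learn new collocates: words seen at least min_count times with
        # one sense and never with any other sense.
        counts = defaultdict(Counter)
        for ctx, label in zip(contexts, labels):
            if label is not None:
                counts[label].update(ctx)
        for sense, counter in counts.items():
            for word, n in counter.items():
                seen_elsewhere = any(counts[o][word] for o in counts if o != sense)
                if n >= min_count and not seen_elsewhere:
                    collocates[sense].add(word)
    return labels
```

With seeds `{"river": {"river"}, "finance": {"money"}}`, contexts such as `{"water", "fish"}` that contain no seed word get labelled in a later iteration once `water` has been learned as a collocate of the river sense.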

WordNet

WordNet is a sophisticated thesaurus for computer-based language processing. The project focuses on the development of a semantic lexicon that surpasses the functionality of a common thesaurus. It is being developed at Princeton University and is currently available in version 3.0 (June 2010) [91]. WordNet is designed for the English language, but similar projects already exist for other languages. One of them is PersiaNet, the Persian WordNet [70].

The software and the database of the project are available for download from the project homepage (http://wordnet.princeton.edu). The software runs under Windows and various UNIX and Linux operating systems. Furthermore, APIs exist for different programming languages, for instance Java, and a Prolog version is available as well. The license is a permissive, BSD-style license that allows modification of the source code as well as use of the software in commercial applications. An online service of WordNet is also available.

The WordNet database contains information about nouns, verbs, adjectives and adverbs, the so-called “open-class words”. Other word classes, such as determiners, prepositions, pronouns, conjunctions and particles, are not included. The semantic database is a hybrid of a dictionary and a thesaurus. Nouns and verbs are organised into hierarchies, so WordNet knows which more general concept a word belongs to. A simplified example would be: oak ⇒ tree ⇒ plant. Another very useful feature is its knowledge about synonymy and polysemy: synonymous words are grouped into sets (synsets), and a polysemous word belongs to several synsets [32].
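The structure described above can be modelled in a few lines of Python. The sketch below uses hypothetical toy data shaped loosely like WordNet (synsets that group synonymous lemmas and point to a more general synset); it is not real WordNet content and not the WordNet API. Real WordNet data can be accessed, for example, through NLTK's `nltk.corpus.wordnet` reader (`wn.synsets`, `Synset.hypernyms`).

```python
# Hypothetical toy data shaped like WordNet; not real WordNet content.
# Each synset groups synonymous lemmas and points to a more general synset.
SYNSETS = {
    "oak#1": {"lemmas": ["oak"], "hypernym": "tree#1"},
    "tree#1": {"lemmas": ["tree"], "hypernym": "plant#1"},
    "plant#1": {"lemmas": ["plant", "flora"], "hypernym": None},
    "plant#2": {"lemmas": ["plant", "factory", "works"], "hypernym": None},
}


def synsets(word):
    """All synsets containing the word; more than one indicates polysemy."""
    return [sid for sid, s in SYNSETS.items() if word in s["lemmas"]]


def synonyms(word):
    """Lemmas that share at least one synset with the word (synonymy)."""
    return {lemma for sid in synsets(word) for lemma in SYNSETS[sid]["lemmas"]} - {word}


def hypernym_chain(sid):
    """Follow hypernym links upwards and render them like oak ⇒ tree ⇒ plant."""
    chain = []
    while sid is not None:
        chain.append(SYNSETS[sid]["lemmas"][0])
        sid = SYNSETS[sid]["hypernym"]
    return " ⇒ ".join(chain)
```

`synonyms("plant")` mixes “flora” and “factory”, which shows why a search system still needs WSD to decide which synset a tag refers to before expanding it.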

Self-Organizing Map

A Self-Organizing Map (SOM), also known as a Kohonen map, is a clustering technique closely related to artificial neural networks. It converts the nonlinear statistical relationships between high-dimensional data into simple geometric relationships on a low-dimensional map. It uses unsupervised learning to visualise high-dimensional input data while preserving the most important relationships between the original data elements [74].

SOMs are used for various analysis, pattern recognition and categorisation tasks. The map is generated by training it on the data set. During training, an array of nodes (neurons), each holding a weight vector of the same dimension as the input data, is adapted to the data: for each input, the node whose weight vector is closest to it (the best-matching unit) is determined, and its weight vector and those of its neighbours on the map are moved towards the input. Training is repeated many times (up to several thousand iterations) until no significant changes are observed [73].
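The training loop can be sketched in plain Python. The schedule used here (linearly decaying learning rate and neighbourhood radius, Gaussian neighbourhood, nodes initialised by cycling through the input vectors) is an assumption; SOM implementations differ in these choices, and random initialisation is also common. The function names are illustrative.

```python
import math
import random


def best_matching_unit(weights, x):
    """Grid position of the node whose weight vector is closest to x."""
    return min(weights, key=lambda node: math.dist(weights[node], x))


def train_som(data, rows, cols, iterations=1000, lr0=0.5, seed=0):
    """Train a rows x cols SOM on a list of equal-length numeric vectors."""
    rng = random.Random(seed)
    radius0 = max(rows, cols) / 2
    # Each grid node holds a weight vector; nodes are initialised by
    # cycling through the input vectors (copies, so data is not modified).
    nodes = [(r, c) for r in range(rows) for c in range(cols)]
    weights = {node: list(data[k % len(data)]) for k, node in enumerate(nodes)}
    for t in range(iterations):
        frac = t / iterations
        lr = lr0 * (1 - frac)                    # learning rate decays linearly
        radius = max(radius0 * (1 - frac), 0.5)  # neighbourhood shrinks over time
        x = rng.choice(data)                     # present one random sample
        bmu = best_matching_unit(weights, x)
        for node, w in weights.items():
            d = math.dist(node, bmu)             # distance on the map grid, not in input space
            if d <= radius:
                h = math.exp(-(d * d) / (2 * radius * radius))  # Gaussian neighbourhood
                for i in range(len(w)):
                    w[i] += lr * h * (x[i] - w[i])  # pull the weight towards the input
    return weights
```

After training, mapping a new vector means calling `best_matching_unit`, so similar inputs (for example, term vectors) land on the same or neighbouring nodes.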

Self-Organizing Maps can categorise terms well based on their relations to each other, which makes them a good technique for identifying similar terms in order to manage, e.g., information or documents. The WEBSOM project was initiated to explore the information available on the internet by enabling interactive browsing [59]. The WEBSOM technique has proven able to handle massive data sets and is therefore suitable for large projects [77].

A drawback of the SOM approach is that adding new elements to the data set requires preprocessing and retraining of the map.

2.6.2 Conclusion

Knowledge management in general is becoming increasingly important. Knowledge is considered the key asset of a company. It is necessary to focus on the efficient distribution and storage of knowledge in collaborative environments. Without proper knowledge management, creativity and innovation might be severely hindered.

Data preparation is a necessary part of managing knowledge in collaborative environments. The need for preparation grows with the variety of information stored in the data set. In particular, frameworks with a large diversity of users, such as Web 2.0 compliant systems, should remove the ambiguity of terms as far as possible [90].