CHAPTER 5 DIALOGUE MECHANISM DESIGN
5.3 DIALOGUE MECHANISM DESIGN
5.3.4 Concept Classification & Information Retrieval
In the proposed system, the first task of the Dialogue Mechanism is to receive the text of submitted postings from the front end and start a concept classification process, which is performed by the Concept Classification Agent. Postings in a discussion forum are semantically related to one another: related postings typically concern one topic (a broad concept) such as J2EE or LSA, yet they can appear anywhere in the forum. Thousands of postings may be submitted across the forum, covering hundreds of different topics, which makes it impractical for students to find all the postings they need or are interested in. The proposed approach to concept classification in the Dialogue Mechanism is to set up a concept space for the discussion forum in which every posting belongs to a semantic concept. It assumes that each posting represents a sub-concept that belongs to one semantic concept in the forum. The LSA-based cosine similarity measure introduced earlier is used to calculate the similarity between a posting (p) and a semantic concept (C):
sim(p, C) = cos_sim(p, C)
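As an illustration of the measure above, the following minimal sketch computes the cosine similarity between two vectors. It assumes that the posting and the concept have already been projected into the same LSA space and are available as equal-length lists of numbers; the function name cos_sim mirrors the notation in the text, while the zero-vector handling is an assumption of this sketch and is not specified by the source.

```python
import math
from typing import Sequence


def cos_sim(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors.

    Assumption of this sketch: an all-zero vector has similarity 0.0
    with everything, so no division by zero can occur.
    """
    if len(u) != len(v):
        raise ValueError("vectors must have the same dimension")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)
```

Here a call such as cos_sim(posting_vector, concept_vector) would play the role of sim(p, C), with both argument names being hypothetical.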
[Figure: system flow diagram. Components: User Front-end, Dialogue Mechanism (Concept Classification Agent, Query Monitoring Agent), Central DB. Processes: Concept Classification & Reclassification Process, Information Retrieval Process. Actions: Submit Message, Evaluate retrieved message.]
The Concept Classification Agent calculates the similarity weight between each submitted posting and every existing semantic concept. Two outcomes are possible: 1) if no similarity weight is greater than a default threshold, the submitted posting itself becomes a new concept in the concept space; or 2) if one or more concepts have similarity weights greater than the threshold, the submitted posting is classified as a sub-concept of the semantic concept with the maximum similarity value. In this way, every posting submitted to the discussion forum is classified into a semantic concept in the concept space. Figure 5-5 shows a concept classification example for a submitted posting. In this example, the concept space contains two existing concepts (Concept 1 and Concept 2), and the newly submitted posting is classified into Concept 1 (Java) because its similarity to that concept (0.4) is both greater than the threshold (0.25) and the highest weight.
Figure 5-5: An Example of Concept Classification in a Discussion Forum
[Figure: concept space of a discussion forum with Concept 1 (topic 1): Java and Concept 2 (topic 2): PHP, containing Sub-concepts 1 to 6. The submitted posting ("Method: method is … , java …") has similarities of 0.4 and 0.3 to the two concepts; threshold = 0.25.]
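The two classification outcomes described above can be sketched as a small function. This is a minimal illustration, not the system's implementation: the function name classify_posting, the dictionary input, and the use of None to signal "create a new concept" are assumptions of the sketch. The strict "greater than" comparison follows the wording of the text.

```python
from typing import Dict, Optional


def classify_posting(similarities: Dict[str, float],
                     threshold: float = 0.25) -> Optional[str]:
    """Pick the concept for a submitted posting.

    similarities maps each existing concept name to sim(p, C).
    Returns the concept with the maximum similarity among those strictly
    above the threshold, or None when no concept qualifies, in which
    case the posting itself becomes a new concept.
    """
    candidates = {c: w for c, w in similarities.items() if w > threshold}
    if not candidates:
        return None
    return max(candidates, key=candidates.get)


# Values taken from the Figure 5-5 example, assuming 0.3 is the
# similarity to Concept 2 (PHP).
chosen = classify_posting({"Java": 0.4, "PHP": 0.3}, threshold=0.25)
```

With the Figure 5-5 values, both concepts exceed the threshold and the posting is assigned to the one with the larger weight, Java.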
When a message is submitted to the discussion forum and sent to the Dialogue Mechanism component, the Concept Classification Agent performs the concept classification task to identify the concept to which the message belongs. Once the submitted posting has been classified, its relationships with other postings (sub-concepts) can be identified, and the Query Monitoring Agent performs the information retrieval process. Each relationship is represented by the similarity between the two postings and is visible in the concept space view. Figure 5-6 depicts the relationships between sub-concepts and between concepts in the concept space view, such as sub-concept 1 with sub-concept 2, sub-concept 2 with sub-concept 3, and concept 1 with concept 2.
Figure 5-6: An Example of Relationships between Sub-Concepts and Concepts
[Figure: discussion forum concept space with Concept 1 (topic 1): Java and Concept 2 (topic 2): PHP. Posting 1 (Sub-Concept 1) - Method: method is …; Posting 2 (Sub-Concept 2) - Object: object is called …; Posting 3 (Sub-Concept 3) - JDBC: this is a tool …; Posting 4 (Sub-Concept 4) - Variable: in any program …. Relationship labels: Similarity: 0.2, Similarity: 0.4, Similarity: 0.3.]
Based on the concept classification measure, the basic similarity measure is modified to calculate the relationship, expressed as a similarity (sim), between a submitted posting (p) and each existing sub-concept s1, s2, …, sn, whether it belongs to the same concept Cs or to another concept Co, for the information retrieval process. Using a preset similarity threshold, the Query Monitoring Agent finds the most relevant sub-concepts and returns them to the user. The proposed similarity calculation, based on the cosine similarity measure, is defined under two assumptions:
• Assumption 1: the existing sub-concept si belongs to the same concept as the submitted sub-concept p:
sim(p, si) = cos_sim(p, si)
• Assumption 2: the existing sub-concept si belongs to a different concept from the submitted sub-concept p:
sim(p, si) = α × cos_sim(Cs, Co) × cos_sim(p, si)
where α is a parameter that controls the influence of the relationship between the two semantic concepts, Cs is the concept of p, and Co is the concept of si.
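The two assumptions above, together with the threshold-based retrieval step, can be sketched as follows. This is an illustrative sketch under stated assumptions: the Candidate record, the function names sim and retrieve, the strict "greater than" threshold test, and the descending sort are all hypothetical choices, and the cosine values are assumed to have been computed beforehand. The source does not give a value for α, so it is left as a parameter.

```python
from typing import List, NamedTuple, Tuple


class Candidate(NamedTuple):
    name: str               # label of the existing sub-concept si
    cos_p_si: float         # cos_sim(p, si)
    same_concept: bool      # True if si is in the same concept as p
    cos_cs_co: float = 1.0  # cos_sim(Cs, Co); unused when same_concept


def sim(c: Candidate, alpha: float) -> float:
    """Assumption 1 for same-concept pairs, Assumption 2 otherwise."""
    if c.same_concept:
        return c.cos_p_si
    return alpha * c.cos_cs_co * c.cos_p_si


def retrieve(candidates: List[Candidate], threshold: float,
             alpha: float = 1.0) -> List[Tuple[str, float]]:
    """Return (name, similarity) pairs strictly above the threshold,
    ordered from most to least similar."""
    scored = [(c.name, sim(c, alpha)) for c in candidates]
    relevant = [(n, s) for n, s in scored if s > threshold]
    return sorted(relevant, key=lambda item: item[1], reverse=True)
```

In this sketch, a sub-concept from another concept is discounted by the similarity between the two concepts, so it is returned only if it remains above the threshold after the discount.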
In the concept space, the relationship between two semantic concepts is represented by the similarity between them. The relationship between two sub-concepts is therefore affected when they belong to different semantic concepts. The proposed approach to the design and development of the Dialogue Mechanism component combines a modified, LSA-based cosine similarity measure with a concept space to perform the text similarity calculation and information retrieval tasks. Previous research has described relevance feedback as another important supporting technique that improves how well the system understands users' meanings and preferences. It can also mitigate the semantic confusion limitation of the cosine similarity measure at the word and phrase level, such as one word with several meanings or several words with one meaning. Relevance feedback is therefore integrated with the approach proposed above to further enhance the capability of the Dialogue Mechanism component to perform natural language processing tasks. The next section describes how this integration reinforces the performance of the Dialogue Mechanism component.