4.2 Reuse Algorithms for Text Authoring
4.2.2 Transformational Text Reuse
We introduce a transformational (XFRM) approach to text reuse which uses multiple nearest neighbours of a new problem to assemble a solution text, rather than the single nearest neighbour used in our baseline technique (see Section 4.2.1). The main idea is to use
other nearest neighbours to progressively adapt the solution text of the best match case into a more accurate solution. This adaptation takes place only if a new problem (query) and its best match are not identical, that is, when there are mismatches: some attribute values in the query differ from those in the best match case. Transformational text reuse replaces chunks of text aligned to mismatched attributes with those from other nearest neighbours that have better values for these attributes. Here ‘better’ means that their values are closer to the query’s than those of the best match case. Such chunks of text are only replaced if they are not aligned to any other problem attribute. The search space for cases with better attribute values can also be controlled through the neighbourhood parameter (k). This approach to text reuse is similar to CBR transformational adaptation (Chang, Cui, Wang & Hu 2004), where solution elements are re-organised through add and delete operations. It is also similar to substitutional adaptation (Wilke & Bergmann 1998, González-Calero et al. 1999, Adeyanju et al. 2008) when viewed as the successive replacement of aligned chunks of text in the solution obtained from the best match case.
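The notions of a mismatch and of a ‘better’ value can be made concrete with a small Python sketch. It assumes the following:

* Each case's problem description is a dictionary mapping a rating attribute to an integer rating.
* The neighbours are passed in rank order, with the 1-NN first.
* The function names `mismatched_attributes` and `better_neighbours` are hypothetical.
* The ratings are the illustrative values that appear in Figure 4.2.

```python
from typing import Dict, List

# Rating attributes of the hotel review cases (C, L, R, S, V in Figure 4.2).
ATTRIBUTES = ["cleanliness", "location", "room", "service", "value"]


def mismatched_attributes(query: Dict[str, int], best: Dict[str, int]) -> List[str]:
    """Attributes whose value in the best match case differs from the query's."""
    return [a for a in ATTRIBUTES if query[a] != best[a]]


def better_neighbours(query: Dict[str, int], ranked: List[Dict[str, int]],
                      attribute: str) -> List[int]:
    """1-based ranks of the other neighbours whose value for `attribute` is
    strictly closer to the query's than the best match's (ranked[0]) value."""
    best_gap = abs(query[attribute] - ranked[0][attribute])
    return [rank for rank, case in enumerate(ranked, start=1)
            if rank > 1 and abs(query[attribute] - case[attribute]) < best_gap]


# Illustrative ratings (C, L, R, S, V) for the query and its four nearest neighbours.
query = dict(zip(ATTRIBUTES, (2, 1, 3, 5, 2)))
ranked = [dict(zip(ATTRIBUTES, r)) for r in
          [(1, 1, 4, 5, 3), (2, 1, 3, 4, 4), (2, 2, 4, 3, 2), (2, 4, 5, 1, 2)]]

for attr in mismatched_attributes(query, ranked[0]):
    print(attr, better_neighbours(query, ranked, attr))
```

With these ratings, cleanliness, room and value are the mismatches. Better values come from:

* the 2-NN to 4-NN for cleanliness;
* the 2-NN only for room;
* the 3-NN and 4-NN for value.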
[Figure: a query with author ratings for Cleanliness (C), Location (L), Room (R), Service (S) and Value (V), whose solution has one description per attribute. Beside it are the 1-NN to 4-NN retrieved cases, each shown with its case description (author ratings), case solution and aligned sentences. Ratings (C, L, R, S, V): query (2, 1, 3, 5, 2); 1-NN (1, 1, 4, 5, 3); 2-NN (2, 1, 3, 4, 4); 3-NN (2, 2, 4, 3, 2); 4-NN (2, 4, 5, 1, 2).]
Figure 4.2: Transformational text reuse with hotel reviews
Transformational text reuse is illustrated in Figure 4.2, where it is applied to our hotel review dataset. Given a query consisting of values for the five rating attributes, the specified k nearest neighbours are retrieved. Here k = 4, so four nearest neighbours are shown with their problem attribute values, and their aligned sentences are shown as shaded squares. Only the sentences aligned to the ‘location’ and ‘service’ ratings are chosen from the best match (1-NN), since these attribute values are identical to the query’s. Mismatched attribute
values are resolved by using aligned sentences from other similar cases in the neighbourhood (2-NN and 3-NN) whose values are closer to the query’s than those of the 1-NN. The aligned sentences used to assemble the proposed solution are shown in the diagram as squares with dark outlines. If no better value for a mismatched attribute is found in the other neighbours, the sentences from the best match case aligned to that rating attribute are retained in the assembled solution. Every attribute therefore contributes at least one sentence. The assembled solution thus always contains five or more sentences, assuming that every case in the case base has at least one unique sentence aligned to each attribute.
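The selection described above, and the effect of k on it, can be traced with another small Python sketch. It makes the same assumptions as before: ratings are stored in dictionaries, neighbours are listed in rank order, and the ratings are the illustrative values that appear in Figure 4.2.

The function `choose_sources` is hypothetical. For each attribute it returns the rank of the neighbour whose aligned sentences would be reused: the closest value among the first k neighbours, with ties going to the more similar case. This mirrors the strict comparison in Algorithm 4.2. Because every attribute always receives a source, the best match acts as the default.

```python
from typing import Dict, List

ATTRIBUTES = ["cleanliness", "location", "room", "service", "value"]


def choose_sources(query: Dict[str, int], ranked: List[Dict[str, int]],
                   k: int) -> Dict[str, int]:
    """Map each attribute to the 1-based rank of the neighbour (within the
    first k) whose value is closest to the query's; ties favour the earlier,
    more similar neighbour."""
    k = min(k, len(ranked))
    sources = {}
    for attr in ATTRIBUTES:
        # min() returns the first of several equal minima, i.e. the most similar case.
        sources[attr] = min(range(1, k + 1),
                            key=lambda rank: abs(query[attr] - ranked[rank - 1][attr]))
    return sources


query = dict(zip(ATTRIBUTES, (2, 1, 3, 5, 2)))
ranked = [dict(zip(ATTRIBUTES, r)) for r in
          [(1, 1, 4, 5, 3), (2, 1, 3, 4, 4), (2, 2, 4, 3, 2), (2, 4, 5, 1, 2)]]

print(choose_sources(query, ranked, k=4))
print(choose_sources(query, ranked, k=1))
```

With k = 4, the sketch reproduces the selection in Figure 4.2:

* cleanliness and room come from the 2-NN;
* value comes from the 3-NN;
* location and service come from the 1-NN.

With k = 1, every sentence comes from the best match, which amounts to reusing the best match alone.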
Algorithm 4.2 Transformational text reuse algorithm (XFRM)
Require: CB = {C1, . . . , Cn}, set of cases in the case base
Require: R = {r1, . . . , rp}, set of structured attributes, e.g. ratings in hotel reviews
Require: V = {v1, . . . , vq}, set of possible values for each structured attribute, e.g. rating values
Require: IE = information entity consisting of a structured attribute and its value,
where (attribute(IE) ∈ R) ∧ (attributeValue(IE) ∈ V)
Require: Ci = {IEi1, . . . , IEip, SolutionTexti}, where i ∈ {1, . . . , n},
i.e. a case consists of p attribute values and a solution text
Require: Q = {IE1, . . . , IEp}, a query with p attribute values
1: SOLN ← {SOL1, . . . , SOLp} <!-- set of proposed sentences for each problem attribute -->
2: CBlocal ← RET(CB, Q, k) <!-- retrieve k similar cases -->
3: for each IEj ∈ Q do
4:   qr ← attribute(IEj)
5:   qv ← attributeValue(IEj)
6:   dv ← 1000 <!-- initialise the difference between query and case attribute values used to determine the best match -->
7:   for each Ci ∈ CBlocal (in order of decreasing similarity) do
8:     SolutionText ← getSolutionText(Ci)
9:     rj ← attribute(IEj, Ci)
10:    vj ← attributeValue(IEj, Ci)
11:    if qr = rj ∧ |qv − vj| < dv then
12:      dv ← |qv − vj|
13:      clear(SOLj)
14:      Sj ← selectAlignedTextChunks(rj, SolutionText)
15:      addTextChunks(Sj, SOLj)
16:    end if
17:  end for
18: end for
19: Aggregate all chunks of text in SOLN for reuse
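Algorithm 4.2 can be sketched as runnable Python under several assumptions:

* The aligned sentences of each case are pre-computed as a mapping from attribute to sentences. This stands in for getSolutionText and selectAlignedTextChunks.
* RET is a hypothetical stand-in that ranks cases by the total absolute rating difference. It is not necessarily the similarity measure used in this work.
* The sketch does not model the earlier condition that a replaced chunk must not be aligned to any other attribute.

The names `Case`, `ret` and `xfrm` are illustrative. The inline comments point to the corresponding lines of Algorithm 4.2.

```python
from dataclasses import dataclass
from typing import Dict, List

ATTRIBUTES = ["cleanliness", "location", "room", "service", "value"]


@dataclass
class Case:
    ratings: Dict[str, int]        # problem attributes: attribute -> value (the IEs)
    aligned: Dict[str, List[str]]  # assumed pre-computed: attribute -> aligned sentences


def ret(case_base: List[Case], query: Dict[str, int], k: int) -> List[Case]:
    """Hypothetical RET: rank by total absolute rating difference (smaller is
    more similar). sorted() is stable, so tied cases keep their case-base order."""
    def distance(case: Case) -> int:
        return sum(abs(query[a] - case.ratings[a]) for a in ATTRIBUTES)
    return sorted(case_base, key=distance)[:k]


def xfrm(case_base: List[Case], query: Dict[str, int], k: int) -> List[str]:
    """Transformational text reuse following the structure of Algorithm 4.2."""
    soln: Dict[str, List[str]] = {a: [] for a in ATTRIBUTES}   # line 1
    cb_local = ret(case_base, query, k)                        # line 2
    for attr in ATTRIBUTES:                                    # lines 3-5
        qv = query[attr]
        dv = 1000                                              # line 6
        for case in cb_local:                                  # line 7: decreasing similarity
            vj = case.ratings[attr]                            # lines 9-10 (qr = rj by construction)
            if abs(qv - vj) < dv:                              # line 11
                dv = abs(qv - vj)                              # line 12
                soln[attr] = list(case.aligned[attr])          # lines 13-15
    return [s for attr in ATTRIBUTES for s in soln[attr]]     # line 19


names = ["1-NN", "2-NN", "3-NN", "4-NN"]
rows = [(1, 1, 4, 5, 3), (2, 1, 3, 4, 4), (2, 2, 4, 3, 2), (2, 4, 5, 1, 2)]
case_base = [Case(dict(zip(ATTRIBUTES, r)),
                  {a: [f"{n} {a} sentence."] for a in ATTRIBUTES})
             for n, r in zip(names, rows)]
query = dict(zip(ATTRIBUTES, (2, 1, 3, 5, 2)))

for sentence in xfrm(case_base, query, k=4):
    print(sentence)
```

The placeholder sentences are labelled with their source case, so the output shows where each attribute's text came from. With the illustrative ratings from Figure 4.2:

* cleanliness and room are taken from the 2-NN;
* location and service are taken from the 1-NN;
* value is taken from the 3-NN.

Under this stand-in RET, the 1-NN and 2-NN have the same total difference. Their order therefore relies on the stable sort.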
RET (line 2) returns the k nearest neighbours, CBlocal, for a given query. The aligned chunks of text for each problem attribute are then extracted from the nearest neighbour whose value for that attribute is most similar to the query’s. Retrieved cases are examined in decreasing order of similarity to the query, because similarity reflects overall closeness to the query. Chunks of text from a case with a higher overall similarity should therefore be more reusable by a new author, with little or no modification.

The conditional statement ‘qr = rj ∧ |qv − vj| < dv’ on line 11 ensures that the aligned chunks of text for each attribute are extracted only from the first (most similar) case within the specified neighbourhood whose value for that attribute best matches the query’s. Because the comparison is strict, a later case with an equally close value does not replace it.

The difference between the query and case attribute values, dv, should be initialised to any value bigger than the difference between the lowest and highest possible values of each attribute; line 6 uses 1000. This guarantees that the difference between the attribute values of the query and the best match case is smaller than the initial dv value. The conditional statement is therefore true at least once, for the best match case. Sentences from the best match case thus become the default in the assembled solution whenever no better value is found in the neighbourhood for a particular attribute.

When the algorithm is applied to hotel review authoring, the function attribute returns one of the five rating attributes (e.g. location). The function attributeValue returns the rating value as an integer between 1 and 5 (e.g. 4).
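The initialisation requirement for dv can be stated as a single chain of inequalities. Here v_min and v_max (our notation, not the algorithm's) are the lowest and highest possible values of an attribute. Any query value qv and case value vj then satisfy:

```latex
|q_v - v_j| \;\le\; v_{\max} - v_{\min} \;<\; d_v^{\mathrm{init}}
\quad\Longrightarrow\quad
\text{the test } |q_v - v_j| < d_v \text{ on line 11 succeeds for the first case in } CB_{local}
```

For hotel ratings, v_max − v_min = 5 − 1 = 4, so any initial value above 4 would suffice. The value of 1000 on line 6 is a safe choice that also covers wider rating scales.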