Transformational Text Reuse - Reuse Algorithms for Text Authoring

4.2 Reuse Algorithms for Text Authoring

4.2.2 Transformational Text Reuse

We introduce a transformational (XFRM) approach to text reuse which uses multiple nearest neighbours of a new problem to assemble a solution text, rather than the single nearest neighbour used in our baseline technique (see Section 4.2.1). The main idea is to use other nearest neighbours to progressively adapt the solution text from the best match case into a more accurate solution. This takes place only if a new problem (query) and its best match are not identical; that is, there are mismatches, so some attribute values in the query differ from those in the best match case. Transformational text reuse occurs by replacing chunks of text aligned to mismatched attributes with those from other nearest neighbours having better attribute values than the best match case; ‘better’ here means their values are closer to the query than those from the best match case. Such chunks of text are only replaced if they are not aligned to any other problem attribute. Also, the search space for cases with better attribute values can be controlled by limiting the neighbourhood parameter (k). This approach to text reuse is similar to CBR transformational adaptation (Chang, Cui, Wang & Hu 2004), where solution elements are re-organised through add and delete operations. It is also similar to substitutional adaptation (Wilke & Bergmann 1998, González-Calero et al. 1999, Adeyanju et al. 2008) when viewed as successive replacement of aligned chunks of text in the solution obtained from the best match case.
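The two conditions in this paragraph, a ‘better’ attribute value and a chunk of text that is safe to replace, can be sketched in Python as follows. This is an illustration only: the function names `is_better` and `replaceable_chunks`, the example sentences, and the representation of the alignment as a mapping from each chunk to the set of attributes it is aligned to are all assumptions made for this example, not part of the method as described.

```python
from typing import Dict, List, Set


def is_better(query_value: int, best_match_value: int, neighbour_value: int) -> bool:
    """True if the neighbour's value is strictly closer to the query than the best match's."""
    return abs(query_value - neighbour_value) < abs(query_value - best_match_value)


def replaceable_chunks(alignment: Dict[str, Set[str]], attribute: str) -> List[str]:
    """Chunks aligned to `attribute` and to no other attribute.

    `alignment` maps a chunk of text to the attributes it is aligned to
    (a hypothetical representation chosen for this sketch).
    """
    return [chunk for chunk, attrs in alignment.items() if attrs == {attribute}]


# Hypothetical alignment for the solution text of one best match case.
alignment = {
    "The room was spotless.": {"cleanliness"},
    "Staff kept the rooms clean and well stocked.": {"cleanliness", "service"},
    "Breakfast was served promptly.": {"service"},
}

print(is_better(2, 4, 3))                          # neighbour value 3 is closer to 2 than 4 is
print(replaceable_chunks(alignment, "cleanliness"))
```

In this sketch the second sentence is never offered for replacement, because it is also aligned to ‘service’ and swapping it out would disturb text that belongs to another attribute.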

[Figure 4.2 shows a query with ratings for five attributes (C = cleanliness, L = location, R = room, S = service, V = value) alongside its retrieved neighbours (1-NN, 2-NN, 3-NN, 4-NN, …). Each neighbour is drawn with its case description (author ratings) and its case solution (aligned sentences). The assembled solution contains one description per attribute: cleanliness, location, room, service and value.]

Figure 4.2: Transformational text reuse with hotel reviews

Transformational text reuse is illustrated in Figure 4.2, where it is applied to our hotel review dataset. Given a query consisting of values for the five rating attributes, the specified k nearest neighbours are retrieved. Here k = 4, so four nearest neighbours are shown with their problem attribute values and with their aligned sentences as shaded squares. Only sentences aligned to the ‘location’ and ‘service’ ratings are chosen from the best match (1-NN), since these attributes have values identical to the query’s. Mismatched attribute values are resolved by using aligned sentences from other similar cases in the neighbourhood (2-NN and 3-NN) whose values are closer to the query than those of 1-NN. The aligned sentences used for assembling a proposed solution are shown in the diagram as squares with dark outlines. Note that if no better values for the mismatches are found in the other neighbours, sentences from the best match case aligned to these rating attributes are retained in the assembled solution. Therefore, there will always be five or more sentences in the assembled solution, assuming all cases in the case base have at least one unique sentence aligned to each attribute.

Algorithm 4.2 Transformational text reuse algorithm (XFRM)

Require: CB = {C_1, …, C_n}, set of cases in the case base
Require: R = {r_1, …, r_p}, set of structured attributes, e.g. ratings in hotel reviews
Require: V = {v_1, …, v_q}, set of possible values for each structured attribute, e.g. rating values
Require: IE = information entity consisting of a structured attribute with distinct value, where (attribute(IE) ∈ R) ∧ (attributeValue(IE) ∈ V)
Require: C_i = {IE_i1, …, IE_ip, SolutionText_i}, where i ∈ {1 … n}, i.e. a case consists of p attribute values and a solution text
Require: Q = {IE_1, …, IE_p}, a query with p attribute values

 1: SOLN ← {SOL_1, …, SOL_p}   // set of proposed sentences for each problem attribute
 2: CB_local ← RET(CB, Q, k)   // retrieve k similar cases
 3: for each IE_j ∈ Q do
 4:     qr ← attribute(IE_j)
 5:     qv ← attributeValue(IE_j)
 6:     dv ← 1000   // initialise difference between query and case attribute values to determine best match
 7:     for each C_i ∈ CB_local (in order of decreasing similarity) do
 8:         SolutionText ← getSolutionText(C_i)
 9:         r_j ← attribute(IE_j, C_i)
10:         v_j ← attributeValue(IE_j, C_i)
11:         if qr = r_j ∧ |qv − v_j| < dv then
12:             dv ← |qv − v_j|
13:             clear(SOL_j)
14:             S_j ← selectAlignedTextChunks(r_j, SolutionText)
15:             addTextChunks(S_j, SOL_j)
16:         end if
17:     end for
18: end for
19: Aggregate all chunks of text in SOLN for reuse
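The loop structure of Algorithm 4.2 can be sketched in Python as below. This is a minimal sketch under stated assumptions rather than the original implementation. The `Case` class, the `retrieve` function (a stand-in for RET that ranks cases by summed absolute rating difference) and the toy case base are hypothetical. Sentence alignment is assumed to have been computed beforehand and stored per attribute. The check qr = r_j on line 11 holds implicitly here because values are looked up by attribute name.

```python
from dataclasses import dataclass
from typing import Dict, List

ATTRIBUTES = ["cleanliness", "location", "room", "service", "value"]


@dataclass
class Case:
    ratings: Dict[str, int]        # attribute -> rating value
    aligned: Dict[str, List[str]]  # attribute -> aligned text chunks (assumed precomputed)


def retrieve(case_base: List[Case], query: Dict[str, int], k: int) -> List[Case]:
    """Stand-in for RET: the k cases in order of decreasing similarity.

    Assumption: a smaller sum of absolute rating differences means more similar.
    """
    def distance(case: Case) -> int:
        return sum(abs(qv - case.ratings[attr]) for attr, qv in query.items())
    return sorted(case_base, key=distance)[:k]


def xfrm(case_base: List[Case], query: Dict[str, int], k: int, dv_init: int = 1000) -> List[str]:
    neighbours = retrieve(case_base, query, k)                 # line 2
    solution: Dict[str, List[str]] = {}                        # line 1
    for attr, qv in query.items():                             # lines 3-5
        dv = dv_init                                           # line 6
        for case in neighbours:                                # line 7
            diff = abs(qv - case.ratings[attr])                # lines 9-10
            if diff < dv:                                      # line 11
                dv = diff                                      # line 12
                solution[attr] = list(case.aligned.get(attr, []))  # lines 13-15
    # line 19: aggregate the chosen chunks in attribute order
    return [chunk for attr in query for chunk in solution.get(attr, [])]


def toy_case(tag: str, *values: int) -> Case:
    """Hypothetical case with one placeholder sentence per attribute."""
    ratings = dict(zip(ATTRIBUTES, values))
    aligned = {attr: [f"{tag} {attr} sentence."] for attr in ATTRIBUTES}
    return Case(ratings, aligned)


CASE_BASE = [
    toy_case("A", 1, 1, 4, 5, 2),
    toy_case("B", 2, 3, 4, 4, 3),
    toy_case("C", 4, 2, 3, 3, 1),
    toy_case("D", 5, 5, 5, 1, 5),
]
QUERY = dict(zip(ATTRIBUTES, (2, 1, 3, 5, 2)))

print(xfrm(CASE_BASE, QUERY, k=3))
```

With these invented ratings, case A is the best match and supplies the location, service and value sentences, while the cleanliness sentence comes from B and the room sentence from C. Setting k = 1 reduces the sketch to reusing the best match alone.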


RET (in line 2) returns the k nearest neighbours, CB_local, of a given query. The aligned chunks of text for each problem attribute are then extracted from the nearest neighbours whose attribute values are most similar to the query’s. Text chunks are extracted from retrieved cases in decreasing order of similarity to the query, because similarity reflects the overall closeness to the query. This means that chunks of text from a case with a higher overall similarity should be more reusable by a new author with little or no modification. The conditional statement ‘qr = r_j AND |qv − v_j| < dv’ on line 11 ensures that aligned chunks of text for each attribute are only extracted from the first similar case within the specified neighbourhood whose attribute value best matches the query’s. The difference between the query and case attribute values, dv, should be initialised to any value larger than the difference between the lowest and highest possible values of each attribute. This ensures that the difference between the attribute values of the query and the best match case is smaller than the initialised dv value. Therefore the conditional statement will be true at least once, and sentences from the best match case will be the default in the assembled solution if better values are not found in the neighbourhood for any particular attribute. When the algorithm is applied to hotel review authoring, the function attribute returns one of the five rating attributes (e.g. location), while attributeValue returns an integer between 1 and 5 (e.g. 4) giving the rating value.
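The role of the dv initialisation and of the strict ‘<’ comparison can be checked with a small Python sketch. The helper names `initial_dv` and `source_neighbour` are hypothetical and exist only for this illustration. The sketch tracks a single attribute, with the neighbours’ values listed in order of decreasing overall similarity.

```python
from typing import List, Optional


def initial_dv(possible_values: List[int]) -> int:
    """Smallest safe integer initialisation: one more than the widest possible gap."""
    return max(possible_values) - min(possible_values) + 1


def source_neighbour(query_value: int, neighbour_values: List[int], dv: int) -> Optional[int]:
    """Index (0 = best match) of the neighbour whose chunks are kept, or None if none qualifies."""
    chosen = None
    for index, value in enumerate(neighbour_values):
        diff = abs(query_value - value)
        if diff < dv:  # strict test: a later tie never displaces an earlier case
            dv = diff
            chosen = index
    return chosen


ratings = [1, 2, 3, 4, 5]
dv0 = initial_dv(ratings)                    # 5 for ratings between 1 and 5

print(source_neighbour(1, [5, 5], dv0))      # 0: best match is the default
print(source_neighbour(1, [5, 5], 4))        # None: dv initialised too small
print(source_neighbour(3, [4, 2], dv0))      # 0: a tie keeps the more similar case
print(source_neighbour(3, [4, 2, 3], dv0))   # 2: an exact match further out wins
```

The second call shows why the initial value must exceed the full rating range: with dv = 4 and a worst-case gap of 4, the condition is never true and no sentences would be proposed for that attribute.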

In document Case reuse in textual case-based reasoning. (Page 91-94)