UNIVERSITY OF MELBOURNE
DOCTORAL THESIS
Significant Revision Identification between
Revised Texts in a Multi-Author
Environment
Author: Ping Ping TAN
ORCID: 0000-0003-3798-0199
Supervisor: Karin VERSPOOR, Tim MILLER
A thesis submitted in total fulfilment of the requirements for the degree of Doctor of Philosophy
in the
School of Computing and Information Systems
UNIVERSITY OF MELBOURNE
Abstract
School of Computing and Information Systems
Doctor of Philosophy
Significant Revision Identification between Revised Texts in a Multi-Author Environment
by Ping Ping TAN
Despite advances in collaborative writing tools, the track changes capability remains limited to highlighting syntactic changes, and authors are still required to manually read through each revision. We envision a collaborative authoring system where an author could accept all minor edits first and then focus on the substantial changes. The primary goal of this thesis is to develop a computational framework for significant revision identification where paraphrase approaches cannot fully support such identification. An existing taxonomy of revision analysis categorises revisions into surface (i.e. no meaning) and text-base (i.e. meaning) changes, with surface changes further categorised into formal changes and meaning preserving changes, while text-base changes are sub-divided into micro-structure and macro-structure changes. However, the taxonomy lacks the detail needed for computational modelling. Through examination of works in the domain of psycho-linguistics, introspective analysis, and feedback from both authors and non-authors on what constitutes a significant revision, a conceptual framework for significant revision identification is proposed. An inter-rater agreement of Krippendorff's alpha = 0.745 was obtained for the annotation between the authors and non-authors. The core concept of our proposed approach is bi-directional textual entailment assessment. We demonstrated that this concept is computationally feasible by relying on existing textual entailment systems. Our proposed approach is more accurate (micro-averaged F1 = 0.541) than several baseline approaches based on edit distance, which are similar to the current track changes capability built into most word processors. Computationally identifying significant revisions between two versions of a text document has the potential to improve the revision process in a multi-author environment when multiple revisions are made by different authors.
Declaration of Authorship
I, Ping Ping TAN, declare that this thesis titled, “Significant Revision Identification between Revised Texts in a Multi-Author Environment” and the work presented in it are my own. I confirm that:
• due acknowledgement has been made in the text to all other material used; and
• the thesis is fewer than the 100 000 word limit in length, exclusive of tables, maps, bibliographies and appendices.
Signed: Tan Ping Ping
Preface
The following peer-reviewed publications were published in the candidature:
Tan, P. P., Verspoor, K. & Miller, T. (2015). Structural alignment as the basis to improve significant change detection in versioned sentences. In Proceedings of the Australasian Language Technology Association Workshop 2015 (pp. 101-109).
Tan, P. P., Verspoor, K. & Miller, T. (2016). Rev at SEMEVAL-2016 Task 2: Aligning chunks by lexical, part of speech and semantic equivalence. In Proceedings of the 10th International Workshop on Semantic Evaluation (SemEval-2016) (pp. 777-782).
Acknowledgements
My PhD journey has been an amazing journey and I am forever grateful to everyone who has been on this journey with me. To my dearest supervisors Karin Verspoor and Tim Miller, thank you very much for your patience and guidance. I am thankful to all the annotators: Philip, Liz, Julian, Jan, Oliver, Marco, Bahar, Jey Han and all those who contributed through the online survey. Special thanks to Justin, Rhonda, Julie, Steven and all of the supportive staff of UNIMELB. Thank you to my lab mates Miji, Doris, Nitika, Mohammad, Fei, Siva, Yitong, Qingyu, Diego, Long, Aili, Wei Hao and Wenxi. Thank you to my dearest housemate Alex and my uncles, aunties and cousins in Melbourne. Thank you very much to my dearest mum, dad, sis, bro, bro in law, Aidan and Leonard. Not forgetting Zilu, Elly, Alex, friends and supportive colleagues at UNIMAS.
Contents
Abstract
Declaration of Authorship
Acknowledgements
1 Introduction
1.1 Motivation
1.2 Background of Study
1.3 Collaborative Writing
1.4 Significant Revisions Identification between Revised Text Documents
1.5 Aims and Objectives
1.6 Research Approach
1.7 Thesis Overview
2 Literature Review
2.1 Theoretical Analysis of Text Revision Changes
2.1.1 A Taxonomy for Analysing Revision
2.1.2 Micro- and Macro-structure in Written Discourse
2.2 Automatic Classification for Various Types of Edits in Text Revision
2.3 Measuring and Scoring Edits
2.3.1 Edit Distance
2.3.1.1 Levenshtein’s Edit Distance
2.3.1.2 Word Error Rate
2.3.2 String Similarity Measurement
2.3.2.1 Jaro-Winkler Similarity
2.3.2.2 Normalised Edit Distance
2.3.3 Sentence Similarity Measurement
2.3.4 Pearson Correlation Coefficient
2.3.5 Scoring of Edit Importance
2.4 Text Revision Processing
2.4.1 Summarisation and Visualisation in Collaborative Writing
2.4.2 Diff Utility
2.4.3 Sentence Alignment
2.5 Evaluation of Text Revision Classification
2.5.2 Evaluation Measurements for Revision Classification
2.6 Meaning Change Identification
2.6.1 Paraphrase Recognition
2.6.2 Recognition of Textual Entailment
2.6.2.1 Tree Edit Distance
2.6.2.2 Transformation Based
2.6.2.3 Classification
2.7 Chapter Summary
3 A Conceptual Framework for Revision Types Categorisation
3.1 An Overview of Revision Types Categorisation Conceptual Framework
3.2 Inferring Meaning Change in a Text Discourse using Textual Entailment
3.3 Corpus I: Versioned Use Case Specifications
3.4 Introspective Assessment
3.4.1 Assessment Scope
3.4.1.1 No Change
3.4.1.2 Local Change
3.4.1.3 Global
3.4.2 Advanced Edit Operation
3.4.3 Bi-directional Textual Entailment
3.5 Human Feedback on Meaning Change in Text Revision
3.5.1 Authors’ Perception of Meaning Change in Text Revisions
3.5.2 Authors versus Non-authors’ Perception of Meaning Change in Text Revision
3.6 2-Category and 4-Category Meaning Change in Text Revisions
3.7 Preliminary Comparison - Similarity and Alignment
3.8 Derivation of the Different Kinds of Revision Changes
3.8.1 Formal and Meaning Preserving Changes
3.8.2 Micro-structure Change
3.8.3 Macro-structure Change
3.9 Chapter Summary
4 Significant Revision Identification Computational Framework
4.1 Overview of Significant Revision Identification Computational Framework
4.2 Versioned Texts Pre-processing
4.3 Textual Entailment Evaluation Phase
4.4 Revision Type Categorisation Phase
4.4.1 Bi-directional Textual Entailment Evaluation Component
4.4.2 Surface Change: Differentiation between Formal and Meaning Preserving Changes
5 Development of Comparison Data and Baseline Comparison
5.1 Corpus II: Drafts of Academic Papers
5.2 Human Annotation of Significant Revisions
5.2.1 Annotation Guidelines
5.2.2 Annotation Process
5.3 Inter-annotator Reliability for Human Annotation of Revision Types
5.4 Baselines
5.4.1 Correlation between Human Annotation and Levenshtein’s Distance at Word And Character Level
5.4.2 Baseline Methods
5.5 Chapter Summary
6 A Case Study of Significant Revision Identification
6.1 Revision Type Categorisation General Process Flow
6.2 Significant Revision Identification Experimental Setup
6.2.1 Versioned Texts Pre-processing
6.2.2 Recognition of Textual Entailment
6.2.3 Classification of Revision Type
6.2.4 A Revised Sentence Pair Example for Significant Revision Identification
6.3 Baseline Experimental Setup
6.4 Revision Type Classification Results and Analysis
6.4.1 Tree Edit Distance
6.4.2 Different Feature Sets in Classification Based Entailment Decision Algorithms
6.4.3 Knowledge-based Transformations
6.4.4 Levenshtein’s Edit Distance based Approaches
6.5 Surface Change: Distinguishing Formal and Meaning Preserving Changes
6.6 Micro-structure Change Categorisation
6.7 Macro-structure Change as Significant Revision
6.8 Surface change vs Text-Base change
6.9 Other Observed Revisions and Entailment Decision Algorithm
6.10 Limitations of Recognition of Textual Entailment System
6.11 Chapter Summary
7 Conclusion, Contributions and Future Work
7.1 Summary of Chapters
7.2 Contributions
7.3 Limitations and Future Work
7.4 Closing Remark
A Author Feedback Form
B Non-author Feedback Form
C Significant Revision Identification Annotation Guidelines
C.1 Introduction
C.2 Types of Meaning Change in Revision
C.3 Main Annotation Steps
List of Figures
1.1 A taxonomy for analyzing revision (Faigley and Witte, 1981)
1.2 An Overview of the Research Methodology
2.1 An overview of Literature Review Chapter
2.2 A taxonomy for analyzing revision (Faigley and Witte, 1981)
2.3 Topic Evolution Chart of four topics (T1, T2, T3 and T4) for 17 versions according to the percentage the topic is covered in a version (Southavilay et al., 2013)
2.4 LaTeXDiff Output
3.1 Revision Type Categorisation Conceptual Framework adapted from Faigley and Witte (1981) by applying bi-directional textual entailment concepts
3.2 Authors’ Ratings
3.3 Differences in Authors’ Ratings
3.4 Non-Authors’ Ratings
3.5 Authors versus Majority Ratings
4.1 The process of developing Significant Revision Identification from Taxonomy to Computational Framework
4.2 Significant Revision Identification Computational Framework
4.3 The process flow in Versioned Texts Pre-processing Phase
4.4 Sample Output of LaTeXDiff between the original text, v_o, and revised text, v_r. Red strike off shows deletion, while blue curly underline shows addition, black text is unchanged text
4.5 The process flow in Textual Entailment Evaluation Phase
4.6 The process flow in the Revision Type Classification Phase
6.1 Revision Type Categorisation General Process Flow
6.2 Experimental setup that consists of three main phases to investigate different entailment decision algorithms (presented using red box) on classification of revision type
6.3 Baseline approaches experimental setup
6.4 Micro- and macro averaged F1-score for the overall surface and text-based changes categorisation results for Annotator 1
List of Tables
1.1 Diff between Original and Revised Sentences
2.1 Evaluation Measures with µ as micro-averaging and M as macro-averaging (Sokolova and Lapalme, 2009)
3.1 Bi-directional Textual Entailment in relation to Revision Changes
3.2 Changes Statistics for OWS Use Case Specifications: Pre-Operative Planning for Hip Version 0.9 and Version 1.0
3.3 Examples of Versioned Sentence Pairs
3.4 Example of Local and Global Assessments
3.5 Examples of sentence revision according to revision type as presented in the introductory page of the questionnaire
3.6 Inter-rater reliability measurements from the feedback of two authors
3.7 Inter-rater reliability measurements for feedbacks from the Non-author Participants
3.8 Comparison of various approaches to support identification of significant changes with the correlation coefficient against human feedback on significance
3.9 Different kinds of revision changes based on feedback by human with the related entailment outcome
4.1 Core in the conceptual framework for significant revision identification
4.2 Segmented Sentence from diff output
4.3 Example of sentence pair from segmentation process
4.4 Inputs to RTE System
4.5 Example of Input and Output for RTE Phase
4.6 Example of Input and Output for Bi-direction Entailment Evaluation Component
4.7 Example of Input and Output for Formal and Meaning Preserving Change Differentiation Component
5.1 Corpus Summary for drafts of Academic Papers
5.2 Qualitative Questions for Human Annotation of Significant Revision Identification
5.3 Inter-Annotators Reliability Measurement for Revision Type Categorisation for Drafts of Academic Papers
5.4 Revision Types Distribution for Corpus II as annotated by Human Annotators
5.5 Sample Revision Sentences from Corpus II
5.6 Correlation between Levenshtein’s Distance and Revision Types
5.7 Range settings algorithms for each paper
5.8 LvDWord Range for Paper 1, Paper 2 and Paper 3
5.9 LvDChar Range for Paper 1, Paper 2 and Paper 3
6.1 Significant revision identification results against annotation by annotator 1, A1 for micro- and macro-averaged Precision, Recall and F1-score
6.2 Significant revision identification results against annotation by annotator 2, A2 for micro- and macro-averaged Precision, Recall and F1-score
6.3 Confusion Matrix for SigRevTED
6.4 Confusion Matrices for SigRevMaxEnt, SigRevMaxEntWNVO and SigRevMaxEntAll with cells filled with blue colour are true positives as compared to annotator 1 and 2 for the respective revision types: formal change (FC), meaning preserving change (MPC), micro-structure change (MiSC) and macro-structure change (MaSC)
6.5 Confusion Matrix for SigRevBIUTEE
6.6 Confusion Matrices for Levenshtein’s Word and Character Level
6.7 Performance for formal (FC) and meaning preserving (MPC) changes against annotation by annotator 1, A1 for Precision, Recall and F1-score
6.8 Performance for formal (FC) and meaning preserving (MPC) changes against annotation by annotator 2, A2 for Precision, Recall and F1-score
6.9 Performance for micro-structure change (MiSC) against annotation by annotator 1, A1 for Precision, Recall and F1-score
6.10 Performance for micro-structure change (MiSC) against annotation by annotator 2, A2 for Precision, Recall and F1-score
6.11 Performance for macro-structure change (MaSC) against annotation by annotator 1, A1 for Precision, Recall and F1-score
6.12 Performance for macro-structure change (MaSC) against annotation by annotator 2, A2 for Precision, Recall and F1-score
6.13 Surface and text-base revision types Precision, Recall and F1-score categorisation results comparing between SigRevTED, SigRevMaxEnt, SigRevMaxEntVOWN, SigRevMaxEntAll, SigRevBIUTEE, LvDWord and LvDChar
6.14 Different kinds of revision changes in relation to the strategy used in entailment decision algorithm
Chapter 1
Introduction
1.1 Motivation
Revised or versioned text documents are texts that have been changed from the original text, where the original source texts are available. Some revisions of documents merely re-phrase or improve writing style, while others can change the meaning of passages (i.e. significant revisions). Revision of documents is commonly practised in many contexts, such as academic writing, legal document preparation, policy refinement, and software requirements review, which generally involve multiple authors.
An edit is defined as a change that involves operations such as insertion, deletion or substitution of characters or words within a revised text. Authors can make multiple edits to the same text document, and especially in a multi-author environment, multiple edits by different authors can complicate the revision process.
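To make the notion of an edit concrete, the following minimal Python sketch counts the insertions, deletions and substitutions needed to turn one sequence into another (the standard Levenshtein edit distance), at either character or word level. The function name `levenshtein` and the example sentences are illustrative assumptions and are not taken from the thesis implementation.

```python
def levenshtein(source, target):
    """Minimum number of insertions, deletions and substitutions
    needed to turn `source` into `target`.

    Works on any sequences: pass strings for a character-level
    distance or lists of tokens for a word-level distance.
    """
    # previous[j] holds the distance between the processed prefix of
    # `source` and the first j items of `target`.
    previous = list(range(len(target) + 1))
    for i, s_item in enumerate(source, start=1):
        current = [i]  # turning i source items into an empty target: i deletions
        for j, t_item in enumerate(target, start=1):
            substitution_cost = 0 if s_item == t_item else 1
            current.append(min(
                previous[j] + 1,                      # deletion
                current[j - 1] + 1,                   # insertion
                previous[j - 1] + substitution_cost,  # substitution or match
            ))
        previous = current
    return previous[-1]


# Hypothetical sentence pair: one inserted word reverses the meaning.
original = "the login is optional"
revised = "the login is not optional"

char_distance = levenshtein(original, revised)
word_distance = levenshtein(original.split(), revised.split())
```

In this hypothetical pair the word-level distance is a single insertion, even though the meaning of the sentence is reversed, which illustrates why a count of edits alone says little about the impact of a revision.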
Most current collaborative editors are enhancements to text processors, for example Microsoft Word¹ and Overleaf², and provide the capability to track which author made a change. More advanced versioned document tools that are used for version control, such as Apache Subversion³, serve not only as a repository for versioned documents but also as an administrative platform to enforce good versioning practices, for instance standard naming of files and document revision history. In addition, these tools provide the capability to link multiple documents together. When a change occurs, other users may also be notified of the change. Despite advances in these tools, the track changes capability remains limited to highlighting edits at character and word level. In addition to the track changes feature, current word processors have grammar and spell checker features, which also operate at word or character level. Hence, users must still manually go through each edit. Furthermore, with the current track changes feature, authors are still required to read the sentences surrounding the edits in order to make sense of the changes, regardless of how small the revision may be. When multiple revisions occur, this task can be overwhelming, especially when multiple authors are involved in the writing process. When revising a document within a limited time, some changes can easily be overlooked, and the consequences can be more severe if a meaning change goes unnoticed. If versioned document tools were able to automatically differentiate between edits with and without meaning change, we hypothesise that this would improve revision efficiency in terms of attention and time, as authors could concentrate on edits with meaning change; this may be especially helpful when a draft by one author is passed to another author.
¹ https://office.live.com/start/Word.aspx
² https://www.overleaf.com
The presentation of the track changes feature in most word processors is quite similar: characters or words that have been added or deleted are highlighted. Such presentations are normally generated using the diff utility, which compares the two versions of the revised text. An example of diff output between an original and a revised sentence is provided in Table 1.1. s_o and s_r are syntactically similar but contain superficially minor differences (see the highlighted words) that nevertheless change the meaning substantially. In this case, the login process is revised to be a compulsory step. These types of sentences are common in revised documents, which makes it challenging to compute meaning change. For example, inserting the word ‘not’ is a small syntactic change with a large semantic meaning change. In addition, since edits are widely available, can edits alone be used to assess the impact of revision changes? This research investigates how edits and the words surrounding the edits can support the task of identifying significant revision.
TABLE 1.1: Diff between Original and Revised Sentences
Original Sentence, s_o: Surgeon authentication, e.g. user id and password, may be performed for safety and data security reasons
Revised Sentence, s_r: Authentication, e.g. user id and password, is performed for safety and data security reasons
Diff Output
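A word-level comparison of the kind summarised in Table 1.1 can be approximated with Python's standard `difflib` module. The sketch below is only an illustration under stated assumptions: the helper name `word_diff` and whitespace tokenisation are choices made here, and the thesis itself refers to the diff utility rather than to this code.

```python
import difflib


def word_diff(original, revised):
    """Return the non-matching word spans between two sentences.

    Each result is a tuple (operation, original_words, revised_words),
    where operation is 'replace', 'delete' or 'insert'.
    """
    a = original.split()
    b = revised.split()
    matcher = difflib.SequenceMatcher(a=a, b=b, autojunk=False)
    edits = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            continue  # unchanged words are not edits
        edits.append((tag, " ".join(a[i1:i2]), " ".join(b[j1:j2])))
    return edits


# Sentence pair from Table 1.1.
s_o = ("Surgeon authentication, e.g. user id and password, may be "
       "performed for safety and data security reasons")
s_r = ("Authentication, e.g. user id and password, is "
       "performed for safety and data security reasons")

edits = word_diff(s_o, s_r)
for operation, old_words, new_words in edits:
    print(f"{operation}: '{old_words}' -> '{new_words}'")
```

The comparison reports two short word spans that differ ("Surgeon authentication," versus "Authentication," and "may be" versus "is"), but gives no indication that the second one turns an optional step into a compulsory one.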
Automatic identification of significant revisions between two versions of a text document will assist authors in making better informed decisions about whether recent changes are major or minor. Helping authors to assess whether or not a revision changes meaning can be useful in prioritising revisions, especially when drafting among multiple parties. This can reduce the time an author spends reviewing edits, especially where documents are thousands of pages long and where changes can have profound impact, such as public policy documents, where changes can affect how a government mandate is operationalised, or in an education environment, where editorial changes to student work could inform areas where instructors should focus their teaching.
There are works that automatically classified user edits such as factual and fluency edits (Bronner and Monz, 2012; Daxenberger and Gurevych, 2013), and students’ revision behaviour (Zhang and Litman, 2015), while Goyal et al. (2017) look into certain
automatic classification of revisions based on minor and major meaning change or what we defined as significant change identification.
Previous work on revision, whether based on automated or manual analysis, has acknowledged that there are both meaning and non-meaning affecting changes (Faigley and Witte, 1981; Bronner and Monz, 2012; Daxenberger and Gurevych, 2013; Zhang and Litman, 2015; Goyal et al., 2017). Faigley and Witte (1981) proposed a taxonomy to analyse revision according to the meaning change (Figure 1.1). They classified revision into several types. On a general scale, they defined surface changes as edits that improved readability without actually changing the meaning of the text, and text-base changes as edits that altered the original meaning of the text. These categories were sub-divided. The sub-categories for surface changes are formal change, which includes copy editing operations such as correction in spelling, tense, format, etc., and meaning preserving change, which includes re-phrasing. For text-base changes, the sub-categories are micro-structure change, or meaning-altering change which does not affect the original summary of the text, and macro-structure change, or major change which alters the original summary of the text.
FIGURE 1.1: A taxonomy for analyzing revision (Faigley and Witte, 1981)
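The two-level structure of this taxonomy can be written down as a small data structure. The following Python sketch is one possible encoding, assuming only what is stated above about the four sub-categories; the names `RevisionType`, `changes_meaning` and `alters_summary` are hypothetical and do not come from the thesis.

```python
from enum import Enum


class RevisionType(Enum):
    """The four leaf categories of the Faigley and Witte (1981) taxonomy."""
    FORMAL = "formal change"                          # FC: spelling, tense, format
    MEANING_PRESERVING = "meaning preserving change"  # MPC: re-phrasing
    MICRO_STRUCTURE = "micro-structure change"        # MiSC: summary unaffected
    MACRO_STRUCTURE = "macro-structure change"        # MaSC: summary altered


# Top-level grouping: surface changes keep the meaning of the text,
# text-base changes alter it.
SURFACE_CHANGES = frozenset({RevisionType.FORMAL,
                             RevisionType.MEANING_PRESERVING})
TEXT_BASE_CHANGES = frozenset({RevisionType.MICRO_STRUCTURE,
                               RevisionType.MACRO_STRUCTURE})


def changes_meaning(revision_type):
    """True for text-base changes, False for surface changes."""
    return revision_type in TEXT_BASE_CHANGES


def alters_summary(revision_type):
    """True only for macro-structure changes."""
    return revision_type is RevisionType.MACRO_STRUCTURE
```

For example, `changes_meaning(RevisionType.MICRO_STRUCTURE)` is true while `alters_summary(RevisionType.MICRO_STRUCTURE)` is false, which mirrors the distinction between the two text-base sub-categories.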
Framed by this taxonomy (Figure 1.1), this research investigates revision from a meaning change perspective. We explore how to identify revisions that have greater impact or are more significant than others and how to automatically differentiate revision types in a multi-author revision environment. On the whole, we hope this will improve the revision experience, especially when transitioning from one draft by one author to another.
1.2 Background of Study
Revision is defined as any change that occurs during the writing process including
error corrections, rephrasing and removing or replacing content (Fitzgerald, 1987).
Revision can be viewed in two parts (Fitzgerald, 1987):
• the changes made; the by-product of revision (i.e. revised documents)
• the mental workings of revision or in other terms, the processes involved in revision before making direct edits.
Revision, part of the writing process, is a multifaceted process (Faigley and Witte, 1981; Boiarsky, 1984; Hashemi and Schunn, 2014) where the writer is trying to articulate his/her thoughts. At the same time, the writer is reading and trying to see the text from the reader's perspective, while taking into account different considerations such as the subject matter, the knowledge of the reader and the style of writing. When the writing flow is not right, the process turns into a troubleshooting process, which leads to a problem solving process. Most of all, revision is a recursive process (Boiarsky, 1984; Fitzgerald, 1987). The complexity of the revision process cannot be easily comprehended even by expert writers, let alone novice writers (Faigley and Witte, 1981; Wallace and Hayes, 1991).
Some collaborative editors include a revision history of all user edits, including edit tags to maintain changes, for example Wikipedia⁴, creating a large pool of data useful for the purpose of classifying user edits such as factual and fluency edits (Bronner and Monz, 2012; Daxenberger and Gurevych, 2013). Other datasets are not as detailed as Wikipedia, as metadata is limited to the revision date and the author who made the revision (Southavilay et al., 2013). Requesting authors to mark up each of their revisions, such as grammar correction or re-phrasing, will disrupt their writing flow, not to mention that it will be time consuming if there are many revisions. Furthermore, these markup tags might not necessarily be usable for documents of different types.
1.3 Collaborative Writing
Collaborative writing (CW) is defined as two or more authors directly involved in collaborating to produce a written work (Storch, 2005; Ede and Lunsford, 1990; Noël and Robert, 2004). Our research focuses on existing versions of text documents produced in a multi-author environment; thus, this section reviews related works on collaborative writing such as writing strategies and tools to support collaborative writing or computer supported collaborative writing (CSCW)⁵. There are four CW strategies (Noël and Robert, 2004; Scheliga, 2015):
• One author produces a draft and passes it to another author, sequentially;
• Different authors write different parts of a text;
• Only one author writes the text but the text is extended or improved through group discussion;
• Multiple authors write synchronously.
⁴ https://en.wikipedia.org/wiki/Help:Page_history
⁵ This thesis focuses on collaborative writing (CW); hence, CSCW refers to Computer Supported Collaborative Writing or computer assisted tools for collaborative writing. This should not be confused with Computer Supported Cooperative Work, which Baecker et al. (1995) defined as “computer-assisted coordinated activity carried out by groups of collaborating individuals” and which covers a wide range of activities such as communication and problem-solving, including co-authoring a document.
Collaborative writing should not be confused with interactive writing (Button, Johnson, and Furgerson, 1996; Aditomo, Calvo, and Reimann, 2011; Mulligan and Garofalo, 2011; Storch, 2005; Yarrow and Topping, 2001). In interactive writing, an author is given feedback such as an opinion about their writing, but the person supplying that feedback is not directly involved in producing the piece of written work. This occurs in teacher feedback and peer review. CW strategies describe the possible ways authors may interact. When we consider the content of a text document revised by multiple authors (Table 1.1), we see that automated meaning change detection between revised texts would assist during the transition from one draft by an author to another.
A survey conducted by Noël and Robert (2004) found that despite the existence of specialized collaborative writing tools, most respondents reported still using individual word processors and email as their main tools for writing joint documents. Their findings indicated that users want more than just a tool to write together, and they recommended functions such as change tracking, version control, and synchronous work for collaborative writing tools. Currently, most CSCW tools incorporate those features and are widely available. Although Wikipedia is not a CSCW tool, CW strategies still apply, such as different authors collaboratively writing to contribute various parts of a text. Both Wikipedia and Google Docs are widely used as teaching tools to improve social interaction among writers (Bonk and King, 1995; Hadjerrouit, 2014; Parker and Chao, 2007; Sharples et al., 1993; Schà ˝uch, 2014; Zhou, Simpson, and Domizi, 2012). A more recent survey (Scheliga, 2015) obtained a finding similar to that of Noël and Robert (2004): writers use a text processor in combination with other digital technologies such as email and content sharing services, instead of using a CSCW tool.
Earlier research in CSCW focuses on supporting collaboration (Fish, Kraut, and Leland, 1988; Haake and Wilson, 1992; Sharples et al., 1993) and designing better user interfaces (Baecker et al., 1993). As technology advances, more research is focused on CSCW tools for the purpose of teaching and learning (Calvo et al., 2011; Parker and Chao, 2007; McWilliams et al., 2013; Hadjerrouit, 2014; Weiss, Urso, and Molli, 2007), including studies on behavioural aspects of CW such as frequency of revisions (Du et al., 2016), visualisation for interaction between authors (Biuk-Aghai, Kelen, and Venkatesan, 2008) and analysis of writing processes, for instance development of ideas during writing (Southavilay et al., 2013) and individual contribution
In summary, CW focuses on the interaction between authors during the writing process. Piolat (1991) stated that it was difficult to conclude with certainty that the use of word processors is always effective in improving writers' revision skills, or that their use necessarily leads to the production of higher quality texts. Even with audit trail data such as which version, which author, what has been edited and the timestamp, the writer lacks input as to whether there has been any substantive meaning change in a revision. Advanced features in CSCW tools have limited support for prioritising revisions and meaning change detection. In the subsequent section, we explore computational works predominantly associated with text revisions.
1.4 Significant Revisions Identification between Revised Text Documents
This section introduces issues related to the task of significant revision identification (SigRevId) between revised text documents, which we explore further in this thesis. The question of the significance of a revision is particularly challenging in a multi-author environment, as different authors might view the impact differently. The main question is how to determine, for computational implementation, what actually constitutes a significant revision, that is, a revision with larger impact compared to another. An attempt has been made to rate the importance of edits as very important, moderate important, important, neutral and not necessary (Goyal et al., 2017). The edit importance is rated by reviewers and is used to predict authors’ perception of edit importance. However, authors and reviewers might have different perceptions. Furthermore, edits that are more important might not necessarily have a higher impact of change and vice versa.
Previous works on revisions, whether based on automated or manual analysis, have acknowledged that there are both meaning and non-meaning affecting changes (Faigley and Witte, 1981; Bronner and Monz, 2012; Daxenberger and Gurevych, 2013). However, an automated classification approach is applied in computational works, while a linguistic approach is used in the taxonomy for analyzing revisions (Faigley and Witte, 1981) to categorise revisions into minor and major meaning change. We base significant revisions on the taxonomy (Faigley and Witte, 1981); thus, the challenge is how to integrate a linguistic approach into a computational method for identifying significant revision.
Revision varies widely, and classification of different revisions requires different annotated data even when the texts are the same. For example, annotated data is prepared for classification of the reason for revisions in students’ writing (Zhang and Litman, 2015; Zhang and Litman, 2014). We intend to explore meaning change detection based on linguistic approaches to identify significant revision, and a suitable corpus is required for this purpose. The challenge here is to propose an annotation scheme for significant revision identification on which authors will agree.
1.5 Aims and Objectives
This research aims to introduce the task of significant revision identification between two versions of a text document in a multi-author environment. We look at versions that come from the same lineage, where one version evolves to another version. For cases where the documents are from different sources, for instance privacy statements from different companies, we do not regard these as versioned documents for the purpose of this research because different companies can derive their own policy independently. However, if a policy is revised within the same company, the new policy is regarded as a versioned document of the original policy and falls within our research interest.
This research also aims to develop a computational approach to automatically identify significant changes between versions of a text document, where both ver-sioned documents and original source document are available. The computational al-gorithm developed in this research applies directly to the end product of revision (i.e. revised text documents) excluding external aspects of text revision such as intention of the revision. Our aim is to create a framework that uses linguistic approach with minimal annotated data as training data. However in order to evaluate the compu-tational approach, a corpus to evaluate the task significant revision identification will be prepared. By having such corpus available, various approaches can be compared to further improve the identification of significant revision.
The aim of computationally identifying significant revision changes is to be able to assist authors in making better decisions in response to the impact of change, es-pecially through prioritising revisions. Edits can be as short as inserting a character. Hence, our focus is on identifying significant revisions for the revised sentences be-tween two versions of a text as sentence-level lets authors comprehend the meaning of the changes better compared to edit (Zhang and Litman 2014).
The research questions explored in this thesis are as follows:
• What are the different kinds of revision changes to be considered as significant revision for revised text documents in a multi-author environment?
• Given two versions of a text document in a multi-author environment, how do we identify significant revisions?
• What are the factors in recognition of textual entailment that can support differentiation of revision changes between revised sentence pairs?
1.6 Research Approach
In this research, as the task of significant revision change identification involves formalising the various revision types for computational implementation, we use various methods: introspective assessment, user studies, a study of text revisions and various computational approaches. As our aim is to create a computational model that closely resembles how humans evaluate the impact of change, we use an understanding of the revision process and linguistic knowledge to derive our conceptual framework. The conceptual framework is validated through user studies with document authors and with readers (non-authors). To ensure that the computational model is generally applicable, we evaluate it on two different document types. As the computational model consists of various components, such as recognition of textual entailment systems, we experiment with various approaches to find a suitable one for our task.
An overview of the methodology is presented in Figure 1.2.
FIGURE 1.2: An Overview of the Research Methodology
1.7 Thesis Overview
In Chapter 2, a review of works related to text revision in a multi-author environment is presented. We divide this review into two main subsections of revision analysis research: manual revision categorisation and computational revision classification. This chapter also covers the supporting approaches that can assist in SigRevId.
In Chapter 3, we propose a conceptual model for categorising revision changes based on existing work from linguistics and provide a formal definition for the different types of revision changes. Based on feedback from authors and non-authors on revision changes, we discuss what constitutes a significant revision change. We perform introspective analysis of an existing corpus of closely related versioned use case specifications. Drawing on this introspective analysis, together with an analysis of existing psycho-linguistic models in which humans comprehend meaning word by word and then phrase by phrase, we highlight that meaning changes can be determined through assessment of both textual entailment directions at sentence level. In addition, the analysis shows the properties of this dataset as specific versioned documents. We derive the different types of revision according to the meaning changes, and provide examples for each revision type. The core of the revision classification is the assessment of both textual entailment directions of the revised sentences. As a result of the revision type classification, significant revision change is defined as major meaning change.
Conceptually, meaning changes can be determined through assessment of both textual entailment directions of revised sentences, as derived in Chapter 3. Based on this conceptual framework, we propose a computational framework to identify significant revision changes between revised documents in Chapter 4. As edits are widely available, we investigate edits in assessing the impact of change before exploring other components, such as the words surrounding the edits. Firstly, we explore the effect of scoping edits at phrase level. Then, through participation in the Semantic Textual Similarity task of Semantic Evaluation (Agirre et al., 2016) using a rule-based approach, we examine different similarity levels of chunks. These elements contribute to the formation of a computational framework for revision classification.
The conceptual framework proposed in Chapter 3 is derived from a specific type of versioned text document: software requirements specifications, namely use case specifications. In Chapter 5, we derive an annotation scheme to create a corpus purely for evaluation of our computational framework. The corpus we collected consists of academic papers, which are a different type of revised text document from the one used to derive the conceptual framework. We evaluate the effectiveness of the annotation scheme by measuring inter-annotator agreement.
Subsequently, in Chapter 6, we implement our computational framework using various approaches to recognition of textual entailment and evaluate it on the drafts of academic papers annotated earlier. This chapter demonstrates the feasibility of using bi-directional textual entailment evaluation in classifying different types of revisions, in addition to edits and other components beyond edits. Based on our results, the edit distance approach used in recognition of textual entailment systems is suitable for revision type categorisation. We demonstrate that revision type categorisation needs to consider more than edits alone, such as dependency trees and sentence entailment. However, edits can support detection of formal changes. Significant revision changes can be detected by establishing that there is no sentence entailment between the revised sentence pairs.
We conclude our research in the last chapter, considering the broad contribution of the thesis and discussing future work.
Chapter 2
Literature Review
In the previous chapter, we identified the limitations of existing collaborative authoring tools and proposed a potential approach to improve collaborative authoring systems: enabling authors to automatically accept edits that do not alter the meaning and to focus their cognitive attention on edits that do change the meaning of a document. Even though our aim is a computational framework for identifying significant revision changes between revised text documents, our assumption is that a model that resembles the natural process of identifying revisions in texts will be more intuitive and palatable to human users, so that human readers can directly understand the detected changes. Hypothetically, the identification of significant revisions will improve the revision experience for authors. In this chapter, related works on text revision are presented. This review is divided into three main parts:
1. Theoretical analysis from manual text revision research to provide linguistic context for our proposed framework,
2. Review of research on measuring edit importance, and
3. Related computational works addressing revision of texts including supporting methods to help us solve our central research problem of automating meaning change detection between versioned text documents.
An overview of this chapter is presented in Figure 2.1.
(Note: Throughout this thesis, the italic style is used to identify a new term: term, while examples of text revision are presented as: original sentence −→ revised sentence.)
2.1 Theoretical Analysis of Text Revision Changes
This section reviews revision research from a non-computational perspective. Text revision can be viewed as the attempt to improve existing written text. It may involve adding or deleting information, or merely re-phrasing so that the message becomes clearer. Our research aim is to create an automatic approach to significant revision change identification between revised text documents in a multi-author environment. However, there are more fundamental questions: how do we define the significance of a revision? Is there any existing definition for this? Theoretical research on text revision is analysed to help answer these questions and to serve as a foundation for analysing meaning change in text revision. To begin, we review a key study that provides a classification scheme for revision changes in terms of whether or not the change alters the meaning of the text (Faigley and Witte, 1981).
FIGURE 2.1: An overview of Literature Review Chapter
2.1.1 A Taxonomy for Analysing Revision
This section reviews a taxonomy for manual analysis of revision (Faigley and Witte, 1981) (Figure 2.2). This taxonomy differentiates between revisions with no meaning change (i.e. surface) and revisions with meaning change (i.e. text-base). Faigley and Witte (1981) defined surface change (SC) as a revision in which no new information is added and no old information is removed. This type of change is divided into formal and meaning preserving changes. Formal change (FC) includes most conventional copy-editing operations. According to the Society of Editors and Proofreaders (Standards director and Ltd, 2016), copy-editing means copying from an existing raw text and checking for consistency and accuracy in preparation for publication. Changes that fall under FC are revisions such as spelling, tense correction, consistent numbering and modality, abbreviation, punctuation, and format.
As defined in the taxonomy (Faigley and Witte, 1981), meaning preserving change (MPC) includes paraphrases of the concepts in the text without altering those concepts. The taxonomy includes six revision operations: additions, deletions, substitutions, permutations, distributions and consolidations. Faigley and Witte (1981) described each of these operations with examples. Addition is defined as “raise to the surface what can be inferred”:
FIGURE 2.2: A taxonomy for analyzing revision (Faigley and Witte, 1981)
Deletion is described as doing the opposite of addition, thus, “a reader is forced to infer what had been explicit”:
several rustic looking restaurants −→ several rustic restaurants.
Substitution “trades words or longer units that represent the same concept”:
out-of-the-way spots −→ out-of-the-way places.
Permutation “involves rearrangements or rearrangements with substitutions”:
springtime means to most people −→ springtime, to most people, means.
Distribution “occurs when material in one text segment is passed into more than one segment or falls into more than one unit”:
I figured after walking so far the least it could do would be to
provide a relaxing dinner since I was hungry. −→ I figured the least
it owed me was a good meal. All that walking made me hungry.
Consolidation does the opposite of distribution; “elements in two or more units are consolidated into one unit or can be viewed as exercise to combine sentences”:
And there you find Hamilton’s Pool. It has cool green water
surrounded by 50-foot cliffs and lush vegetation. −→ And there you
find Hamilton’s Pool: cool green water surrounded by 50-foot cliffs and
lush vegetation.
In their examples, addition, deletion and substitution tend to involve words, permutation tends to involve phrases, and distribution and consolidation tend to involve sentences.
When the definitions given in Faigley and Witte (1981) are analysed, no meaning change is defined as bringing no new information to a text and removing no old information, while meaning change is defined as the addition of new content or the deletion of existing content in such a way that it cannot be recovered through drawing inference. Here, whether information is new or old is based on the assessment of the texts before and after revision. The notion of information is vague, and it is unclear how inference is to be drawn and how this can be made computationally feasible. Nevertheless, the track changes feature is helpful for presenting the texts before and after revision for comparison.
Text-base change (TBC), or revision with meaning change, is divided into micro-structure change (MiSC) and macro-structure change (MaSC). Faigley and Witte (1981) define major and minor revision changes using Kintsch and Van Dijk’s model (Kintsch and Van Dijk, 1978) for comprehending and producing text. In this linguistics and cognitive psychology model, readers are said to comprehend a text phrase-by-phrase and at the same time derive some overall notion of the text, called the gist or topic of the text. The meaning of a text is processed at two levels: micro-structure and macro-structure. Micro-structure is all the concepts in the text, both explicit and inferred. Macro-structure characterises the discourse as a whole and represents the “gist” of the text, such as a series of labels for sections of a text or a plot outline. Although Faigley and Witte (1981) stated that macro-structure is a summary of the text, they explained that there is a difference between a summary and macro-structure. Macro-structure can be abstracted from the propositions of a text using a series of rules, which we review in the subsection below.
Faigley and Witte (1981) explained that, to distinguish between micro- and macro-structure changes, MiSC would not affect a summary of a text, while MaSC changes the summary. However, MPC falls under the same circumstances as MiSC, that is, MPC does not affect the summary. As stated, the difference between MPC and TBC is that TBC affects the concepts in the text. They did not provide additional explanation of how the concepts are affected.
Faigley and Witte (1981) added a further way to differentiate between micro- and macro-structure changes, stating that “constructing summaries for entire texts is to determine if the concepts involved in a particular change affect the reading of other parts of the text.” If the entire text is summarised, how do we determine the length or scope of the summary suited to computational implementation? Furthermore, it will not be obvious which part of the text is affected by a particular change, which poses a challenge for computational implementation. Faigley and Witte (1981) stated that micro-structure is all the concepts that can be inferred. This leads to the question of whether there are concepts that do not affect the reading of other parts of the text and, if there are, how much of the affected text should be read. Southavilay et al. (2013) have shown that there is substantial topic overlap when two revised texts are compared. Hence, when comparing two versions of a text document, the summarisation approach might not be an effective way to determine whether the concepts involved in a particular change affect the reading of other parts of the text, because the revised texts can be very similar.
To recap, no meaning change and meaning change are defined as follows:
No meaning change new information is brought to the text or old information is removed in such a way that it can be recovered through drawing inference
Meaning change new information is brought to the text or old information is removed in such a way that it cannot be recovered through drawing inference
Faigley and Witte (1981) used Kintsch and Van Dijk’s (1978) theoretical model to explain meaning at two levels: the microstructure level is all concepts in a text, including those that can be inferred, while the macrostructure level represents the “gist” of the text. Based on the theoretical model (Kintsch and Van Dijk, 1978), the gist, or topic, can be thought of as a series of labels for sections in a text. Macrostructure is essentially the summary of a text, and an example of macrostructure is a plot outline. Faigley and Witte (1981) agreed that macrostructure theory is useful but inadequate for distinguishing minor and major revision changes. Although Faigley and Witte (1981) considered micro-structure change a minor change of meaning and macro-structure change a major change of meaning, concise definitions are required in order to develop a computational implementation.
The summary of the taxonomy for analysing revision (Faigley and Witte, 1981) is given in List 2.1.
LIST 2.1: Summary of taxonomy for analysing revision (Faigley and Witte, 1981)
• There are four types of revision change in text revision: Formal, Meaning Preserving, Micro-structure and Macro-structure (Figure 2.2).
• Formal change has no meaning change and is generally spelling and grammar correction, numbering, copy-editing changes such as formatting. Other than capitalisation, no other exceptional case is listed.
• Meaning preserving change is re-wording, re-phrasing or re-arrangement of sentences, including paraphrasing, that does not result in any meaning change. The examples supplied for addition, deletion, substitution and permutation are at word and phrase level, while consolidation and distribution are changes at sentence level.
• Micro-structure change is a meaning change that does not affect the summary of the text, where micro-structure covers all concepts in a text, including those that can be inferred. Micro-structure change is minor revision change.
• Macro-structure change is change that affects the summary or the “gist” of the text. Macro-structure change is major meaning change.
The attempt by Faigley and Witte (1981) to classify revision is helpful, and this theoretical analysis provides a fundamental understanding of meaning change in text revision. Even though they provided general definitions for the terms in their taxonomy, detailed specifications of micro- and macro-structure changes are lacking, in particular regarding what is considered “gist” or new information. Faigley and Witte (1981) suggested using a summary approach; however, a summary can be of a paragraph or of the overall text document. Furthermore, summaries, by definition, are precise descriptions which do not contain events or actions but rather exhibit general or global facts (Van Dijk, 1980). The underlying concept for the taxonomy is “whether new information is brought to the text or whether old information is removed in such a way that it cannot be recovered through drawing inference” (Faigley and Witte, 1981). Our aim is to propose a computational approach that can identify significant revisions, that is, revisions with a higher impact of change. Computationally, what is the approach to drawing inference? Thus, an automated system for identification of significant revisions cannot be built directly on top of this taxonomy. Clear definitions of MiSC and MaSC are crucial to ensure that a computationally implementable algorithm can be proposed to identify these revisions. The subsection below attempts to better understand micro- and macro-structure in discourse.
2.1.2 Micro- and Macro-structure in Written Discourse
Based on our earlier review of the taxonomy for analysing revision (Faigley and Witte, 1981), micro- and macro-structure changes lack the detailed definitions needed for computational implementation. Faigley and Witte (1981) proposed to use the two-level classification to explain meaning change in revision. This section reviews existing works that look into the micro- and macro-structure of written discourse.
Faigley and Witte (1981) proposed to use Kintsch and Van Dijk’s model (Kintsch and Van Dijk, 1978). According to this theoretical model, a set of propositions ordered by various semantic relations can be used to interpret the surface structure of a discourse. The relations can be either explicit or inferred with additional knowledge, such as context-specific and general knowledge. Micro-structure in a discourse is the local structure, for instance sentences and sequences of sentences, and includes cohesion, anaphora and inference (Van Dijk, 1980). When deducing meaning, local sentences and sentence connections alone are insufficient; instead, a broader sense or global meaning of the text is required, which is the macro-structure (Van Dijk, 1980).
Van Dijk (1980) proposed general rules that link textual propositions with macropropositions. These macropropositions are used to define the global topic of a fragment. The rules are considered semantic derivation or inference rules, by which macrostructures are derived from microstructures. These rules are based on the relation of semantic entailment; that is, they preserve both truth and meaning. Such semantic rules, which link text bases, or fragments of these, with macropropositions, are defined as macrorules. Some of the basic macrorules are:
Deletion or reduction For a sequence of propositions, one or more propositions which are unnecessary for interpreting other propositions in the text at the macro-structure level are deleted. The resulting macroproposition is entailed by the microstructural sequence.
Generalisation Propositions can be generalised to a single proposition at a higher level of abstraction, or a global concept. Only the joint sequence of propositions entails the global concept, and not each of the propositions in the sequence.
Construction A new proposition must be constructed, involving a new predicate to denote the complex event described by the respective propositions of the text. These respective propositions are considered as a joint sequence and are substituted by the newly constructed proposition, or macroproposition, which denotes a global fact of which the micropropositions denote normal components, conditions or consequences. The entailment relation holds between the sequence of propositions and the global concept in the knowledge set (or the lexicon), where, given the global concept, ideally the necessary propositions in the sequence can be specified.
Even though the macro-structure theory (Kintsch and Van Dijk, 1978) is referred to in the taxonomy (Faigley and Witte, 1981), Kintsch and Van Dijk’s micro- and macro-structures are based on propositions in a discourse rather than revisions of a discourse. The macrorules of reduction, generalisation and construction given in the theoretical model (Kintsch and Van Dijk, 1978) are too abstract for computational implementation. Micro- and macro-structure for revision changes remain without the detailed definitions needed for computational implementation. However, these theoretical understandings serve as the basis for conceptualising micro- and macro-structure revision changes for computational implementation of significant revision identification, which is explained further in the next chapter (Chapter 3).
2.2 Automatic Classification for Various Types of Edits in Text Revision
This section provides a review of approaches to automatic classification of different revision types. An edit segment has been defined by Bronner and Monz (2012) as a contiguous sequence of deleted, inserted or equal words obtained by comparing the original and revised texts. They further defined fluency edits as changes that improve style and readability, and factual edits as changes that alter the meaning. They used supervised classification to differentiate between fluency and factual edits in Wikipedia revisions. Daxenberger and Gurevych (2013) proposed a predefined taxonomy of 21 edit categories and used Wikipedia revision histories to perform supervised classification of revisions into these categories. Their 21-category taxonomy is divided into three main categories: text-base, surface and Wikipedia policy (vandalism and revert) edits, the last of which is not in the taxonomy for revision analysis (Faigley and Witte, 1981).
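To make the notion of an edit segment concrete, the following minimal sketch splits a pair of sentences into contiguous word-level segments. It is an illustration only and not the procedure used by Bronner and Monz (2012): it assumes whitespace tokenisation and relies on Python's standard difflib alignment, and the function name edit_segments is hypothetical. Note that difflib may also emit a "replace" tag, which corresponds to a deletion immediately followed by an insertion.

```python
from difflib import SequenceMatcher


def edit_segments(original, revised):
    """Split two sentences into contiguous word-level edit segments.

    Returns (operation, original_words, revised_words) triples, where the
    operation is one of difflib's tags: 'equal', 'delete', 'insert', 'replace'.
    Whitespace tokenisation is an assumption made for this sketch.
    """
    a, b = original.split(), revised.split()
    matcher = SequenceMatcher(a=a, b=b, autojunk=False)
    return [(tag, a[i1:i2], b[j1:j2])
            for tag, i1, i2, j1, j2 in matcher.get_opcodes()]


# Deletion example quoted from Faigley and Witte (1981) in Section 2.1.1.
segments = edit_segments("several rustic looking restaurants",
                         "several rustic restaurants")
# segments == [('equal', ['several', 'rustic'], ['several', 'rustic']),
#              ('delete', ['looking'], []),
#              ('equal', ['restaurants'], ['restaurants'])]
```

The segments only describe where the two texts differ; deciding whether a segment is a fluency or a factual edit still requires a classifier or a further analysis.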
Based on a 13-category taxonomy of the semantic intention behind edits in Wikipedia articles, Yang et al. (2017) built a computational classifier of intentions using labelled article edits. This model is used to investigate the effectiveness of edit intention: how different types of edits predict the retention of newcomers and changes in the quality of articles. In typical collaborative writing, authors do not vandalise their own writing; thus, categories such as vandalism and counter-vandalism are not considered. However, consideration should be given to the other 11 categories in comparison to the four-category taxonomy for analysing revision (i.e. formal, meaning preserving, micro- and macro-structure). Furthermore, as with the other supervised classification approaches for revision reviewed here, we foresee challenges in implementing their model (Yang et al., 2017), as such a model requires a large corpus of labelled data.
Faigley and Witte (1981) worked on manual revision analysis, while Daxenberger and Gurevych (2013) addressed computational revision analysis. Some of the definitions can be linked directly to the taxonomy for analysing revisions (Faigley and Witte, 1981): surface changes correspond to fluency edits, while text-base changes correspond to factual edits. Surface changes can also correspond to surface edits, which consist of paraphrases, spelling and grammar corrections, relocations and markup edits (Daxenberger and Gurevych, 2013). Other observable similarities between manual and computational revision works are the edit operations: addition, deletion and substitution (Dix, 2006; Faigley and Witte, 1981; Hashemi and Schunn, 2014; Zhang and Litman, 2014; Bronner and Monz, 2012). Other than the edit operations, the edit categories introduced for text-base edits in Daxenberger and Gurevych (2013) are not included in Faigley and Witte (1981). Daxenberger and Gurevych (2013) proposed that text-base edits include sub-categories for templates, references (internal and external links), files and information, each of which is further divided into insertion, deletion and modification types.
Collaborative editors such as Wikipedia and Google Docs not only track user edits, but also provide markups for the type of edits made in the document revision history (Bronner and Monz, 2012; Daxenberger and Gurevych, 2013; Southavilay et al., 2013). This information is valuable for supervised machine learning, where features are generated and used as a training set. The feature sets used are character-level, word-level, part-of-speech, named entity, acronym and language model features (Bronner and Monz, 2012). As not all edits can be labelled, Daxenberger and Gurevych (2013) proposed an ‘Other’ category.
One of the possible edit types that falls into the ‘Other’ category is vandalism and reverts. For a free online encyclopedia such as Wikipedia, where most people rely on the information shared, vandalism is a major issue and violating its policies can cause serious problems; thus, this edit category can be considered a significant change. On the other hand, in a more typical multi-author environment, which might not necessarily be published online or operate at such a scale, and where only the authors are allowed to contribute, there might not be any policy in place at all and the chances of vandalism are small. The Wikipedia policy edit category (Daxenberger and Gurevych, 2013) cannot be directly applied to all revisions in a multi-author environment, because the policy is intended to prevent vandalism, such as intentionally stated wrong facts, while in an atypical multi-author environment, it is a collaborative written work. Although edit categories have been proposed for a typical multi-author environment (Daxenberger and Gurevych, 2013), what is considered a significant revision in this context remains unknown.
Not all versioned text documents have well-tracked edits or an available revision history, as most revisions are still made using a word processor in combination with email or other sharing services (Scheliga, 2015). We therefore also review works related to computational methods that can assist in classifying text revision when edits and revision history are unavailable. The most relevant work addresses classification of the purpose of revisions in argumentative writing (Zhang and Litman, 2015). They proposed a text revision processing pipeline using supervised machine learning, exploring different features and supervised classification approaches to classify the reasons why writers make revisions. Their revision categories consist of two high-level categories, surface and text-base. The sub-categories for surface changes are organization, conventions/grammar/spelling and word usage/clarity, while the sub-categories for text-base changes are claims/ideas, warrant/reasoning/backing, rebuttal/reservation, general content and evidence. The broader categories in Zhang and Litman (2015), text-base and surface changes, correspond to Faigley and Witte’s (1981) taxonomy for revision analysis. However, the sub-categories are all different, and micro- and macro-structure changes cannot be directly compared to those sub-categories. The sub-categories require annotation in order to be differentiated. Furthermore, they do not consider the impact of revision change.
To summarise, there are works on classification of various types of edits (Bronner and Monz, 2012; Daxenberger and Gurevych, 2013; Zhang and Litman, 2015). However, these works do not look into the meaning change implications of these edits, and not all edits have markups. In contrast, we attempt to assist writers in a more meaningful way by presenting an assessment of the significance of each revision. This enables prioritisation of revision changes in multi-author revision. In the remainder of this chapter, we review possible computational components for building the framework for automatic identification of significant revisions between versioned text documents, which we use in our investigations in later chapters.
2.3 Measuring and Scoring Edits
When an author makes an edit, she might view the edit as important, while the co-authors of the same paper might not view it as equally important. Edit importance can thus be subjective, depending on the author. When humans are presented with an edit within a sentence, and the edit is not directly comprehensible from reading the edit alone (as presented in Figure 2.4), we tend to read the text surrounding the edit. When a sentence is added or deleted, a reader skims for similar sentence(s), if any exist. These sentences can be syntactically similar, or have similar or the same meaning. We summarise the other cases we need to consider in text revision below:
• sentences with high lexical overlap with minor edits that result in no meaning change, for instance spelling corrections
• sentences with high lexical overlap with minor edits that might change the overall meaning of the sentence
• sentences that have been revised using different words but whose meaning remains the same (high semantic similarity with possibly low lexical overlap), for example a paraphrase of a sentence
• sentences that have been revised entirely, although one or two words of overlap still exist.
Here, we review several possible ways to measure edits and score the importance of the edits based on the summary of the revised sentences.
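As a rough illustration of what "lexical overlap" in the cases above could mean operationally, the sketch below computes the Jaccard overlap between the word sets of two sentences. The choice of Jaccard overlap, the lower-casing, the whitespace tokenisation and the function name lexical_overlap are all assumptions made for this example; they are not measures prescribed by the works reviewed here.

```python
def lexical_overlap(original, revised):
    """Jaccard overlap between the lower-cased word sets of two sentences.

    Returns a value between 0.0 (no shared words) and 1.0 (identical word sets).
    """
    a = set(original.lower().split())
    b = set(revised.lower().split())
    if not a and not b:
        return 1.0  # two empty sentences are treated as identical
    return len(a & b) / len(a | b)


# Deletion example from Section 2.1.1: three of four distinct words are shared.
high = lexical_overlap("several rustic looking restaurants",
                       "several rustic restaurants")   # 0.75
```

A score of this kind cannot separate the first two cases in the list, since both have high lexical overlap but differ in whether the meaning changes, which is why lexical measures alone are insufficient for identifying significant revisions.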
2.3.1 Edit Distance
Edits are changes made to a text. The track changes feature built into word processors, especially in real-time collaboration environments such as Google Docs^1 and Overleaf^2, shows the edits made by authors. Edit distance (ED) is the minimum number of edits (deletion, insertion, or substitution) required to transform one string into another (Navarro, 2001). The underlying calculation of the track changes feature is assumed to be edit distance, similar to the diff approach, which compares two files to identify the changes made, and to spelling checkers (Gail et al., 2016). There are several variants of edit distance, such as Levenshtein’s edit distance (LvD), word error rate (WER), Jaro-Winkler distance and normalised edit distance. LvD and WER are reviewed in this section. Jaro-Winkler distance, and edit distance in general, can be normalised for use as a measurement of string similarity; these are reviewed in the string similarity section (Section 2.3.2).
2.3.1.1 Levenshtein’s Edit Distance
The Levenshtein’s edit distance (LvD) (Levenshtein, 1966) between two strings $a$ and $b$, where the lengths of $a$ and $b$ are $|a|$ and $|b|$ respectively, is given by $\operatorname{lev}_{a,b}(|a|,|b|)$, where

\[
\operatorname{lev}_{a,b}(i,j) =
\begin{cases}
\max(i,j) & \text{if } \min(i,j) = 0,\\[4pt]
\min
\begin{cases}
\operatorname{lev}_{a,b}(i-1,j) + 1\\
\operatorname{lev}_{a,b}(i,j-1) + 1\\
\operatorname{lev}_{a,b}(i-1,j-1) + 1_{(a_i \neq b_j)}
\end{cases} & \text{otherwise,}
\end{cases}
\tag{2.1}
\]

where $1_{(a_i \neq b_j)}$ is the indicator function equal to 0 when $a_i = b_j$ and equal to 1 otherwise, and $\operatorname{lev}_{a,b}(i,j)$ is the distance between the first $i$ characters of $a$ and the first $j$ characters of $b$. The more changes there are between two strings, the higher $\operatorname{lev}_{a,b}$.

^1 https://www.google.com/docs/about/
We use a pair of actual revised sentences, $s_o$ (original) and $s_r$ (revised), as an example to show LvD at word level. If there is no revision at all between $s_o$ and $s_r$, LvD($s_o$, $s_r$) = 0. If $s_o$ is revised to $s_r$ as follows:

$s_o$ = Surgeon authentication, e.g. user id and password, may be performed for safety and data security reasons.

$s_r$ = Authentication, e.g. user id and password, is performed for safety and data security reasons.

then LvD($s_o$, $s_r$) = 3, because there are two deletions (Surgeon and be) and one substitution (may → is).
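The recurrence in Equation 2.1 can be sketched as a short dynamic programme. The following is a minimal illustration rather than the implementation used in this thesis; the function names and the tokenisation (lower-cased, whitespace-separated tokens with punctuation left attached) are assumptions made for the example. Lower-casing is what lets "authentication," and "Authentication," count as the same word, which is needed to reproduce the distance of 3 reported above.

```python
def levenshtein(a, b):
    """Edit distance between two sequences (strings or lists of tokens)."""
    # prev[j] holds lev(i-1, j); row 0 is lev(0, j) = j.
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]  # lev(i, 0) = i
        for j, y in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,             # deletion
                curr[j - 1] + 1,         # insertion
                prev[j - 1] + (x != y),  # substitution, cost 0 on a match
            ))
        prev = curr
    return prev[-1]


def tokenize(sentence):
    # Assumption for this sketch: lower-cased whitespace tokens.
    return sentence.lower().split()


s_o = ("Surgeon authentication, e.g. user id and password, may be "
       "performed for safety and data security reasons.")
s_r = ("Authentication, e.g. user id and password, is performed for "
       "safety and data security reasons.")

print(levenshtein(tokenize(s_o), tokenize(s_r)))  # 3 (word level)
print(levenshtein("kitten", "sitting"))           # 3 (character level)
```

Because the function only compares elements for equality, the same code works at character level (strings) and at word level (token lists).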
2.3.1.2 Word Error Rate
Word error rate (WER) derives from Levenshtein’s edit distance and is commonly used to evaluate automatic speech recognition systems (Marzal and Vidal, 1993), where there are an automatically generated transcription and a reference transcript (McCowan et al., 2004). We consider WER because a revised sentence pair consists of an original and a revised sentence, which we can treat as the reference transcript and the automatically generated transcription, respectively. WER is computed as the edit distance between a reference word sequence and its automatic transcription, normalised by the length of the reference word sequence (Equation 2.2):

\[
\mathrm{WER} = \frac{S + D + I}{N} = \frac{S + D + I}{S + D + C}
\tag{2.2}
\]

where
S is the number of substitutions,
D is the number of deletions,
I is the number of insertions,
C is the number of correct words,
N is the number of words in the reference (N = S + D + C).
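To make the terms of Equation 2.2 concrete, the sketch below recovers S, D, I and C by tracing back through the word-level edit distance table. It is an illustrative sketch only: the function names are hypothetical, the original sentence is treated as the reference, and tokens are lower-cased whitespace tokens (all assumptions made for this example).

```python
def wer_counts(reference, hypothesis):
    """Return (S, D, I, C) from one minimum-cost word alignment."""
    n, m = len(reference), len(hypothesis)
    # dist[i][j]: edit distance between reference[:i] and hypothesis[:j]
    dist = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        dist[i][0] = i
    for j in range(m + 1):
        dist[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = reference[i - 1] != hypothesis[j - 1]
            dist[i][j] = min(dist[i - 1][j] + 1,
                             dist[i][j - 1] + 1,
                             dist[i - 1][j - 1] + cost)
    # Walk back from the bottom-right corner, counting each operation.
    subs = dels = ins = correct = 0
    i, j = n, m
    while i > 0 or j > 0:
        diagonal = i > 0 and j > 0
        if diagonal and reference[i - 1] == hypothesis[j - 1]:
            correct += 1
            i, j = i - 1, j - 1
        elif diagonal and dist[i][j] == dist[i - 1][j - 1] + 1:
            subs += 1
            i, j = i - 1, j - 1
        elif i > 0 and dist[i][j] == dist[i - 1][j] + 1:
            dels += 1
            i -= 1
        else:
            ins += 1
            j -= 1
    return subs, dels, ins, correct


def wer(reference, hypothesis):
    s, d, i, c = wer_counts(reference, hypothesis)
    if s + d + c == 0:
        raise ValueError("reference must contain at least one word")
    return (s + d + i) / (s + d + c)


ref = ("Surgeon authentication, e.g. user id and password, may be "
       "performed for safety and data security reasons.").lower().split()
hyp = ("Authentication, e.g. user id and password, is performed for "
       "safety and data security reasons.").lower().split()

print(wer_counts(ref, hyp))  # (1, 2, 0, 13)
print(wer(ref, hyp))         # 0.1875
```

For the sentence pair from Section 2.3.1.1 this gives S = 1, D = 2, I = 0 and C = 13, so N = 16 and WER = 3/16.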
2.3.2 String Similarity Measurement
String similarity measurement compares two strings and quantifies how similar they are (Lu et al., 2013). A similarity value of 0 indicates that the two strings are completely dissimilar, while a value of 1 indicates that they are identical; the closer the value is to 1, the more similar the two strings are. We consider similarity approaches that utilise edit distances because previous work on text revision focuses on edits (Bronner and Monz, 2012; Daxenberger and Gurevych, 2013; Goyal et al., 2017; Zhang and Litman, 2015). Here, we review Jaro-Winkler similarity and normalised edit distance, which can also be used for alignment of revised sentences.
2.3.2.1 Jaro-Winkler Similarity
Jaro-Winkler distance is another variant of edit distance. In order to measure the similarity of revised sentences, we consider the Jaro-Winkler (Winkler, 1990) string metric. The Jaro-Winkler algorithm is a modification of the Jaro algorithm (Jaro, 1989). The two measures are computed as follows:

\[
sim_j =
\begin{cases}
0 & \text{if } m = 0,\\[4pt]
\dfrac{1}{3}\left(\dfrac{m}{|s_1|} + \dfrac{m}{|s_2|} + \dfrac{m - t}{m}\right) & \text{otherwise,}
\end{cases}
\tag{2.3}
\]

where:
$|s_i|$ is the length of the string $s_i$;
$m$ is the number of matching characters, where two characters match if they are identical and no more than $\lfloor \max(|s_1|,|s_2|)/2 \rfloor - 1$ positions apart;
$t$ is half the number of transpositions, that is, half the number of matching characters that appear in a different order in the two strings.

\[
sim_w = sim_j + \ell\, p\,(1 - sim_j),
\tag{2.4}
\]

where:
$sim_j$ is the Jaro similarity for strings $s_1$ and $s_2$;
$\ell$ is the length of the common prefix at the start of the strings, up to a maximum of four characters;
$p$ is a constant scaling factor for how much the score is adjusted upwards for having common prefixes.
$p$ should not exceed 0.25, otherwise the similarity can become larger than 1. The standard value for this constant in Winkler’s work is $p = 0.1$.
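Equations 2.3 and 2.4 can be illustrated with a small character-level sketch. This is a minimal sketch under stated assumptions, not the implementation used in this thesis: the function names are hypothetical, matching characters are searched within the usual window of ⌊max(|s1|, |s2|)/2⌋ − 1 positions, and each character is matched greedily to the first unused candidate.

```python
def jaro(s1, s2):
    """Jaro similarity (Equation 2.3) at character level."""
    if not s1 or not s2:
        return 0.0
    window = max(max(len(s1), len(s2)) // 2 - 1, 0)
    matched1 = [False] * len(s1)
    matched2 = [False] * len(s2)
    m = 0
    for i, ch in enumerate(s1):
        lo = max(0, i - window)
        hi = min(len(s2), i + window + 1)
        for j in range(lo, hi):
            if not matched2[j] and s2[j] == ch:
                matched1[i] = matched2[j] = True
                m += 1
                break
    if m == 0:
        return 0.0
    # Matched characters in order of appearance in each string.
    seq1 = [c for c, hit in zip(s1, matched1) if hit]
    seq2 = [c for c, hit in zip(s2, matched2) if hit]
    # t is half the number of matched characters that are out of order.
    t = sum(a != b for a, b in zip(seq1, seq2)) / 2
    return (m / len(s1) + m / len(s2) + (m - t) / m) / 3


def jaro_winkler(s1, s2, p=0.1, max_prefix=4):
    """Jaro-Winkler similarity (Equation 2.4)."""
    sim_j = jaro(s1, s2)
    prefix = 0  # length of the common prefix, capped at max_prefix
    for a, b in zip(s1, s2):
        if a != b or prefix == max_prefix:
            break
        prefix += 1
    return sim_j + prefix * p * (1 - sim_j)


print(round(jaro("MARTHA", "MARHTA"), 4))          # 0.9444
print(round(jaro_winkler("MARTHA", "MARHTA"), 4))  # 0.9611
```

For "MARTHA" and "MARHTA", all six characters match (m = 6) and the pair T/H is swapped (t = 1), giving a Jaro similarity of 17/18; the shared prefix "MAR" (ℓ = 3) then raises the score with p = 0.1.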
2.3.2.2 Normalised Edit Distance
Levenshtein’s edit distance (Section 2.3.1.1) values are normalised (Equation 2.5) to [0, 1] (Attig and Perner, 2011), and the result is used as a string similarity measurement (Navarro, 2001). Conceptually, when these values are applied to string similarity, a value of 1 indicates complete lexical overlap and a value of 0 indicates no lexical overlap; values closer to 0 indicate less lexical overlap and values closer to 1 indicate higher lexical overlap. When applied to revised sentences, sentence pairs with high lexical overlap and only minor edits are likely to have high string similarity values.

\[
1 - \frac{\text{edit distance}}{\text{length of the longer of the two strings}}
\tag{2.5}
\]
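As a rough illustration of Equation 2.5, the sketch below normalises the Levenshtein distance by the length of the longer sequence. The function names are hypothetical, the choice to return 1.0 for two empty inputs is an assumption, and the sentences are compared as lower-cased whitespace tokens (also an assumption of this example).

```python
def levenshtein(a, b):
    """Edit distance between two sequences (strings or lists of tokens)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,              # deletion
                            curr[j - 1] + 1,          # insertion
                            prev[j - 1] + (x != y)))  # substitution
        prev = curr
    return prev[-1]


def normalised_edit_similarity(a, b):
    """1 - edit distance / length of the longer sequence (Equation 2.5)."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0  # assumption: two empty sequences are identical
    return 1 - levenshtein(a, b) / longest


s_o = ("Surgeon authentication, e.g. user id and password, may be "
       "performed for safety and data security reasons.").lower().split()
s_r = ("Authentication, e.g. user id and password, is performed for "
       "safety and data security reasons.").lower().split()

print(normalised_edit_similarity(s_o, s_r))                      # 0.8125
print(round(normalised_edit_similarity("kitten", "sitting"), 4)) # 0.5714
```

The sentence pair from Section 2.3.1.1 scores 1 − 3/16 = 0.8125, a high similarity even though replacing "may be" with "is" changes what the sentence asserts.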
Nevertheless, edit distance based approaches only indicate surface changes and cannot measure meaning change. This shows the limitation of current change detection features in supporting meaning change detection. Conceptualising edit importance is tricky: if edit importance is measured according to word overlap, edit distance is a good indicator; however, if edit importance is based on meaning change, edit distance alone might not be that helpful.