4.3 Predicting organization quality using syntactic regularities

Given a training set of articles with the same purpose, we use two models of coherence to learn syntactic regularities.

4.3.1 Simple co-occurrence model

In this approach, we estimate the probabilities of pairs of syntactic items from adjacent sentences in the training data and use these probabilities to compute the organization quality of new texts.

The coherence of a text $T$ containing $n$ sentences ($S_1 \ldots S_n$) is computed as:

$$P(T) = \prod_{i=2}^{n} \prod_{j=1}^{|S_i|} \frac{1}{|S_{i-1}|} \sum_{k=1}^{|S_{i-1}|} p\left(S_i^j \mid S_{i-1}^k\right)$$

where $S_x^y$ indicates the $y$th item of $S_x$. Items are either productions or syntactic word unigrams, depending on the representation. Suppose that $S_i^j = w_q$ and $S_{i-1}^k = w_r$, where $w_q$ and $w_r$ are syntactic items in the vocabulary. The conditional probability in the equation above is computed as follows, using Lidstone smoothing:

$$p(w_q \mid w_r) = \frac{c(w_r, w_q) + \delta_C}{c(w_r) + \delta_C \cdot |V|}$$

where $c(w_r, w_q)$ is the number of adjacent sentence pairs in which the first sentence contains the item $w_r$ and is immediately followed by a sentence that contains $w_q$, $c(w_r)$ is the number of sentences that contain $w_r$, $|V|$ is the vocabulary size for syntactic items, and $\delta_C$ is the smoothing constant.
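The co-occurrence model can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the original implementation: the function names, the data layout (a document is a list of sentences, each a non-empty list of syntactic items), and the default value of `delta_c` are all hypothetical. The score is computed in log space to avoid underflow on long texts.

```python
import math
from collections import Counter


def train_cooccurrence(docs):
    """Collect c(w_r, w_q), c(w_r) and the vocabulary from training documents."""
    pair_counts = Counter()  # c(w_r, w_q): adjacent sentence pairs containing (w_r, w_q)
    item_counts = Counter()  # c(w_r): sentences containing w_r
    vocab = set()
    for doc in docs:
        for sent in doc:
            items = set(sent)  # a sentence counts once per item it contains
            vocab.update(items)
            item_counts.update(items)
        for prev, cur in zip(doc, doc[1:]):
            for w_r in set(prev):
                for w_q in set(cur):
                    pair_counts[(w_r, w_q)] += 1
    return pair_counts, item_counts, vocab


def cond_prob(w_q, w_r, pair_counts, item_counts, vocab_size, delta_c=0.1):
    """Lidstone-smoothed p(w_q | w_r); delta_c is an assumed default."""
    return (pair_counts[(w_r, w_q)] + delta_c) / (
        item_counts[w_r] + delta_c * vocab_size
    )


def log_coherence(doc, pair_counts, item_counts, vocab_size, delta_c=0.1):
    """log P(T): for each item of sentence i, average p(item | prev_item)
    over the items of sentence i-1, then sum the logs of those averages."""
    total = 0.0
    for prev, cur in zip(doc, doc[1:]):
        for item in cur:
            avg = sum(
                cond_prob(item, p, pair_counts, item_counts, vocab_size, delta_c)
                for p in prev
            ) / len(prev)
            total += math.log(avg)
    return total
```

A text whose adjacent sentences pair items in the order seen in training receives a higher score than the same sentences in an unseen order.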

4.3.2 Hidden Markov Model approach

This approach uses a Hidden Markov Model (HMM), which has been a popular implementation for modeling coherence [8, 41, 51]. The hidden states in our model depict communicative goals by encoding a probability distribution over syntactic items. This distribution gives higher weight to syntactic items that are more likely for that communicative goal. Transitions between states record the common patterns in intentional structure for the domain. This approach can be expected to have some benefits compared to the simple co-occurrence model. We can model document beginning and end better using the HMM and also implement more directly the idea that sentences with similar syntax could have the same intentional structure.

Cluster a (ADJP → JJ PP | VP → VBZ ADJP)

[1] This method VP-[is ADJP-[capable of sequence-specific detection of DNA with high accuracy]-ADJP]-VP .

[2] The same VP-[is ADJP-[true for synthetic polyamines such as polyallylamine]-ADJP]-VP .

Cluster b (VP → VB VP | VP → MD VP)

[1] Our results for the difference in reactivity VP-[can VP-[be linked to experimental observations]-VP]-VP .

[2] These phenomena taken together VP-[can VP-[be considered as the signature of the gelation process]-VP]-VP .

Table 4.3: Example syntactic similarity clusters using productions representation. The top two descriptive productions for each cluster are also listed.

Parameter initialization.

In this syntax-HMM, states $h_k$ are created by clustering the sentences from the training documents by syntactic similarity. For the productions representation of syntax, the features for clustering are the number of times a given production appeared in the parse of the sentence. For the d-sequence approach, the features are n-grams of size one to four of syntactic words from the sequence. Clustering was done by optimizing for average cosine similarity and was implemented using the CLUTO toolkit [174]. $C$ clusters are formed and taken as the states of the model. Table 4.3 shows sentences from two clusters formed on the abstracts of chemistry journal articles (taken from [84]) using the productions representation. Cluster (a) appears to capture descriptive sentences, and cluster (b) involves mostly speculation-type sentences.
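To make the clustering features concrete, the sketch below builds production-count vectors for three sentences and compares them by cosine similarity. It only illustrates the similarity measure and does not reproduce CLUTO's clustering procedure. The production lists are hypothetical, hand-written examples loosely modeled on Table 4.3, and the function names are assumptions.

```python
import math
from collections import Counter


def production_counts(productions):
    """Feature vector: how often each production occurs in a sentence's parse."""
    return Counter(productions)


def cosine(u, v):
    """Cosine similarity between two sparse count vectors (0.0 if either is empty)."""
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


# Hypothetical productions for two "descriptive" sentences and one "speculation" sentence.
s1 = production_counts(["S -> NP VP", "VP -> VBZ ADJP", "ADJP -> JJ PP"])
s2 = production_counts(["S -> NP VP", "VP -> VBZ ADJP", "ADJP -> JJ PP", "PP -> IN NP"])
s3 = production_counts(["S -> NP VP", "VP -> MD VP", "VP -> VB VP"])

print(cosine(s1, s2) > cosine(s1, s3))
```

Sentences that share most of their productions score close to 1, so a criterion that maximizes average within-cluster cosine similarity would tend to group `s1` with `s2` rather than with `s3`.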

The emission probabilities for each state are modeled as a (syntactic) language model derived from the sentences in it. For the productions representation, this is the unigram distribution of productions from the sentences in $h_k$. For d-sequences, the distribution is computed for bigrams of syntactic words. These language models use Lidstone smoothing with constant $\delta_E$. The probability for a sentence $S_l$ to be generated from state $h_k$, $p_E(S_l \mid h_k)$, is computed using these syntactic language models.
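For the productions representation, a state's emission model can be sketched as a Lidstone-smoothed unigram distribution. The text does not spell out how item probabilities are combined into a sentence probability, so scoring a sentence as the product of its item probabilities (a sum in log space) is an assumption here, as are the function names and the default `delta_e`.

```python
import math
from collections import Counter


def train_emission(cluster_sentences, vocab_size, delta_e=0.1):
    """Return a function item -> smoothed unigram probability for one state.

    cluster_sentences: the sentences assigned to the state, each a list of items.
    """
    counts = Counter(item for sent in cluster_sentences for item in sent)
    total = sum(counts.values())

    def prob(item):
        # Lidstone smoothing: add delta_e to every count over a vocabulary of size |V|.
        return (counts[item] + delta_e) / (total + delta_e * vocab_size)

    return prob


def log_emission(sentence, prob):
    """Assumed form of log p_E(S | h): sum of log unigram probabilities."""
    return sum(math.log(prob(item)) for item in sentence)
```

For example, with a cluster containing `[["X", "Y"], ["X"]]`, a vocabulary of three items, and `delta_e=1.0`, the item `"X"` receives probability (2 + 1) / (3 + 3) = 0.5.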

The transition probability $p_M$ from a state $h_i$ to state $h_j$ is computed as:

$$p_M(h_j \mid h_i) = \frac{d(h_i, h_j) + \delta_M}{d(h_i) + \delta_M \cdot C}$$

where $d(h_i)$ is the number of documents whose sentences appear in $h_i$ and $d(h_i, h_j)$ is the number of documents that have a sentence in $h_i$ immediately followed by a sentence in $h_j$. In addition to the $C$ states, we add one initial state $h_S$ and one final state $h_F$ to capture document beginning and end. Transitions from $h_S$ to any state $h_k$ record how likely it is for $h_k$ to be the starting state for documents of that domain. $\delta_M$ is a smoothing constant.
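The document-level counts behind the initial transition probabilities can be sketched as follows. Each training document is assumed to be given as the sequence of cluster ids of its sentences. Padding every sequence with start and end markers is one way to obtain the counts involving the initial and final states; the marker strings, function name, and default `delta_m` are hypothetical.

```python
from collections import Counter

START, END = "<S>", "<F>"  # stand-ins for the initial and final states


def train_transitions(doc_state_seqs, num_clusters, delta_m=0.1):
    """Return a function (h_j, h_i) -> initial p_M(h_j | h_i).

    doc_state_seqs: one list of cluster ids per training document.
    """
    doc_counts = Counter()   # d(h_i): documents with at least one sentence in h_i
    pair_counts = Counter()  # d(h_i, h_j): documents where h_i is immediately followed by h_j
    for seq in doc_state_seqs:
        padded = [START] + list(seq) + [END]
        # Sets ensure each state or state pair is counted once per document.
        doc_counts.update(set(padded))
        pair_counts.update(set(zip(padded, padded[1:])))

    def prob(h_j, h_i):
        return (pair_counts[(h_i, h_j)] + delta_m) / (
            doc_counts[h_i] + delta_m * num_clusters
        )

    return prob
```

Because the counts are per document rather than per sentence pair, these initial values for a given state need not sum exactly to one. They serve as a starting point for re-estimation.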

The likelihood of a text with $n$ sentences is given by:

$$P(T) = \sum_{h_1 \ldots h_n} \prod_{t=1}^{n} p_M(h_t \mid h_{t-1})\, p_E(S_t \mid h_t)$$

Re-estimation.

With these settings as an initial HMM, we use the Baum-Welch algorithm [133] to iteratively re-estimate parameters. We run iterations until the training data likelihood no longer increases or a fixed number of iterations is reached.
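The text likelihood sums over every possible state sequence, which is exponential in the number of sentences if done naively. The standard forward algorithm computes the same quantity in time linear in the text length, and it is also how the training likelihood monitored during Baum-Welch is typically obtained. The sketch below is an illustration with assumed inputs: `emissions[t][k]` stands for the emission probability of sentence t under state k, `trans[i][j]` for the transition probability from state i to state j, and `start[k]` for the transition from the initial state, treated as the state preceding the first sentence. As in the likelihood formula, no transition into the final state is included, and the text is assumed to have at least one sentence. A brute-force enumeration is included only to check the result on toy inputs.

```python
from itertools import product


def forward_likelihood(emissions, trans, start):
    """P(T) via the forward recursion over alpha_t(j)."""
    n_states = len(start)
    alpha = [start[k] * emissions[0][k] for k in range(n_states)]
    for t in range(1, len(emissions)):
        alpha = [
            emissions[t][j] * sum(alpha[i] * trans[i][j] for i in range(n_states))
            for j in range(n_states)
        ]
    return sum(alpha)


def brute_force_likelihood(emissions, trans, start):
    """Same quantity by explicit enumeration of all state sequences (toy sizes only)."""
    n_states = len(start)
    total = 0.0
    for path in product(range(n_states), repeat=len(emissions)):
        p = start[path[0]] * emissions[0][path[0]]
        for t in range(1, len(path)):
            p *= trans[path[t - 1]][path[t]] * emissions[t][path[t]]
        total += p
    return total


# Toy numbers chosen for illustration only.
start = [0.6, 0.4]
trans = [[0.7, 0.3], [0.2, 0.8]]
emissions = [[0.5, 0.1], [0.4, 0.3], [0.2, 0.6]]
print(abs(forward_likelihood(emissions, trans, start)
          - brute_force_likelihood(emissions, trans, start)) < 1e-12)
```

For real documents the products become very small, so an implementation would normally work in log space or rescale the forward variables at each step.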

All model parameters (the number of clusters $C$, the smoothing constants $\delta_C$, $\delta_E$, and $\delta_M$, and $d$ for d-sequences) are tuned to optimize how well the model can distinguish well-organized articles from incoherent ones. We describe these settings in the next section.

We used the models we developed to perform text quality prediction for both of our genres related to research writing—academic articles and science journalism.
