
In Section 6.2 a comprehensive study of classification approaches was conducted for content-free topic boundary detection. That study focused on comparing classification algorithms across feature sets; however, for convenience of analysis I confined it to four feature sets (Equations (3.5) to (3.8)). In this section, I probe further vocalisation and acoustic features, as well as their Vocalisation Horizon forms, in order to evaluate their contribution to topic segmentation.

6.3.1 Empty pause and filled pause

In order to scrutinize the effect of pauses on topic boundary locations, I distinguish between empty pauses and filled pauses (introduced in Section 3.1.2). The two types of pauses are not simply additive features on $VE_c$, because they modify the definition of a $VE_c$. If "en" is treated as a filled pause and is extracted from a piece of continuous talk, that continuous vocalisation is separated into two new vocalisations. If, on the other hand, only empty pauses are identified, no such separation happens.
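
To make the distinction concrete, the sketch below shows the two treatments of a vocalisation containing a filler. The `Token` record and the filler inventory are illustrative assumptions rather than the corpus's actual representation.

```python
from dataclasses import dataclass

# Hypothetical token-level view of a vocalisation event (VE_c);
# the actual corpus representation may differ.
@dataclass
class Token:
    text: str
    start: float  # seconds
    end: float

FILLED_PAUSES = {"en", "uh", "um"}  # assumed inventory of fillers

def split_at_filled_pauses(tokens):
    """FP treatment: extracting a filler splits one continuous
    vocalisation into two (or more) new, shorter vocalisations."""
    segments, current = [], []
    for tok in tokens:
        if tok.text in FILLED_PAUSES:
            if current:
                segments.append(current)
            current = []  # the filler itself is removed
        else:
            current.append(tok)
    if current:
        segments.append(current)
    return segments

def keep_whole(tokens):
    """EP treatment: only silent (empty) pauses are identified,
    so the continuous vocalisation stays in one piece."""
    return [[tok for tok in tokens if tok.text not in FILLED_PAUSES]]

talk = [Token("so", 0.0, 0.3), Token("en", 0.3, 0.6), Token("we", 0.6, 0.8)]
print(len(split_at_filled_pauses(talk)))  # 2 vocalisations under FP
print(len(keep_whole(talk)))              # 1 vocalisation under EP
```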

In this section, I examine these two settings with the FT and PT naïve Bayes classifiers and evaluate whether filled pauses have a positive effect on segmentation. Tables 6.14 and 6.15 show segmentation results for $VE_c$ with filled pauses and with only empty pauses, with respect to various feature settings. The NoAdj method has been applied.
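
The two thresholding schemes, as I read them from Section 6.2, can be sketched as post-processing over the boundary posteriors of a naïve Bayes model: FT applies a fixed probability cut-off, while PT hypothesises a number of boundaries proportional to the number of candidates. The cut-off and proportion values below are placeholders, not the tuned values used in the experiments.

```python
import numpy as np

def fixed_threshold(posteriors, theta=0.5):
    # FT: hypothesise a boundary wherever P(boundary | VE) > theta
    return np.asarray(posteriors) > theta

def proportional_threshold(posteriors, proportion):
    # PT: hypothesise boundaries at the top `proportion` of candidates,
    # e.g. the boundary rate estimated from training data
    p = np.asarray(posteriors)
    k = max(1, int(round(proportion * len(p))))
    cutoff = np.sort(p)[-k]
    return p >= cutoff

probs = [0.1, 0.7, 0.4, 0.9, 0.2]
print(fixed_threshold(probs))             # boundaries at 0.7 and 0.9
print(proportional_threshold(probs, 0.4)) # top 40% -> two boundaries
```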

Two facts are discernible here. First, for each $VE_c$ feature set (i.e., $VOC$, $GAP$), $EP$s yield higher segmentation accuracy than $FP$s with respect to both $P_k$ and $WD$. This result shows that filled pauses are not particularly effective as classification features, which contradicts the claims by Swerts et al. [1996] on the utility of filled pauses. Empty pauses produce better results than filled pauses. We can also say that preserving long $VE_c$, instead of breaking them at filled pauses, yields higher segmentation accuracy.

Second, the $GAP$ features are superior to $VOC$ on every metric with PT NB, and partially superior with FT NB (on $P_k$ and $\omega$). So, with the Bayesian classifier, the horizon of empty pauses and overlaps is more predictive than the vocalisation horizon, whether or not filled pauses are considered. With Ensemble classifiers, however, $VOC$ produces better results (see Tables 6.9 and 6.10, and Tables 6.12 and 6.13). A possible explanation is that the $GAP$ features have six more dimensions than $VOC$, and those dimensions are not independent. Higher complexity of the feature space adversely influences most classifiers, but less so naïve Bayes models.

Table 6.14: Classification accuracy of the Fixed Threshold (FT) Naïve Bayes classifier on $EP$ and $FP$ features (with NoAdj)

                 $P_k$    $WD$     $\omega$
$EP$             0.341    0.41     0.553
$EP_{VOC}$       0.328    0.406    0.654
$EP_{VOCP}$      0.34     0.423    0.716
$EP_{GAP}$       0.326    0.44     1.144
$FP$             0.386    0.433    0.455
$FP_{VOC}$       0.383    0.422    0.298
$FP_{VOCP}$      0.391    0.455    0.501
$FP_{GAP}$       0.366    0.466    1.034
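
For reference, the $P_k$ and $WD$ columns in these tables can be reproduced with the standard sliding-window definitions of Beeferman et al. and Pevzner and Hearst, which the sketch below follows. Reading $\omega$ as the ratio of hypothesised to reference boundary counts (values near 1 best) is my illustrative assumption, based only on how the column behaves in the tables.

```python
# Segmentations are boolean vectors where True marks a topic
# boundary immediately after that VE position.

def half_mean_segment_length(reference):
    # Conventional window size: half the average reference segment length.
    return max(1, len(reference) // (2 * (sum(reference) + 1)))

def pk(reference, hypothesis, k=None):
    """P_k: probability that two positions k apart are inconsistently
    judged as being in the same or in different segments."""
    k = k or half_mean_segment_length(reference)
    n = len(reference)
    errors = sum(
        (not any(reference[i:i + k])) != (not any(hypothesis[i:i + k]))
        for i in range(n - k)
    )
    return errors / (n - k)

def window_diff(reference, hypothesis, k=None):
    """WindowDiff: penalises any mismatch in boundary counts per window."""
    k = k or half_mean_segment_length(reference)
    n = len(reference)
    errors = sum(
        sum(reference[i:i + k]) != sum(hypothesis[i:i + k])
        for i in range(n - k)
    )
    return errors / (n - k)

def omega(reference, hypothesis):
    # Assumed reading of the omega column: hypothesised / reference
    # boundary count, so values near 1.0 indicate the right boundary rate.
    return sum(hypothesis) / sum(reference)
```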

6.3.2 Augmented Horizon features

The vocalisation horizon and the GAP horizon have been tested with various classification schemes, and the results demonstrate that the horizon substantially improves segmentation accuracy. In the previous experiments, the vocalisation horizon only utilised a basic property: the duration of a $VE$. I now use more $VE$ features and analyse the potential of the horizon more thoroughly. Speaker role is selected as the alternative in this section.

Table 6.15: Classification accuracy of the Proportional Threshold (PT) Naïve Bayes classifier on $EP$ and $FP$ features (with NoAdj)

                 $P_k$    $WD$     $\omega$
$EP$             0.378    0.471    0.962
$EP_{VOC}$       0.355    0.434    0.755
$EP_{VOCP}$      0.365    0.446    0.789
$EP_{GAP}$       0.344    0.429    0.828
$FP$             0.42     0.499    0.954
$FP_{VOC}$       0.418    0.484    0.797
$FP_{VOCP}$      0.411    0.484    0.861
$FP_{GAP}$       0.395    0.471    0.804

6.3.2.1 Speaker role

Briefly speaking, the speaker role horizon integrates the speaker roles of the previous and following $VE$s as features of the current $VE$. Since $FP$-based features are not necessary, I only expand the $EP$ features. They are formally represented from $EP_{ROLE}$ (Equation 6.1) to $EP_{ROLE\_GAP}$ (Equation 6.4).

\begin{align}
EP_{ROLE} &= (s, t, d_e, s_{-n}, \ldots, s_{-1}, s_1, \ldots, s_n) \tag{6.1}\\
EP_{ROLEP} &= (s, t, d_e, pe, o, s_{-n}, \ldots, s_{-1}, s_1, \ldots, s_n) \tag{6.2}\\
EP_{ROLEP\_VOC} &= (s, t, d_e, pe, o, d_{-n}, \ldots, d_{-1}, d_1, \ldots, d_n,\\
&\qquad s_{-n}, \ldots, s_{-1}, s_1, \ldots, s_n) \tag{6.3}\\
EP_{ROLE\_GAP} &= (s, t, d_e, pe, o, u_{-n}, \ldots, u_{-1}, u_1, \ldots, u_n,\\
&\qquad s_{-n}, \ldots, s_{-1}, s_1, \ldots, s_n) \tag{6.4}
\end{align}

Equations (6.1) to (6.4) show vocalisation features without filled pauses, where $u_i = (pe_i, o_i)$. I name these features together as $EP_{ROLE}$-based features. In all these equations, $s$ is the identifier of the current speaker role, $s_i$ is the speaker role of the $i$th $VE_c$ preceding ($i < 0$) or following ($i > 0$) the current $VE_c$, $t$ is the start time of the current $VE_c$, $d$ is its duration ($d_e$ refers to the $VE_c$ duration without filled pauses), $pe$ and $o$ are the durations of the empty pause and the overlap adjacent to the current $VE_c$, $d_i$ is the duration of the $i$th $VE_c$ preceding or following the current one, $u_i$ refers to the empty pause and overlap preceding or following the $VE_c$ (for $VE_c$ without filled pauses), and I set $n = 3$ as the length of the context (or "horizon") spanned by the $VE$s.
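
As a concrete illustration, the $EP_{ROLE}$ vector of Equation (6.1) could be assembled as follows; the `VE` record and the `PAD_ROLE` value used when the horizon extends past the dialogue edges are my own assumptions, since the padding scheme is not specified here.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class VE:
    speaker_role: str  # s: e.g. an AMI role label such as "PM"
    start: float       # t
    duration: float    # d_e: duration with filled pauses removed

PAD_ROLE = "NONE"  # assumed padding beyond the dialogue edges

def ep_role(ves: List[VE], i: int, n: int = 3):
    """EP_ROLE = (s, t, d_e, s_-n..s_-1, s_1..s_n), Equation (6.1)."""
    cur = ves[i]
    before = [ves[j].speaker_role if j >= 0 else PAD_ROLE
              for j in range(i - n, i)]
    after = [ves[j].speaker_role if j < len(ves) else PAD_ROLE
             for j in range(i + 1, i + n + 1)]
    return (cur.speaker_role, cur.start, cur.duration, *before, *after)
```

The $EP_{ROLEP}$, $EP_{ROLEP\_VOC}$ and $EP_{ROLE\_GAP}$ variants extend the tuple in the same way with the $pe$, $o$, $d_i$ and $u_i$ terms.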

Table 6.16: Segmentation accuracy of PT NB with ROLE Horizon related features

                        Original                    NoAdj
                   $P_k$   $WD$   $\omega$    $P_k$   $WD$   $\omega$
$EP$               0.378   0.473  0.99        0.378   0.471  0.962
$EP_{ROLE}$        0.379   0.47   1.01        0.379   0.467  0.974
$EP_{ROLEP}$       0.388   0.478  1.01        0.388   0.478  0.98
$EP_{ROLEP\_VOC}$  0.354   0.448  1.02        0.354   0.431  0.763
$EP_{ROLE\_GAP}$   0.376   0.469  1.01        0.374   0.45   0.75

Table 6.17: Segmentation accuracy of MAP NB based AdaBoostM1 with ROLE Horizon related features

                        Original                    NoAdj
                   $P_k$   $WD$   $\omega$    $P_k$   $WD$   $\omega$
$EP$               0.391   0.497  3.4         0.385   0.482  1.44
$EP_{ROLE}$        0.386   0.476  1.22        0.386   0.471  1.1
$EP_{ROLEP}$       0.413   0.505  1.96        0.413   0.502  1.34
$EP_{ROLEP\_VOC}$  0.386   0.502  2.57        0.38    0.478  1.11
$EP_{ROLE\_GAP}$   0.419   0.557  3.69        0.419   0.541  1.72

In Section 6.2 I found that the thresholding naïve Bayes classifier and the naïve Bayes based Boosting classifier are the most successful at topic segmentation, so I test the ROLE Horizon effect with these two classifiers. In the PT NB results (Table 6.16), $EP$ is the benchmark, yielding a nearly perfect $\omega$ and moderate $P_k$ and $WD$. Compared with $EP$, $EP_{ROLE}$ has limited effect on each metric, while the other feature sets degrade either $\omega$ or $P_k$ and $WD$. As model complexity increases, $\omega$ decreases. In a cross comparison against the VOC/GAP Horizon in Table 6.3, it is also hard to conclude that the ROLE Horizon offers an advantage. It is fair to conclude that ROLE is a useful alternative to $VOC$ with PT NB, since ROLE predicts a better $\omega$.

On the Boosting algorithm (Table 6.17), $EP_{ROLE}$ and $EP_{ROLEP\_VOC}$ predict better $P_k$ and $WD$ than the other ROLE feature sets, but still fall short of the $VOC$ Horizon under Boosting, which obtains $\omega = 0.98$. The other combinations of the ROLE Horizon are even worse. The $VOC$ Horizon therefore fits Boosting better than the ROLE Horizon.

6.3.2.2 Acoustic features

As suggested in Section 3.1.2.3, pitch and intensity features may signal segment boundaries. Here I test the importance of pitch in relation to topic boundary identification. In order to accommodate pitch information as a $VE$ feature, I use the mean value of pitch during a $VE$, instead of extracting patterns of pitch variation as in [Shriberg et al., 2000]. I expect to follow Shriberg's methods in future research.

\begin{align}
EP_{PP} &= (s, t, d_e, pe, o, pch) \tag{6.5}\\
EP_{PITCH\_P} &= (s, t, d_e, pe, o, pch, pch_{-n}, \ldots, pch_{-1}, pch_1, \ldots, pch_n) \tag{6.6}\\
EP_{VOC\_PP} &= (s, t, d_e, pe, o, pch, d_{-n}, \ldots, d_{-1}, d_1, \ldots, d_n) \tag{6.7}\\
EP_{VOC\_PITCH} &= (s, t, d_e, pe, o, pch, d_{-n}, \ldots, d_{-1}, d_1, \ldots, d_n,\\
&\qquad pch_{-n}, \ldots, pch_{-1}, pch_1, \ldots, pch_n) \tag{6.8}
\end{align}

Equations (6.5) to (6.8) show pitch horizon related features without filled pauses. I name these features together as $EP_{PITCH}$ features. Most of the symbols in these equations are the same as those in Section 6.3.2.1, except $pch$ and $pch_i$, which denote the mean pitch value in a $VE$ and the pitch horizon over the adjacent $VE$s, respectively. I set $n = 3$ as the length of the context (or "horizon") spanned by the $VE$s.
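
A minimal sketch of how the per-$VE$ mean pitch ($pch$) might be computed, assuming a frame-level F0 track (e.g. a 10 ms analysis step) in which unvoiced frames are marked as zero; both assumptions are mine, not the thesis's stated pipeline.

```python
import numpy as np

FRAME_STEP = 0.01  # assumed 10 ms pitch-tracking step

def mean_pitch(f0_track: np.ndarray, ve_start: float, ve_end: float) -> float:
    """Mean F0 (pch) over one VE, ignoring unvoiced frames (F0 == 0)."""
    lo = int(ve_start / FRAME_STEP)
    hi = int(ve_end / FRAME_STEP)
    frames = f0_track[lo:hi]
    voiced = frames[frames > 0]
    return float(voiced.mean()) if voiced.size else 0.0
```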

Table 6.18: Segmentation accuracy of PT NB with PITCH Horizon related features

                        Original                    NoAdj
                   $P_k$   $WD$   $\omega$    $P_k$   $WD$   $\omega$
$EP$               0.378   0.473  0.99        0.378   0.471  0.962
$EP_{PP}$          0.402   0.501  1.01        0.402   0.501  0.99
$EP_{PITCH\_P}$    0.408   0.491  1.0         0.408   0.486  0.94
$EP_{VOC\_PP}$     0.378   0.475  1.0         0.378   0.475  0.996
$EP_{VOC\_PITCH}$  0.382   0.475  1.0         0.382   0.475  0.996

In the experiment, $VE$-separated pitch values are extracted from 23 AMI meeting recordings. Based on the previous results, PT NB is selected as the first classifier for testing the pitch horizon effect. The results in Table 6.18 are not encouraging, since they show accuracy very similar to, or even worse than, the basic $EP$ feature. It is therefore not worthwhile to extract the pitch feature and compose the more complicated horizon features. Nevertheless, it would be irresponsible to simply discard all acoustic features for content-free topic segmentation; I expect to modify the approach to using pitch, and to include more features such as voice intensity and prosody.