Mining Collaborative Patterns in Tutorial Dialogues

(1)

Mining Collaborative Patterns in Tutorial

Dialogues

1

SIDNEY D’MELLO

Institute for Intelligent Systems, University of Memphis,

Memphis, TN, 38152, USA [email protected]

ANDREW OLNEY

Institute for Intelligent Systems, University of Memphis,

NATALIE PERSON

Department of Psychology, Rhodes College,

________________________________________________________________________

We present a method to automatically detect collaborative patterns of student and tutor dialogue moves. The method identifies significant two-step excitatory transitions between dialogue moves, integrates the transitions into a directed graph representation, and generates and tests data-driven hypotheses from the directed graph. The method was applied to a large corpus of student-tutor dialogue moves from expert tutoring sessions. An examination of the subset of the corpus consisting of tutor lectures revealed collaborative patterns consistent with information-transmission, information-elicitation, off topic-conversation, and student initiated questions. Sequences of dialogue moves within each of these patterns were also identified. Comparisons of the method to other approaches and applications towards the computational modeling of expert human tutors are discussed.

Key Words: Tutorial dialogue, pattern mining, expert tutors, lectures, likelihood metric, information transmission, information elicitation, human tutoring, educational data mining

________________________________________________________________________

1. INTRODUCTION

It is widely acknowledged that the classroom is not the optimal place for learning at

deeper levels of comprehension. Hence, it comes as no surprise that students turn to

one-on-one human tutoring when they are having difficulty in courses that require them to

demonstrate causal reasoning, diagnose and solve problems, make subtle comparisons,

generate inferences and explanations, and show application and transfer of acquired

knowledge. Investing in one-on-one tutoring does have a big payoff, as evident from the

substantial empirical evidence showing that human tutoring is extremely effective when

compared to typical classroom environments [Bloom 1984; Cohen et al. 1982; Corbett

1_{This research was supported by the by the Institute of Education Sciences, U.S. Department of Education,}

(2)

2001]. Cohen et al. [1982] performed a meta-analysis on a large sample of studies that

compared human-to-human tutoring with classroom controls. Even ―novice‖ human

tutors achieved significant learning gains; with an effect size of .4 sigma (approximately

half a letter grade) compared to classroom and comparable control conditions.

Intelligent Tutoring Systems (ITSs) that promote active knowledge construction

similar to novice human tutors are also quite effective. The ITSs that have been

successfully implemented and tested (such as the Andes physics tutor, Cognitive Tutor,

and AutoTutor) have produced learning gains of approximately 1.0 standard deviation

units (sigma), or approximately one letter grade [Corbett et al. 1999; Graesser et al. 2004;

VanLehn et al. 2007]. This is an impressive feat because the 1.0 sigma effect size

produced by ITSs is superior to the 0.4 sigma effect size obtained from novice human

tutors [Cohen, Kulik and Kulik 1982], although it is lower than the 2.0 sigma effect

obtained by some expert human tutors in mathematics [Bloom 1984].

The effectiveness of one-on-one tutoring in human and computer tutors raises the

question of what makes tutoring so powerful? Chi et al. [2001; 2008] formulate three different hypotheses, which are known as the tutor-centered, student-centered, and

interaction hypotheses. The tutor-centered hypothesis contends that it is the pedagogical

and motivational strategies of the tutor that underlie the effectiveness of one-on-one

tutoring. Alternatively, the student-centered hypothesis predicts that tutoring is effective

because it gives students more opportunities to actively construct knowledge, rather than

anything the tutor does in particular. The interaction hypothesis is essentially the

blending of the tutor and student-centered hypotheses, by focusing on the coordinated

effort of both tutor and student. These different hypotheses require different research

methodologies and techniques of data analysis as is elaborated below.

The tutor-centered hypothesis has been of primary focus over the past several decades

of research, yielding some important insights about the pedagogical strategies employed

by tutors. For example, we have learned how tutors adapt to students needs by (a)

modeling and monitoring student knowledge [Chi et al. 2001; Derry and Potts 1998], (b)

employing tutoring tactics and strategies that contribute to learning gains [Du Boulay and

Luckin 2001; Fox 1991; Fox 1993; Lajoie et al. 2001; Palinscar and Brown 1984; Person

et al. 1994], (c) planning at local and global levels of discourse [Littman et al. 1990;

McArthur et al. 1990; Putnam 1987], and (d) providing emotional support for students in

social, affective, and motivational ways [del Soldato and du Boulay 1995; Issroff and del

(3)

are coded and the most frequent are assumed to be associated with enhanced learning.

More sophisticated models using regression models attempt to identify combinations of

tutor moves that are associated with learning gains in students [Chi et al. 2008; Di

Eugenio et al. 2009; Ohlsson, Di Eugenio, Chow, Fossati, Lu and Kershaw 2007].

The student-centered hypothesis contends that students are active participants in the

construction of their own knowledge, rather than being mere information receptacles.

This hypothesis has found substantial support in the tutoring literature [Chi 1996; Chi,

Siler, Jeong, Yamauchi and Hausmann 2001; Core et al. 2003; Jordan and Siler 2002;

Shah et al. 2002]. As with the tutor-centered methodology, it is fairly common for student

behavior to be coded and regressed or correlated with learning outcome measures [Chi,

Roy and Hausmann 2008]. This is perhaps not surprising: since structurally both the

tutor-centered and student-centered hypothesis considers the behavior of each participant

in isolation, there is usually no consideration of the larger structure or context

surrounding the interaction.

In contrast, the interaction hypothesis predicts that the effectiveness of tutoring draws

from both tutor and student behavior and their coordination with each other. Accordingly the traditional methodology of regressing or correlating coded behaviors with learning

gains is no longer sufficient. Instead, investigating the interaction hypothesis requires

regressing or correlating patterns of coded behavior with learning gains. The interaction

hypothesis has substantial overlap with the collaborative learning literature where it has

been shown that group learning outperforms individual learning [Wiley and Bailey 2006;

Wiley and Jensen 2006]. Simply put, both stress the interaction between participants.

Equally related are collaborative forms of instruction that are dialogue-centric,(e.g.,

reciprocal teaching) [Palinscar and Brown 1984]. Indeed, some researchers are beginning

to explore mixtures of collaborative learning and tutoring [Chi, Roy and Hausmann 2008;

Graesser et al. 2008; Kumar et al. 2007].

Investigations into the nature of one-on-one tutoring will inevitably encounter issues pertaining to ―grain size‖ or the level of analysis required to answer certain theoretical

questions. Barring a few exceptions [Cromley and Azevedo 2005; Shah, Evens, Michael

and Rovick 2002], the analysis of tutorial dialogue usually takes place at a very

fine-grained level, or the speech act level. Examples include investigations into how tutors

detect errors and misconceptions [Brown and Burton 1978; Chi et al. 1981; Corbett and

Anderson 1992; Feltovich et al. 2001; Fox 1991; Fox 1993; Sleeman et al. 1989], how

tutors provide feedback [Fox 1991; Fox 1993; McKendree 1990; Shute 2008], and how

(4)

tutor questions to elicit information from the student [Graesser and Person 1994; Graesser

et al. 1995; Lajoie, Faremo and Wiseman 2001].

Much of the analyses focus on the distribution of dialogue moves [Ohlsson, Di

Eugenio, Chow, Fossati, Lu and Kershaw 2007] and occasionally extend to two-step

student-tutor response cycles, (e.g., bigrams and transition probabilities) [Forbes-Riley et

al. 2005; Litman and Forbes-riley 2006; Lu et al. 2007]. Comparatively little is known

about how larger sequences of dialogue moves unite to achieve particular pedagogical

goals. Simply put, what are the dialogue patterns beyond two step student-tutor response

cycles? Identifying these dialogue move sequences along with their contextual

underpinnings will certainly expand the understanding of tutorial dialogue at a deeper

level that an analysis of the frequency of individual dialogue moves (or speech acts).

Another limitation of some of the previous work on human tutoring is that the vast

majority of studies monitored novice or unaccomplished tutors. These tutors were

untrained in tutoring skills and had moderate domain knowledge; they were peer tutors,

cross-age tutors, or paraprofessionals, but rarely were accomplished professionals or

expert tutors. Since, there is some evidence that expert tutors are more effective in

promoting learning gains than their unaccomplished counterparts [Bloom 1984; Cohen,

Kulik and Kulik 1982], an analysis of the discourse structure of expert tutoring dialogues

warrants further consideration.

This paper provides a methodology to examine the structure of tutorial dialogues

between students and expert human tutors by mining frequently occurring sequences of

dialogue moves generated during 50 naturalistic tutorial sessions. We show how two-step

transitions between moves can be aggregated into a graphical structure which permits

visual examination as well as the detection of more complex patterns. The method is

applied to a large corpus (over 45,000 speech acts) of tutoring dialogues between students

and the expert tutors in order to detect the latent structure and frequent patterns inherent

in tutorial lectures.

2. BRIEF OVERVIEW OF RELATED WORK

Although the impetus of investigations into tutorial dialogues has been at the dialogue

move level [Ohlsson, Di Eugenio, Chow, Fossati, Lu and Kershaw 2007], some recent

research has attempted to mine more complex patterns. This brief overview discusses

some of the recent data mining approaches for detecting patterns in tutorial dialogues.

The review focuses on Hidden Markov Models (HMMs) [Rabiner 1989], as a number of

(5)

Although sequential association rule mining techniques have not been extensively used in

the context of educational data mining, these are also described as they are potentially

useful pattern detection methods.

2.1 Hidden Markov Models (HMMs)

Some of the recent efforts to investigate the latent structure of tutorial dialogues utilizes

Hidden Markov Models (HMMs) [Beal et al. 2007; Boyer et al. 2009; Boyer et al. 2009;

Jeong et al. 2008; Soller and Stevens 2007]. HMMs are powerful tools for modeling

systems with sequential observable outcomes when the states producing the outcomes

cannot be directly observed (i.e., they are hidden) [Rabiner 1989]. HMMs can be used to model tutorial dialogue by assuming that the hidden states are ―tutorial contexts‖ or ―tutorial modes‖ [Cade et al. 2008], and the observed states are the dialogue moves. Examples of tutorial modes include ―lecturing on a topic‖ and ―modeling a problem‖,

while examples of dialogue moves are hints, prompts, and positive feedback. In this

context, an HMM will specify (a) the distribution of start states [start matrix], (b) the

distribution of dialogue moves given that the HMM is in a particular mode [emission

matrix], and (c) the probability of transitions between modes [transition matrix].

HMMs can be trained to automatically detect tutorial modes [Boyer, Phillips, Ha,

Wallis, Vouk and Lester 2009; Boyer, Young, Wallis, Phillips, Vouk and Lester 2009],

which is an advantage over manual annotation of modes [Cade, Copeland, Person and

D'Mello 2008]. However, the large number of parameters that need to be estimated

during the training process raises some problems. For example, the corpus analyzed in

this paper consists of 8 modes and 43 speech acts (described in the Tutorial Corpus

section). An HMM to model this data set would require 412 parameters (8 for the start

matrix, 8 × 43 for the emission matrix, and 8 × 8 for the transition matrix). This is, of

course, a conservative lower bound estimate, because we assumed that the number of

hidden states was known prior to the parameter estimation. In many cases, when there is

no underlying theory for guidance, determining the appropriate number of hidden states

is a challenging problem in itself.

Another problem is that the parameter estimation procedures that are used to train

HMMs, such as the popular Baum-Welch algorithm [Jurafsky and Martin 2008; Rabiner

1989], are quite sensitive to initial parameter estimates and can converge onto local

instead of global optima. This is a critical problem when models are seeded with random

parameter estimates. Although the parameterization problem can be alleviated by seeding

(6)

complex models with a large number of parameters, such as the models discussed in this

paper.

2.2 Sequential Association Rule Mining

Detecting structure in tutorial dialogues is essentially a sequential data mining problem.

Several algorithms that detect frequent patterns in sequential data have been proposed.

These include algorithms based on the Apriori framework such as GSP [Srikant and

Agrawal 1996], SPADE [Zaki et al. 1997], PSP [Masseglia et al. 1998], and a more

recent set of algorithms based on the pattern-growth framework such as Span,

PrefixSpan, CloSpan, and IncSpan [Han et al. 2000; Mortazavi-Asl et al. 2004].

Classification based on associations [Liu et al. 1998] has recently been used in a tutoring

context by Lu and colleagues to automatically extract feedback rules from a corpus of

human-human tutoring [Lu et al. 2008].

In general, sequential association mining algorithms and exploratory sequential data

analysis techniques attempt to detect frequent sequences of events as well as frequently

occurring nested events [Sanderson and Fisher 1994]. To this extent, they can contribute

towards an analysis of tutorial dialogues. However, one of the limitations of these

algorithms is that apart from identifying frequent patterns, most do not offer a mechanism

to integrate different patterns in order to obtain a broader understanding of the

phenomenon being analyzed. For example, log-linear models can be used to determine

whether a sequence is significant, but they are not well-suited for extracting larger

structure than two steps [Olson et al. 1994]. Lag sequential analysis is perhaps more

appropriate for determining larger structures since it can be used to determine across

multiple spans, or lags, whether a succeeding category occurs significantly differently

from chance [Bakeman and Gottman 1997].

The common sequence mining approaches have some limitations that introduce some

challenges with respect to their applicability for the analysis of tutorial dialogues. For

example, one limitation of log-linear models is that they are sensitive to low frequency

cell counts in the contingency tables used to construct these models. One proposed

solution to alleviate this problem involves eliminating low-frequency data elements prior

to the analyses [Olson, Herbsleb and Rueter 1994]. Although this is a viable solution in

some domains, it is not suitable for analyses of tutorial dialogues, because infrequent

dialogue moves can be very meaningful, and in some cases they play a more important

role than high frequency moves [Ohlsson, Di Eugenio, Chow, Fossati, Lu and Kershaw

(7)

however, it has a different set of limitations that are discussed in length in a previous

publication [Olson, Herbsleb and Rueter 1994]. Another limitation that is specific to

sequential association rule mining techniques that are based on the Apriori framework

[Srikant and Agrawal 1996], is that these approaches require the specification of arbitrary

parameters to prune the search space (i.e., support thresholds).

Finally, PRONET is a methodology for constructing a network representation from

proximity data [Cooke et al. 1996]. The basis of the PRONET approach is the Pathfinder

algorithm [Schvaneveldt 1990; Schvaneveldt et al. 1989] which takes a proximity matrix

as input and yield a directed or undirected graph representation. The resulting graph has

the property that each link is a minimum weight path connecting two entries in the

proximity matrix. The graph representation is parameterized by , which is the

Minkowski distance between nodes, and , which is the maximum number of links that

can connect two nodes. In essence, Pathfinder is not a sequential datamining algorithm,

but rather a technique of scaling data into a network representation.

2.3 Differences between Current Approach and Existing Sequential Analysis

Methods

As evident from the aforementioned discussion, HMMs, sequential association mining

approaches (e.g., Span, PrefixSpan, CloSpan, IncSpan, GSP, PSP), and exploratory

sequential data analysis techniques (e.g., log-linear models) are all viable methods to

extract patterns in tutorial dialogues. These methods are accompanied with an associated

set of disadvantages that limit their applicability towards the analysis of tutorial

dialogues. The present paper proposes an alternate method to address some of these

limitations. Briefly, our method involves detecting frequent two-step sequences of

dialogue moves and integrating these sequences into a directed graph. Different aspects

of the tutorial dialogues can be investigated by focusing on different characteristics of the

graph, as will be elaborated later.

Our approach capitalizes on the benefits of the various pattern mining methods, but

differs from these methods in a number of ways. First, it incorporates fewer parameters

than HMMs and does not require complex parameter estimation methods, thereby

avoiding the local versus global optima conundrum. Second, unlike methods derived

from the Apriori framework, it utilizes null hypothesis significance testing instead of

arbitrary threshold specification for search space pruning. Third, it is not sensitive to low

(8)

and finally, it integrates frequent patterns into a larger structure that is conducive to a

broader analysis of the phenomenon being modeled (i.e., tutorial dialogues in this paper).

We illustrate the method by analyzing the latent structure in student-tutor dialogue

moves from an existing corpus. We begin with a brief overview of the corpus and the

coding scheme used to annotate the student and tutor speech acts.

3. TUTORIAL DIALOGUE CORPUS

The corpus consisted of 50 tutoring sessions between students and expert tutors on the

topics of algebra, geometry, physics, chemistry, and biology. The students were all

having difficulty in a science and/or math course and were either recommended for

tutoring by school personnel or voluntarily sought professional tutoring help.

The expert tutors were recommended by academic support personnel from public and

private schools in a large urban school district. All of the tutors have long-standing

relationships with the academic support offices that recommend them to parents and

students. The criteria for being an expert tutor were (a) have a minimum of five years of

one-to-one tutoring experience, (b) have a secondary teaching license, (c) have a degree

in the subject that they tutor, (d) have an outstanding reputation as a private tutor, and (e)

have an effective track record (i.e., students who work with these tutors show marked

improvement in the subject areas for which they receive tutoring).

Fifty one-hour tutoring sessions were videotaped and transcribed. To capture the

complexity of what transpires during a tutoring interaction, two coding schemes were

developed to classify every tutor and student dialogue move in the 50 hours of tutoring

[Person et al. 2007; Person et al. 2007]. A total of 47,296 dialogue moves were coded. A

dialogue move was either a speech act (e.g., a tutor hint), an action (e.g., student reads

aloud), or a qualitative contribution made by a student (e.g., partially-correct or vague

answer). Multiple dialogue moves could occur within a single conversational turn. For

example, the conversational turn ―[No, no, let’s, let’s], [let’s let’s not get confused on

that]. [When we talk about mitosis, we’re talking about how your skin would produce new skin cells]‖ is comprised of negative feedback, a general motivational statement, and

direct instruction.

The Tutor Coding Scheme consisted of 27 categories inspired by previous tutoring

research on pedagogical and motivational strategies and dialogue moves [Cromley and

Azevedo 2005; Graesser, Person and Magliano 1995; Lepper and Woolverton 2002]. The

moves consisted of various forms of information delivery (e.g., direct instruction,

(9)

prompts, pumps, forced choices), elaborated and non-elaborated feedback (i.e., positive,

negative, neutral), some motivational moves (e.g., general motivation statement,

solidarity statement), humor, and off-topic conversation (see Table I).

A 16 category coding scheme was also developed to classify all student dialogue

moves (see Table I). Some of the student move categories captured the qualitative nature

of a student dialogue move (e.g., correct answer, partially-correct answer, error-ridden

answer), whereas others were used to classify types of questions, conversational

acknowledgments, and student actions (e.g., reading aloud or solving a problem).

Four trained judges coded the 50 transcripts on the dialogue move schemes. Cohen’s

kappas were computed to determine the reliability of their judgments. The kappa scores were .92 for the tutors’ moves and .88 for the students’ moves. A kappa of .75 or greater

is considered to be excellent [Robson 1993].

In addition to coding dialogue moves, we also coded ―tutorial modes,‖ or briefly ―mode‖. A mode can be considered to be the overarching context or teaching phase

during which learning occurs, such as a lecture, a problem modeling phase, a problem

scaffolding phase, etc. Each of the dialogue moves in the current corpus was assigned to

one of eight modes: introduction, lecture, highlighting, modeling, scaffolding, fading,

off-topic, and conclusion. Kappa scores for the eight modes ranged from 0.8 to 1.0 with a

(10)

Table I. Coding Schemes for Dialogue Moves

Dialogue Move Examples

Tutor Moves

Comprehension Gauging Question ―Do you understand‖, ―You remember?‖ Attribution Acknowledgment ―You’re just making a mental block.‖

Conversational Ok ―Ok.‖ ―Alright.‖

Counter Example ―And I don’t mean chunks like in pieces of potato.‖

Direct Instruction ―Well the lysosomes contain digestive enzymes.‖

Example ―Most picked FedEx as an analogy for Golgi body.‖

Forced Choice ―Would that be random, uniformed, or clumped?‖

Motivational Statement ―See, you get it now.‖ ―Wow, you’re so good.‖

Hint ―It’s a family tree…it’s just a regular family tree.‖

Humor ―You never see a rose bush jump on a person.‖

Negative Feedback ―No.‖ ―Uh uh.‖

Negative Feedback Elaborated ―Before that, you’re jumping too far ahead.‖

Neutral Feedback ―Well, that’s not quite it.‖

Neutral Feedback Elaborated ―Uh, ATP is gonna be one of the players.‖

Paraphrase [S] ―It ………...‖ [T] ―It’s inside the wall.‖

Off topic ―I’m cooking dinner—I’m really excited.‖

Pose New Problem ―Difference between prokaryotes and eukaryotes.‖

Pose Simplified Problem ―What is the outside layer?‖

Positive Feedback ―Correct.‖ ―Right.‖ ―Exactly.‖

Positive Feedback Elaborated ―It’s still random, exactly.‖

Preview ―Chloroplast, we haven’t talked about that yet.‖

Prompt ―So 200 times one is what?‖

Provide Correct Answer ―Code next to it.‖

Pump ―Which is?‖ ―And?‖ ―Then?‖

Repetition S: ―Is it commensalism?‖ T: ―Commensalism.‖

Solidarity Statement ―Now think with me.‖ ―Alright, here we go.‖

Summary ―Alright, so we’ve talked about…‖

Student Moves

Common Ground Question ―Aren’t they more lined up, like more in order?‖ Conversational Acknowledgment ―Ok.‖ ―No sir.‖ ―Yes ma’am.‖

Correct Answer ―In meiosis it starts out the same with 1 diploid.‖

Error Ridden Answer ―Prokaryotes are human, eukaryotes are bacteria.‖

Gripe ―Ugh.‖ ―<groans>‖

Knowledge Deficit Question ―What do you mean by it doesn’t have a skeleton?‖

Metacomment ―I don’t know.‖ ―Yes, I understand.‖

Misconception ―I always used to get diploid and haploid mixed up.‖

No Answer ―Umm.‖ ―Mmm.‖

Off topic (Tutor) ―Did you all have a quiz today.‖

Partial Answer ―It has to do with the cells.‖

Read Aloud ―Question 7: Plot growth pattern.‖

Social Coordination Action ―No, I didn’t hear about that.‖

Student Works Silently -

Think Aloud ―500 equals 50 and 50 divided by 500 gives 10.‖

(11)

4. MINING STRUCTURES IN LECTURES

We use lectures as an illustrative example of the pattern mining approach. Our use of

lectures is motivated by a number of factors. First, lectures and scaffolding are the two

most frequently occurring modes during expert tutoring sessions. They collectively

comprise 77% of the corpus, with approximately 30% of the turns being lecture

deliveries. Second, the complexity of lectures lies somewhere between scaffolding and

the other modes, thereby making lectures an ideal example to illustrate the method. Third,

our analysis of the structure of lectures yielded some interesting patterns that are

somewhat different from some of the ways lectures have been previously conceptualized

[Chi 1996; Lu, Di Eugenio, Kershaw, Ohlsson and Corrigan-Halpern 2007; Wheatley

1991].

The method involves three stages or processes. These include (1) detecting significant

excitatory transitions between dialogue moves, (2) constructing a representation that

integrates the significant excitatory transitions, (3) generating and testing data-driven

hypotheses from the representation. The input data consists of time series of student-tutor

dialogue moves that have been annotated by a human or a computer (human annotation in

this paper). Each student-tutor dyad has its own time series; hence, the input consists of

such time series.

4.1. Transition Detection

4.1.1 Likelihood Metric. The first step involves detecting significant two-step transitions between dialogue moves from the time series. If there are dialogue

moves in the corpus, then the number of transitions considered is . These

transitions are computed for each individual time series and significant transitions are

identified via null hypothesis significance testing. It should be noted that this form of

brute force sample space searching is only applied to detect significant two-step

transitions. The search space is substantially pruned for higher level transitions as will be

illustrated in Section 4.3.

We use the likelihood metric [D'Mello et al. 2007] to compute the likelihood of a

transition between any two moves. The metric can be represented as ,

where is the move at time (the current move), is the next move at .

We motivate the likelihood metric by contrasting it with the conditional probability of

a transition to given that the immediate move is (see Equation 1). While

conditional probabilities are suitable for exploring the strength of association between

(12)

feedback (NF), is very frequent, the conditional probability (NF | ) will be high

because (NF | ) is proportional to the frequency of NF (also known as base rate).

(Equation 1)

The likelihood metric addresses the influence of base rate by penalizing associations

that are not greater than an expected amount of association. Formally, the likelihood

metric is defined as:

(Equation 2)

The reader may note significant similarity to Cohen’s kappa for agreement between

raters [Cohen 1960] and indeed the likelihood metric can be justified in a similar fashion.

The definition of Cohen’s is listed in Equation 3.

(Equation 3)

In Equation 3, Pr(A) is the agreement between two raters, and Pr(E) is the expected

agreement. So removes the agreement expected by chance and then normalizes by the

total possible agreement (1: perfect agreement) minus expected agreement again. The

likelihood metric proceeds in the same way, however rather than agreement we use

conditional probability as a measure of association (see Equation 2). The expected degree

of association is , because if and are independent, then

Therefore the numerator of Equation 2 equals the degree of

association observed minus the degree of association expected under independence. If the

observed degree of association is higher than that expected under independence, then the

numerator will be positive (a positive association). If the observed association is equal to

that expected under independence, then the numerator will be zero (no association).

Finally, if the observed association is less than that expected under independence, the

(13)

is intuitively understandable as the direction and size of the association between

and , accounting for the base rate of .

One sample t-tests can be used to ascertain whether a transition is statistically

significantly greater than (excitatory), less than (inhibitory), or equal to zero (no

relationship between immediate and next move). The effect size (in standard deviation

units) for a transition can be measured by Cohen’s , , where

and are the sample means and standard deviations for a transition,

respectively. Effect sizes of .2 sigma are small effects, .5 sigma is a medium effect, and .8

sigma or greater is a large effect [Cohen 1992].

4.1.2 Applying Likelihood Metric to Lectures. A time series that preserved the temporal ordering of the moves was constructed for each session. We extracted the move

sequences that were classified as lectures. Four out of the 50 sessions did not include a

lecture mode; hence, we examined the remaining 46 time series. On average, there were

298 moves per time series (SD = 325). Time series ranged from 16 to 1692 moves with a median of 172 moves. Overall, the 46 lectures comprised 13,712 moves.

We applied Equation 2 to each of the lecture time series. A one-sample t-test was

used to test the significance of 1806 out of the possible 1869 (43 × 43) transitions.

Transitions between the same moves were not considered as the coding scheme does not

permit such transitions. It appears that there were 970 transitions that were statistically

significant at the .05 level.

It should be noted that there is a risk of increasing Type 1 (false positive) errors when

a large number of significance tests are conducted. One option to alleviate this risk is to

focus on a small set of theoretically derived predictions instead of considering the entire

set of potential transitions. This option was not considered because the primary goal of

the method is to mine tutorial dialogues for interesting patterns and latent structures.

Focusing on a small subset of transitions does not advance this goal.

Another option is to make the significance criteria more stringent by applying a

post-hoc correction. A commonly used multiple test correction is the Bonferroni correction

[Shaffer 1995], where the significance criterion (α) is adjusted for the number of

comparisons ( ). In particular, , where is the corrected significance

threshold.

Applying the Bonferroni correction to our data would result in an extremely

conservative α threshold of 0.000028 ( Using such a low alpha value is

also not attractive as this substantially increases the risk of committing Type II (false

(14)

A closer examination of the number of significant transitions revealed that it is

unlikely that our results were obtained by a mere capitalization on chance. Monte-Carlo

simulations across 100,000 runs confirmed that the probability of obtaining 970 out of

1806 significant transitions (52%) by chance alone is approximately 0. Therefore, it is

unlikely that the overall patterns in the data were obtained by chance.

Table II presents the direction (+,-) of the transitions that were significant at the .05

level. The results indicate that a majority of the transitions were inhibitory, which is what

we expected, as the likelihood metric is quite conservative. Although inhibitory

transitions are informative in their own right, an examination of these transitions is

beyond the scope of the paper. Hence, the subsequent analyses exclusively focus on the

68 excitatory transitions.

There is an obvious concern that 68 out of 1806 significant excitatory transitions is

not sufficient to capture the complexity of expert tutors’ lectures. We addressed this

concern by constructing a session × transition matrix (46 × 1806), where each

represented the probability of observing transition in session . The average for each

column represents the mean probability of transition across the 46 sessions with

lectures. The results indicated that the 68 excitatory transitions represented 52.4% of all

possible transitions. Hence, a substantial portion of the variance is explained by a handful

(15)

Table II. Significant transitions between moves

Next Move

Current Move 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43

1. Attribution Acknowledgment + - - - -

2. Acknowledgment - + - + - - - + - - - -

3. Counter Example - - - -

4. Common Ground Question - - - + + - - - + + - - + - - - -

5. Comprehension Gauging Question + - - - + - - - -

6. Correct Answer - - - + + + - + - - - -

7. Direct Instruction/Explanation + + - - + - - - - + - - - -

8. Error Ridden Answer - - - + - + - - - + - - - -

9. Example - + - - - -

10.Forced Choice - + - - - -

11.General Motivational Statement - - - -

12.Gripe - - - -

13.Hint - + - + - + - - - + - - - -

14.Humor - - - + - - - -

15.Knowledge Deficit Question - - - - + - - - -

16.Metacomment - - - + - - - -

17.Misconception - - - + - - - -

18.Negative Feedback - - + - - - -

19.Negative Feedback Elaborated - - - -

20.Neutral Feedback - + - - - -

21.Neutral Feedback Elaborated - - - -

22.New Problem - - + - + - + - - - + - - - +

23.No Answer - - - + - - - + - - - - -

24.Conversational Ok - - - - + - - - + - - - - + + - -

25.Other - - - + - - - + + - - - - -

26.Other - - - + - - - -

27.Partial Answer - - - + + - - - -

28.Paraphrase - - - -

29.Provide Correct Answer - - - -

30.Positive Feedback - + - - - -

31.Positive Feedback Elaborated + - - - -

32.Preview - - - -

33.Prompt - - + - + - - - + - - + - - - -

34.Pump - - - -

35.Read Aloud -

36.Repetition - - - - + - - - + - - - -

37.Social Coordination Action - - - + - - - -

38.Simplified Problem - - - + - + - - - + + - - - + - - + - - - +

39.Solidarity Statement - - - -

40.Summary - - - + - - - -

41.Student Works Silently - - - -

42.Think Aloud - - - -

(16)

4.2. Integrated Representation

It is difficult to detect tutorial dialogue patterns from the transition matrix presented in

Table II. What is needed is a representation that integrates the two-step excitatory

transitions. We propose a directed graph representation of the excitatory transitions in the

matrix. Directed graph representations are useful for a number of reasons. First, they

serve as powerful illustrative tools for visual pattern detection. Second, graph algorithms

(e.g., paths, trails, searches, and many others) can be used to reveal complex,

non-intuitive patterns that elude visual detection.

Figure 1 presents a directed graph representation of the excitatory transitions in Table

II. The vertices of the graph are the individual dialogue moves and the edges represent

significant excitatory transitions between the moves (see Figure 1). For simplicity, we

only included the 34 transitions with effect sizes greater than the median, which was.52

sigma. Effect sizes for these 34 transitions ranged from .52 to 1.88 sigma (medium to

large effects), with a mean effect size of .89 sigma (SD = 35). These 34 transitions comprised 44.6% of all possible transitions observed in the data.

4.2.1 Visual Inspection of Collaborative Patterns. A visual analysis of the directed graph in Figure 1 yielded four collaborative student-tutor move clusters. The first cluster

pertains to conversational moves that support direct instruction (solid links in Figure 1).

These include, the tutor transmitting information via direct instruction and explanations, comprehension gauging questions initiated by the tutor, students’ metacomments, and

conversational moves such as acknowledgements and oks. This cluster can be considered

to be an information transmission cluster as its primary pedagogical goal appears to be the delivery of information from tutor to student.

The second cluster, or the information elicitation cluster (dotted links in Figure 1), consists of moves associated with attempts by the tutor to elicit information from the

student. These moves are variations of the Initiate Respond Evaluate (IRE) sequence

[Mehan 1979]. The sequence begins by the tutor asking the student a question with

prompts, pumps, forced choices, or simplified problems. The student responds with no

answers, error-ridden answers, misconceptions, partially-correct answers, or correct

answers. The tutor evaluates the student’s response and provides positive, negative, or

neutral feedback, with or without elaborations.

The two clusters (information transmission and information elicitation) appear to be

connected with a link from the various forms of feedback to direct instruction. The

following short excerpt from an actual lecture between a biology tutor (T) and a student

(17)

representative of information elicitation, while moves 149-151 are indicative of

information transmission. The two clusters are connected by the link from positive

feedback to direct instruction (moves 148 → 149).

[145]

T So in other words, is it is equal, greater, or less than? [Forced Choice] [146]

S Equal. [Correct Answer] [147]

T Equal. [Repetition] [148]

T Very good. [Positive Feedback] [149]

T So when you are at the carrying capacity, birth rate equals death rate.

That makes sense because you reach the carrying capacity. [Direct

Instruction] [150]

T Makes sense? [Comprehension Gauging Question] [151]

S Mm hmm. [Acknowledgment]

[152]

T So the growth rate is 0 [Direct Instruction]

In addition to the information transmission and information elicitation clusters, there

is also an off-topic conversation exchange between the student and the tutor (separate graph on top right of Figure 1). These off-topic conversations appear to be triggered by

humor (e.g., ―Well, let’s get to a good one. Surely. Ooh Rhombus! Ooh hexagon!‖) on

the part of the tutor or a socially coordinated action (e.g., ―Hand me the calculator‖) by

the student.

Although students rarely asks questions in tutoring sessions [Graesser and Person

1994], they sometimes take initiative by asking common ground questions (e.g., ―You

mean the formulas?‖) or knowledge deficit questions (e.g., ―What is the difference

between anaphase and telephase?‖). Tutors respond to common ground questions with

positive feedback, presumably to encourage such questions, followed by direct

instruction. Knowledge deficiencies are corrected with direct instruction and

explanations. Thus, student questions comprise the fourth collaborative pattern observed during lectures (dashed links in Figure 1).

4.2.2 Incidence of the Collaborative Patterns. There is the question of determining how often each of the collaborative patterns appears in the tutorial lectures. One possible

way to assess the incidence of a pattern is to add the proportional occurrence of dialogue

moves within that pattern. In general, moves can be assigned to one of the four patterns

(i.e., assignment of moves to clusters is generally mutually exclusive). However, direct

(18)

patterns. Although direct instruction is mostly associated with information transmission,

it also occurs with information-elicitation and student questions. Similarly, positive

feedback is a member of the student question and information elicitation patterns. Hence,

it is not possible to analyze the occurrence of a pattern by simply adding the proportional

occurrence of moves within that pattern.

Assignment of links to patterns, however, is mutually exclusive. It is possible to

compute the incidence of a pattern by adding the proportional occurrence of links within

that pattern. The results indicated that the eight links associated with information

transmission represented 70.2% of the 34 links. The information elicitation, off-topic

conversation, and student question patterns represented 18.6, 9.0, and 2.2 percent of the

links, respectively. Hence, lectures mainly consist of information transmission with the

occasional information elicitation and off-topic conversation patterns. As always, student

(19)

Figure 1. Directed graph representation of positive excitatory transitions between moves

(20)

4.3. Hypothesis Generation and Testing

A visual inspection of the directed graph representation of the significant excitatory

transitions was useful to identify and assess the incidence of major collaborative

student-tutor patterns. However, the real merits of directed graph representations lie in their

ability to generate data-driven hypotheses that can be empirically tested. The critical

insight here is that properties of the graph provide hypotheses about complex associations

between dialogue moves.

This claim can be best explained with some examples. Graphs have paths, cycles, and

circuits. A path is a set of distinct vertices such that two consecutive vertices in a path are

connected by an edge [Tucker 1995]. For example, new problem (newp), correct answer

(cor), positive feedback (pf), and direct instruction (die) form a path of length three (see

Figure 1). Note the length of a path is computed on the basis of its edges and not vertices.

A cycle is a sequence of vertices where the starting and ending vertex are the same,

and any two consecutive vertices are connected by an edge. An edge cannot appear more

than once in a cycle, but the same vertex can be visited more than once [Tucker 1995].

Direct instruction (die) → acknowledgement (ack) → direct instruction (die) → comprehension gauging question (cgq) → acknowledgement (ack) → conversational ok (ok) → direct instruction (die) form a complex cycle in Figure 1. A circuit is like a cycle, but each vertex can appear only once, except the starting and ending vertices. Hence,

direct instruction (die) → comprehension gauging question (cgq) → acknowledgement (ack) → conversational ok (ok) → direct instruction (die) form a circuit in the directed

graph (see Figure 1).

The directed graph representations of the excitatory representations can be explored

with graph algorithms to obtain complex patterns such as paths, cycles, and circuits.

These represent hypotheses of possible patterns in the tutorial dialogues. The existence of

these patterns can be subsequently verified by scanning the time series of student-tutor

dialogue moves. This can be achieved by computing the probability of a pattern in each

of the observed time series ( ) and comparing this probability to what could be

expected by chance. Chance can be estimated by computing the probability that the

pattern exists in the randomly shuffled surrogate of the observed time series ( ).

Random shuffled surrogates are useful controls because they preserve the prior

probabilities of individual moves while breaking temporal dependencies among moves.

The transition likelihood metric from Equation 2 can be modified to quantify the

likelihood of a particular pattern , where can be a path, cycle, trail, circuit, etc, (see

(21)

(Equation 4)

This metric is similar to the kappa statistic [Cohen 1960] and can be interpreted in the

same way as the metric presented in Equation 2. Specifically, if , then the

pattern occurs at rates greater than chance; , implies rates below chance, and

, indicates occurrence at chance levels. The significance of pattern likelihood

can be assessed with one-sample t-tests. A one-tailed one-sample t-test can be used

because we are only interested in determining whether a hypothesized pattern occurs

above chance. An example application of the hypothesis generation and testing method is

described below.

4.3.1 Detecting Circuits in Lectures. A circuit detection algorithm applied to the directed graph revealed 18 potential circuits. A sequence-matching algorithm was then

used to compute the probability of each circuit occurring in each time series. One-tailed,

one-sample t-tests indicated that 11 circuits occurred at rates significantly greater than

chance. Monte-Carlo simulations across 100,000 runs confirmed that the probability of

obtaining 11 out of 18 significant circuits (61%) is very small (approximately zero).

Therefore it is unlikely that the significant circuits were obtained by capitalizing on

chance alone.

The significant circuits primarily involved links in the information-transmission

pattern, which is what could be expected as this pattern accounts for 70.2% of the links.

The simplest circuit pertains to an oscillation between direct instruction by the tutor and

acknowledgments by the student (direct instruction → acknowledgement → direct

instruction). The tutor asserts some information (e.g., ―You always have to have radius‖),

the student provides a verbal acknowledgement (e.g., ―Right‖), and the tutor continues to

assert another chunk of information (e.g., ―You’re always gonna have radius because area’s gonna equal pi r squared‖).

A conversational extension to this circuit occurs when the tutor acknowledges the student’s acknowledgment with a discourse marker (e.g., ―ok‖) followed by another

direct instruction move (direct instruction → acknowledgement → ok → direct

instruction). The conversational ok presumably indicates that the conversational floor is

now being shifted from the student to the tutor.

A more pedagogically oriented extension to this basic circuit occurs when the tutor

(22)

are saying‖) and the tutor responds with a conversational ―ok‖ and continues with more

direct instruction (direct instruction → comprehension gauging question → metacomment → ok → direct instruction).

It is important to note that although these patterns seem trivial, they serve important

pedagogical and communicative goals. Back-channel feedback and conversational

acknowledgments are how students communicate to tutors that they are listening,

attentive, engaged, and are active partners in the conversation [Heinz 2003; Ward and

Tsukahara 2000]. Tutors presumably use comprehension gauging questions to monitor

students understanding of the content [Chi, Siler, Jeong, Yamauchi and Hausmann 2001],

although there is some evidence that students cannot accurately monitor their own

understanding [Glass et al. 1999; Graesser, Person and Magliano 1995; Person, Graesser,

Magliano and Kreuz 1994]. Tutors might also interleave these questions into the direct instruction cycle to enhance students’ engagement and to cue students to the fact that they

need to attentively comprehend the lecture.

4.3.2 Connected Components and Degrees of Vertices. In addition to path, cycle, and circuit detection, some additional properties of the directed graph provide important

insights into the nature of tutorial dialogues. One important concept pertains to the

strongly connected components of a directed graph. These are a set of maximal subgraphs where a path exists from each vertex of the subgraph to each other vertex of the subgraph

[Tucker 1995]. Identifying the strongly connected components of a graph allows us to

identify tightly coupled clusters of dialogue moves.

An analysis on the graph in Figure 1 yielded two strongly connected components.

These included moves in the information-transmission cluster (direct instruction,

acknowledgement, comprehension gauging question, metacomment, and ok) and

off-topic conversation by the student and tutor. Although these components could be

analyzed with a simple visual analysis due to the relative simplicity of lectures, more

formal methods [Tarjan 1972] would be required to detect connected components for

more complex tutorial modes such as scaffolding.

Another useful feature of the directed graph representation is the characterization of

the relative importance of a given dialogue move on the basis of the degree of the vertex representing that move. The degree of a vertex is the number of edges connected to it

[Tucker 1995]. The degree of a move, represented by a vertex in the directed graph,

provides a characterization of its contextual influence beyond mere proportional

occurrence. Contextual influence refers to the number of moves a particular move is

(23)

conversational acknowledgements, which are two frequent student moves.

Acknowledgments comprise 45% of all student moves during lectures compared to the

mere 10% accounted for by correct answers. However, an examination of Figure 1

indicates that correct answers have a larger contextual influence than acknowledgements

(degreecorrect = 6; degreeacknowledgement= 4), even though acknowledgement are four times

more likely to occur than correct answers.

Additional insights can be gleaned by comparing the in and out degrees of a given

vertex. The in-degree of a vertex is the number of incoming edges while the out-degree is

the number of outgoing edges. For example, correct answers have an in-degree of 4 and

an out-degree of 2. This indicates that the tutor is making an effort to elicit correct

information from the student via forced choices, prompts, new problems, and simplified

problems (see Figure 1). Similarly, direct instruction has an in-degree of 8 and an

out-degree of 2, suggesting that to some extent all roads lead to direct instructions, which is

what could be expected during a lecture.

In summary, as our examples show, the directed graph representations can be

examined in a number of ways to detect patterns of theoretical or practical interest.

Potential analyses not described in the present paper include performing tree

decompositions with junction trees on an undirected version of the graph, finding shortest

paths between any two moves, comparing alternate paths between vertices, and

comparing graphs across domains (e.g., math versus science).

5. GENERAL DISCUSSION

In a recent paper, Ohlsson and colleagues (2007) questioned the widely used ―code-and-count‖ method for analyzing tutorial dialogues. Code-and-count methodologies

investigate tutorial dialogues by (a) designing a coding scheme to classify tutorial speech

into dialogue moves, (b) computing the occurrence of each dialogue move, and (c)

attributing learning gains to the most frequent dialogue moves. They question the

assumption that sheer frequency of a move is predictive of its casual efficacy, and

recommend focusing on effective combinations of moves.

This paper advanced this goal by presenting a method to automatically identify

collaborative patterns in tutorial dialogues. We used the method to identify patterns of

student and tutor dialogue moves during lectures. Although we did not explicitly link the

patterns to learning gains, the focus on collaborative patterns, beyond proportional

occurrence of dialogue moves, represents an important advance in the analysis of tutorial

(24)

approach, discusses some important limitations, delves deeper into the structure of

collaborative lectures, and discusses some applications of this research towards the

development of intelligent tutoring systems.

5.1 Advantages of Pattern Mining Method

Our pattern mining method consisted of identifying significant two-step excitatory

transitions between dialogue moves, integrating the transitions into a directed graph

representation, visually and statistically analyzing the directed graph for collaborative

patterns, and generating and testing data-driven hypotheses from the directed graph. This

method has a number of advantages over alternate methods such as dependent adjacency

analysis, Hidden Markov Models (HMMs), and sequential association rule mining.

Our use of the likelihood metric (Equation 2) to identify significant move transitions

has three significant advantages. First, the metric explicitly controls for base rates,

thereby eliminating spurious transitions that can be attributed to a mere capitalization on

chance. Second, the metric yields a normalized score, thereby making it possible to

compare any two transitions, including transitions with different antecedents and

consequents. This comparison is not afforded by adjacency pair analysis. The third

advantage of the likelihood metric is that it provides a standard way of computing effect

sizes [Cohen 1992]. In contrast, association rule mining algorithms require arbitrary

specification of candidate generation thresholds such as support and confidence [Srikant and Agrawal 1996].

Another set of advantages pertains to our use of directed graph representations to

integrate significant excitatory transitions. Directed graphs offer the ability to visually

identify dialogue move clusters with distinct pedagogical and motivational functions

(e.g., information-transmission versus information-elicitation). It should be noted that the

enhanced ability to visually detect patterns is not arbitrary, but is attained by virtue of

visualization algorithms from the field of graph drawing [Di Battista et al. 1994; Herman

et al. 2000]. These algorithms use optimization techniques to pictorially depict a graph in

order to feature its important properties. For example, information-transmission moves

were spatially segregated from information-elicitation moves, so these two clusters could

be detected by a visual inspection. Visual pattern detection is difficult when complexity

of the graph increases, however, a variety of graph algorithms (e.g., connected

components) can be used in these situations.

It should be noted that HMMs also afford the ability to identify interesting

(25)

of patterns (i.e., hidden states) needs to be specified apriori. It is also the case that

additional complexities arise during HMM training when a large number of parameters

need to be estimated and there is no underlying theory to guide parameter initialization.

Our use of a directed graph to generate candidate patterns has several advantages over

other methods such as naïve pattern scanning and sequential association rule mining. The

naïve approach would entail scanning the data for every possible pattern. For example,

considering all combinations for two, three, and four-length paths yields 12,341 paths of

length two (i.e., three vertices and two edges), 123,410 paths of length three, and 962,598

paths of length four (1,098,349 paths in all). In contrast, a path analysis using our pattern

mining method would only require scanning 147 potential paths. In addition to the

obvious computational advantages, testing the significance of a reduced number of

candidate patterns alleviates concerns pertaining to false positives and Type I errors.

Although sequential association rule mining techniques reduce the number of

potential patterns, they do not provide the ability to conceptualize a graph on the basis of

strongly connected components, vertex degrees, junction trees, edge covers, vertex covers

and other potentially useful constructs. Simply put, directed graph representations are

more powerful than the item sets and rules provided by association rule mining.

5.2 Limitations of Pattern Mining Method

It is important to acknowledge some of the limitations with the pattern mining method

with respect to its application for educational data mining. Perhaps the most important

limitation pertains to the substantial data preparation activities that need to be performed

before the method can be applied. In particular, the spoken tutorial dialogues need to be

transcribed and coded for dialogue moves before they can be submitted to the pattern

mining algorithm. The transcription and data coding phases are laborious and time

consuming. They may also involve additional resource intensive subphases such as

checking transcription quality and assessing the reliability of the coding protocols.

Although this limitation would also apply to related pattern mining methods, the

substantial human resources that need to be invested before our pattern mining approach

can be successfully applied warrants further consideration.

One option to alleviate this limitation is to use automatic speech recognition for

speech-to-text transcription and automatic speech act classification to code the dialogue

moves. The automatic speech recognition option is somewhat less viable because

contemporary speech recognizers are still imperfect, particularly for real world

(26)

to 45.5% based on the system, domain, and testing environment [Hagen et al. 2007; Kato

et al. 2000; Leeuwis et al. 2003; Litman et al. 2006; Pellom and Hacioglu 2003; Rogina

and Schaaf 2002; Zolnay et al. 2007].

The news is more positive for automatic speech act classification as a number of

advances have been made. For example, Olney and colleagues used finite state

transducers to accurately classify student speech acts during tutoring sessions with

AutoTutor [Olney et al. 2003]. Hoque and colleagues used a combination of

acoustic-prosodic and discourse features to classify speech acts between two conversational

participants [Hoque et al. 2007]. We are currently exploring the possibility of

automatically classifying the student and tutor dialogue moves in our expert tutoring

corpus and have made some important advances along this front [Williams and D'Mello

in press]. Although we do not expect the automatic speech act classification system to

yield perfect results, we hope that it will be sufficiently accurate to not adversely affect

the fidelity of our pattern mining approach.

The second limitation of our approach lies in the excitatory transition detection phase.

Here, two-step significant transitions are identified and preserved for subsequent

analyses. The problem is that the current algorithm only considers exact matches between

adjacent dialogue moves. For example, the transition will only be counted

once in the following sequence: . The

second transition will be ignored because of the intermediate discourse marker

between the and moves. Naturalistic tutorial dialogues are rife with such

discourse markers, acknowledgments, and back-channel feedback. Their occurrence

might interfere with the discovery of more interesting relations such as the

transition. Fortunately, this limitation can be alleviated by either removing these

conversational oriented dialogue moves from the time series prior to the analysis or by

endowing the transition detection algorithm with approximate sequence matching

capabilities [Mount 2004]. These measures were not incorporated in our analysis of

tutorial lectures because we were interested in analyzing both the pedagogical (e.g.,

positive feedback, correct answer) as well as the conversational (e.g., ok,

acknowledgment) dialogue moves of the student and tutor.

A somewhat less important, but nevertheless noteworthy limitation pertains to the

construction of the directed graph from the significant two-step excitatory transitions. In

some cases the directed graph is unusually complex and consequently less amenable to

visual inspection. The human visual system is a highly attuned pattern detector, but it

(27)

because the graph for the lecture mode was not too complex. However, when we applied

our pattern mining approach to the scaffolding mode, the derived directed graph was too

complex for visual examination.

The problem is that the complexity of the graph increases proportional to the number

of significant excitatory transitions. Hence, one simple approach is to reduce the number

of edges in the directed graph. This can be achieved by constructing the directed graph

from a predefined number of edges; presumably the edges associated with the strongest

effects. This is precisely the approach we adopted in the present paper by only including

edges with medium to large effect sizes. Although this approach solves the complexity

problem by yielding simpler graphs, one undesirable side effect is that the number of

orphan nodes increases (i.e., nodes with no parents such as the node in Figure 1).

Therefore, advanced graph visualization methods might be necessary to produce directed

graphs that can be visually inspected.

5.3 The Collaborative Lecture

Having discussed some of the advantages and limitations of our pattern mining approach

we now discuss some of the insights we have discovered with respect to expert tutor

lectures.

It is widely acknowledged that scaffolded problem solving via mixed-initiative

dialogues underlie the merits of one-on-one tutoring [Chi et al. 1994; Graesser, Person

and Magliano 1995; Lepper et al. 1997; Puntambekar and Hubscher 2005; Rogoff and

Gardner 1984; Shah, Evens, Michael and Rovick 2002]. Hence, it is somewhat

counterintuitive that expert tutors devote substantial portions (30%) of the tutoring

sessions to lectures, because didactic lectures are not expected to yield impressive

learning gains [Chi 1996]

It might be the case that the high incidence of lecturing is one of the characteristics

that distinguish expert tutors from novice tutors and intelligent tutoring systems. This is

essentially an empirical question that can be addressed by comparing expert versus

non-expert tutors. However, this analysis is compromised by the fact that relatively few

studies have investigated expert tutors. The few studies that have documented the

strategies of expert tutors have typically focused on one or two experts with no objective

criteria of what constitutes expertise in tutoring [Person, Lehman and Ozbun 2007]. In

some of the studies, the expert tutors are PhDs with extensive teaching and/or tutoring

experience [Evens et al. 1993; Glass, Kim, Evens, Michael and Rovick 1999; Jordan and