Motion Analysis - Camera-based estimation of student's attention in class

vectors in the current frame are assigned to groups.

After the groups in the new frame are created, they are connected into motion tracks (T), which span across frames and model temporal consistency. Groups either initiate a new motion track or are assigned to an existing track detected in the previous frame. Assignment is done with a greedy algorithm:

1. For each vector group in the current frame, calculate the group’s centre of mass (mean value of (x, y) locations of tracking points in the group).

2. Merge the group with the motion track which has the closest centre of mass in the previous frame, if the distance between the two centres is smaller than the specified threshold.
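The two steps above can be sketched as follows. This is a minimal illustration, not the original implementation: the function names, the data layout (a track as a list of per-frame centres) and the `max_dist` parameter are assumptions made for the example.

```python
from math import hypot

def centre_of_mass(points):
    """Mean (x, y) location of the tracking points in one group."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    return (sum(xs) / len(xs), sum(ys) / len(ys))

def assign_groups(groups, tracks, max_dist):
    """Greedily attach each group to the nearest active track, or start a new track.

    groups   -- list of groups, each a list of (x, y) tracking points
    tracks   -- list of active tracks, each a list of centres (one per frame)
    max_dist -- hypothetical distance threshold in pixels
    Returns the set of track indices refreshed in this frame.
    """
    # Snapshot the previous-frame centres so that groups merged earlier
    # in this frame do not shift the reference positions.
    previous = [track[-1] for track in tracks]
    refreshed = set()
    for points in groups:
        cx, cy = centre_of_mass(points)
        best, best_dist = None, max_dist
        for i, (px, py) in enumerate(previous):
            dist = hypot(cx - px, cy - py)
            if dist < best_dist:  # strictly smaller than the threshold
                best, best_dist = i, dist
        if best is None:
            tracks.append([(cx, cy)])  # the group initiates a new track
            refreshed.add(len(tracks) - 1)
        else:
            tracks[best].append((cx, cy))
            refreshed.add(best)
    return refreshed
```

Tracks whose index is not in the returned set were not refreshed in the current frame.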

Raw motion vectors are shown in Figure 5.6c as purple arrows. For visualisation purposes, the cloud centres from several frames are connected into a track, which is represented in Figure 5.6d as a coloured jagged line.

At each frame a motion track is either refreshed (a group of points was assigned to it), or is terminated. A terminated motion track is taken out of the pool of active tracks, and is evaluated to be assigned to a specific student.

The track is assigned to the student with the highest probability of generating the entire motion ($\hat{g}$), determined by Formula 5.1. Each student ($g$) is modelled with a 2D Gaussian distribution centred on the head location (depicted in Figure 5.6b). The function $p(\vec{v} \mid g)$ represents the probability of motion vector $\vec{v}$ being generated by student $g$, based on the location of the vector with respect to the 2D Gaussian associated with the student.

$$\hat{g} = \arg\max_{g} \sum_{\forall \vec{v} \in T} p(\vec{v} \mid g) \qquad (5.1)$$
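A minimal sketch of Formula 5.1 is given below. The text does not specify the covariance of the Gaussian, so the example assumes an isotropic Gaussian with a per-student standard deviation `sigma`. That parameter, the function names and the dictionary layout are illustrative assumptions.

```python
from math import exp, pi

def gaussian_2d(x, y, cx, cy, sigma):
    """Density of an isotropic 2D Gaussian centred on (cx, cy).

    The isotropic form and sigma are assumptions made for this sketch.
    """
    d2 = (x - cx) ** 2 + (y - cy) ** 2
    return exp(-d2 / (2 * sigma ** 2)) / (2 * pi * sigma ** 2)

def assign_track(vector_locations, students):
    """Return the student that maximises the sum of p(v|g) over the track.

    vector_locations -- (x, y) locations of all motion vectors in the track T
    students         -- dict: student id -> (head_x, head_y, sigma)
    """
    scores = {
        g: sum(gaussian_2d(x, y, cx, cy, sigma) for x, y in vector_locations)
        for g, (cx, cy, sigma) in students.items()
    }
    return max(scores, key=scores.get)
```

For example, with two students whose heads are 100 pixels apart, a track lying close to the first head is assigned to the first student.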

5.2 Motion Analysis

The amount of recorded motion varied significantly from person to person. In the captured units, each motion vector represents the distance of motion displacement in pixels. Total motion intensity for a given person was computed by summing the intensities of all motion vectors within the time step (2 seconds, explained in Section 5.2.3). To create a measurement that can be compared between individuals, a normalization step was required.

Figure 5.7 – Visualization of motion intensity for a single person (green) over the class mean (grey). The horizontal axis represents time, the vertical axis the relative intensity of motion for the given person. Vertical markers represent the annotated classroom events; most notably, the 4 paired red markings represent the start and end of the questionnaire fill-out. The horizontal red line shows the minimal intensity of motion considered an “observable movement”.

5.2.1 Motion normalization

Our distribution of motion-tracking points ensures that we provide the same chance of detecting the motion of people in the front and in the back, but it does not guarantee the same motion intensity for equivalent actions. To achieve this, all vector intensities associated with a specific student are normalized by the diagonal length of the student’s annotated region. Given that we do not consider the nature of the motion, our final aim is to transform the observed motion intensity into a relative motion intensity which we can relate to the person’s in-class activity. The final range of relative motion intensity is from 0 to 100% and is based on two observations from our recordings:

1. The student is, on average, sitting still during the class.

2. The student has at least one full upper-body movement in the recorded footage (e.g. pose shift).

In accordance with these observations, our scaling mechanism also consists of two steps:

1. We take the median value of movement intensity as the 5% level. This value was chosen heuristically to approximate small motion with a reasonable value, and it leaves a large scaling space for bigger movements.

2. The algorithm checks that given the 5% motion intensity value, the student reaches 100% motion at least once during the class. Motion which registers above the threshold of 100% is clipped to the maximum value.
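The scaling can be illustrated with the sketch below. The text fixes only the anchor point (median = 5%) and the clipping at 100%. The linear mapping through that anchor, the function name and the handling of a zero median are assumptions made for the example.

```python
from statistics import median

def relative_intensity(raw, region_diagonal, median_level=5.0):
    """Map raw per-step motion intensities (pixels) to a 0-100% scale.

    raw             -- summed motion intensity per time step
    region_diagonal -- diagonal length of the student's annotated region
    median_level    -- percentage assigned to the median intensity (5% in the text)
    Returns (relative values, whether 100% is reached at least once).
    """
    # Normalize by the size of the annotated region.
    sized = [v / region_diagonal for v in raw]
    m = median(sized)
    if m <= 0:
        # Assumption: a zero median leaves the scale undefined.
        raise ValueError("median intensity must be positive to define the scale")
    # Assumed linear scale that places the median at median_level percent.
    scale = median_level / m
    # Values above 100% are clipped to the maximum.
    relative = [min(100.0, v * scale) for v in sized]
    reaches_full = any(r >= 100.0 for r in relative)
    return relative, reaches_full
```

The second return value corresponds to the check in step 2: it reports whether the student reaches 100% at least once under the chosen scale.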

The relative motion intensity measure for a single person is shown in Figure 5.7. We acknowledge that this scale is tightly connected to our scenario (students sitting and taking notes, with occasional body-shift) and that the scaling should be re-defined in contexts which include more dynamic behaviour.


5.2.2 Observable and synchronized movement

Given that 100% of relative motion intensity is roughly equivalent to a full upper-body movement, we defined observable movement as motion with more than 30% intensity. The 30% threshold was chosen heuristically as the limit which separates minor body movement from motion that can be registered by people in the student’s surroundings. This threshold is visualized in Figure 5.7 as the horizontal red line.

A synchronized movement instance between two persons is defined as two instances of observable movement happening at approximately the same time. We use the term “approximately” because we consider different time delays between the motions to model different synchronization precisions. A detailed explanation is given in the following section.

5.2.3 Motion synchronization

Figure 5.8 – Example of co-movement on a movement intensity graph of two persons. The picture is a snippet of a motion visualization as shown in Figure 5.7. The motion intensities of the two persons are overlaid on each other. Person 2 shifted her seating position (blue line); 2 seconds later, the neighbouring Person 1 (marked in green) also started re-adjusting herself.

From the dual eye-tracking theory, we know that the quality of collaboration (Richardson and Dale, 2005) and understanding (Jermann and Nüssli, 2012) between two persons can be assessed by analysing the correlation of their gaze patterns. In our work, we expand on previous conclusions in two ways: i) by analysing the whole audience and ii) by using a more general measure of activity. Our hypothesis is that students who listen to the teacher will be more likely to move in a synchronized manner, while an absent-minded student will act on his/her own internal rhythm.

Synchronized motion is not limited to a specific action, but can be explained on the example of note-taking: attentive students would turn pages of the hand-outs and take notes immediately after they were presented in class. More than a reaction to the lecture’s audio/visual stimulus, motion can be seen as a “convergence” of the audience, or indirect synchronization to a signal (Section 3.1.4). If students perceive an outside event (e.g. loud noise, truck) as more important than the lecture, they would still have a synchronized motion (everybody looking through the window), but caused by a different stimulus than the teacher. In their publication, Delaherche et al. (2012) term this concept process coordination.

Classroom synchronization was studied in a dyadic fashion, by comparing pairs of any two students. We used only the data collected during uninterrupted teaching, and did not take into account the questionnaire fill-out periods.

Similarly to the analysis of synchronization between pairs done by Delaherche and Chetouani (2010), we took into consideration the seating arrangement between the two analysed students. We divided the dyads into three conditions based on their mutual visibility: immediate neighbours, visible neighbourhood, or non-visible student pairs (as described in Section 3.2). In the case of immediate neighbours, the students were considered mutually visible (both of them can observe and synchronize to the actions of the other). In the case of visible neighbourhood, the student sitting behind can observe the actions of the other student, but not the other way around. Non-visible student pairs were considered for cases of accidental and indirect synchronization, but not as a direct influence of one on the other.

Given that learning is not a scripted activity, reactions of students can vary or be completely blank. The research in dual eye-tracking has identified a delay of 2 seconds between the speaker’s and listener’s gaze when a specific item was referenced (Richardson et al., 2007). The conclusion of that research was that the comprehension between participants is inversely proportional to the time-lag. Based on this threshold, we define actions of two students as co-movement if the actions co-occur within a time window of 4 seconds (depicted in Figure 5.8). We differentiate between:

• perfect synchronization, < 2 seconds apart,

• synchronization, 2-4 seconds apart (2 seconds added as time needed for executing the motion),

• weak synchronization, 4-6 seconds apart.
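The three categories above can be expressed as a small classification function. The text does not state which category a lag of exactly 2 or 4 seconds belongs to. The sketch assumes half-open intervals [0, 2), [2, 4) and [4, 6) and returns `None` for anything longer. The function name is illustrative.

```python
def classify_lag(lag_seconds):
    """Classify the delay between two observable movements.

    Boundary handling (half-open intervals) is an assumption of this sketch.
    The sign of the lag is ignored: only the time distance matters here.
    """
    lag = abs(lag_seconds)
    if lag < 2:
        return "perfect synchronization"
    if lag < 4:
        return "synchronization"
    if lag < 6:
        return "weak synchronization"
    return None  # too far apart to count as synchronized
```

With motion sampled in 2-second steps, lags of 0, 2 and 4 seconds would then fall into the three categories in order.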

The third period (4-6 seconds) was introduced to take mimicking into account: the person is not reacting to the teacher’s stimulus but is following the reaction of others, for which we add 2 seconds for the person to observe the reaction of others and then reproduce it. Sleeper’s lag represents the delay (“lag”) in movement caused by mimicking the actions of other students instead of reacting to the original source of information.

Algorithmically, motion synchronization between two persons was calculated as a matrix multiplication. Each person is represented by a time series of motion intensity values, sampled in 2-second steps. The co-movement matrix is created by multiplying the two time series as an N×1 and a 1×N matrix (visualized in Figure 5.9a). N represents the number of samples collected for each person during the lecture.
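Multiplying an N×1 matrix by a 1×N matrix is an outer product, which the sketch below computes in plain Python. The function name and the use of nested lists are illustrative choices, not part of the original implementation.

```python
def co_movement_matrix(series_a, series_b):
    """Outer product of two motion-intensity time series.

    Entry [i][j] is the intensity of person A at time step i multiplied by
    the intensity of person B at time step j.
    """
    if len(series_a) != len(series_b):
        raise ValueError("both series must contain N samples")
    return [[a * b for b in series_b] for a in series_a]

# Example with N = 3 samples per person.
person_a = [0.0, 0.5, 1.0]
person_b = [1.0, 0.0, 0.5]
matrix = co_movement_matrix(person_a, person_b)
```

An entry is large only when both persons moved strongly at the respective time steps, so a non-zero value at [i][j] marks a pair of movements separated by |i - j| sampling steps.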

Within the two time series, values with the same index represent the same time period in the lecture. This means that perfect synchronization moments will be found on the diagonal of the co-movement matrix, at coordinates (t, t). To analyse synchronization instances (2-4 seconds
