III. STUDY 1: THE LATIN SQUARE TASK
3.13. Experiment 4: Results
3.13.1. Approach to the analysis
Analyses used participant gaze data within areas of interest (AOIs) as the measure of interest. As seen in the example item in Figure 3.5, the cells of the matrix and the response options were used to create an AOI template applied to all items. The value of each AOI was the total gaze time spent within that AOI on that item for that participant. These values were then transferred to a set of dynamic AOIs, created for each item using an index of item attributes. For instance, for the example in Figure 3.5, item 03, cell 2B is the final target cell (the cell with the ‘?’ that must be solved), and so the gaze time value of the final target cell is equal to the gaze time value of the cell 2B for that item (whereas for item 04, the ‘?’ cell is in cell 4B, so the final target cell for that item is taken from the gaze time for cell 4B). In addition to final target cell, the other dynamic AOIs within the matrix were distractor cells (filled cells which have no impact on the solvability of the item), final relation cells (filled cells involved in the relation of the final step), interim target cell (the cell that must be solved in the first step of a 2-step item, i.e., the interim step), and interim relation cells (filled cells involved in the relation of the interim step). In addition, two dynamic AOIs corresponding to the response options were calculated: final-answer-RO (the response option with the answer to the item) and interim-answer-RO (the response option with the answer to the interim step). For distractor cells, interim relation cells, and final relation cells (all of which may have more than one cell per item), the value was a sum of all the cells that corresponded to that attribute for that item.
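The mapping from template AOIs to dynamic AOIs described above can be sketched as a simple lookup-and-sum. This is a minimal illustration rather than the analysis script used in the study: the index structure, the function name, the relation and distractor cells listed for item 03, and the gaze times are all hypothetical. Only the fact that cell 2B is the final target cell of item 03 comes from the text.

```python
# Hypothetical item index: which template cells play which role on each item.
# Only "2B is the final target of item 03" is taken from the text; the other
# cell assignments are invented for illustration.
ITEM_INDEX = {
    "item03": {
        "final_target": ["2B"],
        "final_relation": ["2A", "2C", "2D"],
        "distractor": ["1A", "4D"],
    },
}


def dynamic_aoi_times(template_times, roles):
    """Sum template-AOI gaze times (in seconds) into role-based dynamic AOIs.

    Roles with several cells (relation and distractor cells) are summed;
    cells with no recorded gaze count as 0.
    """
    return {
        role: sum(template_times.get(cell, 0.0) for cell in cells)
        for role, cells in roles.items()
    }


# Hypothetical gaze times (seconds) for one participant on item 03.
template_times = {
    "2B": 2.0, "2A": 1.0, "2C": 0.5, "2D": 0.25,
    "1A": 0.75, "4D": 0.5, "3C": 0.125,
}
result = dynamic_aoi_times(template_times, ITEM_INDEX["item03"])
print(result)
```

Cells that have no role in the index (here 3C) are simply not carried over into any dynamic AOI.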
The target, relational, and interim cells were derived from Birney et al.’s (2006) RC analysis of the LST, though an additional item analysis was then conducted for each item to determine whether there were alternate solution pathways. Some items did have multiple solution pathways, which made calculating distractor and relation cells difficult. For these, we first assumed that the most relationally simple pathway was taken. In the event of a tie (e.g., an item where two binary solution pathways were available), gaze time was calculated separately for each solution pathway, and the final value of the relation cells was taken from the pathway with the highest total gaze duration for that participant. Distractor cells were summed only if they did not contribute to any potential solution pathway. In other words, distractor cells were filled cells that, if removed and turned to empty cells, would not affect the solvability of the item regardless of the pathway taken. Although this approach may result in some loss of gaze data if participants switch solution pathways partway through the problem, it was the most straightforward way of ensuring there was only one set of relation and distractor cells per item for use in the analyses.
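The tie-breaking rule for items with several equally simple solution pathways can be illustrated as follows. This is a sketch under stated assumptions, not the original procedure: the function name, cell labels, and gaze times are hypothetical, and at least one pathway is assumed to be supplied.

```python
def relation_and_distractor_times(cell_times, filled_cells, pathways):
    """Return (relation_time, distractor_time) in seconds for one participant.

    cell_times   -- gaze time per template cell
    filled_cells -- all filled (non-empty) cells of the item
    pathways     -- one list of relation cells per tied solution pathway
    """
    # Total gaze time per pathway; keep the pathway looked at the longest.
    pathway_totals = [
        sum(cell_times.get(cell, 0.0) for cell in pathway)
        for pathway in pathways
    ]
    relation_time = max(pathway_totals)

    # A distractor is a filled cell that belongs to no pathway at all.
    on_any_pathway = set().union(*pathways)
    distractor_time = sum(
        cell_times.get(cell, 0.0)
        for cell in filled_cells
        if cell not in on_any_pathway
    )
    return relation_time, distractor_time


# Hypothetical item with two tied pathways and one true distractor (2D).
cell_times = {"1A": 1.5, "1B": 0.5, "3C": 0.25, "4C": 0.5, "2D": 1.0}
filled_cells = ["1A", "1B", "3C", "4C", "2D"]
pathways = [["1A", "1B"], ["3C", "4C"]]
relation_time, distractor_time = relation_and_distractor_times(
    cell_times, filled_cells, pathways
)
```

In this example the gaze time on the unused pathway (3C and 4C) is counted neither as relation time nor as distractor time, which is the loss of gaze data the text acknowledges.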
Figure 3.5. Areas of interest (AOIs) for the LST. The matrix on the left displays the AOI template analogously applied to all items. These template AOIs are converted to dynamic AOIs for each item, as demonstrated by the example matrix on the right. For 1-step items, there is no interim target cell, interim relation cells, or interim answer RO (response option). Distractor cells are shape-filled cells that have no impact on the solvability of the item (i.e., they could be turned to empty cells and the item would have the same solution pathway). For items with multiple paths to solution (e.g., two sets of relation cells per target cell), the set of relation cells with the highest gaze duration (per participant) is recorded as the relation cells for that item.
Finally, we also included RO-revisits, a measure of the number of times a participant returned to the response options on each item. Although not a measure of gaze duration, revisits is nonetheless a gaze metric, one which Laurence et al. (2018) found was the best predictor of test scores on a similar, matrix-style reasoning task.
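One way to operationalize RO-revisits from a fixation sequence is sketched below. The text does not specify how revisits were counted, so the rule used here is an assumption: a visit is an entry into any response-option AOI from outside the response-option region, and the first visit is not counted as a return. The function name and AOI labels are hypothetical.

```python
def count_ro_revisits(fixation_aois, ro_labels):
    """Count returns to the response options in an ordered fixation sequence.

    Assumed rule: each entry into the response-option region from outside
    is one visit; revisits are all visits after the first.
    """
    visits = 0
    inside = False
    for aoi in fixation_aois:
        now_inside = aoi in ro_labels
        if now_inside and not inside:
            visits += 1  # crossed into the response-option region
        inside = now_inside
    return max(visits - 1, 0)


# Hypothetical sequence of AOIs fixated on one item.
ro_labels = {"RO1", "RO2", "RO3", "RO4"}
sequence = ["2B", "RO1", "RO2", "1A", "RO3", "2B", "RO3"]
revisits = count_ro_revisits(sequence, ro_labels)
```

Moving between two response options (RO1 to RO2 above) does not count as a new visit under this rule, because gaze never left the response-option region.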
The program recorded gaze duration data in milliseconds, but values are reported in seconds for interpretability. Hypotheses were tested using binary logistic regression on item-level data, using item metrics (RC, steps) and gaze metrics (e.g., final target cell, final relation cells, RO-revisits) for each item to predict success on that item (0 for incorrect, 1 for correct). Effects are reported as odds ratios, Exp(B), rather than as the change in log-odds. Confidence intervals for odds ratios are reported for ease of interpretability (CIs containing 1 indicate non-significance).
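The relationship between a log-odds coefficient, its odds ratio, and the confidence interval rule stated above can be sketched as follows. This assumes Wald-type 95% intervals (B ± 1.96 SE on the log-odds scale), which the text does not confirm; the function names and the example coefficient and standard error are hypothetical.

```python
import math


def odds_ratio_with_ci(b, se, z=1.96):
    """Convert a logistic-regression coefficient to (Exp(B), lower, upper).

    Assumes a Wald interval: the bounds are computed on the log-odds
    scale and then exponentiated.
    """
    return math.exp(b), math.exp(b - z * se), math.exp(b + z * se)


def is_significant(lower, upper):
    """An odds-ratio CI that excludes 1 indicates a significant effect."""
    return not (lower <= 1.0 <= upper)


# Hypothetical coefficient and standard error.
odds_ratio, lower, upper = odds_ratio_with_ci(b=-0.5, se=0.2)
print(round(odds_ratio, 3), round(lower, 3), round(upper, 3))
print(is_significant(lower, upper))
```

An odds ratio below 1 corresponds to a negative coefficient (lower odds of success per unit increase), and an odds ratio above 1 to a positive one.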
3.13.2. Gaze time descriptives and logistic regressions
Overall, performance was similar to that described in the earlier experiments for RC (2D M = .94, SD = .24; 3D M = .83, SD = .37; 4D M = .58, SD = .50) and Steps (1S M = .82, SD = .38; 2S M = .75, SD = .43). Descriptives for gaze metrics are provided in Table 3.8. These mean values demonstrate that, on average, about 3.5 seconds were spent on final relation cells of each item, while 4.9 seconds were spent on interim relation cells. The high variance in these descriptives is to be expected, considering they average across item types.
Table 3.8. Gaze time metric descriptives.
Metric                               Mean     SD
Final answer RO                      0.97s    0.74s
Interim answer RO (2S only)          0.67s    0.85s
Final target cell                    2.21s    3.22s
Interim target cell (2S only)        1.75s    2.19s
Final relation cells                 3.54s    4.00s
Interim relation cells (2S only)     4.90s    5.29s
Distractor cells                     1.87s    2.96s
RO Revisits                          4.22     5.32
N = 510 (15 x 36) item responses (255 for 2S only metrics)
For the first regression, item success was predicted using RC, Steps, final answer RO, final target cell, final relation cells, distractor cells, and RO revisits. As hypothesized, RC was a significant predictor of item success (CI95% = [0.172, 0.400], p < .001), as was Steps (CI95% = [0.216, 0.802], p = .009); increases in either lowered the odds of success. Among the gaze metrics, final answer RO was a significant and very strong positive predictor of success (CI95% = [17.77, 90.95], p < .001). This was unsurprising, as participants needed to input their answer by clicking the corresponding response option. Final target cell was also significant (CI95% = [0.801, 0.983], p = .022), though in a negative direction: for every 1 second spent looking at the final target cell, there was, on average, an 11.3% reduction in the odds of correctly answering the item. The distractor cells were also significant (CI95% = [0.750, 0.935], p = .002) in a negative direction: for every 1 second spent looking at distractor cells, there was, on average, a 16.3% reduction in the odds of correctly answering the item. The number of RO revisits (toggling rate) was also significant (CI95% = [0.736, 0.867], p < .001) in a negative direction: for every additional revisit to the response options, there was, on average, a 20.1% reduction in the odds of solving the item correctly. Contrary to the hypothesis, the final relation cells were not a significant predictor of item success (CI95% = [0.966, 1.112], p = .323). Table 3.9 displays the full output of this regression.
Table 3.9. Output of Binary Logistic Regression with Item Characteristics, Gaze Time on Areas of Interest (AOIs), and Revisit Rates predicting Item Success (1S and 2S items).
Predictor                              Exp(B)     CI-Exp(B)          Sig.
Relational Complexity                  0.263      0.172, 0.400       < 0.001
Steps                                  0.416      0.216, 0.802       0.009
Final-answer Response Option (sec)     40.207     17.774, 90.952     < 0.001
Final-answer Target Cell (sec)         0.887      0.801, 0.983       0.022
Final relation cells (sec)             1.036      0.966, 1.112       0.323
Distractor cells (sec)                 0.837      0.750, 0.935       0.002
Response Option Revisits (#)           0.799      0.736, 0.867       < 0.001
Constant                               329.65                        < 0.001
χ² = 240.17, df = 7, p < .001
Classification Accuracy = 88.8%; Nagelkerke R² = .582
N = 510 items
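The percentage effects quoted for the significant negative predictors follow directly from the Exp(B) column of Table 3.9: the percent change in the odds of success per one-unit increase is (Exp(B) − 1) × 100. The short check below uses the tabled values; the function and variable names are illustrative only.

```python
def percent_change_in_odds(exp_b):
    """Percent change in the odds of success per one-unit increase."""
    return (exp_b - 1.0) * 100.0


# Exp(B) values taken from Table 3.9.
table_3_9 = {
    "Final-answer target cell (per second)": 0.887,
    "Distractor cells (per second)": 0.837,
    "Response option revisits (per revisit)": 0.799,
}

changes = {
    name: round(percent_change_in_odds(exp_b), 1)
    for name, exp_b in table_3_9.items()
}
for name, change in changes.items():
    print(f"{name}: {change:+.1f}% change in odds")
```

These are changes in odds, not in the probability of a correct response; the two are close only when the probability of success is small.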
The second regression included the same predictors as above, but also added interim gaze metrics as additional predictors (interim answer RO, interim target cell, interim relation cells). Because interim gaze metrics were only calculated for 2S items, only 2S items were included. This regression was conducted over two models. The first model aimed to replicate the results of the first regression (i.e., interim AOIs were not included), while the second model added the interim AOI metrics. The first model mostly replicated the previous regression. However, this time, the final-answer target cell was not a significant predictor of item success (CI95% = [0.748, 1.090], p = .288), but the final relation cells were (CI95% = [1.005, 1.416], p = .044), such that for every 1 additional second spent looking at the final relation cells, there was, on average, a 19.3% increase in the odds of solving the item correctly. In the second model, the pattern of results for the previous predictors remained the same. Of the three new predictors, only interim answer RO was a significant predictor, in a negative direction (CI95% = [0.108, 0.813], p = .018). However, as with the other response option AOIs, this should be interpreted with caution, since those looking to input their answer look towards the response options (in this case, inputting the interim response would result in an incorrect answer, so the odds of success decrease). Contrary to hypotheses, the other two predictors, interim target cell and interim relation cells, were not significant predictors, ps > .05.
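No Exp(B) value is quoted in the text for the final relation cells in the 2S-only model, but the 19.3% figure can be cross-checked from the reported interval. Assuming the interval is a Wald-type interval, symmetric around B on the log-odds scale (an assumption; the interval type is not stated), the odds ratio is the geometric mean of the lower bound L and upper bound U:

```latex
\[
\mathrm{Exp}(B) \approx \sqrt{L \times U}
  = \sqrt{1.005 \times 1.416}
  \approx 1.193,
\qquad
(1.193 - 1) \times 100\% \approx 19.3\%.
\]
```

The same check applied to Table 3.9 recovers the tabled values, for example √(0.801 × 0.983) ≈ 0.887 for the final-answer target cell.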