Chapter 5 Spatial ability and success in solving algebra word problems
5.2 Summary of results for problems and core competency questions
To begin addressing the hypothesis that problem representation and not problem solution is connected to spatial ability, Pearson correlation coefficients between scores on all three variables were calculated. The sample was then grouped by spatial ability into weak and strong visualizers to compare the problem and question scores of this dichotomous grouping.
This analysis of math scores was followed by looking at performance on the individual questions and problems. Two approaches to analysing math performance were taken. One was to group the sample into correct and incorrect groups and compare the PSVT:R scores of each group, calculating the magnitude and significance levels of any differences. The second was to calculate point-biserial correlation coefficients between the score on the problem or question (either 0 or 1) and the PSVT:R score.
Correlations between measures
A strong and highly significant correlation coefficient was measured between the PSVT:R and the math problem scores: F(1, 114) = 47.42, r(113) = .544, r² = .296, p < .001 (Table 5-5). In contrast, there was a negligible and non-significant correlation coefficient measured between the PSVT:R and the math questions: F(1, 114) = 1.99, r(113) = .131, r² = .017, N.S. Lastly, the math problem scores were found to have a significant correlation with the questions: F(1, 114) = 23.974, r(113) = .418, r² = .175, p < .001. The Pearson correlation is based on the assumption that the data are normally distributed, which is violated in the case of the math question data.
It is more appropriate, therefore, to use the non-parametric Spearman correlation coefficient, which was calculated as rS = .459, p < .001 for math problems to questions and as rS = -.153, N.S. for the PSVT:R to questions. With either method, Pearson or Spearman, the correlation coefficient between the two math measures is significant, with 18 to 21 % of variation shared between them.
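As a rough illustration of the non-parametric alternative described above, the following sketch computes a Spearman coefficient as the Pearson correlation of rank-transformed scores, with tied scores given their average rank. The function names and the example scores are hypothetical and are not taken from the study data.

```python
import math

def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the run of tied values starting at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions are 0-based, ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def pearson_r(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def spearman_rs(x, y):
    """Spearman coefficient: Pearson r computed on the average ranks."""
    return pearson_r(average_ranks(x), average_ranks(y))

# Hypothetical scores for eight participants (illustration only).
problem_scores = [1, 3, 2, 5, 4, 0, 3, 6]
question_scores = [5, 6, 5, 6, 6, 4, 6, 6]
print(round(spearman_rs(problem_scores, question_scores), 3))
```

Because only the rank order enters the calculation, a data set bunched at the top of the scale, as the question scores are here, affects the coefficient through ties and not through distance from the mean.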
Table 5-5. Correlation matrix for scores on the PSVT:R, math problems and math questions, n = 115.
These findings indicate a stark contrast in magnitude and significance between the PSVT:R correlations with each math measure: 30 % of the variation in the PSVT:R data is shared with the problem scores but nothing of significance is shared with the question scores. This difference indicates that the null hypothesis – there is no difference between weak and strong visualizers on either math measure – is false. The correlation values tell us that as PSVT:R scores increase, math problem scores also increase but math question scores remain constant.
Hence a grouping based on two different ranges of PSVT:R should reveal a difference in math problem scores but not question scores. This was tested by comparing the math scores of weak and strong visualizers using an independent samples t-test, which revealed a sizeable and significant difference between the math problem scores of the two groups but not between their math question scores (Table 5-6). Hence, the null hypothesis can be rejected and it can be said that there is a difference between weak and strong visualizers in problem solving ability but not in core competency skills.
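The shared-variance percentages quoted above follow directly from squaring each correlation coefficient, and the F statistic can be recovered from r and the sample size. The sketch below assumes the standard relationship F = r²(n − 2)/(1 − r²) with n = 115; the function names are hypothetical. Because the published r values are rounded to three decimals, the recomputed F values only approximate the reported ones.

```python
def shared_variance(r):
    """Proportion of variance shared by two measures (coefficient of determination)."""
    return r ** 2

def f_from_r(r, n):
    """F statistic for testing a Pearson r against zero, using n - 2 error df."""
    r2 = r ** 2
    return r2 * (n - 2) / (1 - r2)

n = 115  # combined sample size reported in the chapter
reported = [
    ("PSVT:R vs problems", 0.544),
    ("PSVT:R vs questions", 0.131),
    ("problems vs questions", 0.418),
]
for label, r in reported:
    print(f"{label}: r2 = {shared_variance(r):.3f}, F = {f_from_r(r, n):.2f}")
```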
               Weak                  Strong
Math measure   n    M     SD         n    M     SD      t-test   Sig. (2-tailed)   Cohen’s d (Size)
Problems       55   1.71  1.36       60   3.07  1.29    -5.507   .000              1.03 (Large)
Questions      55   5.49  1.48       60   5.15  1.26    1.335    .185              0.25 (Medium)
*. Correlation is significant at the 0.05 level (2-tailed).
**. Correlation is significant at the 0.01 level (2-tailed).
Table 5-6. Sample grouped by spatial ability level to compare means of math problems and questions.
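The t statistics and effect sizes in Table 5-6 can be approximately reproduced from the group sizes, means and standard deviations alone. The sketch below assumes a pooled-variance (equal variances assumed) Student t-test and a Cohen's d based on the pooled standard deviation; the chapter does not state which variant was used, so this is an assumption, and the function names are hypothetical. Small discrepancies from the tabled values are expected because the inputs are rounded.

```python
import math

def pooled_sd(n1, sd1, n2, sd2):
    """Pooled standard deviation of two independent groups."""
    return math.sqrt(((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / (n1 + n2 - 2))

def cohens_d(n1, m1, sd1, n2, m2, sd2):
    """Absolute standardized mean difference using the pooled SD."""
    return abs(m1 - m2) / pooled_sd(n1, sd1, n2, sd2)

def students_t(n1, m1, sd1, n2, m2, sd2):
    """Independent samples t statistic (group 1 minus group 2), equal variances assumed."""
    se = pooled_sd(n1, sd1, n2, sd2) * math.sqrt(1 / n1 + 1 / n2)
    return (m1 - m2) / se

# Summary statistics from Table 5-6: (n, M, SD) for weak then strong visualizers.
rows = {
    "Problems": ((55, 1.71, 1.36), (60, 3.07, 1.29)),
    "Questions": ((55, 5.49, 1.48), (60, 5.15, 1.26)),
}
for measure, (weak, strong) in rows.items():
    t = students_t(*weak, *strong)
    d = cohens_d(*weak, *strong)
    print(f"{measure}: t = {t:.3f}, d = {d:.2f}")
```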
Given the relative simplicity of these problems, this finding can be interpreted to mean that spatial ability plays an important role during problem representation but is not relevant for the problem solution phase.
Performance on each problem and question
Some further statistical analysis was conducted to examine how weak and strong visualizers performed on each individual problem and question. This was done by grouping the entire sample by correct or incorrect response to each problem and question and then calculating mean PSVT:R scores for each of these two groups as shown in Table 5-7 for the problems and Table 5-8 for the questions. These means were then compared using an independent samples t-test to measure the significance of any difference between them. A Cohen’s d effect size was also calculated to indicate the relative size of the difference.
Correct Incorrect
*. Correlation is significant at the 0.05 level (2-tailed).
**. Correlation is significant at the 0.01 level (2-tailed).
Table 5-7. Comparison of means of PSVT:R scores for both the DIT common 1st eng and OSU freshman eng groups for those correct and incorrect on each math problem (n=115)
For problems 1 to 6 (Table 5-7), the differences between the spatial test scores of the Correct and Incorrect groups are significant at the p < .01 level for four problems (Jug, Cans, Rain and Blood), significant at the p < .05 level for the Lawn problem and not significant at all for the Jars problem. In terms of effect size, the magnitudes of the differences in PSVT:R were found to be large for all but the Jars problem. For problems 1 to 6 there are at least 30 participants in each group (correct v incorrect). Problems 7 and 8 consist of fewer cases as these problems were not attempted by the DIT sample. It was decided that the membership of these categories is too small to draw meaningful conclusions from statistical results and they were excluded from further analysis in this report.
*. Correlation is significant at the 0.05 level (2-tailed).
**. Correlation is significant at the 0.01 level (2-tailed).
Table 5-8. Comparison of means of PSVT:R scores for both the DIT common 1st eng and OSU freshman eng groups for those correct and incorrect on each math question (n=115)
In contrast, when the same comparison was made between the Correct and Incorrect groups for the core competency questions (Table 5-8), none of the differences in PSVT:R were significant at p < .01, three were significant at p < .05 – questions 1, 2 and 5 – and three were not significant at all. As measured by Cohen’s d, moderate to large and significant differences were observed for 2 of these questions. It was interesting to observe some significant differences in spatial ability based on these simple tests of math competencies as there was no significant correlation between questions as a set and spatial ability. This raised the possibility that some participants failed to solve the problems because they lacked the required core competency. To eliminate core competency as a potentially confounding variable, the sample was restricted to those who correctly answered the corresponding core competency question when analysing the data for each problem. This repeated analysis is presented in Table 5-9 where it can be seen that the reduction in the number of cases was small because a majority of participants correctly answered the core competency questions.
Correct Incorrect
*. Correlation is significant at the 0.05 level (2-tailed).
**. Correlation is significant at the 0.01 level (2-tailed).
Table 5-9. Comparison of means of PSVT:R scores for those correct and incorrect on each math problem with cases excluded if the answer to the question corresponding to the problem is incorrect (n varies by problem).
After limiting the sample in this way, the pattern observed in Table 5-7 is repeated in Table 5-9. Membership of each group is still reasonably large, the smallest containing 22 participants. As before, there is a highly significant (p < .01) difference with a large effect size (d ≥ .75) in the spatial ability levels between correct and incorrect problem solvers on the Jug, Cans, Rain and Blood problems, a significant (p < .05) and moderate (d = .52) difference on the Lawn Problem and no difference on the Jars Problem. Again, the Jars and Pencils problem (no. 5) is the exception to the rule – there is no difference in spatial ability levels of those who got this problem correct and those who didn’t.
In a case like this where one variable is binary or dichotomous and the other continuous, an alternative is to calculate a point bi-serial correlation by grouping the sample as above and comparing the PSVT:R scores of the two groups. These results, presented in Table 5-10, support the same findings as above: correlations are medium to large and significant at p < .01 for Problems 2, 3, 4 and 6, small for the Lawn problem while the Pencils & Jars problem is the exception with no significant correlation.
Problem 1 2 3 4 5 6
PSVT:R .203* .414** .362** .336** -.072 .395**
*. Correlation is significant at the 0.05 level (2-tailed).
**. Correlation is significant at the 0.01 level (2-tailed).
Table 5-10. Point bi-serial correlation between PSVT:R and each math problem (n = 115)
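The point-biserial coefficient described above can be computed from the two group means and is numerically identical to a Pearson correlation between the 0/1 problem score and the continuous spatial score. The sketch below demonstrates this equivalence; the function names and the eight example scores are hypothetical and are not study data.

```python
import math

def pearson_r(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def point_biserial(correct, scores):
    """r_pb = (M1 - M0) / s * sqrt(p * q), with s the population SD of all scores."""
    n = len(scores)
    group1 = [s for c, s in zip(correct, scores) if c == 1]  # solved the problem
    group0 = [s for c, s in zip(correct, scores) if c == 0]  # did not solve it
    p = len(group1) / n
    q = 1 - p
    mean_all = sum(scores) / n
    sd_all = math.sqrt(sum((s - mean_all) ** 2 for s in scores) / n)
    m1 = sum(group1) / len(group1)
    m0 = sum(group0) / len(group0)
    return (m1 - m0) / sd_all * math.sqrt(p * q)

# Hypothetical data: 1 = correct on a problem, paired with a spatial test score.
correct = [1, 0, 1, 1, 0, 1, 0, 0]
psvt = [24, 15, 27, 20, 18, 22, 12, 19]
print(round(point_biserial(correct, psvt), 3))
print(round(pearson_r(correct, psvt), 3))  # same value as the line above
```

The sign of the coefficient follows the coding: a positive value means the correct group has the higher mean spatial score.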
The correlation between the spatial test and math problem scores can also be presented in a categorical way by grouping the sample based on the number of correct problems scored and then computing the mean PSVT:R score for each of these groups. As shown in Table 5-11 and Figure 5-3, weak visualizers are overrepresented in the low scoring groups and underrepresented in the high scoring groups.
PSVT:R
Table 5-11. Sample grouped by success rate on problems and by spatial ability.
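A categorical breakdown of this kind can be sketched as a simple cross-tabulation. The example below classifies participants as weak visualizers when they score 18 or lower on the PSVT:R (the cutoff given in the chapter's conclusions) and reports, for each problem score, the percentage of weak and strong visualizers. The function names and the records are hypothetical and are not study data.

```python
from collections import Counter

WEAK_CUTOFF = 18  # PSVT:R score of 18 or lower counts as a weak visualizer

def visualizer_group(psvt_score):
    """Classify a participant by spatial test score."""
    return "weak" if psvt_score <= WEAK_CUTOFF else "strong"

def percent_by_problem_score(records):
    """records: list of (psvt_score, problems_correct) pairs.

    Returns {problems_correct: {"weak": pct, "strong": pct}}, where the
    percentages are taken within each problem score and sum to 100.
    """
    counts = Counter((problems, visualizer_group(psvt)) for psvt, problems in records)
    table = {}
    for problems in sorted({p for _, p in records}):
        weak = counts[(problems, "weak")]
        strong = counts[(problems, "strong")]
        total = weak + strong
        table[problems] = {"weak": 100 * weak / total, "strong": 100 * strong / total}
    return table

# Hypothetical records: (PSVT:R score, number of problems solved).
records = [(12, 0), (25, 0), (15, 0), (28, 3), (17, 3), (26, 3), (29, 3)]
for score, pct in percent_by_problem_score(records).items():
    print(score, pct)
```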
The problems the participants were asked to solve are short, simple algebra word problems that require basic arithmetic and either a basic core math competency or prior knowledge of an equation such as cylinder volume. Problems of this nature are typically found in a US middle school or the Irish junior certificate curriculum. In order to solve the problem, a participant is required to read and interpret the problem statement from which a strategy or approach is determined before applying basic arithmetic, core competency and/or prior knowledge. The findings from this analysis show the entire problem solving process shares much in common with the ability to score well on a totally different test, the PSVT:R, that measures a cognitive factor called spatial visualization.
Figure 5-3. Number of weak and strong visualizers grouped by score on the math problems and expressed as percentage of total for each problem score.
The absence of a correlation between the PSVT:R and the competency questions can be interpreted in two ways – there is no relationship between the two measures or there is a relationship but it is hidden by the homogeneity of the question data set. No relationship implies the cognitive activities required by each group are very different and do not overlap.
This is plausible if these competencies have been practiced so many times that they can be applied in a reflexive manner with little thought so one is not required to visualize or hold things in working memory while transforming them. The lack of variation in the competency data set itself (most participants were correct on most of the questions) could result in no correlation when there potentially is a relationship but it’s just not revealed by these particular competency questions. Despite this homogeneity, a significant but small correlation did emerge with the problem solving scores. The two math measures are certainly related – one can’t solve the problem without having the core competency – and are strongly enough related for a correlation to be observed even with one data set being highly skewed.
Therefore, if a relationship exists between competency questions and PSVT:R, it is not strong enough to emerge in this case and it seems more logical to conclude the two measures share little in common at a cognitive level. As discussed in the literature review, not all measures of mathematics correlate with spatial ability.
These findings reveal a correlation between spatial ability and problem solving and a significant difference in the problem solving abilities of weak and strong visualizers. This difference is not explained by the problem solution step as there is no variation in core competency with spatial ability. When cases are limited to those who correctly answered the competency questions, the findings still hold. Therefore, the null hypothesis is falsified on the basis of these results – it has been shown there is a difference between weak and strong visualizers in the way they solve problems. If the null hypothesis is false, does this mean the hypothesis is true, that strong visualizers have superior skills in representing problems? Not necessarily. In fact, all that has been shown is that when this sample is grouped by spatial ability scores a difference was observed in the math problem scores of the two groups. Why there is a difference remains a matter of conjecture and is the point addressed in the next round of analysis, a qualitative examination of the participants’ solutions to identify how participants were reading and interpreting the problems, what actions they performed in solving them, what mistakes they made and what schema were evident in their solution.
Repeatability of the findings
It is quite possible these findings may not hold for other samples of first year engineering students. This could occur, for example, if the problem solving data collected from another sample were highly skewed, either negatively (sample consists of very good problem solvers) or positively (bad problem solvers). In such a case, a correlation with the spatial test might not emerge and very different conclusions would be made. This occurred in one multi-sample study in which a significant relationship between spatial ability and math was observed for some samples but not for others, i.e. it varied with the sample (Casey et al., 1995). Hence, there is a real risk that what has been observed here might not be repeatable in other samples and contexts.
In order to address the issue of repeatability, findings from an additional two samples are presented here and compared with the existing results. However, some interpretation is required as there was variation in the measurement instruments used. An additional sample from DIT consisted of 23 second year engineering technology students who were administered the MCT and the 6 math problems and 6 math questions. Another sample from OSU consisted of the majority of first year engineering students who completed the PSVT:R and a Math Placement Test (MPT) during their orientation programme in the summer of 2016. The MPT consists of 25 questions, many of which are procedural in nature but several questions require some form of problem solving. One question on the test was in word algebra format. An example of this test is provided in Appendix D. Of the 25 questions, five were categorised as problems and 6 as core competency questions. The remaining 14 did not fit into either category. Correlations between PSVT:R and all 25 questions on the MPT, the five problems and the 6 questions were then calculated. The findings from these additional samples are shown along with the existing data in Table 5-12.
Sample Spatial Test
*. Correlation is significant at the 0.05 level (2-tailed).
**. Correlation is significant at the 0.01 level (2-tailed).
1. An additional two math problems were administered to this sample but this correlation was calculated using data from the set of 6 problems that were common to both samples. The correlation between the PSVT:R and all 8 problems was measured to be .387**, i.e. a very similar result.
Table 5-12. Summary of spatial-math correlations for all samples used in this study.
One interesting finding did emerge from the MPT data. When each question was analysed by comparing the PSVT:R scores of those who were correct and incorrect, the largest effect size belonged to the word algebra problem on the MPT (d = .41, p < .001) and effect sizes were found to be negligible for the procedural questions.
The finding taken from the DIT and OSU 1st year engineering data – spatial ability correlates with problem solving but not core competencies – was replicated using data from another DIT sample. A very similar observation was made using data from a large sample of OSU 1st year engineering students, which showed a sizeable and significant difference in spatial ability for those able to solve a word algebra problem versus those who could not. While repeatability is never guaranteed, some evidence exists to show these findings are not isolated but are likely to be observed in other samples.
Conclusions
Beginning with a broad and somewhat confused definition of engineering problem solving, the initial phase of this project was successful in producing a measurement of problem solving that co-varied to a large and significant extent with the measures of spatial ability selected for the study. This success was both notable and encouraging as not all measures of mathematical ability correlate with spatial ability tests. The set of math problems produced in the pilot study presented an opportunity to collect valuable data that could be used to examine the relationship between math problem solving and spatial visualization. Given the word algebra nature of the problems it also presented an opportunity to add to the existing literature on approaches to solving such problems.
The two samples of students who participated in this study came from first year engineering programmes in different locations, Dublin Institute of Technology and Ohio State University.
Although the percentage of weak visualizers, those scoring 18 or lower on the PSVT:R, in such samples is normally around 20 %, the focused recruitment of weak visualizers in the OSU sample resulted in a combined sample that contained 55 weak and 60 strong visualizers. This was important from a statistical point of view because when the sample was divided in various ways during the analysis the membership of the resulting groups was, in most cases, large enough to allow statistical tests of significance to be performed.
Having administered the PSVT:R, the set of math problems and the core competency questions to the full sample (n = 115), it was found that both the PSVT:R and the math problem scores were normally distributed. Most participants achieved high scores in the set of competency questions which led to this data set being highly skewed to the upper end of the scale. The PSVT:R was found to be significantly correlated with scores on the math problems, r = .544, p < .001. The magnitude of this correlation indicates a sizeable amount of variation, 30 %, is shared between the two measures or, in other words, 30 % of the variation in the math scores is explained by the spatial measure. A significance of p < .001 indicates that this result is highly unlikely to have occurred by chance and suggests it would be observed again if the study were repeated with this sample. Scores on the core competency questions were found to be significantly correlated with the math problems, r = .289, p < .01, but the magnitude of this correlation is small with only 8 % of variation shared. Math competencies are needed to solve the problems but the core competency data set is too skewed for a large correlation to emerge. No significant correlation between the math questions and the PSVT:R was observed and it is concluded that these two measures share little in common at a cognitive level. These findings were replicated in another sample of engineering technology students from DIT.
Analysis of a large Math Placement Test data set collected from 1053 1st year engineering students at OSU revealed that correlations with the PSVT:R were very small for questions that were highly procedural and larger for questions that had an element of problem solving. The largest correlation was found for the one question on the MPT that was presented as a word algebra problem. Hence, it is concluded that the results presented in this study are likely to be replicated if the same measures are administered to other samples of engineering students in other contexts and settings.
Strong visualizers outperformed weak visualizers in solving all but one of the math problems, the exception being the Pencils and Jars problem. When grouped by correct and incorrect for each problem, large and highly significant differences in spatial ability were found for the other