The second phase of analysis involved delineating the evidence rules that connect the process data and student responses to the LPs defined in Phase I and applying those rules to the actual and simulated datasets. This phase proceeded in five main parts: creating initial evidence rules, checking the initial evidence rules through a process of line-by-line coding for a set of select cases, revising evidence rules based on the results of coding the select cases, applying final evidence rules to the full dataset, and simulating a larger dataset using the defined evidence and actual data. One of the documented difficulties of analyzing process data is that it is open to multiple methods of analysis and interpretation (e.g., Gobert et al., 2013). This phase of analysis acknowledges this difficulty through its iterative creating, checking, and revising of the evidence rules. The final products of this analysis are the actual and simulated datasets that contain coded evidence about students’ content learning and practices.

Identifying initial evidence rules. The first part of this phase of analysis identified types of evidence. For student responses to the reflection questions and embedded table, scoring considerations in terms of the desired evidence were specified. The open-ended responses in the embedded table and reflection questions allowed for a wide range of responses that needed to be accounted for in the evidence rules. Boundaries of the acceptable types of answers needed to be established, and the different ways students could respond needed to be acknowledged. For the process data, evidence available from the logged actions was specified. For instance, there are several different ways students could use a simulation to understand states of matter. These range from the least complex (clicking between phases without running the simulation, or simply running the simulation through one phase transition) to a more sophisticated usage (viewing each of the phases as well as their transitions). In this example, viewing the transitions between phases is considered important for students to understand what is happening at the melting and boiling points, which counteracts the misconception that substances change phase all at once. Some of the specifications of the evidence rules, such as the 10 seconds of viewing states, come from the design of the simulation, while others, such as the pausing between states, come from earlier cluster analysis of the process data, which indicated the importance of pausing in students’ conceptual understanding of the simulation (Toutkoushian & Ryoo, 2019).
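To make the idea of an evidence rule for process data concrete, the following is a minimal sketch that expresses a few rules as functions over a logged session. It is written in Python for illustration only; the log format, the action labels ("run", "pause", "reset"), and the rule names are assumptions and do not reproduce the study's actual rules or log schema. Only the 10-second threshold is taken from the description above.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class Event:
    t: float      # seconds since the session started (assumed log field)
    action: str   # hypothetical action label: "run", "pause", or "reset"


def ran_simulation(log: List[Event]) -> bool:
    """Observed if the simulation was run at least once."""
    return any(e.action == "run" for e in log)


def ran_uninterrupted(log: List[Event], min_seconds: float = 10.0) -> bool:
    """Observed if some run lasted at least min_seconds before the next action."""
    return any(
        cur.action == "run" and nxt.t - cur.t >= min_seconds
        for cur, nxt in zip(log, log[1:])
    )


def paused_while_running(log: List[Event]) -> bool:
    """Observed if a pause occurred while the simulation was running."""
    running = False
    for e in log:
        if e.action == "run":
            running = True
        elif e.action == "pause":
            if running:
                return True
        elif e.action == "reset":
            running = False
    return False


# Hypothetical rule names mapped to the functions that check them.
EVIDENCE_RULES: Dict[str, Callable[[List[Event]], bool]] = {
    "ran_simulation": ran_simulation,
    "ran_uninterrupted_10s": ran_uninterrupted,
    "paused_while_running": paused_while_running,
}


def score_log(log: List[Event]) -> Dict[str, int]:
    """Score every rule as 1 (observed) or 0 (not observed)."""
    return {name: int(rule(log)) for name, rule in EVIDENCE_RULES.items()}
```

Writing each rule as a separate function keeps its description and its scoring logic together, which makes the rules easier to revise after the line-by-line check described next.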

Checking initial evidence rules. Once the initial evidence rules for the process data were established for all of the LPs, ten cases were chosen to undergo a deeper, qualitative investigation into whether the initial codes were a comprehensive representation of the possible activities students can engage in while using the simulation. In order to choose cases that cover a range of student activities, two cases were chosen from each of five clusters that were defined in an earlier study using this data (Toutkoushian & Ryoo, 2019). The process data were coded line-by-line using the initial evidence rules and checked to determine whether any actions were not being captured by the initial codes or whether any revisions should be made to the descriptions of the initial rules. This type of coding is similar to a coding technique used in EDM, called text replay, where the codes are developed by going through cases line-by-line and re-creating the actions of the students using codes (e.g., Sao Pedro, Baker, & Gobert, 2013). Although text replay generally involves choosing small subsets of actions within a log to code, this study looked at the full log of actions due to the innate dependencies within the logs that needed to be inferred. For instance, in the POM simulation, the state of matter at the time of an action was not recorded, meaning that coding of the actions needed to track the state of the simulation by calculating the times between actions. The student responses on the embedded table and reflection questions were also scored using initial scoring rules and checked to ensure that the rules captured the range of possible student responses and were nuanced enough to uncover differences. As an additional coding check, a second rater coded 20% of the data from each of the three sources. The second rater’s coding was compared with the original coding, and any discrepancies were discussed and addressed to reach 100% agreement. The initial evidence rules for these data were revised based on these findings.
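The dependency described above, where the state of matter must be inferred from the times between actions, can be sketched as follows. This is an illustrative Python sketch under strong assumptions: it supposes that the substance heats at a constant rate while the simulation runs, that a reset returns it to the solid state, and that the state changes at fixed amounts of accumulated running time. The boundary values (10 and 20 seconds) and the action labels are hypothetical placeholders, not values from the POM simulation.

```python
from typing import List, Sequence, Tuple

# Hypothetical upper bounds (seconds of accumulated run time) for each state.
BOUNDARIES: Sequence[Tuple[float, str]] = ((10.0, "solid"), (20.0, "liquid"))
FINAL_STATE = "gas"


def state_for(elapsed: float) -> str:
    """Map accumulated running time to an assumed state of matter."""
    for upper, name in BOUNDARIES:
        if elapsed < upper:
            return name
    return FINAL_STATE


def annotate_states(log: List[Tuple[float, str]]) -> List[Tuple[float, str, str]]:
    """Attach the inferred state at the moment of each (time, action) entry."""
    elapsed = 0.0      # running time accumulated since the last reset
    running = False
    last_t = None
    annotated = []
    for t, action in log:
        # Time only counts toward heating while the simulation is running.
        if running and last_t is not None:
            elapsed += t - last_t
        annotated.append((t, action, state_for(elapsed)))
        if action == "run":
            running = True
        elif action == "pause":
            running = False
        elif action == "reset":
            running = False
            elapsed = 0.0
        last_t = t
    return annotated
```

Because each inferred state depends on every preceding action, a sketch like this only works on the full log, which is consistent with the decision above to code complete logs rather than small subsets of actions.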

Revising and applying evidence rules to full dataset. The final step of the analysis in Phase II involved updating the evidence rules to reflect the conclusions derived from investigating the cases and applying the rules to the full dataset, as well as a simulated dataset. The final codes were applied to every pair using Excel to clean the initial data and R to parse the lines of code and score for the specific rules. All of the pieces of evidence were scored dichotomously (i.e., observed/not observed). Descriptive statistics for each of the pieces of evidence were generated to illustrate the coverage of the different rules and ensure that the rules were applied appropriately.
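As an illustration of the descriptive check on dichotomous scores, the sketch below computes, for each piece of evidence, how many pairs showed it and in what proportion. It is a Python sketch for exposition (the study used R), and the rule names and the input format (one dictionary of 0/1 scores per pair) are assumptions.

```python
from typing import Dict, List


def coverage(scores: List[Dict[str, int]]) -> Dict[str, Dict[str, float]]:
    """Count and proportion of pairs for which each piece of evidence was observed."""
    if not scores:
        raise ValueError("scores must contain at least one scored pair")
    n = len(scores)
    rules = sorted({rule for row in scores for rule in row})
    summary = {}
    for rule in rules:
        observed = sum(row.get(rule, 0) for row in scores)
        summary[rule] = {"n_observed": observed, "proportion": observed / n}
    return summary


def degenerate_rules(scores: List[Dict[str, int]]) -> List[str]:
    """Rules observed for no pair or for every pair, which may merit a second look."""
    return [
        rule
        for rule, stats in coverage(scores).items()
        if stats["proportion"] in (0.0, 1.0)
    ]
```

A rule that is never or always observed is not necessarily wrong, but flagging it is one simple way to check whether a rule is being applied as intended.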

Generating simulated dataset. The last step of this phase resulted in the creation of the simulated dataset. The purpose of generating a simulated dataset was to test how the models perform in Phase III with a larger, more idealized sample. Therefore, the simulated dataset was designed to have a large sample size (n = 1,000), while also maintaining the complexity of the actual dataset. The data from the POM and CR simulations in this study contain multiple data sources and implicit relationships between the different sources of evidence that needed to be captured in the simulated dataset. In order to capture the relationships between the pieces of evidence, the simulated dataset was generated through a series of steps that considered the design of the simulation and relationships that could be determined from the actual dataset. Using the identified evidence from this phase, the first step of generating the simulated data was to map out the relationship rules among the pieces of evidence based on the design and desired use of the simulation. For instance, students manipulated the POM simulation to observe changes in molecular movement at the different states of matter, record their observations in the embedded table, and then answer questions about molecular movement in the reflection. From this example it can be seen that there would be relationships among observing a state of matter, being able to correctly fill in the table regarding that state of matter, and correctly answering reflection questions about that state of matter. The relationship rules allowed for multiple data sources to be related, so a student who observed the liquid state and correctly filled in the table about molecular movement for liquids could be more likely to answer reflection questions about molecular movement correctly than a student who did not view the liquid state or did not collect correct data about molecular movement for liquids.

Once the relationships among the pieces of evidence were mapped out, the actual dataset was used to determine the probabilities associated with the relationships. In most cases, the percentage of students who corresponded with each relationship was used as the probability and recorded in the results. Finally, the data were simulated using code written in R to define the relationships and apply a binomial distribution that associated the different probabilities with the pieces of evidence. As with the actual data, the simulated dataset was scored dichotomously. Descriptive statistics for the simulated dataset were also calculated and compared to the actual dataset.
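The simulation procedure described above can be sketched as a chain of conditional Bernoulli draws (binomial draws with a single trial), in which the probability for each piece of evidence depends on the pieces that precede it. The sketch below uses Python rather than the R code used in the study, and covers only the liquid-state example. All probability values and variable names are hypothetical placeholders and are not the probabilities estimated from the actual dataset.

```python
import random
from typing import Dict, List

# Hypothetical probabilities (placeholders, not estimates from the study).
P_VIEW_LIQUID = 0.70
P_TABLE_GIVEN_VIEW = {1: 0.75, 0: 0.20}   # P(table correct | viewed liquid)
P_REFLECTION = {                           # P(reflection correct | viewed, table)
    (1, 1): 0.80,
    (1, 0): 0.40,
    (0, 1): 0.40,
    (0, 0): 0.15,
}


def bernoulli(rng: random.Random, p: float) -> int:
    """Single binomial trial: 1 with probability p, otherwise 0."""
    return int(rng.random() < p)


def simulate_pairs(n: int = 1000, seed: int = 0) -> List[Dict[str, int]]:
    """Simulate n dichotomously scored cases that respect the relationship rules."""
    rng = random.Random(seed)   # fixed seed so the dataset can be regenerated
    rows = []
    for _ in range(n):
        viewed = bernoulli(rng, P_VIEW_LIQUID)
        table = bernoulli(rng, P_TABLE_GIVEN_VIEW[viewed])
        reflection = bernoulli(rng, P_REFLECTION[(viewed, table)])
        rows.append(
            {
                "viewed_liquid": viewed,
                "table_liquid_correct": table,
                "reflection_liquid_correct": reflection,
            }
        )
    return rows
```

Drawing each piece of evidence conditionally on the earlier ones is what preserves the relationships among data sources; drawing each column independently would match the marginal percentages but lose those dependencies.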

Phase III. The third phase of analysis identified and applied possible models from two
