3.3 Material and Method
3.3.4 Coding speech
To assess the quantity and quality of self-directed speech, all children were videotaped during the ToL task. Afterwards, the utterances that children produced during baseline 1, baseline 2, and the triggering condition were transcribed and coded using the software Videograph (2016). The coding process followed quantitative content analysis (Berelson, 1952) as an event-sampling procedure. An utterance was defined as a complete sentence, a sentence fragment, or any string of speech that is temporally separated from another (e.g. by a pause of 2 s) or semantically discontinuous (e.g. a change of content) (Winsler, Fernyhough, McClaren, & Way, 2005). In a first step, we aimed to identify self-directed speech. To this end, we followed the procedure of assigning each utterance to a subordinate category (Abdul Aziz et al., 2016a; Fernyhough & Fradley, 2005; Fernyhough & Russell, 1997; Sturn & Johnston, 1999), either social or private speech. Private speech was defined as a child's speech directed to the self that is not explicitly addressed to another person and that does not meet the criteria for social speech (Diaz, Winsler, Atencio, & Harbers, 1992). An utterance was coded as social when the child interacted with the experimenter through any one or a combination of the following elements: (1) eye contact: eye contact with the examiner during or within 2 s of an utterance; (2) behavioral: the child's behavior involved the experimenter through physical contact, gaze direction, or extension of arms; (3) content markers: the utterance had the same topic as the examiner's preceding utterance; or (4) temporal contiguity: the utterance occurred less than 2 s after any social utterance. Utterances coded as social speech were not included in further analyses.
Utterances were coded as private speech when the conditions for social speech were not met. For speech during our triggering condition, we defined a third category, called triggered speech. The utterances were coded as triggered only in the triggering condition and when the utterances did not meet the criteria for social speech. See Appendix A for the coding scheme.
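The decision rule described above can be illustrated with a minimal Python sketch. It is not the coding procedure as carried out in Videograph; the class `Utterance`, its field names, and the function `code_addressee` are hypothetical, and the sketch assumes that temporal contiguity is measured from the end of the preceding social utterance to the onset of the current one.

```python
from dataclasses import dataclass
from typing import List

SOCIAL_WINDOW_S = 2.0  # temporal contiguity window from the coding scheme


@dataclass
class Utterance:
    # All field names are hypothetical and chosen for illustration only.
    onset_s: float
    offset_s: float
    condition: str                        # "baseline1", "baseline2" or "triggering"
    eye_contact: bool = False             # criterion (1)
    involves_experimenter: bool = False   # criterion (2)
    same_topic_as_examiner: bool = False  # criterion (3)


def code_addressee(utterances: List[Utterance]) -> List[str]:
    """Label each utterance as 'social', 'private' or 'triggered'.

    Assumes the utterances are sorted by onset time.
    """
    labels = []
    last_social_offset = None
    for u in utterances:
        # Criterion (4): less than 2 s after any social utterance
        # (assumption: measured from the end of that utterance).
        contiguous = (
            last_social_offset is not None
            and u.onset_s - last_social_offset < SOCIAL_WINDOW_S
        )
        if (u.eye_contact or u.involves_experimenter
                or u.same_topic_as_examiner or contiguous):
            labels.append("social")
            last_social_offset = u.offset_s
        elif u.condition == "triggering":
            labels.append("triggered")
        else:
            labels.append("private")
    return labels
```

In this sketch an utterance that is social only by temporal contiguity also restarts the 2 s window, which is one possible reading of "any social utterance".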
3.3.4.1 Quantity
As a measure of quantity, the mean number of private and triggered speech utterances per condition was calculated (Winsler et al., 2005).
3.3.4.2 Quality: Internalization
Each private speech utterance was coded in terms of its level of internalization. We followed the five-level internalization system of Lidstone et al. (2012): Level 1: fully overt speech; Level 2: intelligible muttering; Level 3: intelligible whispering or unintelligible muttering; Level 4: audible but unintelligible whispering; and Level 5: inaudible and barely audible verbal lip movements. The internalization score was calculated as the mean of all levels that occurred during a condition. The internalization score per condition ranged from 1.0 to 5.0; the higher the score, the more internalized the private speech.
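As an illustration of the internalization score, the following minimal sketch computes the mean level over all private speech utterances of one condition. The function name `internalization_score` is hypothetical, and returning `None` for a condition without private speech is an assumption, since the text does not specify how such cases were handled.

```python
from statistics import mean
from typing import List, Optional


def internalization_score(levels: List[int]) -> Optional[float]:
    """Mean internalization level (1-5) of the utterances in one condition."""
    if not levels:
        # Assumption: no private speech in the condition means no score.
        return None
    if any(level not in (1, 2, 3, 4, 5) for level in levels):
        raise ValueError("internalization levels must be integers from 1 to 5")
    return mean(levels)
```

For example, a condition with utterances coded at levels 1, 1, 3, and 5 yields a score of 2.5.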
3.3.4.3 Quality: Spatial language
Each private and triggered speech utterance was coded for whether it contained spatial words, based on the System for Analyzing Children's Language about Space by Cannon, Levine, & Huttenlocher (2007). Utterances were coded as spatial language when at least one word within the utterance could be assigned to one of the following spatial categories: Shape Terms, Spatial Dimensions, and Locations and Directions. Shape Terms were defined as words describing mathematical names of objects, e.g. disc, peg. Spatial Dimensions were defined as words describing the sizes of objects, e.g. big, short. Locations and Directions were defined as words describing the relative position, orientation, or transformation of objects, e.g. at, from, on, over, under, left, right. Spatial language was rated on a two-point scale (0 = utterance containing no spatial word according to category; 1 = utterance containing a minimum of one spatial word according to category).
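The binary spatial language rating can be sketched as a simple word lookup. The word lists below contain only the examples named in this section and are not the full category lists of the underlying coding system; the names `SPATIAL_WORDS` and `code_spatial_language` are hypothetical.

```python
import re

# Illustrative word lists: only the examples given in the text,
# not the complete categories of the coding system.
SPATIAL_WORDS = {
    "shape_terms": {"disc", "peg"},
    "spatial_dimensions": {"big", "short"},
    "locations_directions": {"at", "from", "on", "over", "under", "left", "right"},
}


def code_spatial_language(utterance: str) -> int:
    """Return 1 if the utterance contains at least one spatial word, else 0."""
    tokens = set(re.findall(r"[a-z]+", utterance.lower()))
    return int(any(tokens & words for words in SPATIAL_WORDS.values()))
```

Because the lookup matches exact word forms, inflected forms such as plurals would need to be added to the lists or handled by a human coder, which is one reason a sketch like this cannot replace manual coding.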
3.3.4.4 Quality: Planning function
Each private and triggered speech utterance was coded for whether it contained statements with a planning function, using criteria modified from the coding scheme of Feigenbaum (1992). Utterances were coded as planning function by applying the following criteria: formulating goal statements, e.g. "I have to move the yellow disc on the short peg" (with the requirement that the child said it before moving any discs); defining discrepancies, e.g. "I can't move the disc, because there is the blue one" (differences between the problem and the goal); hypothetical reasoning, e.g. "I do it before" (future-oriented statements, if-then constructions); beginning a new plan (returns to start state with a different exploration from previous ones, with the requirement that the child said it before moving any discs); questions to self, e.g. "How can I move it?"; remembering, e.g. "At this peg only two discs are allowed" (goals, rules, or solutions); evaluating, e.g. "I make it" (statements about the child's performance or motivation); and non-words related to planning processes, e.g. "mh" (while focusing on the task). Utterances were coded as non-planning function by applying the following criteria: I-don't-know statements; exclamations, e.g. "oops" (typically one-word expressions of affect); non-words (wordplay, humming, sound effects); evaluation that is unrelated to planning; task irrelevance (utterances not related to the ToL); or speech related to the in-process moving of a disc. Planning function was rated on a two-point scale (0 = utterance containing no planning-related statements according to criteria; 1 = utterance containing function-related statements according to criteria).
3.3.4.5 Quality: Grammatical completeness
Each private and triggered speech utterance was categorized as being either fragmented or complete (Winsler et al., 2005; Winsler, 1998; Winsler & Naglieri, 2003). Grammatical completeness was rated on a two-point scale (1 = fragmented, 2 = complete, i.e. utterances containing a subject and a predicate (and object), and also one-word questions, answers, and imperatives to the self).
3.3.4.6 Interrater reliability
Two coders, naive to the hypotheses and to group, coded the video recordings for addressee (social vs. private speech), internalization level, spatial language, planning function, and grammatical completeness. The second coder independently coded 20% of the video recordings to calculate the interrater reliability. Cohen's kappa was computed for all variables with the exception of the internalization level, for which the intraclass correlation (ICC) was calculated. The interrater reliability for addressee was κ = .913; for the presence/absence of spatial language in private/triggered speech trials, κ = .963/.939; for the presence/absence of planning function in private/triggered speech trials, κ = .964/.906; and for grammatical completeness in private/triggered speech trials, κ = .912/.895. The average measure ICC for internalization level was .952 with a 95% confidence interval from .952 to .969 (F(80,80) = 20.53, p = .0001***).
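For readers unfamiliar with the statistic, Cohen's kappa corrects the observed agreement between two coders for the agreement expected by chance from their marginal category frequencies. The following minimal sketch shows the standard computation for two lists of categorical codes; the function name `cohens_kappa` and the example codes are hypothetical and are not data from this study, and the sketch does not reproduce the software used for the reported values.

```python
from collections import Counter
from typing import Hashable, Sequence


def cohens_kappa(coder_a: Sequence[Hashable], coder_b: Sequence[Hashable]) -> float:
    """Cohen's kappa for two coders rating the same utterances."""
    if len(coder_a) != len(coder_b) or not coder_a:
        raise ValueError("both coders must rate the same non-empty set of items")
    n = len(coder_a)
    # Observed proportion of agreement.
    observed = sum(a == b for a, b in zip(coder_a, coder_b)) / n
    # Chance agreement from the marginal frequencies of each category.
    freq_a, freq_b = Counter(coder_a), Counter(coder_b)
    categories = set(freq_a) | set(freq_b)
    expected = sum(freq_a[c] * freq_b[c] for c in categories) / n ** 2
    if expected == 1.0:
        raise ValueError("kappa is undefined when both coders use a single category")
    return (observed - expected) / (1 - expected)


# Made-up example codes (0 = absent, 1 = present), not study data.
codes_first = [1, 1, 0, 0, 1, 0, 1, 1, 0, 0]
codes_second = [1, 1, 0, 0, 1, 0, 1, 0, 0, 0]
kappa = cohens_kappa(codes_first, codes_second)
```

In the made-up example the coders agree on 9 of 10 utterances (observed agreement .90) with a chance agreement of .50, giving a kappa of .80.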