Binomial logistic regression analyses and mixed-effect generalized linear regression models were used for all analyses. Mixed-effect generalized linear regression models with logistic link functions were used to answer questions 1A, 1B, 1C, 2B, and 2C. Binomial logistic regression models were used to answer question 2A. These analyses were used because they have several advantages over analyses of variance. First, they are useful for explaining the effects of predictor variables on a binary (categorical) response variable, including count data and binary response data such as “correct” or “incorrect” responses on test items (Barr, 2008; Jaeger, 2008). Second, they allow predictor variables related to time, such as age, to be treated as continuous variables (Barr, 2008). Finally, mixed-effect modeling can be used to account for random effects of participants in analyses with repeated measures to improve the accuracy of inferences (Baayen, Davidson, & Bates, 2008).
These models use the cumulative distribution function of the logistic distribution to estimate the probability that a binary response variable will take a positive value as a function of a set of predictor variables (Myers, Montgomery, Vining, & Robinson, 2010). For analyses of individual tense marker stimulability (Question 2A), models estimated the probability that a given child would be stimulable for each tense marker. For all other analyses, models estimated the probability that a given child would give a positive response for any given observation and treated each tense marker, productivity point, or stimulability probe item as a separate observation. Predicted probabilities of a positive result follow a sigmoid function bounded between 0 and 1. Predicted probabilities asymptotically approach 0 as the probability of a positive result decreases and asymptotically approach 1 as the probability of a positive response increases. The odds of a positive response for an observation are the ratio of the probability of a positive response to the probability of a negative response for that observation. The odds of a positive response are multiplicative, unbounded, and vary as a function of the predictor variables.
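The relationship between the linear predictor, the predicted probability, and the odds can be illustrated with a minimal Python sketch. The coefficient values and the names `intercept` and `slope` are hypothetical and chosen only for illustration; they are not estimates from this study.

```python
import math

def logistic(eta: float) -> float:
    """Cumulative distribution function of the standard logistic distribution."""
    return 1.0 / (1.0 + math.exp(-eta))

def odds(p: float) -> float:
    """Odds of a positive response: P(positive) / P(negative)."""
    return p / (1.0 - p)

# Hypothetical coefficients, for illustration only.
intercept = -1.0  # log-odds when the predictor equals 0
slope = 0.1       # change in log-odds per one-unit increase in the predictor

for x in (0, 10, 20):
    p = logistic(intercept + slope * x)
    print(f"x={x:2d}  probability={p:.3f}  odds={odds(p):.3f}")

# A one-unit increase multiplies the odds by exp(slope), whatever the starting x.
ratio = odds(logistic(intercept + slope * 11)) / odds(logistic(intercept + slope * 10))
print(f"odds ratio per unit: {ratio:.4f}  exp(slope): {math.exp(slope):.4f}")
```

The probabilities stay between 0 and 1 while the odds are unbounded above, which is why effects are summarized on the odds scale.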
Prior studies following Hadley and Short (2005) have treated each language sample as a single observation. In these prior analyses, one tense marker total (range 0-15), one composite productivity score (range 0-25), and one category productivity score for each category (range 0-5) were reported for each language sample. In the analyses for questions 1A and 2A, each tense marker was treated as a separate, binary observation. In the analyses for questions 1B and 1C, each productivity point was treated as a separate, binary observation. In the analyses for questions 2B and 2C, each probe item was treated as a separate, binary observation.
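The change in the unit of observation can be sketched as follows. This is an illustrative reshaping step only, assuming a hypothetical per-child record of which of the 15 tense markers were used; the function name `to_binary_observations`, the marker labels, and the example values are placeholders rather than study data.

```python
from typing import Dict, List, Tuple

Row = Tuple[str, int, str, int]  # (child_id, age_months, marker, used as 0/1)

def to_binary_observations(child_id: str, age_months: int,
                           markers_used: Dict[str, bool]) -> List[Row]:
    """Expand one language sample into one binary observation per tense marker."""
    return [(child_id, age_months, marker, int(used))
            for marker, used in markers_used.items()]

# Hypothetical sample: 15 placeholder markers, the first 6 of which were used.
markers = {f"marker_{i:02d}": i <= 6 for i in range(1, 16)}
rows = to_binary_observations("child_01", 36, markers)

# The single-score summary used in prior studies is the sum of the binary rows.
total = sum(row[3] for row in rows)
print(len(rows), "observations; tense marker total =", total)
```

Each row would then enter the regression as a separate binary outcome, with the child identifier available for the random intercept.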
3.4.1 Question 1A.
Research question 1A asked if the number of tense markers children use in language samples with a fixed number of multi-morpheme utterances increases with age between 30 and 54 months. In order to answer this question, tense marker use was modeled using a mixed-effect generalized linear regression with a logistic link function. This model estimated the proportion of tense markers used in a sample of 150 multi-morpheme utterances as a function of age (at Visit 1). Age was entered into the model as a fixed covariate. Age was centered at 21 months following Rispoli et al.’s (2009) finding that the true zero for tense marker development occurred at 21 months. Random effects intercepts were entered for children.
The exponential function of the estimated coefficient (e^b) was used to interpret the relationship between age and the odds that a child would use any given tense marker. e^b is the odds ratio associated with a one-unit increase in the value of a predictor variable (Szumilas,
3.4.2 Question 1B.
Research question 1B asked if productivity scores increase with age between 30 and 54 months in language samples with a fixed number of multi-morpheme utterances. In order to answer this question, productivity point-level data were modeled using a mixed-effect generalized linear regression with a logistic link function. This model estimated the proportion of 25 possible productivity points that children received in a sample of 150 multi-morpheme utterances as a function of age (at Visit 1). Age was entered into the model as a fixed covariate. Age was centered at 21 months following Rispoli et al.’s (2009) finding that the true zero for tense marker development occurred at 21 months. Random effects intercepts were entered for children. e^b was used to interpret the relationship between age and the odds that a child would receive any given productivity point.
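As a rough sketch of how a coefficient from a model with age centered at 21 months translates into odds ratios and predicted probabilities, consider the following Python fragment. The values of `b0` and `b_age` are hypothetical, and the prediction is for a child whose random intercept is zero (`child_effect=0.0`), which is an assumption made for illustration.

```python
import math

CENTER_AGE = 21  # months; the centering point described in the text

def predicted_probability(age_months: float, intercept: float, slope: float,
                          child_effect: float = 0.0) -> float:
    """Probability of a positive observation for a child at a given age."""
    eta = intercept + slope * (age_months - CENTER_AGE) + child_effect
    return 1.0 / (1.0 + math.exp(-eta))

# Hypothetical fixed-effect estimates on the log-odds scale.
b0, b_age = -3.0, 0.12

odds_ratio_per_month = math.exp(b_age)        # e^b for a one-month increase
odds_ratio_per_year = math.exp(12 * b_age)    # e^(12b) for a twelve-month increase
print(f"odds ratio per month: {odds_ratio_per_month:.3f}")
print(f"odds ratio per year:  {odds_ratio_per_year:.3f}")

for age in (30, 42, 54):
    p = predicted_probability(age, b0, b_age)
    # With 25 possible productivity points, 25 * p is the expected score.
    print(f"age {age}: p = {p:.3f}, expected points out of 25 = {25 * p:.1f}")
```

Because age is centered, the intercept is the log-odds of a positive observation at 21 months rather than at birth.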
3.4.3 Question 1C.
Research question 1C asked if morpheme category productivity increases at the same rate across morpheme categories. In order to answer this question, productivity point-level data were analyzed using a mixed-effect generalized linear regression with a logistic link function. The model estimated the proportion of productivity points obtained in each morpheme category as a function of three fixed effects (morpheme category, age, and age-by-category interaction) and random effects intercepts for children. Morpheme categories were entered into the model using dummy variables. Age was entered into the model as a continuous fixed effect variable. The model was fit three times with age centered at different ages to examine differences between morpheme category productivity scores at different ages. Age was centered at 30 months, 42 months, and 54 months.
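The reason for refitting with different centering ages is that, when an age-by-category interaction is present, the coefficient of a category dummy variable is the category difference at the age where centered age equals zero. The sketch below illustrates this algebraically for a single dummy variable; the coefficient values and the helper names `recenter` and `linear_predictor` are hypothetical, and the code re-expresses one set of coefficients rather than refitting a model.

```python
def recenter(coefs: dict, old_center: float, new_center: float) -> dict:
    """Re-express fixed effects for a new age centering point."""
    shift = new_center - old_center
    return {
        "intercept": coefs["intercept"] + coefs["age"] * shift,
        "age": coefs["age"],
        "category": coefs["category"] + coefs["age_x_category"] * shift,
        "age_x_category": coefs["age_x_category"],
    }

def linear_predictor(coefs: dict, center: float, age: float, in_category: int) -> float:
    """Log-odds for an observation; in_category is the 0/1 dummy variable."""
    a = age - center
    return (coefs["intercept"] + coefs["age"] * a
            + coefs["category"] * in_category
            + coefs["age_x_category"] * a * in_category)

# Hypothetical coefficients with age centered at 30 months.
at_30 = {"intercept": -1.5, "age": 0.08, "category": -0.9, "age_x_category": 0.03}
at_42 = recenter(at_30, 30, 42)
at_54 = recenter(at_30, 30, 54)

for center, coefs in ((30, at_30), (42, at_42), (54, at_54)):
    print(f"centered at {center}: category difference in log-odds = {coefs['category']:.2f}")
```

The predictions are identical under all three parameterizations; only the age at which the category contrast is evaluated changes.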
When significant fixed effects were found, the exponential function of the estimated coefficient (e^b) was used to interpret the relationship between the average odds that a child would receive any given productivity point in a morpheme category and values of the fixed effect variables. For age, e^b is the odds ratio associated with a one-month increase in age. For morpheme category, e^b is the odds ratio associated with a change between two morpheme categories.
3.4.4 Question 2A.
Research question 2A asked if the odds of a child being stimulable for each individual tense marker increased at the same rate across communication modalities. In order to answer this question, a series of multivariable binomial logistic regression analyses was used to estimate the effects of age and communication modality on the probability that a child will be stimulable for each individual tense marker. A separate set of regression models was estimated for each of the 15 individual tense markers. The set of models included a full model with 3 parameters (main effect of age, main effect of communication modality, and age-by-modality interaction), a two-parameter subset model with main effects of age and communication modality, age-only and modality-only subset models, and an intercept-only model. For each of these analyses, the unit of observation was the individual child participant. The response variable in each of these analyses was tense marker stimulability. Stimulable tense markers were coded as 1. Non-stimulable tense markers were coded as 0. The predictor variables in each analysis were age and communication modality. Age was a continuous variable. Communication modality was a categorical variable with two categories (graphic symbol, spoken). The model with the lowest AICc was selected as the best fitting model for each tense marker. Random effects of children were not included in these models because these models only used one observation per child.
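The selection rule can be sketched with the standard small-sample correction, AICc = -2 log L + 2k + 2k(k + 1) / (n - k - 1). In the Python sketch below, the log-likelihoods and the sample size are hypothetical, and k counts every estimated coefficient including the intercept, which is an assumption about how parameters were counted.

```python
def aicc(log_likelihood: float, k: int, n: int) -> float:
    """Small-sample corrected Akaike information criterion."""
    if n - k - 1 <= 0:
        raise ValueError("AICc is undefined unless n > k + 1")
    aic = -2.0 * log_likelihood + 2.0 * k
    return aic + (2.0 * k * (k + 1)) / (n - k - 1)

# Hypothetical (log-likelihood, k) pairs for the five candidate models
# fitted to one tense marker; k includes the intercept.
candidates = {
    "age * modality": (-20.1, 4),
    "age + modality": (-20.3, 3),
    "age": (-21.0, 2),
    "modality": (-26.5, 2),
    "intercept only": (-27.0, 1),
}
n_children = 40  # hypothetical number of children contributing one observation each

scores = {name: aicc(ll, k, n_children) for name, (ll, k) in candidates.items()}
best = min(scores, key=scores.get)

for name, score in sorted(scores.items(), key=lambda item: item[1]):
    print(f"{name:16s} AICc = {score:.2f}")
print("selected model:", best)
```

With these made-up values the age-only model has the lowest AICc, because the extra terms in the larger models improve the likelihood too little to offset the penalty.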
e^b was used to interpret the relationship between the odds that a child would be stimulable for a given tense marker and values of the significant predictor variables in the final models. For age, e^b is the odds ratio associated with a one-month increase in age. For modality, e^b is the odds ratio associated with a change from the graphic symbol modality to the spoken modality.
Children were excluded from individual tense marker analyses casewise if they did not complete all probe items on the stimulability test corresponding to a given tense marker. One child was excluded from the analyses for copula was, were, -ed, auxiliary does, do, did, auxiliary is, am, and are. Three children were excluded from the analyses for auxiliary was and were.
3.4.5 Question 2B.
Research question 2B asked if morpheme category stimulability increases at the same rate across communication modalities for each morpheme category. In order to answer this question, probe item-level data were analyzed using a series of mixed-effect generalized linear regressions with a logistic link function. Each model estimated the proportion of correct responses on stimulability probe items corresponding to all tense markers in one morpheme category as a function of three fixed effects (communication modality, age, and age-by-modality interaction). A separate regression model was estimated for each morpheme category. Communication modality was entered into each model as a categorical fixed effect variable. Age was centered at 42 months and entered into each model as a continuous fixed effect variable. When significant fixed effects were found, e^β was used to interpret the relationship between the average odds that a child would give a correct response on any given probe item in a morpheme category and values of the fixed effect variables.
Each model also included random effects intercepts for children. Since communication modality was randomly assigned at the level of morpheme categories, any random effects of communication modality were nested within child participants.
Children were excluded from analyses casewise if they did not complete all probe items on the stimulability test corresponding to a given morpheme category. Three children were excluded from the AUXILIARY BE category analysis. One child was excluded from each of the other four category analyses.
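The exclusion rule for a single morpheme category can be sketched as a simple completeness filter. The data layout, the child identifiers, the item names, and the function name `complete_cases` are hypothetical; a missing response is represented here as `None`.

```python
from typing import Dict, List, Optional

Responses = Dict[str, Dict[str, Optional[int]]]  # child -> item -> 1, 0, or None

def complete_cases(responses: Responses, required_items: List[str]) -> Responses:
    """Keep only children with a response on every probe item in the category."""
    return {
        child: items
        for child, items in responses.items()
        if all(items.get(item) is not None for item in required_items)
    }

# Hypothetical probe items for one morpheme category.
category_items = ["item_1", "item_2", "item_3"]

responses: Responses = {
    "child_a": {"item_1": 1, "item_2": 0, "item_3": 1},
    "child_b": {"item_1": 1, "item_2": None, "item_3": 0},  # one item not completed
    "child_c": {"item_1": 0, "item_2": 0},                  # one item never administered
}

included = complete_cases(responses, category_items)
print("included:", sorted(included))
print("excluded:", sorted(set(responses) - set(included)))
```

Applying the filter separately per category means a child excluded from one category analysis can still contribute data to the others.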
3.4.6 Question 2C.
Research question 2C asked if morpheme category stimulability increases at the same rate across morpheme categories. In order to answer this question, probe item-level data were analyzed using mixed-effect generalized linear regression models with logistic link functions. A separate model was formed for each communication modality. The models estimated the proportion of correct responses on stimulability probe items corresponding to all tense markers in each morpheme category as a function of three fixed effects (morpheme category, age, and age-by-category interaction). Morpheme categories were entered into the models using dummy variables. Each model was formed three times with age centered at different ages to examine differences between morpheme category stimulability scores at different ages. Age was centered
e^β was used to interpret the relationship between the average odds that a child would give a correct response on any given probe item in a morpheme category and values of the fixed effect variables.
Each model also included random effects intercepts for children. Since communication modality was randomly assigned at the level of morpheme categories, and each child was tested in at least one morpheme category in each modality, each child’s data were split between models. The pattern of this split varied randomly from child to child.
Children were excluded from analyses pairwise if they did not complete all probe items on the stimulability test corresponding to a given morpheme category. Three children were excluded from the AUXILIARY BE category analysis in the graphic symbol model. One child was excluded from each of the other four category analyses in the graphic symbol model. No children were excluded from any analyses in the spoken model.