Article  |   June 01, 2010
Neighborhood Density and Word Frequency Predict Vocabulary Size in Toddlers
 
Author Affiliations & Notes
  • Stephanie F. Stokes
    Curtin University of Technology, Perth, Australia
  • Contact author: Stephanie F. Stokes, who is now with the Department of Communication Disorders, University of Canterbury, Private Bag 4800, Christchurch 8140, New Zealand. E-mail: stephanie.stokes@canterbury.ac.nz.
Journal of Speech, Language, and Hearing Research, June 2010, Vol. 53, 670-683. doi:10.1044/1092-4388(2009/08-0254)
History: Received December 10, 2008; Revised May 10, 2009; Accepted August 2, 2009
 

Purpose: To document the lexical characteristics of neighborhood density (ND) and word frequency (WF) in the lexicons of a large sample of English-speaking toddlers.

Method: Parents of 222 British-English–speaking children aged 27(±3) months completed a British adaptation of the MacArthur–Bates Communicative Development Inventory: Words and Sentences (MCDI; Klee & Harrison, 2001). Child words were coded for ND and WF, and the relationships among vocabulary, ND, and WF were examined. A cut-point of 1 SD below the mean on the MCDI classified children into one of two groups: low or high vocabulary size. Group differences on ND and WF were examined using nonparametric statistics.

Results: In a hierarchical regression, ND and WF accounted for 47% and 14% of unique variance in MCDI scores, respectively. Low-vocabulary children scored significantly higher on ND and significantly lower on WF than did high-vocabulary children, but there was more variability in ND and WF for children at the lowest points of the vocabulary continuum.

Conclusion: Children at the lowest points of a continuum of vocabulary size may be extracting statistical properties of the input language in a manner quite different from their more able age peers.

Children who have a late or slow onset of expressive vocabulary development at 2 years of age are described as late talkers (LTs; Ellis Weismer, 2007), and their status is usually determined by one of three rubrics. The first is fewer than 50 words in their expressive vocabulary or no word combinations at 24–30 months (e.g., Paul, 1996). The second is a score at or below the 10th–15th percentile on a parent checklist of child expressive vocabulary (e.g., Bishop, Price, Dale, & Plomin, 2003; Thal, Reilly, Seibert, Jeffries, & Fenson, 2003). In the third, a composite score from a general developmental scale (Rice, Taylor, & Zubrick, 2008; Zubrick, Taylor, Rice, & Slegers, 2007) has been used to identify children with late language emergence. These three rubrics all use quantitative criteria to identify children who are developing slower than their age-matched peers and who may be at risk for later language impairments. Not all LTs go on to have a language impairment in the preschool years. The general consensus is that the majority of children who were LTs at 2 years may resolve between 3 and 4 years of age, with LTs moving into the normal range of performance for vocabulary by the school years (Paul, 1996; Rescorla, 2002), although studies have reported continuing syntactic and literacy levels below the normal range in the school and adolescent years (e.g., Rescorla, 2002).
A recurring theme in these studies is the fact that there is more variability in vocabulary size within groups of LT children than is found in typically developing (TD) children, but as yet there have been few attempts to explore this variability. Exploring variability in the lexicons of LT and TD children may be instructive in the search for why some LTs continue to have a language impairment as they approach school age and why some children “catch up” with their peer group, falling into the category of late bloomers (those who have a slow onset to vocabulary development but then bloom into the normal range of vocabulary scores). Ellis Weismer (2007)  reported that in her study of 40 LTs, 35% were enrolled in speech–language treatment at 3½ years, and 37.5% of the original 40 were enrolled in speech–language treatment at 5½ years. As yet, there is little indication for which LTs will grow into language impairment status and which will be eventually recategorized as late bloomers. Examining the nature of the lexicons of LTs may shed light on this issue. The lack of research on variability may be due to the continuing focus on quantitative measures. Although useful for identifying cases, quantitative measures do not shed light on how the lexicons of LT and TD children actually differ.
The lexicons of children identified as LTs or late language emergence have been described in only a handful of studies (see Desmarais, Sylvestre, Meyer, Bairati, & Rouleau, 2008). Some studies have explored lexical and syntactic growth over time (e.g., Hadley & Holt, 2006; Moyle, Ellis Weismer, Evans, & Lindstrom, 2007; Rescorla, Mirak, & Singh, 2000). The lexicons of LTs do not reflect a selective delay in verbs relative to nouns (Ellis Weismer, 2007) or in verb morphology (Rescorla & Roberts, 2002), but there has been no examination of the lexical or sublexical properties of the lexicons of LT children, despite a focus on these factors for other groups of children. For example, there is now a considerable body of literature on the lexical (phonological neighborhood density [ND] and word frequency [WF]) and sublexical (phonotactic probability [PP]) characteristics of cohort data from TD children, particularly work by Storkel and colleagues (see details in the paragraphs that follow).
Characteristics of Developing Lexicons
Lexical characteristics in very young children’s lexicons have been studied via two primary means, corpora studies of lexical norms and individual child developmental patterns. In corpora studies, researchers examined the properties of words known by young children using parental checklist data (e.g., Dollaghan, 1994; Goodman, Dale, & Li, 2008; Storkel, 2004a). Earlier work by Dollaghan (1994)  examined only monosyllabic words on earlier versions of parental checklists of child vocabulary size (Rescorla, 1989; Reznick & Goldsmith, 1989) and showed that 84% of the words had at least one phonological neighbor, indicating that early vocabularies are likely to be dense in terms of phonological neighbors. Storkel (2004a)  took the Dale and Fenson (1996)  database that documents the words from the MacArthur–Bates Communicative Development Inventory (MCDI; Klee & Harrison, 2001) that are known by children aged 0;8 to 2;6 (years;months) and examined the lexical properties of nouns that were learned earliest (e.g., were achieved by 50% of the children at age 18 months vs. words that were not achieved by 50% of the children until 24 months). Storkel found that words that were known by the younger children (acquired earliest) tended to have higher ND (i.e., have many neighbors in the ambient language—in this case, English) and higher WF than words acquired later.
Recently, Storkel (2008a), using the same lexical database, examined the phonological, lexical, and semantic properties of nouns in predicting two different dependent outcome variables. In the first analysis, the predicted variable was the percentage of children at each age that knew a given word. In this analysis, the best predictors were the lexical variables (ND and word length), followed by the phonological variables (PP). Semantic variables contributed less to the regression, and WF contributed very little to the regression once the other variables had been entered, primarily because of the high correlation between frequency and the other measures. In the second analysis, the outcome measure was the age at which 75% of infants/toddlers knew a given word. Here, the best predictor was again the lexical variables (high ND and short words learned earliest). Furthermore, the effects of PP remained constant across the age range studied, whereas the effects of ND diminished with age, suggesting that ND may be an important factor in word learning for very young children, but with increasing vocabulary size, children more readily learn words from sparse neighborhoods. The strongest effects for WF were seen at 1;10–2;0, suggesting that frequency may be important as a cue to word learning in the population of interest (LTs).
Goodman et al. (2008)  also used the Dale and Fenson (1996)  database to examine WF effects in the norms for the percentage of children at each age who knew a given word. They reported a direct relationship between WF and expressive MCDI scores when all word classes were included in the analysis (i.e., nouns, people words, verbs, adjectives, closed class, and others), such that earlier learned words were of low frequency in the input. They interpreted the finding as reflective of the inclusion of closed class words. Focusing only on open class words, they found, as expected, that words of higher frequency were learned earlier.
Individual child developmental patterns have been examined as an alternative to corpora studies—for example, Coady and Aslin (2003)  and Maekawa and Storkel (2006) . First, Coady and Aslin (2003)  calculated the neighborhood densities for two children tracked longitudinally from 2;3 to 3;6 using data from the Child Language Data Exchange System (CHILDES; MacWhinney, 2000) database. Their reference database was the Kučera and Francis (1967)  corpus of 77,581 American-English words commonly used in research of this nature (Nusbaum, Pisoni, & Davis, 1984). Coady and Aslin (2003)  included monosyllabic words that were not proper names, homonyms, contractions, or inflected forms, produced across time (923 for Adam and 760 for Sarah). They found that children were more likely to learn (use in connected spontaneous speech) words that come from dense neighborhoods in the ambient language (i.e., “acquiring words from denser portions of the adult lexical neighborhoods,” p. 455).
In a second study, Maekawa and Storkel (2006)  examined the lexicons of 3 children aged 1;4–3;1 in longitudinal data from CHILDES (MacWhinney, 2000). The children varied by the rate at which they learned words, such that at age 2;10, Allison had a projected total of 174 different nouns, April had 316, and Peter had 719. The authors entered ND, PP, word length, and WF into a regression to predict the age of first production of given words and found that these sublexical and lexical variables were represented differently in the vocabularies of 3 different children tracked longitudinally. For the child with the smallest vocabulary, words of high PP and short length were learned earliest. For the child with the midsized vocabulary, word length (preferring shorter words) was the best predictor of age of acquisition, and in the lexicon of the child with the largest vocabulary, the predictors were ND (sparse words were learned earlier than dense words) and WF (high-frequency words were learned earlier than low-frequency words). It is notable that in the 2 children with the greatest vocabulary size, the PP measures were eliminated from the regression first, whereas for the first child, this was the best predictor. It seems that once lexical size reaches a certain (as yet unidentified) level, ND rather than PP is an important cue to learning.
All of this research was conducted on TD children. Another body of literature that might influence hypothesis formation is that of learning patterns of children with phonological disorders. (The term phonological disorder is used to describe children with speech sound disorders and is generally construed as being different from specific language impairment.) Storkel (2004b)  examined novel word learning in three groups of children: preschool children with phonological disorder; younger, phonology-matched TD children; and age- and vocabulary-matched TD children. She found that although the younger TD children learned high PP words more easily than low PP words, the children with phonological disorder had the opposite learning pattern. The age-matched children showed no preference. Storkel concluded that children with phonological disorder may have a particular difficulty in forming lexical representations for words that sound like other words in the ambient language. Similarly, in a study of treatment for phonological disorder, Gierut, Morrisette, and Champion (1999)  concluded that better generalization from treated to untreated words occurred when treatment focused on words that were frequent in the language and of low ND. The findings of these two reports could support the hypothesis that children who are learning language in an atypical fashion (different from their peers) may not follow the predicted patterns of learning words of high ND.
When synthesizing all of this research, the findings appear somewhat incongruent (see Table 1). Most studies show that early (and probably small) vocabularies of TD children consist of words that come from dense neighborhoods in the ambient language. However, this pattern may not hold for all children and may be different for children not developing language in step with their peers. Second, words that are learned earliest tend to be of high frequency in the input, but this claim is inconclusive because several studies reported high correlations between WF and ND, limiting conclusions that can be drawn about the value of WF as a predictor of vocabulary size.
Table 1. Summary of the main findings for neighborhood density (ND) and word frequency (WF) characteristics in child vocabulary development.

| Author | Source | ND | WF |
|---|---|---|---|
| Storkel (2004a) | Checklist database | High | High |
| Storkel (2008a) | Checklist database | High | (Correlated with ND) |
| Coady & Aslin (2003) | Individual language samples (N = 2) | High | |
| Maekawa & Storkel (2006) | Individual language samples (N = 3) | Child 3 = low (a) | Child 3 = high (a) |
| Goodman, Dale, & Li (2008) | Checklist database | | Word-category dependent |
| Storkel (2004b) | Children with PD | Low (PP) | |
| Gierut, Morrisette, & Champion (1999) | Children with PD | Low | High |

Note. PD = phonological disorder. PP = phonotactic probability.
(a) Child 1 and 2 = effects of word length.
Patterned Variation in ND and WF
There has been some suggestion that ND and WF pattern together systematically, such that ND is positively and strongly correlated with WF. For example, Storkel (2008a) stated that “higher-frequency words tend to have more neighbors and lower-frequency words tend to have fewer neighbors” (p. 300), based on the work of Landauer and Streeter (1973). However, Landauer and Streeter’s (1973) finding that high-frequency words had more neighbors than low-frequency words was for orthographic neighborhood density of English words of four letters in length and for only two frequency ranges (75 occurrences per million vs. 1 occurrence per million). Frauenfelder, Baayen, and Hellwig (1993) replicated Landauer and Streeter’s (1973) methodology and found the same results for orthographic neighbors in English. Next, examining the word lengths of three to eight letters and the entire frequency range, Frauenfelder et al. (1993) found that although the correlation between ND and WF was significant, it was extremely weak, r(2151) = .16, p < .001, with a significant result due to the sheer sample size and with a scatter illustrating great variability in the distribution of ND × WF. Further, there was a downward curve in WF as ND increased, with words of very high ND having lower WF than words of lower density. Thus, high ND appears to map with high WF for written words of four letters in length for extreme frequencies but not words that are shorter or longer and cover the range of expected WF values. Research exploring Landauer and Streeter’s (1973) claim in phonological neighbors in English across a wider range of frequencies and word lengths has found that high-frequency words did not have more neighbors than low-frequency words (Pisoni, Nusbaum, Luce, & Slowiaczek, 1985). Frauenfelder et al. (1993) also examined the ND × WF relationship for phonological neighbors in English and reported significant but very weak associations, with an increase in strength of association for words of one to four phonemes in length but a decrease in the association for longer words, as had been found for the orthographic neighborhoods. All of these results need to be considered with the following caveat: Studies did not examine the same lexicons or the same words. Frauenfelder et al. (1993) appear to have used the same definitions of neighborhoods as prior studies, but this may have differed slightly, too. In short, the relationship between ND and WF is still unclear.
Nonetheless, the claim that high ND is related to high WF has led some researchers to weight words by frequency, creating a frequency-weighted ND measure called relative frequency (R), estimated as “the frequency of a word divided by the sum of the frequencies of all of its neighbors” (Scarborough, 2004, p. 6). A word of high relative frequency (high R) has a high frequency relative to the combined frequencies of its neighbors, and a word of low relative frequency (low R) has a low frequency relative to the combined frequencies of its neighbors. High-R words are easily accessed (they have low lexical confusability), whereas low-R words are more difficult to access (they have high lexical confusability), according to Scarborough (2004).
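As a rough illustration of the quoted definition, the following Python sketch computes R from a word's frequency and the frequencies of its neighbors. The function name and the frequency values are hypothetical and are not drawn from any corpus or from Scarborough (2004).

```python
def relative_frequency(word_freq, neighbor_freqs):
    """R = frequency of the word / summed frequency of all of its neighbors."""
    total = sum(neighbor_freqs)
    if total == 0:
        raise ValueError("word has no neighbors with nonzero frequency")
    return word_freq / total


# Hypothetical occurrences per million, for illustration only.
high_r = relative_frequency(120.0, [10.0, 20.0, 30.0])   # frequent word, infrequent neighbors
low_r = relative_frequency(5.0, [100.0, 150.0, 250.0])   # infrequent word, frequent neighbors
```

Under these made-up values the first word has a high R (low confusability) and the second a low R (high confusability).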
The Implications of ND and WF Patterning
Studying the effects of relative frequency on word production, Scarborough (2004)  reported that low-R words were produced with more coarticulation, which she interpreted as a speaker’s attempt to increase intelligibility and reduce confusability of these words. She found that “low frequency words and high density words pattern together such that both resist reduction” (p. 123), where reduction refers to reduced word duration and reduced vowel space. This means that low-frequency/high-density words cause lexical competition effects in production, causing speakers to tacitly heighten the phonetic characteristics of low-frequency/high-density words to facilitate listener perception. Munson and Solomon (2004)  examined the effects of lexical competition on speech production, measuring vowel duration and vowel formant frequency. They found that high-density words were produced with an expanded vowel space in comparison to low-density words but that this occurred independently of frequency effects. Indeed, Scarborough stated that the relative frequency measure is “much more highly correlated with neighborhood density than frequency” (p. 122). Perhaps the most important information gleaned from this literature on frequency-weighted density is that words that are of low frequency and high density may be produced by speakers with greater duration and a more expanded vowel space, reducing confusability for the listener (Munson & Solomon, 2004; Scarborough, 2004; Wright, 2004). Note, however, that in a carefully controlled experiment, Munson and Solomon (2004)  reported density effects in the absence of frequency effects and vowel expansion effects in the absence of vowel duration effects.
These findings could be relevant for research on lexical development. It is possible that these speaker modifications for words in high lexical competition contribute to the range of cues that facilitate infant/toddler abstraction of word units from the ambient speech stream. There is recent evidence that variability of specific phonetic dimensions of words has an impact on infant word learning (Rost & McMurray, 2009). Variability of vowel space within a dense neighborhood may facilitate learning of that neighborhood. If so, there should be a propensity for toddlers at the earliest stages of vocabulary development to learn words of high ND, as has been attested to in the literature. Children who are struggling to learn vocabulary may or may not show the same preference for high ND words. This is the question that is addressed in the current research. A logical hypothesis for ND would be that early, or small, vocabularies will consist of words that are of high ND in the ambient language. It is not clear what role WF plays. Given the lack of consensus surrounding the issue of the (in)dependence of ND and WF, in this study they are investigated as separate, independent variables before collinearity of the two is examined in a regression analysis.
Study Aims
The current study focuses on lexical factors that are known to contribute to the ease of recognizing or producing words (for a review, see Storkel, 2008b). The aim was to identify characteristics of the lexicons of children with good and poor vocabulary skills at 2 years of age that might explain how children respond to cues in the ambient language environment that are suspected to facilitate language learning. If we can identify distinguishing lexical characteristics for children with high and low vocabulary sizes, then we might gain some understanding of how these two groups of children use ambient cues implicit in the statistical regularities of their language. If we find that children at the low end of the vocabulary continuum demonstrate the same distributional patterns as children in the mid and high range of vocabulary size, then statistical regularities in the input would seem to be used in similar ways by good and poor language learners. If, however, we find that children with small vocabularies demonstrate sensitivity to input characteristics that differ from their TD peers, then individual perceptual or input characteristics could be implicated. If there is individual variability in the characteristics of the lexicon, then there may be some indication of why some children with poor early vocabulary development eventually normalize into the range of typical ability (so-called late bloomers), whereas some children do not (so-called late talkers).
In this first exploratory study, two possible candidates, ND and WF, were examined as input factors that may contribute to qualitative differences between 2-year-old children with large and small vocabularies. Note that the term phonological neighbor as used here refers to a word that differs from another word by the substitution, deletion, or addition of one sound in any word position (±1 segment; Luce & Pisoni, 1998). The term word frequency refers to the number of times that a given word occurs in a database, representing WF in the ambient language.
The hypotheses are that
  1. Smaller vocabularies will consist of words that are of higher WF and that come from denser neighborhoods in the ambient language than words included in larger vocabularies, and

  2. Children with extremely small vocabularies for their age may not reflect the same sensitivities to high ND and high WF shown by children with moderate and large vocabularies.

These hypotheses are relevant for the two primary aims of improving the early identification of language impairment and finding factors that may predict outcomes for children with small vocabularies relative to their peers at 2 years. In the 24- to 30-month age range, children who are quantitatively different from their peers in vocabulary size may also be qualitatively different in the nature of their lexicons. Such differences may indicate that children at the low end of the continuum respond differently to cues in the ambient language that are thought to contribute to language learning.
The research questions are as follows:
  1. How much variance in vocabulary size is accounted for by ND and WF together and independently in 2-year-old children?

  2. Is there a significant difference between children with small and large vocabularies in ND and WF?

Method
Participants
The study participants have been described previously in Stokes and Klee (2009a, 2009b). The sample consisted of 232 U.K. English-speaking children ranging in age from 24 to 30 months (M = 26.83, SD = 1.48). There were 121 girls and 111 boys; 58% were from Southern England, and 42% were from Northeast England. In the former location, children were recruited from a research database, and in the latter location, they were recruited from parent–toddler playgroups. Children had no developmental disability, hearing impairment, or medical condition, as confirmed by parent questionnaire.
Procedures
Vocabulary size. Parents received a mailed package containing a demographic questionnaire, a British English version of the MCDI (Klee & Harrison, 2001), and a consent form. Parents returned the consent form and the completed questionnaires in a stamped, addressed envelope. On the MCDI, the parent checks off each word they know is used (spoken) by their child. For each child, words checked by the parent were entered into an SPSS database. The monosyllabic noun (160), verb (88), and adjective (31) words from the MCDI were included, and the words from the following categories were excluded: sound effects and animal sounds, people, games and routines, words about time, pronouns, question words, prepositions and locations, quantifiers and articles, helping verbs, and connecting words. This was to center the analysis on core vocabulary rather than words likely to be context-based (e.g., people) or function words. In line with previous research, only monosyllabic words were included for analysis, for a total of 279 words. Eight families did not return MCDI forms, and these children were excluded from the analyses. Two children had scores of 3 and 8 for core vocabulary, and they too were excluded from the analyses because calculating density values for such small lexicons would be nonsensical. The final range of MCDI core vocabulary scores in the remaining 222 files was 27–601.
Neighborhood density. De Cara and Goswami’s (2002, n.d.) calculations of ND for British English were used in this study. The reference database was 4,086 monosyllables from the CELEX (Baayen, Piepenbrock, & Gulikers, 1995) database. This database was chosen because of its relatively large size (17.9 million words) and because it is appropriate for British English. Two different definitions of ND were used by the authors, but only one is used here, to align findings with previous research. The chosen metric is the commonly used ±1 phoneme substitution, addition, or deletion (Ph ±1 metric; e.g., Charles-Luce & Luce, 1990), in which “hat” and “bat” would be rhyme neighbors but not “hat” and “splat.”
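The Ph ±1 metric can be sketched in a few lines of Python. This is only a minimal illustration, not De Cara and Goswami's procedure: the function names, the toy lexicon, and the simplified phoneme transcriptions are all assumptions made for the example.

```python
def is_neighbor(a, b):
    """True if phoneme sequences a and b differ by exactly one
    substitution, addition, or deletion (Ph +/-1)."""
    if a == b:
        return False
    if len(a) == len(b):
        # Same length: exactly one position may differ (substitution).
        return sum(x != y for x, y in zip(a, b)) == 1
    if abs(len(a) - len(b)) != 1:
        return False
    shorter, longer = (a, b) if len(a) < len(b) else (b, a)
    # Lengths differ by one: deleting one phoneme from the longer
    # sequence must yield the shorter one (addition or deletion).
    return any(longer[:i] + longer[i + 1:] == shorter for i in range(len(longer)))


def neighborhood_density(word, lexicon):
    """Number of entries in the lexicon that are Ph +/-1 neighbors of word."""
    return sum(is_neighbor(word, other) for other in lexicon)


# Toy lexicon with simplified, hypothetical transcriptions.
HAT = ("h", "a", "t")
BAT = ("b", "a", "t")
AT = ("a", "t")
HATS = ("h", "a", "t", "s")
SPLAT = ("s", "p", "l", "a", "t")
lexicon = [HAT, BAT, AT, HATS, SPLAT]

density_hat = neighborhood_density(HAT, lexicon)
```

In this toy lexicon, “hat” has three neighbors (“bat” by substitution, “at” by deletion, “hats” by addition), and “splat” is not among them.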
Word frequency. De Cara and Goswami’s (2002) coding of lexical frequency is the occurrence per million words of adult speech within a 17.9 million spoken word corpus.
Data entry. For each child, values for overall ND (Ph ±1 ND) and WF were entered for each MCDI word used. An average value was generated for each child. This method yielded a continuum of ND and frequency values for the sample of 222 children, along with the continuum of MCDI scores.
Data analysis. A multivariate analysis of variance showed a significant effect of age on MCDI scores, F(6, 215) = 5.20, p < .001, partial η² = .13, although the effect size was small. Although there was no significant effect of age on ND or WF scores, the correlations of age with these variables were low but significant: Age × ND, r(222) = −.17, p = .01; Age × WF, r(222) = .23, p = .001. Thus, all variables were converted to z-scores within age groups for subsequent analyses. For example, MCDI scores were converted to z-scores (represented as z-MCDI) for 24 months, 25 months, etc. This effectively controlled for age in all analyses.
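The within-age standardization can be sketched as follows. This is an illustrative reconstruction only: the function name and the scores are hypothetical, and the use of the sample standard deviation (n − 1 denominator) is an assumption, because the text does not say which estimator was used.

```python
from collections import defaultdict
from statistics import mean, stdev


def z_within_age(records):
    """records: list of (age_in_months, score) pairs.
    Returns z-scores in the same order, standardized within each age group.
    Each age group needs at least two children with differing scores."""
    by_age = defaultdict(list)
    for age, score in records:
        by_age[age].append(score)
    # Mean and sample SD per age group (sample SD is an assumption).
    stats = {age: (mean(vals), stdev(vals)) for age, vals in by_age.items()}
    return [(score - stats[age][0]) / stats[age][1] for age, score in records]


# Hypothetical MCDI scores for two age groups.
records = [(24, 100), (24, 200), (24, 300), (25, 150), (25, 250), (25, 350)]
z_mcdi_example = z_within_age(records)
```

A child scoring 150 at 25 months and a child scoring 100 at 24 months receive the same z-score here, which is how the conversion removes the age effect from later comparisons.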
Results
Predicting Vocabulary Size
In preparation for answering the first research question (i.e., “How much variance in vocabulary size is accounted for by ND and WF together and independently in 2-year-old children?”), correlations among the variables were examined. The z-MCDI had a moderate, significant positive correlation with z-WF and a moderate, significant negative correlation with z-ND: r(222) = .60, p < .001, and r(222) = −.69, p < .001, respectively. That is, as vocabulary size increased, WF increased and ND decreased; z-ND and z-WF were weakly and negatively correlated, r(222) = −.36, p < .001.
Plots of the relationships between z-MCDI and z-ND and z-MCDI and z-WF are shown in Figures 1 and 2. The plot for z-MCDI × z-ND reflects the significant negative correlation, with smaller vocabularies showing higher z-ND values than larger vocabularies. The plot for z-MCDI × z-WF reflects the significant positive correlation, with smaller vocabularies generally showing low z-WF values, although it is clear that there is some variability in both z-WF and z-ND values for the lowest vocabulary scores.
Figure 1. Scatter plot of z-scores for vocabulary and neighborhood density (ND).

Figure 2. Scatter plot of z-scores for vocabulary and word frequency (WF).
The plots suggested possible quadratic relationships, so curve estimations were generated. Curve estimation showed that the relationship between z-WF and z-MCDI was linear, F(1, 220) = 121.87, p < .001, with z-WF accounting for 36% of the variance in z-MCDI. Curve estimation showed that although there was a significant linear relationship between z-ND and z-MCDI, F(1, 220) = 194.43, p < .001, R² = .47, there was also a strong quadratic relationship, F(2, 219) = 189.70, p < .001, R² = .63.
A multiple regression was conducted with the z-ND and z-WF predictors entered together using the backward method, and probability plots of the residuals were examined for violations of the model. As the residuals were satisfactorily distributed, the linear model was used. The model was significant, F(2, 219) = 171.99, p < .001, with z-WF and z-ND accounting for 61% of the variance in vocabulary scores. Inspection of the t values in the table of coefficients (see Table 2) shows that z-ND was the strongest predictor, followed by z-WF. The partial correlation for z-ND was −.63, and for z-WF it was .52, suggesting that WF and ND should be considered as separate factors contributing to the variance in vocabulary scores. With these results, a hierarchical multiple regression was run in which z-ND accounted for 47% of unique variance in z-MCDI, F(1, 220) = 194.43, p < .001, and z-WF accounted for 14% of additional unique variance, F(1, 219) = 79.86, p < .001.
Table 2. Table of standardized coefficients for the multiple regression predicting MCDI scores.

| Predictor | B | t | p | 95% confidence interval |
| --- | --- | --- | --- | --- |
| ND | −.54 | −11.97 | .000 | [−.63, −.45] |
| WF | .40 | 8.94 | .000 | [.31, .49] |

Note. MCDI = MacArthur–Bates Communicative Development Inventory: Words and Sentences (Klee & Harrison, 2001).
Overall, ND was inversely related to vocabulary size and was a strong predictor of vocabulary size in these 2-year-old children. As vocabulary size increased, ND dropped (i.e., more words from sparse neighborhoods in the ambient language were added to the lexicon). WF was directly related to vocabulary size; as vocabulary size increased, so did WF.
Group Differences on ND
To answer the second research question (i.e., “Is there a significant difference between children with small and large vocabularies in ND and WF?”), children who scored at or below −1 on the z-MCDI for age were coded as low vocabulary, yielding 37 children in the low-vocabulary group and 185 in the high-vocabulary group. Inspection of the distributions of z-ND for the two groups indicated a lack of homogeneity of variance. A nonparametric Mann–Whitney test revealed a significant difference between low- and high-vocabulary children in z-ND values (z = −8.03, p < .001). Children in the low-vocabulary group had higher z-ND scores than did children in the high-vocabulary group (M = 1.57, SD = 1.32, and M = −0.31, SD = 0.48, respectively). The same analysis was conducted for z-WF scores. The children in the low-vocabulary group scored significantly lower on z-WF than did the children in the high-vocabulary group (z = −5.97, p < .001; M = −1.0, SD = 1.16, and M = 0.20, SD = 0.81, for low- and high-vocabulary groups, respectively). Figures 3 and 4 show the error bar plots (bars indicate SDs) for the two groups. Clearly, there is more variability in the low-vocabulary group for both z-ND and z-WF (discussed in the paragraphs that follow). In summary, the lexicons of children with small vocabularies consist of words that are of high ND and low WF in the ambient language.
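The grouping and the nonparametric comparison described above can be sketched as follows. This is a minimal illustration, assuming a plain normal approximation to the Mann–Whitney U statistic without tie or continuity corrections; the function names and the toy values are hypothetical, and the study's own software may have applied different corrections.

```python
import math


def classify_by_cutpoint(z_scores, cut=-1.0):
    """Label a child 'low' if z-MCDI is at or below the cut-point, else 'high'."""
    return ["low" if z <= cut else "high" for z in z_scores]


def mann_whitney_z(x, y):
    """Return (U for sample x, normal-approximation z), using average ranks for ties."""
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    ranks = [0.0] * len(pooled)
    i = 0
    while i < len(pooled):
        # Find the run of tied values starting at position i.
        j = i
        while j + 1 < len(pooled) and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # ranks are 1-based
        for k in range(i, j + 1):
            ranks[k] = avg_rank
        i = j + 1
    n1, n2 = len(x), len(y)
    rank_sum_x = sum(r for r, (_, g) in zip(ranks, pooled) if g == 0)
    u_x = rank_sum_x - n1 * (n1 + 1) / 2
    mean_u = n1 * n2 / 2
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    return u_x, (u_x - mean_u) / sd_u


# Toy z-ND values for a low- and a high-vocabulary group (made up).
low_group_nd = [1.2, 2.0, 0.9]
high_group_nd = [-0.5, -0.2, 0.1, -0.8]
u_stat, z_stat = mann_whitney_z(low_group_nd, high_group_nd)
labels = classify_by_cutpoint([-1.0, -0.99, -2.0, 0.5])
```

Because the test works on ranks rather than raw values, it does not require the two groups to have equal variances, which is why it suits the unequal spreads noted above.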
Figure 3. Error bar plots of ND for low- and high-vocabulary groups.

Figure 4. Error bar plots of WF for low- and high-vocabulary groups.
Group Differences Unpacked
There appear to be very different relationships among the variables for children who scored above and below −1 z-MCDI. Correlations between z-MCDI, z-ND, and z-WF are reported in Table 3. In the low-vocabulary group, vocabulary size was negatively and moderately correlated with ND but bore little relationship with WF. Children with very low vocabulary scores had learned words with very high ND in the ambient language. Almost the opposite pattern was seen in the high-vocabulary group, where there was a moderate, significant positive correlation with WF and a weak but significant negative correlation with ND. For these children, as vocabulary size increased, WF increased.
Table 3. Correlations among MCDI, ND, and WF (all z-scores) for children in the low- and high-vocabulary groups.

| Variable | Low vocabulary: ND | Low vocabulary: WF | High vocabulary: ND | High vocabulary: WF |
| --- | --- | --- | --- | --- |
| MCDI | −.63* | .06 | −.33* | .52* |
| ND | | −.05 | | −.06 |

*p < .001.
Separate regressions for z-ND and z-WF predicting z-MCDI were run for low- and high-vocabulary groups. For the low-z-MCDI children, the variables combined accounted for 37% of the variance in z-MCDI. As would be predicted from the correlations, z-WF was not a significant predictor (t = 1.39, p = .18) in the presence of z-ND (t = −4.3, p < .0001), with z-ND and z-WF accounting for 34% and 0.04% of the variance in z-MCDI, respectively. As vocabulary score increased, ND decreased, but WF did not pattern with vocabulary score.
For the high-z-MCDI children, the variables combined accounted for 44% of the variance in z-MCDI. Both z-WF and z-ND were significant predictors (t = 9.39, p < .001, and t = −6.09, p < .001, for z-WF and z-ND, respectively), with z-WF accounting for 32% of the variance in z-MCDI, and z-ND contributing 11% of unique variance accounted for. Here, WF was a stronger predictor of MCDI score than ND. As vocabulary score increased, WF increased, and although the patterning between ND and vocabulary score was weaker, as vocabulary size increased, ND decreased.
The surprising results for WF using the De Cara and Goswami (2002)  values (the relationship was the opposite of that reported in previous research) suggested replication with an alternative WF measure. The WF values from the CHILDES file of child-directed speech (Li & Shirai, 2000; MacWhinney, 2000) were used. There was a positive, strong correlation between the CHILDES frequencies and the De Cara and Goswami frequencies, r(222) = .90, p < .001, and a weak positive correlation between the CHILDES frequencies and the overall MCDI scores (not corrected for age), r(222) = .34, p < .001.
The split file correlations (separate analyses for low- and high-vocabulary groups) were repeated. For the low-vocabulary group, the CHILDES results were almost the same as the De Cara and Goswami (2002)  results, r(37) = −.09, p = .60. That is, the child-directed speech database yielded essentially the same results for the low-vocabulary children as this study’s original results. For the high-vocabulary group, the correlation between MCDI scores and the CHILDES WF values was weak but significant, r(185) = .32, p < .001—that is, not as strong as the De Cara and Goswami results, r(185) = .52. Overall, the alternate WF measure yielded results comparable to the De Cara and Goswami measure.
Figures 1 and 2 show that when vocabularies were very small (z-MCDI below −1.5), there was variability in both ND and WF scores. The ND z-scores for these children (n = 27) ranged from −0.03 to +3.8, and the WF z-scores ranged from −3.2 to +2.17. This means that some children achieved ND and WF values commensurate with larger vocabularies. Variability is further examined in the paragraphs that follow.
Overall, for group differences, the children with low vocabulary scores had learned words of high ND in the ambient language, with little impact of WF. For children with larger vocabularies, the effect of ND diminished, and increasing WF was seen as vocabularies increased in size.
Fact or Artifact?
Children who are slow to learn vocabulary (or have small vocabularies relative to their peers) have learned words that are of high ND and low WF in the ambient language (with variability yet to be addressed). Although these results appear to be compelling, they could simply be an artifact of the MCDI checklist. Baayen (2001) described in detail the fact that distributions of WF data are dependent on sample size. Are these child vocabulary results authentic or epiphenomenal of the dataset?¹ What would small and large vocabularies look like in terms of ND and WF if random samples of child lexicons were drawn from the MCDI word list? To answer this question, simulations of random distributions of the 279 words in the database were generated using SPSS to explore the possibility that the results for the 222 two-year-old children were simply an artifact of the MCDI dataset. First, using SPSS macros, syntax commands were written to generate 50 random samples at each of 5%, 10%, 15%, … 90% of the data (900 datasets). For each sample, average ND and WF were generated, just as had been done for the child data. The scatter plots of MCDI × ND and MCDI × WF for these random samples are shown in Figures 5 and 6. Both ND and WF show extreme variability for small vocabulary sizes and reduced variability in larger vocabularies (heteroscedasticity). Note also that for the group with small vocabularies, ND and WF cover the entire range of possible scores. For example, for the smallest vocabulary size, the mean ND value could be anywhere between <16 and >30, and the mean WF value could be anywhere between <10 and >1,000.
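The random-lexicon simulation described above can be sketched in a few lines. This is not the SPSS syntax used in the study; it is a minimal Python illustration in which the function name, the seed, and the placeholder ND and WF values for the 279 words are all assumptions.

```python
import random
import statistics


def simulate_random_lexicons(words, fractions, n_per_fraction=50, seed=0):
    """Draw random 'lexicons' from a word list and average ND and WF in each.

    words: list of (nd, wf) tuples, one per checklist word.
    fractions: sample proportions, e.g. 0.05, 0.10, ..., 0.90.
    Returns one dict per simulated lexicon with its size, mean ND, and mean WF.
    """
    rng = random.Random(seed)
    results = []
    for frac in fractions:
        size = max(1, round(frac * len(words)))
        for _ in range(n_per_fraction):
            sample = rng.sample(words, size)  # sampling without replacement
            results.append({
                "size": size,
                "mean_nd": statistics.fmean(nd for nd, _ in sample),
                "mean_wf": statistics.fmean(wf for _, wf in sample),
            })
    return results


# Placeholder word list: 279 words with made-up ND and WF values.
_rng = random.Random(1)
words = [(_rng.uniform(5, 40), _rng.lognormvariate(4, 1.5)) for _ in range(279)]
fractions = [p / 100 for p in range(5, 95, 5)]  # 5%, 10%, ..., 90%
sims = simulate_random_lexicons(words, fractions)
```

With 18 sample proportions and 50 draws at each, the sketch yields 900 simulated lexicons, matching the count in the text. Means from small samples scatter more widely than means from large samples, which is the heteroscedasticity the paragraph describes.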
Figure 5. Scatter plot of MCDI × ND for 900 random samples. MCDI = MacArthur–Bates Communicative Development Inventory: Words and Sentences (Klee & Harrison, 2001).

Figure 6. Scatter plot of MCDI × WF for 900 random samples.
Second, 5 random samples of 232 cases (comparable to the start point for this study) were generated for comparison with the current dataset. Figures 7 and 8 show the scatter plots of these 5 random samples with the scores from this study’s real cases. For both variables (ND and WF), the actual child data do not map onto the random samples, having more linear distributions, indicating that the results of this study are authentic rather than epiphenomenal; that is, these are nonrandom developmental profiles. For ND, the children appear to score higher than the random samples. For WF, the children appear to score lower than the random samples. To find statistical evidence for this visual pattern, two one-way analyses of variance were run to determine whether or not the distributions differed among groups (Levene’s test of equality of error variances was not violated, F[5, 1378] = 0.30, p = .91). There was a significant difference between the actual child data and all random samples, although the effect size was small, but there were no differences among the random samples for ND, F(1, 5) = 4.36, p < .01, partial η² = .02. A similar result was achieved for WF, F(1, 5) = 7.83, p < .001, partial η² = .03. Figures 9 and 10 show ND and WF graphed by sample with 95% confidence intervals for group means on the ordinate axis. The statistical analysis shows that the probability that the child distributions for both WF and ND arose by chance was < .03.
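The group comparison can be illustrated with a generic one-way analysis of variance. The sketch below is a textbook computation of F and eta squared (which equals partial eta squared in a one-way design), not a reproduction of the study's SPSS analysis; the function name and the toy groups are hypothetical.

```python
def one_way_anova(groups):
    """Return (F, df_between, df_within, eta_squared) for a list of samples."""
    all_values = [v for g in groups for v in g]
    grand_mean = sum(all_values) / len(all_values)
    group_means = [sum(g) / len(g) for g in groups]
    # Between-groups sum of squares: group means around the grand mean.
    ss_between = sum(
        len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means)
    )
    # Within-groups sum of squares: values around their own group mean.
    ss_within = sum(
        (v - m) ** 2 for g, m in zip(groups, group_means) for v in g
    )
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    eta_squared = ss_between / (ss_between + ss_within)
    return f_stat, df_between, df_within, eta_squared


# Toy example: three small groups standing in for the child data and random samples.
toy_groups = [[1, 2, 3], [2, 3, 4], [5, 6, 7]]
f_stat, df_b, df_w, eta_sq = one_way_anova(toy_groups)
```

An effect size near .02 or .03, as reported above, means that group membership accounts for only a small share of the total variance even when the F test is significant.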
Figure 7. Scatter plot of 5 random samples and the actual child data for ND.

Figure 8. Scatter plot of 5 random samples and the actual child data for WF.

Figure 9. Box plots of 95% confidence intervals around the means for 5 random samples and actual child data for ND.

Figure 10. Box plots of 95% confidence intervals around the means for 5 random samples and actual child data for WF.
These simulations suggest that the results for the 222 children were not artifactual but truly reflect the WF and ND of small and large lexicons. A word about variability is warranted. Given the distributions from the random samples, it is possible that the wide variability in z-ND and z-WF scores for children below −1.5 of the mean z-MCDI score simply reflects distribution properties of the word set rather than any real finding (see Figures 9 and 10). However, if this were the case, higher and lower WF and ND values might have been expected in the child data. Variability at the extremely low end of the distributions is raised in the Discussion. Figure 11 plots z-ND × z-WF for children at this extreme end of the vocabulary distribution.
Figure 11. Scatter plot of ND and WF z-scores for children below −1 on MCDI vocabulary scores.
Finally, to return to the issue of the relationship between ND and WF, the distributions of these variables in the 279 MCDI words used in this study were examined. The correlation between ND and De Cara and Goswami (2002)  WF was .07 (p = .26), and that between ND and CHILDES WF was .11 (p = .07).
Summary of Findings
These findings indicate that ND is a strong predictor of vocabulary size for 2-year-old children, accounting for 47% of the variance in MCDI scores, with WF contributing an additional 14% of variance accounted for. ND was inversely related to vocabulary size and was a stronger predictor of vocabulary size than WF, which was directly (not inversely) related to vocabulary size.
The relationships between ND and vocabulary size and WF and vocabulary size for small and large vocabularies were remarkably different. Children with low vocabulary scores appear to have learned words that are of high ND in the ambient language, but there was no clear pattern for WF; children with high vocabulary scores had learned words that were of low ND and high WF in the ambient language. Children who scored more than −1 SD below the mean on the MCDI (by age group) were much more variable on both ND and WF values than children who scored above the cut-point. Simulations showed that the results were authentic rather than epiphenomenal of the word set; however, the marked variability in ND and WF for children with very small vocabularies could reflect random distributions.
Discussion
This research addressed two questions: (a) How much variance in vocabulary size is accounted for by ND and WF together and independently in 2-year-old children? and (b) Is there a significant difference between children with small and large vocabularies in ND and WF? The first question and the outcomes for the sample as a whole are discussed before focusing on the children at the low end of the vocabulary continuum.
The Sample as a Whole
Together, the variables accounted for 61% of the variance in vocabulary size, making these variables strong predictors of the size of children’s lexicons. ND accounted for 47% of the variance in vocabulary size. WF also contributed 14% of unique variance accounted for. Small vocabularies were comprised of words that were of high ND and low WF in the ambient language. These results are partially congruent with those of Coady and Aslin (2003)  and Storkel (2004a)  who, in a case study and a corpora study, respectively, found that words that were known by younger children (i.e., acquired earliest) tended to have higher ND (i.e., have many neighbors in the ambient language, in this case English) and higher WF than words acquired later. The results are also congruent with Storkel (2008a)  who, using the same cohort database, reported that the best predictors of the percentage of children at each age who knew a given word were ND and word length. Storkel’s (2008a)  second analysis used the age at which 75% of children know a word as the dependent variable, and results indicated that although ND may be an important factor in word learning for very young children, the effect diminishes with age, suggesting that with increasing vocabulary size, children more readily learn words from sparse neighborhoods. The present results show conclusively—as has been claimed by several authors—that words that have many neighbors in the ambient language are learned earlier than words with fewer neighbors, suggesting once again that early in development, children are sensitive to common word patterns in the ambient language (Storkel, 2008b).
WF is where the current results differ from the extant literature. Prior results for WF are limited in number and not well explained. Storkel (2008a) found that WF was not a strong predictor of word learning but stated that “trends were in the expected direction with more infants knowing higher-frequency than lower-frequency words” (p. 313). Storkel (2004a) and Maekawa and Storkel (2006) reported that high-frequency words were learned earlier than low-frequency words. Goodman et al. (2008) provide the best insight to date, finding that when analysis includes both open and closed word categories, age of acquisition is directly related to WF, but the relationship is indirect when each word category is analyzed separately. Odd results can often be explained by the choice of study variables. Recall that the current research examined British English data, whereas previous reports focused on American English. This necessitated the use of ND and WF measures (De Cara & Goswami, 2002) that differed from those previously reported. It could be that this simple difference generated markedly different results. However, the CHILDES WF counts (Li & Shirai, 2000; MacWhinney, 2000) produced similar results. Another possibility is that early vocabularies are not comprised of high-frequency words. Table 1 shows that WF has not been studied extensively in the developmental literature. As this is the first time actual data from a large group of same-aged children have been explored, these results may reflect the true state of play for WF for emerging lexicons. Most results showing that early words were of high WF were gleaned from MCDI databases. Maekawa and Storkel (2006) found that only 1 of the 3 children they studied learned high-frequency words before low-frequency words. WF may need further exploration.
Children at the Low End of the Vocabulary Continuum
Striking results were noted when the relationships among these variables were considered separately for children with low (at or below 1 SD below the mean on the MCDI) and high (above this value) vocabulary scores. ND was fairly uniform for vocabulary sizes within ±1 SD of the mean. At 1.5 SDs below the mean for vocabulary, there is marked variability in ND values, with z ranging from −0.03 to 3.8. This band of MCDI scores could reveal patterns that resemble the distributions of random samples, or they could indicate the point at which children who stay late talkers and those who become late bloomers part company. Of all 27 of these children, only 9 scored between zero and −1 SD of the mean (i.e., within the normal range for ND), and they are circled on Figure 11. Perhaps those children who begin to learn words that have a lower ND in the ambient language (in this sample, 9 children) eventually normalize in vocabulary size, and perhaps children who do not begin to learn words of lower ND (n = 18) continue on as late talkers who develop a language impairment. The children who are between −2 and −3 SDs below the mean MCDI score have very high ND values relative to the rest of the sample (> 3 SDs above the mean ND score). There is very little variability in ND in these children. It is possible that these children, who are at the extremely low end of vocabulary scores, remain less able to learn words of lower ND from the ambient language input. Reasons for this, along with the WF findings, are discussed in the paragraphs that follow. These ND values are mean values for a given child’s lexicon, so of course some words, by definition, must have come from low ND sets; however, the pattern of ND in these low-vocabulary children suggests that this hypothesis requires investigation. These conclusions are of course speculative but worthy of further investigation in a longitudinal dataset.
For the low-vocabulary children, there was no significant correlation between vocabulary size and WF. This was a surprising result because the prediction based on prior research was that children at the low end of the vocabulary continuum would have learned words that were high in both ND and WF in the ambient language.
As with ND, when vocabulary size was very small (< −1.5), there was marked variability, with a spread of z-WF ranging from −3.2 to 2.17. However, only 3 of the children scored above zero (the mean). That is, all but 3 of the 27 children had very low WF values relative to the rest of the sample.
Why would high ND and low WF be a facilitatory cue for children struggling to learn language? There may be any number of ways to interpret these findings. Recent findings on speakers' modifications of vowel space and duration as a function of word frequency and density could explain the strong result that high ND and low WF pattern together for children with low vocabulary size. Wright (2004)  suggested that speakers implicitly regulate production to expand vowel space and increase word duration for low WF and high ND (i.e., lexically difficult) words in order to maximize listener perception. These two input features could be sufficient to raise the ambient salience of these words for toddlers, and, if so, high ND and low WF could be cues for learning for these children. Scarborough (2004)  claimed that words from dense neighborhoods received more activation than words from sparse neighborhoods during both perception and production, but low frequency words make lexical access more difficult. If this is the case, then experimental research is needed to examine the ability of very young children with high and low vocabulary scores to process (or learn) words that pattern as high ND/low WF versus words that pattern as low ND/high WF, and so forth.
Another interpretation, possibly related to the first, is that infants are adept at segmenting familiar word structures from the ambient stream of language. Here, high ND forms could be salient and more easily abstracted from the stream than words from sparse neighborhoods (Saffran & Graf Estes, 2006). It is not clear then what role low WF would play, except perhaps that of novelty.
Finally, as always, there are exceptions to the rule. Two children in the 27 identified by extremely low vocabulary scores patterned as high ND/high WF, and 1 patterned as average ND/high WF. Such exceptions are a reminder that a myriad of factors impinge on vocabulary learning, and a range of social, linguistic, affective, and cognitive variables could be implicated.
Limitations and Further Research
As with all studies of this nature, only monosyllabic words were used, given the difficulty of examining ND in multisyllabic words. It is possible that the results would be different if all MCDI words were included. Measurement of ND and WF should be explored in a large sample of children’s lexicons derived from spontaneous speech data or expressive/receptive vocabulary tests. Given the novel findings of this study, other lexical and sublexical factors should be examined in this same dataset, namely phonotactic probability, word length, and differences in patterns for word classes (e.g., nouns vs. verbs). Further, it could be informative to conduct a detailed examination of the phonological structure of words that children learn early (i.e., adult or target structure), or the types of neighbors that are learned early (e.g., rhyme, vowel, or onset neighbors), to shed light on the nature of cues in dense neighborhoods. These are rich avenues for further research that might elucidate the early onsets of language impairment.
Conclusion
This report has indicated that children at the lowest points of a continuum of vocabulary size may be extracting statistical properties of the input language in a manner quite different from their more able age peers. This finding should be paired with recently gleaned knowledge that these same children also have difficulty in processing nonword stimuli in a repetition task (Stokes & Klee, 2009a, 2009b). It would be worthwhile to track such children longitudinally to see if these indications of different phonological processing abilities at 2 years of age differentiate children who subsequently become late talkers (and language impaired) or late bloomers (normalized) by 4 years of age.
Acknowledgments
This project was funded by ESRC RES-000-22-0712. I thank Carmel Houston-Price and Graham Schafer for access to the Reading University research database; Jill Hearing and Sarah Fincham-Majumdar for excellent work as research assistants; and Northumbria University for their generous support of the project.
References
Baayen, R. H. (2001). Word frequency distributions. Dordrecht, the Netherlands Kluwer
Baayen, R. H. (2001). Word frequency distributions. Dordrecht, the Netherlands Kluwer×
Baayen, R. H., Piepenbrock, R., Gulikers, L. (1995). The CELEX lexical database (CD-ROM). Philadelphia, PA University of Pennsylvania, Linguistic Data Consortium
Baayen, R. H., Piepenbrock, R., Gulikers, L. (1995). The CELEX lexical database (CD-ROM). Philadelphia, PA University of Pennsylvania, Linguistic Data Consortium×
Bishop, D. V. M., Price, T. S., Dale, P. S., Plomin, R. (2003). Outcomes of early language delay: II. Etiology of transient and persistent language difficulties. Journal of Speech, Language, and Hearing Research. 46 561–575 [Article]
Bishop, D. V. M., Price, T. S., Dale, P. S., Plomin, R. (2003). Outcomes of early language delay: II. Etiology of transient and persistent language difficulties. Journal of Speech, Language, and Hearing Research. 46 561–575 [Article] ×
Charles-Luce, J., Luce, P. A. (1990). Similarity neighborhoods of words in young children’s lexicons. Journal of Child Language. 17 205–215 [Article] [PubMed]
Charles-Luce, J., Luce, P. A. (1990). Similarity neighborhoods of words in young children’s lexicons. Journal of Child Language. 17 205–215 [Article] [PubMed]×
Coady, J. A., Aslin, R. N. (2003). Phonological neighbourhoods in the developing lexicon. Journal of Child Language. 30 441–469 [Article] [PubMed]
Coady, J. A., Aslin, R. N. (2003). Phonological neighbourhoods in the developing lexicon. Journal of Child Language. 30 441–469 [Article] [PubMed]×
Dale, P. S., Fenson, L. (1996). Lexical development norms for young children. Behavior Research Methods, Instruments, & Computers. 28 125–127 [Article]
Dale, P. S., Fenson, L. (1996). Lexical development norms for young children. Behavior Research Methods, Instruments, & Computers. 28 125–127 [Article] ×
De Cara, B., Goswami, U. (2002). Similarity relations among spoken words: The special status of rimes in English. Behavior Research Methods, Instruments, & Computers. 34 416–423 [Article]
De Cara, B., Goswami, U. (2002). Similarity relations among spoken words: The special status of rimes in English. Behavior Research Methods, Instruments, & Computers. 34 416–423 [Article] ×
De Cara, B., Goswami, U.(n.d.). Statistical analysis of similarity relations among spoken words: Evidence for the special status of rimes in English. Retrieved from http://portail.unice.fr/jahia/page12414.html
De Cara, B., Goswami, U.(n.d.). Statistical analysis of similarity relations among spoken words: Evidence for the special status of rimes in English. Retrieved from http://portail.unice.fr/jahia/page12414.html×
Desmarais, C., Sylvestre, A., Meyer, F., Bairati, I., Rouleau, N. (2008). Systematic review of the literature on characteristics of late-talking toddlers. International Journal of Language and Communication Disorders. 43 361–389 [Article]
Desmarais, C., Sylvestre, A., Meyer, F., Bairati, I., Rouleau, N. (2008). Systematic review of the literature on characteristics of late-talking toddlers. International Journal of Language and Communication Disorders. 43 361–389 [Article] ×
Dollaghan, C. A. (1994). Children’s phonological neighbourhoods: Half empty or half full?. Journal of Child Language. 21 257–272 [Article]
Dollaghan, C. A. (1994). Children’s phonological neighbourhoods: Half empty or half full?. Journal of Child Language. 21 257–272 [Article] ×
Ellis Weismer, S. (2007). Typical talkers, late talkers, and children with specific language impairment: A language endowment spectrum?. Paul, R. Language disorders from a developmental perspective.  83–101 Mahwah, NJ Erlbaum
Ellis Weismer, S. (2007). Typical talkers, late talkers, and children with specific language impairment: A language endowment spectrum?. Paul, R. Language disorders from a developmental perspective.  83–101 Mahwah, NJ Erlbaum×
Frauenfelder, U. H., Baayen, R. H., Hellwig, F. M. (1993). Neighborhood density and frequency across languages and modalities. Journal of Memory and Language. 32 781–804 [Article]
Frauenfelder, U. H., Baayen, R. H., Hellwig, F. M. (1993). Neighborhood density and frequency across languages and modalities. Journal of Memory and Language. 32 781–804 [Article] ×
Gierut, J. A., Morrisette, M. L., & Champion, A. H. (1999). Lexical constraints in phonological acquisition. Journal of Child Language, 26, 261–294.
Goodman, J. C., Dale, P. S., & Li, P. (2008). Does frequency count? Parental input and the acquisition of vocabulary. Journal of Child Language, 35, 515–531.
Hadley, P. A., & Holt, J. K. (2006). Individual differences in the onset of tense marking: A growth-curve analysis. Journal of Speech, Language, and Hearing Research, 49, 984–1000.
Klee, T., & Harrison, C. (2001, July). CDI Words and Sentences: Validity and preliminary norms for British English. Paper presented at the Child Language Seminar, University of Hertfordshire, England.
Kučera, F., & Francis, W. (1967). Computational analysis of present day American English. Providence, RI: Brown University Press.
Landauer, T. K., & Streeter, L. A. (1973). Structural differences between common and rare words: Failure of equivalence assumptions for theories of word recognition. Journal of Verbal Learning and Verbal Behavior, 12, 119–131.
Li, P., & Shirai, Y. (2000). The acquisition of lexical and grammatical aspect. Berlin, Germany, & New York, NY: Mouton de Gruyter.
Luce, P., & Pisoni, D. (1998). Recognizing spoken words: The neighborhood activation model. The Journal of the Acoustical Society of America, 96, 40–55.
MacWhinney, B. J. (2000). The CHILDES project: Tools for analyzing talk (3rd ed.). Mahwah, NJ: Erlbaum.
Maekawa, J., & Storkel, H. L. (2006). Individual differences in the influence of phonological characteristics on expressive vocabulary development by young children. Journal of Child Language, 33, 439–459.
Moyle, M. J., Ellis Weismer, S., Evans, J. L., & Lindstrom, M. J. (2007). Longitudinal relationships between lexical and grammatical development in typical and late-talking children. Journal of Speech, Language, and Hearing Research, 50, 508–528.
Munson, B., & Solomon, N. P. (2004). The effect of phonological neighborhood density on vowel articulation. Journal of Speech, Language, and Hearing Research, 47, 1048–1058.
Nusbaum, H. C., Pisoni, D. B., & Davis, C. K. (1984). Sizing up the Hoosier Mental Lexicon: Measuring the familiarity of 20,000 words. Research on Speech Perception Progress Report No. 10. Bloomington, IN: Indiana University.
Paul, R. (1996). Clinical implications of the natural history of slow expressive language development. American Journal of Speech-Language Pathology, 5, 5–21.
Pisoni, D. B., Nusbaum, H. C., Luce, P. A., & Slowiaczek, L. M. (1985). Speech perception, word recognition and the structure of the lexicon. Speech Communication, 4, 75–95.
Rescorla, L. (1989). The language development survey: A screening tool for delayed language in toddlers. Journal of Speech and Hearing Disorders, 54, 587–599.
Rescorla, L. (2002). Language and reading outcomes to age 9 in late-talking toddlers. Journal of Speech, Language, and Hearing Research, 45, 360–371.
Rescorla, L., Mirak, J., & Singh, L. (2000). Vocabulary growth in late talkers: Lexical development from 2;0 to 3;0. Journal of Child Language, 27, 293–311.
Rescorla, L., & Roberts, J. (2002). Nominal versus verbal morpheme use in late talkers at ages 3 and 4. Journal of Speech, Language, and Hearing Research, 45, 1219–1231.
Reznick, J., & Goldsmith, L. (1989). A multiple form word production checklist for assessing early language. Journal of Child Language, 16, 91–100.
Rice, M. L., Taylor, C. L., & Zubrick, S. R. (2008). Language outcomes of 7-year-old children with or without a history of late language emergence at 24 months. Journal of Speech, Language, and Hearing Research, 51, 394–407.
Rost, G. C., & McMurray, B. (2009). Speaker variability augments phonological processing in early word learning. Developmental Science, 12, 339–349.
Saffran, J. R., & Graf Estes, K. (2006). Mapping sound to meaning: Connections between learning about sounds and learning about words. Advances in Child Development and Behavior, 34, 1–38.
Scarborough, R. A. (2004). Coarticulation and the structure of the lexicon (Doctoral dissertation, University of California, Los Angeles). Retrieved from http://www.linguistics.ucla.edu/faciliti/research/scarb_diss.pdf
Stokes, S. F., & Klee, T. (2009a). The diagnostic accuracy of a new test of early nonword repetition for differentiating late talking and typically developing children. Journal of Speech, Language, and Hearing Research.
Stokes, S. F., & Klee, T. (2009b). Factors that influence vocabulary development in two-year-old children. Journal of Child Psychology and Psychiatry, 50, 498–505.
Storkel, H. L. (2004a). Do children acquire dense neighborhoods? An investigation of similarity neighborhoods in lexical acquisition. Applied Psycholinguistics, 25, 201–221.
Storkel, H. L. (2004b). The emerging lexicon of children with phonological delays. Journal of Speech, Language, and Hearing Research, 47, 1194–1212.
Storkel, H. L. (2008a). Developmental differences in the effects of phonological, lexical and semantic variables on word learning by infants. Journal of Child Language, 26, 291–321.
Storkel, H. L. (2008b). First utterances. In G. Rickheit & H. Strohner (Eds.), Handbook of communication competence (pp. 125–147). Berlin, Germany: Mouton de Gruyter.
Thal, D. J., Reilly, J., Seibert, L., Jeffries, R., & Fenson, J. (2003). Language development in children at risk for language impairment: Cross population comparisons. Brain and Language, 88, 167–179.
Wright, R. (2004). Factors of lexical competition in vowel articulation. In J. Local, R. Ogden, & R. Temple (Eds.), Papers in laboratory phonology VI (pp. 75–87). Cambridge, England: Cambridge University Press.
Zubrick, S. R., Taylor, C. L., Rice, M. L., & Slegers, D. W. (2007). Late language emergence at 24 months: An epidemiological study of prevalence, predictors, and covariates. Journal of Speech, Language, and Hearing Research, 50, 1562–1592.
1. I am grateful to Associate Editor Benjamin Munson for this suggestion.
Figure 1

Scatter plot of z-scores for vocabulary and neighborhood density (ND).

Figure 2

Scatter plot of z-scores for vocabulary and word frequency (WF).

Figure 3

Error bar plots of ND for low- and high-vocabulary groups.

Figure 4

Error bar plots of WF for low- and high-vocabulary groups.

Figure 5

Scatter plot of MCDI × ND for 900 random samples. MCDI = MacArthur–Bates Communicative Development Inventory: Words and Sentences (Klee & Harrison, 2001).

Figure 6

Scatter plot of MCDI × WF for 900 random samples.

Figure 7

Scatter plot of 5 random samples and the actual child data for ND.

Figure 8

Scatter plot of 5 random samples and the actual child data for WF.

Figure 9

Box plots of 95% confidence intervals around the means for 5 random samples and actual child data for ND.

Figure 10

Box plots of 95% confidence intervals around the means for 5 random samples and actual child data for WF.

Figure 11

Scatter plot of ND and WF z-scores for children below −1 on MCDI vocabulary scores.

Table 1. Summary of the main findings for neighborhood density (ND) and word frequency (WF) characteristics in child vocabulary development.

| Author | Source | ND | WF |
|---|---|---|---|
| Storkel (2004a) | Checklist database | High | High |
| Storkel (2008a) | Checklist database | High | (Correlated with ND) |
| Coady & Aslin (2003) | Individual language samples (N = 2) | High | |
| Maekawa & Storkel (2006) | Individual language samples (N = 3) | Child 3 = low^a | Child 3 = high^a |
| Goodman, Dale, & Li (2008) | Checklist database | | Word-category dependent |
| Storkel (2004b) | Children with PD | Low (PP) | |
| Gierut, Morrisette, & Champion (1999) | Children with PD | Low | High |

Note. PD = phonological disorder. PP = phonotactic probability.
^a Child 1 and 2 = effects of word length.
Table 2. Table of standardized coefficients for the multiple regression predicting MCDI scores.

| Predictor | B | t | p | 95% confidence interval |
|---|---|---|---|---|
| ND | −.54 | −11.97 | .000 | [−.63, −.45] |
| WF | .40 | 8.94 | .000 | [.31, .49] |

Note. MCDI = MacArthur–Bates Communicative Development Inventory: Words and Sentences (Klee & Harrison, 2001).
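As a rough consistency check on Table 2, a confidence interval can be approximated from a coefficient and its t statistic, because t = B / SE. The sketch below is not the author's analysis: the helper name `ci_from_b_and_t` is hypothetical, and it assumes a normal critical value of 1.96 rather than the exact t critical value for the sample. Under those assumptions, the upper bound for WF comes out positive, near .49.

```python
def ci_from_b_and_t(b, t, z_crit=1.96):
    """Approximate 95% CI for a coefficient from its estimate and t statistic.

    Assumption: t = b / SE, and a normal critical value (1.96) is close
    enough to the t critical value for a sample of this size.
    """
    se = abs(b / t)              # standard error implied by t = b / se
    half_width = z_crit * se
    return (b - half_width, b + half_width)

# Values copied from Table 2: (standardized coefficient, t statistic).
table2 = {"ND": (-0.54, -11.97), "WF": (0.40, 8.94)}

for name, (b, t) in table2.items():
    lower, upper = ci_from_b_and_t(b, t)
    print(f"{name}: [{lower:.2f}, {upper:.2f}]")
```

An interval built this way is symmetric about B, so it always contains the estimate itself.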
Table 3. Correlations among MCDI, ND, and WF (all z-scores) for children in the low- and high-vocabulary groups.

| Variable | Low vocabulary: ND | Low vocabulary: WF | High vocabulary: ND | High vocabulary: WF |
|---|---|---|---|---|
| MCDI | −.63* | .06 | −.33* | .52* |
| ND | | −.05 | | −.06 |

*p < .001.
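The layout of Table 3 (correlations computed separately for children below and above a cut-point of −1 on the vocabulary z-score) can be sketched as follows. This is only an illustration under stated assumptions: the data are synthetic random placeholders, not the study's data; the function names are hypothetical; and z-scores are assumed to be computed over the whole sample before splitting.

```python
import math
import random

def zscores(values):
    """Standardize values using the sample mean and sample SD (n - 1)."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))
    return [(v - mean) / sd for v in values]

def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def correlations_by_group(vocab, nd, wf, cut=-1.0):
    """Split on the vocabulary z-score and correlate within each group."""
    vz, nz, wz = zscores(vocab), zscores(nd), zscores(wf)
    groups = {
        "low": [i for i, z in enumerate(vz) if z < cut],
        "high": [i for i, z in enumerate(vz) if z >= cut],
    }
    out = {}
    for name, idx in groups.items():
        v = [vz[i] for i in idx]
        d = [nz[i] for i in idx]
        w = [wz[i] for i in idx]
        out[name] = {
            "n": len(idx),
            "MCDI-ND": pearson_r(v, d),
            "MCDI-WF": pearson_r(v, w),
            "ND-WF": pearson_r(d, w),
        }
    return out

# Synthetic placeholder scores for 222 hypothetical children (NOT real data).
rng = random.Random(0)
vocab = [rng.gauss(0, 1) for _ in range(222)]
nd = [rng.gauss(0, 1) for _ in range(222)]
wf = [rng.gauss(0, 1) for _ in range(222)]

result = correlations_by_group(vocab, nd, wf)
for group, stats in result.items():
    print(group, {k: round(v, 2) for k, v in stats.items()})
```

Because the placeholder scores are independent random draws, the printed correlations carry no substantive meaning; only the grouping and computation steps are of interest.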