Research Article  |   February 01, 2017
Establishing Language Benchmarks for Children With Typically Developing Language and Children With Language Impairment
 
Author Affiliations & Notes
  • Mary Beth Schmitt
    Department of Speech, Language and Hearing Sciences, Health Sciences Center, Texas Tech University, Lubbock
  • Jessica A. R. Logan
    Crane Center for Early Childhood Research and Policy, The Ohio State University, Columbus
  • Sherine R. Tambyraja
    Crane Center for Early Childhood Research and Policy, The Ohio State University, Columbus
  • Kelly Farquharson
    Emerson College, Boston, MA
  • Laura M. Justice
    Crane Center for Early Childhood Research and Policy, The Ohio State University, Columbus
  • Disclosure: The authors have declared that no competing interests existed at the time of publication.
  • Correspondence to Mary Beth Schmitt: MaryBeth.Schmitt@ttuhsc.edu
  • Editor: Rhea Paul
  • Associate Editor: Marleen Westerveld
Journal of Speech, Language, and Hearing Research, February 2017, Vol. 60, 364-378. doi:10.1044/2016_JSLHR-L-15-0273
History: Received August 4, 2015; Revised December 21, 2015; Accepted April 28, 2016

Purpose Practitioners, researchers, and policymakers (i.e., stakeholders) have vested interests in children's language growth yet currently do not have empirically driven methods for measuring such outcomes. The present study established language benchmarks for children with typically developing language (TDL) and children with language impairment (LI) from 3 to 9 years of age.

Method Effect sizes for grammar, vocabulary, and overall language were calculated for children with TDL (n = 20,018) using raw score means and standard deviations from 8 norm-referenced measures of language. Effect sizes for children with LI were calculated using fall and spring norm-referenced language measures for 497 children with LI receiving business-as-usual therapy in the public schools.

Results Considerable variability was found in expected change across both samples of children over time, with preschoolers exhibiting larger effect sizes (d = 0.82 and 0.70, respectively) compared with school-age children (d = 0.49 and 0.55, respectively).

Conclusions This study provides a first step toward establishing empirically based language benchmarks for children. These data offer stakeholders an initial tool for setting goals based on expected growth (practitioners), making informed decisions on language-based curricula (policymakers), and measuring effectiveness of intervention research (researchers).

Language benchmarks, defined as a standard of achievement against which individual children's language growth might be measured, are commonly used to make judgments about children's development (see Burns, Midgette, Leong, & Bodrova, 2002; Tager-Flusberg et al., 2009). This topic is of great interest for practitioners, researchers, and policymakers alike. For practitioners, language benchmarks are critical for monitoring language development for children with typically developing language (TDL) and children with language impairment (LI). For the latter group, language benchmarks are important for measuring language growth during interventions relative to desired outcomes. Researchers rely heavily on benchmarks for a priori power calculations for achieving desired effects for language-based intervention studies and post hoc analysis of the practical significance of intervention effects (see Abraham & Russell, 2008). Policymakers utilize benchmarks to identify educational programs and curricula that are effective in bolstering children's language outcomes. To this end, practitioners, researchers, and policymakers, hereafter referred to collectively as stakeholders, each have a vested interest—or stake—in measuring children's language growth.
To date, great disparity exists in the form and effectiveness of benchmarks used to assess children's language growth (see Brindley, 1998; Burns et al., 2002; Roberts & Kaiser, 2012; Tager-Flusberg et al., 2009), and as yet no single empirically driven benchmark is available for expected language growth for children with TDL, nor for children with LI. In the present study, we address this need by calculating effect sizes of expected language growth to establish empirically driven language benchmarks for two populations of preschool and school-age children: those with TDL and those with LI.
Significance of Language Development
Children's early language development has long-standing implications for short-term and long-term success across social and academic outcomes (Young et al., 2002). Children's language skills, including the domains of grammar (form), vocabulary (content), and listening comprehension and pragmatics (use), are foundational for mastery of oral language competencies (i.e., expressing needs and comprehending spoken messages; L. Bloom & Lahey, 1978) and development of literate language (e.g., reading and writing; Paul & Norbury, 2012). A large corpus of empirical research links language competency at school entry with school readiness (i.e., following directions and answering questions; Justice, Bowles, Pence Turnbull, & Skibbe, 2009; Rimm-Kaufman, Pianta, & Cox, 2000) and academic success in reading (e.g., Catts, Fey, Zhang, & Tomblin, 1999), writing (e.g., Mackie, Dockrell, & Lindsay, 2013), math (e.g., Abedi & Lord, 2001), and science (e.g., McDermott, Rikoon, & Fantuzzo, 2014). This body of research suggests that children with stronger language skills demonstrate a stronger command of content areas compared with peers with weaker language skills (Adams, 2010; Snow, 2010). In fact, language abilities in preschool children have been shown to predict language and academic outcomes for beginning kindergarten students (Zucker, Cabell, Justice, Pentimonti, & Kaderavek, 2013) and later elementary students (Catts, 1993; Catts, Fey, Tomblin, & Zhang, 2002; Fey, Catts, Proctor-Williams, Tomblin, & Zhang, 2004) and to predict vocational success (Johnson, Beitchman, & Brownlie, 2010).
Recent research suggests that the rate at which children acquire language, rather than mastery of discrete language skills alone, informs later achievement (Rowe, Raudenbush, & Goldin-Meadow, 2012). Rowe et al. (2012)  found that preschoolers who demonstrated accelerated growth of language (i.e., vocabulary) at 30 months of age were more likely to exhibit increased vocabulary skills at kindergarten entry than were children with equal vocabulary skills but slower growth. Given the significant relation between language and children's short- and long-term outcomes and the emerging research showing added benefit for children whose language grows at a faster rate (Rowe et al., 2012), it is critical that stakeholders be able to both identify children's current achievement relative to language development and measure children's language growth as an indicator of language and academic success.
Current Language Benchmarks
Language benchmarks have been operationally defined in a myriad of ways, including “learner performance standards” (Brindley, 1998, p. 46), “knowledge and skills students gain over time” (Burns et al., 2002, p. 3), and “a framework for describing language progress” (Tager-Flusberg et al., 2009, p. 644) and have been represented in equally varied forms (e.g., developmental milestones, criterion-referenced measures, and curriculum standards; Bancroft, 2010; Burns et al., 2002). For instance, practitioners often rely on established milestones of normal development as a benchmark of children's language growth and to determine when children may deviate from what is normally expected. As an example, a long-standing benchmark, based on the seminal work of Roger Brown (1968), is that 4-year-old children can produce regular forms of past tense verbs (e.g., played), and a child's omission of past tense markers may signal slower than expected growth in language.
A large volume of language development research in the late 1960s and 1970s helped to solidify milestones in typical language development, such as when children are likely to begin to babble and gesture, produce their first word, produce early sentences, and express a variety of grammatical morphemes (e.g., Brown, 1968, 1973). These milestones, although well documented, come from longitudinal studies of very few children (n = 3; Brown, 1968) and as such are limited in the extent to which they generalize to diverse populations and might be used to assess incremental progress in language development.
Another type of benchmark that has emerged in the last decade, due to large-scale educational reforms, is the content standard. Content standards are curricular expectations that inform children's learning objectives within a given content area (e.g., English-language arts) on a grade-by-grade basis, with achievement tests used to assess children's learning relative to these standards. Content standards are typically locally developed, although the Common Core standards represent a recent effort to establish a unified set of expectations for children across the United States (Common Core State Standards Initiative, 2011). For example, one core English-language arts standard for kindergarten is to “identify new meanings for familiar words and apply them accurately” (Common Core State Standards Initiative, 2011). The Common Core Standards represent a nationwide effort to identify educational expectations for all children. However, use of content standards as language benchmarks has many limitations, including determining mastery of each standard, the inability to measure incremental progress, and limited relation to research (Bancroft, 2010; Brindley, 1998; Burns et al., 2002), which minimizes the extent to which these benchmarks can reliably be used as metrics for children's language growth.
Effect sizes, drawn from large-scale cross-sectional research, are another form of benchmark often utilized by researchers to measure change. Effect sizes can be defined mathematically as the difference between two means expressed in standard deviation units (Cohen, 1992). In practical terms, effect sizes represent the magnitude of change observed (Fritz, Morris, & Richler, 2012). By extension, an effect-size benchmark represents the amount of expected change (in standard deviation units) that a child might make in a single year (H. S. Bloom, Hill, Black, & Lipsey, 2008). Utilizing effect sizes to document children's language growth may be more rigorous because these benchmarks are empirically derived, measurable, and rooted in research (Ellis, 2010).
Effect sizes enable professionals to understand the magnitude of change and to determine whether that magnitude is of practical significance (Hill, Bloom, Black, & Lipsey, 2008). Practical significance is the extent to which a statistically significant finding matters to the populations studied (see Bain & Dollaghan, 1991). For instance, assume that researchers tracked the progress in language skills of children with LI from the beginning to the end of the school year. Results indicate a significant difference (p = .01) between the fall and spring scores with an effect size of 0.01, suggesting that the children with LI improved their language skills by only 0.01 SD. Although statistically significant, this progress has little practical significance (i.e., meaningful change in the population). In contrast, consider the same study with the same significance level (p = .01) but with an effect size of 1.0. In this scenario, children with LI made 1.0 SD of improvement over the academic year. This study now has both statistical and practical significance. The difficulty, however, is that effect sizes (i.e., measures of practical significance) alone offer little information to professionals without a comparison point. In other words, the effect size from a particular study must be compared with that of a reference group to allow adequate assessment of the practical significance of children's language growth.
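The gap between statistical and practical significance in this example can be made concrete with a rough sketch. The code below is illustrative only and rests on assumptions that are not part of the study: it treats the comparison as two independent, equal-sized groups and uses a normal (z) approximation, whereas a fall-to-spring design would normally be analyzed as repeated measures. The function names are hypothetical. Under those assumptions it estimates how many children per group would be needed for a given effect size to reach roughly p = .01.

```python
import math

def two_sided_p(d, n_per_group):
    """Approximate two-sided p-value for a standardized mean difference d
    between two independent, equal-sized groups (normal approximation)."""
    z = d * math.sqrt(n_per_group / 2)
    return math.erfc(abs(z) / math.sqrt(2))

def n_per_group_for_p01(d, z_crit=2.576):
    """Smallest per-group sample size at which d reaches about p = .01
    (two-sided), under the same simplifying assumptions."""
    return math.ceil(2 * (z_crit / d) ** 2)

for d in (0.01, 1.0):
    n = n_per_group_for_p01(d)
    print(f"d = {d}: n per group = {n}, p = {two_sided_p(d, n):.3f}")
```

Under these assumptions, an effect of 0.01 SD reaches p = .01 only with well over 100,000 children per group, whereas an effect of 1.0 SD does so with fewer than 20. The p value therefore reflects sample size as much as the magnitude of change.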
Current Methods for Interpreting Effect Sizes
To date, stakeholders do not have an empirically driven reference group to judge the relevance of effect sizes specific to children's language growth. Rather, Cohen's d guidelines and previous intervention studies are consistently used as metrics for interpreting effect sizes (Cohen, 1988; Ellis, 2010). Each method and the corresponding drawbacks as a standard for language growth are discussed in turn.
Cohen's d
Traditionally, researchers in the social and behavioral sciences have relied on Cohen's guidelines to assess meaningful change, with an effect size of 0.2 representing small change, 0.5 representing moderate change, and 0.8 representing large change (Ellis, 2010). Cohen (1988)  developed these guidelines based on a review of existing research at that time and indicated that the guidelines for assessing effect sizes were established as a rule of thumb rather than as a rule. Independent consideration for the study's population and the intervention were important in assessing the actual effectiveness of the designs (Cohen, 1988; Hill et al., 2008).
Concerns exist about using one standard set of guidelines for interpreting intervention effects; however, without an established benchmark of typical language growth, stakeholders do not have population-specific indices to assess growth. In other words, Cohen's standards for interpreting effect sizes are arbitrary without information on the average expected growth in a particular population—for our purposes, children with TDL and children with LI.
Precedence for Establishing Empirically Driven Benchmarks
Bloom and colleagues (H. S. Bloom et al., 2008; Hill et al., 2008) utilized an empirical approach to establish benchmarks for reading and math. Their methodology relied on published data from nationally normed tests to estimate the amount of growth an average child can be expected to make in 1 year. Each norm-referenced test (i.e., a standardized assessment administered to a large, nationally representative sample of children) provided means and standard deviations for the test broken down by age or grade. The effect sizes used by H. S. Bloom et al. (2008)  were calculated as the difference between the mean score for children 1 year apart in age. For example, to estimate how much growth should occur between ages 5 and 6, the mean raw score for the published normative sample at age 5 was subtracted from the mean raw score published for the normative sample at age 6. Bloom and colleagues relied on the raw scores rather than a standard score because the mean of each age group of scores after standardization would be, by definition, identical. Because the samples for the two age groups were both nationally representative, Bloom et al. argued that the difference between the two means is a valid representation of the growth we should expect students to make during this time.
Because each individual test may have issues with reliability or scaling across ages, H. S. Bloom et al. (2008) recommended that researchers repeat this process and combine results across multiple tests. In this way, the Bloom method is a type of meta-analysis; it combines the means and standard deviations from normative samples in several published standardized tests weighted by the included sample size. The work of Bloom and colleagues (H. S. Bloom et al., 2008; Hill et al., 2008) represents the first attempt at addressing the limitations associated with Cohen's standards for interpreting effect sizes for children's expected growth in reading and math. However, we are not aware of any such benchmarks established for children's expected growth in language.
Study Rationale and Aims
Stakeholders have a vested interest in measuring children's language; however, few reliable benchmarks are available to judge the magnitude and practical significance of observed language growth. Of the current benchmarks available, effect sizes offer the most objective and systematic form of documenting children's language growth. However, without established effect sizes for this population, stakeholders cannot accurately interpret such values. To address this significant need, the present study takes an initial step in establishing an empirically driven set of benchmarks for evaluating language growth for children. Specifically, this study was conducted to (a) determine the average amount of language growth expected for children with TDL and (b) determine the average amount of language growth expected for children with LI.
Method
The present project used the analytic process of H. S. Bloom et al. (2008)  to generate language benchmarks for two populations: children with TDL and children with LI. We calculated effect sizes corresponding to 1 year of growth for preschool (i.e., 3-year-olds) through early elementary-age children (i.e., 9-year-olds) utilizing data from norm-referenced measures of language (see Appendix A).
Study Aim 1
To develop benchmarks for children with TDL, we used population data from eight norm-referenced measures of language (see Appendix B) selected to represent the most common language assessments used clinically (Betz, Eickhoff, & Sullivan, 2013; Huang, Hopkins, & Nippold, 1997) and for research purposes (e.g., Catts et al., 1999; Fey et al., 2004). The pooled sample size across six age ranges for the eight language measures included over 20,000 children (n = 20,018). Each subtest within these eight norm-referenced measures was classified into the language domains targeted. The majority of subtests targeted expressive or receptive grammar and expressive or receptive vocabulary (n = 27; 77%), which mirrors the primary outcomes of interest in two meta-analyses of language interventions (see Law, Garrett, & Nye, 2004; Nye, Foster, & Seaman, 1987). The remaining subtests (n = 8; 23%) targeted integration of multiple language domains or cognitive processes (e.g., vocabulary, syntactic structure, and word retrieval) or targeted supralinguistic domains (i.e., higher level language such as inferencing). Thus, these subtests were labeled as targeting either multiple or supralinguistic domains (see Appendix B for a comprehensive list). Given the predominance of subtests targeting grammar and vocabulary and the variability—as well as sparseness—of subtests representing multiple and supralinguistic language domains, effect sizes were calculated for grammar, vocabulary, and overall language (which included all subtests, regardless of language domain).
Study Aim 2
For the second research aim, we used data collected in two large-scale, multicohort studies of children with LI. A critical distinction for both of these studies is that they represent children with LI receiving business-as-usual language therapy in the public schools. These children were part of studies that either introduced print awareness intervention in the classrooms (see STAR-2 information below) or sought to capture representative treatment as delivered on a daily basis by speech-language pathologists without systematic manipulation of treatment procedures (see STEPS information below). As such, children with LI in the present study represent “average” children with LI receiving language therapy by a speech-language pathologist in the public schools. To address the second aim, we used fall and spring assessment data collected in each study to generate effect sizes for language benchmarks. As with study aim 1, we classified each subtest into one of four categories: grammar, vocabulary, multiple, or supralinguistic (see Appendix C).
The first study, Sit Together and Read–2 (STAR-2), involved 314 children with disabilities in 83 early childhood special education classrooms, 90% of whom had individualized education plans (n = 302). The children were enrolled in STAR-2 in three consecutive cohorts (2008–2009, 2009–2010, and 2010–2011). As a part of STAR-2, children received a comprehensive battery of individualized assessments during a 6-week window in the fall and spring. The assessments were the Kaufman Brief Intelligence Test–2 (Kaufman & Kaufman, 2004), the Clinical Evaluation of Language Fundamentals–Preschool (Wiig, Secord, & Semel, 2004), and the Test of Preschool Early Literacy (Lonigan, Wagner, Torgesen, & Rashotte, 2007), which are used here to benchmark language growth for children with LI. In STAR-2, the children were assigned to three intervention conditions focused on improving children's print knowledge; none of the conditions were focused on improving children's language skills. We tested the following hypothesis: There were no significant differences in language gain between the control and treatment conditions. Thus, for our purposes, we included data from children in all three of the STAR-2 study conditions and excluded those who had significant comorbid diagnoses (n = 48) and who were not currently receiving treatment for LI (n = 20; missing data n = 5). The final sample from STAR-2 in this study was 229 children with LI. Descriptive statistics for these 229 children are presented in Table 1; see Justice, Logan, Kaderavek, and Dynia (2015)  for comprehensive procedures for STAR-2.
Table 1. Descriptive statistics for children with language impairment by sample.

Attribute              | N (%)    | M (SD)        | Minimum     | Maximum
STAR-2                 |          |               |             |
  Core language        | 229      | 77.31 (16.81) | 45          | 131
  Cognition            | 153      | 83.89 (17.63) | 53          | 124
  Age (months)         | 229      | 51 (7.3)      | 36          | 67
    36–47 (=3 years)   | 73 (35)  |               |             |
    48–59 (=4 years)   | 107 (50) |               |             |
    60–71 (=5 years)   | 31 (15)  |               |             |
  SES (income)         | 198      | 10.19 (6)     | 1 (<$5,000) | 18 (>$85,000)
  Ethnicity            |          |               |             |
    White              | 159 (77) |               |             |
    African American   | 28 (14)  |               |             |
    Other              | 18 (9)   |               |             |
STEPS                  |          |               |             |
  Core language        | 266      | 69.79 (16.39) | 40          | 115
  Cognition            | 266      | 88.52 (11.62) | 44          | 131
  Age (months)         | 268      | 75.9 (8.59)   | 59          | 96
    60–71 (=5 years)   | 82 (33)  |               |             |
    72–83 (=6 years)   | 119 (47) |               |             |
    84–95 (=7 years)   | 50 (20)  |               |             |
  SES (income)         | 203      | 9.8 (5.72)    | 1 (<$5,000) | 18 (>$85,000)
  Ethnicity            |          |               |             |
    White              | 145 (54) |               |             |
    African American   | 27 (10)  |               |             |
    Other              | 29 (14)  |               |             |

Note. STAR-2 = Sit Together and Read–2; SES = socioeconomic status; STEPS = Speech Therapy Experiences in Public Schools.
The second study, Speech Therapy Experiences in Public Schools (STEPS), involved 293 children in the early primary grades who were receiving speech-language services per an individualized education plan and whose primary disability was LI. The children were enrolled in STEPS in three consecutive cohorts (2009–2010, 2010–2011, and 2011–2012). As part of STEPS, children received a comprehensive battery of individualized assessments in the fall and spring of the academic year, which included the Kaufman Brief Intelligence Test–2 (Kaufman & Kaufman, 2004), the Clinical Evaluation of Language Fundamentals–Fourth Edition (Semel, Wiig, & Secord, 2003), and the Woodcock-Johnson Test of Achievement (WJ-III; Woodcock, McGrew, & Mather, 2001). For our purposes, we included all children in STEPS except those who had significant comorbid diagnoses (n = 25); the final sample from STEPS in this study was 268 children with LI. Descriptive statistics for these children are presented in Table 1; see Schmitt, Justice, Logan, Schatschneider, and Bartlett (2014) for comprehensive procedures of STEPS. Both STAR-2 and STEPS met institutional review board standards, and all authors were approved for participation in the present research project. Appendix C provides a complete list of measures and related subtests for each study.
Results
Establishing Benchmarks for Children With TDL
To address the first study aim, following the example of H. S. Bloom et al. (2008)  and Hill et al. (2008), we used a two-step process to obtain effect sizes of language growth in six age groups of children with TDL. First, raw score means and standard deviations of each language subtest were collected from the technical manuals of eight norm-referenced language measures (see Appendix B for a list of measures). The raw score means and standard deviations for each age, from 3 to 9 years, were included as available.
Second, we used the mean raw scores and standard deviations to calculate Hedges's g for the effect size of language growth between each pair of observed scores for adjacent ages (e.g., 3 and 4 years old) for each language subtest. Hedges's g provides a weighted effect size based on sample size (Hedges, 1982). Hedges's g was the most appropriate effect-size calculator for this analysis because multiple samples were combined, and the sample size differed for each norm-referenced measure (Hedges, 1982; see Table 2 for sample sizes). Hedges's g is interpreted the same as Cohen's d (i.e., the difference between two scores expressed in standard deviation units). The difference is that Hedges's g provides an average estimate across several samples, whereas Cohen's d provides an estimate for a single sample. Therefore, comparison of findings across these statistical metrics (Cohen's d and Hedges's g) is possible because of the similarities in calculation. To calculate Hedges's g, the difference between the raw scores for adjacent age groups is divided by the pooled standard deviation. This can be expressed in the following equation:

$$ g_i = \frac{\bar{Y}_{iA} - \bar{Y}_{iB}}{\sqrt{\dfrac{(n_{iA} - 1)\,S_{iA}^{2} + (n_{iB} - 1)\,S_{iB}^{2}}{n_{iA} + n_{iB} - 2}}} \tag{1} $$
Table 2. Weighted effect sizes (ES, Hedges's g) across ages for each language measure for children with typically developing language.

        | Age 3–4      | Age 4–5      | Age 5–6      | Age 6–7      | Age 7–8      | Age 8–9
Measure | n     | ES   | n     | ES   | n     | ES   | n     | ES   | n     | ES   | n     | ES
CASL    | 400   | 0.74 | 300   | 1.04 | 200   | 1.91 | 200   | 1.01 | 200   | 0.39 | 200   | 0.62
CELF-4  | NP    | NP   | NP    | NP   | 200   | 0.87 | 300   | 0.84 | 400   | 0.45 | 400   | 0.75
EVT     | 200   | 1.33 | 210   | 0.55 | 235   | 0.87 | 325   | 0.80 | 400   | 0.56 | 400   | 0.5
OWLS    | 400   | 0.97 | 325   | 0.65 | 250   | 0.95 | 250   | 0.68 | 251   | 0.56 | 249   | 0.66
EOWPVT  | 314   | 0.75 | 414   | 0.71 | 426   | 0.67 | 449   | 0.60 | 412   | 0.55 | 375   | 0.46
PPVT    | 200   | 1.33 | 210   | 0.65 | 235   | 0.78 | 325   | 0.86 | 400   | 0.58 | 400   | 0.59
TOLD-4  | NP    | NP   | 348   | 0.52 | 450   | 0.66 | 534   | 0.42 | 492   | 0.41 | NP    | NP
WJ-III  | 1,280 | 0.66 | 1,403 | 0.48 | 1,350 | 0.69 | 1,311 | 0.53 | 1,387 | 0.49 | 1,571 | 0.29

Note. CASL = Comprehensive Assessment of Spoken Language; CELF-4 = Clinical Evaluation of Language Fundamentals–Fourth Edition; NP = one or both of the means were not provided in the test manual; EVT = Expressive Vocabulary Test (Williams, 2007); OWLS = Oral and Written Language Scales (Carrow-Woolfolk, 1995); EOWPVT = Expressive One Word Picture Vocabulary Test (Brownell, 2000); PPVT = Peabody Picture Vocabulary Test (Dunn & Dunn, 2007); TOLD-4 = Test of Language Development–Fourth Edition (Newcomer & Hammill, 2008); WJ-III = Woodcock-Johnson Test of Achievement–Third Edition (McGrew et al., 2007).
In this equation, $\bar{Y}_{iA}$ is the mean score on a given test or subtest ($i$) for children at a given age ($A$), and $\bar{Y}_{iB}$ is the mean score on the same test or subtest ($i$) for the sample 1 year younger ($B$). In the denominator, $n_{iA}$ is the number of participants who were included in the estimate of the mean at age $A$ ($\bar{Y}_{iA}$), and $S_{iA}$ represents the standard deviation for subtest $i$ at age $A$; the same follows for age $B$.
To illustrate how the effect sizes were calculated, we walk through the calculation of expected language growth on the Picture Vocabulary subtest of the WJ-III (Woodcock et al., 2001) for 3- to 4-year-old children. The mean raw score¹ reported in the technical manual for Picture Vocabulary at age 3 ($\bar{Y}_{iB}$) is 445.34 (SD = 19.66, n = 308) and at age 4 ($\bar{Y}_{iA}$) is 460.60 (SD = 17.46, n = 391). The expected language growth from age 3 to age 4 was calculated by subtracting the average raw scores at each age (15.29 raw score change points) and dividing by the pooled, weighted standard deviation of the two samples,
$$\sqrt{\frac{(391 - 1)\,17.46^{2} + (308 - 1)\,19.66^{2}}{391 + 308 - 2}} = 18.46, \tag{2}$$
resulting in an effect size (g) for 3- to 4-year-olds on the Picture Vocabulary subtest of 0.83. This is interpreted as a change of 0.83 SD from 3 to 4 years old. Using this process, effect sizes (g) were calculated for each age group across all indicated language measures (see Table 2).
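The walk-through above can be reproduced with a short Python sketch. The function names are illustrative rather than taken from the study, and the sketch assumes the effect size is simply the mean difference divided by the pooled, weighted standard deviation, with no additional small-sample correction. Using the means as quoted (a difference of about 15.26 points), the result rounds to the same 0.83.

```python
import math


def pooled_sd(n_a, sd_a, n_b, sd_b):
    """Pooled, weighted standard deviation of two independent samples."""
    numerator = (n_a - 1) * sd_a ** 2 + (n_b - 1) * sd_b ** 2
    return math.sqrt(numerator / (n_a + n_b - 2))


def yearly_growth_effect_size(mean_a, sd_a, n_a, mean_b, sd_b, n_b):
    """Mean difference (older minus younger sample) in pooled-SD units."""
    return (mean_a - mean_b) / pooled_sd(n_a, sd_a, n_b, sd_b)


# WJ-III Picture Vocabulary values quoted in the text (A = age 4, B = age 3).
sd_pooled = pooled_sd(391, 17.46, 308, 19.66)
g = yearly_growth_effect_size(460.60, 17.46, 391, 445.34, 19.66, 308)

print(round(sd_pooled, 2))  # 18.46
print(round(g, 2))          # 0.83
```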
The effect sizes for each subtest were then reported in two ways: per norm-referenced measure and per language domain (i.e., grammar, vocabulary, and overall language). First, one unique effect size was calculated for each norm-referenced measure (see Table 2) by using the average effect size and corresponding sample size for each subtest on that measure. For example, to determine an overall effect size for the WJ-III, the effect sizes for both language subtests (i.e., Picture Vocabulary and Oral Language) were weighted by the number of children represented (e.g., Picture Vocabulary: n = 699; Oral Language: n = 581) and divided by the total sample size (N = 1,280) to determine an overall effect size for that measure. For the WJ-III:
$$\text{average } g = \frac{(.45 \times 581) + (.83 \times 699)}{1{,}280} = 0.66. \tag{3}$$
All effect sizes for the norm-referenced measures are presented in Table 2, and each unique sample provided only one estimate to each effect size.
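The sample-size weighting described for the WJ-III can be sketched in a few lines of Python. The helper name and the list-of-pairs layout are assumptions made for illustration; only the subtest effect sizes and sample sizes come from the text.

```python
def weighted_average_effect_size(estimates):
    """Sample-size-weighted mean of (effect_size, n) pairs."""
    total_n = sum(n for _, n in estimates)
    return sum(g * n for g, n in estimates) / total_n


# WJ-III subtests for ages 3-4: Oral Language and Picture Vocabulary.
wj3_subtests = [(0.45, 581), (0.83, 699)]
wj3_g = weighted_average_effect_size(wj3_subtests)

print(round(wj3_g, 2))  # 0.66
```

Because the weights are the subtest sample sizes, the subtest with more children (Picture Vocabulary) pulls the combined estimate toward its own value.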
Resulting effect sizes across measures ranged from 0.48 to 1.33 for preschool children and from 0.39 to 1.91 for school-age children. For instance, from age 4 to age 5, children are expected to grow 0.48 SD on the WJ-III but are expected to grow 1.0 SD (g = 1.04) on the Comprehensive Assessment of Spoken Language (Williams, 1999). Generally speaking, effect sizes decreased as children's ages increased (e.g., yearly growth on the Expressive Vocabulary Test for 3- to 4-year-old children is 1.32 and for 8- to 9-year-old children is 0.51).
The subtests were then used to calculate unique effect sizes for two language domains, grammar and vocabulary, and for an overall language estimate (i.e., all subtests: grammar, vocabulary, multiple, and supralinguistic domains). Effect sizes were calculated by first obtaining one estimate of the domain of interest (grammar, vocabulary, or overall) per norm-referenced measure (k) and then combining each estimate across measures using meta-analytic weighting. Results of this analysis are presented in Table 3. Effect sizes across language domains ranged from 0.55 to 0.95 for preschool children and from 0.37 to 0.81 for school-age children. Within the vocabulary and overall language domains, greater growth was found between ages 3 and 6, but in the grammar domain, growth was greater in ages 5–7. Similar to the pattern seen across measures, effect sizes generally decreased as age increased.
Table 3. Weighted effect sizes (Hedges's g) for each domain across ages for children with typically developing language.
Domain Age 3–4 Age 4–5 Age 5–6 Age 6–7 Age 7–8 Age 8–9
Grammar 0.58 0.71 0.81 0.70 0.37 0.77
Vocabulary 0.95 0.55 0.68 0.61 0.52 0.46
Overall language
M 0.82 0.60 0.74 0.64 0.49 0.44
SD 0.39 0.17 0.11 0.20 0.14 0.16
k 9 10 11 11 10 9
 Pooled n 2,794 3,210 3,346 3,694 3,691 3,346
Note. Overall language represents all subtests. Weighted effect sizes are estimated across 11 independent samples. k = number of independent samples represented at that age. Standard deviation is for the k independent samples at that age.
The results of the overall language domain are presented in Figure 1. In this figure, we include the mean observed effect size at each given age flanked by the minimum and maximum observed effect size across all measures. This figure demonstrates that there is relatively high uncertainty for young children (ages 3–4) in terms of how much they grow in 1 year.
Figure 1. Observed effect sizes across all measures for children with typically developing language. The middle line represents the mean, and the top and bottom of the bars represent the observed maximum and minimum effect sizes, respectively, for each age.
Establishing Benchmarks for Children With LI
Our second research aim was to generate effect sizes for children with LI. To address the second research aim, we used data from two longitudinal studies of children with LI (STAR-2 and STEPS) and used a two-step process similar to that utilized for the first research aim. First, we collected raw score means and standard deviations from each language subtest in the fall and spring of the academic year, based on children's ages in the fall (see Appendix C for a list of measures).
Second, we used the means and standard deviations to calculate a Cohen's d for effect size of language growth over an academic year. Cohen's d was used for this research aim rather than Hedges's g because only one sample was used for each estimate. With only one sample, the estimate did not need to be weighted by sample size. As noted previously, Cohen's d is interpreted the same way as is Hedges's g, which allows for comparison across samples. Effect sizes were calculated separately based on children's ages at the beginning of the year (e.g., the mean fall score for children who were 3 years old at the beginning of the year was calculated separately from the mean fall score for children who were 4 years old at the beginning of the year) and represent the average language growth for those children over an academic year. For example, the average fall raw score mean on the Word Structure subtest from the Clinical Evaluation of Language Fundamentals–Preschool: Second Edition for participating 3-year-olds was 4.63 (SD = 4.97), and their spring raw score mean was 8.05 (SD = 5.74). To calculate Cohen's d for this group of children, the fall mean score was subtracted from the spring mean score (3.42 raw score points of change) and divided by the pooled standard deviation (5.36) to achieve an effect size of 0.64.
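As a rough check on the Word Structure example, the sketch below computes Cohen's d in Python. It assumes the pooled standard deviation is the root mean square of the fall and spring standard deviations (equal weighting); the text does not spell out the pooling formula, so this is an assumption. Under it, the pooled value comes out near 5.37 rather than the reported 5.36, a difference that does not change the rounded effect size.

```python
import math


def cohens_d(mean_fall, sd_fall, mean_spring, sd_spring):
    """Fall-to-spring change in pooled-SD units (equal-weight pooling assumed)."""
    pooled = math.sqrt((sd_fall ** 2 + sd_spring ** 2) / 2)
    return (mean_spring - mean_fall) / pooled


# CELF-P:2 Word Structure, 3-year-olds: fall and spring raw score means and SDs.
d = cohens_d(4.63, 4.97, 8.05, 5.74)

print(round(d, 2))  # 0.64
```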
Similar to our method for study aim 1, we report the effect sizes for children with LI in two ways: per subtest of language and per language domain. First, an estimate of the observed effect sizes at each age was calculated for each language subtest (see Table 4). For most ages, only one sample of children (i.e., STAR-2 or STEPS) provided information because of specific ages of the samples (preschool population in STAR-2 and school-age population in STEPS), with the exception of age 5. Table 4 displays the effect sizes across language subtests, which ranged from 0.26 to 0.83 for preschool children with LI and from 0.39 to 0.65 for school-age children with LI. Overall, larger effect sizes were found for preschool children with LI than for school-age children with LI.
Table 4. Effect sizes (d) for children with language impairment by age and subtest.
Subtest Age 3 Age 4 Age 5A Age 5B Age 6 Age 7
n d n d n d n d n d n d
CELF-P:2 and/or CELF-4
 Sentence Structure 73 0.70 108 0.70 31 0.59 NG NG NG NG NG NG
 Word Structure 72 0.64 106 0.77 31 0.26 82 0.56 119 0.48 50 0.57
 Expressive Vocabulary 72 0.73 108 0.83 31 0.36 NG NG NG NG NG 6
 Recalling Sentences 72 0.68 106 0.76 31 0.66 82 0.45 118 0.40 50 0.39
 Concepts and Following Directions 69 0.64 106 0.61 31 0.67 82 0.51 119 0.45 50 0.64
 Basic Concepts 71 0.72 107 0.67 NG NG NG NG NG NG NG NG
 Formulated Sentences NG NG NG NG NG NG 82 0.59 119 0.65 50 0.62
TOPEL: Expressive Vocabulary 74 0.79 106 0.64 31 0.51 NG NG NG NG NG NG
Woodcock-Johnson: Picture Vocabulary NG NG NG NG NG NG 82 0.46 118 0.50 50 0.54
Note. Ages 3, 4, and 5A represent data collected on the CELF-P:2 from the Sit Together and Read–2 sample; ages 5B, 6, and 7 represent data collected on the CELF-4 from the STEPS sample. Effect sizes represent average language growth over an academic year. CELF-P:2 = Clinical Evaluation of Language Fundamentals–Preschool: Second Edition; CELF-4 = Clinical Evaluation of Language Fundamentals–Fourth Edition; NG = subtest was not given to this sample at this age range; TOPEL = Test of Preschool Early Literacy.
To establish a global estimate, effect sizes were calculated across two language domains (grammar and vocabulary) and an overall language estimate. Similar to study aim 1, each subtest was separated into the targeted language domain (i.e., grammar, vocabulary, multiple, or supralinguistic; see Appendix C). Overall estimates of effect size were calculated with Hedges's g to provide a weighted estimate across the two samples. Effect sizes by language domain are presented in Table 5. A similar pattern was seen for the effect sizes across language domains (grammar, vocabulary, and overall language) as with the individual subtest effect sizes. Effect sizes across language domains ranged from 0.67 to 0.75 for preschool children with LI and from 0.44 to 0.55 for school-age children with LI. Larger effect sizes were found for vocabulary and overall language across ages than for grammar. Within the grammar domain, preschool children on average had larger effect sizes (d = 0.67) than did school-age children (d = 0.44).
Table 5. Weighted effect sizes (g) across domain and age for children with language impairment.
Domain Age 3 Age 4 Age 5 Age 6 Age 7
Grammar 0.67 0.74 0.50 0.44 0.48
Vocabulary 0.75 0.71 0.44 0.50 0.54
Overall language 0.70 0.71 0.51 0.50 0.55
n 72 106 113 119 50
Note. Effect sizes represent average language growth over an academic year. Effects for age 5 are weighted across both samples, and the two estimates were within 0.03 of one another.
Discussion
The present study pooled data from over 20,000 children with TDL and over 490 children with LI, ranging from 3 to 9 years of age, to generate the first set of empirically driven language benchmarks. As such, the findings presented herein exhibit strong external validity and provide both cross-sectional (expected growth at each age) and longitudinal (expected growth over time) norms of language growth for children with TDL and children with LI. These data represent initial efforts to generate empirically driven language benchmarks for children with TDL and children with LI. These data may best be considered as a reference tool with which stakeholders can begin to empirically measure and interpret observed language growth.
From our data, we note two main observations. First, for children with TDL, the magnitude of language growth varied considerably longitudinally (younger to older children) and across language domain (grammar vs. vocabulary). Effect sizes for vocabulary growth were larger for younger children (i.e., preschool age) than for older children (i.e., school age), and effect sizes for grammar were larger for school-age children than for preschool children. Second, preschool children with LI demonstrated similar magnitudes of language growth as did preschool children with TDL; however, this magnitude did not vary as a function of construct as it did for preschool children with TDL. In short, younger children with LI appeared to exhibit higher rates of growth than did older children with LI across all language domains. These findings are discussed in further detail below.
Language Benchmarks for Children With TDL
The first aim of the present study was to determine language benchmarks for children with TDL. Normative data from eight standardized language tests were utilized to derive empirical benchmarks of typical language growth. Calculated effect sizes ranged considerably across age groups and language constructs (e.g., 0.58 to 0.95 for preschool children and 0.37 to 0.81 for school-age children). In this work, language was considered in terms of grammar, vocabulary, and overall language (all subtest domains: grammar, vocabulary, multiple, and supralinguistic). Our findings suggest that, in terms of overall language, preschool children had greater average language growth than did school-age children (i.e., effect sizes of 0.82 and 0.44, respectively). In terms of language domains, preschool children with TDL had greater growth in vocabulary than in grammar (i.e., effect size of 0.95 for vocabulary and 0.58 for grammar), whereas school-age children experienced greater growth in grammar than in vocabulary (i.e., effect sizes of 0.77 to 0.81 for grammar and 0.46 to 0.68 for vocabulary). To some degree, these data are consistent with previous work examining relations between vocabulary and grammatical competence. Thordardottir, Weismer, and Evans (2002)  found that in a group of English-speaking children, those whose vocabulary size was 130 words or less did not produce any inflectional suffixes (e.g., past tense -ed or plural -s); however, children whose vocabulary exceeded 550 words produced both noun and verb suffixes. Those findings substantiate Brown's (1973)  research, which indicated that children acquire -ing and plural -s between 24 and 36 months, an age at which most children have acquired at least 300 words in their lexicon (Fenson et al., 1994). Collectively, these data suggest that a certain level of vocabulary knowledge must be achieved to begin producing appropriate grammatical structures.
Hill et al. (2008)  used a similar methodology to establish reading benchmarks for children with TDL and reported findings comparable to those of the present study. Hill et al. found that younger children had larger effect sizes in reading growth (about 1 SD/year for kindergarten children through second-grade children) than did older children (about 0.65 SD for third- and fourth-grade children), mirroring the pattern of greater language growth for younger children found in the present study. In some respects, this finding may not be surprising, as evidence suggests strong, positive correlations between reading and language development (e.g., Catts et al., 1999). Nonetheless, the comparable methodological approaches and patterns of development in the present study and that of Hill et al. provide a lens through which to consider our findings in the absence of discipline-specific prior literature.
Language Benchmarks for Children With LI
To address our second research aim, we used data from two large-scale investigations in the public schools to derive effect sizes of expected language growth for children with LI. Previous work reporting effect sizes of language growth for this population has either measured outcomes resulting from highly controlled intervention studies (see Cirrin & Gillam, 2008) or utilized a sample with researcher-determined populations of children with LI (Tomblin, Zhang, Buckwalter, & O'Brien, 2003). The present study included large samples of clinically identified children with LI who were receiving business-as-usual therapy in the public schools. Thus, findings from the present work represent a more externally valid group of children with LI than did findings from previous studies.
The language growth for children with LI in the present study followed a trajectory similar to that of the children with TDL. Furthermore, preschool children with LI had growth similar to that of preschool children with TDL across all language domains (effect sizes of approximately 0.70). By age 5, however, the magnitude of growth for children with LI was less than that of 5-year-olds with TDL (average effect sizes of 0.51 and 0.74, respectively). This decrease in language growth was not evident in the sample of children with TDL until children were age 7. Thus, the data suggest that, despite receiving language intervention, language growth may slow down at an earlier age in children with LI than in children with TDL.
We offer explanations for this finding in two parts. First, our data align with previous work suggesting differences between transient and persistent LI. Bishop and Edmundson (1987),  for example, studied a clinical sample of children diagnosed with LI (n = 87) starting at age 4. By age 5½, over 40% of their sample no longer exhibited language difficulties and were considered to exhibit transient LI. However, the children whose language difficulties were still evident at age 5½ represented those with a persistent LI or language difficulties that did not resolve with intervention. Furthermore, children in this latter category exhibited more severe and widespread language problems across domains of syntax, semantics, receptive language, and phonology. In the present study, therefore, it is possible that the children with LI in the 5-year age group and older who had smaller effect sizes on language growth represent a subgroup of children with severe and persistent LI. The present study did not follow children longitudinally; therefore, we cannot fully evaluate the extent to which children in these studies exhibited transient or persistent LI. However, these findings do suggest that children who qualify for language services at age 5 appear to demonstrate less rapid language growth throughout the academic year compared with their younger counterparts.
A second explanation for why this pattern of diminished growth occurred at an earlier age for children with LI than for children with TDL may relate to their degree of school readiness. School readiness is a term used to describe the set of optimal skills needed for successful entry into kindergarten, including both academic abilities and social skills (La Paro, Kraft-Sayre, & Pianta, 2003; Rimm-Kaufman et al., 2000). Children with LI, on average, have poorer school readiness skills than do children with TDL (e.g., Spaulding, 2010). This finding suggests that children whose language difficulties are present as they enter formal schooling (i.e., age 5) may not be prepared to engage in academic environments, a key context for language growth among school-age children (Rimm-Kaufman et al., 2000). Thus, children with LI may not fully benefit from advanced language presented via classroom instruction compared with peers with TDL. Consequently, their ability to maintain a similar rate of language growth would be compromised, which may explain why children with LI appear to have slower rates of language growth compared with children with TDL around the age of 5.
In addition to the age-related difference observed between the children with TDL and those with LI, our data also indicated group differences with respect to domain-specific patterns of language growth. School-age children with LI did not experience an increase in grammatical growth compared with vocabulary as did the school-age children with TDL. We previously explained that the age-related differential patterns of grammar and vocabulary growth for children with TDL likely reflect developmental patterns of language growth, in that early boosts in vocabulary knowledge precede growth in grammatical abilities (e.g., Thordardottir et al., 2002). Therefore, our findings concerning children with LI suggest that their language growth patterns do not follow this typical course. Although children with LI exhibit a heterogeneous set of deficits, grammatical difficulties are commonly observed in children with LI (e.g., Eadie, Fey, Douglas, & Parsons, 2002; Fey et al., 2004). Findings from the present study suggest that many children with LI have poorer grammatical skills than do their peers with TDL and that their rate of grammatical growth is also slower. Thus, the poorer language skills already demonstrated by children with LI coupled with reduced grammatical growth in the school-age years suggest that grammatical deficits are likely to persist.
Clinical Applications
Collectively, the results of the present investigation lend clinical utility to stakeholders (i.e., practitioners, policymakers, and researchers). We present clinical applications for each stakeholder. First, armed with these data, practitioners may now establish empirically driven expectations for growth, as determined from norm-referenced measures, across the academic year. According to Kamhi (1999), goal writing and response to intervention are often implemented with arbitrary outcome levels (e.g., with 80% accuracy over three consecutive sessions). The degree to which that percentage increase represents practical significance is uncertain. These benchmarks provide practitioners with an initial means of determining expected levels of growth over an academic year that could be used to establish outcomes with more practical significance for each child. Practitioners can also use these data to implement practice-based evidence (i.e., “gathering good-quality data from routine practice”; Margison et al., 2000, p. 123) to examine the rate of change caused by their unique intervention approaches (Apel, 1999; Kamhi, 1999). Furthermore, these benchmarks allow practitioners one metric by which to monitor yearly progress of children with TDL and children with LI using norm-referenced measures and to readily identify those children not making expected yearly growth according to such language outcomes.
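One way a practitioner might put these benchmarks to use is sketched below in Python. This is an illustration only, not a procedure described in the study: the function names are hypothetical, the benchmark values are the overall language row of Table 5, and the observed growth reuses the Word Structure fall and spring values for 3-year-olds reported earlier, with equal-weight pooling of the two standard deviations assumed. The benchmarks describe group-level growth, so a comparison like this is best read as a rough reference point.

```python
import math

# Overall language benchmarks for children with LI, copied from Table 5
# (age at the start of the academic year -> weighted effect size).
LI_OVERALL_BENCHMARKS = {3: 0.70, 4: 0.71, 5: 0.51, 6: 0.50, 7: 0.55}


def observed_growth(mean_fall, sd_fall, mean_spring, sd_spring):
    """Fall-to-spring change in pooled-SD units (equal-weight pooling assumed)."""
    pooled = math.sqrt((sd_fall ** 2 + sd_spring ** 2) / 2)
    return (mean_spring - mean_fall) / pooled


def gap_from_benchmark(age, observed, benchmarks=LI_OVERALL_BENCHMARKS):
    """Observed growth minus the benchmark for that age, in SD units."""
    return observed - benchmarks[age]


# Word Structure values for 3-year-olds quoted earlier in the article.
d = observed_growth(4.63, 4.97, 8.05, 5.74)
gap = gap_from_benchmark(3, d)

print(f"observed = {d:.2f}, benchmark = {LI_OVERALL_BENCHMARKS[3]:.2f}, gap = {gap:+.2f}")
```

A negative gap indicates growth below the benchmark for that age; a positive gap indicates growth above it.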
Second, these data have implications for policymakers. State and federal governments recently have made significant investments in education for both children with TDL (e.g., Common Core) and children with LI (e.g., No Child Left Behind). As a result, policymakers have a vested interest in appropriate identification of children needing special education services and in establishing reasonable levels of progress. With these data, policymakers have an initial set of benchmarks by which they can begin to establish normal points of growth and to accurately assess effective implementation of language curricula.
Third, many researchers are interested in testing language interventions and understanding their effects on children's language functioning. The data presented herein provide researchers with an initial tool for considering the practical significance of their results relative to expected language change, rather than relying on a “rule of thumb” approach (Cohen, 1988) that does not account for specific populations of children. Researchers continuing this work may also want to explore how different service delivery models and treatment intensities are tied to these benchmarks.
Limitations and Future Directions
Some limitations of the present work warrant discussion to inform future research. First, effect sizes for the LI group were based on four available norm-referenced measures versus eight norm-referenced assessments for children with TDL. Although similar language domains were represented across these assessments, the effect-size results may have been different for children with LI. Future research should incorporate a broader corpus of norm-referenced assessments to substantiate effect sizes of language growth for children with LI found in the present study. Second, the data presented herein represent a clinically identified sample of children with LI. This sample was intentionally chosen to advance our understanding of expected growth during business-as-usual interventions for children with LI. Thus, it is unclear how these data might inform intervention studies using researcher-defined samples of children with LI and children with LI who are not receiving language services. Future work should address these varied groups to better understand the potentially differential outcomes due to participant selection criteria and intervention effects.
We also acknowledge considerable variability among the effect sizes of typical language growth across norm-referenced measures. For example, the range of effect sizes for 3- to 4-year-olds across all language measures was 0.65 to 1.33 (see Table 2), suggesting substantial disparity in the expected language growth based on language measures. This variability may be best explained by the lack of item response theory (IRT)-based assessments. Historically, norm-referenced language measures have not relied on IRT methodology to standardize items on each subtest (but see subtests for the WJ-III; Woodcock et al., 2001). IRT methodology ensures that each item on a test is ordered by increasing complexity, with easier items preceding more difficult items. Without use of IRT-based methods, each norm-referenced test may vary dramatically in the difficulty of subtest items. Subtests with more difficult items early in the test may result in lower scores than subtests with easier items early in the test, which may explain the drastic variability in effect sizes across language measures. To account for this variability, we included numerous commonly used norm-referenced measures of language in our study. Nonetheless, it is important, both from a methodological standpoint and from a practitioner perspective, that researchers develop IRT-based language assessments to ensure construct validity and appropriate difficulty of language items. Language benchmarks in the present study were determined based solely on norm-referenced measures. Future studies should consider the role of language samples and/or artifact analyses in measuring children's language growth.
Conclusion
In an era of high-stakes testing and accountability for children's outcomes, stakeholders must have a valid and reliable means by which to measure children's language growth. This study utilized normative data representing over 20,000 children with TDL and two large-scale federally funded investigations of children with LI to determine an initial set of empirically driven benchmarks of children's language growth. These initial benchmarks may allow practitioners to monitor language development and readily identify children who may be at risk for academic problems secondary to language difficulties and to monitor progress for children receiving supplemental language intervention. For researchers, empirically driven language benchmarks, as initially identified in this study, may provide a reliable means to determine a priori power analyses and a context within which to compare the practical significance of language intervention studies. For policymakers, benchmarks of typical language growth provide one mechanism for critical appraisal of language curriculum that effectively promotes language development for preschool and school-age children. Taken together, findings from the present work represent a significant contribution to our knowledge of expected language growth for young children and provide a framework for future research to replicate and extend study findings.
Acknowledgments
We acknowledge the efforts of our project staff and research assistants who were instrumental in these research projects. We are especially thankful to the speech-language pathologists, classroom teachers, families, and students who participated in these studies. These research projects were supported by grant R324A080037 (STAR-2) and grant R324A090012 (STEPS) from the U.S. Department of Education, Institute of Education Sciences, to Laura M. Justice.
References
Abedi, J., & Lord, C. (2001). The language factor in mathematics tests. Applied Measurement in Education, 14(3), 219–234. [Article]
Abraham, W. T., & Russell, D. W. (2008). Statistical power analysis in psychological research. Social and Personality Psychology Compass, 2(1), 283–301. [Article]
Adams, M. J. (2010). Advancing our students’ language and literacy: The challenge of complex texts. American Educator, 34, 3–11.
Apel, K. (1999). Checks and balances: Keeping the science in our profession. Language, Speech, and Hearing Services in Schools, 30, 98–107. [Article] [PubMed]
Bain, B. A., & Dollaghan, C. A. (1991). The notion of clinically significant change. Language, Speech, and Hearing Services in Schools, 22, 264–270. [Article]
Bancroft, K. (2010). Implementing the mandate: The limitations of benchmark tests. Educational Assessment, Evaluation and Accountability, 22(1), 53–72. [Article]
Betz, S. K., Eickhoff, J. R., & Sullivan, S. F. (2013). Factors influencing the selection of standardized tests for the diagnosis of specific language impairment. Language, Speech, and Hearing Services in Schools, 44, 133–146. [Article] [PubMed]
Bishop, D. V. M., & Edmundson, A. (1987). Language-impaired 4-year-olds: Distinguishing transient from persistent impairment. Journal of Speech and Hearing Disorders, 52, 156–173. [Article] [PubMed]
Bloom, H. S., Hill, C. J., Black, A. R., & Lipsey, M. W. (2008). Performance trajectories and performance gaps as achievement effect-size benchmarks for educational interventions. Journal of Research on Educational Effectiveness, 1(4), 289–328. [Article]
Bloom, L., & Lahey, M. (1978). Language development and language disorders. Hoboken, NJ: Wiley.
Brindley, G. (1998). Outcomes-based assessment and reporting in language learning programmes: A review of the issues. Language Testing, 15(1), 45–85.
Brown, R. (1968). The development of Wh questions in child speech. Journal of Verbal Learning and Verbal Behavior, 7(2), 279–290. [Article]
Brown, R. (1973). A first language: The early stages. Cambridge, MA: Harvard University Press.
Brownell, R. (2000). Expressive One-Word Picture Vocabulary Test–Third Edition. Novato, CA: Academic Therapy.
Burns, M. S., Midgette, K., Leong, D., & Bodrova, E. (2002). Prekindergarten benchmarks for language and literacy: Progress made and challenges to be met. New Brunswick, NJ: National Institute for Early Education Research.
Carrow-Woolfolk, E. (1995). Oral and Written Language Scales. Circle Pines, MN: AGS.
Catts, H. W. (1993). The relationship between speech-language impairments and reading disabilities. Journal of Speech and Hearing Research, 36, 948–958. [Article] [PubMed]
Catts, H. W., Fey, M. E., Tomblin, J. B., & Zhang, X. (2002). A longitudinal investigation of reading outcomes in children with language impairments. Journal of Speech, Language, and Hearing Research, 45, 1142–1157. [Article]
Catts, H. W., Fey, M. E., Zhang, X., & Tomblin, J. B. (1999). Language basis of reading and reading disabilities: Evidence from a longitudinal investigation. Scientific Studies of Reading, 3, 331–361. [Article]
Cirrin, F. M., & Gillam, R. B. (2008). Language intervention practices for school-age children with spoken language disorders: A systematic review. Language, Speech, and Hearing Services in Schools, 39, S110–S137. [Article] [PubMed]
Cohen, J. (1988). Statistical power analysis for the behavioral sciences (2nd ed.). Hillsdale, NJ: Erlbaum.
Cohen, J. (1992). A power primer. Psychological Bulletin, 112, 155–159. [Article] [PubMed]
Common Core State Standards Initiative. (2011). The standards. Retrieved from http://www.corestandards.org/the-standards
Dunn, L. M., & Dunn, D. M. (2007). Peabody Picture Vocabulary Test–Fourth Edition. Minneapolis, MN: Pearson.
Eadie, P. A., Fey, M. E., Douglas, J. M., & Parsons, C. L. (2002). Profiles of grammatical morphology and sentence imitation in children with specific language impairment and Down syndrome. Journal of Speech, Language, and Hearing Research, 45, 720–732. [Article]
Ellis, P. D. (2010). The essential guide to effect sizes: Statistical power, meta-analysis, and the interpretation of research results. Cambridge, England: Cambridge University Press.
Fenson, L., Dale, P. S., Reznick, J. S., Bates, E., Thal, D., Pethick, S. J., … Stiles, J. (1994). Variability in early communicative development. Monographs of the Society for Research in Child Development, 59(5). doi:10.2307/1166093
Fey, M. E., Catts, H. W., Proctor-Williams, K., Tomblin, J. B., & Zhang, X. (2004). Oral and written story composition skills of children with language impairment. Journal of Speech, Language, and Hearing Research, 47, 1301–1318. [Article]
Fritz, C. O., Morris, P. E., & Richler, J. J. (2012). Effect size estimates: Current use, calculations, and interpretation. Journal of Experimental Psychology: General, 141, 2–18. [Article] [PubMed]
Hedges, L. V. (1982). Estimation of effect size from a series of independent experiments. Psychological Bulletin, 92, 490–499. [Article]
Hill, C. J., Bloom, H. S., Black, A. R., & Lipsey, M. W. (2008). Empirical benchmarks for interpreting effect sizes in research. Child Development Perspectives, 2(3), 172–177. [Article]
Huang, R.-J., Hopkins, J., & Nippold, M. A. (1997). Satisfaction with standardized language testing: A survey of speech-language pathologists. Language, Speech, and Hearing Services in Schools, 28, 12–29. [Article]
Johnson, C. J., Beitchman, J. H., & Brownlie, E. B. (2010). Twenty-year follow-up of children with and without speech-language impairments: Family, educational, occupational, and quality of life outcomes. American Journal of Speech-Language Pathology, 19, 51–65. [Article] [PubMed]
Justice, L. M., Bowles, R. P., Pence Turnbull, K. L., & Skibbe, L. E. (2009). School readiness among children with varying histories of language difficulties. Developmental Psychology, 45, 460–476. [Article] [PubMed]
Justice, L. M., Logan, J. A., Kaderavek, J. N., & Dynia, J. M. (2015). Print-focused read-alouds in early childhood special education programs. Exceptional Children, 81, 292–311. [Article]
Kamhi, A. G. (1999). To use or not to use: Factors that influence the selection of new treatment approaches. Language, Speech, and Hearing Services in Schools, 30, 92–98. [Article] [PubMed]
Kaufman, A. S., & Kaufman, N. L. (2004). Kaufman Brief Intelligence Test–Second Edition. Circle Pines, MN: AGS.
La Paro, K. M., Kraft-Sayre, M., & Pianta, R. C. (2003). Preschool to kindergarten transition activities: Involvement and satisfaction of families and teachers. Journal of Research in Childhood Education, 17, 147–158. [Article]
Law, J., Garrett, Z., & Nye, C. (2004). The efficacy of treatment for children with developmental speech and language delay/disorder: A meta-analysis. Journal of Speech, Language, and Hearing Research, 47, 924–943. [Article]
Lonigan, C. J., Wagner, R. K., Torgesen, J. K., & Rashotte, C. A. (2007). Test of preschool early literacy. Austin, TX: Pro-Ed.
Mackie, C. J., Dockrell, J., & Lindsay, G. (2013). An evaluation of the written texts of children with SLI: The contributions of oral language, reading and phonological short-term memory. Reading and Writing, 26, 865–888. [Article]
Margison, F. R., McGrath, G., Barkham, M., Clark, J. M., Audin, K., Connell, J., & Evans, C. (2000). Measurement and psychotherapy: Evidence-based practice and practice-based evidence. British Journal of Psychiatry, 177(2), 123–130. [Article] [PubMed]
McDermott, P. A., Rikoon, S. H., & Fantuzzo, J. W. (2014). Tracing children's approaches to learning through Head Start, kindergarten, and first grade: Different pathways to different outcomes. Journal of Educational Psychology, 106, 200–213. [Article]
McGrew, K. S., Schrank, F. A., & Woodcock, R. W. (2007). Technical manual. Woodcock-Johnson III. Itasca, IL: Riverside.
Newcomer, P. L., & Hammill, D. D. (2008). Test of Language Development—Primary: Fourth Edition. Austin, TX: Pro-Ed.
Nye, C., Foster, S. H., & Seaman, D. (1987). Effectiveness of language intervention with the language/learning disabled. Journal of Speech and Hearing Disorders, 52, 348–357. [Article] [PubMed]
Paul, R., & Norbury, C. (2012). Language disorders from infancy through adolescence: Listening, speaking, reading, writing, and communicating. Amsterdam, the Netherlands: Elsevier Scientific.
Rimm-Kaufman, S. E., Pianta, R. C., & Cox, M. J. (2000). Teachers' judgments of problems in the transition to kindergarten. Early Childhood Research Quarterly, 15, 147–166. [Article]
Roberts, M. Y., & Kaiser, A. P. (2012). Assessing the effects of a parent-implemented language intervention for children with language impairments using empirical benchmarks: A pilot study. Journal of Speech, Language, and Hearing Research, 55, 1655–1670. [Article]
Rowe, M. L., Raudenbush, S. W., & Goldin-Meadow, S. (2012). The pace of vocabulary growth helps predict later vocabulary skill. Child Development, 83, 508–525. [Article] [PubMed]
Schmitt, M. B., Justice, L., Logan, J., Schatschneider, C., & Bartlett, C. (2014). Do symptoms of language disorders align with treatment goals? An exploratory study of primary-grade students' IEPs. Journal of Communication Disorders, 52, 99–110. [Article] [PubMed]
Semel, E., Wiig, E. H., & Secord, W. A. (2003). Clinical Evaluation of Language Fundamentals–Fourth Edition. San Antonio, TX: The Psychological Corporation.
Snow, C. E. (2010, April 23). Academic language and the challenge of reading for learning about science. Science, 328, 450–452. [Article] [PubMed]
Spaulding, T. (2010). Investigating mechanisms of suppression in preschool children with specific language impairment. Journal of Speech, Language, and Hearing Research, 53, 725–738. [Article]
Tager-Flusberg, H., Rogers, S., Cooper, J., Landa, R., Lord, C., Paul, R., … Yoder, P. (2009). Defining spoken language benchmarks and selecting measures of expressive language development for young children with autism spectrum disorders. Journal of Speech, Language, and Hearing Research, 52, 643–652. [Article]
Thordardottir, E. T., Weismer, S. E., & Evans, J. L. (2002). Continuity in lexical and morphological development in Icelandic and English-speaking 2-year-olds. First Language, 22, 3–28.
Tomblin, J. B., Zhang, X., Buckwalter, P., & O'Brien, M. (2003). The stability of primary language disorder: Four years after kindergarten diagnosis. Journal of Speech, Language, and Hearing Research, 46, 1283–1296. [Article]
Wiig, E., Secord, W., & Semel, E. (2004). Clinical Evaluation of Language Fundamentals—Preschool: Second Edition. San Antonio, TX: The Psychological Corporation.
Williams, K. T. (1999). Comprehensive assessment of spoken language. Circle Pines, MN: AGS.
Williams, K. T. (2007). Expressive Vocabulary Test–Second Edition. Minneapolis, MN: Pearson.
Woodcock, R. W., McGrew, K. S., & Mather, N. (2001). Tests of achievement: Woodcock-Johnson III. Itasca, IL: Riverside.
Young, A. R., Beitchman, J. H., Johnson, C., Douglas, L., Atkinson, L., Escobar, M., & Wilson, B. (2002). Young adult academic outcomes in a longitudinal sample of early identified language impaired and control children. Journal of Child Psychology and Psychiatry, 43, 635–645. [Article] [PubMed]
Zucker, T. A., Cabell, S. Q., Justice, L. M., Pentimonti, J. M., & Kaderavek, J. N. (2013). The role of frequent, interactive prekindergarten shared reading in the longitudinal development of language and literacy skills. Developmental Psychology, 49, 1425–1439. [Article] [PubMed]
Appendix A
Norm-Referenced Measures of Language
| Test | Reliability | Validity | Sensitivity, specificity | Normative sample |
| --- | --- | --- | --- | --- |
| CASL | Test–retest for individual subtests: 0.65–0.95 | Intercorrelation coefficients: 0.30–0.79, low. Construct validity established by developmental progression of scores, intercorrelations of tests, factor structures of the indexes |  | 1,700 children for standardization |
| CELF-4 | Test–retest, 0.71–0.86; split-half, 0.71–0.92 (subtests); interscorer agreement, 0.88–0.99 | Validity established by test content, response processes, internal structure, relationships with other variables, consequences of testing. | 1 SD, sensitivity: 1.00, specificity: 0.82; 1.5 SD, sensitivity: 1.00, specificity: 0.89; 2 SD, sensitivity: 0.87, specificity: 0.96 | 2,650 children for standardization |
| CELF-P:2 | Test–retest, 0.77–0.96; coefficient alpha, 0.88–0.97; split-half, 0.88–0.98; interscorer agreement, 0.95–0.97 |  | 1 SD, sensitivity: 0.82, specificity: 0.84; 1.5 SD, sensitivity: 0.88, specificity: 0.72; 2 SD, sensitivity: 0.95, specificity: 0.60 | >1,500 preschool children for standardization |
| EVT-2 | Split-half (by age), 0.93–0.94; split-half (by grade), 0.93; alternate form reliability (by age), 0.87; test–retest (by age), 0.95 | Construct validity: EVT-2 correlated with the EVT, CASL, CELF-4, GRADE, and PPVT-4. Content validity: stimuli chosen from review of published reference works and represent 20 content areas. |  | Age-normed sample = 3,540; grade-normed sample = 2,003 (conormed 100% with PPVT-4) |
| OWLS | Internal, >0.92; test–retest, 0.73–0.94 | Strong correlation with WJ-III Normative Update Broad Reading Composite. Construct validity: established by factor analysis related to integrative language theory. |  | 2,123 for standardization |
| EOWPVT | Coefficient alpha median, 0.95; test–retest, 0.97–0.98 | Validity established by correlation of 0.43 with WISC-IV VCI; correlation of 0.68–0.86 with ROWPVT |  | Standardized on >2,400 |
| PPVT | Split-half (by age), 0.94; split-half (by grade), 0.94–0.95; alternate form reliability (by age), 0.89; test–retest, 0.93 | Validity established by correlations with EVT-2, CASL, CELF-4, GRADE, and PPVT-III; correlations with the CASL: 0.41–0.79; correlations with the CELF-4: 0.67–0.75 |  | Age-normed sample = 3,540; grade-normed sample = 2,003 |
| TOLD-P:4 | Coefficient alpha, 0.97; test–retest, 0.80–0.90; interscorer agreement, >0.90 | Validity established through content validity, criterion-prediction validity, and construct-identification validity. Strong correlations with PLOS, TOLD-I:4, and the WISC-IV. | Sensitivity: 0.74; specificity: 0.88 | Standardized sample = 1,009 |
| WJ-III | Test–retest, 0.70–0.96 (depending on age) |  |  | Standardized sample = 8,818 |
| TOPEL | Coefficient alpha, 0.86–0.96; interscorer agreement, 0.96–0.98; test–retest, 0.81–0.91 | Validity as a measure of early literacy established through content validity, criterion-prediction validity, and construct-identification validity. |  | Standardized sample = 842 |
Note. CASL = Comprehensive Assessment of Spoken Language; CELF-4 = Clinical Evaluation of Language Fundamentals–Fourth Edition; CELF-P:2 = Clinical Evaluation of Language Fundamentals–Preschool: Second Edition; EVT-2 = Expressive Vocabulary Test–Second Edition; GRADE = Group Reading Assessment and Diagnostic Evaluation; PPVT-III and -4 = Peabody Picture Vocabulary Test–III and Fourth Editions; OWLS = Oral and Written Language Scales; EOWPVT= Expressive One Word Picture Vocabulary Test; WISC-IV VCI = Wechsler Intelligence Scale for Children–Fourth Edition: Verbal Comprehension Index; ROWPVT = Receptive One-Word Picture Vocabulary Test; TOLD-P:4 = Test of Language Development–Primary: Fourth Edition; PLOS = Pragmatic Language Observation Scale; TOLD-I:4 = Test of Language Development–Intermediate: Fourth Edition; WJ-III = Woodcock-Johnson Test of Achievement–Third Edition; TOPEL = Test of Preschool Early Literacy.
Note. CASL = Comprehensive Assessment of Spoken Language; CELF-4 = Clinical Evaluation of Language Fundamentals–Fourth Edition; CELF-P:2 = Clinical Evaluation of Language Fundamentals–Preschool: Second Edition; EVT-2 = Expressive Vocabulary Test–Second Edition; GRADE = Group Reading Assessment and Diagnostic Evaluation; PPVT-III and -4 = Peabody Picture Vocabulary Test–III and Fourth Editions; OWLS = Oral and Written Language Scales; EOWPVT= Expressive One Word Picture Vocabulary Test; WISC-IV VCI = Wechsler Intelligence Scale for Children–Fourth Edition: Verbal Comprehension Index; ROWPVT = Receptive One-Word Picture Vocabulary Test; TOLD-P:4 = Test of Language Development–Primary: Fourth Edition; PLOS = Pragmatic Language Observation Scale; TOLD-I:4 = Test of Language Development–Intermediate: Fourth Edition; WJ-III = Woodcock-Johnson Test of Achievement–Third Edition; TOPEL = Test of Preschool Early Literacy.×
Test Reliability Validity Sensitivity, specificity Normative sample
CASL Test–retest for individual subtests: 0.65–0.95 Intercorrelation coefficients: 0.30–0.79, low. Construct validity established by developmental progression of scores, intercorrelations of tests, factor structures of the indexes 1,700 children for standardization
CELF-4 Test–retest, 0.71–0.86; split-half, 0.71–0.92 (subtests); interscorer agreement, 0.88–0.99 Validity established by test content, response processes, internal structure, relationships with other variables, consequences of testing. 1 SD, sensitivity: 1.00, specificity: 0.82; 1.5 SD, sensitivity: 1.00, specificity: 0.89; 2 SD, sensitivity: 0.87, specificity: 0.96 2,650 children for standardization
CELF-P:2 Test–retest, 0.77–0.96; coefficient alpha, 0.88–0.97; split-half, 0.88–0.98; interscorer agreement, 0.95–0.97 1 SD, sensitivity: 0.82, specificity: 0.84; 1.5 SD, sensitivity: 0.88, specificity: 0.72; 2 SD, sensitivity: 0.95, specificity: 0.60 >1,500 preschool children for standardization
EVT-2 Split-half (by age), 0.93–0.94; split-half (by grade), 0.93; alternate form reliability (by age) 0.87; test–retest (by age), 0.95 Construct validity: EVT-2 correlated with the EVT, CASL, CELF-4, GRADE, and PPVT-4. Age-normed sample = 3,540; grade-normed sample = 2,003 (conormed 100% with PPVT-4)
Content validity: stimuli chosen from review of published reference works and represent 20 content areas.
OWLS Internal, >0.92; test–retest, 0.73–0.94 Strong correlation with WJ-III Normative Update Broad Reading Composite. 2,123 for standardization
Construct validity: established by factor analysis related to integrative language theory.
EOWPVT Coefficient alpha median, 0.95; test–retest, 0.97–0.98 Validity established by correlation of 0.43 with WISC-IV VCI; correlation of 0.68–0.86 with ROWPVT Standardized on >2,400
PPVT Split-half (by age), 0.94; split-half (by grade), 0.94–0.95; alternate form reliability (by age), 0.89; test–retest, 0.93 Validity established by correlations with EVT-2, CASL, CELF-4, GRADE, and PPVT-III; correlations with the CASL: 0.41–0.79; correlations with the CELF-4: 0.67–0.75 Age-normed sample = 3,540; grade-normed sample = 2,003
TOLD-P:4 Coefficient alpha, 0.97; test–retest, 0.80–0.90; interscorer agreement, >0.90 Validity established through content validity, criterion-prediction validity, and construct-identification validity. Strong correlations with PLOS, TOLD-I:4, and the WISC-IV. Sensitivity: 0.74; specificity: 0.88 Standardized sample = 1,009
WJ-III Test–retest, 0.70–0.96 (depending on age) Standardized sample = 8,818
TOPEL Coefficient alpha, 0.86–0.96; interscorer agreement, 0.96–0.98; test–retest, 0.81–0.91 Validity as a measure of early literacy established through content validity, criterion-prediction validity, and construct-identification validity. Standardized sample = 842
Note. CASL = Comprehensive Assessment of Spoken Language; CELF-4 = Clinical Evaluation of Language Fundamentals–Fourth Edition; CELF-P:2 = Clinical Evaluation of Language Fundamentals–Preschool: Second Edition; EVT-2 = Expressive Vocabulary Test–Second Edition; GRADE = Group Reading Assessment and Diagnostic Evaluation; PPVT-III and -4 = Peabody Picture Vocabulary Test–III and Fourth Editions; OWLS = Oral and Written Language Scales; EOWPVT= Expressive One Word Picture Vocabulary Test; WISC-IV VCI = Wechsler Intelligence Scale for Children–Fourth Edition: Verbal Comprehension Index; ROWPVT = Receptive One-Word Picture Vocabulary Test; TOLD-P:4 = Test of Language Development–Primary: Fourth Edition; PLOS = Pragmatic Language Observation Scale; TOLD-I:4 = Test of Language Development–Intermediate: Fourth Edition; WJ-III = Woodcock-Johnson Test of Achievement–Third Edition; TOPEL = Test of Preschool Early Literacy.
Appendix B
Measures Used to Obtain Benchmarks for Children With Typically Developing Language
Measure Subtest Domain
CASL Syntax Construction Grammar
Paragraph Comprehension Grammar
Grammatical Morphemes Grammar
Grammaticality Judgment Grammar
Basic Concepts Vocabulary
Antonyms Vocabulary
Sentence Completion Multiple a
Nonliteral Language Supralinguistic
Inference Supralinguistic
Pragmatic Judgment Discourse
CELF-4 Sentence Structure Grammar
Word Structure Grammar
Recalling Sentences Grammar
Formulated Sentences Grammar
Expressive Vocabulary Vocabulary
Word Classes–Receptive 1 Vocabulary
Word Classes–Expressive 1 Vocabulary
Word Classes–Receptive 2 Vocabulary
Word Classes–Expressive 2 Vocabulary
Concepts and Following Directions Multiple a
EOWPVT No subtests (median used) Vocabulary
EVT Form A Vocabulary
Form B Vocabulary
OWLS Listening Comprehension Multiple a
Oral Expression Multiple a
PPVT Form A Vocabulary
Form B Vocabulary
TOLD-4 Syntactic Understanding Grammar
Sentence Imitation Grammar
Morphological Completion Grammar
Picture Vocabulary Vocabulary
Relational Vocabulary Vocabulary
Oral Vocabulary Vocabulary
WJ-III Picture Vocabulary Vocabulary
Oral Comprehension Multiple a
Note. CASL = Comprehensive Assessment of Spoken Language; CELF-4 = Clinical Evaluation of Language Fundamentals–Fourth Edition; EOWPVT = Expressive One Word Picture Vocabulary Test; EVT = Expressive Vocabulary Test; OWLS = Oral and Written Language Scales; PPVT = Peabody Picture Vocabulary Test; TOLD-4 = Test of Language Development–Fourth Edition; WJ-III = Woodcock-Johnson Test of Achievement–Third Edition.
a Multiple = multiple domains for each subtest: CASL Sentence Completion: vocabulary, syntactic structure, and word retrieval; CELF-4 Concepts and Following Directions: listening comprehension and memory; OWLS Listening Comprehension: receptive vocabulary, grammar, pragmatic structures, and supralinguistic thinking; OWLS Oral Expression: expressive vocabulary, grammar, pragmatic structures, and supralinguistic thinking; WJ-III Oral Comprehension: listening, reasoning, and vocabulary.
Appendix C
Measures Used to Obtain Benchmarks for Children With Language Impairment
Test Subtest Domain
STAR-2 Measures
 CELF-P:2 Sentence Structure Grammar
Word Structure Grammar
Recalling Sentences Grammar
Expressive Vocabulary Vocabulary
Basic Concepts Vocabulary
Concepts and Following Directions Multiple a
 TOPEL Expressive Vocabulary Vocabulary
STEPS Measures
 CELF-4 Word Structure Grammar
Recalling Sentences Grammar
Concepts and Following Directions Multiple a
Formulated Sentences Multiple a
 WJ-III Picture Vocabulary Vocabulary
Note. STAR-2 = Sit Together and Read–2; CELF-P:2 = Clinical Evaluation of Language Fundamentals–Preschool: Second Edition; TOPEL = Test of Preschool Early Literacy; STEPS = Speech Therapy Experiences in Public Schools; CELF-4 = Clinical Evaluation of Language Fundamentals–Fourth Edition; WJ-III = Woodcock-Johnson Test of Achievement–Third Edition.
a Multiple = multiple domains for each subtest: CELF-P:2 and CELF-4 Concepts and Following Directions: listening comprehension and memory; CELF-4 Formulated Sentences: semantics, syntax, and pragmatics.
Footnote
1 Note that for the Woodcock-Johnson III subtests, mean raw scores were IRT-based W scores.
Figure 1.

Observed effect sizes across all measures for children with typically developing language. The middle line represents the mean, and the top and bottom of the bars represent the observed maximum and minimum effect sizes, respectively, for each age.

Table 1. Descriptive statistics for children with language impairment by sample.
Attribute N (%) M (SD) Minimum Maximum
STAR-2
 Core language 229 77.31 (16.81) 45 131
 Cognition 153 83.89 (17.63) 53 124
 Age (months) 229 51 (7.3) 36 67
  36–47 (=3 years) 73 (35)
  48–59 (=4 years) 107 (50)
  60–71 (=5 years) 31 (15)
 SES (income) 198 10.19 (6) 1 (<$5,000) 18 (>$85,000)
 Ethnicity
  White 159 (77)
  African American 28 (14)
  Other 18 (9)
STEPS
 Core language 266 69.79 (16.39) 40 115
 Cognition 266 88.52 (11.62) 44 131
 Age (months) 268 75.9 (8.59) 59 96
  60–71 (=5 years) 82 (33)
  72–83 (=6 years) 119 (47)
  84–95 (=7 years) 50 (20)
 SES (income) 203 9.8 (5.72) 1 (<$5,000) 18 (>$85,000)
 Ethnicity
  White 145 (54)
  African American 27 (10)
  Other 29 (14)
Note. STAR-2 = Sit Together and Read–2; SES = socioeconomic status; STEPS = Speech Therapy Experiences in Public Schools.
Table 2. Weighted effect sizes (ES, Hedges's g) across ages for each language measure for children with typically developing language.
Measure   Age 3–4        Age 4–5        Age 5–6        Age 6–7        Age 7–8        Age 8–9
          n / ES         n / ES         n / ES         n / ES         n / ES         n / ES
CASL      400 / 0.74     300 / 1.04     200 / 1.91     200 / 1.01     200 / 0.39     200 / 0.62
CELF-4    NP             NP             200 / 0.87     300 / 0.84     400 / 0.45     400 / 0.75
EVT       200 / 1.33     210 / 0.55     235 / 0.87     325 / 0.80     400 / 0.56     400 / 0.5
OWLS      400 / 0.97     325 / 0.65     250 / 0.95     250 / 0.68     251 / 0.56     249 / 0.66
EOWPVT    314 / 0.75     414 / 0.71     426 / 0.67     449 / 0.60     412 / 0.55     375 / 0.46
PPVT      200 / 1.33     210 / 0.65     235 / 0.78     325 / 0.86     400 / 0.58     400 / 0.59
TOLD-4    NP             348 / 0.52     450 / 0.66     534 / 0.42     492 / 0.41     NP
WJ-III    1,280 / 0.66   1,403 / 0.48   1,350 / 0.69   1,311 / 0.53   1,387 / 0.49   1,571 / 0.29
Note. CASL = Comprehensive Assessment of Spoken Language; CELF-4 = Clinical Evaluation of Language Fundamentals–Fourth Edition; NP = one or both of the means were not provided in the test manual; EVT = Expressive Vocabulary Test (Williams, 2007); OWLS = Oral and Written Language Scales (Carrow-Woolfolk, 1995); EOWPVT = Expressive One Word Picture Vocabulary Test (Brownell, 2000); PPVT = Peabody Picture Vocabulary Test (Dunn & Dunn, 2007); TOLD-4 = Test of Language Development–Fourth Edition (Newcomer & Hammill, 2008); WJ-III = Woodcock-Johnson Test of Achievement–Third Edition (McGrew et al., 2007).
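For readers who wish to compute estimates of this kind from the means reported in a test manual, the following Python sketch calculates Hedges's g between two adjacent age groups using the standard pooled-standard-deviation formula with the small-sample correction. The function name and the input values are hypothetical and are not taken from any manual; the exact computation behind Table 2 may differ in detail.

```python
import math


def hedges_g(mean_younger, sd_younger, n_younger, mean_older, sd_older, n_older):
    """Standardized mean difference (older minus younger) with small-sample correction."""
    df = n_younger + n_older - 2
    # Pooled standard deviation across the two age groups.
    pooled_sd = math.sqrt(
        ((n_younger - 1) * sd_younger ** 2 + (n_older - 1) * sd_older ** 2) / df
    )
    # Uncorrected standardized difference (Cohen's d).
    d = (mean_older - mean_younger) / pooled_sd
    # Hedges's correction factor, J = 1 - 3 / (4 * df - 1).
    correction = 1 - 3 / (4 * df - 1)
    return correction * d


# Hypothetical raw-score means and SDs for two adjacent age groups (not real manual data).
g = hedges_g(mean_younger=20.0, sd_younger=5.0, n_younger=100,
             mean_older=24.0, sd_older=5.0, n_older=100)
print(round(g, 3))  # 0.797
```

With samples of this size the correction is small (the uncorrected d in this example is 0.80), which is why g and d are often close in normative data.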
Table 3. Weighted effect sizes (Hedges's g) for each domain across ages for children with typically developing language.
Domain Age 3–4 Age 4–5 Age 5–6 Age 6–7 Age 7–8 Age 8–9
Grammar 0.58 0.71 0.81 0.70 0.37 0.77
Vocabulary 0.95 0.55 0.68 0.61 0.52 0.46
Overall language
M 0.82 0.60 0.74 0.64 0.49 0.44
SD 0.39 0.17 0.11 0.20 0.14 0.16
k 9 10 11 11 10 9
 Pooled n 2,794 3,210 3,346 3,694 3,691 3,346
Note. Overall language represents all subtests. Weighted effect sizes are estimated across 11 independent samples. k = number of independent samples represented at that age. Standard deviation is for the k independent samples at that age.
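The summary rows above (M, SD, k, pooled n) can be illustrated with a short Python sketch. It assumes that each independent sample contributes one effect size, that the mean is weighted by sample size, and that the standard deviation is the ordinary unweighted SD across the k samples; the weighting scheme is an assumption made for illustration, and the input values below are hypothetical rather than drawn from the tables.

```python
import statistics


def summarize_effect_sizes(samples):
    """Summarize (n, effect_size) pairs, one pair per independent sample."""
    sizes = [n for n, _ in samples]
    effects = [es for _, es in samples]
    pooled_n = sum(sizes)
    # Sample-size-weighted mean effect size (assumed weighting scheme).
    weighted_mean = sum(n * es for n, es in samples) / pooled_n
    # Unweighted sample standard deviation across the k effect sizes.
    sd = statistics.stdev(effects)
    return {"k": len(samples), "pooled_n": pooled_n, "M": weighted_mean, "SD": sd}


# Hypothetical independent samples at a single age interval.
example = [(200, 0.40), (400, 0.55), (1400, 0.50)]
print(summarize_effect_sizes(example))
```

Weighting by sample size lets the largest normative samples contribute most to the benchmark, while the SD row conveys how much the individual samples disagree.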
Table 4. Effect sizes (d) for children with language impairment by age and subtest.
Subtest                                 Age 3        Age 4        Age 5A       Age 5B       Age 6        Age 7
                                        n / d        n / d        n / d        n / d        n / d        n / d
CELF-P:2 and/or CELF-4
 Sentence Structure                     73 / 0.70    108 / 0.70   31 / 0.59    NG           NG           NG
 Word Structure                         72 / 0.64    106 / 0.77   31 / 0.26    82 / 0.56    119 / 0.48   50 / 0.57
 Expressive Vocabulary                  72 / 0.73    108 / 0.83   31 / 0.36    NG           NG           NG
 Recalling Sentences                    72 / 0.68    106 / 0.76   31 / 0.66    82 / 0.45    118 / 0.40   50 / 0.39
 Concepts and Following Directions      69 / 0.64    106 / 0.61   31 / 0.67    82 / 0.51    119 / 0.45   50 / 0.64
 Basic Concepts                         71 / 0.72    107 / 0.67   NG           NG           NG           NG
 Formulated Sentences                   NG           NG           NG           82 / 0.59    119 / 0.65   50 / 0.62
TOPEL: Expressive Vocabulary            74 / 0.79    106 / 0.64   31 / 0.51    NG           NG           NG
Woodcock-Johnson: Picture Vocabulary    NG           NG           NG           82 / 0.46    118 / 0.50   50 / 0.54
Note. Ages 3, 4, and 5A represent data collected on the CELF-P:2 from the Sit Together and Read–2 sample; ages 5B, 6, and 7 represent data collected on the CELF-4 from the STEPS sample. Effect sizes represent average language growth over an academic year. CELF-P:2 = Clinical Evaluation of Language Fundamentals–Preschool: Second Edition; CELF-4 = Clinical Evaluation of Language Fundamentals–Fourth Edition; NG = subtest was not given to this sample at this age range; TOPEL = Test of Preschool Early Literacy.
Table 5. Weighted effect sizes (g) across domain and age for children with language impairment.
Domain Age 3 Age 4 Age 5 Age 6 Age 7
Grammar 0.67 0.74 0.50 0.44 0.48
Vocabulary 0.75 0.71 0.44 0.5 0.54
Overall language 0.7 0.71 0.51 0.5 0.55
n 72 106 113 119 50
Note. Effect sizes represent average language growth over an academic year. Effects for age 5 are weighted across both samples, and the two estimates were within 0.03 of one another.
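As an illustration of how two age 5 estimates might be combined, the Python sketch below pools the Word Structure effect sizes reported in Table 4 for the two samples (5A: n = 31, d = 0.26; 5B: n = 82, d = 0.56) by weighting each by its sample size. Sample-size weighting is an assumption made here for illustration, and the function name is hypothetical. The result is a single-subtest value and is not expected to reproduce the domain-level entries in Table 5, which aggregate several subtests.

```python
def pool_two_samples(n_a, es_a, n_b, es_b):
    """Return the combined n and the sample-size-weighted mean effect size."""
    total_n = n_a + n_b
    pooled_es = (n_a * es_a + n_b * es_b) / total_n
    return total_n, pooled_es


# Word Structure at age 5: STAR-2 sample (5A) and STEPS sample (5B), from Table 4.
total_n, pooled = pool_two_samples(31, 0.26, 82, 0.56)
print(total_n, round(pooled, 2))  # 113 0.48
```

The combined n of 113 matches the age 5 entry in the n row of Table 5.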
Test Reliability Validity Sensitivity, specificity Normative sample
CASL Test–retest for individual subtests: 0.65–0.95 Intercorrelation coefficients: 0.30–0.79, low. Construct validity established by developmental progression of scores, intercorrelations of tests, factor structures of the indexes 1,700 children for standardization
CELF-4 Test–retest, 0.71–0.86; split-half, 0.71–0.92 (subtests); interscorer agreement, 0.88–0.99 Validity established by test content, response processes, internal structure, relationships with other variables, consequences of testing. 1 SD, sensitivity: 1.00, specificity: 0.82; 1.5 SD, sensitivity: 1.00, specificity: 0.89; 2 SD, sensitivity: 0.87, specificity: 0.96 2,650 children for standardization
CELF-P:2 Test–retest, 0.77–0.96; coefficient alpha, 0.88–0.97; split-half, 0.88–0.98; interscorer agreement, 0.95–0.97 1 SD, sensitivity: 0.82, specificity: 0.84; 1.5 SD, sensitivity: 0.88, specificity: 0.72; 2 SD, sensitivity: 0.95, specificity: 0.60 >1,500 preschool children for standardization
EVT-2 Split-half (by age), 0.93–0.94; split-half (by grade), 0.93; alternate form reliability (by age) 0.87; test–retest (by age), 0.95 Construct validity: EVT-2 correlated with the EVT, CASL, CELF-4, GRADE, and PPVT-4. Age-normed sample = 3,540; grade-normed sample = 2,003 (conormed 100% with PPVT-4)
Content validity: stimuli chosen from review of published reference works and represent 20 content areas.
OWLS Internal, >0.92; test–retest, 0.73–0.94 Strong correlation with WJ-III Normative Update Broad Reading Composite. 2,123 for standardization
Construct validity: established by factor analysis related to integrative language theory.
EOWPVT Coefficient alpha median, 0.95; test–retest, 0.97–0.98 Validity established by correlation of 0.43 with WISC-IV VCI; correlation of 0.68–0.86 with ROWPVT Standardized on >2,400
PPVT Split-half (by age), 0.94; split-half (by grade), 0.94–0.95; alternate form reliability (by age), 0.89; test–retest, 0.93 Validity established by correlations with EVT-2, CASL, CELF-4, GRADE, and PPVT-III; correlations with the CASL: 0.41–0.79; correlations with the CELF-4: 0.67–0.75 Age-normed sample = 3,540; grade-normed sample = 2,003
TOLD-P:4 Coefficient alpha, 0.97; test–retest, 0.80–0.90; interscorer agreement, >0.90 Validity established through content validity, criterion-prediction validity, and construct-identification validity. Strong correlations with PLOS, TOLD-I:4, and the WISC-IV. Sensitivity: 0.74; specificity: 0.88 Standardized sample = 1,009
WJ-III Test–retest, 0.70–0.96 (depending on age) Standardized sample = 8,818
TOPEL Coefficient alpha, 0.86–0.96; interscorer agreement, 0.96–0.98; test–retest, 0.81–0.91 Validity as a measure of early literacy established through content validity, criterion-prediction validity, and construct-identification validity. Standardized sample = 842
Note. CASL = Comprehensive Assessment of Spoken Language; CELF-4 = Clinical Evaluation of Language Fundamentals–Fourth Edition; CELF-P:2 = Clinical Evaluation of Language Fundamentals–Preschool: Second Edition; EVT-2 = Expressive Vocabulary Test–Second Edition; GRADE = Group Reading Assessment and Diagnostic Evaluation; PPVT-III and -4 = Peabody Picture Vocabulary Test–III and Fourth Editions; OWLS = Oral and Written Language Scales; EOWPVT= Expressive One Word Picture Vocabulary Test; WISC-IV VCI = Wechsler Intelligence Scale for Children–Fourth Edition: Verbal Comprehension Index; ROWPVT = Receptive One-Word Picture Vocabulary Test; TOLD-P:4 = Test of Language Development–Primary: Fourth Edition; PLOS = Pragmatic Language Observation Scale; TOLD-I:4 = Test of Language Development–Intermediate: Fourth Edition; WJ-III = Woodcock-Johnson Test of Achievement–Third Edition; TOPEL = Test of Preschool Early Literacy.
Note. CASL = Comprehensive Assessment of Spoken Language; CELF-4 = Clinical Evaluation of Language Fundamentals–Fourth Edition; CELF-P:2 = Clinical Evaluation of Language Fundamentals–Preschool: Second Edition; EVT-2 = Expressive Vocabulary Test–Second Edition; GRADE = Group Reading Assessment and Diagnostic Evaluation; PPVT-III and -4 = Peabody Picture Vocabulary Test–III and Fourth Editions; OWLS = Oral and Written Language Scales; EOWPVT= Expressive One Word Picture Vocabulary Test; WISC-IV VCI = Wechsler Intelligence Scale for Children–Fourth Edition: Verbal Comprehension Index; ROWPVT = Receptive One-Word Picture Vocabulary Test; TOLD-P:4 = Test of Language Development–Primary: Fourth Edition; PLOS = Pragmatic Language Observation Scale; TOLD-I:4 = Test of Language Development–Intermediate: Fourth Edition; WJ-III = Woodcock-Johnson Test of Achievement–Third Edition; TOPEL = Test of Preschool Early Literacy.×
Test Reliability Validity Sensitivity, specificity Normative sample
CASL Test–retest for individual subtests: 0.65–0.95 Intercorrelation coefficients: 0.30–0.79, low. Construct validity established by developmental progression of scores, intercorrelations of tests, factor structures of the indexes 1,700 children for standardization
CELF-4 Test–retest, 0.71–0.86; split-half, 0.71–0.92 (subtests); interscorer agreement, 0.88–0.99 Validity established by test content, response processes, internal structure, relationships with other variables, consequences of testing. 1 SD, sensitivity: 1.00, specificity: 0.82; 1.5 SD, sensitivity: 1.00, specificity: 0.89; 2 SD, sensitivity: 0.87, specificity: 0.96 2,650 children for standardization
CELF-P:2 Test–retest, 0.77–0.96; coefficient alpha, 0.88–0.97; split-half, 0.88–0.98; interscorer agreement, 0.95–0.97 1 SD, sensitivity: 0.82, specificity: 0.84; 1.5 SD, sensitivity: 0.88, specificity: 0.72; 2 SD, sensitivity: 0.95, specificity: 0.60 >1,500 preschool children for standardization
EVT-2 Split-half (by age), 0.93–0.94; split-half (by grade), 0.93; alternate form reliability (by age) 0.87; test–retest (by age), 0.95 Construct validity: EVT-2 correlated with the EVT, CASL, CELF-4, GRADE, and PPVT-4. Age-normed sample = 3,540; grade-normed sample = 2,003 (conormed 100% with PPVT-4)
Content validity: stimuli chosen from review of published reference works and represent 20 content areas.
OWLS Internal, >0.92; test–retest, 0.73–0.94 Strong correlation with WJ-III Normative Update Broad Reading Composite. 2,123 for standardization
Construct validity: established by factor analysis related to integrative language theory.
EOWPVT Coefficient alpha median, 0.95; test–retest, 0.97–0.98 Validity established by correlation of 0.43 with WISC-IV VCI; correlation of 0.68–0.86 with ROWPVT Standardized on >2,400
PPVT Split-half (by age), 0.94; split-half (by grade), 0.94–0.95; alternate form reliability (by age), 0.89; test–retest, 0.93 Validity established by correlations with EVT-2, CASL, CELF-4, GRADE, and PPVT-III; correlations with the CASL: 0.41–0.79; correlations with the CELF-4: 0.67–0.75 Age-normed sample = 3,540; grade-normed sample = 2,003
TOLD-P:4 Coefficient alpha, 0.97; test–retest, 0.80–0.90; interscorer agreement, >0.90 Validity established through content validity, criterion-prediction validity, and construct-identification validity. Strong correlations with PLOS, TOLD-I:4, and the WISC-IV. Sensitivity: 0.74; specificity: 0.88 Standardized sample = 1,009
WJ-III Test–retest, 0.70–0.96 (depending on age) Standardized sample = 8,818
TOPEL Coefficient alpha, 0.86–0.96; interscorer agreement, 0.96–0.98; test–retest, 0.81–0.91 Validity as a measure of early literacy established through content validity, criterion-prediction validity, and construct-identification validity. Standardized sample = 842
Note. CASL = Comprehensive Assessment of Spoken Language; CELF-4 = Clinical Evaluation of Language Fundamentals–Fourth Edition; CELF-P:2 = Clinical Evaluation of Language Fundamentals–Preschool: Second Edition; EVT-2 = Expressive Vocabulary Test–Second Edition; GRADE = Group Reading Assessment and Diagnostic Evaluation; PPVT-III and -4 = Peabody Picture Vocabulary Test–III and Fourth Editions; OWLS = Oral and Written Language Scales; EOWPVT = Expressive One Word Picture Vocabulary Test; WISC-IV VCI = Wechsler Intelligence Scale for Children–Fourth Edition: Verbal Comprehension Index; ROWPVT = Receptive One-Word Picture Vocabulary Test; TOLD-P:4 = Test of Language Development–Primary: Fourth Edition; PLOS = Pragmatic Language Observation Scale; TOLD-I:4 = Test of Language Development–Intermediate: Fourth Edition; WJ-III = Woodcock-Johnson Test of Achievement–Third Edition; TOPEL = Test of Preschool Early Literacy.
Measure | Subtest | Domain
CASL | Syntax Construction | Grammar
CASL | Paragraph Comprehension | Grammar
CASL | Grammatical Morphemes | Grammar
CASL | Grammaticality Judgment | Grammar
CASL | Basic Concepts | Vocabulary
CASL | Antonyms | Vocabulary
CASL | Sentence Completion | Multiple a
CASL | Nonliteral Language | Supralinguistic
CASL | Inference | Supralinguistic
CASL | Pragmatic Judgment | Discourse
CELF-4 | Sentence Structure | Grammar
CELF-4 | Word Structure | Grammar
CELF-4 | Recalling Sentences | Grammar
CELF-4 | Formulated Sentences | Grammar
CELF-4 | Expressive Vocabulary | Vocabulary
CELF-4 | Word Classes–Receptive 1 | Vocabulary
CELF-4 | Word Classes–Expressive 1 | Vocabulary
CELF-4 | Word Classes–Receptive 2 | Vocabulary
CELF-4 | Word Classes–Expressive 2 | Vocabulary
CELF-4 | Concepts and Following Directions | Multiple a
EOWPVT | No subtests (median used) | Vocabulary
EVT | Form A | Vocabulary
EVT | Form B | Vocabulary
OWLS | Listening Comprehension | Multiple a
OWLS | Oral Expression | Multiple a
PPVT | Form A | Vocabulary
PPVT | Form B | Vocabulary
TOLD-4 | Syntactic Understanding | Grammar
TOLD-4 | Sentence Imitation | Grammar
TOLD-4 | Morphological Completion | Grammar
TOLD-4 | Picture Vocabulary | Vocabulary
TOLD-4 | Relational Vocabulary | Vocabulary
TOLD-4 | Oral Vocabulary | Vocabulary
WJ-III | Picture Vocabulary | Vocabulary
WJ-III | Oral Comprehension | Multiple a
Note. CASL = Comprehensive Assessment of Spoken Language; CELF-4 = Clinical Evaluation of Language Fundamentals–Fourth Edition; EOWPVT = Expressive One Word Picture Vocabulary Test; EVT = Expressive Vocabulary Test; OWLS = Oral and Written Language Scales; PPVT = Peabody Picture Vocabulary Test; TOLD-4 = Test of Language Development–Fourth Edition; WJ-III = Woodcock-Johnson Test of Achievement–Third Edition.
a Multiple = multiple domains for each subtest: CASL Sentence Completion: vocabulary, syntactic structure, and word retrieval; CELF-4 Concepts and Following Directions: listening comprehension and memory; OWLS Listening Comprehension: receptive vocabulary, grammar, pragmatic structures, and supralinguistic thinking; OWLS Oral Expression: expressive vocabulary, grammar, pragmatic structures, and supralinguistic thinking; WJ-III Oral Comprehension: listening, reasoning, and vocabulary.
Test | Subtest | Domain
STAR-2 Measures
CELF-P:2 | Sentence Structure | Grammar
CELF-P:2 | Word Structure | Grammar
CELF-P:2 | Recalling Sentences | Grammar
CELF-P:2 | Expressive Vocabulary | Vocabulary
CELF-P:2 | Basic Concepts | Vocabulary
CELF-P:2 | Concepts and Following Directions | Multiple a
TOPEL | Expressive Vocabulary | Vocabulary
STEPS Measures
CELF-4 | Word Structure | Grammar
CELF-4 | Recalling Sentences | Grammar
CELF-4 | Concepts and Following Directions | Multiple a
CELF-4 | Formulated Sentences | Multiple a
WJ-III | Picture Vocabulary | Vocabulary
Note. STAR-2 = Sit Together and Read–2; CELF-P:2 = Clinical Evaluation of Language Fundamentals–Preschool: Second Edition; TOPEL = Test of Preschool Early Literacy; STEPS = Speech Therapy Experiences in Public Schools; CELF-4 = Clinical Evaluation of Language Fundamentals–Fourth Edition; WJ-III = Woodcock-Johnson Test of Achievement–Third Edition.
a Multiple = multiple domains for each subtest: CELF-P:2 and CELF-4 Concepts and Following Directions: listening comprehension and memory; CELF-4 Formulated Sentences: semantics, syntax, and pragmatics.