Research Article  |   April 01, 2013
A Motor Speech Assessment for Children With Severe Speech Disorders: Reliability and Validity Evidence
 
Author Affiliations & Notes
  • Edythe A. Strand
    Mayo Clinic, Rochester, Minnesota
  • Rebecca J. McCauley
    The Ohio State University, Columbus
  • Stephen D. Weigand
    Mayo Clinic, Rochester, Minnesota
  • Ruth E. Stoeckel
    Mayo Clinic, Rochester, Minnesota
  • Becky S. Baas
    Mayo Clinic, Rochester, Minnesota
  • Correspondence to Edythe A. Strand: strand.edythe@mayo.edu
  • Editor: Jody Kreiman
  • Associate Editor: Julie Liss
Article Information
Journal of Speech, Language, and Hearing Research, April 2013, Vol. 56, 505–520. doi:10.1044/1092-4388(2012/12-0094)
History: Received March 24, 2012; Revised July 9, 2012; Accepted August 18, 2012
Purpose In this article, the authors report reliability and validity evidence for the Dynamic Evaluation of Motor Speech Skill (DEMSS), a new test that uses dynamic assessment to aid in the differential diagnosis of childhood apraxia of speech (CAS).

Method Participants were 81 children between 36 and 79 months of age who were referred to the Mayo Clinic for diagnosis of speech sound disorders. Children were given the DEMSS and a standard speech and language test battery as part of routine evaluations. Subsequently, intrajudge, interjudge, and test–retest reliability were evaluated for a subset of participants. Construct validity was explored for all 81 participants through the use of agglomerative cluster analysis, sensitivity measures, and likelihood ratios.

Results The mean percentage of agreement for 171 judgments was 89% for test–retest reliability, 89% for intrajudge reliability, and 91% for interjudge reliability. Agglomerative hierarchical cluster analysis showed that total DEMSS scores largely differentiated clusters of children with CAS vs. mild CAS vs. other speech disorders. Positive and negative likelihood ratios and measures of sensitivity and specificity suggested that the DEMSS does not overdiagnose CAS but sometimes fails to identify children with CAS.

Conclusions The value of the DEMSS in differential diagnosis of severe speech impairments was supported on the basis of evidence of reliability and validity.

Pediatric speech sound disorders (SSDs) result from a variety of etiologies and represent impairment at a number of different levels of speech production, including linguistic–phonologic and/or motor speech. One of the many challenges in differential diagnosis of SSD is determining to what degree motor speech impairment (i.e., deficits in planning–programming movement sequences and/or executing the movement) contributes to the child's SSD. In this article, we report evidence regarding the reliability and validity of the Dynamic Evaluation of Motor Speech Skill (DEMSS), a new instrument intended to address this challenge. The DEMSS is intended to assist in differential diagnosis of SSD in younger children and/or those with more severe impairments, including children who have little or no functional verbal communication but can at least attempt imitation.
The DEMSS is specifically designed to identify those children who have difficulty with praxis for speech. This level of motoric difficulty is now most commonly designated by the label childhood apraxia of speech (CAS; e.g., American Speech-Language-Hearing Association [ASHA], 2007; Terband & Maassen, 2010). Isolating deficits in the ability to plan and program movement transitions between articulatory postures for volitional speech in children with SSD is difficult, in part because speech and language processes are interactive (Goffman, 2004; Kent, 2004; Strand, 1992). Because these motor deficits often occur along with deficits in phonology, symptomatology may reflect a combination of linguistic (phonologic) and motor speech deficits (CAS and/or dysarthria; Crary, 1993; Rvachew, Hodge, & Ohberg, 2005; Smith & Goffman, 2004). In the absence of a physiologic marker for CAS (Shriberg, 2003; Shriberg, Aram, & Kwiatkowski, 1997), clinicians are left to make this diagnosis on the basis of behavioral characteristics.
A number of possible discriminative behavioral characteristics for CAS have been proposed (e.g., Davis, Jakielski, & Marquardt, 1998; Ozanne, 1995; Shriberg et al., 2003). In an effort to encourage greater uniformity in methods used for diagnosis, treatment, and study of CAS, ASHA (2007) undertook an examination of this area of practice. On the basis of an extensive review of the literature and expert testimony, authors of the resulting position statement came to the following conclusion:

[T]hree segmental and suprasegmental features that are consistent with a deficit in the planning and programming of movements for speech have gained some consensus among investigators in apraxia of speech in children: (a) inconsistent errors on consonants and vowels in repeated productions of syllables or words, (b) lengthened and disrupted co-articulatory transitions between sounds and syllables; and (c) inappropriate prosody, especially in the realization of lexical or phrasal stress. (p. 1)

Although these characteristics are broadly endorsed, the position statement contains the acknowledgment that no list of specific characteristics has yet been empirically validated for differential diagnosis.
Although ASHA's (2007) position statement and other frequently cited publications (e.g., Davis et al., 1998) posit behavioral characteristics as discriminative for the diagnosis of CAS, they do not directly lead to assessment procedures. Typical strategies used in the clinical assessment of SSDs include taking a thorough history, describing phonetic and phonemic inventories, administering oral structural–functional examinations, as well as giving articulation tests and/or measures of phonologic performance. Tests of articulation and phonology are designed for children who have at least a rudimentary speech sound inventory. Yet, there is a need for a tool to examine the performance of children who are very young and/or very limited in their speech production skills, including those who are nonverbal.
Motor Speech Examination
One type of assessment tool that may be adapted for use with these younger children is a motor speech examination (MSE), which is frequently used with adults to determine the presence of, or to rule out difficulty with, speech motor planning and programming (Duffy, 2005; McNeil, Robin, & Schmidt, 2009; Yorkston, Beukelman, Strand, & Hackel, 2010). It allows the clinician to observe speech production across utterances that vary in length and phonetic complexity using hierarchically organized stimuli—conditions that systematically vary programming demands. It also allows the clinician to make observations of behaviors frequently associated with deficits in speech praxis, including consonant and vowel distortions due to lengthened and/or distorted movement transitions, timing errors, dysprosody, and inconsistency across repeated trials. However, MSEs are less frequently used in assessing child SSDs (Skahan, Watson, & Lof, 2007), and, when used, they often lack evidence of psychometric quality (e.g., Guyette, 2001; McCauley, 2003; McCauley & Strand, 2008). Specifically, McCauley and Strand (2008)  found that of six published tests designed to assist in the diagnosis of motor speech disorders, only one—the Verbal Motor Production Assessment for Children (Hayden & Square, 1999)—provided evidence of validity, and none provided adequate evidence of reliability. Thus, this report is designed to describe our efforts in developing an MSE supported by more extensive evidence of reliability and validity.
Reliability and validity. Reliability evidence is fundamental to a test's development because measures of reliability estimate the degree to which a test is vulnerable to various sources of error (e.g., variation due to readministrations, testers, or within-tester inconsistencies), which constitute threats to the test's validity. This is especially true for an MSE for children because of the high potential for error presented by the nature of young children and their response to the artificialities of speech testing (Kent, Kent, & Rosenbek, 1987).
Validity refers to the degree to which a test measures what it purports to measure. Validity evidence for a test's use in diagnosis—that is, evidence supporting its contribution to accurate diagnosis—can be provided using several different approaches (Downing & Haladyna, 2006). Contrasting groups and correlations between the test under study and a gold standard (i.e., an acknowledged valid measure) are particularly common methods (McCauley, 2001). Cluster analysis—a statistical method that divides data into meaningful groups sharing common characteristics—may provide an equally compelling method for exploration of a test's validity for use in diagnosis. In studies of communication disorders, researchers have used cluster analysis primarily to (a) probe constructs such as the existence of homogeneous subgroups within broader diagnoses, including autism spectrum disorders (e.g., Verté et al., 2006), specific language disorders (e.g., Conti-Ramsden & Botting, 1999), and SSDs (Arndt, Shelton, Johnson, & Furr, 1977; Johnson, Shelton, & Arndt, 1982; Peter & Stoel-Gammon, 2008) and (b) identify clusters of co-occurring speech and nonspeech characteristics in CAS (Ball, Bernthal, & Beukelman, 2002).
When used in test development, cluster analysis allows one to begin with a heterogeneous group of children (e.g., children with SSDs) and determine to what extent the test and its components (e.g., subscores) can identify meaningful subgroups—for example, ones that might be related to motor speech status or other unanticipated participant variables. When test-determined clusters match other bases for participant groupings (e.g., diagnosis using different methods), it provides valuable evidence of the test's construct (diagnostic) validity. This report on the DEMSS makes use of this type of evidence.
The Dynamic Evaluation of Motor Speech Skill
As a motor speech examination, the DEMSS systematically varies the length, vowel content, prosodic content, and phonetic complexity within sampled utterances. Thus, it is not intended to serve as yet another articulation test or test of phonologic proficiency, both of which typically sample all segments in a language. Rather, the DEMSS is designed specifically to examine the speech movements of younger children and/or children who are more severely impaired, even those who may not yet produce many sounds, syllables, or words. Therefore, instead of sampling all American English speech sounds, the DEMSS focuses on earlier developing consonant sounds paired with a variety of vowels in several earlier developing syllable shapes. The stimuli and the scoring system are designed to allow examination of characteristics that have been frequently posited or reported in the literature to be associated with CAS (e.g., ASHA, 2007; Campbell, 2003; Davis et al., 1998; Strand, 2003), including lengthened and disrupted coarticulatory transitions between sounds and syllables (articulatory accuracy), inconsistency of errors on consonants and vowels across repeated trials, and prosodic accuracy, such as lexical stress.
The DEMSS comprises nine subtests (see Table 1), consisting of 66 utterances. These 66 utterances are associated with 171 items (i.e., judgments) that contribute to four types of subscores (overall articulatory accuracy of the word, vowel accuracy, prosodic accuracy, and consistency). Judgments of overall articulatory accuracy are made for all 66 utterances, judgments of vowel accuracy are made for 56 of these utterances, judgments of prosodic accuracy (viz., lexical stress accuracy) are made for 21 of the utterances, and judgments of consistency are made for 28 of the utterances. The DEMSS total score is the sum of the four subscores (overall articulatory accuracy, vowel accuracy, prosodic accuracy, and consistency). Items are scored either during testing or from a videotape sample following administration of the test.
Table 1 Dynamic Evaluation of Motor Speech Skills (DEMSS) content coverage.

Utterance type (examples) | No. of utterances | Overall articulatory accuracy | Vowel accuracy | Prosodic accuracy | Consistency
CV (me, hi) | 8 | 8 (0–32) | 8 (0–16) | — | 4 (0–4)
VC (up, eat) | 8 | 8 (0–32) | 8 (0–16) | — | 4 (0–4)
Reduplicated syllables (mama, booboo) | 4 | 4 (0–16) | — | 4 (0–4) | —
CVC1 (mom, peep, pop) | 6 | 6 (0–24) | 6 (0–12) | — | 6 (0–6)
CVC2 (mad, bed, hop) | 8 | 8 (0–32) | 8 (0–16) | — | 8 (0–8)
Bisyllabic 1 (baby, puppy) | 5 | 5 (0–20) | 5 (0–10) | 5 (0–5) | —
Bisyllabic 2 (bunny, happy) | 6 | 6 (0–24) | — | 6 (0–6) | —
Multisyllabic (banana, kangaroo) | 6 | 6 (0–24) | 6 (0–12) | 6 (0–6) | 6 (0–6)
Utterances of increasing length (dad, hi dad, hi daddy) | 15 | 15 (0–60) | 15 (0–30) | — | —
Total utterances | 66 | 66 | 56 | 21 | 28

Note. The four rightmost columns show the number of items judged for each subscore (range of possible scores). Only items within specific utterance types contribute to subscores for consistency, vowel accuracy, or prosodic accuracy.
The DEMSS uses dynamic assessment (Bain, 1994; Glaspey & Stoel-Gammon, 2007; Lidz & Peña, 1996), in which multiple attempts are elicited for scoring as the clinician uses cues and other strategies (e.g., slowed rate or simultaneous production) designed to facilitate performance. Dynamic assessment may be especially important in assessing SSDs. For some children who have few sounds, syllables, or words, even modest support may allow them to produce the utterance more accurately, showing emerging skills. Further, when the child is attempting to imitate specific speech movements with cuing, he or she may increase attention and/or effort toward achieving a particular spatial or temporal target. This allows observation of groping, segmentation, timing errors, or other characteristics associated with CAS that are frequently not evident in spontaneous utterances or in noncued repetitions.
The cuing used in dynamic assessment has the potential to facilitate judgments of severity and therefore prognosis, as well as to facilitate treatment planning. For example, if a child consistently needs considerable cuing to correctly produce a target or never produces it correctly despite cuing, his or her problem is seen as more severe and the prognosis for rapid improvement as more guarded (Peña et al., 2006). Treatment planning is facilitated in that the types of cues that proved helpful during the administration of the test suggest cuing strategies that are likely to be useful in treatment. Further, reviewing errors on specific vowels and across particular syllable shapes facilitates choices of content and complexity of early stimulus sets. Because the entire DEMSS uses dynamic assessment, judgments of severity and prognosis are facilitated beyond tools currently available.
During administration of the DEMSS, the clinician asks the child to imitate a series of words, with the child's eyes directed to the clinician's face as much as possible. Depending on the child's initial imitation of an utterance, the clinician may elicit one or more additional imitative attempts, with various levels of cuing, before scoring is completed. Visual, temporal, and tactile cues are implemented to help the child improve accuracy of production over repeated trials, with multidimensional scoring reflecting his or her responsiveness to cuing. (Please refer to the Appendix for examples of the multidimensional scoring.)
Table 2 describes basic rules for assigning specific scores within the four subscores—overall articulatory accuracy, vowel accuracy, prosody (lexical stress accuracy), and consistency (with higher scores reflecting poorer performance). Only vowel accuracy and prosody are always scored on the first attempt at an utterance. Overall articulatory accuracy may be scored based on subsequent trials. Consistency is always scored after all attempts on an item have been made. If the child's initial repetition is correct, the utterance is also scored for overall articulatory accuracy on the first trial. Then, the child is asked to imitate the utterance again so that consistency can be judged. If the initial response is incorrect, overall articulatory accuracy and consistency are scored on the basis of the child's subsequent efforts in response to a cuing hierarchy. Specifically, after an initial incorrect attempt, the examiner provides another auditory model while using a gesture to highlight the clinician's articulatory configuration (e.g., pointing with thumb and forefinger to rounded or closed lips) and making sure the child is watching the clinician's face. If the response is still incorrect, the clinician provides additional cuing (e.g., tactile cuing and/or having the child repeat the utterance simultaneously with the clinician, using slower movement gestures). After three or four unsuccessful attempts with cuing, the utterance is scored as incorrect, thereby receiving a score of 4 for overall articulatory accuracy.
Table 2 DEMSS scoring.

Assigning specific scores

Overall articulatory accuracy: 5-point multidimensional scoring
0 = correct on first attempt
1 = consistent developmental substitution error (e.g., /t/ for /k/; /w/ for /r/) without slowness or distortion of movement gestures
2 = correct after first cued attempt
3 = correct after two or three additional cued attempts
4 = not correct after all cued attempts

Vowel accuracy: 3-point multidimensional scoring
0 = correct
1 = mild distortion
2 = frank distortion

Prosodic accuracy: binary scoring
0 = correct
1 = incorrect

Consistency: binary scoring
0 = consistent across all trials
1 = inconsistent across any 2 or more trials
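To make the scoring rules concrete, the following is a minimal sketch in Python of how the 5-point overall articulatory accuracy judgment in Table 2 maps onto the outcomes of the cued trials described above. The function name and arguments are hypothetical illustrations, not part of the published test materials.

```python
def overall_articulatory_accuracy(first_attempt_correct,
                                  developmental_substitution,
                                  cued_attempts_to_correct):
    """Assign the 0-4 overall articulatory accuracy score from Table 2.

    cued_attempts_to_correct: number of cued attempts needed before a correct
    production, or None if the utterance was never produced correctly.
    """
    if first_attempt_correct:
        return 0
    if developmental_substitution:
        # e.g., /t/ for /k/ without slowness or distortion of movement gestures
        return 1
    if cued_attempts_to_correct == 1:
        return 2          # correct after the first cued attempt
    if cued_attempts_to_correct is not None:
        return 3          # correct after two or three additional cued attempts
    return 4              # not correct after all cued attempts
```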
Despite the careful selection of stimuli, cuing methods, and scoring procedures undertaken during the development of the DEMSS, its value as an MSE can be firmly asserted only following empirical demonstrations of its reliability and validity. In particular, further development of the DEMSS required that it be studied to determine the extent to which its scores were consistent across different testing occasions and test givers and accurate when used for diagnostic purposes.
Purpose
The purpose of this study was to examine the reliability and validity of this dynamic test of motor speech skill in children. Specifically, we studied the DEMSS's intraexaminer, interexaminer, and test–retest reliability. We also examined its ability to distinguish subgroups within a larger group of 81 children with SSDs. Our goal in this work was to demonstrate the construct validity of the DEMSS for use in identifying children who have speech motor difficulty, especially difficulty with praxis, and therefore need to have motoric approaches included in treatment strategies.
Method
Participants
This study was approved by the Mayo Clinic Institutional Review Board, and legal guardians of participants agreed to the children's participation. Participants included 81 children (63 males, 18 females) between the ages of 36 and 79 months who were consecutively referred for evaluations at the Mayo Clinic for concerns regarding SSDs. Exclusionary criteria included structural deficits (e.g., cleft palate), hearing loss, English as a second language, autism spectrum disorder, and dysarthria. These exclusions were determined from the medical record, history, and clinical examination. In the case of dysarthria, the structural–functional examination and clinical observations were also used. Children with dysarthria were excluded from participation because the DEMSS was not designed to be useful in its differential diagnosis (Yorkston et al., 2010). No specific cognitive cutoff was set for inclusion in the study. Participants needed to be able to attend to the clinician for the duration of the DEMSS, attempt the direct imitation, and tolerate cuing. Table 3 reports descriptive data for all participants.
Table 3 Participant characteristics ordered by DEMSS score within diagnostic group.
Group ID DEMSS GFTA Sex Age (mos) PPVT RLS ELS
Non-CAS 6 0 97 F 72 93 100 75
8 0 105 F 62 114 105 115
65 4 69 M 59 126 119 115
59 4 72 M 62 119 94 108
50 4 97 M 50 101 81 95
45 6 61 M 72 77 93 93
17 7 92 F 54 120 102 89
5 7 97 M 61 108 85 87
75 10 83 M 62 119 122 101
43 11 80 M 52 123 110 95
42 11 86 M 53 115 110 103
49 12 106 M 49 97 87 93
35 13 97 M 61 87 82 84
2 17 80 M 66 110 108 100
61 18 84 M 38 89 81 80
14 18 91 M 51 100 95 95
47 19 77 M 60 77 85 80
80 20 54 M 63 109 97 80
81 20 77 M 46 101 103 93
24 20 91 M 64 111 97 102
54 21 106 M 39 129 112 122
23 22 71 F 53 99 104 103
36 22 88 F 40 110 116 114
21 22 92 M 64 94 78 73
26 23 90 F 58 85 84
72 24 40 M 75 92 97 73
46 25 46 F 61 85 99
38 25 85 M 45 98 98 92
78 27 60 M 61 112 115 88
73 28 77 F 45 119 107 92
83 29 73 M 47 103 107 96
19 30 80 M 44 93 90 69
12 32 78 M 48 95 92 86
18 34 95 F 44 106 86 92
44 38 45 M 72 93 88 87
15 41 107 M 39 126 96 100
27 42 49 M 67 78 58 70
29 42 78 M 40 102 104 100
74 45 78 M 47 77 88 74
63 46 70 M 56 63 73 67
62 50 78 M 45 118 107 110
10 52 80 M 64 63 67 63
48 54 82 M 56 92 91 78
60 55 78 M 57 102 83 78
52 57 73 M 56 95 95 80
11 62 69 M 73 70 69 59
76 62 76 M 45 91 90 83
3 63 83 F 39 109 114 112
34 82 81 M 36 92 92 102
1 82 93 M 39 97 94 81
58 83 62 F 54 112 105 93
32 85 82 F 53 77
4 95 60 M 71 103 78 86
71 101 56 F 47 103 104 86
20 103 81 M 36 94 85 86
31 106 71 M 39 108 96 100
7 109 84 M 46 77
13 113 60 M 45 110 113 119
51 120 49 M 61 88 81 70
37 134 69 M 43 85 86 82
79 205 40 M 56 83 76 82
Mild CAS 66 54 56 M 56 119 99 123
57 74 55 F 63 99 94 85
16 99 47 M 71 86 69 65
9 112 50 M 68 90 84 70
30 123 62 F 45 123 90 92
28 145 40 F 69 84 74 75
67 159 65 M 44 83 73 72
56 159 81 M 37 90
Severe CAS 41 46 46 M 69 102 103 99
25 73 58 M 48 85 76 86
70 161 < 40 M 71 80 64
82 212 71 M 53 106 89 74
39 232 78 M 38 90 90 80
40 237 40 M 74 88 92
64 240 40 M 70 75 59 64
22 260 52 F 45 101 87 79
55 270 40 F 79 68
77 328 60 M 84 64 50 50
53 337 59 M 37 98 86 90
33 425 40 M 43 95 84 57
Note. Em dashes indicate that child attention or lack of time prevented completion of the entire test and of scoring. ID = identification; GFTA = standard score on the Goldman Fristoe Test of Articulation—Second Edition; PPVT = Peabody Picture Vocabulary Test; RLS = Receptive Language standard score on the Oral and Written Language Scales or the Preschool Language Scales; ELS = Expressive Language standard score on the Oral and Written Language Scales or the Preschool Language Scales; CAS = childhood apraxia of speech; M = male; F = female.
Procedure
All participants completed a comprehensive test battery typically given at the Mayo Clinic for children who are seen for concerns regarding SSDs. Assessment was completed by one of four certified speech-language pathologists who provide pediatric assessments at the Mayo Clinic. The assessment battery included standardized receptive and expressive language testing using the receptive and expressive subtests of the oral language scales of the Oral and Written Language Scales (Carrow-Woolfolk, 1997) or the Preschool Language Scales, Fourth Edition (PLS–4; Zimmerman, Steiner, & Pond, 2002) as well as the Peabody Picture Vocabulary Test—III (PPVT–III; Dunn & Dunn, 1997). A 15-min language sample was elicited through play and by engaging the child in topics of interest. The sample was used clinically to obtain mean length of utterance, to obtain phonetic and phonemic inventories, and to make observations regarding morphology and syntax. The Evaluation of Oral Function and Praxis (EOFP), an unpublished oral structural functional examination similar to that described by Strand and McCauley (1999), was used to examine neuromuscular status, cranial nerve function (strength, speed of movement, range of motion, etc., for each of the oral structures), and oral nonverbal praxis. The Goldman Fristoe Test of Articulation—Second Edition (GFTA–2; Goldman & Fristoe, 2000) was used to examine accuracy of consonant production in an isolated word context and to identify relevant consistent developmental substitution errors. The DEMSS was used for the motor speech examination by three of the four assessing clinicians. Two clinicians (fourth and fifth authors) underwent training on the test. The test's developer (first author) spent one session with them, explaining and demonstrating the administration and scoring. The clinicians were also given a sheet of scoring instructions. The first author then co-administered or observed video recordings of at least five of each of their administrations prior to beginning the study. Feedback was provided on both methods of elicitation and scoring. For those children assessed by the fourth assessing clinician, the DEMSS was administered by the first author, the fourth author, or the fifth author immediately following their evaluation.
Given that the study was conducted in the context of standard clinical practice at the Mayo Clinic, three of the four clinicians who conducted the clinical exam and speech diagnosis also administered the DEMSS as part of the assessment protocol. In order to reduce possible bias, the subscores (overall articulatory accuracy, vowel accuracy, prosodic accuracy, and consistency) and total scores on the DEMSS were not calculated prior to dictating the report, which was done soon after the assessment session. These reports included a statement of differential diagnosis based on the examiner's clinical observations and examination of language and articulation test results. Although observations included those made during administration of the DEMSS as well as all other assessment tasks, the DEMSS scores were not explicitly considered in the diagnosis.
In our clinical practice, diagnoses related to apraxia of speech typically indicate whether the difficulty with praxis for speech movement is the primary deficit, which is reported as CAS. Alternatively, if difficulty with speech praxis contributes in a lesser or mild way to the child's SSD, it is reported as either “phonologic impairment with mild deficits in motor planning/programming (mild CAS)” or “articulation errors with evidence for mild CAS.” (This is because CAS occurs on a continuum of severity. Some children have such severe deficits in praxis that they have trouble with even CVC syllables. Others may exhibit various degrees of characteristics of apraxia, even though the primary problem may be phonology or residual articulation errors.) For the purposes of this article, this clinical distinction is dichotomized as CAS or mild CAS (mCAS), respectively.
Data Preparation and Handling
Data for each child were recorded on assessment protocol sheets by the clinician and then were entered into a database by professional data entry staff. Accurate data entry was ensured through double data-entry techniques in which the data were entered by each of two independent data entry personnel, results were compared, and any discrepancies were resolved. Children were identified only by a sequential study identifier.
Reliability Procedures and Analyses
We analyzed three components of DEMSS reliability: (a) test–retest reliability, (b) intrajudge reliability, and (c) interjudge reliability of the instrument using subgroups of the 81 participants. This type of sampling was used in other studies of reliability (McLeod, Harrison, & McCormack, 2012; Wetherby, Allen, Cleary, Kublin, & Goldstein, 2002). Included in the test–retest analysis were results from 11 children who were able to return for readministration of the DEMSS within 1 week of their initial test. (The first and fourth authors readministered to seven and four children, respectively.) Twelve randomly chosen children were used for the intrajudge reliability (15%). The first and fourth authors each rescored the DEMSS for six of those 12 randomly selected participants based on review of the videotape of their original administration of the instrument. Twenty children were randomly chosen for the interjudge reliability analyses (25%). We chose to use a larger percentage because we wanted to make a strong case for reliability across clinicians. In this case, the third author scored 10 of those DEMSS administered by the first author and 10 of those DEMSS administered by the fourth author, using the videotapes of the original assessment. Because reliability is more difficult to obtain for children who have more severe impairment, we examined the randomly selected sample for levels of severity and found that 25% of the children randomly selected for the examination of reliability had severe impairment. To evaluate the three components of reliability, we examined percentages of agreement as well as intraclass correlation coefficients (ICCs) of the DEMSS total score and subscores. As a measure of the correlation among repeated measurements on the same participant, the ICC can be thought of as the proportion of total variability in a measurement that can be attributed to participant-to-participant variability (Streiner & Norman, 2003). Because high ICC values suggest that other sources of variability are relatively small, they provide evidence of reliability. For example, if 90% of the variability in a measurement was attributed to participant-to-participant variability and 10% was attributed to within-participant variability, the ICC would be 0.90. We estimated the ICC using random-intercept mixed-effects linear regression models (Pinheiro & Bates, 2002), in which one treats participant as a random effect, with the fixed-effects portion of the model including only an intercept.
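As an illustration of this modeling approach, the sketch below (Python with statsmodels; the long-format data frame and its column names are hypothetical, not the study's actual data files) estimates an ICC from a random-intercept model whose fixed-effects portion contains only an intercept.

```python
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical long format: one row per participant per scoring occasion.
df = pd.DataFrame({
    "participant": ["p1", "p1", "p2", "p2", "p3", "p3", "p4", "p4"],
    "score":       [12.0, 14.0, 85.0, 80.0, 205.0, 198.0, 46.0, 51.0],
})

# Random intercept per participant; fixed effects are the intercept only.
fit = smf.mixedlm("score ~ 1", df, groups=df["participant"]).fit()

between = fit.cov_re.iloc[0, 0]   # participant-to-participant variance
within = fit.scale                # within-participant (residual) variance
icc = between / (between + within)
print(round(icc, 2))
```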
Validity Procedures
We used two complementary approaches to assess validity, one that does not use CAS status a priori and the other that does. In the first approach, which does not use CAS status a priori, we used cluster analysis (Hastie, Tibshirani, & Friedman, 2003) to determine the degree to which the DEMSS identifies meaningful subgroups of children with speech disorders. In the second approach, which does take a priori CAS status into account, we examined the ability of the DEMSS scores to discriminate between children who were clinically classified as having CAS or mCAS versus those who were not.
Cluster Analysis Methods
We used a hierarchical agglomerative cluster analysis to identify groups of children with similar profiles of performance on the DEMSS subscores independent of their clinical diagnosis. We chose this method of cluster analysis because it provides a graphic representation of a series of clusterings (e.g., two clusters, then three clusters, etc.), which can more informatively summarize the heterogeneity of the subject group than alternative methods such as k-means cluster analysis, which does not provide this sequence of clusters (Hastie et al., 2003). The four DEMSS subscores (overall articulatory accuracy, vowel accuracy, prosody, and consistency) were the four variables considered. Because of positive skewness, each variable was transformed using the square root transformation. So that each variable was on a common scale, we converted the transformed subscores to z scores by subtracting the variable mean from each transformed score and then dividing the result by the variable SD.
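A minimal sketch of this preprocessing step, assuming the four subscores have already been assembled into an 81 x 4 NumPy array (the file name and variable names are hypothetical):

```python
import numpy as np

# Columns: overall articulatory accuracy, vowel accuracy, prosody, consistency.
subscores = np.loadtxt("demss_subscores.csv", delimiter=",")  # assumed 81 x 4 file

transformed = np.sqrt(subscores)   # square root transform to reduce positive skew
z = (transformed - transformed.mean(axis=0)) / transformed.std(axis=0)  # per-variable z scores
```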
The cluster analysis algorithm starts with each participant forming his or her own cluster. Next, the two clusters that are least dissimilar are merged. The algorithm repeats this procedure until there are only two remaining clusters and stops after merging these two clusters into one single cluster consisting of all participants. The dissimilarity measure we used is the Euclidean distance, defined as the square root of the sum of squared differences. For example, if the four normalized subscores for Participant A were {0.3, 0.5, 0.6, 1.1} and for Participant B were {0.7, 0.2, 0.5, 0.0}, the dissimilarity between the two participants would be calculated as the square root of (0.3 − 0.7)² + (0.5 − 0.2)² + (0.6 − 0.5)² + (1.1 − 0.0)². The dissimilarity between two clusters is defined as the mean of all pairwise dissimilarities between participants. The cluster analysis is summarized using a dendrogram, which graphically represents a series of nested clusters in such a way that the members of each cluster are individually identified and the distance or dissimilarity between clusters is indicated by the vertical axis.
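Continuing from the standardized array z in the previous sketch, the following SciPy code illustrates the worked dissimilarity example and an average-linkage agglomerative clustering with a dendrogram; it is a generic illustration rather than the authors' analysis script.

```python
import matplotlib.pyplot as plt
from scipy.cluster.hierarchy import dendrogram, fcluster, linkage
from scipy.spatial.distance import euclidean

# Pairwise dissimilarity for the worked example in the text.
a = [0.3, 0.5, 0.6, 1.1]
b = [0.7, 0.2, 0.5, 0.0]
print(euclidean(a, b))  # sqrt(0.16 + 0.09 + 0.01 + 1.21), about 1.21

# Agglomerative clustering: Euclidean distance with average linkage, so the
# dissimilarity between two clusters is the mean of all pairwise distances.
Z = linkage(z, method="average", metric="euclidean")

dendrogram(Z)                    # dissimilarity between merging clusters on the vertical axis
plt.ylabel("Dissimilarity")
plt.show()

# Cutting the tree into three groups (cf. Clusters A, B, and C in Figure 2).
labels = fcluster(Z, t=3, criterion="maxclust")
```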
To supplement the cluster approach, we used the gap statistic to assess the number of clusters supported by the data (Tibshirani, Walther, & Hastie, 2001). The gap statistic is based on summing across clusters the within-cluster sum of squared errors (SSE). Additional clusters will always reduce the total SSE, but the gap statistic can be used to determine the point at which increasing the number of clusters no longer reduces the total SSE beyond what might be expected by chance. Thus, the gap statistic is not a direct hypothesis test of the number of clusters in a data set but instead helps examine the efficiency (parsimoniousness) of a particular solution.
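A rough sketch of the gap statistic computation, under the simplifying assumptions that candidate clusterings come from cutting the average-linkage tree and that reference data sets are drawn uniformly over each variable's observed range; the function names are ours, for illustration only.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage

def within_cluster_ss(X, labels):
    # Pooled within-cluster sum of squared distances to the cluster centroids.
    return sum(((X[labels == k] - X[labels == k].mean(axis=0)) ** 2).sum()
               for k in np.unique(labels))

def gap_statistic(X, max_k=5, n_ref=50, seed=0):
    rng = np.random.default_rng(seed)
    lo, hi = X.min(axis=0), X.max(axis=0)
    tree = linkage(X, method="average", metric="euclidean")
    gaps = []
    for k in range(1, max_k + 1):
        labels = fcluster(tree, t=k, criterion="maxclust")
        log_w = np.log(within_cluster_ss(X, labels))
        ref_log_w = []
        for _ in range(n_ref):
            ref = rng.uniform(lo, hi, size=X.shape)  # reference data with no cluster structure
            ref_tree = linkage(ref, method="average", metric="euclidean")
            ref_labels = fcluster(ref_tree, t=k, criterion="maxclust")
            ref_log_w.append(np.log(within_cluster_ss(ref, ref_labels)))
        gaps.append(np.mean(ref_log_w) - log_w)      # larger gap = more support for k clusters
    return gaps
```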
To evaluate the clustering in the context of other measurements, we compared scores across the clusters for the following variables: gender, age, GFTA, PPVT, receptive language standard score (RLS), and expressive language standard score (ELS). Gender was evaluated with a χ² test, whereas the others were evaluated with a nonparametric Kruskal–Wallis test due to skewness in some measures.
Discrimination Methods
To minimize the potential confound of diagnosis and DEMSS administration having been completed by the same individuals, we structured the study so that the clinician's diagnostic activities and calculation of DEMSS scores were separate tasks. Weeks after all of the children had completed the protocol and after cluster data analyses were completed, the experimenters went back to the children's medical records. At that time, we categorized children on the basis of whether their medical record indicated a diagnosis of (a) CAS (characteristics of apraxia of speech as the major contributor to their SSD), (b) mCAS (phonologic impairment and/or residual articulation errors, with a contribution of at least mild difficulty with speech praxis), or (c) any other SSD. We then compared the diagnostic categorization with the results of the DEMSS cluster analysis.
We used several methods to evaluate the ability of the DEMSS total score and the four subscores to discriminate between participants with CAS (i.e., those with CAS or mCAS diagnoses) and those without CAS. Unlike the cluster analysis described above, these methods explicitly used the participants' clinically defined CAS status, which was taken as the reference standard in this study. When a test is used for diagnosis, sensitivity refers to the extent to which the test correctly identifies those with the disorder as having the disorder, whereas specificity refers to the extent to which it correctly identifies those without the disorder as not having the disorder (Dollaghan, 2007). The cut point of a test or subtest is the score that separates scores considered indicative of disorder from those that are considered indicative of no disorder. The cut points used in this study were empirically derived from univariate binary logistic regression models. The cut points were defined as the smallest value of the variable (total score or subscore) such that the estimated probability of having CAS (viz., having a clinical diagnosis of CAS or mCAS) is greater than 50%.
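The sketch below (Python with statsmodels; the score and cas arrays are hypothetical, with cas coding the clinical reference standard as 1 for CAS or mCAS and 0 otherwise) shows how such an empirically derived cut point, and the sensitivity and specificity at that cut point, can be obtained from a univariate logistic regression.

```python
import numpy as np
import statsmodels.api as sm

def cut_point_sensitivity_specificity(score, cas):
    """score: DEMSS total score or subscore (higher = poorer performance)."""
    score = np.asarray(score, dtype=float)
    cas = np.asarray(cas)
    fit = sm.Logit(cas, sm.add_constant(score)).fit(disp=0)
    probs = fit.predict(sm.add_constant(score))
    cut = score[probs > 0.5].min()       # smallest value with estimated P(CAS) > 0.5
    positive = score >= cut              # test-positive children
    sensitivity = positive[cas == 1].mean()
    specificity = (~positive)[cas == 0].mean()
    return cut, sensitivity, specificity
```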
Because sensitivity and specificity are both affected by the sample base rate (i.e., the likelihood that affected individuals will be included in the sample), we also calculated two other measures of accuracy that are less likely to be affected by sample base rates (Dollaghan, 2007). This is important in this study because there is almost certainly a referral bias at the Mayo Clinic, a tertiary medical center to which many families come for a second opinion about possible CAS. The additional accuracy measures were positive and negative likelihood ratios: LR+ and LR−, respectively. LR+ indicates how much more likely a positive test result is for someone with the disorder than for someone without the disorder. LR−, on the other hand, indicates how much more likely a negative test result is for someone with the disorder than for someone without the disorder. For diagnostic tests, Dollaghan (2007)  recommended LR+ exceeding 10 and LR− being less than 0.10.
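Given the sensitivity and specificity at a chosen cut point, the two likelihood ratios follow directly from their definitions; this small helper (ours, for illustration) makes the arithmetic explicit.

```python
def likelihood_ratios(sensitivity, specificity):
    lr_plus = sensitivity / (1 - specificity)    # P(test positive | CAS) / P(test positive | no CAS)
    lr_minus = (1 - sensitivity) / specificity   # P(test negative | CAS) / P(test negative | no CAS)
    # Dollaghan (2007) benchmarks: LR+ > 10 and LR- < 0.10 for an informative test.
    return lr_plus, lr_minus
```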
As a final method for examining the discrimination ability of the DEMSS, we used a nonparametric estimate of the area under the receiver operating characteristic (AUROC) curve and 95% bootstrap confidence intervals to quantify discrimination. The AUROC can be thought of as the average sensitivity of a measurement as well as an estimate of the probability of correctly classifying two children, one with CAS and one without CAS, based only on the DEMSS. We compared the discriminative ability of the DEMSS total score and subscores by using the method described by DeLong, DeLong, and Clarke-Pearson (1988).
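A minimal nonparametric sketch of the AUROC and a percentile bootstrap confidence interval, reusing the hypothetical score and cas arrays from the sketch above; it illustrates the general idea only and is not the DeLong et al. comparison procedure.

```python
import numpy as np

def auroc(score, cas):
    """Probability that a randomly chosen child with CAS has a higher (poorer)
    score than a randomly chosen child without CAS; ties count as one half."""
    score, cas = np.asarray(score, dtype=float), np.asarray(cas)
    pos, neg = score[cas == 1], score[cas == 0]
    higher = (pos[:, None] > neg[None, :]).mean()
    ties = (pos[:, None] == neg[None, :]).mean()
    return higher + 0.5 * ties

def auroc_bootstrap_ci(score, cas, n_boot=2000, seed=0):
    """95% percentile bootstrap confidence interval for the AUROC."""
    rng = np.random.default_rng(seed)
    score, cas = np.asarray(score, dtype=float), np.asarray(cas)
    estimates = []
    while len(estimates) < n_boot:
        idx = rng.integers(0, len(score), len(score))
        if cas[idx].min() == cas[idx].max():   # resample must include both groups
            continue
        estimates.append(auroc(score[idx], cas[idx]))
    return np.percentile(estimates, [2.5, 97.5])
```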
Results
Reliability
Results for the reliability analyses are reported in order for test–retest reliability, intrajudge reliability, and interjudge reliability. For each type of reliability, we report the mean percentage of agreement for the 171 judgments as well as the ICC. Figure 1 illustrates the observed agreement for the three reliability conditions.
Figure 1. A: Relationship between first and second assessment on each subject. B: First and second assessment for the rater. C: The two raters' assessment. Because of the large range of values, the data are shown on a log-transformed scale.
Test–retest reliability. For test–retest reliability, the mean percentage of agreement was 89%, with a range from 77% to 99%. These calculations were based on the 11 participants for whom test–retest data were available and consisted of the mean percentage of agreement across the 171 judgments per participant at Time 1 and Time 2. The test–retest column of Table 4 shows the ICCs for the total DEMSS score and for the subscores. These ranged from 0.82 for the consistency subscore to > 0.99 for the total DEMSS score.
Table 4 Intraclass correlation coefficient estimate (95% confidence interval [CI]) for three components of the reliability analysis.

Variable | Test–retest | Intrajudge | Interjudge
Total DEMSS score | 1.00 [.99, 1.00] | .92 [.77, .98] | .98 [.95, .99]
Vowel accuracy subscore | .99 [.98, 1.00] | .89 [.67, .97] | .94 [.87, .98]
Overall articulatory accuracy subscore | .99 [.98, 1.00] | .95 [.84, .98] | .98 [.96, .99]
Prosodic accuracy subscore | .99 [.96, 1.00] | .31 [−.27, .73] | .91 [.78, .96]
Consistency subscore | .82 [.48, .95] | .38 [−.20, .77] | .73 [.44, .88]
Intrajudge reliability. The mean intrajudge agreement for the 171 judgments was 89%, with a range from 81% to 95%. These calculations were based on the 12 participants for whom intrajudge reliability data were collected and consisted of the mean percentage of agreement across the 171 judgments per participant between each judge's two scorings. The corresponding ICCs are shown in the intrajudge column of Table 4. Low intrajudge ICC values for two subscores (0.31 for prosody and 0.38 for consistency), along with associated confidence intervals that included negative values, suggest poor or even absent reliability for these two subscores. In contrast, the intrajudge ICCs for the overall articulatory accuracy subscore and the total DEMSS score were high: 0.95 for overall articulatory accuracy and 0.92 for the total DEMSS score.
Interjudge reliability. The mean interjudge agreement of the 171 judgments was 91%, with a range from 80% to 98%. The data for these calculations were obtained from the 20 participants for whom interjudge reliability data were obtained and consisted of mean percentage of agreement for the two judges across the 171 judgments per participant. The interjudge column of Table 4 shows that the ICCs ranged from 0.73 for the consistency subscore to 0.98 for both the total DEMSS score and the overall articulatory accuracy subscore.
Validity
The cluster analysis using four DEMSS subscores (overall articulatory accuracy of the word, vowel accuracy, prosody, and consistency) is summarized in the dendrogram in Figure 2. Although the clustering algorithm does not take diagnosis into account, children who had been diagnosed with CAS or mCAS in their initial clinical evaluation appear in the figure as boxes and triangles, respectively, to identify to what degree children with this diagnosis are clustered together. The vertical axis of the dendrogram represents the dissimilarity between clusters. A horizontal line drawn at any point on the dendrogram will divide the participants into clusters, that is, groups of participants whose profiles on the four subscores (rather than the DEMSS total scores) are similar. Key clusters are labeled to guide the reader.
Figure 2. Dendrogram indicating the sequence of cluster merges beginning with each participant in his or her own cluster at bottom and ending with all participants merged into a single cluster at top. The vertical axis represents the dissimilarity between merging clusters. A horizontal line drawn at any point will divide the participants into clusters. We have indicated the point at which there are three clusters and have labeled them A, B, and C. Although clinical status was not used in the cluster analysis, we have indicated participants with CAS and mCAS, and we have identified them by their study identifier.
This dendrogram illustrates that there appear to be three major clusters of children. Cluster A consists of 15 children: three who had been diagnosed with CAS, five who had been diagnosed with mild CAS, and seven who had been diagnosed with some other SSD. Cluster B is composed of seven participants, who had been diagnosed with CAS. As might be expected, there is considerable variability within the cluster (as indicated by the horizontal distances among individuals), even though the algorithm clearly distinguishes this group from the rest of the children. Cluster C includes all other participants, including two children who had been clinically diagnosed with CAS and three children who had been diagnosed with at least mild difficulty with praxis for speech movement (mCAS).
Although not designed for use in the framework of hypothesis testing, the gap statistic provides a heuristic for determining the number of clusters in a data set. The gap statistics for the first five clusters were, in order, 0.42, 0.99, 1.02, 0.88, and 0.82. This sequence shows strong evidence against the single-cluster solution in that there is a marked increase in the gap statistic for the two-cluster solution, corresponding to an appreciable reduction in the within-cluster sum of squares. The gap statistic shows the three-cluster solution to be optimal. Given that there is only a marginal increase from two to three clusters, the two-cluster solution is the more parsimonious of the two. In this sense, we have found the gap statistic to be generally consistent with the above interpretation of the dendrogram.
Using Clusters A, B, and C identified in the dendrogram, we found no differences across the clusters in gender (p = .81) or age (p = .32). We did, however, observe significant differences in scores on the GFTA (Mdn = 65, 60, and 78 for Clusters A, B, and C, respectively; p = .03), the PPVT (Mdn = 86, 96, and 102; p < .01), the RLS (Mdn = 86, 88, and 95; p < .01), and the ELS (Mdn = 81, 77, and 89; p = .03).
Table 5 provides measures of how well the DEMSS total score and subscores discriminate among children with and without CAS, regardless of severity. AUROC values (estimates of the probability of correct classification based only on the DEMSS scores) were above 0.90 for the total score and for the vowel accuracy, overall articulatory accuracy, and consistency subscores but were not significantly different among these four measures (p = .32; DeLong et al., 1988).
Table 5 Summary measures indicating how well the DEMSS total score and subscores discriminate among children with and without CAS, regardless of severity.

Measure | AUROC (95% CI) | Logistic regression cutoff (a) | Sensitivity/specificity (b) | Specificity at 90% sensitivity | LR+/LR− (b)
Total DEMSS | 0.93 [0.83, 0.97] | 129 | 0.65/0.97 | 0.70 | 19.8/0.36
Vowel accuracy | 0.91 [0.79, 0.96] | 20 | 0.60/0.98 | 0.62 | 36.6/0.41
Total accuracy | 0.93 [0.82, 0.97] | 99 | 0.65/0.98 | 0.66 | 39.6/0.36
Prosody | 0.78 [0.64, 0.87] | 6 | 0.35/0.95 | 0.49 | 7.1/0.68
Consistency | 0.93 [0.82, 0.97] | 11 | 0.70/0.93 | 0.74 | 10.7/0.32

Note. AUROC = area under the receiver operating characteristic curve; LR+ = likelihood ratio for a positive test result; LR− = likelihood ratio for a negative test result.
(a) Cutoff is defined as the minimum value such that the estimated probability of CAS from the logistic regression model is greater than 0.5.
(b) Based on the indicated logistic regression cutoff.
The prosody subscore AUROC of 0.78 was significantly lower than the other measures (p < .02 for each comparison). The DEMSS total score and subscores are ordinal and, therefore, are not designed to support the identification of strict cutoffs. However, we found that, based on the logistic regression models and using a cutoff of an estimated probability greater than 0.5, sensitivity ranged from 0.70 for consistency down to 0.35 for prosody. Specificity was 93% or greater for all measures. Choosing a cutoff with sensitivity at 90% resulted in a specificity ranging from 0.74 for the consistency subscore to 0.49 for the prosody subscore. In addition to measures of specificity and sensitivity, similar findings were demonstrated for two related measures (LR+ and LR−, using the logistic regression cutoff), which were calculated because of their greater independence from the specific sample of test takers (see Table 5). A score above the cut point was roughly 20 to 40 times more likely for participants with CAS than for participants without CAS for the DEMSS total score, overall articulatory accuracy subscore, and vowel accuracy subscore (LR+ values of 19.8 to 39.6), easily exceeding the recommended LR+ > 10 (Dollaghan, 2007). However, LR− values did not reach the recommended LR− < .10 (Dollaghan, 2007).
Discussion
The purpose of this study was to provide evidence for the reliability and validity of a dynamic speech assessment tool for younger children and/or those with more severe speech impairments. The data support acceptable intrajudge, interjudge, and test–retest reliability of the instrument. Results also provide evidence supporting the validity of the DEMSS for the diagnostic purpose of differentiating children whose severe speech impairment is due, at least in part, to deficits in motor skill, especially in the planning and programming of movements for speech.
Reliability
Assessments of reliability are undertaken to examine the vulnerability of a test to error or to unintended sources of variability. The test–retest mean agreement of 89% obtained for the DEMSS in this study suggests a high level of test–retest reliability, or a relatively low level of error due to differences in the child's performance over two test administrations. This finding is valuable given the difficulties intrinsic to this type of testing, which arise from variations in child factors, such as attention and mood, as well as in clinician factors, such as judgment focus and interpretation of the child's response (Kent et al., 1987). The intrajudge mean agreement of 89% suggests that the clinician judged a child's performance on the DEMSS consistently, with few differences across judgments, even when the second set of judgments was made from a video recording in which the child's activity sometimes prevented visualization of the face or introduced extraneous noise. Thus, clinicians were able to score this test consistently across live versus recorded contexts, a finding suggesting that test users can score it online or from a video record, as the demands of their clinical context require. The interjudge mean agreement of 91% suggests acceptable reliability across clinicians. This finding is particularly important given the use of multidimensional scoring for two DEMSS subscores (overall articulatory accuracy and vowel accuracy), because this type of scoring can pose a particular challenge to reliability (Odekar & Hollowell, 2005). Further, this finding is important because two of the three judges in the study, although experienced clinicians, had not been involved in the DEMSS's development and had received less than 2 hr of instruction and practice prior to assuming their roles in the study. Thus, the demonstration of high interjudge agreement suggests that experienced clinicians can use the test reliably following modest training in its administration.
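For reference, the point-to-point agreement statistic underlying these percentages is simply the proportion of individual judgments scored identically on the two occasions. A minimal sketch with invented scores follows; the study reports the mean of such percentages across children and judgments.

```python
# Hypothetical item scores from the same judge scoring live and then from video.
live  = [0, 1, 2, 0, 4, 3, 0, 1, 2, 2]
video = [0, 1, 2, 0, 3, 3, 0, 1, 2, 1]

agreement = sum(a == b for a, b in zip(live, video)) / len(live)
print(f"{agreement:.0%}")   # 80% for this toy example
```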
As a further exploration of the DEMSS reliability for subscores, ICCs were used to examine consistency across subscores. ICC findings suggest that the DEMSS total score as well as the four subscores were highly reliable for both the test–retest and interjudge contexts. That is, differences in scores across children were primarily due to differences among children rather than other factors. In the intrajudge context only, prosody and consistency subscores were less reliable. The poorer performance of these two subscores, however, did not undermine the value of the DEMSS total score in discriminating groups of children. The smaller number of utterances judged for prosody (n = 21) and consistency (n = 28) than for either vowels (n = 56) or overall articulatory accuracy subscores (n = 66) likely contributed to the poorer ICCs. Also, the clinicians anecdotally reported that these judgments were among the most difficult they were required to make, consistent with the findings of Munson, Byorum, and Windsor (2003), thus suggesting the need for providing ample practice on judgments of stress. We plan to provide numerous examples of judgments of stress when we create a training tape for test administration.
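For readers unfamiliar with the computation, the sketch below illustrates one common ICC formulation (the Shrout and Fleiss two-way random-effects, absolute-agreement, single-rating coefficient) computed from ANOVA mean squares. The rating matrix is fabricated, and the study does not necessarily use this exact variant.

```python
import numpy as np

ratings = np.array([            # rows = children, columns = raters (or test occasions)
    [ 12,  14],
    [ 55,  52],
    [  4,   6],
    [120, 118],
    [240, 236],
    [ 83,  90],
], dtype=float)

n, k = ratings.shape
grand = ratings.mean()
row_means = ratings.mean(axis=1)
col_means = ratings.mean(axis=0)

ms_rows = k * np.sum((row_means - grand) ** 2) / (n - 1)     # between-children mean square
ms_cols = n * np.sum((col_means - grand) ** 2) / (k - 1)     # between-raters mean square
resid = ratings - row_means[:, None] - col_means[None, :] + grand
ms_err = np.sum(resid ** 2) / ((n - 1) * (k - 1))            # residual mean square

# ICC(2,1): agreement attributable to true differences among children
icc_2_1 = (ms_rows - ms_err) / (ms_rows + (k - 1) * ms_err + k * (ms_cols - ms_err) / n)
print(round(icc_2_1, 3))
```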
Validity
The DEMSS was designed to help clinicians identify the subset of children with severe SSDs who exhibit difficulty in motor skill, particularly motor programming difficulty. The label CAS is most frequently used for these children, who seem to have problems executing accurate movement gestures to reach specific spatial and temporal targets for speech (in the absence of weakness or other neuromuscular deficits). The ability of the DEMSS to distinguish children with characteristics associated with CAS versus those without, within a population of children with SSDs, represents a vital contribution to evidence of its validity.
In this study, agglomerative hierarchical cluster analysis was selected as the preferred method to examine the validity of the DEMSS in part because it could provide evidence that the test differentiated subgroups of children based on their patterns of performance in motor speech skill. The content of the DEMSS was designed not only to include information about children's articulatory accuracy but also to focus on vowel accuracy, prosody, and consistency of error productions on repeated attempts, which have been posited as important behavioral markers for the label of CAS. The three clusters identified using the DEMSS closely resembled groups that were based solely on a clinical diagnosis (CAS, mCAS, and other SSD), which had been made based on an earlier and more comprehensive set of test observations. Although the DEMSS was part of that test protocol, the scores were not tallied until weeks later, and they were not used in establishing the clinical diagnosis. Although this represents a limitation of this study, it is consistent with the incomplete nature of any examination of validity.
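As an illustration of the general technique (not a reproduction of the study's analysis, whose distance metric and linkage method are not restated here), the sketch below applies agglomerative hierarchical clustering to fabricated subscore profiles and cuts the resulting tree into three clusters, analogous to the dendrogram in Figure 2.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

rng = np.random.default_rng(1)
# Columns: overall articulatory accuracy, vowel, prosody, consistency
# (fabricated profiles for 12 hypothetical children).
X = np.vstack([
    rng.normal([ 20,  5, 1,  2], 5, size=(6, 4)),    # milder profiles
    rng.normal([180, 40, 8, 18], 10, size=(6, 4)),   # more impaired profiles
]).clip(0)

Z = linkage(X, method="ward")                        # merge history used to draw a dendrogram
clusters = fcluster(Z, t=3, criterion="maxclust")    # cut the tree into three clusters
print(clusters)
```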
The variability among the seven children in one cluster (Cluster B, all of whom had been diagnosed with CAS) illustrates the expected heterogeneity of behaviors observed within the diagnosis. For the children participating in this study, the total DEMSS score as well as the overall articulatory accuracy, vowel, and consistency subscores were all effective in distinguishing among groups of children within a larger group with SSDs. Although the findings suggest that any of those discriminating subscores might be used as a substitute for the whole test in differential diagnosis, the data also suggest that the total DEMSS is the best discriminator.
When logistic regression modeling was used to determine optimal cut points on the DEMSS, measures of sensitivity and specificity as well as likelihood ratios showed that the test did not overidentify CAS. However, a few children who had been given a diagnosis of CAS (n = 2) or mCAS (n = 3) based on the comprehensive speech evaluation were not successfully identified by the DEMSS. This is probably because the DEMSS had been designed specifically to address the significant challenges posed by children who are younger and/or exhibit more severe speech impairments. Thus, some children who were less severely affected probably performed well on the simpler stimuli incorporated in the DEMSS but were given the diagnosis of CAS or mCAS because of their performance on more phonetically complex or multisyllabic stimuli used during other testing conducted as part of the entire evaluation. For example, participants 41 and 25 were not identified by the DEMSS as having CAS, although the clinical examination showed significant difficulty with praxis. They exhibited numerous vowel distortions in connected speech. Direct imitation using visual attention to the clinician's face facilitated their performance on this isolated word task. Groping and voicing errors (frequently noted in children with CAS) were also noted in more difficult tasks but are not used in the scoring of the DEMSS, which likely contributed to the DEMSS not identifying them. This issue points to the likelihood that no one task or test is sufficient for differential diagnosis. This finding was not unexpected and emphasizes that the DEMSS is best used as part of a test battery for children who are younger and/or demonstrate more severely impaired speech acquisition. Until an alternate version of the DEMSS is designed for older children with more speech output, existing measures may be more appropriate for children with those skills (e.g., Rvachew et al., 2005; Thoonen, Maassen, Gabreels, & Schreuder, 1999; Thoonen, Maassen, Wit, Gabreels, & Schreuder, 1996).
This study showed that the DEMSS could identify groups with similar profiles of motor speech characteristics (movement accuracy, vowel accuracy, consistency, and prosody). We have been using the term CAS given the literature suggesting that these characteristics support the label for this subgroup of children with speech sound disorders. Although theories of CAS accounting for the pathophysiology of the disorder are in early stages of development (e.g., ASHA, 2007; Caruso & Strand, 1999; Nijland, Maassen, & van der Meulen, 2003; Shriberg, Lohmeier, Strand, & Jakielski, 2012; Terband & Maassen, 2010; Terband, Maassen, Guenther, & Brumberg, 2009), there is now fairly good agreement that these characteristics should be present to warrant the use of the diagnostic label. Although a larger number of characteristics may also be present (e.g., sound omissions, substitution errors, reduced phonemic and phonetic inventories), they are less likely to be discriminative than movement accuracy, vowel accuracy, and prosodic accuracy. There are reasons to expect these particular characteristics in difficulties with praxis (planning–programming volitional movement for speech; Odell & Shriberg, 2001; Shriberg et al., 2003). Although it is still unclear in what way or to what degree the proprioceptive feedback mechanisms, basal ganglia and cerebellar circuits, and/or motor planning areas of cortex are implicated in speech praxis deficits in children, deficits in these neural networks appear consistent with inaccurate movement trajectories and mistiming. If a child has trouble programming specific movement parameters, it will be very difficult to achieve the exact vocal tract shapes for vowels and consistent movement patterns across repetitions. Further, because lexical stress, at least in English, depends on the ability to program subtle differences in syllable duration, fundamental frequency, and signal amplitude (Kehoe, Stoel-Gammon, & Buder, 1995), difficulties with motor programming would probably result in lexical stress errors (Shriberg et al., 2003).
The cluster analysis, considered in the context of the other measures, is consistent with the expectation that children in Cluster C (which was composed largely of children without a diagnosis of either CAS or mCAS) would perform best on both the speech measure (GFTA) and the language measures (PPVT, RLS, ELS). Clusters A and B did not exhibit large differences except on the PPVT, on which Cluster B, composed entirely of children with a clinical diagnosis of CAS, unexpectedly performed better than Cluster A, which included children diagnosed with mCAS or other SSD.
Clinical Significance
Development of the DEMSS was motivated by the need for a psychometrically sound motor speech examination, added to the battery of measurement tools for children with SSDs, that identifies subgroups of children with speech praxis difficulties. The results of the study also highlight a clinically significant observation that is frequently overlooked: Children with SSDs do not typically fall neatly into a binary pattern of either having or not having a particular impairment (e.g., phonologic disorder vs. CAS). Rather, children more often present with overlapping characteristics, as seen in the overlap between diagnostic categories in Figure 2. Children with motor speech impairment probably exhibit errors in phonology, given that the motoric deficit almost certainly makes phonologic acquisition more difficult. Further, varying degrees of severity of impairment also blur the boundaries between categories. The purpose of a clinical tool such as the DEMSS is not to separate categories completely but to help identify those children who show at least some evidence of difficulty with praxis, so that clinicians planning their treatment can incorporate principles of motor learning.
For a tool to be useful in the clinic, it must be administered and scored in a timely manner. Administration time for the DEMSS ranged from 7 min (for children with predictable articulatory substitution errors and little need for cuing) to 25 min (for children with more severe impairments who needed more frequent cuing). A test of this length should be practical even in a busy clinic environment. To use it, clinicians will need to learn the scoring system and practice making judgments, especially those related to lexical stress. The clinicians administering the DEMSS for this study had one 30-min session with the developer, followed by feedback on five administrations. A 30-min training tape (including practice and feedback), combined with five practice administrations reviewed on video, will likely allow clinicians to administer the DEMSS reliably, although this claim still requires verification.
Although the purpose of this article was to report the validity and reliability of the DEMSS, comments on other potential uses of this instrument in clinical decision making warrant at least brief discussion. For example, the amount of cuing needed by the child during administration of the DEMSS offers additional information related to the severity of the child's problem as well as the types of cuing that may be most facilitative in treatment. Observation of specific vowel errors and performance on specific syllable structures can be helpful in choosing initial sets of stimuli. Performance under dynamic testing conditions such as those used in the DEMSS could be expected to lead to predictions regarding prognosis. This, in turn, would influence suggestions regarding the use of signing and/or other methods of augmentative communication to facilitate language development and functional communication while work to improve speech acquisition proceeds.
The initial motivation for the DEMSS was the absence of a motor speech tool appropriate for assessing very young children with limited output, many of whom cannot meet the demands of most standardized tests. The dynamic nature of the DEMSS, which uses direct imitation of simple phonetic content and syllable structures with cuing, makes it a valuable tool for these younger children. Our study shows that many children, even 3-year-olds with very little speech output, can complete the DEMSS by attempting imitation of each item. Clinicians can then observe what happens when the child attempts each production, with and without cuing, and judge the presence of specific characteristics that may be helpful in determining differential diagnosis as well as severity and prognosis.
Conclusions
The results of this study provide initial evidence for the validity and reliability of the DEMSS as part of a comprehensive protocol for differential diagnosis of children with severe SSDs. The DEMSS was designed to meet the need for a motor speech examination to be used with young and/or severely affected children to determine to what degree motor impairment may be impacting their difficulties with speech acquisition. We believe the data provide a foundation supporting the validity of this instrument to discriminate a subgroup of children who likely have difficulty with praxis for speech. The data also provide some evidence for the validity of these four characteristics as discriminatory for the diagnosis of CAS and support future research in determining the validity of these variables as part of a behavioral phenotype for CAS.
Although the development of theories of CAS and explication of its nature and most consistently discriminative signs continue, it is important to be able to identify children who may benefit most from treatment approaches and strategies that directly facilitate motor learning and speech motor control. The ASHA (2007)  position statement on CAS was a start toward more consistent participant description so that researchers and consumers of research could be more confident about the similarity of children studied and treated under the label CAS. The reliability and validity data for the DEMSS provided in this study suggest that this test has the potential to further advance that clinical and research goal.
Acknowledgments
The first author acknowledges the support of Mayo Clinic CTSA through National Center for Research Resources Grant UL1RR024150. We also thank Heather Clark and David Ridge for reading and commenting on earlier versions of this article.
References
American Speech-Language-Hearing Association. (2007). Childhood apraxia of speech: Nomenclature, definition, roles and responsibilities, and a call for action [Position statement]. Rockville, MD: Author.
Arndt, W. B., Shelton, R. L., Johnson, A. F., & Furr, M. L. (1977). Identification and description of homogeneous subgroups within a sample of misarticulating children. Journal of Speech and Hearing Research, 20, 263–292.
Bain, B. A. (1994). A framework for dynamic assessment in phonology: Stimulability revisited. Clinics in Communication Disorders, 4, 12–22.
Ball, L. J., Bernthal, J. E., & Beukelman, D. R. (2002). Profiling communication characteristics of children with developmental apraxia of speech. Journal of Medical Speech-Language Pathology, 10, 221–229.
Campbell, T. (2003). Childhood apraxia of speech: Clinical symptoms and speech characteristics. In Shriberg, L. D., & Campbell, T. F. (Eds.), Proceedings of the 2002 Childhood Apraxia of Speech Symposium (pp. 37–40). Carlsbad, CA: The Hendrix Foundation.
Carrow-Woolfolk, E. (1997). Oral & Written Language Scales. Circle Pines, MN: AGS.
Caruso, A., & Strand, E. (1999). Motor speech disorders in children: Definitions, background and a theoretical framework. In Caruso, A., & Strand, E. A. (Eds.), Clinical management of motor speech disorders in children (pp. 1–27). New York, NY: Thieme.
Conti-Ramsden, G., & Botting, N. (1999). Classification of children with specific language impairment: Longitudinal considerations. Journal of Speech, Language, and Hearing Research, 42, 1195–1204.
Crary, M. A. (1993). Developmental motor speech disorders. San Diego, CA: Singular.
Davis, B. L., Jakielski, K. J., & Marquardt, T. P. (1998). Developmental apraxia of speech: Determiners of differential diagnosis. Clinical Linguistics & Phonetics, 12, 25–45.
DeLong, E. R., DeLong, D. M., & Clarke-Pearson, D. L. (1988). Comparing the areas under two or more correlated receiver operating characteristic curves: A nonparametric approach. Biometrics, 44, 837–845.
Dollaghan, C. (2007). The handbook for evidence-based practice in communication disorders. Baltimore, MD: Brookes.
Downing, S. M., & Haladyna, T. M. (2006). Handbook of test development. Mahwah, NJ: Erlbaum.
Duffy, J. (2005). Motor speech disorders: Substrates, differential diagnosis and management (2nd ed.). St. Louis, MO: Elsevier Mosby.
Dunn, L., & Dunn, L. (1997). Peabody Picture Vocabulary Test—III. Circle Pines, MN: AGS.
Glaspey, A., & Stoel-Gammon, C. (2007). A dynamic approach to phonological assessment. Advances in Speech Language Pathology, 9, 286–296.
Goffman, L. (2004). Assessment and classification: An integrative model of language and motor contributions to phonological development. In Kamhi, A., & Pollock, K. (Eds.), Phonological disorders in children: Clinical decision making in assessment and intervention (pp. 51–64). Baltimore, MD: Brookes.
Goldman, R., & Fristoe, M. (2000). Goldman Fristoe Test of Articulation—Second Edition. Circle Pines, MN: AGS.
Guyette, T. W. (2001). Review of the Apraxia Profile. In Plake, B. S., & Impara, J. C. (Eds.), The fourteenth mental measurements yearbook (pp. 57–58). Lincoln, NE: Buros Institute of Mental Measurements.
Hastie, T., Tibshirani, R., & Friedman, J. H. (2003). The elements of statistical learning. New York, NY: Springer-Verlag.
Hayden, D., & Square, P. (1999). Verbal Motor Production Assessment for Children. San Antonio, TX: Pearson PsychCorp.
Johnson, A. F., Shelton, R. L., & Arndt, W. B. (1982). A technique for identifying the subgroup membership of certain misarticulating children. Journal of Speech and Hearing Research, 25, 162–166.
Kehoe, M., Stoel-Gammon, C., & Buder, E. H. (1995). Acoustic correlates of stress in young children's speech. Journal of Speech and Hearing Research, 38, 338–350.
Kent, R. D. (2004). Models of speech motor control: Implications from recent developments in neurophysiological and neurobehavioral science. In Maassen, B., Kent, R. D., Peters, H. F. M., van Lieshout, P. H. M., & Hulstijn, W. (Eds.), Speech motor control in normal and disordered speech (pp. 3–28). London, England: Oxford University Press.
Kent, R. D., Kent, J., & Rosenbek, J. (1987). Maximal performance tests of speech production. Journal of Speech and Hearing Disorders, 52, 367–387.
Lidz, C. S., & Peña, E. D. (1996). Dynamic assessment: The model, its relevance as a non-biased approach and its application to Latino American preschool children. Language, Speech, and Hearing Services in Schools, 27, 367–372.
McCauley, R. J. (2001). Assessment of language disorders in children. Mahwah, NJ: Erlbaum.
McCauley, R. J. (2003). Review of the Screening Test for Developmental Apraxia of Speech (2nd ed.). In Plake, B., Impara, J. C., & Spies, R. A. (Eds.), The fifteenth mental measurements yearbook (pp. 786–789). Austin, TX: Pro-Ed.
McCauley, R. J., & Strand, E. A. (2008). A review of standardized tests of nonverbal oral and speech motor performance in children. American Journal of Speech-Language Pathology, 17, 1–11.
McLeod, S., Harrison, L. J., & McCormack, J. (2012). The intelligibility in context scale: Validity and reliability of a subjective rating measure. Journal of Speech, Language, and Hearing Research, 55, 648–656.
McNeil, M. R., Robin, D. A., & Schmidt, R. A. (2009). Apraxia of speech: Definition, differential, and treatment. In McNeil, M. (Ed.), Clinical management of sensorimotor speech disorders (2nd ed., pp. 249–268). New York, NY: Thieme.
Munson, B., Byorum, E., & Windsor, J. (2003). Acoustic and perceptual correlates of stress in nonwords produced by children with suspected developmental apraxia of speech and children with phonological disorder. Journal of Speech, Language, and Hearing Research, 46, 189–202.
Nijland, L., Maassen, B., & van der Meulen, S. (2003). Evidence of motor programming deficits in children with DAS. Journal of Speech, Language, and Hearing Research, 46, 437–450.
Odekar, A., & Hollowell, B. (2005). Comparison of alternatives to multidimensional scoring in the assessment of language comprehension in aphasia. American Journal of Speech-Language Pathology, 14, 337–345.
Odell, K. H., & Shriberg, L. D. (2001). Prosody-voice characteristics of children and adults with apraxia of speech. Clinical Linguistics & Phonetics, 15, 275–307.
Ozanne, A. (1995). The search for developmental verbal dyspraxia. In Dodd, B. (Ed.), Differential diagnosis and treatment of children with speech disorder (pp. 91–101). San Diego, CA: Singular.
Peña, E. R., Gillam, R. B., Malek, M., Ruez-Felter, R., Recendez, M., & Sabel, T. (2006). Dynamic assessment of school-age children's narrative ability: An investigation of reliability and validity. Journal of Speech, Language, and Hearing Research, 49, 1037–1057.
Peter, B., & Stoel-Gammon, C. (2008). Central timing deficits in subtypes of primary speech disorders. Clinical Linguistics & Phonetics, 22, 171–198.
Pinheiro, J. C., & Bates, D. M. (2002). Mixed effects models in S and S-Plus. New York, NY: Springer.
Rvachew, S., Hodge, M., & Ohberg, A. (2005). Obtaining and interpreting maximum performance tasks from children: A tutorial. Journal of Speech-Language Pathology and Audiology, 29, 146–157.
Shriberg, L. (2003). Diagnostic markers for child speech-sound disorders: Introductory comments. Clinical Linguistics & Phonetics, 17, 501–505.
Shriberg, L. D., Aram, D. M., & Kwiatkowski, J. (1997). Developmental apraxia of speech: II. Toward a diagnostic marker. Journal of Speech, Language, and Hearing Research, 40, 286–312.
Shriberg, L. D., Campbell, T. F., Karlsson, H. B., Brown, R. L., McSweeny, J. L., & Nadler, C. J. (2003). A diagnostic marker for childhood apraxia of speech: The lexical stress ratio. Clinical Linguistics & Phonetics, 17, 549–574.
Shriberg, L. D., Lohmeier, H. L., Strand, E. A., & Jakielski, K. J. (2012). Encoding, memory, and transcoding deficits in childhood apraxia of speech. Clinical Linguistics & Phonetics, 26, 445–482.
Skahan, S. M., Watson, M., & Lof, G. L. (2007). Speech-language pathologists' assessment practices for children with suspected speech sound disorders: Results of a national survey. American Journal of Speech-Language Pathology, 16, 246–249.
Smith, A., & Goffman, L. (2004). Interaction of motor and linguistic factors in the development of speech production. In Maassen, B., Kent, R., Peters, H., van Lieshout, P., & Hulstijn, W. (Eds.), Speech motor control in normal and disordered speech (pp. 225–252). London, England: Oxford University Press.
Strand, E. A. (1992). The integration of motor-speech processes and language formulation in process models of language acquisition. In Chapman, R. S. (Ed.), Child talk: Processes in language acquisition and disorder (pp. 86–107). St. Louis, MO: Mosby.
Strand, E. A. (2003). Childhood apraxia of speech: Suggested diagnostic markers for the younger child. In Shriberg, L. D., & Campbell, T. F. (Eds.), Proceedings of the 2002 Childhood Apraxia of Speech Symposium (pp. 75–79). Carlsbad, CA: The Hendrix Foundation.
Strand, E. A., & McCauley, R. J. (1999). Assessment procedures for treatment planning in children with phonologic and motor speech disorders. In Caruso, A., & Strand, E. A. (Eds.), Clinical management of motor speech disorders of children (pp. 73–108). New York, NY: Thieme.
Streiner, D. L., & Norman, G. R. (2003). Health measurement scales: A practical guide to their development and use (3rd ed.). New York, NY: Oxford University Press.
Terband, H., & Maassen, B. (2010). Speech motor development in childhood apraxia of speech: Generating testable hypotheses by neurocomputational modeling. Folia Phoniatrica et Logopaedica, 62, 134–142.
Terband, H., Maassen, B., Guenther, F., & Brumberg, J. (2009). Computational neural modeling of speech motor control in childhood apraxia of speech (CAS). Journal of Speech, Language, and Hearing Research, 52, 1595–1609.
Thoonen, G., Maassen, B., Gabreels, F., & Schreuder, R. (1999). Validity of maximum performance tasks to diagnose motor speech disorders in children. Clinical Linguistics & Phonetics, 13, 1–23.
Thoonen, G., Maassen, B., Wit, J., Gabreels, F., & Schreuder, R. (1996). The integrated use of maximum performance tasks in differential diagnostic evaluations among children with motor speech disorders. Clinical Linguistics & Phonetics, 10, 311–336.
Tibshirani, R., Walther, G., & Hastie, T. (2001). Estimating the number of clusters in a data set via the gap statistic. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 63, 411–423.
Verté, S., Geurts, H. M., Roeyers, H., Rosseel, Y., Oosterlaan, J., & Sergeant, J. A. (2006). Can the Children's Communication Checklist differentiate autism spectrum subtypes? Autism, 10, 266–287.
Wetherby, A. M., Allen, L., Cleary, J., Kublin, K., & Goldstein, H. (2002). Validity and reliability of the Communication and Symbolic Behavior Scales Developmental Profile with very young children. Journal of Speech, Language, and Hearing Research, 45, 1202–1218.
Yorkston, K., Beukelman, D., Strand, E. A., & Hackel, M. (2010). Management of motor speech disorders in children and adults. Austin, TX: Pro-Ed.
Zimmerman, L. L., Steiner, V. G., & Pond, R. E. (2002). Preschool Language Scales (4th ed.). San Antonio, TX: Pearson PsychCorp.
Appendix
DEMSS Content Scoring
General Instructions
  • Make sure the child's face is in good view on the camera. Check the audio levels. Make sure there is little background noise.

  • Draw the child's attention to your face for every item.

  • Use a moderate to slightly slow, but natural, rate, and natural prosody.

  • Use reinforcers that are quick and that keep the child's attention on your face.

  • Score online—so that you will be sure to do the appropriate cuing for the third and fourth attempts. Then, if necessary, go back and score using videotape.

  • Give up to four trials.

  • Use an X only if there was no attempt.

Rules for Dynamic Administration and Scoring
Overall Articulatory Accuracy Scoring
  1. 0 = Immediate correct repetition.

  2. 1 = Immediate (no groping or trial-and-error behavior); accurate rate and movement but a consistent error.

    • Must have normal vowel

    • Substitution error that is developmental, consistent, and predictable (e.g., f/th; w/r; w/l; lateral lisp). This does not include a final consonant deletion (or any sound deletion), which should be cued.

    • Score children with phoneme collapse (use of one or two sounds to represent most substitutions) with a 1, since it represents a consistent error. Of course, if the child also exhibits vowel distortion, groping, or other errors, it will be a 2, 3, or 4.

  3. 2 = Incorrect first try. Give a 2 if the child is able to produce the word after one additional model of the item (the clinician draws the child's attention to his or her face and provides the model a little more slowly). Also give a 2 if the child produces the correct response on the first try but with more time or with more groping and/or trial-and-error behavior.

  4. 3 = Needs cuing such as slow, simultaneous production and/or gestural or tactile cues (correct within three or four additional trials). Do not go beyond four cued trials.

  5. 4 = No correct response with repeated attempts.

  6. X = Refusal/inattention/no attempt.

    • Score accuracy on first try only if correct (0). If correct, go on to the second repetition in order to score consistency. If first production is correct but second is incorrect, keep the accuracy score as a 0 but give a 1 for consistency.

    • If the child is not accurate on the first attempt, score accuracy after doing the described cuing procedure.

    • Let the child use his or her own strategies for self-cuing during first attempt.

    • If the child self-corrects on the first attempt, without intervening cues from the clinician, score the self-correction.

    • If you suspect the child attempted the wrong target word (e.g., you say ma and the child thinks you said mom), start again.

    • Final plosives do not have to be aspirated; productions with or without aspiration are acceptable.

Vowel Scoring—Score Only on First Attempt
  1. 0 = Immediate correct repetition of the vowel in that coarticulatory context

  2. 1 = Mild distortion (e.g., neutralization; slightly “off” vocal tract shape; if you think to yourself, “Was that OK?” give it a 1)

  3. 2 = Frank distortion

    • Always score the vowel on the first trial.

    • If a word has two or more vowels (e.g., banana), score the vowel in the stressed syllable.

    • If the child self-corrects on the first attempt, without intervening cues from the clinician, score the self-correction.

Prosody Scoring (Lexical Stress)—Score on First Attempt
  1. 0 = Correct prosody

  2. 1 = Incorrect prosody

    • Score prosody using binary scoring. Use a 0 for correct prosody and a 1 for incorrect prosody (lexical stress errors such as segmentation, equal stress, or wrong stress).

    • Use appropriate lexical stress as a model, especially for the sections where prosody is scored (e.g., don't segment syllables or use equal stress in an effort to elicit better accuracy). The stressed syllable is in bold.

Consistency Scoring
  1. 0 = Consistent

  2. 1 = Inconsistent

    • When scoring consistency, have the child repeat the utterance in succession, without intervening test items. If any two responses during cuing are different, then score inconsistent.

    • Score consistency using binary scoring. Use a 0 for “consistent” and a 1 for “inconsistent.”
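To summarize the rules above in one place, the following sketch shows how the four judgments for an item might be recorded and tallied into subscores and a total (higher scores reflect poorer performance, and the total equals the sum of the four subscores, consistent with the score ranges in Tables 1 and 3). The class and helper function are illustrative and are not part of the published test materials.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class ItemScore:
    articulation: int                  # 0-4 multidimensional score (overall articulatory accuracy)
    vowel: Optional[int] = None        # 0-2, first attempt only, when the item is vowel-scored
    prosody: Optional[int] = None      # 0/1 lexical stress, when the item is prosody-scored
    consistency: Optional[int] = None  # 0/1 across repeated attempts, when applicable

def subscores(items):
    """Sum each dimension over the items that score it; higher totals indicate poorer performance."""
    return {
        "overall articulatory accuracy": sum(i.articulation for i in items),
        "vowel accuracy": sum(i.vowel for i in items if i.vowel is not None),
        "prosodic accuracy": sum(i.prosody for i in items if i.prosody is not None),
        "consistency": sum(i.consistency for i in items if i.consistency is not None),
    }

# Three fabricated items: one needing a cue, one correct immediately, one never correct.
items = [ItemScore(2, vowel=1, consistency=0),
         ItemScore(0, vowel=0, consistency=0),
         ItemScore(4, vowel=2, prosody=1, consistency=1)]
totals = subscores(items)
totals["total DEMSS"] = sum(totals.values())
print(totals)
```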

Figure 1.

A: Relationship between the first and second test administrations for each child (test–retest). B: The same rater's first and second scorings (intrajudge). C: The two raters' scorings (interjudge). Because of the large range of values, the data are shown on a log-transformed scale.

Figure 2.

Dendrogram indicating the sequence of cluster merges beginning with each participant in his or her own cluster at bottom and ending with all participants merged into a single cluster at top. The vertical axis represents the dissimilarity between merging clusters. A horizontal line drawn at any point will divide the participants into clusters. We have indicated the point at which there are three clusters and have labeled them A, B, and C. Although clinical status was not used in the cluster analysis, we have indicated participants with CAS and mCAS, and we have identified them by their study identifier.

Table 1 Dynamic Evaluation of Motor Speech Skills (DEMSS) content coverage.
Utterance type (examples) No. of utterances No. of items judged for each subscore (range of possible scores)
Overall articulatory accuracy Vowel accuracy Prosodic accuracy Consistency
CV (me, hi) 8 8 (0–32) 8 (0–16) 4 (0–4)
VC (up, eat) 8 8 (0–32) 8 (0–16) 4 (0–4)
Reduplicated syllables (mama, booboo) 4 4 (0–16) 4 (0–4)
CVC1 (mom, peep, pop) 6 6 (0–24) 6 (0–12) 6 (0–6)
CVC2 (mad, bed, hop) 8 8 (0–32) 8 (0–16) 8 (0–8)
Bisyllabic 1 (baby, puppy) 5 5 (0–20) 5 (0–10) 5 (0–5)
Bisyllabic 2 (bunny, happy) 6 6 (0–24) 6 (0–6)
Multisyllabic (banana, kangaroo) 6 6 (0–24) 6 (0–12) 6 (0–6) 6 (0–6)
Utterances of increasing length (dad, hi dad, hi daddy) 15 15 (0–60) 15 (0–30)
Total utterances 66 66 56 21 28
Note. Only items within specific utterance types contribute to subscores for consistency, vowel accuracy, or prosodic accuracy.
Table 2 DEMSS scoring.
Assigning specific scores
Overall articulatory accuracy: 5-point multidimensional scoring. 0 = correct on first attempt; 1 = consistent developmental substitution error (e.g., /t/ for /k/; /w/ for /r/) without slowness or distortion of movement gestures; 2 = correct after first cued attempt; 3 = correct after two or three additional cued attempts; 4 = not correct after all cued attempts.
Vowel accuracy: 3-point multidimensional scoring. 0 = correct; 1 = mild distortion; 2 = frank distortion.
Prosodic accuracy: binary scoring. 0 = correct; 1 = incorrect.
Consistency: binary scoring. 0 = consistent across all trials; 1 = inconsistent across any two or more trials.
Table 3 Participant characteristics ordered by DEMSS score within diagnostic group.
Group ID DEMSS GFTA Sex Age (mos) PPVT RLS ELS
Non-CAS 6 0 97 F 72 93 100 75
8 0 105 F 62 114 105 115
65 4 69 M 59 126 119 115
59 4 72 M 62 119 94 108
50 4 97 M 50 101 81 95
45 6 61 M 72 77 93 93
17 7 92 F 54 120 102 89
5 7 97 M 61 108 85 87
75 10 83 M 62 119 122 101
43 11 80 M 52 123 110 95
42 11 86 M 53 115 110 103
49 12 106 M 49 97 87 93
35 13 97 M 61 87 82 84
2 17 80 M 66 110 108 100
61 18 84 M 38 89 81 80
14 18 91 M 51 100 95 95
47 19 77 M 60 77 85 80
80 20 54 M 63 109 97 80
81 20 77 M 46 101 103 93
24 20 91 M 64 111 97 102
54 21 106 M 39 129 112 122
23 22 71 F 53 99 104 103
36 22 88 F 40 110 116 114
21 22 92 M 64 94 78 73
26 23 90 F 58 85 84
72 24 40 M 75 92 97 73
46 25 46 F 61 85 99
38 25 85 M 45 98 98 92
78 27 60 M 61 112 115 88
73 28 77 F 45 119 107 92
83 29 73 M 47 103 107 96
19 30 80 M 44 93 90 69
12 32 78 M 48 95 92 86
18 34 95 F 44 106 86 92
44 38 45 M 72 93 88 87
15 41 107 M 39 126 96 100
27 42 49 M 67 78 58 70
29 42 78 M 40 102 104 100
74 45 78 M 47 77 88 74
63 46 70 M 56 63 73 67
62 50 78 M 45 118 107 110
10 52 80 M 64 63 67 63
48 54 82 M 56 92 91 78
60 55 78 M 57 102 83 78
52 57 73 M 56 95 95 80
11 62 69 M 73 70 69 59
76 62 76 M 45 91 90 83
3 63 83 F 39 109 114 112
34 82 81 M 36 92 92 102
1 82 93 M 39 97 94 81
58 83 62 F 54 112 105 93
32 85 82 F 53 77
4 95 60 M 71 103 78 86
71 101 56 F 47 103 104 86
20 103 81 M 36 94 85 86
31 106 71 M 39 108 96 100
7 109 84 M 46 77
13 113 60 M 45 110 113 119
51 120 49 M 61 88 81 70
37 134 69 M 43 85 86 82
79 205 40 M 56 83 76 82
Mild CAS 66 54 56 M 56 119 99 123
57 74 55 F 63 99 94 85
16 99 47 M 71 86 69 65
9 112 50 M 68 90 84 70
30 123 62 F 45 123 90 92
28 145 40 F 69 84 74 75
67 159 65 M 44 83 73 72
56 159 81 M 37 90
Severe CAS 41 46 46 M 69 102 103 99
25 73 58 M 48 85 76 86
70 161 < 40 M 71 80 64
82 212 71 M 53 106 89 74
39 232 78 M 38 90 90 80
40 237 40 M 74 88 92
64 240 40 M 70 75 59 64
22 260 52 F 45 101 87 79
55 270 40 F 79 68
77 328 60 M 84 64 50 50
53 337 59 M 37 98 86 90
33 425 40 M 43 95 84 57
Note. Em dashes indicate that child attention or lack of time prevented completion of the entire test and of scoring. ID = identification; GFTA = standard score on the Goldman Fristoe Test of Articulation—Second Edition; PPVT = Peabody Picture Vocabulary Test; RLS = Receptive Language standard score on the Oral and Written Language Scales or the Preschool Language Scales; ELS = Expressive Language standard score on the Oral and Written Language Scales or the Preschool Language Scales; CAS = childhood apraxia of speech; M = male; F = female.
Table 4 Intraclass correlation coefficient estimate (95% confidence interval [CI]) for three components of the reliability analysis.
Reliability
Variable Test–retest Intrajudge Interjudge
Total DEMSS score 1.00 [.99, 1.00] .92 [.77, .98] .98 [.95, .99]
Vowel accuracy subscore .99 [.98, 1.00] .89 [.67, .97] .94 [.87, .98]
Overall articulatory accuracy subscore .99 [.98, 1.00] .95 [.84, .98] .98 [.96, .99]
Prosodic accuracy subscore .99 [.96, 1.00] .31 [−.27, .73] .91 [.78, .96]
Consistency subscore .82 [.48, .95] .38 [−.20, .77] .73 [.44, .88]
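For readers who wish to run a comparable reliability analysis on their own scoring data, intraclass correlation coefficients with 95% confidence intervals can be computed in Python with the pingouin package. The sketch below uses made-up long-format ratings; the column names, example scores, and choice of ICC form are illustrative assumptions and do not reflect the authors' analysis code.

```python
import pandas as pd
import pingouin as pg

# Made-up long-format ratings: one row per (child, judge) pair, holding the
# total DEMSS score that judge assigned to that child.
ratings = pd.DataFrame({
    "child": [1, 1, 2, 2, 3, 3, 4, 4, 5, 5, 6, 6],
    "judge": ["A", "B"] * 6,
    "score": [120, 118, 45, 47, 212, 210, 20, 22, 83, 80, 160, 158],
})

# Interjudge reliability: ICC estimates with 95% confidence intervals.
icc = pg.intraclass_corr(data=ratings, targets="child",
                         raters="judge", ratings="score")
print(icc[["Type", "ICC", "CI95%"]])
```

The same long format can be reused for test–retest or intrajudge comparisons by treating the two scoring occasions as the two "raters."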
Table 5 Summary measures indicating how well the DEMSS total score and subscores discriminate among children with and without CAS, regardless of severity.
Measure AUROC (95% CI) Logistic regression cutoff^a Sensitivity/specificity^b Specificity at 90% sensitivity LR+/LR−^b
Total DEMSS 0.93 [0.83, 0.97] 129 0.65/0.97 0.70 19.8/0.36
Vowel accuracy 0.91 [0.79, 0.96] 20 0.60/0.98 0.62 36.6/0.41
Total accuracy 0.93 [0.82, 0.97] 99 0.65/0.98 0.66 39.6/0.36
Prosody 0.78 [0.64, 0.87] 6 0.35/0.95 0.49 7.1/0.68
Consistency 0.93 [0.82, 0.97] 11 0.70/0.93 0.74 10.7/0.32
Note. AUROC = area under the receiver operating characteristic curve; LR+ = likelihood ratio for a positive test result; LR− = likelihood ratio for a negative test result.
^a Cutoff is defined as the minimum value such that the estimated probability of CAS from the logistic regression model is greater than 0.5.
^b Based on the indicated logistic regression cutoff.
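The discrimination measures reported in Table 5 (AUROC, logistic regression cutoff, sensitivity, specificity, and likelihood ratios) can be reproduced on any data set that pairs a score with a binary CAS indicator. The sketch below uses scikit-learn on hypothetical scores and diagnoses, not the study data, and the cutoff definition follows footnote a: the smallest score at which the fitted logistic model estimates P(CAS) > 0.5.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

# Hypothetical total scores and a binary CAS indicator (1 = CAS); the overlap
# between groups is deliberate, and these values are not the study data.
scores = np.array([0, 10, 25, 60, 95, 130, 160, 205, 240, 330, 425]).reshape(-1, 1)
cas = np.array([0, 0, 0, 0, 0, 1, 1, 0, 1, 1, 1])

# Area under the ROC curve for the raw score.
auroc = roc_auc_score(cas, scores.ravel())

# Cutoff per footnote a: the smallest score at which the fitted logistic
# model estimates P(CAS) > 0.5.
model = LogisticRegression(max_iter=1000).fit(scores, cas)
grid = np.arange(scores.min(), scores.max() + 1).reshape(-1, 1)
cutoff = grid[model.predict_proba(grid)[:, 1] > 0.5].min()

# Sensitivity, specificity, and likelihood ratios at that cutoff.
positive = scores.ravel() >= cutoff
sensitivity = (positive & (cas == 1)).sum() / (cas == 1).sum()
specificity = (~positive & (cas == 0)).sum() / (cas == 0).sum()
lr_pos = sensitivity / (1 - specificity)
lr_neg = (1 - sensitivity) / specificity
print(auroc, cutoff, sensitivity, specificity, lr_pos, lr_neg)
```

Choosing a different cutoff trades specificity for sensitivity, which is what the "Specificity at 90% sensitivity" column in Table 5 summarizes.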