The Segmentation Problem in Early Language Acquisition
Kim Plunkett
University of Aarhus, Denmark
[The following paper does not include diagrams that are
referred to in the text; however, CRL will be more than
happy to send hard copies, which include diagrams, to people
who request them. Requests can be directed to us via email
or regular mail. Additionally, footnotes in this version of
the paper appear at the end.]
Abstract
An important source of individual variation in young
children acquiring their first language is the variety of
potential solutions to the segmentation problem. Alternative
solutions can result in non-standard forms, such as formu-
laic expressions and phonologically reduced forms, in early
productions. Articulatory/fluency criteria for identifying
formulaic expressions, phonologically reduced forms and tar-
get lexemes in linguistic productions are defined and
applied to the analysis of two Danish children's language
development between the ages of 12 months and 26 months.
The results of this analysis are compared to the results of
applying standard distributional/frequency criteria in the
tabulation of mean length of utterance and vocabulary pro-
files for both standard and non-standard forms. It is
argued that although the two methods yield converging pro-
files of development during the latter part of the period
studied, articulatory/fluency criteria provide a coherent
methodology for analysing children's early linguistic pro-
ductions and offer a potentially powerful tool for identify-
ing alternative segmentation strategies. The application of
articulatory/fluency criteria identifies one of the children
as seeking primarily holistic solutions to the segmentation
problem and relying heavily in early acquisition on formu-
laic expressions. The second child seeks primarily analytic
solutions to the segmentation problem and is prone to use
phonologically reduced forms as productive linguistic units.
Profiles of vocabulary development for these two children
suggest that the solution to the segmentation problem may be
an important trigger for their vocabulary spurts. Possible
environmental correlates of these individual differences are
discussed and a learning mechanism for solving the segmenta-
tion problem in a manner which honours the facts of indivi-
dual variation is introduced.
1. Introduction
A central problem in first language acquisition is the
segmentation problem: How do children discover the struc-
tural components in the speech signal without knowing the
identity of the target elements (Peters [1983])? Although
adult speech contains a wealth of phonetic and prosodic cues
that afford a structural interpretation, these cues are
often ambiguous and distorted as a result of performance
factors (such as hesitations, slips of the tongue, etc.) or
as a result of interference from other linguistic cues
transmitted in the speech signal (co-articulation effects,
context effects, etc.). Children must overcome these diffi-
culties by bootstrapping their way into the linguistic sys-
tem, making use of whatever information they can extract
from the surrounding environment (linguistic and non-
linguistic) and their own predispositions (innate or
acquired) for processing linguistic information.
En route to solving the segmentation problem, children
may ascribe structural properties to the speech signal which
do not match those of the adult language user. For example,
children may ascribe lexical status to syllabic sequences
which in the adult tongue are considered parts of words.
Thus, Bloom [1973] and Peters [1989] note the prevalent use
of schwa as a filler expression in early word combinations.
Alternatively, children may ascribe lexical status to
whole sequences of words and use these lexical chunks with
the same distributional properties as adult lexical items.
MacWhinney [1978] refers to these apparently unanalysed
sequences as amalgams. Peters [1983] calls them formu-
laic expressions. The use of fillers and formulaic expres-
sions would appear to be a natural outcome of children's
attempts to solve the segmentation problem. [1] Elements of
the speech signal, thus identified, may serve as indispens-
able bootstraps for children entering the linguistic commun-
ity, especially in interaction with conversational partners
willing to offer generous interpretations of non-standard
forms.
A number of authors (Bates, Bretherton & Snyder
[1988]; Nelson [1981]; Peters [1983]) have noted indivi-
dual differences between children in the degree to which
they exploit fillers and formulaic expressions. Furthermore,
it has been observed that these differences tend to corre-
late with other characteristic differences between children.
For example, Bates, Bretherton & Snyder [1988], Bretherton
et al. [1983], Hampson & Nelson [In Press], and Nelson
[1973] observe that reliance on formulaic expressions is
characteristic of a ``social/expressive'' speech style and a
proportionally high use of pronouns. In contrast, the
absence of formulaic expressions in a child's productive
vocabulary has been associated with an
``analytic/referential'' speech style characterised by a
proportionally high use of concrete nouns. Peters [1983]
also notes that formulaic productions are typically highly
fluent, possessing a ``mush-mouth'' character.
Individual variation in the use of formulaic expres-
sions and fillers across children may result from alterna-
tive solutions in the application of segmentation processes.
Differences in the perceptual processing preferences of
individual children as well as variation in the linguistic
environment (parental speech style, degree of exposure to
other children, etc.) undoubtedly influence the performance
of the parsing mechanisms that yield the potential struc-
tural elements in the speech signal. For example, the ten-
dency for analytic/referential language users to possess a
high proportion of concrete object names in their vocabu-
laries may result from a parsing preference by some children
to focus on stressed parts of the speech signal. Pronominal
usage may reflect a preference for particular positional
characteristics (signal initial or final) in the input
stream or a sensitivity to frequency characteristics of the
speech signal. Alternatively, individual differences
between children may emerge from the intersection of a range
of factors, distinct from segmentation issues, such as
social and cognitive style (what type of things do children
like to talk about and how do they prefer to express them-
selves), level of conceptual and semantic development (what
do they know about the meanings of different words) and
conversational settings (the degree to which the interests
and conversational styles of others constrain the linguistic
productions of children).
The determination of the child's representation of
linguistic units constitutes a major methodological problem
for child language research. In essence, the child language
researcher is confronted with a problem similar to that con-
fronting the child acquiring the language: What are the pro-
ductive units underlying the speech signal? For example,
when the child articulates the utterance ``What's that?'',
the analyst must decide whether it consists of one, two or
three distinct morphemes. Typically, researchers resolve
this problem by assuming that a target item produced by the
child can be identified with a morpheme in the adult lexi-
con. Various criteria are then applied to determine the for-
mulaic status of an utterance. [2] Distributional and fre-
quency criteria (Brown [1973]) have been proposed as tools
in helping to establish the internal structure of children's
utterances. Thus, if potential constituent morphemes only
occur within the context of a given expression, then it may
be prudent to treat the expression as formulaic. Similarly,
if particular expressions are used frequently by a child,
then those expressions may be unanalysed for that child.
Note that neither of these criteria guarantees identification
of formulaic expressions. Morphemes may be used productively
by a child in a variety of distributional
contexts, and yet still participate non-productively in for-
mulaic expressions. The term ``bucket'' may be used produc-
tively in a variety of constructions, but in the phrase
``kicked the bucket'', it has taken on an idiomatic non-
productive status. And particular expressions may be highly
frequent in the child's productions and still result from
the combination of distinct lexical representations. These
shortcomings suggest that evaluations of formulaicity also
need to take into account properties specific or local to a
given token of an expression.
Hickey [1990] suggests criteria to determine whether
an expression is formulaic (relative to other spontaneous
utterances). These concern the length of the expression, its
phonological coherence, the level of grammatical complexity
of the expression, its frequency of usage in the community,
the idiosyncrasy of the expression, situational dependency
and semantic or syntactic appropriateness. She points out
that none of these criteria alone will suffice in determin-
ing the formulaic status of an expression. Some criteria,
such as length and phonological coherence, may be necessary
characteristics of a formulaic expression but other criteria
can only be considered typical and not definitional of for-
mulaicity. Hickey [1990] concludes that the evaluation of
an expression should be made in relation to a graded
continuum, i.e., expressions can be more or less formulaic.
Plunkett [1986] describes a longitudinal investigation
of two Danish children's linguistic and cognitive develop-
ment between the ages of 12 months and 25 months. This study
identifies three types of linguistic units in the children's
productions: idiosyncratic expressions, target lexemes and
formulaic expressions. Each of these types can be considered
an alternative solution to the segmentation problem.
Idiosyncratic expressions involve sound segments produced
consistently by the children which cannot be identified with
lexical items in the adult language. Typically, idiosyn-
cratic expressions are filler-like segments containing a
single vowel or vowel-consonant combination. They can be
thought of as undershooting solutions to the segmentation
problem: The child extracts sound segments from the input
signal which undershoot the scope of adult lexical items. In
contrast, formulaic expressions result from overshooting
solutions to the segmentation problem. Formulaic expres-
sions contain identifiable adult lexical items in combina-
tion with other sound segments which may be either other
lexical items or filler-like material. Finally, target lex-
emes represent correct solutions to the segmentation problem
in that they map directly onto adult lexical items. The
three types of linguistic unit are thus related in terms of
their length, with target lexemes representing the inter-
mediate case.
In order to distinguish between formulaic expressions
and productive combinations, Plunkett [1986] applied
standard distributional and frequency criteria in establish-
ing the status of a linguistic expression. These evaluations
provided the foundations for establishing a profile of MLU
(mean length of utterance) and vocabulary development for
the two children. For one child, this analysis revealed a
``standard'' MLU profile, such as that observed for
Brown's [1973] subjects Adam and Eve, i.e., MLU remains at
a low level (close to 1.0) until around 21 months when both
MLU and vocabulary size show marked increases. However, for
the other child, MLU began at a high level (around 1.6),
increased to around 2.0 by 14 months of age and then dropped
over consecutive sessions during a three month period to
around 1.3. A subsequent increase in MLU coincided with
the child's vocabulary spurt (around 22 months). Plunkett
[1986] speculates that the unusually high MLU observed in
the child Jens during early development is an artifact of
the inadequacy of distributional and frequency criteria to
identify formulaic expressions in this child's speech.
Ascribing productive status to expressions which are formu-
laic artificially inflates MLU measures.
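A toy calculation makes the inflation concrete. The utterances
are hypothetical, not drawn from the Danish data: suppose a
session contains only ``What's that?'', ``ball'' and ``that''.
MLU is the total number of units divided by the number of
utterances, and it changes depending on whether ``What's
that?'' is scored as two productive units or as one formulaic
unit.

```latex
\mathrm{MLU} = \frac{1}{N}\sum_{i=1}^{N} \ell_i ,
\qquad \ell_i = \text{number of units in utterance } i

\mathrm{MLU}_{\text{productive}} = \frac{2 + 1 + 1}{3} \approx 1.33,
\qquad
\mathrm{MLU}_{\text{formulaic}} = \frac{1 + 1 + 1}{3} = 1.00
```

Each formulaic expression mis-scored as a productive combination
adds to the numerator without adding to the denominator.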
More generally, it might be concluded from this study
that distributional and frequency criteria are inadequate to
the task of uncovering this Danish child's solution to the
segmentation problem. The issue then arises as to whether
distributional and frequency criteria should be supplemented
with other criteria, as Hickey [1990] suggests, or whether
they should be discarded in favour of a separate set of cri-
teria. In the following, an alternative set of criteria
based on articulatory/fluency factors (relating to Hickey's
[1990] necessary phonological coherence criterion) will be
proposed for identifying children's solutions to the segmen-
tation problem, and systematically compared with analyses
based exclusively on distributional and frequency criteria.
In the coding of the two Danish children's productions
reported in Plunkett [1986], transcribers noted informally
that the children differed in how well they tended to arti-
culate their utterances. For example, Jens appeared more
prone than Anne to produce longer, imprecisely articulated
utterances whilst Anne's utterances appeared shorter and
clearly articulated. [3] As noted above, Peters [1983]
points out that formulaic expressions are often produced in
a ``fluent'' manner by children, also noting that children
may be focusing on supra-segmental aspects of the utterance.
Formulaic expressions may have prosodic qualities associated
with those of adult utterances (such as intonation) but may
be articulated imprecisely, resulting in difficulty in iden-
tifying the content of the expression. In contrast, limited
processing resources may result in shorter linguistic units
such as target lexemes (or segments thereof) being
articulated more clearly by the child and hence being easier
to identify. It is possible to hypothesise an
articulatory/fluency continuum for linguistic units where
long, badly articulated, formulaic expressions represent one
end of the continuum and precisely articulated, short units
(such as mono-syllabic target lexemes or segments of target
lexemes) represent the other end of the continuum.
Articulatory/fluency criteria may thus offer a means for
identifying alternative solutions to the segmentation prob-
lem, where overshooting solutions (formulaic expressions)
will tend to be badly articulated and undershooting solu-
tions (segments of target lexemes) will tend to be clearly
articulated. Undershooting solutions to the segmentation
problem will henceforth be referred to as Phonologically
Reduced Forms (PRFs).
There are good theoretical and experimental grounds for
supposing that the precision with which a linguistic unit is
articulated is closely related to its length. Lindblom
[1985] has argued for an approach to understanding the
developing phonetic skills of children in terms of self-
organising systems. He shows how the placement of vowels in
phonetic space is dynamically sensitive to the position of
other vowels in this space. The tendency of vowels to change
character according to linguistic context may be ascribed
(partially) to the dynamics of this phonetic space. The
phonetic realisation of a vowel is determined not only by
the other phonemes with which it combines in a given utter-
ance, but also by the internal dynamics of the vowel system
to which it belongs. A similar point has been made by Jor-
dan [1990] in a computer model of motor control in speech
production.
Integral defining features of Lindblom's [1985]
phonetic space are the dimensions of articulatory precision
and fluency. Articulatory precision defines the degree of
accuracy (in relation to a target) with which a phoneme is
articulated. Fluency refers to the level of integration of a
sequence of phonemes that differentiates a smooth from a
halting performance. Any phonetic production by a speaker
can be evaluated along these two scales. Lindblom [1985]
concludes from a series of experimental studies that the
dimensions of articulatory precision and fluency are
inversely related, i.e., under conditions which require
precisely articulated speech, fluency tends to deteriorate,
whilst articulation deteriorates when high fluency is demanded. This
trade-off between articulatory precision and fluency is
explained as an emergent property of the self-organising
dynamics of phonetic space. In other words, it is difficult
to talk quickly and accurately at the same time. [4]
Now consider a young child confronted with the segmen-
tation problem. Let us suppose that regularities in the
child's linguistic environment, together with her perceptual
processing mechanisms, have led the child to identify a
sequence of sounds as a structural component in the speech
signal. Let us further suppose that in this particular case,
the sequence of sounds corresponds to a sequence of words in
the adult lexicon (e.g., ``What's that?''). The child stores
this linguistic unit in memory, encoding such information as
its distributional linguistic properties, its non-linguistic
context of usage, perhaps information about its meaning, and
information about its phonetic and supra-segmental shape.
The processes of segmentation may also lead the child to
identify ``That?'' as a linguistic unit. Many of the
representational properties encoded along with ``That?'' may
overlap with those of ``What's that?''. However, the
representation of phonetic and supra-segmental shape will
crucially differ in the two cases. In particular, the
production of the two units requires coordination of a sequence
of phonemes that place differential load on the ``fluency''
dimension of the self-organising phonetic space. The
expression ``What's that?'' requires a greater degree of
phonetic integration than the expression ``That?''. Since
fluency is inversely related to articulatory precision, it
can be expected that the fluency load on the longer expres-
sion will detract from its articulatory precision. Thus, the
shorter expression ``That?'' will be articulated more pre-
cisely than the longer expression ``What's that?''. Further-
more, to the extent that the supra-segmental properties of a
linguistic unit are unaffected by the self-organising char-
acter of phonetic space, we may expect the supra-segmental
properties of the longer unit to remain intact whilst its
phonetic features suffer articulatory distortion. [5]
It is unclear whether the proposed trade-off between
articulatory and fluency characteristics should be con-
sidered a part of the linguistic representation of a unit or
a by-product of the processes of production. The distinc-
tion between representation and production may have impor-
tant implications for children's developing segmentation
hypotheses: If a unit's phonetic features are imprecisely
encoded, then that unit's representation will be less
effective, both in recognising new tokens of the unit and
in serving itself as the object of further segmentation
(Snow [1986]). In this case, the representational system
may exhibit a natural tendency to maintain phonetically pre-
cise units in memory, as the phonetically imprecise units
atrophy in the absence of a functional recognitory role.
In summary, this framework provides a theoretical
rationale for the claim that longer linguistic units will
tend to be badly articulated and shorter linguistic units
will tend to be precisely articulated. By definition, overshooting
solutions to the segmentation problem (formulaic expres-
sions) will be classified as longer linguistic units whilst
undershooting solutions (PRFs produced as productive units)
will be classified as shorter linguistic units. Hence, a
hierarchy of articulatory precision can be predicted for the
three logically permitted solutions to the segmentation
problem: Phonologically reduced forms will tend to be well
articulated, target lexemes less well articulated and formu-
laic expressions the least well articulated. This hierarchy
is depicted in Figure 1, which also shows the postulated
inverse relationship between precision of articulation and
degree of fluency.
Two factors interfere with these predictions. First,
target lexemes vary in length and hence their articulation
will vary (all else being equal) in phonetic precision.
Thus, there will be some overlap in articulatory precision
of long target lexemes and formulaic expressions on the one
hand, and short target lexemes and PRFs on the other hand.
Within the category of units that map onto adult words,
short words should be easiest to identify given their poten-
tial accuracy of articulation. Second, practice on any
given sound sequence will tend to influence the equilibrium
between fluency and articulatory precision such that more
fluent renditions of a sound sequence are achieved without
loss of articulatory precision. Thus, formulaic expressions
(and long target lexemes) may increase in articulatory accu-
racy with practice.
In this paper, a complete re-analysis of the data
described in Plunkett [1986] is presented in an attempt
to evaluate the degree to which articulatory/fluency charac-
teristics of children's speech furnish an adequate set of
criteria for identifying children's solutions to the segmen-
tation problem. In the next section, an operationalisation
of articulatory/fluency criteria is presented and then
applied to an analysis of the two Danish children's linguis-
tic productions. A profile of vocabulary development,
including formulaic expressions, target lexemes and phono-
logically reduced forms, together with MLU assessments is
provided. The results of this analysis are then compared
with a re-analysis based exclusively on
distributional/frequency criteria. These analyses are used
to evaluate the view that a source of individual variation
in children is the range of solutions they uncover in their
attempts to solve the segmentation problem and that this
variation interacts with the self-organising character of
the phonetic system to produce individual variation in the
articulatory precision of children's utterances.
2. Methodology
Two Danish children, a boy and a girl, and their
parents have participated in a longitudinal study of
linguistic and cognitive development. The parents volun-
teered their families for the investigation. In one family,
the girl, Anne, has an elder sister (by two years) and
parents who have both completed university educations. In
the second family, the boy, Jens, is a single child. His
mother was beginning a university education and his father
is a skilled labourer.
The families were visited in their homes on a regular
basis (approximately every 10 days), and 60- to 90-minute
audio and video recordings were made of the children in
interaction with their parents (most often the mother) in a variety
of situations. All sessions included a free play situation,
a testing situation and an eating situation. On some occa-
sions, bathing and kitchen situations were also recorded.
Testing situations involved administering the Uzgiris & Hunt
[1975] infancy assessment scales. Two investigators were
present at each visit: one investigator carried out testing
procedures and took notes; the second investigator managed
the recording equipment. The parents were also interviewed
about their child's development since the previous visit.
Specifically, they were questioned about the children's
motor development, use of new and old words and any other
noteworthy events in the family. Recordings of the boy,
Jens, began when he was 11 months old. Recordings of the
girl, Anne, began when she was 8 months old.
Complete transcriptions of the children's and parents'
speech and any investigator speech to the children were
coded in a computerised database. Transcriptions were made
primarily from the video recordings though the audio record-
ings were used on those occasions where improved sound qual-
ity might aid analysis. Non-verbal activities that might
help in the interpretation of the speech were also coded in
the transcriptions. The transcription format is based on the
Chat scheme taken from the Childes initiative (MacWhinney
[1990]; MacWhinney & Snow [1985]; MacWhinney & Snow [1990]).
Utterances were identified in accordance with intonational
and pause criteria (Snow [1972]). Interjudge reliability
checks on identification of utterance boundaries across 20%
of the database for two independent transcribers achieved
agreement scores of 90%.
Utterances are identified as containing target lexemes,
phonologically reduced forms (PRFs) or formulaic expressions.
Utterances may consist of just one of these expression
types or any combination thereof. A target lexeme is
defined as a unit of speech which can be recognised as a
token of an entire word or morpheme belonging to the adult
lexicon. Recognition criteria are interpreted liberally
insofar as exact replication of the adult form is not
demanded. Thus, a word may be produced with an inappropri-
ate vowel or consonant and yet still be transcribed as a
target lexeme. Recognition is thus determined by global
characteristics of the sound segment and, inevitably, by
appropriateness of the conditions of usage. Inaccuracies in
the pronunciation of a target lexeme are recorded on a
separate coding tier, as permitted by the Chat coding for-
mat.
PRFs are defined as segments of speech which can neither be
identified with nor contain target lexemes. They are assumed
to derive from undershooting solutions to the segmentation
problem. PRFs are judged by the transcriber to be used in a
meaningful, communicative fashion. For example, the child
may point at an object and simultaneously articulate the
vowel /e/ --- a sound which does not correspond to a Danish
word. Furthermore, PRFs must achieve a certain frequency of
usage to be included in the analysis. It is stipulated that
an expression must have been identified on at least three
occasions (within a previous session or the current session)
to qualify. PRF expressions are coded in the Chat tran-
scription format using the ``special learner form markers''.
Interjudge reliability measures have been calculated for the
identification of target lexemes and PRFs across 20% of the
database for two independent transcribers. Agreement
scores of 92% and 88% for target lexemes and PRFs,
respectively, were achieved.
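The frequency threshold for PRFs can be sketched as a simple
count. This is a minimal Python illustration, not the coding
procedure actually used in the study. It assumes that ``at
least three occasions (within a previous session or the
current session)'' means occurrences pooled over all sessions
up to and including the current one; the function name
qualifying_prfs and the sample forms are hypothetical.

```python
from collections import Counter
from typing import List, Set

# Threshold stated in the text: a PRF must be identified on at least three occasions.
MIN_OCCURRENCES = 3


def qualifying_prfs(sessions: List[List[str]], current: int,
                    min_occurrences: int = MIN_OCCURRENCES) -> Set[str]:
    """Return the candidate PRFs that qualify for analysis in session `current`.

    `sessions` holds one list of candidate PRF tokens per session, in order.
    Assumption: occurrences are pooled over all sessions up to and including
    `current`; later sessions are ignored.
    """
    counts: Counter = Counter()
    for session in sessions[: current + 1]:
        counts.update(session)
    return {form for form, n in counts.items() if n >= min_occurrences}


# Hypothetical transcriptions of filler-like forms across three sessions.
example_sessions = [["e", "a"], ["e"], ["e", "a", "o"]]
print(qualifying_prfs(example_sessions, 1))  # "e" seen twice so far: set()
print(qualifying_prfs(example_sessions, 2))  # "e" now seen three times: {'e'}
```

Under a stricter reading (three occurrences within a single
session), the loop would count one session at a time instead
of pooling.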
Formulaic expressions are identified in terms of their
content and fluency of articulation. A formulaic expression
must contain at least one target lexeme plus some additional
phonetic material that may be identified as one or more
target lexemes and/or PRFs. The distinction between produc-
tive sequences of target lexemes and PRFs, on the one hand,
and formulaic amalgams of these segments, on the other, is
determined in terms of precision of articulation. Sequences
which are produced fluently and articulated imprecisely are
categorised as formulaic expressions. The operationalisa-
tion of articulatory/fluency criteria is described below.
It is important to distinguish between sound segments
which are incorrectly articulated and those which are pro-
duced in a fluent and imprecise fashion. As noted above, a
child may be credited with the ability to produce a target
lexeme despite the fact that some of the consonants or
vowels in the actual articulation are incorrect. However,
an incorrectly produced vowel or consonant may still be pre-
cisely articulated. In contrast, sound segments in an actual
production may appear to contain the same phonemes as the
adult target lexeme, and yet be articulated in a fluent and
imprecise manner. In general, a fluent and imprecise articu-
lation will contribute to the difficulty in identifying con-
stituent phonemes, and consequently, the boundaries between
phonemes. On the other hand, the tendency for fluently pro-
duced utterances to carry supra-segmental properties (like
intonation) may assist in the identification of constituent
segments.
The identification of formulaic expressions has been
operationalised in the following manner: For each child and
each session, a set of 10 utterances is selected by a native
speaker of Danish with considerable experience in child
language transcription as being articulated in a character-
istically precise fashion by the child. A second set of 10
utterances is also selected as being produced in a
characteristically imprecise, ``mush-mouthed'' fashion. Both sets
of utterances contain single unit. [6] Two
independent transcribers (both experienced) are informed
that the 20 utterances from each session contain two clearly
distinct types of child utterance and are asked to categor-
ise the utterances along the dimension of articulatory pre-
cision versus mush-mouthedness. The transcribers compare
their judgements until they reach full agreement, for each
child and each session, on categorising this limited set of
utterances into the two correct sets. The two transcribers
are then asked to make ``forced-choice'' judgements
into the two categories of articulatory precision on 20 of
the child utterances for each session selected randomly from
the database. Interjudge reliability scores are then computed.
These measures indicate an 85% level of agreement
between the two transcribers. The remaining material for
each child and each session is then coded by an individual
(trained) transcriber.
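The reliability figures reported in this section are
percentages of agreement. A minimal sketch of that computation
follows, assuming simple (not chance-corrected) percent
agreement over paired forced-choice judgements; the labels
``precise'' and ``mush'' and the function name are
hypothetical. With 20 judged utterances, for example, 17
matching judgements would give 85%.

```python
from typing import Sequence


def percent_agreement(judge_a: Sequence[str], judge_b: Sequence[str]) -> float:
    """Percentage of items given the same forced-choice label by two transcribers.

    Simple percent agreement; no correction for chance agreement is applied.
    """
    if len(judge_a) != len(judge_b):
        raise ValueError("both transcribers must judge the same items")
    if not judge_a:
        raise ValueError("no judgements to compare")
    matches = sum(a == b for a, b in zip(judge_a, judge_b))
    return 100.0 * matches / len(judge_a)


# Hypothetical judgements on 20 randomly selected utterances ("precise" vs "mush").
transcriber_1 = ["precise"] * 10 + ["mush"] * 10
transcriber_2 = ["precise"] * 10 + ["mush"] * 7 + ["precise"] * 3
print(percent_agreement(transcriber_1, transcriber_2))  # 85.0
```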
In summary, target lexemes are those sound sequences
which can be identified with adult lexical items. They may
occur in isolation (as single-word utterances) or in combi-
nation with other linguistic units. If target lexemes occur
alone, or in combination with other linguistic units and
they are precisely articulated, then they are attributed the
status of productive lexical items. Otherwise, target lex-
emes are categorised as occurring in formulaic combination
with additional phonetic material. Similarly, PRFs may
occur in isolation or in combination with other linguistic
units. In both cases they are assumed to derive from
undershooting solutions to the segmentation problem. PRFs
which occur in isolation, or in combination with other
linguistic units and are precisely articulated, are categor-
ised as having a productive status. PRFs which are articu-
lated imprecisely together with additional linguistic
material are categorised as being in formulaic combination
with those sound segments. Finally, formulaic expressions
are those sound sequences which have been identified as con-
taining at least one target lexeme plus some other phonetic
material (either other target lexemes or PRFs) and which
have been categorised under a forced choice decision as
being imprecisely articulated. [7] Formulaic expressions are
coded in Chat format using the notation for ``compound''
expressions. This format maintains a coding for potential
boundaries between the constituents of formulaic
expressions, and hence permits the analysis of the database
in terms of both articulatory/fluency criteria and
distributional/frequency criteria. [8]
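The decision rules summarised above can be written as a small
classifier. This is a sketch under stated assumptions: the
Segment type, the function name and the Danish sample words
are hypothetical, articulatory precision is supplied as the
transcriber's forced-choice judgement, and imprecise sequences
made up only of PRFs, which the stated rules do not cover, are
returned as ``unclassified''.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Segment:
    text: str
    kind: str  # "target" (target lexeme) or "prf" (phonologically reduced form)


def classify_sequence(segments: List[Segment], precisely_articulated: bool) -> str:
    """Classify one sequence of units under the articulatory/fluency criteria.

    `precisely_articulated` is the transcriber's forced-choice judgement.
    """
    if not segments:
        raise ValueError("empty sequence")
    if any(s.kind not in ("target", "prf") for s in segments):
        raise ValueError("segment kind must be 'target' or 'prf'")
    if len(segments) == 1:
        # A target lexeme or PRF occurring alone keeps productive status.
        return "productive"
    if precisely_articulated:
        # Precisely articulated combinations: each unit is productive.
        return "productive combination"
    if any(s.kind == "target" for s in segments):
        # Imprecise sequence containing at least one target lexeme.
        return "formulaic expression"
    # Imprecise sequences made up only of PRFs fall outside the stated rules.
    return "unclassified"


# Hypothetical Danish example: "hvad er det" ("what is that").
hvad_er_det = [Segment("hvad", "target"), Segment("er", "target"), Segment("det", "target")]
print(classify_sequence(hvad_er_det, precisely_articulated=False))  # formulaic expression
print(classify_sequence(hvad_er_det, precisely_articulated=True))   # productive combination
```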
3. Analysis
Measures of mean length of utterance (MLU) are computed
for each child in each session. MLU is computed
under two conditions for identifying the child's representa-
tion of linguistic units:
1. Articulatory/fluency criteria are used to identify pro-
ductive linguistic units.
2. Distributional/frequency criteria, as suggested by
Brown [1973], are used to identify productive linguistic
units. Both target lexemes and PRFs are accorded produc-
tive status only if they occur in at least three distinct
linguistic contexts, including single unit utterance con-
texts.
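Condition 2 can be approximated in code. In this sketch the
notion of a ``distinct linguistic context'' is an assumption:
a context is taken to be the utterance with the unit replaced
by a slot, so that a single-unit utterance supplies the context
consisting of the slot alone. Function names and sample
utterances (Danish bil ``car'', stor ``big'', der ``there'',
hvad er det ``what is that'') are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

# Threshold stated in the text: at least three distinct linguistic contexts.
MIN_CONTEXTS = 3


def contexts_by_unit(utterances: List[List[str]]) -> Dict[str, Set[Tuple[str, ...]]]:
    """Map each unit to the set of frames it occurs in.

    Assumption: a frame is the utterance with the unit replaced by "_", so a
    single-unit utterance contributes the frame ("_",).
    """
    contexts: Dict[str, Set[Tuple[str, ...]]] = defaultdict(set)
    for utterance in utterances:
        for i, unit in enumerate(utterance):
            frame = tuple(utterance[:i]) + ("_",) + tuple(utterance[i + 1:])
            contexts[unit].add(frame)
    return contexts


def productive_units(utterances: List[List[str]], min_contexts: int = MIN_CONTEXTS) -> Set[str]:
    """Units occurring in at least `min_contexts` distinct frames."""
    return {unit for unit, frames in contexts_by_unit(utterances).items()
            if len(frames) >= min_contexts}


# Hypothetical utterances; repeated tokens of the same frame count only once.
example_utterances = [
    ["bil"], ["stor", "bil"], ["bil", "der"],
    ["hvad", "er", "det"], ["hvad", "er", "det"],
]
print(productive_units(example_utterances))  # {'bil'}
```

Here ``hvad'', ``er'' and ``det'' each occur in only one frame,
so under this criterion the repeated ``hvad er det'' would not
be credited with productive constituents, however often it is
used.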
Formulaic expressions are treated as single units in the
calculation of MLU irrespective of the method used for
identifying formulaic expressions. In each condition, MLU is
calculated both with PRFs included in the measure and with
PRFs excluded from it. MLU calculations include all
utterances produced by the child on any given session. The
MLF (mean length of formulaic expressions), as identified by
articulatory/fluency criteria, is also calculated, together
with the use of formulaic expressions, target lexemes and
PRFs as a proportion of total vocabulary. Finally, the phonological
overlap between PRFs in formulaic expressions and PRFs used
in productive combinations or used in single unit utterances
is examined.
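The MLU calculation described above, with formulaic expressions counted as single units and PRFs either included or discounted, might be sketched as follows. The representation is hypothetical: each utterance is a list of (form, kind) pairs, where kind is "lexeme", "prf" or "formula". The text does not state how an utterance consisting only of PRFs is treated when PRFs are discounted; this sketch assumes such utterances drop out of the denominator, which keeps MLU at or above 1.0.

```python
def mlu(utterances, include_prfs=True):
    """Mean length of utterance, counted in linguistic units.

    Each utterance is a list of (form, kind) pairs with kind in
    {"lexeme", "prf", "formula"}. A formulaic expression counts as one
    unit however many constituents it contains. Assumption: utterances
    left with no units after discounting PRFs are skipped entirely.
    """
    lengths = []
    for utt in utterances:
        n = sum(1 for _form, kind in utt if include_prfs or kind != "prf")
        if n > 0:
            lengths.append(n)
    return sum(lengths) / len(lengths) if lengths else 0.0


# Invented example session; "se+mor" mimics a CHAT-style compound.
session = [
    [("mor", "lexeme")],
    [("mor", "lexeme"), ("a", "prf")],
    [("se+mor", "formula")],
    [("da", "prf")],
]
print(mlu(session))                      # 1.25
print(mlu(session, include_prfs=False))  # 1.0
```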
4. Results and Discussion
The following analysis is based on 6776 and 4229 utter-
ances by the children Anne and Jens, respectively. This
yields an average of 322 utterances per session for Anne and
201 utterances per session for Jens. The minimum number of
utterances observed in a session was 76 for both children
(session 5 for Anne and session 9 for Jens). There was one
other session where total observed utterances dropped below
100 (session 11 for Jens).
The analyses summarised in Figures 2 and 3 compare Anne's
and Jens' MLU when calculated according to
articulatory/fluency criteria and distributional/frequency
criteria. MLU measures are reported as a means of assess-
ing the emergence and development of productive, combina-
torial language. In addition, these analyses provide an
evaluation of the role that PRFs play in single unit utter-
ances and combinatorial expressions. Normally, PRFs are
excluded from MLU calculations (Miller & Chapman [1981]).
However, the focus here is upon the child's solutions to the
segmentation problem and the role that different types of
segmentation play in the child's productive language.
Therefore, Figures 2 and 3 compare MLU with PRFs included in
and excluded from the analysis.
Figure 2(a) plots Anne's MLU when measured using
distributional/frequency criteria to identify productive
linguistic units. When PRFs are excluded from the analysis,
Anne starts producing productive combinations of target lex-
emes from 15 months onwards. Significant gains are made in
her MLU score between 21 and 24 months. The inclusion of
PRFs in the analysis tends to elevate MLU scores,
particularly for the first half of the study reported here. This
finding indicates that Anne tends to use PRFs in combination
with target lexemes early in acquisition but that an
increasing proportion of combinations are composed
exclusively of target lexemes as development proceeds.
Figure 2(b) plots Anne's MLU when measured according
to articulatory/fluency criteria. On this analysis, Anne
starts producing productive combinations from 18 months
onwards. The inclusion of PRFs shows a slight tendency to
elevate MLU scores during the second part of the period
reported here. MLU scores during the first half of
development (up to 18 months) are minimally affected by the
inclusion or exclusion of PRFs in the analysis.
The profile of MLU development for Anne, when calcu-
lated according to articulatory/fluency criteria, matches
that reported for many other children (Miller & Chapman
[1981]), i.e., MLU remains at a minimum level (1.0) during
early acquisition with the first productive combinations
emerging during the second half of the second year. This
development is also often associated with a ``vocabulary
spurt''. In contrast, the distributional/frequency criteria
depict Anne as a somewhat precocious combinatorial language
user, productively combining target lexemes with each other
and PRFs with target lexemes well before 18 months of age.
For the later periods of development,
distributional/frequency and articulatory/fluency criteria
reveal similar MLU profiles.
Figure 3(a) plots Jens' MLU when measured according to
distributional/frequency criteria. When PRFs are included
in the analysis, high levels of MLU are observed throughout.
Furthermore, substantial swings in MLU are observed. In
particular, a tendency for MLU to decrease over 4 consecu-
tive sessions (from 13 months through 16 months) is
apparent. This result replicates the findings reported for
Jens' MLU profile in Plunkett [1986]. Eliminating PRFs
from the MLU analysis on the distributional/frequency condi-
tion dramatically alters the profile of development. MLU
scores are reduced throughout, though again, especially dur-
ing the earlier stages of acquisition. In particular, the
decrement in MLU reported in the original study is obli-
terated, indicating that this earlier finding was due
entirely to including PRFs in the MLU calculations. When
PRFs are eliminated from the analysis, Jens' MLU score is
seen to make substantial gains between 21 and 24 months.
However, MLU for the first half of the study remains high
even when PRFs are excluded from the analysis (compare Fig-
ure 2(a) with Figure 3(a)), indicating that Jens is produc-
ing combinations of target lexemes from the beginning of the
study.
Figure 3(b) plots Jens' MLU when measured according to
articulatory/fluency criteria. On this analysis, productive
combinations do not emerge until around 21 months. From
this point in development onwards, MLU scores make substan-
tial gains. The inclusion of PRFs in the analysis has little
effect on this profile of development. During the latter
part of development (from 21 months), the MLU scores for
both the distributional/frequency and articulatory/fluency
conditions are remarkably similar. Furthermore, the pat-
terns of MLU development for Anne and Jens, when measured
using articulatory/fluency criteria, are quite similar. The
primary difference between the two children is that the
onset of Jens' productive combinations is later and more
sudden.
The results of these analyses support two main conclu-
sions. First, the application of articulatory/fluency cri-
teria to the identification of linguistic units results in a
less precocious profile of the onset of productive, combina-
torial speech for these two children than the profile that
results from the application of distributional/frequency
criteria. Second, articulatory/fluency criteria are less
sensitive to the inclusion of PRFs in the analysis of pro-
ductive, combinatorial speech. This finding indicates that a
greater proportion of PRFs are categorised as belonging to
formulaic expressions when articulatory/fluency criteria are
applied than when distributional/frequency criteria are
applied. The role of PRFs in formulaic expressions will be
evaluated in more detail later in this section.
In general, the profile of productive combinations
revealed by articulatory/fluency criteria provides a picture
of development for these two children which matches that
reported in the literature for many other children. Further-
more, the application of articulatory/fluency criteria elim-
inates some apparent inconsistencies (such as the apparent
regression in MLU for Jens) that can result from the appli-
cation of distributional/frequency criteria. However, these
findings do not warrant the further conclusion that
articulatory/fluency criteria are necessarily a superior
method for identifying linguistic units. In fact, in the
absence of an independent measure of productivity (which we
lack), it is difficult to reach any final conclusion as to
which set of criteria is most appropriate. Furthermore,
given the similarity in profiles of development revealed by
both sets of criteria for the later parts of development, it
is unclear whether either of the methods can be claimed
superior to the other throughout development. At best, it
might be argued that the application of articulatory/fluency
criteria yields a convergent set of interpretations of the
nature of a child's language productions; interpretations
that do not result from the application of
distributional/frequency criteria. In the remainder of this
section, analyses of the two Danish children's use of
formulaic expressions, target lexemes and PRFs as identified
by articulatory/fluency criteria are presented, in an attempt
to provide such a coherent account of their development.
Figure 4(a) provides a breakdown of the proportion of
formulaic expressions, target lexemes and PRFs used by Jens.
The proportions (percentage of total vocabulary) are based
on non-cumulative measures of types used on each session. It
is apparent that a major proportion of Jens' early ``vocabu-
lary items'' consist of formulaic expressions. There is a
period of development (13 through 16 months) in which the
proportion of formulaic expressions decreases over consecu-
tive sessions. This period corresponds to that in which MLU
decreases for Jens when measured by distributional/frequency
criteria (see Figure 3(a)). Plunkett [1986] speculated
that this temporary decrease in MLU might manifest a switch
by Jens from a holistic learning strategy to a more analytic
strategy. It is noteworthy from the current analysis that at
the same time as formulaic expressions decrease in usage,
PRFs (not incorporated in formulaic expressions) undergo an
increase in usage by Jens. In terms of the current theoreti-
cal framework (see section 1), productive PRFs reflect a
tendency to segment the input into small units whilst formu-
laic expressions reflect segmentation into larger chunks.
The decrease in proportion of formulaic expressions and
increase in proportion of PRFs may result from a more gen-
eral switch of segmentation strategy by Jens to smaller
units of phonetic material. The cause of this switch is
unclear. Nevertheless, this interpretation is supported by
further developments in the structure of Jens' vocabulary: A
subsequent increase in formulaic expressions is accompanied
by a decrease in the use of PRFs. Throughout this period
of segmentation switching, the proportion of target lexemes
in Jens' vocabulary remains relatively stable. Target lex-
emes represent an intermediate level of segmentation of the
speech signal --- see Figure 1.
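The proportions plotted in Figure 4 are percentages of the distinct types used in a single session. A minimal sketch of that calculation follows, assuming the same hypothetical (form, kind) token representation; because types are collected per session, the measure is non-cumulative.

```python
from collections import Counter

CATEGORIES = ("formula", "lexeme", "prf")


def vocabulary_profile(session_tokens):
    """Percentage of distinct types per category for one session.

    session_tokens: list of (form, kind) pairs. Repeated tokens of the
    same type are counted once; earlier sessions are ignored, so the
    measure is non-cumulative.
    """
    types = set(session_tokens)
    if not types:
        return {kind: 0.0 for kind in CATEGORIES}
    counts = Counter(kind for _form, kind in types)
    return {kind: 100 * counts[kind] / len(types) for kind in CATEGORIES}


# Invented tokens for one session.
tokens = [("mor", "lexeme"), ("mor", "lexeme"), ("kom", "lexeme"),
          ("da", "prf"), ("se+mor", "formula")]
print(vocabulary_profile(tokens))
# {'formula': 25.0, 'lexeme': 50.0, 'prf': 25.0}
```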
Target lexemes constitute a relatively small proportion
of Jens' vocabulary during the early stages of acquisition.
This proportion gradually increases until 22 months, at which
point target lexemes constitute a majority of the items in
Jens' vocabulary. Beyond this point, the proportion of target
lexemes increases dramatically (Jens' vocabulary spurt) and
the usage of formulaic expressions and PRFs atrophies,
indicating that Jens has discovered appropriate adult-like
segmentations of the input stream. The vocabulary spurt
coincides with substantial gains in Jens' MLU, as measured by
articulatory/fluency criteria.
Figure 4(b) plots an equivalent vocabulary analysis for
Anne. In contrast to Jens, Anne uses relatively few formu-
laic expressions. On the other hand, her proportional usage
of PRFs is high during early development. As with Jens,
increases in usage of formulaic expressions covary with
decreases in usage of PRFs. Thus, around 16 months, formulaic
expressions increase in usage whilst PRFs decrease.
Usage of target lexemes tends to remain within a fairly con-
fined range during the first half of the second year. This
analysis suggests that Anne tends to focus on shorter seg-
ments in the input stream during early development, though
she does begin to explore larger chunks towards the middle
of her second year. Segmentation switching is thus observed
in Anne as well as Jens, though in contrasting directions. By
20 months, however, target lexemes constitute a majority
(over 50%) of Anne's vocabulary items.
Beyond this point of development, the proportion of target
lexemes increases dramatically (Anne's vocabulary spurt) and
the usage of formulaic expressions and PRFs atrophies, indi-
cating that Anne too has discovered appropriate adult-like
segmentations of the input stream. As with Jens, this
vocabulary spurt coincides with substantial gains in Anne's
MLU, as measured by articulatory/fluency criteria.
Several conclusions are warranted by the findings for
the structure and profile of development in Anne's and Jens'
vocabularies. First, the earlier work (Plunkett [1986]) is
corroborated in characterising Jens as a holistically
oriented language user and Anne as an analytically oriented
language user insofar as Jens shows a preference for longer
formulaic expressions in early development while Anne shows
a preference for shorter PRFs. It is noteworthy, however,
that both children seem to explore alternative segmentation
strategies before they undergo a vocabulary spurt. Segmenta-
tion switching may represent an attempt by these children to
calibrate their hypotheses as to what constitutes a target
lexeme in the adult language. Second, the onset of the
vocabulary spurt for both of these children corresponds less
to the absolute number of target lexemes in their vocabu-
laries [9] than to the proportion of target lexemes in their
vocabularies. Thus, we observe that when a majority of pos-
tulated linguistic units match vocabulary items in the adult
lexicon, then vocabulary development experiences an
accelerated growth. The finding that usage of PRFs and for-
mulaic expressions atrophies beyond this point suggests that
these two children have discovered a ``key'' to the solution
of the segmentation problem. Although these results do not
tell us what this ``key'' is or the nature of the mechanisms
that might lead to its discovery, these results suggest that
some critical proportional mass of target lexemes may be a
prerequisite for accelerated vocabulary growth and that the
solution to the segmentation problem may play an important
role in triggering the vocabulary spurt. The timing of the
vocabulary spurt corresponds very closely to the achievement
of a high proportion of target lexemes in the children's
productive vocabularies.
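One way to make the proposed ``critical proportional mass'' operational is to locate the first session at which target lexemes make up a majority of vocabulary types. The sketch below assumes a hypothetical input of (age in months, percentage of target lexemes) pairs; the 50% threshold reflects the majority criterion used above, and the example values are invented, not the children's scores.

```python
def first_majority_session(profile, threshold=50.0):
    """Return the age of the first session in which target lexemes
    exceed `threshold` percent of vocabulary types, or None.

    profile: iterable of (age_in_months, lexeme_percentage) pairs,
    in any order; they are sorted by age before scanning.
    """
    for age, lexeme_pct in sorted(profile):
        if lexeme_pct > threshold:
            return age
    return None


# Invented values for illustration only.
toy_profile = [(19, 44.0), (18, 35.0), (20, 52.0), (21, 61.0)]
print(first_majority_session(toy_profile))  # 20
```

The strict inequality treats a session at exactly 50% as not yet having a majority of target lexemes.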
A fundamental assumption of this work is that the arti-
culatory precision associated with a linguistic unit is
closely related to its length. Thus, one of the identifying
features of a formulaic expression is that it is imprecisely
articulated. However, the combination of linguistic
units, such as target lexemes, also leads to longer expres-
sions that may themselves suffer imprecise articulation due
to limited processing resources. Therefore, one interpreta-
tion of these findings might be that formulaic expressions
as identified by articulatory/fluency criteria are, in fact,
productive combinations which are imprecisely articulated.
In order to evaluate this interpretation, the next analysis
examines the formulaic expressions used by these two children
by calculating the mean length of formulaic expressions
as measured in terms of average number of target lexemes and
PRFs. These results are compared with the children's overall
MLU measures (see Figures 2 and 3).
Figure 5(a) provides an analysis of the formulaic
expressions used by Jens as identified by
articulatory/fluency criteria. The mean length of formulaic
expressions (MLF) is determined both under those conditions
in which PRFs are included and excluded from the calcula-
tions. When PRFs are excluded from the analysis, MLF
reflects the average number of target lexemes used by Jens
in formulaic expressions. The PRFs included in formulas
have the character of ``fillers'' (Peters [1983]), since
they are by definition used in combination with target lex-
emes and are imprecisely and fluently articulated.
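Because formulaic expressions are coded in CHAT compound notation, which preserves potential constituent boundaries, MLF can be computed much as MLU is, but over the constituents of each formula. The sketch below assumes, hypothetically, that each formula has already been split into (form, kind) constituents, with kind "lexeme" or "prf".

```python
def mlf(formulas, include_prfs=True):
    """Mean length of formulaic expressions, counted in constituents.

    formulas: list of formulas, each a list of (form, kind) pairs with
    kind "lexeme" or "prf". By definition every formula contains at
    least one target lexeme, so no formula has length 0 when PRFs are
    discounted.
    """
    if not formulas:
        return 0.0
    lengths = [
        sum(1 for _form, kind in f if include_prfs or kind != "prf")
        for f in formulas
    ]
    return sum(lengths) / len(lengths)


# Invented formulas: PRF + lexeme, lexeme + lexeme, lexeme + PRF.
formulas = [
    [("a", "prf"), ("bil", "lexeme")],
    [("se", "lexeme"), ("mor", "lexeme")],
    [("bil", "lexeme"), ("a", "prf")],
]
print(mlf(formulas))                      # 2.0
print(mlf(formulas, include_prfs=False))  # 1.3333333333333333
```

Excluding PRFs leaves only the target lexemes, which is the sense in which MLF without PRFs reflects the average number of target lexemes per formula.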
During the period that Jens uses formulaic expressions
(up to but not including the final session), they show a
general tendency to increase in length, as defined in terms
of average number of target lexemes. However, during the
early stages of acquisition, the majority of Jens' formulas
contain just one target lexeme plus an expression-initial
and/or expression-final PRF. Towards the end of the period
reported here, PRFs come to play less of a role in formulaic
expressions, which are made up primarily of target lexemes.
Substantial increases in the number of target lexemes in
formulaic expressions occur at the same time as Jens under-
goes a vocabulary spurt.
Most importantly, the average length of formulaic
expressions (as measured in target lexemes) does not greatly
exceed the overall MLU for Jens during the period in
development when he begins to produce productive combina-
tions (after 22 months) as determined by
articulatory/fluency criteria (see Figure 3(b)). Thus, Jens'
productive combinations are precisely articulated even
though they approach the same length as his formulaic
expressions, which are poorly articulated.
An analysis of formulaic expressions for Anne is pro-
vided in Figure 5(b). The pattern of development for Anne is
quite similar to that of Jens. The length of formulaic
expressions, defined only in terms of target lexemes,
increases with development. PRFs contribute less and less
as development proceeds. For a few sessions in early
acquisition, Anne produces formulaic expressions which
consist exclusively of a single target lexeme combined with
an expression-initial PRF. Throughout the period reported
here, whenever PRFs occur in a formulaic expression, they
occupy expression-initial position. Substantial increases
in the number of target lexemes in formulaic expressions
occur at the same time as Anne undergoes a vocabulary spurt.
As with Jens, the average length of formulaic expressions
does not greatly exceed the overall MLU (as shown in Figure
2(b)) during the period when Anne begins to produce produc-
tive combinations (after 19 months). In other words, Anne's
productive combinations are precisely articulated even
though they approach the same length as her formulaic
expressions. Taken together, these findings indicate that
the trade-off between articulatory precision and fluency
operates primarily at the level of the linguistic unit
rather than over the span of whole utterances.
Although PRFs are derived from undershooting solu-
tions to the segmentation problem, the previous analysis
demonstrates that PRFs are involved in formulaic expres-
sions, particularly during the early stages of acquisition.
This result can be viewed as a natural outcome of the
children's attempts to segment the speech signal. For
example, a hypothesised unit may overshoot a single target
lexeme but fail to encompass a second target lexeme. The
resulting unit will contain a single target lexeme plus a
segment of the second target lexeme, i.e., a PRF. However,
this raises the question as to whether the PRFs found in
formulaic expressions are of the same type as PRFs found in
single unit utterances or in productive combination with
target lexemes.
Suppose that formulaic expressions are typically
anchored around stressed parts of the input signal. [10]
Given the tendency for speech to oscillate between stressed
and unstressed segments, we might expect some overshooting
solutions to the segmentation problem to incorporate a
salient stressed item plus a less salient, unstressed item.
Under these conditions of segmentation, PRFs may often
derive from unstressed segments. Grammatical functors are
frequently unstressed segments in the input stream. Hence,
we might expect a tendency for PRFs in formulaic expres-
sions to derive from grammatical functors. In contrast, PRFs
that occur in single unit utterances or productive combina-
tions are likely to derive from salient (for the child) seg-
ments of the input stream. For example, PRFs may derive from
stressed segments. Open class lexical items are frequent
candidates for stress in adult Danish. Thus, it might be
expected that productive PRFs are often based on open
class lexical items. A corollary of this prediction is that
the class of productive PRFs is likely to be more numerous
than the class of formulaic PRFs, since grammatical functors
constitute a relatively small closed class of phonological
forms.
This prediction is tested in an analysis of the range
of distinct types of PRFs used by the two Danish children
over the course of development. Figure 6 summarises the use
of PRFs by Jens and Anne in terms of three different
categories. First, tokens of productive PRFs are identi-
fied. These consist of PRFs that occur exclusively in single
unit utterances or productive combinations. Second, tokens
of formulaic PRFs are identified. These consist of PRFs that
occur exclusively in formulaic expressions. Finally, tokens
of shared PRFs are identified as those PRFs which occur
both in formulaic expressions and in single unit utterances
or productive combinations.
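The three-way classification of PRFs amounts to simple set operations over the phonological forms observed in each kind of context. In the sketch below, the inputs are hypothetical lists of PRF forms (one entry per token) drawn from productive contexts (single unit utterances and productive combinations) and from formulaic expressions. Tokens are reduced to phonological types before classification, and percentages are then taken over all PRF types in the session, in the manner described for Figure 6.

```python
def classify_prfs(productive_tokens, formulaic_tokens):
    """Split PRF phonological types into productive, formulaic and shared.

    productive_tokens: PRF forms found in single unit utterances or
    productive combinations. formulaic_tokens: PRF forms found inside
    formulaic expressions.
    """
    productive = set(productive_tokens)
    formulaic = set(formulaic_tokens)
    shared = productive & formulaic
    return {
        "productive": productive - shared,
        "formulaic": formulaic - shared,
        "shared": shared,
    }


def prf_percentages(classes):
    """Percentage of all PRF types falling into each class."""
    total = sum(len(forms) for forms in classes.values())
    if total == 0:
        return {name: 0.0 for name in classes}
    return {name: 100 * len(forms) / total for name, forms in classes.items()}


# Invented forms for one session.
classes = classify_prfs(["ba", "di", "ba", "mo"], ["a", "di"])
print(prf_percentages(classes))
# {'productive': 50.0, 'formulaic': 25.0, 'shared': 25.0}
```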
Figure 6(a) provides a breakdown of the proportion of
productive PRFs, formulaic PRFs and shared PRFs used by
Jens. The proportions (percentage of total PRF usage) are
based on non-cumulative measures of phonological types
used on each session. The analysis reveals a fairly clear
pattern of usage of PRFs throughout development. Productive
PRFs constitute the largest class for the greater part of
the developmental period reported here. Shared PRFs consti-
tute a relatively small proportion of total PRFs throughout
development. Formulaic PRFs exhibit a more variable develop-
mental profile. They tend to constitute a relatively small
class though there is a temporary increase in usage during
the middle of the second year. This period corresponds to
the period in which the proportion of formulaic expressions
in Jens' vocabulary increases (see Figure 4(a)).
Figure 6(b) provides a breakdown of the proportion of
productive PRFs, formulaic PRFs and shared PRFs used by
Anne. The distribution of PRFs over the three categories
reveals that the majority of PRF forms used by Anne are
found in productive expressions. Both formulaic and shared
PRFs are rare in Anne's vocabulary. The increase in formu-
laic and shared PRFs observed around the middle of the
second year corresponds to an increase in Anne's usage of
formulaic expressions (see Figure 4(b)).
These results confirm the prediction that the phonolog-
ical form of PRFs will differ according to their source.
Thus, shared PRFs constitute a small class for both chil-
dren. The results also confirm the prediction that produc-
tive PRFs constitute a larger class of phonological types
than formulaic PRFs. The findings are consistent with the
view that productive PRFs are initially derived from open
class lexical items and that formulaic PRFs are initially
derived from grammatical functors. Furthermore, these
results support the earlier characterisations of individual
differences between the children. Insofar as Jens adopts a
holistic (overshooting) segmentation strategy, we might
expect to observe a larger range of formulaic expressions
and hence a larger range of formulaic PRFs. Insofar as Anne
adopts an analytic (undershooting) segmentation strategy,
the range of formulaic PRFs will be inhibited and the range
of productive PRFs enhanced.
5. Conclusions
A principal methodological finding of this work is that
articulatory/fluency criteria constitute a viable set of coding
procedures for identifying the child's representation of
linguistic units. First, it has been shown that transcribers
can reliably agree on the relative precision of
articulation/fluency of an expression to categorise sound
segments as either PRFs, target lexemes or formulaic expres-
sions. [11] Second, the application of articulatory/fluency
criteria in this fashion yields a coherent profile of
development for these children which does not emerge when
distributional/frequency criteria are applied. In particu-
lar, measurements of MLU and vocabulary scores yield pro-
files for the two Danish children which match those reported
in the literature for many other children.
The application of articulatory/fluency criteria to the
identification of linguistic units also permits the identif-
ication of the pattern of individual differences that dis-
tinguish the two children in this study. Thus, Jens is
observed to rely heavily on formulaic chunks in his early
language productions whilst Anne uses shorter units, relying
heavily on PRFs and target lexemes in early productions.
This pattern of individual differences is not apparent from
the application of distributional/frequency criteria.
Interestingly, articulatory/fluency criteria also seem to
assist in the identification of the onset of productive
inflectional morphology in these children. Usage of the
Danish plural morpheme /-er/ increased substantially
in both children shortly after their vocabulary spurts
(Plunkett & Stromqvist [1990]). At this time, articulatory
stress on the inflection was exaggerated by the children,
both in relation to adult norms and their own earlier pro-
ductions. A similar observation with respect to the emer-
gence of inflectional morphology has been made by Smoc-
zynska [1981] in a study of Polish children.
It has been observed that the profiles of linguistic
development revealed by the two methods of analysis tend to
converge towards the end of the developmental period
reported here. This result suggests that the two sets of
criteria are in some sense equivalent during the later
stages of development. It is noteworthy that convergent pro-
files begin to emerge after the children have passed into
(or through) their vocabulary spurts. It has been argued
above that an important triggering factor for the two
children's vocabulary spurts may be their solution of the
segmentation problem. Both PRFs and formulaic expressions
atrophy in the two children at precisely the point in
development that target lexemes undergo a rapid increase in
number. Furthermore, both children appear to investigate
alternative segmentation strategies just before the vocabu-
lary spurt occurs, suggesting that they may be calibrating
their hypotheses as to what constitutes an adult target lex-
eme. The solution to the segmentation problem may also
underlie the apparent convergence of the two methods for
identifying linguistic units. Once the children discover the
correct segmentation of formulaic expressions, then these
longer phonetic chunks will be decomposed into separate lex-
ical items. [12] The ensuing shorter length of these units
will result in their being more precisely articulated by the
child and hence more easily identified as productive units
by the transcriber. Furthermore, the distinct lexical encod-
ing of these units will support their participation in a
range of productive combinations, which will lead to their
being identified as productive units on the application of
distributional/frequency criteria.
The timing of the vocabulary spurt may also be related
to the predominant segmentation strategy adopted by children
in early development. Anne is a precocious user of target
lexemes as compared to Jens. Her identification of target
lexemes may be assisted by a segmentation strategy that
focuses on shorter units of speech. It was argued above that
the lack of articulatory precision and the fluency associ-
ated with formulaic expressions may result in impoverished
representations in memory of these longer units. These
sparse representational characteristics may hinder the
recognition of new tokens of linguistic units in the speech
signal or hinder the possibility of such units becoming the
objects of segmentation processes themselves. Thus, children
who prefer a more holistically oriented segmentation stra-
tegy may experience greater difficulty in identifying target
lexemes than children whose segmentation preferences focus
on shorter units which are represented in memory in a more
phonetically accurate fashion. Recent work on ``early
talkers'' and ``late talkers'' (Bates & Thal [1991]; Hampson &
Nelson [In Press]) identifies a tendency for
analytic/referential children to undergo vocabulary spurts
earlier in development than social/expressive children.
Given the clustering of characteristics associated with
these categorisations of children's language (Bates, Breth-
erton & Snyder [1988]), the suggestion that predominant
segmentation strategies play a role in determining profiles
of development coheres with the findings of these studies.
From a theoretical perspective, the results leave open
the question as to why the two children have such distinct
segmentation preferences early in development or how they go
about solving the segmentation problem. In an attempt to
evaluate the role of environmental factors in determining
the children's segmentation strategies, preliminary analyses
of the utterances of the mothers of the two children have
been performed. The same articulatory/fluency criteria as
those applied to the children are used to identify formulaic
expressions in three sessions for each mother (at the begin-
ning, middle and end of the period reported here). The
results indicate that Jens' mother uses a higher proportion
of formulaic utterances than Anne's mother. Furthermore,
Anne's mother tends to
exaggerate the prosodic contours of her speech to a greater
extent than Jens' mother. Prosodic exaggeration of child
directed speech has also been reported in other studies
(Fernald [1989]; Fernald et al. [1989]; Shute & Wheldall
[1989]). These preliminary findings suggest that the dis-
tinct segmentation strategies adopted by the children may
have an environmental source. The imprecisely articulated
and fluent character of Jens' mother's speech may
hinder the identification of regularities at the lexeme
level. In contrast, the prosodic exaggerations by Anne's
mother may highlight constituents of an expression, either
through focused stress patterns or contour boundaries.
Any attempt to identify the source of individual
differences in children with environmental factors (such as
child directed speech) must confront the observation that
children reared in apparently similar environments can
develop language in diverse ways. An environmental explana-
tion of why siblings differ in their acquisition strategies
would entail an identification of differences in the
environments of the siblings (say, in the child directed
speech). However, the identification of environmental
correlates of individual differences would not constitute an
explanation of those differences. An account of how
environmental correlates become entrenched in the child's
cognitive/linguistic/perceptual processing apparatus is
still required. In other words, environmental correlates of
individual differences contribute little to our understand-
ing of how children acquire language unless we identify the
manner in which learning mechanisms exploit these
environmental correlates. Unfortunately, we have only a lim-
ited understanding of the nature of these learning mechan-
isms, despite possessing a rich database of acquisition pro-
files across a variety of languages.
With respect to the segmentation problem, some progress
is being made in identifying potential candidate mechanisms
of acquisition. For example, Elman [1990] has shown
how an artificial neural network is able to learn to segment
a sequence of phonemes into linguistic units that correspond
to lexemes. The segmentation process is based on a purely
distributional analysis of the input stream. Interestingly,
the neural network trained on this task passes through
stages en route to solving the segmentation problem in which
it hypothesises units which are both larger and smaller than
the target lexemes. In other words, the network postulates
organisational units during periods of its training that
correspond to the formulaic expressions and PRFs observed in
the children in this study. Clearly, the task that Elman
gives his network is a simplification of the segmentation
problem confronting children. Children are not provided with
an input stream neatly dissected into discrete phonemes.
However, the model does provide an account of the type of
learning mechanism that might be involved in segmenting
speech. Furthermore, this modelling approach provides a
framework for systematically evaluating the effects of
environmental conditions, given a particular cognitive
architecture and learning mechanism. For example, specific
linguistic characteristics that child language researchers
believe might contribute to individual differences in
language acquisition might be manipulated in the input
stream to the network and their effects compared to data
collected from children. The modelling approach offers a
potentially powerful tool for evaluating the interactions of
environmental conditions with different types of learning
mechanisms, in a non-intrusive manner not possible with
'live' subjects.
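Elman's [1990] recurrent network is too large to reproduce here, but the purely distributional idea behind it can be illustrated with a simpler, swapped-in technique: segmentation by transitional probability. The sketch below is not Elman's model. It assumes that the input is already a string of discrete symbols (letters standing in for phonemes, the same simplification noted above), and the toy lexicon, the function names `transitional_probabilities` and `segment`, and the threshold values are all hypothetical. A boundary is posited wherever the estimated probability of the next symbol given the current one falls below a threshold, on the reasoning that transitions within a lexeme are more predictable than transitions across lexemes.

```python
from collections import Counter


def transitional_probabilities(stream):
    """Estimate P(next symbol | current symbol) from adjacent-pair counts."""
    pair_counts = Counter(zip(stream, stream[1:]))
    # Count each symbol only where it is followed by another symbol.
    first_counts = Counter(stream[:-1])
    return {(a, b): n / first_counts[a] for (a, b), n in pair_counts.items()}


def segment(utterance, tp, threshold):
    """Split an utterance wherever the transitional probability dips below threshold."""
    units, current = [], [utterance[0]]
    for a, b in zip(utterance, utterance[1:]):
        # Unseen transitions get probability 0.0 and therefore always mark a boundary.
        if tp.get((a, b), 0.0) < threshold:
            units.append("".join(current))
            current = []
        current.append(b)
    units.append("".join(current))
    return units


# Hypothetical toy input: three words with no shared letters, concatenated
# without pauses, so the stream itself carries no explicit boundaries.
WORD_ORDER = ["dog", "cat", "bus", "dog", "bus", "cat"]
stream = "".join(WORD_ORDER * 5)
tp = transitional_probabilities(stream)

print(segment("catdogbus", tp, threshold=0.9))  # ['cat', 'dog', 'bus']
print(segment("catdogbus", tp, threshold=0.4))  # ['catdogbus']
```

With the higher threshold the toy utterance is split into its three target words; lowering the threshold suppresses boundaries, so adjacent words merge into a single larger unit. This is offered only as a loose analogy to the units larger than target lexemes discussed above, not as a model of either child.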
The principal empirical finding of this work is that
the timing of children's vocabulary spurts is closely
associated with their solution of the segmentation problem.
The switch in segmentation strategy immediately prior to
their vocabulary spurts suggests that the children are
actively testing hypotheses about the appropriate
segmentation of the speech
signal. Although this finding does not rule out the
possibility that conceptual developments may play an impor-
tant role in triggering a vocabulary spurt, it does suggest
that the processing of the speech signal itself sets
important constraints on development. Even if the child has
discovered all the essential building blocks of the sound
system of a particular language, considerable problems
remain in determining which sounds go together to form
"legal strings" in the language.
The results reported here are based exclusively on
children's linguistic productions. However, the solution
to the segmentation problem implies that the child has iden-
tified target lexemes based on representations of the input
signal. It is to be expected, therefore, that segmentation
solutions in expressive language will go hand in hand with
correct segmentations of the input language. In other words,
it is likely that a vocabulary spurt in expressive language
will be accompanied by a similar spurt in receptive
language.
Finally, it has been shown that solutions to the seg-
mentation problem can vary according to the source from
which various segments are derived. For example, PRFs in
formulaic expressions tend to have a different phonological
form to PRFs used in single unit utterances or productive
combinations. It was suggested that this variation could be
traced to the tendency for linguistic units to be anchored
in stressed segments of the speech signal. Formulaic PRFs
may derive from unstressed segments (such as grammatical
functors) whilst productive PRFs may derive from stressed
segments (such as open class lexical items). To the extent
that prosodic and supra-segmental properties of the speech
signal influence segmentation strategies, cross-linguistic
variation in segmentation solutions reflecting
gross prosodic and supra-segmental differences between
languages can be anticipated. For example, the Mainland
Scandinavian Languages --- Danish, Norwegian and Swedish ---
constitute a typologically (grammatically) homogeneous
group. However, the languages differ considerably in terms
of their prosodic and supra-segmental characteristics. These
differences might be expected to lead to cross-linguistic
variation in the form of the segmentation solutions that,
say, Danish and Swedish children discover during the course
of development.
Footnotes
* I would like to thank Elizabeth Bates, Judith Goodman,
Virginia Marchman, Ann Peters and Donna Thal for detailed
comments on an earlier draft of this manuscript. Address for
correspondence: Kim Plunkett, Institute of Psychology,
University of Aarhus, Asylvej 4, DK-8240 Risskov, DENMARK.
Email: psykimp@aau.dk
1. Formulaic expressions may also arise from the
automatisation of syllabic sequences through practice
effects. Practice effects may also lead to filler-like
expressions, such as when words are shortened to encode only
the stressed syllable.
2. In most studies, the internal structure of a child's
utterance is taken for granted. Hence the second stage in
the researcher's segmentation problem, the application of
formulaicity criteria, is not carried through.
3. Given that Anne's MLU profile resembles that of Brown's
[1973] subjects Adam and Eve, it is noteworthy that Brown
used clarity of articulation as a selection criterion in his
study, in order to facilitate the process of transcription.
4. Many speakers may seem to contradict this claim. However,
both practice effects and individual differences amongst
speakers may influence the equilibrium state of the
self-organising phonetic space. Nevertheless, it should be borne
in mind that the speed/accuracy trade-off is a within-
subject variable.
5. Some phonetic features may be more robust than others (as
is usually the case in self-organising systems) and hence
appear as articulatory anchor points in the child's produc-
tions.
6. A single-unit expression may consist of either a target
lexeme or a PRF. expressions and combinations.
7. There is a certain circularity in the argument here
since, on the one hand, it is predicted that formulaic
expressions will be imprecisely articulated because of their
length, and on the other hand, articulatory/fluency criteria
are used to identify formulaic expressions. This circularity
is broken only by the empirical findings reported by Lind-
blom [1985] concerning the trade-off between fluency and
articulatory precision. At another level, we may consider
the current work as an attempt to evaluate the assumption
that formulaic expressions are imprecisely articulated by
observing whether this assumption leads to plausible
descriptions of linguistic development.
8. A complete record of these codings, as well as the entire
database for this longitudinal study, can be obtained from
the CHILDES archive administered by Brian MacWhinney,
Dept. of Psychology, Carnegie Mellon University, Pittsburgh,
PA 15213. Alternatively, copies of the database can be
requested from the author who also maintains the original
recordings.
9. For Anne the vocabulary spurt begins when she controls 93
target lexemes and for Jens 62 target lexemes.
10. This assumption finds support in the current database.
For Anne and Jens, the overwhelming majority of target lex-
emes found in formulaic expressions are deictic pronouns,
Hv. (Wh.) words and concrete nouns --- all lexical items
that tend to be stressed in Danish. The only exception is
the frequent use in formulaic expressions of the present
tense copula, which tends to be unstressed in adult Danish.
However, the copula is a frequently used form in adult Danish,
occurring in a wide variety of constructions (see
Plunkett & Stromqvist [1990]).
11. It is unclear exactly which properties of the speech
signal transcribers are responding to when they make
categorisations of expressions as formulaic or productive.
Although the coding procedures are designed to focus the
transcriber's attention on the articulatory/fluency dimen-
sion, more detailed phonetic studies are required to isolate
and identify the factors contributing to transcribers'
judgements.
12. Snow [1986] also argues that expressions previously
unanalysed by the child may become the objects of segmenta-
tion processes.
References
Bates, E., Bretherton, I., & Snyder, L. [1988], From First
Words to Grammar: Individual Differences and Dissoci-
able Mechanisms, Cambridge University Press, Cambridge,
MA.
Bates, E., & Thal, D. [1991], "Associations and dissocia-
tions in child language development," in Research on
child language disorders: A decade of progress, J.
Miller & R. Schiefelbusch, eds., Pro-Ed, Austin, TX.
Bloom, L. M. [1973], One word at a time: The use of single
word utterances before syntax, Mouton, The Hague.
Bretherton, I., McNew, S., Snyder, L., & Bates, E. [1983],
"Individual differences at 20 months: analytic and
holistic strategies in language acquisition," Journal
of Child Language, 10, 293--320.
Brown, R. [1973], A first language: The early stages,
Allen & Unwin, London.
Elman, J. L. [1990], "Finding structure in time," Cognitive
Science 14, 179-211.
Fernald, A. [1989], "Intonation and communicative intent
in mother's speech to infants: Is the melody the
message?" Child Development 60, 1497--1510.
Fernald, A., Taeschner, T., Dunn, J., Papousek, M.,
Boysson-Bardies, B. De & Fukui, I. [1989], "A cross-
language study of prosodic modifications in mothers'
and fathers' speech to preverbal infants," Journal of
Child Language 16, 477-501.
Hampson, J. & Nelson, K. [In Press], "The relation of mater-
nal language to variation in rate and style of language
acquisition," Journal of Child Language.
Hickey, T. [1990], "Identifying formulas in first language
acquisition: an application to Irish," Paper presented
to the Fifth International Congress for the Study of
Child Language, Budapest, Hungary.
Jordan, M. I. [1990], "Motor learning and the degrees of
freedom problem," in Attention and Performance XIII, M.
Jeannerod, ed., Lawrence Erlbaum Associates,
Hillsdale, NJ.
Lindblom, B. [1985], "Phonetic universals in vowel sys-
tems," in Experimental Phonology, J. J. Ohala, ed.,
Academic Press, San Francisco.
MacWhinney, B. [1978], "The acquisition of morphology,"
Monographs of the Society for Research in Child
Development 43.
MacWhinney, B. [1990], Computational tools for language
analysis: the CHILDES system, Lawrence Erlbaum Associ-
ates, Hillsdale, NJ.
MacWhinney, B. & Snow, C. [1985], "The child language data
exchange system," Journal of Child Language 12, 271--298.
MacWhinney, B. & Snow, C. [1990], "The child language data
exchange system: An update," Journal of Child Language.
Miller, J. F. & Chapman, R. S. [1981], "The relation between
age and mean length of utterance in morphemes," Journal
of Speech and Hearing Research 24, 154-164.
Nelson, K. [1973], "Structure and strategy in learning to
talk," Monographs of the Society for Research in Child
Development 38.
Nelson, K. [1981], "Individual differences in language
development: Implications for acquisition and develop-
ment," Developmental Psychology 17, 170-187.
Peters, A. M. [1989], The units of language acquisition,
Cambridge Series of Monographs and Texts in Applied
Psycholinguistics, Cambridge Univ. Press, New York.
Peters, A. M. [1989], "From schwaa to grammar: The emergence
of grammatical morphemes," Paper presented to the Boston
University Conference on Language Development.
Plunkett, K. [1986], "Learning Strategies in Two Danish
Children's Language Development," Scandinavian Journal
of Psychology 27, 64-73.
Plunkett, K. & Stromqvist, S. [1990], The Acquisition of
Scandinavian Languages, Gothenburg Papers in Theoretical
Linguistics 59, University of Gothenburg.
Shute, B. & Wheldall, K. [1989], "Pitch alterations in
British motherese: some preliminary acoustic data,"
Journal of Child Language 16, 503-512.
Smoczynska, M. [1981], "Uniformities and Individual Varia-
tion in Early Syntactic Development," Polish Psycholog-
ical Bulletin 12, 3-15.
Snow, C. E. [1972], "Mother's speech to children learning
language," Child Development 43, 549--565.
Snow, C. E. [1986], "Conversations with children," in
Language acquisition: Studies in first language
development. Second edition, P. Fletcher & M. Garman, eds.,
Cambridge University Press, Cambridge.
Uzgiris, I. C. & Hunt, J. McV. [1975], Assessment in
Infancy, University of Illinois Press, Urbana.