The Segmentation Problem in Early Language Acquisition

Kim Plunkett

University of Aarhus, Denmark

[The following paper does  not  include  diagrams  that  are
referred  to  in  the  text;  however, CRL will be more than
happy to send hard copies, which include diagrams, to people
who  request them.  Requests can be directed to us via email
or regular mail.  Additionally, footnotes in this version of
the paper appear at the end.]


     An important source of individual  variation  in  young
children  acquiring  their  first language is the variety of
potential solutions to the segmentation problem. Alternative
solutions  can  result in non-standard forms, such as formu-
laic expressions and phonologically reduced forms, in  early
productions.   Articulatory/fluency criteria for identifying
formulaic expressions, phonologically reduced forms and tar-
get  lexemes  in  linguistic  productions  are  defined  and
applied to the analysis of two  Danish  children's  language
development  between  the  ages  of 12 months and 26 months.
The results of this analysis are compared to the results  of
applying  standard  distributional/frequency criteria in the
tabulation of mean length of utterance and  vocabulary  pro-
files  for  both  standard  and  non-standard  forms.  It is
argued that although the two methods yield  converging  pro-
files  of  development  during the latter part of the period
studied, articulatory/fluency criteria  provide  a  coherent
methodology  for  analysing children's early linguistic pro-
ductions and offer a potentially powerful tool for identify-
ing alternative segmentation strategies.  The application of
articulatory/fluency criteria identifies one of the children
as  seeking primarily holistic solutions to the segmentation
problem and relying heavily in early acquisition  on  formu-
laic  expressions. The second child seeks primarily analytic
solutions to the segmentation problem and is  prone  to  use
phonologically reduced forms as productive linguistic units.
Profiles of vocabulary development for  these  two  children
suggest that the solution to the segmentation problem may be
an important trigger for their vocabulary  spurts.  Possible
environmental correlates of these individual differences are
discussed and a learning mechanism for solving the segmenta-
tion  problem in a manner which honours the facts of indivi-
dual variation is introduced.

                     1. Introduction

     A central problem in first language acquisition is  the
segmentation  problem:  How  do children discover the struc-
tural components in the speech signal  without  knowing  the
identity of the target elements (Peters [1983])?  Although
adult speech contains a wealth of phonetic and prosodic cues
that  afford  a  structural  interpretation,  these cues are
often ambiguous and distorted as  a  result  of  performance
factors  (such as hesitations, slips of the tongue, etc.) or
as a result  of  interference  from  other  linguistic  cues
transmitted  in  the speech signal (co-articulation effects,
context effects, etc.).  Children must overcome these diffi-
culties  by bootstrapping their way into the linguistic sys-
tem, making use of whatever  information  they  can  extract
from   the  surrounding  environment  (linguistic  and  non-
linguistic)  and  their  own  predispositions   (innate   or
acquired) for processing linguistic information.

     En route to solving the segmentation problem,  children
may ascribe structural properties to the speech signal which
do not match those of the adult language user. For  example,
children  may  ascribe  lexical status to syllabic sequences
which in the adult tongue are  considered  parts  of  words.
Thus, Bloom [1973] and Peters [1989] note the prevalent use
of schwa as a filler expression in early word combinations.
Alternatively, children may ascribe lexical status to
whole sequences of words and use these lexical  chunks  with
the  same  distributional properties as adult lexical items.
MacWhinney  [1978] refers  to  these  apparently  unanalysed
sequences  as  amalgams.   Peters  [1983]  calls them formu-
laic expressions. The use of fillers and  formulaic  expres-
sions  would  appear  to  be a natural outcome of children's
attempts to solve the segmentation problem. [1] Elements  of
the  speech signal, thus identified, may serve as indispens-
able bootstraps for children entering the linguistic commun-
ity,  especially in interaction with conversational partners
willing to offer generous interpretations of non-standard
forms.

     A  number  of  authors  (Bates,  Bretherton  &   Snyder
[1988];  Nelson   [1981]; Peters  [1983]) have noted indivi-
dual differences between children in  the  degree  to  which
they exploit fillers and formulaic expressions. Furthermore,
it has been observed that these differences tend  to  corre-
late with other characteristic differences between children.
For example, Bates, Bretherton & Snyder  [1988],  Bretherton
et  al.   [1983],  Hampson  &  Nelson [In Press], and Nelson
[1973] observe that reliance  on  formulaic  expressions  is
characteristic of a ``social/expressive'' speech style and a
proportionally  high  use  of  pronouns.  In  contrast,  the
absence  of  formulaic  expressions  in a child's productive
vocabulary     has     been     associated      with      an
``analytic/referential''  speech  style  characterised  by a
proportionally high use of concrete nouns.   Peters   [1983]
also  notes  that formulaic productions are typically highly
fluent, possessing a ``mush-mouth'' character.

     Individual variation in the use  of  formulaic  expres-
sions  and  fillers across children may result from alterna-
tive solutions in the application of segmentation processes.
Differences  in  the  perceptual  processing  preferences of
individual children as well as variation in  the  linguistic
environment  (parental  speech  style, degree of exposure to
other children, etc.) undoubtedly influence the  performance
of  the  parsing  mechanisms that yield the potential struc-
tural elements in the speech signal.  For example, the  ten-
dency  for  analytic/referential language users to possess a
high proportion of concrete object names  in  their  vocabu-
laries may result from a parsing preference by some children
to focus on stressed parts of the speech signal.  Pronominal
usage  may  reflect  a  preference for particular positional
characteristics (signal  initial  or  final)  in  the  input
stream  or a sensitivity to frequency characteristics of the
speech  signal.    Alternatively,   individual   differences
between children may emerge from the intersection of a range
of factors,  distinct  from  segmentation  issues,  such  as
social  and cognitive style (what type of things do children
like to talk about and how do they prefer to  express  them-
selves),  level of conceptual and semantic development (what
do they know about the  meanings  of  different  words)  and
conversational  settings  (the degree to which the interests
and conversational styles of others constrain the linguistic
productions of children).

     The determination  of  the  child's  representation  of
linguistic  units constitutes a major methodological problem
for child language research.  In essence, the child language
researcher is confronted with a problem similar to that con-
fronting the child acquiring the language: What are the pro-
ductive  units  underlying  the  speech signal? For example,
when the child articulates the utterance  ``What's  that?'',
the  analyst  must decide whether it consists of one, two or
three distinct morphemes.   Typically,  researchers  resolve
this problem by assuming that a  target item produced by the
child can be identified with a morpheme in the  adult  lexi-
con. Various criteria are then applied to determine the for-
mulaic status of an utterance. [2] Distributional  and  fre-
quency  criteria  (Brown [1973]) have been proposed as tools
in helping to establish the internal structure of children's
utterances.  Thus,  if  potential constituent morphemes only
occur within the context of a given expression, then it  may
be  prudent to treat the expression as formulaic. Similarly,
if particular expressions are used frequently  by  a  child,
then  those  expressions  may  be unanalysed for that child.
Note that neither of these criteria guarantees identification
of formulaic expressions. Morphemes may be used productively
by a child in a variety of distributional
contexts, and yet still participate non-productively in for-
mulaic expressions. The term ``bucket'' may be used  produc-
tively  in  a  variety  of  constructions, but in the phrase
``kicked the bucket'', it has taken  on  an  idiomatic  non-
productive status.  And particular expressions may be highly
frequent in the child's productions and  still  result  from
the  combination  of distinct lexical representations. These
shortcomings suggest that evaluations of  formulaicity  also
need  to take into account properties specific or local to a
given token of an expression.

     Hickey  [1990] suggests criteria to  determine  whether
an  expression  is  formulaic (relative to other spontaneous
utterances). These concern the length of the expression, its
phonological  coherence, the level of grammatical complexity
of the expression, its frequency of usage in the  community,
the idiosyncrasy of the expression, situational dependency
and semantic or syntactic appropriateness.  She  points  out
that  none of these criteria alone will suffice in determin-
ing the formulaic status of an  expression.  Some  criteria,
such  as length and phonological coherence, may be necessary
characteristics of a formulaic expression but other criteria
can  only be considered typical and not definitional of for-
mulaicity.   Hickey [1990] concludes that the evaluation  of
an expression should be made in relation to a graded
continuum, i.e., expressions can be more or less formulaic.

     Plunkett [1986] describes a longitudinal  investigation
of  two  Danish children's linguistic and cognitive develop-
ment between the ages of 12 months and 25 months. This study
identifies three types of linguistic units in the children's
productions: idiosyncratic expressions, target lexemes and
formulaic  expressions.   Each  of  these  types can be con-
sidered alternative solutions to the  segmentation  problem.
Idiosyncratic  expressions  involve  sound segments produced
consistently by the children which cannot be identified with
lexical  items  in  the  adult language. Typically, idiosyn-
cratic expressions are  filler-like  segments  containing  a
single  vowel  or  vowel-consonant  combination. They can be
thought of as undershooting solutions  to  the  segmentation
problem:  The  child  extracts sound segments from the input
signal which undershoot the scope of adult lexical items. In
contrast,  formulaic  expressions  result  from overshooting
solutions to the segmentation  problem.   Formulaic  expres-
sions  contain  identifiable adult lexical items in combina-
tion with other sound segments which  may  be  either  other
lexical  items or filler-like material. Finally, target lex-
emes represent correct solutions to the segmentation problem
in  that  they  map  directly  onto adult lexical items. The
three types of linguistic unit are thus related in terms  of
their  length,  with  target lexemes representing the inter-
mediate case.

     In order to distinguish between  formulaic  expressions
and  productive  combinations,    Plunkett   [1986]  applied
standard distributional and frequency criteria in establish-
ing the status of a linguistic expression. These evaluations
provided the foundations for establishing a profile  of  MLU
(mean  length  of  utterance) and vocabulary development for
the two children.  For one child, this analysis  revealed  a
``standard''   MLU   profile,  such  as  that  observed  for
Brown's  [1973] subjects Adam and Eve, i.e., MLU  remains at
a  low level (close to 1.0) until around 21 months when both
MLU and vocabulary size show marked increases. However,  for
the  other  child,  MLU  began at a high level (around 1.6),
increased to around 2.0 by 14 months of age and then dropped
over  consecutive  sessions  during  a three month period to
around 1.3. A subsequent increase in    MLU  coincided  with
the  child's vocabulary spurt (around 22 months).   Plunkett
[1986] speculates that the unusually high  MLU  observed  in
the  child  Jens  during early development is an artifact of
the inadequacy of distributional and frequency  criteria  to
identify  formulaic  expressions  in  this  child's  speech.
Ascribing productive status to expressions which are  formu-
laic artificially inflates MLU measures.

     More generally, it might be concluded from  this  study
that distributional and frequency criteria are inadequate to
the task of uncovering this Danish child's solution  to  the
segmentation  problem.  The  issue then arises as to whether
distributional and frequency criteria should be supplemented
with other criteria, as Hickey  [1990]  suggests, or whether
they should be discarded in favour of a separate set of cri-
teria.  In  the  following,  an  alternative set of criteria
based on articulatory/fluency factors (relating to  Hickey's
[1990]  necessary  phonological coherence criterion) will be
proposed for identifying children's solutions to the segmen-
tation  problem,  and  systematically compared with analyses
based exclusively on distributional and frequency criteria.

     In the coding of the two Danish children's  productions
reported  in  Plunkett [1986], transcribers noted informally
that the children differed in how well they tended to  arti-
culate  their  utterances.  For  example, Jens appeared more
prone than Anne to produce longer,  imprecisely  articulated
utterances  whilst  Anne's  utterances  appeared shorter and
clearly articulated. [3]  As  noted  above,  Peters   [1983]
points  out that formulaic expressions are often produced in
a ``fluent'' manner by children, also noting  that  children
may be focusing on supra-segmental aspects of the utterance.
Formulaic expressions may have prosodic qualities associated
with  those of adult utterances (such as intonation) but may
be articulated imprecisely, resulting in difficulty in iden-
tifying the content of the expression.  In contrast, limited
processing resources may result in shorter linguistic  units
such   as   target   lexemes  (or  segments  thereof)  being
articulated more clearly by the child and hence being easier
to    identify.   It   is   possible   to   hypothesise   an
articulatory/fluency continuum for  linguistic  units  where
long, badly articulated, formulaic expressions represent one
end of the continuum and precisely articulated, short  units
(such  as mono-syllabic target lexemes or segments of target
lexemes)  represent  the  other  end   of   the   continuum.
Articulatory/fluency  criteria  may  thus  offer a means for
identifying alternative solutions to the segmentation  prob-
lem,  where  overshooting  solutions (formulaic expressions)
will tend to be badly articulated  and  undershooting  solu-
tions  (segments  of target lexemes) will tend to be clearly
articulated.  Undershooting solutions  to  the  segmentation
problem  will  henceforth  be referred to as  Phonologically
Reduced Forms (PRFs).

     There are good theoretical and experimental grounds for
supposing that the precision with which a linguistic unit is
articulated is closely  related  to  its  length.   Lindblom
[1985]  has  argued  for  an  approach  to understanding the
developing phonetic skills of children  in  terms  of  self-
organising systems.  He shows how the placement of vowels in
phonetic space is dynamically sensitive to the  position  of
other vowels in this space. The tendency of vowels to change
character according to linguistic context  may  be  ascribed
(partially)  to  the  dynamics  of  this phonetic space. The
phonetic realisation of a vowel is determined  not  only  by
the  other phonemes with which it combines in a given utter-
ance, but also by the internal dynamics of the vowel  system
to  which it belongs.  A similar point has been made by Jor-
dan [1990] in a computer model of motor  control  in  speech

     Integral  defining  features   of   Lindblom's   [1985]
phonetic  space are the dimensions of articulatory precision
and fluency.  Articulatory precision defines the  degree  of
accuracy  (in  relation to a target) with which a phoneme is
articulated. Fluency refers to the level of integration of a
sequence  of  phonemes  that  differentiates a smooth from a
halting performance. Any phonetic production  by  a  speaker
can  be  evaluated along these two scales.  Lindblom  [1985]
concludes from a series of  experimental  studies  that  the
dimensions   of   articulatory  precision  and  fluency  are
inversely related, i.e., under conditions which require
articulate speech, fluency tends to deteriorate, whilst
articulation deteriorates when high fluency is demanded. This
trade-off  between  articulatory  precision  and  fluency is
explained as an emergent  property  of  the  self-organising
dynamics  of phonetic space. In other words, it is difficult
to talk quickly and accurately at the same time. [4]

     Now consider a young child confronted with the
segmentation problem. Let us suppose that regularities in the
child's linguistic environment, together with her perceptual
processing mechanisms, have led the child to identify a
sequence of sounds as a structural component in  the  speech
signal. Let us further suppose that in this particular case,
the sequence of sounds corresponds to a sequence of words in
the adult lexicon (e.g., ``What's that?''). The child stores
this linguistic unit in memory, encoding such information as
its distributional linguistic properties, its non-linguistic
context of usage, perhaps information about its meaning, and
information  about  its  phonetic and supra-segmental shape.
The processes of segmentation may also  lead  the  child  to
identify  ``That?''  as  a  linguistic  unit.  Many  of  the
representational properties encoded along with ``That?'' may
overlap   with  those  of  ``What's  that?''.  However,  the
representation of phonetic and  supra-segmental  shape  will
crucially differ in the two cases. In particular, the
production of the two units requires coordination of
sequences of phonemes that place differential load on the
``fluency'' dimension of the self-organising phonetic space.
The expression ``What's that?'' requires a greater degree of
phonetic integration than the expression ``That?''. Since
fluency  is  inversely related to articulatory precision, it
can be expected that the fluency load on the longer  expres-
sion will detract from its articulatory precision. Thus, the
shorter expression ``That?'' will be articulated  more  pre-
cisely than the longer expression ``What's that?''. Further-
more, to the extent that the supra-segmental properties of a
linguistic  unit are unaffected by the self-organising char-
acter of phonetic space, we may expect  the  supra-segmental
properties  of  the  longer unit to remain intact whilst its
phonetic features suffer articulatory distortion. [5]

     It is unclear whether the  proposed  trade-off  between
articulatory  and  fluency  characteristics  should  be con-
sidered a part of the linguistic representation of a unit or
a  by-product  of the processes of production.  The distinc-
tion between representation and production may  have  impor-
tant  implications  for  children's  developing segmentation
hypotheses: If a unit's phonetic  features  are  imprecisely
encoded,  then  the  representation  of  that  unit  will be
affected in its role in recognising new tokens of  the  unit
or  functioning itself as the object of further segmentation
(Snow  [1986]).  In this case, the  representational  system
may exhibit a natural tendency to maintain phonetically pre-
cise units in memory, as the  phonetically  imprecise  units
atrophy in the absence of a functional recognitory role.

     In  summary,  this  framework  provides  a  theoretical
rationale  for  the  claim that longer linguistic units will
tend to be badly articulated and  shorter  linguistic  units
will  be  precisely articulated. By definition, overshooting
solutions to the  segmentation  problem  (formulaic  expres-
sions)  will be classified as longer linguistic units whilst
undershooting solutions (PRFs produced as productive  units)
will  be  classified  as  shorter linguistic units. Hence, a
hierarchy of articulatory precision can be predicted for the
three  logically  permitted  solutions  to  the segmentation
problem: Phonologically reduced  forms will tend to be  well
articulated, target lexemes less well articulated and formu-
laic expressions the least well articulated. This  hierarchy
is  depicted  in  Figure  1, which also shows the postulated
inverse relationship between precision of  articulation  and
degree of fluency.

     Two factors interfere with  these  predictions.  First,
target  lexemes  vary in length and hence their articulation
will vary (all else  being  equal)  in  phonetic  precision.
Thus,  there  will be some overlap in articulatory precision
of long target lexemes and formulaic expressions on the  one
hand,  and  short target lexemes and PRFs on the other hand.
Within the category of units  that  map  onto  adult  words,
short words should be easiest to identify given their poten-
tial accuracy of  articulation.   Second,  practice  on  any
given  sound sequence will tend to influence the equilibrium
between fluency and articulatory precision  such  that  more
fluent  renditions  of a sound sequence are achieved without
loss of articulatory precision. Thus, formulaic  expressions
(and long target lexemes) may increase in articulatory accu-
racy with practice.

     In this paper,  a  complete  re-analysis  of  the  data
described  in   Plunkett  [1986]  is presented in an attempt
to evaluate the degree to which articulatory/fluency charac-
teristics  of  children's  speech furnish an adequate set of
criteria for identifying children's solutions to the segmen-
tation  problem.  In the next section, an operationalisation
of  articulatory/fluency  criteria  is  presented  and  then
applied to an analysis of the two Danish children's linguis-
tic  productions.  A  profile  of  vocabulary   development,
including  formulaic  expressions, target lexemes and phono-
logically reduced forms, together with  MLU  assessments  is
provided.  The  results  of  this analysis are then compared
with     a     re-analysis     based     exclusively      on
distributional/frequency  criteria.  These analyses are used
to evaluate the view that a source of  individual  variation
in  children is the range of solutions they uncover in their
attempts to solve the segmentation  problem  and  that  this
variation  interacts  with  the self-organising character of
the phonetic system to produce individual variation  in  the
articulatory precision of children's utterances.

                       2. Methodology

     Two Danish children,  a  boy  and  a  girl,  and  their
parents   have  participated  in  a  longitudinal  study  of
linguistic and cognitive  development.  The  parents  volun-
teered  their families for the investigation. In one family,
the girl, Anne, has an  elder  sister  (by  two  years)  and
parents  who  have  both completed university educations. In
the second family, the boy, Jens, is an only child. His
mother  was  beginning a university education and his father
is a skilled labourer.

     The families were visited in their homes on  a  regular
basis  (approximately  every  10  days)  and 60 to 90 minute
audio and video recordings made of the children in  interac-
tion with their parents (most often the mother) in a variety
of situations. All sessions included a free play  situation,
a  testing  situation and an eating situation. On some occa-
sions, bathing and kitchen situations  were  also  recorded.
Testing situations involved administering the Uzgiris & Hunt
[1975] infancy assessment scales. Two investigators were
present  at each visit; one investigator carried out testing
procedures and took notes; the second  investigator  managed
the  recording  equipment. The parents were also interviewed
about their child's development since  the  previous  visit.
Specifically,  they  were  questioned  about  the children's
motor development, use of new and old words  and  any  other
noteworthy  events  in  the  family.  Recordings of the boy,
Jens, began when he was 11 months old.   Recordings  of  the
girl, Anne, began when she was 8 months old.

     Complete transcriptions of the children's and parents'
speech  and  any  investigator  speech  to the children were
coded in a computerised database. Transcriptions  were  made
primarily from the video recordings though the audio record-
ings were used on those occasions where improved sound qual-
ity  might  aid  analysis.  Non-verbal activities that might
help in the interpretation of the speech were also coded  in
the transcriptions. The transcription format is based on the
Chat scheme taken from the  Childes  initiative  (MacWhinney
[1990]; MacWhinney & Snow [1985]; MacWhinney & Snow [1990]).
Utterances were identified in accordance  with  intonational
and  pause  criteria (Snow  [1972]).  Interjudge reliability
checks on identification of utterance boundaries across  20%
of  the  database  for two independent transcribers achieved
agreement scores of 90%.

     Utterances are identified as containing target lexemes,
phonologically reduced forms (PRFs) or formulaic expressions.
Utterances may consist of just one of these expression
types or any combination thereof. A target lexeme is
defined as a unit of speech which can  be  recognised  as  a
token of an entire word or morpheme belonging to the adult
lexicon.  Recognition  criteria  are  interpreted  liberally
insofar  as  exact  replication  of  the  adult  form is not
demanded.  Thus, a word may be produced with an  inappropri-
ate  vowel  or  consonant  and yet still be transcribed as a
target lexeme.  Recognition is  thus  determined  by  global
characteristics  of  the  sound  segment and, inevitably, by
appropriateness of the conditions of usage. Inaccuracies  in
the  pronunciation  of  a  target  lexeme  are recorded on a
separate coding tier, as permitted by the Chat coding format.

     PRFs are defined as segments of speech which can neither
be identified with nor contain target lexemes. They are assumed
to derive from  undershooting solutions to the  segmentation
problem.  PRFs are judged by the transcriber to be used in a
meaningful, communicative fashion. For  example,  the  child
may  point  at  an  object and simultaneously articulate the
vowel /e/ --- a sound which does not correspond to a  Danish
word.  Furthermore, PRFs must achieve a certain frequency of
usage to be included in the analysis. It is stipulated  that
an  expression  must  have been identified on at least three
occasions (within a previous session or the current session)
to  qualify.   PRF   expressions are coded in the Chat tran-
scription format using the ``special learner form markers''.
Interjudge reliability measures have been calculated for the
identification of target lexemes and PRFs across 20% of  the
database for two independent transcribers. Agreement
scores of 92% and 88% for target lexemes and PRFs,
respectively, were achieved.
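
     The frequency stipulation for PRFs can be sketched in a few
lines of Python. The sketch is illustrative only and is not the
coding procedure used in the study: the function name
`qualifying_prfs`, the data layout (a chronological list of
sessions, each listing the candidate PRF forms noted by the
transcriber) and the example forms are hypothetical. It also
assumes one possible reading of the criterion, namely that
occasions of use accumulate over previous sessions and the
current session.

```python
from collections import Counter

MIN_OCCASIONS = 3  # threshold stated in the text


def qualifying_prfs(sessions):
    """Return, for each session, the set of candidate PRFs that qualify.

    `sessions` is a chronologically ordered list. Each session is a list of
    candidate PRF forms (strings), one entry per occasion of use.
    Assumption: occasions are counted cumulatively over all sessions up to
    and including the current one.
    """
    seen = Counter()
    result = []
    for session in sessions:
        seen.update(session)  # add this session's occasions to the tally
        forms_in_session = set(session)
        result.append({f for f in forms_in_session if seen[f] >= MIN_OCCASIONS})
    return result


if __name__ == "__main__":
    # Hypothetical forms: "e" is used once in each of three sessions.
    sessions = [["e", "da"], ["e"], ["e", "da", "mm"]]
    print(qualifying_prfs(sessions))  # [set(), set(), {'e'}]
```

Under this reading, a form such as /e/ enters the analysis from
the session in which its third recorded use occurs.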

     Formulaic expressions are identified in terms of  their
content  and fluency of articulation. A formulaic expression
must contain at least one target lexeme plus some additional
phonetic  material  that   may  be identified as one or more
target lexemes and/or PRFs.  The distinction between produc-
tive  sequences of target lexemes and PRFs, on the one hand,
and formulaic amalgams of these segments, on the  other,  is
determined  in terms of precision of articulation. Sequences
which are produced fluently and imprecisely articulated  are
categorised  as  formulaic expressions.  The operationalisa-
tion of articulatory/fluency criteria is described below.

     It is important to distinguish between  sound  segments
which  are  incorrectly articulated and those which are pro-
duced in a fluent and imprecise fashion. As noted  above,  a
child may be attributed with the ability to produce a target
lexeme despite the fact  that  some  of  the  consonants  or
vowels  in  the actual articulation are incorrect.  However,
an incorrectly produced vowel or consonant may still be pre-
cisely articulated. In contrast, sound segments in an actual
production may appear to contain the same  phonemes  as  the
adult  target lexeme, and yet be articulated in a fluent and
imprecise manner. In general, a fluent and imprecise articu-
lation will contribute to the difficulty in identifying con-
stituent phonemes, and consequently, the boundaries  between
phonemes.  On the other hand, the tendency for fluently pro-
duced utterances to carry supra-segmental  properties  (like
intonation)  may assist in the identification of constituent

     The identification of formulaic  expressions  has  been
operationalised  in the following manner: For each child and
each session, a set of 10 utterances is selected by a native
speaker  of  Danish  with  considerable  experience in child
language transcription as being articulated in a  character-
istically  precise fashion by the child.  A second set of 10
utterances is also selected as being produced in a
characteristically imprecise, ``mush-mouthed'' fashion. Both sets
of utterances contain single unit. [6] Two
independent transcribers (both experienced) are informed
that the 20 utterances from each session contain two clearly
distinct types of child utterance and are asked to  categor-
ise  the utterances along the dimension of articulatory pre-
cision versus mush-mouthedness.   The  transcribers  compare
their   judgements   until  they  reach  full  agreement  on
categorising this limited set of utterances for  each  child
and  each  session, into the two correct sets. The two tran-
scribers are then asked to make ``forced-choice'' judgements
into the two categories of articulatory precision on 20% of
the child utterances for each session selected randomly from
the database. Interjudge reliability scores are then computed.
These measures indicate an 85% level of agreement
between  the  two  transcribers.  The remaining material for
each child and each session is then coded by  an  individual
(trained) transcriber.
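
     The text does not state how the interjudge reliability
scores are calculated. A minimal sketch, assuming that they are
simple percentage agreement (the proportion of utterances
assigned to the same category by both transcribers), is given
below. The function name `percent_agreement`, the category
labels and the example judgements are hypothetical.

```python
def percent_agreement(judge_a, judge_b):
    """Percentage of items given the same category by two judges.

    Assumption: reliability is plain percentage agreement, with no
    correction for chance agreement.
    """
    if len(judge_a) != len(judge_b):
        raise ValueError("both judges must rate the same items")
    if not judge_a:
        raise ValueError("no items to compare")
    matches = sum(a == b for a, b in zip(judge_a, judge_b))
    return 100.0 * matches / len(judge_a)


if __name__ == "__main__":
    # Invented forced-choice judgements on 20 utterances.
    judge_a = ["precise"] * 10 + ["imprecise"] * 10
    judge_b = ["precise"] * 7 + ["imprecise"] * 13
    print(percent_agreement(judge_a, judge_b))  # 85.0
```

A chance-corrected statistic could be substituted for the ratio
without changing the rest of the procedure.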

     In summary, target lexemes are  those  sound  sequences
which  can  be identified with adult lexical items. They may
occur in isolation (as single-word utterances) or in  combi-
nation with other linguistic units.  If target lexemes occur
alone, or in combination with  other  linguistic  units  and
they are precisely articulated, then they are attributed the
status of productive lexical items. Otherwise,  target  lex-
emes  are  categorised as occurring in formulaic combination
with additional  phonetic  material.   Similarly,  PRFs  may
occur  in  isolation or in combination with other linguistic
units. In  both  cases  they  are  assumed  to  derive  from
undershooting  solutions  to the segmentation problem.  PRFs
which occur in  isolation,  or  in  combination  with  other
linguistic units and are precisely articulated, are categor-
ised as having a productive status. PRFs which  are  articu-
lated   imprecisely   together  with  additional  linguistic
material are categorised as being in  formulaic  combination
with  those  sound  segments. Finally, formulaic expressions
are those sound sequences which have been identified as con-
taining  at least one target lexeme plus some other phonetic
material (either other target lexemes or  PRFs)  and   which
have  been  categorised  under  a  forced choice decision as
being imprecisely articulated. [7] Formulaic expressions are
coded in CHAT format using the notation for ``compound''
expressions. This format maintains a  coding  for  potential
boundaries    between    the   constituents   of   formulaic
expressions, and hence permits the analysis of the  database
in   terms   of   both   articulatory/fluency  criteria  and
distributional/frequency criteria. [8]
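
     The decision rule summarised above can be sketched as
follows. This is an illustration under stated assumptions, not
the coding software used in the study: the names Unit,
is_formulaic_expression and unit_statuses are hypothetical, the
example forms are invented, and the flag `precise` stands for
the transcriber's forced-choice judgement.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Unit:
    form: str  # transcribed sound sequence
    kind: str  # "lexeme" (target lexeme) or "prf"


def is_formulaic_expression(units: List[Unit], precise: bool) -> bool:
    """At least one target lexeme plus other material, imprecisely articulated."""
    has_lexeme = any(u.kind == "lexeme" for u in units)
    return len(units) >= 2 and has_lexeme and not precise


def unit_statuses(units: List[Unit], precise: bool) -> List[Tuple[str, str]]:
    """Pair each unit's form with 'productive' or 'formulaic'."""
    # Assumptions: a unit in isolation counts as productive whatever its
    # articulation, and an imprecise combination without any target lexeme
    # (a case the summary does not cover) also defaults to productive.
    formulaic = is_formulaic_expression(units, precise)
    status = "formulaic" if formulaic else "productive"
    return [(u.form, status) for u in units]


# Hypothetical utterances.
alone = [Unit("bil", "lexeme")]
combo = [Unit("de", "prf"), Unit("bil", "lexeme")]
```

     On this reading, the same combination of units receives a
different status depending only on the articulatory judgement.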

                        3. Analysis

     Measures of mean length of utterance (MLU) are computed
for each child on each session. MLU is computed under two
conditions for identifying the child's representation of
linguistic units:

  1. Articulatory/fluency criteria are used to identify pro-
ductive linguistic units.

  2.  Distributional/frequency  criteria,  as  suggested  by
Brown  [1973],  are  used  to identify productive linguistic
units.  Both target lexemes and  PRFs are  accorded  produc-
tive  status  only  if they occur in at least three distinct
linguistic contexts, including single unit utterance
contexts.

Formulaic expressions are treated as  single  units  in  the
calculation of MLU irrespective of the method used for
identifying formulaic expressions. In each condition, MLU is
calculated twice: once with PRFs included in the measure and
once with PRFs excluded. MLU calculations include all
utterances produced by the child on any given session.
Tabulations of the
MLF (mean length of formulaic expressions) as identified  by
articulatory/fluency  criteria  and use of formulaic expres-
sions, target lexemes and PRFs  as  a  proportion  of  total
vocabulary  are  also  calculated. Finally, the phonological
overlap between PRFs in formulaic expressions and  PRFs used
in productive combinations or used in single unit utterances
is examined.
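
     The two calculations described in this section can be
sketched as follows. The sketch rests on several assumptions
that are not in the text: an utterance is a list of (form, kind)
pairs with kind "lexeme", "prf" or "formula"; a ``linguistic
context'' is operationalised as the utterance with the unit
replaced by a slot; and utterances left empty once PRFs are
discounted are dropped from the mean. All names and forms are
hypothetical.

```python
from collections import defaultdict


def productive_forms(utterances, min_contexts=3):
    """Forms attested in at least `min_contexts` distinct contexts."""
    contexts = defaultdict(set)
    for utt in utterances:
        forms = [form for form, _ in utt]
        for i, form in enumerate(forms):
            # A context is the utterance with this unit replaced by a slot;
            # a single unit utterance yields the context ("_",).
            frame = tuple(forms[:i]) + ("_",) + tuple(forms[i + 1:])
            contexts[form].add(frame)
    return {f for f, frames in contexts.items() if len(frames) >= min_contexts}


def mlu(utterances, include_prfs=True):
    """Mean number of units per utterance; a formula counts as one unit."""
    lengths = []
    for utt in utterances:
        n = sum(1 for _, kind in utt if include_prfs or kind != "prf")
        if n > 0:  # assumption: utterances left empty are dropped
            lengths.append(n)
    return sum(lengths) / len(lengths) if lengths else 0.0


# Hypothetical session of five utterances.
session = [
    [("bil", "lexeme")],
    [("de", "prf"), ("bil", "lexeme")],
    [("bil", "lexeme"), ("der", "lexeme")],
    [("hvaerdet", "formula")],
    [("de", "prf")],
]
```

     In this invented session only "bil" reaches three distinct
contexts, and discounting PRFs lowers the mean because the
PRF-initial combination shrinks to a single unit.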

                 4. Results and Discussion

     The following analysis is based on 6776 and 4229 utter-
ances  by  the  children  Anne  and Jens, respectively. This
yields an average of 322 utterances per session for Anne and
201  utterances  per session for Jens. The minimum number of
utterances observed in a session was 76  for  both  children
(session  5  for Anne and session 9 for Jens). There was one
other session where total observed utterances dropped  below
100 (session 11 for Jens).

     The analyses summarised in Figures 2 and 3 compare
Anne's and Jens' MLU when calculated according to
articulatory/fluency criteria  and  distributional/frequency
criteria.   MLU  measures are reported as a means of assess-
ing the emergence and development  of  productive,  combina-
torial  language.   In  addition,  these analyses provide an
evaluation of the role that PRFs play in single unit  utter-
ances  and  combinatorial  expressions.   Normally, PRFs are
excluded from MLU calculations (Miller  &  Chapman  [1981]).
However, the focus here is upon the child's solutions to the
segmentation problem and the role that  different  types  of
segmentation play in the child's productive language.
Therefore, Figures 2 and 3 include comparisons of MLU with
PRFs included in and excluded from the analysis.

     Figure 2(a)  plots  Anne's  MLU   when  measured  using
distributional/frequency  criteria  to  identify  productive
linguistic units.  When PRFs are excluded from the analysis,
Anne starts producing productive combinations of target lex-
emes from 15 months onwards. Significant gains are  made  in
her   MLU  score between 21 and 24 months.  The inclusion of
PRFs in the analysis tends to elevate  MLU  scores  particu-
larly  for  the  first half of the study reported here. This
finding indicates that Anne tends to use PRFs in combination
with  target  lexemes  early  in  acquisition  but  that  an
increasing   proportion   of   combinations   are   composed
exclusively of target lexemes as development proceeds.

     Figure 2(b) plots Anne's MLU  when  measured  according
to  articulatory/fluency  criteria.  On  this analysis, Anne
starts producing  productive  combinations  from  18  months
onwards.  The  inclusion  of PRFs shows a slight tendency to
elevate MLU scores during the  second  part  of  the  period
reported  here.   MLU   scores  during  the  first  half  of
development (up to 18 months) are minimally affected by  the
inclusion or exclusion of PRFs in the analysis.

     The profile of MLU development for  Anne,  when  calcu-
lated  according  to  articulatory/fluency criteria, matches
that reported for many  other  children  (Miller  &  Chapman
[1981]), i.e.,   MLU remains at a minimum level (1.0) during
early acquisition with  the  first  productive  combinations
emerging  during  the  second half of the second year.  This
development is also often  associated  with  a  ``vocabulary
spurt''.  In contrast, the distributional/frequency criteria
depict Anne as a somewhat precocious combinatorial  language
user,  productively combining target lexemes with each other
and PRFs with target lexemes well before 18 months  of  age.
For      the      later      periods     of     development,
distributional/frequency and  articulatory/fluency  criteria
reveal similar MLU profiles.

     Figure 3(a) plots Jens' MLU  when measured according to
distributional/frequency  criteria.   When PRFs are included
in the analysis, high levels of MLU are observed throughout.
Furthermore,  substantial  swings  in MLU  are observed.  In
particular, a tendency for MLU  to decrease over 4  consecu-
tive   sessions  (from  13  months  through  16  months)  is
apparent. This result replicates the findings  reported  for
Jens'  MLU   profile in  Plunkett  [1986].  Eliminating PRFs
from the MLU analysis on the distributional/frequency condi-
tion  dramatically  alters  the  profile of development. MLU
scores are reduced throughout, though again, especially dur-
ing  the  earlier  stages of acquisition. In particular, the
decrement in MLU reported in the  original  study  is  obli-
terated,  indicating  that  this  earlier  finding  was  due
entirely to including PRFs in  the  MLU  calculations.  When
PRFs  are  eliminated  from the analysis, Jens' MLU score is
seen to make substantial gains between  21  and  24  months.
However,  MLU   for the first half of the study remains high
even when PRFs are excluded from the analysis (compare  Fig-
ure  2(a) with Figure 3(a)), indicating that Jens is produc-
ing combinations of target lexemes from the beginning of the

     Figure 3(b) plots Jens' MLU  when measured according to
articulatory/fluency  criteria. On this analysis, productive
combinations do not emerge until  around  21  months.   From
this point in development onwards, MLU  scores make substan-
tial gains. The inclusion of PRFs in the analysis has little
effect  on  this  profile  of development. During the latter
part of development (from 21 months), the  MLU   scores  for
both  the  distributional/frequency and articulatory/fluency
conditions are remarkably similar.  Furthermore,  the   pat-
terns  of  MLU  development for Anne and Jens, when measured
using articulatory/fluency criteria, are quite similar.  The
primary  difference  between  the  two  children is that the
onset of Jens' productive combinations  is  later  and  more

     The results of these analyses support two main  conclu-
sions.   First, the application of articulatory/fluency cri-
teria to the identification of linguistic units results in a
less precocious profile of the onset of productive, combina-
torial speech for these two children than the  profile  that
results  from  the  application  of distributional/frequency
criteria.  Second, articulatory/fluency  criteria  are  less
sensitive  to  the inclusion of PRFs in the analysis of pro-
ductive, combinatorial speech. This finding indicates that a
greater  proportion  of PRFs are categorised as belonging to
formulaic expressions when articulatory/fluency criteria are
applied  than  when  distributional/frequency  criteria  are
applied. The role of PRFs in formulaic expressions  will  be
evaluated in more detail later in this section.

     In general,  the  profile  of  productive  combinations
revealed by articulatory/fluency criteria provides a picture
of development for these two  children  which  matches  that
reported in the literature for many other children. Further-
more, the application of articulatory/fluency criteria elim-
inates  some  apparent inconsistencies (such as the apparent
regression in  MLU for Jens) that can result from the appli-
cation  of distributional/frequency criteria. However, these
findings  do  not  warrant  the  further   conclusion   that
articulatory/fluency  criteria  are   necessarily a superior
method for identifying linguistic units.  In  fact,  in  the
absence  of an independent measure of productivity (which we
lack), it is difficult to reach any final conclusion  as  to
which set of criteria is most appropriate. Furthermore,
given the similarity in profiles of development revealed  by
both sets of criteria for the later parts of development, it
is unclear whether either of  the  methods  can  be  claimed
superior  to  the  other throughout development. At best, it
might be argued that the application of articulatory/fluency
criteria  yields  a convergent set of interpretations of the
nature of a child's  language  productions;  interpretations
that    do    not    result    from   the   application   of
distributional/frequency criteria. In the remainder of this
section, analyses of the two Danish children's use of
formulaic expressions, target lexemes and PRFs as identified
by articulatory/fluency criteria are presented, in an attempt
to provide such a coherent account of their development.

     Figure 4(a) provides a breakdown of the  proportion  of
formulaic expressions, target lexemes and PRFs used by Jens.
The proportions (percentage of total vocabulary)  are  based
on non-cumulative measures of types used on each session. It
is apparent that a major proportion of Jens' early
``vocabulary items'' consists of formulaic expressions.
There is a
period of development (13 through 16 months)  in  which  the
proportion  of formulaic expressions decreases over consecu-
tive sessions. This period corresponds to that in which  MLU
decreases for Jens when measured by distributional/frequency
criteria (see Figure 3(a)).   Plunkett   [1986]   speculated
that this temporary decrease in MLU  might manifest a switch
by Jens from a holistic learning strategy to a more analytic
strategy. It is noteworthy from the current analysis that at
the same time as formulaic expressions  decrease  in  usage,
PRFs  (not incorporated in formulaic expressions) undergo an
increase in usage by Jens. In terms of the current theoreti-
cal  framework  (see  section  1), productive PRFs reflect a
tendency to segment the input into small units whilst formu-
laic  expressions  reflect  segmentation into larger chunks.
The decrease in  proportion  of  formulaic  expressions  and
increase  in  proportion of PRFs may result from a more gen-
eral switch of segmentation  strategy  by  Jens  to  smaller
units  of  phonetic  material.  The  cause of this switch is
unclear. Nevertheless, this interpretation is  supported  by
further developments in the structure of Jens' vocabulary: A
subsequent increase in formulaic expressions is  accompanied
by  a  decrease in the use of  PRFs.  Throughout this period
of  segmentation switching, the proportion of target lexemes
in  Jens'  vocabulary remains relatively stable. Target lex-
emes represent an intermediate level of segmentation of  the
speech signal --- see Figure 1.
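
     The vocabulary proportions discussed above (percentage of
total vocabulary, based on non-cumulative measures of types on
each session) can be sketched as follows. The function name, the
category labels and the forms are hypothetical. Tokens are
assumed to be (form, category) pairs that have already been
coded.

```python
from collections import Counter


def vocabulary_profile(tokens):
    """Percentage of one session's vocabulary types in each category."""
    types = set(tokens)  # collapse tokens to types, this session only
    counts = Counter(category for _, category in types)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {category: 100.0 * n / total for category, n in counts.items()}


# Hypothetical coded tokens from a single session.
session_tokens = [
    ("bil", "lexeme"), ("bil", "lexeme"), ("mor", "lexeme"),
    ("de", "prf"),
    ("hvaerdet", "formula"), ("seher", "formula"), ("seher", "formula"),
]
profile = vocabulary_profile(session_tokens)
```

     Because each session is profiled on its own, a form that
drops out of use lowers its category's share on later sessions,
which is what allows decreases in usage to show up.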

     Target lexemes constitute a relatively small proportion
of  Jens' vocabulary during the early stages of acquisition.
This proportion gradually increases until 22 months at which
point they constitute a majority of the items in Jens' voca-
bulary. Beyond this point, the proportion of target  lexemes
increases  dramatically  (Jens'  vocabulary  spurt)  and the
usage of formulaic expressions and PRFs atrophies,
indicating that Jens has discovered appropriate adult-like
segmentations of the input stream. The vocabulary spurt
coincides with substantial gains in Jens' MLU, as measured
by articulatory/fluency criteria.

     Figure 4(b) plots an equivalent vocabulary analysis for
Anne.  In  contrast to Jens, Anne uses relatively few formu-
laic expressions. On the other hand, her proportional  usage
of PRFs is high during early development. As with Jens,
increases in usage of formulaic expressions covary with
decreases in usage of PRFs. Thus, around 16 months
formulaic expressions increase in usage whilst PRFs decrease.
Usage of target lexemes tends to remain within a fairly con-
fined range during the first half of the second year.   This
analysis  suggests  that Anne tends to focus on shorter seg-
ments in the input stream during early  development,  though
she  does  begin to explore larger chunks towards the middle
of her second year. Segmentation switching is thus observed
in Anne as well as Jens, though in contrasting directions.
By 20 months, however, target lexemes constitute a majority
(over 50%) of Anne's vocabulary items.
Beyond this point of development, the proportion  of  target
lexemes increases dramatically (Anne's vocabulary spurt) and
the usage of formulaic expressions and PRFs atrophies, indi-
cating  that  Anne too has discovered appropriate adult-like
segmentations of the input stream. As with Jens, this
vocabulary spurt coincides with substantial gains in Anne's
MLU, as measured by articulatory/fluency criteria.

     Several conclusions are warranted from the findings for
the structure and profile of development in Anne's and Jens'
vocabularies. First, the earlier work (Plunkett  [1986])  is
corroborated   in  characterising  Jens  as  a  holistically
oriented language user and Anne as an analytically  oriented
language  user insofar as Jens shows a preference for longer
formulaic expressions in early development while Anne  shows
a  preference  for shorter PRFs.  It is noteworthy, however,
that both children seem to explore alternative  segmentation
strategies before they undergo a vocabulary spurt. Segmenta-
tion switching may represent an attempt by these children to
calibrate  their  hypotheses as to what constitutes a target
lexeme in the adult language.   Second,  the  onset  of  the
vocabulary spurt for both of these children corresponds less
to the absolute number of target lexemes  in  their  vocabu-
laries [9] than to the proportion of target lexemes in their
vocabularies. Thus, we observe that when a majority of  pos-
tulated linguistic units match vocabulary items in the adult
lexicon,  then   vocabulary   development   experiences   an
accelerated  growth. The finding that usage of PRFs and for-
mulaic expressions atrophies beyond this point suggests that
these two children have discovered a ``key'' to the solution
of the segmentation problem. Although these results  do  not
tell us what this ``key'' is or the nature of the mechanisms
that might lead to its discovery, these results suggest that
some  critical  proportional mass of target lexemes may be a
prerequisite for accelerated vocabulary growth and that  the
solution  to  the segmentation problem may play an important
role in triggering the vocabulary spurt. The timing  of  the
vocabulary spurt corresponds very closely to the achievement
of a high proportion of target  lexemes  in  the  children's
productive vocabularies.

     A fundamental assumption of this work is that the arti-
culatory  precision  associated  with  a  linguistic unit is
closely related to its length. Thus, one of the  identifying
features of a formulaic expression is that it is imprecisely
articulated.  However,  the     combination  of   linguistic
units,  such as target lexemes, also leads to longer expres-
sions that may themselves suffer imprecise articulation  due
to  limited processing resources. Therefore, one interpreta-
tion of these findings might be that  formulaic  expressions
as identified by articulatory/fluency criteria are, in fact,
productive combinations which are  imprecisely  articulated.
In  order to evaluate this interpretation, the next analysis
highlights the formulaic expressions used by these two chil-
dren by calculating the mean length of formulaic expressions
as measured in terms of average number of target lexemes and
PRFs. These results are compared with the children's overall
MLU  measures (see Figures 2 and 3).

     Figure 5(a)  provides  an  analysis  of  the  formulaic
expressions     used    by    Jens    as    identified    by
articulatory/fluency criteria.  The mean length of formulaic
expressions  (MLF) is determined both under those conditions
in which PRFs are included and excluded  from  the  calcula-
tions.  When  PRFs  are  excluded  from  the  analysis,  MLF
reflects the average number of target lexemes used  by  Jens
in  formulaic  expressions.   The  PRFs included in formulas
have the character of ``fillers''  (Peters   [1983]),  since
they  are by definition used in combination with target lex-
emes and are imprecisely and fluently articulated.
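
     The MLF measure described above can be sketched as follows.
The sketch assumes that each formulaic expression has already
been split at its coded constituent boundaries into a list of
kinds ("lexeme" or "prf"). The function name and the sample data
are hypothetical.

```python
def mean_formula_length(formulas, include_prfs=True):
    """Mean number of constituents per formulaic expression."""
    if not formulas:
        return None  # no formulaic expressions on this session
    lengths = [
        sum(1 for kind in formula if include_prfs or kind == "lexeme")
        for formula in formulas
    ]
    return sum(lengths) / len(lengths)


# Four hypothetical formulaic expressions from one session.
formulas = [
    ["prf", "lexeme"],
    ["lexeme", "prf"],
    ["prf", "lexeme", "lexeme"],
    ["lexeme", "lexeme"],
]
mlf_with_prfs = mean_formula_length(formulas)
mlf_lexemes_only = mean_formula_length(formulas, include_prfs=False)
```

     With PRFs excluded, the value is the average number of
target lexemes per formula, which is the figure compared with
overall MLU in the text.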

     During the period that Jens uses formulaic  expressions
(up  to  but  not  including the final session), they show a
general tendency to increase in length, as defined in  terms
of average number of target lexemes. However, during the
early stages of acquisition, the majority of Jens'  formulas
contain  just  one  target lexeme plus an expression-initial
and/or expression-final PRF. Towards the end of  the  period
reported here, PRFs come to play less of a role in formulaic
expressions, which are made up primarily of target  lexemes.
Substantial  increases  in  the  number of target lexemes in
formulaic expressions occur at the same time as Jens  under-
goes a vocabulary spurt.

     Most  importantly,  the  average  length  of  formulaic
expressions (as measured in target lexemes) does not greatly
exceed the overall MLU for Jens during the period in
development  when  he  begins to produce productive combina-
tions    (after    22    months)    as     determined     by
articulatory/fluency criteria (see Figure 3(b)). Thus, Jens'
productive  combinations  are  precisely  articulated   even
though  they  approach  the  same  length  as  his formulaic
expressions which are poorly articulated.

     An analysis of formulaic expressions for Anne  is  pro-
vided in Figure 5(b). The pattern of development for Anne is
quite similar to that  of  Jens.  The  length  of  formulaic
expressions,   defined  only  in  terms  of  target  lexemes
increases with development.  PRFs  contribute  diminishingly
as  development  proceeds.  For  a  few  sessions  in  early
acquisition, Anne produces formulaic expressions which  con-
sist  exclusively of a single target lexeme combined with an
expression-initial  PRF.   PRFs  occupy   expression-initial
position   when   they   occur  in  a  formulaic  expression
throughout the period reported here.  Substantial  increases
in  the  number  of  target lexemes in formulaic expressions
occur at the same time as Anne undergoes a vocabulary spurt.

     As with Jens, the average length of Anne's formulaic
expressions does not greatly exceed her overall MLU (as
shown in Figure 2(b)) during the period when she begins to
produce productive combinations (after 19 months). In other
words, Anne's
productive  combinations  are  precisely  articulated   even
though  they  approach  the  same  length  as  her formulaic
expressions.  Taken together, these findings  indicate  that
the  trade-off  between  articulatory  precision and fluency
operates primarily at  the  level  of  the  linguistic  unit
rather than over the span of whole utterances.

     Although  PRFs are derived from   undershooting   solu-
tions  to  the  segmentation  problem, the previous analysis
demonstrates that PRFs are  involved  in  formulaic  expres-
sions,  particularly during the early stages of acquisition.
This result can be  viewed  as  a  natural  outcome  of  the
children's  attempts to segment the speech signal. For exam-
ple, an hypothesised unit may overshoot a single target lex-
eme  but  fail  to  encompass  a  second  target lexeme. The
resulting unit will contain a single target  lexeme  plus  a
segment  of  the second target lexeme, i.e., a PRF. However,
this raises the question as to whether  the  PRFs  found  in
formulaic expressions are of the same type as  PRFs found in
single unit utterances or  in  productive  combination  with
target lexemes.

     Suppose  that  formulaic  expressions   are   typically
anchored  around  stressed  parts  of the input signal. [10]
Given the tendency for speech to oscillate between  stressed
and  unstressed  segments, we might expect some overshooting
solutions to  the  segmentation  problem  to  incorporate  a
salient  stressed item plus a less salient, unstressed item.
Under these  conditions  of  segmentation,  PRFs  may  often
derive  from  unstressed segments.  Grammatical functors are
frequently unstressed segments in the input stream.   Hence,
we  might  expect  a tendency for  PRFs in formulaic expres-
sions to derive from grammatical functors. In contrast, PRFs
that  occur in single unit utterances or productive combina-
tions are likely to derive from salient (for the child) seg-
ments of the input stream. For example, PRFs may derive from
stressed segments. Open class  lexical  items  are  frequent
candidates  for  stress  in  adult Danish. Thus, it might be
expected that  productive  PRFs  are  often  based  on  open
class  lexical items. A corollary of this prediction is that
the class of productive PRFs is likely to be  more  numerous
than the class of formulaic PRFs, since grammatical functors
constitute a relatively small closed class  of  phonological

     This prediction is tested in an analysis of  the  range
of  distinct  types  of PRFs used by the two Danish children
over the course of development.  Figure 6 summarises the use
of  PRFs  by  Jens  and  Anne  in  terms  of three different
categories. First, tokens of productive   PRFs  are  identi-
fied. These consist of PRFs that occur exclusively in single
unit utterances or productive combinations.  Second,  tokens
of formulaic PRFs are identified. These consist of PRFs that
occur exclusively in formulaic expressions.  Finally, tokens
of  shared   PRFs  are  identified as those PRFs which occur
both in formulaic expressions and in single unit  utterances
or productive combinations.
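
     The three-way categorisation above amounts to a pair of set
operations over PRF types. The following sketch assumes that the
PRF forms found in productive use (single unit utterances or
productive combinations) and those found inside formulaic
expressions have already been collected. The function names and
the forms are hypothetical.

```python
def prf_categories(productive_uses, formulaic_uses):
    """Split PRF types into productive-only, formulaic-only and shared."""
    productive_uses = set(productive_uses)
    formulaic_uses = set(formulaic_uses)
    shared = productive_uses & formulaic_uses
    return {
        "productive": productive_uses - shared,
        "formulaic": formulaic_uses - shared,
        "shared": shared,
    }


def prf_proportions(categories):
    """Each category as a percentage of all PRF types on the session."""
    total = sum(len(forms) for forms in categories.values())
    if total == 0:
        return {}
    return {name: 100.0 * len(forms) / total
            for name, forms in categories.items()}


# Hypothetical PRF forms from one session.
categories = prf_categories(["bi", "ka", "de"], ["de", "e"])
proportions = prf_proportions(categories)
```

     A small shared class on such a tabulation means that few
forms occur both inside and outside formulaic expressions.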

     Figure 6(a) provides a breakdown of the  proportion  of
productive   PRFs,  formulaic  PRFs  and shared PRFs used by
Jens.  The proportions (percentage of total  PRF usage)  are
based  on  non-cumulative  measures  of  phonological  types
used on each session. The analysis reveals  a  fairly  clear
pattern  of usage of PRFs throughout development. Productive
PRFs constitute the largest class for the  greater  part  of
the  developmental period reported here. Shared PRFs consti-
tute a relatively small proportion of total PRFs  throughout
development. Formulaic PRFs exhibit a more variable develop-
mental profile. They tend to constitute a  relatively  small
class  though  there is a temporary increase in usage during
the middle of the second year. This  period  corresponds  to
the  period in which the proportion of formulaic expressions
in Jens' vocabulary increases (see Figure 4(a)).

     Figure 6(b) provides a breakdown of the  proportion  of
productive  PRFs,  formulaic  PRFs  and  shared PRFs used by
Anne. The distribution of  PRFs over  the  three  categories
reveals  that  the  majority of  PRF  forms used by Anne are
found in productive expressions.  Both formulaic and  shared
PRFs  are  rare in Anne's vocabulary. The increase in formu-
laic and shared PRFs  observed  around  the  middle  of  the
second  year  corresponds  to an increase in Anne's usage of
formulaic expressions (see Figure 4(b)).

     These results confirm the prediction that the phonolog-
ical  form  of  PRFs  will differ according to their source.
Thus, shared  PRFs constitute a small class for  both  chil-
dren.  The  results also confirm the prediction that produc-
tive PRFs constitute a larger class  of  phonological  types
than  formulaic  PRFs.  The findings are consistent with the
view that productive  PRFs are initially derived  from  open
class  lexical  items  and that formulaic PRFs are initially
derived  from  grammatical  functors.   Furthermore,   these
results  support the earlier characterisations of individual
differences between the children. Insofar as Jens adopts  an
holistic  (overshooting)  segmentation  strategy,  we  might
expect to observe a larger range  of  formulaic  expressions
and  hence a larger range of formulaic PRFs. Insofar as Anne
adopts an analytic  (undershooting)  segmentation  strategy,
the  range of formulaic PRFs will be inhibited and the range
of productive PRFs enhanced.

                      5. Conclusions

     A principal methodological finding of this work is that
articulatory/fluency  criteria comprise a viable set of cod-
ing procedures for identifying the child's representation of
linguistic units. First, it has been shown that transcribers
can  reliably   agree   on   the   relative   precision   of
articulation/fluency  of  an  expression to categorise sound
segments as either PRFs, target lexemes or formulaic expres-
sions.  [11] Second, the application of articulatory/fluency
criteria in  this  fashion  yields  a  coherent  profile  of
development  for  these  children which does not emerge when
distributional/frequency criteria are applied.  In  particu-
lar,  measurements of  MLU  and vocabulary scores yield pro-
files for the two Danish children which match those reported
in the literature for many other children.

     The application of articulatory/fluency criteria to the
identification of linguistic units also permits the identif-
ication of the pattern of individual differences  that  dis-
tinguish  the  two  children  in  this  study. Thus, Jens is
observed to rely heavily on formulaic chunks  in  his  early
language productions whilst Anne uses shorter units, relying
heavily on PRFs and target lexemes in early productions.
This  pattern of individual differences is not apparent from
the  application   of   distributional/frequency   criteria.
Interestingly, articulatory/fluency criteria also seem to
assist in the identification of the onset of productive
inflectional morphology in these children. Usage of the
Danish plural morpheme /-er/ increased substantially
in both children shortly after their vocabulary spurts
(Plunkett  & Stromqvist [1990]). At this time,  articulatory
stress  on  the  inflection was exaggerated by the children,
both in relation to adult norms and their own  earlier  pro-
ductions.  A  similar  observation with respect to the emer-
gence of inflectional morphology has  been  made  by   Smoc-
zynska [1981] in a study of Polish children.

     It has been observed that the  profiles  of  linguistic
development  revealed by the two methods of analysis tend to
converge  towards  the  end  of  the  developmental   period
reported  here.  This  result  suggests that the two sets of
criteria are in  some  sense  equivalent  during  the  later
stages of development. It is noteworthy that convergent pro-
files begin to emerge after the children  have  passed  into
(or  through)  their  vocabulary  spurts. It has been argued
above that  an  important  triggering  factor  for  the  two
children's  vocabulary  spurts  may be their solution of the
segmentation problem. Both PRFs and formulaic expressions
atrophy  in  the  two  children  at  precisely  the point in
development that target lexemes undergo a rapid increase  in
number.  Furthermore,  both  children  appear to investigate
alternative segmentation strategies just before the  vocabu-
lary  spurt  occurs, suggesting that they may be calibrating
their hypotheses as to what constitutes an adult target lex-
eme.  The  solution  to  the  segmentation  problem may also
underlie the apparent convergence of  the  two  methods  for
identifying linguistic units. Once the children discover the
correct segmentation of formulaic  expressions,  then  these
longer phonetic chunks will be decomposed into separate lex-
ical items. [12] The ensuing shorter length of  these  units
will result in their being more precisely articulated by the
child and hence more easily identified as  productive  units
by the transcriber. Furthermore, the distinct lexical encod-
ing of these units will support  their  participation  in  a
range  of  productive  combinations  which will lead to them
being identified as productive units on the  application  of
distributional/frequency criteria.

     The timing of the vocabulary spurt may also be  related
to the predominant segmentation strategy adopted by children
in early development. Anne is a precocious  user  of  target
lexemes  as  compared  to Jens. Her identification of target
lexemes may be assisted  by  a  segmentation  strategy  that
focuses on shorter units of speech. It was argued above that
the lack of articulatory precision and the  fluency  associ-
ated  with  formulaic expressions may result in impoverished
representations in  memory  of  these  longer  units.  These
sparse   representational  characteristics  may  hinder  the
recognition of new tokens of linguistic units in the  speech
signal  or hinder the possibility of such units becoming the
objects of segmentation processes themselves. Thus, children
who  prefer  a more holistically oriented segmentation stra-
tegy may experience greater difficulty in identifying target
lexemes  than  children whose segmentation preferences focus
on shorter units which are represented in memory in  a  more
phonetically accurate fashion. Recent work on ``early
talkers'' and ``late talkers'' (Bates & Thal [1991]; Hampson &
Nelson     [In    Press])    identifies   a   tendency   for
analytic/referential children to undergo  vocabulary  spurts
earlier  in  development  than  social/expressive  children.
Given the  clustering  of  characteristics  associated  with
these  categorisations of children's language (Bates, Breth-
erton & Snyder  [1988]),  the  suggestion  that  predominant
segmentation  strategies play a role in determining profiles
of development coheres with the findings of these studies.

     From a theoretical perspective, the results leave  open
the  question  as to why the two children have such distinct
segmentation preferences early in development or how they go
about  solving  the  segmentation  problem. In an attempt to
evaluate the role of environmental  factors  in  determining
the children's segmentation strategies, preliminary analyses
of the utterances of the mothers of the  two  children  have
been  performed.  The same  articulatory/fluency criteria as
those applied to the children are used to identify formulaic
expressions in three sessions for each mother (at the begin-
ning, middle and end  of  the  period  reported  here).  The
results indicate that Jens' mother uses a higher proportion
of formulaic utterances than Anne's mother. Furthermore,
Anne's mother tends to
exaggerate  the prosodic contours of her speech to a greater
extent than Jens' mother.  Prosodic  exaggeration  of  child
directed  speech  has  also  been  reported in other studies
(Fernald  [1989]; Fernald et al.  [1989]; Shute  &  Wheldall
[1989]).  These  preliminary  findings suggest that the dis-
tinct segmentation strategies adopted by  the  children  may
have an environmental source. The imprecisely articulated
and fluent character of Jens' mother's speech may
hinder  the  identification  of  regularities  at the lexeme
level.  In contrast, the prosodic  exaggerations  by  Anne's
mother  may  highlight constituents of an expression, either
through focused stress patterns or contour boundaries.

     Any  attempt  to  identify  the  source  of  individual
differences  in children with environmental factors (such as
child directed speech) must confront  the  observation  that
children  reared  in  apparently  similar  environments  can
develop language in diverse ways. An environmental  explana-
tion  of why siblings differ in their acquisition strategies
would  entail  an  identification  of  differences  in   the
environments  of  the  siblings  (say, in the child directed
speech).  However,  the  identification   of   environmental
correlates of individual differences would not constitute an
explanation of  those  differences.  An  account  of     how
environmental  correlates  become  entrenched in the child's
cognitive/linguistic/perceptual  processing   apparatus   is
still  required. In other words, environmental correlates of
individual differences contribute little to our  understand-
ing  of how children acquire language unless we identify the
manner  in  which  learning  mechanisms     exploit    these
environmental correlates. Unfortunately, we have only a lim-
ited understanding of the nature of these  learning  mechan-
isms, despite possessing a rich database of acquisition pro-
files across a variety of languages.

     With respect to the segmentation problem, some progress
is  being made in identifying potential candidate mechanisms
of acquisition.  For example,   Elman  [1990]     has  shown
how an artificial neural network is able to learn to segment
a sequence of phonemes into linguistic units that correspond
to  lexemes.   The segmentation process is based on a purely
distributional analysis of the input stream.  Interestingly,
the  neural  network  trained  on  this  task passes through
stages en route to solving the segmentation problem in which
it hypothesises units which are both larger and smaller than
the target lexemes. In other words, the network postulates
organisational  units  during  periods  of its training that
correspond to the formulaic expressions and PRFs observed in
the  children  in  this  study. Clearly, the task that Elman
gives his network is a simplification  of  the  segmentation
problem confronting children. Children are not provided with
an input stream neatly  dissected  into  discrete  phonemes.
However,  the  model  does provide an account of the type of
learning mechanism  that might  be  involved  in  segmenting
speech.   Furthermore,  this  modelling  approach provides a
framework  for  systematically  evaluating  the  effects  of
environmental   conditions,  given  a  particular  cognitive
architecture and learning mechanism.  For example,  specific
linguistic  characteristics  that child language researchers
believe  might  contribute  to  individual  differences   in
language  acquisition  might  be  manipulated  in  the input
stream to the network and their  effects  compared  to  data
collected  from  children.   The modelling approach offers a
potentially powerful tool for evaluating the interactions of
environmental  conditions  with  different types of learning
mechanisms,  in  a  non-intrusive  manner  unavailable  with
`live' subjects.
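
     The kind of distributional analysis at issue can be
illustrated with a deliberately simple sketch. It is not
Elman's [1990] recurrent network: it substitutes plain
transitional probabilities between adjacent symbols for the
network's learned predictions. The three nonsense words,
their ordering, the thresholds and the function names are
all hypothetical, chosen for illustration only.

```python
from collections import Counter

# Hypothetical toy lexicon: three nonsense words with no shared symbols.
WORDS = ["badi", "kupo", "tem"]
# Hypothetical word order, repeated to form an unbroken "speech" stream.
ORDER = [0, 1, 2, 0, 2, 1] * 5
STREAM = "".join(WORDS[i] for i in ORDER)


def transition_probs(stream):
    """Estimate P(next symbol | current symbol) from adjacent pairs."""
    pair_counts = Counter(zip(stream, stream[1:]))
    first_counts = Counter(stream[:-1])
    return {(a, b): n / first_counts[a] for (a, b), n in pair_counts.items()}


def segment(stream, probs, threshold):
    """Posit a boundary wherever the transition probability is below threshold.

    Assumes a non-empty stream and probabilities estimated from that stream.
    """
    units, current = [], stream[0]
    for a, b in zip(stream, stream[1:]):
        if probs[(a, b)] < threshold:
            units.append(current)
            current = b
        else:
            current += b
    units.append(current)
    return units


PROBS = transition_probs(STREAM)

# A moderate threshold splits at every word boundary in this toy stream.
print(sorted(set(segment(STREAM, PROBS, 0.9))))
# A low threshold misses most boundaries, giving multi-word chunks.
print(sorted(set(segment(STREAM, PROBS, 0.5))))
# A threshold above 1.0 splits everywhere, giving single symbols.
print(sorted(set(segment(STREAM, PROBS, 1.1))))
```

     In this toy stream each symbol fully predicts its
successor within a word, whereas predictability drops at
word boundaries. The threshold therefore determines whether
the recovered units are larger than, equal to, or smaller
than the words. The analogy with formulaic expressions and
PRFs is loose: unlike a trained network, nothing here
changes over the course of learning, so the sketch does not
model developmental stages.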

     The principal empirical finding of this  work  is  that
the timing of children's vocabulary spurts is closely asso-
ciated with their solution of the segmentation problem.  The
segmentation switching immediately prior to their vocabulary
spurts suggests that the children are actively testing
hypotheses  as to the appropriate segmentation of the speech
signal.  Although  this  finding  does  not  rule  out   the
possibility  that conceptual developments may play an impor-
tant role in triggering a vocabulary spurt, it does  suggest
that  the processing of the speech signal itself sets impor-
tant constraints on development. Even though the  child  may
have discovered all the essential building blocks concerning
the sounds in a particular language, there are still consid-
erable  problems to be solved in determining which sounds go
together to form ``legal strings'' in the language.

     The results reported  here  are  based  exclusively  on
children's linguistic productions. However, the solution
to the segmentation problem implies that the child has iden-
tified  target lexemes based on representations of the input
signal. It is to be expected, therefore,  that  segmentation
solutions  in  expressive language will go hand in hand with
correct segmentations of the input language. In other words,
it  is likely that a vocabulary spurt in expressive language
will be accompanied by a similar spurt in receptive language.

     Finally, it has been shown that solutions to  the  seg-
mentation  problem  can  vary  according  to the source from
which various segments are derived.  For  example,  PRFs  in
formulaic  expressions tend to have a different phonological
form to  PRFs used in single unit utterances  or  productive
combinations.  It was suggested that this variation could be
traced to the tendency for linguistic units to  be  anchored
in  stressed  segments  of the speech signal. Formulaic PRFs
may derive from unstressed  segments  (such  as  grammatical
functors)  whilst  productive  PRFs may derive from stressed
segments (such as open class lexical items). To the extent
that prosodic and supra-segmental properties of the speech
signal influence segmentation strategies, cross-linguistic
variation in segmentation solutions reflecting gross
prosodic and supra-segmental differences between languages
can be anticipated. For example, the Mainland
Scandinavian Languages --- Danish, Norwegian and Swedish ---
constitute   a   typologically  (grammatically)  homogeneous
group. However, the languages differ considerably  in  terms
of their prosodic and supra-segmental characteristics. These
differences might be expected to lead to cross-linguistic
variation in the form of the segmentation solutions that,
say, Danish and Swedish children discover during the course
of development.


*  I would like to thank Elizabeth  Bates,  Judith  Goodman,
Virginia  Marchman,  Ann  Peters and Donna Thal for detailed
comments on an earlier draft of this manuscript. Address for
correspondence:   Kim  Plunkett,  Institute  of  Psychology,
University of Aarhus, Asylvej 4, DK-8240  Risskov,  DENMARK.

1. Formulaic expressions may also result from the  automati-
sation  of  syllabic  sequences,  as  a  result  of practice
effects. Practice  effects  may  also  lead  to  filler-like
expressions, such as when words are shortened to encode only
the stressed syllable.

2. In most studies, the  internal  structure  of  a  child's
utterance  is  taken  for granted. Hence the second stage in
the researcher's segmentation problem,  the  application  of
formulaicity criteria, is not carried through.

3. Given that Anne's MLU profile resembles that of Brown's
[1973] subjects Adam and Eve, it is noteworthy that Brown
used clarity of articulation as a selection criterion in his
study, in order to facilitate the process of transcription.

4. Many speakers may seem to contradict this claim. However,
both  practice  effects  and  individual differences amongst
speakers may influence the equilibrium state of the self-
organising phonetic space. Nevertheless, it should be borne
in mind that the speed/accuracy trade-off is a within-
subject variable.

5. Some phonetic features may be more robust than others (as
is  usually  the  case in self-organising systems) and hence
appear as articulatory anchor points in the child's productions.

6. A single-unit expression may consist of either  a  target
lexeme or a PRF.  expressions and combinations.

7. There is a  certain  circularity  in  the  argument  here
since,  on  the  one  hand,  it is  predicted that formulaic
expressions will be imprecisely articulated because of their
length, and on the other hand, articulatory/fluency criteria
are used to identify formulaic expressions. This circularity
is  broken only by the empirical findings reported by  Lind-
blom [1985] concerning the  trade-off  between  fluency  and
articulatory  precision.  At  another level, we may consider
the current work as an attempt to  evaluate  the  assumption
that  formulaic  expressions  are imprecisely articulated by
observing  whether  this  assumption  leads   to   plausible
descriptions of linguistic development.

8. A complete record of these codings, as well as the entire
database  for  this longitudinal study, can be obtained from
the CHILDES archive administered by Brian MacWhinney,
Dept. of Psychology, Carnegie Mellon University, Pittsburgh,
PA 15213. Alternatively, copies of the database can be
requested  from  the  author who also maintains the original

9. For Anne the vocabulary spurt begins when she controls 93
target lexemes and for Jens 62 target lexemes.

10. This assumption finds support in the  current  database.
For  Anne and Jens, the overwhelming majority of target lex-
emes found in formulaic expressions  are  deictic  pronouns,
Hv.  (Wh.)  words  and  concrete nouns --- all lexical items
that tend to be stressed in Danish. The  only  exception  is
the  frequent  use  of the present tense copula in formulaic
expressions which tends to be unstressed  in  adult  Danish.
However,  the copula is a frequently used form in adult Dan-
ish occurring  in  a  wide  variety  of  constructions  (see
Plunkett & Stromqvist [1990]).

11. It is unclear exactly which  properties  of  the  speech
signal   transcribers  are  responding  to  when  they  make
categorisations of expressions as formulaic  or  productive.
Although  the  coding  procedures  are designed to focus the
transcriber's attention on the  articulatory/fluency  dimen-
sion, more detailed phonetic studies are required to isolate
and  identify  the  factors  contributing  to  transcribers'

12. Snow [1986]  also  argues  that  expressions  previously
unanalysed  by the child may become the objects of segmenta-
tion processes.


Bates, E., Bretherton, I., & Snyder, L. [1988],  From  First
     Words  to  Grammar: Individual Differences and Dissoci-
     able Mechanisms, Cambridge University Press, Cambridge,

Bates, E., & Thal, D. [1991],  "Associations  and  dissocia-
     tions  in  child  language development," in Research on
     child language disorders: A decade of progress, J.
     Miller & R. Schiefelbusch, eds., Pro-Ed, Austin, TX.

Bloom, L. M. [1973], One word at a time: The use  of  single
     word utterances before syntax, The Hague, Mouton.

Bretherton, I., McNew, S., Snyder, L., & Bates, E. [1983],
     "Individual  differences  at  20  months:  analytic and
     holistic strategies in language  acquisition,"  Journal
     of Child Language, 10, 293--320.

Brown, R. [1973], A first language: The early stages,
     London, Allen & Unwin.

Elman,  J. L. [1990], "Finding structure in time," Cognitive
     Science 14, 179-211.

Fernald, A. [1989], "Intonation and communicative intent
     in mothers' speech to infants: Is the melody the mes-
     sage?" Child Development 60, 1497--1510.

Fernald,   A.,  Taeschner,  T.,  Dunn,  J.,  Papousek,   M.,
     Boysson-Bardies,  B.  De & Fukui, I.  [1989], "A cross-
     language study of prosodic  modifications  in  mothers'
     and  fathers'  speech to preverbal infants," Journal of
     Child Language 16, 477-501.

Hampson, J. & Nelson, K. [In Press], "The relation of mater-
     nal language to variation in rate and style of language
     acquisition," Journal of Child Language.

Hickey, T. [1990],  "Identifying formulas in first  language
     acquisition:  an application to Irish," Paper presented
     to the Fifth International Congress for the Study of
     Child Language, Budapest, Hungary.

Jordan,  M. I. [1990], "Motor learning and  the  degrees  of
     freedom problem," in Attention and Performance XIII, M.
     Jeannerod, ed., Lawrence Erlbaum Associates,
     Hillsdale, NJ.

Lindblom,  B. [1985], "Phonetic  universals  in  vowel  sys-
     tems,"  in  Experimental  Phonology,  J. J. Ohala, ed.,
     Academic Press, San Francisco.

MacWhinney, B. [1978], "The acquisition of morphology,"
     Monographs   of  the  Society  for  Research  in  Child
     Development 43.

MacWhinney,  B. [1990],  Computational  tools  for  language
     analysis:  the CHILDES system, Lawrence Erlbaum Associ-
     ates, Hillsdale, NJ.

MacWhinney,  B. & Snow, C.  [1985], "The child language data
     exchange  scheme,"  Journal  of Child Language 12, 271-

MacWhinney, B.  & Snow, C. [1990], "The child language  data
     exchange scheme: An update," Journal of Child Language.

Miller, J. F. & Chapman, R. S. [1981], "The relation between
     age and mean length of utterance in morphemes," Journal
     of Speech and Hearing Research 24, 154-164.

Nelson,  K. [1973], "Structure and strategy in  learning  to
     talk,"  Monographs of the Society for Research in Child
     Development 38.

Nelson,  K.  [1981],  "Individual  differences  in  language
     development:  Implications for acquisition and develop-
     ment," Developmental Psychology 17, 170-187.

Peters,  A. M. [1989], The units  of  language  acquisition,
     Cambridge  Series   of  Monographs and Texts in Applied
     Psycholinguistics, Cambridge Univ. Press, New York.

Peters, A. M. [1989], "From schwa to grammar: The emergence
     of grammatical morphemes," Paper presented to the Bos-
     ton University Conference on Language Development.

Plunkett, K. [1986],  "Learning  Strategies  in  Two  Danish
     Children's  Language Development," Scandinavian Journal
     of Psychology 27, 64-73.

Plunkett, K. & Stromqvist, S. [1990], The Acquisition of
     Scandinavian Languages, Gothenburg Papers in
     Theoretical Linguistics 59, University of Gothenburg.

Shute, B. & Wheldall,  K.  [1989],   "Pitch  alterations  in
     British  motherese:  some  preliminary  acoustic data,"
     Journal of Child Language 16, 503-512.

Smoczynska, M. [1981], "Uniformities and  Individual  Varia-
     tion in Early Syntactic Development," Polish Psycholog-
     ical Bulletin 12, 3-15.

Snow, C. E. [1972],  "Mother's speech to  children  learning
     language," Child Development 43, 549--65.

Snow, C. E. [1986], "Conversations with children," in
     Language acquisition: Studies in first language
     development. Second edition, P. Fletcher & M. Garman,
     eds., Cambridge University Press, Cambridge.

Uzgiris, I. C. & Hunt, J. McV. [1975], Assessment in
     Infancy, University of Illinois Press, Chicago.