Connectionist Representations: the State of the Art

Daniel Memmi

LIMSI-CNRS Orsay (France)


When connectionist networks are used to design high-level cognitive
models, the comparison with symbolic AI becomes unavoidable, as well as
fundamental representational issues. What kind of representations do
connectionist models use and are these representations fully adequate?
In particular how do they compare with symbolic representations to deal
with complex structures? We will claim that local representations are
insufficient to deal with compositionality, but that distributed
representations are much more powerful, as they allow a novel,
functional interpretation of structure. However, we will also contend
that the implicit nature of connectionist representations is still
inadequate for general cognitive modeling.


Neural networks have not only offered new techniques for practical
applications (such as pattern recognition or optimization problems),
but they have also opened new avenues for cognitive modeling (Rumelhart
& McClelland 86). In the past decade, many novel connectionist models
of cognitive functions have been proposed, especially in psychology and
linguistics. For instance neural networks have been used to simulate
memory access, inferencing, word reading, learning morphology, parsing
sentences, text comprehension... Connectionism thus appears to compete
more and more with classical Artificial Intelligence (AI) to offer
models of high-level cognition.


Now if one is to take seriously neural networks as cognitive models,
the question of representations becomes inescapable. The notion of
representation is so central to cognitive science that connectionism
cannot evade this fundamental issue. Indeed most cognitive models
consist essentially in representing a task domain and devising
appropriate operations on the representations. To give a simple but
typical example, expert systems apply standard inference schemes on
sets of facts and rules. Other types of representations (such as mental
images) have also been used for various other tasks, but we are still
dealing with representations.

However, connectionist systems do not seem at first sight to contain or
use classical representations. Instead they often deal with a task by
learning directly from examples. And they do not use symbols as AI
does, a fact which has led to very serious criticism of connectionism
(Pinker & Mehler 88). Conversely this has encouraged other
theoreticians to defend an eliminativist position, where
representations would not be needed for cognitive modeling (Churchland
86). It would take a longer discussion to dismiss eliminativism, but it
might be enough to point out that every level of description requires
its own vocabulary, both practically and theoretically. And as a matter
of fact, all practical applications we know of, whether symbolic or
connectionist, use representations of some kind or other.


So the real problem is not whether neural networks employ
representations, but what kind of representations exactly they make use
of. And those representations have to be compared with classical
symbols, if only because the large majority of cognitive models so far
have been designed with symbolic representations, whose expressive power
is indeed hard to match (Memmi 90). This text will therefore deal
essentially with the nature of connectionist representations, and their
possible limitations for cognitive modeling when compared with
classical systems.

Our main thesis will be that even though distributed connectionist
representations are more powerful than is often assumed, they still
lack the important explicit quality of symbolic representations. We
will first (1) consider the nature of representations in general, and
then investigate in turn (2) local connectionist representations, and
(3) distributed representations, which will prove more adequate in
several ways. We will nevertheless (4) contend that implicit
connectionist representations are still insufficient, before coming to
a conclusion.


Granted that representations are unavoidable, we must try to define
them as precisely as possible. This is unfortunately a potentially
enormous task, ranging from philosophy to computer science through
psychology and linguistics, because representation is such a
fundamental concept of cognitive science (Fodor 87). So we can give
here only a rough sketch for a description, with the most relevant
features for our central argument.


Basically, a representation in general can be defined by two main
properties:

- reference to an object or feature in an external domain, which can
itself be real, abstract, or imaginary;

- a form of its own, which may or may not be related to the physical
shape, the nature or connotations of the object represented.

For example the word "horse" may refer to a specific horse or to the
class of horses, the figure "5" can refer to a number in arithmetic,
but they bear no relation to the shape or nature of their reference; on
the other hand a picture of a horse will show some resemblance to the
real animal, allowing a more direct, iconic form of processing.

It must be added that representations function within a system
(language, mathematics, road signs...), where each representation
should have a distinct form and reference. Note also that
representations are themselves objects which can be referred to or
operated upon, making it possible to build very complex models.


Among other representations, symbols (in the logical-linguistic sense)
can be distinguished in the following way:

- atomic symbols have a discrete and arbitrary form, with no natural
relation to their reference;

- complex symbols (expressions) possess a compositional structure, and
can be decomposed recursively into atomic symbols.

For example "horse" and "5" are atomic symbols, whereas "the horse
jumped the fence" and "1789" are complex expressions with an inner
structure (and a corresponding compositional semantic interpretation).

The arbitrary form of classical symbols sets them off sharply from
other types of representations, such as images, icons or schemas, whose
form is analog to their reference, and the compositional structure of
complex symbols gives them enormous representational power.
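The recursive decomposability of complex symbols can be sketched in a few lines of code (an illustrative encoding, not one used in this text): expressions are nested structures, and the atomic symbols are recovered by recursive traversal, while structure is preserved by the nesting itself.

```python
# A sketch of classical compositionality: complex symbols are nested
# structures that decompose recursively into atomic symbols.
# (Illustrative encoding, not from the article.)

def atoms(expr):
    """Recursively collect the atomic symbols of an expression."""
    if isinstance(expr, tuple):          # complex symbol: recurse
        result = []
        for part in expr:
            result.extend(atoms(part))
        return result
    return [expr]                        # atomic symbol

sentence = ("loves", "John", "Mary")     # predicate-argument structure
nested = ("and", sentence, ("loves", "Mary", "John"))

print(atoms(sentence))  # ['loves', 'John', 'Mary']
```

Note that the two sub-expressions of `nested` contain exactly the same atoms yet remain distinct expressions: it is the concatenative structure, not the atoms, that carries the difference.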


This fairly long (though still simplistic) preamble was necessary to
deal more precisely with connectionist models. It will allow us to
state succinctly the issue of compositionality, which classical symbol
systems solve neatly, but which appears to be a serious problem for
neural networks.

The following argument is now well known (Fodor & Pylyshyn 88).
Cognitive processes are productive and systematic: if one can deal with
a given task, one can also deal systematically with a different but
similar task. For instance a native speaker who can utter or understand
the sentence "John loves Mary" can also process "Mary loves John"
without further difficulty. The most obvious explanation for this fact
would be that the underlying representations have a compositional
structure, with constituents explicitly concatenated according to
strict combinatorial rules, and that cognitive processes are
systematically sensitive to the form and structure of representations.
This is how classical systems operate thanks to the discrete and
compositional nature of symbols, and this is probably why symbolic AI
has been so fruitful over so many years.

However, connectionist systems do not seem at first sight to be able to
use such representations. How can the global activations of a neural
network implement discrete components and combinatorial structures?
And how can mere activity propagation perform structure-sensitive
processes? One may therefore doubt the capabilities of neural networks
for high-level cognitive modeling, unless we can find either
connectionist symbols, or connectionist representations and processes
with equivalent power. With this aim in view, we will look in turn at
local and then at distributed representations.


In neural networks with local representations, every neuron stands for
one concept only (or one semantic feature) of the task domain, whereas
in distributed representations a concept will be spread over several
neurons. Local systems are often (but not necessarily) interactive
activation models with a Hopfield-type architecture. Local
representations are easier to understand and implement, and they are
also closer to some of the formalisms in symbolic AI. In fact there is
no sharp distinction between a local neural network and a semantic net
with activity propagation (though the latter usually shows different
types of labeled links).

Local representations partly meet our definition of symbols. Each
neuron is a clearly identifiable unit, referring to a unique object in
the semantic domain. As with classical symbols, reference is then
discrete and conventional. Although neurons do not have an individual
form, they can be identified without any ambiguity by their position in
the network. This positional coding, together with the weights on the
connections, is often sufficient to allow the appropriate inferences
simply by activity propagation. Very interesting models of constraint
satisfaction have thus been implemented by propagation and competition
between local neurons.
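The kind of constraint satisfaction just described can be sketched with a toy network (the units, concepts and weights below are hypothetical, and hand-set as in most local models): excitatory links support compatible concepts, inhibitory links implement competition, and inference is nothing but iterated activity propagation.

```python
# Minimal sketch of activity propagation in a local network: each unit
# stands for one concept; weights are symmetric and set by hand.

units = ["bird", "penguin", "flies", "swims"]
W = {  # hypothetical connection weights
    ("bird", "flies"): 0.5, ("penguin", "swims"): 0.5,
    ("penguin", "flies"): -0.6, ("bird", "penguin"): 0.3,
}

def weight(a, b):
    return W.get((a, b), W.get((b, a), 0.0))

def propagate(act, steps=20, rate=0.3):
    for _ in range(steps):
        new = {}
        for u in units:
            net = sum(weight(u, v) * act[v] for v in units if v != u)
            new[u] = min(1.0, max(0.0, act[u] + rate * net))
        act = new
    return act

act = {u: 0.0 for u in units}
act["penguin"] = 1.0                 # activate the input concept
final = propagate(act)
# "swims" gets excited through propagation; "flies" stays suppressed
# because the inhibitory link from "penguin" outweighs support from "bird"
```

The appropriate conclusions thus emerge from competition between units, without any explicit inference rule.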


However, compositionality is not guaranteed. As neurons have no inner
structure, a complex expression can only be represented by an assembly
of neurons, and local means become insufficient. For example "John
loves Mary" may be represented by activating the individual neurons
standing for "John", "love" and "Mary" respectively. But there is no
easy way to ensure an unambiguous representation of the structure of
the expression. In the previous example, one no longer knows who loves
whom in this neural representation, which might just as well mean "Mary
loves John"!
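The ambiguity can be made concrete with a minimal sketch (an illustrative encoding only): with one unit per concept, an expression reduces to its set of active units, so the two sentences become literally indistinguishable.

```python
# With local coding, an expression is just the set of active units,
# so constituent order (and hence structure) is lost.

def local_code(words, lexicon):
    return {w: (1 if w in words else 0) for w in lexicon}

lexicon = ["John", "loves", "Mary"]
a = local_code(["John", "loves", "Mary"], lexicon)
b = local_code(["Mary", "loves", "John"], lexicon)
print(a == b)  # True: the two sentences collapse onto one pattern
```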

Another possibility is to represent structural links explicitly by
additional connections between neurons (often together with additional
neurons to carry structural labels). Whole parse trees can be easily
represented in this way (Waltz & Pollack 85) (Cottrell 89).
Unfortunately, one is dealing with a fixed structure which then
requires an external module and costly copies to move constituents
around.  Similarly, using different neurons for various structural
positions (e.g.  one neuron for "John-as-agent" and another one for
"John-as-patient") is just another way to hard-wire particular
structures, with similar drawbacks. In short, there is no satisfactory
general solution as yet to compositional issues within local systems.


Moreover, systems with local representations exhibit to a much lower
degree some of the typical qualities of connectionism. They cannot
represent fuzziness as well as distributed systems (as basic concepts
remain discrete), and they do not elaborate by themselves original (and
fuzzy) internal representations through learning. As a matter of fact,
most local models do not include learning, because the systems are
still simple enough to set weights by hand and learning would not be of
great benefit. Therefore new concepts cannot emerge, and such networks
do not generalize well to unexpected input. So even if one could solve
compositionality problems, local models seem limited to fairly simple
cognitive tasks.

It should be mentioned, however, that the distinction between local and
distributed representations is not absolute, since it depends mainly on
an interpretation by an external observer. For instance a model with
local basic concepts may also learn to represent structure in
distributed fashion by its connection weights (Crucianu & Memmi 92).
And quite a few multilayer models use both local representations on
input-output layers (for ease of coding) and distributed
representations in hidden layers (through learning).


In distributed models, a concept is represented by the activation of an
assembly of neurons, and each neuron may participate in the
representation of different concepts. Such representations are more
difficult to understand and to use, but they are quite advantageous.
They may require complex coding schemes (notably on input), but they
are often produced by automatic learning procedures, usually in the
hidden layers of multilayer architectures. Distributed codings are also
more efficient as to information capacity and generalization ability.
These models show more clearly such typical connectionist qualities as
learning, fuzziness, robustness... Moreover, we will see that
distributed representations offer far better hopes for dealing with
compositionality problems.


The complexity of distributed representations makes them rather opaque
for the human user. It often proves difficult to identify and interpret
the internal representations developed through learning, and even more
difficult to investigate precisely their characteristics and possible
limitations. Appropriate tools are then required: state-space
descriptions and data analysis techniques; see for instance (Elman 90)
(Jodouin 93)... State-space is the vector space where different
activation patterns of a network can be placed, by considering each
state of the system as a vector. A distributed representation will then
be a point in such a space (with as many dimensions as neurons).
Representations can be investigated by comparing corresponding points
in state-space with data analysis techniques: hierarchical clustering,
principal component analysis, discriminant analysis...
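As a minimal sketch of such state-space comparisons (the activation patterns below are invented for illustration), one can compute distances between state vectors and look for nearest neighbors, a cheap stand-in for full hierarchical clustering:

```python
import math

# Each network state is a vector in state-space (one dimension per
# neuron); representations are compared by their distances in that space.

states = {             # hypothetical hidden-layer activation patterns
    "cat":   [0.9, 0.8, 0.1, 0.0],
    "dog":   [0.8, 0.9, 0.2, 0.1],
    "chair": [0.1, 0.0, 0.9, 0.8],
}

def dist(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def nearest(name):
    others = [n for n in states if n != name]
    return min(others, key=lambda n: dist(states[name], states[n]))

print(nearest("cat"))   # 'dog': semantically similar items lie close
```

Real analyses use the same idea at scale, feeding such distance matrices to hierarchical clustering or principal component analysis.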

By using such descriptions, we will see that distributed
representations offer functionalities similar to classical symbols, but
achieved in a very different way. Like symbols, a distributed
representation can be identified by its own form (an activation
pattern), and each form may refer to a distinct object in the semantic
domain.  However, the state-space of neural networks, whether discrete
or continuous, is generally much denser than the sparse discrete space
of symbolic representations. And contrary to atomic symbols, the form
of distributed representations is not totally arbitrary, since similar
objects have close representations in state-space. This suggests a
possible solution to the classical symbol-grounding problem, as
connectionist representations may retain enough information about the
objects represented (Harnad 92), though present proposals remain fairly
tentative.


The continuous (or quasi-continuous) variability of distributed
representations has another very important consequence. It offers a
general connectionist solution to the problems of systematicity and
compositionality. By representing structural similarity by vectorial
similarity in state-space, a systematic representation of complex
expressions can be achieved (van Gelder 90). Closely related
expressions (e.g. differing by one constituent) will be represented by
neighboring vectors, and constituents can be identified by data
analysis in state-space. For example "John loves Mary" and "John loves
Ann" will have close (but different) connectionist representations, and
the difference in vector space will correspond to the different
constituents "Mary" and "Ann" in the sentence.
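A crude positional encoding (a stand-in for the learned representations discussed here, with made-up word vectors) suffices to illustrate the point: expressions differing by one constituent land close together in state-space, while expressions made of the same words in a different structure do not.

```python
import math

# Concatenate per-position word vectors (hypothetical values), then
# compare Euclidean distances between the resulting sentence vectors.

word_vec = {
    "John": [1.0, 0.0], "Mary": [0.0, 1.0],
    "Ann":  [0.1, 0.9], "loves": [0.5, 0.5],
}

def encode(sentence):
    v = []
    for w in sentence:
        v.extend(word_vec[w])
    return v

def dist(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

s1 = encode(["John", "loves", "Mary"])
s2 = encode(["John", "loves", "Ann"])   # differs by one constituent
s3 = encode(["Mary", "loves", "John"])  # same words, different structure

print(dist(s1, s2) < dist(s1, s3))  # True: s2 is the nearer neighbor
```

Unlike the local coding above, s1 and s3 here receive distinct vectors, so structure is no longer lost.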

Moreover this type of representation allows not only a connectionist
coding of complex expressions, but also the systematic processing of
structure, because similar vectors will give similar outputs in a
systematic manner (after appropriate training). Structure-sensitive
processes can thus be implemented with a functional, non-concatenative
compositionality. Though apparently rather vague, this functional
approach to systematicity has inspired or clarified quite a few
high-level connectionist models, especially in natural language (Elman
90) (Chalmers 90) (St John & McClelland 90) (Jodouin 93)... Those
models have by and large validated the approach practically, for complex
and highly structured tasks such as grammar acquisition, syntactic
transformations, sentence comprehension...


Distributed models thus provide a rather original answer to the
compositionality problem by developing and using systematic, but
global, non-concatenative representations of structure. Classical
symbolic compositionality turns out to be but a special case (though a
particularly simple and useful one). Moreover, distributed
representations offer new capabilities, which are specific to
connectionism: variation of constituents in context, learning of
representations, fine-grain processing... We might then speak of
functional, virtual, emergent or implicit symbols and structures, but
one could also contend that connectionist methods offer a totally novel
representation space for cognitive modeling.

Nevertheless connectionist models of high-level tasks (especially
natural language processing) are still in short supply, and rather
limited in ability when compared with symbolic AI. It is not quite
clear to what extent the functional approach can be successfully used
(possible structures seem to be of limited complexity), and the opacity
of distributed representations requires data analysis techniques to be
properly understood. Consequently the stage of practical applications
has not yet been reached, and further research is necessary. Moreover
we now have a more fundamental objection to present connectionist
models:  the implicit nature of distributed representations.


Even though distributed representations seem to meet most classical
criteria while offering new qualities, they remain implicit. Such
representations are inextricably entangled with their treatment within
a neural network, since connection weights as a whole determine both
internal representations and processing. Connectionist representations
arise during processing as transient activation patterns which are not
stored anywhere, and weights constitute the system's only long-term
knowledge source (and a particularly opaque one). This stands in sharp
contrast with symbolic systems, where knowledge and processing are
carefully separated: symbolic expressions are explicit, independent
objects, which are stored as such in memory to be manipulated by
general inference schemes.


The explicit nature of symbolic representations makes it very easy to
reuse knowledge in an open way. Classical systems may for example
explain their behavior, reformulate their knowledge, perform meta-
reasonings... Generally speaking, they are able to look at and reason
about their own behavior and knowledge, and are thus potentially
endowed with reflexivity and abstraction. Such crucial intellectual
qualities seem necessary for high-level cognitive modeling, and are
also important for interactions with human users. By comparison,
connectionist systems remain, by and large, black box models both for
themselves and for outside users. We believe this to be a serious
theoretical and practical shortcoming of neural networks.

It could be argued that the implicit / explicit distinction is not
absolutely clear (Clark 92). Knowledge could be considered explicit
when there are mechanisms to recognize or express the appropriate
information. Symbolic representations are clearly explicit because
their computer coding now seems so simple and evident. Distributed
representations might be said to be explicit for the network processing
them, but the knowledge is unfortunately not clearly available for
another use or another task. Explicit expressions can be extracted from
neural networks thanks to various data analysis techniques, but this
would no longer be a purely connectionist system. If external methods
or modules are needed, we are then dealing with hybrid systems.


In fact, there is no general connectionist method as yet to extract and
use in an open way the knowledge acquired by a network. Various schemes
have been proposed to compose and extract explicit constituents of
neural representations: tensor product formalism (Smolensky 90),
BoltzCONS system (Touretzky 90), RAAM nets (Pollack 90), SG model (St
John & McClelland 90)... But even though structure extraction is
achieved by connectionist means (without data analysis), all those
systems require complex architectures with several subnets and an
external control mechanism. And explicit structure processing is only a
small part of full reflexivity, so no satisfactory general solution
appears to be in sight for ordinary, homogeneous neural networks.
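The flavor of tensor-product binding can be sketched with toy vectors (the values below are hypothetical, and only illustrate the spirit of the formalism in (Smolensky 90)): each filler is bound to an orthonormal role vector by an outer product, the bindings are superposed, and a filler is recovered by projecting the tensor on its role.

```python
# Tensor-product variable binding, toy version: roles are orthonormal,
# so projection on a role vector recovers the bound filler exactly.

roles = {"agent": [1, 0], "patient": [0, 1]}
fillers = {"John": [1.0, 0.0, 1.0], "Mary": [0.0, 1.0, 1.0]}

def outer(r, f):
    return [[ri * fj for fj in f] for ri in r]

def add(m1, m2):
    return [[a + b for a, b in zip(r1, r2)] for r1, r2 in zip(m1, m2)]

def unbind(m, r):
    # project the tensor on a role vector to recover its filler
    return [sum(r[i] * m[i][j] for i in range(len(r)))
            for j in range(len(m[0]))]

# "John loves Mary": John bound to agent, Mary bound to patient
t = add(outer(roles["agent"], fillers["John"]),
        outer(roles["patient"], fillers["Mary"]))

print(unbind(t, roles["agent"]))   # [1.0, 0.0, 1.0], i.e. "John"
```

Note, however, that such binding and unbinding still has to be orchestrated by an architecture around the tensor itself, which is precisely the objection raised above.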

One way to attack the problem would be to concede that it might be too
difficult to find a purely connectionist answer. We should instead try
to develop hybrid systems combining connectionist and symbolic AI
methods with their complementary qualities; see for example (Amy et al.
90). As compositionality and reflexivity are dealt with adequately by
symbolic AI, we should use neural networks only where they work best:
learning, generalization, resistance to noise... The problem now is to
translate connectionist representations into symbolic ones, and vice-
versa, between different modules of the system. This is feasible
(though not as easy as one might think), and hybrid systems constitute
a reasonable and practical research direction for the near future.

Still, our brain is able to accomplish all those tasks, from low-level
perception to abstract reasoning, presumably with connectionist-like
methods. But real biological architectures are much more complex than
the simple, homogeneous artificial neural networks usually proposed.
Highly structured assemblies and modules have been observed, whether in
the visual system or the cortex (Burnod 1988). Moreover, it seems very
implausible that complex cognitive tasks could be achieved without
architectures and strategies of a corresponding level of complexity. We
therefore think that more complex connectionist architectures are most
probably needed, such as modular or hierarchical networks where one
part of the system could watch another. One might deal in this way with
reflexivity and abstraction, and also achieve totally unbounded
learning.


We have investigated here the nature and power of connectionist
representations with respect to the requirements of cognitive
modeling.  For this purpose, a series of oppositions has been used to
organize the discussion: symbolic / connectionist, discrete /
continuous, arbitrary / analog, compositional / global, local /
distributed, explicit / implicit...  Though all such distinctions
become blurred upon closer examination, they have helped us to clarify
the capacities of connectionist representations, and thus evaluate more
precisely the real potentiality of connectionism. This inquiry also
points to likely research directions in the future.

In short, local connectionist representations seem less promising, and
are not powerful enough to deal fully with compositionality problems.
Distributed representations on the other hand offer an original
implementation of structure, and thus a functional equivalence to
complex symbolic expressions, without losing typical connectionist
qualities. However, such implicit representations are still
insufficient to account for the reflexivity of high-level cognition. We
discern then two main research directions to attend to this issue:
either hybrid (connectionist-symbolic) systems, or more complex modular
connectionist architectures.


Amy B., Giacometti A. & Gut A. (1990) Modèles connexionnistes de 
l'expertise, Neuro-Nîmes 90, Nîmes.

Burnod Y. (1988) An Adaptive Neural Network: The Cerebral Cortex, 

Chalmers D.J. (1990) Syntactic transformations on distributed 
representations, Connection Science 2 (1).

Churchland P. (1986) Neurophilosophy, MIT Press.

Clark A. (1992) The presence of a symbol, Connection Science 4 (3-4).

Cottrell G.W. (1989) A Connectionist Approach to Word Sense 
Disambiguation, Morgan Kaufmann.

Crucianu M. & Memmi D. (1992) Extraction de la structure implicite 
dans un réseau connexionniste, Neuro-Nîmes 92, Nîmes.

Elman J.L. (1990) Finding structure in time, Cognitive Science 14 (2).

Fodor J.A. (1987) Psychosemantics, MIT Press.

Fodor J.A. & Pylyshyn Z.W. (1988) Connectionism and cognitive 
architecture: a critical analysis, Cognition 28 (1-2).

van Gelder T. (1990) Compositionality: a connectionist variation on a 
classical theme, Cognitive Science 14 (3).

Harnad S. (1992) Connecting object to symbol in modeling cognition, in 
Connectionism in Context, Clark & Lutz eds., Springer-Verlag.

Hinton G.E. ed. (1990) Connectionist Symbol Processing, special 
issue, Artificial Intelligence 46 (1-2).

Jodouin J.F. (1993) Réseaux de Neurones et Traitement du Langage 
Naturel, Thèse Université de Paris-Sud, Orsay.

Memmi D. (1990) Connectionism and artificial intelligence as cognitive 
models, AI & Society 4 (2).

Pinker S. & Mehler J. eds. (1988) Connectionism and Symbol Systems, 
special issue, Cognition 28 (1-2).

Pollack J.B. (1990) Recursive distributed representations, Artificial 
Intelligence 46 (1-2).

Rumelhart D.E. & McClelland J.L. eds. (1986) Parallel Distributed 
Processing, MIT Press.

St. John M.F. & McClelland J.L. (1990) Learning and applying 
contextual constraints in sentence comprehension, Artificial 
Intelligence 46 (1-2).

Smolensky P. (1990) Tensor product variable binding and the 
representation of structure in connectionist systems, Artificial 
Intelligence 46 (1-2).

Touretzky D.S. (1990) BoltzCONS: dynamic symbol structure in a 
connectionist network, Artificial Intelligence 46 (1-2).

Waltz D.L. & Pollack J.B. (1985) Massively parallel parsing: a 
strongly interactive model of natural language interpretation, 
Cognitive Science 9.

This text owes much to regular discussions with M. Crucianu and J.F.
Jodouin.

Center for Research in Language
CRL Newsletter
Article 8-1