1 Introduction

What is at stake in enactivist approaches is the “cognitive gap” (De Jaegher & Froese, 2009) that non- or anti-representationalists are called on to fill: that is, they must offer a convincing account of higher forms of cognition and of the continuity between them and lower forms of cognition. This also means offering a convincing account of how language emerges from the basic, content-free forms of cognition and communicative interaction advocated by enactivism. Indeed, explaining linguistic capabilities requires more than the ability to explain dynamic interaction with the here-and-now environment; it requires reference to higher-level cognition. At least, that is how it would seem.

Some recent works seem to point in interesting directions for investigating the relationship between enactivism and language,Footnote 1 focusing on language as enactive, i.e., as an extension of action. Among these proposals, Radical Enactivism (REC) takes its cue from the analytic reinterpretation of Wittgenstein’s perspective on language to recast Millikan’s analytically oriented teleosemantics in a contentless enactivist form (Hutto & Myin, 2017).Footnote 2

However, the RECers only partially explain the transition from prelinguistic to linguistic communication. Assuming that human language involves content, they view it as a “kinky” result of evolutionary bio-social learning processes and admit that, where contentful cognition and language are concerned, they are “at odds with evolutionary continuity” (Hutto & Myin, 2017: 122).Footnote 3 They merely provide a philosophical sketch of how content might have emerged in nature, combining an evolutionary continuity view with a psychological discontinuist view that considers “content utterly unprecedented in nature” (Hutto & Myin, 2017: 128). Accordingly, they propose a kind of “continuity” view that requires a “relaxed” naturalistic explanation of how cognition involving content can arise from contentless minds through the mastery of specific sociocultural practices.

Drawing on Hutto and Myin’s interpretation of REC as the most radical version of the “pragmatic turn” (Hutto & Myin, 2017: 37), I propose that a pragmatist enactive approach to the emergence of language and contentful minds from a basic form of social communicative interaction can enrich REC’s sketched perspective. To move in this direction, I consider a pragmatist behavioral theory of meaning according to which gestural conversation is the evolutionary origin of linguistic conversation. This involves further exploring something Hutto and Myin already acknowledge: that spontaneous expressive behavior can have meaning. To enhance REC’s proposal, I draw on Mead’s pragmatist theory of gestures as “truncated acts”. Mead’s theory helps connect prelinguistic and linguistic communication, providing a continuist perspective on the evolution of symbolic language and contentful cognition. I therefore believe that it can provide useful components for constructing a continuist, non-representationalist theory of the genesis of language, one that places REC’s Ur-intentionality within a wider, non-dualistic phylogenetic and ontogenetic account of how symbolic language develops from gestural communication. This approach allows us to overcome the separation between contentless and content-involving intentionality. As we will see in the last section of the paper, although Mead’s terminology may seem outdated, his theory has recently been revived as relevant to explaining both the innate social dimension of human and non-human animals and the human communicative capacity as conditioned by bio-social canons and structures (McNeill, 2005, 2012; Rizzolatti & Sinigaglia, 2008).
Several authors have suggested a continuity between action, gesture, cognition and language (Arbib, 2012; Corballis, 2002, 2017; Donald, 1991, 2012; Ferretti et al., 2018; Kendon, 2004; McNeill, 2005, 2012; Tomasello, 2008; Volterra et al., 2017).Footnote 4 My aim is to show that Mead’s theory of gestures as “truncated acts” can be part of this wide range of studies. According to this theory, gestural communication and manipulation generally precede propositional communication and are rooted in a natural brain endowment that enables and facilitates the transition from gestural to propositional communication.

2 Radical enactivism and language

Hutto and Myin (2017) define Radical Enactivism (REC) as the extreme take on the “pragmatic turn” away from the representation-centered framework towards a paradigm that treats cognition as a skillful activity involving ongoing interaction with the world.Footnote 5 In particular, REC challenges representational theories of cognition, as well as the information-processing picture of cognition, by arguing that basic minds, which are phylogenetically and ontogenetically “the most fundamental kinds of minds” (Hutto & Myin, 2017: 89), may be contentless. REC denies the kind of representation that equates to propositional content, namely content that represents things as being thus and so regardless of how they actually are (Hutto & Myin, 2017: 10), claiming both that not all forms of cognition are content-involving and that not all contentful thoughts are cognition. Some embodied activities and reenactments, such as basic perceiving, learning, imagining, and remembering, involve no content, that is, no encoding and processing of information or representations.Footnote 6

This does not mean that REC denies contentful cognition altogether. Content-involving minds have “features and capacities that other, more basic minds lack: they stand apart. This difference can be thought to mark a difference in kind, not just degree, of mindedness” (Hutto & Myin, 2017: 134, italics added). What they call a “kinky” capacity for content-involving cognition is an exceptional achievement based on the mastery of very special kinds of scaffolded practices involving public norms for using symbols. Such norms depend on a range of customs and institutions made possible by “the construction of sociocultural cognitive niches in the human lineage” (Hutto & Myin, 2017: 134). Only basic minds capable of mastering cultural practices could acquire new cognitive capacities and be open to new possibilities for engaging with the world and other organisms. In other words, sociocultural practices scaffold the basic mind, giving rise to new and qualitatively distinct cognitive niches that made it possible for organisms to become capable of “new forms of thinking of a unique kind” (Hutto & Myin, 2017: 138).

To account for the “kinky” content-involving cognitive capacities, REC assumes some purely biologically based forms of basic cognition that are shared across the species and that could have given rise to social learning processes. Through such processes, our ancestors could learn from other members of their species and establish cultural practices and institutions that have stabilised over time. The capacity for social learning presupposes that organisms can interact teleologically with the natural and “social” world around them.

Therefore, according to REC, the first kind of interaction occurs through “teleosemiotics,” which means that organisms interact purposefully with the world around them in ways that are not characterized in semantic terms such as reference or truth. As Hutto and Myin explain, teleosemiotics is a contentless version of Ruth Millikan’s teleosemantics (Millikan, 1984, 1989, 1990), namely “a teleofunctional account of what determines the semantic contents of inner representations” (Millikan, 1990: 151). In particular, Millikan’s theory aimed to explain representational properties in naturalistic terms, focusing on how cognitive agents’ interpretation of agent-independent representational content enables them to respond to aspects of their environments in ways that answer organismic needs. As Hutto and Myin put it, the idea behind Millikan’s proposal is that “a device will have the teleofunction of representing Xs if it is used, interpreted, or consumed by the system because it has the proper function of representing the presence of Xs.” (Hutto & Myin, 2013: 76).Footnote 7 Millikan wanted to offer a naturalistic account of what it is for something to have a function, that is, of what it is supposed to do (e.g., the proper function of my legs is to make me move; the proper function of my heart is to pump blood). More specifically, she aimed to give a naturalistic account of the teleological function of representational properties by appealing to an evolutionary history involving some form of natural selection and associative learning (Millikan, 1990: 152).

According to RECers, however, talk of proper function emphasises that content is determined by what organisms are supposed to do in their interpretative activity rather than by what they are simply disposed to do. Furthermore, the idea of pre-existing content to be used, interpreted, or consumed suggests that contentful mental states exhibit properties of reference, truth, or accuracy, namely semantic properties.Footnote 8 To keep the teleosemantic apparatus without the representational aspect of basic responding, they strip Millikan’s proposal of content, transforming it into a contentless “teleosemiotics” (see also Hutto & Satne, 2015):

The teleosemantic apparatus is used to give an account of contentless attitudes exhibiting basic intentional directedness—aka intentional attitudes—as opposed to providing a robust semantic theory of content. This allows us to understand basic cognition in terms of active, informationally sensitive, world-directed engagements, where a creature’s current tendencies for active engagement are shaped by its ontogenetic and phylogenetic history. (Hutto & Myin, 2017: 138-9)

Teleosemiotics considers the organism’s response sufficient to explain the attribution of meaning to informational stimuli without involving an interpretative process within the organism that would then have to resort to representational content of the ongoing situation. According to REC (Hutto & Myin, 2013: 81), organisms often successfully act by making appropriate responses to stimuli (objects or states of affairs) in ways that are mediated only by their sensory response to natural signs, and this response does not imply a contentful representation of the stimuli in question.

To account for organisms’ purposeful (i.e., target-focused) yet contentless activity, RECers refer to a contentless intentional attitude that they call Ur-Intentionality, which cannot be equated with a mere property of natural attunement between organisms and their environment. This is because past natural attunements not only structure the profile of an organism’s current tendencies to respond but also normatively fix, on multiple spatial and temporal scales, what it is intentionally directed towards:

There is no reason to suppose that the cognition at play in such social engagements and interactions must be grounded in representationally based rules of any kind. Rather, all that needs to be assumed is that normally developing participants in such practices are already set up, nonaccidentally, to target and tune into the expressively rich intentional attitudes of others. (Hutto & Myin, 2017: 140)

REC accounts for the biological basis of basic minds’ rule-following by appealing to a long process of natural selection by consequences. Accordingly, certain biological facts fix what a basic-minded organism is directed towards and explain why it is so directed, and thus why it is attuned and reacts to certain sensory stimuli and not others.

3 Some open issues on Ur-intentionality and the discontinuist view

Ur-intentionality is a basic intentional directionality rooted in the intertwining of natural instincts and associative learning processes.Footnote 9 However, it is so basic, pervasive, and natural that, as Jean-Michel Roy (2015) claims, calling it “intentionality” is almost useless and misleading.Footnote 10 Furthermore, as Pierre Steiner argues, it presents some contradictions that stem from the fact that:

“The teleological norms they [RECers] appeal to in their account are historically and diachronically important for identifying the objects of intentional attitudes (do frogs perceive flies or black dots?), but they are not sufficient if, synchronically, we want to understand how some relations between organisms and the environment are intentional relations.” (Steiner [forthcoming])

To these criticisms it should be added that teleosemiotics, which is fundamental to accounting for Ur-intentionality, relies on the evolutionary history of organisms to define their normative ways of detecting and responding to specific environmental stimuli. Yet the idea that organisms are already predisposed by natural selection to respond to environmental stimuli, and that their directionality is biologically fixed, risks reducing Ur-intentionality to the behaviourist stimulus-response scheme. Indeed, some authors have equated REC’s account of content-free cognition with the stimulus-response model of behaviourism (O’Brien & Opie, 2015: 724). What did not work for behaviourism, and therefore risks not working for enactivism, is that moment-to-moment stimuli are too poor to explain the selective capacity of evolved creatures and the complexity, variety, and specificity of their behaviour.Footnote 11 If this were the case, REC could not offer a biologically credible story of how organisms endowed only with content-free mental states could engage in the flexible kinds of basic and social cognition needed to triangulate with others in primitive, nonlinguistic ways.

To answer this criticism, Hutto and Myin refer to Dorit Bar-On’s account of animal expressive attitudes. In particular, against a discontinuist and skeptical perspective on the phylogeny of linguistic from prelinguistic communication, Bar-On (2013a, b; Bar-On & Green, 2010) suggests regarding expressive behaviour, and the kind of communication it affords, as exhibiting affective and cognitive “features that foreshadow significant semantic and pragmatic aspects of linguistic communication, suggesting important and perhaps unexpected ways in which linguistic and nonlinguistic animal communication lie on a natural continuum” (Bar-On, 2013b: 344). Behind Bar-On’s hypothesis lies the idea that many human and nonhuman bodily gestures and vocalizations, including facial contortions and bodily demeanors, not only convey information about the producer’s biologically significant attributes but also involve “an overt gaze direction, head tilt, or distinctive bodily orientation guiding the receiver’s attention not only to the expressive agent’s affective state but also to the object of that state—the source or target of the relevant state” (Bar-On, 2013a: 318). Thus, even if expressive communication is not intentional, it presents characteristics that may foreshadow significant aspects of linguistic communication.

Following Bar-On, Hutto and Myin argue that expressive attitudes of the kind that feature in prelinguistic triangulation are “subtle and adjustable responses to sophisticated patterns of expressive behavior” (Hutto & Myin, 2017: 143). To understand these interactions, it is necessary to take into account the complexity of the animal communication network. So, even if such expressive behaviours are not understood as involving fully formed communicative intentions and internal representations, they cannot be reduced to automatic physiological reactions, for they show a significant degree of spontaneity.

Notwithstanding the reference to Bar-On’s expressive behaviour, Hutto and Myin provide too little detail on the question of the transition from Ur-intentional to intentional communicative interactions.

RECers themselves recognize the difficulty. It is no coincidence that they are wary of conceding continuity between basic and higher cognition, and thus between a contentless Ur-intentional attitude and a contentful intentional attitude (i.e., one directed toward mental content), preferring a discontinuist perspective on the relation between preverbal and verbal communication. More specifically, the RECers only partially answer the question of how to explain the transition from teleosemiotics to teleosemantics.

The main reason for RECers’ view lies, I think, in their conception of language as equated with mental content, i.e., their old representationalism about language, which does not help to bridge the cognitive gap.Footnote 12

As seen above, thanks to dynamic sociocultural processes, basic minds have developed a capacity to attribute meanings to informational stimuli, that is, to what can be called “natural signs” (otherwise there would be no reason to talk about “teleosemiotics”), without involving interpretative contentful processes. Accordingly, the first bearers of “meaning”, although Hutto and Myin prefer to speak of “informationally sensitive responses to natural signs” (Hutto & Myin, 2013: 78), are equated with the first bearers of content, namely semantically articulated symbols that occur in an appropriate holistic dynamic pattern. As Hutto and Myin argue, mental content originates through the “mastery of the use of public symbol systems” (Hutto & Myin, 2017: 134). Moreover, drawing on Haugeland’s neopragmatist approach to intentional understanding, they regard symbols as including “in principle all the interdependent relationships instituted by the way of life of which they are a part” (Haugeland, 1990: 412).

REC thus acknowledges the public and primarily interactional origin of mental content. As Hutto and Myin argue, “normally developing participants in sociocultural practices are already set up, nonaccidentally, to target and tune into the expressively rich intentional attitudes of others” (Hutto & Myin, 2017: 140, italics added). However, given Haugeland’s view, the public origin of mental content to which RECers refer would seem to be already linguistic, i.e., propositional. In so doing, they explain neither the nature (instinctual? acquired?) of the predisposition of participants in sociocultural practices to coordinate with one another, nor the transition from a behavioural attribution of meaning to natural signs to the attribution of semantically articulated symbolic content. They must maintain a distinction between a contentless intentionality, i.e., Ur-intentionality, and a content-involving intentionality, i.e., propositional intentionality, without clarifying the phylogenetic and ontogenetic connection between the two. On the one hand, Hutto and Myin explain contentful cognition by referring to the crucial role played by “sociocultural scaffolding”; on the other hand, as they admit, they are “at odds with evolutionary continuity” (Hutto & Myin, 2017: 122). In this way, either content has to be regarded as incompatible with naturalism, or it has to be rejected in order to preserve naturalism.Footnote 13

The RECers attempt to solve this dilemma by adopting a perspective that considers “content utterly unprecedented in nature” (Hutto & Myin, 2017: 128). This perspective can be justified through what they call a “relaxed” naturalistic explanation of how cognitive processes involving content emerge from contentless minds through the mastery of specific sociocultural practices.Footnote 14 As a result, they argue that language, which involves content and exhibits semantic properties of reference and truth, is a “kinky” outcome of bio-social evolutionary learning processes. This approach allows them to hold both an evolutionary continuity view and a psychological discontinuist view.

The issue at hand is: why did RECers not consider the bigger picture and bring together the communicative interaction and the emergence of the mind with content and language? Why did they not go a little further to explicate the emergence of symbolic language from more basic forms of communicative expressive behaviours? They only offer a possible story for how content might have emerged, but they do not delve deeper than a preliminary philosophical sketch. Essentially, their radical position is quite moderate when it comes to the emergence of content and language, as they only argue that it is not impossible for them to emerge.

As a matter of fact, drawing on Hutto and Myin’s reference to Bar-On’s expressive behaviour as meaningful, and on their definition of REC as the extreme take on the “pragmatic turn” (Hutto & Myin, 2017: 37), it is possible to strengthen their sketched view and foster a continuist perspective on language. By adopting a pragmatist enactive approach, we can gain a better understanding of how meaning, language and contentful minds emerge from basic communicative interactions, with spontaneous expressive behaviour as the starting point of linguistic communication. Indeed, although RECers view public interaction as primarily based on symbol systems, they must necessarily presuppose a prelinguistic social interaction founded precisely on contentless social practices. Insofar as the normative order of social practices rests on a community model that imposes itself on the behaviour or states of individual community members in such a way as to confer mental content on them as well, the very possibility of the first community model turns out to derive from prelinguistic, contentless communicative interactions, initially founded on spontaneous expressive behaviours that we can assume to be gestural in nature.

Since the term ‘gesture’ has many uses and there are various classifications of gestures,Footnote 15 I should make clear that I refer first of all to gestures as spontaneous expressive behaviours, in line with both Bar-On and RECers.Footnote 16 Bar-On includes among spontaneous expressive behaviours “yelps, growls, teeth-barings, tail-waggings, fear barks, and grimaces, lip smacks, ground slaps, food-begging gestures, ‘play faces’ and play bows, copulation grimaces and screams, pant hoots, alarm, distress, and food calls, grooming grunts, … and so on” (Bar-On, 2013a: 317). All of these can be regarded as gestures.

Although spontaneous expressive behaviours can be referred to as gestures, this designation alone does not address the transition from prelinguistic to linguistic communication. We still need to understand how these gestures evolve into symbolic language. Answering this question is crucial for developing a phylogenetic explanation of the emergence of contentful cognition that goes beyond associationist or discontinuist views of cognition and language. To address the issue at hand, I suggest looking into Mead’s theory of gesture, which some more recent hypotheses on gesture and language have taken up (McNeill, 2005, 2012; Rizzolatti & Sinigaglia, 2008). By revisiting the key components of Mead’s theory, we can gain valuable insights for developing a continuist non-representationalist theory of language genesis that would overcome the divide between contentless Ur-intentionality and content-involving intentionality, i.e., a semantic propositional intentionality.

4 Emotions and gestures

What role does gesture play in the relationship between language and cognition? How can we explain the transition from gesture to an abstract language not directly linked to sensory experience?Footnote 17 To address these issues, Mead begins by investigating the psychophysiological aspects of emotionally charged situations in which organisms interact with their natural and social environment. His theory of gesture dates back to his early work with Dewey on a functionalist theory of emotion built on the better-known James-Lange theory (Dewey, 1894, 1895; Mead, 1895, 2001). In line with James’ focus on the behavioural aspect of emotion, Dewey and Mead elaborated a theory that anticipated and, in some respects, provided the basis for the organic circuit theory (Dewey, 1896).Footnote 18 In particular, they developed a theory of emotions that rooted the Jamesian perspective in a teleological view of the organic (ideo-)sensorimotor process, seeing in the expression of emotional attitudes the mark of an emotional attitude passing over into a communicative gesture.Footnote 19 I refer here in particular to what I consider Dewey and Mead’s “theory of emotion,” even if the best-known essays are Dewey’s. Mead’s contributions to the theory of emotion long remained unpublished. Nevertheless, some scholars maintain that this theory, like the better-known organic circuit theory, resulted from a collaboration between the two colleagues and friends.Footnote 20

According to Dewey and Mead, emotion is a disposition to respond to a problematic situation, that is, a goal-directed mode of behaviour that is reflected in the Affect, i.e., the emotional seizure, as a sensory consciousness of what is objectively expressed in purpose. Consistently with James’ theory of emotion, behaviour is therefore the condensation centre of organic activity, in which the emotional attitude is a sub-functional phase in the coordinated transactions of the acting subject. Such an emotional attitude is teleological in that bodily movements are not expressions of inner states but rather part of the organism’s conduct. The emotional attitude that arises from the breakdown of an ongoing act generates tension between established behavioural habits and the target situation, paving the way for a functional relationship of the organism with environmental stimuli, and with anything related to them, in terms of the selection and elaboration of appropriate responses to the given situation. Such a behavioural mode consists of the organic coordination between (ideo-)sensorimotor and vegetative-motor activities, the awareness of which would constitute the emotional seizure (Dewey, 1895: 180; Mead, 1895). More precisely, the act to be modified or corrected and its partial inhibition are the kinaesthetic activities that result in the sensory object or stimulus, while the vegetative-motor activities are the reaction or response to the object, the distinction between object and response being a functional interpretation rather than a distinction found in experience (Mead, 1894). Since emotion is the motor expression in the organic process, the distinction between stimulus and response presupposes an active and dynamic conception of the sensorimotor process, through which the organism becomes perceptually aware of the sensory stimulus at the moment it proves beneficial for restoring the interrupted act.

In a nutshell, from a psychological point of view, emotion is the adaptation of, or tension between, habit and ideal, while from a physiological point of view, organic changes resolve the organism’s effort to adapt to the situation. Indeed, according to Mead, what lies behind the emotional seizure is the stimulation of the vasomotor system: increased blood pressure and heart rate accompany the bodily movement of the instinctive act, which depends on nervous system activity built into the organism and triggered by relevant stimuli. Emotional attitudes and their correlated physiological fringes are teleological (bodily movements being primarily functional rather than expressive) and allow the organism a sensorimotor and vasomotor “evaluation of the act” (Mead, 1895: 164) before the coordination of the response to the reaction has been completed. This evaluation is therefore related to sensory discrimination, which is linked to motor adjustments and reactions.

In this abstract, Mead does not provide further details. However, as we delve into his work, we discover that he views discrimination as the most fundamental way in which organisms direct themselves toward something and make sense of it, even though it involves no intention. Mead proposes that the shift from sensory to symbolic stimuli is linked to preparatory movements that indicate the instinctive attitude to act in a particular way. He therefore locates the transition from purely instinctive to intentional behaviour in the early stages of the response to a sensible stimulus. If the emotional tension is overwhelming and the act is not carried out, the result is a pathogenic emotional expression (the cry of fear, for example); if, on the other hand, the tension is somehow redirected in an attempt to carry out the interrupted act that gave rise to the physiological seizure, the emotional seizure is eventually transformed into interest and voluntary conduct (Mead, 2001: 27–29). This would make it possible to explain retroactively why emotional attitudes, revealing instinctive acts inhibited over time, persist in more evolved forms of life and in their responses to symbolic stimuli. Symbolic stimuli are thus regarded as aesthetic stimuli, the earliest forms of which can be traced back to the war and love dances of earlier societies, i.e., social manifestations with social functions, which in their aesthetic reproduction embody the teleological character of the original instinctive acts.

The evolution from the instinctive act to the symbolic stimulus therefore lies in the qualitative shift from the selfish instincts, associated with the immediate satisfaction of the organism’s needs, to the social instincts, for whose satisfaction an individual’s conduct is “determined by the movements of other individuals” (Mead, 2001: 3).Footnote 21

Mead in fact addresses the Jamesian need for an empirical physiological approach to explaining emotion and the organic evolution of consciousness better than Dewey does.Footnote 22 Further elaborating on James’s idea of the entire circulatory system as a “sounding board,” Mead focuses on the instinctive act preceding the physiological reaction to the sensory stimulus, and on the close connection between the change in the body’s physical state and the qualitative differentiation of emotional tones in the first bodily movements preparatory to action. Moreover, his reference to discrimination as a natural selective attitude is akin to James’ “attention,” namely an organism’s capacity to select stimuli in a natural and social environment and readjust in relation to them (James, 1981). Dewey explicitly refers to Mead’s physiological explanation of emotion as the organic-functional coordination between sensorimotor and vegetative-motor activities (Dewey, 1894). However, it should also be noted that in the two texts from the 1890s Mead does not yet refer to the notion of gesture, whereas Dewey mentions it in his article, speculating that gestures and signs might evolve through selection (Dewey, 1894: 167n). It is therefore unclear whether this hypothesis of Dewey’s inspired Mead’s theory of gestural conversation, especially since Dewey merely mentions it in a footnote without elaborating on it (Garrison, 2003: 412). What seems certain is that for both authors the reference to gesture comes from Wilhelm Wundt. Dewey had used Wundt’s Grundzüge der physiologischen Psychologie in developing his Psychology (1891/1967), while Mead had attended Wundt’s courses during his doctoral period in Germany (cf. Cook, 1993: 20–6).

Of the two, however, it was for Mead that the Wundtian notion of gesture would become prominent: he later elaborated his behavioural theory of meaning and language, identifying gesture as a key element in the development of symbolic communication.

Nevertheless, Mead’s use of gesture is stripped of the associationist framework in which Wundt had embedded it, a framework that did not resolve the mind-body dichotomy. Indeed, according to Mead (1903, 1904, 1906), Wundt’s theory of language, like his psychological theory,Footnote 23 belonged to the associationist perspective that treated gesture as a phenomenon associated with psychic states. This prevented Wundt from accounting for the genesis of consciousness from gestural communication, since he characterised gesture only as an expression of emotions, wrongly assimilating cooperation to imitation.

In particular, Wundt (1912)Footnote 24 maintained that gesture arose as an affective expression and, since every affective state contains emotionally charged ideas, later became an expression of ideas as well. Indeed, a consensus of emotions among organisms can arise through the transmission of corresponding ideas between them. Accordingly, mimic language, the language of earlier human societies, does not result from intellectual reflection and intentional purpose; it emerges instead from involuntary expressive movements that accompany affectivity. Mimic language awakens in others both the emotion expressed by the individual performing the mimic gesture and the corresponding representation, so that others can respond with the same expressive movements or with slightly modified ones. The evolution of phonetic language can be explained in the same way as the evolution of the natural language of gestures. The difference is that the auditory faculty added phonetic gestures to mimic and pantomimic gestures, and the phonetic gestures quickly prevailed over the others owing to their greater observability and modifiability. From a psychological point of view, Wundt described this process as a succession of two acts. First, the expressive movements of individuals, which take the form of acts of impulsive volition and include changes in the organs of phonation, become dominant in response to a communicative need. Second, the associations between sound and representation that follow these movements slowly consolidate and spread throughout the speaking community. Further physical and psychic conditions then follow, leading to phonetic and semantic change.

According to Mead, Wundt’s theory is open to two main criticisms. The first concerns Wundt’s fundamental ambiguity in characterizing the stages of consciousness. Wundt relegated emotions and volitions to the original immediacy of unanalysed experience, so that they seemed to have no other purpose than to provide, in the form of a symbol, the sensible element necessary for knowledge mediated by representational concepts. In this way, they assumed a purely formal value. Accordingly, a dilemma between two methods of presentation underlies Wundt’s theory of language: on the one hand, he adopted a structure-based approach to the evolution of language, specifically of the associative and related processes that depend on the nervous system; on the other hand, he referred to a functional relationship that relies on attention and apperception. By describing apperception as a function of consciousness through which a psychic content is brought to sharper apprehension, immediately in the act of accomplishing itself, with the cooperation of attention (Wundt, 1913: 307–8), Wundt fell into a dualistic explanation that treats gesture as a phenomenon associated with psychic states, while failing to explain the nature and function of apperceptive processes of synthesis. Since he characterised gesture merely as an expression of emotion, he failed to account for the genesis of consciousness from gestural communication. Apperception refers to conscious activities in terms of elements, i.e., representational elements, that are already connected to associations between past and immediate experience. It seems, therefore, that the meaning of immediate experience must necessarily refer to contentful associations already worked out in consciousness. There is then no place for the subject’s active sensorimotor processing, but only for a permanent re-presentation of the same psychic contents acquired in the past.
Thus, the dualism between the sensorimotor process and psychic content is reintroduced, neglecting that “communication is fundamental to the nature of the so-called mind” (Mead, 2015: 50).

The second criticism concerns the mimetic character Wundt ascribed to the evolution of language.Footnote 25 Positing an imitative process does not account for the differences in responses to the same stimuli. Seeing and recording the movements of someone performing an action, or hearing and recording someone making a particular vocal gesture, does not warrant the idea that it is only through imitation that we learn the motor idea of that action or gesture. Instead, a process of interactive coordination must be presupposed prior to imitation, for imitation finds its proper place only within a theory of social stimuli and responses and of the social situations that create these stimuli and responses. As Mead writes:

[imitation] gives no solution for the origin of language. We have to come back to some situation out of which we can reach some symbol that will have an identical meaning, and we cannot get it out of a mere instinct of imitation, as such. There is no evidence that the gesture generally tends to call out the same gesture in the other organism. […] as soon as you recognize in the organism a set of acts which carry out the processes which are essential to the life of the form, and undertake to put the sensitive or sensory experience into that scheme, the sensitive experience, as stimulus we will say to the response, cannot be a stimulus simply to reproduce what is seen and heard; it is rather a stimulus for the carrying out of the organic process. (Mead, 2015: 59-60)

5 Gestures as inhibited acts

As we have seen so far, for Mead language must be rooted in the social nature of primitive instincts. The close intertwining of the biological and social dimensions is grafted onto an evolutionary perspective that points to unreflective social conduct as the expression of biological mechanisms underlying the development of reflexive conduct, thus rooting the capacity for symbolic communication in the process of biological-relational evolution. More specifically, the elements of coordination of social behaviour and communication are already present in the initial phases of instinctive acts and their physiological correlates, which are characterized by emotional content and expression. As Mead writes:

Before conscious communication by symbols arises in gestures, signs, and articulate sounds there exists in these earliest stages of acts and their physiological fringes, the means of co-ordinating social conduct, the means of unconscious communication. And conscious communication has made use of these very expressions of the emotion to build up its signs. They were already signs. They had been already naturally selected and preserved as signs in unreflective social conduct before they were specialized as symbols. (Mead, 2001: 3)

It is important to note that when Mead uses the term “unconscious”, he is referring to a stage in evolution at which organisms were not yet “conscious”, meaning they were not aware of their attitudes towards a particular situation. The term “unconscious” therefore pertains in the first instance to thoughtless, automatic reflexes (James, 1981: I, 36–8). However, Mead refers to unconscious communication as part of acts articulated in multiple phases, and to the expressions of emotion as functional to coordinating social conduct. This reveals a complexity in unconscious reflex actions that cannot be reduced to a simple stimulus-response schema and that indeed undermines any hard-and-fast separation of stimulus and response. As seen above, emotional attitude is an integral part of the interrupted motor act and cannot be detached from bodily movement. Hence, the stimulus cannot be conceived as coming before the organism’s selective attitude. As emotions are the motor expression in the organic process, distinguishing between stimulus and response requires an active and dynamic perception of the sensory process. This process allows the organism to become sensorially conscious of the stimulus when, following the interruption of the act, this proves useful. Mead’s concept of “unconscious communication” thus refers to communication that uses emotional attitudes and their physiological fringes as naturally selected signs. This means that emotions are immediately communicative and precede intentional communication, being present in the early stages of social acts and their physical correlates.

The inhibition of action, due to a conflict of instincts mediated by the situation in which the organism finds itself, brings into play the preparation of the act in its early stages. At these stages, emotional tension takes on the function of indicating to other organisms the response the organism is about to make to the stimulus received, so that those to whom the expression is addressed can, in turn, respond to the first organism’s stimulus.

We can now better understand the nature and role of gesture according to Mead. As seen above, expressive bodily movements are primarily functional acts, and the expressions of emotions are part of teleologically determined movements. In particular, they are the reduction of movements and stimulations initially functional to the performance of the act into attitudes to act in response to some stimuli. It is in these reductions of expressions to attitudes that the initial elements of communication can be found. The emotional attitude is what was once a complete activity, e.g. attacking an enemy, which in the course of evolution has been reduced to a tendency to act and which, through functional co-option, has come to underlie the expressive-communicative device. For example, a dog’s snarling in anticipation of a fight is the appropriate response to a given external stimulus. However, once the attack is inhibited, the snarling remains as the expression of the aborted act and takes on the value of a stimulus for the one to whom it is addressed (Mead, 2015: 14).

Therefore, gestures are truncated acts, namely the early stages of social acts, which mediate the appropriate responses of other individuals in the same group. They are preparations of the act, i.e., inhibited behaviours that have become expressive. These early stages include the beginnings of “hostility, wooing and parental care,” the control of the sense organs that precedes and directs manifest behaviour, the bodily attitudes that express readiness to act and the direction the act will take, and the vasomotor preparations for action (flushing of the blood vessels, changes in the rhythm of breathing, etc.). These early stages of animal reactions are stimuli for forms whose life is conditioned by the behaviour of others. Thus, the early stages of social acts “must become in the evolutionary process particularly effective as stimuli or, on the contrary, social forms must become particularly sensitive to these early manifest stages of social acts” (Mead, 1964: 123–4). This also explains how certain gestures that initially constituted the beginning of an act persisted in the evolutionary process by modifying their original function. In other words, they underwent what, following Gould and Vrba (1982), we could call an “exaptation,” which led them to become stimuli for a given response in another form of life. Mead gives the examples of courtship and fighting, in which gestures mediate the sequence of stimuli relating to reproductive and hostile responses.

Even in the conduct of animal forms lower than human beings, this interplay of preliminary and preparatory processes places the animals en rapport with one another and turns wooing, quarrelling, and animal play into relatively independent activities that answer to human intercourse.

Behind these manifestations are the emotions that arise when an act is interrupted. However, a gesture is not merely the psychophysical equivalent of emotional consciousness (Wundt), nor is it reducible to the expression of an emotion (Darwin), nor is its function just that of releasing the excess of energy generated in adjusting oneself to the indication of actions on the part of the other individual (Dewey). Although the gesture reveals an emotion, its primary function is to promote the reciprocal adaptation of a changing social response to a changing social stimulation when stimulus and response are in the first overt stages of social acts (Mead, 1964: 125).

The passage from sensorial to symbolic stimulus is rooted in the qualitative differentiation of emotional tones expressed in the different instinctive attitudes. In particular, the emotional attitude expressed in inhibited acts is the first phase in the rise of meaning from the gestural interaction between organism and environment and from the mutual adaptation between social stimulus, individual response, and the activities in which these processes eventually issue. Reference to the original situation of social interaction would not by itself have allowed bodily and vocal gestures to become meaningful. It was, first of all, the transformation of changes in the expression of the other individuals involved in the act, from a mere outcome of nervous excitement into meaning, that allowed the development of communication, shared understanding, and mutual recognition within the field of social interaction.

Accordingly, neither the emergence of social consciousness nor the development of human communication is based on an imitative process (pace Wundt). As Mead bluntly puts it: “Imitation becomes comprehensible when there is a consciousness of other selves, and not before” (Mead, 1964: 100). What, then, are they based on?

The probable beginning of human communication was in cooperation, not in imitation, where conduct differed and yet where the act of the one answered to and called out the act of the other. The conception of imitation as it has functioned in social psychology needs to be developed into a theory of social stimulation and response and of the social situations which these stimulations and responses create. Here we have the matter and the form of the social object, and here we have also the medium of communication and reflection. (Mead, 1964: 101)Footnote 26

Before the mimetic process, there must be another interactive process, namely cooperation, which is at the basis of the organic process as well as of human communication.

It is worth noting that by the term ‘cooperation’ Mead seems to mean something different from, and broader than, prosocial behaviour, pointing to the reciprocal reactions of organisms involved in interaction to one another’s actions, whether these interactions are antagonistic or collaborative. It would therefore be better to use the term ‘coordination’ instead of cooperation to refer to the gestural interactions that make organisms evolve towards competitive or cooperative social acts. Without such coordination, it would not be possible to determine the type of social situation they constitute.

Underlying gestural coordination is the emotional attitude understood as a relational property: emotions are co-constitutive of the interactions they coordinate. Gestures are, therefore, a communication system.

6 Sense of meaning, selective attitude, and intention

Coordination makes possible the emergence of meaning, i.e., the organic response to some social and natural stimuli. Indeed, meaning has a bio-social nature expressed in gestures that show a functional identity of the responses of individuals to the same stimulus. This identity is rooted in the coordinative behavioural attitude of individuals as the manifestation of the social character of natural instincts. It is worth noting that Mead distinguishes between two modes of meaning: a sense of meaning and a consciousness of meaning. The sense of meaning is a “feeling of attitude” concerning “the coordination between the process of stimulation and that of response when this is properly mediated” (Mead, 1964: 125). In other words, the sense of meaning is the readiness to respond to natural and social stimuli. This point is particularly crucial, for it paves the way for a direct comparison with RECers’ theory of meaning and Ur-intentionality, providing further valuable elements that highlight the richness of Mead’s pragmatist proposal.

Much as Hutto and Myin distinguish between basic contentless and contentful cognition, Mead distinguishes the ability to clearly recognize the different elements in the contents of consciousness from the tendencies to react to different stimulations. As he maintains, in reacting to the stimulations involved in an ongoing act, it is difficult to detect the contents of the response, “either in terms of the attitude of body, the position of the limbs, feel of contracting muscles, or in terms of the memory of past responses” (Mead, 1964: 126). This difficulty arises because immediate conduct is controlled by recognized differences in the field of stimulation, so that the analysed elements of content are of negligible importance:

It is the difference in the visual or auditory or tactual experience which results in changed response. It is the failure to secure a difference in these fields that leads to renewed effort. We are conscious of muscular strain to some degree, but attention follows the changing objects about us that register the success or failure of the activity. It is further true that the more perfect the adjustment between the stimulation and response within the act the less conscious are we of the response itself. Of incomplete adjustment we are aware as awkwardness of movement and uncontrolled reactions. Perfection of adjustment leaves us with only the recognition of the sensuous characteristics of the objects about, and we have only the attitude of familiarity to record the readiness to make a thousand responses to distinctions of vision, sound and feel that lie in our field of stimulation. Yet the meaning of these distinctions in sense experience must lie in the relation of the stimulation to the response. (Mead, 1964: 126-7. Italics added)

Organisms do not interact with the world by abstracting and analyzing elements of the environment. They instead organically interact with the environment that stimulates their responses, that is, they enact the world around them.

Mead’s “sense of meaning” can be seen as a precursor of RECers’ teleosemiotics, understood as a contentless way in which organisms interact purposefully with the world around them. As we have seen, underlying teleosemiotics is “Ur-Intentionality,” namely a primitive kind of intentionality to which the sense-reference distinction does not apply. REC transforms classical teleosemantics, according to which mental representations have the biological function of enabling organisms to keep track of specific worldly items, into a teleosemiotics, namely a teleological explanation of the interaction between an organism and its world that is not characterized in semantic terms such as reference or truth. The aim of this modification is to explain the semantics of language through a basic semantic rule-following to be found in the natural world. RECers refer to natural relations between organisms and their environments to explain Ur-intentionality. For this reason, they retain from teleosemantics the idea that “intentional directedness has a normative dimension such that it does not reduce to mere behavior or dispositions” (Hutto & Myin, 2017: 116), and they attribute a biological and evolutionary nature to this normative dimension, which determines the objects of intentional attitudes.

While Mead’s sense of meaning is particularly akin to Hutto and Myin’s teleosemiotics, he does not explicitly refer to intention to explain an organism’s directedness towards something. Instead, he refers to an instinctual form of basic cognition: an organism’s selective attitude “toward its environment and the readjustment that follows upon such a selection”. This selection, which Mead calls “discrimination”, is in the higher forms of cognition “the pointing-out of things and the analysis in this pointing. This is a process of labeling the elements so that you can refer to each under its proper tag, whether that tag is a pointing of the finger, a vocal gesture, or a written word” (Mead, 1936: 350–1). In other words, at the root of the basic forms of life’s directedness towards elements of the environment there is a natural selective attitude, close to what James (1981) called “attention,” namely an organism’s capacity to select stimuli in a natural and social environment and to readjust in reference to them. This capacity, a nearly ubiquitous notion in contemporary neuroscience (see Bisley & Goldberg, 2010), is rooted in a biological preconscious function arising from the interaction between neural signals and social and natural environmental stimuli. Discrimination is the most basic form of knowledge, that is, the most basic way of “getting the tools” (Mead, 1936: 351) to enact through gestures the natural and social world around us.

Thus, on the one hand, discrimination seems closely akin to Ur-intentionality; on the other hand, gestures can be regarded as “sensitive responses to natural signs” (Hutto & Myin, 2013: 78), that is, to informational natural and social stimuli. Accordingly, Mead’s idea of the functional identity of the gestural responses of individuals to the same stimulus can be seen as analogous to RECers’ “teleosemiotic” uniformity at the basis of the genesis of semantics through the conditioning of bio-social canons and structures that have their roots in prelinguistic behavioural attitudes.

In a nutshell, gestures are natural signs that are part of, and contribute to, the development of organisms’ selective capacity, and hence to the direction of their attention in gaining some sense of meaning of the world around them. They also mark the continuity between the sense of meaning and the emergence of the “consciousness of meaning,” i.e., the ability to associate a stimulus with a mental content, which is the basis for the emergence of symbolic language. The transition between these two modes of meaning supports a strongly continuist hypothesis linking the preverbal and linguistic dimensions of cognition. Let us see how.

7 Meaning and consciousness

In the interactions of basic minded organisms, the contentless character of the sensorimotor cognition involved, i.e. the sense of meaning, prevails:

the interplay of social conduct turns upon changes of attitude, upon signs of response. In themselves these signs of response become simply other stimulations to which the individual replies by means of other responses and do not at first seem to present a situation essentially different from that of the man hesitating before the uncertainties of the morning sky. (Mead, 1964: 130)

In such interactions, attention is predominantly enactive, in the sense that action is closely and immediately intertwined with the social stimuli involved in the ongoing act. Here, the value of social stimuli is not “represented,” so to speak, in a mind that discriminates them according to truth criteria.

Nonetheless, from these kinds of interactions a new type of interaction emerged in the evolutionary history of organisms, in which the construction of the coordinated act was closely intertwined with the ability to anticipate others’ responses to one’s gestures, so that an unintelligent gesture acquired “just the value which is connoted by signification, both in its specific applications and in its universality” (Mead, 1964: 246). The ability to anticipate others’ reactions to one’s own behavioural attitudes is thus a further step toward language and contentful cognition.

In this framework, the gesture is no longer an immediate reaction to a specific stimulus. On the contrary, it becomes the interweaving of sensorimotor stimuli and the ideo-sensorimotor anticipations of interacting subjects, governed by norms of perception and action. The consciousness of meaning, i.e., the organism’s awareness of its attitude towards the situation to which it is about to react, arises only in the reciprocal adaptive relationship between social stimulation and response and the activities to which these processes eventually lead. Accordingly, the consciousness of meaning naturally emerges as an intentional capacity in social conduct through mutual adaptation and the activities in which these processes eventually issue. As Mead writes:

the feels of one’s own responses become the natural objects of attention, since they interpret first of all attitudes of others which have called them out, in the second place, because they give the material in which one can state his own value as a stimulus to the conduct of others (Mead, 1964: 132).Footnote 27

The ability to be aware of one’s own actions is intertwined with the ability to feel one’s own responses as a way of interpreting others’ behavioural attitudes and, through them, of influencing their conduct, i.e., responding to social stimuli in an active manner. The meaning of the gestures involved in these interactions is not based on representation. It is instead a relational “mode of presentation” (Thompson, 2018) of evolving content implying imagery, namely the property of a particular field of interacting events related to the change in the other’s gestural response and to the agent’s physiological mechanism, which “arouse the tendency to respond in still different fashion” (Mead, 1964: 133). Imagery is so merged with the attitudes which call it out, as well as “with incipient muscular reactions, that it is difficult to define and isolate it in our actual experience” (Mead, 2002/1932: 96). In other words, the evolving content implying imagery is enactively identical to gestures and behavioural attitudes in interactions with others. Its physiological mechanism relies on the central nervous system, by means of which, too, the genesis of human minds “out of the human social process of experience and behavior – out of the human matrix of social relations and interactions – is made biologically possible in human individuals” (Mead, 2015: 237n).Footnote 28

Of course, sense of meaning and consciousness of meaning are not antithetical, and it would be a mistake to see them as two alternative ways of attributing meaning. Likewise, from a Meadian pragmatist perspective, it would be a mistake to distinguish between an instinctive and a conscious reaction to social stimuli that underlie two different kinds of recognition, one “of impulse or instinct”,Footnote 29 namely a contentless re-cognition, “and another of reason”, namely a contentful re-cognition. This mistake would be even worse if we assumed that mental contents “do not arise within the impulsive life and form a real part thereof” (Mead, 2015: 347–8).

8 From vocal gesture to symbolic language

In order to complete the framework sketched so far, one more element must be added, which is essential to account for the transition from gesture to language: the vocal gesture.

According to Mead, the vocal gesture is the most important of all gestures, for when a form hears her gesture as others hear it, the opportunity and the means arise to analyse and bring to consciousness her responses, her habits of action, as distinct from the stimuli that call them forth. In other words, when an organism that makes use of a vocal gesture hears the resulting sound, there arises within it at least a tendency to respond in the same way as the other organism has been aroused to respond. Accordingly, the vocal gesture becomes a significant symbolic gesture when it has the same effect on the agent as on the recipient of the gesture:

In the case of the vocal gesture the form hears its own stimulus just as when this is used by other forms, so it tends to respond also to its own stimulus as it responds to the stimulus of other forms. […] The vocal gesture, then, has an importance which no other gesture has. We cannot see ourselves when our face assumes a certain expression. If we hear ourselves speak we are more apt to pay attention. One hears himself when he is irritated using a tone that is of an irritable quality, and so catches himself. But in the facial expression of irritation the stimulus is not one that calls out an expression in the individual which it calls out in the other. One is more apt to catch himself up and control himself in the vocal gesture than in the expression of the countenance. […] If we exclude vocal gestures, it is only by the use of the mirror that one could reach the position where he responds to his own gestures as other people respond. But the vocal gesture is one which does give one this capacity for answering to one's own stimulus as another would answer. (Mead, 2015: 65-6)

In evolutionary history, the vocal gesture, which had accompanied the behavioural gesture, eventually replaced it. With the transition from vocal to symbolic gesture, the original expression of sensation is replaced by a symbolic vocal gesture.

Therefore, the human ability to reproduce the conversation of gestures proper to social dynamics has its condition in the vocal gesture, i.e., in the subject’s ability to influence herself in the same way she influences the other. Vocal gestures embody organisms’ practical involvement with the environment, interwoven with the evolution of semantic intentionality out of a behaviour-based sense of meaning rooted in the organism’s capacity to discriminate and respond to social stimuli. Language does nothing but identify a situation that already exists, logically and emotionally, through the social process. This also means that the consciousness of meaning can be described, explained, or defined in terms of symbolic language only in its highest and most complex phase of development, the phase it reaches in human experience. Symbolic language is, according to Mead, merely a significant or conscious gesture, namely “a highly specialized form” of gesture (Mead, 1964: 132).

Accordingly, language “is not an affair of the individual soul” that represents it as a content of mind, and “its laws are frequently generalizations which would not have the slightest meaning if read into terms of the experience of the individual soul” (Mead, 2015: 377–8), that is, as the expression of mental states that evaluate it solely in terms of the semantic properties of truth and reference. In other words, language is not a “Language of Thought.” On the contrary, its laws are often generalisations of practical uses of symbols that cannot have the slightest meaning when interpreted in terms of truth and accuracy.

9 Mead’s bio-social theory of gesture and its relevance today

As we have seen so far, Mead proposes a theory of the development of symbolic language, suggesting that the transition from non-verbal to verbal communication rests on communicative gestures. He believes that living organisms have a natural inclination towards social coordination, which he calls the “social character of instincts”. This character is rooted in emotional interactions but is not the same as the physiological response that accompanies them.

As mentioned previously, Mead’s theory of gesture and language has influenced more recent hypotheses. Some authors have suggested that the discovery of mirror neurons could support Mead’s idea of “physiological fringes” behind unconscious communication. The mirror system may be the physiological counterpart to the conditioning of bio-social canons and structures rooted in innate behavioral attitudes of coordination, which could lead to the emergence of human communication. Rizzolatti and Sinigaglia (2008: 50, 155) quote Mead’s Concerning Animal Perception (1908) to support the hypothesis that the genesis of human language is rooted in the manipulative-gestural capacity coupled with the expression of emotions. They suggest that our pre-reflective understanding of the gestures of others, which is linked to the activation of the mirror system in our brains, played a crucial role in the evolution of language. In other words, the development of language may have been supported by our ability to recognize and imitate the gestures of others at a subconscious level.Footnote 30

In describing the function of mirror neurons, Rizzolatti and Sinigaglia refer to the “comprehension of the meaning of ‘motor events’, i.e. actions performed by others” (Rizzolatti & Sinigaglia, 2008: 97), understood as the immediate perception of the meaning of these ‘motor events’ and their interpretation in terms of “intentional acts” (Rizzolatti & Sinigaglia, 2008: 98). They further specify that by “understanding” they do not mean “explicit or even reflexive knowledge”, but “much more simply” the “ability to immediately recognise a specific type of action in the observed ‘motor events’, a specific type of action that is characterised by a particular way of interacting with objects; to differentiate this type of action from another and, finally, to use this information to respond in the most appropriate way” (Rizzolatti & Sinigaglia, 2008: 97–8). However, the mere activation of these neurons is not enough to fully comprehend such events and to attribute intentionality to the actions of others. The mimetic mechanism of neurons alone cannot account for the ability to differentiate between the responses of different organisms to a single stimulus.Footnote 31 As Mead pointed out, observing someone’s actions or hearing their vocal gestures does not automatically mean that we have acquired the motor idea of that action or gesture through neural-mimetic activation.Footnote 32

Notably, Rizzolatti and Sinigaglia specify that an observer’s understanding of movements depends on the vocabulary of actions at their disposal, which determines their possibilities of action (Rizzolatti & Sinigaglia, 2008: 96). The question then arises: how does one acquire this vocabulary? The process of imitation presupposes social coordination, which is based on natural equipment. According to Mead (2015: 237n), the human central nervous system is responsible for the biological development of minds and selves within the context of human social relations and interactions. While this biological equipment underlies social instincts, it is not sufficient: certain behavioral attitudes of an individual must also serve as stimuli that lead others to respond in a specific way. As Mead writes:

an organization of social instincts gives rise to many situations which have the outward appearance of imitation, but these situations – those in which, under the influence of social stimulation, one form does what others are doing – are no more responsible for the appearance in consciousness of other selves that answer to our own than are the situations which call out different and even opposed reactions. (Mead, 1908/1964: 100)

Mead’s perspective implies that a neural simulation mechanism should be regarded as an automatic, embodied mechanism in which the neural recognition of a conspecific’s behavior serves as a precondition. Together with social interactions, this mechanism enables the teleological process of interpreting and differentiating gestures. This view emphasizes the interdependence and reciprocal conditioning between neural activation and the process of cooperative interaction that takes place within an organic relationship.

It is important to note that the notion of “simulation” means different things to different neuroscientists. Gallese and Goldman (1998), for instance, define simulation as the neural mechanism that allows us to understand the minds of others. Others, by contrast, regard simulation as a conscious process that requires the deliberate reenactment of previously performed actions (Decety & Ingvar, 1990).Footnote 33

To avoid confusion, it is important to distinguish between two meanings of simulation: one referring to the functioning of neurons and the other to the higher cognitive processes involved in relational experiences. When higher cognitive processes are involved, simulation indicates an organism’s capacity to project itself into the position of another in order to understand its intentions. When referring to mirror neurons, by contrast, simulation is an automatic mechanism that forms the basis for recognizing the gestures of others. This mechanism is at the core of interpreting and distinguishing gestures: a gesture holds meaning only when it elicits in the organism making it the same response that it elicits in the recipient, while still allowing for a distinct reaction.

David McNeill (2005, 2012) is another author who has argued that the discovery of the mirror system provides scientific support for Mead’s theory of gesture. He proposed a hypothesis about the evolution of language that aligns with Mead’s perspective, stressing the importance of the relationship between spoken language, gestural communication, and perception. Referring to what he calls “Mead’s Loop”, he proposes that spoken language and gesture evolved together. In particular, gesture plays a key role in the language system owing to organisms’ ability, rooted in the mirror neuron circuit, to respond to their own gestures in the same way as others do. Essentially, the mirror system provides a biological basis for an organism’s assuming or using the gesture that another organism would use and responding, or tending to respond, to it in the same way (Mead, 1964: 243), thus enabling the organism to attribute meaning to its own gesture. McNeill focuses on the function of social stimulus that gestures take on, as this function brings to light a kind of self-socialisation of organisms that depends on their biological-social mechanism of simulating others’ responses. According to McNeill, the evolution of gestural interaction has given rise to two key components: imagery and social context. These components emphasise, on the one hand, organisms’ ability to synchronise gestures with vocalisation on the basis of meanings beyond the actions themselves and, on the other hand, the cooperation of neural circuits in organising sequential actions through the meanings of the actions themselves (McNeill, 2005: 50–2). This idea aligns with Mead’s notion of imagery as enactively identical to gestures and behavioural attitudes in interactions with others. Imagery, so understood, can be seen as an embedded property of interacting events and of the agent’s physiological mechanisms, which also biologically enable intentional, skillful and unreflective bodily activities.
This idea is also in line with Hutto’s Mimetic Ability Hypothesis (Hutto, 2008: 206ff), according to which the growth of imaginative and recreative abilities underlies impressive mimetic skills. These skills best account for the sophisticated social exchanges of hominids, including those involved in their capacity to form and learn symbolic language.Footnote 34

10 Conclusion

In this paper, I have argued in favour of a pragmatist enactive approach to action, cognition and language. This perspective can provide a continuist view that complements REC’s account of how contentful cognition and language may have emerged in nature. I have focused specifically on Mead’s pragmatist theory of gesture to support this argument. Mead’s proposal allows us to understand the transition from gestural interaction to symbolic language as strictly intertwined with the emergence of contentful cognition from contentless cognition. By making the imitative mechanism dependent on interactive coordination between organisms, Mead stresses the intertwining of the biological and relational dimensions of communication, offering a non-reductive naturalistic explanation of the emergence of language. Mead’s sense of meaning offers some clarifying elements regarding the possibility of placing the emergence of RECers’ Ur-intentionality within a naturalised framework of evolutionary continuity. Since meaning is external to the mind, gestural interaction and verbal communication can be regarded as elements of a primarily enactive cognition that does not require representational references to function. Mead further develops his behavioural theory of meaning as the basis for the evolution of symbolic language from gestural communication, taking the difference between a sense of meaning and a consciousness of meaning as his starting point. By treating gestures as practical involvement with the environment, Mead provides the basis for a theory of the consciousness of meaning that precedes the development of semantic intentionality. Language is not an extraordinary event but an extension of our primitive behaviour, i.e., the mastery of the ability to use words and gestures as tools for action. It is simply one of the tools with which we enact the world around us.
As argued in the previous section, the identification of mirror neurons has led to renewed interest in Mead’s theory as a means of elucidating the innate social dimension and communicative ability of humans via the conditioning of bio-social canons and structures. Hence, I believe Mead’s theory of gesture should be included among the extensive array of present-day studies that propose a continuity between action, gesture, cognition and language. His theory connects prelinguistic and linguistic communication, providing a continuist perspective on the evolution of symbolic language and contentful cognition.