“Language is to the mind more than light is to the eye.”

~ William Gibson

Chapter 11

Language

OUTLINE

The Anatomy of Language

Brain Damage and Language Deficits

The Fundamentals of Language in the Human Brain

Language Comprehension

Neural Models of Language Comprehension

Neural Models of Speech Production

Evolution of Language

H.W., A WW II VETERAN, was 60, robust and physically fit, running his multi-million-dollar business, when he suffered a massive left-hemisphere stroke. After partially recovering, H.W. was left with a slight right-sided hemiparesis (muscle weakness) and a slight deficit in face recognition. His intellectual abilities were unaffected, and in a test of visuospatial reasoning, he ranked between the 91st and 95th percentiles for his age. When he decided to return to manning the helm of his company, he enlisted the help of two of his sons because he had been left with a difficult language problem: He couldn’t name most objects.

H.W. suffered from a severe anomia, the inability to find the words to label things in the world. Testing revealed that H.W. could retrieve adjectives better than verbs, but his retrieval of nouns was the most severely affected. He could understand both what was said to him and written language, but he had problems naming objects, though not with speaking per se. As the case of H.W. shows, anomia can be strikingly discrete. In one test where he was shown 60 items and asked to name them, H.W. could name only one item, a house. He was impaired in word repetition tests, oral reading of words and phrases, and generating numbers. Although he suffered what most would consider a devastating brain injury, H.W. was able to compensate for his loss. He could hold highly intellectual conversations through a combination of circumlocutions, pointing, pantomiming, and drawing the first letter of the word he wanted to say. For instance, in response to a question about where he grew up, he replied to researcher Margaret Funnell:

H.W.: It was, uh... leave out of here and where’s the next legally place down from here (gestures down).
M.F.: Down? Massachusetts.
H.W.: Next one (gestures down again).
M.F.: Connecticut.
H.W.: Yes. And that’s where I was. And at that time, the closest people to me were this far away (holds up five fingers).
M.F.: Five miles?
H.W.: Yes, okay, and, and everybody worked outside but I also, I went to school at a regular school. And when you were in school, you didn’t go to school by people brought you to school, you went there by going this way (uses his arms to pantomime walking).
M.F.: By walking.
H.W.: And to go all the way from there to where you where you went to school was actually, was, uh, uh (counts in a whisper) twelve.
M.F.: Twelve miles?
H.W.: Yes, and in those years you went there by going this way (pantomimes walking). When it was warm, I, I found an old one of these (uses his arms to pantomime bicycling).
M.F.: Bicycle.
H.W.: And I, I fixed it so it would work and I would use that when it was warm and when it got cold you just, you do this (pantomimes walking). (Funnell et al., 1996, p. 180)

Though H.W. was unable to produce nouns to describe aspects of his childhood, he did use proper grammatical structures and was able to pantomime the words he wanted. He was acutely aware of his deficits.

H.W.’s problem was not one of object knowledge. He knew what an object was and its use. He simply could not produce the word. He also knew that when he saw the word he wanted, he would recognize it. To demonstrate this, he would be given a description of something and then asked how sure he was about being able to pick the correct word for it from a list of 10 words. As an example, when asked if he knew the automobile instrument that measures mileage, he said he would recognize the word with 100% accuracy.

We understand from H.W. that retrieval of object knowledge is not the same as retrieval of the linguistic label (the name of the object). You may have experienced this yourself: Sometimes when you try to say someone’s name, you can’t come up with the correct one, but when someone tries to help you and mentions a bunch of names, you know for sure which ones are not correct. This experience is called the tip-of-the-tongue phenomenon. H.W.’s problems further illustrate that the ability to produce speech is not the same thing as the ability to comprehend language, and indeed the networks involved in language comprehension and language production differ.

Of all the higher functions humans possess, language is perhaps the most specialized and refined and may well be what most clearly distinguishes our species. Although animals have sophisticated systems for communication, the abilities of even the most proficient of our primate relatives are far inferior to those of humans. Because there is no animal homolog for human language, language is less well understood than sensation, memory, or motor control. Human language arises from the abilities of the brain and thus is called a natural language. It can be written, spoken, and gestured. It uses symbolic coding of information to communicate both concrete information and abstract ideas. Human language can convey information about the past, the present, and our plans for the future. Language allows humans to pass information between social partners. We also can gain information from those who are not present as well as from those who are no longer alive. Thus, we can learn from our own experiences and from those of previous generations (if we are willing to).

In this chapter, we concentrate on the cognitive neuroscience of language: how language arises from the structure and function of the human brain. Our current understanding began with the 19th-century researchers who investigated the topic through the study of patients with language deficits. Their findings produced the “classical model” of language, which emphasized that specific brain regions performed specific tasks, such as language comprehension and language production.

In the 1960s, researchers developed a renewed interest in studying patients with language deficits to understand the neural structures that enable language. At the same time, psycholinguistics, a branch of psychology and linguistics, used a different approach, concentrating on the cognitive processes underlying language. Cognitive neuroscience incorporates these neuropsychological and psycholinguistic approaches to investigate how humans comprehend and produce language. The development of new tools, such as ERP recordings and high-resolution functional and structural neuroimaging (Chapter 3), has accelerated our discovery of the brain bases of language, creating a revolution in cognitive neuroscience. The classical model of language is now being replaced by a new language systems approach, in which investigators are identifying brain networks that support human language and revealing the computational processes they enable.

In this chapter, we discuss how the brain derives meaning from both auditory speech input and visual language input, and how it in turn produces spoken and written language output to communicate meaning to others. We begin with a quick anatomical overview and then describe what we have learned from patients with language deficits. Next, with the help of psycholinguistic and cognitive neuroscience methods, we look at what we know about language comprehension and production and some current neuroanatomical models. Finally, we consider how this miraculous human mental faculty may have arisen through the course of primate evolution.

ANATOMICAL ORIENTATION

The anatomy of language

Language processing occurs primarily in the left hemisphere. Many regions on and around the Sylvian fissure form a language-processing network.

The Anatomy of Language

Split-brain patients as well as patients with lateralized, focal brain lesions have taught us that a great deal of language processing is lateralized to the left-hemisphere regions surrounding the Sylvian fissure. Neuroimaging data, cortical stimulation mapping, and electrical and magnetic brain recording methods are revealing the details of the neuroanatomy of language. Language areas include the left temporal cortex, which includes Wernicke’s area in the posterior superior temporal gyrus, portions of the left anterior temporal cortex, the inferior parietal lobe (which includes the supramarginal gyrus and the angular gyrus), the left inferior frontal cortex, which includes Broca’s area, and the left insular cortex (see the Anatomical Orientation box). Collectively, these brain areas and their interconnections form the left perisylvian language network of the human brain, so named because they surround the Sylvian fissure (Hagoort, 2013).

The left hemisphere may do the lion’s share of language processing, but the right hemisphere does make some contributions. The right superior temporal sulcus plays a role in processing the rhythm of language (prosody), and the right prefrontal cortex, middle temporal gyrus, and posterior cingulate activate when sentences have metaphorical meaning.

Language production, perception (think lip reading and sign language), and comprehension also involve both motor movement and timing. Thus, all the cortical (premotor cortex, motor cortex, and supplementary motor area—SMA) and subcortical (thalamus, basal ganglia, and cerebellum) structures involved with motor movement and timing that we discussed in Chapter 8 make key contributions to our ability to communicate (Kotz & Schwartze, 2010).


TAKE-HOME MESSAGES


Brain Damage and Language Deficits

Before the advent of neuroimaging, most of what was discerned about language processing came from studying patients who had brain lesions that resulted in aphasia. Aphasia is a broad term referring to the collective deficits in language comprehension and production that accompany neurological damage, even though the articulatory mechanisms are intact. Aphasia may also be accompanied by speech problems caused by the loss of control over articulatory muscles, known as dysarthria, and deficits in the motor planning of articulations, called speech apraxia. Aphasia is extremely common following brain damage. Approximately 40% of all strokes (usually those located in the left hemisphere) produce some aphasia, though it may be transient. In many patients, the aphasic symptoms persist, causing lasting problems in producing or understanding spoken and written language.

Broca’s Aphasia

FIGURE 11.1 Broca’s area.
(a) The preserved brain of Leborgne (Broca’s patient “Tan”), which is maintained in a Paris museum. (b) Shading identifies the area in the left hemisphere known as Broca’s area.

Broca’s aphasia, also known as anterior aphasia, nonfluent aphasia, expressive aphasia, or agrammatic aphasia, is the oldest and perhaps best-studied form of aphasia. It was first clearly described by Parisian physician Paul Broca in the 19th century. He performed an autopsy on a patient who for several years before his death could speak only a single word, “tan.” Broca observed that the patient had a brain lesion in the posterior portion of the left inferior frontal gyrus, which is made up of the pars triangularis and pars opercularis, now referred to as Broca’s area (Figure 11.1). After studying many patients with language problems, Broca also concluded that brain areas that produce speech were localized in the left hemisphere.

In the most severe forms of Broca’s aphasia, single-utterance patterns of speech, such as that of Broca’s original patient, are often observed. The variability is large, however, and may include unintelligible mutterings, single syllables or words, short simple phrases, sentences that mostly lack function words or grammatical markers, or idioms such as “Barking up the wrong tree.” Sometimes the ability to sing normally is undisturbed, as might be the ability to recite phrases and prose, or to count. The speech of Broca’s aphasics is often telegraphic, coming in uneven bursts, and very effortful (Figure 11.2a). Finding the appropriate word or combination of words and then executing the pronunciation is compromised. This condition is often accompanied by apraxia of speech (Figure 11.2b). Broca’s aphasics are aware of their errors and have a low tolerance for frustration.

Broca’s notion that these aphasics had only a disorder in speech production, however, is not correct. They can also have comprehension deficits related to syntax (rules governing how words are put together in a sentence). Often only the most basic and overlearned grammatical forms are produced and comprehended, a deficit known as agrammatic aphasia. For example, consider the following sentences: “The boy kicked the girl” and “The boy was kicked by the girl.” The first sentence can be understood from word order, and Broca’s aphasics understand such sentences fairly well. But the second sentence has a more complicated grammar, and in such cases Broca’s aphasics would misunderstand who kicked whom (Figure 11.2c).

When Broca first described this disorder, he related it to damage to the cortical region now known as Broca’s area (see Figure 11.1b). Challenges to the idea that Broca’s area was responsible for speech deficits seen in aphasia have been raised since Broca’s time. For example, aphasiologist Nina Dronkers (1996) at the University of California, Davis, reported 22 patients with lesions in Broca’s area, as defined by neuroimaging, but only 10 of these patients had Broca’s aphasia.

Broca never dissected the brain of his original patient, Leborgne, and could therefore not determine whether there was damage to structures that could not be seen on the brain’s surface. Leborgne’s brain was preserved and is now housed in the Musée Dupuytren in Paris (as is Broca’s brain). Recent high-resolution MRI scans showed that Leborgne’s lesions extended into regions underlying the superficial cortical zone of Broca’s area, and included the insular cortex and portions of the basal ganglia (Dronkers et al., 2007). This finding suggested that damage to the classic regions of the frontal cortex known as Broca’s area may not be solely responsible for the speech production deficits of Broca’s aphasics.

Wernicke’s Aphasia

FIGURE 11.2 Speech problems in Broca’s aphasia.
Broca’s aphasics can have various problems when they speak or when they try to comprehend or repeat the linguistic input provided by the clinician. (a) The speech output of this patient is slow and effortful, and it lacks function words. It resembles a telegram. (b) Broca’s aphasics also may have accompanying problems with speech articulation because of deficits in regulation of the articulatory apparatus (e.g., muscles of the tongue). (c) Finally, these patients sometimes have a hard time understanding reversible sentences, where a full understanding of the sentence depends on correct syntactic assignment of the thematic roles (e.g., who hit whom?).

Wernicke’s aphasia, also known as posterior aphasia or receptive aphasia, was first described fully by the German physician Carl Wernicke, and is a disorder primarily of language comprehension. Patients with this syndrome have difficulty understanding spoken or written language and sometimes cannot understand language at all. Although their speech is fluent with normal prosody and grammar, what they say is nonsensical.

In performing autopsies on his patients who showed language comprehension problems, Wernicke discovered damage in the posterior regions of the superior temporal gyrus, which has since become known as Wernicke’s area (Figure 11.3). Because auditory processing occurs nearby (anteriorly) in the superior temporal cortex within Heschl’s gyri, Wernicke deduced that this more posterior region participated in the auditory storage of words—as an auditory memory area for words. This view is not commonly proposed today. As with Broca’s aphasia and Broca’s area, inconsistencies are seen in the relationship between brain lesion and language deficit in Wernicke’s aphasia. Lesions that spare Wernicke’s area can also lead to comprehension deficits.

More recent studies have revealed that dense and persistent Wernicke’s aphasia is ensured only if there is damage in Wernicke’s area and in the surrounding cortex of the posterior temporal lobe, or damage to the underlying white matter that connects temporal lobe language areas to other brain regions. Thus, although Wernicke’s area remains in the center of a posterior region of the brain whose functioning is required for normal comprehension, lesions confined to Wernicke’s area produce only temporary Wernicke’s aphasia. It appears that damage to this area does not actually cause the syndrome. Instead, secondary damage due to tissue swelling in surrounding regions contributes to the most severe problems. When swelling around the lesioned cortex goes away, comprehension improves.

FIGURE 11.3 Lateral view of the left hemisphere language areas and dorsal connections.
Wernicke’s area is shown shaded in red. The arcuate fasciculus is the bundle of axons that connects Wernicke’s and Broca’s areas. It originates in Wernicke’s area, goes through the angular gyrus, and terminates on neurons in Broca’s area.

Conduction Aphasia

In the 1880s, Ludwig Lichtheim introduced the idea of a third region that stored conceptual information about words, not word storage per se. Once a word was retrieved from word storage, he proposed, the word information was sent to the concept area, which supplied everything associated with the word. Lichtheim first described the classical localizationist model (Figure 11.4), in which word storage (A = Wernicke’s area), speech planning (M = Broca’s area), and conceptual information stores (B) are located in separate brain regions interconnected by white matter tracts. The white matter tract that flows from Wernicke’s area to Broca’s area is the arcuate fasciculus. Wernicke predicted that a certain type of aphasia should result from damage to its fibers. It was not until the late 1950s, when neurologist Norman Geschwind became interested in aphasia and the neurological basis of language, that Wernicke’s connection idea resurfaced and was revived (Geschwind, 1967). Disconnection syndromes, such as conduction aphasia, have been observed with damage to the arcuate fasciculus (see Figure 11.3).

Conduction aphasics can understand words that they hear or see and can hear their own speech errors, but they cannot repair them. They have problems producing spontaneous speech as well as repeating speech, and sometimes they use words incorrectly. Recall that H.W. was impaired in word-repetition tasks. Similar symptoms, however, are also evident with lesions to the insula and portions of the auditory cortex. One explanation for this similarity may be that damage to other nerve fibers is not detected, or that connections between Wernicke’s area and Broca’s area are not as strong as connections between the more widely spread anterior and posterior language areas outside these regions. Indeed, we now realize that the emphasis should not really be on Broca’s and Wernicke’s areas, but on the brain regions currently understood to be better correlated with the syndromes of Broca’s aphasia and Wernicke’s aphasia. Considered in this way, a lesion to the area surrounding the insula could disconnect comprehension from production areas.

We could predict from the model in Figure 11.4 that damage to the connections between conceptual representation areas (area B) and Wernicke’s area (A) would harm the ability to comprehend spoken inputs but not the ability to repeat what was heard (this is known as transcortical sensory aphasia). Such problems result from lesions in the supramarginal and angular gyri. These patients have the unique ability to repeat what they have heard and to correct grammatical errors when they repeat it, but they are unable to understand the meaning. These findings have been interpreted as evidence that this aphasia may come from losing the ability to access semantic (the meaning of a word) information, without losing syntactic (grammatical) or phonological abilities. A third disconnection syndrome, transcortical motor aphasia, results from a disconnection between the concept centers (B) and Broca’s area (M) while the pathway between Wernicke’s area and Broca’s area remains intact. This condition produces symptoms similar to Broca’s aphasia, yet with the preserved ability to repeat heard phrases. Indeed, these patients may compulsively repeat phrases, a behavior known as echolalia. Finally, global aphasia is a devastating syndrome that results in the inability to both produce and comprehend language. Typically, this type of aphasia is associated with extensive left-hemisphere damage, including Broca’s area, Wernicke’s area, and regions between them.
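The lesion-to-syndrome logic of the classical model can be summarized as a simple lookup table. The sketch below (in Python) is a study aid distilled from the text and Figure 11.4; the labels are simplified assumptions, not a clinical classification scheme.

# A compact lookup from lesion site to predicted syndrome, based on the
# text's description of Lichtheim's model (Figure 11.4). Simplified study
# notes only, not a diagnostic tool.

LICHTHEIM_PREDICTIONS = {
    "damage to M (Broca's area)": "Broca's aphasia: nonfluent, effortful production",
    "damage to A (Wernicke's area)": "Wernicke's aphasia: impaired comprehension, fluent nonsense",
    "damage to the A-M link (arcuate fasciculus)": "conduction aphasia: impaired repetition",
    "damage to the B-A link": "transcortical sensory aphasia: can repeat but cannot understand",
    "damage to the B-M link": "transcortical motor aphasia: nonfluent but can repeat (echolalia)",
    "extensive damage to A, M, and regions between": "global aphasia: production and comprehension lost",
}

for lesion, syndrome in LICHTHEIM_PREDICTIONS.items():
    print(f"{lesion:<46} -> {syndrome}")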

FIGURE 11.4 Lichtheim’s classical model of language processing.
The area that stores permanent information about word sounds is represented by A. The speech planning and programming area is represented by M. Conceptual information is stored in area B. The arrows indicate the direction of information flow. This model formed the basis of predictions that lesions in the three main areas, or in the connections between the areas, or the inputs to or outputs from these areas, could account for seven main aphasic syndromes. The locations of possible lesions are indicated by the red line segments. A = Wernicke’s area. B = conceptual information stores. M = Broca’s area.

Although the classical localizationist model could account for many findings, it could not explain all of the neurological observations, nor can it explain current neuroimaging findings. Studies in patients with specific aphasic syndromes have revealed that the classical model’s assumption that only Broca’s and Wernicke’s areas are associated with Broca’s aphasia and Wernicke’s aphasia, respectively, is incorrect. Part of the problem is that the original lesion localizations were not very sophisticated. Another part of the problem lies in the classification of the syndromes themselves: Both Broca’s and Wernicke’s aphasias are associated with a mixed bag of symptoms and do not present with purely production and comprehension deficits, respectively. As we have seen, Broca’s aphasics may have apraxia of speech and problems with comprehension, which are different linguistic processes. It is not at all surprising that this variety of language functions is supported by more than Broca’s area. The tide has turned away from a purely localizationist view, and scientists have begun to assume that language emerges from a network of brain regions and their connections. You also may have noticed that the original models of language were mostly concerned with the recognition and production of individual words and, as we discuss later in the chapter, language entails much more than that.

Information about language deficits following brain damage and studies in split-brain patients (see Chapter 4) have provided a wealth of information about the organization of human language in the brain, specifically identifying a left hemisphere language system. Language, however, is a vastly complicated cognitive system. To understand it, we need to know much more than merely the gross functional anatomy of language. We need to learn a bit about language itself.


TAKE-HOME MESSAGES


The Fundamentals of Language in the Human Brain

Words and the Representation of Their Meaning

Let’s begin with some simple questions. How does the brain cope with spoken, signed, and written input to derive meaning? And how does the brain produce spoken, signed, and written output to communicate meaning to others? We can tackle these questions by laying out the aspects of language we need to consider in this chapter. First, the brain must store words and concepts. One of the central concepts in word (lexical) representation is the mental lexicon—a mental store of information about words that includes semantic information (the words’ meanings), syntactic information (how the words are combined to form sentences), and the details of word forms (their spellings and sound patterns). Most theories agree on the central role for a mental lexicon in language. Some theories, however, propose one mental lexicon for both language comprehension and production, whereas other models distinguish between input and output lexica. In addition, the representation of orthographic (vision-based) and phonological (sound-based) forms must be considered in any model. The principal concept, though, is that a store (or stores) of information about words exists in the brain. Words we hear, see signed, or read must first, of course, be analyzed perceptually.

Once words are perceptually analyzed, three general functions are hypothesized: lexical access, lexical selection, and lexical integration. Lexical access refers to the stage(s) of processing in which the output of perceptual analysis activates word-form representations in the mental lexicon, including their semantic and syntactic attributes. Lexical selection is the next stage, where the lexical representation in the mental lexicon that best matches the input can be identified (selected). Finally, to understand the whole message, lexical integration integrates words into the full sentence, discourse, or larger context. Grammar and syntax are the rules by which lexical items are organized in a particular language to produce the intended meaning. We must also consider not only how we comprehend language but also how we produce it as utterances, signs, and in its written forms. First things first, though: We begin by considering the mental lexicon, the brain’s store of words and concepts, and ask how it might be organized, and how it might be represented in the brain.
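To make these three hypothesized stages concrete, the following sketch walks a single perceived word form through access, selection, and integration. The tiny lexicon, the letter-overlap scoring rule, and the function names are illustrative assumptions, not a claim about how the brain implements these steps.

# A minimal toy pipeline (not a cognitive model) for the three hypothesized
# stages: lexical access, lexical selection, and lexical integration.
# The lexicon contents and the similarity rule are assumptions.

TOY_LEXICON = {
    "bat": {"meaning": "flying mammal or sports club", "category": "noun"},
    "bad": {"meaning": "of poor quality", "category": "adjective"},
    "cat": {"meaning": "small domesticated feline", "category": "noun"},
}

def lexical_access(percept: str) -> dict:
    """Activate every stored word form that resembles the perceptual input."""
    def overlap(word: str) -> int:
        # Crude similarity: count letters matching in the same position.
        return sum(a == b for a, b in zip(word, percept))
    return {word: overlap(word) for word in TOY_LEXICON if overlap(word) > 0}

def lexical_selection(candidates: dict) -> str:
    """Select the candidate representation that best matches the input."""
    return max(candidates, key=candidates.get)

def lexical_integration(word: str, context: list) -> str:
    """Combine the selected word (whose syntactic and semantic attributes are
    stored in TOY_LEXICON) with the sentence built so far."""
    return " ".join(context + [word])

activated = lexical_access("bat")        # access: bat, bad, and cat all partially match
selected = lexical_selection(activated)  # selection: "bat" wins
print(lexical_integration(selected, ["the", "boy", "saw", "a"]))
# -> the boy saw a bat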

A normal adult speaker has passive knowledge of about 50,000 words and yet can easily recognize and produce about three words per second. Given this speed and the size of the database, the mental lexicon must be organized in a highly efficient manner. It cannot be merely the equivalent of a dictionary. If, for example, the mental lexicon were organized in simple alphabetical order, it might take longer to find words in the middle of the alphabet, such as the ones starting with K, L, O, or U, than to find a word starting with an A or a Z. Fortunately, this is not the case.

Instead, the mental lexicon has other organizational principles that help us quickly get from the spoken or written input to the representations of words. First is the representational unit in the mental lexicon, called the morpheme, which is the smallest meaningful unit in a language. As an example, consider the words frost, defrost, and defroster. The root of these words, frost, forms one morpheme; the prefix “de” in defrost changes the meaning of the word frost and is a morpheme as well; and finally, the word defroster consists of three morphemes (adding the morpheme “er”). An example of a word with many morphemes comes from a 2007 New York Times article on language by William Safire; he used the word editorializing. Can you figure out how many morphemes are in this word? A second organizational principle is that more frequently used words are accessed more quickly than less frequently used words; for instance, the word people is more readily available than the word fledgling.
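A toy affix-stripping routine makes the frost/defrost/defroster example concrete. The prefix and suffix lists are illustrative assumptions; real morphological analysis is far more involved.

# A toy affix-stripping sketch of morpheme decomposition for the
# frost/defrost/defroster example. The affix lists are assumptions.

PREFIXES = ["de"]
SUFFIXES = ["er"]

def decompose(word: str) -> list:
    """Peel known prefixes and suffixes off a word, leaving the root."""
    prefixes_found = []
    for prefix in PREFIXES:
        if word.startswith(prefix):
            prefixes_found.append(prefix)
            word = word[len(prefix):]
    suffixes_found = []
    for suffix in SUFFIXES:
        if word.endswith(suffix):
            suffixes_found.insert(0, suffix)
            word = word[: -len(suffix)]
    return prefixes_found + [word] + suffixes_found

for w in ["frost", "defrost", "defroster"]:
    parts = decompose(w)
    print(f"{w}: {parts} -> {len(parts)} morpheme(s)")
# frost: ['frost'] -> 1; defrost: ['de', 'frost'] -> 2; defroster: ['de', 'frost', 'er'] -> 3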

A third organizing principle is the lexical neighborhood, which consists of those words that differ from any single word by only one phoneme or one letter (e.g., bat, cat, hat, sat). A phoneme is the smallest unit of sound that makes a difference to meaning. In English, the sounds for the letters L and R are two phonemes (the words late and rate mean different things), but in the Japanese language, no meaningful distinction is made between L and R, so they are represented by only one phoneme. Behavioral studies have shown that words having more neighbors are identified more slowly during language comprehension than words with few neighbors (e.g., bat has many neighbors, but sword does not). The idea is that there may be competition between the brain representations of different words during word recognition—and this phenomenon tells us something about the organization of our mental lexicon. Specifically, words with many overlapping phonemes or letters must be organized together in the brain, such that when incoming words access one word representation, others are also initially accessed, and selection among candidate words must occur, which takes time.
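The neighborhood idea is easy to state computationally: two written words are neighbors if they have the same length and differ at exactly one position. The sketch below, using a made-up miniature word list, illustrates why bat has a dense neighborhood and sword a sparse one.

# A minimal sketch of the lexical-neighborhood idea: two words are neighbors
# if they have the same length and differ at exactly one letter position.
# The tiny word list is an illustrative assumption, not a real lexicon.

WORDS = ["bat", "cat", "hat", "sat", "bad", "bit", "sword", "dog"]

def one_letter_apart(a: str, b: str) -> bool:
    """True if a and b are the same length and differ in exactly one position."""
    return len(a) == len(b) and sum(x != y for x, y in zip(a, b)) == 1

def neighborhood(word: str, lexicon: list) -> list:
    """All words in the lexicon that are neighbors of the given word."""
    return [w for w in lexicon if one_letter_apart(word, w)]

print(neighborhood("bat", WORDS))    # ['cat', 'hat', 'sat', 'bad', 'bit'] -- dense neighborhood
print(neighborhood("sword", WORDS))  # []                                  -- sparse neighborhood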

A fourth organizing factor for the mental lexicon is the semantic (meaning) relationships between words. Support for the idea that representations in the mental lexicon are organized according to meaningful relationships between words comes from semantic priming studies that use a lexical (word) decision task. In a semantic priming study, participants are presented with pairs of words. The first member of the word pair, the prime, is a word; the second member, the target, can be a real word (truck), a nonword (like rtukc), or a pseudoword (a word that follows the phonological rules of a language but is not a real word, like trulk). If the target is a real word, it can be related or unrelated in meaning to the prime. For the task, the participants must decide as quickly and accurately as possible whether the target is a word (i.e., make a lexical decision), pressing a button indicating their decision. Participants are faster and more accurate at making the lexical decision when the target is preceded by a related prime (e.g., the prime car for the target truck) than an unrelated prime (e.g., the prime sunny for the target truck). Related patterns are found when the participant is asked to simply read the target out loud and there are only real words presented. Here, naming latencies are faster for words related to the prime word than for unrelated ones. What does this pattern of facilitated response speed tell us about the organization of the mental lexicon? It reveals that words related in meaning must somehow be organized together in the brain, such that activation of the representation of one word also activates words that are related in meaning. This makes words easier to recognize when they follow a related word that primes their meaning.
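A sketch of how the trial structure of such an experiment might be laid out is shown below. The word pairs and condition labels are illustrative assumptions; a real study would use many trials per condition and record response times and accuracy.

# A sketch of trials in a semantic-priming lexical decision experiment.
# Items and labels are illustrative only.

trials = [
    # (prime,  target,  target_is_word, prime_is_related)
    ("car",   "truck", True,  True),    # related prime, word target  -> fastest, most accurate
    ("sunny", "truck", True,  False),   # unrelated prime, word target
    ("car",   "trulk", False, None),    # pseudoword target -> correct response is "nonword"
    ("sunny", "rtukc", False, None),    # nonword target    -> correct response is "nonword"
]

for prime, target, is_word, related in trials:
    if is_word and related:
        condition = "word target, related prime"
    elif is_word:
        condition = "word target, unrelated prime"
    else:
        condition = "nonword/pseudoword target"
    expected = "WORD" if is_word else "NONWORD"
    print(f"{prime:>6} -> {target:<6} expected response: {expected:<8} ({condition})")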

Models of the Mental Lexicon

Several models have been proposed to explain the effects of semantic priming during word recognition. In an influential model proposed by Collins and Loftus (1975), word meanings are represented in a semantic network in which words, represented by conceptual nodes, are connected with each other. Figure 11.5 shows an example of a semantic network. The strength of the connection and the distance between the nodes are determined by the semantic relations or associative relations between the words. For example, the node that represents the word car will be close to and have a strong connection with the node that represents the word truck.

A major component of this model is the assumption that activation spreads from one conceptual node to others, and nodes that are closer together will benefit more from this spreading activation than will distant nodes. If we hear “car,” the node that represents the word car in the semantic network will be activated. In addition, words like truck and bus that are closely related to the meaning of car, and are therefore nearby and well connected in the semantic network, will also receive a considerable amount of activation. In contrast, a word like rose most likely will receive no activation at all when we hear “car.” This model predicts that hearing “car” should facilitate recognition of the word truck but not rose, which is true.
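A minimal spreading-activation sketch captures this prediction. The network, link weights, and decay parameter below are illustrative assumptions in the spirit of Collins and Loftus (1975), not a fitted model; the point is only that activation injected at car reaches truck but effectively never reaches rose.

# A spreading-activation sketch over a toy semantic network.
# Nodes, weights, and decay are illustrative assumptions.

NETWORK = {
    "car":       {"truck": 0.8, "bus": 0.7, "street": 0.5},
    "truck":     {"car": 0.8, "firetruck": 0.6},
    "bus":       {"car": 0.7},
    "street":    {"car": 0.5},
    "firetruck": {"truck": 0.6, "fire": 0.7},
    "fire":      {"firetruck": 0.7, "rose": 0.1},
    "rose":      {"fire": 0.1},
}

def spread(source: str, steps: int = 2, decay: float = 0.5) -> dict:
    """Propagate activation outward from `source`, attenuating at each hop."""
    activation = {node: 0.0 for node in NETWORK}
    activation[source] = 1.0
    frontier = {source: 1.0}
    for _ in range(steps):
        next_frontier = {}
        for node, act in frontier.items():
            for neighbor, weight in NETWORK[node].items():
                boost = act * weight * decay
                activation[neighbor] += boost
                next_frontier[neighbor] = next_frontier.get(neighbor, 0.0) + boost
        frontier = next_frontier
    return activation

result = spread("car")
print(f"truck: {result['truck']:.2f}   rose: {result['rose']:.2f}")
# truck receives substantial activation; rose receives none within two hops,
# mirroring the priming of truck (but not rose) by the prime word car.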

Although the semantic-network model that Collins and Loftus proposed has been extremely influential, the way that word meanings are organized is still a matter of dispute and investigation. There are many other models and ideas of how conceptual knowledge is represented. Some models propose that words that co-occur in our language prime each other (e.g., cottage and cheese), and others suggest that concepts are represented by their semantic features or semantic properties. For example, the word dog has several semantic features, such as “is animate,” “has four legs,” and “barks,” and these features are assumed to be represented in the conceptual network. Such models are confronted with the problem of activation: How many features have to be activated for a person to recognize a dog? For example, it is possible to train dogs not to bark, yet we can recognize a dog even when it does not bark, and we can identify a barking dog that we cannot see. Furthermore, it is not exactly clear how many features would have to be stored. For example, a table could be made of wood or glass, and in both cases we would recognize it as a table. Does this mean that we have to store the features “is of wood/glass” with the table concept? In addition, some words are more “prototypical” examples of a semantic category than others, as reflected in our recognition and production of these words. When we are asked to generate bird names, for example, the word robin comes to mind as one of the first examples; but a word like ostrich might not come up at all, depending on where we grew up or have lived.
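One way to see both the appeal and the difficulty of feature-based models is to code a toy version. In the sketch below, the feature lists and the matching threshold are illustrative assumptions; note that a silent dog is still recognized because enough of its other features match.

# A toy feature-based representation of concepts, illustrating the question
# raised in the text: how many features must match before we recognize a dog?
# Feature lists and the 0.6 threshold are illustrative assumptions.

CONCEPTS = {
    "dog":   {"is animate", "has four legs", "has fur", "barks", "fetches"},
    "cat":   {"is animate", "has four legs", "has fur", "meows", "retracts claws"},
    "table": {"is inanimate", "has four legs", "has flat top"},
}

def best_match(observed: set, threshold: float = 0.6):
    """Return the concept whose stored features best overlap the observed
    ones, provided the proportion of matched features reaches the threshold."""
    scores = {name: len(observed & features) / len(features)
              for name, features in CONCEPTS.items()}
    winner = max(scores, key=scores.get)
    return winner if scores[winner] >= threshold else None

# A dog trained not to bark is still recognized as a dog, because enough of
# its other features match (4 of 5 dog features vs. 3 of 5 cat features).
print(best_match({"is animate", "has four legs", "has fur", "fetches"}))  # dog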

FIGURE 11.5 Semantic network.
Words that have strong associative or semantic relations are closer together in the network (e.g., car and truck) than are words that have no such relation (e.g., car and clouds). Semantically related words are colored similarly in the figure, and associatively related words (e.g., firetruck–fire) are closely connected.

In sum, how word meanings are represented remains a matter of intense investigation. Whatever the precise organization, everyone agrees that a mental store of word meanings is crucial to normal language comprehension and production. Evidence from patients with brain damage and from functional brain-imaging studies is revealing how the mental lexicon and conceptual knowledge may be organized.

Neural Substrates of the Mental Lexicon

Through observations of deficits in patients’ language abilities, we can infer a number of things about the functional organization of the mental lexicon. Different types of neurological problems create deficits in understanding and producing the appropriate meaning of a word or concept, as we described earlier. Patients with Wernicke’s aphasia make errors in speech production that are known as semantic paraphasias. For example, they might use the word horse when they mean cow. Patients with deep dyslexia make similar errors in reading: They might read the word horse where cow is written.

Patients with progressive semantic dementia initially show impairments in the conceptual system, but other mental and language abilities are spared. For example, these patients can still understand and produce the syntactic structure of sentences. This impairment has been associated with progressive damage to the temporal lobes, mostly on the left side of the brain. But the superior regions of the temporal lobe that are important for hearing and speech processing are spared (these areas are discussed later, in the subsection on spoken input). Patients with semantic dementia have difficulty assigning objects to a semantic category. In addition, they often name a category when asked to name a picture; when viewing a picture of a horse, they will say “animal,” and a picture of a robin will produce “bird.” Neurological evidence from a variety of disorders provides support for the semantic-network idea because related meanings are substituted, confused, or lumped together, as we would predict from the degrading of a system of interconnected nodes that specifies meaning relations.

In the 1970s and early 1980s, Elizabeth Warrington and her colleagues performed groundbreaking studies on the organization of conceptual knowledge in the brain, beginning with her studies of perceptual deficits in patients with unilateral cerebral lesions. We discussed these studies in some detail in Chapter 6, so we will only summarize them here. In Chapter 6 we discussed category-specific agnosias and how they might reflect the organization of semantic memory (conceptual) knowledge. Warrington and her colleagues found that semantic memory problems fell along the lines of semantic categories. They suggested that the patients’ problems were reflections of the types of information stored with different words in the semantic network. Whereas the biological categories rely more on physical properties or visual features, man-made objects are identified by their functional properties. Some of these studies were done on patients who would now be classified as suffering from semantic dementia.

Since these original observations by Warrington, many cases of patients with category-specific deficits have been reported, and there appears to be a striking correspondence between the sites of lesions and the type of semantic deficit. The patients whose impairment involved living things had lesions that included the inferior and medial temporal cortex, and often these lesions were located anteriorly. The anterior inferotemporal cortex is located close to areas of the brain that are crucial for visual object perception, and the medial temporal lobe contains important relay projections from association cortex to the hippocampus, a structure that, as you might remember from Chapter 9, has an important function in the encoding of information in long-term memory. Furthermore, the inferotemporal lobe is the end station for “what” information, or the object recognition stream, in vision (see Chapter 6).

Less is known about the localization of lesions in patients who show greater impairment for human-made things, simply because fewer of these patients have been identified and studied. But left frontal and parietal areas appear to be involved in this kind of semantic deficit. These areas are close to or overlap with areas of the brain that are important for sensorimotor functions, and so they are likely to be involved in the representation of actions that can be undertaken when human-made artifacts such as tools are being used.

Correlations between the type of semantic deficit and the area of brain lesion are consistent with a hypothesis by Warrington and her colleagues about the organization of semantic information. They have suggested that the patients’ problems are reflections of the types of information stored with different words in the semantic network. Whereas the biological categories (fruits, foods, animals) rely more on physical properties or visual features (e.g., what is the color of an apple?), human-made objects are identified by their functional properties (e.g., how do we use a hammer?).

This hypothesis by Warrington and colleagues has been both supported and challenged. The computational model by Martha Farah and James McClelland (1991), which has been discussed in Chapter 6, supported Warrington’s model. A challenge to Warrington’s proposal comes from observations by Alfonso Caramazza and others (e.g., Caramazza & Shelton, 1998) that the studies in patients did not always use well-controlled linguistic materials. For example, when comparing living things versus human-made things, some studies did not control the stimulus materials to ensure that the objects tested in each category were matched on things like visual complexity, visual similarity across objects, frequency of use, and the familiarity of objects. If these variables differ widely between the categories, then clear-cut conclusions about differences in their representation in a semantic network cannot be drawn. Caramazza has proposed an alternative theory in which the semantic network is organized along lines of the conceptual categories of animacy and inanimacy. He argues that the selective damage that has been observed in brain-damaged patients, as in the studies of Warrington and others, genuinely reflects “evolutionarily adapted domain-specific knowledge systems that are subserved by distinct neural mechanisms” (Caramazza & Shelton, 1998, p. 1).

In the 1990s, studies using imaging techniques in neurologically unimpaired human participants looked further into the organization of semantic representations. Alex Martin and his colleagues (1996) at the National Institute of Mental Health (NIMH) conducted studies using PET imaging and functional magnetic resonance imaging (fMRI). Their findings reveal how the intriguing dissociations in neurological patients that we just described can be identified in neurologically normal brains. When participants read the names of or answered questions about animals, or when they named pictures of animals, the more lateral aspects of the fusiform gyrus (on the brain’s ventral surface) and the superior temporal sulcus were activated. But naming animals also activated a brain area associated with the early stages of visual processing—namely, the left medial occipital lobe. In contrast, identifying and naming tools were associated with activation in the more medial aspect of the fusiform gyrus, the left middle temporal gyrus, and the left premotor area, a region that is also activated by imagining hand movements. These findings are consistent with the idea that in our brains, conceptual representations of living things versus human-made tools rely on separable neuronal circuits engaged in processing of perceptual versus functional information.

More recently, studies of the representation of conceptual information indicate that there is a network that connects the posterior fusiform gyrus in the inferior temporal lobe to the left anterior temporal lobes. Lorraine Tyler and her colleagues (Taylor et al., 2011) at the University of Cambridge have studied the representation and processing of concepts of living and nonliving things in patients with brain lesions to the anterior temporal lobes and in unimpaired participants using fMRI, EEG, and MEG measures. In these studies, participants are typically asked to name pictures of living (e.g., tiger) and nonliving (e.g., knife) things. Further, the level at which these objects should be named was varied. Participants were asked to name the pictures at the specific level (e.g., tiger or knife), or they were asked to name the pictures at the domain general level (e.g., living or nonliving). Tyler and colleagues suggest that naming at the specific level requires retrieval and integration of more detailed semantic information than at the domain general level. For example, whereas naming a picture at a domain general level requires activation of only a subset of features (e.g., for animals: has-legs, has-fur, has-eyes, etc.), naming at the specific level requires retrieval and integration of additional and more precise features (e.g., to distinguish a tiger from a panther, features such as “has-stripes” have to be retrieved and integrated as well). Interestingly, as can be seen in Figure 11.6, whereas nonliving things can be represented by only a few features (e.g., knife), living things are represented by many features (e.g., tiger). Thus, it may be more difficult to select the feature that distinguishes living things from each other (e.g., a tiger from a panther; has-stripes vs. has-spots) than it is to distinguish nonliving things (e.g., a knife from a spoon; cuts vs. scoops; Figure 11.6b). This model suggests that the dissociation between naming of nonliving and living things in patients with category-specific deficits may also be due to the complexity of the features that help distinguish one thing from another.

Tyler and colleagues observed that patients with lesions to the anterior temporal lobes cannot reliably name living things at the specific level, indicating that the retrieval and integration of more detailed semantic information is impaired. Functional MRI studies in unimpaired participants showed greater activation in the anterior temporal lobe with specific-level naming of living things than with domain-level naming (Figure 11.7).

Finally, studies with MEG and EEG have revealed interesting details about the timing of the activation of conceptual knowledge. Activation of the perceptual features occurs in primary cortices within the first 100 ms after a picture is presented; activation of more detailed semantic representations occurs in the posterior and anterior ventral–lateral cortex between 150 and 250 ms; and starting around 300 ms, participants are able to name the specific object that is depicted in the picture, which requires the retrieval and integration of detailed semantic information that is unique to the specific object.

FIGURE 11.6 Hypothetical conceptual structures for tiger and knife.
(a) One model suggests that living things are represented by many features that are not distinct, whereas nonliving things can be represented by only a few features that are distinct. In this hypothetical concept structure, the thickness of the straight lines correlates with the strength of the features, and the thickness of the boxes’ borders correlates with the distinctness of the features. Although the tiger has many features, it has fewer features that distinguish it from other living things, whereas the knife has more distinct features that separate it from other possible objects. (b) Following brain damage resulting in aphasia, patients find it harder to identify the distinctive feature(s) for living things (lower left panel) than for nonliving objects.


TAKE-HOME MESSAGES


Language Comprehension

Perceptual Analyses of the Linguistic Input

In understanding spoken and written language, the brain uses some of the same processes, but there are also striking differences in how spoken and written inputs are analyzed. When attempting to understand spoken words (Figure 11.8), the listener has to decode the acoustic input. The result of this acoustic analysis is translated into a phonological code because, as discussed above, that is how the lexical representations of auditory word forms are stored in the mental lexicon. After the acoustic input has been translated into a phonological format, the lexical representations in the mental lexicon that match the auditory input can be accessed (lexical access), and the best match can then be selected (lexical selection). The selected word includes grammatical and semantic information stored with it in the mental lexicon. This information helps to specify how the word can be used in the given language. Finally, the word’s meaning (the stored lexical-semantic information) activates the corresponding conceptual information.

The process of reading words shares at least the last two steps of linguistic analysis (i.e., lexical and meaning activation) with auditory comprehension, but, due to the different input modality, it differs at the earlier processing steps, as illustrated in Figure 11.8. Given that the perceptual input is different, what are these earlier stages in reading? The first analysis step requires that the reader identify orthographic units (written symbols that represent the sounds or words of a language) from the visual input. These orthographic units may then be directly mapped onto orthographic (vision-based) word forms in the mental lexicon, or alternatively, the identified orthographic units might be translated into phonological units, which in turn activate the phonological word form in the mental lexicon as described for auditory comprehension.
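These two routes can be sketched as two small functions: one maps the written form directly onto an orthographic word form, and the other first converts letters into a phonological code and then looks up the phonological word form. The miniature lexicons and the letter-to-phoneme table below are illustrative assumptions.

# A minimal sketch of the two routes from print to the lexicon described
# above. Lexicons and the letter-to-phoneme table are assumptions.

ORTHOGRAPHIC_LEXICON = {"cat": "CAT-concept", "cap": "CAP-concept"}
PHONOLOGICAL_LEXICON = {"kat": "CAT-concept", "kap": "CAP-concept"}
LETTER_TO_PHONEME = {"c": "k", "a": "a", "t": "t", "p": "p"}

def direct_route(written_word: str):
    """Map the identified orthographic units straight onto a stored
    orthographic word form."""
    return ORTHOGRAPHIC_LEXICON.get(written_word)

def indirect_route(written_word: str):
    """Translate letters into phonological units first, then look up the
    phonological word form, as in auditory comprehension."""
    phonological_code = "".join(LETTER_TO_PHONEME.get(ch, ch) for ch in written_word)
    return PHONOLOGICAL_LEXICON.get(phonological_code)

print(direct_route("cat"))    # CAT-concept, via the orthographic word form
print(indirect_route("cat"))  # CAT-concept, via the phonological word form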

FIGURE 11.7 The anterior temporal lobes are involved in naming living things.
When identifying the tiger at the less complex domain level (living things), activity was restricted to more posterior occipitotemporal sites (red bars). Naming the same object stimulus at the specific-level (blue bars) was associated with activity in both posterior occipitotemporal and anteromedial temporal lobes.

FIGURE 11.8 Schematic representation of the components involved in spoken and written language comprehension.
Inputs can enter via either auditory (spoken word) or visual (written word) modalities. Notice that the information flows from the bottom up in this figure, from perceptual identification to “higher level” word and meaning activation. So-called interactive models of language understanding would predict top-down influences to play a role as well. For example, activation at the word-form level would influence earlier perceptual processes. We could introduce this type of feedback into this schematic representation by making the arrows bidirectional (see “How the Brain Works: Modularity Versus Interactivity”).

In the next few sections, we delve into the processes involved in the understanding of spoken and written inputs of words. Then we consider the understanding of sentences. We begin with auditory processing and then turn to the different steps involved in the comprehension of reading, also known as visual language input.

Spoken Input: Understanding Speech

The input signal in spoken language is very different from that in written language. Whereas for a reader it is immediately clear that the letters on a page are the physical signals of importance, a listener is confronted with a variety of sounds in the environment and has to identify and distinguish the relevant speech signals from other “noise.”

As introduced earlier, important building blocks of spoken language are phonemes. These are the smallest units of sound that make a difference to meaning; for example, in the words cap and tap the only difference is the first phoneme (/c/ versus /t/). The English language uses about 40 phonemes; other languages may use more or fewer. Perception of phonemes is different for speakers of different languages. As we mentioned earlier in this chapter, for example, in English, the sounds for the letters L and R are two phonemes (the words late and rate mean different things, and we easily hear that difference). But in the Japanese language, L and R cannot be distinguished by adult native speakers, so these sounds are represented by only one phoneme.

Interestingly, infants have the perceptual ability to distinguish between any possible phonemes during their first year of life. Patricia Kuhl and her colleagues at the University of Washington found that, initially, infants could distinguish between any phonemes presented to them; but during the first year of life, their perceptual sensitivities became tuned to the phonemes of the language they experienced (Kuhl et al., 1992). So, for example, Japanese infants can distinguish L from R sounds, but then lose that ability over time. American infants, on the other hand, do not lose that ability, but do lose the ability to distinguish phonemes that are not part of the English language. The babbling and crying sounds that infants articulate from ages 6–12 months grow more and more similar to the phonemes that they most frequently hear. By the time babies are one year old, they no longer produce (nor perceive) nonnative phonemes. Learning another language often involves phonemes that don’t occur in a person’s native language, such as the guttural sounds of Dutch or the rolling R of Spanish. Such nonnative sounds can be difficult to learn, especially when we are older and our native phonemes have become automatic; this also makes it challenging or impossible to lose our native accent. Perhaps that was Mark Twain’s problem when he quipped, “In Paris they just simply opened their eyes and stared when we spoke to them in French! We never did succeed in making those idiots understand their own language” (from The Innocents Abroad).

Recognizing that phonemes are important building blocks of spoken language and that we all become experts in the phonemes of our native tongue does not eliminate all challenges for the listener. The listener’s brain must resolve a number of additional difficulties with the speech signal; some of these challenges have to do with (a) the variability of the signal (e.g., male vs. female speakers), and (b) the fact that phonemes often do not appear as separate little chunks of information. Unlike the case for written words, auditory speech signals are not clearly segmented, and it can be difficult to discern where one word begins and another word ends. When we speak, we usually spew out about 15 phonemes per second, which adds up to about 180 words a minute. The puzzling thing is that we say these phonemes with no gaps or breaks: that is, there are no pauses between words. Thus, the input signal in spoken language is very different from that in written language, where the letters are neatly separated into word chunks. Two or more spoken words can be slurred together; in other words, speech sounds are often coarticulated. There can be silences within words as well. The question of how we differentiate auditory sounds into separate words is known as the segmentation problem. This is illustrated in Figure 11.9, which shows the speech signal of the sentence, “What do you mean?”
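One way to appreciate the segmentation problem is to solve a simplified version of it: given a continuous letter stream and a stored lexicon, recover the word boundaries. The recursive strategy and the four-word lexicon below are illustrative assumptions, not a model of what listeners actually do.

# A sketch of the segmentation problem: recovering word boundaries from a
# continuous input stream using only a stored lexicon.

LEXICON = {"what", "do", "you", "mean"}

def segment(stream: str):
    """Return one way to split the stream into known words, or None."""
    if not stream:
        return []
    for i in range(1, len(stream) + 1):
        prefix = stream[:i]
        if prefix in LEXICON:
            rest = segment(stream[i:])
            if rest is not None:
                return [prefix] + rest
    return None

print(segment("whatdoyoumean"))  # ['what', 'do', 'you', 'mean']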

How do we identify the spoken input, given this variability and the segmentation problem? Fortunately, other clues help us divide the speech stream into meaningful segments. One important clue is the prosodic information, which is what the listener derives from the speech rhythm and the pitch of the speaker’s voice. The speech rhythm comes from variation in the duration of words and the placement of pauses between them. Prosody is apparent in all spoken utterances, but it is perhaps most clearly illustrated when a speaker asks a question or emphasizes something. When asking a question, a speaker raises the frequency of the voice toward the end of the question; and when emphasizing a part of speech, a speaker raises the loudness of the voice and includes a pause after the critical part of the sentence.

In their research, Anne Cutler and colleagues (Tyler and Cutler, 2009) at the Max Planck Institute for Psycholinguistics in the Netherlands have revealed other clues that can be used to segment the continuous speech stream. These researchers showed that English listeners use syllables that carry an accent or stress (strong syllables) to establish word boundaries. For example, a word like lettuce, with stress on the first syllable, is usually heard as a single word and not as two words (“let us”). In contrast, words such as invests, with stress on the last syllable, are usually heard as two words (“in vests”) and not as one word.
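The stress-based strategy can be sketched as a simple rule: treat every strong syllable as the likely start of a new word. The hand-coded syllable and stress annotations below are illustrative assumptions rather than the output of a real phonological analysis.

# A sketch of stress-based segmentation: start a new candidate word at each
# strong (stressed) syllable. Syllable/stress annotations are assumptions.

def segment_by_stress(syllables):
    """`syllables` is a list of (syllable, is_strong) tuples; a new candidate
    word begins at every strong syllable."""
    words, current = [], ""
    for syllable, is_strong in syllables:
        if is_strong and current:
            words.append(current)
            current = ""
        current += syllable
    if current:
        words.append(current)
    return words

# "lettuce" (stress on the first syllable) is heard as one word:
print(segment_by_stress([("lett", True), ("uce", False)]))   # ['lettuce']
# "invests" (stress on the last syllable) is heard as two words:
print(segment_by_stress([("in", False), ("vests", True)]))   # ['in', 'vests']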

Neural Substrates of Spoken-Word Processing Now we turn to the questions of where in the brain the processes of understanding speech signals may take place and what neural circuits and systems support them. From animal studies, studies in patients with brain lesions, and imaging and recording (EEG and MEG) studies in humans, we know that the superior temporal cortex is important to sound perception. At the beginning of the 20th century, it was already well understood that patients with bilateral lesions restricted to the superior parts of the temporal lobe had the syndrome of “pure word deafness.” Although they could process other sounds relatively normally, these patients had specific difficulties recognizing speech sounds. Because there was no difficulty in other aspects of language processing, the problem seemed to be restricted primarily to auditory or phonemic deficits—hence the term pure word deafness. With evidence from more recent studies in hand, however, we can begin to determine where in the brain speech and nonspeech sounds are first distinguished.

FIGURE 11.9 Speech waveform for the question, “What do you mean?”
Note that the words do you mean are not physically separated. Even though the physical signal provides few cues to where the spoken words begin and end, the language system is able to parse them into the individual words for comprehension.

When the speech signal hits the ear, it is first processed by pathways in the brain that are not specialized for speech but that are used for hearing in general. Heschl’s gyri, which are located on the supratemporal plane, superior and medial to the superior temporal gyrus (STG) in each hemisphere, contain the primary auditory cortex, or the area of cortex that processes the auditory input first (see Chapter 2). The areas that surround Heschl’s gyri and extend into the superior temporal sulcus (STS) are collectively known as auditory association cortex. Imaging and recording studies in humans have shown that Heschl’s gyri of both hemispheres are activated by speech and nonspeech sounds (e.g., tones) alike, but that the activation in the STS of both hemispheres is modulated by whether the incoming auditory signal is a speech sound or not. This view is summarized in Figure 11.10, which shows that there is a hierarchy of sensitivity to speech in the brain (Peelle et al., 2010; Poeppel et al., 2012). As we move farther away from Heschl’s gyrus toward anterior and posterior portions of the STS, the brain becomes less sensitive to changes in nonspeech sounds but more sensitive to speech sounds. Although more left-lateralized, the posterior portions of the STS of both hemispheres seem especially relevant to the processing of phonological information. It is clear from many studies, however, that the speech perception network extends beyond the STS.

FIGURE 11.10 Brain areas important to speech perception and language comprehension.
Acoustic sensitivity decreases moving anteriorly and posteriorly away from primary auditory cortex, while speech sensitivity increases. Anterior and posterior regions of the superior temporal sulcus are increasingly speech specific. Posterior inferior temporal lobe and prefrontal regions are also important during speech processing. Heschl’s gyrus (primary auditory cortex; red spot) is not speech specific, but is instead activated by all auditory inputs.

As described earlier, Wernicke found that patients with lesions in the left temporoparietal region that included the STG (Wernicke’s area) had difficulty understanding spoken and written language. This observation led to the now-century-old notion that this area is crucial to word comprehension. Even in Wernicke’s original observations, however, the lesions were not restricted to the STG. We can now conclude that the STG alone is probably not the seat of word comprehension.

One study that has contributed to our new understanding of speech perception is an fMRI study done by Jeffrey Binder and colleagues (2000) at the Medical College of Wisconsin. Participants in the study listened to different types of sounds, both speech and nonspeech. The sounds were of several types: white noise without systematic frequency or amplitude modulations; tones that were frequency modulated between 50 and 2,400 Hz; reversed speech, which was real words played backward; pseudowords, which were pronounceable nonwords containing the same letters as real words (for example, sked from desk); and real words.

Figure 11.11 shows the results of the Binder study. Relative to noise, the frequency-modulated tones activated posterior portions of the STG bilaterally. Areas that were more sensitive to the speech sounds than to tones were more ventrolateral, in or near the superior temporal sulcus, and lateralized to the left hemisphere. In the same study, Binder and colleagues showed that these areas are most likely not involved in lexical-semantic aspects of word processing (i.e., the processing of word forms and word meaning), because they were equally activated for words, pseudowords, and reversed speech.

Based on their fMRI findings and the findings of other groups identifying brain regions that become activated in relation to subcomponents of speech processing, Binder and colleagues (2000) proposed a hierarchical model of word recognition (Figure 11.12). In this model, processing proceeds anteriorly in the STG. First, the stream of auditory information proceeds from auditory cortex in Heschl’s gyri to the superior temporal gyrus. In these parts of the brain, no distinction is made between speech and nonspeech sounds, as noted earlier. The first evidence of such a distinction is in the adjacent mid-portion of the superior temporal sulcus, but still, no lexical-semantic information is processed in this area.

Neurophysiological studies now indicate that recognizing whether a speech sound is a word or a pseudoword happens in the first 50–80 ms (MacGregor et al., 2012). This processing tends to be lateralized more to the left hemisphere, where the combinations of the different features of speech sounds are analyzed (pattern recognition). From the superior temporal sulcus, the information proceeds to the final processing stage of word recognition in the middle temporal gyrus and the inferior temporal gyrus, and finally to the angular gyrus, posterior to the temporal areas just described (see Chapter 2), and in more anterior regions in the temporal pole (Figure 11.10).

Over the course of the decade following the Binder study, multiple studies were done in an attempt to localize speech recognition processes. In reviewing 100 fMRI studies, Iain DeWitt and Josef Rauschecker (2012) of Georgetown University Medical Center confirmed the findings that the left mid-anterior STG responds preferentially to phonetic sounds of speech. Researchers also have tried to identify areas in the brain that are particularly important for the processing of phonemes. Recent fMRI studies from the lab of Sheila Blumstein at Brown University suggest a network of areas involved in phonological processing during speech perception and production, including the left posterior superior temporal gyrus (activation), the supramarginal gyrus (selection), inferior frontal gyrus (phonological planning), and precentral gyrus (generating motor plans for production; Peramunage et al., 2011).

FIGURE 11.11 Superior temporal cortex activations to speech and nonspeech sounds.
Four sagittal slices are shown for each hemisphere. The posterior areas of the superior temporal gyrus are more active bilaterally for frequency-modulated tones than for simple noise (in blue). Areas that are more active for speech sounds and tones than for noise are indicated in red. Areas that are more sensitive to speech sounds (i.e., reversed words, pseudowords, and words) are located ventrolaterally to this area (in yellow), in or near the superior temporal sulcus. This latter activation is somewhat lateralized to the left hemisphere (top row).


TAKE-HOME MESSAGES


Written Input: Reading Words

Reading is the perception and comprehension of written language. For written input, readers must recognize a visual pattern. Our brain is very good at pattern recognition, but reading is a relatively recent invention (about 5,500 years old). Although speech comprehension develops without explicit training, reading requires instruction. Specifically, learning to read requires linking arbitrary visual symbols to meaningful words. The visual symbols that are used vary across writing systems. Words can be symbolized in writing in three different ways: alphabetic, syllabic, and logographic. For example, many Western languages use the alphabetic system, Japanese uses the syllabic system, and Chinese uses the logographic system.

Regardless of the writing system used, readers must be able to analyze the primitive features, or the shapes of the symbols. In the alphabetic system—our focus here—this process involves the visual analysis of horizontal lines, vertical lines, closed curves, open curves, intersections, and other elementary shapes.

In a 1959 paper that was a landmark contribution to the emerging science of artificial intelligence, Oliver Selfridge proposed a collection of small components, or demons (a term he used to refer to a discrete stage or substage of information processing), that together would allow machines to recognize patterns. Demons record events as they occur, recognize patterns in those events, and may trigger subsequent events according to the patterns they recognize. In his model, known as the pandemonium model, the sensory input (for instance, the letter R) is temporarily stored as an iconic memory by the so-called image demon. Then 28 feature demons, each sensitive to a particular feature such as curves or horizontal lines, begin to decode features in the iconic representation of the sensory input (Figure 11.13). In the next step, all representations of letters with these features are activated by cognitive demons. Finally, the representation that best matches the input is selected by the decision demon. The pandemonium model has been criticized because it consists solely of stimulus-driven (bottom-up) processing and does not allow for feedback (top-down) processing, such as that underlying the word superiority effect (see Chapter 3): Humans are better at recognizing letters presented in words than letters presented in nonsense strings or even in isolation.
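A small sketch may help make the demon metaphor concrete. The feature inventory and letter definitions below are simplified assumptions (not Selfridge’s actual 28 feature demons), but the flow is the same: feature demons report what they detect, cognitive demons hold letter descriptions, and a decision demon picks the best match, with no feedback from higher levels.

```python
# A toy, purely bottom-up sketch in the spirit of the pandemonium model.
# The feature inventory and letter definitions are simplified assumptions,
# not Selfridge's actual 28 feature demons.

# "Cognitive demons": each letter is described by the features it contains.
LETTER_FEATURES = {
    "A": {"oblique", "horizontal"},
    "E": {"vertical", "horizontal"},
    "F": {"vertical", "horizontal"},
    "O": {"closed_curve"},
}

def feature_demons(stimulus_features):
    """Feature demons simply report which features are present in the input."""
    return set(stimulus_features)

def decision_demon(detected):
    """The decision demon picks the letter whose feature set best matches
    the detected features (largest overlap, smallest mismatch)."""
    def match_score(letter):
        features = LETTER_FEATURES[letter]
        return len(features & detected) - len(features ^ detected)
    return max(LETTER_FEATURES, key=match_score)

# Strictly bottom-up: the decision depends only on the detected features;
# no word-level knowledge can feed back to alter it.
print(decision_demon(feature_demons({"oblique", "horizontal"})))   # 'A'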

FIGURE 11.12 Regions involved in a hierarchical processing stream for speech processing (see text for explanation).
Heschl’s gyri, which contain the primary auditory cortex, are in purple. Shown in blue are areas of the dorsal superior temporal gyri that are activated more by frequency-modulated tones than by random noise. Yellow areas are clustered in the superior temporal sulcus and are speech-sound specific; they show more activation for speech sounds (words, pseudowords, or reversed speech) than for nonspeech sounds. Green areas include regions of the middle temporal gyrus, inferior temporal gyrus, angular gyrus, and temporal pole and are more active for words than for pseudowords or nonwords. Critically, these “word” areas are lateralized mostly to the left hemisphere.

In 1981, James McClelland and David Rumelhart proposed an influential computational model of visual letter recognition, the interactive activation model. This model assumes three levels of representation: (a) a layer for the features of the letters of words, (b) a layer for letters, and (c) a layer for the representation of words. An important characteristic of this model is that it permits top-down information (i.e., information from higher cognitive levels, such as the word layer) to influence earlier processes that happen at lower levels of representation (the letter layer and/or the feature layer).

This model contrasts sharply with Selfridge’s model, where the flow of information is strictly bottom up (from the image demon to the feature demons to the cognitive demons and finally to the decision demon). Another important difference between the two models is that, in the McClelland and Rumelhart model, processes can take place in parallel such that several letters can be processed at the same time, whereas in Selfridge’s model, one letter is processed at a time in a serial manner. As Figure 11.14 shows, the model of McClelland and Rumelhart permits both excitatory and inhibitory links between all the layers.

FIGURE 11.13 Selfridge’s (1959) pandemonium model of letter recognition.
For written input, the reader must recognize a pattern that starts with the analysis of the sensory input. The sensory input is stored temporarily in iconic memory by the image demon, and a set of 28 feature demons decodes the iconic representations. The cognitive demons are activated by the representations of letters with these features, and the representation that best matches the input is then selected by the decision demon.

The empirical validity of a model can be tested against real-life behavioral phenomena or against physiological data. McClelland and Rumelhart’s connectionist model does an excellent job of reproducing the word superiority effect. This effect indicates that words are probably not perceived on a letter-by-letter basis, and it can be explained in terms of the McClelland and Rumelhart model because top-down information from the word level can either excite or inhibit letter-level activations, thereby helping the recognition of letters.
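A toy simulation can make this interactivity concrete. The sketch below assumes a four-word lexicon and arbitrary parameter values (it is not McClelland and Rumelhart’s actual implementation); it shows how feedback from the word level can resolve a letter that is ambiguous in the bottom-up input, which is the essence of the word superiority effect.

```python
# A heavily simplified interactive-activation sketch. The four-word lexicon
# and all parameter values are arbitrary assumptions, not McClelland and
# Rumelhart's implementation; the point is the top-down word-to-letter feedback.

WORDS = ["WORK", "WORD", "WEAK", "FORK"]

def run(letter_evidence, n_cycles=10, rate=0.2):
    """letter_evidence[pos] maps candidate letters to bottom-up support (0-1)."""
    letters = [{ltr: 0.0 for ltr in ev} for ev in letter_evidence]
    words = {w: 0.0 for w in WORDS}

    for _ in range(n_cycles):
        # Bottom-up: each position contributes its letter's activation minus a
        # small baseline, so unsupported letters effectively inhibit a word.
        new_words = {
            w: max(0.0, words[w] + rate * sum(letters[i].get(w[i], 0.0) - 0.1
                                              for i in range(4)))
            for w in WORDS
        }
        # Top-down: active words feed activation back to their own letters,
        # on top of the continuing bottom-up support.
        new_letters = []
        for i, evidence in enumerate(letter_evidence):
            layer = {}
            for ltr, support in evidence.items():
                feedback = sum(act for w, act in words.items() if w[i] == ltr)
                layer[ltr] = max(0.0, letters[i][ltr] + rate * (support + feedback))
            new_letters.append(layer)
        words, letters = new_words, new_letters
    return letters, words

# The fourth letter is ambiguous between K and R (equal bottom-up support),
# but the word context WOR_ feeds activation back to K, so K wins.
evidence = [{"W": 1.0}, {"O": 1.0}, {"R": 1.0}, {"K": 0.5, "R": 0.5}]
letter_acts, word_acts = run(evidence)
print(letter_acts[3])                         # K ends up more active than R
print(max(word_acts, key=word_acts.get))      # 'WORK'
```

Because the updates run over all positions on every cycle, several letters are processed in parallel, in contrast to the serial, strictly bottom-up flow of the pandemonium sketch above.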

We learned in Chapters 5 and 6 that single-cell recording techniques have enlightened us about the basics of visual feature analysis and how the brain analyzes edges, curves, and so on. Unresolved questions remain, however, because letter and word recognition are not really understood at the cellular level, and recordings in monkeys are not likely to enlighten us about letter and word recognition in humans. Recent studies using PET and fMRI have started to shed some light on where letters are processed in the human brain.

FIGURE 11.14 Fragment of a connectionist network for letter recognition.
Nodes at three different layers represent letter features, letters, and words. Nodes in each layer can influence the activational status of the nodes in the other layers by excitatory (arrows) or inhibitory (lines) connections.

Neural Substrates of Written-Word Processing The actual identification of orthographic units may take place in occipitotemporal regions of the left hemisphere. It has been known for over 100 years that lesions in this area can give rise to pure alexia, a condition in which patients cannot read words, even though other aspects of language are normal. In early PET imaging studies, Steven Petersen and his colleagues (1990) contrasted words with non-words and found regions of occipital cortex that preferred word strings. They named these regions the visual word form area. In later studies using fMRI in normal participants, Gregory McCarthy at Yale University and his colleagues (Puce et al., 1996) contrasted brain activation in response to letters with activation in response to faces and visual textures. They found that regions of the occipitotemporal cortex were activated preferentially in response to unpronounceable letter strings (Figure 11.15). Interestingly, this finding confirmed results from an earlier study by the same group (Nobre et al., 1994), in which intracranial electrical recordings were made from this brain region in patients who later underwent surgery for intractable epilepsy. In this study, the researchers found a large negative polarity potential at about 200 ms in occipitotemporal regions, in response to the visual presentation of letter strings. This area was not sensitive to other visual stimuli, such as faces, and importantly, it also appeared to be insensitive to lexical or semantic features of words.

FIGURE 11.15 Regions in occipitotemporal cortex were preferentially activated in response to letter strings.
Stimuli were faces (a) or letter strings (b). (c) Left hemisphere coronal slice at the level of the anterior occipital cortex. Faces activated a region of the lateral fusiform gyrus (yellow); letter strings activated a region of the occipitotemporal sulcus (red). (d) Graph shows the corresponding time course of fMRI activations averaged over all alternation cycles for faces (yellow line) and letter strings (pink line).

In a combined ERP and fMRI study that included healthy participants and patients with callosal lesions, Laurent Cohen, Stanislas Dehaene, and their colleagues (2000) investigated the visual word form area. While the participants fixated on a central crosshair, a word or a non-word was flashed to either their right or left visual field. Non-words were consonant strings incompatible with French orthographic principles and were impossible to translate into phonology. When a word flashed on the screen, participants were to repeat it out loud, and if a non-word flashed, they were to think “rien” (which means nothing; this was a French study after all).

The event-related potentials (ERPs) indicated that initial processing was confined to early visual areas contralateral to the stimulated visual hemifield. Activations then revealed a common processing stage, which was associated with the activation of a precise, reproducible site in the left occipitotemporal sulcus (anterior and lateral to area V4), part of the visual word form area, which coincides with the lesion site that causes pure alexia (Cohen et al., 2000). This and later studies showed that this visually elicited activation occurred only for prelexical forms, before the word form was associated with a meaning (Dehaene et al., 2002), yet was invariant to the location of the stimulus (right or left visual field) and to the case of the word stimulus (Dehaene et al., 2001). These results were also in agreement with the findings of Nobre and colleagues. Finally, the researchers found that processing beyond this point was the same for all word stimuli from either visual field, a result that corresponds to the standard model of word reading. Activation of the visual word form area is reproducible across cultures that use different types of symbols, such as Japanese kana (syllabic) and kanji (logographic; Bolger et al., 2005). This convergent neurological and neuroimaging evidence gives us clues as to how the human brain solves the perceptual problems of letter recognition.


TAKE-HOME MESSAGES


THE COGNITIVE NEUROSCIENTIST’S TOOLKIT

Stimulation Mapping of the Human Brain

Awake, a young man lies on his side on a table, draped with clean, light-green sheets. His head is partially covered by a sheet of cloth, so we can see his face if we wish. On the other side of the cloth is a man wearing a surgical gown and mask. One is a patient; the other is his surgeon. The patient’s skull has been cut through, and his left hemisphere is exposed. Rather than being a scene from a sci-fi thriller, this is a routine procedure at the University of Washington Medical School, where George Ojemann and his colleagues (1989) have been using direct cortical stimulation to map the brain’s language areas.

The patient suffers from epilepsy and is about to undergo a surgical procedure to remove the epileptic tissue. Because this epileptic focus is in the left, language-dominant hemisphere, it is first essential to determine where language processes are localized in the patient’s brain. Such localization can be done by electrical stimulation mapping. Electrodes are used to pass a small electrical current through the cortex, momentarily disrupting activity; thus, electrical stimulation can probe where a language process is localized. The patient has to be awake for this test. Language-related areas vary among patients, so these areas must be mapped carefully. During surgery, it is essential to leave the critical language areas intact.

One benefit of this work is that we can learn more about the organization of the human language system (Figure 1). Patients are shown line drawings of everyday objects and are asked to name those objects. During naming, regions of the left perisylvian cortex are stimulated with low amounts of electricity. When the patient makes an error in naming or is unable to name the object, the deficit is correlated with the region being stimulated during that trial, so that area of cortex is assumed to be critical for language production and comprehension.

Stimulation mapping of 100 to 200 patients revealed that aspects of language representation in the brain are organized in mosaic-like areas of 1 to 2 cm². These mosaics usually include regions in the frontal cortex and posterior temporal cortex. In some patients, however, only frontal or only posterior temporal areas were observed. The correspondence between these effects and the classic Broca’s and Wernicke’s areas was weak; some patients had naming disruption in the classic areas, and others did not. Perhaps the single most intriguing fact is how much the anatomical localizations vary across patients. This finding has implications for whether across-subject averaging methods, such as PET activation studies, can reveal significant effects.

FIGURE 1 Regions of the brain of two patients studied with cortical stimulation mapping.
During surgery, with the patient awake and lightly anesthetized, the surgeon maps the somatosensory and motor areas by stimulating the cortex and observing the responses. The patient also is shown pictures and asked to verbally name them. Discrete regions of the cortex are stimulated with electrical current during the task. Areas that induce errors in naming when they are stimulated are mapped, and those regions are implicated as being involved in language. The surgeon uses this mapping to avoid removing any brain tissue associated with language. The procedure thus treats brain tumors or epilepsy as well as enlightens us about the cortical organization of language functions.


The Role of Context in Word Recognition

We come now to the point in word comprehension where auditory and visual word comprehension share processing components. Once a phonological or visual representation is identified as a word, then for it to gain any meaning, semantic and syntactic information must be retrieved. Usually words are not processed in isolation, but in the context of other words (sentences, stories, etc.). To understand words in their context, we have to integrate syntactic and semantic properties of the recognized word into a representation of the whole utterance.

At what point during language comprehension do linguistic and nonlinguistic context (e.g., information seen in pictures) influence word processing? Is it possible to retrieve word meanings before words are heard or seen when the word meanings are highly predictable in the context? More specifically, does context influence word processing before or after lexical access and lexical selection are complete?

Consider the following sentence, which ends with a word that has more than one meaning. “The tall man planted a tree on the bank.” Bank can mean both “financial institution” and “side of the river.” Semantic integration of the meaning of the final word bank into the context of the sentence allows us to interpret bank as the “side of the river” and not as a “financial institution.” The relevant question is, when does the sentence’s context influence the activation of the multiple meanings of the word bank? Do both the contextually appropriate meaning of bank (in this case “side of the river”) and the contextually inappropriate meaning (in this case “financial institution”) become briefly activated regardless of the context of the sentence? Or does the sentence context immediately constrain the activation to the contextually appropriate meaning of the word bank?

From this example, we can already see that two types of representations play a role in word processing in the context of other words: lower-level representations, those constructed from the sensory input (in our example, the word bank itself); and higher-level representations, those constructed from the context preceding the word to be processed (in our example, the sentence preceding the word bank). Contextual representations are crucial for determining in which sense or grammatical form a word should be used. Without sensory analysis, however, no message representation can be constructed. The two sources of information have to interact at some point, and the point where this interaction occurs differs among competing models.

In general, three classes of models attempt to explain word comprehension. Modular models (also called autonomous models) claim that normal language comprehension is executed within separate and independent modules. Thus, higher-level representations cannot influence lower-level ones, and therefore the flow of information is strictly data driven, or bottom up. In contrast, interactive models maintain that all types of information can participate in word recognition. In these models, context can have its influence even before the sensory information is available, by changing the activational status of the word-form representations in the mental lexicon. McClelland and colleagues (1989) have proposed this type of interactive model, as noted earlier. Between these two extreme views is the notion that lexical access is autonomous and not influenced by higher-level information, but that lexical selection can be influenced by both sensory and higher-level contextual information. In these hybrid models, context provides information about which word forms are possible given what has come before, thereby reducing the number of activated candidates.

An elegant study by Pienie Zwitserlood (1989), involving a lexical decision task, addressed the question of modularity versus interactivity in word processing. She asked participants to listen to short texts such as: “With dampened spirits the men stood around the grave. They mourned the loss of their captain.” At different points during the auditory presentation of the word captain (e.g., when only /c/ or only /ca/ or only /cap/, etc., could be heard), a visual target stimulus was presented. This target stimulus could be related to the actual word captain, or to an auditory competitor (for example, capital). In this example, target words could be words like ship (related to captain) or money (unrelated to captain, but related to capital). In other cases, a pseudoword would be presented. The task was to decide whether the target stimulus was a word or not (lexical decision task).

The results of this study showed that participants were faster to decide that ship was a word in the context of the story about the men mourning their captain, and slower to decide that money was a word, even when only partial sensory information of the stimulus word captain was available (i.e., before the whole word was spoken). Apparently, the lexical selection process was influenced by the contextual information that was available from the text that the participants had heard before the whole word captain was spoken.

This finding is consistent with the idea that lexical selection can be influenced by sentence context. We do not know for certain which type of model best fits word comprehension, but growing evidence from studies like that of Zwitserlood and others suggests that at least lexical selection is influenced by higher-level contextual information. More recently, William Marslen-Wilson and colleagues (Zhuang et al., 2011) have performed fMRI studies of word recognition and shown that the processes of lexical access and lexical selection involve a network that includes the middle temporal gyrus (MTG), superior temporal gyrus (STG), and the ventral inferior and bilateral dorsal inferior frontal gyri (IFG). They showed that MTG and STG are important for the translation of speech sounds to word meanings. They also showed that the frontal cortex regions were important in the selection process and that greater involvement of dorsal IFG occurred when selection required choosing the actual word from among many lexical candidates (lexical competition).
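The two-stage logic suggested by these results, autonomous access followed by context-sensitive selection, can be sketched in a few lines of code. In the toy example below, the lexicon, the prefix-matching rule, and the context-fit scores are all assumptions made for illustration: lexical access is driven purely by the partial sensory input, while lexical selection is biased by how well each remaining candidate fits the preceding context, as in Zwitserlood’s captain example.

```python
# A minimal sketch of the hybrid view suggested by such results. The toy
# lexicon, the prefix-matching rule, and the context-fit scores are all
# assumptions for illustration: access is purely bottom-up, selection is
# biased by the preceding context.

LEXICON = ["captain", "capital", "capsule", "table"]

def lexical_access(partial_input):
    """Bottom-up stage: activate every word consistent with the input so far."""
    return [word for word in LEXICON if word.startswith(partial_input)]

def lexical_selection(candidates, context_fit):
    """Selection stage: context biases the choice among the active candidates."""
    return max(candidates, key=lambda word: context_fit.get(word, 0.0))

# Assumed fit with "They mourned the loss of their ..." (toy values).
context_fit = {"captain": 0.9, "capital": 0.2, "capsule": 0.1}

candidates = lexical_access("cap")                  # ['captain', 'capital', 'capsule']
print(lexical_selection(candidates, context_fit))   # 'captain'
```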

Integration of Words in Sentences

Normal language comprehension requires more than just recognizing individual words. To understand the message conveyed by a speaker or a writer, we have to integrate the syntactic and semantic properties of the recognized word into a representation of the whole sentence, utterance, or signed message. Let’s consider again the sentence, “The tall man planted a tree on the bank.” Why do we read bank to mean “side of the river” instead of “financial institution”? We do so because the rest of the sentence has created a context that is compatible with one meaning and not the other. This integration process has to be executed quickly, in real time—as soon as we are confronted with the linguistic input. If we come upon a word like bank in a sentence, usually we are not aware that this word has an alternative meaning, because the appropriate meaning of this word has been rapidly integrated into the sentence context.

Higher order semantic processing is important for determining the right sense or meaning of words in the context of a sentence, as with ambiguous words such as bank, which have the same form but more than one meaning. Semantic information in words alone, however, is not enough to understand the message, as made clear in the sentence, “The little old lady bites the gigantic dog.” Syntactic analysis of this sentence reveals its structure: who is the actor, what is the action, and what is the theme. The syntax of the sentence demands that we imagine an implausible situation in which an old lady is biting and not being bitten. Syntactic analysis goes on even in the absence of real meaning. In various studies, normal participants have detected a target word faster in a sentence that makes no sense but is grammatically correct than in a sentence whose grammar is locally disrupted. An example from the famous linguist Noam Chomsky illustrates this. The sentence “Colorless green ideas sleep furiously” is easier to process than “Furiously sleep ideas green colorless,” because the first sentence, even though meaningless, still has an intact syntactic structure, whereas the second sentence lacks both meaning and structure.

How do we process the structure of sentences? As we have learned, when we hear or read sentences, we activate word forms that in turn activate the grammatical and semantic information in the mental lexicon. Unlike the representation of words and their syntactic properties that are stored in a mental lexicon, however, representations of whole sentences are not stored in the brain. It is just not feasible for the brain to store the incredible number of different sentences that can be written and produced. Instead, the brain has to assign a syntactic structure to words in sentences. This is called syntactic parsing. Syntactic parsing is, therefore, a building process that does not, and cannot, rely on the retrieval of representations of sentences. To investigate the neural bases of semantic and syntactic analyses in sentence processing, researchers have used cognitive neuroscience tools, such as electrophysiological methods. We review these briefly in the next sections.
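Before turning to those measures, a deliberately tiny sketch may help make the idea of parsing as a building process concrete. The grammar (S → NP VP, NP → Det Adj* N, VP → V NP) and the word-class lexicon below are toy assumptions; the point is only that the structure of “The little old lady bites the gigantic dog” is assembled by applying rules to the words, not retrieved from a stored list of sentences.

```python
# A toy sketch of parsing as a building process. The grammar
# (S -> NP VP, NP -> Det Adj* N, VP -> V NP) and the word-class lexicon
# are assumptions chosen only to cover the example sentence.

WORD_CLASSES = {
    "the": "Det", "little": "Adj", "old": "Adj", "gigantic": "Adj",
    "lady": "N", "dog": "N", "bites": "V",
}

def parse_np(words, i):
    """Build a noun phrase: a determiner, zero or more adjectives, a noun."""
    det = words[i]; i += 1
    adjectives = []
    while WORD_CLASSES[words[i]] == "Adj":
        adjectives.append(words[i]); i += 1
    noun = words[i]; i += 1
    return ("NP", det, adjectives, noun), i

def parse_sentence(sentence):
    """Assemble S -> NP VP from the words; nothing is retrieved whole."""
    words = sentence.lower().split()
    subject, i = parse_np(words, 0)        # the actor
    verb = words[i]; i += 1                # the action
    theme, _ = parse_np(words, i)          # the thing acted on
    return ("S", subject, ("VP", verb, theme))

print(parse_sentence("The little old lady bites the gigantic dog"))
# ('S', ('NP', 'the', ['little', 'old'], 'lady'),
#  ('VP', 'bites', ('NP', 'the', ['gigantic'], 'dog')))
```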

Semantic Processing and the N400 Wave

After pulling the fragrant loaf from the oven, he cut a slice and spread the warm bread with socks. What? You may not realize it, but you just had a large N400 response in your brain. Marta Kutas and Steven Hillyard (1980) at the University of California, San Diego, first described the N400 response, an ERP component related to linguistic processes. The name N400 indicates that it is a negative-polarity voltage peak in brain waves that usually reaches maximum amplitude about 400 ms after the onset of the word stimulus that evoked it. This brain wave is especially sensitive to semantic aspects of linguistic input. Kutas and Hillyard discovered the wave when they were comparing the processing of the last word of sentences in three conditions:

  1. Normal sentences that ended with a word congruent with the preceding context, such as “It was his first day at work.”
  2. Sentences that ended with a word anomalous to the preceding context, such as “He spread the warm bread with socks.”
  3. Sentences that ended with a word semantically congruent with the preceding context but physically deviant, such as “She put on her high-heeled SHOES.”

The sentences were presented on a computer screen, one word at a time. Participants were asked to read the sentences attentively, knowing that questions about the sentences would be asked at the end of the experiment. The electroencephalograms (EEGs) were averaged for the sentences in each condition, and the ERPs were extracted by averaging data for the last word of the sentences separately for each sentence type.
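The averaging step itself is simple enough to sketch. The code below uses synthetic data; the sampling rate, epoch window, and noise model are illustrative assumptions, not the parameters of the Kutas and Hillyard study. The idea is that activity time-locked to the word (here, a simulated N400) survives averaging, while the unrelated background EEG averages toward zero.

```python
# A sketch of ERP extraction by averaging, using synthetic data. The sampling
# rate, epoch window, and noise model are illustrative assumptions, not the
# parameters of the original study.
import numpy as np

rng = np.random.default_rng(0)
fs = 250                                   # sampling rate (Hz)
times = np.arange(-0.1, 0.8, 1 / fs)       # time points around word onset (s)

def simulate_trial(n400_amplitude):
    """One epoch = background EEG noise plus a negative peak near 400 ms."""
    noise = rng.normal(0, 5, times.size)   # ongoing EEG (microvolts)
    n400 = -n400_amplitude * np.exp(-((times - 0.4) ** 2) / (2 * 0.05 ** 2))
    return noise + n400

# Average many time-locked epochs per condition: the noise averages out,
# while the time-locked component remains.
congruent = np.mean([simulate_trial(2.0) for _ in range(100)], axis=0)
anomalous = np.mean([simulate_trial(8.0) for _ in range(100)], axis=0)

# The N400 effect is the amplitude difference between conditions near 400 ms.
window = (times > 0.3) & (times < 0.5)
print(anomalous[window].mean() - congruent[window].mean())   # clearly negative
```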

When anomalous words ended the sentence, the amplitude of the N400 was greater than when the participants read congruent words (see Figure 11.16). This amplitude difference is called the N400 effect. In contrast, words that were semantically congruent with the sentence but merely physically deviant (e.g., printed in larger letters) elicited a positive potential rather than an N400. Subsequent experiments showed that nonsemantic deviations, such as musical or grammatical violations, also failed to elicit the N400. Thus, the N400 effect is specific to semantic analysis.

FIGURE 11.16 ERPs reflecting semantic aspects of language.
ERP waveforms differentiate between congruent words at the end of sentences (work in the first sentence) and anomalous last words that do not fit the semantic specifications of the preceding context (socks in the second sentence). The anomalous words elicit a large negative deflection (plotted upward) in the ERP called the N400. Words that fit into the context but are printed with a larger font (SHOES in the third sentence) elicit a positive wave (P560) and not the N400, indicating that the N400 is not generated simply by surprises at the end of the sentence.

The N400 response is also sensitive to comprehension of language that goes beyond single sentences. In a series of studies, Jos van Berkum and colleagues (1999, 2008) found an N400 response to words that were inconsistent with the meaning of an entire story. In these studies, participants listened to or read short stories. In the last sentence of these stories, words could be included that were inconsistent with the meaning of the story. For example, in a story about a man who had become a vegetarian, the last sentence could be: “He went to a restaurant and ate a steak that was prepared well.” Although the word steak is fine when this sentence is read by itself, it is inconsistent within the context of the story. The researchers found that participants who read this sentence in this story exhibited an N400 effect.

Syntactic Processing and the P600 Wave

The P600 response, also known as the syntactic positive shift (SPS), was first reported by Lee Osterhout at the University of Washington and Phil Holcomb (1992) at Tufts, and by Peter Hagoort, Colin Brown, and their colleagues (1993) in the Netherlands. Osterhout and Holcomb observed it at about 600 ms after the onset of words that were incongruous with the expected syntactic structure. It is evoked by the type of phrase that headline writers love: Drunk gets nine months in violin case or Enraged cow injures farmer with ax. Known as garden path phrases or sentences, they are temporarily ambiguous because they contain a word group that appears to be compatible with more than one structural analysis: We are “led down the garden path,” so to speak.

Peter Hagoort, Colin Brown, and their colleagues asked participants to silently read sentences that were presented one word at a time on a video monitor. Brain responses to normal sentences were compared with responses to sentences containing a grammatical violation. Figure 11.17 shows the results: There is a large positive shift to the syntactic violation in the sentence, and the onset of this effect is approximately 600 ms after the violating word (throw in the example). The P600 shows up in response to a number of other syntactic violations as well, and it occurs both when participants have to read sentences and when they have to listen to them. As with the N400, the P600 response has now been reported for several different languages.

Finally, Gina Kuperberg and colleagues (2003, 2007) demonstrated that the P600 response can also be evoked by a semantic violation in the absence of any syntactic violation; for instance, when there is a semantic violation between a verb and its subject but the syntax is correct, as in “The eggs would eat toast with jam at breakfast.” This sentence is grammatically correct and not ambiguous, but it contains a so-called thematic violation (eggs cannot eat). Eggs and eating often occur in the same scenario, however, and are semantically related to each other. The P600 response in these types of sentences is elicited because the syntax-based analysis of the sentence structure (e.g., subject-verb-object) is challenged by strong semantic relations among the words in the sentence.

FIGURE 11.17 ERPs reflecting grammatical aspects of language.
ERPs from a parietal (Pz) scalp recording site elicited in response to each word of sentences that are syntactically anomalous (dashed waveform) versus those that are syntactically correct (solid waveform). In the violated sentence, a positive shift emerges in the ERP waveform at about 600 ms after the syntactic violation (shaded). It is called the syntactic positive shift (SPS), or P600.

Syntactic processing is reflected in other types of brain waves as well. Cognitive neuroscientists Thomas Münte and colleagues (1993) and Angela Friederici and colleagues (1993) described a negative wave over the left frontal areas of the brain. This brain wave has been labeled the left anterior negativity (LAN) and has been observed when words violate the required word category in a sentence (e.g., as in “the red eats,” where a noun rather than a verb is required), or when morphosyntactic features are violated (e.g., as in “he mow”). The LAN has about the same latency as the N400 but a different voltage distribution over the scalp, as Figure 11.18 illustrates.

What do we know about the brain circuitry involved in syntactic processing? Some brain-damaged patients have severe difficulty producing sentences and understanding complex sentences. These deficits are apparent in patients with agrammatic aphasia, who generally produce two- or three-word sentences consisting exclusively of content words and hardly any function words (and, then, the, a, etc.). They also have difficulty understanding complex syntactic structures. So when they hear the sentence “The gigantic dog was bitten by the little old lady,” they will most likely understand it to mean that the lady was bitten by the dog. This problem in assigning syntactic structures to sentences traditionally has been associated with lesions that include Broca’s area in the left hemisphere. But not all agrammatic aphasic patients have lesions in Broca’s area, so we do not want to assign syntactic processing to a specific structure like Broca’s area. Instead, the evidence suggests that the left inferior frontal cortex (in and around classical Broca’s area) has some involvement in syntactic processing.

Neuroimaging evidence from studies by David Caplan and colleagues (2000) at Harvard Medical School provides some additional clues about syntactic processing in the brain. In these studies, PET scans were made while participants read sentences varying in syntactic complexity. Caplan and colleagues found increased activation in the left inferior frontal cortex for the more complex syntactic structures (Figure 11.19).

FIGURE 11.18 ERPs related to semantic and syntactic processing.
The voltage recorded at multiple locations on the scalp at specific time periods can be displayed as a topographic voltage map. These maps show views of the topographies of (a) the N400 to semantic violations (see Figure 11.16 for equivalent waveforms) and (b) a left anterior negativity (LAN) to syntactic violations. The maps are read in a manner similar to the way elevation maps of mountain ranges are read, except here the topography shows “mountains” and “valleys” of voltage. The N400 and LAN have different scalp topographies, implying that they are generated in different neural structures in the brain.

In other studies, sentence complexity manipulations led to activation of more than just the left inferior frontal cortex. For example, Marcel Just and colleagues (1996) reported activation in Broca’s and Wernicke’s areas and in the homologous areas in the right hemisphere. PET studies have identified portions of the anterior superior temporal gyrus, in the vicinity of area 22 (Figure 11.20a), as another candidate for syntactic processing. Nina Dronkers at the University of California, Davis, and colleagues (1994) also implicated this area in aphasics’ syntactic processing deficits (Figure 11.20b).

Thus, a more contemporary view is emerging: Syntactic processing takes place in a network of left inferior frontal and superior temporal brain regions that are activated during language processing.


TAKE-HOME MESSAGES


FIGURE 11.19 Increase in blood flow in left inferior prefrontal cortex (red spots) when participants are processing complex syntactic structures relative to simple ones. See text for further explanation. The change in blood flow was measured using PET imaging.

THE COGNITIVE NEUROSCIENTIST’S TOOLKIT

Aphasia and Electrophysiology

Do aphasic symptoms reflect processing losses, representational losses, or some combination of the two? One way of tackling this question is to analyze online measures of language processing, such as the event-related potentials (ERPs) elicited during language processing. The idea is to investigate the processing of spoken language, to observe how the patient’s brain responds to linguistic inputs, and to compare these responses to those of healthy control participants. One study used the N400 component of the ERP to investigate spoken-sentence understanding in Broca’s and Wernicke’s aphasics. Tamara Swaab (now at the University of California, Davis), Colin Brown, and Peter Hagoort (1997) at the Max Planck Institute for Psycholinguistics in the Netherlands tried to determine whether spoken-sentence comprehension might be hampered by a deficit in the online integration of lexical information.

Patients listened to sentences spoken at a normal rate (Figure 1). In half of the sentences, the meaning of the final word matched the semantic representation built up from the sentence context. In the other half, the final word was anomalous with respect to the preceding context. As in Kutas and Hillyard’s (1980) study, the amplitude of the N400 wave should be larger in response to the anomalous final words than to the congruent final words. This result was obtained for normal age-matched control participants. Non-aphasic brain-damaged patients (controls with right-hemisphere damage) and aphasic patients with a mild comprehension deficit (high comprehenders) had an N400 effect comparable to that of neurologically unimpaired participants. In aphasics with moderate to severe comprehension deficits (low comprehenders), the N400 effect was reduced and delayed.

The results are compatible with the idea that aphasics with moderate to severe comprehension problems have an impaired ability to integrate lexical information into a higher order representation of the sentence context, because the N400 component indexes the process of lexical integration. By incorporating electrical recordings into studies of neurological patients with behavioral deficits such as aphasia, scientists can track the processing of information in real time as it occurs in the brain. Observations from this tracking can be combined with analyses based on traditional approaches, such as reaction time measures in lexical decision tasks. Significantly, ERPs can also provide measures of processing in patients whose neurobehavioral deficit is too severe for behavioral testing alone because their comprehension is too limited for them to understand the task instructions.

FIGURE 1 The N400 effect to different anomalous words at the end of a sentence in different groups of patients and healthy control participants. The recording is from a single electrode located at the midline parietal scalp site, Pz, in elderly healthy control participants, aphasics with high comprehension scores, aphasics with low comprehension scores, and patients with right-hemisphere lesions (control patients). The waveform for the low comprehenders is clearly delayed and somewhat reduced compared to that for the other groups. The waveforms for the normal control participants, the high comprehenders, and the patients with right-hemisphere lesions are comparable in size and do not differ in latency. This pattern implies a delay in time course of language processing in the patients with low comprehension.


FIGURE 11.20 Localization of syntactic processing in the brain.
(a) PET activations in the anterior portion of the superior temporal gyrus related to syntactic processing. IF = inferior frontal; MT = middle temporal; ST = superior temporal. (b) Summary of lesions in the anterior superior temporal cortex that lead to deficits in syntactic processing.

Neural Models of Language Comprehension

Many new neural models of language have emerged that are different from the classical model initiated by the work of Paul Broca, Carl Wernicke, and others. In the contemporary models, these classical language areas are no longer always considered language specific, nor are their roles in language processing limited to those proposed in the classical model. Moreover, additional areas in the brain have been found to be part of the circuitry that is used for normal language processing.

One recent neural model of language that combines work in brain and language analysis has been proposed by Peter Hagoort (2005). His model divides language processing into three functional components (memory, unification, and control) and identifies their possible representation in the brain (Figure 11.21):

  1. Memory. Storage and retrieval from the mental lexicon or the long-term memory store for word information, as defined earlier in this chapter.
  2. Unification. Integration of lexically retrieved phonological, semantic, and syntactic information into an overall representation of the whole utterance. In language comprehension, the unification processes for phonological, semantic, and syntactic information can operate in parallel (or at the same time); and interaction between these different types of information is possible. Unification makes Hagoort’s model a constraint-based interactive model, as discussed earlier.
  3. Control. Relating language to action (e.g., in bilingualism and turn taking).

As Figure 11.21 shows, the temporal lobes are especially important for the storage and retrieval of word representations. Phonological and phonetic properties of words are stored in the central to posterior superior temporal gyrus (STG, which includes Wernicke’s area) extending into the superior temporal sulcus (STS), and semantic information is distributed over different parts of the left middle and inferior temporal gyri. This part of the model is very similar to what we have seen before in Binder’s hierarchical model of spoken-word comprehension (see Figure 11.12).

FIGURE 11.21 Memory–unification–control model.
The three components of the model are shown in colors overlaid onto a drawing of the left hemisphere: the memory component (yellow) in the left temporal lobe, the unification component (blue) in the left inferior frontal gyrus, and the control component (purple) in the lateral frontal cortex.

The processes that combine and integrate (unify) phonological, lexical-semantic, and syntactic information recruit frontal areas of the brain, including our old friend Broca’s area or the left inferior frontal gyrus (LIFG). LIFG now appears to be involved in all three unification processes: semantic unification in Brodmann’s area 47 and BA45, syntactic unification in BA45 and BA44, and phonological unification in BA44 and parts of BA6.

The control component of the model becomes important when people are actually involved in communication—for example, when they have to take turns during a conversation. Cognitive control in language comprehension has not been studied very much, but areas that are involved in cognitive control during other tasks, such as the anterior cingulate cortex (ACC) and the dorsolateral prefrontal cortex (DLPC, BA46/9), also play a role during cognitive control in language comprehension.

Networks of the Left-Hemisphere Language System

We have reviewed a lot of studies focusing on brain regions in the left hemisphere that are involved in various language functions. How are these brain regions organized to create a language network in the brain? From recent studies that have considered the functional and structural connectivity in a language network, several pathways have been identified that connect the representations of words in the temporal lobes to the unification areas in the frontal lobes. For spoken sentence comprehension, Angela Friederici has elaborated a model of the language network that includes the connecting pathways (Figure 11.22). In this model, four pathways are distinguished. Two ventral pathways connect the posterior temporal lobes with the anterior temporal lobe and the frontal operculum. These ventral pathways are important for comprehension of the meanings of words. Two dorsal pathways connect the posterior temporal lobes to the frontal lobes. The dorsal pathway that connects to the premotor cortex is involved in speech preparation. The other dorsal pathway connects Broca’s area (specifically BA44) with the superior temporal gyrus and superior temporal sulcus. This pathway is important for aspects of syntactic processing.


TAKE-HOME MESSAGES


Neural Models of Speech Production

So far we have focused mainly on language comprehension. Now we turn our attention to language production. To provide a framework for this discussion, we will concentrate mostly on one influential cognitive model for language production, proposed by Willem Levelt (1989) of the Max Planck Institute for Psycholinguistics in the Netherlands. Figure 11.23 illustrates this model.

A seemingly trivial but nonetheless important difference between comprehension and production is our starting point. Whereas language comprehension starts with spoken or written input that has to be transformed into a concept, language production starts with a concept for which we have to find the appropriate words.

FIGURE 11.22 Cortical language circuit proposed by Angela Friederici, consisting of two ventral and two dorsal pathways.
The black lines indicate direct pathways and the direction of information flow between language-related regions. The broken line suggests an indirect connection between the pSTG/STS and the MTG via the inferior parietal cortex. The ventral pathways are important for comprehension of the meanings of words. The dorsal pathway that connects to the premotor cortex is involved in speech preparation. The other dorsal pathway connects Broca’s area (specifically BA44) with the superior temporal gyrus and superior temporal sulcus and is involved in syntactic processing.

The first step in speech production is to prepare the message. Levelt maintains that there are two crucial aspects to message preparation: macroplanning and microplanning. The speaker must determine what she wants to express in her message to the listener.

A message directing someone to our home will be formulated differently from a message instructing someone to close the door. The intention of the communication is represented by goals and subgoals, which are expressed in an order that best serves the communicative plan. This aspect of message planning is macroplanning.

Microplanning, in contrast, determines how the information is expressed, which means adopting a perspective. If we describe a scene in which a house and a park are situated side by side, we must decide whether to say, “The park is next to the house” or “The house is next to the park.” The microplan determines word choice and the grammatical roles that the words play (e.g., subject, object).

FIGURE 11.23 Outline of the theory of speech production developed by Willem Levelt.
The processing components in language production are displayed schematically. Word production proceeds through stages of conceptual preparation, lexical selection, morphological and phonological encoding, phonetic encoding, and articulation. Speakers monitor their own speech by making use of their comprehension system.

The output of macroplanning and microplanning is a conceptual message that constitutes the input for the hypothetical formulator, which puts the message in a grammatically and phonologically correct form. During grammatical encoding, a message’s surface structure is computed. The surface structure is a message’s syntactic representation, including information such as “is subject of,” “is object of,” the grammatically correct word order, and so on. The lowest-level elements of surface structure (known as lemmas) specify a word’s syntactic properties (e.g., whether the word is a noun or a verb, gender information, and other grammatical features), its semantic specifications, and the conceptual conditions under which it is appropriate to use the word. These types of information in the mental lexicon are organized in a network that links lemmas by meaning.

Levelt’s model predicts the following sequence when someone is presented with a picture of a flock of goats and is asked to name them. First the concept that represents a goat is activated, but concepts related to the meaning of goat are also activated (for example, sheep, cheese, farm). Activated concepts, in turn, activate representations in the mental lexicon, starting with “nodes” at the lemma level to access syntactic information such as word class (in our example, goat is a noun, not a verb). At this point, lexical selection occurs: the syntactic properties of the word appropriate to the presented picture are retrieved. The selected information (in our example, goat) activates the word form. Next, the word form undergoes morphological encoding, when the suffix is added: goats. The newly formed morpheme contains both phonological information and metrical information, which is information about the number of syllables in the word and the stress pattern (in our example, goats consists of one syllable, which is stressed). The process of phonological encoding ensures that the phonological information is mapped onto the metrical information. Sometimes we cannot activate the sound form of a word, because there is a rift between the syntax and the phonology; this is known as the tip of the tongue (TOT) state. You most likely have experienced a TOT state. You know a lot about the thing (i.e., a goat) that you are trying to name: You can say that it has four legs and white curly hair, you can visualize it in your mind, you can also reject words that do not match the concept (e.g., horse), and if someone tells you the word’s first letter (g), you probably will say, “Oh yes, goats.”
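The serial flavor of this account can be summarized in a short sketch. The data structures below (a toy lemma entry and metrical table for goat) are illustrative placeholders rather than the model’s actual representations; they simply show the order of the stages: lemma selection, word-form retrieval, morphological encoding, phonological and metrical encoding, and articulation.

```python
# A toy sketch of the serial stages in Levelt-style word production for "goat".
# The lemma entry and metrical table are illustrative placeholders, not the
# model's actual representations.

LEMMAS = {"GOAT": {"word_class": "noun", "word_form": "goat"}}
METRICAL = {
    "goat":  {"syllables": ["goat"],  "stress": [1]},
    "goats": {"syllables": ["goats"], "stress": [1]},
}

def produce(concept, plural=False):
    lemma = LEMMAS[concept]                  # lexical selection: syntactic info first
    assert lemma["word_class"] == "noun"
    form = lemma["word_form"]                # word-form (lexeme) retrieval
    if plural:
        form += "s"                          # morphological encoding
    metrics = METRICAL[form]                 # phonological and metrical encoding
    # If this lookup failed while everything above succeeded, that would be
    # the tip-of-the-tongue situation described in the text.
    return " ".join(metrics["syllables"])    # phonetic encoding and articulation

print(produce("GOAT", plural=True))   # 'goats'
```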

In addition to mentally blocking on a word, speech errors might happen during production. Sometimes we mix up speech sounds or exchange words in a sentence. Usually all goes well, though. The appropriate word form is selected, and phonetic and articulatory programs are matched. In the last phase of speech production, we plan our articulation: The word’s syllables are mapped onto motor patterns that move the tongue, mouth, and vocal apparatus to generate the word. At this stage, we can repair any errors in our speech, for example, by saying “um,” which gives us more time to generate the appropriate term.

Brain damage can affect each of these processing stages. Some anomic patients (impaired in naming), like H.W. at the beginning of the chapter, are afflicted with an extreme TOT state. When asked to name a picture, they can give a fairly accurate description—even naming the gender of the word they are looking for if the language requires one—but they cannot name the word. Their problem is not one of articulation, because they can readily repeat the word aloud. Their problems are on the word-form level. Patients with Wernicke’s aphasia produce semantic paraphasias, generating words related in meaning to the intended word. This can be due to inappropriate selection of concepts or lemmas or lexemes (units of lexical meaning). These patients might also make errors at the phoneme level by incorrectly substituting one sound for another. Finally, as mentioned earlier, Broca’s aphasia is often accompanied by dysarthria, which hinders articulation and results in effortful speech, because the muscles that articulate the utterance cannot be properly controlled.

In contrast to the modular view in Levelt’s model, interactive models such as the one proposed by Gary Dell (1986) at the University of Illinois suggest that phonological activation begins shortly after the semantic and syntactic information of words has been activated. Unlike modular models, interactive models permit feedback from the phonological activation to the semantic and syntactic properties of the word, thereby enhancing the activation of certain syntactic and semantic information.

FIGURE 11.24 Cortical evoked potentials from three epilepsy patients.
In all three patients, a triphasic waveform was specific to recordings from cortical depth electrodes in Broca’s area.

Ned Sahin and his colleagues (2009) had the rare opportunity to shed some light on this question of how different forms of linguistic information are combined during speech production. They recorded electrical responses from multiple electrodes implanted in and around Broca’s area during presurgical screening of three epilepsy patients. To investigate word production in the brain, the patients performed a task involving three conditions that distinguished lexical, grammatical, and phonological linguistic processes. Most of the electrodes in Broca’s area yielded strong triphasic electrical responses (Figure 11.24). The responses (waves) correlated with distinct linguistic processing stages (Figure 11.25). The first wave, at about 200 ms, appeared to reflect lexical identification. The second wave, at about 320 ms (Figure 11.25b), was modulated by inflectional demands but not by phonological programming. Phonological encoding was instead reflected in the third wave (Figure 11.25c), which appeared at about 450 ms.

In naming tasks, speech typically begins about 600 ms after the stimulus appears. Sahin and coworkers could also see that motor neuron commands occur 50–100 ms before speech, putting them just after the phonological wave (Figure 11.25d). These apparent processing steps were separated not only in time but also in space, although only by a few millimeters (below the resolution of standard fMRI), and all were located in Broca’s area. These findings provide support for serial processing, at least initially during speech production: Inflectional processing did not occur before the word was identified, and phonological processing did not occur until inflected phonemes were selected. The results are also consistent with the idea that Broca’s area has distinct circuits that process lexical, grammatical, and phonological information.

Imaging studies of the brain during picture naming and word generation have found activation in the inferior temporal regions of the left hemisphere and in the left frontal operculum (Broca’s area). The activation in the frontal operculum may be specific to phonological encoding in speech production. The articulation of words likely involves the posterior part of Broca’s area (BA44), but studies have also shown bilateral activation of motor cortex, the supplementary motor area (SMA), and the insula. PET and fMRI studies of the motor aspects of speech have shown that they involve the SMA, the opercular parts of the precentral gyrus, the posterior parts of the inferior frontal gyrus (Broca’s area), the insula, the mouth region of the primary sensorimotor cortex, the basal ganglia, the thalamus, and the cerebellum (reviewed in Ackermann & Riecker, 2010). It is clear that a widespread network of brain regions, predominantly in the left hemisphere in most people, is involved in producing speech.

FIGURE 11.25 Lexical, grammatical, and phonological information is processed sequentially in overlapping circuits.
Results from one of several depth probes placed in Broca’s area while people read words verbatim or grammatically inflected them. The shaded areas indicate the three separate wave components. (a, top) In recordings from several channels in Broca’s area (BA45), the task consistently evoked three local field potential (LFP) components (~200, ~320, and ~450 ms). (a, bottom) The first component, at ~200 ms, was sensitive to word frequency but not to word length; thus, it does not merely reflect perception but appears to index a lexical identification process. (b) The second LFP pattern, at ~320 ms, suggests inflectional processing. (c) The third LFP pattern, at ~450 ms, suggests phonological processing. (d) The ~450-ms component, which is sensitive to phonological differences among inflectional conditions, is also sensitive to the phonological complexity of the target word (the more syllables, the greater the peak).


TAKE-HOME MESSAGES

- Speech production involves conceptual, lemma, and word-form (phonological) levels of processing, and brain damage can disrupt each level selectively, as in anomia, semantic paraphasia, and dysarthria.
- Modular models such as Levelt’s propose strictly serial processing stages, whereas interactive models such as Dell’s permit feedback from phonological activation to semantic and syntactic levels.
- Intracranial recordings from Broca’s area (Sahin et al., 2009) reveal temporally and spatially distinct stages of lexical (~200 ms), grammatical (~320 ms), and phonological (~450 ms) processing during word production.
- Producing speech engages a widespread, predominantly left-hemisphere network that includes Broca’s area, motor cortex, the SMA, the insula, the basal ganglia, the thalamus, and the cerebellum.

Evolution of Language

Young children acquire language easily and quickly when exposed to it. This observation led Charles Darwin to suggest in his book The Descent of Man, and Selection in Relation to Sex that humans have a biological predisposition toward language. The evolutionary origins of language remain unknown, though there is no shortage of theories. Indeed, Noam Chomsky took the view in 1975 that language was so different from the communication systems used by other animals that it could not be explained in terms of natural selection. Steven Pinker and Paul Bloom countered in a 1990 article that only natural selection could have produced the complex structures of language. Views also diverge on when language emerged, on whether what needs to be explained is an underlying cognitive mechanism specific to language or a cooperative social behavior, and on what crucial evolutionary problems had to be solved before language could emerge (Sterelny, 2012).

Shared Intentionality

Communication is the transfer of information by speech, signals, writing, or behavior. The function of human language is to influence the behavior of others by changing what they know, think, believe, or desire (Grice, 1957), and we tend to think communication is intentional. When we are looking for the origins of language, however, we cannot assume that communication sprang up in this form. Animal communication is more specifically defined as any behavior by one animal that affects the current or future behavior of another animal, intentional or otherwise.

A well-known series of studies in animal communication was conducted by Robert Seyfarth and Dorothy Cheney on vervet monkeys in Kenya (Seyfarth et al., 1980). These monkeys have different alarm calls for snakes, leopards, and predatory birds. Monkeys that hear an alarm call for a snake stand up and look down; with a leopard call, they scamper into the trees; and with a bird call, they run from the exposed ends of the branches and huddle by the trunk. Formerly it was thought that animal vocalizations were exclusively emotional—and indeed, they most likely originated as such. A vervet, however, does not always make an alarm call, seldom calls when it is alone, and is more likely to call when it is with kin than with nonkin. The calls, then, are not simply an automatic emotional reaction.

If a call is to provide information, it has to be specific (the same call can’t be used for several different reasons) and informative—it has to be made whenever a specific situation arises (Seyfarth & Cheney, 2003a). Thus, even though a scream may be an emotional reaction, if it is specific, it can convey information other than the emotion (Premack, 1972). Natural selection favors callers who vocalize to affect the behavior of listeners and listeners who acquire information from vocalizations (Seyfarth & Cheney, 2003b). The two do not need to be linked by intention originally, and indeed, vervet monkeys don’t appear to attribute mental states to others (Seyfarth & Cheney, 1986). Most animal studies suggest that although animal vocalizations may result in a change of another’s behavior, this outcome is unintentional (see Seyfarth & Cheney, 2003a).

Alarm calls have since been found in many other monkey species and in nonprimate species, including meerkats (Manser et al., 2001) and chickadees (Templeton et al., 2005). The Diana monkeys of West Africa comprehend the alarm calls of another species in the area, the Campbell’s monkey (Zuberbühler, 2001). They also understand that if the alarm call is preceded by a “boom” call, the threat is not as urgent. Calls thus appear to be strung together according to a simple grammar, indicating that these communications include both syntax and semantics (meaning). The communication skills of these monkeys are impressive, but such communication remains quite different from human language.

Studying vocalization, however, may not be the best place to look for the precursors of human language. Michael Tomasello, a researcher at the Max Planck Institute for Evolutionary Anthropology, points out that among primates, especially the great apes, the function of communication differs depending on whether it is vocal or gestural (Tomasello, 2007). In general, vocal calls in primates tend to be involuntary signals: They are associated with a specific emotional state, produced in response to specific stimuli, and broadcast to the surrounding group. They are inflexible. By contrast, gestures are flexible; they are used in non-urgent contexts to initiate activities such as play and grooming with a specific individual, and some are socially learned by gorillas (Pika et al., 2003), chimps (Liebal et al., 2004), and bonobos (Pika et al., 2005). Tomasello emphasizes that, unlike vocalizing, gesturing requires knowing the attentional state of the communicating partner: There is no point in making a gesture if no one is paying attention to you. He concludes that primate gestures, which are flexible, socially learned, and dependent on shared attention, are more like human language than primate vocalizations, which are typically inflexible, automatic, and independent of shared attention. Tomasello therefore suggests that language evolved from gestural communication.

Interestingly, nonhuman primates have little cortical control over vocalization but excellent cortical control over the hands and arms (Ploog, 2002). From these findings and what we know about primate anatomy, it is not surprising that attempts to teach nonhuman primates to speak have failed. Teaching them to communicate manually has been more successful. For instance, Washoe, a chimp, learned a form of manual sign language (Gardner & Gardner, 1969); and Kanzi, a bonobo, learned to point to abstract visual symbols (lexigrams) on a keyboard (Savage-Rumbaugh & Lewin, 1994).

Kanzi is able to match pictures, objects, lexigrams, and spoken words. He freely uses the keyboard to ask for objects he wants. He can indicate a place with a lexigram and then go there. He can generalize a specific reference; for instance, he uses the lexigram for bread to mean all breads including tacos. He can listen to an informational statement and adjust what he is doing using the new information. Thus, Kanzi is able to understand signs in a symbolic way.

According to Chomsky, language cannot be explained in terms of learned sequences; it depends on rules, and its most distinctive feature is that it is generative, meaning that it allows us to create and understand an endless variety of novel sequences. The arrangement of the words and the meaning of the sequence depend on the conventional rules of the grammar.
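As a concrete illustration of what “generative” means, the short Python sketch below uses a toy set of rewrite rules to produce novel word sequences. The rules and vocabulary are invented for illustration and are not a serious model of any natural language’s grammar.

```python
# A toy context-free grammar: a handful of rules can produce an unbounded
# number of novel, well-formed sequences. Rules and words are illustrative.

import random

GRAMMAR = {
    "S":  [["NP", "VP"]],
    "NP": [["the", "N"], ["the", "N", "that", "VP"]],   # self-reference makes the grammar unbounded
    "VP": [["V", "NP"]],
    "N":  [["dog"], ["snake"], ["monkey"]],
    "V":  [["bites"], ["chases"], ["sees"]],
}

def expand(symbol):
    """Rewrite a symbol until only words (terminals) remain."""
    if symbol not in GRAMMAR:        # terminal word
        return [symbol]
    production = random.choice(GRAMMAR[symbol])
    return [word for part in production for word in expand(part)]

for _ in range(3):
    print(" ".join(expand("S")))
```

Because one of the rules refers back to itself (a noun phrase can contain another noun phrase), even this tiny rule set can generate an unbounded number of distinct sentences, which is the sense in which a rule-based grammar is generative.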

Now consider that Kanzi understands the difference between “Make the doggie bite the snake” and “Make the snake bite the doggie,” and he demonstrates his understanding by acting out the sentences with stuffed animals. Seventy percent of the time, he responds correctly to spoken sentences (from a concealed instructor) that he has never heard before, such as “Squeeze the hot dog.” He is the first nonhuman to demonstrate either of these abilities. Kanzi uses both the keyboard and gesture, sometimes combining the two according to an arbitrary rule (a syntax) that he has developed himself. For instance, to specify an action, he will use a lexigram first and then a pointing gesture to specify the agent, always in that order, even if he has to walk across the room to point to the lexigram first and then return to indicate the agent. This is not too surprising, since primates use combinations of gestures, vocalizations, and facial expressions.

Michael Corballis has also reached the conclusion that language began with gestures. He has proposed that generative language evolved, perhaps beginning with Homo habilis, as a system of manual gestures and switched to a predominantly vocal system only with H. sapiens sapiens (Corballis, 1991, 2009). Giacomo Rizzolatti and Michael Arbib (1998) suggest that language arose from a combination of gesture and facial movements, speculating that mirror neurons are a piece of the language puzzle. Mirror neurons, you will recall, were first discovered in area F5 of the monkey. The dorsal portion of F5 is involved with hand movements, and the ventral portion with movements of the mouth and larynx. Tantalizingly, area F5 is the homolog of Brodmann’s area (BA) 44, a portion of Broca’s area in the human. BA44 is involved not only in speech production and larynx control but also in complex hand movements, as well as in sensorimotor learning and integration (Binkofski & Buccino, 2004).

Many studies in humans show that hand gestures and language are connected. For example, one study found that congenitally blind speakers and sighted speakers gestured at the same rate as they spoke and used the same range of gesture forms. The blind speakers gestured even when speaking to another blind person, which suggests that gestures are tightly coupled to the act of speaking (Iverson & Goldin-Meadow, 1998). Another study followed the progress of congenitally deaf Nicaraguan children who had had no previous contact with one another and were brought together in a school. Although the school was geared toward teaching them to speak orally, the children gradually developed, on their own, a fully communicative hand gesture language, complete with syntax (Senghas, 1995).

Initially this close association of hand and mouth may have been related to eating, but later it could have expanded to gestural and vocal language. There is some evidence for this proposal. In macaque monkeys, neurons in the lateral part of F5 have been found to activate with conditioned vocalizations, that is, voluntary coo-calls that the monkeys were trained to make (Coudé et al., 2011). We know that the left hemisphere controls the motor movements of the right side of the body in both humans and the great apes. Chimpanzees exhibit preferential use of the right hand in gestural communication, both with other chimps and with humans (Meguerditchian et al., 2010), but not when making noncommunicative gestures. This behavior is also seen in captive baboons (Meguerditchian & Vauclair, 2006; Meguerditchian et al., 2010), suggesting that the emergence of language and its typical left lateralization may have arisen from a left-lateralized gestural communication system in the common ancestor of baboons, chimps, and humans.

Indeed, the cortical location of signing ability has been studied in the few congenitally deaf signers who have had right- or left-hemisphere lesions. Interestingly, patients with left-hemisphere lesions involving the language areas of the temporal and frontal lobes were “aphasic,” that is, impaired in sign production and comprehension, whereas those with right-hemisphere lesions in similar areas were not. The latter were, however, impaired in emotional processing and expression, as might be expected given impaired prosody (for a full review of brain organization in deaf signers, see Bellugi et al., 2010).

A few brain imaging studies provide support for the theory that gestures and language are connected. In the macaque monkey, the rostral part of the inferior parietal lobule, an area involved with the control of hand and orofacial action (and homologous to the human supramarginal gyrus), is linked via a distinct branch of the superior longitudinal fasciculus with area 44 (homologous with part of Broca’s area) and with the ventral part of the premotor cortex, which controls the orofacial musculature (Petrides & Pandya, 2009). This circuit may be analogous to the dorsal stream in humans (see Chapter 6), which is involved in mapping sound to motor articulation. These monkeys also recognize the correspondence between auditory “coo” and “threat” calls and the facial expressions that accompany them.

In addition, a PET study of chimps found that when they made a communicative gesture or an atypical novel sound while begging for food, the left inferior frontal gyrus, a region considered homologous to Broca’s area, was activated (Taglialatela et al., 2008). What is an atypical sound? First described in 1991 (Marshall et al., 1991), atypical sounds are produced only by some captive chimps. Three have been identified: a “raspberry,” an “extended grunt,” and a “kiss.” These sounds are socially learned and selectively produced to gain the attention of an inattentive human (Hopkins et al., 2007). This behavior suggests that chimps have some voluntary control over certain vocalizations and facial expressions, as well as a link between sensory perception and motor action. These sounds are unlike the species-typical vocalizations that are tied to a specific emotional state (Goodall, 1986) and context (Pollick & de Waal, 2007).

The left-hemisphere dominance for language may also be present in the chimpanzee. In humans, the left lateralization of speech is actually visible: The right side of the mouth opens first and opens wider during speech, whereas the left side gears up first for emotional expressions. In two large colonies of captive chimps, the same pattern was found: left-hemisphere dominance for the production of learned attention-getting sounds and right-hemisphere dominance for the production of species-typical vocalizations (Losin et al., 2008; Wallez et al., 2011). These studies all suggest that the left hemisphere’s voluntary control of hand gestures (area F5) and vocalizations may have combined into an integrative system.

Chomsky was on the mark when he observed that human language is very different from the communication systems of other animals. The notion that it appeared spontaneously, without evolutionary precursors, has not held up, however: Rudimentary roots of human language can be observed in our primate relatives, in both their behavior and their brain structures.


TAKE-HOME MESSAGES

- Animal communication, such as the alarm calls of vervet monkeys, can convey specific information without the caller intending to inform the listener.
- Primate gestures, unlike most primate vocalizations, are flexible, socially learned, and dependent on shared attention, leading Tomasello and others to propose that human language evolved from gestural communication.
- Area F5 in the monkey, which contains mirror neurons and controls hand and mouth movements, is the homolog of BA44, part of Broca’s area in humans.
- Chimpanzees show left-hemisphere dominance for communicative gestures and learned attention-getting sounds, suggesting that the left lateralization of language may derive from a lateralized gestural communication system in a common ancestor.

HOW THE BRAIN WORKS

Genetic Components of Language

In 1990, a report was published about the KE family in England. Half the family members, spanning three generations, suffered from a severe speech and language disorder (Hurst et al., 1990). Their verbal and oral dyspraxia closely resembled that seen in Broca’s aphasia. Since that time, the family has been studied extensively (e.g., Vargha-Khadem et al., 1995). In a direct comparison with patients with aphasia, affected family members were found to be equally impaired on tests of grammatical competence, of manipulating inflectional morphology (distinctions among forms of the same lexeme, such as the verb endings in paints, painting, painted), and of derivational morphology (distinctions between different but related lexemes, such as painting and paintings). There were differences too. In tests of word and non-word repetition, the aphasic patients could repeat the words but not the non-words, whereas the affected KE family members could do neither, suggesting that the aphasic patients had learned the articulation patterns of real words before the onset of their aphasia. In semantic, phonemic, and written fluency, the KE family members were the less impaired group (Vargha-Khadem et al., 2005). Extensive behavioral testing suggested at least one core deficit: orofacial dyspraxia. Whether the semantic and other cognitive impairments were secondary to this deficit or were core deficits themselves remains undetermined.

The neural basis of these abnormalities was sought using structural and functional imaging. Bilateral abnormalities were seen in several motor-related regions. For instance, affected family members had a 25% reduction in the volume of the caudate nucleus. Abnormally low levels of gray matter were also found in other motor-related areas, including the inferior frontal gyrus (Broca’s area), the precentral gyrus, the frontal pole, and the cerebellum, while abnormally high levels were seen in the superior temporal gyrus (Wernicke’s area), the angular gyrus, and the putamen. Functional MRI studies using silent verb generation, spoken verb generation, and word repetition tasks revealed that the affected members showed atypically posterior and bilateral activations, in regions not generally associated with language function (reviewed in Vargha-Khadem et al., 2005).

By looking at the family tree, researchers determined that the disorder was inherited in a simple fashion: It resulted from a defect in a single autosomal dominant gene (Hurst et al., 1990). A person carrying the mutation has a 50% chance of passing it to each of his or her offspring.
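A quick way to see where the 50% figure comes from is to enumerate the possible allele combinations. The short Python sketch below assumes, as described above, one heterozygous affected parent and one unaffected parent; the allele labels are arbitrary.

```python
# Why an autosomal dominant mutation is inherited with probability 1/2:
# the affected parent carries one mutant allele (A) and one normal allele (a),
# and the unaffected parent carries two normal alleles.

from itertools import product

affected_parent = ("A", "a")     # heterozygous carrier of the dominant mutation
unaffected_parent = ("a", "a")   # two normal alleles

combinations = list(product(affected_parent, unaffected_parent))  # 4 equally likely genotypes
affected = sum("A" in genotype for genotype in combinations)
print(f"{affected} of {len(combinations)} possible offspring genotypes carry the mutation "
      f"({affected / len(combinations):.0%})")
```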

The hunt for the gene commenced at the Wellcome Trust Centre for Human Genetics at the University of Oxford. Researchers found a single base-pair mutation (an adenine substituted for a guanine) in the FOXP2 gene sequence of the affected members of the KE family (Lai et al., 2001). This mutation causes the amino acid histidine to be substituted for arginine in the FOXP2 protein. The FOX genes are a large family of genes, and this particular arginine is invariant among all of them, suggesting that it plays a crucial functional role.
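The standard genetic code shows how a single guanine-to-adenine change can turn an arginine codon into a histidine codon. In the Python sketch below, the codon CGC is chosen purely for illustration; it is not claimed to be the exact codon affected in FOXP2.

```python
# A guanine-to-adenine change at the second position of an arginine codon
# yields a histidine codon under the standard genetic code.

CODON_TO_AMINO_ACID = {"CGC": "arginine", "CAC": "histidine"}  # standard-code entries

original_codon = "CGC"                               # codes for arginine
mutated_codon = original_codon.replace("G", "A", 1)  # single G -> A substitution
print(f"{original_codon} ({CODON_TO_AMINO_ACID[original_codon]}) -> "
      f"{mutated_codon} ({CODON_TO_AMINO_ACID[mutated_codon]})")
```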

How can one little change do so much damage? FOX genes code for proteins that are transcription factors, which act as switches that turn the expression of other genes on or off. Mutations in FOX genes can therefore cause phenotypes as varied as cancer, glaucoma, or, as we see here in the case of the FOXP2 gene, language disorders.

If the FOXP2 gene is so important in the development of language, is it unique to humans? This question is complicated, and its complexity speaks to the huge difference between talking about genes and talking about the expression of genes. The FOXP2 gene is present in a broad range of animals. The protein encoded by the FOXP2 gene differs at five amino acids between humans and birds, at three between mouse and human, and at only two between humans and chimpanzees or gorillas. The sequencing of Neandertal DNA revealed that Neandertals had the same FOXP2 gene that we have (Krause et al., 2007). These researchers also found that the gene changes lie on the common modern human haplotype (DNA sequences that sit next to each other on a chromosome and are transmitted together), which had earlier been shown to have been subject to a selective sweep (Enard et al., 2002; Zhang et al., 2002). A selective sweep means what it sounds like: The gene variant produced a characteristic that gave its owners a clear competitive advantage, so those who carried it had more offspring, and the variant swept through the population until it became the prevalent version. These findings support the idea that these genetic changes and the selective sweep predate the common ancestor of the modern human and Neandertal populations, which existed about 300,000–400,000 years ago. Thus, humans do have a unique version of the FOXP2 gene that produces unique FOXP2 proteins.
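The dynamics of a selective sweep can be sketched with a toy calculation: a variant that confers even a modest reproductive advantage rises from rarity to near-universality in a relatively small number of generations. In the Python sketch below, the fitness advantage, starting frequency, and deterministic single-locus model are illustrative assumptions only, not estimates for FOXP2.

```python
# Toy deterministic model of a selective sweep: track the frequency of a
# beneficial variant across generations. All parameter values are invented.

s = 0.05      # assumed relative fitness advantage of carriers
p = 0.001     # assumed starting frequency of the new variant

generations = 0
while p < 0.99:
    p = p * (1 + s) / (p * (1 + s) + (1 - p))   # standard single-locus selection update
    generations += 1

print(f"variant exceeds 99% frequency after about {generations} generations")
```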

Is this the gene that codes for speech and language? Not necessarily. What we have is a uniquely human modification of a gene that seems to influence human brain phenotype (Preuss, 2012). Many questions remain. For instance, what genes are regulated by FOXP2? A lot. Genes involved with morphogenesis, intracellular signaling, cation homeostasis, neuron outgrowth, axonal morphology, dendritic branching, calcium mobilization and concentration, and learning have been identified. Although this gene has been extensively studied, there is still no direct connection to human speech or language. The neuroscientist Todd Preuss at Yerkes National Primate Research Center observes that the problem with tying FOXP2 to language is that we are trying to relate a gene that has many functions to a complex, high-level phenotype. This effort is probably not realistic, because most phenotypes arise through the interactions of multiple genes, and most genes influence multiple phenotypes—lessons learned from population genetics (Preuss, 2012). Most likely the evolution of the FOXP2 gene is one of many changes on the pathway to language function.


 

Summary

Language is unique among mental functions in that only humans possess a true language system. How is language organized in the human brain, and what can this functional and anatomical organization tell us about the cognitive architecture of the language system? We have known for more than a century that regions around the Sylvian fissure of the dominant left hemisphere participate in language comprehension and production. Classical models, however, are insufficient for understanding the computations that support language. Newer formulations based on detailed analysis of the effects of neurological lesions (supported by improvements in structural imaging), functional neuroimaging, human electrophysiology, transcranial magnetic stimulation (TMS), and computational modeling now provide some surprising modifications of older models. The human language system is complex, and much remains to be learned about how the biology of the brain enables the rich speech and language comprehension that characterize our daily lives. The future of language research is promising as psycholinguistic models combine with neuroscience to elucidate the neural code for this uniquely human mental faculty.

Key Terms

agrammatic aphasia (p. 472)

alexia (p. 486)

anomia (p. 469)

aphasia (p. 472)

apraxia (p. 472)

arcuate fasciculus (p. 474)

Broca’s aphasia (p. 472)

Broca’s area (p. 471)

conduction aphasia (p. 474)

dysarthria (p. 472)

global aphasia (p. 474)

lexical access (p. 475)

lexical integration (p. 475)

lexical selection (p. 475)

mental lexicon (p. 475)

morpheme (p. 476)

N400 response (p. 490)

P600 response (p. 491)

phoneme (p. 476)

phonology (p. 487)

semantic (p. 476)

semantic paraphasia (p. 477)

Sylvian fissure (p. 471)

syntactic parsing (p. 490)

syntax (p. 472)

Wernicke’s aphasia (p. 473)

Wernicke’s area (p. 471)

Thought Questions

  1. How might the mental lexicon be organized in the brain? Would we expect to find it localized in a particular spot in cortex? If not, why not?
  2. At what stage of input processing are the comprehension of spoken and of written language the same, and where must they be different? Are there any exceptions to this rule?
  3. Describe the route that an auditory speech signal might take in the cortex, from perceptual analysis to comprehension.
  4. What evidence exists for the role of the right hemisphere in language processing? If the right hemisphere has a role in language, what might that role be?
  5. Can knowledge of the world around you affect the way you process and understand words?
  6. Describe the anatomy and circuitry of the left perisylvian language system.

Suggested Reading

Ackermann, H., & Ziegler, W. (2010). Brain mechanisms underlying speech motor control. In W. J. Hardcastle, J. Laver, & F. E. Gibbon (Eds.), The handbook of phonetic sciences (2nd ed., pp. 202–250). Malden, MA: Wiley-Blackwell.

Friederici, A. D. (2012). The cortical language circuit: From auditory perception to sentence comprehension. Trends in Cognitive Science, 16(5), 262–268.

Hagoort, P. (2013). MUC (Memory, Unification, Control) and beyond. Frontiers in Psychology (forthcoming).

Kaan, E., & Swaab, T. (2002). The brain circuitry of syntactic comprehension. Trends in Cognitive Sciences, 6, 350–356.

Levelt, W. J. M. (2001). Spoken word production: A theory of lexical access. Proceedings of the National Academy of Sciences, USA, 98, 13464–13471.

Pinker, S. (1994). The language instinct. New York: Morrow.

Poeppel, D., Emmorey, K., Hickok, G., & Pylkkänen, L. (2012). Towards a new neurobiology of language. Journal of Neuroscience, 32, 14125–14131.

Price, C. J. (2012). A review and synthesis of the first 20 years of PET and fMRI studies of heard speech, spoken language and reading. Neuroimage, 62, 816–847.

Rauschecker, J. P. (2013). Processing streams in auditory cortex. In Y. E. Cohen, A. N. Popper, & R. R. Fay (Eds.), Neural correlates of auditory cognition (Vol. 45 in the series Springer Handbook of Auditory Research, pp. 7–43). New York: Springer.