Theories of Meaning

The term “theory of meaning” has figured, in one way or another, in a great number of philosophical disputes over the last century. Unfortunately, this term has also been used to mean a great number of different things. In this entry, the focus is on two sorts of “theory of meaning”. The first sort of theory—a semantic theory—is a theory which assigns semantic contents to expressions of a language. The second sort of theory—a foundational theory of meaning—is a theory which states the facts in virtue of which expressions have the semantic contents that they have. Following a brief introduction, these two kinds of theory are discussed in turn.

1. Two Kinds of Theory of Meaning

In “General Semantics”, David Lewis wrote

I distinguish two topics: first, the description of possible languages or grammars as abstract semantic systems whereby symbols are associated with aspects of the world; and, second, the description of the psychological and sociological facts whereby a particular one of these abstract semantic systems is the one used by a person or population. Only confusion comes of mixing these two topics. (Lewis 1970: 19)

Lewis was right. Even if philosophers have not consistently kept these two questions separate, there clearly is a distinction between the questions “What is the meaning of this or that symbol (for a particular person or group)?” and “In virtue of what facts about that person or group does the symbol have that meaning?”

Corresponding to these two questions are two different sorts of theory of meaning. One sort of theory of meaning—a semantic theory—is a specification of the meanings of the words and sentences of some symbol system. Semantic theories thus answer the question, “What is the meaning of this or that expression?” A distinct sort of theory—a foundational theory of meaning—tries to explain what about some person or group gives the symbols of their language the meanings that they have. To be sure, the shape of a correct semantic theory places constraints on the correct foundational theory of meaning, and vice versa; but that does not change the fact that semantic theories and foundational theories are simply different sorts of theories, designed to answer different questions.

To see the distinction between semantic theories and foundational theories of meaning, it may help to consider an analogous one. Imagine an anthropologist specializing in table manners sent out to observe a distant tribe. One task the anthropologist clearly might undertake is simply to describe the table manners of that tribe—to describe the different categories into which members of the tribe place actions at the table, and to say which sorts of actions fall into which categories. This would be analogous to the task of the philosopher of language interested in semantics; her job is to say what different sorts of meanings expressions of a given language have, and which expressions have which meanings.

But our anthropologist might also become interested in the nature of manners; he might wonder how, in general, one set of rules of table manners comes to be the system of etiquette governing a particular group. Since presumably the fact that a group obeys one system of etiquette rather than another is traceable to something about that group, the anthropologist might put his new question by asking,

In virtue of what facts about a person or group does that person or group come to be governed by a particular system of etiquette, rather than another?

Our anthropologist would then have embarked upon the analogue of the construction of a foundational theory of meaning: he would then be interested, not in which etiquette-related properties particular action types have in a certain group, but rather in the question of how action types can, in any group, come to acquire properties of this sort. [ 1 ] Our anthropologist might well be interested in both sorts of questions about table manners; but they are, pretty clearly, different questions. Just so, semantic theories and foundational theories of meaning are, pretty clearly, different sorts of theories.

The term “theory of meaning” has, in the recent history of philosophy, been used to stand for both semantic theories and foundational theories of meaning. As this has obvious potential to mislead, in what follows I’ll avoid the term which this article is meant to define and stick instead to the more specific “semantic theory” and “foundational theory of meaning”. “Theory of meaning” simpliciter can be understood as ambiguous between these two interpretations.

Before turning to discussion of these two sorts of theories, it is worth noting that one prominent tradition in the philosophy of language denies that there are facts about the meanings of linguistic expressions. (See, for example, Quine 1960 and Kripke 1982; for critical discussion, see Soames 1997.) If this sort of skepticism about meaning is correct, then there is neither a true semantic theory nor a true foundational theory of meaning to be found, since the relevant sort of facts simply are not around to be described or analyzed. Discussion of these skeptical arguments is beyond the scope of this entry, so in what follows I’ll simply assume that skepticism about meaning is false.

2. Semantic Theories

The task of explaining the main approaches to semantic theory in contemporary philosophy of language might seem to face an in-principle stumbling block. Given that no two languages have the same semantics—no two languages are comprised of just the same words, with just the same meanings—it may seem hard to see how we can say anything about different views about semantics in general, as opposed to views about the semantics of this or that language. This problem has a relatively straightforward solution. While it is of course correct that the semantics for English is one thing and the semantics for French something else, most assume that the various natural languages should all have semantic theories of (in a sense to be explained) the same form. The aim of what follows will, accordingly, be to introduce the reader to the main approaches to natural language semantics—the main views about the right form for a semantics for a natural language to take—rather than to provide a detailed examination of the various views about the semantics of some particular expression. (For an overview, see the entry on word meaning . For discussion of issues involving particular expression types, see the entries on names , quantifiers and quantification , descriptions , propositional attitude reports , and natural kinds .)

One caveat before we get started: before a semantic theorist sets off to explain the meanings of the expressions of some language, she needs a clear idea of what she is supposed to explain the meaning of . This might not seem to present much of a problem; aren’t the bearers of meaning just the sentences of the relevant language, and their parts? This is correct as far as it goes. But the task of explaining what the semantically significant parts of a sentence are, and how those parts combine to form the sentence, is as complex as semantics itself, and has important consequences for semantic theory. Indeed, most disputes about the right semantic treatment of some class of expressions are intertwined with questions about the syntactic form of sentences in which those expressions figure. Unfortunately, discussion of theories of this sort, which attempt to explain the syntax, or logical form, of natural language sentences, is well beyond the scope of this entry. As a result, figures like Richard Montague, whose work on syntax and its connection to semantics has been central to the development of semantic theory over the past few decades, are passed over in what follows. (Montague’s essays are collected in Montague 1974.) For an excellent introduction to the connections between syntax and semantics, see Heim & Kratzer (1998); for an overview of the relations between philosophy of language and several branches of linguistics, see Moss (2012).

There are a wide variety of approaches to natural language semantics. My strategy in what follows will be to begin by explaining one prominent family of approaches to semantics which developed over the course of the twentieth century and is still prominently represented in contemporary work in semantics, both in linguistics and in philosophy. For lack of a better term, let's call these types of semantic theories classical semantic theories . (As in discussions of classical logic, the appellation “classical” is not meant to suggest that theories to which this label is applied are to be preferred over others.) Classical semantic theories agree that sentences are (typically) true or false, and that whether they are true or false depends on what information they encode or express. This “information” is often called “the proposition expressed by the sentence”. The job of a semantic theory, according to the classical theorist, is at least in large part to explain how the meanings of the parts of the sentence, along with the context in which the sentence is used, combine to determine which proposition the sentence expresses in that context (and hence also the truth conditions of the sentence, as used in that context).

Classical semantic theories are discussed in §2.1. In §§2.1.1–4 the theoretical framework common to classical semantic theories is explained; in §§2.1.5–7 the differences between three main versions of classical semantic theories are explained. In §2.2 there is a discussion of the alternatives to classical semantic theories. In §2.3 a few general concluding questions are discussed; these are questions semantic theorists face which are largely, though not completely, orthogonal to one’s view about the form which a semantic theory ought to take.

2.1 Classical Semantic Theories

The easiest way to understand the various sorts of classical semantic theories is by beginning with another sort of theory: a theory of reference.

A theory of reference is a theory which pairs expressions with the contribution those expressions make to the determination of the truth-values of sentences in which they occur. (Though later we will see that this view of the reference of an expression must be restricted in certain ways.)

This construal of the theory of reference is traceable to Gottlob Frege’s attempt to formulate a logic sufficient for the formalization of mathematical inferences (see especially Frege 1879 and 1892). The construction of a theory of reference of this kind is best illustrated by beginning with the example of proper names. Consider the following sentences:

  • (1) Barack Obama was the 44th president of the United States.
  • (2) John McCain was the 44th president of the United States.

(1) is true, and (2) is false. Obviously, this difference in truth-value is traceable to some difference between the expressions “Barack Obama” and “John McCain”. What about these expressions explains the difference in truth-value between these sentences? It is very plausible that it is the fact that “Barack Obama” stands for the man who was in fact the 44th president of the United States, whereas “John McCain” stands for a man who was not. This suggests that the reference of a proper name—its contribution to the determination of truth conditions of sentences in which it occurs—is the object for which that name stands. (While this is plausible, it is not uncontroversial that the purpose of a name is to refer to an individual; see Graff Fara (2015) and Jeshion (2015) for arguments on opposite sides of this issue.)

Given this starting point, it is a short step to some conclusions about the reference of other sorts of expressions. Consider the following pair of sentences:

  • (3) Barack Obama is a Democrat.
  • (4) Barack Obama is a Republican.

Again, the first of these is true, whereas the second is false. We already know that the reference of “Barack Obama” is the man for which the name stands; so, given that reference is power to affect truth-value, we know that the reference of predicates like “is a Democrat” and “is a Republican” must be something which combines with an object to yield a truth-value. Accordingly, it is natural to think of the reference of predicates of this sort as functions from objects to truth-values. The reference of “is a Democrat” is that function which returns the truth-value “true” when given as input an object which is a member of the Democratic party (and the truth-value “false” otherwise), whereas the reference of “is a Republican” is a function which returns the truth-value “true” when given as input an object which is a member of the Republican party (and the truth-value “false” otherwise). This is what explains the fact that (3) is true and (4) false: Obama is a member of the Democratic party, and is not a member of the Republican party.

Matters get more complicated, and more controversial, as we extend this sort of theory of reference to cover more and more of the types of expressions we find in natural languages like English. (For an introduction, see Heim and Kratzer (1998).) But the above is enough to give a rough idea of how one might proceed. For example, some predicates, like “loves”, combine with two names, rather than one, to form a sentence. So the reference of two-place predicates of this sort must be something which combines with a pair of objects to determine a truth-value—perhaps, that function from ordered pairs of objects to truth-values which returns the truth-value “true” when given as input a pair of objects whose first member loves the second member, and “false” otherwise.
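
To make the compositional picture concrete, here is a minimal sketch in Python. It is my own illustration rather than part of the entry; the party extensions and the “loves” relation are invented stand-ins. Names are paired with objects, and predicates with functions from objects, or pairs of objects, to truth-values.

```python
# Toy theory of reference: names refer to objects; predicates refer to
# functions from objects (or pairs of objects) to truth-values.

OBAMA, MCCAIN = "Obama", "McCain"

names = {"Barack Obama": OBAMA, "John McCain": MCCAIN}

DEMOCRATS = {OBAMA}        # invented extensions, for illustration only
REPUBLICANS = {MCCAIN}
LOVES = set()              # invented: no pairs stand in the "loves" relation here

predicates = {
    "is a Democrat":   lambda x: x in DEMOCRATS,
    "is a Republican": lambda x: x in REPUBLICANS,
}
two_place_predicates = {
    "loves": lambda x, y: (x, y) in LOVES,
}

def truth_value(name, predicate):
    """Combine the reference of a name with the reference of a one-place predicate."""
    return predicates[predicate](names[name])

print(truth_value("Barack Obama", "is a Democrat"))    # True, as with (3)
print(truth_value("Barack Obama", "is a Republican"))  # False, as with (4)
print(two_place_predicates["loves"](OBAMA, MCCAIN))    # a two-place case: False
```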

So let’s suppose that we have a theory of reference for a language, in the above sense. Would we then have a satisfactory semantic theory for the language?

Some plausible arguments indicate that we would not. To adopt an example from Quine (1970 [1986], pp. 8–9), let’s assume that the set of animals with hearts (which Quine, for convenience, calls “cordates” – not to be confused with “chordates”) is the same as the set of animals with kidneys (which Quine calls “renates”). Now, consider the pair of sentences:

  • (5) All cordates are cordates.
  • (6) All cordates are renates.

Given our assumption, both sentences are true. Moreover, from the point of view of the theory of reference, (5) and (6) are just the same: they differ only in the substitution of “renates” for “cordates”, and these expressions have the same reference (because they stand for the same function from objects to truth-values).

All the same, there is clearly an intuitive difference in meaning between (5) and (6); the sentences seem, in some sense, to say different things. The first seems to express the trivial, boring thought that every creature with a heart is a creature with a heart, whereas the second expresses the non-trivial, potentially informative claim that every creature with a heart also has a kidney. This suggests that there is an important difference between (5) and (6) which our theory of reference simply fails to capture.

Examples of the same sort can be generated using pairs of expressions of other types which share a reference, but intuitively differ in meaning; for example, “Clark Kent” and “Superman”, or (an example famously discussed by Frege (1892 [1960])) “the Morning Star” and “the Evening Star”.

This might seem a rather weak argument for the incompleteness of the theory of reference, resting as it does on intuitions about the relative informativeness of sentences like (5) and (6). But this argument can be strengthened by embedding sentences like (5) and (6) in more complex sentences, as follows:

  • (7) John believes that all cordates are cordates.
  • (8) John believes that all cordates are renates.

(7) and (8) differ only with respect to the italic expressions and, as we noted above, these expressions have the same reference. Despite this, it seems clear that (7) and (8) could differ in truth-value: someone could know that all cordates have a heart without having any opinion on the question of whether all cordates have a kidney. But that means that the references of expressions don’t even do the job for which they were introduced: they don’t explain the contribution that expressions make to the determination of the truth-value of all sentences in which they occur. (One might, of course, still think that the reference of an expression explains its contribution to the determination of the truth-value of a suitably delimited class of simple sentences in which the expression occurs.) If we are to be able to explain, in terms of the properties of the expressions that make them up, how (7) and (8) can differ in truth-value, then expressions must have some other sort of value, some sort of meaning, which goes beyond reference.

(7) and (8) are called belief ascriptions , for the obvious reason that they ascribe a belief to a subject. Belief ascriptions are one sort of propositional attitude ascription —other types include ascriptions of knowledge, desire, or judgment. As will become clear in what follows, propositional attitude ascriptions have been very important in recent debates in semantics. One of the reasons why they have been important is exemplified by (7) and (8). Because these sentences can differ in truth-value despite the fact that they differ only with respect to the italic words, and these words both share a reference and occupy the same place in the structure of the two sentences, we say that (7) and (8) contain a non-extensional context : roughly, a “location” in the sentence which is such that substitution of terms which share a reference in that location can change truth-value. (They’re called “non-extensional contexts” because “extension” is another term for “reference”.)

We can give a similar argument for the incompleteness of the theory of reference based on the substitution of whole sentences. A theory of reference assigns to subsentential expressions values which explain their contribution to the truth-values of sentences; but to those sentences, it only assigns “true” or “false”. But consider a pair of sentences like

  • (9) Mary believes that Barack Obama was the president of the United States.
  • (10) Mary believes that John Key was the prime minister of New Zealand.

Because both of the italic sentences are true, (9) and (10) are a pair of sentences which differ only with respect to substitution of expressions (namely, the italic sentences) with the same reference. Nonetheless, (9) and (10) could plainly differ in truth-value.

This seems to show that a semantic theory should assign some value to sentences other than a truth-value. Another route to this conclusion is the apparent truth of claims of the following sort:

  • There are three things that John believes about Indiana, and they are all false.
  • There are many necessary truths which are not a priori, and my favorite sentence expresses one of them.
  • To get an A you must believe everything I say.

Sentences like these seem to show that there are things which are the objects of mental states like belief, the bearers of truth and falsity as well as of modal properties like necessity and possibility and epistemic properties like a prioricity and a posterioricity, and the things expressed by sentences. What are these things? The theory of reference provides no answer.

These entities are often called propositions . Friends of propositions aim both to provide a theory of these entities, and, in so doing, also to solve the two problems for the theory of reference discussed above: (i) the lack of an explanation for the fact that (5) is trivial while (6) is not, and (ii) the fact (exemplified by (7)/(8) and (9)/(10) ) that sentences which differ only in the substitution of expressions with the same reference can differ in truth-value.

A theory of propositions thus does not abandon the theory of reference, as sketched above, but simply says that there is more to a semantic theory than the theory of reference. Subsentential expressions have, in addition to a reference, a content . The contents of sentences—what sentences express—are known as propositions .

The natural next question is: What sorts of things are contents? Below I’ll discuss some of the leading answers to this question. But in advance of laying out any theory about what contents are, we can say some general things about the role that contents are meant to play.

First, what is the relationship between content and reference? Let’s examine this question in connection with sentences; here it amounts to the question of the relationship between the proposition a sentence expresses and the sentence’s truth-value. One point brought out by the example of (9) and (10) is that two sentences can express different propositions while having the same truth-value. After all, the beliefs ascribed to Mary by these sentences are different; so if propositions are the objects of belief, the propositions corresponding to the italic sentences must be different. Nonetheless, both sentences are true.

Is the reverse possible? Can two sentences express the same proposition, but differ in truth-value? It seems not, as can be illustrated again by the role of propositions as the objects of belief. Suppose that you and I believe the exact same thing—both of us believe the world to be just the same way. Can my belief be true, and yours false? Intuitively, it seems not; it seems incoherent to say that we both believe the world to be the same way, but that I get things right and you get them wrong. (Though see the discussion of relativism in §2.3.2 below for a dissenting view.) So it seems that if two sentences express the same proposition, they must have the same truth value.

In general, then, it seems plausible that two sentences with the same content—i.e., which express the same proposition—must always have the same reference, though two expressions with the same reference can differ in content. This is the view stated by the Fregean slogan that sense determines reference (“sense” being the conventional translation of Frege’s Sinn , which was his word for what we are calling “content”).

If this holds for sentences, does it also hold for subsentential expressions? It seems that it must. Suppose for reductio that two subsentential expressions, e and e* , have the same content but differ in reference. It seems plausible that two sentences which differ only by the substitution of expressions with the same content must have the same content. (While plausible, this principle is not uncontroversial; see entry on compositionality .) But if this is true, then sentences which differ only in the substitution of e and e* would have the same content. But such a pair of sentences could differ in truth-value, since, for any pair of expressions which differ in reference, there is some pair of sentences which differ only by the substitution of those expressions and differ in truth-value. So if there could be a pair of expressions like e and e* , which differ in their reference but not in their content, there could be a pair of sentences which have the same content—which express the same proposition—but differ in truth-value. But this is what we argued above to be impossible; hence there could be no pair of expressions like e and e* , and content must determine reference for subsentential expressions as well as sentences.

This result—that content determines reference—explains one thing we should, plausibly, want a semantic theory to do: it should assign to each expression some value—a content—which determines a reference for that expression.

However, there is an obvious problem with the idea that we can assign a content, in this sense, to all of the expressions of a language like English: many expressions, like “I” or “here”, have a different reference when uttered by different speakers in different situations. So we plainly cannot assign to “I” a single content which determines a reference for the expression, since the expression has a different reference in different situations. These “situations” are typically called contexts of utterance , or just contexts , and expressions whose reference depends on the context are called indexicals (see entry) or context-dependent expressions .

The obvious existence of such expressions shows that a semantic theory must do more than simply assign contents to every expression of the language. Expressions like “I” must also be associated with rules which determine the content of the expression, given a context of utterance. These rules, which are (or determine) functions from contexts to contents, are called characters . (The terminology here, as well as the view of the relationship between context, content, and reference, is due to Kaplan (1989).) So the character of “I” must be some function from contexts to contents which, in a context in which I am the speaker, delivers a content which determines me as reference; in a context in which Barack Obama is the speaker, delivers a content which determines Barack Obama as reference; and so on. (See figure 1.)
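
A schematic way to picture this, on a deliberately simplified rendering of my own (contexts are toy stand-ins, and a content is collapsed into the referent it determines), is to model characters as functions from contexts to contents:

```python
# Sketch only: characters modeled as functions from contexts to contents.
# Here a "content" is represented, crudely, by the referent it determines.
from dataclasses import dataclass

@dataclass
class Context:
    speaker: str
    place: str

characters = {
    "I":    lambda c: c.speaker,   # in any context, "I" picks out the speaker
    "here": lambda c: c.place,     # "here" picks out the place of the context
}

ctx_a = Context(speaker="Alice", place="Chicago")           # invented contexts
ctx_b = Context(speaker="Barack Obama", place="Washington")

print(characters["I"](ctx_a))   # 'Alice'
print(characters["I"](ctx_b))   # 'Barack Obama'
```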

Figure 1. [An extended description of figure 1 is in the supplement.]

Here we face another potentially misleading ambiguity in “meaning”. What is the real meaning of an expression—its character, or its content (in the relevant context)? The best answer here is a pluralist one. Expressions have characters which, given a context, determine a content. We can talk about either character or content, and both are important. The important thing is to be clear on the distinction, and to see the reasons for thinking that expressions have both a character and (relative to a context) a content.

How many indexical expressions are there? There are some obvious candidates—“I”, “here”, “now”, etc.—but beyond the obvious candidates, it is very much a matter of dispute; for discussion, see §2.3.1 below.

But there is a kind of argument which seems to show that almost every expression is an indexical. Consider an expression which does not seem to be context-sensitive, like “the second-largest city in the United States”. This does not seem to be context-sensitive, because it seems to refer to the same city—Los Angeles—whether uttered by me, you, or some other English speaker. But now consider a sentence like

  • (11) 100 years ago, the second-largest city in the United States was Chicago.

This sentence is true. But for it to be true, “the second-largest city in the United States” would have to, in (11), refer to Chicago. But then it seems like this expression must be an indexical—its reference must depend on the context of utterance. In (11), the thought goes, the phrase “100 years ago” shifts the context: in (11), “the second-largest city in the United States” refers to the city that it would have referred to if uttered 100 years ago.

However, this can’t be quite right, as is shown by examples like this one:

  • (12) In 100 years, I will not exist.

Let’s suppose that this sentence, as uttered by me, is true. Then, if what we said about (11) was right, it seems that “I” must, in (12), refer to whoever it would refer to if it were uttered 100 years in the future. So the one thing we know is that (assuming that (12) is true) it does not refer to me—after all, I won’t be around to utter anything. But, plainly, the “I” in (12) does refer to me when this sentence is uttered by me—after all, it is a claim about me. What’s going on here?

What examples like (12) are often taken to show is that the reference of an expression must be relativized, not just to a context of utterance, but also to a circumstance of evaluation —roughly, the possible state of the world relevant to the determination of the truth or falsity of the sentence. In the case of many simple sentences, context and circumstance coincide; details aside, they both just are the state of the world at the time of the utterance, with a designated speaker and place. But sentences like (12) show that they can come apart. Phrases like “In 100 years” shift the circumstance of evaluation—they change the state of the world relevant to the evaluation of the truth or falsity of the sentence—but don’t change the context of utterance. That’s why when I utter (12), “I” refers to me—despite the fact that I won’t exist to utter it in 100 years time.

Figure 2. [An extended description of figure 2 is in the supplement.]

This is sometimes called the need for double-indexing semantics —the two indices being contexts of utterance and circumstances of evaluation. (See figure 2.)

The classic explanation of a double-indexing semantics is Kaplan (1989); another important early discussion is Kamp (1971). For a different interpretation of the framework, see David Lewis (1980). For a classic discussion of some of the philosophical issues raised by indexicals, see Perry (1979).

Double-indexing explains how we can regard the reference of “the second-largest city in the United States” in (11) to be Chicago, without taking “the second-largest city in the United States” to be an indexical like “I”. On this view, “the second-largest city in the United States” does not vary in content depending on the context of utterance; rather, the content of this phrase is such that it determines a different reference with respect to different circumstances of evaluation. In particular, it has Los Angeles as its reference with respect to the present state of the actual world, and has Chicago as its reference with respect to the state of the actual world 100 years ago. [ 2 ] Because “the second-largest city in the United States” refers to different things with respect to different circumstances, it is not a rigid designator (see entry)—these being expressions which (relative to a context of utterance) refer to the same object with respect to every circumstance of evaluation at which that object exists, and never refer to anything else with respect to any other circumstance of evaluation. (The term “rigid designator” is due to Kripke [1972].)

(Note that this particular example assumes the highly controversial view that circumstances of evaluation include, not just possible worlds, but also times. For a discussion of different views about the nature of circumstances of evaluation and their motivations, see §2.3.2 below.)
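
The double-indexing picture can be illustrated with a small sketch. This is again my own toy model, with invented contexts and circumstances; following the example in the text, circumstances here include a time as well as a world, which, as just noted, is controversial. The character of “I” yields a content that ignores the circumstance, while the content of “the second-largest city in the United States” ignores the context and varies with the circumstance.

```python
# Sketch of double indexing: character(context) -> content, content(circumstance) -> reference.
from dataclasses import dataclass

@dataclass
class Context:
    speaker: str
    year: int

@dataclass
class Circumstance:
    year: int
    second_largest_us_city: str   # an invented world-and-time-dependent fact

def character_I(context):
    # Content of "I": constant across circumstances, fixed by the context.
    return lambda circ: context.speaker

def character_second_largest_city(context):
    # Content ignores the context; its reference varies with the circumstance.
    return lambda circ: circ.second_largest_us_city

ctx = Context(speaker="the author", year=2019)
now = Circumstance(year=2019, second_largest_us_city="Los Angeles")
a_century_ago = Circumstance(year=1919, second_largest_us_city="Chicago")

content_I = character_I(ctx)
print(content_I(now), content_I(a_century_ago))   # the speaker both times, as with (12)

content_city = character_second_largest_city(ctx)
print(content_city(now))            # 'Los Angeles'
print(content_city(a_century_ago))  # 'Chicago', as with (11)
```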

So we know that expressions are associated with characters, which are functions from contexts to contents; and we know that contents are things which, for each circumstance of evaluation, determine a reference. We can now raise a central question of (classical) semantic theories: what sorts of things are contents? The foregoing suggests a pleasingly minimalist answer to this question: perhaps, since contents are things which together with circumstances of evaluation determine a reference, contents just are functions from circumstances of evaluation to a reference.

This view sounds abstract but is, in a way, quite intuitive. The idea is that the meaning of an expression is not what the expression stands for in the relevant circumstance, but rather a rule which tells you what the expression would stand for if the world were a certain way. So, on this view, the content of an expression like “the tallest man in the world” is not simply the man who happens to be tallest, but rather a function from ways the world might be to men—namely, that function which, for any way the world might be, returns as a referent the tallest man in that world (if there is one, and nothing otherwise). This fits nicely with the intuitive idea that to understand such an expression one needn’t know what the expression actually refers to—after all, one can understand “the tallest man” without knowing who the tallest man is—but must know how to tell what the expression would refer to, given certain information about the world (namely, the heights of all the men in it).

These functions, or rules, are called (following Carnap (1947)) intensions. Possible worlds semantics is the view that contents are intensions (and hence that characters are functions from contexts to intensions, i.e. functions from contexts to functions from circumstances of evaluation to a reference). (See figure 3.)

Figure 3. [An extended description of figure 3 is in the supplement.]

For discussion of the application of the framework of possible world semantics to natural language, see David Lewis (1970). The intension of a sentence—i.e., the proposition that sentence expresses, on the present view—will then be a function from worlds to truth-values. In particular, it will be that function which returns the truth-value true for every world with respect to which that sentence is true, and false otherwise. The intension of a simple predicate like “is red” will be a function from worlds to the function from objects to truth-values which, for each world, returns the truth-value true if the thing in question is red, and returns the truth-value false otherwise. In effect, possible worlds semantics takes the meanings of expressions to be functions from worlds to the values which would be assigned by a theory of reference to those expressions at the relevant world: in that sense, intensions are a kind of “extra layer” on top of the theory of reference.

This extra layer promises to solve the problem posed by non-extensional contexts, as illustrated by the example of “cordate” and “renate” in (7) and (8) . Our worry was that, since these expressions have the same reference, if meaning just is reference, then it seems that any pair of sentences which differ only in the substitution of these expressions must have the same truth-value. But (7) and (8) are such a pair of sentences, and needn’t have the same truth-value. The proponent of possible worlds semantics solves this problem by identifying the meaning of these expressions with their intension rather than their reference, and by pointing out that “cordate” and “renate”, while they share a reference, seem to have different intensions. After all, even if in our world every creature with a heart is a creature with a kidney (and vice versa), it seems that the world could have been such that some creatures had a heart but not a kidney. Since with respect to that circumstance of evaluation the terms will differ in reference, their intensions—which are just functions from circumstances of evaluations to referents—must also differ. Hence possible worlds semantics leaves room for (7) and (8) to differ in truth value, as they manifestly can.
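
The point can be made concrete with a toy model (the worlds and creatures below are invented purely for illustration): the intensions of “cordate” and “renate” agree at the actual world but come apart at a world in which something has a heart and no kidney, so the intensions of (5) and (6) differ as well.

```python
# Toy possible worlds model: an intension is a function from worlds to extensions.

actual_world = {"hearts": {"a", "b"}, "kidneys": {"a", "b"}}   # invented worlds
other_world  = {"hearts": {"a", "b"}, "kidneys": {"a"}}        # 'b' has a heart but no kidney

cordate = lambda w: (lambda x: x in w["hearts"])    # intension of "is a cordate"
renate  = lambda w: (lambda x: x in w["kidneys"])   # intension of "is a renate"

# Intension of sentence (6), "All cordates are renates": a function from worlds to truth-values.
all_cordates_are_renates = lambda w: all(renate(w)(x) for x in w["hearts"])

print(all_cordates_are_renates(actual_world))  # True: the extensions coincide here
print(all_cordates_are_renates(other_world))   # False: so (5) and (6) differ in intension
```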

The central problem facing possible worlds semantics, however, concerns sentences of the same form as (7) and (8) : sentences which ascribe propositional attitudes, like beliefs, to subjects. To see this problem, we can begin by asking: according to possible worlds semantics, what does it take for a pair of sentences to have the same content (i.e., express the same proposition)? Since contents are intensions, and intensions are functions from circumstances of evaluation to referents, it seems that two sentences have the same content, according to possible worlds semantics, if they have the same truth-value with respect to every circumstance of evaluation. In other words, two sentences express the same proposition if and only if it is impossible for them to differ in truth-value.

The problem is that there are sentences which have the same truth-value in every circumstance of evaluation, but seem to differ in meaning. Consider, for example

  • (13) 2 + 2 = 4.
  • (14) There are infinitely many prime numbers.

(13) and (14) are both, like other truths of mathematics, necessary truths. Hence (13) and (14) have the same intension and, according to possible worlds semantics, must have the same content.
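
In the same toy style as above (a made-up finite set of worlds, purely for illustration), the difficulty is that the intensions of any two necessary truths are one and the same function:

```python
# Sketch: with contents as functions from worlds to truth-values, any two
# necessary truths receive the very same content.
worlds = ["w1", "w2", "w3"]               # an invented space of worlds

content_13 = {w: True for w in worlds}    # "2 + 2 = 4": true at every world
content_14 = {w: True for w in worlds}    # "there are infinitely many prime numbers"

print(content_13 == content_14)  # True: possible worlds semantics cannot tell (13) from (14)
```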

But this is highly counterintuitive. (13) and (14) certainly seem to say different things. The problem (just as with (5) and (6) ) can be sharpened by embedding these sentences in propositional attitude ascriptions:

  • (15) John believes that 2 + 2 = 4.
  • (16) John believes that there are infinitely many prime numbers.

As we have just seen, the proponent of possible worlds semantics must take the italic sentences, (13) and (14), to have the same content; hence, it seems, the proponent of possible worlds semantics must take (15) and (16) to be a pair of sentences which differ only in the substitution of expressions with the same content. But then it seems that the proponent of possible worlds semantics must take this pair of sentences to express the same proposition, and have the same truth-value; but (15) and (16) (like (7) and (8) ) seem to differ in truth-value, and hence seem not to express the same proposition. For an influential extension of this argument, see Soames (1988).

For attempts to reply to the argument from within the framework of possible worlds semantics, see among other places Stalnaker (1984) and Yalcin (2018); for discussion of a related approach to semantics which aims to avoid these problems, see the entry on situations in natural language semantics. Another option is to invoke impossible as well as possible worlds; one might then treat propositions as sets of worlds, which may or may not be possible. If there is an impossible world in which there are only finitely many primes but in which 2+2=4, that would promise to give us the resources to distinguish between the set of worlds in which (13) is true and the set of worlds in which (14) is true, and hence to explain the difference in truth-value between (15) and (16). For an overview of issues involving impossible worlds, see Nolan (2013).

What we need, then, is an approach to semantics which can explain how sentences like (13) and (14) , and hence also (15) and (16) , can express different propositions. That is, we need a view of propositions which makes room for the possibility that a pair of sentences can be true in just the same circumstances but nonetheless have genuinely different contents.

A natural thought is that (13) and (14) have different contents because they are about different things; for example, (14) makes a general claim about the set of prime numbers whereas (13) is about the relationship between the numbers 2 and 4. One might want our semantic theory to be sensitive to such differences: to count two sentences as expressing different propositions if they have different subject matters, in this sense. One way to secure this result is to think of the contents of subsentential expressions as components of the proposition expressed by the sentence as a whole. Differences in the contents of subsentential expressions would then be sufficient for differences in the content of the sentence as a whole; so, for example, since (14) but not (13) contains an expression which refers to prime numbers, these sentences will express different propositions.

Proponents of this sort of view think of propositions as structured : as having constituents which include the meanings of the expressions which make up the sentence expressing the relevant proposition. (See, for more discussion, the entry on structured propositions .) One important question for views of this sort is: what does it mean for an abstract object, like a proposition, to be structured, and have constituents? But this question would take us too far afield into metaphysics (see §2.3.3 below for a brief discussion). The fundamental semantic question for proponents of this sort of structured proposition view is: what sorts of things are the constituents of propositions?

The answer to this question given by a proponent of Russellian propositions is: objects, properties, relations, and functions. (The view is called “Russellianism” because of its resemblance to the view of content defended in Chapter IV of Russell 1903.) So described, Russellianism is a general view about what sorts of things the constituents of propositions are, and does not carry a commitment to any views about the contents of particular types of expressions. However, most Russellians also endorse a particular view about the contents of proper names which is known as Millianism : the view that the meaning of a simple proper name is the object (if any) for which it stands.
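
As a rough illustration of the structured-proposition idea (the encoding below is my own stand-in; nothing hangs on representing objects and properties as simple Python values), two sentences can be true at exactly the same worlds while the structures built from their constituents differ:

```python
# Sketch: Russellian propositions as structures whose constituents are objects,
# properties, relations, and functions (represented here by simple Python values).

prop_13 = ("identity", ("sum", 2, 2), 4)               # constituents of "2 + 2 = 4"
prop_14 = ("is-infinite", "the set of prime numbers")  # constituents of (14)

# Both sentences are true at every possible world, but the structured contents
# differ, so (15) and (16) are no longer forced to have the same truth-value.
print(prop_13 == prop_14)  # False
```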

Russellianism has much to be said for it. It not only solves the problems with possible worlds semantics discussed above, but fits well with the intuitive idea that the function of names is to single out objects, and the function of predicates is to (what else?) predicate properties of those objects.

However, Millian-Russellian semantic theories also face some problems. Some of these are metaphysical in nature, and are based on the premise that propositions which have objects among their constituents cannot exist in circumstances in which those objects do not exist. (For discussion, see the entry on singular propositions .) Of the semantic objections to Millian-Russellian semantics, two are especially important.

The first of these problems involves the existence of empty names : names which have no referent. It is a commonplace that there are such names; an example is “Vulcan”, the name introduced for the planet between Mercury and the sun which was causing perturbations in the orbit of Mercury. Because the Millian-Russellian says that the content of a name is its referent, the Millian-Russellian seems forced into saying that empty names lack a content. But this is surprising; it seems that we can use empty names in sentences to express propositions and form beliefs about the world. The Millian-Russellian owes some explanation of how this is possible, if such names genuinely lack a content. An excellent discussion of this problem from a Millian point of view is provided in Braun (1993).

Perhaps the most important problem facing Millian-Russellian views, though, is Frege’s puzzle. Consider the sentences

  • (17) Clark Kent is Clark Kent.
  • (18) Clark Kent is Superman.

According to the Millian-Russellian, (17) and (18) differ only in the substitution of expressions which have the same content: after all, “Clark Kent” and “Superman” are proper names which refer to the same object, and the Millian-Russellian holds that the content of a proper name is the object to which that name refers. But this is a surprising result. These sentences seem to differ in meaning, because (17) seems to express a trivial, obvious claim, whereas (18) seems to express a non-trivial, potentially informative claim.

This sort of objection to Millian-Russellian views can (as above) be strengthened by embedding the intuitively different sentences in propositional attitude ascriptions, as follows:

  • (19) Lois believes that Clark Kent is Clark Kent.
  • (20) Lois believes that Clark Kent is Superman.

The problem posed by (19) and (20) for Russellian semantics is analogous to the problem posed by (15) and (16) for possible worlds semantics. Here, as there, we have a pair of belief ascriptions which seem as though they could differ in truth-value despite the fact that these sentences differ only with respect to expressions counted as synonymous by the relevant semantic theory.

Russellians have offered a variety of responses to Frege’s puzzle. Many Russellians think that our intuition that sentences like (19) and (20) can differ in truth-value is based on a mistake. This mistake might be explained at least partly in terms of a confusion between the proposition semantically expressed by a sentence in a context and the propositions speakers would typically use that sentence to pragmatically convey (Salmon 1986; Soames 2002), or in terms of the fact that a single proposition may be believed under several “propositional guises” (again, see Salmon 1986), or in terms of a failure to integrate pieces of information stored using distinct mental representations (Braun & Saul 2002). [ 3 ] Alternatively, a Russellian might try to make room for (19) and (20) to genuinely differ in truth-value by giving up the idea that sentences which differ only in the substitution of proper names with the same content must express the same proposition (Taschek 1995, Fine 2007).

However, these are not the only responses to Frege’s puzzle. Just as the Russellian responded to the problem posed by (15) and (16) by holding that two sentences with the same intension can differ in meaning, one might respond to the problem posed by (19) and (20) by holding that two names which refer to the same object can differ in meaning, thus making room for (19) and (20) to differ in truth-value. This is to endorse a Fregean response to Frege’s puzzle, and to abandon the Russellian approach to semantics (or, at least, to abandon Millian-Russellian semantics).

Fregeans, like Russellians, think of the proposition expressed by a sentence as a structured entity with constituents which are the contents of the expressions making up the sentence. But Fregeans, unlike Russellians, do not think of these propositional constituents as the objects, properties, and relations for which these expressions stand; instead, Fregeans think of the contents as modes of presentation of, or ways of thinking about, objects, properties, and relations. The standard term for these modes of presentation is sense. (As with “intension”, “sense” is sometimes also used as a synonym for “content”. But, as with “intension”, it avoids confusion to reserve “sense” for “content, as construed by Fregean semantics”. It is then controversial whether there are such things as senses, and whether they are the contents of expressions.) Frege explained his view of senses with an analogy:

The reference of a proper name is the object itself which we designate by its means; the idea, which we have in that case, is wholly subjective; in between lies the sense, which is indeed no longer subjective like the idea, but is yet not the object itself. The following analogy will perhaps clarify these relationships. Somebody observes the Moon through a telescope. I compare the Moon itself to the reference; it is the object of the observation, mediated by the real image projected by the object glass in the interior of the telescope, and by the retinal image of the observer. The former I compare to the sense, the latter is like the idea or experience. The optical image in the telescope is indeed one-sided and dependent upon the standpoint of observation; but it is still objective, inasmuch as it can be used by several observers. At any rate it could be arranged for several to use it simultaneously. But each one would have his own retinal image. (Frege 1892 [1960])

Senses are then objective, in that more than one person can express thoughts with a given sense, and correspond many-one to objects. Thus, just as Russellian propositions correspond many-one to intensions, Fregean propositions correspond many-one to Russellian propositions. This is sometimes expressed by the claim that Fregean contents are more fine-grained than Russellian contents (or intensions).

Indeed, we can think of our three classical semantic theories, along with the theory of reference, as related by this kind of many-one relation, as illustrated by the chart below:

Figure 4. [An extended description of figure 4 is in the supplement.]

The principal argument for Fregean semantics (which also motivated Frege himself) is the neat solution the view offers to Frege’s puzzle: the view says that, in cases like (19) and (20) in which there seems to be a difference in content, there really is a difference in content: the names share a reference, but differ in their sense, because they differ in their mode of presentation of their shared reference.
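
A corresponding sketch for the Fregean (again my own toy encoding; the “modes of presentation” below are just illustrative labels) makes the constituents of propositions senses rather than referents, so coreferential names can contribute different constituents:

```python
# Sketch: Fregean propositions built from senses (modes of presentation) rather
# than from the referents themselves.

senses = {
    "Clark Kent": {"mode": "the mild-mannered reporter", "referent": "Kal-El"},
    "Superman":   {"mode": "the caped superhero",        "referent": "Kal-El"},
}

def identity_proposition(name1, name2):
    # The constituents are the senses of the names, not their shared referent.
    return ("identity", senses[name1]["mode"], senses[name2]["mode"])

p17 = identity_proposition("Clark Kent", "Clark Kent")
p18 = identity_proposition("Clark Kent", "Superman")

print(p17 == p18)  # False: (17) and (18) express different propositions
print(senses["Clark Kent"]["referent"] == senses["Superman"]["referent"])  # True: same reference
```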

The principal challenge for Fregeanism is the challenge of giving a non-metaphorical explanation of the nature of sense. This is a problem for the Fregean in a way that it is not for the possible worlds semanticist or the Russellian since the Fregean, unlike these two, introduces a new class of entities to serve as meanings of expressions rather than merely appropriating an already recognized sort of entity—like a function, or an object, property, or relation—to serve this purpose. [ 4 ]

A first step toward answering this challenge is provided by a criterion for telling when two expressions differ in meaning, which might be stated as follows. In his 1906 paper, “A Brief Survey of My Logical Doctrines”, Frege seems to endorse the following criterion:

Frege’s criterion of difference for senses: Two sentences S and S* differ in sense if and only if some rational agent who understood both could, on reflection, judge that S is true without judging that S* is true.

One worry about this formulation concerns the apparent existence of pairs of sentences, like “If Obama exists, then Obama=Obama” and “If McCain exists, McCain=McCain” which are such that any rational person who understands both will take both to be true. These sentences seem intuitively to differ in content—but this is ruled out by the criterion above. One idea for getting around this problem would be to state our criterion of difference for senses of expressions in terms of differences which result from substituting one expression for another:

Two expressions e and e* differ in sense if and only if there is a pair of sentences, S and S*, which (i) differ only in the substitution of e for e* and (ii) are such that some rational agent who understood both could, on reflection, judge that S is true without judging that S* is true.

This version of the criterion has Frege’s formulation as a special case, since sentences are, of course, expressions; and it solves the problem with obvious truths, since it seems that substitution of sentences of this sort can change the truth value of a propositional attitude ascription. Furthermore, the criterion delivers the wanted result that coreferential names like “Superman” and “Clark Kent” differ in sense, since a rational, reflective agent like Lois Lane could think that (17) is true while withholding assent from (18) .

But even if this tells us when names differ in sense, it does not quite tell us what the sense of a name is . Here is one initially plausible way of explaining what the sense of a name is. We know that, whatever the content of a name is, it must be something which determines as a reference the object for which the name stands; and we know that, if Fregeanism is true, this must be something other than the object itself. A natural thought, then, is that the content of a name—its sense—is some condition which the referent of the name uniquely satisfies. Coreferential names can differ in sense because there is always more than one condition which a given object uniquely satisfies. (For example, Superman/Clark Kent uniquely satisfies both the condition of being the superhero Lois most admires, and the newspaperman she least admires.) Given this view, it is natural to then hold that names have the same meanings as definite descriptions —phrases of the form “the so-and-so”. After all, phrases of this sort seem to be designed to pick out the unique object, if any, which satisfies the condition following the “the”. (For more discussion, see entry on descriptions .) This Fregean view of names is called Fregean descriptivism .

However, as Saul Kripke argued in Naming and Necessity , Fregean descriptivism faces some serious problems. Here is one of the arguments he gave against the view, which is called the modal argument . Consider a name like “Aristotle”, and suppose for purposes of exposition that the sense I associate with that name is the sense of the definite description “the greatest philosopher of antiquity”. Now consider the following pair of sentences:

  • (21) Necessarily, if Aristotle exists, then Aristotle is Aristotle.
  • (22) Necessarily, if Aristotle exists, then Aristotle is the greatest philosopher of antiquity.

If Fregean descriptivism is true, and “the greatest philosopher of antiquity” is indeed the description I associate with the name “Aristotle”, then it seems that (21) and (22) must be a pair of sentences which differ only via the substitution of expressions (the italic ones) with the same content. If this is right, then (21) and (22) must express the same proposition, and have the same truth-value. But this seems to be a mistake; while (21) appears to be true (Aristotle could hardly have failed to be himself), (22) appears to be false (perhaps Aristotle could have been a shoemaker rather than a philosopher; or perhaps if Plato had worked a bit harder, he rather than Aristotle could have been the greatest philosopher of antiquity).

An important precursor to Kripke’s arguments against Fregean descriptivism is Marcus (1961), which argues that names are “tags” for objects rather than abbreviated descriptions. Fregean descriptivists have given various replies to Kripke’s modal and other arguments; see especially Plantinga (1978), Dummett (1981), and Sosa (2001). For rejoinders to these Fregean replies, see Soames (1998, 2002) and Caplan (2005). For a defense of a view of descriptions which promises a reply to the modal argument, see Rothschild (2007). For a brief sketch of Kripke’s other arguments against Fregean descriptivism, see the entry on names, §2.4.

Kripke’s arguments provide a strong reason for Fregeans to deny Fregean descriptivism, and hold instead that the senses of proper names are not the senses of any definite description associated with those names by speakers. The main problem for this sort of non-descriptive Fregeanism is to explain what the sense of a name might be such that it can determine the reference of the name, if it is not a condition uniquely satisfied by the reference of the name. Non-descriptive Fregean views were defended in McDowell (1977) and Evans (1981). The most sophisticated and well-developed version of the view is a kind of blend of Fregean semantics and possible worlds semantics. This is the epistemic two-dimensionalist approach to semantics which has been developed by David Chalmers. See Chalmers (2004, 2006).

Three other problems for Fregean semantics are worth mentioning. The first is the problem of whether the Fregean can give an adequate treatment of indexical expressions. A classic argument that the Fregean cannot is given in Perry (1977); for a Fregean reply, see Evans (1981).

The second calls into question the Fregean’s claim to have provided a plausible solution to Frege’s puzzle. The Fregean resolves instances of Frege’s puzzle by positing differences in sense to explain apparent differences in truth-value. But this sort of solution, if pursued generally, seems to lead to the surprising result that no two expressions can have the same content. For consider a pair of expressions which really do seem to have the same content, like “catsup” and “ketchup”. (The example, as well as the argument to follow, is borrowed from Salmon 1990.) Now consider Bob, a confused condiment user, who thinks that the tasty red substance standardly labeled “catsup” is distinct from the tasty red substance standardly labeled “ketchup”, and consider the following pair of sentences:

  • (23) Bob believes that catsup is catsup.
  • (24) Bob believes that catsup is ketchup.

(23) and (24) seem quite a bit like (19) and (20): each pair seems to consist of sentences which differ in truth-value, despite differing only in the substitution of the italic expressions. So, for consistency, it seems that the Fregean should explain the apparent difference in truth-value between (23) and (24) in just the way he explains the apparent difference in truth-value between (19) and (20): by positing a difference in meaning between the italic expressions. But, first, it is hard to see how expressions like “catsup” and “ketchup” could differ in meaning; and, second, it seems that an example of this sort could be generated for any alleged pair of synonymous expressions. (A closely related series of examples is developed in much more detail in Kripke 1979.)

The example of “catsup” and “ketchup” is related to a third worry for the Fregean, which is the reverse of the Fregean’s complaint about Russellian semantics: a plausible case can be made that Frege’s criterion of difference for sense slices contents too finely, and draws distinctions in content where there are none. One way of developing this sort of argument involves (again) propositional attitude ascriptions. It seems plausible that if I utter a sentence like “Hammurabi thought that Hesperus was visible only in the morning”, what I say is true if and only if one of Hammurabi’s thoughts has the same content as does the sentence “Hesperus was visible only in the morning”, as used by me. On a Russellian view, this places a reasonable constraint on the truth of the ascription; it requires only that Hammurabi believe of a certain object that it instantiates the property of being visible only in the morning. But on a Fregean view, this treatment of attitude ascriptions would require that Hammurabi thought of the planet Venus under the same mode of presentation as I attach to the term “Hesperus”. This seems implausible, since it seems that I can truly report Hammurabi’s beliefs without knowing anything about the mode of presentation under which he thought of the planets. (For a recent attempt to develop a Fregean semantics for propositional attitude ascriptions which avoids this sort of problem by integrating aspects of a Russellian semantics, see Chalmers (2011).)

2.2 Alternatives to Classical Semantic Theories

Classical semantic theories, however, are not the only game in town. This section lays out the basics of five alternatives to classical semantic theorizing.

One kind of challenge to classical semantics attacks the idea that the job of a semantic theory is to systematically pair expressions with the entities which are their meanings. Wittgenstein was parodying just this idea when he wrote

You say: the point isn’t the word, but its meaning, and you think of the meaning as a thing of the same kind as the word, though also different from the word. Here the word, there the meaning. The money, and the cow that you can buy with it. (Wittgenstein 1953, §120)

While Wittgenstein himself did not think that systematic theorizing about semantics was possible, this anti-theoretical stance has not been shared by all subsequent philosophers who share his aversion to “meanings as entities”. A case in point is Donald Davidson. Davidson thought that semantic theory should take the form of a theory of truth for the language of the sort which Alfred Tarski showed us how to construct (see Tarski 1944 and the entry on Tarski’s truth definitions).

For our purposes, it will be convenient to think of a Tarskian truth theory as a variant on the sorts of theories of reference introduced in §2.1.1. Recall that theories of reference of this sort specified, for each proper name in the language, the object to which that name refers, and for every simple predicate in the language, the set of things which satisfy that predicate. If we then consider a sentence which combines a proper name with such a predicate, like

Amelia sings

the theory tells us what it would take for that sentence to be true: it tells us that this sentence is true if and only if the object to which “Amelia” refers is a member of the set of things which satisfy the predicate “sings”—i.e., the set of things which sing. So we can think of a full theory of reference for the language as implying, for each sentence of this sort, a T-sentence of the form

“Amelia sings” is T (in the language) if and only if Amelia sings.

Suppose now that we expand our theory of reference so that it implies a T-sentence of this sort for every sentence of the language, rather than just for simple sentences which result from combining a name and a monadic predicate. We would then have a Tarskian truth theory for our language. Tarski’s idea was that such a theory would define a truth predicate (“ T ”) for the language; Davidson, by contrast, thought that we find in Tarskian truth theories “the sophisticated and powerful foundation of a competent theory of meaning” (Davidson 1967).
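To make the preceding idea concrete, here is a minimal sketch, in code, of a theory of reference for a toy fragment containing only name–predicate sentences, extended to issue T-sentence-style truth conditions. The particular names, predicates, and model below are illustrative assumptions, not part of Tarski’s or Davidson’s own formulations; a genuine Tarskian theory would be stated as a recursive definition over the syntax of an entire language.

```python
# A toy theory of reference for sentences that combine one proper name
# with one monadic predicate. The names, predicates, and "model" here
# are illustrative assumptions.

# Each name is paired with the object to which it refers ...
reference = {
    "Amelia": "amelia",
    "Barbara": "barbara",
}

# ... and each predicate with the set of things that satisfy it.
satisfaction = {
    "sings": {"amelia"},
    "talks": {"amelia", "barbara"},
}

def is_true(name: str, predicate: str) -> bool:
    """A name-predicate sentence is true iff the referent of the name is
    a member of the set of things satisfying the predicate."""
    return reference[name] in satisfaction[predicate]

def t_sentence(name: str, predicate: str) -> str:
    """State the truth condition of a sentence in the style of a T-sentence."""
    return (f'"{name} {predicate}" is T iff {name} {predicate} '
            f"(in this toy model: {is_true(name, predicate)})")

print(t_sentence("Amelia", "sings"))
print(t_sentence("Barbara", "sings"))
```

The hard part, as the next paragraphs bring out, is extending this compositional strategy beyond such simple sentences to every sentence of a natural language.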

This claim is puzzling: why should a theory which issues T-sentences, but makes no explicit claims about meaning or content, count as a semantic theory? Davidson’s answer was that knowledge of such a theory would be sufficient to understand the language. If Davidson were right about this, then he would have a plausible argument that a semantic theory could take this form. After all, it is plausible that someone who understands a language knows the meanings of the expressions in the language; so, if knowledge of a Tarskian truth theory for the language were sufficient to understand the language, then knowledge of what that theory says would be sufficient to know all the facts about the meanings of expressions in the language, in which case it seems that the theory would state all the facts about the meanings of expressions in the language.

One advantage of this sort of approach to semantics is its parsimony: it makes no use of the intensions, Russellian propositions, or Fregean senses assigned to expressions by the propositional semantic theories discussed above. Of course, as we saw above, these entities were introduced to provide a satisfactory semantic treatment of various sorts of linguistic constructions, and one might well wonder whether it is possible to provide a Tarskian truth theory of the sort sketched above for a natural language without making use of intensions, Russellian propositions, or Fregean senses. The Davidsonian program obviously requires that we be able to do this, but it is still very much a matter of controversy whether a truth theory of this sort can be constructed. Discussion of this point is beyond the scope of this entry; one good way into this controversy is through the debate about whether the Davidsonian program can provide an adequate treatment of propositional attitude ascriptions. See the discussion of the paratactic account and interpreted logical forms in the entry on propositional attitude reports. (For Davidson’s initial treatment of attitude ascriptions, see Davidson 1968; for further discussion see, among other places, Burge 1986; Schiffer 1987; Lepore and Loewer 1989; Larson and Ludlow 1993; Soames 2002.)

Let’s set this aside, and assume that a Tarskian truth theory of the relevant sort can be constructed, and ask whether, given this supposition, this sort of theory would provide an adequate semantics. There are two fundamental reasons for thinking that it would not, both of which are ultimately due to Foster (1976). Larson and Segal (1995) call these the extension problem and the information problem .

The extension problem stems from the fact that it is not enough for a semantic theory whose theorems are T-sentences to yield true theorems; the T-sentence

“Snow is white” is T in English iff grass is green.

is true, but tells us hardly anything about the meaning of “Snow is white”. Rather, we want a semantic theory to entail, for each sentence of the object language, exactly one interpretive T-sentence: a T-sentence such that the sentence used on its right-hand side gives the meaning of the sentence mentioned on its left-hand side. Our theory must entail at least one such T-sentence for each sentence in the object language because the aim is to give the meaning of each sentence in the language; and it must entail no more than one because, if the theory had as theorems more than one T-sentence for a single sentence S of the object language, an agent who knew all the theorems of the theory would not yet understand S , since such an agent would not know which of the T-sentences which mention S was interpretive.

The problem is that it seems that any theory which implies at least one T-sentence for every sentence of the language will also imply more than one T-sentence for every sentence in the language. For any sentences p , q , if the theory entails a T-sentence

S is T in L iff p,

then, since p is logically equivalent to \(p \wedge \neg(q \wedge \neg q)\), the theory will also entail the T-sentence

S is T in L iff \(p \wedge \neg(q \wedge \neg q)\),

which, if the first is interpretive, won’t be. But then the theory will entail at least one non-interpretive T-sentence, and someone who knows the theory will not know which of the relevant sentences is interpretive and which not; such a person therefore would not understand the language.

The information problem is that, even if our semantic theory entails all and only interpretive T-sentences, it is not the case that knowledge of what is said by these theorems would suffice for understanding the object language. For, it seems, I can know what is said by a series of interpretive T-sentences without knowing that they are interpretive. I may, for example, know what is said by the interpretive T-sentence

“Londres est jolie” is T in French iff London is pretty

but still not know the meaning of the sentence mentioned on the left-hand side of the T-sentence. The truth of what is said by this sentence, after all, is compatible with the sentence used on the right-hand side being materially equivalent to, but different in meaning from, the sentence mentioned on the left. This seems to indicate that knowing what is said by a truth theory of the relevant kind is not, after all, sufficient for understanding a language. (For replies to these criticisms, see Davidson (1976), Larson and Segal (1995) and Kölbel (2001); for criticism of these replies, see Soames (1992) and Ray (2014). For a reply to the latter, see Kirk-Giannini and Lepore (2017).)

The Davidsonian, on one reading, locates the mistake of classical semantics in its commitment to a layer of content which goes beyond a theory of reference. A different alternative to classical semantics departs even more radically from that tradition, by denying that word-world reference relations should play any role in semantic theorizing.

This view is sometimes called “internalist semantics” by contrast with views which locate the semantic properties of expressions in their relation to elements of the external world. This internalist approach to semantics is associated with the work of Noam Chomsky (see especially Chomsky 2000).

It is easy to say what this approach to semantics denies. The internalist denies an assumption common to all of the approaches discussed so far: the assumption that in giving the content of an expression, we are primarily specifying something about that expression’s relation to things in the world which that expression might be used to say things about. According to the internalist, expressions as such don’t bear any semantically interesting relations to things in the world; names don’t, for example, refer to the objects with which one might take them to be associated; predicates don’t have extensions; sentences don’t have truth conditions. On this sort of view, we can use sentences to say true or false things about the world, and can use names to refer to things; but this is just one thing we can do with names and sentences, and is not a claim about the meanings of those expressions.

So what are meanings, on this view? The most developed answer to this question is given in Pietroski (2018), according to which “meanings are instructions for how to build concepts of a special sort” (2018: 36). By “concepts”, Pietroski means mental representations of a certain kind. So the meaning of an expression is an instruction to form a certain sort of mental representation.

On this kind of view, while concepts may have extensions, expressions of natural languages do not. So this approach rejects not just the details but the foundation of the classical approach to semantics described above.

One way to motivate an approach of this kind focuses on the ubiquity of the phenomenon of polysemy in natural languages. As Pietroski says,

We can use “line” to speak of Euclidean lines, fishing lines, telephone lines, waiting lines, lines in faces, lines of thought, etc. We can use “door” to access a concept of certain impenetrable objects, or a concept of certain spaces that can be occupied by such objects. (2018: 5)

The defender of the view that expressions have meanings which determine extensions seems forced to say that “line” and “door” are homophonous expressions, like “bank”. But that seems implausible; when one uses the expressions “fishing line” and “line of thought” one seems to be using “line” in recognizably the same sense. (This is a point of contrast with standard examples of homophony, as when one uses “bank” once to refer to a financial institution and then later to refer to the side of a river.) The internalist, by contrast, is not forced into treating these as cases of homophony; he can say that the meaning of “line” is an instruction to fetch one of a family of concepts.

For defenses and developments of internalist approaches to semantics, see McGilvray (1998), Chomsky (2000), and Pietroski (2003, 2005, 2018).

Internalist semantics can be understood as denying the classical semantic assumption that a semantic theory should assign truth conditions to sentences. Another alternative to classical semantics does not deny that assumption, but does deny that truth conditions should play the fundamental role in semantics that classical semantics gives to them.

This alternative is inferentialist semantics. The difference between classical and inferentialist semantics is nicely put by Robert Brandom:

The standard way [of classical semantics] is to assume that one has a prior grip on the notion of truth, and use it to explain what good inference consists in. … [I]nferentialist pragmatism reverses this order of explanation … It starts with a practical distinction between good and bad inferences, understood as a distinction between appropriate and inappropriate doings, and goes on to understand talk about truth as talk about what is preserved by the good moves. (Brandom 2000: 12)

The classical semanticist begins with certain language-world representational relations, and uses these to explain the truth conditions of sentences; we can then go on to use these truth conditions to explain the difference between good and bad inferences. The inferentialist, by contrast, begins with the distinction between good and bad inferences, and tries to explain the representational relations which the classical semanticist takes as (comparatively) basic in inferentialist terms. (I say “comparatively basic” because the classical semanticist might go on to provide a reductive explanation of these representational relations, and the inferentialist might go on to provide a reductive explanation of the distinction between good and bad inferences.)

As Brandom also emphasizes, the divergence between the classical and inferentialist approaches to semantics arguably brings with it a divergence on two other fundamental topics.

The first is the relative explanatory priority of the semantic properties of sentences, on the one hand, and subsentential expressions, on the other. It is natural for the classical semanticist to think that the representational relations between subsentential expressions and their semantic contents can be explained independently of the representational properties of sentences (i.e., their truth conditions); the latter can thus be explained in terms of the former. For the inferentialist, on the other hand, the semantic properties of sentences must come first, because inferential relations hold between sentences but not between subsentential expressions. (One cannot, for example, infer one name from another.) So the inferentialist will not explain the semantic properties of, for example, singular terms in terms of representational relations between those singular terms and items in the world; rather, she will explain what is distinctive of singular terms in terms of their role in certain kinds of inferences. (To see how this strategy might work, see Brandom 2000: Ch. 4.)

The second is the relative explanatory priority of the semantic properties of individual sentences, on the one hand, and the semantic relations between sentences on the other. The classical semanticist can, so to speak, explain the meanings of sentences one by one; there is no difficulty in explaining the meaning of a sentence without mentioning other sentences. By contrast, according to the inferentialist,

if the conceptual content expressed by each sentence or word is understood as essentially consisting in its inferential relations (broadly construed) or articulated by its inferential relations (narrowly construed), then one must grasp many such contents in order to grasp any. (Brandom 2000: 29)

This is sometimes called a holist approach to semantics. For discussions of the pros and cons of this kind of view, see the entry on meaning holism.

For book-length defenses of inferentialism, see Brandom (1994) and Brandom (2000). Important precursors include Wittgenstein (1953) and Sellars (1968); see also the entries on Ludwig Wittgenstein and Wilfrid Sellars. For a classic objection to inferentialism, see Prior (1960). For a discussion of a prominent approach within the inferentialist tradition, see the entry on proof-theoretic semantics.

In laying out the various versions of classical semantics, we said a lot about sentences. By comparison, we said hardly anything about conversations, or discourses. This is no accident; classical approaches to semantics typically think of properties of conversations or discourses as explicable in terms of explanatorily prior semantic properties of sentences (even if classical semanticists do often take the semantic contents of sentences to be sensitive to features of the discourse in which they occur).

Dynamic semantics is, to a first approximation, an approach to semantics which reverses these explanatory priorities. (The sorts of classical theories sketched above are, by contrast, called “static” semantic theories.) On a dynamic approach, a semantic theory does not aim primarily to deliver a pairing of sentences with propositions which then determine those sentences’ truth conditions. Rather, on these approaches, the semantic values of sentences are “context change potentials”—roughly, instructions for updating the context, or discourse.

In a dynamic context, many of the questions posed above about how best to understand the nature of semantic contents show up instead as questions about how best to understand the nature of contexts and context change potentials. It is controversial not just whether a dynamic or static approach to semantics is likely to be more fruitful, but also what exactly the distinction between dynamic and static systems comes to. (For discussion of the latter question, see Rothschild & Yalcin 2016.)

The relationship between dynamic semantics and classical semantics is different from the relationship between the latter and the other alternatives to classical semantics that I’ve discussed. The other alternatives to classical semantics reject some core feature of classical semantics—for example, the assignments of entities as meanings, or the idea that meaning centrally involves word-world relations. By contrast, dynamic semantics can be thought of as a kind of extension or generalization of classical semantics, which can employ modified versions of much of the same theoretical machinery.

Foundational works in this tradition include Irene Heim’s file change semantics (1982) and Hans Kamp’s discourse representation theory (see entry). For more details on different versions of this alternative to classical semantics, see the entry on dynamic semantics. For critical discussion of the motivations for dynamic semantics, see Karen Lewis (2014). For discussion of the extent to which dynamic and static approaches are really in competition, see Stojnić (2019).

A final alternative to classical semantics differs from those discussed in the preceding four subsections in two (related) respects.

The first is that, unlike the other non-classical approaches, expressivist semantics was originally not motivated by linguistic considerations. Rather, it was developed in response to specifically metaethical considerations. A number of philosophers held metaethical views which made it hard for them to see how a classical semantic treatment of sentences about ethics could be correct, and so developed expressivism as an alternative treatment of these parts of language.

This leads to a second difference between expressivism and our other four alternatives to classical semantics. The latter are all “global alternatives”, in the sense that they propose non-classical approaches to the semantics of all of a natural language. By contrast, expressivists typically agree that classical semantics (or one of the other non-expressivist alternatives to it discussed in §§2.2.1–4) is correct for many parts of language; they just think that special features of some parts of language require expressivist treatment.

One can think of many traditional versions of expressivism, which were motivated by metaethical concerns, as involving two basic ideas. First, we can explain the meaning of a sentence by saying what mental state that sentence expresses. Second, the mental state expressed by a sentence about ethics is different in kind from the mental state expressed by a “factual” sentence.

Two follow-up questions suggest themselves. One is about what “expresses” means here; for one answer, see Gibbard (1990). A second is about what the relevant difference in mental states consists in. On many views, the mental states expressed by non-ethical sentences are beliefs, whereas the mental states expressed by ethical sentences are not. Different versions of expressivism propose different candidates for the mental states which are expressed by ethical sentences. Prominent candidates include exclamations (Ayer 1936), commands (Hare 1952), and plans (Gibbard 1990, 2003).

A classic problem for expressivist theories of the kind just sketched comes from interactions between ethical and non-ethical bits of language. This problem has come to be known as the Frege-Geach problem, because a very influential version of the problem was posed by Geach (1960, 1965). (A version of the problem is also independently presented in Searle 1962.) In one of its versions, the problem comes in two parts. First, whatever mental state expressivists take ethical sentences to express will typically not be expressed by complex sentences which embed the relevant ethical sentence. So even if

Lying is wrong.

expresses the mental state of planning not to lie, the same sentence when embedded in the conditional

If lying is wrong, then what Jane did was wrong.

does not. After all, one can endorse this conditional without endorsing a plan not to lie. So it seems that the expressivist must say that “lying is wrong” means something different when it occurs alone than it does when it occurs in the antecedent of a conditional. The problem, though, is that if one takes that view it is hard to see how it could follow from the above two sentences, as it surely does, that

What Jane did was wrong.

For a discussion of solutions to this problem, and an influential critique of expressivism, see Schroeder (2008).

Much recent work on expressivism is both less focused on the special case of ethics, and more motivated by purely linguistic considerations, than has often been the case traditionally. Examples include discussions of expressivism about epistemic modality in Yalcin (2007), about knowledge ascriptions in Moss (2013), and about vagueness in MacFarlane (2016).

2.3 General Questions Facing Semantic Theories

As mentioned above, the aim of §2 of this entry is to discuss issues about the form which a semantic theory should take which are at a higher level of abstraction than issues about the correct semantic treatment of particular expression-types. (Also as mentioned above, some of these may be found in the entries on conditionals, descriptions, names, propositional attitude reports, and tense and aspect.) But there are some general issues in semantics which, while more general than questions about how, for example, the semantics of adverbs should go, are largely (though not wholly) orthogonal to the question of which of the frameworks for semantic theorizing laid out in §§2.1–2.2 should be adopted. The present subsection introduces a few of these.

§2.1.4 introduced the idea that some expressions might be context-sensitive, or indexical. Within a propositional semantics, we’d say that these expressions have different contents relative to distinct contexts; but the phenomenon of context-sensitivity is one which any semantic theory must recognize. A very general question which is both highly important and orthogonal to the above distinctions between types of semantic theories is: How much context-sensitivity is there in natural languages?

Virtually everyone recognizes a sort of core group of indexicals, including “I”, “here”, and “now”. Most also think of demonstratives, like (some uses of) “this” and “that”, as indexicals. But whether and how this list should be extended is a matter of controversy. Some popular candidates for inclusion are:

  • devices of quantification
  • gradable adjectives
  • alethic modals, including counterfactual conditionals
  • “knows” and epistemic modals
  • propositional attitude ascriptions
  • “good” and other moral terms

Many philosophers and linguists think that one or more of these categories of expressions are indexicals. Indeed, some think that virtually every natural language expression is context-sensitive.

Questions about context-sensitivity are important, not just for semantics, but for many areas of philosophy. And that is because some of the terms thought to be context-sensitive are terms which play a central role in describing the subject matter of other areas of philosophy.

Perhaps the most prominent example here is the role that the view that “knows” is an indexical has played in recent epistemology. This view is often called “contextualism about knowledge”; and in general, the view that some term F is an indexical is often called “contextualism about F ”. Contextualism about knowledge is of interest in part because it promises to provide a kind of middle ground between two opposing epistemological positions: the skeptical view that we know hardly anything about our surroundings, and the dogmatist view that we can know that we are not in various Cartesian skeptical scenarios. (So, for example, the dogmatist holds that I can know that I am not a brain in a vat which is, for whatever reason, being made to have the series of experiences subjectively indistinguishable from the experiences I actually have.) Both of these positions can seem unappealing—skepticism because it does seem that I can occasionally know, e.g., that I am sitting down, and dogmatism because it’s hard to see how I can rule out the possibility that I am in a skeptical scenario subjectively indistinguishable from my actual situation.

But the disjunction of these positions can seem, not just unappealing, but inevitable; for the proposition that I am sitting entails that I am not a brain in a vat, and it’s hard to see—presuming that I know that this entailment holds—how I could know the former without thereby being in a position to know the latter. The contextualist about “knows” aims to provide the answer: the extension of “knows” depends on features of the context of utterance. Perhaps—to take one among many possible contextualist views—a pair of a subject and a proposition p will be in the extension of “knows” relative to a context C only if that subject is able to rule out every possibility which is both (i) inconsistent with p and (ii) salient in C. The idea is that “I know that I am sitting down” can be true in a normal setting, simply because the possibility that I am a brain in a vat is not normally salient; but typically “I know that I am not a brain in a vat” will be false, since discussion of skeptical scenarios makes them salient, and (if the skeptical scenario is well-designed) I will lack the evidence needed to rule them out. See for discussion, among many other places, the entry on epistemic contextualism, Cohen (1986), DeRose (1992), and David Lewis (1996).
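The schematic contextualist proposal just described can be illustrated with a toy model. Representing a context by the set of possibilities salient in it, and a subject’s epistemic position by the set of possibilities her evidence rules out, are illustrative simplifications rather than a claim about how any actual contextualist formalizes the view.

```python
# A toy model of the contextualist proposal: a subject stands in the
# "knows" relation to p, relative to a context C, only if she can rule
# out every possibility that is (i) inconsistent with p and (ii) salient
# in C. The possibilities, evidence, and contexts are illustrative.

def knows(incompatible_with_p: set, ruled_out_by_evidence: set,
          salient_in_context: set) -> bool:
    """True iff every salient possibility inconsistent with p is one the
    subject's evidence rules out."""
    relevant = incompatible_with_p & salient_in_context
    return relevant <= ruled_out_by_evidence

# Possibilities inconsistent with "I am sitting down":
incompatible = {"I am standing", "I am a brain in a vat"}
# Possibilities my evidence rules out:
ruled_out = {"I am standing"}

ordinary_context = {"I am standing"}                            # no skeptical scenario salient
skeptical_context = {"I am standing", "I am a brain in a vat"}  # skeptical scenario made salient

print(knows(incompatible, ruled_out, ordinary_context))   # True
print(knows(incompatible, ruled_out, skeptical_context))  # False
```

On this toy model the very same knowledge attribution is true relative to the ordinary context and false relative to the context in which the skeptical scenario has been made salient, which is one way of picturing the contextualist’s middle ground between the skeptic and the dogmatist.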

Having introduced one important contextualist thesis, let’s return to the general question which faces the semantic theorist, which is: How do we tell when an expression is context-sensitive? Contextualism about knowledge, after all, can hardly get off the ground unless “knows” really is a context-sensitive expression. “I” and “here” wear their context-sensitivity on their sleeves; but “knows” does not. What sort of argument would suffice to show that an expression is an indexical?

Philosophers and linguists disagree about the right answers to this question. The difficulty of coming up with a suitable diagnostic is illustrated by considering one intuitively plausible test, defended in Chapter 7 of Cappelen & Lepore (2005). This test says that an expression is an indexical iff it characteristically blocks disquotational reports of what a speaker said in cases in which the original speech and the disquotational report are uttered in contexts which differ with respect to the relevant contextual parameter. (Or, more cautiously, that this test provides evidence that a given expression is, or is not, context-sensitive.)

This test clearly counts obvious indexicals as such. Consider “I”. Suppose that Mary utters

I am hungry.

One sort of disquotational report of Mary’s speech would use the very sentence Mary uttered in the complement of a “says” ascription. So suppose that Sam attempts such a disquotational report of what Mary said, and utters

Mary said that I am hungry.

The report is obviously false; Mary said that Mary is hungry, not that Sam is. The falsity of Sam’s report suggests that “I am hungry” has a different content out of Mary’s mouth than out of Sam’s; and this, in turn, suggests that “I” has a different content when uttered by Mary than when uttered by Sam. Hence, it suggests that “I” is an indexical.

It isn’t just that this test gives the right result in many cases; it’s also that the test fits nicely with the plausible view that an utterance of a sentence of the form “A said that S” in a context C is true iff the content of S in C is the same as the content of what the referent of “A” said (on the relevant occasion).

The interesting uses of this test are not uses which show that “I” is an indexical; we already knew that. The interesting use of this test, as Cappelen and Lepore argue, is to show that many of the expressions which have been taken to be indexicals—like the ones on the list given above—are not context-sensitive. For we can apparently employ disquotational reports of the relevant sort to report utterances using quantifiers, gradable adjectives, modals, “knows”, etc. This test thus apparently shows that no expressions beyond the obvious ones—“I”, “here”, “now”, etc.—are genuinely context-sensitive.

But, as Hawthorne (2006) argues, naive applications of this test seem to lead to unacceptable results. Terms for relative directions, like “left”, seem to be almost as obviously context-sensitive as “I”; the direction picked out by simple uses of “left” depends on the orientation of the speaker of the context. But we can typically use “left” in disquotational “says” reports of the relevant sort. Suppose, for example, that Mary says

The coffee machine is to the left.

Sam can later truly report Mary’s speech by saying

Mary said that the coffee machine was to the left.

despite the fact that Sam’s orientation in the context of the ascription differs from Mary’s orientation in the context of the reported utterance. Hence our test seems to lead to the absurd result that “left” is not context-sensitive.

One interpretation of this puzzling fact is that our test using disquotational “says” ascriptions is a bit harder to apply than one might have thought. For, to apply it, one needs to be sure that the context of the ascription really does differ from the context of the original utterance in the value of the relevant contextual parameter . And in the case of disquotational reports using “left”, one might think that examples like the above show that the relevant contextual parameter is sometimes not the orientation of the speaker, but rather the orientation of the subject of the ascription at the time of the relevant utterance.

This is but one criterion for context-sensitivity. But discussion of this criterion brings out the fact that the reliability of an application of a test for context-sensitivity will in general not be independent of the space of views one might take about the contextual parameters to which a given expression is sensitive. For an illuminating discussion of ways in which we might revise tests for context-sensitivity using disquotational reports which are sensitive to the above data, see Cappelen & Hawthorne (2009). For a critical survey of other proposed tests for context-sensitivity, see Cappelen & Lepore (2005: Part I).

This is just an introduction to one central issue concerning the relationship between context and semantic content. A sampling of other influential works on this topic includes Sperber and Wilson (1995), Carston (2002), Recanati (2004, 2010), Bezuidenhout (2002), and the essays in Stanley (2007).

§2.1.5 introduced the idea of an expression determining a reference, relative to a context, with respect to a particular circumstance of evaluation. But that discussion left the notion of a circumstance of evaluation rather underspecified. One might want to know more about what, exactly, these circumstances of evaluation involve—and hence about what sorts of things the reference of an expression can (once we’ve fixed a context) vary with respect to.

One way to focus this question is to stay at the level of sentences, and imagine that we have fixed on a sentence S, with a certain character, and a context C. If sentences express propositions relative to contexts, then S will express some proposition P relative to C. If the determination of reference in general depends not just on character and context, but also on circumstance, then we know that P might have different truth-values relative to different circumstances of evaluation. Our question is: exactly what must we specify in order to determine P’s truth-value?

Let’s say that an index is the sort of thing which, for some proposition P , we must at least sometimes specify in order to determine P’s truth-value. Given this usage, we can think of circumstances of evaluation—the things which play the theoretical role outlined in §2.1.5 —as made up of indices.

The most uncontroversial candidate for an index is a world, because most advocates of a propositional semantics think that propositions can have different truth-values with respect to different possible worlds. The main question is whether circumstances of evaluation need contain any indices other than a possible world.

The most popular candidate for a second index is a time. The view that propositions can have different truth-values with respect to different times—and hence that we need a time index—is often called “temporalism”. The negation of temporalism is eternalism.

The motivations for temporalism are both metaphysical and semantic. On the metaphysical side, A-theorists about time (see the entry on time ) think that corresponding to predicates like “is a child” are A-series properties which a thing can have at one time, and lack at another time. (Hence, on this view, the property corresponding to “is a child” is not a property like being a child in 2014 , since that is a property which a thing has permanently if at all, and hence is a B-series rather than A-series property.) But then it looks like the proposition expressed by “Violet is a child”—which predicates this A-series property of Violet—should have different truth-values with respect to different times. And this is enough to motivate the view that we should have an index for a time.

On the semantic side, as Kaplan (1989) notes, friends of the idea that tenses are best modeled as operators have good reason to include a time index in circumstances of evaluation. After all, operators operate on contents, so if there are temporal operators, they will only be able to affect truth-values if those contents can have different truth-values with respect to different times.
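Kaplan’s point can be made vivid with a toy model in which circumstances of evaluation are world–time pairs, propositions are functions from such pairs to truth-values, and a tense operator shifts the time index. The particular proposition, world, and times used are illustrative assumptions.

```python
# A toy model of circumstances of evaluation as (world, time) pairs.
# A proposition is a function from such pairs to truth-values; a tense
# operator shifts the time index. The details are illustrative.

from typing import Callable, Tuple

Circumstance = Tuple[str, int]          # (world, time)
Prop = Callable[[Circumstance], bool]

# The temporalist's proposition "Violet is a child": true at the actual
# world at times before (say) 2030, false thereafter.
violet_is_a_child: Prop = lambda c: c[0] == "actual" and c[1] < 2030

def it_was_the_case_that(p: Prop) -> Prop:
    """A past-tense operator: true at (w, t) iff p is true at (w, t')
    for some earlier time t'."""
    return lambda c: any(p((c[0], t)) for t in range(c[1]))

print(violet_is_a_child(("actual", 2014)))                        # True
print(violet_is_a_child(("actual", 2054)))                        # False
print(it_was_the_case_that(violet_is_a_child)(("actual", 2054)))  # True
```

If, as the eternalist holds, the proposition were true or false once and for all, shifting the time index in this way could make no difference to its truth-value; that is why friends of tense operators want circumstances of evaluation to include a time index.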

A central challenge for the view that propositions can change truth-value over time is whether the proponent of this view can make sense of retention of propositional attitudes over time. For suppose that I believe in 2014 that Violet is a child. Intuitively, I might hold fixed all of my beliefs about Violet for the next 40 years, without its being true, in 2054, that I have the obviously false belief that Violet is still a child. But the temporalist, who thinks of the proposition that Violet is a child as something which incorporates no reference to a time and changes truth-value over time, seems stuck with this result. Problems of this sort for temporalism are developed in Richard (1981); for a response see Sullivan (2014).

Motivations for eternalism are also both metaphysical and semantic. Those attracted to B-theories of time will take propositions to have their truth-values eternally, which makes inclusion of a time index superfluous. And those who think that tenses are best modeled in terms of quantification over times rather than using tense operators will, similarly, see no use for a time index. For a defense of the quantificational over the operator analysis of tense, see King (2003).

Is there a case to be made for including any indices other than a world and a time? There is; and this has spurred much of the recent interest in relativist semantic theories. Relativist semantic theories hold that our indices should include not just a world and (perhaps) a time, but also a context of assessment . Just as propositions can have different truth values with respect to different worlds, so, on this view, they can vary in their truth depending upon features of the conversational setting in which they are considered. (Though this way of putting things assumes that the relativist should be a “truth relativist” rather than a “content relativist”. See for discussion Weatherson and Egan 2011: § 2.3.)

The motivations for this sort of view can be illustrated by a type of example whose importance is emphasized in Egan et al. (2005). Suppose that, at the beginning of a murder investigation, I say

The murderer might have been on campus at midnight.

It looks like the proposition expressed by this sentence will be true, roughly, if we don’t know anything which rules out the murderer having been on campus at midnight. But now suppose that more information comes in, some of which rules out the murderer having been on campus at midnight. At this point, it seems, I could truly say

What I said was false—the murderer couldn’t have been on campus at midnight.

But this is puzzling. It is not puzzling that the sentence “The murderer might have been on campus at midnight” could be true when uttered in the first context but false when uttered in the second context; that fact could be accommodated by any number of contextualist treatments of epistemic modals, which would dissolve the puzzle by saying that the sentence expresses different propositions relative to the two contexts. The puzzle is that the truth of the second sentence seems to imply that the proposition expressed by the first—which we agreed was true relative to that context—is false relative to the second context. Here we don’t have (or don’t just have) sentences varying in truth-value depending on context; we seem to have propositions varying in truth-value depending on context. The relativist about epistemic modals takes appearance here to be reality, and holds that, in addition to worlds (and maybe times), propositions can sometimes differ in their truth-value relative to contexts of assessment (roughly, the context in which the proposition is being considered). (Note that it is not essential to the case that the two contexts of assessment are at different times; much the same intuitions can be generated by considering cases of “eavesdropping”, in which one party overhears the utterance of some other group which lacks some of its evidence.)
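One way to picture what the relativist is adding is with a toy model on which the truth of an epistemic-modal proposition varies with a context of assessment. Modeling a context of assessment as the set of possibilities left open by the assessor’s information is an illustrative simplification, not the only way to implement the view.

```python
# A toy relativist model of "The murderer might have been on campus at
# midnight": one and the same proposition is true relative to one context
# of assessment and false relative to another. Representing an assessment
# context by the possibilities its information leaves open is illustrative.

def might_have_been_on_campus(open_possibilities: set) -> bool:
    """True relative to an assessment context iff that context's
    information leaves open the murderer's having been on campus."""
    return "murderer was on campus at midnight" in open_possibilities

early_assessment = {"murderer was on campus at midnight",
                    "murderer was at home at midnight"}
later_assessment = {"murderer was at home at midnight"}   # new evidence rules out campus

print(might_have_been_on_campus(early_assessment))  # True: as first assessed
print(might_have_been_on_campus(later_assessment))  # False: the same proposition, assessed later
```

A contextualist would instead say that the two utterances express different propositions; the relativist’s distinctive claim is that a single proposition is assessed differently from the two contexts of assessment.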

Relativist treatments of various expressions have also been motivated by certain apparent facts about disagreement. Lasersohn (2005) considers the example of predicates of personal taste. He points out that we’re often inclined to think that, if our tastes differ sufficiently, my utterance of “That soup is tasty” can be true even while your utterance of “That soup is not tasty” is also true. As above, this fact by itself is not especially surprising, and might seem to cry out for a contextualist treatment of “tasty”. But the puzzling thing is that, despite the fact that we think that each of us is uttering a sentence which expresses a true proposition, we are clearly disagreeing with each other. (You might say, after overhearing me, “No, that soup is not tasty”.)

The contrast here with indexicals is apparently quite sharp. If I say “I’m hungry”, and you’re not hungry, you’d never reply to my utterance by saying “No, I’m not hungry”—precisely because it’s obvious that we would not be disagreeing. So again we have a puzzle: a puzzle about how each of our “soup” sentences could express true propositions, despite those propositions contradicting each other. Relativism suggests an answer: these propositions are only true or false relative to individuals. The one I express is true relative to me, and its negation is true relative to you; they’re contradictory in the sense that it is impossible for both to be true relative to the same individual (at the same time).

It’s very controversial whether any of these relativist arguments are convincing. For more discussion, see the discussion of “new relativism” in the entry on relativism . For an explication of relativism and its application to various kinds of discourse, see MacFarlane (2014). For an extended critique of relativism, see Cappelen & Hawthorne (2009).

Most philosophers believe in propositions, and hence think that semantics should be done according to one of the three broad categories of propositionalist approaches sketched above: possible worlds semantics, Russellianism, or Fregeanism. But it is notable that of these three views, only one—possible worlds semantics—actually tells us what propositions are. (Even in that case, of course, one might ask what possible worlds are, and hence what propositions are sets of. See the entry on possible worlds.) Russellian and Fregean views make claims about what sorts of things are the constituents of propositions—but don’t tell us what the structured propositions so constituted are.

There are really two questions here. One is the question: what does it mean to say that x is a constituent of a proposition? The language of constituency suggests parthood; but there’s some reason to think that x ’s being a constituent of a proposition isn’t a matter of x ’s being a part of that proposition. This is perhaps clearest on a Russellian view, according to which ordinary physical objects can be constituents of propositions. The problem is that a thing can be a constituent of a proposition without every part of that thing being a constituent of that proposition; a proposition with me as a constituent, it seems, need not also have every single molecule that now composes me as a constituent. But that fact is inconsistent with the idea that constituency is parthood and the plausible assumption that parthood is transitive. For discussion of this and other problems, see Gilmore (2014), Keller (2013), and Merricks (2015).

Hence the proponent of structured propositions owes some account of what “structure” and “constituent” talk amounts to in this domain. And they can hardly take these notions as primitive, since it would then be very unclear what explanatory value the claim that propositions are structured could have.

The second, in some ways more fundamental, question, is: What sort of thing are propositions? To what metaphysical category do they belong? The simplest and most straightforward answer to this question is: “They belong to the sui generis category of propositions”. (This is the view of Plantinga (1974) and Merricks (2015).)

But recently many philosophers have sought to give different answers to this question, by trying to explain how propositions could be members of some other ontological category in which we have independent reason to believe. Recent work of this sort can be divided into three main families of views.

According to the first, propositions are a kind of fact. This view was, on some interpretations, advocated by Russell (1903) and Wittgenstein (1922). The most prominent current defender of this view is Jeffrey King. On his version of the view, propositions (at least the propositions expressed by sentences) are meta-linguistic facts about sentences. At a first pass, and ignoring some important subtleties, the proposition expressed by the sentence “Amelia talks” will be the fact that there is some language L, some expression x, some expression y, and some syntactic relation R such that R(x, y), x has Amelia as its semantic value, y has the property of talking as its semantic value, and R encodes ascription. In some respects, this view is not so far from—though much more thoroughly developed than—Wittgenstein’s view in the Tractatus that “a proposition is a propositional sign in its projective relation to the world” (3.12). See for development and defense of this view King (2007, 2014).

According to the second sort of view, propositions are a kind of property. Versions of this view vary both according to which properties they take propositions to be, and what they take propositions to be properties of. This view is most closely associated with David Lewis (1979) and Chisholm (1981), who took the objects of propositional attitudes to be properties which the bearer of the attitude ascribes to him- or herself. Other versions of the view are defended by van Inwagen (2004) and Gilmore (forthcoming), who take propositions to be 0-place relations, and Richard (2013) and Speaks (2014), who take propositions to be monadic properties of certain sorts.

According to the third sort of view, propositions are entities which are, or owe their existence to, the mental acts of subjects. While their views differ in many ways, both Hanks (2007, 2011) and Soames (2010, 2014) think of propositions as acts of predication. In the simplest case—a monadic predication—the proposition will be the act of predicating a property of an object.

Of course, not all views fit into these three categories. An important view which fits into none of them is defended in Moltmann (2013).

Different theorists differ, not just in their views about what propositions are, but also in their views about what a theory of propositions should explain. The representational properties of propositions are a case in point. Hanks, King, and Soames take one of the primary tasks of a theory of propositions to be the explanation of the representational properties of propositions. Others, like McGlone (2012) and Merricks (2015), hold that a proposition’s having certain representational properties is a primitive matter. Still others, like Richard and Speaks, deny that propositions have representational properties in any interesting sense. See for further discussion of these issues the entry on structured propositions.

3. Foundational Theories of Meaning

We now turn to our second sort of “theory of meaning”: foundational theories of meaning, which are attempts to specify the facts in virtue of which expressions of natural languages come to have the semantic properties that they have.

The question which foundational theories of meaning try to answer is a common sort of question in philosophy. In the philosophy of action (see entry) we ask what the facts are in virtue of which a given piece of behavior is an intentional action; in questions about personal identity (see entry) we ask what the facts are in virtue of which x and y are the same person; in ethics we ask what the facts are in virtue of which a given action is morally right or wrong. But, even if they are common enough, it is not obvious what the constraints are on answers to these sorts of questions, or when we should expect questions of this sort to have interesting answers.

Accordingly, one sort of approach to foundational theories of meaning is simply to deny that there is any true foundational theory of meaning. One might be quite willing to endorse one of the semantic theories outlined above while also holding that facts about the meanings of expressions are primitive, in the sense that there is no systematic story to be told about the facts in virtue of which expressions have the meanings that they have. (See, for example, Johnston 1988.)

There is another reason why one might be pessimistic about the prospects of foundational theories of meaning. While plainly distinct from semantics, the attempt to provide foundational theories is clearly in one sense answerable to semantic theorizing, since without a clear view of the facts about the semantic contents of expressions we won't have a clear view of the facts for which we are trying to provide an explanation. One might, then, be skeptical about the prospects of foundational theories of meaning not because of a general primitivist view of semantic facts, but just because one holds that natural language semantics is not yet advanced enough for us to have a clear grip on the semantic facts which foundational theories of meaning aim to analyze. (See for discussion Yalcin (2014).)

Many philosophers have, however, attempted to provide foundational theories of meaning. This section sets aside pessimism about the prospects for such theories and lays out the main attempts to give a systematic account of the facts about language users in virtue of which their words have the semantic properties that they do. It is useful to separate these theories into two camps.

According to the first sort of view, linguistic expressions inherit their contents from some other sort of bearer of content. So, for example, one might say that linguistic expressions inherit their contents from the contents of certain mental states with which they are associated. I’ll call views of this type mentalist theories. Mentalist theories are discussed in §3.1, and non-mentalist theories in §3.2.

3.1 Mentalist Theories

All mentalist theories of meaning have in common that they analyze one sort of representation—linguistic representation—in terms of another sort of representation—mental representation. For philosophers who are interested in explaining content, or representation, in non-representational terms, then, mentalist theories can only be a first step in the task of giving an ultimate explanation of the foundations of linguistic representation. The second, and more fundamental explanation would then come at the level of a theory of mental content. (For an overview of theories of this sort, see entry on mental representation and the essays in Stich and Warfield 1994.) Indeed, the popularity of mentalist theories of linguistic meaning, along with the conviction that content should be explicable in non-representational terms, is an important reason why so much attention has been focused on theories of mental representation over the last few decades.

Since mentalists aim to explain the nature of meaning in terms of the mental states of language users, mentalist theories may be divided according to which mental states they take to be relevant to the determination of meaning. The most well-worked out views on this topic are the Gricean view, which explains meaning in terms of the communicative intentions of language users, and the view that the meanings of expressions are fixed by conventions which pair sentences with certain beliefs. We will discuss these in turn, followed by a brief discussion of a third alternative available to the mentalist.

Paul Grice developed an analysis of meaning which can be thought of as the conjunction of two claims: (1) facts about what expressions mean are to be explained, or analyzed, in terms of facts about what speakers mean by utterances of them, and (2) facts about what speakers mean by their utterances can be explained in terms of their intentions. These two theses comprise the “Gricean program” for reducing meaning to the contents of the intentions of speakers.

To understand Grice’s view of meaning, it is important first to be clear on the distinction between the meaning, or content, of linguistic expressions —which is what semantic theories like those discussed in §2 aim to describe—and what speakers mean by utterances employing those expressions. This distinction can be illustrated by example. (See entry on pragmatics for more discussion.) Suppose that in response to a question about the weather in the city where I live, I say “Well, South Bend is not exactly Hawaii”. The meaning of this sentence is fairly clear: it expresses the (true) proposition that South Bend, Indiana is not identical to Hawaii. But what I mean by uttering this sentence is something more than this triviality: I mean by the utterance that the weather in South Bend is not nearly as good as that in Hawaii. And this example utterance is in an important respect very typical: usually the propositions which speakers mean to convey by their utterances include propositions other than the one expressed by the sentence used in the context. When we ask “What did you mean by that?” we are usually not asking for the meaning of the sentence uttered.

The idea behind stage (1) of Grice’s theory of meaning is that of these two phenomena, speaker-meaning is the more fundamental: sentences and other expressions mean what they do because of what speakers mean by their utterances of those sentences. (For more details about how Grice thought that sentence-meaning could be explained in terms of speaker-meaning, see the discussion of resultant procedures in the entry on Paul Grice .) One powerful way to substantiate the claim that speaker-meaning is explanatorily prior to expression-meaning would be to show that facts about speaker-meaning may be given an analysis which makes no use of facts about what expressions mean; and this is just what stage (2) of Grice’s analysis, to which we now turn, aims to provide.

Grice thought that speaker-meaning could be analyzed in terms of the communicative intentions of speakers—in particular, their intentions to cause beliefs in their audience.

The simplest version of this idea would hold that meaning p by an utterance is just a matter of intending that one’s audience come to believe p . But this can’t be quite right. Suppose I turn to you and say, “You’re standing on my foot”. I intend that you hear the words I am saying; so I intend that you believe that I have said, “You’re standing on my foot”. But I do not mean by my utterance that I have said, “You’re standing on my foot”. That is my utterance—what I mean by it is the proposition that you are standing on my foot, or that you should get off of my foot. I do not mean by my utterance that I am uttering a certain sentence.

This sort of example indicates that speaker meaning can’t just be a matter of intending to cause a certain belief—it must be intending to cause a certain belief in a certain way. But what, in addition to intending to cause the belief, is required for meaning that p ? Grice’s idea was that one must not only intend to cause the audience to form a belief, but also intend that they do so on the basis of their recognition of the speaker’s intention. This condition is not met in the above example: I don’t expect you to believe that I have uttered a certain sentence on the basis of your recognition of my intention that you do so; after all, you’d believe this whether or not I wanted you to. This is all to the good.

This Gricean analysis of speaker-meaning can be formulated as follows: [ 5 ]

  • [G] a means p by uttering x iff a intends in uttering x that
  • 1. his audience come to believe p ,
  • 2. his audience recognize this intention, and
  • 3. (1) occur on the basis of (2).

However, even if [G] can be given a fairly plausible motivation, and fits many cases rather well, it is also open to some convincing counterexamples. Three such types of cases are: (i) cases in which the speaker means p by an utterance despite knowing that the audience already believes p , as in cases of reminding or confession; (ii) cases in which a speaker means p by an utterance, such as the conclusion of an argument, which the speaker intends an audience to believe on the basis of evidence rather than recognition of speaker intention; and (iii) cases in which there is no intended audience at all, as in uses of language in thought. These cases call into question whether there is any connection between speaker-meaning and intended effects stable enough to ground an analysis of the sort that Grice envisaged; it is still a matter of much controversy whether an explanation of speaker meaning descended from [G] can succeed.

For developments of the Gricean program, see—in addition to the classic essays in Grice (1989)—Schiffer (1972), Neale (1992), and Davis (2002). For an extended criticism, see Schiffer (1987).

An important alternative to the Gricean analysis, which shares the Gricean commitment to a mentalist analysis of meaning in terms of the contents of mental states, is the analysis of meaning in terms of the beliefs, rather than the intentions, of speakers.

It is intuitively plausible that such an analysis should be possible. After all, there clearly are regularities which connect utterances and the beliefs of speakers; roughly, it seems that, for the most part, speakers seriously utter a sentence which (in the context) means p only if they also believe p . One might then try to analyze meaning directly in terms of the beliefs of language users, by saying that what it is for a sentence S to express some proposition p is for it to be the case that, typically, members of the community would not utter S unless they believed p . However, we can imagine a community in which there is some action which everyone would only perform were they to believe some proposition p , but which is such that no member of the community knows that any other member of the community acts according to a rule of this sort. It is plausible that in such a community, the action-type in question would not express the proposition p , or indeed have any meaning at all.

Because of cases like this, it seems that regularities in meaning and belief are not sufficient to ground an analysis of meaning. For this reason, many proponents of a mentalist analysis of meaning in terms of belief have sought instead to analyze meaning in terms of conventions governing such regularities. Roughly, a regularity is a matter of convention when the regularity obtains because there is something akin to an agreement among a group of people to keep the regularity in place. So, applied to our present example, the idea would be (again roughly) that for a sentence S to express a proposition p in some group is for there to be something like an agreement in that group to maintain some sort of regularity between utterances of S and agents’ believing p . This seems to be what is lacking in the example described in the previous paragraph.

There are different ways to make this rough idea precise (see the entry on convention ). According to one important view, a sentence S expresses the proposition p if and only if the following three conditions are satisfied:

  • 1. speakers typically utter S only if they believe p , and typically come to believe p upon hearing S ,
  • 2. members of the community believe that (1) is true, and
  • 3. the fact that members of the community believe that (1) is true, and believe that other members of the community believe that (1) is true, gives them a good reason to go on acting so as to make (1) true.

(This is a simplified version of the theory defended in Lewis 1975.)

For critical discussion of this sort of analysis of meaning, see Burge 1975, Hawthorne 1990, Laurence 1996, and Schiffer 2006.

The two sorts of mentalist theories sketched above both try to explain meaning in terms of the relationship between linguistic expressions and propositional attitudes of users of the relevant language. But this is not the only sort of theory available to a theorist who wants to analyze meaning in terms of broadly mental representation. A common view in the philosophy of mind and cognitive science is that the propositional attitudes of subjects are underwritten by an internal language of thought, comprised of mental representations. (See entry on the computational theory of mind .) One might try to explain linguistic meaning directly in terms of the contents of mental representations, perhaps by thinking of language processing as pairing linguistic expressions with mental representations; one could then think of the meaning of the relevant expression for that individual as being inherited from the content of the mental representation with which it is paired.

While this view has, historically, not enjoyed as much attention as the mentalist theories discussed in the preceding two subsections, it is a natural view for anyone who endorses the widely held thesis that semantic competence is to be explained by some sort of internal representation of the semantic facts. If we need to posit such internal representations anyway, it is natural to think that the meaning of an expression for an individual can be explained in terms of that individual's representation of its meaning. For discussion of this sort of theory, see Laurence (1996).

Just as proponents of Gricean and convention-based theories typically view their theories as only the first stage in an analysis of meaning—because they analyze meaning in terms of another sort of mental representation—so proponents of mental representation-based theories will typically seek to provide an independent analysis of contents of mental representations. For an overview of attempts to provide the latter sort of theory, see the entry on mental representation and the essays in Stich and Warfield (1994).

3.2 Non-Mentalist Theories

As noted above, not all foundational theories of meaning attempt to explain meaning in terms of mental content. One might be inclined to pursue a non-mentalist foundational theory of meaning for a number of reasons; for example, one might be skeptical about the mentalist theories on offer; one might think that mental representation should be analyzed in terms of linguistic representation, rather than the other way around; or one might think that representation should be analyzable in non-representational terms, and doubt whether there is any true explanation of mental representation suitable to accompany a mentalist reduction of meaning to mental representation.

To give a non-mentalist foundational theory of meaning is to say which aspects of the use of an expression determine meaning—and do so without taking that expression to simply inherit its content from some more fundamental bearer of content. In what follows I’ll briefly discuss some of the aspects of the use of expressions which proponents of non-mentalist theories have taken to explain their meanings.

In Naming and Necessity , Kripke suggested that the reference of a name could be explained in terms of the history of use of that name, rather than by descriptions associated with that name by its users. In the standard case, Kripke thought, the right explanation of the reference of a name could be divided into an explanation of the name’s introduction as a name for this or that—an event of “baptism”—and its successful transmission from one speaker to another.

One approach to the theory of meaning is to extend Kripke’s remarks in two ways: first, by suggesting that they might serve as an account of meaning, as well as reference; [ 6 ] and second, by extending them to parts of speech other than names. (See, for discussion, Devitt 1981.) In this way, we might aim to explain the meanings of expressions in terms of their causal origin.

While causal theories don’t take expressions to simply inherit their contents from mental states, it is plausible that they should still give mental states an important role to play in explaining meaning. For example, it is plausible that introducing a term involves intending that it stand for some object or property, and that transmission of a term from one speaker to another involves the latter intending to use it in the same way as the former.

There are two standard problems for causal theories of this sort (whether they are elaborated in a mentalist or a non-mentalist way). The first is the problem of extending the theory from the case of names to other sorts of vocabulary for which the theory seems less natural. Examples which have seemed to many to be problematic are empty names and non-referring theoretical terms, logical vocabulary, and predicates which, because their content does not seem closely related to the properties represented in perceptual experience, are not intuitively linked to any initial act of “baptism”. The second problem, which is sometimes called the “qua problem”, is the problem of explaining which of the many causes of a term’s introduction should determine its content. Suppose that the term “water” was introduced in the presence of a body of H2O. What made it a term for this substance, rather than for liquid in general, or colorless liquid, or colorless liquid in the region of the term’s introduction? The proponent of a causal theory owes some answer to this question; see for discussion Devitt and Sterelny (1987).

For a classic discussion of the prospects of causal theories, see Evans (1973). For a recent theory which makes causal origin part but not all of the story, see Dickie (2015).

Causal theories aim to explain meaning in terms of the relations between expressions and the objects and properties they represent. A very different sort of foundational theory of meaning which maintains this emphasis on the relations between expressions and the world gives a central role to a principle of charity which holds that (modulo some qualifications) the right assignment of meanings to the expressions of a subject’s language is that assignment of meanings which maximizes the truth of the subject’s utterances.

An influential proponent of this sort of view was Donald Davidson, who stated the motivation for the view as follows:

A central source of trouble is the way beliefs and meanings conspire to account for utterances. A speaker who holds a sentence to be true on an occasion does so in part because of what he means, or would mean, by an utterance of that sentence, and in part because of what he believes. If all we have to go on is the fact of honest utterance, we cannot infer the belief without knowing the meaning, and have no chance of inferring the meaning without the belief. (Davidson 1974a: 314; see also Davidson 1973)

Davidson’s idea was that attempts to state the facts in virtue of which expressions have a certain meaning for a subject face a kind of dilemma: if we had an independent account of what it is for an agent to have a belief with a certain content, we could ascend from there to an account of what it is for a sentence to have a meaning; if we had an independent account of what it is for a sentence to have a meaning, we could ascend from there to an account of what it is for an agent to have a belief with a certain content; but in fact neither sort of independent account is available, because many assignments of beliefs and meanings are consistent with the subject’s linguistic behavior. Davidson’s solution to this dilemma is that we must define belief and meaning together, in terms of an independent third fact: the fact that the beliefs of an agent, and the meanings of her words, are whatever they must be in order to maximize the truth of her beliefs and utterances.

By tying meaning and belief to truth, this sort of foundational theory of meaning implies that it is impossible for anyone who speaks a meaningful language to be radically mistaken about the nature of the world; and this implies that certain levels of radical disagreement between a pair of speakers or communities will also be impossible (since the beliefs of each community must be, by and large, true). This is a consequence of the view which Davidson embraced (see Davidson 1974b); but one might also reasonably think that radical disagreement, as well as radical error, are possible, and hence that any theory, like Davidson’s, which implies that they are impossible must be mistaken.

A different sort of worry about a theory of this sort is that the requirement that we maximize the truth of the utterances of subjects hardly seems sufficient to determine the meanings of the expressions of their language. It seems plausible, offhand, that there will be many different interpretations of a subject’s language which will be tied on the measure of truth-maximization; one way to see the force of this sort of worry is to recall the point, familiar from our discussion of possible worlds semantics in §2.1.5 above, that a pair of sentences can be true in exactly the same circumstances and yet differ in meaning. One worry is thus that a theory of Davidson’s sort will entail an implausible indeterminacy of meaning. For Davidson’s fullest attempt to answer this sort of worry, see Chapter 3 of Davidson (2005).

A different sort of theory emerges from a further objection to the sort of theory discussed in the previous section. This objection is based on Hilary Putnam’s (1980, 1981) model-theoretic argument. This argument aimed to show that there are very many different assignments of reference to subsentential expressions of our language which make all of our utterances true. (For details on how the argument works, see the entry on Skolem’s paradox , especially §3.4.) Putnam’s argument therefore leaves us with a choice between two options: either we must accept that there are no facts of the matter about what any of our expressions refer to, or we must deny that reference is determined solely by a principle of truth-maximization.

Most philosophers take the second option. Doing so, however, doesn’t mean that something like the principle of charity can’t still be part of our foundational theory of meaning.

David Lewis (1983, 1984) gave a version of this kind of response, which he credits to Merrill (1980), and which has since been quite influential. His idea was that the assignment of contents to expressions of our language is fixed, not just by the constraint that the right interpretation will maximize the truth of our utterances, but by picking the interpretation which does best at jointly satisfying the constraints of truth-maximization and the constraint that the referents of our terms should, as much as possible, be “the ones that respect the objective joints in nature” (1984: 227).

Such entities are often said to be more “eligible” to be the referents of expressions than others. An approach to the foundations of meaning based on the twin principles of charity + eligibility has some claim to being the most widely held view today. See Sider (2011) for an influential extension of the Lewisian strategy.

Lewis’ solution to Putnam’s problem comes with a non-trivial metaphysical price tag: recognition of an objective graded distinction between more and less natural properties. Some have found the price too much to pay, and have sought other approaches to the foundational theory of meaning. But even if we recognize in our metaphysics a distinction between properties which are “joint-carving” and those which are not, we might still doubt whether this distinction can remedy the sorts of indeterminacy problems which plague foundational theories based solely on the principle of charity. For doubts along these lines, see Hawthorne (2007).

A different way to develop a non-mentalist foundational theory of meaning focuses less on relations between subsentential expressions or sentences and bits of non-linguistic reality and more on the regularities which govern our use of language. Views of this sort have been defended by a number of authors; this section focuses on the version of the view developed in Horwich (1998, 2005).

Horwich’s core idea is that our acceptance of sentences is governed by certain laws, and, in the case of non-ambiguous expressions, there is a single “acceptance regularity” which explains all of our uses of the expression. The type of acceptance regularity which is relevant will vary depending on the sort of expression whose meaning is being explained. For example, our use of a perceptual term like “red” might be best explained by the following acceptance regularity:

The disposition to accept “that is red” in response to the sort of visual experience normally provoked by a red surface.

whereas, in the case of a logical term like “and”, the acceptance regularity will involve dispositions to accept inferences involving pairs of sentences rather than dispositions to respond to particular sorts of experiences:

The disposition to accept the two-way argument schema “ p , q // p and q ”.

As these examples illustrate, it is plausible that a strength of a view like Horwich’s is its ability to handle expressions of different categories.

Like its competitors, Horwich’s theory is also open to some objections. One might worry that his use of the sentential attitude of acceptance entails a lapse into mentalism, if acceptance either just is, or is analyzed in terms of, beliefs. There is also a worry—which affects other “use” or “conceptual role” or “functional role” theories of meaning—that Horwich’s account implies the existence of differences in meaning which do not exist; it seems, for example, that two people’s use of some term might be explained by distinct basic acceptance regularities without their meaning different things by that term. Schiffer (2000) discusses the example of “dog”, and the differences between the basic acceptance regularities which govern the use of the term for the blind, the biologically unsophisticated, and people acquainted only with certain sorts of dogs. [ 7 ]

This last concern about Horwich’s theory stems from the fact that the theory is, at its core, an individualist theory: it explains the meaning of an expression for an individual in terms of properties of that individual’s use of the term. A quite different sort of use theory of meaning turns from the laws which explain an individual’s use of a word to the norms which, in a society, govern the use of the relevant terms. Like the other views discussed here, the view that meaning is a product of social norms of this sort has a long history; it is particularly associated with the work of the later Wittgenstein and his philosophical descendants. (See especially Wittgenstein 1953.)

An important defender of this sort of view is Robert Brandom. On Brandom’s view, a sentence’s meaning is due to the conditions, in a given society, under which it is correct or appropriate to perform various speech acts involving the sentence. To develop a theory of this sort, one must do two things. First, one must show how the meanings of expressions can be explained in terms of these normative statuses—in Brandom’s (slightly nonstandard) terms, one must show how semantics can be explained in terms of pragmatics. Second, one must explain how these normative statuses can be instituted by social practices.

For details, see Brandom (1994), in which the view is developed at great length; for a critical discussion of Brandom’s attempt to carry out the second task above, see Rosen (1997). For discussion of the role (or lack thereof) of normativity in a foundational theory of meaning, see Hattiangadi (2007), Gluer and Wikforss (2009), and the entry on meaning normativity .

  • Ayer, Alfred Jules, 1936, Language, Truth, and Logic , London: Victor Gollancz.
  • Beaney, Michael (ed.), 1997, The Frege Reader , Oxford: Basil Blackwell.
  • Bezuidenhout, Anne, 2002, “Truth-Conditional Pragmatics”, Philosophical Perspectives , 16: 105–134.
  • Brandom, Robert B., 1994, Making It Explicit: Reasoning, Representing, and Discursive Commitment , Cambridge, MA: Harvard University Press.
  • –––, 2000, Articulating Reasons: An Introduction to Inferentialism , Cambridge, MA: Harvard University Press.
  • Braun, David, 1993, “Empty Names”, Noûs , 27(4): 449–469. doi:10.2307/2215787
  • Braun, David and Jennifer Saul, 2002, “Simple Sentences, Substitutions, and Mistaken Evaluations”, Philosophical Studies , 111(1): 1–41. doi:10.1023/A:1021287328280
  • Burge, Tyler, 1975, “On Knowledge and Convention”, The Philosophical Review , 84(2): 249–255. doi:10.2307/2183970
  • –––, 1986, “On Davidson’s ‘Saying That’”, in Truth and Interpretation: Perspectives on the Philosophy of Donald Davidson , Ernest Lepore (ed.), Oxford: Blackwell Publishing, 190–210.
  • Burgess, Alexis and Brett Sherman (eds.), 2014, Metasemantics: New Essays on the Foundations of Meaning , Oxford: Oxford University Press. doi:10.1093/acprof:oso/9780199669592.001.0001
  • Caplan, Ben, 2005, “Against Widescopism”, Philosophical Studies , 125(2): 167–190. doi:10.1007/s11098-004-7814-1
  • Cappelen, Herman and John Hawthorne, 2009, Relativism and Monadic Truth , Oxford: Oxford University Press. doi:10.1093/acprof:oso/9780199560554.001.0001
  • Cappelen, Herman and Ernie Lepore, 2005, Insensitive Semantics: A Defense of Semantic Minimalism and Speech Act Pluralism , Malden, MA: Blackwell Publishing. doi:10.1002/9780470755792
  • Carnap, Rudolf, 1947, Meaning and Necessity: A Study in Semantics and Modal Logic , Chicago: University of Chicago Press.
  • Carston, Robyn, 2002, Thoughts and Utterances: The Pragmatics of Explicit Communication , Malden, MA: Blackwell Publishing. doi:10.1002/9780470754603
  • Chalmers, David J., 2004, “Epistemic Two-Dimensional Semantics”, Philosophical Studies , 118(1/2): 153–226. doi:10.1023/B:PHIL.0000019546.17135.e0
  • –––, 2006, “The Foundations of Two-Dimensional Semantics”, in Two-Dimensional Semantics , Manuel Garcia-Carpintero and Josep Macià (eds.), Oxford: Clarendon Press, 55–140.
  • –––, 2011, “Propositions and Attitude Ascriptions: A Fregean Account”, Noûs , 45(4): 595–639. doi:10.1111/j.1468-0068.2010.00788.x
  • Chisholm, Roderick M., 1981, The First Person: An Essay on Reference and Intentionality , Minneapolis, MN: University of Minnesota Press.
  • Chomsky, Noam, 2000, New Horizons in the Study of Language and Mind , Cambridge: Cambridge University Press. doi:10.1017/CBO9780511811937
  • Cohen, Stewart, 1986, “Knowledge and Context”, The Journal of Philosophy , 83(10): 574–583. doi:10.2307/2026434
  • Davidson, Donald, 1967, “Truth and Meaning”, Synthese , 17(1): 304–323; reprinted in Davidson 1984: 17–36. doi:10.1007/BF00485035
  • –––, 1968, “On Saying That”, Synthese , 19(1–2): 130–146. doi:10.1007/BF00568054
  • –––, 1973, “Radical Interpretation”, Dialectica , 27(3–4): 313–328. doi:10.1111/j.1746-8361.1973.tb00623.x
  • –––, 1974a, “Belief and the Basis of Meaning”, Synthese , 27(3–4): 309–323. doi:10.1007/BF00484597
  • –––, 1974b, “On the Very Idea of a Conceptual Scheme”, Proceedings and Addresses of the American Philosophical Association , 47: 5–20; reprinted in Davidson 1984: 183–198. doi:10.2307/3129898
  • –––, 1976, “Reply to Foster”, in Evans and McDowell (eds.) 1976: 33–41.
  • –––, 1984, Inquiries into Truth and Interpretation , Oxford: Oxford University Press.
  • –––, 2005, Truth and Predication , Cambridge, MA: Harvard University Press.
  • Davis, Wayne A., 2002, Meaning, Expression and Thought , Cambridge: Cambridge University Press. doi:10.1017/CBO9780511498763
  • DeRose, Keith, 1992, “Contextualism and Knowledge Attributions”, Philosophy and Phenomenological Research , 52(4): 913–929. doi:10.2307/2107917
  • Devitt, Michael, 1981, Designation , New York: Columbia University Press. [ Devitt 1981 available online ]
  • Devitt, Michael and Kim Sterelny, 1987, Language and Reality: An Introduction to the Philosophy of Language , Cambridge, MA: MIT Press.
  • Dickie, Imogen, 2015, Fixing Reference , Oxford: Oxford University Press. doi:10.1093/acprof:oso/9780198755616.001.0001
  • Dummett, Michael A.E., 1981, The Interpretation of Frege’s Philosophy , Cambridge, MA: Harvard University Press.
  • Egan, Andy, John Hawthorne, and Brian Weatherson, 2005, “Epistemic Modals in Context”, in Preyer and Peter 2005: 131–170.
  • Egan, Andy and Brian Weatherson (eds.), 2011, Epistemic Modality , Oxford: Oxford University Press. doi:10.1093/acprof:oso/9780199591596.001.0001
  • Evans, Gareth, 1973, “The Causal Theory of Names”, Proceedings of the Aristotelian Society (Supplement), 47 (1): 187–208.
  • –––, 1981, “Understanding Demonstratives”, in Meaning and Understanding , Herman Parret and Jacques Bouveresse (eds.), New York: Walter de Gruyter, 280–304; reprinted in his 1996 Collected Papers , Oxford: Clarendon Press, 291–304.
  • Evans, Gareth and John McDowell (eds.), 1976, Truth and Meaning: Essays in Semantics , Oxford: Clarendon Press.
  • Fine, Kit, 2007, Semantic Relationism , New York: Blackwell Publishing.
  • Foster, J.A., 1976, “Meaning and Truth Theory”, in Evans and McDowell (eds.) 1976: 1–32.
  • Frege, Gottlob, 1879 [1997], Begriffsschrift , Halle: Louis Nebert; translated and reprinted in Beaney 1997: 47–79.
  • –––, 1892 [1960], “Über Sinn und Bedeutung” (On Sense and Reference), Zeitschrift für Philosophie und philosophische Kritik , 100: 25–50. Translated and reprinted in Translations from the Philosophical Writings of Gottlob Frege , Peter Geach and Max Black (eds.), Oxford: Basil Blackwell, 1960, 56–78.
  • –––, 1906 [1997], “Kurze Übersicht meiner logischen Lehren?”, unpublished. Translated as “A Brief Survey of My Logical Doctrines”, in Beaney 1997: 299–300.
  • Geach, P. T., 1960, “Ascriptivism”, The Philosophical Review , 69(2): 221–225. doi:10.2307/2183506
  • –––, 1965, “Assertion”, The Philosophical Review , 74(4): 449–465. doi:10.2307/2183123
  • Gibbard, Allan, 1990, Wise Choices, Apt Feelings: A Theory of Normative Judgment , Cambridge, MA: Harvard University Press.
  • –––, 2003, Thinking How to Live , Cambridge, MA: Harvard University Press.
  • Gilmore, Cody, 2014, “Parts of Propositions”, in Mereology and Location , Shieva Kleinschmidt (ed.), Oxford: Oxford University Press, 156–208. doi:10.1093/acprof:oso/9780199593828.003.0009
  • –––, forthcoming, “Why 0-adic Relations Have Truth Conditions”, in Tillman forthcoming.
  • Gluer, Kathrin and Åsa Wikforss, 2009, “Against Content Normativity”, Mind , 118(469): 31–70. doi:10.1093/mind/fzn154
  • Graff Fara, Delia, 2015, “Names Are Predicates”, Philosophical Review , 124(1): 59–117. doi:10.1215/00318108-2812660
  • Grice, H. P., 1957, “Meaning”, The Philosophical Review , 66(3): 377–388; reprinted in Grice 1989: 213–223. doi:10.2307/2182440
  • –––, 1969, “Utterer’s Meaning and Intention”, The Philosophical Review , 78(2): 147–177; reprinted in Grice 1989: 86–116. doi:10.2307/2184179
  • –––, 1989, Studies in the Way of Words , Cambridge, MA: Harvard University Press.
  • Hanks, Peter W., 2007, “The Content–Force Distinction”, Philosophical Studies , 134(2): 141–164. doi:10.1007/s11098-007-9080-5
  • –––, 2011, “Structured Propositions as Types”, Mind , 120(477): 11–52. doi:10.1093/mind/fzr011
  • Hare, R. M., 1952, The Language of Morals , Oxford: Oxford University Press. doi:10.1093/0198810776.001.0001
  • Hattiangadi, Anandi, 2007, Oughts and Thoughts: Scepticism and the Normativity of Meaning , Oxford: Oxford University Press. doi:10.1093/acprof:oso/9780199219025.001.0001
  • Hawthorne, John, 1990, “A Note on ‘Languages and Language’”, Australasian Journal of Philosophy , 68(1): 116–118. doi:10.1080/00048409012340233
  • –––, 2006, “Testing for Context-Dependence”, Philosophy and Phenomenological Research , 73(2): 443–450. doi:10.1111/j.1933-1592.2006.tb00627.x
  • –––, 2007, “Craziness and Metasemantics”, Philosophical Review , 116(3): 427–440. doi:10.1215/00318108-2007-004
  • Heim, Irene, 1982, The Semantics of Definite and Indefinite Noun Phrases , Ph.D. thesis, Department of Linguistics, University of Massachusetts at Amherst
  • Heim, Irene and Angelika Kratzer, 1998, Semantics in Generative Grammar , (Blackwell Textbooks in Linguistics 13), Malden, MA: Blackwell.
  • Horwich, Paul, 1998, Meaning , Oxford: Oxford University Press. doi:10.1093/019823824X.001.0001
  • –––, 2005, Reflections on Meaning , Oxford: Clarendon Press. doi:10.1093/019925124X.001.0001
  • Jeshion, Robin, 2015, “Referentialism and Predicativism About Proper Names”, Erkenntnis , 80(S2): 363–404. doi:10.1007/s10670-014-9700-3
  • Johnston, Mark, 1988, “The End of the Theory of Meaning”, Mind & Language , 3(1): 28–42. doi:10.1111/j.1468-0017.1988.tb00131.x
  • Kamp, Hans, 1971, “Formal Properties of ‘Now’”, Theoria , 37(3): 227–273. doi:10.1111/j.1755-2567.1971.tb00071.x
  • –––, 1981, “A Theory of Truth and Semantic Representation”, in Formal Methods in the Study of Language , Jeroen A. G. Groenendijk, Theo M. V. Janssen, and M. B. J. Stokhof (eds.), Amsterdam: Mathematisch Centrum; reprinted in Formal Semantics: The Essential Readings , Paul Portner and Barbara H. Partee (eds.), Oxford: Blackwell Publishers, 189–222. doi:10.1002/9780470758335.ch8
  • Kaplan, David, 1989, “Demonstratives”, in Themes from Kaplan , Joseph Almog, John Perry, and Howard Wettstein (eds.), New York: Oxford University Press, 481–563.
  • Keller, Lorraine, 2013, “The Metaphysics of Propositional Constituency”, Canadian Journal of Philosophy , 43(5–6): 655–678. doi:10.1080/00455091.2013.870735
  • King, Jeffrey C., 2003, “Tense, Modality, and Semantic Values”, Philosophical Perspectives , 17(1): 195–246. doi:10.1111/j.1520-8583.2003.00009.x
  • –––, 2007, The Nature and Structure of Content , Oxford: Oxford University Press.
  • –––, 2014, “Naturalized Propositions”, in King, Soames, and Speaks (eds.) 2014: 47–70.
  • King, Jeffrey C., Scott Soames, and Jeff Speaks (eds.), 2014, New Thinking about Propositions , Oxford: Oxford University Press. doi:10.1093/acprof:oso/9780199693764.001.0001
  • Kirk-Giannini, Cameron Domenico and Ernie Lepore, 2017, “De Ray: On the Boundaries of the Davidsonian Semantic Programme”, Mind , 126(503): 697–714. doi:10.1093/mind/fzv186
  • Kolbel, Max, 2001, “Two Dogmas of Davidsonian Semantics”, The Journal of Philosophy , 98(12): 613–635. doi:10.2307/3649462
  • Kripke, Saul A., 1972, Naming and Necessity , Cambridge, MA: Harvard University Press.
  • –––, 1979, “A Puzzle about Belief”, in Meaning and Use , Avishai Margalit (ed.), (Synthese Language Library 3), Dordrecht: Springer Netherlands, 239–283. doi:10.1007/978-1-4020-4104-4_20
  • –––, 1982, Wittgenstein on Rules and Private Language: An Elementary Exposition , Cambridge, MA: Harvard University Press.
  • Larson, Richard K. and Peter Ludlow, 1993, “Interpreted Logical Forms”, Synthese , 95(3): 305–355. doi:10.1007/BF01063877
  • Larson, Richard K and Gabriel Segal, 1995, Knowledge of Meaning: An Introduction to Semantic Theory , Cambridge, MA: MIT Press.
  • Lasersohn, Peter, 2005, “Context Dependence, Disagreement, and Predicates of Personal Taste”, Linguistics and Philosophy , 28(6): 643–686. doi:10.1007/s10988-005-0596-x
  • Laurence, Stephen, 1996, “A Chomskian Alternative to Convention-Based Semantics”, Mind , 105(418): 269–301. doi:10.1093/mind/105.418.269
  • Lepore, Ernest and Kirk Ludwig, 2007, Donald Davidson’s Truth-Theoretic Semantics , Oxford: Oxford University Press. doi:10.1093/acprof:oso/9780199290932.001.0001
  • Lepore, Ernest and Barry Loewer, 1989, “You Can Say That Again”, Midwest Studies in Philosophy , 14: 338–356. doi:10.1111/j.1475-4975.1989.tb00196.x
  • Lewis, David, 1970, “General Semantics”, Synthese , 22(1–2): 18–67. doi:10.1007/BF00413598
  • –––, 1975, “Languages and Language”, in Language, Mind, and Knowledge , Keith Gunderson (ed.), Minneapolis: University of Minnesota Press.
  • –––, 1979, “Attitudes De Dicto and De Se ”, The Philosophical Review , 88(4): 513–543. doi:10.2307/2184843
  • –––, 1980, “Index, Context, and Content”, in Philosophy and Grammar , Stig Kanger and Sven Ōhman (eds.) (Synthese Library 143), Dordrecht: Springer Netherlands, 79–100. doi:10.1007/978-94-009-9012-8_6
  • –––, 1983, “New Work for a Theory of Universals”, Australasian Journal of Philosophy , 61(4): 343–377. doi:10.1080/00048408312341131
  • –––, 1984, “Putnam’s Paradox”, Australasian Journal of Philosophy , 62(3): 221–236. doi:10.1080/00048408412340013
  • –––, 1996, “Elusive Knowledge”, Australasian Journal of Philosophy , 74(4): 549–567. doi:10.1080/00048409612347521
  • Lewis, Karen S., 2014, “Do We Need Dynamic Semantics?”, in Burgess and Sherman 2014: 231–258. doi:10.1093/acprof:oso/9780199669592.003.0010
  • MacFarlane, John, 2014, Assessment Sensitivity: Relative Truth and Its Applications , Oxford: Oxford University Press. doi:10.1093/acprof:oso/9780199682751.001.0001
  • –––, 2016, “I – Vagueness as Indecision”, Aristotelian Society Supplementary Volume , 90(1): 255–283. doi:10.1093/arisup/akw013
  • Marcus, Ruth Barcan, 1961, “Modalities and Intensional Languages”, Synthese , 13(4): 303–322. doi:10.1007/BF00486629
  • McDowell, John, 1977, “On the Sense and Reference of a Proper Name”, Mind , 86(342): 159–185. doi:10.1093/mind/LXXXVI.342.159
  • McGilvray, James, 1998, “Meanings Are Syntactically Individuated and Found in the Head”, Mind & Language , 13(2): 225–280. doi:10.1111/1468-0017.00076
  • McGlone, Michael, 2012, “Propositional Structure and Truth Conditions”, Philosophical Studies , 157(2): 211–225. doi:10.1007/s11098-010-9633-x
  • McKeown-Green, Arthur Jonathan, 2002, The Primacy of Public Language , PhD Dissertation, Princeton University.
  • Merricks, Trenton, 2015, Propositions , Oxford: Oxford University Press. doi:10.1093/acprof:oso/9780198732563.001.0001
  • Merrill, G. H., 1980, “The Model-Theoretic Argument against Realism”, Philosophy of Science , 47(1): 69–81. doi:10.1086/288910
  • Moltmann, Friederike, 2013, “Propositions, Attitudinal Objects, and the Distinction between Actions and Products”, Canadian Journal of Philosophy , 43(5–6): 679–701. doi:10.1080/00455091.2014.892770
  • Montague, Richard, 1974, Formal Philosophy: The Selected Papers of Richard Montague , R. Thomason (ed.), New Haven, CT: Yale University Press.
  • Moss, Sarah, 2012, “The Role of Linguistics in the Philosophy of Language”, in Routledge Companion to the Philosophy of Language , Delia Graff Fara and Gillian Russell (eds.), Routledge.
  • –––, 2013, “Epistemology Formalized”, Philosophical Review , 122(1): 1–43. doi:10.1215/00318108-1728705
  • Neale, Stephen, 1992, “Paul Grice and the Philosophy of Language”, Linguistics and Philosophy , 15(5): 509–559. doi:10.1007/BF00630629
  • Nolan, Daniel P., 2013, “Impossible Worlds”, Philosophy Compass , 8(4): 360–372. doi:10.1111/phc3.12027
  • Perry, John, 1977, “Frege on Demonstratives”, The Philosophical Review , 86(4): 474–497. doi:10.2307/2184564
  • –––, 1979, “The Problem of the Essential Indexical”, Noûs , 13(1): 3–21. doi:10.2307/2214792
  • Pietroski, Paul M., 2003, “The Character of Natural Language Semantics”, in Epistemology of Language , Alex Barber (ed.), Oxford: Oxford University Press, 217–256.
  • –––, 2005, “Meaning Before Truth”, in Preyer and Peter 2005: 255–302.
  • –––, 2018, Conjoining Meanings: Semantics Without Truth Values , Oxford: Oxford University Press. doi:10.1093/oso/9780198812722.001.0001
  • Plantinga, Alvin, 1974, The Nature of Necessity , Oxford: Clarendon Press.
  • –––, 1978, “The Boethian Compromise”, American Philosophical Quarterly , 15(2): 129–138.
  • Preyer, Gerhard and Georg Peter (eds.), 2005, Contextualism in Philosophy: Knowledge, Meaning, and Truth , Oxford: Clarendon Press.
  • Prior, A. N., 1960, “The Runabout Inference-Ticket”, Analysis , 21(2): 38–39. doi:10.1093/analys/21.2.38
  • Putnam, Hilary, 1980, “Models and Reality”, Journal of Symbolic Logic , 45(3): 464–482. doi:10.2307/2273415
  • –––, 1981, Reason, Truth and History , Cambridge: Cambridge University Press. doi:10.1017/CBO9780511625398
  • Quine, W.V.O., 1960, Word and Object , Cambridge, MA: MIT Press.
  • –––, 1970 [1986], Philosophy of Logic , New Jersey: Prentice Hall; second edition, 1986. (Page reference is to the second edition.)
  • Ray, Greg, 2014, “Meaning and Truth”, Mind , 123(489): 79–100. doi:10.1093/mind/fzu026
  • Recanati, François, 2004, Literal Meaning , Cambridge: Cambridge University Press. doi:10.1017/CBO9780511615382
  • –––, 2010, Truth-Conditional Pragmatics , Oxford: Oxford University Press. doi:10.1093/acprof:oso/9780199226993.001.0001
  • Richard, Mark, 1981, “Temporalism and Eternalism”, Philosophical Studies , 39(1): 1–13. doi:10.1007/BF00354808
  • –––, 2013, “What Are Propositions?”, Canadian Journal of Philosophy , 43(5–6): 702–719. doi:10.1080/00455091.2013.870738
  • Rosen, Gideon, 1997, “Who Makes the Rules Around Here?”, Philosophy and Phenomenological Research , 57(1): 163–171. doi:10.2307/2953786
  • Rothschild, Daniel, 2007, “Presuppositions and Scope”, Journal of Philosophy , 104(2): 71–106. doi:10.5840/jphil2007104233
  • Rothschild, Daniel and Seth Yalcin, 2016, “Three Notions of Dynamicness in Language”, Linguistics and Philosophy , 39(4): 333–355. doi:10.1007/s10988-016-9188-1
  • Russell, Bertrand, 1903, The Principles of Mathematics , Cambridge: Cambridge University Press.
  • Salmon, Nathan U., 1986, Frege’s Puzzle , Atascadero, CA: Ridgeview Publishing Company.
  • –––, 1990, “A Millian Heir Rejects the Wages of Sinn ”, in Propositional Attitudes: The Role of Content in Logic, Language, and Mind , C. Anthony Anderson and Joseph Owens (eds.), (CSLI Lecture Notes 20), Stanford, CA: CSLI Publications.
  • Schiffer, Stephen, 1972, Meaning , Oxford: Oxford University Press.
  • –––, 1987, Remnants of Meaning , Cambridge, MA: MIT Press.
  • –––, 2000, “Review: Horwich on Meaning”, Philosophical Quarterly , 50(201): 527–536.
  • –––, 2006, “Two Perspectives on Knowledge of Language”, Philosophical Issues , 16: 275–287. doi:10.1111/j.1533-6077.2006.00114.x
  • Schroeder, Mark, 2008, Being For: Evaluating the Semantic Program of Expressivism , Oxford: Oxford University Press. doi:10.1093/acprof:oso/9780199534654.001.0001
  • Searle, John R., 1962, “Meaning and Speech Acts”, The Philosophical Review , 71(4): 423–432. doi:10.2307/2183455
  • Sellars, Wilfrid, 1968, Science and Metaphysics: Variations on Kantian Themes , New York: Humanities Press.
  • Sider, Theodore, 2011, Writing the Book of the World , Oxford: Oxford University Press. doi:10.1093/acprof:oso/9780199697908.001.0001
  • Soames, Scott, 1988, “Direct Reference, Propositional Attitudes, and Semantic Content”, in Propositions and Attitudes , Nathan Salmon and Scott Soames (eds.), Oxford: Oxford University Press, 197–239.
  • –––, 1992, “Truth, Meaning, and Understanding”, Philosophical Studies , 65(1–2): 17–35. doi:10.1007/BF00571314
  • –––, 1997, “Skepticism about Meaning: Indeterminacy, Normativity, and the Rule-Following Paradox”, Canadian Journal of Philosophy (Supplementary Volume), 23: 211–249. doi:10.1080/00455091.1997.10715967
  • –––, 1998, “The Modal Argument: Wide Scope and Rigidified Descriptions”, Noûs , 32(1): 1–22. doi:10.1111/0029-4624.00084
  • –––, 2002, Beyond Rigidity: The Unfinished Semantic Agenda of Naming and Necessity , Oxford: Oxford University Press. doi:10.1093/0195145283.001.0001
  • –––, 2010, Philosophy of Language , Princeton, NJ: Princeton University Press.
  • –––, 2014, “Cognitive Propositions”, in King, Soames, and Speaks 2014: 91–125.
  • Sosa, David, 2001, “Rigidity in the Scope of Russell’s Theory”, Noûs , 35(1): 1–38. doi:10.1111/0029-4624.00286
  • Speaks, Jeff, 2014, “Propositions Are Properties of Everything or Nothing”, in King, Soames, and Speaks 2014: 71–90.
  • Sperber, Dan and Deirdre Wilson, 1995, Relevance: Communication and Cognition , second edition, Oxford: Blackwell.
  • Stalnaker, Robert C., 1984, Inquiry , Cambridge, MA: MIT Press.
  • Stanley, Jason, 2007, Language in Context: Selected Essays , Oxford: Clarendon Press.
  • Stich, Stephen P. and Ted A. Warfield (eds.), 1994, Mental Representation: A Reader , Cambridge, MA: Blackwell.
  • Stojnić, Una, 2019, “Content in a Dynamic Context”, Noûs , 53(2): 394–432. doi:10.1111/nous.12220
  • Sullivan, Meghan, 2014, “Change We Can Believe In (and Assert)”, Noûs , 48(3): 474–495. doi:10.1111/j.1468-0068.2012.00874.x
  • Tarski, Alfred, 1944, “The Semantic Conception of Truth: And the Foundations of Semantics”, Philosophy and Phenomenological Research , 4(3): 341–376. doi:10.2307/2102968
  • Taschek, William W., 1995, “Belief, Substitution, and Logical Structure”, Noûs , 29(1): 71–95. doi:10.2307/2215727
  • Tillman, Chris (ed.), forthcoming, Routledge Handbook of Propositions , Abingdon, UK: Routledge.
  • van Inwagen, Peter, 2004, “A Theory of Properties”, in Oxford Studies in Metaphysics , volume 1, Dean W. Zimmerman (ed.), Oxford: Clarendon Press, 107–138.
  • Weatherson, Brian and Andy Egan, 2011, “Introduction: Epistemic Modals and Epistemic Modality”, in Egan and Weatherson 2011: 1–18. doi:10.1093/acprof:oso/9780199591596.003.0001
  • Wilson, Deirdre and Dan Sperber, 2012, Meaning and Relevance , Cambridge: Cambridge University Press. doi:10.1017/CBO9781139028370
  • Wittgenstein, Ludwig, 1922, Tractatus Logico-Philosophicus , C. K. Ogden (trans.), London: Routledge & Kegan Paul. Originally published as “Logisch-Philosophische Abhandlung”, in Annalen der Naturphilosophie , XIV (3/4), 1921.
  • –––, 1953, Philosophical Investigations , G.E.M. Anscombe (trans.), New York: MacMillan.
  • Yalcin, Seth, 2007, “Epistemic Modals”, Mind , 116(464): 983–1026. doi:10.1093/mind/fzm983
  • –––, 2014, “Semantics and Metasemantics in the Context of Generative Grammar”, in Burgess and Sherman 2014: 17–54. doi:10.1093/acprof:oso/9780199669592.003.0002
  • –––, 2018, “Belief as Question-Sensitive”, Philosophy and Phenomenological Research , 97(1): 23–47. doi:10.1111/phpr.12330
  • philpapers: Philosophy of Language
  • The semantics archive

action | compositionality | conditionals | contextualism, epistemic | convention | descriptions | discourse representation theory | Frege, Gottlob | Grice, Paul | indexicals | meaning: normativity of | meaning: of words | meaning holism | mental representation | mind: computational theory of | names | natural kinds | paradox: Skolem’s | personal identity | possible worlds | pragmatics | propositional attitude reports | propositions: singular | propositions: structured | quantifiers and quantification | relativism | rigid designators | Sellars, Wilfrid | semantics: dynamic | semantics: proof-theoretic | semantics: two-dimensional | situations: in natural language semantics | Tarski, Alfred: truth definitions | tense and aspect | time | Wittgenstein, Ludwig

Copyright © 2019 by Jeff Speaks <jspeaks@nd.edu>


Lexical Semantics

  • Dirk Geeraerts, University of Leuven
  • https://doi.org/10.1093/acrefore/9780199384655.013.29
  • Published online: 25 January 2017

Lexical semantics is the study of word meaning. Descriptively speaking, the main topics studied within lexical semantics involve either the internal semantic structure of words, or the semantic relations that occur within the vocabulary. Within the first set, major phenomena include polysemy (in contrast with vagueness), metonymy, metaphor, and prototypicality. Within the second set, dominant topics include lexical fields, lexical relations, conceptual metaphor and metonymy, and frames. Theoretically speaking, the main theoretical approaches that have succeeded each other in the history of lexical semantics are prestructuralist historical semantics, structuralist semantics, and cognitive semantics. These theoretical frameworks differ as to whether they take a system-oriented rather than a usage-oriented approach to word-meaning research but, at the same time, in the historical development of the discipline, they have each contributed significantly to the descriptive and conceptual apparatus of lexical semantics.

  • structuralism
  • cognitive semantics
  • lexical field theory
  • componential analysis
  • semasiology
  • onomasiology

Lexical semantics is the study of word meaning. The following first presents an overview of the main phenomena studied in lexical semantics and then charts the different theoretical traditions that have contributed to the development of the field. The focus lies on the lexicological study of word meaning as a phenomenon in its own right, rather than on the interaction with neighboring disciplines. This implies that morphological semantics, that is, the study of the meaning of morphemes and the way in which they combine into words, is not covered, as it is usually considered a separate field from lexical semantics proper. Similarly, the interface between lexical semantics and syntax will not be discussed extensively, as it is considered to be of primary interest for syntactic theorizing. There is no room to discuss the relationship between lexical semantics and lexicography as an applied discipline. For an entry-level text on lexical semantics, see Murphy (2010); for a more extensive and detailed overview of the main historical and contemporary trends of research in lexical semantics, see Geeraerts (2010).

1 The Descriptive Scope of Lexical Semantics

The main phenomena studied by lexical semantics are organized along two dimensions. First, it makes a difference whether we look at semantic phenomena within individual words or whether we look at meaningful structures within the vocabulary as a whole. Terminologically, this difference of perspective can be expressed by referring to a ‘semasiological’ and an ‘onomasiological’ perspective. (Semasiology looks at the relationship between words and meaning with the word as starting point: it is basically interested in the polysemy of words. Onomasiology takes the converse perspective: given a concept to be expressed or a thing to be categorized, what options does a language offer, and how are the choices made?) Second, a distinction needs to be made between an approach that focuses on elements and relations only and one that takes into account the differences of structural weight between those elements and relations. Even though the terms are not perfect, we can use the terms ‘qualitative approach’ and ‘quantitative approach’ to refer to this second distinction. If we cross-classify the two distinctions, we get four groups of topics. ‘Qualitative’ semasiology deals with word senses and the semantic links among those senses, like metaphor and metonymy at the level of individual words. ‘Qualitative’ onomasiology deals with the semantic relations among lexical items, like lexical fields and lexical relations. ‘Quantitative’ semasiology deals with prototype effects: differences of salience and structural weight within an item or a meaning. ‘Quantitative’ onomasiology deals with salience effects in the lexicon at large, like basic-level phenomena.

Table 1. The Descriptive Scope of Lexical Semantics

  • ‘qualitative’ + semasiological: word senses and the semantic links among those senses (e.g., metaphor and metonymy within individual words)
  • ‘qualitative’ + onomasiological: semantic relations among lexical items (e.g., lexical fields and lexical relations)
  • ‘quantitative’ + semasiological: prototype effects (differences of salience and structural weight within an item or a meaning)
  • ‘quantitative’ + onomasiological: salience effects in the lexicon at large (e.g., basic-level phenomena)

The four groups of topics are summarized in Table 1. As will be seen later, this schematic representation is also useful to identify the contribution of the various theoretical approaches that have successively dominated the evolution of lexical semantics.

1.1 Polysemy and Vagueness

Establishing which meanings a word has is arguably the basic step in lexical semantic research. Polysemy is the common term for the situation in which a lexical item has more than one meaning, such as when late can mean ‘after the usual, expected, or agreed time’ ( I am late again ), ‘advanced in day or night’ ( a late dinner ), or ‘no longer alive’ ( my late aunt Polly ). Terminologically speaking, polysemy needs to be contrasted with homonymy and, more importantly, vagueness. When two (or more) words have the same shape, such as bank (‘slope, elevation in sea or river bed’) and bank (‘financial institution’), they are homonyms; whereas polysemy refers to multiplicity of meaning within a single word, the multiplicity is distributed over various words in the case of homonymy. As such, making a distinction between polysemy and homonymy comes down to determining whether we are dealing with one and the same word or with two different ones. The distinction between vagueness and polysemy involves the question of whether a particular piece of semantic information is part of the underlying semantic structure of the item or is the result of a contextual (and hence pragmatic) specification. For instance, neighbor is not polysemous between the readings ‘male dweller next door’ and ‘female dweller next door,’ in the sense that the utterance my neighbor is a civil servant will not be recognized as requiring disambiguation in the way that she is smart might ( Do you mean ‘bright’ or ‘stylish’? ). The semantic information that is associated with the item neighbor in the lexicon does not, in other words, contain a specification regarding sex; neighbor is vague (or general, or unspecified) as to the dimension of gender.

To decide between polysemy and vagueness, a number of tests can be invoked. The three main ones are the following. First, from a truth-theoretical point of view, a lexical item is polysemous if it can simultaneously be clearly true and clearly false of the same referent. Considering the readings ‘harbor’ and ‘fortified sweet wine from Portugal’ of port , the polysemy of that item is established by sentences such as Sandeman is a port (in a bottle) , but not a port (with ships). This criterion basically captures a semantic intuition: are two interpretations of a given expression intuitively sufficiently dissimilar so that one may be said to apply and the other not?

Second, linguistic tests involve syntactic rather than semantic intuitions. Specifically, they are based on acceptability judgments about sentences that contain two related occurrences of the item under consideration (one of which may be implicit). If the grammatical relationship between both occurrences requires their semantic identity, the resulting sentence may be an indication for the polysemy of the item. For instance, the so-called identity test involves ‘identity-of-sense anaphora.’ Thus, at midnight the ship passed the port, and so did the bartender is awkward if the two lexical meanings of port are at stake. Disregarding puns, it can only mean that the ship and the bartender alike passed the harbor, or conversely that both moved a particular kind of wine from one place to another. A mixed reading, in which the first occurrence of port refers to the harbor and the second to wine, is normally excluded. By contrast, the fact that the notions ‘vintage sweet wine from Portugal’ and ‘blended sweet wine from Portugal’ can be combined in Vintage Noval is a port, and so is blended Sandeman indicates that port is vague rather than polysemous with regard to the distinction between blended and vintage wines.

Third, the definitional criterion specifies that an item has more than one lexical meaning if there is no minimally specific definition covering the extension of the item as a whole, and that it has no more lexical meanings than there are maximally general definitions necessary to describe its extension. Definitions of lexical items should be maximally general in the sense that they should cover as large a subset of the extension of an item as possible. Thus, separate definitions for ‘blended sweet fortified wine from Portugal’ and ‘vintage sweet fortified wine from Portugal’ could not be considered definitions of lexical meanings, because they can be brought together under the definition ‘sweet fortified wine from Portugal.’ On the other hand, definitions should be minimally specific in the sense that they should be sufficient to distinguish the item from other nonsynonymous items. A maximally general definition covering both port ‘harbor’ and port ‘kind of wine’ under the definition ‘thing, entity’ is excluded because it does not capture the specificity of port as distinct from other words.

The distinction between polysemy and vagueness is not unproblematic, methodologically speaking. An examination of different basic criteria for distinguishing between polysemy and vagueness reveals, first, that those criteria may be in mutual conflict (in the sense that they need not lead to the same conclusion in the same circumstances) and, second, that each of them taken separately need not lead to a stable distinction between polysemy and vagueness (in the sense that what is a distinct meaning according to one of the tests in one context may be reduced to a case of vagueness according to the same test in another context). Without going into detail (for a full treatment, see Geeraerts, 1993 ), let us illustrate the first type of problem. In the case of autohyponymous words, for instance, the definitional approach does not reveal an ambiguity, whereas the truth-theoretical criterion does. Dog is autohyponymous between the readings ‘Canis familiaris,’ contrasting with cat or wolf , and ‘male Canis familiaris,’ contrasting with bitch . A definition of dog as ‘male Canis familiaris,’ however, does not conform to the definitional criterion of maximal coverage, because it defines a proper subset of the ‘Canis familiaris’ reading. On the other hand, the sentence Lady is a dog, but not a dog , which exemplifies the logical criterion, cannot be ruled out as ungrammatical.

1.2 Semantic Relations

Once senses are identified (and assuming they can be identified with a reasonable degree of confidence), the type of relationship that exists between them needs to be established. The most common classification of semantic relations emerges from the tradition of historical semantics, that is, the vocabulary used to describe synchronic relations between word meanings is essentially the same as the vocabulary used to describe diachronic changes of meaning. In the simplest case, if sense a is synchronically related to sense b by metonymy, then a process of metonymy has acted diachronically to extend sense a to sense b : diachronic mechanisms of semasiological change reappear synchronically as semantic relations among word meanings.

The four basic types are specialization, generalization, metaphor, and metonymy (described here, from a diachronic perspective, as mechanisms rather than synchronic relations). In the case of semantic specialization , the new meaning is a restriction of the old meaning: the new meaning is a subcase of the old. In the case of semantic generalization , the reverse holds: the old meaning is a subcase of the new. Classical examples of specialization are corn (originally a cover-term for all kinds of grain, now specialized to ‘wheat’ in England, to ‘oats’ in Scotland, and to ‘maize’ in the United States), starve (moving from ‘to die’ to ‘to die of hunger’), and queen (originally ‘wife, woman,’ now restricted to ‘king’s wife, or female sovereign’). Examples of generalization are moon (primarily the earth’s satellite, but extended to any planet’s satellite), and French arriver (which originally meant ‘to reach the river’s shore, to embank,’ but which now signifies ‘to reach a destination’ in general). There is a lot of terminological variation in connection with specialization and generalization. ‘Restriction’ and ‘narrowing’ of meaning equal ‘specialization,’ while ‘extension,’ ‘schematization,’ and ‘broadening’ of meaning equal ‘generalization.’ Also, the meanings involved can be said to entertain relations of taxonomical subordination or superordination: in a taxonomy (a tree-like hierarchical classification) of concepts, the specialized meaning is subordinate with regard to the original one, whereas the generalized meaning is superordinate with regard to the original.

As with specialization and generalization, it is convenient and customary to introduce metaphor and metonymy together, even though the relationship between them is not as close as within the former pair. (More on metaphor and metonymy follows in section 1.6, “Conceptual Metaphor and Metonymy.”) Metaphor is then said to be based on a relationship of similarity between the old and the new reading, and metonymy on a relationship of contiguity. Current computer terminology yields examples of both types. The desktop of your computer screen, for instance, is not the same as the desktop of your office desk—except that in both cases, it is the space (a literal space in one case, a virtual one in the other) where you position a number of items that you regularly use or that urgently need attention. The computer desktop, in other words, is not literally a desktop in the original sense, but it has a functional similarity with the original: the computer reading is a metaphorical extension of the original office furniture reading. Functional similarities also underlie metaphorical expressions like bookmark, clipboard, file, folder, cut, and paste. Mouse, on the other hand, is also metaphorically motivated, but here, the metaphorical similarity involves shape rather than function. But now consider a statement to the effect that your desktop will keep you busy for the next two weeks, or a situation in which you ask aloud where your mouse has gone when you are trying to locate the pointer on the screen. In such cases, desktop and mouse are used metonymically. In the former case, it’s not the virtual space as such that is relevant, but the items that are stored there. In the latter case, it’s not the mouse as such (the thing that you hold in your hand) that you refer to, but the pointer on the screen that is operated by the mouse. The desktop and the stored items, or the mouse and the pointer, have a relationship of real-world connectedness that is usually captured by the notion of ‘contiguity.’ When, for instance, one drinks a whole bottle, it is not the bottle but merely its contents that are consumed: bottle can be used to refer to a certain type of container, and to the (spatially contiguous) contents of that container. When lexical semanticians state that metonymical changes are based on contiguity, contiguity should not be understood in a narrow sense as referring to spatial proximity only, but more broadly as a general term for various associations in the spatial, temporal, or causal domain.

1.3 Lexical Fields and Componential Analysis

A lexical field is a set of semantically related lexical items whose meanings are mutually interdependent. The single most influential study in the history of lexical field theory is Trier’s (1931) monograph, in which he presents a theoretical formulation of the field approach and investigates how the terminology for mental properties evolves from Old High German up to the beginning of the 13th century. Theoretically, Trier emphasizes that only a mutual demarcation of the words under consideration can provide a decisive answer regarding their exact value. Words should not be considered in isolation, but in their relationship to semantically related words: demarcation is always a demarcation relative to other words.

While different conceptions of the notion ‘lexical field’ were suggested after Trier’s initial formulation, the most important development is the emergence of componential analysis as a technique for formalizing the semantic relationships between the items in a field: once a lexical field has been demarcated, the internal relations within the field will have to be described in more detail. It is not sufficient to say that the items in the field are in mutual opposition—these oppositions will have to be identified and defined. Componential analysis is a method for describing such oppositions that takes its inspiration from structuralist phonology: just like phonemes are described structurally by their position on a set of contrastive dimensions, words may be characterized on the basis of the dimensions that structure a lexical field. Componential analysis provides a descriptive model for semantic content, based on the assumption that meanings can be described on the basis of a restricted set of conceptual building blocks—the semantic ‘components’ or ‘features.’

A brief illustration of the principles of componential analysis is given by Pottier (1964), who provides an example of a componential semantic analysis in his description of a field consisting of, among others, the terms siège, pouf, tabouret, chaise, fauteuil, and canapé (a subfield of the field of furniture terms in French). The word which acts as a superordinate to the field under consideration is siège, ‘seating equipment with legs.’ If we use the dimensions s1 ‘for seating,’ s2 ‘for one person,’ s3 ‘with legs,’ s4 ‘with back,’ s5 ‘with armrests,’ s6 ‘of rigid material,’ then chaise ‘chair’ can be componentially defined as [+ s1, + s2, + s3, + s4, − s5, + s6], and canapé ‘sofa’ as [+ s1, − s2, + s3, + s4, + s5, + s6], and so on.
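
As a rough illustration, the componential format lends itself to a straightforward computational encoding. The following Python sketch simply transcribes the feature values given above for chaise and canapé (the entry for fauteuil and the helper function are added only for illustration and are not part of Pottier’s analysis); oppositions within the field then fall out as feature differences.

```python
# Componential definitions after Pottier (1964): each term in the field is a
# bundle of +/- values on the six dimensions s1-s6 described above.
FEATURES = {
    "s1": "for seating",
    "s2": "for one person",
    "s3": "with legs",
    "s4": "with back",
    "s5": "with armrests",
    "s6": "of rigid material",
}

LEXICAL_FIELD = {
    "chaise":   {"s1": +1, "s2": +1, "s3": +1, "s4": +1, "s5": -1, "s6": +1},
    "canapé":   {"s1": +1, "s2": -1, "s3": +1, "s4": +1, "s5": +1, "s6": +1},
    # illustrative values, not given in the text above
    "fauteuil": {"s1": +1, "s2": +1, "s3": +1, "s4": +1, "s5": +1, "s6": +1},
}

def oppositions(word_a, word_b):
    """Return the dimensions on which two items of the field differ."""
    a, b = LEXICAL_FIELD[word_a], LEXICAL_FIELD[word_b]
    return [FEATURES[dim] for dim in FEATURES if a[dim] != b[dim]]

print(oppositions("chaise", "canapé"))
# ['for one person', 'with armrests']: a chaise seats one person and lacks armrests
```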

While componential forms of description are common in formal types of semantic description (see the historical overview in section 2, “The Theoretical Evolution of Lexical Semantics,” specifically section 2.3, “Neostructuralist Semantics”), the most important theoretical development after the introduction of componential analysis is probably Wierzbicka’s (1996) attempt to identify a restricted set of some 60 universally valid, innate components. The Natural Semantic Metalanguage aims at formulating cross-linguistically transparent definitions by means of those allegedly universal building blocks.

1.4 Lexical Relations

Like componential analysis, relational semantics, as introduced by Lyons (1963), develops the idea of describing the structural relations among related words. It, however, restricts the theoretical vocabulary to be used in such a description. In a componential analysis, the features are essentially of a ‘real world’ kind: as in Pottier’s example, they name properties of the things referred to, rather than properties of the meanings as such. But if linguistics is interested in the structure of the language rather than the structure of the world, it may want to use a descriptive apparatus that is more purely linguistic. Relational semantics looks for such an apparatus in the form of sense relations like synonymy (identity of meaning) and antonymy (oppositeness of meaning): the fact that aunt and uncle refer to the same genealogical generation is a fact about the world, but the fact that black and white are opposites is a fact about words and language. Instead of deriving statements about the synonymy or antonymy of a word (and in general, statements about the meaning relations it entertains) from a separate and independent description of the word’s meaning, the meaning of the word could be defined as the total set of meaning relations in which it participates. A traditional approach to synonymy would for instance describe the meaning of both quickly and speedily as ‘in a fast way, not taking up much time,’ and then conclude that the two terms are synonymous on the basis of their definitional identity. Lyons by contrast deliberately eschews such content descriptions, and equates the meaning of a word like quickly with the synonymy relation it has with speedily, plus any other relations of that kind.

In the actual practice of relational semantics, ‘relations of that kind’ specifically include—next to synonymy and antonymy—relations of hyponymy (or subordination) and hyperonymy (or superordination), which are both based on taxonomical inclusion. The major research line in relational semantics involves the refinement and extension of this initial set of relations. The most prominent contribution to this endeavor after Lyons is found in Cruse (1986). Murphy (2003) is a thoroughly documented critical overview of the relational research tradition.
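
In a relational approach of this kind, a word’s meaning is in effect the set of sense relations it participates in. The following Python sketch is a toy illustration of that idea (the miniature lexicon and the particular relation instances are invented for the example, not taken from Lyons).

```python
# A toy relational lexicon: a word's meaning is not given by a content
# description but by the sense relations the word enters into.
# The relation instances below are invented for the illustration.
RELATIONS = [
    ("quickly", "synonym", "speedily"),
    ("quickly", "antonym", "slowly"),
    ("scarlet", "hyponym", "red"),    # 'scarlet' is a kind of 'red'
    ("red", "hyponym", "colour"),
]

INVERSE = {"synonym": "synonym", "antonym": "antonym", "hyponym": "hyperonym"}

def meaning_of(word):
    """The 'meaning' of a word, relationally construed: the relations it has."""
    relations = set()
    for a, relation, b in RELATIONS:
        if a == word:
            relations.add((relation, b))
        elif b == word:
            relations.add((INVERSE[relation], a))
    return relations

print(meaning_of("quickly"))  # {('synonym', 'speedily'), ('antonym', 'slowly')}
print(meaning_of("red"))      # {('hyperonym', 'scarlet'), ('hyponym', 'colour')}
```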

1.5 Distributional Relations

Given a Saussurean distinction between paradigmatic and syntagmatic relations, lexical fields as originally conceived are based on paradigmatic relations of similarity. One extension of the field approach, then, consists of taking a syntagmatic point of view. Words may in fact have specific combinatorial features which it would be natural to include in a field analysis. A verb like to comb, for instance, selects direct objects that refer to hair, or hair-like things, or objects covered with hair. Describing that selectional preference should be part of the semantic description of to comb. For a considerable period, these syntagmatic affinities received less attention than the paradigmatic relations, but in the 1950s and 1960s, the idea surfaced under different names. Firth (1957) for instance introduced the (now widely used) term collocation.

The distributional approach can be more radical than the mere incorporation of lexical combinatorics into the description of words: if the environments in which a word occurs could be used to establish its meaning, lexical semantics could receive a firm methodological basis. The general approach of a distributionalist method is summarized by Firth’s dictum: ‘You shall know a word by the company it keeps,’ that is, words that occur in the same contexts tend to have similar meanings. In the final decades of the 20th century, major advances in the distributional approach to semantics were achieved by applying a distributional way of meaning analysis to large text corpora. Sinclair, a pioneer of the approach, developed his ideas (see Sinclair, 1991) through his work on the Collins Cobuild English Language Dictionary, for which a 20-million-word corpus of contemporary English was compiled. In Sinclair’s original conception, a collocational analysis is basically a heuristic device to support the lexicographer’s manual work. A further step in the development of the distributional approach was taken through the application of statistics as a method for establishing the relevance of a collocation and, more broadly, for analyzing the distributional co-occurrence patterns of words (see Glynn & Robinson, 2014, for a state-of-the-art overview of quantitative corpus semantics).
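
As a minimal sketch of the statistical step, the following Python fragment counts co-occurrences within a small window and scores a word pair with pointwise mutual information, one common association measure; the toy corpus, the window size, and the rough probability estimates are purely illustrative.

```python
import math
from collections import Counter

# Toy corpus; real collocational analysis uses corpora of millions of words.
corpus = ("you shall know a word by the company it keeps "
          "a word keeps good company when its company is well chosen").split()

WINDOW = 3  # look at most three words ahead when collecting co-occurrences

word_freq = Counter(corpus)
pair_freq = Counter()
for i, word in enumerate(corpus):
    for j in range(i + 1, min(len(corpus), i + 1 + WINDOW)):
        pair_freq[tuple(sorted((word, corpus[j])))] += 1

total = len(corpus)

def pmi(w1, w2):
    """Pointwise mutual information, with rough count-based probability estimates."""
    joint = pair_freq[tuple(sorted((w1, w2)))] / total
    if joint == 0:
        return float("-inf")
    return math.log2(joint / ((word_freq[w1] / total) * (word_freq[w2] / total)))

# A high score means the two words co-occur more often than chance would predict.
print(round(pmi("word", "company"), 2))
```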

1.6 Conceptual Metaphor and Metonymy

Metaphorical relations of the kind mentioned in section 1.2 (“Semantic Relations”) exist not only between the readings of a given word: several words may exhibit similar metaphorical patterns. Conceptual metaphor theory, the approach introduced by Lakoff and Johnson (1980), includes two basic ideas: first, the view that metaphor is a cognitive phenomenon, rather than a purely lexical one; second, the view that metaphor should be analyzed as a mapping between two domains. To illustrate the first point, metaphor comes in patterns that transcend the individual lexical item. A typical example (Lakoff & Johnson, 1980, pp. 44–45) is the following.

love is a journey: Look how far we’ve come. We are at a crossroads. We’ll just have to go our separate ways. We cannot turn back now. We are stuck. This relationship is a dead-end street. I don’t think this relationship is going anywhere. It’s been a long, bumpy road. We have gotten off the track.

The second pillar of conceptual metaphor theory is the analysis of the mappings inherent in metaphorical patterns. Metaphors conceptualize a target domain in terms of the source domain, and such a mapping takes the form of an alignment between aspects of the source and target. For love is a journey, for instance, the following correspondences hold (compare Lakoff & Johnson, 1999, p. 64): the lovers correspond to travelers, the love relationship to the vehicle they travel in, the lovers’ common goals to their common destinations on the journey, and difficulties in the relationship to impediments to travel.

Metonymies too can be systematic in the sense that they form patterns that apply to more than just an individual lexical item. Thus, the bottle example mentioned in section 1.2 (“Semantic Relations”) exhibits the name of a container (source) being used for its contents (target), a pattern that can be abbreviated as container for contents. Making use of this abbreviated notation, other common types of metonymy are the following: a spatial location for what is located there (the whole theater was in tears); a period of time for what happens in that period, for the people who live then, or for what is produced during that period (the 19th century had a nationalist approach to politics); a material for the product made from it (a cork); the origin for what originates from it (astrakhan, champagne, emmental); an activity or event for its consequences (when the blow you have received hurts, it is not the activity of your adversary that is painful, but the physical effects that it has on your body); an attribute for the entity that possesses the attribute (majesty refers not only to ‘royal dignity or status,’ but also to the sovereign himself); and of course part for whole (a hired hand). The relations can often work in the other direction as well. To fill up the car, for instance, illustrates the type whole for part: it’s obviously only a part of the car that gets filled. For the current state of affairs in metonymy research from a cognitive semantic point of view, see Benczes, Barcelona, and Ruiz de Mendoza Ibáñez (2011).

1.7 Frame Semantics

Yet another approach to semantic structure in the lexicon focuses on the way our knowledge of the world is organized in larger ‘chunks of knowledge’ and how these interact with language. The most articulate model in this respect is Fillmore’s frame theory (Fillmore & Atkins, 1992; and see Ruppenhofer, Ellsworth, Petruck, Johnson, & Scheffczyk, 2006, for the large-scale application of frame theory in the FrameNet project). Frame theory is specifically interested in the way in which language may be used to perspectivize an underlying conceptualization of the world: it’s not just that we see the world in terms of conceptual models, but those models may be verbalized in different ways. Each different way of bringing a conceptual model to expression adds, so to speak, another layer of meaning: the models themselves are meaningful ways of thinking about the world, but the way we express the models while talking adds perspective. This overall starting point of Fillmorean frame theory leads to a description on two levels. On the one hand, a description of the referential situation or event consists of an identification of the relevant elements and entities, and the conceptual role they play in the situation or event. On the other hand, the more purely linguistic part of the analysis indicates how certain expressions and grammatical patterns highlight aspects of that situation or event.

An illustration comes from the standard example of frame theory, the commercial transaction frame. The commercial transaction frame involves words like buy and sell. The commercial transaction frame can be characterized informally by a scenario in which one person gets control or possession of something from a second person, as a result of a mutual agreement through which the first person gives the second person a sum of money. Background knowledge involved in this scenario includes an understanding of ownership relations, a money economy, and commercial contracts. The categories that are needed for describing the lexical meanings of the verbs linked to the commercial transaction scene include Buyer, Seller, Goods, and Money as basic categories. Verbs like buy and sell then each encode a certain perspective on the commercial transaction scene by highlighting specific elements of the scene. In the case of buy, for instance, the buyer appears in the participant role of the agent, typically as the subject of the (active) sentence. In active sentences, the goods then appear as the direct object; the seller and the money appear in prepositional phrases: Paloma bought a book from Teresa for €30. In the case of sell, on the other hand, it is the seller that appears in the participant role of the agent: Teresa sold a book to Paloma for €30.
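
The two descriptive levels of frame theory can be made concrete with a small sketch: the frame lists its conceptual roles, and each verb records the perspective it imposes by mapping roles onto grammatical slots. The Python encoding below is only one possible illustration (the role names follow the informal description above; the mapping format and the naive realization function are invented for the example and do not reflect the actual FrameNet format).

```python
# The commercial transaction frame: conceptual roles plus, for each verb,
# the perspective it imposes, i.e. a mapping of roles onto grammatical slots.
COMMERCIAL_TRANSACTION = {
    "roles": ["Buyer", "Seller", "Goods", "Money"],
    "verbs": {
        "buy":  {"subject": "Buyer",  "object": "Goods",
                 "from-PP": "Seller", "for-PP": "Money"},
        "sell": {"subject": "Seller", "object": "Goods",
                 "to-PP": "Buyer",    "for-PP": "Money"},
    },
}

PREPOSITIONS = {"from-PP": "from", "to-PP": "to", "for-PP": "for"}

def realize(verb, fillers):
    """Naive surface template (no inflection), just to show the perspectival difference."""
    mapping = COMMERCIAL_TRANSACTION["verbs"][verb]
    pps = " ".join(f"{PREPOSITIONS[slot]} {fillers[role]}"
                   for slot, role in mapping.items() if slot in PREPOSITIONS)
    return f"{fillers[mapping['subject']]} {verb} {fillers[mapping['object']]} {pps}"

scene = {"Buyer": "Paloma", "Seller": "Teresa", "Goods": "a book", "Money": "€30"}
print(realize("buy", scene))   # Paloma buy a book from Teresa for €30
print(realize("sell", scene))  # Teresa sell a book to Paloma for €30
```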

1.8 Prototype Effects and Radial Sets

The prototype-based conception of categorization originated in the mid-1970s with Rosch’s psycholinguistic research into the internal structure of categories (see, among others, Rosch, 1975). Rosch concluded that the tendency to define categories in a rigid way clashes with the actual psychological situation. Linguistic categories do not have sharply delimited borderlines. Instead of clear demarcations between equally important conceptual areas, one finds marginal areas between categories that are unambiguously defined only in their focal points. This observation was taken over and elaborated in linguistic lexical semantics (see Hanks, 2013; Taylor, 2003). Specifically, it was applied not just to the internal structure of a single word meaning, but also to the structure of polysemous words, that is, to the relationship between the various meanings of a word. Four characteristics, then, are frequently mentioned in the linguistic literature as typical of prototypicality.

Prototypical categories cannot be defined by means of a single set of criterial (necessary and sufficient) attributes.

Prototypical categories exhibit a family-resemblance structure, i.e., one like the similarities that exist between relatives (some have the same typical hair color, some have the same typically shaped nose, some have the same typical eyes, but none have all and only the typical family traits); the different uses of a word have several features in common with one or more other uses, but no features are common to all uses. More generally, their semantic structure takes the form of a set of clustered and overlapping meanings (which may be related by similarity or by other associative links, such as metonymy). Because this clustered set is often built up round a central meaning, the term ‘radial set’ is often used for this kind of polysemic structure.

Prototypical categories exhibit degrees of category membership; not every member is equally representative for a category.

Prototypical categories are blurred at the edges.

By way of example, consider fruit as referring to a type of food. If you ask people to list kinds of fruit, some types come to mind more easily than others. For American and European subjects (there is clear cultural variation on this point), oranges, apples, and bananas are the most typical fruits, while pineapples, watermelons, and pomegranates receive low typicality ratings. This illustrates the third characteristic mentioned above. But now, consider coconuts and olives. Is a coconut or an olive a fruit in the ordinary everyday sense of that word? For many people, the answer is not immediately obvious, which illustrates the fourth characteristic: if we zoom in on the least typical exemplars of a category, membership in the category may become fuzzy. A category like fruit should be considered not only with regard to the exemplars that belong to it, but also with regard to the features that these category members share and that together define the category. Types of fruit do not, however, share a single set of definitional features that sufficiently distinguishes fruit from, say, vegetables and other natural foodstuffs. All are edible seed-bearing parts of plants, but most other features that we think of as typical for fruit are not general: while most are sweet, some are not, like lemons; while most are juicy, some are not, like bananas; while most grow on trees and tree-like plants, some grow on bushes, like strawberries; and so on. This absence of a neat definition illustrates the first characteristic. Instead of such a single definition, what seems to hold together the category are overlapping clusters of representative features. Whereas the most typical kinds of fruit are the sweet and juicy ones that grow on trees, other kinds may lack one or even more of these features. This then illustrates the second characteristic mentioned above.

The four characteristics are systematically related along two dimensions. On the one hand, the third and the fourth characteristics take into account the referential, extensional structure of a category. In particular, they consider the members of a category; they observe, respectively, that not all referents of a category are equal in representativeness for that category and that the denotational boundaries of a category are not always determinate. On the other hand, these two aspects (centrality and nonrigidity) recur on the intensional level, where the definitional rather than the referential structure of a category is envisaged. For one thing, nonrigidity shows up in the fact that there is no single necessary and sufficient definition for a prototypical concept. For another, family resemblances imply overlapping of the subsets of a category; consequently, meanings exhibiting a greater degree of overlapping will have more structural weight than meanings that cover only peripheral members of the category. As such, the clustering of meanings that is typical of family resemblances implies that not every meaning is structurally equally important (and a similar observation can be made with regard to the components into which those meanings may be analyzed).

The four characteristics are not coextensive; that is, they do not necessarily occur together. In that sense, some words may exhibit more prototypicality effects than others. In the practice of linguistics, the second feature in particular has attracted the most attention, and the radial set model (which graphically represents the way in which less central meanings branch out from the prototypical, core reading) is a popular representational format in lexical semantics; see Tyler and Evans (2001) for an example.
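
The family-resemblance idea can also be given a rough operational form: if category members are described by overlapping feature sets, a member’s typicality can be approximated by how strongly its features overlap with those of the other members. The Python sketch below uses invented feature sets for the fruit example discussed above; it is meant as an illustration of the logic, not as a model of Rosch’s experimental ratings.

```python
# Invented feature sets for a few kinds of fruit (not empirical data).
FRUIT = {
    "apple":      {"seed-bearing", "sweet", "juicy", "grows on trees"},
    "orange":     {"seed-bearing", "sweet", "juicy", "grows on trees"},
    "lemon":      {"seed-bearing", "juicy", "grows on trees"},            # not sweet
    "banana":     {"seed-bearing", "sweet", "grows on trees"},            # not juicy
    "strawberry": {"seed-bearing", "sweet", "juicy", "grows on bushes"},
}

def typicality(member):
    """Average feature overlap (Jaccard) of a member with the other members."""
    own = FRUIT[member]
    others = [features for name, features in FRUIT.items() if name != member]
    return sum(len(own & other) / len(own | other) for other in others) / len(others)

# The sweet, juicy, tree-grown kinds come out as the most typical members.
for name in sorted(FRUIT, key=typicality, reverse=True):
    print(f"{name:<11} {typicality(name):.2f}")
```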

1.9 Basic Levels and Onomasiological Salience

Possibly the major innovation of the prototype model of categorization is to give salience a place in the description of semasiological structure: next to the qualitative relations among the elements in a semasiological structure (like metaphor and metonymy), a quantifiable center-periphery relationship is introduced as part of the architecture. But the concept of salience can also be applied to the onomasiological domain.

The initial step in the introduction of onomasiological salience is the basic-level hypothesis . The hypothesis is based on the ethnolinguistic observation that folk classifications of biological domains usually conform to a general organizational principle, in the sense that they consist of five or six taxonomical levels (Berlin, 1978 ). The basic-level hypothesis embodies a notion of onomasiological salience, because it is a hypothesis about alternative categorizations of referents: if a particular referent (a particular piece of clothing) can be alternatively categorized as a garment, a skirt, or a wrap-around skirt, the choice will be preferentially made for the basic-level category ‘skirt.’ But differences of onomasiological preference also occur among categories on the same level in a taxonomical hierarchy. If a particular referent can be alternatively categorized as a wrap-around skirt or a miniskirt, there could just as well be a preferential choice: when you encounter something that is both a wrap-around skirt and a miniskirt, the most natural way of naming that referent in a neutral context would probably be ‘miniskirt.’ If, then, we have to reckon with intra-level differences of salience next to inter-level differences, the concept of onomasiological salience has to be generalized in such a way that it relates to individual categories at any level of the hierarchy.

This notion of generalized onomasiological salience was first introduced in Geeraerts, Grondelaers, and Bakema (1994). Using corpus materials, this study established that the choice for one lexical item rather than the other as the name for a given referent is determined by the semasiological salience of the referent (i.e., the degree of prototypicality of the referent with regard to the semasiological structure of the category), by the overall onomasiological salience of the category represented by the expression, and by contextual features of a classical sociolinguistic and geographical nature, involving the competition between different language varieties. By zooming in on the last type of factor, a further refinement of the notion of onomasiological salience is introduced, in the form of the distinction between conceptual and formal onomasiological variation. Whereas conceptual onomasiological variation involves the choice of different conceptual categories for a referent (like the examples presented so far), formal onomasiological variation merely involves the use of different synonymous names for the same conceptual category. The names jeans and trousers for denim leisure-wear trousers constitute an instance of conceptual variation, for they represent categories at different taxonomical levels. Jeans and denims, however, represent no more than different (but synonymous) names for the same denotational category.
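
In the same corpus-based spirit, onomasiological salience can be operationalized as a relative naming frequency: out of all the occasions on which a referent of a given kind is named, how often is a particular name (or a particular taxonomical level) chosen? The counts in the sketch below are invented for illustration and do not come from the 1994 study.

```python
from collections import Counter

# Invented naming events: each entry is the name actually chosen for some
# referent that could also have been named at another level or by a synonym.
naming_events = (["skirt"] * 40 + ["miniskirt"] * 25 + ["wrap-around skirt"] * 5 +
                 ["garment"] * 3 + ["jeans"] * 20 + ["denims"] * 7)

counts = Counter(naming_events)
total = sum(counts.values())

def onomasiological_salience(name):
    """Share of all naming events in which this particular name is chosen."""
    return counts[name] / total

for name, _ in counts.most_common():
    print(f"{name:<18} {onomasiological_salience(name):.2f}")
```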

2. The Theoretical Evolution of Lexical Semantics

Four broadly defined theoretical traditions may be distinguished in the history of word-meaning research.

2.1 Prestructuralist Historical Semantics

The prestructuralist period (ranging from the middle of the 19th century up to the 1930s) was the heyday of historical semantics, in the sense that the study of meaning change reigned supreme within semantics. The main theoretical achievement of prestructuralist historical semantics consists of various classifications of types of semantic change, coupled with considerable attention to psychological processes as the explanatory background of changes: the general mechanisms of change included in the classifications were generally considered to be based on the associative patterns of thought of the human mind. Important figures (among many others) are Hermann Paul, Michel Bréal, and Gustaf Stern (see Ullmann, 1962, for an introductory overview). With the shift toward a structuralist approach that occurred around 1930, lexical semantics switched from a preference for diachronic studies to a preference for synchronic studies. However, the poststructuralist cognitive approach provides a new impetus for historical lexical semantics.

2.2 Structuralist Semantics

Inspired by the Saussurean conception of language, structural semantics originated as a reaction against prestructural historical semantics. The origins of structural semantics are customarily attributed to Trier ( 1931 ), but while Trier’s monograph may indeed be the first major descriptive work in structural semantics, the first theoretical and methodological definition of the new approach is to be found in Weisgerber ( 1927 ), a polemical article that criticized historical linguistics on three points. First and foremost, because the vocabulary of a language is not simply an unstructured set of separate items, and because the meaning of a linguistic sign is determined by its position in the linguistic structures in which it takes part, the proper subject matter of semantics is not the atomistic changes of word meanings that historical semantics had concentrated on, but the semantic structure of the language that demarcates the meanings of individual words with regard to each other. Second, because that structure is a linguistic rather than a psychological phenomenon, linguistic meanings should not be studied from a psychological perspective, but from a purely linguistic one. And third, because semantic change has to be redefined as change in semantic structures, synchronic semantics methodologically precedes diachronic semantics: the synchronic structures have to be studied before their changes can be considered. The realization of this attempt to develop a synchronic, nonpsychological, structural theory of semantics depends on the way in which the notion of semantic structure is conceived. In actual practice, there are mainly three distinct definitions of semantic structure that have been employed by structuralist semanticians. More particularly, three distinct kinds of structural relations among lexical items have been singled out as the proper methodological basis of lexical semantics. First, there is the relationship of semantic similarity that lies at the basis of semantic field analysis and componential analysis: see section 1.3, “Lexical Fields and Componential Analysis.” Second, there are unanalyzed lexical relations such as synonymy, antonymy, and hyponymy: see section 1.4, “Lexical Relations.” Third, syntagmatic lexical relations lie at the basis of a distributional approach to semantics: see section 1.5, “Distributional Relations.”

2.3 Neostructuralist Semantics

While componential analysis was developed in the second half of the 1950s and the beginning of the 1960s by European as well as American structural linguists, its major impact came from its incorporation into generative grammar: the publication of Katz and Fodor (1963) marked a theoretical migration of lexical semantics from a structuralist to a generativist framework. As a model for lexical semantics, Katzian semantics combined an essentially structuralist approach with two novel characteristics: the explicit inclusion of lexical description in a generative grammar and, accordingly (given that the grammar is a formal one), an interest in the formalization of lexical descriptions. Although Katzian semantics as such has long been abandoned, both features continue to play a role in this ‘neostructuralist’ tradition (the label is not an established one, but it will do for lack of a more conventional one). On the one hand, the integration of the lexicon into the grammar informs the continuing debate about the interface of lexicon and syntax; see Wechsler (2015) for an overview. On the other hand, a number of models for the formalization of word meaning have been developed, the most prominent of which is Pustejovsky’s ‘generative lexicon’ approach (1995).

2.4 Cognitive Semantics

Compared to prestructuralist semantics, structuralism constitutes a move toward a more purely ‘linguistic’ type of lexical semantics, focusing on the linguistic system rather than the psychological background or the contextual flexibility of meaning. With the poststructuralist emergence of cognitive semantics, the pendulum swings back to a position in which the distinction between semantics and pragmatics is not a major issue, in which language is seen in the context of cognition at large, and in which language use is as much a focus of enquiry as the language system. Cognitive lexical semantics emerged in the 1980s as part of cognitive linguistics, a loosely structured theoretical movement that opposed the autonomy of grammar and the marginal position of semantics in the generativist theory of language. Important contributions to lexical semantics include prototype theory (see section 1.8, “Prototype Effects and Radial Sets”), conceptual metaphor theory (see section 1.6, “Conceptual Metaphor and Metonymy”), frame semantics (see section 1.7, “Frame Semantics”), and the emergence of usage-based onomasiology (see section 1.9, “Basic Levels and Onomasiological Salience”).

From a theoretical perspective, the various traditions are to some extent at odds with each other (as may be expected). Specifically, structuralist (and to a large extent neostructuralist) theories tend to look at word meaning primarily as a property of the language, that is, of the linguistic system as an entity in its own right. Prestructuralist historical semantics and cognitive semantics, on the other hand, tend to emphasize the way in which word meanings are embedded in or interact with phenomena that lie outside language in a narrow sense, like general cognitive principles, or the cultural, social, and historical experience of the language user. They then also take a more ‘pragmatic’ perspective: if the emphasis moves away from the linguistic system as a more or less stable, more or less autonomous repository of possibilities, there will be more attention to language use as the actualization of those possibilities.

Descriptively speaking, however, each of the major theoretical frameworks has contributed to the expansion of lexical semantics, that is, they have drawn attention to specific phenomena and they have proposed terms, classifications, and representational formats for analyzing those phenomena. Focusing on the major topics, these contributions successively include the links between the various senses of words in prestructuralist historical semantics, the semantic relationships within the vocabulary in the structuralist era, and the importance of semasiological and onomasiological salience effects in cognitive semantics. Regardless of the theoretical oppositions, these phenomena all belong to the descriptive scope of current lexical semantics: the emergence of new points of attention has not made the older topics irrelevant.

Table 2 The Contribution of the Successive Theoretical Traditions

A summary of the contribution of the major theoretical approaches is given in Table 2. If one keeps in mind the chronology of the various theories, it will be clear that regardless of the theoretical differences, lexical semantics has witnessed a marked descriptive expansion, from a semasiological starting point to various forms of onomasiological structure, and from a focus on elements and structures alone to the relevance of salience effects on the semasiological and onomasiological architecture of meaning.

Further Reading

  • Goddard, C. (1998). Semantic analysis: A practical introduction. Oxford: Oxford University Press.
  • Riemer, N. (2015). Word meanings. In J. R. Taylor (Ed.), The Oxford handbook of the word (pp. 315–319). Oxford: Oxford University Press.
  • Benczes, R., Barcelona, A., & Ruiz de Mendoza Ibáñez, F. (Eds.). (2011). Defining metonymy in cognitive linguistics: Towards a consensus view. Amsterdam: John Benjamins.
  • Berlin, B. (1978). Ethnobiological classification. In E. Rosch & B. B. Lloyd (Eds.), Cognition and categorization (pp. 9–26). Hillsdale, NJ: Lawrence Erlbaum.
  • Cruse, D. A. (1986). Lexical semantics. Cambridge, U.K.: Cambridge University Press.
  • Fillmore, C. J., & Atkins, B. T. S. (1992). Toward a frame-based lexicon: The semantics of ‘risk’ and its neighbors. In A. Lehrer & E. F. Kittay (Eds.), Frames, fields and contrasts: New essays in semantic and lexical organization (pp. 75–102). Hillsdale, NJ: Lawrence Erlbaum.
  • Firth, J. R. (1957). Papers in linguistics, 1934–51. Oxford: Oxford University Press.
  • Geeraerts, D. (1993). Vagueness’s puzzles, polysemy’s vagaries. Cognitive Linguistics, 4, 223–272.
  • Geeraerts, D. (2010). Theories of lexical semantics. Oxford: Oxford University Press.
  • Geeraerts, D., Grondelaers, S., & Bakema, P. (1994). The structure of lexical variation: Meaning, naming, and context. Berlin: Mouton de Gruyter.
  • Glynn, D., & Robinson, J. A. (Eds.). (2014). Corpus methods for semantics: Quantitative studies in polysemy and synonymy. Amsterdam: John Benjamins.
  • Hanks, P. W. (2013). Lexical analysis: Norms and exploitations. Cambridge, MA: MIT Press.
  • Katz, J. J., & Fodor, J. A. (1963). The structure of a semantic theory. Language, 39, 170–210.
  • Lakoff, G., & Johnson, M. (1980). Metaphors we live by. Chicago: University of Chicago Press.
  • Lakoff, G., & Johnson, M. (1999). Philosophy in the flesh: The embodied mind and its challenges to western thought. Chicago: University of Chicago Press.
  • Lyons, J. (1963). Structural semantics. Oxford: Blackwell.
  • Murphy, M. L. (2003). Semantic relations and the lexicon: Antonymy, synonymy, and other paradigms. Cambridge, U.K.: Cambridge University Press.
  • Murphy, M. L. (2010). Lexical meaning. Cambridge, U.K.: Cambridge University Press.
  • Pottier, B. (1964). Vers une sémantique moderne. Travaux de linguistique et de littérature, 2, 107–137.
  • Pustejovsky, J. (1995). The generative lexicon. Cambridge, MA: MIT Press.
  • Rosch, E. (1975). Cognitive representations of semantic categories. Journal of Experimental Psychology, 104, 192–233.
  • Ruppenhofer, J., Ellsworth, M., Petruck, M. R. L., Johnson, C. R., & Scheffczyk, J. (2006). FrameNet II: Extended theory and practice. Berkeley, CA: FrameNet.
  • Sinclair, J. M. (1991). Corpus, concordance, collocation. Oxford: Oxford University Press.
  • Taylor, J. R. (2003). Linguistic categorization (3rd ed.). Oxford: Oxford University Press.
  • Trier, J. (1931). Der deutsche Wortschatz im Sinnbezirk des Verstandes: Die Geschichte eines sprachlichen Feldes I. Von den Anfängen bis zum Beginn des 13. Jhdts. Heidelberg: Winter.
  • Tyler, A., & Evans, V. (2001). Reconsidering prepositional polysemy networks: The case of ‘over.’ Language, 77, 724–765.
  • Ullmann, S. (1962). Semantics: An introduction to the science of meaning. Oxford: Blackwell.
  • Wechsler, S. (2015). Word meaning and syntax: Approaches to the interface. Oxford: Oxford University Press.
  • Weisgerber, L. (1927). Die Bedeutungslehre: Ein Irrweg der Sprachwissenschaft? Germanisch-Romanische Monatsschrift, 15, 161–183.
  • Wierzbicka, A. (1996). Semantics: Primes and universals. Oxford: Oxford University Press.

Related Articles

  • Middle English
  • Chinese Semantics
  • Phonological Templates in Development
  • Argument Realization in Syntax
  • Lexical Semantic Framework for Morphology
  • Cognitively Oriented Theories of Meaning
  • Acquisition of Pragmatics
  • Type Theory for Natural Language Semantics
  • Conversational Implicature
  • Nominal Reference
  • The Onomasiological Approach
  • Nominalization: General Overview and Theoretical Issues
  • Artificial Languages

Notre Dame Philosophical Reviews

Metasemantics: New Essays on the Foundations of Meaning

Alexis Burgess and Brett Sherman, Metasemantics: New Essays on the Foundations of Meaning, Oxford University Press, 2014, 367pp., $74.00 (hbk), ISBN 9780199669592.

Reviewed by Derek Ball, University of St Andrews

Metasemantics is a valuable addition to the literature in philosophy of language and linguistics. The subtitle indicates that the topic is the foundations of meaning, but the essays discuss a surprisingly wide range of themes. In fact, one of the remarkable features of the collection is the level of reflective disagreement about the right foundational questions to ask: in addition to the editors' introduction, several of the essays are devoted to distinguishing and defending the significance of different metasemantic questions. I will start by describing those essays that focus in large part on questions, with an eye toward using those questions to structure the rest of the discussion.

I. Metasemantic Questions

What, then, are the metasemantic questions? In their introduction, Alexis Burgess and Brett Sherman distinguish three families of metasemantic issues: basic metasemantics (which aims to describe the grounds or metaphysical bases of semantic facts); the theory of meaning (which aims to give an account of the nature or essence of the semantic); and the metaphysics of semantic values (which addresses such questions as whether semantic values should be thought of as truth-conditions, conceptual roles, or something else).

In his contribution, Seth Yalcin distinguishes between semantic value and content. 'Content', in his terminology, has its home in folk psychological explanation of behaviour and the theory of intentionality, while 'semantic value' is a technical notion of natural language semantics, designed to explain facts about productivity, entailment, and acceptability. Meta-level questions can be asked about content and about semantic value. Yalcin casts semantic value as part of the Chomskian project of characterising the language faculty, contrasting this project with various alternative conceptions of semantics. Yalcin's view is not just that concepts of content and semantic value are distinguishable in principle; he regards it as a completely open question how (and indeed whether) semantic value relates to content. Yalcin ends with a methodological claim: we cannot be confident in our metasemantics as things stand since semantics is at an early stage of development, so the best way to make progress on metasemantic theorising is to do more work in semantics.

Someone sympathetic to the distinction between content and semantic value might see a tension between this methodology and Yalcin's seeming confidence that semantic values should be conceived of as part of a Chomskian project with no clear relevance to issues of primary concern to many or most philosophers (and at least some linguists) interested in semantics. Perhaps until we have more detailed first-order semantic theories, we should remain open to other ways of understanding what claims about semantic value are about. One of the contributions of Alejandro Pérez Carballo's essay is its attempt to articulate what is at issue between various views of what semantics is about. He calls this 'the hermeneutic question'; on a rough first gloss, it is the question of what it is for a theory in a particular discipline to be correct. He then argues that expressivism is compatible with a standard semantics that (for example) assigns sets of worlds as the semantic values of sentences because expressivism is a way of answering the hermeneutic question: according to the expressivist, a theory that assigns a set of worlds p as the semantic value of a sentence s is correct not because utterances of s represent (in some robust sense) the actual world as being among the p worlds but rather because p adequately characterises certain features of the mental state expressed by utterances of s -- an intriguing suggestion, which Carballo leaves relatively undeveloped.

Of those authors who explicitly defend a particular metasemantic question, Mark Greenberg (in the second of his two contributions) comes closest to one of the questions discussed in the editors' introduction. He argues that we should look for a constitutive account of content -- an 'account of its nature or essence' (169) -- and that we should not be satisfied with an account that merely specifies the grounds or metaphysical bases of content since an account of grounds leaves important questions unanswered. (In the editors' terminology, one might say that Greenberg's idea is that basic metasemantics is incomplete without a theory of meaning.) For example, consider Lewis's view that naturalness plays a role in determining content. Greenberg's claim is that this sort of view does not go deep enough. Even if Lewis is right, we should wonder: 'What is it to have content such that a property's naturalness is part of what makes it figure in the content of a representation?' (178)

II. Issues in 'Basic Metasemantics'

Greenberg's push for a constitutive account is motivated by arguments in his first essay, which emphasises the importance of Burge-style examples that Greenberg takes to show that 'there are no beliefs or inferences, or transitions in thought more generally, that a thinker must have or be disposed to make in order to have a particular concept' (147). Greenberg shows (what is perhaps obvious) that this claim poses a problem for conceptual role theories of content and also (what is somewhat more surprising) that it poses a problem for standard covariation theories (that attempt to account for content in terms of nomic or informational connections). Proponents of these theories who take the threat of the Burge phenomena seriously have sometimes claimed that while in normal cases, concept possession is to be explained in terms of (e.g.) conceptual role, there is a second, distinct way of possessing concepts (by 'deference'). But, Greenberg claims, this is no minor addition: if deference is a distinct way of possessing concepts, then conceptual role or covariation theories do not capture what it is to possess a concept (the nature or essence of concept possession) even if they explain one possible ground of concept possession.

Michael Caie's chapter argues against supervaluationist accounts of vagueness and suggests that the considerations that seem to motivate supervaluationism in fact motivate metaphysical indeterminacy. He situates supervaluationism in a Lewisian framework, according to which the right semantic theory is the theory that maximises fit with linguistic usage and intrinsic eligibility. Supervaluationism seems attractive on this framework because it is a way to accommodate the fact that many precise semantic theories fit use and eligibility equally well. But, Caie shows, there are alternatives to supervaluationism that fit the facts of use and eligibility as well as the supervaluationist theory. So the supervaluationist must endorse a double standard: she holds the precisifications on which her theory is based to a different standard than her own view. Caie then argues that the proponent of metaphysical indeterminacy can avoid such double standards by adopting a view on which it is determinate that our words have content but indeterminate what content they have.

Jeffrey C. King defends his 'coordination account' of the metasemantics of context-sensitive expressions, according to which the semantic values of contextual parameters are determined by facts about speaker intentions and facts about what a 'competent, attentive, reasonable hearer' (102) could come to know, against the objection (which he takes from unpublished work by Michael Glanzberg) that contemporary formal semantic theories often assign complex formal entities (such as functions) as the values of contextual parameters and that these formal entities are not plausibly the objects of speaker intentions. King argues that, at least in the case of gradable adjectives, there is a plausible alternative account according to which contextual parameters are simpler entities that can be determined by speaker intentions and that in cases where speakers do not seem to have suitably specific intentions, a range of semantic values are determined.

Amie L. Thomasson argues that deflationism about truth -- the view that the concept of truth is not 'even attempting to refer to a substantive property the nature of which we can investigate and hope to discover' (185) -- leads to a corresponding deflationism about existence -- the rejection of the view that '"existing" names (or even attempts to name) a property or activity of objects the substantive nature of which we can investigate' (192). Truth deflationism is a widely held view, and existence deflationism may strike many as counterintuitive, but Thomasson's argument is simple: in essence, it is just that we can derive existence claims from truth claims (for example, '<n is P> is true' entails that n exists and Ps exist). If there is no substantive characterisation of truth, then we may 'derive different existence claims from the truth of diverse propositions -- without any common condition holding, and so without any single substantive criterion for existence being fulfilled in all cases' (196). (Whether the argument is sound will depend on exactly how one understands truth deflationism. If truth claims entail existence claims and existence has a substantive nature, then there is a substantive necessary condition for truth. But in the absence of further argument, it is not clear that this entails that 'truth' names a substantive property or indeed anything about the nature of truth.) Thomasson goes on to argue that existence deflationism fits naturally with a conceptual role theory of concepts and that this leads to 'easy ontology': 'many of the most disputed existence questions may be answered quite straightforwardly' (205). (This is one of several places where it might have been interesting to put the volume's papers into dialogue with each other. One wonders how Thomasson might respond to Greenberg's discussion of conceptual role theory.)

Richard G. Heck Jr. is concerned to develop a view of semantics that can resist 'radical contextualist' arguments to the effect that truth-conditional semantics is impossible because truth conditions can depend on such a wide range of contextual factors that they could only be assigned by a theory that incorporated a complete theory of human rationality. Heck concedes that if we agree that 'semantic theory must systematically assign a truth condition to each actual utterance' (346), the objection is correct (a point he develops in detail with respect to demonstrative reference). But he rejects this conception of semantics in favour of a Strawsonian view on which speakers use sentences to say things. On Heck's view, there is no role for objective notions of reference or of what is said; there is only what speakers intend to refer to and to say and what their audiences interpret them as saying. Language facilitates this communication because the words one uses 'constrain what one can say by uttering them' (354); the job of semantics is to describe these constraints.

III. What Is Semantics?

A third group of papers can be seen as taking steps towards defining semantics. Glanzberg's conception of semantics is motivated by his view of the scope and limits of semantic explanation. He claims that the best explanations in semantics arise where semantic theory makes use of mathematical apparatus (as in the case of generalised quantifiers). Semantic theories tend to be unexplanatory where they rely on disquotation, but there is no alternative to this reliance, so semantic theory is inevitably explanatorily partial. On Glanzberg's view, we can make sense of this state of affairs by adopting a particular view of the structure of the lexicon: the lexicon consists partly of information that is 'coded by the language faculty' (282) and partly of 'pointers' to the conceptual system outside the language faculty. Semantic theory is explanatory in those areas where it can capture information coded by the language faculty; where the lexicon consists of pointers outside the language faculty, semantics may resort to disquotation.

Matti Eklund's essay is one of the hardest to place with respect to the metasemantic questions that are the focus of the volume. Eklund's focus is on views that regard the concept of truth as inconsistent and conclude on this basis that it ought to be replaced. (He himself agrees that truth is an inconsistent concept but is sceptical that it ought to be replaced.) So one connection between Eklund's essay and metasemantic issues is that he is considering an approach to semantics that is normative and revisionary: the aim is not to describe the semantics of the words or concepts we have but to engineer better words and concepts. A further connection is that his focus is on replacing truth for theoretical purposes, specifically for the purposes of semantic theorising. Eklund argues that this project makes sense only on some conceptions of the role of truth in semantics.

Isidora Stojanovic focuses on the semantics/pragmatics distinction. She observes a tension between the view that semantics is concerned with stable, lexically encoded properties of words and the view that semantics must determine truth conditional content (since truth conditions are what feed in to Gricean pragmatic processes). Stojanovic's solution is to complicate the semantics/pragmatics distinction: semantics is stable and lexically encoded; in order to get truth-conditional content, it must be supplemented by 'prepragmatics', which provides (inter alia) referents for context sensitive expressions.

Karen S. Lewis also focuses on the distinction between semantics and pragmatics. She argues that many phenomena that are claimed to motivate dynamic semantics can be better explained pragmatically. For example, Lewis argues that cross-sentential anaphora can be explained pragmatically on a view that allows pragmatic processes to work off more than a set of worlds (for example, a view on which semantic values are structured propositions) since such a view can distinguish a sentence like 'A woman walked in,' which permits anaphoric continuation ('She was wearing a hat'), from a sentence like 'It is not the case that no women walked in', which is truth-conditionally equivalent but does not permit anaphoric continuation. Roughly, the idea is that since a cooperative speaker will typically use an indefinite description such as 'a woman' only when she plans potentially to say more about the woman, hearers can reason in a broadly Gricean way to the conclusion that they should look out for further reference to the woman.

IV. Metaphysics of Semantic Values

Stojanovic's and Lewis's papers are in part concerned with the metaphysics of semantic values, and one further paper also has metaphysics as its focus. Samuel Cumming gives an account of the metaphysics of 'discourse content', beginning with the notion of discourse referents: entities that are posited to do explanatory work in the semantics of anaphora. The idea is that 'Two noun phrases denote the same dref [discourse referent] if, and only if, they are anaphorically linked' (214). On Cumming's view, discourse referents are not the same as referents (in the usual philosophical sense): drefs are socially constructed abstract objects. He argues that cross-discourse anaphora is possible, so 'the lifespan of a dref is not limited to the discourse in which it was established' (220). He claims that the apparatus of drefs puts one in a position to understand various Fregean puzzles about the meaning of empty and co-referring names as well as Geach-style puzzles about intentional identity.

V. Concluding Remarks

It should be clear from these summaries that this is a wide ranging and extremely interesting collection that reflects many current trends in philosophical theorising about language, notably attention to developments in formal semantics. (It also reflects trends in philosophy more generally, including relatively uncritical acceptance of hyperintensional notions (ground, essence, real definition, etc.) that would have been anathema to a previous generation of philosophers.) It clearly shows the continued relevance of semantic and metasemantic issues to metaphysics, metaethics, and other areas of philosophy, as well as the intrinsic interest of various metasemantic questions. I look forward to seeing the new wave of metasemantic theorising that is likely to follow.

What is Semantics?

Semantics is essentially the science of meaning. It’s like being a detective whose specialty is language. Let’s say you find a word or phrase at the scene of a conversation. Your job is to figure out what it really means. For instance, if someone says, “That’s sick!” are they talking about someone being ill, or are they actually excited about something cool? Semantics helps to sort out this kind of puzzle by examining words and their meanings in the context they are used.

In a simpler definition, semantics studies how we assign meaning to words, phrases, symbols, and signs. It’s like when you read a text message that says, “I’m up for it.” The semantics involves understanding if “up for it” means the person is awake and available or if they are expressing their eagerness to participate in an activity. Linguists work with semantics to ensure that the intended message is conveyed and received accurately.

How to Guide on Understanding Semantics

If you want to get to grips with semantics, here are some tips that can help make sense of the meanings behind words:

  • Context is King: You can’t ignore the setting or situation a word is used in. The words around it can shed light on what’s actually meant.
  • Think About the Speaker or Writer: The person using the words can affect the meaning. Their age, job, where they live – these things can change how words are understood.
  • Culture Counts: People in different parts of the world can understand the same word in a bunch of different ways. A friendly gesture in one place might be a huge insult somewhere else.
  • Connotation and Denotation Matter: The dictionary definition of a word is its denotation – but words carry feelings and ideas beyond that. That’s the connotation, and it’s just as important.

Types of Semantics

There are several key areas within semantics, each with a different focus:

  • Formal Semantics: This branch is like solving equations but with language. It uses rules of logic to figure out solid, exact meanings (a toy sketch of this idea follows this list).
  • Lexical Semantics: Here, we dive into individual words. For instance, ‘run’ can mean moving quickly on foot or operating, as in ‘My computer runs well.’
  • Conceptual Semantics: This branch looks at how we mentally process meanings. It’s concerned with the underlying concepts that language represents.
  • Computational Semantics: It focuses on how computers can understand human language. Picture programming Siri or Alexa to make sense of what we’re saying.
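
To make the formal side concrete, here is a minimal sketch of model-theoretic evaluation in Python. The tiny model, the word extensions, and the function names are all invented for illustration; real formal semantics uses much richer logical machinery.

```python
# A toy model-theoretic semantics: words denote sets of individuals,
# and a quantified sentence is evaluated as true or false relative to the model.
model = {
    "student": {"ana", "bo", "cam"},   # extension of the noun "student"
    "runs":    {"ana", "bo"},          # extension of the verb "runs"
}

def every(noun, verb):
    """'Every NOUN VERBs' is true iff the noun's extension is a subset of the verb's."""
    return model[noun] <= model[verb]

def some(noun, verb):
    """'Some NOUN VERBs' is true iff the two extensions overlap."""
    return bool(model[noun] & model[verb])

print(every("student", "runs"))  # False: "cam" is a student who does not run
print(some("student", "runs"))   # True
```

In this toy model, “Every student runs” comes out false because one student is missing from the extension of “runs,” while “Some student runs” comes out true.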

Examples of Semantics

Let’s explore some examples to see semantics in action:

  • Sarcasm: If someone exclaims, “What a wonderful day!” while caught in a downpour, they don’t actually mean the day is wonderful. Semantics helps us understand that the words convey the opposite of their literal meaning because of the context and tone of voice.
  • Homonyms: Take the word “bank.” It could mean the side of a river or a place where money is kept. Semantics is about deciphering which meaning is intended based on surrounding words and situational context (a toy disambiguation sketch follows this list).
  • Metaphors: Describing a person as having a “heart of stone” doesn’t mean their heart is literally made of rock; semantics interprets it as a metaphor for being emotionally unresponsive.
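
As a toy illustration of how surrounding words can pick out the intended sense of a homonym like “bank,” here is a small Python sketch in the spirit of dictionary-overlap (Lesk-style) disambiguation. The sense glosses and the scoring rule are invented for illustration, not taken from any particular system.

```python
# Toy word-sense disambiguation for "bank": score each sense by how many
# of its gloss words appear in the surrounding context.
senses = {
    "bank/river": {"river", "water", "shore", "fishing", "muddy"},
    "bank/money": {"money", "account", "loan", "deposit", "teller"},
}

def disambiguate(context_words):
    context = set(w.lower() for w in context_words)
    # Pick the sense whose gloss overlaps the context the most.
    return max(senses, key=lambda s: len(senses[s] & context))

print(disambiguate("she sat on the bank and watched the river".split()))        # bank/river
print(disambiguate("he opened an account at the bank for his deposit".split())) # bank/money
```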

Why is Semantics Important?

Semantics isn’t just for academics; it has real-world impacts. It teaches us how to express ourselves clearly and avoid miscommunication. For an everyday example, consider telling a joke – you’re relying on the other person’s grasp of semantics to get the humor and not take the words literally. In law, the correct interpretation of a word or phrase in a contract can decide the outcome of a dispute. Semantics is the glue that keeps meaning consistent in our laws and agreements, and it is also what allows machines to interpret our language and serve us better.

Related Topics

Here are some areas that are closely related to semantics:

  • Pragmatics: This is about how context influences the interpretation of language. Unlike semantics, pragmatics takes into account the social aspects and what is implicitly meant, not just the explicit meaning.
  • Syntax: It’s about the structure of sentences – how we arrange words to form a grammatically correct sentence. While semantics is about meaning, syntax is about form.
  • Linguistics: This is the overall study of language, including how it’s formed, used, and changes over time. Semantics is one part of this larger field.
  • Philosophy of Language: Philosophers ponder the very nature of meaning, sound, and structure in language. They ask deeper questions about how our language relates to reality.

Origin of Semantics

The word ‘semantics’ stems from the Greek “semantikos,” meaning “significant.” Interest in the field blossomed in the 1800s, and it has since become a vital part of studies in language and communication.

Controversies in Semantics

Believe it or not, semantics can get pretty controversial. Lawyers debate over word meanings in court, and scholars argue about whether our thoughts control language or the other way around. These arguments show how deeply intertwined language and meaning are with our daily lives and even our thought processes.

Exploring the field of semantics means diving into the depths of meaning in language. It’s crucial for clear communication and understanding across diverse cultures and even in advancing technology. The power of semantics lies in its ability to make sense of our complex, vibrant world of language. It’s not just about talking or writing – it’s about connecting and truly understanding each other.

Semantics Coursework

  • Semantics definition
  • The word “Dear”
  • Vocabulary instruction

Semantics is a field of linguistics that studies the meaning of signs, symbols, words, and phrases and how they are used. It is a wide field of study with two main branches: lexical semantics, which focuses on the meanings of words and the relations between them when they are used together, and logical semantics, which is concerned with reference, sense, implication, and presupposition.

Semantic properties are the components of meaning carried by a linguistic unit such as a word or phrase. They account for the related meanings a unit can carry and for any ambiguity that arises from how the words in a phrase relate to one another.

Semantics is essential because it explains how language users acquire a sense of meaning for different linguistic elements. It also accounts for how meanings change over time, a process known as semantic shift, and for how language is used in different social contexts. A group of words or phrases used to refer to related objects or concepts is called a semantic field or semantic domain; it gives a word meaning in relation to the other expressions used alongside it.

Semantic feature analysis is a teaching technique that uses a grid to show how a set of objects or concepts are related. Having children analyze and complete such grids is an effective way to teach, and the technique also builds vocabulary and comprehension, since students learn concepts while making connections.

The denotative meaning of a word is its cognitive or referential meaning: the central meaning associated with the word, which can change over time. Connotation is the effect the word has on a person, which can be personal or emotional. Connotation can feed back into denotation; a widespread connotation may eventually become a standard denotation over a given period of time.

The relationship between semantic development and reading and writing skills centers on the use of words to communicate. Age is a crucial factor in the development of reading and writing skills: at a young age, children first seek to understand words before learning to use them.

The word “Dear”

The word “dear” can function as an adjective meaning costly or treasured (“precious” is a close synonym) and as a noun used as a term of endearment. Its most notable semantic property is the extent to which its connotations outstrip its denotation: among those connotations is its use as an affectionate term for a girlfriend or boyfriend, which is especially common between people in a relationship.

Vocabulary instruction is a vital teaching technique that helps students read and expand their knowledge. It encompasses comprehension, fluency, word study, and phonemic awareness. Understanding how important vocabulary is to the learning process helps students draw on their background knowledge and gives them better comprehension in and out of class.

Common core standardized tests can have detrimental effects on vocabulary instruction: students’ exposure to English is limited because class time is devoted to studying for examinations, and the tests do not reflect the broad impact of vocabulary on comprehension skills. Such tests do not encourage students to explore beyond what they learn in class, and they rarely accommodate the challenges faced by learners with disabilities.

One way of teaching vocabulary directly is to create a word-rich environment using word walls, “reading the room”, word jars, word books, and vocabulary rings, which helps students learn and retain new words. Games such as Boggle, Pictionary, and Scrabble can also foster independent vocabulary learning, because they are easy to learn and remember while remaining fun.


Semantic Typicality Effects in Acquired Dyslexia: Evidence for Semantic Impairment in Deep Dyslexia

Acquired deep dyslexia is characterized by impairment in grapheme-phoneme conversion and production of semantic errors in oral reading. Several theories have attempted to explain the production of semantic errors in deep dyslexia, some proposing that they arise from impairments in both grapheme-phoneme and lexical-semantic processing, and others proposing that such errors stem from a deficit in phonological production. Whereas both views have gained some acceptance, the limited evidence available does not clearly eliminate the possibility that semantic errors arise from a lexical-semantic input processing deficit.

To investigate semantic processing in deep dyslexia, this study examined the typicality effect in deep dyslexic individuals, phonological dyslexic individuals, and controls using an online category verification paradigm. This task requires explicit semantic access without speech production, thereby isolating semantic processing of written or spoken input.

Methods & Procedures

To examine the locus of semantic impairment, the task was administered in visual and auditory modalities with reaction time as the primary dependent measure. Nine controls, six phonological dyslexic participants, and five deep dyslexic participants completed the study.

Outcomes & Results

Controls and phonological dyslexic participants demonstrated a typicality effect in both modalities, while deep dyslexic participants did not demonstrate a typicality effect in either modality.

Conclusions

These findings suggest that deep dyslexia is associated with a semantic processing deficit. Although this does not rule out the possibility of concomitant deficits in other modules of lexical-semantic processing, this finding suggests a direction for treatment of deep dyslexia focused on semantic processing.

Introduction

Caused by acquired injury to the central nervous system, such as stroke or traumatic brain injury, acquired dyslexia results from damage to the mature reading system and manifests as an impairment in the comprehension of written language (Ellis & Young, 1988). Deep dyslexia and phonological dyslexia are two subtypes of acquired dyslexia. Although the literature offers varying descriptions of deep dyslexia, patients exhibit several hallmark symptoms, including: 1) severely impaired pseudoword reading, 2) semantic errors in oral reading, 3) visual errors in oral reading, 4) morphological errors in oral reading, and 5) an imageability effect in word reading, with greater success in reading concrete, imageable words (Coltheart, 1980; Ellis & Young, 1988). In contrast, the hallmark of phonological dyslexia is impairment in pseudoword reading in conjunction with an absence of semantic reading errors (Beauvois & Derouesne, 1979; Derouesne & Beauvois, 1979; Ellis & Young, 1988). These patients also may show visual errors and imageability effects, as in deep dyslexia. However, the presence of semantic errors distinguishes between the two reading disorders (Friedman, 1996; Glosser & Friedman, 1990).

Several theories of semantic error production in deep dyslexia have been proposed, but the theories relevant to the current study include the multiple-deficit hypothesis (Morton & Patterson, 1980) and the failure of inhibition theory (Buchanan, McEwen, Westbury, & Libben, 2003). The multiple-deficit hypothesis suggests that reading deficits in deep dyslexia stem from more than one source. Referencing patient data from several case studies, Morton and Patterson (1980) suggested damage to both the nonlexical grapheme-phoneme route (to account for pseudoword reading deficits) and lexical-semantic routes (to account for semantic errors). Shallice and Warrington (1980) further proposed multiple candidates for impairment constrained to the lexical-semantic processing route: 1) impaired access to the semantic system, 2) impaired processing within the semantic system, and 3) impaired phonological retrieval following semantic processing. Some studies support impaired access to the semantic system, reporting differences in performance accuracy on visual and auditory tasks (Shallice & Coughlan, 1980; Shallice & Warrington, 1975). In contrast, other studies support a central semantic processing deficit, reporting individuals with deep dyslexia who show poor performance accuracy in both visual and auditory word-picture matching tasks (Newcombe & Marshall, 1980; Patterson, 1978). Still other studies support a phonological impairment, with evidence of deep dyslexic patients producing semantic errors in naming as well as reading (Marshall & Newcombe, 1966; Patterson, 1978; Saffran & Marin, 1977). Although Shallice and Warrington (1980) were able to specify possible deficit locations, they acknowledged that supporting evidence exists for all three possibilities and that the problem of locating the lexical-semantic impairment still remains.

In contrast, the Failure of Inhibition Theory (FIT) proposes that semantic errors in deep dyslexia are caused by failure to inhibit incorrect responses at the level of phonological production (Buchanan et al., 2003). Within this theoretical framework, when a word is read aloud, the word’s stored orthographic representation is activated, followed by activation of its semantic representation as well as spreading activation of semantically-related neighbors, and, in turn, these entries in the phonological lexicon are activated. In a normal reading system, incorrect responses would be inhibited; however, in deep dyslexia, FIT hypothesizes that the inhibition mechanism is impaired, resulting in the production of semantic errors (Buchanan et al., 2003). One critical aspect of FIT relates to how orthographic, semantic, and phonological representations are accessed, a feature of the model referred to as PEIR (Production, Explicit access, Implicit access, Representation) (Buchanan et al., 2003). PEIR proposes that production relies on explicit access, which relies on implicit access, which relies on intact representations (Buchanan et al., 2003; Colangelo & Buchanan, 2007). The authors of FIT propose that deep dyslexia is the result of a deficit in explicit access at the level of phonological production (Colangelo & Buchanan, 2007).

This hypothesis is based primarily on data from a single case of deep dyslexia, patient JO, whose responses the authors interpret as showing intact implicit access to semantics, intact explicit access to semantics in the absence of production, and deficient explicit access in tasks requiring production (Colangelo & Buchanan, 2005, 2007). To test implicit access, JO completed a lexical decision task and performed within the accuracy range of controls, leading the authors to conclude that implicit semantic processing was intact (Colangelo & Buchanan, 2005). To test explicit access to semantics with a production component, performance on an oral reading task was compared in a semantically-blocked condition and an unrelated condition. JO produced significantly more semantic errors in the semantically-blocked condition, so the authors argued that FIT was supported in that explicit access to phonology was not achieved (Colangelo & Buchanan, 2005, 2007). To further establish that this explicit-access deficit involved the phonological lexicon and not semantics, JO was given a forced-choice task for multiple lists of semantically-related words, requiring silent reading of each list followed by selection of the most semantically-associated word from a field of two choices (Colangelo & Buchanan, 2005, 2007). JO’s accuracy on this task was within the range of control subjects’ performance; thus it was concluded that explicit access to the semantic system was not impaired. Although this evidence may appear convincing on the surface, it is to date the only evidence directly addressing the explicit semantic access capabilities of deep dyslexic individuals, and it is based on data from a single case study. Additionally, although the task examined semantic access, the only measure was overall accuracy, which could be insensitive to processing impairments that may be detected by online measures such as reaction time.

In summary, both models of deep dyslexia discussed here provide supporting evidence, but neither is able to adequately address all the evidence, especially regarding semantic error production. To investigate semantic processing in deep dyslexia, the current study examined the semantic typicality effect using a task that requires explicit semantic access without production.

Semantic Typicality Effects

Semantic typicality is a factor that affects the organization of semantic categories in the mental lexicon. While the classical view of semantic categorization considers each category to possess a set of defining features, it has been shown that not all members of a category represent these features to the same degree (Rosch, 1973, 1975). Some items in a category can be considered good or typical exemplars that possess many of the defining features of the category (e.g., robin in the category of birds), whereas others can be considered poor or atypical exemplars that possess fewer defining features of the same category (e.g., ostrich). Some studies have shown that typical members of a category are processed faster than atypical members, an effect known as the typicality effect (Kiran & Thompson, 2003; Rosch, 1975).

In healthy control participants, the typicality effect has been found in several studies using a variety of experimental paradigms, including item ranking, lexical decision, category verification, and category naming (Casey, 1992; Hampton, 1995; Rosch, 1973, 1975). This effect also has been found for people with nonfluent Broca’s aphasia, but not for people with fluent Wernicke’s aphasia. Using a category verification task, Kiran and Thompson (2003) found faster reaction times (RTs) for typical as compared to atypical category exemplars in the nonfluent patients. For the fluent patients, however, no RT differences were found between the two types of words, a finding associated with an underlying semantic deficit.

The purpose of the current study was to test for a semantic processing impairment in deep dyslexia using an online category verification paradigm identical to that used by Kiran and Thompson (2003). This task requires explicit access to the semantic system without requiring oral production. The following questions were posed: (1) Do individuals with deep dyslexia fail to show a typicality effect, which would suggest impaired semantic processing? and (2) In the event that deep dyslexic readers fail to show this effect, does the impairment show up when processing both visual and auditory words? It was predicted that, in contrast to participants with deep dyslexia, both healthy control subjects and phonological dyslexic participants would show a typicality effect in both visual and auditory modalities. In contrast, we predicted that if a semantic impairment underlies semantic errors in deep dyslexia, the typicality effect would be absent in one or both modalities. A deficit in visual, but not auditory, processing would suggest a deficit in accessing semantic representations from the written form of lexical items, whereas deficient processing in both modalities would indicate an amodal central semantic deficit.

Participants

Nine individuals served as adult control participants (6 females; age 22 to 29, M=24.5; years of education 16 to 20, M=26.2) and eleven individuals with acquired dyslexia participated in this study (4 females; age 47 to 69, M=58.2; years of education 12 to 19, M=15.5). All were monolingual English speakers, had normal or corrected-to-normal vision, and normal hearing. None reported a history of psychiatric, developmental speech-language, or neurological disorders, other than stroke in the patient participants. All patient participants demonstrated behavioral characteristics consistent with acquired dyslexia subsequent to left hemisphere cerebrovascular accident (CVA); time elapsed after CVA ranged from 2.5 to 17.5 years (M=6.9 years). Control participants were recruited from Northwestern University and the surrounding community and acquired dyslexia participants were recruited from the Northwestern University Aphasia and Neurolinguistics Laboratory and the Northwestern University Speech and Language Clinic. All participants gave informed consent in accordance with Northwestern University’s Institutional Review Board.

Six individuals showed patterns consistent with a diagnosis of phonological dyslexia and five showed ‘deep dyslexia’ patterns. Participants in the deep dyslexia group were not significantly different from those in the phonological dyslexia group for age, years of education, or years post-CVA (t(9) = -0.053, p = 0.61 for age; t(9) = -1.46, p = 0.18 for years of education; t(9) = -0.87, p = 0.41 for years post-CVA). Participants with acquired dyslexia were not significantly different from controls in years of education (t(18) = 1.97, p = 0.07), but control participants were significantly younger (t(18) = -14.67, p < .05).

To determine a diagnosis of acquired dyslexia, participants were administered the Western Aphasia Battery-Revised (WAB-R) (Kertesz, 2007), a subtest from the Woodcock Johnson-III Diagnostic Reading Battery (WJDRB-III) (Woodcock, Mather, & Schrank, 2004), and selected subtests from the Psycholinguistic Assessment of Language Processing in Aphasia (PALPA) (Kay, Lesser, & Coltheart, 1992). Although a diagnosis of aphasia was not a criterion for subject inclusion, all acquired dyslexic participants presented with aphasia as measured by the WAB-R, with Aphasia Quotients ranging from 51.4 to 87.6. All dyslexic participants also demonstrated deficits in pseudoword reading on the WJDRB-III and on the Nonword Reading subtest of the PALPA (see Table 1). To ensure integrity of the written version of the experimental task, all dyslexic participants were required to demonstrate the ability to match single written words to pictures on the Written Word-Picture Matching subtest of the PALPA.

Language testing data for acquired dyslexic participants. P1 through P6 represent phonological dyslexic participants and D1 through D5 represent deep dyslexic participants.

To distinguish between the phonological and deep dyslexia patient groups, additional subtests of the PALPA were administered, including the Imageability and Frequency Reading subtest (PALPA 31) and the Regularity Reading subtest (PALPA 35). Individuals included in the deep dyslexia group produced semantic errors in single word oral reading, whereas individuals in the phonological dyslexia group did not. Additionally, although participants in both the deep and phonological dyslexia groups showed effects of imageability (t(4) = 5.404, p < 0.01, deep dyslexia; t(5) = 5.809, p < 0.05, phonological dyslexia) in single word oral reading, as demonstrated by the Imageability and Frequency Reading subtest of the PALPA, individuals in the deep dyslexia group demonstrated imageability effects of a significantly greater magnitude (t(9) = -4.804, p < 0.01) (see Table 1).

The stimuli used in this study were based on those used by Kiran & Thompson (2003) in their online category verification task. Three animate superordinate categories were used (birds, vegetables, and fish), with each category containing 15 typical and 15 atypical items for a total of 90 items. The typicality norms for each category were obtained from Kiran (2001), derived by asking groups of healthy young and elderly individuals to rate items on a 7-point scale based on category typicality. Within this scale, a low rank (e.g., 1) indicated items judged as good members of a category and a high rank (e.g., 7) indicated items judged as poor members of a category. The items for each category were then rank ordered using z-scores. Because low ranks represented “typical” items and high ranks represented “atypical” items, z-score calculations resulted in a range of z-scores from -1.2 on the extreme end of “typical” items to 1.3 on the extreme end of “atypical” items. For each category, typical exemplars were selected based on the 15 items with the lowest z-scores (range: -1.2 to -0.45) and atypical exemplars were selected based on the 15 items with the highest z-scores (range: 0.01 to 1.3). An additional 90 items belonging to different superordinate categories served as nonmember exemplars (Kiran & Thompson, 2003). For the online category verification task, each item was paired with a superordinate category label (i.e., bird, vegetable, fish), resulting in 180 word pairs. The order of stimulus presentation was randomized during the experimental task. For each superordinate category, 30 items matched the category (‘yes’ response) and 30 items did not match the category (‘no’ response).
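
As a rough illustration (not the authors' materials), the z-score ranking described above can be sketched in a few lines of Python; the item names and ratings below are invented, and only two items per end are selected instead of fifteen.

```python
import statistics

# Hypothetical mean typicality ratings (1 = very typical, 7 = very atypical)
ratings = {"robin": 1.3, "sparrow": 1.6, "penguin": 5.8, "ostrich": 6.2, "crow": 2.4}

mean = statistics.mean(ratings.values())
sd = statistics.stdev(ratings.values())
z = {item: (r - mean) / sd for item, r in ratings.items()}

# Lowest z-scores = typical exemplars; highest = atypical (here 2 each instead of 15)
ranked = sorted(z, key=z.get)
typical, atypical = ranked[:2], ranked[-2:]
print(typical, atypical)  # ['robin', 'sparrow'] ['penguin', 'ostrich']
```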

The same 180 word pairs were used for both the visual and auditory online category verification tasks. For the visual task, stimulus words were presented in 48-point Arial Black font. For the auditory task, stimuli were recorded in a soundproof booth using a female voice and presented through external speakers adjusted to each participant’s sound comfort level. A MacBook computer running Superlab 4.0 was used to present stimuli and collect accuracy and reaction time data. In the visual online category verification task, the superordinate prime word was presented for 750 ms, followed by a 200 ms Inter-Stimulus Interval (ISI) and then the target word, which remained on the screen until the participant responded. This sequence was followed by a 1500 ms Inter-Trial Interval (ITI). For the auditory online category verification task, the superordinate prime word was presented for the duration of the audio file (500-700 ms), followed by a 200 ms ISI and the audio presentation of the target word. A 1500 ms ITI followed each participant response and the next superordinate prime was presented. For both task modalities, the participant’s response time was recorded from the onset of the target word.
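
The visual trial structure can be summarized as a simple event schedule. The sketch below is a plain-Python illustration of that sequence, not the authors' Superlab script; the get_response argument is a hypothetical stand-in for the button box.

```python
import time

PRIME_MS, ISI_MS, ITI_MS = 750, 200, 1500  # timings reported for the visual task

def run_visual_trial(prime, target, get_response):
    """Walk through one visual category-verification trial; return (response, RT in ms)."""
    print(prime)                  # show the superordinate category label
    time.sleep(PRIME_MS / 1000)
    time.sleep(ISI_MS / 1000)     # blank inter-stimulus interval
    print(target)                 # target word stays up until a response is made
    start = time.perf_counter()
    response = get_response()     # 'yes'/'no' button press (stand-in)
    rt_ms = (time.perf_counter() - start) * 1000
    time.sleep(ITI_MS / 1000)     # inter-trial interval before the next prime
    return response, rt_ms

# Example: run_visual_trial("bird", "robin", lambda: "yes")
```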

All participants completed both the visual and auditory versions of the task, which were presented in two separate sessions on different days, with at least five days between the two. The order of modality presentation was counterbalanced across participants. For both modalities, participants were seated in front of the computer screen with one hand resting on a button response box. Participants were instructed to either read or listen to (depending on modality) each word pair presented and decide if the target word belonged to the preceding category. If the word was judged to be a member of the category, the participant was instructed to press the green “yes” response button on the button box. If the word was judged not to be a member of the preceding category, the participant was instructed to press the red “no” response button on the button box.

After receiving the task instructions, a 10 item training session was completed to allow practice with items similar to those used in the actual experiment and feedback on the accuracy of the participant’s response was provided by Superlab 4.0. Participants were instructed to respond as quickly and as accurately as possible for all items. After the training session was completed, the experimental task was begun. To avoid effects of fatigue, during the experiment, participants were given an opportunity for two breaks, the first when a third of the items were completed and the second when two-thirds of the items were completed.

Data Analysis

Accuracy and response times for each participant were recorded by Superlab 4.0. For each participant and subject group, the mean and median proportions of accurate responses and response times were calculated for each independent variable. Only the response times of correct responses were included in the final statistical analysis. Due to the small number of subjects in each group as well as the uneven number of subjects representing each group, nonparametric statistics were used to analyze the data. For between-subject comparisons of group accuracy, Kruskal-Wallis tests were performed. For within-subject comparisons of reaction time across typicality, Friedman tests were performed to assess overall differences within participant groups and Wilcoxon Signed Rank tests were performed for pair-wise comparisons of typical and atypical RTs within each participant group. To reduce the possibility of Type I statistical error, only comparisons critical to answering the research questions were conducted, resulting in a single pair-wise typicality comparison (typical versus atypical) within each participant group.
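
For readers who want to see the shape of this analysis, the same nonparametric tests are available in SciPy. The sketch below uses invented placeholder numbers rather than the study's data, and simply shows which test corresponds to which comparison.

```python
from scipy import stats

# Placeholder numbers (one value per participant), not the study's data.
controls_acc = [0.97, 0.95, 0.96, 0.98, 0.94]
phono_acc    = [0.93, 0.91, 0.95, 0.92, 0.90]
deep_acc     = [0.90, 0.88, 0.92, 0.87, 0.91]

# Between-group accuracy comparison: Kruskal-Wallis H test
H, p_kw = stats.kruskal(controls_acc, phono_acc, deep_acc)

# Within-group median RTs (ms) for typical, atypical, and nonmember items: Friedman test
typical_rt   = [610, 652, 641, 633, 605]
atypical_rt  = [677, 713, 699, 690, 668]
nonmember_rt = [702, 741, 728, 719, 694]
chi2, p_friedman = stats.friedmanchisquare(typical_rt, atypical_rt, nonmember_rt)

# Pairwise typical vs. atypical comparison: Wilcoxon signed-rank test
T, p_wilcoxon = stats.wilcoxon(typical_rt, atypical_rt)

print(f"Kruskal-Wallis H={H:.2f}, Friedman chi2={chi2:.2f}, Wilcoxon T={T:.2f}")
```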

Analysis of the response accuracy data showed that in the visual task, no significant differences were found for any level of typicality (H(2) = 0.40, ns, typical; H(2) = 1.24, ns, atypical; H(2) = 4.15, ns, nonmember). In the auditory task, no significant differences were found for typical or atypical items (H(2) = 4.08, ns, typical; H(2) = 2.32, ns, atypical), but a significant difference was found for nonmember items (H(2) = 8.90, p < 0.05). Because the nonmember comparison was not one of the critical questions of this study, no post-hoc tests were conducted for this analysis.

Reaction Time

Visual condition analysis.

Analysis of the reaction time data resulted in a significant difference between typical and atypical items for the control group (χ²(2) = 8.667, p < 0.05) and the phonological dyslexic group (χ²(2) = 9.00, p < 0.05), but not for the deep dyslexic group (χ²(2) = 2.8, ns). To address the research questions and identify the specific level of difference within each group, a single comparison between typical and atypical items was completed. This analysis revealed that for the control group and phonological dyslexic group, typical responses were significantly faster than atypical responses (T = 2.00, p < 0.05, controls; T = 0.00, p < 0.05, phonological dyslexics). However, for deep dyslexics, this analysis revealed no significant difference between the two (T = 3.00, ns) (see Figure 1).

Figure 1. Mean response times for correct responses to typical and atypical items in the visual task (top) and auditory task (bottom).

Auditory condition analysis

Analysis of the reaction time data resulted in a significant difference between typical and atypical items for the control group (χ²(2) = 16.22, p < 0.05), the phonological dyslexic group (χ²(2) = 5.33, p < 0.05), and the deep dyslexic group (χ²(2) = 7.6, p < 0.05). To address the research questions and identify the specific level of difference within each group, a single comparison between typical and atypical items was completed. This analysis revealed that for the control group and phonological dyslexic group, typical responses were significantly faster than atypical responses (T = 0.00, p < 0.05, controls; T = 0.00, p < 0.05, phonological dyslexics). However, for deep dyslexics, this analysis revealed no significant difference between the two (T = 0.00, ns) (see Figure 1).

The purpose of the current study was to identify and locate the potential semantic impairment in deep dyslexia using a task that would specifically require explicit semantic access from both visual and auditory input without requiring an oral response, thus avoiding a confound of speech production. Results indicated that healthy control and phonological dyslexic participant groups demonstrated a typicality effect in both visual and auditory modalities, but deep dyslexic participants did not demonstrate this effect in either modality. These results will be discussed in relation to each of the two research questions posed in the current study. Taking this new evidence into account, a modified theory of deep dyslexia is proposed.

The first question concerned whether or not a semantic impairment exists in deep dyslexia. The Failure of Inhibition Theory (FIT) suggests that semantic reading errors result from failure to inhibit incorrect responses at the level of the phonological output lexicon, with all prior levels of processing intact (Buchanan et al., 2003). The evidence used to support this theory was that their patient's accuracy was similar to that of normal controls on a task requiring implicit access to semantics and on a task requiring explicit access to semantics without oral production. Although accuracy measures for this single case did not differ from control subjects, no other, more sensitive measures were collected to explore the possibility of semantic processing differences. The current study examined explicit access to semantics without requiring oral production by testing a well-documented lexical-semantic processing effect (semantic typicality) using reaction time as a primary dependent measure.

It was predicted that in the visual task, if deep dyslexics showed a typicality effect similar to control subjects, then this would suggest intact semantic processing. Conversely, differing typicality effects between the two groups would suggest impaired semantic processing. Results showed that whereas control subjects demonstrated a typicality effect, deep dyslexics did not. These findings suggest that in a task requiring semantic processing of written lexical items, deep dyslexics process semantic information differently from controls, thus indicating a semantic impairment.

While the evidence presented here supports a semantic impairment in deep dyslexia, it does not directly test the presence or absence of a deficit at the level of the phonological output lexicon or the claim that semantic errors result from failure of inhibition. It is entirely possible, even likely, that the mechanism for semantic errors is an inability to inhibit incorrect responses. However, the current study now calls into question the level of processing at which this inhibition mechanism fails. Perhaps it is a failure to inhibit at both the semantic and phonological output lexicon levels, which is subtle at the level of semantics and only obviously detectable at the level of production.

The second question addressed the locus of the semantic impairment along the lexical-semantic reading route. Most models of lexical processing represent the semantic system as a single processing system accessed by all modalities. In order to test whether or not the semantic processing impairment in deep dyslexia affects general semantic processing or semantic access from written input, the current study examined category verification in both the visual and auditory modalities. Whereas control subjects (and phonological dyslexic subjects) demonstrated a typicality effect in both the visual and auditory modalities, deep dyslexic subjects did not demonstrate a typicality effect in either modality. This pattern suggests that deep dyslexics demonstrate impairment in semantic processing. The multiple-deficit hypothesis (Morton & Patterson, 1980) conceptualizes deep dyslexia as a disorder involving two separate impairments: a lexical-semantic impairment and a grapheme-phoneme conversion impairment. Whereas the current study did not address grapheme-phoneme conversion, it directly addressed the locus of impairment along the lexical-semantic reading route, with the results supporting the existence of a semantic impairment in deep dyslexia. The design of the experimental task allowed a direct comparison of semantic system processing and access, with deep dyslexics showing impairment in semantic processing.

Toward a modified theory of deep dyslexia

Although the current study has provided additional evidence supporting and opposing various parts of the previously discussed theories, it appears that neither theory accounts for all the evidence; therefore, we propose a modified version of FIT. In deep dyslexia, as FIT proposes, selection inhibition is impaired. However, the lack of a semantic typicality effect in deep dyslexia may indicate an inability to efficiently select a correct lexical-semantic representation, suggesting that selection inhibition becomes impaired beginning at the level of semantics.

FIT provides a comprehensive view of deep dyslexia that complements other theories by proposing a mechanism for semantic error production. Proponents of FIT suggest that semantic processing is intact in deep dyslexia; however, the current study suggests that deep dyslexics demonstrate semantic processing impairments. These data suggest a modification to the FIT model, conceptualizing failure of inhibition as beginning at the level of semantic processing.

  • Beauvois MF, Dérouesné J. Phonological alexia: three dissociations. Journal of Neurology, Neurosurgery, and Psychiatry. 1979;42(12):1115–1124.
  • Buchanan L, McEwen S, Westbury C, Libben G. Semantics and semantic errors: Implicit access to semantic information from words and nonwords in deep dyslexia. Brain and Language. 2003;84:65–83.
  • Casey PJ. A re-examination of the roles of typicality and category dominance in verifying category membership. Journal of Experimental Psychology: Learning, Memory, and Cognition. 1992;12(2):237–267.
  • Colangelo A, Buchanan L. Semantic ambiguity and the failure of inhibition hypothesis as an explanation for reading errors in deep dyslexia. Brain and Cognition. 2005;57(1):39–42.
  • Colangelo A, Buchanan L. Localizing damage in the functional architecture: The distinction between implicit and explicit processing in deep dyslexia. Journal of Neurolinguistics. 2007;20(2):111–144.
  • Coltheart M. Deep dyslexia: a review of the syndrome. In: Coltheart M, Patterson K, Marshall J, editors. Deep Dyslexia. London: Routledge & Kegan Paul; 1980.
  • Dérouesné J, Beauvois MF. Phonological processing in reading: data from alexia. Journal of Neurology, Neurosurgery, and Psychiatry. 1979;42(12):1125–1132.
  • Ellis AW, Young AW. Human Cognitive Neuropsychology. Hove, UK: Lawrence Erlbaum Associates; 1988.
  • Friedman RB. Recovery from Deep Alexia to Phonological Alexia: Points on a Continuum. Brain and Language. 1996;52:114–128.
  • Glosser G, Friedman RB. The continuum of deep/phonological alexia. Cortex. 1990;26(3):343–359.
  • Hampton JA. Testing the prototype theory of concepts. Journal of Memory and Language. 1995;34:686–708.
  • Humphreys GW, Riddoch MJ, Quinlan PT. Cascade processes in picture identification. Cognitive Neuropsychology. 1988;5:67–103.
  • Kay J, Lesser R, Coltheart M. The Psycholinguistic Assessment of Language Processing in Aphasia (PALPA). Hove, UK: Lawrence Erlbaum Associates; 1992.
  • Kertesz A. Western Aphasia Battery-Revised. San Antonio, TX: PsychCorp; 2007.
  • Kiran S. Effects of exemplar typicality on naming deficits in fluent aphasia. Dissertation, Northwestern University, Evanston, IL; 2001.
  • Kiran S, Thompson CK. Effect of typicality on online category verification of animate category exemplars in aphasia. Brain & Language. 2003;85(3):441–450.
  • Marshall JC, Newcombe F. Syntactic and semantic errors in paralexia. Neuropsychologia. 1966;4:169–176.
  • Morton J, Patterson K. A new attempt at an interpretation, or, an attempt at a new interpretation. In: Coltheart M, Patterson K, Marshall J, editors. Deep Dyslexia. London: Routledge & Kegan Paul; 1980.
  • Newcombe F, Marshall J. Response monitoring and response blocking in deep dyslexia. In: Coltheart M, Patterson K, Marshall J, editors. Deep Dyslexia. London: Routledge & Kegan Paul; 1980.
  • Patterson K. Phonemic dyslexia: Errors of meaning and the meaning of errors. The Quarterly Journal of Experimental Psychology. 1978;30:587–601.
  • Rosch E. On the internal structure of perceptual and semantic categories. In: Moore TE, editor. Cognitive development and the acquisition of language. New York: Academic Press; 1973.
  • Rosch E. Cognitive representations of semantic categories. Journal of Experimental Psychology: General. 1975;104(3):192–233.
  • Saffran E, Marin O. Reading without phonology: Evidence from aphasia. The Quarterly Journal of Experimental Psychology. 1977;29(3):515–525.
  • Shallice T, Coughlan AK. Modality specific word comprehension deficits in deep dyslexia. British Medical Journal. 1980;43(10):866.
  • Shallice T, Warrington E. Word recognition in a phonemic dyslexic patient. Quarterly Journal of Experimental Psychology. 1975;27:187–199.
  • Shallice T, Warrington E. Single and multiple component central dyslexic syndromes. In: Coltheart M, Patterson K, Marshall J, editors. Deep Dyslexia. London: Routledge & Kegan Paul; 1980.
  • Woodcock RW, Mather N, Schrank FA. The Woodcock-Johnson III Diagnostic Reading Battery. Rolling Meadows, IL: Riverside Publishing; 2004.


15 Semantic Memory Examples


Semantic memory refers to the long-term storage of facts and is a form of declarative memory. Examples of semantic memory include remembering definitions of concepts, historical dates, and the names of people, places, and things.

Theoretically, semantic memory has unlimited capacity. However, memory does decay over time and is also subject to various kinds of interference that can delete information.

Although sometimes information in semantic memory can be automatically activated, generally speaking, retrieval requires conscious intention and mental effort.

Semantic Memory Definitions

Scholarly definitions of semantic memory include:

  • “Semantic memory consists of your entire knowledge base including your vocabulary, concepts, and ideas.” (Levy, 2013, p. 206)
  • “Semantic memory is our storehouse of more-or-less permanent knowledge, such as the meanings of words in a language (e.g., the meaning of “parasol”) and the huge collection of facts about the world (e.g., there are 196 countries in the world, and 206 bones in your body).” (Kearns & Lee, 2015, p. 155)

Semantic Memory vs Episodic Memory

Endel Tulving (1972; 1983) is recognized as one of the pioneers in the study of long-term memory and has made many significant contributions to our present-day understanding.

He saw semantic and episodic memory as two types of declarative (aka ‘explicit’) memory. See the diagram below.

(Diagram: types of long-term memory, reproduced as text in the appendix.)

One of his first insights was to make a distinction between semantic and episodic memory (memory for experiences), stating that:

“…one system can operate independently of the other,” and most likely are “governed at least partially by different principles” (1983; p. 66), although both systems are “closely interdependent and interact with one another virtually all the time” (p. 65).

Similarly, Stangor and Walinga (2014) compare the two with examples:

“Episodic memory refers to the firsthand experiences that we have had (e.g., recollections of our high school graduation day or of the fantastic dinner we had in New York last year). Semantic memory refers to our knowledge of facts and concepts about the world (e.g., that the absolute value of −90 is greater than the absolute value of 9 and that one definition of the word “affect” is “the experience of feeling or emotion”).”


Semantic Memory Examples

  • Memorizing the names and birthdates of all of your relatives, including grandparents, aunts and uncles, plus cousins.  
  • Responding to your competition’s remarks during a debate by citing key facts from research.      
  • Labeling each part of the brain involved in reading comprehension. 
  • Learning the words for various fruits and vegetables in a foreign language.      
  • A third-grader taking a fill-in-the-blank test naming all parts of a volcano.  
  • Participating in a spelling-bee.
  • Filling in a timeline that shows the historical events that led to the Civil Rights Movement.   
  • Writing a paper that compares and contrasts two novels in English literature.          
  • Memorizing the defining characteristics of different styles of architecture.   
  • Being able to cite key team stats of your favorite basketball team’s game last week.           


Detailed Examples

1. The Animal Classification System

Scientists use a taxonomy of classification to categorize all living things. All living organisms, plants and animals, are grouped together based on commonly shared characteristics.

There are 8 taxonomic ranks that are arranged from broadest to most narrow. As groups become smaller, the members become more similar.

Biology students have to memorize the classification system and know the placement of the living things they study. That means putting a lot of information in semantic memory.

Remembering taxonomic ranks can be confusing, so many students rely on a mnemonic: “Daring King Phillip came over for good spaghetti.”

Responding to exam questions about this system requires committing the information to semantic memory and then retrieving it.
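
As a concrete illustration of the eight ranks ordered from broadest to narrowest, here is a short Python sketch using the domestic cat, a standard textbook example rather than one drawn from this article.

```python
# The eight taxonomic ranks, broadest to narrowest, illustrated with the domestic cat.
ranks = ["Domain", "Kingdom", "Phylum", "Class", "Order", "Family", "Genus", "Species"]
domestic_cat = ["Eukarya", "Animalia", "Chordata", "Mammalia",
                "Carnivora", "Felidae", "Felis", "Felis catus"]

for rank, value in zip(ranks, domestic_cat):
    print(f"{rank:<8} {value}")
```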

2. The GeoBee

Beginning in 1989, the National Geographic Society held an annual geography contest, commonly referred to as the GeoBee. Over the years, millions of students, educators, and parents have been involved in this exhilarating tournament.

Starting at the local school level, students participated in competitions and moved their way up to state, regional, and eventually the final rounds.

The competition involved knowing basic Earth facts, including the highest, lowest, and deepest points around the world, in addition to the locations of countries, bodies of water, and major physical features of the planet.

That is an incredible amount of data to commit to long-term semantic memory.

Although the GeoBee was discontinued in 2022, you can still take advantage of a wealth of resources available at the Resource Library.

3. Debate 

Participating in a debate can be an exciting, or nerve-wracking, experience. Although there are many versions of formal debate, the fundamental goal is to present a set of well-researched and grounded arguments that support a given position.

The key term here is “well-researched and grounded.” That means that whenever a participant presents their views, they must back up their position with facts.

Simply giving an opinion that is solely based on one’s personal view is a sure-fire way to lose.

In preparation for a debate, research is paramount. That means hours of studying published articles in the arts and social sciences. Key terms and concepts must be committed to memory and then organized to formulate a coherent and logical position on the issue at hand.

During the debate itself, those points must be retrieved from semantic memory and presented to the competition and judges.

Although there can be some differences in the basic structure of debates, the essential role of semantic memory stays the same.

4. Writing an Essay: Semantic Memory 

Writing an excellent essay may seem straightforward on the surface. However, it actually involves an extensive series of steps. First, one must conduct the necessary research. That involves reading various articles and books, or perhaps watching a documentary or biographical account of an historical figure.

At each instance, the student must determine which information is the most pertinent and then make a conscious effort to place that information in semantic memory. That does not happen instantaneously for most and usually requires considerable repetition.

Once the information has been placed in long-term semantic memory, it must then be retrieved at the moment of writing the essay.

The retrieval process also requires cognitive effort and conscious intention. One must initiate a search through the memory store, locate the targeted data, and bring it into consciousness.

Assuming the necessary information was placed into storage to begin with, and assuming it did not decay over time, one must also hope that it has been retrieved in its most accurate form and not accidentally jumbled with other information.

After all of that, one can begin formulating an organized and coherent essay, assuming the question was read correctly.

5. Semantic Memory Network and Spreading Activation

Collins and Loftus (1975; Anderson, 2013) suggested that semantic information is stored in a structure similar to a concept map. Each concept is represented in the network as a node, and nodes are connected by links, called arcs.

The stronger the association between two concepts, the closer in the network the nodes appear. For example, the concepts of “doctor” and “nurse” will be more closely connected than the concepts of “doctor” and “keyboard.”

The network model of semantic memory also postulates that when one concept is activated, other concepts that are closely linked will also be activated.

The stronger the association, the easier the activation. That activation will then spread throughout the network until it eventually loses momentum.

In the words of Collins and Loftus:

“The more properties two concepts have in common, the more links there are between the two nodes via these properties and the more closely related are the concepts…When a concept is processed (or stimulated), activation spreads out along the paths of the network in a decreasing gradient” (p. 411).
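
A minimal sketch of the kind of network Collins and Loftus describe is given below; the node names, link weights, and decay value are invented for illustration. Activation injected at one node spreads along weighted links and attenuates at each step, so closely associated concepts (doctor, nurse) end up far more active than distant ones (keyboard).

```python
# Toy spreading-activation network: nodes are concepts, weights are association strengths.
links = {
    "doctor":   {"nurse": 0.9, "hospital": 0.8, "keyboard": 0.1},
    "nurse":    {"doctor": 0.9, "hospital": 0.7},
    "hospital": {"doctor": 0.8, "nurse": 0.7},
    "keyboard": {"doctor": 0.1, "computer": 0.9},
    "computer": {"keyboard": 0.9},
}

def spread(start, steps=2, decay=0.5):
    """Spread activation outward from `start`, attenuating it at every step."""
    activation = {start: 1.0}
    frontier = {start: 1.0}
    for _ in range(steps):
        new_frontier = {}
        for node, act in frontier.items():
            for neighbour, weight in links.get(node, {}).items():
                boost = act * weight * decay   # decreasing gradient
                if boost > activation.get(neighbour, 0.0):
                    new_frontier[neighbour] = boost
                    activation[neighbour] = boost
        frontier = new_frontier
    return activation

print(spread("doctor"))  # "nurse" ends up far more active than "keyboard"
```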

Semantic memory stores factual information, including the definition of terms and concepts. Theoretically, it has unlimited storage and lasts forever. Practically speaking, information fades over time and is subject to interference, which can cause some information to be deleted.

Semantic memory is used in many aspects of life. When trying to formulate a coherent essay, or when engaged in a debate in which we support our position with facts, we are utilizing semantic memory.

Similarly, memorizing the taxonomic classification of living organisms or the physical features of the planet relies on semantic memory.

Information in semantic memory is stored in a structure similar to a concept map. Related concepts are closely connected. When one concept is stimulated, activation spreads throughout the network and activates other concepts.

Anderson, J. R. (2013). Language, memory, and thought. Psychology Press.

Collins, A. M., & Quillian, M. R. (1969). Retrieval time from semantic memory. Journal of Verbal Learning and Verbal Behavior, 8(2), 240-247.

Collins, A. M., & Loftus, E. F. (1975). A spreading-activation theory of semantic processing. Psychological Review, 82(6), 407-428.

Faul, L., & LaBar, K. (2022). Mood-congruent memory revisited. Psychological Review. https://doi.org/10.1037/rev0000394

Riedel, W. J., & Blokland, A. (2015). Declarative memory. Cognitive Enhancement, 215-236.

Renoult, L., & Rugg, M. D. (2020). An historical perspective on Endel Tulving’s episodic-semantic distinction. Neuropsychologia, 139, 107366. https://doi.org/10.1016/j.neuropsychologia.2020.107366

Tulving, E. (1972). Episodic and semantic memory. In E. Tulving & W. Donaldson (Eds.), Organization of memory (pp. 381–403). New York: Academic Press.

Tulving, E. (1983). Elements of episodic memory. New York: Oxford University Press.

Appendix: Description of Image

Top-level heading: Long-term Memory
Sub-category 1: Explicit Memory (conscious effort)
  • Types of Explicit Memory: Semantic Memory (facts and general knowledge) and Episodic Memory (events and experiences)
Sub-category 2: Implicit Memory (without conscious effort)
  • Types of Implicit Memory: Procedural Memory (motor skills) and Priming (enhanced activation)


Semantic Memory

Reviewed by Psychology Today Staff

Semantic memory is a form of long-term memory that comprises a person’s knowledge about the world. Along with episodic memory, it is considered a kind of explicit memory, because a person is consciously aware of the facts, meanings, and other information that it contains.


Semantic memory is key to understanding and describing how everything around us works. Collected over each person’s lifetime of learning, the information in semantic memory—facts, relationships between objects or concepts, and many more abstract details—is invaluable to everyone from kindergartners to gameshow contestants.

The information contained in semantic memory ranges from basic facts such as the meanings of words and what colors different kinds of food are to more complex forms of understanding, such as how certain concepts relate to each other. Semantic memory also reflects the abstract details of one’s own life, such as birth date, hometown, or personal characteristics.

Semantic memory isn’t just a library of trivia: In compiling a vast range of meanings, details about the way things are, and conceptual linkages, it enables one to learn about the world and other people, to use language and share ideas, and to interpret personal experiences, among other important behaviors.

Unlike episodic memory, which reproduces the subjective impressions of past experiences, semantic memory contains information that is context-free—not grounded in a particular time and place. A person who started learning the alphabet on a particular afternoon in childhood doesn’t need to revisit that moment to remember (thanks to semantic memory) that the letter P comes after M .

The base of knowledge contained in semantic memory is accumulated through many moments of learning, from picking up the basics of language in early childhood to grasping complex ideas and systems in class, in conversations, or while reading books. While few of these moments of learning will remain with us as scenes in episodic memory, our brains collect the abstract insights to help us answer questions, communicate, and solve problems in the future.

The medial temporal lobe, which includes the hippocampus, appears to play a role in the creation of semantic memories, but they are ultimately thought to be stored throughout the neocortex, and other areas of the brain are likely involved in the process of retrieving semantic memories.

Semantic memory ability seems to develop earlier in childhood than episodic memory (the memory for personal experiences). In older age, it tends to decline, on average, but remains more stable than episodic memory.

Personal semantic memory is a term that describes semantic memories about one’s own life. These are related to episodic memories, but are considered distinct, since they do not require revisiting a specific moment. While remembering what attending a great concert was like would count as episodic memory, knowing that it was one’s favorite concert is an example of personal semantic memory.

Semantic memory: A review of methods, models, and current challenges

  • Published: 03 September 2020
  • Volume 28, pages 40–80 (2021)

  • Abhilasha A. Kumar

Adult semantic memory has been traditionally conceptualized as a relatively static memory system that consists of knowledge about the world, concepts, and symbols. Considerable work in the past few decades has challenged this static view of semantic memory, and instead proposed a more fluid and flexible system that is sensitive to context, task demands, and perceptual and sensorimotor information from the environment. This paper (1) reviews traditional and modern computational models of semantic memory, within the umbrella of network (free association-based), feature (property generation norms-based), and distributional semantic (natural language corpora-based) models, (2) discusses the contribution of these models to important debates in the literature regarding knowledge representation (localist vs. distributed representations) and learning (error-free/Hebbian learning vs. error-driven/predictive learning), and (3) evaluates how modern computational models (neural network, retrieval-based, and topic models) are revisiting the traditional “static” conceptualization of semantic memory and tackling important challenges in semantic modeling such as addressing temporal, contextual, and attentional influences, as well as incorporating grounding and compositionality into semantic representations. The review also identifies new challenges regarding the abundance and availability of data, the generalization of semantic models to other languages, and the role of social interaction and collaboration in language learning and development. The concluding section advocates the need for integrating representational accounts of semantic memory with process-based accounts of cognitive behavior, as well as the need for explicit comparisons of computational models to human baselines in semantic tasks to adequately assess their psychological plausibility as models of human semantic memory.

Introduction

What does it mean to know what an ostrich is? The question of how meaning is represented and organized by the human brain has been at the forefront of explorations in philosophy, psychology, linguistics, and computer science for centuries. Does knowing the meaning of an ostrich involve having a prototypical representation of an ostrich that has been created by averaging over multiple exposures to individual ostriches? Or does it instead involve extracting particular features that are characteristic of an ostrich (e.g., it is big, it is a bird, it does not fly, etc.) that are acquired via experience, and stored and activated upon encountering an ostrich ? Further, is this knowledge stored through abstract and arbitrary symbols such as words, or is it grounded in sensorimotor interactions with the physical environment? The computation of meaning is fundamental to all cognition, and hence it is not surprising that considerable work has attempted to uncover the mechanisms that contribute to the construction of meaning from experience.

There have been several important historical seeds that have laid the groundwork for conceptualizing how meaning is learned and represented. One of the earliest attempts to study how meaning is represented was by Osgood ( 1952 ; also see Osgood, Suci, & Tannenbaum, 1957) through the use of the semantic differential technique. Osgood collected participants’ ratings of concepts (e.g., peace ) on several polar scales (e.g., hot-cold, good-bad, etc.), and using multidimensional scaling, showed that these ratings aligned themselves along three universal dimensions: evaluative (good-bad), potency (strong-weak), and activity (active-passive). Osgood’s work was important in the following two ways: (1) it introduced an empirical tool to study the nature of semantic representations; (2) it provided early evidence that the meaning of a concept or word may actually be distributed across several dimensions, in contrast to being represented through a localist representation, i.e., through a single dimension, feature, or node. As subsequently discussed, this contrast between localist and distributed meaning representations has led to different modeling approaches to understanding how meaning is learned and represented.

Another important milestone in the study of meaning was the formalization of the distributional hypothesis (Harris, 1970 ), best captured by the phrase “you shall know a word by the company it keeps” (Firth, 1957 ), which dates back to Wittgenstein’s early intuitions (Wittgenstein, 1953 ) about meaning representation. The idea behind the distributional hypothesis is that meaning is learned by inferring how words co-occur in natural language. For example, ostrich and egg may become related because they frequently co-occur in natural language, whereas ostrich and emu may become related because they co-occur with similar words. This distributional principle has laid the groundwork for several decades of work in modeling the explicit nature of meaning representation. Importantly, despite the fact that several distributional models in the literature do make use of distributed representations, it is their learning process of extracting statistical redundancies from natural language that makes them distributional in nature.

Another historically significant event in the study of meaning was Tulving’s ( 1972 ) classic distinction between episodic and semantic memory. Tulving proposed two subdivisions of declarative memory: episodic memory, consisting of memories of experiences linked to specific times and places (e.g., seeing an ostrich at the zoo last month), and semantic memory, storing general knowledge about the world and what verbal symbols (i.e., words) mean in an amodal (i.e., not linked to any specific modality) memory store (e.g., storing what an ostrich is, what it looks like, etc. through words). Although there are long-standing debates regarding the strong distinction between semantic and episodic memory (e.g., McKoon, Ratcliff, & Dell, 1986 ), this dissociation was supported by early neuropsychological studies of amnestic patients who were able to acquire new semantic knowledge without having any concrete memory for having learned this information (Gabrieli, Cohen, & Corkin, 1988 ; O’Kane, Kensinger, & Corkin, 2004 ). Indeed, the relative independence of these two types of memory systems has guided research efforts for many years, as is evidenced by early work on computational models of semantic memory. As described below, this perspective is beginning to change with the onset of more recent modeling perspectives.

These theoretical seeds have driven three distinct approaches to modeling the structure and organization of semantic memory: associative network models, distributional models, and feature-based models. Associative network models are models that represent words as individual nodes in a large memory network, such that words that are related in meaning are connected to each other through edges in the network (e.g., Collins & Loftus, 1975 ; Collins & Quillian, 1969 ). On the other hand, inspired by the distributional hypothesis, Distributional Semantic Models (DSMs) collectively refer to a class of models where the meaning of a word is learned by extracting statistical redundancies and co-occurrence patterns from natural language. Importantly, DSMs provide explicit mechanisms for how words or features for a concept may be learned from the natural environment. Finally, feature models assume that words are represented in memory as a distributed collection of binary features (e.g., birds have wings, whereas cars do not), and the correlation or overlap of these features determines the extent to which words have similar meanings (Smith, Shoben, & Rips, 1974 ; Tversky, 1977 ). Overall, the network-based, feature-based, and distributional approaches to semantic modeling have sparked important debates in the literature and informed our understanding of the different facets involved in the construction of meaning. Therefore, this review attempts to highlight important milestones in the study of semantic memory, identify challenges currently facing the field, and integrate traditional ideas with modern approaches to modeling semantic memory.

In a recent article, Günther, Rinaldi, and Marelli ( 2019 ) reviewed several common misconceptions about distributional semantic models and evaluated the cognitive plausibility of modern DSMs. Although the current review is somewhat similar in scope to Günther et al.’s work, the current paper has different aims. Specifically, this review is a comprehensive analysis of models of semantic memory across multiple fields and tasks and so is not focused only on DSMs. It ties together classic models in psychology (e.g., associative network models, standard DSMs, etc.) with current state-of-the-art models in machine learning (e.g., transformer neural networks, convolutional neural networks, etc.) to elucidate the potential psychological mechanisms that these fields posit to underlie semantic retrieval processes. Further, the present work reviews the literature on modern multimodal semantic models, compositional semantics, and newer retrieval-based models, and therefore assesses these newer models and applications from a psychological perspective. Therefore, while Günther et al.’s review serves the role of clarifying how DSMs may indeed represent a cognitively plausible account of how meaning is learned, the present review serves the role of presenting a more comprehensive assessment and synthesis of multiple classes of models, theories, and learning mechanisms, as well as drawing closer ties between the rapidly progressing machine-learning literature and the constraints imposed by psychological research on semantic memory – two fields that have so far been only loosely connected to each other. Therefore, the goal of the present review is to survey the current state of the field by tying together work from psychology, computational linguistics, and computer science, and also identify new challenges to guide future empirical research in modeling semantic memory.

This review emphasizes five important areas of research in semantic memory. The first section presents a modern perspective on the classic issues of semantic memory representation and learning. Associative, feature-based, and distributional semantic models are introduced and discussed within the context of how these models speak to important debates that have emerged in the literature regarding semantic versus associative relationships, prediction, and co-occurrence. In particular, a distinction is drawn between distributional models that propose error-free versus error-driven learning mechanisms for constructing meaning representations, and the extent to which these models explain performance in empirical tasks. Overall, although empirical tasks have partly informed computational models of semantic memory, the empirical and computational approaches to studying semantic memory have developed somewhat independently. Therefore, the first section attempts to bridge this gap by integrating empirical findings from lexical decision, pronunciation, and categorization tasks, with modeling approaches such as large-scale associative semantic networks (e.g., De Deyne, Navarro, Perfors, Brysbaert, & Storms, 2019 ; Steyvers & Tenenbaum, 2005 ), error-free learning-based DSMs (e.g., Jones & Mewhort, 2007 ; Landauer & Dumais, 1997 ), as well as error-driven learning-based models (e.g., Mikolov, Chen, Corrado, & Dean, 2013 ).

The second section presents an overview of psychological research in favor of conceptualizing semantic memory as part of a broader integrated memory system (Jamieson, Avery, Johns, & Jones, 2018 ; Kwantes, 2005 ; Yee, Jones, & McRae, 2018 ). The idea of semantic memory representations being context-dependent is discussed, based on findings from episodic memory tasks, sentence processing, and eye-tracking studies (e.g., Yee & Thompson-Schill, 2016 ). These empirical findings are then integrated with modern approaches to modeling semantic memory as a dynamic system that is sensitive to contextual dependencies, and can account for polysemy and attentional influences through topic models (Griffiths, Steyvers, & Tenenbaum, 2007 ), recurrent neural networks (Elman, 1991 ; Peters et al., 2018 ), and attention-based neural networks (Devlin, Chang, Lee, & Toutanova, 2019 ; Vaswani et al., 2017 ). The remainder of the section discusses the psychological plausibility of a relatively new class of models (retrieval-based models, e.g., Jamieson et al., 2018 ) that question the need for “learning” meaning at all, and instead propose that semantic representations are merely a product of retrieval-based operations in response to a cue, therefore blurring the traditional distinction between semantic and episodic memory (Tulving, 1972 ).

The third section discusses the issue of grounding , and how sensorimotor input and environmental interactions contribute to the construction of meaning. First, empirical findings from sensorimotor priming and cross-modal priming studies are discussed, which challenge the static, amodal, lexical nature of semantic memory that has been the focus of the majority of computational semantic models. There is now accumulating evidence that meaning cannot be represented exclusively through abstract, amodal symbols such as words (Barsalou, 2016). These critiques of amodal computational models are then examined, with a focus on the extent to which models that lack perceptual and motor systems can serve as psychologically plausible accounts of semantic memory. Next, state-of-the-art computational models such as convolutional neural networks (Collobert et al., 2011), feature-integrated DSMs (Andrews, Vigliocco, & Vinson, 2009; Howell, Jankowicz, & Becker, 2005; Jones & Recchia, 2010), and multimodal DSMs (Bruni, Tran, & Baroni, 2014; Lazaridou, Pham, & Baroni, 2015) are discussed within the context of how these models are incorporating non-linguistic information in the learning process and tackling the grounding problem.

The fourth section focuses on the issue of compositionality , i.e., how words can be effectively combined and scaled up to represent higher-order linguistic structures such as sentences, paragraphs, or even episodic events. In particular, some early approaches to modeling compositional structures like vector addition (Landauer & Dumais, 1997 ), frequent phrase extraction (Mikolov, Sutskever, Chen, Corrado, & Dean, 2013 ), and finding linguistic patterns in sentences (Turney & Pantel, 2010 ) are discussed. The rest of the section focuses on modern approaches to representing higher-order structures through hierarchical tree-based neural networks (Socher et al., 2013 ) and modern recurrent neural networks (Elman & McRae, 2019 ; Franklin, Norman, Ranganath, Zacks, & Gershman, 2019 ).

The fifth and final section focuses on some open issues in semantic modeling, such as proposing models that can be applied to other languages, issues related to data abundance and availability, understanding the social and evolutionary roles of language, and finding mechanistic process-based accounts of model performance. These issues shed light on important next steps in the study of semantic memory and will be critical in advancing our understanding of how meaning is constructed and guides cognitive behavior.

Many tasks, many models

Before delving into the details of each of the sections, it is important to emphasize here that models of semantic memory are inextricably tied to the behaviors and tasks that they seek to explain. For example, associative network models and early feature-based models explained response latencies in sentence verification tasks (e.g., deciding whether “a canary is a bird” is true or false). Similarly, early semantic models accounted for higher-order semantic relationships that emerge out of similarity judgments (e.g., Osgood, Suci, & Tannenbaum, 1957 ), although several of these models have since been applied to other tasks. Indeed, the study of meaning has spanned a variety of tasks, models, and phenomena, including but not limited to semantic priming effects in lexical decision tasks (Balota & Lorch, 1986 ), false memory paradigms (Deese, 1959 ; Roediger & McDermott, 1995 ), sentence verification (Smith et al., 1974 ), sentence comprehension (Duffy, Morris, & Rayner, 1988 ; Rayner & Frazier, 1989 ), and argument reasoning (Niven & Kao, 2019 ) tasks. Importantly, the cognitive processes underlying the sentence verification task may vastly differ from those underlying similarity judgments, which in turn may also differ from the processes underlying other complex tasks like reading comprehension and argument reasoning, and it is unclear whether and how a model of semantic memory that can successfully explain behavior in one task would be able to explain behavior in an entirely different task.

Of course, the ultimate goal of the semantic modeling enterprise is to propose one model of semantic memory that can be flexibly applied to a variety of semantic tasks, in an attempt to mirror the flexible and complex ways in which humans use knowledge and language (see, e.g., Balota & Yap, 2006 ). However, it is important to underscore the need to separate representational accounts from process -based accounts in the field. Modern approaches to modeling the representational nature of semantic memory have come very far in describing the continuum in which meaning exists, i.e., from the lowest-level input in the form of sensory and perceptual information, to words that form the building blocks of language, to high-level structures like schemas and events. However, process models operating on these underlying semantic representations have not received the same kind of attention and have developed somewhat independently from the representation modeling movement. For example, although process models like the drift-diffusion model (Ratcliff & McKoon, 2008 ), the optimal foraging model (Hills, 2006 ), and the temporal context model (Howard & Kahana, 2002 ) have been applied to some semantic tasks like verbal fluency (Hills, Jones, & Todd, 2012 ), free association (Howard, Shankar, & Jagadisan, 2011 ), and semantic judgments (e.g., Pirrone, Marshall, & Stafford, 2017 ), their application to different tasks remains limited and most research has instead focused on representational issues. Ultimately, combining process-based accounts with representational accounts is going to be critical in addressing some of the current challenges in the field, an issue that is emphasized in the final section of this review.

I. Semantic memory representation and learning

How individuals represent knowledge of concepts is one of the most important questions in semantic memory research and cognitive science. Therefore, significant research on human semantic memory has focused on issues related to memory representation and given rise to three distinct classes of models: associative network models, feature-based models, and distributional semantic models. This section presents a broad overview of these models, and also discusses some important debates regarding memory representation that these models have sparked in the field. Another related fundamental question in semantic memory research is regarding the learning of concepts, and the remainder of this section focuses on semantic models that subscribe to two broad psychological mechanisms (error-free and error-driven learning) that have been posited to underlie the learning of meaning representations.

Semantic memory representation

Network-based approaches

Network-based approaches to semantic memory have a long and rich tradition rooted in psychology and computer science. Collins and Quillian ( 1969 ) investigated how individuals navigate through semantic memory to verify the truth of sentences (e.g., the time taken to verify that a shark <has fins>), and found that retrieval times were most consistent with a hierarchically organized memory network (see Fig. 1 ), where nodes represented words, and links or edges represented semantic propositions about the words (e.g., fish was connected to animal by an “is a” link, and fish also had its own attributes such as <has fins> and <can swim>). The mechanistic account of these findings was through a spreading activation framework (Quillian, 1967 , 1969 ), according to which individual nodes in the network are activated, which in turn leads to the activation of neighboring nodes, and the network is traversed until the desired node or proposition is reached and a response is made. Interestingly, the number of steps taken to traverse the path in the proposed memory network predicted the time taken to verify a sentence in the original Collins and Quillian ( 1969 ) model. However, the original model could not explain typicality effects (e.g., why individuals respond faster to “ robin <is a> bird ” compared to “ ostrich <is a> bird ”), and also encountered difficulties in explaining differences in latencies for “false” sentences (e.g., why individuals are slower to reject “ butterfly <is a> bird” compared to “ dolphin <is a> bird”). Collins and Loftus ( 1975 ) later proposed a revised network model where links between words reflected the strength of the relationship, thereby eliminating the hierarchical structure from the original model to better account for behavioral patterns. This network/spreading activation framework was extensively applied to more general theories of language, memory, and problem solving (e.g., Anderson, 2000 ).

Figure 1: Original network proposed by Collins and Quillian (1969). Reprinted from Balota and Coane (2008)
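To make the spreading-activation idea concrete, the following is a minimal Python sketch of activation propagating outward through a toy hierarchical network. The node names, edge list, decay parameter, and step count are illustrative assumptions, not Collins and Quillian’s actual implementation.

```python
# Toy spreading-activation sketch over a small hierarchical semantic network.
# The edge list, decay rate, and step count are illustrative only.
from collections import defaultdict

edges = {
    "animal":  ["bird", "fish"],
    "bird":    ["animal", "canary", "ostrich", "has_wings", "can_fly"],
    "fish":    ["animal", "shark", "has_fins", "can_swim"],
    "canary":  ["bird", "can_sing"],
    "ostrich": ["bird", "is_tall"],
    "shark":   ["fish", "can_bite"],
}

def spread(source, steps=4, decay=0.5):
    """Propagate activation outward from `source`, attenuating by `decay` per link."""
    activation = defaultdict(float)
    activation[source] = 1.0
    frontier = {source}
    for _ in range(steps):
        next_frontier = set()
        for node in frontier:
            for neighbour in edges.get(node, []):
                passed = activation[node] * decay
                if passed > activation[neighbour]:
                    activation[neighbour] = passed
                    next_frontier.add(neighbour)
        frontier = next_frontier
    return activation

# Properties reached in fewer steps receive more activation, mirroring faster
# verification times for propositions stored closer in the hierarchy.
act = spread("canary")
print(act["can_sing"], act["has_wings"], act["has_fins"])   # 0.5, 0.25, 0.0625
```

Replacing the uniform decay with link-specific strengths would move the sketch toward the revised Collins and Loftus (1975) picture, in which activation depends on the strength of each connection rather than on hierarchy alone.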

Computational network-based models of semantic memory have gained significant traction in the past decade, mainly due to the recent popularity of graph theoretical and network-science approaches to modeling cognitive processes (for a review, see Siew, Wulff, Beckage, & Kenett, 2018). Modern network-based approaches use large-scale databases to construct networks and capture large-scale relationships between nodes within the network. This approach has been used to empirically study the World Wide Web (Albert, Jeong, & Barabási, 2000; Barabási & Albert, 1999), biological systems (Watts & Strogatz, 1998), language (Steyvers & Tenenbaum, 2005; Vitevitch, Chan, & Goldstein, 2014), and personality and psychological disorders (for reviews, see Fried et al., 2017). Within the study of semantic memory, Steyvers and Tenenbaum (2005) pioneered this approach by constructing three different semantic networks using large-scale free-association norms (Nelson, McEvoy, & Schreiber, 2004), Roget’s Thesaurus (Roget, 1911), and WordNet (Fellbaum, 1998; Miller, 1995). They presented several analyses to indicate that semantic networks possessed a “small-world structure,” as indexed by high clustering coefficients (a parameter that estimates the likelihood that neighbors of two nodes will be neighbors themselves) and short average path lengths (a parameter that estimates the average number of steps from one node in the network to another), similar to several naturally occurring networks. Importantly, network metrics such as path length and clustering coefficients provide a quantitative way of estimating the large-scale organizational structure of semantic memory and the strength of relationships between words in a network (see Fig. 2), and have also proven to be successful in explaining performance across a wide variety of tasks, such as relatedness judgments (De Deyne & Storms, 2008; Kenett, Levi, Anaki, & Faust, 2017; Kumar, Balota, & Steyvers, 2019), verbal fluency (Abbott, Austerweil, & Griffiths, 2015; Zemla & Austerweil, 2018), and naming (Steyvers & Tenenbaum, 2005). Other work in this area has explored the influence of semantic network metrics on psychological disorders (Kenett, Gold, & Faust, 2016), creativity (Kenett, Anaki, & Faust, 2014), and personality (Beaty et al., 2016).

Figure 2: Large-scale visualization of a directed semantic network created by Steyvers and Tenenbaum (2005), showing the shortest path from RELEASE to ANCHOR. Adapted from Kumar, Balota, and Steyvers (2019)
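As a rough illustration of these metrics, the sketch below computes a clustering coefficient, an average path length, and a word-to-word path length on a toy association network using the networkx library. The edge list is invented for the example, and the values bear no relation to those reported for the full free-association norms.

```python
# Small-world indices on a toy association network (requires networkx).
# The edges stand in for free-association norms and are illustrative only.
import networkx as nx

G = nx.Graph()
G.add_edges_from([
    ("ostrich", "emu"), ("ostrich", "bird"), ("emu", "bird"),
    ("bird", "wings"), ("wings", "fly"), ("bird", "fly"),
    ("fly", "plane"), ("plane", "airport"), ("airport", "travel"),
])

# Clustering coefficient: how often a node's neighbours are themselves connected.
print("average clustering:", nx.average_clustering(G))

# Average shortest path length: mean number of edges separating any two words.
print("average path length:", nx.average_shortest_path_length(G))

# Path length between two specific words, the kind of quantity used to predict
# relatedness judgments for directly and distantly related pairs.
print(nx.shortest_path_length(G, "ostrich", "travel"))
```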

Despite the success of modern semantic networks at predicting cognitive performance, there is some skepticism in the field regarding the use of free-association norms to create network representations (Jones, Hills, & Todd, 2015 ; Siew et al., 2018 ). Specifically, it is not clear whether networks constructed from association norms are indeed a representational account of semantic memory or simply reflect the product of a retrieval-based process on an underlying representation of semantic memory. For example, producing the response ostrich to the word emu represents a retrieval-based process cued by the word emu , and may not necessarily reflect how the underlying representation of the words came to be closely associated in the first place. Therefore, it appears that associative network models lack an explicit mechanism through which concepts were learned in the first place.

A recent example of this fundamental debate regarding the origin of the representation comes from research on the semantic fluency task, where participants are presented with a natural category label (e.g., “animals”) and are required to generate as many exemplars from that category (e.g., lion , tiger , elephant …) as possible within a fixed time period. Hills, Jones, and Todd ( 2012 ) proposed that the temporal pattern of responses produced in the fluency task mimics optimal foraging techniques found among animals in natural environments. They provided a computational account of this search process based on the BEAGLE model (Jones & Mewhort, 2007 ). However, Abbott et al. ( 2015 ) contended that the behavioral patterns observed in the task could also be explained by a more parsimonious random walk on a network representation of semantic memory created from free-association norms. This led to a series of rebuttals from both camps (Jones, Hills, & Todd, 2015 ; Nematzadeh, Miscevic, & Stevenson, 2016 ), and continues to remain an open debate in the field (Avery & Jones, 2018 ). However, Jones, Hills, and Todd ( 2015 ) argued that while free-association norms are a useful proxy for memory representation, they remain an outcome variable from a search process on a representation and cannot be a pure measure of how semantic memory is organized. Indeed, Avery and Jones ( 2018 ) showed that when the input to the network and distributional space was controlled (i.e., both were constructed from text corpora), random walk and foraging-based models both explained semantic fluency data, although the foraging model outperformed several different random walk models. Of course, these findings are specific to the semantic fluency task and adequately controlled comparisons of network models to DSMs remain limited. However, this work raises the question of whether the success of association networks in explaining behavioral performance in cognitive tasks is a consequence of shared variance with the cognitive tasks themselves (e.g., fluency tasks can be thought of as association tasks constrained to a particular category) or truly reflects the structural representation of semantic memory, an issue that is discussed in detail in the section summary. Nonetheless, recent work in this area has focused on creating network representations using a learning model instead of behavioral data (Nematzadeh et al., 2016 ), and also advocated for alternative representations that incorporate such learning mechanisms and provide a computational account of how word associations might be learned in the first place.
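In the spirit of the random-walk account (and not a reimplementation of Abbott et al.’s model), the sketch below lets an unconstrained walk wander over a toy association network and treats first visits to category members as fluency responses. The network and category membership are invented for the example.

```python
# Sketch: a random walk over a toy association network that emits first visits
# to "animal" nodes as fluency responses. Network and category are illustrative.
import random

network = {
    "animal":   ["lion", "tiger", "dog"],
    "lion":     ["tiger", "mane", "animal"],
    "tiger":    ["lion", "stripes", "animal"],
    "dog":      ["cat", "bone", "animal"],
    "cat":      ["dog", "whiskers", "animal"],
    "mane":     ["lion"],
    "stripes":  ["tiger"],
    "bone":     ["dog"],
    "whiskers": ["cat"],
}
category = {"lion", "tiger", "dog", "cat"}

def fluency_walk(start="animal", n_steps=200, seed=1):
    random.seed(seed)
    current, produced = start, []
    for _ in range(n_steps):
        current = random.choice(network[current])
        if current in category and current not in produced:
            produced.append(current)   # first visits become overt responses
    return produced

# Related exemplars (lion, tiger ...) tend to be produced close together,
# the temporal patterning that foraging accounts also try to explain.
print(fluency_walk())
```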

Feature-based approaches

Feature-based models depart from the traditional notion that a word has a localized representation (e.g., in an association network). The core idea behind feature models is that words are represented in memory as a collection of binary features (e.g., birds have wings , whereas cars do not), and the correlation or overlap of these features determines the extent to which words have similar meanings. Smith et al. ( 1974 ) proposed a feature-comparison model in which concepts had two types of semantic features: defining features that are shared by all concepts, and characteristic features that are specific to only some exemplars. For example, all birds <have wings> (defining feature) but not all birds <fly> (characteristic feature). Similarity between concepts in this model was computed through a feature comparison process, and the degree of overlap between the features of two concepts directly predicted sentence verification times, typicality effects, and differences in response times in responding to “false” sentences. This notion of featural overlap as an index of similarity was also central to the theory of feature matching proposed by Tversky ( 1977 ). Tversky viewed similarity as a set-theoretical matching function, such that the similarity between a and b could be conceptualized through a contrast model as a function of features that are common to both a and b (common features), and features that belong to a but not b , as well as features that belong to b but not a (distinctive features). Tversky’s contrast model successfully accounted for asymmetry in similarity judgments and judgments of difference for words, shapes, and letters.
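The general logic can be illustrated with a few lines of code: similarity computed as a weighted contrast between common and distinctive features. The feature sets, weights, and words below are assumptions for illustration, not McRae-style property norms.

```python
# Feature-overlap similarity and a Tversky-style contrast score.
# Feature sets and parameter values are illustrative, not published norms.
def tversky(a, b, theta=1.0, alpha=0.5, beta=0.5):
    """Contrast model: common features weighed against the two distinctive sets."""
    return theta * len(a & b) - alpha * len(a - b) - beta * len(b - a)

ostrich = {"is_bird", "has_feathers", "has_wings", "is_big", "cannot_fly"}
emu     = {"is_bird", "has_feathers", "has_wings", "is_big", "cannot_fly", "lives_in_australia"}
car     = {"has_wheels", "has_engine", "is_big"}

print(tversky(ostrich, emu))   # high: large featural overlap
print(tversky(ostrich, car))   # low: almost no overlap

# Unequal weights on the two distinctive sets produce asymmetric similarity,
# one of the judgment patterns the contrast model was designed to capture.
print(tversky(emu, ostrich, alpha=0.8, beta=0.2),
      tversky(ostrich, emu, alpha=0.8, beta=0.2))
```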

Although early feature-based models of semantic memory set the groundwork for modern approaches to semantic modeling, none of the models had any systematic way of measuring these features (e.g., Smith et al., 1974 , applied multidimensional scaling to similarity ratings to uncover underlying features). Later versions of feature-based models thus focused on explicitly coding these features into computational models by using norms from property-generation tasks (McRae, De Sa, & Seidenberg, 1997 ). To obtain these norms, participants were asked to list features for concepts (e.g., for the word ostrich , participants may list <is a> bird, <has wings>, <is heavy>, and <does not fly> as features), the idea being that these features constitute explicit knowledge participants have about a concept. McRae et al. then used these features to train a model using simple correlational learning algorithms (see next subsection) applied over a number of iterations, which enabled the network to settle into a stable state that represented a learned concept. A critical result of this modeling approach was that correlations among features predicted response latencies in feature-verification tasks in human participants as well as model simulations. Importantly, this approach highlighted how statistical regularities among features may be encoded in a memory representation over time. Subsequent work in this line of research demonstrated how feature correlations predicted differences in priming for living and nonliving things and explained typicality effects (McRae, 2004 ).

Despite the success of computational feature-based models, an important limitation common to both network and feature-based models was their inability to explain how knowledge of individual features or concepts was learned in the first place. For example, while feature-based models can explain that ostrich and emu are similar because both <have feathers>, how did an individual learn that <having feathers> is a feature that an ostrich or emu has? McRae et al. claimed that features were derived from repeated multimodal interactions with exemplars of a particular concept, but how this learning process might work in practice was missing from the implementation of these models. Still, feature-based models have been very useful in advancing our understanding of semantic memory structure, and the integration of feature-based information with modern machine-learning models continues to remain an active area of research (see Section III).

Distributional approaches

Distributional Semantic Models (DSMs) refer to a class of models that provide explicit mechanisms for how words or features for a concept may be learned from the natural environment. Therefore, unlike associative network models or feature-based models, DSMs do not use free-association responses or feature norms, but instead build representations by directly extracting statistical regularities from a large natural language corpus (e.g., books, newspapers, online articles, etc.), the assumption being that large text corpora are a good proxy for the language that individuals are exposed to in their lifetime. The principle of extracting co-occurrence patterns and inferring associations between concepts/words from a large text-corpus is at the core of all DSMs, but exactly how these patterns are extracted has important implications for how these models conceptualize the learning process. Specifically, two distinct psychological mechanisms have been proposed to account for associative learning, broadly referred to as error-free and error-driven learning mechanisms. Error-free learning mechanisms refer to a class of psychological mechanisms that posit that learning occurs by identifying clusters of events that tend to co-occur in temporal proximity, and dates back to Hebb’s ( 1949 ; also see McCulloch & Pitts, 1943) proposal of how individual neurons in the brain adjust their firing rates and activations in response to activations of other neighboring neurons. This Hebbian learning mechanism is at the heart of several classic and recent models of semantic memory, which are discussed in this section. On the other hand, error-driven learning mechanisms posit that learning is accomplished by predicting events in response to a stimulus, and then applying an error-correction mechanism to learn associations. Error-correction mechanisms often vary across learning models but broadly share principles with Rescorla and Wagner’s ( 1972 ) model of animal cognition, where they described how learning may actually be driven by expectation error, instead of error-free associative learning (Rescorla, 1988 ). This section reviews DSMs that are consistent with the error-free and error-driven learning approaches to constructing meaning representations, and the summary section discusses the evidence in favor of and against each class of models.
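The difference between the two learning modes can be seen in a few lines of code: a Hebbian-style count grows with every co-occurrence, whereas a Rescorla-Wagner (delta-rule) weight is updated only in proportion to the prediction error and therefore levels off once the outcome is fully predicted. The learning rate and trial structure below are arbitrary choices for the illustration.

```python
# Error-free (Hebbian) vs. error-driven (Rescorla-Wagner) updating of a single
# cue-outcome association. Learning rate and number of trials are arbitrary.
n_trials, lr = 50, 0.2
hebb, rw = 0.0, 0.0

for _ in range(n_trials):
    outcome = 1.0                 # the cue and the outcome co-occur on every trial
    hebb += lr * outcome          # error-free: strength grows with every pairing
    rw   += lr * (outcome - rw)   # delta rule: update is proportional to surprise

print(round(hebb, 2), round(rw, 3))   # Hebbian count keeps climbing; RW asymptotes near 1.0
```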

Semantic memory learning

Error-free learning-based DSMs

One of the earliest DSMs, the Hyperspace Analogue to Language (HAL; Lund & Burgess, 1996), built semantic representations by counting the co-occurrences of words within a sliding window of five to ten words, where the co-occurrence strength between any two words was inversely proportional to the distance between them in that window. These local co-occurrences produced a word-by-word co-occurrence matrix that served as a spatial representation of meaning, such that words that were semantically related were closer in a high-dimensional space (see Fig. 3; ear, eye, and nose all acquire very similar representations). This relatively simple error-free learning mechanism was able to account for a wide variety of cognitive phenomena in tasks such as lexical decision and categorization (Li, Burgess, & Lund, 2000). However, HAL encountered difficulties in accounting for mediated priming effects (Livesay & Burgess, 1998; see section summary for details), which was taken as evidence in favor of semantic network models. Nonetheless, despite its limitations, HAL was an important step in the ongoing development of DSMs.

Figure 3: The high-dimensional space produced by HAL from co-occurrence word vectors. Adapted from Lund and Burgess (1996)
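A minimal sketch of the HAL-style counting scheme is shown below: a sliding window moves over the text and each co-occurrence is weighted inversely by the distance between the two words. The toy corpus, window size, and exact weighting function are illustrative assumptions rather than Lund and Burgess’s parameters.

```python
# HAL-style distance-weighted co-occurrence counts from a sliding window.
# Corpus, window size, and weighting are illustrative only.
from collections import defaultdict

corpus = "the ostrich flapped its wings while the emu flapped its wings".split()
window = 5
counts = defaultdict(float)   # sparse word-by-word co-occurrence "matrix"

for i, target in enumerate(corpus):
    for j in range(max(0, i - window), i):
        distance = i - j
        # Nearer words contribute more: weight decreases with distance in the window.
        counts[(target, corpus[j])] += (window - distance + 1) / window

# Each word's row of weighted counts serves as its spatial representation; words
# used in similar contexts (ostrich, emu) end up with similar rows.
for pair, weight in sorted(counts.items(), key=lambda kv: -kv[1])[:5]:
    print(pair, round(weight, 2))
```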

Another popular distributional model that has been widely applied across cognitive science is Latent Semantic Analysis (LSA; Landauer & Dumais, 1997 ), a semantic model that has successfully explained performance in several cognitive tasks such as semantic similarity (Landauer & Dumais, 1997 ), discourse comprehension (Kintsch, 1998 ), and essay scoring (Landauer, Laham, Rehder, & Schreiner, 1997 ). LSA begins with a word-document matrix of a text corpus, where each row represents the frequency of a word in each corresponding document, which is clearly different from HAL’s word-by-word co-occurrence matrix. Further, unlike HAL, LSA first transforms these simple frequency counts into log frequencies weighted by the word’s overall importance over documents, to de-emphasize the influence of unimportant frequent words in the corpus. This transformed matrix is then factorized using truncated singular value decomposition, a factor-analytic technique used to infer latent dimensions from a multidimensional representation. The semantic representation of a word can then be conceptualized as an aggregate or distributed pattern across a few hundred dimensions. The construction of a word-by-document matrix and the dimensionality reduction step are central to LSA and have the important consequence of uncovering global or indirect relationships between words even if they never co-occurred with each other in the original context of documents. For example, lion and stripes may have never co-occurred within a sentence or document, but because they often occur in similar contexts of the word tiger , they would develop similar semantic representations. Importantly, the ability to infer latent dimensions and extend the context window from sentences to documents differentiates LSA from a model like HAL.
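The core LSA pipeline can be sketched in a few lines of numpy: build a word-by-document count matrix, dampen raw frequencies, factorize with truncated SVD, and compare words in the reduced space. The tiny matrix and the plain log transform below are stand-ins; LSA proper uses log-entropy weighting over large corpora and retains a few hundred dimensions.

```python
# LSA-style reduction of a word-by-document count matrix (numpy only).
# The counts and the simple log transform are illustrative stand-ins.
import numpy as np

# Rows = words (lion, tiger, stripes, ostrich); columns = documents.
counts = np.array([
    [4, 3, 0, 0],
    [3, 4, 1, 0],
    [2, 3, 0, 0],
    [0, 0, 3, 4],
], dtype=float)

weighted = np.log1p(counts)                       # dampen raw frequencies
U, S, Vt = np.linalg.svd(weighted, full_matrices=False)
k = 2
word_vectors = U[:, :k] * S[:k]                   # truncated latent representation

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

print(cosine(word_vectors[0], word_vectors[1]))   # lion-tiger: similar document profiles
print(cosine(word_vectors[0], word_vectors[3]))   # lion-ostrich: dissimilar profiles
```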

Despite its widespread application and success, LSA has been criticized on several grounds over the years, e.g., for ignoring word transitions (Perfetti, 1998 ), violating power laws of connectivity (Steyvers & Tenenbaum, 2005 ), and for the lack of a mechanism for learning incrementally (Jones, Willits, & Dennis, 2015 ). The last point is particularly important, as the LSA model assumes that meaning is learned and computed after a large amount of co-occurrence information is available (i.e., in the form of a word-by-document matrix). This is clearly unconvincing from a psychological standpoint and is often cited as a reason for distributional models being implausible psychological models (Hoffman, McClelland, & Lambon Ralph, 2018 ; Sloutsky, Yim, Yao, & Dennis, 2017 ). However, as Günther et al. ( 2019 ) have recently noted, this is an argument against batch-learning models like LSA, and not distributional models per se. In principle, LSA can learn incrementally by updating the co-occurrence matrix as each input is received and re-computing the latent dimensions (for a demonstration, see Olney, 2011 ), although this process would be computationally very expensive. In addition, there are several modern DSMs that are incremental learners and propose psychologically plausible accounts of semantic representation.

One such incremental approach involves developing random representations of words that slowly accumulate information about meaning through repeated exposure to words in a large text corpus. For example, Bound Encoding of the Aggregate Language Environment (BEAGLE; Jones & Mewhort, 2007 ) is a random vector accumulation model that gradually builds semantic representations as it processes text in sentence-sized context windows. BEAGLE begins by assigning a random, static environmental vector to a word the first time it is encountered in the corpus. This environmental vector does not change over different exposures of the word and is hypothesized to represent stable physical characteristics about the word. When words co-occur in a sentence, their environmental vectors are added to each other’s representations, and, thus, their memory representations become similar over time. Further, even if two words never co-occur, they develop similar representations if they co-occur with the same words. This leads to the formation of higher-order relationships between words, without performing any LSA-type dimensionality reduction. Importantly, BEAGLE integrates this context-based information with word-order information using a technique called circular convolution (an effective method to combine two n-dimensional vectors into an associated vector of the same dimensions). BEAGLE computes order information by binding together all word chunks (formally called n -grams) that a particular word is part of (e.g., for the sentence “an ostrich flapped its wings”, the two-gram convolution would bind the representations for < an , ostrich > and < ostrich , flapped > together) and then summing this order vector with the word’s context vector to compute the final semantic representation of the word. Thus, words that co-occur in similar contexts as well as in the same syntactic positions develop similar representations as the model acquires more experience through the corpus. BEAGLE outperforms several classic models of word representation (e.g., LSA and HAL), and explains performance on several complex tasks, such as mediated priming effects in lexical decision and pronunciation tasks, typicality effects in exemplar categorization, and reading times in stem completion tasks (Jones & Mewhort, 2007 ). Importantly, through the addition of environmental vectors of words whenever they co-occur, BEAGLE also indirectly infers relationships between words that did not directly co-occur. This process is similar in principle to inferring indirect co-occurrences across documents in LSA and can be thought of as an abstraction-based process applied to direct co-occurrence patterns, albeit through different mechanisms. Other incremental models use ideas similar to BEAGLE for accumulating semantic information over time, although they differ in their theoretical underpinnings (Howard et al., 2011 ; Sahlgren, Holst, & Kanerva, 2008 ) and the extent to which they integrate order information in the final representations (Kanerva, 2009 ). It is important to note here that the DSMs discussed so far (HAL, LSA, and BEAGLE) all share the principle of deriving meaning representations through error-free learning mechanisms, in the spirit of Hebbian associative learning. The following section discusses other DSMs that also produce rich semantic representations but are instead based on error-driven learning mechanisms or prediction.
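The sketch below captures the two BEAGLE ingredients described above, context accumulation and order binding via circular convolution, in simplified form. The corpus, dimensionality, and restriction to adjacent-word bigrams are assumptions of the illustration, not the full model.

```python
# Simplified BEAGLE-style random vector accumulation with circular convolution.
# Corpus, dimensionality, and bigram-only order binding are illustrative.
import numpy as np

rng = np.random.default_rng(0)
dim = 256
sentences = [["an", "ostrich", "flapped", "its", "wings"],
             ["an", "emu", "flapped", "its", "wings"]]

vocab = {w for s in sentences for w in s}
env = {w: rng.normal(0, 1 / np.sqrt(dim), dim) for w in vocab}  # static environmental vectors
mem = {w: np.zeros(dim) for w in vocab}                          # accumulating memory vectors

def cconv(a, b):
    """Circular convolution (via FFT): binds two vectors into one of the same size."""
    return np.real(np.fft.ifft(np.fft.fft(a) * np.fft.fft(b)))

for sent in sentences:
    for i, word in enumerate(sent):
        for j, other in enumerate(sent):
            if j != i:
                mem[word] += env[other]          # context: sum co-occurring words' vectors
        if i > 0:
            mem[word] += cconv(env[sent[i - 1]], env[word])   # order: bind with left neighbour
        if i < len(sent) - 1:
            mem[word] += cconv(env[word], env[sent[i + 1]])   # order: bind with right neighbour

def cos(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# ostrich and emu never co-occur, yet shared sentence contexts (and shared
# syntactic position) give them similar memory vectors.
print(round(cos(mem["ostrich"], mem["emu"]), 2))
```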

Error-driven learning-based DSMs

In contrast to error-free learning DSMs, a different approach to building semantic representations has focused on how representations may slowly develop through prediction and error-correction mechanisms. These models are also referred to as connectionist models and propose that meaning emerges through prediction-based weighted interactions between interconnected units (Rumelhart, Hinton, & McClelland, 1986 ). Most connectionist models typically consist of an input layer, an output layer, and one or more intervening units collectively called the hidden layers, each of which contains one or more “nodes” or units. Activating the nodes of the input layer (through an external stimulus) leads to activation or suppression of units connected to the input units, as a function of the weighted connection strengths between the units. Activation gradually reaches the output units, and the relationship between output units and input units is of primary interest. Learning in connectionist models (sometimes called feed-forward networks if there are no recurrent connections, see section II), can be accomplished in a supervised or unsupervised manner. In supervised learning, the network tries to maximize the likelihood of a desired goal or output for a given set of input units by predicting outputs at every iteration. The weights of the signals are thus adjusted to minimize the error between the target output and the network’s output, through error backpropagation (Rumelhart, Hinton, & Williams, 1988 ). In unsupervised learning, weights within the network are adjusted based on the inherent structure of the data, which is used to inform the model about prediction errors (e.g., Mikolov, Chen, et al., 2013 ; Mikolov, Sutskever, et al., 2013 ).

Rumelhart and Todd ( 1993 ) proposed one of the first feed-forward connectionist models of semantic memory. To train the network, all facts represented in a traditional semantic network (e.g., Collins & Quillian, 1969 ) were first converted to input-output training pairs (e.g., the fact bird <has wings> was converted to term 1 : bird – relation : has – term 2 : wings). Then, the network learned semantic representations in a supervised manner, by turning on the input and relation units, and backpropagating the error from predicted output units through two hidden layers. For example, the words oak and pine acquired a similar pattern of activation across the hidden units because their node-relations pairs were similar during training. Additionally, the network was able to hierarchically learn information about new concepts (e.g., adding the sparrow <is a> bird link in the model formed a new representation for sparrow that also included relations like <has wings>, <can fly>, etc.). Connectionist networks are sometimes also called neural networks (NNs) to emphasize that connectionist models (old and new) are inspired by neurobiology and attempt to model how the brain might process incoming input and perform a particular task, although this is a very loose analogy and modern researchers do not view neural networks as accurate models of the brain (Bengio, Goodfellow, & Courville, 2015 ).

A feed-forward NN model, word2vec, proposed by researchers at Google (Mikolov, Chen, et al., 2013 ) has gained immense popularity in the last few years due to its impressive performance on a variety of semantic tasks. Word2vec is a two-layer NN model that has two versions: continuous bag-of-words (CBOW) and skip-gram. The objective of the CBOW model is to predict a target word, given four context words before and after the intended word, using a classifier. The skip-gram model reverses this objective and attempts to predict the surrounding context words, given an input word (see Figs. 4 and 5 ). In this way, word2vec trains the network on a surrogate task and iteratively improves the word representations or “embeddings” (represented via the hidden layer units) formed during this process by computing stochastic gradient descent, a common technique to compute prediction error for backpropagation in NN models. Further, word2vec tweaks several hyperparameters to achieve optimal performance. For example, it uses dynamic context windows so that words that are more distant from the target word are sampled less frequently in the prediction task. Additionally, word2vec de-emphasizes the role of frequent words by discarding frequent words above a threshold with some probability. Finally, to refine representations, word2vec uses negative sampling, by which the model randomly samples a set of unrelated words and learns to suppress these words during prediction. These sophisticated techniques allow word2vec to develop very rich semantic representations. For example, word2vec is able to solve verbal analogy problems, e.g., man: king :: woman: ??, through simple vector arithmetic (but see Chen, Peterson, & Griffiths, 2017 ), and also model human similarity judgments. This indicates that the representations acquired by word2vec are sensitive to complex higher-order semantic relationships, a characteristic that had not been previously observed or demonstrated in other NN models. Further, word2vec is a very weakly supervised (or unsupervised) learning algorithm, as it does not require labeled or annotated data but only sequential text (i.e., sentences) to generate the word embeddings. word2vec’s pretrained embeddings have proven to be useful inputs for several downstream natural language-processing tasks (Collobert & Weston, 2008 ) and have inspired several other embedding models. For example, fastText (Bojanowski, Grave, Joulin, & Mikolov, 2017 ) is a word2vec-type NN that incorporates character-level information (i.e., n-grams) in the learning process, which leads to more fine-grained representations for rare words and words that are not in the training corpus. However, the psychological validity of some of the hyperparameters used by word2vec has been called into question by some researchers. For example, Johns, Mewhort, and Jones ( 2019 ) recently investigated how negative sampling, which appears to be psychologically unintuitive, affects semantic representations. They argued that negative sampling simply establishes a more accurate base rate of word occurrence and proposed solutions to integrate base-rate information into BEAGLE without the need to randomly sample unrelated words or even a prediction mechanism. However, as discussed in subsequent sections, prediction appears to be a central mechanism in certain tasks that involve sequential dependencies, and it is possible that NN models based on prediction are indeed capturing these long-term dependencies.

Figure 4: A depiction of the skip-gram version of the word2vec model architecture. The model creates a vector representation for the word lived by predicting its surrounding words in the sentence “Jane’s mother lived in Paris.” The weights of the hidden layer represent the vector for lived and are adjusted based on the prediction error. Adapted from Günther et al. (2019)
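To make the prediction-plus-negative-sampling idea concrete, here is a stripped-down numpy sketch of skip-gram with negative sampling. The toy corpus, dimensionality, fixed window, and training schedule are assumptions for illustration; real word2vec adds dynamic windows, frequent-word subsampling, and billions of training tokens.

```python
# Minimal skip-gram with negative sampling in numpy. All hyperparameters and the
# toy corpus are illustrative; this is not the reference word2vec implementation.
import numpy as np

rng = np.random.default_rng(0)
corpus = "janes mother lived in paris janes father worked in paris".split()
vocab = sorted(set(corpus))
idx = {w: i for i, w in enumerate(vocab)}
V, D, window, lr, k_neg = len(vocab), 16, 2, 0.05, 3

W_in = rng.normal(0, 0.1, (V, D))     # target ("embedding") vectors
W_out = rng.normal(0, 0.1, (V, D))    # context vectors

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

for epoch in range(200):
    for i, word in enumerate(corpus):
        t = idx[word]
        lo, hi = max(0, i - window), min(len(corpus), i + window + 1)
        for j in list(range(lo, i)) + list(range(i + 1, hi)):
            c = idx[corpus[j]]
            negatives = rng.integers(0, V, k_neg)      # randomly sampled "noise" words
            for label, o in [(1.0, c)] + [(0.0, int(n)) for n in negatives]:
                score = sigmoid(W_in[t] @ W_out[o])
                grad = lr * (label - score)            # error-driven (prediction error) update
                v_t = W_in[t].copy()
                W_in[t] += grad * W_out[o]
                W_out[o] += grad * v_t

def cos(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Words used in interchangeable contexts (mother/father) drift toward each other.
print(round(cos(W_in[idx["mother"]], W_in[idx["father"]]), 2))
```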

Figure 5: Ratio of co-occurrence probabilities for ice and steam, as described in Pennington et al. (2014)

Another modern distributional model, Global Vectors (GloVe), which was recently introduced by Pennington, Socher, and Manning (2014), shares similarities with both error-free and NN-based error-driven models of word representation. Similar to several DSMs, GloVe begins with a word-by-word co-occurrence matrix. But, instead of using raw counts as a starting point, GloVe estimates the ratio of co-occurrence probabilities between words. To give an example used by the authors, based on statistics from text corpora, ice co-occurs more frequently with solid than it does with gas, whereas steam co-occurs more frequently with gas than it does with solid. Further, both words (ice and steam) co-occur with their shared property water frequently, and both co-occur with the unrelated word fashion infrequently. The critical insight that GloVe capitalizes on is that words like water and fashion are non-discriminative, but the words gas and solid are important in differentiating between ice and steam. The ratio of probabilities highlights these differences, such that large values (much greater than 1) correspond to properties specific to ice, and small values (much less than 1) correspond to properties specific to steam (see Fig. 5). In this way, co-occurrence ratios successfully capture abstract concepts such as thermodynamic phases. GloVe aims to predict the logarithm of these co-occurrence ratios between words using a regression model, in the same spirit as factorizing the logarithm of the co-occurrence matrix in LSA. Therefore, while incorporating global information in the learning process (similar to LSA), GloVe also uses error-driven mechanisms to minimize the cost function from the regression model (using a modified version of stochastic gradient descent, similar to word2vec), and therefore represents a type of hybrid model. Further, to de-emphasize the overt influence of frequent and rare words, GloVe penalizes words with very high and low frequency (similar to importance weighting in LSA). The final abstracted representations or “embeddings” that emerge from the GloVe model are particularly sensitive to higher-order semantic relationships, and the GloVe model has been shown to perform remarkably well at analogy tasks, word similarity judgments, and named entity recognition (Pennington et al., 2014), although there is little consensus in the field regarding the relative performance of GloVe against strictly prediction-based models (e.g., word2vec; see Baroni, Dinu, & Kruszewski, 2014; Levy & Goldberg, 2014).
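A compressed sketch of the GloVe objective follows: word vectors and biases are fit by weighted least squares so that their dot products approximate the log co-occurrence counts, with a weighting function that caps the influence of very frequent pairs. For brevity the sketch collapses GloVe’s separate word and context vectors into a single matrix, and the counts and hyperparameters are invented for the example.

```python
# Core of the GloVe objective: weighted least-squares fit of vector dot products
# to log co-occurrence counts. Counts and hyperparameters are illustrative, and
# the separate word/context vectors of the real model are collapsed into one matrix.
import numpy as np

rng = np.random.default_rng(0)
words = ["ice", "steam", "solid", "gas", "water", "fashion"]
X = np.array([                     # symmetric toy co-occurrence counts
    [0, 2, 8, 1, 9, 1],
    [2, 0, 1, 8, 9, 1],
    [8, 1, 0, 0, 3, 0],
    [1, 8, 0, 0, 3, 0],
    [9, 9, 3, 3, 0, 1],
    [1, 1, 0, 0, 1, 0],
], dtype=float)

V, D, lr, x_max, alpha = len(words), 8, 0.05, 10.0, 0.75
W = rng.normal(0, 0.1, (V, D))
b = np.zeros(V)

def f(x):
    """Weighting that tempers very frequent pairs; zero-count pairs are skipped."""
    return (x / x_max) ** alpha if x < x_max else 1.0

for epoch in range(500):
    for i in range(V):
        for j in range(V):
            if X[i, j] == 0:
                continue
            diff = W[i] @ W[j] + b[i] + b[j] - np.log(X[i, j])
            grad = lr * f(X[i, j]) * diff
            W[i], W[j] = W[i] - grad * W[j], W[j] - grad * W[i]
            b[i] -= grad
            b[j] -= grad

def cos(a, b_):
    return float(a @ b_ / (np.linalg.norm(a) * np.linalg.norm(b_)))

# ice should end up closer to solid than to gas, mirroring the ratio argument above.
print(round(cos(W[words.index("ice")], W[words.index("solid")]), 2),
      round(cos(W[words.index("ice")], W[words.index("gas")]), 2))
```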

This section provided a detailed overview of traditional and recent computational models of semantic memory and highlighted the core ideas that have inspired the field in the past few decades with respect to semantic memory representation and learning. While several models draw inspiration from psychological principles, the differences between them certainly have implications for the extent to which they explain behavior. This summary focuses on the extent to which associative network and feature-based models, as well as error-free and error-driven learning-based DSMs speak to important debates regarding association, direct and indirect patterns of co-occurrence, and prediction.

Semantic versus associative relationships

Within the network-based conceptualization of semantic memory, concepts that are related to each other are directly connected (e.g., ostrich and emu have a direct link). An important insight that follows from this line of reasoning is that if ostrich and emu are indeed related, then processing one of the words should facilitate processing for the other word. This was indeed the observation made by Meyer and Schvaneveldt ( 1971 ), who reported the first semantic priming study, where they found that individuals were faster to make lexical decisions (deciding whether a presented stimulus was a word or non-word) for semantically related (e.g., ostrich - emu ) word pairs, compared to unrelated word pairs (e.g., apple - emu ). Given that individuals were not required to access the semantic relationship between words to make the lexical decision, these findings suggested that the task potentially reflected automatic retrieval processes operating on underlying semantic representations (also see Neely, 1977 ). The semantic priming paradigm has since become the most widely applied task in cognitive psychology to examine semantic representation and processes (for reviews, see Hutchison, 2003 ; Lucas, 2000 ; Neely, 1977 ).

An important debate that arose within the semantic priming literature concerned the nature of the relationship that produces the semantic priming effect, as well as the basis for connecting edges in a semantic network. Specifically, does processing the word ostrich facilitate the processing of the word emu due to the associative strength of connections between ostrich and emu , or because the semantic features that form the concepts of ostrich and emu largely overlap? As discussed earlier, associative relations are thought to reflect contiguous associations that individuals likely infer from natural language (e.g., ostrich - egg ). Traditionally, such associative relationships have been operationalized through responses in a free-association task (e.g., De Deyne et al., 2019 ; Nelson et al., 2004 ). On the other hand, semantic relations have traditionally included only category coordinates or concepts with similar features (e.g., ostrich - emu ; Hutchison, 2003 ; Lucas, 2000 ). Given these different operationalizations, some researchers have attempted to isolate pure "semantic" priming effects by selecting items that are semantically related (i.e., share category membership; Fischler, 1977 ; Lupker, 1984 ; Thompson-Schill, Kurtz, & Gabrieli, 1998 ) but not associatively related (i.e., based on free-association norms), although the success of these attempts has been mixed. Specifically, there appear to be discrepancies in how associative strength is defined and in where the locus of these priming effects lies. For example, in a meta-analytic review, Lucas ( 2000 ) concluded that semantic priming effects can indeed be found in the absence of associations, arguing for the existence of "pure" semantic effects. In contrast, Hutchison ( 2003 ) revisited the same studies and argued that both associative and semantic relatedness can produce priming, and the effects largely depend on the type of semantic relation being investigated as well as the task demands (also see Balota & Paul, 1996 ).

Another line of research in support of associative influences underlying semantic priming comes from studies on mediated priming. In a typical experiment, the prime (e.g., lion ) is related to the target (e.g., stripes ) only through a mediator (e.g., tiger ), which is not presented during the task. The critical finding is that robust priming effects are observed in pronunciation and lexical decision tasks for mediated word pairs that do not share any obvious semantic relationship or featural overlap (Balota & Lorch, 1986 ; Livesay & Burgess, 1998 ; McNamara & Altarriba, 1988 ). Traditionally, mediated priming effects have been explained through an associative-network based account of semantic representation (e.g., Balota & Lorch, 1986 ), where, consistent with a spreading activation mechanism, activation from the prime node (e.g., lion ) spreads to the mediator node in the network (e.g., tiger ), which in turn activates the related target node (e.g., stripes ). Recent computational network models have supported this conceptualization of semantic memory as an associative network. For example, Kenett et al. ( 2017 ) constructed a Hebrew network based on correlations of responses in a free-association task, and showed that network path lengths in this Hebrew network successfully predicted the time taken by participants to decide whether two words were related or unrelated, for directly related (e.g., bus - car ) and relatively distant word pairs (e.g., cheater - carpet ). More recently, Kumar, Balota, and Steyvers ( 2019 ) replicated Kenett et al.'s work in a much larger corpus in English, and further showed that the undirected and directed networks created by Steyvers and Tenenbaum ( 2005 ) also account for such distant priming effects.

While network models provide a straightforward account for mediated (and distant) priming, such effects were traditionally considered a core challenge for feature-based and distributional semantic models (Hutchison, 2003 ; Masson, 1995 ; Plaut & Booth, 2000 ). The argument was that in feature-based representations that conceptualize word meaning through the presence or absence of features, lion and stripes would not overlap because lions do not have stripes . Similarly, in distributional models, at least some early evidence from the HAL model suggested that mediated word pairs neither co-occur nor have similar high-dimensional vector representations (Livesay & Burgess, 1998 ), which was taken as evidence against a distributional representation of semantic memory. However, other distributional models such as LSA and BEAGLE have since been able to account for mediated priming effects (e.g., Chwilla & Kolk, 2002 ; Hutchison, 2003 ; Jones, Kintsch, & Mewhort, 2006 ; Jones & Mewhort, 2007 ; Kumar, Balota, & Steyvers, 2019 ). In fact, Jones et al. ( 2006 ) showed that HAL’s greater focus on “semantic” relationships contributes to its inability to account for mediated priming effects, which are more “associative” in nature (also see Sahlgren, 2008 ). However, LSA and other DSMs that subscribe to a broader conceptualization of meaning that includes both local “associative” as well as global “semantic” relationships are indeed able to account for mediated priming effects. The counterargument is that mediated priming may simply reflect weak semantic relationships between words (McKoon & Ratcliff, 1992 ), which can indeed be learned from statistical regularities in natural language. Thus, even though lion and stripes may have never co-occurred, newer semantic models that capitalize on higher-order indirect relationships are able to extract similar vectors for these words and produce the same priming effects without the need for a mediator or a spreading activation mechanism (Jones et al., 2006 ).

Therefore, an important takeaway from these studies on clarifying the locus of semantic priming effects is that the traditional distinction between associative and semantic relations may need to be revisited. Importantly, the operationalization of associative relations through free-association norms has further complicated this distinction, as only responses that are produced in free-association tasks have been traditionally considered to be associative in nature. However, free association responses may themselves reflect a wide variety of semantic relations (McRae, Khalkhali, & Hare, 2012 ; see also Guida & Lenci, 2007 ) that can produce different types of semantic priming (Hutchison, 2003 ). Indeed, as McRae et al. ( 2012 ) noted, several of the associative-level relations examined in previous work (e.g., Lucas, 2000 ) could in fact be considered semantically related in the broad sense (e.g., scene, feature, and script relations). Within this view, it is unclear exactly how associative relations operationalized in this way can be truly separated from semantic relations, or conversely, how semantic relations could truly be considered any different from simple associative co-occurrence. In fact, it is unlikely that words are purely associative or purely semantically related. As McNamara ( 2005 ) noted, "Having devoted a fair amount of time perusing free-association norms, I challenge anyone to find two highly associated words that are not semantically related in some plausible way" (McNamara, 2005 , p. 86). Furthermore, the traditional notion of what constitutes a "semantic" relationship has changed and is no longer limited to only coordinate or feature-based overlap, as is evidenced by the DSMs discussed in this section. Therefore, it appears that both associative relationships as well as coordinate/feature relationships now fall within the broader umbrella of what is considered semantic memory.

There is one possible way to reconcile the historical distinction between what are considered traditionally associative and "semantic" relationships. Some relationships may be simply dependent on direct and local co-occurrence of words in natural language (e.g., ostrich and egg frequently co-occur in natural language), whereas other relationships may in fact emerge from indirect co-occurrence (e.g., ostrich and emu do not co-occur with each other, but tend to co-occur with similar words). Within this view, traditionally "associative" relationships may reflect more direct co-occurrence patterns, whereas traditionally "semantic" relationships, or coordinate/featural relations, may reflect more indirect co-occurrence patterns. As discussed in this section, DSMs often distinguish between and differentially emphasize these two types of relationships (i.e., direct vs. indirect co-occurrences; see Jones et al., 2006 ), which has important implications for the extent to which these models speak to this debate between associative vs. truly semantic relationships. The combined evidence from the semantic priming literature and computational modeling literature suggests that the formation of direct associations is most likely an initial step in the computation of meaning. However, it also appears that the complex semantic memory system does not simply rely on these direct associations but also applies additional learning mechanisms (vector accumulation, abstraction, etc.) to derive other meaningful, indirect semantic relationships. Implementing such global processes allows modern distributional models to develop more fine-grained semantic representations that capture different types of relationships (direct and indirect). However, there do appear to be important differences in the underlying mechanisms of meaning construction posited by different DSMs. Further, there is also some concern in the field regarding the reliance on pure linguistic corpora to construct meaning representations (De Deyne, Perfors, & Navarro, 2016 ), an issue that is closely related to assessing the role of associative networks and feature-based models in understanding semantic memory, as discussed below. Furthermore, it is unlikely that any semantic relationship is purely direct or indirect; most relationships probably fall on a continuum, which echoes the arguments posed by Hutchison ( 2003 ) and Balota and Paul ( 1996 ) regarding semantic versus associative relationships.

Value of associative networks and feature-based models

Another important part of this debate on associative relationships is the representational issues posed by association network models and feature-based models. As discussed earlier, the validity of associative semantic networks and feature-based models as accurate models of semantic memory has been called into question (Jones, Hills, & Todd, 2015 ) due to the lack of explicit mechanisms for learning relationships between words. One important observation from this work is that the debate is less about the underlying structure (network-based/localist or distributed) and more about the input contributing to the resulting structure. Networks and feature lists in and of themselves are simply tools to represent a particular set of data, similar to high-dimensional vector spaces. As such, cosines in vector spaces can be converted to step-based distances that form a network using cosine thresholds (e.g., Gruenenfelder, Recchia, Rubin, & Jones, 2016 ; Steyvers & Tenenbaum, 2005 ) or a binary list of features (similar to “dimensions” in DSMs). Therefore, the critical difference between associative networks/feature-based models and DSMs is not that the former is a network/list and the latter is a vector space, but rather the fact that associative networks are constructed from free-association responses, feature-based models use property norms, and DSMs learn from text corpora. Therefore, as discussed earlier, the success of associative networks (or feature-based models) in explaining behavioral performance in cognitive tasks could be a consequence of shared variance with the cognitive tasks themselves. However, associative networks also explain performance in tasks that are arguably not based solely on retrieving associations or features – for example, progressive demasking (Kumar, Balota, & Steyvers, 2019 ), similarity judgments (Richie, Zou, & Bhatia, 2019 ), and the remote triads task where participants are asked to choose the most related pair among a set of three nouns (De Deyne, Perfors, & Navarro, 2016 ). This points to the possibility that the part of the variance explained by associative networks or feature-based models may in fact be meaningful variance that distributional models are unable to capture, instead of entirely being shared task-based variance.

To the extent that DSMs are limited by the corpora they are trained on (Recchia & Jones, 2009 ), it is possible that the responses from free-association tasks and property-generation norms capture some non-linguistic aspects of meaning that are missing from standard DSMs, for example, imagery, emotion, perception, etc. Therefore, even though it is unlikely that associative networks and feature-based models are a complete account of semantic memory, the free-association and property-generation norms that they are constructed from are likely useful baselines to compare DSMs against, because they include different types of relationships that go beyond those observable in textual corpora (De Deyne, Perfors, & Navarro, 2016 ). To that end, Gruenenfelder et al. ( 2016 ) compared three distributional models (LSA, BEAGLE, and Topic models) and one simple associative model and indicated that only a hybrid model that combined contextual similarity and associative networks successfully predicted the graph theoretic properties of free-association norms (also see Richie, White, Bhatia, & Hout, 2019 ). Therefore, associative networks and feature-based models can potentially capture complementary information compared to standard distributional models, and may provide additional cues about the features and associations other than co-occurrence that may constitute meaning. For instance, there is evidence to show that perceptual features such as size , color , and texture , which are readily apparent to humans and may be used to infer semantic relationships, are not effectively captured by co-occurrence statistics derived from natural language corpora (e.g., Baroni & Lenci, 2008 ; see Section III), suggesting that semantic memory may in fact go beyond simple co-occurrence. Indeed, as discussed in Section III, multimodal and feature-integrated DSMs that use different linguistic and non-linguistic sources of information to learn semantic representations are currently a thriving area of research and are slowly changing the conceptualization of what constitutes semantic memory (e.g., Bruni et al., 2014 ; Lazaridou et al., 2015 ).

Error-free versus error-driven learning

Prediction is another contentious issue in semantic modeling that has gained a considerable amount of traction in recent years, and the traditional distinction between error-free Hebbian learning and error-driven Rescorla-Wagner-type learning has been carried over to debates between different DSMs in the literature. In particular, DSMs that are based on extracting temporally contiguous associations via error-free learning mechanisms to derive word meanings (e.g., HAL, LSA, BEAGLE, etc.) have been referred to as “count-based” models in computational linguistics and natural language processing, and have been contrasted against DSMs that employ a prediction-based mechanism to learn representations (e.g., word2vec, fastText, etc.), often referred to as “predict” models. It is important to note here that the count versus predict distinction is somewhat artificial and misleading, because even prediction-based DSMs effectively use co-occurrence counts of words from natural language corpora to generate predictions. The important difference between these models is therefore not that one class of models counts co-occurrences whereas the other predicts them, but in fact that one class of models employs an error-free Hebbian learning process whereas the other class of models employs a prediction-based error-driven learning process to learn direct and indirect associations between words. Nonetheless, in an influential paper, Baroni et al. ( 2014 ) compared 36 “count-based” or error-free learning-based DSMs to 48 “predict” or error-driven learning-based DSMs and concluded that error-driven learning-based (predictive) models significantly outperformed their Hebbian learning-based counterparts in a large battery of semantic tasks. Additionally, Mandera, Keuleers, and Brysbaert ( 2017 ) compared the relative performance of error-free learning-based DSMs (LSA and HAL-type) and error-driven learning-based models (CBOW and skip-gram versions of word2vec) on semantic priming tasks (Hutchison et al., 2013 ) and concluded that predictive models provided a better fit to the data. They also argued that predictive models are psychologically more plausible because they employ error-driven learning mechanisms consistent with principles posited by Rescorla and Wagner ( 1972 ) and are computationally more compact.
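As a concrete illustration of the point that both families of models start from the same co-occurrence counts, the sketch below shows a generic error-free, "count-based" pipeline in the spirit of LSA/HAL-style models: a raw count matrix is re-weighted (here with positive pointwise mutual information, one of several common transformations) and then reduced with a truncated singular value decomposition. A prediction-based model such as word2vec would consume the same counts, but would instead update its vectors incrementally from prediction errors. The function and its defaults are illustrative only.

```python
import numpy as np

def count_based_embeddings(X, dim=300):
    """Toy 'count-based' pipeline: counts -> positive PMI -> truncated SVD.
    X is a (words x contexts) co-occurrence count matrix."""
    total = X.sum()
    p_word = X.sum(axis=1, keepdims=True) / total     # marginal word probabilities
    p_ctx = X.sum(axis=0, keepdims=True) / total      # marginal context probabilities
    with np.errstate(divide="ignore", invalid="ignore"):
        pmi = np.log((X / total) / (p_word * p_ctx))
    ppmi = np.where(np.isfinite(pmi) & (pmi > 0), pmi, 0.0)  # keep only positive PMI
    U, S, _ = np.linalg.svd(ppmi, full_matrices=False)
    return U[:, :dim] * S[:dim]                        # low-dimensional word vectors
```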

However, the argument that predictive models employ psychologically plausible learning mechanisms is incomplete, because error-free learning-based DSMs also employ equally plausible learning mechanisms, consistent with Hebbian learning principles. Further, there is also some evidence challenging the resounding success of predictive models. Asr, Willits, and Jones ( 2016 ) compared an error-free learning-based model (similar to HAL), a random vector accumulation model (similar to BEAGLE), and word2vec in their ability to acquire semantic categories when trained on child-directed speech data. Their results indicated that when the corpus was scaled down to the linguistic input available to children, the HAL-like model outperformed word2vec. Other work has also found little to no advantage of predictive models over error-free learning-based models (De Deyne, Perfors, & Navarro, 2016 ; Recchia & Nulty, 2017 ). Additionally, Levy, Goldberg, and Dagan ( 2015 ) showed that hyperparameters like window sizes, subsampling, and negative sampling can significantly affect performance, and it is not the case that predictive models are always superior to error-free learning-based models.

Collectively, these results point to two possibilities. First, it is possible that large amounts of training data (e.g., a billion words) and hyperparameter tuning (e.g., subsampling or negative sampling) are the main factors contributing to predictive models showing the reported gains in performance compared to their Hebbian learning counterparts. To address this possibility, Levy and Goldberg ( 2014 ) compared the computational algorithms underlying error-free learning-based models and predictive models and showed that the skip-gram word2vec model implicitly factorizes the word-context matrix, similar to several error-free learning-based models such as LSA. Therefore, it does appear that predictive models and error-free learning-based models may not be as different as initially conceived, and both approaches may actually converge on the same set of psychological principles. Second, it is possible that predictive models are indeed capturing a basic error-driven learning mechanism that humans use to perform certain types of complex tasks that require keeping track of sequential dependencies, such as sentence processing, reading comprehension, and event segmentation. Subsequent sections in this review discuss how state-of-the-art approaches specifically aimed at explaining performance in such complex semantic tasks are indeed variants or extensions of this prediction-based approach, suggesting that these models currently represent a promising and psychologically intuitive approach to semantic representation.
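Stated informally (see Levy & Goldberg, 2014 , for the derivation and the conditions under which it holds), the equivalence is that, at its optimum, skip-gram with k negative samples assigns word and context vectors whose dot products recover a shifted pointwise mutual information matrix:

```latex
\mathbf{w}_i \cdot \mathbf{c}_j \;=\; \operatorname{PMI}(w_i, c_j) - \log k,
\qquad
\operatorname{PMI}(w_i, c_j) \;=\; \log \frac{P(w_i, c_j)}{P(w_i)\,P(c_j)}.
```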

Language is clearly an extremely complex behavior, and even though modern DSMs like word2vec and GloVe that are trained on vast amounts of data successfully explain performance across a variety of tasks, adequate accounts of how humans generate sufficiently rich semantic representations with arguably far less "data" are still missing from the field. Further, there appears to be relatively little work examining how models trained on smaller datasets (e.g., child-directed speech) compare to children's actual performance on semantic tasks. The majority of the work in machine learning and natural language processing has focused on building models that outperform other models, or how the models compare to task benchmarks for only young adult populations. Therefore, it remains unclear how the mechanisms proposed by these models compare to the language acquisition and representation processes in humans, although subsequent sections make the case that recent attempts towards incorporating multimodal information and temporal and attentional influences are making significant strides in this direction. Ultimately, it is possible that humans use multiple levels of representation and more than one mechanism to produce and maintain flexible semantic representations that can be applied across a wide range of tasks, and a brief review of how empirical work on context, attention, perception, and action has informed semantic models will provide a finer understanding of some of these issues.

II. Contextual and Retrieval-Based Semantic Memory

Despite the traditional notion of semantic memory being a “static” store of verbal knowledge about concepts, accumulating evidence within the past few decades suggests that semantic memory may actually be context-dependent. Consider the meaning of the word ostrich . Does the conceptualization of what the word ostrich means change when an individual is thinking about the size of different birds versus the types of eggs one could use to make an omelet? Although intuitively it appears that there is one “static” representation of ostrich that remains unchanged across different contexts, considerable evidence on the time course of sentence processing suggests otherwise. In particular, a large body of work has investigated how semantic representations come “online” during sentence comprehension and the extent to which these representations depend on the surrounding context. For example, there is evidence to show that the surrounding sentential context and the frequency of meaning may influence lexical access for ambiguous words (e.g., bark has a tree and sound-related meaning) at different timepoints (Swinney, 1979 ; Tabossi, Colombo, & Job, 1987 ). Furthermore, extensive work by Rayner and colleagues on eye movements in reading has shown that the frequency of different meanings of a word, the bias in the linguistic context, and preceding modifiers can modulate the extent to which multiple meanings of a word are automatically activated (Binder, 2003 ; Binder & Rayner, 1998 ; Duffy et al., 1988 ; Pacht & Rayner, 1993 ; Rayner, Cook, Juhasz, & Frazier, 2006 ; Rayner & Frazier, 1989 ). Collectively, this work is consistent with the two-process theories of attention (Neely, 1977 ; Posner & Snyder, 1975 ), according to which a fast, automatic activation process, as well as a slow, conscious attention mechanism are both at play during language-related tasks. The two-process theory can clearly account for findings like “automatic” facilitation in lexical decisions for words related to the dominant meaning of the ambiguous word in the presence of biasing context (Tabossi et al., 1987 ), and longer “conscious attentional” fixations on the ambiguous word when the context emphasizes the non-dominant meaning (Pacht & Rayner, 1993 ).

Another aspect of language processing is the ability to consciously attend to different parts of incoming linguistic input to form inferences on the fly. One line of evidence that speaks to this behavior comes from empirical work on reading and speech processing using the N400 component of event-related brain potentials (ERPs). The N400 component is thought to reflect contextual semantic processing, and sentences ending in unexpected words have been shown to elicit greater N400 amplitude compared to expected words, given a sentential context (e.g., Block & Baldwin, 2010 ; Federmeier & Kutas, 1999 ; Kutas & Hillyard, 1980 ). This body of work suggests that sentential context and semantic memory structure interact during sentence processing (see Federmeier & Kutas, 1999 ). Other work has examined the influence of local attention, context, and cognitive control during sentence comprehension. In an eye-tracking paradigm, Nozari, Trueswell, and Thompson-Schill ( 2016 ) had participants listen to a sentence (e.g., “She will cage the red lobster”) as they viewed four colorless drawings. The drawings contained a local attractor (e.g., cherry ) that was compatible with the closest adjective (e.g., red ) but not the overall context, or an adjective-incompatible object (e.g., igloo ). Context was manipulated by providing a verb that was highly constraining (e.g., cage ) or non-constraining (e.g., describe ). The results indicated that participants fixated on the local attractor in both constraining and non-constraining contexts, compared to incompatible control words, although fixation was smaller in more constrained contexts. Collectively, this work indicates that linguistic context and attentional processes interact and shape semantic memory representations, providing further evidence for automatic and attentional components (Neely, 1977 ; Posner & Snyder, 1975 ) involved in language processing.

Given these findings and the automatic-attentional framework, it is important to investigate how computational models of semantic memory handle ambiguity resolution (i.e., multiple meanings) and attentional influences, and depart from the traditional notion of a context-free “static” semantic memory store. Critically, DSMs that assume a static semantic memory store (e.g., LSA, GloVe, etc.) cannot straightforwardly account for the different contexts under which multiple meanings of a word are activated and suppressed, or how attending to specific linguistic contexts can influence the degree to which other related words are activated in the memory network. The following sections will further elaborate on this issue of ambiguity resolution and review some recent literature on modeling contextually dependent semantic representations.

Ambiguity resolution in error-free learning-based DSMs

Virtually all DSMs discussed so far construct a single representation of a word’s meaning by aggregating statistical regularities across documents or contexts. This approach suffers from the drawback of collapsing multiple senses of a word into an “average” representation. For example, the homonym bark would be represented as a weighted average of its two meanings (the sound and the trunk), leading to a representation that is more biased towards the more dominant sense of the word. Homonyms (e.g., bark ) and polysemes (e.g., newspaper may refer to the physical object or a national daily) represent over 40% of all English words (Britton, 1978 ; Durkin & Manning, 1989), and because DSMs do not appropriately model the non-dominant sense of a word, they tend to underperform in disambiguation tasks and also cannot appropriately model the behavior observed in sentence-processing tasks (e.g., Swinney, 1979 ). Indeed, Griffiths et al. ( 2007 ) have argued that the inability to model representations for polysemes and homonyms is a core challenge and may represent a key falsification criterion for certain distributional models (also see Jones, 2018 ). Early distributional models like LSA and HAL recognized this limitation of collapsing a word’s meaning into a single representation. Landauer ( 2001 ) noted that LSA is indeed able to disambiguate word meanings when given surrounding context, i.e., neighboring words (for similar arguments see Burgess, 2001 ). To that end, Kintsch ( 2001 ) proposed an algorithm operating on LSA vectors that examined the local context around the target word to compute different senses of the word. While the approach of applying a process model over and above the core distributional model could be criticized, it is important to note that meaning is necessarily distributed across several dimensions in DSMs and therefore any process model operating on these vectors is using only information already contained within the vectors (see Günther et al., 2019 , for a similar argument).

An alternative proposal to model semantic memory and also account for multiple meanings was put forth by Blei, Ng, and Jordan ( 2003 ) and Griffiths et al. ( 2007 ) in the form of topic models of semantic memory. In topic models, word meanings are represented as a distribution over a set of meaningful probabilistic topics, where the content of a topic is determined by the words to which it assigns high probabilities. For example, high probabilities for the words desk , paper , board , and teacher might indicate that the topic refers to a classroom , whereas high probabilities for the words board , flight , bus , and baggage might indicate that the topic refers to travel . Thus, in contrast to geometric DSMs where a word is represented as a point in a high-dimensional space, words (e.g., board ) can have multiple representations across the different topics (e.g., classroom , travel ) in a topic model. Importantly, topic models take the same word-document matrix as LSA as input and uncover latent "topics" in the same spirit as LSA uncovers latent dimensions, through an abstraction-based mechanism that goes over and above simply counting direct co-occurrences, albeit using different machinery based on Markov Chain Monte Carlo methods (Griffiths & Steyvers, 2002 , 2003 , 2004 ). Topic models successfully account for free-association norms that show violations of symmetry, triangle inequality, and neighborhood structure (Tversky, 1977 ) that are problematic for other DSMs (but see Jones et al., 2018 ) and also outperform LSA in disambiguation, word prediction, and gist extraction tasks (Griffiths et al., 2007 ). However, the original architecture of topic models involved setting priors and specifying the number of topics a priori, which could lead to the possibility of experimenter bias in modeling (Jones, Willits, & Dennis, 2015 ). Further, the original topic model was essentially a "bag-of-words" model and did not capitalize on the sequential dependencies in natural language, unlike some other DSMs (e.g., BEAGLE). Recent work by Andrews and Vigliocco ( 2010 ) has extended the topic model to incorporate word-order information, yielding more fine-grained linguistic representations that are sensitive to higher-order semantic relationships. Additionally, given that topic models represent word meanings as a distribution over a set of topics, they naturally account for multiple senses of a word without the need for an explicit process model, unlike other DSMs such as LSA or HAL (Griffiths et al., 2007 ).
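The mixture idea at the core of topic models can be illustrated with a toy example (the numbers below are invented for illustration and are not estimates from any corpus): a word's probability in a document is a sum over topics, p(w | d) = Σ_z p(w | z) p(z | d), which is how an ambiguous word like board can inherit probability from both a classroom topic and a travel topic.

```python
import numpy as np

# Toy topic model: two topics over a four-word vocabulary.
#                 desk   paper  board  flight
phi = np.array([[0.40,  0.35,  0.20,  0.05],    # p(w|z) for a "classroom" topic
                [0.05,  0.10,  0.35,  0.50]])   # p(w|z) for a "travel" topic
theta = np.array([0.8, 0.2])                    # p(z|d): this document is mostly about classrooms

p_w_given_d = theta @ phi                       # p(w|d) = sum_z p(w|z) p(z|d)
# "board" receives probability from both topics, so its representation is
# distributed over senses rather than collapsed into a single point.
```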

Therefore, it appears that when DSMs are provided with appropriate context vectors through their representation (e.g., topic models) or additional assumptions (e.g., LSA), they are indeed able to account for patterns of polysemy and homonymy. Additionally, there has been a recent movement in natural language processing to build distributional models that can naturally tackle homonymy and polysemy. For example, Reisinger and Mooney ( 2010 ) used a clustering approach to construct sense-specific word embeddings that were successfully able to account for word similarity in isolation and within a sentential context. In their model, a word’s contexts were clustered to produce different groups of similar context vectors, and these context vectors were then averaged into sense-specific vectors for the different clusters. A slightly different clustering approach was taken by Li and Jurafsky ( 2015 ), where the sense clusters and embeddings were jointly learned using a Bayesian non-parametric framework. Their model used the Chinese Restaurant Process, according to which a new sense vector for a word was computed when evidence from the context (e.g., neighboring and co-occurring words) suggested that it was sufficiently different from the existing senses. Li and Jurafsky indicated that their model successfully outperformed traditional embeddings on semantic relatedness tasks. Other work in this area has employed multilingual distributional information to generate different senses for words (Upadhyay, Chang, Taddy, Kalai, & Zou, 2017 ), although the use of multiple languages to uncover word senses does not appear to be a psychologically plausible proposal for how humans derive word senses from language. Importantly, several of these recent approaches rely on error-free learning-based mechanisms to construct semantic representations that are sensitive to context. The following section describes some recent work in machine learning that has focused on error-driven learning mechanisms that can also adequately account for contextually-dependent semantic representations.
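A minimal sketch of the clustering route to multi-sense embeddings, loosely in the spirit of Reisinger and Mooney ( 2010 ), is shown below: each occurrence of a word is represented by a vector for its surrounding context, the occurrences are clustered, and the centroid of each cluster is treated as one sense-specific vector. The featurization, the cluster-selection criteria, and the fixed number of senses here are simplifications relative to the published models.

```python
import numpy as np
from sklearn.cluster import KMeans

def sense_vectors(occurrence_context_vectors, n_senses=3):
    """Cluster the context vectors of a word's occurrences and return one
    centroid ("sense vector") per cluster."""
    km = KMeans(n_clusters=n_senses, n_init=10, random_state=0)
    labels = km.fit_predict(occurrence_context_vectors)
    return np.stack([occurrence_context_vectors[labels == k].mean(axis=0)
                     for k in range(n_senses)])
```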

Ambiguity resolution in predictive DSMs

One particular drawback of multi-sense embeddings discussed above is that the meaning of a word can vary across multiple sentential contexts and enumerating all the different senses for a particular word can be both subjective (Westbury, 2016 ) and computationally expensive. For example, the word star can refer to its astronomical meaning, a film star, a rockstar, as well as an asterisk among other things, and the surrounding linguistic context itself may be more informative in understanding the meaning of the word star , instead of trying to enumerate all the different senses of star , which was the goal of multi-sense embeddings. The idea of using the sentential context itself to derive a word’s meaning was first proposed in Elman’s ( 1990 ) seminal work on the Simple Recurrent Network (SRN), where a set of context units that contained the previous hidden state of the neural network model served as “memory” for the next cycle. In this way, the internal representations that the SRN learned were sensitive to previously encountered linguistic context. This simple recurrent architecture successfully predicted word sequences, grammatical classes, and constituent structure in language (Elman, 1990 , 1991 ). Modern Recurrent Neural Networks (RNNs) build upon the intuitions of the SRN and come in two architectures: Long Short-Term Memory (LSTM) and Gated Recurrent Unit (GRU). LSTMs introduced the idea of memory cells, i.e., a vector that could preserve error signals over time and overcome the problem of vanishing error signals over long sequences (Hochreiter & Schmidhuber, 1997 ). Access to the memory cells is controlled through gates in LSTMs, where gate values are linear combinations of the current input and the previous model state. GRUs also have a gated architecture but differ in the number of gates and how they combine the hidden states (Olah, 2019 ). LSTMs and GRUs are currently the most successful types of RNNs and have been extensively applied to construct contextually sensitive, compositional (discussed in Section IV) models of semantic memory.
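To illustrate how recurrence yields context-sensitive representations, the sketch below shows a minimal LSTM language-model step in PyTorch. It is not Elman's SRN or any published semantic model; the layer sizes and the toy input are arbitrary. The point is simply that the hidden state at each position depends on all of the words encountered so far, so the same word receives different internal representations in different sentential contexts.

```python
import torch
import torch.nn as nn

vocab_size, embed_dim, hidden_dim = 10000, 128, 256
embed = nn.Embedding(vocab_size, embed_dim)
lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
to_vocab = nn.Linear(hidden_dim, vocab_size)

tokens = torch.randint(0, vocab_size, (1, 12))      # one toy sequence of 12 word ids
states, _ = lstm(embed(tokens))                     # (1, 12, hidden_dim): one contextual state per word
next_word_logits = to_vocab(states[:, -1])          # predict the next word from the final state
```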

The RNN approach inspired Peters et al. ( 2018 ) to construct Embeddings from Language Models (ELMo), a modern RNN-based language model. Peters et al.'s ELMo model uses a bidirectional LSTM combined with a traditional NN language model to construct contextual word embeddings. Specifically, instead of explicitly training to predict predefined or empirically determined sense clusters, ELMo first tries to predict words in a sentence going sequentially forward and then backward, utilizing recurrent connections through a two-layer LSTM. The embeddings returned from these "pretrained" forward and backward LSTMs are then combined with a task-specific NN model to construct a task-specific representation (see Fig. 6 ). One key innovation in the ELMo model is that instead of only using the topmost layer produced by the LSTM, it computes a weighted linear combination of all three layers (the embedding layer and the two LSTM hidden layers) to construct the final semantic representation. The logic behind using all layers of the LSTM in ELMo is that this process yields very rich word representations, where higher-level LSTM states capture contextual aspects of word meaning and lower-level states capture syntax and parts of speech. Peters et al. showed that ELMo's unique architecture is successfully able to outperform other models in complex tasks like question answering, coreference resolution, and sentiment analysis among others. The success of recent recurrent models such as ELMo in tackling multiple senses of words represents a significant leap forward in modeling contextualized semantic representations.

figure 6

A depiction of the ELMo architecture. The hidden layers of two long short-term memory networks (LSTMs; forward and backward) are first concatenated, followed by a weighted sum of the hidden layers with the embedding layer, resulting in the final three-layer representation for a particular word. Adapted from Alammar ( 2018 )
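The layer-mixing step described above can be sketched as follows. This is a simplified rendering of the scheme in Peters et al. ( 2018 ): in the actual model the layer weights and the scalar are learned per task, and the layer activations come from the pretrained bidirectional language model rather than being passed in directly.

```python
import numpy as np

def elmo_combine(layer_states, layer_scalars, gamma=1.0):
    """layer_states: (n_layers, seq_len, dim) activations for one sentence.
    Returns a softmax-weighted sum over layers, scaled by gamma."""
    s = np.exp(layer_scalars - np.max(layer_scalars))
    s = s / s.sum()                                              # softmax over the layers
    return gamma * np.tensordot(s, layer_states, axes=(0, 0))    # (seq_len, dim)
```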

Modern RNNs such as ELMo have been successful at predicting complex behavior because of their ability to incorporate previous states into semantic representations. However, one limitation of RNNs is that they process the input sequentially and must compress the entire input sequence into a fixed-length representation, which slows down processing and becomes problematic for extremely long sequences. For example, consider the task of text summarization, where the input is a body of text, and the task of the model is to paraphrase the original text. Intuitively, the model should be able to "attend" to specific parts of the text and create smaller "summaries" that effectively paraphrase the entire passage. This intuition inspired the attention mechanism , where "attention" could be focused on a subset of the original input units by weighting the input words based on positional and semantic information. The model would then predict target words based on relevant parts of the input sequence. Bahdanau, Cho, and Bengio ( 2014 ) first applied the attention mechanism to machine translation, using two separate RNNs to encode the input sequence and generate the translated output, with an attention head that explicitly focused on relevant input words at each step. "Attention" was focused on specific words by computing an alignment score for each input state, determining which input states were most relevant for the current time step, and combining these weighted input states into a context vector. This context vector was then combined with the previous state of the model to generate the predicted output. Bahdanau et al. showed that the attention mechanism was able to outperform previous models in machine translation (e.g., Cho et al., 2014 ), especially for longer sentences.
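The attention step described above can be written compactly; the sketch below uses a simple dot-product score for readability, whereas Bahdanau et al. ( 2014 ) used a small feed-forward network to compute the alignment scores.

```python
import numpy as np

def attention_context(decoder_state, encoder_states):
    """encoder_states: (seq_len, dim); decoder_state: (dim,).
    Returns the context vector used to generate the next output word."""
    scores = encoder_states @ decoder_state           # one alignment score per input word
    weights = np.exp(scores - scores.max())
    weights = weights / weights.sum()                 # softmax: how much to attend to each word
    return weights @ encoder_states                   # weighted sum of encoder states
```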

Attention NNs are now at the heart of several state-of-the-art language models, like Google's Transformer (Vaswani et al., 2017 ), BERT (Devlin et al., 2019 ), OpenAI's GPT-2 (Radford et al., 2019 ) and GPT-3 (Brown et al., 2020 ), and Facebook's RoBERTa (Liu et al., 2019 ). Two key innovations in these new attention-based NNs have led to remarkable performance improvements in language-processing tasks. First, these models are being trained on a much larger scale than ever before, allowing them to learn from over a billion training iterations across several days (e.g., Radford et al., 2019 ). Second, modern attention-NNs entirely eliminate the sequential recurrent connections that were central to RNNs. Instead, these models use multiple layers of attention and positional information to process words in parallel. In this way, they are able to focus attention on multiple words at a time to perform the task at hand. For example, Google's BERT model assigns position vectors to each word in a sentence. These position vectors are then updated using attention vectors, which represent a weighted sum of position vectors of other words and depend upon how strongly each position contributes to the word's representation. Specifically, attention vectors are computed using a compatibility function (similar to an alignment score in Bahdanau et al., 2014 ), which assigns a score to each pair of words indicating how strongly they should attend to one another. These computations are repeated across several layers and training iterations with the dual goal of predicting masked words in a sentence (e.g., I went to the [mask] to buy a [mask] of milk; predict store and carton ) as well as deciding whether one sentence (e.g., They were out of reduced fat [mask], so I bought [mask] milk) is a valid continuation of another sentence (e.g., I went to the store to buy a carton of milk). By computing errors bidirectionally and updating the position and attention vectors with each iteration, BERT's word vectors are influenced by other words' vectors and tend to develop contextually dependent word embeddings. For example, the representation of the word ostrich in the BERT model would be different when it is in a sentence about birds (e.g., ostriches and emus are large birds) versus food ( ostrich eggs can be used to make omelets), due to the different position and attention vectors contributing to these two representations. Importantly, the architecture of BERT allows it to be flexibly finetuned and applied to any semantic task, while still using the basic attention-based mechanism. This framework has turned out to be remarkably efficient, and models based on the general Transformer architecture (e.g., BERT, RoBERTa, GPT-2, & GPT-3) outperform LSTM-based recurrent approaches in semantic tasks such as sentiment analysis (Socher et al., 2013 ), sentence acceptability judgments (Warstadt, Singh, & Bowman, 2018 ), and even tasks that are dependent on semantic and world knowledge, such as the Winograd Schema Challenge (Levesque, Davis, & Morgenstern, 2012 ) or novel language generation (Brown et al., 2020 ). However, considerable work is beginning to evaluate these models using more rigorous test cases and starting to question whether these models are actually learning anything meaningful (e.g., Brown et al., 2020 ; Niven & Kao, 2019 ), an issue that is discussed in detail in Section V.
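For comparison with the recurrent version above, a single scaled dot-product self-attention head of the kind used in Transformer-based models (Vaswani et al., 2017 ) can be sketched as follows; multi-head attention, positional encodings, and the feed-forward sublayers of a full Transformer block are omitted.

```python
import numpy as np

def self_attention(X, W_q, W_k, W_v):
    """X: (seq_len, dim) word vectors for one sentence; W_q, W_k, W_v: projection matrices.
    Every output row is a weighted sum over all words, computed in parallel."""
    Q, K, V = X @ W_q, X @ W_k, X @ W_v
    scores = Q @ K.T / np.sqrt(K.shape[-1])                      # pairwise compatibility scores
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights = weights / weights.sum(axis=-1, keepdims=True)      # row-wise softmax
    return weights @ V                                           # contextualized word representations
```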

Although the technical complexity of attention-based NNs makes it difficult to understand the underlying mechanisms contributing to their impressive success, some recent work has attempted to demystify these models (e.g., Clark, Khandelwal, Levy, & Manning, 2019 ; Coenen et al., 2019 ; Michel, Levy, & Neubig, 2019 ; Tenney, Das, & Pavlick, 2019 ). For example, Clark et al. ( 2019 ) recently showed that BERT’s attention heads actually attend to meaningful semantic and syntactic information in sentences, such as determiners, objects of verbs, and co-referent mentions (see Fig. 7 ), suggesting that these models may indeed be capturing meaningful linguistic knowledge, which may be driving their performance. Further, some recent evidence also shows that BERT successfully captures phrase-level representations, indicating that BERT may indeed have the ability to model compositional structures (Jawahar, Sagot, & Seddah, 2019 ), although this work is currently in its nascent stages. Furthermore, it remains unclear how this conceptualization of attention fits with the automatic-attentional framework (Neely, 1977 ). Demystifying the inner workings of attention NNs and focusing on process-based accounts of how computational models may explain cognitive phenomena clearly represents the next step towards integrating these recent computational advances with empirical work in cognitive psychology.

figure 7

BERT attention heads that correspond to linguistic phenomena like attending to noun phrases and verbs. Arrows indicate specific relationships that the heads are attending to within each sentence. Adapted from Clark et al. ( 2019 )

Collectively, these recent approaches to construct contextually sensitive semantic representations (through recurrent and attention-based NNs) are showing unprecedented success at addressing the bottlenecks regarding polysemy, attentional influences, and context that were considered problematic for earlier DSMs. An important insight that is common to both the recurrent and attention-based NNs discussed above is the idea of contextualized semantic representations, a notion that is certainly at odds with the traditional conceptualization of context-free semantic memory. Indeed, the following section discusses a new class of models that takes this notion a step further by entirely eliminating the need for learning representations or "semantic memory" and proposes that all meaning representations may in fact be retrieval-based, thereby blurring the historical distinction between episodic and semantic memory.

Retrieval-based models of semantic memory

Tulving's ( 1972 ) episodic-semantic dichotomy inspired foundational research on semantic memory and laid the groundwork for conceptualizing semantic memory as a static memory store of facts and verbal knowledge that was distinct from episodic memory, which was linked to events situated in specific times and places. However, some recent attempts at modeling semantic memory have taken a different perspective on how meaning representations are constructed. Retrieval-based models challenge the strict distinction between semantic and episodic memory, by constructing semantic representations through retrieval-based processes operating on episodic experiences. Retrieval-based models are based on Hintzman's ( 1988 ) MINERVA 2 model, which was originally proposed to explain how individuals learn to categorize concepts. Hintzman argued that humans store all instances or episodes that they experience, and that categorization of a new concept is simply a weighted function of its similarity to these stored instances at the time of retrieval. In other words, each episodic experience lays down a trace, which implies that if an item is presented multiple times, it has multiple traces. At the time of retrieval, traces are activated in proportion to their similarity with the retrieval cue or probe. For example, an individual may have seen an ostrich in pictures or at the zoo multiple times and would store each of these instances in memory. The next time an ostrich -like bird is encountered by this individual, they would match the features of this bird to a weighted sum of all stored instances of ostrich and compute the similarity between these features to decide whether the new bird is indeed an ostrich . Hintzman's work was crucial in developing the exemplar theory of categorization, often contrasted against the prototype theory of categorization (Rosch & Mervis, 1975 ), according to which individuals "learn" or generate an abstract prototypical representation of a concept (e.g., ostrich ) and compare new examples to this prototype to organize concepts into categories. Importantly, Hintzman's model rejected the need for a strong distinction between episodic and semantic memory (Tulving, 1972 ) and has inspired a class of models of semantic memory often referred to as retrieval-based models .

Kwantes ( 2005 ) proposed a retrieval-based alternative to LSA-type distributional models by computing semantic representations "on the fly" from a term-document matrix of episodic experiences. Based on principles from Hintzman's ( 1988 ) MINERVA 2 model, in Kwantes' model, each word has a context vector (i.e., memory trace) associated with it, which contains its frequency of occurrence within each document of the training corpus. When a word is encountered in the environment, it is used as a cue to retrieve the context vector, which activates the traces of all words in lexical memory. The activation of a trace is directly proportional to the contextual similarity between its context vector and that of the cue word. Memory traces are then weighted by their activations and summed across the context vectors to construct the final semantic representation of the target word. The resulting semantic representations from Kwantes' model successfully captured higher-order semantic relationships, similar to LSA, without the need for storing, abstracting, or learning these representations at the time of encoding.

Modern retrieval-based models have been successful at explaining complex linguistic and behavioral phenomena, such as grammatical constraints (Johns & Jones, 2015 ) and free association (Howard et al., 2011 ), and certainly represent a significant departure from the models discussed thus far. For example, Howard et al. ( 2011 ) proposed a model that constructed semantic representations using temporal context. Instead of defining context in terms of a sentence or document like most DSMs, the Predictive Temporal Context Model (pTCM; see also Howard & Kahana, 2002 ) proposes a continuous representation of temporal context that gradually changes over time. Items in the pTCM are activated to the extent that their encoded context overlaps with the context that is cued. Further, context is also used to predict items that are likely to appear next, and the semantic representation of an item is the collection of prediction vectors in which it appears over time. These previously learned prediction vectors also contribute to the word’s future representations. Howard et al. showed that the pTCM successfully simulates human performance in word-association tasks and is able to capture long-range dependencies in language that are problematic for other DSMs. In its core principles of constructing representations from episodic contexts, the pTCM is similar to other retrieval-based models, but its ability to learn from previous states and gradually accumulate information also shares similarities with the SRN (Elman, 1990 ), BEAGLE (Jones & Mewhort, 2007 ), and some of the recent error-driven learning DSMs discussed in Section II (e.g., word2vec, ELMo, etc.).

More recently, Jamieson, Avery, Johns, and Jones ( 2018 ) proposed an instance-based theory of semantic memory, also based on MINERVA 2. In their model, word contexts are stored as n -dimensional vectors representing multiple instances in episodic memory. Memory of a document (or conversation) is the sum of the vectors of the words it contains, and a "memory" matrix stores each of these document vectors as a separate trace. A word's meaning is retrieved by cueing memory with a probe, which activates each trace in proportion to its similarity to the probe. The aggregate of all activated traces is called an echo, where the contribution of a trace is directly weighted by its activation. The retrieved echo, in response to a probe, is assumed to represent a word's meaning. In this way, the model exhibits "context sensitivity" by comparing the activations of the retrieval probe with the activations of other traces in memory, thus producing context-dependent semantic representations without any mechanism for learning these representations. For example, Jamieson et al. showed that for the homograph break (with three senses, related to stopping, smashing, and news reporting), when their model is provided with a disambiguating context using a joint probe (e.g., break / car ), the retrieved representation (or "echo") is more similar to the word stop , compared to the words report and smash , thus producing a context-dependent semantic representation of the word break . Therefore, Jamieson et al.'s model successfully accounts for some findings pertaining to ambiguity resolution that have been difficult to accommodate within traditional DSM-based accounts and proposes that meaning is created "on the fly" and in response to a retrieval cue, an idea that is certainly inconsistent with traditional semantic models.
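The retrieval step common to these instance-based models can be sketched in a few lines. This is a simplified rendering of the MINERVA 2-style computation; the exact similarity measure, the activation exponent, and any normalization vary across the published models.

```python
import numpy as np

def echo(probe, memory, power=3):
    """memory: (n_traces, dim) matrix of stored episodic traces; probe: (dim,).
    Each trace is activated in proportion to its (cubed) similarity to the probe,
    and the echo is the activation-weighted sum of all traces."""
    norms = np.linalg.norm(memory, axis=1) * np.linalg.norm(probe)
    sims = (memory @ probe) / np.where(norms == 0, 1.0, norms)   # cosine similarity to each trace
    activations = sims ** power                                  # odd power preserves sign, sharpens retrieval
    return activations @ memory                                  # the retrieved, context-dependent representation
```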

Although it is well understood that prior knowledge or semantic memory influences how individuals perceive events (e.g., Bransford & Johnson, 1972 ; Deese, 1959 ; Roediger & McDermott, 1995 ), the notion that semantic memory may itself be influenced by episodic events is relatively recent. This section discussed how the conceptualization of semantic memory as an independent and static memory store is slowly changing, in light of evidence that context shapes the structure of semantic memory. Retrieval-based models represent an important departure from the traditional notions about semantic memory, and instead propose that the meaning of a word is computed "on the fly" at retrieval, and do not subscribe to the idea of storing or learning a static semantic representation of a concept. This conceptualization is clearly at odds with traditional accounts of semantic memory and hearkens back to the distinction between prototype and exemplar theories of categorization briefly alluded to earlier. Specifically, in the computational models of semantic memory discussed so far (with the exception of retrieval-based models), the idea of inferring indirect co-occurrences and/or latent dimensions, i.e., learning through abstraction, emerges as a core mechanism contributing to the construction of meaning. This idea of abstraction has also been central to computational models that have been applied to understand category structure. Specifically, prototype theories (Rosch, Mervis, Gray, Johnson, & Boyes-Braem, 1976 ; Rosch & Lloyd, 1978 ; also see Posner & Keele, 1968 ) posit that as individual concepts are experienced, humans gradually develop a prototypical representation that contains the most useful and representative information about that category. This notion of constructing an abstracted, prototypical representation is at the heart of several computational models of semantic memory discussed in this review. For example, both LSA and BEAGLE construct an "average" prototypical semantic representation from individual linguistic experiences. Of course, LSA uses a term-document matrix and singular value decomposition whereas BEAGLE learns meaning by incrementally combining co-occurrence and order information to compute a composite representation, but both models represent a word as a single point (prototype) in a high-dimensional space. Retrieval-based models, on the other hand, are inspired by Hintzman's work and the exemplar theory of categorization and assume that semantic representations are constructed in response to retrieval cues and reject the idea of prototypical representations or abstraction-like learning processes occurring at the time of encoding. Given the success of retrieval-based models at tackling ambiguity and several other linguistic phenomena, these models clearly represent a powerful proposal for how meaning is constructed.

However, before abstraction (at encoding) can be rejected as a plausible mechanism underlying meaning computation, retrieval-based models need to address several bottlenecks, only one of which is computational complexity. Jones et al. ( 2018 ) recently noted that computational constraints should not influence our preference of traditional prototype models over exemplar-based models, especially since exemplar models have provided better fits to categorization task data, compared to prototype models (Ashby & Maddox, 1993 ; Nosofsky, 1988 ; Stanton, Nosofsky, & Zaki, 2002 ). However, implementation is a core test for theoretical models and retrieval-based models must be able to explain how the brain manages this computational overhead. Specifically, retrieval-based models argue against any type of “semantic memory” at all and instead propose that semantic representations are created “on the fly” when words or concepts are encountered within a particular context. As discussed earlier, while there is evidence to suggest that the representations likely change with every new encounter (e.g., for a review, see Yee et al., 2018 ), it is still unclear why the brain would create a fresh new representation for a particular concept “on the fly” each time that concept is encountered, and not “learn” something about the concept from previous encounters that could aid future processing. It seems more psychologically plausible that the brain learns and maintains a semantic representation (stored via changes in synaptic activity; see Mayford, Siegelbaum, & Kandel, 2012 ) that is subsequently finetuned or modified with each new incoming encounter – a proposal that is closer to the mechanisms underlying recurrent and attention-NNs discussed earlier in this section. Furthermore, in light of findings that top-down information or previous knowledge does in fact guide cognitive behavior (e.g., Bransford & Johnson, 1972 ; Deese, 1959 ; Roediger & McDermott, 1995 ) and bottom-up processes interact with top-down processes (Neisser, 1976 ), the proposal that there may not be any existing semantic structures in place at all certainly requires more investigation.

It is important to note here that individual traces for episodic events may indeed need to be stored by the system for other cognitive tasks, but the argument here is that retrieving the meaning of a concept need not necessarily require the storage of every individual experience or trace. For example, consider the simple memory task of remembering a list of words: train , ostrich , lemon , and truth . Encoding a representation of this event likely involves laying down a trace of this experience in memory. However, retrieval-based models would posit that the representation of the word ostrich in this context would in fact be a weighted sum of every other time the word or concept of ostrich has been experienced, all of which have been stored in memory. This conceptualization seems unnecessary, especially given that other DSMs that instead use more compact learning-based representations have been fairly successful at simulating performance in semantic as well as non-semantic tasks (for a model of LSA-type semantic structures applied to free recall tasks, see Polyn, Norman, & Kahana, 2009 ).

Additionally, retrieval-based models currently lack a complete account of how long-term sequential dependencies, sentential context, and multimodal information might simultaneously influence the computation of meaning. For example, how does multimodal information about an object get stored in retrieval-based models – does each individual sensorimotor encounter also leave its own trace in memory and contribute to the "context-specific" representation, or is the scope of "context" limited to patterns of co-occurrence? Further, it remains unclear how representations derived from retrieval-based models differ from representations derived from modern RNNs and attention-based NNs, which also propose contextualized representations. These classes of models appear to share the fundamental claim that the retrieval context determines the representation of a concept or word, although retrieval-based models do not subscribe to any particular learning mechanism (with the exception of Howard et al.'s predictive pTCM model), whereas RNNs and attention-NNs are based on error-driven learning mechanisms. Specifically, RNNs and attention-NNs learn via prediction and incrementally build semantic representations, whereas retrieval-based models instead propose that representations are constructed solely at the time of retrieval, without any learning occurring at the time of exposure or encoding. Furthermore, while RNNs and attention-NNs take word order and positional information (e.g., bidirectionality in BERT) into account within their definition of "context" when constructing semantic representations, recent retrieval-based models currently lack mechanisms to incorporate word order into their representations (e.g., Jamieson et al., 2018), even though this may simply be a practical limitation at this point.

Finally, it is unclear how retrieval-based models would scale up to sentences, paragraphs, and other higher-order structures like events, issues that are being successfully addressed by other learning-based DSMs (see Sections III and IV). Clearly, more research is needed to adequately assess the relative performance of retrieval-based models, compared to state-of-the-art learning-based models of semantic memory currently being widely applied in the literature to a large collection of semantic (and non-semantic) tasks. Collectively, it seems most likely that humans store individual exemplars in some form (e.g., a distributed pattern of activation) or at least to some extent (e.g., storing only traces above a certain threshold of stable activation), but also learn a prototypical representation as consistent exemplars are experienced, which facilitates faster top-down processing (for a similar argument, see Yee et al., 2018 ) in cognitive tasks, although this issue clearly needs to be explored further.

The central idea that emerged in this section is that semantic memory representations may indeed vary across contexts. The accumulating evidence that meaning rapidly changes with linguistic context certainly necessitates models that can incorporate this flexibility into word representations. Attention-based NNs like BERT and GPT-2/3 represent a promising step towards constructing such contextualized, attention-based representations and appear to be consistent with the automatic and attentional components of language processing (Neely, 1977), although more work is needed to clarify how these models compute meaningful representations that can be flexibly applied across different tasks. The success of attention-based NNs is impressive on the one hand, but also cause for concern on the other. First, it is remarkable that the underlying mechanisms proposed by these models at least appear to be psychologically intuitive and consistent with empirical work showing that attentional processes and predictive signals do indeed contribute to semantic task performance (e.g., Nozari et al., 2016). However, if the ultimate goal is to build models that explain and mirror human cognition, the issues of scale and complexity cannot be ignored. Current state-of-the-art models operate at a scale of word exposure that is much larger than what young adults are typically exposed to (De Deyne, Perfors, & Navarro, 2016; Lake, Ullman, Tenenbaum, & Gershman, 2017). Therefore, exactly how humans perform the same semantic tasks without the large amounts of data available to these models remains unknown. One line of reasoning is that while humans receive far less linguistic input than the corpora that modern semantic models are trained on, humans instead have access to a plethora of non-linguistic sensory and environmental input, which likely contributes to their semantic representations. Indeed, the following section discusses how conceptualizing semantic memory as a multimodal system sensitive to perceptual input represents the next big paradigm shift in the study of semantic memory.

III. Grounding Models of Semantic Memory

Virtually all distributional and network-based semantic models rely on large text corpora or databases to construct semantic representations. Consequently, a consistent and powerful criticism of distributional semantic models comes from the grounded cognition movement (Barsalou, 2016 ), which rejects the idea that meaning can be represented through abstract and amodal symbols like words in a language. Instead, grounded cognition researchers posit that sensorimotor modalities, the environment, and the body all contribute and play a functional role in cognitive processing, and by extension, the construction of meaning. Grounded (or embodied) cognition is a rather broad enterprise that attempts to redefine the study of cognition (Matheson & Barsalou, 2018 ). Within the domain of semantic memory, distributional models in particular have been criticized because they derive semantic representations from only linguistic texts and are not grounded in perception and action, leading to the symbol grounding problem (Harnad, 1990 ; Searle, 1980 ), i.e., how can the meaning of a word (e.g., an ostrich ) be grounded only in other words (e.g., big , bird , etc.) that are further grounded in more words?

While there is no one theory of grounded cognition (Matheson & Barsalou, 2018 ), the central tenet common to several of them is that the body, brain, and physical environment dynamically interact to produce meaning and cognitive behavior. For example, based on Barsalou’s account (Barsalou, 1999 , 2003 , 2008 ), when an individual first encounters an object or experience (e.g., a knife ), it is stored in the modalities (e.g., its shape in the visual modality, its sharpness in the tactile modality, etc.) and the sensorimotor system (e.g., how it is used as a weapon or kitchen utensil). Repeated co-occurrences of physical stimulations result in functional associations (likely mediated by associative Hebbian learning and/or connectionist mechanisms) that form a multimodal representation of the object or experience (Matheson & Barsalou, 2018 ). Features of these representations are activated through recurrent connections, which produces a simulation of past experiences. These simulations not only guide an individual’s ongoing behavior retroactively (e.g., how to dice onions with a knife ), but also proactively influence their future or imagined plans of action (e.g., how one might use a knife in a fight). Simulations are assumed to be neither conscious nor complete (Barsalou, 2003 ; Barsalou & Wiemer-Hastings, 2005 ), and are sensitive to cognitive and social contexts (Lebois, Wilson-Mendenhall, & Barsalou, 2015 ).
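As a rough illustration of the associative learning assumed in this account (a minimal sketch only; the patterns, dimensionalities, and learning rate are hypothetical, and the full proposal involves much richer connectionist machinery), repeated co-activation of modality-specific patterns can be captured by a simple Hebbian outer-product rule, after which one modality can partially reinstate, or "simulate," the other:

```python
import numpy as np

rng = np.random.default_rng(1)
visual = rng.normal(size=50)      # hypothetical visual pattern for "knife" (e.g., its shape)
tactile = rng.normal(size=40)     # hypothetical tactile pattern (e.g., its sharpness)

W = np.zeros((40, 50))            # associative weights: visual -> tactile
eta = 0.1                         # learning rate
for _ in range(20):               # repeated co-occurrence of the two patterns
    W += eta * np.outer(tactile, visual)   # Hebbian outer-product update

# Later, the visual pattern alone partially reinstates the tactile pattern ("simulation")
reinstated = W @ visual
print(np.corrcoef(reinstated, tactile)[0, 1])   # high correlation after learning
```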

There is some empirical support for the grounded cognition perspective from sensorimotor priming studies. In particular, there is substantial evidence that modality-specific neural information is activated during language-processing tasks. For example, it has been demonstrated that reading verbs like kick (corresponding to feet) or pick (corresponding to hand) activates the motor cortex in a somatotopic fashion (Pulvermüller, 2005), passive reading of taste-related words (e.g., salt) activates gustatory cortices (Barros-Loscertales et al., 2011), and verifying modality-specific properties of words (e.g., color, taste, sound, and touch) activates the corresponding sensory brain regions (Goldberg, Perfetti, & Schneider, 2006). However, whether the activation of modality-specific information is incidental to the task and simply a result of post-representation processes, or actually part of the semantic representation itself, is an important question. Support for the latter argument comes from studies showing that transcranial stimulation of areas in the premotor cortex related to the hand facilitates lexical decision performance for hand-related action words (Willems, Labruna, D'Esposito, Ivry, & Casasanto, 2011), Parkinson's patients show selective impairment in comprehending motor action words (Fernandino et al., 2013), and damage to brain regions supporting object-related action can hinder access to knowledge about how objects are manipulated (Yee, Chrysikou, Hoffman, & Thompson-Schill, 2013). Yee et al. also showed that when individuals performed a concurrent manual task while naming pictures, there was more naming interference for objects that are more manually used (e.g., pencils), compared to objects that are not typically manually used (e.g., tigers). Furthermore, Yee, Huffstetler, and Thompson-Schill (2011) used a visual eye-tracking paradigm to show that as a spoken word unfolds over time (e.g., hearing frisbee), particular features (e.g., form-related features) come online in a temporally constrained fashion and can influence eye fixation times for related words (e.g., participants fixated longer on pizza, because frisbee and pizza are both round). Taken together, these findings suggest that semantic memory representations are accessed in a dynamic way during tasks and that different perceptual features of these representations may be accessed at different timepoints, suggesting a more flexible and fluid conceptualization of semantic memory that can change as a function of task (also see Yee, Lahiri, & Kotzor, 2017). Therefore, it is important to evaluate whether computational models of semantic memory can indeed encode these rich, non-linguistic features as part of their representations.

It is important to note here that while the sensorimotor studies discussed above provide support for the grounded cognition argument, these studies are often limited in scope to processing sensorimotor words and do not make specific predictions about the direction of effects (Matheson & Barsalou, 2018 ; Matheson, White, & McMullen, 2015 ). For example, although several studies show that modality-specific information is activated during behavioral tasks, it remains unclear whether this activation leads to facilitation or inhibition within a cognitive task. Indeed, both types of findings are taken to support the grounded cognition view, therefore leading to a lack of specificity in predictions regarding the role of modality-specific information (Matheson et al., 2015 ), although some recent work has proposed that timing of activation may be critical in determining how modality-specific activation influences cognitive performance (Matheson & Barsalou, 2018 ). Another strong critique of the grounded cognition view is that it has difficulties accounting for how abstract concepts (e.g., love , freedom etc.) that do not have any grounding in perceptual experience are acquired or can possibly be simulated (Dove, 2011 ). Some researchers have attempted to “ground” abstract concepts in metaphors (Lakoff & Johnson, 1999 ), emotional or internal states (Vigliocco et al., 2013 ), or temporally distributed events and situations (Barsalou & Wiemer-Hastings, 2005 ), but the mechanistic account for the acquisition of abstract concepts is still an active area of research. Finally, there is a dearth of formal models that provide specific mechanisms by which features acquired by the sensorimotor system might be combined into a coherent concept. Some accounts suggest that semantic representations may be created by patterns of synchronized neural activity, which may represent different sensorimotor information (Schneider, Debener, Oostenveld, & Engel, 2008 ). Other work has suggested that certain regions of the cortex may serve as “hubs” or “convergence zones” that combine features into coherent representations (Patterson, Nestor, & Rogers, 2007 ), and may reflect temporally synchronous activity within areas to which the features belong (Damasio, 1989 ). However, comparisons of such approaches to DSMs remain limited due to the lack of formal grounded models, although there have been some recent attempts at modeling perceptual schemas (Pezzulo & Calvi, 2011 ) and Hebbian learning (Garagnani & Pulvermüller, 2016 ).

Proponents of the grounded cognition view have also presented empirical (Glenberg & Robertson, 2000; Rubinstein, Levi, Schwartz, & Rappoport, 2015) and theoretical criticisms (Barsalou, 2003; Perfetti, 1998) of DSMs over the years. For example, Glenberg and Robertson (2000) reported three experiments to argue that high-dimensional space models like LSA/HAL are inadequate theories of meaning, because they fail to distinguish between sensible (e.g., filling an old sweater with leaves) and nonsensical sentences (e.g., filling an old sweater with water) based on cosine similarity between words (but see Burgess, 2000). Some recent work also shows that traditional DSMs trained solely on linguistic corpora do indeed lack salient features and attributes of concepts. Baroni and Lenci (2008) compared a model analogous to LSA with attributes derived from McRae, Cree, Seidenberg, and McNorgan (2005) and an image-based dataset. They provided evidence that DSMs entirely miss external (e.g., a car <has wheels>) and surface-level (e.g., a banana <is yellow>) properties of objects, and instead focus on taxonomic (e.g., cat - dog) and situational relations (e.g., spoon - bowl), which are more frequently encountered in natural language. More recently, Rubinstein et al. (2015) evaluated four computational models, including word2vec and GloVe, and showed that DSMs are poor at classifying attributive properties (e.g., an elephant <is large>), but relatively good at classifying taxonomic properties (e.g., apple <is a> fruit) identified by human subjects in a property generation task (also see Collell & Moens, 2016; Lucy & Gauthier, 2017).

Collectively, these studies appear to underscore the intuitions of the grounded cognition researchers that semantic models based solely on linguistic sources do not produce sufficiently rich representations. While this is true, it is important to realize here that the failure of DSMs to encode these perceptual features is a function of the training corpora they are exposed to, i.e., a practical limitation, and not necessarily a theoretical one. Early DSMs were trained on linguistic corpora not because it was intrinsic to the theoretical assumptions made by the models, but because text corpora were easily available (for more fleshed-out arguments on this issue, see Burgess, 2000 ; Günther et al., 2019 ; Landauer & Dumais, 1997 ). Therefore, the more important question is whether DSMs can be adequately trained to derive statistical regularities from other sources of information (e.g., visual, haptic, auditory etc.), and whether such DSMs can effectively incorporate these signals to construct “grounded” semantic representations.

Grounding DSMs through feature integration

The lack of grounding in standard DSMs led to a resurging interest in early feature-based models (McRae et al., 1997 ; Smith et al., 1974 ). As discussed earlier, early feature-based models represented words as a collection of binary features (e.g., birds have wings, whereas cars do not), and words with similar meanings had greater overlap in their constituent features (McCloskey & Glucksberg, 1979 ; Smith et al., 1974 ; Tversky, 1977 ), although these early models did not have explicit mechanisms to account for how features were learned in the first place. However, one important strength of feature-based models was that the features encoded could directly be interpreted as placeholders for grounded sensorimotor experiences (Baroni & Lenci, 2008 ). For example, the representation of a banana is distributed across several hundred dimensions in a distributional approach, and these dimensions may or may not be interpretable (Jones, Willits, & Dennis, 2015 ), but the perceptual experience of the banana’s color being yellow can be directly encoded in feature-based models (e.g., banana <is yellow>).
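A toy sketch of this interpretability contrast (the feature set and vectors below are hypothetical, chosen only for illustration): in a feature-based representation, each dimension is a nameable property, and similarity falls directly out of feature overlap.

```python
import numpy as np

# Hypothetical binary feature vectors (features of the kind produced in property norms)
features = ["has_wings", "can_fly", "is_yellow", "has_wheels", "is_edible"]
banana = np.array([0, 0, 1, 0, 1])
canary = np.array([1, 1, 1, 0, 0])
car    = np.array([0, 0, 0, 1, 0])

def cosine(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

print(cosine(banana, canary))   # share <is_yellow>: some similarity
print(cosine(banana, car))      # no shared features: zero similarity

# Unlike latent DSM dimensions, each dimension is directly interpretable:
print([f for f, v in zip(features, banana) if v])   # ['is_yellow', 'is_edible']
```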

However, it is important to note here that, again, the fact that features can be verbalized and are more interpretable than dimensions in a DSM is a result of the features having been extracted from property generation norms rather than textual corpora. Therefore, it is possible that some of the information captured by property generation norms may already be encoded in DSMs, albeit through less interpretable dimensions. Indeed, a systematic comparison of feature-based and distributional models by Riordan and Jones (2011) demonstrated that representations derived from DSMs produced categorical structure comparable to feature representations generated by humans, and the type of information encoded by both types of models was highly correlated but also complementary. For example, DSMs gave more weight to actions and situations (e.g., eat, fly, swim) that are frequently encountered in the linguistic environment, whereas feature-based representations were better at capturing object-specific features (e.g., <is yellow>, <made of metal>) that potentially reflected early sensorimotor experiences with objects. Riordan and Jones argued that children may be more likely to initially extract information from sensorimotor experiences. However, as they acquire more linguistic experience, they may shift to extracting the redundant information from the distributional structure of language and rely on perception only for novel concepts or for the unique sources of information it provides. This idea is consistent with the symbol interdependency hypothesis (Louwerse, 2011), which proposes that while words must be grounded in sensorimotor action and perception, they also maintain rich connections with each other at the symbolic level, which allows for more efficient language processing by making it possible to skip grounded simulations when unnecessary. The notion that both sources of information are critical to the construction of meaning presents a promising approach to reconciling distributional models with the grounded cognition view of language (for similar accounts, see Barsalou, Santos, Simmons, & Wilson, 2008; Paivio, 1991).

Recent work in computational modeling has attempted to integrate featural information with distributional information to enrich semantic representations. For example, Andrews et al. (2009) used a Bayesian probabilistic topic model to jointly model semantic representations using experiential feature-based (e.g., an ostrich <is big>, <does not fly>, <has feathers>, etc.) and linguistic (e.g., ostrich and emu co-occur) data as complementary sources of information. Further, Vigliocco, Meteyard, Andrews, and Kousta (2009) argued that affective and internal states can serve as another data source that could potentially enrich semantic representations, particularly for abstract concepts that lack sensorimotor associations (Kousta, Vigliocco, Vinson, Andrews, & Del Campo, 2011). The information-integration approach has also been applied to other types of DSMs. For example, Jones and Recchia (2010) integrated feature-based information with BEAGLE to show that temporal linguistic information plays a critical role in generating accurate semantic representations. Johns and Jones (2012) have also explored the integration of perceptual information with linguistic information based on simple associative mechanisms, borrowing principles from Hintzman's (1988) MINERVA architecture and Kwantes' (2005) model. Their model provided a proof of concept that perceptually rich semantic representations may be constructed by grounding them in already formed or learned representations of other words (accessible via feature norms). This notion of grounding representations in previously learned words has also been explored by Howell et al. (2005) using a recurrent NN model. Using a modified version of Elman's (1990) SRN with two additional output layers for noun and verb features, Howell et al. trained the model to map phonetically presented input words (nouns) to semantic features and to perform a grammatical word prediction task. Howell et al. argued that this type of learning mechanism could be applied to simulate a "propagation of grounding" effect, where sensorimotor information from early, concrete words acquired by children feeds into semantic representations of novel words, although this proposal was not formally tested in the paper. Other work on integrating featural information has explored training a recurrent NN model with sensorimotor feature inputs and patterns of co-occurrence to account for a wide variety of behavioral patterns consistent with normal and impaired semantic cognition (Hoffman et al., 2018), implementing a feedforward NN to apply feature learning to a simple word-word co-occurrence model (Durda, Buchanan, & Caron, 2009), and using feature-based vectors as input to a random-vector accumulation model (Vigliocco, Vinson, Lewis, & Garrett, 2004).

Multimodal DSMs

Despite their considerable success, an important limitation of feature-integrated distributional models is that the perceptual features available are often restricted to small datasets (e.g., 541 concrete nouns from McRae et al., 2005 ), although some recent work has attempted to collect a larger dataset of feature norms (e.g., 4436 concepts; Buchanan, Valentine, & Maxwell, 2019 ). Moreover, the features produced in property generation tasks are potentially prone to saliency biases (e.g., hardly any participant will produce the feature <has a head> for a dog because having a head is not salient or distinctive), and thus can only serve as an incomplete proxy for all the features encoded by the brain. To address these concerns, Bruni et al. ( 2014 ) applied advanced computer vision techniques to automatically extract visual and linguistic features from multimodal corpora to construct multimodal distributional semantic representations. Using a technique called “bag-of-visual-words” (Sivic & Zisserman, 2003 ), the model discretized visual images and produced visual units comparable to words in a text document. The resulting image matrix was then concatenated with a textual matrix constructed from a natural language corpus using singular value decomposition to yield a multimodal semantic representation. Bruni et al. showed that this model was superior to a purely text-based approach and successfully predicted semantic relations between related words (e.g., ostrich - emu ) and clustering of words into superordinate concepts (e.g., ostrich - bird ).
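The fusion step described above can be sketched roughly as follows (the count matrices are random stand-ins; Bruni et al.'s actual pipeline involves careful visual feature extraction and weighting): concatenate the word-by-text-context and word-by-visual-word count matrices and reduce the result with truncated SVD to obtain joint multimodal embeddings.

```python
import numpy as np

rng = np.random.default_rng(2)
n_words = 200
text_counts = rng.poisson(1.0, size=(n_words, 500))      # word x text-context counts (hypothetical)
visual_counts = rng.poisson(1.0, size=(n_words, 300))     # word x visual-word counts ("bag of visual words")

# Concatenate the two modality-specific matrices row-wise (one row per word) ...
multimodal = np.hstack([text_counts, visual_counts]).astype(float)

# ... and reduce with truncated SVD to obtain fused multimodal embeddings
U, S, Vt = np.linalg.svd(multimodal, full_matrices=False)
k = 50
embeddings = U[:, :k] * S[:k]      # each row: a 50-dimensional multimodal word representation
```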

This multimodal approach to semantic representation is currently a thriving area of research (Feng & Lapata, 2010 ; Kiela & Bottou, 2014 ; Lazaridou et al., 2015 ; Silberer & Lapata, 2012 , 2014 ). Advances in the machine-learning community have majorly contributed to accelerating the development of these models. In particular, Convolutional Neural Networks (CNNs) were introduced as a powerful and robust approach for automatically extracting meaningful information from images, visual scenes, and longer text sequences. The central idea behind CNNs is to apply a non-linear function (a “filter”) to a sliding window of the full chunk of information, e.g., pixels in an image, words in a sentence, etc. The filter transforms the larger window of information into a fixed d -dimensional vector, which captures the important properties of the pixels or words in that window. Convolution is followed by a “pooling” step, where vectors from different windows are combined into a single d -dimensional vector, by taking the maximum or average value of each of the d -dimensions across the windows. This process extracts the most important features from a larger set of pixels (see Fig. 8 ), or the most informative k -grams in a long sentence. CNNs have been flexibly applied to different semantic tasks like sentiment analysis and machine translation (Collobert et al., 2011 ; Kalchbrenner, Grefenstette, & Blunsom, 2014 ), and are currently being used to develop multimodal semantic models.

Figure 8. A depiction of a typical convolutional neural network that detects vertical edges in an image. A sliding filter is multiplied with the pixelized image to produce a convolved matrix, and a pooling step then compresses the convolved output into a smaller matrix by selecting the maximum value from each 2 × 2 sub-matrix. The resulting 2 × 2 matrix is the final representation of the image, highlighting its vertical edges.
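As a concrete illustration of the convolution and pooling operations summarized in Fig. 8 (the toy image and filter values below are hypothetical), a vertical-edge filter is slid over a small image and the convolved output is then max-pooled into a compressed feature map:

```python
import numpy as np

image = np.array([[0, 0, 9, 9],        # toy grayscale image with a vertical edge
                  [0, 0, 9, 9],
                  [0, 0, 9, 9],
                  [0, 0, 9, 9]], dtype=float)

kernel = np.array([[-1, 1],            # hypothetical 2x2 vertical-edge filter
                   [-1, 1]], dtype=float)

# Convolution: slide the filter over the image and sum the elementwise products
conv = np.zeros((3, 3))
for i in range(3):
    for j in range(3):
        conv[i, j] = np.sum(image[i:i+2, j:j+2] * kernel)

# Max pooling: keep the largest value in each 2x2 sub-matrix of the convolved output
pooled = np.array([[conv[0:2, 0:2].max(), conv[0:2, 1:3].max()],
                   [conv[1:3, 0:2].max(), conv[1:3, 1:3].max()]])

print(conv)     # the column of large values marks where the vertical edge lies
print(pooled)   # compressed feature map retaining the strong edge response
```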

Kiela and Bottou (2014) applied CNNs to extract the most meaningful features from images in a large image database (ImageNet; Deng et al., 2009) and then concatenated these image vectors with linguistic word2vec vectors to produce semantic representations superior to those of Bruni et al. (2014; also see Silberer & Lapata, 2014). Lazaridou et al. (2015) constructed a multimodal word2vec model that was trained to jointly learn visual and semantic representations for a subset of words (using image-based CNNs and word2vec), and this learning was then generalized to the entire corpus, thus echoing Howell et al.'s (2005) intuitions of "propagation of grounding." Lazaridou et al. also demonstrated how the learning of abstract words might be grounded in concrete scenes (e.g., freedom might be the inferred concept from a scene of a person raising their hands in a protest), an intuitively powerful proposal that can potentially demystify the acquisition of abstract concepts but clearly needs further exploration.

There is also some work within the domain of associative network models of semantic memory that has focused on integrating different sources of information to construct the semantic networks. One particular line of research has investigated combining word-association norms with featural information, co-occurrence information, and phonological similarity to form multiplex networks (Stella, Beckage, & Brede, 2017 ; Stella, Beckage, Brede, & De Domenico, 2018 ). Stella et al. ( 2017 ) demonstrated that the “layers” in such a multiplex network differentially influence language acquisition, with all layers contributing equally initially but the association layer overtaking the word learning process with time. This proposal is similar to the ideas presented earlier regarding how perceptual or sensorimotor experience might be important for grounding words acquired earlier, and words acquired later might benefit from and derive their representations through semantic associations with these early experiences (Howell et al., 2005 ; Riordan & Jones, 2011 ). In this sense, one can think of phonological information and featural information providing the necessary grounding to early acquired concepts. This “grounding” then propagates and enriches semantic associations, which are easier to access as the vocabulary size increases and individuals develop more complex semantic representations.

Given the success of integrated and multimodal DSMs that use state-of-the-art modeling techniques to incorporate other modalities to augment linguistic representations, it appears that the claim that semantic models are "amodal" and "ungrounded" may need to be revisited. Indeed, the fact that multimodal semantic models can adequately encode perceptual features (Bruni et al., 2014; Kiela & Bottou, 2014) and can approximate human judgments of taxonomic and visual similarity (Lazaridou et al., 2015) suggests that the limitations of previous models (e.g., LSA, HAL, etc.) were more practical than theoretical. Of course, incorporating other modalities besides vision is critical to this enterprise, and although there have been some efforts to integrate sound and olfactory data into semantic representations (Kiela, Bulat, & Clark, 2015; Kiela & Clark, 2015; Lopopolo & Miltenburg, 2015), these approaches are limited by the availability of large datasets that capture other aspects of embodiment that may be critical for meaning construction, such as touch, emotion, and taste. Investing resources in collecting and archiving multimodal datasets (e.g., video data) is an important next step for advancing research in semantic modeling and broadening our understanding of the many facets that contribute to the construction of meaning.

IV. Compositional Semantic Representations

An additional aspect of extending our understanding of meaning by incorporating other sources of information is that meaning may be situated within, and form part of, higher-order semantic structures like sentence models, event models, or schemas. Indeed, language is inherently compositional in that morphemes combine to form words, words combine to form phrases, and phrases combine to form sentences. Moreover, behavioral evidence from sentential priming studies indicates that the meaning of words depends on complex syntactic relations (Morris, 1994). Further, it is well known that the meaning of a sentence itself is not merely the sum of the words it contains. For example, the sentence "John loves Mary" has a different meaning from "Mary loves John," despite both sentences having the same words. Thus, it is important to consider how compositionality can be incorporated into and inform existing models of semantic memory.

Compositional linguistic approaches

Associative network models do not have any explicit way of modeling compositionality, as they propose representations at the word level that cannot be straightforwardly scaled to higher-order semantic structures. On the other hand, distributional models have attempted to build compositionality into semantic representations by assigning roles to different entities in sentences (e.g., in “Mary loves John,” Mary is the lover and John is the lovee ; Dennis, 2004 , 2005 ), treating frequent phrases as single units and deriving phrase-based representations (e.g., treating proper names like New York as a single unit; Bannard, Baldwin, & Lascarides, 2003 ; Mikolov, Sutskever, et al., 2013 ) or forming pair-pattern matrices (e.g., encoding words that fulfil the pattern X cuts Y, i.e., mason : stone ; Turney & Pantel, 2010 ). However, these approaches were either not scalable for longer phrases or lacked the ability to model constituent parts separately (Mitchell & Lapata, 2010 ). Vector addition (or averaging) is another common method of combining distributional semantic representations for different words to form higher-order vectors (Landauer & Dumais, 1997 ), but this method is insensitive to word order and syntax and produces a blend that does not appropriately extract meaningful information from the constituent words (Mitchell & Lapata, 2010 ).
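The order-insensitivity of additive composition is easy to see in a short sketch (the word vectors are hypothetical random vectors): summing the constituent vectors of "John loves Mary" and "Mary loves John" yields exactly the same sentence vector.

```python
import numpy as np

rng = np.random.default_rng(3)
vec = {w: rng.normal(size=300) for w in ["john", "loves", "mary"]}

sent1 = vec["john"] + vec["loves"] + vec["mary"]   # "John loves Mary"
sent2 = vec["mary"] + vec["loves"] + vec["john"]   # "Mary loves John"
print(np.allclose(sent1, sent2))                   # True: addition ignores word order
```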

An alternative method of combining word-level vectors is through a matrix multiplication technique called tensor products . Tensor products are a way of computing pairwise products of the component word vector elements (Clark, Coecke, & Sadrzadeh, 2008 ; Clark & Pulman, 2007 ; Widdows, 2008 ), but this approach suffers from the curse of dimensionality, i.e., the resulting product matrix becomes very large as more individual vectors are combined. Circular convolution is a special case of tensor products that compresses the resulting product of individual word vectors into the same dimensionality (e.g., Jones & Mewhort, 2007 ). In a systematic review, Mitchell and Lapata ( 2010 ) examined several compositional functions applied onto a simple high-dimensional space model and a topic model space in a phrase similarity rating task (judging similarity for phrases like vast amount - large amount , start work - begin career , good place - high point , etc.). Specifically, they examined how different methods of combining word-level vectors (e.g., addition, multiplication, pairwise multiplication using tensor products, circular convolution, etc.) compared in their ability to explain performance in the phrase similarity task. Their findings indicated that dilation (a function that amplified some dimensions of a word when combined with another word, by differentially weighting the vector products between the two words) performed consistently well in both spaces, and circular convolution was the least successful in judging phrase similarity. This work sheds light on how simple compositional operations (like tensor products or circular convolution) may not sufficiently mimic human behavior in compositional tasks and may require modeling more complex interactions between words (i.e., functions that emphasize different aspects of a word).
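A minimal sketch of these two binding operations (the vectors are hypothetical; circular convolution is computed here via the Fourier transform, as is common in holographic models such as BEAGLE): the tensor product of two 300-dimensional vectors has 90,000 elements, whereas circular convolution compresses the same pairwise products back into 300 dimensions.

```python
import numpy as np

rng = np.random.default_rng(4)
a, b = rng.normal(size=300), rng.normal(size=300)

# Tensor (outer) product: binds the two vectors but squares the dimensionality
tensor = np.outer(a, b)                      # shape (300, 300): 90,000 elements

# Circular convolution: sums the same pairwise products into a vector of the original size
circ = np.real(np.fft.ifft(np.fft.fft(a) * np.fft.fft(b)))   # shape (300,)

print(tensor.size, circ.shape)
```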

Recent efforts in the machine-learning community have also attempted to tackle semantic compositionality using Recursive NNs. Recursive NNs represent a generalization of recurrent NNs that, given a syntactic parse-tree representation of a sentence, can generate hierarchical tree-like semantic representations by combining individual words in a recursive manner (conditional on how probable the composition would be). For example, Socher, Huval, Manning, and Ng ( 2012 ) proposed a recursive NN to compute compositional meaning representations. In their model, each word is assigned a vector that captures its meaning and also a matrix that contains information about how it modifies the meaning of another word. This representation for each word is then recursively combined with other words using a non-linear composition function (an extension of work by Mitchell & Lapata, 2010 ). For example, in the first iteration, the words very and good may be combined into a representation (e.g., very good ), which would recursively be combined with movie to produce the final representation (e.g., very good movie ). Socher et al. showed that this model successfully learned propositional logic, how adverbs and adjectives modified nouns, sentiment classification, and complex semantic relationships (also see Socher et al., 2013 ). Other work in this area has explored multiplication-based models (Yessenalina & Cardie, 2011 ), LSTM models (Zhu, Sobhani, & Guo, 2016 ), and paraphrase-supervised models (Saluja, Dyer, & Ruvini, 2018 ). Collectively, this research indicates that modeling the sentence structure through NN models and recursively applying composition functions can indeed produce compositional semantic representations that are achieving state-of-the-art performance in some semantic tasks.
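A stripped-down sketch of recursive composition over a parse tree (the weights, vectors, and composition function below are hypothetical simplifications; Socher et al.'s full model additionally learns a matrix for each word and trains all parameters end-to-end):

```python
import numpy as np

rng = np.random.default_rng(5)
d = 50
W = rng.normal(scale=0.1, size=(d, 2 * d))           # composition weights (hypothetical)
vec = {w: rng.normal(size=d) for w in ["very", "good", "movie"]}

def compose(left, right):
    """Combine two child vectors into a parent vector with a nonlinear function."""
    return np.tanh(W @ np.concatenate([left, right]))

# ((very good) movie): compose bottom-up along the parse tree
very_good = compose(vec["very"], vec["good"])
very_good_movie = compose(very_good, vec["movie"])    # phrase-level semantic representation
```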

Compositional Event Representations

Another critical aspect of modeling compositionality is being able to extend representations at the word or sentence level to higher-level cognitive structures like events or situations. The notion of schemas as a higher-level, structured representation of knowledge has been shown to guide language comprehension (Schank & Abelson, 1977 ; for reviews, see Rumelhart, 1991 ) and event memory (Bower, Black, & Turner, 1979 ; Hard, Tversky, & Lang, 2006 ). The past few years have seen promising advances in the field of event cognition (Elman & McRae, 2019 ; Franklin et al., 2019 ; Reynolds, Zacks, & Braver, 2007 ; Schapiro, Rogers, Cordova, Turk-Browne, & Botvinick, 2013 ). Importantly, while most event-based accounts have been conceptual, recent computational models have attempted to explicitly specify processes that might govern event knowledge. For example, Elman and McRae ( 2019 ) recently proposed a recurrent NN model of event knowledge, trained on activity sequences that make up events. An activity was defined as a collection of agents, patients, actions, instruments, states, and contexts, each of which were supplied as inputs to the network. The task of the network was to learn the internal structure of an activity (i.e., which features correlate with a particular activity) and also predict the next activity in sequence. Elman and McRae showed that this network was able to infer the co-occurrence dynamics of activities, and also predict sequential activity sequences for new events. For example, when presented with the activity sequence, “The crowd looks around. The skater goes to the podium. The audience applauds. The skater receives a ___”, the network activated the words podium and medal after the fourth sentence (“The skater receives a”) because both of these are contextually appropriate (receiving an award at the podium and receiving a medal ), although medal was more activated than podium as it was more appropriate within that context. This behavior of the model was strikingly consistent with N400 amplitudes observed for the same types of sentences in an ERP study (Metusalem et al., 2012 ), indicating that the model was able to make predictive inferences like human participants.

Franklin et al. ( 2019 ) recently proposed a probabilistic model of event cognition. In their model, each visual scene had a distributed vector representation, encoding the features that are relevant to the scene, which were learned using an unsupervised CNN. Additionally, scenes contained relational information that linked specific roles to specific fillers via circular convolution. A four-layer fully connected NN with Gated Recurrent Units (GRUs; a type of recurrent NN) was then trained to predict successive scenes in the model. Using the Chinese Restaurant Process, at each timepoint, the model evaluated its prediction error to decide if its current event representation was still a good fit. If the prediction error was high, the model chose whether it should switch to a different previously-learned event representation or create an entirely new event representation, by tuning parameters to evaluate total number of events and event durations. Franklin et al. showed that their model successfully learned complex event dynamics and simulated a wide variety of empirical phenomena. For example, the model’s ability to predict event boundaries from unannotated video data (Zacks, Kurby, Eisenberg, & Haroutunian, 2011 ) of a person completing everyday tasks like washing dishes, was highly correlated with grouped participant data and also produced similar levels of prediction error across event boundaries as human participants.
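The segmentation logic described above can be roughly sketched as follows (a deliberately simplified stand-in: the scene vectors, the predictive model, and the fixed error threshold are hypothetical, and the full model uses a learned recurrent predictor and a Chinese Restaurant Process prior rather than a hard threshold):

```python
import numpy as np

rng = np.random.default_rng(6)
# Hypothetical scene vectors: two "events", with a feature shift halfway through
scenes = np.vstack([rng.normal(0, 0.1, size=(10, 20)),
                    rng.normal(3, 0.1, size=(10, 20))])

def predict_next(prev_scene):
    """Stand-in predictive model: predicts the next scene will resemble the last one."""
    return prev_scene

event_ids, current = [0], 0
for t in range(1, len(scenes)):
    error = np.linalg.norm(scenes[t] - predict_next(scenes[t - 1]))
    if error > 2.0:        # large prediction error -> event boundary
        current += 1       # simplified; the full model consults a CRP prior over known events
    event_ids.append(current)

print(event_ids)           # a boundary is detected around scene 10
```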

This section reviewed some early and recent work on modeling compositionality by building higher-level representations, such as sentences and events, from lower-level units, such as words or discrete time points in video data. One important limitation of the event models described above is that they are not models of semantic memory per se, in that they neither contain rich semantic representations as input (Franklin et al., 2019), nor do they explicitly model how linguistic or perceptual input might be integrated to learn concepts (Elman & McRae, 2019). Therefore, while there have been advances in modeling word- and sentence-level semantic representations (Sections I and II), and, at the same time, there has been work on modeling how individuals experience events (Section IV), there appears to be a gap in the literature when it comes to integrating word-level semantic structures with event-level representations. Given the advances in language modeling discussed in this review, the integration of structured semantic knowledge (e.g., recursive NNs), multimodal semantic models, and models of event knowledge represents a promising avenue for future research that would enhance our understanding of how semantic memory is organized to represent higher-level knowledge structures. Another promising line of research in the direction of bridging this gap comes from the artificial intelligence literature, where neural network agents are being trained to learn language in a simulated grid world full of perceptual and linguistic information (Bahdanau et al., 2018; Hermann et al., 2017) using reinforcement learning principles. Indeed, McClelland, Hill, Rudolph, Baldridge, and Schütze (2019) recently advocated the need to situate language within a larger cognitive system. Conceptualizing semantic memory as part of a broader integrated memory system consisting of objects, situations, and the social world is certainly important for the success of the semantic modeling enterprise.

V. Open Issues and Future Directions

The question of how concepts are represented, stored, and retrieved is fundamental to the study of all cognition. Over the past few decades, advances in the fields of psychology, computational linguistics, and computer science have truly transformed the study of semantic memory. This paper reviewed classic and modern models of semantic memory that have attempted to provide explicit accounts of how semantic knowledge may be acquired, maintained, and used in cognitive tasks to guide behavior. Table 1 presents a short summary of the different types of models discussed in this review, along with their basic underlying mechanisms. In this concluding section, some open questions and potential avenues for future research in the field of semantic modeling will be discussed.

Data availability and abundance

Within the context of semantic modeling, data is a double-edged sword. On one hand, the availability of training data in the form of large text corpora such as Wikipedia articles, Google News corpora, etc. has led to an explosion of models such as word2vec (Mikolov, Chen, et al., 2013), fastText (Bojanowski et al., 2017), GloVe (Pennington et al., 2014), and ELMo (Peters et al., 2018), which have outperformed several standard models of semantic memory traditionally trained on less data. Additionally, with the advent of computational resources to quickly process even larger volumes of data using parallel computing, models such as BERT (Devlin et al., 2019), GPT-2 (Radford et al., 2019), and GPT-3 (Brown et al., 2020) are achieving unprecedented success in language tasks like question answering, reading comprehension, and language generation. At the same time, however, criticisms of ungrounded distributional models have led to the emergence of a new class of "grounded" distributional models. These models automatically derive non-linguistic information from other modalities like vision and speech using convolutional neural networks (CNNs) to construct richer representations of concepts. Even so, these grounded models are limited by the availability of multimodal sources of data, and consequently there have been recent efforts advocating the need for constructing larger databases of multimodal data (Günther et al., 2019).

On the other hand, training models on more data is only part of the solution. As discussed earlier, if models trained on several gigabytes of data perform as well as young adults who were exposed to far fewer training examples, it tells us little about human language and cognition. The field currently lacks systematic accounts of how humans can flexibly use language in so many different ways given the impoverished data they are exposed to. For example, children can generalize their knowledge of concepts fairly easily from relatively sparse data when learning language, and only require a few examples of a concept before they understand its meaning (Carey & Bartlett, 1978; Landau, Smith, & Jones, 1988; Xu & Tenenbaum, 2007). Furthermore, both children and young adults can rapidly learn new information from a single training example, a phenomenon referred to as one-shot learning. To address this particular challenge, several researchers are now building models that can exhibit few-shot learning, i.e., learning concepts from only a few examples, or zero-shot learning, i.e., generalizing already acquired information to never-before-seen data. Some of these approaches utilize pretrained models like GPT-2 and GPT-3 trained on very large datasets and generalize their architecture to new tasks (Brown et al., 2020; Radford et al., 2019). While this approach is promising, it appears to be circular because it still uses vast amounts of data to build the initial pretrained representations. Other work in this area has attempted to implement one-shot learning using Bayesian generative principles (Lake, Salakhutdinov, & Tenenbaum, 2015), and it remains to be seen how probabilistic semantic representations account for the generative and creative nature of human language.

Errors and degradation in language processing

Another striking aspect of the human language system is its tendency to break down and produce errors during cognitive tasks. Analyzing errors in language tasks provides important cues about the mechanics of the language system. Indeed, there is considerable work on tip-of-the-tongue experiences (James & Burke, 2000; Kumar, Balota, Habbert, Scaltritti, & Maddox, 2019), speech errors (Dell, 1990), errors in reading (Clay, 1968), language deficits (Hodges & Patterson, 2007; Shallice, 1988), and age-related differences in language tasks (Abrams & Farrell, 2011), all suggesting that the cognitive system is prone to interference, degradation, and variability. However, computational accounts of how language may be influenced by interference or degradation remain limited. Early connectionist models did provide ways of lesioning the network to account for neuropsychological deficits such as dyslexia (Hinton & Shallice, 1991; Plaut & Shallice, 1993) and category-specific semantic deficits (Farah & McClelland, 2013), and this general approach has recently been extended to train a recurrent NN on sensorimotor and co-occurrence-based information and simulate behavioral patterns observed in patients with semantic dementia and semantic aphasia (Hoffman et al., 2018). However, current state-of-the-art language models like word2vec, BERT, and GPT-2 or GPT-3 do not provide explicit accounts for how neuropsychological deficits may arise, or how systematic speech and reading errors are produced. Furthermore, while there is considerable empirical work investigating age-related differences in language-processing tasks (e.g., speech errors, picture naming performance, lexical retrieval, etc.), it is unclear how current semantic models would account for these age-related changes, although some recent work has compared the semantic network structure of older and younger adults (Dubossarsky, De Deyne, & Hills, 2017; Wulff, Hills, & Mata, 2018). Indeed, the deterministic nature of modern machine-learning models is drastically different from the stochastic nature of human language, which is prone to errors and variability (Kurach et al., 2019). Computational accounts of how the language system produces and recovers from errors will be an important part of building machine-learning models that can mimic human language.

Communication, social collaboration, and evolution

Another important aspect of language learning is that humans actively learn from each other and through interactions with their social counterparts, whereas the majority of computational language models assume that learners are simply processing incoming information in a passive manner (Günther et al., 2019 ). Indeed, there is now ample evidence to suggest that language evolved through natural selection for the purposes of gathering and sharing information (Pinker, 2003 , p. 27; DeVore & Tooby, 1987 ), thereby allowing for personal experiences and episodic information to be shared among humans (Corballis, 2017a , 2017b ). Consequently, understanding how artificial and human learners may communicate and collaborate in complex tasks is currently an active area of research. For example, some recent work in natural language processing has attempted to model interactions and search processes in collaborative language games, such as Codenames (Kumar, Steyvers, & Balota, under review ; Shen, Hofer, Felbo, & Levy, 2018 , also see Kim, Ruzmaykin, Truong, & Summerville, 2019 ), Password (Xu & Kemp, 2010 ), and navigational games (Wang, Liang, & Manning, 2016 ), and suggested that speakers and listeners do indeed calibrate their responses based on feedback from their conversational partner. Another body of work currently being led by technology giants like Google and OpenAI is focused on modeling interactions in multiplayer games like football (Kurach et al., 2019 ) and Dota 2 (OpenAI, 2019 ). This work is primarily based on reinforcement learning principles, where the goal is to train neural network agents to interact with their environment and perform complex tasks (Sutton & Barto, 1998 ). Although these research efforts are less language-focused, deep reinforcement learning models have also been proposed to specifically investigate language learning. For example, Li et al. ( 2016 ) trained a conversational agent using reinforcement learning, and a reward metric based on whether the dialogues generated by the model were easily answerable, informative, and coherent. Other learning-based models have used adversarial training, a method by which a model is trained to produce responses that would be indistinguishable from human responses (Li et al., 2017 ), a modern version of the Turing test (also see Spranger, Pauw, Loetzsch, & Steels, 2012 ). However, these recent attempts are still focused on independent learning, whereas psychological and linguistic research suggests that language evolved for purposes of sharing information, which likely has implications for how language is learned in the first place. Clearly, this line of work is currently in its nascent stages and requires additional research to fully understand and model the role of communication and collaboration in developing semantic knowledge.

Multilingual semantic models

A computational model can only be considered a model of semantic memory if it can be broadly applied to any semantic memory system and does not depend on the specific language of training. Therefore, an important challenge for computational semantic models is to be able to generalize the basic mechanisms of building semantic representations from English corpora to other languages. Some recent work has applied character-level CNNs to learn the rich morphological structure of languages like Arabic, French, and Russian (Kim, Jernite, Sontag, & Rush, 2016 ; also see Botha & Blunsom, 2014 ; Luong, Socher, & Manning, 2013 ). These approaches clearly suggest that pure word-level models that have occupied centerstage in the English language modeling community may not work as well in other languages, and subword information may in fact be critical in the language learning process. More recent embeddings like fastText (Bojanowski et al., 2017 ) that are trained on sub-lexical units are a promising step in this direction. Furthermore, constructing multilingual word embeddings that can represent words from multiple languages in a single distributional space is currently a thriving area of research in the machine-learning community (e.g., Chen & Cardie, 2018 ; Lample, Conneau, Ranzato, Denoyer, & Jégou, 2018 ). Overall, evaluating modern machine-learning models on other languages can provide important insights about language learning and is therefore critical to the success of the language modeling enterprise.

Revisiting benchmarks for semantic models

A critical issue that has not received adequate attention in the semantic modeling field is the quality and nature of benchmark test datasets that are often considered the final word for comparing state-of-the-art machine-learning-based language models. The General Language Understanding Evaluation (GLUE; Wang et al., 2018 ) benchmark was recently proposed as a collection of language-based task datasets, including the Corpus of Linguistic Acceptability (CoLA; Warstadt et al., 2018 ), the Stanford Sentiment Treebank (Socher et al., 2013 ), and the Winograd Schema Challenge (Levesque, Davis, & Morgenstern, 2012 ), among a total of 11 language tasks. Other popular benchmarks in the field include decaNLP (McCann, Keskar, Xiong, & Socher, 2018 ), the Stanford Question Answering Dataset (SQuAD; Rajpurkar et al., 2018 ), Word Similarity Test Collection (WordSim-33; Finkelstein et al., 2002 ) among others. While these benchmarks offer a standardized method of comparing performance across models, several of the tasks included within these benchmark datasets either consist of crowdsourced information collected from an unknown number of participants (e.g., SQuAD), scores or annotations based on very few human participants (e.g., 16 participants assessed similarity for 200 word-pairs in the WordSim-33 dataset), or sometimes datasets with no established human benchmark (e.g., the GLUE Diagnostic dataset, Wang et al., 2018 ). This is in contrast to more psychologically motivated models (e.g., semantic network models, BEAGLE, Temporal Context Model, etc.), where model performance is often compared against human baselines, for example in predicting accuracy or response latencies to perform a particular task, or through large-scale normed databases of human performance in semantic tasks (e.g., English Lexicon Project; Balota et al., 2007 ; Semantic Priming Project; Hutchison et al., 2013 ). Therefore, to evaluate whether state-of-the-art machine learning models like ELMo, BERT, and GPT-2 are indeed plausible psychological models of semantic memory, it is important to not only establish human baselines for benchmark tasks in the machine-learning community, but also explicitly compare model performance to human baselines in both accuracy and response times.
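One simple form such a comparison can take (the embeddings, word pairs, and ratings below are hypothetical placeholders for a normed dataset) is to correlate model-derived similarities with human relatedness judgments for the same word pairs:

```python
import numpy as np
from scipy.stats import spearmanr

rng = np.random.default_rng(7)
words = ["ostrich", "emu", "banana", "truck"]
vectors = {w: rng.normal(size=300) for w in words}          # stand-in model embeddings

# Hypothetical human relatedness ratings for word pairs (e.g., from a normed dataset)
pairs = [("ostrich", "emu"), ("ostrich", "banana"), ("banana", "truck")]
human_ratings = [9.1, 2.3, 1.4]

def cosine(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

model_sims = [cosine(vectors[w1], vectors[w2]) for w1, w2 in pairs]
rho, p = spearmanr(model_sims, human_ratings)
print(rho)    # rank-order agreement between model similarities and the human ratings
```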

There have been some recent efforts in this direction. For example, Bender (2015) tested over 400 Amazon Mechanical Turk users on the Winograd Schema Challenge (a task that requires the use of world knowledge, commonsense reasoning, and anaphora resolution) and provided quantitative baselines for accuracy and response times, which offer useful benchmarks for assessing the extent to which machine-learning models explain human behavior (also see Morgenstern, Davis, & Ortiz, 2016). Further, Chen et al. (2017) compared the performance of the word2vec model against human baselines for solving analogies using relational similarity judgments to show that word2vec successfully captures only a subset of analogy relations. Additionally, Lazaridou, Marelli, and Baroni (2017) recently compared the performance of their multimodal skip-gram model (Lazaridou et al., 2015) against human relatedness judgments to visual and word cues for newly learned concepts to show that the model performed very similarly to human participants. Despite these promising studies, such efforts remain limited due to the goals of machine learning often being application-focused and the goals of psychology being explanation-focused. Explicitly comparing model performance to behavioral task performance represents an important next step towards reconciling these two fields, and also towards combining representational and process-based accounts of how semantic memory guides cognitive behavior.

Prioritizing mechanistic accounts

Despite the lack of systematic comparisons to human baselines, an important takeaway that emerges from this review is that several state-of-the-art language models such as word2vec (Mikolov, Chen, et al., 2013 , Mikolov, Sutskever, et al., 2013 ), ELMo (Peters et al., 2018 ), BERT (Devlin et al., 2019 ), GPT-2 (Radford et al., 2019 ), and GPT-3 (Brown et al., 2020 ) do indeed show impressive performance across a wide variety of semantic tasks such as summarization, question answering, and sentiment analysis. However, despite their success, relatively little is known about how these models are able to produce this complex behavior, and exactly what is being learned by them in their process of building semantic representations. Indeed, there is some skepticism in the field about whether these models are truly learning something meaningful or simply exploiting spurious statistical cues in language, which may or may not reflect human learning. For example, Niven and Kao ( 2019 ) recently evaluated BERT’s performance in a complex argument-reasoning comprehension task, where world knowledge was critical for evaluating a particular claim. For example, to evaluate the strength of the claim “Google is not a harmful monopoly,” an individual may reason that “people can choose not to use Google,” and also provide the additional warrant that “other search engines do not redirect to Google” to argue in favor of the claim. On the other hand, if the alternative , “all other search engines redirect to Google” is true, then the claim would be false. Niven and Kao found that BERT was able to achieve state-of-the-art performance with 77% accuracy in this task, without any explicit world knowledge. For example, knowing what a monopoly might mean in this context (i.e., restricting consumer choices) and that Google is a search engine are critical pieces of knowledge required to evaluate the claim. Further analysis showed that BERT was simply exploiting statistical cues in the warrant (i.e., the word “not”) to evaluate the claim, and once this cue was removed through an adversarial test dataset, BERT’s performance dropped to chance levels (53%). The authors concluded that BERT was not able to learn anything meaningful about argument comprehension, even though the model performed better than other LSTM and vector-based models and was only a few points below the human baseline on the original task (also see Zellers, Holtzman, Bisk, Farhadi, & Choi, 2019 , for a similar demonstration on a commonsense-based inference task).

These results are especially important if state-of-the-art models like word2vec, ELMo, BERT, or GPT-2/3 are to be considered plausible models of semantic memory, and they certainly underscore the need to focus on mechanistic accounts of model behavior. Understanding how machine-learning models arrive at answers to complex semantic problems is as important as evaluating how many questions they answer correctly. Humans not only extract complex statistical regularities from natural language and the environment, but also form semantic structures of world knowledge that influence their behavior in tasks like complex inference and argument reasoning. Therefore, explicitly testing machine-learning models on the specific knowledge they have acquired will be extremely important for ensuring that the models are truly learning meaning and not simply exhibiting the "Clever Hans" effect (Heinzerling, 2019). To that end, explicit process-based accounts that shed light on the cognitive processes operating on underlying semantic representations across different semantic tasks may be useful in evaluating the psychological plausibility of different models. For instance, while distributional models perform well on a broad range of semantic tasks on average (Bullinaria & Levy, 2007; Mandera et al., 2017), it is unclear why their performance is better on tasks like synonym detection (Bullinaria & Levy, 2007) and similarity judgments (Bruni et al., 2014) and worse for semantic priming effects (Hutchison, Balota, Cortese, & Watson, 2008; Mandera et al., 2017), free association (Griffiths et al., 2007; Kenett et al., 2017), and complex inference tasks (Niven & Kao, 2019). A promising step towards understanding how distributional models may dynamically influence task performance was taken by Rotaru, Vigliocco, and Frank (2018), who showed that combining semantic network-based representations derived from LSA, GloVe, and word2vec with a dynamic spreading-activation framework significantly improved the predictive power of the models on semantic tasks. In light of this work, testing competing process-based models (e.g., spreading activation, drift-diffusion, temporal context) in combination with structural or representational accounts of semantic memory (e.g., prediction-based models, topic models) represents the next step in fully understanding how structure and processes interact to produce complex behavior.
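As a rough illustration of this kind of structure-plus-process coupling, the sketch below (an assumption-based illustration, not Rotaru et al.'s implementation) builds a small semantic network by thresholding cosine similarities between word vectors and then spreads activation from a prime word. The vectors here are random placeholders, and the threshold, retention rate, and number of steps are arbitrary choices.

```python
# Hypothetical sketch: a semantic network from word-vector similarities plus a
# simple spreading-activation process. The embeddings are random placeholders;
# in practice they would come from LSA, GloVe, or word2vec.
import numpy as np

words = ["doctor", "nurse", "hospital", "bread", "butter"]
rng = np.random.default_rng(0)
E = rng.normal(size=(len(words), 50))          # placeholder embedding matrix

# Cosine-similarity adjacency matrix: keep positive similarities, drop self-loops,
# and row-normalize so each node's outgoing activation sums to one.
unit = E / np.linalg.norm(E, axis=1, keepdims=True)
sim = unit @ unit.T
A = np.where(sim > 0.0, sim, 0.0)
np.fill_diagonal(A, 0.0)
A = A / A.sum(axis=1, keepdims=True).clip(min=1e-12)

def spread_activation(source, steps=3, retention=0.5):
    """Iteratively pass a fraction of each node's activation to its neighbors."""
    a = np.zeros(len(words))
    a[words.index(source)] = 1.0
    for _ in range(steps):
        a = retention * a + (1 - retention) * (A.T @ a)
    return dict(zip(words, a.round(3)))

# Activation reaching each word after priming "doctor"; with real embeddings,
# related words ("nurse", "hospital") should accumulate more activation and could
# be used to predict, e.g., priming magnitudes or free-association responses.
print(spread_activation("doctor"))
```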

The nature of knowledge representation and the processes used to retrieve that knowledge in response to a given task will continue to be the center of considerable theoretical and empirical work across multiple fields including philosophy, linguistics, psychology, computer science, and cognitive neuroscience. The ultimate goal of semantic modeling is to propose one architecture that can simultaneously integrate perceptual and linguistic input to form meaningful semantic representations, which in turn naturally scales up to higher-order semantic structures, and also performs well in a wide range of cognitive tasks. Given the recent advances in developing multimodal DSMs, interpretable and generative topic models, and attention-based semantic models, this goal at least appears to be achievable. However, some important challenges still need to be addressed before the field will be able to integrate these approaches and design a unified architecture. For example, addressing challenges like one-shot learning, language-related errors and deficits, the role of social interactions, and the lack of process-based accounts will be important in furthering research in the field. Although the current modeling enterprise has come very far in decoding the statistical regularities humans use to learn meaning from the linguistic and perceptual environment, no single model has been successfully able to account for the flexible and innumerable ways in which humans acquire and retrieve knowledge. Ultimately, integrating lessons learned from behavioral studies showing the interaction of world knowledge, linguistic and environmental context, and attention in complex cognitive tasks with computational techniques that focus on quantifying association, abstraction, and prediction will be critical in developing a complete theory of language.

Author’s Note

I sincerely thank David A. Balota, Jeffrey M. Zacks, Michael N. Jones, and Ian G. Dobbins for their extremely insightful feedback and helpful comments on earlier versions of the manuscript.

Open Practices Statement

Given the theoretical nature of this review, no data or program code is available. However, Table 1 provides a succinct summary of the key models discussed in this review.

Abbott, J. T., Austerweil, J. L., & Griffiths, T. L. (2015). Random walks on semantic networks can resemble optimal foraging. Psychological Review, 122(3), 558–569.

Abrams, L., & Farrell, M. T. (2011). Language processing in normal aging. The Handbook of Psycholinguistic and Cognitive Processes: Perspectives in Communication Disorders, 49–73.

Alammar, J. (2018). The Illustrated BERT, ELMo, and co. (How NLP Cracked Transfer Learning). Retrieved from http://jalammar.github.io/illustrated-bert/ .

Albert, R., Jeong, H., & Barabási, A. L. (2000). Error and attack tolerance of complex networks. Nature , 406 (6794), 378.

Anderson, J. R. (2000). Learning and Memory: An Integrated Approach . John Wiley & Sons Inc.

Andrews, M., & Vigliocco, G. (2010). The hidden Markov topic model: A probabilistic model of semantic representation. Topics in Cognitive Science , 2 (1), 101–113.

Andrews, M., Vigliocco, G., & Vinson, D. (2009). Integrating experiential and distributional data to learn semantic representations. Psychological Review , 116 (3), 463.

Ashby, F. G., & Maddox, W. T. (1993). Relations between prototype, exemplar, and decision bound models of categorization. Journal of Mathematical Psychology , 37 (3), 372–400.

Asr, F. T., Willits, J., & Jones, M. (2016). Comparing Predictive and Co-occurrence Based Models of Lexical Semantics Trained on Child-directed Speech. Proceedings of the Annual Meeting of the Cognitive Science Society .

Avery, J., Jones, M.N. (2018). Comparing models of semantic fluency: Do humans forage optimally, or walk randomly? In Proceedings of the 40 th Annual Meeting of the Cognitive Science Society . 118–123.

Bahdanau, D., Cho, K., & Bengio, Y. (2014). Neural machine translation by jointly learning to align and translate. arXiv preprint arXiv:1409.0473 .

Bahdanau, D., Hill, F., Leike, J., Hughes, E., Hosseini, A., Kohli, P., & Grefenstette, E. (2018). Learning to understand goal specifications by modelling reward. arXiv preprint arXiv:1806.01946 .

Balota, D. A., & Coane, J. H. (2008). Semantic memory. In Byrne JH, Eichenbaum H, Mwenzel R, Roediger III HL, Sweatt D (Eds.). Learning and Memory: A Comprehensive Reference (pp. 511–34). Amsterdam: Elsevier.

Balota, D. A., & Lorch, R. F. (1986). Depth of automatic spreading activation: Mediated priming effects in pronunciation but not in lexical decision. Journal of Experimental Psychology: Learning, Memory, and Cognition , 12 (3), 336.

Balota, D. A., & Paul, S. T. (1996). Summation of activation: Evidence from multiple primes that converge and diverge within semantic memory. Journal of Experimental Psychology: Learning, Memory & Cognition , 22 , 827–845.

Balota, D. A., & Yap, M. J. (2006). Attentional control and flexible lexical processing: Explorations of the magic moment of word recognition. In S. Andrews (Ed.), From inkmarks to ideas: Current issues in Lexical processing , 229–258.

Balota, D. A., Yap, M. J., Hutchison, K. A., Cortese, M. J., Kessler, B., Loftis, B., … Treiman, R. (2007). The English lexicon project. Behavior Research Methods , 39 (3), 445–459.

Bannard, C., Baldwin, T., & Lascarides, A. (2003). A statistical approach to the semantics of verb-particles. In Proceedings of the ACL 2003 workshop on Multiword expressions: analysis, acquisition and treatment (pp. 65–72).

Barabási, A. L., & Albert, R. (1999). Emergence of scaling in random networks. Science , 286 (5439), 509–512.

Baroni, M., Dinu, G., & Kruszewski, G. (2014). Don't count, predict! A systematic comparison of context-counting vs. context-predicting semantic vectors. In Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) (Vol. 1, pp. 238–247).

Baroni, M., & Lenci, A. (2008). Concepts and properties in word spaces. Italian Journal of Linguistics , 20 (1), 55–88.

Barros-Loscertales, A., González, J., Pulvermüller, F., Ventura-Campos, N., Bustamante, J. C., Costumero, V., … Ávila, C. (2011). Reading salt activates gustatory brain regions: fMRI evidence for semantic grounding in a novel sensory modality. Cerebral Cortex , 22 (11), 2554–2563.

Barsalou, L. W. (1999). Perceptual symbol systems. Behavioral and Brain sciences , 22 (4), 577–660.

Barsalou, L. W. (2003). Abstraction in perceptual symbol systems. Philosophical Transactions of the Royal Society of London. Series B: Biological Sciences , 358(1435), 1177–1187.

Barsalou, L. W. (2008). Grounded cognition. Annual Review of Psychology, 59 , 617–645.

Barsalou, L. W. (2016). On staying grounded and avoiding quixotic dead ends. Psychonomic Bulletin & Review, 23 (4), 1122–1142.

Barsalou, L. W., Santos, A., Simmons, W. K., & Wilson, C. D. (2008). Language and simulation in conceptual processing. Symbols, Embodiment, and Meaning , 245–283.

Barsalou, L. W., & Wiemer-Hastings, K. (2005). Situating abstract concepts. Grounding Cognition: The role of Perception and Action in Memory, Language, and Thought , 129–163.

Beaty, R. E., Kaufman, S. B., Benedek, M., Jung, R. E., Kenett, Y. N., Jauk, E., … Silvia, P. J. (2016). Personality and complex brain networks: The role of openness to experience in default network efficiency. Human Brain Mapping , 37 (2), 773–779.

Bender, D. (2015). Establishing a Human Baseline for the Winograd Schema Challenge. In MAICS (pp. 39–45). Retrieved from https://pdfs.semanticscholar.org/1346/3717354ab61348a0141ebd3b0fdf28e91af8.pdf .

Bengio, Y., Goodfellow, I. J., & Courville, A. (2015). Deep Learning. Book in preparation for MIT Press.

Binder, K. S. (2003). Sentential and discourse topic effects on lexical ambiguity processing: An eye movement examination. Memory & Cognition , 31 (5), 690–702.

Binder, K. S., & Rayner, K. (1998). Contextual strength does not modulate the subordinate bias effect: Evidence from eye fixations and self-paced reading. Psychonomic Bulletin & Review , 5 (2), 271–276.

Blei, D. M., Ng, A. Y., & Jordan, M. I. (2003). Latent dirichlet allocation. Journal of Machine Learning Research , 3 , 993–1022.

Block, C. K., & Baldwin, C. L. (2010). Cloze probability and completion norms for 498 sentences: Behavioral and neural validation using event-related potentials. Behavior Research Methods , 42 (3), 665–670.

Bojanowski, P., Grave, E., Joulin, A., & Mikolov, T. (2017). Enriching word vectors with subword information. Transactions of the Association for Computational Linguistics , 5 , 135–146.

Botha, J., & Blunsom, P. (2014). Compositional morphology for word representations and language modelling. In International Conference on Machine Learning (pp. 1899–1907). Retrieved from http://proceedings.mlr.press/v32/botha14.pdf .

Bower, G. H., Black, J. B., & Turner, T. J. (1979). Scripts in memory for text. Cognitive Psychology , 11 (2), 177–220.

Bransford, J. D., & Johnson, M. K. (1972). Contextual prerequisites for understanding: Some investigations of comprehension and recall. Journal of Verbal Learning and Verbal Behavior , 11 (6), 717–726.

Britton, B. K. (1978). Lexical ambiguity of words used in English text. Behavior Research Methods & Instrumentation, 10 , 1–7. https://doi.org/10.3758/BF03205079 .

Brown, T. B., Mann, B., Ryder, N., Subbiah, M., Kaplan, J., Dhariwal, P., … Agarwal, S. (2020). Language models are few-shot learners. arXiv preprint arXiv:2005.14165 . Retrieved from https://arxiv.org/pdf/2005.14165.pdf .

Bruni, E., Tran, N. K., & Baroni, M. (2014). Multimodal distributional semantics. Journal of Artificial Intelligence Research , 49 , 1–47.

Buchanan, E. M., Valentine, K. D., & Maxwell, N. P. (2019). English semantic feature production norms: An extended database of 4436 concepts. Behavior Research Methods , 1–15

Bullinaria, J. A., & Levy, J. P. (2007). Extracting semantic representations from word co-occurrence statistics: A computational study. Behavior Research Methods , 39 (3), 510–526.

Burgess, C. (2000). Theory and operational definitions in computational memory models: A response to Glenberg and Robertson. Journal of Memory and Language , 43 (3), 402–408.

Burgess, C. (2001). Representing and resolving semantic ambiguity: A contribution from high-dimensional memory modeling. On the consequences of meaning selection: Perspectives on resolving lexical ambiguity , 233–260.

Carey, S., & Bartlett, E. (1978). Acquiring a single new word. Papers and Reports on Child Language Development , 15 , 17–29

Chen, D., Peterson, J. C., & Griffiths, T. L. (2017). Evaluating vector-space models of analogy. In Proceedings of the 39th Annual Conference of the Cognitive Science Society . Retrieved from https://arxiv.org/abs/1705.04416 .

Chen, X., Cardie, C. (2018). Unsupervised Multilingual Word Embeddings. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing (EMNLP 2018). Retrieved from https://arxiv.org/abs/1808.08933 .

Cho, K., Van Merriënboer, B., Gulcehre, C., Bahdanau, D., Bougares, F., Schwenk, H., & Bengio, Y. (2014). Learning phrase representations using RNN encoder-decoder for statistical machine translation. arXiv preprint arXiv:1406.1078 . Retrieved from https://arxiv.org/abs/1406.1078 .

Chwilla, D. J., & Kolk, H. H. (2002). Three-step priming in lexical decision. Memory & Cognition , 30 (2), 217–225.

Clark, K., Khandelwal, U., Levy, O., & Manning, C. D. (2019). What Does BERT Look At? An Analysis of BERT's Attention. arXiv preprint arXiv:1906.04341 . Retrieved from https://arxiv.org/abs/1906.04341 .

Clark, S., Coecke, B., & Sadrzadeh, M. (2008). A compositional distributional model of meaning. In Proceedings of the Second Quantum Interaction Symposium (QI-2008) (pp. 133–140).

Clark, S., & Pulman, S. (2007). Combining symbolic and distributional models of meaning. Retrieved from https://www.aaai.org/Papers/Symposia/Spring/2007/SS-07-08/SS07-08-008.pdf .

Clay, M. M. (1968). A syntactic analysis of reading errors. Journal of Verbal Learning and Verbal Behavior , 7 (2), 434–438.

Coenen, A., Reif, E., Yuan, A., Kim, B., Pearce, A., Viégas, F., & Wattenberg, M. (2019). Visualizing and measuring the geometry of bert. arXiv preprint arXiv:1906.02715. Retrieved from https://arxiv.org/pdf/1906.02715.pdf .

Collell, G., & Moens, M. F. (2016). Is an image worth more than a thousand words? on the fine-grain semantic differences between visual and linguistic representations. In Proceedings of the 26th International Conference on Computational Linguistics (pp. 2807–2817). ACL.

Collins, A. M., & Loftus, E. F. (1975). A spreading-activation theory of semantic processing. Psychological Review , 82 (6), 407.

Collins, A. M., & Quillian, M. R. (1969). Retrieval time from semantic memory. Journal of Verbal Learning and Verbal Behavior , 8 (2), 240–247.

Collobert, R., & Weston, J. (2008). A unified architecture for natural language processing: Deep neural networks with multitask learning. In Proceedings of the 25th international conference on Machine learning ,160–167. ACM.

Collobert, R., Weston, J., Bottou, L., Karlen, M., Kavukcuoglu, K., & Kuksa, P. (2011). Natural language processing (almost) from scratch. The Journal of Machine Learning Research , 12 , 2493–2537.

Corballis, M. C. (2017a). Language evolution: a changing perspective. Trends in Cognitive Sciences , 21 (4), 229–236.

Corballis, M. C. (2017b). The evolution of language. In J. Call, G. M. Burghardt, I. M. Pepperberg, C. T. Snowdon, & T. Zentall (Eds.), APA handbooks in psychology®. APA handbook of comparative psychology: Basic concepts, methods, neural substrate, and behavior (p. 273–297). American Psychological Association. https://doi.org/10.1037/0000011-014

Damasio, A. R. (1989). Time-locked multiregional retroactivation: A systems-level proposal for the neural substrates of recall and recognition. Cognition , 33 (1–2), 25–62.

De Deyne, S., Navarro, D. J., Perfors, A., Brysbaert, M., & Storms, G. (2019). The “Small World of Words” English word association norms for over 12,000 cue words. Behavior Research Methods , 51 (3), 987–1006.

De Deyne, S., Perfors, A., & Navarro, D. J. (2016). Predicting human similarity judgments with distributional models: The value of word associations. In Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers (pp. 1861–1870).

De Deyne, S., & Storms, G. (2008). Word associations: Network and semantic properties. Behavior Research Methods , 40 (1), 213–231.

Deese, J. (1959). On the prediction of occurrence of particular verbal intrusions in immediate recall. Journal of Experimental Psychology , 58 (1), 17.

Dell, G. S. (1990). Effects of frequency and vocabulary type on phonological speech errors. Language and Cognitive Processes , 5 (4), 313–349.

Deng, J., Dong, W., Socher, R., Li, L. J., Li, K., & Fei-Fei, L. (2009, June). Imagenet: A large-scale hierarchical image database. In 2009 IEEE conference on computer vision and pattern recognition (pp. 248–255). IEEE.

Dennis, S. (2004). An unsupervised method for the extraction of propositional information from text. Proceedings of the National Academy of Sciences of the United States of America , 101(Suppl 1), 5206–5213.

Dennis, S. (2005). A memory-based theory of verbal cognition. Cognitive Science , 29 (2), 145–193.

Devlin, J., Chang, M. W., Lee, K., & Toutanova, K. (2019). BERT: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805 .

DeVore, I., & Tooby, J. (1987). The reconstruction of hominid behavioral evolution through strategic modeling. The Evolution of Human Behavior: Primate Models, edited by WG Kinzey , 183–237.

Dove, G. (2011). On the need for embodied and dis-embodied cognition. Frontiers in Psychology , 1 , 242.

Dubossarsky, H., De Deyne, S., & Hills, T. T. (2017). Quantifying the structure of free association networks across the life span. Developmental Psychology , 53 (8), 1560.

Duffy, S. A., Morris, R. K., & Rayner, K. (1988). Lexical ambiguity and fixation times in reading. Journal of Memory and Language , 27 (4), 429–446.

Durda, K., Buchanan, L., & Caron, R. (2009). Grounding co-occurrence: Identifying features in a lexical co-occurrence model of semantic memory. Behavior Research Methods, 41(4), 1210–1223.

Elman, J. L. (1990). Finding structure in time. Cognitive Science, 14 (2), 179–211.

Elman, J. L. (1991). Distributed representations, simple recurrent networks, and grammatical structure. Machine Learning, 7 (2-3), 195–225.

Elman, J. L., & McRae, K. (2019). A model of event knowledge. Psychological Review , 126 (2), 252.

Farah, M. J., & McClelland, J. L. (2013). A computational model of semantic memory impairment: Modality specificity and emergent category specificity (Journal of Experimental Psychology: General, 120 (4), 339–357). In Exploring Cognition: Damaged Brains and Neural Networks (pp. 79–110). Psychology Press.

Federmeier, K. D., & Kutas, M. (1999). A rose by any other name: Long-term memory structure and sentence processing. Journal of Memory and Language , 41 (4), 469–495.

Fellbaum, C. (Ed.). (1998). WordNet, an electronic lexical database . Cambridge, MA: MIT Press.

Feng, Y., & Lapata, M. (2010). Visual information in semantic representation. In Human Language Technologies: The 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics (pp. 91–99). Association for Computational Linguistics.

Fernandino, L., Conant, L. L., Binder, J. R., Blindauer, K., Hiner, B., Spangler, K., & Desai, R. H. (2013). Where is the action? Action sentence processing in Parkinson's disease. Neuropsychologia , 51 (8), 1510–1517.

Finkelstein, L., Gabrilovich, E., Matias, Y., Rivlin, E., Solan, Z., Wolfman, G., & Ruppin, E. (2002). Placing search in context: The concept revisited. ACM Transactions on information systems , 20 (1), 116–131.

Firth, J. R. (1957). A synopsis of linguistic theory, 1930-1955. In Philological Society (Great Britain) (Ed.), Studies in Linguistic Analysis . Oxford: Blackwell.

Fischler, I. (1977). Semantic facilitation without association in a lexical decision task. Memory & Cognition , 5 , 335–339.

Franklin, N., Norman, K. A., Ranganath, C., Zacks, J. M., & Gershman, S. J. (2019). Structured event memory: a neuro-symbolic model of event cognition. BioRxiv , 541607. Retrieved from https://www.biorxiv.org/content/biorxiv/early/2019/02/05/541607.full.pdf .

Fried, E. I., van Borkulo, C. D., Cramer, A. O. J., Boschloo, L., Schoevers, R. A., & Borsboom, D (2017). Mental disorders as networks of problems: a review of recent insights. Social Psychiatry and Psychiatric Epidemiology , 52 (1), 1–10.

Gabrieli, J. D., Cohen, N. J., & Corkin, S. (1988). The impaired learning of semantic knowledge following bilateral medial temporal-lobe resection. Brain and Cognition , 7 (2), 157–177.

Garagnani, M., & Pulvermüller, F. (2016). Conceptual grounding of language in action and perception: a neurocomputational model of the emergence of category specificity and semantic hubs. European Journal of Neuroscience , 43 (6), 721–737.

Glenberg, A. M., & Robertson, D. A. (2000). Symbol grounding and meaning: A comparison of high-dimensional and embodied theories of meaning. Journal of Memory and Language , 43 (3), 379–401.

Goldberg, R. F., Perfetti, C. A., & Schneider, W. (2006). Perceptual knowledge retrieval activates sensory brain regions. Journal of Neuroscience , 26 (18), 4917–4921.

Griffiths, T. L., & Steyvers, M. (2002). A probabilistic approach to semantic representation. In Proceedings of the Annual meeting of the Cognitive Science Society , 24 (24).

Griffiths, T. L., & Steyvers, M. (2003). Prediction and semantic association. In Advances in Neural Information Processing Systems , 11–18.

Griffiths, T. L., & Steyvers, M. (2004). Finding scientific topics. Proceedings of the National Academy of Sciences , 101 (1), 5228–5235.

Griffiths, T. L., Steyvers, M., & Tenenbaum, J. B. (2007). Topics in semantic representation. Psychological Review, 114 (2), 211.

Gruenenfelder, T. M., Recchia, G., Rubin, T., & Jones, M. N. (2016). Graph-theoretic properties of networks based on word association norms: implications for models of lexical semantic memory. Cognitive Science , 40 (6), 1460–1495.

Guida, A., & Lenci, A. (2007). Semantic properties of word associations to Italian verbs. Italian Journal of Linguistics , 19 (2), 293–326.

Günther, F., Rinaldi, L., & Marelli, M. (2019). Vector-space models of semantic representation from a cognitive perspective: A discussion of common misconceptions. Perspectives on Psychological Science , 14 (6), 1006–1033.

Hard, B. M., Tversky, B., & Lang, D. S. (2006). Making sense of abstract events: Building event schemas. Memory & Cognition , 34 (6), 1221–1235.

Harnad, S. (1990). The symbol grounding problem. Physica D: Nonlinear Phenomena , 42 (1–3), 335–346.

Harris, Z. (1970). Distributional structure. In Papers in Structural and Transformational Linguistics (pp. 775–794). Dordrecht, Holland: D. Reidel Publishing Company.

Hebb, D. O. (1949). The Organization of Behavior: A Neuropsychological Theory. New York: Wiley.

Heinzerling, B. (2019). NLP's Clever Hans Moment has Arrived. Retrieved from https://thegradient.pub/nlps-clever-hans-moment-has-arrived/ .

Hermann, K. M., Hill, F., Green, S., Wang, F., Faulkner, R., Soyer, H., … Wainwright, M. (2017). Grounded language learning in a simulated 3d world. arXiv preprint arXiv:1706.06551 . Retrieved from https://arxiv.org/abs/1706.06551 .

Hills, T. T. (2006). Animal foraging and the evolution of goal-directed cognition. Cognitive Science , 30 (1), 3–41.

Hills, T. T., Jones, M. N., & Todd, P. M. (2012). Optimal foraging in semantic memory. Psychological Review , 119 (2), 431.

Hinton, G. E., & Shallice, T. (1991). Lesioning an attractor network: Investigations of acquired dyslexia. Psychological Review , 98 (1), 74.

Hintzman, D. L. (1988). Judgments of frequency and recognition memory in a multiple-trace memory model. Psychological Review, 95 (4), 528.

Hochreiter, S., & Schmidhuber, J. (1997). Long short-term memory. Neural Computation , 9 (8), 1735–1780.

Hodges, J. R., & Patterson, K. (2007). Semantic dementia: a unique clinicopathological syndrome. The Lancet Neurology , 6 (11), 1004–1014.

Hoffman, P., McClelland, J. L., & Lambon Ralph, M. A. (2018). Concepts, control, and context: A connectionist account of normal and disordered semantic cognition. Psychological Review , 125 (3), 293.

Howard, M. W., & Kahana, M. J. (2002). A distributed representation of temporal context. Journal of Mathematical Psychology , 46 , 269–299.

Howard, M. W., Shankar, K. H., & Jagadisan, U. K. (2011). Constructing semantic representations from a gradually changing representation of temporal context. Topics in Cognitive Science, 3(1), 48–73.

Howell, S. R., Jankowicz, D., & Becker, S. (2005). A model of grounded language acquisition: Sensorimotor features improve lexical and grammatical learning. Journal of Memory and Language , 53(2), 258–276.

Hutchison, K. A. (2003). Is semantic priming due to association strength or feature overlap? A microanalytic review. Psychonomic Bulletin & Review , 10(4), 785–813.

Hutchison, K. A., Balota, D. A., Neely, J. H., Cortese, M. J., Cohen-Shikora, E. R., Tse, C. S., … Buchanan, E. (2013). The semantic priming project. Behavior Research Methods , 45 (4), 1099–1114.

James, L. E., & Burke, D. M. (2000). Phonological priming effects on word retrieval and tip-of-the-tongue experiences in young and older adults. Journal of Experimental Psychology: Learning, Memory, and Cognition , 26 (6), 1378.

Jamieson, R. K., Avery, J. E., Johns, B. T., & Jones, M. N. (2018). An instance theory of semantic memory. Computational Brain & Behavior , 1(2), 119–136.

Jawahar, G., Sagot, B., & Seddah, D. (2019). What does BERT learn about the structure of language? In ACL 2019 - 57th Annual Meeting of the Association for Computational Linguistics, Florence, Italy. Retrieved from https://hal.inria.fr/hal-02131630/document .

Johns, B. T., & Jones, M. N. (2012). Perceptual inference through global lexical similarity. Topics in Cognitive Science, 4 (1), 103–120.

Johns, B. T., & Jones, M. N. (2015). Generating structure from experience: A retrieval-based model of language processing. Canadian Journal of Experimental Psychology/Revue canadienne de psychologie expérimentale , 69 (3), 233.

Johns, B. T., Mewhort, D. J., & Jones, M. N. (2019). The Role of Negative Information in Distributional Semantic Learning. Cognitive Science , 43 (5), e12730.

Jones, M., & Recchia, G. (2010). You can’t wear a coat rack: A binding framework to avoid illusory feature migrations in perceptually grounded semantic models. In Proceedings of the Annual Meeting of the Cognitive Science Society (Vol. 32, No. 32).

Jones, M. N. (2018). When does abstraction occur in semantic memory: insights from distributional models. Language, Cognition and Neuroscience , 1–9.

Jones, M. N., Gruenenfelder, T. M., & Recchia, G. (2018). In defense of spatial models of semantic representation. New Ideas in Psychology , 50 , 54–60.

Jones, M. N., Hills, T. T., & Todd, P. M. (2015). Hidden processes in structural representations: A reply to Abbott, Austerweil, and Griffiths (2015). Psychological Review , 122 (3), 570–574. doi: https://doi.org/10.1037/a0039248

Jones, M. N., Kintsch, W., & Mewhort, D. J. (2006). High-dimensional semantic space accounts of priming. Journal of Memory and Language , 55 (4), 534–552.

Jones, M. N., & Mewhort, D. J. (2007). Representing word meaning and order information in a composite holographic lexicon. Psychological Review , 114(1), 1.

Jones, M. N., Willits, J., & Dennis, S. (2015). Models of semantic memory. Oxford Handbook of Mathematical and Computational Psychology , 232–254.

Kalchbrenner, N., Grefenstette, E., & Blunsom, P. (2014). A convolutional neural network for modelling sentences. arXiv preprint arXiv :1404.2188.

Kanerva, P. (2009). Hyperdimensional computing: An introduction to computing in distributed representations with high-dimensional random vectors. Cognitive Computation , 1 , 139–159.

Kenett, Y. N., Anaki, D., & Faust, M. (2014). Investigating the structure of semantic networks in low and high creative persons. Frontiers in Human Neuroscience, 8 , 407.

Kenett, Y. N., Gold, R., & Faust, M. (2016). The hyper-modular associative mind: a computational analysis of associative responses of persons with Asperger syndrome. Language and Speech, 59 (3), 297–317.

Kenett, Y. N., Kenett, D. Y., Ben-Jacob, E., & Faust, M. (2011). Global and local features of semantic networks: Evidence from the Hebrew mental lexicon. PloS one, 6 (8), e23912.

Kenett, Y. N., Levi, E., Anaki, D., & Faust, M. (2017). The semantic distance task: Quantifying semantic distance with semantic network path length. Journal of Experimental Psychology: Learning, Memory, and Cognition, 43(9), 1470.

Kiela, D., & Bottou, L. (2014). Learning image embeddings using convolutional neural networks for improved multi-modal semantics. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP) (pp. 36–45).

Kiela, D., Bulat, L., & Clark, S. (2015). Grounding semantics in olfactory perception. In Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7 th International Joint Conference on Natural Language Processing (Volume 2: Short Papers) (pp. 231–236). Beijing, China: ACL.

Kiela, D., & Clark, S. (2015). Multi-and cross-modal semantics beyond vision: Grounding in auditory perception. In Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing (EMNLP 2015) (pp. 2461–2470). Lisbon, Portugal: ACL.

Kim, A., Ruzmaykin, M., Truong, A., & Summerville, A. (2019). Cooperation and Codenames: Understanding Natural Language Processing via Codenames. In Proceedings of the AAAI Conference on Artificial Intelligence and Interactive Digital Entertainment (Vol. 15, No. 1, pp. 160–166).

Kim, Y., Jernite, Y., Sontag, D., & Rush, A. M. (2016). Character-aware neural language models. In Thirtieth AAAI Conference on Artificial Intelligence . Retrieved from https://www.aaai.org/ocs/index.php/AAAI/AAAI16/paper/viewPaper/12489 .

Kintsch, W. (2001). Predication. Cognitive Science, 25 (2), 173–202.

Kintsch, W. (1998). Comprehension: A Paradigm for Cognition. Cambridge University Press.

Kousta, S. T., Vigliocco, G., Vinson, D. P., Andrews, M., & Del Campo, E. (2011). The representation of abstract words: why emotion matters. Journal of Experimental Psychology: General , 140 (1), 14.

Kumar, A. A., Balota, D. A., Habbert, J., Scaltritti, M., & Maddox, G. B. (2019). Converging semantic and phonological information in lexical retrieval and selection in young and older adults. Journal of Experimental Psychology: Learning, Memory, and Cognition . 45 (12), 2267–2289. https://doi.org/10.1037/xlm0000699

Kumar, A.A., Balota, D.A., Steyvers, M. (2019). Distant Concept Connectivity in Network-Based and Spatial Word Representations. In Proceedings of the 41 st Annual Meeting of the Cognitive Science Society , 1348–1354.

Kumar, A.A., Steyvers, M., & Balota, D.A. (under review). Investigating Semantic Memory Retrieval in a Cooperative Word Game.

Kurach, K., Raichuk, A., Stańczyk, P., Zając, M., Bachem, O., Espeholt, L., … Gelly, S. (2019). Google research football: A novel reinforcement learning environment. arXiv preprint arXiv:1907.11180 . Retrieved from https://arxiv.org/abs/1907.11180 .

Kutas, M., & Hillyard, S. A. (1980). Event-related brain potentials to semantically inappropriate and surprisingly large words. Biological Psychology , 11 (2), 99–116.

Kwantes, P. J. (2005). Using context to build semantics. Psychonomic Bulletin & Review , 12 (4), 703–710.

Lake, B. M., Salakhutdinov, R., & Tenenbaum, J. B. (2015). Human-level concept learning through probabilistic program induction. Science , 350 (6266), 1332–1338.

Lake, B. M., Ullman, T. D., Tenenbaum, J. B., & Gershman, S. J. (2017). Building machines that learn and think like people. Behavioral and Brain sciences , 40 .

Lakoff, G., & Johnson, M. (1999). Philosophy in the Flesh (Vol. 4). New York: Basic books.

Lample, G., Conneau, A., Ranzato, M. A., Denoyer, L., & Jégou, H. (2018). Word translation without parallel data. In International Conference on Learning Representations . Retrieved from https://openreview.net/forum?id=H196sainb .

Landau, B., Smith, L. B., & Jones, S. S. (1988). The importance of shape in early lexical learning. Cognitive Development , 3 (3), 299–321.

Landauer, T. K. (2001). Single representations of multiple meanings in latent semantic analysis.

Landauer, T. K., & Dumais, S. T. (1997). A solution to Plato's problem: The latent semantic analysis theory of acquisition, induction, and representation of knowledge. Psychological Review , 104(2), 211.

Landauer, T. K., Laham, D., Rehder, B., & Schreiner, M. E. (1997). How well can passage meaning be derived without using word order? A comparison of Latent Semantic Analysis and humans. In Proceedings of the 19th annual meeting of the Cognitive Science Society (pp. 412–417).

Lazaridou, A., Marelli, M., & Baroni, M. (2017). Multimodal word meaning induction from minimal exposure to natural text. Cognitive Science , 41 , 677–705.

Lazaridou, A., Pham, N. T., & Baroni, M. (2015). Combining language and vision with a multimodal skip-gram model. arXiv preprint arXiv:1501.02598 .

Lebois, L. A., Wilson-Mendenhall, C. D., & Barsalou, L. W. (2015). Are automatic conceptual cores the gold standard of semantic processing? The context-dependence of spatial meaning in grounded congruency effects. Cognitive Science , 39 (8), 1764–1801.

Levesque, H., Davis, E., & Morgenstern, L. (2012). The winograd schema challenge. In Thirteenth International Conference on the Principles of Knowledge Representation and Reasoning . Retrieved from https://www.aaai.org/ocs/index.php/KR/KR12/paper/viewPaper/4492 .

Levy, O., & Goldberg, Y. (2014). Neural word embedding as implicit matrix factorization. In Advances in neural information processing systems (pp. 2177–2185).

Levy, O., Goldberg, Y., & Dagan, I. (2015). Improving distributional similarity with lessons learned from word embeddings. Transactions of the Association for Computational Linguistics , 3 , 211–225.

Li, J., & Jurafsky, D. (2015). Do multi-sense embeddings improve natural language understanding?. arXiv preprint arXiv:1506.01070 .

Li, J., Monroe, W., Ritter, A., Galley, M., Gao, J., & Jurafsky, D. (2016). Deep reinforcement learning for dialogue generation. arXiv preprint arXiv:1606.01541 . Retrieved from https://arxiv.org/abs/1606.01541 .

Li, J., Monroe, W., Shi, T., Jean, S., Ritter, A., & Jurafsky, D. (2017). Adversarial learning for neural dialogue generation. arXiv preprint arXiv:1701.06547 . Retrieved from https://arxiv.org/abs/1701.06547 .

Li, P., Burgess, C., & Lund, K. (2000). The acquisition of word meaning through global lexical co-occurrences. In Proceedings of the Thirtieth Annual Child Language Research Forum , 166–178.

Liu, Y., Ott, M., Goyal, N., Du, J., Joshi, M., Chen, D., … Stoyanov, V. (2019). Roberta: A robustly optimized bert pretraining approach. arXiv preprint arXiv:1907.11692 . Retrieved from https://arxiv.org/abs/1907.11692 .

Livesay, K., & Burgess, C. (1998). Mediated priming in high-dimensional semantic space: No effect of direct semantic relationships or co-occurrence. Brain and Cognition, 37 (1), 102–105.

Lopopolo, A., & Miltenburg, E. (2015). Sound-based distributional models. In Proceedings of the 11th International Conference on Computational Semantics (pp. 70–75).

Louwerse, M. M. (2011). Symbol interdependency in symbolic and embodied cognition. Topics in Cognitive Science , 3 (2), 273–302.

Lucas, M. (2000). Semantic priming without association: A meta-analytic review. Psychonomic Bulletin & Review , 7(4), 618–630.

Lucy, L., & Gauthier, J. (2017). Are distributional representations ready for the real world? Evaluating word vectors for grounded perceptual meaning. arXiv preprint arXiv:1705.11168 .

Lund, K., & Burgess, C. (1996). Producing high-dimensional semantic spaces from lexical co-occurrence. Behavior Research Methods, Instruments, & Computers , 28 (2), 203–208.

Luong, T., Socher, R., & Manning, C. (2013). Better word representations with recursive neural networks for morphology. In Proceedings of the Seventeenth Conference on Computational Natural Language Learning (pp. 104–113). Retrieved from https://www.aclweb.org/anthology/W13-3512/ .

Lupker, S. J. (1984). Semantic priming without association: A second look. Journal of Verbal Learning and Verbal Behavior , 23 , 709–733.

Mandera, P., Keuleers, E., & Brysbaert, M. (2017). Explaining human performance in psycholinguistic tasks with models of semantic similarity based on prediction and counting: A review and empirical validation. Journal of Memory and Language, 92 , 57–78.

Masson, M. E. (1995). A distributed memory model of semantic priming. Journal of Experimental Psychology: Learning, Memory, and Cognition , 21 (1), 3.

Matheson, H., White, N., & McMullen, P. (2015). Accessing embodied object representations from vision: A review. Psychological Bulletin , 141 (3), 511.

Matheson, H. E., & Barsalou, L. W. (2018). Embodiment and grounding in cognitive neuroscience. Stevens' Handbook of Experimental Psychology and Cognitive Neuroscience , 3 , 1–27.

Mayford, M., Siegelbaum, S. A., & Kandel, E. R. (2012). Synapses and memory storage. Cold Spring Harbor Perspectives in Biology , 4 (6), a005751.

McCann, B., Keskar, N. S., Xiong, C., & Socher, R. (2018). The natural language decathlon: Multitask learning as question answering. arXiv preprint arXiv:1806.08730 . Retrieved from https://arxiv.org/abs/1806.08730 .

McClelland, J. L., Hill, F., Rudolph, M., Baldridge, J., & Schütze, H. (2019). Extending Machine Language Models toward Human-Level Language Understanding. arXiv preprint arXiv:1912.05877 . Retrieved from https://arxiv.org/abs/1912.05877 .

McCloskey, M., & Glucksberg, S. (1979). Decision processes in verifying category membership statements: Implications for models of semantic memory. Cognitive Psychology, 11 (1), 1–37.

McKoon, G., & Ratcliff, R. (1992). Spreading activation versus compound cue accounts of priming: Mediated priming revisited. Journal of Experimental Psychology: Learning, Memory, and Cognition , 18 (6), 1155.

McKoon, G., Ratcliff, R., & Dell, G. S. (1986). A critical evaluation of the semantic-episodic distinction. Journal of Experimental Psychology: Learning, Memory, and Cognition, 12 (2), 295–306. https://doi.org/10.1037/0278-7393.12.2.295

McNamara, T. P. (2005). Semantic priming: Perspectives from memory and word recognition. Psychology Press. (p. 86)

McNamara, T. P., & Altarriba, J. (1988). Depth of spreading activation revisited: Semantic mediated priming occurs in lexical decisions. Journal of Memory and Language , 27 (5), 545–559.

McRae, K. (2004). Semantic memory: Some insights from feature-based connectionist attractor networks. The Psychology of Learning and Motivation: Advances in Research and Theory , 45 , 41–86.

McRae, K., Cree, G. S., Seidenberg, M. S., & McNorgan, C. (2005). Semantic feature production norms for a large set of living and nonliving things. Behavior Research Methods , 37 (4), 547–559.

McRae, K., De Sa, V. R., & Seidenberg, M. S. (1997). On the nature and scope of featural representations of word meaning. Journal of Experimental Psychology: General, 126 (2), 99.

McRae, K., Khalkhali, S., & Hare, M. (2012). Semantic and associative relations: Examining a tenuous dichotomy. In V. F. Reyna, S. B. Chapman, M. R. Dougherty, & J. Confrey (Eds.), The Adolescent Brain: Learning, Reasoning, and Decision Making (pp. 39–66). Washington, DC: APA.

Metusalem, R., Kutas, M., Urbach, T. P., Hare, M., McRae, K., & Elman, J. L. (2012). Generalized event knowledge activation during online sentence comprehension. Journal of Memory and Language , 66 (4), 545–567.

Meyer, D. E., & Schvaneveldt, R. W. (1971). Facilitation in recognizing pairs of words: evidence of a dependence between retrieval operations. Journal of Experimental Psychology , 90 (2), 227.

Michel, P., Levy, O., & Neubig, G. (2019). Are sixteen heads really better than one?. In Advances in Neural Information Processing Systems (pp. 14014–14024).

Mikolov, T., Chen, K., Corrado, G., & Dean, J. (2013). Efficient estimation of word representations in vector space. arXiv preprint arXiv:1301.3781 .

Mikolov, T., Sutskever, I., Chen, K., Corrado, G. S., & Dean, J. (2013). Distributed representations of words and phrases and their compositionality. In Advances in Neural Information Processing Systems (pp. 3111–3119).

Miller, G. A. (1995). WordNet: An online lexical database [Special Issue]. International Journal of Lexicography, 3(4).

Mitchell, J., & Lapata, M. (2010). Composition in distributional models of semantics. Cognitive Science, 34 (8), 1388–1429.

Morgenstern, L., Davis, E., & Ortiz, C. L. (2016). Planning, executing, and evaluating the winograd schema challenge. AI Magazine , 37 (1), 50–54.

Morris, R. K. (1994). Lexical and message-level sentence context effects on fixation times in reading. Journal of Experimental Psychology: Learning, Memory, and Cognition , 20 (1), 92.

Neely, J. H. (1977). Semantic priming and retrieval from lexical memory: Roles of inhibitionless spreading activation and limited-capacity attention. Journal of Experimental Psychology: General , 106 (3), 226.

Neely, J. H. (2012). Semantic priming effects in visual word recognition: A selective review of current findings and theories. In Basic processes in reading (pp. 272–344). Routledge.

Neisser, U. (1976). Cognition and Reality. San Francisco: W. H. Freeman and Co.

Nelson, D. L., McEvoy, C. L., & Schreiber, T. A. (2004). The University of South Florida free association, rhyme, and word fragment norms. Behavior Research Methods, Instruments, & Computers , 36 (3), 402–407.

Nematzadeh, A., Miscevic, F., & Stevenson, S. (2016). Simple search algorithms on semantic networks learned from language use. arXiv preprint arXiv:1602.03265 . Retrieved from https://arxiv.org/pdf/1602.03265.pdf .

Niven, T., & Kao, H. Y. (2019). Probing neural network comprehension of natural language arguments. arXiv preprint arXiv:1907.07355 . Retrieved from https://arxiv.org/pdf/1907.07355.pdf .

Nosofsky, R. M. (1988). Exemplar-based accounts of relations between classification, recognition, and typicality. Journal of Experimental Psychology: Learning, Memory, and Cognition , 14 (4), 700.

Nozari, N., Trueswell, J. C., & Thompson-Schill, S. L. (2016). The interplay of local attraction, context and domain-general cognitive control in activation and suppression of semantic distractors during sentence comprehension. Psychonomic Bulletin & Review , 23 (6), 1942–1953.

O’Kane, G., Kensinger, E. A., & Corkin, S. (2004). Evidence for semantic learning in profound amnesia: an investigation with patient HM. Hippocampus , 14 (4), 417–425.

Olah, C. (2019). Understanding LSTM Networks. Colah’s Blog . Retrieved from https://colah.github.io/posts/2015-08-Understanding-LSTMs/

Olney, A. M. (2011). Large-scale latent semantic analysis. Behavior Research Methods , 43 (2), 414–423.

OpenAI (2019). Dota 2 with Large Scale Deep Reinforcement Learning. Retrieved from https://arxiv.org/abs/1912.06680 .

Osgood, C. E. (1952). The nature and measurement of meaning. Psychological Bulletin , 49 (3), 197.

Osgood, C. E., Suci, G. J., & Tannenbaum, P. H. (1957). The Measurement of Meaning (No. 47). University of Illinois Press.

Pacht, J. M., & Rayner, K. (1993). The processing of homophonic homographs during reading: Evidence from eye movement studies. Journal of Psycholinguistic Research , 22 (2), 251–271.

Paivio, A. (1991). Dual coding theory: Retrospect and current status. Canadian Journal of Psychology/Revue canadienne de psychologie , 45 (3), 255.

Patterson, K., Nestor, P. J., & Rogers, T. T. (2007). Where do you know what you know? The representation of semantic knowledge in the human brain. Nature Reviews Neuroscience , 8 (12), 976.

Pennington, J., Socher, R., & Manning, C. (2014). Glove: Global vectors for word representation. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP) (pp. 1532-1543).

Perfetti, C. (1998). The limits of co-occurrence: Tools and theories in language research. Discourse Processes, 25 , 363-377.

Peters, M. E., Neumann, M., Iyyer, M., Gardner, M., Clark, C., Lee, K., & Zettlemoyer, L. (2018). Deep contextualized word representations. arXiv preprint arXiv:1802.05365 .

Pezzulo, G., & Calvi, G. (2011). Computational explorations of perceptual symbol systems theory. New Ideas in Psychology , 29 (3), 275-297.

Pinker, S. (2003). Language as an adaptation to the cognitive niche. Studies in the Evolution of Language , 3 , 16-37.

Pirrone, A., Marshall, J. A., & Stafford, T. (2017). A Drift Diffusion Model account of the semantic congruity effect in a classification paradigm. Journal of Numerical Cognition , 3 (1), 77-96.

Plaut, D. C., & Booth, J. R. (2000). Individual and developmental differences in semantic priming: empirical and computational support for a single-mechanism account of lexical processing. Psychological Review , 107 (4), 786.

Plaut, D. C., & Shallice, T. (1993). Deep dyslexia: A case study of connectionist neuropsychology. Cognitive Neuropsychology , 10 (5), 377–500.

Polyn, S. M., Norman, K. A., & Kahana, M. J. (2009). A context maintenance and retrieval model of organizational processes in free recall. Psychological Review, 116 (1), 129.

Posner, M. I., & Keele, S. W. (1968). On the genesis of abstract ideas. Journal of Experimental Psychology , 77(3p1), 353.

Posner, M. I., & Snyder, C. R. R. (1975) Attention and cognitive control. In: Solso R (ed.) Information Processing and Cognition: The Loyola Symposium , pp. 55–85. Hillsdale, NJ: Erlbaum.

Pulvermüller, F. (2005). Brain mechanisms linking language and action. Nature Reviews Neuroscience , 6 (7), 576.

Quillian, M. R. (1967) Word concepts: A theory and simulation of some basic semantic capabilities. Behavioral Science, 12 (5), 410–430

Quillian, M. R. (1969). The teachable language comprehender: A simulation program and theory of language. Communications of the ACM, 12 (8), 459–476.

Radford, A., Wu, J., Child, R., Luan, D., Amodei, D., & Sutskever, I. (2019). Language models are unsupervised multitask learners. OpenAI Blog , 1 (8). Retrieved from https://www.techbooky.com/wp-content/uploads/2019/02/Better-Language-Models-and-Their-Implications.pdf .

Rajpurkar, P., Jia, R., & Liang, P. (2018). Know what you don't know: Unanswerable questions for SQuAD. arXiv preprint arXiv:1806.03822. Retrieved from https://arxiv.org/pdf/1806.03822.pdf .

Ratcliff, R., & McKoon, G. (2008). The diffusion decision model: theory and data for two-choice decision tasks. Neural Computation , 20 (4), 873–922.

Rayner, K., Cook, A. E., Juhasz, B. J., & Frazier, L. (2006). Immediate disambiguation of lexically ambiguous words during reading: Evidence from eye movements. British Journal of Psychology , 97 (4), 467–482.

Rayner, K., & Frazier, L. (1989). Selection mechanisms in reading lexically ambiguous words. Journal of Experimental Psychology: Learning, Memory, and Cognition , 15 (5), 779.

Recchia, G., & Jones, M. N. (2009). More data trumps smarter algorithms: Comparing pointwise mutual information with latent semantic analysis. Behavior Research Methods , 41 (3), 647–656.

Recchia, G., & Nulty, P. (2017). Improving a Fundamental Measure of Lexical Association. In CogSci .

Reisinger, J., & Mooney, R. J. (2010). Multi-prototype vector-space models of word meaning. In Human Language Technologies: The 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics (pp. 109–117). Association for Computational Linguistics.

Rescorla, R. A. (1988). Behavioral studies of Pavlovian conditioning. Annual Review of Neuroscience , 11 (1), 329–352.

Rescorla, R. A., & Wagner, A. R. (1972). A theory of Pavlovian conditioning: Variations in the effectiveness of reinforcement and nonreinforcement. Classical Conditioning II: Current Research and Theory , 2 , 64–99.

Reynolds, J. R., Zacks, J. M., & Braver, T. S. (2007). A computational model of event segmentation from perceptual prediction. Cognitive Science , 31 (4), 613–643.

Richie, R., White, B., Bhatia, S., & Hout, M. C. (2019). The spatial arrangement method of measuring similarity can capture high-dimensional, semantic structures. Retrieved from https://psyarxiv.com/qm27p .

Richie, R., Zou, W., & Bhatia, S. (2019). Predicting high-level human judgment across diverse behavioral domains. Collabra: Psychology , 5 (1).

Riordan, B., & Jones, M. N. (2011). Redundancy in perceptual and linguistic experience: Comparing feature-based and distributional models of semantic representation. Topics in Cognitive Science, 3 (2), 303–345.

Roediger, H. L., & McDermott, K. B. (1995). Creating false memories: Remembering words not presented in lists. Journal of Experimental Psychology: Learning, Memory, and Cognition , 21 (4), 803.

Roget, P. M. (1911). Roget’s Thesaurus of English Words and Phrases (1911 ed.). Retrieved October 28, 2004, from http://www.gutenberg.org/etext/10681

Rosch, E., & Lloyd, B. B. (Eds.). (1978). Cognition and categorization.

Rosch, E., & Mervis, C. B. (1975). Family resemblances: Studies in the internal structure of categories. Cognitive Psychology , 7 (4), 573–605.

Rosch, E., Mervis, C. B., Gray, W. D., Johnson, D. M., & Boyes-Braem, P. (1976). Basic objects in natural categories. Cognitive Psychology , 8 (3), 382–439.

Rotaru, A. S., Vigliocco, G., & Frank, S. L. (2018). Modeling the Structure and Dynamics of Semantic Processing. Cognitive Science , 42 (8), 2890–2917.

Rubinstein, D., Levi, E., Schwartz, R., & Rappoport, A. (2015). How well do distributional models capture different types of semantic knowledge?. In Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 2: Short Papers, 726–730.

Rumelhart, D. E. (1991). Understanding understanding. Memories, Thoughts and Emotions: Essays in Honor of George Mandler, 257–275.

Rumelhart, D. E., Hinton, G. E., & McClelland, J. L. (1986). A general framework for parallel distributed processing. Parallel Distributed Processing: Explorations in the Microstructure of Cognition, Vol. 1 (pp. 45–76).

Rumelhart, D. E., Hinton, G. E., & Williams, R. J. (1988). Learning representations by back-propagating errors. Cognitive Modeling, 5 (3), 1.

Rumelhart, D. E., & Todd, P. M. (1993). Learning and connectionist representations. Attention and Performance XIV: Synergies in Experimental Psychology, Artificial Intelligence, and Cognitive Neuroscience , 3–30.

Sahlgren, M. (2008). The distributional hypothesis. Italian Journal of Linguistics, 20(1), 33–53.

Sahlgren, M., Holst, A., & Kanerva, P. (2008). Permutations as a means to encode order in word space. Proceedings of the 30th Conference of the Cognitive Science Society , p. 1300–1305.

Saluja, A., Dyer, C., & Ruvini, J. D. (2018). Paraphrase-Supervised Models of Compositionality. arXiv preprint arXiv:1801.10293 .

Schank, R. C., & Abelson, R. P. (1977). Scripts, Plans, Goals and Understanding. Hillsdale, NJ: Erlbaum.

Schapiro, A. C., Rogers, T. T., Cordova, N. I., Turk-Browne, N. B., & Botvinick, M. M. (2013). Neural representations of events arise from temporal community structure. Nature Neuroscience , 16 (4), 486.

Schneider, T. R., Debener, S., Oostenveld, R., & Engel, A. K. (2008). Enhanced EEG gamma-band activity reflects multisensory semantic matching in visual-to-auditory object priming. Neuroimage , 42 (3), 1244–1254.

Searle, J. R. (1980). Minds, brains, and programs. Behavioral and Brain Sciences , 3 (3), 417–424.

Shallice, T. (1988). Specialisation within the semantic system. Cognitive Neuropsychology , 5 (1), 133–142.

Shen, J. H., Hofer, M., Felbo, B., & Levy, R. (2018). Comparing Models of Associative Meaning: An Empirical Investigation of Reference in Simple Language Games. arXiv preprint arXiv:1810.03717 .

Siew, C. S., Wulff, D. U., Beckage, N. M., & Kenett, Y. N. (2018). Cognitive Network Science: A review of research on cognition through the lens of network representations, processes, and dynamics. Complexity .

Silberer, C., & Lapata, M. (2012). Grounded models of semantic representation. In Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning (pp. 1423–1433). Association for Computational Linguistics.

Silberer, C., & Lapata, M. (2014, June). Learning grounded meaning representations with autoencoders. In Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) (pp. 721–732).

Sivic, J., & Zisserman, A. (2003). Video Google: A text retrieval approach to object matching in videos. In Proceedings of the Ninth IEEE International Conference on Computer Vision .

Sloutsky, V. M., Yim, H., Yao, X., & Dennis, S. (2017). An associative account of the development of word learning. Cognitive Psychology , 97 , 1–30.

Smith, E. E., Shoben, E. J., & Rips, L. J. (1974). Structure and process in semantic memory: A featural model for semantic decisions. Psychological Review , 81(3), 214.

Socher, R., Huval, B., Manning, C. D., & Ng, A. Y. (2012). Semantic compositionality through recursive matrix-vector spaces. In Proceedings of the 2012 joint conference on empirical methods in natural language processing and computational natural language learning (pp. 1201–1211). Association for Computational Linguistics.

Socher, R., Perelygin, A., Wu, J., Chuang, J., Manning, C. D., Ng, A., & Potts, C. (2013). Recursive deep models for semantic compositionality over a sentiment treebank. In Proceedings of the 2013 Conference on Empirical methods in Natural Language Processing (pp. 1631–1642).

Spranger, M., Pauw, S., Loetzsch, M., & Steels, L. (2012). Open-ended procedural semantics. In L. Steels & M. Hild (Eds.), Language grounding in robots (pp. 153–172). Berlin, Heidelberg, Germany: Springer.

Stanton, R. D., Nosofsky, R. M., & Zaki, S. R. (2002). Comparisons between exemplar similarity and mixed prototype models using a linearly separable category structure. Memory & Cognition , 30 (6), 934–944.

Stella, M., Beckage, N. M., & Brede, M. (2017). Multiplex lexical networks reveal patterns in early word acquisition in children. Scientific Reports , 7 , 46730.

Stella, M., Beckage, N. M., Brede, M., & De Domenico, M. (2018). Multiplex model of mental lexicon reveals explosive learning in humans. Scientific Reports , 8 (1), 2259.

Steyvers, M., & Tenenbaum, J. B. (2005). The large-scale structure of semantic networks: Statistical analyses and a model of semantic growth. Cognitive Science , 29 (1), 41–78.

Sutton, R. S., & Barto, A. G. (1998). Reinforcement Learning: An Introduction. Cambridge, MA: MIT Press.

Swinney, D. A. (1979). Lexical access during sentence comprehension:(Re) consideration of context effects. Journal of Verbal Learning and Verbal Behavior , 18 (6), 645–659.

Tabossi, P., Colombo, L., & Job, R. (1987). Accessing lexical ambiguity: Effects of context and dominance. Psychological Research , 49 (2-3), 161–167.

Tenney, I., Das, D., & Pavlick, E. (2019). BERT rediscovers the classical NLP pipeline. arXiv preprint arXiv:1905.05950. Retrieved from https://arxiv.org/pdf/1905.05950.pdf .

Thompson-Schill, S. L., Kurtz, K. J., & Gabrieli, J. D. E. (1998). Effects of semantic and associative relatedness on automatic priming. Journal of Memory and Language , 38 , 440–458.

Tulving, E. (1972). Episodic and semantic memory. Organization of Memory , 1 , 381–403.

Turney, P. D., & Pantel, P. (2010). From frequency to meaning: Vector space models of semantics. Journal of Artificial Intelligence Research , 37 , 141–188.

Tversky, A. (1977). Features of similarity. Psychological Review , 84 (4), 327.

Upadhyay, S., Chang, K. W., Taddy, M., Kalai, A., & Zou, J. (2017). Beyond bilingual: Multi-sense word embeddings using multilingual context. arXiv preprint arXiv:1706.08160 .

Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., … Polosukhin, I. (2017). Attention is all you need. In Advances in neural information processing systems (pp. 5998-6008).

Vigliocco, G., Kousta, S. T., Della Rosa, P. A., Vinson, D. P., Tettamanti, M., Devlin, J. T., & Cappa, S. F. (2013). The neural representation of abstract words: the role of emotion. Cerebral Cortex , 24 (7), 1767–1777.

Vigliocco, G., Meteyard, L., Andrews, M., & Kousta, S. (2009). Toward a theory of semantic representation. Language and Cognition , 1 (2), 219–247.

Vigliocco, G., Vinson, D. P., Lewis, W., & Garrett, M. F. (2004). Representing the meanings of object and action words: The featural and unitary semantic space hypothesis. Cognitive Psychology , 48 (4), 422–488.

Vitevitch, M. S., Chan, K. Y., & Goldstein, R. (2014). Insights into failed lexical retrieval from network science. Cognitive Psychology , 68 , 1–32.

Wang, A., Singh, A., Michael, J., Hill, F., Levy, O., & Bowman, S. R. (2018). Glue: A multi-task benchmark and analysis platform for natural language understanding. arXiv preprint arXiv:1804.07461 . Retrieved from https://arxiv.org/abs/1804.07461 .

Wang, S. I., Liang, P., & Manning, C. D. (2016). Learning language games through interaction. arXiv preprint arXiv:1606.02447 .

Warstadt, A., Singh, A., & Bowman, S. R. (2018). Neural network acceptability judgments. arXiv preprint arXiv:1805.12471 . Retrieved from https://arxiv.org/abs/1805.12471 .

Watts, D. J., & Strogatz, S. H. (1998). Collective dynamics of ‘small-world’ networks. Nature, 393 (6684), 440.

Westbury, C. (2016) Pay no attention to that man behind the curtain. The Mental Lexicon, 11 (3), 350–374

Widdows, D. (2008). Semantic Vector Products: Some Initial Investigations. In Proceedings of the Second AAAI Symposium on Quantum Interaction. Retrieved from https://research.google/pubs/pub33477/ .

Willems, R. M., Labruna, L., D’Esposito, M., Ivry, R., & Casasanto, D. (2011). A functional role for the motor system in language understanding: evidence from theta-burst transcranial magnetic stimulation. Psychological Science , 22 (7), 849–854.

Wittgenstein, Ludwig (1953). Philosophical Investigations. Blackwell Publishing.

Wulff, D. U., Hills, T., & Mata, R. (2018). Structural differences in the semantic networks of younger and older adults. Retrieved from https://psyarxiv.com/s73dp/ .

Xu, Y., & Kemp, C. (2010). Inference and communication in the game of Password. In Advances in neural information processing systems (pp. 2514–2522).

Xu, F., & Tenenbaum, J. B. (2007). Word learning as Bayesian inference. Psychological review , 114 (2), 245.

Yee, E., Chrysikou, E. G., Hoffman, E., & Thompson-Schill, S. L. (2013). Manual experience shapes object representations. Psychological Science , 24 (6), 909–919.

Yee, E., Huffstetler, S., & Thompson-Schill, S. L. (2011). Function follows form: Activation of shape and function features during object identification. Journal of Experimental Psychology: General , 140 (3), 348.

Yee, E., Jones, M. N., & McRae, K. (2018). Semantic memory. Stevens' Handbook of Experimental Psychology and Cognitive Neuroscience , 3, 1–38.

Yee, E., Lahiri, A., & Kotzor, S. (2017). Fluid semantics: Semantic knowledge is experience-based and dynamic. The Speech Processing Lexicon: Neurocognitive and Behavioural Approaches , 22 , 236.

Yee, E., & Thompson-Schill, S. L. (2016). Putting concepts into context. Psychonomic Bulletin & Review , 23 (4), 1015–1027.

Yessenalina, A., & Cardie, C. (2011). Compositional matrix-space models for sentiment analysis. In Proceedings of the Conference on Empirical Methods in Natural Language Processing (pp. 172–182). Association for Computational Linguistics.

Zacks, J. M., Kurby, C. A., Eisenberg, M. L., & Haroutunian, N. (2011). Prediction error associated with the perceptual segmentation of naturalistic events. Journal of Cognitive Neuroscience, 23 (12), 4057–4066.

Zellers, R., Holtzman, A., Bisk, Y., Farhadi, A., & Choi, Y. (2019). HellaSwag: Can a Machine Really Finish Your Sentence?. arXiv preprint arXiv:1905.07830 . Retrieved from https://arxiv.org/pdf/1905.07830.pdf .

Zemla, J. C., & Austerweil, J. L. (2018). Estimating semantic networks of groups and individuals from fluency data. Computational Brain & Behavior, 1 (1), 36–58.

Zhu, X., Sobhani, P., & Guo, H. (2016). Dag-structured long short-term memory for semantic compositionality. In Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (pp. 917-926).

Author information

Abhilasha A. Kumar, Department of Psychological and Brain Sciences, Washington University in St. Louis, Campus Box 1125, One Brookings Drive, St. Louis, MO 63130, USA. Correspondence to Abhilasha A. Kumar.

About this article

Kumar, A. A. (2021). Semantic memory: A review of methods, models, and current challenges. Psychonomic Bulletin & Review, 28, 40–80. https://doi.org/10.3758/s13423-020-01792-x (published 03 September 2020; issue date February 2021).

Keywords: Semantic memory; Distributional semantic models; Semantic networks; Neural networks; Language models
Over 100 Arrested at Columbia After Pro-Palestinian Protest

The university called in the police to empty an encampment of demonstrators. But students have vowed to stay, no matter the consequences.

The Columbia lawn, with police leading students away.

By Sharon Otterman and Alan Blinder

April 18, 2024

More than 100 students were arrested on Thursday after Columbia University called in the police to empty an encampment of pro-Palestinian demonstrators, fulfilling a vow to Congress by the school’s president that she was prepared to punish people for unauthorized protests.

“I took this extraordinary step because these are extraordinary circumstances,” the president, Nemat Shafik, wrote in a campuswide email on Thursday afternoon.

The president’s decision swiftly sharpened tensions on campus, which has been battered for months by boisterous pro-Palestinian demonstrations that many Jewish people regarded as antisemitic. And it stood to become a milestone for the country, as campuses have been torn by the Israel-Hamas war and grappled with how to manage protests.

What was far less clear was whether the harsher tactics would form an updated playbook for officials struggling to calm restive campuses, or do little besides infuriate and inflame.

Protesters had already promised that any effort to dismantle the encampment would only embolden them.

Dr. Shafik’s message arrived as swarms of New York City police officers, clad in riot gear and bearing zip ties, marched on the encampment of about 50 tents that had sprung up earlier in the week. On Thursday, protesters clutched Palestinian flags, demonstrators sat huddled on the ground and a thicket of onlookers kept watch as officers bore down on tents in the zone that had styled itself as the “Gaza Solidarity Encampment.”

“Since you have refused to disperse, you will now be placed under arrest for trespassing,” a man repeatedly called through a loudspeaker. The protesters responded with their own repeated cry: “Columbia, Columbia, you will see — Palestine will be free!”

Mayor Eric Adams said on Thursday evening that while Columbia has a “proud history of protest,” students did not “have a right to violate university policies and disrupt learning.”

Less than an hour later, at least two buses were filled with arrested protesters, while other demonstrators thundered their displeasure toward officers. Among those arrested, according to police, was Isra Hirsi, the daughter of Representative Ilhan Omar, Democrat from Minnesota. Ms. Hirsi was issued a summons for trespassing.

“They can threaten us all they want with the police, but at the end of the day, it’s only going to lead to more mobilization,” Maryam Alwan, a senior and pro-Palestinian organizer on campus, had said before the arrests.

Barnard College, across the street from Columbia and so closely linked to the university that the two institutions share dining halls, said it had begun issuing interim suspensions against its students who participated in the encampment.

“Now and always, we prioritize our students’ learning and living in an inclusive environment free from harassment,” Barnard said in its own campus message. “Given the evolving circumstances at Columbia and in the area, we are working to ensure the safety and well-being of the entire Barnard community.”

The core of the turmoil, though, was at Columbia.

Etched into Columbia’s history is the brutal police crackdown that its administrators authorized in 1968 against student protesters who were occupying academic buildings. The fallout from the violence tarnished the school’s reputation and led it to adopt reforms in favor of student activism.

Now, the university points proudly to that activism as one of the hallmarks of its culture, and markets it to prospective students. On Thursday, Dr. Shafik insisted that university officials “work hard to balance the rights of students to express political views with the need to protect other students from rhetoric that amounts to harassment and discrimination.”

In recent months, she and administrators across the country have felt that tension acutely, as the federal government opened investigations into the handling of bias claims at dozens of schools, Congress subpoenaed records and court dockets filled with lawsuits.

Columbia, with roughly 5,000 Jewish students and a vibrant strain of support for the Palestinian cause, has drawn particular attention, which led to the appearances by Dr. Shafik and three other Columbia leaders on Capitol Hill on Wednesday.

During her testimony, Dr. Shafik said she had been frustrated “that Columbia’s policies and structures were sometimes unable to meet the moment,” and said the university had updated many of them. Some of those changes include limiting protests to certain times of day and to designated spots on campus.

Columbia’s tightened rules were being tested even as Dr. Shafik testified. By 7:15 p.m. on Wednesday, Columbia said, the university had issued a written warning to students in the encampment: They had 105 minutes to leave or they would face suspension.

Administrators also deployed intermediaries to try to defuse the showdown, only, they said, to have those entreaties rejected.

In a statement before the arrests, Apartheid Divest, a coalition of student groups, said that protesters planned to remain until the university acceded to its demands, including that the university cut its financial ties to Israel. And while Dr. Shafik’s decision drew immediate criticism from the protesters and their allies, others on and around Columbia’s campus had signaled that they would support a crackdown.

“They have guidelines and if they are violating them, I don’t see why this is a special circumstance,” said Ami Nelson, a student.

Since the Oct. 7 Hamas attacks on Israel, administrators at Columbia had tried to calibrate their approaches to the demonstrations, balancing free-speech rights with the security of Jewish students.

But before the Republican-led House Committee on Education and the Workforce on Wednesday, Dr. Shafik and other Columbia leaders signaled a tougher approach. The co-chair of the university’s board, Claire Shipman, declared that there was “a moral crisis on our campus.” And Dr. Shafik went so far as to detail some of the disciplinary actions underway, including suspensions and firings.

That conciliatory approach toward House Republicans infuriated many on campus.

In New York, some students and faculty members complained that university leaders had largely kowtowed to a Congress whose insistent questioning helped fuel the recent resignations by the presidents of Harvard and the University of Pennsylvania.

There has been no indication that Dr. Shafik, who took office last July, has lost the confidence of Columbia’s board. Thursday’s tactics, though, showed how much more aggressive she has become in her campaign to quell protests.

Five days after the attack on Israel, hundreds of protesters gathered on the campus, and the university shut its gates — a step that has now become familiar as protests have flared. Weeks later, Columbia suspended a pair of student groups, Students for Justice in Palestine and Jewish Voice for Peace, in connection with an unauthorized student walkout.

The university rolled out a protest policy in February that was designed to curtail demonstrations, and this month, Dr. Shafik announced suspensions of students who had helped organize an event that included open expressions of support for Hamas.

“This is a challenging moment and these are steps that I deeply regret having to take,” Dr. Shafik wrote on Thursday.

Tents were removed later that day. But within hours, another protest had formed on the lawn and new tents were up.

Reporting was contributed by Olivia Bensimon, Anna Betts, Karla Marie Sanford, Stephanie Saul and Chelsia Rose Marcius.

Sharon Otterman is a Times reporter covering higher education, public health and other issues facing New York City.

Alan Blinder is a national correspondent for The Times, covering education.
