
41 Psychology Speech Topic Ideas


Jim Peterson has over 20 years of experience in speech writing. He has written over 300 free speech topic ideas and how-to guides for all kinds of public speaking and speech writing assignments at My Speech Class.


  • Abraham Maslow's hierarchy of human needs. The levels of that hierarchy make good main points: physiological, safety, belonging, esteem, and self-actualization needs.
  • Why do so many people find adolescence so difficult? Rapid physical and emotional changes can make life feel like a roller-coaster ride. Mention the causes and the ways to cope. Psychology speech topics can help and advise other people.
  • How do you remember what you know? In other words, describe the way your brain handles short-term and long-term memories. Molecular and chemical actions and reactions can be part of your informative talk.
  • Artificial intelligence technologies, e.g., computer systems performing like humans (robotics), problem solving, and knowledge management with reasoning based on past cases and data.
  • Strong stimuli that cause temporary changes in behavior, e.g., pills, money, food, sugar. In that case it is a good idea to speak about energy drinks and their short-term effectiveness; do not forget to mention the dangers …
  • Jung's theory of the ego, the personal unconscious, and the collective unconscious. Psychoanalyst Carl Jung proposed that neurosis is rooted in tensions between our psyche and our attitudes.
  • What exactly is emotional intelligence, and why is it considered more important than IQ ratings nowadays? How can EI be measured, and with which personality tests?
  • That brings me to the next psychology speech topic: the dangers of personality tests.
  • Psychological persuasion techniques in speeches, e.g., body language, understanding the audience's motivations, trance, and hypnosis.
  • Marketing and selling techniques based on psychological effects, e.g., attractive, stimulating, colorful packaging, or influencing consumers' behavior while they shop in malls.
  • Meditation helps to focus and calm the mind. E.g., teach your audience to focus on breathing, to relive each movement of an activity in slow motion, or to try walking meditation.
  • The reasons for and against becoming a behaviorist. Behaviorism is the approach that studies behavior with the scientific methods of psychology.
  • Sigmund Freud and his ideas. With a little imagination you can convert these example themes into attractive psychology speech topics: our defense mechanisms, hypnosis and catharsis, the psychosexual stages of development.
  • Causes of depression, e.g., biological and genetic, environmental, and emotional factors.
  • When your boss is a woman. What happens to men? And to women?
  • The first signs of anxiety disorders, e.g., sleeping and concentration problems, edgy and irritable feelings.
  • How psychotherapy by trained professionals helps people recover.
  • How to improve your nonverbal communication skills and communicate effectively. Study someone's incongruent body signals, vary your tone of voice, and keep eye contact while talking informally or formally with a person.
  • Always talk after traumatic events: children, firefighters, police officers, and medics in conflict zones all benefit.
  • The number one phobia on earth is fear of public speaking, not fear of dying. That, I think, is a very catchy psychological topic …

And a few more topics you can develop yourself:

  • Dangers of personality tests.
  • How to set and achieve unrealistic goals.
  • Sigmund Freud's theory.
  • Maslow's hierarchy of needs theory.
  • Three ways to measure Emotional Intelligence.
  • Why public speaking is the number one phobia on the planet.
  • Animated violence does influence the attitude of young people.
  • Becoming a millionaire will not make you happy.
  • Being a pacifist is equal to being naive.
  • Change doesn’t equal progress.
  • Everyone is afraid to speak in public.
  • Ideas have effects and consequences on lives.
  • Mental attitude affects the healing process.
  • Philanthropy is the foundation of curiosity.
  • Praise in public and criticize or punish in private.
  • Sometimes it is okay to lie.
  • The importance of asking yourself why you stand for something.
  • The only answer to cruelty is kindness.
  • The trauma of shooting incidents lasts a lifetime.
  • To grab people's attention on stage, keep a close eye on their attitudes and social backgrounds.
  • Torture as an interrogation technique is never acceptable.


6 Mind-Blowing TED Talks About Psychology & Human Behavior


The human brain is complex and confusing, which explains why human behavior is so complex and confusing. People have a tendency to act one way when they feel something completely different.

How many times have you been asked “What’s wrong?” only to answer “Nothing,” even though something truly was wrong? Personally, I’ve lost count. Humans are strange creatures, indeed.

And yet, the craziness of human behavior doesn’t end there. There are hundreds--even thousands--of different aspects to our behavior that have a strange science behind them. Some of what you believe may actually be false.

Here are a few interesting TED Talks that delve into human psychology and try to explain why we are the way we are.

The Paradox of Choice

http://youtu.be/VO6XEQIsCoM

The secret to happiness is low expectations.

With a quotable snippet like that, I think it’s easy to see that the giver of this TED Talk is very entertaining in his delivery and insightful in his material.

In this 20-minute clip, Barry Schwartz talks about the gradual increase in choices that we have as consumers and how making the right decision in a sea of choices can have a negative impact on our lives. He calls it the paradox of choice, and it all stems from a simple assumption: more freedom leads to more happiness.

Watch this video and learn why the availability of choices can actually be a detriment to your happiness as an individual.

A Kinder, Gentler Philosophy of Success

http://youtu.be/MtSE4rglxbY

Alain de Botton, a Swiss philosopher, presents a philosophical discussion on the modern world's idea of success and how the structure of society influences our notions of success and failure. With a number of insightful illustrations and analogies, I think de Botton successfully challenges the popular understanding of individual success.

He’s articulate, eloquent, and quite witty with his comments and remarks. Plus, he’s just plain funny. Even if the subject matter bores you (which I guarantee it won’t), you’ll still be entertained by his delivery style and his intelligent jokes.

Watching this clip was a pleasure and you won’t be disappointed.

The Surprising Science of Motivation

http://youtu.be/rrkrvAUbU9Y

In this talk, Daniel Pink discusses the outdated notion of extrinsic motivators in the modern world. Outside factors, like increased pay and other incentives, can actually be damaging to creativity, inspiration, and motivation.

While the idea of extrinsic motivation was useful and effective in the 20th Century, Pink argues that this outdated idea must be replaced by a new one: intrinsic motivation. Because 21st Century tasks differ so fundamentally from 20th Century tasks, new motivational forces are required.

This is a mind-blowing talk that confirms what we, as humans, already know: productivity is heightened when we want to do something rather than when we are only encouraged to do something. A must-see clip, indeed.

How We Read Each Other’s Minds

http://youtu.be/GOCUH7TxHRI

In this TED Talk, scientist Rebecca Saxe discusses a region of the brain--called the right temporo-parietal junction (RTPJ)--that is used when you make judgments about other people and what they're thinking.

Through a series of experiments, Saxe shows how the development of this area of the brain contributes to how you view other people, their thoughts, and their motives behind their actions. In other words, underdevelopment of the RTPJ results in a lessened ability for representing and understanding another’s beliefs.

If you love science jargon and scientific analysis, this one’s for you.

What You Don’t Know About Marriage

http://youtu.be/Y8u42OjH0ss

Nowadays, it's often said that marriages--at least in America--are more likely to end in divorce than in a happily-ever-after scenario. However, writer Jenna McCarthy presents her researched findings about the factors that are common in successful marriages.

As a writer, McCarthy has injected jokes and humor into her presentation. Some may find her cute and pleasant to listen to. Others may be put off by her attempts to make the subject matter funny.

Nonetheless, if you are interested in making your current marriage work or if you want to know how to bulletproof your future marriage, this is the TED Talk for you.

The 8 Secrets to Success

http://youtu.be/Y6bbMQXQ180

Secrets? Perhaps if you’re young or if you’ve been hiding under a rock for the last decade. The topic of success--and how you can achieve it--has been examined and studied to death. Everyone wants to be successful, but not everyone is successful.

How can you increase the chance for your own personal success? In this TED Talk, Richard St. John tells you in eight simple words.

But beware: even if these are the secrets to success, they are not shortcuts to success. There are no shortcuts to success. So buckle yourself down, watch this video, and prepare yourself for the mental fortitude that you’ll need in order to cross the line from failure to success.

Do you follow TED Talks? Tell us about your picks in the subject of your choice.



The Oxford Handbook of Cognitive Psychology


26 Speech Perception

Sven L. Mattys, Department of Psychology, University of York, York, UK

Published: 03 June 2013

Speech perception is conventionally defined as the perceptual and cognitive processes leading to the discrimination, identification, and interpretation of speech sounds. However, to gain a broader understanding of the concept, such processes must be investigated relative to their interaction with long-term knowledge—lexical information in particular. This chapter starts with a review of some of the fundamental characteristics of the speech signal and an evaluation of the constraints that these characteristics impose on modeling speech perception. Long-standing questions are then discussed in the context of classic and more recent theories. Recurrent themes include the following: (1) the involvement of articulatory knowledge in speech perception, (2) the existence of a speech-specific mode of auditory processing, (3) the multimodal nature of speech perception, (4) the relative contribution of bottom-up and top-down flows of information to sound categorization, (5) the impact of the auditory environment on speech perception in infancy, and (6) the flexibility of the speech system in the face of novel or atypical input.

The complexity, variability, and fine temporal properties of the acoustic signal of speech have puzzled psycholinguists and speech engineers for decades. How can a signal seemingly devoid of regularity be decoded and recognized almost instantly, without any formal training, and despite being often experienced in suboptimal conditions? Without any real effort, we identify over a dozen speech sounds (phonemes) per second, recognize the words they constitute, almost immediately understand the message generated by the sentences they form, and often elaborate appropriate verbal and nonverbal responses before the utterance ends.

Unlike theories of letter perception and written-word recognition, theories of speech perception and spoken-word recognition have devoted a great deal of their investigation to a description of the signal itself, most of it carried out within the field of phonetics. In particular, the fact that speech is conveyed in the auditory modality has dramatic implications for the perceptual and cognitive operations underpinning its recognition. Research in speech perception has focused on the constraining effects of three main properties of the auditory signal: sequentiality, variability, and continuity.

Nature of the Speech Signal

Sequentiality.

One of the most obvious disadvantages of the auditory system compared to its visual counterpart is that the distribution of the auditory information is time bound, transient, and solely under the speaker’s control. Moreover, the auditory signal conveys its acoustic content in a relatively serial fashion, one bit of information at a time. The extreme spreading of information over time in the speech domain has important consequences for the mechanisms involved in perceiving and interpreting the input.

Figure 26.1 Illustration of the sequential nature of speech processing. (A) Waveform of a complete sentence, that is, air pressure changes (Y axis) over time (X axis). (B–D) Illustration of a listener's progressive processing of the sentence at three successive points in time. The visible waveform represents the portion of signal that is available for processing at time t1 (B), t2 (C), and t3 (D).

In particular, given that relatively little information is conveyed per unit of time, the extraction of meaning can only be done within a window of time that far exceeds the amount of information that can be held in echoic memory (Huggins, 1975; Nooteboom, 1979). Likewise, given that there are no such things as "auditory saccades," in which listeners would be able to skip ahead of the signal or replay the words or sentences they just heard, speech perception and lexical-sentential integration must take place sequentially, in real time (Fig. 26.1).

For the most part, listeners are extremely good at keeping up with the rapid flow of speech sounds. Marslen-Wilson (1987) showed that many words in sentences are often recognized well before their offset, sometimes as early as 200 ms after their onset, the average duration of one or two syllables. Other words, however, can only be disentangled from competitors later on, especially when they are short and phonetically reduced, for example, "you are" pronounced as "you're" (Bard, Shillcock, & Altmann, 1988). Yet, in general, there is a consensus that speech perception and lexical access closely shadow the unfolding of the signal (e.g., the Cohort Model; Marslen-Wilson, 1987), even though "right-to-left" effects can sometimes be observed as well (Dahan, 2010).
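To make this incremental narrowing concrete, here is a minimal Python sketch of cohort-style candidate filtering in the spirit of the Cohort Model; the toy lexicon and its phoneme spellings are invented for illustration and are not drawn from the chapter.

```python
# Minimal sketch of cohort-style incremental word recognition:
# as each phoneme arrives, the candidate set (the "cohort") is
# narrowed to the words still consistent with the input so far.

# Toy lexicon, spelled as phoneme tuples (illustrative only).
LEXICON = {
    "captain": ("k", "ae", "p", "t", "ih", "n"),
    "captive": ("k", "ae", "p", "t", "ih", "v"),
    "cap":     ("k", "ae", "p"),
    "dog":     ("d", "ao", "g"),
}

def cohort(input_phonemes):
    """Yield the shrinking candidate set after each phoneme."""
    candidates = set(LEXICON)
    for i, ph in enumerate(input_phonemes):
        candidates = {
            w for w in candidates
            if len(LEXICON[w]) > i and LEXICON[w][i] == ph
        }
        yield ph, sorted(candidates)

if __name__ == "__main__":
    for ph, cands in cohort(("k", "ae", "p", "t", "ih", "v")):
        print(f"after /{ph}/: {cands}")
    # "captive" becomes unique only at its final phoneme here: an
    # illustration of a late uniqueness point. Competitors such as
    # "captain" drop out as soon as the input diverges from them.
```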

Given the inevitable sequentiality of speech perception and the limited amount of information that humans can hold in their auditory short-term memory, an obvious question is whether fast speech, which allows more information to be packed into the same amount of time, helps listeners handle the transient nature of speech and, specifically, whether it affects the mechanisms leading to speech recognition. A problem, however, is that fast speech tends to be less clearly articulated (hypoarticulated), and hence, less intelligible. Thus, any processing gain due to denser information packing might be offset by diminished intelligibility. However, this confound can be avoided experimentally. Indeed, speech rate can be accelerated with minimal loss of intrinsic intelligibility via computer-assisted signal compression (e.g., Foulke & Sticht, 1969 ; van Buuren, Festen, & Houtgast, 1999 ). Time compression experiments have led to mixed results. Dupoux and Mehler ( 1990 ), for instance, found no effect of speech rate on how phonemes are perceived in monosyllabic versus disyllabic words. They started from the observation that the initial consonant of a monosyllabic word is detected faster if the word is high frequency than if it is low frequency, whereas frequency has no effect in multisyllabic words. This difference can be attributed to the use of a lexical route with short words and of a phonemic route with longer words. That is, short words are mapped directly onto lexical representations, whereas longer words undergo a process of decomposition into phonemes first. Critically, Dupoux and Mehler reported that a frequency effect did not appear when the duration of the disyllabic words was compressed to that of the monosyllabic words, suggesting that whether listeners use a lexical or phonemic route to identify phonemes depends on structural factors (number of phonemes or syllables) rather than time. Thus, on this account, the transient nature of speech has only a limited effect on the mechanisms underlying speech recognition.

In contrast, others have found significant effects of speech rate on lexical access. For example, both Pitt and Samuel ( 1995 ) and Radeau, Morais, Mousty, and Bertelson ( 2000 ) observed that the uniqueness point of a word, that is, the sequential point at which it can be uniquely specified (e.g., “spag” for “spaghetti”), could be dramatically altered when speech rate was manipulated. However, most changes were observed at slower rates, not at faster rates. Thus, changes in speech rate can have effects on recognition mechanisms, but these are observed mainly with time expansion, not with time compression. In sum, although the studies by Dupoux and Mehler ( 1990 ), Pitt and Samuel ( 1995 ), and Radeau et al. ( 2000 ) highlight different effects of time manipulation on speech processing, they all agree that packing more information per unit of time by accelerating speech rate does not compensate for the transient nature of the speech signal and for memory limitations. This is probably due to intrinsic perceptual and mnemonic limitations on how fast information can be processed by the speech system—at any rate.

In general, the sequential nature of speech processing is a feature that many models have struggled to implement, not only because it requires taking into account echoic and short-term memory mechanisms (Mattys, 1997) but also because the sequentiality problem is compounded by a lack of clear boundaries between phonemes and between words, as described later.

Continuity

The inspection of a speech waveform does not reveal clear acoustic correlates of what the human ear perceives as phoneme and word boundaries. The lack of boundaries is due to coarticulation between phonemes (the blending of articulatory gestures between adjacent phonemes) within and across words. Even though the degree of coarticulation between phonemes is somewhat less pronounced across than within words (Fougeron & Keating, 1997), the lack of clear and reliable gaps between words, along with the sequential nature of speech delivery, makes speech continuity one of the most challenging obstacles for both psycholinguistic theory and automatic speech recognition applications. Yet the absence of phoneme and word boundary markers hardly seems to pose a problem for everyday listening, as the subjective experience of speech is not one of continuity but, rather, of discreteness—that is, a string of sounds making up a string of words.

A great deal of the segmentation problem can be solved, at least in theory, based on lexical knowledge and contextual information. Key notions, here, are lexical competition and segmentation by lexical subtraction. In this view, lexical candidates are activated in multiple locations in the speech signal—that is, multiple alignment—and they compete for a segmentation solution that does not leave any fragments lexically unaccounted for (e.g., "great wall" is favored over "gray twall," because "twall" is not an English word). Importantly, this knowledge-driven approach does not assign a specific computational status to segmentation, other than being the mere consequence of mechanisms associated with lexical competition (e.g., McClelland & Elman, 1986; Norris, 1994).
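A minimal sketch of segmentation by lexical subtraction, under the assumption of a toy lexicon keyed by invented phoneme strings (not real IPA): candidate words are aligned at every position, and only parses that leave no fragment lexically unaccounted for survive.

```python
# Sketch of segmentation by lexical subtraction: find all ways to
# carve an unbroken phoneme string into known words, so that no
# fragment is left unaccounted for by the lexicon.

# Toy "lexicon" mapping invented phoneme strings to words.
LEXICON = {"greIt": "great", "greI": "gray", "wOl": "wall"}

def parses(s, memo=None):
    """Return every fragment-free parse of phoneme string s."""
    if memo is None:
        memo = {}
    if s in memo:
        return memo[s]
    if not s:
        return [[]]  # empty string: one parse, with no words
    result = []
    for i in range(1, len(s) + 1):
        head = s[:i]
        if head in LEXICON:  # a word aligned at this position
            for rest in parses(s[i:], memo):
                result.append([LEXICON[head]] + rest)
    memo[s] = result
    return result

if __name__ == "__main__":
    print(parses("greItwOl"))  # [['great', 'wall']]
    # The "gray" alignment is also tried ("greI" + "twOl"), but it
    # strands the fragment "twOl", which matches no word, so only
    # the fragment-free parse survives the competition.
```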

Another source of information for word segmentation draws upon broad prosodic and segmental regularities in the signal, which listeners use as heuristics for locating word boundaries. For example, languages whose words have a predominant rhythmic pattern (e.g., word-initial stress is predominant in English; word-final lengthening is predominant in French) provide a relatively straightforward—though probabilistic—segmentation strategy to their listeners (Cutler, 1994). The heuristic for English would go as follows: every time a strong syllable is encountered, a boundary is posited before that syllable. For French, it would be: every time a lengthened syllable is encountered, a boundary is posited after that syllable. Another documented heuristic is based on phonotactic probability, that is, the likelihood that specific phonemes follow each other in the words of a language (McQueen, 1998). Specifically, phonemes that are rarely found next to each other in words (e.g., very few English words contain the /fh/ diphone) would be probabilistically interpreted as having occurred across a word boundary (e.g., "tough hero"). Finally, a wide array of acoustic-phonetic cues can also give away the position of a word boundary (Umeda & Coker, 1974). Indeed, phonemes tend to be realized differently depending on their position relative to a word or a syllable boundary. For example, in English, word-initial vowels are frequently glottalized (brief closure of the glottis, e.g., /e/ in "isle end," compared to no closure in "I lend"), and word-initial stop consonants are often aspirated (burst of air accompanying the release of a consonant, e.g., /t/ in "gray tanker" compared to no aspiration in "great anchor").
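As a toy illustration of the English stress-based heuristic just described, the sketch below posits a word boundary before every strong syllable; the syllable transcription and strength marks are invented for illustration.

```python
# Sketch of the stress-based segmentation heuristic for English:
# posit a word boundary before every strong (stressed) syllable.
# Syllables are marked "S" (strong) or "w" (weak).

def segment_by_stress(syllables):
    """Group (syllable, strength) pairs into word-sized chunks,
    opening a new chunk at each strong syllable."""
    words, current = [], []
    for syll, strength in syllables:
        if strength == "S" and current:
            words.append(current)  # close the previous chunk
            current = []
        current.append(syll)
    if current:
        words.append(current)
    return words

if __name__ == "__main__":
    # "CAPtain ORdered his CREW" as (syllable, strength) pairs:
    utterance = [("cap", "S"), ("tain", "w"), ("or", "S"),
                 ("dered", "w"), ("his", "w"), ("crew", "S")]
    print(segment_by_stress(utterance))
    # [['cap', 'tain'], ['or', 'dered', 'his'], ['crew']]
    # Probabilistic, not exact: the weak word "his" gets glued to
    # the preceding word, as the heuristic predicts it sometimes will.
```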

It is important to note that, in everyday speech, lexically and sublexically driven segmentation cues usually coincide and reinforce each other. However, in suboptimal listening conditions (e.g., noise) or in rare cases where a conflict arises between those two sources of information, listeners have been shown to downplay sublexical discrepancies and give more heed to lexical plausibility (Mattys, White, & Melhorn, 2005; Fig. 26.2).

Figure 26.2 Sketch of Mattys, White, and Melhorn's (2005) hierarchical approach to speech segmentation. The relative weights of speech segmentation cues are illustrated by the width of the gray triangle. In optimal listening conditions, the cues in Tier I dominate. When lexical access is compromised or ambiguous, the cues in Tier II take over. Cues from Tier III are recruited when both lexical and segmental cues are compromised (e.g., a background of severe noise). (Reprinted from Mattys, S. L., White, L., & Melhorn, J. F. [2005]. Integration of multiple speech segmentation cues: A hierarchical framework. Journal of Experimental Psychology: General, 134, 477–500 [Figure 7], by permission of the American Psychological Association.)

Variability

Perhaps the most defining challenge for the field of speech perception is the enormous variability of the signal relative to the stored representations onto which it must be matched. Variability can be found at the word level, where there are infinite ways a given word can be pronounced depending on accents, voice quality, speech rate, and so on, leading to a multitude of surface realizations for a unique target representation. But this many-to-one mapping problem is not different from the one encountered with written words in different handwritings or object recognition in general. In those cases, signal normalization can be effectively achieved by defining a set of core features unique to each word or object stored in memory and by reducing the mapping process to those features only.

The real issue with speech variability happens at a lower level, namely, phoneme categorization. Unlike letters, whose realizations have at least some commonality from one instance to another, phonemes can vary widely in their acoustic manifestation—even within the same speaker. For example, as shown in Figure 26.3A, the realization of the phoneme /d/ has no immediately apparent acoustic commonality in /di/ and /du/ (Delattre, Liberman, & Cooper, 1955). This lack of acoustic invariance is the consequence of coarticulation: The articulation of /d/ in /di/ is partly determined by the articulatory preparation for /i/, and likewise for /d/ in /du/. The power of coarticulation is easily demonstrated by observing a speaker's mouth prior to saying /di/ compared to /du/. The mode of articulation of /i/ (unrounded) versus /u/ (rounded) is visible on the speaker's lips even before /d/ has been uttered. The resulting acoustics of /d/ preceding each vowel therefore have little in common.

The search for acoustic cues, or invariants, capable of uniquely identifying phonemes or phonetic categories has succeeded for some features far more than for others. For example, as illustrated in Figure 26.3A, the place of articulation of phonemes (i.e., the place in the vocal tract where the airstream is most constricted, which distinguishes, e.g., /b/, /d/, /g/) has been difficult to map onto specific acoustic cues. However, the difference between voiced and unvoiced stop consonants (/b/, /d/, /g/ vs. /p/, /t/, /k/) can be traced back fairly reliably to the duration between the release of the consonant and the moment when the vocal folds start vibrating, that is, the voice onset time (VOT; Liberman, Delattre, & Cooper, 1958). In English, the VOT of voiced stop consonants is typically around 0 ms (or at least shorter than 20 ms), whereas it is generally over 25 ms for voiceless consonants. Although this contrast has been shown to be somewhat influenced by consonant type and vocalic context (e.g., Lisker & Abramson, 1970), VOT is a fairly robust cue for the voiced-voiceless distinction.
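A minimal sketch of how the VOT cue could serve as a classifier, using the rough English values cited above; the single decision boundary at 22.5 ms is an assumed midpoint between the cited ranges, not a value given in the chapter.

```python
# Sketch of VOT-based voicing classification. English voiced stops
# have VOTs near 0 ms (typically under 20 ms); voiceless stops are
# typically over 25 ms, per the values cited in the text.

def classify_voicing(vot_ms):
    """Label a stop consonant as voiced or voiceless from its VOT."""
    boundary = 22.5  # ms; assumed midpoint between the cited ranges
    return "voiced" if vot_ms < boundary else "voiceless"

if __name__ == "__main__":
    for vot in (0, 15, 30, 60):
        print(f"VOT {vot:>3} ms -> {classify_voicing(vot)}")
    # As noted in the text, this simple rule degrades at fast speech
    # rates, where the voiced/voiceless VOT difference shrinks.
```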

Figure 26.3 (A) Stylized spectrograms of /di/ and /du/. The dark bars, or formants, represent areas of peak energy on the frequency scale (Y axis), which correlate with zones of high resonance in the vocal tract. The curved leads into the formants are formant transitions. They show coarticulation between the consonant and the following vowel. Note the dissimilarity between the second formant transitions in /di/ (rising) and /du/ (falling). However, as shown in (B), the extrapolation back in time of the two second-formant transitions points to a common frequency locus.

Vowels are subject to coarticulatory influences, too, but the spectral structure of their middle portion is usually relatively stable, and hence, a taxonomy of vowels based on their unique distribution of energy bands along the frequency spectrum, or formants, can be attempted. However, such a distribution is influenced by speaking rate, with fast speech typically leading to the target frequency of the formants being missed or to an asymmetric shortening of stressed versus unstressed vowels (Lindblom, 1963; Port, 1977). In general, speech rate variation is particularly problematic for acoustic cues involving time. Even stable cues such as VOT can lose their discriminability power when speech rate is altered. For example, at fast speech rates, the VOT difference between voiced and voiceless stop consonants decreases, making the two types of phonemes more difficult to distinguish (Summerfield, 1981). The same problem has been noted for the difference between /b/ and /w/, with /b/ having rapid formant transitions into the vowel and /w/ less rapid ones. This difference is less pronounced at fast speech rates (Miller & Liberman, 1979).
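To make the formant-based vowel taxonomy concrete, here is a minimal nearest-centroid sketch that labels a vowel from its first two formant frequencies; the centroid values are rough, textbook-style averages for adult male speakers (in the spirit of classic formant measurements) and are assumptions for illustration only.

```python
# Minimal nearest-centroid vowel classifier from (F1, F2) formant
# frequencies in Hz. Centroids are rough illustrative averages for
# adult male speakers, not measurements from the chapter.

VOWEL_CENTROIDS = {
    "i": (270, 2290),   # as in "beet"
    "u": (300, 870),    # as in "boot"
    "a": (730, 1090),   # as in "father"
}

def classify_vowel(f1, f2):
    """Return the vowel whose (F1, F2) centroid is nearest."""
    return min(
        VOWEL_CENTROIDS,
        key=lambda v: (VOWEL_CENTROIDS[v][0] - f1) ** 2
                    + (VOWEL_CENTROIDS[v][1] - f2) ** 2,
    )

if __name__ == "__main__":
    print(classify_vowel(290, 2200))  # -> "i"
    # At fast speech rates the formants undershoot their targets,
    # pulling measured (F1, F2) values away from the centroids and
    # making this kind of static matching less reliable.
```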

Yet, except for those conditions in which subtle differences are manipulated in the laboratory, listeners are surprisingly good at compensating for the acoustic distortions introduced by coarticulation and changes in speech rate. Thus, input variability, phonetic-context effects, and the lack of invariance do not appear to pose a serious problem for everyday speech perception. As reviewed later, however, theoretical accounts aiming to reconcile the complexity of the signal with the effortlessness of perception vary greatly.

Basic Phenomena and Questions in Speech Perception

Following are some of the observations that have shaped theoretical thinking in speech perception over the past 60 years. Most of them concern, in one way or another, the extent to which speech perception is carried out by a part of the auditory system dedicated to speech and involving speech-specific mechanisms not recruited for nonspeech sounds.

Categorical Perception

Categorical perception is a sensory phenomenon whereby a physically continuous dimension is perceived as discrete categories, with abrupt perceptual boundaries between categories and poor discrimination within categories (e.g., perception of the visible electromagnetic radiation spectrum as discrete colors). Early on, categorical perception was found to apply to phonemes—or at least some of them. For example, Liberman, Harris, Hoffman, and Griffith (1957) showed that synthesized syllables ranging from /ba/ to /da/ to /ga/, created by gradually adjusting the transition between the consonant and the vowel's formants (i.e., the formant transitions), were perceived as falling into coarse /b/, /d/, and /g/ categories, with poor discrimination between syllables belonging to a perceptual category and high discrimination between syllables straddling a perceptual boundary (Fig. 26.4). Importantly, categorical perception was not observed for matched auditory stimuli devoid of phonemic significance (Liberman, Harris, Eimas, Lisker, & Bastian, 1961). Moreover, since categorical perception meant that easy-to-identify syllables (continuum endpoints) were also easy to pronounce, whereas less-easy-to-identify syllables (continuum midpoints) were generally less easy to pronounce, categorical perception was seen as a highly adaptive property of the speech system, and hence, evidence for a dedicated speech mode of the auditory system. This claim was later weakened by reports of categorical perception for nonspeech sounds (e.g., Miller, Wier, Pastore, Kelly, & Dooling, 1976) and for speech sounds by nonhuman species (e.g., Kluender, Diehl, & Killeen, 1987; Kuhl, 1981).

Figure 26.4 Idealized identification pattern (solid line, left Y axis) and discrimination pattern (dashed line, right Y axis) for categorical perception, illustrated with a /ba/ to /da/ continuum. Identification shows a sharp perceptual boundary between categories. Discrimination is finer around the boundary than inside the categories.
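A small numerical sketch of this idealized pattern: identification is modeled as a steep logistic function along the continuum, and discrimination of adjacent steps is taken to be proportional to the change in identification across the pair. All parameter values are illustrative assumptions, not fitted data.

```python
# Sketch of the idealized categorical-perception pattern in the
# figure: identification follows a steep sigmoid along a /ba/-/da/
# continuum, and discrimination of adjacent steps peaks where
# identification changes fastest (the category boundary).

import math

def p_da(step, boundary=5.0, steepness=2.0):
    """Probability of identifying a continuum step as /da/."""
    return 1.0 / (1.0 + math.exp(-steepness * (step - boundary)))

def discrimination(step):
    """Predicted discriminability of step vs. step + 1, proportional
    to the change in identification across the pair."""
    return abs(p_da(step + 1) - p_da(step))

if __name__ == "__main__":
    for step in range(1, 10):
        print(f"step {step}: P(/da/)={p_da(step):.2f}  "
              f"discrim vs next={discrimination(step):.2f}")
    # Discrimination peaks around step 5, the assumed category
    # boundary, and is poor within categories, matching the
    # idealized identification/discrimination pattern above.
```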

Effects of Phonetic Context

The effect of adjacent phonemes on the acoustic realization of a target phoneme (e.g., /d/ in /di/ vs. /du/) was mentioned earlier as a core element of the variability challenge. This challenge, that is, achieving perceptual constancy despite input variability, is perhaps most directly illustrated by the converse phenomenon, namely, the varying perception of a constant acoustic input as a function of its changing phonetic environment. Mann (1980) showed that the perception of a /da/-/ga/ continuum was shifted in the direction of reporting more /ga/ when it was preceded by /al/ and more /da/ when it was preceded by /ar/. Since these shifts are in the opposite direction of coarticulation between adjacent phonemes, listeners appear to compensate for the expected consequences of coarticulation. Whether compensation for coarticulation is evidence for a highly sophisticated mechanism whereby listeners use their implicit knowledge of how phonemes are produced—that is, coarticulated—to guide perception (e.g., Fowler, 2006) or simply a consequence of long-term association between the signal and the percept (e.g., Diehl, Lotto, & Holt, 2004; Lotto & Holt, 2006) has been a question of fundamental importance for theories of speech perception, as discussed later.

Integration of Acoustic and Optic Cues

The chief outcome of speech production is the emission of an acoustic signal. However, visual correlates, such as facial and lip movements, are often available to the listener as well. The effect of visual information on speech perception has been extensively studied, especially in the context of the benefit provided by visual cues for listeners with hearing impairments (e.g., Lachs, Pisoni, & Kirk, 2001) and for speech perception in noise (e.g., Sumby & Pollack, 1954). Visual enhancement is also observed for undegraded speech with semantically complicated content or for foreign-accented speech (Reisberg, McLean, & Goldfield, 1987). In the laboratory, audiovisual integration is strikingly illustrated by the well-known McGurk effect. McGurk and MacDonald (1976) showed that listeners presented with an acoustic /ba/ dubbed over a face saying /ga/ tended to report hearing /da/, a syllable whose place of articulation is intermediate between /ba/ and /ga/. The robustness and automaticity of the effect suggest that the acoustic and (visual) articulatory cues of speech are integrated at an early stage of processing. Whether early integration indicates that the primitives of speech perception are articulatory in nature or whether it simply highlights a learned association between acoustic and optic information has been a theoretically divisive debate (see Rosenblum, 2005, for a review).

Lexical and Sentential Effects on Speech Perception

Although traditional approaches to speech perception often stop where word recognition begins (in the same way that approaches to word recognition often stop where sentence comprehension begins), speech perception has been profoundly influenced by the debate on how higher order knowledge affects the identification and categorization of phonemes and phonetic features. A key observation is that lexical knowledge and sentential context can aid phoneme identification, especially when the signal is ambiguous or degraded. For example, Warren and Obusek ( 1971 ) showed that a word can be heard as intact even when a component phoneme is missing and replaced with noise, for example, “legi*lature,” where the asterisk denotes the replaced phoneme. In this case, lexical knowledge dictates what the listener should have heard rather than what was actually there, a phenomenon referred to as phoneme restoration. Likewise, Warren and Warren ( 1970 ) showed that a word whose initial phoneme is degraded, for example, “*eel,” tends to be heard as “wheel” in “It was found that the *eel was on the axle” and as “peel” in “It was found that the *eel was on the orange.” Thus, phoneme identification can be strongly influenced by lexical and sentential knowledge even when the disambiguating context appears later than the degraded phoneme.

But is this truly of interest for speech perception? In other words, could phoneme restoration (and other similar speech illusions) simply result from postperceptual, strategic biases? In this case, “*eel” would be interpreted as “wheel” simply because it makes pragmatic sense to do so in a particular sentential context, not because our perceptual system is genuinely tricked by high-level expectations. If so, contextual effects are of interest to speech-perception scientists only insofar as they suggest that speech perception happens in a system that is impenetrable by higher order knowledge—an unfortunately convenient way of indirectly perpetuating the confinement of speech perception to the study of phoneme identification. The evidence for a postperceptual explanation is mixed. While Norris, McQueen, and Cutler (2000), Massaro (1989), and Oden and Massaro (1978), among others, found no evidence for online top-down feedback to the perceptual system and no logical reason why such feedback should exist, Samuel (1981, 1997, 2001), Connine and Clifton (1987), and Magnuson, McMurray, Tanenhaus, and Aslin (2003), among others, have reported lexical effects on perception that challenge feedforward models—for example, evidence that lexical information truly alters low-level perceptual discrimination (Samuel, 1981). This debate has fostered considerable empirical ingenuity over the past decades but comparatively little change to theory. One exception, however, is that the debate has now spread to the long-term effects of higher order knowledge on speech perception. For example, while Norris, McQueen, and Cutler (2000) argue against online top-down feedback, the same group (Norris, McQueen, & Cutler, 2003) recognizes that perceptual (re)tuning can happen over time, in the context of repeated exposure and learning. Placing the feedforward/feedback debate in the time domain provides an opportunity to examine the speech system at the interface with cognition, and memory functions in particular. It also allows more applied considerations to be introduced, such as the role of perceptual recalibration in second-language learning and speech perception in difficult listening conditions (Samuel & Kraljic, 2009), as discussed later.

Theories of Speech Perception (Narrowly and Broadly Construed)

Motor and Articulatory-Gesture Theories

The Motor Theory of speech perception, reported in a series of articles in the early 1950s by Liberman, Delattre, Cooper, and other researchers from the Haskins Laboratories, was the first to offer a conceptual solution to the lack-of-invariance problem. As mentioned earlier, the main stumbling block for speech-perception theories was the observation that many phonemes cannot uniquely be identified by a set of stable and reliable acoustic cues. For example, the formant transitions of /d/, especially the second formant, differ as a function of the following vowel. However, Delattre et al. (1955) found commonality between different /d/s by extrapolating the formant transitions back in time to their convergence point, or locus (or hub; Potter, Kopp, & Green, 1947), as shown in Figure 26.3B. Thus, what is common to the formants of all /d/s is the frequency at their origin, that is, the frequency that would best reflect the position of the articulators prior to the release of the consonant. This led to one of the key arguments in support of the Motor Theory, namely that a one-to-one relationship between acoustics and phonemes can be established if the speech system includes a mechanism that allows the listener to work backward through the rules of production in order to identify the speaker’s intended phonemes. In other words, the lack-of-invariance problem can be solved if it can be demonstrated that listeners perceive speech by identifying the speaker’s intended speech gestures rather than (or in addition to) relying solely on the acoustic manifestation of such gestures. The McGurk effect, whereby auditory perception is dramatically altered by seeing the speaker’s moving lips (articulatory gestures), was an important contributor to the view that the perceptual primitives of speech are gestural in nature.

In addition to claiming that the motor system is recruited for perceiving speech (and partly because of this claim), the Motor Theory also posits that speech perception takes place in a highly specialized and speech-specific module that is neurally isolated and is most likely a unique and innate human endowment (Liberman, 1996 ; Liberman & Mattingly, 1985 ). However, even among supporters of a motor basis for speech perception, agreeing upon an operational definition of intended speech gestures and providing empirical evidence for the contribution of such intended gestures to perception proved difficult. This led Fowler and her colleagues to propose that the objects of speech perception are not intended articulatory gestures but real gestures, that is, actual vocal tract movements that are inferable from the acoustic signal itself (e.g., Fowler, 1986 , 1996 ). Thus, although Fowler’s Direct Realism approach aligns with the Motor Theory in that it claims that perceiving speech is perceiving gestures, it asserts that the acoustic signal itself is rich enough in articulatory information to provide a stable (i.e., invariant) signal-to-phoneme mapping algorithm. In doing so, Direct Realism can do away with claims about specialized and/or innate structures for speech perception.

Although the popularity of the original tenets of the Motor Theory—and, to some extent, associated gesture theories—has waned over the years, the theory has brought forward essential questions about the specificity of speech, the specialization of speech perception, and, more recently, the neuroanatomical substrate of a possible motor component of the speech apparatus (e.g., Gow & Segawa, 2009 ; Pulvermüller et al., 2006 ; Sussman, 1989 ; Whalen et al., 2006 ), a topic that regained interest following the discovery of mirror neurons in the premotor cortex (e.g., Rizzolatti & Craighero, 2004 ; but see Lotto, Hickok, & Holt, 2009 ). The debate has also shifted to a discussion of the extent to which the involvement of articulation during speech perception might in fact be under the listener’s control and its manifestation partly task specific (Yuen, Davis, Brysbaert, & Rastle, 2010 , Fig. 26.5 ; see comments by McGettigan, Agnew, & Scott, 2010 ; Rastle, Davis, & Brysbaert, 2010 ). The Motor Theory has also been extensively reviewed—and revisited—in an attempt to address problems highlighted by auditory-based models, as described later (e.g., Fowler, 2006 , 2008 ; Galantucci, Fowler, & Turvey, 2006 ; Lotto & Holt, 2006 ; Massaro & Chen, 2008 ).

Electropalatographic data showing the proportion of tongue contact on alveolar electrodes during the initial and final portions of /k/-initial (e.g., kib) or /s/-initial (e.g., sib) syllables (collapsed) while a congruent or incongruent distractor is presented (Yuen et al., 2010). The distractor was presented auditorily in conditions A and B and visually in condition C. With the target kib as an example, the congruent distractor in condition A was kib and the incongruent distractor started with a phoneme involving a different place of articulation (e.g., tib). In condition B, the incongruent distractor started with a phoneme that differed from the target only by its voicing status, not by its place of articulation (e.g., gib). Condition C was the same as condition A, except that the distractor was presented visually. The results show “traces” of the incongruent distractors in target production when the distractor is in articulatory competition with the target, particularly in the early portion of the phoneme (condition A), but not when it involves the same place of articulation (condition B), or when it is presented visually (condition C). The results suggest a close relationship between speech perception and speech production. (Reprinted from Yuen, I., Davis, M. H., Brysbaert, M., & Rastle, K. [2010]. Activation of articulatory information in speech perception. Proceedings of the National Academy of Sciences USA, 107, 592–597 [Figure 2], by permission of the National Academy of Sciences.)

Auditory Theory(ies)

The role of articulatory gestures in perceiving speech and the special status of the speech-perception system progressively came under attack largely because of insufficient hard evidence and lack of computational parsimony. Recall that recourse to articulatory gestures was originally posited as a way to solve the lack-of-invariance problem and turn a many(acoustic traces)-to-one(phoneme) mapping problem into a one(gesture)-to-one(phoneme) mapping solution. However, the lack of invariance problem turned out to be less prevalent and, at the same time, more complicated than originally claimed. Indeed, as mentioned earlier, many phonemes were found to preserve distinctive features across contexts (e.g., Blumstein & Stevens, 1981 ; Stevens & Blumstein, 1981 ). At the same time, lack of invariance was found in domains for which a gestural explanation was only of limited use, for example, voice quality, loudness, and speech rate.

Perhaps most problematic for gesture-based accounts was the finding by Kluender, Diehl, and Killeen (1987) that phonemic categorization, which was viewed by such accounts as necessitating access to gestural primitives, could be observed in species lacking the anatomical prerequisites for articulatory knowledge and practice (Japanese quail; Fig. 26.6). This result was seen by many as undermining both the motor component of speech perception and its human-specific nature. Parsimony became the new driving force. As Kluender et al. put it, “A theory of human phonetic categorization may need to be no more (and no less) complex than that required to explain the behavior of these quail” (p. 1197). The gestural explanation for compensation for coarticulation effects (Mann, 1980) was challenged by a general auditory mechanism as well. In Mann’s experiment, the perceptual shift on the /da/-/ga/ continuum induced by the preceding /al/ versus /ar/ context was explained by reference to articulatory gestures. However, Lotto and Kluender (1998) found a similar shift when the preceding context consisted of nonspeech sounds mimicking the spectral characteristics of the actual syllables (e.g., tone glides). Thus, the acoustic composition of the context, and in particular its spectral contrast with the following syllable, rather than an underlying reference to abstract articulatory gestures, was able to account for Mann’s context effect (but see Fowler, Brown, & Mann, 2000, for a subsequent multimodal challenge to the auditory account).
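The spectral-contrast account lends itself to a simple numerical illustration. The Python sketch below is a caricature under invented assumptions: the F3 values, the /da/-/ga/ boundary, and the contrast weight are all made-up numbers chosen only to reproduce the direction of the effect, with /da/ assumed to have a higher F3 onset than /ga/. The effective F3 of an ambiguous token is pushed away from the F3 of the preceding context, so a high-F3 context such as /al/ yields more /ga/ percepts and a low-F3 context such as /ar/ yields more /da/ percepts, as in Mann (1980).

```python
# Toy illustration of the spectral-contrast account of Mann's (1980)
# context effect. All values are hypothetical, for illustration only.
F3_BOUNDARY = 2500.0     # assumed: /da/ above this F3 onset, /ga/ below (Hz)
CONTRAST_WEIGHT = 0.3    # assumed strength of the contrast effect

def perceived_f3(f3_target, f3_context):
    """Shift the effective F3 away from the context's F3 (contrast)."""
    return f3_target - CONTRAST_WEIGHT * (f3_context - f3_target)

def classify(f3_target, f3_context):
    return "/da/" if perceived_f3(f3_target, f3_context) > F3_BOUNDARY else "/ga/"

ambiguous_f3 = 2520.0                 # token just on the /da/ side in isolation
for context, f3 in (("/al/", 2900.0), ("/ar/", 1800.0)):  # high vs. low F3 offset
    print(f"after {context}: heard as {classify(ambiguous_f3, f3)}")
```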

However, auditory theories have been criticized for lacking in theoretical content. Auditory accounts are indeed largely based on counterarguments (and counterevidence) to the motor and gestural theories, rather than resting on a clear set of falsifiable principles (Diehl et al., 2004 ). While it is clear that a great deal of phenomena previously believed to require a gestural account can be explained within an arguably simpler auditory framework, it remains to be seen whether auditory theories can provide a satisfactory explanation for the entire class of phenomena in which the many-to-one puzzle has been observed (e.g., Pardo & Remez, 2006 ).

Pecking rates at test for positive stimuli (/dVs/) and negative stimuli (all others) for one of the quail in Kluender et al.’s (1987) study in eight vowel contexts. The test session was preceded by a learning phase in which the quail learned to discriminate /dVs/ syllables (i.e., syllables starting with /d/ and ending with /s/, with a varying intervocalic vowel) from /bVs/ and /gVs/ syllables, with four different intervocalic vowels not used in the test phase. During learning, the quail was rewarded for pecking in response to /d/-initial syllables (positive trials) but not to /b/- and /g/-initial syllables (negative trials). The figure shows that, at test, the quail pecked substantially more to positive than negative syllables, even though these syllables contained entirely new vowels, that is, vowels leading to different formant transitions with the initial consonant than those experienced during the learning phase. (Reprinted from Kluender, K. R., Diehl, R. L., & Killeen, P. R. [1987]. Japanese quail can form phonetic categories. Science, 237, 1195–1197 [Figure 1], by permission of the American Association for the Advancement of Science.)

Top-Down Theories

This rubric and the following one (bottom-up theories) review theories of speech perception broadly construed. They are broadly construed in that they consider phonemic categorization, the scope of the narrowly construed theories, in the context of its interface with lexical knowledge. Although the traditional separation between narrowly and broadly construed theories originates from the respective historical goals of speech-perception and spoken-word-recognition research (Pisoni & Luce, 1987), an understanding of speech perception cannot be complete without an analysis of the impact of long-term knowledge on early sensory processes (see useful reviews in Goldinger, Pisoni, & Luce, 1996; Jusczyk & Luce, 2002).

The hallmark of top-down approaches to speech perception is that phonetic analysis and categorization can be influenced by knowledge stored in long-term memory, lexical knowledge in particular. As mentioned earlier, phoneme restoration studies (e.g., Warren & Obusek, 1971 ; Warren & Warren, 1970 ) showed that word knowledge could affect listeners’ interpretation of what they heard, but they did not provide direct evidence that phonetic categorization per se (i.e., perception , as it was referred to in that literature) was modified by lexical expectations. However, Samuel ( 1981 ) demonstrated that auditory acuity was indeed altered when lexical information was available (e.g., “pr*gress” [from “progress”], with * indicating the portion on which auditory acuity was measured) compared to when it was not (e.g., “cr*gress” [from the nonword “crogress”]).

This kind of result (see also, e.g., Ganong, 1980 ; Marslen-Wilson & Tyler, 1980 ; and, more recently, Gow, Segawa, Ahlfors, & Lin, 2008 ) led to conceptualizing the speech system as being deeply interactive, with information flowing not only from bottom to top but also from top down. For example, the TRACE model (more specifically, TRACE II; McClelland & Elman, 1986 ) is an interactive-activation model made of a large number of units organized into three levels: features, phonemes, and words (Fig. 26.7 A). The model includes bottom-up excitatory connections (from features to phonemes and from phonemes to words), inhibitory lateral connections (within each level), and, critically, top-down excitatory connections (from words to phonemes and from phonemes to features). Thus, the activation levels of features, for example, voicing, nasality, and burst, are partly determined by the activation levels of phonemes, and these are partly determined by the activation levels of words. In essence, this architecture places speech perception within a system that allows a given sensory input to yield a different perceptual experience (as opposed to interpretive experience) when it occurs in a word versus a nonword or next to phoneme x versus phoneme y, and so on. TRACE has been shown to simulate a large range of perceptual and psycholinguistic phenomena, for example, categorical perception, cue trading relations, phonetic context effects, compensation for coarticulation, lexical effects on phoneme detection/categorization, segmentation of embedded words, and so on. All this takes place within an architecture that is neither domain nor species specific. Later instantiations of TRACE have been proposed by McClelland ( 1991 ) and Movellan and McClelland ( 2001 ), but all of them preserve the core interactive architecture described in the original model.
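The flavor of interactive activation can be conveyed with a drastically simplified sketch. The Python fragment below is not TRACE itself (the published model has feature-level units, time slices, and calibrated parameters), and every number in it is invented. It merely shows how top-down excitation from a word unit can tip the balance between two phoneme units that receive identical bottom-up support, as in the Ganong (1980) situation where an ambiguous /t/-/d/ segment is followed by “...ask.”

```python
# A schematic interactive-activation loop in the spirit of TRACE
# (McClelland & Elman, 1986). All parameter values are invented.
EXCITE, INHIBIT, FEEDBACK, DECAY = 0.4, 0.2, 0.3, 0.1

bottom_up = {"t": 0.5, "d": 0.5}   # ambiguous word-initial segment
phoneme = {"t": 0.0, "d": 0.0}
word = {"task": 0.0}               # the toy lexicon has "task" but no "dask"

def clamp(x):
    return max(0.0, min(1.0, x))

for _ in range(30):
    new_phoneme = {}
    for p, rival in (("t", "d"), ("d", "t")):
        top_down = FEEDBACK * word["task"] if p == "t" else 0.0  # /t/ is in "task"
        net = EXCITE * bottom_up[p] + top_down - INHIBIT * phoneme[rival]
        new_phoneme[p] = clamp((1 - DECAY) * phoneme[p] + net)
    # The word unit is excited by its initial phoneme (other phonemes omitted).
    word["task"] = clamp((1 - DECAY) * word["task"] + EXCITE * new_phoneme["t"])
    phoneme = new_phoneme

print(phoneme)  # /t/ ends up far more active than /d/: lexical feedback at work
```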

Like TRACE, Grossberg’s Adaptive Resonance Theory (ART; e.g., Grossberg, 1986 ; Grossberg & Myers, 1999 ) suggests that perception emerges from a compromise, or stable state, between sensory information and stored lexical knowledge (Fig. 26.7B ). ART includes items (akin to subphonemic features or feature clusters) and list chunks (combinations of items whose composition is the result of prior learning; e.g., phonemes, syllables, or words). In ART, a sensory input activates items that, in turn, activate list chunks. List chunks feed back to component items, and items back to list chunks again in a bottom-up/top-down cyclic manner that extends over time, ultimately creating stable resonance between a set of items and a list chunk. Both TRACE and ART posit that connections between levels are only excitatory and connections within levels are only inhibitory. In ART, in typical circumstances, attention is directed to large chunks (e.g., words), and hence the content of smaller chunks is generally less readily available. Small mismatches between large chunks and small chunks do not prevent resonance, but large mismatches do. In other words, unlike TRACE, ART does not allow the speech system to “hallucinate” information that is not already there (however, for circumstances in which it could, see Grossberg, 2000a ). Large mismatches lead to the establishment of new chunks, and these gain resonance via subsequent exposure. In doing so, ART provides a solution to the stability-plasticity dilemma, that is, the unwanted erasure of prior learning by more recent learning (Grossberg, 1987 ), also referred to as catastrophic interference (e.g., McCloskey & Cohen, 1989 ).

Thus, like TRACE, ART posits that speech perception results from an online interaction between prelexical and lexical processes. However, ART is more deeply grounded in, and motivated by, biologically plausible neural dynamics, in which reciprocal connectivity and resonance states have been observed (e.g., Felleman & Van Essen, 1991). Likewise, ART replaces the hierarchical structure of TRACE with a more flexible one, in which tiers self-organize over time through competitive dynamics rather than being predefined. Although sometimes accused of placing too few constraints on empirical expectations (Norris et al., 2000), the functional architecture of ART is thought to be more computationally economical than that of TRACE and more amenable to modeling both real-time and long-term temporal aspects of speech processing (Grossberg, Boardman, & Cohen, 1997).
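A caricature of the resonance-and-reset cycle may also help. The Python sketch below is modeled on ART-1-style binary matching rather than on Grossberg’s speech model itself; the chunk contents, vigilance value, and matching rule are all illustrative assumptions. It shows the two behaviors emphasized above: a small mismatch refines an existing chunk (resonance), whereas a large mismatch recruits a new chunk instead of overwriting old ones, which is the essence of the solution to the stability-plasticity dilemma.

```python
import numpy as np

# An ART-1-like matching cycle (cf. Grossberg, 1987), for illustration only.
VIGILANCE = 0.7  # assumed value; higher values demand stricter matches

def art_match(input_vec, chunks):
    """Return the index of the chunk that resonates with the input,
    recruiting a new chunk if every existing one mismatches too much."""
    order = sorted(range(len(chunks)), reverse=True,
                   key=lambda j: np.minimum(input_vec, chunks[j]).sum())
    for j in order:                           # try chunks, best match first
        overlap = np.minimum(input_vec, chunks[j])
        if overlap.sum() / input_vec.sum() >= VIGILANCE:
            chunks[j] = overlap               # resonance: refine the chunk
            return j
    chunks.append(input_vec.copy())           # large mismatch: new chunk
    return len(chunks) - 1

chunks = [np.array([1, 1, 0, 0, 1])]                  # one learned chunk
print(art_match(np.array([1, 1, 0, 0, 0]), chunks))   # 0: resonates, chunk refined
print(art_match(np.array([0, 0, 1, 1, 0]), chunks))   # 1: new chunk, old one intact
```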

Bottom-Up Theories

Bottom-up theories describe effects of lexical and sentential knowledge on phoneme categorization as a consequence of postperceptual biases. In this conceptualization, reporting “progress” when presented with “pr*gress” simply reflects a strategic decision to do so and the functionality of a system that is geared toward meaningful communication—we generally hear words rather than nonwords. Here, phonetic analysis itself is incorruptible by lexical or sentential knowledge. It takes place within an autonomous module that receives no feedback from lexical and postlexical layers. In Cutler and Norris’s ( 1979 ) Race model, phoneme identification is the result of a time race between a sublexical route and a lexical route activated in parallel in an entirely bottom-up fashion (Fig. 26.7C ). In normal circumstances, the lexical route is faster, which means that a sensory input that has a match in the lexicon (e.g., “progress”) is usually read out from that route. A nonlexical sensory input (e.g., “crogress”) is read out from the prelexical route. In this model, “pr*gress” is reported as containing the phoneme /o/ because the lexical route receives enough evidence to activate the word “progress” and, being faster, this route determines the response. In contrast, “cr*gress” does not lead to an acceptable match in the lexicon, and hence, readout is performed from the sublexical route, with the degraded phoneme being faithfully reported as degraded.
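The readout logic of the Race model is simple enough to state in a few lines. In the Python sketch below, the latencies and the one-word lexicon are arbitrary placeholders: the lexical route only enters the race when the input has an acceptable lexical match, and the faster finishing route determines what is reported.

```python
# A schematic race between routes, after Cutler and Norris (1979).
# Latencies are arbitrary; only their ordering matters here.
LEXICON = {"progress"}

def race(degraded_input, lexical_candidate):
    routes = [(300, "sublexical", degraded_input)]          # always runs
    if lexical_candidate in LEXICON:                        # lexical route runs
        routes.append((200, "lexical", lexical_candidate))  # only on a match
    _, route, response = min(routes)                        # fastest route wins
    return route, response

print(race("pr*gress", "progress"))  # ('lexical', 'progress'): /o/ "restored"
print(race("cr*gress", "crogress"))  # ('sublexical', 'cr*gress'): heard degraded
```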

Simplified architecture of ( A ) TRACE, ( B ) ART, ( C ) Race, ( D ) FLMP, and ( E ) Merge. Layers are labeled consistently across models for comparability. Excitatory connections are denoted by arrows. Inhibitory connections are denoted by closed black circles.

Massaro’s Fuzzy Logical Model of Perception (FLMP; Massaro, 1987 , 1996 ; Oden & Massaro, 1978 ) also exhibits a bottom-up architecture, in which various sources of sensory input—for example, auditory, visual—contribute to speech perception without any feedback from the lexical level (Fig. 26.7D ). In FLMP, acoustic-phonetic features are activated multimodally and each feature accumulates a certain level of activation (on a continuous 0-to-1 scale), reflecting the degree of certainty that the feature has appeared in the signal. The profile of features’ activation levels is then compared against a prototypical profile of activation for phonemes stored in memory. Phoneme identification occurs when the match between the actual and prototypical profiles reaches a level determined by goodness-of-fit algorithms. Critically, the match does not need to be perfect to lead to identification; thus, there is no need for lexical top-down feedback. Prelexical and lexical sources of information are then integrated into a conscious percept. Although the extent to which the integration stage can be considered a true instantiation of bottom-up processes is a matter for debate (Massaro, 1996 ), FLMP also predicts that auditory acuity of * is fundamentally comparable in “pr*gress” and “cr*gress”—like the Race model and unlike top-down theories.
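The decision rule at the heart of FLMP is compact enough to spell out. The sketch below implements multiplicative integration followed by normalization (the relative goodness rule); the support values are invented, but the computation shows how a strong visual cue can dominate weak auditory support without any lexical feedback.

```python
from math import prod

def flmp(support):
    """support maps each alternative to its per-source truth values (0..1),
    e.g., one auditory and one visual value. Sources are integrated
    multiplicatively, then normalized across alternatives."""
    goodness = {alt: prod(vals) for alt, vals in support.items()}
    total = sum(goodness.values())
    return {alt: g / total for alt, g in goodness.items()}

# Auditory evidence mildly favors /ba/; visual evidence strongly favors /da/.
print(flmp({"/ba/": [0.6, 0.1], "/da/": [0.4, 0.9]}))
# -> {'/ba/': 0.14..., '/da/': 0.86...}: the integrated percept is /da/.
```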

From an architectural point of view, integration between sublexical and lexical information is handled differently by Norris et al.’s (2000) Merge model. In Merge, the phoneme layer is duplicated into an input layer and a decision layer (Fig. 26.7E). The phoneme input layer feeds forward to the lexical layer (with no top-down connections), and the phoneme decision layer receives input from both the phoneme input layer and the lexical layer. The phoneme decision layer is the place where phonemic and lexical inputs are integrated and where standard lexical phenomena arise (e.g., Ganong, 1980; Samuel, 1981). While both FLMP and Merge produce a decision by integrating unaltered lexical and sublexical information, the input received from the lexical level differs in the two models. In FLMP, lexical activation is relatively independent of the degree of activation of its component phonemes, whereas, in Merge, lexical activation is directly influenced by the pattern of activation sent upward by the phoneme input layer. While Merge has been criticized for exhibiting a contorted architecture (Gow, 2000; Samuel, 2000), being ecologically improbable (e.g., Grossberg, 2000b; Montant, 2000; Stevens, 2000), and being simply a late instantiation of FLMP (Massaro, 2000; Oden, 2000), it has focused the attention of both speech-perception and spoken-word-recognition scientists on a question that remains unanswered.

Bayesian Theories

Despite important differences in functional architecture between top-down and bottom-up models, both classes of models agree that speech perception involves distinct levels of representations (e.g., features, phonemes, words), multiple lexical activation, lexical competition, integration (of some sort) between actual sensory input and lexical expectations, and corrective mechanisms (of some sort) to handle incompleteness or uncertainty in the input. A radically different class of models based on optimal Bayesian inference has recently emerged as an alternative to the ones mentioned earlier—recently in psycholinguistics at least. These models eschew the concept of lexical activation altogether, sometimes doing away with the bottom-up/top-down debate itself—or at a minimum blurring the boundaries between the two mechanisms. For instance, in their Shortlist B model, Norris and McQueen ( 2008 ) have replaced activation with the concepts of likelihood and probability, which are seen as better approximations of actual (i.e., imperfect) human performance in the face of actual (i.e., complex and variable) speech input. The appeal of Bayesian computations is substantial because output (or posterior) probabilities, for example, probability that a word will be recognized, are estimated by tabulating both confirmatory and disconfirmatory evidence accumulated over past instances, as opposed to being tied to fixed activation thresholds (Fig. 26.8 ). In particular, Shortlist B has replaced discrete input categories such as features and phonemes with phoneme likelihoods calculated from actual speech data. Because they are derived from real speech, the phoneme likelihoods vary from instance to instance and as a function of the quality of the input and the phonetic context. Thus, while noisier, these probabilities are a better reflection of the type of challenge faced by the speech system in everyday conditions. They also allow the model to provide a single account for speech phenomena that usually require distinct ad-hoc mechanisms in other models. A general criticism leveled against Bayesian models, however, concerns the legitimacy of their priors , that is, the set of assumptions used to determine initial probabilities before any evidence has been gathered (e.g., how expected is a word or a phoneme a priori). Because priors can be difficult to establish, their arbitrariness or the modeler’s own biases can have substantial effects on the model’s outcome. Likewise, compared to the models reviewed earlier, models based on Bayesian inference often lead to less straightforward hypotheses, which makes their testability somewhat limited—even though their performance level in terms of replicating known patterns of data is usually high.

Main Bayesian equation in Shortlist B (Norris & McQueen, 2008 ). P(word i |evidence) is the conditional probability of a specific word ( word i ) having been heard given the available (intact or degraded) input ( evidence ). P(word i ) represents the listener’s prior belief, before any perceptual evidence has been accumulated, that word i will be present in the input. P(word i ) can be approximated from lexical frequencies and contextual variables. The critical term of the equation is P(evidence|word i ) , which is the likelihood of the evidence given word i , that is, the product of the probabilities of the sublexical units (e.g., phonemes) making up word i . This term is important because it acknowledges and takes into account the variability of the input (noise, ambiguity, idiosyncratic realization, etc.) in the input-to-representation mapping process. The probability of word i so calculated is then compared to that of all other words in the lexicon ( n ). Thus, Bayesian inference provides an index of word recognition that considers both lexical and sublexical factors as well as the complexity of a real and variable input.
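In standard notation, the equation described in this caption can be written as shown below, with the likelihood term decomposed over the word’s sublexical units as the caption indicates (the full Shortlist B formulation contains further details not reproduced here):

```latex
P(\mathrm{word}_i \mid \mathrm{evidence}) =
  \frac{P(\mathrm{evidence} \mid \mathrm{word}_i)\,P(\mathrm{word}_i)}
       {\sum_{j=1}^{n} P(\mathrm{evidence} \mid \mathrm{word}_j)\,P(\mathrm{word}_j)},
\quad
P(\mathrm{evidence} \mid \mathrm{word}_i) =
  \prod_{k} P(\mathrm{evidence}_k \mid \mathrm{unit}_{ik})
```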

Tailoring Speech Perception: Learning and Relearning

The literature reviewed so far suggests that perceiving speech involves a set of highly sophisticated processing skills and structures. To what extent are these skills and structures in place at birth? Of particular interest in the context of early theories of speech perception is the way in which speech perception and speech production develop relative to each other and the degree to which perceptual capacities responsible for subtle phonetic discrimination (e.g., voicing distinction) are present in prelinguistic infants. Eimas, Siqueland, Jusczyk, and Vigorito ( 1971 ) showed that 1-month-old infants perceive a voicing-based /ba/-/pa/ continuum categorically, just as adults do. Similarly, like adults (Mattingly, Liberman, Syrdal, & Halwes, 1971 ), young infants show a dissociation between categorical perception with speech and continuous perception with matched nonspeech (Eimas, 1974 ). Infants also seem to start off with an open-ended perceptual system, allowing them to discriminate a wide range of subtle phonetic contrasts—far more contrasts than they will be able to discriminate in adulthood (e.g., Aslin, Werker, & Morgan, 2002 ; Trehub, 1976 ). There is therefore strong evidence that fine speech-perception skills are in place early in life—at least well before the onset of speech production—and operational with minimal, if any, exposure to ambient speech. These findings have led to the idea that speech-specific mechanisms are part of the human biological endowment and have been taken as evidence for the innateness of language, or at least some of its perceptual aspects (Eimas et al., 1971 ). In that sense, an infant has very little to learn about speech perception. If anything, attuning to one’s native language is rather a matter of losing sensitivity to (or unlearning ) phonetic contrasts that have little communicative value for that particular language, for example, the /r/-/l/ distinction for Japanese listeners.

However, the idea that infants are born with a universal discrimination device operating according to a use-it-or-lose-it principle has not gone unchallenged. For instance, on closer examination, discrimination capacities at the end of the first year appear far less acute and far less universal than expected (e.g., Lacerda & Sundberg, 2001). Likewise, discrimination of irrelevant contrasts does not wane as systematically and as fully as the theory would have it (e.g., Polka, Colantonio, & Sundara, 2001). For example, Bowers, Mattys, and Gage (2009) showed that language-specific phonemes learned in early childhood but never heard or produced subsequently, as would be the case for young children of temporarily expatriate parents, can be relearned relatively easily even decades later (Fig. 26.9A). Thus, discriminatory attrition is not as widespread and severe as previously believed, suggesting that the representations of phonemes from “forgotten” languages, that is, those we stop practicing early in life, may be more deeply engraved in our long-term memory than we think.

By and large, however, the literature on early speech perception indicates that infants possess fine language-oriented auditory skills from birth as well as impressive capacities to learn from the ambient auditory scene during the first year of life (Fig. 26.10 ). Auditory deprivation during that period (e.g., otitis media; delay prior to cochlear implantation) can have severe consequences on speech perception and later language development (e.g., Clarkson, Eimas, & Marean, 1989 ; Mody, Schwartz, Gravel, & Ruben, 1999 ), possibly due to a general decrease of attention to sounds (e.g., Houston, Pisoni, Kirk, Ying, & Miyamoto, 2003 ). However, even in such circumstances, partial sensory information is often available through the visual channel (facial and lip information), which might explain the relative resilience of basic speech perception skills to auditory deprivation. Indeed, Kuhl and Meltzoff ( 1982 ) showed that, as early as 4 months of age, infants show a preference for matched audiovisual inputs (e.g., audio /a/ with visual /a/) over mismatched inputs (e.g., audio /a/ with visual /i/). Even more striking, infants around that age seem to integrate discrepant audiovisual information following the typical McGurk pattern observed in adults (Rosenblum, Schmuckler, & Johnson, 1997 ). These results suggest that the multimodal (or amodal) nature of speech perception, a central tenet of Massaro’s Fuzzy Logical Model of Perception (FLMP; cf. Massaro, 1987 ), is present early in life and operates without much prior experience with sound-gesture association. Although the strength of the McGurk effect is lower in infants than adults (e.g., Massaro, Thompson, Barron, & Laren, 1986 ; McGurk & MacDonald, 1976 ), early cross-modal integration is often taken as evidence for gestural theories of speech perception and as a challenge to auditory theories.

A question of growing interest concerns the flexibility of the speech-perception system when it is faced with an unstable or changing input. Can the perceptual categories learned during early infancy be undone or retuned to reflect a new environment? The issue of perceptual (re)learning is central to research on second-language (L2) perception and speech perception in degraded conditions. Evidence for a speech-perception-sensitive period during the first year of life (Trehub, 1976 ) suggests that attuning to new perceptual categories later on should be difficult and perhaps not as complete as it is for categories learned earlier. Late learning of L2 phonetic contrasts (e.g., /r/-/l/ distinction for Japanese L1 speakers) has indeed been shown to be slow, effortful, and imperfect (e.g., Logan, Lively, & Pisoni, 1991 ). However, even in those conditions, learning appears to transfer to tokens produced by new talkers (Logan et al., 1991 ) and, to some degree, to production (Bradlow, Pisoni, Akahane-Yamada, & Tohkura, 1997 ). Successful learning of L2 contrasts is not systematically observed, however. For example, Bowers et al. ( 2009 ) found no evidence that English L1 speakers could learn to discriminate Zulu contrasts (e.g., /b/-//) or Hindi contrasts (e.g., /t/ vs. /˛/) even after 30 days of daily training (Fig. 26.9 B ). Thus, although possible, perceptual learning of L2 contrasts is greatly constrained by the age of L2 exposure, the nature and duration of training, and the phonetic overlap between the L1 and L2 phonetic inventories (e.g., Best, 1994 ; Kuhl, 2000 ).

Perceptual learning of accented L1 and noncanonical speech follows the same general patterns as L2 learning, but it usually leads to faster and more complete retuning (e.g., Bradlow & Bent, 2008 ; Clarke & Garrett, 2004 ). A reason for this difference is that, while L2 contrast learning involves the formation of new perceptual categories, whose boundaries are sometimes in direct conflict with L1 categories, accented L1 learning “simply” involves retuning existing perceptual categories, often by broadening their mapping range. This latter feature makes perceptual learning of accented speech a special instance of the more general debate on the episodic versus abstract nature of phonemic and lexical representations. At issue, here, is whether phonemic and lexical representations consist of a collection of episodic instances in which surface details are preserved (voice, accent, speech rate) or, alternatively, single, abstract representations (i.e., one for each phoneme and one for each word). That at least some surface details of words are preserved in long-term memory is undeniable (e.g., Goldinger, 1998 ). The current debate focuses on (1) whether lexical representations include both indexical (e.g., voice quality) and allophonic (e.g., phonological variants) details (Luce, McLennan, & Charles-Luce, 2003 ); (2) whether such details are of a lexical nature (i.e., stored within the lexicon), rather than sublexical (i.e., stored at the subphonemic, phonemic, or syllabic level; McQueen, Cutler, & Norris, 2006 ); (3) the online time course of episodic trace activation (e.g., Luce et al., 2003 ; McLennan, Luce, & Charles-Luce, 2005 ); (4) the mechanisms responsible for consolidating newly learned instances or new perceptual categories (e.g., Fenn, Nusbaum, & Margoliash, 2003 ); and (5) the possible generalization to other types of noncanonical speech, such as disordered speech (e.g., Lee, Whitehall, & Coccia, 2009 ; Mattys & Liss, 2008 ).

( A ) AX discrimination scores over 30 consecutive days (50% chance level; feedback provided) for Zulu contrasts (e.g., /b/-//) and Hindi contrasts (e.g., /t/ vs. /˛/) by DM, a 20-year-old, male, native English speaker who was exposed to Zulu from 4 to 8 years of age but never heard Zulu subsequently. Note DM’s improvement with the Zulu contrasts over the 30 days, in sharp contrast with his inability to learn the Hindi contrasts. ( B ) Performance on the same task by native English speakers with no prior exposure to Zulu or Hindi. (Adapted with permission from Bowers, J. S., Mattys, S. L., & Gage, S. H., [2009]. Preserved implicit knowledge of a forgotten childhood language. Psychological Science , 20 , 1064–1069 [partial Figure 1].)

Summary of key developmental landmarks for speech perception and speech production in the first year of life. (Reprinted from Kuhl, P. K. [2004]. Early language acquisition: Cracking the speech code. Nature Reviews Neuroscience , 5 , 831–843 [Figure 1], by permission of the Nature Publishing Group.)

According to Samuel and Kraljic (2009), the aforementioned literature should be distinguished from a more recent strand of research that focuses on the specific variables affecting perceptual learning and the mechanisms linking such variables to perception. In particular, Norris, McQueen, and Cutler (2003) found that lexical information is a powerful source of perceptual recalibration. For example, Dutch listeners repeatedly exposed to a word containing a sound halfway between two existing phonemes (e.g., witlo*, where * is ambiguous between /f/ and /s/, with witlof a Dutch word—chicory—and witlos a nonword) subsequently perceived a /f/-/s/ continuum as biased in the direction of the lexically induced percept (more /f/ than /s/ in the witlo* case). Likewise, Bertelson, Vroomen, and de Gelder (2003) found that repeated exposure to McGurk audiovisual stimuli (e.g., audio /a*a/ and visual /aba/ leading to the auditory perception of /aba/) biased the subsequent perception of an audio-only /aba/-/ada/ continuum in the direction of the visually induced percept. Although visually induced perceptual learning seems to be less long-lasting than its lexically induced counterpart (Vroomen, van Linden, Keetels, de Gelder, & Bertelson, 2004), the Norris et al. and Bertelson et al. studies demonstrate that even the mature perceptual system can show a certain degree of flexibility when faced with a changing auditory environment.

Comparison of speech recognition error rate by machines (ASR) and humans. The logarithmic scale on the Y axis shows that ASR performance is approximately one order of magnitude behind human performance across various speech materials (ASR error rate for telephone conversation: 43%). The data were collated by Lippmann ( 1997 ). (Reprinted from Moore, R. K. [ 2007 ]. Spoken language processing by machine. In G. Gaskell [Ed.], Oxford handbook of psycholinguistics (pp. 723–738). Oxford, UK: Oxford University Press [Figure 44.6], by permission of Oxford University Press.)

Speech Recognition by Machines

This chapter was mainly concerned with human speech recognition (HSR), but technological advances in the past decades have allowed the topic of speech perception and recognition to become an economically profitable challenge for engineers and applied computer scientists. A complete review of Automatic Speech Recognition’s (ASR) historical background, issues, and state of the art is beyond the scope of this chapter. However, a brief analysis of ASR in the context of the key topics in HSR reviewed earlier reveals interesting commonalities as well as divergences among the preoccupations and goals of the two fields.

Perhaps the most notable difference between HSR and ASR is their ultimate aim. Whereas HSR aims to provide a description of how the speech system works (processes, representations, functional architecture, biological plausibility), ASR aims to deliver speech transcriptions as error-free as possible, regardless of the biological and cognitive validity of the underlying algorithms. The success of ASR is typically measured by the percentage of words correctly identified from speech samples varying in their acoustic and lexical complexity. While increasing computer capacity and speed have allowed ASR performance to improve substantially since the early systems of the 1970s (e.g., Jelinek, 1976 ; Klatt, 1977 ), ASR accuracy is still about an order of magnitude behind its HSR counterpart (Moore, 2007 ; see Fig. 26.11 ).

What is the cause of the enduring performance gap between ASR and HSR? Given that the basic constraints imposed by the signal (sequentiality, continuity, variability) are the same for humans and machines, it is tempting to conclude that the gap between ASR and HSR will not be bridged until the algorithms of the former resemble those of the latter. And currently, they do not. The architecture of most ASR systems is almost entirely data driven: Its structure is expressed in terms of a network of sequence probabilities calculated over large corpora of natural speech (and their supervised transcription). The ultimate goal of the corpora, or training data, is to provide a database of acoustic-phonetic information sufficiently large that an appropriate match can be found for any input sound sequence. The larger the corpora, the tighter the fit between the input and the acoustic model (e.g., triphones instantiated in hidden Markov models, HMMs; cf. Rabiner & Juang, 1993), and the lower the ASR system’s error rate (Lamel, Gauvain, & Adda, 2000). By that logic, more hours of training data, not more human-like models, are the solution for increased accuracy, giving support to the controversial assertion that human models have so far hindered rather than promoted ASR progress (Jelinek, 1985). However, Moore and Cutler (2001) estimated that increasing corpus sizes from their current average capacity (1,000 hours or less, the equivalent of the average hearing time of a 2-year-old) to 10,000 hours (the average hearing time of a 10-year-old) would only reduce the ASR error rate to 12%.

Thus, a data-driven approach to speech recognition is constrained by more than just the size of the training data set. For example, the lexical and syntactic content of the training data often determines the application for which the ASR system is likely to perform best. Domain-specific systems (e.g., banking transactions by phone) generally reach high recognition accuracy levels even when they are fed continuous speech produced by various speakers, whereas domain-general systems (e.g., speech-recognition packages on personal computers) often have to compromise on the number of speakers they can recognize and/or on training time in order to be effective (Everman et al., 2005). Therefore, one of the current stumbling blocks of ASR systems is language modeling (as opposed to acoustic modeling), that is, the extent to which the systems include higher order knowledge—syntax, semantics, pragmatics—from which inferences can be made to refine the mapping between the signal and the acoustic model. Existing ASR language models are fairly simple, drawing upon the distributional methods of acoustic models in that they simply provide the probability of all possible word sequences based on their occurrences in the training corpora. In that sense, an ASR system can predict that “necklace” is a possible completion of “The burglar stole the…” because of its relatively high transitional probability in the corpora, not because of the semantic knowledge that burglars tend to steal valuable items, and not because of the syntactic knowledge that a noun phrase typically follows a transitive verb. Likewise, ASR systems rarely include the kind of lexical feedback hypothesized in HSR models like TRACE (McClelland & Elman, 1986) and ART (Grossberg, 1986). Like Merge (Norris et al., 2000), ASR systems only allow lexical information and the language model to influence the relative weights of activated candidates, but not the fit between the signal and the acoustic model (Scharenborg, Norris, ten Bosch, & McQueen, 2005).
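The distributional nature of such language models is easy to demonstrate. The toy Python bigram model below, trained on a deliberately silly two-sentence corpus, “predicts” a continuation purely from co-occurrence counts, with no recourse to syntax or semantics; real ASR language models are vastly larger and smoothed, but the principle is the same.

```python
from collections import Counter, defaultdict

# A maximum-likelihood bigram language model over a toy corpus.
corpus = "the burglar stole the necklace . the burglar stole the car .".split()

bigrams = defaultdict(Counter)
for w1, w2 in zip(corpus, corpus[1:]):
    bigrams[w1][w2] += 1

def p_next(w1, w2):
    """P(w2 | w1), estimated from raw co-occurrence counts."""
    total = sum(bigrams[w1].values())
    return bigrams[w1][w2] / total if total else 0.0

print(p_next("stole", "the"))      # 1.0: "the" always followed "stole"
print(p_next("the", "necklace"))   # 0.25: one of four continuations of "the"
```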

While the remaining performance gap between ASR and HSR is widely recognized in the ASR literature, there seems to be no clear consensus on the direction to take in order to reduce it (Moore, 2007 ). Given today’s ever-expanding computer power, increasing the size of training corpora is probably the easiest way of gaining a few percentage points in accuracy, at least in the short term. More radical solutions are also being envisaged, however. For example, attempts are being made to build more linguistically plausible acoustic models by using phonemes (as opposed to di/triphone HMMs) as basic segmentation units (Ostendorf, Digilakis, & Kimball, 1996 ; Russell, 1993 ) or by preserving and exploiting fine acoustic detail in the signal instead of treating it as noise (Carlson & Hawkins, 2007 ; Moore & Maier, 2007 ).

The scientific study of speech perception started in the early 1950s under the impetus of research carried out at the Haskins Laboratories, following the development of the Pattern Playback device. This machine allowed Franklin S. Cooper and his colleagues to visualize speech in the form of a decomposable spectrogram and, reciprocally, to create artificial speech by sounding out the spectrogram. Contemporary speech perception research is both a continuation of its earlier preoccupations with the building blocks of speech perception and a departure from them. On the one hand, the quest for universal units of speech perception and attempts to crack the many-to-one mapping code are still going strong. Still alive, too, is the debate about the involvement of gestural knowledge in speech perception, reignited recently by neuroimaging techniques and the discovery of mirror neurons. On the decline are the ideas that speech is special with respect to audition and that infants are born with speech- and species-specific perceptual capacities. On the other hand, questions have necessarily spread beyond the sublexical level, following the assumption that decoding the sensory input must be investigated in the context of the entirety of the language system—or, at the very least, some of its phonologically related components. Indeed, lexical feedback, online or learning related, has been shown to modulate the perceptual experience of an otherwise unchanged input. Likewise, what used to be treated as speech surface details (e.g., indexical variations), and commonly filtered out for the sake of modeling simplicity, are now more fully acknowledged as being preserved during encoding, embedded in long-term representations, and used during retrieval. Speech-perception research in the coming decades is likely to expand its interest not only to the rest of the language system but also to domain-general cognitive functions such as attention and memory as well as practical applications (e.g., ASR) in the field of artificial intelligence. At the same time, researchers have become increasingly concerned with the external validity of their models. Attempts to enhance the ecological contribution of speech research are manifest in a sharp increase in studies using natural speech (conversational, accented, disordered) as the front end of their models.

References

Aslin, R. N., Werker, J. F., & Morgan, J. L. (2002). Innate phonetic boundaries revisited. Journal of the Acoustical Society of America, 112, 1257–1260.

Bard, E. G. , Shillcock, R. C. , & Altmann, G. T. M. ( 1988 ). The recognition of words after their acoustic offsets in spontaneous speech: Effects of subsequent context. Perception and Psychophysics , 44 , 395–408.

Bertelson, P. , Vroomen, J. , & de Gelder, B. ( 2003 ). Visual recalibration of auditory speech identification: A McGurk aftereffect. Psychological Science , 14, 592–597.

Best, C. T. ( 1994 ). The emergence of native-language phonological influences in infants: A perceptual assimilation model. In H. Nusbaum & J. Goodman (Eds.), The transition from speech sounds to spoken words: The development of speech perception (pp. 167–224). Cambridge, MA: MIT Press.

Blumstein, S. E., & Stevens, K. N. (1981). Phonetic features and acoustic invariance in speech. Cognition, 10, 25–32.

Bowers, J. S. , Mattys, S. L. , & Gage, S. H. ( 2009 ). Preserved implicit knowledge of a forgotten childhood language. Psychological Science , 20, 1064–1069.

Bradlow, A. R. , & Bent, T. ( 2008 ). Perceptual adaptation to non-native speech.   Cognition , 106, 707–729.

Bradlow, A. R., Pisoni, D. B., Akahane-Yamada, R., & Tohkura, Y. (1997). Training Japanese listeners to identify English /r/ and /l/: IV. Some effects of perceptual learning on speech production. Journal of the Acoustical Society of America, 101, 2299–2310.

Carlson, R. , & Hawkins, S. (2007). When is fine phonetic detail a detail? In Proceedings of the 16th ICPhS Meeting (pp. 211–214). Saarbrücken, Germany.

Clarke, C. M. , & Garrett, M. F. ( 2004 ). Rapid adaptation to foreign-accented English. Journal of the Acoustical Society of America , 116, 3647–3658.

Clarkson, R. L. , Eimas, P. D. , & Marean, G. C. ( 1989 ). Speech perception in children with histories of recurrent otitis media. Journal of the Acoustical Society of America , 85, 926–933.

Connine, C. M. , & Clifton, C. ( 1987 ) Interactive use of lexical information in speech perception. Journal of Experimental Psychology: Human Perception and Performance , 13, 291–299.

Cutler, A. (1994). Segmentation problems, rhythmic solutions. Lingua, 92, 81–104.

Cutler, A. , & Norris, D. ( 1979 ). Monitoring sentence comprehension. In W. E. Cooper & E. C. T. Walker (Eds.), Sentence processing: Psycholinguistic studies presented to Merrill Garrett (pp. 113–134). Hillsdale, NJ: Erlbaum.

Dahan, D. (2010). The time course of interpretation in speech comprehension. Current Directions in Psychological Science, 19, 121–126.

Delattre, P. C. , Liberman, A. M. , & Cooper, F. S. ( 1955 ). Acoustic loci and transitional cues for consonants.   Journal of the Acoustical Society of America , 27, 769–773.

Diehl, R. L. , Lotto, A. J. , & Holt, L. L. ( 2004 ). Speech perception. Annual Review of Psychology , 55, 149–179.

Dupoux, E. , & Mehler, J. ( 1990 ). Monitoring the lexicon with normal and compressed speech: Frequency effects and the prelexical code. Journal of Memory and Language , 29, 316–335.

Eimas, P. D. ( 1974 ). Auditory and linguistic processing of cues for place of articulation by infants. Perception and Psychophysics , 16, 513–521.

Eimas, P. D. , Siqueland, E. R. , Jusczyk, P. , & Vigorito, J. ( 1971 ). Speech perception in infants. Science , 171, 303–306.

Everman, G., Chan, H. Y., Gales, M. J. F., Jia, B., Mrva, D., & Woodland, P. C. (2005). Training LVCSR systems on thousands of hours of data. In Proceedings of the IEEE ICASSP (pp. 209–212).

Felleman, D. , & Van Essen, D. ( 1991 ). Distributed hierarchical processing in primate cerebral cortex. Cerebral Cortex , 1, 1–47.

Fenn, K. M. , Nusbaum, H. C. , & Margoliash, D. ( 2003 ). Consolidation during sleep of perceptual learning of spoken language. Nature , 425, 614–616.

Fougeron, C. , & Keating, P. A. ( 1997 ). Articulatory strengthening at edges of prosodic domains. Journal of the Acoustical Society of America , 101, 3728–3740.

Foulke, E. , & Sticht, T. G. ( 1969 ). Review of research on the intelligibility and comprehension of accelerated speech. Psychological Bulletin , 72, 50–62.

Fowler, C. A. ( 1986 ). An event approach to the study of speech perception from a direct-realist perspective. Journal of Phonetics , 14, 3–28.

Fowler, C. A. ( 1996 ). Listeners do hear sounds not tongues.   Journal of the Acoustical Society of America , 99, 1730–1741.

Fowler, C. A. ( 2006 ). Compensation for coarticulation reflects gesture perception, not spectral contrast. Perception and Psychophysics , 68, 161–177.

Fowler, C. A. (2008). The FLMP STMPed. Psychonomic Bulletin and Review, 15, 458–462.

Fowler, C. A. , Brown, J. M. , & Mann, V. A. ( 2000 ). Contrast effects do not underlie effects of preceding liquids on stop-consonant identification by humans. Journal of Experimental Psychology: Human Perception and Performance , 26, 877–888.

Galantucci, B. , Fowler, C. A. , & Turvey, M. T. ( 2006 ). The motor theory of speech perception reviewed. Psychonomic Bulletin and Review , 13, 361–377.

Ganong, W. F. ( 1980 ). Phonetic categorization in auditory word perception. Journal of Experimental Psychology: Human Perception and Performance , 6, 110–125.

Goldinger, S. D. ( 1998 ). Echoes of echoes? An episodic theory of lexical access. Psychological Review , 105, 251–279.

Goldinger, S. D. , Pisoni, D. B. , & Luce, P. A. ( 1996 ). Speech perception and spoken word recognition: Research and theory. In N. J. Lass (Ed.), Principles of experimental phonetics (pp. 277–327). St. Louis, MO: Mosby.

Gow, D. W. ( 2000 ). One phonemic representation should suffice.   Behavioral and Brain Science , 23, 331.

Gow, D. W. , & Segawa, J. A. ( 2009 ). Articulatory mediation of speech perception: A causal analysis of multi-modal imaging data. Cognition , 110, 222–236.

Gow, D. W., Segawa, J. A., Ahlfors, S. P., & Lin, F. H. (2008). Lexical influences on speech perception: A Granger causality analysis of MEG and EEG source estimates. Neuroimage, 43, 614–623.

Grossberg, S. ( 1986 ). The adaptive self-organization of serial order in behavior: Speech, language, and motor control. In E. C. Schwab & H. C. Nusbaum (Eds.), Pattern recognition by humans and machines, Vol 1. Speech perception (pp. 187–294). New York: Academic Press.

Grossberg, S. (1987). Competitive learning: From interactive activations to adaptive resonance. Cognitive Science, 11, 23–63.

Grossberg, S. ( 2000 a). How hallucinations may arise from brain mechanisms of learning, attention, and volition. Journal of the International Neuropsychological Society , 6, 579–588.

Grossberg, S. ( 2000 b). Brain feedback and adaptive resonance in speech perception. Behavioral and Brain Science , 23, 332–333.

Grossberg, S. , Boardman, I. , & Cohen, M. A. ( 1997 ). Neural dynamics of variable-rate speech categorization. Journal of Experimental Psychology: Human Perception and Performance , 23, 481–503.

Grossberg, S. , & Myers, C. ( 1999 ). The resonant dynamics of conscious speech: Interword integration and duration-dependent backward effects. Psychological Review , 107, 735–767.

Houston, D. M. , Pisoni, D. B. , Kirk, K. I. , Ying, E. A. , & Miyamoto, R. T. ( 2003 ). Speech perception skills of infants following cochlear implantation: A first report. International Journal of Pediatric Otorhinolaryngology , 67, 479–495.

Huggins, A.W. F. ( 1975 ). Temporally segmented speech and “echoic” storage. In A. Cohen & S. G. Nooteboom (Eds.), Structure and process in speech perception (pp. 209–225). New York: Springer-Verlag.

Jelinek, F. ( 1976 ). Continuous speech recognition by statistical methods. Proceedings of the IEEE , 64, 532–556.

Jelinek, F. ( 1985 ). Every time I fire a linguist, the performance of my system goes up . Public statement at the IEEE ASSPS Workshop on Frontiers of Speech Recognition, Tampa, Florida.

Jusczyk, P. W. , & Luce, P. A. ( 2002 ). Speech perception and spoken word recognition: Past and present. Ear and Hearing , 23, 2–40.

Klatt, D. H. ( 1977 ). Review of the ARPA speech understanding project.   Journal of the Acoustical Society of America , 62, 1345–1366.

Kluender, K. R. , Diehl, R. L. , & Killeen, P. R. ( 1987 ). Japanese quail can form phonetic categories.   Science , 237, 1195–1197.

Kuhl, P. K. (1981). Discrimination of speech by non-human animals: Basic auditory sensitivities conducive to the perception of speech-sound categories. Journal of the Acoustical Society of America, 70, 340–349.

Kuhl, P. K. ( 2000 ). A new view of language acquisition.   Proceedings of the National Academy of Sciences USA , 97, 11850–11857.

Kuhl, P. K. ( 2004 ). Early language acquisition: Cracking the speech code. Nature Reviews Neuroscience , 5, 831–843.

Kuhl, P. K. , & Meltzoff, A. N. ( 1982 ). The bimodal development of speech in infancy. Science , 218, 1138–1141.

Lacerda, F. , & Sundberg, U. ( 2001 ). Auditory and articulatory biases influence the initial stages of the language acquisition process. In F. Lacerda , C. von Hofsten , & M. Heimann (Eds.), Emerging cognitive abilities in early infancy (pp. 91–110). Mahwah, NJ: Erlbaum.

Lachs, L. , Pisoni, D. B. , & Kirk, K. I. ( 2001 ). Use of audio-visual information in speech perception by pre-lingually deaf children with cochlear implants: A first report. Ear and Hearing , 22, 236–251.

Lamel, L. , Gauvain, J-L. , & Adda, G. ( 2000 ). Lightly supervised acoustic model training. In Proceeding of the ISCA Workshop on Automatic Speech Recognition (pp. 150–154).

Lee, A. , Whitehall, T. L. , & Coccia, V. ( 2009 ). Effect of listener training on perceptual judgement of hypernasality. Clinical Linguistics and Phonetics , 23, 319–334.

Liberman, A. M. ( 1996 ). Speech: A special code . Cambridge, MA: MIT Press.

Liberman, A. M. , Delattre, P. C. , & Cooper, F. S. ( 1958 ). Some cues for the distinction between voiced and voiceless stops in initial position. Language and Speech , 1, 153–167.

Liberman, A. M. , Harris, K. S. , Eimas, P. , Lisker, L. , & Bastian, J. ( 1961 ). An effect of learning on speech perception: The discrimination of durations of silence with and without phonemic significance. Language and Speech , 4, 175–195.

Liberman, A. M. , Harris, K. S. , Hoffman, H. S. , & Griffith, B. C. ( 1957 ). The discrimination of speech sounds within and across phoneme boundaries. Journal of Experimental Psychology , 54, 358–368.

Liberman, A. M. , & Mattingly, I. G. ( 1985 ). The motor theory of speech perception revised. Cognition , 21, 1–36.

Lindblom, B. ( 1963 ). Spectrographic study of vowel reduction.   Journal of the Acoustical Society of America , 35, 1773–1781.

Lippmann, R. ( 1997 ). Speech recognition by machines and humans.   Speech Communication , 22, 1–16.

Lisker, L. , & Abramson, A. S. ( 1970 ). The voicing dimensions: Some experiments in comparative phonetics. In Proceedings of the Sixth International Congress of Phonetic Sciences (pp. 563–567) . Prague, Czechoslovakia: Academia.

Logan, J. S. , Lively, S. E. , & Pisoni, D. B. ( 1991 ). Training Japanese listeners to identify English /r/ and /l/: A first report. Journal of the Acoustical Society of America , 89, 874–886.

Lotto, A. J. , Hickok, G. S. , & Holt, L. L. ( 2009 ). Reflections on mirror neurons and speech perception.   Trends in Cognitive Science , 13, 110–114.

Lotto, A. J. , & Holt, L. L. ( 2006 ). Putting phonetic context effects into context: A commentary on Fowler (2006). Perception and Psychophysics , 68, 178–183.

Lotto, A. J. , & Kluender, K. R. ( 1998 ). General contrast effects in speech perception: Effect of preceding liquid on stop consonant identification. Perception and Psychophysics , 60, 602–619.

Luce, P. A. , Mc Lennan, C. T. , & Charles-Luce, J. ( 2003 ). Abstractness and specificity in spoken word recognition: Indexical and allophonic variability in long-term repetition priming. In J. Bowers & C. Marsolek (Eds.), Rethinking implicit memory (pp. 197–214). New York: Oxford University Press.

Magnuson, J. S. , McMurray, B. , Tanenhaus, M. K. , & Aslin, R. N. ( 2003 ). Lexical effects on compensation for coarticulation: The ghost of Christmas past. Cognitive Science , 27, 285–298.

Mann, V. A. ( 1980 ). Influence of preceding liquid on stop-consonant perception. Perception and Psychophysics , 28, 407–412.

Marslen-Wilson, W. D. ( 1987 ). Functional parallelism in spoken word recognition. Cognition , 25, 71–102.

Marslen-Wilson, W. D. , & Tyler, L. K. ( 1980 ). The temporal structure of spoken language understanding. Cognition , 8, 1–71.

Massaro, D. W. ( 1987 ). Speech perception by ear and eye: A paradigm for psychological inquiry . Hillsdale, NJ: Erlbaum.

Massaro, D. W. ( 1989 ). Testing between the TRACE model and the Fuzzy Logical Model of speech perception. Cognitive Psychology , 21, 398–421.

Massaro, D. W. ( 1996 ). Integration of multiple sources of information in language processing. In T. Inui & J. L. McClelland (Eds.), Attention and performance XVI: Information integration in perception and communication (pp. 397–432). Cambridge, MA: MIT Press.

Massaro, D. W. ( 2000 ). The horse race to language understanding: FLMP was first out of the gate and has yet to be overtaken. Behavioral and Brain Science , 23, 338–339.

Massaro, D. W. , & Chen, T. H. ( 2008 ). The motor theory of speech perception revisited. Psychonomic Bulletin and Review , 15, 453–457.

Massaro, D. W. , Thompson, L. A. , Barron, B. , & Laren, E. ( 1986 ). Developmental changes in visual and auditory contributions to speech perception. Journal of Experimental Child Psychology , 41, 93–113.

Mattingly, I. G. , Liberman, A. M. , Syrdal A. K. , & Halwes T. ( 1971 ). Discrimination in speech and nonspeech modes.   Cognitive Psychology , 2, 131–157.

Mattys, S. L. ( 1997 ). The use of time during lexical processing and segmentation: A review. Psychonomic Bulletin and Review , 4, 310–329.

Mattys, S. L. , & Liss, J. M. ( 2008 ). On building models of spoken-word recognition: When there is as much to learn from natural “oddities” as from artificial normality. Perception and Psychophysics , 70, 1235–1242.

Mattys, S. L. , White, L. , & Melhorn, J. F ( 2005 ). Integration of multiple speech segmentation cues: A hierarchical framework. Journal of Experimental Psychology: General , 134, 477–500.

McClelland, J. L. ( 1991 ). Stochastic interactive processes and the effect of context on perception. Cognitive Psychology , 23, 1–44.

McClelland, J. L. , & Elman, J. L. ( 1986 ). The TRACE model of speech perception. Cognitive Psychology , 18, 1–86.

McCloskey, M. , & Cohen, N. J. ( 1989 ). Catastrophic interference in connectionist networks: The sequential learning problem. The Psychology of Learning and Motivation , 24, 109–165.

McGettigan, C. , Agnew, Z. K. , & Scott, S. K. ( 2010 ). Are articulatory commands automatically and involuntarily activated during speech perception? Proceedings of the National Academy of Sciences USA , 107, E42.

McGurk, H. , & MacDonald, J. W. ( 1976 ). Hearing lips and seeing voices.   Nature , 264, 746–748.

McLennan, C. T. , Luce, P. A. , & Charles-Luce, J. ( 2005 ). Examining the time course of indexical specificity effects in spoken word recognition. Journal of Experimental Psychology: Learning Memory and Cognition , 31, 306–321.

McQueen, J. M. ( 1998 ). Segmentation of continuous speech using phonotactics. Journal of Memory and Language , 39, 21–46.

McQueen, J. M. , Cutler, A. , & Norris, D. ( 2006 ). Phonological abstraction in the mental lexicon. Cognitive Science , 30, 1113–1126.

Miller, J. D.   Wier, C. C. , Pastore, R. , Kelly, W. J. , & Dooling, R. J. ( 1976 ). Discrimination and labeling of noise-buzz sequences with varying noise lead times: An example of categorical perception. Journal of the Acoustical Society of America , 60, 410–417.

Miller, J. L. , & Liberman, A. M. ( 1979 ). Some effects of later-occurring information on the perception of stop consonant and semivowel. Perception and Psychophysics , 25, 457–465.

Mody, M. , Schwartz, R. G. , Gravel, R. S. , & Ruben, R. J. ( 1999 ). Speech perception and verbal memory in children with and without histories of otitis media. Journal of Speech, Language and Hearing Research , 42, 1069–1079.

Montant, M. ( 2000 ). Feedback: A general mechanism in the brain.   Behavioral and Brain Science , 23, 340–341.

Moore, R. K. ( 2007 ). Spoken language processing by machine. In G. Gaskell (Ed.), Oxford handbook of psycholinguistics (pp. 723–738). Oxford, England: Oxford University Press.

Moore, R. K. , & Cutler, A. (2001, July 11-13). Constraints on theories of human vs. machine recognition of speech . Paper presented at the SPRAAC Workshop on Human Speech Recognition as Pattern Classification, Max-Planck-Institute for Psycholinguistics, Nijmegen, The Netherlands.

Moore, R. K. , & Maier, V . (2007). Preserving fine phonetic detail using episodic memory: Automatic speech recognition using MINERVA2. In Proceedings of the 16th ICPhS Meeting (pp. 197–203). Saarbrücken, Germany.

Movellan, J. R. , & McClelland, J. L. ( 2001 ). The Morton-Massaro law of information integration: Implications for models of perception. Psychological Review , 108, 113–148.

Nooteboom, S. G. ( 1979 ). The time course of speech perception. In W. J. Barry & K. J. Kohler (Eds.), “Time” in the production and perception of speech (Arbeitsberichte 12). Kiel, Germany: Institut für Phonetik, University of Kiel.

Norris, D. ( 1994 ). Shortlist: A connectionist model of continuous speech recognition. Cognition , 52, 189–234.

Norris, D. , & McQueen, J. M. ( 2008 ). Shortlist B: A Bayesian model of continuous speech recognition. Psychological Review , 115 , 357–395.

Norris, D. , McQueen. J. M. , & Cutler, A. ( 2000 ). Merging information in speech recognition: Feedback is never necessary. Behavioral and Brain Sciences , 23, 299–370.

Norris, D. , McQueen, J. M. , & Cutler, A. ( 2003 ). Perceptual learning in speech. Cognitive Psychology , 47, 204–238.

Oden, G. C. ( 2000 ). Implausibility versus misinterpretation of the FLMP.   Behavioral and Brain Science , 23, 344.

Oden, G. C. , & Massaro, D. W. ( 1978 ). Integration of featural information in speech perception. Psychological Review , 85, 172–191.

Ostendorf, M. , Digilakis, V. , & Kimball, O. A. ( 1996 ). From HMMs to segment models: A unified view of stochastic modelling for speech recognition. IEEE Transactions, Speech and Audio Processing , 4, 360–378.

Pardo, J. S. , & Remez, R. E. ( 2006 ). The perception of speech. In M. Traxler & M. A. Gernsbacher (Eds.), Handbook of psycholinguistics (2nd ed., pp. 201–248). New York: Academic Press.

Pisoni, D. B. , & Luce, P. A. ( 1987 ). Acoustic-phonetic representations in word recognition. Cognition , 25, 21–52.

Pitt, M. A. , & Samuel, A. G. ( 1995 ). Lexical and sublexical feedback in auditory word recognition. Cognitive Psychology , 29 , 149–188.

Polka, L. , Colantonio, C. , & Sundara, M. ( 2001 ). A cross-language comparison of /d/–/Δ/ perception: Evidence for a new developmental pattern. Journal of the Acoustical Society of America , 109, 2190–2201.

Port, R. F. (1977). The influence of speaking tempo on the duration of stressed vowel and medial stop in English Trochee words . Unpublished Ph.D. dissertation, Indiana University, Bloomington.

Potter, R. K. , Kopp, G. A. , & Green, H. C. ( 1947 ). Visible speech . New York: D. Van Nostrand.

Pulvermüller, F. , Huss, M. , Kherif, F. , Moscoso Del Prado Martin, F. , Hauk, O. , & Shtyrof, Y. ( 2006 ). Motor cortex maps articulatory features of speech sounds. Proceedings of the National Academy of Sciences USA , 103, 7865–7870.

Rabiner, L. , & Juang, B. H. ( 1993 ). Fundamentals of speech recognition. Englewood Cliffs, NJ: Prentice Hall.

Radeau, M. , Morais, J. , Mousty, P. , & Bertelson, P. ( 2000 ). The effect of speaking rate on the role of the uniqueness point in spoken word recognition. Journal of Memory and Language , 42, 406–422.

Rastle, K. , Davis, M. H. , & Brysbaert, M. , ( 2010 ). Response to McGettigan et al.: Task-based accounts are not sufficiently coherent to explain articulatory effects in speech perception. Proceedings Proceedings of the National Academy of Sciences USA , 107, E43.

Reisberg, D. , Mc Lean, J. , & Goldfield, A. ( 1987 ). Easy to hear but hard to understand: A lip-reading advantage with intact auditory stimuli. In R. Campbell & B. Dodd (Eds.), Hearing by eye: The psychology of lip-reading (pp. 97–114). Hillsdale, NJ: Erlbaum.

Rizzolatti, G. , & Craighero, L. ( 2004 ). The mirror-neuron system.   Annual Review of Neuroscience , 27, 169–192,

Rosenblum, L. D. ( 2005 ). Primacy of multimodal speech perception. In D. B. Pisoni & R. E. Remez (Eds.), The handbook of speech perception (pp. 51–78). Oxford, England: Blackwell.

Rosenblum, L. D. , Schmuckler, M. A. , & Johnson, J. A. ( 1997 ). The McGurk effect in infants.   Perception and Psychophysics , 59, 347–357.

Russell, M. J. (1993). A segmental HMM for speech pattern modeling. In Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (pp. 640–643)

Samuel, A. G. ( 1981 ). Phonemic restoration: Insights from a new methodology. Journal of Experimental Psychology: General , 110, 474–494.

Samuel, A. G. ( 1997 ). Lexical activation produces potent phonemic percepts. Cognitive Psychology , 32, 97–127.

Samuel, A. G. ( 2000 ). Merge: Contorted architecture, distorted facts, and purported autonomy. Behavioral and Brain Science , 23, 345–346.

Samuel, A. G. ( 2001 ). Knowing a word affects the fundamental perception of the sounds within it. Psychological Science , 12, 348–351.

Samuel, A. G. , & Kraljic, T. ( 2009 ). Perceptual learning for speech.   Attention, Perception, and Psychophysics , 71, 1207–1218.

Scharenborg, O. , Norris, D. , ten Bosch, L. , & Mc Queen, J. M. ( 2005 ). How should a speech recognizer work ? Cognitive Science , 29, 867–918.

Stevens, K. N. ( 2000 ). Recognition of continuous speech requires top-down processing. Behavioral and Brain Science , 23, 348.

Stevens, K. N. , & Blumstein, S. E. ( 1981 ). The search for invariant acoustic correlates of phonetic features. In P. Eimas & J. Miller (Eds.), Perspectives on the study of speech (pp. 1–38). Hillsdale, NJ: Erlbaum.

Sumby, W. H. , & Pollack, I. ( 1954 ). Visual contribution to speech intelligibility in noise. Journal of the Acoustical Society of America , 26, 212–215.

Sussman, H. M. ( 1989 ). Neural coding of relational invariance in speech: Human language analogs to the barn owl. Psychological Review , 96, 631–642.

Summerfield, A. Q. ( 1981 ). Articulatory rate and perceptual constancy in phonetic perception. Journal of Experimental Psychology: Human Perception and Performance , 7, 1074–1095.

Trehub, S. E. ( 1976 ). The discrimination of foreign speech contrasts by infants and adults. Child Development , 47, 466–472.

Umeda, N. , & Coker, C. H. ( 1974 ). Allophonic variation in American English.   Journal of Phonetics , 2, 1–5.

van Buuren, R. A. , Festen, J. , & Houtgast, T . ( 1999 ). Compression and expansion of the temporal envelope: Evaluation of speech intelligibility and sound quality. Journal of the Acoustical Society of America , 105, 2903–2913.

Vroomen, J. , Van Linden, B. , Keetels, M. , de Gelder, B. , & Bertelson, P. ( 2004 ). Selective adaptation and recalibration of auditory speech by lipread information: Dissipation. Speech Communication , 44, 55–61.

Warren, R. M. , & Obusek, C. J. ( 1971 ). Speech perception phonemic restorations. Perception & Psychophysics , 9 , 358–362.

Warren, R. M. , & Warren, R. P. ( 1970 ). Auditory illusions and confusions.   Scientific American , 223 , 30–36.

Whalen, D. H. , Benson, R. R. , Richardson, M. , Swainson, B. , Clark, V. P. , Lai, S. ,… Liberman, A. M. ( 2006 ). Differentiation of speech and nonspeech processing within primary auditory cortex. Journal of the Acoustical Society of America , 119, 575–581.

Yuen, I. , Davis, M. H. , Brysbaert, M. , & Rastle, K. ( 2010 ). Activation of articulatory information in speech perception. Proceedings of the National Academy of Sciences USA , 107, 592–597.


The Power of Words - Unveiling the Psychology of Speech for Effective Communication and Influence

Updated on 23rd May, 2023

The Fascinating Field of Psychology of Speech

The study of human communication and the intricate interplay between speech and psychology has given rise to a captivating field known as the psychology of speech. This multidisciplinary area of research delves into how speech influences our thoughts, behaviors, and interactions with others. By examining the psychological aspects of speech, we can unravel the complexities of language, communication, and cognition.

The psychology of speech encompasses a wide range of subfields, including speech perception, production, comprehension, and language development. Through rigorous scientific inquiry and investigation, researchers in this field aim to unravel the mysteries behind how we perceive, produce, and understand speech.

One of the fundamental aspects studied in the psychology of speech is speech perception. This involves understanding how we process and interpret speech sounds, tones, and linguistic cues. Researchers explore how our brains analyze phonetic information, recognize patterns, and extract meaning from the sounds and rhythms of speech.

Speech production is another crucial area of inquiry within the psychology of speech. It focuses on the cognitive and physiological processes involved in planning, coordinating, and executing speech movements. Understanding how our thoughts are transformed into spoken words sheds light on the complex motor skills and neural mechanisms that underlie our ability to communicate orally.

Comprehension is an essential component of speech psychology, investigating how we derive meaning from the words and sentences we hear. It explores the role of linguistic structures, context, and cognitive processes in understanding spoken language. By deciphering the intricate workings of comprehension, researchers strive to uncover the mechanisms that allow us to extract and interpret meaning from spoken communication.

Language development is a fascinating aspect of the psychology of speech, focusing on how children acquire language skills and how language evolves throughout our lifespan. Researchers examine the cognitive, social, and environmental factors that influence language acquisition, such as the role of caregiver interactions and exposure to linguistic stimuli.

The knowledge and insights gained from the psychology of speech have practical applications in various domains. Effective communication is crucial in fields such as education, healthcare, business, and interpersonal relationships. By understanding the psychological underpinnings of speech, professionals can enhance their communication skills, tailor their messages to different audiences, and foster stronger connections.

The Role of Psychology of Speech in Public Speaking

Public speaking is a skill that many individuals strive to master. It involves effectively delivering a message to an audience, capturing their attention, and persuading or informing them. The psychology of speech plays a crucial role in understanding the dynamics of public speaking and can provide valuable insights for speakers aiming to engage and connect with their audience.

The psychology of speech sheds light on various aspects that contribute to effective public speaking. One key area of focus is nonverbal communication. Researchers explore how body language, facial expressions, gestures, and vocal tone impact the audience's perception and engagement. Understanding how to align verbal and nonverbal cues can enhance a speaker's ability to convey their message persuasively.

Speech psychology also emphasizes the importance of vocal delivery. The tone, pitch, volume, and pace of speech significantly influence the audience's perception of a speaker's credibility, confidence, and overall message. By understanding the psychology of speech, speakers can learn to modulate their voice, use pauses strategically, and emphasize key points effectively.

Moreover, the psychology of speech highlights the significance of audience analysis and adaptation. Speakers must consider the demographics, preferences, and needs of their audience to tailor their content and delivery style accordingly. Adapting to the audience's communication style, language, and cultural background can foster rapport and engagement.

Another crucial aspect explored in the psychology of speech is the management of anxiety and nervousness. Public speaking often elicits anxiety, which can impact a speaker's delivery and confidence. Understanding the psychological factors underlying these feelings can help speakers employ strategies to manage anxiety effectively, such as deep breathing exercises, positive self-talk, and visualization techniques.

Additionally, the psychology of speech recognizes the power of storytelling in public speaking. By integrating storytelling techniques, speakers can tap into the emotional and narrative elements that resonate with the audience. Understanding the cognitive processes and emotional responses triggered by storytelling can make a speech more memorable and impactful.

The Influence of Psychology of Speech in Effective Communication Skills

Effective communication skills are vital in various aspects of life, including personal relationships, professional settings, and social interactions. The psychology of speech offers valuable insights into understanding and enhancing communication skills, enabling individuals to convey their messages clearly, connect with others, and build meaningful relationships.

Speech psychology emphasizes the role of active listening in effective communication. By understanding how people interpret and process verbal and nonverbal cues, individuals can become more attentive listeners. Active listening involves focusing on the speaker, providing verbal and nonverbal feedback, and demonstrating empathy. Developing active listening skills enhances mutual understanding and strengthens communication bonds.

The psychology of speech also explores the power of effective questioning in communication. Asking relevant and open-ended questions can encourage dialogue, promote deeper understanding, and elicit valuable insights. By mastering the art of asking insightful questions, individuals can foster meaningful conversations and demonstrate genuine interest in others.

Nonverbal communication is another essential aspect studied in the psychology of speech. Body language, facial expressions, eye contact, and gestures can convey emotions, attitudes, and intentions. By becoming aware of these nonverbal cues, individuals can align their verbal and nonverbal communication to enhance clarity and avoid potential misinterpretations.

Understanding the psychology of speech also sheds light on the impact of emotional intelligence in effective communication. Emotional intelligence involves recognizing and managing one's own emotions while empathizing with the emotions of others. By developing emotional intelligence, individuals can navigate conflicts, respond appropriately to others' emotions, and cultivate healthier and more productive communication dynamics.

The psychology of speech also acknowledges the role of assertiveness in effective communication. Being assertive means expressing thoughts, needs, and boundaries in a respectful and confident manner. By developing assertiveness skills, individuals can communicate their perspectives effectively, establish clear boundaries, and engage in constructive problem-solving.

Moreover, speech psychology highlights the importance of adapting communication styles to different contexts and individuals. By understanding the psychology of speech in relation to diverse cultural backgrounds, personality traits, and communication preferences, individuals can adjust their communication approach to foster understanding and establish stronger connections.

The Psychology of Speech in Interpersonal Relationships

Interpersonal relationships play a vital role in our lives, shaping our well-being, happiness, and overall satisfaction. The psychology of speech offers valuable insights into how communication patterns, language use, and speech behaviors influence the dynamics and quality of interpersonal relationships.

One crucial aspect explored in the psychology of speech is the role of effective communication in building and maintaining healthy relationships. Clear and open communication fosters trust, understanding, and emotional connection between individuals. By understanding the principles of effective communication, such as active listening, assertiveness, and empathy, individuals can establish stronger and more fulfilling relationships.

The psychology of speech also delves into the impact of communication styles on relationship dynamics. Different communication styles, such as passive, aggressive, or passive-aggressive, can significantly influence how individuals interact and respond to one another. By recognizing and adapting communication styles, individuals can promote positive communication patterns and resolve conflicts constructively.

Language use and speech behaviors are additional areas of focus in the psychology of speech in interpersonal relationships. The choice of words, tone of voice, and nonverbal cues can affect how messages are received and interpreted by others. Developing awareness of these factors enables individuals to express themselves more effectively and avoid misunderstandings or miscommunications.

Speech psychology also explores the influence of emotional expression and validation in interpersonal relationships. The ability to express and validate emotions promotes a sense of closeness, understanding, and emotional support. Understanding the psychological impact of emotional expression can enhance emotional connection and strengthen relationships.

Conflict resolution is another crucial aspect studied in the psychology of speech. Effective conflict resolution techniques, such as active listening, perspective-taking, and constructive problem-solving, contribute to healthier and more resilient relationships. By understanding the psychological underpinnings of conflict and applying effective communication strategies, individuals can navigate disagreements and maintain positive relationship dynamics.

Additionally, the psychology of speech acknowledges the significance of nonverbal communication in interpersonal relationships. Body language, facial expressions, touch, and eye contact can convey trust, affection, and intimacy. Developing awareness of nonverbal cues can enhance the overall quality of interpersonal relationships.

The Psychology of Speech in Persuasive Communication

Persuasive communication is a skill that plays a significant role in various domains, including marketing, advertising, politics, and everyday interactions. The psychology of speech provides valuable insights into the principles and techniques that contribute to effective persuasive communication, enabling individuals to influence attitudes, behaviors, and decision-making.

One essential aspect explored in the psychology of speech is the art of framing. Framing involves presenting information in a way that influences how it is perceived and interpreted. By understanding the cognitive biases and heuristics that individuals rely on when processing information, persuasive communicators can strategically frame their messages to increase their persuasive impact.

Speech psychology also emphasizes the power of storytelling in persuasive communication. Stories tap into emotions, engage the audience, and make information more relatable and memorable. By incorporating compelling narratives into their messages, persuasive communicators can capture attention, evoke empathy, and ultimately influence beliefs and behaviors.

The psychology of speech also explores the role of credibility and social proof in persuasive communication. People are more likely to be persuaded by individuals they perceive as credible and by evidence that demonstrates consensus among others. By establishing credibility, providing expert opinions, and leveraging social proof, persuasive communicators can enhance their persuasive impact.

Understanding the psychology of speech also sheds light on the importance of audience analysis in persuasive communication. Persuasive messages need to be tailored to the values, needs, and beliefs of the target audience. By conducting thorough audience research and segmentation, communicators can customize their messages to resonate with specific groups and increase their persuasive influence.

The psychology of speech also acknowledges the role of emotion in persuasive communication. Emotions can evoke strong responses and motivate individuals to take action. Persuasive communicators strategically evoke emotions, such as fear, joy, or empathy, to influence attitudes and behaviors. By understanding the emotional triggers of the target audience, communicators can effectively appeal to their emotions and enhance persuasive outcomes.

Additionally, the psychology of speech recognizes the impact of language and rhetoric in persuasive communication. The choice of words, persuasive techniques, and rhetorical devices can significantly influence how messages are received and interpreted. By mastering rhetorical strategies, such as repetition, rhetorical questions, and appeals to logic or emotions, communicators can increase the persuasive power of their messages.

The Psychology of Speech in Effective Leadership Communication

Effective leadership communication is essential for inspiring and guiding teams, fostering collaboration, and achieving organizational goals. The psychology of speech provides valuable insights into the principles and strategies that contribute to effective leadership communication, enabling leaders to influence, motivate, and engage their followers.

One key aspect explored in the psychology of speech is the importance of clarity and conciseness in leadership communication. Leaders must convey their messages in a clear and straightforward manner to ensure understanding and minimize misinterpretation. By using concise language, avoiding jargon, and providing specific instructions, leaders can enhance their communication effectiveness.

Speech psychology also emphasizes the significance of active listening in effective leadership communication. Listening attentively to team members fosters trust, promotes open dialogue, and demonstrates respect. By practicing active listening, leaders can gain valuable insights, address concerns, and make team members feel heard and valued.

The psychology of speech also recognizes the importance of nonverbal communication in leadership communication. Leaders' body language, facial expressions, and gestures can influence how their messages are received and interpreted. By being aware of their nonverbal cues, leaders can align their verbal and nonverbal communication to enhance credibility, engagement, and connection with their team.

Understanding the psychology of speech also sheds light on the power of inspirational and motivational communication in leadership. Leaders who can inspire and motivate their team members create a sense of purpose, commitment, and enthusiasm. By using persuasive techniques, storytelling, and appeals to shared values, leaders can ignite passion and drive performance.

The psychology of speech also explores the impact of emotional intelligence in leadership communication. Leaders who can understand and manage their own emotions while empathizing with others create an atmosphere of trust and psychological safety. By demonstrating empathy, emotional awareness, and effective emotional expression, leaders can foster positive relationships and enhance team dynamics.

Furthermore, the psychology of speech recognizes the significance of adaptability in leadership communication. Leaders must adapt their communication style and approach based on the needs, preferences, and cultural backgrounds of their team members. By being flexible and accommodating, leaders can establish rapport, build stronger connections, and promote a positive and inclusive work environment.

The Psychology of Speech in Public Speaking and Presentation Skills

Public speaking and presentation skills are essential in various professional and personal settings, ranging from business presentations to educational seminars and social events. The psychology of speech provides valuable insights into the principles and techniques that contribute to effective public speaking and presentation skills, enabling individuals to engage, inform, and persuade their audience.

One crucial aspect explored in the psychology of speech is the significance of audience analysis in public speaking and presentations. Understanding the demographics, knowledge levels, and interests of the audience allows speakers to tailor their message to meet the audience's needs and capture their attention. By conducting thorough audience research and adapting their content and delivery style accordingly, speakers can enhance their impact.

Speech psychology also emphasizes the power of storytelling in public speaking and presentations. Stories have the ability to captivate audiences, evoke emotions, and make information more memorable. By incorporating relevant and engaging narratives into their speeches and presentations, speakers can create a deeper connection with their audience and increase their overall impact.

The psychology of speech also recognizes the importance of vocal delivery in public speaking. Tone, pitch, volume, and pace of speech can significantly influence how the audience perceives and engages with the message. By varying vocal delivery, using appropriate pauses, and emphasizing key points, speakers can effectively convey their ideas and maintain the audience's interest throughout the presentation.

Furthermore, the psychology of speech recognizes the importance of visual aids in supporting public speaking and presentations. Effective use of visual aids, such as slides, charts, and videos, can enhance audience understanding and retention of information. By using visually appealing and relevant visuals, speakers can reinforce their key points and engage the audience visually.

The psychology of speech also acknowledges the role of confidence and self-belief in public speaking. Confidence is contagious and can positively impact audience engagement and perception of the speaker. By practicing and preparing thoroughly, managing nervousness, and projecting self-assurance, speakers can deliver their message with conviction and authority.

The psychology of speech is a fascinating field that provides valuable insights into the intricacies of communication and its impact on various aspects of our lives. From understanding the psychology of speech in interpersonal relationships to persuasive communication, effective leadership, and public speaking, this discipline sheds light on the principles, strategies, and techniques that contribute to successful communication outcomes.

In the realm of interpersonal relationships, the psychology of speech reveals how effective communication, communication styles, language use, emotional expression, conflict resolution, and nonverbal communication influence relationship dynamics. By applying these principles, individuals can cultivate healthier, more satisfying relationships, fostering trust, understanding, and emotional connection.

When it comes to persuasive communication, the psychology of speech unravels the art of framing, storytelling, credibility, social proof, audience analysis, emotion, and rhetoric. By understanding these factors, communicators can tailor their messages to influence attitudes, behaviors, and decision-making, ultimately achieving their persuasive goals.

In the context of effective leadership, the psychology of speech highlights the importance of clarity, active listening, nonverbal communication, inspirational and motivational communication, emotional intelligence, and adaptability. Leaders who embody these qualities can effectively communicate, inspire, and engage their followers, driving organizational success.

Regarding public speaking and presentation skills, the psychology of speech emphasizes the significance of audience analysis, storytelling, vocal delivery, nonverbal communication, visual aids, and confidence. By mastering these elements, speakers can captivate audiences, convey their message with clarity, and leave a lasting impact.

In all these areas, the psychology of speech reveals that effective communication is not simply about the words spoken but also encompasses understanding the psychological nuances, considering the needs and preferences of the audience, and utilizing various techniques to engage, influence, and connect with others.

By studying the psychology of speech and applying its principles, individuals can enhance their communication skills, build stronger relationships, persuade effectively, lead with influence, and deliver impactful presentations. These insights enable us to navigate the complexities of human interaction, connect on a deeper level, and achieve our communication objectives.


Chapter 6: Indirect Learning and Human Potential

Speech and Language

Civilization began the first time an angry person cast a word instead of a rock.

Sigmund Freud

Observational learning has been evidenced in many species of animals including birds (Zentall, 2004), but approximations to speech appear practically unique to humans. Paul Revere famously ordered a lantern signal of “one if by land and two if by sea” during his Revolutionary War midnight ride through the streets of Massachusetts. This is not functionally different from the distinct alarm calls emitted by vervet monkeys in the presence of eagles, snakes, and leopards (Struhsaker, 1967; Seyfarth and Cheney, 1980). Through observational learning, young vervets learn to respond to different screeches for “heads up”, “heads down”, and “look around!” Vervets hide under trees in response to the eagle warning, rear on their hind paws in response to the snake warning, and climb the nearest tree in response to the leopard warning. More recently, even more descriptive “speech” has been demonstrated in prairie dogs (Slobodchikoff, Perla, & Verdolin, 2009). These examples are the closest we see to social learning of speech in other animals. Slobodchikoff (2012) has written a fun and informative review of animal communication entitled Chasing Doctor Dolittle: Learning the Language of Animals.

Meltzoff and Moore (1977, 1983) demonstrated unambiguous examples of imitation in human infants as young as 12 to 21 days of age, leading to the conclusion that humans normally do not need to be taught this mode of learning.

Watch the following video of Dr. Meltzoff describing his research demonstrating imitation in young infants:

Skinner (1986) contributed an interesting, though admittedly speculative and post-hoc, theoretical article describing possible evolutionary scenarios for the adaptive learning of imitation and speaking. An imitative prompt is more informative than an ordinary gestural prompt in that it displays the precise characteristics of the desired response. Speech is preferable to signing as a means of communication since it works at long distances and in other circumstances where individuals cannot see each other.

Hockett’s Features of Language

If we are to understand human behavior, we must understand how language is acquired and its impact upon subsequent adaptive learning. Before we proceed, we must consider what we mean by language. Charles Hockett (1960) listed 13 features that he considered essential to language:

  • Vocal-auditory channel – We saw in Chapter 1 that the human brain, with its disproportionate amount of space dedicated to the tongue and larynx (the voice box), facilitates the acquisition of speech. Sign language, involving a manual-visual channel, is mostly restricted to deaf people and those wishing to communicate with them.
  • Broadcast transmission and directional reception – Sound is sent out in all directions while being received in a single place. This provides an adaptive advantage in that people can communicate with others out of their line of sight.
  • Rapid fading (transitoriness) – Sounds are temporary. Writing and audio-recordings are techniques used to address this limitation of speech (and alas, lectures).
  • Interchangeability – One must be able to transmit and receive messages.
  • Total feedback – One must be able to monitor one’s own use of language.
  • Specialization – The organs used for language must be specially adapted to that task. Human lips, tongues and throats meet this criterion.
  • Semanticity – Specific signals can be matched with specific meanings. Different sounds exist for different words.
  • Arbitrariness – There is no necessary connection between a meaningful unit (e.g., word) and its reference.
  • Discreteness – There are distinct basic units of sound (phonemes) and meaning (morphemes).
  • Displacement – One must be able to communicate about things that are not present. One must be able to symbolically represent the past and the future.
  • Productivity – The units of sound and meaning must be able to be combined to create new sounds and meaningful units (sentences).
  • Duality of patterning – The sequence of meaningful units must matter (i.e., there must be a syntax).
  • Traditional Transmission – Specific sounds and words must be learned from other language users.

Although all of Hockett’s features are frequently cited as essential characteristics of language, the first three elements are restricted to speech. These features do not apply to sign language, letter writing, reading, and other examples of non-vocal/auditory modes of symbolic communication. The essential characteristics are interchangeability, semanticity, arbitrariness, discreteness, productivity, syntax, and displacement.
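
To make the checklist character of this definition concrete, the seven essential characteristics can be treated as a set of required features against which a candidate communication system is tested. The sketch below is only an illustration; the feature sets assigned to each system are simplified assumptions of mine, not claims made by Hockett or this chapter:

```python
# A toy checklist based on the seven essential characteristics listed above.
# The feature sets assigned to each system are simplified assumptions.
ESSENTIAL = {
    "interchangeability", "semanticity", "arbitrariness",
    "discreteness", "productivity", "syntax", "displacement",
}

systems = {
    "human speech": ESSENTIAL | {"vocal-auditory channel",
                                 "broadcast transmission", "rapid fading"},
    "vervet alarm calls": {"vocal-auditory channel", "broadcast transmission",
                           "rapid fading", "semanticity", "arbitrariness",
                           "interchangeability"},
}

for name, features in systems.items():
    missing = ESSENTIAL - features
    if missing:
        print(f"{name}: not a full language (missing: {', '.join(sorted(missing))})")
    else:
        print(f"{name}: has all essential characteristics of language")
```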

Describe Hockett’s major characteristics of language.

Language Acquisition

The principles of predictive and control learning help us understand the acquisition of language and the role it plays in subsequent human adaptation. At a few months old, infants start to babble and are able to make all the possible human sounds. Eventually, as the child is increasingly exposed to the sounds of her/his social unit, some of the sounds are “selected” and others removed from the repertoire. Routh (1969) demonstrated that infants are able to make subtle discriminations in sounds. The frequency of speaking either vowels or consonants could be increased if selectively reinforced with tickles and “coos.” It has been demonstrated that the mother’s vocal imitation of a child’s verbalizations is also an effective reinforcer (Pelaez, Virues-Ortega, and Gewirtz, 2011).

Children may learn their first word as early as 9 months. Usually the first words are names of important people (“mama”, “dada”), often followed by greetings (“hi”, “bye”) and favored foods. As described in Chapter 5, classical conditioning procedures may be used to establish word meaning. For example, the sound “papa” is consistently paired with a particular person. Children are encouraged to imitate the sound in the presence of the father. It may be the source of humor (or embarrassment) when a child over-generalizes and uses the word for another male adult. With experience, children learn to attend to the relevant dimensions and apply words consistently and exclusively to the appropriate stimuli or actions (e.g., “walk”, “run”, “eat”, etc.). Similarly, words are paired with the qualities of objects (e.g., “red”, “circle”, etc.) and actions (e.g., “fast”, “loud”, etc.). Children learn to abstract out the common properties through the process of concept formation. Words are also paired with quantities of objects. In the same way that “redness” may be a quality of diverse stimuli having little else in common, “three-ness” applies to a particular number of diverse stimuli.

Much of our vocabulary applies to non-observable objects or events. It is important to teach a child to indicate when “hurt” or “sick”, or “happy” or “sad.” In these instances, an adult must infer the child’s feelings from his/her behavior and surrounding circumstances. For example, if you see a child crying after bumping her head, you might ask if it hurts. As vocabulary size increases, meaning can be established through higher-order conditioning using only words. For example, if a child is taught that a jellyfish is a “yucky creature that lives in the sea and stings”, he/she will probably become fearful when swimming in the ocean.

Since different languages have different word orders for the parts of speech, syntax (i.e., grammatical order) must be learned. At about 18 months to 2 years of age, children usually start to combine words and by 2-1/2 they are forming brief (not always grammatical) sentences. With repeated examples of their native language, children are able to abstract out schemas (i.e., an organized set of rules) for forming grammatical sentences (e.g., “the car is blue”, “the square is big”, etc.). It is much easier to learn grammatical sequences of nonsense words (e.g., The maff vlems oothly um the glox nerfs) than non-grammatical sequences (e.g., maff vlem ooth um glox nerf). This indicates the role of schema learning in the acquisition of syntax (Osgood, 1957, p.88). Children usually acquire the intricacies of grammar by about 6 years of age. In the next chapter, we will describe the process of abstraction as it applies to concept learning, schema development, and problem-solving.

Vocabulary size has been found to be an important predictor of success in school (Anderson & Freebody, 1981). Major factors influencing vocabulary size include socio-economic status (SES) and the language proficiencies of significant others, particularly the mother. In a monumental project, Hart and Risley (1995) recorded the number of words spoken at home by parents and 7- to 36-month-old children in 42 families over a 3-year period. They found that differences in the children’s IQ scores, language abilities, and success in school were all related to how much their parents spoke to them. They also found significant differences in the manner in which low and high SES parents spoke to their children. Low SES parents were more likely to make demands and offer reprimands while high SES parents were more likely to engage in extended conversations, discussion, and problem-solving. Whereas the number of reprimands given for inappropriate behavior was about the same for low and high SES parents, high SES parents administered much more praise.

Speech becomes an important and efficient way of communicating one’s thoughts, wishes, and feelings. This is true for the Nukak as well as for us. Given the harshness of their living conditions and the limits of their experiences, the Nukak have much in common with low SES children within our society. Declarative statements (e.g., “the stick is sharp”, “the stove is hot”), requests (“pick up the leaves”, “don’t fight with your sister”), and expressions of feeling (“I am happy”, “you are tired”) become the primary basis for conducting much of the everyday chores and interactions.

Describe how control learning principles apply to the acquisition of language.

Spoken language is observed in stone-age hunter/gatherer and technologically advanced cultures. There has been controversy concerning the role of nature and nurture in human language development (Chomsky, 1959; Skinner, 1957). Skinner, writing from a functionalist/behavioral perspective, tellingly entitled his book Verbal Behavior, not “Using Language.” Watson (1930) described thinking as “covert speech” while Skinner (1953) referred to “private behavior.” According to Vygotsky (in work originally published in 1934), children initially “think out loud” and eventually learn to “think to themselves.” Skinner suggested that speaking and thinking were not different in kind from other forms of behavior and that respondent conditioning (predictive learning) and operant conditioning (control learning) could provide the necessary experiential explanatory principles. There was no need to propose a separate “language acquisition device” to account for human speech.

We saw in Chapter 5 how predictive learning principles could be applied to the acquisition of word meaning. Basically, Skinner argued that words could serve as overt and covert substitutes for the control learning ABCs. As antecedents, words can function as discriminative stimuli and warning stimuli (e.g., “Give mommy a kiss” or “Heads up!”). As consequences, words can substitute for reinforcers and punishers (e.g., “Thank you”, “Stop that!”). A rule is a common, useful, and important type of verbal statement that includes each of the control learning ABCs (Hayes, 1989). That is, a rule specifies the circumstances (antecedents) under which a particular act (behavior) is rewarded or punished (consequence). For example, a parent might instruct a child, “At dinner, if you eat your vegetables you can have your dessert” or, “When you get to the curb, look both ways before crossing the street or you could get hit by a car.”
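
Because every rule shares this fixed antecedent-behavior-consequence anatomy, it can be written down as a simple three-field record. The sketch below is a minimal illustration of that layout; the class and field names are hypothetical choices of mine, not anything from the chapter or from Hayes (1989):

```python
from dataclasses import dataclass

# Minimal sketch: a rule as an antecedent-behavior-consequence (ABC) triple.
@dataclass
class Rule:
    antecedent: str   # the circumstances under which the rule applies
    behavior: str     # the act the rule prescribes
    consequence: str  # the stated reward for complying, or punisher for not

rules = [
    Rule("at dinner", "eat your vegetables", "you can have your dessert"),
    Rule("when you get to the curb", "look both ways before crossing",
         "you could get hit by a car"),
]

for r in rules:
    print(f"Antecedent: {r.antecedent} | Behavior: {r.behavior} | "
          f"Consequence: {r.consequence}")
```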

Chomsky, a linguist, submitted a scathing critique of Skinner’s book, emphasizing how human genetics appears to include a “language acquisition device.” The Chapter 1 picture of the human homunculus, with its disproportionate brain space dedicated to the body parts involved in speech, certainly suggests that the human being’s structure facilitates language acquisition. The homunculus also implies there is adaptive value to spoken language; otherwise these structures would not have evolved. Proposing a “language acquisition device”, similar to proposing an instinct to account for speech, is a circular pseudo-explanation. The language acquisition device is inferred from the observation of speech; it does not explain speech. Remember, a psychological explanation must specify specific hereditary and/or environmental causes. Chomsky does neither, whereas Skinner is quite specific about the types of experience that will foster different types of verbal behavior. It is not as though Skinner denies the role of human structure in the acquisition of speech or its importance, as indicated in the following quote: “The human species took a crucial step forward when its vocal musculature came under operant control in the production of speech sounds. Indeed, it is possible that all the distinctive achievements of the species can be traced to that one genetic change” (Skinner, 1986). Neuroscientists and behavioral neuroscientists are actively engaged in research examining how our “all-purpose acquisition device” (i.e., the brain) is involved in the learning of speech, reading, quantitative skills, problem-solving, etc.

Human beings may have started out under restricted geographic and climatic conditions in Africa, but we have spread all over the globe (Diamond, 2005). We developed different words and languages tailored to our environmental and social circumstances. There is much to be learned from the school of hard knocks, but it is limited to our direct experience and can be difficult or dangerous. Our verbal lives enormously expand learning opportunities beyond our immediate environment to anything that can be imagined. Indirect learning (i.e., observation or language) often speeds up adaptive learning and eliminates danger. It is not surprising that human parents universally dedicate a great deal of effort to teaching their children to speak. It makes life easier, safer, and better for them as well as their children.

MacCorquodale (1969) wrote a retrospective appreciation of Skinner’s book, along with a comprehensive and well-reasoned response (1970) to Chomsky’s critique. Essentially, MacCorquodale described Chomsky as a structuralist and Skinner as a functionalist. That is, Chomsky attempted to describe how the structure of the mind enables language, while Skinner was concerned with how language enables individuals to adapt to their environmental conditions. Paraphrasing Mark Twain, an article marking the 50th anniversary of the book’s publication concluded that “Reports of the death of Verbal Behavior and behaviorism have been greatly exaggerated” (Schlinger, 2008).

Reading and Writing

It is language in written form that has enabled the rapid and widespread dissemination of knowledge within and between cultures. It is also the medium for recording our evolving advances in knowledge and technology. Early forms of Bronze Age writing were based on symbols or pictures etched in clay. Later Bronze Age writing started to include phonemic symbols that were precursors to the Iron Age Phoenician alphabet consisting of 22 characters representing consonants (but no vowels). The Phoenician alphabet was adopted by the Greeks and evolved into the modern Roman alphabet. The phonetic alphabet permitted written representation of any pronounceable word in a language.

The Arabic numbering system was originally invented in India before being transmitted to Europe in the Middle Ages. It permits written representation of any quantity, real or imagined, and is fundamental to mathematics and the scientific method, which rely on quantification and measurement. The alphabet and Arabic numbers permit words to become “permanent” in comparison to their transitory auditory form. This written permanence made it possible to communicate with more people over greater distances and eventually to build libraries. The first great library was established at Alexandria, Egypt, in approximately 300 B.C. Scrolls of parchment and papyrus were stored on the walled shelves of a huge concrete building (Figure 6.5). Gutenberg’s invention of the printing press in 1439 enabled mass publication of written material throughout Western Europe (Figure 6.6). Today, e-books are available on electronic readers that can be held in the palm of your hand (Figure 6.7)! It should not be surprising that college student differences in knowledge correlate with their amount of exposure to print (Stanovich and Cunningham, 1993).

Figure 6.5  The library at Alexandria.

Figure 6.6  Gutenberg’s printing press.

Figure 6.7  The library now.

Attributions

Figure 6.5 “The library at Alexandria” by Wikimedia is licensed under CC BY-SA 4.0

Figure 6.6 “Gutenberg’s printing press” by עדירל is licensed under CC BY-SA 3.0

Figure 6.7 “Amazon Kindle” by Jleon is licensed under CC BY-SA 3.0

Key terms

  • Essential features of language – interchangeability (the ability to transmit and receive messages); semanticity (specific signals have specific meanings); arbitrariness (no necessary connection between a meaningful unit, e.g., a word, and its reference); discreteness (distinct basic units of sound, phonemes, and meaning, morphemes); productivity (units of sound and meaning can be combined to create new sounds and meaningful units); syntax (the sequence of meaningful units must matter); displacement (the ability to communicate about things that are not present, in the past and future).
  • Rule – specifies the circumstances (antecedents) under which a particular act (behavior) is rewarded or punished (consequence).
  • Phonetic alphabet – permits written representation of any pronounceable word in a language.
  • Arabic numbering system – permits written representation of any quantity, real or imagined; fundamental to mathematics and the scientific method.

Psychology Copyright © by Jeffrey C. Levy is licensed under a Creative Commons Attribution 4.0 International License, except where otherwise noted.


Language Acquisition Theory

By Henna Lemetyinen, PhD (postdoctoral research associate, Greater Manchester Mental Health NHS Foundation Trust); reviewed by Saul Mcleod, PhD, and Olivia Guy-Evans, MSc.

Language is a cognitive capacity that truly makes us human. Whereas other species do communicate with an innate ability to produce a limited number of meaningful vocalizations (e.g., bonobos) or even with partially learned systems (e.g., bird songs), there is no other species known to date that can express infinite ideas (sentences) with a limited set of symbols (speech sounds and words).

This ability is remarkable in itself. What makes it even more remarkable is that researchers are finding evidence for mastery of this complex skill in increasingly younger children.


Infants as young as 12 months are reported to have sensitivity to the grammar needed to understand causative sentences (who did what to whom; e.g., the bunny pushed the frog) (Rowland & Noble, 2010).

After more than 60 years of research into child language development, the mechanism that enables children to segment syllables and words out of the strings of sounds they hear and to acquire grammar to understand and produce language is still quite an enigma.

Behaviorist Theory of Language Acquisition

One of the earliest scientific explanations of language acquisition was provided by Skinner (1957). As one of the pioneers of behaviorism , he accounted for language development using environmental influence, through imitation, reinforcement, and conditioning.

In this view, children learn words and grammar primarily by mimicking the speech they hear and receiving positive feedback for correct usage.

Skinner argued that children learn language based on behaviorist reinforcement principles by associating words with meanings. Correct utterances are positively reinforced when the child realizes the communicative value of words and phrases.

For example, when the child says ‘milk’ and the mother smiles and gives her some, the child finds this outcome rewarding, which enhances the child’s language development (Ambridge & Lieven, 2011).

Over time, through repetition and reinforcement, they refine their linguistic abilities. Critics argue this theory doesn’t fully explain the rapid pace of language acquisition nor the creation of novel sentences.
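To make the reinforcement account concrete, here is a toy Python simulation. It is not a model from Skinner or any cited author; every word, situation, and the learning increment is an invented assumption. It simply shows how utterances that happen to be reinforced become more probable:

```python
import random

# Toy illustration of Skinnerian word learning (all words, situations and
# the reinforcement increment are invented for this sketch): utterances
# that get reinforced become more likely to be emitted again.
situations = ["thirsty", "hungry"]
words = ["milk", "bread"]
rewarded = {"thirsty": "milk", "hungry": "bread"}  # what the caregiver reinforces

# Association strengths start flat; reinforcement strengthens them.
strength = {(s, w): 1.0 for s in situations for w in words}

def emit(situation):
    """Pick a word with probability proportional to its current strength."""
    weights = [strength[(situation, w)] for w in words]
    return random.choices(words, weights=weights)[0]

for _ in range(2000):
    s = random.choice(situations)
    w = emit(s)
    if w == rewarded[s]:          # the utterance "works" and is reinforced
        strength[(s, w)] += 0.1   # arbitrary increment (assumption)

for s in situations:
    print(s, {w: round(strength[(s, w)], 1) for w in words})
```

After a few thousand trials the reinforced word dominates each situation, which is the core of the behaviorist claim; the critics' point, of course, is that nothing in such a loop produces novel sentences.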

Chomsky Theory of Language Development

However, Skinner’s account was soon heavily criticized by Noam Chomsky, the world’s most famous linguist to date.

In the spirit of the cognitive revolution in the 1950s, Chomsky argued that children would never acquire the tools needed for processing an infinite number of sentences if the language acquisition mechanism was dependent on language input alone.

Noam Chomsky introduced the nativist theory of language development, emphasizing the role of innate structures and mechanisms in the human brain. Key points of Chomsky’s theory include:

Language Acquisition Device (LAD): Chomsky proposed that humans have an inborn biological capacity for language, often termed the LAD, which predisposes them to acquire language.

Universal Grammar: He suggested that all human languages share a deep structure rooted in a set of grammatical rules and categories. This “universal grammar” is understood intuitively by all humans.

Poverty of the Stimulus: Chomsky argued that the linguistic input received by young children is often insufficient (or “impoverished”) for them to learn the complexities of their native language solely through imitation or reinforcement. Yet, children rapidly and consistently master their native language, pointing to inherent cognitive structures.

Critical Period: Chomsky, along with other linguists, posited a critical period for language acquisition, during which the brain is particularly receptive to linguistic input, making language learning more efficient.

Critics of Chomsky’s theory argue that it’s too innatist and doesn’t give enough weight to social interaction and other factors in language acquisition.

Universal Grammar

Consequently, he proposed the theory of Universal Grammar: an idea of innate, biological grammatical categories, such as a noun category and a verb category, that facilitate the entire language development in children and overall language processing in adults.

Universal Grammar contains all the grammatical information needed to combine these categories, e.g., nouns and verbs, into phrases. The child’s task is just to learn the words of her language (Ambridge & Lieven, 2011).

For example, according to the Universal Grammar account, children instinctively know how to combine a noun (e.g., a boy) and a verb (to eat) into a meaningful, correct phrase (A boy eats).

This Chomskian (1965) approach to language acquisition has inspired hundreds of scholars to investigate the nature of these assumed grammatical categories, and the research is still ongoing.

Contemporary Research

A decade or two later, some psycho-linguists began to question the existence of Universal Grammar. They argued that categories like nouns and verbs are biologically, evolutionarily, and psychologically implausible and that the field called for an account that can explain the acquisition process without innate categories.

Researchers started to suggest that instead of having a language-specific mechanism for language processing, children might utilize general cognitive and learning principles.

Whereas researchers approaching the language acquisition problem from the perspective of Universal Grammar argue for early full productivity, i.e., early adult-like knowledge of the language, the opposing constructivist investigators argue for a more gradual developmental process. It is suggested that children are sensitive to patterns in language which enables the acquisition process.

An example of this gradual pattern learning is morphology acquisition. Morphemes are the smallest grammatical markers, or units, in language that alter words. In English, regular plurals are marked with an –s morpheme (e.g., dog+s).

Similarly, English third singular verb forms (she eat+s, a boy kick+s) are marked with the –s morpheme. Children are considered to acquire their first instances of third singular forms as entire phrasal chunks (Daddy kicks, a girl eats, a dog barks) without the ability to tease the finest grammatical components apart.

When the child hears a sufficient number of instances of a linguistic construction (i.e., the third singular verb form), she will detect patterns across the utterances she has heard. In this case, the repeated pattern is the –s marker in this particular verb form.

As a result of many repetitions and examples of the –s marker in different verbs, the child will acquire sophisticated knowledge that, in English, verbs must be marked with an –s morpheme in the third singular form (Ambridge & Lieven, 2011; Pine, Conti-Ramsden, Joseph, Lieven & Serratrice, 2008; Theakson & Lieven, 2005).
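A rough sketch of what such distributional pattern extraction could look like computationally is below. The mini-corpus and the "last word is the verb" frame are invented simplifications for illustration, not a model from the cited literature:

```python
from collections import Counter

# Crude sketch of distributional pattern extraction: given phrasal chunks
# the child has heard, tally recurring endings on the final (verb) slot.
# The mini-corpus and the "last word is the verb" frame are invented
# simplifications, not a model from the cited literature.
chunks = ["daddy kicks", "a girl eats", "a dog barks",
          "mummy sings", "the boy jumps"]

endings = Counter(chunk.split()[-1][-1] for chunk in chunks)
print(endings.most_common(1))   # [('s', 5)]: the -s marker recurs reliably
```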

Approaching language acquisition from the perspective of general cognitive processing offers an economical account of how children can learn their first language without a dedicated biolinguistic mechanism.

However, the search for a definitive account of language acquisition is far from over. Our current understanding of the developmental process remains immature.

Investigators of Universal Grammar are still trying to convince the field that language is too demanding a task to acquire without specific innate equipment, whereas constructivist researchers argue fiercely for the importance of linguistic input.

The biggest questions, however, are yet unanswered. What is the exact process that transforms the child’s utterances into grammatically correct, adult-like speech? How much does the child need to be exposed to language to achieve the adult-like state?

What account can explain variation between languages and the language acquisition process in children acquiring languages very different from English? The mystery of language acquisition is certain to keep psychologists and linguists alike occupied decade after decade.

What is language acquisition?

Language acquisition refers to the process by which individuals learn and develop their native or second language.

It involves the acquisition of grammar, vocabulary, and communication skills through exposure, interaction, and cognitive development. This process typically occurs in childhood but can continue throughout life.

What is Skinner’s theory of language development?

Skinner’s theory of language development, also known as behaviorist theory, suggests that language is acquired through operant conditioning. According to Skinner, children learn language by imitating and being reinforced for correct responses.

He argued that language is a result of external stimuli and reinforcement, emphasizing the role of the environment in shaping linguistic behavior.

What is Chomsky’s theory of language acquisition?

Chomsky’s theory of language acquisition, known as Universal Grammar, posits that language is an innate capacity of humans.

According to Chomsky, children are born with a language acquisition device (LAD), a biological ability that enables them to acquire language rules and structures effortlessly.

He argues that there are universal grammar principles that guide language development across cultures and languages, suggesting that language acquisition is driven by innate linguistic knowledge rather than solely by environmental factors.

Ambridge, B., & Lieven, E. V. M. (2011). Language acquisition: Contrasting theoretical approaches. Cambridge: Cambridge University Press.

Chomsky, N. (1965). Aspects of the theory of syntax. Cambridge, MA: MIT Press.

Pine, J. M., Conti-Ramsden, G., Joseph, K. L., Lieven, E. V. M., & Serratrice, L. (2008). Tense over time: Testing the Agreement/Tense Omission Model as an account of the pattern of tense-marking provision in early child English. Journal of Child Language, 35(1), 55–75.

Rowland, C. F., & Noble, C. L. (2010). The role of syntactic structure in children’s sentence comprehension: Evidence from the dative. Language Learning and Development, 7(1), 55–75.

Skinner, B. F. (1957). Verbal behavior. Acton, MA: Copley Publishing Group.

Theakston, A. L., & Lieven, E. V. M. (2005). The acquisition of auxiliaries BE and HAVE: An elicitation study. Journal of Child Language, 32(2), 587–616.

Further Reading

An excellent article by Steven Pinker on Language Acquisition

Pinker, S. (1995). The Language Instinct: The New Science of Language and Mind. Penguin.

Tomasello, M. (2005). Constructing A Language: A Usage-Based Theory of Language Acquisition . Harvard University Press.



Open access | Published: 13 May 2024

Representation of internal speech by single neurons in human supramarginal gyrus

Sarah K. Wandelt, David A. Bjånes, Kelsie Pejsa, Brian Lee, Charles Liu & Richard A. Andersen

Nature Human Behaviour (2024)


Subjects: Brain–machine interface, Neural decoding

Speech brain–machine interfaces (BMIs) translate brain signals into words or audio outputs, enabling communication for people who have lost their speech abilities due to disease or injury. While important advances in vocalized, attempted and mimed speech decoding have been achieved, results for internal speech decoding are sparse and have yet to achieve high functionality. Notably, it is still unclear from which brain areas internal speech can be decoded. Here two participants with tetraplegia with implanted microelectrode arrays located in the supramarginal gyrus (SMG) and primary somatosensory cortex (S1) performed internal and vocalized speech of six words and two pseudowords. In both participants, we found significant neural representation of internal and vocalized speech, at the single-neuron and population level in the SMG. From recorded population activity in the SMG, the internally spoken and vocalized words were significantly decodable. In an offline analysis, we achieved average decoding accuracies of 55% and 24% for each participant, respectively (chance level 12.5%), and during an online internal speech BMI task, we averaged 79% and 23% accuracy, respectively. Evidence of shared neural representations between internal speech, word reading and vocalized speech processes was found in participant 1. SMG represented words as well as pseudowords, providing evidence for phonetic encoding. Furthermore, our decoder achieved high classification with multiple internal speech strategies (auditory imagination/visual imagination). Activity in S1 was modulated by vocalized but not internal speech in both participants, suggesting that no articulator movements of the vocal tract occurred during internal speech production. This work represents a proof-of-concept for a high-performance internal speech BMI.


Speech is one of the most basic forms of human communication, a natural and intuitive way for humans to express their thoughts and desires. Neurological diseases like amyotrophic lateral sclerosis (ALS) and brain lesions can lead to the loss of this ability. In the most severe cases, patients who experience full-body paralysis might be left without any means of communication. Patients with ALS self-report loss of speech as their most serious concern 1 . Brain–machine interfaces (BMIs) are devices offering a promising technological path to bypass neurological impairment by recording neural activity directly from the cortex. Cognitive BMIs have demonstrated potential to restore independence to participants with tetraplegia by reading out movement intent directly from the brain 2 , 3 , 4 , 5 . Similarly, reading out internal (also reported as inner, imagined or covert) speech signals could allow the restoration of communication to people who have lost it.

Decoding speech signals directly from the brain presents its own unique challenges. While non-invasive recording methods such as functional magnetic resonance imaging (fMRI), electroencephalography (EEG) or magnetoencephalography 6 are important tools to locate speech and internal speech production, they lack the necessary temporal and spatial resolution, adequate signal-to-noise ratio or portability for building an online speech BMI 7 , 8 , 9 . For example, state-of-the-art EEG-based imagined speech decoding performances in 2022 ranged from approximately 60% to 80% binary classification 10 . Intracortical electrophysiological recordings have higher signal-to-noise ratios and excellent temporal resolution 11 and are a more suitable choice for an internal speech decoding device.

Invasive speech decoding has predominantly been attempted with electrocorticography (ECoG) 9 or stereo-electroencephalographic depth arrays 12 , as they allow sampling neural activity from different parts of the brain simultaneously. Impressive results in vocalized and attempted speech decoding and reconstruction have been achieved using these techniques 13 , 14 , 15 , 16 , 17 , 18 . However, vocalized speech has also been decoded from localized regions of the cortex. In 2009, the use of a neurotrophic electrode 19 demonstrated real-time speech synthesis from the motor cortex. More recently, speech neuroprosthetics were built from small-scale microelectrode arrays located in the motor cortex 20 , 21 , premotor cortex 22 and supramarginal gyrus (SMG) 23 , demonstrating that vocalized speech BMIs can be built using neural signals from localized regions of cortex.

While important advances in vocalized speech 16 , attempted speech 18 and mimed speech 17 , 22 , 24 , 25 , 26 decoding have been made, highly accurate internal speech decoding has not been achieved. Lack of behavioural output, lower signal-to-noise ratio and differences in cortical activations compared with vocalized speech are speculated to contribute to lower classification accuracies of internal speech 7 , 8 , 13 , 27 , 28 . In ref. 29 , patients implanted with ECoG grids over frontal, parietal and temporal regions silently read or vocalized written words from a screen. They significantly decoded vowels (37.5%) and consonants (36.3%) from internal speech (chance level 25%). Ikeda et al. 30 decoded three internally spoken vowels using ECoG arrays using frequencies in the beta band, with up to 55.6% accuracy from the Broca area (chance level 33%). Using the same recording technology, ref. 31 investigated the decoding of six words during internal speech. The authors demonstrated an average pair-wise classification accuracy of 58%, reaching 88% for the highest pair (chance level 50%). These studies were so-called open-loop experiments, in which the data were analysed offline after acquisition. A recent paper demonstrated real-time (closed-loop) speech decoding using stereotactic depth electrodes 32 . The results were encouraging as internal speech could be detected; however, the reconstructed audio was not discernable and required audible speech to train the decoding model.

While, to our knowledge, internal speech has not previously been decoded from SMG, evidence for internal speech representation in the SMG exists. A review of 100 fMRI studies 33 not only described SMG activity during speech production but also suggested its involvement in subvocal speech 34 , 35 . Similarly, an ECoG study identified high-frequency SMG modulation during vocalized and internal speech 36 . Additionally, fMRI studies have demonstrated SMG involvement in phonologic processing, for instance, during tasks while participants reported whether two words rhyme 37 . Performing such tasks requires the participant to internally ‘hear’ the word, indicating potential internal speech representation 38 . Furthermore, a study performed in people suffering from aphasia found that lesions in the SMG and its adjacent white matter affected inner speech rhyming tasks 39 . Recently, ref. 16 showed that electrode grids over SMG contributed to vocalized speech decoding. Finally, vocalized grasps and colour words were decodable from SMG from one of the same participants involved in this work 23 . These studies provide evidence for the possibility of an internal speech decoder from neural activity in the SMG.

The relationship between inner speech and vocalized speech is still debated. The general consensus posits similarities between internal and vocalized speech processes 36 , but the degree of overlap is not well understood 8 , 35 , 40 , 41 , 42 . Characterizing similarities between vocalized and internal speech could provide evidence that results found with vocalized speech could translate to internal speech. However, such a relationship may not be guaranteed. For instance, some brain areas involved in vocalized speech might be poor candidates for internal speech decoding.

In this Article, two participants with tetraplegia performed internal and vocalized speech of eight words while neurophysiological responses were captured from two implant sites. To investigate neural semantic and phonetic representation, the words were composed of six lexical words and two pseudowords (words that mimic real words without semantic meaning). We examined representations of various language processes at the single-neuron level using recording microelectrode arrays from the SMG located in the posterior parietal cortex (PPC) and the arm and/or hand regions of the primary somatosensory cortex (S1). S1 served as a control for movement, due to emerging evidence of its activation beyond defined regions of interest 43 , 44 . Words were presented with an auditory or a written cue and were produced internally as well as orally. We hypothesized that SMG and S1 activity would modulate during vocalized speech and that SMG activity would modulate during internal speech. Shared representation between internal speech, vocalized speech, auditory comprehension and word reading processes was investigated.

Task design

We characterized neural representations of four different language processes within a population of SMG and S1 neurons: auditory comprehension, word reading, internal speech and vocalized speech production. In this manuscript, internal speech refers to engaging a prompted word internally (‘inner monologue’), without correlated motor output, while vocalized speech refers to audibly vocalizing a prompted word. Participants were implanted in the SMG and S1 on the basis of grasp localization fMRI tasks (Fig. 1 ).

Figure 1.

a , b , SMG implant locations in participant 1 (1 × 96 multielectrode array) ( a ) and participant 2 (1 × 64 multielectrode array) ( b ). c , d , S1 implant locations in participant 1 (2 × 96 multielectrode arrays) ( c ) and participant 2 (2 × 64 multielectrode arrays) ( d ).

The task contained six phases: an inter-trial interval (ITI), a cue phase (cue), a first delay (D1), an internal speech phase (internal), a second delay (D2) and a vocalized speech phase (speech). Words were cued with either an auditory or a written version of the word (Fig. 2a ). Six of the words were informed by ref. 31 (battlefield, cowboy, python, spoon, swimming and telephone). Two pseudowords (nifzig and bindip) were added to explore phonetic representation in the SMG. The first participant completed ten session days, composed of both the auditory and the written cue tasks. The second participant completed nine sessions, focusing only on the written cue task. The participants were instructed to internally say the cued word during the internal speech phase and to vocalize the same word during the speech phase.

Figure 2.

a , Written words and sounds were used to cue six words and two pseudowords in a participant with tetraplegia. The ‘audio cue’ task was composed of an ITI, a cue phase during which the sound of one of the words was emitted from a speaker (between 842 and 1,130 ms), a first delay (D1), an internal speech phase, a second delay (D2) and a vocalized speech phase. The ‘written cue’ task was identical to the ‘audio cue’ task, except that written words appeared on the screen for 1.5 s. Eight repetitions of eight words were performed per session day and per task for the first participant. For the second participant, 16 repetitions of eight words were performed for the written cue task. b – e , Example smoothed firing rates of neurons tuned to four words in the SMG for participant 1 (auditory cue, python ( b ), and written cue, telephone ( c )) and participant 2 (written cue, nifzig ( d ), and written cue, spoon ( e )). Top: the average firing rate over 8 or 16 trials (solid line, mean; shaded area, 95% bootstrapped confidence interval). Bottom: one example trial with associated audio amplitude (grey). Vertically dashed lines indicate the beginning of each phase. Single neurons modulate firing rate during internal speech in the SMG.

For each of the four language processes, we observed selective modulation of individual neurons’ firing rates (Fig. 2b–e ). In general, the firing rates of neurons increased during the active phases (cue, internal and speech) and decreased during the rest phases (ITI, D1 and D2). A variety of activation patterns were present in the neural population. Example neurons were selected to demonstrate increases in firing rates during internal speech, cue and vocalized speech. Both the auditory (Fig. 2b ) and the written cue (Fig. 2c–e ) evoked highly modulated firing rates of individual neurons during internal speech.

These stereotypical activation patterns were evident at the single-trial level (Fig. 2b–e , bottom). When the auditory recording was overlaid with firing rates from a single trial, a heterogeneous neural response was observed (Supplementary Fig. 1a ), with some SMG neurons preceding or lagging peak auditory levels during vocalized speech. In contrast, neural activity from primary sensory cortex (S1) only modulated during vocalized speech and produced similar firing patterns regardless of the vocalized word (Supplementary Fig. 1b ).

Population activity represented selective tuning for individual words

Population analysis in the SMG mirrored single-neuron patterns of activation, showing increases in tuning during the active task phases (Fig. 3a,d ). Tuning of a neuron to a word was determined by fitting a linear regression model to the firing rate in 50-ms time bins ( Methods ). Distinctions between participant 1 and participant 2 were observed. Specifically, participant 1 exhibited strong tuning, whereas the number of tuned units was notably lower in participant 2. Based on these findings, we ran only the written cue task with participant 2. In participant 1, representation of the auditory cue was lower compared with the written cue (Fig. 3b , cue). However, this difference was not observed for other task phases. In both participants, the tuned population activity in S1 increased during vocalized speech but not during the cue and internal speech phases (Supplementary Fig. 3a,b ).
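A hedged sketch of what this per-bin tuning analysis could look like follows. The paper's exact regressors and significance criterion are described in its Methods; the synthetic data, sizes, and R² cutoff below are assumptions for illustration only:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hedged sketch of per-bin word tuning: regress each 50-ms bin's firing
# rate on one-hot word regressors and flag well-fit bins. The synthetic
# sizes and the R^2 cutoff are assumptions; the paper's exact regression
# and significance criterion are described in its Methods.
rng = np.random.default_rng(0)
n_trials, n_bins, n_words = 128, 40, 8
words = rng.integers(0, n_words, n_trials)           # word label per trial
rates = rng.poisson(5, (n_trials, n_bins)).astype(float)

X = np.eye(n_words)[words]                           # one-hot word regressors
tuned = [LinearRegression().fit(X, rates[:, b]).score(X, rates[:, b]) > 0.1
         for b in range(n_bins)]                     # illustrative cutoff
print(f"{100 * np.mean(tuned):.0f}% of bins pass the toy criterion")
```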

Figure 3.

a , The average percentage of tuned neurons to words in 50-ms time bins in the SMG over the trial duration for ‘auditory cue’ (blue) and ‘written cue’ (green) tasks for participant 1 (solid line, mean over ten sessions; shaded area, 95% confidence interval of the mean). During the cue phase of auditory trials, neural data were aligned to audio onset, which occurred within 200–650 ms following initiation of the cue phase. b , The average percentage of tuned neurons computed on firing rates per task phase, with 95% confidence interval over ten sessions. Tuning during action phases (cue, internal and speech) following rest phases (ITI, D1 and D2) was significantly higher (paired two-tailed t -test, d.f. 9, P ITI_CueWritten  < 0.001, Cohen’s d  = 2.31; P ITI_CueAuditory  = 0.003, Cohen’s d  = 1.25; P D1_InternalWritten  = 0.008, Cohen’s d  = 1.08; P D1_InternalAuditory  < 0.001, Cohen’s d  = 1.71; P D2_SpeechWritten  < 0.001, Cohen’s d  = 2.34; P D2_SpeechAuditory  < 0.001, Cohen’s d  = 3.23). c , The number of neurons tuned to each individual word in each phase for the ‘auditory cue’ and ‘written cue’ tasks. d , The average percentage of tuned neurons to words in 50-ms time bins in the SMG over the trial duration for ‘written cue’ (green) tasks for participant 2 (solid line, mean over nine sessions; shaded area, 95% confidence interval of the mean). Due to a reduced number of tuned units, only the ‘written cue’ task variation was performed. e , The average percentage of tuned neurons computed on firing rates per task phase, with 95% confidence interval over nine sessions. Tuning during cue and internal phases following rest phases ITI and D1 was significantly higher (paired two-tailed t -test, d.f. 8, P ITI_CueWritten  = 0.003, Cohen’s d  = 1.38; P D1_Internal  = 0.001, Cohen’s d  = 1.67). f , The number of neurons tuned to each individual word in each phase for the ‘written cue’ task.


To quantitatively compare activity between phases, we assessed the differential response patterns for individual words by examining the variations in average firing rate across different task phases (Fig. 3b,e ). In both participants, tuning during the cue and internal speech phases was significantly higher compared with their preceding rest phases ITI and D1 (paired t -test between phases. Participant 1: d.f. 9, P ITI_CueWritten  < 0.001, Cohen’s d  = 2.31; P ITI_CueAuditory  = 0.003, Cohen’s d  = 1.25; P D1_InternalWritten  = 0.008, Cohen’s d  = 1.08; P D1_InternalAuditory  < 0.001, Cohen’s d  = 1.71. Participant 2: d.f. 8, P ITI_CueWritten  = 0.003, Cohen’s d  = 1.38; P D1_Internal  = 0.001, Cohen’s d  = 1.67). For participant 1, we also observed significantly higher tuning to vocalized speech than to tuning in D2 (d.f. 9, P D2_SpeechWritten  < 0.001, Cohen’s d  = 2.34; P D2_SpeechAuditory  < 0.001, Cohen’s d  = 3.23). Representation for all words was observed in each phase, including pseudowords (bindip and nifzig) (Fig. 3c,f ). To identify neurons with selective activity for unique words, we performed a Kruskal–Wallis test (Supplementary Fig. 3c,d ). The results mirrored findings of the regression analysis in both participants, albeit weaker in participant 2. These findings suggest that, while neural activity during active phases differed from activity during the ITI phase, neural responses of only a few neurons varied across different words for participant 2.
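For the word-selectivity screen, a minimal sketch of a Kruskal–Wallis test across word conditions for one neuron might look like this (the firing rates are synthetic stand-ins):

```python
import numpy as np
from scipy.stats import kruskal

# Minimal sketch of the word-selectivity screen: a Kruskal-Wallis test
# across the eight word conditions on one neuron's phase-averaged firing
# rates. Rates are synthetic; a word-selective neuron yields a small p.
rng = np.random.default_rng(1)
rates_by_word = [rng.poisson(5 + w, 16) for w in range(8)]  # 16 trials/word
H, p = kruskal(*rates_by_word)
print(f"H = {H:.1f}, p = {p:.3g}")
```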

The neural population in the SMG simultaneously represented several distinct aspects of language processing: temporal changes, input modality (auditory, written for participant 1) and unique words from our vocabulary list. We used demixed principal component analysis (dPCA) to decompose and analyse contributions of each individual component: timing, cue modality and word. In Fig. 4 , demixed principal components (PCs) explaining the highest amount of variance were plotted by projecting data onto their respective dPCA decoder axis.
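Full dPCA (Kobak et al., 2016) demixes these factors jointly; the sketch below is only a simplified, PCA-based stand-in. It marginalizes a synthetic trial-averaged tensor into a word-independent "timing" part and a word-specific residual, then projects the data onto a leading "word" axis. All shapes are invented:

```python
import numpy as np
from sklearn.decomposition import PCA

# Simplified, PCA-based stand-in for dPCA: split a synthetic trial-averaged
# tensor into a word-independent "timing" part and a word-specific residual,
# then find a leading component within the word marginalization. Real dPCA
# (Kobak et al., 2016) demixes all factors jointly; shapes here are invented.
rng = np.random.default_rng(2)
n_neurons, n_words, n_time = 96, 8, 120
X = rng.normal(size=(n_neurons, n_words, n_time))

timing = X.mean(axis=1)                   # neurons x time, averaged over words
word_resid = X - timing[:, None, :]       # word-specific variation

pca = PCA(n_components=1).fit(word_resid.reshape(n_neurons, -1).T)
word_pc = pca.components_ @ X.reshape(n_neurons, -1)   # project the full data
print(word_pc.reshape(n_words, n_time).shape)          # one trace per word
```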

Figure 4.

a – e , dPCA was performed to investigate variance within three marginalizations: ‘timing’, ‘cue modality’ and ‘word’ for participant 1 ( a – c ) and ‘timing’ and ‘word’ for participant 2 ( d and e ). Demixed PCs explaining the highest variance within each marginalization were plotted over time, by projecting the data onto their respective dPCA decoder axis. In a , the ‘timing’ marginalization demonstrates SMG modulation during cue, internal speech and vocalized speech, while S1 only represents vocalized speech. The solid blue lines (8) represent the auditory cue trials, and dashed green lines (8) represent written cue trials. In b , the ‘cue modality’ marginalization suggests that internal and vocalized speech representation in the SMG are not affected by the cue modality. The solid blue lines (8) represent the auditory cue trials, and dashed green lines (8) represent written cue trials. In c , the ‘word’ marginalization shows high variability for different words in the SMG, but near zero for S1. The colours (8) represent individual words. For each colour, solid lines represent auditory trials and dashed lines represent written cue trials. d is the same as a , but for participant 2. The dashed green lines (8) represent written cue trials. e is the same as c , but for participant 2. The colours (8) represent individual words during written cue trials. The variance for different words in the SMG (left) was higher than in S1 (right), but lower in comparison with SMG in participant 1 ( c ).

For participant 1, the ‘timing’ component revealed that temporal dynamics in the SMG peaked during all active phases (Fig. 4a ). In contrast, temporal S1 modulation peaked only during vocalized speech production, indicating a lack of synchronized lip and face movement of the participant during the other task phases. While ‘cue modality’ components were separable during the cue phase (Fig. 4b ), they overlapped during subsequent phases. Thus, internal and vocalized speech representation may not be influenced by the cue modality. Pseudowords had similar separability to lexical words (Fig. 4c ). The explained variance between words was high in the SMG and was close to zero in S1. In participant 2, temporal dynamics of the task were preserved (‘timing’ component). However, variance to words was reduced, suggesting lower neuronal ability to represent individual words in participant 2. In S1, the results mirrored findings from S1 in participant 1 (Fig. 4d,e , right).

Internal speech is decodable in the SMG

Separable neural representations of both internal and vocalized speech processes implicate SMG as a rich source of neural activity for real-time speech BMI devices. The decodability of words correlated with the percentage of tuned neurons (Fig. 3a–f ) as well as the explained dPCA variance (Fig. 4c,e ) observed in the participants. In participant 1, all words in our vocabulary list were highly decodable, averaging 55% offline decoding and 79% (16–20 training trials) online decoding from neurons during internal speech (Fig. 5a,b ). Words spoken during the vocalized phase were also highly discriminable, averaging 74% offline (Fig. 5a ). In participant 2, offline internal speech decoding averaged 24% (Supplementary Fig. 4b ) and online decoding averaged 23% (Fig. 5a ), with preferential representation of words ‘spoon’ and ‘swimming’.

Figure 5.

a , Offline decoding accuracies: ‘audio cue’ and ‘written cue’ task data were combined for each individual session day, and leave-one-out CV was performed (black dots). PCA was performed on the training data, an LDA model was constructed, and classification accuracies were plotted with 95% confidence intervals, over the session means. The significance of classification accuracies was evaluated by comparing results with a shuffled distribution (averaged shuffle results over 100 repetitions indicated by red dots; P  < 0.01 indicates that the average mean is >99.5th percentile of shuffle distribution, n  = 10). In participant 1, classification accuracies during action phases (cue, internal and speech) following rest phases (ITI, D1 and D2) were significantly higher (paired two-tailed t -test: n  = 10, d.f. 9, for all P  < 0.001, Cohen’s d  = 6.81, 2.29 and 5.75). b , Online decoding accuracies: classification accuracies for internal speech were evaluated in a closed-loop internal speech BMI application on three different session days for both participants. In participant 1, decoding accuracies were significantly above chance (averaged shuffle results over 1,000 repetitions indicated by red dots; P  < 0.001 indicates that the average mean is >99.95th percentile of shuffle distribution) and improved when 16–20 trials per word were used to train the model (two-sample two-tailed t -test, n (8–14)  = 8, d.f. 11, n (16–20)  = 5, P  = 0.029), averaging 79% classification accuracy. In participant 2, online decoding accuracies were significant (averaged shuffle results over 1,000 repetitions indicated by red dots; P  < 0.05 indicates that average mean is >97.5th percentile of shuffle distribution, n  = 7) and averaged 23%. c , An offline confusion matrix for participant 1: confusion matrices for each of the different task phases were computed on the tested data and averaged over all session days. d , An online confusion matrix: a confusion matrix was computed combining all online runs, leading to a total of 304 trials (38 trials per word) for participant 1 and 448 online trials for participant 2. Participant 1 displayed comparable online decoding accuracies for all words, while participant 2 had preferential decoding for the words ‘swimming’ and ‘spoon’.

In participant 1, trial data from both types of cue (auditory and written) were concatenated for offline analysis, since SMG activity was only differentiable between the types of cue during the cue phase (Figs. 3a and 4b ). This resulted in 16 trials per condition. Features were selected via principal component analysis (PCA) on the training dataset, and PCs that explained 95% of the variance were kept. A linear discriminant analysis (LDA) model was evaluated with leave-one-out cross-validation (CV). Significance was computed by comparing results with a null distribution ( Methods ).
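As a sketch, the described pipeline (PCA retaining 95% of variance fit on the training folds, LDA, leave-one-out CV) maps onto scikit-learn as below. The feature matrix is a synthetic stand-in, not the recorded data, and this is an illustration of the stated steps rather than the authors' actual code:

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import LeaveOneOut, cross_val_score
from sklearn.pipeline import make_pipeline

# Sketch of the offline pipeline: PCA keeping 95% of variance (fit on the
# training folds only, via the pipeline), LDA, leave-one-out CV. The
# feature matrix is a synthetic stand-in for trial firing-rate features.
rng = np.random.default_rng(3)
X = rng.normal(size=(128, 96))            # 16 trials x 8 words, 96 features
y = np.repeat(np.arange(8), 16)

clf = make_pipeline(PCA(n_components=0.95), LinearDiscriminantAnalysis())
acc = cross_val_score(clf, X, y, cv=LeaveOneOut()).mean()
print(f"leave-one-out accuracy: {acc:.2f} (chance 0.125)")
```

Wrapping PCA and LDA in one pipeline matters here: it guarantees the PCA is refit inside each cross-validation fold, so no test-trial information leaks into the feature selection.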

Significant word decoding was observed during all phases, except during the ITI (Fig. 5a , n  = 10, mean decoding value above 99.5th percentile of shuffle distribution is P  < 0.01, per phase, Cohen’s d  = 0.64, 6.17, 3.04, 6.59, 3.93 and 8.26, confidence interval of the mean ± 1.73, 4.46, 5.21, 5.67, 4.63 and 6.49). Decoding accuracies were significantly higher in the cue, internal speech and speech condition, compared with rest phases ITI, D1 and D2 (Fig. 5a , paired t -test, n  = 10, d.f. 9, for all P  < 0.001, Cohen’s d  = 6.81, 2.29 and 5.75). Significant cue phase decoding suggested that modality-independent linguistic representations were present early within the task 45 . Internal speech decoding averaged 55% offline, with the highest session at 72% and a chance level of ~12.5% (Fig. 5a , red line). Vocalized speech averaged even higher, at 74%. All words were highly decodable (Fig. 5c ). As suggested from our dPCA results, individual words were not significantly decodable from neural activity in S1 (Supplementary Fig. 4a ), indicating generalized activity for vocalized speech in the S1 arm region (Fig. 4c ).
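The shuffle-based significance test could be sketched as follows; the data are synthetic, while the 100 shuffles and 99.5th-percentile criterion mirror the offline analysis described above:

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import LeaveOneOut, cross_val_score

# Sketch of the shuffle test: rebuild a null distribution by decoding
# label-shuffled data, then ask whether the true accuracy exceeds its
# 99.5th percentile (the P < 0.01 criterion). Data are synthetic.
rng = np.random.default_rng(4)
X = rng.normal(size=(64, 20))
y = np.repeat(np.arange(8), 8)
clf = LinearDiscriminantAnalysis()

true_acc = cross_val_score(clf, X, y, cv=LeaveOneOut()).mean()
null = [cross_val_score(clf, X, rng.permutation(y), cv=LeaveOneOut()).mean()
        for _ in range(100)]
print("significant:", true_acc > np.percentile(null, 99.5))
```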

For participant 2, SMG significant word decoding was observed during the cue, internal and vocalized speech phases (Supplementary Fig. 4b , n  = 9, mean decoding value above 97.5th/99.5th percentile of shuffle distribution is P  < 0.05/ P  < 0.01, per phase Cohen’s d  = 0.35, 1.15, 1.09, 1.44, 0.99 and 1.49, confidence interval of the mean ± 3.09, 5.02, 6.91, 8.14, 5.45 and 4.15). Decoding accuracies were significantly higher in the cue and internal speech condition, compared with rest phases ITI and D1 (Supplementary Fig. 4b , paired t -test, n  = 9, d.f. 8, P ITI_Cue  = 0.013, Cohen’s d  = 1.07, P D1_Internal  = 0.01, Cohen’s d  = 1.11). S1 decoding mirrored results in participant 1, suggesting that no synchronized face movements occurred during the cue phase or internal speech phase (Supplementary Fig. 4c ).

High-accuracy online speech decoder

We developed an online, closed-loop internal speech BMI using an eight-word vocabulary (Fig. 5b ). On three separate session days, training datasets were generated using the written cue task, with eight repetitions of each word for each participant. An LDA model was trained on the internal speech data of the training set, corresponding to only 1.5 s of neural data per repetition for each class. The trained decoder predicted internal speech during the online task. During the online task, the vocalized speech phase was replaced with a feedback phase. The decoded word was shown in green if correctly decoded, and in red if wrongly decoded (Supplementary Video 1 ). The classifier was retrained after each run of the online task, adding the newly recorded data. Several online runs were performed on each session day, corresponding to different datapoints in Fig. 5b . When using between 8 and 14 repetitions per word to train the decoding model, an average of 59% classification accuracy was obtained for participant 1. Accuracies were significantly higher (two-sample two-tailed t -test, n (8–14)  = 8, n (16–20)  = 5, d.f. 11, P  = 0.029) the more data were added to train the model, reaching an average of 79% classification accuracy with 16–20 repetitions per word. The highest single-run accuracy was 91%. All words were well represented, illustrated by a confusion matrix of 304 trials (Fig. 5d ). In participant 2, decoding was statistically significant, but lower compared with participant 1. The lower number of tuned units (Fig. 3a–f ) and reduced explained variance between words (Fig. 4e , left) could account for these findings. Additionally, preferential representation of the words ‘spoon’ and ‘swimming’ was observed.
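The retrain-after-each-run scheme reduces to a simple accumulate-and-refit loop. In this sketch the runs themselves are simulated placeholders; shapes, labels, and the helper `new_run` are assumptions, not the study's software:

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

# Sketch of the retrain-after-each-run scheme: after every online run the
# decoder is refit on all data collected so far. The runs themselves are
# simulated placeholders; shapes and labels are assumptions.
rng = np.random.default_rng(5)

def new_run(n_trials=64, n_feat=20):
    """Stand-in for one online run: feature matrix plus true word labels."""
    return rng.normal(size=(n_trials, n_feat)), rng.integers(0, 8, n_trials)

X_all, y_all = new_run()                  # initial written-cue training set
clf = LinearDiscriminantAnalysis().fit(X_all, y_all)

for run in range(3):                      # several online runs per session
    X_run, y_run = new_run()
    print(f"run {run} accuracy: {clf.score(X_run, y_run):.2f}")
    X_all = np.vstack([X_all, X_run])     # append the newly recorded data
    y_all = np.concatenate([y_all, y_run])
    clf.fit(X_all, y_all)                 # retrain for the next run
```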

Shared representations between internal speech, written words and vocalized speech

Different language processes are engaged during the task: auditory comprehension or visual word recognition during the cue phase, and internal speech and vocalized speech production during the speech phases. It has been widely assumed that each of these processes is part of a highly distributed network, involving multiple cortical areas 46 . In this work, we observed significant representation of different language processes in a common cortical region, SMG, in our participants. To explore the relationships between each of these processes, for participant 1 we used cross-phase classification to identify the distinct and common neural codes separately in the auditory and written cue datasets. By training our classifier on the representation found in one phase (for example, the cue phase) and testing the classifier on another phase (for example, internal speech), we quantified generalizability of our models across neural activity of different language processes (Fig. 6 ). The generalizability of a model to different task phases was evaluated through paired t -tests. No significant difference between classification accuracies indicates good generalization of the model, while significantly lower classification accuracies suggest poor generalization of the model.
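Cross-phase classification itself is a small variation on the offline decoder: fit on one phase, score on another. A sketch with synthetic phase data (the phase names follow the task; everything else is a stand-in):

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.pipeline import make_pipeline

# Sketch of cross-phase classification: fit the decoder on one task phase
# and score it on another; comparable accuracy suggests a shared neural
# code. Phase data here are synthetic stand-ins for binned firing rates.
rng = np.random.default_rng(6)
y = np.repeat(np.arange(8), 16)
phases = {name: rng.normal(size=(128, 96))
          for name in ("cue", "internal", "speech")}

clf = make_pipeline(PCA(n_components=0.95), LinearDiscriminantAnalysis())
clf.fit(phases["internal"], y)            # train on internal speech data
for name in ("cue", "speech"):
    print(name, f"{clf.score(phases[name], y):.2f}")   # generalization
```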

Figure 6.

a , Evaluating the overlap of shared information between different task phases in the ‘auditory cue’ task. For each of the ten session days, cross-phase classification was performed. It consisted of training a model on a subset of data from one phase (for example, cue) and applying it on a subset of data from ITI, cue, internal and speech phases. This analysis was performed separately for each task phase. PCA was performed on the training data, an LDA model was constructed and classification accuracies were plotted with a 95% confidence interval over session means. Significant differences in performance between phases were evaluated between the ten sessions (paired two-tailed t -test, FDR corrected, d.f. 9, P  < 0.001 for all, Cohen’s d  ≥ 1.89). For easier visibility, significant differences between ITI and other phases were not plotted. b , Same as a for the ‘written cue’ task (paired two-tailed t -test, FDR corrected, d.f. 9, P Cue_Internal  = 0.028, Cohen’s d  > 0.86; P Cue_Speech  = 0.022, Cohen’s d  = 0.95; all others P  < 0.001 and Cohen’s d  ≥ 1.65). c , The percentage of neurons tuned during the internal speech phase that are also tuned during the vocalized speech phase. Neurons tuned during the internal speech phase were computed as in Fig. 3b separately for each session day. From these, the percentage of neurons that were also tuned during vocalized speech was calculated. More than 80% of neurons during internal speech were also tuned during vocalized speech (82% in the ‘auditory cue’ task, 85% in the ‘written cue’ task). In total, 71% of ‘auditory cue’ and 79% ‘written cue’ neurons also preserved tuning to at least one identical word during internal speech and vocalized speech phases. d , The percentage of neurons tuned during the internal speech phase that were also tuned during the cue phase. Right: 78% of neurons tuned during internal speech were also tuned during the written cue phase. Left: a smaller 47% of neurons tuned during the internal speech phase were also tuned during the auditory cue phase. In total, 71% of neurons preserved tuning between the written cue phase and the internal speech phase, while 42% of neurons preserved tuning between the auditory cue and the internal speech phase.

The strongest shared neural representations were found between visual word recognition, internal speech and vocalized speech (Fig. 6b ). A model trained on internal speech was highly generalizable to both vocalized speech and written cued words, evidence for a possible shared neural code (Fig. 6b , internal). In contrast, the model’s performance was significantly lower when tested on data recorded in the auditory cue phase (Fig. 6a , training phase internal: paired t -test, d.f. 9, P Cue_Internal  < 0.001, Cohen’s d  = 2.16; P Cue_Speech  < 0.001, Cohen’s d  = 3.34). These differences could stem from the inherent challenges in comparing visual and auditory language stimuli, which differ in processing time: instantaneous for text versus several hundred milliseconds for auditory stimuli.

We evaluated the capability of a classification model, initially trained to distinguish words during vocalized speech, in its ability to generalize to internal and cue phases (Fig. 6a,b , training phase speech). The model demonstrated similar levels of generalization during internal speech and in response to written cues, as indicated by the lack of significance in decoding accuracy between the internal and written cue phase (Fig. 6b , training phase speech, cue–internal). However, the model generalized significantly better to internal speech than to representations observed during the auditory cue phase (Fig. 6a , training phase speech, d.f. 9, P Cue_Internal  < 0.001, Cohen’s d  = 2.85).

Neuronal representation of words at the single-neuron level was highly consistent between internal speech, vocalized speech and written cue phases. A high percentage of neurons were not only active during the same task phases but also preserved identical tuning to at least one word (Fig. 6c,d ). In total, 82–85% of neurons active during internal speech were also active during vocalized speech. In 71–79% of neurons, tuning was preserved between the internal speech and vocalized speech phases (Fig. 6c ). During the cue phase, 78% of neurons active during internal speech were also active during the written cue (Fig. 6d , right). However, a lower percentage of neurons (47%) were active during the auditory cue phase (Fig. 6d , left). Similarly, 71% of neurons preserved tuning between the written cue phase and the internal speech phase, while 42% of neurons preserved tuning between the auditory cue phase and the internal speech phase.

Together with the cross-phase analysis, these results suggest strong shared neural representations between internal speech, vocalized speech and the written cue, both at the single-neuron and at the population level.

Robust decoding of multiple internal speech strategies within the SMG

Strong shared neural representations in participant 1 between written, inner and vocalized speech suggest that all three partly represent the same cognitive process or all cognitive processes share common neural features. While internal and vocalized speech have been shown to share common neural features 36 , similarities between internal speech and the written cue could have occurred through several different cognitive processes. For instance, the participant’s observation of the written cue could have activated silent reading. This process has been self-reported as activating internal speech, which can involve ‘hearing’ a voice, thus having an auditory component 42 , 47 . However, the participant could also have mentally pictured an image of the written word while performing internal speech, involving visual imagination in addition to language processes. Both hypotheses could explain the high amount of shared neural representation between the written cue and the internal speech phases (Fig. 6b ).

We therefore compared two possible internal sensory strategies in participant 1: a ‘sound imagination’ strategy in which the participant imagined hearing the word, and a ‘visual imagination’ strategy in which the participant visualized the word’s image (Supplementary Fig. 5a ). Each strategy was cued by the modalities we had previously tested (auditory and written words) (Table 1 ). To assess the similarity of these internal speech processes to other task phases, we conducted a cross-phase decoding analysis (as performed in Fig. 6 ). We hypothesized that, if the high cross-decoding results between internal and written cue phases primarily stemmed from the participant engaging in visual word imagination, we would observe lower decoding accuracies during the auditory imagination phase.

Both strategies demonstrated high representation of the four-word dataset (Supplementary Fig. 5b , highest 94%, chance level 25%). These results suggest our speech BMI decoder is robust to multiple types of internal speech strategy.

The participant described the ‘sound imagination’ strategy as being easier and more similar to the internal speech condition of the first experiment. The participant’s self-reported strategy suggests that no visual imagination was performed during internal speech. Correspondingly, similarities between written cue and internal speech phases may stem from internal speech activation during the silent reading of the cue.

In this work, we demonstrated a decoder for internal and vocalized speech, using single-neuron activity from the SMG. Two chronically implanted, speech-abled participants with tetraplegia were able to use an online, closed-loop internal speech BMI to achieve on average 79% and 23% classification accuracy with 16–32 training trials for an eight-word vocabulary. Furthermore, high decoding was achievable with only 24 s of training data per word, corresponding to 16 trials each with 1.5 s of data. Firing rates recorded from S1 showed generalized activation only during vocalized speech activity, but individual words were not classifiable. In the SMG, shared neural representations between internal speech, the written cue and vocalized speech suggest the occurrence of common processes. Robust control could be achieved using visual and auditory internal speech strategies. Representation of pseudowords provided evidence for a phonetic word encoding component in the SMG.

Single neurons in the SMG encode internal speech

We demonstrated internal speech decoding of six different words and two pseudowords in the SMG. Single neurons increased their firing rates during internal speech (Fig. 2 , S1 and S2), which was also reflected at the population level (Fig. 3a,b,d,e ). Each word was represented in the neuronal population (Fig. 3c,f ). Classification accuracy and tuning during the internal speech phase were significantly higher than during the previous delay phase (Figs. 3b,e and 5a , and Supplementary Figs. 3c,d and 4b ). This evidence suggests that we did not simply decode sustained activity from the cue phase but activity generated by the participant performing internal speech. We obtained significant offline and online internal speech decoding results in two participants (Fig. 5a and Supplementary Fig. 4b ). These findings provide strong evidence for internal speech processing at the single-neuron level in the SMG.

Neurons in S1 are modulated by vocalized but not internal speech

Neural activity recorded from S1 served as a control for synchronized face and lip movements during internal speech. While vocalized speech robustly activated sensory neurons, no increase over baseline activity was observed during the internal speech phase or the auditory and written cue phases in either participant (Fig. 4 , S1). These results indicate that no synchronized movement inflated our decoding accuracy of internal speech (Supplementary Fig. 4a,c ).

A previous imaging study achieved significant offline decoding of several different internal speech sentences performed by patients with mild ALS 6 . Together with our findings, these results suggest that a BMI speech decoder that does not rely on any movement may translate to communication opportunities for patients suffering from ALS and locked-in syndrome.

Different face activities are observable but not decodable in arm area of S1

The topographic representation of body parts in S1 has recently been found to be less rigid than previously thought. Generalized finger representation was found in a presumably S1 arm region of interest (ROI) 44 . Furthermore, an fMRI paper found observable face and lip activity in S1 leg and hand ROIs. However, differentiation between two lip actions was restricted to the face ROI 43 . Correspondingly, we observed generalized face and lip activity in a predominantly S1 arm region for participant 1 (see ref. 48 for implant location) and a predominantly S1 hand region for participant 2 during vocalized speech (Fig. 4a,d and Supplementary Figs. 1 and 4a,b ). Recorded neural activity contained similar representations for different spoken words (Fig. 4c,e ) and was not significantly decodable (Supplementary Fig. 4a,c ).

Shared neural representations between internal and vocalized speech

The extent to which internal and vocalized speech generalize is still debated 35 , 42 , 49 and depends on the investigated brain area 36 , 50 . In this work, we found on average stronger representation for vocalized (74%) than internal speech (Fig. 5a , 55%) in participant 1 but the opposite effect in participant 2 (Supplementary Fig. 4b , 24% internal, 21% vocalized speech). Additionally, cross-phase decoding of vocalized speech from models trained on data during internal speech resulted in comparable classification accuracies to those of internal speech (Fig. 6a,b , internal). Most neurons tuned during internal speech were also tuned to at least one of the same words during vocalized speech (71–79%; Fig. 6c ). However, some neurons were only tuned during internal speech, or to different words. These observations also applied to firing rates of individual neurons. Here, we observed neurons that had higher peak rates during the internal speech phase than the vocalized speech phase (Supplementary Fig. 1 : swimming and cowboy). Together, these results further suggest neural signatures during internal and vocalized speech are similar but distinct from one another, emphasizing the need for developing speech models from data recorded directly on internal speech production 51 .

Similar observations were made when comparing internal speech processes with visual word processes. In total, 79% of neurons were active both in the internal speech phase and the written cue phase, and 79% preserved the same tuning (Fig. 6d , written cue). Additionally, high cross-decoding between both phases was observed (Fig. 6b , internal).

Shared representation between speech and written cue presentation

Observation of a written cue may engage a variety of cognitive processes, such as visual feature recognition, semantic understanding and/or related language processes, many of which modulate similar cortical regions as speech 45 . Studies have found that silent reading can evoke internal speech; it can be modulated by a presumed author’s speaking speed, voice familiarity or regional accents 35 , 42 , 47 , 52 , 53 . During silent reading of a cued sentence with a neutral versus increased prosody (madeleine brought me versus MADELEINE brought me), one study in particular found that increased left SMG activation correlated with the intensity of the produced inner speech 54 .

Our data demonstrated high cross-phase decoding accuracies between both written cue and speech phases in our first participant (Fig. 6b ). Due to substantial shared neural representation, we hypothesize that the participant’s silent reading during the presentation of the written cue may have engaged internal speech processes. However, this same shared representation could have occurred if visual processes were activated in the internal speech phase. For instance, the participant could have performed mental visualization of the written word instead of generating an internal monologue, as the subjective perception of internal speech may vary between individuals.

Investigating internal speech strategies

In a separate experiment, participant 1 was prompted to use different mental strategies during the internal speech phase: ‘sound imagination’ or ‘visual word imagination’ (Supplementary Fig. 5a). We found robust decoding during the internal strategy phase, regardless of which mental strategy was performed (Supplementary Fig. 5b). The participant reported that the sound strategy was easier to execute than the visual strategy, and that it was more similar to the internal speech strategy employed in prior experiments. This self-report suggests that the participant did not perform visual imagination during the internal speech task; the shared neural representation between the internal and written word phases during that task may therefore stem from silent reading of the written cue. Since multiple internal mental strategies are decodable from the SMG, future patients could have flexibility in their preferred strategy. For instance, people with a strong visual imagination may prefer performing visual word imagination.

Audio contamination in decoding results

Prior studies examining neural representation of attempted or vocalized speech have had to address potential acoustic contamination of electrophysiological brain signals during speech production 55. During internal speech production, no detectable audio was captured by the audio equipment or noticed by the researchers in the room. In the rare cases in which the participant spoke during internal speech (three trials), the trials were removed. Furthermore, if audio had contaminated the neural data during the auditory cue or vocalized speech, we would probably have observed significant decoding in all channels. However, no significant classification was detected in S1 channels during either the auditory cue phase or the vocalized speech phase (Supplementary Fig. 2b). We therefore conclude that acoustic contamination did not artificially inflate the observed classification accuracies during vocalized speech in the SMG.

Single-neuron modulation during internal speech with a second participant

We found single-neuron modulation to speech processes in a second participant (Figs. 2d,e and 3f and Supplementary Fig. 2d), as well as significant offline and online classification accuracies (Fig. 5a and Supplementary Fig. 4b), confirming neural representation of language processes in the SMG. The number of neurons distinctly active for different words was lower than in the first participant (Fig. 2e and Supplementary Fig. 3d), limiting our ability to decode between words with high accuracy in the different task phases (Fig. 5a and Supplementary Fig. 4b).

Previous work found that single neurons in the PPC exhibited a common neural substrate for written action verbs and observed actions 56 . Another study found that single neurons in the PPC also encoded spoken numbers 57 . These recordings were made in the superior parietal lobule whereas the SMG is in the inferior parietal lobule. Thus, it would appear that language-related activity is highly distributed across the PPC. However, the difference in strength of language representation between each participant in the SMG suggests that there is a degree of functional segregation within the SMG 37 .

Differences in SMG anatomy between participants make precise comparisons of implanted array locations difficult (Fig. 1). Implant locations for both participants were informed by pre-surgical anatomical/vasculature scans and fMRI tasks designed to evoke activity related to grasp and dexterous hand movements 48. Furthermore, the implanted array had more electrodes in the first participant (96) than in the second participant (64). A pre-surgical assessment of functional activity related to language and speech may be required to determine the best candidate implant locations within the SMG for online speech decoding applications.

Impact on BMI applications

In this work, an online internal speech BMI achieved significant decoding from single-neuron activity in the SMG in two participants with tetraplegia. The online decoders were trained on as few as eight repetitions of 1.5 s per word, demonstrating that meaningful classification accuracies can be obtained with only a few minutes’ worth of training data per day. This proof of concept suggests that the SMG may be able to represent a much larger internal vocabulary. By building models on internal speech directly, our results may translate to people who cannot vocalize speech or who are completely locked in. Recently, ref. 26 demonstrated a BMI speller that decoded attempted speech of NATO-alphabet code words and used them to construct sentences. Scaling our vocabulary to that size could allow for an unrestricted internal speech speller.

To summarize, we demonstrate the SMG as a promising candidate to build an internal brain–machine speech device. Different internal speech strategies were decodable from the SMG, allowing patients to use the methods and languages with which they are most comfortable. We found evidence for a phonetic component during internal and vocalized speech. Adding to previous findings indicating grasp decoding in the SMG 23 , we propose the SMG as a multipurpose BMI area.

Experimental model and participant details

Two male participants with tetraplegia (33 and 39 years) were recruited for an institutional review board- and Food and Drug Administration-approved clinical trial of a BMI and gave informed consent to participate (Institutional Review Board of Rancho Los Amigos National Rehabilitation Center, Institutional Review Board of California Institute of Technology, clinical trial registration NCT01964261). This clinical trial evaluated BMIs in the PPC and the somatosensory cortex for grasp rehabilitation. A primary effectiveness objective of the study is to evaluate the effectiveness of the NeuroPort implant in controlling virtual or physical end effectors: signals from the PPC should allow the participants to control an end effector with accuracy greater than chance. Participants were compensated for their participation in the study and reimbursed for any travel expenses related to participation in study activities. The authors affirm that the human research participant provided written informed consent for publication of Supplementary Video 1. The first participant suffered a spinal cord injury at cervical level C5 1.5 years before participating in the study. The second participant suffered a C5–C6 spinal cord injury 3 years before implantation.

Method details

Data were collected from implants located in the left SMG and the left S1 (for anatomical locations, see Fig. 1 ). For description of pre-surgical planning, localization fMRI tasks, surgical techniques and methodologies, see ref. 48 . Placement of electrodes was based on fMRI tasks involving grasp and dexterous hand movements.

The first participant underwent surgery in November 2016 to implant two 96-channel platinum-tipped multi-electrode arrays (NeuroPort Array, Blackrock Microsystems) in the SMG and in the ventral premotor cortex and two 7 × 7 sputtered iridium oxide film (SIROF)-tipped microelectrode arrays with 48 channels each in the hand and arm area of S1. Data were collected between July 2021 and August 2022. The second participant underwent surgery in October 2022 and was implanted with SIROF-tipped 64-channel microelectrode arrays in S1 (two arrays), SMG, ventral premotor cortex and primary motor cortex. Data were collected in January 2023.

Data collection

Recording began 2 weeks after surgery and continued one to three times per week. Data for this work were collected between 2021 and 2023. Broadband electrical activity was recorded from the NeuroPort Arrays using Neural Signal Processors (Blackrock Microsystems). Analogue signals were amplified, bandpass filtered (0.3–7,500 Hz) and digitized at 30,000 samples s−1. To identify putative action potentials, these broadband data were bandpass filtered (250–5,000 Hz) and thresholded at −4.5 times the estimated root-mean-square voltage of the noise. For some of the analyses, waveforms captured at these threshold crossings were spike sorted by manually assigning each observation to a putative single neuron; for others, multiunit activity was considered. For participant 1, an average of 33 sorted SMG units (between 22 and 56) and 83 sorted S1 units (between 59 and 96) were recorded per session. For participant 2, an average of 80 sorted SMG units (between 69 and 92) and 81 sorted S1 units (between 61 and 101) were recorded per session. Auditory data were recorded at 30,000 Hz simultaneously with the neural data. Background noise was reduced post-recording using the noise reduction function of the program Audacity.
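As a rough illustration of the spike-detection steps described above, the short Python sketch below bandpass filters a broadband trace and finds threshold crossings at −4.5 times the noise RMS. Function names, the filter order and the noise estimate are our assumptions, not the recording-system pipeline.

```python
import numpy as np
from scipy.signal import butter, filtfilt

def detect_threshold_crossings(broadband, fs=30_000, multiplier=-4.5):
    """Bandpass 250-5,000 Hz, then find downward crossings of
    multiplier x the noise RMS (a sketch of the steps in the text)."""
    b, a = butter(4, [250, 5000], btype="bandpass", fs=fs)  # order 4 assumed
    filtered = filtfilt(b, a, broadband)
    threshold = multiplier * np.sqrt(np.mean(filtered ** 2))
    # Sample indices where the signal first dips below the threshold.
    crossings = np.where((filtered[:-1] > threshold) &
                         (filtered[1:] <= threshold))[0] + 1
    return crossings
```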

Experimental tasks

We implemented different tasks to study language processes in the SMG. The tasks cued six words informed by ref. 31 (spoon, python, battlefield, cowboy, swimming and telephone) as well as two pseudowords (bindip and nifzig). The participants were situated 1 m in front of a light-emitting diode screen (1,190 mm screen diagonal), where the task was visualized. The task was implemented using the Psychophysics Toolbox 58 , 59 , 60 extension for MATLAB. Only the written cue task was used for participant 2.

Auditory cue task

Each trial consisted of six phases, referred to in this paper as ITI, cue, D1, internal, D2 and speech. The trial began with a brief intertrial interval (ITI; 2 s), followed by a 1.5-s cue phase. During the cue phase, a speaker emitted the sound of one of the eight words (for example, python). Word duration varied between 842 and 1,130 ms. Then, after a delay period (grey circle on screen; 0.5 s), the participant was instructed to internally say the cued word (orange circle on screen; 1.5 s). After a second delay (grey circle on screen; 0.5 s), the participant vocalized the word (green circle on screen; 1.5 s).
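For concreteness, the trial structure can be written down as a small timing table; the sketch below (phase names from the text, helper names ours) computes each phase's onset within the 7.5-s trial.

```python
# Phase durations (s) for one trial of the auditory cue task.
PHASES = [("ITI", 2.0), ("cue", 1.5), ("D1", 0.5),
          ("internal", 1.5), ("D2", 0.5), ("speech", 1.5)]

def phase_onsets(phases=PHASES):
    """Return the onset time (s) of each phase within a trial."""
    t, onsets = 0.0, {}
    for name, duration in phases:
        onsets[name] = t
        t += duration
    return onsets  # total trial length: 7.5 s
```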

Written cue task

The task was identical to the auditory cue task, except that words were cued in writing instead of sound. The written word appeared on the screen for 1.5 s during the cue phase. The auditory cue was played between 200 ms and 650 ms later than the written cue appeared on the screen, owing to the use of different sound outputs (direct computer audio versus a Bluetooth speaker).

One auditory cue task and one written cue task were recorded on ten individual session days in participant 1. The written cue task was recorded on seven individual session days in participant 2.

Control experiments

Three experiments were run to investigate internal strategies and phonetic versus semantic processing.

Internal strategy task

The task was designed to vary the internal strategy employed by the participant during the internal speech phase. Two internal strategies were tested: a sound imagination and a visual imagination. For the ‘sound imagination’ strategy, the participant was instructed to imagine what the word sounded like. For the ‘visual imagination’ strategy, the participant was instructed to mentally visualize the written word. We also tested whether the cue modality (auditory or written) influenced the internal strategy. A subset of four words was used for this experiment. Crossing the two strategies with the two cue modalities led to four variations of the task.

The internal strategy task was run on one session day with participant 1.

Online task

The ‘written cue task’ was used for the closed-loop experiments. To obtain training data, a written cue task was run first, and a classification model was trained only on the internal speech data of that task (see ‘Classification’ section). The closed-loop task was nearly identical to the ‘written cue task’ but replaced the vocalized speech phase with a feedback phase. Feedback was provided by showing the word on the screen, in green if correctly classified or in red if incorrectly classified. See Supplementary Video 1 for an example of the participant performing the online task. The online task was run on three individual session days.

Error trials

Trials in which participants accidentally spoke during the internal speech phase (3 trials) or said the wrong word during the vocalized speech phase (20 trials) were removed from all analyses.

Total number of recording trials

For participant 1, we collected offline datasets composed of eight trials per word across ten sessions. Trials during which participant errors occurred were excluded. In total, between 156 and 159 trials per word were included, with a total of 1,257 trials for offline analysis. On four non-consecutive session days, the auditory cue task was run first, and on six non-consecutive days, the written cue task was run first. For online analysis, datasets were recorded on three different session days, for a total of 304 trials. Participant 2 underwent a similar data collection process, with offline datasets comprising 16 trials per word using the written cue modality over nine sessions. Error trials were excluded. In total, between 142 and 144 trials per word were kept, with a total of 1,145 trials for offline analysis. For online analysis, datasets were recorded on three session days, leading to a total of 448 online trials.

Quantification and statistical analysis

Analyses were performed using MATLAB R2020b and Python, version 3.8.11.

Neural firing rates

Firing rates of sorted units were computed as the number of spikes occurring in 50-ms bins, divided by the bin width and smoothed using a Gaussian filter with a kernel width of 50 ms, to form an estimate of the instantaneous firing rate (spikes s−1).
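A minimal sketch of this estimate for one unit, assuming spike times in seconds (function and parameter names are ours):

```python
import numpy as np
from scipy.ndimage import gaussian_filter1d

def instantaneous_rate(spike_times, t_end, bin_s=0.05, kernel_s=0.05):
    """Bin spikes in 50-ms bins, divide by bin width and smooth with a
    Gaussian kernel, as described in the text (a sketch)."""
    edges = np.arange(0.0, t_end + bin_s, bin_s)
    counts, _ = np.histogram(spike_times, bins=edges)
    rate = counts / bin_s                          # spikes per second
    return gaussian_filter1d(rate, sigma=kernel_s / bin_s)
```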

Linear regression tuning analysis

To identify units exhibiting selective firing rate patterns (tuning) for each of the eight words, linear regression analysis was performed in two ways: (1) step by step in 50-ms time bins, to assess changes in neuronal tuning over the entire trial duration; (2) averaging the firing rate within each task phase, to compare tuning between phases. The model estimates the firing rate of a unit as

$$\mathrm{FR} = \beta_0 + \sum_{w=1}^{W} \beta_w X_w,$$

where FR corresponds to the firing rate of the unit, \(\beta_0\) to the offset term equal to the average ITI firing rate of the unit, \(X_w\) is the indicator variable for word w, and \(\beta_w\) is the estimated regression coefficient for word w. W was equal to 8 (battlefield, cowboy, python, spoon, swimming, telephone, bindip and nifzig) 23.

In this model, \(\beta_w\) symbolizes the change of firing rate from baseline for each word. A t-statistic was calculated by dividing each \(\beta\) coefficient by its standard error, and tuning was based on the P value of the t-statistic for each \(\beta\) coefficient. A follow-up analysis adjusted the P values for the false discovery rate (FDR) 61,62. A unit was defined as tuned if the adjusted P value was <0.05 for at least one word. This definition allowed a unit to be tuned to zero, one or multiple words during different timepoints of the trial. Linear regression was performed for each session day individually. A 95% confidence interval of the mean was computed by applying the Student’s t-inverse cumulative distribution function over the ten sessions.
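The sketch below mirrors this analysis for a single unit with statsmodels OLS and Benjamini–Hochberg FDR correction. It is our reconstruction, not the authors' code; it assumes baseline (ITI) observations are included with all-zero indicators so that the intercept estimates the ITI rate.

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.multitest import multipletests

def tuned_words(rates, labels, words, alpha=0.05):
    """Regress one unit's phase-averaged firing rates on word indicator
    variables; return the words with FDR-adjusted P < alpha.
    rates:  (n_obs,) firing rates, including ITI observations
    labels: (n_obs,) word label per observation ('ITI' for baseline)"""
    # One-hot columns for the W words; ITI rows are all-zero, so the
    # intercept (beta_0) captures the baseline firing rate.
    X = np.column_stack([(labels == w).astype(float) for w in words])
    X = sm.add_constant(X)
    fit = sm.OLS(rates, X).fit()
    reject = multipletests(fit.pvalues[1:], alpha=alpha, method="fdr_bh")[0]
    return [w for w, r in zip(words, reject) if r]
```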

Kruskal–Wallis tuning analysis

As an alternative tuning definition, differences in firing rates between words were tested using the Kruskal–Wallis test, the non-parametric analogue to the one-way analysis of variance (ANOVA). For each neuron, the analysis was performed to evaluate the null hypothesis that data from each word come from the same distribution. A follow-up analysis was performed to adjust for FDR between the P values for each task phase 61 , 62 . A unit was defined as tuned during a phase if the adjusted P value was smaller than α  = 0.05.
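An equivalent sketch with scipy's Kruskal–Wallis test, FDR-corrected across units within one task phase (helper names are ours):

```python
from scipy.stats import kruskal
from statsmodels.stats.multitest import multipletests

def kruskal_tuned(rates_by_word_per_unit, alpha=0.05):
    """rates_by_word_per_unit: list over units; each entry is a list of
    per-word arrays of phase-averaged firing rates. Returns a boolean
    array marking units tuned in this phase (FDR-adjusted P < alpha)."""
    pvals = [kruskal(*word_rates).pvalue
             for word_rates in rates_by_word_per_unit]
    return multipletests(pvals, alpha=alpha, method="fdr_bh")[0]
```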

Classification

Using the neuronal firing rates recorded during the tasks, a classifier was used to evaluate how well the set of words could be differentiated during each phase. Classifiers were trained on firing rates averaged over each task phase, resulting in six matrices of size n × m, where n is the number of trials and m the number of recorded units. A model for each phase was built using LDA, assuming an identical covariance matrix for each word, which yielded the best classification accuracies. Leave-one-out CV was performed to estimate decoding performance, leaving out a different trial at each iteration. PCA was applied to the training data, and PCs explaining more than 95% of the variance were retained as features; the same projection was applied to the held-out trial. A 95% confidence interval of the mean was computed as described above.
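A sketch of this pipeline in scikit-learn, with X holding phase-averaged firing rates (trials × units) and y the word labels; as described, PCA is fit on the training folds only (our reconstruction, not the authors' code):

```python
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import LeaveOneOut

def loo_accuracy(X, y):
    """Leave-one-out decoding: PCA components explaining >95% of the
    variance as features, then LDA (shared covariance across words)."""
    correct = 0
    for train, test in LeaveOneOut().split(X):
        pca = PCA(n_components=0.95).fit(X[train])   # fit on train only
        clf = LinearDiscriminantAnalysis()
        clf.fit(pca.transform(X[train]), y[train])
        correct += clf.predict(pca.transform(X[test]))[0] == y[test][0]
    return correct / len(y)
```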

Cross-phase classification

To estimate shared neural representations between different task phases, we performed cross-phase classification. The process consisted of training a classification model (as described above) on one of the task phases (for example, ITI) and testing it on the ITI, cue, internal speech and vocalized speech phases. The method was repeated for each of the ten sessions individually, and a 95% confidence interval of the mean was computed. Significant differences in classification accuracies between phases decoded with the same model were evaluated using a paired two-tailed t-test, with FDR correction of the P values (see ‘Linear regression tuning analysis’) 61,62.
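Cross-phase decoding then amounts to fitting the same PCA-plus-LDA model on one phase's trial matrix and scoring it on another phase's trial-matched matrix (a sketch; names are ours):

```python
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

def cross_phase_accuracy(X_train_phase, X_test_phase, y):
    """Train on one task phase (e.g. internal speech) and test on
    another (e.g. vocalized speech); rows are matched trials."""
    pca = PCA(n_components=0.95).fit(X_train_phase)
    clf = LinearDiscriminantAnalysis().fit(pca.transform(X_train_phase), y)
    return clf.score(pca.transform(X_test_phase), y)
```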

Classification performance significance testing

To assess the significance of classification performance, a null dataset was created by repeating the classification 100 times with shuffled labels. Different percentile levels of this null distribution were then computed and compared with the mean of the actual data. Mean classification performances higher than the 97.5th percentile were denoted with P < 0.05 and those higher than the 99.5th percentile with P < 0.01.
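A sketch of this shuffle test, reusing the loo_accuracy helper defined above:

```python
import numpy as np

def shuffle_percentiles(X, y, n_perm=100, seed=0):
    """Build a null distribution of decoding accuracies by shuffling
    word labels; mean accuracies above the returned 97.5th / 99.5th
    percentiles correspond to P < 0.05 / P < 0.01."""
    rng = np.random.default_rng(seed)
    null = [loo_accuracy(X, rng.permutation(y)) for _ in range(n_perm)]
    return np.percentile(null, [97.5, 99.5])
```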

dPCA analysis

dPCA was performed on the session data to study the activity of the neuronal population in relation to the external task parameters: cue modality and word. Kobak et al. 63 introduced dPCA as a refinement of their earlier dimensionality reduction technique of the same name that attempts to combine the explanatory strengths of LDA and PCA. dPCA decomposes neuronal population activity into individual components, each of which relates to a single task parameter 64.

This text follows the methodology outlined by Kobak et al. 63 . Briefly, this involved the following steps for N neurons:

First, unlike in PCA, we focused not on the matrix of the original data, X, but on the matrices of marginalizations, \(X_{\phi}\). The marginalizations were computed as neural activity averaged over trials, k, and over subsets of task parameters, in analogy to the covariance decomposition done in multivariate analysis of variance. Since our dataset has three parameters, time t, cue modality c (auditory or visual) and word w (eight different words), we obtained the total activity as the sum of the average activity, the marginalizations and a final noise term:

$$X_{tcwk} = \bar{X} + \bar{X}_t + \bar{X}_c + \bar{X}_w + \bar{X}_{tc} + \bar{X}_{tw} + \bar{X}_{cw} + \bar{X}_{tcw} + \epsilon_{tcwk}.$$

The notation of Kobak et al. is the same as used in factorial ANOVA: \(X_{tcwk}\) is the matrix of firing rates for all neurons, \(\langle \cdot \rangle_{ab\ldots}\) is the average over a set of parameters \(a, b, \ldots\), \(\bar{X} = \langle X_{tcwk} \rangle_{tcwk}\), \(\bar{X}_t = \langle X_{tcwk} - \bar{X} \rangle_{cwk}\), \(\bar{X}_{tc} = \langle X_{tcwk} - \bar{X} - \bar{X}_t - \bar{X}_c - \bar{X}_w \rangle_{wk}\) and so on. Finally, \(\epsilon_{tcwk} = X_{tcwk} - \langle X_{tcwk} \rangle_k\).

Participant 1 datasets were composed of N  = 333 (SMG), N  = 828 (S1) and k  = 8. Participant 2 datasets were composed of N  = 547 (SMG), N  = 522 (S1) and k  = 16. To create balanced datasets, error trials were replaced by the average firing rate of k  − 1 trials.

Our second step reduced the number of terms by grouping them, since there is no benefit in demixing a time-independent pure task term \(\bar{X}_a\) from its time–task interaction term \(\bar{X}_{ta}\): all components are expected to change with time. Grouping gives

$$X_{tcwk} = \bar{X} + \bar{X}_t + (\bar{X}_c + \bar{X}_{tc}) + (\bar{X}_w + \bar{X}_{tw}) + (\bar{X}_{cw} + \bar{X}_{tcw}) + \epsilon_{tcwk},$$

which reduces the parametrization to five marginalization terms plus the noise term (reading in order): the mean firing rate, the task-independent term, the cue modality term, the word term, the cue modality–word interaction term and the trial-to-trial noise.

Finally, we gained extra flexibility by using two separate linear mappings, \(F_{\varphi}\) for encoding and \(D_{\varphi}\) for decoding (unlike in PCA, they are not assumed to be transposes of each other). These matrices were chosen to minimize the loss function, with a quadratic penalty added to avoid overfitting:

$$L_{\varphi} = \Vert X_{\varphi} - F_{\varphi} D_{\varphi} X \Vert^{2} + \mu \Vert F_{\varphi} D_{\varphi} \Vert^{2}.$$

Here, \(\mu = (\lambda \Vert X \Vert)^{2}\), where λ was optimally selected through tenfold CV in each dataset.

We refer the reader to Kobak et al. for a description of the full analytic solution.
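To make the marginalization step concrete, the toy NumPy sketch below computes the grouped marginalized averages for firing rates arranged as X[t, c, w, k] for a single neuron. It illustrates the decomposition above only; it is not the dPCA implementation of ref. 63.

```python
import numpy as np

def grouped_marginalizations(X):
    """X: array of shape (T, C, W, K) = (time, cue modality, word, trial).
    Returns the grouped terms of the decomposition in the text."""
    grand_mean = X.mean()
    time = X.mean(axis=(1, 2, 3)) - grand_mean               # task-independent
    cue = X.mean(axis=(2, 3)) - time[:, None] - grand_mean   # cue + time-cue
    word = X.mean(axis=(1, 3)) - time[:, None] - grand_mean  # word + time-word
    interaction = (X.mean(axis=3) - cue[:, :, None] - word[:, None, :]
                   - time[:, None, None] - grand_mean)       # cue-word (+ time)
    noise = X - X.mean(axis=3, keepdims=True)                # trial-to-trial
    return grand_mean, time, cue, word, interaction, noise
```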

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.

Data availability

The data supporting the findings of this study are openly available via Zenodo at https://doi.org/10.5281/zenodo.10697024 (ref. 65 ). Source data are provided with this paper.

Code availability

The custom code developed for this study is openly available via Zenodo at https://doi.org/10.5281/zenodo.10697024 (ref. 65 ).

Hecht, M. et al. Subjective experience and coping in ALS. Amyotroph. Lateral Scler. Other Mot. Neuron Disord. 3 , 225–231 (2002).

Aflalo, T. et al. Decoding motor imagery from the posterior parietal cortex of a tetraplegic human. Science 348 , 906–910 (2015).

Andersen, R. A. Machines that translate wants into actions. Scientific American https://www.scientificamerican.com/article/machines-that-translate-wants-into-actions/ (2019).

Andersen, R. A., Aflalo, T. & Kellis, S. From thought to action: the brain–machine interface in posterior parietal cortex. Proc. Natl Acad. Sci. USA 116 , 26274–26279 (2019).

Andersen, R. A., Kellis, S., Klaes, C. & Aflalo, T. Toward more versatile and intuitive cortical brain machine interfaces. Curr. Biol. 24 , R885–R897 (2014).

Dash, D., Ferrari, P. & Wang, J. Decoding imagined and spoken phrases from non-invasive neural (MEG) signals. Front. Neurosci. 14 , 290 (2020).

Luo, S., Rabbani, Q. & Crone, N. E. Brain–computer interface: applications to speech decoding and synthesis to augment communication. Neurotherapeutics https://doi.org/10.1007/s13311-022-01190-2 (2022).

Martin, S., Iturrate, I., Millán, J. D. R., Knight, R. T. & Pasley, B. N. Decoding inner speech using electrocorticography: progress and challenges toward a speech prosthesis. Front. Neurosci. 12 , 422 (2018).

Rabbani, Q., Milsap, G. & Crone, N. E. The potential for a speech brain–computer interface using chronic electrocorticography. Neurotherapeutics 16 , 144–165 (2019).

Lopez-Bernal, D., Balderas, D., Ponce, P. & Molina, A. A state-of-the-art review of EEG-based imagined speech decoding. Front. Hum. Neurosci. 16 , 867281 (2022).

Nicolas-Alonso, L. F. & Gomez-Gil, J. Brain computer interfaces, a review. Sensors 12 , 1211–1279 (2012).

Herff, C., Krusienski, D. J. & Kubben, P. The potential of stereotactic-EEG for brain–computer interfaces: current progress and future directions. Front. Neurosci. 14 , 123 (2020).

Angrick, M. et al. Speech synthesis from ECoG using densely connected 3D convolutional neural networks. J. Neural Eng. https://doi.org/10.1088/1741-2552/ab0c59 (2019).

Herff, C. et al. Generating natural, intelligible speech from brain activity in motor, premotor, and inferior frontal cortices. Front. Neurosci. 13 , 1267 (2019).

Kellis, S. et al. Decoding spoken words using local field potentials recorded from the cortical surface. J. Neural Eng. 7 , 056007 (2010).

Makin, J. G., Moses, D. A. & Chang, E. F. Machine translation of cortical activity to text with an encoder–decoder framework. Nat. Neurosci. 23 , 575–582 (2020).

Metzger, S. L. et al. A high-performance neuroprosthesis for speech decoding and avatar control. Nature 620 , 1037–1046 (2023).

Moses, D. A. et al. Neuroprosthesis for decoding speech in a paralyzed person with anarthria. N. Engl. J. Med. 385 , 217–227 (2021).

Guenther, F. H. et al. A wireless brain–machine interface for real-time speech synthesis. PLoS ONE 4 , e8218 (2009).

Stavisky, S. D. et al. Neural ensemble dynamics in dorsal motor cortex during speech in people with paralysis. eLife 8 , e46015 (2019).

Wilson, G. H. et al. Decoding spoken English from intracortical electrode arrays in dorsal precentral gyrus. J. Neural Eng. 17 , 066007 (2020).

Willett, F. R. et al. A high-performance speech neuroprosthesis. Nature 620 , 1031–1036 (2023).

Wandelt, S. K. et al. Decoding grasp and speech signals from the cortical grasp circuit in a tetraplegic human. Neuron https://doi.org/10.1016/j.neuron.2022.03.009 (2022).

Anumanchipalli, G. K., Chartier, J. & Chang, E. F. Speech synthesis from neural decoding of spoken sentences. Nature 568 , 493–498 (2019).

Bocquelet, F., Hueber, T., Girin, L., Savariaux, C. & Yvert, B. Real-time control of an articulatory-based speech synthesizer for brain computer interfaces. PLoS Comput. Biol. 12 , e1005119 (2016).

Metzger, S. L. et al. Generalizable spelling using a speech neuroprosthesis in an individual with severe limb and vocal paralysis. Nat. Commun. 13 , 6510 (2022).

Meng, K. et al. Continuous synthesis of artificial speech sounds from human cortical surface recordings during silent speech production. J. Neural Eng. https://doi.org/10.1088/1741-2552/ace7f6 (2023).

Proix, T. et al. Imagined speech can be decoded from low- and cross-frequency intracranial EEG features. Nat. Commun. 13 , 48 (2022).

Pei, X., Barbour, D. L., Leuthardt, E. C. & Schalk, G. Decoding vowels and consonants in spoken and imagined words using electrocorticographic signals in humans. J. Neural Eng. 8 , 046028 (2011).

Ikeda, S. et al. Neural decoding of single vowels during covert articulation using electrocorticography. Front. Hum. Neurosci. 8 , 125 (2014).

Martin, S. et al. Word pair classification during imagined speech using direct brain recordings. Sci. Rep. 6 , 25803 (2016).

Angrick, M. et al. Real-time synthesis of imagined speech processes from minimally invasive recordings of neural activity. Commun. Biol. 4 , 1055 (2021).

Price, C. J. The anatomy of language: a review of 100 fMRI studies published in 2009. Ann. N. Y. Acad. Sci. 1191 , 62–88 (2010).

Langland-Hassan, P. & Vicente, A. Inner Speech: New Voices (Oxford Univ. Press, 2018).

Perrone-Bertolotti, M., Rapin, L., Lachaux, J.-P., Baciu, M. & Lœvenbruck, H. What is that little voice inside my head? Inner speech phenomenology, its role in cognitive performance, and its relation to self-monitoring. Behav. Brain Res. 261 , 220–239 (2014).

Pei, X. et al. Spatiotemporal dynamics of electrocorticographic high gamma activity during overt and covert word repetition. NeuroImage 54 , 2960–2972 (2011).

Oberhuber, M. et al. Four functionally distinct regions in the left supramarginal gyrus support word processing. Cereb. Cortex 26 , 4212–4226 (2016).

Binder, J. R. Current controversies on Wernicke’s area and its role in language. Curr. Neurol. Neurosci. Rep. 17 , 58 (2017).

Geva, S. et al. The neural correlates of inner speech defined by voxel-based lesion–symptom mapping. Brain 134 , 3071–3082 (2011).

Cooney, C., Folli, R. & Coyle, D. Opportunities, pitfalls and trade-offs in designing protocols for measuring the neural correlates of speech. Neurosci. Biobehav. Rev. 140 , 104783 (2022).

Dash, D. et al. Interspeech (International Speech Communication Association, 2020).

Alderson-Day, B. & Fernyhough, C. Inner speech: development, cognitive functions, phenomenology, and neurobiology. Psychol. Bull. 141 , 931–965 (2015).

Muret, D., Root, V., Kieliba, P., Clode, D. & Makin, T. R. Beyond body maps: information content of specific body parts is distributed across the somatosensory homunculus. Cell Rep. 38 , 110523 (2022).

Rosenthal, I. A. et al. S1 represents multisensory contexts and somatotopic locations within and outside the bounds of the cortical homunculus. Cell Rep. 42 , 112312 (2023).

Leuthardt, E. et al. Temporal evolution of gamma activity in human cortex during an overt and covert word repetition task. Front. Hum. Neurosci. 6 , 99 (2012).

Indefrey, P. & Levelt, W. J. M. The spatial and temporal signatures of word production components. Cognition 92 , 101–144 (2004).

Alderson-Day, B., Bernini, M. & Fernyhough, C. Uncharted features and dynamics of reading: voices, characters, and crossing of experiences. Conscious. Cogn. 49 , 98–109 (2017).

Armenta Salas, M. et al. Proprioceptive and cutaneous sensations in humans elicited by intracortical microstimulation. eLife 7 , e32904 (2018).

Cooney, C., Folli, R. & Coyle, D. Neurolinguistics research advancing development of a direct-speech brain–computer interface. iScience 8 , 103–125 (2018).

Soroush, P. Z. et al. The nested hierarchy of overt, mouthed, and imagined speech activity evident in intracranial recordings. NeuroImage 269 , 119913 (2023).

Alexander, J. D. & Nygaard, L. C. Reading voices and hearing text: talker-specific auditory imagery in reading. J. Exp. Psychol. Hum. Percept. Perform. 34 , 446–459 (2008).

Filik, R. & Barber, E. Inner speech during silent reading reflects the reader’s regional accent. PLoS ONE 6 , e25782 (2011).

Lœvenbruck, H., Baciu, M., Segebarth, C. & Abry, C. The left inferior frontal gyrus under focus: an fMRI study of the production of deixis via syntactic extraction and prosodic focus. J. Neurolinguist. 18 , 237–258 (2005).

Roussel, P. et al. Observation and assessment of acoustic contamination of electrophysiological brain signals during speech production and sound perception. J. Neural Eng. 17 , 056028 (2020).

Aflalo, T. et al. A shared neural substrate for action verbs and observed actions in human posterior parietal cortex. Sci. Adv. 6 , eabb3984 (2020).

Rutishauser, U., Aflalo, T., Rosario, E. R., Pouratian, N. & Andersen, R. A. Single-neuron representation of memory strength and recognition confidence in left human posterior parietal cortex. Neuron 97 , 209–220.e3 (2018).

Brainard, D. H. The psychophysics toolbox. Spat. Vis. 10 , 433–436 (1997).

Pelli, D. G. The VideoToolbox software for visual psychophysics: transforming numbers into movies. Spat. Vis. 10 , 437–442 (1997).

Kleiner, M. et al. What’s new in psychtoolbox-3. Perception 36 , 1–16 (2007).

Benjamini, Y. & Hochberg, Y. Controlling the false discovery rate: a practical and powerful approach to multiple testing. J. R. Stat. Soc. B 57 , 289–300 (1995).

Benjamini, Y. & Yekutieli, D. The control of the false discovery rate in multiple testing under dependency. Ann. Stat. 29 , 1165–1188 (2001).

Kobak, D. et al. Demixed principal component analysis of neural population data. eLife 5 , e10989 (2016).

Kobak, D. dPCA. GitHub https://github.com/machenslab/dPCA (2020).

Wandelt, S. K. Data associated to manuscript “Representation of internal speech by single neurons in human supramarginal gyrus”. Zenodo https://doi.org/10.5281/zenodo.10697024 (2024).

Acknowledgements

We thank L. Bashford and I. Rosenthal for helpful discussions and data collection. We thank our study participants for their dedication to the study that made this work possible. This research was supported by the NIH National Institute of Neurological Disorders and Stroke Grant U01: U01NS098975 and U01: U01NS123127 (S.K.W., D.A.B., K.P., C.L. and R.A.A.) and by the T&C Chen Brain-Machine Interface Center (S.K.W., D.A.B. and R.A.A.). The funders had no role in study design, data collection and analysis, decision to publish or preparation of the paper.

Author information

Authors and affiliations

Division of Biology and Biological Engineering, California Institute of Technology, Pasadena, CA, USA

Sarah K. Wandelt, David A. Bjånes, Kelsie Pejsa, Brian Lee, Charles Liu & Richard A. Andersen

T&C Chen Brain-Machine Interface Center, California Institute of Technology, Pasadena, CA, USA

Sarah K. Wandelt, David A. Bjånes, Kelsie Pejsa & Richard A. Andersen

Rancho Los Amigos National Rehabilitation Center, Downey, CA, USA

David A. Bjånes & Charles Liu

Department of Neurological Surgery, Keck School of Medicine of USC, Los Angeles, CA, USA

Brian Lee & Charles Liu

USC Neurorestoration Center, Keck School of Medicine of USC, Los Angeles, CA, USA

Contributions

S.K.W., D.A.B. and R.A.A. designed the study. S.K.W. and D.A.B. developed the experimental tasks and collected the data. S.K.W. analysed the results and generated the figures. S.K.W., D.A.B. and R.A.A. interpreted the results and wrote the paper. K.P. coordinated regulatory requirements of clinical trials. C.L. and B.L. performed the surgery to implant the recording arrays.

Corresponding author

Correspondence to Sarah K. Wandelt .

Ethics declarations

Competing interests

The authors declare no competing interests.

Peer review

Peer review information

Nature Human Behaviour thanks Abbas Babajani-Feremi, Matthew Nelson and Blaise Yvert for their contribution to the peer review of this work. Peer reviewer reports are available.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Supplementary Information

Supplementary Figs. 1–5.

Reporting Summary

Peer Review File

Supplementary Video 1

The video shows the participant performing the internal speech task in real time. The participant is cued with a word on the screen. After a delay, an orange dot appears, during which the participant performs internal speech. Then, the decoded word appears on the screen, in green if it is correctly decoded and in red if it is wrongly decoded.

Supplementary Data

Statistical source data for Figs. 3–6.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/ .

About this article

Cite this article

Wandelt, S.K., Bjånes, D.A., Pejsa, K. et al. Representation of internal speech by single neurons in human supramarginal gyrus. Nat Hum Behav (2024). https://doi.org/10.1038/s41562-024-01867-y

Received : 15 May 2023

Accepted : 16 March 2024

Published : 13 May 2024

DOI : https://doi.org/10.1038/s41562-024-01867-y

May 16, 2024

Device Decodes ‘Internal Speech’ in the Brain

Technology that enables researchers to interpret brain signals could one day allow people to talk using only their thoughts

By Miryam Naddaf & Nature magazine

[Image: human brain with highlighted supramarginal gyrus, computer illustration. Credit: Kateryna Kon/Science Photo Library]

Scientists have developed brain implants that can decode internal speech — identifying words that two people spoke in their minds without moving their lips or making a sound.

Although the technology is at an early stage — it was shown to work with only a handful of words, and not phrases or sentences — it could have clinical applications in future.

Similar brain–computer interface (BCI) devices, which translate signals in the brain into text, have reached speeds of 62–78 words per minute for some people. But these technologies were trained to interpret speech that is at least partly vocalized or mimed.

The latest study — published in Nature Human Behaviour on 13 May — is the first to decode words spoken entirely internally, by recording signals from individual neurons in the brain in real time.

“It's probably the most advanced study so far on decoding imagined speech,” says Silvia Marchesotti, a neuroengineer at the University of Geneva, Switzerland.

“This technology would be particularly useful for people that have no means of movement any more,” says study co-author Sarah Wandelt, a neural engineer who was at the California Institute of Technology in Pasadena at the time the research was done. “For instance, we can think about a condition like locked-in syndrome.”

Mind-reading tech

The researchers implanted arrays of tiny electrodes in the brains of two people with spinal-cord injuries. They placed the devices in the supramarginal gyrus (SMG), a region of the brain that had not been previously explored in speech-decoding BCIs.

Figuring out the best places in the brain to implant BCIs is one of the key challenges for decoding internal speech, says Marchesotti. The authors decided to measure the activity of neurons in the SMG on the basis of previous studies showing that this part of the brain is active in subvocal speech and in tasks such as deciding whether words rhyme.

Two weeks after the participants were implanted with microelectrode arrays in their left SMG, the researchers began collecting data. They trained the BCI on six words (battlefield, cowboy, python, spoon, swimming and telephone) and two meaningless pseudowords (nifzig and bindip). “The point here was to see if meaning was necessary for representation,” says Wandelt.

Over three days, the team asked each participant to imagine speaking the words shown on a screen and repeated this process several times for each word. The BCI then combined measurements of the participants’ brain activity with a computer model to predict their internal speech in real time.

For the first participant, the BCI captured distinct neural signals for all of the words and was able to identify them with 79% accuracy. But the decoding accuracy was only 23% for the second participant, who showed preferential representation for ‘spoon’ and ‘swimming’ and had fewer neurons that were uniquely active for each word. “It's possible that different sub-areas in the supramarginal gyrus are more, or less, involved in the process,” says Wandelt.

Christian Herff, a computational neuroscientist at Maastricht University in the Netherlands, thinks these results might highlight the different ways in which people process internal speech. “Previous studies showed that there are different abilities in performing the imagined task and also different BCI control abilities,” adds Marchesotti.

The authors also found that 82–85% of neurons that were active during internal speech were also active when the participants vocalized the words. But some neurons were active only during internal speech, or responded differently to specific words in the different tasks.

Although the study represents significant progress in decoding internal speech, clinical applications are still a long way off, and many questions remain unanswered.

“The problem with internal speech is we don't know what’s happening and how is it processed,” says Herff. For example, researchers have not been able to determine whether the brain represents internal speech phonetically (by sound) or semantically (by meaning). “What I think we need are larger vocabularies” for the experiments, says Herff.

Marchesotti also wonders whether the technology can be generalized to people who have lost the ability to speak, given that the two study participants are able to talk and have intact brain speech areas. “This is one of the things that I think in the future can be addressed,” she says.

The next step for the team will be to test whether the BCI can distinguish between the letters of the alphabet. “We could maybe have an internal speech speller, which would then really help patients to spell words,” says Wandelt.

This article is reproduced with permission and was first published on May 13, 2024.

Department of Psychological and Brain Sciences

Pay Attention to Speech

Adults 18 years and older (<30 years) are needed for a study on how seeing a speaker’s face helps with understanding what the speaker said. Researchers in the Language, Intersensory Perception, and Speech (LIPS) lab at UMass Amherst are seeking volunteers with American English as their first language. For more eligibility criteria, contact us. In this study, we test your ability to recognize speech and direct your attention. All tasks will involve button presses or typing short responses with a computer keyboard. While you are performing these tasks, your brainwaves (EEG) will be recorded by electrodes wet with gel that will be placed on your head. Be prepared to leave the experiment with a bit of gel in your hair. We will also collect demographic information from you and may test your hearing.

100+ Excellent Topics for A Stellar Persuasive Speech

This comprehensive blog post serves as a vital resource for anyone looking to craft an impactful persuasive speech. It provides an extensive list of over 100 compelling topics tailored for a wide range of interests and academic fields. Additionally, it offers advanced guidance on selecting the perfect topic, structuring your arguments effectively, and employing persuasive techniques that captivate and convince your audience. Whether you're an academic achiever or an aspiring public speaker, this guide equips you with the insights to deliver a stellar persuasive speech.

Before You Pick the Perfect Topic...

If you’re struggling to find a strong topic for a persuasive speech, you’ll find 100+ ideas for subjects and topics below. Use one that grabs you, or simply find inspiration to get unstuck and come up with a topic about something you and your audience will find interesting.

To help you think about the big picture — your larger essay — we also review what makes a truly effective persuasive speech, all the ingredients of an effective topic, and how to pick the best topic for your circumstances.

Here's what's most essential as you consider your topic choices:

  • pick a topic that has the right scope, one aligned with your larger assignment
  • be sure the topic is one you're interested in researching, has meaning and relevance for your audience, and has the right level of complexity — both for your audience and for your level of speech writing prowess
  • remember your topic should align with themes and subjects related to your circumstances and the speech requirements

Finally, once you’ve picked your topic, and even if you know all the basics — which I’m guessing you do if you’re following posts from Crimson Education — you might still benefit from other advice in today's post, such as numerous speech writing tips and strategies designed to save you time and stress and improve the odds your final speech will exceed expectations.

Here's what you'll find:

  • What Makes a Truly Remarkable Persuasive Speech
  • The Ingredients of an Effective Topic, and Tips for Picking Your Topic
  • 100+ Topic Suggestions
  • How to Develop a Stellar Persuasive Speech — Step-by-Step!

Still feeling a bit hesitant or stuck?

Don’t worry. Once you've picked a really interesting and effective topic and start your research, you'll quickly become a subject-matter expert, regaining both motivation and confidence for all the remaining steps.

What Makes a Truly Remarkable Persuasive Speech?

A good persuasive speech will grab the audience’s attention, help them connect with the speaker (that’s you), and guide their reasoning process — giving the speech the power to show your audience why your point of view is logical and compelling, and also superior to the opposing viewpoints.

The 6 Most Essential Ingredients

  • A strong introduction that gets the audience engaged and provides context about the subject and topic, what’s at stake (why it matters), and what issues or concerns tend to be front and center
  • A clear thesis in the form of a specific point of view, opinion, or argument
  • An orderly progression of ideas and arguments, each argument or subtopic supported by logic and evidence
  • An anticipation of opposing viewpoints and arguments (the counterarguments to your opinion)
  • Your responses or ‘rebuttals’ to the opposing viewpoints , answering the anticipated objections and adding additional support for your point of view or thesis
  • A conclusion that highlights the most powerful persuasive elements in your speech and reminds listeners what's at stake, including, if suitable, a call to action

The Historical Roots of Persuasive Speech

Did you know that persuasive speech assignments may be testing your mastery of concepts that go back as far as ancient Greece?

The emergence of democracy in ancient Greece (the 6th and 5th centuries B.C.) created a space for the rule of law and political governance informed by the will of the people — making persuasive speech an essential element of social life.

From courtroom trials to political campaigns and democratic assemblies, persuasive speech emerged in 5th-century Athens as an essential tool of democracy.  Soon the brightest philosophers of the day became concerned with the principles of "rhetoric" — the study of orderly and effective persuasive speaking.

Now, thousands of years later, little has changed in Western democracies: "constructing and defending compelling arguments remains an essential skill in many settings" (Harvard U, Rhetoric). In short, the principles of deliberation, free speech, and consensus building we use for governance, in school, extracurricular activities, at work, and sometimes our day-to-day life still rely on persuasive speech.

In every free society individuals are continuously attempting to change the thoughts and/or actions of others. It is a fundamental concept of a free society.

- Persuasive Speaking, by R. T. Oliver, Ph.D.

How The Rhetorical Triangle Can Turbo-charge Your Speech

The ancient Greek philosopher Aristotle, writing in Athens in the 4th century B.C., argued that your ability to persuade is based on how well your speech appeals to the audience in three different areas: logos, ethos, and pathos, sometimes referred to as the three points of the rhetorical triangle.

From observation and reflection Aristotle understood that humans are thinking animals (logos), social and moral animals (ethos), and emotional animals (pathos) — such that appealing to all three of these pillars of human understanding and action is essential to an effective persuasive speech.

1. Logos — Using clear, logical, and evidence-based reasoning and argumentation to add persuasive power to your speech.

For obvious reasons, audiences will typically expect strong arguments supported by evidence and clear reasoning and logic, all elements that are often prominent on grading rubrics for persuasive speeches.

Maybe you're thinking of speeches you've heard that utterly lacked logic and evidence? It's a reminder that persuasion as such is ultimately about points of view and not always about facts. Even without logic, a speaker can persuade, through effective uses of ethos and pathos, for example. In other instances social phenomena may underlie a lack of logic and evidence, such as "group think," for example, when people are swayed or swept up by a common point of view about an issue, instead of thinking critically about it.

2. Ethos — The component of persuasive speaking that spotlights the appeal, authority, credentials, and moral standing of the speaker.

Have you ever agreed with a speaker simply because you liked the person speaking, or rejected an argument because you disliked a speaker, responding to who the speaker is more than to their arguments? That may not be very logical, but it is very natural for us humans.

Aristotle understood this, that persuasion relies not solely on logical thinking but on relational factors too, including how much we trust a speaker, how much we believe in the integrity of their motives, and the knowledge and expertise they possess (or are perceived to possess).

  • Take law courts, for example. One common strategy lawyers use to undermine the force of witness testimony is to “discredit” or “taint” the witness, to undermine jurors' confidence in the veracity and motives of the witness. That's using ethos, rather than logic and facts, to impact an audience (the jury).

Likewise, when an audience has a high regard for the speaker's reputation, authority, and credibility, the more convincing that person's arguments are likely to be.

Suggestions for enhancing appeals to ethos in your speech:

  • Share a transformative journey where you shifted from an opposing perspective to your current stance due to overwhelming evidence. This approach can demonstrate your capacity for logic and open-mindedness, helping your audience see you as very rational and impartial, potentially strengthening your credibility.
  • Incorporate the viewpoints and expertise of respected authorities to bolster your arguments. Referencing reliable sources and experts boosts your credibility by showing you've grounded your arguments in established facts, perspectives, and ideas.
  • Foster a connection with your audience. For example, rather than overwhelming them with complex reasoning to showcase your intelligence, strive to comprehend and reflect their perceptions and potential biases regarding your topic. This should make your audience more receptive to your logic and perspectives as your speech progresses.
  • Employ personal anecdotes or lived experiences that unveil a deeper layer of understanding and wisdom. This personal touch not only humanizes you, the speaker, but makes your arguments more relatable and persuasive.

Depending on circumstances, you may think of additional ways to bolster your credibility and trustworthiness — enhancing your standing in the eyes of the audience in order to elevate the persuasive impact of your speech!

3. Pathos — This means injecting your speech with some powerful appeals to listeners' feelings and emotions, in addition to using logic and reason.

For example, if your speech entails persuading voters to increase foreign aid to combat world hunger, you wouldn’t just want to cite cold statistics. Painting a picture of the ways malnutrition affects real individuals is likely to have a strong impact on listeners' emotions, appealing to their innate capacity for compassion towards others and helping them more deeply appreciate the urgency of the subject. This approach impacts listeners' emotions and highlights an urgent and universal moral imperative that adds conviction to your point of view.

In most academic settings, you'll be expected to present a speech with a strong line of evidence-based, logical reasoning, often making appeals to logos prominent in persuasive speeches in school settings. That said, by injecting and balancing appeals to logos, ethos, and pathos, based on what's most suitable for your topic, assignment, and approach, you'll add a significant measure of mastery to your persuasive writing method.

A Consistent Style and Tone

What style, voice, and tone best suit your personality, the occasion, the listeners, and your subject?

  • Consider adopting a straightforward, clear, and succinct style, reminiscent of a newspaper editorial or a no-nonsense argument in a voter guide. This approach works well for topics and settings requiring direct communication with clear insights and persuasive arguments free from subjectivity and unnecessary analysis and complexity.
  • For topics, interests, or assignments that naturally entail wading through broader philosophical and ethical debates — like debating justifications for euthanasia or arguments against the death penalty — a more introspective, contemplative voice may be expected. This style allows for a deeper exploration of moral dimensions and the broader implications of the issue at hand or the underlying logical principles involved.
  • If your inclination is towards something more unconventional, employing humor and wit could be a chance to take the road less traveled! Whether through irony or parody, for example, by showcasing a humorous topic from the outset, such as "why dog people outshine cat people," or cleverly presenting weaker arguments to underscore your point, this strategy, while offbeat, can captivate and entertain, making your speech stand out in a large class setting. Just be sure to balance the creativity with a clear demonstration of your persuasive speech skills and consider checking in with your teacher about possibilities and expectations beforehand.

With a broader understanding of what goes into a great persuasive speech, you’re better equipped for the important step of picking the topic that will guide your speech.

Picking Your Topic — Questions to Ask

Does it interest you?

Conveying passion for a topic is infectious, adding power to your speech. The more interested and invested you are in your subject and topic, the more likely you are to make your speech the best it can be.

Will the topic interest your audience?

Understanding your audience's values, interests, and views will help you make immediate connections with their own thought processes and attitudes. Try to pick a topic that will get your listeners to perk up and move to the edge of their seats.

Is the topic or point of view fresh and engaging?

Choosing a topic that's novel, contemporary, or presents a unique angle on a familiar issue should help you captivate your audience's attention. You also want the topic to be something that matters to your audience with a point of view that challenges their thinking, so you're not just "preaching to the choir."

Are there any "triggers" or otherwise "sensitive" or "inappropriate" themes?

You might not think there’s any problem with a topic such as Should we build a wall to keep immigrants out of the country? Or, Should same-sex marriage be legal? That said, topics that delve into identity politics, or areas so controversial that they elicit anger or hostility rather than dialogue and debate, may lead to emotional hurt and harm, even if not intended. If you have any doubts, check in with your teacher or a school counselor before settling on your topic!

Finding Subjects and Topics on Your Own

Before you jump ahead and grab a ready-made topic from the list below, remember that a quick brainstorming session or online search could be the best way to find the most interesting topic for your audience, setting, and individual interests or class requirements. For example, an internet search with keywords such as “biggest problems or biggest issues in the world today” will quickly uncover a host of themes and subjects that are both timely and controversial.

Search Results for the Keyword Phrase “Contemporary World Problems and Issues”

  • Water contamination
  • Human rights violations
  • Global health issues
  • Global poverty
  • Children's poor access to healthcare, education and safety
  • Access to food and hunger
  • Anti-corruption and transparency
  • Arms control and nonproliferation
  • Climate and environment
  • Climate crisis
  • Combating drugs and crime
  • Countering terrorism
  • Cyber issues
  • Economic prosperity and trade policy
  • Technology and privacy

A General List vs. Time & Place Factors

What’s timely for you and your audience depends on where you live and on your circumstances. Finding a “hot topic” in your specific time and place could be an effective way to get listeners' attention and address an issue that feels highly relevant.

  • Is there a big policy decision that’s a hot topic at your school?
  • Is there a ballot initiative your community will vote on soon that your audience has strong opinions about?
  • Is there a current events issue in your local news headlines that offers a compelling persuasive speech topic?
  • What’s before Congress these days, or before the Supreme Court, or the United Nations, this week? Any great topics there for your speech?

More Inspiration: 100+ Interesting Persuasive Speech Topics for High School

If you haven’t already navigated your way to an interesting persuasive speech topic, use the list below for even more ideas and inspiration!

You can go from top to bottom, or you can skip ahead to the themes that most interest you, such as Art and Culture or Recreation and Tourism.

Art and Culture

1. Is digital art really art?

2. Street art: vandalism or cultural expression?

3. Is there a place for censorship in the music industry?

4. Do museums promote culture or appropriate culture?

5. Should other countries have a minister of culture or similar government office, as they do in France?

6. Can schools, or art teachers, define good art vs. bad art? Should they?

7. Censorship in art: when is it justified or necessary?

8. Does creative freedom take precedence over cultural appropriation?

9. The impact of digital platforms on the consumption of art and the value of art.

10. Is there a role for public policy and public funding in arts and culture?

Economics and Finance

1. The pros and cons of minimum wage laws and policies.

2. Cryptocurrency: the future of finance or a scam?

3. Is student loan debt relief good policy?

4. Gender wage gap: are the concerns justified or unjustified?

5. Sustainable development: is there a way to sustain economic growth without an environmental catastrophe?

6. The role of small businesses in the economy: do they promote prosperity or undermine efficiency?

7. Globalization: economic boon or bane?

8. Is consumerism in the general interest or a threat to the planet?

9. The economic effects of climate change: should they be paid for now or later?

10. Universal Basic Income: a solution to poverty or a disincentive to work?

Education

1. The case for and against school uniforms.

2. Should non-citizens be allowed to vote in school board elections?

3. The impact of technology on education.

4. Should college education be free?

5. The importance of teaching financial literacy in schools: promoting independent living or consumerism?

6. Should parents have the right to homeschool children against the children's will?

7. Is the grading system improving learning?

8. Is mandatory attendance a good policy for high school?

9. Addressing the mental health crisis in schools: is it an individual problem or a social one?

10. Arts education: valuable or a waste of time?

Environmental Issues

1. The urgency of addressing climate change and what to do about it.

2. Plastic pollution: are more stringent government regulations, policies, and laws the answer?

3. Should the government subsidize clean energy technologies and solutions?

4. The importance of water conservation: but who's responsible?

5. Should there be a global environmental tax? On what?

6. Should environmental costs be factored into everyday economic activity?

7. The impact of fast fashion on the environment.

8. The necessity of protecting endangered species.

9. Deforestation: Who's impacted? Who should have power (or not) to stop it?

10. Are electric cars truly better for the environment?

Family and Parenting

1. The changing dynamics of the modern family.

2. The role of the state in protecting children from parents and guardians.

3. Should adoption records be open or sealed?

4. How can employers, or employment laws, support healthy families?

5. Is there an age when euthanasia should become universally legal and accessible?

6. How to balance parental rights with child welfare.

7. Is your child's gender something they're born with, or something they should be free to choose?

8. The responsibilities of women vs. men in addressing an unplanned pregnancy.

9. Should parents restrict children's use of technology? What is too lax vs. what is too restrictive?

10. Balancing discipline and love in parenting.

Health, Nutrition, & Fitness

1. Should junk food advertising be regulated?

2. The dangers of fad diets: free market vs. consumer protection.

3. Should junk food be banned in schools?

4. Nutrition: are schools failing to teach it?

5. Should students be graded on their fitness and nutrition levels and habits?

6. Should sports programs be replaced by fitness education?

7. E-cigarettes: should they be regulated or not?

8. The obesity epidemic: a problem of individual responsibility, genetics, or social policy?

9. Are agricultural subsidies good for health and the environment?

10. Should teens have more options for balancing school attendance and individual sleep needs and preferences?

Media, Social Media, and Entertainment

1. The effects of social media on teenagers.

2. Should there be regulations on influencer marketing?

3. The impact of video games on behavior.

4. Fake news: Its impact and how to combat it.

5. The role of media in shaping public opinion.

6. Privacy concerns with social media platforms.

7. The influence of celebrities on youth culture: is there a role for rewards and consequences to shape celebrities' public behaviors?

8. Digital detox: pros and cons.

9. Media portrayal of women and its societal impact.

10. Censorship in media: necessary or oppressive?

Politics and Society

1. The importance and limits of voting in a democracy.

2. Gun control laws: balancing safety and liberty.

3. The impact of immigration: universal human rights vs. national sovereignty.

4. The death penalty: justice vs. ethics?

5. The legalization of marijuana: the right policy?

6. The right to protest vs. public order.

7. Affirmative action: whose definition of "fairness" do we use?

8. The future of healthcare in America: market solutions or a public option?

9. Climate change policy: national vs. global approaches.

10. The role of the United Nations in today's world.

Recreation & Tourism

1. The benefits of outdoor recreation.

2. Sustainable tourism: protecting nature while promoting travel.

3. The impact of tourism on local cultures.

4. The future of space tourism.

5. The effects of recreational activities on mental health.

6. The importance of historical preservation in tourism.

7. Adventure tourism: a reasonable or unreasonable risks-vs.-rewards proposition?

8. The effects of over-tourism on popular destinations and local communities.

9. Is eco-tourism the right way to promote environmental sustainability?

10. Does international tourism help or harm indigenous peoples, cultures, and communities?

Science and Technology

1. Do the ethical downsides of genetic engineering outweigh the potential benefits?

2. The potential and pitfalls of artificial intelligence in society.

3. Climate change denial: is it fully within the bounds of free speech?

4. Competing views of vaccine policies and individual rights in pandemics and other health emergencies.

5. Space exploration: is it worth the investment?

6. The use of affirmative action to diversify STEM education and workforce.

7. The impact of technology on job displacement and future employment: is a universal income the right answer?

8. Do renewable energy technologies offer a feasible path to eliminating fossil fuels?

9. Ocean pollution: is more government regulation the answer?

10. Protecting biodiversity vs. the right to economic prosperity.

Sports and School Athletics

1. The emphasis on athletic programs in high schools: is the hype benefiting students?

2. Should college athletes be compensated?

3. Do teamwork and group activities help or hinder academic and athletic development?

4. Should schools require more physical education or less?

5. Should there be more emphasis on non-competitive formats in high school and college athletics?

6. The influence of professional athletes as role models: good or bad?

7. Doping in sports: are athletic programs teaching the wrong values?

8. The benefits and risks of contact sports in high school athletics.

9. Should there be absolute gender equality in school athletics?

10. What should the educational goal of school athletics be?

These topics span a broad spectrum of interests and concerns — look for one that matters to you and your audience, is likely to prompt insightful dialogue or debate, and is challenging enough to put your individual persuasive speech skills to the test!


Use Diligent Research to Make a Watertight Argument

To go from just any persuasive speech to a truly riveting one, you’ll want to dig around until you find compelling and authoritative research. Even if you're already knowledgeable about your topic, applying yourself with patience and perseverance at this early stage will usually pay off, allowing you to uncover some real gems when it comes to compelling facts and expert perspectives.

What to look for:

  • Facts, statistics, and surveys
  • An expert analysis of a policy or issue
  • Quotes from compelling experts, from books, editorials, or speeches
  • Anecdotal evidence in the form of isolated events or personal experiences that don’t have much statistical significance but can illustrate or capture something powerful that supports your point of view, or add emotional appeal
  • Graphs, tables, and charts

Riveting research will better position you to hit some home runs when you put together your speech. And remember, research is primarily there to build a strong logical argument (logos), but citing and spotlighting reputable sources will also lend your speech greater persuasive credibility (ethos), just as experiential perspectives can add appeals to emotion (pathos).

Define Your Thesis

Clearly articulate your stance on the topic. This thesis statement will guide the structure of your speech and inform your audience of your central argument.

I like to create a "working thesis" as a planning tool, something that encapsulates and maps my point of view and essential supporting arguments, and as a way to uncover gaps in my reasoning or evidence early on. Later, it also gives me a ready guide for writing my outline.

Essential Elements of a ‘working thesis’ for a persuasive speech:

  • The subject (including how you'll frame the context for your topic and speech)
  • Your main point of view
  • List of principal arguments
  • The most important counterarguments
  • Key rebuttals to the counterarguments
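
For example, a working thesis for the topic "Why space exploration is not worth the investment" might map out like this (the details here are invented purely for illustration):

  • Subject: Government funding for human space exploration, framed as a question of public spending priorities
  • Point of view: The money would do more good redirected to urgent needs on Earth
  • Principal arguments: High costs with uncertain returns; pressing underfunded needs in health, education, and climate; robotic missions can deliver much of the science at lower cost
  • Counterarguments: Exploration drives innovation and inspires future scientists; spinoff technologies benefit everyone
  • Rebuttals: Targeted research funding produces innovation more directly; inspiration and spinoffs don't require costly crewed missions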

As you can see, this kind of "working thesis" gives you a bird's eye view of your thesis along with all the key components of your speech and the reasoning you’ll deploy.

Marshaling Your Evidence

As you delve into researching your chosen topic, such as "Why space exploration is not worth the investment," you'll accumulate evidence, including data, anecdotes, expert opinions, and more. This evidence is vital for adding depth, credibility, and persuasion to your speech. You also need to strategically align the evidence with each of your supporting arguments, ensuring that each claim you make is substantiated.

You can use a simple table format to visually map out how you want to align your subtopics and evidence.

Here's an example using the topic "Why space exploration is not worth the investment."
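
One way such a table might look, with placeholder entries rather than real research:

  Argument / Subtopic                       Evidence to Align With It
  High costs with uncertain returns         Placeholder budget figures and examples of mission cost overruns
  Urgent unmet needs here on Earth          Placeholder statistics on underfunded health, education, and climate programs
  Robotic missions deliver the science      Placeholder expert opinions comparing robotic and crewed mission outcomes
  Counterargument: innovation spinoffs      Placeholder rebuttal evidence that direct R&D funding drives innovation more efficiently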

This table is just for illustration, and doesn't use real data or opinions, but you can see how organizing your evidence ahead of time can help you logically present and support your arguments and subtopics. It can also help you spot gaps, in case you need to do additional research, and it gives you a head start on the next step: outlining your speech!

Make an Outline

Begin with a structured outline to ensure your speech flows logically from one point to the next. Your outline should include:

  • introduction elements
  • key subtopics and the relevant arguments and evidence, examples, anecdotes, or citations, all in sequential order
  • key wording for any important or challenging transitions from one line of thought to the next, or from one subtopic to the next
  • a section for responding to opposing arguments and viewpoints, with the specific rebuttals, all in sequential order
  • key points for your conclusion

Drafting Body Paragraphs, Your Introduction & Conclusion

Now you're making your first rough attempts at turning the key content of your speech into phrases, sentences, and paragraphs. This is a good point to refocus on the tone, style, and voice you want to use, and on how to use it consistently.

Pro Tip: Write your introduction and conclusion after drafting all of your body paragraphs, because you want these two sections to really capture the essence of the larger speech.

Introduction: Start with a strong hook. This could be a startling statistic, a compelling quote, or a relatable and captivating (or entertaining) anecdote. Then briefly preview your main points to set the stage for your argument.

Conclusion: Reinforce your thesis with concise references to the primary evidence you presented. End with a powerful closing statement that reminds your audience of why this topic is important. As suitable, you can also call your audience to action or leave them with something significant to ponder on their own.

Balancing Pathos, Logos, Ethos

Ensure a harmonious balance among logos (logical appeal), ethos (establishing your credibility and using evidence from credible sources and quotes or perspectives from credible authorities), and pathos (emotional appeal).

Checklist for Balancing Logos, Ethos, and Pathos

Here's a rubric, adapted from a real university writing rubric for persuasive speeches, that can help you elevate appeals to logos , ethos , and pathos in your speech.

  • Is the thesis clear and specific?
  • Is the thesis supported by strong reasons and credible evidence?
  • Is the argument logical and well organized?
  • What are the speaker’s qualifications?
  • How has the speaker connected him/herself to the topic being discussed?
  • Does the speaker demonstrate respect for multiple viewpoints, and respond to them with thoughtful arguments?
  • Are sources credible?
  • Are tone, style, and word choice appropriate for the audience/purpose?
  • Is the speech polished and written with care?
  • Are vivid examples, details and images used to engage the listeners' emotions and imagination?
  • Does the writer appeal to the values and beliefs of the listeners by using examples the audience can relate to or cares about?

Revise & Polish

Review your speech and revise for clarity, flow, sentence structure, and word choice.

Remember to use a voice and style consistent with making a speech, with the topic and subject matter, and the specific circumstances for your speech.

Remove any jargon or unnecessary details that might distract from your message.

Sharpen your arguments, making sure they are clear, concise, and compelling.

Practice the Delivery

Dedicate ample time to practicing what it will be like giving your speech. Focus on mastering the tone, pace, and volume of your delivery. If you have time limits on the speech, be sure to time your delivery as well, and make any needed adjustments. Consider body language, eye contact, and gestures, as these non-verbal cues can significantly impact your speech's effectiveness.

The more comfortable and familiar you are with your speech, the more confidently you'll present it.

Being nervous is normal for lots of people. Practice will help: with better command of your speech, you'll feel more confident. Practicing your delivery with a friend who can listen and give you feedback is also a good way to catch helpful adjustments.


Final Thoughts

Finding a topic you like and one that your audience will be interested in is a critical foundation for an effective persuasive speech. It will also help you stay motivated and get more out of the experience!

Just remember that investing in some extra research, some thoughtful organization, anticipating counterarguments, and artfully weaving in ethos and pathos alongside a strong line of evidence-based arguments ( logos ) will help you elevate your speech and your learning experience.

With the insights we've just shared, you're more than ready to turn what is often a rote class exercise into something far more artful. In addition, your effort will help prepare you for college — for debating, editorial writing, legal argumentation, public policy advocacy, public speaking, and even running for ASB President!

If you're interested in taking on the challenge of more advanced research and persuasive writing, or even projects that involve scholarly publication, be sure to reach out to a Crimson Education Advisor for information on ways to get connected to advanced online courses and any number of cool capstone and research projects that will also connect you to networks of motivated young scholars and top-notch research and writing mentors.

About the Author

Keith Nickolaus

Keith Nickolaus is a former educator with a passion for languages, literature, and lifelong learning. After obtaining a B.A. from UC Santa Cruz and exploring university life in Paris, Keith earned his Ph.D. in Comparative Literature from UC Berkeley, and then worked for 16 years in K12 education before setting up shop as a freelance writer.



Glenn Geher Ph.D.

Free Speech Belongs to All of Us

Personal Perspective: Restrict freedom of expression, and democracy suffers.

Updated May 19, 2024 | Reviewed by Ray Parker

  • Several years ago, free speech was a hot topic. Many touted it as a tool of the far right.
  • Now, with many protests related to the Middle East occurring, free speech is touted by political progressives.
  • Protecting free speech rights includes expressions we agree with and those we disagree with.
  • When we pick and choose which ideas should be supported by free speech rights, democracy itself takes a hit.


Several years ago, our campus dis-invited a conservative speaker who was set to speak on issues related to the then-upcoming 2016 presidential election. While I identify politically very differently from said speaker (proud member of the Working Families Party of New York right here, if you're wondering), I truly believe in the importance of freedom of speech and its several sibling concepts (e.g., academic freedom and open inquiry). As an academic who is interested in having ideas from a broad array of viewpoints be expressed and explored as part of knowledge creation, I care deeply about ensuring people's right to express themselves. In 2016, I agreed to head a task force on free speech for the campus to help our community deal with the dis-invite, which many folks found concerning.

At the time, many people were unhappy that said conservative speaker was re-invited. And I think that the free speech task force that I headed may not have been the most popular entity on campus at the time. But regardless of how ardently I personally disagreed with pretty much everything that this particular speaker said (who did end up speaking on campus eventually), to this day, I stand by the basic principle of freedom of speech as a basic right in a democracy. Allowing him to speak at a public university within standard parameters that surround free speech, such as those pertaining to safety, defamation, and genuine hate speech, was, as I see it, simply the right thing to do. And if people disagreed with his points, then this forum would allow them to raise their concerns directly with him in a public manner. And that is exactly what happened.

Back then (about eight years ago now), supporting free speech was often conflated in people's minds with some kind of far-right agenda—an agenda that is often antithetical to ideological norms on many campuses today (see Burmila, 2021). I heard people argue that free speech needed to have limits, that it is an inherently unfair concept as some people in society have more opportunities to express themselves than do others, and that free speech was something of a tool of the far-right to maintain some sort of status quo. While I am actually sympathetic to some of these concerns, at the end of the day, a democracy without the right to free speech is not really a democracy at all in my book.

A lot has changed in eight years. Without getting into too much of the details, the current war between Israel and Hamas has, throughout the world, it seems, given the topic of free speech front-and-center stage once again.

However, it is interesting to see that the politics of free speech seem to have changed—partly as a matter of convenience. On so many campuses, several students and other activists this past semester chose to exercise their free speech rights to make statements against much of the brutality and horror that has been launched as part of that war. Students, professors, and all kinds of activists have been taking to activism (e.g., assembling to express their opinions, carrying picket signs that express their views, etc.). As an advocate of free speech (see a recent paper related to this issue that I coauthored with several others: Clark et al., 2023), I support these individuals in their efforts, to the extent that they are carried out peacefully and safely, regardless of my stance on the issues that they are concerned about. In other words, to my mind, free speech protections and rights must be distributed across the board (again, keeping in mind standard limitations pertaining to such issues as inciting violence, defamation, libel, etc.).

The Free Speech Irony of 2024

It is more than a little interesting to me that the same people who argued against free speech when it came to dis-inviting conservative speakers seem to be adamantly standing by the tenets of free speech and First Amendment rights when it comes to supporting expressions about the Hamas/Israel war on campus. By and large, these campus protests (conspicuously documented at such campuses as UCLA and Columbia—along with many others) have taken a pro-Palestinian viewpoint. And given that more than 30,000 Palestinians have lost their lives in this conflict (with a large proportion of the deceased having been children), it is not hard to understand the outrage and concerns that are being expressed (although, of course, this is a famously complex issue with deep historical and political roots—all of which is beyond the purview of this piece). In any case, a true advocate of freedom of speech should be blind to any particular viewpoint that is being expressed. That is the whole idea of free speech in the first place.

Many academics who decried free speech several years ago when conservative speakers were being dis-invited from campuses left and right are now citing the importance of free speech when it comes to allowing for peaceful protests and demonstrations that are largely consistent with their viewpoints.

When It Comes to Free Speech, We Cannot Pick and Choose

When people support free speech on a convenience basis, free speech rights become lost. The First Amendment of the Constitution does not specify that freedom of speech applies to some viewpoints but not to others. We may disagree ardently with someone's viewpoint. But disallowing that viewpoint to be expressed—particularly in public, government-owned spaces (such as campuses of state universities)—has the capacity to reduce freedom of expression for all of us down the line.

Then They Came for Me

This renowned quote, "... then they came for me ...," is often attributed to Martin Niemöller in reference to the atrocities of the Holocaust. The point, which speaks for itself in these five simple yet chilling words, bears directly on the issue of reducing free speech rights. The second that people start to pick and choose which ideas are allowed to be expressed freely and which are not, we all (perhaps without realizing it) start down a slippery slope. If a strong and vocal group successfully shuts down free expression regarding Issue X, that could come back and bite those same individuals at a later point when they are trying to express their viewpoints on Issue Y. Down the line, free speech rights end up being diluted for everyone.

When it comes to efforts to reduce the free speech of others, people shouldn't be surprised that, at some point, similar efforts may well be directed at them. In other words, if you actively take steps to reduce the free speech of others, and free speech rights become diluted in general, the "anti free speech police" may well come for you one day.


Bottom Line

I feel fortunate to live in a democracy. It is not perfect. Not by any means. But I find myself as someone with a lot to say on lots of topics, and I am truly grateful for free speech rights (and their sibling that we call academic freedom). Being disallowed to express certain perspectives, to study certain topics, or to present certain research findings is nothing short of censorship.

When it comes to freedom of expression, whether we like it or not, we need to realize that this right applies not only to our own viewpoints or ideas but also to the viewpoints and ideas of those with whom we may disagree quite ardently. The second that our communities start to limit freedom of expression for selected viewpoints, the rights of freedom of expression for everyone become diluted. And our democracy actually becomes less of a democracy. And I would guess that most people don't want that.

———————————————-

Note: This piece was partly inspired by conversations with SUNY New Paltz political scientist, Dr. Dan Lipson.

Burmila E. Liberal Bias in the College Classroom: A Review of the Evidence (or Lack Thereof). PS: Political Science & Politics . 2021;54(3):598-602. doi:10.1017/S1049096521000354

Clark CJ, Jussim L, Frey K, Stevens ST, Al-Gharbi M, Aquino K, Bailey JM, Barbaro N, Baumeister RF, Bleske-Rechek A, Buss D, Ceci S, Del Giudice M, Ditto PH, Forgas JP, Geary DC, Geher G, Haider S, Honeycutt N, Joshi H, Krylov AI, Loftus E, Loury G, Lu L, Macy M, Martin CC, McWhorter J, Miller G, Paresky P, Pinker S, Reilly W, Salmon C, Stewart-Williams S, Tetlock PE, Williams WM, Wilson AE, Winegard BM, Yancey G, von Hippel W. Prosocial motives underlie scientific censorship by scientists: A perspective and research agenda. Proc Natl Acad Sci U S A. 2023 Nov 28;120(48):e2301642120. doi: 10.1073/pnas.2301642120. Epub 2023 Nov 20. PMID: 37983511.

Glenn Geher Ph.D.

Glenn Geher, Ph.D. , is professor of psychology at the State University of New York at New Paltz. He is founding director of the campus’ Evolutionary Studies (EvoS) program.


Trump masters 'horror politics' after hiring spree of 'human psychology' experts: analyst

Matthew Chapman

News writer. Matthew Chapman is a video game designer who attended Rensselaer Polytechnic Institute and lives in San Marcos, Texas. Before joining Raw Story, he wrote for Shareblue and AlterNet, specializing in election and policy coverage.


Former President Donald Trump is trying to figure out how to motivate his supporters to the polls this fall, wrote Chauncey DeVega for Salon on Friday — and to do that he is trying to horrify them.

"The consultants and strategists who are using fear and horror to mobilize and engage Donald Trump’s followers (and to demobilize and exhaust those Americans who oppose the corrupt ex-president and the fascist MAGA movement) appear to possess an expert understanding of human psychology," wrote DeVega.

Specifically, they are trying to terrify Trump voters by telling them everything is at stake — without making them feel hopeless and powerless. But more than that, they want Trump's supporters radicalized and capable of violence, DeVega wrote.

One example of the "horror politics" is a new Trump email to supporters invoking his criminal hush money trial in Manhattan.

"GIVE ME LIBERTY OR GIVE ME DEATH!" it was headed, quoting Founding Father Patrick Henry. "I’ve been stripped of my constitutional rights. I’ve been forced to sit fully gagged while Biden’s cronies spread vicious lies. I’ve even been threatened with JAIL if I don’t bow down to the AMERICA-HATING leftists." It concludes by calling for supporters to "send the radical left Democrats a powerful message that WE WILL NEVER SURRENDER!"

"The purpose of Trump’s emails and other incendiary language is to further radicalize his MAGA people into committing violence because they have been told that they are supposedly facing an existential threat from their imagined 'enemies' in the Democratic Party," DeVega wrote.

"Moreover, Trump’s lies that he is being 'ripped to shreds' and basically held hostage as he is 'gagged,' and that the 'left' and 'The Democrats' are 'depraved savages,' are a type of projection and a promise of what he and his forces will do to his and their perceived enemies if he wins the 2024 election.

His use of 'depraved savages' is thinly veiled code for the Black prosecutors, like Fulton County DA Fani Willis and Manhattan DA Alvin Bragg, who are trying to hold Trump accountable."

Trump and his strategists "know that frightened and terrified people are capable of doing very horrible things – and they are willing to do everything in their power to encourage such an outcome here," DeVega concluded darkly.

'Hard to believe': Chuckling Arizona AG fact-checks Rudy Giuliani's court summons claim

Rudy Giuliani was the subject of a swift and comical fact-check Monday from the Arizona attorney general, who successfully subpoenaed him to appear in court.

Arizona Attorney General Kris Mayes detailed the many ways her office attempted to serve Rudy Giuliani before issuing the court summons at the former New York City mayor's 80th birthday in Florida.

"We had attempted on multiple occasions, in multiple ways, to serve Mr. Giuliani. Our agents had traveled to New York City to try to serve him," Mayes said. "We were not allowed in his building there where he lives. We stayed there for two days."

Mayes told CNN anchor Kaitlan Collins that her team eventually relied upon Giuliani's social media to locate Trump's onetime lawyer and confidante.

"We found out essentially through his live streams," Mayes said. "He's not that hard to find."

Giuliani and other Trump allies have been charged in Arizona, Georgia and Michigan in connection with their efforts to overturn the 2020 election results on behalf of the former president, who has himself been charged in Georgia and Washington, D.C.

Collins asked Mayes to verify Giuliani's statement that he had reached out to her agents "like a gentleman" and told them where he could be found.

"Is that true?" Collins asked. "Or was it that you just knew because it was widely publicized that he was having a birthday party in Palm Beach?"

Faced with this question, Mayes started laughing.

"Yeah, I can tell you he did not tell us where he was gonna be except that he told the world where he was," Mayes said. "It was really hard to believe he didn't know that we were looking for him given the number of times and the different ways we had tried."


'Trump is the real cancel culture — emphasis on cult': Jon Stewart slams MAGA woke whining

The Daily Show host Jon Stewart on Monday skewered far-right media pundits over what he described as a hypocritical condemnation of victimhood culture, in which they paint themselves as the biggest victims.

Stewart lashed out at Fox News hosts such as Sean Hannity and Laura Ingraham, arguing that the "woke cancel culture" they repeatedly and publicly rant against has nothing on them.

"This is their identity now," Stewart said. "They say what they want; if you get upset about it, you don't believe in freedom."

Stewart's live audience broke out into spontaneous laughter when faced with a clip of Hannity declaring that he was not the type of person to become easily upset.

"Sean Hannity can say with a square head, 'I'm not the kind of guy who gets outraged'?" Stewart demanded. "He's basically just a meat bag support system for a forehead vein."

The Daily Show then aired clips of Hannity promoting blood-boiling stories about 'disgusting' snowflakes whom he railed against.

"But every snowflake is different," a coy Stewart replied.


Ultimately, Stewart argued the far-right's free speech complaints don't hold up in a modern age in which any and all internet users are encouraged to share their opinions.

"We are not censored or silenced, we are inundated," said Stewart. "And it is all weaponized by outrage hunters."

The segment then turned to Trump, specifically the speech in which he confused former South Carolina Gov. Nikki Haley with Rep. Nancy Pelosi.

Stewart showed viewers a clip of Rep. Elise Stefanik (R-NY) downplaying Trump's fundamental mistake.

In a robotic voice, Stewart claimed, "He is reversed aging, he is stronger, he is Benjamin Button, he will be our wisest baby president!"

Stewart argued Republicans such as Stefanik who bend over backwards to avoid criticizing the former president are the true perpetrators of a worrisome cancel culture.

"Denying reality still won't save you," Stewart said. "There's no level of loyalty deep enough to be free of Trump cancel culture... emphasis on cult."

'Remarkable': Experts stunned by judge's slap-down revealed in Trump trial transcript

The newly released transcript in former President Donald Trump's criminal hush money trial reveals a dramatic moment that stunned legal experts who examined the document Monday night.

A CNN panel that included former federal attorney Elie Honig and Obama administration "Ethics Czar" Norm Eisen unpacked the "extraordinary moment" when Justice Juan Merchan lectured attorney and Trump defense witness Robert Costello on courtroom decorum.

"The bigger drama happened after the jury was gone," Honig said. "What an embarrassment."

The panel looked specifically at the moment when Costello asked to address Merchan and received a swift and brutal reply.

“Can I say something, please?” Costello demanded.

“No," snapped Merchan. "This is not a conversation."

Said anchor Kaitlan Collins, "the most remarkable part" is that "the jury missed all of this."

Honig argued that while Merchan may initially have appeared to counter Trump's legal team, in truth the Manhattan criminal court judge had done them a service.

"The judge did the defense a bit of a favor," Honig said. "He got the jury out of there real quick."

But Honig also wondered what exactly Trump's lawyers thought they had to gain by introducing the problematic witness, brought in purportedly to discredit former fixer and star witness Michael Cohen, in the first place.

"What did the defense even get from the guy?" Honig asked. "They're just letting him act like a maniac and it's hurting the defense."

Eisen said he'd seldom seen a witness act out the way Costello had.

"They were rolling their eyes, they were pursing their lips, they were shaking their heads," Eisen said. "The irony [was]... while Michael Cohen kept his cool for an entire week, Bob Costello, a long-respected attorney, blew up the entire courtroom."

