Speech Production

Reference work entry. First Online: 01 January 2015. pp. 1493–1498.


Laura Docio-Fernandez and Carmen García Mateo


Synonyms: Sound generation; Speech system

Speech production is the process of uttering articulated sounds or words, i.e., how humans generate meaningful speech. It is a complex feedback process in which hearing, perception, and information processing in the nervous system and the brain are also involved.

Speaking is in essence the by-product of a necessary bodily process, the expulsion from the lungs of air charged with carbon dioxide after it has fulfilled its function in respiration. Most of the time, one breathes out silently; but it is possible, by contracting and relaxing the vocal tract, to change the characteristics of the air expelled from the lungs.
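This picture of a lung-powered source shaped by the vocal tract is the essence of the source-filter view of speech production (Fant's acoustic theory). A minimal numerical sketch of that view, assuming NumPy; the formant frequencies and bandwidths are illustrative round numbers, not measured values:

```python
# Toy source-filter sketch: a glottal-pulse-like impulse train (the "source")
# is passed through resonators standing in for vocal-tract formants (the
# "filter"). Formant values below are illustrative, not measurements.
import numpy as np

def resonator(x, freq_hz, bw_hz, fs):
    """Two-pole resonator approximating a single formant."""
    r = np.exp(-np.pi * bw_hz / fs)               # pole radius from bandwidth
    c = 2 * r * np.cos(2 * np.pi * freq_hz / fs)  # pole angle from frequency
    y = np.zeros_like(x)
    y1 = y2 = 0.0
    for i, xi in enumerate(x):
        yi = (1 - r) * xi + c * y1 - r * r * y2
        y[i] = yi
        y2, y1 = y1, yi
    return y

fs, f0 = 16000, 120                  # sample rate; pulse rate ~ vocal-fold vibration
source = np.zeros(fs // 2)           # half a second of signal
source[:: fs // f0] = 1.0            # impulse train standing in for glottal pulses

out = source
for formant, bw in [(700, 80), (1200, 90), (2600, 120)]:  # rough /a/-like resonances
    out = resonator(out, formant, bw, fs)                 # articulation as filtering
out = out / np.max(np.abs(out))      # normalized vowel-like waveform
```

Feeding noise through the same resonators instead of the impulse train would give a whispered, voiceless version of the same vowel, which previews the voiced/voiceless distinction discussed later.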

Introduction

Speech is one of the most natural forms of communication for human beings. Researchers in speech technology are working to develop systems that can understand speech and talk with human beings.

Human-computer interaction is a discipline concerned with the design, evaluation, and implementation...



Author information

Authors and Affiliations

Department of Signal Theory and Communications, University of Vigo, Vigo, Spain

Laura Docio-Fernandez

Atlantic Research Center for Information and Communication Technologies, University of Vigo, Pontevedra, Spain

Carmen García Mateo


Editor information

Editors and Affiliations

Center for Biometrics and Security Research & National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences, Beijing, China

Stan Z. Li

Departments of Computer Science and Engineering, Michigan State University, East Lansing, MI, USA

Anil K. Jain


Copyright information

© 2015 Springer Science+Business Media New York

Cite this entry

Docio-Fernandez, L., García Mateo, C. (2015). Speech Production. In: Li, S.Z., Jain, A.K. (eds) Encyclopedia of Biometrics. Springer, Boston, MA. https://doi.org/10.1007/978-1-4899-7488-4_199


DOI: https://doi.org/10.1007/978-1-4899-7488-4_199

Published: 03 July 2015

Publisher Name: Springer, Boston, MA

Print ISBN: 978-1-4899-7487-7

Online ISBN: 978-1-4899-7488-4


2.1 How Humans Produce Speech

Phonetics studies human speech. Speech is produced by bringing air from the lungs to the larynx (respiration), where the vocal folds may be held open to allow the air to pass through or may vibrate to make a sound (phonation). The airflow from the lungs is then shaped by the articulators in the mouth and nose (articulation).


Video Script

The field of phonetics studies the sounds of human speech. When we study speech sounds we can consider them from two angles. Acoustic phonetics, in addition to being part of linguistics, is also a branch of physics. It’s concerned with the physical, acoustic properties of the sound waves that we produce. We’ll talk some about the acoustics of speech sounds, but we’re primarily interested in articulatory phonetics, that is, how we humans use our bodies to produce speech sounds. Producing speech requires three mechanisms.

The first is a source of energy.  Anything that makes a sound needs a source of energy.  For human speech sounds, the air flowing from our lungs provides energy.

The second is a source of the sound: air flowing from the lungs arrives at the larynx. Put your hand on the front of your throat and gently feel the bony part under your skin. That’s the front of your larynx. It’s not actually made of bone; it’s cartilage and muscle. This picture shows what the larynx looks like from the front.

[Figure: the larynx, external front view]

This next picture is a view down a person’s throat.

[Figure: cartilages of the larynx, viewed from above]

What you see here is that the opening of the larynx can be covered by two triangle-shaped pieces of tissue. These are often called “vocal cords” but they’re not really like cords or strings. A better name for them is vocal folds.

The opening between the vocal folds is called the glottis.

We can control our vocal folds to make a sound.  I want you to try this out so take a moment and close your door or make sure there’s no one around that you might disturb.

First I want you to say the word “uh-oh”. Now say it again, but stop half-way through, “Uh-”. When you do that, you’ve closed your vocal folds by bringing them together. This stops the air flowing through your vocal tract.  That little silence in the middle of “uh-oh” is called a glottal stop because the air is stopped completely when the vocal folds close off the glottis.

Now I want you to open your mouth and breathe out quietly, “haaaaaaah”. When you do this, your vocal folds are open and the air is passing freely through the glottis.

Now breathe out again and say “aaah”, as if the doctor is looking down your throat.  To make that “aaaah” sound, you’re holding your vocal folds close together and vibrating them rapidly.

When we speak, we make some sounds with vocal folds open, and some with vocal folds vibrating.  Put your hand on the front of your larynx again and make a long “SSSSS” sound.  Now switch and make a “ZZZZZ” sound. You can feel your larynx vibrate on “ZZZZZ” but not on “SSSSS”.  That’s because [s] is a voiceless sound, made with the vocal folds held open, and [z] is a voiced sound, where we vibrate the vocal folds.  Do it again and feel the difference between voiced and voiceless.

Now take your hand off your larynx and plug your ears and make the two sounds again with your ears plugged. You can hear the difference between voiceless and voiced sounds inside your head.
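The voiced/voiceless contrast you just felt can also be made quantitative: a voiced sound is (quasi-)periodic while a voiceless one is noise-like, so the peak of the normalized autocorrelation in the pitch range separates them. A toy sketch assuming NumPy, using synthetic stand-ins for [z] and [s] rather than real recordings:

```python
# Toy voicing detector: periodic signals ([z]-like) show a strong
# autocorrelation peak at the pitch period; noise ([s]-like) does not.
import numpy as np

rng = np.random.default_rng(0)
fs, f0, n = 16000, 120, 2048

t = np.arange(n) / fs
voiced = np.sin(2 * np.pi * f0 * t) + 0.1 * rng.standard_normal(n)  # [z]-like stand-in
voiceless = rng.standard_normal(n)                                  # [s]-like stand-in

def voicing_score(x, fs, fmin=60, fmax=400):
    """Max normalized autocorrelation over plausible pitch lags (fmin..fmax Hz)."""
    x = x - x.mean()
    ac = np.correlate(x, x, mode="full")[len(x) - 1:]  # non-negative lags only
    ac = ac / ac[0]                                    # normalize so lag 0 is 1
    return float(ac[fs // fmax : fs // fmin].max())

print(round(voicing_score(voiced, fs), 2))     # high score: periodic, hence voiced
print(round(voicing_score(voiceless, fs), 2))  # low score: aperiodic, hence voiceless
```

The same idea, with windowing and more robust normalization, underlies practical pitch and voicing detectors.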

I said at the beginning that there are three crucial mechanisms involved in producing speech, and so far we’ve looked at only two:

  • Energy comes from the air supplied by the lungs.
  • The vocal folds produce sound at the larynx.
  • The sound is then filtered, or shaped, by the articulators .

The oral cavity is the space in your mouth. The nasal cavity, obviously, is the space inside and behind your nose. And of course, we use our tongues, lips, teeth and jaws to articulate speech as well.  In the next unit, we’ll look in more detail at how we use our articulators.

So to sum up, the three mechanisms that we use to produce speech are:

  • respiration at the lungs,
  • phonation at the larynx, and
  • articulation in the mouth.

Essentials of Linguistics Copyright © 2018 by Catherine Anderson is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License, except where otherwise noted.



Hearing in Complex Environments

Speech Production

Learning Objectives

Understand the separate roles of respiration, phonation, and articulation.

Know the difference between a voiced and an unvoiced sound.

The field of phonetics studies the sounds of human speech. When we study speech sounds, we can consider them from two angles. Acoustic phonetics, in addition to being part of linguistics, is also a branch of physics. It’s concerned with the physical, acoustic properties of the sound waves that we produce. We’ll talk some about the acoustics of speech sounds, but we’re primarily interested in articulatory phonetics—that is, how we humans use our bodies to produce speech sounds.

Producing speech takes three mechanisms.

  • Respiration at the lungs
  • Phonation at the larynx
  • Articulation in the mouth

Let’s take a closer look:

  • Respiration (At the lungs): The first thing we need to produce sound is a source of energy. For human speech sounds, the air flowing from our lungs provides energy.
  • Phonation (At the larynx): Secondly, we need a source of sound: air flowing from the lungs arrives at the larynx. Put your hand on the front of your throat and gently feel the bony part under your skin. That’s the front of your larynx. It’s not actually made of bone; it’s cartilage and muscle. This picture shows what the larynx looks like from the front.

[Figure 7.8.3: the larynx from the front, with its parts labelled]

What you see in Fig. 7.8.3 is that the opening of the larynx can be covered by two triangle-shaped pieces of tissue. These are often called “vocal cords” but they’re not really like cords or strings. A better name for them is vocal folds. The opening between the vocal folds is called the glottis.

Vocal Folds Experiment:

First I want you to say the word “uh-oh.” Now say it again, but stop half-way through (“uh-“). When you do that, you’ve closed your vocal folds by bringing them together. This stops the air flowing through your vocal tract. That little silence in the middle of uh-oh is called a glottal stop because the air is stopped completely when the vocal folds close off the glottis. Now I want you to open your mouth and breathe out quietly, making a sound like “haaaaaaah.” When you do this, your vocal folds are open and the air is passing freely through the glottis. Now breathe out again and say “aaah,” as if the doctor is looking down your throat. To make that “aaaah” sound, you’re holding your vocal folds close together and vibrating them rapidly. When we speak, we make some sounds with vocal folds open, and some with vocal folds vibrating. Put your hand on the front of your larynx again and make a long “SSSSS” sound. Now switch and make a “ZZZZZ” sound. You can feel your larynx vibrate on “ZZZZZ” but not on “SSSSS.” That’s because [s] is a voiceless sound, made with the vocal folds held open, and [z] is a voiced sound, where we vibrate the vocal folds. Do it again and feel the difference between voiced and voiceless. Now take your hand off your larynx and plug your ears and make the two sounds again. You can hear the difference between voiceless and voiced sounds inside your head.

  • Articulation (In the oral cavity): The oral cavity is the space in your mouth. The nasal cavity, as we know, is the space inside and behind your nose. And of course, we use our tongues, lips, teeth and jaws to articulate speech as well. In the next unit, we’ll look in more detail at how we use our articulators.


So, to sum it up, the three mechanisms that we use to produce speech are:

  • Respiration (At the lungs): Energy comes from the air supplied by the lungs.
  • Phonation (At the larynx): The vocal folds produce sound at the larynx.
  • Articulation (In the mouth): The sound is filtered, or shaped, by the articulators.

Image: Larynx. Wikimedia Commons, https://commons.wikimedia.org/wiki/File:Illu_larynx.jpg (Public Domain).

Introduction to Sensation and Perception Copyright © 2022 by Students of PSY 3031 and Edited by Dr. Cheryl Olman is licensed under a Creative Commons Attribution 4.0 International License, except where otherwise noted.


  • Browse content in Warfare and Defence
  • Defence Strategy, Planning, and Research
  • Land Forces and Warfare
  • Military Administration
  • Military Life and Institutions
  • Naval Forces and Warfare
  • Other Warfare and Defence Issues
  • Peace Studies and Conflict Resolution
  • Weapons and Equipment

The Oxford Handbook of Language Evolution

22 The anatomical and physiological basis of human speech production: adaptations and exaptations

Ann MacLarnon is Director of the Centre for Research in Evolutionary Anthropology at Roehampton University. She has worked on a wide variety of areas in primatology and palaeoanthropology, with an emphasis on comparative approaches. Research topics include reproductive life histories and physiology, stress endocrinology and behaviour, and aspects of comparative morphology including the brain and spinal cord. Work on this last area led to the unexpected discovery that humans evolved increased breathing control for speech.

  • Published: 18 September 2012

This article provides details on human speech production involving a range of physical features, which may have evolved as specific adaptations for this purpose. All mammalian vocalizations are produced similarly, involving features that primarily evolved for respiration or ingestion. Sounds are produced using the flow of air inhaled through the nose or mouth, or expelled from the lungs. Unvoiced sounds are produced without the involvement of the vocal folds of the larynx. Mammalian vocalizations require coordination of the articulation of the supralaryngeal vocal tract with the flow of air, in or out. For phonated sounds, an extensive series of harmonics above the fundamental frequency, F 0 , is produced by resonance. These series are filtered by the shape and size of the vocal tract, resulting in the retention of some parts of the series, and diminution or deletion of others, in the emitted vocalization. Human sound sequences are also much more rapid than those of non-human primates, except for very simple sequences such as repetitive trills or quavers. Human vocal tract articulation is much faster, and humans are able to produce multiple sounds on a single breath movement, inhalation or exhalation. The unique form of the tongue within the vocal tract in humans is considered to be a key factor in the speech-related flexibility of the supralaryngeal vocal tract.

The major medium for the transmission of human language is vocalization, or speech. Humans use rapid, highly variable, extended sound sequences to transmit the complex information content of language. Speech is a very efficient communication medium: it costs little energetically, it does not require visual contact with the intended receiver(s), and it can be carried out simultaneously with separate manual and other tasks. Although the vocal communication systems of some birds and other mammals, such as cetaceans, may resemble important aspects of human speech, none is as complex, nor as capable of transmitting information, as human speech‐propelled language. Certainly, our closest relatives, the apes and other primates, demonstrate nothing close to this unique human form of communication. Human speech production involves a range of physical features which may have evolved as specific adaptations for this purpose; alternatively, they evolved as exaptations, commandeering existing features. Combining knowledge of the anatomical and physiological basis of human speech production, comparisons with other primate species, and information from the human fossil record, it is possible to form an outline framework for the evolution of human speech capabilities, the features concerned, the likely timing and sequence in which they arose, and the possible combination of adaptations and exaptations involved—the what, when, and why of speech evolution.

All mammalian vocalizations are produced similarly, involving features that primarily evolved for respiration or ingestion. Sounds are produced using the flow of air inhaled through the nose or mouth, or expelled from the lungs. Unvoiced sounds are produced without the involvement of the vocal folds of the larynx. They entail pressurizing the airflow by temporary restriction of the vocal tract at some point(s) along its length. The turbulence of the released air produces either an aperiodic noise, such as a burst or hiss, or, under special conditions, it may produce a periodic sound such as a whistle. For voiced or phonated sounds, the vocal folds at the glottis of the larynx (a structure which first evolved at the top of the trachea to prevent water entering the lungs in aquatic creatures) are held taut, and the air flow needs to be powerful enough to cause the vocal folds to vibrate. This cuts the air flow into a chain of ‘air puffs’, or a periodic sound wave, perceived by the ear as sound at a pitch equivalent to the air puff frequency; this is known as the fundamental frequency or F 0 , and it varies with the length and tension of the vocal folds. Voiced sounds may be modified further by so‐called gestural articulations of the supralaryngeal vocal tract produced by positions or movements of articulatory structures such as the tongue and lips, both primarily involved in ingestion. Mammalian vocalizations therefore require coordination of the articulation of the supralaryngeal vocal tract with the flow of air, in or out. For phonated sounds, an extensive series of harmonics above F 0 is produced by resonance. These series are filtered by the shape and size of the vocal tract, resulting in the retention of some parts of the series, and diminution or deletion of others, in the emitted vocalization. Unvoiced vocalizations generally have less structured acoustic features and broad bands of emitted frequencies. 
What distinguishes human speech from the vocalizations of other species is the extraordinary range of acoustic variation involved, produced by an enormous variety of gestural articulations of the vocal tract, together with intricate manipulations of the larynx and other respiratory structures. Human speech is also distinctive in that, rather than utilizing the airflow of both inspirations and expirations, it is produced almost entirely on expired air, released in extended, highly controlled expirations.
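The source-filter relationship described above can be illustrated numerically. A minimal sketch, under textbook assumptions not taken from this chapter: the neutral vocal tract is idealized as a uniform tube, closed at the glottis and open at the lips, which resonates at odd quarter-wavelengths.

```python
# Hedged sketch of the source-filter idea: an idealized "neutral" vocal
# tract modelled as a uniform tube, closed at the glottis and open at the
# lips, resonates at odd quarter-wavelengths:
#   F_n = (2n - 1) * c / (4 * L)
# The 17 cm tract length and 350 m/s speed of sound are standard textbook
# approximations, not values stated in this chapter.

def tube_resonances(length_m: float, n_formants: int = 3, c: float = 350.0) -> list[float]:
    """Resonant frequencies (Hz) of a uniform closed-open tube."""
    return [(2 * n - 1) * c / (4 * length_m) for n in range(1, n_formants + 1)]

formants = tube_resonances(0.17)
print([round(f) for f in formants])
# Roughly 515, 1544, 2574 Hz -- close to the formant pattern of a neutral
# schwa-like vowel; real vowels arise by deforming the tube away from uniform.
```

Constricting or widening the tube at different points shifts these resonances up or down, which is exactly the filtering by "shape and size of the vocal tract" that the text describes.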

More than 100 different sound units or phonemes found in human languages are recognized in the International Phonetic Alphabet, together with a further array of major variant types. Each sound unit is acoustically distinctive (Fant 1960 ), as depicted in spectrograms, in which emitted sound frequencies and their amplitudes are plotted against time. Phonemes vary with different relative timing of the start of phonation and of vocal tract constriction, different speeds of movement and combinations of vocal tract articulators, different intonation changes produced in the larynx or by the lungs; sounds may be breathy, creaky, nasal, or aspirated, and so the list goes on. Different languages use different subsets of phonemes.

Phonemes comprise consonants and vowels, which form the building blocks of syllables. Consonants, voiced or unvoiced, involve the complete or near complete obstruction and release of airflow through the vocal tract, which produces characteristic spectrum profiles or envelopes of sound frequencies emitted over time (Fant 1960 ). Vowels always involve phonation, and filtering through different vocal tract constrictions produced by gestures of the tongue, without complete obstruction. They are distinguished by their combinations of formants (Fant 1960 ), which are sharp peaks in the frequency ranges above F 0 emitted following filtration, known as F 1 , F 2 , etc.; typically, different vowels within a language can be characterized by the first two formants. The perception of vowels is not dependent on their absolute formant frequencies, but rather their relative values, normalized by the listener according to the typical frequency levels of a particular individual speaker, be they generally higher or lower pitched, the differences resulting from a shorter or longer vocal tract.

The range and variation of human speech sounds, the different subsets utilized in hundreds of languages, and how they are produced anatomically and physiologically, have been superbly documented in an extraordinary compendium by Ladefoged and Maddieson ( 1996 ). For consonants, they describe how nine independent, moveable, soft tissue articulators can be distinguished: lips; tongue—tip, blade, underblade, front, back, root; epiglottis; and glottis. These move to constrict or block the vocal tract at 11 main articulation points, or more accurately zones: lips, incisor teeth, different points along the palate, the velum or soft palate, and the uvula (the skin flap hanging from the velum), the pharynx or throat, the epiglottis, and the glottis. Together these produce 17 different categories of articulatory gestures, whose precise formation varies in different languages and dialects. Consonants are further differentiated into stops, nasals, fricatives, laterals, rhotics, and clicks, according to whether they involve, respectively, momentary complete stoppage of airflow by vocal tract obstruction, mouth closure and nasal‐only airflow, a turbulent airstream, midline tract closure with airflow limited to lateral flow around the partial obstruction, tongue trills and related movements, or two points of vocal tract closure trapping air with subsequent articulator movement increasing the trapped air volume and hence decreasing pressure prior to its sudden release. Vowel production involves subtle tongue‐shaping in the oral or pharyngeal cavities, resulting in different points of vocal tract constriction, and hence different formant combinations.

It became evident early in attempts to teach apes to speak that our closest living relatives are not capable of the intricate articulatory manoeuvres of the upper respiratory tract which underlie the enormous range of human speech sounds. Recent evidence from Diana monkeys suggests that vocal tract articulation in non‐human primates may not be as severely limited as previously thought (Riede et al. 2005 ). However, it seems improbable that capabilities so useful to human communication would not have been exploited more fully if they existed in other species, and it is therefore likely that the human capacity for the production of highly varied speech sounds is unique among primates.

Human sound sequences are also much more rapid than those of non‐human primates, except for very simple sequences such as repetitive trills or quavers. Human vocal tract articulation is much faster, and humans are able to produce multiple sounds on a single breath movement, inhalation or exhalation. Most non‐human sound sequences, such as chimpanzee pant‐hoots and other vocalizations (Marler and Tenaza 1977 ), are produced on successive inspirations and expirations. Commonly each component sound of such sequences (e.g. the pant, or the hoot of the chimpanzee call) can only be produced on either an inhalation or an exhalation, which also restricts sound sequence combinations.

The laryngeal air sacs present in some non‐human primate species enable them to produce slightly more complex sound sequences on single breath movements, either through additional breath movements in and out of the sacs, or by vibration of the vocal lip at the opening of the sacs into the larynx (e.g. bitonal scream of siamangs; Haimoff 1983 ). Humans do not possess air sacs, and instead produce complex sound sequences by the intricate manipulation of airflow within individual exhalations, freed much more than any non‐human primate from the restrictions of vocalizations tied to breath movements (Hewitt et al. 2002 ). Overall, humans are able to produce sound sequences of up to about 30 sound units per second (P. Lieberman et al. 1992 ). Maximum sound production rates for non‐human primates are typically only 2–3 per second, extending to 5 per second with the involvement of air sacs (MacLarnon and Hewitt 1999 ).

Human speech also demonstrates further flexibility through an enhanced ability to control breathing, the airflow itself, compared with non‐human primates (MacLarnon and Hewitt 1999 , 2004 ). First, humans speak on very extended exhalations, interspersed with quick inhalations, compared with much more even breathing cycles during quiet breathing; non‐human primates appear not to be able to distort their breathing cycles so markedly. During normal speech, humans typically utilize exhalations of 4–5 seconds (Hoit et al. 1994 ), extending up to more than 12 seconds (Winkworth et al. 1995 ), whereas the longest calls given on single breath movements in non‐human primates are only about 5 seconds (MacLarnon and Hewitt 1999 ). Calibrating these measures, taking into account the faster quiet breathing rates of smaller animals, the maximum duration of human speech exhalations is more than 7 times that during quiet breathing. In non‐human primates, the normal maximum duration of exhalations during vocalization is only 2–3 times that during quiet breathing. The exceptions to this are species with air sacs, such as howler monkeys and gibbons, which can extend exhalations to 4–5‐fold their duration during quiet breathing. Again, humans do not possess air sacs, an apparent alternative to control of pulmonary air release for extending call exhalation length, though one that does not enable the very subtle control of respiratory airflow of human speech (Hewitt et al. 2002 ).

22.1 Sound articulation

The unique form of the tongue within the vocal tract in humans is considered to be a key factor in the speech‐related flexibility of our supralaryngeal vocal tract (P. Lieberman 1984 ). In mammals, the tongue is typically a flat muscular structure lying largely within the oral cavity, anchored posteriorly by its attachment to the hyoid bone, which lies just below oral level in the pharynx, immediately above the larynx. The primary function of the tongue is to move food around the mouth for mastication, and posteriorly for swallowing. In humans, however, the tongue is a curved structure, lying part horizontally in the oral cavity and part vertically down an extended pharynx, where it attaches to a much lower hyoid, just above a descended larynx. The horizontal (oral) and vertical (pharyngeal) portions of the human supralaryngeal tract (SVT H and SVT V ) are equal in length, compared with other species in which SVT H is substantially longer. Largely because of its curvature, movement of the human tongue, together with jaw movements, can vary the cross‐sectional area of each of the two tubes of our vocal tract independently by a factor of approximately ten, providing a very broad range of articulatory gestures, and very variable resultant formants of emitted sound. The 1:1 ratio of SVT H :SVT V , with a sharp bend between the two, is notably important for the production of three vowels, designated phonetically [i], [u], and [a]. These vowels are particularly easily distinguished, with very low perceptual error rates, by their F1, F2 combinations, which lie at the outer limits of the acoustic vowel space, and [i], followed by [u], is the most reliable and commonly used sound unit for vocal tract normalization. The tongue positions for production of the three vowels utilize the angle at the midpoint of the human vocal tract to produce abrupt discontinuities in the cross‐sectional areas of the tube.
Because the angle is sharp, the articulatory gestures involved do not have to be performed with particular accuracy for consistent, distinctive acoustic results, making these vowels marked examples of the quantal nature of human speech sounds (Stevens 1972 ). Perhaps consequently, they are the most common vowels in the world's languages (Ladefoged and Maddieson 1996 ).
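Why the 1:1 proportion matters can be sketched with the decoupled two-tube approximation used in quantal theory (after Stevens). For a vowel like [a], a narrow back (pharyngeal) tube and a wide front (oral) tube each behave roughly as a quarter-wave resonator; the 17 cm total length and 350 m/s speed of sound are textbook assumptions, and the sketch ignores acoustic coupling between the tubes, which in reality splits coincident resonances apart.

```python
# Hedged sketch of the decoupled two-tube approximation for [a]: each tube
# is treated as an independent quarter-wave resonator, F = c / (4 * L).
# Assumed values (not from this chapter): 17 cm total tract, c = 350 m/s.

C = 350.0          # approximate speed of sound in warm, moist air (m/s)
TRACT_LEN = 0.17   # approximate total supralaryngeal tract length (m)

def two_tube_resonances(back_fraction: float) -> tuple[float, float]:
    """Lowest resonance (Hz) of the back and front tubes for a given split."""
    back = TRACT_LEN * back_fraction
    front = TRACT_LEN - back
    return C / (4 * back), C / (4 * front)

for frac in (0.4, 0.5, 0.6):
    f_back, f_front = two_tube_resonances(frac)
    print(f"back tube {frac:.0%}: {f_back:.0f} Hz, {f_front:.0f} Hz")
# At the 1:1 split both resonances converge near 1000 Hz; coupling between
# the tubes then pushes them apart, giving F1 just below and F2 just above
# 1000 Hz -- in the region of the formants of [a].
```

Because both lowest resonances derive from tubes of equal length meeting at the sharp bend, moderate imprecision in where the constriction falls produces only modest acoustic change, which is the quantal insensitivity the text describes.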

Humans are not completely unique in having a descended larynx; species including dog, goat, pig, and tamarin lower the larynx during loud calls (Fitch 2001b ). Several deer have a permanently lowered larynx, which may temporarily be lowered further during male roars (Fitch and Reby 2001 ); large cats are apparently similar (Weissengruber et al. 2002 ). However, laryngeal descent is rarely accompanied by descent of the hyoid; hence the tongue remains horizontal in the oral cavity, and cannot act as a pharyngeal articulator (P. Lieberman 2007 ). Temporary laryngeal descent is also much less disruptive of other functions. In humans, because of marked, permanent laryngeal descent, simple contact between the epiglottis and velum is no longer possible, disrupting the normal mammalian separation of the respiratory and digestive tracts during swallowing, and increasing the risk of choking. Permanent laryngeal descent is thus a very different evolutionary development. Nishimura et al. ( 2006 ) have demonstrated that the larynx does descend to some extent during development in chimpanzees, followed by hyoidal descent. However, only humans have evolved permanent, major, laryngeal descent, with associated hyoidal descent, resulting in a curved tongue, and a two‐tube vocal tract with 1:1 proportions. It is not laryngeal descent per se that is crucial to human speech capabilities, but rather a suite of factors in the shape and proportions of the supralaryngeal vocal tract and tongue (P. Lieberman 2007 ).

Considerable efforts have been made to determine when the two‐tube vocal tract evolved in our ancestors, using indirect means, as its soft tissue structures do not fossilize. Reconstruction of the fossil hominin tract was first attempted by Philip Lieberman and Crelin ( 1971 ), using basicranial and mandibular characteristics, followed by Laitman and colleagues (e.g. 1979), who used the basicranial angle, or flexion of the skull base. However, Daniel Lieberman and McCarthy ( 1999 ) recently demonstrated, using radiographic series, that human laryngeal descent is not linked ontogenetically to the development of basicranial flexion. So, reconstruction of the supralaryngeal tract is not possible from basicranial form, and much previous work on the speech articulation capabilities of fossil hominins was therefore flawed, as P. Lieberman ( 2007 ) has fully accepted. In addition, D. Lieberman et al. ( 2001 ) showed that during postnatal descent of the hyoid and larynx in humans, the relative vertical positions of the hyoid, mandible, hard palate and larynx are held more or less constant. However, the ratio SVT H :SVT V changes during development, as a result of differential growth patterns of the total oral and pharyngeal lengths, and only reaches 1:1 from about 6–8 years. Together these results indicate that the descent of the hyolaryngeal structures is primarily constrained to maintain muscular function in relation to mandibular movement for swallowing; speech‐related factors are not maximized until well into childhood, matching the gradual ontogenetic development of acoustically accurate speech production (P. Lieberman 1980 ). Various possible exaptive explanations for why humans evolved their unique vocal tract configuration have been proposed. 
For example, obligate bipedalism required a more forward position of the spine under the skull, possibly reducing the space available in the upper throat, so squeezing the hyoid and larynx down the pharynx; increased carnivory in early Homo was associated with reduced jaw size and reduced oral cavity length, possibly requiring a compensatory increase in pharyngeal length (Negus 1949 ; Aiello 1996 ).

Recently, D. Lieberman and colleagues (e.g. 2002) have produced substantial new evidence on the integrated evolution of many modern human cranial features, providing a more comprehensive basis for exploring the evolution of the human vocal tract. They showed that a small number of developmental shifts distinguish modern human crania from those of our predecessors, including two—a more flexed basicranium and reduction in face size—which result in a shortening of SVT H , contributing to the attainment of an SVT H :SVT V ratio of 1:1. D. Lieberman ( 2008 ) suggested possible adaptational bases for these shifts, such as temporal lobe increase for enhanced cognitive processing including language, increasing basicranial flexion; increased meat consumption and technologically enhanced food processing including cooking, resulting in facial reduction; endurance running, building on obligate bipedalism, involving facial reduction for improved head stabilization; direct selection for speech capabilities, driving a decrease in oral cavity length, involving facial reduction and/or basicranial flexion, to produce a 1:1 SVT H :SVT V ratio. In other words, a suite of factors may have affected SVT H , and hence played a part in the evolution of the modern human capability for quantal speech. The other component in the evolution of a 1:1 ratio, an increase in SVT V , may have been directly selected for enhanced speech capabilities, so counterbalancing the negative impact of increased choking risk. However, this would not have been advantageous prior to substantial decrease in SVT H , because a long SVT V would require laryngeal descent into the thorax, producing muscular orientations that would compromise functional swallowing. 
Rather than major, coordinated shifts in both vocal tract parameters occurring with the evolution of modern humans, I think it more probable that other factors, earlier in human evolution, produced descent of the hyolaryngeal complex, and an increase in SVT V . From this exaptive basis, final reduction in SVT H, with the evolution of modern human cranial shape, could be adaptive for quantal speech. As outlined above, maintenance of functional swallowing is central to human developmental hyolaryngeal descent, which only becomes advantageous for speech articulation later in childhood. This, too, is congruent with the suggestion that hyolaryngeal descent resulted from earlier evolutionary change. The most likely candidate is the evolution of bipedalism, involving reconfiguration of neck structures, in Homo erectus . Jaw length also reduced in this species, associated with changing diet and food processing. The use of more complex vocalizations for communication may have begun to increase at the same time, alongside brain size and presumed social complexity (Aiello 1996 ).

As well as its curved shape, other features of the tongue have also been explored for their potential contribution to human speech articulation. Duchin ( 1990 ) drew attention to the greater manoeuvrability of the human tongue compared with apes. Jaw reduction produces a shorter, more controllable tongue, and hyoidal descent angles the tongue, increasing mechanical advantage. Takemoto ( 2008 ) showed that chimpanzee and human tongues have the same detailed internal topology, a muscular hydrostat formation (Kier and Smith 1985 ), which enables elongation, shortening, thinning, fattening, and twisting of the tongue for moving food around the mouth and for swallowing. However, the overall curved shape of the human tongue, compared with the flat chimpanzee form, means the same internal structures are arranged radially in humans, compared with linearly in apes, which increases the degrees of freedom for tongue deformation (Takemoto 2008 ). Hence, the dietary and other changes from early Homo through to modern humans provided the potential for enhanced control of speech articulation gestures through exaptive realignment of both external and internal tongue features.

The lips are second only to the tongue in their importance as human speech articulators. They are particularly important for the production of two major consonant groups, stops and fricatives (the former being the only consonant type to occur in all languages), and also in vowel production (Ladefoged and Maddieson 1996 ). In typical mammals, the face is dominated by a prominent snout housing major structures of the highly developed olfactory sense, which extend onto the face, in the form of the rhinarium, or wet nose. Within primates, the evolution of the haplorhines (tarsiers, monkeys, and apes) involved a shift to diurnal activity from the typical mammalian nocturnal pattern retained by strepsirhines (lemurs and lorises). With this came increased specialization of the visual sense, and an associated reduction in olfaction. The snout reduced, and the rhinarium was lost. As a result, the facial and lip muscles became less constrained and were co‐opted for facial expressions. Haplorhines evolved thicker lips (Schön Ybarra 1995 ), presumably to enhance this function. Hence, the evolution of mobile, muscular lips, so important to human speech, was the exaptive result of the evolution of diurnality and visual communication in the common ancestor of haplorhines. There is a lack of evidence as to whether there have been further adaptational developments in the lips during human evolution, or whether there have been changes in some other articulators, such as the velum or the epiglottis.

To date, there has been one attempt to investigate the comparative innervation of human vocal tract articulators. Kay et al. ( 1998 ) used the size of the hypoglossal canal in the base of the skull to estimate the relative number of nerve fibres in the hypoglossal nerve, which is a major innervator of the tongue. Their results suggested that Middle Pleistocene hominins and Neanderthals had modern human levels of tongue innervation, substantially greater than found in australopithecines and apes, and hence, they suggested, human‐like speech‐related tongue control had evolved by this time. However, DeGusta et al. ( 1999 ) demonstrated that hypoglossal canal and nerve sizes are not correlated, and Jungers et al. ( 2003 ) accepted that the canal size therefore offers no evidence about the timing of human speech evolution. Split-second coordination between the highly flexible movements of the human speech articulators is required for human speech, as well as coordination with laryngeal movements affecting phonation. Different sounds result, for example, if the vocal cords start vibrating slightly before, at the same time, or slightly after an articulatory gesture. It seems likely that at least some increase in neural control has evolved in humans for speech articulation, even if empirical evidence is presently lacking.

22.2 Respiratory control

Humans have enhanced control of breathing compared with non‐human primates, which they use to extend exhalations and shorten inhalations during speech, as well as to modulate loudness. Humans are not constrained to produce vocalizations that fade as the lungs deflate. They can also vary the volume of air released through a phrase to emphasize particular words or syllables. In addition, variation in subglottal air pressure can affect intonation patterns. Enhanced breathing control therefore contributes to the human ability to produce fast sound sequences, and to generate a whole variety of language‐specific patterns and meanings, communicated through the intonation and emphasis of phrases or specific syllables. Much of this needs to be tied to cognitive intention, involving complex neural communication and feedback (MacLarnon and Hewitt 1999 ).

Control of subglottal pressure is key to human speech breathing control. During speech breathing, intercostal and anterior abdominal muscles are recruited to expand the thorax and draw air into the lungs, and to control elastic recoil and hence the release of air as the lungs deflate. This is similar to quiet breathing, except that the diaphragm has a very limited role in speech breathing. It also differs from muscle recruitment during non‐human primate vocalizations, which does involve the diaphragm, and has only a limited role for intercostal muscles (e.g. Jürgens and Schriever 1991 ). The specific muscle movements required vary according to the volume of the lungs and other actions undertaken simultaneously (MacLarnon and Hewitt 1999 ). Overall, the fineness of control required of the intercostal muscles during human speech has been likened to that of the small muscles of the hand (Campbell 1968 ).

There is evidence, from an increase in spinal cord grey matter in the thoracic region, that humans have markedly greater innervation of the intercostal and anterior abdominal muscles compared with non‐human primates (MacLarnon 1993 ). Spinal cord dimensions are well correlated with those of its bony encasement, the vertebral canal. Evidence from fossil hominins demonstrates that enlargement of the canal, and therefore the cord, was not present in australopithecines and Homo erectus , but was present in Neanderthals and early modern humans (MacLarnon and Hewitt 1999 ). The function requiring enhanced neurological control therefore evolved in later human evolution. Of all the functions of the intercostal muscles, including maintenance of body posture for bipedal locomotion, vomiting, coughing, defecation, and breathing control, only enhanced breathing control for speech both requires substantial neurological control and fits the evolutionary timing constraints. It appears, therefore, that enhanced breathing control for speech was absent in Homo erectus , and present in the common ancestor of Neanderthals and modern humans, in the later Middle Pleistocene (MacLarnon and Hewitt 1999 , 2004 ).

As outlined above, human breathing control is not aided by the presence of air sacs, which can provide additional re‐breathed air for the extension of exhalations, without the risk of hyperventilation from excess oxygen intake (Hewitt et al. 2002 ). Larger ape species all possess laryngeal air sacs, so they were presumably lost at some point during human evolution. Air sacs abut against the hyoid bone where they produce characteristic indentations. The australopithecine hyoid from Dikika demonstrates the presence of air sacs (Alemseged et al. 2006 ), whereas hyoids from Homo heidelbergensis at Atapuerca, and a specimen from Castel di Guido dated to 400,000 years ago, as well as Neanderthals from El Sidrón and Kebara (Arensburg et al. 1990 ; Capasso et al. 2008 ; Martínez et al. 2008 ), show that air sacs had been lost by some point in the Middle Pleistocene. One possibility is that this occurred when the human thorax altered from the funnel‐shape of australopithecines, to the barrel‐shape of Homo erectus , as, in apes, air sacs extend into the thorax. It therefore quite probably occurred prior to the evolution of human speech‐breathing control, and it may also have been a necessary prerequisite stage.

The mammalian larynx, which protects the entrance to the lungs during swallowing, comprises a series of three sets of articulating cartilages connected by ligaments and membranes. Some mammal species retain a non‐valvular larynx, in which occlusion involves a simple muscular sphincter; other species have a valvular larynx, in which a mechanical valve provides for closure at the glottis. Based on the distribution of the valvular form, including its greatest development in primates, Negus ( 1949 ) proposed that the valvular larynx is a locomotor adaptation, enabling greater stabilization of the thorax in species with independent use of the forelimbs, through the buildup of air pressure below a closed glottis. Humans share with gibbons an extreme ability to close the glottis; other primates cannot completely close it off because the inner edges of the vocal processes of their arytenoid cartilages are curved, and when brought together, a small hiatus intervocalis always remains (Schön Ybarra 1995 ). Most likely humans lost the hiatus intervocalis independently from gibbons, as it is retained in living great apes. Gibbons may have evolved complete closure as an adaptation to brachiation. Bipedal humans exploit the capability to build up high subglottal pressure when lifting heavy objects with their arms and in forceful coughing, which is particularly important with upright posture (Aiello and Dean 1990 ). In addition, for human speech, substantial subglottal air pressure is required to fuel very long exhalations. Complete glottal closure also enhances the ability to control pitch, or intonation (Kelemen 1969 ), an ability that gibbons exploit in their songs and humans in speech, although it is unclear whether subglottal air pressure or movements of the laryngeal cricothyroid muscle are more important in human control of intonation (Borden et al. 2003 ). Overall, humans probably lost the hiatus intervocalis as an adaptation to bipedalism, providing an exaptation for speech.
Further to this, the membranous part of the vocal folds of humans is less sharp‐edged than in other primates (Negus 1929 ). This may be a direct adaptation for the production of more melodious sounds, selected for at some point after the locomotor‐associated function of the larynx altered in humans, with the evolution of exclusive bipedality in Homo erectus (Aiello 1996 ).

22.3 Evolutionary framework

Diet and technology‐related changes through human evolution, from the time of early Homo , have produced decreases in jaw and tongue length exaptive for the evolution of human speech capabilities. In addition to these, a three‐stage framework for the major features of human speech evolution can tentatively be proposed: first, the evolution of obligate bipedalism in Homo erectus produced the exaptations of laryngeal descent and the loss of air sacs and the hiatus intervocalis; second, during the Middle Pleistocene, human speech‐breathing control evolved as a specific speech adaptation; third, with the evolution of modern humans, the optimal vocal tract proportions (1:1) evolved adaptively. Further details are summarized in Table 22.1 , together with suggested speech capabilities for each stage of the evolutionary framework.

Acknowledgements

I would like to thank Kathleen Gibson and Maggie Tallerman for the invitation to contribute to this volume, and for their very helpful editing. My interest in the evolution of human speech was first stimulated by stumbling on evidence for the evolution of human breathing control while working with Gwen Hewitt. This paper builds on a lecture prepared for the Language Origins Society, thanks to an invitation from Bernard Bichakjian.

J Speech Lang Hear Res

Speech Production From a Developmental Perspective

Melissa A. Redford

Linguistics Department, University of Oregon, Eugene


Current approaches to speech production aim to explain adult behavior and so make assumptions that, when taken to their logical conclusion, fail to adequately account for development. This failure is problematic if adult behavior can be understood to emerge from the developmental process. This problem motivates the proposal of a developmentally sensitive theory of speech production. The working hypothesis, which structures the theory, is that feedforward representations and processes mature earlier than central feedback control processes in speech production.

Theoretical assumptions that underpin the 2 major approaches to adult speech production are reviewed. Strengths and weaknesses are evaluated with respect to developmental patterns. A developmental approach is then pursued. The strengths of existing theories are borrowed, and the ideas are resynthesized under the working hypothesis. The speech production process is then reimagined in developmental stages, with each stage building on the previous one.

The resulting theory proposes that speech production relies on conceptually linked representations that are information-reduced holistic perceptual and motoric forms, constituting the phonological aspect of a system that is acquired with the lexicon. These forms are referred to as exemplars and schemas, respectively. When a particular exemplar and schema are activated with the selection of a particular lexical concept, their forms are used to define unique trajectories through an endogenous perceptual–motor space that guides implementation. This space is not linguistic, reflecting its origin in the prespeech period. Central feedback control over production emerges with failures in communication and the development of a self-concept.

Speech motor control allows for flexible, fast, and precise coordination of speech articulators to achieve a motor goal. Adult performance in auditory feedback perturbation experiments suggests not only sensitivity to deviations between, say, an intended vowel and the acoustics of the vowel produced but also an ability to compensate for these deviations with fine motor adjustments that can raise or lower a particular formant frequency by as little as 50 Hz (see, e.g., Katseff, Houde, & Johnson, 2012 ; MacDonald, Goldberg, & Munhall, 2010 ). It is perhaps not surprising that this kind of fine-grained spatiotemporal control over articulation develops slowly. Large gains in speech motor skill are made during the first few years of life, but adultlike control is not achieved until mid-adolescence. Evidence for this claim dates back to Kent and Forner (1980) , who pointed out that temporal variability in young school-aged children's segmental durations is higher than in adults' speech and that this remains true until 12 years of age (see also Lee, Potamianos, & Narayanan, 1999 ; B. L. Smith, 1992 ). These acoustic findings were later supplemented with kinematic ones, which validated the interpretation of greater temporal variability in children's speech as the result of immature articulatory timing control ( Green, Moore, Higashikawa, & Steeve, 2000 ; Sharkey & Folkins, 1985 ; A. Smith & Goffman, 1998 ). A. Smith and Zelaznik (2004) followed up on this work with older children and showed that articulatory timing control is not fully mature until mid-adolescence. So, given the protracted development of speech motor control, why can we more or less understand what children are saying when they first begin to use words at about 12 months of age? Also, even more strikingly, how is it possible that 3-year-old children seem to never stop talking when their speech motor skills are still so immature? 
The answer put forward in this review article is that feedforward processes mature earlier than central feedback control processes.

More specifically, the argument developed herein is that speech production relies on conceptually linked representations that are abstract (i.e., information-reduced) holistic perceptual and motoric forms. These forms constitute the phonological aspect of the lexicon. The perceptual phonological forms are exogenous representations. They are exemplars that are acquired with lexical concepts beginning around 9 months of age. The motoric phonological forms are endogenous representations. They are schemas that begin to be abstracted around 12 months of age with first word productions. When a particular exemplar and schema are activated with the selection of a particular concept, their forms are used to define unique trajectories through an endogenous perceptual–motor space that guides implementation. This space is not linguistic; its processes are entirely free from conceptual information. The absence of conceptual information reflects the origin of this space in the prespeech period when infants' vocal explorations create the first linkages between perceptual and motoric trajectories.

By hypothesis, schemas are modified through developmental time as central feedback control is incorporated into the production process. This is because the act of speaking indirectly modifies schemas via the same process used to first abstract them. The onset of high-level predictive feedback control emerges with communication failures. These failures are assumed to significantly increase with vocabulary size due to homophony, motivating a shift in the production system toward exemplar representations around 18 months of age. The shift drives the emergence of an internal loop that matches the (projected) perceptual consequences of self-productions against targeted exemplar representations. Selective attention to auditory feedback develops later during the preschool years with the emergence of self-concept. At this point, the child begins to focus on sound production per se in addition to communication. The latter hypothesis could explain why literacy acquisition becomes possible around the age of 5 years and why direct intervention for speech sound disorders also becomes effective at this age.

The argument outlined above is in fact a general theory of speech production that is developmentally sensitive. The theory combines those aspects of existing adult-focused theories that best accommodate acquisition to define whole-word production at different stages of development from infancy to childhood on into adulthood. This developmentally sensitive theory of speech production is further motivated below. This motivation begins with a review of adult-focused theories. A major point of the review will be that the two major approaches to speech, the ecological dynamics and information-processing approaches, lead to different emphases regarding the type of feedforward information used in production (motoric vs. perceptual) and to different views on the type of feedback control processes engaged during execution (peripheral vs. central). I will argue that the holistic motoric representations that drive production in the ecological dynamics approach are consistent with functional approaches to child phonology and better account for young children's speech patterns than the discrete perceptual representations that drive production in the information-processing approach. Nonetheless, the information-processing assumption of distinct production and perception systems is embraced in the developmentally sensitive theory of speech production that I put forward because central feedback control is deemed necessary to account for the evolution of children's speech patterns from first words to adultlike forms.

Adult-Focused Theories of Speech Production

Adult-focused theories of speech production assume the activation of an abstract phonological plan that is then rendered in sufficient phonetic detail for the sensorimotor system to activate speech movements (e.g., Browman & Goldstein, 1992 ; Dell, 1986 ; Garrett, 1988 ; Goldrick, 2006 ; Goldstein, Byrd, & Saltzman, 2006 ; Guenther, 1995 ; Keating & Shattuck-Hufnagel, 2002 ; Roelofs, 1999 ; Turk & Shattuck-Hufnagel, 2014 ). The detailed phonetic plan is known as a speech plan . It contains or directly activates linguistic representations that provide relevant feedforward information for implementation. The representations and type of feedback control processes used in production differ according to the theoretical approach taken. Here, the two main approaches to speech production are reviewed: the ecological dynamics approach and the information-processing approach (see Figure 1 ). These approaches represent an amalgam of different theories, hence the generic labels. The different sets of theories emerge from two fundamentally different approaches to human cognition—an ecological-embodied approach versus a representation-based information-processing approach, which are briefly described next.

Figure 1.

The ecological dynamics and information-processing approaches to speech production both assume three major levels of analysis: a phonological level where abstract form representations are associated with conceptual meaning, a speech plan level where abstract forms are elaborated for implementation, and an implementation level where articulatory action is formulated and adjusted in real time to achieve the plan. The two approaches otherwise adopt very different fundamental assumptions, resulting in different theories of representation, sequencing, and control. In particular, the ecological dynamics approach emphasizes speech as action and assumes gestalt articulatory representations, emergent sequential structure, and self-organized articulation. In contrast, the information-processing approach emphasizes the importance of discrete elements and assumes executive control over sequencing and implementation, thus promoting a strong role for perception in production while assuming that the two processes are distinct. Solid lines with arrows represent feedforward processes; dotted lines with arrows represent feedback processes.

Richardson, Shockley, Fajen, Riley, and Turvey (2009) outline the tenets of an ecological-embodied approach in contrast to the assumptions of an information-processing approach as follows. In an ecological-embodied approach, behavior is emergent and self-organized, which is to say behavior is not planned or controlled (pp. 170–173). Perception and action are viewed as continuous and cyclic and thus functionally united (pp. 173–175). In particular, the concept of affordances assumes that the objects of perception provide information about action possibilities (pp. 178–182). The theory of direct perception assumes that these useful objects are wholly conveyed by sensory input (pp. 176–178). This means that knowledge is simply extracted from the environment within which the individual lives and moves (pp. 167–170).

The ecological-embodied view of knowledge contrasts with the information-processing view where knowledge emerges from learned associations, which give rise to mediating representations. These representations are knowledge in the information-processing approach. This view of knowledge follows from other assumptions: Individuals are separate from their environment, the mind is separate from the body, and action is separate from perception. Overall, representational and computational processes are “lifted away from the organism–environment system and…studied on their own, permitting cognitive scientists to proceed whereas other specialists work to understand the body and environment of the knower” ( Richardson et al., 2009 , pp. 161–162). This approach to human cognition is likely more familiar to readers than the ecological-embodied approach because it has provided the philosophical foundation for much of mainstream cognitive sciences in North America, including linguistics and psychology, since the “cognitive revolution” in the 1950s (see Mandler, 2007 , Chap. 10). The assumptions of this approach are detailed in Newell and Simon's (1972) classic book, Human Problem Solving .

The information-processing approach has resulted in the modular study of language (e.g., syntax vs. phonology) and in a sharp division of expertise between those who study language and those who are interested in speech production (e.g., phonology vs. phonetics). Among the latter, those who adhere closely to the approach often focus on the translation problem that follows from their computational view, for example, the problem of how discrete phonological elements are transformed into continuous speech action (see, inter alia, Bladon & Al-Bamerni, 1976 ; Keating, 1990 ; MacNeilage, 1970 ; Recasens, 1989 ; Stevens & Blumstein, 1981 ; Wickelgren, 1969 ). This focus also structures psycholinguistic models of production that posit multiple processing stages to generate production units (e.g., Dell, 1986 ; Garrett, 1988 ; Goldrick, 2006 ; Levelt, 1989 ; Roelofs, 1999 ), a generic version of which is presented in the right-hand panel of Figure 1 . Models of speech motor control that have discrete elements as goals emphasize feedback control to ensure accurate implementation of these elements in speech movement (e.g., Abbs & Gracco, 1984 ; Hickok, 2012 ; Houde & Nagarajan, 2011 ; Lindblom, Lubker, & Gay, 1979 ; Niziolek, Nagarajan, & Houde, 2013 ; Perkell, Matthies, Svirsky, & Jordan, 1993 ; Tourville & Guenther, 2011 ).

In contrast to the information-processing approach, the ecological-embodied approach has been mainly applied to the study of speech ( Best, 1995 ; Browman & Goldstein, 1992 ; Fowler, 1986 ; Galantucci, Fowler, & Turvey, 2006 ; Goldstein & Fowler, 2003 ; Kelso, Saltzman, & Tuller, 1986 ; Saltzman & Kelso, 1987 ; Saltzman & Munhall, 1989 ). The assumption of separate language and speech systems is thus preserved by default, and only speech processes are fully consistent with the tenets of an ecological-embodied approach. This entails no translation between higher level speech sound representations and lower level speech movement. Phonological forms are objects of both action and perception. These forms become increasingly elaborated when activated through self-organization rather than through planning. Thus, the flow from high to low is better conceived of as the emergence of speech form, which is mediated only by a linearized version of a nonlinear representation (i.e., a gestural score; see Figure 1 , left). The specific assumptions of each approach to speech production are elaborated further below, beginning with the action-focused ecological dynamics approach.

The Ecological Dynamics Approach

The ecological dynamics approach to speech production is best represented by articulatory phonology ( Browman & Goldstein, 1992 , and subsequent), a task-dynamic approach to articulation ( Kelso et al., 1986 ; Saltzman & Kelso, 1987 ; Saltzman & Munhall, 1989 ), and by ecological theories of speech perception ( Best, 1995 ; Fowler, 1986 ; Galantucci et al., 2006 ; Goldstein & Fowler, 2003 ) and speech sound acquisition ( Best, 1995 ; Best, Goldstein, Nam, & Tyler, 2016 ). The fundamental unit of analysis is a vocal tract constriction that serves as an articulatory attractor. This unit is known as a gesture . Gestures are linguistic primitives, similar to distinctive features in generative theory, that emerge during development under the assumption that infants acquire “a relation between actions of distinct (articulatory) organs and lexical units very early in the process of developing language” ( Goldstein & Fowler, 2003 , p. 35; see also Best et al., 2016 ). Gestures are defined as “events that unfold during speech production and whose consequences can be observed in the movements of the speech articulators” ( Browman & Goldstein, 1992 , p. 156). More specifically, they are abstract representations of “the formation and release of constrictions in the vocal tract” (ibid.), which are realized dynamically, thus giving them an event-like status. This status in turn confers intrinsic timing; that is, once activated, gestures take time to achieve a target vocal tract constriction and then time to move away from the constriction.

The assumption of intrinsic timing has a number of interesting theoretical consequences, several of which are compatible with a developmental perspective on speech production. Perhaps the most important of these consequences is in the representation of sequential articulation (see, e.g., Browman & Goldstein, 1992 ; Fowler, 1980 ; Fowler & Saltzman, 1993 ; Kelso et al., 1986 ; Saltzman & Munhall, 1989 ). Gestures, like their distinctive feature counterparts in generative phonology, are always realized as part of a larger whole (i.e., a “molecule”). However, unlike distinctive features, the wholes are not bundled up into individual phonemes that must be sequenced during the production process. Instead, gestures participate in an articulatory gestalt that is, minimally, syllable sized. Moreover, all relevant gestures associated with a lexical entry are coactivated when that entry is selected for production ( Browman & Goldstein, 1989 , 1992 ; Goldstein et al., 2006 ). Put another way, the articulatory phonology view of lexical form representations is that these are holistic and motorically based. The developmentally sensitive theory I propose shares this view of lexical representation; I also argue for holistic, perceptually based form representations.

Under the ecological-embodied assumption of cyclic action, appropriate sequencing within a word is emergent. To understand emergent sequencing, consider, for example, the coordination of a single consonantal and vocalic gesture. Consonantal gestures are intrinsically shorter than vocalic gestures. They are also phased relative to one another: If the cyclic gestures are coordinated without a phase difference, a consonant–vowel syllable emerges; if they are 180° out of phase, a vowel–consonant syllable emerges ( Browman & Goldstein, 1988 ; Goldstein et al., 2006 ; Nam, Goldstein, & Saltzman, 2009 ). These in-phase and antiphase relations are stable coordination patterns in motor systems ( Haken, Kelso, & Bunz, 1985 ; Turvey, 1990 ). Of course, languages allow for consonant or vowel sequences that complicate stable coordination dynamics (e.g., consider the English word “sixths” among many, many others). Thus, gestural timing associated with individual words may be learned during speech acquisition and incorporated into a coupling graph, which is the lexical form representation in articulatory phonology ( Goldstein & Fowler, 2003 ; Goldstein et al., 2006 ; Nam et al., 2009 ).
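The stability of the in-phase and antiphase patterns described above can be illustrated with the relative-phase equation of the Haken, Kelso, and Bunz (1985) model cited in this paragraph. The sketch below is illustrative only: the parameter values, time step, and initial phases are my assumptions, not values drawn from any speech application of the model.

```python
import math

# Relative-phase dynamics of the Haken-Kelso-Bunz (HKB) model:
#   d(phi)/dt = -a*sin(phi) - 2*b*sin(2*phi)
# With these (assumed) parameters, phi = 0 (in-phase) and phi = pi
# (antiphase) are both stable attractors of the coordination dynamics.

def hkb_relative_phase(phi0, a=1.0, b=1.0, dt=0.01, steps=5000):
    """Euler-integrate the relative phase from phi0; returns phi in [0, 2*pi)."""
    phi = phi0
    for _ in range(steps):
        dphi = -a * math.sin(phi) - 2.0 * b * math.sin(2.0 * phi)
        phi = (phi + dt * dphi) % (2.0 * math.pi)
    return phi

# A small initial offset relaxes to the in-phase attractor (near 0 rad),
# while an offset near pi relaxes to the antiphase attractor (near 3.14 rad).
print(round(hkb_relative_phase(0.3), 2))
print(round(hkb_relative_phase(math.pi - 0.3), 2))
```

In the gestural account sketched in the text, these two attractors correspond to the consonant–vowel and vowel–consonant coordination patterns, respectively.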

Note that the ecological dynamics conception of coordination also has implications for a theory of coarticulation, which is understood within this approach to speech production as coproduction (see Fowler, 1980 ). In contrast to information-processing approaches to coarticulation, dynamic formant trajectories and distributed spectral effects of rounding and nasalization and so on emerge directly from the representation; they are never due to a central executive that “looks ahead” to the next sound(s) while preparing the current one. This view of coarticulation appears to be more compatible with developmental findings on coarticulation than the information-processing view, a point to which I return later.

When words are selected for production, their coupling graphs give rise to linearized gestural scores (see, inter alia, Goldstein et al., 2006 ). These scores meet the generic definitions of both a speech plan and a motor program. They are plans in that they specify, abstractly, the relative timing and duration of specific speech actions. They are programs in that they drive these actions directly via task dynamics ( Saltzman & Munhall, 1989 ). The dynamic transformation from coupling graph to gestural score means that there is no speech planning in the ecological dynamic approach to speech production; there are only speech plans that serve also as phonological representations. I make a similar assumption in the developmentally sensitive theory proposed herein.
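As a purely illustrative data structure (not code from any task-dynamics implementation), the relationship between a coupling graph and a linearized gestural score described above might be sketched as follows; the gesture names, the cycle duration, and the simple phase-to-time mapping are all assumptions.

```python
# Hypothetical sketch: a coupling graph records each gesture's target relative
# phase (in degrees) with respect to a reference gesture; "linearizing" it
# converts those phases into activation onsets on a common timeline.

CYCLE_MS = 200.0  # assumed duration of one gestural cycle

def linearize(coupling_graph):
    """Map each gesture's relative phase to an onset time in milliseconds."""
    return {gesture: CYCLE_MS * (phase_deg / 360.0)
            for gesture, phase_deg in coupling_graph.items()}

# In-phase coupling (0 deg) yields simultaneous onsets, as in a CV syllable;
# antiphase coupling (180 deg) delays one gesture by half a cycle, as in VC.
cv_graph = {"lip_closure": 0.0, "tongue_body_vowel": 0.0}
vc_graph = {"tongue_body_vowel": 0.0, "lip_closure": 180.0}

print(linearize(cv_graph))
print(linearize(vc_graph))
```

Real coupling graphs also encode coupling strengths and competing phase targets, which the dynamics resolve at run time; this sketch keeps only the phase-to-onset idea.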

During the implementation stage of the production process, gestures represent motor goals ( Fowler & Saltzman, 1993 ; Löfqvist, 1990 ). Articulators self-organize to effect these goals. Self-organization is based in large part on functional synergies that stabilize over developmental time to become part of the motor control system (see, e.g., A. Smith & Zelaznik, 2004 ). In other words, gestures give rise to a type of functional motor unit of coordination (i.e., a “coordinative structure”). Peripheral perceptual feedback provides relevant context information to subcortical structures and the peripheral nervous system for goal achievement (see, e.g., Saltzman & Munhall, 1989 , p. 48) and to automatically compensate for perturbations (see, e.g., Abbs & Gracco, 1984 ). In this way, there is no real control over production in the sense of cortically mediated adjustments to movement direction and velocity. Whereas this view of implementation and its development can account for infant vocalizations and early speech attempts and for the overall slow development of speech motor skills, I argue below that the strong evidence from adult speech for cortically mediated control over production must be incorporated into a developmentally sensitive theory of speech production to account for phonological change through developmental time.

In summary, an ecological dynamics approach to speech production assumes an entirely feedforward process. Motor goals are articulatory and event-like and are phased relative to one another in articulatory gestalt representations that are linked to conceptual information in the lexicon. Sequential structure and coarticulatory overlap emerge from gestural dynamics. Production itself is a self-organized process. Thus, the approach eschews the concept of central control over speech production based on first principles.

The Information-Processing Approach

The information-processing approach to speech production is best represented by mainstream psycholinguistic theories of language production (e.g., Dell, 1986 ; Garrett, 1988 ; Goldrick, 2006 ; Roelofs, 1999 ), phonetically informed theories of implementation (e.g., Guenther, 1995 ; Guenther & Perkell, 2004 ; Keating & Shattuck-Hufnagel, 2002 ; Turk & Shattuck-Hufnagel, 2014 ), and by prediction-based models of speech motor control (e.g., Hickok, 2012 ; Houde & Nagarajan, 2011 ; Niziolek et al., 2013 ; Tourville & Guenther, 2011 ). In this approach, phonological representations mediate between perception and production. They are abstract and symbolic.

The phoneme—a categorical and discrete element—is often the fundamental unit of analysis in this approach. The emphasis on phonemes is due to a modeling focus on speech errors (e.g., Bock & Levelt, 2002 ; Dell, 1986 ; Garrett, 1988 ; Levelt, 1989 ; Roelofs, 1999 ), which are best described with reference to segmental structure (see also MacKay, 1970 ; Shattuck-Hufnagel & Klatt, 1979 ). These modeling efforts have led to the psycholinguistic assumption that segment sequencing is an active process during production (see, inter alia, Bock & Levelt, 2002 ; Dell, 1986 ; Garrett, 1988 ; Levelt, 1989 ; Roelofs, 1999 ). This process has come to be known as phonological encoding (see Figure 1 , right). Theories diverge on how encoding happens, but once encoded, all theories recognize that the phonemic string must be further specified before it can be used as a plan for output. In Levelt's (1989) highly influential model, the string is metrically chunked for output, allowing for specification of positional information via allophone selection; for example, the aspirated variant of the voiceless alveolar stop is chosen for tab (i.e., [tʰæb]), the unreleased variant is selected for bat (i.e., [bæt̚]), and the stop is replaced by a flap in batter (i.e., [bæɾɚ]). From a developmental perspective, the mainstream assumption of phonological and phonetic encoding complexifies speech acquisition since it predicts that infants must learn a symbolic system and the computational steps necessary to translate symbolic representations into action plans.
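The allophone-selection step attributed to Levelt's model above can be caricatured as a positional lookup. The following toy sketch uses simplified context labels of my own devising; it is not the model's actual mechanism.

```python
# Toy positional allophone selection for English /t/ (illustrative rules only).

ALLOPHONES_OF_T = {
    "stressed_onset": "tʰ",          # aspirated, as in "tab"
    "word_final_coda": "t̚",          # unreleased, as in "bat"
    "intervocalic_unstressed": "ɾ",  # flapped, as in "batter"
}

def allophone_of_t(position):
    """Return the surface variant of /t/ for a syllable position; default is plain [t]."""
    return ALLOPHONES_OF_T.get(position, "t")

print(allophone_of_t("stressed_onset"))           # tʰ
print(allophone_of_t("intervocalic_unstressed"))  # ɾ
```

The developmental point in the text is that any such encoding scheme adds a symbolic lookup-and-translation burden that the child must acquire on top of the motor skill itself.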

Once a phonological string has been phonetically encoded, it can be implemented. Implementation can mean the appropriate selection of a syllable-sized motor program from a mental syllabary (e.g., Bohland, Bullock, & Guenther, 2010 ; Guenther, Ghosh, & Tourville, 2006 ; Levelt, 1989 ) or careful specification of articulatory timing information (e.g., Keating, 1990 ; Turk & Shattuck-Hufnagel, 2014 ). Either way, discrete phones remain high-level motor goals during execution. These goals are conceived of specifically as speech sound categories (e.g., Guenther, 1995 ; Hickok & Poeppel, 2000 ; Johnson, Flemming, & Wright, 1993 ; Lindblom, 1990 ; Lindblom et al., 1972 ) or more generally as perceptual categories (e.g., Perkell, Matthies, Svirsky, & Jordan, 1995 ; Savariaux, Perrier, & Orliaguet, 1995 ; Schwartz, Boë, Vallée, & Abry, 1997 ). Importantly, the goals remain nonoverlapping even in high-frequency combinations when, through repeated practice, they may be stored together as part of a larger chunk (see, e.g., Bohland et al., 2010 , p. 1505). This view stands very much in contrast to the ecological dynamics view where chunks are articulatory gestalts composed of overlapping gestures/articulatory events. The assumption of discrete goals also requires computationally intensive accounts of coarticulation, especially long-distance coarticulation, which is explained in the information-processing approach to result either from feature spreading at an early stage of encoding (e.g., Bladon & Al-Bamerni, 1976 ; Daniloff & Hammarberg, 1973 ; Recasens, 1989 ) or from planning for the articulation of individual phones within a well-defined window during a later stage of encoding (e.g., Guenther, 1995 ; Keating, 1990 ). These accounts wrongly predict the slow development of coarticulation (see below).

Although discrete perceptual speech motor goals are problematic from a development perspective, they are posited in the information-processing approach to explain “the exquisite control of vocal performance that speakers/singers retain for even the highest frequency syllables” ( Bohland et al., 2010 , p. 1509). Exquisite control of vocal performance requires the coordination of multiple independent speech articulators through time, each of which also has many degrees of movement freedom—another developmentally unfriendly computational problem. The coordination problem is solved in the information-processing approach by assuming central perceptual feedback control over articulatory movements—an assumption for which there is now abundant evidence.

Central feedback control means cortically mediated adjustments to articulation made with reference to perceptual goals in order to achieve on-target sound production. Of course, slow central processing of perceptual feedback presents a problem for its use during real-time speech production (see, e.g., Lindblom et al., 1979; MacNeilage, 1970). Lindblom et al. (1979, p. 160) were the first to propose a viable solution to this problem. Specifically, they proposed that motor control does not rely on processing perceptual feedback per se but instead references the simulated perceptual results of planned action while execution unfolds. Lindblom et al. called this proposal predictive encoding, and with it, they foreshadowed the emphasis in current models of speech motor control where a copy of the output signal (= efference copy) is used to predict sensory outcomes (e.g., Hickok, 2012; Houde & Nagarajan, 2011; Niziolek et al., 2013; Tourville & Guenther, 2011) for error correction purposes (e.g., Tourville & Guenther, 2011) or real-time speech motor control (see, e.g., Niziolek et al., 2013). The proposal is supported by speakers' remarkable ability to correctly produce target sounds when normal articulation is disrupted.
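The logic of predictive encoding can be made concrete with a toy simulation. This is a minimal sketch, not any published model: the linear plant, the gain, and all names are invented for illustration. Control consults a forward model's prediction of the sensory outcome of a command (the efference copy) rather than waiting on slow peripheral feedback, and the prediction error drives correction.

```python
# Illustrative sketch of predictive encoding (invented values and names).

def forward_model(motor_command):
    """Toy plant model: maps a motor command to a predicted sensory value."""
    return 2.0 * motor_command  # assumed linear plant, for illustration only

def control_step(target, motor_command, gain=0.5):
    """One control cycle: compare the *predicted* outcome (from the
    efference copy) with the target and adjust the command accordingly,
    without waiting on actual peripheral feedback."""
    predicted = forward_model(motor_command)  # efference-copy prediction
    error = target - predicted
    return motor_command + gain * error

command = 0.0
target_sensory_value = 1.0
for _ in range(20):
    command = control_step(target_sensory_value, command)

# The predicted sensory outcome now matches the target:
print(round(forward_model(command), 3))
```

In this sketch the command converges on a value whose predicted sensory consequence matches the target, which is the core claim of efference-copy-based control.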

Lindblom et al. (1979) proposed predictive encoding to account for their speakers' near-instantaneous adaptation to different bite-block manipulations during vowel production. Since then, many sophisticated perturbation experiments have been conducted (e.g., Katseff et al., 2012; Lametti, Nasir, & Ostry, 2012; MacDonald et al., 2010; Savariaux et al., 1995). These experiments provide strong evidence in favor of perceptual goals and for the role of central feedback control in speech production. Consider, for example, a study by Lametti et al. (2012), which investigated the effects of different types of perceptual feedback perturbations on the repetition of a target word, head. Somatosensory feedback was disrupted by a robot arm, which tugged randomly at the speakers' lower jaw, thereby disrupting the normal articulatory path for the target /ɛ/ vowel. Auditory feedback was perturbed by altering the speakers' own F1 upward in the direction of an /æ/ vowel. This real-time alteration was sent to the speakers via headphones. The results indicated that speakers compensated to counteract the effects of perturbation and so maintain the target production of head. While the majority of speakers compensated more for auditory perturbations than for somatosensory perturbations, some speakers showed the opposite effect, and many adapted to both types of perturbations.
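The compensation logic in auditory perturbation paradigms of this kind can be sketched in a few lines. This toy loop is not the Lametti et al. procedure; the F1 target, perturbation size, and gain are invented. The speaker hears their own F1 shifted upward and gradually shifts production downward to oppose it.

```python
# Toy simulation of F1 perturbation and compensation (all values invented).

def perceived_f1(produced_f1, perturbation_hz):
    """What the speaker hears over headphones: own F1 plus the shift."""
    return produced_f1 + perturbation_hz

def adapt(produced_f1, target_f1, perturbation_hz, gain=0.3):
    """Adjust production to counteract the heard deviation from the target."""
    error = perceived_f1(produced_f1, perturbation_hz) - target_f1
    return produced_f1 - gain * error

target = 580.0   # assumed /ɛ/ F1 target in Hz (illustrative)
perturb = 100.0  # upward shift in the direction of /æ/
f1 = target
for _ in range(50):
    f1 = adapt(f1, target, perturb)

# Production drifts below the target, so the heard F1 lands back on it:
print(round(f1), round(perceived_f1(f1, perturb)))
```

The fixed point is a production shifted opposite the perturbation by its full magnitude; real speakers typically compensate only partially, which the gain and iteration count here do not attempt to capture.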

It has been argued that, whereas perturbation experiments provide evidence for error correction based on perceptual feedback, conclusions about real-time speech motor control are more dubious since the experimental findings require manipulations that create very unnatural speaking conditions (see, e.g., Guenther et al., 2006 , p. 288). Yet, the basic behavior observed in perturbation experiments—speaker adjustments based on incoming perceptual information—is also observed in phonetic imitation experiments, which are significantly more natural. Instead of participants hearing their own perturbed speech, they simply repeat words that others have produced (e.g., Babel, 2012 ; Goldinger, 1998 ; Nielsen, 2011 ; Shockley, Sabadini, & Fowler, 2004 ). Just as in the perturbation paradigm, participants are found to make fine-tuned adjustments to their own speech in the direction of the input; for example, participants' production of voice onset time (VOT) in stop production is measurably changed when shadowing exposure to stop-initial words with substantially different VOT values than their own ( Shockley et al., 2004 ). Moreover, behavior in these laboratory experiments also corresponds to the real-world language phenomenon of convergence ( Giles & Powesland, 1997 ), where interlocutors begin to sound like one another over the course of an exchange. When speakers subconsciously “converge” on a set of phonetic features during an interaction, they are demonstrating that perceptual input informs online spoken language production (see, e.g., Babel, 2012 ). Thus, speakers' behavior in contrived and natural speaking conditions provides strong evidence for the importance of perceptual feedback during speech production. The developmentally sensitive theory proposed herein is meant to accommodate this evidence.

In summary, the information-processing approach emphasizes the importance of discrete elements and so assumes executive control over sequencing and implementation. This assumption entails a role for perception in production. The evidence for online vocal–motor adjustments based on self- and other- generated auditory information is especially strong and consistent with the hypothesis of central perceptual feedback control over speech production.

Implications of Adult-Focused Theories for the Development of Speech Production

From a developmental perspective, the different approaches to speech production each have strengths and important limitations that were alluded to above. The main strength of the ecological dynamics approach is the central hypothesis that temporal relations between articulators are preserved as part of an articulatory gestalt lexical representation. This hypothesis, consistent with whole-word approaches to child phonology, provides a framework for understanding children's speech patterns. The strength of the information-processing approach is in recognizing the importance of perceptual feedback for tuning speech production. This emphasis is not only consistent with adult behavior; it also provides a powerful mechanism for learning and thus the ability to explain change over developmental time. These points are elaborated below with a focus on explaining children's speech patterns and developmental change.

Children's Speech Patterns

Child phonology is often viewed from the adult perspective, hence the description of children's speech as fronted, harmonized, simplified, and so on. Implicit is the idea of transformed adultlike representations. As long as the transformation results in a string of phonemes readied for output, speech acquisition can be handled by an information-processing approach and construed as phonemic acquisition (see Vihman, 2017 , for a review and critique of this view). When construed in this way, the learning problem is restricted to the mapping of phoneme-related speech sounds to articulatory movement. The DIVA model ( Guenther, 1995 ; Guenther et al., 2006 ) instantiates this view of speech acquisition and production. The following discussion focuses on the shortcomings of this model to convey a general, developmental critique of the information-processing approach. This focus is a testament to DIVA's influence on the field and to its status as the most complete and explicit statement of an information-processing theory of speech production. Also, the original DIVA model ( Guenther, 1995 ), though ultimately adult focused, was at least constructed to reflect the knowledge that adult behavior emerges over developmental time. This further increases the relevance of DIVA to the present discussion.

In DIVA, speech motor targets are specified as coordinates in an orosensory space. The coordinates correspond to vocal tract shapes. Speech motor goals are acoustically defined and reside in the speech sound map of the model. Linkages between the speech sound map and orosensory space are acquired during babbling. An orosensory to articulation map is established during the first phase of babbling via random articulatory movements. The speech sound map is then acquired during a second phase that relies on overt perceptual feedback to register regions in the orosensory space associated with known (i.e., perceptually acquired) language-specific sounds. Once linkages between discrete sounds and articulation have been established via orosensory space, speech production can be driven by phoneme strings that sequentially activate cells within the speech sound map.
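The two-phase learning scheme described above can be illustrated with a deliberately tiny, one-dimensional sketch. This is not the DIVA model itself; the forward mapping, the sample count, and all names are invented. Phase 1 pairs random articulations with their sensory consequences; Phase 2 links a perceptually acquired sound target to the babbled articulation whose consequence comes closest.

```python
# Toy 1-D sketch of babbling-based map learning (invented mapping and names).
import random

def vocal_tract(articulation):
    """Assumed toy forward mapping from an articulatory parameter to a
    1-D acoustic value (stands in for orosensory/auditory consequences)."""
    return 3.0 * articulation + 1.0

# Phase 1: random articulatory movements build an articulation-sound memory.
random.seed(0)
babble = [(a, vocal_tract(a)) for a in (random.uniform(0.0, 1.0) for _ in range(500))]

# Phase 2: a perceptually acquired sound target is linked to the babbled
# articulation whose acoustic consequence is nearest (a stand-in for
# registering regions of orosensory space against known sounds).
def sound_to_articulation(acoustic_target):
    return min(babble, key=lambda pair: abs(pair[1] - acoustic_target))[0]

target_sound = 2.5  # assumed language-specific acoustic target
chosen = sound_to_articulation(target_sound)
print(round(vocal_tract(chosen), 2))
```

Once such linkages exist, "production" reduces to lookup, which is precisely why a model of this shape produces arbitrary sequences as soon as babbling ends, with none of the protracted development discussed next.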

The ease with which the DIVA model can learn to produce language-specific sequences highlights a limitation of the information-processing approach to the development of speech production: It does not take seriously the slow development of speech motor skills. Production proceeds just as in the adult once the phoneme-to-sound and sound-to-articulation mappings have been established. For example, “after babbling, the (DIVA) model can produce arbitrary phoneme strings using a set of 29 English phonemes in any combination” ( Guenther, 1995 , p. 598). In this way, DIVA's behavior is obviously at odds with real development. Child phonological patterns such as gliding ( leg ➔ weg, bread ➔ bwead ), stopping ( feet ➔ peet, house ➔ hout ), epenthesis ( sleep ➔ se-leep, green ➔ ge-reen ), and cluster simplification ( clean ➔ keen, stop ➔ top ) often persist until the school-age years ( Stoel-Gammon & Dunn, 1985 , pp. 43–46).

Although child phonological patterns can be explained within the information-processing approach by positing grammatical rules that constrain sequencing (see, e.g., Kager, Pater, & Zonneveld, 2004, and the contributions therein), the assumption that children learn via perceptual feedback to produce discrete perceptual goals in sequence incorrectly predicts that young children produce speech that is less coarticulated than adult speech (see, e.g., Guenther, 1995; Kent, 1983; Tilsen, 2014). Guenther (1995, p. 617) cites Thompson and Hixon's (1979) study on anticipatory nasal coarticulation in support of this prediction. However, the vowel midpoint measure used in that study assumes static phonemic targets that are achieved at the middle of an acoustic interval rather than the dynamic specification of movement. Flege (1988) took a different approach and measured the duration of nasalization across the entire vowel in child and adult speech. His results showed that children and adults both open "the (velar-pharyngeal port) long before the lingual constriction for word-final /n/" (p. 533). Moreover, when vowel duration was controlled, Flege found no significant differences in the degree to which children and adults engaged in anticipatory behavior.

Guenther (1995) also cites Kent's (1983) chapter to argue that children's speech is more segmental than that of adults. This was Kent's contention, but it was not rigorously demonstrated. Instead, Kent made a qualitative comparison of F2 trajectories in 4-year-old children's and adults' production of spoken phrases. He discussed the F2 patterns in the spectrograms provided and noted that children's vowel productions appeared to be less influenced by adjacent consonantal articulations than adults' vowel productions. I found something similar in an acoustic investigation of unstressed vowels produced by 5-year-olds, 8-year-olds, and adults (Redford, 2018), but I also found that anticipatory V-to-C effects on F1 were stronger in children's speech than in adults' speech.

In fact, findings from recent ultrasound studies on coarticulation in children's and adults' speech strongly suggest that children's speech is more coarticulated than adults' speech (Noiray, Abakarova, Rubertus, Krüger, & Tiede, 2018; Noiray, Ménard, & Iskarous, 2013; Zharkova, Hewlett, & Hardcastle, 2011, 2012; but see Barbier, 2016, for an alternative view). For instance, Zharkova et al. (2011) used ultrasound to investigate C-to-V coarticulation in school-aged children's and adults' production of /ʃV/ syllables in the frame sentence "It's a __ Pam." They found that children's productions of the palato-alveolar fricative were more influenced by the following vowel than adults' productions (see also Zharkova et al., 2012). Noiray et al. (2018) studied coarticulation degree across a wider age range and more consonantal and vocalic contrasts. Their results showed that coarticulation degree becomes weaker with age. In particular, they found that preschool children's articulations of labial, alveolar, and velar stop consonants were all more influenced by the following vowel than school-aged children's articulations of these consonants and that coarticulation degree was stronger in school-aged children's productions than in adults' productions. These and other similar results are opposite the prediction from the information-processing hypothesis that phonemes provide a basis for speech acquisition and production.

In contrast to the information-processing approach, the ecological dynamics approach to speech production predicts that children's speech is more coarticulated than adults' (Nittrouer, 1993, 1995; Nittrouer, Studdert-Kennedy, & McGowan, 1989; Nittrouer, Studdert-Kennedy, & Neely, 1996; see also Noiray et al., 2018, 2013). For example, Nittrouer (1995) hypothesized that children's early word productions are articulatory gestalts and that "the emergence of mature production skills involves two processes: differentiation and tuning of individual gestures, and improvement in coordination among gestures that compose a word" (p. 521). The hypothesis aligns well with a functional approach to child phonology, which emphasizes the communicative intent behind spoken language production and so argues for word-based analyses of children's speech sound patterns (e.g., Ferguson & Farwell, 1975; Menn, 1983; Stoel-Gammon, 1983; Vihman, 2017; Vihman & Croft, 2007; Vihman, Macken, Miller, Simmons, & Miller, 1985; Waterson, 1971). In fact, Nittrouer et al. (1989, pp. 120–121) explicitly motivated their prediction that children's speech is more coarticulated than adults' with reference to two of the articles that first introduced the idea that child phonology should take the word as its principal unit of analysis (see "setting papers" in Vihman & Keren-Portnoy, 2013). Following Ferguson and Farwell (1975), they suggested that a child's failure to appropriately generalize correct phonetic forms (e.g., [n] and [m]) from one word to another (e.g., "no" is [noʊ], but "night" is [mɑɪt], whereas "moo" is [buː]) indicated that whole words, rather than phonemes, were the targets of acquisition and also the units of production. Nittrouer et al. also referred to Ferguson and Farwell's observation of children's variable word realizations to argue for an account of word form representation as a "collection of gestures" that were inappropriately timed and so genuinely more gestalt-like than segment-like. Finally, they cited Menn's (1983) analysis of consonant harmony in her son's first words to make a point about the existence of "articulatory routines" for word production.

In summary, children's speech patterns are more compatible with the hypothesis of whole-word production than with the hypothesis of phonemic, or segmental, production. In so far as the systematic patterns of child phonology can also be explained to emerge from motoric constraints (see, e.g., Davis, MacNeilage, & Matyear, 2002 ; Locke, 1983 ; McCune & Vihman, 1987 ), the ecological dynamics emphasis on action-based representations is also more compatible with children's speech patterns than the information-processing emphasis on sequencing constraints derived from a child-specific grammar. For this reason, I deem holistic motoric word form representations fundamental to a developmentally sensitive theory of speech production.

Explaining Phonological Change Over Developmental Time

As in Redford (2015), the specific proposal is that children begin to acquire holistic motoric representations, or schemas, with their attempts at first words. These schemas then provide the basic speech plan for future word productions. This proposal raises the developmental question: How do schema representations change over time as children's speech becomes more and more adultlike? Here, I argue that the information-processing assumption of separate perception and production systems is required to account for developmental change. To make this argument, let us first consider development from the ecological dynamics perspective.

In an ecological dynamics approach, learning is an attunement process ( Goldstein & Fowler, 2003 ; Studdert-Kennedy, 1987 ). Unsuccessful communication destabilizes representations that encode timing relations between gestures, forcing a random walk through motor space until the word-specific timing patterns have been discovered (see, e.g., Nam et al., 2009 ). This mode of phonological learning implies that the temporary but systematic patterns of child phonology represent local minima in the random walk. This implication is consistent with articulatory constraint-based explanations for these patterns (e.g., Davis & MacNeilage, 2000 ; Davis et al., 2002 ; Locke, 1983 ; McCune & Vihman, 1987 ). However, similar to the constraint-based explanations, the assumption of a self-organized system based on dynamic principles predicts a universal pattern of speech development, albeit one that interacts in predictable ways with the target language. This prediction is undermined by the strong individual differences in speech development that are observed within a language (e.g., Ferguson & Farwell, 1975 ; Macken & Ferguson, 1981 ; Stoel-Gammon & Cooper, 1984 ; Vihman, Ferguson, & Elbert, 1986 ).
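The random-walk picture of attunement sketched above can be caricatured in a few lines of code. This is purely illustrative and not drawn from any cited model: the one-dimensional timing parameter, the adult target value, and the cost function are all invented. A step through timing space is kept only when it stabilizes (improves) the match to the ambient pattern.

```python
# Illustrative caricature of attunement as a hill-climbing random walk
# through a 1-D timing space (all values and the cost function invented).
import random

def mismatch(timing):
    """Assumed communicative cost: distance from the adult timing pattern."""
    adult_timing = 0.8
    return abs(timing - adult_timing)

random.seed(1)
timing = 0.2  # initial, childlike timing value
for _ in range(2000):
    candidate = timing + random.uniform(-0.05, 0.05)
    if mismatch(candidate) < mismatch(timing):  # keep only stabilizing steps
        timing = candidate

print(round(timing, 2))
```

On this picture, the temporary but systematic patterns of child phonology would correspond to local minima visited along the walk; the sketch has a single global minimum and so cannot show that, which is part of the point made next about its overly universal predictions.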

Ferguson and Farwell (1975) were among the first to take individual differences in development seriously and to propose, in effect, that these signal the child's control over the speech production process. The specific suggestion was that children select word forms from the adult language that they are able to produce. Word selection implies a kind of insight into the production process meted out by an executive controller—an implication that is anathema to the ecological dynamics approach. McCune and Vihman (1987 , 2001) better defined the “what” of what children are able to produce when they proposed that children build up a unique set of vocal motor schemes during babbling based on individual preferences for particular patterns. Vihman (1996) then recast the notion of selection with respect to these schemas. She proposed that a schema acted as a kind of “articulatory filter” that “selectively enhances motoric recall of phonetically accessible words” (p. 142). Elsewhere, Vihman (2017) refers to resonances between the production and perception systems to explain the selective memory for phonetically accessible words. In this way, Vihman is able to explain individual differences in words and forms attempted while avoiding the homunculus problem inherent to the concept of an executive controller.

Although the idea of an articulatory filter very much implies interactions between action and perception, the specific theory of perception Vihman adopts is very clearly not a direct realist one; for example, elsewhere, Vihman is interested in the role of perceptual saliency in children's development of lexical representations (e.g., Vihman, Nakai, DePaolis, & Hallé, 2004). The notion of perceptual saliency relies on the psychoacoustic theory of speech perception that undergirds the information-processing approach to speech production, that is, a theory of perception in which the perceptual primitives are "intrinsically meaningless, simple acoustic features, such as spectral distribution patterns, bursts of band-limited aperiodic noise … into which the speech signal can be analyzed" (Best, 1995, p. 175). Why does Vihman adopt this theory? Probably because a psychoacoustic theory of speech perception provides targets of acquisition that go beyond a child's immediate abilities and so allow for directed motor learning and change (see also Menn, Schmidt, & Nicholas, 2013). More generally, a psychoacoustic theory of speech perception explains a wider variety of speech-related phenomena than a direct realist theory; for example, it accounts for categorical perception in nonhuman animals and why auditory processing constraints appear to affect the structure of phonological systems (see Diehl, Lotto, & Holt, 2004, for a review).

In summary, the observation that individual children take very different paths to acquire the same spoken language suggests a developmental process more compatible with the information-processing assumption of distinct perception and production systems than with the ecological dynamics assumption of a unified perception–action system. The developmentally sensitive theory of speech production described below further assumes that distinct production and perception systems entail a role for central perceptual feedback control in speech production.

A Developmental Approach to Speech Production

The developmentally sensitive theory of speech production outlined in this section extends the basic idea, first outlined in Redford (2015) , that adult speech production processes and representations are structured by the acquisition of spoken language. The alternative view, implicit in mainstream theory, is that adult speech production processes and representations are the targets of spoken language acquisition. As in Redford (2015) , the theory assumes that the fundamental unit of production is a word. This assumption follows from the view that “the child's entry into language is mediated by meaning: and meaning cannot be conveyed by isolated features or phonemes” ( Studdert-Kennedy, 1987 , p. 51). Similar to an ecological dynamics approach, endogenous representations are assumed to be holistic and action based. As in Redford (2015) , I call these representations schemas, not gestural scores or coupling graphs, to acknowledge borrowing from Vihman and McCune's theoretical work on child phonology ( McCune & Vihman, 1987 , 2001 ; Vihman & McCune, 1994 ) and debts to schema theory in the area of skilled action and motor control ( Arbib, 1992 ; Cooper & Shallice, 2006 ; Norman & Shallice, 1986 ; Schmidt, 1975 ). These acknowledgments also signal the aforementioned embrace of certain information-processing assumptions, namely, that production and perception are distinct processes and that adults implicitly predict perceptual outcomes and use perceptual feedback to make articulatory (and whole-word) adjustments while speaking.

In addition to building on these assumptions, the developmentally sensitive theory outlined here emphasizes two distinctions: (a) the distinction between others' productions and self-productions and (b) the distinction between self-productions for oneself and self-productions for others. Self-productions provide a basis for endogenous representations. When these are for oneself, they are assumed to be exploratory and so free from association with conceptual information. In this way, they provide the basis for the nonlinguistic perceptual–motor map that is used to integrate exemplar and schema representations for production. When self-productions are for others, they are assumed to be communicative and associated with conceptual information. In this way, they provide the basis for schemas. In contrast to self-productions, others' productions provide the basis for just one type of representation—an exogenous perceptual representation associated with conceptual information. I will call this representation a perceptual exemplar . This label acknowledges inspiration from a class of phonetically informed phonological theories that emphasize the importance of detailed, often word-specific, acoustic–phonetic information for production (e.g., Johnson, 2007 ; Pierrehumbert, 2002 ). Perceptual exemplars provide production targets. A child cannot even attempt first words without having acquired at least a few of these from the ambient language.

The foundational assumptions enumerated above entail speech plan representations that are different from either the ecological dynamics or information-processing approaches to speech production. They also entail a different approach to phonology than the ones alluded to so far. Otherwise, the developmentally sensitive theory proposed here borrows heavily from current models of speech production and motor control. It contributes to the field by accounting for the transition from prespeech to adultlike speech in a series of steps that correspond to major developmental milestones.

Step 1: The Perceptual–Motor Map

As in an information-processing approach to speech production, a developmental approach requires a perceptual–motor map, specifically a mapping between auditory speech and articulatory movement that is likely mediated by somatosensory information (e.g., Guenther, 1995 ; Guenther et al., 2006 ; Perkell et al., 1993 ). The existence of a perceptual–motor map is supported by neuropsychological findings on sensorimotor integration in different regions along the auditory dorsal stream pathway from the primary auditory cortex (= superior temporal gyrus, superior temporal sulcus) to the anterior premotor cortex (= inferior frontal gyrus; see Hickok & Poeppel, 2007 ). It is common to assume that the perceptual–motor map develops during the first year of life as infants engage in vocal exploration (e.g., Davis & MacNeilage, 2000 ; Guenther, 1995 ; Hickok, Buchsbaum, Humphries, & Muftuler, 2003 ; Kuhl, 2000 ; Menn et al., 2013 ). Following Oller (2000, pp. 165–179) , I will assume that this exploration includes all prespeech vocalizations from cooing to squealing to babbling and so describes the mapping of continuous acoustic and motor dimensions, with somatosensory information at the intersection of these two. For example, it associates the frequency sweeps of squealing with continuous changes to the length and tension of the vocal folds and the amplitude-modulated frication of raspberries with the forcing of air through loosely coupled lips. It also associates static sounds, such as silence, to transient actions in the vocal tract, such as a briefly sustained oral or glottal closure. This view of the perceptual–motor map enables the gestural interpretation of acoustic form (cf. Best, 1995 ; see also Hickok, 2012 , 2014 ) and so can take holistic representations as input.

Although the map develops during the prespeech period of infant vocalization, it is important to stipulate that it continues to evolve with the acquisition of speech motor skills and across the life span with the acquisition of new languages and with conformity to or disengagement from the sociolinguistic environment (see Kuhl, Ramírez, Bosseler, Lin, & Imada, 2014 , for a related view). In the context of the current theory, this assumption is required to explain developmental changes that are traditionally attributed to the phonology, that is, the evolution of word forms from childlike to more adultlike. This is because the perceptual–motor map provides a source for the abstract action-based word form representations that are schemas, as described below.

Step 2: Perceptual Word Forms and Action Schemas

Children's first words mark the onset of speech production. Word production depends on conceptual development, including the insight that adult vocalizations are referential. This insight, which occurs perhaps as early as 7 months of age ( Bergelson & Swingley, 2012 ; Harris, Yeeles, Chasin, & Oakley, 1995 ), coincides with the acquisition of perceptual word forms—exemplars—from the ambient language. Bergelson and Swingley (2012) provided evidence for this claim when they used eye tracking to assess 6- to 9-month-old infants' ability to comprehend familiar nouns by discriminating between paired pictures while listening to spoken stimuli (e.g., “Can you find the X ?” and “Where's the X ?”). The authors reported that infants as young as 6 months of age were reliably able to discriminate a significant number of the pairs. Note that, by most accounts, perceptual attunement to the native language occurs between 6 and 10 months of age (see Vihman, 2017 , for a review). Bergelson and Swingley therefore interpreted the finding to indicate that learning the sounds of a language goes hand in hand with learning its vocabulary.

At around 12 months of age, the infant has acquired both a reasonably stable perceptual lexicon and a perceptual–motor map. The production of first words is now possible. This heralds the onset of speech production, which is imagined here as the moment when the infant, motivated to communicate a specific referential meaning, uses her perceptual–motor map to translate an exogenously derived perceptual exemplar into vocal action. As in Redford (2015) , I assume that the motor routines an infant first uses to convey a particular concept are abstracted and associated with that concept when the child has succeeded in communicating the intended meaning. This abstraction is the schema. Similar to gestural scores, schemas encode routine-specific relational information between articulators across time, for example, tongue advancement during jaw opening. Similar to coupling graphs, they are the action-based word form representations. Put another way, schemas are both the phonological representation and speech plan for a given word/concept, where word is broadly construed as any conventionalized form–meaning association that is part of the child's repertoire (e.g., “uh oh” or “gimme” for “give me”). Figure 2 depicts first word production and schema abstraction.

Figure 2.

The onset of speech coincides with attempts to produce specific meanings (i.e., concepts) associated with perceptual word forms learned from the ambient language (left). Specifically, infants engage their perceptual–motor map to derive a best motoric approximation of the exogenous perceptual form or “perceptual exemplar.” The shape of the approximation will depend on how the map has been warped through vocal exploration, which itself is constrained by motor development. The motor routines used to convey specific concepts are abstracted and stored during production (right). These abstractions, or “motor schemas,” are associated with the concept attempted and so serve as one half of the phonological representation of a word. Solid lines with arrows represent feedforward processes; dotted lines with arrows represent feedback processes.

Schemas are continually updated with production. This means that they become more abstract over time as a one-to-one relationship with a single motor routine gives way to timing generalizations that are common to all attempts at a particular word. Note that the protracted development of articulatory timing control, which results in highly variable speech output, ensures that the schema-encoded generalizations become abstract quite quickly. Ultimately, schemas may encode little else than the number of syllables as iterations of the open–close cycle of the vocal tract and the relative durations of these cycles, plus the initial posture and direction of major articulators for each cycle. This hypothesis is consistent (or at least reconcilable) with evidence for serial timing control and frame-based plans generated in the supplementary motor area and the pre–supplementary motor area, respectively, during adult speech production (see, e.g., Bohland & Guenther, 2006; MacNeilage, 1998).
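The abstraction step described above can be sketched as a simple reduction over variable productions. This is a hypothetical illustration, not a claim about the actual representation: the per-syllable duration data are invented, and only two of the proposed schema components (syllable count and relative cycle durations) are modeled.

```python
# Hypothetical sketch of schema abstraction over variable productions
# of one word (invented data; only syllable count and relative durations).

def abstract_schema(productions):
    """Reduce variable productions (per-syllable durations in ms) to a
    schema: the shared syllable count plus averaged relative durations."""
    n_syllables = len(productions[0])
    assert all(len(p) == n_syllables for p in productions)
    # Normalize each production to relative durations, then average columns.
    relative = [[d / sum(p) for d in p] for p in productions]
    mean_rel = [sum(col) / len(col) for col in zip(*relative)]
    return {"syllables": n_syllables, "relative_durations": mean_rel}

# Three variable attempts at a trochaic two-syllable word:
attempts = [[300, 150], [400, 200], [260, 140]]
schema = abstract_schema(attempts)
print(schema["syllables"], [round(r, 2) for r in schema["relative_durations"]])
```

The point of the sketch is that absolute durations, which vary wildly across attempts, drop out, while the shared frame (two cycles, long-short) survives as the generalization.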

Step 3: Onset of Perceptually Based Control

Once schemas are abstracted, they are activated with the perceptual form when a concept is selected for production. The motor and perceptual forms are integrated in the perceptual–motor map. Hickok, Houde, and Rong (2011 , p. 413) adopt a similar hypothesis, albeit with an emphasis on sensorimotor integration at the level of phoneme production. They note that the hypothesis “is consistent with Wernicke's early model in which he argued that the representation of speech, e.g., a word, has two components, one sensory (what the word sounds like) and one motor (what sequence of movements will generate that sequence of sounds).” Wernicke's exact hypothesis of dual word form representations is adopted here to explain both why child forms deviate from adult forms and how the forms change over time.

With respect to children's deviant forms, schemas are assumed to initially weight production in such a way that it appears motorically constrained. The weighting is the result of a very small productive vocabulary, which serves to entrench particular trajectories through motor space. For a while, this entrenchment may even limit the child's ability to form new motor trajectories. At this stage, children's productions of novel words may appear more template-like than in first word production. In Vihman and Croft's (2007 , p. 696) words, “the child (implicitly) impos(es) one or more preexisting templates, or familiar phonological patterns, on an adult form that is…similar to those patterns.”

Around 18 months of age, significant vocabulary expansion results in a developmental shift away from forms that suggest production constraints and toward those that suggest perceptual ones due to increasing homophony among expressive word forms ( Redford & Miikkulainen, 2007 ). This shift heralds the next critical step in the evolution of speech production: a newfound focus on how self-productions should sound. The onset of predictive encoding (state feedback control) emerges from this focus.

In particular, the proposed process by which the 18-month-old infant begins to forge new paths through motor space takes as its inspiration the hierarchical state feedback control model of production ( Hickok, 2012 , 2014 ; Hickok et al., 2011 ), where state feedback control is described as having two functions. The first is to adjust motor commands so that the articulators reach desired perceptual targets; the second is to use external feedback to update the representations that guide speech. In the present proposal, both functions are thought to emerge with a communication-driven shift in production toward better matching of endogenously derived motor forms to exogenously derived perceptual forms. Furthermore, Function 2 is proposed to drive Function 1 in that Function 1 may begin as a delayed comparison between the perceptual trace of a production and the intended target, absent any motor adjustments (see Figure 3 ).
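The proposed starting point for Function 1, a delayed comparison absent any motor adjustment, can be sketched as follows. This is a toy illustration under the assumption that perceptual traces and targets can be represented as feature vectors; none of the names come from the source model.

```python
# Hypothetical sketch of the "delayed comparison" stage: the perceptual trace
# of a completed production is compared with the intended perceptual target
# only after the fact, with no online adjustment of the ongoing articulation.

def delayed_comparison(perceptual_trace, perceptual_target):
    """Return a per-feature discrepancy, computed only after production ends."""
    return [trace - target
            for trace, target in zip(perceptual_trace, perceptual_target)]

trace = [0.8, 0.2, 0.5]    # perceptual consequences of the child's production
target = [1.0, 0.0, 0.5]   # exogenously derived perceptual exemplar
discrepancy = delayed_comparison(trace, target)
# The discrepancy is available for later learning (Function 2), but is not
# fed back into the motor commands in real time (Function 1 absent
# motor adjustments).
```

The developmental claim is then that real-time use of this same comparison emerges later, once the matching process has built the bidirectional connections described below.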

Figure 3. Following early word production, the next major developmental change is hypothesized to occur when motorically driven homophony begins to threaten the young child's ability to effectively communicate. At this stage, the child begins to focus on how words should sound. As a result, production shifts from an entirely feedforward process to one where feedforward routines are adjusted to match perceptual representations. The adjustment process, carried out through interactions between the endogenous perceptual–motor map and the repository of exogenous word form representations or "perceptual exemplars," sets the stage for state feedback control, which nonetheless begins with a delayed comparison between the perceptual trace and target—absent adjustment (left). Solid lines with arrows represent feedforward processes; dotted lines with arrows represent feedback processes.

How might a delayed matching process evolve into real-time state feedback control? One possibility is that the matching process creates a bidirectional connection between the exogenously derived exemplar targets and the perceptual–motor map, where the connections between motor routines and perceptual patterns are already robust and bidirectional. Now, the perceptual outcomes of schema-associated routines can be matched in real time against perceptual exemplars. Any discrepancies between the expected self-outcomes and other-based representations could force new paths through motor space by stretching entrenched motor routines in the direction of the exogenously derived perceptual form.
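The idea that discrepancies "stretch" entrenched motor routines toward the exogenously derived perceptual form can be sketched as a simple interpolation update. This is an assumed formalization for illustration: the routine and the mapped target are treated as points in a common space, and the step size is arbitrary.

```python
# Hypothetical sketch of stretching an entrenched motor routine toward an
# exogenously derived perceptual form, assuming both can be expressed in a
# common (mapped) space. The rate parameter is illustrative, not theoretical.

def stretch_routine(motor_routine, mapped_target, rate=0.1):
    """Move each point of the routine a fraction `rate` toward the target."""
    return [m + rate * (t - m) for m, t in zip(motor_routine, mapped_target)]

routine = [0.2, 0.9, 0.4]   # entrenched trajectory through motor space
target = [0.5, 0.5, 0.5]    # perceptual exemplar, mapped into the same space
for _ in range(3):          # repeated productions entrench the new path
    routine = stretch_routine(routine, target)
```

The point of the sketch is that each production nudges the routine without ever abandoning it wholesale, consistent with the gradual deformation of entrenched paths described in the text.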

Step 4: Self-Monitoring

Speech production does not become adultlike until children begin to externally monitor their own speech and consciously recognize its divergence from (chosen) adult norms. The evidence suggests that this may not occur until around the age of 4 years. In particular, feedback perturbation experiments with young children suggest that perceptual input plays little role in speech production before the age of 4 years; for example, toddlers neither compensate immediately nor adapt over time by adjusting their vowel productions when hearing spectrally perturbed versions of their own speech during a word production task (MacDonald, Johnson, Forsythe, Plante, & Munhall, 2012). At the age of 4 years, children begin to compensate but do not adapt over the long term to perturbed feedback (MacDonald et al., 2012; Ménard, Perrier, Aubin, Savariaux, & Thibeault, 2008); for example, Ménard et al. showed that 4-year-old children return immediately to preferred productions after compensating online to an articulatory perturbation. Failures to adapt suggest that, although 4-year-old children may use auditory information to help guide speech production, they do not yet use external feedback to update existing production representations and processes. Still, the ability to adapt appears to emerge soon after 4 years of age in typically developing children (Terband, Van Brenk, & van Doornik-van der Zee, 2014).

Psycholinguistic evidence is consistent with the hypothesis that self-monitoring emerges late in the preschool years during spoken language development. For example, preschool children understand unfamiliar adult speech better than their own unadultlike speech (Dodd, 1975). In addition, self-initiated speech repairs increase over developmental time, with many fewer repairs observed in the speech of 5-year-old children than in the speech of older school-aged children (Evans, 1985; Rogers, 1978). Moreover, if we imagine the self-monitoring process as one where speakers must identify particular discrepancies between what they intended to produce and what they actually produced, then its slow development is consistent with the slow development of selective attention (see, e.g., Plude, Enns, & Brodeur, 1994; Wellman, Cross, & Watson, 2001). The speculation here is that selective attention to one's own speech is also motivated by a developing self-concept. When children begin to appreciate those aspects of their own speech that signal an undesired social distance from others, they shift their attentional focus to identifying discrepancies between how they sound and who they want to sound like. This motivates a final marked disruption of entrenched motor routines in service of better approximating the exogenously derived exemplars.

Self-concept emerges with theory of mind during the preschool years (see Symons, 2004). Self-identity, which is part of the self-concept (Baumeister, 1999; Gecas, 1982), manifests in speech with socio-indexical marking. For example, voice onset time (VOT) for stops varies differently as a function of gender across languages (Li, 2013; Oh, 2011; Whiteside & Irving, 1998), suggesting social rather than physiological reasons for this speech production difference. How does the child acquire female- versus male-gendered speech? The suggestion here is that a burgeoning sense of identity leads the child to selectively attend to those adult productions he or she is most interested in approximating. In identifying a discrepancy between how they sound and who they want to sound like, children may highlight exemplars associated with those individuals, thereby highlighting aspects of the perceptual form that need special attention in production. At the same time, self-monitoring focuses more attention on the perceptual consequences of one's own speech, which further increases the weight of exemplars in the production process, thus pushing motor routines and the resulting schemas ever more in the adult direction (see Figure 4).

Figure 4. During the preschool years, children begin to self-monitor based on external perceptual feedback to identify deviations between how they sound and who they want to sound like. The perceived deviations highlight aspects of the stored perceptual representations, driving the perceptual–motor mapping and resulting endogenous motoric representations (i.e., schemas) ever more toward matching exogenous perceptual goals (i.e., exemplars). Solid lines with arrows represent feedforward processes; dotted lines with arrows represent feedback processes.

Thus, the full proposal is that, during the preschool years, socially directed listening induces changes in speech production through a self-monitoring–led shift toward perceptually weighted production. Prior to this point, self-productions are (unconsciously) heard as being the same as other productions. Consider, for example, the toddler who points to a picture of a fish in a picture book and utters "fifth," to which the parent responds "fifth?" and the child answers, "No, fifth!" (see Menn, 1983). Updates to both the perceptual–motor map and schema representations follow from this shift, soon resulting in adultlike representations. This proposed final stage in the development of speech production is consistent with the evidence that socio-indexical information, such as gender-specific use of phonetic features, begins to emerge in children's speech around the age of 4 years (see Foulkes & Docherty, 2006, pp. 422–424). This brings us back to an earlier observation that closes the gap between work on speech motor control and real-world speaker behavior: participants' behavior in auditory feedback perturbation experiments resembles phonetic convergence, normally understood as a socially driven behavior that lubricates interactions between interlocutors.

Current approaches to speech production aim to explain adult behavior and, in so doing, frequently make at least some assumptions that, when taken to their logical conclusion, fail to adequately account for how the system develops. This failure is problematic from a developmental perspective. According to this perspective, the representations and processes of adult speech and language should emerge from the developmental process (for a similar view, see Menn et al., 2013 ; Vihman & Croft, 2007 ).

Development is particularly relevant for theories of speech production because of the paradox of early speech onset despite slowly developing speech motor control. Here, this paradox was taken to suggest the working hypothesis that feedforward processes mature earlier than central feedback control processes in speech production. This hypothesis structured a developmentally sensitive theory of speech production that was elaborated in stages, with each stage building on the previous one. The stages proposed were designed to accommodate developmental patterns. At the same time, developmental patterns were given new meanings and grouped in novel ways by the working hypothesis. The accommodation of speech production theory to developmental findings and vice versa results in many new testable hypotheses that could motivate future empirical work and usher in new knowledge and even new clinical practice. For example, the hypothesis that perceptual–motor integration relies on the development of a nonlinguistic perceptual–motor map suggests that therapeutic uses of speech sound practice should cover as broad a range of sound combinations as possible. By hypothesis, these sound combinations need not be tied to lexical content and so the therapy could involve a fun and silly random sound sequence–generating game using, say, magnetic letters that could be arranged and then rearranged on a board. Such a game would allow the set of possible sound combinations in a language to be more fully explored than is possible when that set is constrained by picturable words in the language. The benefits of this therapy for generalization to novel or known word production could be tested against current therapies where speech sound practice typically involves the use of visual props to elicit specific lexical items. 
Intriguingly, this idea echoes, to some extent, Gierut's (2007) differently motivated contention that words with complex speech sound sequences allow for better generalization of treatment in children with phonological disorder than words that have simple phonological structure.

The hypothesized disassociation of the perceptual–motor map and perceptual exemplar representation of word forms also has implications for the clinical assessment of speech sound disorder. For example, when this hypothesis is taken together with the idea that articulatory change is motivated by weighting perceptual exemplar representations more heavily during production, it suggests that the aforementioned fun and silly random sound sequence–generating game could be used to supplement a comprehensive evaluation of speech sound disorder. Performance in the game could help diagnose whether the articulation problem is due to a poorly developed perceptual–motor map or to poorly specified perceptual exemplars. The diagnosis would then lead to therapy that focuses either on speech sound practice or on developing perceptual exemplars. Finally, the theory-dependent hypothesis that perceptual weighting of production is driven in part by the emergence of a self-concept and the ensuing selective attention to self-productions suggests not only a testable hypothesis regarding the development of convergence behaviors in spoken language interactions but also a novel way to understand the absence of convergence behaviors and mild segmental speech sound disorders in individuals on the autism spectrum.

Another major implication of the developmentally sensitive theory elaborated in this review article is a new adult model of speech production. This model, illustrated in Figure 5, incorporates insights from many existing theories. Some of these insights were explicitly acknowledged in the preceding text; others were merely implied. For example, the reference to "self-monitoring" indicates an acceptance of the evidence in favor of this well-established hypothesis (see Postma, 2000, for a review). Otherwise, the model diverges from most adult-focused theories in assuming distinct action- and perception-based representations (though see Hickok, 2012, 2014). This aspect of the model provides a framework for understanding phenomena that have been traditionally ignored in adult-focused theories of speech production. For example, the model very obviously allows for the different possible speaking modes that are thought to correspond with speaking style differences: one mode wherein the motor pathway is emphasized over the perceptual pathway, Lindblom's (1990) hypo- or system-oriented mode; one wherein the reverse occurs, Lindblom's hyper- or output-oriented mode (shown); and one wherein the two pathways are in equilibrium, which is likely the default mode.

Figure 5. The adult model of speech production implied by the developmental model outlined in this review article. Solid lines with arrows represent feedforward processes; dotted lines with arrows represent feedback processes. The linkages between the repository of lexical concepts and motor schemas and between lexical concepts and perceptual exemplars represent the conceptual and phonological aspects of the lexicon.

The implied adult model shown in Figure 5 also diverges from information-processing theories in assuming that holistic phonological representations serve as speech plan representations. This developmentally sensitive aspect of the model is not immediately compatible with the evidence for sublexical units in productions, including the speech error data that have long been used to argue for the psychological reality of a phonological encoding process. The developmentally sensitive adult model automatically fails if it cannot account for these data. Accordingly, we are currently pursuing the hypothesis that discreteness emerges at the level of the perceptual–motor map ( Davis & Redford, 2019 ). More specifically, we have formally defined the perceptual–motor map as a linked set of experienced perceptual and motor trajectories that are time-based excursions through speaker-defined perceptual and motor spaces. By hypothesis, nodes appear where motor trajectories intersect in motor space, creating perceptually linked node-delimited paths that can be recombined. Though weighted in the direction of already experienced paths, exemplar-driven novel word production picks new trajectories through motor space by deforming existing node-delimited paths in systematic ways. These new trajectories may intersect existing trajectories or go on to be intersected themselves. In this way, motor space is reticulated with vocabulary acquisition, and discrete speech motor goals emerge absent discrete phonological representations. In future work, we will investigate how this view of discreteness might account for the speech error data. Our initial hypothesis is that these arise from the competing motoric and perceptual pressures of schema and exemplar integration during speech production.
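The hypothesized reticulation of motor space can be sketched as a small graph construction: trajectories are sequences of points, and points shared by two or more trajectories become nodes that delimit recombinable paths. The discretized points and word labels below are toy assumptions, not part of the Davis and Redford (2019) formalization.

```python
from collections import defaultdict

# Hypothetical sketch of the perceptual-motor map as linked trajectories:
# where motor trajectories intersect, nodes appear, delimiting paths that
# can be recombined. Points are toy discretized positions in motor space.

def find_nodes(trajectories):
    """Points visited by two or more distinct trajectories become nodes."""
    visitors = defaultdict(set)
    for name, path in trajectories.items():
        for point in path:
            visitors[point].add(name)
    return {p for p, names in visitors.items() if len(names) > 1}

trajectories = {
    "word1": [(0, 0), (1, 1), (2, 2)],
    "word2": [(2, 0), (1, 1), (0, 2)],  # intersects word1 at (1, 1)
}
nodes = find_nodes(trajectories)
# Node-delimited segments on either side of (1, 1) can recombine into novel
# trajectories: discrete units emerge absent discrete phonological plans.
```

As vocabulary grows, more trajectories intersect, so the node set grows and motor space becomes progressively reticulated, which is the sense in which discreteness is claimed to emerge from acquisition rather than from stored phonological units.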

Theories of spoken language production provide frameworks for understanding developmental speech sound disorders. Even the distinction between motor speech, articulation, and phonological disorders reflects this fact. In so far as the types of interventions chosen to address a disorder follow from how the disorder is understood, theory informs practice. This is as it should be. However, the relationship between theory and practice should also motivate a reconsideration of theory when it fails to address a problem that is relevant to practice. The problem of development clearly falls into this category. A major aim of this review article was to show that current adult-focused approaches to speech production fail to address the paradox of slowly developing speech motor control despite early speech onset because they proceed from perspectives that are not developmental. A developmental perspective assumes change over time, and those who adopt it focus on explaining how this change occurs. A second major aim of this review article was to show how a commitment to this perspective leads to a theory of speech production that is different in many respects from existing theories. Thus, even if the various ideas presented herein are dismissed after testing, the conclusion should be that a developmental approach to understanding speech production should be pursued if theory is to be useful for practice.

Acknowledgments

Article preparation was supported by the Eunice Kennedy Shriver National Institute of Child Health & Human Development under Grant R01HD087452. The content is solely the author's responsibility and does not necessarily reflect the views of the National Institute of Child Health & Human Development.

Funding Statement

Article preparation was supported by the Eunice Kennedy Shriver National Institute of Child Health & Human Development under Grant R01HD087452.

  • Abbs J. H., & Gracco V. L. (1984). Control of complex motor gestures: Orofacial muscle responses to load perturbations of lip during speech . Journal of Neurophysiology , 51 ( 4 ), 705–723. [ PubMed ] [ Google Scholar ]
  • Arbib M. A. (1992). Schema theory . In Shapiro S. (Ed.), The encyclopedia of artificial intelligence (Vol. 2 , pp. 1427–1443). Hoboken, NJ: Wiley. [ Google Scholar ]
  • Babel M. (2012). Evidence for phonetic and social selectivity in spontaneous phonetic imitation . Journal of Phonetics , 40 ( 1 ), 177–189. [ Google Scholar ]
  • Barbier G. (2016). Contrôle de la production de la parole chez l'enfant de 4 ans: L'anticipation comme indice de maturité motrice [Speech motor control in the 4-year-old child: Anticipatory coarticulation as an index of speech motor development.] (PhD thesis) . Université Grenoble Alpes, Grenoble, France. [ Google Scholar ]
  • Baumeister R. F. (1999). Self-concept, self-esteem, and identity . In Derlega V. J., Winstead B. A., & Jones W. H. (Eds.), Personality: Contemporary theory and research (Nelson-Hall series in psychology) (2nd ed., pp. 339–375). Chicago, IL: Nelson-Hall Publishers. [ Google Scholar ]
  • Bergelson E., & Swingley D. (2012). At 6–9 months, human infants know the meanings of many common nouns . Proceedings of the National Academy of Sciences of the United States of America , 109 ( 9 ), 3253–3258. [ PMC free article ] [ PubMed ] [ Google Scholar ]
  • Best C. T. (1995). The emergence of native-language phonological influences in infants: A perceptual assimilation model . In Goodman J. C. & Nusbaum H. C. (Eds.), The development of speech perception: The transition from speech sounds to spoken words (pp. 167–224). Cambridge, MA: MIT Press. [ Google Scholar ]
  • Best C. T., Goldstein L. M., Nam H., & Tyler M. D. (2016). Articulating what infants attune to in native speech . Ecological Psychology , 28 ( 4 ), 216–261. [ PMC free article ] [ PubMed ] [ Google Scholar ]
  • Bladon R. A. W., & Al-Bamerni A. (1976). Coarticulation resistance in English /l/ . Journal of Phonetics , 4 ( 2 ), 137–150. [ Google Scholar ]
  • Bock K., & Levelt W. (2002). Language production . In Altman G. T. (Ed.), Psycholinguistics: Critical concepts in psychology (pp. 405–450). Abingdon-on-Thames, England: Routledge. [ Google Scholar ]
  • Bohland J. W., Bullock D., & Guenther F. H. (2010). Neural representations and mechanisms for the performance of simple speech sequences . Journal of Cognitive Neuroscience , 22 ( 7 ), 1504–1529. [ PMC free article ] [ PubMed ] [ Google Scholar ]
  • Bohland J. W., & Guenther F. H. (2006). An fMRI investigation of syllable sequence production . Neuroimage , 32 ( 2 ), 821–841. [ PubMed ] [ Google Scholar ]
  • Browman C. P., & Goldstein L. (1988). Some notes on syllable structure in articulatory phonology . Phonetica , 45 ( 2–4 ), 140–155. [ PubMed ] [ Google Scholar ]
  • Browman C. P., & Goldstein L. (1989). Articulatory gestures as phonological units . Phonology , 6 ( 2 ), 201–251. [ Google Scholar ]
  • Browman C. P., & Goldstein L. (1992). Articulatory phonology: An overview . Phonetica , 49 ( 3–4 ), 155–180. [ PubMed ] [ Google Scholar ]
  • Cooper R. P., & Shallice T. (2006). Hierarchical schemas and goals in the control of sequential behavior . Psychological Review , 113 ( 4 ), 887–916. [ PubMed ] [ Google Scholar ]
  • Daniloff R., & Hammarberg R. (1973). On defining coarticulation . Journal of Phonetics , 1 ( 3 ), 239–248. [ Google Scholar ]
  • Davis B. L., & MacNeilage P. F. (2000). An embodiment perspective on the acquisition of speech perception . Phonetica , 57 ( 2–4 ), 229–241. [ PubMed ] [ Google Scholar ]
  • Davis B. L., MacNeilage P. F., & Matyear C. L. (2002). Acquisition of serial complexity in speech production: A comparison of phonetic and phonological approaches to first word production . Phonetica , 59 ( 2–3 ), 75–107. [ PubMed ] [ Google Scholar ]
  • Davis M., & Redford M. A. (2019). The emergence of discrete motor units in a production model that assumes holistic speech plans . Submitted. [ PMC free article ] [ PubMed ]
  • Dell G. S. (1986). A spreading-activation theory of retrieval in sentence production . Psychological Review , 93 ( 3 ), 283–321. [ PubMed ] [ Google Scholar ]
  • Diehl R. L., Lotto A. J., & Holt L. L. (2004). Speech perception . Annual Review of Psychology , 55 , 149–179. [ PubMed ] [ Google Scholar ]
  • Dodd B. (1975). Children's understanding of their own phonological forms . The Quarterly Journal of Experimental Psychology , 27 ( 2 ), 165–172. [ PubMed ] [ Google Scholar ]
  • Evans M. A. (1985). Self-initiated speech repairs: A reflection of communicative monitoring in young children . Developmental Psychology , 21 ( 2 ), 365–371. [ Google Scholar ]
  • Ferguson C. A., & Farwell C. B. (1975). Words and sounds in early language acquisition . Language , 51 , 419–439. [ Google Scholar ]
  • Flege J. E. (1988). Anticipatory and carry-over nasal coarticulation in the speech of children and adults . Journal of Speech and Hearing Research , 31 ( 4 ), 525–536. [ PubMed ] [ Google Scholar ]
  • Foulkes P., & Docherty G. (2006). The social life of phonetics and phonology . Journal of Phonetics , 34 ( 4 ), 409–438. [ Google Scholar ]
  • Fowler C. A. (1980). Coarticulation and theories of extrinsic timing . Journal of Phonetics , 8 ( 1 ), 113–133. [ Google Scholar ]
  • Fowler C. A. (1986). An event approach to the study of speech perception from a direct-realist perspective . Journal of Phonetics , 14 , 3–28. [ Google Scholar ]
  • Fowler C. A., & Saltzman E. (1993). Coordination and coarticulation in speech production . Language and Speech , 36 ( 2–3 ), 171–195. [ PubMed ] [ Google Scholar ]
  • Galantucci B., Fowler C. A., & Turvey M. T. (2006). The motor theory of speech perception reviewed . Psychonomic Bulletin & Review , 13 ( 3 ), 361–377. [ PMC free article ] [ PubMed ] [ Google Scholar ]
  • Garrett M. F. (1988). Processes in language production . Linguistics: The Cambridge Survey , 3 , 69–96. [ Google Scholar ]
  • Gecas V. (1982). The self-concept . Annual Review of Sociology , 8 ( 1 ), 1–33. [ Google Scholar ]
  • Gierut J. A. (2007). Phonological complexity and language learnability . American Journal of Speech-Language Pathology , 16 ( 1 ), 6–17. [ PMC free article ] [ PubMed ] [ Google Scholar ]
  • Giles H., & Powesland P. (1997). Accommodation theory . In Coupland N. & Jaworski A. (Eds.), Sociolinguistics (pp. 232–239). London, England: Palgrave. [ Google Scholar ]
  • Goldinger S. D. (1998). Echoes of echoes? An episodic theory of lexical access . Psychological Review , 105 ( 2 ), 251–279. [ PubMed ] [ Google Scholar ]
  • Goldrick M. (2006). Limited interaction in speech production: Chronometric, speech error, and neuropsychological evidence . Language and Cognitive Processes , 21 ( 7–8 ), 817–855. [ Google Scholar ]
  • Goldstein L., Byrd D., & Saltzman E. (2006). The role of vocal tract gestural action units in understanding the evolution of phonology . In Arbib M. A. (Ed.), Action to language via the mirror neuron system (pp. 215–249). Cambridge, England: Cambridge University Press. [ Google Scholar ]
  • Goldstein L., & Fowler C. A. (2003). Articulatory phonology: A phonology for public language use . In Schiller N. O. & Meyer A. (Eds.), Phonetics and phonology in language comprehension and production: Differences and similarities (pp. 159–207). Berlin, Germany: De Gruyter. [ Google Scholar ]
  • Green J. R., Moore C. A., Higashikawa M., & Steeve R. W. (2000). The physiologic development of speech motor control: Lip and jaw coordination . Journal of Speech, Language, and Hearing Research , 43 , 239–255. [ PMC free article ] [ PubMed ] [ Google Scholar ]
  • Guenther F. H. (1995). Speech sound acquisition, coarticulation, and rate effects in a neural network model of speech production . Psychological Review , 102 ( 3 ), 594–621. [ PubMed ] [ Google Scholar ]
  • Guenther F. H., Ghosh S. S., & Tourville J. A. (2006). Neural modeling and imaging of the cortical interactions underlying syllable production . Brain and Language , 96 ( 3 ), 280–301. [ PMC free article ] [ PubMed ] [ Google Scholar ]
  • Guenther F. H., & Perkell J. S. (2004). A neural model of speech production and its application to studies of the role of auditory feedback in speech . In Maassen B., Kent R. D., Peters H. F. M., van Lieshout P. H. H. M., & Hulstijn W. (Eds.), Speech motor control in normal and disordered speech (pp. 29–49). Oxford, England: Oxford University Press. [ Google Scholar ]
  • Haken H., Kelso J. S., & Bunz H. (1985). A theoretical model of phase transitions in human hand movements . Biological Cybernetics , 51 ( 5 ), 347–356. [ PubMed ] [ Google Scholar ]
  • Harris M., Yeeles C., Chasin J., & Oakley Y. (1995). Symmetries and asymmetries in early lexical comprehension and production . Journal of Child Language , 22 ( 1 ), 1–18. [ PubMed ] [ Google Scholar ]
  • Hickok G. (2012). Computational neuroanatomy of speech production . Nature Reviews Neuroscience , 13 ( 2 ), 135–145. [ PMC free article ] [ PubMed ] [ Google Scholar ]
  • Hickok G. (2014). The architecture of speech production and the role of the phoneme in speech processing . Language, Cognition and Neuroscience , 29 ( 1 ), 2–20. [ PMC free article ] [ PubMed ] [ Google Scholar ]
  • Hickok G., Buchsbaum B., Humphries C., & Muftuler T. (2003). Auditory–motor interaction revealed by fMRI: Speech, music, and working memory in area Spt . Journal of Cognitive Neuroscience , 15 ( 5 ), 673–682. [ PubMed ] [ Google Scholar ]
  • Hickok G., Houde J., & Rong F. (2011). Sensorimotor integration in speech processing: Computational basis and neural organization . Neuron , 69 ( 3 ), 407–422. [ PMC free article ] [ PubMed ] [ Google Scholar ]
  • Hickok G., & Poeppel D. (2000). Towards a functional neuroanatomy of speech perception . Trends in Cognitive Sciences , 4 ( 4 ), 131–138. [ PubMed ] [ Google Scholar ]
  • Hickok G., & Poeppel D. (2007). The cortical organization of speech processing . Nature Reviews Neuroscience , 8 ( 5 ), 393–402. [ PubMed ] [ Google Scholar ]
  • Houde J. F., & Nagarajan S. S. (2011). Speech production as state feedback control . Frontiers in Human Neuroscience , 5 , 82 https://doi.org/10.3389/fnhum.2011.00082 [ PMC free article ] [ PubMed ] [ Google Scholar ]
  • Johnson K. (2007). Decisions and mechanisms in exemplar-based phonology . In Sole M.-J., Beddor P., & Ohala M. (Eds.), Experimental approaches to phonology (pp. 25–40). Oxford, England: Oxford University Press. [ Google Scholar ]
  • Johnson K., Flemming E., & Wright R. (1993). The hyperspace effect: Phonetic targets are hyperarticulated . Language , 69 , 505–528. [ Google Scholar ]

Psycholinguistics/Development of Speech Production

  • 1 Introduction
  • 2.1 Stage 1: Reflexive Vocalization
  • 2.2 Stage 2: Gooing, Cooing and Laughing
  • 2.3 Stage 3: Vocal Play
  • 2.4 Stage 4: Canonical babbling
  • 2.5 Stage 5: Integration
  • 3.1 Patterns of Speech
  • 3.2.1 Definition of Error Patterns
  • 3.3 Factors affecting development of phonology
  • 4.1 First Words
  • 4.2 Vocabulary Spurt
  • 4.3 Semantic Errors
  • 5.1.1 Two-word utterances
  • 5.2 Syntactic Errors
  • 7 Learning Exercise
  • 8 Learning Exercise Answers
  • 9 References

Introduction

Speech production is an important part of the way we communicate. We indicate intonation through stress and pitch while communicating our thoughts, ideas, requests, and demands, all while maintaining grammatically correct sentences. Yet we rarely consider how this ability develops. Infants typically begin with one-word utterances, such as "mama," move on to two-word utterances, such as "gimme toy," and eventually come to sound like adults. The process involves the development not only of vocal sounds (phonology), but also of semantics (the meaning of words) and of morphology and syntax (rules and structure). How do children learn this complex ability? Considering that an infant goes from being unable to speak to producing two-word utterances within two years, this accelerated development is remarkable and deserves attention. Looking more closely at children's speech production raises further questions. How does a child who says "tree" for "three" eventually learn to correct him- or herself? How does a child know that "nana" (banana) is the yellow, boat-shaped fruit he or she enjoys eating? Why does a child call all four-legged animals "horsie"? Why does a child say "I goed to the kitchen"? What causes a child to learn words such as "doggie" before "hand"? This chapter addresses these questions, focusing on the four areas of speech development just mentioned: phonology, semantics, morphology, and syntax.

Prelinguistic Speech Development

Throughout infancy, vocalizations develop from automatic, reflexive sounds with no linguistic meaning to articulated words with meaning and intonation. In this section, we examine the stages an infant passes through while developing speech. Researchers generally agree that, as infants develop, their speech-like vocalizations increase and their non-speech vocalizations decrease (Nathani, Ertmer, & Stark) [1] . Many researchers (Oller; [2] Stark, as cited in Nathani, Ertmer, & Stark) [1] have documented this development and describe growth through the following five stages: reflexive vocalizations; cooing and laughing; vocal play (the expansion stage); canonical babbling; and, finally, the integration stage.

Stage 1: Reflexive Vocalization


As newborns, infants make noises in response to their environment and current needs. These reflexive vocalizations may consist of crying or vegetative sounds such as grunting, burping, sneezing, and coughing (Oller) [2] . Although infants of this age are often thought to show no evidence of linguistic ability, a recent study found that newborns' cries follow the melody of the surrounding language input (Mampe, Friederici, Christophe, & Wermke) [3] . The French newborns' cries showed a rising contour, in which the melody rose slowly and then decreased quickly; the German newborns' cries rose quickly and decreased slowly. These patterns match the intonation patterns found in the respective spoken languages. The findings suggest that infant vocalizations are not exclusively reflexive and may already carry patterns of the native language.

Stage 2: Gooing, Cooing and Laughing

Between 2 and 4 months, infants begin to produce "cooing" and "gooing" sounds that signal comfort. These often take the form of vowel-like sounds such as "aah" or "oooh." This stage is associated with a contented infant: laughing and giggling begin and crying is reduced. Infants also engage in more face-to-face interaction with their caregivers, smiling and attempting to make eye contact (Oller) [2] .

Stage 3: Vocal Play

From 4 to 6 months, infants vary the sounds they can produce with their developing vocal apparatus. They show a desire to explore and develop new sounds, which may include yells, squeals, growls, and whispers (Oller) [2] . Face-to-face interaction remains important at this stage because it promotes the development of conversational abilities. Beebe, Alson, Jaffe et al. [4] found that even at this young age infants' vocal expressions show a " dialogic structure ": during interactions with caregivers, infants were able to take turns vocalizing.

Stage 4: Canonical babbling

After 6 months, infants begin to make and combine sounds found in their native language, sometimes called "well-formed syllables," which are often replicated in their first words (Oller) [2] . During this stage, infants combine consonants and vowels and repeat them over and over, producing what is called reduplicated babble ; for example, an infant may produce "ga-ga" repeatedly. Eventually, infants string together multiple varied syllables, such as "gabamaga," called variegated babble . Some infants move straight into variegated babble without evidence of the reduplicated stage (Oller) [2] . Early in this stage, infants do not produce these sounds for communicative purposes; as they move closer to pronouncing their first words, they may begin to use sounds for rudimentary communication (Oller) [2] .

Stage 5: Integration


In the final stage of prelinguistic speech, 10-month-old infants use intonation and stress patterns in their babbled syllables, imitating adult-like speech. This stage is sometimes called conversational babble or gibberish because infants may also use gestures and eye movements that resemble conversation (Oller) [2] . Interestingly, the acoustics of their vocalizations differ with the purpose of the communication: Papaeliou and Trevarthen [5] found that infants communicating for social purposes used a higher pitch and were more expressive in their vocalizations and gestures than when exploring and investigating their surroundings. The transition from gibberish to real words is not obvious (Oller) [2] , as this stage often overlaps with the acquisition of an infant's first words. These words begin when the infant understands that the sounds produced are associated with an object. During this stage, infants develop vocal motor schemes : the consistent production of certain consonants over a certain period. Keren-Portnoy and Majorano's [6] study showed that vocal motor schemes play a significant part in the development of first words: children who mastered them earlier produced words earlier, and the consistent consonants used in babble and vocal motor schemes also appeared in the children's first words. Evidence that a child may understand the connection between context and sound comes from consistent sound patterns produced in certain contexts (Oller) [2] . For example, a child may begin to call a favorite toy "mub." These phonetically consistent sound patterns, known as protowords or quasi-words , do not always correspond to real words, but they are an important step toward adult-like speech (Otomo; [7] Oller) [2] . Infants may also use a protoword to stand for an entire sentence (Vetter) [8] : the child who says "mub" may be expressing "I want my toy," "Give me back my toy," "Where is my toy?", and so on.

Phonological Development

When a child deliberately pronounces a first word, he or she has understood the association between sounds and their meaning. Yet pronunciation may still be poor: children produce phonetic errors and have yet to master all the sound combinations of their language. Researchers have proposed many theories about the patterns and rules children use while developing their language. In this section, we examine some frequent error patterns and the basic rules children use to articulate words. We also look at how phonological development can be enhanced.

Patterns of Speech

Depending on their personalities and individual development, infants develop speech production slightly differently. Some children, productive learners , attempt any word regardless of proper pronunciation (Rabagliati, Marcus, & Pylkkänen) [9] . Conservative learners (Rabagliati, Marcus, & Pylkkänen) [9] are hesitant until they are confident in their pronunciation. Children also differ in whether they prefer to use nouns and name things or to use language in a more social context (Bates et al., as cited in Smits-Bandstra) [10] . Although infants vary in their first words and in the development of their phonology, researchers have extracted many common patterns from the sounds of early language. For example, McIntosh and Dodd [11] found that 2-year-olds could produce multiple phonemes, and even complex syllables, but lacked [ ʃ , θ , tʃ , dʒ , r ]. Vowel errors also occurred, although consonant errors were much more prevalent. Phoneme development continues throughout childhood, and many phonemes are not fully mastered until age 8 (Vetter) [8] .

Phonological Errors

As a child pronounces new words and phonemes, he or she may produce various errors that follow patterns, although all error types decrease with age (McIntosh & Dodd) [11] . Not every child produces the same errors, but errors can typically be sorted into groups. There are, for example, multiple kinds of consonant error. A cluster reduction simplifies a sequence of adjacent consonants (e.g., in "skate"): most often the child drops the first consonant ("skate" becomes "kate"), but the child may instead leave out the second, stop consonant ( consonant deletion ; Wyllie-Smith, McLeod, & Ball) [12] , so that "skate" becomes "sate". This type of error was also found by McIntosh and Dodd [11] . In words of several syllables, a child may skip an unstressed syllable at the beginning of the word ("potato" becomes "tato") or in the middle of the word ("telephone" becomes "tephone") (Ganger & Brent) [13] . This omission may simply reflect the properties of unstressed syllables: they are harder to perceive, so the child may not attend to them. As the child becomes more aware of the unstressed syllable, he or she may insert a dummy syllable in its place to lengthen the utterance (Aoyama, Peters, & Winchester) [14] ; for example, a child may say [ə hat] ("ə hot") (Clark, as cited in Smits-Bandstra) [10] . Such a replacement shows that the child understands that some sound belongs there but has inserted the wrong one. Another common phonological error pattern is assimilation : the child pronounces a word so that a phoneme within it sounds more like a neighboring phoneme (McIntosh & Dodd) [11] , for example saying "gug" instead of "bug". This kind of error is also seen with vowels; it is common in 2-year-olds but decreases with age (Newton) [15] .
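The error patterns just described can be pictured as simple rewrites on word forms. The following Python functions are a toy orthographic illustration only (the vowel set and matching rules are simplifying assumptions, not a model from the literature):

```python
import re

VOWELS = "aeiou"  # simplifying assumption: plain orthographic vowels

def cluster_reduction(word):
    """Drop the first consonant of a word-initial two-consonant
    cluster, the most common reduction ('skate' -> 'kate')."""
    if re.match(rf"^[^{VOWELS}][^{VOWELS}]", word):
        return word[1:]
    return word

def assimilate_onset(word):
    """Toy assimilation: the initial consonant takes on the identity
    of the next consonant in the word ('bug' -> 'gug')."""
    later_consonants = [c for c in word[1:] if c not in VOWELS]
    if word and word[0] not in VOWELS and later_consonants:
        return later_consonants[0] + word[1:]
    return word

print(cluster_reduction("skate"))  # kate
print(assimilate_onset("bug"))     # gug
```

Real child phonology operates over phonemes and articulatory features, not spelling, so a serious model would work on phonetic transcriptions; the sketch only shows the shape of the rules.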

Definition of Error Patterns


Factors affecting development of phonology


Since adequate phonology is important for effective communication, researchers have looked for factors that can enhance it. Goldstein and Schwade [16] found that interactions with caregivers provided opportunities for 8- to 10-month-old infants to increase their babbling of language sounds (consonant-vowel syllables and vowels). The infants were not simply imitating their caregivers' vocalizations: they produced varied phonological patterns and longer vocalizations. Social feedback from caregivers thus appears to advance infants' phonological development. Conversely, factors such as hearing impairment can hinder phonological development (Nicolaidis) [17] . When Greek speakers with hearing impairments were compared with a control group, they showed a different pattern of phoneme production, displaying substitutions (e.g., [x] for target /k/), distortions (e.g., of place of articulation), and epenthesis/cluster production (e.g., [ʃtʃ] or [jθ] for /s/).

Semantic Development

When children purposefully use words, they are expressing a desire, a refusal, or a label, or communicating socially (Ninio & Snow) [18] . As a child begins to understand that each word has a specific purpose, he or she inevitably needs to learn the meanings of many words. Vocabulary expands rapidly through varied social contexts, songs, practiced routines, and direct instruction at school (Smits-Bandstra) [19] . In this section, we examine children's first words, the vocabulary spurt, and the nature of semantic errors.

First Words

Many studies have analyzed the types of words found in early speech. Overall, children's first words are usually short in syllabic length, easy to pronounce, and frequent in everyday speech (Storkel) [20] . Whether early vocabularies show a noun bias divides researchers. Some point to children's tendency to produce names for objects, people, and animals as evidence of such a bias (Gillette et al.) [21] . However, the bias may not be universal. Tardif [22] recently compared the first words of English-, Cantonese-, and Mandarin-learning 8- to 16-month-old infants and found interesting differences: although all children used terms for people, the languages varied considerably in terms for animals and objects. This suggests that the types of words children acquire first may differ across languages.

Vocabulary Spurt


Around the age of 18 months, many infants undergo a vocabulary spurt , or vocabulary explosion , in which they learn new words at an increasingly rapid rate (Smits-Bandstra; [10] Mitchell & McMurray) [23] . Before the spurt, the first 50 words are usually acquired at a gradual rate (Plunkett, as cited in Smits-Bandstra) [10] ; after it, some studies have found upwards of 20 words learned per week (Mitchell & McMurray) [23] . There has been much speculation about the process underlying the vocabulary spurt, and there are three main theories. First, the spurt has been attributed to the naming insight (Reznick & Goldfield) [24] : children begin to understand that referents can be labeled, either out of context or in place of the object. Second, this period seems to coincide with Piaget's sensorimotor stage, in which children are expanding their ability to categorize concepts and objects; children would then need a larger vocabulary to label those categories (Gopnik) [25] . Finally, leveraged learning may facilitate the explosion (Mitchell & McMurray) [23] : learning begins slowly, but each word learned acts as leverage for acquiring the next, so each new word makes further words easier to learn. It is possible, however, that not all children experience a vocabulary spurt. When Ganger and Brent [13] used a mathematical model to test whether learning truly accelerates, only a minority of the infants studied fit the criteria for a spurt, so the spurt may not be as common as once believed.
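The leveraged-learning idea predicts accelerating growth without any dedicated "spurt" mechanism: if each known word slightly raises the rate of further learning, the curve bends upward on its own. A minimal deterministic sketch (the parameter values are arbitrary illustrations, not estimates from Mitchell and McMurray's actual model):

```python
def simulate_vocabulary(days=300, base_rate=0.05, leverage=0.01):
    """Expected vocabulary size day by day, where the daily learning
    rate grows in proportion to the words already known."""
    vocab = 0.0
    history = []
    for _ in range(days):
        vocab += base_rate + leverage * vocab  # leverage term -> acceleration
        history.append(vocab)
    return history

h = simulate_vocabulary()
early_gain = h[99]           # words gained in the first 100 days
late_gain = h[-1] - h[199]   # words gained in the last 100 days
# late_gain is several times early_gain: growth accelerates without
# any explicit "spurt" switch in the model
```

Ganger and Brent's test asked, in effect, whether a curve with an inflection point (a true spurt) fits children's data better than a smoothly accelerating curve like this one.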

Semantic Errors

Even after a child has developed a large vocabulary, errors are made in selecting words to convey the desired meaning. One type of improper word selection occurs when children invent a word (called lexical innovation ), usually because they have not yet learned a word for the meaning they are trying to express, or simply cannot retrieve it. Although made-up words are not real words, it is fairly easy to figure out what a child means, and they are sometimes easier to remember than the conventional words (Clark, as cited in Swan) [26] . For example, a child may say “pourer” for “cup” (Clark, as cited in Swan) [26] . These lexical innovations show that the child understands derivational morphology and can use it creatively and productively (Swan) [26] .

Sometimes children use a word in an inappropriate context, either extending or restricting its use. For example, a child may say “doggie” while pointing to any four-legged animal; this is known as overextension and is most common in 1- to 2-year-olds (McGregor et al.; [27] Bloomquist; [28] Bowerman; [29] Jerger & Damian) [30] . Other times, children may use a word in only one specific context; this is called underextension (McGregor et al.; [27] Bloomquist; [28] Bowerman; [29] Jerger & Damian) [30] . For example, they may say “baba” only for their own bottle and not another infant’s bottle. Semantic errors manifest themselves in naming tasks and provide an opportunity to examine how children organize semantic representations. In McGregor et al.’s [27] picture-naming task with 3- to 5-year-olds, errors were most often related to functional or physical properties (e.g., saying chair for saddle). McGregor et al. [27] proposed three reasons why such errors are produced.

Grammatical and Morphological Development

As children develop larger lexicons, they begin to combine words into sentences that become progressively longer and more complex, demonstrating their syntactic development. Longer utterances provide evidence that children are reaching an important milestone: the beginning of morphosyntax (Aoyama et al.) [14] . Brown [31] developed a measure of syntactic growth called mean length of utterance (MLU) . It is determined by recording or listening to a 30-minute sample of a child’s speech, counting the number of meaningful morphemes (semantic roles; see chart below) and dividing by the number of utterances. Meaningful morphemes can be function words (e.g., “of”), content words (e.g., “cat”) or grammatical inflections (e.g., -s). Each utterance corresponds to a separate thought conveyed; repetitions, filler words, recitations, and titles or compound words are each counted as single units. Brown described five stages of syntactic development: Stage I (MLU 1.0-2.0), Stage II (MLU 2.0-2.5), Stage III (MLU 2.5-3.0), Stage IV (MLU 3.0-3.5), and Stage V (MLU 3.5-4.0).

Semantic roles
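Brown's calculation can be sketched in a few lines. The helper below assumes utterances that are already hand-segmented into morphemes (the hard part in practice) and maps the resulting MLU onto Brown's stage boundaries; where the published ranges meet at their endpoints, the lower stage is chosen arbitrarily.

```python
def mlu(utterances):
    """Mean length of utterance: total morphemes / number of utterances."""
    return sum(len(u) for u in utterances) / len(utterances)

def brown_stage(value):
    """Map an MLU value onto Brown's stages (lower stage wins at boundaries)."""
    for stage, upper in [("I", 2.0), ("II", 2.5), ("III", 3.0),
                         ("IV", 3.5), ("V", 4.0)]:
        if value <= upper:
            return stage
    return "post-V"

# Each utterance is hand-segmented into morphemes, e.g. "dogs eat" -> dog, -s, eat
sample = [["doll", "fall"], ["want", "that"], ["dog", "-s", "eat", "bone"]]
print(mlu(sample))               # 8 morphemes / 3 utterances = 2.666...
print(brown_stage(mlu(sample)))  # Stage III
```

Note that automatic morpheme segmentation is non-trivial; research transcripts are segmented by trained coders before any counting happens.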

What is this child's MLU?

Two-word utterances

Around the age of 18 months, children’s utterances usually take two-word forms such as “want that,” “mommy do,” or “doll fall” (Vetter) [8] . In English, these forms are dominated by content words such as nouns, verbs and adjectives, and are restricted to concepts the child is learning in the sensorimotor stage suggested by Piaget (Brown) [31] . Thus, they express relations between objects, actions and people. This type of speech is called telegraphic speech . During this stage, children combine words to convey various meanings, and they also display evidence of grammatical structure, with consistent word orders and inflections (Behrens & Gut; [32] Vetter) [8] .

Once the child moves out of Stage I, simple sentences begin to form and the child begins to use inflections and function words (Aoyama et al.) [14] . At this time, the child develops grammatical morphemes (Brown) [31] , which are classified into 14 different categories organized by order of acquisition (see chart below). These morphemes modify the meaning of the utterance, marking tense, plurality, possession, and so on. There are two theories for why this particular order occurs. The frequency hypothesis suggests that children acquire the morphemes they hear most frequently in adult speech. Brown argued against this theory by analyzing adult speech, in which articles were the most common word form, yet children did not acquire articles quickly. He suggested that linguistic complexity may account for the order of acquisition, with the less complex morphemes acquired first. Complexity was determined from the semantics (meaning) and/or syntax (rules) of the morpheme. In other words, a morpheme with only one meaning, such as plurality (-s), is easier to learn than the copula “is” (which encodes both number and the time of the action). Brown also suggested that for a child to have successfully mastered a grammatical morpheme, they must use it properly 90% of the time.

Syntactic Errors

As children begin to develop more complex sentences, they must also learn to use grammar rules appropriately. This is difficult in English because of the prevalence of irregular forms. For example, a child may say, “I buyed my toy from the store.” This is known as an overregularization error . The child has understood that there are syntactic patterns and rules to follow, but overuses them, failing to realize that there are exceptions to rules. In the previous example, the child applied a regular past tense rule (-ed) to an irregular verb. Why do these errors occur? It may be that the child does not have a complete understanding of the word meaning and thus incorrectly selects it (Pinker, et al.) [33] . Brooks et al. [34] suggested that these errors may be categorization errors. For example, intransitive and transitive verbs appear in different contexts, and the child is required to learn that certain verbs appear only in certain contexts (Brooks) [34] . Interestingly, Hartshorne and Ullman [35] found a gender difference for overregularization errors: girls were more than three times more likely than boys to produce overregularizations. They concluded that girls were more likely to overgeneralize associatively, whereas boys overgeneralized only through rule-governed methods. In other words, girls, who remember regular forms better than boys, quickly associated those forms with similar-sounding words (e.g., fold-folded and mold-molded lead them to say that hold becomes holded). Boys, on the other hand, use the regular rule when they have difficulty retrieving the irregular form (e.g., the past tense -ed added to the irregular verb run yields runned) (Hartshorne & Ullman) [35] .
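The retrieval-versus-rule account described above can be sketched as a minimal dual-route procedure: the speaker first tries to retrieve a stored irregular form and falls back on the regular -ed rule when retrieval fails, which is exactly where forms like holded and runned emerge. The tiny verb list is illustrative only, and the `retrieval_succeeds` flag stands in for a memory process that real models treat probabilistically.

```python
# Tiny illustrative store of irregular past-tense forms
IRREGULARS = {"hold": "held", "run": "ran", "buy": "bought"}

def past_tense(verb, retrieval_succeeds=True):
    """Dual-route sketch: retrieve the irregular form, else apply the -ed rule."""
    if retrieval_succeeds and verb in IRREGULARS:
        return IRREGULARS[verb]   # memorized irregular form wins
    return verb + "ed"            # regular rule as fallback -> overregularization

print(past_tense("hold"))                            # held
print(past_tense("hold", retrieval_succeeds=False))  # holded
print(past_tense("jump"))                            # jumped
```

Regular verbs never need retrieval, so the rule route alone handles them; errors arise only when an irregular form fails to be retrieved in time.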

Another common error committed by children is omission of words from an utterance. These errors are especially prevalent in early speech production, which frequently lacks function words (Gerken, Landau, & Remez) [36] . For example, a child may say “dog eat bone”, dropping the function words “the” and “a”. This type of error has been frequently studied, and researchers have proposed three main theories to account for omissions. First, children may focus on words that have referents (Brown) [31] . For example, a child may focus on “car” or “ball” rather than “jump” or “happy.” The second theory suggests children simply recognize the content words, which carry greater stress and emphasis (Brown) [31] . The final theory, suggested by Gerken [36] , involves an immature production system: in their study, children could perceive function words and classify them into various syntactic categories, yet still omitted them from their speech production.

Summary

In this chapter, the development of speech production was examined in the areas of prelinguistics , phonology , semantics , syntax and morphology . As an infant develops, their vocalizations undergo a transition from reflexive vocalizations to speech-like sounds and finally words. However, their linguistic development does not end there. Infants’ underdeveloped speech apparatus restricts them from producing all phonemes properly, and thus they produce errors such as consonant cluster reduction , omission of syllables and assimilation . At 18 months, many children seem to undergo a vocabulary spurt . Even with a larger vocabulary, children may also overextend (calling a horse a doggie) or underextend (not calling the neighbors’ dog, doggie) their words. When a child begins to combine words, they are developing syntax and morphology. Syntactic development is measured using mean length of utterance (MLU) , which is categorized into 5 stages (Brown) [31] . After Stage II, children begin to use grammatical morphemes (e.g., -ed, -s, is), which encode tense, plurality, etc. As in other areas of linguistic development, children also produce errors such as overregularization (e.g., “I buyed it”) or omissions (e.g., “dog eat bone”). Despite these early error patterns, children eventually develop adult-like speech with few errors. Understanding and studying child language development is an important area of research, as it may give us insight into the underlying processes of language as well as how we might facilitate it or treat individuals with language difficulties.

Learning Exercise

1. Watch the video clips of a young boy CC provided below.

Video 1 Video 2 Video 3 Video 4 Video 5

2. The following is a transcription of conversations between a mother (*MOT) and a child (*CHI) from Brown's (1970) corpus. You can ignore the # symbol as it represents unintelligible utterances. Use the charts found in the section on " Grammatical and Morphological Development " to help answer this question.

  • Possessive morphemes ('s)
  • Present progressive (-ing)
  • MOT: let me see .
  • MOT: over here +...
  • MOT: you have tapioca on your finger .
  • CHI: tapioca finger .
  • MOT: here you go .
  • CHI: more cookie .
  • MOT: you have another cookie right on the table .
  • CHI: Mommy fix .
  • MOT: want me to fix it ?
  • MOT: alright .
  • MOT: bring it here .
  • CHI: bring it .
  • CHI: that Kathy .
  • MOT: yes # that's Kathy .
  • CHI: op(en) .
  • MOT: no # we'll leave the door shut .
  • CHI: why ?
  • MOT: because I want it shut .
  • CHI: Mommy .
  • MOT: I'll fix it once more and that's all .
  • CHI: Mommy telephone .
  • MOT: well # go and get your telephone .
  • MOT: yes # he gave you your telephone .
  • MOT: who are you calling # Eve ?
  • CHI: my telephone .
  • CHI: Kathy cry .
  • MOT: yes # Kathy was crying .
  • MOT: Kathy was unhappy .
  • MOT: what is that ?
  • CHI: letter .
  • MOT: Eve's letter .
  • CHI: Mommy letter .
  • MOT: there's Mommy's letter .
  • CHI: Eve letter .
  • CHI: a fly .
  • MOT: yes # a fly .
  • MOT: why don't you go in the room and kill a fly ?
  • MOT: you go in the room and kill a fly .
  • MOT: yes # you get a fly .
  • MOT: oh # what's that ?
  • MOT: I'm going to go in the basement # Eve .

3. Below are examples of children's speech. These children are displaying some characteristics of terms we have covered in this chapter. The specific terms found in each video are provided. Find examples of these terms within their associated video. Indicate which type of development (phonological, semantic, syntactic) is associated with each of these terms.

5. The following are examples of children’s speech errors. Name the error and the type of development it is associated with (phonological, syntactic, morphological, or semantic). Can you explain why such an error occurs?

Learning Exercise Answers


References

  • ↑ 1.0 1.1 1.2 Nathani, S., Ertmer, D. J., & Stark, R. E. (2006). Assessing vocal development in infants and toddlers. Clinical linguistics & phonetics, 20(5), 351-69.
  • ↑ 2.00 2.01 2.02 2.03 2.04 2.05 2.06 2.07 2.08 2.09 2.10 2.11 Oller, D.K.,(2000). The Emergence of the Speech Capacity. NJ: Lawrence Erlbaum Associates, Inc.
  • ↑ Mampe, B., Friederici, A. D., Christophe, A., & Wermke, K. (2009). Newborns' cry melody is shaped by their native language. Current Biology, 19(23), 1994-1997.
  • ↑ Beebe, B., Alson, D., Jaffe, J., Feldstein, S., & Crown, C. (1988). Vocal congruence in mother-infant play. Journal of psycholinguistic research, 17(3), 245-59.
  • ↑ Papaeliou, C. F., & Trevarthen, C. (2006). Prelinguistic pitch patterns expressing “communication” and “apprehension.” Journal of Child Language, 33(01), 163.
  • ↑ Keren-Portnoy, T., Majorano, M., & Vihman, M. M. (2009). From phonetics to phonology: the emergence of first words in Italian. Journal of child language, 36(2), 235-67.
  • ↑ Otomo, K. (2001). Maternal responses to word approximations in Japanese childrenʼs transition to language. Journal of Child Language, 28(1), 29-57.
  • ↑ 8.0 8.1 8.2 8.3 Vetter, H. J. (1971). Theories of language acquisition. Journal of Psycholinguistic Research, 1(1), 31.
  • ↑ 9.0 9.1 Rabagliati, H., Marcus, G. F., & Pylkkänen, L. (2010). Shifting senses in lexical semantic development. Cognition, 117(1), 17-37. Elsevier B.V.
  • ↑ 10.0 10.1 10.2 10.3 Smits-Bandstra, S. (2006). The role of segmentation in lexical acquisition in children [Rôle de la segmentation dans l'acquisition du lexique chez les enfants]. Audiology, 30(3), 182-191.
  • ↑ 11.0 11.1 11.2 11.3 McIntosh, B., & Dodd, B. J. (2008). Two-year-oldsʼ phonological acquisition: Normative data. International journal of speech-language pathology, 10(6), 460-9.
  • ↑ Wyllie-Smith, L., McLeod, S., & Ball, M. J. (2006). Typically developing and speech-impaired childrenʼs adherence to the sonority hypothesis. Clinical linguistics & phonetics, 20(4), 271-91.
  • ↑ 13.0 13.1 Ganger, J., & Brent, M. R. (2004). Reexamining the vocabulary spurt. Developmental psychology, 40(4), 621-32.
  • ↑ 14.0 14.1 14.2 Aoyama, K., Peters, A. M., & Winchester, K. S. (2010). Phonological changes during the transition from one-word to productive word combination. Journal of child language, 37(1), 145-57.
  • ↑ Newton, C., & Wells, B. (2002, July). Between-word junctures in early multi-word speech. Journal of Child Language.
  • ↑ Goldstein, M. H., & Schwade, J. A. (2008). Social feedback to infants' babbling facilitates rapid phonological learning. Psychological Science, 19(5), 515-523. doi:10.1111/j.1467-9280.2008.02117.x
  • ↑ Nicolaidis, K. (2004). Articulatory variability during consonant production by Greek speakers with hearing impairment: an electropalatographic study. Clinical linguistics & phonetics, 18(6-8), 419-32.
  • ↑ Ninio, A., & Snow, C. E. (1996). Pragmatic development. Boulder, CO: Westview Press.
  • ↑ Smits-Bandstra, S. (2006). The role of segmentation in lexical acquisition in children [Rôle de la segmentation dans l'acquisition du lexique chez les enfants]. Audiology, 30(3), 182-191.
  • ↑ Storkel, H. L. (2004). Do children acquire dense neighborhoods? An investigation of similarity neighborhoods in lexical acquisition. Applied Psycholinguistics, 25(02), 201-221.
  • ↑ Gillette, J., Gleitman, H., Gleitman, L., & Lederer, a. (1999). Human simulations of vocabulary learning. Cognition, 73(2), 135-76.
  • ↑ Tardif, T., Fletcher, P., Liang, W., Zhang, Z., Kaciroti, N., & Marchman, V. A. (2008). Baby's first 10 words. Developmental Psychology, 44(4), 929-938.
  • ↑ 23.0 23.1 23.2 Mitchell, C., & McMurray, B. (2009). On Leveraged Learning in Lexical Acquisition and Its Relationship to Acceleration. Cognitive Science, 33(8), 1503-1523.
  • ↑ Reznick, J. S., & Goldfield, B. a. (1992). Rapid change in lexical development in comprehension and production. Developmental Psychology, 28(3), 406-413.
  • ↑ Gopnik, A., & Meltzoff, A. (1987). The Development of Categorization in the Second Year and Its Relation to Other Cognitive and Linguistic Developments. Child Development, 58(6), 1523.
  • ↑ 26.0 26.1 26.2 Swan, D. W. (2000). How to build a lexicon: a case study of lexical errors and innovations. First Language, 20(59), 187-204.
  • ↑ 27.0 27.1 27.2 27.3 McGregor, K. K., Friedman, R. M., Reilly, R. M., & Newman, R. M. (2002). Semantic representation and naming in young children. Journal of speech, language, and hearing research : JSLHR, 45(2), 332-46.
  • ↑ 28.0 28.1 Bloomquist, J. (2007). Developmental trends in semantic acquisition: Evidence from over-extensions in child language. First Language, 27(4), 407-420.
  • ↑ 29.0 29.1 Bowerman, M. (1978). Systematizing semantic knowledge: Changes over time in the child's organization of word meaning. Child Development, 49, 977-987.
  • ↑ 30.0 30.1 Jerger, S., & Damian, M. F. (2005). Whatʼs in a name? Typicality and relatedness effects in children. Journal of experimental child psychology, 92(1), 46-75.
  • ↑ 31.0 31.1 31.2 31.3 31.4 31.5 Brown, R. (1973). A first language: The early stages. Cambridge, MA: Harvard University Press.
  • ↑ Behrens, H., & Gut, U. (2005). The relationship between prosodic and syntactic organization in early multiword speech. Journal of Child Language, 32(1), 1-34.
  • ↑ Marcus, G. F., Pinker, S., Ullman, M., Hollander, M., Rosen, T. J., & Xu, F. (1992). Overregularization in language acquisition. Monographs of the Society for Research in Child Development, 57(4).
  • ↑ 34.0 34.1 Brooks, P. J., Tomasello, M., Dodson, K., & Lewis, L. B. (1999). Young Childrenʼs Overgeneralizations with Fixed Transitivity Verbs. Child Development, 70(6), 1325-1337. doi: 10.1111/1467-8624.00097.
  • ↑ 35.0 35.1 Hartshorne, J. K., & Ullman, M. T. (2006). Why girls say “holded” more than boys. Developmental science, 9(1), 21-32.
  • ↑ 36.0 36.1 Gerken, L., Landau, B., & Remez, R. E. (1990). Function morphemes in young children's speech perception and production. Developmental Psychology, 26(2), 204-216.




Physiology of Speech Production


Results and Implications of a Quantitative Cineradiographic Study

by Joseph S. Perkell

ISBN: 9780262661706

Pub date: March 17, 2003

  • Publisher: The MIT Press

120 pp., 6 x 9 in.

ISBN: 9780262160261

Pub date: May 15, 1969


Description

The physiology of speech production in terms of articulatory dynamics is the subject of this monograph.

An extensive study of articulatory motions is clearly presented, with carefully organized and detailed quantitative data derived from tracings of lateral cineradiographic films. The data, in graph form, are interpreted in relation to known physical attributes and physiology and relevant linguistic features. Findings from the data are incorporated into a model which presents an approach toward understanding the organization and control of the speech-producing mechanism. The model is constructed to be compatible with linguistic feature systems and methods of computer simulation.

Extensive data of this type have not been previously available, and the techniques and results will provide a valuable source of interest and information to phoneticians, speech scientists, and clinicians in the fields of speech pathology, audiology, and radiology. The data would also interest engineers concerned with speech simulation by computer.

The monograph also gives some analysis and interpretation of the data in terms of underlying linguistic categories. This work, therefore, represents an important step forward in the continuing search for a deeper and better understanding of the nature of human speech.

Contents

  • Methods: The cineradiographic and recording procedures; The speech material; Tracing and measuring techniques; Description of measurements
  • Data and Discussion: Forms of the data; Discussions of graphical comparisons (among various phonetic segments) of motions of the maxilla, mandible, tongue tip and body, larynx, hyoid bone, lips, pharynx, and velum; Observations from mid-vowel and mid-consonant tracings of certain utterances
  • Conclusions: Aspects of a Physiological Model of Speech Production

Joseph Perkell is Senior Research Scientist in the College of Health and Rehabilitation Sciences at Boston University.



9.3 Speech Production Models

The Dell Model

Speech error analysis has been used as the basis for the model developed by Dell (1986, 1988). Dell’s spreading activation model (as seen in Figure 9.3) has features informed by the nature of speech errors, which tend to respect syllable-position constraints: when segmental speech errors occur, they usually involve exchanges between onsets, peaks or codas, but rarely between different syllable positions. Dell (1986) states that word-forms are represented in a lexical network composed of nodes that represent morphemes, segments and features. These nodes are connected by weighted bidirectional links.

A depiction of Dell’s spreading activation model, composed of nodes illustrating the morphemes, segments, and features in a lexical network.

As seen in Figure 9.3, when a morpheme node is activated, activation spreads through the lexical network, with each node transmitting a proportion of its activation to its direct neighbour(s). The morpheme is mapped onto the associated segments with the highest level of activation. The selected segments are encoded for particular syllable positions and can then be slotted into a syllable frame. This means that the /p/ phoneme encoded for syllable onset is stored separately from the /p/ phoneme encoded for syllable coda position. This also accounts for the phonetic level: instead of having two separate levels for segments (phonological and phonetic), there is only one segmental level, in which the onset /p/ is stored with its characteristic aspiration as [pʰ] and the coda /p/ is stored in its unaspirated form [p]. Although this means that segments need to be stored twice, once for onset and once for coda positions, it simplifies the syllabification process, as segments automatically slot into their respective positions. Dell’s model ensures the preservation of syllable constraints in that onset phonemes can only fit into onset slots in the syllable template (the same being true for peaks and codas). The model also has an implicit competition between phonemes that belong to the same syllable position, and this explains tongue-twisters such as the following:

  • “She sells sea shells by the seashore” ʃiː sɛlz siːʃɛlz baɪ ðiː siːʃɔː
  • “Betty Botter bought a bit of butter” bɛtiː bɒtə bɔːt ə bɪt ɒv bʌtə

In these examples, speakers are assumed to make errors because of competition between segments that share the same syllable position. As seen in Figure 9.3, Dell (1988) proposes a word-shape header node that contains the CV specifications for the word-form. This node activates the segment nodes one after the other. This is supported by the serial effects seen in implicit priming studies (Meyer, 1990, 1991) as well as some findings on the influence of phonological similarity on semantic substitution errors (Dell & Reich, 1981). For example, the model assumes that semantic errors (errors based on shared meaning) arise in lemma nodes. The word cat shares more segments with a target such as mat (/æ/ in the nucleus and /t/ in the coda) than with sap (only /æ/ in the nucleus). Therefore, the lemma node of mat will have a higher activation level than the one for sap, creating the opportunity for a substitution error. In addition, feedback from morpheme nodes leads to a bias towards producing word rather than nonword errors. The model also takes into account the effect of speech rate on error probability (Dell, 1986) and the frequency distribution of anticipation, perseveration and transposition errors (Nooteboom, 1969). The model accounts for differences between these error types by having an in-built bias for anticipation: activation spreads through time, so upcoming words receive activation (at a lower level than the current target). Speech rate also influences errors because higher speech rates may leave nodes without enough time to reach a specified level of activation (leading to more errors).
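The core mechanism can be sketched in a few lines. This is a toy illustration, not Dell's actual simulation: a target morpheme node (with competitors kept at a residual level) passes a fraction of its activation to position-coded segment nodes, and the most active segment claims each syllable slot. All weights and rates below are invented for illustration.

```python
# Position-coded lexical network: each morpheme links to (syllable position,
# segment) nodes, so an onset /k/ and a coda /k/ would be distinct nodes.
NETWORK = {
    "cat": {("onset", "k"): 1.0, ("nucleus", "ae"): 1.0, ("coda", "t"): 1.0},
    "mat": {("onset", "m"): 1.0, ("nucleus", "ae"): 1.0, ("coda", "t"): 1.0},
}

def spread(target, rate=0.5, residual=0.2):
    """One step of spreading: morphemes pass a fraction of activation downward."""
    activation = {}
    for morpheme, links in NETWORK.items():
        source = 1.0 if morpheme == target else residual  # competitors stay warm
        for node, weight in links.items():
            activation[node] = activation.get(node, 0.0) + source * rate * weight
    return activation

def select(activation):
    """The most active position-coded segment claims each syllable slot."""
    best = {}
    for (position, segment), level in activation.items():
        if position not in best or level > best[position][1]:
            best[position] = (segment, level)
    return {pos: seg for pos, (seg, _) in best.items()}

acts = spread("cat")
print(select(acts))  # {'onset': 'k', 'nucleus': 'ae', 'coda': 't'}
# Shared segments (/ae/, /t/) accumulate activation from both words, which is
# the seed of substitution errors between phonologically similar neighbours.
```

Because the competitor's onset /m/ receives some activation too, adding noise to the levels would occasionally let it win the onset slot, producing a "mat"-for-"cat" style error within the same syllable position.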

While the Dell model has considerable support for its architecture, there have been criticisms. The main evidence used for the model, speech errors, has itself been questioned as a useful basis for informing speech production models (Cutler, 1981). For instance, the listener might misinterpret the units involved in an error and may have a bias towards locating errors at the beginning of words (accounting for the large number of word-onset errors). Evidence for the CV header node is limited, as segment insertions usually create clusters only when the target word also had a cluster, and CV similarities are not found for peaks.

The model also has an issue with storage and retrieval, as segments need to be stored for each syllable position. For example, the /l/ in English needs to be stored as [l] for syllable onset, [ɫ] for coda and [ḷ] when it appears as a syllabic consonant in the peak (as in bottle ). However, while this may seem redundant and inefficient, recent calculations of storage costs based on information theory by Ramoo and Olson (2021) suggest that the Dell model may actually be more storage-efficient than previously thought. They suggest that one of the main inefficiencies of the model arises during syllabification across word and morpheme boundaries. During the production of connected speech or polymorphemic words, segments from one morpheme or word move to another (Chomsky & Halle, 1968; Selkirk, 1984; Levelt, 1989). For example, when we say “walk away” /wɔk.ə.weɪ/, we produce [wɔ.kə.weɪ], where the /k/ moves from coda to onset of the next syllable. As the Dell model codes segments for syllable position, it may not be possible for such segments to move from coda to onset position during resyllabification . These and other limitations have led researchers such as Levelt (1989) and his colleagues (Meyer, 1992; Roelofs, 2000) to propose a new model based on reaction time experiments.

The Levelt, Roelofs, and Meyer (LRM) Model

The Levelt, Roelofs, and Meyer (LRM) model is one of the most popular models of speech production in psycholinguistics. It is also one of the most comprehensive, in that it takes into account all stages from conceptualization to articulation (Levelt et al., 1999). The model is based on reaction time data from naming experiments and is a top-down model in which information flows from more abstract levels to more concrete stages. Word-form Encoding by Activation and VERification (WEAVER) is the computational implementation of the LRM model, developed by Roelofs (1992, 1996, 1997a, 1997b, 1998, 1999). It is a spreading activation model inspired by Dell’s (1986) ideas about word-form encoding. It accounts for the syllable frequency effect and ambiguous syllable priming data (although the computational implementation has been more successful in illustrating syllable frequency effects than priming effects).

An illustration of the Levelt, Roelofs, and Meyer model. Illustrates the lexical level, the lemma level, and the lexeme level within the upper, “lexicon” portion of the diagram, with the syllabary and articulatory buffer contained below under “post-lexical”.

As we can see in Figure 9.4, the lemma node is connected to segment nodes. These links are specified for serial position, and the segments are not coded for syllable position. Indeed, the only syllabic information stored in this model is a set of syllable templates that indicate the stress pattern of each word (which syllables are stressed and which are not). These templates are used during speech production to syllabify the segments according to the principle of onset maximization: all segments that can legally go into a syllable onset in a language are put into the onset, and the leftover segments go into the coda. Syllabification at the time of production accounts for resyllabification (which is a problem for the Dell model). The model also has a mental syllabary, which is hypothesized to contain the articulatory programs used to plan articulation.
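Onset maximization can be sketched directly: each vowel claims the longest string of preceding consonants that forms a legal onset, and whatever remains closes the previous syllable. The legal-onset list below is a tiny assumed subset of English, and segments are simplified IPA strings; this is an illustration of the principle, not the WEAVER implementation.

```python
# Assumed subset of legal English onsets; "" allows vowel-initial syllables.
VOWELS = {"ɔ", "ə", "eɪ", "ɪ", "æ"}
LEGAL_ONSETS = {"", "w", "k", "t", "p", "pr"}

def syllabify(segments):
    """Onset maximization: give each vowel the longest legal onset to its left."""
    nuclei = [i for i, s in enumerate(segments) if s in VOWELS]
    syllables, start = [], 0
    for n, v in enumerate(nuclei):
        if n + 1 < len(nuclei):
            cluster = segments[v + 1:nuclei[n + 1]]
            # smallest coda, i.e. largest legal onset for the following syllable
            split = next(i for i in range(len(cluster) + 1)
                         if "".join(cluster[i:]) in LEGAL_ONSETS)
            end = v + 1 + split
        else:
            end = len(segments)  # last syllable keeps all remaining segments
        syllables.append("".join(segments[start:end]))
        start = end
    return syllables

# "walk away" /wɔk.ə.weɪ/ resyllabifies to [wɔ.kə.weɪ]: /k/ becomes an onset
print(syllabify(["w", "ɔ", "k", "ə", "w", "eɪ"]))  # ['wɔ', 'kə', 'weɪ']
```

Because syllable membership is computed at production time rather than stored, the /k/ of "walk" migrates into the onset of the following syllable automatically, which is exactly the resyllabification case that troubles position-coded storage.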

The model is interesting in that syllabification happens only at the time of production. Phonemes are defined within the lexicon with regard to their serial position in the word or lemma. This allows for resyllabification across morpheme and word boundaries without any difficulties. Roelofs and Meyer (1998) investigated whether syllable structures are stored in the mental frame. They employed an implicit priming paradigm in which participants produced one word out of a set of words in rapid succession. The words were either homogeneous (all words had the same word onsets) or heterogeneous. They found that priming depended on the targets having the same number of syllables and the same stress pattern, but not the same syllable structure. This led them to conclude that syllable structure is not a stored component of speech production but is computed during speech (Cholin et al., 2004). Costa and Sebastián-Gallés (1998) employed a picture-word interference paradigm to investigate this further. They asked participants to name a picture while a word was presented 150 ms later, and found that participants were faster to name a picture when it shared its syllable structure with the word. These results challenge the view that syllable structure is absent as an abstract encoding within the lexicon, a challenge taken up by the Lexicon with Syllable Structure (LEWISS) model.

The Lexicon with Syllable Structure (LEWISS) Model

Proposed by Romani et al. (2011), the Lexicon with Syllable Structure (LEWISS) model explores the possibility of stored syllable structure in phonological encoding. As seen in Figure 9.5, the organisation of segments in this model is based on a syllable structure framework (similar to proposals by Selkirk, 1982, and Cairns & Feinstein, 1982). However, unlike the Dell model, the segments are not coded for syllable position. The syllable structural hierarchy is composed of syllable constituent nodes (onset, peak and coda), with the links having different weights based on their relative positions. This means that the peak (the most important part of a syllable) has a very strongly weighted link compared to onsets and codas. Within onsets and codas, the core positions are more strongly weighted than satellite positions. This weighting reflects positional variation in speech errors: onsets and codas are more vulnerable to errors than peaks, and within onsets and codas, satellite positions are more vulnerable than core positions. For example, in a word like print , the /r/ and /n/ in onset and coda satellite positions are more likely to be the subjects of errors than the /p/ and /t/, which are in core positions. The main evidence for the LEWISS model comes from the speech errors of aphasic patients (Romani et al., 2011). It was observed not only that they produced errors that affected syllable positions differently, but also that they preserved the syllable structure of their targets even when making errors.

Figure 9.5 A diagram of the Lexicon with Syllable Structure model, illustrating how the organisation of segments can be based on syllable structure.

In terms of syllabification, the LEWISS model syllabifies at morpheme and word edges instead of having to syllabify the entire utterance each time it is produced. The evidence from speech errors supports the idea of syllable position constraints. While Romani et al. (2011) presented data from Italian, speech error analysis in Spanish also supports this view (Garcia-Albea et al., 1989). The evidence from Spanish is also interesting in that the errors are mostly word-medial rather than word-initial, as is the case for English (Shattuck-Hufnagel, 1987, 1992). Stemberger (1990) hypothesised that structural frames for CV structure encoding may be compatible with the phonological systems proposed by Clements and Keyser (1983) as well as Goldsmith (1990). This was supported by speech errors from German and Swedish (Stemberger, 1984), although such patterns were not observed in English. Costa and Sebastián-Gallés (1998) found that primed picture-naming was facilitated by primes that shared CV structure with the targets, and Sevald, Dell and Cole (1995) found similar effects in repeated pronunciation tasks in English. Romani et al. (2011) brought these ideas to the fore with their analysis of speech errors made by Italian aphasic and apraxic patients, who performed repetition, reading, and picture-naming tasks. Both groups of patients produced errors that targeted vulnerable syllable positions such as onset and coda satellites, consistent with previous findings (Den Ouden, 2002). They also found that a large proportion of errors preserved the syllable structure of the target, as noted in earlier work (Wilshire, 2002). Romani and Calabrese (1996) had previously found that Italian patients replaced geminates with heterosyllabic rather than homosyllabic clusters: for example, /ʤi.raf.fa/ became /ʤi.rar.fa/ rather than /ʤi.ra.fra/, preserving the original syllable structure of the target.
While the Dell model's position-coded segments can also explain such errors, they cannot account for errors in which a segment moves from one syllable position to another. More recent computational analyses by Ramoo and Olson (2021) found that resyllabification rates in English and Hindi, as well as storage costs predicted by information theory, do not rule out LEWISS on grounds of storage or computational cost.

Language Production Models

Figure 9.3 The Dell Model

  • This is the non-verbal concept of the object, elicited when we see a picture or read or hear the word.
  • An abstract conceptual form of a word that has been mentally selected for utterance.
  • The meaningful unit (or units) of the lemma, attached to specific segments.
  • Syllable nodes are created using the syllable template.
  • Segment nodes are specified for syllable position, so [p onset] is a separate segment from [p coda].
  • This node indicates that the word is singular.
  • This node specifies the CV structure and segment order of the word.
  • A syllable template is used in the syllabification process to indicate which segments can go where.
  • The segment category nodes are specified for syllable position, so they only activate segments for onset, peak or coda syllable positions. Activation will be higher for the appropriate segment.
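The position coding in the annotations above can be sketched as a toy spreading-activation step. This is not Dell's published simulation; the lexicon entries, node labels, and activation values are invented purely to show that an onset /p/ and a coda /p/ are distinct nodes.

```python
# Toy sketch of Dell-style position coding (illustrative, not the
# published model): each segment node is tagged for syllable position,
# so onset /p/ and coda /p/ are different nodes in the network.
from collections import defaultdict

lexicon = {
    "pat": [("p", "onset"), ("æ", "peak"), ("t", "coda")],
    "tap": [("t", "onset"), ("æ", "peak"), ("p", "coda")],
}

def spread(word: str, boost: float = 1.0) -> dict:
    """Spread activation from a word node to its segment nodes."""
    activation = defaultdict(float)
    for node in lexicon[word]:
        activation[node] += boost
    return dict(activation)

a = spread("pat")
# ("p", "onset") is active; ("p", "coda") is a separate, inactive node.
print(a.get(("p", "onset"), 0.0), a.get(("p", "coda"), 0.0))
```

Because the two /p/ nodes never share activation, a position-coded network of this kind cannot by itself produce an error that moves a segment from onset to coda, which is the limitation discussed above.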

Figure 9.4 The LRM Model

  • Segment nodes are connected to the morpheme node and specified for serial position.
  • The morpheme is connected to a syllable template that indicates how many syllables are contained within the phonological word, and which syllables are stressed and unstressed.
  • Post-lexical syllabification uses the syllable template to syllabify the phonemes. This is also where phonological rules can be implemented; for example, in English, unvoiced stops are aspirated in the output.
  • Syllabified representations are used to access a mental syllabary of articulatory motor programs.
  • The final output.
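The pipeline in the annotations above can be caricatured in a short sketch: serially ordered segments are grouped by a stored syllable template, and the resulting syllables index a "mental syllabary" of motor programs. All forms, labels, and the template representation here are hypothetical simplifications.

```python
# Minimal sketch of the LRM pipeline (assumed forms, illustrative only):
# serially ordered segments -> post-lexical syllabification via a
# template -> syllabary lookup of articulatory motor programs.

def syllabify(segments: list, template: list) -> list:
    """Group serially ordered segments using a stored syllable template
    (here simplified to the number of segments per syllable)."""
    syllables, i = [], 0
    for size in template:
        syllables.append("".join(segments[i:i + size]))
        i += size
    return syllables

# Hypothetical syllabary mapping syllables to motor-program labels.
syllabary = {"tə": "MP-tə", "meɪ": "MP-meɪ", "toʊ": "MP-toʊ"}

segments = ["t", "ə", "m", "eɪ", "t", "oʊ"]      # "tomato", simplified
syls = syllabify(segments, template=[2, 2, 2])
print([syllabary[s] for s in syls])
```

Note that the syllables exist only after the template is applied, which is the LRM claim the LEWISS model disputes: nothing syllabic is stored with the segments themselves.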

LEWISS Model

Figure 9.5 The LEWISS Model

  • The syllable structure nodes encode the word's syllable structure. They also specify syllable stress or tone. In addition, the connections are weighted, so core and peak positions are strongly weighted compared to satellite positions.
  • Segment nodes are connected to the morpheme node. They are also connected to a syllable structure that keeps them in place.
  • Post-lexical syllabification syllabifies the phonemes at morpheme and word boundaries. This is also where phonological rules can be implemented; for example, in English, unvoiced stops are aspirated in the output.


Media Attributions

  • Figure 9.3 The Dell Model by Dinesh Ramoo, the author, is licensed under a  CC BY 4.0 licence .
  • Figure 9.4 The LRM Model by Dinesh Ramoo, the author, is licensed under a  CC BY 4.0 licence .
  • Figure 9.5 The LEWISS Model by Dinesh Ramoo, the author, is licensed under a  CC BY 4.0 licence .

Syllabification: The process of putting individual segments into syllables based on language-specific rules.

Resyllabification: The process by which segments that belong to one syllable move to another syllable during morphological changes and connected speech.

Syllable structure: The structure of the syllable in terms of onset, peak (or nucleus) and coda.

Psychology of Language Copyright © 2021 by Dinesh Ramoo is licensed under a Creative Commons Attribution 4.0 International License , except where otherwise noted.



Spencer Coffman


4 Stages of Speech Production


Humans produce speech on a daily basis. People are social creatures and are always talking to one another. Whether through social media, live conversation, texting, chat, or otherwise, we are always producing some form of speech. We produce this speech without thought.

That is, without thought of how we produce it. Of course, we think about what we are going to say and how to say it so that other people will listen, but we don't think about what speech is made of and how our mind and body actually produce it.

If you have been following my other language-related articles, then you will not be surprised to find out that there are four stages of speech production. It seems that those who classified this data did so in measures of fours and fives. There are…

Five Methods to Learn a Language

Four Ways to Assess Student Knowledge

Five Language Learning Strategies

Four Properties of Spoken Language

The list goes on! Now we have four stages of speech production: the processes by which humans produce speech. All of the ways we come up with the words we say have been compiled into four stages. These stages are not consecutive like the stages of a typical scientific model; rather, they are simply classified as such.

This means they are not something you pass through developmentally. Rather, they are different ways in which you may produce speech. I'll describe each one so you can understand what they are and know exactly how you come up with everything you say.


Stage 1 – Conceptualization

The first one is called the Conceptualization Stage. This is when a speaker spontaneously thinks of what he or she is going to say. It is an immediate reaction to external stimuli and is often based on prior knowledge of the particular subject. No premeditation goes into these words and they are all formulated based upon the speaker’s knowledge and experience at hand. It is spontaneous speech. Examples of this can range from answering questions to the immediate verbiage produced as a result of stubbing your toe.

Stage 2 – Formulation

The second stage is called the Formulation Stage. This is when the speaker thinks of the particular words that are going to express their thoughts. It occurs almost simultaneously with the conceptualization stage. However, this time the speaker thinks about the response before responding. The speaker is formulating his or her words and deciding how best to reply to the external stimuli. Where conceptualization is more of an instant and immediate response, formulation is a little delayed.

Stage 3 – Articulation

The third stage is the Articulation Stage. This is when the speaker physically says what he or she has thought of saying. This is prepared speech or planned wording. In addition, the words may have been rehearsed, such as when someone practices a presentation or rehearses a lie.

Articulation involves coordinating the physical actions of several motor speech organs such as the lungs, larynx, tongue, lips, and other vocal apparatus. Of course, the first two stages also involve these organs; however, the articulation stage uses them repeatedly for the same word patterns.

Stage 4 – Self-Monitoring

The fourth stage is called the Self-Monitoring Stage. This is when the speaker reflects on what he or she has said and makes an effort to correct any errors in his or her speech. Oftentimes this is done in a rebuttal or a final-word argument.

In addition, it can also happen during a conversation, when the speaker realizes that he or she has slipped up. This is the act of reflecting on what you said and making sure that what you said is what you meant.
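The four stages can be caricatured as a simple pipeline of functions. This is purely illustrative: the function names and strings are invented, and in real speech the stages overlap and run largely in parallel rather than strictly in sequence.

```python
# Playful sketch of the four stages as a pipeline (illustrative only;
# the stages overlap heavily in real speech production).

def conceptualize(stimulus: str) -> str:
    """Stage 1: spontaneously form an idea in response to a stimulus."""
    return f"idea about {stimulus}"

def formulate(idea: str) -> str:
    """Stage 2: choose the words that will express the idea."""
    return f"words for '{idea}'"

def articulate(plan: str) -> str:
    """Stage 3: physically produce the planned words."""
    return f"spoken: {plan}"

def self_monitor(utterance: str, intended: str) -> bool:
    """Stage 4: check that what was said matches what was meant."""
    return intended in utterance

plan = formulate(conceptualize("a question"))
utterance = articulate(plan)
print(self_monitor(utterance, plan))
```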


There you have it. Those are the four stages of speech production. Think about this and start to notice each time you are in each stage. Of course, you won’t be able to consciously notice what stage you are in all of the time. However, once in a while it may be amusing for you to reflect on these stages and see how they coincide with the words you speak.





