9 Influential Theories of Language Learning by Brilliant Thinkers

Language is all around us, and yet many of us find it challenging to pick up a new one.

Many of us wish to be more like the kids we once were when we learned our first language. Simply absorbing things the way kids do, without really thinking about the language, must surely be our best bet, we convince ourselves.

But here’s the thing. We’re not kids anymore and we never will be again. So, what can you do if you want to learn a new language? That’s where language learning theories come in. 

Theory, the most highly condensed form of thought based on principles and evidence, can help us as adults to excel in language learning in ways that would otherwise not be possible.

Of course, learning about language learning theory in no way needs to occupy the bulk of your time. By devoting just a fraction of your time to theory right now, you’ll reap benefits far beyond getting in an extra 10 minutes of studying. So without further ado, let’s start at the beginning.

1. Plato’s Problem
2. Cartesian Linguistics, by Descartes
3. Locke’s Tabula Rasa
4. Skinner’s Theory of Behaviorism
5. Chomsky’s Universal Grammar
6. Cognitive Theory
7. Schumann’s Acculturation Model
8. Krashen’s Monitor Model
9. Social Interactionist Theory


1. Plato’s Problem

The writings of Plato stretch all the way back to the beginnings of Western philosophical thought, but Plato was already posing problems critical to modern linguistic discourse.

In the nature versus nurture debate, Plato tended to side with nature, believing that knowledge was innate.

This was his answer to what has become known as Plato’s Problem, or as Bertrand Russell summarizes it: “How comes it that human beings, whose contacts with the world are brief and personal and limited, are nevertheless able to know as much as they do know?”

Being born with this knowledge from the get-go would naturally solve this little quandary, and consequently he viewed language as innate.

2. Cartesian Linguistics, by Descartes

Centuries later, the French philosopher Descartes took a crack at linguistic philosophy. In his opinion, language acquisition was a simple and easy process, barely worthy of his attention.

Like Plato, he believed in the innateness of language because he thought it reflected the general rationality of human beings.

More important for linguistics than Descartes himself, however, was the rationalist movement that he symbolized and that was thriving during his lifetime.

This “Cartesian” movement, according to Chomsky (who we’ll get to later), noted the creativity involved in everyday language and presented the idea that there were universal principles behind every language.

3. Locke’s Tabula Rasa

Most people familiar with Locke’s philosophy have heard of his concept of tabula rasa, or the blank slate.

To state it briefly and in a simplified manner, this is the idea that all knowledge comes from outside ourselves through sensory experience rather than through innate knowledge that we have at birth.

This naturally carried over to language theory with Locke rejecting the idea that there was an innate logic behind language.

Obviously, these theories don’t touch too much on the practical, everyday level of language learning. They’re far less detailed and more philosophical than the modern scientific theories we’re used to. But they have important implications.

If Plato and the Cartesians are right, then the emphasis in language learning must lie on what we already know, using our innate abilities to come to an understanding of the particularities of a specific language. If Locke is right, then we must focus our attention on sensory input, gaining as much external input as possible.

4. Skinner’s Theory of Behaviorism

In the middle of the 20th century, B.F. Skinner took Locke’s ideas of sensory input and ran with them.

According to behaviorism, all behavior is no more than a response to external stimuli and there’s no innate programming within a human being to learn a language at birth.

Under what he called “operant conditioning,” language learning grows out of a process of reinforcement and punishment whereby individuals are conditioned into saying the right thing.

For instance, if you’re hungry and you’re able to say “Mommy, I’m hungry,” you may be rewarded with food and your behavior will thereby be reinforced since you got what you wanted.

To put it another way, Skinner described a mechanism for language learning that hadn’t existed before on the tabula rasa side of the language acquisition debate.

What this means for us as language learners, should his theory be even partially true, is that a process of conditioning must be achieved for us to succeed.

When we say the right thing, we must be rewarded. When we say something incorrectly, that too must be made clear. In other words, we need feedback to succeed as language learners.

5. Chomsky’s Universal Grammar

Around the same time as Skinner, in the 1950s, Noam Chomsky, one of the most influential nativist theorists, proposed a theory called Universal Grammar, which asserted nearly the exact opposite of what Skinner had offered.

Where Skinner saw all learning coming from external stimuli, Chomsky saw an innate device for language acquisition. What Skinner understood to be conditioning according to particular events, Chomsky understood to be the result of the universal elements that structure all languages.

In fact, one of Chomsky’s major bones to pick with Skinner’s theory had to do with Plato’s problem, as described above. After all, if Skinner is right, how is it that children can learn a language so quickly, creating and understanding sentences they have never heard before?

Universal Grammar has also received plenty of criticism. One critique that particularly concerns us is that it may have little to do with learning a second language, even if it’s how we learn a first language.

There are certainly theories about applying this concept to organize syllabi for language learning, but this seems unnecessarily complex for the average, independent learner.

6. Cognitive Theory

The Cognitive Theory of language acquisition made its mark in the late 20th century, influenced by the pioneering work of Swiss psychologist Jean Piaget. This theory emphasizes the role of cognitive processes like memory, attention and problem-solving in the language learning journey. In other words, it says that to speak a language you don’t just need words and grammar; it’s also important to have meaningful and engaging experiences.

When it comes to learning a foreign language, the Cognitive Theory serves up a fresh perspective. Stop passively memorizing vocabulary lists and start applying your language knowledge in practical, real-world contexts!

It’s all about language input and exposure. Just like children thrive on exposure to their native language, adults benefit from authentic materials. So dive into literature, immerse yourself in videos, and engage in real-life conversations – these experiences provide a diverse palette of language structures and vocabulary.

But the Cognitive Theory doesn’t stop there. It recognizes the power of metacognitive strategies in your language learning journey. Think of it as the captain steering the ship. Strategies like self-monitoring, self-assessment, and reflection become your compass. They help you navigate the language-learning waters, fine-tune your course, and adapt your language usage as you go. In other words, they’re your secret weapons in regulating your learning and becoming a more effective language learner!

7. Schumann’s Acculturation Model

John Schumann’s Acculturation Model describes the process by which immigrants pick up a new language while being completely immersed in that language.

This theory doesn’t deal with the process of language learning as we normally think of it (such as how we acquire grammar or listening skills), but rather focuses on social and psychological aspects that influence our success.

For instance, an immigrant is more likely to acquire their new target language if their language and the target language are socially equal, if the group of immigrants is small and not cohesive and if there is a higher degree of similarity between the immigrant’s culture and that of their new area of residence.

The obvious takeaway is that language learning is not an abstract subject like physics that can be learned out of a book regardless of the world around you. There are sociological factors at play, and the more we do to connect with the culture on the other end of our second language, the faster and easier it will be for us to learn that language.

For example, as a language learner, one way to engage with the cultural context in a manner that imitates the immersion experience is through a program such as FluentU.

8. Krashen’s Monitor Model

Stephen Krashen’s Monitor Model in fact consists of several distinct hypotheses, which together make up what is probably the most cited theory in second language acquisition. There’s so much to take away from Krashen’s theory that I’ll just give a rundown of the highlights here.

  • Language acquisition is subconscious and results from informal, natural communication.
  • Language learning is conscious and driven by error correction (more formal).
  • Grammar structures are acquired in a predictable order.
  • Language acquisition occurs with comprehensible input (i.e. hearing or reading things that are just slightly above our current language level).
  • A monitor is anything that corrects your language performance and pressures you to “communicate correctly and not just convey meaning” (such as a language teacher who corrects you when you make a grammatical mistake).

It should be noted that this is just Krashen’s theory. While this theory is quite popular, there has been criticism and direct contradiction of certain parts of it (particularly his idea about the predictable order of grammar structures). Still, it’s useful to get ideas for language learning.

This theory suggests that we should both strive to increase our second language inputs and make sure we receive proper error correction in one form or another.

9. Social Interactionist Theory

Lev Vygotsky’s Social Interactionist Theory of language acquisition is all about the power of social interaction in your language learning journey. According to this theory, language learning isn’t a solo endeavor; it is shaped by your interactions and collaborations with the people around you within your cultural context.

It emphasizes the profound impact of authentic interactions and collaborations. To learn effectively, it’s not just about your innate abilities, but also about immersing yourself in social and cultural environments. When you interact with native speakers, you can observe, imitate, and receive immediate feedback and corrections – a real-life language lab at your disposal.

The Social Interactionist Theory doesn’t stop at language alone. It understands the inextricable connection between language and culture. To truly master a foreign language, you need to grasp the cultural nuances intertwined with it. Dive deep into the traditions, practices, and perspectives of the language you’re studying.

As this selection of important theories should make clear, the subset of linguistics which deals with language learning is both wide and deep.

Some of it is highly theoretical and complex and is most relevant to scholars of the field. Other parts are extremely zoomed in and tell us highly specific details about how to learn a language.

Regardless, it’s all connected.

By understanding more bits and pieces of it all, you’ll gradually begin to understand yourself and your own language learning process better than ever before.


Language Acquisition Theory

Henna Lemetyinen

Postdoctoral Researcher

BSc (Hons), Psychology, PhD, Developmental Psychology

Henna Lemetyinen is a postdoctoral research associate at the Greater Manchester Mental Health NHS Foundation Trust (GMMH).


Saul Mcleod, PhD

Editor-in-Chief for Simply Psychology

BSc (Hons) Psychology, MRes, PhD, University of Manchester

Saul Mcleod, PhD., is a qualified psychology teacher with over 18 years of experience in further and higher education. He has been published in peer-reviewed journals, including the Journal of Clinical Psychology.

Olivia Guy-Evans, MSc

Associate Editor for Simply Psychology

BSc (Hons) Psychology, MSc Psychology of Education

Olivia Guy-Evans is a writer and associate editor for Simply Psychology. She has previously worked in healthcare and educational sectors.


Language is a cognition that truly makes us human. Whereas other species do communicate with an innate ability to produce a limited number of meaningful vocalizations (e.g., bonobos) or even with partially learned systems (e.g., bird songs), there is no other species known to date that can express infinite ideas (sentences) with a limited set of symbols (speech sounds and words).

This ability is remarkable in itself. What makes it even more remarkable is that researchers are finding evidence for mastery of this complex skill in increasingly younger children.


Infants as young as 12 months are reported to have sensitivity to the grammar needed to understand causative sentences (who did what to whom; e.g., the bunny pushed the frog) (Rowland & Noble, 2010).

After more than 60 years of research into child language development, the mechanism that enables children to segment syllables and words out of the strings of sounds they hear and to acquire grammar to understand and produce language is still quite an enigma.

Behaviorist Theory of Language Acquisition

One of the earliest scientific explanations of language acquisition was provided by Skinner (1957). As one of the pioneers of behaviorism, he accounted for language development using environmental influence, through imitation, reinforcement, and conditioning.

In this view, children learn words and grammar primarily by mimicking the speech they hear and receiving positive feedback for correct usage.

Skinner argued that children learn language based on behaviorist reinforcement principles by associating words with meanings. Correct utterances are positively reinforced when the child realizes the communicative value of words and phrases.

For example, when the child says ‘milk’, the mother smiles and gives her some. The child finds this outcome rewarding, which enhances her language development (Ambridge & Lieven, 2011).

Over time, through repetition and reinforcement, children refine their linguistic abilities. Critics argue this theory doesn’t fully explain the rapid pace of language acquisition nor the creation of novel sentences.

Chomsky Theory of Language Development

However, Skinner’s account was soon heavily criticized by Noam Chomsky, the world’s most famous linguist to date.

In the spirit of the cognitive revolution in the 1950s, Chomsky argued that children would never acquire the tools needed for processing an infinite number of sentences if the language acquisition mechanism was dependent on language input alone.

Noam Chomsky introduced the nativist theory of language development, emphasizing the role of innate structures and mechanisms in the human brain. Key points of Chomsky’s theory include:

Language Acquisition Device (LAD): Chomsky proposed that humans have an inborn biological capacity for language, often termed the LAD, which predisposes them to acquire language.

Universal Grammar: He suggested that all human languages share a deep structure rooted in a set of grammatical rules and categories. This “universal grammar” is understood intuitively by all humans.

Poverty of the Stimulus: Chomsky argued that the linguistic input received by young children is often insufficient (or “impoverished”) for them to learn the complexities of their native language solely through imitation or reinforcement. Yet, children rapidly and consistently master their native language, pointing to inherent cognitive structures.

Critical Period: Chomsky, along with other linguists, posited a critical period for language acquisition, during which the brain is particularly receptive to linguistic input, making language learning more efficient.

Critics of Chomsky’s theory argue that it’s too innatist and doesn’t give enough weight to social interaction and other factors in language acquisition.

Universal Grammar

Consequently, Chomsky proposed the theory of Universal Grammar: an idea of innate, biological grammatical categories, such as a noun category and a verb category, that facilitate the whole of language development in children and overall language processing in adults.

Universal Grammar contains all the grammatical information needed to combine these categories, e.g., nouns and verbs, into phrases. The child’s task is just to learn the words of her language (Ambridge & Lieven, 2011).

For example, according to the Universal Grammar account, children instinctively know how to combine a noun (e.g., a boy) and a verb (to eat) into a meaningful, correct phrase (A boy eats).

This Chomskian (1965) approach to language acquisition has inspired hundreds of scholars to investigate the nature of these assumed grammatical categories, and the research is still ongoing.

Contemporary Research

A decade or two later, some psycho-linguists began to question the existence of Universal Grammar. They argued that categories like nouns and verbs are biologically, evolutionarily, and psychologically implausible and that the field called for an account that can explain the acquisition process without innate categories.

Researchers started to suggest that instead of having a language-specific mechanism for language processing, children might utilize general cognitive and learning principles.

Whereas researchers approaching the language acquisition problem from the perspective of Universal Grammar argue for early full productivity, i.e., early adult-like knowledge of the language, the opposing constructivist investigators argue for a more gradual developmental process. They suggest that children are sensitive to patterns in language, and that this sensitivity enables the acquisition process.

An example of this gradual pattern learning is morphology acquisition. Morphemes are the smallest grammatical markers, or units, in language that alter words. In English, regular plurals are marked with an –s morpheme (e.g., dog+s).

Similarly, English third singular verb forms (she eat+s, a boy kick+s) are marked with the –s morpheme. Children are considered to acquire their first instances of third singular forms as entire phrasal chunks (Daddy kicks, a girl eats, a dog barks) without the ability to tease the finest grammatical components apart.

When the child hears a sufficient number of instances of a linguistic construction (i.e., the third singular verb form), she will detect patterns across the utterances she has heard. In this case, the repeated pattern is the –s marker in this particular verb form.

As a result of many repetitions and examples of the –s marker in different verbs, the child will acquire sophisticated knowledge that, in English, verbs must be marked with an –s morpheme in the third singular form (Ambridge & Lieven, 2011; Pine, Conti-Ramsden, Joseph, Lieven & Serratrice, 2008; Theakston & Lieven, 2005).
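
The pattern-detection idea described above can be illustrated with a toy sketch. This is not a model taken from the cited studies; the function name, the one-letter suffix window and the threshold of three distinct verbs are all assumptions chosen purely for illustration. It treats each utterance as an unanalysed chunk and only "notices" an ending once it has been heard on enough different verbs.

```python
from collections import defaultdict


def detect_suffix_patterns(chunks, suffix_len=1, min_verbs=3):
    """Toy illustration: find endings shared by the last word of many chunks.

    suffix_len and min_verbs are arbitrary, hypothetical settings,
    not values drawn from the acquisition literature.
    """
    verbs_by_suffix = defaultdict(set)
    for chunk in chunks:
        words = chunk.lower().split()
        if not words:
            continue  # skip empty utterances
        last_word = words[-1]
        # Group the final word of each chunk by its ending
        verbs_by_suffix[last_word[-suffix_len:]].add(last_word)
    # Keep only endings seen on at least min_verbs distinct words
    return {
        suffix: sorted(verbs)
        for suffix, verbs in verbs_by_suffix.items()
        if len(verbs) >= min_verbs
    }


heard = ["Daddy kicks", "a girl eats", "a dog barks"]
print(detect_suffix_patterns(heard))      # {'s': ['barks', 'eats', 'kicks']}
print(detect_suffix_patterns(heard[:2]))  # {} (too few instances so far)
```

With only two chunks the sketch finds nothing; the –s ending emerges as a pattern only after a third distinct verb has been heard, which mirrors the "sufficient number of instances" condition in the text.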

Approaching language acquisition from the perspective of general cognitive processing offers an economical account of how children can learn their first language without an excessive biolinguistic mechanism.

However, the search for a solid answer to the problem of language acquisition is far from over. Our current understanding of the developmental process is still immature.

Investigators of Universal Grammar are still trying to convince the field that language is too demanding a task to acquire without specific innate equipment, whereas constructivist researchers are fiercely arguing for the importance of linguistic input.

The biggest questions, however, are as yet unanswered. What is the exact process that transforms the child’s utterances into grammatically correct, adult-like speech? How much does the child need to be exposed to language to achieve the adult-like state?

What account can explain variation between languages and the language acquisition process in children acquiring very different languages to English? The mystery of language acquisition is guaranteed to keep psychologists and linguists alike astonished decade after decade.

What is language acquisition?

Language acquisition refers to the process by which individuals learn and develop their native or second language.

It involves the acquisition of grammar, vocabulary, and communication skills through exposure, interaction, and cognitive development. This process typically occurs in childhood but can continue throughout life.

What is Skinner’s theory of language development?

Skinner’s theory of language development, also known as behaviorist theory, suggests that language is acquired through operant conditioning. According to Skinner, children learn language by imitating and being reinforced for correct responses.

He argued that language is a result of external stimuli and reinforcement, emphasizing the role of the environment in shaping linguistic behavior.

What is Chomsky’s theory of language acquisition?

Chomsky’s theory of language acquisition, known as Universal Grammar, posits that language is an innate capacity of humans.

According to Chomsky, children are born with a language acquisition device (LAD), a biological ability that enables them to acquire language rules and structures effortlessly.

He argues that there are universal grammar principles that guide language development across cultures and languages, suggesting that language acquisition is driven by innate linguistic knowledge rather than solely by environmental factors.

Ambridge, B., & Lieven, E.V.M. (2011). Language Acquisition: Contrasting theoretical approaches. Cambridge: Cambridge University Press.

Chomsky, N. (1965). Aspects of the Theory of Syntax. MIT Press.

Pine, J.M., Conti-Ramsden, G., Joseph, K.L., Lieven, E.V.M., & Serratrice, L. (2008). Tense over time: testing the Agreement/Tense Omission Model as an account of the pattern of tense-marking provision in early child English. Journal of Child Language, 35(1): 55-75.

Rowland, C. F., & Noble, C. L. (2010). The role of syntactic structure in children’s sentence comprehension: Evidence from the dative. Language Learning and Development, 7(1): 55-75.

Skinner, B.F. (1957). Verbal behavior. Acton, MA: Copley Publishing Group.

Theakston, A.L., & Lieven, E.V.M. (2005). The acquisition of auxiliaries BE and HAVE: an elicitation study. Journal of Child Language, 32(2): 587-616.

Further Reading

An excellent article by Steven Pinker on Language Acquisition

Pinker, S. (1995). The New Science of Language and Mind. Penguin.

Tomasello, M. (2005). Constructing A Language: A Usage-Based Theory of Language Acquisition. Harvard University Press.


Introduction The Acquisition-Learning Hypothesis The Natural Order Hypothesis The Monitor Hypothesis The Input Hypothesis The Affective Filter Hypothesis Curriculum Design Conclusions Bibliography
  Introduction         The influence of Stephen Krashen on language education research and practice is undeniable.  First introduced over 20 years ago, his theories are still debated today.  In 1983, he published The Natural Approach with Tracy Terrell, which combined a comprehensive second language acquisition theory with a curriculum for language classrooms.  The influence of Natural Approach can be seen especially in current EFL textbooks and teachers resource books such as The Lexical Approach (Lewis, 1993).  Krashen’s theories on second language acquisition have also had a huge impact on education in the state of California, starting in 1981 with his contribution to Schooling and language minority students: A theoretical framework by the California State Department of Education (Krashen 1981).  Today his influence can be seen most prominently in the debate about bilingual education and perhaps less explicitly in language education policy:  The BCLAD/CLAD teacher assessment tests define the pedagogical factors affecting first and second language development in exactly the same terms used in Krashen’s Monitor Model (California Commission on Teacher Credentialing, 1998).         As advertised, The Natural Approach is very appealing – who wouldn’t want to learn a language the natural way, and what language teacher doesn’t think about what kind of input to provide for students.  However, upon closer examination of Krashen’s hypotheses and Terrell’s methods, they fail to provide the goods for a workable system.  In fact, within the covers of “The Natural Approach”, the weaknesses that other authors criticize can be seen playing themselves out into proof of the failure of Krashen’s model.  In addition to reviewing what other authors have written about Krashen’s hypotheses, I will attempt to directly address what I consider to be some of the implications for ES/FL teaching today by drawing on my own experience in the classroom as a teacher and a student of language.  
Rather than use Krashen’s own label, which is to call his ideas simply “second language acquisition theory”, I will adopt McLaughlin’s terminology (1987) and refer to them collectively as “the Monitor Model”.  This is distinct from “the Monitor Hypothesis”, which is the fourth of Krashen’s five hypotheses. The Acquisition-Learning Hypothesis         First is the Acquisition-Learning Hypothesis, which makes a distinction between “acquisition,” which he defines as developing competence by using language for “real communication” and “learning.” which he defines as “knowing about” or “formal knowledge” of a language (p.26).  This hypothesis is presented largely as common sense: Krashen only draws on only one set of references from Roger Brown in the early 1970’s.  He claims that Brown’s research on first language acquisition showed that parents tend to correct the content of children’s speech rather than their grammar.  He compares it with several other authors’ distinction of “implicit” and “explicit” learning but simply informs the reader that evidence will be presented later.         Gregg (1984) first notes that Krashen’s use of the Language Acquisition Device (LAD) gives it a much wider scope of operation than even Chomsky himself.  He intended it simply as a construct to describe the child’s initial state, which would therefore mean that it cannot apply to adult learners.  Drawing on his own experience of learning Japanese, Gregg contends that Krashen’s dogmatic insistence that “learning” can never become “acquisition” is quickly refuted by the experience of anyone who has internalized some of the grammar they have consciously memorized.  However, although it is not explicitly stated, Krashen’s emphasis seems to be that classroom learning does not lead to fluent, native-like speech.  Gregg’s account that his memorization of a verb conjugation chart was “error-free after a couple of days”(p.81) seems to go against this spirit.  
The reader is left to speculate whether his proficiency in Japanese at the time was sufficient enough for him to engage in error-free conversations with the verbs from his chart.         McLaughlin (1987) begins his critique by pointing out that Krashen never adequately defines “acquisition”, “learning”, “conscious” and “subconscious”, and that without such clarification, it is very difficult to independently determine whether subjects are “learning” or “acquiring” language.  This is perhaps the first area that needs to be explained in attempting to utilize the Natural Approach.  If the classroom situation is hopeless for attaining proficiency, then it is probably best not to start.  As we will see in an analysis of the specific methods in the book, any attempt to recreate an environment suitable for “acquisition” is bound to be problematic.         Krashen’s conscious/unconscious learning distinction appeals to students and teachers in monolingual countries immediately.  In societies where there are few bilinguals, like the United States, many people have struggled to learn a foreign language at school, often unsuccessfully.  They see people who live in other countries as just having “picked up” their second language naturally in childhood.  The effort spent in studying and doing homework seems pointless when contrasted with the apparent ease that “natural” acquisition presents.  This feeling is not lost on teachers: without a theoretical basis for the methods, given any perceived slow progress of their students, they would feel that they have no choice but to be open to any new ideas         Taking a broad interpretation of this hypothesis, the main intent seems to be to convey how grammar study (learning) is less effective than simple exposure (acquisition).  
This is something that very few researchers seem to doubt, and recent findings in the analysis of right hemisphere trauma indicate a clear separation of the facilities for interpreting context-independent sentences from context-dependent utterances (Paradis, 1998).  However, when called upon to clarify, Krashen takes the somewhat less defensible position that the two are completely unrelated and that grammar study has no place in language learning (Krashen 1993a, 1993b).  As several authors have shown (Gregg 1984, McLaughlin 1987, and Lightbown & Pienemann 1993, for a direct counter-argument to Krashen 1993a) there are countless examples of how grammar study can be of great benefit to students learning by some sort of communicative method. The Natural Order Hypothesis         The second hypothesis is simply that grammatical structures are learned in a predictable order.  Once again this is based on first language acquisition research done by Roger Brown, as well as that of Jill and Peter de Villiers.  These studies found striking similarities in the order in which children acquired certain grammatical morphemes.  Krashen cites a series of studies by Dulay and Burt which show that a group of Spanish speaking and a group of Chinese speaking children learning English as a second language also exhibited a “natural” order for grammatical morphemes which did not differ between the two groups.  A rather lengthy end-note directs readers to further research in first and second language acquisition, but somewhat undercuts the basic hypothesis by showing limitations to the concept of an order of acquisition.         Gregg argues that Krashen has no basis for separating grammatical morphemes from, for example, phonology.  Although Krashen only briefly mentions the existence of other parallel “streams” of acquisition in The Natural Approach, their very existence rules out any order that might be used in instruction.  
The basic idea of a simple linear order of acquisition is extremely unlikely, Gregg reminds us. In addition, if there are individual differences then the hypothesis is neither provable nor falsifiable and, in the end, not useful.

McLaughlin points out the methodological problems with Dulay and Burt’s 1974 study, and cites a study by Hakuta and Cancino (1977, cited in McLaughlin, 1987, p.32) which found that the complexity of a morpheme depended on the learner’s native language. The difference between the experience of a speaker of a Germanic language studying English and that of a speaker of an Asian language studying English is a clear indication of the relevance of this finding. The contradictions for planning curriculum are immediately evident. Having just discredited grammar study in the Acquisition-Learning Hypothesis, Krashen suddenly proposes that second language learners should follow the “natural” order of acquisition for grammatical morphemes. The teacher is first instructed to create a natural environment for the learner but then, in trying to create a curriculum, is instructed to base it on grammar. As described below in an analysis of the actual classroom methods presented in the Natural Approach, attempting to put these conflicting theories into practice is very problematic.

When one examines this hypothesis in terms of comprehension and production, its insufficiencies become even more apparent. Many of the studies of order of acquisition, especially those in first language acquisition, are based on production. McLaughlin also points out that “correct usage” is not monolithic – even for grammatical morphemes, correct usage in one situation does not guarantee correct usage in another (p.33). In this sense, the term “acquisition” becomes very unclear, even when not applying Krashen’s definition. Is a structure “acquired” when there are no mistakes in comprehension?
Or is it acquired when there is a certain level of accuracy in production? First language acquisition is very closely linked to the cognitive development of infants, but second language learners have most of these faculties present, even as children. Further, even if some weak form of natural order exists for any learners who are speakers of a given language, learning in a given environment, it is not clear that the order is the same for comprehension and production. If these two orders differ, it is not clear how they would interact.

The Monitor Hypothesis

The role of conscious learning is defined in this somewhat negative hypothesis: the only role that such “learned” competence can have is that of an editor on what is produced. Output is checked and repaired, after it has been produced, by the explicit knowledge the learner has gained through grammar study. The implication is that the use of this Monitor should be discouraged and that production should be left up to some instinct that has been formed by “acquisition”. Speech produced with the Monitor is halting, since the Monitor can only check what has already been produced, whereas Monitor-free speech is much more instinctive and less contrived. However, Krashen later describes cases of using the Monitor efficiently (p. 32) to eliminate errors on “easy” rules. This hypothesis presents very little in the way of supportive evidence: Krashen cites several studies by Bialystok alone and with Frohlich as “confirming evidence” (p.31) and several of his own studies on the difficulty of confirming acquisition of grammar.

Perhaps Krashen’s recognition of this factor was indeed a step forward – language learners and teachers everywhere know the feeling that the harder they try to make a correct sentence, the worse it comes out. However, he seems to draw the lines around it a bit too closely.
Gregg points out (p.84) that by restricting monitor use to “learned” grammar and only in production, Krashen in effect makes the Acquisition-Learning Hypothesis and the Monitor Hypothesis contradictory. Gregg also points out that restricting learning to the role of editing production completely ignores comprehension (p.82). Explicitly learned grammar can obviously play a crucial role in understanding speech.

McLaughlin gives a thorough dissection of the hypothesis, showing that Krashen has never demonstrated the operation of the Monitor in his own or any other research. Even the further qualification that it only works on discrete-point tests on one grammar rule at a time failed to produce evidence of operation. Only one study (Seliger, cited on p.26) was able to find narrow conditions for its operation, and even there the conclusion was that it was not representative of the conscious knowledge of grammar. He goes on to point out how difficult it is to determine if one is consciously employing a rule, and that such conscious editing actually interferes with performance. But his most convincing argument is the existence of learners who have taught themselves a language with very little contact with native speakers. These people are perhaps rare on the campuses of U.S. universities, but it is quite undeniable that they exist.

The role that explicitly learned grammar and incidentally acquired exposure have in forming sentences is far from clear. Watching intermediate students practice using recasts is certainly convincing evidence that something like the Monitor is at work: even without outside correction, they can eliminate the errors in a target sentence or expression of their own ideas after several tries. However, psycholinguists have yet to determine just what goes into sentence processing and bilingual memory.
In a later paper (Krashen 1991), Krashen tried to show that high school students, despite applying spelling rules they knew explicitly, performed worse than college students who did not remember such rules. He failed to address not only the relevance of this study to the ability to communicate in a language, but also the possibility that, whether they remembered the rules or not, the college students probably did know the rules consciously at some point, which again violates the Acquisition-Learning Hypothesis.

The Input Hypothesis

Here Krashen explains how successful “acquisition” occurs: by simply understanding input that is a little beyond the learner’s present “level”. He defines that present “level” as i and the ideal level of input as i + 1. In the development of oral fluency, unknown words and grammar are deduced through the use of context (both situational and discursive), rather than through direct instruction. Krashen draws on several areas for proof of the Input Hypothesis. One is the speech that parents use when talking to children (caretaker speech), which he says is vital in first language acquisition (p.34). He also illustrates how good teachers tune their speech to their students’ level, and how, when talking to each other, second language learners adjust their speech in order to communicate. This hypothesis is also supported by the fact that the first second-language utterances of adult learners are often very similar to those of infants in their first language. However, it is the results of methods such as Asher’s Total Physical Response that provide the most convincing evidence.
This method was shown to be far superior to audiolingual, grammar-translation or other approaches, producing what Krashen calls “nearly five times the [normal] acquisition rate.”

Gregg spends substantial time on this particular hypothesis, because, while it seems to be the core of the model, it is simply an uncontroversial observation with no process described and no proof provided. He brings up the very salient point that perhaps practice does indeed also have something to do with second language acquisition, pointing out that monitoring could be used as a source of correct utterances (p. 87). He also cites several studies that shed some doubt on the connection between caretaker speech in first language acquisition and simplified input in second language acquisition.

McLaughlin also gives careful and thorough consideration to this part of Krashen’s model. He addresses each of the ten lines of evidence that Krashen presents, arguing that it is not sufficient to simply say that certain phenomena can be viewed from the perspective of the Input Hypothesis. The concept of a learner’s “level” is extremely difficult to define, just as the idea of i + 1 is (p.37). Further, there are many structures such as passives and yes/no questions that cannot be learned through context. Also, there is no evidence that a learner has to fully comprehend an utterance for it to aid in acquisition. Some of the first words that children and second language learners produce are formulaic expressions that are not fully understood initially. Finally, McLaughlin points out that Krashen simply ignores other internal factors such as motivation and the importance of producing language for interaction.

This hypothesis is perhaps the most appealing part of Krashen’s model for the language learner as well as the teacher.
He makes use of the gap between comprehension and production that everyone feels, enticing us with the hope of instant benefits if we just get the input tuned to the right level. One of Krashen’s cleverest catch-alls is that other methods of teaching appear to work at times because they inadvertently provide this input. But the disappointment is that he never gives any convincing idea as to how it works. In the classroom a teacher can see when the students don’t understand and can simplify his or her speech to the point where they do. Krashen would have the teacher think that this is all that is necessary, and that it is just a matter of time before the students are able to express themselves freely. However, Ellis (1992) points out that even as of his 1985 work (Krashen 1985), Krashen still had not provided a single study that demonstrated the Input Hypothesis. Over extended periods of time students do learn to understand more and even how to speak, but it often seems to take much longer than Krashen implies, indicating that there are perhaps many more factors involved. More importantly, even given this starting point of i and the goal of i + 1, indefinable as they are, the reader is given no indication of how to proceed. As shown above, the Natural Order Hypothesis holds no answers, especially as to how comprehension progresses. In an indication of a direction that should be explored, Ellis’s exploratory study (ibid.) showed that it is the effort involved in attempting to understand input, rather than simple comprehension, that fuels acquisition.

The Affective Filter Hypothesis

This concept receives the briefest treatment in “The Natural Approach”.
Krashen simply states that “attitudinal variables relate directly to language acquisition but not language learning.” He cites several studies that examine the link between motivation and self-image, arguing that an “integrative” motivation (the learner wants to “be like” the native speakers of a language) is necessary. He postulates an “affective filter” that acts before the Language Acquisition Device and restricts the desire to seek input if the learner does not have such motivation. Krashen also says that at puberty, this filter increases dramatically in strength.

Gregg notes several problems with this hypothesis as well. Among others, Krashen seems to indicate that perhaps the affective filter is associated with the emotional upheaval and hypersensitivity of puberty, but Gregg notes that this would indicate that the filter would slowly disappear in adulthood, which Krashen does not allow for (p.92). He also remarks on several operational details, such as the fact that simply not being unmotivated would be the same as being highly motivated in this hypothesis – neither is the negative state of being unmotivated. Also, he questions how this filter would selectively choose certain “parts of a language” to reject (p.94).

McLaughlin argues much along the same lines as Gregg and points out that adolescents often acquire languages faster than younger, monitor-free children (p.29). He concludes that while affective variables certainly play a critical role in acquisition, there is no need to theorize a filter like Krashen’s.

Again, the teacher in the classroom is enticed by this hypothesis because of the obvious effects of self-confidence and motivation. However, Krashen seems to imply that teaching children, who don’t have this filter, is somehow easier, since “given sufficient exposure, most children reach native-like levels of competence in second languages” (p.47).
This completely ignores the demanding situations that face language minority children in the U.S. every day. A simplification into a one-page “hypothesis” gives teachers the idea that these problems are easily solved and fluency is just a matter of following this path. As Gregg and McLaughlin point out, however, in trying to put these ideas into practice, one quickly runs into problems.

Curriculum Design

The educational implications of Krashen’s theories become more apparent in the remainder of the book, where he and Terrell lay out the specific methods that make use of the Monitor Model. These ideas are based on Terrell’s earlier work (Terrell, 1977) but have been expanded into a full curriculum. The authors qualify this collection somewhat by saying that teachers can use all or part of the Natural Approach, depending on how it fits into their classroom.

This freedom, combined with the thoroughness of their curriculum, makes the Natural Approach very attractive. In fact, the guidelines they set out at the beginning (communication is the primary goal, comprehension precedes production, production simply emerges, acquisition activities are central, and the affective filter should be lowered; p. 58-60) are, without question, excellent guidelines for any language classroom. The compilation of topics and situations (p.67-70) which makes up their curriculum is a good, broad overview of many of the things that students who study by grammar translation or audiolingual methods do not get. The list of suggested rules (p.74) is notable in its departure from previous methods, with its insistence on target language input but its allowance for partial, non-grammatical or even L1 responses.

Outside of these areas, application of the suggestions runs into some difficulty. Three general communicative goals of being able to express personal identification, experiences and opinions (p.73) are presented, but there is no theoretical background.
The Natural Approach contains ample guidance and resources for the beginner levels, with methods for introducing basic vocabulary and situations in a way that keeps students involved. It also has very viable techniques for more advanced and self-confident classes who will be stimulated by the imaginative situational practice (starting on p.101). However, teachers of the broad middle range of students who have gotten a grip on basic vocabulary but are still struggling with sentence and question production are left with conflicting advice.

Once beyond one-word answers to questions, the Natural Approach ventures out onto thin ice by suggesting elicited productions. These take the form of open-ended sentences, open dialogs and even prefabricated patterns (p.84). These formats necessarily involve explicit use of grammar, which violates every hypothesis of the Monitor Model. The authors write this off as training for optimal Monitor use (p.71, 142), despite Krashen’s promotion of “Monitor-free” production. Even if a teacher were to set off in this direction and begin to introduce a “structure of the day” (p. 72), once again there is no theoretical basis for what to choose. Perhaps the most glaring omission is the lack of any reference to the Natural Order Hypothesis, which, as noted previously, contained no realistically usable information for designing curriculum.

Judging from the emphasis on exposure in the Natural Approach and the pattern of Krashen’s later publications, which focused on the Input Hypothesis, the solution to curriculum problems seems to be massive listening. However, as noted before, other than i + 1, there is no theoretical basis for overall curriculum design regarding comprehension. Once again, the teacher is forced to rely on a somewhat dubious “order of acquisition”, which is based on production anyway. Further, the link from exposure to production targets is tenuous at best. Consider the dialog presented on p.87: . . .
to the question What is the man doing in this picture? the students may reply run.  The instructor expands the answer.  Yes, that’s right, he’s running.


Krashen's Language Acquisition Hypotheses: A Critical Review



  • Open access
  • Published: 15 September 2021

Language and nonlanguage factors in foreign language learning: evidence for the learning condition hypothesis

Xin Kang (ORCID: orcid.org/0000-0002-1126-5771), Stephen Matthews (ORCID: orcid.org/0000-0001-7683-8051), Virginia Yip & Patrick C. M. Wong (ORCID: orcid.org/0000-0002-6105-5027)

npj Science of Learning, volume 6, Article number: 28 (2021)


  • Human behaviour
  • Interdisciplinary studies

The question of why native and foreign languages are learned with a large performance gap has prompted language researchers to hypothesize that they are subserved by fundamentally different mechanisms. However, this hypothesis may not have taken into account that these languages can be learned under different conditions (e.g., naturalistic vs. classroom settings). With a large sample of 636 third language (L3) learners who learned Chinese and English as their first (L1) and second (L2) languages, the present study examined the association of learning success across L1–L3. We argue that learning conditions may reveal how these languages are associated in terms of learning success. Because these languages were learned under a continuum of naturalistic to classroom conditions from L1 to L3, this sample afforded us a unique opportunity to evaluate the hypothesis that similar learning conditions between languages could be an important driving force determining language learning success. After controlling for nonlanguage factors such as musical background and motivational factors and using a convergence of analytics including the general linear models, the structural equation models, and machine learning, we found that the closer two languages were on the continuum of learning conditions, the stronger their association of learning success. Specifically, we found a significant association between L1 and L2 and between L2 and L3, but not between L1 and L3. Our results suggest that learning conditions may have important implications for the learning success of L1–L3.


Introduction

For decades, linguists, psychologists, neuroscientists, and educators have been puzzled by the observation that young children can learn their native languages with ease, yet adults often struggle to learn even the basics of foreign languages [1–4]. This observation has propelled a large body of research conducted on the hypothesis that native (L1) and nonnative (foreign) (L2) languages are learned, represented, and/or processed in fundamentally different ways that result in this large gap in learning outcomes [5, 6]. On the other hand, a smaller but growing set of studies has found similarities in learning outcomes between native and foreign languages, which support the hypothesis that a common set of mental operations may be in place for all language learning [7, 8]. By examining the language outcomes of a large sample of learners who have learned three languages consecutively from birth, in early childhood, and in adulthood, the present study aims to evaluate these two sets of hypotheses against a newly proposed hypothesis to offer new insights into native and foreign language learning.

Studies that investigate native and foreign language learning are generally designed to test two sets of hypotheses. The first is the fundamental difference hypothesis (FDH) [5, 6]. According to the FDH, native (first) language acquisition relies on a domain-specific core computational system of human language (known as Universal Grammar) [9], whereas adult foreign language learners either lack access to this innate system or its operation is partial and imperfect, leading to difficult and ultimately unsuccessful outcomes of learning. The central claim of FDH is consistent with the critical/sensitive period hypothesis [10, 11], according to which native language proficiency cannot be achieved beyond a limited age range prior to puberty or young adulthood [12–14]. The age-related decline in language learning ability is said to be part of human brain maturation [15–18]. The second set of hypotheses, including the linguistic coding deficit/differences hypothesis (LCDH) [7, 8, 19, 20], argues that both native and foreign language learning are tied to the same set of core language functions (e.g., phonological, syntactic, and semantic processing skills), and thus that learning outcomes for all languages will be interrelated.

The literature on L1 and L2 learning reports two general sets of findings which have been interpreted as supporting either hypothesis. The first is a well-documented set of findings of a large performance gap between L1 and L2, despite years of training for some aspects of morphosyntax and phonology. Even fluent nonnative speakers tend to lag behind native speakers in real-time language processing [21]. These findings are often interpreted as supportive of FDH. The second set of findings concerns not absolute proficiency levels but how performance in the two languages is correlated within individuals [22, 23]. For example, Sparks et al. [22] found that the best predictor of L2 word decoding was individual learners’ L1 word decoding. In addition, auditory processing may explain variability in success in learning linguistic rules by both infants and adults [24]. These results suggest that a common set of core functions may be at play for both native and foreign languages, as suggested by LCDH.

While foundational in contributing to our current understanding of native and foreign language learning, these two sets of studies may not by themselves confirm either of the hypotheses. The first set of studies that was used to support FDH did not examine whether the proficiency of L1 and L2 was correlated but focused on absolute performance levels or the morphosyntax of the languages. The second set of studies was often restricted to the learning of typologically similar languages (Indo-European languages), and the correlation between L1 and L2 was usually found on metalinguistic tasks (e.g., decoding, spelling) with a heavy emphasis on task (procedural) rather than linguistic abilities. The L1–L2 correlation might disappear if the two languages were further apart in typological distance or if metalinguistic abilities were deemphasized.

In the present study, we attempt to address these limitations by examining the association between native and foreign languages and by using more comprehensive measures of language proficiency. We examined the learning of more than two languages differing in typological distance for a more rigorous investigation. If a core set of functions subserves the learning of native and foreign languages, its effect should be observed in the learning of all languages, not just in one pair of languages, not only when the languages are typologically close, and not only when a specific task is administered.

Although highly influential, the body of literature evaluating FDH and LCDH may not have considered a crucial aspect of language learning: native and foreign languages can be learned under vastly different conditions. As one of the alternative accounts to FDH and LCDH, we propose the learning condition hypothesis (LCH), in which we postulate that a primary factor determining proficiency levels of languages is the condition under which these languages are learned. Prior research studies have reported two different types of factors that affect language learning success: learner-internal and learner-external factors [25]. Learner-internal factors are about the learners themselves, such as their age [11–17], nonverbal IQ [26], and working memory [27], while external factors refer to stimuli that exist outside the learners such as the environment [28] and the teacher [29]. Our hypothesis concerns one type of external factor, namely learning conditions. For example, a native language tends to be acquired or learned in a more naturalistic setting with input from caregivers and peers, while foreign language learning usually occurs with explicit instruction and practice in the formal academic context of a classroom. We hypothesize that success in learning one language and the other may be linked, not due to a set of core functions for language learning, but because these languages are learned in similar conditions. Taking the learning of three languages as an example, the three languages could be learned under different conditions and in four orders: (1) L1–L3 being learned naturalistically at home; (2) L1 and L2 being learned naturalistically at home before learning L3 in school; (3) L2 and L3 being learned in school after naturalistic acquisition of the L1 at home; and (4) L1–L3 being learned consecutively on a continuum of naturalistic setting to explicit instruction.
Following the LCH, we predict that proficiency in these three languages would be associated differently in the above four situations: (1) proficiency in L1–L3 would all be associated; (2) L1 and L2 would be associated, but not L1/L2 and L3; (3) L2 and L3 would be associated, but not L1 and L2/L3; and (4) L2 would be associated with L1 and L3, but L1 and L3 would not be associated. We suggest that learning conditions exert a stronger effect than factors such as typological distance between these languages 30 , 31 , 32 , 33 .
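The four predictions can be read as a single rule: two languages are expected to be associated when their learning conditions are the same or adjacent on the naturalistic-to-instructed continuum. The following Python sketch encodes that reading. The numeric coding of conditions (0 = naturalistic, 1 = intermediate, 2 = instructed), the `max_gap` threshold, and the function name `predicted_associations` are illustrative assumptions and are not part of the study.

```python
from itertools import combinations

# Hypothetical coding of learning conditions on the continuum:
# 0 = naturalistic (home), 1 = intermediate, 2 = explicit instruction (school).
SITUATIONS = {
    1: {"L1": 0, "L2": 0, "L3": 0},  # all three learned at home
    2: {"L1": 0, "L2": 0, "L3": 2},  # L1 and L2 at home, L3 in school
    3: {"L1": 0, "L2": 2, "L3": 2},  # L1 at home, L2 and L3 in school
    4: {"L1": 0, "L2": 1, "L3": 2},  # consecutive learning along the continuum
}


def predicted_associations(conditions, max_gap=1):
    """Return the language pairs predicted to be associated.

    A pair is included when the two learning conditions are at most
    `max_gap` steps apart on the assumed 0-2 scale.
    """
    pairs = set()
    for a, b in combinations(sorted(conditions), 2):
        if abs(conditions[a] - conditions[b]) <= max_gap:
            pairs.add((a, b))
    return pairs


for number, conditions in SITUATIONS.items():
    print(number, sorted(predicted_associations(conditions)))
```

Under this coding, situation (4) yields the L1–L2 and L2–L3 pairs but not L1–L3, which corresponds to the pattern described for the fourth situation.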

The present study aims to evaluate the three hypotheses: FDH, LCDH, and our newly proposed LCH. We capitalized on our unique ability to access a large population of L3 learners in Hong Kong who learned L1–L3 under different learning conditions. In total, we enrolled 636 participants who were undergraduate students of Chinese descent learning one of three languages as their L3: French ( n  = 187), German ( n  = 176), or Spanish ( n  = 273), at the time of participation. Power calculation was based on the requirements of finding associations between L1 and L2, L2 and L3, and/or L1 and L3 to test our hypotheses. We used the first 25 participants of each L3 to estimate the sample size. We obtained a Pearson’s correlation value between L1 and L2, r  = 0.25 and for a family-wise alpha of 0.05 (Bonferroni-corrected p value of 0.017 for three tests performed to evaluate the relationships between L1 and L2, L2 and L3, and L1 and L3), a minimum of 163 participants in total were required. For each language, we had data available from at least 167 participants for the key measures of L1–L3 proficiency. Our study is therefore sufficiently powered.
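As a rough check on the sample-size reasoning above, the following Python sketch estimates the number of participants needed to detect a correlation of r = 0.25 at the Bonferroni-corrected alpha of 0.017. It assumes a two-sided test, 80% power, and the Fisher z approximation; the target power and the exact method used in the study are not stated in the text, so these are assumptions, and `n_for_correlation` is a hypothetical helper name.

```python
import math
from statistics import NormalDist


def n_for_correlation(r, alpha, power=0.80):
    """Approximate sample size to detect correlation r (Fisher z method).

    Assumes a two-sided test; `power` defaults to 0.80, which is an
    assumption rather than a value given in the text.
    """
    standard_normal = NormalDist()
    z_alpha = standard_normal.inv_cdf(1 - alpha / 2)  # two-sided critical value
    z_beta = standard_normal.inv_cdf(power)           # quantile for target power
    fisher_z = math.atanh(r)                          # Fisher z-transformed effect
    return math.ceil(((z_alpha + z_beta) / fisher_z) ** 2 + 3)


n_required = n_for_correlation(r=0.25, alpha=0.017)
print(n_required)
```

Under these assumptions the estimate lands at or very near the minimum of 163 participants reported above; a different power target or software default would shift it.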

On a continuum from naturalistic at one end to instructed at the other, the learning conditions of our participants’ L1 and L3 lay at opposite ends, while those of L2 fell in the middle. All participants started to learn Chinese as L1 without the influence of motivational factors from an early age 18 or even before birth 34 , while they learned English as L2 in the formal education system from ~3 years of age for 15 years through the end of senior secondary education. In Hong Kong, Chinese and English are both official languages. According to the Census and Statistics Department of the Hong Kong SAR 35 , over 90% of individuals aged from 6 to 24 years attending full-time schools could read and write both Chinese and English. However, compared to Chinese, English is by no means to be regarded as another native language for the vast majority of families 36 . Cantonese Chinese remains the most commonly spoken language among the local population and is the main medium of instruction in the formal education system 37 , 38 , while English input is abundantly available in daily life and can be one source of implicit language input in addition to explicit input from the classroom. Thus, the learning conditions of Chinese and English in Hong Kong were similar in some respects and different in others: Chinese was the L1 acquired naturally, whereas English was the L2 learned mostly in the classroom but also encountered implicitly in daily life. Similar to L2, L3 was learned in the classroom and taught by teachers who were native speakers of L3 or nonnative speakers with near-native proficiency. Although both English and L3 were used as the medium of instruction in language classes at the elementary level, all textbooks and handouts were written in L3 only.

In this study, comprehensive proficiency level for L1 and L2 of participants was assessed by the grades on their college entrance examination, the Hong Kong Diploma of Secondary Education (HKDSE) 39 . L3 proficiency was measured by a combination of measures that included classroom performance and laboratory-based assessment including narrative production, lexical access, and pronunciation judgment. Our study of languages of different typological characteristics also provided an opportunity to examine whether learning conditions as a factor exerted a stronger effect than typological similarity of the languages being learned, as invoked by some theories of L3 learning 32 , 33 . One objective method of defining typological similarity is by ancestral relationship. Accordingly, English and German (Germanic languages) should be regarded as very close relations and so should French and Spanish (members of the Romance languages group). Germanic languages as a group should be regarded as being closer to Romance languages (the two groups being Indo-European) than they are to a Sinitic language (Chinese). As the contribution of nonlanguage factors may also account for substantial variance in learning, including nonverbal IQ 26 , socioeconomic status (SES) 40 , 41 , 42 , musical background 43 , 44 , age 11 , 12 , 13 , 14 , 15 , 16 , 17 , gender 45 , anxiety 46 , and motivational factors of foreign language learning 47 , we obtained these measures and entered them into our statistical analyses (Table  1 ). According to FDH, proficiency of L2 or L3 of our participants would not be likely to be related to L1, because L1 is fundamentally different from other languages. According to LCDH, proficiency of all of our participants’ languages should be correlated, because learning of all languages requires the same set of core functions. 
Importantly, we predict that according to LCH, L3 of our participants should be correlated with their L2, but not their L1, because of the stark difference in learning conditions between L1 and L3, but L2 should be correlated with their L1 due to implicit learning of L1 and L2 in daily life. Figure  1 provides a graphic representation of what each hypothesis predicts.

figure 1

a The LCH model postulates that L1 and L2 proficiency and L2 and L3 proficiency are related. b The FDH model predicts that L1 proficiency is related neither to L2 nor L3 proficiency. c The LCDH model argues that the proficiency levels of all languages are related.

Statistical analysis

In order to provide converging evidence for one or more of the three competing hypotheses (Fig.  1 ), we subjected our data to three types of analyses: general linear models (GLM), structural equation models (SEM) and machine learning (support vector regression, SVR). L1–L3 proficiency measures were obtained from each of the participants. L1 and L2 measures were obtained from the participants’ HKDSE composite scores for Chinese and English subjects 48 . Because our participants learned different L3s and because of a lack of a single standardized measure for these languages, we measured L3 proficiency using a number of classroom and laboratory measures and used statistical data reduction methods to arrive at an L3 Global score for each participant. Regardless of the type of statistical analysis, our primary goal was to demonstrate the degree of association in proficiency between pairs of languages.

Bivariate correlations

As an initial analysis, we calculated Spearman’s pairwise correlation coefficients (cc) for pairs of L1–L3 (Supplementary Fig.  2 ; Supplementary Table  3 ). Statistical significance was indicated by the false discovery rate (FDR) corrected p values. L1 (Chinese) HKDSE grades were significantly correlated with L2 (English) HKDSE grades ( r  = 0.26 , p  < 0.001). L2 (English) HKDSE grades were significantly correlated with L3 Global scores ( r  = 0.28 , p  < 0.001). Importantly, L1 (Chinese) HKDSE grades were not significantly correlated with L3 Global scores ( r  = 0.05 , p  = 0.263). These bivariate correlational results provide an initial set of evidence for LCH. Although L1 and L2 were both measured by HKDSE and were significantly correlated, L2 and L3 were significantly correlated despite the differences in measurements.
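The text does not name the FDR procedure used. As a minimal sketch, assuming the widely used Benjamini–Hochberg step-up procedure, the adjustment of a set of raw p values could look like the following. The input p values are hypothetical and are not taken from the study, and `benjamini_hochberg` is an illustrative function name.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p values in the original order."""
    m = len(p_values)
    # Indices of the p values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p value down so that adjusted values
    # never increase as the raw p value decreases.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


raw = [0.01, 0.04, 0.03]  # hypothetical raw p values, for illustration only
print(benjamini_hochberg(raw))
```

Each adjusted value is then compared with the chosen FDR level (for example 0.05) to decide significance.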

Multiple linear regression models

The bivariate correlational results reported above did not take into account the contributions of other factors that may influence language learning. As discussed in the Introduction, factors such as musical experience could explain variance in language learning 43 , 44 . We therefore employed multiple linear regression models to explore the relationships among the three languages, with the other nonlanguage factors accounted for. We constructed two separate models. In the first model (Table  2 ), L2 (English) HKDSE grades were treated as the dependent variable, and L1 (Chinese) HKDSE grades, gender, musical training, family SES, nonverbal IQ, and age were treated as independent variables. We found musical training ( β  = 0.31, p  = 0.015), family SES ( β  = 0.02, p  < 0.001), and age ( β  = −0.16, p  = 0.015) to significantly predict L2 (English) HKDSE grades. Importantly, L1 (Chinese) HKDSE grades also significantly predicted L2 (English) HKDSE grades and exerted the strongest effect of any of the significant predictors (Δ R 2  = 0.06, p  < 0.001). In the second model (Table  3 ), L3 Global scores were treated as the dependent variable. In addition to the aforementioned nonlanguage predictors, affective and motivational factors as measured by the modern language (ML) learner questionnaire 49 were also entered as independent predictor variables, as they have been found to contribute to the learning of a new language (in this study, L1 and L2 were not new languages being learned and we did not know the learners’ motivation of learning L2 since this began in early childhood). Attitude ( β  = 0.13, p  = 0.043) and age ( β  = −0.08, p  = 0.044) significantly predicted L3 proficiency. Importantly, L2 proficiency (Δ R 2  = 0.06, p  < 0.001) was the most significant predictor of L3 proficiency, while L1 proficiency ( β  = −0.02, p  = 0.580) was not a significant contributor. 
Taken together, the GLM results indicate an association between L1 and L2 as well as between L2 and L3 after the relevant nonlanguage factors were controlled for. Importantly, we again failed to find an association between L1 and L3.

Structural equation models

All participants.

To evaluate the potential statistical causal links among the three languages while accounting for the contribution of nonlanguage factors, and to directly test the three hypotheses, two latent variable structural equation models were tested. The two models had the same structure except for the paths connecting the three languages. In the first model, paths were drawn from L1 to L2, and from L2 to L3, which enabled us to test LCH (Fig. 1a ). In the second model, paths were imposed from L1 to L2, L2 to L3, and L1 to L3. This second model allowed us to simultaneously evaluate FDH (Fig. 1b ) and LCDH (Fig. 1c ). FDH would predict no statistical effects among any of the paths, while LCDH would predict effects of all three paths. Both LCH and LCDH models provided a statistically acceptable fit. For the first model, the root mean square error of approximation (RMSEA) was 0.025 [CI: 0.000–0.054], the standardized root mean square residual (SRMR) was 0.021, the comparative fit index (CFI) was 0.971, the Tucker–Lewis index (TLI) was 0.946, and the Yuan–Bentler scaling correction factor was 1.019 (Fig. 2a ). For the second model, the RMSEA was 0.028 [CI: 0.000–0.058], the SRMR was 0.021, the CFI was 0.967, the TLI was 0.932, and the Yuan–Bentler scaling correction factor was 1.037 (Fig. 2b ). While both models fit the data acceptably, the crucial path between L1 and L3 in the second model was not statistically significant ( b = −0.021 [CI: −0.101 to 0.059]). Importantly, the fit of the second model showed no significant improvement over the first model (Δχ² = 0.269, p = 0.604). Thus, we fail to reject the null hypothesis that the two models fit the data equally well, and parsimony favors the first model, which has fewer estimated parameters. These results, demonstrating associations between L1 and L2 and between L2 and L3, but not between L1 and L3, support the LCH. Detailed statistics for each path for each model can be found in Table 4 .
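The model comparison can be illustrated with a short Python sketch. Because the second model adds exactly one path (L1 to L3), the difference test has one degree of freedom, and for one degree of freedom the chi-square upper-tail probability has a closed form in terms of the complementary error function. The sketch treats the reported Δχ² as the test statistic as given; whether that value was already scaled for the Yuan–Bentler correction is not stated, and `chi2_sf_df1` is a hypothetical helper name.

```python
import math


def chi2_sf_df1(x):
    """Upper-tail probability of a chi-square variable with 1 df.

    If Z is standard normal, Z**2 is chi-square with 1 df, so
    P(Z**2 > x) = erfc(sqrt(x / 2)).
    """
    return math.erfc(math.sqrt(x / 2.0))


# Reported difference in fit between the two nested models.
p_value = chi2_sf_df1(0.269)
print(round(p_value, 3))
```

The resulting p value is close to the reported 0.604, which is consistent with a one-degree-of-freedom comparison.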

figure 2

Parameter estimates are unstandardized, and the paths are scaled to reflect effect size. Red arrows represent negative paths, while blue arrows are positive paths. L1–L3 are latent variables of language proficiency with Chinese HKDSE grades, English HKDSE grades, and L3 Global scores as their indicators, respectively. Only significant relationships are presented and denoted with asterisks: * p < 0.05, ** p < 0.01, *** p < 0.001. a The LCH model. b The LCDH and FDH models are tested simultaneously because they both concern connections (or lack thereof) of L1–L2, L2–L3, and L1–L3. The path between L1 and L3 as indicated by the dashed arrows was not significant.

Separate models for low and high proficiency learners

The results reported above came from models in which all participants were included. It is possible that the results differ between learners of low and high proficiency levels. We categorized participants into two groups based on the academic class levels in which they were enrolled (see Methods for explanation). We fitted a “free” LCH model with all parameters allowed to differ between groups. We then fitted a “constrained” LCH model with all parameters fixed to those obtained from analysis of the pooled data across the two groups. We examined whether the “constrained” model was significantly different from the “free” model. The results suggest that the “constrained” model does not differ significantly from the “free” model, indicating that the model parameters do not differ between the low and high proficiency groups, Δχ² = 11.62, p = 0.637 (Supplementary Fig. 4 ).

Separate models for learners of different languages

Although our study was not specifically designed to test theoretical accounts of L3 learning that are centered on typological similarities 32 , 33 , we conducted further analysis that separated participants into different L3 language groups. We acknowledge that since we did not explicitly measure the psychotypology as perceived by learners 31 , definition of typological proximity of these languages (Chinese, English, French, German, Spanish) can be controversial. Nonetheless, using ancestral relationship as a measure of typological distance, we may expect the effect between English and German to be strongest since both of them are Germanic languages, which would also exhibit less similarity with French and Spanish (Romance languages), and less similarity with Chinese (a Sinitic language). If language typological distance exerts an effect, we may expect the effect from L2 to L3 to be different across pairs of languages, depending on their typological distance from English. We thus first compared the SEMs of German learners with French learners, and those of German learners with Spanish learners, respectively. In addition to comparisons including German learners as a comparison group, we also examined model differences in the Spanish and French group. Our results revealed that no such comparison is significant (German vs. French: Δ χ 2  = 22.30, p  = 0.073; German vs. Spanish: Δ χ 2  = 16.73, p  = 0.271; Spanish vs. French: Δ χ 2  = 11.73, p  = 0.628) (Supplementary Fig.  5 ). These results suggest that as far as our large sample of L3 learners and their comprehensive proficiency assessment of the three L3 languages are concerned, associations with learning of English are not significantly related to typological distance. These results are supplementary to our main findings, as our study was not designed to examine the question of typology. Nevertheless, we conclude that learning condition exerts a stronger effect than typology.

Machine-learning prediction via SVR

Our final analysis involved machine learning using SVR 50 , 51 . The advantage of this approach is the ability to cross-validate models that are more likely to generalize to future, unseen data, as opposed to traditional GLM approaches that tend to overestimate the true effects 52 . We report Pearson’s cc between predicted and observed outcomes from a tenfold cross-validation procedure with 10,000 iterations. The cc values are used as an indicator of predictability, with higher cc values indicating a more accurate predictive performance. When all predictors were included to predict L3 proficiency in the SVR model (Fig.  3 ), the predicted cc (mean = 0.355, SD = 0.039) was significantly different from the null distribution (mean = 0.001, SD = 0.068, p  < 0.001). Importantly, when only L2 (English) HKDSE grades were included as the predictor of L3 Global scores, the distribution of predicted cc (mean = 0.278, SD = 0.039) was also significantly different from the null distribution (mean = −0.001, SD = 0.065, p  < 0.001). Interestingly, when only L1 (Chinese) HKDSE grades were included as the predictor of L3 Global scores, the distribution of the predicted cc (mean = 0.052, SD = 0.042) again differed significantly from the null distribution (mean = −0.001, SD = 0.065, p  < 0.001), but the effect size (Cohen’s d  = 0.96) was at least five times smaller than in the models with all predictors (Cohen’s d  = 6.39) or only the L2 (English) HKDSE grades (Cohen’s d  = 5.22) as predictors.
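The effect sizes quoted above can be approximately recovered from the reported means and standard deviations. The following Python sketch assumes that Cohen’s d was computed with a pooled standard deviation for two distributions of equal size (the predicted and null distributions); the text does not state the formula, so this is an assumption, and `cohens_d` is an illustrative helper name.

```python
import math


def cohens_d(mean_a, sd_a, mean_b, sd_b):
    """Cohen's d using a pooled SD, assuming equal-sized distributions."""
    pooled_sd = math.sqrt((sd_a ** 2 + sd_b ** 2) / 2.0)
    return (mean_a - mean_b) / pooled_sd


# Means and SDs of the predicted and null cc distributions, as reported above.
d_all = cohens_d(0.355, 0.039, 0.001, 0.068)   # all predictors
d_l2 = cohens_d(0.278, 0.039, -0.001, 0.065)   # L2 (English) only
d_l1 = cohens_d(0.052, 0.042, -0.001, 0.065)   # L1 (Chinese) only

print(round(d_all, 2), round(d_l2, 2), round(d_l1, 2))
```

Under this assumption the values come out close to the reported 6.39, 5.22, and 0.96; small discrepancies would be expected from rounding of the published means and standard deviations.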

figure 3

The predictability of L3 Global scores was estimated by the correlation coefficients (cc) between the predicted and the observed language proficiency scores based on tenfold cross-validation with 10,000 iterations. a The importance ranking of all predictors of L3 proficiency, where the x -axis represents the importance value and y -axis represents the variables. The importance value was calculated from tenfold cross-validation with 100 iterations. b When all predictors are included in the SVR model, the distribution of prediction values was significantly different from the null distribution ( p  < 0.001, Cohen’s d  = 6.39). c With only L2 (English) HKDSE grades as the predictor of L3 Global scores, the distribution of prediction values was also significantly different from the null distribution ( p  < 0.001, Cohen’s d  = 5.22). d With only L1 (Chinese) HKDSE grades as the predictor of L3 Global scores, the distribution of prediction values was significantly different from the null distribution ( p  < 0.001, Cohen’s d  = 0.96), but the effect size was much smaller than when all predictors or only L2 predictors were included in the models.

When all predictors were included in the SVR model to predict L2 (English) HKDSE grades (Fig.  4 ), the predicted cc (mean = 0.373, SD = 0.036) was significantly different from the null distribution (mean = −0.004, SD = 0.064, p  < 0.001, Cohen’s d  = 6.45). When only L1 (Chinese) HKDSE grades were used as the predictor of L2 (English) HKDSE grades, the distribution of predicted cc (mean = 0.257, SD = 0.037) was also significantly different from the null distribution (mean = −0.0004, SD = 0.062, p  < 0.001, Cohen’s d  = 7.17).

figure 4

The predictability of L2 (English) HKDSE grades was estimated by the correlation coefficients (cc) between the predicted and the observed language proficiency scores based on tenfold cross-validation with 10,000 iterations. a The importance ranking of all predictors of L2 proficiency, where the x -axis represents the importance value and y -axis represents the variables. The importance value was calculated from tenfold cross-validation with 1,000 iterations. b When all predictors are included in the SVR model to predict L2 (English) HKDSE grades, the distribution of prediction values was significantly different from the null distribution ( p  < 0.001, Cohen’s d  = 7.17). c With only L1 (Chinese) HKDSE grades as the predictor of English HKDSE grades, the distribution of prediction values was also significantly different from the null distribution ( p  < 0.001, Cohen’s d  = 6.45).

Taken together, our analyses of data from all 636 participants using GLM, SEM, and SVR approaches supported the LCH, which predicted significant associations between L1 and L2, as well as between L2 and L3, in this sample of participants.

The present study was designed to examine the relationship of L1–L3 proficiency based on three hypotheses concerning the learning of native and foreign languages. Its focus was on a long-standing academic debate as to whether the learning of all languages depended on a common set of core functions (LCDH) or whether the mechanisms that subserve the learning of native and foreign languages were fundamentally different (FDH). As an alternative theoretical account, we reconceptualized the problem into one that focuses on the learning conditions and postulated that similarities of language learning conditions would result in similarities in learning outcome (LCH), regardless of whether the language to be learned was native or not.

Our access to a large cohort of language learners provided us the opportunity to evaluate the three hypotheses. All participants learned Chinese as L1, English as L2, and either French, German, or Spanish as L3. Learning conditions of L1–L3 ranged along a continuum from naturalistic and implicit (for L1) at one end to instructed and explicit (for L3) at the other, with L2 falling in between. By using four types of analytics (bivariate correlation, regression, SEM, and machine learning), our results converged to demonstrate close relationships between L1 and L2, and between L2 and L3, but not between L1 and L3. Unlike L1 and L2, the participants did not engage in standardized testing for L3. We therefore developed a detailed method for assessing their L3 proficiency by using a number of different classroom and laboratory measures to arrive at an overall L3 Global score using data reduction techniques. It is worth noting that despite differences in how the three languages were measured, a significant association between L2 and L3 was found. The significant association between L1 and L2 was unlikely to be due to measurement similarities.

It is important to highlight that after controlling for nonlanguage factors such as SES 40 , 41 , 42 , musical experience 43 , 44 , age 11 , 12 , 13 , 14 , 15 , 16 , 17 , gender 45 , and motivational factors 47 that have previously been reported to impact on both native and foreign language learning, we still identified significant associations among the three languages. Our unusually large sample size afforded us the opportunity to look at these factors more closely and control for them statistically. The use of a large sample size and our deployment of multiple types of analytics enhance the generalizability of our findings. Moreover, collecting data from participants who learned a real language in a classroom setting rather than studying an artificial language in the laboratory 53 , 54 enhances the ecological validity of our study.

We believe our results cannot simply be explained by the influence of a sensitive/critical period of language acquisition 10 , 11 , 12 . Our learners started L2 acquisition well before any commonly accepted age definition of a critical period for language 11 , 13 , 14 , 16 , 17 , yet an association between L2 and L3 was found. Our findings are consistent with those of several studies of experience-related neural adaptation in the human brain, namely that the duration and extent of bilingual experience differentially affect brain structure and function 55 , 56 , 57 , 58 , 59 , 60 . When learning a nonnative language in childhood, learners may tend to approach new information in much the same way as they acquire a native language. For example, Kim et al. 56 demonstrated that early bilinguals showed overlapping activation for L1 and L2, whereas late-onset L2 learners showed segregated activation. Learning a nonnative language later in life, however, occurs most often in a classroom setting. As with L2 learning, L3 learning may operate under explicit learning conditions and utilize the underlying neural circuitry of nonnative language learning 58 , since the reuse of preexisting mechanisms is consistent with biological and evolutionary principles 61 .

Foreign language learning, namely the acquisition/learning of a language after the first language, is known to be a complex and dynamic experience 62 , 63 , with individual variability in achievement 64 , 65 . Understanding how nonnative language learning occurs not only enables language teaching to be optimized, with the development of learning and intervention programs that improve learners’ chances of success, but also provides an important context for investigating the interaction of impact factors that may offer a unique and fundamentally important perspective on the biological endowment and neurocognitive adaptations of human beings 55 , 56 . The present study was not designed to address foreign language learning per se, but as bilingualism/multilingualism is becoming increasingly common, researchers are increasingly interested in the learning of three or more languages. Several theoretical accounts have been proposed to explain L3 learning, such as the typological primacy model (TPM) 32 , 33 , cumulative enhancement model (CEM) 66 , L2 status factor 67 , 68 , dynamic model of multilingualism (DMM) 69 , revised hierarchical model (RHM) 70 , linguistic proximity model 71 , and foreign language effect 72 . Nonetheless, these theoretical models focus mostly on morphosyntax (and phonology to a lesser extent) rather than on the overall proficiency level of the learners. They make predictions about whether similarities in structural properties between L2/L3 and L1 or language input would facilitate language learning. The present study was not designed to evaluate any of these theoretical accounts of L2/L3 learning. In fact, research studies supporting FDH and LCDH have hitherto usually focused on two languages. We believe that by studying the proficiency of L1–L3, our study provides a more rigorous investigation of FDH, LCDH, and LCH. Nevertheless, some of our findings could be interpreted in the context of theories of L2/L3 learning.

The TPM 32 , 33 proposes that the language (either L1 or L2) that the learner views as more similar to L3 is the one most likely to facilitate L3 acquisition. The learner determines similarity by first scanning the lexicon, then considering aspects of phonology, and so on. Because English is typologically closer to other Indo-European languages, our finding of a stronger association between L2 and L3 could provide support for TPM as well. However, it is interesting to note that although German is typologically closer to English, we did not find a stronger English–German association in our SEM results than between other L2–L3 pairs, weakening support for the TPM. Furthermore, although Chinese and English are typologically distant, we found a significant association, which we interpreted as a result of learning conditions. The TPM makes no prediction about L1 and L2 association, but it is noteworthy that typological distance alone may not be sufficient to explain all aspects of native and foreign language learning, at least not when a large sample of learners is examined and when overall proficiency level rather than specific grammatical structures is studied. We acknowledge that quantifying typological distance is difficult. Reliance on ancestral relationship in our analysis of typological distance can only be a starting point. Nevertheless, it is important to point out that regardless of how typological distance from English is defined (e.g., based on psychotypology) 31 , we found no statistically reliable difference across the L3 languages studied. The effect of English on French was no stronger than its effect on Spanish or German.

The CEM 66 postulates that language learning is cumulative, so that all previous languages (L1 and L2) may have an impact on the learning of a new language (L3). Flynn et al. 66 examined the production of English restrictive relative clauses by child and adult speakers of Kazakh (L1) and Russian (L2) who learned English as L3. They found subtle differences between adults and children, and found that L1 did not play a more important role in L3 learning. Nonetheless, as the proficiency of L1–L3 was not measured, it is not known from this study whether there was an association between proficiency in these languages.

The L2 status factor 67 argues that L2 grammar, which is acquired later in life than L1 grammar, exerts a stronger transfer effect than L1 at the initial stages of L3 learning. The latest version of the L2 status factor 68 specifically argues that similarity in learning contexts and metalinguistic knowledge between L2 and L3, which is most likely subserved by declarative memory, makes L2 especially influential. Again, most studies supporting the L2 status factor focused on grammar rather than on the proficiency of learners, but our finding of an association between L2 and L3 lends some support to this theory.

The DMM 69 is another model relevant to the present study. According to DMM, learning of a second language creates a “metalinguistic knowledge and awareness” system that is distinct from that of monolinguals, which facilitates the learning of subsequent languages. The author pointed out that the development of an L3 system is dependent on the dynamic adaptation of existing systems. As our participants learned the three languages consecutively, our results (L1→L2; L2→L3) partly support the claims of DMM, but we did not find any association between L1 and L3.

The RHM 70 , a model of bilingual language processing, argues that there are two types of word representations: lexical representations of word forms and conceptual representations of word meanings. Late bilinguals who acquired L2 after early childhood thus showed longer translation latencies from L1 to L2 than from L2 to L1 because of the underlying asymmetry in the strength of the links between lexical and conceptual representations in L1 and L2. In our data, it could be that the links between L2 and L3 were stronger than between L1 and L3, because L2 was also used as a language of instruction. However, as we demonstrated in our analysis, learners of low and high proficiency levels showed converging patterns despite the differences in the percentages of L2 used in the teaching of L3.

One limitation of this study is that although we have investigated a relatively large set of nonlanguage variables, these may not fully represent all factors that influence language learning. For example, we have not examined the impact of different types of memory (including the procedural/declarative system 61 , working memory 27 , or language ‘aptitude’ 73 , 74 , 75 ) on the success of L3 learning. Future studies should attempt to explore a broader range of variables, such as language aptitude, working memory, procedural memory, and declarative memory, to expand our understanding of the interaction between language and nonlanguage factors. Furthermore, as participants in our study learned both their second and third languages in the classroom, instead of (as with their native language) by natural immersion, future studies might profitably examine the cases of immigrants or heritage speakers, who share the context of the native and the nonnative language. This could provide additional evidence for the LCH model. In addition, we acknowledge that age of learning does covary with the continuum of learning conditions, and thus could be a confounding variable. Our learners started learning L1 from birth, but L2 and L3 at around 3 years and 18 years, respectively. Despite the relatively small age difference between L1 and L2 (3 years), the strength of their association of learning success is comparable to the association between L2 and L3 where there is an age gap of around 15 years. Future research will need to systematically address age of learning as a contributing factor, but from the evidence available it does not appear that age is a primary contributor to the results (see also Flege et al. 15 ). Another limitation of our study is that it did not account for the potential influence of genetic variation on the learning of native and foreign languages 51 , 76 , 77 , 78 , which should be addressed in future research.

In sum, the current study adds to the growing body of evidence demonstrating the influence of prior linguistic experience, motivational and affective factors on the learning of a new nonnative language. Importantly, the study provides supportive evidence for the hypothesis that learning conditions of languages may be the principal factor that influences how the proficiency levels of L1/L2/L3 are associated. As shown by our participants from Hong Kong, their L1 proficiency had a positive effect on their L2 but not on their L3, while their L2 had a positive effect on their L3, even after nonlanguage factors are accounted for. Our results provide empirical evidence for our current understanding of language learning and broader issues of human cognition and learning. The results may have implications for studies concerning intervention for communication disorders in children 79 , 80 and adults 81 .

Participants

A total of 636 participants between 18 and 25 years of age were enrolled in the present study. They were all native speakers of Cantonese who learned English as a second language from early childhood. All were students of a ML class at the Chinese University of Hong Kong, who were learning either German ( n  = 176), French ( n  = 187), or Spanish ( n  = 273) as L3. All participants had nonverbal IQ within normal limits (≥85), as assessed by the Test of Nonverbal Intelligence, Fourth Edition 82 . Hearing was screened at 500 Hz, 1 kHz, 2 kHz, and 4 kHz at 30 dB HL in a sound booth. Participants supplied basic demographic information, including gender, date of birth, and family SES, by completing a questionnaire, and also answered questions on their musical and language background. To calculate musical training experience, we asked participants whether they had received musical training before, and asked them to list the style of music they studied (e.g., jazz piano) and the years of training undertaken in that particular style. We coded participants’ musical training into two categories: Yes = have received at least 1 year of musical training and No = have received less than 1 year of musical training or have not received any musical training at all. Family SES was assessed following the Hollingshead index 83 , an extensively used measure, by coding parents’ educational levels and occupational prestige. Participant characteristics are reported in Table  1 . Not every participant had data from all of these measures. Missing data occurred at random in the dataset, owing to incomplete data submission by the participants or coding errors. Written informed consent was obtained from each participant. The research protocol was approved by the Joint Chinese University of Hong Kong–New Territories East Cluster Clinical Research Ethics Committee. Participants were invited to join the study through mass emails and advertisements in their language classes after permission was obtained from the language teachers.

L1 and L2 proficiency measure

Standardized public examinations are commonly adopted as measures of language proficiency in large-scale research 84 . We used participants’ composite grades in the Chinese and English language subjects of the HKDSE examination as L1 and L2 proficiency scores. The composite score is calculated based on reading, writing, speaking, and listening skills. The HKDSE examination is the public examination for university entrance in Hong Kong, administered by the Hong Kong Examinations and Assessment Authority 48 . Standards-referenced reporting with annual calibration exercises is implemented to ensure that scores across years reflect the same levels of performance 39 . Participants’ average grade of 5.32 on the English subject test is roughly equivalent to an overall band score of around 7 in the International English Language Testing System. The HKDSE examination is typically taken in the final year of secondary school (at around age 17 years).

L3 proficiency measure

Unlike L1 and L2, our participants did not attend any standardized public examination for L3. This presents a challenge for obtaining an overall measure of L3 proficiency, especially when different languages were learned. To overcome this challenge, we obtained laboratory-based and classroom-based measures from each participant which covered their reading, writing, speaking, and listening abilities for their target language. In the laboratory, participants provided a narrative sample by telling the “Frog, Where Are You?” story 85 . Their production was then transcribed and analyzed using the CLAN program following the CHAT transcription manual 86 for a number of narrative measures (see Supplementary Table  1 ). For lexical access, participants named body parts in the target language using the Hawaii Assessment of Language Access battery 87 . For assessment of the native accent of speech production, short excerpts from the narrative production were evaluated by native speakers. We also had access to each participant’s exam score for the language class they took. These exam grades were z-transformed in order to be compared across classes. All of these measures were subject to a data reduction procedure via principal component analysis (PCA), and the final L3 Global score was obtained for each participant.

L3 narrative measures

Participants were instructed to tell a story in the target language based on pictures taken from a children’s wordless story book named “Frog, Where Are You?” 85 . A microphone recorded the storytelling. No time limit was set for this task. Audio recordings were transcribed by native speakers of the target languages using the CLAN program by following the CHAT transcription manual 86 . Two transcribers were employed for each language. The first transcriber transcribed the audio recordings, while the second transcriber checked the transcript and coded learners’ errors at the word and sentence levels. The second transcriber also transcribed a random 10% of the audio recordings from scratch for a reliability test. On average, a 93% consensus was reached when the first-draft transcripts from the first and the second transcribers were compared after removing punctuation. All transcripts were double-checked by research assistants who spoke the target languages as an additional quality control procedure. Discrepancies were noted and resolved by communicating with transcribers. The transcripts were then automatically coded with morphosyntactic information using the CLAN program 86 . In total, the transcripts contained 70,020 words of French, 65,804 words of German, and 82,904 words of Spanish. The vast majority of these words (93.33%, 96.32%, and 97.18%, respectively) were automatically tagged by the CLAN program.

A PCA was conducted on the 15 indexes of the quality of L3 narrative production (see Supplementary Table  1 ) with orthogonal rotation (varimax). The Kaiser–Meyer–Olkin (KMO) measure verified the sampling adequacy for the analysis (measure of sampling adequacy (MSA) value = 0.78) with all variables having MSA above 0.50 as the cut-off point. Bartlett’s test of sphericity, χ² = 1768, p  < 0.001, indicated that correlations between items were sufficiently large for PCA. An initial analysis was run to obtain eigenvalues for each component in the data. Three components had eigenvalues above Kaiser’s criterion of 2 and in combination explained 65% of the variance. The scree plot showed inflexions that would justify retaining two components in the final analysis. Factor loadings suggest that component 1 represents length of the narrative, morphosyntactic complexity, and lexical diversity, while component 2 represents the content, including the mean length of the utterances (Supplementary Fig.  1 ; Supplementary Table  2 ).

L3 language access

In addition to linguistic knowledge, research in bilingualism has examined the relative strength of the two or more languages by considering language access. Based on psycholinguistic principles, language access can be defined by the relative speed of accessing and naming basic vocabulary and simple phrase structures. Participants were assessed using a picture naming task following the Hawaii Assessment of Language Access battery 87 . Participants were instructed to name 31 photographs of body parts in the target language as quickly as possible. The stimuli were presented in a random order on a computer screen. Participants pressed a button on the response box to present the next stimulus. We recorded what they named and calculated accuracy rates of naming as an indicator of L3 vocabulary. Native speakers of the target languages listened to the audio recordings and judged whether participants gave an accurate name of the picture. We calculated the percentage of accuracy rates for each participant (Supplementary Table  2 ).

L3 pronunciation ratings

Participants were rated for their pronunciation by native speakers of the target languages who had either no exposure, or very limited exposure, to Cantonese. Two 20–30 s excerpts per recording were taken from the beginning and the end of each recording, excluding any initial pauses or false starts. Two counterbalanced lists were created for each language. Each list was further divided into subquestionnaires with 30–40 trials per list. Sixteen native speakers were recruited to rate the recordings on a 9-point scale for native-like qualities via crowd-sourcing programs. The recordings were presented to them in randomized order via Qualtrics.com. The final ratings were averaged across mean ratings of the two excerpts (Supplementary Table  2 ).

L3 classroom exam scores

To measure participants’ classroom performance, the final exam scores of the L3 language were collected at the end of each academic term. A typical exam consisted of speaking, writing, listening, and reading. For each language class, permission was obtained to gather the mean and standard deviation of the final exam for the entire class. We were therefore able to convert the raw exam scores of each study participant into a z score that reflected their relative performance within the class that they took (Supplementary Table  2 ). Because of the various limitations of relying solely on classroom exams to assess student performance 88 , 89 and because different languages and proficiency levels were compared, classroom exam performance was only one of the many measures we considered in arriving at the final L3 Global score for each participant.
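The within-class standardization described above can be sketched as follows. This is a minimal illustration rather than the authors' script; the function name `exam_z_score` and the example numbers are hypothetical and not taken from the study.

```python
def exam_z_score(raw_score: float, class_mean: float, class_sd: float) -> float:
    """Express a raw exam score relative to the whole class it came from."""
    if class_sd <= 0:
        raise ValueError("class standard deviation must be positive")
    return (raw_score - class_mean) / class_sd


# Hypothetical numbers, for illustration only (not study data).
z_above = exam_z_score(82.0, class_mean=70.0, class_sd=8.0)
z_below = exam_z_score(64.0, class_mean=70.0, class_sd=8.0)
```

Because each score is scaled by the mean and standard deviation of its own class, a z score of 1.5 means the same relative standing whether the participant took a beginner German class or an advanced Spanish class.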

L3 Global score

The procedures described above generated five measures associated with L3 outcome: the first two components from the PCA of the narrative analysis, language access, pronunciation, and classroom exam. To eliminate the variability of language proficiency across participants who enrolled at different class levels of the third languages, we standardized narrative measures, language access, and pronunciation ratings by calculating the z scores within each class level of each third language. We then entered these four measures together with classroom exam scores into a PCA with orthogonal rotation (varimax) for further data reduction. The KMO measure verified the sampling adequacy for the analysis (MSA value = 0.60) with all variables having MSA above 0.50 as the cut-off point. Bartlett’s test of sphericity, χ² = 144.99, p  < 0.001, indicated that correlations between items were sufficiently large for PCA. An initial analysis was run to obtain eigenvalues for each component in the data. We took the loadings of the first component, which had an eigenvalue of 1.58 and explained 32% of the variance, as the L3 Global scores to mark the overall L3 proficiency of the participants (Supplementary Fig.  1 ).
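As a rough consistency check on the reported figures, and assuming the PCA was run on the correlation matrix of the five standardized measures (an assumption; the text does not state this), the eigenvalues sum to the number of variables, so the share of variance carried by the first component follows directly from its eigenvalue. The symbols λ₁ (first eigenvalue) and p (number of input measures) are notation introduced here for illustration.

```latex
\[
\frac{\lambda_1}{\sum_{j=1}^{p} \lambda_j}
  = \frac{\lambda_1}{p}
  = \frac{1.58}{5}
  = 0.316 \approx 32\%
\]
```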

Supplementary Fig.  2 and Supplementary Table  3 show Spearman’s rho correlations between L3 Global scores and the five L3 outcome measures inputted into the original PCA analysis. In addition, correlations between L3 measures and L1 and L2 proficiency are also reported.

ML learner questionnaire

Success in learning a new language is correlated with the learners’ motivation, which can be measured by the ML learner questionnaire 49 . The original questionnaire was designed to cover ten factors, including ideal L2 self, ought-to L2 self, family influence, and attitudes in the first two parts of the questionnaire. Entering all ten factors into our statistical analysis would be inappropriate, so we employed a data reduction method to identify fewer underlying variables. Using PCA with varimax rotation, two components were retained for Part I of the questionnaire. One covered factors related to extrinsic variables such as ought-to L2 self and family influence, which we labeled “external motivation.” The other, which we labeled “internal motivation,” included items on Ideal L2 Self. A separate PCA was conducted for Part II of the questionnaire and two factors, named anxiety and attitude, were retained (Supplementary Fig.  3a , b).

As learners’ motivation is significantly associated with the outcome of learning a new language, motivation was measured in detail by adapting the ML Learners’ motivation questionnaire 49 . The first part of the questionnaire consists of 49 statement-type items measuring the learners’ motivation (e.g., “I have to learn ML because I don’t want to fail the ML course”). Participants were asked to give their ratings on a six-point Likert scale, with the options ranging from “Strongly disagree” to “Strongly agree.” The second part consists of 17 question-type items about learners’ anxiety and attitudes toward the target language class, the native speakers’ community, and the culture (e.g., “Do you always look forward to ML classes?”). Participants were instructed to give their answers on a six-point Likert scale, with options ranging from “Not at all” to “Very much.”

Separate PCAs based on varimax rotation were conducted on Part I and Part II of the learner motivation questionnaire, as illustrated in Supplementary Figs  3 and 4 . For Part I (Q1–Q49), the KMO measure verified the sampling adequacy for the analysis (MSA value = 0.93) with all variables having MSA above 0.50 as the cut-off point. Bartlett’s test of sphericity, χ 2  = 1950, p  < 0.001, indicated that correlations between items were sufficiently large for PCA. An initial analysis was run to obtain eigenvalues for each component in the data. Two components had eigenvalues above Kaiser’s criterion of 4, and in combination explained 35% of the variance. The scree plot showed inflexions that would justify retaining two components in the final analysis. Factor loadings suggest that component 1 represents external motivation, and component 2 internal motivation (Supplementary Fig.  3a ).

For Part II (Q50–Q67), we followed the same procedure of data analysis. The KMO measure verified the sampling adequacy for the analysis (MSA value = 0.86) with all variables having MSA above 0.50 as the cut-off point. Bartlett’s test of sphericity, χ 2  = 588, p  < 0.001, indicated that correlations between items were sufficiently large for PCA. An initial analysis was run to obtain eigenvalues for each component in the data. Three components had eigenvalues above Kaiser’s criterion of 2, and in combination explained 42% of the variance. The scree plot showed inflexions that would justify retaining two components in the final analysis. Factor loadings suggest that component 1 represents attitudes toward the L3 language, culture, and community, while component 2 represents anxiety (Supplementary Fig.  3b ).

Data reduction

As many measures about L3 proficiency were obtained from each participant, we first employed data reduction procedures using the PCA to reveal major components of the overall language proficiency of each participant (henceforth “L3 Global scores”).

Correlations between L1, L2, and L3 proficiency

Spearman’s correlation coefficients (cc) and p values were calculated between indicators of language proficiency of L1–L3 using R 90 . Pairwise deletion was adopted to minimize the data loss that listwise deletion would cause. False discovery rate (FDR) correction was applied when assessing statistical significance.
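To make the pairwise-deletion step concrete, the sketch below computes Spearman's coefficient from only those participants who have both scores. It is an illustrative Python version, not the R code used in the study; the function names and the example grades are hypothetical, and missing values are represented by `None`.

```python
def average_ranks(values):
    """Rank values from 1 to n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(a, b):
    """Pearson correlation of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    sab = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    saa = sum((u - ma) ** 2 for u in a)
    sbb = sum((v - mb) ** 2 for v in b)
    if saa == 0 or sbb == 0:
        raise ValueError("correlation is undefined for a constant variable")
    return sab / (saa * sbb) ** 0.5


def spearman_pairwise(x, y):
    """Spearman's coefficient over cases observed on BOTH variables.

    Returns the coefficient and the number of complete pairs used.
    """
    pairs = [(a, b) for a, b in zip(x, y) if a is not None and b is not None]
    xs = [a for a, _ in pairs]
    ys = [b for _, b in pairs]
    return pearson(average_ranks(xs), average_ranks(ys)), len(pairs)


# Hypothetical grades with gaps, for illustration only (not study data).
l1_grades = [4, 5, None, 3, 6, 5]
l2_grades = [5, 6, 4, None, 7, 4]
rho, n_pairs = spearman_pairwise(l1_grades, l2_grades)
```

With pairwise deletion, each pair of variables keeps every participant who has both scores, so different correlations in the same table may rest on different sample sizes; listwise deletion would instead drop any participant with a gap on any variable.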

General linear regression models

We fitted separate general linear regression models for L2 and L3 proficiency. To predict English HKDSE grades, we included demographic factors and Chinese HKDSE grades as predictors (Table  2 ). To predict L3 Global scores, we used not only demographic variables but also motivational and affective factors (internal motivation, external motivation, anxiety, and attitude), along with both Chinese and English HKDSE grades as predictors (Table  3 ). We used the “p.adjust” function in R, and calculated FDR adjusted p values with the Benjamini–Hochberg method.
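The Benjamini–Hochberg adjustment mentioned above can be sketched in a few lines. This Python version is meant to mirror what R's `p.adjust(p, method = "BH")` computes, but it is an illustration only; the function name `bh_adjust` and the example p values are hypothetical.

```python
def bh_adjust(p_values):
    """Benjamini-Hochberg FDR-adjusted p values, returned in the input order."""
    n = len(p_values)
    # Walk from the largest p value to the smallest, tracking a running minimum
    # so that adjusted values stay monotone in the original p values.
    order = sorted(range(n), key=lambda i: p_values[i], reverse=True)
    adjusted = [0.0] * n
    running_min = 1.0
    for position, index in enumerate(order):
        rank = n - position  # rank of this p value in ascending order
        running_min = min(running_min, p_values[index] * n / rank)
        adjusted[index] = running_min
    return adjusted


# Hypothetical p values, for illustration only (not study results).
raw_p = [0.01, 0.04, 0.03, 0.005]
adj_p = bh_adjust(raw_p)
```

Each p value is scaled by the number of tests divided by its rank, and the running minimum ensures that a smaller raw p value never receives a larger adjusted value than a bigger one.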

Structural equation models (SEMs)

To further quantify statistical relationships among L1–L3 proficiency, we fitted a series of latent variable structural equation models (SEMs) 91 , 92 to allow simultaneous fitting of multiple regression models using the lavaan package, version 0.6-1 93 in R 90 . We assumed a causal structure of predictor variables and hypothesized two a priori metamodels to evaluate our three hypotheses. The LCH model predicts that L1 has effects on L2 but not on L3, while L2 has effects on L3 (Fig. 1a ). FDH predicts no relationship across the three languages (Fig. 1b ). The LCDH model hypothesizes that L1 has effects on both L2 and L3, while L2 has effects on L3 only (Fig. 1c ).

In all models, proficiency scores in L1–L3 were treated as latent variables that were approximated using scores from exams and experiments. Using language proficiency scores as latent variables, we fitted a structural model to reflect the hypotheses about how L1–L3 are related to each other. The measurement model links the latent variable to the observed variables. Exam grades of Chinese (L1) and English (L2) subjects in the HKDSE were used as observed variables to measure L1 and L2 proficiency, respectively. L3 Global scores, calculated using the PCA based on lab and classroom measures, were used as the observed variable of L3 proficiency. Demographic, music, IQ, and motivational factors were also added to the measurement models.

SEM model selection and parameter estimation

We fitted two separate latent variable SEMs to test the hypotheses of LCH and LCDH, using full information maximum likelihood to adjust for missing data and robust standard errors to account for nonnormality. The SEM of LCH was trimmed to achieve better global fit statistics by removing paths with high standardized residuals. We then built the LCDH model by taking the same structure as the LCH SEM and adding a path between the latent variables L1 and L3. We assessed the goodness of fit for each model and reported the parameters for the most likely model. To evaluate the overall fit of the models, we used the CFI (acceptable fit: 0.95–0.97, good fit: >0.97), SRMR (acceptable fit: 0.05–0.10, good fit: <0.05), and RMSEA (acceptable fit: <0.08, good fit: <0.05), and reported the TLI (good fit: ≥0.80) and Yuan–Bentler scaling factor for each model 94 , 95 . We did not use the χ² test of overall fit, because the test is sensitive to sample size and tends to reject the model when the sample is large.
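The fit thresholds listed above can be written out as a small lookup, which makes the boundaries explicit. This is an illustrative sketch only: the function name `rate_fit`, the example index values, and the label "poor" for values outside the stated ranges are assumptions, since the text names only the "acceptable" and "good" bands.

```python
def rate_fit(cfi: float, srmr: float, rmsea: float) -> dict:
    """Label each global fit index using the thresholds stated in the text."""

    def rate_cfi(value):
        if value > 0.97:
            return "good"
        if value >= 0.95:
            return "acceptable"
        return "poor"  # label assumed; the text gives no name for this band

    def rate_srmr(value):
        if value < 0.05:
            return "good"
        if value <= 0.10:
            return "acceptable"
        return "poor"

    def rate_rmsea(value):
        if value < 0.05:
            return "good"
        if value < 0.08:
            return "acceptable"
        return "poor"

    return {"CFI": rate_cfi(cfi), "SRMR": rate_srmr(srmr), "RMSEA": rate_rmsea(rmsea)}


# Hypothetical index values, for illustration only (not study results).
example = rate_fit(cfi=0.98, srmr=0.04, rmsea=0.06)
```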

Machine-learning prediction analysis

We conducted prediction analyses using support vector regression (SVR), the regression variant of the support vector machine, to examine whether L2 and L3 proficiency can be predicted by the significant predictors in the GLMs. SVR was implemented with the e1071 package 50 in R 90 . We fitted SVR models with a linear kernel, penalty parameter C = 1, and epsilon = 0.1 to predict language proficiency. We calculated the mean of the predictions of the SVR models and used this average as our global projection of language proficiency. To evaluate the performance of the SVR, we used a nested tenfold cross-validation procedure via the e1071 50 and caret 93 , 96 packages in R 90 . In this procedure, we first randomly selected data from 90% of the participants to build a training SVR model. Parameters from the training model were used to predict the language proficiency of the remaining 10% of participants as test data. For each iteration, we calculated Pearson’s correlation coefficient (cc) between the predicted and the actual language proficiency in the test data. We repeated this process ten times and calculated the averaged cc. We then repeated this process 10,000 times, resulting in a distribution of 10,000 cc values that represents the SVR model’s performance in predicting language proficiency. A null distribution of predictability was generated by a permutation test based on randomly reordered observed data. We repeated this permutation procedure 10,000 times with the same tenfold cross-validation procedure to generate the distribution for the predictability of the null model. A critical value of 5% was set for a two-tailed t test of the null hypothesis (predictability being the same as random predictions).
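The logic of the cross-validated correlation and its permutation null can be sketched as below. Several simplifications are assumptions made for brevity and are not the authors' procedure: an ordinary least-squares line with one predictor stands in for the linear-kernel SVR, the data are synthetic, only 200 permutations are run instead of 10,000, the observed value comes from a single cross-validation run, and an empirical one-sided proportion replaces the t test. All function and variable names are hypothetical.

```python
import random


def pearson(a, b):
    """Pearson correlation; returns 0.0 when either input has no variance."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    sab = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    saa = sum((u - ma) ** 2 for u in a)
    sbb = sum((v - mb) ** 2 for v in b)
    denom = (saa * sbb) ** 0.5
    return sab / denom if denom > 0 else 0.0


def fit_line(x, y):
    """Least-squares slope and intercept: a simple stand-in for the SVR model."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((u - mx) ** 2 for u in x)
    slope = sum((u - mx) * (v - my) for u, v in zip(x, y)) / sxx
    return slope, my - slope * mx


def cv_correlation(x, y, k, rng):
    """Mean test-fold correlation between predicted and actual values."""
    idx = list(range(len(x)))
    rng.shuffle(idx)
    folds = [idx[i::k] for i in range(k)]  # k roughly equal test folds
    fold_r = []
    for test in folds:
        held_out = set(test)
        train = [i for i in idx if i not in held_out]
        slope, intercept = fit_line([x[i] for i in train], [y[i] for i in train])
        predicted = [slope * x[i] + intercept for i in test]
        fold_r.append(pearson(predicted, [y[i] for i in test]))
    return sum(fold_r) / k


rng = random.Random(0)
# Synthetic predictor and outcome, for illustration only (not study data).
x = [rng.gauss(0, 1) for _ in range(60)]
y = [2 * xi + rng.gauss(0, 0.5) for xi in x]

observed = cv_correlation(x, y, k=10, rng=rng)

# Null distribution: break the x-y link by shuffling the outcome, then rerun CV.
null = []
for _ in range(200):
    shuffled = y[:]
    rng.shuffle(shuffled)
    null.append(cv_correlation(x, shuffled, k=10, rng=rng))

# Share of null runs that predict at least as well as the real data.
p_value = (1 + sum(r >= observed for r in null)) / (1 + len(null))
```

Shuffling the outcome keeps the distributions of both variables intact while destroying their relationship, so the null runs show how well the same pipeline appears to predict when there is nothing to learn.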

Reporting summary

Further information on research design is available in the  Nature Research Reporting Summary linked to this article.

Data availability

All data needed to evaluate the conclusions in the paper are present in the paper. The numeric data of this study are available at Open Science Framework ( https://osf.io/f5wt8 ).

Code availability

The script for statistical analysis is available at Open Science Framework ( https://osf.io/f5wt8 ), for purposes of reproducing or extending the analysis.

References

Kennedy, D. & Norman, C. What don’t we know? Science 309 , 75 (2005).

Morgan-Short, K. et al. A view of the neural representation of second language syntax through artificial language learning under implicit contexts of exposure. Stud. Second Lang. Acquis. 37 , 383–419 (2015).

Ettlinger, M., Bradlow, A. R. & Wong, P. C. M. Variability in the learning of complex morphophonology. Appl. Psycholinguist. 35 , 807–831 (2014).

Deng, Z., Chandrasekaran, B., Wang, S. & Wong, P. C. M. Resting-state low-frequency fluctuations reflect individual differences in spoken language learning. Cortex 76 , 63–78 (2016).

Bley-Vroman, R. What is the logical problem of foreign language learning? in Linguistic Perspectives on Second Language Acquisition 41–68 (Cambridge University Press, 1989). https://doi.org/10.1017/CBO9781139524544.005 .

Bley-Vroman, R. The evolving context of the fundamental difference hypothesis. Stud. Second Lang. Acquis. 31 , 175–198 (2009).

Sparks, R. L. Examining the linguistic coding differences hypothesis to explain individual differences in foreign language learning. Ann. Dyslexia 45 , 187–214 (1995).

Sparks, R. L. Individual differences in L2 learning and long-term L1-L2 relationships. Lang. Learn. 62 , 5–27 (2012).

Chomsky, N. Language and Mind (Harper and Row, 1968).

Lenneberg, E. H. Biological Foundations of Language (Wiley, 1967).

Johnson, J. S. & Newport, E. L. Critical period effects in second language learning: the influence of maturational state on the acquisition of English as a second language. Cogn. Psychol. 21 , 60–99 (1989).

Birdsong, D. Age and second language acquisition and processing: a selective overview: age and L2 acquisition and processing. Lang. Learn. 56 , 9–49 (2006).

Hartshorne, J. K., Tenenbaum, J. B. & Pinker, S. A critical period for second language acquisition: evidence from 2/3 million English speakers. Cognition 177 , 263–277 (2018).

Chen, T. & Hartshorne, J. K. More evidence from over 1.1 million subjects that the critical period for syntax closes in late adolescence. Cognition 214 , 104706 (2021).

Flege, J. E., Yeni-Komshian, G. H. & Liu, S. Age constraints on second-language acquisition. J. Mem. Lang. 41 , 78–104 (1999).

Perani, D. et al. The bilingual brain. Proficiency and age of acquisition of the second language. Brain 121 (Pt 10), 1841–1852 (1998).

Mayberry, R. I. & Lock, E. Age constraints on first versus second language acquisition: evidence for linguistic plasticity and epigenesis. Brain Lang. 87 , 369–384 (2003).

Kuhl, P. K. Brain mechanisms in early language acquisition. Neuron 67 , 713–727 (2010).

Ganschow, L., Sparks, R. L. & Javorsky, J. Foreign language learning difficulties: an historical perspective. J. Learn. Disabil. 31 , 248–258 (1998).

Sparks, R. L., Patton, J. & Luebbers, J. Individual differences in L2 achievement mirror individual differences in L1 skills and L2 aptitude: crosslinguistic transfer of L1 to L2 skills. Foreign Lang. Ann. 52 , 255–283 (2019).

Kaan, E. Predictive sentence processing in L2 and L1: what is different? Linguistic Approaches Biling. 4 , 257–282 (2014).

Sparks, R. L., Patton, J., Ganschow, L. & Humbach, N. Long-term relationships among early first language skills, second language aptitude, second language affect, and later second language proficiency. Appl. Psycholinguist. 30 , 725–755 (2009).

Sparks, R. L., Patton, J., Ganschow, L. & Humbach, N. Do L1 reading achievement and L1 print exposure contribute to the prediction of L2 proficiency? Lang. Learn. 62 , 473–505 (2012).

Mueller, J. L., Friederici, A. D. & Männel, C. Auditory perception at the root of language learning. Proc. Natl Acad. Sci. USA 109 , 15953–15958 (2012).

Gagné, E. D. The Cognitive Psychology of School Learning (Little, Brown, 1985).

Netten, A., Droop, M. & Verhoeven, L. Predictors of reading literacy for first and second language learners. Read. Writ. 24 , 413–425 (2011).

Baddeley, A., Gathercole, S. & Papagno, C. The phonological loop as a language learning device. Psychol. Rev. 105 , 158–173 (1998).

Genesee, F. Second language learning through immersion: a review of U.S. programs. Rev. Educ. Res. 55 , 541–561 (1985).

Cheng, X. & Zhang, L. J. Teacher written feedback on English as a foreign language learners’ writing: examining native and nonnative English-speaking teachers’ practices in feedback provision. Front. Psychol. 12 , 629921 (2021).

Cenoz, J. The role of typology in the organization of the multilingual lexicon. in The Multilingual Lexicon (eds Cenoz, J., Hufeisen, B. & Jessner, U.) 103–116 (Springer Netherlands, 2003). https://doi.org/10.1007/978-0-306-48367-7_8 .

Kellerman, E. Now you see it, now you don’t. in Language Transfer in Language Learning (eds Gass, S. & Selinker L.) 112–134 (Newbury House, 1983).

Rothman, J. L3 syntactic transfer selectivity and typological determinacy: the typological primacy model. Second Lang. Res. 27 , 107–127 (2011).

Rothman, J. Linguistic and cognitive motivations for the typological primacy model (TPM) of third language (L3) transfer: timing of acquisition and proficiency considered. Bilingualism: Lang. Cogn. 18 , 179–190 (2015).

Partanen, E. et al. Learning-induced neural plasticity of speech processing before birth. Proc. Natl Acad. Sci. USA 110 , 15145–15150 (2013).

Census and Statistics Department of Hong Kong SAR. Snapshot of Hong Kong Population (2016) . https://www.bycensus2016.gov.hk/en/Snapshot-08.html (2016).

Wong, S. W. L., Dealey, J., Leung, V. W. H. & Mok, P. P. K. Production of English connected speech processes: an assessment of Cantonese ESL learners’ difficulties obtaining native-like speech. Lang. Learn. J. 1–16 (2019). https://doi.org/10.1080/09571736.2019.1642372 .

Poon, A. Y. K. Language use, and language policy and planning in Hong Kong. Curr. Issues Lang. Plan. 11 , 1–66 (2010).

Lau, C. English language education in Hong Kong: a review of policy and practice. Curr. Issues Lang. Plan. 21 , 457–474 (2020).

Hong Kong Examinations and Assessment Authority. Grading Procedures and Standards-referenced Reporting in the HKDSE (Hong Kong Examinations and Assessment Authority, 2018). http://www.hkeaa.edu.hk/DocLibrary/Media/Leaflets/HKDSE_SRR_A4booklet_Mar2018.pdf .

Fernald, A., Marchman, V. A. & Weisleder, A. SES differences in language processing skill and vocabulary are evident at 18 months. Dev. Sci. 16 , 234–248 (2013).

Kahn-Horwitz, J., Shimron, J. & Sparks, R. L. Weak and strong novice readers of English as a foreign language: effects of first language and socioeconomic status. Ann. Dyslexia 56 , 161–185 (2006).

Rowe, M. L. & Goldin-Meadow, S. Differences in early gesture explain SES disparities in child vocabulary size at school entry. Science 323 , 951–953 (2009).

Slevc, L. R. & Miyake, A. Individual differences in second-language proficiency: does musical ability matter? Psychol. Sci. 17 , 675–681 (2006).

Wong, P. C. M., Skoe, E., Russo, N. M., Dees, T. & Kraus, N. Musical experience shapes human brainstem encoding of linguistic pitch patterns. Nat. Neurosci. 10 , 420–422 (2007).

van der Slik, F. W. P., van Hout, R. W. N. M. & Schepens, J. J. The gender gap in second language acquisition: gender differences in the acquisition of Dutch among immigrants from 88 countries with 49 mother tongues. PLoS ONE 10 , e0142056 (2015).

Horwitz, E. K., Horwitz, M. B. & Cope, J. Foreign language classroom anxiety. Mod. Lang. J. 70 , 125–132 (1986).

Dörnyei, Z. & Ryan, S. The Psychology of the Language Learner Revisited . (Routledge, 2015). https://doi.org/10.4324/9781315779553 .

Hong Kong Examinations and Assessment Authority. Benchmarking Study between IELTS and HKDSE English Language Examination (2012) . https://www.hkeaa.edu.hk/mobile/en/recognition/benchmarking/hkdse/ielts .

Dörnyei, Z. & Taguchi, T. Questionnaires in Second Language Research: Construction, Administration, and Processing (Routledge, 2009).

Meyer, D. et al. e1071: Misc Functions of the Department of Statistics, Probability Theory Group (Formerly: E1071) (TU Wien, 2019). https://cran.r-project.org/web/packages/e1071/vignettes/svmdoc.pdf .

Wong, P. C. M. et al. ASPM-lexical tone association in speakers of a tone language: direct evidence for the genetic-biasing hypothesis of language evolution. Sci. Adv. 6 , eaba5090 (2020).

Awad, M. & Khanna, R. Efficient Learning Machines: Theories, Concepts, and Applications for Engineers and System Designers . (Apress, 2015). https://doi.org/10.1007/978-1-4302-5990-9 .

Ettlinger, M., Morgan-Short, K., Faretta-Stutenberg, M. & Wong, P. C. M. The relationship between artificial and second language learning. Cogn. Sci. 40 , 822–847 (2016).

Morgan-Short, K., Faretta-Stutenberg, M., Brill-Schuetz, K. A., Carpenter, H. & Wong, P. C. M. Declarative and procedural memory as individual differences in second language acquisition. Bilingualism: Lang. Cogn. 17 , 56–72 (2014).

Cao, F., Tao, R., Liu, L., Perfetti, C. A. & Booth, J. R. High proficiency in a second language is characterized by greater involvement of the first language network: evidence from Chinese learners of English. J. Cogn. Neurosci. 25 , 1649–1663 (2013).

Kim, K. H., Relkin, N. R., Lee, K. M. & Hirsch, J. Distinct cortical areas associated with native and second languages. Nature 388 , 171–174 (1997).

DeLuca, V., Rothman, J., Bialystok, E. & Pliatsikas, C. Redefining bilingualism as a spectrum of experiences that differentially affects brain structure and function. Proc. Natl Acad. Sci. USA 116 , 7565–7574 (2019).

Pliatsikas, C., DeLuca, V. & Voits, T. The many shades of bilingualism: language experiences modulate adaptations in brain structure. Lang. Learn. 70 , 133–149 (2020).

Sulpizio, S., Del Maschio, N., Del Mauro, G., Fedeli, D. & Abutalebi, J. Bilingualism as a gradient measure modulates functional connectivity of language and control networks. Neuroimage 205 , 116306 (2020).

Kimppa, L., Kujala, T. & Shtyrov, Y. Individual language experience modulates rapid formation of cortical memory circuits for novel words. Sci. Rep. 6 , 30227 (2016).

Hamrick, P., Lum, J. A. G. & Ullman, M. T. Child first language and adult second language are both tied to general-purpose learning systems. Proc. Natl Acad. Sci. USA 115 , 1487–1492 (2018).

de Bot, K., Lowie, W. & Verspoor, M. A dynamic systems theory approach to second language acquisition. Bilingualism: Lang. Cogn. 10 , 7–21 (2007).

Larsen-Freeman, D. Chaos/complexity science and second language acquisition. Appl. Linguist. 18 , 141–165 (1997).

Skehan, P. Individual differences in second language learning. Stud. Second Lang. Acquis. 13 , 275–298 (1991).

Skehan, P. Foreign language aptitude and its relationship with grammar: a critical overview. Appl. Linguist. 36 , 367–384 (2015).

Flynn, S., Foley, C. & Vinnitskaya, I. The cumulative-enhancement model for language acquisition: comparing adults’ and children’s patterns of development in first, second and third language acquisition of relative clauses. Int. J. Multiling. 1 , 3–16 (2004).

Bardel, C. & Falk, Y. The role of the second language in third language acquisition: the case of Germanic syntax. Second Lang. Res. 23 , 459–484 (2007).

Bardel, C. & Falk, Y. The L2 status factor and the declarative/procedural. in Third Language Acquisition in Adulthood (eds Cabrelli, J., Flynn, S. & Rothman, J.) 61–78 (Benjamins, 2012).

Jessner, U. A. DST model of multilingualism and the role of metalinguistic awareness. Mod. Lang. J. 92 , 270–283 (2008).

Kroll, J. F. & Stewart, E. Category interference in translation and picture naming: evidence for asymmetric connections between bilingual memory representations. J. Mem. Lang. 33 , 149–174 (1994).

Westergaard, M., Mitrofanova, N., Mykhaylyk, R. & Rodina, Y. Crosslinguistic influence in the acquisition of a third language: the linguistic proximity model. Int. J. Biling. 21 , 666–682 (2017).

Meisel, J. M. Transfer as a second-language strategy. Lang. Commun. 3 , 11–46 (1983).

Carroll, J. B. Implications of aptitude test research and psycholinguistic theory for foreign-language teaching. Int. J. Psycholinguist. 2 , 5–14 (1973).

Skehan, P. The role of foreign language aptitude in a model of school learning. Lang. Test. 3 , 188–221 (1986).

Wen, Z. (Edward), Biedroń, A. & Skehan, P. Foreign language aptitude theory: yesterday, today and tomorrow. Lang. Teach . 50 , 1–31 (2017).

Wong, P. C. M., Chandrasekaran, B. & Zheng, J. The derived allele of ASPM is associated with lexical tone perception. PLoS ONE 7 , e34243 (2012).

Wong, P. C. M., Ettlinger, M. & Zheng, J. Linguistic grammar learning and DRD2-TAQ-IA polymorphism. PLoS ONE 8 , e64983 (2013).

Wong, P. C. M., Morgan-Short, K., Ettlinger, M. & Zheng, J. Linking neurogenetics and individual differences in language learning: the dopamine hypothesis. Cortex 48 , 1091–1102 (2012).

Ingvalson, E. M. & Wong, P. C. M. Training to improve language outcomes in cochlear implant recipients. Front. Psychol. 4 , 263 (2013).

Feng, G. et al. Neural preservation underlies speech improvement from auditory deprivation in young cochlear implant recipients. Proc. Natl Acad. Sci. USA 115 , E1022–E1031 (2018).

Peach, R. & Wong, P. Integrating the message level into treatment for agrammatism using story retelling. Aphasiology 18 , 429–441 (2004).

Brown, L., Sherbenou, R. & Johnsen, S. TONI 4, Test of Nonverbal Intelligence (Pro-Ed, 2010).

Hollingshead, A. B. Four factor index of social status. Yale J. Sociol. 8 , 21–51 (2011).

Rangel, M. A. & Shi, Y. Early patterns of skill acquisition and immigrants’ specialization in STEM careers. Proc. Natl Acad. Sci. USA 116 , 484–489 (2019).

Mayer, M. Frog, Where Are You? (Dial Press, 1969).

MacWhinney, B. The CHILDES project: Tools for Analyzing Talk (third edition): Volume I: Transcription Format and Programs, Volume II: The database. Comput. linguist . 26 , 657–657 (2000).

O’Grady, W., Schafer, A. J., Perla, J., Lee, O. & Wieting, J. A psycholinguistic tool for the assessment of language loss: the HALA project. Lang. Doc. Conserv. 3 , 1–112 (2009).

Hulstijn, J. H. The construct of language proficiency in the study of bilingualism from a cognitive perspective*. Bilingualism: Lang. Cogn. 15 , 422–433 (2012).

Tremblay, A. Proficiency assessment standards in second language acquisition RESEARCH: ‘Clozing’ the gap. Stud. Second Lang. Acquis. 33 , 339–372 (2011).

R Core Team. R: A Language and Environment for Statistical Computing (R Foundation for Statistical Computing, 2019).

Bollen, K. A. & Noble, M. D. Structural equation models and the quantification of behavior. Proc. Natl Acad. Sci. USA 108 (Suppl 3), 15639–15646 (2011).

Hayduk, L. A. & Littvay, L. Should researchers use single indicators, best indicators, or multiple indicators in structural equation models? BMC Med. Res. Methodol. 12 , 159 (2012).

Rosseel, Y. lavaan: an R package for structural equation modeling. J. Stat. Softw. 48 , 1–36 (2012).

Cangur, S. & Ercan, I. Comparison of model fit indices used in structural equation modeling under multivariate normality. J. Mod. Appl. Stat. Methods 14 , 152–167 (2015).

Schermelleh-Engel, K., Moosbrugger, H. & Müller, H. Evaluating the fit of structural equation models: tests of significance and descriptive goodness-of-fit measures. Methods Psychol. Res. 8 , 23–74 (2003).

Kuhn, M. Building predictive models in R using the caret package. J. Stat. Softw . 28 , 1–26 (2008).

Acknowledgements

This work was supported by the Research Grants Council of Hong Kong (HSSPF #34000118), the Dr. Stanley Ho Medical Development Foundation and the Department of Linguistics and Modern Languages at The Chinese University of Hong Kong (P.C.M.W.). The authors wish to thank Kynthia Yip, Doris Lau, Kay Hoi Yi Wong, Danny Ip, Tsz Yin Wong, and a group of student helpers and transcribers for their assistance with data collection and analysis. The authors also wish to thank the Modern Languages Instructional Team at The Chinese University of Hong Kong (led by Annette Frömel, Lee Hyon Sou Kunegel, and Celia Carracedo Manzanera at the time of the research) for their assistance with participant recruitment and general advice on the project, Xiujuan Geng for advice on statistical analysis, and Kara Morgan-Short for comments about the Spanish data.

Author information

Authors and affiliations.

Department of Linguistics and Modern Languages, The Chinese University of Hong Kong, Hong Kong SAR, China

Xin Kang, Virginia Yip & Patrick C. M. Wong

Brain and Mind Institute, The Chinese University of Hong Kong, Hong Kong SAR, China

Xin Kang & Patrick C. M. Wong

Department of Linguistics, The University of Hong Kong, Hong Kong SAR, China

Stephen Matthews

Childhood Bilingualism Research Centre, The Chinese University of Hong Kong, Hong Kong SAR, China

Virginia Yip

Department of Otorhinolaryngology, Head and Neck Surgery, The Chinese University of Hong Kong, Hong Kong SAR, China

Patrick C. M. Wong

Contributions

Conceptualization: P.C.M.W.; data curation and data analysis: X.K.; funding acquisition: P.C.M.W.; investigation: P.C.M.W., V.Y., S.M., and X.K.; methodology: P.C.M.W. and X.K.; project administration: X.K.; supervision: P.C.M.W., S.M., and V.Y.; visualization: X.K.; writing—original draft: P.C.M.W. and X.K.; and writing—review and editing: all authors (based on CRediT: Contributor Roles Taxonomy, https://casrai.org/credit/ ). All authors have agreed the final completed version. All authors hold accountability for all aspects of the work in ensuring that questions related to the accuracy or integrity of any part of the work are appropriately investigated and resolved.

Corresponding author

Correspondence to Patrick C. M. Wong.

Ethics declarations

Competing interests.

P.C.M.W. declares that he is an owner of a startup company supported by a Hong Kong Government technology startup scheme. The research reported here is not associated with this company. The other authors declare no conflict of interest.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/ .

About this article

Cite this article.

Kang, X., Matthews, S., Yip, V. et al. Language and nonlanguage factors in foreign language learning: evidence for the learning condition hypothesis. npj Sci. Learn. 6 , 28 (2021). https://doi.org/10.1038/s41539-021-00104-9

Received : 02 December 2020

Accepted : 16 August 2021

Published : 15 September 2021

DOI : https://doi.org/10.1038/s41539-021-00104-9

Chapter 3: Infancy and Toddlerhood

Theories of language development.

Psychological theories of language learning differ in terms of the importance they place on nature and nurture. Remember that we are a product of both nature and nurture. Researchers now believe that language acquisition is partially inborn and partially learned through our interactions with our linguistic environment (Gleitman & Newport, 1995; Stork & Widdowson, 1974).

Learning Theory: Perhaps the most straightforward explanation of language development is that it occurs through the principles of learning, including association and reinforcement (Skinner, 1953). Additionally, Bandura (1977) described the importance of observation and imitation of others in learning language. There must be at least some truth to the idea that language is learned through environmental interactions or nurture. Children learn the language that they hear spoken around them rather than some other language. Also supporting this idea is the gradual improvement of language skills with time. It seems that children modify their language through imitation and reinforcement, such as parental praise and being understood. For example, when a two-year-old child asks for juice, he might say, “me juice,” to which his mother might respond by giving him a cup of apple juice.

However, language cannot be entirely learned. For one, children learn words too fast for them to be learned through reinforcement. Between the ages of 18 months and 5 years, children learn up to 10 new words every day (Anglin, 1993). More importantly, language is more generative than it is imitative. Language is not a predefined set of ideas and sentences that we choose when we need them, but rather a system of rules and procedures that allows us to create an infinite number of statements, thoughts, and ideas, including those that have never previously occurred. When a child says that she “swimmed” in the pool, for instance, she is showing generativity. No adult speaker of English would ever say “swimmed,” yet it is easily generated from the normal system of producing language.

Other evidence that refutes the idea that all language is learned through experience comes from the observation that children may learn languages better than they ever hear them. Deaf children whose parents do not speak ASL very well nevertheless are able to learn it perfectly on their own, and may even make up their own language if they need to (Goldin-Meadow & Mylander, 1998). A group of deaf children in a school in Nicaragua, whose teachers could not sign, invented a way to communicate through made-up signs (Senghas, Senghas, & Pyers, 2005). The development of this new Nicaraguan Sign Language has continued and changed as new generations of students have come to the school and started using the language. Although the original system was not a real language, it comes closer to being one every year, showing the development of a new language in modern times.

Chomsky and Nativism: The linguist Noam Chomsky is a believer in the nature approach to language, arguing that human brains contain a Language Acquisition Device that includes a universal grammar that underlies all human language (Chomsky, 1965, 1972). According to this approach, each of the many languages spoken around the world (there are between 6,000 and 8,000) is an individual example of the same underlying set of procedures that are hardwired into human brains. Chomsky’s account proposes that children are born with a knowledge of general rules of syntax that determine how sentences are constructed. Language develops as long as the infant is exposed to it. Contrary to Skinner’s account, no teaching, training, or reinforcement is required for language to develop.

Chomsky differentiates between the deep structure of an idea, that is, how the idea is represented in the fundamental universal grammar that is common to all languages, and the surface structure of the idea, or how it is expressed in any one language. Once we hear or express a thought in surface structure, we generally forget exactly how it happened. At the end of a lecture, you will remember a lot of the deep structure (i.e., the ideas expressed by the instructor), but you cannot reproduce the surface structure (the exact words that the instructor used to communicate the ideas).

Although there is general agreement among psychologists that babies are genetically programmed to learn language, there is still debate about Chomsky’s idea that there is a universal grammar that can account for all language learning. Evans and Levinson (2009) surveyed the world’s languages and found that none of the presumed underlying features of the language acquisition device were entirely universal. In their search they found languages that did not have noun or verb phrases, that did not have tenses (e.g., past, present, future), and even some that did not have nouns or verbs at all, even though a basic assumption of a universal grammar is that all languages should share these features.

Figure 3.20 Victor of Aveyron. Public domain.

Critical Periods: Anyone who has tried to master a second language as an adult knows the difficulty of language learning. Yet children learn languages easily and naturally. Children who are not exposed to language early in their lives will likely never learn one. Case studies, including Victor the “Wild Child,” who was abandoned as a baby in France and not discovered until he was 12, and Genie, a child whose parents kept her locked in a closet from 18 months until 13 years of age, are (fortunately) two of the only known examples of these deprived children. Both of these children made some progress in socialization after they were rescued, but neither of them ever developed language (Rymer, 1993). This is also why it is important to determine quickly if a child is deaf, and to communicate in sign language immediately. Deaf children who are not exposed to sign language during their early years will likely never learn it (Mayberry, Lock, & Kazmi, 2002). The concept of critical periods highlights the importance of both nature and nurture for language development.

Social pragmatics: Another view emphasizes the very social nature of human language. Language from this view is not only a cognitive skill, but also a social one. Language is a tool humans use to communicate, connect to, influence, and inform others. Most of all, language comes out of a need to cooperate. The social nature of language has been demonstrated by a number of studies that have shown that children use several pre-linguistic skills (such as pointing and other gestures) to communicate not only their own needs, but what others may need. So a child watching her mother search for an object may point to the object to help her mother find it.

Eighteen- to 30-month-olds have been shown to make linguistic repairs when it is clear that another person does not understand them (Grosse, Behne, Carpenter & Tomasello, 2010). Grosse et al. (2010) found that even when the child was given the desired object, if there had been any misunderstanding along the way (such as a delay in being handed the object, or the experimenter calling the object by the wrong name), children would make linguistic repairs. This would suggest that children are using language not only as a means of achieving some material goal, but to make themselves understood in the mind of another person.

Figure 3.21 Drawing of Brain Showing Broca’s and Wernicke’s Areas. For most people the left hemisphere is specialized for language. Broca’s area, near the motor cortex, is involved in language production, whereas Wernicke’s area, near the auditory cortex, is specialized for language comprehension.

Brain Areas for Language: For the 90% of people who are right-handed, language is stored and controlled by the left cerebral cortex, although for some left-handers this pattern is reversed. These differences can easily be seen in the results of neuroimaging studies that show that listening to and producing language creates greater activity in the left hemisphere than in the right. Broca’s area, an area in the front of the left hemisphere near the motor cortex, is responsible for language production (Figure 3.21). This area was first localized in the 1860s by the French physician Paul Broca, who studied patients with lesions to various parts of the brain. Wernicke’s area, an area of the brain next to the auditory cortex, is responsible for language comprehension.

Learning Objectives: Psychosocial Development in Infancy and Toddlerhood

  • Identify styles of temperament and explore goodness-of-fit
  • Describe the early theories of attachment
  • Contrast styles of attachment according to the Strange Situation Technique
  • Explain the factors that influence attachment
  • Describe self-awareness, stranger wariness, and separation anxiety
  • Use Erikson’s theory to characterize psychosocial development during infancy

Authored by: Martha Lally and Suzanne Valentine-French. Provided by: College of Lake County Foundation. Located at: http://dept.clcillinois.edu/psy/LifespanDevelopment.pdf . License: CC BY-NC-SA: Attribution-NonCommercial-ShareAlike

Open Access

Peer-reviewed

Research Article

The Critical Period Hypothesis in Second Language Acquisition: A Statistical Critique and a Reanalysis

Affiliation Department of Multilingualism, University of Fribourg, Fribourg, Switzerland

  • Jan Vanhove

PLOS

  • Published: July 25, 2013
  • https://doi.org/10.1371/journal.pone.0069172

17 Jul 2014: The PLOS ONE Staff (2014) Correction: The Critical Period Hypothesis in Second Language Acquisition: A Statistical Critique and a Reanalysis. PLOS ONE 9(7): e102922. https://doi.org/10.1371/journal.pone.0102922 View correction

In second language acquisition research, the critical period hypothesis (CPH) holds that the function between learners' age and their susceptibility to second language input is non-linear. This paper revisits the indistinctness found in the literature with regard to this hypothesis's scope and predictions. Even when its scope is clearly delineated and its predictions are spelt out, however, empirical studies – with few exceptions – use analytical (statistical) tools that are irrelevant with respect to the predictions made. This paper discusses statistical fallacies common in CPH research and illustrates an alternative analytical method (piecewise regression) by means of a reanalysis of two datasets from a 2010 paper purporting to have found cross-linguistic evidence in favour of the CPH. This reanalysis reveals that the specific age patterns predicted by the CPH are not cross-linguistically robust. Applying the principle of parsimony, it is concluded that age patterns in second language acquisition are not governed by a critical period. To conclude, this paper highlights the role of confirmation bias in the scientific enterprise and appeals to second language acquisition researchers to reanalyse their old datasets using the methods discussed in this paper. The data and R commands that were used for the reanalysis are provided as supplementary materials.

Citation: Vanhove J (2013) The Critical Period Hypothesis in Second Language Acquisition: A Statistical Critique and a Reanalysis. PLoS ONE 8(7): e69172. https://doi.org/10.1371/journal.pone.0069172

Editor: Stephanie Ann White, UCLA, United States of America

Received: May 7, 2013; Accepted: June 7, 2013; Published: July 25, 2013

Copyright: © 2013 Jan Vanhove. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Funding: No current external funding sources for this study.

Competing interests: The author has declared that no competing interests exist.

Introduction

In the long term and in immersion contexts, second-language (L2) learners starting acquisition early in life – and staying exposed to input and thus learning over several years or decades – undisputedly tend to outperform later learners. Apart from being misinterpreted as an argument in favour of early foreign language instruction, which takes place in wholly different circumstances, this general age effect is also sometimes taken as evidence for a so-called ‘critical period’ (CP) for second-language acquisition (SLA). Derived from biology, the CP concept was famously introduced into the field of language acquisition by Penfield and Roberts in 1959 [1] and was refined by Lenneberg eight years later [2]. Lenneberg argued that language acquisition needed to take place between age two and puberty – a period which he believed to coincide with the lateralisation process of the brain. (More recent neurological research suggests that different time frames exist for the lateralisation process of different language functions. Most, however, close before puberty [3].) However, Lenneberg mostly drew on findings pertaining to first language development in deaf children, feral children or children with serious cognitive impairments in order to back up his claims. For him, the critical period concept was concerned with the implicit “automatic acquisition” [2, p. 176] in immersion contexts and does not preclude the possibility of learning a foreign language after puberty, albeit with much conscious effort and typically less success.

SLA research adopted the critical period hypothesis (CPH) and applied it to second and foreign language learning, resulting in a host of studies. In its most general version, the CPH for SLA states that the ‘susceptibility’ or ‘sensitivity’ to language input varies as a function of age, with adult L2 learners being less susceptible to input than child L2 learners. Importantly, the age–susceptibility function is hypothesised to be non-linear. Moving beyond this general version, we find that the CPH is conceptualised in a multitude of ways [4]. This state of affairs requires scholars to make explicit their theoretical stance and assumptions [5], but has the obvious downside that critical findings risk being mitigated as posing a problem to only one aspect of one particular conceptualisation of the CPH, whereas other conceptualisations remain unscathed. This overall vagueness concerns two areas in particular, viz. the delineation of the CPH's scope and the formulation of testable predictions. Delineating the scope and formulating falsifiable predictions are, needless to say, fundamental stages in the scientific evaluation of any hypothesis or theory, but the lack of scholarly consensus on these points seems to be particularly pronounced in the case of the CPH. This article therefore first presents a brief overview of differing views on these two stages. Then, once the scope of their CPH version has been duly identified and empirical data have been collected using solid methods, it is essential that researchers analyse the data patterns soundly in order to assess the predictions made and that they draw justifiable conclusions from the results. As I will argue in great detail, however, the statistical analysis of data patterns as well as their interpretation in CPH research – and this includes both critical and supportive studies and overviews – leaves a great deal to be desired. Reanalysing data from a recent CPH-supportive study, I illustrate some common statistical fallacies in CPH research and demonstrate how one particular CPH prediction can be evaluated.

Delineating the scope of the critical period hypothesis

First, the age span for a putative critical period for language acquisition has been delimited in different ways in the literature [4]. Lenneberg's critical period stretched from two years of age to puberty (which he posits at about 14 years of age) [2], whereas other scholars have drawn the cutoff point at 12, 15, 16 or 18 years of age [6]. Unlike Lenneberg, most researchers today do not define a starting age for the critical period for language learning. Some, however, consider the possibility of the critical period (or a critical period for a specific language area, e.g. phonology) ending much earlier than puberty (e.g. age 9 years [1], or as early as 12 months in the case of phonology [7]).

Second, some vagueness remains as to the setting that is relevant to the CPH. Does the critical period constrain implicit learning processes only, i.e. only the untutored language acquisition in immersion contexts, or does it also apply to (at least partly) instructed learning? Most researchers agree on the former [8], but much research has included subjects who have had at least some instruction in the L2.

Third, there is no consensus on the scope of the CP as far as the areas of language are concerned. Most researchers agree that a CP is most likely to constrain the acquisition of pronunciation and grammar and, consequently, these are the areas primarily looked into in studies on the CPH [9]. Some researchers have also tried to define distinguishable CPs for the different language areas of phonetics, morphology and syntax and even for lexis (see [10] for an overview).

Fourth and last, research into the CPH has focused on ‘ultimate attainment’ (UA) or the ‘final’ state of L2 proficiency rather than on the rate of learning. From research into the rate of acquisition (e.g. [11]–[13]), it has become clear that the CPH cannot hold for the rate variable. In fact, it has been observed that adult learners proceed faster than child learners at the beginning stages of L2 acquisition. Though theoretical reasons for excluding the rate can be posited (the initial faster rate of learning in adults may be the result of more conscious cognitive strategies rather than of less conscious implicit learning, for instance), rate of learning might from a different perspective also be considered an indicator of ‘susceptibility’ or ‘sensitivity’ to language input. Nevertheless, contemporary SLA scholars generally seem to concur that UA and not rate of learning is the dependent variable of primary interest in CPH research. These and further scope delineation problems relevant to CPH research are discussed in more detail by, among others, Birdsong [9], DeKeyser and Larson-Hall [14], Long [10] and Muñoz and Singleton [6].

Formulating testable hypotheses

Once the relevant CPH's scope has satisfactorily been identified, clear and testable predictions need to be drawn from it. At this stage, the lack of consensus on what the consequences or the actual observable outcome of a CP would have to look like becomes evident. As touched upon earlier, CPH research is interested in the end state or ‘ultimate attainment’ (UA) in L2 acquisition because this “determines the upper limits of L2 attainment” [9, p. 10]. The range of possible ultimate attainment states thus helps researchers to explore the potential maximum outcome of L2 proficiency before and after the putative critical period.

One strong prediction made by some CPH exponents holds that post-CP learners cannot reach native-like L2 competences. Identifying a single native-like post-CP L2 learner would then suffice to falsify all CPHs making this prediction. Assessing this prediction is difficult, however, since it is not clear what exactly constitutes sufficient nativelikeness, as illustrated by the discussion on the actual nativelikeness of highly accomplished L2 speakers [15], [16]. Indeed, there exists a real danger that, in a quest to vindicate the CPH, scholars set the bar for L2 learners to match monolinguals increasingly higher – up to Swiftian extremes. Furthermore, the usefulness of comparing the linguistic performance in mono- and bilinguals has been called into question [6], [17], [18]. Put simply, the linguistic repertoires of mono- and bilinguals differ by definition and differences in the behavioural outcome will necessarily be found, if only one digs deep enough.

A second strong prediction made by cph proponents is that the function linking age of acquisition and ultimate attainment will not be linear throughout the whole lifespan. Before discussing what this function would have to look like in order for it to constitute cph-consistent evidence, I point out that the ultimate attainment variable can essentially be considered a cumulative measure dependent on the actual variable of interest in cph research, i.e. susceptibility to language input, as well as on such other factors as duration and intensity of learning (within and outside a putative cp) and possibly a number of other influencing factors. To elaborate, the behavioural outcome, i.e. ultimate attainment, can be assumed to be the integral of the susceptibility function, as Newport [19] correctly points out. Other things being equal, ultimate attainment will therefore decrease as susceptibility decreases. However, decreasing ultimate attainment levels in and by themselves represent no compelling evidence in favour of a cph. The form of the integrative curve must therefore be predicted clearly from the susceptibility function. Additionally, the age of acquisition–ultimate attainment function can take just about any form when other things are not equal, e.g. duration of learning (Does learning last up until time of testing or only for a more or less constant number of years or is it dependent on age itself?) or intensity of learning (Do learners always learn at their maximum susceptibility level or does this intensity vary as a function of age, duration, present attainment and motivation?). The integral of the susceptibility function could therefore be of virtually unlimited complexity and its parameters could be adjusted to fit any age of acquisition–ultimate attainment pattern. It therefore seems astonishing that the distinction between level of sensitivity to language input and level of ultimate attainment is rarely made in the literature. Implicitly or explicitly [20], the two are more or less equated and the same mathematical functions are expected to describe the two variables if observed across a range of starting ages of acquisition.
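The distinction between susceptibility and ultimate attainment can be made concrete with a small numerical sketch. Everything in it is an assumption made purely for illustration: the shape of the `susceptibility` function, the end of the cp at age 15, the floor value and the fixed ten years of learning are all invented and are not taken from any of the studies discussed.

```python
# Hypothetical illustration: ultimate attainment as the integral of a
# susceptibility function over a fixed learning period.

def susceptibility(age, cp_end=15.0, floor=0.2):
    """Invented susceptibility curve: 1.0 at birth, declining linearly
    to `floor` at `cp_end`, and constant afterwards."""
    if age >= cp_end:
        return floor
    return 1.0 - (1.0 - floor) * age / cp_end


def ultimate_attainment(aoa, years_of_learning=10.0, step=0.01):
    """Integrate susceptibility from `aoa` to `aoa + years_of_learning`
    using the midpoint rule."""
    n_steps = int(round(years_of_learning / step))
    return step * sum(susceptibility(aoa + (i + 0.5) * step) for i in range(n_steps))


attainment = {aoa: ultimate_attainment(aoa) for aoa in (0, 5, 10, 15, 20, 30)}

# Drop in attainment across successive five-year steps in aoa.
drops = [attainment[a] - attainment[b] for a, b in ((0, 5), (5, 10), (10, 15), (15, 20))]

for aoa, ua in attainment.items():
    print(f"aoa {aoa:2d}: ua = {ua:.2f}")
```

Under these assumptions, susceptibility declines linearly up to age 15, yet attainment falls by unequal amounts across equal steps in aoa. Changing `years_of_learning` changes the shape of the attainment curve again, which is why the two functions should not be expected to share the same form.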

But even when the susceptibility and ultimate attainment variables are equated, there remains controversy as to what function linking age of onset of acquisition and ultimate attainment would actually constitute evidence for a critical period. Most scholars agree that not any kind of age effect constitutes such evidence. More specifically, the age of acquisition–ultimate attainment function would need to be different before and after the end of the cp [9] . According to Birdsong [9] , three basic possible patterns proposed in the literature meet this condition. These patterns are presented in Figure 1 . The first pattern describes a steep decline of the age of onset of acquisition ( aoa )–ultimate attainment ( ua ) function up to the end of the cp and a practically non-existent age effect thereafter. Pattern 2 is an “unconventional, although often implicitly invoked” [9, p. 17] notion of the cp function which contains a period of peak attainment (or performance at ceiling), i.e. performance does not vary as a function of age, which is often referred to as a ‘window of opportunity’. This time span is followed by an unbounded decline in ua depending on aoa . Pattern 3 includes characteristics of patterns 1 and 2. At the beginning of the aoa range, performance is at ceiling. The next segment is a downward slope in the age function which ends when performance reaches its floor. Birdsong points out that all of these patterns have been reported in the literature. On closer inspection, however, he concludes that the most convincing function describing these age effects is a simple linear one. Hakuta et al. [21] sketch further theoretically possible predictions of the cph in which the mean performance drops drastically and/or the slope of the aoa – ua proficiency function changes at a certain point.

The graphs are based on Figure 2 in [9].

https://doi.org/10.1371/journal.pone.0069172.g001
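The three patterns described by Birdsong can be written down as simple piecewise functions. The sketch below is only a schematic rendering: the breakpoint ages, the ceiling and floor values and the slope are hypothetical numbers chosen for readability, not estimates from the literature.

```python
# Schematic versions of the three aoa-ua patterns; all numbers are invented.

def pattern_1(aoa, cp_end=15.0, ceiling=100.0, floor=40.0):
    """Steep decline up to the end of the cp, no age effect afterwards."""
    if aoa >= cp_end:
        return floor
    return ceiling - (ceiling - floor) * aoa / cp_end


def pattern_2(aoa, cp_end=15.0, ceiling=100.0, slope=-2.0):
    """Performance at ceiling during the 'window of opportunity',
    followed by an unbounded decline."""
    if aoa <= cp_end:
        return ceiling
    return ceiling + slope * (aoa - cp_end)


def pattern_3(aoa, start=5.0, end=20.0, ceiling=100.0, floor=40.0):
    """Ceiling at first, then a downward slope that ends at the floor."""
    if aoa <= start:
        return ceiling
    if aoa >= end:
        return floor
    return ceiling - (ceiling - floor) * (aoa - start) / (end - start)


for aoa in (0, 10, 20, 30):
    print(aoa, pattern_1(aoa), pattern_2(aoa), pattern_3(aoa))
```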

Although several patterns have been proposed in the literature, it bears pointing out that the most common explicit prediction corresponds to Birdsong's first pattern, as exemplified by the following crystal-clear statement by DeKeyser, one of the foremost cph proponents:

[A] strong negative correlation between age of acquisition and ultimate attainment throughout the lifespan (or even from birth through middle age), the only age effect documented in many earlier studies, is not evidence for a critical period…[T]he critical period concept implies a break in the AoA–proficiency function, i.e., an age (somewhat variable from individual to individual, of course, and therefore an age range in the aggregate) after which the decline of success rate in one or more areas of language is much less pronounced and/or clearly due to different reasons. [22, p. 445].

DeKeyser, and before him Johnson and Newport [23] among others, thus conceptualise only one possible pattern which would speak in favour of a critical period: a clear negative age effect before the end of the critical period and a much weaker (if any) negative correlation between age and ultimate attainment after it. This ‘flattened slope’ prediction has the virtue of being much more tangible than the ‘potential nativelikeness’ prediction: Testing it does not necessarily require comparing the L2-learners to a native control group and thus effectively comparing apples and oranges. Rather, L2-learners with different aoas can be compared amongst themselves without the need to categorise them by means of a native-speaker yardstick, the validity of which is inevitably going to be controversial [15]. In what follows, I will concern myself solely with the ‘flattened slope’ prediction, arguing that, despite its clarity of formulation, cph research has generally used analytical methods that are irrelevant for the purposes of actually testing it.

Inferring non-linearities in critical period research: An overview

Group mean or proportion comparisons.

[T]he main differences can be found between the native group and all other groups – including the earliest learner group – and between the adolescence group and all other groups. However, neither the difference between the two childhood groups nor the one between the two adulthood groups reached significance, which indicates that the major changes in eventual perceived nativelikeness of L2 learners can be associated with adolescence. [15, p. 270].

Similar group comparisons aimed at investigating the effect of aoa on ua have been carried out by both cph advocates and sceptics (among whom Bialystok and Miller [25, pp. 136–139], Birdsong and Molis [26, p. 240], Flege [27, pp. 120–121], Flege et al. [28, pp. 85–86], Johnson [29, p. 229], Johnson and Newport [23, p. 78], McDonald [30, pp. 408–410] and Patkowski [31, pp. 456–458]). To be clear, not all of these authors drew direct conclusions about the aoa–ua function on the basis of these group comparisons, but their group comparisons have been cited as indicative of a cph-consistent non-continuous age effect, as exemplified by the following quote by DeKeyser [22]:

Where group comparisons are made, younger learners always do significantly better than the older learners. The behavioral evidence, then, suggests a non-continuous age effect with a “bend” in the AoA–proficiency function somewhere between ages 12 and 16. [22, p. 448].

The first problem with group comparisons like these and drawing inferences on the basis thereof is that they require that a continuous variable, aoa , be split up into discrete bins. More often than not, the boundaries between these bins are drawn in an arbitrary fashion, but what is more troublesome is the loss of information and statistical power that such discretisation entails (see [32] for the extreme case of dichotomisation). If we want to find out more about the relationship between aoa and ua , why throw away most of the aoa information and effectively reduce the ua data to group means and the variance in those groups?

Comparison of correlation coefficients.

Correlation-based inferences about slope discontinuities have similarly explicitly been made by cph advocates and skeptics alike, e.g. Bialystok and Miller [25, pp. 136 and 140], DeKeyser and colleagues [22] , [44] and Flege et al. [45, pp. 166 and 169]. Others did not explicitly infer the presence or absence of slope differences from the subset correlations they computed (among others Birdsong and Molis [26] , DeKeyser [8] , Flege et al. [28] and Johnson [29] ), but their studies nevertheless featured in overviews discussing discontinuities [14] , [22] . Indeed, the most recent overview draws a strong conclusion about the validity of the cph 's ‘flattened slope’ prediction on the basis of these subset correlations:

In those studies where the two groups are described separately, the correlation is much higher for the younger than for the older group, except in Birdsong and Molis (2001) [ =  [26] , JV], where there was a ceiling effect for the younger group. This global picture from more than a dozen studies provides support for the non-continuity of the decline in the AoA–proficiency function, which all researchers agree is a hallmark of a critical period phenomenon. [22, p. 448].

In Johnson and Newport's specific case [23] , their correlation-based inference that ua levels off after puberty happened to be largely correct: the gjt scores are more or less randomly distributed around a near-horizontal trend line [26] . Ultimately, however, it rests on the fallacy of confusing correlation coefficients with slopes, which seriously calls into question conclusions such as DeKeyser's (cf. the quote above).

https://doi.org/10.1371/journal.pone.0069172.g002

Lower correlation coefficients in older aoa groups may therefore be largely due to differences in ua variance, which have been reported in several studies [23] , [26] , [28] , [29] (see [46] for additional references). Greater variability in ua with increasing age is likely due to factors other than age proper [47] , such as the concomitant greater variability in exposure to literacy, degree of education, motivation and opportunity for language use, and by itself represents evidence neither in favour of nor against the cph .
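That correlation coefficients and slopes can come apart is easy to verify with made-up numbers. In the sketch below, the two groups, their aoa ranges, the score scale and the noise pattern are all invented for the sake of the example. Both groups are constructed to have exactly the same slope and differ only in how widely the scores scatter around the regression line.

```python
import math

# Hypothetical data: identical slope in both groups, different residual spread.

def slope_and_r(xs, ys):
    """Return the least-squares slope and Pearson's r for paired data."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx, sxy / math.sqrt(sxx * syy)


# Noise pattern chosen to be uncorrelated with eight consecutive aoa values,
# so that scaling it up leaves the fitted slope untouched.
noise = [1, -1, -1, 1, 1, -1, -1, 1]

young_aoa = list(range(4, 12))
old_aoa = list(range(20, 28))
young_gjt = [200 - 2 * x + 1 * e for x, e in zip(young_aoa, noise)]
old_gjt = [200 - 2 * x + 8 * e for x, e in zip(old_aoa, noise)]

young_slope, young_r = slope_and_r(young_aoa, young_gjt)
old_slope, old_r = slope_and_r(old_aoa, old_gjt)

print(f"young: slope = {young_slope:.2f}, r = {young_r:.2f}")
print(f"old:   slope = {old_slope:.2f}, r = {old_r:.2f}")
```

With these invented data the older group shows a markedly weaker correlation even though the aoa–gjt slope has not flattened at all. The difference in r reflects nothing but the greater variability in the older group.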

Regression approaches.

Having demonstrated that neither group mean or proportion comparisons nor correlation coefficient comparisons can directly address the ‘flattened slope’ prediction, I now turn to the studies in which regression models were computed with aoa as a predictor variable and ua as the outcome variable. Once again, this category of studies is not mutually exclusive with the two categories discussed above.

In a large-scale study using self-reports and approximate aoas derived from a sample of the 1990 U.S. Census, Stevens found that the probability with which immigrants from various countries stated that they spoke English ‘very well’ decreased curvilinearly as a function of aoa [48]. She noted that this development is similar to the pattern found by Johnson and Newport [23] but that it contains no indication of an “abruptly defined ‘critical’ or sensitive period in L2 learning” [48, p. 569]. However, she modelled the self-ratings using an ordinal logistic regression model in which the aoa variable was logarithmically transformed. Technically, this is perfectly fine, but one should be careful not to read too much into the non-linear curves found. In logistic models, the outcome variable itself is modelled linearly as a function of the predictor variables and is expressed in log-odds. In order to compute the corresponding probabilities, these log-odds are transformed using the logistic function. Consequently, even if the model is specified linearly, the predicted probabilities will not lie on a perfectly straight line when plotted as a function of any one continuous predictor variable. Similarly, when the predictor variable is first logarithmically transformed and then used to linearly predict an outcome variable, the function linking the predicted outcome variables and the untransformed predictor variable is necessarily non-linear. Thus, non-linearities follow naturally from Stevens's model specifications. Moreover, cph-consistent discontinuities in the aoa–ua function cannot be found using her model specifications as they did not contain any parameters allowing for this.
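Both sources of built-in curvature can be shown with a few lines of arithmetic. The intercept and slope values below are hypothetical and bear no relation to Stevens's actual estimates; the sketch merely shows that a model that is linear on the log-odds scale, or linear in a log-transformed predictor, produces curved predictions on the original scales.

```python
import math

def logistic(z):
    """Transform log-odds into a probability."""
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical logistic model: log-odds are a straight line in aoa.
INTERCEPT, SLOPE = 3.0, -0.15
ages = [0, 10, 20, 30, 40]
log_odds = [INTERCEPT + SLOPE * a for a in ages]
probabilities = [logistic(z) for z in log_odds]

# Change per ten years of aoa on each scale.
log_odds_steps = [b - a for a, b in zip(log_odds, log_odds[1:])]
probability_steps = [b - a for a, b in zip(probabilities, probabilities[1:])]

# Hypothetical linear model with a log-transformed predictor.
log_ages = [1, 11, 21, 31]
predicted = [10.0 - 2.0 * math.log(a) for a in log_ages]
predicted_steps = [b - a for a, b in zip(predicted, predicted[1:])]

print("log-odds steps:   ", [round(s, 3) for s in log_odds_steps])
print("probability steps:", [round(s, 3) for s in probability_steps])
print("log-predictor steps:", [round(s, 3) for s in predicted_steps])
```

The log-odds change by the same amount for every ten years of aoa, whereas the probabilities and the predictions from the log-transformed predictor do not. None of this curvature says anything about a critical period.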

Using data similar to Stevens's, Bialystok and Hakuta found that the link between the self-rated English competences of Chinese- and Spanish-speaking immigrants and their aoa could be described by a straight line [49] . In contrast to Stevens, Bialystok and Hakuta used a regression-based method allowing for changes in the function's slope, viz. locally weighted scatterplot smoothing ( lowess ). Informally, lowess is a non-parametrical method that relies on an algorithm that fits the dependent variable for small parts of the range of the independent variable whilst guaranteeing that the overall curve does not contain sudden jumps (for technical details, see [50] ). Hakuta et al. used an even larger sample from the same 1990 U.S. Census data on Chinese- and Spanish-speaking immigrants (2.3 million observations) [21] . Fitting lowess curves, no discontinuities in the aoa – ua slope could be detected. Moreover, the authors found that piecewise linear regression models, i.e. regression models containing a parameter that allows a sudden drop in the curve or a change of its slope, did not provide a better fit to the data than did an ordinary regression model without such a parameter.

To sum up, I have argued at length that regression approaches are superior to group mean and correlation coefficient comparisons for the purposes of testing the ‘flattened slope’ prediction. Acknowledging the reservations vis-à-vis self-estimated uas, we still find that while the relationship between aoa and ua is not necessarily perfectly linear in the studies discussed, the data do not lend unequivocal support to this prediction. In the following section, I will reanalyse data from a recent empirical paper on the cph by DeKeyser et al. [44]. The first goal of this reanalysis is to further illustrate some of the statistical fallacies encountered in cph studies. Second, by making the computer code available I hope to demonstrate how the relevant regression models, viz. piecewise regression models, can be fitted and how the aoa representing the optimal breakpoint can be identified. Lastly, the findings of this reanalysis will contribute to our understanding of how aoa affects ua as measured using a gjt.

Summary of DeKeyser et al. (2010)

I chose to reanalyse a recent empirical paper on the cph by DeKeyser et al. [44] (henceforth DK et al.). This paper lends itself well to a reanalysis since it exhibits two highly commendable qualities: the authors spell out their hypotheses lucidly and provide detailed numerical and graphical data descriptions. Moreover, the paper's lead author is very clear on what constitutes a necessary condition for accepting the cph: a non-linearity in the age of onset of acquisition (aoa)–ultimate attainment (ua) function, with ua declining less strongly as a function of aoa in older, post-cp arrivals compared to younger arrivals [14], [22]. Lastly, it claims to have found cross-linguistic evidence from two parallel studies backing the cph and should therefore be a source above suspicion for cph proponents.

The authors set out to test the following hypotheses:

  • Hypothesis 1: For both the L2 English and the L2 Hebrew group, the slope of the age of arrival–ultimate attainment function will not be linear throughout the lifespan, but will instead show a marked flattening between adolescence and adulthood.
  • Hypothesis 2: The relationship between aptitude and ultimate attainment will differ markedly for the young and older arrivals, with significance only for the latter. (DK et al., p. 417)

Both hypotheses were purportedly confirmed, which in the authors' view provides evidence in favour of cph . The problem with this conclusion, however, is that it is based on a comparison of correlation coefficients. As I have argued above, correlation coefficients are not to be confused with regression coefficients and cannot be used to directly address research hypotheses concerning slopes, such as Hypothesis 1. In what follows, I will reanalyse the relationship between DK et al.'s aoa and gjt data in order to address Hypothesis 1. Additionally, I will lay bare a problem with the way in which Hypothesis 2 was addressed. The extracted data and the computer code used for the reanalysis are provided as supplementary materials, allowing anyone interested to scrutinise and easily reproduce my whole analysis and carry out their own computations (see ‘supporting information’).

Data extraction

In order to verify whether we did in fact extract the data points to a satisfactory degree of accuracy, I computed summary statistics for the extracted aoa and gjt data and checked these against the descriptive statistics provided by DK et al. (pp. 421 and 427). These summary statistics for the extracted data are presented in Table 1 . In addition, I computed the correlation coefficients for the aoa – gjt relationship for the whole aoa range and for aoa -defined subgroups and checked these coefficients against those reported by DK et al. (pp. 423 and 428). The correlation coefficients computed using the extracted data are presented in Table 2 . Both checks strongly suggest the extracted data to be virtually identical to the original data, and Dr DeKeyser confirmed this to be the case in response to an earlier draft of the present paper (personal communication, 6 May 2013).

https://doi.org/10.1371/journal.pone.0069172.t001

https://doi.org/10.1371/journal.pone.0069172.t002

Results and Discussion

Modelling the link between age of onset of acquisition and ultimate attainment.

I first replotted the aoa and gjt data we extracted from DK et al.'s scatterplots and added non-parametric scatterplot smoothers in order to investigate whether any changes in slope in the aoa – gjt function could be revealed, as per Hypothesis 1. Figures 3 and 4 show this not to be the case. Indeed, simple linear regression models that model gjt as a function of aoa provide decent fits for both the North America and the Israel data, explaining 65% and 63% of the variance in gjt scores, respectively. The parameters of these models are given in Table 3 .

The trend line is a non-parametric scatterplot smoother. The scatterplot itself is a near-perfect replication of DK et al.'s Fig. 1.

https://doi.org/10.1371/journal.pone.0069172.g003

The trend line is a non-parametric scatterplot smoother. The scatterplot itself is a near-perfect replication of DK et al.'s Fig. 5.

https://doi.org/10.1371/journal.pone.0069172.g004

https://doi.org/10.1371/journal.pone.0069172.t003

To ensure that both segments are joined at the breakpoint, the predictor variable is first centred at the breakpoint value, i.e. the breakpoint value is subtracted from the original predictor variable values. For a blow-by-blow account of how such models can be fitted in R, I refer to an example analysis by Baayen [55, pp. 214–222].
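The centring step can be sketched in a few lines of plain Python. This is not the R code that accompanies the paper: the 'hinge' parameterisation used here (a slope before the breakpoint plus a change in slope after it) is one common way of writing a joined two-segment model and is an assumption on my part, as are the helper names `solve`, `ols`, `fit_piecewise` and `best_breakpoint`. The aoa and gjt values are invented and noise-free so that the result is easy to check.

```python
# Minimal sketch of a piecewise (breakpoint) regression on invented data.

def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [v - factor * p for v, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def ols(rows, ys):
    """Least-squares fit; returns (coefficients, residual sum of squares)."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(k)]
    beta = solve(xtx, xty)
    rss = sum((y - sum(b * v for b, v in zip(beta, r))) ** 2 for r, y in zip(rows, ys))
    return beta, rss


def fit_piecewise(aoa, gjt, breakpoint):
    """Centre aoa at the breakpoint so that both segments meet there.
    Coefficients: value at breakpoint, slope before, change in slope after."""
    rows = [[1.0, x - breakpoint, max(x - breakpoint, 0.0)] for x in aoa]
    return ols(rows, gjt)


def best_breakpoint(aoa, gjt, candidates):
    """Pick the candidate breakpoint with the smallest residual sum of squares."""
    return min(candidates, key=lambda bp: fit_piecewise(aoa, gjt, bp)[1])


# Invented data with a true change of slope at aoa 16.
aoa = list(range(1, 31))
gjt = [180 - 3 * x if x <= 16 else 132 - 0.5 * (x - 16) for x in aoa]

best = best_breakpoint(aoa, gjt, range(5, 26))
coefs, rss = fit_piecewise(aoa, gjt, best)
linear_rss = ols([[1.0, float(x)] for x in aoa], gjt)[1]

print("best breakpoint:", best)
print("coefficients:", [round(c, 3) for c in coefs])
print("rss with breakpoint:", round(rss, 6), "without:", round(linear_rss, 3))
```

With real data the candidate breakpoints would be compared in the same way, and the improvement over the model without a breakpoint would still have to be tested formally rather than read off the residual sums of squares.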

Solid: regression with breakpoint at aoa 18 (dashed lines represent its 95% confidence interval); dot-dash: regression without breakpoint.

https://doi.org/10.1371/journal.pone.0069172.g005

Solid: regression with breakpoint at aoa 18 (dashed lines represent its 95% confidence interval); dot-dash (hardly visible due to near-complete overlap): regression without breakpoint.

https://doi.org/10.1371/journal.pone.0069172.g006

https://doi.org/10.1371/journal.pone.0069172.t004

https://doi.org/10.1371/journal.pone.0069172.g007

Solid: regression with breakpoint at aoa 16 (dashed lines represent its 95% confidence interval); dot-dash: regression without breakpoint.

https://doi.org/10.1371/journal.pone.0069172.g008

Solid: regression with breakpoint at aoa 6 (dashed lines represent its 95% confidence interval); dot-dash (hardly visible due to near-complete overlap): regression without breakpoint.

https://doi.org/10.1371/journal.pone.0069172.g009

https://doi.org/10.1371/journal.pone.0069172.t005

https://doi.org/10.1371/journal.pone.0069172.t006

https://doi.org/10.1371/journal.pone.0069172.t007

https://doi.org/10.1371/journal.pone.0069172.t008

In sum, a regression model that allows for changes in the slope of the aoa–gjt function to account for putative critical period effects provides a somewhat better fit to the North American data than does an everyday simple regression model. The improvement in model fit is marginal, however, and including a breakpoint does not result in any detectable improvement of model fit to the Israel data whatsoever. Breakpoint models therefore fail to provide solid cross-linguistic support in favour of critical period effects: across both data sets, gjt can satisfactorily be modelled as a linear function of aoa.

On partialling out ‘age at testing’

As I have argued above, correlation coefficients cannot be used to test hypotheses about slopes. When the correct procedure is carried out on DK et al.'s data, no cross-linguistically robust evidence for changes in the aoa – gjt function was found. In addition to comparing the zero-order correlations between aoa and gjt , however, DK et al. computed partial correlations in which the variance in aoa associated with the participants' age at testing ( aat ; a potentially confounding variable) was filtered out. They found that these partial correlations between aoa and gjt , which are given in Table 9 , differed between age groups in that they are stronger for younger than for older participants. This, DK et al. argue, constitutes additional evidence in favour of the cph . At this point, I can no longer provide my own analysis of DK et al.'s data seeing as the pertinent data points were not plotted. Nevertheless, the detailed descriptions by DK et al. strongly suggest that the use of these partial correlations is highly problematic. Most importantly, and to reiterate, correlations (whether zero-order or partial ones) are actually of no use when testing hypotheses concerning slopes. Still, one may wonder why the partial correlations differ across age groups. My surmise is that these differences are at least partly the by-product of an imbalance in the sampling procedure.

https://doi.org/10.1371/journal.pone.0069172.t009

The upshot of this brief discussion is that the partial correlation differences reported by DK et al. are at least partly the result of an imbalance in the sampling procedure: aoa and aat were simply less intimately tied for the young arrivals in the North America study than for the older arrivals with L2 English or for all of the L2 Hebrew participants. In an ideal world, we would like to fix aat or ascertain that it at most only weakly correlates with aoa . This, however, would result in a strong correlation between aoa and another potential confound variable, length of residence in the L2 environment, bringing us back to square one. Allowing for only moderate correlations between aoa and aat might improve our predicament somewhat, but even in that case, we should tread lightly when making inferences on the basis of statistical control procedures [61] .
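How strongly a partial correlation depends on the link between the predictor and the control variable can be seen from the standard first-order partial correlation formula. The sketch below uses invented correlation values, not DK et al.'s, and keeps the zero-order correlations with gjt fixed while varying only how closely aoa and aat are tied. Whether DK et al. computed their partial correlations in exactly this way is not something I can confirm from the text, so the formula serves as an illustration only.

```python
import math

def partial_correlation(r_xy, r_xz, r_yz):
    """First-order partial correlation between x and y, controlling for z."""
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


# Hypothetical zero-order correlations, identical in both scenarios.
R_AOA_GJT = -0.6
R_AAT_GJT = -0.5

# Scenario 1: aoa and aat only loosely tied.
loosely_tied = partial_correlation(R_AOA_GJT, 0.2, R_AAT_GJT)

# Scenario 2: aoa and aat closely tied.
closely_tied = partial_correlation(R_AOA_GJT, 0.8, R_AAT_GJT)

print(f"aoa-aat r = 0.2: partial r = {loosely_tied:.3f}")
print(f"aoa-aat r = 0.8: partial r = {closely_tied:.3f}")
```

In this made-up example the partial correlation is considerably weaker in the scenario where aoa and aat are closely tied, although the relationship between aoa and gjt is the same in both.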

On estimating the role of aptitude

Having shown that Hypothesis 1 could not be confirmed, I now turn to Hypothesis 2, which predicts a differential role of aptitude for ua in sla in different aoa groups. More specifically, it states that the correlation between aptitude and gjt performance will be significant only for older arrivals. The correlation coefficients of the relationship between aptitude and gjt are presented in Table 10 .

https://doi.org/10.1371/journal.pone.0069172.t010

The problem with both the wording of Hypothesis 2 and the way in which it is addressed is the following: it is assumed that a variable has a reliably different effect in different groups when the effect reaches significance in one group but not in the other. This logic is fairly widespread within several scientific disciplines (see e.g. [62] for a discussion). Nonetheless, it is demonstrably fallacious [63] . Here we will illustrate the fallacy for the specific case of comparing two correlation coefficients.
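The logic of the fallacy can be traced with two hypothetical correlation coefficients. The values of r and the group sizes below are invented, and the p-values rely on Fisher's z-transformation with a normal approximation, which is an assumption of this sketch rather than the procedure used in any of the studies discussed.

```python
import math

def two_sided_p(z):
    """Two-sided p-value for a standard normal test statistic."""
    return math.erfc(abs(z) / math.sqrt(2.0))


def p_single_r(r, n):
    """Approximate p-value for H0: rho = 0, using Fisher's z-transformation."""
    return two_sided_p(math.atanh(r) * math.sqrt(n - 3))


def p_r_difference(r1, n1, r2, n2):
    """Approximate p-value for H0: rho1 = rho2 in two independent groups."""
    standard_error = math.sqrt(1.0 / (n1 - 3) + 1.0 / (n2 - 3))
    return two_sided_p((math.atanh(r1) - math.atanh(r2)) / standard_error)


# Hypothetical aptitude-gjt correlations in two groups of 25 participants each.
p_older = p_single_r(0.45, 25)
p_younger = p_single_r(0.30, 25)
p_difference = p_r_difference(0.45, 25, 0.30, 25)

print(f"older arrivals:   p = {p_older:.3f}")
print(f"younger arrivals: p = {p_younger:.3f}")
print(f"difference:       p = {p_difference:.3f}")
```

In this invented case one correlation is significant at the 0.05 level and the other is not, yet the difference between the two is nowhere near significant. A significant effect in one group and a non-significant effect in another is therefore not evidence that the effects differ.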

Apart from not being replicated in the North America study, does this difference actually show anything? I contend that it does not: what is of interest are not so much the correlation coefficients, but rather the interactions between aoa and aptitude in models predicting gjt . These interactions could be investigated by fitting a multiple regression model in which the postulated cp breakpoint governs the slope of both aoa and aptitude. If such a model provided a substantially better fit to the data than a model without a breakpoint for the aptitude slope and if the aptitude slope changes in the expected direction (i.e. a steeper slope for post- cp than for younger arrivals) for different L1–L2 pairings, only then would this particular prediction of the cph be borne out.

Using data extracted from a paper reporting on two recent studies that purport to provide evidence in favour of the cph and that, according to its authors, represent a major improvement over earlier studies (DK et al., p. 417), it was found that neither of its two hypotheses was actually confirmed when using the proper statistical tools. As a matter of fact, the gjt scores continue to decline at essentially the same rate even beyond the end of the putative critical period. According to the paper's lead author, such a finding represents a serious problem to his conceptualisation of the cph [14]. Moreover, although modelling a breakpoint representing the end of a cp at aoa 16 may improve the statistical model slightly in the study on learners of English in North America, the study on learners of Hebrew in Israel fails to confirm this finding. In fact, even if we were to accept the optimal breakpoint computed for the Israel study, it lies at aoa 6 and is associated with a different geometrical pattern.

Diverging age trends in parallel studies with participants with different L2s have similarly been reported by Birdsong and Molis [26] and are at odds with an L2-independent cph . One parsimonious explanation of such conflicting age trends may be that the overall, cross-linguistic age trend is in fact linear, but that fluctuations in the data (due to factors unaccounted for or randomness) may sometimes give rise to a ‘stretched L’-shaped pattern ( Figure 1, left panel ) and sometimes to a ‘stretched 7’-shaped pattern ( Figure 1 , middle panel; see also [66] for a similar comment).

Importantly, the criticism that DeKeyser and Larson-Hall level against two studies reporting findings similar to the present [48], [49], viz. that the data consisted of self-ratings of questionable validity [14], does not apply to the present data set. In addition, DK et al. did not exclude any outliers from their analyses, so I assume that DeKeyser and Larson-Hall's criticism [14] of Birdsong and Molis's study [26], i.e. that the findings were due to the influence of outliers, is not applicable to the present data either. For good measure, however, I refitted the regression models with and without breakpoints after excluding one potentially problematic data point per model. The following data points had absolute standardised residuals larger than 2.5 in the original models without breakpoints as well as in those with breakpoints: the participant with aoa 17 and a gjt score of 125 in the North America study and the participant with aoa 12 and a gjt score of 117 in the Israel study. The resultant models were virtually identical to the original models (see Script S1). Furthermore, the aoa variable was sufficiently fine-grained and the aoa–gjt curve was not ‘presmoothed’ by the prior aggregation of gjt across parts of the aoa range (see [51] for such a criticism of another study). Lastly, seven of the nine “problems with supposed counter-evidence” to the cph discussed by Long [5] do not apply either, viz. (1) “[c]onfusion of rate and ultimate attainment”, (2) “[i]nappropriate choice of subjects”, (3) “[m]easurement of AO”, (4) “[l]eading instructions to raters”, (6) “[u]se of markedly non-native samples making near-native samples more likely to sound native to raters”, (7) “[u]nreliable or invalid measures”, and (8) “[i]nappropriate L1–L2 pairings”. Problem No. 5 (“Assessments based on limited samples and/or “language-like” behavior”) may be apropos given that only gjt data were used, leaving open the theoretical possibility that other measures might have yielded a different outcome. Finally, problem No. 9 (“Faulty interpretation of statistical patterns”) is, of course, precisely what I have turned the spotlights on.

Conclusions

The critical period hypothesis remains a hotly contested issue in the psycholinguistics of second-language acquisition. Discussions about the impact of empirical findings on the tenability of the cph generally revolve around the reliability of the data gathered (e.g. [5], [14], [22], [52], [67], [68]) and such methodological critiques are of course highly desirable. Furthermore, the debate often centres on the question of exactly what version of the cph is being vindicated or debunked. These versions differ mainly in terms of their scope, specifically with regard to the relevant age span, setting and language area, and the testable predictions they make. But even when the cph's scope is clearly demarcated and its main prediction is spelt out lucidly, the issue remains to what extent the empirical findings can actually be marshalled in support of the relevant cph version. As I have shown in this paper, empirical data have often been taken to support cph versions predicting that the relationship between age of acquisition and ultimate attainment is not strictly linear, even though the statistical tools most commonly used (notably group mean and correlation coefficient comparisons) were, crudely put, irrelevant to this prediction. Methods that are arguably valid, e.g. piecewise regression and scatterplot smoothing, have been used in some studies [21], [26], [49], but these studies have been criticised on other grounds. To my knowledge, such methods have never been used by scholars who explicitly subscribe to the cph.

I suspect that what may be going on is a form of ‘confirmation bias’ [69], a cognitive bias at play in diverse branches of human knowledge seeking: Findings judged to be consistent with one's own hypothesis are hardly questioned, whereas findings inconsistent with one's own hypothesis are scrutinised much more strongly and criticised on all sorts of points [70]–[73]. My reanalysis of DK et al.'s recent paper may be a case in point. cph exponents used correlation coefficients to address their prediction about the slope of a function, as had been done in a host of earlier studies. Finding a result that squared with their expectations, they did not question the technical validity of their results, or at least they did not report this. (In fact, my reanalysis is actually a case in point in two respects: for an earlier draft of this paper, I had computed the optimal position of the breakpoints incorrectly, resulting in a non-significant improvement of model fit for the North American data rather than a borderline significant one. Finding a result that squared with my expectations, I did not question the technical validity of my results – until this error was kindly pointed out to me by Martijn Wieling (University of Tübingen).) That said, I am keen to point out that the statistical analyses in this particular paper, though suboptimal, are, as far as I could gather, reported correctly, i.e. the confirmation bias does not seem to have resulted in the blatant misreportings found elsewhere (see [74] for empirical evidence and discussion). An additional point to these authors' credit is that, apart from explicitly identifying their cph version's scope and making crystal-clear predictions, they present data descriptions that actually permit quantitative reassessments and have a history of doing so (e.g. the appendix in [8]). This leads me to believe that they analysed their data all in good conscience and to hope that they, too, will conclude that their own data do not, in fact, support their hypothesis.

I end this paper on an upbeat note. Even though I have argued that the analytical tools employed in cph research generally leave much to be desired, the original data are, so I hope, still available. This provides researchers, cph supporters and sceptics alike, with an exciting opportunity to reanalyse their data sets using the tools outlined in the present paper and publish their findings at minimal cost of time and resources (for instance, as a comment to this paper). I would therefore encourage scholars to engage their old data sets and to communicate their analyses openly, e.g. by voluntarily publishing their data and computer code alongside their articles or comments. Ideally, cph supporters and sceptics would join forces to agree on a protocol for a high-powered study in order to provide a truly convincing answer to a core issue in sla .

Supporting Information

Dataset S1.

AOA and GJT data extracted from DeKeyser et al.'s North America study.

https://doi.org/10.1371/journal.pone.0069172.s001

Dataset S2.

AOA and GJT data extracted from DeKeyser et al.'s Israel study.

https://doi.org/10.1371/journal.pone.0069172.s002

Script with annotated R code used for the reanalysis. All add-on packages used can be installed from within R.

https://doi.org/10.1371/journal.pone.0069172.s003

Acknowledgments

I would like to thank Irmtraud Kaiser (University of Fribourg) for helping me to get an overview of the literature on the critical period hypothesis in second language acquisition. Thanks are also due to Martijn Wieling (currently University of Tübingen) for pointing out an error in the R code accompanying an earlier draft of this paper.

Author Contributions

Analyzed the data: JV. Wrote the paper: JV.

  • 1. Penfield W, Roberts L (1959) Speech and brain mechanisms. Princeton: Princeton University Press.
  • 2. Lenneberg EH (1967) Biological foundations of language. New York: Wiley.
  • 10. Long MH (2007) Problems in SLA. Mahwah, NJ: Lawrence Erlbaum.
  • 14. DeKeyser R, Larson-Hall J (2005) What does the critical period really mean? In: Kroll and De Groot [75], 88–108.
  • 19. Newport EL (1991) Contrasting conceptions of the critical period for language. In: Carey S, Gelman R, editors, The epigenesis of mind: Essays on biology and cognition, Hillsdale, NJ: Lawrence Erlbaum. 111–130.
  • 20. Birdsong D (2005) Interpreting age effects in second language acquisition. In: Kroll and De Groot [75], 109–127.
  • 22. DeKeyser R (2012) Age effects in second language learning. In: Gass SM, Mackey A, editors, The Routledge handbook of second language acquisition, London: Routledge. 442–460.
  • 24. Weisstein EW. Discontinuity. From MathWorld – A Wolfram Web Resource. Available: http://mathworld.wolfram.com/Discontinuity.html. Accessed 2012 March 2.
  • 27. Flege JE (1999) Age of learning and second language speech. In: Birdsong [76], 101–132.
  • 36. Champely S (2009) pwr: Basic functions for power analysis. Available: http://cran.r-project.org/package=pwr . R package, version 1.1.1.
  • 37. R Core Team (2013) R: A language and environment for statistical computing. Available: http://www.r-project.org/ . Software, version 2.15.3.
  • 47. Hyltenstam K, Abrahamsson N (2003) Maturational constraints in SLA. In: Doughty CJ, Long MH, editors, The handbook of second language acquisition, Malden, MA: Blackwell. 539–588.
  • 49. Bialystok E, Hakuta K (1999) Confounded age: Linguistic and cognitive factors in age differences for second language acquisition. In: Birdsong [76], 161–181.
  • 52. DeKeyser R (2006) A critique of recent arguments against the critical period hypothesis. In: Abello-Contesse C, Chacón-Beltrán R, López-Jiménez MD, Torreblanca-López MM, editors, Age in L2 acquisition and teaching, Bern: Peter Lang. 49–58.
  • 55. Baayen RH (2008) Analyzing linguistic data: A practical introduction to statistics using R. Cambridge: Cambridge University Press.
  • 56. Fox J (2002) Robust regression. Appendix to An R and S-Plus Companion to Applied Regression. Available: http://cran.r-project.org/doc/contrib/Fox-Companion/appendix.html .
  • 57. Ripley B, Hornik K, Gebhardt A, Firth D (2012) MASS: Support functions and datasets for Venables and Ripley's MASS. Available: http://cran.r-project.org/package=MASS . R package, version 7.3–17.
  • 58. Zuur AF, Ieno EN, Walker NJ, Saveliev AA, Smith GM (2009) Mixed effects models and extensions in ecology with R. New York: Springer.
  • 59. Pinheiro J, Bates D, DebRoy S, Sarkar D, R Core Team (2013) nlme: Linear and nonlinear mixed effects models. Available: http://cran.r-project.org/package=nlme . R package, version 3.1–108.
  • 65. Field A (2009) Discovering statistics using SPSS. London: SAGE, 3rd edition.
  • 66. Birdsong D (2009) Age and the end state of second language acquisition. In: Ritchie WC, Bhatia TK, editors, The new handbook of second language acquisition, Bingley: Emerald. 401–424.
  • 75. Kroll JF, De Groot AMB, editors (2005) Handbook of bilingualism: Psycholinguistic approaches. New York: Oxford University Press.
  • 76. Birdsong D, editor (1999) Second language acquisition and the critical period hypothesis. Mahwah, NJ: Lawrence Erlbaum.

Stephen Krashen’s Five Hypotheses of Second Language Acquisition

Interested in learning more about linguistics and linguists? Read on.

What is linguistics? Linguistics is the scientific study of language that involves the analysis of language rules, language meaning, and language context. In other words, linguistics is the study of how a language is formed and how it works.

A person who studies linguistics is called a linguist. A linguist doesn't necessarily have to learn different languages because they’re more interested in learning the structures of languages. Noam Chomsky and Dr. Stephen Krashen are two of the world’s most famous linguists.

Dr. Stephen D. Krashen facilitated research in second-language acquisition, bilingual education, and reading. He believes that language acquisition requires “meaningful interaction with the target language.”

Dr. Krashen also proposed five hypotheses of second language acquisition, which have been very influential in the field of second language research and teaching.

Let’s take a look at these hypotheses. Who knows, maybe you’ve applied one or all of them in your language learning journey!

1. Acquisition-Learning Hypothesis

The Acquisition-Learning Hypothesis states that there is a distinction between language acquisition and language learning. In language acquisition, the student acquires language unconsciously. This is similar to when a child picks up their first language. On the other hand, language learning happens when the student is consciously discovering and learning the rules and grammatical structures of the language.

2. Monitor Hypothesis

The Monitor Hypothesis states that the learner is consciously learning the grammar rules and functions of a language rather than its meaning. This theory focuses more on the correctness of the language. To use the Monitor Hypothesis properly, three standards must be met:

  • The acquirer must know the rules of the language.
  • The acquirer must concentrate on the exact form of the language.
  • The acquirer must have enough time to review and apply the language rules in a conversation. This is a tricky one, because in regular conversations there’s hardly enough time to ensure correctness of the language.

3. Natural Order Hypothesis

The Natural Order Hypothesis is based on the finding that language learners learn grammatical structures in a fixed and universal order. There is a sense of predictability to this kind of learning, which is similar to how a speaker learns their first language.

4. Input Hypothesis

The Input Hypothesis places more emphasis on the acquisition of the second language. This theory is more concerned with how the language is acquired than with how it is learned.

Moreover, the Input Hypothesis states that the learner naturally develops language as soon as they receive interesting and fun information.

5. Affective Filter Hypothesis

According to the Affective Filter Hypothesis, language acquisition can be affected by emotional factors. If the affective filter is higher, then the student is less likely to learn the language. Therefore, the learning environment for the student must be positive and stress-free so that the student is open to input.

Language acquisition is a subconscious process. Usually, language acquirers are aware that they’re using the language for communication but are unaware that they are acquiring the language.

Language acquirers are also unaware of the rules of the language they are acquiring. Instead, they feel a sense of correctness when a sentence sounds and feels right. Strange, right? But it is also quite fascinating.

Acquiring a language is a tedious process. It can seem more like a chore, a game of should I learn today or should I just do something else? Sigh

But Dr. Krashen’s language acquisition theories might be onto something, don’t you think? Learning a language should be fun and in some way it should happen naturally. Try to engage in meaningful interactions like reading exciting stories and relevant news articles, even talking with friends and family in a different language. Indulge in interesting and easy to understand language activities, and by then you might already have slowly started acquiring your target language!

What Toddlers and AI Can Learn From Each Other

A scientist put a head cam on his daughter to capture the human experience of acquiring words.

When Luna was seven months old, she began wearing, at the behest of her scientist father, a hot-pink helmet topped with a camera that would, for about an hour at a time, capture everything she saw, heard, and said.

Her dad, Brenden Lake, is a cognitive scientist at New York University, where he thinks about better ways to train artificial intelligence. At home, he trains human intelligence, by which I just mean that he’s a dad. On a recent Sunday morning, he held up a robot puppet and asked Luna, who was meting out her wooden toys, “That’s for robot?” “Oh, goodness!” he added in a silly Muppet voice. Luna seemed only half-interested—in the way small children are always sort of on their own planet—but a couple of minutes later, she returned to pick up the puppet. “Robot,” she said. “Robot,” she repeated, dispelling any doubt about her intentions. Her dad turned to me, surprised; he’d never heard her say “robot” before. Had she learned the word just now?

At one and a half years old, Luna has mastered a technique that current AI models still struggle with. Humans are able to learn from very few examples, meaning that even a single encounter can solidify the connection between a silver hand puppet and the phonemes that comprise robot. Artificial intelligence, by contrast, might need dozens or hundreds of examples; large language models such as the one powering ChatGPT are trained on hundreds of billions, if not trillions, of words—an inhuman amount of data. “It would take 1,000 years to hear a word count of that magnitude,” Lake told me. Given that humans require far less time—and far fewer words—to master language, could AI be trained more efficiently? Could it learn more like, say, a toddler?

These questions are what initially motivated Lake to record his daughter’s early life. (He convinced his wife with a more sentimental pitch: They could capture and replay Luna’s baby milestones.) Along with 25 or so other babies, Luna is part of the BabyView study, a project run out of Stanford that aims to capture exactly what young kids see and hear in the crucial period when they’re picking up language at a shocking speed. Lake hopes to one day feed the data from Luna and others back into his own models—to find better ways of training AI, and to find better ways of understanding how children pull off the ubiquitous yet remarkable feat of learning language.

Luna at 18 months wearing a pink helmet with a camera attached to it

Recent technological leaps—in artificial intelligence but also in hardware—have given scientists new tools to study developmental psychology. Cameras and microphones are now small and light enough for infants to wear for longer stretches, including at home. In the early 2010s, Michael Frank, a developmental psychologist at Stanford who now leads the BabyView study, decided along with two colleagues to put head cams on their own babies. They would track their kids’ development from about six months, when babies have enough neck strength not to be bothered by a camera, to around two and a half years, when toddlers really start to protest. Frank’s baby, however, refused to consent from the start; she absolutely loathed having anything on her head. “I didn’t have the fortitude” to continue, he told me, and his daughter dropped out. But the data collected from the two other babies—and later a third—were released in 2021 as a research data set called SAYCam.

Not long after, Frank decided to go bigger and more ambitious with BabyView, which has the same idea but would feature more babies, crisper audio, and higher-resolution video. The resulting data will be shared online, but to protect the privacy of the babies, it’ll be accessible only to institutional researchers, and participants can choose to delete videos well before they are shared.

Lake decided to sign his daughter up for BabyView—fortunately, Luna tolerates a head cam just fine—because he was immediately interested in using the SAYCam corpus to train AI. On a basic level, would it even work? His group at NYU published a much-publicized paper in Science this past winter, which showed that even AI models trained on 61 hours of low-res video, or just 1 percent of the waking hours of one SAYCam baby, could classify images that showed objects including a ball, a cat, and a car. A suite of other studies from his lab has found that AI models trained on SAYCam can form their own categories such as “food,” “vehicle,” and “clothing,” or clusters of words that correspond to nouns or verbs—as you might expect a young toddler to do as they learn about the world.

To be clear, Lake and his colleagues do not claim to have replicated in silico how toddlers actually learn. The models are trained, after all, on snippets of video and text—a poor imitation of the rich sensory experience of being in a physical world. But the studies are most interesting as proof of concept. In the field of language acquisition, for example, experts have long debated the extent to which babies are born with innate knowledge, strategies, and biases that prime them for language. On one extreme, one could posit that babies are born as blank slates. The AI models definitely started as blank slates; if training them with just a small percentage of a baby’s audiovisual experience can get them to classify balls and cats, that shows how a neural network can learn “starting from nothing,” says Wai Keen Vong, a research scientist with Lake at NYU who was the lead author on the paper. By adult-human standards, though, the model might not be that impressive; its overall accuracy was just over 60 percent. Maybe it needs more data, or maybe it needs a different way of learning.

This is where things could get interesting. Lake would like to equip artificial intelligence with some of the strategies babies seem to display in lab experiments. For example, when young children are presented with a new word—such as kettle—they seem to instinctively know that kettle refers to the entirety of the kettle, not just to its handle or its material or its color. When they are presented with two objects—one familiar and one unfamiliar—they will assume that a new word they hear refers to the new object. These strategies likely help babies sift through the cluttered, chaotic world of their everyday life, and they might help artificial intelligence learn more like a child too, though AI is far, far from actually imitating a child.

That said, AI models could also inspire new ideas about how children learn. Chen Yu, a developmental psychologist at the University of Texas at Austin, told me about a study he conducted with his collaborators, in which parents and children wore head cams as they played with toys in a lab. Curiously, Yu and his collaborators noticed that a computer vision model trained on the child’s POV outperformed one trained on the parents’. What about a child’s perspective is more conducive to learning? They wondered if children were manipulating the toys more thoroughly, turning them back and forth to see the objects from different angles. With these AI-enabled approaches, Yu said, researchers can generate new hypotheses that can then be tested back in the lab. Linda Smith, a frequent collaborator of Yu’s and a longtime researcher of children’s cognitive development at Indiana University, told me that when she got her start, decades ago, “artificial intelligence and human cognition were one field. It was all the same people.” The fields may have since diverged, but the overlap still makes perfect sense.

In his academic career, Lake, who had previously taught an AI model how handwriting works, has also been seeking out ways to create an AI that learns more like a human. This naturally led him to how children learn. “Children are the most impressive learners in the known universe,” he told me. After having kids of his own, he thought parenting might inspire fresh insights for his research. Has it? I probed, curious because I too have a 1-year-old at home, whose intellectual progression is possibly the most remarkable thing I have ever witnessed. Not really, he admitted. Watching children learn is so fascinating, so surprising, so fun. But the process is also so intuitive—if it was that easy for any parent to understand how their child learns, wouldn’t we have figured it out already?

Fabiola Fadda-Ginski: A Story of Empowerment Through Language

Can you share your journey in the field of education, specifically focusing on language acquisition and your role as the World Language Programs Director for Chicago Public Schools?

My journey in education, rooted in a passion for languages instilled by my mother, began in Sardinia, Italy, where I grew up in a bilingual household. My formal education in languages took off at the Oxford Institute of Rome, leading to a degree in foreign languages and interpretation. After I moved to the U.S. I began volunteering at my son's school, which inspired me to pursue a career in education. Now, as the Director of World Language Programs for Chicago Public Schools, I support 800 teachers across 238 schools in teaching 11 languages, blending my personal passion and professional goals to inspire others through the power of language.

Reflecting on your experiences, what advice would you give to women pursuing leadership roles in education?

Reflecting on my journey, the advice I'd give to women pursuing leadership roles in education is to always follow your passion and not shy away from hard work. It's crucial to find your voice and use it. Being present and vocal in spaces where decisions are made is essential because silence can often lead to being overlooked. For me, advocating for my teachers and students has always been a priority. I encourage others to confidently find their own voice and use it to make an impact. It's about making sure you're heard, representing those you're responsible for, and not letting any barriers hold you back from achieving your goals.

How do you approach the challenges of language education in diverse educational settings?

Approaching the challenges of language education, especially in diverse settings, requires a student-centered mindset. I prioritize understanding the unique needs, dreams, and identities of each student. Teaching English to minoritized students, for example, is crucial because it serves as a 'power code' in the U.S., enabling students to navigate and understand different cultures without losing sight of their own identities. My goal through language education is not just to teach a new language but to open students' minds and build bridges between cultures, fostering a more harmonious world. It's about empowering students with the tools they need to thrive in a global society while celebrating their individual heritage and contributions.

In honor of Women's History Month, who are some women in the field of education or linguistics who have inspired you?

Two women who have profoundly inspired me in the field of education and linguistics are Dr. Karime Asaf, the Chief of Language and Culture at CPS, and Dr. Gholdy Muhammad. Dr. Asaf's journey, much like my own, showcases the power of immigrant women in leadership roles within education. Her efforts to integrate bilingual and world language education have been groundbreaking, truly moving mountains in a system where resources can be scarce. Dr. Muhammad's approach to finding joy in teaching resonates deeply with me. Her philosophy emphasizes the growth and joy in the educational process, not just for students but for teachers as well. Both women embody passionate leadership and innovative teaching methods that have significantly impacted the field, inspiring me to continue advocating for comprehensive language education.

Could you discuss the significance of multilingual education in today's globalized world and how it impacts students' futures?

In today's globalized world, multilingual education is not just an asset; it's a necessity. It equips students with the flexibility, empathy, and comprehensive understanding needed to navigate diverse perspectives. This form of education broadens their horizons, allowing them to connect with people across different cultures and backgrounds. I've witnessed countless success stories where students, through their language skills, have secured opportunities that were once beyond their reach—be it scholarships, international careers, or the ability to make significant societal contributions. Multilingual education opens doors to a world of possibilities, preparing students not just for success in their personal and professional lives but also enabling them to act as bridges between cultures, fostering a more inclusive and understanding global community.

How do you envision the future of world language programs in public schools, especially considering technological advancements and cultural shifts?

Envisioning the future of world language programs, I'm optimistic about the role of technology, like AI, in enriching language learning. However, concerns remain over the sustainability of these programs amid teacher shortages and university department closures. Technology should enhance, not replace, the invaluable human elements of teaching. It's crucial to advocate for support and innovation in language education, ensuring programs continue to prepare students as global citizens in a connected world.

Could you share a memorable success story from your career that highlights the impact of effective language teaching on students' lives?

One of the most memorable success stories from my career involves a student who went on to receive a full scholarship to study Arabic at a prestigious university. This student, like many others I've had the privilege to teach, came back to thank me, attributing their success to the language skills and cultural understanding we developed together. Witnessing former students achieve their dreams and make positive contributions to the world is incredibly rewarding. It reaffirms the profound impact that dedicated educators and effective language teaching can have, not just academically, but in shaping students' futures and helping them realize their potential.

Finally, how do you balance your responsibilities at Northwestern and the Chicago Public Schools, and what synergies have you found between these roles?

Balancing my roles at Northwestern and the Chicago Public Schools involves what I like to call "cross-pollination." I bring real-world experiences from the schools into the university classroom, and I take the latest research and teaching strategies from Northwestern back to the public schools. This symbiotic relationship enhances my ability to lead and teach, fostering innovation and growth in both settings. It's a dynamic exchange that not only enriches my professional practice but also benefits my students and colleagues across both institutions. This synergy is central to my approach, allowing me to contribute meaningfully to the advancement of education and language learning.

PLOS ONE

The Critical Period Hypothesis in Second Language Acquisition: A Statistical Critique and a Reanalysis

Jan Vanhove

Department of Multilingualism, University of Fribourg, Fribourg, Switzerland

Analyzed the data: JV. Wrote the paper: JV.

In second language acquisition research, the critical period hypothesis (CPH) holds that the function between learners' age and their susceptibility to second language input is non-linear. This paper revisits the indistinctness found in the literature with regard to this hypothesis's scope and predictions. Even when its scope is clearly delineated and its predictions are spelt out, however, empirical studies – with few exceptions – use analytical (statistical) tools that are irrelevant with respect to the predictions made. This paper discusses statistical fallacies common in CPH research and illustrates an alternative analytical method (piecewise regression) by means of a reanalysis of two datasets from a 2010 paper purporting to have found cross-linguistic evidence in favour of the CPH. This reanalysis reveals that the specific age patterns predicted by the CPH are not cross-linguistically robust. Applying the principle of parsimony, it is concluded that age patterns in second language acquisition are not governed by a critical period. To conclude, this paper highlights the role of confirmation bias in the scientific enterprise and appeals to second language acquisition researchers to reanalyse their old datasets using the methods discussed in this paper. The data and R commands that were used for the reanalysis are provided as supplementary materials.

Introduction

In the long term and in immersion contexts, second-language (L2) learners starting acquisition early in life – and staying exposed to input and thus learning over several years or decades – undisputedly tend to outperform later learners. Apart from being misinterpreted as an argument in favour of early foreign language instruction, which takes place in wholly different circumstances, this general age effect is also sometimes taken as evidence for a so-called ‘critical period’ (CP) for second-language acquisition (SLA). Derived from biology, the CP concept was famously introduced into the field of language acquisition by Penfield and Roberts in 1959 [1] and was refined by Lenneberg eight years later [2]. Lenneberg argued that language acquisition needed to take place between age two and puberty – a period which he believed to coincide with the lateralisation process of the brain. (More recent neurological research suggests that different time frames exist for the lateralisation process of different language functions. Most, however, close before puberty [3].) However, Lenneberg mostly drew on findings pertaining to first language development in deaf children, feral children or children with serious cognitive impairments in order to back up his claims. For him, the critical period concept was concerned with the implicit “automatic acquisition” [2, p. 176] in immersion contexts and does not preclude the possibility of learning a foreign language after puberty, albeit with much conscious effort and typically less success.

SLA research adopted the critical period hypothesis (CPH) and applied it to second and foreign language learning, resulting in a host of studies. In its most general version, the CPH for SLA states that the ‘susceptibility’ or ‘sensitivity’ to language input varies as a function of age, with adult L2 learners being less susceptible to input than child L2 learners. Importantly, the age–susceptibility function is hypothesised to be non-linear. Moving beyond this general version, we find that the CPH is conceptualised in a multitude of ways [4]. This state of affairs requires scholars to make explicit their theoretical stance and assumptions [5], but has the obvious downside that critical findings risk being mitigated as posing a problem to only one aspect of one particular conceptualisation of the CPH, whereas other conceptualisations remain unscathed. This overall vagueness concerns two areas in particular, viz. the delineation of the CPH's scope and the formulation of testable predictions. Delineating the scope and formulating falsifiable predictions are, needless to say, fundamental stages in the scientific evaluation of any hypothesis or theory, but the lack of scholarly consensus on these points seems to be particularly pronounced in the case of the CPH. This article therefore first presents a brief overview of differing views on these two stages. Then, once the scope of their CPH version has been duly identified and empirical data have been collected using solid methods, it is essential that researchers analyse the data patterns soundly in order to assess the predictions made and that they draw justifiable conclusions from the results. As I will argue in great detail, however, the statistical analysis of data patterns as well as their interpretation in CPH research – and this includes both critical and supportive studies and overviews – leaves a great deal to be desired. Reanalysing data from a recent CPH-supportive study, I illustrate some common statistical fallacies in CPH research and demonstrate how one particular CPH prediction can be evaluated.

Delineating the scope of the critical period hypothesis

First, the age span for a putative critical period for language acquisition has been delimited in different ways in the literature [4]. Lenneberg's critical period stretched from two years of age to puberty (which he posits at about 14 years of age) [2], whereas other scholars have drawn the cutoff point at 12, 15, 16 or 18 years of age [6]. Unlike Lenneberg, most researchers today do not define a starting age for the critical period for language learning. Some, however, consider the possibility of the critical period (or a critical period for a specific language area, e.g. phonology) ending much earlier than puberty (e.g. age 9 years [1], or as early as 12 months in the case of phonology [7]).

Second, some vagueness remains as to the setting that is relevant to the CPH. Does the critical period constrain implicit learning processes only, i.e. only the untutored language acquisition in immersion contexts, or does it also apply to (at least partly) instructed learning? Most researchers agree on the former [8], but much research has included subjects who have had at least some instruction in the L2.

Third, there is no consensus on what the scope of the CP is as far as the areas of language are concerned. Most researchers agree that a CP is most likely to constrain the acquisition of pronunciation and grammar and, consequently, these are the areas primarily looked into in studies on the CPH [9]. Some researchers have also tried to define distinguishable CPs for the different language areas of phonetics, morphology and syntax and even for lexis (see [10] for an overview).

Fourth and last, research into the CPH has focused on ‘ultimate attainment’ (UA) or the ‘final’ state of L2 proficiency rather than on the rate of learning. From research into the rate of acquisition (e.g. [11]–[13]), it has become clear that the CPH cannot hold for the rate variable. In fact, it has been observed that adult learners proceed faster than child learners at the beginning stages of L2 acquisition. Though theoretical reasons for excluding the rate can be posited (the initial faster rate of learning in adults may be the result of more conscious cognitive strategies rather than of less conscious implicit learning, for instance), rate of learning might from a different perspective also be considered an indicator of ‘susceptibility’ or ‘sensitivity’ to language input. Nevertheless, contemporary SLA scholars generally seem to concur that UA and not rate of learning is the dependent variable of primary interest in CPH research. These and further scope delineation problems relevant to CPH research are discussed in more detail by, among others, Birdsong [9], DeKeyser and Larson-Hall [14], Long [10] and Muñoz and Singleton [6].

Formulating testable hypotheses

Once the relevant cph 's scope has satisfactorily been identified, clear and testable predictions need to be drawn from it. At this stage, the lack of consensus on what the consequences or the actual observable outcome of a cp would have to look like becomes evident. As touched upon earlier, cph research is interested in the end state or ‘ultimate attainment’ ( ua ) in L2 acquisition because this “determines the upper limits of L2 attainment” [9, p. 10]. The range of possible ultimate attainment states thus helps researchers to explore the potential maximum outcome of L2 proficiency before and after the putative critical period.

One strong prediction made by some cph exponents holds that post- cp learners cannot reach native-like L2 competences. Identifying a single native-like post- cp L2 learner would then suffice to falsify all cph s making this prediction. Assessing this prediction is difficult, however, since it is not clear what exactly constitutes sufficient nativelikeness, as illustrated by the discussion on the actual nativelikeness of highly accomplished L2 speakers [15] , [16] . Indeed, there exists a real danger that, in a quest to vindicate the cph , scholars set the bar for L2 learners to match monolinguals increasingly higher – up to Swiftian extremes. Furthermore, the usefulness of comparing the linguistic performance in mono- and bilinguals has been called into question [6] , [17] , [18] . Put simply, the linguistic repertoires of mono- and bilinguals differ by definition and differences in the behavioural outcome will necessarily be found, if only one digs deep enough.

A second strong prediction made by cph proponents is that the function linking age of acquisition and ultimate attainment will not be linear throughout the whole lifespan. Before discussing what this function would have to look like in order for it to constitute cph-consistent evidence, I point out that the ultimate attainment variable can essentially be considered a cumulative measure dependent on the actual variable of interest in cph research, i.e. susceptibility to language input, as well as on such other factors as duration and intensity of learning (within and outside a putative cp) and possibly a number of other influencing factors. To elaborate, the behavioural outcome, i.e. ultimate attainment, can be assumed to be integrative to the susceptibility function, as Newport [19] correctly points out. Other things being equal, ultimate attainment will therefore decrease as susceptibility decreases. However, decreasing ultimate attainment levels in and by themselves represent no compelling evidence in favour of a cph. The form of the integrative curve must therefore be predicted clearly from the susceptibility function. Additionally, the age of acquisition–ultimate attainment function can take just about any form when other things are not equal, e.g. duration of learning (Does learning last up until time of testing or only for a more or less constant number of years or is it dependent on age itself?) or intensity of learning (Do learners always learn at their maximum susceptibility level or does this intensity vary as a function of age, duration, present attainment and motivation?). The integral of the susceptibility function could therefore be of virtually unlimited complexity and its parameters could be adjusted to fit any age of acquisition–ultimate attainment pattern. It therefore seems astonishing that the distinction between level of sensitivity to language input and level of ultimate attainment is rarely made in the literature. Implicitly or explicitly [20], the two are more or less equated and the same mathematical functions are expected to describe the two variables if observed across a range of starting ages of acquisition.
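
The distinction between susceptibility and ultimate attainment can be made concrete with a deliberately simple formalisation. This is only a sketch in assumed notation, not Newport's own formulation: learning intensity is taken to be constant and every learner is assumed to learn for the same number of years D.

```latex
% s(t): susceptibility to language input at age t (assumed notation)
% a: age of onset of acquisition; D: duration of learning, held constant
\mathrm{UA}(a) = \int_{a}^{a+D} s(t)\,\mathrm{d}t
\qquad\Longrightarrow\qquad
\frac{\mathrm{d}\,\mathrm{UA}}{\mathrm{d}a} = s(a+D) - s(a)
```

Even under these restrictive assumptions, the slope of the attainment curve at a given age of onset depends on susceptibility at two different ages, a and a + D, so the attainment curve need not have the same shape as the susceptibility curve.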

But even when the susceptibility and ultimate attainment variables are equated, there remains controversy as to what function linking age of onset of acquisition and ultimate attainment would actually constitute evidence for a critical period. Most scholars agree that not any kind of age effect constitutes such evidence. More specifically, the age of acquisition–ultimate attainment function would need to be different before and after the end of the cp [9] . According to Birdsong [9] , three basic possible patterns proposed in the literature meet this condition. These patterns are presented in Figure 1 . The first pattern describes a steep decline of the age of onset of acquisition ( aoa )–ultimate attainment ( ua ) function up to the end of the cp and a practically non-existent age effect thereafter. Pattern 2 is an “unconventional, although often implicitly invoked” [9, p. 17] notion of the cp function which contains a period of peak attainment (or performance at ceiling), i.e. performance does not vary as a function of age, which is often referred to as a ‘window of opportunity’. This time span is followed by an unbounded decline in ua depending on aoa . Pattern 3 includes characteristics of patterns 1 and 2. At the beginning of the aoa range, performance is at ceiling. The next segment is a downward slope in the age function which ends when performance reaches its floor. Birdsong points out that all of these patterns have been reported in the literature. On closer inspection, however, he concludes that the most convincing function describing these age effects is a simple linear one. Hakuta et al. [21] sketch further theoretically possible predictions of the cph in which the mean performance drops drastically and/or the slope of the aoa – ua proficiency function changes at a certain point.

Figure 1. The graphs are based on Figure 2 in [9].

Although several patterns have been proposed in the literature, it bears pointing out that the most common explicit prediction corresponds to Birdsong's first pattern, as exemplified by the following crystal-clear statement by DeKeyser, one of the foremost cph proponents:

[A] strong negative correlation between age of acquisition and ultimate attainment throughout the lifespan (or even from birth through middle age), the only age effect documented in many earlier studies, is not evidence for a critical period…[T]he critical period concept implies a break in the AoA–proficiency function, i.e., an age (somewhat variable from individual to individual, of course, and therefore an age range in the aggregate) after which the decline of success rate in one or more areas of language is much less pronounced and/or clearly due to different reasons. [22, p. 445].

DeKeyser and before him among others Johnson and Newport [23] thus conceptualise only one possible pattern which would speak in favour of a critical period: a clear negative age effect before the end of the critical period and a much weaker (if any) negative correlation between age and ultimate attainment after it. This ‘flattened slope’ prediction has the virtue of being much more tangible than the ‘potential nativelikeness’ prediction: Testing it does not necessarily require comparing the L2-learners to a native control group and thus effectively comparing apples and oranges. Rather, L2-learners with different aoa s can be compared amongst themselves without the need to categorise them by means of a native-speaker yardstick, the validity of which is inevitably going to be controversial [15] . In what follows, I will concern myself solely with the ‘flattened slope’ prediction, arguing that, despite its clarity of formulation, cph research has generally used analytical methods that are irrelevant for the purposes of actually testing it.

Inferring non-linearities in critical period research: An overview

Group mean or proportion comparisons

[T]he main differences can be found between the native group and all other groups – including the earliest learner group – and between the adolescence group and all other groups. However, neither the difference between the two childhood groups nor the one between the two adulthood groups reached significance, which indicates that the major changes in eventual perceived nativelikeness of L2 learners can be associated with adolescence. [15, p. 270].

Similar group comparisons aimed at investigating the effect of aoa on ua have been carried out by both cph advocates and sceptics (among whom Bialystok and Miller [25, pp. 136–139], Birdsong and Molis [26, p. 240], Flege [27, pp. 120–121], Flege et al. [28, pp. 85–86], Johnson [29, p. 229], Johnson and Newport [23, p. 78], McDonald [30, pp. 408–410] and Patkowski [31, pp. 456–458]). To be clear, not all of these authors drew direct conclusions about the aoa–ua function on the basis of these group comparisons, but their group comparisons have been cited as indicative of a cph-consistent non-continuous age effect, as exemplified by the following quote by DeKeyser [22]:

Where group comparisons are made, younger learners always do significantly better than the older learners. The behavioral evidence, then, suggests a non-continuous age effect with a “bend” in the AoA–proficiency function somewhere between ages 12 and 16. [22, p. 448].

The first problem with group comparisons like these and drawing inferences on the basis thereof is that they require that a continuous variable, aoa , be split up into discrete bins. More often than not, the boundaries between these bins are drawn in an arbitrary fashion, but what is more troublesome is the loss of information and statistical power that such discretisation entails (see [32] for the extreme case of dichotomisation). If we want to find out more about the relationship between aoa and ua , why throw away most of the aoa information and effectively reduce the ua data to group means and the variance in those groups?
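
The loss of information caused by binning can be illustrated with a small simulation. The sketch below uses hypothetical, randomly generated data (not data from any of the studies cited) and is written in Python with the standard library only; the variable names and the parameter values are assumptions chosen for illustration.

```python
import random
import statistics


def pearson(xs, ys):
    """Pearson correlation computed from first principles."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


random.seed(1)

# Hypothetical data: aoa between 1 and 40, ua declining linearly plus noise.
aoa = [random.uniform(1, 40) for _ in range(2000)]
ua = [200 - 1.5 * a + random.gauss(0, 15) for a in aoa]

# Dichotomise aoa at its median ("early" = 0, "late" = 1).
median_aoa = statistics.median(aoa)
late = [1.0 if a > median_aoa else 0.0 for a in aoa]

r_continuous = pearson(aoa, ua)
r_dichotomised = pearson(late, ua)

# Share of ua variance accounted for by each version of the predictor.
print(round(r_continuous ** 2, 2), round(r_dichotomised ** 2, 2))
```

In this setup the dichotomised predictor accounts for a visibly smaller share of the ua variance than the continuous one, although the underlying relationship is exactly the same.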

Comparison of correlation coefficients

Correlation-based inferences about slope discontinuities have similarly explicitly been made by cph advocates and sceptics alike, e.g. Bialystok and Miller [25, pp. 136 and 140], DeKeyser and colleagues [22], [44] and Flege et al. [45, pp. 166 and 169]. Others did not explicitly infer the presence or absence of slope differences from the subset correlations they computed (among others Birdsong and Molis [26], DeKeyser [8], Flege et al. [28] and Johnson [29]), but their studies nevertheless featured in overviews discussing discontinuities [14], [22]. Indeed, the most recent overview draws a strong conclusion about the validity of the cph's ‘flattened slope’ prediction on the basis of these subset correlations:

In those studies where the two groups are described separately, the correlation is much higher for the younger than for the older group, except in Birdsong and Molis (2001) [ =  [26] , JV], where there was a ceiling effect for the younger group. This global picture from more than a dozen studies provides support for the non-continuity of the decline in the AoA–proficiency function, which all researchers agree is a hallmark of a critical period phenomenon. [22, p. 448].

In Johnson and Newport's specific case [23] , their correlation-based inference that ua levels off after puberty happened to be largely correct: the gjt scores are more or less randomly distributed around a near-horizontal trend line [26] . Ultimately, however, it rests on the fallacy of confusing correlation coefficients with slopes, which seriously calls into question conclusions such as DeKeyser's (cf. the quote above).
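
Why a correlation coefficient is not a slope can be seen from the textbook identity for simple linear regression. The notation below is assumed for illustration (x for aoa, y for ua, s for sample standard deviations) and is not taken from the studies under discussion.

```latex
\hat{\beta} = r_{xy}\,\frac{s_y}{s_x}
\quad\Longleftrightarrow\quad
r_{xy} = \hat{\beta}\,\frac{s_x}{s_y}

% If an older and a younger group share the same slope \hat{\beta}:
\frac{r_{\text{old}}}{r_{\text{young}}}
  = \frac{s_{x,\text{old}}}{s_{x,\text{young}}}
    \cdot \frac{s_{y,\text{young}}}{s_{y,\text{old}}}
```

Two groups with identical slopes can thus show different correlation coefficients simply because the spread of aoa or of ua differs between them.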

It can then straightforwardly be deduced that, other things equal, the aoa – ua correlation in the older group decreases as the ua variance in the older group increases relative to the ua variance in the younger group (Eq. 3).

equation image

Lower correlation coefficients in older aoa groups may therefore be largely due to differences in ua variance, which have been reported in several studies [23] , [26] , [28] , [29] (see [46] for additional references). Greater variability in ua with increasing age is likely due to factors other than age proper [47] , such as the concomitant greater variability in exposure to literacy, degree of education, motivation and opportunity for language use, and by itself represents evidence neither in favour of nor against the cph .
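
A short simulation makes the point tangible. The sketch below is written in Python with hypothetical data: both groups are generated with the same true slope, and only the amount of unexplained ua variability differs. All names and parameter values are assumptions made for the example.

```python
import random
import statistics


def slope_and_r(xs, ys):
    """Return the least-squares slope and the Pearson correlation."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sxx, sxy / (sxx * syy) ** 0.5


random.seed(2)
TRUE_SLOPE = -2.0  # identical in both groups by construction

# Both groups span an aoa range of the same width.
young_aoa = [random.uniform(1, 15) for _ in range(3000)]
old_aoa = [random.uniform(16, 30) for _ in range(3000)]

# Only the residual variability in ua differs between the groups.
young_ua = [180 + TRUE_SLOPE * a + random.gauss(0, 5) for a in young_aoa]
old_ua = [180 + TRUE_SLOPE * a + random.gauss(0, 20) for a in old_aoa]

slope_young, r_young = slope_and_r(young_aoa, young_ua)
slope_old, r_old = slope_and_r(old_aoa, old_ua)

print("slopes:", round(slope_young, 2), round(slope_old, 2))
print("correlations:", round(r_young, 2), round(r_old, 2))
```

The estimated slopes are nearly identical, yet the correlation is markedly weaker in the group with the larger ua variance, which is exactly the pattern that has been misread as a flattening of the slope.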

Regression approaches

Having demonstrated that neither group mean or proportion comparisons nor correlation coefficient comparisons can directly address the ‘flattened slope’ prediction, I now turn to the studies in which regression models were computed with aoa as a predictor variable and ua as the outcome variable. Once again, this category of studies is not mutually exclusive with the two categories discussed above.

In a large-scale study using self-reports and approximate aoa s derived from a sample of the 1990 U.S. Census, Stevens found that the probability with which immigrants from various countries stated that they spoke English ‘very well’ decreased curvilinearly as a function of aoa [48] . She noted that this development is similar to the pattern found by Johnson and Newport [23] but that it contains no indication of an “abruptly defined ‘critical’ or sensitive period in L2 learning” [48, p. 569]. However, she modelled the self-ratings using an ordinal logistic regression model in which the aoa variable was logarithmically transformed. Technically, this is perfectly fine, but one should be careful not to read too much into the non-linear curves found. In logistic models, the outcome variable itself is modelled linearly as a function of the predictor variables and is expressed in log-odds. In order to compute the corresponding probabilities, these log-odds are transformed using the logistic function. Consequently, even if the model is specified linearly, the predicted probabilities will not lie on a perfectly straight line when plotted as a function of any one continuous predictor variable. Similarly, when the predictor variable is first logarithmically transformed and then used to linearly predict an outcome variable, the function linking the predicted outcome variables and the untransformed predictor variable is necessarily non-linear. Thus, non-linearities follow naturally from Stevens's model specifications. Moreover, cph -consistent discontinuities in the aoa – ua function cannot be found using her model specifications as they did not contain any parameters allowing for this.
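
That non-linear probability curves follow from a linear logistic specification can be checked directly. The following Python sketch uses made-up coefficients (they are not Stevens's estimates) and shows that equal steps on the log-odds scale translate into unequal steps on the probability scale.

```python
import math


def logistic(log_odds):
    """Convert log-odds to a probability."""
    return 1.0 / (1.0 + math.exp(-log_odds))


# Hypothetical coefficients of a model that is linear in aoa on the log-odds scale.
INTERCEPT = 3.0
SLOPE = -0.15

ages = list(range(0, 41, 5))
log_odds = [INTERCEPT + SLOPE * a for a in ages]
probs = [logistic(z) for z in log_odds]

# Change per 5-year step on each scale.
log_odds_steps = [b - a for a, b in zip(log_odds, log_odds[1:])]
prob_steps = [b - a for a, b in zip(probs, probs[1:])]

for age, step_lo, step_p in zip(ages[1:], log_odds_steps, prob_steps):
    print(age, round(step_lo, 2), round(step_p, 3))
```

The log-odds fall by the same amount at every step, whereas the predicted probabilities fall slowly at first, fastest around a probability of 0.5, and slowly again thereafter.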

Using data similar to Stevens's, Bialystok and Hakuta found that the link between the self-rated English competences of Chinese- and Spanish-speaking immigrants and their aoa could be described by a straight line [49]. In contrast to Stevens, Bialystok and Hakuta used a regression-based method allowing for changes in the function's slope, viz. locally weighted scatterplot smoothing (lowess). Informally, lowess is a non-parametric method that relies on an algorithm that fits the dependent variable for small parts of the range of the independent variable whilst guaranteeing that the overall curve does not contain sudden jumps (for technical details, see [50]). Hakuta et al. used an even larger sample from the same 1990 U.S. Census data on Chinese- and Spanish-speaking immigrants (2.3 million observations) [21]. The lowess curves they fitted revealed no discontinuities in the aoa–ua slope. Moreover, the authors found that piecewise linear regression models, i.e. regression models containing a parameter that allows a sudden drop in the curve or a change of its slope, did not provide a better fit to the data than did an ordinary regression model without such a parameter.

To sum up, I have argued at length that regression approaches are superior to group mean and correlation coefficient comparisons for the purposes of testing the ‘flattened slope’ prediction. Acknowledging the reservations vis-à-vis self-estimated ua s, we still find that while the relationship between aoa and ua is not necessarily perfectly linear in the studies discussed, the data do not lend unequivocal support to this prediction. In the following section, I will reanalyse data from a recent empirical paper on the cph by DeKeyser et al. [44] . The first goal of this reanalysis is to further illustrate some of the statistical fallacies encountered in cph studies. Second, by making the computer code available I hope to demonstrate how the relevant regression models, viz. piecewise regression models, can be fitted and how the aoa representing the optimal breakpoint can be identified. Lastly, the findings of this reanalysis will contribute to our understanding of how aoa affects ua as measured using a gjt .

Summary of DeKeyser et al. (2010)

I chose to reanalyse a recent empirical paper on the cph by DeKeyser et al. [44] (henceforth DK et al.). This paper lends itself well to a reanalysis since it exhibits two highly commendable qualities: the authors spell out their hypotheses lucidly and provide detailed numerical and graphical data descriptions. Moreover, the paper's lead author is very clear on what constitutes a necessary condition for accepting the cph: a non-linearity in the age of onset of acquisition (aoa)–ultimate attainment (ua) function, with ua declining less strongly as a function of aoa in older, post-cp arrivals compared to younger arrivals [14], [22]. Lastly, it claims to have found cross-linguistic evidence from two parallel studies backing the cph and should therefore be a source above suspicion for cph proponents.

The authors set out to test the following hypotheses:

  • Hypothesis 1: For both the L2 English and the L2 Hebrew group, the slope of the age of arrival–ultimate attainment function will not be linear throughout the lifespan, but will instead show a marked flattening between adolescence and adulthood.
  • Hypothesis 2: The relationship between aptitude and ultimate attainment will differ markedly for the young and older arrivals, with significance only for the latter. (DK et al., p. 417)

Both hypotheses were purportedly confirmed, which in the authors' view provides evidence in favour of the cph. The problem with this conclusion, however, is that it is based on a comparison of correlation coefficients. As I have argued above, correlation coefficients are not to be confused with regression coefficients and cannot be used to directly address research hypotheses concerning slopes, such as Hypothesis 1. In what follows, I will reanalyse the relationship between DK et al.'s aoa and gjt data in order to address Hypothesis 1. Additionally, I will lay bare a problem with the way in which Hypothesis 2 was addressed. The extracted data and the computer code used for the reanalysis are provided as supplementary materials, allowing anyone interested to scrutinise and easily reproduce my whole analysis and carry out their own computations (see ‘supporting information’).

Data extraction

In order to verify whether the data points had in fact been extracted to a satisfactory degree of accuracy, I computed summary statistics for the extracted aoa and gjt data and checked these against the descriptive statistics provided by DK et al. (pp. 421 and 427). These summary statistics for the extracted data are presented in Table 1. In addition, I computed the correlation coefficients for the aoa–gjt relationship for the whole aoa range and for aoa-defined subgroups and checked these coefficients against those reported by DK et al. (pp. 423 and 428). The correlation coefficients computed using the extracted data are presented in Table 2. Both checks strongly suggest the extracted data to be virtually identical to the original data, and Dr DeKeyser confirmed this to be the case in response to an earlier draft of the present paper (personal communication, 6 May 2013).

Results and Discussion

Modelling the link between age of onset of acquisition and ultimate attainment.

I first replotted the aoa and gjt data extracted from DK et al.'s scatterplots and added non-parametric scatterplot smoothers in order to investigate whether any changes in slope in the aoa–gjt function could be revealed, as per Hypothesis 1. Figures 3 and 4 show this not to be the case. Indeed, simple linear regression models that model gjt as a function of aoa provide decent fits for both the North America and the Israel data, explaining 65% and 63% of the variance in gjt scores, respectively. The parameters of these models are given in Table 3.

Figure 3. The trend line is a non-parametric scatterplot smoother. The scatterplot itself is a near-perfect replication of DK et al.'s Fig. 1.

Figure 4. The trend line is a non-parametric scatterplot smoother. The scatterplot itself is a near-perfect replication of DK et al.'s Fig. 5.

To ensure that both segments are joined at the breakpoint, the predictor variable is first centred at the breakpoint value, i.e. the breakpoint value is subtracted from the original predictor variable values. For a blow-by-blow account of how such models can be fitted in R, I refer to an example analysis by Baayen [55, pp. 214–222].
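
The centring step, and a simple search for the best-fitting breakpoint, can be sketched as follows. The example is written in Python rather than R, uses hypothetical simulated data (not DK et al.'s), and relies on one common parameterisation of a joined two-segment model; the function names `ols` and `fit_piecewise` are invented for this illustration and the code is not the script that accompanies the paper.

```python
import random


def ols(rows, ys):
    """Least-squares coefficients, solving the normal equations (X'X) b = X'y."""
    k = len(rows[0])
    # Augmented matrix [X'X | X'y].
    aug = [
        [sum(row[i] * row[j] for row in rows) for j in range(k)]
        + [sum(row[i] * y for row, y in zip(rows, ys))]
        for i in range(k)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda idx: abs(aug[idx][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for idx in range(k):
            if idx != col:
                factor = aug[idx][col] / aug[col][col]
                aug[idx] = [v - factor * p for v, p in zip(aug[idx], aug[col])]
    return [aug[i][k] / aug[i][i] for i in range(k)]


def fit_piecewise(aoa, gjt, bp):
    """Fit two joined line segments; return (coefficients, residual sum of squares)."""
    # Centre aoa at the breakpoint. The hinge term is zero up to the breakpoint,
    # so the extra slope only applies afterwards and the segments meet at bp.
    rows = [[1.0, a - bp, max(a - bp, 0.0)] for a in aoa]
    coefs = ols(rows, gjt)
    rss = sum(
        (y - sum(c * x for c, x in zip(coefs, row))) ** 2
        for row, y in zip(rows, gjt)
    )
    return coefs, rss


# Hypothetical data: slope -3 up to aoa 18, slope -0.5 afterwards, plus noise.
random.seed(3)
aoa = [float(a) for a in range(1, 41) for _ in range(5)]
gjt = [
    180 - 3.0 * min(a - 18, 0) - 0.5 * max(a - 18, 0) + random.gauss(0, 3)
    for a in aoa
]

# Grid search: keep the candidate breakpoint with the smallest residual sum of squares.
candidates = range(5, 36)
best_bp = min(candidates, key=lambda bp: fit_piecewise(aoa, gjt, bp)[1])
(b0, b1, b2), best_rss = fit_piecewise(aoa, gjt, best_bp)

slope_before = b1        # slope up to the breakpoint
slope_after = b1 + b2    # slope beyond the breakpoint
print(best_bp, round(slope_before, 2), round(slope_after, 2))
```

Because the breakpoint is chosen by searching over the data, a comparison with the model without a breakpoint should take this extra flexibility into account.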

Solid: regression with breakpoint at aoa 18 (dashed lines represent its 95% confidence interval); dot-dash: regression without breakpoint.

Figure 6. Solid: regression with breakpoint at aoa 18 (dashed lines represent its 95% confidence interval); dot-dash (hardly visible due to near-complete overlap): regression without breakpoint.

Solid: regression with breakpoint at aoa 16 (dashed lines represent its 95% confidence interval); dot-dash: regression without breakpoint.

Figure 9. Solid: regression with breakpoint at aoa 6 (dashed lines represent its 95% confidence interval); dot-dash (hardly visible due to near-complete overlap): regression without breakpoint.

In sum, a regression model that allows for changes in the slope of the aoa–gjt function to account for putative critical period effects provides a somewhat better fit to the North American data than does an everyday simple regression model. The improvement in model fit is marginal, however, and including a breakpoint does not result in any detectable improvement of model fit to the Israel data whatsoever. Breakpoint models therefore fail to provide solid cross-linguistic support in favour of critical period effects: across both data sets, gjt can satisfactorily be modelled as a linear function of aoa.

On partialling out ‘age at testing’

As I have argued above, correlation coefficients cannot be used to test hypotheses about slopes. When the correct procedure is carried out on DK et al.'s data, no cross-linguistically robust evidence for changes in the aoa–gjt function is found. In addition to comparing the zero-order correlations between aoa and gjt, however, DK et al. computed partial correlations in which the variance in aoa associated with the participants' age at testing (aat; a potentially confounding variable) was filtered out. They found that these partial correlations between aoa and gjt, which are given in Table 9, differed between age groups in that they are stronger for younger than for older participants. This, DK et al. argue, constitutes additional evidence in favour of the cph. At this point, I can no longer provide my own analysis of DK et al.'s data seeing as the pertinent data points were not plotted. Nevertheless, the detailed descriptions by DK et al. strongly suggest that the use of these partial correlations is highly problematic. Most importantly, and to reiterate, correlations (whether zero-order or partial ones) are actually of no use when testing hypotheses concerning slopes. Still, one may wonder why the partial correlations differ across age groups. My surmise is that these differences are at least partly the by-product of an imbalance in the sampling procedure.

The upshot of this brief discussion is that the partial correlation differences reported by DK et al. are at least partly the result of an imbalance in the sampling procedure: aoa and aat were simply less intimately tied for the young arrivals in the North America study than for the older arrivals with L2 English or for all of the L2 Hebrew participants. In an ideal world, we would like to fix aat or ascertain that it at most only weakly correlates with aoa . This, however, would result in a strong correlation between aoa and another potential confound variable, length of residence in the L2 environment, bringing us back to square one. Allowing for only moderate correlations between aoa and aat might improve our predicament somewhat, but even in that case, we should tread lightly when making inferences on the basis of statistical control procedures [61] .
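
The bind described here follows from simple arithmetic: age at testing equals age of onset plus length of residence. The Python sketch below uses a handful of invented participants to show that holding one of the two potential confounds constant ties the other one perfectly to aoa; the abbreviation `lor` for length of residence and all values are assumptions made for the example.

```python
import statistics


def pearson(xs, ys):
    """Pearson correlation computed from first principles."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


# Hypothetical ages of onset of acquisition.
aoa = [4, 8, 12, 16, 20, 24, 28]

# Scenario 1: everyone is tested at the same age, so aat is constant
# and length of residence (lor = aat - aoa) shrinks as aoa grows.
AAT_FIXED = 40
lor_when_aat_fixed = [AAT_FIXED - a for a in aoa]
r_aoa_lor = pearson(aoa, lor_when_aat_fixed)

# Scenario 2: everyone has the same length of residence, so lor is constant
# and age at testing (aat = aoa + lor) grows in step with aoa.
LOR_FIXED = 15
aat_when_lor_fixed = [a + LOR_FIXED for a in aoa]
r_aoa_aat = pearson(aoa, aat_when_lor_fixed)

print(r_aoa_lor, r_aoa_aat)
```

Whichever of the two variables is fixed by design, the other becomes a perfect linear function of aoa and can no longer be separated from it statistically.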

On estimating the role of aptitude

Having shown that Hypothesis 1 could not be confirmed, I now turn to Hypothesis 2, which predicts a differential role of aptitude for ua in sla in different aoa groups. More specifically, it states that the correlation between aptitude and gjt performance will be significant only for older arrivals. The correlation coefficients of the relationship between aptitude and gjt are presented in Table 10 .

The problem with both the wording of Hypothesis 2 and the way in which it is addressed is the following: it is assumed that a variable has a reliably different effect in different groups when the effect reaches significance in one group but not in the other. This logic is fairly widespread within several scientific disciplines (see e.g. [62] for a discussion). Nonetheless, it is demonstrably fallacious [63]. Here I will illustrate the fallacy for the specific case of comparing two correlation coefficients.
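
A numerical illustration, using invented correlation coefficients and sample sizes rather than DK et al.'s values, may help. The Python sketch below relies on Fisher's z transformation, a standard large-sample approximation, both to test each correlation against zero and to test the two correlations against each other.

```python
import math


def p_value_single(r, n):
    """Two-sided p-value for H0: rho = 0, using Fisher's z approximation."""
    z = math.atanh(r) * math.sqrt(n - 3)
    return math.erfc(abs(z) / math.sqrt(2))


def p_value_difference(r1, n1, r2, n2):
    """Two-sided p-value for H0: rho1 = rho2, for two independent samples."""
    se = math.sqrt(1 / (n1 - 3) + 1 / (n2 - 3))
    z = (math.atanh(r1) - math.atanh(r2)) / se
    return math.erfc(abs(z) / math.sqrt(2))


# Hypothetical aptitude-gjt correlations in two groups of 30 participants each.
r_older, n_older = 0.45, 30
r_younger, n_younger = 0.25, 30

p_older = p_value_single(r_older, n_older)
p_younger = p_value_single(r_younger, n_younger)
p_diff = p_value_difference(r_older, n_older, r_younger, n_younger)

print(round(p_older, 3), round(p_younger, 3), round(p_diff, 3))
```

With these made-up numbers, one correlation differs significantly from zero at the 0.05 level and the other does not, yet the difference between the two correlations is itself far from significant.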

Apart from not being replicated in the North America study, does this difference actually show anything? I contend that it does not: what is of interest are not so much the correlation coefficients, but rather the interactions between aoa and aptitude in models predicting gjt . These interactions could be investigated by fitting a multiple regression model in which the postulated cp breakpoint governs the slope of both aoa and aptitude. If such a model provided a substantially better fit to the data than a model without a breakpoint for the aptitude slope and if the aptitude slope changes in the expected direction (i.e. a steeper slope for post- cp than for younger arrivals) for different L1–L2 pairings, only then would this particular prediction of the cph be borne out.

Using data extracted from a paper reporting on two recent studies that purport to provide evidence in favour of the cph and that, according to its authors, represent a major improvement over earlier studies (DK et al., p. 417), it was found that neither of its two hypotheses was actually confirmed when using the proper statistical tools. As a matter of fact, the gjt scores continue to decline at essentially the same rate even beyond the end of the putative critical period. According to the paper's lead author, such a finding represents a serious problem to his conceptualisation of the cph [14]. Moreover, although modelling a breakpoint representing the end of a cp at aoa 16 may improve the statistical model slightly in the study on learners of English in North America, the study on learners of Hebrew in Israel fails to confirm this finding. In fact, even if we were to accept the optimal breakpoint computed for the Israel study, it lies at aoa 6 and is associated with a different geometrical pattern.

Diverging age trends in parallel studies with participants with different L2s have similarly been reported by Birdsong and Molis [26] and are at odds with an L2-independent cph . One parsimonious explanation of such conflicting age trends may be that the overall, cross-linguistic age trend is in fact linear, but that fluctuations in the data (due to factors unaccounted for or randomness) may sometimes give rise to a ‘stretched L’-shaped pattern ( Figure 1, left panel ) and sometimes to a ‘stretched 7’-shaped pattern ( Figure 1 , middle panel; see also [66] for a similar comment).

Importantly, the criticism that DeKeyser and Larson-Hall level against two studies reporting findings similar to the present [48], [49], viz. that the data consisted of self-ratings of questionable validity [14], does not apply to the present data set. In addition, DK et al. did not exclude any outliers from their analyses, so I assume that DeKeyser and Larson-Hall's criticism [14] of Birdsong and Molis's study [26], i.e. that the findings were due to the influence of outliers, is not applicable to the present data either. For good measure, however, I refitted the regression models with and without breakpoints after excluding one potentially problematic data point per model. The following data points had absolute standardised residuals larger than 2.5 in the original models without breakpoints as well as in those with breakpoints: the participant with aoa 17 and a gjt score of 125 in the North America study and the participant with aoa 12 and a gjt score of 117 in the Israel study. The resultant models were virtually identical to the original models (see Script S1). Furthermore, the aoa variable was sufficiently fine-grained and the aoa–gjt curve was not ‘presmoothed’ by the prior aggregation of gjt across parts of the aoa range (see [51] for such a criticism of another study). Lastly, seven of the nine “problems with supposed counter-evidence” to the cph discussed by Long [5] do not apply either, viz. (1) “[c]onfusion of rate and ultimate attainment”, (2) “[i]nappropriate choice of subjects”, (3) “[m]easurement of AO”, (4) “[l]eading instructions to raters”, (6) “[u]se of markedly non-native samples making near-native samples more likely to sound native to raters”, (7) “[u]nreliable or invalid measures”, and (8) “[i]nappropriate L1–L2 pairings”. Problem No. 5 (“Assessments based on limited samples and/or “language-like” behavior”) may be apropos given that only gjt data were used, leaving open the theoretical possibility that other measures might have yielded a different outcome. Finally, problem No. 9 (“Faulty interpretation of statistical patterns”) is, of course, precisely what I have turned the spotlights on.

Conclusions

The critical period hypothesis remains a hotly contested issue in the psycholinguistics of second-language acquisition. Discussions about the impact of empirical findings on the tenability of the cph generally revolve around the reliability of the data gathered (e.g. [5], [14], [22], [52], [67], [68]) and such methodological critiques are of course highly desirable. Furthermore, the debate often centres on the question of exactly what version of the cph is being vindicated or debunked. These versions differ mainly in terms of their scope, specifically with regard to the relevant age span, setting and language area, and the testable predictions they make. But even when the cph's scope is clearly demarcated and its main prediction is spelt out lucidly, the issue remains to what extent the empirical findings can actually be marshalled in support of the relevant cph version. As I have shown in this paper, empirical data have often been taken to support cph versions predicting that the relationship between age of acquisition and ultimate attainment is not strictly linear, even though the statistical tools most commonly used (notably group mean and correlation coefficient comparisons) were, crudely put, irrelevant to this prediction. Methods that are arguably valid, e.g. piecewise regression and scatterplot smoothing, have been used in some studies [21], [26], [49], but these studies have been criticised on other grounds. To my knowledge, such methods have never been used by scholars who explicitly subscribe to the cph.

I suspect that what may be going on is a form of ‘confirmation bias’ [69], a cognitive bias at play in diverse branches of human knowledge seeking: Findings judged to be consistent with one's own hypothesis are hardly questioned, whereas findings inconsistent with one's own hypothesis are scrutinised much more strongly and criticised on all sorts of points [70]–[73]. My reanalysis of DK et al.'s recent paper may be a case in point. cph exponents used correlation coefficients to address their prediction about the slope of a function, as had been done in a host of earlier studies. Finding a result that squared with their expectations, they did not question the technical validity of their results, or at least they did not report this. (In fact, my reanalysis is actually a case in point in two respects: for an earlier draft of this paper, I had computed the optimal position of the breakpoints incorrectly, resulting in a non-significant improvement of model fit for the North American data rather than a borderline significant one. Finding a result that squared with my expectations, I did not question the technical validity of my results – until this error was kindly pointed out to me by Martijn Wieling (University of Tübingen).) That said, I am keen to point out that the statistical analyses in this particular paper, though suboptimal, are, as far as I could gather, reported correctly, i.e. the confirmation bias does not seem to have resulted in the blatant misreportings found elsewhere (see [74] for empirical evidence and discussion). An additional point to these authors' credit is that, apart from explicitly identifying their cph version's scope and making crystal-clear predictions, they present data descriptions that actually permit quantitative reassessments and have a history of doing so (e.g. the appendix in [8]). This leads me to believe that they analysed their data all in good conscience and to hope that they, too, will conclude that their own data do not, in fact, support their hypothesis.

I end this paper on an upbeat note. Even though I have argued that the analytical tools employed in cph research generally leave much to be desired, the original data are, so I hope, still available. This provides researchers, cph supporters and sceptics alike, with an exciting opportunity to reanalyse their data sets using the tools outlined in the present paper and publish their findings at minimal cost of time and resources (for instance, as a comment to this paper). I would therefore encourage scholars to engage their old data sets and to communicate their analyses openly, e.g. by voluntarily publishing their data and computer code alongside their articles or comments. Ideally, cph supporters and sceptics would join forces to agree on a protocol for a high-powered study in order to provide a truly convincing answer to a core issue in sla.

Supporting Information

aoa and gjt data extracted from DeKeyser et al.'s North America study.

aoa and gjt data extracted from DeKeyser et al.'s Israel study.

Script with annotated R code used for the reanalysis. All add-on packages used can be installed from within R.

Acknowledgments

I would like to thank Irmtraud Kaiser (University of Fribourg) for helping me to get an overview of the literature on the critical period hypothesis in second language acquisition. Thanks are also due to Martijn Wieling (currently University of Tübingen) for pointing out an error in the R code accompanying an earlier draft of this paper.

Funding Statement

No current external funding sources for this study.

  • Nebraska Medicine

Understanding Hypothesis Testing, Significance Level, Power and Sample Size Calculation

  • Written by Steph Langel
  • Published Apr 4, 2024

This e-module offers an in-depth discussion of the essential components of hypothesis testing: significance levels, statistical power, and sample size calculations, which are fundamental to rigorous research methodology. Learners will develop a comprehensive understanding of designing, interpreting, and evaluating research findings through interactive content and real-world case studies. This will enable them to make well-informed decisions based on statistical best practices. The module’s framework allows a thorough learning experience, starting from fundamental definitions and progressing to the hands-on implementation of statistical ideas. This ensures that learners acquire the essential abilities to conduct ethically appropriate and scientifically valid research.

Title: Mixture-of-Depths: Dynamically allocating compute in transformer-based language models

Abstract: Transformer-based language models spread FLOPs uniformly across input sequences. In this work we demonstrate that transformers can instead learn to dynamically allocate FLOPs (or compute) to specific positions in a sequence, optimising the allocation along the sequence for different layers across the model depth. Our method enforces a total compute budget by capping the number of tokens ($k$) that can participate in the self-attention and MLP computations at a given layer. The tokens to be processed are determined by the network using a top-$k$ routing mechanism. Since $k$ is defined a priori, this simple procedure uses a static computation graph with known tensor sizes, unlike other conditional computation techniques. Nevertheless, since the identities of the $k$ tokens are fluid, this method can expend FLOPs non-uniformly across the time and model depth dimensions. Thus, compute expenditure is entirely predictable in sum total, but dynamic and context-sensitive at the token-level. Not only do models trained in this way learn to dynamically allocate compute, they do so efficiently. These models match baseline performance for equivalent FLOPs and wall-clock times to train, but require a fraction of the FLOPs per forward pass, and can be upwards of 50% faster to step during post-training sampling.
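The top-$k$ routing idea in the abstract can be sketched in a few lines of plain Python. This is a toy illustration, not the paper's implementation: tokens are single floats, the router scores are given rather than learned, `block` is a hypothetical stand-in for the self-attention and MLP computation, and the function names `route_top_k` and `mod_layer` are made up for this example. What it does show is that exactly `k` tokens are processed per layer regardless of which tokens they are, so the amount of work is fixed in advance.

```python
def route_top_k(router_scores, k):
    """Return the positions of the k highest-scoring tokens, in sequence order."""
    ranked = sorted(range(len(router_scores)),
                    key=lambda i: router_scores[i], reverse=True)
    return sorted(ranked[:k])

def mod_layer(tokens, router_scores, k, block):
    """Apply `block` to the k routed tokens; every other token passes through unchanged."""
    selected = set(route_top_k(router_scores, k))
    return [block(t) if i in selected else t for i, t in enumerate(tokens)]

# Toy example: five tokens, a budget of k = 2 tokens for this layer.
tokens = [1.0, 2.0, 3.0, 4.0, 5.0]
scores = [0.1, 0.9, 0.3, 0.8, 0.2]   # hypothetical router outputs
output = mod_layer(tokens, scores, k=2, block=lambda t: t * 10)
print(output)  # [1.0, 20.0, 3.0, 40.0, 5.0]
```

Because `k` is chosen before the forward pass, the number of calls to `block` is known ahead of time even though the selected positions change with the input.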

COMMENTS

  1. Review Article A review of theoretical perspectives on language learning and acquisition

    Chomsky (1976) counterattacks the theory of Behaviorism by bringing into light his concept of Universal Grammar (UG) in which every human is biologically equipped to learn language using the language faculty or the Language Acquisition Device (LAD), which is responsible for the initial stage of language development. Based on the UG theory, the ...

  2. Cognitive and behavioral approaches to language acquisition: Conceptual

    The past 20 years have seen research on language acquisition in the cognitive sciences grow immensely. The current paper offers a fairly extensive review of this literature, arguing that new cognitive theories and empirical data are perfectly consistent with core predictions a behavior analytic approach makes about language development. The review focuses on important examples of productive ...

  3. 9 Influential Theories of Language Learning by Brilliant Thinkers

    6. Cognitive Theory. The Cognitive Theory of language acquisition made its mark in the late 20th century, influenced by the pioneering work of Swiss psychologist Jean Piaget. This theory emphasizes the role of cognitive processes like memory, attention and problem-solving in the language learning journey.

  4. Language Acquisition Theory In Psychology

    Language acquisition refers to the process by which individuals learn and develop their native or second language. It involves the acquisition of grammar, vocabulary, and communication skills through exposure, interaction, and cognitive development. This process typically occurs in childhood but can continue throughout life.

  5. Krashen and Terrell's "Natural Approach"

    The Acquisition-Learning Hypothesis. First is the Acquisition-Learning Hypothesis, which makes a distinction between "acquisition," which he defines as developing competence by using language for "real communication," and "learning," which he defines as "knowing about" or "formal knowledge" of a language (p.26).

  6. Essentials of a Theory of Language Cognition

    These are exciting times to work in usage-based approaches to language learning because the enterprise brings together people working from different but complementary empirical and theoretical approaches: cognitive linguistics, construction grammar, functional linguistics, cognitive psychology, learning theory, psycholinguistics, statistical ...

  7. The social brain of language: grounding second language learning in

    The SL2 approach advocated here also echoes a movement in the broader language science, from sociocultural theory 28 to usage-based language learning 29 and conversational analysis 30, all of ...

  8. Krashen's Language Acquisition Hypotheses: A Critical Review

    The monitor hypothesis The monitor hypothesis puts forward that utterances in second language are initiated by acquisition and are monitored by learning. Acquisition is responsible for fluency. Krashen maintains that language performers may be able to use conscious rules if they have enough time, focus on the forms, and know the rules.

  9. PDF Krashen's Five Proposals on Language Learning: Are They Valid in ...

    Keywords: language acquisition, language learning, Monitor theory, Stephen Krashen, EFL classes, EFL methodology 1. Introduction Unlike some earlier theories about language learning, Krashen's theory on second language acquisition (SLA) has been stated in simple language- in words the majority of teachers can understand, and uses examples from

  10. Language and nonlanguage factors in foreign language learning ...

    The second set of hypotheses, including the linguistic coding deficit/differences hypothesis (LCDH) 7,8,19,20, argue that both native and foreign language learning are tied to the same set of core ...

  11. Theories of Language Development

    Researchers now believe that language acquisition is partially inborn and partially learned through our interactions with our linguistic environment (Gleitman & Newport, 1995; Stork & Widdowson, 1974). Learning Theory: Perhaps the most straightforward explanation of language development is that it occurs through the principles of learning ...

  12. Theories of the early stages of language acquisition

    The learning theory of language acquisition suggests that children learn a language much like they learn to tie their shoes or how to count; through repetition and reinforcement. When babies first learn to babble, parents and guardians smile, coo, and hug them for this behavior. As they grow older, children are praised for speaking properly and ...

  13. The Critical Period Hypothesis in Second Language Acquisition: A ...

    In second language acquisition research, the critical period hypothesis (cph) holds that the function between learners' age and their susceptibility to second language input is non-linear. This paper revisits the indistinctness found in the literature with regard to this hypothesis's scope and predictions. Even when its scope is clearly delineated and its predictions are spelt out, however ...

  14. Noticing in second language acquisition: a critical review

    This article examines the Noticing Hypothesis - the claim that second language learners must consciously notice the grammatical form of their input in order to acquire grammar. ... Instance theory and second language rule learning under explicit conditions . Studies in Second Language Acquisition 15, 413-438 . Google Scholar. Rutherford, W ...

  15. Social Interaction and Language Acquisition

    These data have led to the theoretical hypothesis that social interaction "gates" language learning (Kuhl, 2007; 2011). However, the underlying brain mechanisms by which the social gating hypothesis might work are not well understood. This chapter reviews the brain and behavioral data on the effects of social interaction on language ...

  16. Testing Hypotheses about Language Learning Using Structural Equation

    Language Learning, 62, 1170 - 1204. doi: 10.1111/j.1467-9922.2012.00722.x. Kieffer and Lesaux investigated how derivational morphological awareness impacts English reading comprehension in sixth-grade students (n = 952) in southern California. The students came from different language backgrounds: native English, Spanish-speaking language ...

  17. A critical period for second language acquisition: Evidence from 2/3

    Language Learning. 2006; 56:9-49. [Google Scholar] Birdsong D. The critical period hypothesis for second language acquisition: Tailoring the coat of many colors. In: Pawlak M, Aronin L, editors. Essential topics in applied linguistics and multilingualism. Studies in honor of David Singleton. Berlin and New York: Springer; 2014. pp. 43-50.

  18. Stephen Krashen's Five Hypotheses of Second Language Acquisition

    This is similar to when a child picks up their first language. On the other hand, language learning happens when the student is consciously discovering and learning the rules and grammatical structures of the language. 2. Monitor Hypothesis. Monitor Hypothesis states that the learner is consciously learning the grammar rules and functions of a ...

  19. PDF Language Learning Theories: an Overview

    Behavioral learning theory views learning as a response to stimuli in the environment; the learner is a "creature of habit" that can be manipulated, observed, and described. Behaviorist influences in second language teaching can be observed in methods such as the audio-lingual approach and situational language teaching.

  20. (PDF) Theories of Language Learning

    In 1972, Dell Hymes said that Chomsky's theory about language competence is not good enough to understand the wonder of acquiring a language (Xia, 2014). Then, he created the theory of co ...

  21. PDF LANGUAGE ACQUISITION AND LANGUAGE LEARNING

    Language acquisition is based on neuro-psychological processes (Maslo, 2007: 41). Language acquisition is opposed to learning and is a subconscious process similar to that by which children acquire their first language (Kramina, 2000: 27). Hence, language acquisition is an integral part of the unity of all language (Robbins, 2007: 49).

  22. The Critical Period Hypothesis: Support, Challenge, and Reconc

    through which to test the effects of maturation on language learning. At first glance, the evidence supporting a critical period for second language acquisition seems to be convincing. As Bley-Vroman's (1988) Fundamental Difference Hypothesis argues, adult language learning of an L2 as opposed to an L1 is characterized by widespread failure.

  23. Full article: Social annotations and second language viewers

    The commonly adopted theoretical frameworks include Cognitive Load Theory (Plass et al., Citation 2010) and Cognitive Theory in Multimedia Learning (Mayer, Citation 2009). However, what has not been investigated extensively is the link between video annotations and L2 viewers' engagement and motivation levels, which are particularly important ...

  24. What Toddlers and AI Can Learn From Each Other

    April 5, 2024, 9:47 AM ET. When Luna was seven months old, she began wearing, at the behest of her scientist father, a hot-pink helmet topped with a camera that would, for about an hour at a time ...

  25. Fabiola Fadda-Ginski: A Story of Empowerment Through Language

    March combines the celebration of National World Language Month with Women's History Month, offering a unique opportunity to spotlight leaders who are shaping the future of education through language and empowerment. We had the privilege of speaking with Fabiola Fadda-Ginski, a beacon of inspiration in the world of language education and the Director of World Language Programs for Chicago ...

  26. The Critical Period Hypothesis in Second Language Acquisition: A

    Delineating the scope of the critical period hypothesis. First, the age span for a putative critical period for language acquisition has been delimited in different ways in the literature. Lenneberg's critical period stretched from two years of age to puberty (which he posits at about 14 years of age), whereas other scholars have drawn the cutoff point at 12, 15, 16 or 18 years of age.

  27. Understanding Hypothesis Testing, Significance Level, Power and Sample

    This e-module offers an in-depth discussion of the essential components of hypothesis testing: significance levels, statistical power, and sample size calculations, which are fundamental to rigorous research methodology. Learners will develop a comprehensive ...

  28. Allegheny County school districts respond to increase in English

    Holly Niemi walked around her Baldwin High School classroom one day earlier this month, listening as students in her third period English as a second language...

  29. [2403.19887] Jamba: A Hybrid Transformer-Mamba Language Model

    Abstract: We present Jamba, a new base large language model based on a novel hybrid Transformer-Mamba mixture-of-experts (MoE) architecture. Specifically, Jamba interleaves blocks of Transformer and Mamba layers, enjoying the benefits of both model families. MoE is added in some of these layers to increase model capacity while keeping active parameter usage manageable.

  30. [2404.02258] Mixture-of-Depths: Dynamically allocating compute in

    Mixture-of-Depths: Dynamically allocating compute in transformer-based language models. Transformer-based language models spread FLOPs uniformly across input sequences. In this work we demonstrate that transformers can instead learn to dynamically allocate FLOPs (or compute) to specific positions in a sequence, optimising the allocation along ...