Research Article

The Critical Period Hypothesis in Second Language Acquisition: A Statistical Critique and a Reanalysis

Jan Vanhove

Affiliation: Department of Multilingualism, University of Fribourg, Fribourg, Switzerland

  • Published: July 25, 2013
  • https://doi.org/10.1371/journal.pone.0069172

17 Jul 2014: The PLOS ONE Staff (2014) Correction: The Critical Period Hypothesis in Second Language Acquisition: A Statistical Critique and a Reanalysis. PLOS ONE 9(7): e102922. https://doi.org/10.1371/journal.pone.0102922

Abstract

In second language acquisition research, the critical period hypothesis (cph) holds that the function between learners' age and their susceptibility to second language input is non-linear. This paper revisits the indistinctness found in the literature with regard to this hypothesis's scope and predictions. Even when its scope is clearly delineated and its predictions are spelt out, however, empirical studies – with few exceptions – use analytical (statistical) tools that are irrelevant with respect to the predictions made. This paper discusses statistical fallacies common in cph research and illustrates an alternative analytical method (piecewise regression) by means of a reanalysis of two datasets from a 2010 paper purporting to have found cross-linguistic evidence in favour of the cph. This reanalysis reveals that the specific age patterns predicted by the cph are not cross-linguistically robust. Applying the principle of parsimony, it is concluded that age patterns in second language acquisition are not governed by a critical period. To conclude, this paper highlights the role of confirmation bias in the scientific enterprise and appeals to second language acquisition researchers to reanalyse their old datasets using the methods discussed in this paper. The data and R commands that were used for the reanalysis are provided as supplementary materials.

Citation: Vanhove J (2013) The Critical Period Hypothesis in Second Language Acquisition: A Statistical Critique and a Reanalysis. PLoS ONE 8(7): e69172. https://doi.org/10.1371/journal.pone.0069172

Editor: Stephanie Ann White, UCLA, United States of America

Received: May 7, 2013; Accepted: June 7, 2013; Published: July 25, 2013

Copyright: © 2013 Jan Vanhove. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Funding: No current external funding sources for this study.

Competing interests: The author has declared that no competing interests exist.

Introduction

In the long term and in immersion contexts, second-language (L2) learners starting acquisition early in life – and staying exposed to input and thus learning over several years or decades – undisputedly tend to outperform later learners. Apart from being misinterpreted as an argument in favour of early foreign language instruction, which takes place in wholly different circumstances, this general age effect is also sometimes taken as evidence for a so-called ‘critical period’ (cp) for second-language acquisition (sla). Derived from biology, the cp concept was famously introduced into the field of language acquisition by Penfield and Roberts in 1959 [1] and was refined by Lenneberg eight years later [2]. Lenneberg argued that language acquisition needed to take place between age two and puberty – a period which he believed to coincide with the lateralisation process of the brain. (More recent neurological research suggests that different time frames exist for the lateralisation process of different language functions. Most, however, close before puberty [3].) However, Lenneberg mostly drew on findings pertaining to first language development in deaf children, feral children or children with serious cognitive impairments in order to back up his claims. For him, the critical period concept was concerned with the implicit “automatic acquisition” [2, p. 176] in immersion contexts and did not preclude the possibility of learning a foreign language after puberty, albeit with much conscious effort and typically less success.

sla research adopted the critical period hypothesis (cph) and applied it to second and foreign language learning, resulting in a host of studies. In its most general version, the cph for sla states that the ‘susceptibility’ or ‘sensitivity’ to language input varies as a function of age, with adult L2 learners being less susceptible to input than child L2 learners. Importantly, the age–susceptibility function is hypothesised to be non-linear. Moving beyond this general version, we find that the cph is conceptualised in a multitude of ways [4]. This state of affairs requires scholars to make explicit their theoretical stance and assumptions [5], but has the obvious downside that critical findings risk being mitigated as posing a problem to only one aspect of one particular conceptualisation of the cph, whereas other conceptualisations remain unscathed. This overall vagueness concerns two areas in particular, viz. the delineation of the cph's scope and the formulation of testable predictions. Delineating the scope and formulating falsifiable predictions are, needless to say, fundamental stages in the scientific evaluation of any hypothesis or theory, but the lack of scholarly consensus on these points seems to be particularly pronounced in the case of the cph. This article therefore first presents a brief overview of differing views on these two stages. Then, once the scope of their cph version has been duly identified and empirical data have been collected using solid methods, it is essential that researchers analyse the data patterns soundly in order to assess the predictions made and that they draw justifiable conclusions from the results. As I will argue in great detail, however, the statistical analysis of data patterns as well as their interpretation in cph research – and this includes both critical and supportive studies and overviews – leaves a great deal to be desired. Reanalysing data from a recent cph-supportive study, I illustrate some common statistical fallacies in cph research and demonstrate how one particular cph prediction can be evaluated.

Delineating the scope of the critical period hypothesis

First, the age span for a putative critical period for language acquisition has been delimited in different ways in the literature [4]. Lenneberg's critical period stretched from two years of age to puberty (which he posited at about 14 years of age) [2], whereas other scholars have drawn the cutoff point at 12, 15, 16 or 18 years of age [6]. Unlike Lenneberg, most researchers today do not define a starting age for the critical period for language learning. Some, however, consider the possibility of the critical period (or a critical period for a specific language area, e.g. phonology) ending much earlier than puberty (e.g. age 9 years [1], or as early as 12 months in the case of phonology [7]).

Second, some vagueness remains as to the setting that is relevant to the cph. Does the critical period constrain implicit learning processes only, i.e. only the untutored language acquisition in immersion contexts, or does it also apply to (at least partly) instructed learning? Most researchers agree on the former [8], but much research has included subjects who have had at least some instruction in the L2.

Third, there is no consensus on the scope of the cp as far as the areas of language are concerned. Most researchers agree that a cp is most likely to constrain the acquisition of pronunciation and grammar and, consequently, these are the areas primarily looked into in studies on the cph [9]. Some researchers have also tried to define distinguishable cps for the different language areas of phonetics, morphology and syntax and even for lexis (see [10] for an overview).

Fourth and last, research into the cph has focused on ‘ultimate attainment’ (ua) or the ‘final’ state of L2 proficiency rather than on the rate of learning. From research into the rate of acquisition (e.g. [11]–[13]), it has become clear that the cph cannot hold for the rate variable. In fact, it has been observed that adult learners proceed faster than child learners at the beginning stages of L2 acquisition. Though theoretical reasons for excluding the rate can be posited (the initial faster rate of learning in adults may be the result of more conscious cognitive strategies rather than of less conscious implicit learning, for instance), rate of learning might from a different perspective also be considered an indicator of ‘susceptibility’ or ‘sensitivity’ to language input. Nevertheless, contemporary sla scholars generally seem to concur that ua and not rate of learning is the dependent variable of primary interest in cph research. These and further scope delineation problems relevant to cph research are discussed in more detail by, among others, Birdsong [9], DeKeyser and Larson-Hall [14], Long [10] and Muñoz and Singleton [6].

Formulating testable hypotheses

Once the scope of the relevant cph has been satisfactorily identified, clear and testable predictions need to be drawn from it. At this stage, the lack of consensus on what the consequences or the actual observable outcome of a cp would have to look like becomes evident. As touched upon earlier, cph research is interested in the end state or ‘ultimate attainment’ (ua) in L2 acquisition because this “determines the upper limits of L2 attainment” [9, p. 10]. The range of possible ultimate attainment states thus helps researchers to explore the potential maximum outcome of L2 proficiency before and after the putative critical period.

One strong prediction made by some cph exponents holds that post- cp learners cannot reach native-like L2 competences. Identifying a single native-like post- cp L2 learner would then suffice to falsify all cph s making this prediction. Assessing this prediction is difficult, however, since it is not clear what exactly constitutes sufficient nativelikeness, as illustrated by the discussion on the actual nativelikeness of highly accomplished L2 speakers [15] , [16] . Indeed, there exists a real danger that, in a quest to vindicate the cph , scholars set the bar for L2 learners to match monolinguals increasingly higher – up to Swiftian extremes. Furthermore, the usefulness of comparing the linguistic performance in mono- and bilinguals has been called into question [6] , [17] , [18] . Put simply, the linguistic repertoires of mono- and bilinguals differ by definition and differences in the behavioural outcome will necessarily be found, if only one digs deep enough.

A second strong prediction made by cph proponents is that the function linking age of acquisition and ultimate attainment will not be linear throughout the whole lifespan. Before discussing what this function would have to look like in order for it to constitute cph-consistent evidence, I point out that the ultimate attainment variable can essentially be considered a cumulative measure dependent on the actual variable of interest in cph research, i.e. susceptibility to language input, as well as on such other factors as duration and intensity of learning (within and outside a putative cp) and possibly a number of other influencing factors. To elaborate, the behavioural outcome, i.e. ultimate attainment, can be assumed to be integrative to (i.e. the integral of) the susceptibility function, as Newport [19] correctly points out. Other things being equal, ultimate attainment will therefore decrease as susceptibility decreases. However, decreasing ultimate attainment levels in and by themselves represent no compelling evidence in favour of a cph. The form of the integrative curve must therefore be predicted clearly from the susceptibility function. Additionally, the age of acquisition–ultimate attainment function can take just about any form when other things are not equal, e.g. duration of learning (Does learning last up until time of testing or only for a more or less constant number of years or is it dependent on age itself?) or intensity of learning (Do learners always learn at their maximum susceptibility level or does this intensity vary as a function of age, duration, present attainment and motivation?). The integral of the susceptibility function could therefore be of virtually unlimited complexity and its parameters could be adjusted to fit any age of acquisition–ultimate attainment pattern. It therefore seems astonishing that the distinction between level of sensitivity to language input and level of ultimate attainment is rarely made in the literature. Implicitly or explicitly [20], the two are more or less equated and the same mathematical functions are expected to describe the two variables if observed across a range of starting ages of acquisition.
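How strongly the shape of the attainment curve depends on assumptions other than susceptibility can be illustrated numerically. The following Python sketch is purely illustrative: the susceptibility function, its parameter values (`cp_end`, `floor`) and the two exposure scenarios are assumptions invented for the example, not estimates from any study.

```python
def susceptibility(age, cp_end=15.0, floor=0.2):
    """Hypothetical susceptibility: falls linearly from 1 at birth to
    `floor` at `cp_end` and stays flat afterwards."""
    if age >= cp_end:
        return floor
    return 1.0 - (1.0 - floor) * age / cp_end


def attainment(aoa, duration, step=0.01):
    """Midpoint Riemann sum approximating the integral of the
    susceptibility function from aoa to aoa + duration."""
    n = int(round(duration / step))
    return sum(susceptibility(aoa + (i + 0.5) * step) for i in range(n)) * step


starting_ages = (0, 5, 10, 15, 20, 30)

# Scenario 1 (assumed): every learner learns for exactly 10 years.
fixed = {aoa: attainment(aoa, 10.0) for aoa in starting_ages}

# Scenario 2 (assumed): every learner is tested at age 40,
# so the duration of learning shrinks as aoa increases.
until_40 = {aoa: attainment(aoa, 40.0 - aoa) for aoa in starting_ages}
```

With the same susceptibility function, the first scenario yields an attainment curve that is flat after age 15, whereas in the second scenario attainment keeps declining after age 15 simply because later starters have had less time to learn.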

But even when the susceptibility and ultimate attainment variables are equated, there remains controversy as to what function linking age of onset of acquisition and ultimate attainment would actually constitute evidence for a critical period. Most scholars agree that not any kind of age effect constitutes such evidence. More specifically, the age of acquisition–ultimate attainment function would need to be different before and after the end of the cp [9] . According to Birdsong [9] , three basic possible patterns proposed in the literature meet this condition. These patterns are presented in Figure 1 . The first pattern describes a steep decline of the age of onset of acquisition ( aoa )–ultimate attainment ( ua ) function up to the end of the cp and a practically non-existent age effect thereafter. Pattern 2 is an “unconventional, although often implicitly invoked” [9, p. 17] notion of the cp function which contains a period of peak attainment (or performance at ceiling), i.e. performance does not vary as a function of age, which is often referred to as a ‘window of opportunity’. This time span is followed by an unbounded decline in ua depending on aoa . Pattern 3 includes characteristics of patterns 1 and 2. At the beginning of the aoa range, performance is at ceiling. The next segment is a downward slope in the age function which ends when performance reaches its floor. Birdsong points out that all of these patterns have been reported in the literature. On closer inspection, however, he concludes that the most convincing function describing these age effects is a simple linear one. Hakuta et al. [21] sketch further theoretically possible predictions of the cph in which the mean performance drops drastically and/or the slope of the aoa – ua proficiency function changes at a certain point.

Figure 1. The graphs are based on Figure 2 in [9].

https://doi.org/10.1371/journal.pone.0069172.g001
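The three patterns described by Birdsong can be written down as simple piecewise functions of aoa. The Python sketch below is only meant to make the verbal descriptions concrete; all numerical values (ceiling, floor, slopes and the ages at which segments end) are arbitrary assumptions.

```python
def pattern_1(aoa, cp_end=15.0, ceiling=100.0, floor=40.0):
    """Decline up to the end of the cp, no age effect afterwards."""
    if aoa >= cp_end:
        return floor
    return ceiling - (ceiling - floor) * aoa / cp_end


def pattern_2(aoa, window_end=7.0, ceiling=100.0, slope=-2.0):
    """Performance at ceiling within the 'window of opportunity',
    followed by an unbounded decline."""
    return ceiling + slope * max(aoa - window_end, 0.0)


def pattern_3(aoa, window_end=7.0, cp_end=20.0, ceiling=100.0, floor=40.0):
    """Ceiling first, then a downward slope, then performance at floor."""
    if aoa <= window_end:
        return ceiling
    if aoa >= cp_end:
        return floor
    return ceiling - (ceiling - floor) * (aoa - window_end) / (cp_end - window_end)
```

Each function contains at least one age at which the slope changes, which is the feature that sets these patterns apart from a simple linear age effect.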

Although several patterns have been proposed in the literature, it bears pointing out that the most common explicit prediction corresponds to Birdsong's first pattern, as exemplified by the following crystal-clear statement by DeKeyser, one of the foremost cph proponents:

[A] strong negative correlation between age of acquisition and ultimate attainment throughout the lifespan (or even from birth through middle age), the only age effect documented in many earlier studies, is not evidence for a critical period…[T]he critical period concept implies a break in the AoA–proficiency function, i.e., an age (somewhat variable from individual to individual, of course, and therefore an age range in the aggregate) after which the decline of success rate in one or more areas of language is much less pronounced and/or clearly due to different reasons. [22, p. 445].

DeKeyser and, before him, Johnson and Newport [23] among others thus conceptualise only one possible pattern which would speak in favour of a critical period: a clear negative age effect before the end of the critical period and a much weaker (if any) negative correlation between age and ultimate attainment after it. This ‘flattened slope’ prediction has the virtue of being much more tangible than the ‘potential nativelikeness’ prediction: Testing it does not necessarily require comparing the L2-learners to a native control group and thus effectively comparing apples and oranges. Rather, L2-learners with different aoas can be compared amongst themselves without the need to categorise them by means of a native-speaker yardstick, the validity of which is inevitably going to be controversial [15]. In what follows, I will concern myself solely with the ‘flattened slope’ prediction, arguing that, despite its clarity of formulation, cph research has generally used analytical methods that are irrelevant for the purposes of actually testing it.

Inferring non-linearities in critical period research: An overview

Group mean or proportion comparisons.

[T]he main differences can be found between the native group and all other groups – including the earliest learner group – and between the adolescence group and all other groups. However, neither the difference between the two childhood groups nor the one between the two adulthood groups reached significance, which indicates that the major changes in eventual perceived nativelikeness of L2 learners can be associated with adolescence. [15, p. 270].

Similar group comparisons aimed at investigating the effect of aoa on ua have been carried out by both cph advocates and sceptics (among whom Bialystok and Miller [25, pp. 136–139], Birdsong and Molis [26, p. 240], Flege [27, pp. 120–121], Flege et al. [28, pp. 85–86], Johnson [29, p. 229], Johnson and Newport [23, p. 78], McDonald [30, pp. 408–410] and Patkowski [31, pp. 456–458]). To be clear, not all of these authors drew direct conclusions about the aoa–ua function on the basis of these group comparisons, but their group comparisons have been cited as indicative of a cph-consistent non-continuous age effect, as exemplified by the following quote by DeKeyser [22]:

Where group comparisons are made, younger learners always do significantly better than the older learners. The behavioral evidence, then, suggests a non-continuous age effect with a “bend” in the AoA–proficiency function somewhere between ages 12 and 16. [22, p. 448].

The first problem with group comparisons like these and drawing inferences on the basis thereof is that they require that a continuous variable, aoa , be split up into discrete bins. More often than not, the boundaries between these bins are drawn in an arbitrary fashion, but what is more troublesome is the loss of information and statistical power that such discretisation entails (see [32] for the extreme case of dichotomisation). If we want to find out more about the relationship between aoa and ua , why throw away most of the aoa information and effectively reduce the ua data to group means and the variance in those groups?

Comparison of correlation coefficients.

Correlation-based inferences about slope discontinuities have similarly explicitly been made by cph advocates and sceptics alike, e.g. Bialystok and Miller [25, pp. 136 and 140], DeKeyser and colleagues [22], [44] and Flege et al. [45, pp. 166 and 169]. Others did not explicitly infer the presence or absence of slope differences from the subset correlations they computed (among others Birdsong and Molis [26], DeKeyser [8], Flege et al. [28] and Johnson [29]), but their studies nevertheless featured in overviews discussing discontinuities [14], [22]. Indeed, the most recent overview draws a strong conclusion about the validity of the cph's ‘flattened slope’ prediction on the basis of these subset correlations:

In those studies where the two groups are described separately, the correlation is much higher for the younger than for the older group, except in Birdsong and Molis (2001) [= [26], JV], where there was a ceiling effect for the younger group. This global picture from more than a dozen studies provides support for the non-continuity of the decline in the AoA–proficiency function, which all researchers agree is a hallmark of a critical period phenomenon. [22, p. 448].

In Johnson and Newport's specific case [23] , their correlation-based inference that ua levels off after puberty happened to be largely correct: the gjt scores are more or less randomly distributed around a near-horizontal trend line [26] . Ultimately, however, it rests on the fallacy of confusing correlation coefficients with slopes, which seriously calls into question conclusions such as DeKeyser's (cf. the quote above).

Figure 2.

https://doi.org/10.1371/journal.pone.0069172.g002

Lower correlation coefficients in older aoa groups may therefore be largely due to differences in ua variance, which have been reported in several studies [23] , [26] , [28] , [29] (see [46] for additional references). Greater variability in ua with increasing age is likely due to factors other than age proper [47] , such as the concomitant greater variability in exposure to literacy, degree of education, motivation and opportunity for language use, and by itself represents evidence neither in favour of nor against the cph .
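That a lower correlation coefficient need not reflect a flatter slope can be checked with a small numerical example. In the Python sketch below, all data are invented for the purpose of illustration: two groups share exactly the same slope (a decrease of 2 points per year of aoa) and differ only in how widely the ua scores scatter around the regression line.

```python
from math import sqrt


def slope_and_r(x, y):
    """Return the least-squares slope of y on x and Pearson's r."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    return sxy / sxx, sxy / sqrt(sxx * syy)


# Invented data: identical underlying slope, different residual spread.
aoa = [0, 1, 2, 3, 4, 5, 6, 7]
noise = [1, -1, -1, 1, -1, 1, 1, -1]  # sums to zero and is uncorrelated with aoa
ua_tight = [100 - 2 * a + 1 * e for a, e in zip(aoa, noise)]
ua_noisy = [100 - 2 * a + 6 * e for a, e in zip(aoa, noise)]

slope_tight, r_tight = slope_and_r(aoa, ua_tight)
slope_noisy, r_noisy = slope_and_r(aoa, ua_noisy)
```

Both estimated slopes equal -2, yet the correlation coefficient is about -0.98 in the first group and about -0.61 in the second. The difference in r stems entirely from the difference in ua variance.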

Regression approaches.

Having demonstrated that neither group mean or proportion comparisons nor correlation coefficient comparisons can directly address the ‘flattened slope’ prediction, I now turn to the studies in which regression models were computed with aoa as a predictor variable and ua as the outcome variable. Once again, this category of studies is not mutually exclusive with the two categories discussed above.

In a large-scale study using self-reports and approximate aoa s derived from a sample of the 1990 U.S. Census, Stevens found that the probability with which immigrants from various countries stated that they spoke English ‘very well’ decreased curvilinearly as a function of aoa [48] . She noted that this development is similar to the pattern found by Johnson and Newport [23] but that it contains no indication of an “abruptly defined ‘critical’ or sensitive period in L2 learning” [48, p. 569]. However, she modelled the self-ratings using an ordinal logistic regression model in which the aoa variable was logarithmically transformed. Technically, this is perfectly fine, but one should be careful not to read too much into the non-linear curves found. In logistic models, the outcome variable itself is modelled linearly as a function of the predictor variables and is expressed in log-odds. In order to compute the corresponding probabilities, these log-odds are transformed using the logistic function. Consequently, even if the model is specified linearly, the predicted probabilities will not lie on a perfectly straight line when plotted as a function of any one continuous predictor variable. Similarly, when the predictor variable is first logarithmically transformed and then used to linearly predict an outcome variable, the function linking the predicted outcome variables and the untransformed predictor variable is necessarily non-linear. Thus, non-linearities follow naturally from Stevens's model specifications. Moreover, cph -consistent discontinuities in the aoa – ua function cannot be found using her model specifications as they did not contain any parameters allowing for this.
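Both sources of built-in non-linearity can be made concrete with a few lines of Python. The coefficients below are hypothetical and are not taken from Stevens's models; adding 1 to aoa before taking the logarithm is likewise an assumption made here only to avoid log(0).

```python
from math import exp, log


def logistic(z):
    """Transform log-odds into a probability."""
    return 1.0 / (1.0 + exp(-z))


def steps(values):
    """Differences between consecutive values."""
    return [b - a for a, b in zip(values, values[1:])]


ages = [0, 10, 20, 30, 40]

# 1. A model that is linear in the log-odds (hypothetical coefficients).
intercept, slope = 3.0, -0.15
log_odds = [intercept + slope * a for a in ages]
probabilities = [logistic(z) for z in log_odds]
log_odds_steps = steps(log_odds)            # constant: the model is linear here
probability_steps = steps(probabilities)    # not constant: curved on this scale

# 2. A linear model of a log-transformed predictor (hypothetical coefficients).
log_prediction = [10.0 - 2.0 * log(a + 1) for a in ages]
log_prediction_steps = steps(log_prediction)  # shrink as aoa increases
```

In the first case, each ten-year increase in age changes the log-odds by the same amount, but the corresponding changes in probability differ. In the second case, equal age increments produce ever smaller changes in the predicted outcome. Neither curve therefore says anything about a critical period.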

Using data similar to Stevens's, Bialystok and Hakuta found that the link between the self-rated English competences of Chinese- and Spanish-speaking immigrants and their aoa could be described by a straight line [49]. In contrast to Stevens, Bialystok and Hakuta used a regression-based method allowing for changes in the function's slope, viz. locally weighted scatterplot smoothing (lowess). Informally, lowess is a non-parametric method that relies on an algorithm that fits the dependent variable for small parts of the range of the independent variable whilst guaranteeing that the overall curve does not contain sudden jumps (for technical details, see [50]). Hakuta et al. used an even larger sample from the same 1990 U.S. Census data on Chinese- and Spanish-speaking immigrants (2.3 million observations) [21]. When lowess curves were fitted, no discontinuities in the aoa–ua slope could be detected. Moreover, the authors found that piecewise linear regression models, i.e. regression models containing a parameter that allows a sudden drop in the curve or a change of its slope, did not provide a better fit to the data than did an ordinary regression model without such a parameter.

To sum up, I have argued at length that regression approaches are superior to group mean and correlation coefficient comparisons for the purposes of testing the ‘flattened slope’ prediction. Acknowledging the reservations vis-à-vis self-estimated ua s, we still find that while the relationship between aoa and ua is not necessarily perfectly linear in the studies discussed, the data do not lend unequivocal support to this prediction. In the following section, I will reanalyse data from a recent empirical paper on the cph by DeKeyser et al. [44] . The first goal of this reanalysis is to further illustrate some of the statistical fallacies encountered in cph studies. Second, by making the computer code available I hope to demonstrate how the relevant regression models, viz. piecewise regression models, can be fitted and how the aoa representing the optimal breakpoint can be identified. Lastly, the findings of this reanalysis will contribute to our understanding of how aoa affects ua as measured using a gjt .

Summary of DeKeyser et al. (2010)

I chose to reanalyse a recent empirical paper on the cph by DeKeyser et al. [44] (henceforth DK et al.). This paper lends itself well to a reanalysis since it exhibits two highly commendable qualities: the authors spell out their hypotheses lucidly and provide detailed numerical and graphical data descriptions. Moreover, the paper's lead author is very clear on what constitutes a necessary condition for accepting the cph: a non-linearity in the age of onset of acquisition (aoa)–ultimate attainment (ua) function, with ua declining less strongly as a function of aoa in older, post-cp arrivals compared to younger arrivals [14], [22]. Lastly, it claims to have found cross-linguistic evidence from two parallel studies backing the cph and should therefore be a source above suspicion for cph proponents.

The authors set out to test the following hypotheses:

  • Hypothesis 1: For both the L2 English and the L2 Hebrew group, the slope of the age of arrival–ultimate attainment function will not be linear throughout the lifespan, but will instead show a marked flattening between adolescence and adulthood.
  • Hypothesis 2: The relationship between aptitude and ultimate attainment will differ markedly for the young and older arrivals, with significance only for the latter. (DK et al., p. 417)

Both hypotheses were purportedly confirmed, which in the authors' view provides evidence in favour of the cph. The problem with this conclusion, however, is that it is based on a comparison of correlation coefficients. As I have argued above, correlation coefficients are not to be confused with regression coefficients and cannot be used to directly address research hypotheses concerning slopes, such as Hypothesis 1. In what follows, I will reanalyse the relationship between DK et al.'s aoa and gjt data in order to address Hypothesis 1. Additionally, I will lay bare a problem with the way in which Hypothesis 2 was addressed. The extracted data and the computer code used for the reanalysis are provided as supplementary materials, allowing anyone interested to scrutinise and easily reproduce my whole analysis and carry out their own computations (see ‘supporting information’).

Data extraction

In order to verify whether we did in fact extract the data points to a satisfactory degree of accuracy, I computed summary statistics for the extracted aoa and gjt data and checked these against the descriptive statistics provided by DK et al. (pp. 421 and 427). These summary statistics for the extracted data are presented in Table 1 . In addition, I computed the correlation coefficients for the aoa – gjt relationship for the whole aoa range and for aoa -defined subgroups and checked these coefficients against those reported by DK et al. (pp. 423 and 428). The correlation coefficients computed using the extracted data are presented in Table 2 . Both checks strongly suggest the extracted data to be virtually identical to the original data, and Dr DeKeyser confirmed this to be the case in response to an earlier draft of the present paper (personal communication, 6 May 2013).

Table 1.

https://doi.org/10.1371/journal.pone.0069172.t001

Table 2.

https://doi.org/10.1371/journal.pone.0069172.t002

Results and Discussion

Modelling the link between age of onset of acquisition and ultimate attainment.

I first replotted the aoa and gjt data we extracted from DK et al.'s scatterplots and added non-parametric scatterplot smoothers in order to investigate whether any changes in slope in the aoa – gjt function could be revealed, as per Hypothesis 1. Figures 3 and 4 show this not to be the case. Indeed, simple linear regression models that model gjt as a function of aoa provide decent fits for both the North America and the Israel data, explaining 65% and 63% of the variance in gjt scores, respectively. The parameters of these models are given in Table 3 .

Figure 3. The trend line is a non-parametric scatterplot smoother. The scatterplot itself is a near-perfect replication of DK et al.'s Fig. 1.

https://doi.org/10.1371/journal.pone.0069172.g003

Figure 4. The trend line is a non-parametric scatterplot smoother. The scatterplot itself is a near-perfect replication of DK et al.'s Fig. 5.

https://doi.org/10.1371/journal.pone.0069172.g004

Table 3.

https://doi.org/10.1371/journal.pone.0069172.t003

To ensure that both segments are joined at the breakpoint, the predictor variable is first centred at the breakpoint value, i.e. the breakpoint value is subtracted from the original predictor variable values. For a blow-by-blow account of how such models can be fitted in R, I refer to an example analysis by Baayen [55, pp. 214–222].
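The centring step can be sketched in a few lines of code. The example below is written in Python using only the standard library and is not the R code provided with this paper, nor Baayen's. It assumes one common parametrisation of a joined two-segment model, gjt = b0 + b1·min(aoa − bp, 0) + b2·max(aoa − bp, 0), so that b0 is the predicted score at the breakpoint bp and b1 and b2 are the slopes before and after it. The data are invented and noise-free, and the grid search over integer breakpoints is likewise only one simple way of locating the best-fitting breakpoint.

```python
def solve(a, b):
    """Solve the linear system a·x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit(rows, y):
    """Ordinary least squares via the normal equations.
    Returns (coefficients, residual sum of squares)."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * v for r, v in zip(rows, y)) for i in range(k)]
    coefs = solve(xtx, xty)
    rss = sum((v - sum(c * x for c, x in zip(coefs, r))) ** 2 for r, v in zip(rows, y))
    return coefs, rss


def piecewise_rows(aoa, breakpoint):
    """Centre aoa at the breakpoint: shared intercept, one slope per segment."""
    return [[1.0, min(a - breakpoint, 0.0), max(a - breakpoint, 0.0)] for a in aoa]


# Invented, noise-free data with a change of slope at aoa 18 (illustration only).
aoa = [float(a) for a in range(2, 41, 2)]
gjt = [150.0 - 3.0 * min(a - 18.0, 0.0) - 0.5 * max(a - 18.0, 0.0) for a in aoa]

coefs, rss_piecewise = fit(piecewise_rows(aoa, 18.0), gjt)
_, rss_linear = fit([[1.0, a] for a in aoa], gjt)

# Grid search: refit the model at each candidate breakpoint, keep the smallest RSS.
candidates = [float(bp) for bp in range(6, 35)]
best_breakpoint = min(candidates, key=lambda bp: fit(piecewise_rows(aoa, bp), gjt)[1])
```

Because both segments share the intercept `coefs[0]`, the fitted line cannot jump at the breakpoint; only its slope can change. With real, noisy data, the residual sums of squares of the model with and without a breakpoint would then have to be compared formally, since the model with a breakpoint will always fit at least as well.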

critical period hypothesis krashen

Solid: regression with breakpoint at aoa 18 (dashed lines represent its 95% confidence interval); dot-dash: regression without breakpoint.

https://doi.org/10.1371/journal.pone.0069172.g005

thumbnail

Solid: regression with breakpoint at aoa 18 (dashed lines represent its 95% confidence interval); dot-dash (hardly visible due to near-complete overlap): regression without breakpoint.

https://doi.org/10.1371/journal.pone.0069172.g006

thumbnail

https://doi.org/10.1371/journal.pone.0069172.t004

critical period hypothesis krashen

https://doi.org/10.1371/journal.pone.0069172.g007

thumbnail

Solid: regression with breakpoint at aoa 16 (dashed lines represent its 95% confidence interval); dot-dash: regression without breakpoint.

https://doi.org/10.1371/journal.pone.0069172.g008

thumbnail

Solid: regression with breakpoint at aoa 6 (dashed lines represent its 95% confidence interval); dot-dash (hardly visible due to near-complete overlap): regression without breakpoint.

https://doi.org/10.1371/journal.pone.0069172.g009

https://doi.org/10.1371/journal.pone.0069172.t005

https://doi.org/10.1371/journal.pone.0069172.t006

https://doi.org/10.1371/journal.pone.0069172.t007

https://doi.org/10.1371/journal.pone.0069172.t008

In sum, a regression model that allows for changes in the slope of the aoa – gjt function to account for putative critical period effects provides a somewhat better fit to the North American data than does an everyday simple regression model. The improvement in model fit is marginal, however, and including a breakpoint does not result in any detectable improvement of model fit to the Israel data whatsoever. Breakpoint models therefore fail to provide solid cross-linguistic support in favour of critical period effects: across both data sets, gjt can satisfactorily be modelled as a linear function of aoa .
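One common way of quantifying such an improvement in fit between a straight-line model and a nested breakpoint model is an F statistic computed from the two residual sums of squares. The sketch below is an assumption about how this could be done, not necessarily the comparison used in the reanalysis; the function name and all numbers are hypothetical. Note also that when the breakpoint position is itself estimated from the data, the nominal F distribution is only a rough guide.

```python
def f_statistic(rss_simple, df_simple, rss_complex, df_complex):
    """F statistic for comparing two nested least-squares models.

    rss_* are residual sums of squares; df_* are residual degrees of
    freedom (number of observations minus number of coefficients).
    """
    numerator = (rss_simple - rss_complex) / (df_simple - df_complex)
    denominator = rss_complex / df_complex
    return numerator / denominator


# Hypothetical values (assumption): 76 observations, a straight line with
# 2 coefficients versus a breakpoint model with 3 coefficients.
f_value = f_statistic(rss_simple=120.0, df_simple=74,
                      rss_complex=115.0, df_complex=73)
print(round(f_value, 2))
```

The resulting value would then be referred to an F distribution with 1 and 73 degrees of freedom in this hypothetical case.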

On partialling out ‘age at testing’

As I have argued above, correlation coefficients cannot be used to test hypotheses about slopes. When the correct procedure is carried out on DK et al.'s data, no cross-linguistically robust evidence for changes in the aoa – gjt function is found. In addition to comparing the zero-order correlations between aoa and gjt , however, DK et al. computed partial correlations in which the variance in aoa associated with the participants' age at testing ( aat ; a potentially confounding variable) was filtered out. They found that these partial correlations between aoa and gjt , which are given in Table 9 , differed between age groups in that they were stronger for younger than for older participants. This, DK et al. argue, constitutes additional evidence in favour of the cph . At this point, I can no longer provide my own analysis of DK et al.'s data seeing as the pertinent data points were not plotted. Nevertheless, the detailed descriptions by DK et al. strongly suggest that the use of these partial correlations is highly problematic. Most importantly, and to reiterate, correlations (whether zero-order or partial ones) are actually of no use when testing hypotheses concerning slopes. Still, one may wonder why the partial correlations differ across age groups. My surmise is that these differences are at least partly the by-product of an imbalance in the sampling procedure.

https://doi.org/10.1371/journal.pone.0069172.t009

The upshot of this brief discussion is that the partial correlation differences reported by DK et al. are at least partly the result of an imbalance in the sampling procedure: aoa and aat were simply less intimately tied for the young arrivals in the North America study than for the older arrivals with L2 English or for all of the L2 Hebrew participants. In an ideal world, we would like to fix aat or ascertain that it at most only weakly correlates with aoa . This, however, would result in a strong correlation between aoa and another potential confound variable, length of residence in the L2 environment, bringing us back to square one. Allowing for only moderate correlations between aoa and aat might improve our predicament somewhat, but even in that case, we should tread lightly when making inferences on the basis of statistical control procedures [61] .
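How the tie between aoa and aat feeds into a partial correlation can be illustrated with the standard first-order partial correlation formula. The sketch below is in Python, the function name is hypothetical, and all correlation values are made up; they are not the values in Table 9, and the formula shown may differ in detail from the procedure DK et al. applied.

```python
import math


def partial_corr(r_xy, r_xz, r_yz):
    """First-order partial correlation between x and y, controlling for z."""
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


# Made-up values (assumption): x = aoa, y = gjt, z = aat.
# Group in which aoa and aat are only moderately tied.
weak_tie = partial_corr(r_xy=-0.6, r_xz=0.5, r_yz=-0.3)

# Group with the same aoa-gjt correlation but aoa and aat strongly tied.
strong_tie = partial_corr(r_xy=-0.6, r_xz=0.9, r_yz=-0.5)

print(round(weak_tie, 2), round(strong_tie, 2))
```

In this made-up example the zero-order aoa – gjt correlation is identical in both groups, yet the partial correlations differ simply because aat is tied to the other variables to different degrees.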

On estimating the role of aptitude

Having shown that Hypothesis 1 could not be confirmed, I now turn to Hypothesis 2, which predicts a differential role of aptitude for ua in sla in different aoa groups. More specifically, it states that the correlation between aptitude and gjt performance will be significant only for older arrivals. The correlation coefficients of the relationship between aptitude and gjt are presented in Table 10 .

https://doi.org/10.1371/journal.pone.0069172.t010

The problem with both the wording of Hypothesis 2 and the way in which it is addressed is the following: it is assumed that a variable has a reliably different effect in different groups when the effect reaches significance in one group but not in the other. This logic is fairly widespread within several scientific disciplines (see e.g. [62] for a discussion). Nonetheless, it is demonstrably fallacious [63] . Here we will illustrate the fallacy for the specific case of comparing two correlation coefficients.
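A minimal sketch of a direct comparison of two independent correlation coefficients, using Fisher's r-to-z transformation, is given below. It is written in Python, the function name and the sample values are hypothetical, and it is not taken from Script S1. With 20 participants per group, r = 0.45 reaches significance at the 0.05 level whereas r = 0.30 does not, yet the test of their difference is far from significant.

```python
import math


def compare_correlations(r1, n1, r2, n2):
    """Fisher z-test for the difference between two independent correlations.

    Returns the z statistic and the two-sided p-value.
    """
    z1, z2 = math.atanh(r1), math.atanh(r2)
    standard_error = math.sqrt(1 / (n1 - 3) + 1 / (n2 - 3))
    z = (z1 - z2) / standard_error
    # Two-sided p-value from the standard normal distribution.
    p = math.erfc(abs(z) / math.sqrt(2))
    return z, p


# Made-up values (assumption), not the coefficients in Table 10.
z, p = compare_correlations(r1=0.45, n1=20, r2=0.30, n2=20)
print(round(z, 2), round(p, 2))
```

A significant effect in one group and a non-significant effect in the other therefore does not by itself license the conclusion that the two effects differ.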

Apart from not being replicated in the North America study, does this difference actually show anything? I contend that it does not: what is of interest are not so much the correlation coefficients, but rather the interactions between aoa and aptitude in models predicting gjt . These interactions could be investigated by fitting a multiple regression model in which the postulated cp breakpoint governs the slope of both aoa and aptitude. If such a model provided a substantially better fit to the data than a model without a breakpoint for the aptitude slope and if the aptitude slope changes in the expected direction (i.e. a steeper slope for post- cp than for younger arrivals) for different L1–L2 pairings, only then would this particular prediction of the cph be borne out.

Using data extracted from a paper reporting on two recent studies that purport to provide evidence in favour of the cph and that, according to its authors, represent a major improvement over earlier studies (DK et al., p. 417), it was found that neither of its two hypotheses was actually confirmed when using the proper statistical tools. As a matter of fact, the gjt scores continue to decline at essentially the same rate even beyond the end of the putative critical period. According to the paper's lead author, such a finding represents a serious problem to his conceptualisation of the cph [14] . Moreover, although modelling a breakpoint representing the end of a cp at aoa 16 may improve the statistical model slightly in the study on learners of English in North America, the study on learners of Hebrew in Israel fails to confirm this finding. In fact, even if we were to accept the optimal breakpoint computed for the Israel study, it lies at aoa 6 and is associated with a different geometrical pattern.

Diverging age trends in parallel studies with participants with different L2s have similarly been reported by Birdsong and Molis [26] and are at odds with an L2-independent cph . One parsimonious explanation of such conflicting age trends may be that the overall, cross-linguistic age trend is in fact linear, but that fluctuations in the data (due to factors unaccounted for or randomness) may sometimes give rise to a ‘stretched L’-shaped pattern ( Figure 1, left panel ) and sometimes to a ‘stretched 7’-shaped pattern ( Figure 1 , middle panel; see also [66] for a similar comment).

Importantly, the criticism that DeKeyser and Larson-Hall levy against two studies reporting findings similar to the present [48] , [49] , viz. that the data consisted of self-ratings of questionable validity [14] , does not apply to the present data set. In addition, DK et al. did not exclude any outliers from their analyses, so I assume that DeKeyser and Larson-Hall's criticism [14] of Birdsong and Molis's study [26] , i.e. that the findings were due to the influence of outliers, is not applicable to the present data either. For good measure, however, I refitted the regression models with and without breakpoints after excluding one potentially problematic data point per model. The following data points had absolute standardised residuals larger than 2.5 in the original models without breakpoints as well as in those with breakpoints: the participant with aoa 17 and a gjt score of 125 in the North America study and the participant with aoa 12 and a gjt score of 117 in the Israel study. The resultant models were virtually identical to the original models (see Script S1 ). Furthermore, the aoa variable was sufficiently fine-grained and the aoa – gjt curve was not ‘presmoothed’ by the prior aggregation of gjt across parts of the aoa range (see [51] for such a criticism of another study). Lastly, seven of the nine “problems with supposed counter-evidence” to the cph discussed by Long [5] do not apply either, viz. (1) “[c]onfusion of rate and ultimate attainment”, (2) “[i]nappropriate choice of subjects”, (3) “[m]easurement of AO”, (4) “[l]eading instructions to raters”, (6) “[u]se of markedly non-native samples making near-native samples more likely to sound native to raters”, (7) “[u]nreliable or invalid measures”, and (8) “[i]nappropriate L1–L2 pairings”. Problem No. 5 (“Assessments based on limited samples and/or “language-like” behavior”) may be apropos given that only gjt data were used, leaving open the theoretical possibility that other measures might have yielded a different outcome. Finally, problem No. 9 (“Faulty interpretation of statistical patterns”) is, of course, precisely what I have turned the spotlight on.
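The outlier screening described above can be sketched as follows for the model without a breakpoint. This is an illustrative Python version under stated assumptions, not the R code in Script S1: the function name is hypothetical, the data are made up, and 'standardised residual' is taken here to mean a residual divided by its estimated standard deviation, leverage included.

```python
import math


def standardised_residuals(x, y):
    """Standardised residuals of a simple regression of y on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / sxx
    intercept = mean_y - slope * mean_x
    residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    # Residual standard error with n - 2 degrees of freedom.
    sigma = math.sqrt(sum(e * e for e in residuals) / (n - 2))
    result = []
    for xi, e in zip(x, residuals):
        # Leverage of an observation in simple linear regression.
        leverage = 1 / n + (xi - mean_x) ** 2 / sxx
        result.append(e / (sigma * math.sqrt(1 - leverage)))
    return result


# Made-up data (assumption): a straight line with one aberrant score.
aoa = list(range(1, 13))
gjt = [200 - 3 * a for a in aoa]
gjt[5] += 40

flagged = [i for i, r in enumerate(standardised_residuals(aoa, gjt))
           if abs(r) > 2.5]
print(flagged)
```

The model would then be refitted without the flagged observations and the two sets of coefficients compared.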

Conclusions

The critical period hypothesis remains a hotly contested issue in the psycholinguistics of second-language acquisition. Discussions about the impact of empirical findings on the tenability of the cph generally revolve around the reliability of the data gathered (e.g. [5] , [14] , [22] , [52] , [67] , [68] ) and such methodological critiques are of course highly desirable. Furthermore, the debate often centres on the question of exactly what version of the cph is being vindicated or debunked. These versions differ mainly in terms of their scope, specifically with regard to the relevant age span, setting and language area, and the testable predictions they make. But even when the cph 's scope is clearly demarcated and its main prediction is spelt out lucidly, the issue remains to what extent the empirical findings can actually be marshalled in support of the relevant cph version. As I have shown in this paper, empirical data have often been taken to support cph versions predicting that the relationship between age of acquisition and ultimate attainment is not strictly linear, even though the statistical tools most commonly used (notably group mean and correlation coefficient comparisons) were, crudely put, irrelevant to this prediction. Methods that are arguably valid, e.g. piecewise regression and scatterplot smoothing, have been used in some studies [21] , [26] , [49] , but these studies have been criticised on other grounds. To my knowledge, such methods have never been used by scholars who explicitly subscribe to the cph .

I suspect that what may be going on is a form of ‘confirmation bias’ [69] , a cognitive bias at play in diverse branches of human knowledge seeking: Findings judged to be consistent with one's own hypothesis are hardly questioned, whereas findings inconsistent with one's own hypothesis are scrutinised much more strongly and criticised on all sorts of points [70] – [73] . My reanalysis of DK et al.'s recent paper may be a case in point. cph exponents used correlation coefficients to address their prediction about the slope of a function, as had been done in a host of earlier studies. Finding a result that squared with their expectations, they did not question the technical validity of their results, or at least they did not report this. (In fact, my reanalysis is actually a case in point in two respects: for an earlier draft of this paper, I had computed the optimal position of the breakpoints incorrectly, resulting in an insignificant improvement of model fit for the North American data rather than a borderline significant one. Finding a result that squared with my expectations, I did not question the technical validity of my results – until this error was kindly pointed out to me by Martijn Wieling (University of Tübingen).) That said, I am keen to point out that the statistical analyses in this particular paper, though suboptimal, are, as far as I could gather, reported correctly, i.e. the confirmation bias does not seem to have resulted in the blatant misreportings found elsewhere (see [74] for empirical evidence and discussion). An additional point to these authors' credit is that, apart from explicitly identifying their cph version's scope and making crystal-clear predictions, they present data descriptions that actually permit quantitative reassessments and have a history of doing so (e.g. the appendix in [8] ). 
This leads me to believe that they analysed their data all in good conscience and to hope that they, too, will conclude that their own data do not, in fact, support their hypothesis.

I end this paper on an upbeat note. Even though I have argued that the analytical tools employed in cph research generally leave much to be desired, the original data are, so I hope, still available. This provides researchers, cph supporters and sceptics alike, with an exciting opportunity to reanalyse their data sets using the tools outlined in the present paper and publish their findings at minimal cost of time and resources (for instance, as a comment to this paper). I would therefore encourage scholars to engage their old data sets and to communicate their analyses openly, e.g. by voluntarily publishing their data and computer code alongside their articles or comments. Ideally, cph supporters and sceptics would join forces to agree on a protocol for a high-powered study in order to provide a truly convincing answer to a core issue in sla .

Supporting Information

Dataset S1.

aoa and gjt data extracted from DeKeyser et al.'s North America study.

https://doi.org/10.1371/journal.pone.0069172.s001

Dataset S2.

aoa and gjt data extracted from DeKeyser et al.'s Israel study.

https://doi.org/10.1371/journal.pone.0069172.s002

Script S1.

Script with annotated R code used for the reanalysis. All add-on packages used can be installed from within R.

https://doi.org/10.1371/journal.pone.0069172.s003

Acknowledgments

I would like to thank Irmtraud Kaiser (University of Fribourg) for helping me to get an overview of the literature on the critical period hypothesis in second language acquisition. Thanks are also due to Martijn Wieling (currently University of Tübingen) for pointing out an error in the R code accompanying an earlier draft of this paper.

Author Contributions

Analyzed the data: JV. Wrote the paper: JV.

  • 1. Penfield W, Roberts L (1959) Speech and brain mechanisms. Princeton: Princeton University Press.
  • 2. Lenneberg EH (1967) Biological foundations of language. New York: Wiley.
  • 10. Long MH (2007) Problems in SLA. Mahwah, NJ: Lawrence Erlbaum.
  • 14. DeKeyser R, Larson-Hall J (2005) What does the critical period really mean? In: Kroll and De Groot [75], 88–108.
  • 19. Newport EL (1991) Contrasting conceptions of the critical period for language. In: Carey S, Gelman R, editors, The epigenesis of mind: Essays on biology and cognition, Hillsdale, NJ: Lawrence Erlbaum. 111–130.
  • 20. Birdsong D (2005) Interpreting age effects in second language acquisition. In: Kroll and De Groot [75], 109–127.
  • 22. DeKeyser R (2012) Age effects in second language learning. In: Gass SM, Mackey A, editors, The Routledge handbook of second language acquisition, London: Routledge. 442–460.
  • 24. Weisstein EW. Discontinuity. From MathWorld –A Wolfram Web Resource. Available: http://mathworld.wolfram.com/Discontinuity.html . Accessed 2012 March 2.
  • 27. Flege JE (1999) Age of learning and second language speech. In: Birdsong [76], 101–132.
  • 36. Champely S (2009) pwr: Basic functions for power analysis. Available: http://cran.r-project.org/package=pwr . R package, version 1.1.1.
  • 37. R Core Team (2013) R: A language and environment for statistical computing. Available: http://www.r-project.org/ . Software, version 2.15.3.
  • 47. Hyltenstam K, Abrahamsson N (2003) Maturational constraints in sla . In: Doughty CJ, Long MH, editors, The handbook of second language acquisition, Malden, MA: Blackwell. 539–588.
  • 49. Bialystok E, Hakuta K (1999) Confounded age: Linguistic and cognitive factors in age differences for second language acquisition. In: Birdsong [76], 161–181.
  • 52. DeKeyser R (2006) A critique of recent arguments against the critical period hypothesis. In: Abello-Contesse C, Chacón-Beltrán R, López-Jiménez MD, Torreblanca-López MM, editors, Age in L2 acquisition and teaching, Bern: Peter Lang. 49–58.
  • 55. Baayen RH (2008) Analyzing linguistic data: A practical introduction to statistics using R. Cambridge: Cambridge University Press.
  • 56. Fox J (2002) Robust regression. Appendix to An R and S-Plus Companion to Applied Regression. Available: http://cran.r-project.org/doc/contrib/Fox-Companion/appendix.html .
  • 57. Ripley B, Hornik K, Gebhardt A, Firth D (2012) MASS: Support functions and datasets for Venables and Ripley's MASS. Available: http://cran.r-project.org/package=MASS . R package, version 7.3–17.
  • 58. Zuur AF, Ieno EN, Walker NJ, Saveliev AA, Smith GM (2009) Mixed effects models and extensions in ecology with R. New York: Springer.
  • 59. Pinheiro J, Bates D, DebRoy S, Sarkar D, R Core Team (2013) nlme: Linear and nonlinear mixed effects models. Available: http://cran.r-project.org/package=nlme . R package, version 3.1–108.
  • 65. Field A (2009) Discovering statistics using SPSS. London: SAGE 3rd edition.
  • 66. Birdsong D (2009) Age and the end state of second language acquisition. In: Ritchie WC, Bhatia TK, editors, The new handbook of second language acquisition, Bingley: Emerald. 401–424.
  • 75. Kroll JF, De Groot AMB, editors (2005) Handbook of bilingualism: Psycholinguistic approaches. New York: Oxford University Press.
  • 76. Birdsong D, editor (1999) Second language acquisition and the critical period hypothesis. Mahwah, NJ: Lawrence Erlbaum.

Is there a single key issue in the field of second language acquisition/learning, an as yet unresolved matter on which all else depends? A good case could be made for the question of whether or not there is a critical period for second language learning being just such a key issue. In other words, does the nature of second language acquisition change if the first exposure to the new language comes after a certain age? This question is closely linked to the question of whether first language (L1) acquisition and second language (L2) acquisition are essentially the same process, or very similar processes, and if so whether this is the case for some learners, or for all. In practical terms, it could be central not only to such issues as the optimal age at which children should start learning foreign languages, but also to the best teaching/learning approach for adults. Krashen's Input Hypothesis (Krashen, 1985) is totally undermined if a critical period does indeed exist, since the hypothesis assumes not only that L2 acquisition is similar in nature to L1 acquisition, but also that this is the case for learners of any age. Although many would claim that Krashen's theories are seriously flawed in any case, their influence in the field of second language teaching can hardly be denied. Issues such as the relative importance of lexis and syntax in teaching materials must ultimately link back to the way in which second language knowledge is organised in the brain. If that organisation is different in learners who have first been exposed to L2 after a certain age, then this has a bearing on choice of teaching approach. Yes, I believe there is a strong prima facie case for regarding the debate over the Critical Period Hypothesis (CPH) as a central issue.

The concept of a critical period is well known in nature. One example is imprinting in ducks and geese, where it is claimed that ducklings and goslings can be induced to adopt chickens, people, or even mechanical objects as their mothers if they encounter them within a certain short period after hatching. (Note, however, that the exact nature of even this apparently well-documented instance of a critical period is now coming under fire; see Hoffmann, 1996). In humans, on the basis of extant evidence, it seems that there is a critical period for first language acquisition; those unfortunate persons who are not exposed to any language before puberty seem unable to properly acquire the syntax of their first language later in life. (Inevitably, our knowledge in this area is sketchy and unreliable, being based solely on a very few cases, of which that of "Genie" is the most celebrated and best known; see Eubank and Gregg's article in Birdsong for a discussion.)

Provided that a person learns a first language in the normal way, the question is then whether there is a certain biologically-determined critical period during which that person can acquire further languages using one mental mechanism, probably resulting in a high level of achievement if learning continues, and after which the learning process for new languages changes, so that the learning outcome will not be as good. Note that we are not talking here about the commonly-observed and widely-accepted generalisation that learning gets harder as one gets older; nor is the question one of whether changes in attitudes or situation alter the learning process as one gets older. The issue is whether a fundamental change in the learning process and thus in potential learning outcomes related to second languages occurs in the brain at a fairly fixed age, closing a biological "window of opportunity" (although as Birdsong points out in his introduction to this book, there is no single formulation of the Critical Period Hypothesis, but a number of different versions of the theory).

This book contributes to the debate by juxtaposing a number of papers which consider the CPH from a variety of points of view, and which arrive at a variety of conclusions. Most of the papers in the book are based on talks given at an AILA symposium on the CPH which took place in Finland in 1996. It must have been quite a conference; the names of the contributors to this book make up a Who's Who of researchers in fields related to the CPH issue, and the diversity of the opinions held by the contributors must have made for some sharp exchanges. The book contains research papers by both proponents and opponents of a CPH for SLA, thus drawing the reader into the controversy.

What you get in the book is what you might expect from the above description. First, it must be said that it is a fairly tough read. Some of the writers are easier to follow than others, but these are research papers, and anyone unfamiliar with the fields covered--and there is a considerable range of fields--is likely to have to work quite hard at some of the texts at least. Second, there is no overall conclusion, even though the editor does have his own clearly-expressed view. This is not because of differences in the interpretation of data; it is because the various writers operate in different areas of research, each casting a different light on the central issue. These varied areas of research produce conclusions which point in different directions, and because of the lack of common ground on which to debate, the differences cannot easily be resolved. Third, there is an unevenness about the book. Some writers report on tentative conclusions from ongoing research; others simply reproduce material on completed projects which can be found in almost identical form elsewhere. The relevance of the research presented to the central issue also varies. These points might be regarded as drawbacks. But the compensation comes in having so much relevant and fairly up-to-date material on the issue collected together in one volume, providing insights into current knowledge and thinking from a variety of angles.

In his introduction, Birdsong briefly surveys the background to the debate, outlining some of the arguments previously advanced for and against the existence of a critical period. He is particularly well suited to this task, having "changed sides" on the issue in the early 1990s. After the background section, Birdsong goes on to present a careful summary of each of the chapters in the book. While admitting his adherence to the "anti-CPH" camp, he makes no attempt to resolve the evidence presented in the various chapters, and concludes that in total the contributions to the book demonstrate "the richness, depth and breadth of the critical period enquiry" and that they "testify to the unmistakable centrality of the CPH in L2A research" (p. 18). The introduction is clearly written, and since it contains so much summarising material it can stand as a valuable survey of the field in itself. However, some of the chapters in the book do not lend themselves to brief summaries (the chapter by Eubank and Gregg, for example, is far too broad in scope) and thus reading the introduction is no substitute for reading the entire book.

There are three chapters providing evidence for the existence of an SLA critical period. First, Weber-Fox and Neville take a frontal approach to the issue with an investigation of neural activity while performing L2 tasks in subjects whose first exposure to L2 was at different ages. Their paper has the rather daunting title of "Functional Neural Subsystems Are Differentially Affected by Delays in Second Language Immersion: ERP and Behavioural Evidence in Bilinguals." The findings do not point to the existence of a single critical period; the patterns of change vary for different language tasks. The authors claim, fairly circumspectly, that "our findings are consistent with the hypothesis that the development of at least some neural subsystems for language processing is constrained by maturational changes, even in early childhood. Additionally, our results are compatible, at least in part, with aspects of Lenneberg's . . . original hypothesis that puberty may mark a significant point in language learning capacity and neural reorganization capabilities" (pp. 35-36). Eubank and Gregg, in a wide-ranging paper entitled "Critical Periods and (Second) Language Acquisition: Divide et Impera," recognise the importance of neurological investigation in their consideration of whether second language learners retain access to Universal Grammar, and find Weber-Fox and Neville's line of research a promising one. Sandwiched between these two chapters comes a paper by Hurford and Kirby which takes a very different approach to the problem. The writers' argument is an evolutionary one; they produce computer models to suggest that a critical period for language acquisition finishing at puberty inevitably evolves in order to produce maximum language learning by the time a reproductive age is reached. However, their argument appears predicated on the assumption that the level of an individual's language knowledge governs the likelihood of his or her being able to reproduce. 
Alas, it has never been my personal experience that linguistic ability provides a crucial advantage in the competition for sexual partners, and I do not find myself convinced in any great measure by this application of the currently fashionable evolutionary approach to exploring the nature of the human mind.

Then come three chapters arguing against the existence of a critical period. James Flege provides research evidence to show that level of achievement in pronunciation is closely related to age of first exposure to the second language. He claims that, even for children, the later in life the first exposure to L2, the greater the degree of foreign accent, with no sudden discontinuity in the figures at a certain age to suggest that a critical period has ended, a window of opportunity suddenly closed. Other hypotheses, he claims, can be advanced to explain the linear nature of the relationship between age of first exposure and L2 pronunciation, notably that pronunciation of L2 varies as a function of how well one pronounces L1. Theo Bongaerts takes a very different approach in his paper, which again focusses on pronunciation; his view is that people who begin learning L2 later in life can sometimes achieve native-like pronunciation. If such learners do indeed exist, and Bongaerts presents evidence to suggest that they do, then there can be no biological window of learning opportunity that closes at a fixed age; instead, there must be other explanations for the lack of success of the majority of learners. These two papers, then, while arriving at the same conclusion with regard to the existence of an L2 critical period, do so on the basis of more or less contradictory evidence! Finally, Bialystok and Hakuta, focussing primarily on syntax rather than pronunciation, again point to the lack of any age-related discontinuity in the nature of L2 acquisition. They also suggest that belief in a critical period may be the result of misattribution of causality in examining the evidence; even the neurological differences pointed out by Weber-Fox and Neville could be the result of differences in the learning experience, rather than causes of such differences.

This is all somewhat confusing, and the only conclusion that a reader can come to at the end of the book is that there are no easy answers on the CPH. What is clear is that the old notion that the nature of L2 acquisition changes suddenly and dramatically at around the age of 12-13 because of changes in the brain is much too simplistic (as has been generally recognised for some time). If there is any truth in the CPH, then there may be different critical periods for different language skills, different types of change at different ages. If on the other hand there is no physical change in the brain which can be directly related to language learning, other powerful explanations are needed to account for the dramatic decline in ultimate achievement generally seen in later second language learners compared to young children -- and such explanations are no more than tentative guesses at present. None of this is of much immediate help to the practising language teacher; it may even be in the long run that exact age of first L2 exposure and the CPH will not turn out to be such a central issue after all, at least not in a formal learning context. But whether it is itself a key field, or whether it simply takes us into other areas which are key fields, further research into the relationship between age and language learning is likely to help us delve deeper into the mysteries of the mechanisms of second language acquisition.

Birdsong's collection of papers describes a waystage--or rather, a series of different waystages on different roads--in research and speculation on the CPH issue. It is a significant publication, and important reading for those who need to keep up-to-date with second language acquisition research.

Hoffmann, H. S. (1997). Imprinting: A brief description. Available: http://www.animatedsoftware.com/family/howardsh/imprint.htm

Krashen, S. D. (1985). The input hypothesis: Issues and implications. London & New York: Longman.

Tim Caudery, University of Aarhus, Denmark


Introduction

In the long term and in immersion contexts, second-language (L2) learners starting acquisition early in life – and staying exposed to input and thus learning over several years or decades – undisputedly tend to outperform later learners. Apart from being misinterpreted as an argument in favour of early foreign language instruction, which takes place in wholly different circumstances, this general age effect is also sometimes taken as evidence for a so-called ‘critical period’ (cp) for second-language acquisition (sla). Derived from biology, the cp concept was famously introduced into the field of language acquisition by Penfield and Roberts in 1959 [1] and was refined by Lenneberg eight years later [2]. Lenneberg argued that language acquisition needed to take place between age two and puberty – a period which he believed to coincide with the lateralisation process of the brain. (More recent neurological research suggests that different time frames exist for the lateralisation process of different language functions. Most, however, close before puberty [3].) However, Lenneberg mostly drew on findings pertaining to first language development in deaf children, feral children or children with serious cognitive impairments in order to back up his claims. For him, the critical period concept was concerned with the implicit “automatic acquisition” [2, p. 176] in immersion contexts and did not preclude the possibility of learning a foreign language after puberty, albeit with much conscious effort and typically less success.

sla research adopted the critical period hypothesis (cph) and applied it to second and foreign language learning, resulting in a host of studies. In its most general version, the cph for sla states that the ‘susceptibility’ or ‘sensitivity’ to language input varies as a function of age, with adult L2 learners being less susceptible to input than child L2 learners. Importantly, the age–susceptibility function is hypothesised to be non-linear. Moving beyond this general version, we find that the cph is conceptualised in a multitude of ways [4]. This state of affairs requires scholars to make explicit their theoretical stance and assumptions [5], but has the obvious downside that critical findings risk being mitigated as posing a problem to only one aspect of one particular conceptualisation of the cph, whereas other conceptualisations remain unscathed. This overall vagueness concerns two areas in particular, viz. the delineation of the cph's scope and the formulation of testable predictions. Delineating the scope and formulating falsifiable predictions are, needless to say, fundamental stages in the scientific evaluation of any hypothesis or theory, but the lack of scholarly consensus on these points seems to be particularly pronounced in the case of the cph. This article therefore first presents a brief overview of differing views on these two stages. Then, once the scope of their cph version has been duly identified and empirical data have been collected using solid methods, it is essential that researchers analyse the data patterns soundly in order to assess the predictions made and that they draw justifiable conclusions from the results. As I will argue in great detail, however, the statistical analysis of data patterns as well as their interpretation in cph research – and this includes both critical and supportive studies and overviews – leaves a great deal to be desired. Reanalysing data from a recent cph-supportive study, I illustrate some common statistical fallacies in cph research and demonstrate how one particular cph prediction can be evaluated.

Delineating the scope of the critical period hypothesis

First, the age span for a putative critical period for language acquisition has been delimited in different ways in the literature [4] . Lenneberg's critical period stretched from two years of age to puberty (which he posits at about 14 years of age) [2] , whereas other scholars have drawn the cutoff point at 12, 15, 16 or 18 years of age [6] . Unlike Lenneberg, most researchers today do not define a starting age for the critical period for language learning. Some, however, consider the possibility of the critical period (or a critical period for a specific language area, e.g. phonology) ending much earlier than puberty (e.g. age 9 years [1] , or as early as 12 months in the case of phonology [7] ).

Second, some vagueness remains as to the setting that is relevant to the cph . Does the critical period constrain implicit learning processes only, i.e. only the untutored language acquisition in immersion contexts or does it also apply to (at least partly) instructed learning? Most researchers agree on the former [8] , but much research has included subjects who have had at least some instruction in the L2.

Third, there is no consensus on the scope of the cp as far as the areas of language are concerned. Most researchers agree that a cp is most likely to constrain the acquisition of pronunciation and grammar and, consequently, these are the areas primarily looked into in studies on the cph [9]. Some researchers have also tried to define distinguishable cps for the different language areas of phonetics, morphology and syntax and even for lexis (see [10] for an overview).

Fourth and last, research into the cph has focused on ‘ultimate attainment’ (ua) or the ‘final’ state of L2 proficiency rather than on the rate of learning. From research into the rate of acquisition (e.g. [11]–[13]), it has become clear that the cph cannot hold for the rate variable. In fact, it has been observed that adult learners proceed faster than child learners at the beginning stages of L2 acquisition. Though theoretical reasons for excluding the rate can be posited (the initial faster rate of learning in adults may be the result of more conscious cognitive strategies rather than of less conscious implicit learning, for instance), rate of learning might from a different perspective also be considered an indicator of ‘susceptibility’ or ‘sensitivity’ to language input. Nevertheless, contemporary sla scholars generally seem to concur that ua and not rate of learning is the dependent variable of primary interest in cph research. These and further scope delineation problems relevant to cph research are discussed in more detail by, among others, Birdsong [9], DeKeyser and Larson-Hall [14], Long [10] and Muñoz and Singleton [6].

Formulating testable hypotheses

Once the relevant cph 's scope has satisfactorily been identified, clear and testable predictions need to be drawn from it. At this stage, the lack of consensus on what the consequences or the actual observable outcome of a cp would have to look like becomes evident. As touched upon earlier, cph research is interested in the end state or ‘ultimate attainment’ ( ua ) in L2 acquisition because this “determines the upper limits of L2 attainment” [9, p. 10]. The range of possible ultimate attainment states thus helps researchers to explore the potential maximum outcome of L2 proficiency before and after the putative critical period.

One strong prediction made by some cph exponents holds that post- cp learners cannot reach native-like L2 competences. Identifying a single native-like post- cp L2 learner would then suffice to falsify all cph s making this prediction. Assessing this prediction is difficult, however, since it is not clear what exactly constitutes sufficient nativelikeness, as illustrated by the discussion on the actual nativelikeness of highly accomplished L2 speakers [15] , [16] . Indeed, there exists a real danger that, in a quest to vindicate the cph , scholars set the bar for L2 learners to match monolinguals increasingly higher – up to Swiftian extremes. Furthermore, the usefulness of comparing the linguistic performance in mono- and bilinguals has been called into question [6] , [17] , [18] . Put simply, the linguistic repertoires of mono- and bilinguals differ by definition and differences in the behavioural outcome will necessarily be found, if only one digs deep enough.

A second strong prediction made by cph proponents is that the function linking age of acquisition and ultimate attainment will not be linear throughout the whole lifespan. Before discussing what this function would have to look like in order for it to constitute cph-consistent evidence, I point out that the ultimate attainment variable can essentially be considered a cumulative measure dependent on the actual variable of interest in cph research, i.e. susceptibility to language input, as well as on such other factors as duration and intensity of learning (within and outside a putative cp) and possibly a number of other influencing factors. To elaborate, the behavioural outcome, i.e. ultimate attainment, can be assumed to be integrative to the susceptibility function, as Newport [19] correctly points out. Other things being equal, ultimate attainment will therefore decrease as susceptibility decreases. However, decreasing ultimate attainment levels in and by themselves represent no compelling evidence in favour of a cph. The form of the integrative curve must therefore be predicted clearly from the susceptibility function. Additionally, the age of acquisition–ultimate attainment function can take just about any form when other things are not equal, e.g. duration of learning (Does learning last up until time of testing or only for a more or less constant number of years or is it dependent on age itself?) or intensity of learning (Do learners always learn at their maximum susceptibility level or does this intensity vary as a function of age, duration, present attainment and motivation?). The integral of the susceptibility function could therefore be of virtually unlimited complexity and its parameters could be adjusted to fit any age of acquisition–ultimate attainment pattern. It seems therefore astonishing that the distinction between level of sensitivity to language input and level of ultimate attainment is rarely made in the literature. Implicitly or explicitly [20], the two are more or less equated and the same mathematical functions are expected to describe the two variables if observed across a range of starting ages of acquisition.

But even when the susceptibility and ultimate attainment variables are equated, there remains controversy as to what function linking age of onset of acquisition and ultimate attainment would actually constitute evidence for a critical period. Most scholars agree that not any kind of age effect constitutes such evidence. More specifically, the age of acquisition–ultimate attainment function would need to be different before and after the end of the cp [9] . According to Birdsong [9] , three basic possible patterns proposed in the literature meet this condition. These patterns are presented in Figure 1 . The first pattern describes a steep decline of the age of onset of acquisition ( aoa )–ultimate attainment ( ua ) function up to the end of the cp and a practically non-existent age effect thereafter. Pattern 2 is an “unconventional, although often implicitly invoked” [9, p. 17] notion of the cp function which contains a period of peak attainment (or performance at ceiling), i.e. performance does not vary as a function of age, which is often referred to as a ‘window of opportunity’. This time span is followed by an unbounded decline in ua depending on aoa . Pattern 3 includes characteristics of patterns 1 and 2. At the beginning of the aoa range, performance is at ceiling. The next segment is a downward slope in the age function which ends when performance reaches its floor. Birdsong points out that all of these patterns have been reported in the literature. On closer inspection, however, he concludes that the most convincing function describing these age effects is a simple linear one. Hakuta et al. [21] sketch further theoretically possible predictions of the cph in which the mean performance drops drastically and/or the slope of the aoa – ua proficiency function changes at a certain point.

Figure 1. The graphs are based on Figure 2 in [9].

Although several patterns have been proposed in the literature, it bears pointing out that the most common explicit prediction corresponds to Birdsong's first pattern, as exemplified by the following crystal-clear statement by DeKeyser, one of the foremost cph proponents:

[A] strong negative correlation between age of acquisition and ultimate attainment throughout the lifespan (or even from birth through middle age), the only age effect documented in many earlier studies, is not evidence for a critical period…[T]he critical period concept implies a break in the AoA–proficiency function, i.e., an age (somewhat variable from individual to individual, of course, and therefore an age range in the aggregate) after which the decline of success rate in one or more areas of language is much less pronounced and/or clearly due to different reasons. [22, p. 445].

DeKeyser and before him among others Johnson and Newport [23] thus conceptualise only one possible pattern which would speak in favour of a critical period: a clear negative age effect before the end of the critical period and a much weaker (if any) negative correlation between age and ultimate attainment after it. This ‘flattened slope’ prediction has the virtue of being much more tangible than the ‘potential nativelikeness’ prediction: Testing it does not necessarily require comparing the L2-learners to a native control group and thus effectively comparing apples and oranges. Rather, L2-learners with different aoa s can be compared amongst themselves without the need to categorise them by means of a native-speaker yardstick, the validity of which is inevitably going to be controversial [15] . In what follows, I will concern myself solely with the ‘flattened slope’ prediction, arguing that, despite its clarity of formulation, cph research has generally used analytical methods that are irrelevant for the purposes of actually testing it.

Inferring non-linearities in critical period research: An overview

Group mean or proportion comparisons

[T]he main differences can be found between the native group and all other groups – including the earliest learner group – and between the adolescence group and all other groups. However, neither the difference between the two childhood groups nor the one between the two adulthood groups reached significance, which indicates that the major changes in eventual perceived nativelikeness of L2 learners can be associated with adolescence. [15, p. 270].

Similar group comparisons aimed at investigating the effect of aoa on ua have been carried out by both cph advocates and sceptics (among whom Bialystok and Miller [25, pp. 136–139], Birdsong and Molis [26, p. 240], Flege [27, pp. 120–121], Flege et al. [28, pp. 85–86], Johnson [29, p. 229], Johnson and Newport [23, p. 78], McDonald [30, pp. 408–410] and Patkowski [31, pp. 456–458]). To be clear, not all of these authors drew direct conclusions about the aoa–ua function on the basis of these group comparisons, but their group comparisons have been cited as indicative of a cph-consistent non-continuous age effect, as exemplified by the following quote by DeKeyser [22]:

Where group comparisons are made, younger learners always do significantly better than the older learners. The behavioral evidence, then, suggests a non-continuous age effect with a “bend” in the AoA–proficiency function somewhere between ages 12 and 16. [22, p. 448].

The first problem with group comparisons like these and drawing inferences on the basis thereof is that they require that a continuous variable, aoa , be split up into discrete bins. More often than not, the boundaries between these bins are drawn in an arbitrary fashion, but what is more troublesome is the loss of information and statistical power that such discretisation entails (see [32] for the extreme case of dichotomisation). If we want to find out more about the relationship between aoa and ua , why throw away most of the aoa information and effectively reduce the ua data to group means and the variance in those groups?
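
The information loss entailed by discretisation can be illustrated with a minimal sketch. The data below are simulated and purely hypothetical (a linear aoa–ua trend with random noise), and the cut-off at aoa 30 is an arbitrary median split chosen for illustration, not a value taken from any study discussed here.

```python
import random


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


rng = random.Random(2013)  # fixed seed so that the sketch is reproducible

# Hypothetical learners: aoa from 1 to 60, ua declining linearly plus noise.
aoa = [float(age) for age in range(1, 61)]
ua = [200.0 - 1.5 * age + rng.gauss(0.0, 8.0) for age in aoa]

# Median split: 0 = "early" arrivals (aoa <= 30), 1 = "late" arrivals.
aoa_binned = [0.0 if age <= 30 else 1.0 for age in aoa]

r_continuous = pearson_r(aoa, ua)
r_dichotomised = pearson_r(aoa_binned, ua)

print(f"r with continuous aoa:   {r_continuous:.2f}")
print(f"r with dichotomised aoa: {r_dichotomised:.2f}")
```

The dichotomised predictor recovers a visibly weaker association from exactly the same learners, because everything that distinguishes learners within a bin has been discarded.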

Comparison of correlation coefficients

Correlation-based inferences about slope discontinuities have similarly explicitly been made by cph advocates and sceptics alike, e.g. Bialystok and Miller [25, pp. 136 and 140], DeKeyser and colleagues [22], [44] and Flege et al. [45, pp. 166 and 169]. Others did not explicitly infer the presence or absence of slope differences from the subset correlations they computed (among others Birdsong and Molis [26], DeKeyser [8], Flege et al. [28] and Johnson [29]), but their studies nevertheless featured in overviews discussing discontinuities [14], [22]. Indeed, the most recent overview draws a strong conclusion about the validity of the cph's ‘flattened slope’ prediction on the basis of these subset correlations:

In those studies where the two groups are described separately, the correlation is much higher for the younger than for the older group, except in Birdsong and Molis (2001) [ =  [26] , JV], where there was a ceiling effect for the younger group. This global picture from more than a dozen studies provides support for the non-continuity of the decline in the AoA–proficiency function, which all researchers agree is a hallmark of a critical period phenomenon. [22, p. 448].

In Johnson and Newport's specific case [23] , their correlation-based inference that ua levels off after puberty happened to be largely correct: the gjt scores are more or less randomly distributed around a near-horizontal trend line [26] . Ultimately, however, it rests on the fallacy of confusing correlation coefficients with slopes, which seriously calls into question conclusions such as DeKeyser's (cf. the quote above).
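
The difference between a correlation coefficient and a slope can be made concrete with a small constructed example. All numbers are hypothetical: both groups are scattered around the same assumed line (ua = 200 − 3 × aoa), and only the amount of scatter differs between them.

```python
def slope_and_r(xs, ys):
    """Return the simple-regression slope of ys on xs and Pearson's r."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sxx, sxy / (sxx * syy) ** 0.5


def make_group(ages, spread):
    """Two learners per age, placed symmetrically (+/- spread) around the
    same hypothetical line ua = 200 - 3 * aoa."""
    xs, ys = [], []
    for age in ages:
        for offset in (-spread, spread):
            xs.append(float(age))
            ys.append(200.0 - 3.0 * age + offset)
    return xs, ys


# Same aoa spread in both groups; only the ua scatter differs.
young_x, young_y = make_group(range(1, 16), spread=2.0)
old_x, old_y = make_group(range(16, 31), spread=12.0)

slope_young, r_young = slope_and_r(young_x, young_y)
slope_old, r_old = slope_and_r(old_x, old_y)

print(f"younger group: slope = {slope_young:.2f}, r = {r_young:.2f}")
print(f"older group:   slope = {slope_old:.2f}, r = {r_old:.2f}")
```

Both groups have a slope of −3, yet the correlation is about −0.99 in the younger group and about −0.73 in the older group. A weaker correlation among older arrivals therefore says nothing by itself about a flattening of the slope.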

It can then straightforwardly be deduced that, other things equal, the aoa – ua correlation in the older group decreases as the ua variance in the older group increases relative to the ua variance in the younger group (Eq. 3).

equation image

Lower correlation coefficients in older aoa groups may therefore be largely due to differences in ua variance, which have been reported in several studies [23] , [26] , [28] , [29] (see [46] for additional references). Greater variability in ua with increasing age is likely due to factors other than age proper [47] , such as the concomitant greater variability in exposure to literacy, degree of education, motivation and opportunity for language use, and by itself represents evidence neither in favour of nor against the cph .
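
The relation invoked above can be sketched as follows. This is a reconstruction from the standard identity linking a simple regression slope to the correlation coefficient, not the article's own typesetting; the subscripts y (younger group) and o (older group) and the symbol s for standard deviations are notational assumptions.

```latex
\begin{align}
  % Slope of a simple regression of ua on aoa
  \beta &= r \, \frac{s_{ua}}{s_{aoa}} \\
  % Equal slopes in the younger (y) and older (o) group
  \beta_{y} = \beta_{o}
  \;&\Longrightarrow\;
  r_{y} \, \frac{s_{ua,y}}{s_{aoa,y}} = r_{o} \, \frac{s_{ua,o}}{s_{aoa,o}} \\
  % Solving for the correlation in the older group
  r_{o} &= r_{y} \, \frac{s_{ua,y}}{s_{ua,o}} \, \frac{s_{aoa,o}}{s_{aoa,y}}
\end{align}
```

With comparable aoa spreads in both groups, a larger ua standard deviation in the older group shrinks the older group's correlation even though the slope is unchanged.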

Regression approaches

Having demonstrated that neither group mean or proportion comparisons nor correlation coefficient comparisons can directly address the ‘flattened slope’ prediction, I now turn to the studies in which regression models were computed with aoa as a predictor variable and ua as the outcome variable. Once again, this category of studies is not mutually exclusive with the two categories discussed above.

In a large-scale study using self-reports and approximate aoa s derived from a sample of the 1990 U.S. Census, Stevens found that the probability with which immigrants from various countries stated that they spoke English ‘very well’ decreased curvilinearly as a function of aoa [48] . She noted that this development is similar to the pattern found by Johnson and Newport [23] but that it contains no indication of an “abruptly defined ‘critical’ or sensitive period in L2 learning” [48, p. 569]. However, she modelled the self-ratings using an ordinal logistic regression model in which the aoa variable was logarithmically transformed. Technically, this is perfectly fine, but one should be careful not to read too much into the non-linear curves found. In logistic models, the outcome variable itself is modelled linearly as a function of the predictor variables and is expressed in log-odds. In order to compute the corresponding probabilities, these log-odds are transformed using the logistic function. Consequently, even if the model is specified linearly, the predicted probabilities will not lie on a perfectly straight line when plotted as a function of any one continuous predictor variable. Similarly, when the predictor variable is first logarithmically transformed and then used to linearly predict an outcome variable, the function linking the predicted outcome variables and the untransformed predictor variable is necessarily non-linear. Thus, non-linearities follow naturally from Stevens's model specifications. Moreover, cph -consistent discontinuities in the aoa – ua function cannot be found using her model specifications as they did not contain any parameters allowing for this.
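
A minimal sketch shows how a model that is linear in the log-odds nevertheless yields a curved probability function. The intercept and slope below are hypothetical values chosen for illustration; they are not Stevens's estimates, and the sketch uses a plain binary logistic model rather than her ordinal specification.

```python
import math


def logistic(log_odds):
    """Transform log-odds into a probability."""
    return 1.0 / (1.0 + math.exp(-log_odds))


# Hypothetical coefficients of a model that is linear in the log-odds.
INTERCEPT = 2.0
SLOPE_PER_YEAR = -0.1

ages = [0, 10, 20, 30, 40]
log_odds = [INTERCEPT + SLOPE_PER_YEAR * age for age in ages]
probabilities = [logistic(value) for value in log_odds]

# Change per ten-year step on both scales.
log_odds_steps = [b - a for a, b in zip(log_odds, log_odds[1:])]
probability_steps = [b - a for a, b in zip(probabilities, probabilities[1:])]

for age, eta, p in zip(ages, log_odds, probabilities):
    print(f"aoa {age:2d}: log-odds = {eta:5.2f}, probability = {p:.3f}")
print("log-odds steps:   ", [round(step, 3) for step in log_odds_steps])
print("probability steps:", [round(step, 3) for step in probability_steps])
```

The log-odds fall by the same amount at every step, whereas the probabilities fall by different amounts depending on where on the curve the step is taken. The curvature is thus a by-product of the logistic transformation and not evidence of an age-related discontinuity.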

Using data similar to Stevens's, Bialystok and Hakuta found that the link between the self-rated English competences of Chinese- and Spanish-speaking immigrants and their aoa could be described by a straight line [49]. In contrast to Stevens, Bialystok and Hakuta used a regression-based method allowing for changes in the function's slope, viz. locally weighted scatterplot smoothing (lowess). Informally, lowess is a non-parametric method that relies on an algorithm that fits the dependent variable for small parts of the range of the independent variable whilst guaranteeing that the overall curve does not contain sudden jumps (for technical details, see [50]). Hakuta et al. used an even larger sample from the same 1990 U.S. Census data on Chinese- and Spanish-speaking immigrants (2.3 million observations) [21]. When they fitted lowess curves, no discontinuities in the aoa–ua slope could be detected. Moreover, the authors found that piecewise linear regression models, i.e. regression models containing a parameter that allows a sudden drop in the curve or a change of its slope, did not provide a better fit to the data than did an ordinary regression model without such a parameter.

To sum up, I have argued at length that regression approaches are superior to group mean and correlation coefficient comparisons for the purposes of testing the ‘flattened slope’ prediction. Acknowledging the reservations vis-à-vis self-estimated ua s, we still find that while the relationship between aoa and ua is not necessarily perfectly linear in the studies discussed, the data do not lend unequivocal support to this prediction. In the following section, I will reanalyse data from a recent empirical paper on the cph by DeKeyser et al. [44] . The first goal of this reanalysis is to further illustrate some of the statistical fallacies encountered in cph studies. Second, by making the computer code available I hope to demonstrate how the relevant regression models, viz. piecewise regression models, can be fitted and how the aoa representing the optimal breakpoint can be identified. Lastly, the findings of this reanalysis will contribute to our understanding of how aoa affects ua as measured using a gjt .

Summary of DeKeyser et al. (2010)

I chose to reanalyse a recent empirical paper on the cph by DeKeyser et al. [44] (henceforth DK et al.). This paper lends itself well to a reanalysis since it exhibits two highly commendable qualities: the authors spell out their hypotheses lucidly and provide detailed numerical and graphical data descriptions. Moreover, the paper's lead author is very clear on what constitutes a necessary condition for accepting the cph: a non-linearity in the age of onset of acquisition (aoa)–ultimate attainment (ua) function, with ua declining less strongly as a function of aoa in older, post-cp arrivals compared to younger arrivals [14], [22]. Lastly, it claims to have found cross-linguistic evidence from two parallel studies backing the cph and should therefore be a source above suspicion for cph proponents.

The authors set out to test the following hypotheses:

  • Hypothesis 1: For both the L2 English and the L2 Hebrew group, the slope of the age of arrival–ultimate attainment function will not be linear throughout the lifespan, but will instead show a marked flattening between adolescence and adulthood.
  • Hypothesis 2: The relationship between aptitude and ultimate attainment will differ markedly for the young and older arrivals, with significance only for the latter. (DK et al., p. 417)

Both hypotheses were purportedly confirmed, which in the authors' view provides evidence in favour of the cph. The problem with this conclusion, however, is that it is based on a comparison of correlation coefficients. As I have argued above, correlation coefficients are not to be confused with regression coefficients and cannot be used to directly address research hypotheses concerning slopes, such as Hypothesis 1. In what follows, I will reanalyse the relationship between DK et al.'s aoa and gjt data in order to address Hypothesis 1. Additionally, I will lay bare a problem with the way in which Hypothesis 2 was addressed. The extracted data and the computer code used for the reanalysis are provided as supplementary materials, allowing anyone interested to scrutinise and easily reproduce my whole analysis and carry out their own computations (see ‘supporting information’).

Data extraction

In order to verify whether we did in fact extract the data points to a satisfactory degree of accuracy, I computed summary statistics for the extracted aoa and gjt data and checked these against the descriptive statistics provided by DK et al. (pp. 421 and 427). These summary statistics for the extracted data are presented in Table 1 . In addition, I computed the correlation coefficients for the aoa – gjt relationship for the whole aoa range and for aoa -defined subgroups and checked these coefficients against those reported by DK et al. (pp. 423 and 428). The correlation coefficients computed using the extracted data are presented in Table 2 . Both checks strongly suggest the extracted data to be virtually identical to the original data, and Dr DeKeyser confirmed this to be the case in response to an earlier draft of the present paper (personal communication, 6 May 2013).

Results and Discussion

Modelling the link between age of onset of acquisition and ultimate attainment

I first replotted the aoa and gjt data we extracted from DK et al.'s scatterplots and added non-parametric scatterplot smoothers in order to investigate whether any changes in slope in the aoa–gjt function could be revealed, as per Hypothesis 1. Figures 3 and 4 show this not to be the case. Indeed, simple linear regression models that model gjt as a function of aoa provide decent fits for both the North America and the Israel data, explaining 65% and 63% of the variance in gjt scores, respectively. The parameters of these models are given in Table 3.

Figure 3. The trend line is a non-parametric scatterplot smoother. The scatterplot itself is a near-perfect replication of DK et al.'s Fig. 1.

Figure 4. The trend line is a non-parametric scatterplot smoother. The scatterplot itself is a near-perfect replication of DK et al.'s Fig. 5.

To ensure that both segments are joined at the breakpoint, the predictor variable is first centred at the breakpoint value, i.e. the breakpoint value is subtracted from the original predictor variable values. For a blow-by-blow account of how such models can be fitted in r , I refer to an example analysis by Baayen [55, pp. 214–222].
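
The centring step and the search for an optimal breakpoint can be sketched as follows. This is an illustrative pure-Python sketch, not the R code supplied with the paper: the data are constructed (a hypothetical bend at aoa 16 with slopes of −4 and −1), the helper names are my own, and the candidate breakpoints are simply compared on their residual sum of squares.

```python
def solve(matrix, rhs):
    """Solve a small linear system by Gaussian elimination with pivoting."""
    n = len(rhs)
    aug = [row[:] + [value] for row, value in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for row in range(col + 1, n):
            factor = aug[row][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[row][k] -= factor * aug[col][k]
    solution = [0.0] * n
    for row in range(n - 1, -1, -1):
        tail = sum(aug[row][k] * solution[k] for k in range(row + 1, n))
        solution[row] = (aug[row][n] - tail) / aug[row][row]
    return solution


def least_squares(columns, y):
    """Ordinary least squares via the normal equations.
    Returns the coefficients and the residual sum of squares."""
    xtx = [[sum(a * b for a, b in zip(ci, cj)) for cj in columns] for ci in columns]
    xty = [sum(a * b for a, b in zip(ci, y)) for ci in columns]
    coefs = solve(xtx, xty)
    fitted = [sum(c * col[i] for c, col in zip(coefs, columns)) for i in range(len(y))]
    rss = sum((obs - fit) ** 2 for obs, fit in zip(y, fitted))
    return coefs, rss


def fit_breakpoint(aoa, ua, breakpoint):
    """Two joined line segments: centre aoa at the breakpoint, then give the
    negative part and the positive part of the centred variable own slopes."""
    centred = [age - breakpoint for age in aoa]
    before = [min(c, 0.0) for c in centred]  # varies only before the breakpoint
    after = [max(c, 0.0) for c in centred]   # varies only after the breakpoint
    return least_squares([[1.0] * len(aoa), before, after], ua)


# Constructed data: two hypothetical learners per aoa, bend at aoa 16.
aoa, ua = [], []
for age in range(1, 41):
    trend = 180.0 - (4.0 if age < 16 else 1.0) * (age - 16)
    for offset in (-3.0, 3.0):
        aoa.append(float(age))
        ua.append(trend + offset)

# Compare candidate breakpoints on their residual sum of squares.
candidates = range(5, 31)
best_breakpoint = min(candidates, key=lambda bp: fit_breakpoint(aoa, ua, bp)[1])
(intercept, slope_before, slope_after), rss_break = fit_breakpoint(aoa, ua, best_breakpoint)
_, rss_linear = least_squares([[1.0] * len(aoa), aoa], ua)

print(f"optimal breakpoint: aoa {best_breakpoint}")
print(f"slope before: {slope_before:.2f}, slope after: {slope_after:.2f}")
print(f"RSS with breakpoint: {rss_break:.1f}, RSS without: {rss_linear:.1f}")
```

Because the predictor is centred at the breakpoint, the intercept is the predicted score at the breakpoint itself, and both segments necessarily meet there. On these constructed data the search recovers the built-in bend at aoa 16.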

Figure 5. Solid: regression with breakpoint at aoa 18 (dashed lines represent its 95% confidence interval); dot-dash: regression without breakpoint.

Figure 6. Solid: regression with breakpoint at aoa 18 (dashed lines represent its 95% confidence interval); dot-dash (hardly visible due to near-complete overlap): regression without breakpoint.

Solid: regression with breakpoint at aoa 16 (dashed lines represent its 95% confidence interval); dot-dash: regression without breakpoint.

Figure 9. Solid: regression with breakpoint at aoa 6 (dashed lines represent its 95% confidence interval); dot-dash (hardly visible due to near-complete overlap): regression without breakpoint.

In sum, a regression model that allows for changes in the slope of the aoa–gjt function to account for putative critical period effects provides a somewhat better fit to the North American data than does an everyday simple regression model. The improvement in model fit is marginal, however, and including a breakpoint does not result in any detectable improvement of model fit to the Israel data whatsoever. Breakpoint models therefore fail to provide solid cross-linguistic support in favour of critical period effects: across both data sets, gjt can satisfactorily be modelled as a linear function of aoa.

On partialling out ‘age at testing’

As I have argued above, correlation coefficients cannot be used to test hypotheses about slopes. When the correct procedure is carried out on DK et al.'s data, no cross-linguistically robust evidence for changes in the aoa–gjt function is found. In addition to comparing the zero-order correlations between aoa and gjt, however, DK et al. computed partial correlations in which the variance in aoa associated with the participants' age at testing (aat; a potentially confounding variable) was filtered out. They found that these partial correlations between aoa and gjt, which are given in Table 9, differed between age groups in that they are stronger for younger than for older participants. This, DK et al. argue, constitutes additional evidence in favour of the cph. At this point, I can no longer provide my own analysis of DK et al.'s data seeing as the pertinent data points were not plotted. Nevertheless, the detailed descriptions by DK et al. strongly suggest that the use of these partial correlations is highly problematic. Most importantly, and to reiterate, correlations (whether zero-order or partial ones) are actually of no use when testing hypotheses concerning slopes. Still, one may wonder why the partial correlations differ across age groups. My surmise is that these differences are at least partly the by-product of an imbalance in the sampling procedure.

The upshot of this brief discussion is that the partial correlation differences reported by DK et al. are at least partly the result of an imbalance in the sampling procedure: aoa and aat were simply less intimately tied for the young arrivals in the North America study than for the older arrivals with L2 English or for all of the L2 Hebrew participants. In an ideal world, we would like to fix aat or ascertain that it at most only weakly correlates with aoa . This, however, would result in a strong correlation between aoa and another potential confound variable, length of residence in the L2 environment, bringing us back to square one. Allowing for only moderate correlations between aoa and aat might improve our predicament somewhat, but even in that case, we should tread lightly when making inferences on the basis of statistical control procedures [61] .
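
How strongly a partial correlation depends on the tie between aoa and aat can be sketched with the standard first-order partial correlation formula. The input correlations below are hypothetical round numbers chosen for illustration; they are not DK et al.'s values, and DK et al.'s exact partialling procedure may differ in detail.

```python
import math


def partial_correlation(r_xy, r_xz, r_yz):
    """Correlation between x and y after partialling z out of both."""
    return (r_xy - r_xz * r_yz) / math.sqrt((1.0 - r_xz ** 2) * (1.0 - r_yz ** 2))


# Hypothetical zero-order correlations, identical in both scenarios.
R_AOA_GJT = -0.6
R_AAT_GJT = -0.5

# Scenario 1: aoa and aat only loosely tied (r = 0.3).
partial_loose = partial_correlation(R_AOA_GJT, 0.3, R_AAT_GJT)
# Scenario 2: aoa and aat intimately tied (r = 0.8).
partial_tight = partial_correlation(R_AOA_GJT, 0.8, R_AAT_GJT)

print(f"aoa-gjt partial correlation, loose aoa-aat tie: {partial_loose:.3f}")
print(f"aoa-gjt partial correlation, tight aoa-aat tie: {partial_tight:.3f}")
```

With the zero-order aoa–gjt and aat–gjt correlations held constant, the partial correlation is noticeably weaker when aoa and aat are closely tied. Differences in partial correlations between groups can thus arise from the sampling design alone.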

On estimating the role of aptitude

Having shown that Hypothesis 1 could not be confirmed, I now turn to Hypothesis 2, which predicts a differential role of aptitude for ua in sla in different aoa groups. More specifically, it states that the correlation between aptitude and gjt performance will be significant only for older arrivals. The correlation coefficients of the relationship between aptitude and gjt are presented in Table 10 .

The problem with both the wording of Hypothesis 2 and the way in which it is addressed is the following: it is assumed that a variable has a reliably different effect in different groups when the effect reaches significance in one group but not in the other. This logic is fairly widespread within several scientific disciplines (see e.g. [62] for a discussion). Nonetheless, it is demonstrably fallacious [63]. Here I will illustrate the fallacy for the specific case of comparing two correlation coefficients.
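The fallacy can be sketched numerically. The Python example below uses Fisher's z transformation, a standard large-sample approximation, both to test each correlation against zero and to test the two correlations against each other. The function names, correlations and sample sizes are hypothetical choices for illustration; they are not the values in Table 10, and the approximation is not necessarily the test DK et al. used.

```python
import math


def normal_cdf(z: float) -> float:
    """Cumulative distribution function of the standard normal."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def corr_p_value(r: float, n: int) -> float:
    """Two-sided p-value for H0: rho = 0 (Fisher z approximation)."""
    z = math.atanh(r) * math.sqrt(n - 3)
    return 2.0 * (1.0 - normal_cdf(abs(z)))


def compare_correlations(r1: float, n1: int, r2: float, n2: int) -> float:
    """Two-sided p-value for H0: rho1 = rho2 (independent samples)."""
    se = math.sqrt(1.0 / (n1 - 3) + 1.0 / (n2 - 3))
    z = (math.atanh(r1) - math.atanh(r2)) / se
    return 2.0 * (1.0 - normal_cdf(abs(z)))


# Hypothetical aptitude-gjt correlations in two aoa groups (NOT DK et al.'s).
r_old, n_old = 0.40, 30
r_young, n_young = 0.20, 30

print(f"older arrivals:   p = {corr_p_value(r_old, n_old):.3f}")
print(f"younger arrivals: p = {corr_p_value(r_young, n_young):.3f}")
print(f"difference:       p = {compare_correlations(r_old, n_old, r_young, n_young):.3f}")
```

In this made-up case, the correlation is significant at the 0.05 level in one group and not in the other, yet the test of the difference between the two correlations is itself far from significant. A difference in significance is therefore not a significant difference.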

Apart from not being replicated in the North America study, does this difference actually show anything? I contend that it does not: what is of interest are not so much the correlation coefficients, but rather the interactions between aoa and aptitude in models predicting gjt . These interactions could be investigated by fitting a multiple regression model in which the postulated cp breakpoint governs the slope of both aoa and aptitude. If such a model provided a substantially better fit to the data than a model without a breakpoint for the aptitude slope and if the aptitude slope changes in the expected direction (i.e. a steeper slope for post- cp than for younger arrivals) for different L1–L2 pairings, only then would this particular prediction of the cph be borne out.
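The model comparison described above can be sketched as follows. This is a minimal pure-Python illustration on simulated data, not the analysis in Script S1 (which is written in R): the helper names `ols_rss` and `design`, the breakpoint at aoa 16, the sample size and all simulated coefficients are assumptions made for the example. Aptitude is simulated as a centred score so that the change in aptitude slope is not conflated with a jump in the intercept.

```python
import random


def ols_rss(X, y):
    """Residual sum of squares of an OLS fit, solved via the normal equations."""
    k = len(X[0])
    # Augmented matrix [X'X | X'y].
    A = [[sum(row[i] * row[j] for row in X) for j in range(k)]
         + [sum(row[i] * yi for row, yi in zip(X, y))] for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(A[r][c]))
        A[c], A[p] = A[p], A[c]
        for r in range(k):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [a - f * b for a, b in zip(A[r], A[c])]
    beta = [A[i][k] / A[i][i] for i in range(k)]
    return sum((yi - sum(b * x for b, x in zip(beta, row))) ** 2
               for row, yi in zip(X, y))


def design(aoa, apt, bp, apt_break):
    """Design matrix: intercept, aoa, hinge term for aoa beyond the breakpoint,
    aptitude and, optionally, an extra aptitude slope for post-breakpoint arrivals."""
    rows = []
    for a, p in zip(aoa, apt):
        row = [1.0, a, max(a - bp, 0.0), p]
        if apt_break:
            row.append(p if a > bp else 0.0)
        rows.append(row)
    return rows


# Simulated data (hypothetical, NOT DK et al.'s): a linear age trend and a
# constant aptitude slope, i.e. no breakpoint effect is built in.
random.seed(1)
n = 80
bp = 16.0  # postulated end of the critical period (assumption)
aoa = [random.uniform(5, 40) for _ in range(n)]
apt = [random.gauss(0, 1) for _ in range(n)]  # centred aptitude score
gjt = [180 - 1.5 * a + 2.0 * p + random.gauss(0, 10) for a, p in zip(aoa, apt)]

rss_reduced = ols_rss(design(aoa, apt, bp, apt_break=False), gjt)
rss_full = ols_rss(design(aoa, apt, bp, apt_break=True), gjt)

# F statistic for the single extra parameter (the change in aptitude slope).
df_full = n - 5
f_stat = (rss_reduced - rss_full) / (rss_full / df_full)
print(f"RSS without aptitude breakpoint: {rss_reduced:.1f}")
print(f"RSS with aptitude breakpoint:    {rss_full:.1f}")
print(f"F(1, {df_full}) = {f_stat:.2f}")
```

The F statistic would be referred to an F distribution with 1 and n − 5 degrees of freedom; in practice a statistics package would handle the fitting and the test. For the prediction to be borne out, the fit would have to improve substantially and the aptitude slope would have to change in the expected direction, and both would have to hold across different L1–L2 pairings.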

Using data extracted from a paper reporting on two recent studies that purport to provide evidence in favour of the cph and that, according to its authors, represent a major improvement over earlier studies (DK et al., p. 417), I found that neither of its two hypotheses was actually confirmed when the proper statistical tools were used. As a matter of fact, the gjt scores continue to decline at essentially the same rate even beyond the end of the putative critical period. According to the paper's lead author, such a finding represents a serious problem for his conceptualisation of the cph [14]. Moreover, although modelling a breakpoint representing the end of a cp at aoa 16 may improve the statistical model slightly in the study on learners of English in North America, the study on learners of Hebrew in Israel fails to confirm this finding. In fact, even if we were to accept the optimal breakpoint computed for the Israel study, it lies at aoa 6 and is associated with a different geometrical pattern.

Diverging age trends in parallel studies with participants with different L2s have similarly been reported by Birdsong and Molis [26] and are at odds with an L2-independent cph. One parsimonious explanation of such conflicting age trends may be that the overall, cross-linguistic age trend is in fact linear, but that fluctuations in the data (due to factors unaccounted for or randomness) may sometimes give rise to a ‘stretched L’-shaped pattern (Figure 1, left panel) and sometimes to a ‘stretched 7’-shaped pattern (Figure 1, middle panel; see also [66] for a similar comment).

Importantly, the criticism that DeKeyser and Larson-Hall level against two studies reporting findings similar to the present ones [48], [49], viz. that the data consisted of self-ratings of questionable validity [14], does not apply to the present data set. In addition, DK et al. did not exclude any outliers from their analyses, so I assume that DeKeyser and Larson-Hall's criticism [14] of Birdsong and Molis's study [26], i.e. that the findings were due to the influence of outliers, is not applicable to the present data either. For good measure, however, I refitted the regression models with and without breakpoints after excluding one potentially problematic data point per model. The following data points had absolute standardised residuals larger than 2.5 in the original models without breakpoints as well as in those with breakpoints: the participant with aoa 17 and a gjt score of 125 in the North America study and the participant with aoa 12 and a gjt score of 117 in the Israel study. The resultant models were virtually identical to the original models (see Script S1). Furthermore, the aoa variable was sufficiently fine-grained and the aoa–gjt curve was not ‘presmoothed’ by the prior aggregation of gjt across parts of the aoa range (see [51] for such a criticism of another study). Lastly, seven of the nine “problems with supposed counter-evidence” to the cph discussed by Long [5] do not apply either, viz. (1) “[c]onfusion of rate and ultimate attainment”, (2) “[i]nappropriate choice of subjects”, (3) “[m]easurement of AO”, (4) “[l]eading instructions to raters”, (6) “[u]se of markedly non-native samples making near-native samples more likely to sound native to raters”, (7) “[u]nreliable or invalid measures”, and (8) “[i]nappropriate L1–L2 pairings”. Problem No. 5 (“Assessments based on limited samples and/or ‘language-like’ behavior”) may be apropos given that only gjt data were used, leaving open the theoretical possibility that other measures might have yielded a different outcome. Finally, problem No. 9 (“Faulty interpretation of statistical patterns”) is, of course, precisely what I have turned the spotlight on.

Conclusions

The critical period hypothesis remains a hotly contested issue in the psycholinguistics of second-language acquisition. Discussions about the impact of empirical findings on the tenability of the cph generally revolve around the reliability of the data gathered (e.g. [5], [14], [22], [52], [67], [68]) and such methodological critiques are of course highly desirable. Furthermore, the debate often centres on the question of exactly what version of the cph is being vindicated or debunked. These versions differ mainly in terms of their scope, specifically with regard to the relevant age span, setting and language area, and the testable predictions they make. But even when the cph's scope is clearly demarcated and its main prediction is spelt out lucidly, the issue remains to what extent the empirical findings can actually be marshalled in support of the relevant cph version. As I have shown in this paper, empirical data have often been taken to support cph versions predicting that the relationship between age of acquisition and ultimate attainment is not strictly linear, even though the statistical tools most commonly used (notably group mean and correlation coefficient comparisons) were, crudely put, irrelevant to this prediction. Methods that are arguably valid, e.g. piecewise regression and scatterplot smoothing, have been used in some studies [21], [26], [49], but these studies have been criticised on other grounds. To my knowledge, such methods have never been used by scholars who explicitly subscribe to the cph.

I suspect that what may be going on is a form of ‘confirmation bias’ [69], a cognitive bias at play in diverse branches of human knowledge seeking: findings judged to be consistent with one's own hypothesis are hardly questioned, whereas findings inconsistent with one's own hypothesis are scrutinised much more strongly and criticised on all sorts of points [70]–[73]. My reanalysis of DK et al.'s recent paper may be a case in point. cph exponents used correlation coefficients to address their prediction about the slope of a function, as had been done in a host of earlier studies. Finding a result that squared with their expectations, they did not question the technical validity of their results, or at least they did not report this. (In fact, my reanalysis is a case in point in two respects: for an earlier draft of this paper, I had computed the optimal position of the breakpoints incorrectly, resulting in a non-significant improvement of model fit for the North American data rather than a borderline significant one. Finding a result that squared with my expectations, I did not question the technical validity of my results – until this error was kindly pointed out to me by Martijn Wieling (University of Tübingen).) That said, I am keen to point out that the statistical analyses in this particular paper, though suboptimal, are, as far as I could gather, reported correctly, i.e. the confirmation bias does not seem to have resulted in the blatant misreportings found elsewhere (see [74] for empirical evidence and discussion). An additional point to these authors' credit is that, apart from explicitly identifying their cph version's scope and making crystal-clear predictions, they present data descriptions that actually permit quantitative reassessments and have a history of doing so (e.g. the appendix in [8]). This leads me to believe that they analysed their data in good conscience and to hope that they, too, will conclude that their own data do not, in fact, support their hypothesis.

I end this paper on an upbeat note. Even though I have argued that the analytical tools employed in cph research generally leave much to be desired, the original data are, so I hope, still available. This provides researchers, cph supporters and sceptics alike, with an exciting opportunity to reanalyse their data sets using the tools outlined in the present paper and publish their findings at minimal cost of time and resources (for instance, as a comment to this paper). I would therefore encourage scholars to engage their old data sets and to communicate their analyses openly, e.g. by voluntarily publishing their data and computer code alongside their articles or comments. Ideally, cph supporters and sceptics would join forces to agree on a protocol for a high-powered study in order to provide a truly convincing answer to a core issue in sla .

Supporting Information

aoa and gjt data extracted from DeKeyser et al.'s North America study.

aoa and gjt data extracted from DeKeyser et al.'s Israel study.

Script with annotated R code used for the reanalysis. All add-on packages used can be installed from within R.

Acknowledgments

I would like to thank Irmtraud Kaiser (University of Fribourg) for helping me to get an overview of the literature on the critical period hypothesis in second language acquisition. Thanks are also due to Martijn Wieling (currently University of Tübingen) for pointing out an error in the R code accompanying an earlier draft of this paper.

Funding Statement

No current external funding sources for this study.

