
Open Access

Peer-reviewed

Research Article

Genetic drift and selection in many-allele range expansions

Roles Conceptualization, Data curation, Formal analysis, Funding acquisition, Investigation, Methodology, Software, Validation, Visualization, Writing – original draft, Writing – review & editing

* E-mail: [email protected]

Affiliation School of Engineering and Applied Sciences, Harvard University, Cambridge, Massachusetts, United States of America


Roles Conceptualization, Formal analysis, Methodology, Software, Supervision, Validation, Visualization, Writing – review & editing

Current address: Department of Physics and Astronomy, University of Tennessee, Knoxville, Tennessee, United States of America

Affiliation Department of Physics and Astronomy, University of Pennsylvania, Philadelphia, Pennsylvania, United States of America

Roles Conceptualization, Formal analysis, Investigation, Methodology, Software, Supervision, Validation, Visualization, Writing – review & editing

Affiliations Living Systems Institute, University of Exeter, Exeter, United Kingdom, Physics and Astronomy, College of Engineering, Mathematics and Physical Sciences, University of Exeter, Exeter, United Kingdom, Department of Physics, Harvard University, Cambridge, Massachusetts, United States of America

Roles Conceptualization, Funding acquisition, Investigation, Methodology, Project administration, Resources, Supervision, Validation, Writing – review & editing

Affiliations FAS Center for Systems Biology, Harvard University, Cambridge, Massachusetts, United States of America, Department of Molecular and Cellular Biology, Harvard University, Cambridge, Massachusetts, United States of America

Roles Conceptualization, Funding acquisition, Methodology, Project administration, Resources, Supervision, Validation, Writing – review & editing

Affiliations School of Engineering and Applied Sciences, Harvard University, Cambridge, Massachusetts, United States of America, Department of Physics, Harvard University, Cambridge, Massachusetts, United States of America, FAS Center for Systems Biology, Harvard University, Cambridge, Massachusetts, United States of America, Department of Molecular and Cellular Biology, Harvard University, Cambridge, Massachusetts, United States of America

  • Bryan T. Weinstein, 
  • Maxim O. Lavrentovich, 
  • Wolfram Möbius, 
  • Andrew W. Murray, 
  • David R. Nelson

PLOS

  • Published: December 1, 2017
  • https://doi.org/10.1371/journal.pcbi.1005866

Abstract

We experimentally and numerically investigate the evolutionary dynamics of four competing strains of E. coli with differing expansion velocities in radially expanding colonies. We compare experimental measurements of the average fraction, correlation functions between strains, and the relative rates of genetic domain wall annihilations and coalescences to simulations modeling the population as a one-dimensional ring of annihilating and coalescing random walkers with deterministic biases due to selection. The simulations reveal that the evolutionary dynamics can be collapsed onto master curves governed by three essential parameters: (1) an expansion length beyond which selection dominates over genetic drift; (2) a characteristic angular correlation describing the size of genetic domains; and (3) a dimensionless constant quantifying the interplay between a colony’s curvature at the frontier and its selection length scale. We measure these parameters with a new technique that precisely quantifies small selective differences between spatially competing strains and show that our simulations accurately predict the dynamics without additional fitting. Our results suggest that the random walk model can act as a useful predictive tool for describing the evolutionary dynamics of range expansions composed of an arbitrary number of genotypes with different fitnesses.

Author summary

Population expansions occur naturally during the spread of invasive species and have played a role in our evolutionary history when humans migrated out of Africa. We use a colony of non-motile bacteria expanding into unoccupied, nutrient-rich territory on an agar plate as a model system to explore how an expanding population’s spatial structure impacts its evolutionary dynamics. Spatial structure is present in expanding microbial colonies because daughter cells migrate only a small distance away from their mothers each generation. In general, the individuals making up expansions in nature and in the lab differ in their genetic composition (genotype, or allele if a single gene differs), each conferring a different fitness, and these genotypes compete to proliferate at the frontier. Here, we show that a random-walk model can accurately predict the dynamics of four expanding strains of E. coli with different fitnesses; each strain represents a competing allele. Our results can be extended to describe any number of competing genotypes with different fitnesses in a naturally occurring expansion, as long as the organisms’ motility does not cause our model to break down. Our model can also be used to precisely measure small selective differences between spatially competing genotypes in controlled laboratory settings.

Citation: Weinstein BT, Lavrentovich MO, Möbius W, Murray AW, Nelson DR (2017) Genetic drift and selection in many-allele range expansions. PLoS Comput Biol 13(12): e1005866. https://doi.org/10.1371/journal.pcbi.1005866

Editor: Jeff Gore, MIT, UNITED STATES

Received: June 8, 2017; Accepted: November 1, 2017; Published: December 1, 2017

Copyright: © 2017 Weinstein et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Data Availability: All of the experimental data used to create the figures and tables in this paper and its supplemental materials can be publicly accessed in a Dryad repository at the following DOI: 10.5061/dryad.n9r96.

Funding: Research by BTW is supported by the Department of Energy Office of Science Graduate Fellowship Program (DOE SCGF), made possible in part by the American Recovery and Reinvestment Act of 2009, administered by ORISE-ORAU under contract no. DE-AC05-06OR23100, by the US Department of Energy (DOE) under Grant No. DE-FG02-87ER40328, as well as Harvard University’s Institute for Applied Computational Science (IACS) Student Fellowship. BTW, AWM, and DRN benefitted from support from the Human Frontiers Science Program Grant RGP0041/2014 and from the National Science Foundation, through grants DMR1608501 and via the Harvard Materials Science and Engineering Center, through grant DMR1435999. MOL acknowledges support from NSF grant DMR-1262047, the UPenn MRSEC under Award No. NSF-DMR-1120901, the US Department of Energy, Office of Basic Energy Sciences, Division of Materials Sciences and Engineering under Grant No. DE-FG02-05ER46199, and from the Simons Foundation for the collaboration “Cracking the Glass Problem” (Grant No. 454945). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Competing interests: The authors have declared that no competing interests exist.

Introduction

A competition between stochastic and deterministic effects underlies evolution. In a well-mixed system such as a shaken culture of the yeast Saccharomyces cerevisiae, stochastic competition between individuals, mutations, and selection dictate the dynamics of the population [1]. In spatially structured environments, active or passive dispersal of individuals also plays an important role: the local “well-mixed” dynamics must be coupled to the motion of individuals, leading to strikingly different evolutionary dynamics, even in the absence of selection [2–7].

A model laboratory system for exploring the coupling between local “well-mixed” effects and spatial deterministic and stochastic dynamics is a microbial range expansion [8], in which a population expands into an unoccupied region of a hard agar Petri dish. Non-motile microbes expand outward from their initial position through growth coupled with random pushing by neighboring cells; because they cannot move and cease reproducing once the population becomes too dense, they leave behind a record of their genetic competition [8]. Fig 1 shows such a frozen genetic pattern of four competing strains of E. coli marked by different fluorescent colors. Spatial structure is present in the frozen genetic patterns because the microbes at the expanding frontier produce daughter cells of the same color that migrate only a small fraction of the front circumference within a generation. Hallatschek et al. [8] identified the key role of genetic drift in producing these sectored patterns: the small population size at the front of an expanding population [9, 10] enhances number fluctuations (i.e., genetic drift), eventually leading to the local fixation of one strain past a critical expansion radius R_0. The loss of genetic diversity that occurs as a small number of individuals at the frontier expands is referred to as the “founder effect” [11].

Fig 1.

https://doi.org/10.1371/journal.pcbi.1005866.g001

Outside the laboratory, range expansions occur naturally during the spread of invasive species such as the bank vole in Ireland [12] or the cane toad in Australia [13], and played a role in human evolutionary history during the migration out of Africa [14]. In these natural expansions, populations may have many competing genotypes, or alleles, each conferring a different fitness. Even if a population is originally clonal, mutations may create new alleles that compete with one another to proliferate, a phenomenon known as clonal interference [15].

An allele’s fitness is often determined by its corresponding expansion velocity. Faster-expanding individuals colonize more territory and block slower strains from expanding, increasing the abundance of ‘faster’ alleles at the frontier [13, 16, 17]. If the curvature of a microbial colony can be neglected and its front is sufficiently smooth, it has been shown both theoretically and experimentally that, after an initial transient, the domain wall of a faster-expanding strain displaces a slower-expanding strain at a constant rate per length expanded, producing the characteristic triangular sector shape [17] shown on the right side of Fig 1. If the curvature of the expansion is not negligible, the sector boundaries instead trace logarithmic spirals [17].
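The triangular sectors and logarithmic spirals can be illustrated with a short geometric sketch. It assumes a Huygens-type construction in which a faster strain spreads from a boundary point at speed v_fast while the neighboring slower front advances at v_slow; the function names `wall_slope`, `flat_sector_half_width`, and `radial_wall_angle` are hypothetical and are not taken from the authors' code.

```python
import math


def wall_slope(v_fast: float, v_slow: float) -> float:
    """Lateral wall displacement per unit length expanded for a flat front.

    Assumed Huygens-type construction: in time t the faster strain reaches a
    circle of radius v_fast*t around a boundary point, while the slower front
    advances L = v_slow*t. The circle meets the slow front a lateral distance
    t*sqrt(v_fast**2 - v_slow**2) away, i.e. sqrt((v_fast/v_slow)**2 - 1) * L.
    """
    if v_slow <= 0 or v_fast < v_slow:
        raise ValueError("require v_fast >= v_slow > 0")
    return math.sqrt((v_fast / v_slow) ** 2 - 1.0)


def flat_sector_half_width(length_expanded: float, v_fast: float, v_slow: float) -> float:
    """Half-width gained by a fitter sector after expanding a distance L (the triangle)."""
    return wall_slope(v_fast, v_slow) * length_expanded


def radial_wall_angle(radius: float, r0: float, phi0: float,
                      v_fast: float, v_slow: float) -> float:
    """Angular wall position in a radial expansion.

    Assumes the wall keeps the same local angle to the radial direction as in the
    flat case, so R * dphi/dR = slope, giving the logarithmic spiral
    phi(R) = phi0 + slope * ln(R / r0).
    """
    return phi0 + wall_slope(v_fast, v_slow) * math.log(radius / r0)
```

For a small relative velocity difference s = v_fast/v_slow − 1 (notation introduced here only for illustration), the slope sqrt((1 + s)^2 − 1) is approximately sqrt(2s), so even a 1% speed advantage moves a wall sideways by roughly 0.14 units per unit length expanded.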

Even in the simplest scenario, in which de novo mutations and mutualistic or antagonistic interactions are ignored, the dynamics of many competing alleles with varying fitnesses at the front of a range expansion have been neither quantified theoretically nor explored in laboratory experiments. Prior laboratory experiments focused on the dynamics of a single sector of a more fit yeast strain (representing a competing allele) sweeping through a less fit strain [17], in regimes where stochastic wandering of genetic boundaries was not expected to be important. Recent experimental work studied how fast a single more fit strain swept through a less fit strain in a range expansion and compared the dynamics to those of the same strains in a well-mixed test tube [9].

In this paper, we experimentally and numerically investigate the dynamics of four competing strains (alleles) of E. coli with varying selective advantages, initially distributed randomly at the front of a radial range expansion. The eCFP-labeled (blue) and eYFP-labeled (yellow) strains expanded the fastest, followed by the non-fluorescent (black) strain, and finally the mCherry-labeled (red) strain. These differences in expansion speed are visible in Fig 1: the yellow and blue bulges at the front of the expansion are larger than the black bulges, which are in turn larger than the red bulges. Large random undulations at the frontier, however, substantially mask the selection-induced bulges.


Experimental results

We begin by reporting our measurements of the average fraction of each strain, the two-point correlation functions between strains, and the relative rates of annihilations and coalescences as a function of length expanded for our four competing strains of E. coli. As discussed in the Materials and Methods, we found that our eCFP and eYFP strains had the fastest expansion velocities, followed by the black strain and finally the mCherry strain (see Table 1). We expected our experimental measurements to reflect this hierarchy of speeds, since faster-expanding strains should have a larger fitness than slower-expanding ones. To illustrate the presence of selection, we used neutral theory (discussed in detail in S1 Appendix) as a null expectation; selection caused deviations from the neutral predictions. To calibrate neutral theory to our experiments, we fit R_0 and D_w, two model parameters illustrated in Fig 1, following the procedures discussed in the Materials and Methods. The fit values of R_0 and D_w are listed in Table 2. In later sections, we show how to predict the average fraction, two-point correlation functions, and relative rates of annihilations and coalescences using our random-walk model and simulations.

Table 1.

https://doi.org/10.1371/journal.pcbi.1005866.t001

Table 2.

https://doi.org/10.1371/journal.pcbi.1005866.t002

Average fractions.


We measured the average fraction versus radial length expanded in two separate sets of experiments in which we inoculated different fractions of our eYFP, eCFP, and mCherry strains. In one experiment, we inoculated the eYFP, eCFP, and mCherry strains in equal initial fractions (33% each), while in the other we inoculated 80% of the mCherry strain and 10% each of the eCFP and eYFP strains. We conducted 20 replicates in each case and calculated the average fraction of each strain using our image analysis package. Fig 2 displays the trajectories of the 20 expansions and the mean trajectory (the average fraction) as ternary composition diagrams for both sets of initial conditions [37].
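A ternary composition diagram places each three-strain composition inside an equilateral triangle. The sketch below assumes that the fractions of the three strains have already been measured at a common set of radii for every replicate; it averages over replicates and projects compositions onto 2-D plotting coordinates. The function names and the assignment of strains to vertices are illustrative assumptions, not the authors' image analysis package.

```python
import numpy as np


def average_fraction(trajectories):
    """Mean composition over replicate colonies.

    trajectories: array of shape (n_replicates, n_radii, 3) holding the
    fractions of three strains at each radius for each colony.
    Returns an array of shape (n_radii, 3).
    """
    return np.asarray(trajectories, dtype=float).mean(axis=0)


def to_ternary_xy(fractions):
    """Project (f_a, f_b, f_c) compositions onto 2-D ternary coordinates.

    Vertices: strain a at (0, 0), strain b at (1, 0), strain c at (1/2, sqrt(3)/2).
    Which strain sits at which vertex is an arbitrary choice in this sketch.
    """
    f = np.asarray(fractions, dtype=float)
    f = f / f.sum(axis=-1, keepdims=True)  # renormalize so the three fractions sum to 1
    x = f[..., 1] + 0.5 * f[..., 2]
    y = (np.sqrt(3.0) / 2.0) * f[..., 2]
    return np.stack([x, y], axis=-1)
```

Applying `to_ternary_xy` to each replicate gives the individual trajectories, and applying it to `average_fraction(...)` gives the mean trajectory.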

Fig 2.

The red dot indicates the composition at the radius R_0 = 3.50 mm where distinct domain walls form, and the blue dot indicates the composition at the end of the experiment. The red dots are dispersed about the initial inoculated fractions due to the stochastic dynamics at the early stages of the range expansions, when R < R_0. The highly stochastic trajectories illustrate the importance of genetic drift at the frontier in the E. coli range expansions. The smaller ternary diagrams display the average fraction over all expansions vs. length expanded for each set of experiments. For both initial conditions, we see a small systematic drift away from the mCherry vertex, indicating that the mCherry strain has a lower fitness, in agreement with the independent radial expansion velocities of each strain (see Table 1). Note that two replicates on the right resulted in the complete extinction of eCFP due to strong spatial diffusion, indicated by the trajectories pinned on the absorbing line connecting the eYFP and mCherry vertices.

https://doi.org/10.1371/journal.pcbi.1005866.g002

In both sets of experiments, we observed a systematic drift away from the mCherry vertex as a function of radius, as illustrated by the mean trajectories shown as insets. In two cases the 10% initial inoculant of the eCFP strain went extinct, represented by the pinning of trajectories to the absorbing boundary connecting the eYFP and mCherry vertices, a consequence of the strong genetic drift at the frontiers of our E. coli range expansions. These measurements indicate that the mCherry strain was less fit than the eCFP and eYFP strains, consistent with the order of the radial expansion velocities.

Two-point correlation functions.


We measured the correlation functions between each pair of strains in three sets of experiments in which we inoculated equal, well-mixed fractions of (i) the eCFP, eYFP, and black strains; (ii) the eCFP, eYFP, and mCherry strains; and (iii) all four strains. We conducted 20 replicates of each experiment, measured all two-point correlation functions at the final radius of R = 10 mm, corresponding to a length expanded of L = R − R_0 = 6.5 mm, and averaged the results. Fig 3 compares the experimentally measured correlation functions to the neutral prediction.
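The two-point correlation functions can be computed directly from the local strain fractions sampled around a circle at fixed radius. The defining equation is not reproduced in this section, so the sketch below assumes the common form F_ij(φ) = ⟨f_i(θ) f_j(θ + φ)⟩, averaged over the angle θ and over replicate colonies; `two_point_correlation` is a hypothetical helper.

```python
import numpy as np


def two_point_correlation(f_i, f_j):
    """Angular two-point correlation F_ij(phi) = <f_i(theta) f_j(theta + phi)>.

    f_i, f_j: local fractions of strains i and j sampled at n equally spaced
    angles around the front at one radius; shape (n,) or (n_replicates, n).
    The average runs over all angles theta and over replicates, if present.
    Returns (phi, F) with phi = 2*pi*k/n for lags k = 0..n-1.
    """
    f_i = np.atleast_2d(np.asarray(f_i, dtype=float))
    f_j = np.atleast_2d(np.asarray(f_j, dtype=float))
    n = f_i.shape[-1]
    # np.roll(f_j, -k) aligns f_j at angle theta + k*dtheta with f_i at theta;
    # the roll wraps around, which matches the periodic (ring-shaped) front.
    F = np.array([np.mean(f_i * np.roll(f_j, -k, axis=-1)) for k in range(n)])
    phi = 2.0 * np.pi * np.arange(n) / n
    return phi, F
```

For a strain that is present everywhere (f_i = 1), the self-correlation is 1 at every separation; domain walls lower F_ii(φ) away from φ = 0.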

Fig 3.

The shaded regions in these plots indicate standard errors of the mean. Using the measured diffusion coefficient D_w and the initial radius R_0 at which domain walls form (see Table 2), we also plot the theoretical neutral two-point correlation functions (black dashed line; see eq. (S1.3)). The color of each plotted correlation function was chosen to correspond to its constituent strain colors; for example, two-point correlation functions associated with mCherry were red or were blended with red. The subscripts correspond to the color of each strain: C = eCFP, Y = eYFP, R = mCherry, and B = Black. As judged by the magnitude of the deviation from neutral predictions, the black strain has a small selective disadvantage relative to eCFP and eYFP, and the mCherry strain has an even greater disadvantage, in agreement with the independent radial expansion velocities of each strain (see Table 1).

https://doi.org/10.1371/journal.pcbi.1005866.g003

The two-point correlation functions in the experiment combining eCFP, eYFP, and the black strain (first column of Fig 3) are consistent with the order of radial expansion velocities (see Table 1). The correlation between the eCFP and eYFP strains plateaued at a higher value than the neutral prediction, while the correlation between eCFP and black plateaued at a lower value, indicating that the eCFP and eYFP strains were more fit. The self-correlation of the black strain, F_BB, also plateaued below those of eCFP and eYFP and below the neutral prediction, further indicating that it had a smaller fitness. The self-correlation data were noisier than the correlations between strains, however; we consistently found that correlations between strains were better at detecting fitness differences than self-correlations.

In contrast, combining eCFP, eYFP, and mCherry in one set of experiments and all four strains in another revealed that mCherry had a larger fitness defect. Correlation functions including mCherry always plateaued at a significantly smaller value than correlation functions excluding it. Furthermore, off-diagonal correlation functions involving the mCherry strain (bottom row of Fig 3) had a smaller slope at zero angular separation, indicating that fewer mCherry domain walls were present and that the mCherry strain was less fit than the others. The two-point correlation functions were thus consistent with the black strain having a small selective disadvantage relative to eCFP and eYFP and the mCherry strain having a larger disadvantage relative to all others.

Annihilation asymmetry.


To gain insight into the behavior of ΔP, consider q neutral colors in equal proportions: ΔP(q) → −1 as q → ∞ (only coalescences), ΔP(q = 3) = 0 (equal numbers of annihilations and coalescences), and ΔP(q = 2) = 1 (only annihilations). The quantity ΔP thus characterizes the annihilation/coalescence difference with a single curve that varies smoothly between −1 and 1 as q ranges over 2 ≤ q < ∞. In S1 Appendix we develop and discuss the case in which strains are inoculated in unequal proportions (see supplementary equations (S1.8)–(S1.10)); in that scenario, it is useful to define a “fractional q” by inverting eq (3) to read q = (3 + ΔP)/(1 + ΔP) (i.e., a fractional q can be evaluated for any given ΔP).
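The limits quoted above, together with the inversion q = (3 + ΔP)/(1 + ΔP), are consistent with eq (3) taking the form ΔP = (3 − q)/(q − 1). That form follows if ΔP is the annihilation probability minus the coalescence probability and two colliding walls annihilate with probability 1/(q − 1), i.e. when the color on the far side of the vanishing domain is drawn uniformly from the q − 1 colors other than its own. A minimal sketch of both directions, under that assumption (function names are hypothetical):

```python
def delta_p_neutral(q: float) -> float:
    """Annihilation asymmetry for q neutral colors in equal proportions.

    Assumes P_annihilation = 1/(q - 1) and P_coalescence = (q - 2)/(q - 1),
    so Delta P = P_annihilation - P_coalescence = (3 - q)/(q - 1).
    """
    if q < 2:
        raise ValueError("q must be >= 2")
    return (3.0 - q) / (q - 1.0)


def fractional_q(delta_p: float) -> float:
    """Invert the relation above: q = (3 + Delta P)/(1 + Delta P)."""
    if not -1.0 < delta_p <= 1.0:
        raise ValueError("Delta P must lie in (-1, 1]")
    return (3.0 + delta_p) / (1.0 + delta_p)
```

The unequal-proportion expressions (S1.8)–(S1.10) are not reproduced here; `fractional_q` only maps a measured ΔP back to an effective number of colors.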

To quantify the annihilation asymmetry experimentally, we examined the average cumulative difference between annihilations and coalescences vs. the average cumulative number of domain wall collisions as colonies expanded; ΔP is given by the slope of this quantity and is shown in Fig 4 (see Supplementary S1 Fig for cumulative counts vs. length expanded). Regardless of which strains were inoculated and their selective differences, our results were consistent with the neutral theory prediction in eq (3) for q = 2, q = 3, and q = 4, as judged by the overlap of the black dashed line with the shaded standard error of the mean in each case. ΔP appeared to be constant as a function of length expanded. We also tested an initial condition in which we inoculated strains in unequal proportions: 10% each of eCFP and eYFP and 80% of mCherry. This experiment again matched the neutral prediction of ΔP ≈ 0.51 (and correspondingly q ≈ 2.33) within error. Evidently, as discussed in more detail below, certain observables like the average fraction and two-point correlation functions show stronger signatures of selection than others like the annihilation asymmetry.
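Estimating ΔP as a slope can be sketched with an ordinary least-squares line fit. The sketch assumes cumulative counts that are already averaged over replicate colonies at a common set of radii, and that every collision is either an annihilation or a coalescence, so the cumulative number of collisions is their sum. The helper name is hypothetical, and the authors' exact fitting procedure may differ.

```python
import numpy as np


def annihilation_asymmetry(cum_annihilations, cum_coalescences):
    """Estimate Delta P from cumulative event counts recorded as colonies expand.

    Each collision is counted as either an annihilation or a coalescence, so the
    cumulative number of collisions is their sum. Delta P is taken as the slope of
    (annihilations - coalescences) against collisions from a least-squares line fit.
    """
    ann = np.asarray(cum_annihilations, dtype=float)
    coal = np.asarray(cum_coalescences, dtype=float)
    collisions = ann + coal
    difference = ann - coal
    slope, _intercept = np.polyfit(collisions, difference, 1)
    return float(slope)
```

Fitting a slope rather than taking the final ratio of counts makes it easy to check the observation above that ΔP stays roughly constant with length expanded: a curved cumulative plot would show up as a poor linear fit.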

Fig 4.

The slope of this plot gives the annihilation asymmetry ΔP. The shaded regions represent the standard error of the mean between many experiments. We use the notation C = eCFP, Y = eYFP, B = black, and R = mCherry. Despite the presence of selection, ΔP was consistent with the standard neutral theory prediction of eq (3) for q = 2, q = 3, and q = 4 (equal initial fractions of q strains), as judged by the overlap of the black dashed lines with the shaded areas in every case. We also explored an initial condition where we inoculated unequal fractions of three strains; we inoculated 10% of both eCFP and eYFP and 80% of mCherry. Our experiments agreed with the prediction of ΔP ≈ 0.51, or an effective q ≈ 2.33, from the neutral theory developed in supplementary equations (S1.8)–(S1.10).

https://doi.org/10.1371/journal.pcbi.1005866.g004

Simulation results


Key parameters.

Fig 5.

If κ ≳ 1, inflation does not appreciably slow selective sweeps, as L_I approaches the linear selection length scale L_s. In contrast, if κ ≪ 1, the inflationary selection length scale L_I will be many times larger than the linear selection length scale L_s, indicating that selection will be weak compared to inflation and diffusion (but will ultimately dominate at very large lengths expanded). The three black points correspond to measurements of the κ_ij that govern the dynamics of our competing strains; N stands for the two selectively neutral strains (eCFP and eYFP), B for black, and R for mCherry (red). See the Predicting experimental results with simulations section for more details.

https://doi.org/10.1371/journal.pcbi.1005866.g005


Collapsing the evolutionary dynamics with the key parameters.

Fig 6.

https://doi.org/10.1371/journal.pcbi.1005866.g006

We now consider the collapsed curves F(L/L_s, κ) and ΔP(L/L_s, κ) as a function of the parameter κ, as shown in Fig 6. The parameter κ had a pronounced effect on both quantities. For κ ≳ 5, the dynamics of F and ΔP approached those of a linear expansion at all L/L_s, illustrated by the bright pink line on the left and the bright pink dots on the right of Fig 6; the more fit strain swept so quickly through the less fit strain that the colony’s radial expansion could be ignored. As κ decreased, the less fit strain was squeezed out more slowly due to the inflation of the frontier, resulting in slower transitions from q = 3 to q = 2 colors and consequently slower transitions from ΔP = 0 to ΔP = 1. For κ ≪ 1, ΔP barely shifted from 0 over the course of the simulation. Interestingly, ΔP peaked at a finite L/L_s for small κ; it is not clear what causes this effect, but it may be related to the transition from linear to inflation-dominated dynamics as L increases.


Predicting experimental results with simulations

A major goal of this paper is to test whether the annihilating and coalescing random-walk model can predict the experimental evolutionary dynamics of our four competing strains (alleles) with different fitnesses (radial expansion velocities). To the best of our knowledge, analytical results for the random-walk model are unavailable (as discussed in S1 Appendix); we consequently used our simulations to predict the dynamics. In this section we quantify the three key parameter combinations for our experimental expansions and then use them to predict the evolutionary dynamics of all four of our competing E. coli strains in an independent experiment.
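The simulation code is not reproduced here. As a rough illustration of the ingredients only, the sketch below uses a hypothetical lattice stand-in: a one-dimensional ring of sites in which a neighbor's color replaces a site's color with a probability set by relative fitness. Domain walls in such a model perform random walks biased by selection, and when a one-site domain is overwritten its two walls either annihilate (same color on both sides) or coalesce (different colors). The sketch omits the radial inflation of the front, so it is not the authors' model; `simulate_ring`, `count_walls`, and the `fitness` values are illustrative assumptions.

```python
import random


def count_walls(state):
    """Number of domain walls on the ring (adjacent sites with different colors)."""
    n = len(state)
    return sum(state[k] != state[(k + 1) % n] for k in range(n))


def simulate_ring(state, fitness, generations, seed=None):
    """Hypothetical lattice stand-in: a 1-D ring of sites with fitness-biased copying.

    state: list of colors, one per site on the ring (no radial inflation).
    fitness: dict mapping color -> relative growth rate (equal values = neutral).
    Each generation makes len(state) update attempts: a random site i is
    challenged by a random nearest neighbor j, and the color at j replaces the
    color at i with probability fitness[color_j] / (fitness[color_i] + fitness[color_j]).
    """
    rng = random.Random(seed)
    state = list(state)
    n = len(state)
    for _ in range(generations):
        for _ in range(n):
            i = rng.randrange(n)
            j = (i + rng.choice((-1, 1))) % n  # periodic neighbor on the ring
            a, b = state[i], state[j]
            if a != b and rng.random() < fitness[b] / (fitness[a] + fitness[b]):
                state[i] = b  # moves a wall; overwriting a one-site domain merges two walls
    return state


# Usage sketch with made-up fitness values (not measured ones).
ring = ["C", "Y", "R", "B"] * 25
fit = {"C": 1.0, "Y": 1.0, "B": 0.98, "R": 0.95}
final = simulate_ring(ring, fit, generations=50, seed=1)
print(count_walls(ring), count_walls(final))
```

Because a site only ever copies an existing neighbor, this update can move or remove walls but never create them, which is the annihilating/coalescing behavior the model relies on.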

Fig 7.

https://doi.org/10.1371/journal.pcbi.1005866.g007

To determine the best-fitting value of L_s, we computed the sum of squared differences between experiment and simulation, with each term weighted by the inverse square of the experimental standard error. The best-fitting L_s was the value that minimized this weighted sum of squares. To estimate the error in our fit, we assigned each candidate value of L_s a probability proportional to the inverse of its weighted sum of squares, normalized the resulting probability distribution, and took the confidence interval of this distribution as the uncertainty in L_s.
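The weighted fit and its error estimate can be sketched with NumPy. This assumes simulated curves have been precomputed on an ascending grid of candidate L_s values at the same points (lengths or angles) as the experimental data; `fit_selection_length` is a hypothetical helper, and reading an equal-tailed interval off the cumulative distribution is one possible way to obtain the confidence interval.

```python
import numpy as np


def fit_selection_length(ls_grid, sim_curves, exp_curve, exp_sem, ci=0.95):
    """Weighted least-squares fit of L_s over a grid of candidate values.

    ls_grid:    candidate L_s values, sorted in ascending order, shape (n,)
    sim_curves: simulated observable for each candidate, shape (n, m)
    exp_curve:  experimental observable at the same m points, shape (m,)
    exp_sem:    experimental standard error of the mean, shape (m,)
    Returns (best L_s, (lower, upper) interval bounds, normalized probabilities).
    """
    ls_grid = np.asarray(ls_grid, dtype=float)
    sims = np.asarray(sim_curves, dtype=float)
    exp = np.asarray(exp_curve, dtype=float)
    weights = 1.0 / np.asarray(exp_sem, dtype=float) ** 2  # inverse squared standard errors
    # Weighted sum of squares, one value per candidate L_s (one row of sims each).
    wss = np.sum(weights * (sims - exp) ** 2, axis=1)
    best = ls_grid[np.argmin(wss)]
    # Probability proportional to 1/wss; a tiny floor avoids dividing by zero on an exact fit.
    prob = 1.0 / np.maximum(wss, 1e-300)
    prob /= prob.sum()
    cdf = np.cumsum(prob)
    tail = (1.0 - ci) / 2.0
    lo_idx = int(np.searchsorted(cdf, tail))
    hi_idx = min(int(np.searchsorted(cdf, 1.0 - tail)), len(ls_grid) - 1)
    return best, (ls_grid[lo_idx], ls_grid[hi_idx]), prob
```

On a coarse grid the interval bounds snap to grid values, so a finer grid of candidate L_s gives a smoother estimate of the uncertainty.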

Table 3.

“CI” stands for confidence interval.

https://doi.org/10.1371/journal.pcbi.1005866.t003

To test whether the resulting L_s and κ could accurately predict the experimental dynamics at all L, and not just at the L where the correlation functions were fit, we plotted the experimental average fraction and correlation functions (solid lines, Fig 8) as a function of L and compared them to the values predicted by simulation (dashed lines, Fig 8). Fig 8 uses the same set of experimental data as Fig 7. The simulation using the fit parameters closely tracked the experimental values at all L, suggesting that our fitting technique was robust and could be used to describe the dynamics of our strains.

Fig 8.

The shaded region is the standard error of the mean. The simulated dynamics closely match the experimental dynamics, suggesting that our fitting technique to extract L s is robust and can be used to describe the dynamics of our strains at all L .

https://doi.org/10.1371/journal.pcbi.1005866.g008

Fig 9.

No additional fitting parameters were used. The shaded region is the standard error of the mean. The simulated dynamics closely matched the experimental dynamics except at small lengths expanded (L ≲ 3 mm), where the black strain introduced significant image analysis artifacts (see Supplementary S5 Fig).

https://doi.org/10.1371/journal.pcbi.1005866.g009


The quantitative agreement between our model and our experiments suggests that the one-dimensional annihilating-coalescing random walk model can indeed be used to predict the dynamics of many competing strains with different fitnesses in a range expansion.

Table 4.

https://doi.org/10.1371/journal.pcbi.1005866.t004


Materials and methods

We used four E. coli strains (labelled BW001, BW002, BW003, and BW012) with a DH5α background and plasmids whose sequences coded for spectrally distinguishable fluorescent proteins. The unique colors were obtained by using the plasmid vector pTrc99a [39] and the open reading frame for the respective fluorescent proteins. Strains BW001, BW002, and BW003 expressed eCFP (cyan/blue), Venus YFP (yellow), and mCherry (red), respectively, and were identical to the E. coli strains eWM282, eWM284, and eWM40 used in Ref. [40]. Note that these three strains were isogenic and differed only by the open reading frames corresponding to their respective fluorescent proteins. The final strain, BW012, was a mutated descendant of strain BW002 (yellow) that fluoresced at a decreased intensity, appearing black, while retaining its ampicillin resistance from the pTrc99a vector. Throughout this work, no additional mutations were introduced or observed. We therefore treat these four strains as four different alleles. Throughout the paper, we refer to the strains as eCFP, eYFP, mCherry, and black.

Experimental setup

To prepare saturated cultures, strains were inoculated in 10 mL of 2xYT medium and shaken for approximately 16 hours at 37°C. After vortexing each saturated culture and obtaining its concentration via optical density (OD-600) measurements, appropriate volumes (e.g., for 1:1:1 mixtures of three strains) were added to an Eppendorf tube with a final volume of 1 mL. The Eppendorf tube was then vortexed to uniformly mix the strains. A volume of 2 μL was taken from the vortexed tube and placed at the center of a 100 mm diameter Petri dish containing 35 mL of lysogeny broth (LB), ampicillin at a concentration of 100 μg/mL, and 1.25% w/v bacto-agar. The carrier fluid in the resulting circular drop evaporated within 2–3 minutes, depositing a circular “homeland” of well-mixed bacteria onto the plate.
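Choosing appropriate volumes from OD-600 readings is simple arithmetic. The sketch below assumes, hypothetically, that cell density is proportional to OD-600 with the same constant for every strain and that the tube contains only the saturated cultures (no added medium); the actual protocol may have differed, and `mixing_volumes` is an illustrative name.

```python
def mixing_volumes(od600, target_fractions, total_volume_ul=1000.0):
    """Volumes (in microliters) of saturated cultures giving a target mixture of strains.

    Assumes cell density is proportional to OD-600 with one shared constant for
    all strains, and that the tube holds only the cultures, so the volume of
    culture k is proportional to target_fraction_k / od600_k.
    """
    if len(od600) != len(target_fractions):
        raise ValueError("need one OD-600 reading per strain")
    weights = [f / od for f, od in zip(target_fractions, od600)]
    scale = total_volume_ul / sum(weights)  # rescale so the volumes add up to the total
    return [w * scale for w in weights]


# Usage sketch: an 80:10:10 mixture from three cultures with made-up OD-600 readings.
volumes = mixing_volumes([1.5, 1.8, 2.1], [0.8, 0.1, 0.1])
```

A denser culture contributes proportionally less volume, so each strain's share of cells, volume times OD-600, matches its target fraction.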

After inoculation, plates were stored upside down (to avoid condensation) for 8 days in a Rubbermaid 7J77 box at 37°C with a beaker filled with water; the water acted as a humidifier and prevented the plates from drying out. The plates were occasionally removed from the box and imaged (at roughly 24-hour intervals) using the brightfield channel to determine the radius of the colony as a function of time. On the eighth day, the plates were imaged in both fluorescent and brightfield channels. The number of replicate plates used is stated next to the respective experimental results. If we noticed that a mutation had occurred during an expansion (mutations usually presented themselves as unexpectedly large bulges at the front of a colony or as distortions in fluorescent intensity), we excluded the colony.

Image acquisition and analysis

We imaged our range expansions with a Zeiss SteREO Lumar.V12 stereoscope in four channels: eCFP, eYFP, mCherry (fluorescent channels), and brightfield. In order to analyze a colony with a maximum radius of approximately 10 mm using a single image, we stitched four images together with an overlap of 20% using AxioVision 4.8.2, the software accompanying the microscope. We blended the overlapping areas of the images to lessen the impact of background inhomogeneities. An example of a stitched image can be seen on the left side of Fig 10 . Stitching introduced small artifacts such as vertical lines near the center of our expansions; we verified that these did not affect our results.

Fig 10.

Images were acquired for four overlapping quadrants and stitched together to obtain a single image with a large field of view. Overlapping regions were blended to minimize inhomogeneities. To obtain the binary masks, pixels with fluorescence above background noise were marked as “on.” A visual comparison of the raw data and the masks confirms that our binary masks accurately reflect the location and shape of individual sectors.

https://doi.org/10.1371/journal.pcbi.1005866.g010

To extract the local fraction of each strain per pixel, we first created binary masks for each fluorescence channel indicating whether the corresponding E. coli strain was present. We used the “Enhance Local Contrast” (CLAHE) algorithm [41] in Fiji [42], an open-source image analysis platform, to help correct for inhomogeneities in background illumination. After applying the CLAHE algorithm, a combination of automatic thresholding and manual tracing yielded a binary mask of each channel, an example of which is shown in Fig 10; the image on the left is an overlay of an experimental range expansion’s fluorescent channels, and the image on the right is the overlay of the corresponding binary masks. A small amount of manual tracing was required near the edges of our colonies because our fluorescent lamp provided uneven illumination, and the resulting dark regions could barely be identified above background noise; future work could alleviate this problem by using brighter strains or a more advanced imaging setup. Because we mainly used manual tracing near the edges of the colonies, where the monoclonal sectors were well defined, we found that our procedure was very reproducible.

We mapped the binary images to the local fraction of each E. coli strain as follows: if N binary masks (corresponding to N colors) were “on” at a pixel, each of the corresponding strains was assigned a local fraction of 1/N. This assignment introduces inaccuracies. For example, if one strain occupied 90% of a pixel and another occupied 10%, our algorithm would register both as 50%. Outside the homeland and the early stages of the range expansions, however, multiple strains were colocalized only at domain boundaries. The black strain was defined to be present at pixels reached by the range expansion where no other strain was present. This definition introduced errors at radii close to the homeland, where colors overlap significantly, but the error became negligible at large radii, as quantified in Supplementary S5 Fig. Once we had determined the fraction of each strain at each pixel, we could extract quantities such as the total fraction of each strain in the colony and the spatial correlations between strains at a given expansion radius.
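The following is a minimal sketch of the 1/N rule described above. It assumes each mask is a 2D list of 0/1 values and that a separate colony mask marks the pixels reached by the expansion. The function name `local_fractions` is hypothetical and not taken from the authors' code.

```python
def local_fractions(masks, colony):
    """Convert per-channel binary masks into per-pixel strain fractions.

    masks  -- dict mapping a fluorescent channel name to a 2D list of 0/1 values
    colony -- 2D list of 0/1 values marking pixels reached by the expansion
    Returns a dict with one 2D list of fractions per channel plus "black".
    """
    names = list(masks)
    rows, cols = len(colony), len(colony[0])
    fractions = {
        name: [[0.0] * cols for _ in range(rows)] for name in names + ["black"]
    }
    for y in range(rows):
        for x in range(cols):
            if not colony[y][x]:
                continue  # outside the colony every fraction stays 0
            on = [name for name in names if masks[name][y][x]]
            if on:
                for name in on:
                    fractions[name][y][x] = 1.0 / len(on)  # N masks on -> 1/N each
            else:
                fractions["black"][y][x] = 1.0  # colony pixel with no fluorescence
    return fractions


masks = {"eCFP": [[1, 1, 0]], "mCherry": [[0, 1, 0]]}
colony = [[1, 1, 1]]
f = local_fractions(masks, colony)
print(f["eCFP"], f["mCherry"], f["black"])
# [[1.0, 0.5, 0.0]] [[0.0, 0.5, 0.0]] [[0.0, 0.0, 1.0]]
```

Plain nested lists keep the sketch dependency-free. With real images, the same rule would normally be vectorized with an array library.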

The mask in Fig 10 shows that sector boundaries can be used to determine local strain abundance. The position of every domain wall can be extracted from each strain’s local fraction, but tracking a single wall is difficult because walls collide. To address this problem, we created a binary mask of the edges in our images and labelled the edges of each domain. Annihilations and coalescences were counted manually in Fiji [42]; automated methods were not accurate enough.
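Annihilations and coalescences were counted by hand here. As a hedged illustration only, the sketch below shows how walls and their types could be read off from strain labels sampled around a circle at one fixed radius. The function name, the equal-angle sampling and the labelling convention are assumptions. The sketch does not attempt to follow walls from one radius to the next, which is the difficult step mentioned above.

```python
def find_domain_walls(labels):
    """Locate domain walls along a circle of fixed radius.

    labels -- strain label of each sample, taken at equally spaced angles in
              counterclockwise order; the ring is periodic.
    Returns a list of (k, i, j): a wall lies between samples k-1 and k, with
    strain i on one side (sample k-1) and strain j on the other (sample k).
    """
    walls = []
    for k in range(len(labels)):
        i, j = labels[k - 1], labels[k]  # k - 1 == -1 wraps to the last sample
        if i != j:
            walls.append((k, i, j))
    return walls


print(find_domain_walls(["R", "R", "G", "G", "B"]))
# [(0, 'B', 'R'), (2, 'R', 'G'), (4, 'G', 'B')]
```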

Note that in this paper we ignore the three-dimensional structure of our colonies and describe them by the two-dimensional images taken with the stereoscope. This approximation is justified for two reasons. First, the initial diameter of our colonies is at least a factor of 10 larger than their height (less than 1 mm, as judged by a ruler), so they are effectively two-dimensional. Second, the strain composition of our colonies does not vary with height. We confirmed the second point in two ways. We probed the internal structure of colonies with a confocal microscope. We also scratched a pipette tip through a sector, grew the cells it picked up in overnight culture, and verified that single colonies plated from that culture were the same color as the sector.

Measuring radial expansion velocities $u_i$

We used the average expansion velocity of each strain for radii $R > R_0$ as a proxy for selective advantage, similar to previous work [17, 35]. We ran three independent sets of experiments using different batches of agar plates, the main source of variability in our experiments. In each set, we measured the diameter of 12 expansions of each strain approximately every 24 hours, following the protocol for range expansions with two or more strains. To account for biological variance, each group of four of the 12 colonies was started from an independent single colony; no statistical difference was seen between biological replicates. Each diameter was determined by manually fitting a circle to a brightfield image of the expansion three times and averaging the three measurements. Fig 11 shows the average radius increasing with time for each strain in one of our experiments. In every experiment, the eCFP and eYFP strains had the fastest expansion velocities (their data points overlap in Fig 11), followed by the black strain and finally the mCherry strain. The expansion velocity decreased slowly with time; we attribute this to nutrient depletion in the plates.

The error bars (comparable to the symbol size at early times) are the standard errors of the mean calculated from 12 replicate expansions for each strain. The eYFP and eCFP strains had the fastest expansion velocities (their data points overlap in the plot), followed by black and then mCherry. $R_0$ is the radius at which expansions with competing strains typically demix into one color locally; $R_0$ is approximately 1.75 times the initial inoculant radius of 2 mm (see Fig 1).

https://doi.org/10.1371/journal.pcbi.1005866.g011

The radial expansion velocity of each strain was obtained by fitting radius versus time with linear regression for radii greater than $R_0$. We averaged the radial expansion velocity over the three sets of plates and report its error as the standard error of the mean (Table 1). Following [17], we also quantified the dimensionless selective advantage of each strain relative to the slowest-growing mCherry strain as $s_{iR} = u_i/u_R - 1$, where the subscript R denotes the mCherry (red) strain in each experiment. The selective advantages were consistent, within error, when we calculated the velocities $u_i$ and $u_R$ over different time intervals. We averaged $s_{iR}$ across our three experiments and report its error as the standard error of the mean (Table 1).
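The fitting procedure above can be sketched as follows, assuming radius and time measurements are already available as plain lists. The function names are hypothetical. The numbers in the usage lines are invented for illustration and are not data from this study. The sketch applies ordinary least squares only to points with $R > R_0$, as the text describes, and then forms $s_{iR}$ and a standard error of the mean across experiments.

```python
import math
import statistics


def expansion_velocity(times, radii, r0):
    """Ordinary least-squares slope of radius vs time, using only radii > r0."""
    pts = [(t, r) for t, r in zip(times, radii) if r > r0]
    if len(pts) < 2:
        raise ValueError("need at least two measurements beyond r0")
    t_mean = statistics.fmean(t for t, _ in pts)
    r_mean = statistics.fmean(r for _, r in pts)
    sxx = sum((t - t_mean) ** 2 for t, _ in pts)
    sxy = sum((t - t_mean) * (r - r_mean) for t, r in pts)
    return sxy / sxx


def selective_advantage(u_i, u_red):
    """Dimensionless advantage s_iR = u_i / u_R - 1 relative to the red strain."""
    return u_i / u_red - 1.0


def mean_and_sem(values):
    """Mean and standard error of the mean across independent experiments."""
    return statistics.fmean(values), statistics.stdev(values) / math.sqrt(len(values))


times = [0, 24, 48, 72]        # hours (hypothetical)
radii = [2.0, 3.0, 5.0, 7.0]   # mm (hypothetical)
u = expansion_velocity(times, radii, r0=3.5)  # only the last two points are used
print(round(u, 4))  # 0.0833 (mm per hour)
```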

The eCFP and eYFP strains had an average selective advantage of 9% over the mCherry strain. This is comparable in magnitude to the experiments of Weber et al. [35]. Despite using different E. coli strains and plasmids, they found that expressing mCherry decreased the expansion velocity of their strains by approximately 15% under certain “fast growth” environmental conditions. Our black strain had an advantage of approximately 6% over the mCherry strain. Differences in radial expansion velocity of this magnitude have previously been used to study range expansions of the yeast S. cerevisiae and of E. coli [9, 17]. To investigate the source of this fitness defect, we took the plasmids from our original strains and inserted them into a different set of clonal DH5α cells. We then inoculated the new eCFP, eYFP, and mCherry strains in equal proportions in a range expansion. The average mCherry fraction decreased by 10% at a radius of R = 10 mm, matching the results of Fig 2. This suggests that the plasmids were responsible for the fitness defect.

Comparing well-mixed fitness to fitness from expansion velocities

Measuring the local fixation radius $R_0$

When calibrating our model to experiment, the precise value of $R_0$ did not matter as long as each strain’s local fraction could be accurately measured at that radius. To maximize the length over which we could quantify range expansion growth, we therefore defined the local fixation radius $R_0$ as the minimum radius at which our image analysis package became accurate. For $R < R_0$, our package predicted equal fractions of each strain because all channels overlap in the homeland (see Fig 10). To determine $R_0$, we inoculated radial expansions with three strains in unequal proportions: 10% each of two strains and 80% of the third. The minimum radius at which the measured fractions agreed with their inoculated values was $R_0 = 3.50 \pm 0.05$ mm, as seen in Supplementary S6 Fig. This value of $R_0$ worked for all colonies.
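The following is a minimal sketch of the $R_0$ criterion. It assumes mean fractions have already been tabulated at a sorted list of radii. The text does not state a numerical agreement criterion, so the tolerance `tol` is a hypothetical parameter. The function name and the fractions in the usage lines are also invented for illustration.

```python
def local_fixation_radius(radii, measured, inoculated, tol):
    """Smallest radius at which every strain's mean fraction matches its inoculum.

    radii      -- radii in ascending order (mm)
    measured   -- dict: strain -> list of mean fractions, one per radius
    inoculated -- dict: strain -> inoculated fraction
    tol        -- agreement tolerance (hypothetical; not specified in the text)
    Returns None if no radius satisfies the criterion.
    """
    for k, r in enumerate(radii):
        if all(abs(measured[s][k] - f0) <= tol for s, f0 in inoculated.items()):
            return r
    return None


radii = [2.0, 3.0, 3.5, 4.0]
measured = {  # hypothetical mean fractions at each radius
    "mCherry": [0.33, 0.60, 0.79, 0.78],
    "eCFP": [0.33, 0.20, 0.11, 0.11],
    "eYFP": [0.34, 0.20, 0.10, 0.11],
}
inoculated = {"mCherry": 0.8, "eCFP": 0.1, "eYFP": 0.1}
print(local_fixation_radius(radii, measured, inoculated, tol=0.02))  # 3.5
```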

Measuring the domain wall diffusion coefficient $D_w$

We fit $H(\phi, L)$ to the experimentally measured heterozygosity of two neutral strains (eCFP and eYFP) on three independent sets of agar plates, each with 14 range expansions. We averaged the heterozygosity at each L, as shown in Fig 12. Error bars are omitted there for readability; the same figure with error bars is Supplementary S7 Fig. We had already measured $R_0 = 3.50 \pm 0.05$ mm, and $H_0 = 1/2$ for two neutral strains inoculated at equal fractions, so $D_w$ is the only free parameter in eq (11). We therefore fit $D_w$ at each L with non-linear least squares and averaged the $D_w$ values from the three independent experiments. This gave $D_w = 0.100 \pm 0.005$ mm, where the error is the standard error of the mean between experiments. This diffusion constant is of the same order of magnitude as that found in previous work [18].

The dashed lines are the theoretical fits of the heterozygosity with a constant $D_w = 0.100 \pm 0.005$ mm. The theoretical curves track our experimental data, suggesting that a diffusive approximation to domain boundary motion is justified.

https://doi.org/10.1371/journal.pcbi.1005866.g012

Fig 12 shows the Voter model’s fit (dashed lines) together with the experimental heterozygosity (solid lines) for one set of plates, using our values of $D_w$ and $R_0$. The fit closely matches the experimental heterozygosity, suggesting that a diffusive description of E. coli domain motion is justified. We use this value of $D_w$ for all strains. In principle, $D_w$ may depend on ij, the particular domain wall type. To check this, we examined the variance in domain wall position versus length expanded for all ij (strain pair) combinations. The variances agreed within error and were thus consistent with a constant $D_w$. The two-point correlation functions in the main text were also well fit by a constant $D_w$. Unlike in the Voter model and our simulations, the experimental heterozygosity at zero separation, $H(\phi = 0, L)$, does not vanish because strains overlap at domain boundaries. This effect is less pronounced at large radii because the effective angular width of the boundaries decreases. The discrepancy between the theoretical and experimental heterozygosity is larger at small lengths expanded, where the overlap between strains is larger and our image analysis is consequently less accurate.
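The constancy check on $D_w$ can be illustrated with a sketch that assumes the standard one-dimensional diffusion relation $\mathrm{Var}[x](L) = 2 D_w L$, with the length expanded L playing the role of time. This convention is an assumption, although it is consistent with $D_w$ having units of mm. The sketch also ignores ring inflation, which in a radial expansion would make the relevant variance angular. `fit_wall_diffusion` is a hypothetical name.

```python
def fit_wall_diffusion(lengths, variances):
    """Estimate D_w assuming Var[x](L) = 2 * D_w * L (non-inflating geometry).

    Least-squares fit of a line through the origin:
    slope = sum(L * v) / sum(L**2), and D_w = slope / 2.
    With lengths in mm and variances in mm^2, D_w comes out in mm.
    """
    num = sum(L * v for L, v in zip(lengths, variances))
    den = sum(L * L for L in lengths)
    return num / (2.0 * den)


# Hypothetical variances for two wall types; similar D_w values would be
# consistent with a single, type-independent diffusion coefficient.
by_type = {
    ("eCFP", "eYFP"): ([1.0, 2.0, 3.0], [0.2, 0.4, 0.6]),
    ("eCFP", "mCherry"): ([1.0, 2.0, 3.0], [0.21, 0.39, 0.61]),
}
for wall_type, (L, var) in by_type.items():
    print(wall_type, round(fit_wall_diffusion(L, var), 3))
```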

Measuring the domain wall velocities

https://doi.org/10.1371/journal.pcbi.1005866.g013

Simulation methods

Lattice simulations of range expansions, especially radial ones, can suffer from artifacts arising from the preferred directions of the lattice. An amorphous Bennett model lattice [44] can mitigate some of these effects [32]. Rather than using such a lattice, we developed a simple off-lattice method that treats the domain walls as annihilating and coalescing random walkers moving along the edge of an inflating ring. The basic idea of the simulation is illustrated in Fig 14. We incorporate both the random, diffusive motion of the domain walls and their deterministic movement due to selection. The radial expansion procedure is most easily understood by first considering a linear range expansion simulation, whose steps are as follows:

  • Create a line of $N_0$ microbes of width a at the linear frontier. Assign each microbe one of the q possible alleles.
  • Identify genetic domain walls by locating neighbors with different alleles; assign type ij to each wall, where i and j are the strains to the left and right, respectively. Assign a relative “growth rate” $r_{ij}$ to each wall characterizing the bias in the probability that strain i divides into new territory before strain j. Two such domain walls are shown in a radial expansion in Fig 14.

  • (b) If the hopping domain wall collides with another wall, react the walls instantaneously. Annihilate them if the strains on the far left and far right of the colliding pair are the same; coalesce them if they differ.
  • Increment the elapsed time by $\Delta t = 1/N$ generations, where N is the number of domain walls at the beginning of the jump. Increment the length expanded by the colony by $\Delta L = a \Delta t = a/N$, where a (the cell width) is the distance the colony expands per generation. The length increment could instead be set by a different length scale d, i.e. $\Delta L = d/N$; in our experiments, colonies typically expand further than a cell width per generation because of growth behind the front. This does not change our analysis, and we choose d = a for simplicity.
  • Repeat steps 3 and 4 until no domain walls remain or until the simulation has run for the desired number of generations.
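The linear procedure above can be sketched in Python as follows. This is an illustrative reimplementation, not the authors' code, which is available on GitHub [34]. It assumes a cell width of a = 1, a periodic front, and walls sitting between cells at integer positions. The hop-probability formula of step 3(a) is not reproduced in this text. Selection therefore enters only through a hypothetical `p_right(i, j)` hook that defaults to neutral hopping.

```python
import random


def simulate_linear(alleles, generations, p_right=lambda i, j: 0.5, seed=None):
    """Off-lattice sketch: annihilating/coalescing domain walls on a periodic front.

    alleles     -- allele of each of the N0 cells at the front (cell width a = 1)
    generations -- stop once this much time (in generations) has elapsed
    p_right     -- hypothetical hook: probability that a wall of type ij hops
                   right; 0.5 for every pair gives neutral diffusion
    Returns (walls, t): surviving walls as (position, i, j) and the elapsed time.
    With a = 1, the length expanded L equals t.
    """
    rng = random.Random(seed)
    n0 = len(alleles)
    # A wall at integer position k separates cells k-1 (strain i) and k (strain j);
    # index -1 wraps around, so the front is periodic. The list stays in ring order.
    walls = [(k, alleles[k - 1], alleles[k])
             for k in range(n0) if alleles[k - 1] != alleles[k]]
    t = 0.0
    while walls and t < generations:
        n = len(walls)
        k = rng.randrange(n)  # pick a wall uniformly at random
        pos, i, j = walls[k]
        step = 1 if rng.random() < p_right(i, j) else -1
        new_pos = (pos + step) % n0
        nb = (k + step) % n  # neighboring wall in the direction of the hop
        if walls[nb][0] == new_pos:
            # Collision: the domain between the two walls disappears.
            outer_i, outer_j = (i, walls[nb][2]) if step == 1 else (walls[nb][1], j)
            if outer_i == outer_j:
                for idx in sorted({k, nb}, reverse=True):
                    del walls[idx]  # annihilation: both walls vanish
            else:
                walls[k] = (new_pos, outer_i, outer_j)  # coalescence into one wall
                del walls[nb]
        else:
            walls[k] = (new_pos, i, j)
        t += 1.0 / n  # each hop advances time by 1/N generations
    return walls, t


rng = random.Random(0)
front = [rng.choice("RGB") for _ in range(500)]  # q = 3 alleles (hypothetical)
walls, t = simulate_linear(front, generations=200, seed=1)
print(len(walls), "walls remain after", round(t, 1), "generations")
```

Keeping the wall list in ring order means a hop can only collide with the immediate neighbor in the direction of the hop. Each step therefore checks a single wall instead of scanning the whole front.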

The initial population is a circle of cells of radius $R_0 = N_0 a/(2\pi)$, where $N_0$ is the initial number of cells and a is a cell width. During each time step (generation), the expansion advances a distance a, so the radius grows according to $R(t) = R_0 + a t$, where t is the time in generations. The dashed circle shows the population after one generation. Each domain wall position is tracked on the inflating ring (solid lines). At each time step, domain walls (two shown) hop to the left or right with probabilities $P_l$ and $P_r$, respectively, with an angular jump length $\delta\phi \equiv a/R(t)$, and the position is updated (dashed lines). After each domain wall movement, the time in generations is incremented by $1/N$, where N is the number of domain walls present. For a linear simulation, the radius is simply not inflated in time, i.e. $R(t) = R_0$.

https://doi.org/10.1371/journal.pcbi.1005866.g014
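The geometry in the Fig 14 caption reduces to three formulas, sketched below using the caption's own definitions. The helper names are hypothetical. One plausible way to adapt a linear simulation to the radial case, stated here only as an assumption, is to store wall positions as angles and hop by $\pm\delta\phi(t)$. Because $\delta\phi$ shrinks as the ring inflates, walls no longer sit on a fixed grid. A collision would then have to be detected as a hop that reaches or passes the neighboring wall's angle.

```python
import math


def initial_radius(n0, a):
    """Radius of a ring of n0 cells of width a: R0 = n0 * a / (2 * pi)."""
    return n0 * a / (2.0 * math.pi)


def radius(t, n0, a):
    """Radius after t generations, R(t) = R0 + a * t."""
    return initial_radius(n0, a) + a * t


def angular_jump(t, n0, a):
    """Angular hop length of a domain wall at time t: delta_phi = a / R(t)."""
    return a / radius(t, n0, a)


# At t = 0 the hop spans one cell, 2*pi/n0; it then shrinks as the ring inflates.
for t in (0, 100, 1000):
    print(t, angular_jump(t, n0=1000, a=1.0))
```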

In contrast to algorithms that follow the position and state of every organism at the front of a colony, our algorithm tracks only the positions of domain walls. It is consequently much faster per generation as the sectors coarsen, allowing simulations of larger colonies. Fig 15 displays a radial and a linear simulation with three neutral colors and a fourth, red color with a selective disadvantage comparable to that in our experiments. In Supplementary S8 Fig, we check that our simulation correctly reproduces the behavior of a single fitter domain wall sweeping through a less fit strain as simulation parameters are varied. Our implementation of this algorithm and examples of how to use it are available on GitHub [34].

https://doi.org/10.1371/journal.pcbi.1005866.g015

Supporting information

S1 Appendix. Supplemental theory.

https://doi.org/10.1371/journal.pcbi.1005866.s001

S2 Appendix. Quantifying the discrepancy between radial expansion velocity and wall velocity.

https://doi.org/10.1371/journal.pcbi.1005866.s002

S1 Fig. Average cumulative annihilations and coalescences for two, three, and four strains.

All strains were inoculated in equal fractions except in the experiment with 10% eCFP, 10% eYFP, and 80% mCherry. The annihilation and coalescence rates (the slopes of the respective curves) decrease as the radius increases, for two reasons. Fewer domain walls remain after previous collisions, and inflation decreases the probability that two walls collide per length expanded. As the number of colors increases, coalescences occur more often than annihilations.

https://doi.org/10.1371/journal.pcbi.1005866.s003

S2 Fig. Collapse of $F_{ij}$.

https://doi.org/10.1371/journal.pcbi.1005866.s004

S3 Fig. Collapsed average fraction and annihilation asymmetry on a linear scale.

Identical to Fig 6 except that the y-axis of $F(L/L_s, \kappa)$ is on a linear scale, which may be useful for comparison with experiments.

https://doi.org/10.1371/journal.pcbi.1005866.s005

S4 Fig. Collapse of average fraction and annihilation asymmetry.

https://doi.org/10.1371/journal.pcbi.1005866.s006

S5 Fig. Image processing artifacts introduced by using a non-fluorescent (i.e. black) strain.

To estimate the image analysis artifacts introduced by using a non-fluorescent, black strain, we performed an experiment with three fluorescent strains (eCFP, eYFP, and mCherry in equal initial proportions). We analyzed the data twice: once including all three fluorescent channels, and once excluding the eCFP channel and treating eCFP as if it were a black strain. We compared the black-substituted average fractions $F_i$ (dashed lines) with the real fractions (solid lines) as a function of radius. At radii small compared with $R_0 = 3.5$ mm, the error from introducing a black strain was large. This is likely because we defined black as the absence of all other channels, and the channels typically overlapped substantially close to the homeland. At large radii, the error from introducing a black strain was negligible.

https://doi.org/10.1371/journal.pcbi.1005866.s007

S6 Fig. Determining $R_0$.

To find the radius $R_0$ at which our image analysis package became accurate, we inoculated 80% mCherry, 10% eCFP, and 10% eYFP in 10 range expansions and tabulated the average fraction of each strain. The inoculated fractions are shown as dashed lines. At a radius of approximately $R_0 = 3.50 \pm 0.05$ mm, the measured average fractions were closest to the inoculated fractions. Our image analysis package predicted fractions inaccurately in the homeland because of significant overlap between the strains.

https://doi.org/10.1371/journal.pcbi.1005866.s008

S7 Fig. Error bars when fitting $D_w$.

The same as the right side of Fig 12 except with error bars; the shaded areas are the standard error of the mean.

https://doi.org/10.1371/journal.pcbi.1005866.s009

S8 Fig. Confirming simulation accuracy.

https://doi.org/10.1371/journal.pcbi.1005866.s010

Acknowledgments

BTW would like to thank Matti Gralka, Paula Villa Martin, Miguel A Muñoz, Severine Atis, Markus F. Weber, Kirill Korolev, and Steven Weinstein for helpful discussion and advice.

  • 11. Hartl DL, Clark AG. Principles of Population Genetics. 4th ed. Sunderland, MA: Sinauer Associates, Inc.; 2007. Available from: http://www.sinauer.com/principles-of-population-genetics.html.
  • 21. Krapivsky PL, Redner S, Ben-Naim E. A Kinetic View of Statistical Physics. Cambridge: Cambridge University Press; 2010. Available from: http://ebooks.cambridge.org/ref/id/CBO9780511780516.
  • 34. Weinstein B. Range Expansions on GitHub; 2016. Available from: https://github.com/Range-Expansions.
  • 37. Harper M. Python-ternary: A python library for ternary plots; 2011. Available from: https://github.com/marcharper/python-ternary.
  • 41. Zuiderveld K. Contrast limited adaptive histogram equalization. Graphics Gems IV. 1994; p. 474–485. https://doi.org/10.1016/B978-0-12-336156-1.50061-6.

Learning mitigates genetic drift

Affiliations.

  • 1 Faculty of Science, Research Centre for Toxic Compounds in the Environment, Masaryk University, Kamenice 5, Building A29, 62500, Brno, Czech Republic.
  • 2 Department of Experimental Biology, Faculty of Science, Masaryk University, Kamenice 5, 62500, Brno, Czech Republic.
  • 3 Faculty of Science, Institute of Cell Biology, University of Bern, Baltzerstrasse 4, 3012, Bern, Switzerland.
  • 4 Faculty of Science, Research Centre for Toxic Compounds in the Environment, Masaryk University, Kamenice 5, Building A29, 62500, Brno, Czech Republic.
  • 5 Department of Mathematics, Faculty of Science, Centre for Mathematical Biology, University of South Bohemia, Branišovská 1760, 37005, České Budějovice, Czech Republic.
  • 6 Department of Ecology, Biology Centre, Institute of Entomology, The Czech Academy of Sciences, Branišovská 31, 37005, České Budějovice, Czech Republic.
  • PMID: 36437294
  • PMCID: PMC9701794
  • DOI: 10.1038/s41598-022-24748-8

Genetic drift is a basic evolutionary principle describing random changes in allelic frequencies, with far-reaching consequences in topics ranging from species conservation efforts to speciation. The conventional approach assumes that genetic drift has the same effect on all populations undergoing the same changes in size, regardless of their non-reproductive behaviors and history. Here, however, we reason that processes leading to a systematic increase in individuals' chances of survival, such as learning or immunological memory, can mitigate the loss of genetic diversity caused by genetic drift even if the overall mortality rate in the population does not change. We test this notion in an agent-based model with overlapping generations. The model monitors allele numbers in a population of prey that either can or cannot learn from successfully escaping predators' attacks. Importantly, both populations start with the same effective size and have the same, constant overall mortality rate. Our results demonstrate that even under these conditions, learning can mitigate the loss of genetic diversity caused by drift. It does so by creating a pool of harder-to-die individuals that protect the alleles they carry from extinction. Furthermore, this effect holds regardless of whether the population is haploid or diploid and whether it reproduces sexually or asexually. These findings may be of importance not only for basic evolutionary theory but also for other fields using the concept of genetic drift.

© 2022. The Author(s).

Publication types

  • Research Support, Non-U.S. Gov't
  • Biological Evolution*
  • Gene Frequency
  • Genetic Drift*

Grants and funding

  • CZ.02.1.01/0.0/0.0/15_003/0000469/Ministerstvo Školství, Mládeže a Tělovýchovy
  • 857560/Horizon 2020 Framework Programme

  • RESEARCH BRIEFINGS
  • 03 April 2024

Five steps to connect genetic risk variants to disease

This is a summary of: Schnitzler, G. R. et al . Convergence of coronary artery disease genes onto endothelial cell programs. Nature 626 , 799–807 (2024) .

doi: https://doi.org/10.1038/d41586-024-00061-4

Strategies to Minimize Genetic Drift

Population genetics: past, present, and future

Atsuko Okazaki

1 Intractable Disease Research Center, Juntendo University, Tokyo, Japan

2 Laboratory of Statistical Genetics, Rockefeller University, 1230 York Avenue, New York, NY 10065 USA

Satoru Yamazaki

3 Department of Molecular Pharmacology, National Cerebral and Cardiovascular Center, Osaka, Japan

Ituro Inoue

4 Division of the Human Genetics, National Institute of Genetics, Shizuoka, Japan

We present selected topics of population genetics and molecular phylogeny. As several excellent review articles have been published and generally focus on European and American scientists, here, we emphasize contributions by Japanese researchers. Our review may also be seen as a belated 50-year celebration of Motoo Kimura’s early seminal paper on the molecular clock, published in 1968.

Introduction

In recent years, large amounts of DNA sequencing data have been generated by projects such as 1000 Genomes (Genomes Project et al. 2010, 2012, 2015) and the ALSPAC database (Fraser et al. 2013; Hameed et al. 2017), and by studies of Icelandic (Gudbjartsson et al. 2015) and Japanese (Nagasaki et al. 2015) populations. These efforts have produced two major findings: (1) genetic variation within populations is larger than variation between populations, and (2) each individual harbors large numbers of variants with low allele frequencies. Population genetics and evolutionary studies predicted these findings long ago, so it is instructive to look back at the historic achievements of population genetics.

Excellent reviews of population genetics have been written (Chakraborty 2006 ; Charlesworth and Charlesworth 2017 ; Crow 1987 ; Crow and Kimura 1970 ) documenting the development of population genetics from early achievements by Mendel ( 1866 ), Hardy ( 1908 ), and Weinberg ( 1908 ) up to highly sophisticated theoretical developments, mostly by American, British, and Japanese scientists. Here, we review selected aspects of population genetics, genome evolution, and molecular phylogeny with an emphasis on contributions by Japanese researchers.

Historical aspects of population genetics and road to the neutral theory

Darwin’s theory of evolution through selection explains well how heritable phenotypes change over time. In the early 1900s, R. A. Fisher, S. Wright, and J. B. S. Haldane focused on the evolution of genetic variants in populations and made fundamental theoretical contributions to population genetics (Provine 1971). Fisher’s 1922 paper (Fisher 1922) was the first to introduce diffusion equations into population genetics. In 1927, Haldane (Haldane 1927) used branching processes to approximate the change in the number of copies of very rare mutants. Wright (1938) developed the theory of the effects of genetic drift, that is, random changes in allele frequencies in small populations. At that time, the molecular basis of genes had yet to be established and the effects of genetic drift were underestimated, so his theory was supported by only a minority of scientists. Nevertheless, it made a great contribution to connecting Mendelian genetics with the Darwinian theory of evolution.

More recently, it has become apparent that many molecular changes have no effect on phenotype. Building on Wright’s drift hypothesis and Haldane’s approximation for an advantageous mutation (Haldane 1927), Motoo Kimura (1964) developed his neutral theory based on backward diffusion models. These models showed that the probability of fixation of a new advantageous variant in the population is approximately $2s(N_e/N)$, where s is the selection coefficient, N the size of the breeding population, and $N_e$ the effective population size.
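For a sense of scale, the $2s(N_e/N)$ result can be compared with Kimura's diffusion formula for the fixation probability of an allele at initial frequency p, $u(p) = \frac{1 - e^{-4N_e s p}}{1 - e^{-4N_e s}}$, evaluated at $p = 1/(2N)$ for a single new mutant. The sketch below assumes a diploid population with additive (genic) selection. When s is small and $N_e s \gg 1$, the exact value approaches $2sN_e/N$. The function name and parameter values are illustrative only.

```python
import math


def fixation_probability(p, s, ne):
    """Kimura's diffusion approximation for the fixation probability of an allele.

    p  -- initial allele frequency
    s  -- selection coefficient (additive/genic selection assumed)
    ne -- effective population size
    Intended for moderate |ne * s|; very large negative ne * s overflows exp.
    """
    if s == 0:
        return p  # neutral limit: fixation probability equals initial frequency
    # expm1(-x) = e^{-x} - 1, so the ratio below equals (1 - e^{-x}) / (1 - e^{-y})
    return math.expm1(-4.0 * ne * s * p) / math.expm1(-4.0 * ne * s)


n, ne, s = 10_000, 10_000, 0.01
exact = fixation_probability(1.0 / (2 * n), s, ne)  # a single new mutant copy
approx = 2 * s * ne / n
print(exact, approx)
```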

Mutation and selection are driving forces of evolution. Mutations occur essentially at random DNA bases. Harmful mutations tend to be eliminated within a short time and do not contribute to long-term evolution. This process is called negative or purifying selection, as opposed to positive selection. Before Kimura (1964) proposed his neutral theory, there was little notion of neutral variation. At about the same time, however, Lewontin and Hubby (1966) considered neutral mutation as a possible explanation for the large amount of variation in electrophoretic mobility that they had found. Still, natural selection was the mainstream hypothesis: advantageous variants in populations were thought to drive evolution, and deleterious variants were thought to be removed rapidly.

At the time, population genetics usually considered two alleles at each gene locus, based on the assumption of genes being base pairs. In contrast, Kimura and Crow (1964) assumed an infinite alleles model (“neutral isoalleles”) and proposed that genetic variation in populations arises from the balance between mutation and genetic drift. Comparing hemoglobin molecules between different organisms, Kimura (1968) postulated that amino-acid substitution rates are so high that they can only be explained by neutral mutations. In other words, mutation and random changes in a finite population can maintain considerable variation through the random fixation of selectively neutral or nearly neutral mutants. In the light of current knowledge, however, Kimura’s reasoning appears somewhat flawed. For example, he argued that the “cost of natural selection” would otherwise be too high. Later analysis has shown that beneficial mutations impose no cost in the absence of environmental deterioration. He also used the total amount of DNA without distinguishing protein-coding from non-coding regions. Nonetheless, Kimura’s contributions to population genetics have been tremendous.

Together with the Darwinian selection hypothesis, the neutral theory is one of the two pillars of genome evolution. Thus, ‘survival of the luckiest, and not necessarily of the fittest’ may be a good description of how the great majority of genetic changes evolve (Chakraborty 2006). Interestingly, Kimura (1969) also proposed the “infinite sites model”. In this model, if the mutation rate is low and the effective population size is small ($\theta = 4N_e\mu \ll 1$), each new mutant variant appears at a different site in the genome. If so, identity by state at a variant can be regarded as identity by descent. In this respect, the infinite sites model is one of the foundations of genome-wide association studies that use SNPs as genetic markers in unrelated individuals (Sella and Barton 2019).

The nearly neutral theory

In the neutral theory, the evolutionary rate is $\lambda = f\mu$, where f is the proportion of neutral mutations among all mutations in a gene and μ is the mutation rate. This rate disregards mutations favorable to survival and simply classifies the other mutations as neutral (f) or deleterious ($1 - f$). However, the degree of harmfulness, measured by the selection coefficient s, is a continuous quantity. Based on these ideas, Tomoko Ohta (Ohta 1973, 1992, 2002), who had built the foundation of the neutral theory with Motoo Kimura, proposed the “nearly neutral” theory. In this theory, slightly disadvantageous mutations (attenuated mutations) can persist in a population by chance if the population is small. According to her publications (Ohta 1973, 1992, 2002), a substantial fraction of changes is caused by the random fixation of nearly neutral changes. This class includes intermediates between neutral and advantageous as well as between neutral and deleterious classes. Other population geneticists may disagree with this view (Kondrashov 1995; Nei 2005).

Unlike the neutral theory, the nearly neutral theory allows (1) mutations subject to weak selection (including weak selection against slightly deleterious mutations) and (2) strictly neutral mutations to contribute jointly to evolution, with weak selection acting in opposition to genetic drift (Hurst 2009). In a very large population, genetic drift is weak relative to selection, so slightly disadvantageous mutations are purged; in a small population, drift dominates, so such mutations are retained and some even become fixed. The structure of very large datasets such as the 1000 Genomes Project or the Exome Sequencing Project 6500 appears to be explained by the nearly neutral theory, because there is increasing evidence that selection in species with small populations, such as mammals including humans, is weaker than in their ancestral species, and that slightly disadvantageous mutations have been accumulating in these populations (Kosiol et al. 2008; Nelson et al. 2012; Nielsen et al. 2009; Tennessen et al. 2012).
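The contrast between small and large populations can be made concrete with Kimura's diffusion approximation for the fixation probability of a new mutation with selection coefficient s. The sketch below assumes a diploid population whose effective size equals its census size (N_e = N) and a new mutant starting at frequency 1/(2N). Under these assumptions the substitution rate is k = 2Nμ·u(s), which reduces to μ for neutral mutations, in line with λ = fμ. The function names are illustrative.

```python
import math


def fixation_probability(s, n):
    """Kimura's fixation probability for a new mutant at frequency 1/(2N).

    u(s) = (1 - exp(-2s)) / (1 - exp(-4Ns)), assuming N_e = N.
    Written with expm1 and split by the sign of s to avoid overflow.
    """
    if s == 0:
        return 1.0 / (2 * n)  # neutral limit
    if s > 0:
        return math.expm1(-2 * s) / math.expm1(-4 * n * s)
    # s < 0: numerator and denominator are multiplied by exp(4Ns) so that
    # no exponential of a large positive number is ever evaluated.
    return math.expm1(-2 * s) * math.exp(4 * n * s) / -math.expm1(4 * n * s)


def relative_substitution_rate(s, n):
    """Substitution rate relative to neutral mutations: 2N mu u(s) / mu = 2N u(s)."""
    return 2 * n * fixation_probability(s, n)


if __name__ == "__main__":
    s = -1e-3  # a hypothetical slightly deleterious mutation
    for n in (100, 1_000, 10_000):
        print(f"N = {n:>6}: relative rate = {relative_substitution_rate(s, n):.3g}")
```

With s = −0.001, the relative rate is close to 1 for N = 100 but vanishingly small for N = 10,000: the same slightly deleterious mutation behaves almost neutrally in a small population and is effectively purged in a large one.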

Evolutionary rate of pseudogenes

In the second half of the 1970s, accumulating sequence data confirmed the prediction by King and Jukes (1969) that synonymous changes evolve at higher rates than non-synonymous changes, which supports the neutral theory. Kimura (1977) asserted that, according to the neutral mutation–random drift hypothesis, most mutant substitutions detected among organisms should result from the random fixation of selectively neutral or nearly neutral mutations. This conjecture was supported by analyses of the evolutionary rates of pseudogenes, that is, gene copies whose sequences resemble those of functional genes but which have lost their function, for example after being duplicated to another location in the genome without the sequences needed for their transcription. Based on the neutral theory, Takashi Miyata calculated the substitution rates at non-synonymous and synonymous sites in the nucleotide sequences of several pseudogenes, including α- and β-globin pseudogenes, and compared them with those of their functional counterparts (Miyata and Hayashida 1981). The rates were uniformly high across the different pseudogenes, almost equal to the mutation rate, and no functional gene evolved faster. This observation clearly supported the neutral theory.

Junk DNA, a term popularized by Susumu Ohno (1972) but rarely used today (see below), includes intergenic regions, much of which consists of SINEs (short interspersed elements) and LINEs (long interspersed elements). The term ‘junk DNA’ was used by a few other authors in 1972, and even 9 years earlier in a paper little known to human geneticists (Ehret and De Haller 1963), but Ohno’s name is most closely associated with it.

Evolutionary rates of junk DNA are expected to be similar to those at synonymous sites and in pseudogenes. In mammals, most of the genome, likely well over 90%, is predicted to be junk DNA. Therefore, the evolutionary rate of the whole genome can be approximated by that of junk DNA.

In 2012, the Encyclopedia of DNA Elements (ENCODE) project (Consortium 2012) assigned biochemical functions to 80% of the genome, much of it outside protein-coding regions and once considered junk DNA. The findings of the ENCODE project allow the function of the human genome to be explored further.

Genes and genomic duplication

In higher organisms, gene and genome duplication is known to be extremely important for evolution. Early on, Susumu Ohno proposed that duplication is a major driving force of evolution, a visionary idea at a time when large-scale sequencing data were not yet available (Ohno 1970). It has been shown empirically and theoretically that creating new copies of genomes (or individual genes) can result in higher fitness. An alternative model explaining the preservation of duplicated genes is the DDC (duplication–degeneration–complementation) model (Lynch and Conery 2000). In the DDC model, a gene whose regulatory elements control independent functions is duplicated, and random null mutations (degeneration) in the regulatory elements of each copy lead to subfunctionalization, in which the two copies complement each other to provide the full ancestral repertoire. Importantly, this process does not require positive selection, that is, functional diversification. It has also been proposed that selection acts on slightly disadvantageous mutations as the expression level of each gene copy changes. Gene duplication is therefore predicted to proceed in a nearly neutral manner, driven by mutation pressure and genetic drift. In addition, “concerted evolution” in hypervariable minisatellites used as genetic markers, and in other sequences such as rRNA genes, can be well explained by Ohno’s theory (Hillis et al. 1991; Jeffreys et al. 1985).

Molecular phylogeny

Living organisms have descended from common ancestors through evolution. Systematic biology seeks to unravel relationships among organisms and to reconstruct evolutionary trees. As every biology student knows, the classical approach is the painstaking analysis of morphological details, and depending on which of these phenotypes are considered most important, different relationships among organisms emerge.

Rather than relying on phenotypes that may or may not be heritable, molecular phylogeny relies on DNA sequences and their comparison among organisms. Researchers with various backgrounds have made significant contributions to methods for constructing phylogenetic trees and evaluating phylogenetic relationships. Joseph Felsenstein almost single-handedly established this field as a special branch of population genetics (Felsenstein 2004). For example, he introduced the maximum-likelihood method of estimating phylogenetic trees (Felsenstein 1978) (see below). Another of his contributions is the “Felsenstein zone” (Huelsenbeck and Hillis 1993), which concerns the phenomenon of “long-branch attraction”: two long branches tend to be inferred as sister taxa on a tree even when they are not each other’s closest relatives. The zone is the set of trees (combinations of branch lengths) on which long-branch attraction occurs. Long-branch attraction has since been observed in many empirical datasets and simulation analyses, where it can lead to the inference of relationships that do not exist (Huelsenbeck and Hillis 1993). Furthermore, Felsenstein contributed greatly to molecular phylogeny by developing the program package PHYLIP, which combines various tree estimation methods, including DNAML. Thanks to his contributions, molecular phylogeny has become increasingly popular among empirical molecular evolutionists.

The development of molecular phylogeny may not seem related to disease gene discovery. However, it contributes greatly to such discoveries through the interpretation of huge sequencing datasets from the 1000 Genomes Project and other projects. Building molecular phylogenetic trees of the relationships among species led to the identification of gene families and of orthologs and paralogs. Coalescent theory, which traces the genealogy (gene tree) of genes sampled from a species backward in time, has also been applied to reconstruct the demographic history of species of interest. Regarding coalescent theory in particular, Tajima (1983) estimated nucleotide diversity from limited DNA polymorphism data and calculated the coalescence time of genes sampled from a single population; his theory also applies to a few genes at the time of population splitting. Takahata and Nei (1985) further developed coalescent theory for DNA sequence data and showed theoretically that alleles with deep coalescences are relatively rare.
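The backward-in-time view of the coalescent can be illustrated with a short simulation of Kingman's coalescent. The sketch assumes a diploid Wright–Fisher population of constant size N (2N gene copies) and measures time in generations: while k lineages remain, the waiting time to the next coalescence is approximately exponential with rate k(k − 1)/2 divided by 2N. Summing the expected waiting times gives an expected time to the most recent common ancestor (TMRCA) of 4N(1 − 1/n) for a sample of n genes. Names and parameter values are illustrative.

```python
import random


def simulate_tmrca(n_sample, n_diploid, rng):
    """Simulate one TMRCA (in generations) under Kingman's coalescent."""
    t = 0.0
    for k in range(n_sample, 1, -1):
        # Rate at which some pair among k lineages coalesces: C(k, 2) / (2N).
        rate = k * (k - 1) / 2 / (2 * n_diploid)
        t += rng.expovariate(rate)
    return t


def expected_tmrca(n_sample, n_diploid):
    """Analytical expectation 4N(1 - 1/n) generations."""
    return 4 * n_diploid * (1 - 1 / n_sample)


if __name__ == "__main__":
    rng = random.Random(1)
    n, big_n, reps = 10, 1000, 20_000  # hypothetical sample and population sizes
    mean_t = sum(simulate_tmrca(n, big_n, rng) for _ in range(reps)) / reps
    print(f"simulated mean TMRCA = {mean_t:.0f}, expected = {expected_tmrca(n, big_n):.0f}")
```

For n = 10 and N = 1000 the simulated mean should lie close to 3600 generations. Almost half of that time is spent waiting for the last two lineages to merge, which is one reason why deep coalescences dominate the genealogy.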

The neighbor-joining method

Many methods for constructing (estimating) phylogenetic trees have been developed. Historically, these methods can roughly be classified into two groups: distance matrix methods and character state methods. Distance matrix methods first estimate an evolutionary distance, such as the number of amino-acid or nucleotide substitutions, for every pair of OTUs (operational taxonomic units) and then build the tree from the resulting matrix. The first such method applied to phylogenetic trees was UPGMA (unweighted pair group method with arithmetic mean), in which clusters of neighboring OTUs are formed and connected in a stepwise fashion. This kind of clustering is used not only for amino-acid or nucleotide sequences but also in numerical taxonomy based on trait-encoded information (Sokal and Michener 1958) and in the analysis of microarray expression data (Eisen et al. 1998). However, because UPGMA assumes a constant evolutionary rate, it is problematic to apply to amino-acid or nucleotide sequence data. To overcome this problem, distance methods that do not assume a molecular clock were developed (Fitch and Margoliash 1967). Masatoshi Nei and Naruya Saitou greatly improved on this approach and developed a much faster procedure, the neighbor-joining method (Saitou and Nei 1987). It is a “star decomposition” method: starting from a star-shaped tree, it repeatedly determines which pair of nodes, when joined, reduces the total tree length most, and joins neighboring nodes until all OTUs are included. In the neighbor-joining method, “neighbors” are tracked as nodes on a tree rather than as taxa or clusters of taxa. A modified distance matrix is computed in which the separation between each pair of nodes is adjusted on the basis of their average divergence from all other nodes. The tree is constructed by joining the least-distant pair of nodes in this modified matrix. When two nodes are joined, their common ancestral node is added to the tree, and the two terminal nodes, with their respective branches, are removed from further consideration.
At each stage in the process, two terminal nodes are replaced by one new node. This iterative operation finds “neighbors” one after another and yields the final phylogenetic tree. The neighbor-joining method is the most commonly used distance matrix method. Starting in 1971, Nei proposed using Nei’s distance for phylogenetic tree estimation; it was later incorporated, together with the neighbor-joining method, into the program package MEGA (Kumar et al. 1994; Saitou and Nei 1987).
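The joining procedure described above can be sketched in Python. This is a minimal illustration assuming a commonly used formulation of the neighbor-joining criterion, Q(i, j) = (r − 2)d(i, j) − S_i − S_j, where r is the number of current nodes and S_i is the sum of distances from node i to all other nodes; it is not the implementation used in MEGA or PHYLIP. Internal node names (u1, u2, …) are hypothetical, ties are broken by input order, and the five-taxon matrix is an illustrative additive example.

```python
from itertools import combinations


def neighbor_joining(labels, matrix):
    """Return the edges (parent, child, branch_length) of an unrooted NJ tree."""
    # Distances between current nodes, stored as a dict of dicts.
    d = {a: {b: float(matrix[i][j]) for j, b in enumerate(labels)}
         for i, a in enumerate(labels)}
    nodes = list(labels)
    edges = []
    counter = 0
    while len(nodes) > 2:
        r = len(nodes)
        # Net divergence S_i of each node from all other current nodes.
        totals = {a: sum(d[a][b] for b in nodes) for a in nodes}

        # Modified distance Q(i, j); the pair with the smallest value is joined.
        def q(pair):
            i, j = pair
            return (r - 2) * d[i][j] - totals[i] - totals[j]

        i, j = min(combinations(nodes, 2), key=q)
        counter += 1
        u = f"u{counter}"  # new ancestral node replacing i and j
        branch_i = d[i][j] / 2 + (totals[i] - totals[j]) / (2 * (r - 2))
        branch_j = d[i][j] - branch_i
        edges += [(u, i, branch_i), (u, j, branch_j)]
        # Distances from the new node to every remaining node.
        d[u] = {u: 0.0}
        for k in nodes:
            if k not in (i, j):
                d[u][k] = d[k][u] = (d[i][k] + d[j][k] - d[i][j]) / 2
        # Two terminal nodes are replaced by one new node.
        nodes = [k for k in nodes if k not in (i, j)] + [u]
    a, b = nodes
    edges.append((a, b, d[a][b]))  # connect the last two nodes
    return edges


LABELS = ["a", "b", "c", "d", "e"]
DIST = [
    [0, 5, 9, 9, 8],
    [5, 0, 10, 10, 9],
    [9, 10, 0, 8, 7],
    [9, 10, 8, 0, 3],
    [8, 9, 7, 3, 0],
]

if __name__ == "__main__":
    for parent, child, length in neighbor_joining(LABELS, DIST):
        print(f"{parent} -- {child}: {length:g}")
```

For this additive example, a and b are joined first, with branch lengths 2 and 3, and the path lengths along the returned tree reproduce every entry of the input matrix. Distances estimated from real sequences are rarely exactly additive, in which case neighbor joining returns an approximation.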

The second group, character state methods, does not use a distance matrix; instead, these methods define characters (such as individual sequence positions or phenotypic traits) and use them directly to evaluate tree topologies. One example of a character state method is the maximum-likelihood method discussed in the next section.

The maximum-likelihood method

Maximum likelihood (ML) was developed by Fisher (1922) as a method for estimating parameters in statistical models. It has several advantages over other methods but tends to be more complicated to apply than simpler methods. In population genetics, Luigi Luca Cavalli-Sforza first applied ML to constructing phylogenetic trees from allele frequencies (Cavalli-Sforza and Edwards 1967). The first use of maximum-likelihood inference of trees from molecular sequences was by Jerzy Neyman (Felsenstein 2001; Neyman 1971). Felsenstein proposed ML for constructing phylogenetic trees from allele frequencies treated as continuous quantities (Felsenstein 1973a), improving on the method previously proposed by Cavalli-Sforza, and introduced ML for estimating trees from discrete characters, relating it to the maximum parsimony criterion (Felsenstein 1973b). Masami Hasegawa incorporated this approach into the MOLPHY program package and pioneered the use of model selection methods such as AIC for comparing phylogenies (he was a member of Akaike’s institute) (Adachi and Hasegawa 1992, 1996).
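As a minimal illustration of maximum-likelihood estimation in a phylogenetic setting, the sketch below estimates the evolutionary distance between two aligned DNA sequences, the simplest possible “tree” (two sequences joined by one branch). It assumes the Jukes–Cantor (JC69) substitution model and uses a golden-section search; both are choices made for this example, not methods described in the text. Because the maximum has a closed form, d = −(3/4) ln(1 − 4p/3), where p is the proportion of differing sites, the numerical optimum can be checked against it.

```python
import math


def jc69_log_likelihood(d, n_sites, n_diff):
    """JC69 log-likelihood of an alignment with n_diff differing sites.

    d is the expected number of substitutions per site between the two
    sequences; the constant factor for the base at the root is omitted.
    """
    e = math.exp(-4.0 * d / 3.0)
    p_same = 0.25 + 0.75 * e  # probability that a site shows the same base
    p_one_diff = 0.25 - 0.25 * e  # probability of one particular different base
    return (n_sites - n_diff) * math.log(p_same) + n_diff * math.log(p_one_diff)


def ml_distance(n_sites, n_diff, lo=1e-9, hi=10.0, tol=1e-10):
    """Maximize the JC69 log-likelihood over d by golden-section search."""
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    while b - a > tol:
        x1 = b - inv_phi * (b - a)
        x2 = a + inv_phi * (b - a)
        if jc69_log_likelihood(x1, n_sites, n_diff) > jc69_log_likelihood(x2, n_sites, n_diff):
            b = x2  # the maximum lies in [a, x2]
        else:
            a = x1  # the maximum lies in [x1, b]
    return (a + b) / 2.0


def jc69_closed_form(p):
    """Analytical ML estimate for a proportion p < 3/4 of differing sites."""
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


if __name__ == "__main__":
    n, k = 100, 20  # hypothetical alignment: 20 differences in 100 sites
    print(f"numerical ML d = {ml_distance(n, k):.5f}")
    print(f"closed form d  = {jc69_closed_form(k / n):.5f}")
```

Both approaches give d ≈ 0.233 substitutions per site, larger than the raw proportion 0.20 because multiple substitutions at the same site are hidden. For trees with more sequences, no closed form exists and the likelihood must be maximized numerically over both branch lengths and topologies.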

The ML method is often regarded as the most statistically efficient of all tree construction methods. For example, false-positive support for relationships among long branches (“long-branch attraction”) does not occur when trees are estimated by ML and the model of evolution is correct, although it can occur when the model is misspecified. However, the ML method is time-consuming and, for some large trees, may be impractical to apply.
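Part of the computational burden comes from the sheer number of candidate topologies. A standard combinatorial result is that there are (2n − 5)!! = 3 × 5 × … × (2n − 5) distinct unrooted, fully bifurcating topologies for n labelled taxa (n ≥ 3). The short sketch below evaluates this product to show how quickly an exhaustive search over all trees becomes infeasible; the function name is illustrative.

```python
def num_unrooted_trees(n_taxa):
    """Number of unrooted bifurcating topologies for n labelled taxa: (2n - 5)!!."""
    if n_taxa < 3:
        raise ValueError("at least three taxa are needed")
    count = 1
    for odd in range(3, 2 * n_taxa - 4, 2):  # 3, 5, ..., 2n - 5
        count *= odd
    return count


if __name__ == "__main__":
    for n in (4, 5, 10, 20, 50):
        print(f"{n:>3} taxa: {num_unrooted_trees(n):.3e} topologies")
```

For 20 taxa the count already exceeds 10^20, which is why practical ML programs rely on heuristic tree searches rather than evaluating every topology.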

Impact of variants on multifactorial disorders and missing heritability

Based on the material covered so far, we now turn to how progress in population genetics, genome evolution, and phylogenetic studies can be applied to medical research.

Multifactorial disorders are assumed to arise from interactions between multiple genetic and environmental factors. Identifying disease susceptibility genes has therefore been considered difficult, and detecting interactions with environmental factors even more so. Such views were widespread especially in the 1990s, in sharp contrast to the relative ease with which genes for monogenic disorders were being identified. Nevertheless, some researchers were already tackling the genetic causes of multifactorial disorders at that time. Ituro Inoue succeeded in narrowing down disease loci by linkage analysis of affected sib-pairs and in constructing haplotypes of the angiotensinogen (AGT) gene from limited data (Inoue et al. 1997). Inoue assessed linkage disequilibrium (LD) at each site in the AGT gene and further demonstrated, by an in vitro functional assay, that the combination of the A(−6) and T235 alleles affects expression of the AGT gene. This study was visionary, since LD block structures had not yet been demonstrated at that time.
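Pairwise linkage disequilibrium of the kind assessed in such studies is commonly summarized by D, the normalized D′, and r². The sketch below computes these from counts of the four haplotypes at two bi-allelic sites. It assumes phased haplotype counts are available (in practice they are often estimated from genotypes) and does not reproduce the specific analysis of Inoue et al. (1997); the function name and example counts are illustrative.

```python
def ld_measures(n_AB, n_Ab, n_aB, n_ab):
    """Return (D, D_prime, r_squared) from two-site haplotype counts.

    Alleles are A/a at the first site and B/b at the second; for example,
    n_Ab is the number of haplotypes carrying A at site 1 and b at site 2.
    """
    total = n_AB + n_Ab + n_aB + n_ab
    p_AB = n_AB / total
    p_A = (n_AB + n_Ab) / total  # frequency of A at site 1
    p_B = (n_AB + n_aB) / total  # frequency of B at site 2
    d = p_AB - p_A * p_B
    # D' divides D by its largest possible magnitude given the allele frequencies.
    if d >= 0:
        d_max = min(p_A * (1 - p_B), (1 - p_A) * p_B)
    else:
        d_max = min(p_A * p_B, (1 - p_A) * (1 - p_B))
    d_prime = d / d_max if d_max > 0 else 0.0  # signed D'
    denom = p_A * (1 - p_A) * p_B * (1 - p_B)
    r_squared = d * d / denom if denom > 0 else 0.0
    return d, d_prime, r_squared


if __name__ == "__main__":
    # Hypothetical counts: no a-B haplotypes, so D' = 1 although r^2 < 1.
    print(ld_measures(40, 10, 0, 50))
```

The example shows why both statistics are reported: D′ = 1 means that one of the four haplotypes is absent, whereas r² = 1 requires the two sites to be perfectly correlated, which is what matters for using one SNP as a proxy for another.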

Later, genome-wide association studies using large SNP datasets covering the whole genome became possible thanks to the HapMap project, SNP collections by Perlegen Sciences, LD block measurements, and the construction of haplotype maps (HapMap 2005; Hinds et al. 2005). Although such genome-wide studies helped narrow down the locations of disease susceptibility genes, their results have still been insufficient to identify many specific susceptibility genes, for example for moyamoya disease (Liu et al. 2011). A remaining challenge has been that the identified susceptibility loci show only small odds ratios, and all of them combined explain only up to about 30% of the heritability of most diseases. This is generally smaller than the heritability estimated in earlier twin studies, a discrepancy known as “missing heritability” (Manolio et al. 2009). More recently, however, methods for estimating SNP-based heritability have been developed (Yang et al. 2017) that yield estimates close to those obtained by classical segregation analysis, and part of the problem seems to have been resolved.

Out-of-Africa hypothesis

Recent advances in sequencing technology have made it possible to determine whole-genome sequences at the population level. This has allowed present-day human genomes to be compared with ancient genomes such as those of Homo neanderthalensis and the Denisova hominin, which has greatly advanced our understanding of the origin of Homo sapiens (Nielsen et al. 2017). Allan Wilson, along with Rebecca Cann and Mark Stoneking, first proposed the “out-of-Africa” hypothesis (Cann et al. 1987), which holds that Homo sapiens originated in Africa and then spread throughout the world. They based their conclusion on an analysis of mitochondrial DNA from various populations, which yielded the first phylogenetic tree of Homo sapiens. Work by Masatoshi Nei also contributed to the out-of-Africa hypothesis: in the 1970s, Nei calculated heterozygosity for various protein isozymes and constructed phylogenetic trees of human populations (Nei and Roychoudhury 1972, 1974; Nielsen et al. 2017). An interesting finding from this line of work is that genetic variation, measured by Nei’s distance or Wright’s F_ST, is larger within populations than between populations (Lewontin 1972), which was later confirmed by the 1000 Genomes Project. In other words, individuals within a given population differ more from one another than populations differ from each other. However, this notion has also been challenged (Edwards 2003).
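The apportionment of variation within and between populations can be illustrated with Nei's heterozygosity-based formulation of F_ST (often written G_ST) for a single bi-allelic locus: F_ST = (H_T − H_S)/H_T, where H_S is the mean expected heterozygosity within subpopulations and H_T is the expected heterozygosity of the pooled population. The sketch assumes equally weighted subpopulations and known allele frequencies, with no sample-size correction; the frequencies in the example are hypothetical.

```python
def expected_heterozygosity(p):
    """Expected heterozygosity 2p(1 - p) at a bi-allelic locus."""
    return 2.0 * p * (1.0 - p)


def fst_nei(freqs):
    """Nei-style F_ST (G_ST) for one bi-allelic locus.

    freqs: frequency of the same allele in each subpopulation, with all
    subpopulations weighted equally.
    """
    h_s = sum(expected_heterozygosity(p) for p in freqs) / len(freqs)
    p_bar = sum(freqs) / len(freqs)
    h_t = expected_heterozygosity(p_bar)
    if h_t == 0:
        return 0.0  # the locus is monomorphic in the pooled population
    return (h_t - h_s) / h_t


if __name__ == "__main__":
    fst = fst_nei([0.4, 0.6])
    print(f"F_ST = {fst:.3f}; within-population share = {100 * (1 - fst):.0f}%")
```

For these two hypothetical populations, F_ST = 0.04, meaning that 96% of the expected heterozygosity is found within populations. Small values of this kind underlie the statement that individuals within a population differ more than populations do.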

Relationship between recent explosive population growth and origin of deleterious variants

Large human genome sequencing projects such as the 1000 Genomes Project have revealed that each individual harbors a considerable number of private mutations. Such a burden had been anticipated by Haldane’s “genetic load” theory, which predicted an association between the number of variants carried in a population and survival (Haldane 1937). In this theory, he argued that if genetic load is considered for the whole genome rather than for a single locus, the decrease in fitness caused by mutation equals the mutation rate, v, irrespective of the strength of selection (a result also known as the Haldane–Muller principle). He also argued that pathogenic mutations accumulate as heterozygous variants until they are eliminated, for example as lethal homozygotes (Haldane 1937). The theory of genetic load was further elaborated by Kimura (1960); for neutral mutations, there is no load. Against this background, it has become possible to estimate the age of each variant, particularly of variants whose frequencies differ among populations; this is important both for understanding the history of human evolution and for developing novel methods of disease gene discovery. The mathematical theory of the coalescent, which allows haplotype and allele ages to be calculated, was developed by John Kingman (2000), and Kimura and Ohta (1973) proposed a formula for allele age, −2x ln(x)/(1 − x). This formula gives the expected age, in units of 2N_e generations, of a neutral mutation at frequency x in a stationary population, derived from the diffusion process used in classical population genetics. Its assumption that the age distribution of a mutant allele at population frequency x is the same as the distribution of the allele’s time to extinction, conditional on extinction, has been debated as restrictive; nevertheless, the formula contributed greatly to later calculations of allele age (Fu et al. 2013).
Using allele ages calculated under the infinite sites model of mutation and building on the Kimura–Ohta approach, Fu et al. (2013) showed that about three-quarters of all protein-coding SNVs, and an even higher proportion of those predicted to be deleterious, arose within the past 5000 to 10,000 years. This provides important practical information for prioritizing variants in disease gene discovery.
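The Kimura–Ohta expected age can be evaluated directly. The sketch below assumes a neutral allele in a diploid population of constant effective size N_e and returns the expected age in generations, −4N_e x ln(x)/(1 − x), which is the usual form of the result (equivalently −2x ln(x)/(1 − x) in units of 2N_e generations). The function name, the N_e value, and the 25-year generation time in the example are illustrative assumptions.

```python
import math


def kimura_ohta_age(x, ne):
    """Expected age (generations) of a neutral allele now at frequency x.

    Kimura and Ohta (1973): E[age] = -4 Ne x ln(x) / (1 - x), for 0 < x < 1.
    """
    if not 0.0 < x < 1.0:
        raise ValueError("frequency must lie strictly between 0 and 1")
    return -4.0 * ne * x * math.log(x) / (1.0 - x)


if __name__ == "__main__":
    ne = 10_000  # hypothetical effective population size
    generation_years = 25  # hypothetical generation time
    for x in (0.001, 0.01, 0.1, 0.5):
        gens = kimura_ohta_age(x, ne)
        print(f"x = {x:<5}: ~{gens:,.0f} generations (~{gens * generation_years:,.0f} years)")
```

The expected age rises steeply with frequency and approaches 4N_e generations, the mean fixation time of a neutral mutant, as x approaches 1. Rare alleles are therefore expected to be young. The formula assumes a constant population size, whereas recent explosive human population growth makes rare variants younger still.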

Inbreeding (mating between relatives) has not been discussed so far because it does not change allele frequencies. It does, however, decrease the proportion of heterozygotes and correspondingly increase that of homozygotes. As is well known, at a bi-allelic locus with allele frequency p, the proportion of heterozygotes is 2p(1 − p)(1 − F), where F is the inbreeding coefficient. In many human populations, F tends to be rather small; for example, F = 0.00038 in the UK (Pattison 2016). An exception is the offspring of first cousins (F = 1/16). For a rare deleterious recessive trait with disease allele frequency p, affected (homozygous) offspring of first-cousin marriages occur with probability p² + p(1 − p)F (Haldane and Moshinsky 1939). Because the disease allele is linked to the SNPs surrounding it, loci for rare recessive traits tend to lie within long runs of homozygous SNPs in affected offspring of consanguineous unions, which is the basis of homozygosity mapping (Lander and Botstein 1987). More modern approaches have been developed, for example based on the Hamming distance between chromosomes of affected and control individuals (Imai et al. 2015). This approach revealed a mutation, p.H96R in the BOLA3 gene, that possibly originated in a single Japanese founder individual (Imai et al. 2016).
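The genotype-frequency formulas above can be checked numerically. The sketch below assumes a single bi-allelic locus with recessive disease-allele frequency p and inbreeding coefficient F, and compares the probability of an affected child under random mating (F = 0) with that for a child of first cousins (F = 1/16). The allele frequency in the example is hypothetical.

```python
def genotype_frequencies(p, f):
    """Return (AA, Aa, aa) genotype frequencies at a bi-allelic locus.

    p is the frequency of the recessive allele a and f is the inbreeding
    coefficient. Heterozygosity is reduced by the factor (1 - f), and the
    deficit is added equally to the two homozygous classes.
    """
    q = 1.0 - p
    het = 2.0 * p * q * (1.0 - f)
    hom_dominant = q * q + p * q * f
    hom_recessive = p * p + p * q * f  # p^2 + p(1 - p)F, the affected class
    return hom_dominant, het, hom_recessive


if __name__ == "__main__":
    p = 0.01  # hypothetical frequency of a rare recessive disease allele
    random_mating = genotype_frequencies(p, 0.0)[2]
    first_cousins = genotype_frequencies(p, 1.0 / 16.0)[2]
    print(f"affected, random mating: {random_mating:.2e}")
    print(f"affected, first cousins: {first_cousins:.2e}")
    print(f"relative risk: {first_cousins / random_mating:.1f}")
```

With p = 0.01, the risk for a child of first cousins is roughly seven times that under random mating. The rarer the allele, the larger this ratio becomes, which is why consanguineous families are so informative for mapping rare recessive traits.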

Darwinian (evolutionary) medicine

From the viewpoint of Darwinian medicine (or evolutionary medicine), which applies evolutionary principles to medicine (Williams and Nesse 1991), we discuss a few aspects of how discovering variants can translate into medical care.

In the 1960s, Richard Lewontin found more heterozygosity in Drosophila populations than had been expected (Lewontin and Hubby 1966). He interpreted this finding as a fitness advantage of heterozygotes over both wild-type and mutant homozygotes (so-called overdominance, a form of balancing selection) and emphasized its importance for survival. After the establishment of the neutral theory, as described above, the importance of balancing selection for some types of variants with high allele frequencies was rediscovered. Theoretical studies on natural selection also progressed greatly. “Tajima’s D”, developed by Fumio Tajima, is computed as the difference between two measures of genetic diversity, the mean number of pairwise differences and the number of segregating sites, each scaled so that they have the same expectation in a neutrally evolving population of constant size. This is a unique contribution to statistical genetics by Japanese researchers in that the statistic can assess, for any locus or region across the genome, whether the observed variation is consistent with neutrality or indicates selection pressure (Tajima 1989).
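Tajima's D can be computed from a set of aligned sequences as described above. The sketch below follows the standard formula of Tajima (1989): the difference between the mean number of pairwise differences (π) and the number of segregating sites divided by a₁ = Σ 1/i (Watterson's estimate) is divided by an estimate of its standard deviation. It assumes an alignment with no gaps or missing data, counts any column with more than one base as segregating, and raises an error when there are no segregating sites (D is then undefined). The example sequences are hypothetical.

```python
import math
from itertools import combinations


def tajimas_d(seqs):
    """Tajima's D for a list of aligned, equal-length sequences (no gaps)."""
    n = len(seqs)
    if n < 4:
        raise ValueError("at least four sequences are needed")
    length = len(seqs[0])
    # S: number of segregating (polymorphic) sites.
    n_seg = sum(1 for i in range(length) if len({seq[i] for seq in seqs}) > 1)
    if n_seg == 0:
        raise ValueError("no segregating sites; D is undefined")
    # pi: mean number of pairwise differences over all pairs of sequences.
    pairs = list(combinations(seqs, 2))
    pi = sum(sum(x != y for x, y in zip(s1, s2)) for s1, s2 in pairs) / len(pairs)
    # Constants from Tajima (1989).
    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / (i * i) for i in range(1, n))
    b1 = (n + 1) / (3.0 * (n - 1))
    b2 = 2.0 * (n * n + n + 3) / (9.0 * n * (n - 1))
    c1 = b1 - 1.0 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / (a1 * a1)
    e1 = c1 / a1
    e2 = c2 / (a1 * a1 + a2)
    # Difference between the two diversity estimates, scaled by its standard deviation.
    return (pi - n_seg / a1) / math.sqrt(e1 * n_seg + e2 * n_seg * (n_seg - 1))


if __name__ == "__main__":
    # Hypothetical samples: an excess of singletons versus intermediate frequencies.
    print(f"{tajimas_d(['AAAA', 'TAAA', 'ATAA', 'AATA']):.3f}")
    print(f"{tajimas_d(['AAAA', 'AAAA', 'TTTT', 'TTTT']):.3f}")
```

A negative D indicates an excess of rare variants, as expected after a selective sweep or a population expansion. A positive D indicates an excess of intermediate-frequency variants, as expected under balancing selection or population subdivision. Because demographic history can mimic selection, D is usually interpreted against a genome-wide background.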

Analyzing genome sequences from several populations with next-generation sequencing has revealed signals of positive selection. One such example concerns infection-related diseases: next-generation sequencing has revealed that selection for resistance to pathogens represents the strongest positive selection pressure in human evolution, as seen in the well-known signals of balancing selection on glycoproteins and of positive selection on Toll-like receptors (TLRs) (Ferrer-Admetlla et al. 2008). Applying the evolutionary history of various pathogens to disease susceptibility research will likely identify functional variants as well as intracellular mechanisms and treatments for various diseases. We believe that selection pressure from ancient pathogens affects not only infectious and autoimmune diseases but also other traits. Recently, the association between lifestyle diseases and natural selection has become an attractive topic. Using 40 traits from the UK Biobank, functional low-frequency variants have been shown to be under negative selection (Gazal et al. 2018). An alternative suggestion has been that positive selection acts on susceptibility loci for lifestyle diseases; an example is the thrifty gene hypothesis. At the dawn of the era of genomic medicine, the ancient history of human evolution is a powerful tool for understanding human biology and, ultimately, for improving human health.

In this outline, we deliberately emphasized contributions to population genetics by Japanese researchers—in this field, Japanese scientists have arguably carried out comprehensive fundamental work. Thus, we feel justified in presenting this short review of population genetics from a Japanese point of view.

In terms of future developments in population genetics, we expect DNA sequencing to play an ever-increasing role. In an era where human genome sequence projects are underway around the world, established population genetics principles will be applied to reveal more detailed migration history, population history, and mechanisms of selection pressure, particularly in small ethnic populations (Antonio et al. 2019 ; Lipson et al. 2020 ).

Technological advances have changed the landscape of genetic screening (Ceyhan-Birsoy et al. 2019 ). Together with epidemiological and molecular genetics studies, population genetics approaches have demonstrated the association between disease mechanisms and mutations in populations. Cystic fibrosis is one such successful example (Bell et al. 2020 ). By identifying the relationship between specific mutations and a cystic fibrosis transmembrane conductance regulator (CFTR) defect, we can improve patient care including disease monitoring and treatment decisions. In the future, improvement of patient care in more diseases can be achieved by the combination of population genetics, epidemiological studies, and molecular genetics studies.

With the huge amount of genomic information currently available, it is challenging to link genotypes to phenotypes, predict regulatory functions, and classify mutant types. Therefore, new and innovative approaches are needed for further understanding of medical biology and connections to genetic disease. One approach is to collect previously reported SNV information and create a suitable mathematical model. As an example, a study by Davis et al. ( 2016 ) describes a biophysical metric of cardiomyocyte function, which accurately predicts human cardiac phenotypes.

Another approach uses neural networks to extract relevant features automatically from input data (Zou et al. 2019). Because advances in sequencing technologies provide large amounts of data, machine learning is a realistic tool for analysis in clinical healthcare and population genetics. Although deep learning has great potential, its application to genomics has only just begun. For example, SpliceAI, a 32-layer deep neural network (DNN), was developed to predict splicing from pre-mRNA sequence and was used to identify de novo mutations with predicted splice-altering consequences in patients with neurodevelopmental disorders, paving the way for applying deep learning to complex genetic variant prediction (Jaganathan et al. 2019). To identify pathogenic mutations in patients with rare diseases, a DNN model was trained using common variants from humans and six non-human primate species. The model achieved 88% accuracy and identified 14 previously unreported candidate genes associated with intellectual disability (Sundaram et al. 2018).

Finally, epidemics and pandemics of viruses and their sequences provide rich sources of information. For example, population genetic analyses of 103 SARS-CoV-2 genomes indicated the presence of two major lineages, although the implications of these evolutionary changes remained unclear (Tang et al. 2020 ).

Acknowledgements

Helpful comments by Prof. Joseph Felsenstein on an earlier version of this manuscript are gratefully acknowledged. This work was supported by the Japan Society for the Promotion of Science (JSPS) KAKENHI, Grant numbers JP20K08497 and JP18K15863 (A. O.), and Grant number 19K09408 (S. Y.).

Compliance with ethical standards

The authors declare no conflict of interest.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

  • Adachi J, Hasegawa M. MOLPHY, programs for molecular phylogenetics. I. PROTML, maximum likelihood inference of protein phylogeny. Computer science monographs, no. 27. Tokyo: Institute of Statistical Mathematics; 1992. pp. 1–14.
  • Adachi J, Hasegawa M. MOLPHY version 2.3: programs for molecular phylogenetics based on maximum likelihood. Computer science monographs, no. 28. Tokyo: Institute of Statistical Mathematics; 1996. pp. 1–150.
  • Antonio ML, Gao Z, Moots HM, Lucci M, Candilio F, Sawyer S, Oberreiter V, Calderon D, Devitofranceschi K, Aikens RC, Aneli S, Bartoli F, Bedini A, Cheronet O, Cotter DJ, Fernandes DM, Gasperetti G, Grifoni R, Guidi A, La Pastina F, Loreti E, Manacorda D, Matullo G, Morretta S, Nava A, Fiocchi Nicolai V, Nomi F, Pavolini C, Pentiricci M, Pergola P, Piranomonte M, Schmidt R, Spinola G, Sperduti A, Rubini M, Bondioli L, Coppa A, Pinhasi R, Pritchard JK. Ancient Rome: a genetic crossroads of Europe and the Mediterranean. Science. 2019;366:708–714. doi: 10.1126/science.aay6826.
  • Bell SC, Mall MA, Gutierrez H, Macek M, Madge S, Davies JC, Burgel PR, Tullis E, Castanos C, Castellani C, Byrnes CA, Cathcart F, Chotirmall SH, Cosgriff R, Eichler I, Fajac I, Goss CH, Drevinek P, Farrell PM, Gravelle AM, Havermans T, Mayer-Hamblett N, Kashirskaya N, Kerem E, Mathew JL, McKone EF, Naehrlich L, Nasr SZ, Oates GR, O'Neill C, Pypops U, Raraigh KS, Rowe SM, Southern KW, Sivam S, Stephenson AL, Zampoli M, Ratjen F. The future of cystic fibrosis care: a global perspective. Lancet Respir Med. 2020;8:65–124. doi: 10.1016/S2213-2600(19)30337-6.
  • Cann RL, Stoneking M, Wilson AC. Mitochondrial DNA and human evolution. Nature. 1987;325:31–36.
  • Cavalli-Sforza LL, Edwards AW. Phylogenetic analysis. Models and estimation procedures. Am J Hum Genet. 1967;19:233–257.
  • Ceyhan-Birsoy O, Murry JB, Machini K, Lebo MS, Yu TW, Fayer S, Genetti CA, Schwartz TS, Agrawal PB, Parad RB, Holm IA, McGuire AL, Green RC, Rehm HL, Beggs AH, BabySeq Project T. Interpretation of Genomic Sequencing Results in Healthy and Ill Newborns: Results from the BabySeq Project. Am J Hum Genet. 2019;104:76–93. doi: 10.1016/j.ajhg.2018.11.016.
  • Chakraborty R. Population Genetics: Historical Aspects. eLS. Chichester: Wiley; 2006. pp. 1–3.
  • Charlesworth B, Charlesworth D. Population genetics from 1966 to 2016. Heredity (Edinb). 2017;118:2–9. doi: 10.1038/hdy.2016.55.
  • Consortium EP. An integrated encyclopedia of DNA elements in the human genome. Nature. 2012;489:57–74. doi: 10.1038/nature11247.
  • Crow JF. Population genetics history: a personal view. Annu Rev Genet. 1987;21:1–22. doi: 10.1146/annurev.ge.21.120187.000245.
  • Crow JF, Kimura M. An introduction to population genetics theory. New York: Harper & Row; 1970.
  • Davis J, Davis LC, Correll RN, Makarewich CA, Schwanekamp JA, Moussavi-Harami F, Wang D, York AJ, Wu H, Houser SR, Seidman CE, Seidman JG, Regnier M, Metzger JM, Wu JC, Molkentin JD. A tension-based model distinguishes hypertrophic versus dilated cardiomyopathy. Cell. 2016;165:1147–1159. doi: 10.1016/j.cell.2016.04.002.
  • Edwards AW. Human genetic diversity: Lewontin's fallacy. BioEssays. 2003;25:798–801. doi: 10.1002/bies.10315.
  • Ehret CF, De Haller G. Origin, development and maturation of organelles and organelle systems of the cell surface in Paramecium. J Ultrastruct Res. 1963;23:1–42.
  • Eisen MB, Spellman PT, Brown PO, Botstein D. Cluster analysis and display of genome-wide expression patterns. Proc Natl Acad Sci USA. 1998;95:14863–14868. doi: 10.1073/pnas.95.25.14863.
  • Felsenstein J. Maximum-likelihood estimation of evolutionary trees from continuous characters. Am J Hum Genet. 1973;25:471–492.
  • Felsenstein J. Maximum likelihood and minimum-steps methods for estimating evolutionary trees from data on discrete characters. Syst Biol. 1973;22:240–249.
  • Felsenstein J. The number of evolutionary trees. Syst Biol. 1978;27:27–33. doi: 10.2307/2412810.
  • Felsenstein J. Taking variation of evolutionary rates between sites into account in inferring phylogenies. J Mol Evol. 2001;53:447–455. doi: 10.1007/s002390010234.
  • Felsenstein J. Inferring phylogenies. Sunderland: Sinauer Associates; 2004.
  • Ferrer-Admetlla A, Bosch E, Sikora M, Marques-Bonet T, Ramirez-Soriano A, Muntasell A, Navarro A, Lazarus R, Calafell F, Bertranpetit J, Casals F. Balancing selection is the main force shaping the evolution of innate immunity genes. J Immunol. 2008;181:1315–1322. doi: 10.4049/jimmunol.181.2.1315.
  • Fisher RA. On the mathematical foundations of theoretical statistics. Phil Trans Roy Soc. 1922;A202:309–368.
  • Fitch WM, Margoliash E. Construction of phylogenetic trees. Science. 1967;155:279–284. doi: 10.1126/science.155.3760.279.
  • Fraser A, Macdonald-Wallis C, Tilling K, Boyd A, Golding J, Davey Smith G, Henderson J, Macleod J, Molloy L, Ness A, Ring S, Nelson SM, Lawlor DA. Cohort profile: the avon longitudinal study of parents and children: ALSPAC mothers cohort. Int J Epidemiol. 2013;42:97–110. doi: 10.1093/ije/dys066.
  • Fu W, O'Connor TD, Jun G, Kang HM, Abecasis G, Leal SM, Gabriel S, Rieder MJ, Altshuler D, Shendure J, Nickerson DA, Bamshad MJ, Project NES, Akey JM. Analysis of 6,515 exomes reveals the recent origin of most human protein-coding variants. Nature. 2013;493:216–220. doi: 10.1038/nature11690.
  • Gazal S, Loh P-R, Finucane HK, Ganna A, Schoech A, Sunyaev S, Price AL. Functional architecture of low-frequency variants highlights strength of negative selection across coding and non-coding annotations. Nat Genet. 2018;50:1600–1607.
  • Genomes Project C, Abecasis GR, Altshuler D, Auton A, Brooks LD, Durbin RM, Gibbs RA, Hurles ME, McVean GA. A map of human genome variation from population-scale sequencing. Nature. 2010;467:1061–1073.
  • Genomes Project C, Abecasis GR, Auton A, Brooks LD, DePristo MA, Durbin RM, Handsaker RE, Kang HM, Marth GT, McVean GA. An integrated map of genetic variation from 1,092 human genomes. Nature. 2012;491:56–65.
  • Genomes Project C, Auton A, Brooks LD, Durbin RM, Garrison EP, Kang HM, Korbel JO, Marchini JL, McCarthy S, McVean GA, Abecasis GR. A global reference for human genetic variation. Nature. 2015;526:68–74. doi: 10.1038/nature15393.
  • Gudbjartsson DF, Helgason H, Gudjonsson SA, Zink F, Oddson A, Gylfason A, Besenbacher S, Magnusson G, Halldorsson BV, Hjartarson E, Sigurdsson GT, Stacey SN, Frigge ML, Holm H, Saemundsdottir J, Helgadottir HT, Johannsdottir H, Sigfusson G, Thorgeirsson G, Sverrisson JT, Gretarsdottir S, Walters GB, Rafnar T, Thjodleifsson B, Bjornsson ES, Olafsson S, Thorarinsdottir H, Steingrimsdottir T, Gudmundsdottir TS, Theodors A, Jonasson JG, Sigurdsson A, Bjornsdottir G, Jonsson JJ, Thorarensen O, Ludvigsson P, Gudbjartsson H, Eyjolfsson GI, Sigurdardottir O, Olafsson I, Arnar DO, Magnusson OT, Kong A, Masson G, Thorsteinsdottir U, Helgason A, Sulem P, Stefansson K. Large-scale whole-genome sequencing of the Icelandic population. Nat Genet. 2015; 47 :435–444. doi: 10.1038/ng.3247. [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Haldane JBS. A mathematical theory of natural and artificial selection, Part V: selection and mutation. Math Proc Cambridge Philos Soc. 1927; 23:838–844. doi: 10.1017/S0305004100015644.
  • Haldane JBS. The effect of variation on fitness. Am Nat. 1937; 71:337–349.
  • Haldane JBS, Moshinsky P. Inbreeding in Mendelian populations with special reference to human cousin marriage. Ann Eugen. 1939; 9:321–340.
  • Hameed MA, Lingam R, Zammit S, Salvi G, Sullivan S, Lewis AJ. Trajectories of early childhood developmental skills and early adolescent psychotic experiences: findings from the ALSPAC UK birth cohort. Front Psychol. 2017; 8:2314. doi: 10.3389/fpsyg.2017.02314.
  • International HapMap Consortium. A haplotype map of the human genome. Nature. 2005; 437:1299–1320.
  • Hardy GH. Mendelian proportions in a mixed population. Science. 1908; 28:49–50.
  • Hillis DM, Moritz C, Porter CA, Baker RJ. Evidence for biased gene conversion in concerted evolution of ribosomal DNA. Science. 1991; 251:308–310. doi: 10.1126/science.1987647.
  • Hinds DA, Stuve LL, Nilsen GB, Halperin E, Eskin E, Ballinger DG, Frazer KA, Cox DR. Whole-genome patterns of common DNA variation in three human populations. Science. 2005; 307:1072–1079. doi: 10.1126/science.1105436.
  • Huelsenbeck JP, Hillis DM. Success of phylogenetic methods in the four-taxon case. Syst Biol. 1993; 42:247–264. doi: 10.1093/sysbio/42.3.247.
  • Hurst LD. Evolutionary genomics and the reach of selection. J Biol. 2009; 8:12. doi: 10.1186/jbiol113.
  • Imai A, Nakaya A, Fahiminiya S, Tetreault M, Majewski J, Sakata Y, Takashima S, Lathrop M, Ott J. Beyond homozygosity mapping: family-control analysis based on Hamming distance for prioritizing variants in exome sequencing. Sci Rep. 2015; 5:12028. doi: 10.1038/srep12028.
  • Imai A, Kohda M, Nakaya A, Sakata Y, Murayama K, Ohtake A, Lathrop M, Okazaki Y, Ott J. HDR: a statistical two-step approach successfully identifies disease genes in autosomal recessive families. J Hum Genet. 2016; 61:959–963. doi: 10.1038/jhg.2016.85.
  • Inoue I, Nakajima T, Williams CS, Quackenbush J, Puryear R, Powers M, Cheng T, Ludwig EH, Sharma AM, Hata A, Jeunemaitre X, Lalouel JM. A nucleotide substitution in the promoter of human angiotensinogen is associated with essential hypertension and affects basal transcription in vitro. J Clin Invest. 1997; 99:1786–1797. doi: 10.1172/JCI119343.
  • Jaganathan K, Kyriazopoulou Panagiotopoulou S, McRae JF, Darbandi SF, Knowles D, Li YI, Kosmicki JA, Arbelaez J, Cui W, Schwartz GB, Chow ED, Kanterakis E, Gao H, Kia A, Batzoglou S, Sanders SJ, Farh KK. Predicting splicing from primary sequence with deep learning. Cell. 2019; 176:535–548.e24. doi: 10.1016/j.cell.2018.12.015.
  • Jeffreys AJ, Wilson V, Thein SL. Hypervariable 'minisatellite' regions in human DNA. Nature. 1985; 314:67–73. doi: 10.1038/314067a0.
  • Kimura M. Optimum mutation rate and degree of dominance as determined by the principle of minimum genetic load. J Genet. 1960; 57:21–34.
  • Kimura M. Diffusion models in population genetics. J Appl Probab. 1964; 1:177–232.
  • Kimura M. Evolutionary rate at the molecular level. Nature. 1968; 217:624–626.
  • Kimura M. The number of heterozygous nucleotide sites maintained in a finite population due to steady flux of mutations. Genetics. 1969; 61:893–903.
  • Kimura M. Preponderance of synonymous changes as evidence for the neutral theory of molecular evolution. Nature. 1977; 267:275–276.
  • Kimura M, Crow JF. The number of alleles that can be maintained in a finite population. Genetics. 1964; 49:725–738.
  • Kimura M, Ohta T. The age of a neutral mutant persisting in a finite population. Genetics. 1973; 75:199–212.
  • King JL, Jukes TH. Non-Darwinian evolution. Science. 1969; 164:788–798.
  • Kingman JF. Origins of the coalescent. 1974–1982. Genetics. 2000; 156:1461–1463.
  • Kondrashov AS. Contamination of the genome by very slightly deleterious mutations: why have we not died 100 times over? J Theor Biol. 1995; 175:583–594. doi: 10.1006/jtbi.1995.0167.
  • Kosiol C, Vinar T, da Fonseca RR, Hubisz MJ, Bustamante CD, Nielsen R, Siepel A. Patterns of positive selection in six mammalian genomes. PLoS Genet. 2008; 4:e1000144. doi: 10.1371/journal.pgen.1000144.
  • Kumar S, Tamura K, Nei M. MEGA: molecular evolutionary genetics analysis software for microcomputers. Comput Appl Biosci. 1994; 10:189–191.
  • Lander ES, Botstein D. Homozygosity mapping: a way to map human recessive traits with the DNA of inbred children. Science. 1987; 236:1567–1570.
  • Lewontin RC. The apportionment of human diversity. In: Dobzhansky T, Hecht MK, Steere WC, editors. Evolutionary biology. New York: Appleton-Century-Crofts; 1972. pp. 381–398.
  • Lewontin RC, Hubby JL. A molecular approach to the study of genic heterozygosity in natural populations. II. Amount of variation and degree of heterozygosity in natural populations of Drosophila pseudoobscura. Genetics. 1966; 54:595–609.
  • Lipson M, Ribot I, Mallick S, Rohland N, Olalde I, Adamski N, Broomandkhoshbacht N, Lawson AM, Lopez S, Oppenheimer J, Stewardson K, Asombang RN, Bocherens H, Bradman N, Culleton BJ, Cornelissen E, Crevecoeur I, de Maret P, Fomine FLM, Lavachery P, Mindzie CM, Orban R, Sawchuk E, Semal P, Thomas MG, Van Neer W, Veeramah KR, Kennett DJ, Patterson N, Hellenthal G, Lalueza-Fox C, MacEachern S, Prendergast ME, Reich D. Ancient West African foragers in the context of African population history. Nature. 2020; 577:665–670. doi: 10.1038/s41586-020-1929-1.
  • Liu W, Morito D, Takashima S, Mineharu Y, Kobayashi H, Hitomi T, Hashikata H, Matsuura N, Yamazaki S, Toyoda A, Kikuta K, Takagi Y, Harada KH, Fujiyama A, Herzig R, Krischek B, Zou L, Kim JE, Kitakaze M, Miyamoto S, Nagata K, Hashimoto N, Koizumi A. Identification of RNF213 as a susceptibility gene for moyamoya disease and its possible role in vascular development. PLoS ONE. 2011; 6:e22542. doi: 10.1371/journal.pone.0022542.
  • Lynch M, Conery JS. The evolutionary fate and consequences of duplicate genes. Science. 2000; 290:1151–1155.
  • Manolio TA, Collins FS, Cox NJ, Goldstein DB, Hindorff LA, Hunter DJ, McCarthy MI, Ramos EM, Cardon LR, Chakravarti A, Cho JH, Guttmacher AE, Kong A, Kruglyak L, Mardis E, Rotimi CN, Slatkin M, Valle D, Whittemore AS, Boehnke M, Clark AG, Eichler EE, Gibson G, Haines JL, Mackay TF, McCarroll SA, Visscher PM. Finding the missing heritability of complex diseases. Nature. 2009; 461:747–753.
  • Mendel GJ. Versuche über Pflanzen-Hybriden [Experiments on plant hybrids]. Verh Naturforsch Ver Brünn. 1866; 4:3–47.
  • Miyata T, Hayashida H. Extraordinarily high evolutionary rate of pseudogenes: evidence for the presence of selective pressure against changes between synonymous codons. Proc Natl Acad Sci USA. 1981; 78:5739–5743. doi: 10.1073/pnas.78.9.5739.
  • Nagasaki M, Yasuda J, Katsuoka F, Nariai N, Kojima K, Kawai Y, Yamaguchi-Kabata Y, Yokozawa J, Danjoh I, Saito S, Sato Y, Mimori T, Tsuda K, Saito R, Pan X, Nishikawa S, Ito S, Kuroki Y, Tanabe O, Fuse N, Kuriyama S, Kiyomoto H, Hozawa A, Minegishi N, Douglas Engel J, Kinoshita K, Kure S, Yaegashi N, ToMMo Japanese Reference Panel Project, Yamamoto M. Rare variant discovery by deep whole-genome sequencing of 1,070 Japanese individuals. Nat Commun. 2015; 6:8018. doi: 10.1038/ncomms9018.
  • Nei M. Selectionism and neutralism in molecular evolution. Mol Biol Evol. 2005; 22:2318–2342. doi: 10.1093/molbev/msi242.
  • Nei M, Roychoudhury AK. Gene differences between Caucasian, Negro, and Japanese populations. Science. 1972; 177:434–436.
  • Nei M, Roychoudhury AK. Genic variation within and between the three major races of man, Caucasoids, Negroids, and Mongoloids. Am J Hum Genet. 1974; 26:421–443.
  • Nelson MR, Wegmann D, Ehm MG, Kessner D, St Jean P, Verzilli C, Shen J, Tang Z, Bacanu SA, Fraser D, Warren L, Aponte J, Zawistowski M, Liu X, Zhang H, Zhang Y, Li J, Li Y, Li L, Woollard P, Topp S, Hall MD, Nangle K, Wang J, Abecasis G, Cardon LR, Zollner S, Whittaker JC, Chissoe SL, Novembre J, Mooser V. An abundance of rare functional variants in 202 drug target genes sequenced in 14,002 people. Science. 2012; 337:100–104. doi: 10.1126/science.1217876.
  • Neyman J. Molecular studies of evolution: a source of novel statistical problems. In: Gupta SS, Yackel J, editors. Statistical decision theory and related topics. New York: Academic Press; 1971. pp. 1–27.
  • Nielsen R, Hubisz MJ, Hellmann I, Torgerson D, Andres AM, Albrechtsen A, Gutenkunst R, Adams MD, Cargill M, Boyko A, Indap A, Bustamante CD, Clark AG. Darwinian and demographic forces affecting human protein coding genes. Genome Res. 2009; 19:838–849. doi: 10.1101/gr.088336.108.
  • Nielsen R, Akey JM, Jakobsson M, Pritchard JK, Tishkoff S, Willerslev E. Tracing the peopling of the world through genomics. Nature. 2017; 541:302–310. doi: 10.1038/nature21347.
  • Ohno S. Evolution by gene duplication. New York: Springer; 1970.
  • Ohno S. So much "junk" DNA in our genome. Brookhaven Symp Biol. 1972; 23:366–370.
  • Ohta T. Slightly deleterious mutant substitutions in evolution. Nature. 1973; 246:96–98.
  • Ohta T. The nearly neutral theory of molecular evolution. Annu Rev Ecol Syst. 1992; 23:263–286.
  • Ohta T. Near-neutrality in evolution of genes and gene regulation. Proc Natl Acad Sci USA. 2002; 99:16134–16137. doi: 10.1073/pnas.252626899.
  • Pattison JE. An attempt to integrate previous localized estimates of human inbreeding for the whole of Britain. Hum Biol. 2016; 88:264–274.
  • Provine WB. The origins of theoretical population genetics. Chicago: University of Chicago Press; 1971.
  • Saitou N, Nei M. The neighbor-joining method: a new method for reconstructing phylogenetic trees. Mol Biol Evol. 1987; 4:406–425. doi: 10.1093/oxfordjournals.molbev.a040454.
  • Sella G, Barton NH. Thinking about the evolution of complex traits in the era of genome-wide association studies. Annu Rev Genomics Hum Genet. 2019; 20:461–493. doi: 10.1146/annurev-genom-083115-022316.
  • Sokal RR, Michener CD. A statistical method for evaluating systematic relationships. Univ Kansas Sci Bull. 1958; 38:1409–1438.
  • Sundaram L, Gao H, Padigepati SR, McRae JF, Li Y, Kosmicki JA, Fritzilas N, Hakenberg J, Dutta A, Shon J, Xu J, Batzoglou S, Li X, Farh KK. Predicting the clinical impact of human mutation with deep neural networks. Nat Genet. 2018; 50:1161–1170. doi: 10.1038/s41588-018-0167-z.
  • Tajima F. Evolutionary relationship of DNA sequences in finite populations. Genetics. 1983; 105:437–460.
  • Tajima F. Statistical method for testing the neutral mutation hypothesis by DNA polymorphism. Genetics. 1989; 123:585–595.
  • Takahata N, Nei M. Gene genealogy and variance of interpopulational nucleotide differences. Genetics. 1985; 110:325–344.
  • Tang X, Wu C, Li X, Song Y, Yao X, Wu X, Duan Y, Zhang H, Wang Y, Qian Z, Cui J, Lu J. On the origin and continuing evolution of SARS-CoV-2. Natl Sci Rev. 2020; 7:1012–1023. doi: 10.1093/nsr/nwaa036.
  • Tennessen JA, Bigham AW, O'Connor TD, Fu W, Kenny EE, Gravel S, McGee S, Do R, Liu X, Jun G, Kang HM, Jordan D, Leal SM, Gabriel S, Rieder MJ, Abecasis G, Altshuler D, Nickerson DA, Boerwinkle E, Sunyaev S, Bustamante CD, Bamshad MJ, Akey JM, Broad GO, Seattle GO, NHLBI Exome Sequencing Project. Evolution and functional impact of rare coding variation from deep sequencing of human exomes. Science. 2012; 337:64–69. doi: 10.1126/science.1219240.
  • Weinberg W. Über den Nachweis der Vererbung beim Menschen [On the demonstration of heredity in man]. Jahreshefte des Vereins für vaterländische Naturkunde in Württemberg. 1908; 64:369–382.
  • Williams GC, Nesse RM. The dawn of Darwinian medicine. Q Rev Biol. 1991; 66:1–22. doi: 10.1086/417048.
  • Wright S. Size of population and breeding structure in relation to evolution. Science. 1938; 87:430–431.
  • Yang J, Zeng J, Goddard ME, Wray NR, Visscher PM. Concepts, estimation and interpretation of SNP-based heritability. Nat Genet. 2017; 49:1304–1310. doi: 10.1038/ng.3941.
  • Zou J, Huss M, Abid A, Mohammadi P, Torkamani A, Telenti A. A primer on deep learning in genomics. Nat Genet. 2019; 51:12–18. doi: 10.1038/s41588-018-0295-5.
