NCBI Bookshelf. A service of the National Library of Medicine, National Institutes of Health.

National Research Council (US) Committee on Environmental Epidemiology; National Research Council (US) Commission on Life Sciences. Environmental Epidemiology: Volume 2: Use of the Gray Literature and Other Data in Environmental Epidemiology. Washington (DC): National Academies Press (US); 1997.

Environmental Epidemiology: Volume 2: Use of the Gray Literature and Other Data in Environmental Epidemiology.

2 Environmental-Epidemiology Studies: Their Design and Conduct

This chapter discusses the origins of epidemiologic study and summarizes common analytic techniques. After a brief discussion of study designs and the types of information they produce, this chapter notes several difficulties for studies of environmental epidemiology, including the problems of studying small numbers of persons or rare diseases. We recommend that research on study designs focus on the improvement of statistical power or probability of detecting an effect. Finally, we review principles for inferring causation in epidemiology.

  • Origins of Epidemiology

Although early epidemiologic studies often focused on infectious diseases and death, epidemiology today has a much broader application, as "the study of the distribution and determinants of health-related states and events in specified populations and the application of this study to the control of health problems" (Tyler and Last, 1991, p. 12). Traditionally, epidemiology has been linked with disease prevention, in that its results can indicate risk factors that can be modified in order to control or eliminate certain diseases.

As chapter 1 indicates, environmental epidemiology is a logical extension of the field, expanding the range of concerns to include biologic, physical, or chemical factors that may be related to patterns of health and disease in populations. In general, environmental epidemiology is an observational rather than an experimental science; scientific deductions are drawn from patterns of occurrence. Its principal aim is to identify risk factors that can be averted or reduced so as to prevent or reduce the risk of future disease and promote public health.

  • Types of Studies in Environmental Epidemiology

Environmental-epidemiologic studies can be classified broadly into 2 categories that are not mutually exclusive: descriptive and analytic. Typically, descriptive studies are most useful for generating hypotheses and analytic studies most useful for testing hypotheses, though each type of study can be used for both purposes. Whether a study is hypothesis-testing or hypothesis-generating depends more on the sequence of past studies and the present state of knowledge (i.e., whether a hypothesis currently under evaluation was suggested by a previous study) than on the study design. Recent innovations in descriptive studies sometimes permit refined assessments of dose-response relations and etiologic factors.

Descriptive Studies

Descriptive studies include case reports, surveillance systems, ecologic studies, and cluster studies (WHO, 1983).

Case Reports

A case report is a descriptive study of a single individual or small group in which the study of an association between an observed effect and a specific environmental exposure is based on detailed clinical evaluations and histories of the individual(s). These reports require few financial or personnel resources other than those of clinical medicine, and they may indicate whether additional study of a larger group of persons with similar health problems and exposures should be undertaken. However, the value of case reports is often limited because they lack a context of the disease in unexposed persons, variables such as time and dose of exposure are generally not known, and controls are absent. They are most likely to be useful when the disease is uncommon and when it is caused exclusively or almost exclusively by a single kind of exposure. In spite of these limitations, many known human environmental toxicants (e.g., methyl mercury, asbestos, tobacco smoke, and radon) first came to attention in case reports and series developed by astute clinicians, pathologists, and health workers. Public-health agencies must often investigate clusters of cases that are reported to them by private physicians and others. While case reports may not lead to identification of new causes of disease, they are more likely to point to specific hypotheses and to biologically meaningful associations if either the disease or the exposure is relatively rare.

Surveillance Systems

These systems provide broad-scale information on specified populations for which epidemiologic analyses can be conducted. Surveillance systems are generally designed to attain complete or nearly complete coverage of every identified instance of certain defined conditions in a defined population. Thus, they can be used to estimate the background incidence and prevalence of adverse effects, and trends can be analyzed across time and between populations or geographic areas.

Surveillance systems can identify increases or decreases in the occurrence of deaths from specific diseases and thus suggest or test hypotheses related to environmental exposures. For example, observations of a decline in age-adjusted stomach-cancer rates over time in the United States have stimulated the development of hypotheses about changes in dietary habits in the population as a whole, as well as about changes in the use of food preservatives and refrigeration (Howson et al., 1986) that might explain these trends. Similarly, after postmenopausal estrogen use fell in the United States, rates of endometrial carcinoma declined in women over age 65, lending support to an inference drawn from case-control studies that postmenopausal estrogen use increased the risk of endometrial cancer (Austin and Roe, 1982). In another instance, surveillance data from the National Center for Health Statistics suggested that a fall in blood-lead levels in US children was linked to a drop in gasoline-lead levels (Annest et al., 1983).

As public-health agencies have expanded the scope of surveillance systems (see chapter 5), it has become feasible to study the relationships between disease patterns and variations in environmental factors. Surveillance systems are expected to become increasingly common because the quality of their data is rising, statistical methods are improving, and costs are declining. The Agency for Toxic Substances and Disease Registry (ATSDR) has devised several surveillance systems to monitor the health of persons believed to have incurred exposure to such substances as trichloroethylene and dioxin. No results are yet available from those systems. If these exposure registries are to produce valuable results, they will need to include sufficient numbers of persons over a long enough period for diseases of interest to manifest themselves in numbers sufficient to demonstrate either that a problem exists or that a problem large enough to cause serious concern is unlikely to exist.

Ecologic Studies

Ecologic studies explore the statistical connection between disease and estimated exposures in population groups rather than individuals. They combine data from vital records, hospital discharges, or disease registries with grouped data or estimates of exposure, such as factory emissions in a given geographic area, proximity to waste sites, or air or water pollution levels. Observed associations may provide support for further investigations. Ecologic studies suffer from serious weaknesses: they assign group exposure levels to all members of the group, fail to control for individual confounding factors, use necessarily crude estimates of exposure, and may not capture the relevant exposure at the time of disease induction. Although some population groups in ecologic studies may appear similar to "cohorts" (see below), they lack the individual data that permit their analysis in a cohort study. The "ecologic fallacy" refers to incorrectly drawing inferences about individuals from data on the groups to which they belong.

Several advances have facilitated an increase in the number of ecologic studies, including the development of surveillance systems, improved environmental-exposure databases (e.g., by ATSDR), and increased availability of sophisticated tools, such as geographic information systems. The value of ecologic studies may be strengthened as methods for estimating exposure are improved. Where valid proxies for gradations of exposure and relevant confounding variables can be devised, the ecologic fallacy may be reduced or overcome.

Ecologic investigations have provided important clues about causal associations even though these studies can be difficult to interpret. For example, fluoride was first found to prevent dental caries on the basis of observed correlations between geographic variations in natural levels of fluoride and rates of tooth decay (Dean et al., 1942). Similarly, rates of cardiovascular disease and cancer among immigrants have been correlated with those of their newly acquired compatriots, suggesting that changes in dietary and other factors are involved. Further refinements in the parameters of interest in ecologic studies might permit these studies to generate more-precise indications of associations between risk factors and disease (Greenland, 1990).

Studies of health problems in relation to fixed sources of environmental exposure have often relied on either labor-intensive techniques, such as personal interviews, or much more general classifications, such as ZIP code or town of residence. The latter approach has the obvious problems of errors in classification of actual residence and of including people in the exposed category who live far from the site and have little opportunity for exposure. This difficulty has been partially overcome by including better geographic-location information as part of state and federal lists of potential sources of environmental exposure (e.g., Superfund sites) and by public availability of more-complete coding of geographic information in US census data.

This additional information, along with the availability of improved mapping software, has greatly improved our ability to link health data, such as cancer incidence, with residence near a source of environmental exposure. With good geographic coding, disease cases and controls can be readily and quickly located in relation to an environmental source so that various measures of distance and direction can be studied. Data on the number and characteristics of people living in the area can also be obtained from census data.

Cluster Studies

A cluster study is a descriptive study of the population in a geographic area, occupational setting, or other small group in which the rate of a specific adverse effect is much higher than expected. Further, the group is often defined after the fact; that is, the "cluster" comes to attention, and the group is then defined so as to include it. Thus, clusters usually have the drawbacks of small samples. Cluster studies suffer from a major tautology: the data that inspire a hypothesized relation between a given exposure and a specific health outcome tend to be used to test this hypothesis, and then exposure and risk-factor data may be generated for persons defined to be in the study group and, usually, in some control group. For example, a reported cluster of cardiac birth defects that occur near a hazardous-waste site may be "tested" by comparing the measured rates of these defects in the same given geographic area with those from outside the same area. This is a highly unreliable approach methodologically and statistically, as the sample being studied has not been randomly selected. Nonetheless, the approach can be useful when the relative risk is extremely high, and it can be useful in developing hypotheses for study with other data. Many occupational hazards were first identified because clusters of disease were detected in specific workplaces, and other environmental diseases may also be ascertainable through cluster analysis.

Analytic Studies

In contrast to descriptive studies, analytic studies are based on more detailed data from individuals that can be used to control for confounding, and they are usually more costly and labor-intensive. Information from medical records, clinical or laboratory investigations, questionnaire results, or direct measures or estimates of exposures may allow analytic studies to explore hypotheses about suspected causes of disease or identify and measure risk factors that increase the chance that a given disease will occur. Analytic studies may also be a source of additional specific hypotheses, often leading to a sequence of studies, the more recent being designed to attempt to refute hypotheses raised by earlier studies.

The classic designs of analytic studies are case-control and cohort studies. In addition, 2 "hybrid" designs (nested case-control studies and case-cohort studies) can be based on identified cohorts.

Case-Control Studies

Case-control studies compare exposures of individuals who have a specific adverse effect or disease with exposures of controls who do not have the effect or disease; controls generally come from the same population from which the cases were derived. There is an extensive literature on the design of case-control studies, including selection of controls, correction for confounding, statistical methods for analysis, and presentation of measures of effect, usually the odds ratio (Schlesselman, 1982). These studies generally depend on the collection of retrospective data. They may suffer from recall bias, i.e., the tendency of people who have a disease to remember putative causes more readily than those without a given disease. However, it is often possible in a case-control study to collect histories of exposure to many different factors and control for confounding more efficiently than in a large cohort study, where the costs of collecting substantial exposure data from all the members of the cohort may be prohibitive. It is likely that case-control studies will be conducted with increasing frequency as new ways of characterizing exposure through the use of biologic markers are developed (see chapter 3), mirroring the development that has occurred in the last 2 decades in other areas of epidemiology.
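
As a minimal sketch of the odds ratio mentioned above, the following Python computes it from a 2x2 table of cases and controls, together with an approximate 95% confidence interval based on the standard error of the log odds ratio (the Woolf method). The function name and the counts are hypothetical and chosen only for illustration; they are not taken from any study cited in this chapter.

```python
import math

def odds_ratio(exposed_cases, unexposed_cases, exposed_controls, unexposed_controls, z=1.96):
    """Return (odds ratio, lower bound, upper bound) for a 2x2 table.

    The interval uses the large-sample standard error of log(OR);
    it is unreliable when any cell count is small and undefined when a cell is zero.
    """
    a, b, c, d = exposed_cases, unexposed_cases, exposed_controls, unexposed_controls
    estimate = (a * d) / (b * c)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(estimate) - z * se_log)
    upper = math.exp(math.log(estimate) + z * se_log)
    return estimate, lower, upper

# Hypothetical counts: 100 cases (40 exposed) and 100 controls (20 exposed).
or_est, ci_low, ci_high = odds_ratio(40, 60, 20, 80)
print(f"OR = {or_est:.2f}, 95% CI {ci_low:.2f} to {ci_high:.2f}")
```

This crude estimate ignores confounding and recall bias; in practice the odds ratio would usually be adjusted, for example by stratification or regression.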

Cohort Studies

These studies identify a group of persons, called a cohort, or sometimes several cohorts with differing levels of the exposures of interest. Sometimes, a control group has zero exposure. The cohort study evaluates associations between the exposure(s) and 1 or more health outcomes in the cohort(s). In a cohort study, individuals with differing exposures to a suspected risk factor are identified and then observed for the occurrence of certain health effects over some period, commonly years rather than weeks or months. The occurrence rates of the disease of interest are measured and related to estimated exposure levels.
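
The last step described above, relating occurrence rates to exposure levels, can be sketched as follows. This is only an illustration: the cohort labels, case counts, and person-years are hypothetical, and the rate ratios are crude (unadjusted for age or other confounders).

```python
def incidence_rate(cases, person_years, per=100_000):
    """Cases per `per` person-years of followup."""
    return cases / person_years * per

# Hypothetical cohorts grouped by estimated exposure level.
cohorts = {
    "none": {"cases": 12, "person_years": 60_000},
    "low": {"cases": 18, "person_years": 45_000},
    "high": {"cases": 30, "person_years": 30_000},
}

rates = {level: incidence_rate(**counts) for level, counts in cohorts.items()}
# Rate ratio of each group relative to the unexposed cohort.
rate_ratios = {level: rates[level] / rates["none"] for level in rates}

for level in cohorts:
    print(f"{level}: {rates[level]:.1f} per 100,000 person-years, rate ratio {rate_ratios[level]:.1f}")
```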

Cohort studies are of 2 kinds, retrospective and prospective, each with advantages and disadvantages. The retrospective (or historical) cohort study relates a complete set of outcomes already observed in a defined population to exposures that occurred earlier; data on both exposure and outcomes must be available at the time the study is undertaken. Prospective cohort studies, in which current exposure is directly measured and individuals are then followed, have a potential for more-accurate measurements but may suffer from loss of subjects to followup or bias in ascertainment of end points. Also, it may be necessary to wait many years for the time of followup to exceed the latent period between exposure and effect or for sufficient outcome events to occur.

Cohort studies can utilize questionnaires or laboratory tests to measure both exposure and outcome. One advantage over case-control studies is that multiple outcomes can be evaluated simultaneously in relation to the exposure data. However, the power to test associations will depend on the frequencies of the different outcomes considered, which in turn depend on the number of persons followed (see discussion below on power considerations).

One type of cohort study seeks to correlate time trends in outcome measures and environmental exposures. Such studies can be divided into 3 broad classes: those in which the outcome is estimated or measured relatively few times, those in which outcome variables are linked to episodic variations in exposure, and those in which long-term time trends in measures or estimates of health outcomes are linked with variations in monitored or estimated exposures. The first class is seen in some cardiovascular studies in which determinations of health status are made annually. Outcome measures are often continuous, as well as dichotomous. Other examples are those that correlate the development of chronic bronchitis with exposure to air pollution and prospective cohort studies that follow children's lead exposure and cognitive development from conception or birth. The second broad class examines changes in response to exposures that are episodic or of short duration. Studies that link peaks in air pollution to patterns of asthma fall into this category. The third broad class is similar to time-series studies often conducted in the social sciences. In such studies, both exposure and outcome measures are collected, perhaps on a daily basis, for periods of months or even years. Short-term fluctuations in those outcomes are correlated with short-term variations in environmental exposures. For instance, studies of changes in peak respiratory flow, respiratory symptoms, hospital admission, and daily mortality can be linked to changes in environmental air pollution. In most of these studies, the multifactorial nature of the outcome means that the explanatory power of each environmental variable is generally small. This has necessitated relatively large samples and careful modeling to avoid potential confounding.

Nested Case-control Studies

These studies are similar to ordinary cohort studies except that only a sample of controls (persons free of the disease) is studied in detail. They generally use all cases in a defined cohort that has been followed long enough for sufficient outcome events to have occurred but only a random sample of cohort members who were eligible to become cases but had not developed the disease or died at the time the corresponding cases were diagnosed. Controls are often matched to cases on 1 or more potential confounders (e.g., age, sex, and smoking status) that the investigator does not wish to study. An individual selected as a control may later become a case if the disease of interest develops. Nested case-control studies can be designed to have almost as much statistical power as the cohort study from which they are derived because of tighter experimental control, and they can be used to derive better inferences on exposure-disease associations. These studies may also be substantially more economical if the determination of exposure of the controls can be limited to a sample.

Case-cohort Studies

In this design, a random sample of the total cohort is drawn and taken to represent the exposure experience of the cohort. When the cohort has been followed long enough to accrue sufficient cases for analysis, the exposure experience of this subcohort is compared with that of the cases (who arise from the total cohort and might or might not be individuals in the subcohort who become cases). This design also provides economies in obtaining exposure data compared with a cohort study, but surveillance of the total cohort is still needed to identify the cases that occur.

  • Special Considerations

Many epidemiologic studies explore the relation between risk factors and health outcomes, often examining the relation between a single exposure and a single factor or disease. In environmental epidemiology, however, both exposures and outcomes are usually multiple. Many of the risk factors of interest derive from large-scale data sets on environmental pollution that involve continuous variables, as well as a variety of clinical health indicators. Much of cancer epidemiology has focused on studying specific anatomic sites of cancer and delineating important contributors to specific types of cancer, such as the link between occupational exposure to benzene and leukemia or that between asbestos and mesothelioma. Similarly, much of cardiovascular epidemiology has involved prospective cohort studies that concentrate on identifying a few specific risk factors.

Cross-Sectional Designs

Many environmental-epidemiology studies are cross-sectional. In such designs, the relations between contemporaneous assessments of outcome and exposure are studied; this can give rise to difficulties in determining the temporal aspects of an association. Often the exposure variable is measured continuously but with substantial error. The outcome is generally multifactorial, requiring a large number of covariates, and can include a wide range of health effects for which standard nomenclature, coding, and test systems do not exist. Examples of such outcomes include neurologic outcomes used in studies of lead toxicity, outcomes of some pulmonary-function tests, and diaries of activity level. Environmental epidemiology often relies extensively on a complex of study designs, such as cross-sectional designs that meld both analytic and descriptive studies, and often considers multiple health outcomes as well as multiple exposure variables.

Molecular-Epidemiology Studies

Recent advances in molecular biology provide new ways to identify and measure markers of exposure or outcome, such as DNA adducts or oncogenes. Such data can be used in any of the epidemiologic methods, so such studies have been designated "molecular epidemiology" by molecular biologists. The committee addresses the utility of biologic markers of exposure further in chapter 3 and biologic markers of outcome in chapter 4. However, the committee notes that the application of molecular biology to humans as distinct from experimental animals does not in itself justify the term "molecular epidemiology." For a study to be classed as molecular epidemiology, it is essential that valid epidemiologic techniques and study designs be used, including the selection of study subjects from a defined population. This field can develop only if epidemiologists and molecular biologists collaborate in the design and conduct of such studies. In the absence of adequate implementation of both aspects, the term molecular epidemiology should not be used.

Considerations of the Power of Study Designs

Before any study is undertaken, sound epidemiologic practice requires careful consideration of statistical power, that is, the probability that a given research study will be able to detect a true positive effect if it exists. A study's power depends on many factors, including the increases in risk of exposed persons for the outcome under study, the size of the population to be surveyed, and, for cohort studies, the duration of followup. The higher the expected relative risk (RR), the smaller the population that needs to be surveyed. Conversely, the larger the population studied, the smaller the RR that can be detected. Most environmental pollution includes relatively low levels of exposure to complexes of poorly defined materials. Thus, an environmental pollutant is likely to be associated with relatively small risks, though it could affect large numbers of people.

At any given level of statistical significance, there is a relation among study power, sample size, prevalence of exposure, and expected rate of a given outcome. In general, studies of larger numbers of persons over longer periods are more likely to yield positive results than those involving smaller populations for shorter periods. However, even large studies with long followup will result in uncertain findings if exposure is poorly measured or misclassified (see chapter 3). The sample size needed to achieve a given study power is also related to whether exposure is measured as a dichotomous or continuous variable, to the variability in distribution of the exposure, and to the effects of confounders and errors in the measure of exposure. In general, larger samples are needed when exposure measures are not continuous, when the effects of confounders and errors of measurement cannot be taken into account, and when the adverse outcome is a rare event (Greenland, 1983; McKeown-Eyssen and Thomas, 1985; Lubin et al., 1988; Lubin and Gail, 1990). Finally, all statistical-power calculations depend on the critical assumption that bias in both exposure and outcome can be ignored; this assumption may be rarely true in practice.

Statistical-significance testing is used to assess the likelihood that positive results of any given study represent a "real" association. However, no matter which statistical tests are employed, common research designs all produce studies with fixed, known chances of making a type I error, that is, of finding a positive result when one does not really exist. This probability is called alpha and is generally determined by a statistician at the time the protocol is drafted. It is commonly set at 5%.

Of equal importance for environmental epidemiology is a consideration of the probability that a failure to find an effect is a false negative, or type II error. This often occurs when small numbers of persons are studied and when relatively low risks are involved. Statistical tests cannot determine whether or not an error has been made but can indicate the probability that an error could occur, called beta, if the effect is of some hypothetical size specified by the investigator. The power to detect an effect of that size, defined as 1-beta, depends on the alpha level of significance testing and the unknown relative risk. Tables have been devised to help determine the number of observations required to have specified power to detect an effect of specified size if an association exists (Fleiss, 1981). For any specific size of effect, the power of a study increases as the study size increases.
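
To make the relation among alpha, relative risk, study size, and power concrete, here is a rough sketch using a common normal approximation for comparing two proportions in equal-size exposed and unexposed groups. It is a textbook approximation under stated assumptions (two-sided test, no bias, no confounding, no exposure misclassification), not necessarily the method behind the tables cited above. The function name, baseline risk, and group sizes are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def power_two_proportions(baseline_risk, relative_risk, n_per_group, alpha=0.05):
    """Approximate power (1 - beta) to detect a given relative risk.

    Assumes equal-size exposed and unexposed groups and a two-sided
    test at level alpha; the negligible opposite tail is ignored.
    """
    p0 = baseline_risk
    p1 = baseline_risk * relative_risk
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    p_bar = (p0 + p1) / 2
    se_null = sqrt(2 * p_bar * (1 - p_bar) / n_per_group)
    se_alt = sqrt(p0 * (1 - p0) / n_per_group + p1 * (1 - p1) / n_per_group)
    z = (abs(p1 - p0) - z_alpha * se_null) / se_alt
    return NormalDist().cdf(z)

# Hypothetical outcome with a 1% baseline risk.
power_small_rr_small_n = power_two_proportions(0.01, 1.2, 1_000)
power_small_rr_large_n = power_two_proportions(0.01, 1.2, 100_000)
power_large_rr_small_n = power_two_proportions(0.01, 4.0, 1_000)

print(f"RR 1.2, n = 1,000 per group:   power = {power_small_rr_small_n:.2f}")
print(f"RR 1.2, n = 100,000 per group: power = {power_small_rr_large_n:.2f}")
print(f"RR 4.0, n = 1,000 per group:   power = {power_large_rr_small_n:.2f}")
```

Under these assumptions, a small relative risk needs a far larger study than a large relative risk to reach the same power, which is the pattern the text describes.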

Many episodes of environmental contamination involve low relative risks and small numbers of people, so environmental-epidemiology studies often lack sufficient power to detect important effects. This makes the development of innovative techniques to combine results an important priority for the field.

P values are measures of random uncertainty alone and are dominated by sample size and other power considerations. In observational epidemiology, the primary sources of uncertainty about whether an effect is present are confounding, selection bias, and similar problems. In contrast, measures of the size of a possible effect, such as regression coefficients or odds ratios, may be less sensitive to sample size. If associations are due primarily to confounding, investigators may report considerable variation in measures of effect across different studies and populations. Hence, in modern epidemiology these measures of effect, and confidence intervals for them, are given greater attention than P values. Consistency in these measures across studies with differences in exposures to potential confounders can provide valuable clues about whether observed associations indicate cause-effect relationships.

A very severe problem in environmental epidemiology is known as "multiple comparisons." If the probability of an error with 1 comparison (P value or confidence bounds) is kept at the traditional value of 5%, a research study that includes more than 1 such comparison has a higher chance of making at least 1 error. While statistical methods exist to remove this effect, they have an unintended and often devastating effect on statistical power. This matter is dealt with in many statistical texts, so we do not expand on it here.
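
The arithmetic behind the multiple-comparisons problem can be sketched briefly. The example below assumes the comparisons are statistically independent, which is a simplification, and uses the Bonferroni correction as one well-known adjustment; the chapter does not name a specific method, so that choice is an assumption made here for illustration.

```python
def familywise_error_rate(alpha, n_comparisons):
    """Chance of at least 1 false positive among independent comparisons."""
    return 1 - (1 - alpha) ** n_comparisons

def bonferroni_alpha(alpha, n_comparisons):
    """Per-comparison significance level that keeps the overall rate near alpha."""
    return alpha / n_comparisons

for k in (1, 5, 10, 20):
    fwer = familywise_error_rate(0.05, k)
    adjusted = bonferroni_alpha(0.05, k)
    print(f"{k:2d} comparisons: overall error chance {fwer:.2f}, adjusted alpha {adjusted:.4f}")
```

The stricter per-comparison alpha is what costs power: each individual test must now clear a much higher bar with the same number of subjects.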

  • Causal Inference in Epidemiology

The previous volume elaborated on criteria relevant to drawing inferences from epidemiologic studies (see NRC, 1991, for general guidance on these studies). They are summarized here as follows.

Strength of Association

The strength of association measures the size of the risk that is correlated with a causal agent (exposure). It is typically expressed as the risk of an exposed person's incurring a disease compared with that of a non-exposed person. The most-common comparison measures are the standardized mortality ratio (SMR), the odds ratio (OR), and the relative risk (RR). The larger the ratio (SMR, OR, or RR), the stronger the inferred link between exposure and disease for exposed individuals. For example, an RR of 1.4 for lung cancer after exposure to environmental tobacco smoke indicates that exposed persons are 40% more likely to develop lung cancer than are non-exposed persons. The strength of association must often be considered in relation to the population at risk and intensity of the exposure. For example, an RR of 4 that affects a small population may have a much smaller public-health impact than does an RR of 1.2 that affects much larger numbers. Epidemiologists are sometimes concerned with attributable risk, which is a measure of the rate of disease above the background rate that can be attributed to exposure. This is more difficult to detect, study, and estimate in environmental epidemiology because it is difficult to determine a baseline rate. Problems with using strength of association as the principal criterion for causality include the fact that misclassification and other biases can profoundly change the strength of association.
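
The contrast between a large RR in a small population and a small RR in a large one can be illustrated with simple arithmetic. In this sketch the baseline risk and the population sizes are hypothetical, both exposures are assumed to act on the same baseline risk, and the RR is assumed to reflect a causal effect.

```python
def attributable_risk(risk_exposed, risk_unexposed):
    """Risk above the background risk among the exposed (risk difference)."""
    return risk_exposed - risk_unexposed

def excess_cases(baseline_risk, relative_risk, n_exposed):
    """Expected cases among the exposed beyond those expected at baseline."""
    risk_exposed = baseline_risk * relative_risk
    return attributable_risk(risk_exposed, baseline_risk) * n_exposed

baseline = 0.001  # hypothetical background risk of 1 per 1,000

# RR of 4 affecting 1,000 people versus RR of 1.2 affecting 1,000,000 people.
excess_small_population = excess_cases(baseline, 4.0, 1_000)
excess_large_population = excess_cases(baseline, 1.2, 1_000_000)

print(f"RR 4.0, 1,000 exposed:     about {excess_small_population:.0f} excess cases")
print(f"RR 1.2, 1,000,000 exposed: about {excess_large_population:.0f} excess cases")
```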

Specificity of Association

Specificity suggests that the suspected causal agent induces a single disease. While this may apply to a few associations between exposure and disease (e.g., vinyl chloride and angiosarcoma of the liver), single diseases (e.g., lung cancer) can have many causes, and single agents can cause many effects (e.g., lead at high-enough levels can cause increased blood pressure, neurologic symptoms, reproductive effects, and kidney damage). Specificity can be diminished by inappropriate or inaccurate grouping of diseases in a way that obscures a real effect (e.g., grouping some rare forms of cancer with other cancers).

Consistency of Association

The observed relation between exposure and disease is seen rather regularly in independently conducted studies; the value of consistency is enhanced if the studies are of different types and in different populations. For example, a study of the association between lung cancer and passive smoking may produce an RR of only 2.0 or less, but this elevated risk has now been reported in over 30 studies carried out in 6 countries (NRC, 1986). Because of the variety in study protocols and populations, claims of bias in all the studies have little credibility. Studies not having statistically significant results can be combined with similar studies, as long as they all use sound methods. Studies that meet the standards for good epidemiologic practice can be grouped for meta-analysis, which allows for statistical pooling of different studies.
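
One simple form of the statistical pooling mentioned above is fixed-effect, inverse-variance weighting of log relative risks. The sketch below assumes that all studies estimate a single common effect and that each reports a 95% confidence interval; both are assumptions of this illustration rather than statements from the chapter, and the three study results are invented.

```python
import math

def pool_fixed_effect(studies, z=1.96):
    """Pool (rr, ci_low, ci_high) tuples by inverse-variance weighting on the log scale."""
    weighted_sum = 0.0
    total_weight = 0.0
    for rr, ci_low, ci_high in studies:
        # Recover the standard error of log(RR) from the reported interval.
        se = (math.log(ci_high) - math.log(ci_low)) / (2 * z)
        weight = 1 / se ** 2
        weighted_sum += weight * math.log(rr)
        total_weight += weight
    pooled_log = weighted_sum / total_weight
    pooled_se = math.sqrt(1 / total_weight)
    return (
        math.exp(pooled_log),
        math.exp(pooled_log - z * pooled_se),
        math.exp(pooled_log + z * pooled_se),
    )

# Hypothetical studies, none statistically significant on its own.
studies = [(1.3, 0.8, 2.1), (1.5, 0.9, 2.5), (1.2, 0.7, 2.0)]
pooled_rr, pooled_low, pooled_high = pool_fixed_effect(studies)
print(f"Pooled RR = {pooled_rr:.2f}, 95% CI {pooled_low:.2f} to {pooled_high:.2f}")
```

The pooled interval is narrower than any single study's interval, which is the gain in precision that motivates combining studies. Pooling does not remove bias shared by the studies, and a random-effects model may be more appropriate when the studies differ substantially.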

Temporality

The exposure should precede the development of symptoms or diseases of interest by an appropriate interval. The time between exposure and disease should be consistent with biologic understanding of the time from exposure to the observed disease. For example, tobacco typically causes lung cancer 25 years or more after the beginning of regular exposure, though a few cases have been observed within 10 years of first exposure (Doll and Peto, 1978).

Biologic Gradient of Relation Between Estimated Exposure and Disease

In general, a greater exposure should cause a stronger (though not always proportional) effect. For example, smoking more cigarettes increases the risk of lung cancer. Typically, dose equals the concentration integrated over time. In some cases, however, dosing patterns can be more important than the overall dose in the relation between dose and response. Also, the timing of the exposure can be critical in the dose-response relation.
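The statement that dose is the concentration integrated over time can be sketched numerically with the trapezoidal rule. The exposure histories below are hypothetical and the units are arbitrary; the example shows how two very different dosing patterns can produce the same cumulative dose.

```python
def cumulative_dose(times, concentrations):
    """Trapezoidal approximation to the integral of concentration over time."""
    if len(times) != len(concentrations):
        raise ValueError("times and concentrations must have the same length")
    total = 0.0
    for i in range(1, len(times)):
        interval = times[i] - times[i - 1]
        mean_concentration = (concentrations[i] + concentrations[i - 1]) / 2.0
        total += interval * mean_concentration
    return total


# Hypothetical exposure histories measured at yearly intervals.
years = [0, 1, 2, 3, 4]
steady = [2.0, 2.0, 2.0, 2.0, 2.0]   # constant low-level exposure
peaked = [0.0, 0.0, 8.0, 0.0, 0.0]   # one short, intense episode

print(cumulative_dose(years, steady))
print(cumulative_dose(years, peaked))
```

Both histories integrate to 8.0 concentration-years, although the peak concentration differs fourfold. A cumulative-dose metric alone cannot distinguish them, which is one reason dosing pattern and timing may need to be examined separately.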

Effects of Removal of a Suspected Cause

If a causal relation exists, removing the causal agent should reduce or eliminate the effect; if the effect is irreversible in individuals already exposed, this reduction may not be apparent until the exposed generation is largely removed from the study population by death or in some other way (e.g., limitation to persons under age 65). If different causes are related to a single disease, then the principle applies only to the specific causal factor removed.

Biologic Plausibility

The relation between the suspected causal agent and suspected effect should make sense, given the current understanding of human biology. Animal studies or other experimental evidence can strengthen or weaken the biologic plausibility of the relation by demonstrating mechanisms of disease or determining whether the association between exposure and disease holds in experimental situations. However, lack of a known mechanism does not invalidate a causal association. For many diseases, the underlying mechanisms are unknown.

  • Annest, J.L., J.L. Pirkle, D. Makuc, J.W. Neese, D.D. Bayse, and M.G. Kovar. 1983. Chronological trend in blood lead levels between 1976 and 1980. N. Engl. J. Med. 308:1373-1377. [PubMed: 6188954]
  • Austin, D.F., and K.M. Roe. 1982. The decreasing incidence of endometrial cancer: public health implications. Am. J. Pub. Health 72:65-68. [PMC free article: PMC1649756] [PubMed: 7053625]
  • Dean, H.T., F.A. Arnold Jr., and E. Elvove. 1942. Domestic water and dental caries. V. Additional studies of the relation of fluoride domestic waters to dental caries experiences in 4425 white children aged 12-14 years, of 13 cities in 4 states. Public Health Rep. 57:1155-1179.
  • Doll, R., and R. Peto. 1978. Cigarette smoking and bronchial carcinoma: dose and time relationships among regular smokers and lifelong non-smokers. J. Epidemiol. Community Health 32:303-313. [PMC free article: PMC1060963] [PubMed: 744822]
  • Fleiss, J.L. 1981. Statistical Methods for Rates and Proportions. New York: Wiley. 321 pp.
  • Greenland, S. 1983. Tests for interaction in epidemiologic studies: a review and a study of power. Stat. Med. 2:243-251. [PubMed: 6359318]
  • Greenland, S. 1990. Divergent Biases in Ecologic and Individual-Level Studies. Paper presented at the Second Annual Meeting of the International Society for Environmental Epidemiology, August 12-15, 1990, Berkeley, CA.
  • Howson, C.P., T. Hiyama, and E.L. Wynder. 1986. The decline in gastric cancer: epidemiology of an unplanned triumph. Epidemiol. Rev. 8:1-27. [PubMed: 3533579]
  • Lubin, J.H., and M.H. Gail. 1990. On power and sample size for studying features of the relative odds of disease. Am. J. Epidemiol. 131:552-566. [PubMed: 2301364]
  • Lubin, J.H., M.H. Gail, and A.G. Ershow. 1988. Sample size and power for case-control studies when exposures are continuous. Stat. Med. 7:363-376. [PubMed: 3358016]
  • McKeown-Eyssen, G.E., and D.C. Thomas. 1985. Sample size determination in case-control studies: The influence of the distribution of exposure. J. Chronic Dis. 38:559-568. [PubMed: 4008598]
  • NRC (National Research Council). 1986. Environmental Tobacco Smoke: Measuring Exposures and Assessing Health Effects. Washington, DC: National Academy Press. 337 pp. [PubMed: 25032469]
  • NRC (National Research Council). 1991. Environmental Epidemiology. Public Health and Hazardous Wastes. Washington, DC: National Academy Press. 282 pp.
  • Schlesselman, J.J. 1982. Case-Control Studies: Design, Conduct, Analysis. New York: Oxford University Press. 354 pp.
  • Tyler, C.W., Jr., and J.M. Last. 1991. Epidemiology. Pp. 11-39 in J.M. Last and R.B. Wallace, eds. Maxcy-Rosenau-Last Public Health and Preventive Medicine. 13th ed. Norwalk, CT: Appleton & Lange.
  • WHO (World Health Organization). 1983. Guidelines on Studies in Environmental Epidemiology. Environmental Health Criteria 27. Geneva: World Health Organization.
  • Cite this Page National Research Council (US) Committee on Environmental Epidemiology; National Research Council (US) Commission on Life Sciences. Environmental Epidemiology: Volume 2: Use of the Gray Literature and Other Data in Environmental Epidemiology. Washington (DC): National Academies Press (US); 1997. 2, Environmental-Epidemiology Studies: Their Design and Conduct.

Environmental epidemiologists seek to understand the health effects of biological, chemical, and physical stressors to improve the health and well-being of human populations. EHE researchers are on the forefront of developing and applying advanced epidemiologic and causal inference approaches to address pressing environmental health challenges. Our faculty are nationally and internationally recognized leaders in their fields with expertise in a variety of research areas, including the effects of chemical exposures, climate, air pollutants, and the built environment on health across the lifespan.

Research Highlights

Methodology for children’s environmental health research.

The NIH’s Environmental influences on Child Health Outcomes (ECHO) Program aims to understand the effects of a broad range of early environmental influences on child health and development. As part of ECHO’s Data Analysis Center, EHE faculty serve as experts on epidemiologic methods for environmental health research, including application of exposure biomarkers, geospatial methods for exposure assessment, and estimation of the effects of exposure mixtures. Our faculty are leading and collaborating on numerous ECHO projects that leverage the rich chemical exposure data available within ECHO.

Using Electronic Health Records (EHRs) for Population Health Research

Geisinger Center for Health Research and the Johns Hopkins Bloomberg School of Public Health joined to form the Environmental Health Institute (EHI) in 2007. The mission of the EHI is to understand how land use, the built environment, energy production and use, food systems, and water systems may impact human health in central and northeast Pennsylvania. EHRs are well suited to environmental epidemiologic research because individuals seeking medical care are represented across diverse built, physical, and social environments. Geisinger EHRs have been used to study the effects of unconventional natural gas development on asthma, birth outcomes, chronic rhinosinusitis, depressive symptoms, fatigue, heart failure, and migraine; risk of methicillin-resistant Staphylococcus aureus infection from high-density livestock operations; and effects of built and natural environments on type 2 diabetes risk in adults.

Associated Faculty

The goal of Dr. Agnew's research is to better understand the relationship between workplace exposures, worker characteristics, and musculoskeletal disorders so that these debilitating and expensive conditions can be prevented.

Dr. Barnett's research interests include best practice models to enhance all-hazards public health emergency readiness and response.

Joe's laboratory has been studying transporters and their interaction with environmental toxins.

As an environmental and pediatric/perinatal epidemiologist, Jessie's goal is to conduct innovative and high impact research to inform environmental policies targeted at improving children’s health. 

As a molecular epidemiologist and an environmental microbiologist, Meghan studies the interface of bacteria and hosts to reduce microbe-mediated disease in humans and animals.

Dr. Hamidi has expertise in geospatial data, built environment, housing and transportation and their connections to public health.

Chris' research focuses on environmentally-mediated impacts on health and well-being, specifically community land use, waste disposal, and food production practices, and integrates the academic disciplines of environmental microbiology, molecular biology, immunology, epidemiology, and community-based participatory research (CBPR).

Kirsten's research goals involve the use of direct-reading instrumentation to improve spatiotemporal exposure assessment. Direct-reading (i.e. “real-time”) monitors can rapidly assess exposures to various hazards.

Jordan is an environmental epidemiologist working on several projects related to early life chemical exposures and childhood health and developmental outcomes.

Keeve's research aims to generate the scientific evidence needed to support decisions that mitigate human exposures to chemical and microbial hazards associated with food production. 

Lesliam's research focuses on characterizing environmental exposures to endocrine disrupting agents and examining their potential health effects on highly vulnerable, low-income and minority populations underrepresented and understudied in public health research, including occupational populations, pregnant women and women of reproductive age, and children.

Ram has conducted pioneering studies in occupational hygiene decision-making that synthesizes mathematical exposure models, monitoring data, and probabilistic expert judgment within a Bayesian framework. 

Kellogg's research laboratory focuses on environmental microbiology and engineering with an emphasis on the fate and transport of chemicals, emerging contaminants and pathogenic microorganisms in water, food, and the environment.

The focus of Fenna's research group is to understand the effects of environmental exposures on the development and function of our immune system.

Genee's research focuses on understanding the disproportionate burden of a changing climate on vulnerable populations and the impacts of neighborhood-level environmental exposures, including degraded infrastructure, unfair development, and chemical pollutants, on health disparities.

Environmental Epidemiology: Principles and Methods: By Ray M. Merrill

Harvey Checkoway, Environmental Epidemiology: Principles and Methods: By Ray M. Merrill, American Journal of Epidemiology, Volume 169, Issue 1, 1 January 2009, Pages 124–125, https://doi.org/10.1093/aje/kwn373

There is a long legacy of epidemiologic research on environmental determinants of disease, arguably beginning in the modern era with John Snow's investigations of drinking water as the cause of the London cholera epidemic in the 1850s. Environmental epidemiology has, in fact, experienced somewhat of a renaissance during the past 20 years, as public concerns and academic interests have grown regarding the potential health consequences of air, water, soil, and food contaminants, as well as those related to environmental conditions that occur on a global scale, notably climate change. Numerous monographs and collections of case studies have been published on environmental factors and health, yet there is a conspicuous need for a systematic textbook that addresses both methodological and content-specific aspects of the field. This text goes a long way toward filling this gap.

Professor Merrill has taken a rather ambitious approach—especially for a one-author book—of presenting standard epidemiologic study designs and those especially characteristic of environmental epidemiology (e.g., cluster investigation, time series) and data analysis approaches, illustrated with applications to studies of the major types of environmental hazards. The text also conveys a substantial amount of important information concerning the sources and levels of environmental risk factors, measurement methods, and policy implications associated with environmental epidemiology research.

The book is organized into 3 major sections. Section I, chapters 1–6, offers a brief overview of the field, including some historical examples; describes methods for monitoring environmental exposures; and summarizes conventional epidemiologic study designs and data analysis techniques. Section II, chapters 7–9, focuses more on methods widely used in environmental epidemiology, with chapters on disease clusters, mapping and geographic information systems, and time-series analyses. Section III, chapters 10–14, presents applications of epidemiologic methods to investigations of risks associated with classes of hazards: air pollution, soil and food contaminants, water pollution, radiation, and climate change. The individual chapters begin with a set of learning objectives, and they conclude with lists of key issues, exercises, and study questions. There are also 5 relatively brief appendices to the book, including measures of health and environmental exposure data sources; sources, including websites, to obtain relevant data; selected statistical techniques; an exposure history questionnaire; and answers to the odd-numbered questions provided in the main text chapters. A comprehensive glossary and an index complete the text.

As the author states in the preface, the book is intended for use in undergraduate and graduate-level courses and as a guide for public health professionals. Students and teachers will undoubtedly appreciate the book's well-formulated organization. It is written very clearly in a style that should be accessible both to newcomers and to advanced students and professionals in the field, without compromising scientific rigor. The range of methodological and substantive topics covered suitably reflects current trends and developments in the field. The effective mix of substantive and methodological material is exemplified by chapter 10, “Indoor and Ambient Air Quality and Health,” which defines the types of air pollutants (nitrogen oxides, particulate matter, ozone, etc.), major sources and routes of exposure, related health effects, and epidemiologic methods of investigation. The chapter on climate change and health (chapter 14) is especially timely and has the additional virtue of providing a balanced review of sources of climate change, potential threats to human and environmental health, and policy implications.

Although the book is extremely well organized, highly informative, and clearly presented, there are some shortcomings. The chapter on statistical modeling and inference (chapter 5) and Appendix III (“Selected Statistical Techniques and Tests”) contain largely rudimentary material that could be omitted. Readers would be better advised to consult standard biostatistics and epidemiology methods texts. Some suggested replacements would be more in-depth descriptions of strengths and limitations of the various study designs widely adopted in environmental epidemiology—cohort, case-control, and case-crossover studies. The chapter devoted to the time-series design (chapter 9) is a good template for expanded methodological material. A more detailed presentation of risk assessment methods, or perhaps a separate chapter devoted to that topic, might also be desirable, insofar as risk assessment is often an important end result of environmental epidemiology research. Findings for many of the illustrative examples throughout the book are summarized in the text, whereas more explicit tabular and graphical displays of data would probably be appreciated by instructors and students. Also, many, but by no means the vast majority, of the examples are derived from studies conducted in the United States. More international examples, of which there are many, would lend a broader appeal to the book. Similarly, the presentation of regulatory agency policies for setting exposure standards in various chapters is largely devoted to the US agencies, especially the Environmental Protection Agency. Adding some examples of environmental regulatory processes in other countries might be considered.

On balance, this book is an excellent text for teaching environmental epidemiology to undergraduate and graduate-level students. Professionals working in environmental epidemiology and other branches of the discipline should also find this book to be an outstanding resource.

Conflict of interest: none declared.

  • Perspective
  • Published: 05 September 2018

Epidemiology: a foundation of environmental decision making

Kathleen C. (Kacee) Deener¹, Jason D. Sacks¹, Ellen F. Kirrane¹, Barbara S. Glenn¹, Maureen R. Gwinn¹, Thomas F. Bateson¹ & Thomas A. Burke²

Journal of Exposure Science & Environmental Epidemiology, volume 28, pages 515–521 (2018)

Many epidemiologic studies are designed so they can be drawn upon to provide scientific evidence for evaluating hazards of environmental exposures, conducting quantitative assessments of risk, and informing decisions designed to reduce or eliminate harmful exposures. However, experimental animal studies are often relied upon for environmental and public health policy making despite the expanding body of observational epidemiologic studies that could inform the relationship between actual, as opposed to controlled, exposures and health effects. This paper provides historical examples of how epidemiology has informed decisions at the U.S. Environmental Protection Agency, discusses some challenges with using epidemiology to inform decision making, and highlights advances in the field that may help address these challenges and further the use of epidemiologic studies moving forward.

Birnbaum L, Burke T, Jones J. Informing 21st-century risk assessments with 21st-century science. Environ Health Perspect. 2016;124:A60–3.

PubMed   PubMed Central   Google Scholar  

Gwinn M, Axelrad D, Bahadori T, Bussard D, Cascio W, Deener K. et al. Chemical risk assessment: traditional vs public health perspectives. Am J Public Health. 2017;107:1032–9.

Article   Google Scholar  

Owens EO, Patel MM, Kirrane E, Long TC, Brown J, Cote I. et al. Framework for assessing causality of air pollution-related health effects for reviews of the national ambient air quality standards. Regul Toxicol Pharmacol. 2017;88:332–7.

NRC (National Research Council). Risk assessment in the Federal Government: managing the process. Washington, DC: National Academy Press; 1983.

Google Scholar  

Greenbaum D. Epidemiology for decisions: why we need it, and opportunities to address its challenges. Bloomberg BNA: Environment & Safety Resource Center: Arlington, VA; 2016.

Pope CA, Dockery DW. Health effects of fine particulate air pollution: lines that connect. J Air Waste Manag Assoc. 2006;56:709–42.

Article   CAS   Google Scholar  

Dockery DW, Pope CA, 3rd, Xu X, Spengler JD, Ware JH, Fay ME. et al. An association between air pollution and mortality in six U.S. cities. N Engl J Med. 1993;329:1753–9.

Pope CA 3rd, Thun MJ, Namboodiri MM, Dockery DW, Evans JS, Speizer FE, et al. Particulate air pollution as a predictor of mortality in a prospective study of U.S. adults. Am J Respir Crit Care Med. 1995;151:669–74.

Krewski D, Burnett RT, Goldberg MS, Hoover BK, Siemiatycki, et al. Overview of the reanalysis of the Harvard Six Cities Study and American Cancer Society study of particulate air pollution and mortality. J Toxicol Environ Health A. 2003;66:1507–51.

USEPA (Environmental Protection Agency). National Ambient Air Quality Standards for Participate Matter. Final Rule. 62 FR 38652-38760. 1997. https://www.gpo.gov/fdsys/granule/FR-1997-07-18/97-18577 . Accessed 22 April 2017.

Krewski D, Burnett RT, Goldberg MS, Hoover K, Siemiatycki J, Jerrett M, et al. Reanalysis of the Harvard Six Cities Study and the American Cancer Society Study of Particulate Air Pollution and Mortality. Cambridge, MA: Health Effects Institute; 2001.

Laden F, Schwartz J, Speizer FE, Dockery DW. Reduction in fine particulate air pollution and mortality: Extended follow-up of the Harvard Six Cities study. Am J Respir Crit Care Med. 2006;173:667–72.

Kaufman JD, Adar SD, Barr RG, Budoff M, Burke GL, Curl CL. et al. Association between air pollution and coronary artery calcification within six metropolitan areas in the USA (the Multi-Ethnic Study of Atherosclerosis and Air Pollution): a longitudinal cohort study. Lancet. 2016;388:696–704.

Brook RD, Rajagopalan S, Pope CA, Brook JR, Bhatnagar A, Diez-Roux AV. et al. Particulate matter air pollution and cardiovascular disease: an update to the scientific statement from the American Heart Association. Circulation. 2010;121:2331–78.

USEPA (Environmental Protection Agency). Final Report: Integrated Science Assessment for Particulate Matter. Washington, DC, EPA/600/R-08/139F. 2009. https://cfpub.epa.gov/ncea/risk/recordisplay.cfm?deid=216546 . Accessed 7 June 2016.

USEPA (Environmental Protection Agency). National Ambient Air Quality Standards for Participate Matter. Final Rule. 78 FR 3085-3287. 2013. https://www.federalregister.gov/documents/2013/01/15/2012-30946/national-ambient-air-quality-standards-for-particulate-matter . Accessed 22 April 2017.

Rice DC. Lead-induced behavioral impairment on a spatial discrimination reversal task in monkeys exposed during different periods of development. Toxicol Appl Pharmacol. 1990;106:327–33.

USEPA (Environmental Protection Agency). Integrated science assessment for lead (EPA/600/R-10/075F). Research Triangle Park: U.S. Environmental Protection Agency, National Center for Environmental Assessment; 2013. http://cfpub.epa.gov/ncea/cfm/recordisplay.cfm?deid=255721

Federal Register (FR) Vol 73 No 219 Wednesday, 2 Nov 2008. p. 66964-7062.

USEPA (Environmental Protection Agency). Toxicological Review of Libby Amphibole Asbestos (EPA/635/R-11/002F). Washington, DC: Office of Research and Development, National Center for Environmental Assessment; 2014. https://cfpub.epa.gov/ncea/iris/iris_documents/documents/toxreviews/1026tr.pdf

Sullivan PA. Vermiculite, respiratory disease, and asbestos exposure in Libby, Montana: update of a Cohort Mortality Study. Environ Health Perspect. 2007;115:579–85.

Nachman KE, Fox MA, Sheehan MC, Burke TA, Rodricks JV, Woodruff TJ. Leveraging epidemiology to improve risk assessment. Open Epidemiol J. 2011;4:3–9.

Burns CJ, Wright M, Pierson JB, Bateson TF, Burstyn I, Goldstein DA. et al. Evaluating uncertainty to strengthen epidemiologic data for use in human health risk assessment. Environ Health Perspect. 2014;122:1160–5.

Rooney AA, Cooper GS, Jahnke GD, Lam J, Morgan RL, Boyles AL, et al. How credible are the study results? Evaluating and applying internal validity tools to literature-based assessments of environmental health hazards. Environ Int. 2016;92-93:617–29.

NRC (National Research Council). Review of EPA’s Integrated Risk Information System (IRIS) Process. Washington, DC: National Academies Press; 2014.

Rooney AA, Boyles AL, Wolfe MS, Bucher JR, Thayer KA. Systematic review and evidence integration for literature-based environmental health science assessments. Environ Health Perspect. 2014;122:711–8.

NTP. Handbook for Preparing Report on Carcinogens Monographs. Office of the Report on Carcinogens. 2015. p. 83 https://ntp.niehs.nih.gov/ntp/roc/handbook/roc_handbook_508.pdf .

Pearl J. An introduction to causal inference. Int J Biostat. 2010;6:1–59.

Weisskopf MG, Kioumourtzoglou MA, Roberts AL. Air pollution and autism spectrum disorders: causal or confounded?. Curr Environ Health Rep. 2015;2:430–9.

Brewer LE, Wright JM, Rice G, Neas L, Teuschler L. Causal inference in cumulative risk assessment: the roles of directed acyclic graphs. Environ Int. 2017;102:30–41.

Lash TL, Fox MP, MacLehose RF, Maldonado G, McCandless LC, Greenland S. Good practices for quantitative bias analysis. Int J Epidemiol. 2014;43:1969–85.

Spiegelman D. Approaches to uncertainty in exposure assessment in environmental epidemiology. Annu Rev Public Health. 2010;31:149–63.

Christensen K, Christensen CH, Wright JM, Galizia A, Glenn BS, Scott CS. et al. The use of epidemiology in risk assessment: challenges and opportunities. Hum Ecol Risk Assess. 2015;21:1644–63.

Sobus JR, DeWoskin RS, Tan YM, Pleil JD, Phillips MB, George BJ. et al. Uses of NHANES biomarker data for chemical risk assessment: trends, challenges, and opportunities. Environ Health Perspect. 2015;123:919–27.

Shin HM, Vieira VM, Ryan PB, Detwiler R, Sanders B, Steenland K. et al. Environmental fate and transport modeling for perfluorooctanoic acid emitted from the Washington Works Facility in West Virginia. Environ Sci Technol. 2011;45:1435–42.

Shin HM, Vieira VM, Ryan PB, Steenland K, Bartell SM. Retrospective exposure estimation and predicted versus observed serum perfluorooctanoic acid concentrations for participants in the C8 Health Project. Environ Health Perspect. 2011;119:1760–5.

Winquist A, Lally C, Shin H, Steenland K. Design, methods, and population for a study of PFOA health effects among highly exposed Mid-Ohio valley community residents and workers. Environ Health Perspect. 2013;121:893–9.

Zartarian V, Xue J, Tornero-Velez R, Brown J. Children’s lead exposure: a multimedia modeling analysis to guide public health decision-making. Environ Health Perspect. 2017;125:097009.

Kloog I, Chudnovsky AA, Just AC, Nordio F, Koutrakis P, Coull BA, et al. A new hybrid spatio-temporal model for estimating daily multiyear PM2.5 concentrations across northeastern USA using high resolution aerosol optical depth data. Atmos Environ. 2014;95:581–90.

Sacks JD, Rappold AG, Davis JA Jr, Richardson DB, Waller AE, Luben TJ. Influence of urbanicity and county characteristics on the association between ozone and asthma emergency department visits in North Carolina. Environ Health Perspect. 2014;122:506–12.

Kirrane EF, Bowman C, Davis JA, Hoppin JA, Blair A, Chen H, et al. Associations of ozone and PM2.5 concentrations with Parkinson’s disease among participants in the Agricultural Health Study. J Occup Environ Med. 2015;57:509–17.

Acknowledgements

All authors were employed at the U.S. Environmental Protection Agency at the time of the writing of this article.

Author information

Authors and affiliations.

U.S. Environmental Protection Agency, Office of Research and Development, Ronald Reagan Building, 1300 Pennsylvania Ave., N.W. Room 51136, Washington, DC, 20004, USA

Kathleen C. (Kacee) Deener, Jason D. Sacks, Ellen F. Kirrane, Barbara S. Glenn, Maureen R. Gwinn & Thomas F. Bateson

Johns Hopkins Bloomberg School of Public Health, Baltimore, MD, USA

Thomas A. Burke

Corresponding author

Correspondence to Kathleen C. (Kacee) Deener.

Ethics declarations

Conflict of interest.

The authors declare that they have no conflict of interest.

About this article

Cite this article.

Deener, K.C., Sacks, J.D., Kirrane, E.F. et al. Epidemiology: a foundation of environmental decision making. J Expo Sci Environ Epidemiol 28, 515–521 (2018). https://doi.org/10.1038/s41370-018-0059-4

Received : 27 November 2017

Revised : 25 May 2018

Accepted : 31 May 2018

Published : 05 September 2018

Issue Date : November 2018

DOI : https://doi.org/10.1038/s41370-018-0059-4

  • Epidemiology
  • criteria air pollutants; disease
  • exposure modeling

Advanced Epidemiological Analysis

Chapter 3 Time series / case-crossover studies

We’ll start by exploring common characteristics in time series data for environmental epidemiology. In the first half of the class, we’re focusing on a very specific type of study: one that leverages large-scale vital statistics data, collected at a regular time scale (e.g., daily), combined with large-scale measurements of a climate-related exposure, with the goal of estimating the typical relationship between the level of the exposure and risk of a health outcome. For example, we may have daily measurements of particulate matter pollution for a city, taken at a set of Environmental Protection Agency (EPA) monitors. We want to investigate how risk of cardiovascular mortality changes in the city from day to day in association with these pollution levels. If we have daily counts of the number of cardiovascular deaths in the city, we can create a statistical model that fits the exposure-response association between particulate matter concentration and daily risk of cardiovascular mortality. These statistical models, and the type of data used to fit them, will be the focus of the first part of this course.

3.1 Readings

The required readings for this chapter are:

  • Bhaskaran et al. ( 2013 ) Provides an overview of time series regression in environmental epidemiology.
  • Vicedo-Cabrera, Sera, and Gasparrini ( 2019 ) Provides a tutorial of all the steps for projecting the health impacts of temperature extremes under climate change. One of the steps is to fit the exposure-response association using present-day data (the section on “Estimation of Exposure-Response Associations” in the paper). In this chapter, we will go into details on that step, and that section of the paper is the only required reading for this chapter. Later in the class, we’ll look at other steps covered in this paper. Supplemental material for this paper is available for download at http://links.lww.com/EDE/B504 . You will need the data in this supplement for the exercises for class.

The following are supplemental readings (i.e., not required, but may be of interest) associated with the material in this chapter:

  • B. Armstrong et al. ( 2012 ) Commentary that provides context on how epidemiological research on temperature and health can help inform climate change policy.
  • Dominici and Peng ( 2008c ) Overview of study designs for studying climate-related exposures (air pollution in this case) and human health. Chapter in a book that is available online through the CSU library.
  • B. Armstrong ( 2006 ) Covers similar material to Bhaskaran et al. ( 2013 ) , but with more focus on the statistical modeling framework.
  • Gasparrini and Armstrong ( 2010 ) Describes some of the advances made to time series study designs and statistical analysis, specifically in the context of temperature.
  • Basu, Dominici, and Samet ( 2005 ) Compares time series and case-crossover study designs in the context of exploring temperature and health. Includes a nice illustration of different referent periods, including time-stratified.
  • B. G. Armstrong, Gasparrini, and Tobias ( 2014 ) This paper describes different data structures for case-crossover data, as well as how conditional Poisson regression can be used in some cases to fit a statistical model to these data. Supplemental material for this paper is available at https://bmcmedresmethodol.biomedcentral.com/articles/10.1186/1471-2288-14-122#Sec13 .
  • Imai et al. ( 2015 ) Typically, the time series study design covered in this chapter is used to study non-communicable health outcomes. This paper discusses opportunities and limitations in applying a similar framework for infectious disease.
  • Dominici and Peng ( 2008b ) Heavier on statistics. Describes some of the statistical challenges of working with time series data for air pollution epidemiology. Chapter in a book that is available online through the CSU library.
  • Lu and Zeger ( 2007 ) Heavier on statistics. This paper shows how, under conditions often common for environmental epidemiology studies, case-crossover and time series methods are equivalent.
  • Gasparrini ( 2014 ) Heavier on statistics. This provides the statistical framework for the distributed lag model for environmental epidemiology time series studies.
  • Dunn and Smyth ( 2018 ) Introduction to statistical models, moving into regression models and generalized linear models. Chapter in a book that is available online through the CSU library.
  • James et al. ( 2013 ) General overview of linear regression, with an R coding “lab” at the end to provide coding examples. Covers model fit, continuous, binary, and categorical covariates, and interaction terms. Chapter in a book that is available online through the CSU library.

3.2 Time series and case-crossover study designs

In the first half of this course, we’ll take a deep look at how researchers can study how environmental exposures and health risk are linked using time series studies . Let’s start by exploring the study design for this type of study, as well as a closely linked study design, that of case-crossover studies .

It’s important to clarify the vocabulary we’re using here. We’ll use the terms time series study and case-crossover study to refer specifically to a type of study common for studying air pollution and other climate-related exposures. However, both terms have broader definitions, particularly in fields outside environmental epidemiology. For example, a time series study more generally refers to a study where data are available for the same unit (e.g., a city) at multiple time points, typically at regularly spaced times (e.g., daily). A variety of statistical methods have been developed to gain insight from this type of data, some of which are currently rarely used in the specific fields of air pollution and climate epidemiology that we’ll explore here. For example, there are methods to address autocorrelation over time in measurements (that is, the tendency for measurements taken at closer time points to be somewhat correlated). We won’t cover these methods here, and you won’t see them applied often in environmental epidemiology studies, but they might be the focus of a “Time Series” course in a statistics or economics department.

In air pollution and climate epidemiology, time series studies typically begin with study data collected for an aggregated area (e.g., city, county, ZIP code) and with a daily resolution. These data are usually secondary data, originally collected by the government or other organizations through vital statistics or other medical records (for the health data) and networks of monitors (for the exposure data). In the next section of this chapter, we’ll explore common characteristics of these data. These data are used in a time series study to investigate how changes in the daily level of the exposure are associated with risk of a health outcome, focusing on the short-term period. For example, a study might investigate how risk of respiratory hospitalization in a city changes in relationship with the concentration of particulate matter during the week or two following exposure. The study period for these studies is often very long (a decade or longer), and while single-community time series studies can be conducted, many time series studies for environmental epidemiology now include a large set of communities of national or international scope.

The study design essentially compares a community with itself at different time points—asking if health risk tends to be higher on days when exposure is higher. By comparing the community to itself, the design removes many challenges that would come up when comparing one community to another (e.g., is respiratory hospitalization risk higher in city A than city B because particulate matter concentrations are typically higher in city A?). Communities differ in demographics and other factors that influence health risk, and it can be hard to properly control for these when exploring the role of environmental exposures. By comparison, demographics tend to change slowly over time (at least, compared to a daily scale) within a community.

One limitation, however, is that the study design is often best suited to studying acute effects and more limited in studying chronic health effects. This is tied to the design and to the traditional ways of statistically modeling the resulting data. Since a community is compared with itself, the design removes challenges in comparing across communities, but it introduces new ones in comparing across time. Both environmental exposures and rates of health outcomes can have strong patterns over time, both across the year (e.g., mortality rates tend to follow a strong seasonal pattern, with higher rates in winter) and across longer periods (e.g., over the decade or longer of a study period). These patterns must be addressed through the statistical model fit to the time series data, and they make it hard to disentangle chronic effects of the exposure from unrelated temporal patterns in the exposure and outcome. As a result, most time series studies focus on the short-term (or acute) association between exposure and outcome, typically looking at a period of at most about a month following exposure.

The term case-crossover study is a bit more specific than time series study , although there has been a strong movement in environmental epidemiology towards applying a specific version of the design, and so in this field the term often now implies this more specific version of the design. Broadly, a case-crossover study is one in which the conditions at the time of a health outcome are compared to conditions at other times that should otherwise (i.e., outside of the exposure of interest) be comparable. A case-crossover study could, for example, investigate the association between weather and car accidents by taking a set of car accidents and investigating how weather during the car accident compared to weather in the same location the week before.

One choice in a case-crossover study design is how to select the control time periods. Early studies tended to use a simple method for this, for example taking the day before, or a day the week before, or some similar period somewhat close to the day of the outcome. As researchers applied the study design to large sets of data (e.g., all deaths in a community over multiple years), they noticed that some choices could create bias in estimates. As a result, most environmental epidemiology case-crossover studies now use a time-stratified approach to selecting control days. This approach selects a set of control days that typically includes days both before and after the day of the health outcome, all drawn from a defined “stratum” of days that should be comparable in terms of temporal trends. For daily-resolved data, a stratum typically includes all the days that share the same year, month, and day of week. For example, one stratum of comparable days might be all the Mondays in January of 2010. These strata are created throughout the study period, and days are only compared to other days within their stratum (although, fortunately, there are ways you can apply a single statistical model to fit all the data for this approach rather than having to fit a model stratum-by-stratum over many years).
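
As a rough illustration of the time-stratified idea, the sketch below labels each day with a stratum built from year, month, and day of week, and then pulls out the days that share a stratum with one example date. The `days` data frame, the `stratum` column, and the example date are all made up for the illustration; this is not code from the readings.

```r
# Illustrative dates only: one calendar year of daily records
days <- data.frame(
  date = seq(as.Date("2010-01-01"), as.Date("2010-12-31"), by = "day")
)

# Build a stratum label from year, month, and day of week
days$stratum <- paste(
  format(days$date, "%Y"),
  format(days$date, "%m"),
  format(days$date, "%u"),  # weekday as a number, 1 = Monday
  sep = "-"
)

# Find every day in the same stratum as Monday, 11 January 2010
case_day <- as.Date("2010-01-11")
case_stratum <- days$stratum[days$date == case_day]
same_stratum <- days$date[days$stratum == case_stratum]

same_stratum  # the four Mondays in January 2010
```

In a time-stratified analysis, the other days in `same_stratum` would serve as the comparison days for an outcome that occurred on `case_day`.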

When this is applied to data at an aggregated level (e.g., city, county, or ZIP code), it is in spirit very similar to a time series study design, in that you are comparing a community to itself at different time points. The main difference is that a time series study uses statistical modeling to control for potential confounding from temporal patterns, while a case-crossover study of this type instead controls for this potential confounding by only comparing days that should be “comparable” in terms of temporal trends, for example, comparing a day only to other days in the same month, year, and day of week. You will often hear that case-crossover studies therefore address potential confounding from temporal patterns “by design” rather than “statistically” (as in time series studies). However, in practice (and as we’ll explore in this class), in environmental epidemiology, case-crossover studies often are applied to aggregated community-level data, rather than individual-level data, with exposure assumed to be the same for everyone in the community on a given day. Under these assumptions, time series and case-crossover studies have been shown to be essentially equivalent (and, in fact, can use the same study data), only with slightly different terms used to control for temporal patterns in the statistical model fit to the data. Several interesting papers have been written to explore differences and similarities in these two study designs as applied in environmental epidemiology ( Basu, Dominici, and Samet 2005 ; B. G. Armstrong, Gasparrini, and Tobias 2014 ; Lu and Zeger 2007 ) .

These types of study designs in practice use similar datasets. In earlier presentations of the case-crossover design, these data would be set up a bit differently for statistical modeling. More recent work, however, has clarified how they can be modeled similarly to when using a time series study design, allowing the data to be set up in a similar way ( B. G. Armstrong, Gasparrini, and Tobias 2014 ) .

Several excellent commentaries or reviews are available that provide more details on these two study designs and how they have been used specifically to investigate the relationship between climate-related exposures and health ( Bhaskaran et al. 2013 ; B. Armstrong 2006 ; Gasparrini and Armstrong 2010 ) . Further, these designs are just two tools in a wider collection of study designs that can be used to explore the health effects of climate-related exposures. Dominici and Peng ( 2008c ) provide a nice overview of this broader set of designs.

3.3 Time series data

Let’s explore the type of dataset that can be used for these time series–style studies in environmental epidemiology. In the examples in this chapter, we’ll be using data that comes as part of the Supplemental Material in one of this chapter’s required readings, ( Vicedo-Cabrera, Sera, and Gasparrini 2019 ) . Follow the link for the supplement for this article and then look for the file “lndn_obs.csv.” This is the file we’ll use as the example data in this chapter.

These data are saved in a csv format (that is, a plain text file, with commas used as the delimiter), and so they can be read into R using the read_csv function from the readr package (part of the tidyverse). For example, you can use the following code to read in these data, assuming you have saved them in a “data” subdirectory of your current working directory:
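
The original code block did not survive on this page, so here is a minimal sketch of what the read step might look like. The path "data/lndn_obs.csv" follows the description above. To keep the sketch runnable without the download, it writes a tiny stand-in file with made-up values and reads that instead; the stand-in has only a few of the real file's columns.

```r
library(readr)

# With the real supplement saved in a "data" subdirectory, the call would be:
#   obs <- read_csv("data/lndn_obs.csv")

# Stand-in file with made-up values, so this sketch runs on its own
path <- file.path(tempdir(), "lndn_obs.csv")
writeLines(
  c("date,all,tmean,dow",
    "1990-01-01,220,3.9,Mon",
    "1990-01-02,257,5.5,Tue"),
  path
)

# read_csv guesses column types; ISO-formatted dates are parsed as Date
obs <- read_csv(path, show_col_types = FALSE)
obs
```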

This example dataset shows many characteristics that are common for datasets for time series studies in environmental epidemiology. Time series data are essentially a sequence of data points repeatedly taken over a certain time interval (e.g., day, week, month etc). General characteristics of time series data for environmental epidemiology studies are:

  • Observations are given at an aggregated level. For example, instead of individual observations for each person in London, the obs data give counts of deaths throughout London. The level of aggregation is often determined by geopolitical boundaries, for example, counties or ZIP codes in the US.
  • Observations are given at regularly spaced time steps over a period. In the obs dataset, the time interval is day. Typically, values will be provided continuously over that time period, with observations for each time interval. Occasionally, however, the time series data may only be available for particular seasons (e.g., only warm season dates for an ozone study), or there may be some missing data on either the exposure or health outcome over the course of the study period.
  • Observations are available at the same time step (e.g., daily) for (1) the health outcome, (2) the environmental exposure of interest, and (3) potential time-varying confounders. In the obs dataset, the health outcome is mortality (from all causes; sometimes, the health outcome will focus on a specific cause of mortality or other health outcomes such as hospitalizations or emergency room visits). Counts are given for everyone in the city for each day ( all column), as well as for specific age categories ( all_0_64 for all deaths among those up to 64 years old, and so on). The exposure of interest in the obs dataset is temperature, and three metrics of this are included ( tmean , tmin , and tmax ). Day of the week is one time-varying factor that could be a confounder, or at least help explain variation in the outcome (mortality). This is included through the dow variable in the obs data. Sometimes, you will also see a marker for holidays included as a potential time-varying confounder, or other exposure variables (temperature is a potential confounder, for example, when investigating the relationship between air pollution and mortality risk).
  • Multiple metrics of an exposure and / or multiple health outcome counts may be included for each time step. In the obs example, three metrics of temperature are included (minimum daily temperature, maximum daily temperature, and mean daily temperature). Several counts of mortality are included, providing information for specific age categories in the population. The different metrics of exposure will typically be fit in separate models, either as a sensitivity analysis or to explore how exposure measurement affects epidemiological results. If different health outcome counts are available, these can be modeled in separate statistical models to determine an exposure-response function for each outcome.

3.4 Exploratory data analysis

When working with time series data, it is helpful to start with some exploratory data analysis. This type of time series data will often be secondary data, that is, data that were previously collected and that you are re-using. Exploratory data analysis is particularly important with secondary data like this. For primary data that you collected yourself, following protocols that you designed yourself, you will often be very familiar with the structure of the data and any quirks in it by the time you are ready to fit a statistical model. With secondary data, however, you will typically start with much less familiarity about the data, how it was collected, and any potential issues with it, like missing data and outliers.

Exploratory data analysis can help you become familiar with your data. You can use summaries and plots to explore the parameters of the data, and also to identify trends and patterns that may be useful in designing an appropriate statistical model. For example, you can explore how values of the health outcome are distributed, which can help you determine what type of regression model would be appropriate, and to see if there are potential confounders that have regular relationships with both the health outcome and the exposure of interest. You can see how many observations have missing data for the outcome, the exposure, or confounders of interest, and you can see if there are any measurements that look unusual. This can help in identifying quirks in how the data were recorded—for example, in some cases ground-based weather monitors use -99 or -999 to represent missing values, definitely something you want to catch and clean-up in your data (replacing with R’s NA for missing values) before fitting a statistical model!
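
As a small illustration of that clean-up step, the sketch below recodes sentinel values to NA before summarizing. The readings and the vector names are made up for the example, and the set of sentinel codes would need to be checked against the documentation for your own data source.

```r
# Made-up temperature readings where -99 and -999 mark missing values
temp_raw <- c(12.1, -99, 8.4, -999, 15.0)

# Left as-is, the sentinel codes badly distort a simple summary
mean(temp_raw)

# Recode the sentinel codes as NA so they are not treated as real values
temp_clean <- ifelse(temp_raw %in% c(-99, -999), NA, temp_raw)

# Summaries can now skip the missing values explicitly
sum(is.na(temp_clean))
mean(temp_clean, na.rm = TRUE)
```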

The following applied exercise will take you through some of the questions you might want to answer through this type of exploratory analysis. In general, the tidyverse suite of R packages has loads of tools for exploring and visualizing data in R. The lubridate package from the tidyverse , for example, is an excellent tool for working with date-time data in R, and time series data will typically have at least one column with the timestamp of the observation (e.g., the date for daily data). You may find it worthwhile to explore this package some more. There is a helpful chapter in Wickham and Grolemund ( 2016 ) , https://r4ds.had.co.nz/dates-and-times.html , as well as a cheatsheet at https://evoldyn.gitlab.io/evomics-2018/ref-sheets/R_lubridate.pdf . For visualizations, if you are still learning techniques in R, two books you may find useful are Healy ( 2018 ) (available online at https://socviz.co/ ) and Chang ( 2018 ) (available online at http://www.cookbook-r.com/Graphs/ ).

Applied: Exploring time series data

Read the example time series data into R and explore it to answer the following questions:

  • What is the study period for the example obs dataset? (i.e., what dates / years are covered by the time series data?)
  • Are there any missing dates (i.e., dates with nothing recorded) within this time period? Are there any recorded dates where health outcome measurements are missing? Any where exposure measurements are missing?
  • Are there seasonal trends in the exposure? In the outcome?
  • Are there long-term trends in the exposure? In the outcome?
  • Is the outcome associated with day of week? Is the exposure associated with day of week?

Based on your exploratory analysis in this section, talk about the potential for confounding when these data are analyzed to estimate the association between daily temperature and city-wide mortality. Is confounding by seasonal trends a concern? How about confounding by long-term trends in exposure and mortality? How about confounding by day of week?

Applied exercise: Example code

  • What is the study period for the example obs dataset? (i.e., what dates / years are covered by the time series data?)

In the obs dataset, the date of each observation is included in a column called date . The data type of this column is “Date”—you can check this by using the class function from base R:

Since this column has a “Date” data type, you can run some mathematical function calls on it. For example, you can use the min function from base R to get the earliest date in the dataset and the max function to get the latest.

You can also run the range function to get both the earliest and latest dates with a single call:
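
The checks just described (class, min and max, and range) might look like the sketch below. It uses a small simulated stand-in for obs with made-up values, so the dates printed are those of the stand-in rather than the real study period.

```r
# Simulated stand-in for obs (made-up values, illustration only).
# With the real file: obs <- readr::read_csv("data/lndn_obs.csv")
set.seed(1)
obs <- data.frame(
  date = seq(as.Date("1990-01-01"), as.Date("1992-06-30"), by = "day")
)
obs$all <- rpois(nrow(obs), lambda = 150)

# Check the data type of the date column
class(obs$date)

# Earliest and latest dates, one at a time
min(obs$date)
max(obs$date)

# Both at once
range(obs$date)
```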

This provides the range of the study period for these data. One interesting point is that it’s not a round set of years—instead, the data ends during the summer of the last study year. This doesn’t present a big problem, but is certainly something to keep in mind if you’re trying to calculate yearly averages of any values for the dataset. If you’re getting the average of something that varies by season (e.g., temperature), it could be slightly weighted by the months that are included versus excluded in the partial final year of the dataset. Similarly, if you group by year and then count totals by year, the number will be smaller for the last year, since only part of the year’s included. For example, if you wanted to count the total deaths in each year of the study period, it will look like they go down a lot the last year, when really it’s only because only about half of the last year is included in the study period:
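
One way to sketch that yearly count is shown below, using dplyr and lubridate. The data are a simulated stand-in for obs (made-up values) that, like the real data, ends partway through its last year; the `yearly` and `total_deaths` names are ours.

```r
library(dplyr)
library(lubridate)

# Simulated stand-in for obs (made-up values, illustration only).
# With the real file: obs <- readr::read_csv("data/lndn_obs.csv")
set.seed(1)
obs <- data.frame(
  date = seq(as.Date("1990-01-01"), as.Date("1992-06-30"), by = "day")
)
obs$all <- rpois(nrow(obs), lambda = 150)

# Total deaths per calendar year, with the number of days behind each total
yearly <- obs %>%
  mutate(year = year(date)) %>%
  group_by(year) %>%
  summarize(total_deaths = sum(all), n_days = n())

yearly
```

Keeping `n_days` next to the total makes it easy to spot that the apparent drop in the last year comes from a partial year rather than a real change.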

  • Are there any missing dates within this time period? Are there any recorded dates where health outcome measurements are missing? Any where exposure measurements are missing?

There are a few things you should check to answer this question. First (and easiest), you can check to see if there are any NA values within any of the observations in the dataset. This helps answer the second and third parts of the question. The summary function will provide a summary of the values in each column of the dataset, including the count of missing values ( NA s) if there are any:
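
A minimal sketch of this check follows, again on a simulated stand-in for obs with made-up values. The `colSums(is.na(...))` line is an extra we added to give the missing-value count per column directly.

```r
# Simulated stand-in for obs (made-up values, illustration only).
# With the real file: obs <- readr::read_csv("data/lndn_obs.csv")
set.seed(1)
obs <- data.frame(
  date = seq(as.Date("1990-01-01"), as.Date("1992-06-30"), by = "day")
)
obs$tmean <- rnorm(nrow(obs), mean = 11, sd = 5)
obs$all <- rpois(nrow(obs), lambda = 150)

# Column-by-column summary; an "NA's" entry appears for columns with missing values
summary(obs)

# Count of missing values in each column
na_counts <- colSums(is.na(obs))
na_counts
```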

Based on this analysis, all observations are complete for all dates included in the dataset. There are no listings for NA s for any of the columns, and this indicates no missing values in the dates for which there’s a row in the data.

However, this does not guarantee that every date between the start date and end date of the study period is included in the recorded data. Sometimes, some dates might not get recorded at all in the dataset, and the summary function won’t help you determine when this is the case. One common example in environmental epidemiology is with ozone pollution data. These are sometimes only measured in the warm season, and so may be shared in a dataset with all dates outside of the warm season excluded.

There are a few alternative explorations you can do to check this. Perhaps the easiest is to check the number of days between the start and end date of the study period, and then see if the number of observations in the dataset is the same:
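
Here is one way that comparison could be sketched, using a simulated stand-in for obs (made-up values). The second half, which drops a row on purpose and then finds the missing date, is an addition of ours to show what a gap would look like.

```r
# Simulated stand-in for obs (made-up values, illustration only).
# With the real file: obs <- readr::read_csv("data/lndn_obs.csv")
set.seed(1)
obs <- data.frame(
  date = seq(as.Date("1990-01-01"), as.Date("1992-06-30"), by = "day")
)
obs$all <- rpois(nrow(obs), lambda = 150)

# Days between the first and last date, compared with the number of rows
n_days <- as.numeric(diff(range(obs$date)))
n_days
nrow(obs)
complete <- (n_days + 1) == nrow(obs)

# To see what a gap looks like, drop one row and list the dates that are absent
obs_gap <- obs[-100, ]
full_dates <- seq(min(obs_gap$date), max(obs_gap$date), by = "day")
missing_dates <- full_dates[!(full_dates %in% obs_gap$date)]
missing_dates
```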

This indicates that there is an observation for every date over the study period, since the number of observations should be one more than the time difference. In the next question, we’ll be plotting observations by time, and typically this will also help you see if there are large chunks of missing dates in the data.

  • Are there seasonal trends in the exposure? In the outcome?

You can use a simple plot to visualize patterns over time in both the exposure and the outcome. For example, the following code plots a dot for each daily temperature observation over the study period. The points are set to a smaller size ( size = 0.5 ) and plotted with some transparency ( alpha = 0.5 ) since there are so many observations.
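
A sketch of such a plot with ggplot2 is below. It uses a simulated stand-in for obs with a made-up seasonal temperature pattern, so the picture will only resemble the real one in broad shape.

```r
library(ggplot2)

# Simulated stand-in for obs (made-up values, illustration only).
# With the real file: obs <- readr::read_csv("data/lndn_obs.csv")
set.seed(1)
obs <- data.frame(
  date = seq(as.Date("1990-01-01"), as.Date("1992-06-30"), by = "day")
)
season <- cos(2 * pi * as.numeric(format(obs$date, "%j")) / 365)
obs$tmean <- 11 - 7 * season + rnorm(nrow(obs), sd = 3)

# One small, semi-transparent point per daily observation
p <- ggplot(obs, aes(x = date, y = tmean)) +
  geom_point(alpha = 0.5, size = 0.5)

p
```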

There is (unsurprisingly) clear evidence here of a strong seasonal trend in mean temperature, with values typically lowest in the winter and highest in the summer.

You can plot the outcome variable in the same way:

Again, there are seasonal trends, although in this case they are inverted. Mortality tends to be highest in the winter and lowest in the summer. Further, the seasonal pattern is not equally strong in all years; in some years it has a much higher winter peak, probably in conjunction with severe influenza seasons.

Another way to look for seasonal trends is with a heatmap-style visualization, with day of year along the x-axis and year along the y-axis. This allows you to see patterns that repeat around the same time of the year each year (and also unusual deviations from normal seasonal patterns).

For example, here’s a plot showing temperature in each year, where the observations are aligned on the x-axis by time in year. We’re using doy , which stands for “day of year” (i.e., Jan 1 = 1; Jan 2 = 2; … Dec 31 = 365 as long as it’s not a leap year), as the measure of time in the year. We’ve reversed the y-axis so that the earliest years in the study period start at the top of the visual, with later study years below. This is a personal style choice, and it would be no problem to leave the y-axis as-is. We’ve used the viridis color scale for the fill, since it has a number of features that make it preferable to the default R color scale, including that it is perceptible for most types of color blindness and can be printed in grayscale and still be correctly interpreted.
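
A heatmap along these lines could be sketched as follows. The data are a simulated stand-in for obs (made-up values), the year and doy columns are derived here with lubridate, and we assume ggplot2's built-in `scale_fill_viridis_c()` for the fill; the original may have used a different function for the viridis scale.

```r
library(ggplot2)
library(lubridate)

# Simulated stand-in for obs (made-up values, illustration only).
# With the real file: obs <- readr::read_csv("data/lndn_obs.csv")
set.seed(1)
obs <- data.frame(
  date = seq(as.Date("1990-01-01"), as.Date("1992-06-30"), by = "day")
)
obs$year <- year(obs$date)
obs$doy <- yday(obs$date)  # day of year: Jan 1 = 1, Jan 2 = 2, ...
obs$tmean <- 11 - 7 * cos(2 * pi * obs$doy / 365) + rnorm(nrow(obs), sd = 3)

# One tile per day: day of year on the x-axis, year on a reversed y-axis
p <- ggplot(obs, aes(x = doy, y = year, fill = tmean)) +
  geom_tile() +
  scale_y_reverse() +
  scale_fill_viridis_c()

p
```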

From this visualization, you can see that temperatures tend to be higher in the summer months and lower in the winter months. “Spells” of extreme heat or cold are visible—where extreme temperatures tend to persist over a period, rather than randomly fluctuating within a season. You can also see unusual events, like the extreme heat wave in the summer of 2003, indicated with the brightest yellow in the plot.

We created the same style of plot for the health outcome. In this case, we focused on mortality among the oldest age group, as temperature sensitivity tends to increase with age, so this might be where the strongest patterns are evident.

For mortality, there tends to be an increase in the winter compared to the summer. Some winters have stretches with particularly high mortality—these are likely a result of seasons with strong influenza outbreaks. You can also see on this plot the impact of the 2003 heat wave on mortality among this oldest age group—an unusual spot of light green in the summer.

  • Are there long-term trends in the exposure? In the outcome?

Some of the plots we created in the last section help in exploring this question. For example, the following plot shows a clear pattern of decreasing daily mortality counts, on average, over the course of the study period:

It can be helpful to add a smooth line to help detect these longer-term patterns, which you can do with geom_smooth :
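
A minimal sketch of adding the smooth is below, on a simulated stand-in for obs (made-up values, with a mild downward drift built in). We leave `geom_smooth` at its defaults here; which smoother those defaults pick depends on the number of observations, so treat the choice as an assumption to revisit with the real data.

```r
library(ggplot2)

# Simulated stand-in for obs (made-up values, illustration only).
# With the real file: obs <- readr::read_csv("data/lndn_obs.csv")
set.seed(1)
obs <- data.frame(
  date = seq(as.Date("1990-01-01"), as.Date("1992-06-30"), by = "day")
)
season <- cos(2 * pi * as.numeric(format(obs$date, "%j")) / 365)
trend <- seq(0, -20, length.out = nrow(obs))  # made-up long-term decline
obs$all <- rpois(nrow(obs), lambda = 150 + 25 * season + trend)

# Daily mortality counts with a smooth line layered on top
p <- ggplot(obs, aes(x = date, y = all)) +
  geom_point(alpha = 0.5, size = 0.5) +
  geom_smooth()

p
```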

You could also take the median mortality count across each year in the study period, although you should take out any years without a full year’s worth of data before you do this, since there are seasonal trends in the outcome:
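
One way to sketch that calculation with dplyr is shown below, on a simulated stand-in for obs (made-up values). The rule used to drop partial years (fewer than 365 daily rows) is our assumption about what counts as "a full year's worth of data".

```r
library(dplyr)
library(lubridate)

# Simulated stand-in for obs (made-up values, illustration only).
# With the real file: obs <- readr::read_csv("data/lndn_obs.csv")
set.seed(1)
obs <- data.frame(
  date = seq(as.Date("1990-01-01"), as.Date("1992-06-30"), by = "day")
)
obs$all <- rpois(nrow(obs), lambda = 150)

# Median daily mortality by year, keeping only years with a full set of days
yearly_median <- obs %>%
  mutate(year = year(date)) %>%
  group_by(year) %>%
  filter(n() >= 365) %>%
  summarize(median_mort = median(all))

yearly_median
```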

Again, we see a clear pattern of decreasing mortality rates in this city over time. This means we need to think carefully about long-term time patterns as a potential confounder. It will be particularly important to think about this if the exposure also has a strong pattern over time. For example, air pollution regulations have meant that, in many cities, there may be long-term decreases in pollution concentrations over a study period.

  • Is the outcome associated with day of week? Is the exposure associated with day of week?

The data already include day of week as a column ( dow ). However, this column is stored as a character data type, so it doesn’t have the order of weekdays encoded (e.g., Monday comes before Tuesday). This makes it hard to look for patterns such as weekend versus weekday differences.

We could convert this to a factor and encode the weekday order when we do it, but it’s even easier to just recreate the column from the date column. We used the wday function from the lubridate package to do this—it extracts weekday as a factor, with the order of weekdays encoded (using a special “ordered” factor type):
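
A minimal sketch of this kind of call is below. The vector `dates` is a hypothetical stand-in for the date column of the study data. The weekday labels that `wday` prints depend on your locale, and the week is assumed to start on Sunday, which is the lubridate default.

```r
library(lubridate)

# Hypothetical stand-in for the date column: one week starting on a Sunday
dates <- seq(as.Date("2021-03-07"), by = "day", length.out = 7)

# label = TRUE returns weekday labels as an ordered factor
# rather than as integer codes
dow <- wday(dates, label = TRUE)

is.ordered(dow)  # checks that the weekday order is encoded
levels(dow)      # weekday labels in calendar order
```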

We looked at the mean, median, and 25th and 75th quantiles of the mortality counts by day of week:
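
A summary like this could be computed as in the sketch below, written in base R with simulated data. The data frame `obs_sim` is a hypothetical stand-in, and the weekday factor is built here with `as.POSIXlt` rather than `wday` only to keep the example free of package dependencies.

```r
# Simulated stand-in for the study data (not the real `obs`)
set.seed(42)
obs_sim <- data.frame(
  date = seq(as.Date("2000-01-01"), by = "day", length.out = 1400)
)
obs_sim$all <- rpois(nrow(obs_sim), lambda = 150)

# as.POSIXlt()$wday codes Sunday as 0 through Saturday as 6
obs_sim$dow <- factor(
  as.POSIXlt(obs_sim$date)$wday,
  levels = 0:6,
  labels = c("Sun", "Mon", "Tue", "Wed", "Thu", "Fri", "Sat")
)

# Mean, median, and 25th and 75th quantiles of daily counts for each weekday
dow_summary <- do.call(rbind, lapply(split(obs_sim$all, obs_sim$dow), function(x) {
  c(mean = mean(x),
    median = median(x),
    q25 = unname(quantile(x, 0.25)),
    q75 = unname(quantile(x, 0.75)))
}))
dow_summary
```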

Mortality tends to be a bit higher on weekdays than weekends, but it’s not a dramatic difference.

We did the same check for temperature:

In this case, there does not seem to be much of a pattern by weekday.

You can also visualize the association using boxplots:

You can also try violin plots—these show the full distribution better than boxplots, which only show quantiles.

All these reinforce that there are some small differences in weekend versus weekday patterns for mortality. There isn’t much pattern by weekday with temperature, so in this case weekday is unlikely to be a confounder (the same is not true with air pollution, which often varies based on commuting patterns and so can have stronger weekend/weekday differences). However, since it does help some in explaining variation in the health outcome, it might be worth including in our models anyway, to help reduce random noise.

Exploratory data analysis is an excellent tool for exploring your data before you begin fitting a statistical model, and you should get in the habit of using it regularly in your research. Dominici and Peng ( 2008a ) provide another walk-through of exploring this type of data, including some more advanced tools for exploring autocorrelation and time patterns.

3.5 Statistical modeling for a time series study

Now that we’ve explored the data typical of a time series study in climate epidemiology, we’ll look at how we can fit a statistical model to those data to gain insight into the relationship between the exposure and acute health effects. Very broadly, we’ll be using a statistical model to answer the question: How does the relative risk of a health outcome change as the level of the exposure changes, after controlling for potential confounders?

In the rest of this chapter and the next chapter, we’ll move step-by-step to build up to the statistical models that are now typically used in these studies. Along the way, we’ll discuss key components and choices in this modeling process. The statistical modeling is based heavily on regression modeling, and specifically generalized linear regression. To get the most out of this section, you may find it helpful to review regression modeling and generalized linear models. Some resources for that include Dunn and Smyth ( 2018 ) and James et al. ( 2013 ) .

One of the readings for this week, Vicedo-Cabrera, Sera, and Gasparrini ( 2019 ) , includes a section on fitting exposure-response functions to describe the association between daily mean temperature and mortality risk. This article includes example code in its supplemental material, with code for fitting the model to these time series data in the file named “01EstimationERassociation.r.” Please download that file and take a look at the code.

The model in the code may at first seem complex, but it is made up of a number of fairly straightforward pieces:

  • The model framework is a generalized linear model (GLM)
  • This GLM is fit assuming an error distribution and a link function appropriate for count data
  • The GLM is fit assuming an error distribution that is also appropriate for data that may be overdispersed
  • The model includes control for day of the week by including a categorical variable
  • The model includes control for long-term and seasonal trends by including a spline (in this case, a natural cubic spline ) for the day in the study
  • The model fits a flexible, non-linear association between temperature and mortality risk, also using a spline
  • The model fits a flexible non-linear association between temperature on the current day and a series of preceding days and mortality risk on the current day, using a distributed lag approach
  • The model jointly describes both of the two previous non-linear associations by fitting these two elements through one construct in the GLM, a cross-basis term

In this section and the next chapter, we will work through the elements, building up the code to get to the full model that is fit in Vicedo-Cabrera, Sera, and Gasparrini ( 2019 ) .

Fitting a GLM to time series data

The generalized linear model (GLM) framework unites a number of types of regression models you may have previously worked with. One basic regression model that can be fit within this framework is a linear regression model. However, the framework also allows you to fit, among others, logistic regression models (useful when the outcome variable can only take one of two values, e.g., success / failure or alive / dead) and Poisson regression models (useful when the outcome variable is a count or rate). This generalized framework brings some unity to these different types of regression models. From a practical standpoint, it has allowed software developers to easily provide a common interface to fit these types of models. In R, the common function call to fit GLMs is glm .

Within the GLM framework, the elements that separate different regression models include the link function and the error distribution. The error distribution encodes the assumption you are enforcing about how the errors after fitting the model are distributed. If the outcome data are normally distributed (a.k.a., follow a Gaussian distribution), after accounting for variance explained in the outcome by any of the model covariates, then a linear regression model may be appropriate. For count data—like numbers of deaths a day—this is unlikely, unless the average daily mortality count is very high (count data tend to come closer to a normal distribution the further their average gets from 0). For binary data—like whether each person in a study population died on a given day or not—normally distributed errors are also unlikely. Instead, in these two cases, it is typically more appropriate to fit GLMs with Poisson and binomial “families,” respectively, where the family designation includes an appropriate specification for the variance when fitting the model based on these outcome types.

The other element that distinguishes different types of regression within the GLM framework is the link function. The link function specifies the transformation that connects the expected value of the outcome to the combination of independent variables in the regression equation. With normally distributed data, an identity link is often appropriate: with this link, no transformation is applied, so the expected outcome is modeled directly as the combination of independent variables (i.e., it keeps its initial “identity”). With count data, a log link is often more appropriate, while with binomial data, a logit link is often used.

Finally, data will often not perfectly adhere to assumptions. For example, the Poisson family of GLMs assumes that the outcome follows a Poisson distribution. (The probability mass function for a Poisson distribution \(X \sim {\sf Poisson}(\mu)\) is denoted by \(f(k;\mu)=Pr[X=k]= \displaystyle \frac{\mu^{k}e^{-\mu}}{k!}\) , where \(k\) is the number of occurrences, and \(\mu\) is equal to the expected number of cases.) With this distribution, the variance is equal to the mean ( \(\mu=E(X)=Var(X)\) ). With real-life data, this assumption is often not valid, and in many cases the variance in real-life count data is larger than the mean. This can be accounted for when fitting a GLM by setting an error distribution that does not require the variance to equal the mean. Instead, both a mean value and something like a variance are estimated from the data, assuming an overdispersion parameter \(\phi\) so that \(Var(X)=\phi E(X)\) . In environmental epidemiology, time series models are often fit to allow for this overdispersion. This is because if the data are overdispersed but the model does not account for this, the standard errors on the estimates of the model parameters may be artificially small. If the data are not overdispersed ( \(\phi=1\) ), the model will identify this when being fit to the data, so it is typically better to allow for overdispersion in the model. (If the dataset were small, you might want to be parsimonious and avoid unneeded complexity in the model, but this is typically not the case with time series data.)
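
To make the idea of overdispersion concrete, here is a small sketch with simulated counts. The negative binomial draw is just one convenient way to generate overdispersed data and is an assumption of this example, not something taken from the study data. The sketch compares the sample mean and variance, and then shows that the dispersion reported by a quasi-Poisson fit equals the Pearson chi-squared statistic divided by the residual degrees of freedom.

```r
set.seed(42)
n <- 2000

# Overdispersed counts: negative binomial with mean 20 and
# variance 20 + 20^2 / 5 = 100, so the variance is about 5 times the mean
y <- rnbinom(n, mu = 20, size = 5)
c(mean = mean(y), variance = var(y))

# Intercept-only quasi-Poisson model; the dispersion is estimated from the data
fit <- glm(y ~ 1, family = quasipoisson)
phi_hat <- summary(fit)$dispersion

# The same estimate by hand: Pearson chi-squared over residual degrees of freedom
phi_manual <- sum(residuals(fit, type = "pearson")^2) / df.residual(fit)

c(phi_hat = phi_hat, phi_manual = phi_manual)
```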

In the next section, you will work through the steps of developing a GLM to fit the example dataset obs . For now, you will only fit a linear association between mean daily temperature and mortality risk, eventually including control for day of week. In later work, especially the next chapter, we will build up other components of the model, including control for the potential confounders of long-term and seasonal patterns, as well as advancing the model to fit non-linear associations, distributed by time, through splines, a distributed lag approach, and a cross-basis term.

Applied: Fitting a GLM to time series data

In R, the function call used to fit GLMs is glm . Most of you have likely covered GLMs, and ideally this function call, in previous courses. If you are unfamiliar with its basic use, you will want to refresh yourself on this topic—you can use some of the resources noted earlier in this section and in the chapter’s “Supplemental Readings” to do so.

  1. Fit a GLM to estimate the association between mean daily temperature (as the independent variable) and daily mortality count (as the dependent variable), first fitting a linear regression. (Since the mortality data are counts, we will want to shift to a different type of regression within the GLM framework, but this step allows you to develop a simple glm call, and to remember where to include the data and the independent and dependent variables within this function call.)
  2. Change your function call to fit a regression model in the Poisson family.
  3. Change your function call to allow for overdispersion in the outcome data (daily mortality count). How does the estimated coefficient for temperature change between the model fit for #2 and this model? Check both the central estimate and its estimated standard error.
  4. Change your function call to include control for day of week.

Fit a GLM to estimate the association between mean daily temperature (as the independent variable) and daily mortality count (as the dependent variable), first fitting a linear regression.

This is the model you are fitting:

\(Y_{t}=\beta_{0}+\beta_{1}X1_{t}+\epsilon\)

where \(Y_{t}\) is the mortality count on day \(t\) , \(X1_{t}\) is the mean temperature for day \(t\) and \(\epsilon\) is the error term. Since this is a linear model we are assuming a Gaussian error distribution \(\epsilon \sim {\sf N}(0, \sigma^{2})\) , where \(\sigma^{2}\) is the variance not explained by the covariates (here just temperature).

To do this, you will use the glm call. If you would like to save model fit results to use later, you assign the output a name as an R object ( mod_linear_reg in the example code). If your study data are in a dataframe, you can specify these data in the glm call with the data parameter. Once you do this, you can use column names directly in the model formula. In the model formula, the dependent variable is specified first ( all , the column for daily mortality counts for all ages, in this example), followed by a tilde ( ~ ), followed by all independent variables (only tmean in this example). If multiple independent variables are included, they are joined using + . We’ll see an example when we start adding control for confounders later.
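
A call along these lines might look like the sketch below. Because the study data are not reproduced here, the example simulates a small data frame, `obs_sim`, as a hypothetical stand-in: the column names `all` and `tmean` follow the text, but the values are made up.

```r
# Simulated stand-in for `obs`: column names follow the text, values are made up
set.seed(42)
n <- 1000
obs_sim <- data.frame(tmean = rnorm(n, mean = 11, sd = 6))
obs_sim$all <- rpois(n, lambda = exp(5 - 0.01 * obs_sim$tmean))

# With no family argument, glm() uses the gaussian family with an identity
# link, which is a linear regression
mod_linear_reg <- glm(all ~ tmean, data = obs_sim)

coef(mod_linear_reg)
```

Since the default family is Gaussian, this fit gives the same coefficient estimates as `lm(all ~ tmean, data = obs_sim)`.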

Once you have fit a model and assigned it to an R object, you can explore it and use resulting values. First, the print method for a regression model gives some summary information. This method is automatically called if you enter the model object’s name at the console:

More information is printed if you run the summary method on the model object:
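
As a brief sketch, the coefficient table can also be pulled out of the summary as a matrix, which is handy when you want specific values rather than printed output. The data frame `obs_sim` is again a simulated, hypothetical stand-in for the study data.

```r
# Simulated stand-in for `obs` (values are made up)
set.seed(42)
n <- 1000
obs_sim <- data.frame(tmean = rnorm(n, mean = 11, sd = 6))
obs_sim$all <- rpois(n, lambda = exp(5 - 0.01 * obs_sim$tmean))
mod_linear_reg <- glm(all ~ tmean, data = obs_sim)

# summary() returns an object you can store and index
mod_summary <- summary(mod_linear_reg)

# Matrix with one row per coefficient and columns for the estimate,
# standard error, test statistic, and p-value
coef_table <- coef(mod_summary)
coef_table["tmean", c("Estimate", "Std. Error")]
```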

Make sure you are familiar with the information provided from the model object, as well as how to interpret values like the coefficient estimates and their standard errors and p-values. These basic elements should have been covered in previous coursework (even if a different programming language was used to fit the model), and so we will not be covering them in great depth here, but instead focusing on some of the more advanced elements of how regression models are commonly fit to data from time series and case-crossover study designs in environmental epidemiology. For a refresher on the basics of fitting statistical models in R, you may want to check out Chapters 22 through 24 of Wickham and Grolemund ( 2016 ) , a book that is available online, as well as Dunn and Smyth ( 2018 ) and James et al. ( 2013 ) .

Finally, there are some newer tools for extracting information from model fit objects. The broom package extracts different elements from these objects and returns them in a “tidy” data format, which makes it much easier to use the output further in analysis with functions from the “tidyverse” suite of R packages. These tools are very popular and powerful, and so the broom tools can be very useful in working with output from regression modeling in R.

The broom package includes three main functions for extracting data from regression model objects. First, the glance function returns overall data about the model fit, including the AIC and BIC:

The tidy function returns data at the level of the model coefficients, including the estimate for each model parameter, its standard error, test statistic, and p-value.

Finally, the augment function returns data at the level of the original observations, including the fitted value for each observation, the residual between the fitted and true value, and some measures of influence on the model fit.
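
The three functions can be tried out as in the sketch below, again on a simulated, hypothetical stand-in for the study data ( `obs_sim` ), so the numbers will differ from those for the real data.

```r
library(broom)

# Simulated stand-in for `obs` (values are made up)
set.seed(42)
n <- 1000
obs_sim <- data.frame(tmean = rnorm(n, mean = 11, sd = 6))
obs_sim$all <- rpois(n, lambda = exp(5 - 0.01 * obs_sim$tmean))
mod_linear_reg <- glm(all ~ tmean, data = obs_sim)

# One row for the whole model, with fit statistics such as AIC and BIC
model_level <- glance(mod_linear_reg)

# One row per coefficient: term, estimate, std.error, statistic, p.value
coef_level <- tidy(mod_linear_reg)

# One row per observation, with added columns such as .fitted and .resid
obs_level <- augment(mod_linear_reg)

head(obs_level)
```

Because each function returns a data frame, the output can be passed straight into plotting or data-manipulation code.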

One way you can use augment is to graph the fitted values for each observation after fitting the model:

For more on the broom package, including some excellent examples of how it can be used to streamline complex regression analyses, see Robinson ( 2014 ) . There is also a nice example of how it can be used in one of the chapters of Wickham and Grolemund ( 2016 ) , available online at https://r4ds.had.co.nz/many-models.html .

A linear regression is often not appropriate when fitting a model where the outcome variable provides counts, as with the example data, since such data often don’t follow a normal distribution. A Poisson regression is typically preferred.

For a count distribution where \(Y \sim {\sf Poisson(\mu)}\) we typically fit a model such as

\(g(Y)=\beta_{0}+\beta_{1}X1\) , where \(g()\) represents the link function, in this case a log function so that \(log(Y)=\beta_{0}+\beta_{1}X1\) . We can also express this as \(Y=exp(\beta_{0}+\beta_{1}X1)\) .

In the glm call, you can specify this with the family parameter, for which “poisson” is one choice.
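
A minimal sketch of this change is below. The data frame `obs_sim` is a simulated, hypothetical stand-in for the study data, and the object name `mod_pois_reg` is chosen here only for illustration.

```r
# Simulated stand-in for `obs` (values are made up)
set.seed(42)
n <- 1000
obs_sim <- data.frame(tmean = rnorm(n, mean = 11, sd = 6))
obs_sim$all <- rpois(n, lambda = exp(5 - 0.01 * obs_sim$tmean))

# Same formula and data as a linear fit, but with the Poisson family
mod_pois_reg <- glm(all ~ tmean, data = obs_sim, family = "poisson")

# The Poisson family uses a log link by default
family(mod_pois_reg)$link

coef(mod_pois_reg)
```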

One thing to keep in mind with this change is that the model now uses a non-identity link between the combination of independent variable(s) and the dependent variable. You will need to keep this in mind when you interpret the estimates of the regression coefficients. While the coefficient estimate for tmean from the linear regression could be interpreted as the expected increase in mortality counts for a one-unit (i.e., one degree Celsius) increase in temperature, now the estimated coefficient should be interpreted as the expected increase in the natural log-transform of mortality count for a one-unit increase in temperature.

You can see this even more clearly if you take a look at the association between temperature for each observation and the expected mortality count fit by the model. First, if you look at the fitted values without transforming, they will still be in a state where mortality count is log-transformed. You can see by looking at the range of the y-scale that these values are for the log of expected mortality, rather than expected mortality (compare, for example, to the similar plot shown from the first model, which was linear), and that the fitted association for that transformation , not for untransformed mortality counts, is linear:

You can use exponentiation to transform the fitted values back to just be the expected mortality count based on the model fit. Once you make this transformation, you can see how the link in the Poisson family specification enforced a curved relationship between mean daily temperature and the untransformed expected mortality count.
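
The two scales can be compared directly, as in the sketch below. It uses base R's `predict` on a simulated, hypothetical stand-in for the study data ( `obs_sim` ), which is one of several ways to get fitted values on either scale.

```r
# Simulated stand-in for `obs` (values are made up)
set.seed(42)
n <- 1000
obs_sim <- data.frame(tmean = rnorm(n, mean = 11, sd = 6))
obs_sim$all <- rpois(n, lambda = exp(5 - 0.01 * obs_sim$tmean))
mod_pois_reg <- glm(all ~ tmean, data = obs_sim, family = "poisson")

# Fitted values on the scale of the link: log of expected mortality count
log_fit <- predict(mod_pois_reg, type = "link")

# Fitted values on the scale of the outcome: expected mortality count
count_fit <- predict(mod_pois_reg, type = "response")

# Exponentiating the link-scale values recovers the expected counts
all.equal(exp(log_fit), count_fit)

range(log_fit)    # around 5 for these simulated data
range(count_fit)  # in the range of the daily counts themselves
```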

For this model, we can interpret the coefficient for the temperature covariate as the expected log relative risk in the health outcome associated with a one-unit increase in temperature. We can exponentiate this value to get an estimate of the relative risk:

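If you want to estimate the confidence interval for this estimate, you should calculate the interval on the log scale first (from the coefficient estimate and its standard error) and then exponentiate its lower and upper bounds, rather than exponentiating the estimate first.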
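
As a sketch of that calculation, the code below builds a 95% Wald interval on the log scale and then exponentiates it. The use of a Wald interval and the simulated data frame `obs_sim` are assumptions of this example; other interval methods exist.

```r
# Simulated stand-in for `obs` (values are made up)
set.seed(42)
n <- 1000
obs_sim <- data.frame(tmean = rnorm(n, mean = 11, sd = 6))
obs_sim$all <- rpois(n, lambda = exp(5 - 0.01 * obs_sim$tmean))
mod_pois_reg <- glm(all ~ tmean, data = obs_sim, family = "poisson")

# Log relative risk per one-unit increase in temperature, with its standard error
coef_table <- coef(summary(mod_pois_reg))
log_rr <- coef_table["tmean", "Estimate"]
se_log_rr <- coef_table["tmean", "Std. Error"]

# 95% Wald interval on the log scale
log_ci <- log_rr + c(-1, 1) * qnorm(0.975) * se_log_rr

# Exponentiate the estimate and the interval bounds to get the relative risk
rr <- exp(log_rr)
rr_ci <- exp(log_ci)
c(rr = rr, lower = rr_ci[1], upper = rr_ci[2])
```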

In the R glm call, there is a family that is similar to Poisson (including using a log link), but that allows for overdispersion. You can specify it with the “quasipoisson” choice for the family parameter in the glm call:

When you use this family, there will be some new information in the summary for the model object. It will now include a dispersion parameter ( \(\phi\) ). If this is close to 1, then the data were close to the assumed variance for a Poisson distribution (i.e., there was little evidence of overdispersion). In the example, the overdispersion is around 5, suggesting the data are overdispersed (this might come down some when we start including independent variables that explain some of the variation in the outcome variable, like long-term and seasonal trends).

If you compare the estimates of the temperature coefficient from the Poisson regression with those when you allow for overdispersion, you’ll see something interesting:

The central estimate ( estimate column) is very similar. However, the estimated standard error is larger when the model allows for overdispersion. This indicates that the Poisson model was too simple, and that its inherent assumption that data were not overdispersed was problematic. If you naively used a Poisson regression in this case, then you would estimate a confidence interval on the temperature coefficient that would be too narrow. This could cause you to conclude that the estimate was statistically significant when you should not have (although in this case, the estimate is statistically significant under both models).
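
The comparison can be sketched as below with simulated, overdispersed counts. The data frame `obs_sim`, the object names, and the negative binomial data-generating step are all assumptions of this example. In the sketch, the two fits give the same central estimate, and the standard error under the quasi-Poisson model is the Poisson standard error multiplied by the square root of the estimated dispersion.

```r
# Simulated, overdispersed stand-in for `obs` (values are made up)
set.seed(42)
n <- 1500
obs_sim <- data.frame(tmean = rnorm(n, mean = 11, sd = 6))
obs_sim$all <- rnbinom(n, mu = exp(5 - 0.01 * obs_sim$tmean), size = 30)

mod_pois_reg <- glm(all ~ tmean, data = obs_sim, family = "poisson")
mod_ovdisp_reg <- glm(all ~ tmean, data = obs_sim, family = "quasipoisson")

# Estimated dispersion parameter from the quasi-Poisson fit
phi_hat <- summary(mod_ovdisp_reg)$dispersion

# Estimate and standard error for temperature under each model
comparison <- rbind(
  poisson = coef(summary(mod_pois_reg))["tmean", c("Estimate", "Std. Error")],
  quasipoisson = coef(summary(mod_ovdisp_reg))["tmean", c("Estimate", "Std. Error")]
)
comparison

# Ratio of the two standard errors, next to sqrt(phi_hat)
comparison["quasipoisson", "Std. Error"] / comparison["poisson", "Std. Error"]
sqrt(phi_hat)
```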

Day of week is included in the data as a categorical variable, using a data type in R called a factor. You are now essentially fitting this model:

\(log(Y)=\beta_{0}+\beta_{1}X1+\gamma^{'}X2\) ,

where \(X2\) is a categorical variable for day of the week and \(\gamma^{'}\) represents a vector of parameters associated with each category.

It is pretty straightforward to include factors as independent variables in calls to glm : you just add the column name to the list of other independent variables with a + . In this case, we need to do one more step: earlier, we added order to dow , so it would “remember” the order of the week days (Monday before Tuesday, etc.). However, we need to strip off this order before we include the factor in the glm call. One way to do this is with the factor call, specifying ordered = FALSE . Here is the full call to fit this model:
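
A sketch of such a call is below, using a simulated, hypothetical stand-in for the study data. The weekday column is built in base R as an ordered factor so that the ordering has to be stripped, and the object name `mod_ctrl_dow` is chosen only for illustration.

```r
# Simulated stand-in for `obs` (values are made up)
set.seed(42)
obs_sim <- data.frame(
  date = seq(as.Date("2000-01-01"), by = "day", length.out = 1400)
)
obs_sim$tmean <- rnorm(nrow(obs_sim), mean = 11, sd = 6)

# Ordered weekday factor, with Sunday as the first level
obs_sim$dow <- factor(
  as.POSIXlt(obs_sim$date)$wday,
  levels = 0:6,
  labels = c("Sun", "Mon", "Tue", "Wed", "Thu", "Fri", "Sat"),
  ordered = TRUE
)

# Slightly higher expected mortality on weekdays than on weekends
weekday_effect <- ifelse(obs_sim$dow %in% c("Sat", "Sun"), 0, 0.03)
obs_sim$all <- rpois(
  nrow(obs_sim),
  lambda = exp(5 - 0.01 * obs_sim$tmean + weekday_effect)
)

# Strip the ordering inside the formula so each weekday gets its own
# coefficient relative to the first level (Sunday)
mod_ctrl_dow <- glm(all ~ tmean + factor(dow, ordered = FALSE),
                    data = obs_sim, family = "quasipoisson")

names(coef(mod_ctrl_dow))
```

If the ordering were left in place, R would by default code the ordered factor with polynomial contrasts, and the coefficients would no longer compare each weekday with a baseline day.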

When you look at the summary for the model object, you can see that the model has fit a separate model parameter for six of the seven weekdays. The one weekday that isn’t fit (Sunday in this case) serves as a baseline —these estimates specify how the log of the expected mortality count is expected to differ on, for example, Monday versus Sunday (by about 0.03), if the temperature is the same for the two days.

You can also see from this summary that the coefficients for the day of the week are all statistically significant. Even though we didn’t see a big difference in mortality counts by day of week in our exploratory analysis, this suggests that it does help explain some variance in mortality observations and will likely be worth including in the final model.

The model now includes day of week when fitting an expected mortality count for each observation. As a result, if you plot fitted values of expected mortality versus mean daily temperature, you’ll see some “hoppiness” in the fitted line:

This is because each fitted value is also incorporating the expected influence of day of week on the mortality count, and that varies across the observations (i.e., you could have two days with the same temperature, but different expected mortality from the model, because they occur on different days).

If you plot the model fits separately for each day of the week, you’ll see that the line is smooth across all observations from the same day of the week:

Wrapping up

At this point, the coefficient estimates suggest that risk of mortality tends to decrease as temperature increases. Do you think this is reasonable? What else might be important to build into the model based on your analysis up to this point?

  • Open access
  • Published: 29 September 2022

Case-only approach applied in environmental epidemiology: 2 examples of interaction effect using the US National Health and Nutrition Examination Survey (NHANES) datasets

  • Jinyoung Moon (ORCID: orcid.org/0000-0001-5948-823X) 1,2
  • Hwan-Cheol Kim (ORCID: orcid.org/0000-0002-3635-1297) 2,3

BMC Medical Research Methodology, volume 22, Article number: 254 (2022)

Introduction

By substituting the general ‘susceptibility factor’ concept for the conventional ‘gene’ concept in the case-only approach for gene-environment interaction, the case-only approach can also be used in environmental epidemiology. Under the independence between the susceptibility factor and environmental exposure, the case-only approach can provide a more precise estimate of an interaction effect.

Two analysis examples of the case-only approach in environmental epidemiology are provided using the 2015–2016 and 2017–2018 US National Health and Nutrition Examination Survey (NHANES): (i) the negative interaction effect between blood chromium level and glycohemoglobin level on albuminuria and (ii) the positive interaction effect between blood cobalt level and old age on albuminuria. The second part of the methods (theoretical backgrounds) summarizes the logic and equations provided in previous studies about the case-only approach.

(i) When a 1 μg/L difference in blood chromium level and a 1% difference in blood glycohemoglobin level coincide, the multiplicative interaction contrast ratio (ICR c/nc ) was 0.72 (95% CI 0.35–1.60), which was not statistically significant. However, when only the cases were analyzed, the case-only ICR (ICR CO ) was 0.59 (95% CI 0.28–0.95), which was statistically significant (a negative interaction effect). (ii) When a 1 μg/L difference in blood cobalt level and a 1-year difference in age coincide, the multiplicative interaction contrast ratio (ICR c/nc ) was 1.13 (95% CI 0.99–1.37), which was not statistically significant. However, when only the cases were analyzed, the case-only ICR (ICR CO ) was 1.21 (95% CI 1.06–1.51), which was statistically significant (a positive interaction effect).

The discussion presents the theoretical background and previous literature about the possible protective interaction effect between blood chromium levels and blood glycohemoglobin levels on the incidence of albuminuria and the possible aggravating interaction effect between blood cobalt levels and increasing age on the incidence of albuminuria. If the independence assumption between a susceptibility factor and environmental exposure holds in a study with cases and non-cases, the case-only approach can provide a more precise interaction effect estimate than conventional approaches with both cases and non-cases.

The estimation of an interaction effect has often been conducted in cohort or case-control studies using information from both cases and controls [ 1 , 2 , 3 , 4 ]. However, a case-only approach can be a valid alternative and may even have advantages under certain circumstances over conventional approaches that use information from both cases and controls.

The case-only approach is used to calculate the interaction effect estimate. This unique approach is mainly used in gene-environmental and gene-gene interaction studies in genetic epidemiology [ 5 , 6 , 7 , 8 ]. However, if the ‘gene’ concept in the gene-environmental interaction could indicate a type of ‘susceptibility factor,’ the term ‘gene-environment interaction’ in genetic epidemiology can be replaced with the ‘susceptibility factor-environmental exposure interaction’ in environmental epidemiology.

The case-only approach can provide 2 benefits over a study with cases and non-cases or conventional cohort/case-control studies to estimate the interaction effect between a susceptibility factor and an environmental exposure [ 5 , 7 , 8 , 9 , 10 , 11 , 12 , 13 ]. The first is that a more precise interaction effect estimate can be calculated. The second is that this approach can estimate the interaction effect when appropriate controls are unavailable. However, this case-only approach requires an important condition between the susceptibility factor and the environmental exposure studied: independence [ 5 , 14 ]. If this independence assumption between a susceptibility factor and an environmental exposure is not fulfilled, the case-only interaction estimate might be severely biased relative to the interaction effect estimate acquired from a study with cases and non-cases.

This study will summarize all logic, definitions, and equations about the case-only approach through various study types, including case-only studies and a study with cases and non-cases, including case-control and cohort studies. In addition, this study will deal with important assumptions and the relationship among these assumptions, which are required for the reliable estimation of the interaction effect in the case-only approach. Possible corrective strategies for the violation of the independence assumption will also be dealt with. Finally, 2 analysis examples of the case-only approach will be illustrated using the US NHANES dataset. This study can clarify the logic and equations of the case-only approach and contribute to applying the case-only approach of genetic epidemiology to environmental epidemiology.

Methods: application for real data – 2 examples

In this study, 2 analysis examples using the US National Health and Nutrition Examination Survey (NHANES) data will be provided ( https://www.cdc.gov/nchs/nhanes/index.htm ). The case-only approach applied in environmental epidemiology will be explained using this dataset.

The preventive (negative) interaction effect between blood chromium level and glycohemoglobin level on albuminuria (micro and macro)

The laboratory data of NHANES 2015–2016 and NHANES 2017–2018 datasets were used. The blood chromium levels (mcg/L) were used as the environmental exposure variable, and the glycohemoglobin levels (%) were used as the susceptibility factor variable. The albumin creatinine ratio (mg/g) was the outcome (disease) variable.

The chromium level of 1.4 mcg/L was set as the cut-off point between normal and abnormal chromium levels. The albumin creatinine ratio of 300 mg/g was set as the cut-off point between normal and albuminuria (micro and macro). Both micro-albuminuria and macro-albuminuria were categorized in the single ‘albuminuria’ category. Glycohemoglobin level was used as a continuous variable without conversion to a categorical variable. Because of possible confounding due to diabetes treatment (glucose-lowering medications), all respondents who answered ‘yes’ to the question ‘take diabetic pills to lower blood sugar’ were excluded from the analysis.

The aggravating (positive) interaction effect between blood cobalt level and old age on albuminuria (micro and macro)

The laboratory data and demographics data of NHANES 2015–2016 and NHANES 2017–2018 datasets were used. The blood cobalt level (mcg/L) in laboratory data was used as the environmental exposure variable, and age in years in demographics data was used as the susceptibility factor variable. Albumin creatinine ratio (mg/g) in laboratory data was used as the outcome variable.

The cobalt level of 1.8 mcg/L was set as the cut-off point between normal and abnormal cobalt levels. The albumin creatinine ratio of 300 mg/g was set as the cut-off point between normal and albuminuria. Both micro-albuminuria and macro-albuminuria were categorized as a single ‘albuminuria’ category. Age in years was applied as a continuous variable without conversion to a categorical variable.

Calculation of estimates

All abbreviations used in this article are provided in Table  1 . The estimates were calculated in five steps.

First, the estimate, with an appropriate confidence interval, for the fold-difference in the odds of albuminuria associated with a unit difference in the environmental exposure was calculated: the blood chromium level in the first example and the blood cobalt level in the second example.

Second, the estimate, with an appropriate confidence interval, for the fold-difference in the odds of albuminuria associated with a unit difference in the susceptibility factor was calculated: the blood glycohemoglobin level in the first example and age in years in the second example.

Third, the estimate, with an appropriate confidence interval, for the multiplicative ICR associated with a difference of one unit in both the environmental exposure and the susceptibility factor was calculated: the blood chromium level and the blood glycohemoglobin level in the first example, and the blood cobalt level and age in years in the second example.

Fourth, the independence between the environmental exposure and the susceptibility factor was assessed in the whole sample, including cases and non-cases: between the blood chromium level and the blood glycohemoglobin level in the first example, and between the blood cobalt level and age in years in the second example.

Fifth, only if the independence assessed in the fourth step was plausible was the multiplicative ICR calculated using only cases. If the independence was not plausible, the multiplicative ICR calculated from only cases was adjusted based on theoretical equations (multiplied by the S-E OR c/nc ).

After these steps, the authors concluded whether the estimate derived from only cases is more precise than the estimate obtained from both cases and non-cases.

Statistical method and software

A logistic regression model was applied for the calculation of odds ratios. The R software version 4.0.3 was used. The packages ‘dplyr’ and ‘data.table’ were used for the pre-processing of the datasets. The R code used is provided in Supplementary material A .
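
The authors' own code is in their supplementary material and is not reproduced here. As a rough illustration only, the sketch below fits a logistic regression with an interaction term to hypothetical grouped counts (not NHANES data), with a binary susceptibility factor `s` and a binary exposure `e`. Both variable names and the binary coding are assumptions of this sketch, whereas the article treats the susceptibility factor as continuous. In a model of this form, the exponentiated interaction coefficient is the odds ratio for the joint category divided by the product of the two individual odds ratios.

```r
# Hypothetical grouped counts (not NHANES data)
# s = susceptibility factor (0/1), e = environmental exposure (0/1)
tab <- data.frame(
  s = c(0, 0, 1, 1),
  e = c(0, 1, 0, 1),
  case = c(20, 30, 40, 90),
  noncase = c(200, 150, 160, 90)
)

# Logistic regression with main effects and their interaction
fit <- glm(cbind(case, noncase) ~ s * e, data = tab, family = binomial)

# Exponentiated coefficients: baseline odds, OR for s alone,
# OR for e alone, and the multiplicative interaction term
or_table <- exp(coef(fit))
or_table
```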

Methods: theoretical backgrounds

Basic assumption: the joint RR/OR and the ICR on the multiplicative scale

Statistical interactions between the effects of susceptibility factors and those of environmental factors can be assessed as departures from multiplicativity of effects or as departures from additivity of effects. Table 2 shows an example of a study with cases and non-cases. With the group that is unexposed and has no susceptibility factor (E-G-) set as the reference group, we can calculate the relative risk (RR) and the odds ratio (OR) for each of the other 3 groups.

The joint RR for the susceptibility factor and environmental exposure (RR se ) can be compared with the RR for environmental exposure alone (RR e ) or with the RR for susceptibility factor alone (RR s ). The joint OR for the susceptibility factor and environmental exposure (OR se ) can be compared with the OR for environmental exposure alone (OR e ) or with the OR for susceptibility factor alone (OR s ). In the joint RR model with the additive scale, the ICR (ICR c/nc ) indicates the departures from the sum of individual RRs minus one (ICR c/nc  = RR se -(RR s  + RR e -1)). This equation is called ‘relative excess risk due to interaction (RERI)’ in epidemiologic literature [ 15 ]. In the joint OR model with the additive scale, the ICR (ICR c/nc ) indicates the departures from the sum of individual ORs minus one (ICR c/nc  = OR se -(OR s  + OR e -1)). In the joint RR model with the multiplicative scale, the ICR (ICR c/nc ) indicates the departures from the product of individual RRs (ICR c/nc  = RR se /(RR s  × RR e )). In the joint OR model with the multiplicative scale, the ICR (ICR c/nc ) indicates the departures from the product of individual ORs (ICR c/nc  = OR se /(OR s  × OR e )). In this article, we used only the joint RR or the joint OR model with the multiplicative scale to estimate the ICR c/nc .
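As a small numeric illustration of the two scales, the sketch below computes the additive ICR (RERI) and the multiplicative ICR from three relative risks. The function names and the input values are hypothetical and are not taken from the article's data.

```python
def icr_additive(rr_se, rr_s, rr_e):
    """Additive-scale ICR (RERI): joint RR minus (RR_s + RR_e - 1)."""
    return rr_se - (rr_s + rr_e - 1.0)


def icr_multiplicative(rr_se, rr_s, rr_e):
    """Multiplicative-scale ICR: joint RR divided by the product RR_s * RR_e."""
    return rr_se / (rr_s * rr_e)


# Hypothetical relative risks, chosen only for illustration.
rr_s, rr_e, rr_se = 2.0, 3.0, 6.0

print(icr_additive(rr_se, rr_s, rr_e))        # 2.0
print(icr_multiplicative(rr_se, rr_s, rr_e))  # 1.0
```

With these made-up inputs the joint effect equals the product of the individual effects (multiplicative ICR of 1) while still exceeding their additive expectation (RERI of 2), so whether an interaction is present depends on the scale chosen. The same functions apply to ORs in place of RRs.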

The ICR in a case-only study and the ICR in a study with cases and non-cases

Table 2 illustrates the composition of a study with cases and non-cases. To generate case-only data from the above source population, we extracted only the ‘case’ column in Table  3 .

The ICR in a case-only study will be as follows:

The ICR in a study with cases and non-cases will be as follows:

In Eq. ( 2 ), (ag/ce) is converted into ICR co obtained in the case-only study. ICR c/nc is the ICR calculated in a study with cases and non-cases. From Eq. ( 2 ), the requirement for the equality between the ICR acquired from a study with cases and non-cases and the ICR acquired from the case-only study is as follows:

Equation (3) means that, for the ICR acquired from a study with cases and non-cases to equal the ICR acquired from the case-only study, the environmental exposure and the susceptibility factor must be independent in the study with cases and non-cases. Note from Eqs. (2) and (3) that this equality does not require a rare disease assumption (a low prevalence of the disease).
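The displayed forms of Eqs. (1) to (3) are not reproduced in this text. A reconstruction consistent with the surrounding statements is sketched below. It assumes that Table 2 labels the cases in the S+E+, S+E-, S-E+ and S-E- groups as a, c, e and g, and the corresponding non-cases as B, D, F and H; this labelling is inferred from the factor ag/ce and from the S-E OR c/nc expression quoted later in this section, and it is not confirmed by the table itself. The RR-based form of the ICR is used.

```latex
\begin{align}
\mathrm{ICR}_{\mathrm{co}} &= \frac{ag}{ce} \tag{1}\\[4pt]
\mathrm{ICR}_{\mathrm{c/nc}}
  &= \frac{\dfrac{a}{a+B}\cdot\dfrac{g}{g+H}}{\dfrac{c}{c+D}\cdot\dfrac{e}{e+F}}
   = \left(\frac{ag}{ce}\right)\left(\frac{(c+D)(e+F)}{(a+B)(g+H)}\right) \tag{2}\\[4pt]
\frac{(c+D)(e+F)}{(a+B)(g+H)} &= 1 \tag{3}
\end{align}
```

In this orientation the second factor in Eq. (2) is the reciprocal of the conventional cross-product odds ratio between S and E in the whole population. Because Eq. (2) is built from risks rather than odds, no rare disease approximation enters the derivation.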

The above equations in this subsection can be understood from the context of a logistic model, with other covariates adjusted. The following equations indicate a conventional logistic regression model for a case-only study:

When E is a categorical or continuous variable for environmental exposure status, a case-only estimate for the interaction effect can be obtained using Eq. ( 5 ).

We can also assess the independence between an environmental factor and a susceptibility factor in a study with cases and non-cases from the context of a logistic model using the following equations:

According to the independence assumption in Eq. (3), the environmental exposure and the susceptibility factor must be independent in the population with cases and non-cases for the ICR obtained in that population to equal the ICR obtained in the case-only study. In the context of a logistic model, this means that the point estimate for Eq. (7) must be close to 1 and that its confidence interval must include 1.

We can also calculate the ICR obtained in the population with cases and non-cases from the context of a logistic model, using the following equation:
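The logistic models referred to in this subsection are not reproduced in this text. A minimal sketch of the two models whose interaction coefficients are named later in the article (γ1 for the case-only model and β3 for the model with cases and non-cases) might look as follows. Treating a binary S as the response in the case-only model, the intercepts and main-effect coefficient names, and the omission of adjustment covariates are all assumptions of this sketch.

```latex
\begin{align}
\operatorname{logit} P(S=1 \mid E,\ \text{case}) &= \gamma_0 + \gamma_1 E,
  & \mathrm{ICR}_{\mathrm{co}} &= \exp(\gamma_1) \\
\operatorname{logit} P(D=1 \mid S, E) &= \beta_0 + \beta_1 S + \beta_2 E + \beta_3 S E,
  & \mathrm{ICR}_{\mathrm{c/nc}} &= \exp(\beta_3)
\end{align}
```

Adjustment covariates, when used, would enter each linear predictor as additional terms.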

The ICR in a case-control study

We can define the susceptibility-environment ICR acquired from a case-control study in the model with the multiplicative scale as follows:

ICR cc : the ICR calculated in a case-control study.

ICR cc  > 1: The joint OR is larger than the product of each individual OR.

ICR cc  < 1: The joint OR is smaller than the product of each individual OR.

ICR cc  = 1: The joint OR is the same as the product of each individual OR.

If the joint OR is larger than the product of each individual OR, the ICR cc will be larger than 1. If the joint OR is smaller than the product of each individual OR, the ICR cc will be smaller than 1. If the joint OR is the same as the product of each individual OR, the ICR cc will be 1.

The ICR in a case-only study and the ICR in a case-control study

For the generation of the case-control study data, a fraction (p) of controls in each group was selected from the population with cases and non-cases in Table  4 .

The ICR in a case-control study can be calculated as follows:

In Eq. ( 11 ), the requirement for equality between ICR cc and ICR co is as follows:

Equation ( 12 ) means that for the equality between ICR cc and ICR co , the susceptibility factor and environmental exposure must be independent in the control population. A rare disease assumption is also not required for this equality.
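The relationship described above can be checked numerically. The sketch below uses hypothetical counts and assumes the cell labelling a, c, e, g for cases and b, d, f, h for controls in the S+E+, S+E-, S-E+ and S-E- groups; the function names are illustrative only. It shows that the sampling fraction p cancels and that ICR cc equals ICR co multiplied by df/bh.

```python
import math


def icr_case_only(a, c, e, g):
    """Cross-product ratio among cases: ag / ce."""
    return (a * g) / (c * e)


def icr_case_control(a, b, c, d, e, f, g, h):
    """Joint OR divided by the product of the two individual ORs."""
    or_se = (a * h) / (b * g)  # S+E+ versus S-E-
    or_s = (c * h) / (d * g)   # S+E- versus S-E-
    or_e = (e * h) / (f * g)   # S-E+ versus S-E-
    return or_se / (or_s * or_e)


def se_or_control(b, d, f, h):
    """S-E factor among controls in the orientation df / bh."""
    return (d * f) / (b * h)


def sample_controls(non_cases, p):
    """Keep the same fraction p of non-cases in every group."""
    return tuple(p * n for n in non_cases)


cases = (40, 20, 30, 30)              # a, c, e, g (hypothetical)
non_cases = (1000, 2000, 3000, 6000)  # B, D, F, H (hypothetical, DF/BH = 1)
a, c, e, g = cases

for p in (0.1, 0.5):
    b, d, f, h = sample_controls(non_cases, p)
    icr_cc = icr_case_control(a, b, c, d, e, f, g, h)
    product = icr_case_only(a, c, e, g) * se_or_control(b, d, f, h)
    print(p, icr_cc, product)
```

Because DF/BH is 1 in these made-up non-cases, ICR cc and ICR co coincide for any sampling fraction, and no assumption about disease rarity is needed for that equality.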

We can also calculate the ICR in a case-control study from the context of a logistic model, using the following equation:

The ICR in a study with cases and non-cases and the ICR in a case-control study

The equality between ICR cc and ICR co does not guarantee that these 2 estimates are unbiased for the ICR acquired from the population with cases and non-cases (ICR c/nc). Based on Eqs. (2) and (11), we can get the following equation:

In Eq. ( 15 ), for the equality between ICR cc and ICR c/nc , the following equation or at least 1 of 2 conditions suggested below should be met:

Equation ( 16 ) means that for the equality between ICR cc and ICR c/nc , the susceptibility factor and the environmental exposure must be independent both in the population with cases and non-cases and in the controls. Alternatively, if the disease is rare, Eq. ( 16 ) will be satisfied. In this case, the rare disease assumption must be examined in the population with cases and non-cases.
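The displayed equations of this subsection are not reproduced in this text. Under the assumed labelling (cases a, c, e, g; non-cases B, D, F, H; controls b, d, f, h sampled with a common fraction p, so that df/bh = DF/BH), a reconstruction consistent with the statements above would be the following. The first line corresponds to the case-control expression, the second to the comparison of ICR cc with ICR c/nc, and the third to the condition for their equality; the exact form and numbering used by the article are assumptions.

```latex
\begin{align}
\mathrm{ICR}_{\mathrm{cc}}
  &= \left(\frac{ag}{ce}\right)\left(\frac{df}{bh}\right)
   = \mathrm{ICR}_{\mathrm{co}}\left(\frac{DF}{BH}\right) \\[4pt]
\frac{\mathrm{ICR}_{\mathrm{cc}}}{\mathrm{ICR}_{\mathrm{c/nc}}}
  &= \left(\frac{DF}{BH}\right)\left(\frac{(a+B)(g+H)}{(c+D)(e+F)}\right) \\[4pt]
\frac{DF}{BH} &= \frac{(c+D)(e+F)}{(a+B)(g+H)}
\end{align}
```

The last condition holds when both sides equal 1 (independence in the controls and in the whole population). It also holds approximately when the disease is rare, because a, c, e and g are then negligible next to B, D, F and H, so the right-hand side reduces to DF/BH.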

S-E independence in the population with cases and non-cases and S-E independence in the controls: one cannot replace the other

If we evaluate Eq. ( 16 ) in detail, we can find an important relationship. The S-E independence in the controls is a totally different concept from the S-E independence in the population with cases and non-cases: one cannot replace the other.

For the first equal sign, S-E OR control  = 1 is required according to Eq. ( 11 ).

For the second equal sign, S-E OR c/nc  = 1 is required according to Eq. ( 2 ).

If the disease is rare, \({\mathrm{ICR}}_{\mathrm{cc}}=\left({\mathrm{ICR}}_{\mathrm{co}}\right)\left(\frac{\mathrm{DF}}{\mathrm{BH}}\right)\) according to Eq. ( 11 ), and \({\mathrm{ICR}}_{\mathrm{c}/\mathrm{nc}}=\left({\mathrm{ICR}}_{\mathrm{c}\mathrm{o}}\right)\left(\frac{\mathrm{DF}}{\mathrm{BH}}\right)\) according to Eq. ( 2 ).

If a researcher checks whether the S-E OR control equals 1, instead of whether the S-E OR c/nc equals 1, to judge the validity of using ICR co in place of ICR c/nc, this misuse can lead to either the mistaken rejection of a valid ICR co or the mistaken acceptance of an invalid ICR co.

In Supplementary material B, two examples from Gatto et al. [8] illustrate this problem. In the first example, S and E are independent in the population including cases and non-cases (S-E OR c/nc = 1). The interaction estimate in that population (i.e., ICR c/nc) is 2.5, and the ICR co is also 2.5. In this situation, the S-E OR control of 0.7 is not a reliable estimate of the S-E OR c/nc of 1.0. In the second example, the S-E OR c/nc is 2.0, showing a non-independent relationship. The ICR c/nc is 1.0, but the ICR co is 2.0. In this situation, the S-E OR control of 1.0 is not a reliable estimate of the S-E OR c/nc of 2.0.

The rare disease assumption: for ICR cc  = ICR c/nc and S-E OR control  = S-E OR c/nc

The rare disease assumption provides 2 implications in this discussion of the case-only approach. The first implication is provided in Eq. ( 18 ). The second implication is the following:

\(\frac{\left(\mathrm{c}+\mathrm{D}\right)\left(\mathrm{e}+\mathrm{F}\right)}{\left(\mathrm{a}+\mathrm{B}\right)\left(\mathrm{g}+\mathrm{H}\right)}={\text{S-E OR}}_{\mathrm{c}/\mathrm{nc}}\) from Eq. (3), and \(\frac{\mathrm{DF}}{\mathrm{BH}}=\frac{\mathrm{df}}{\mathrm{bh}}={\text{S-E OR}}_{\mathrm{control}}\) from Eq. (12).

In this subsection, we will deal with the second implication. Equation ( 20 ) indicates the relationship between S-E OR control and S-E OR c/nc [ 8 ].

In Gatto et al. [8], the authors used Eq. (20) to conduct a sensitivity analysis (Supplementary material C). The article assessed the impact of the baseline risk of disease in the population (p(D|S-E-)) and the independent effect of S (RR S) on the S-E OR control when the S-E OR c/nc is 1.0. In Supplementary material C, the baseline risk of disease ranges from 0.1 to 6%. As illustrated there, the S-E OR control is similar to the S-E OR c/nc of 1.0 when the baseline risk of disease (p(D|S-E-)) is under 1% and the independent effect of S is relatively low (RR S < 2.5). However, as the baseline risk of disease approaches 3%, the S-E OR control begins to diverge from the S-E OR c/nc of 1.0, and the divergence grows as the independent effect of the susceptibility factor increases.
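The pattern described above can be reproduced with a short calculation. The sketch below is not the article's Eq. (20); it is a derivation under stated assumptions: the non-cases in each group are the group size times one minus the group's risk, the S-E factor is written in the DF/BH orientation used in this text, and the relative risks (including RR E = 2 and a purely multiplicative joint effect) are hypothetical values.

```python
import math


def se_or_control(p0, rr_s, rr_e, rr_se, se_or_pop=1.0):
    """S-E factor among non-cases (orientation DF / BH).

    p0 is the baseline risk p(D | S-E-); each group's non-case fraction
    is one minus its risk, which rescales the population S-E factor.
    """
    non_case_se = 1.0 - p0 * rr_se  # S+E+
    non_case_s = 1.0 - p0 * rr_s    # S+E-
    non_case_e = 1.0 - p0 * rr_e    # S-E+
    non_case_0 = 1.0 - p0           # S-E-
    return se_or_pop * (non_case_s * non_case_e) / (non_case_se * non_case_0)


rr_e = 2.0  # hypothetical independent effect of E
for p0 in (0.001, 0.01, 0.03, 0.06):
    for rr_s in (1.5, 2.5, 5.0):
        rr_se = rr_s * rr_e  # hypothetical: no multiplicative interaction
        value = se_or_control(p0, rr_s, rr_e, rr_se)
        print(f"p0={p0:.3f} RR_S={rr_s:.1f} S-E OR control={value:.3f}")
```

With S and E independent in the whole population, the printed values stay close to 1 for low baseline risks and drift away from 1 as the baseline risk or RR S grows, which is the qualitative behavior reported for the sensitivity analysis.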

Violation of independence: confounder and subpopulation dependence

The violation of independence between S and E occurs when an individual alters his or her environmental exposure according to his or her susceptibility factor. This violation arises mainly from 2 sources: (i) a confounder and (ii) subpopulation dependence.

Gatto et al. [8] provide 2 examples of such covariates. In the first example of Supplementary material D, family history functions as a confounder, and in the second example of Supplementary material D, the adverse reaction to alcohol functions as a mediator between the susceptibility factor and the environmental exposure. In both examples, a positive multiplicative interaction (ICR co > 1) will be biased towards the null (ICR co ≈ 1) because the covariate (C) induces an overall negative association between S and E.

If these covariates can be adjusted for, the independence between S and E can be restored.

However, caution is required because adjusting for covariates that are unrelated to the S-E dependence costs degrees of freedom and reduces the precision of the ICR co [8].

Another source of the violation of independence is a hidden dependence on a subpopulation. Wang and Lee [9] provide a solution for this problem through the following equations:

CIR: Confounding Interaction Ratio. r SE : the correlation coefficient between S and E. CV S : variation in susceptibility factor prevalence odds. CV E : variation in environmental exposure prevalence odds.

CIR U : the upper bound of CIR, CIR L : the lower bound of CIR, υ S ( υ S  ≥ 1): the ratio of the largest and the smallest susceptibility frequency odds across all strata. υ E ( υ E  ≥ 1): the ratio of the largest and the smallest exposure frequency odds across all strata.

In Eq. (23), CIR is the ratio of the crude ICR c/nc without stratification over the ICR c/nc with stratification. According to this equation, there would be no population stratification bias (CIR = 1) if (i) the exposure prevalence odds and the susceptibility frequency odds are uncorrelated across all strata (r SE = 0), (ii) no variation exists in the exposure prevalence odds (CV E = 0), or (iii) no variation exists in the susceptibility frequency odds (CV S = 0).

In Eq. (24), υ S (υ S ≥ 1) denotes the ratio of the largest over the smallest susceptibility frequency odds, and υ E (υ E ≥ 1) denotes the ratio of the largest over the smallest exposure prevalence odds across all the strata in the population. If there is no variation in either the susceptibility frequency odds (υ S = 1) or the exposure prevalence odds (υ E = 1), there would be no bias (CIR U = CIR L = 1) according to Eq. (24). If we can calculate the CIR for a population, we can calculate the ICR c/nc with stratification.
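The mechanism behind the stratification bias can be illustrated numerically. The sketch below pools hypothetical strata in which S and E are independent within every stratum and computes the crude S-E cross-product odds ratio (conventional orientation). It then compares that value with 1 + r·CV S·CV E. Both the form of this expression and the weighting of the prevalence odds by each stratum's share of S-E- individuals are assumptions of this sketch, since Eq. (23) is not reproduced in the text; the form was chosen because it satisfies the three no-bias conditions listed above.

```python
import math


def crude_se_or(strata):
    """Pooled S-E cross-product OR; strata = [(weight, p_s, p_e), ...].

    S and E are taken to be independent within each stratum.
    """
    p11 = sum(w * ps * pe for w, ps, pe in strata)
    p10 = sum(w * ps * (1 - pe) for w, ps, pe in strata)
    p01 = sum(w * (1 - ps) * pe for w, ps, pe in strata)
    p00 = sum(w * (1 - ps) * (1 - pe) for w, ps, pe in strata)
    return (p11 * p00) / (p10 * p01)


def one_plus_r_cv_cv(strata):
    """1 + cov(o_S, o_E) / (mean(o_S) * mean(o_E)).

    This equals 1 + r * CV_S * CV_E whenever both odds vary; the moments
    are weighted by each stratum's share of S-E- individuals (assumption).
    """
    ref = [w * (1 - ps) * (1 - pe) for w, ps, pe in strata]
    total = sum(ref)
    wts = [x / total for x in ref]
    odds_s = [ps / (1 - ps) for _, ps, _ in strata]
    odds_e = [pe / (1 - pe) for _, _, pe in strata]
    mean_s = sum(w * o for w, o in zip(wts, odds_s))
    mean_e = sum(w * o for w, o in zip(wts, odds_e))
    mean_se = sum(w * s * e for w, s, e in zip(wts, odds_s, odds_e))
    return 1.0 + (mean_se - mean_s * mean_e) / (mean_s * mean_e)


# Hypothetical strata: (share of population, prevalence of S, prevalence of E)
varying = [(0.5, 0.2, 0.2), (0.5, 0.5, 0.5)]
constant_e = [(0.5, 0.2, 0.3), (0.5, 0.5, 0.3)]

print(crude_se_or(varying), one_plus_r_cv_cv(varying))
print(crude_se_or(constant_e), one_plus_r_cv_cv(constant_e))
```

When the exposure prevalence is the same in every stratum, the crude association disappears even though the susceptibility prevalence differs, matching condition (ii) above.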

For the violation of S-E independence, researchers usually would try to evaluate a potential confounder based on their subject-matter knowledge. However, for subpopulation dependence, attention should be paid to the whole study population and the strata rather than finding a confounder. This important difference should be in the mind of researchers using a case-only approach.

The efficiency gained from the case-only approach

The case-only approach can yield a more precise interaction effect estimate (i.e., one with a narrower confidence interval) than a study design with cases and non-cases, such as a cohort or case-control study [16].

From Eqs. (8) and (9) and Table 2, the asymptotic variance of \({\hat{\upbeta}}_3\) in a population with cases and non-cases is as follows:

From Eqs. (13) and (14) and Table 4, the asymptotic variance of \({\overline{\overline{\upbeta}}}_3\) in a case-control study is as follows:

From Eqs. (4) and (5) and Table 3, the asymptotic variance of \({\hat{\gamma}}_1\) in a case-only study is as follows:

Comparing Eq. (27) with Eqs. (25) and (26), the case-only design can provide an estimate with a narrower confidence interval than either the case-control or the cohort design (study designs with cases and non-cases). This efficiency gain comes from the independence assumption between the susceptibility factor and the environmental exposure (S-E OR c/nc = 1).
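The variance expressions themselves are not reproduced in this text. As a hedged illustration, the sketch below uses the standard large-sample variance of a log cross-product ratio, which is the sum of the reciprocals of the cell counts involved. For binary S and E this gives four terms for the case-only estimate and eight terms for the estimate that also uses non-cases. The counts and function names are hypothetical, and the correspondence to Eqs. (25) and (27) is an assumption.

```python
import math


def var_log_icr(cells):
    """Large-sample variance of a log cross-product ratio: sum of 1/n."""
    return sum(1.0 / n for n in cells)


def ci95(estimate, variance):
    """Wald-type 95% interval computed on the log scale."""
    half_width = 1.96 * math.sqrt(variance)
    return estimate * math.exp(-half_width), estimate * math.exp(half_width)


cases = (40, 20, 30, 30)              # a, c, e, g (hypothetical)
non_cases = (1000, 2000, 3000, 6000)  # B, D, F, H (hypothetical)

var_case_only = var_log_icr(cases)
var_with_non_cases = var_log_icr(cases + non_cases)

icr = (cases[0] * cases[3]) / (cases[1] * cases[2])  # ag / ce
print(ci95(icr, var_case_only))
print(ci95(icr, var_with_non_cases))
```

The second interval is always at least as wide as the first because it adds four non-negative terms to the variance. The gain is small when non-cases are plentiful, as in this example, and larger when the non-case cells are sparse.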

Methodological issues to be considered

Several issues must be considered when applying the case-only approach to estimate the interaction effect between a susceptibility factor and an environmental exposure. Firstly, case selection must follow the same rules as case selection in a case-control study. Secondly, researchers must verify the independence between the susceptibility factor and the environmental exposure in the population with cases and non-cases before substituting the ICR co calculated in a case-only design for the ICR c/nc calculated in a population with cases and non-cases (according to Eqs. (2) and (3)). If there is evidence of an association between the susceptibility factor and the environmental exposure, the ICR co must be corrected by multiplying it by the calculated S-E OR c/nc, as provided in Eq. (2). Thirdly, although the independence assumption might seem reasonable for various susceptibility factors and environmental exposures, some susceptibility factors can modify the likelihood of environmental exposure. Such a hidden association must be identified before a case-only approach is applied. Finally, the interaction effect estimate (ICR co) obtained from the case-only approach can only be interpreted as a departure from multiplicative effects and not from additive effects. According to the previous epidemiologic literature, additive interaction corresponds more closely to mechanistic biologic interaction than to merely statistical interaction [17, 18]. Even so, researchers often use the multiplicative scale to estimate interaction effects for several practical reasons [18]. This limitation should be considered when the results of this study are applied.

In summary, the case-only approach can be applied to environmental epidemiology successfully when a susceptibility factor and an environmental exposure are independent in a population with cases and non-cases. Through this approach, a more precise interaction effect estimate can be calculated.

Basic information of datasets and descriptive analysis for each variable

By combining the ‘Albumin & Creatinine – Urine,’ ‘Chromium & Cobalt,’ ‘Glycohemoglobin,’ and ‘Demographic Variables and Sample Weights’ data files, a dataset with 7286 subjects was created. For the first analysis example, the respondents who answered ‘yes’ to the question ‘take diabetic pills to lower blood sugar’ were excluded (5890 subjects), leaving 1396 subjects. For the second analysis example, all 7286 subjects were included. The descriptive analysis results for the main variables are provided in Table 5.

The negative interaction effect between blood chromium level and glycohemoglobin level on albuminuria (micro and macro)

As the first example, Table 6 provides the sequential steps of applying the case-only approach (explained in the first discussion section) to estimate the interaction effect between blood chromium level and glycohemoglobin level on albuminuria. These steps follow those provided in subsection 2.3: (i) Firstly, a 1 μg/L difference in blood chromium level was associated with a 2.20-fold (95% CI 1.48–3.32) difference in the odds of albuminuria. (ii) Secondly, a 1% difference in blood glycohemoglobin level was associated with a 1.57-fold (95% CI 1.44–1.73) difference in the odds of albuminuria. (iii) Thirdly, when a 1 μg/L difference in blood chromium level and a 1% difference in blood glycohemoglobin level coincide, the multiplicative interaction contrast ratio (ICR) is 0.72 (95% CI 0.35–1.60), which is not statistically significant. (iv) Fourthly, in the population with cases and non-cases, blood chromium levels and blood glycohemoglobin levels are treated as independent of each other (S-E OR c/nc: 0.76 (95% CI 0.47–1.06); the confidence interval includes 1). Therefore, the case-only ICR can be a good substitute for the ICR acquired from the population with cases and non-cases. (v) Finally, when only the cases are analyzed (case-only approach), the case-only ICR is 0.59 (95% CI 0.28–0.95), which is statistically significant (a negative interaction effect).

In this example, the environmental exposure (blood chromium level) and the susceptibility factor (blood glycohemoglobin level) are treated as independent in the population with cases and non-cases. Therefore, the case-only ICR itself can be used as the ICR acquired from the population with cases and non-cases without a conversion. (This will be explained in the first discussion section in detail.) However, the ICR acquired from the population with cases and non-cases was not statistically significant because of its relatively wide confidence interval. Applying the case-only approach resolved this problem, producing a slightly lower ICR with a narrower confidence interval that reached statistical significance. A possible protective (negative) interaction effect between blood chromium levels and blood glycohemoglobin levels can be inferred from this example.

The positive interaction effect between blood cobalt level and old age on albuminuria (micro and macro)

As the second example, Table 6 provides the sequential steps of applying the case-only approach to estimate the interaction effect between blood cobalt level and age in years on albuminuria. These steps follow those provided in subsection 2.3: (i) Firstly, a 1 μg/L difference in blood cobalt level was associated with a 1.09-fold (95% CI 0.98–1.20) difference in the odds of albuminuria, which is not statistically significant. (ii) Secondly, a 1-year difference in age was associated with a 1.05-fold (95% CI 1.04–1.05) difference in the odds of albuminuria. (iii) Thirdly, when a 1 μg/L difference in blood cobalt level and a 1-year difference in age coincide, the multiplicative ICR is 1.13 (95% CI 0.99–1.37), which is not statistically significant. (iv) Fourthly, in the population with cases and non-cases, blood cobalt level and age in years show a slight association and are not completely independent (S-E OR c/nc: 1.06 (95% CI 1.03–1.10)). Therefore, the case-only ICR must be multiplied by the S-E OR c/nc to obtain the ICR c/nc according to Eq. (2). (v) Next, when only the cases are analyzed (case-only approach), the case-only ICR is 1.14 (95% CI 1.03–1.37), which is statistically significant (a positive interaction effect). (vi) Finally, multiplying the calculated ICR co by the S-E OR c/nc produced the adjusted ICR co of 1.21 (95% CI 1.06–1.51).

In this example, the environmental exposure (blood cobalt level) and the susceptibility factor (age in years) are not independent in the population with cases and non-cases. Therefore, the case-only ICR must be multiplied by the S-E OR c/nc to produce the ICR c/nc according to Eq. (2). The ICR acquired from the population with cases and non-cases was statistically equivocal (1.13 (95% CI 0.99–1.37)). However, after applying the case-only approach, the adjusted ICR co was slightly higher and statistically significant (1.21 (95% CI 1.06–1.51)). Therefore, a possible aggravating (positive) interaction effect between blood cobalt levels and age in years can be inferred from this example.
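The adjustment in the final step is a single multiplication, which can be checked against the reported figures. In the sketch below, the variable names are illustrative. The text does not state how the adjusted interval was derived; multiplying the corresponding interval bounds happens to reproduce the reported interval, and this is noted only as an observation about the reported numbers, not as a recommended way to construct a confidence interval.

```python
# Reported estimates from the second example (point estimate, 95% CI).
icr_co, icr_co_ci = 1.14, (1.03, 1.37)  # case-only ICR
se_or, se_or_ci = 1.06, (1.03, 1.10)    # S-E OR c/nc

# Adjusted case-only ICR: ICR co multiplied by the S-E OR c/nc.
adjusted = icr_co * se_or

# Bound-by-bound products, shown only for comparison with the reported interval.
adjusted_ci = (icr_co_ci[0] * se_or_ci[0], icr_co_ci[1] * se_or_ci[1])

print(round(adjusted, 2), tuple(round(x, 2) for x in adjusted_ci))
```

Rounded to two decimals, the product of the point estimates is 1.21, in agreement with the adjusted value reported above.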

Many previous studies dealt with various aspects of the case-only approach, usually in the context of gene-environment interaction studies or gene-gene interaction studies [5, 7, 9, 11, 14]. Some studies compared the case-only ICR with the ICR from the case-control design, whereas others compared the case-only ICR with the ICR from the population with cases and non-cases. This study drew together this previous literature and systematically organized its logic and equations. As a result, the definitions and equations for the ICR in the case-only design can be set side by side with those for the ICR in the population with cases and non-cases (cohort/case-control studies). This systematic organization of concepts from 3 study designs is the original contribution of this study.

Furthermore, this study extended the case-only approach, which had been used usually in gene-environment interaction or gene-gene interaction studies, to a more general concept of the interaction effect estimation between susceptibility factors and environmental exposures. If the independence assumption between a susceptibility factor and an environmental exposure is fulfilled, even though the ‘gene’ is replaced with the ‘susceptibility factor,’ the same equations can be applied. Therefore, the case-only approach can also be applied to environmental epidemiology.

The preventive (negative) interaction effect between blood chromium levels and glycohemoglobin levels on albuminuria (micro and macro)

The adverse effect of chromium on kidney function has been reported in previous literature [19, 20]. A glycohemoglobin level ≥ 6.5% is a diagnostic criterion for diabetes mellitus and is naturally associated with diabetic nephropathy [21]. Albuminuria, including micro-albuminuria and macro-albuminuria, has been used both as a useful initial marker for kidney damage and as a marker associated with an increased risk of progressive renal diseases [22, 23]. However, a possible protective interaction between chromium exposure and diabetic chronic kidney disease is increasingly being reported, based on improved glucose tolerance and insulin sensitivity [24, 25, 26, 27, 28].

The result of this study illustrates well a protective interaction effect between blood chromium level (environmental exposure) and blood glycohemoglobin level (susceptibility factor) on the albuminuria status (outcome). This protective interaction effect of chromium on diabetic patients with nephropathy can be used for establishing a future effective treatment strategy for diabetic nephropathy. For example, a study reports a possible positive effect of prescribing a nano chromium metal-organic framework on diabetic chronic kidney disease patients [ 24 ].

The aggravating (positive) interaction effect between blood cobalt levels and old ages on albuminuria (micro and macro)

The effect of blood cobalt levels on kidney function is not yet established, with only a few studies reporting possible adverse effects, mainly in experimental animals [ 29 ]. However, the effect of aging on decreasing kidney function is relatively well established [ 30 , 31 ]. Furthermore, the fact that this aging kidney is susceptible to various toxic substances is well known through numerous studies [ 32 , 33 , 34 , 35 ]. From these pieces of evidence, we can infer that the aging kidney could be more susceptible to the possible toxic effect of cobalt, even if it is almost non-toxic to the young kidney.

The result of this study illustrates well this toxin-susceptible feature of the aging kidney (susceptibility factor) to cobalt exposure (environmental exposure). As a marker of kidney damage, the proportion of albuminuria was greater in the older subjects. The result of this study can be used to devise a protective environmental health strategy for aging people with an increased possibility of exposure to heavy metals, such as cobalt.

This study systematically summarized the previously reported logic and equations for the case-only approach. In particular, the associated definitions and equations are collectively summarized from the cohort and case-control designs (study designs with cases and non-cases) to case-only studies. By substituting the ‘susceptibility factor’ concept from environmental epidemiology for the conventional ‘gene’ concept from genetic epidemiology, this study extended the applicability of the case-only approach to a wide range of environmental health topics. If the independence assumption between a susceptibility factor and an environmental exposure holds in the population with cases and non-cases, the case-only approach can provide a more precise interaction effect estimate than study designs with cases and non-cases (cohort/case-control studies). Finally, 2 analysis examples of the case-only approach using the US NHANES datasets were presented. The protective interaction effect between blood chromium levels and blood glycohemoglobin levels and the aggravating interaction effect between blood cobalt levels and increasing age on the incidence of albuminuria must be investigated meticulously in future studies. In summary, the case-only approach can be useful not only in genetic epidemiology but also in environmental epidemiology.

Availability of data and materials

All data used in this article are available on the National Health and Nutrition Examination Survey homepage ( https://wwwn.cdc.gov/nchs/nhanes/ ).

Clayton D, McKeigue PM. Epidemiological methods for studying genes and environmental factors in complex diseases. Lancet. 2001;358(9290):1356–60.

Article   CAS   Google Scholar  

Hogan MD, Kupper LL, Most BM, Haseman JK. Alternatives to Rothman's approach for assessing synergism (or antagonism) in cohort studies. Am J Epidemiol. 1978;108(1):60–7.

CAS   PubMed   Google Scholar  

Knol MJ, Egger M, Scott P, Geerlings MI, Vandenbroucke JP. When one depends on the other: reporting of interaction in case-control and cohort studies. Epidemiology. 2009;20:161–6.

Article   Google Scholar  

Skrondal A. Interaction as departure from additivity in case-control studies: a cautionary note. Am J Epidemiol. 2003;158(3):251–8.

Dennis J, Hawken S, Krewski D, Birkett N, Gheorghe M, Frei J, et al. Bias in the case-only design applied to studies of gene-environment and gene-gene interaction: a systematic review and meta-analysis. Int J Epidemiol. 2011;40(5):1329–41.

VanderWeele TJ, Hernández-Díaz S, Hernán MA. Case-only gene-environment interaction studies: when does association imply mechanistic interaction? Genet Epidemiol. 2010;34(4):327–34.

Li D, Conti DV. Detecting gene-environment interactions using a combined case-only and case-control approach. Am J Epidemiol. 2009;169(4):497–504.

Gatto NM, Campbell UB, Rundle AG, Ahsan H. Further development of the case-only design for assessing gene-environment interaction: evaluation of and adjustment for bias. Int J Epidemiol. 2004;33(5):1014–24.

Wang L-Y, Lee W-C. Population stratification bias in the case-only study for gene-environment interactions. Am J Epidemiol. 2008;168(2):197–201.

Albert PS, Ratnasinghe D, Tangrea J, Wacholder S. Limitations of the case-only design for identifying gene-environment interactions. Am J Epidemiol. 2001;154(8):687–93.

Yang Q, Khoury MJ, Sun F, Flanders WD. Case-only design to measure gene-gene interaction. Epidemiology. 1999;10(2):167–70.

Schmidt S, Schaid DJ. Potential misinterpretation of the case-only study to assess gene-environment interaction. Am J Epidemiol. 1999;150(8):878–85.

Khoury MJ, Flanders WD. Nontraditional epidemiologic approaches in the analysis of gene environment interaction: case-control studies with no controls! Am J Epidemiol. 1996;144(3):207–13.

Dai JY, Liang CJ, LeBlanc M, Prentice RL, Janes H. Case-only approach to identifying markers predicting treatment effects on the relative risk scale. Biometrics. 2018;74(2):753–63.

Richardson DB, Kaufman JS. Estimation of the Relative Excess Risk Due to Interaction and Associated Confidence Bounds. Am J Epidemiol. 2009;169(6):756–60.

Piegorsch WW, Weinberg CR, Taylor JA. Non-hierarchical logistic models and case-only designs for assessing susceptibility in population-based case-control studies. Stat Med. 1994;13(2):153–62.

Rothman KJ, Greenland S, Lash TL. Modern epidemiology. Third Edition. Philadelphia: Lippincott Williams & Wilkins; 2008.

Google Scholar  

VanderWeele TJ, Knol MJ. A tutorial on interaction. Epidemiol Methods. 2014;3(1):33–72.

Tsai T-L, Kuo C-C, Pan W-H, Chung Y-T, Chen C-Y, Wu T-N, et al. The decline in kidney function with chromium exposure is exacerbated with co-exposure to lead and cadmium. Kidney Int. 2017;92(3):710–20.

Wedeen RP, Qian LF. Chromium-induced kidney disease. Environ Health Perspect. 1991;92:71–4.

CAS   PubMed   PubMed Central   Google Scholar  

Association AD. 2. Classification and diagnosis of diabetes: standards of medical care in diabetes—2019. Diabetes Care. 2019;42(Supplement 1):S13–28.

Levey AS, Becker C, Inker LA. Glomerular filtration rate and albuminuria for detection and staging of acute and chronic kidney disease in adults: a systematic review. JAMA. 2015;313(8):837–46.

Heerspink HJL, Gansevoort RT. Albuminuria is an appropriate therapeutic target in patients with CKD: the pro view. Clin J Am Soc Nephrol. 2015;10(6):1079–88.

Fakharzadeh S, Kalanaky S, Argani H, Dadashzadeh S, Torbati PM, Nazaran MH, et al. Ameliorative effect of a nano chromium metal–organic framework on experimental diabetic chronic kidney disease. Drug Dev Res. 2021;82(3):393–403.

Huang H, Chen G, Dong Y, Zhu Y, Chen H. Chromium supplementation for adjuvant treatment of type 2 diabetes mellitus: Results from a pooled analysis. Mol Nutr Food Res. 2018;62(1):1700438.

Yin RV, Phung OJ. Effect of chromium supplementation on glycated hemoglobin and fasting plasma glucose in patients with diabetes mellitus. Nutr J. 2015;14(1):1–9.

Lewicki S, Zdanowski R, Krzyzowska M, Lewicka A, Debski B, Niemcewicz M, et al. The role of Chromium III in the organism and its possible use in diabetes and obesity treatment. Ann Agric Environ Med. 2014;21(2):331–5.

Sahin K, Onderci M, Tuzcu M, Ustundag B, Cikim G, Ozercan İH, et al. Effect of chromium on carbohydrate and lipid metabolism in a rat model of type 2 diabetes mellitus: the fat-fed, streptozotocin-treated rat. Metabolism. 2007;56(9):1233–40.

Naura AS, Sharma R. Toxic effects of hexaammine cobalt(III) chloride on liver and kidney in mice: Implication of oxidative stress. Drug Chem Toxicol. 2009;32(3):293–9.

Wetzels JFM, Kiemeney LALM, Swinkels DW, Willems HL, Heijer MD. Age- and gender-specific reference values of estimated GFR in Caucasians: The Nijmegen Biomedical Study. Kidney Int. 2007;72(5):632–7.

Coresh J, Astor BC, Greene T, Eknoyan G, Levey AS. Prevalence of chronic kidney disease and decreased kidney function in the adult US population: Third national health and nutrition examination survey. Am J Kidney Dis. 2003;41(1):1–12.

Wang X, Bonventre J, Parrish A. The Aging Kidney: Increased Susceptibility to Nephrotoxicity. Int J Mol Sci. 2014;15(9):15358–76.

Rosner MH. The pathogenesis of susceptibility to acute kidney injury in the elderly. Curr Aging Sci. 2009;2(2):158–64.

Schmitt R, Cantley LG. The impact of aging on kidney repair. Am J Physiol Ren Physiol. 2008;294(6):F1265–72.

Jerkić M, Vojvodić S, López-Novoa JM. The mechanism of increased renal susceptibility to toxic substances in the elderly. Int Urol Nephrol. 2001;32(4):539–47.

Acknowledgments

The author appreciates the reviewers’ comments on this study. (Reviewer #1 and David M. Thompson). In particular, the comments from David M. Thompson were of great help in improving the quality and logic of the primary manuscript. This work was supported by INHA UNIVERSITY Research Grant.

This research did not receive any specific grant from funding agencies in the public, commercial, or not-for-profit sectors.

Author information

Authors and affiliations.

Department of Environmental Health Science, Graduate School of Public Health, Seoul National University, Gwanak-ro 1, Gwanak-gu, Seoul, 08826, Republic of Korea

Jinyoung Moon

Department of Occupational and Environmental Medicine, Inha University Hospital, Inhang-ro 27, Jung-gu, Incheon, 22332, Republic of Korea

Jinyoung Moon & Hwan-Cheol Kim

Department of Social and Preventive Medicine, College of Medicine, Inha University, Inha-ro 100, Michuhol-gu, Incheon, 22212, Republic of Korea

Hwan-Cheol Kim

Contributions

Jinyoung Moon: Conceptualization, Methodology, Investigation, Resources, Data Curation, Software, Validation, Formal analysis, Writing – Original Draft, Visualization. Hwan-Cheol Kim: Writing –Review & Editing, Supervision, Project administration. The author(s) read and approved the final manuscript.

Corresponding author

Correspondence to Hwan-Cheol Kim.

Ethics declarations

Ethics approval and consent to participate.

This study used only the publicly available National Health and Nutrition Examination Survey (NHANES) datasets. These datasets can be accessed on the NHANES homepage ( https://www.cdc.gov/nchs/nhanes/index.htm ). For the datasets, the information about Ethics Review Board (ERB) approval can be found on https://www.cdc.gov/nchs/nhanes/irba98.htm .

The authors confirm that all experiments were performed in accordance with the Declaration of Helsinki.

Consent for publication

Not applicable.

Competing interests

The authors have no potential competing interests to disclose.

Additional information

Publisher’s note.

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Additional file 1: Supplementary material A.

The used R codes for the statistical analyses. Supplementary material B. The S-E independence in the controls cannot replace the S-E independence in the population with cases and non-cases [1]. Supplementary material C. How strong a rare disease assumption is required for the equality between S-E OR c/nc and S-E OR control [1]. Supplementary material D. Violation of independence: confounder [1].

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/ . The Creative Commons Public Domain Dedication waiver ( http://creativecommons.org/publicdomain/zero/1.0/ ) applies to the data made available in this article, unless otherwise stated in a credit line to the data.

About this article

Cite this article.

Moon, J., Kim, HC. Case-only approach applied in environmental epidemiology: 2 examples of interaction effect using the US National Health and Nutrition Examination Survey (NHANES) datasets. BMC Med Res Methodol 22, 254 (2022). https://doi.org/10.1186/s12874-022-01706-6

Received : 29 October 2021

Accepted : 08 August 2022

Published : 29 September 2022

DOI : https://doi.org/10.1186/s12874-022-01706-6

  • Case-only approach
  • Environmental epidemiology
  • Interaction effect
  • Independence assumption
  • National Health and Nutritional Examination Survey
  • Susceptibility factor
  • Environmental exposure

BMC Medical Research Methodology

ISSN: 1471-2288

  • Open access
  • Published: 15 November 2021

Role of epidemiology in risk assessment: a case study of five ortho-phthalates

  • Maricel V. Maffini   ORCID: orcid.org/0000-0002-3853-9461 1 ,
  • Birgit Geueke   ORCID: orcid.org/0000-0002-0749-3982 2 ,
  • Ksenia Groh 3 ,
  • Bethanie Carney Almroth   ORCID: orcid.org/0000-0002-5037-4612 4 &
  • Jane Muncke   ORCID: orcid.org/0000-0002-6942-0594 2  

Environmental Health volume 20, Article number: 114 (2021)

The association between environmental chemical exposures and chronic diseases is of increasing concern. Chemical risk assessment relies heavily on pre-market toxicity testing to identify safe levels of exposure, often known as reference doses (RfD), expected to be protective of human health. Although some RfDs have been reassessed in light of new hazard information, it is not a common practice. Continuous surveillance of animal and human data, both in terms of exposures and associated health outcomes, could provide valuable information to risk assessors and regulators. Using ortho-phthalates as case study, we asked whether RfDs deduced from male reproductive toxicity studies and set by traditional regulatory toxicology approaches sufficiently protect the population for other health outcomes.

We searched for epidemiological studies on benzyl butyl phthalate (BBP), diisobutyl phthalate (DIBP), dibutyl phthalate (DBP), dicyclohexyl phthalate (DCHP), and bis(2-ethylhexyl) phthalate (DEHP). Data were extracted from studies where any of the five chemicals or their metabolites were measured and showed a statistically significant association with a health outcome; 38 studies met the criteria. We estimated intake for each phthalate from urinary metabolite concentration and compared estimated intake ranges associated with health endpoints to each phthalate’s RfD.

For DBP, DIBP, and BBP, the estimated intake ranges significantly associated with health endpoints were all below their individual RfDs. For DEHP, the intake range included associations at levels both below and above its RfD. For DCHP, no relevant studies could be identified. The significantly affected endpoints revealed by our analysis include metabolic, neurodevelopmental and behavioral disorders, obesity, and changes in hormone levels. Most of these conditions are not routinely evaluated in animal testing employed in regulatory toxicology.

We conclude that for DBP, DIBP, BBP, and DEHP current RfDs estimated based on male reproductive toxicity may not be sufficiently protective of other health effects. Thus, a new approach is needed where post-market exposures, epidemiological and clinical data are systematically reviewed to ensure adequate health protection.

Non-communicable diseases (NCD) are a global burden to public health [ 1 ]. Nutritional shortcomings and lifestyle factors have been associated with increased incidence of diabetes and obesity, but current evidence indicates that exposures to environmental chemical contaminants also play a role in the development of NCDs [ 2 ]. In the US, cardiovascular diseases and mental health conditions impose the highest economic burden followed by cancer, diabetes, and chronic respiratory diseases [ 3 ]. Of particular concern are exposures during gestation and early childhood [ 4 ] . A recent review [ 5 ] proposed incorporating environmental health risk factors when estimating global burden of disease, including air pollutants, neurotoxicants, endocrine disrupting chemicals, and climate-related factors. To do this successfully, the components of risk assessment such as exposure sources and levels, as well as data about chemical effects and associated health outcomes, are required [ 6 ].

One source of chemical exposure is plastic. With a global production of almost 360 million metric tons in 2018 [ 7 ], manufacturing, use, and disposal of plastic materials pose major safety concerns. Leachate from landfills, migration from consumer products (e.g., food packaging, toys, flooring, textiles), and air pollution from burning plastic materials are just some of the sources of chemical contamination affecting humans and the environment [ 8 , 9 , 10 ]. Because information on chemicals present in plastics is difficult to obtain and their hazards often remain unknown, Groh and colleagues [ 11 ] published a comprehensive database with more than 900 chemicals likely associated with plastic packaging as part of the Hazardous Chemicals in Plastic Packaging (HCPP) project. The authors also ranked the chemicals based on hazards to human and environmental health according to the United Nations’ Globally Harmonized System of Classification and Labelling of Chemicals [ 12 ]. The 63 chemicals that ranked highest for human health concerns underwent a tiered prioritization [ 13 ] based on biomonitoring data, endocrine disrupting properties, and their regulatory status under the European Chemicals Regulation REACH. This prioritization approach identified five ortho-phthalates (referred to as phthalates in this article) for which the risk to human health was considered the highest: benzyl butyl phthalate (BBP, CAS 85-68-7); dibutyl phthalate (DBP, CAS 84-74-2); diisobutyl phthalate (DIBP, CAS 84-69-5); bis(2-ethylhexyl) phthalate (DEHP, CAS 117-81-7); and dicyclohexyl phthalate (DCHP, CAS 84-61-7).

Phthalates are highly abundant plastic additives used primarily as plasticizers to soften materials and make them flexible [ 14 ]. Human biomonitoring shows widespread exposure to phthalates [ 15 ] from diverse sources including food which could be contaminated from its packaging as well as other food contact materials such as conveyor belts and tubing used in food processing [ 16 , 17 , 18 , 19 , 20 ]. Personal care products and building materials also contribute to human exposure to phthalates [ 21 , 22 ].

Several regulatory authorities have assessed the toxicity of BBP, DBP, DIBP, DEHP, and DCHP [ 23 , 24 , 25 ] and established the amount of each chemical above which the risk to human health increases. Regulatory agencies give different names to these so-called ‘safe’ levels, including derived no-effect level (DNEL) used by the European Chemicals Agency (ECHA) [ 26 ], acceptable daily intake (ADI) used by the International Programme on Chemical Safety [ 27 ], tolerable daily intake (TDI) used by the European Food Safety Authority (EFSA) [ 28 ], and reference dose (RfD) used by the U.S. Environmental Protection Agency (EPA) [ 29 ]. Although the nomenclature differs, the meaning is similar: exposures above the established amounts are not considered safe. For simplicity, we use the term RfD throughout the article.

The established RfDs for the five phthalates are the result of risk assessments of mostly animal studies showing adverse effects on male reproductive development due to the anti-androgenic properties of these chemicals. The results of these risk assessments have led to the restriction of some uses of these phthalates. In 2008, the Congress of the United States banned the use of DEHP, BBP and DBP in children’s toys and childcare articles [ 30 ], and in 2017 the Consumer Product Safety Commission (CPSC) expanded the list of prohibited phthalates to eight [ 23 ]. Similarly, the European Union has listed DEHP and DBP in its authorization list under REACH, and more than a dozen phthalates are included in the candidate list for authorization [ 31 ]. The observed decline in human exposure to restricted phthalates in industrialized countries over the years [ 15 , 32 , 33 ] has been attributed to these regulatory measures. Notably, there are as yet no major restrictions on uses in food contact materials (e.g., packaging, processing equipment), pharmaceuticals, and medical devices.

Epidemiological data published in the last 15 years indicate that in some cases exposure to phthalates is still a cause of concern to human health. For example, recent publications by the U.S. Environmental Protection Agency (EPA) show a strong association between exposure to low concentrations of DEHP, DBP, and DIBP and increased risk of diabetes [ 34 ], and between exposure to DEHP and DBP and male reproductive effects such as reduced semen quality and testosterone levels [ 35 ]. Several small- and large-scale human studies have also shown phthalates to associate in a dose-dependent manner with negative effects on neurodevelopment [ 36 , 37 ], metabolic function [ 38 ] and female reproduction [ 39 ]. Therefore, we aimed to investigate whether regulatory safe levels of phthalates are protective of the public for other relevant health outcomes in addition to male reproductive development. We conducted a targeted literature search of human studies showing association between any of the five phthalates, BBP, DBP, DIBP, DEHP, and DCHP and health effects. Furthermore, we back-estimated daily intake for each phthalate that showed a statistically significant association with health effects, and compared these estimated intake values to the individual RfD.

Targeted literature search

We searched the Public Library of Medicine for human studies on phthalates published between 2003 and 2019. Search terms included each compound’s full name, abbreviation, and Chemical Abstracts Service (CAS) number in combination with human exposure, epidemiological studies and metabolites, among others. See Supplemental Materials for additional information. This targeted search aimed at obtaining information on the five phthalates, including the concentration of parent phthalates or metabolites in any bodily fluid, a description of the measured endpoint, and the statistical significance of the association between the health endpoint and the concentration measured. When a study met these criteria, we extracted the following data: 1) population sampled and population in which the endpoints were measured (e.g., men; pregnant women/children; children, etc.); age; gestational age where appropriate; 2) metabolite or parent compound concentration as percentile, geometric mean or other available concentration measure; 3) concentration at which metabolite(s) or parent compound had a statistically significant correlation with an endpoint; 4) statistically significant endpoint and outcome (e.g., increase/decrease; positive/negative association). We used the studies that met the criteria described above to perform the analysis and checked their quality, specifically whether the studies controlled for covariates and confounders such as race, maternal/paternal age, child’s sex, IQ, socioeconomic status, smoking, physical activity, caloric intake, etc.; however, we did not control for potential bias.

Intake estimation from urinary concentration

From the studies that met the inclusion criteria, we identified the lowest phthalate metabolite concentration that was associated with a statistically significant endpoint. Concentration data were expressed in various ways including geometric means of a population, percentiles, and average of urine collections per individual visits. We established the following assumptions: 1) the 25th percentile concentration was considered equivalent to a no-observed-adverse-effect level when concentrations were expressed as quartiles, meaning that only concentrations at or greater than the 25th percentile were included; 2) unless specified in the studies, logistic regressions were considered linear.
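The selection rule described above can be sketched in a few lines of Python. This is only an illustration of one reading of the stated assumptions: the `Association` record, the function name, and all numbers are hypothetical and are not taken from the study or its Supplemental Materials.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class Association:
    """One reported exposure-endpoint association (hypothetical record layout)."""
    concentration: float  # urinary metabolite concentration, ug/L
    significant: bool     # True if the association was statistically significant


def lowest_significant_concentration(
    associations: Iterable[Association],
    p25: Optional[float] = None,
) -> Optional[float]:
    """Return the lowest concentration tied to a significant endpoint.

    If the study reports quartiles, pass the 25th percentile as `p25`;
    concentrations below it are ignored, mirroring the assumption that the
    25th percentile acts like a no-observed-adverse-effect level.
    """
    eligible = [
        a.concentration
        for a in associations
        if a.significant and (p25 is None or a.concentration >= p25)
    ]
    return min(eligible) if eligible else None


# Illustrative values only.
example = [
    Association(0.5, True),
    Association(0.9, False),
    Association(1.2, True),
    Association(3.0, True),
]
print(lowest_significant_concentration(example))           # no quartile cutoff
print(lowest_significant_concentration(example, p25=1.0))  # with quartile cutoff
```

Returning `None` when nothing qualifies keeps a study without a usable concentration from silently contributing a value.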

For each phthalate, we estimated intake using the urinary concentration of its metabolite(s), daily urine volume, body weight (bw), and creatinine correction values for the different populations assessed [ 40 , 41 ]. In the case of DEHP, we considered the individual excretion of its four metabolites over time and expressed it as a percentage of the parent phthalate’s intake, as described previously [ 42 , 43 ]. We used the following mean percentage excretion values for DEHP metabolites: 6% for monoethylhexyl phthalate (MEHP), 11% for mono-(2-ethyl-5-oxohexyl) phthalate (5oxo MEHP), 15% for mono-(2-ethyl-5-hydroxyhexyl) phthalate (5OH MEHP) and 14% for mono-(2-ethyl-5-carboxypentyl) phthalate (5cx MEHP). For DBP, DIBP and BBP, we followed the European Chemicals Agency (ECHA) assumption of 100% elimination of the parent compound as phthalate monoesters [ 25 ]. We used the following formula:

Intake (μg/kg-bw/d) = Metabolite concentration (μg/L) × Urine volume (L/d) × (1 / bw (kg)) × (1 / % elimination)

In cases when creatinine correction was needed, concentration of urinary metabolite in microgram per gram (μg/g) creatinine was multiplied by the urinary concentration of creatinine in gram per liter (g/L). The Supplemental Materials include an example of the intake calculations and the assumptions made for each population (children, pregnant women, non-pregnant women, men) regarding body weight, daily urine volume, and creatinine excretion.
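As a minimal sketch, the intake formula and the creatinine correction described above could be coded as follows. The function names are hypothetical, the elimination term is assumed to be entered as a fraction (0.06 for 6%), and the urine volume, body weight and concentration in the example are placeholders rather than the population-specific values listed in the Supplemental Materials. Only the four DEHP excretion percentages come from the text.

```python
# Mean excretion of each DEHP metabolite as a fraction of parent intake (from the text).
DEHP_EXCRETION_FRACTION = {
    "MEHP": 0.06,
    "5oxo MEHP": 0.11,
    "5OH MEHP": 0.15,
    "5cx MEHP": 0.14,
}


def creatinine_corrected_to_volume(conc_ug_per_g_creatinine: float,
                                   creatinine_g_per_l: float) -> float:
    """Convert ug metabolite per g creatinine into ug metabolite per L urine."""
    return conc_ug_per_g_creatinine * creatinine_g_per_l


def estimate_intake(conc_ug_per_l: float,
                    urine_l_per_day: float,
                    bw_kg: float,
                    fraction_eliminated: float = 1.0) -> float:
    """Estimated intake in ug per kg body weight per day.

    fraction_eliminated defaults to 1.0, matching the 100% elimination
    assumption the text applies to DBP, DIBP and BBP.
    """
    if not 0.0 < fraction_eliminated <= 1.0:
        raise ValueError("fraction_eliminated must be in (0, 1]")
    if bw_kg <= 0:
        raise ValueError("bw_kg must be positive")
    return conc_ug_per_l * urine_l_per_day / bw_kg / fraction_eliminated


# Placeholder inputs: 10 ug/L MEHP, 1.5 L urine per day, 70 kg adult.
mehp_intake = estimate_intake(10.0, 1.5, 70.0, DEHP_EXCRETION_FRACTION["MEHP"])
print(round(mehp_intake, 2))

# A creatinine-corrected value is first converted to ug/L, then passed on.
conc = creatinine_corrected_to_volume(5.0, 1.2)
print(round(estimate_intake(conc, 1.5, 70.0), 3))
```

The sketch follows the formula as written, so it applies no molecular-weight adjustment between metabolite and parent compound.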

Regulation of priority phthalates

The uses of and exposure to phthalates are regulated in the European Union and the United States [ 23 , 24 , 25 , 44 , 45 ]. We chose the regulatory limits set by ECHA and the US Consumer Product Safety Commission (CPSC) to compare against the estimated intakes associated with health endpoints because these safe levels have been reaffirmed or established in the last 5 years using current scientific evidence. In addition, both assessments target products that are commonly used by children, a susceptible population as highlighted by government regulatory agencies [ 23 , 25 , 46 ]. Regulatory RfDs are commonly expressed as the amount of chemical a person can safely consume per kilogram of body weight per day over their expected lifetime. Table 1 summarizes the RfD for BBP, DBP, DIBP and DEHP and the health endpoint selected by ECHA to establish each reference dose. Because ECHA did not establish an RfD for DCHP, we used a regulatory limit set by the US CPSC, i.e., less than 0.1% DCHP per weight of the final product for children’s toys and articles [ 23 ]. This assessment was also based on DCHP’s anti-androgenic effects (i.e., reduced anogenital distance) observed in male rodents [ 54 ].

We identified 38 out of 64 publications that met our selection criteria (Table 2 ). The studies included longitudinal and cross-sectional studies; small cohorts (e.g., patients at fertility clinics; under-represented urban populations) and nationally representative cohorts such as the National Health and Nutrition Examination Survey (NHANES) of the U.S. Centers for Disease Control and Prevention; and prenatal exposure studies where phthalates were measured in the mothers but the health outcomes were assessed in their children months or years after birth. Supplemental Materials Table S1 lists the 26 publications that did not meet our criteria and therefore were not included in this case study.

All 38 studies reported phthalate metabolites measured in urine. DEHP was the phthalate most frequently assessed. There were 12 studies on mother-child pairs evaluating prenatal exposure effects, 12 women-only studies, six men-only studies, eight children studies evaluating postnatal exposure effects, and two studies including both men and women. A few studies included more than one population (e.g., children and adults) and only one study was a prospective mother-child study. It is worth noting that none of the studies evaluated DCHP, either as the parent compound or as its metabolite. The lack of epidemiological studies on DCHP is likely because the urinary concentration of the DCHP metabolite was consistently below the limit of detection at the 75th percentile in the NHANES 1999–2010 period [ 83 ] and, when measured, the frequency of detection has been low (e.g., less than 10% of the population tested) [ 33 , 83 ].

Table 1 lists the range of exposure for each phthalate and their association with significant endpoints. All phthalates measured in urine as metabolites of DEHP, DBP, BBP and DIBP showed significant associations with reproductive (male and female), neurodevelopmental, behavioral, hormonal, and metabolic endpoints at estimated intake values well below their respective RfDs.

Figure 1 shows the estimated intake distribution per phthalate compared to the respective RfD. DEHP had the widest range of estimated intakes associated with statistically significant endpoints: 0.03–242.5 μg/kg-bw/d (Table 1 , Fig. 1 ). The highest estimate was almost seven times greater than the RfD (35 μg/kg-bw/d), which indicates that some individuals could already be exposed to unsafe levels of the chemical as judged by the current regulatory limits. As shown in Table 1 , the highest DEHP intake was associated with decreased semen quality [ 47 ]. On the lower end, DEHP was associated with a significantly lower number of ovarian antral follicles (a measure of the remaining oocyte supply) [ 39 ] at an estimated intake three orders of magnitude lower than the RfD (0.03 and 35 μg/kg-bw/d, respectively).

Figure 1. Schematic representation of the range of estimated intake for individual phthalates (solid light-colored bars) associated with statistically significant endpoints (small circles) in relation to their respective reference doses (RfD; large circles). Each small circle corresponds to an endpoint significantly associated with an estimated intake. The lowest metabolite concentrations measured in urine that were found to be associated with statistically significant endpoints were 0.03, 0.19, 0.06 and 0.08 μg/L for DEHP, DBP, BBP and DIBP, respectively. See Supplemental Table S2 for additional data. DEHP: diethylhexyl phthalate; DBP: dibutyl phthalate; BBP: butylbenzyl phthalate; DIBP: diisobutyl phthalate

For DBP, DIBP and BBP, the ranges of intake associated with statistically significant endpoints were all below their respective RfDs (Fig. 1 ). The lowest estimated intake for DBP (0.19 μg/kg-bw/d) was associated with decreased sperm motility and semen concentration [ 48 ], while the highest intake (2.86 μg/kg-bw/d) was associated with decreased concentrations of total thyroxine (T4) and free T4 (fT4) in women [ 49 ]. The lowest DIBP intake measured in pregnant women (0.08 μg/kg-bw/d) was associated with a decrease in masculine play behavior in boys [ 52 ], and the highest intake (0.51 μg/kg-bw/d), also measured in pregnant women, was significantly associated with increased occurrence of eczema in children [ 53 ]. The range of estimated intake for BBP associated with significant endpoints showed the greatest difference from the RfD. The lowest intake of 0.06 μg/kg-bw/d was associated with increased levels of sex hormone-binding globulin (SHBG) in children [ 50 ]. SHBG is a protein that transports estrogen and testosterone in the blood and regulates their access to tissues [ 84 ]. The highest estimated intake for BBP (0.6 μg/kg-bw/d) was associated with increased body mass index and waist circumference in men and women [ 51 ]. Given BBP’s RfD of 500 μg/kg-bw/d, these intakes are roughly 8,300 to 830 times lower than the RfD.
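The fold differences between an RfD and the estimated intakes can be recomputed directly from the figures quoted in the text. The sketch below uses only the DEHP and BBP values stated above (RfDs of 35 and 500 μg/kg-bw/d); the function and dictionary names are illustrative, and DBP and DIBP are omitted because their RfDs appear only in Table 1, which is not reproduced here.

```python
def rfd_to_intake_ratio(rfd: float, intake: float) -> float:
    """How many times the RfD exceeds an estimated intake (both in ug/kg-bw/d).

    A ratio above 1 means the intake is below the RfD; below 1 means it exceeds it.
    """
    if intake <= 0:
        raise ValueError("intake must be positive")
    return rfd / intake


# Values quoted in the text: RfD and (lowest, highest) estimated intake.
RFD = {"DEHP": 35.0, "BBP": 500.0}
INTAKE_RANGE = {"DEHP": (0.03, 242.5), "BBP": (0.06, 0.6)}

for name, (lowest, highest) in INTAKE_RANGE.items():
    at_lowest = rfd_to_intake_ratio(RFD[name], lowest)
    at_highest = rfd_to_intake_ratio(RFD[name], highest)
    print(f"{name}: RfD/intake = {at_lowest:.0f} at the lowest intake, "
          f"{at_highest:.2f} at the highest intake")
```

For DEHP the ratio at the highest intake falls below 1, which corresponds to the intake being about seven times the RfD.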

The four phthalates for which we found data are known to affect male reproductive development due to their anti-androgenic properties which are the basis of their regulation. However, other systems are also affected at exposure levels similar to those associated with anti-androgenicity as seen in Table 2 . Our analysis shows the 10 lowest estimated intakes were significantly associated with endpoints measured in women and children. Many of these endpoints relate to endocrine function and neurobehavioral development in children as well as female reproductive system (Table  3 ).

Prenatal exposures to DEHP, DBP, BBP and DIBP were significantly associated with a diverse set of negative outcomes in the neurological system, and all endpoints were associated with intakes well below the RfD for each phthalate. Supplemental Table S2 shows that children born to mothers exposed to phthalates during pregnancy display delayed psychomotor and mental development [ 65 , 66 ]; decreased intellectual, memory and executive function development [ 36 ]; and behavioral changes associated with both delinquency and externalization [ 64 ] as well as withdrawn personalities and internalization of problems [ 65 , 68 ]. Increased odds of attention deficit hyperactivity disorder [ 69 ] and decreased masculine behavior in boys [ 52 ] were also observed.

We identified three major systems associated with metabolic function that were affected by phthalates: thyroid, pancreas, and fat tissue (Supplemental Table S3 ). DEHP, DBP and BBP were associated with decreased levels of triiodothyronine (T3) in men and children as young as 4 years of age. DEHP was also associated with decreased levels of free T4 in women [ 56 ] and men [ 82 ] and decreased thyroid stimulating hormone (TSH) in men [ 47 ].

DIBP and DEHP intakes were positively associated with insulin resistance in children [ 72 , 78 ] and in men and women [ 60 , 77 ]. The effect of DEHP on fat tissue was more diverse. For instance, in adults, body mass index (BMI) was negatively associated with DEHP levels in men and women [ 59 ], while Hatch et al. [ 51 ] reported a positive correlation in women. Maternal DEHP levels were inversely associated with their daughters’ BMI at a young age (4–7 years) [ 67 ], and Zang and colleagues also observed a negative association between DEHP levels and obesity in 8–10-year-old girls [ 74 ]. DBP and BBP showed a positive correlation with obesity in boys [ 74 ] and with BMI and waist circumference in women and men [ 51 ].

All the estimated intakes were below their respective RfDs, except for the reduction in TSH level in men that was associated with the highest DEHP intake of 242.55 μg/kg-bw/d [ 47 ].

Both the male and female reproductive systems, and their associated hormones, were negatively affected by the four phthalates (Supplemental Table S4 ). DEHP, DBP and DIBP intakes were associated with a reduced number of antral follicles in women [ 39 ], and DEHP, DBP and BBP with delayed puberty in girls [ 73 ]. DEHP and DBP were associated with a decreased number of fertilized eggs and total oocytes, and with lower oocyte quality [ 57 ]. DEHP and DIBP showed a negative association with trophoblast differentiation genes [ 58 ]. DEHP was also associated with decreased levels of inhibin [ 61 ], a critical hormone in reproductive functions [ 85 ], and showed an inconsistent association with gestational length [ 62 , 63 ].

In adult men, DEHP, DBP and BBP all had a negative association with semen quality including concentration and sperm motility [ 47 , 48 , 79 ]. DEHP was associated with decreased total and free testosterone and estradiol, as well as increased levels of SHBG [ 80 ]. DEHP also had a positive association with testosterone/estradiol ratio [ 81 ]. In boys gestationally exposed to known levels of phthalates, DEHP and DIBP were negatively associated with free and total testosterone and estradiol [ 50 ]. DEHP, DBP and BBP were associated with increased SHBG. DBP was associated with decreased levels of dehydroepiandrosterone [ 50 ]. Finally, DEHP was also associated with reduced anogenital distance in boys [ 70 ].

This case study shows that low dose exposures to BBP, DBP, DIBP and DEHP are associated with health endpoints in organs and systems not usually assessed in regulatory toxicology studies. These endpoints differ markedly from the well-studied effects of phthalates on male reproductive development. Furthermore, estimated intake levels lower than the current RfDs are associated with significant physiological effects (i.e., early biological perturbations that may lead to overt effects) and with disorders that may require clinical interventions later in life. We also observed that some individuals appear to be exposed to levels of DEHP higher than its RfD. This may be the case if there are yet-to-be-identified exposure routes and sources, or if the metabolism or excretion of DEHP is altered. Overall, these data, although with limitations, show weaknesses in a chemical regulation framework that is in need of improvement.

This study has several limitations. First, it is similar to a mapping of evidence; it is not a systematic review, which must follow stricter protocols and methods. Second, our approach aimed to capture as many publications as possible. However, although we used broad search terms, we may still have missed relevant publications. Third, we trust the integrity and quality of the journal peer review conducted for each of the studies we included. However, we understand the peer review process is not perfect. An example of this less-than-perfect process is the lack of clarity or data that prevented us from including an additional 26 human studies, as shown in Table S1. Importantly, only six studies were excluded because of the lack of statistical significance; hence, the body of evidence is consistent with the associations. Fourth, in some cases, data interpretation had to be based on the information that was available. Although we contacted authors of some of the studies that did not meet our criteria to obtain additional data, only a few responded to our request and were willing to share additional data. Fifth, the number of subjects in the studies varied from fewer than 100 to thousands of people; although the population size as such could be a limitation, strong and weak statistical significance was observed in all cases. As all but one study was cross-sectional, we are mindful about implying that they show causality. Lastly, some assumptions made in our calculations may have been outdated. For example, the EPA handbook on exposure is from 2011. Although it is our understanding that the agency and others continue to use this handbook in their analyses, we cannot rule out that parameters such as body weight by age range may have changed in the last decade and could have affected our estimates.

Overall, the case study we present here specifically aimed to use strong human data to perform a first examination of a hypothesis, namely that the current animal-based testing methods to estimate “safe” exposure levels of chemicals could be significantly underestimating actual human health risk if epidemiological data are not considered. Following the initial confirmatory findings presented here, this hypothesis will serve as a basis to guide further testing and more detailed assessments in a follow-up work.

The protection of public health from detrimental effects of environmental chemical exposures should ideally incorporate the expertise from two sides: the risk assessors and the healthcare community, including epidemiologists. On the one hand, risk assessment relies on evaluating exposure to a chemical and using animal models to identify which organ(s) would be affected, in order to find a dose that would cause no harm. On the other hand, the medical community is confronted with a wide range of health outcomes in the human population—from acute to chronic and from subtle to clinically defined—and tries to identify what caused them, whether environmental chemical exposure or otherwise, in order to support prevention. But there is a disconnect between these bookends of environmental health which hinders effective protection of the public from chemical exposures. In 2017, the US National Academy of Sciences [ 86 ] recommended that for evaluating evidence of low dose effects, regulators should surveil for signals indicating an adverse outcome in a human population or evidence that a particular low dose effect may not be detectable with traditional toxicity testing. The authors stated that one way to seek out information is by conducting regular surveys of the scientific literature. Our limited case study of five phthalates shows that many of the health effects observed to occur in humans at very low exposure levels are not traditionally evaluated in animal toxicology testing. Metabolic, neurodevelopmental and behavioral disorders, obesity, levels of hormones and transport proteins are just a few examples of endpoints not commonly included in toxicity testing guidelines despite their relevance to human health. It is also important to point out that traditional toxicology studies only infrequently evaluate a dose-effect relationship using chemical levels relevant to human exposures occurring at different life-stages. 
Rather, assumptions of safe levels are commonly made based on adult non-pregnant animal data. These omissions thus result in significant gaps in chemicals regulation that may put human health at risk [ 87 ].

The current chemical risk assessment approach used by most regulatory agencies around the world to establish an RfD combines a dose that did not cause an adverse effect in animal studies, which use high exposure doses, with safety factors (also known as uncertainty factors) to account for incomplete data and variability between and within species. Regulatory ‘safe’ levels have been reviewed, although not routinely. For example, ECHA lowered the derived no-effect level for DIBP from 420 to 8.3 μg/kg bw/d in 2016 [ 25 ]; similarly, the European Food Safety Authority (EFSA) lowered the tolerable daily intake of bisphenol A from 50 to 4 μg/kg bw/d in 2015 [ 88 ]. In both cases, new scientific information was available at the time the agencies were responding to requests for reassessment of those chemicals. However, we would argue that, in addition to specific requests made to regulatory agencies, a more systematic reevaluation of RfDs could be incorporated into the risk assessment and management processes. For example, a post-market RfD reassessment could be triggered by 1) human studies showing associations between exposure and endpoints previously not measured; 2) information on reported uses or biomonitoring indicating increased exposures due to chemical production volumes or reduced exposure due to abandoned uses; or 3) new hazard information. Lastly, this information surveillance should not be the exclusive responsibility of the regulatory agencies; rather, companies with approved chemical uses should submit newly available information that could raise questions about the safety of their product, and agencies should establish a mechanism to enforce this requirement.
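The arithmetic behind an RfD and the three proposed reassessment triggers can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function and class names are hypothetical, the uncertainty factors of 10 and the 5 mg/kg bw/d no-effect dose are made-up example values, and actual agency procedures involve considerably more judgment than a simple division.

```python
from dataclasses import dataclass
from math import prod
from typing import List, Sequence


def reference_dose(point_of_departure: float,
                   uncertainty_factors: Sequence[float]) -> float:
    """Divide a no-adverse-effect dose by the product of the uncertainty factors.

    The result has the same units as the point of departure.
    """
    if point_of_departure <= 0:
        raise ValueError("point of departure must be positive")
    if any(uf < 1 for uf in uncertainty_factors):
        raise ValueError("uncertainty factors are expected to be >= 1")
    return point_of_departure / prod(uncertainty_factors)


@dataclass
class SurveillanceFindings:
    """Hypothetical record of what a periodic literature and data survey found."""
    new_human_endpoints: bool = False      # trigger 1: associations with unmeasured endpoints
    exposure_changed: bool = False         # trigger 2: uses or biomonitoring show a shift
    new_hazard_information: bool = False   # trigger 3: new hazard data


def reassessment_triggers(findings: SurveillanceFindings) -> List[str]:
    """Return the names of the triggers that fired; an empty list means none did."""
    triggers = []
    if findings.new_human_endpoints:
        triggers.append("new human endpoints")
    if findings.exposure_changed:
        triggers.append("exposure change")
    if findings.new_hazard_information:
        triggers.append("new hazard information")
    return triggers


# Assumed example: a 5 mg/kg bw/d no-effect dose in animals, with one factor of 10
# for between-species and one factor of 10 for within-species variability.
example_rfd = reference_dose(5.0, [10.0, 10.0])
example_triggers = reassessment_triggers(SurveillanceFindings(new_human_endpoints=True))
```

In this sketch any single trigger is enough to flag a chemical for review, which mirrors the "or" in the list above; a real scheme might weight or combine the triggers differently.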

Both scientific information and market behavior are dynamic. Advances in science and technology allow scientists to develop new methods to measure chemicals in humans and to gain new knowledge and understanding of chemicals’ interactions with physiological systems at different life stages. To account for these developments, epidemiological and clinical studies, together with chemical biomonitoring data, should be evaluated at regular intervals, as recommended by the NAS [ 86 ], in order to check whether an RfD review is warranted to better protect public health. We are cognizant that this approach, although promising, is not without shortcomings. For instance, biomonitoring data alone cannot account for all sources of exposure. For chemicals like phthalates, with many sources ranging from the diet to personal care products and house dust, it may be challenging to design mitigation strategies that reduce the most significant sources of exposure. However, well-designed surveys and a better understanding of materials’ composition may help identify the major exposure sources for various populations, as described by Lioy and colleagues [ 89 ].

As implied earlier, the RfD represents a concept of ‘safety’: a bright line between ‘no risk’ or ‘safe’ when the exposure estimate is below the established number and ‘risk’ or ‘unsafe’ when the value is greater than the RfD. In reality, the situation is far more complicated, because chemical hazard information and populations’ background exposures from multiple chemicals, health conditions and life-stages change with time. In its 2009 Science and Decisions report [ 90 ], the NAS recognized this complexity and recommended a progression away from the current concept of ‘safety’ and towards dose-response methods that quantify risk at doses used in animal experiments as well as at lower doses representing human exposures. As much as two-thirds of the human population suffers from chronic diseases that cannot be explained by genetic causes alone [ 91 ], and it is becoming increasingly apparent that life-long chemical exposures can contribute to this burden [ 5 ]. Yet the great majority of chemicals are not evaluated for their contribution to common chronic ailments in the human population [ 92 , 93 ]. As a consequence, the current work on toxicology and epidemiology is inundated with disconnected data that misses the bigger picture: better protection of the entire human population’s health. Perhaps it is time to reconsider the status quo to ensure adequate population health protection. Issues to be interrogated may include, among others, strategies for proper assessment of the risk of developmental exposures; use of early biomarkers of health effects; integration of evidence from different data streams including predictive modeling, in vitro, animals and humans; development of new and redesign of old testing protocols; optimization of in vitro testing to minimize the use of laboratory animals; and design of protocols to more efficiently monitor human exposures.
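The ‘bright line’ reading of an RfD amounts to comparing an exposure estimate with a single number. The sketch below makes that explicit and shows how the same exposure changes label when the reference value is revised. It uses the two DIBP derived no-effect levels quoted earlier (420 and 8.3 μg/kg bw/d) as stand-ins for an RfD; the exposure estimate of 20 μg/kg bw/d and the function names are hypothetical and chosen only for illustration.

```python
def hazard_quotient(exposure: float, reference_value: float) -> float:
    """Ratio of an exposure estimate to a reference value in the same units."""
    if reference_value <= 0:
        raise ValueError("reference value must be positive")
    return exposure / reference_value


def bright_line_label(exposure: float, reference_value: float) -> str:
    """Apply the binary reading: above the line or not."""
    if hazard_quotient(exposure, reference_value) > 1.0:
        return "above reference value"
    return "at or below reference value"


# Hypothetical exposure estimate in ug/kg bw/d (not taken from the study).
exposure_estimate = 20.0

# DIBP derived no-effect levels before and after the revision cited in the text.
label_before_revision = bright_line_label(exposure_estimate, 420.0)
label_after_revision = bright_line_label(exposure_estimate, 8.3)
```

With these assumed inputs the first label is "at or below reference value" and the second is "above reference value", even though the exposure itself is unchanged. A binary label of this kind says nothing about how risk varies with dose, which is the limitation the dose-response methods recommended by the NAS are meant to address.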

Conclusions

Phthalates have been used in many products for many decades. There are substantial animal and human data available which allowed us to use these substances in case studies such as this one. However, a similar question could be raised for many other chemicals with a growing body of human biomonitoring data and evidence of human health effects [ 94 ].

To set the course for a better, more efficient and health protective risk assessment of chemicals, a dialogue should be established between risk assessors, the medical community, and academic researchers. Until a profound modernization of the risk assessment and management of chemicals occurs, human studies should be taken into account to identify whether the health risk of chemicals already in the marketplace, such as phthalates, should be reassessed.

Abbreviations

First trimester

Second trimester

Third trimester

Acceptable daily intake

Benzyl butyl phthalate

Body mass index

Body weight

Chemical Abstracts Service

Consumer Product Safety Commission

Dibutyl phthalate

Dicyclohexyl phthalate

Bis(2-ethylhexyl) phthalate

Diisobutyl phthalate

Derived no-effect level

European Chemicals Agency

European Food Safety Authority

Environmental Protection Agency

Homeostatic model assessment of insulin resistance

Intelligence quotient

Non-communicable disease

National Health and Nutrition Examination Survey

Registration, Evaluation, Authorisation and Restriction of Chemicals

Reference dose

Sex hormone-binding globulin

Triiodothyronine

Tolerable daily intake

Thyroid stimulating hormone

United States

World Health Organization: Global status report on noncommunicable diseases. 2014.

Landrigan PJ, Fuller R, Acosta NJR, Adeyi O, Arnold R, Basu N, et al. The lancet commission on pollution and health. Lancet. 2017;391:462–512.

Chen S, Kuhn M, Prettner K, Bloom DE. The macroeconomic burden of noncommunicable diseases in the United States: estimates and projections. PLoS One. 2018;13(11):e0206702.

Balbus JM, Barouki R, Birnbaum LS, Etzel RA, Gluckman PD, Grandjean P, et al. Early-life prevention of non-communicable diseases. Lancet. 2013;381(9860):3–4.

Shaffer RM, Sellers SP, Baker MG, de Buen KR, Frostad J, Suter MK, et al. Improving and expanding estimates of the global burden of disease due to environmental health risk factors. Environ Health Perspect. 2019;127(10):105001.

Muncke J, Backhaus T, Geueke B, Maffini MV, Martin OV, Myers JP, et al. Scientific challenges in the risk assessment of Food contact materials. Environ Health Perspect. 2017;125(9):095001.

PlasticsEurope: Plastics - the Facts 2019. 2019.

Kawagoshi Y, Tsukagoshi Y, Fukunaga I. Determination of estrogenic activity in landfill leachate by simplified yeast two-hybrid assay. JEM. 2002;4(6):1040–6.

Biedermann-Brem S, Biedermann M, Pfenninger S, Bauer M, Altkofer W, Rieger K, et al. Plasticizers in PVC toys and childcare products: what succeeds the phthalates? Market survey 2007. Chromatographia. 2008;68(3):227–34.

Jambeck JR, Geyer R, Wilcox C, Siegler TR, Perryman M, Andrady A, et al. Plastic waste inputs from land into the ocean. Science. 2015;347(6223):768–71.

Groh KJ, Backhaus T, Carney-Almroth B, Geueke B, Inostroza PA, Lennquist A, et al. Overview of known plastic packaging-associated chemicals and their hazards. Sci Total Environ. 2019;651(Pt 2):3253–68.

United Nations Economic Commission for Europe: Globally harmonized system of classification and labelling of chemicals (GHS). 2015. https://unece.org/about-ghs .

Geueke B, Inostroza PA, Maffini M, Backhaus T, Carney-Almroth B, Groh KJ, et al. Prioritization approaches for hazardous chemicals associated with plastic packaging. Food Packaging Forum Zurich. 2018:1–14.

Nerin C, Canellas E, Vera P. Plasticizer migration into foods. In: Reference Module in Food Science. Amsterdam: Elsevier; 2018.

Frederiksen H, Nielsen O, Koch HM, Skakkebaek NE, Juul A, Jørgensen N, et al. Changes in urinary excretion of phthalates, phthalate substitutes, bisphenols and other polychlorinated and phenolic substances in young Danish men; 2009–2017. Int J Hyg Environ Health. 2020;223(1):93–105.

Husøy T, Martínez MA, Sharma RP, Kumar V, Andreassen M, Sakhi AK, et al. Comparison of aggregated exposure to di(2-ethylhexyl) phthalate from diet and personal care products with urinary concentrations of metabolites using a PBPK model - results from the Norwegian biomonitoring study in EuroMix. Food Chem Toxicol. 2020;143:111510.

Geueke B, Muncke J. Substances of very high concern in food contact materials: migration and regulatory background. Packag Technol Sci. 2018;31(12):757–69.

Van Holderbeke M, Geerts L, Vanermen G, Servaes K, Sioen I, De Henauw S, et al. Determination of contamination pathways of phthalates in food products sold on the Belgian market. Environ Res. 2014;134:345–52.

Schecter A, Lorber M, Guo Y, Wu Q, Yun SH, Kannan K, et al. Phthalate concentrations and dietary exposure from food purchased in New York state. Environ Health Perspect. 2013;121(4):473–94.

Guart A, Bono-Blay F, Borrell A, Lacorte S. Effect of bottling and storage on the migration of plastic constituents in Spanish bottled waters. Food Chem. 2014;156:73–80.

Koch HM, Lorber M, Christensen KL, Pälmke C, Koslitz S, Brüning T. Identifying sources of phthalate exposure with human biomonitoring: results of a 48 h fasting study with urine collection and personal activity patterns. Int J Hyg Environ Health. 2013;216(6):672–81.

Hammel SC, Levasseur JL, Hoffman K, Phillips AL, Lorenzo AM, Calafat AM, et al. Children's exposure to phthalates and non-phthalate plasticizers in the home: the TESIE study. Environ Int. 2019;132:105061.

Consumer Product Safety Commission. Prohibition of Children's Toys and Child Care Articles Containing Specified Phthalates. In: 16 CFR 1307. United States: Federal Register; 2017. p. 49938–82.

EFSA Panel on Food Contact Materials, Enzymes and Processing Aids (CEP), Silano V, Barat Baviera JM, Bolognesi C, Chesson A, Cocconcelli PS, et al. Update of the risk assessment of di-butylphthalate (DBP), butyl-benzyl-phthalate (BBP), bis (2-ethylhexyl) phthalate (DEHP), di-isononylphthalate (DINP) and di-isodecylphthalate (DIDP) for use in food contact materials. EFSA J. 2019;17(12):e05838.

European Chemical Agency Committee for Risk Assessment: Opinion on an Annex XV dossier proposing restrictions on FOUR PHTHALATES (DEHP, BBP, DBP, DIBP). ECHA/RAC/RES-O-0000001412-86-140/F. 2017.

European Commission: Registration, evaluation, authorisation and restriction of chemicals (REACH). EC 1907/2006; 2006.

Joint FAO/WHO Expert Committee on Food Additives: Principles and methods for the risk assessment of chemicals in food. Environmental Health Criteria 2009.

European Food Safety Authority. Glossary [ https://www.efsa.europa.eu/en/glossary-taxonomy-terms ].

US Environmental Protection Agency Integrated Risk Information System (IRIS). Glossary [ https://sor.epa.gov/sor_internet/registry/termreg/searchandretrieve/glossariesandkeywordlists/search.do?details=&glossaryName=IRIS%20Glossary ].

Consumer Product Safety Improvement Act of 2008. 122 STAT 3016 2008.

ECHA Authorization List [ https://echa.europa.eu/authorisation-list ].

Zota AR, Calafat AM, Woodruff TJ. Temporal trends in phthalate exposures: findings from the National Health and nutrition examination survey, 2001-2010. Environ Health Perspect. 2014;122(3):235–41.

Schwedler G, Rucic E, Lange R, Conrad A, Koch HM, Pälmke C, et al. Phthalate metabolites in urine of children and adolescents in Germany. Human biomonitoring results of the German Environmental Survey GerES V, 2014–2017. Int J Hyg Environ Health. 2020;225:113444.

Radke EG, Galizia A, Thayer KA, Cooper GS. Phthalate exposure and metabolic effects: a systematic review of the human epidemiological evidence. Environ Int. 2019;132:104768.

Radke EG, Braun JM, Meeker JD, Cooper GS. Phthalate exposure and male reproductive outcomes: a systematic review of the human epidemiological evidence. Environ Int. 2018;121(Pt 1):764–93.

Factor-Litvak P, Insel B, Calafat AM, Liu X, Perera F, Rauh VA, et al. Persistent associations between maternal prenatal exposure to phthalates on child IQ at age 7 years. PLoS One. 2014;9(12):e114003.

Radke EG, Braun JM, Nachman RM, Cooper GS. Phthalate exposure and neurodevelopment: a systematic review and meta-analysis of human epidemiological evidence. Environ Int. 2020;137:105408.

James-Todd TM, Chiu YH, Messerlian C, Mínguez-Alarcón L, Ford JB, Keller M, et al. Trimester-specific phthalate concentrations and glucose levels among women from a fertility clinic. Environ Health. 2018;17(1):55.

Messerlian C, Souter I, Gaskins AJ, Williams PL, Ford JB, Chiu YH, et al. Urinary phthalate metabolites and ovarian reserve among women seeking infertility care. Hum Reprod. 2016;31(1):75–83.

US Environmental Protection Agency: Exposure factors handbook 2011 edition (final report). Washington, DC 2011.

Barr DB, Wilder LC, Caudill SP, Gonzalez AJ, Needham LL, Pirkle JL. Urinary creatinine concentrations in the U.S. population: implications for urinary biologic monitoring measurements. Environ Health Perspect. 2005;113(2):192–200.

Koch HM, Bolt HM, Preuss R, Angerer J. New metabolites of di(2-ethylhexyl)phthalate (DEHP) in human urine and serum after single oral doses of deuterium-labelled DEHP. Arch Toxicol. 2005;79(7):367–76.

Anderson WA, Castle L, Hird S, Jeffery J, Scotter MJ. A twenty-volunteer study using deuterium labelling to determine the kinetics and fractional excretion of primary and secondary urinary metabolites of di-2-ethylhexylphthalate and di-iso-nonylphthalate. Food Chem Toxicol. 2011;49(9):2022–9.

US Food and Drug Administration. Cosmetic Ingredients: Phthalates [ https://www.fda.gov/cosmetics/cosmetic-ingredients/phthalates ].

US Environmental Protection Agency. Assessing and Managing Chemicals under TSCA: Risk Management for Phthalates [ https://www.epa.gov/assessing-and-managing-chemicals-under-tsca/risk-management-phthalates ].

Neal-Kluever A, Aungst J, Gu Y, Hatwell K, Muldoon-Jacobs K, Liem A, et al. Infant toxicology: state of the science and considerations in evaluation of safety. Food Chem Toxicol. 2014;70:68–83.

Wang YX, Zhou B, Chen YJ, Liu C, Huang LL, Liao JQ, et al. Thyroid function, phthalate exposure and semen quality: exploring associations and mediation effects in reproductive-aged men. Environ Int. 2018;116:278–85.

Hauser R, Meeker JD, Duty S, Silva MJ, Calafat AM. Altered semen quality in relation to urinary concentrations of phthalate monoester and oxidative metabolites. Epidemiology. 2006;17(6):682–91.

Huang PC, Kuo PL, Guo YL, Liao PC, Lee CC. Associations between urinary phthalate monoesters and thyroid hormones in pregnant women. Hum Reprod. 2007;22(10):2715–22.

Ferguson KK, Peterson KE, Lee JM, Mercado-García A, Blank-Goldenberg C, Téllez-Rojo MM, et al. Prenatal and peripubertal phthalates and bisphenol a in relation to sex hormones and puberty in boys. Reprod Toxicol. 2014;47:70–6.

Hatch EE, Nelson JW, Qureshi MM, Weinberg J, Moore LL, Singer M, et al. Association of urinary phthalate metabolite concentrations with body mass index and waist circumference: a cross-sectional study of NHANES data, 1999-2002. Environ Health. 2008;7:27.

Swan SH, Liu F, Hines M, Kruse RL, Wang C, Redmon JB, et al. Prenatal phthalate exposure and reduced masculine play in boys. Int J Androl. 2010;33(2):259–69.

Soomro MH, Baiz N, Philippat C, Vernet C, Siroux V, Nichole Maesano C, et al. Prenatal exposure to phthalates and the development of eczema phenotypes in male children: results from the EDEN mother-child cohort study. Environ Health Perspect. 2018;126(2):027002.

US Consumer Product Safety Commission: Chronic Hazard Advisory Panel on Phthalates and Phthalates Alternatives. cpsc.gov/chap ; 2014.

Huang PC, Tsai CH, Liang WY, Li SS, Huang HB, Kuo PL. Early phthalates exposure in pregnant women is associated with alteration of thyroid hormones. PLoS One. 2016;11(7):e0159398.

Johns LE, Ferguson KK, Soldin OP, Cantonwine DE, Rivera-González LO, Del Toro LV, et al. Urinary phthalate metabolites in relation to maternal serum thyroid and sex hormone levels during pregnancy: a longitudinal analysis. Reprod Biol Endocrinol. 2015;13:4.

Machtinger R, Gaskins AJ, Racowsky C, Mansur A, Adir M, Baccarelli AA, et al. Urinary concentrations of biomarkers of phthalates and phthalate alternatives and IVF outcomes. Environ Int. 2018;111:23–31.

Adibi JJ, Whyatt RM, Hauser R, Bhat HK, Davis BJ, Calafat AM, et al. Transcriptional biomarkers of steroidogenesis and trophoblast differentiation in the placenta in relation to prenatal phthalate exposure. Environ Health Perspect. 2010;118(2):291–6.

Yaghjyan L, Sites S, Ruan Y, Chang SH. Associations of urinary phthalates with body mass index, waist circumference and serum lipids among females: National Health and nutrition examination survey 1999-2004. Int J Obes. 2015;39(6):994–1000.

Kim JH, Park HY, Bae S, Lim YH, Hong YC. Diethylhexyl phthalates is associated with insulin resistance via oxidative stress in the elderly: a panel study. PLoS One. 2013;8(8):e71392.

Du YY, Guo N, Wang YX, Hua X, Deng TR, Teng XM, et al. Urinary phthalate metabolites in relation to serum anti-Müllerian hormone and inhibin B levels among women from a fertility center: a retrospective analysis. Reprod Health. 2018;15(1):33.

Boss J, Zhai J, Aung MT, Ferguson KK, Johns LE, McElrath TF, et al. Associations between mixtures of urinary phthalate metabolites with gestational age at delivery: a time to event analysis using summative phthalate risk scores. Environ Health. 2018;17(1):56.

Adibi JJ, Hauser R, Williams PL, Whyatt RM, Calafat AM, Nelson H, et al. Maternal urinary metabolites of Di-(2-Ethylhexyl) phthalate in relation to the timing of labor in a US multicenter pregnancy cohort study. Am J Epidemiol. 2009;169(8):1015–24.

Huang HB, Kuo PH, Su PH, Sun CW, Chen WJ, Wang SL. Prenatal and childhood exposure to phthalate diesters and neurobehavioral development in a 15-year follow-up birth cohort study. Environ Res. 2019;172:569–77.

Whyatt RM, Liu X, Rauh VA, Calafat AM, Just AC, Hoepner L, et al. Maternal prenatal urinary phthalate metabolite concentrations and child mental, psychomotor, and behavioral development at 3 years of age. Environ Health Perspect. 2012;120(2):290–5.

Kim Y, Ha EH, Kim EJ, Park H, Ha M, Kim JH, et al. Prenatal exposure to phthalates and infant development at 6 months: prospective mothers and Children's environmental health (MOCEH) study. Environ Health Perspect. 2011;119(10):1495–500.

Buckley JP, Engel SM, Braun JM, Whyatt RM, Daniels JL, Mendez MA, et al. Prenatal phthalate exposures and body mass index among 4- to 7-year-old children: a pooled analysis. Epidemiology. 2016;27(3):449–58.

Philippat C, Nakiwala D, Calafat AM, Botton J, De Agostini M, Heude B, et al. Prenatal exposure to nonpersistent endocrine disruptors and behavior in boys at 3 and 5 years. Environ Health Perspect. 2017;125(9):097014.

Engel SM, Villanger GD, Nethery RC, Thomsen C, Sakhi AK, Drover SSM, et al. Prenatal phthalates, maternal thyroid function, and risk of attention-deficit hyperactivity disorder in the Norwegian mother and child cohort. Environ Health Perspect. 2018;126(5):057004.

Swan SH, Sathyanarayana S, Barrett ES, Janssen S, Liu F, Nguyen RH, et al. First trimester phthalate exposure and anogenital distance in newborns. Hum Reprod. 2015;30(4):963–72.

Boas M, Frederiksen H, Feldt-Rasmussen U, Skakkebæk NE, Hegedüs L, Hilsted L, et al. Childhood exposure to phthalates: associations with thyroid function, insulin-like growth factor I, and growth. Environ Health Perspect. 2010;118(10):1458–64.

Smerieri A, Testa C, Lazzeroni P, Nuti F, Grossi E, Cesari S, et al. Di-(2-ethylhexyl) phthalate metabolites in urine show age-related changes and associations with adiposity and parameters of insulin sensitivity in childhood. PLoS One. 2015;10(2):e0117831.

Kasper-Sonnenberg M, Wittsiepe J, Wald K, Koch HM, Wilhelm M. Pre-pubertal exposure with phthalates and bisphenol a and pubertal development. PLoS One. 2017;12(11):e0187922.

Zhang Y, Meng X, Chen L, Li D, Zhao L, Zhao Y, et al. Age and sex-specific relationships between phthalate exposures and obesity in Chinese children at puberty. PLoS One. 2014;9(8):e104852.

Trasande L, Sathyanarayana S, Trachtman H. Dietary phthalates and low-grade albuminuria in US children and adolescents. Clin J Am Soc Nephrol. 2014;9(1):100–9.

Trasande L, Sathyanarayana S, Spanier AJ, Trachtman H, Attina TM, Urbina EM. Urinary phthalates are associated with higher blood pressure in childhood. J Pediatr. 2013;163(3):747–753.e741.

Meeker JD, Ferguson KK. Relationship between urinary phthalate and bisphenol a concentrations and serum thyroid measures in U.S. adults and adolescents from the National Health and nutrition examination survey (NHANES) 2007-2008. Environ Health Perspect. 2011;119(10):1396–402.

Trasande L, Spanier AJ, Sathyanarayana S, Attina TM, Blustein J. Urinary phthalates and increased insulin resistance in adolescents. Pediatrics. 2013;132(3):e646–55.

Duty SM, Silva MJ, Barr DB, Brock JW, Ryan L, Chen Z, et al. Phthalate exposure and human semen parameters. Epidemiology. 2003;14(3):269–77.

Mendiola J, Meeker JD, Jørgensen N, Andersson AM, Liu F, Calafat AM, et al. Urinary concentrations of di(2-ethylhexyl) phthalate metabolites and serum reproductive hormones: pooled analysis of fertile and infertile men. J Androl. 2012;33(3):488–98.

Meeker JD, Calafat AM, Hauser R. Urinary metabolites of di(2-ethylhexyl) phthalate are associated with decreased steroid hormone levels in adult men. J Androl. 2009;30(3):287–97.

Meeker JD, Calafat AM, Hauser R. Di(2-ethylhexyl) phthalate metabolites may alter thyroid hormone levels in men. Environ Health Perspect. 2007;115(7):1029–34.

Centers for Disease Control and Prevention: Fourth National Report on Human Exposure to Environmental Chemicals. 2019.

Hammond GL. Diverse roles for sex hormone-binding globulin in reproduction. Biol Reprod. 2011;85(3):431–41.

Luisi S, Florio P, Reis FM, Petraglia F. Inhibins in female and male reproductive physiology: role in gametogenesis, conception, implantation and early pregnancy. Hum Reprod Update. 2005;11(2):123–35.

National Academies of Sciences, Engineering, and Medicine: Application of systematic review methods in an overall strategy for evaluating low-dose toxicity from endocrine active chemicals. 2017.

Maffini M, Vandenberg L. Closing the gap: improving additives safety evaluation to reflect human health concerns. Environ Risk Assess Remediat. 2017;1(3):26–33.

EFSA Panel on Food Contact Materials, Enzymes, Flavourings and Processing Aids (CEF). Scientific opinion on the risks to public health related to the presence of bisphenol a (BPA) in foodstuffs. EFSA J. 2015;13(1):3978.

Lioy PJ, Hauser R, Gennings C, Koch HM, Mirkes PE, Schwetz BA, et al. Assessment of phthalates/phthalate alternatives in children's toys and childcare articles: review of the report including conclusions and recommendation of the chronic Hazard advisory panel of the consumer product safety commission. J Expo Sci Environ Epidemiol. 2015;25(4):343–53.

National Academy of Sciences: Science and Decisions - Advancing Risk Assessment. National Academies of Sciences, Engineering, and Medicine; 2009.

Rappaport SM, Smith MT. Epidemiology. Environment and disease risks. Science. 2010;330(6003):460–1.

US Food and Drug Administration: Guidance for Industry and Other Stakeholders: Redbook 2000. Toxicological Principles for the Safety Assessment of Food Ingredients 2007.

European Chemical Agency: Framework, Read-Across Assessment. 2017.

Madia F, Worth A, Whelan M, Corvi R. Carcinogenicity assessment: addressing the challenges of cancer and chemicals in the environment. Environ Int. 2019;128:417–29.

Acknowledgments

The authors are grateful to Dr. Leonardo Trasande for his expert advice and guidance and to those investigators who shared detailed data not included in the public versions of their publications.

This work was funded in part by a grant from MAVA Foundation and by the Food Packaging Forum (FPF). BG and JM are employees of FPF. FPF receives unconditional donations for unrestricted funding, as well as project-related grants, and all funding sources are listed on its website (https://www.foodpackagingforum.org/about-us/funding). Neither the board of FPF nor MAVA Foundation interfered with the authors’ freedom to design, conduct, interpret and publish this information.

Author information

Authors and affiliations.

Independent Consultant, Frederick, MD, USA

Maricel V. Maffini

Food Packaging Forum Foundation, Zurich, Switzerland

Birgit Geueke & Jane Muncke

Eawag, Swiss Federal Institute of Aquatic Science and Technology, Dübendorf, Switzerland

Ksenia Groh

Department of Biological and Environmental Sciences, University of Gothenburg, Gothenburg, Sweden

Bethanie Carney Almroth

Contributions

MVM: Conceptualization, Methodology, Investigation and Original draft. BG: Validation, Visualization, Review and Editing. KG: Validation, Review and Editing. BCA: Review and Editing. JM: Conceptualization, Review and Editing, Project administration, Funding acquisition. The author(s) read and approved the final manuscript.

Corresponding author

Correspondence to Maricel V. Maffini.

Ethics declarations

Competing interests.

MVM and KG are members of FPF science advisory board. MVM is a co-author on a petition to the US Food and Drug Administration to revoke the authorizations to use phthalates in food packaging and processing equipment.

Additional information

Publisher’s note.

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Additional file 1.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/ . The Creative Commons Public Domain Dedication waiver ( http://creativecommons.org/publicdomain/zero/1.0/ ) applies to the data made available in this article, unless otherwise stated in a credit line to the data.

About this article

Cite this article.

Maffini, M.V., Geueke, B., Groh, K. et al. Role of epidemiology in risk assessment: a case study of five ortho-phthalates. Environ Health 20, 114 (2021). https://doi.org/10.1186/s12940-021-00799-8

Received: 09 November 2020

Accepted: 18 October 2021

Published: 15 November 2021

DOI: https://doi.org/10.1186/s12940-021-00799-8

  • Risk assessment
  • Epidemiology
  • Human health

Environmental Health

ISSN: 1476-069X

Devin Bowes uses wastewater-based epidemiology to advance environmental justice, health equity

April 24, 2023 | Erin Bluvas

Devin Bowes’ expertise lies at the intersection of human and environmental health. The assistant professor of environmental health sciences first became interested in the field when she was studying nutrition and dietetics at West Chester University of Pennsylvania.

“I discovered my passion for these areas during my undergraduate studies when I first started to learn about community nutrition and food insecurity,” Bowes says. “I grew very fascinated with how our overall health and well-being is, in part, a function of our surrounding environment, and how these circumstances disproportionately impact certain populations.”

During her doctoral program at Arizona State University, she expanded on these interests as a graduate research assistant with the Biodesign Centers for Environmental Health Engineering and Health Through Microbiomes. These experiences, coupled with her coursework in biological design, led Bowes to develop expertise in leveraging community wastewater to understand human behavior, exposures, activity and overall health at the population level.

This emerging, interdisciplinary field – which became more widely known during the COVID-19 pandemic due to its usefulness in monitoring local virus levels and outbreaks of SARS-CoV-2 – is known as wastewater-based epidemiology. It involves the analysis of various chemical (e.g., pesticides, drugs) and biological (e.g., infectious diseases) agents in community wastewater to assess trends in near real-time.

Bowes’ work uses the methodologies offered by wastewater-based epidemiology while adopting an environmental justice and health equity lens. She was inspired to incorporate these perspectives into her work when conducting research as a graduate student.

“The COVID-19 pandemic highlighted clear vulnerabilities within our public health infrastructure that exacerbated health disparities,” Bowes says. “In a year-long, neighborhood-level study using wastewater-based epidemiology to monitor trends of SARS-CoV-2 across a city, we learned that wastewater not only served as an early warning indicator of disease presence, but it could also identify hotspots of infection in areas where clinical surveillance could not reach, yet, infection rates were extremely high.”

After her 2022 graduation, Bowes continued her research at the Biodesign Institute as a postdoctoral research scholar. She gained another year of training as a postdoctoral associate at Boston University’s Center on Forced Displacement, where she focused on using wastewater-based epidemiology in migrant populations, before accepting her first tenure-track appointment at the Arnold School.

She was drawn to the school’s international reputation for academic scholarship and the welcoming environment. Another important factor was the opportunity to join the inaugural cohort of the FIRST FIRRE Program, which included three other faculty members from the Arnold School and the College of Nursing, whose research areas focus on health equity.

“Devin’s unique area of focus on wastewater biosurveillance enables her to expand our knowledge of food insecurity at a population level, as she has developed unique nutritional biomarkers that enable the assessment of the effectiveness of community outreach on health and nutrition in disadvantaged communities often located in food deserts,” says Geoff Scott, chair of the Department of Environmental Health Sciences. “This greatly supports our research efforts on Environmental Justice Strong-related community engagement, which is a hallmark of research focus in our department.”

“The overall culture and humble nature felt very supportive, with a tangible collective investment in the growth and success of the school, including a commitment to honoring diversity, equity and inclusion,” says Bowes. “Students are actively engaged in the community and demonstrate remarkable potential for success and continued global impact in their future endeavors.”

Environmental Epidemiology

Environmental Epidemiology is an advanced epidemiology course that addresses epidemiological research methods used to study environmental exposures from air pollution to heavy metals, and from industrial pollutants to consumer product chemicals. The course will provide an overview of major study designs in environmental epidemiology, including cohort studies, panel studies, natural experiments, randomized controlled trials, time-series, and case-crossover studies. The course will discuss disease outcomes related to environmental exposures, including cancer and diseases of cardiovascular, respiratory, urinary, reproductive, and nervous systems. Case studies in environmental epidemiology will be discussed to provide details of research methods and findings.

Continued decline in the incidence of myocardial infarction beyond the COVID-19 pandemic: a nationwide study of the Swedish population aged 60 and older during 2015–2022

  • Open access
  • Published: 23 April 2024

  • Anna C. Meyer   ORCID: orcid.org/0000-0003-2749-7179 1 ,
  • Marcus Ebeling   ORCID: orcid.org/0000-0002-6531-8525 1 , 3 ,
  • Enrique Acosta   ORCID: orcid.org/0000-0001-6250-4018 2 , 3 &
  • Karin Modig   ORCID: orcid.org/0000-0002-5151-4867 1  

The number of myocardial infarctions declined during the early COVID-19 pandemic, but the mechanisms behind these declines are poorly understood. COVID-19 infection is also associated with an increased risk of myocardial infarction, which could lead to higher incidence rates in the population. This study aims to shed light on the seemingly paradoxical relationship between COVID-19 and myocardial infarction occurrence on the population level by exploring long-term trends in incidence rates, case fatality, and proportion of patients dying before reaching a hospital. Our work is based on a linkage of administrative registers covering the entire population aged 60+ in Sweden. Considering both long-term trends since 2015 and seasonal variability, we compared observed incidence, case fatality, and proportions of patients hospitalized to expected values during 2020–2022. Despite more than 200 laboratory-confirmed COVID-19 cases per 1000 inhabitants by the end of 2022, incidence rates of myocardial infarction continued to decline, thus following the long-term trend observed already before 2020. During the first pandemic wave there was an additional incidence decline corresponding to 13% fewer myocardial infarctions than expected. This decline was neither accompanied by increasing case fatality nor by lower shares of patients being hospitalized. We found no increase in the population-level incidence of myocardial infarction despite large-scale exposure to COVID-19, which suggests that the effect of COVID-19 on myocardial infarction risk is not substantial. Increased pressure on the Swedish health care system has not led to increased risks or poorer outcomes for patients presenting with acute myocardial infarction.

Introduction

During the early phase of the COVID-19 pandemic, the number of hospital admissions for myocardial infarction declined across the world [1–18]. Despite the vast number of published studies, it is still unknown whether the observed declines reflect a real decrease in the risk of myocardial infarction or merely the fact that fewer patients reached a hospital. Several studies hypothesized that reduced care seeking among individuals who experienced symptoms of myocardial infarction led to lower numbers of patients presenting in hospitals [15–18]. Evidence for this hypothesis is, however, mixed. Although delays in the care of myocardial infarction were observed in some settings, several European studies did not find significantly increased delays between symptom onset and first medical contact for patients with myocardial infarction [16, 19, 20]. Moreover, since timely treatment of myocardial infarction is essential, delays in care seeking would likely result in increasing case fatality and in growing proportions of patients dying before reaching the hospital. Lower admission rates for cardiovascular diseases were indeed accompanied by increased case fatality during the first pandemic year in the United States [18] but not in several European countries [4, 8, 14, 21]. However, there are also other hypotheses that could explain the declining number of myocardial infarctions during the early pandemic, such as changes in lifestyle factors, stress levels, and environmental exposures [22]. These hypotheses would result not only in a decline in the number of patients presenting at hospitals but also in an overall decline in incidence rates.

In parallel, COVID-19 infection has been linked to an increase in the risk of cardiovascular diseases, including acute myocardial infarction [ 23 ]. This suggests that the initial decline in hospital admissions might eventually turn into increasing incidence rates of myocardial infarction in the long run—especially since COVID-19 affected a large share of the population. Moreover, the cardiovascular damage and exacerbation of cardiovascular disease during infection with the SARS-CoV-2 virus [ 23 , 24 ] might have affected survival rates for patients with myocardial infarction. Nevertheless, to our knowledge, data reflecting case-fatality throughout the pandemic years have not been presented.

Incidence rates, as well as case fatality, have declined continuously during the past decades. Most previous studies did not take this trend into account and directly compared data observed in 2020 to an earlier reference period, usually 2019 [3, 4, 6, 7, 9, 10, 11, 13, 16]. In addition, only a few studies have estimated incidence rates based on accurate measures of person-time at risk, considering the increased mortality due to COVID-19 itself. Previous studies were, moreover, often limited to clinical populations [3, 4]. In contrast to administrative registers, the coverage of clinical databases declined during the pandemic [25, 26].

With this study, we explore the seemingly paradoxical scenario in which the number of myocardial infarctions decreased in the early pandemic, but subsequent research revealed an elevated risk of myocardial infarction associated with COVID-19 infection. One could thus hypothesize that the incidence of myocardial infarction increased during the later phases of the pandemic. Here, we present population-wide trends in age-specific incidence rates, case fatality and the proportion of patients hospitalized between the onset of the COVID-19 pandemic and the end of 2022 in Sweden.

Data and study population

This study is based on a linkage of administrative population registers using the unique personal identification number assigned to each Swedish resident. The entire population over the age of 60 residing in Sweden between 2015 and 2022 was identified in the Total Population Register (TPR). Individuals entered the study population in the month of their 60th birthday and were followed until death, emigration, loss to the registers (i.e., no registration in the TPR without recorded death or emigration), or the end of 2022, whichever came first. Based on weekly data on confirmed cases and COVID-19 deaths in Sweden reported by the WHO, we defined the first, second, and third pandemic waves as the time periods 23-03-2020 to 12-07-2020, 19-10-2020 to 23-05-2021, and 20-12-2021 to 15-05-2022, respectively [27].
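The follow-up rule described above can be sketched in a few lines of Python. This is a minimal illustration and not the authors' code: the function name `follow_up_window`, its arguments, and the choice to start follow-up on the first day of the birthday month are assumptions made here for the example.

```python
from datetime import date

def follow_up_window(birth, death=None, emigration=None, lost=None,
                     study_start=date(2015, 1, 1), study_end=date(2022, 12, 31)):
    """Return (entry, exit) dates for one person, or None if never at risk.

    Hypothetical helper: entry is assumed to be the first day of the month
    of the 60th birthday, truncated at the start of the study period.
    """
    entry = max(date(birth.year + 60, birth.month, 1), study_start)
    # Exit is the earliest of death, emigration, loss to the registers,
    # or the end of the study period ("whichever came first").
    censoring = [d for d in (death, emigration, lost) if d is not None]
    exit_ = min(censoring + [study_end])
    return (entry, exit_) if entry <= exit_ else None

print(follow_up_window(date(1958, 5, 20)))
print(follow_up_window(date(1940, 1, 15), death=date(2020, 4, 3)))
```

A person who turns 60 only after 2022, or who left the registers before turning 60, yields `None` and contributes no person-time.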

Myocardial infarctions were identified in the Cause of Death Register (CDR) and in the National Patient Register (NPR) using the 10th version of the International Classification of Diseases (ICD) codes. The CDR records death dates of all individuals registered in Sweden together with ICD codes for the underlying and contributing causes of death. The NPR contains all hospital admissions and specialized outpatient care visits in the country together with ICD diagnoses assigned by physicians.

In accordance with the Swedish National Board of Health and Welfare, incident events were defined through ICD-codes I21 or I22 as main or contributing cause of hospitalization or death occurring at least 28 days apart [ 28 ]. A comparison with clinical data during 2021 showed that this definition yields a sensitivity of 94% for detecting incident myocardial infarction in Sweden [ 28 ]. Older validation studies have further demonstrated excellent positive predictive values (98 and 100%, respectively) [ 28 , 29 ]. Case fatality was defined as the proportion of individuals dying within 30 days after the occurrence of a myocardial infarction. The proportion of patients reaching the hospital was calculated as the number of incident events identified in the NPR divided by all incident events.
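As a rough illustration of these definitions, the sketch below applies the 28-day rule to one person's register records and computes 30-day case fatality. It rests on assumptions not stated in the text: the gap is measured from the previous incident event, "within 30 days" is read as 0 to 30 days inclusive, and the function names and record layout are hypothetical.

```python
from datetime import date

MI_CODES = ("I21", "I22")  # ICD-10 codes named in the text

def incident_events(records, min_gap_days=28):
    """records: list of (date, icd_code) for one person, pooled from
    hospital and death registers. Returns the dates of incident events."""
    mi_dates = sorted(d for d, code in records if code.startswith(MI_CODES))
    events = []
    for d in mi_dates:
        # Assumption: a record starts a new event only if it falls at
        # least min_gap_days after the previous incident event.
        if not events or (d - events[-1]).days >= min_gap_days:
            events.append(d)
    return events

def case_fatality(events_with_death):
    """events_with_death: list of (event_date, death_date or None)."""
    fatal = sum(1 for ev, death in events_with_death
                if death is not None and 0 <= (death - ev).days <= 30)
    return fatal / len(events_with_death)

records = [(date(2020, 1, 1), "I21.0"), (date(2020, 1, 10), "I21.9"),
           (date(2020, 2, 15), "I22.0"), (date(2020, 3, 1), "J18.9")]
print(incident_events(records))
```

In this toy input the record on 10 January is treated as part of the first event, while the record on 15 February counts as a new incident event.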

Information on place of residence was available on a yearly basis. For the stratification by geographical region, we therefore distinguished between individuals registered in Stockholm County and those registered elsewhere on December 31st of the previous year. A person contributed person-time at risk as well as disease events to the population of Stockholm County if they were registered there at the end of the previous year.

Statistical analyses

For each month between January 2015 and December 2022, we calculated person-years at risk by counting the number of days spent at risk of MI for every individual and transforming the total number of days into years. Incidence rates were calculated as the number of incident myocardial infarctions observed divided by person-time at risk for each month. To compare incidence rates during the pandemic to an appropriate reference, i.e., to the expected incidence in absence of the pandemic, we estimated expected monthly myocardial infarctions for the time period March 2020 to December 2022 considering both long-term incidence trends and within-year seasonal variability between January 2015 and February 2020. A quasi-Poisson generalized additive model, including a log-linear component for the long-term secular trend, a cyclic p-spline for seasonality, and an offset component to control for changes in the population at risk, was separately fitted to each age group and sex. Based on these models, we predicted the expected monthly myocardial infarctions from March 2020 to December 2022 in the absence of the pandemic and computed 95% prediction intervals using bootstrapping with 2000 iterations. All analyses were stratified by sex and reported separately for four age groups (60–69, 70–79, 80–89 and 90 or older).
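The person-time calculation can be illustrated with a short sketch. It is not the authors' code: the function names are hypothetical, both the entry and exit day are counted as days at risk, and days are converted to years by dividing by 365.25, which is an assumption since the text does not state the conversion used.

```python
import calendar
from datetime import date

def days_at_risk_in_month(entry, exit_, year, month):
    """Days one person spends at risk within a given calendar month."""
    first = date(year, month, 1)
    last = date(year, month, calendar.monthrange(year, month)[1])
    start, end = max(entry, first), min(exit_, last)
    # No overlap between the follow-up window and the month gives 0 days.
    return max((end - start).days + 1, 0)

def monthly_rate(windows, n_events, year, month, per=100_000):
    """Incidence rate per `per` person-years for one month.

    windows: list of (entry, exit) follow-up dates, one per person.
    """
    days = sum(days_at_risk_in_month(e, x, year, month) for e, x in windows)
    person_years = days / 365.25  # assumed day-to-year conversion
    return n_events / person_years * per

# Toy population: 1000 people at risk for the whole study period.
windows = [(date(2015, 1, 1), date(2022, 12, 31))] * 1000
print(round(monthly_rate(windows, 2, 2020, 2), 1))
```

Because the denominator is recomputed every month, deaths (including those from COVID-19) shrink the person-time at risk rather than being ignored, which is the point the text makes about accurate measures of person-time.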

Sensitivity analyses

In sensitivity analyses, we extracted additional data on outpatient care (e.g., visits to outpatient emergency centers) from the NPR. We calculated the number of incident events based on data from inpatient, outpatient, and death records and calculated the proportions of incident events identified in each data source during the first period of the pandemic (March to December 2020). A marked decline in outpatient care utilization could indicate reduced care seeking by patients. Second, we calculated the number of events that could only be identified through outpatient diagnoses, i.e., that had no matching record in either inpatient or death records, which could indicate limited sensitivity of identifying myocardial infarction in the latter two sources. Note that for outpatient records, we only included those with a main diagnosis of myocardial infarction and excluded records with a code for follow-up examinations (ICD-10: Z09).
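The two sensitivity checks amount to simple set operations on event identifiers. The sketch below is only an illustration under assumed inputs: `source_shares` and the idea of representing each register as a set of event IDs are hypothetical and not taken from the study.

```python
def source_shares(inpatient, outpatient, deaths):
    """Each argument is a set of incident-event identifiers found in that source.

    Returns the share of all events seen in each source, and the events that
    appear only in outpatient records.
    """
    all_events = inpatient | outpatient | deaths
    n = len(all_events)
    shares = {
        "inpatient": len(inpatient) / n,
        "outpatient": len(outpatient) / n,
        "deaths": len(deaths) / n,
    }
    # Events with no matching inpatient or death record.
    outpatient_only = outpatient - inpatient - deaths
    return shares, outpatient_only

shares, only_out = source_shares({1, 2, 3}, {3, 4}, {2, 5})
print(shares, only_out)
```

Shares can sum to more than 1 because a single event may be recorded in several sources.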

The first three waves of the COVID-19 pandemic in Sweden (green) together with the cumulative number of confirmed COVID-19 cases per 1000 inhabitants (gray) are shown in Fig.  1 . Since testing capacities were limited in the early pandemic and recommendations to test all suspected cases in laboratories were effectively stopped by the Swedish government during February 2022, the cumulative proportion of the Swedish population affected by the virus likely exceeds the numbers shown in Fig.  1 .

Fig. 1. Cumulative number of confirmed COVID-19 cases per 1000 inhabitants in Sweden (shaded grey area) [24]. The black vertical line indicates the date on which most testing for COVID-19 was stopped by the Swedish government (09-02-2022). Vertical green bands show the first three pandemic waves.

Time trends in incidence rates of myocardial infarction

Figure  2 shows trends in annual and monthly incidence rates of myocardial infarction between January 2015 and December 2022 stratified by sex and age group. Annual incidence rates (shown as horizontal lines) declined consistently already before 2020. Between 2015 and 2022, declines in annual incidence rates ranged from 16.2% (men aged 60–69) to 37.7% (women aged 90 and older). Declines in annual rates were roughly linear from 2015 through 2022 with the exception of 2020, which deviated from overall trends by exhibiting lower incidence rates. Monthly incidence rates followed a seasonal pattern with a tendency toward lower rates during summer months and higher rates in December and January (Fig.  2 ).

Fig. 2. Monthly and annual incidence rates of myocardial infarction per 100,000 person-years in the Swedish population aged 60 and older stratified by sex and age group, January 2015 to December 2022. Annual incidence rates are shown as thick horizontal lines (transparent). Thin lines reflect monthly incidence rates. Vertical green bands indicate three pandemic wave periods in Sweden.

Figure  3 shows the expected (black line) and observed (blue line) incidence rates of myocardial infarction together with 95% prediction intervals from March 2020 to December 2022 in four age groups. During the first wave of the pandemic, incidence rates were consistently lower than expected for all age groups, although the lower level did not fall outside the prediction interval during all months. In contrast, no consistent deviations from expected numbers were observed during the pandemic’s second and third waves or during the remaining months in 2020 to 2022. The pattern of lower-than-expected incidence rates during the first pandemic wave but no consistent deviations from expected rates thereafter was consistent among men and women (Supplementary Fig. 1) and in Stockholm County as well as the rest of Sweden (Supplementary Fig. 2).

Fig. 3. Expected (black lines) and observed (blue lines) incidence rates of myocardial infarction per 100,000 person-years in the Swedish population over the age of 60 from March 2020 to December 2022. Expected incidence rates are based on trends since 2015 and shown together with 95% prediction intervals (gray shading). Highlighted dots indicate observed incidence rates that fell outside the prediction intervals. Vertical bands indicate three pandemic wave periods in Sweden.

The total number of myocardial infarctions during March to June 2020, i.e., the first pandemic wave (n = 6095), was considerably lower than that during the same period in 2019 (n = 7126), corresponding to a decline of 14.5%. When considering long-term trends, seasonality, and the changing population composition, we estimated approximately 900 (13.0%) fewer myocardial infarctions than expected, largely clustered in age groups between 70 and 89 during March and April 2020 (Fig.  2 ).
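The two percentages in this paragraph answer different questions, which a quick calculation makes visible. The sketch below uses only the counts quoted in the text; the expected count is back-calculated here from the rounded figure of about 900 and is therefore an approximation, not a value reported by the authors.

```python
def crude_decline(observed_prev, observed_now):
    """Relative decline compared with the same months of the previous year."""
    return (observed_prev - observed_now) / observed_prev

def deficit_vs_expected(observed_now, deficit):
    """Relative deficit against a model-based expectation.

    The expected count is reconstructed as observed + deficit (an assumption,
    since the text reports the deficit only approximately).
    """
    expected = observed_now + deficit
    return deficit / expected

print(f"{crude_decline(7126, 6095):.1%}")       # comparison with 2019
print(f"{deficit_vs_expected(6095, 900):.1%}")  # comparison with expected
```

The crude year-on-year comparison gives 14.5%, while the comparison against the trend- and season-adjusted expectation gives roughly 13%, in line with the text's point that ignoring the pre-existing downward trend overstates the pandemic effect.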

Case fatality and proportion of patients receiving hospital care

Figure 4 shows proportions of individuals with incident myocardial infarction who died within 30 days as well as proportions of individuals receiving hospital care between March 2020 and December 2022 together with 95% prediction intervals. The respective data stratified by sex are shown in Supplementary Fig. 3. From March to June 2020, 25.5% of all patients died within 30 days of experiencing a myocardial infarction compared with 24.9% during the same period in 2019. Case fatality observed in individual months during 2020–2022 was neither consistently higher nor consistently lower than expected.

Fig. 4. Proportion of myocardial infarction cases dying within 30 days (case fatality, lower graphs) and proportion of individuals with myocardial infarction receiving hospital care (upper graphs) in the Swedish population aged 60 and older in four age groups, March 2020 to December 2022. Shaded gray areas show 95% prediction intervals. Highlighted dots indicate observed proportions that fell outside the prediction intervals. Vertical green bands indicate three pandemic wave periods in Sweden.

The average proportion of incident myocardial infarctions receiving hospital care between March and June overall increased from 81.0% in 2015 to 83.7% in 2019 and remained at 83.6% in 2020. From March to June 2022, 84.6% received hospital care. There were no consistent deviations from expected values (Fig.  4 ). We observed only weak seasonal patterns in case fatality and in the proportion hospitalized.

The number of myocardial infarctions identified in the outpatient register without a record in the inpatient or cause of death data was small. Between 2019 and 2020, this number declined to an extent similar to the number of myocardial infarctions in our main analyses (11.2% compared to 10.3%).

The increased risk of myocardial infarction associated with COVID-19 infection, along with the ideas that monitoring of risk factors has been compromised during the pandemic and that lockdowns have negatively influenced health behaviours, has led to widespread concern about increasing rates of heart disease following the global pandemic [16, 22, 23, 24, 30]. Our results do not support these concerns. Despite the high spread of COVID-19 across the Swedish population, we found that incidence rates of myocardial infarction continued to decline at least until the end of 2022, thus following the long-term downward trend observed already before 2020. While there is an indication that the declining trend may have halted among the oldest men, observed rates still lie well within predicted intervals. Even in Stockholm County, an area in which COVID-19 was already widespread during March and April 2020, when vaccinations were not yet available and medical staff were still inexperienced in treating the virus [31], we found no evidence for increasing rates of myocardial infarction.

Evaluating changes in the incidence of myocardial infarction is challenging, as rates are shaped by a complex interplay of long-term trends, seasonal fluctuations, and changes in the population at risk. Simple comparisons to earlier years can therefore lead to incorrect conclusions and to an overestimation of differences between the pandemic and prepandemic periods. We fitted expected rates for the years 2020 to 2022 based on the previous years’ trends and seasonal variation also considering changes in the composition of the population at risk. Even in these analyses, we found substantially lower incidence rates; approximately 900 fewer events occurred during the first pandemic wave than expected, a number corresponding to 13% fewer than expected myocardial infarctions during this period.

Competing risk of death from COVID-19 is one proposed mechanism behind the declining number of cardiovascular disease events. Severe COVID-19 infections and cardiovascular diseases share common risk factors [ 22 , 30 ], and it is hence possible that the number of high-risk individuals depleted faster than the total population at risk, thereby not only reducing the total number of myocardial infarctions but also incidence rates. This hypothesis is, however, challenged by consistently lower incidence rates in areas outside of Stockholm County already in March and April 2020. These areas experienced virtually no deaths from COVID-19 in this early phase of the pandemic, yet introduced recommendations for older individuals to stay at home [ 31 ]. Furthermore, we analyzed changes in the composition of the population at risk with respect to age, sex, comorbidity, and care status and found no substantial changes during the pandemic.

The etiological mechanisms behind the notable decline in myocardial infarction in the early stage of a global pandemic are intriguing and remain to be studied further. Altered stress levels, lifestyle, and environmental factors, such as reduced air pollution during lockdown, may have contributed to lowering the risk of acute myocardial infarction [22]. While many of these factors operate through long-term accumulation of risk, factors that trigger myocardial infarctions in the short term, such as stress or air pollution, may contribute as well [32, 33]. Research has shown that air pollution can indeed affect the risk of myocardial infarction within weeks, days and even hours of exposure to pollutants [34, 35, 36]. Despite the comparatively lenient restrictions during the pandemic, Swedish air pollution levels decreased substantially. WHO reported a roughly 30% lower mean annual concentration of nitrogen dioxide (NO2) and 18% lower concentrations of PM10 and PM2.5 particles during 2020 compared to 2018–2019 in Stockholm [37].

The absence of higher fatality and of higher proportions of patients dying before receiving care is noteworthy. Clinical processes and staff have been challenged during the pandemic; surgeries have been postponed, and waiting times for patients with many diseases have increased [ 38 ]. Indeed, delays in the care pathways of cardiovascular conditions as well as poorer treatment outcomes have been observed in some studies in low- and middle-income countries [ 16 ]. For the Swedish setting, the clinical register Swedeheart reported that the time to treatment of acute myocardial infarction had not been prolonged during the pandemic [ 26 ]. Reporting to this register is not mandatory and has declined during the pandemic [ 25 , 26 ], but our study based on nationwide administrative data supports the conclusion that increased pressure on the Swedish health care system has not led to poorer outcomes for patients presenting with acute myocardial infarction.

Our study has several strengths. We use nationwide administrative data on the entire Swedish population, which allow us to derive precise estimates of person-time at risk and incident myocardial infarction. While reporting to clinical registers is prone to be disrupted once clinical processes are challenged and staff shortages occur, reporting to administrative registers is mandatory and has a high priority because it is directly linked to the reimbursement of health care costs. Sensitivity and positive predictive values for myocardial infarction in Swedish inpatient data have been shown to be excellent [28, 29]. Specific ICD codes are available to encode a history of myocardial infarction, limiting the probability of misclassifying historical events as incident events. Nevertheless, we cannot rule out some misclassification. Our data did not allow us to identify myocardial infarctions for which patients did not seek any care, and it is further possible that causes of death are misclassified in some instances. However, this would only induce bias if misclassification changed systematically over time. Although one could argue that the accuracy of cause of death assignment has decreased under the pressure of the pandemic, medical scrutiny may have also been promoted by efforts to determine the presence of COVID-19 infection in deceased individuals. Either way, we obtained similar results when excluding data from death records, indicating that misclassification of cause of death cannot explain the pronounced declines in myocardial infarction incidence in Sweden. Finally, it should be noted that our study is limited to ages 60 and above, whereas Swedish authorities reported that 14% of all myocardial infarctions occurred at ages below 60 years as of 2022. Younger people might have adopted different lifestyles than older people during the pandemic, and it is not certain that our findings can be generalized to the younger population.

The incidence of myocardial infarction among individuals aged 60+ in Sweden continued to decrease between 2020 and 2022, despite concerns about an increased incidence of cardiovascular diseases during the COVID-19 pandemic. During the first wave of the pandemic, there was an additional decline in incidence rates. These declines were neither accompanied by increasing case fatality nor by lower shares of patients being hospitalized. Our findings support the conclusion that increased pressure on the Swedish health care system has not led to increased risks or poorer outcomes for patients presenting with acute myocardial infarction. Our work also suggests that the effect of COVID-19 on myocardial infarction risk is not substantial, as we found no increase in incidence at the population level, despite the large share of the population that has been exposed to COVID-19.

Data availability

Data were provided by the Swedish National Board of Health and Welfare and Statistics Sweden. Restrictions apply to the availability of these data, which are thus not publicly accessible. Pseudonymized data are, however, available from the authors upon reasonable request and with permission of the regional ethics board in Stockholm. Aggregated data on age-specific incidence rates as well as statistical code are available upon request from the corresponding author.

Swedish corona commission [Coronakommissionen]. Government report. Sverige under pandemin. Sjukvård och folkhälsa [Sweden during the pandemic. Health care and public health] SOU 2021:89. Stockholm; 2021.

Folkhälsomyndigheten [Swedish Public Health Agency]. Hur har folkhälsan påverkats av covid-19-pandemin? [How has the Covid-19 pandemic affected public health?]. Folkhälsomyndigheten; 2021.

Huynh J, Barmano N, Karlsson J-E, Stomby A. Sex and age differences in the incidence of acute myocardial infarction during the COVID-19 pandemic in a Swedish health-care region without lockdown: a retrospective cohort study. Lancet Healthy Longevity. 2021;2(5):e283-e9. https://doi.org/10.1016/S2666-7568(21)00085-4

Mohammad MA, Koul S, Olivecrona GK, et al. Incidence and outcome of myocardial infarction treated with percutaneous coronary intervention during COVID-19 pandemic. Heart. 2020;106(23):1812. https://doi.org/10.1136/heartjnl-2020-317685 .

Swedish Association of Local Authorities and Regions [SKR Sveriges Kommuner och Regioner]. Hälso och Sjukvårdsrapporten 2021 [Health and health care report 2021]. Stockholm; 2021.

Bhatt AS, Moscone A, McElrath EE, et al. Fewer hospitalizations for acute cardiovascular conditions during the COVID-19 pandemic. J Am Coll Cardiol. 2020;76(3):280–8. https://doi.org/10.1016/j.jacc.2020.05.038

Mesnier J, Cottin Y, Coste P, et al. Hospital admissions for acute myocardial infarction before and after lockdown according to regional prevalence of COVID-19 and patient profile in France: a registry study. The Lancet Public Health. 2020;5(10):e536–42. https://doi.org/10.1016/S2468-2667(20)30188-2 .

Rattka M, Dreyhaupt J, Winsauer C, et al. Effect of the COVID-19 pandemic on mortality of patients with STEMI: a systematic review and meta-analysis. Heart. 2021;107(6):482. https://doi.org/10.1136/heartjnl-2020-318360 .

Seiffert M, Brunner FJ, Remmel M, et al. Temporal trends in the presentation of cardiovascular and cerebrovascular emergencies during the COVID-19 pandemic in Germany: an analysis of health insurance claims. Clin Res Cardiol. 2020;109(12):1540–8. https://doi.org/10.1007/s00392-020-01723-9 .

Sokolski M, Gajewski P, Zymliński R, et al. Impact of coronavirus disease 2019 (covid-19) outbreak on acute admissions at the emergency and cardiology departments across Europe. Am J Med. 2021;134(4):482–9. https://doi.org/10.1016/j.amjmed.2020.08.043 .

Toniolo M, Negri F, Antonutti M, Masè M, Facchin D. Unpredictable fall of severe emergent cardiovascular diseases hospital admissions during the COVID-19 pandemic: experience of a single large Center in Northern Italy. J Am Heart Assoc. 2020;9(13): e017122. https://doi.org/10.1161/JAHA.120.017122 .

Wu J, Mamas MA, de Belder MA, Deanfield JE, Gale CP. Second decline in admissions with heart failure and myocardial infarction during the COVID-19 pandemic. J Am Coll Cardiol. 2021;77(8):1141–3. https://doi.org/10.1016/j.jacc.2020.12.039

König S, Ueberham L, Pellissier V, et al. Hospitalization deficit of in- and outpatient cases with cardiovascular diseases and utilization of cardiological interventions during the COVID-19 pandemic: Insights from the German-wide helios hospital network. Clin Cardiol. 2021;44(3):392–400. https://doi.org/10.1002/clc.23549

Campo G, Fortuna D, Berti E, et al. In- and out-of-hospital mortality for myocardial infarction during the first wave of the COVID-19 pandemic in Emilia-Romagna, Italy: a population-based observational study. Lancet Regional Health Europe. 2021;3:100055. https://doi.org/10.1016/j.lanepe.2021.100055

Pourasghari H, Tavolinejad H, Soleimanpour S, et al. Hospitalization, major complications and mortality in acute myocardial infarction patients during the COVID-19 era: a systematic review and meta-analysis. Int J Cardiol Heart Vasculature. 2022;41:101058. https://doi.org/10.1016/j.ijcha.2022.101058

Nadarajah R, Wu J, Hurdus B, et al. The collateral damage of COVID-19 to cardiovascular services: a meta-analysis. Eur Heart J. 2022;43(33):3164–78. https://doi.org/10.1093/eurheartj/ehac227

Sofi F, Dinu M, Reboldi G, et al. Worldwide impact of COVID-19 on hospital admissions for non-ST-elevation acute coronary syndromes (NSTACS): a systematic review with meta-analysis of 553,038 cases. Eur Heart J Qual Care Clin Outcomes. 2023. https://doi.org/10.1093/ehjqcco/qcad048

Nogueira RG, Etter K, Nguyen TN, et al. Changes in the care of acute cerebrovascular and cardiovascular conditions during the first year of the covid-19 pandemic in 746 hospitals in the USA: retrospective analysis. BMJ medicine. 2023;2(1): e000207. https://doi.org/10.1136/bmjmed-2022-000207 .

Article   PubMed   Google Scholar  

Lidin M, Lyngå P, Kinch-Westerdahl A, Nymark C. Patient delay prior to care-seeking in acute myocardial infarction during the outbreak of the coronavirus SARS-CoV2 pandemic. Eur J Cardiovasc Nursing J Working Group Cardiovasc Nursing Eur Soc Cardiol. 2021;20(8):752–9. https://doi.org/10.1093/eurjcn/zvab087 .

Article   Google Scholar  

Granström J, Lantz P, Lidin M, Wahlström M, Nymark C. Perceptions of delay when afflicted by an acute myocardial infarction during the first wave of the COVID-19 pandemic. Eur J Cardiovasc Nurs. 2022:zvac021. https://doi.org/10.1093/eurjcn/zvac021

Bodilsen J, Nielsen PB, Søgaard M, et al. Hospital admission and mortality rates for non-covid diseases in Denmark during covid-19 pandemic: nationwide population based cohort study. BMJ. 2021;373: n1135. https://doi.org/10.1136/bmj.n1135 .

Gorini F, Chatzianagnostou K, Mazzone A, et al. Acute myocardial infarction in the time of COVID-19: a review of biological, environmental, and psychosocial contributors. Int J Environ Res Public Health. 2020;17(20). https://doi.org/10.3390/ijerph17207371

Katsoularis I, Fonseca-Rodríguez O, Farrington P, Lindmark K, Fors Connolly AM. Risk of acute myocardial infarction and ischaemic stroke following COVID-19 in Sweden: a self-controlled case series and matched cohort study. Lancet (London, England). 2021;398(10300):599–607. https://doi.org/10.1016/s0140-6736(21)00896-5 .

Bader F, Manla Y, Atallah B, Starling RC. Heart failure and COVID-19. Heart Fail Rev. 2021;26(1):1–10. https://doi.org/10.1007/s10741-020-10008-2 .

National Board of Health and Welfare [Socialstyrelsen]. Hur Covid-19 har påverkat akut vård av äldre med stroke och hjärtinfarkt? [How has Covid-19 affected acute care of older persons with stroke and myocardial infarction?]. Stockholm: National Board of Health and Welfare [Socialstyrelsen]2020 Contract No.: Dnr. 5.7–17529/2020.

Swedish register for heart diseases [Swedeheart]. Annual report 2021: Swedish register for heart diseases [Swedeheart]2022.

World Health Organization. 2023. https://covid19.who.int/region/euro/country/se . Accessed 26.10.2023.

Socialstyrelsen. Kvalitetsdeklaration. Statistik om hjärtinfarkter 20212022.

Ludvigsson JF, Andersson E, Ekbom A, et al. External review and validation of the Swedish national inpatient register. BMC Public Health. 2011;11(1):450. https://doi.org/10.1186/1471-2458-11-450 .

Italia L, Tomasoni D, Bisegna S, et al. COVID-19 and Heart failure: from epidemiology during the pandemic to myocardial injury, myocarditis, and heart failure sequelae. Front Cardiovasc Med 2021;8.

Swedish corona commission [Coronakommissionen]. Government report. Sverige under pandemin. Smittspridning och smittskydd [Sweden during the pandemic. Spread and prevention] SOU 2021:89. Stockholm. 2021. https://coronakommissionen.com/publikationer/delbetankande-2/ . 1.

Nawrot TS, Perez L, Künzli N, Munters E, Nemery B. Public health importance of triggers of myocardial infarction: a comparative risk assessment. Lancet (London, England). 2011;377(9767):732–40. https://doi.org/10.1016/s0140-6736(10)62296-9 .

Wilbert-Lampen U, Leistner D, Greven S, et al. Cardiovascular events during world cup soccer. N Engl J Med. 2008;358(5):475–83. https://doi.org/10.1056/NEJMoa0707427 .

Dahlquist M, Frykman V, Hollenberg J, et al. Short-term ambient air pollution exposure and risk of out-of-hospital cardiac arrest in sweden: a nationwide case-crossover study. J Am Heart Assoc. 2023;12(21): e030456. https://doi.org/10.1161/jaha.123.030456 .

Sahlén A, Ljungman P, Erlinge D, et al. Air pollution in relation to very short-term risk of ST-segment elevation myocardial infarction: case-crossover analysis of SWEDEHEART. Int J Cardiol. 2019;275:26–30. https://doi.org/10.1016/j.ijcard.2018.10.069 .

Peters A, Dockery DW, Muller JE, Mittleman MA. Increased particulate air pollution and the triggering of myocardial infarction. Circulation. 2001;103(23):2810–5. https://doi.org/10.1161/01.cir.103.23.2810 .

WHO. WHO Ambient Air Quality Database (update 2024). Version 6.1. Geneva, World Health Organization, 2024.

National Board of Health and Welfare [Socialstyrelsen]. Analys av första och andra covid-19-vågen—produktion, köer och väntetider i vården [Analysis of the first and second Covid-19 waves - production, queues and waiting times in the care system]. Stockholm: National Board of Health and Welfare [Socialstyrelsen]2021.

Download references

Acknowledgements

This work was supported by the Swedish Research Council of Health, Working Life and Welfare (FORTE) [grant number 2021-00451]. The funding source did not influence data collection, study design or interpretation of findings.

Open access funding provided by Karolinska Institute.

Author information

Authors and affiliations.

Unit of Epidemiology, Institute of Environmental Medicine, Karolinska Institutet, PO Box 210, 17177, Stockholm, Sweden

Anna C. Meyer, Marcus Ebeling & Karin Modig

Centre for Demographic Studies (CED), Carrer de Ca N’Altayó, Edifici E2 Universitat Autònoma de Barcelona, Bellaterra, 08193, Bellaterra, Spain

Enrique Acosta

Max Planck Institute for Demographic Research, Konrad-Zuse-Str. 1, 18057, Rostock, Germany

Marcus Ebeling & Enrique Acosta

Contributions

KM acquired funding for this study. KM and AM initiated the study. AM conducted the literature search. All authors contributed to the design of this study and the selection of methods. AM and ME prepared the data for analysis, and AM, EA, and ME undertook statistical analyses. ME prepared visualizations. AM and KM drafted the manuscript. All authors contributed to the interpretation of findings and critical revision of the manuscript.

Corresponding author

Correspondence to Anna C. Meyer.

Ethics declarations

Conflict of interest.

The authors have no relevant financial or non-financial interests to disclose. The authors report no conflicts of interest.

Ethical approval

This study was approved by the regional ethics committee in Stockholm, Karolinska Institutet, Sweden, and performed in line with the principles of the Declaration of Helsinki (permit numbers Dnr 2011/136–31/5 and Dnr 2020–04753). The board waived the need for patient consent.

Additional information

Publisher's note.

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Below is the link to the electronic supplementary material.

Supplementary file1 (DOCX 976 kb)

Rights and permissions.

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/ .

About this article

Meyer, A.C., Ebeling, M., Acosta, E. et al. Continued decline in the incidence of myocardial infarction beyond the COVID-19 pandemic: a nationwide study of the Swedish population aged 60 and older during 2015–2022. Eur J Epidemiol (2024). https://doi.org/10.1007/s10654-024-01118-4

Received: 10 November 2023

Accepted: 15 March 2024

Published: 23 April 2024

DOI: https://doi.org/10.1007/s10654-024-01118-4

  • Myocardial infarction
  • Epidemiological monitoring
  • Public health
  • Open access
  • Published: 19 April 2024

GbyE: an integrated tool for genome widely association study and genome selection based on genetic by environmental interaction

  • Xinrui Liu 1 , 2 ,
  • Mingxiu Wang 1 ,
  • Jie Qin 1 ,
  • Yaxin Liu 1 ,
  • Shikai Wang 1 ,
  • Shiyu Wu 1 ,
  • Ming Zhang 1 ,
  • Jincheng Zhong 1 &
  • Jiabo Wang 1  

BMC Genomics volume 25, Article number: 386 (2024)

The growth and development of organisms depend on genetic effects, environmental effects, and their interaction. In recent decades, many candidate additive genetic markers and genes have been detected using genome-wide association studies (GWAS). However, limited by computing power and a lack of practical tools, the interactive effects of markers and genes have not been clearly revealed. These interactive markers are also difficult to use in breeding and prediction, for example in genomic selection (GS).

Power-FDR curves show that the GbyE algorithm detects more significant genetic loci at different levels of genetic correlation and heritability, especially at low heritability. The additive effect of GbyE is highly significant on certain chromosomes, while the interactive effect detects additional significant sites on other chromosomes that were not detected in the first two parts. In prediction accuracy tests, GbyE is significantly more accurate than the Mean value method in most combinations of heritability and genetic correlation, regardless of whether the rrBLUP or BGLR model is used. Using information from genetic by environmental interaction (G × E), the GbyE algorithm increases the prediction accuracy of the three Bayesian models BRR, BayesA, and BayesLASSO by 9.4%, 9.1%, and 11%, respectively, relative to the Mean value method. When phenotypes are missing in a single environment, GbyE is significantly superior to the Mean value method for every combination of heritability and genetic correlation, especially when both are high.

Conclusions

This study constructed a new genotype design model program (GbyE) for GWAS and GS using the Kronecker product, which can clearly estimate the additive and interactive effects separately. The results showed that GbyE provides higher statistical power for GWAS and higher prediction accuracy for GS models. In addition, GbyE improves prediction accuracy to varying degrees in three Bayesian models (BRR, BayesA, and BayesCpi). Whether phenotypes are missing in a single environment or in multiple environments, GbyE also makes better predictions for the inference population. This study helps us understand the interactive relationship between genome and environment in complex traits. The GbyE source code is available on GitHub (https://github.com/liu-xinrui/GbyE).

Genetic by environmental interaction (G × E) is crucial for explaining individual traits and has gained increasing attention in research. It refers to the influence of genetic factors on susceptibility to environmental factors. In-depth study of G × E contributes to a deeper understanding of the relationship between individual growth, living environment, and phenotypes. Genetic factors play a role in most human diseases at the molecular or cellular level, but environmental factors also contribute significantly. Researchers aim to uncover the mechanisms behind complex diseases and quantitative traits by investigating the interactions between organisms and their environment. Common, complex, or rare human diseases are often considered outcomes of the interplay of genes, environmental factors, and their interactions. Analyzing the joint effects of genes and the environment can provide valuable insights into the underlying pathway mechanisms of diseases. For instance, researchers have successfully identified potential loci associated with asthma risk through G × E interactions [1], and have explored predisposing factors for challenging-to-treat diseases such as cancer [2, 3], rhinitis [4], and depression [5].

Breeders in agricultural production currently use two main methods to increase crop yields and livestock productivity [6]. The first is to develop varieties with a relatively low G × E effect to ensure stable production performance in different environments. The second is to use information from different environments to improve the statistical power of genome-wide association studies (GWAS) to reveal potential loci of complex traits. The first method requires long-term commitment, while the second clearly has faster returns. In GWAS, the use of multiple environments or phenotypes for association studies has become increasingly important. This not only improves the statistical power for environmental susceptibility traits [7], but also allows the detection of loci with G × E signals. There are significant challenges when using multiple environments or phenotypes for GWAS, mainly because most diseases and quantitative traits have numerous associated loci with minimal impact [8], and thus the effect size regulated by the environment at these loci cannot be determined. Current detection strategies for G × E are based on complex statistical models and often require a large number of samples to detect important signals [9, 10]. In GS, breeders can use whole-genome marker data to identify and select target strains in the early stages of animal and plant production [11, 12, 13]. Initially, GS models, like GWAS models, could only analyze a single environment or phenotype [14]. To improve the predictive accuracy of the models, higher marker densities are often required, which increases the proportion of genetic variation explained by the markers and thereby indirectly improves predictive accuracy. The consideration of G × E and multiple phenotypes in GS models [15] has been widely studied in plant and animal breeding [16]. GS models that allow G × E have been developed [17], and most of them model and interpret G × E using structured covariates [18]. In these studies, most GS models were more accurate when combined with G × E than in single-environment (or single-phenotype) analysis. Hence, there is a need to develop models that leverage G × E information for GWAS and GS studies.

This study developed a novel genotype-by-environment method based on R, termed GbyE, which leverages the interaction among multiple environments or phenotypes to enhance the association study and prediction performance of environmental susceptibility traits. The method enables the identification of mutation sites that exhibit G × E interactions in specific environments. To evaluate the performance of the method, simulation experiments were conducted using a dataset comprising 282 corn samples. Importantly, this method can be seamlessly integrated into any GWAS and GS analysis.

Materials and methods

Support packages.

GbyE was developed for GWAS and GS research. It therefore uses the Genome Association and Prediction Integrated Tool (GAPIT) [19], Bayesian Generalized Linear Regression (BGLR) [20], and Ridge Regression Best Linear Unbiased Prediction (rrBLUP) [21] packages as support packages, while GbyE itself only provides the conversion to the interactive format and file generation. To simplify its use, GbyE ships with four basic function files: GbyE.Simulation.R (dual-environment phenotype simulation based on heritability, genetic correlation, and number of QTLs), GbyE.Calculate.R (generates the interactive GbyE genotype files from numerical genotype and phenotype data), GbyE.Power.FDR.R (calculates the statistical power and false discovery rate (FDR) of GWAS), and GbyE.Comparison.Pvalue.R (filters the SNP effect loci with minimal p-values, because GbyE produces redundant results in GWAS calculations).

Samples and sequencing data

This study used a small dataset for software simulation analysis, one that is widely used to test software such as GAPIT, TASSEL, and rMVP. The demonstration data come from 282 inbred lines of maize and include four phenotypes. There are no missing phenotypes in these data, and the dataset can be obtained from the GAPIT website (https://zzlab.net/GAPIT/index.html, accessed on May 1, 2022). The phenotype data used here were simulated with a custom R simulation function, from which the Mean and GbyE phenotype files were calculated. The data were converted to HapMap format using PLINK v1.09 and in-house scripts.

Simulated traits

Phenotype simulation was performed by modifying the GAPIT.Phenotype.Simulation function in GAPIT. Based on the input parameter NQTN, markers were randomly selected from the whole genome, and their genotypes were used to simulate the genetic effect of the trait. The effects of the selected QTNs were randomly sampled from a multivariate normal distribution, and the correlation between these normal distributions defined the genetic relationship between environments. The additive heritability (\(h_{g}^{2}\)) was used to scale the additive genetic variance relative to the phenotypic variance. The simulated phenotype conditions in this paper were set as follows: 1) three levels of \(h_{g}^{2}\), 0.8, 0.5, and 0.2, representing high (\(h_{h}^{2}\)), medium (\(h_{m}^{2}\)), and low (\(h_{l}^{2}\)) heritability; 2) three levels of genetic correlation, 0.8, 0.5, and 0.2, representing high (\(R_{h}\)), medium (\(R_{m}\)), and low (\(R_{l}\)) genetic correlation; 3) 20 pre-set QTL effect loci. The phenotype values in each environment were simulated jointly under these parameters.
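The simulation settings above can be illustrated with a minimal Python sketch. It is not the GbyE.Simulation.R code: the function name `simulate_two_env_traits`, the toy genotypes, and the choice to derive the residual variance from the heritability as var_g × (1 − h²) / h² are assumptions made purely for illustration.

```python
import math
import random


def simulate_two_env_traits(genotypes, n_qtn, h2, r_g, seed=0):
    """Simulate one trait in two environments (hypothetical helper).

    genotypes: list of individuals, each a list of numeric codes (0, 1, 2).
    n_qtn:     number of markers drawn at random as QTNs.
    h2:        additive heritability used to scale the residual variance.
    r_g:       target correlation between QTN effects in the two environments.
    """
    rng = random.Random(seed)
    n, m = len(genotypes), len(genotypes[0])
    qtn_idx = rng.sample(range(m), n_qtn)

    # Bivariate normal effects: the second effect is built from the first so
    # that the two effect distributions have correlation r_g.
    effects = []
    for _ in qtn_idx:
        z1, z2 = rng.gauss(0, 1), rng.gauss(0, 1)
        effects.append((z1, r_g * z1 + math.sqrt(1 - r_g ** 2) * z2))

    phenotypes = []
    for env in (0, 1):
        # Genetic value of each individual in this environment.
        g = [sum(row[j] * eff[env] for j, eff in zip(qtn_idx, effects))
             for row in genotypes]
        mean_g = sum(g) / n
        var_g = sum((x - mean_g) ** 2 for x in g) / (n - 1)
        # Assumption: h2 = var_g / (var_g + var_e), solved for var_e.
        sd_e = math.sqrt(var_g * (1 - h2) / h2)
        phenotypes.append([x + rng.gauss(0, sd_e) for x in g])
    return qtn_idx, phenotypes


# Toy data: 50 individuals, 200 markers, 20 QTNs, medium h2, high correlation.
demo_rng = random.Random(42)
toy_genotypes = [[demo_rng.choice((0, 1, 2)) for _ in range(200)]
                 for _ in range(50)]
qtn_idx, (y_env1, y_env2) = simulate_two_env_traits(
    toy_genotypes, 20, h2=0.5, r_g=0.8)
```

Lowering `r_g` makes the QTN effects differ more between the two environments, which is the situation in which a G × E term is expected to matter most.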

Genetic by environment interaction model

The GbyE analysis pipeline includes three steps: data preprocessing, GbyE conversion, and association analysis. The phenotype data matrix Y of the two environments is normalized and converted by GbyE to generate phenotype data in GbyE.Y format. Genotype data in formats such as HapMap, VCF, or BED first need to be converted into numerical genotype format (homozygotes coded as 0 or 2, heterozygotes coded as 1) using software or scripts such as GAPIT or PLINK. The environment matrix (E) is an environment index matrix. Through the Kronecker product, the original genotype matrix G (n × m) is converted into GbyE.GD (2n × 2m), \(\left[\begin{array}{cc}G& 0\\ G& G\end{array}\right]\), and the Y vector (n × 1) is converted into the GbyE.Y vector (2n × 1) after normalization. The duplicated blocks represent the different environments, genetic effects, and populations, and the genomic data used in the analysis still retain the whole-genome information. The first column of E represents the additive effect, which is the average genetic effect among environments. The other columns of E represent the interactive effects; there is one column fewer than the number of environments, to avoid linear dependence in the model. By default, the GbyE algorithm codes the first environment as the background, so that the interactive index is 0 in the first environment and 1 in the others. The Kronecker product of G and the environment index matrix is named GbyE.GD. The interactive part of the GbyE.GD matrix in GWAS and GS therefore holds values relative to the first environment (Fig. 1). The GbyE environmental interaction matrix is obtained by constructing the interaction matrix E (Eq. 1) and taking the Kronecker product of the genotype matrix G with E (Eq. 2), in which the \(\left[\begin{array}{c}G\\ G\end{array}\right]\) block is defined as the additive effect and the \(\left[\begin{array}{c}0\\ G\end{array}\right]\) block as the interactive effect. The \(\left[\begin{array}{cc}G& 0\\ G& G\end{array}\right]\) matrix is called the gene by environment interaction matrix, hereinafter referred to as the GbyE matrix. After transformation by GbyE, the phenotype file (GbyE.Y) and genotype file (GbyE.GD) are passed to the GWAS and GS models and treated as standard phenotype and genotype files.

\(E=\left[\begin{array}{cc}1& 0\\ 1& 1\end{array}\right]\)  (1)

\(GbyE=E\otimes G=\left[\begin{array}{cc}G& 0\\ G& G\end{array}\right]\)  (2)

where G is the whole genotype matrix and E is the design matrix for exploring interactive effects. GbyE mainly uses the Kronecker product of the genetic matrix (G) and the environmental matrix (E) as the genotype for subsequent GWAS, as a way to distinguish between additive and interactive effects.
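The construction described above can be sketched in a few lines of Python. This is an illustrative sketch only, not the GbyE.Calculate.R implementation: the helper `kron`, the toy 3 × 2 genotype matrix, and the variable names are assumptions. The design matrix E follows the text: a first column of ones for the additive effect, and a second column that is 0 in the first (background) environment and 1 in the second.

```python
def kron(a, b):
    """Kronecker product of two matrices given as lists of rows."""
    return [
        [a_ij * b_kl for a_ij in a_row for b_kl in b_row]
        for a_row in a
        for b_row in b
    ]


# Environment index matrix for two environments (first one is the background).
E = [[1, 0],
     [1, 1]]

# Toy numeric genotype matrix G: n = 3 individuals, m = 2 markers.
G = [[0, 2],
     [1, 1],
     [2, 0]]

# GbyE.GD has shape 2n x 2m and the block layout [[G, 0], [G, G]].
gbye_gd = kron(E, G)

m = len(G[0])
additive_part = [row[:m] for row in gbye_gd]     # block column [G; G]
interactive_part = [row[m:] for row in gbye_gd]  # block column [0; G]
```

Because the interactive columns are zero for every record from the first environment, any effect estimated on them is expressed relative to that environment, as stated in the text.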

figure 1

The workflow pipeline of GbyE. GbyE contains three main steps. (Step 1) Preprocessing of phenotype and genotype data. The phenotype values in each environment are normalized separately, and all genotypes in HapMap, VCF, BED, and other formats are converted to numeric genotypes; (Step 2) Generation of the GbyE phenotype and interactive genotype matrix through the GbyE transformation. In the GbyE.GD matrix, blue characters indicate the additive effect and red ones the interactive effect; (Step 3) MLM, rrBLUP, and BGLR are used to perform GWAS and GS

Association analysis model

The mixed linear model (MLM) of GAPIT is used as the basic model for GWAS analysis, with the principal component analysis (PCA) parameter set to 3. The p-values of the detection results are then sorted, and their power and FDR values are calculated. The general expression of the MLM (Fig. 1) is:

\(Y=PCA+{SNP}_{i}+Z\mu +e\)

where Y is the vector of phenotypic measures (2n × 1); PCA and \({SNP}_{i}\) are defined as fixed effects, with a size of (2n × 2m); Z is the incidence matrix of random effects; μ is the random effect vector, which follows the normal distribution μ ~ N(0, \(\delta_{G}^{2}\)K) with mean vector 0 and variance-covariance matrix \(\delta_{G}^{2}\)K, where \(\delta_{G}^{2}\) is the total genetic variance, including additive and interactive variance, and K is the kinship matrix built with all genotypes, including additive and interactive genotypes; e is a random error vector, and its elements need not be independent and identically distributed, e ~ N(0, \(\delta_{e}^{2}\)I), where \(\delta_{e}^{2}\) is the residual and environment variance and I is the identity matrix.

Detectivity of GWAS

In the GWAS results, the list of markers ordered by p-value was used to evaluate the detectivity of the GWAS methods. When all simulated QTNs were detected, the power of the GWAS method was considered to be 1 (100%). Moving down the list of markers, the power increases as more real QTNs are included. The FDR is the ratio of the number of markers wrongly declared as real QTNs to the number of all un-QTNs. The mean over 100 cycles was used as the reference value for statistical power comparison. For comparison, we used a method common in GWAS research with multiple traits or environmental phenotypes [22]. This method, called the Mean value method, takes the mean of the phenotypic values under different conditions as the phenotypic value for GWAS analysis. The results of GbyE, including its additive and interactive effects, were compared with those of the Mean value method to evaluate the detection power of the GbyE strategy. Through these evaluation indicators, we aim to comprehensively evaluate the statistical power of the GbyE strategy in GWAS and provide a reference for future optimization research.

The formulae for calculating Power and FDR are as follows:

\(Power=\frac{\sum_{i}{n}_{i}}{{m}_{r}}\)

where \(n_{i}\) indicates whether the i-th detection is a true QTN (1 if true, 0 if false); \(m_{r}\) is the total number of true QTLs in the sample; the maximum value of Power is 1.

\(FDR=\frac{\sum_{i}{N}_{i}}{{M}_{f}}\)

where \(N_{i}\) indicates whether the i-th detection is a false QTN (1 if so, 0 otherwise), accumulated over the detections; \(M_{f}\) is the number of all labeled un-QTNs in the total sample; the maximum value of FDR is 1.
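As a worked illustration of these two quantities, the following Python sketch walks down a ranked marker list and accumulates Power and FDR at each position. It follows the definitions given in this section (false detections divided by the number of non-QTN markers), not the GbyE.Power.FDR.R code; the function name `power_fdr_curve` and the marker names are hypothetical.

```python
def power_fdr_curve(ranked_markers, true_qtns, n_markers):
    """Return (FDR, Power) pairs along a marker list sorted by p-value.

    ranked_markers: marker names ordered from smallest to largest p-value.
    true_qtns:      names of the simulated QTNs.
    n_markers:      total number of markers tested.
    """
    true_qtns = set(true_qtns)
    m_r = len(true_qtns)      # number of true QTNs
    m_f = n_markers - m_r     # number of non-QTN markers
    hits = 0                  # running sum of n_i
    false_hits = 0            # running sum of N_i
    curve = []
    for marker in ranked_markers:
        if marker in true_qtns:
            hits += 1
        else:
            false_hits += 1
        curve.append((false_hits / m_f, hits / m_r))
    return curve


# Toy example: 6 markers tested, 2 of them are true QTNs.
curve = power_fdr_curve(["s3", "s7", "s1", "s9"], {"s3", "s1"}, n_markers=6)
# curve == [(0.0, 0.5), (0.25, 0.5), (0.25, 1.0), (0.5, 1.0)]
```

Plotting Power against FDR for each method, averaged over simulation replicates, gives curves of the kind compared in the results.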

Genomic prediction

To compare the prediction accuracy of different GS models using GbyE, we ran rrBLUP and Bayesian methods using R packages. All phenotypes of the reference population and the genotypes of the whole population were used to train the model and predict the genomic estimated breeding values (gEBV) of all individuals. The correlation between the real phenotypes and the gEBV of the inference population was taken as the prediction accuracy. Fivefold cross-validation with 100 repeats was performed to avoid over-prediction and reduce bias. To distinguish the additive and interactive effects in GbyE, we defined two lists, one for additive and one for interactive effects, in the "ETA" argument of BGLR, and fitted the additive and interactive effects as two kinships for random effects. However, rrBLUP could not take the two lists of genetic effects separately, so the additive and interactive genotypes were used together to calculate the whole genetic kinship in rrBLUP (Fig. 1). The relevant parameters in BGLR were set as follows: 1) model set to "BRR"; 2) nIter set to 12000; 3) burnIn set to 10000. The results of the above operations were averaged over 100 cycles. In addition to BRR, we validated the GbyE method with four other Bayesian methods in BGLR (BayesA, BayesB, BayesCpi, and Bayesian LASSO).
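The evaluation scheme described above (repeated k-fold cross-validation, with accuracy measured as the correlation between observed and predicted values in the held-out fold) can be sketched as follows. This is a generic illustration in Python rather than the rrBLUP or BGLR workflow: the predictor is a stand-in single-covariate regression, and the names `cv_accuracy`, `kfold_indices`, and `single_marker_regression` are hypothetical.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def kfold_indices(n, k, rng):
    """Shuffle 0..n-1 and split the indices into k disjoint folds."""
    idx = list(range(n))
    rng.shuffle(idx)
    return [idx[i::k] for i in range(k)]


def cv_accuracy(y, fit_predict, k=5, repeats=100, seed=1):
    """Mean held-out correlation over `repeats` rounds of k-fold CV.

    fit_predict(train, test) must return predictions for the test indices
    using only the phenotypes of the train indices.
    """
    rng = random.Random(seed)
    accuracies = []
    for _ in range(repeats):
        for test in kfold_indices(len(y), k, rng):
            held_out = set(test)
            train = [i for i in range(len(y)) if i not in held_out]
            predicted = fit_predict(train, test)
            accuracies.append(pearson([y[i] for i in test], predicted))
    return sum(accuracies) / len(accuracies)


# Toy data: one covariate x drives the phenotype y (assumption for the demo).
data_rng = random.Random(0)
x = [data_rng.gauss(0, 1) for _ in range(50)]
y = [2 * xi + data_rng.gauss(0, 0.5) for xi in x]


def single_marker_regression(train, test):
    """Stand-in predictor: least-squares line fitted on the training set."""
    mx = sum(x[i] for i in train) / len(train)
    my = sum(y[i] for i in train) / len(train)
    slope = (sum((x[i] - mx) * (y[i] - my) for i in train)
             / sum((x[i] - mx) ** 2 for i in train))
    return [my + slope * (x[i] - mx) for i in test]


accuracy = cv_accuracy(y, single_marker_regression, k=5, repeats=3)
```

In an actual analysis, `fit_predict` would wrap the chosen GS model, and the phenotypes of the held-out individuals would be masked before fitting.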

Partial missing phenotype in the prediction

In this study, we artificially removed phenotype values in one or both environments for part of the population of 281 inbred maize lines. In the missing single environment case, the inference set in the cross-validation was selected from the whole population, and each individual in the inference set was missing its phenotype in only one environment. The phenotype in the other environment and the genotypes were always kept. In the missing double environment case, both phenotypes and genotypes of environment 1 and environment 2 are missing, and the model can only predict phenotypic values in the two missing environments through the effects of other markers. In addition, the analysis was run on both standardized and unstandardized data to assess whether standardization affected model estimation. This experiment used the "ML" method in rrBLUP to ensure the efficiency of the model.

GWAS statistical power of models at different heritabilities and genetic correlations

Power-FDR plots were used to show the detection efficiency of GbyE at three levels of genetic correlation and three levels of heritability, giving nine simulated scenarios in total (genetic correlation from low to high, left to right; heritability from low to high, top to bottom). To distinguish whether the improved detection ability of GbyE in genome-wide association analysis comes from the additive effect or from the environmental interaction effect, we plotted their Power-FDR curves separately and added the traditional Mean method for comparison. As shown in Fig. 2, the GbyE algorithm detects more statistically significant genetic loci at lower FDR under every genetic background. However, in the combinations with low heritability (Fig. 2A, B, C), the interactive effect detected more real loci than GbyE at low FDR, but as FDR continued to increase, GbyE detected more real loci than the other groups. In the combinations with high heritability, all groups have high statistical power at low FDR, but the advantage of GbyE becomes more apparent as FDR increases. Across all heritability combinations, heritability has no obvious effect on the interactive effect, but GbyE always maintains the highest statistical power. On average, GbyE increases the detection power of GWAS by about 20%, and its advantage becomes more pronounced as genetic correlation decreases, indicating that G × E plays a role.

figure 2

The power-FDR testing in simulated traits. Comparison of the GbyE algorithm with the conventional Mean method in terms of detection power and FDR. From left to right, the three levels of genetic correlation are shown in the order low, medium, and high. From top to bottom, the three levels of heritability are shown in the order low, medium, and high. (1) Inter: interactive section extracted from GbyE; (2) AddE: additive section extracted from GbyE; (3) \(h_{l}^{2}\), \(h_{m}^{2}\), \(h_{h}^{2}\): low, medium, and high heritability; (4) \(R_{l}\), \(R_{m}\), \(R_{h}\): low, medium, and high genetic correlation

Resolution of additive and interactive effect

The output of GbyE can be understood as a resolution of the additive and interactive genetic effects. Hence, we created combined Manhattan plots with the Mean result from MLM and the additive and interactive results from GbyE. As shown in Fig. 3, true marker loci were detected on chromosomes 1, 6, and 9 in Mean, and the same loci were detected on chromosomes 1 and 6 in the additive result of GbyE (loci detected by both results are marked with solid gray lines in the figure). All known pseudo QTNs are labeled with gray dots in the circle. In total, 20 pseudo QTNs were simulated for this trait (heritability set to 0.9 and genetic correlation set to 0.1). Although the additive section of GbyE did not detect the locus on chromosome 9 (the p-values of its markers did not pass the significance threshold, p-value < \(3.23\times 10^{-6}\)), that locus was highly significant relative to other markers on the same chromosome. In the interactive effect of GbyE, we detected more significant loci on chromosomes 1, 2, 3, and 10, which were detected in neither of the two previous sections. An integrated QQ plot (Fig. 3D) shows that the overall statistical power of Mean and of the additive section in GbyE are close, whereas the interactive section in GbyE shows slight inflation.

figure 3

Manhattan statistical comparison plot. Manhattan comparison plots of the Mean ( A ), additive ( B ) and gene-environment interactive ( C ) sections at a heritability of 0.9 and a genetic correlation of 0.1. Different colors distinguish the chromosomes (x-axis). Loci marked with reinforcing circles and centroids are the true QTN loci. Loci found in both parts are shown as solid lines, and loci found only in the interactive section are shown as dashed lines. The horizontal lines indicate the significance threshold ( p -value < \(3.23 \times 10^{-6}\)). ( D ) Quantile–quantile plots of the simulated phenotypes from the demo data. The x-axis shows the expected values of log p -values and the y-axis shows the observed values. The red diagonal has a slope of 1. GbyE-inter is the interactive section in GbyE; GbyE-AddE is the additive section in GbyE

Genomic selection in assumption codistribution

Under the rrBLUP model, the prediction accuracy of GbyE was significantly higher than that of the Mean value method for most combinations of heritability and genetic correlation (Fig. 4). The prediction accuracy of the additive effect was close to that of the Mean value method, which is consistent with the situation under low heritability. The prediction accuracy of the interactive section remained at the same level as that of the full GbyE model, indicating that the interactive section plays an important role in the model. In \({{\text{h}}}_{{\text{l}}}^{2}{{\text{R}}}_{{\text{h}}}\) (Fig. 4C), \({{\text{h}}}_{{\text{m}}}^{2}{{\text{R}}}_{{\text{h}}}\) (Fig. 4F) and \({{\text{h}}}_{{\text{h}}}^{2}{{\text{R}}}_{{\text{l}}}\) (Fig. 4G), the prediction accuracy of GbyE was slightly higher than that of the Mean value method, but the difference was not significant. Only in \({{\text{h}}}_{{\text{h}}}^{2}{{\text{R}}}_{{\text{m}}}\) (Fig. 4H) was the prediction accuracy of GbyE slightly lower than that of the Mean value method, again without a significant difference. When heritability and genetic correlation were both low, the prediction accuracy of the Mean value method and of the additive effect model remained at a similar level. As heritability and genetic correlation increased, however, the difference in prediction accuracy between the two gradually widened. In summary, the GbyE algorithm can improve the accuracy of GS under the rrBLUP model by capturing information on the effects of multiple environments or traits.

figure 4

Box-plot of model prediction accuracy. The prediction accuracy (Pearson's correlation coefficient) of the GbyE algorithm was compared with that of the traditional Mean value method in a simulation experiment of genomic selection under the rrBLUP model. The experiment simulated the effect of different levels of heritability and genetic correlation on the prediction accuracy of genomic selection. The rows from top to bottom represent low heritability ( \({{\text{h}}}_{{\text{l}}}^{2}\) ), medium heritability ( \({{\text{h}}}_{{\text{m}}}^{2}\) ) and high heritability ( \({{\text{h}}}_{{\text{h}}}^{2}\) ), respectively; the columns from left to right represent low genetic correlation ( \({{\text{R}}}_{{\text{l}}}\) ), medium genetic correlation ( \({{\text{R}}}_{{\text{m}}}\) ) and high genetic correlation ( \({{\text{R}}}_{{\text{h}}}\) ), respectively. The x-axis shows the different test methods and effects, and the y-axis shows the prediction accuracy
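Since the caption defines prediction accuracy as Pearson's correlation coefficient, a minimal sketch of that calculation may help. It assumes accuracy is computed between observed phenotypes and model predictions for held-out individuals; the function name and the example values are hypothetical and do not come from the study.

```python
import math


def pearson_accuracy(observed, predicted):
    """Pearson correlation between observed and predicted values."""
    if len(observed) != len(predicted) or len(observed) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    n = len(observed)
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    # Sums of cross-products and squared deviations around the means.
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted))
    ss_o = sum((o - mean_o) ** 2 for o in observed)
    ss_p = sum((p - mean_p) ** 2 for p in predicted)
    if ss_o == 0 or ss_p == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(ss_o * ss_p)


# Hypothetical held-out phenotypes and predicted breeding values.
observed = [1.0, 2.0, 3.0, 4.0]
predicted = [1.1, 1.9, 3.2, 3.8]
accuracy = pearson_accuracy(observed, predicted)
```

In a cross-validation setting, one such value would be produced per replicate, and the box-plots summarize the resulting distribution.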

Genomic selection in assumption un-codistribution

The overall performance of GbyE under the 'BRR' statistical model in the BGLR package was consistent with that under rrBLUP, maintaining high prediction accuracy for most combinations of heritability and genetic correlation (Fig. S1). However, at low and medium heritability, the difference in prediction accuracy between GbyE and the Mean value method gradually decreased as genetic correlation increased, and there was no statistically significant difference between the two. At high heritability, the prediction accuracy of GbyE in \({{\text{h}}}_{{\text{h}}}^{2}{{\text{R}}}_{{\text{l}}}\) (Fig. S1G) and \({{\text{h}}}_{{\text{h}}}^{2}{{\text{R}}}_{{\text{h}}}\) (Fig. S1I) was significantly higher than that of the Mean value method. In contrast, at medium genetic correlation there was no significant difference between GbyE and the Mean value method, and the overall mean of GbyE was lower than that of Mean. When heritability is relatively high and genetic correlation is low, the prediction accuracy of GbyE is significantly higher than that of the Mean method, as in \({{\text{h}}}_{{\text{m}}}^{2}{{\text{R}}}_{{\text{l}}}\) (Fig. S1D), \({{\text{h}}}_{{\text{h}}}^{2}{{\text{R}}}_{{\text{l}}}\) (Fig. S1G) and \({{\text{h}}}_{{\text{h}}}^{2}{{\text{R}}}_{{\text{m}}}\) (Fig. S1H). Therefore, GbyE is more suitable for situations with high heritability and low genetic correlation.

Adaptability of Bayesian models

Next, we tested more complex Bayesian models. The GbyE algorithm and the Mean value method were each combined with five Bayesian algorithms in BGLR for GS analysis, and an R script was used for the phenotypic simulation test, with heritability and genetic correlation both set to 0.5. The results indicate that under the three Bayesian models BRR, BayesA and BayesLASSO, the prediction accuracy of GbyE is significantly higher than that of the Mean value method (Fig. 5). In contrast, under BayesB and BayesCpi, the prediction accuracy of GbyE is lower than that of the Mean value method. Using information from G × E, the GbyE algorithm increases the prediction accuracy of BRR, BayesA and BayesLASSO by 9.4%, 9.1% and 11%, respectively, relative to the Mean value method. However, prediction accuracy decreased by 11.3% under the BayesB model and by 6% under the BayesCpi model.

figure 5

Relative prediction accuracy histogram for different Bayesian models. The X-axis is the Bayesian approach based on BGLR, and the Y-axis is the relative prediction accuracy. Where we normalize the prediction accuracy of Mean (the prediction accuracy is all adjusted to 1); the prediction accuracy of GbyE is the increase or decrease value relative to Mean in the same group of models
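The normalization described in this caption can be sketched as follows. This is only an illustration of the arithmetic, assuming the relative accuracy is the ratio of a method's accuracy to the accuracy of Mean within the same model; the function names and the numeric values are hypothetical and are not results from the study.

```python
def relative_accuracy(acc_method, acc_mean):
    """Accuracy of a method expressed relative to Mean (Mean itself maps to 1)."""
    if acc_mean <= 0:
        raise ValueError("the reference accuracy must be positive")
    return acc_method / acc_mean


def percent_change(acc_method, acc_mean):
    """Percentage increase (positive) or decrease (negative) relative to Mean."""
    return (relative_accuracy(acc_method, acc_mean) - 1.0) * 100.0


# Hypothetical accuracies for one Bayesian model.
acc_mean = 0.50
acc_gbye = 0.55
print(relative_accuracy(acc_gbye, acc_mean))         # 1.1
print(round(percent_change(acc_gbye, acc_mean), 1))  # 10.0
```

A value above 1 (or a positive percentage) means the method improves on Mean for that model, and a value below 1 means it falls short.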

Impact of all and partial environmental missing

We tested the effect of missing environments using simulated data. We simulated nine scenarios with different combinations of heritability and genetic correlation (Fig. 6) and tested single- and dual-environment missingness. When a single environment was missing, the prediction accuracy of GbyE was significantly higher than that of the Mean value method, regardless of the combination of heritability and genetic correlation. In \({{\text{h}}}_{{\text{h}}}^{2}{{\text{R}}}_{{\text{h}}}\) , the prediction accuracy of GbyE exceeded 0.5, the highest value among all simulated combinations. When GbyE estimated the phenotypic values of Environment 1 and Environment 2 separately, its prediction accuracy appeared unexpectedly high. When the phenotypic values of both environments were missing for the same genotype, the prediction accuracy of GbyE did not decrease significantly and remained comparable to that with a single environment missing. However, when GbyE estimated Environment 1 and Environment 2 separately in this case, prediction accuracy decreased significantly compared with the single-environment case, and it was extremely low for both environments in \({{\text{h}}}_{{\text{l}}}^{2}{{\text{R}}}_{{\text{m}}}\) (Fig. 6B). In addition, the prediction accuracy of GbyE was lower than that of the Mean value method only in \({{\text{h}}}_{{\text{l}}}^{2}{{\text{R}}}_{{\text{h}}}\) , whether one or both environments were missing.

figure 6

Prediction accuracy of simulated data in single and dual environment absence. The prediction effect of GbyE was divided into two parts, environment 1 and environment 2, to compare the prediction accuracy of GbyE when predicting these two parts separately. This includes simulations with missing phenotypes and genotypes in environment 1 only ( A ) and simulations with missing in both environments ( B ). The horizontal coordinates of the graph indicate the different combinations of heritabilities and genetic correlations of the simulations

The phenotype of an organism is usually controlled by multiple factors, mainly genetic [23] and environmental [24] factors and their interactions. Quantitative traits are often influenced by all three [25, 26]. However, because of computational limitations and the lack of dedicated tools, interactive effects have been ignored in most GWAS and GS research, and additive and interactive effects are difficult to distinguish. The ratio of additive genetic variance to phenotypic variance is termed narrow-sense heritability, and the squared prediction accuracy of an additive GS model is considered unable to exceed it. In this study, the additive section of GbyE is essentially equivalent in detection power to traditional models; the key advantage of GbyE is the interactive section, which detected more significant markers with interactive effects. Detecting two genetic effects (additive and interactive) in GWAS and GS increases computational complexity, and obtaining the genotypes for genetic interactions through a Kronecker product is an efficient way to do so. This allows additive and interactive genetic effects to be estimated separately during the analysis. The estimated effect of each GbyE genotype (including additive and interactive markers) is then tested against a t-distribution to calculate a p -value, and significance is assessed with multiple-testing correction. GbyE also extends the estimated heritability to a generalized heritability, which can be interpreted as the ratio of total genetic variance to phenotypic variance.
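The variance ratios described in this paragraph can be written compactly. The symbols below are assumptions introduced for illustration only: \(\sigma_{A}^{2}\) is the additive genetic variance, \(\sigma_{I}^{2}\) the interactive genetic variance, \(\sigma_{G}^{2}\) the total genetic variance, \(\sigma_{P}^{2}\) the phenotypic variance, and \(r\) the prediction accuracy of an additive GS model. The decomposition \(\sigma_{G}^{2} = \sigma_{A}^{2} + \sigma_{I}^{2}\) further assumes no covariance between the additive and interactive components.

\[
h^{2} = \frac{\sigma_{A}^{2}}{\sigma_{P}^{2}}, \qquad r^{2} \le h^{2}, \qquad H_{G}^{2} = \frac{\sigma_{G}^{2}}{\sigma_{P}^{2}} = \frac{\sigma_{A}^{2} + \sigma_{I}^{2}}{\sigma_{P}^{2}}
\]

Under these assumptions \(H_{G}^{2} \ge h^{2}\), which is one way to read the statement that a model capturing interactive effects has a higher ceiling on prediction accuracy than a purely additive one.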

The genetic correlation among traits in multiple environments is the main intrinsic factor driving the performance of GbyE. When genetic correlation is high, additive genetic effects dominate the total genetic effect, and interactive genetic effects across traits or environments are usually small [27]. Consequently, the statistical power of the GbyE algorithm did not improve significantly over the traditional method (Mean value) in simulations with high genetic correlation. In contrast, when genetic correlation is low, the genetic variance of additive effects is relatively small and the genetic variance of interactive effects is dominant. In this case, GbyE uses multiple environments or traits to increase statistical power. Because the GbyE algorithm obtains additive, environmental and interactive information by encoding numerical genotypes, it only increases the volume of SNP data and can be applied to any traditional GWAS statistical model. This may slightly increase the running time of the GWAS model, but compared with other multi-environment or multi-trait models [28, 29], GbyE needs to run a complete traditional GWAS only once to obtain the results.
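The idea of encoding additive and interactive information into an enlarged numerical genotype matrix can be sketched with a Kronecker product. This is a minimal illustration and not the GbyE source code: it assumes two environments, an environment design matrix whose first column is all ones (the additive part) and whose second column is a +1/−1 contrast (the interactive part), and genotypes coded 0/1/2. The names `kron`, `env_design` and `genotypes` are hypothetical, and the coding actually used by GbyE may differ.

```python
def kron(a, b):
    """Kronecker product of two matrices given as lists of rows."""
    return [
        [a_val * b_val for a_val in a_row for b_val in b_row]
        for a_row in a
        for b_row in b
    ]


# Rows: environment 1, environment 2. Columns: additive (all ones), contrast.
env_design = [
    [1, 1],
    [1, -1],
]

# Two individuals (rows) by three markers (columns), coded 0/1/2.
genotypes = [
    [0, 2, 1],
    [1, 0, 2],
]

# Each individual now appears once per environment (4 rows). The first three
# columns carry the additive markers, the last three the interactive markers.
expanded = kron(env_design, genotypes)
for row in expanded:
    print(row)
# [0, 2, 1, 0, 2, 1]
# [1, 0, 2, 1, 0, 2]
# [0, 2, 1, 0, -2, -1]
# [1, 0, 2, -1, 0, -2]
```

Because the result is still an ordinary numeric marker matrix, with more rows and columns than the input, it could in principle be passed to any single-trait GWAS or GS model alongside the stacked phenotypes from both environments.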

In GS, the rrBLUP algorithm is a prediction method based on a linear mixed model that assumes all markers contribute genetic effects and that these effects follow a normal distribution [30]. In contrast, the BGLR model is a linear mixed model that assumes gene effects are randomly drawn from a multivariate normal distribution and genotype effects are randomly drawn from a multivariate Gaussian process; it takes potential pleiotropy and polygenic effects into account and allows the effects of single genes to be inferred while genomic values are estimated [31]. The algorithm typically uses Markov chain Monte Carlo methods to estimate the ratio between genetic and residual variances [32, 33]. Because the model already accounts for more biological features and complexity, the overall improvement of GbyE over the Mean method is smaller under BGLR. In addition, the Markov chain length set in the BGLR package is often above 20,000 so that the chain stabilizes and yields stable parameter estimates [34]. GbyE is effective in improving the statistical power of the model under most Bayesian statistical models. For the phenotypes we simulated, more iterations could not be provided for the BayesB and BayesCpi models because of limits on computation time, which led to low prediction accuracy. It is worth noting that the prediction accuracy of BayesCpi may also be influenced by the number of QTLs [35], and the prediction accuracy of BayesB is often related to the distribution of allele frequencies (from rare to common variants) at random loci [36].

The overall statistical power of GbyE was significantly higher with a single environment missing than with both environments missing, because in the single-environment case GbyE can take full advantage of the phenotypic information from the second environment. The correlation between the two environments can also affect the performance of the GbyE algorithm in different ways. On the one hand, a high correlation between two environments can improve the prediction accuracy of GbyE by using information from one environment to predict breeding values in the other, even if there is little direct information from that environment [37, 38]. On the other hand, when two environments are largely uncorrelated, a GbyE model trained in one environment may not transfer well to the other, which may reduce prediction accuracy [39]. In our tests, we found that when a GS model is trained in one environment and tested in another, a high correlation between environments may lead the model to capture similarities between environments that are unrelated to G × E information [40]. However, when the breeding values for each environment were estimated separately, GbyE still made effective predictions using the genotypes in that environment and maintained high prediction accuracy. As expected, the additive effect represents the average genetic effect across environments, and its predictive performance differs little from that of the Mean method. The interactive effect, in contrast, has one column fewer than the number of environments and represents the relative values between environments, a component that directly affects predictive performance. The correlation between the two environments may therefore have both positive and negative effects on the performance of GbyE, so the relationship between environments should be considered carefully in subsequent development and testing.

A key advantage of the GbyE algorithm is that it can be applied to almost all current genome-wide association and prediction models. However, the focus of GbyE is still on estimating additive and interactive effects separately, so that it is easy to determine which portion of the genetic effect is playing a role in the estimation. The GbyE algorithm may have implications for the design of future GS studies. For example, the model could be used to identify the best environments or traits to include in GS studies in order to maximize prediction accuracy. It is particularly important to test the model on large datasets with different genetic backgrounds and environmental conditions to ensure that it can accurately predict genome-wide effects in a variety of contexts.

GbyE can model the effects of gene-environment interactions by building genotype files for multiple environments or multiple traits, normalizing the effects of those environments and traits on marker effects. It also provides higher statistical power and prediction accuracy for GWAS and GS. The additive and interactive genetic effects can be resolved clearly, which makes it possible to use environmental information to improve the statistical power and prediction accuracy of traditional models and helps us to better understand the interactions between genes and the environment.

Availability of data and materials

The GbyE source code, demo script, and demo data are freely available on the GitHub website ( https://github.com/liu-xinrui/GbyE ).

Abbreviations

  • GWAS: Genome-wide association study
  • GS: Genome selection
  • G × E: Genetic by environmental interaction
  • GAPIT: Genome association and prediction integrated tool
  • MLM: Mixed linear model
  • BGLR: Bayesian generalized linear regression
  • rrBLUP: Ridge regression best linear unbiased prediction
  • FDR: False discovery rate
  • PCA: Principal component analysis
  • GEBV: Genomic estimated breeding value

Maazi H, Hartiala JA, Suzuki Y, Crow AL, Shafiei Jahani P, Lam J, Patel N, Rigas D, Han Y, Huang P. A GWAS approach identifies Dapp1 as a determinant of air pollution-induced airway hyperreactivity. PLoS Genet. 2019;15(12):e1008528.

Simonds NI, Ghazarian AA, Pimentel CB, Schully SD, Ellison GL, Gillanders EM, Mechanic LE. Review of the gene-environment interaction literature in cancer: what do we know? Genet Epidemiol. 2016;40(5):356–65.

Wang X, Chen H, Kapoor PM, Su Y-R, Bolla MK, Dennis J, Dunning AM, Lush M, Wang Q, Michailidou K. A Genome-Wide Gene-Based Gene-Environment Interaction Study of Breast Cancer in More than 90,000 Women. Cancer research communications. 2022;2(4):211–9.

Chen R-X, Dai M-D, Zhang Q-Z, Lu M-P, Wang M-L, Yin M, Zhu X-J, Wu Z-F, Zhang Z-D, Cheng L. TLR Signaling Pathway Gene Polymorphisms, Gene-Gene and Gene-Environment Interactions in Allergic Rhinitis. Journal of Inflammation Research. 2022;15:3613–30.

Zhao M-Z, Song X-S, Ma J-S. Gene× environment interaction in major depressive disorder. World Journal of Clinical Cases. 2021;9(31):9368.

Falconer DS. The problem of environment and selection. Am Nat. 1952;86(830):293–8.

Kim J, Zhang Y, Pan W. Powerful and adaptive testing for multi-trait and multi-SNP associations with GWAS and sequencing data. Genetics. 2016;203(2):715–31.

Visscher PM, Wray NR, Zhang Q, Sklar P, McCarthy MI, Brown MA, Yang J. 10 years of GWAS discovery: biology, function, and translation. The American Journal of Human Genetics. 2017;101(1):5–22.

van Os J, Rutten BP. Gene-environment-wide interaction studies in psychiatry. Am J Psychiatry. 2009;166(9):964–6.

Winham SJ, Biernacka JM. Gene–environment interactions in genome-wide association studies: current approaches and new directions. Journal of Child Psychology Psychiatry. 2013;54(10):1120–34.

Windhausen VS, Atlin GN, Hickey JM, Crossa J, Jannink J-L, Sorrells ME, Raman B, Cairns JE, Tarekegne A, Semagn K. Effectiveness of genomic prediction of maize hybrid performance in different breeding populations and environments. G3: Genes|Genomes|Genetics. 2012;2(11):1427–36.

Xu S, Zhu D, Zhang Q. Predicting hybrid performance in rice using genomic best linear unbiased prediction. Proc Natl Acad Sci. 2014;111(34):12456–61.

Zhao Y, Mette M, Gowda M, Longin C, Reif J. Bridging the gap between marker-assisted and genomic selection of heading time and plant height in hybrid wheat. Heredity. 2014;112(6):638–45.

Crossa J, Perez P, Hickey J, Burgueno J, Ornella L, Cerón-Rojas J, Zhang X, Dreisigacker S, Babu R, Li Y. Genomic prediction in CIMMYT maize and wheat breeding programs. Heredity. 2014;112(1):48–60.

Crossa J, Pérez-Rodríguez P, Cuevas J, Montesinos-López O, Jarquín D, de los Campos G, Burgueño J, González-Camacho JM, Pérez-Elizalde S, Beyene Y. Genomic selection in plant breeding: methods, models, and perspectives. Trends Plant Sci. 2017;22(11):961–75.

Roorkiwal M, Jarquin D, Singh MK, Gaur PM, Bharadwaj C, Rathore A, Howard R, Srinivasan S, Jain A, Garg V. Genomic-enabled prediction models using multi-environment trials to estimate the effect of genotype× environment interaction on prediction accuracy in chickpea. Sci Rep. 2018;8(1):11701.

Burgueño J, de los Campos G, Weigel K, Crossa J. Genomic prediction of breeding values when modeling genotype× environment interaction using pedigree and dense molecular markers. Crop Science. 2012;52(2):707–19.

Jarquín D, Crossa J, Lacaze X, Du Cheyron P, Daucourt J, Lorgeou J, Piraux F, Guerreiro L, Pérez P, Calus M. A reaction norm model for genomic selection using high-dimensional genomic and environmental data. Theor Appl Genet. 2014;127:595–607.

Wang JB, Zhang ZW. GAPIT Version 3: boosting power and accuracy for genomic association and prediction. Genomics Proteomics Bioinformatics. 2021;19(4):629–40.

Pérez P, de los Campos G. Genome-wide regression and prediction with the BGLR statistical package. Genetics. 2014;198(2):483–95.

Endelman JB. Ridge Regression and other kernels for genomic selection with R package rrBLUP. Plant Genome J. 2011;4:250–5.

Turley P, Walters RK, Maghzian O, Okbay A, Lee JJ, Fontana MA, Nguyen-Viet TA, Wedow R, Zacher M, Furlotte NA. Multi-trait analysis of genome-wide association summary statistics using MTAG. Nat Genet. 2018;50(2):229–37.

Falconer DS. Introduction to quantitative genetics. Pearson Education India; 1996.

Lynch M, Walsh B. Genetics and analysis of quantitative traits. Vol. 1. Sunderland, MA: Sinauer; 1998.

Mackay TF. The genetic architecture of quantitative traits. Annu Rev Genet. 2001;35(1):303–39.

Visscher PM, Hill WG, Wray NR. Heritability in the genomics era—concepts and misconceptions. Nat Rev Genet. 2008;9(4):255–66.

Van der Sluis S, Posthuma D, Dolan CV. TATES: efficient multivariate genotype-phenotype analysis for genome-wide association studies. PLoS Genet. 2013;9(1):e1003235.

O’Reilly PF, Hoggart CJ, Pomyen Y, Calboli FC, Elliott P, Jarvelin M-R, Coin LJ. MultiPhen: joint model of multiple phenotypes can increase discovery in GWAS. PLoS ONE. 2012;7(5):e34861.

Chung J, Jun GR, Dupuis J, Farrer LA. Comparison of methods for multivariate gene-based association tests for complex diseases using common variants. Eur J Hum Genet. 2019;27(5):811–23.

Pérez-Rodríguez P, Gianola D, González-Camacho JM, Crossa J, Manès Y, Dreisigacker S. Comparison between linear and non-parametric regression models for genome-enabled prediction in wheat. G3: Genes|Genomes|Genetics. 2012;2(12):1595–605.

VanRaden PM. Efficient methods to compute genomic predictions. J Dairy Sci. 2008;91(11):4414–23.

Meuwissen TH, Hayes BJ, Goddard M. Prediction of total genetic value using genome-wide dense marker maps. Genetics. 2001;157(4):1819–29.

de los Campos G, Hickey JM, Pong-Wong R, Daetwyler HD, Calus MP. Whole-genome regression and prediction methods applied to plant and animal breeding. Genetics. 2013;193(2):327–45.

Andrieu C, De Freitas N, Doucet A, Jordan MI. An introduction to MCMC for machine learning. Mach Learn. 2003;50:5–43.

Daetwyler HD, Pong-Wong R, Villanueva B, Woolliams JA. The impact of genetic architecture on genome-wide evaluation methods. Genetics. 2010;185(3):1021–31.

Clark SA, Hickey JM, Van der Werf JH. Different models of genetic variation and their effect on genomic evaluation. Genet Sel Evol. 2011;43(1):1–9.

Yang J, Benyamin B, McEvoy BP, Gordon S, Henders AK, Nyholt DR, Madden PA, Heath AC, Martin NG, Montgomery GW. Common SNPs explain a large proportion of the heritability for human height. Nat Genet. 2010;42(7):565–9.

González-Recio O, Forni S. Genome-wide prediction of discrete traits using Bayesian regressions and machine learning. Genet Sel Evol. 2011;43:1–12.

Korte A, Farlow A. The advantages and limitations of trait analysis with GWAS: a review. Plant Methods. 2013;9(1):1–9.

Gauderman WJ. Sample size requirements for matched case-control studies of gene–environment interaction. Stat Med. 2002;21(1):35–50.

Acknowledgements

Thank you to all colleagues in the laboratory for their continuous help.

This project was partially funded by the National Key Research and Development Project of China, China (2022YFD1601601), the Heilongjiang Province Key Research and Development Project, China (2022ZX02B09), the Qinghai Science and Technology Program, China (2022-NK-110), Sichuan Science and Technology Program, China (Award #s 2021YJ0269 and 2021YJ0266), the Program of Chinese National Beef Cattle and Yak Industrial Technology System, China (Award #s CARS-37), and Fundamental Research Funds for the Central Universities, China (Southwest Minzu University, Award #s ZYN2023097).

Author information

Authors and affiliations.

Key Laboratory of Qinghai-Tibetan Plateau Animal Genetic Resource Reservation and Utilization, Sichuan Province and Ministry of Education, Southwest Minzu University, Chengdu, 610041, China

Xinrui Liu, Mingxiu Wang, Jie Qin, Yaxin Liu, Shikai Wang, Shiyu Wu, Ming Zhang, Jincheng Zhong & Jiabo Wang

Nanchong Academy of Agricultural Sciences, Nanchong, 637000, China

Contributions

JW and XL conceived and designed the project. XL managed the entire trial, conducted software code development, software testing, and visualization. MW, JQ, YL, SW, MZ and SW helped with data collection and analysis. JQ, and YL assisted with laboratory analyses. JW, and XL had primary responsibility for the content in the final manuscript. JZ supervised the research. JW designed software and project methodology. All authors approved the final manuscript. All authors have reviewed the manuscript.

Corresponding author

Correspondence to Jiabo Wang .

Ethics declarations

Ethics approval and consent to participate.

Not applicable.

Consent for publication

Competing interests.

The authors have declared no competing interests.

Additional information

Publisher’s note.

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Supplementary material 1.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/ . The Creative Commons Public Domain Dedication waiver ( http://creativecommons.org/publicdomain/zero/1.0/ ) applies to the data made available in this article, unless otherwise stated in a credit line to the data.

About this article

Cite this article.

Liu, X., Wang, M., Qin, J. et al. GbyE: an integrated tool for genome widely association study and genome selection based on genetic by environmental interaction. BMC Genomics 25 , 386 (2024). https://doi.org/10.1186/s12864-024-10310-5

Received : 27 December 2023

Accepted : 15 April 2024

Published : 19 April 2024

DOI : https://doi.org/10.1186/s12864-024-10310-5

  • Genomic selection

BMC Genomics

ISSN: 1471-2164

case study environmental epidemiology

IMAGES

  1. Statistical Methods for Environmental Epidemiology with R: A Case Study

  2. Routledge Revivals- Environmental Epidemiology

  3. Environmental Epidemiology

  4. Environmental Case Study Summaries

  5. Descriptive Epidemiology, Case Reports, Case Series, Cross-Sectional

  6. Environmental Epidemiology: Principles And Methods

VIDEO

  1. Lecture #2 Descriptive Epidemiological Study 🔥 || Epidemiological Studies ||@MedicalChannel-1

  2. case control study part 2 || epidemiology|| PSM|| @Sudarshan263

  3. HESI Environmental Epidemiology Webinar Series Evidence Integration in Epidemiology and Risk Asses

  4. module-3 Case study

  5. EPIDEMIOLOGY lecture 11 CASE CONTROL STUDY detailed information with all questions

  6. 🔴 2- Study Design, Dr.Hazem Sayed: How to easily identify the study type

COMMENTS

  1. Environmental-Epidemiology Studies: Their Design and Conduct

    This chapter discusses the origins of epidemiologic study and summarizes common analytic techniques. After a brief discussion of study designs and the types of information they produce, this chapter notes several difficulties for studies of environmental epidemiology, including the problems of studying small numbers of persons or rare diseases. We recommend that research on study designs focus ...

  2. Time-stratified case-crossover studies for aggregated data in

    The case-crossover study design was first developed for individual-level data to study transient effects of the risk of acute events (e.g. myocardial infarction). 12 However, in environmental epidemiology studies, ambient exposures (e.g. air pollution, temperature) are often assigned using central monitoring stations or gridded exposure ...

  3. Association between ambient temperature and ...

    Space-Time-Stratified Case-Crossover Design in Environmental Epidemiology Study. Health Data Science. 2021; 2021. ... A Nationwide Case-Crossover Study" for their contribution in the field of health while we make some contributions and clarifications on interpretations, ...

  4. The Case Time Series Design : Epidemiology

    The case time series data setting provides a flexible framework that can be adapted for studying a wide range of epidemiologic associations. For instance, outcomes, exposures, and other predictors can be represented by either indicators for events, episodes, or continuous measurements that vary across units and times, as in Figure 1. The time intervals can be of any length (from seconds to ...

  5. Environmental Epidemiology

    EHRs are ideal for environmental epidemiologic research given individuals seeking medical care are represented across diverse built, physical, and social environments. Geisinger EHRs have been used to study the effects of unconventional natural gas development on asthma, birth outcomes, chronic rhinosinusitis, depressive symptoms, fatigue ...

  6. Epidemiology: a foundation of environmental decision making

    Many epidemiologic studies are designed so they can be drawn upon to provide scientific evidence for evaluating hazards of environmental exposures, conducting quantitative assessments of risk, and ...

  7. Environmental Epidemiology: Principles and Methods

    Readers would be better advised to consult standard biostatistics and epidemiology methods texts. Some suggested replacements would be more in-depth descriptions of strengths and limitations of the various study designs widely adopted in environmental epidemiology—cohort, case-control, and case-crossover studies.

  8. Statistical Methods for Environmental Epidemiology with R: A Case Study

    A Case Study in Air Pollution and Health. Authors: Francesca Dominici, ... As an area of statistical application, environmental epidemiology and more specifically, the estimation of health risk associated with the exposure to environmental agents, has led to ...

  9. PDF Epidemiology: a foundation of environmental decision making

    of integrating the human health data with other diverse lines of evidence (e.g., animal toxicological data) to assess human health risks [1-3]. Although the collective body of evidence ...

  10. Space-Time-Stratified Case-Crossover Design in Environmental

    We are living in a changing environment that affects human health. It is vital to use proper methods to quantify the impact of environmental exposure (e.g., air pollutants and extreme temperatures) on human health. Case-crossover design with daily environmental exposure and health outcomes (e.g., deaths and hospitalisations) is one of the most common study designs. It allows researchers to ...

  11. Environmental epidemiology

    The study types most often employed in environmental epidemiology are: Cohort studies; Case-control studies; Cross-sectional studies; Estimating risk. Epidemiologic studies that assess how an environmental exposure and a health outcome may be connected use a variety of biostatistical approaches to attempt to quantify the relationship.

  12. PDF Environmental Epidemiology

    Case studies in environmental epidemiology will be discussed to provide details of research methods and findings. Course prerequisite: It is recommended, although not required, that students had an introductory epidemiology course and an introductory biostatistics course. The course is open to

  13. Chapter 3 Time series / case-crossover studies

    Chapter 3 Time series / case-crossover studies. We'll start by exploring common characteristics in time series data for environmental epidemiology. In the first half of the class, we're focusing on a very specific type of study—one that leverages large-scale vital statistics data, collected at a regular time scale (e.g., daily), combined with large-scale measurements of a climate-related ...

  14. Case-only approach applied in environmental epidemiology: 2 examples of

    The case-only approach can provide 2 benefits over a study with cases and non-cases or conventional cohort/case-control studies to estimate the interaction effect between a susceptibility factor and an environmental exposure [5, 7,8,9,10,11,12,13]. The first is that a more precise interaction effect estimate can be calculated.

  15. Role of epidemiology in risk assessment: a case study of five ortho

    Background The association between environmental chemical exposures and chronic diseases is of increasing concern. Chemical risk assessment relies heavily on pre-market toxicity testing to identify safe levels of exposure, often known as reference doses (RfD), expected to be protective of human health. Although some RfDs have been reassessed in light of new hazard information, it is not a ...

  16. PDF Environmental Epidemiology

    Contributions of Epidemiology to Environmental Health. Epidemiology aids the environmental health field through: • Concern with populations • Use of observational data • Methodology for study designs • Descriptive and analytic studies. Epidemiology is important to the study of environmental health problems because (1) many exposures ...

  17. Statistical Methods for Environmental Epidemiology with R: A Case Study

    This work presents a reproducible seasonal analysis of PM10 and mortality in the U.S. with a focus on pooling risks across locations and quantifying spatial heterogeneity. Studies of air pollution and health. - Introduction to R and air pollution and health data. - Reproducible research tools. - Statistical issues in estimating the health effects of spatial-temporal environmental exposures ...

  18. PDF Environmental Epidemiology

    environmental epidemiology. Case study: asbestos and mesothelioma. Berger RE et al. NEJM 2017;376:2591-2592. | 2 | 1/30 | Exposure assessment I: general guidelines and air pollution. Case study: radon and lung cancer. Konen T et al. Environ Res 2019;175:449-456. | 3 | 2/6 | Exposure assessment

  19. Devin Bowes uses wastewater-based epidemiology to advance environmental

    Bowes adopts an environmental justice and health equity lens. She was inspired to incorporate these perspectives into her work when conducting COVID-19 research as a graduate student. ... "In a year-long, neighborhood-level study using wastewater-based epidemiology to monitor trends of SARS-CoV-2 across a city, we learned that wastewater not ...

  20. Environmental Epidemiology

    Environmental Epidemiology. Environmental Epidemiology is an advanced epidemiology course that addresses epidemiological research methods used to study environmental exposures from air pollution to heavy metals, and from industrial pollutants to consumer product chemicals. The course will provide an overview of major study designs in ...

  21. Continued decline in the incidence of myocardial infarction ...

    During the early phase of the COVID-19 pandemic, the number of hospital admissions for myocardial infarction declined across the world [1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18]. Despite the vast number of published studies, it is still unknown whether the observed declines reflect a real decrease in the risk of myocardial infarction or merely the fact that fewer patients reached a hospital.

  22. GbyE: an integrated tool for genome widely association study and genome

    The growth and development of organisms depend on the effects of genetics, environment, and their interaction. In recent decades, many candidate additive genetic markers and genes have been detected using genome-wide association studies (GWAS). However, owing to limited computing power and practical tools, the interactive effects of markers and genes have not been clearly revealed.

  23. Life Cycle Assessment of Plasterboard Production: A UK Case Study

    Plasterboard, which serves as a nonstructural building material, is widely employed for lightweight wall construction and surface finishing in walls and ceilings. Amid mounting concerns regarding product sustainability and the adoption of Net Zero strategies, evaluating the environmental performance of materials has become crucial. This study aims to conduct a comprehensive life cycle ...