
Front Psychol

Evidencing How Experience and Problem Format Affect Probabilistic Reasoning Through Interaction Analysis

Manuele Reani

1 School of Computer Science, University of Manchester, Manchester, United Kingdom

Alan Davies

2 Division of Informatics, Imaging and Data Sciences, University of Manchester, Manchester, United Kingdom

Caroline Jay

Associated Data

Publicly available datasets were analyzed in this study. This data can be found here: https://github.com/manurea/mouse-interaction-bayesian-reasoning .

This paper examines the role that lived experience plays in the human capacity to reason about uncertainty. Previous research shows that people are more likely to provide accurate responses in Bayesian tasks when the data are presented in natural frequencies, the problem in question describes a familiar event, and the values of the data are in line with beliefs. Precisely why these factors are important remains open to debate. We elucidate the issue in two ways. Firstly, we hypothesize that in a task that requires people to reason about conditional probabilities, they are more likely to respond accurately when the values of the problem reflect their own lived experience, than when they reflect the experience of the average participant. Secondly, to gain further understanding of the underlying reasoning process, we employ a novel interaction analysis method that tracks mouse movements in an interactive web application and applies transition analysis to model how the approach to reasoning differs depending on whether data are presented using percentages or natural frequencies. We find (1) that the closer the values of the data in the problem are to people's self-reported lived experience, the more likely they are to provide a correct answer, and (2) that the reasoning process employed when data are presented using natural frequencies is qualitatively different to that employed when data are presented using percentages. The results indicate that the benefits of natural frequency presentation are due to a clearer representation of the relationship between sets and that the prior humans acquire through experience has an overwhelming influence on their ability to reason about uncertainty.

1. Introduction

Over the past five decades, the human ability to reason about uncertainty has been the subject of a wealth of research. A large amount of evidence has shown that humans struggle with certain forms of probabilistic reasoning. Of particular difficulty are problems where one is expected to use Bayes' theorem (Equation 1) to estimate the probability of a hypothesis given the availability of certain evidence. These appear to be challenging not only for laypeople but also for experts, such as medical professionals. Consider this example from an early study (Eddy, 1982 ):

The probability of having breast cancer for a woman of a particular age group is 1%. The probability that a woman with breast cancer will have a positive mammography is 80%. The probability that a woman without breast cancer will also have a positive mammography is 9.6%. What is the probability that a woman with a positive mammography actually has breast cancer?

To answer the question one should apply Equation 1 in which P(H ∣ E), known as the posterior probability or the positive predictive value (PPV), is the probability of the hypothesis (breast cancer) given the evidence (positive mammography), P(E ∣ H), known as the likelihood or sensitivity of the test, is the probability of the evidence given the hypothesis, P(H), known as the prior probability or base rate, is the probability of the hypothesis, P(E ∣ ¬H), known as the false positive rate or false alarm rate, is the probability of the evidence given the opposite hypothesis (e.g., no breast cancer) and P(¬H) is the probability of the opposite hypothesis.
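Written out with the terms defined above, Equation 1 is the standard form of Bayes' theorem:

```latex
P(H \mid E) = \frac{P(E \mid H)\, P(H)}{P(E \mid H)\, P(H) + P(E \mid \neg H)\, P(\neg H)} \tag{1}
```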

The answer to this problem in the original paper, achieved by applying the equation to the figures given in the question, is 7.8%. When posed to a group of physicians, however, only around 5% of them arrived at the correct estimate; the majority estimated a probability of between 70 and 80% (Eddy, 1982 ). Many subsequent studies have reported similar results, and for at least four decades there has been an ongoing debate about why people perform so poorly in probabilistic reasoning tasks (McDowell and Jacobs, 2017 ; Weber et al., 2018 ). Among the many explanations given, two have been reported extensively in previous literature. One theory is that many people fail to make a correct inference because they do not adequately consider the base rate—a phenomenon known as base rate neglect (Tversky and Kahneman, 1974 ; Bar-Hillel, 1983 ). When the base rate value is very small, this can lead to a large overestimation of the PPV, as found in the mammography problem study (Eddy, 1982 ). A second theory is that people who fail to make a correct inference confuse the sensitivity, i.e., P(E ∣ H), with the PPV, i.e., P(H ∣ E) (Eddy, 1982 ; Elstein, 1988 ; Gigerenzer and Hoffrage, 1995 ; Gigerenzer et al., 1998 ; Hoffrage and Gigerenzer, 1998 ). Previous research suggests that there are other factors affecting probabilistic reasoning. The information format in which the problem is described appears to be strongly linked to how people perceive probabilistic problems (Gigerenzer and Hoffrage, 1995 ; Binder et al., 2018 ). Furthermore, people's beliefs about the uncertainty surrounding the event described in the problem (which may be the result of direct experience) can also affect how they perceive and reason about probabilities (Cohen et al., 2017 ). At present, however, the cognitive processes involved in this form of reasoning remain poorly understood, and a full account of how these factors affect reasoning is still lacking (Weber et al., 2018 ). 
The current study has two aims. The first is to examine whether the previous lived experience people have with the uncertainty surrounding a real-life stochastic event affects their reasoning about the probability of such an event. We hypothesize that personal beliefs about uncertainty formed as a result of lived experience, reinforced over time, can bias people's estimation of risk. A second aim of the study is to investigate whether the format in which the data is presented (i.e., probabilities vs. frequencies) affects the way people approach the problem and whether behavioral patterns associated with the different formats can explain people's reasoning. To achieve this, we use a paradigm where information remains hidden until it is hovered over with a mouse. By tracking mouse movements, we can determine when and in what order people access the problem data, providing a window on the cognitive process.

1.1. Two Theories of Probabilistic Reasoning

It has been hypothesized that people's inability to answer probabilistic reasoning problems correctly might be related to the way these problems are framed, i.e., the information format (Gigerenzer and Hoffrage, 1995 ). The ecological rationality framework argues that the use of natural frequencies, or visualizations that highlight frequencies, improves probabilistic reasoning because this way of representing the problem reflects what humans have encountered in real-life situations over thousands of years of evolution (McDowell and Jacobs, 2017 ). The mammography problem re-framed using frequencies states:

100 out of 10,000 women of a particular age group who participate in routine screening have breast cancer. 80 out of 100 women who participate in routine screening and have breast cancer will have a positive mammography. 950 out of 9,900 women who participate in routine screening and have no breast cancer will also have a positive mammography. How many of the women who have participated in routine screening and received a positive mammography actually have breast cancer? (Gigerenzer and Hoffrage , 1995 )

In this case, the calculation required to correctly answer the problem is simpler, as it reduces to dividing the number of women who have breast cancer and tested positive (80) by the number of women who tested positive regardless of whether they actually have the disease (80 + 950).
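As a quick arithmetic check (a minimal sketch in Python; the variable names are ours), the probability-format calculation via Bayes' theorem and the frequency-format calculation give the same answer, roughly 7.8%:

```python
# Probability format: Bayes' theorem applied to the mammography problem.
base_rate = 0.01        # P(H): prior probability of breast cancer
sensitivity = 0.80      # P(E|H): probability of a positive mammography given cancer
false_positive = 0.096  # P(E|~H): probability of a positive mammography without cancer

ppv = (sensitivity * base_rate) / (
    sensitivity * base_rate + false_positive * (1 - base_rate)
)

# Frequency format: women with cancer and a positive test, divided by
# all women with a positive test.
ppv_freq = 80 / (80 + 950)

print(round(ppv, 3), round(ppv_freq, 3))  # both round to 0.078
```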

Previous research shows that the use of the frequency format, or graphs highlighting frequencies, boosts performance (Gigerenzer and Hoffrage, 1995 ; McDowell and Jacobs, 2017 ). Nevertheless, even when re-framing the problem using natural frequencies, evidence from more than 20 years of probabilistic reasoning research shows that about 76% of people still make incorrect estimates (McDowell and Jacobs, 2017 ). To date, it is still not clear why this is the case (Weber et al., 2018 ).

It is worth noting that, in this study, by “frequency format” we mean the numerical format describing a Bayesian problem where the data are presented using natural frequencies and the question asks the participant to state the frequency of events in the form of X out of Y. By “probability format” we mean the numerical format describing a Bayesian problem where the data are shown using probabilities (or percentages) and the question asks for a single-event probability. This clarification is needed as there are hybrid possibilities: the question in a problem framed using natural frequencies can be asked as a single-event probability. In this situation, the advantage of using natural frequencies appears to be diminished (Cosmides and Tooby, 1996 ; Tubau et al., 2018 ).

As shown in the above calculation, the frequency format is less computationally demanding than the probability format. According to the proponents of the ecological rationality framework, this is the main, albeit not the only reason why people reason better with frequencies. The frequency format is also argued to be more congruent with the way people acquire information in the wild (Gigerenzer and Hoffrage, 1995 , 2007 ; McDowell and Jacobs, 2017 ). A strict interpretation of this framework assumes that frequencies are better processed by the human mind, as this way of representing uncertainty might be the ideal input for a cognitive mechanism specifically evolved through human phylogenesis to deal with frequencies, a position which has been challenged by some (Sirota and Juanchich, 2011 ; Lesage et al., 2013 ; Gigerenzer, 2015 ; Hoffrage et al., 2015 ; Sirota et al., 2015 ; McDowell and Jacobs, 2017 ).

A second perspective, the nested-set hypothesis, states that the frequency format, and related visual aids, are effective because they clearly expose relationships between sets that are not apparent when the problem is described using the probability version of the textual format (McDowell and Jacobs, 2017 ). According to this theory, it is less the case that the format taps into a specially evolved cognitive module, but rather that it better supports domain-general human cognition via a clearer problem presentation (Cosmides and Tooby, 1996 , 2008 ; Sirota et al., 2015 ). This latter view has been supported in a number of studies (Sirota and Juanchich, 2011 ; Lesage et al., 2013 ; Sirota et al., 2015 ).

Some researchers hold the view that the ecological rationality framework and the nested-set hypothesis diverge in their explanation of how humans perform probabilistic reasoning; others disagree that the theories are dichotomous, stating that both explanations converge on the conclusion that the format provides an information structure that simplifies computations (McDowell and Jacobs, 2017 ). Furthermore, it is worth noting that the theorists who developed the ecological rationality framework stated in their research that natural frequencies simplify the calculation because they provide a clearer structure of the problem. Thus, although they did not call this the nested-set hypothesis, it appears clear that they were referring to the same concept (Hoffrage et al., 2002 ; Gigerenzer and Hoffrage, 2007 ). Although it can be argued that the two theories are in reality one, the cognitive process by which this facilitative effect is achieved is still under investigation. The present lack of consensus, and the heterogeneity found in the results of previous studies, suggest that the cognitive mechanisms underpinning how people approach probabilistic reasoning problems are still not fully understood (McDowell and Jacobs, 2017 ).

1.2. The Role of the Data Acquisition Process

The format in which information is displayed is not the only factor affecting probabilistic reasoning. Previous research suggests that the way in which people take on board information and learn probabilities – termed the data acquisition process – can also affect reasoning (Hoffrage et al., 2015 ; Traczyk et al., 2019 ).

Research in probabilistic reasoning is historically divided into two different families of tasks: in one case probabilities are derived from sequential experimentation; in the other probabilities are fully stated in a single instance (Hoffrage et al., 2015 ). Data acquisition is thus accomplished either by obtaining information through sequential experimentation, enabling a reconstruction of the likelihood, i.e., P(E ∣ H), as described in the “bags-and-chips” problem below, or by receiving an explicit statement of the likelihood and the false positive rate values, as found in the mammography problem described earlier. Early research in probabilistic reasoning was pioneered by Edwards ( 1968 ), who conducted several studies using the famous “bags-and-chips” problem (Phillips and Edwards, 1966 ; Edwards, 1968 , 1982 ; Slovic and Lichtenstein, 1971 ). In this problem, participants are told that there are two bags filled with poker chips. One bag has 70 red chips and 30 blue chips, while the other bag has 30 red chips and 70 blue chips. Participants do not know which bag is which. The experimenter flips a coin to choose one of the bags, and then begins to randomly sample chips from the chosen bag, with replacement. Thus, before drawing any chip, each bag is equally likely to be chosen (i.e., p = 0.5). At the end of the sampling process, participants are left with a sequence of chips drawn from the bag, e.g., six red and four blue chips. Participants are then asked to estimate the probability that the predominantly red bag is the one being sampled. Applying Bayes' theorem to a situation where six red and four blue chips are sampled, the probability that the predominantly red bag is the one being sampled is 0.85. 
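The 0.85 estimate can be reproduced with a short Bayesian update over the two hypotheses (a minimal sketch; the variable and function names are ours):

```python
# Two equally likely bags, sampled with replacement.
p_red = {"mostly_red": 0.7, "mostly_blue": 0.3}  # P(red chip | bag)
prior = 0.5  # each bag equally likely before any chip is drawn

def posterior_mostly_red(n_red, n_blue):
    """P(mostly-red bag | observed sample), via Bayes' theorem."""
    like = {bag: p ** n_red * (1 - p) ** n_blue for bag, p in p_red.items()}
    num = like["mostly_red"] * prior
    return num / (num + like["mostly_blue"] * prior)

# Six red and four blue chips drawn: posterior is exactly 49/58,
# approximately 0.845 (reported as 0.85 above).
print(posterior_mostly_red(6, 4))
```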
Several experiments using this task show that participants' estimates tend to be very close to correct, but are slightly conservative (i.e., participants have the tendency to slightly underestimate the probability that the bag chosen is the predominantly red bag) (Phillips and Edwards, 1966 ; Edwards, 1968 , 1982 ; Slovic and Lichtenstein, 1971 ). Edwards and colleagues concluded that people reason in accordance with Bayes' rule, but they are “conservative Bayesians,” as they do not update their prior beliefs in light of new evidence as strongly as Bayes' rule prescribes (Phillips and Edwards, 1966 ; Edwards, 1968 ).

The key difference between the mammography problem and the bags-and-chips problem is that in the former, the likelihood and the false positive rate values are explicitly stated in the description of the problem; conversely, in the latter, participants have to update their beliefs sequentially, upon the acquisition of new information – i.e., the information acquisition process is staged, and subjects learn about each case serially through lived experience (Edwards, 1968 ; Mandel, 2014 ). Thus, the method used by Edwards for testing probabilistic reasoning is conceptually very different to that used in more recent research where different versions of the mammography problem have been employed. The results from previous research show that the outcomes produced by these two classes of experiments, in terms of participants' performance, are also different. In the bags-and-chips problem people's estimates, albeit conservative, tend to be fairly accurate. Conversely, the results from research using descriptive tasks (e.g., the mammography problem) have shown that people perform poorly at probabilistic reasoning and tend to greatly overestimate risk (Eddy, 1982 ; Gigerenzer and Hoffrage, 1995 ). A clear distinction can thus be made between the probability learning paradigm , which uses tasks in which people learn probabilities through a direct (lived) experience with the sampling process (i.e., the data acquisition process involves continuously updating beliefs over time in light of new evidence) and the textbook paradigm in which the probabilities are fully stated in a text or in a graph (i.e., the data acquisition process is indirect, and the temporal component is missing) (Hoffrage et al., 2015 ). This distinction draws a parallel with some literature in the field of decision making which highlighted a difference between decisions derived from experience and decisions from descriptions (Hertwig et al., 2004 ).

1.3. How Does Data Acquisition Affect Cognition?

The probability learning paradigm employs tasks where people are given the opportunity to learn probabilities from a sequence of events, and are subsequently tested as to whether they make judgments consistent with Bayes' rule. In such tasks, performance tends to be accurate. The superior performance observed in the probability learning paradigm is hypothesized to be due to the fact that in these situations people may use unconscious, less computationally demanding (evolutionary purposeful) mental processes (Gigerenzer, 2015 ).

The textbook paradigm employs tasks where probabilities are numerically stated, in either a textual description or a graphical representation of the problem. People perform poorly in these tasks, particularly when the information is provided in probabilities. This effect may be due to a heavy reliance on consciously analytical (biologically secondary) mental processes that require much greater cognitive effort (Gigerenzer, 2015 ).

It thus appears that direct experience with uncertainty (typical of those tasks found in the probability learning paradigm) taps into statistical intuition. Conversely, descriptions that are merely abstractions of reality are not able to fully substitute for an individual's direct experience with the environment and may require (explicit) analytic thinking (Hertwig et al., 2018 ).

Although experience and description are different ways of learning about uncertainty, they can be complementary. Description learning may be useful when we do not have the opportunity to directly experience reality, as may be the case when events are rare, samples are small, or when the causal structure of experience is too complex (Hertwig et al., 2018 ). Learning on the basis of a description may also be perceived as an experiential episode, if the format of the description is able to trigger an experience-like learning process. For example, presenting a textbook problem, such as the mammography problem, in terms of natural frequencies rather than conditional probabilities, may make this task (at the perceptual level) closer to learning from experience. This would occur if frequencies from natural sampling are seen as abstractions representing the process of sequentially observing one event after the other in the real world (Hoffrage et al., 2015 ). If this is the case, the manipulation of the information format would affect the perception of the data acquisition process. This may be the reason why the proportion of people who reason in accordance with Bayes' rule rises substantially when the information is presented using natural frequencies (Gigerenzer, 2015 ; Hertwig et al., 2018 ). Nevertheless, it may also be that frequency formats are effective merely due to their ability to highlight hidden relationships (i.e., this would enable the formation of clearer mental representations of the problem) or the fact that computing the solution when the problem is framed using the frequency format is much simpler than computing the solution when the problem is framed using the probability format due to the reduced number of algebraic calculations in the former (Sirota and Juanchich, 2011 ; Lesage et al., 2013 ; Sirota et al., 2015 ).

1.4. The Role of Task Familiarity and Personal Beliefs

In probabilistic reasoning research using the textbook paradigm, people appear to be more accurate when reasoning about familiar tasks (everyday problems) than unfamiliar tasks (e.g., diagnostic medical testing) (Binder et al., 2015 ). There is also evidence that the degree of belief a participant has about the probability of an event affects his or her performance (Cohen et al., 2017 ). This latter stream of research collected people's opinions, via surveys, about the uncertainty surrounding certain stochastic events – i.e., whether the probabilities used in problems are believable or not – and subsequently tested participants on these, to show that accuracy improves when the probabilities are rated as more believable.

A person's beliefs might be formed as a result of indirect experience (e.g., a friend's story, anecdotes, news, social media, discussion forums, etc.) or from lived experience, through direct exposure to the uncertainty surrounding an event, perhaps reinforced over time (e.g., a physician dealing with mammography tests daily). Thus, the quality of one's beliefs can be the result of the way the information is acquired (i.e., the data acquisition process) in such problems. This draws a parallel with the distinction made between reasoning from description and reasoning from experience in previous studies (Hoffrage et al., 2015 ). According to this line of argument, if the data in a reasoning task match beliefs emerging from lived (direct) experience of the uncertainty related to the stochastic event, people may perform better than they would if the data are simply generally plausible, and this may hold regardless of the format in which uncertainty is encoded.

1.5. Rationale and Research Hypotheses

In this study, we investigate the effect of lived experience on reasoning accuracy. Previous research has shown that people are more accurate in their reasoning when presented with believable data, as determined at a population level (Cohen et al., 2017 ). There is also evidence from the experiential learning paradigm that direct experience with the data facilitates reasoning (Edwards, 1968 ). Indeed, some research has shown that the way in which people gather information about uncertainty affects reasoning (Hoffrage et al., 2015 ; Traczyk et al., 2019 ). We thus hypothesize (H1) that people are more likely to reason accurately when the data presented in a reasoning problem directly match their self-reported experience of the probability of an event, than when the data are believable, but do not match their experience. This is because experience-matched data may tap into those unconscious processes typically involved in experiential learning (Gigerenzer, 2015 ).

The second hypothesis (H2) tests whether the frequency format is superior to the probability format only because it resembles the process of learning from experience. The ecological rationality framework states that people reason more accurately when using the frequency format because it induces experiential learning at the perceptual level. However, when the data are derived from people's lived experience, an experiential learning process has already taken place. At this point, the facilitative effect of the frequency format might be redundant. We therefore hypothesize that when data match experience, there will be no facilitative effect of presenting the problem in the frequency format, but when data do not directly match experience, this effect will be present.

Previous research using interaction analysis to study probabilistic reasoning has found patterns in people's observable behavior to be linked to certain reasoning strategies (Khan et al., 2015 ; Reani et al., 2018b , 2019 ). To date, this work has focused primarily on eye tracking analysis, which may not provide a comprehensive picture of an individual's reasoning process. For instance, people may fixate on certain locations not because they consciously intend to acquire the information contained in those locations, but because the physical properties of these (e.g., color, shape, etc.) attract visual attention.

In this study we therefore seek to shed further light on the reasoning processes with an online method that uses mouse-event analysis to study human cognition. In an interactive web application, the user has to hover the mouse cursor over the nodes in a tree diagram to uncover hidden information. When the mouse moves away, the information is hidden again, so it is clear when the user is accessing the data. As the relevant information is obscured by buttons, and participants must explicitly hover over the button to reveal the data underneath, it is possible to obtain a direct link between cursor behavior and cognition. Mouse events are then analyzed using a transition comparison method previously applied to eye tracking data (Reani et al., 2018b , 2019 ; Schulte-Mecklenbeck et al., 2019 ). We hypothesize (H3) that if probability reasoning and frequency reasoning invoke different cognitive processes, mouse movement will differ according to the format in which the information is encoded.
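The core step of such a transition analysis, reducing a participant's stream of hover events over named nodes to a first-order transition matrix whose rows can then be compared across conditions, can be sketched as follows. The node names and the event stream below are hypothetical, a minimal illustration rather than the study's actual pipeline:

```python
from collections import defaultdict

# Hypothetical sequence of tree-diagram nodes hovered by one participant.
hovers = ["fire", "fire_alarm", "no_fire", "no_fire_alarm", "fire", "fire_alarm"]

def transition_matrix(seq):
    """Count first-order transitions, then normalize each row to probabilities."""
    counts = defaultdict(lambda: defaultdict(int))
    for src, dst in zip(seq, seq[1:]):
        counts[src][dst] += 1
    return {
        src: {dst: n / sum(dsts.values()) for dst, n in dsts.items()}
        for src, dsts in counts.items()
    }

matrix = transition_matrix(hovers)
# In this toy stream, every hover on "fire" is followed by "fire_alarm",
# so matrix["fire"]["fire_alarm"] is 1.0.
```

Each row gives, for a given node, the probability of moving next to each other node; it is row vectors of this kind that a transition comparison contrasts between format conditions.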

In the mammography problem (Eddy, 1982 ), the jargon and the problem context may be unfamiliar to most people and, consequently, participants may not fully understand what the results of a diagnostic test actually represent in terms of risk. Previous research has shown that people are better at solving problems which are familiar to them from everyday experience (Binder et al., 2015 ). People are seldom exposed to diagnostic tests in everyday life, unless they are medical professionals. Thus, the general public may not be able to make full use of their previous experience to evaluate uncertainty about an event, if their experience regarding this event is limited.

As a result, in this study, a “fire-and-alarm” scenario was used as a situation that is meaningful to most people (see the Supplementary Materials for the full textual description of the problem). In this context, by analogy with the mammography problem, the diagnostic test is the fire alarm, which can sound or not sound, and the disease is the fire which can be present or absent. It is very likely that participants have been exposed to at least some situations in which they have heard a fire alarm, for instance in a school or a workplace. This scenario is thus presumed to be more familiar to people than scenarios describing medical diagnostic tests, and uses simpler terminology. However, although the context of the problem is different, the information provided in the fire-and-alarm scenario is similar to the information provided in the original mammography problem, i.e., they both include the base rate (here, the probability of being in the presence of fire in a random school on a random day of the year), the true positive rate (the probability of hearing a fire alarm given that there is a real fire in the school) and the false alarm rate (the probability of hearing a fire alarm given that there is not a fire in the school).

The problem was presented using a tree diagram (see Figure 1 ). We chose to use a graph because this clearly separates the data of the problem in space and, consequently, can be easily used to study interaction events. Bayesian problems of this kind are known to be hard to solve (Eddy, 1982 ), and previous research in probabilistic reasoning has used trees extensively as a clear and familiar way to display probabilistic problems (Binder et al., 2015 ; Hoffrage et al., 2015 ; Reani et al., 2018a ). Some studies have shown that performance in probabilistic reasoning tasks improves when these are presented using tree diagrams containing natural frequencies, but not when these diagrams display probabilities (Binder et al., 2015 , 2018 ). A graph can be presented alone or in conjunction with a textual description of the problem. As previous work has demonstrated that adding a textual description to a graph which already displays all the data is unnecessary and does not improve participants' performance (Sweller, 2003 ; Mayer, 2005 ; Micallef et al., 2012 ; Böcherer-Linder and Eichler, 2017 ; Binder et al., 2018 ), in the present research we use a tree diagram without a description of the problem. We compare frequency trees with probability trees to test our hypothesis (H2) that the manipulation of the information format does not have an effect on performance in a descriptive task which is perceived to be like an experiential learning task (details below).

Figure 1. Problem shown using a tree diagram with the probability format, where the information is hidden behind the buttons, and hovering the mouse cursor over a button reveals the information underneath.

Before presenting the problem, participants were given some contextual information (provided in the Supplementary Materials ) which described several plausible situations that they were likely to have encountered; for instance situations in which there was a fire in a school but the fire alarm did not sound, perhaps because it was faulty, or situations in which one could hear a fire alarm but there was no fire, for instance, because someone was smoking in the bathroom. This type of contextual information is similar to the information given in the narratives used in previous experiments to reduce the artificiality of the experimental setting and improve the clarity of the problem (Ottley et al., 2016 ). In this case, it was also used to better relate the problem to participants' previous experience.

To investigate the effect of the data acquisition process on people's reasoning about uncertainty, two separate but comparable online studies were conducted. The data from the two studies are evaluated within the same analysis (using a between-subjects approach), as the only difference between them was the way in which the information provided in the graph was generated (the variable DGM—Data Generating Mode).

In both studies, participants were asked in a preliminary survey to provide estimates, based on self-reported experience, of the probability of fire in a given school on a random day of the year (the base rate information), the probability of hearing a fire alarm given that there was a real fire (the true positive rate), and the probability of hearing a fire alarm given that there was not a real fire (the false alarm rate). In both studies, participants were asked to provide these quantities either in the form of frequencies (e.g., 2 out of 50) for the first condition, or in the form of percentages (e.g., 4%) for the second condition.

2.1. Study 1

In the first study, participants were shown a tree diagram displaying information derived from the values provided by the participants themselves (i.e., their self-reported experience with regard to the base rate, true positive rate and false alarm rate). To achieve this, the inputs provided by the participants during the preliminary survey (see Supplementary Materials ) were stored in the Web application database, and then utilized to construct the tree that was displayed in the second phase of the task.

The study used a between-subjects design with one factor, Information Format, with two levels (frequency vs. probability). Participants were asked to provide the three quantities in the form of either natural frequencies (for the frequency format condition) or percentages (for the probability format condition), and the problem was subsequently framed using natural frequencies or percentages, respectively. The inputs provided by the participants were scaled to the problem such that the total population was 1,000 events for the frequency format and 100% for the probability format. For instance, if a participant in the frequency format condition stated that the chance of being in the presence of fire in a random school on a random day of the year was 1 out of 5, this was shown on the tree diagram as 200 events with fire out of 1,000 total events; if they stated that the probability of hearing a fire alarm in the case of fire was 9 out of 10, then the graph showed 180 events with fire and alarm out of the 200 events with fire. It is worth noting that an inherent property of probability/percentage trees is that the values on the graph are normalized at each branch, i.e., the total number of events is reset to 100% at each node (or to 1 in the case of probabilities). This contrasts with frequency trees, in which the values are derived from a natural sampling process, i.e., each node splits the number of events left from the preceding split.
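The two normalization schemes can be illustrated with a short sketch. The function names and the rounding-to-whole-events step are our own illustration, not the study's actual application code:

```python
def frequency_tree(base_rate, tpr, far, total=1000):
    """Frequency tree via natural sampling: each node splits the
    events remaining from the preceding split."""
    fire = round(total * base_rate)          # e.g., 1 out of 5 -> 200
    no_fire = total - fire
    fire_alarm = round(fire * tpr)           # e.g., 9 out of 10 -> 180
    no_fire_alarm = round(no_fire * far)
    return {"T": total, "F": fire, "nF": no_fire,
            "FA": fire_alarm, "FnA": fire - fire_alarm,
            "nFA": no_fire_alarm, "nFnA": no_fire - no_fire_alarm}


def probability_tree(base_rate, tpr, far):
    """Percentage tree: values are renormalized to 100% at each
    branch, so each node shows a conditional rate."""
    return {"T": 100.0,
            "F": base_rate * 100, "nF": (1 - base_rate) * 100,
            "FA": tpr * 100, "FnA": (1 - tpr) * 100,
            "nFA": far * 100, "nFnA": (1 - far) * 100}
```

For the example in the text, `frequency_tree(0.2, 0.9, far)` yields 200 fire events out of 1,000, of which 180 have an alarm, whereas the percentage tree shows 90% at the same node.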

The question below the graph asked participants to compute the probability of fire given that the fire alarm was sounding (i.e., the positive predictive value, or PPV). Participants were explicitly asked to calculate the PPV based on the data shown in the graph.

It is worth noting that, in the initial survey, participants were not asked to provide the PPV. This question was asked after the survey, during the experimental task that presented data derived from their responses. Thus, participants could not just rely on memory. They still needed to reason to understand the data, the relationships between different pieces of information and what the question was asking them to calculate.
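For reference, the value participants were asked to compute follows directly from Bayes' theorem. A minimal sketch (illustrative, not the authors' code):

```python
def positive_predictive_value(base_rate, tpr, far):
    """P(fire | alarm): events with fire and alarm divided by all
    events with an alarm (Bayes' theorem)."""
    p_alarm = base_rate * tpr + (1 - base_rate) * far
    return (base_rate * tpr) / p_alarm
```

With the median values reported later for study 1 (base rate 0.1, true positive rate 0.5, false alarm rate 0.27), this yields a PPV of approximately 0.17.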

2.2. Study 2

The second study used a different Data Generating Mode. Instead of showing data derived from the participants' personal experience, we displayed fixed values, which were the median of the base rate, true positive rate and false alarm rate values calculated from all the responses given in the first study. As such, they were plausible probabilities, but did not necessarily match people's actual experience with the situation presented in the problem. These values were still collected in the preliminary survey in study 2, in order to calculate the extent to which the difference between participants' reported experience and the average values they were presented with affected performance. Study 2 used a between-subjects design with one factor, Information Format, with two levels (frequency vs. probability).

2.3. Participants

The participants were “workers” recruited from Amazon Mechanical Turk (MTurk) 1 , who took part in the study for monetary compensation (Behrend et al., 2011 ; Mason and Suri, 2012 ). There were 300 participants in study 1 (150 in each condition) and 300 participants in study 2 (150 in each condition). We eliminated from the analysis those participants who did not disable any active ad-blocker in their web browser before starting the experiment (an action that was explicitly requested on the instructions page), as the ad-blocker may have interfered with the data collection tool. We also eliminated all participants who answered the problem without looking at the question at least once; this could be detected from the interaction data, as participants were required to hover over a button to see the question. Finally, we eliminated from the data set all participants who did not look at (by hovering over) at least two pieces of information, excluding the question, as this behavior was assumed to indicate a lack of effort: to answer the question one needs to extract at least two pieces of information from the graph, and this was explicitly stated on the problem description page. After eliminating invalid participants based on the above criteria, we were left with 156 participants in study 1 (age range 18–71, 66 males and 90 females) and 186 participants in study 2 (age range 18–68, 65 males and 121 females). The distribution of age and gender of participants across conditions can be found in Table 1 . A meta-analysis reviewing 20 years of research on probabilistic reasoning shows that participants with greater educational or professional experience are no better than laypeople at solving probabilistic reasoning problems (McDowell and Jacobs, 2017 ). However, some research highlights links between probabilistic reasoning ability and people's numeracy (Brase and Hill, 2017 ).
Thus, before starting the task, participants were asked to complete the Subjective Numeracy Scale, which is a widely used standardized questionnaire for assessing people's numeracy (Fagerlin et al., 2007 ). This was used to control for potential confounders stemming from individual differences in mathematical abilities.

Table 1. Biographical data and descriptive statistics.

The values for the descriptive statistics are the means and the standard deviations (in brackets) .

2.4. Procedure and Stimuli

Both studies employed a crowdsourcing method that allocated Amazon Mechanical Turk's Workers to one of the two conditions—frequency format or probability format—counterbalancing the order of the allocation of the participants (Behrend et al., 2011 ; Mason and Suri, 2012 ). Those workers who self-enrolled to take part in the study were redirected to our web application, which was hosted on a university server. The application was built in JavaScript and Python and is available on GitHub 2 . The application was specifically designed to display the problem, collect participants' responses, and integrate with another application which was used to track participants' mouse events for the duration of the task (Apaolaza et al., 2013 ). This is also available on GitHub 3 .

At the beginning of the experiment, an instruction page provided participants with an explanation of the study. Participants were asked to give their consent by checking a box before starting the actual task. After that, demographic data including age and gender were collected, and participants performed the numeracy test. Contextual information was also provided regarding the fire-and-alarm problem, and what was expected from participants (see the Supplementary Materials ). Then, participants' estimates of the probability of the three quantities (i.e., base rate, true positive rate, and false alarm rate) were collected. Finally, the actual problem was presented using a tree diagram (see Figure 1 ), and participants were asked to provide an answer in the dedicated space below the graph, next to the question. After completing the task, participants were redirected to an end-page which provided an alphanumeric code that could be used to retrieve compensation through the Amazon platform.

Inconsistencies between the answers participants gave in the survey and in the actual task could have arisen during the study, due to typographical error, for example. Several checks were thus hard-coded into the web application. For instance, if the numerator was greater than the denominator, the software generated a pop-up window with an error stating that the numerator could not be larger than the denominator.
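A check of this kind could be sketched as follows (hypothetical; the actual application implemented its checks as JavaScript pop-ups):

```python
def validate_frequency_input(numerator, denominator):
    """Return an error message for an inconsistent frequency input,
    or None if the input passes the checks."""
    if denominator <= 0:
        return "The denominator must be a positive number."
    if numerator < 0:
        return "The numerator cannot be negative."
    if numerator > denominator:
        return "The numerator cannot be larger than the denominator."
    return None
```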

On the task page, the data were hidden below buttons placed on the tree diagram. The buttons had labels describing the data they concealed (e.g., the button labeled “Fire” covered the number of events with fire). The text describing the question was also hidden behind a button (see Figure 1 ). To access the concealed data or text, participants had to hover over the relevant button with their mouse. The information was hidden again when they moved away. This interaction technique was used to determine which pieces of information participants thought were relevant, and the order in which they decided to gather these pieces of information.

The advantage of using (explicit) mouse tracking over eye tracking is that the latter method can capture patterns that are not directly linked to human reasoning, but rather emerge in a bottom-up fashion due, for example, to visual properties of the stimulus (Hornof and Halverson, 2002 ; Holmqvist et al., 2011 ; Kok and Jarodzka, 2017 ). Similarly, continuous mouse movements may not accurately indicate a user's focus of attention during tasks (Guo and Agichtein, 2010 ; Huang et al., 2012 ; Liebling and Dumais, 2014 ). Studying mouse movements that explicitly uncover information hidden behind buttons means that the events used in the analysis are much closer to conscious cognition.

3. Analysis

Two metrics were used to measure participants' performance. The first, Correctness, was a binary variable (correct/incorrect) indicating whether the participant's answer matched the correct answer. For this we applied the widely used strict rounding criterion proposed by Gigerenzer and Hoffrage, whereby only answers matching the true value rounded up or down to the next full percentage point were considered correct (Gigerenzer and Hoffrage, 1995 ).

The second variable, Log-Relative-Error, was a continuous variable measuring how far a participant's answer deviated from the correct answer. This is the result of the function log10(Pe/Pt), where Pe is the Estimated Posterior (the given answer) and Pt is the True Posterior, i.e., the answer obtained by applying Bayes' theorem to the data provided on the graph (Micallef et al., 2012 ; Reani et al., 2018a ). Thus, the variable Log-Relative-Error is the log-transformed ratio between the Estimated Posterior and the True Posterior, and indicates an overestimation, if positive, or an underestimation, if negative, of the probability of being in the presence of a fire given that the fire alarm was sounding, with respect to the true probability of such an event. Correct answers result in a value of zero. The full data and the script used for the analysis are available on GitHub 4 .
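The two performance metrics can be expressed compactly. This sketch is illustrative; the same log-ratio form is later used for Log-Experience-Deviation, with the Subjective Posterior in place of the estimate:

```python
import math

def is_correct(estimate, true_posterior):
    """Strict rounding criterion: the estimate (a proportion) counts
    as correct only if it matches the true value rounded up or down
    to the next full percentage point."""
    est_pct = round(estimate * 100)
    true_pct = true_posterior * 100
    return est_pct in (math.floor(true_pct), math.ceil(true_pct))

def log_relative_error(estimate, true_posterior):
    """log10(Pe / Pt): positive for overestimates, negative for
    underestimates, zero for exact answers."""
    return math.log10(estimate / true_posterior)
```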

3.1. Performance Analysis

In a logistic regression analysis, Correctness served as the response variable, and Information Format, DGM and Numeracy as the predictors. This was fitted to the aggregated data from both studies.

In a linear regression analysis of the data from study 2, Log-Relative-Error served as the response variable and Information Format and Log-Experience-Deviation as the two predictors. Log-Experience-Deviation is the result of the function log10(Ps/Pt), where Ps is the Subjective Posterior, i.e., the a priori estimate of the risk of fire in the case of an alarm, before seeing the actual data. This value was calculated using the estimates of the base rate, true positive rate and false alarm rate collected during the initial survey. Pt is the True Posterior, as generated using the actual data on the graph.

The value of Log-Experience-Deviation therefore indicates whether a person overestimates, if positive, or underestimates, if negative, the probability of fire in the case of an alarm (i.e., the posterior), in comparison with the real estimate derived using the data presented in the task. A value of zero would result if a participant's estimate based on self-reported lived experience exactly matched the PPV calculated using the aggregated values from study 1.

3.2. Mouse Event Analysis

To access an item of information, participants had to hover over the relevant button with the mouse. We assigned a meaningful code to each of these locations, as defined in Table 2 . T represents the button covering the total number of events, F is the button covering the events with fire, nF is the button covering the events with no fire, FA is the button covering the events with fire and alarm, FnA is the button covering the events with fire and no alarm, nFA is the button covering the events with no fire and alarm, nFnA is the button covering the events with no fire and no alarm, and Q is the button covering the question.

Table 2. Coding scheme for the locations (i.e., buttons) on the diagram.

Mouse event data were analyzed firstly by considering the proportion of time (as a percentage) spent viewing each location with respect to the total (aggregated) time spent viewing all locations, for each condition. To understand whether the order in which people viewed locations differed between groups, a transition analysis was conducted (Reani et al., 2018a , b ). We were interested in determining which locations participants thought were important, and the order in which they accessed them before answering the question. We focused our investigation on bi-grams, calculating for each location the probability that a participant would access each of the other locations next (Reani et al., 2018b , 2019 ).
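Extracting bi-gram transitions from a participant's sequence of hovered locations could look like the following sketch (illustrative; collapsing repeated hovers on the same button is our assumption):

```python
from collections import Counter

# The eight location codes from Table 2
LOCATIONS = ["T", "F", "nF", "FA", "FnA", "nFA", "nFnA", "Q"]

def bigram_counts(hover_sequence):
    """Count transitions between consecutive hovered locations;
    repeated hovers on the same button are collapsed."""
    counts = Counter()
    for a, b in zip(hover_sequence, hover_sequence[1:]):
        if a != b:
            counts[(a, b)] += 1
    return counts
```

Pooling the counts within a group and normalizing by the group total then gives that group's distribution over the 56 possible ordered pairs of distinct locations.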

The locations thus define a sample space Ω of eight locations in total, Ω = { T , F , nF , FA , FnA , nFA , nFnA , Q }, from which we derived all ordered pairs of distinct locations to form the list of possible transitions between any two buttons, L = 8 × 7 = 56. Once the list of transitions was generated, the frequency counts of these were extracted from the interaction data collected for each participant. These values were then normalized by the group total to obtain two frequency distributions of transitions (one for each condition). We then calculated the Hellinger distance between these two distributions as an indicator of the difference in mouse behavior between the frequency format group and the probability format group. A permutation test, which compared the difference between the experimental groups with the differences between groups created at random, 10,000 times, was used to determine whether the difference in mouse movement between groups was due to chance, or to the manipulation of the variable of interest (Information Format).
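The distance measure and the test can be sketched as follows. One assumption to note: this sketch shuffles group labels without replacement, a standard permutation scheme, whereas the study describes sampling random groups with replacement:

```python
import math
import random

def hellinger(p, q):
    """Hellinger distance between two discrete distributions given
    as equal-length sequences of probabilities."""
    return math.sqrt(0.5 * sum((math.sqrt(a) - math.sqrt(b)) ** 2
                               for a, b in zip(p, q)))

def _group_distribution(group):
    """Pool per-participant transition count vectors and normalize."""
    totals = [sum(col) for col in zip(*group)]
    grand = sum(totals)
    return [t / grand for t in totals]

def permutation_test(group_a, group_b, n_perm=10_000, seed=0):
    """Return the observed Hellinger distance between two groups and
    the proportion of label shuffles with a distance at least as
    large (the p-value)."""
    rng = random.Random(seed)
    observed = hellinger(_group_distribution(group_a),
                         _group_distribution(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        d = hellinger(_group_distribution(pooled[:n_a]),
                      _group_distribution(pooled[n_a:]))
        if d >= observed:
            hits += 1
    return observed, hits / n_perm
```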

Finally, we identified which transitions were the most discriminative, i.e., the transitions that differed most, in terms of relative frequency, between the frequency and probability conditions. Two parameters were taken into account to assess whether a transition was a meaningful discriminator. The first is the transition odds-ratio, calculated as OR = (p/(1 − p)) ÷ (q/(1 − q)), where p and q are the relative frequencies of the transition in the frequency and probability conditions respectively. The odds-ratio, in this context, is a measure testing the relationship between two variables (Information Format and mouse behavior), and its 95% confidence interval provides a measure of the significance of this relationship. Further details about this method can be found in Reani et al. ( 2018b , 2019 ). An odds-ratio of one indicates that the transition is found in both conditions with the same relative frequency; thus, the further the odds-ratio is from one, the more discriminative the transition. The second parameter is the maximum frequency F = max(xi, yi), i.e., the larger of the transition's frequencies in the frequency condition and the probability condition (Reani et al., 2018b , 2019 ). A discriminative transition should also have a large F , as transitions that occur only a few times are not representative of the strategies used by the majority of people.
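The two parameters for a single transition can be computed as follows. This is a sketch: the confidence interval uses the standard log-odds (Woolf) approximation, which is our assumption, as the paper does not specify the interval method:

```python
import math

def transition_odds_ratio(k1, n1, k2, n2):
    """Odds ratio for a transition occurring k1 times out of n1
    transitions in the frequency group vs k2 out of n2 in the
    probability group, with a Woolf-style 95% confidence interval."""
    p, q = k1 / n1, k2 / n2
    odds_ratio = (p / (1 - p)) / (q / (1 - q))
    se = math.sqrt(1 / k1 + 1 / (n1 - k1) + 1 / k2 + 1 / (n2 - k2))
    ci = (math.exp(math.log(odds_ratio) - 1.96 * se),
          math.exp(math.log(odds_ratio) + 1.96 * se))
    return odds_ratio, ci
```

A transition would then be flagged as discriminative when the interval excludes one and its maximum frequency F = max(k1, k2) is reasonably large.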

We compared participants' mouse behavior between the two formats (frequency vs. probability) in both study 1 and study 2. For study 1 only, we also compared the mouse behavior of correct and incorrect respondents, for both conditions (frequency and probability format) separately, to determine whether participants who answered correctly exhibited different mouse behavior from participants who answered incorrectly. This is because the number of correct responses was large enough to support a meaningful comparison only in study 1. In this latter analysis, the odds-ratio scale is a measure of the relationship between Correctness and mouse behavior.

The results are reported separately for each study and for each condition (probability vs. frequency), for the variables Correctness and Log-Relative-Error. When reporting results for the variable Numeracy, we aggregated the data of both studies. The results for the variable Log-Experience-Deviation are reported for study 2 only.

4.1. Performance Analysis Results

In the experience-matched data mode (study 1), 39% of the participants presented with the frequency format answered correctly, compared with only 14% of those presented with the probability format. In the experience-mismatched data mode (study 2), 9% of the participants answered correctly with the frequency format, and only 2% with the probability format. Thus, more people answered correctly with the frequency format, regardless of the Data Generating Mode, and more people answered correctly with the experience-matched data mode, regardless of the Information Format.

The descriptive statistics for the variable Numeracy are reported for Correctness and Information Format separately, aggregating the data from both studies. For incorrect respondents in the frequency condition, the Numeracy median was Mdn = 3.88 (IQR = 0.88), for incorrect respondents in the probability condition, Mdn = 3.89 (IQR = 1.01), for correct respondents in the frequency condition, Mdn = 4.19 (IQR = 0.69), and for correct respondents in the probability condition, Mdn = 4.17 (IQR = 0.75). From these results, it appears that correct respondents were, on average, slightly more numerate than incorrect respondents. The full descriptive statistics are reported in Table 1 .

A logistic regression analysis, with Correctness as the response variable and Information Format, DGM and Numeracy as predictors, shows that Information Format was a strong predictor of Correctness (odds ratio OR = 0.23, 95% Confidence Interval CI [0.12, 0.47]), indicating that the odds of answering correctly in the frequency format were approximately four times the odds of answering correctly in the probability format. The model also shows that Data Generating Mode was a strong predictor of Correctness (OR = 0.14, 95% CI [0.07, 0.29]), indicating that the odds of answering correctly in the experience-matched data mode were approximately seven times the odds of answering correctly in the experience-mismatched data mode. Numeracy was not a strong predictor of Correctness (OR = 1.65, 95% CI [0.82, 3.32]).

As reported by Weber and colleagues, performance in Bayesian reasoning tasks seems to improve when no false negatives are present in the problem description; i.e., when the hit rate is 100% (Weber et al., 2018 ). If some of the participants, in study 1, were presented with a problem with no false negatives, this could potentially have influenced the results of the regression analysis. In study 1, there were only seven participants who were presented with a problem with a hit rate of 100%, and another two who were presented with a problem with a hit rate higher than 99%. To exclude potential confounders stemming from problems with a hit rate equal or close to 100%, we re-ran the analysis excluding these participants from the dataset. This did not significantly change the results (see Supplementary Materials ).

As the number of correct responses was limited, four additional 2x2 chi-squared tests were performed: two assessing the relationship between Information Format and Correctness (one for each study), and two assessing the relationship between DGM and Correctness (one for each Information Format). The p-values reported below are adjusted using the Bonferroni method for multiple comparisons. The first chi-squared test of independence revealed that, in study 1, Information Format was significantly associated with Correctness, χ 2 (1, N = 156) = 11.762, p = 0.002. Cramer's V determined that these variables shared 28% variance. The second test revealed that, in study 2, Information Format was marginally associated with Correctness, χ 2 (1, N = 186) = 3.617, p = 0.057. Cramer's V determined that these variables shared 14% variance. The third test revealed that, for the frequency format, DGM was significantly associated with Correctness, χ 2 (1, N = 170) = 19.427, p < 0.001. Cramer's V determined that these variables shared 33% variance. The fourth test revealed that, for the probability format, DGM was also significantly associated with Correctness, χ 2 (1, N = 172) = 7.11, p = 0.03. Cramer's V determined that these variables shared 20% variance.

The medians for the variable Log-Relative-Error were Mdn = 0.01 (IQR = 0.33) for the frequency format in study 1, Mdn = 0.30 (IQR = 0.68) for the probability format in study 1, Mdn = -0.23 (IQR = 0.90) for the frequency format in study 2 and Mdn = 0.46 (IQR = 0.53) for the probability format in study 2. It can be noted that the relative error was considerably larger for the probability format than the frequency format, and relatively larger in study 2 compared with study 1.

The medians for the variable Log-Experience-Deviation (in study 2 only) were Mdn = -0.42 (IQR = 1.92) for the frequency format and Mdn = 0.02 (IQR = 0.86) for the probability format. On average, the Subjective Posterior was considerably closer to the True Posterior for the probability format compared with the frequency format. Participants using the frequency format estimated that, on average, the probability of fire in the case of hearing a fire alarm was considerably smaller ( Mdn = 0.07) than the probability presented in the task ( Mdn = 0.17; this latter median is the value derived from the data collected in study 1; the other median values from study 1 were base rate = 0.1, true positive rate = 0.5, false alarm rate = 0.27). The median of the answers provided for the probability format was Mdn = 0.18, which is very close to the true value of 0.17. This indicates that, in study 2, participants' estimates in the probability format condition were similar to the estimates that participants in study 1 made about the risk of fire (see Figure 2 ).

Figure 2. Distribution of Log-Experience-Deviation for frequency (Left) and probability (Right); the vertical red dashed lines represent the medians.

In the second (linear) regression, conducted for study 2 only, we used Log-Relative-Error as the response variable and Information Format and Log-Experience-Deviation as the predictors. This model was fitted to the data from study 2 only because, in study 1, the data presented for calculating the correct answers were derived from participants' reported experience (collected in the initial survey); in study 1 Log-Experience-Deviation is therefore a constant with a value of zero. The results from the regression indicate a significant effect of both predictors on the response variable Log-Relative-Error: for Information Format (with frequency format as the reference class), Beta = 0.26, 95% CI [0.13, 0.39], and for Log-Experience-Deviation, Beta = 0.10, 95% CI [0.04, 0.16]. Thus, the probability format was associated with a 0.26 unit increase in Log-Relative-Error compared with the frequency format, indicating that the use of percentages produced a larger deviation in participants' estimates. The analysis also shows that a one unit increase in the deviation of the Subjective Posterior from the True Posterior was associated, on average, with a 0.10 unit increase in the relative error of the estimate.

This result suggests that the larger the deviation of the (a priori) Subjective Posterior (derived from participants' self-reported lived experience) from the True Posterior (derived from the problem data), the larger the deviation of the (a posteriori) Estimated Posterior (participants' answer) from the True Posterior. The bias in participants' responses was also in the direction of their beliefs, indicating a tendency for people to give an answer consistent with their personal experience rather than the data provided.

4.2. Interaction Analysis Results

The interaction analysis is divided into two parts. The first part analyzes the amount of time participants spent on different locations of interest, comparing the two conditions (frequency vs. probability format). The second part analyzes the order in which these locations were visited, looking for repetitive patterns within groups.

4.2.1. Dwell Time

The variable Dwell Time, measured as a percentage, is the amount of time spent viewing a location (hovering over a button) on the graph divided by the total time spent viewing all locations. This is reported in Table 3 , by condition (frequency vs. probability) and by study (study 1 vs. study 2). The table also reports d , the difference between the mean Dwell Time for the frequency format and the mean Dwell Time for the probability format, divided by the pooled standard deviation. In the text we report only the two largest d values for each study; for the full results, see Table 3 .
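The standardized difference d is the familiar Cohen's d with a pooled standard deviation; a sketch (the function name is ours):

```python
import math

def standardized_difference(mean_f, sd_f, n_f, mean_p, sd_p, n_p):
    """Difference between the frequency-format and probability-format
    mean Dwell Times, divided by the pooled standard deviation."""
    pooled_sd = math.sqrt(((n_f - 1) * sd_f ** 2 + (n_p - 1) * sd_p ** 2)
                          / (n_f + n_p - 2))
    return (mean_f - mean_p) / pooled_sd
```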

Table 3. Means ( M ) and standard deviations ( SD ) for Dwell Time (in percentages) for each location, for the frequency format (left) and probability format (right), and for study 1 (top) and study 2 (bottom).

The table also reports the standardized difference in means by condition (d) .

The largest relative difference in study 1 was found in location FnA (fire and no alarm), with participants in the probability condition ( M = 7%, SD = 6) spending a larger proportion of time, on average, viewing this location than participants in the frequency condition ( M = 5%, SD = 4). The second largest relative difference in study 1 was found in location FA (fire and alarm), where participants in the probability condition ( M = 12%, SD = 9) spent a larger proportion of time, on average, than participants in the frequency condition ( M = 9%, SD = 7).

The largest relative difference in study 2 was found in location T (the total number of events), where participants in the frequency condition ( M = 12%, SD = 8) spent a larger proportion of time, on average, than participants in the probability condition ( M = 7%, SD = 7). The second largest relative difference in study 2 was found in location FnA (fire and no alarm), where participants in the probability condition ( M = 7%, SD = 7) spent a larger proportion of time, on average, than participants in the frequency condition ( M = 5%, SD = 7).

A consistent pattern found in both studies was that participants presented with the frequency format tended to spend more time on location T, and participants presented with the probability format tended to spend more time on location FnA. Moreover, in both studies, participants in the probability format condition tended to focus more on the upper branch of the Tree, represented by locations F, FA and FnA, compared with participants using the frequency format (see Table 3 ).

4.2.2. Permutation Tests

For study 1 and study 2, separate permutation tests, with 10,000 permutations each, compared the Hellinger distance between the distribution of transitions for the frequency format and the distribution of transitions for the probability format, with the distance between two distributions created at random (Reani et al., 2018b ).

The estimated sampling distributions of the two tests are shown in Figure 3 . The vertical red line represents the distance between the frequency and the probability groups for study 1, on the left, and for study 2, on the right. The gray curve represents the distribution of the distances between pairs of randomly sampled groups (with replacement) of comparable sizes. The area under the curve to the right of the vertical line is the p-value, i.e., the probability, under the null hypothesis, of observing a distance at least as large as the one between the frequency and probability groups; the null hypothesis is that this distance is no different from the distance between any two groups of comparable sizes sampled at random from the population.

Figure 3. Sampling distribution of distances between the frequency and the probability groups for study 1 (Left) and study 2 (Right). The vertical red line is the actual Hellinger distance between groups.

The permutation test for study 1 shows a significant difference between the frequency and the probability conditions: the Hellinger distance is Hd = 0.123 and the p-value is p = 0.005. A similar effect was found for study 2 ( Hd = 0.119, p = 0.002). These results indicate that participants' mouse behavior differed between Information Format groups, in both studies.

For study 1 only, we ran two further permutation tests to investigate whether Correctness was also related to participants' mouse behavior, in the frequency and probability conditions respectively. The comparison between the transitions of correct respondents and those of incorrect respondents is meaningful only if enough participants answered the problem correctly (Reani et al., 2018b , 2019 ); we therefore did not run these tests on study 2, where the number of correct responses was too small to enable a meaningful comparison. The results for the frequency condition did not show a significant difference between the Correct and Incorrect groups (Hellinger distance Hd = 0.11, p = 0.21). A similar result was found for the probability condition ( Hd = 0.17, p = 0.80). These results provide no evidence that participants' mouse behavior was related to the variable Correctness.

4.2.3. Discriminative Transitions

The results from the first set of permutation tests suggest that, in both studies, there were mouse transitions that might typify users' behavior in different Information Format conditions.

Figure 4 shows, for study 1 on the left and for study 2 on the right, all the transitions by OR on the x -axis (scaled using a logarithmic transformation) and by absolute frequency on the y -axis.

Figure 4. Transitions distribution by odds-ratio (x-axis) and absolute frequency (y-axis) for the study 1 (Left) and study 2 (Right) conditions.

The red circles are those transitions that have a narrow confidence interval that does not include the value one. These tend to be the transitions which have an OR far from one (represented in the graph by the vertical dashed blue line) and, at the same time, a relatively large F . Table 4 reports these transitions, together with their OR values, confidence intervals and frequencies, for study 1 (top) and study 2 (bottom).

Table 4. Discriminative transitions by study, with odds-ratio values, 95% confidence intervals and absolute frequency of occurrence.

In Table 4 , an OR value larger than one indicates a larger relative frequency for that transition in the frequency format compared with the probability format. There were five discriminative transitions in study 1, four of which represented the typical behavior of participants presented with the frequency format (F-T, nF-T, FnA-F and nFA-nF) and one which represented the typical behavior of participants presented with the probability format (Q-T). In study 2, we found four discriminative transitions, two of which represented the typical behavior of participants presented with the frequency format (F-T and nF-T), and two which represented the typical behavior of participants presented with the probability format (FnA-nF and nF-Q).

To understand what these transitions represent in the context of the problem, we mapped them onto the original tree diagram in Figure 5, where red arrows represent the discriminative transitions for the frequency format (left) and the probability format (right), for study 1 (top) and study 2 (bottom).

Figure 5 (fpsyg-10-01548-g0005.jpg). Discriminative transitions shown using arrows on the original tree diagram, for the frequency (Left) and probability (Right) conditions, and for study 1 (Top) and study 2 (Bottom).

From the graph, it can be noted that, in study 1, participants in the frequency condition were more likely than participants in the probability condition to move leftwards, toward the total number of events (location T). Participants in the probability condition, by contrast, tended to move upwards, from location Q (the question) to location T (the total number of events).

In study 2, the pattern found in study 1 is repeated, i.e., participants in the frequency format tended to move leftwards, from the events with fire to the total number of events (F-T) and from the events with no fire to the total (nF-T). Participants in the probability condition tended to move downwards, from the location representing the events with fire and no alarm to the events with no fire (FnA-nF), and from this latter location to the question (location Q).

5. Discussion

This research investigated the effects of Information Format (whether data is presented in frequencies or probabilities) and Data Generating Mode (whether or not the data directly matched an individual's self-reported lived experience), on how people approach probabilistic reasoning tasks (Gigerenzer, 2015 ; Hoffrage et al., 2015 ). To determine whether there were differences in reasoning behavior between conditions, it employed a novel interaction analysis approach in an online task. In line with previous research, we found that people were more likely to provide an accurate answer when presented with data in the frequency format than the probability format (Gigerenzer and Hoffrage, 1995 ; Gigerenzer, 2015 ; McDowell and Jacobs, 2017 ). In support of our first hypothesis (H1), we found that people were more likely to answer accurately when presented with data that matched their reported experience, than when they were presented with data that matched the average person's experience, and that the extent to which their answer deviated from the correct response in study 2 was directly related to the distance between the subjective posterior and the true posterior. This provides support for the idea that experiencing data is strongly related to being able to reason about it correctly (Gigerenzer, 2015 ; Sirota et al., 2015 ; Hertwig et al., 2018 ). It also demonstrates that the effect of this learned subjective posterior (here, the result of lived experience) may hinder people's ability to reason about information that does not match it.

The results did not support our second hypothesis (H2) as the manipulation of the format did have an effect regardless of DGM. This suggests that the difference in performance found in previous research comparing the frequency and probability formats is not due solely to the former being able to trigger the perception of learning from experience, but rather that, in line with the nested-set hypothesis, the facilitatory effect of the frequency format is due to a clearer representation of the relationships between sets (Sirota and Juanchich, 2011 ; Lesage et al., 2013 ; Sirota et al., 2015 ). We tested our third hypothesis (H3) – that mouse movement would differ according to the format in which the information is encoded – by using a web-based tool that forced people to hover the mouse cursor over those parts of the graph that the participants thought were crucial for solving the problem, and analyzing the differences in transitions between these locations.

In both studies, participants using the frequency format tended to focus more on the total number of events (location T), compared with participants using the probability format. In both studies, participants in the probability format condition also tended to focus more on location FnA (fire and no alarm) than participants in the frequency format condition. The question asks participants to estimate the probability of fire, given that the alarm sounded. Thus, looking at FnA is not necessary to answer the question. The only useful locations for solving the problem framed using probabilities are FA, nFA, F and nF, which were the pieces of data that had to be entered in the Bayesian formula to produce the correct estimate. One possible reason why people looked more at location FnA in the probability format condition might be that the normalization process used in the probability tree is not clear, and thus people look at the opposite data value in an attempt to understand how the data were normalized.

A second explanation is that people focus on this because they are trying to compare events with alarm and events without alarm given that there was a fire, confusing the sensitivity of the test with the PPV. This may explain why participants in the probability format tended to focus more on the information found on the upper branch of the tree, which shows only the data related to events with fire (see Table 3 ). This interpretation is represented in Figure 6 which shows, for the probability format condition, where the reasoner should focus to answer the question correctly (gray-filled circles), and where participants actually focused in the experiment (dashed-line circles).

Figure 6 (fpsyg-10-01548-g0006.jpg). The probability format condition, marked to show where the reasoner should focus (gray-filled ellipses) and where participants actually focused (dashed-line ellipses).

The answer analysis showed that, in the probability format condition, 60% of participants in study 1 and 52% of participants in study 2 gave the value of the sensitivity of the test, instead of the PPV, as the answer to the problem. This reflects the mouse movement patterns described above (related to the second explanation). This result is also in line with previous research showing that the error made most often by participants in probabilistic reasoning tasks framed using probabilities is confusing the sensitivity of the test with the PPV (Eddy, 1982; Elstein, 1988; Gigerenzer et al., 1998; Hoffrage and Gigerenzer, 1998). From our results, it appears that mouse behavior does indeed provide evidence for this faulty reasoning strategy. The transition analysis shows that participants, in the frequency format only, had a tendency to move the cursor leftwards, toward the total number of events (T). This 'reversion to total' behavior, found in both studies, has also been observed in research using eye tracking methods to study a similar problem (Reani et al., 2019). These results also reflect the responses provided by many of the participants. When presented with the frequency format, 45% of the participants in study 1 and 57% of the participants in study 2 used, as the denominator of the proportion in their answer, the total number of events (i.e., 1,000). This value was covered by the button T, which may explain why so many participants moved leftwards toward T so often. This provides behavioral evidence that, as proposed in other studies, the most common error in the frequency format is to use the total population, instead of the relevant subset (i.e., alarm events), as the denominator in the answer, perhaps because participants did not understand which population the question refers to (Khan et al., 2015; Reani et al., 2018a, 2019).
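To make the two faulty strategies concrete, the following worked example applies Bayes' theorem to a fire-and-alarm scenario with hypothetical numbers (not the study's actual stimulus values), contrasting the correct PPV with the sensitivity answer and the total-population-denominator answer:

```python
def ppv(p_fire, p_alarm_given_fire, p_alarm_given_no_fire):
    """Positive predictive value P(fire | alarm) via Bayes' theorem."""
    p_alarm = (p_fire * p_alarm_given_fire
               + (1 - p_fire) * p_alarm_given_no_fire)
    return p_fire * p_alarm_given_fire / p_alarm

# Hypothetical values, for illustration only: out of 1,000 days, 10 had a
# fire; the alarm sounded on 9 fire days (FA) and on 90 no-fire days (nFA).
total, fire, fa, nfa = 1000, 10, 9, 90

correct = fa / (fa + nfa)      # P(fire | alarm): relevant subset is alarm days
sensitivity_error = fa / fire  # P(alarm | fire): sensitivity mistaken for PPV
total_error = fa / total       # frequency-format error: whole population as denominator

# The frequency computation and the probability-format Bayes formula agree
assert abs(correct - ppv(fire / total, fa / fire, nfa / (total - fire))) < 1e-12
```

With these numbers, the correct answer is 9/99 ≈ 0.09, while the sensitivity answer (0.9) and the total-denominator answer (0.009) both miss the relevant subset of alarm days.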
When presented with the probability format, participants tended to move vertically, from the question toward location T (study 1) and from location nF toward the question (study 2). This suggests that participants check the question more often when the problem is framed using probabilities, perhaps because the question in this case is more difficult to understand than when the problem is framed using frequencies. This confusion is in line with the fact that a significantly larger number of participants answered incorrectly when the problem was presented using probabilities.

Although we found interesting correlates between participants' mouse behavior and their answers, this method has two main limitations. The first is that post hoc analyses of this type leave room for different interpretations. Here we interpret our results in terms of current theory and participants' responses.

The second limitation is that the experimental settings used in the current study were different from the settings used in previous research. Specifically, in our studies, the data were not available to the participants all of the time; participants needed to move the mouse to see the values hidden behind buttons, and this has the potential to change the reasoning process. In tasks where the data is always available, people have immediate access to the information and some aspects of this information may be taken on board implicitly and effortlessly. In tasks where the information is covered and people need to engage interactively with the tool to uncover the data, certain implicit processes that should occur in the data acquisition stage may be lost. This loss, however, can be beneficial for the purpose of studying conscious human cognition, especially in tasks involving complex reasoning, because it filters out some of those noisy patterns that are associated with low-level perceptions.

It is assumed that the fire-and-alarm scenario used in this study is familiar to most participants (at least compared with the mammography problem). Nevertheless, we cannot be certain that all participants were familiar with such a scenario, and because we used a single scenario, we also cannot be sure that these results would generalize to other familiar, everyday scenarios.

Throughout the study we kept the settings of the problem constant and provided the same information in all the conditions. We manipulated only the format (probability vs. frequency) and whether the data provided matched people's reported experience. It is possible that other factors may have affected the results. For instance, some participants may have been more familiar with the scenario than others, perhaps because they were firefighters.

Previous research has explored eye-mouse coordination patterns, for instance in tasks where participants were asked to examine search engine results pages (Guo and Agichtein, 2010). However, such a comparison has not been conducted in probabilistic reasoning research. Future work will therefore focus on combining interaction analysis methods such as eye tracking and mouse tracking simultaneously, to understand what each can tell us about the reasoning process.

Although the study was not a memory test, as the data were available for the whole duration of the task, it is possible that, in study 1, familiarity with the uncertainty surrounding the event may have lessened the load on working memory while performing calculations. To exclude any memory effect, this needs to be investigated further.

The tendency to use the total sample size as the denominator of the Bayesian ratio has been related to superficial processing of the data (Tubau et al., 2018). The sequential presentation of isolated numbers might thus be linked to more superficial processing. Future work could investigate this by comparing the effect of presenting complete, uncovered trees vs. covered trees of the type used in this study.

A further limitation of the study is the use of the word “events” in the question when referring to days on which there was a fire. Although the problem description uses “days” to describe the scenario, the question asks participants to provide the number of events, which is an ambiguous term. Some participants may have misunderstood what this term referred to.

Relatedly, the fact that the mean of the base rate across conditions ranged from 4 to 15% (i.e., it was relatively high) might indicate that some participants did not have a good “feeling” for the real base rate of the scenario. This should be explored in future work, as the scenario was chosen to be a familiar one when, for some participants, this may not have been the case.

6. Conclusion

We investigated how Information Format affected mouse behavior in an interactive probabilistic reasoning task, and whether presenting probabilities that matched people's self-reported lived experience improved the accuracy of their posterior probability estimates. We found that the closer the data presented in the task were to self-reported experience, the more accurate people's answers were, indicating that the subjective posterior developed through lived experience had an overwhelming impact on the reasoning process. We also found that people were better able to reason about data presented in frequencies, regardless of whether the data matched their experience. By analyzing mouse events in light of participants' responses, we obtained evidence for different faulty strategies related to frequency presentation and probability presentation respectively. This supports analysis of mouse behavior as a way of gathering evidence about the cognitive processes underpinning probabilistic reasoning.

Data Availability

Ethics Statement

This study was carried out in accordance with the recommendations of the Computer Science School Panel at The University of Manchester with written informed consent from all subjects. All subjects gave written informed consent in accordance with the Declaration of Helsinki. The protocol was approved by The University of Manchester ethics committee.

Author Contributions

MR wrote the manuscript, designed the experiments, collected the data, and performed the analysis. AD helped with software development. NP supervised the project and advised on the statistical analysis. CJ supervised the project and edited the manuscript.

Conflict of Interest Statement

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

1 Amazon Mechanical Turk: https://www.mturk.com/

2 GitHub: https://github.com/IAM-lab/FireWeb

3 GitHub: https://github.com/aapaolaza/UCIVIT-WebIntCap

4 GitHub: https://github.com/manurea/study4

Funding. MR's work was funded by the Engineering and Physical Science Research Council (EPSRC number: 1703971). NP's work was partially funded by the Engineering and Physical Sciences Research Council (EP/P010148/1) and by the National Institute for Health Research (NIHR) Greater Manchester Patient Safety Translational Research Centre. The views expressed are those of the author(s) and not necessarily those of the NHS, the NIHR or the Department of Health.

Supplementary Material

The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fpsyg.2019.01548/full#supplementary-material

  • Apaolaza A., Harper S., Jay C. (2013). Understanding users in the wild, in W4A '13: Proceedings of the 10th International Cross-Disciplinary Conference on Web Accessibility (New York, NY: ACM). 10.1145/2461121.2461133
  • Bar-Hillel M. (1983). The base rate fallacy controversy, in Decision Making Under Uncertainty, Advances in Psychology, Vol. 16, ed Scholz R. W., 39–61. 10.1016/S0166-4115(08)62193-7
  • Behrend T. S., Sharek D. J., Meade A. W., Wiebe E. N. (2011). The viability of crowdsourcing for survey research. Behav. Res. Methods 43:800. 10.3758/s13428-011-0081-0
  • Binder K., Krauss S., Bruckmaier G. (2015). Effects of visualizing statistical information – an empirical study on tree diagrams and 2×2 tables. Front. Psychol. 6:1186. 10.3389/fpsyg.2015.01186
  • Binder K., Krauss S., Bruckmaier G., Marienhagen J. (2018). Visualizing the Bayesian 2-test case: the effect of tree diagrams on medical decision making. PLoS ONE 13:e0195029. 10.1371/journal.pone.0195029
  • Böcherer-Linder K., Eichler A. (2017). The impact of visualizing nested sets. An empirical study on tree diagrams and unit squares. Front. Psychol. 7:2026. 10.3389/fpsyg.2016.02026
  • Brase G. L., Hill W. T. (2017). Adding up to good Bayesian reasoning: problem format manipulations and individual skill differences. J. Exp. Psychol. Gen. 146, 577–591. 10.1037/xge0000280
  • Cohen A. L., Sidlowski S., Staub A. (2017). Beliefs and Bayesian reasoning. Psychon. Bull. Rev. 24, 972–978. 10.3758/s13423-016-1161-z
  • Cosmides L., Tooby J. (1996). Are humans good intuitive statisticians after all? Rethinking some conclusions from the literature on judgment under uncertainty. Cognition 58, 1–73.
  • Cosmides L., Tooby J. (2008). Can a general deontic logic capture the facts of human moral reasoning? How the mind interprets social exchange rules and detects cheaters, in Moral Psychology, ed Sinnott-Armstrong W. (Cambridge, MA: MIT Press), 53–119.
  • Eddy D. M. (1982). Probabilistic reasoning in clinical medicine: problems and opportunities, in Judgment Under Uncertainty: Heuristics and Biases, eds Kahneman D., Slovic P., Tversky A. (Cambridge: Cambridge University Press), 249–267.
  • Edwards W. (1968). Conservatism in human information processing, in Formal Representation of Human Judgment, ed Kleinmuntz B. (New York, NY: John Wiley & Sons).
  • Edwards W. (1982). Conservatism in Human Information Processing. Cambridge: Cambridge University Press, 359–369.
  • Elstein A. S. (1988). Cognitive processes in clinical inference and decision making, in Reasoning, Inference, and Judgment in Clinical Psychology, eds Turk D. C., Salovey P. (New York, NY: Free Press), 17–50.
  • Fagerlin A., Zikmund-Fisher B. J., Ubel P. A., Jankovic A., Derry H. A., Smith D. M. (2007). Measuring numeracy without a math test: development of the Subjective Numeracy Scale. Med. Decis. Making 27, 672–680. 10.1177/0272989X07304449
  • Gigerenzer G. (2015). On the supposed evidence for libertarian paternalism. Rev. Philos. Psychol. 6, 361–383. 10.1007/s13164-015-0248-1
  • Gigerenzer G., Hoffrage U. (1995). How to improve Bayesian reasoning without instruction: frequency formats. Psychol. Rev. 102:684.
  • Gigerenzer G., Hoffrage U. (2007). The role of representation in Bayesian reasoning: correcting common misconceptions. Behav. Brain Sci. 30, 264–267. 10.1017/S0140525X07001756
  • Gigerenzer G., Hoffrage U., Ebert A. (1998). AIDS counselling for low-risk clients. AIDS Care 10, 197–211.
  • Guo Q., Agichtein E. (2010). Ready to buy or just browsing? Detecting web searcher goals from interaction data, in SIGIR '10: Proceedings of the 33rd International ACM SIGIR Conference on Research and Development in Information Retrieval (New York, NY: ACM), 130–137.
  • Hertwig R., Barron G., Weber E. U., Erev I. (2004). Decisions from experience and the effect of rare events in risky choice. Psychol. Sci. 15, 534–539. 10.1111/j.0956-7976.2004.00715.x
  • Hertwig R., Hogarth R. M., Lejarraga T. (2018). Experience and description: exploring two paths to knowledge. Curr. Dir. Psychol. Sci. 27, 123–128. 10.1177/0963721417740645
  • Hoffrage U., Gigerenzer G. (1998). Using natural frequencies to improve diagnostic inferences. Acad. Med. 73, 538–540. 10.1097/00001888-199805000-00024
  • Hoffrage U., Gigerenzer G., Krauss S., Martignon L. (2002). Representation facilitates reasoning: what natural frequencies are and what they are not. Cognition 84, 343–352. 10.1016/S0010-0277(02)00050-1
  • Hoffrage U., Krauss S., Martignon L., Gigerenzer G. (2015). Natural frequencies improve Bayesian reasoning in simple and complex inference tasks. Front. Psychol. 6:1473. 10.3389/fpsyg.2015.01473
  • Holmqvist K., Nyström M., Andersson R., Dewhurst R., Jarodzka H., Van de Weijer J. (2011). Eye Tracking: A Comprehensive Guide to Methods and Measures. Oxford: Oxford University Press.
  • Hornof A. J., Halverson T. (2002). Cleaning up systematic error in eye-tracking data by using required fixation locations. Behav. Res. Methods Instrum. Comput. 34, 592–604. 10.3758/BF03195487
  • Huang J., White R., Buscher G. (2012). User see, user point: gaze and cursor alignment in web search, in Proceedings of the SIGCHI Conference on Human Factors in Computing Systems (New York, NY: ACM), 1341–1350.
  • Khan A., Breslav S., Glueck M., Hornbæk K. (2015). Benefits of visualization in the mammography problem. Int. J. Hum. Comput. Stud. 83, 94–113. 10.1016/j.ijhcs.2015.07.001
  • Kok E. M., Jarodzka H. (2017). Before your very eyes: the value and limitations of eye tracking in medical education. Med. Educ. 51, 114–122. 10.1111/medu.13066
  • Lesage E., Navarrete G., De Neys W. (2013). Evolutionary modules and Bayesian facilitation: the role of general cognitive resources. Think. Reason. 19, 27–53. 10.1080/13546783.2012.713177
  • Liebling D. J., Dumais S. T. (2014). Gaze and mouse coordination in everyday work, in Proceedings of the 2014 ACM International Joint Conference on Pervasive and Ubiquitous Computing: Adjunct Publication (New York, NY: ACM), 1141–1150.
  • Mandel D. R. (2014). The psychology of Bayesian reasoning. Front. Psychol. 5:1144. 10.3389/fpsyg.2014.01144
  • Mason W., Suri S. (2012). Conducting behavioral research on Amazon's Mechanical Turk. Behav. Res. Methods 44, 1–23. 10.3758/s13428-011-0124-6
  • Mayer R. (2005). Cognitive theory of multimedia learning, in The Cambridge Handbook of Multimedia Learning, ed Mayer R. E. (Cambridge: Cambridge University Press), 31–48.
  • McDowell M., Jacobs P. (2017). Meta-analysis of the effect of natural frequencies on Bayesian reasoning. Psychol. Bull. 143, 1273–1312. 10.1037/bul0000126
  • Micallef L., Dragicevic P., Fekete J. (2012). Assessing the effect of visualizations on Bayesian reasoning through crowdsourcing. IEEE Trans. Vis. Comput. Graph. 18, 2536–2545. 10.1109/TVCG.2012.199
  • Ottley A., Peck E. M., Harrison L. T., Afergan D., Ziemkiewicz C., Taylor H. A., et al. (2016). Improving Bayesian reasoning: the effects of phrasing, visualization, and spatial ability. IEEE Trans. Vis. Comput. Graph. 22, 529–538. 10.1109/TVCG.2015.2467758
  • Phillips L. D., Edwards W. (1966). Conservatism in a simple probability inference task. J. Exp. Psychol. 72:346. 10.1037/h0023653
  • Reani M., Davies A., Peek N., Jay C. (2018a). How do people use information presentation to make decisions in Bayesian reasoning tasks? Int. J. Hum. Comput. Stud. 111, 62–77. 10.1016/j.ijhcs.2017.11.004
  • Reani M., Peek N., Jay C. (2018b). An investigation of the effects of n-gram length in scanpath analysis for eye-tracking research, in Proceedings of the 2018 ACM Symposium on Eye Tracking Research & Applications (New York, NY: ACM). 10.1145/3204493.3204527
  • Reani M., Peek N., Jay C. (2019). How different visualizations affect human reasoning about uncertainty: an analysis of visual behaviour. Comput. Hum. Behav. 92, 55–64. 10.1016/j.chb.2018.10.033
  • Schulte-Mecklenbeck M., Kuehberger A., Johnson J. G. (2019). A Handbook of Process Tracing Methods, 2nd Edn. New York, NY: Routledge.
  • Sirota M., Juanchich M. (2011). Role of numeracy and cognitive reflection in Bayesian reasoning with natural frequencies. Studia Psychol. 53, 151–161.
  • Sirota M., Vallée-Tourangeau G., Vallée-Tourangeau F., Juanchich M. (2015). On Bayesian problem-solving: helping Bayesians solve simple Bayesian word problems. Front. Psychol. 6:1141. 10.3389/fpsyg.2015.01141
  • Slovic P., Lichtenstein S. (1971). Comparison of Bayesian and regression approaches to the study of information processing in judgment. Organ. Behav. Hum. Perform. 6, 649–744.
  • Sweller J. (2003). Evolution of human cognitive architecture, in Psychology of Learning and Motivation, Vol. 43 (Academic Press), 215–266.
  • Traczyk J., Sobkow A., Matukiewicz A., Petrova D., Garcia-Retamero R. (2019). The experience-based format of probability improves probability estimates: the moderating role of individual differences in numeracy. Int. J. Psychol. [Epub ahead of print]. 10.1002/ijop.12566
  • Tubau E., Rodríguez-Ferreiro J., Barberia I., Colomé À. (2018). From reading numbers to seeing ratios: a benefit of icons for risk comprehension. Psychol. Res. [Epub ahead of print], 1–9. 10.1007/s00426-018-1041-4
  • Tversky A., Kahneman D. (1974). Judgment under uncertainty: heuristics and biases. Science 185, 1124–1131.
  • Weber P., Binder K., Krauss S. (2018). Why can only 24% solve Bayesian reasoning problems in natural frequencies: frequency phobia in spite of probability blindness. Front. Psychol. 9:1833. 10.3389/fpsyg.2018.01833

The natural frequency hypothesis and evolutionary arguments

  • Published: 19 September 2014
  • Volume 14 , pages 1–19, ( 2015 )

Cite this article

frequency format hypothesis

  • Yuichi Amitani 1  

534 Accesses

Explore all metrics

In the rationality debate, Gerd Gigerenzer and his colleagues have argued that human’s apparent inability to follow probabilistic principles does not mean our irrationality, because we can do probabilistic reasoning successfully if probability information is given in frequencies, not percentages (the natural frequency hypothesis). They also offered an evolutionary argument to this hypothesis, according to which using frequencies was evolutionarily more advantageous to our hominin ancestors than using percentages, and this is why we can reason correctly about probabilities in the frequency format. This paper offers a critical review of this evolutionary argument. I show that there are reasons to believe using the frequency format was not more adaptive than using the standard (percentage) format. I also argue that there is a plausible alternative explanation (the nested-sets hypothesis) for the improved test performances of experimental subjects—one of Gigerenzer’s key explananda—which undermines the need to postulate mental mechanisms for probabilistic reasoning tuned to the frequency format. The explanatory thrust of the natural frequency hypothesis is much less significant than its advocates assume.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Price includes VAT (Russian Federation)

Instant access to the full article PDF.

Rent this article via DeepDyve

Institutional subscriptions

Similar content being viewed by others

Now you bayes, now you don’t: effects of set-problem and frequency-format mental representations on statistical reasoning.

frequency format hypothesis

An evolutionary explanation of the Allais paradox

frequency format hypothesis

Propensities, Probabilities, and Experimental Statistics

The quizzes where subjects are asked to calculate the probability of a hypothesis ( H : you are infected with HIV) given data ( D : a test says you are infected with HIV)— \(\Pr (H\mid D)\) —from other probabilities, such as the probability of the hypothesis ( \(Pr(H)\) ) and that of false alarm ( \(\Pr (D\mid \lnot H)\) )—see Sect.  4 for details.

Gigerenzer and his colleagues distinguish natural frequencies and relative frequencies . According to a definition adopted in Gigerenzer ( 2000 ), the former come from natural sampling (in his words: “[n]atural frequencies report the final tally of a natural sampling process” (63)). In natural sampling, one gets a frequency of a particular event-type from his or her experience sequentially (as opposed to systematic survey or experiments with the sample size being fixed in advance—as he says, “the base rate are fixed before any observations are made” in the systematic sampling (96)). A natural frequency is the frequency which he or she acquires that way (e.g., ‘5 hunting successes out of 10 attempts’) and conveys information on the number of samples. Relative frequencies are normalized numbers and thereby convey no information on base rates (e.g., ‘success rate in hunting of 0.2’). However, Gigerenzer and his colleagues have changed his definition in recent years. See discussion on p. 13. Supporters of the natural frequency hypothesis argue that the frequency representation by natural frequency rather than relative frequency affects subjects’ performances. Hereafter, we only refer to natural frequencies as ‘frequencies,’ except as otherwise noted.

Some notable philosophers are also sympathetic toward the natural frequency hypothesis. For example, Nozick ( 1993 ) and de Sousa ( 2007 ) offer extended discussion on it.

A study suggests that this faciliatation effect may be restricted to people with high academic ability, like undergraduate students and obstetricians. See Bramwell et al. ( 2006 ), although Zhu and Gigerenzer ( 2006 ) report that 4th to 6th graders can benefit from the frequency representation.

Barbey and Sloman ( 2007 ) make a similar point, but they do not point out possible evolutionary implications to the uses of both formats.

Sloman and Over ( 2003 ) and Over ( 2003 ) make a similar point: we have to memorize all relevant experiences in order to track frequencies. But since they do not explicitly compare various strategies, it is not clear whether this difficulty is avoided in other strategies. Brase ( 2002 ), a supporter of the natural frequency hypothesis, is also aware that the hypothesis has a problem with memory, but he does not compare it with other strategies either.

Another possible reply on behalf of the natural frequency hypothesis is that one may not memorize frequencies of events themselves (Brase et al. ( 1998 ) suggests this possibility). One may just memorize events in the form of episodic memory—yesterday’s hunting by George, his hunting one week ago, and so on—without storing the frequency explicitly in his mind, and build George’s success rate from one’s episodic memory when asked (“How many times does George succeed in hunting these days? He did well yesterday, but he failed last week...”). A difficulty with this proposal is that the frequency of events one acquires this way is often inaccurate due to psychological biases, such as the availability bias (Over 2003 , see also Tversky and Kahneman 1982 ). Note that the evolutionary arguments assume that the frequency format allows us to store and retrieve information about past events with some accuracy. If humans store and retrieve inaccurate information about the frequency of past events, then one will wonder what the benefit of using the frequency format is.

Some might appeal to the studies on automatic encoding of frequency information about the occurrence of an event [reviewed by Hasher and Zacks (1984); see also Zacks and Hasher (2002)]. According to Hasher and Zacks, humans can accurately record the frequency of an event, and we can do this rather automatically. In one experiment reviewed by them (Hasher and Chromiak 1977), for instance, subjects could record with reasonable accuracy how many times they saw different words on slides in a single session, whether or not they were instructed to do so in advance. Those studies, some might argue, imply that natural frequencies put little burden on memory, because frequencies are recorded in one’s mind without significant effort. I have two replies. First, automatic encoding does not imply automatic retention or retrieval of the frequency information. In fact, as Underwood et al. (1971) show, we lose access to a substantial portion of our memory for frequency information within just a week. Second, in many of the experiments reviewed by Hasher and Zacks, subjects were exposed to lists of items in a lab over a relatively short time. In contrast, natural sampling is a sequential process, and thus data may come from years of sampling. This leaves open whether one encodes frequency data automatically when it is accumulated over a long period of time.

Sloman et al. ( 2003 ) are not the only or the earliest ones who support this hypothesis. Tversky and Kahneman ( 1983 ), who found the facilitation of probabilistic reasoning by frequencies earlier than Gigerenzer and his colleagues, suggested this possibility. The use of tree diagrams for this purpose is traced back to Kleiter ( 1994 ). Kleiter also points out that natural sampling makes computation in Bayesian reasoning easy. See also Over ( 2007 ) on the logical relationships between tree diagrams and set/subset inclusions.

Those in the nested-sets camp have proposed two other methods of making the nested-sets structure transparent: (1) Girotto and Gonzalez (2001) represent the probabilities by using the term ‘chance’ (“A person who was tested had 4 chances out of 100 of having the infection”), and (2) Sloman et al. (2003) and Yamagishi (2003) used various diagrams, such as Euler and tree diagrams. I do not stress the “chance” language in the present paper because I believe that, as Brase (2008) points out, if all the probabilities are represented in terms of ‘chance’ in the instruction, then some participants may take them as frequencies. In contrast, when it comes to the effects of diagrams on Bayesian reasoning, the experimental findings are somewhat mixed. Brase (2009) claims that a plain Euler diagram does not facilitate Bayesian reasoning more than the no-picture version of the quiz (Experiment 1) and that “icon” diagrams, which make the individuation of objects salient, facilitate our reasoning more. However, this conflicts with the results obtained by Sloman et al. (2003) and Yamagishi (2003), who found that various diagrams such as Euler, tree and roulette-wheel diagrams do facilitate Bayesian inference compared to the plain probability format.

Some might think that the term ‘chance’ in the instruction is read as suggesting a frequency rather than a probability (Brase 2008 ). But since ‘chance’ is used only once and the only usage of it is sandwiched by two usages of ‘probability’ (the first sentence and the question), most will probably take it as a probability, not a frequency.

There are a couple of studies suggesting that natural sampling may not facilitate probabilistic inference, especially among small children. In an experiment, Girotto and Gonzalez (2008) found that young children (4–5 year olds) can successfully update the prior probability of an event to calculate its posterior probability after being exposed to new sampling data only when they did not observe the sampling process themselves; when they observed the sampling process, children often predicted the outcome which they had not observed in prior trials, as in the gambler’s fallacy. Téglás et al. (2007) also revealed that 3-year-olds do not successfully change their probabilistic expectations even after they have seen unexpected events a number of times (see also Téglás et al. (2011) for similar results). These studies lead us to suspect that young children would not update their probabilistic beliefs properly after natural sampling, although the supporters of the natural frequency hypothesis do not specify exactly when the facilitation of probabilistic reasoning by frequencies arises in the developmental sequence.

For example, Barbey and Sloman ( 2007 ) do not make a clear response to the same point made by Gigerenzer and Hoffrage ( 2007 ) and Barton et al. ( 2007 ) in their comments.

Supporters of the natural frequency hypothesis are not always consistent in this respect though. For example, Zhu and Gigerenzer ( 2006 , p. 303) call “1 out of 2” and “1 out of 3” natural frequencies. Gigerenzer did not revise this part when the original article was reprinted in his book (Gigerenzer 2008 , p. 188).

Furthermore, this move costs another selling point for the natural frequency hypothesis. Along with the Bayesian inference quizzes, supporters of the hypothesis have stressed that subjects are better at avoiding the conjunction fallacy under frequencies (Gigerenzer 1993 ). Yet under the alternative definition, frequencies in the instruction will no longer count as natural frequencies. Thus the results would no longer favor the natural frequency hypothesis under the new definition.

Barbey A, Sloman S (2007) Base-rate respect: from ecological rationality to dual processes. Behav Brain Sci 30:241–297


Barton A, Mousavi S, Stevens J (2007) A statistical taxonomy and another “chance” for natural frequencies. Behav Brain Sci 30:255–256


Bennett D (2004) Logic made easy: how to know when language deceives you. WW Norton & Company, NY

Bramwell R, West H, Salmon P (2006) Health professionals’ and service users’ interpretation of screening test results: experimental study. Brit Med J 333:284–288

Brase G (2002) Which statistical formats facilitate what decisions? The perception and influence of different statistical information formats. J Behav Decis Mak 15:381–401

Brase G (2007) Omissions, conflations, and false dichotomies: conceptual and empirical problems with the Barbey and Sloman account. Behav Brain Sci 30:258–259

Brase G (2008) Frequency interpretation of ambiguous statistical information facilitates Bayesian reasoning. Psychon B Rev 15:284–289

Brase G (2009) Pictorial representations in statistical reasoning. Appl Cogn Psychol 23(3):369–381

Brase G, Cosmides L, Tooby J (1998) Individuation, counting, and statistical inference: the role of frequency and whole-object representations in judgment under uncertainty. J Exp Psychol 127:3–21

Cosmides L, Tooby J (1996) Are humans good intuitive statisticians after all? Rethinking some conclusions from the literature on judgment under uncertainty. Cognition 58:1–73

Gigerenzer G (1991) How to make cognitive illusions disappear: beyond heuristics and biases. Eur Rev Soc Psychol 2:83–115

Gigerenzer G (1993) The bounded rationality of probabilistic mental models. In: Manktelow K, Over D (eds) Rationality: psychological and philosophical perspectives. Routledge, London, pp 284–313

Gigerenzer G (1998) Ecological intelligence: an adaptation for frequencies. In: Cummins D, Allen C (eds) The evolution of mind. Oxford University Press, Oxford, pp 9–29

Gigerenzer G (2000) Adaptive thinking: rationality in the real world. Oxford University Press, New York

Gigerenzer G (2001) Content-blind norms, no norms, or good norms? A reply to Vranas. Cognition 81:93–103

Gigerenzer G (2008) Rationality for mortals. Oxford University Press, New York

Gigerenzer G, Hoffrage U (1995) How to improve Bayesian reasoning without instruction: frequency formats. Psychol Rev 102(4):684–704

Gigerenzer G, Hoffrage U (1999) Overcoming difficulties in Bayesian reasoning: a reply to Lewis and Keren and Mellers and McGraw. Psychol Rev 106:425–430

Gigerenzer G, Hoffrage U (2007) The role of representation in Bayesian reasoning: correcting common misconceptions. Behav Brain Sci 30:264–267

Girotto V, Gonzalez M (2001) Solving probabilistic and statistical problems: a matter of information structure and question form. Cognition 78:247–276

Girotto V, Gonzalez M (2002) Chances and frequencies in probabilistic reasoning: rejoinder to Hoffrage, Gigerenzer, Krauss, and Martignon. Cognition 84:353–359

Girotto V, Gonzalez M (2008) Children’s understanding of posterior probability. Cognition 106:325–344

Griggs R, Newstead S (1982) The role of problem structure in a deductive reasoning task. J Exp Psychol Learn 8:297–307

Groarke L, Tindale C, Fisher L (1997) Good reasoning matters!, 3rd edn. Oxford University Press, Oxford

Grossen B, Carnine D (1990) Diagramming a logic strategy: effects on difficult problem types and transfer. Learn Disabil Q 13:168–182

Hasher L, Chromiak W (1977) The processing of frequency information: an automatic mechanism? J Verbal Learning Verbal Behav 16:173–184

Hasher L, Zacks R (1984) Automatic processing of fundamental information: the case of frequency of occurrence. Am Psychol 39:1372–1388

Hoffrage U, Gigerenzer G (1998) Using natural frequencies to improve diagnostic inferences. Acad Med 73:538–540

Hoffrage U, Gigerenzer G, Krauss S, Martignon L (2002) Representation facilitates reasoning: what natural frequencies are and what they are not. Cognition 84:343–352

Johnson-Laird P, Legrenzi P, Girotto V, Legrenzi M, Caverni JP (1999) Naive probability: a mental model theory of extensional reasoning. Psychol Rev 106:62–88

Kahneman D, Tversky A (1996) On the reality of cognitive illusions. Psychol Rev 103(3):582–591

Kleiter G (1994) Natural sampling: rationality without base rates. In: Fischer GH, Laming D (eds) Contributions to mathematical psychology, psychometrics, and methodology. Springer, Berlin, pp 375–388


Lewis C, Keren G (1999) On the difficulties underlying Bayesian reasoning: a comment on Gigerenzer and Hoffrage. Psychol Rev 106:411–416

Neace W, Michaud S, Bolling L, Deer K, Zecevic L (2008) Frequency formats, probability formats, or problem structure? A test of the nested-sets hypothesis in an extensional reasoning task. Judgm Decis Mak 3:140–152

Nozick R (1993) The nature of rationality. Princeton University Press, Princeton, NJ

Over D (2000a) Ecological issues: a reply to Todd, Fiddick, and Krauss. Think Reason 6(4):385–388

Over D (2000b) Ecological rationality and its heuristics. Think Reason 6(2):182–192

Over D (2003) From massive modularity to metarepresentation: the evolution of higher cognition. In: Over D (ed) Evolution and the psychology of thinking: the debate. Psychology Press, Hove, England, pp 121–144

Over D (2007) The logic of natural sampling. Behav Brain Sci 30:277

Patterson R (2007) The versatility and generality of nested set operations. Behav Brain Sci 30:277–278

Pinker S (1997) How the mind works. Norton, New York

Polonioli A (2012) Gigerenzer’s “external validity argument” against the heuristics and biases program: an assessment. Mind Soc 11:133–148

Samuels R, Stich S, Bishop M (2001) Ending the rationality wars: how to make disputes about human rationality disappear. In: Elio R (ed) Common sense, reasoning and rationality. Oxford University Press, New York, pp 236–268

Samuels R, Stich S, Faucher L (2004) Reason and rationality. In: Niiniluoto IWJ, Sintonen M (eds) Handbook of epistemology. Springer, Berlin, pp 131–182

Sloman S, Over D (2003) Probability judgment from the inside and out. In: Over D (ed) Evolution and the psychology of thinking: the debate. Psychology Press, Hove, England, pp 145–169

Sloman S, Over D, Slovak L, Stibel J (2003) Frequency illusions and other fallacies. Organ Behav Hum Decis 91:296–309

de Sousa R (2007) Why think?: evolution and the rational mind. Oxford University Press, Oxford


Stanovich K, West R (2000) Individual differences in reasoning: Implications for the rationality debate? Behav Brain Sci 23:645–665

Stanovich K, West R (2003) Evolutionary versus instrumental goals: how evolutionary psychology misconceives human rationality. In: Over D (ed) Evolution and the psychology of thinking: the debate. Psychology Press, Hove, England, pp 171–230

Téglás E, Girotto V, Gonzalez M, Bonatti L (2007) Intuitions of probabilities shape expectations about the future at 12 months and beyond. Proc Natl Acad Sci USA 104:19156–19159

Téglás E, Vul E, Girotto V, Gonzalez M, Tenenbaum J, Bonatti L (2011) Pure reasoning in 12-month-old infants as probabilistic inference. Science 332:1054–1059

Tversky A, Kahneman D (1982) Judgment under uncertainty: heuristics and biases. In: Kahneman D, Slovic P, Tversky A (eds) Judgment under uncertainty: heuristics and biases. Cambridge University Press, Cambridge, England

Tversky A, Kahneman D (1983) Extensional versus intuitive reasoning: the conjunction fallacy in probability judgment. Psychol Rev 90:293–315

Underwood B, Zimmerman J, Freund J (1971) Retention of frequency information with observations on recognition and recall. J Exp Psychol 87:149–162

Vranas P (2000) Gigerenzer’s normative critique of Kahneman and Tversky. Cognition 76:179–193

Vranas P (2001) Single-case probability and content-neutral norms: a reply to Gigerenzer. Cognition 81:105–111

Woods J, Irvine A, Walton D (2003) Argument: critical thinking, logic and the fallacies, 2nd edn. Prentice Hall, Pearson

Yamagishi K (2003) Facilitating normative judgments of conditional probability: frequency or nested sets. Exp Psychol 50:97–106

Zacks R, Hasher L (2002) Frequency processing: a twenty-five year perspective. In: Sedlmeier P (ed) ETC frequency processing and cognition. Oxford University Press, London, pp 21–36

Zhu L, Gigerenzer G (2006) Children can solve Bayesian problems: the role of representation in mental computation. Cognition 98:287–308


Acknowledgments

I am very grateful to the following people for improvements to this paper which I could not have made without them: Paul Bartha, John Beatty, Gary Brase, Vittorio Girotto, Konstantinos Katsikopoulos, Kohei Kishida, the late Brian Laetz, Patrick Rysiew, Chris Stephens, the audience in the biennual meeting of the Philosophy of Science Association (in Pittsburgh, USA, November 8th 2008), and two anonymous reviewers. An earlier, shorter version of the paper was published in Kagaku Tetsugaku (in Japanese) as “Hindo kasetsu to shinka kara no ronkyo.” This work is financially supported by JSPS KAKENHI (Grant Number: 25370016).

Author information

Authors and Affiliations

Tokyo University of Agriculture, 196 Yasaka, Abashiri, Hokkaido, 0992493, Japan

Yuichi Amitani


Corresponding author

Correspondence to Yuichi Amitani .


About this article

Amitani, Y. The natural frequency hypothesis and evolutionary arguments. Mind Soc 14, 1–19 (2015). https://doi.org/10.1007/s11299-014-0155-7


Received : 11 February 2014

Accepted : 03 September 2014

Published : 19 September 2014

Issue Date : June 2015



  • Probabilistic reasoning
  • Fast and frugal heuristics
  • Ecological rationality
  • Evolutionary psychology


Frequency formats, probability formats, or problem structure? A test of the nested-sets hypothesis in an extensional reasoning task


2008, Judgment and Decision Making

Related Papers

Frederic Vallee-Tourangeau


Journal of Cognitive Psychology

Rodrigo Moro

Esteban Freidin

Since the 1970s, the Heuristics and Biases Program in Cognitive Psychology has shown that people do not reason correctly about conditional probability problems. In the 1990s, however, evolutionary psychologists discovered that if the same problems are presented in a different way, people’s performance greatly improves. Two explanations have been offered to account for this facilitation effect: the natural frequency hypothesis and the nested-set hypothesis. The empirical evidence on this debate is mixed. We review the literature pointing out some methodological issues that we take into account in our own present experiments. We interpret our results as suggesting that when the mentioned methodological problems are tackled, the evidence seems to favour the natural frequency hypothesis and to go against the nested-set hypothesis.


Psychonomic Bulletin & Review

The idea that naturally sampled frequencies facilitate performance in statistical reasoning tasks because they are a cognitively privileged representational format has been challenged by findings that similarly structured numbers presented as “chances” similarly facilitate performance, based on the claim that these are technically single-event probabilities. A crucial opinion, however, is that of the research participants, who possibly interpret chances as de facto frequencies. A series of experiments here indicate that not only is performance improved by clearly presented natural frequencies rather than chances phrasing, but also that participants who interpreted chances as frequencies rather than probabilities were consistently better at statistical reasoning. This result was found across different variations of information presentation and across different populations.

Anales de Psicología

Nick Perham

Three experiments examined people's ability to incorporate base rate information when judging posterior probabilities. Specifically, we tested the conclusion of Cosmides and Tooby (1996, Cognition, 58, 1–73) that people's reasoning appears to follow Bayesian principles when they are presented with information in a frequency format, but not when information is presented as one-case probabilities. First, we found that frequency formats were not generally associated with better performance than probability formats unless they were presented in a manner which facilitated construction of a set-inclusion mental model. Second, we demonstrated that the use of frequency information may promote biases in the weighting of information. When participants were asked to express their judgements in frequency rather than probability format, they were more likely to produce the base rate as their answer, ignoring diagnostic evidence.

Sergey Artemenkov

Introduction: Subsequent to the investigations of Tversky and Kahneman (Tversky & Kahneman, 1983), it is well known that judgments under uncertainty are often mediated by intuitive heuristics that are not bound by specific scientific natural laws. For example, according to the representativeness heuristic, a conjunction can be more representative than one of its constituents, and instances of a specific category can be easier to imagine or to retrieve than instances of a more inclusive category. The so-called Representativeness and Availability Heuristics (RAH) can therefore make a conjunction appear more probable than one of its constituents, which violates the conjunction rule, the most basic qualitative law of probability: the probability of a conjunction, P(A&B), cannot exceed the probabilities of its constituents, P(A) and P(B), because the extension (or the possibility set) of the conjunction is included in the extension of its constituents. This phenomenon was regarded as cognitive illusion and de...

Michel Gonzalez

Proceedings of the 29th Annual Meeting of the Cognitive Science Society

Joseph Jay Williams

In evaluation frames, both focal and alternative hypotheses are explicit in queries about an event’s probability. We investigated whether evaluation frames improved the accuracy and coherence of conditional probability judgments when compared to economy frames in which only the focal hypothesis was explicit. Participants were presented with contingency information regarding the relation between viruses and an illness with an unknown etiology, and they judged the conditional probability that the illness would occur or not occur given that a virus was either present or absent. Compared to economy frames, evaluation frames improved the accuracy and coherence of probability judgments.



Original Research Article

Frequency Formats: How Primary School Stochastics Profits From Cognitive Psychology


  • 1 Eichwald-Realschule Sachsenheim, Sachsenheim, Germany
  • 2 Department 3, Mathematics Institute, University of Koblenz-Landau, Koblenz, Germany

Cognitive psychology has shown that understanding numerical information is deeply related to the format in which this information is presented; percentages are difficult to grasp whereas frequency formats are intuitively accessible. This plays a vital role in the medical domain, where difficult risk-related probability judgments have to be made both by professionals and their patients. In this article, we demonstrate that the idea of representing statistical information in terms of frequency formats is not only helpful for communicating risks, but can also be applied to primary school stochastics, where percentages and fractions are not yet available. For this purpose, we report on an intervention study conducted in grade 4 in primary school. The results show, on the one hand, that primary school students could already solve Bayesian reasoning tasks in the pretest when natural frequencies were used. On the other hand, the students profited from the intervention, in which they used different representations, namely colored tinker cubes and natural frequencies, in order to describe and quantify frequencies and probabilities. These results are in line with findings from cognitive psychology that activities with hands-on material, as well as pointing out the underlying nested-sets structure, can foster Bayesian reasoning. The results are discussed in particular with regard to teaching stochastics in (primary) school.

Theoretical Background

Why do people find probability and statistics unintuitive and difficult? I've been working in this area for around 35 years, and after all this time have finally arrived at an answer. Because probability and statistics are unintuitive and difficult.

– Spiegelhalter and Gage (2014)

The core idea of this paper is to provide empirical evidence from an intervention study in primary school that demonstrates that probability and statistics are not— per se —unintuitive and difficult. It appears that the way stochastic concepts and contents are communicated and represented is often unintuitive and difficult, but can be—at least partly—made accessible already to primary students by using natural frequencies in combination with enactive, hands-on material and activities. In our study, we focus on Bayesian reasoning in the sense of inferring or adjusting probabilities for hypotheses “upon receiving new evidence” ( Vallée-Tourangeau et al., 2015 , p. 4). First of all, there is an a-priori probability P(H) for a certain hypothesis to be true. When receiving new information (data = D), this probability might be adjusted. In many stochastic situations the conditional probability P(D|H) can be determined from the context. However, what is often of interest is the inversion of this conditional probability, namely P(H|D). In these cases, Bayes' theorem can be applied in order to calculate the inversion of such a conditional probability, which can be considered an update of the a-priori probability. Research clearly shows that it is very difficult for many people to understand conditional probabilities and in particular Bayes' theorem ( Gigerenzer and Hoffrage, 1995 ; Sedlmeier, 2001 ; Sedlmeier and Gigerenzer, 2001 ; Hoffrage et al., 2002 ; Wassner, 2004 ). With regard to our sample, we won't focus on Bayes' theorem in this study. However—as we will show in this paper—primary school students can already understand the core idea of Bayesian reasoning in the sense of updating probabilities if the representation format used is adequate, e.g., if natural frequencies are used. In the following, we will describe how natural frequencies can support human understanding in specific situations.
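The updating step described above can be sketched in a few lines of Python. The function simply applies Bayes' theorem; the numbers in the example are hypothetical and serve only to illustrate how P(H|D) is obtained from P(H), P(D|H) and P(D|¬H).

```python
def posterior(prior, p_d_given_h, p_d_given_not_h):
    """Bayes' theorem:
    P(H|D) = P(D|H)P(H) / [P(D|H)P(H) + P(D|not H)P(not H)]."""
    numerator = p_d_given_h * prior
    marginal = numerator + p_d_given_not_h * (1 - prior)
    return numerator / marginal

# Hypothetical numbers: an a-priori probability of 0.3 is updated to
# about 0.66 after observing data that is much more likely under H.
print(posterior(0.3, 0.9, 0.2))
```

The denominator is the total probability of the data, so the function returns the updated (posterior) probability of the hypothesis.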

The Role of Natural Frequencies in Human Comprehension of Situations of Uncertainty

The way statistical or numerical information is communicated is deeply related to the processes of the human mind and its mechanisms ( Gigerenzer and Hoffrage, 1995 ; Sedlmeier, 2001 ; Hoffrage et al., 2002 ; Spiegelhalter et al., 2011 ). During the last 50 years, there have been disputes between advocates of the heuristics-and-biases tradition and evolutionary psychologists about humans' reasoning and judgment capabilities under uncertainty ( Samuels et al., 2002 ). The hot-button issue is the question of whether human beings lack a sense for probability ( Piattelli-Palmarini, 1994 ) or whether they do indeed have a form of instinct for it ( Pinker, 1997 ). The scholars with a pessimistic mindset come primarily from the ranks of the heuristics-and-biases program. Piattelli-Palmarini (1994) , Bazerman and Neale (1986) as well as Gould (1992) state that humans are somewhat probability-blind when reasoning and judging under uncertainty. From their perspective, humans are not capable of making probability-related judgments because of one main reason: The human mind is “not built to work by the rules of probability” ( Gould, 1992 , p. 469). As a result, human choice behavior will always deviate from normatively appropriate judgments ( Samuels et al., 2002 ). One of the most popular proponents and founder of the heuristics-and-biases program is Daniel Kahneman. In his opinion, there is little hope of eliminating wrong intuitions and biases in probabilistic thinking through instruction ( Kahneman, 2011 ). In contrast, several evolutionary psychologists argue that probabilistic phenomena are too pervasive in nature for humans to lack a sense of them ( Pinker, 1997 ). Almost every incident in everyday life can be described as a probabilistic phenomenon. As a result, the human mind must be capable of dealing with randomness. 
Moreover, the reasons for the difficulties mentioned above can be traced back to the counterintuitive formats in which probabilities are communicated ( Gigerenzer, 1991 ). Information should be presented in the way people naturally think ( Pinker, 1997 ). As a consequence, cognitive illusions such as the base-rate fallacy or the conjunction fallacy may simply disappear ( Gigerenzer, 1991 ). We will now introduce the concept of natural frequencies, a format that might support understanding probabilities.

The concept of natural frequencies was first put forward by Gigerenzer and Hoffrage (1995) . It can be vividly illustrated as a natural movement people perform when they, e.g., extract two apples from a basket with 10 apples, or certain tokens from a larger set of tokens (see Figure 1 ). The relations between those subsets can be interpreted as “nested sets.” The so-called “nested-sets theory” is based on the idea that Bayesian reasoning is deeply intertwined with the understanding of the relation within sets and their subsets ( McDowell and Jacobs, 2017 ; see also Section Possible Explanations for the Advantages of Natural Frequencies: The Nested-Sets Theory and the Ecological Rationality Framework).


Figure 1 . Sampling using frequencies: cover image of a German schoolbook for upper-secondary level mathematics (source: Diepgen et al., 1993 ).

In order to show the specific and intuitive nature of natural frequencies, we contrast them with numerical expressions of percentages. For instance, when describing the proportion of colored tokens from the image in Figure 1 , we can either say 7 out of 40 are colored (natural frequency) or we can say 17.5% of the tokens are colored (relative frequency as a percentage).

Both expressions are mathematically equivalent; however, one appears to be adapted to the human mind because of the natural movement we associate with this expression. We can directly observe and count the numbers involved in the natural frequency of colored tokens ( Hoffrage et al., 2002 ). Expressions in terms of percentages are more difficult to grasp because of the normalization to 100. This might be explained by the following: the base rate describes the frequency of a certain feature (seven colored tokens) in relation to the population (a total of 40 tokens). Normalization means dividing this absolute frequency by the total number in the population (and multiplying it by 100). As a result of this normalization, the information about the absolute numbers within the population disappears. On the one hand, this procedure facilitates comparing populations of different sizes. On the other hand, this process increases the level of abstraction, since there are no absolute, countable entities in the standardized frequencies, i.e., the percentages.

People might object that natural frequencies are not mathematically valid. Whereas 7 out of 40 might be considered only one arbitrary numerical example of the underlying proportion, the percentage 17.5% is the commonly used and most generally accepted representation of this proportion. And it is true that dealing with natural frequencies might not be easy when comparing or computing proportions, since the sizes of the underlying populations might differ—in contrast to percentages. However, an argument for using natural frequencies is that 7 out of 40 can indeed be considered a representative of the underlying proportion if we think of it as an expected value. For instance, this expected value can easily be interpreted as the mean proportion of the following: 5 out of 40; 9 out of 40; 6 out of 40 and 8 out of 40 . Another argument for using natural frequencies is that they are suitable for describing conditional probabilities. Referring to the example in Figure 1 , the conditional probability P (green token | colored tokens) can be described as 2 green out of 7 colored tokens, which is easier to interpret than the percentage 29% (rounded value of 2/7). Again, a natural movement can be associated with it, i.e., extracting the colored tokens out of the large set of all tokens and taking the two green tokens out of the small subset of colored tokens.
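The relation between the two formats in the token example can be sketched as follows (an illustrative calculation, not part of the original study):

```python
# Token example from Figure 1, expressed in both formats.
colored, total = 7, 40        # 7 out of 40 tokens are colored (natural frequencies)
green_in_colored = 2          # 2 of the 7 colored tokens are green

# Percentages normalize to 100 and hide the absolute, countable entities.
pct_colored = 100 * colored / total                          # 17.5
pct_green_given_colored = 100 * green_in_colored / colored   # ~28.6, rounded to 29

print(f"{colored} out of {total} tokens are colored -> {pct_colored}%")
print(f"{green_in_colored} out of {colored} colored tokens are green "
      f"-> {round(pct_green_given_colored)}%")
```

Both lines print the same proportions in the two notations; only the natural frequency version keeps the absolute counts visible.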

Natural Frequencies Can Support the Understanding of Bayesian Reasoning Tasks

Within the pioneering volume Judgment under Uncertainty: Heuristics and Biases by Kahneman et al. (1982 , p. 253), Eddy stressed that medical doctors do not follow Bayes' formula when solving the following task:

The probability that a woman aged 40 has breast cancer (B) is 1% (P(B) = prevalence = 1%). According to the literature, the probability that the disease is detected by a mammography (M) is 80% (P(M+|B) = sensitivity = 80%). The probability that the test wrongly indicates the disease, although the patient does not have it, is 9.6% (P(M+|¬B) = 1 - specificity = 9.6%). If a woman aged 40 is tested as positive, what is the probability that she indeed has breast cancer P(B|M+)?

Application of Bayes' formula yields the following result:

P(B|M+) = P(M+|B) · P(B) / [P(M+|B) · P(B) + P(M+|¬B) · P(¬B)] = (0.80 × 0.01) / (0.80 × 0.01 + 0.096 × 0.99) ≈ 0.078

Thus, despite a positive mammography, the probability of breast cancer is only 7.8%, whereas Eddy (1982) reports that in his empirical study 95 out of 100 doctors wrongly estimated this probability to be between 70 and 80%.
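The calculation in the probability format can be sketched as follows (an illustrative computation of Bayes' formula for Eddy's task, not part of the original study):

```python
# Eddy's mammography task in the probability format.
prevalence = 0.01        # P(B)
sensitivity = 0.80       # P(M+ | B)
false_positive = 0.096   # P(M+ | not B) = 1 - specificity

# Bayes' formula: posterior probability of breast cancer given a positive test.
posterior = (sensitivity * prevalence) / (
    sensitivity * prevalence + false_positive * (1 - prevalence)
)
print(f"P(B | M+) = {posterior:.3f}")  # ~0.078, i.e., 7.8%
```

Note how every term must first be normalized before the ratio can be formed, which is exactly the step the probability format demands of the reasoner.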

In order to support the estimation of such conditional probabilities, Gigerenzer and Hoffrage (1995) investigated how the representation of the quantitative information affects performance. In Eddy's task above, the information was represented as probabilities. Gigerenzer and Hoffrage presented medical doctors an adaptation of Eddy's task in which the original probabilities were replaced by a different representation of uncertainty, namely natural frequencies. The adapted task read as follows (ibid., p. 688):

One hundred out of every 10,000 women aged 40 who participate in routine screening have breast cancer. 80 of every 100 women with breast cancer will be detected as positive by a mammography. 950 out of every 9,900 women without breast cancer will also be detected as positive by a mammography. Here is a new representative sample of women aged 40 who have been detected as positive by a mammography in routine screening. How many of these women do you expect to actually have breast cancer?

Putting the numbers into Bayes' formula yields the following result:

P(B|M+) = 80 / (80 + 950) = 80 / 1,030 ≈ 0.078

Gigerenzer and Hoffrage (1995) reported that nearly half (46%) of all doctors gave the correct answer to this adapted task. This study was one of the first of several studies that empirically confirmed the positive effects of representing information in terms of natural frequencies instead of percentages ( Gigerenzer and Hoffrage, 1995 ; see also Macchi, 1995 ; Girotto and Gonzalez, 2001 ). In the following section, we will present further empirical studies comparing natural frequencies with other probability formats such as percentages in order to get a more profound view of their potential benefit.
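In the natural frequency version, the same posterior reduces to a simple ratio of countable cases, as this illustrative sketch shows (not part of the original study):

```python
# The same mammography task in natural frequencies.
with_cancer_positive = 80      # 80 of 100 women with cancer test positive
without_cancer_positive = 950  # 950 of 9,900 women without cancer test positive

# The posterior is just the share of true positives among all positives;
# no normalization by base rates is needed.
total_positive = with_cancer_positive + without_cancer_positive
posterior = with_cancer_positive / total_positive
print(f"{with_cancer_positive} out of {total_positive} positive women "
      f"have cancer -> {posterior:.3f}")
```

The arithmetic collapses to one division, which is a plausible reason why solution rates rise so markedly with this format.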

Natural Frequencies—A Panacea for Solving Bayesian Reasoning Problems?

The frequency-probability effect, i.e., the fact that using natural frequencies produces higher solution rates than using probabilities, is a very robust phenomenon. It has been replicated in many studies (see, e.g., the meta-analysis of McDowell and Jacobs, 2017 ). Nevertheless, judgments concerning the medical test problem are far from accurate—even if natural frequencies are used ( Pighin et al., 2018 ). In some cases, single-event probabilities have indeed shown some advantages over natural frequencies. In this sense, Pighin et al. (2018) found that communicating test results in terms of chances, compared to natural frequencies, better helped patients to interpret their personal situation. Moreover, Ayal and Beyth-Marom (2014) found evidence that tasks using a natural frequency format were only solved better if no more than one mental step was required. There is evidence that in more complex tasks with several mental steps, probability formats outperform natural frequencies. This might be due to the normalization of the frequencies that is characteristic of probabilities and percentages and that helps to compare and compute different values ( Ayal and Beyth-Marom, 2014 ).

These findings qualify the frequency-probability effect and, hence, have to be accounted for in this research field. Nevertheless, they play only a minor role for our study, which was conducted in primary school: if probabilities are quantified at all in primary school, they are restricted to frequency formats in the sense of “The probability to get a red cube is, e.g., 3 out of 10 .”

Two opposing theories, the Nested-Sets Theory and the Ecological Rationality Framework, have been established that provide explanations for the frequency-probability effect. We briefly present and contrast them in the following section.

Possible Explanations for the Advantages of Natural Frequencies: The Nested-Sets Theory and the Ecological Rationality Framework

McDowell and Jacobs (2017) describe a long-standing controversy with regard to possible explanations of the frequency-probability effect. Proponents of the Ecological Rationality Framework (ERF; e.g., Gigerenzer and Hoffrage, 1995 ; Cosmides and Tooby, 1996 ) assume that there is a specialized module in the human mind that automatically processes natural frequencies. According to the ERF, this module has developed through evolution based on an appropriate match between the human mind and the structure of the environment ( McDowell and Jacobs, 2017 ). As a consequence, the presentation of a Bayesian reasoning task in terms of natural frequencies increases solution rates, as natural frequencies have corresponded to people's natural environment for millions of years. In particular, the advantages of using natural frequencies are independent of the individual's cognitive resources ( Lesage et al., 2013 ).

A contrary view is expressed by the Nested-Sets Theory (NST), which explains the frequency-probability effect as a result of emphasizing the nested-sets structure of the Bayesian problem when probabilities are translated into a frequency format ( Girotto and Gonzalez, 2001 ; Barbey and Sloman, 2007 ). By using natural frequencies, this nested-sets structure becomes more prominent and visible. As a result, the analytical system of the human mind is triggered and executive resources become available that can be used for calculating a correct answer. Lesage et al. (2013) examined the relationship between cognitive capacity and performance on Bayesian reasoning tasks. Participants with rather low cognitive capacity did not benefit much from facilitating the tasks via natural frequencies. This finding is in line with the NST, which states that people with rather low cognitive resources profit less from the nested-sets structure made visible by natural frequencies. In contrast, the ERF claims that the benefits of using natural frequencies should apply roughly equally to people with different levels of cognitive capacity, since everyone has a specialized module that automatically processes natural frequencies.

With regard to the focus of this study, we will not go into further detail concerning the presented theories. However, both emphasize that natural frequencies can help the understanding of, e.g., conditional probabilities or Bayesian reasoning tasks. Moreover, the NST provides an analytical explanation for the benefit of using natural frequencies: when people become aware of the nested-sets structure of a Bayesian reasoning task (e.g., through natural frequencies), they perform better on these tasks. Although this theory can serve as a theoretical basis for our study, since primary school students are able to work with such nested sets, it has to be noted that there are different factors that mediate people's performance on Bayesian tasks. Such factors are presented in the following.

Critical Factors Mediating Performance on Bayesian Reasoning

The meta-analysis of McDowell and Jacobs (2017) reveals important factors that account for differences in performance on Bayesian reasoning tasks. Two of the strongest factors concern the characteristics of the tasks, and they apply to both natural frequencies and probabilities. First, task performance increases substantially if task complexity is reduced (see in particular Ayal and Beyth-Marom, 2014 ). This means, for instance, that less irrelevant information is given in a task or that fewer mental steps are required in the mathematical computations. Second, if participants are given visual aids, they perform much better, since these external representations can clarify the underlying nested-sets structure ( McDowell and Jacobs, 2017 ).

Concerning individual factors, cognitive abilities and thinking dispositions ( Sirota et al., 2014 ), text comprehension ( Johnson and Tubau, 2015 ), as well as numeracy and cognitive reflection ( Sirota and Juanchich, 2011 ) predict Bayesian reasoning performance in both natural frequency and probability formats. As the meta-analysis of McDowell and Jacobs (2017) indicates that a high level of numeracy leads to better Bayesian reasoning, Johnson and Tubau (2013) focused their study on this concrete individual characteristic. They found that short and clear natural frequency problems lead to smaller differences between people with low and high numeracy skills. Hence, both high- and low-numerate participants were able to adequately solve short Bayesian reasoning tasks using natural frequencies. The solution rates became lower when the problems were presented as longer word problems, in both the natural frequency and the probability format.

Whereas several studies focus on such individual factors mediating the ability to solve Bayesian reasoning problems, there is little research on how, for example, interactivity-based interventions improve performance on Bayesian reasoning tasks. Vallée-Tourangeau et al. (2015) conclude that enabling an enactive, physical manipulation of the problem information leads to substantially better statistical reasoning, without specific training or instruction. In their study, participants benefited from working with malleable physical representations of a problem, namely playing cards. The participants who solved the problems with playing cards performed better than their peers without them.

Although the mentioned studies reveal important findings about factors mediating people's performance on Bayesian reasoning tasks, there is still a need to explore how this performance can be fostered. In particular, the question arises whether and how young students with limited experience in stochastics can be supported in this respect. Therefore, the next section presents to what extent stochastics and Bayesian reasoning are taught at primary school.

Stochastics and Bayesian Reasoning in Primary School—Status Quo and Potential

Teaching stochastics in primary school is required by the German curricular standards but restricted to descriptive statistics (e.g., gathering, representing, and analyzing data in the context of tasks related to the students' everyday lives, such as “How do you get to school?”) and basic random experiments (e.g., performing experiments with dice and spinners and discussing whether an event is “impossible,” “certain,” or “likely”; KMK, 2004 ). There is a strong focus on qualitative probability judgments and basic quantitative probability (e.g., “Are you more likely to get a number on the dice between 1 and 2 or a number between 3 and 6?”). Nevertheless, young students' potential does not appear to be fully exploited, as several studies suggest that primary school students are capable of more profound stochastics.

Lindmeier and Reiss (2014) , for example, show that children aged 9 to 12 years can acquire elementary competencies regarding inferential statistics. In their experiment, the students took random samples out of a box with an unknown number of red and blue cubes. After several trials, they had to estimate the number and proportion of red and blue cubes in the box.

Other studies indicate that students in primary school are able to grasp an elementary form of conditional probabilities and Bayesian reasoning if these concepts are introduced using natural frequencies ( Martignon and Kurz-Milcke, 2006 ; Martignon and Krauss, 2009 ; Latten et al., 2011 ; Till, 2015 ). Due to the students' young age, these studies focus on their ability to capture the statistical or probabilistic phenomena rather than on their ability to work out Bayes' formula. Promoting such a propaedeutic understanding of (conditional) probabilities also appears to be an important basis for further learning as, for instance, Diaz and Fuente (2007) show that students often approach probabilities in an algorithmic way: they master the techniques but do not grasp the underlying phenomenon.

The study of Zhu and Gigerenzer (2006) used specific tasks promoting (an elementary form of) Bayesian reasoning by means of natural frequencies. Before presenting such student tasks, we introduce a task by Kahneman (2011 , p. 6–7) that served as a model for Zhu and Gigerenzer. In Kahneman's task, which often elicits wrong judgments, an individual is described by a neighbor as follows:

Steve is very shy and withdrawn, invariably helpful but with very little interest in people or in the world of reality. A meek and tidy soul, he has a need for order and structure, and a passion for detail. Is Steve more likely to be a librarian or a farmer?

According to Kahneman's (2011) research, most people answered that Steve is probably a librarian. However, as there are five times as many farmers as librarians in the United States, the absolute number of shy and helpful farmers is larger than the absolute number of shy and helpful librarians. Hence, the right answer to Kahneman's task is that it is more likely that Steve is a farmer. The most common mistake in this kind of task is that people neglect the base rate. Gigerenzer and Hoffrage (1995) claim that this typical fallacy—as well as some others—disappears when using natural representation formats.
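The base-rate logic can be made concrete with hypothetical numbers (the source only gives the 5:1 ratio; the population sizes and trait rates below are assumptions for illustration):

```python
# Hypothetical illustration of base-rate neglect (numbers not from Kahneman):
# even if "shy" is far more typical of librarians, the 5:1 base rate of
# farmers can make a shy farmer more likely in absolute terms.
librarians, farmers = 20, 100            # assumed sizes, farmers = 5 x librarians
p_shy_librarian, p_shy_farmer = 0.40, 0.10   # assumed trait rates

shy_librarians = librarians * p_shy_librarian    # 8 shy librarians
shy_farmers = farmers * p_shy_farmer             # 10 shy farmers
print(shy_farmers > shy_librarians)              # more shy farmers in absolute terms
```

With these assumed values, a person described as shy is still more likely to be a farmer, which is the point of the task.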

In order to use such tasks focusing on Bayesian reasoning already in primary school, Kahneman's task was adapted to this age group by Zhu and Gigerenzer (2006) . Latten et al. (2011) implemented these ideas several years later in a short learning environment (the cited learning environment originates from Multmeier; see, e.g., Multmeier, 2012 ). In this adaptation, librarians became princesses, farmers became mermaids, and the attribute shy became wearing a crown:

5 out of 60 fairytale characters are princesses, and 4 of these 5 princesses wear a crown. The other 55 out of 60 fairytale characters are mermaids, and 12 of these 55 mermaids wear a crown.

The corresponding question in this task is as follows: “Imagine you see a fairytale character wearing a crown. Would she be more likely to be a princess or a mermaid?”

When solving this task, the students have to concentrate only on the characters wearing a crown and mask out all characters without crowns. Then they can compare the given natural frequencies of fairytale characters with crowns: 4 out of 16 characters with crowns are princesses, whereas 12 out of 16 characters with crowns are mermaids. Therefore, if they were to see a character with the attribute wearing a crown , it would more likely be a mermaid! By comparing the concrete numbers, students can realize that although almost every princess wears a crown (4 out of 5), there are altogether more mermaids with a crown. Hence, the attribute wearing a crown applies to more mermaids, which is why it is more likely for a character with a crown to be a mermaid. Understanding this nested-sets structure is essential for Bayesian reasoning.
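The comparison the students perform can be sketched as follows (an illustrative restatement of the task's numbers, not part of the cited learning environment):

```python
# Princess/mermaid task: restrict attention to the crowned characters
# and compare natural frequencies within that subset.
princesses, mermaids = 5, 55
princesses_with_crown = 4
mermaids_with_crown = 12

crowned = princesses_with_crown + mermaids_with_crown   # 16 crowned characters
print(f"{princesses_with_crown} out of {crowned} crowned characters are princesses")
print(f"{mermaids_with_crown} out of {crowned} crowned characters are mermaids")
# 12/16 > 4/16: a crowned character is more likely a mermaid,
# even though 4 of 5 princesses wear a crown.
```

The whole Bayesian update is just narrowing to the crowned subset and counting, which is what the natural frequency format makes visible.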

The presented typical Bayesian reasoning task can be made even more accessible by combining the use of natural frequencies with iconic representations, such as icon arrays ( Kurz-Milcke et al., 2011 ). Several studies have shown the positive effects of visual representations for (probabilistic) problem-solving ( Corter and Zahner, 2007 ; Brase, 2008 ; Garcia-Retamero et al., 2010 ; Gaissmaier et al., 2012 ; McDowell and Jacobs, 2017 ). As a result of representing statistical information by means of visual representations, subset structures become visible, which is particularly conducive to understanding Bayesian reasoning problems. The big advantage of such visual representations is that all proportions of the relevant features are visible, which might help students to intuitively grasp them ( Scholz and Waschescio, 1986 ). Figure 2 displays an iconic representation related to the above-described student task. This representation helps students to realize that there are so-called symptomatic characteristics for certain fairytale characters, such as crowns for princesses. In the above-presented task, it helps the students become aware that the symptomatic characteristic crown does not automatically lead to a higher probability for princesses. As, in this example, the absolute number of mermaids wearing a crown is higher than that of the princesses, the correct answer for the task above is “mermaid.”


Figure 2 . Iconic representation of a typical Bayesian task: Icon array. See Till (2015 , p. 91).

Of course, there are also other representations that could help students to work on the described Bayesian task. For instance, it can alternatively be modeled using hands-on material in the form of colored tinker cubes. Figure 3 displays such an example. In this simplified version, there are 2 princesses (red) and 8 mermaids (blue). 1 of the 2 princesses and 2 of the 8 mermaids wear a crown (marked in yellow). The other fairytale characters wear no crowns (marked in green). The base rate of princesses is 2 out of 10 (prior probability). Looking for princesses in the subset of characters with crowns yields a proportion of 1 out of 3 (posterior probability).


Figure 3 . Enactive representation of a typical Bayesian task: Tinker cubes. See Till (2015 , p. 91).

The previous section shows that it is possible to introduce conditional probabilities and Bayesian reasoning already in primary school. In the following, we sketch empirical results related to using natural frequencies in Bayesian reasoning tasks—in secondary but also in primary school.

Empirical Research on Students' Bayesian Reasoning

In an intervention study, Wassner (2004) compared two ways of teaching the Bayes' formula in a sample of 15- to 17-year-old students: one with probabilities and one with natural frequencies. The students who worked with natural frequencies performed significantly better in the posttest than the students who worked with probabilities. Wassner also reported on long-term effects of the intervention.

In the experimental study “The dog ate my homework!,” Spiegelhalter and Gage (2014) asked 14- to 16-year-old students to model the following Bayesian task: Within a school class, several students were accused of lying about the reasons why they had forgotten their homework. Hence, the study participants had to find out how likely it was that the accused or non-accused students were lying or telling the truth. In order to encode the binary variables (lying/telling the truth; accused/non-accused), the students worked with colored tinker cubes; moreover, all students created 2 × 2 tables and empirical frequency trees. All of these representations were based on natural frequencies, the concrete numbers of students' attributes (lying/telling the truth and accused/non-accused) were assigned randomly. This class experiment indicated that students could easily do probability calculations based on natural frequencies. However, due to the study design, it was not possible to determine the representation format that led to the highest growth in learning.

Zhu and Gigerenzer (2006) showed that children aged 9 to 11 years can already work successfully on typical Bayesian tasks when the relevant information is presented as natural frequencies. The researchers used a set of ten tasks presented in two different ways: the information was given as probabilities in percentage form to one group of children and as natural frequencies to the other group. The students working with probabilities could not find any correct solution at all. In contrast, even the youngest students (aged 9 years) in the group working with natural frequencies solved 14% of the tasks. The 10-year-olds in this group solved 42% and the 11-year-olds 47% of the tasks. These findings indicate that even very young students can deal with conditional probabilities when natural frequencies are used.

In an experiment, Martignon and Kurz-Milcke (2006) asked students aged 8 to 10 years to construct stochastic situations using tinker cubes and stochastic urns. One of their aims was to foster the development of dynamic mental imagery for representing stochastic situations. The experiment consisted of a so-called “urn arithmetic” in which first elements of expanding proportions were fostered. The students had to compare proportions by constructing equivalent urns in the following manner: We have two urns, namely U 1 (1 red: 2 all) and U 2 (2 red: 5 all). Which urn is more convenient if we want a red tinker cube? ( Martignon and Kurz-Milcke, 2006 ). Without knowing about fractions, the students discussed how to enlarge an urn without changing the odds (1 out of 2 = 2 out of 4). The authors consider “this first confrontation with comparison of proportions and similarity of proportions [as] a fundamental previous step before fractions are introduced” ( Martignon and Kurz-Milcke, 2006 , p. 3). In their experiment, Martignon and Kurz-Milcke also used Kahneman's Bayesian task related to girls' and boys' mathematical enthusiasm and modeled the situation with a big urn in the participating classes. All students in the corresponding class were represented by tinker towers, i.e., a combination of two colored tinker cubes (red/blue for the students' gender, yellow/green for their math enthusiasm). After having gathered the relevant information about the whole class, the towers were categorized in a tree diagram. Based on this tree diagram, students formulated questions such as: “I have a blue cube (boy) behind my back. Do you think I am likely to be a math enthusiast?” Although there was no formal testing in this experiment, the authors stated that representing conditional probabilities via tinker towers in combination with tree-like layouts on the classroom floor helped students to work on Bayesian tasks.
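The equivalent-urn strategy the students used can be sketched as follows (an illustrative restatement of the urn comparison, not from the original experiment):

```python
# "Urn arithmetic": compare proportions by enlarging urns without
# changing the odds, until both urns have a common total.
u1 = (1, 2)   # urn U1: 1 red out of 2 cubes
u2 = (2, 5)   # urn U2: 2 red out of 5 cubes

# Scale each urn to a common total of 10 cubes (like 1 out of 2 = 2 out of 4):
u1_enlarged = (u1[0] * 5, u1[1] * 5)   # (5, 10)
u2_enlarged = (u2[0] * 2, u2[1] * 2)   # (4, 10)

better = "U1" if u1_enlarged[0] > u2_enlarged[0] else "U2"
print(f"With 10 cubes each: U1 has {u1_enlarged[0]} red, "
      f"U2 has {u2_enlarged[0]} red -> choose {better}")
```

Scaling to a common total lets the students compare red cubes by counting, without ever forming a fraction.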

Martignon and Krauss (2009) conducted a study in which they introduced a tool box for decision-making and reckoning with risk. This study was conducted in six grade 4 primary school classes. The students, aged 9 to 10, were confronted with a sequence of tasks and playful activities involving, e.g., elementary Bayesian reasoning [the “princess/mermaid task” presented in chapter Stochastics and Bayesian Reasoning in Primary School—Status Quo and Potential ( Latten et al., 2011 )] as well as the comparison of proportions and risks. One focus of the training was the Wason selection task, a logic puzzle about deductive reasoning. By following logical principles, students needed to determine which cards to flip over in order to test certain rules. Hence, this game bridges logical thinking and conditional probabilities. Furthermore, the primary school students played the game “Ludo” and were asked to compare different moves and the associated risks. The authors stated that these playful tasks and activities were fruitful. Again, this study confirmed that primary school students can successfully work on Bayesian tasks.

The study RIKO-STAT (e.g., Kuntze et al., 2010 ) assessed different competencies in the area of statistical literacy in a sample of primary school, secondary school, and university students. The tasks for the primary school students required them to apply, e.g., an elementary approach to expected values, risk reduction, and comparing proportions. The students were also confronted with the above-described Bayesian reasoning task addressing mermaids and princesses (chapter Stochastics and Bayesian Reasoning in Primary School—Status Quo and Potential). All in all, the students' performance showed considerable weaknesses, and hence the authors argued in favor of encouraging statistical and probabilistic thinking earlier and more deeply at school. Nevertheless, the authors reported that the primary school students performed well on the Bayesian tasks. Analyzing the primary students' strategies showed that many intuitively used an approach focusing on natural frequencies, which led to satisfying solution rates, whereas the secondary school students mostly used percentages and did not perform well. The authors assumed that the secondary school students would have performed better had they applied natural frequencies instead of percentages.

Based on the results from RIKO-STAT, researchers from Ludwigsburg University of Education and cognitive psychologists from the Harding Center for Risk Literacy in Berlin investigated the risk-related competencies of primary school students aged 9 to 10 ( Latten et al., 2011 ). In this intervention study, consisting of six lessons, the students were confronted with first elements of expected values, risk reduction, conditional probabilities, and comparisons of proportions. The authors reported significantly improved competencies due to the intervention.

The above-mentioned findings show that natural frequencies can be used to foster students' Bayesian reasoning. In the next section, we will outline the corresponding research desideratum of our study.

Research Desideratum

For several decades, there has been vast empirical evidence that many people have difficulties with Bayesian reasoning—even if they have high cognitive capacity and high numeracy (e.g., Kahneman et al., 1982 ; Sirota and Juanchich, 2011 ; McDowell and Jacobs, 2017 ). One idea for fostering Bayesian reasoning is to confront children with corresponding situations and tasks at a young age in order to develop valid intuitions. This idea is based on and supported by the considerations of the previous sections, which outlined (a) theoretically driven explanations for the intuitive character of natural frequencies, (b) empirical findings confirming their advantages compared to probabilities represented as percentages and, in particular, (c) empirical results indicating that natural frequencies can successfully be used at primary school, where percentages, ratios, and fractions are not explicitly addressed—at least not in Germany. In this perspective, the first research question of this study investigates how successfully primary school students solve specific Bayesian reasoning tasks represented in natural frequencies. The corresponding research question is:

• To what extent are students in grade 4 able to solve Bayesian reasoning tasks when the information is given in terms of natural frequencies?

Considering empirical evidence from prior research leads to the hypothesis that even young students can handle such tasks. This study aims at confirming these prior studies and enlarging them with quantitative evidence, as most of the cited studies do not provide quantitative results.

Moreover, based on the idea that primary school students can successfully work on Bayesian reasoning tasks via natural frequencies, the question arises whether and how primary school students can be supported in this regard. For this age group, a play- and activity-based approach appears adequate and could prepare a valid basis for further learning about Bayesian reasoning ( Martignon and Kurz-Milcke, 2006 ; Martignon and Krauss, 2009 ; see also Johnson and Tubau, 2015 ). The intervention of this study was conceived in this sense, as it involves playful learning with enactive representations like tinker cubes. The intervention is described in more detail in the Methods section. The corresponding research question focuses on evaluating the effectiveness of this intervention:

• How does a specific intervention affect primary students' performance in tasks related to conditional probabilities and Bayesian reasoning?

As numeracy has proven to be a predictor of Bayesian reasoning in prior research ( Johnson and Tubau, 2013 ), we will control for this covariate when investigating research question 2.

Previous studies have indicated that young students' Bayesian reasoning can be fostered through activities such as those in our intervention, but a statistical effect has often not been demonstrated empirically. In particular, most of the cited studies do not provide an experimental design enabling a quantitative evaluation of an intervention effect of using natural representations. This study closes this research gap and seeks to support the above-mentioned findings using a pretest-posttest design including a control group. In the following, we describe the method used in this study.

In this study, 244 grade 4 students (131 girls) aged between 8 and 12 years (M = 9.5, SD = 0.61) took part. The students came from 12 classes from six different schools in the surroundings of a medium-sized city in the south of Germany. Eight classes comprising 152 students formed the treatment group, and four classes comprising 92 students served as the control group (baseline). The classes were not assigned randomly to the different test conditions for pragmatic reasons (see Limitations Section). Each class had around 20 students. As conditional probabilities and Bayesian reasoning are usually taught in grade 10 or 11 at the earliest, the students had no previous school experience with these topics.

Design of the Study

In order to determine particular intervention effects, a pre-, post-, follow-up test design with a treatment and control group was chosen. All students from the treatment and control group completed the tests; however, only the students from the treatment group attended stochastics-specific lessons, whereas the students from the control classes attended general and non-stochastics-specific math lessons in the time between the testings. The pre- and posttests were administered directly before and after the intervention; the follow-up test was conducted 3 months after the posttest. These temporal distances were comparable in the treatment and control group.

The intervention effects were analyzed via multiple regression in SPSS 25. Covariates such as students' age, gender, and grades were collected. In this study, we control for the covariate “grades in Mathematics” as a safeguard against possible biases of the intervention effect due to the general mathematical competency represented by these grades. This appears to be important, as numeracy has been shown to be an influencing factor in Bayesian reasoning performance ( Sirota and Juanchich, 2011 ; Johnson and Tubau, 2013 ).

Intervention

The intervention included elements of several classroom experiments and studies which had been conducted before at the University of Education in Ludwigsburg as well as at the Max Planck Institute in Berlin ( Martignon and Kurz-Milcke, 2006 ; Martignon and Krauss, 2009 ; Latten et al., 2011 ). In particular, the intervention comprised tasks and activities related to risk and decisions under uncertainty that were also intended to foster first intuitions about expected values. In the first lesson, the students were confronted with a play-based simulation of the following trade-off: “Either you choose one candy bar for sure, or you can toss a coin. If you get heads, you win four candy bars. Otherwise, you go empty-handed.” In the second and third lessons, the focus was on proportional reasoning as well as on relative and absolute risks (see e.g., Till, 2014 , 2015 ). In the fourth lesson, the students were confronted with a typical Bayesian task during an ordinary 45-min lesson. Because of the focus of this article, we present the content of this lesson in more detail. The following task, adapted from the medical test problem (see chapter Stochastics and Bayesian Reasoning in Primary School—Status Quo and Potential), was discussed in this lesson:

“In a school yard, there are two girls—one with long hair and one with short hair. There are also eight boys—two with long hair and six with short hair. Imagine I told you that I talked with one of these children with long hair. Would you bet it was a girl?”

At the beginning of the lesson, the students were asked several questions about the distribution of different characteristics within their own class, such as “How many girls are in this class?” or “How many students play soccer in a sports club?” By doing so, the class itself was introduced as the population under consideration. Afterwards, the initial question relating to countable entities was turned into a probabilistic question: “Imagine someone picks one student out of your class. What is the probability that this person is a girl or a boy?” After some qualitative judgments addressing, for instance, terms such as “more likely,” the class made quantitative judgments formulated as frequencies (“8 out of 21”). In the sense of Bayesian reasoning, these statements can be understood as a-priori probabilities. After these preparatory activities, the task described above was introduced. In order to really understand this Bayesian task and to clarify the nested-sets structure of the problem, a little role play was performed: 10 students (two girls and eight boys) representing the characteristics described in the task were asked to line up in front of the class. The other students described the distribution of the characteristics in the two groups (girls and boys). By doing so, they were unknowingly introduced to natural frequencies: “2 out of 10 children are girls; 1 out of 2 girls has long hair, whereas 2 out of 8 boys have long hair.” Thus, the characteristic long hair is more typical for a girl. The teacher then asked: “I talked with one of these children with long hair. Would you bet it was a girl?” The class discussed the right answer. In order to make the situation more accessible, the teacher asked the students with long hair to take a step forward.
Now all students gave the right answer because they realized the nested-sets structure related to the characteristic “long hair.” Afterwards, the students used colored tinker cubes to encode the features boy, girl, long hair, and short hair in order to model the situation. By putting two cubes together, students were able to represent combined characteristics (e.g., a long-haired boy).
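The numbers in this task make the nested-sets logic easy to verify. A minimal sketch, using the frequencies from the role play:

```python
# The schoolyard task in natural frequencies: 10 children in total.
girls_long, girls_short = 1, 1  # 2 girls, 1 of them with long hair
boys_long, boys_short = 2, 6    # 8 boys, 2 of them with long hair

# Within each group, long hair is indeed more typical for a girl ...
p_long_given_girl = girls_long / (girls_long + girls_short)  # 1/2
p_long_given_boy = boys_long / (boys_long + boys_short)      # 1/4

# ... but among the long-haired children, boys outnumber girls,
# so the bet on "girl" should be declined.
long_haired = girls_long + boys_long          # 3 children step forward
p_girl_given_long = girls_long / long_haired  # 1/3

# Bayes' theorem gives the same posterior from the base rates:
p_girl, p_boy = 2 / 10, 8 / 10
posterior = (p_girl * p_long_given_girl) / (
    p_girl * p_long_given_girl + p_boy * p_long_given_boy)

print(p_long_given_girl, p_long_given_boy, p_girl_given_long)  # 0.5 0.25 ~0.333
```

The step-forward moment in the role play corresponds exactly to restricting attention to the `long_haired` subset, which is why the natural-frequency count and the formal Bayes computation agree.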

According to Diaz and Fuente (2007) , there are no standardized tests of (young) students' understanding of conditional probabilities and Bayesian reasoning. Therefore, test items were used that are comparable to those of Zhu and Gigerenzer (2006) . They were structured in the same way as the medical test problem ( Eddy, 1982 ; Cosmides and Tooby, 1996 ). However, different cover stories were created for the pre-, post-, and follow-up tests.

In order to illustrate the test in more detail, we present and describe two items in the following. The item FEU (see Figure 4 on the left) is characterized by the fact that students are first asked in sub-item (a) to determine the a-priori probability of the hypothesis that a student of a certain school comes from the city [P(H)]. Afterwards, they are asked in sub-item (b) to update this probability when new information is given, namely that the observed child has a mobile phone [P(H|D)]. Sub-item (a) draws the students' attention to the frequencies of children coming from the city and the village within the whole set. Sub-item (b) draws their attention to children from the city and the village within the subset of children having a mobile phone. As sub-item (a) might be considered a trigger to think about the nested-sets structure given in the task (which might also help students answer sub-item (b)), we label such items as “guided tasks.” In addition to such “guided” tasks, there are “non-guided” tasks (LaH) that are mathematically equivalent to the type-(b) sub-item of the “guided” tasks (see Figure 4 on the right). Here, however, students' attention is not drawn to the nested-sets structure by a preceding type-(a) sub-item. The students are asked about the a-posteriori probability relating to the number of princesses in the subset of individuals wearing a crown [P(H|D)] without being directed to the frequency of princesses in the whole set.

Figure 4 . On the left: “Guided” task ( FEU ); on the right: “Non-guided” task ( LaH ).

As mentioned above, we consider the “guided” task easier to solve because students are prompted to think about and determine the a-priori probability of a hypothesis and then update it into an a-posteriori probability when new information is gathered. This consideration is in line with nested-sets theory ( Girotto and Gonzalez, 2001 ; Barbey and Sloman, 2007 ), as students' attention is drawn to the nested-sets structure of the given situation. As the sample items illustrate, the tasks were written in short and comprehensible language to make sure that students of both groups (treatment and control) understood exactly what they were required to do. The pre-, post-, and follow-up tests all included items where the students (a) had to mark the right answer (single-choice format), (b) fill in the blanks with their answer, or (c) give an explanation for their answer. Hence, altogether there were six items, yielding a maximum score of six points. Tasks with missing values were coded as zero because the students had enough time to complete the tests.

Beyond tasks referring to Bayesian reasoning such as the ones presented, the test also included tasks involving, e.g., elementary comparisons of probabilities, proportions, and frequencies; trade-offs as first elements of expected values; and risk reductions. As these tasks are not addressed in this article, we do not report on them in more detail. More information about the test instrument can be found in Till (2015) . For ease of reading, in the following we label the test scores referring to the Bayesian reasoning items only as pre-, post-, and follow-up test scores .

In the following, we present the results of this study in two subsections: First, we report and analyze students' overall performance on the Bayesian reasoning tasks (both treatment and control groups) at the different times of testing (see research questions 1 and 2). Second, in order to investigate the intervention effects (research question 2) in more detail, we present the solution frequencies of the two items FEU and LaH that were introduced in the Methods Section.

The overall average Bayesian pretest score was 2.96 (SD = 1.48) out of 6 points. The students from the control group had significantly higher pretest scores than the students from the treatment group (M_treatment = 2.81, SD = 1.48; M_control = 3.22, SD = 1.44; t(242) = 2.11, p = 0.036, Cohen's d = 0.28). After the intervention, the students from the treatment group outperformed the students from the control group with a marginally significant p-value (M_treatment = 4.20, SD = 1.86; M_control = 3.75, SD = 1.74; t(225) = 2.24, p = 0.071, Cohen's d = 0.26). The increase from pre- to posttest was significant in both the treatment group (t(143) = −8.39, p < 0.001, Cohen's d = 0.83) and the control group (t(82) = 2.74, p = 0.008, Cohen's d = 0.33). After 3 months, the follow-up test scores of the treatment group were still higher (M_treatment = 3.84, SD = 1.86; M_control = 3.64, SD = 1.88), though this difference was not significant (t(226) = 0.7595, p = 0.448). Table 1 displays an overview of these results.
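The reported pretest comparison can be reconstructed from the group means, standard deviations, and sample sizes alone. A sketch using the rounded values reported above (so the results match up to rounding):

```python
from math import sqrt

def pooled_sd(sd1, n1, sd2, n2):
    """Pooled standard deviation for two independent groups."""
    return sqrt(((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / (n1 + n2 - 2))

# Reported pretest values: control (n = 92) vs. treatment (n = 152)
m_c, sd_c, n_c = 3.22, 1.44, 92
m_t, sd_t, n_t = 2.81, 1.48, 152

sp = pooled_sd(sd_c, n_c, sd_t, n_t)
d = (m_c - m_t) / sp                              # Cohen's d
t = (m_c - m_t) / (sp * sqrt(1 / n_c + 1 / n_t))  # independent-samples t

print(round(d, 2))  # 0.28, as reported; t lands near the reported 2.11
```

The small deviation of the recomputed t statistic from the published value of 2.11 is due to the rounding of the reported means, not a different formula.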

Table 1 . Average test scores of the treatment and control group.

In order to gain more insight into the intervention effects, a multiple regression was performed, also including the covariate grades in Mathematics (considered as representative of students' numeracy). Two models were compared (see Table 2 ): In the first model, the predictors pretest Bayes score and grades in Mathematics explained 17% of the variance of the posttest Bayes score (pretest predicting follow-up test: 23%). Both predictors proved significant, which means that, on average, students with good grades in Mathematics (considered as numeracy) and students with high pretest scores also achieved high posttest scores.

Table 2 . Prediction of the posttest results of the Bayesian tasks.

For the second model, the third predictor test condition (dummy-coded with 0 for the control group and 1 for the treatment group) explained an additional 2% of variance. Hence, 19% of the posttest results can be explained by the three predictors pretest score, grade in Mathematics , and test condition . The fact that the predictor test condition had a significant regression weight of 0.18 ( p < 0.01) indicates that the short treatment had a significant effect. Determining the effect size for pretest-posttest designs with treatment and control groups (corrected in the sense of Morris, 2008 ) indicated a medium effect size of d = 0.59. The findings related to the prediction of the 3-months-delayed follow-up test results were similar (see Table 3 ), although in this case the test condition was not significant.
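The corrected effect size can likewise be reconstructed from the reported group means, pretest SDs, and sample sizes. A sketch of the Morris (2008) pretest-posttest-control effect size, using the reported (rounded) values:

```python
from math import sqrt

def morris_d(m_pre_t, m_post_t, sd_pre_t, n_t,
             m_pre_c, m_post_c, sd_pre_c, n_c):
    """Pretest-posttest-control effect size (Morris, 2008): the difference of
    the two gain scores, scaled by the pooled pretest SD and a small-sample
    bias correction c_p."""
    sd_pooled = sqrt(((n_t - 1) * sd_pre_t**2 + (n_c - 1) * sd_pre_c**2)
                     / (n_t + n_c - 2))
    c_p = 1 - 3 / (4 * (n_t + n_c - 2) - 1)  # bias correction factor
    return c_p * ((m_post_t - m_pre_t) - (m_post_c - m_pre_c)) / sd_pooled

# Reported values: treatment (n = 152) and control (n = 92)
d_ppc = morris_d(2.81, 4.20, 1.48, 152,
                 3.22, 3.75, 1.44, 92)
print(round(d_ppc, 2))  # 0.59, the medium effect size reported above
```

Note that scaling by the pooled pretest SD deliberately keeps the treatment out of the denominator, which is why this index is preferred over a simple posttest Cohen's d in designs with non-equivalent groups.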

Table 3 . Prediction of the follow-up test results of the Bayesian tasks.

As mentioned above, and in order to gain more detailed insight into the intervention effect, we now present the solution frequencies for two concrete items. As we only consider two items, we do not use t-tests or other inferential statistics. The item FEU represents a so-called “guided” task, whereas the item LaH is a “non-guided” task (see Methods Section).

Figures 5 , 6 show the solution frequencies of the treatment and control groups on the two tasks. In the pretest, the majority of the students (68% in both the control and treatment groups) were able to complete the “guided” task FEU . Only about 23% of the students from the treatment group and 36% of the control group solved the “non-guided” task.

Figure 5 . Comparison of solution rates related to two different items (treatment group).

Figure 6 . Comparison of solution rates related to two different items (control group).

After the treatment, 64% of the students from the treatment group solved the “non-guided” task; in the control group, the solution frequency was 49%. The posttest solution rates for the “guided” task remained high in both groups (treatment group: 73%; control group: 66%).

The first, and perhaps most important, result of this study is the relatively high average pretest score of all students. Even without prior exposure to Bayesian text problems, the students on average achieved half of the maximum test score. This is even more meaningful when we consider the difficulties that adults (medical doctors, lawyers) have with such tasks ( Gigerenzer et al., 2008 ; Gaissmaier et al., 2012 ). One explanation for this finding might be the task's representation format, namely natural frequencies. Existing literature (e.g., Gigerenzer and Hoffrage, 1995 ; Sedlmeier and Gigerenzer, 2001 ; Hoffrage et al., 2002 ; Wassner, 2004 ; Zhu and Gigerenzer, 2006 ) shows that people benefit from working with natural frequencies when they have to solve probability-related tasks. This applies in particular to a special kind of probability task, the medical test problem, as difficult conditional probabilities and their inversions become easier to understand when presented in terms of natural frequencies. Barbey and Sloman (2007) explain that natural frequencies lead to a clear representation of the subset relationships (see also NST, e.g., Girotto and Gonzalez, 2001 ) and to a simplification of numerical calculations ( Sedlmeier, 2001 ; Sedlmeier and Gigerenzer, 2001 ; Wassner, 2004 ). Therefore, we assumed that this format might also be suitable for primary school. This assumption was confirmed by the present study.

Beyond the representation format of natural frequencies, another explanation for the rather strong average pretest scores might be the short and simple question format of our test instrument, which was evidently easy for the children to understand. In particular, this question format made the nested-sets structures underlying the tasks visible. In each task, a given set of individuals with certain attributes had to be considered and absolute numbers had to be compared. As the study shows, many students managed to solve the inversion of the conditional probability task even without the support of the intervention. These results are in line with findings from McDowell and Jacobs (2017) , according to which short and simple text formats as well as communication in terms of natural frequencies facilitate Bayesian reasoning tasks. Moreover, the comparison between the “guided” and “non-guided” tasks shows that the students of both groups had fewer problems with the “guided” task. This is even more impressive when we consider that the “guided” task was arithmetically more demanding than the “non-guided” one (“guided” task: a small school with 60 children; “non-guided” task: a castle with 10 women). In line with nested-sets theory ( Girotto and Gonzalez, 2001 ; Barbey and Sloman, 2007 ), this finding was to be expected, as the type-(a) sub-item of the “guided” task draws the students' attention to the nested-sets structure and hence makes it more visible. However, as these type-(a) sub-items do not draw the students' attention directly to the structure focused on in the type-(b) sub-items, this expectation had to be confirmed empirically. The higher pretest solution rates of both groups for the “guided” task confirm that making the nested-sets structure visible helps the students to solve the task.

In the following, we discuss the intervention effects. A comparison of the results after the intervention reveals a significant difference in students' performance by test condition. Directly after the intervention and even 3 months later, the students of the treatment group achieved higher test scores than their peers in the control group. Although the absolute differences between the two groups' average posttest scores were not large ( Table 1 ; similarly for the follow-up tests), the scores of the children in the treatment group showed a significantly larger increase from pre- to posttest, with a medium effect size ( Table 2 ). These results empirically confirm that young students' Bayesian reasoning can be fostered by a short intervention providing a first experience with natural frequencies and with modeling stochastic situations using tinker cubes. Hence, using natural frequencies once again proved appropriate as early as primary school. Moreover, the playful and hands-on intervention, including a role play and modeling nested-sets structures with tinker cubes, proved supportive for the students. This is in line with Vallée-Tourangeau et al. (2015) , who claim that making all sets and subsets explicit by enabling enactive activities related to the problem information substantially improves statistical reasoning. One reason for the rather moderate absolute differences between the treatment and control groups in the post- and follow-up test scores (see Table 1 ) might be that the maximum score was limited to 6, which means (together with the relatively high pretest scores) that there was little room for improvement. Another reason might be the short duration of the intervention of only one lesson. In such a short period, large improvements cannot be expected. However, the medium effect sizes allow us to be optimistic about the potential of this approach.

Comparing the intervention effects related to the “guided” and “non-guided” tasks shows that the solution rates for the “guided” task were relatively stable over time in both groups. However, within the treatment group, the solution rate for the “non-guided” task increased considerably, and even in the control group, higher posttest scores were recorded. We interpret this as follows: For the “guided” tasks, there was a kind of ceiling effect, leading to no substantial differences from pre- to posttest. Moreover, the intervention effect appears to be moderate on tasks where the nested-sets structure is already triggered by the task itself. In contrast, the intervention appears to support students' ability to recognize the nested-sets structure particularly in tasks where it is not triggered automatically. The fact that the students in the control group also increased their solution frequency on this task indicates that repeated engagement with (“guided” and “non-guided”) Bayesian reasoning tasks by itself supports students' performance. Hence, experience with nested-sets structures appears to help students develop their Bayesian reasoning. In our study, this development was particularly supported by a corresponding training using hands-on activities (and natural frequencies), but individual engagement with such tasks can also (moderately) improve the corresponding abilities. The slight improvement of the children in the control group is not limited to the “non-guided” tasks but can also be seen in the overall Bayesian reasoning score. This might be explained by familiarity with the test items or (subconscious) learning effects from working on them (possibly also including informal exchange among the participants between pre- and posttest). It also highlights once again the importance of using an appropriate representation format, which was also used in the test items.

Implications for Future Research

The idea of this article was to evaluate, in a sample of young students, the effect of a representation format that facilitates probabilistic reasoning, namely natural frequencies. In contrast to other studies, the focus was not on comparing different factors (e.g., representation format, task complexity, numeracy) and their influence on Bayesian reasoning performance. Instead, the intention was to show empirically that an activity-based and playful training can lead to better performance on Bayesian reasoning tasks. Our results show that even this short intervention had a medium-sized effect, which might be strengthened by a longer intervention. However, this expectation of a more substantial effect from a longer intervention needs to be tested empirically. Moreover, the test instrument used should be extended with more Bayesian reasoning tasks in order to gain more detailed insight into the effects of such a longer intervention.

Although this study confirmed that students' Bayesian reasoning can be fostered by an activity-based and playful training, it also raises issues for further research. For instance, we support the call for research that focuses on the following questions: “What strategies do participants pursue when solving Bayesian reasoning problems? Which aids are helpful for recognizing the nested-sets structure?” (e.g., playing cards, or modeling the subset relationships via tinker cubes). With this demand we join the research desideratum of McDowell and Jacobs (2017) as well as Vallée-Tourangeau et al. (2015) . This desideratum could be approached by qualitative studies in which students communicate their thoughts via interviews or open-ended questions while solving Bayesian reasoning problems.

Implication for Teaching Statistics in Primary and Secondary School

What are the consequences for teaching probability and statistics (in primary school)? Should we refrain from working with percentages and use only natural frequencies from now on? Of course not. In primary school, where fractions and percentages are not yet available, natural frequencies seem to be a suitable way to quantify probabilities at an early stage. From this perspective, our study shows that it is possible to teach Bayesian reasoning to primary school students when using natural frequencies. We consider such early and playful experiences with Bayesian reasoning important in order to establish a basis for more abstract contexts (e.g., the formal calculation of probabilities in general or Bayes' theorem). Although our study shows that the early fostering of Bayesian reasoning can be successful, we see two obstacles to its implementation at school: First, time is limited, and therefore teachers might put more emphasis on, e.g., arithmetic skills than on statistics. Second, in German primary schools, a considerable number of teachers did not study Mathematics as a main subject. These teachers in particular cannot draw on solid prerequisites for teaching Bayesian reasoning. Developing and implementing primary school teacher trainings could help to overcome both of these obstacles. In particular, teachers could learn about the importance and benefit of using natural frequencies in primary and secondary school: They allow the quantification of probabilities without using fractions and percentages. Furthermore, they also contribute to strengthening the concept of ratios and fractions at an early stage. Additionally, as our study shows, teachers can use them to introduce Bayesian reasoning at an early stage. For this purpose, hands-on activities such as using the described tinker cubes can also be introduced, which illustrates the playful character and the appropriateness of a teaching unit based on the ideas of our intervention for young students.
Such teacher trainings might at least help to overcome the prejudice that statistics and Bayesian reasoning are per se too difficult for primary school. In the long term, such teacher trainings and implementations of Bayesian reasoning in primary school might have the potential to increase the number of people making reasonable decisions under uncertainty. We are convinced that enhancing good decisions under uncertainty goes hand in hand with an appropriate statistics education at school.

Limitations

Even though the intervention had an effect on the students' understanding of conditional probabilities and Bayesian problems, there are some limitations relating to the design of the study. First, students who participated in a training were compared to students who had no training at all (baseline control group). Although no different treatments were tested against each other, comparing the treatment group to a baseline control group appears appropriate for evaluating the effectiveness of new ideas and learning approaches. Second, the classes were not assigned randomly to the test conditions. This is due to the fact that in Germany, school interventions hinge on the willingness of the teachers: Some teachers wanted their class to be part of the intervention, others only wanted to be part of the control group. In order not to deny any teacher participation in this study, these requests were accommodated. For this reason, and because we considered a large number of students in the treatment group more important than in the control group, the ratio between the groups is not perfectly balanced. In order to account for the different pretest scores in the treatment and control groups, this variable was controlled for in the multiple regression analysis. A multilevel analysis accounting for the hierarchically structured sample (classes/schools) was not carried out, as the sample of this study was not large enough. Further studies with bigger samples could take this hierarchical structure into account.

Data Availability Statement

The dataset used for the analyses presented in this article is available in the Supplementary Material .

Ethics Statement

Ethical review and approval was not required for the study on human participants in accordance with the local legislation and institutional requirements. Written informed consent to participate in this study was provided by the participants' legal guardian/next of kin.

Author Contributions

CT and US are accountable for the content of this article.

Funding

The authors were members of the Cooperative Research Training Group of the University of Education, Ludwigsburg, and the University of Tübingen, which was supported by the Ministry of Science, Research, and the Arts in Baden-Württemberg. A former version of this article was published in the context of the Ph.D. thesis of CT under the supervision of L. Martignon. The publication of this paper was funded by the Open Access Fund of the University of Koblenz-Landau.

Conflict of Interest

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Supplementary Material

The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/feduc.2020.00073/full#supplementary-material

References

Ayal, S., and Beyth-Marom, R. (2014). The effects of mental steps and compatibility on Bayesian reasoning. Judgment Decision Making 9, 226–242.

Barbey, A. K., and Sloman, S. A. (2007). Base-rate respect: from statistical formats to cognitive structures. Behav. Brain Sci. 30, 287–292. doi: 10.1017/S0140525X07001963

Bazerman, M., and Neale, M. (1986). “Heuristics and negotiation,” in Judgment and Decision Making: an Interdisciplinary Reader , eds H. Arkes, and K. Hammond (Cambridge: Cambridge University Press), 311–321.

Brase, G. L. (2008). Pictorial Representations in Statistical Reasoning . Available online at: http://www.k-state.edu/psych/research/documents/2009ACP.pdf (accessed February 13, 2020).

Corter, J. E., and Zahner, D. C. (2007). Use of external visual representations in probability problem solving. Stat. Educ. Res. J. 6, 22–50.

Cosmides, L., and Tooby, J. (1996). Are humans good intuitive statisticians after all? Rethinking some conclusions from the literature on judgment under uncertainty. Cognition 58, 1–73. doi: 10.1016/0010-0277(95)00664-8

Diaz, C., and Fuente, D. (2007). Assessing students' difficulties with conditional probability. Int. Electron. J. Math. Educ. 2, 128–148.

Diepgen, R., Kuypers, W., and Karlheinz, R. (1993). Mathematik Gymnasiale Oberstufe. Allgemeine Ausgabe/Stochastik: Grund- und Leistungskurs. Berlin: Cornelsen Verlag.

Eddy, D. M. (1982). “Probabilistic reasoning in clinical medicine: problems and opportunities,” in Judgment under Uncertainty: Heuristics and Biases , eds D. Kahneman, P. Slovic, and A. Tversky (Cambridge: Cambridge University Press), 335–347.

Gaissmaier, W., Wegwarth, O., Skopec, D., Müller, A.-S., Broschinski, S., and Politi, M. C. (2012). Numbers can be worth a thousand pictures: individual differences in understanding graphical and numerical representations of health-related information. Health Psychol. 31, 286–296. doi: 10.1037/a0024850

Garcia-Retamero, R., Galesic, M., and Gigerenzer, G. (2010). Do icon arrays help reduce denominator neglect? Med. Decision Making 30, 672–684. doi: 10.1177/0272989X10369000

Gigerenzer, G. (1991). How to make cognitive illusions disappear. Beyond “heuristics and biases”. Eur. Rev. Soc. Psychol. 2, 83–115. doi: 10.1080/14792779143000033

Gigerenzer, G., Gaissmaier, W., Kurz-Milcke, E., Schwartz, L. M., and Woloshin, S. (2008). Helping doctors and patients make sense of health statistics. Assoc. Psychol. Sci. 8, 53–96. doi: 10.1111/j.1539-6053.2008.00033.x

Gigerenzer, G., and Hoffrage, U. (1995). How to improve bayesian reasoning without instruction: frequency formats. Psychol. Rev. 102, 684–704. doi: 10.1037/0033-295X.102.4.684

Girotto, V., and Gonzalez, M. (2001). Solving probabilistic and statistical problems: a matter of information structure and question form. Cognition 78, 247–276. doi: 10.1016/S0010-0277(00)00133-5

Gould, S. (1992). Bully for Brontosaurus. Further Reflections in Natural History. London, UK: Penguin Books.

Hoffrage, U., Gigerenzer, G., Krauss, S., and Martignon, L. (2002). Representation facilitates reasoning: what natural frequencies are and what they are not. Cognition 84, 343–352. doi: 10.1016/S0010-0277(02)00050-1

Johnson, E. D., and Tubau, E. (2013). Words, numbers, and numeracy: diminishing individual differences in Bayesian reasoning. Learn. Individual Differ. 28, 34–40. doi: 10.1016/j.lindif.2013.09.004

Johnson, E. D., and Tubau, E. (2015). Comprehension and computation in Bayesian problem solving. Front. Psychol. 6:938. doi: 10.3389/fpsyg.2015.00938

Kahneman, D. (2011). Thinking, Fast and Slow . New York, NY: Farrar, Straus and Giroux.

Kahneman, D., Slovic, P., and Tversky, A. (1982). Judgment under uncertainty: Heuristics and biases . Cambridge: Cambridge University Press.

KMK (2004). Bildungsstandards im Fach Mathematik für den Primarbereich . Available online at: https://www.kmk.org/fileadmin/Dateien/veroeffentlichungen_beschluesse/2004/2004_10_15-Bildungsstandards-Mathe-Primar.pdf (accessed February 13, 2020).

Kuntze, S., Gundlach, M., Engel, J., and Martignon, L. (2010). “Aspects of statistical literacy between competency measures and indicators for conceptual knowledge – empirical research in the project ‘RIKO-STAT’,” in Proceedings of the 8th International Conference on Teaching Statistics (ICOTS8) (Ljubljana).

Kurz-Milcke, E., Gigerenzer, G., and Martignon, L. (2011). Risiken durchschauen: grafische und analoge Werkzeuge. Stochastik in der Schule 31, 8–16.

Latten, S., Martignon, L., Monti, M., and Multmeier, J. (2011). Die Förderung erster Kompetenzen für den Umgang mit Risiken bereits in der Grundschule. Ein Projekt von RIKO-STAT und dem Harding Center. Stochastik in der Schule , 31, 17–25.

Lesage, E., Navarrete, G., and De Neys, W. (2013). Evolutionary modules and Bayesian facilitation: The role of general cognitive resources. Think. Reason. 19, 27–53. doi: 10.1080/13546783.2012.713177

Lindmeier, A., and Reiss, L. (2014). Wahrscheinlichkeitsvergleich und inferenzstatistisches Schließen. Fähigkeiten von Kindern des 4. und 6. Schuljahres bei Basisproblemen aus dem Bereich Daten und Zufall. Math. Didactica , 37, 30–60.

Macchi, L. (1995). Pragmatic aspects of the base rate fallacy. Q J. Exp. Psychol. 48, 188–207. doi: 10.1080/14640749508401384

Martignon, L., and Krauss, S. (2009). Hands-on activities for fourth graders: a tool box for decision-making and reckoning with risk. Int. Electron. J. Math. Educ. 4, 227–258.

Martignon, L., and Kurz-Milcke, E. (2006). “Educating children in stochastic modeling: games with stochastic urns and colored tinker-cubes,” in: Proceedings of the 7th International Conference on Teaching Statistics (ICOTS7) (Salvador).

McDowell, M., and Jacobs, P. (2017). Meta-analysis of the effect of natural frequencies on Bayesian reasoning. Psychol. Bull. 143, 1273–1312. doi: 10.1037/bul0000126

Morris, S. B. (2008). Estimating effect sizes from pretest-posttest-control group designs. Organ. Res. Methods 11, 364–386. doi: 10.1177/1094428106291059

Multmeier, J. (2012). Representations Facilitate Bayesian Reasoning: Omputational Facilitation and Ecological Design Revisited. Berlin: Freie Universität Berlin, 2012 (unveröffentlichte Dissertation).

Piattelli-Palmarini, M. (1994). InevitableIillusions: How Mistakes of Reasons Rule our Minds . New York, NY: John Wiley.

Pighin, S., Tentori, K., Savadori, L., and Girotto, V. (2018). Fostering the understanding of positive test results. Ann. Behav. Med. 52 , 909–919. doi: 10.1093/abm/kax065

Pinker, S. (1997). How the Mind Works . New York, NY: W. W. Norton.

Samuels, R., Stich, S., and Bishop, M. (2002). Ending the rationality wars: how to make disputes about human reasoning disappear. In R. Elio (Ed.), Common sense, reasoning and rationality (pp. 311–321). New York: New York: OUP. doi: 10.1093/0195147669.003.0011

Scholz, R. W., and Waschescio, R. (1986). Kognitive Strategien von Kindern bei Zwei-Scheiben Rouletteaufgaben . In Beiträge zum Mathematikunterricht. Hannover: Franzbecker.

Sedlmeier, P. (2001). “Statistik ohne Formeln,” in Anregungen zum Stochastikunterricht: die NCTM-Standards 2000; Klassische und Bayessche Sichtweise im Vergleich , eds M. Borovcnik, J. Engel, and D. Wickmann (Berlin: Verlag Franzbecker), 83–95.

Sedlmeier, P., and Gigerenzer, G. (2001). Teaching bayesian reasoning in less than two hours. Exp. Psychol. General 130, 380–400. doi: 10.1037/0096-3445.130.3.380

Sirota, M., and Juanchich, M. (2011). Role of numeracy and cognitive reflection in Bayesian reasoning with natural frequencies. Studia Psychologica , 53, 151–161.

Sirota, M., Juanchich, M., and Hagmayer, Y. (2014). Ecological rationality or nested sets? Individual differences in cognitive processing predict Bayesian reasoning. Psychonomic Bull. Rev. 21, 198–204. doi: 10.3758/s13423-013-0464-6

Spiegelhalter, D., and Gage, J. (2014). “What can education learn from real-world communication of risk and uncertainty,” in Proceedings of the 8th International Conference on Teaching Statistics (ICOTS9) (Flagstaff), 1–7.

Spiegelhalter, D., Pearson, M., and Short, I. (2011). Visualizing uncertainty about the future. Science 333, 1393–1400. doi: 10.1126/science.1191181

Till, C. (2014). Fostering risk literacy in elementary school. Int. Electron. J. Math. Educ. 9, 85–98.

Till, C. (2015). Entwicklung der Vorstellungen von Grundschülerinnen und Grundschülern zu Risiko und Entscheidungen unter Unsicherheit (Dissertationsschrift) . Retrieved from Available online at: https://phbl-opus.phlb.de/frontdoor/deliver/index/docId/70/file/Dissertation+TILL+-+Risiko.pdf ( accessed on Febraury 13, 2020).

Vallée-Tourangeau, G., Abadie, M., and Vallée-Tourangeau, F. (2015). Interactivity fosters Bayesian reasoning without instruction. J. Exp. Psychol. 144, 581–603. doi: 10.1037/a0039161

Wassner, C. (2004). Förderung Bayesianischen Denkens. Kognitionspsychologische Grundlagen und didaktische Analysen (Dissertationsschrift, Hildesheim). doi: 10.1007/BF03339021

Zhu, L., and Gigerenzer, G. (2006). Children can solve bayesian problems: the role of representation in mental computation. Cognition 98, 287–308.

PubMed Abstract | Google Scholar

Keywords: natural frequencies, Bayesian reasoning, representations, empirical study, primary school

Citation: Till C and Sproesser U (2020) Frequency Formats: How Primary School Stochastics Profits From Cognitive Psychology. Front. Educ. 5:73. doi: 10.3389/feduc.2020.00073

Received: 31 October 2019; Accepted: 07 May 2020; Published: 30 June 2020.


Copyright © 2020 Till and Sproesser. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY) . The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

*Correspondence: Ute Sproesser, utesproesser@uni-koblenz.de

This article is part of the Research Topic

Psychology and Mathematics Education


5.2 - Writing Hypotheses

The first step in conducting a hypothesis test is to write the hypothesis statements that are going to be tested. For each test you will have a null hypothesis (\(H_0\)) and an alternative hypothesis (\(H_a\)).

When writing hypotheses there are three things that we need to know: (1) the parameter that we are testing, (2) the direction of the test (non-directional, right-tailed, or left-tailed), and (3) the value of the hypothesized parameter.

  • At this point we can write hypotheses for a single mean (\(\mu\)), paired means (\(\mu_d\)), a single proportion (\(p\)), the difference between two independent means (\(\mu_1-\mu_2\)), the difference between two proportions (\(p_1-p_2\)), a simple linear regression slope (\(\beta\)), and a correlation (\(\rho\)). 
  • The research question will give us the information necessary to determine if the test is two-tailed (e.g., "different from," "not equal to"), right-tailed (e.g., "greater than," "more than"), or left-tailed (e.g., "less than," "fewer than").
  • The research question will also give us the hypothesized parameter value. This is the number that goes in the hypothesis statements (i.e., \(\mu_0\) and \(p_0\)). For the difference between two groups, regression, and correlation, this value is typically 0.

Hypotheses are always written in terms of population parameters (e.g., \(p\) and \(\mu\)).  The tables below display all of the possible hypotheses for the parameters that we have learned thus far. Note that the null hypothesis always includes the equality (i.e., =).
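As a short illustration (the parameter values 50 and 0.25 here are hypothetical, chosen only for the example), the hypothesis pairs take forms such as:

```latex
% Two-tailed test for a single mean with hypothesized value \mu_0 = 50
H_0\colon \mu = 50 \qquad H_a\colon \mu \neq 50

% Right-tailed test for a single proportion with hypothesized value p_0 = 0.25
H_0\colon p = 0.25 \qquad H_a\colon p > 0.25

% Left-tailed test for the difference between two independent means
H_0\colon \mu_1 - \mu_2 = 0 \qquad H_a\colon \mu_1 - \mu_2 < 0
```

In each pair the null hypothesis carries the equality, and the direction of the alternative follows the wording of the research question.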

1.3 Frequency, Frequency Tables, and Levels of Measurement

Once you have a set of data, you will need to organize it so that you can analyze how frequently each datum occurs in the set. However, when calculating the frequency, you may need to round your answers so that they are as precise as possible.

Answers and Rounding Off

A simple way to round off answers is to carry your final answer one more decimal place than was present in the original data. Round off only the final answer. Do not round off any intermediate results, if possible. If it becomes necessary to round off intermediate results, carry them to at least twice as many decimal places as the final answer. Expect that some of your answers will vary from the text due to rounding errors.

It is not necessary to reduce most fractions in this course. Especially in Probability Topics , the chapter on probability, it is more helpful to leave an answer as an unreduced fraction.

Levels of Measurement

The way a set of data is measured is called its level of measurement . Correct statistical procedures depend on a researcher being familiar with levels of measurement. Not every statistical operation can be used with every set of data. Data can be classified into four levels of measurement. They are as follows (from lowest to highest level):

  • Nominal scale level
  • Ordinal scale level
  • Interval scale level
  • Ratio scale level

Data that is measured using a nominal scale is qualitative (categorical) . Categories, colors, names, labels, and favorite foods along with yes or no responses are examples of nominal level data. Nominal scale data are not ordered. For example, trying to classify people according to their favorite food does not make any sense. Putting pizza first and sushi second is not meaningful.

Smartphone companies are another example of nominal scale data. The data are the names of the companies that make smartphones, but there is no agreed upon order of these brands, even though people may have personal preferences. Nominal scale data cannot be used in calculations.

Data that is measured using an ordinal scale is similar to nominal scale data but there is a big difference. The ordinal scale data can be ordered. An example of ordinal scale data is a list of the top five national parks in the United States. The top five national parks in the United States can be ranked from one to five but we cannot measure differences between the data.

Another example of using the ordinal scale is a cruise survey where the responses to questions about the cruise are excellent , good , satisfactory , and unsatisfactory . These responses are ordered from the most desired response to the least desired. But the differences between two pieces of data cannot be measured. Like the nominal scale data, ordinal scale data cannot be used in calculations.

Data that is measured using the interval scale is similar to ordinal level data because it has a definite ordering but there is a difference between data. The differences between interval scale data can be measured though the data does not have a starting point.

Temperature scales like Celsius (C) and Fahrenheit (F) are measured by using the interval scale. In both temperature measurements, 40° is equal to 100° minus 60°. Differences make sense. But 0 degrees does not because, in both scales, 0 is not the absolute lowest temperature. Temperatures like –10 °F and –15 °C exist and are colder than 0.

Interval level data can be used in calculations, but one type of comparison cannot be done. 80 °C is not four times as hot as 20 °C (nor is 80 °F four times as hot as 20 °F). There is no meaning to the ratio of 80 to 20 (or four to one).

Data that is measured using the ratio scale takes care of the ratio problem and gives you the most information. Ratio scale data is like interval scale data, but it has a 0 point and ratios can be calculated. For example, four multiple choice statistics final exam scores are 80, 68, 20 and 92 (out of a possible 100 points). The exams are machine-graded.

The data can be put in order from lowest to highest 20, 68, 80, 92.

The differences between the data have meaning. The score 92 is more than the score 68 by 24 points. Ratios can be calculated. The smallest score is 0. So 80 is four times 20. The score of 80 is four times better than the score of 20.

Twenty students were asked how many hours they worked per day. Their responses, in hours, are as follows: 5, 6, 3, 3, 2, 4, 7, 5, 2, 3, 5, 6, 5, 4, 4, 3, 5, 2, 5, 3.

Table 1.12 lists the different data values in ascending order and their frequencies.

A frequency is the number of times a value of the data occurs. According to Table 1.12 , there are three students who work two hours, five students who work three hours, and so on. The sum of the values in the frequency column, 20, represents the total number of students included in the sample.

A relative frequency is the ratio (fraction or proportion) of the number of times a value of the data occurs in the set of all outcomes to the total number of outcomes. To find the relative frequencies, divide each frequency by the total number of students in the sample, in this case, 20. Relative frequencies can be written as fractions, percents, or decimals.

The sum of the values in the relative frequency column of Table 1.13 is 20/20, or 1.

Cumulative relative frequency is the accumulation of the previous relative frequencies. To find the cumulative relative frequencies, add all the previous relative frequencies to the relative frequency for the current row, as shown in Table 1.14 .

In the first row, the cumulative relative frequency is simply .15 because it is the only one. In the second row, the relative frequency was .25, so adding that to .15, we get a cumulative relative frequency of .40. Continue adding the relative frequencies in each row to get the rest of the column.

The last entry of the cumulative relative frequency column is one, indicating that one hundred percent of the data has been accumulated.

Because of rounding, the relative frequency column may not always sum to one, and the last entry in the cumulative relative frequency column may not be one. However, they each should be close to one.
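The three columns described above can be computed directly from the raw data. The following sketch (plain Python, not part of the original text) rebuilds the frequency, relative frequency, and cumulative relative frequency columns for the twenty work-hours responses given earlier:

```python
from collections import Counter

# Responses (hours worked per day) from the twenty students surveyed above
hours = [5, 6, 3, 3, 2, 4, 7, 5, 2, 3, 5, 6, 5, 4, 4, 3, 5, 2, 5, 3]

freq = Counter(hours)          # frequency: how often each data value occurs
total = sum(freq.values())     # 20 students in the sample

cumulative = 0.0
for value in sorted(freq):
    rel = freq[value] / total  # relative frequency for this value
    cumulative += rel          # running total of the relative frequencies
    print(f"{value} h: freq={freq[value]}, rel={rel:.2f}, cum={cumulative:.2f}")
```

The printed rows reproduce the table: three students work two hours (relative frequency .15), five work three hours (cumulative .40), and the final cumulative entry is 1.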

The following data are the heights (in inches to the nearest half inch) of 100 male semiprofessional soccer players. The heights are continuous data since height is measured. 60, 60.5, 61, 61, 61.5, 63.5, 63.5, 63.5, 64, 64, 64, 64, 64, 64, 64, 64.5, 64.5, 64.5, 64.5, 64.5, 64.5, 64.5, 64.5, 66, 66, 66, 66, 66, 66, 66, 66, 66, 66, 66.5, 66.5, 66.5, 66.5, 66.5, 66.5, 66.5, 66.5, 66.5, 66.5, 66.5, 67, 67, 67, 67, 67, 67, 67, 67, 67, 67, 67, 67, 67.5, 67.5, 67.5, 67.5, 67.5, 67.5, 67.5, 68, 68, 69, 69, 69, 69, 69, 69, 69, 69, 69, 69, 69.5, 69.5, 69.5, 69.5, 69.5, 70, 70, 70, 70, 70, 70, 70.5, 70.5, 70.5, 71, 71, 71, 72, 72, 72, 72.5, 72.5, 73, 73.5, 74

Table 1.15 summarizes the heights in this sample. Since heights are expressed in tenths, the frequency table will use labels measured in hundredths. This ensures that no data value will coincide with the upper or lower limit of an interval.

The data in this table have been grouped into the following intervals:

  • 59.95–61.95 inches
  • 61.95–63.95 inches
  • 63.95–65.95 inches
  • 65.95–67.95 inches
  • 67.95–69.95 inches
  • 69.95–71.95 inches
  • 71.95–73.95 inches
  • 73.95–75.95 inches

This example is used again in Descriptive Statistics , where the method used to compute the intervals will be explained.

In this sample, there are five players whose heights fall within the interval 59.95–61.95 inches, three players whose heights fall within the interval 61.95–63.95 inches, 15 players whose heights fall within the interval 63.95–65.95 inches, 40 players whose heights fall within the interval 65.95–67.95 inches, 17 players whose heights fall within the interval 67.95–69.95 inches, 12 players whose heights fall within the interval 69.95–71.95 inches, seven players whose heights fall within the interval 71.95–73.95 inches, and one player whose height falls within the interval 73.95–75.95 inches. All heights fall between the endpoints of an interval and not at the endpoints.
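Grouping the heights into these intervals can be done mechanically. The sketch below (plain Python, not part of the original text) stores the 100 heights compactly as value-count pairs and bins them against the interval boundaries from Table 1.15:

```python
import bisect

# The 100 players' heights from the sample above, stored as value -> count
height_counts = {60: 1, 60.5: 1, 61: 2, 61.5: 1, 63.5: 3, 64: 7, 64.5: 8,
                 66: 10, 66.5: 11, 67: 12, 67.5: 7, 68: 2, 69: 10, 69.5: 5,
                 70: 6, 70.5: 3, 71: 3, 72: 3, 72.5: 2, 73: 1, 73.5: 1, 74: 1}

# Interval boundaries from Table 1.15 (59.95-61.95, 61.95-63.95, ...)
edges = [59.95, 61.95, 63.95, 65.95, 67.95, 69.95, 71.95, 73.95, 75.95]

interval_freq = [0] * (len(edges) - 1)
for height, n in height_counts.items():
    # No height coincides with a boundary, so each lands in exactly one bin
    interval_freq[bisect.bisect_left(edges, height) - 1] += n

print(interval_freq)  # [5, 3, 15, 40, 17, 12, 7, 1]
```

The boundaries are deliberately set at hundredths so that no data value (recorded to the nearest half inch) can fall on an interval endpoint.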

Example 1.15

From Table 1.15 , find the percentage of heights that are less than 65.95 inches.

If you look at the first, second, and third rows, the heights are all less than 65.95 inches. There are 5 + 3 + 15 = 23 players whose heights are less than 65.95 inches. The percentage of heights less than 65.95 inches is then 23/100, or 23 percent. This percentage is the cumulative relative frequency entry in the third row.

Try It 1.15

Table 1.16 shows the amount, in inches, of annual rainfall in a sample of towns.

From Table 1.16 , find the percentage of rainfall that is less than 9.01 inches.

Example 1.16

From Table 1.15 , find the percentage of heights that fall between 61.95 and 65.95 inches.

Add the relative frequencies in the second and third rows: .03 + .15 = .18 or 18 percent.

Try It 1.16

From Table 1.16 , find the percentage of rainfall that is between 6.99 and 13.05 inches.

Example 1.17

Use the heights of the 100 male semiprofessional soccer players in Table 1.15 . Fill in the blanks and check your answers.

  • The percentage of heights that are from 67.95–71.95 inches is ________.
  • The percentage of heights that are from 67.95–73.95 inches is ________.
  • The percentage of heights that are more than 65.95 inches is ________.
  • The number of players in the sample who are between 61.95 and 71.95 inches tall is ________.
  • What kind of data are the heights?
  • Describe how you could gather this data (the heights) so that the data are characteristic of all male semiprofessional soccer players.

Remember, you count frequencies . To find the relative frequency, divide the frequency by the total number of data values. To find the cumulative relative frequency, add all of the previous relative frequencies to the relative frequency for the current row.

  • quantitative continuous
  • get rosters from each team and choose a simple random sample from each

Try It 1.17

From Table 1.16 , find the number of towns that have rainfall between 2.95 and 9.01 inches.

Collaborative Exercise

In your class, have someone conduct a survey of the number of siblings (brothers and sisters) each student has. Create a frequency table. Add to it a relative frequency column and a cumulative relative frequency column. Answer the following questions:

  • What percentage of the students in your class have no siblings?
  • What percentage of the students have from one to three siblings?
  • What percentage of the students have fewer than three siblings?

Example 1.18

Nineteen people were asked how many miles, to the nearest mile, they commute to work each day. The data are as follows: 2 ; 5 ; 7 ; 3 ; 2 ; 10 ; 18 ; 15 ; 20 ; 7 ; 10 ; 18 ; 5 ; 12 ; 13 ; 12 ; 4 ; 5 ; 10 . Table 1.17 was produced.

  • Is the table correct? If it is not correct, what is wrong?
  • True or False: Three percent of the people surveyed commute three miles. If the statement is not correct, what should it be? If the table is incorrect, make the corrections.
  • What fraction of the people surveyed commute five or seven miles?
  • What fraction of the people surveyed commute 12 miles or more? Less than 12 miles? Between five and 13 miles (not including five and 13 miles)?
  • No. The frequency column sums to 18, not 19. Not all cumulative relative frequencies are correct. The table entries for data values 2, 3, 10, and 18 are incorrect. This affects cumulative relative frequency for most values.
  • False. The frequency for three miles should be one; for two miles (left out), two. The cumulative relative frequency column should read .1052, .1579, .2105, .3684, .4737, .6316, .7368, .7895, .8421, .9474, 1.0000.
  • 7/19, 12/19, 7/19

Try It 1.18

Table 1.16 represents the amount, in inches, of annual rainfall in a sample of towns. What fraction of towns surveyed get between 11.03 and 13.05 inches of rainfall each year?

Example 1.19

Table 1.18 contains the total number of deaths worldwide as a result of earthquakes for the period from 2000 to 2012.

Answer the following questions:

  • What is the frequency of deaths measured from 2006 through 2009?
  • What percentage of deaths occurred after 2009?
  • What is the relative frequency of deaths that occurred in 2003 or earlier?
  • What is the percentage of deaths that occurred in 2004?
  • What kind of data are the numbers of deaths?
  • The Richter scale is used to quantify the energy produced by an earthquake. Examples of Richter scale numbers are 2.3, 4.0, 6.1, and 7.0. What kind of data are these numbers?
  • 97,118 (11.8 percent)
  • 41.6 percent
  • 67,092/823,356 or 0.081 or 8.1 percent
  • 27.8 percent
  • quantitative discrete

Try It 1.19

Table 1.19 contains the total number of fatal motor vehicle traffic crashes in the United States for the period from 1994–2011.

  • What is the frequency of deaths measured from 2000 through 2004?
  • What percentage of deaths occurred after 2006?
  • What is the relative frequency of deaths that occurred in 2000 or before?
  • What is the percentage of deaths that occurred in 2011?
  • What is the cumulative relative frequency for 2006? Explain what this number tells you about the data.



Want to cite, share, or modify this book? This book uses the Creative Commons Attribution License and you must attribute Texas Education Agency (TEA). The original material is available at: https://www.texasgateway.org/book/tea-statistics . Changes were made to the original material, including updates to art, structure, and other content updates.

Access for free at https://openstax.org/books/statistics/pages/1-introduction
  • Authors: Barbara Illowsky, Susan Dean
  • Publisher/website: OpenStax
  • Book title: Statistics
  • Publication date: Mar 27, 2020
  • Location: Houston, Texas
  • Book URL: https://openstax.org/books/statistics/pages/1-introduction
  • Section URL: https://openstax.org/books/statistics/pages/1-3-frequency-frequency-tables-and-levels-of-measurement

© Jan 23, 2024 Texas Education Agency (TEA). The OpenStax name, OpenStax logo, OpenStax book covers, OpenStax CNX name, and OpenStax CNX logo are not subject to the Creative Commons license and may not be reproduced without the prior and express written consent of Rice University.


Frequency format hypothesis explained

The frequency format hypothesis is the idea that the brain understands and processes information better when presented in frequency formats rather than a numerical or probability format. Thus according to the hypothesis, presenting information as 1 in 5 people rather than 20% leads to better comprehension. The idea was proposed by German scientist Gerd Gigerenzer , after compilation and comparison of data collected between 1976 and 1997.

Automatic encoding

Certain information about one's experience is often stored in memory using an implicit encoding process. Where did you sit last time in class? Do you say the word hello or charisma more? People are very good at answering such questions without actively thinking about it and without knowing how they got that information in the first place. This was the observation that led to Hasher and Zacks' 1979 study on frequency.

Through their research, Hasher and Zacks found that information about frequency is stored without any intention on the part of the person. [1] Moreover, training and feedback do not increase the ability to encode frequency. [2] Frequency information was also found to be continually registered in memory, regardless of age, ability, or motivation. [3] The ability to encode frequency does not decrease with old age, depression, or multiple task demands. [4] They called this characteristic of frequency encoding automatic encoding. [2]

Infant study

Another important piece of evidence for the hypothesis came from the study of infants. In one study, 40 newborn infants were tested on their ability to discriminate between 2 dots versus 3 dots and 4 dots versus 6 dots. [5] Although the infants were able to discriminate between 2 versus 3 dots, they were not able to distinguish between 4 versus 6 dots. The newborns tested were only 21 to 144 hours old.

Similarly in another study, to test whether infants could recognize numerical correspondences, Starkey et al. designed a series of experiments in which 6 to 8 month old infants were shown pairs of either a display of two objects or a display of three objects. [6] While the displays were still visible, infants heard either two or three drumbeats. Measurement of looking time revealed that the infants looked significantly longer toward the display that matched the number of sounds.

The contingency rule

Later on, Barbara A. Spellman from the University of Texas described human performance in determining cause and effect with the contingency rule ΔP, defined as

ΔP = P(E|C) − P(E|~C)

where P(E|C) is the probability of the effect given the presence of the proposed cause and P(E|~C) is the probability of the effect given the absence of the proposed cause. [7] Suppose we wish to evaluate the performance of a fertilizer, and the plants bloomed 15 out of 20 times when the fertilizer was used but only 5 out of 20 times in its absence. In this case

P(E|C) = 15/20 = 0.75
P(E|~C) = 5/20 = 0.25
ΔP = P(E|C) − P(E|~C) = 0.75 − 0.25 = 0.50

The resulting ΔP value is always bounded between −1 and 1. Although the contingency rule is a good model of how humans predict whether one event causes another, when it comes to predicting the outcomes of events with multiple causes there is a large deviation from the contingency rule, called the cue-interaction effect.
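The contingency rule above is a one-line computation; the sketch below (plain Python, not part of the original article) reproduces the fertilizer example:

```python
def delta_p(e_and_c, c_total, e_and_not_c, not_c_total):
    """Contingency rule: ΔP = P(E|C) - P(E|~C), computed from raw counts."""
    return e_and_c / c_total - e_and_not_c / not_c_total

# Fertilizer example: 15/20 plants bloomed with it, 5/20 without
print(delta_p(15, 20, 5, 20))  # 0.5
```

Because each term is a probability between 0 and 1, the difference is always bounded between −1 and 1.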

Cue-interaction-effect

In 1993, Baker, Mercer, and their team used video games to demonstrate this effect. Each test subject was given the task of helping a tank travel across a minefield using a button that sometimes worked correctly in camouflaging the tank and sometimes did not. [8] As a second cause, a spotter plane, either a friend or an enemy, would sometimes fly over the tank. After 40 trials, the test subjects were asked to evaluate the effectiveness of the camouflage and the plane in helping the tank through the minefield, giving each a number between -100 and 100.

Mathematically, there are two possible contingency values for the plane: either the plane was irrelevant to the tank's success, ΔP = 0 (the .5/0 condition), or the plane was relevant to the tank's success, ΔP = 1 (the .5/1 condition). Even though the ΔP for the camouflage is 0.5 in either condition, the test subjects judged the ΔP of the camouflage to be much higher in the .5/0 condition than in the .5/1 condition. The results are shown in the table below.

Gigerenzer contributions

Several experiments have shown that ordinary, and sometimes skilled, people commit basic probabilistic fallacies, especially in the case of Bayesian inference quizzes. [10] [11] [12] [13] Gigerenzer claims that the observed errors are consistent with the way we acquired mathematical abilities during the course of human evolution. [14] [15] He argues that the problem with these quizzes lies in the way the information is presented: it is given in percentages. [16] [17] Presenting the information in a frequency format, he argues, would help people solve these puzzles accurately, because the brain physiologically evolved to understand frequency information better than probability information. Thus, if Bayesian quizzes were posed in a frequency format, test subjects would perform better on them. Gigerenzer calls this idea the frequency format hypothesis in his paper "The psychology of good judgment: frequency formats and simple algorithms".

Supporting arguments

Evolutionary perspective

Gigerenzer argued that, from an evolutionary point of view, a frequency method was easier and more communicable than conveying information in a probability format. He argues that probability and percentages are rather recent forms of representation compared to frequency: the earliest known use of percentages as a form of representation dates to the seventeenth century. [18] He also argues that the frequency representation conveys more information. For instance, presenting data as 50 out of 100, in the frequency form, as opposed to 50%, in the probability format, gives the user more information about the sample size. This can in turn make the data and results more reliable and more appealing.

Elaborate encoding

One explanation offered for why people prefer encounter frequencies is that frequencies give subjects vivid descriptions, whereas probabilities give them only a dry number. [19] In the case of frequency, therefore, subjects have more recall cues. This could in turn mean that frequency encounters are remembered by the brain more readily than probability numbers, which may be why people in general intuitively prefer frequency-based choices over probability-based choices.

Sequential input

Yet another explanation offered by the authors is that frequencies are often encountered multiple times, as a sequential input, whereas a probability value is given all at once. According to John Medina's Brain Rules, sequential input can lead to a stronger memory than a one-time input. This may be a primary reason why humans prefer frequency encounters over probability. [20]

Easier storage

Another rationale provided in justifying the frequency format hypothesis is that using frequencies makes it easier to keep track of and update a database of events. For example, if an event happened 3 out of 6 times, the probability format would store this as 50%, whereas the frequency format stores it as 3 out of 6. Now suppose the event does not happen this time: the frequency format is simply updated to 3 out of 7, whereas updating the probability format is much harder.
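The storage argument can be made concrete: a pair of counts updates by incrementing, while a stored probability alone loses the sample size needed to update it. A minimal sketch (plain Python, not part of the original article):

```python
# Frequency format: keep both counts, update by incrementing
occurred, trials = 3, 6        # "3 out of 6"
trials += 1                    # the event does not happen this time
print(f"{occurred} out of {trials}")  # 3 out of 7

# Probability format: a stored 50% alone is not updatable -- without the
# original sample size there is no way to tell whether the new value
# should be 3/7, 30/61, or anything else consistent with 50%.
p = occurred / trials          # recoverable only because the counts were kept
```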

Classifying information

Frequency representation can also be helpful in keeping track of classes and statistical information. Picture a scenario where 500 out of every 1000 people die due to lung cancer; however, 40 of those 1000 were smokers, and 20 of those 40 had a genetic condition predisposing them to lung cancer. Such class division and information storage can only be done in a frequency format, since a single probability of having lung cancer carries no class information and does not allow such quantities to be calculated.

Refuting arguments

Nested-sets hypothesis

Frequency-format studies tend to share a confound: when presenting frequency information, the researchers also make clear the reference class they are referring to. For example, consider these three ways of formulating the same problem: [21]

Probability Format

"Consider a test to detect a disease that a given American has a 1/1000 chance of getting. An individual that does not have the disease has a 50/1000 chance of testing positive. An individual that does have the disease will definitely test positive.

What is the chance that a person found to have a positive result actually has the disease, assuming that you know nothing about the person’s symptoms or signs? _____%"

Frequency Format

"One out of every 1000 Americans has disease X. A test has been developed to detect when a person has disease X. Every time the test is given to a person who has the disease, the test comes out positive. But sometimes the test also comes out positive when it is given to a person who is completely healthy. Specifically, out of every 1000 people who are perfectly healthy, 50 of them test positive for the disease.

Imagine we have assembled a random sample of 1000 Americans. They were selected by lottery. Those who conducted the lottery had no information about the health status of any of these people.

Given the information above, on average, how many people who test positive for the disease actually have the disease? _____out of_____."

Probability Format Highlighting Set-Subset Structure of the Problem

"The prevalence of disease X among Americans is 1/1000. A test has been developed to detect when a person has disease X. Every time the test is given to a person who has the disease, the test comes out positive. But sometimes the test also comes out positive when it is given to a person who is completely healthy. Specifically, the chance is 50/1000 that someone who is perfectly healthy would test positive for the disease.

Imagine we have just given the test to a random sample of Americans. They were selected by lottery. Those who conducted the lottery had no information about the health status of any of these people.

What is the chance that a person found to have a positive result actually has the disease? _____%"

All three problems make clear that 1 in 1000 Americans has the disease, that the test has perfect sensitivity (100% of people with the disease receive a positive test), and that 50/1000 healthy people receive a positive test (i.e., false positives). However, the latter two formats additionally highlight the separate classes within the population (e.g., positive test with the disease, positive test without the disease, negative test without the disease), and therefore make it easier for people to choose the correct class (people with a positive test) to reason with, generating something close to the correct answer of 1/51, or roughly 2%. Both the frequency format and the probability format that highlights the set-subset structure lead to similar rates of correct answers, whereas the plain probability format leads to fewer correct answers, as people are likely to rely on the incorrect class in that case. Research has also shown that performance in the frequency format can be reduced by disguising the set-subset relationships in the problem (just as in the standard probability format), demonstrating that it is not, in fact, the frequency format, but the highlighting of the set-subset structure, that improves judgments.
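The "correct class" reasoning above is a two-line calculation once the problem is expressed in counts; the sketch below (plain Python, not part of the original article) uses the numbers from the disease-X problem:

```python
# Counts per 1000 Americans tested, from the disease-X problem above
diseased = 1           # all of them test positive (perfect sensitivity)
false_positives = 50   # healthy people who nonetheless test positive

positive_tests = diseased + false_positives  # 51 positives in total
posterior = diseased / positive_tests        # P(disease | positive test)
print(round(posterior, 3))  # 0.02, i.e. roughly 1 in 51
```

The natural-frequency framing makes the relevant class (the 51 people with a positive test) explicit, which is exactly what the set-subset highlighting achieves in the probability format.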

Ease of comparison

Critics of the frequency format hypothesis argue that probability formats often allow much easier comparison than frequency formats. In some cases a frequency format does permit easy comparison: if team A wins 19 of its 29 games and team B wins 10 of its 29 games, one can see at a glance that team A is the better team. Comparison in frequency format is not always this clear, however. If team A won 19 of its 29 games and team B won 6 of its 11 games, the comparison becomes much harder in frequency format. In the probability format, one can simply note that 65.5% (19/29) is greater than 54.5% (6/11) and compare the two teams directly.

Memory burden

Tooby and Cosmides argued that frequency representation makes it easier to update one's data each time new information arrives. [22] However, this involves updating two numbers. Returning to the example of the teams, if team A wins its 30th game, both the number of games won (19 → 20) and the number of games played (29 → 30) must be updated. In the case of probability, the only number to be updated is the single percentage. Moreover, that percentage could be updated once over the course of 10 games instead of after each game, which cannot be done in the frequency format.
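The two bookkeeping schemes can be contrasted in a short sketch (the team record figures follow the example above):

```python
# Frequency format: the record is a pair of counters, and a new win
# updates both of them.
wins, games = 19, 29
wins, games = wins + 1, games + 1          # team A wins its 30th game
print(wins, games)                         # → 20 30

# Probability format: the record is a single rate; note, though, that
# updating it incrementally still needs the number of games played so far.
rate, played = 19 / 29, 29
rate = (rate * played + 1) / (played + 1)  # fold in one more win
played += 1
print(round(rate, 3))                      # → 0.667
```

The sketch shows both sides of the argument: the frequency format touches two numbers per update, while the probability format stores one number but cannot be updated exactly without also tracking the count.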

Notes and References

  • Hasher, L., & Zacks, R. (1984). Automatic processing of fundamental information: The case of frequency of occurrence. The American Psychologist, 39(12), 1372–1388.
  • Hasher, L., & Zacks, R. T. (1979). Automatic and effortful processes in memory. Journal of Experimental Psychology: General, 108(3), 356–388.
  • Hasher, L., & Chromiak, W. (1977). The processing of frequency information: An automatic mechanism? Journal of Verbal Learning and Verbal Behavior, 16(2), 173–184.
  • Antell, S. E., & Keating, D. P. (1983). Perception of numerical invariance in neonates. Child Development, 54(3), 695–701.
  • Starkey, P., Spelke, E., & Gelman, R. (1990). Numerical abstraction by human infants. Cognition, 36(2), 97–127.
  • Spellman, B. A. (1996). Acting as intuitive scientists: Contingency judgments are made while controlling for alternative potential causes. Psychological Science, 7(6), 337–342.
  • Baker, A. G., Mercier, P., Vallée-Tourangeau, F., Frank, R., & Pan, M. (1993). Selective associations and causality judgments: Presence of a strong causal factor may reduce judgments of a weaker one. Journal of Experimental Psychology: Learning, Memory, and Cognition, 19(2), 414–432.
  • Baker, A. G., & Murphy, R. A. (1996). Associative and normative models of causal induction: Reacting to versus understanding cause. In D. R. Shanks, D. L. Medin, & K. J. Holyoak (Eds.), Psychology of Learning and Motivation (Vol. 34, pp. 1–45). Academic Press.
  • Sloman, S. A., Over, D., Slovak, L., & Stibel, J. M. (2003). Frequency illusions and other fallacies. Organizational Behavior and Human Decision Processes, 91(2), 296–309.
  • Birnbaum, M. H., & Mellers, B. A. (1983). Bayesian inference: Combining base rates with opinions of sources who vary in credibility. Journal of Personality and Social Psychology, 45(4), 792–804.
  • Murphy, G. L., & Ross, B. H. (2010). Uncertainty in category-based induction: When do people integrate across categories? Journal of Experimental Psychology: Learning, Memory, and Cognition, 36(2), 263–276.
  • Sirota, M., & Juanchich, M. (2011). Role of numeracy and cognitive reflection in Bayesian reasoning with natural frequencies. Studia Psychologica, 53(2), 151–161.
  • Gigerenzer, G. (1996). The psychology of good judgment: Frequency formats and simple algorithms. Medical Decision Making, 16(3), 273–280.
  • Gigerenzer, G. (2002). Calculated risks: How to know when numbers deceive you. New York: Simon & Schuster.
  • Daston, L., & Gigerenzer, G. (1989). The problem of irrationality. Science, 244(4908), 1094–1095.
  • Reyna, V. F., & Brainerd, C. J. (2008). Numeracy, ratio bias, and denominator neglect in judgments of risk and probability. Learning and Individual Differences, 18(1), 89–107.
  • Hacking, I. (1986). The emergence of probability: A philosophical study of early ideas about probability, induction and statistical inference. Cambridge: Cambridge University Press.
  • Obrecht, N. A., Chapman, G. B., & Gelman, R. (2009). An encounter frequency account of how experience affects likelihood estimation. Memory & Cognition, 37(5), 632–643.
  • Medina, J. (2010). Brain rules: 12 principles for surviving and thriving at work, home, and school. Seattle, WA: Pear Press.
  • Cosmides, L., & Tooby, J. (1996). Are humans good intuitive statisticians after all? Rethinking some conclusions from the literature on judgment under uncertainty. Cognition, 58(1), 1–73.

This article is licensed under the GNU Free Documentation License . It uses material from the Wikipedia article " Frequency format hypothesis ".


Hypothesis Testing - Chi Squared Test

Lisa Sullivan, PhD

Professor of Biostatistics

Boston University School of Public Health


Introduction

This module will continue the discussion of hypothesis testing, where a specific statement or hypothesis is generated about a population parameter, and sample statistics are used to assess the likelihood that the hypothesis is true. The hypothesis is based on available information and the investigator's belief about the population parameters. The specific tests considered here are called chi-square tests and are appropriate when the outcome is discrete (dichotomous, ordinal or categorical). For example, in some clinical trials the outcome is a classification such as hypertensive, pre-hypertensive or normotensive. We could use the same classification in an observational study such as the Framingham Heart Study to compare men and women in terms of their blood pressure status - again using the classification of hypertensive, pre-hypertensive or normotensive status.  

The technique to analyze a discrete outcome uses what is called a chi-square test. Specifically, the test statistic follows a chi-square probability distribution. We will consider chi-square tests here with one, two and more than two independent comparison groups.

Learning Objectives

After completing this module, the student will be able to:

  • Perform chi-square tests by hand
  • Appropriately interpret results of chi-square tests
  • Identify the appropriate hypothesis testing procedure based on type of outcome variable and number of samples

Tests with One Sample, Discrete Outcome

Here we consider hypothesis testing with a discrete outcome variable in a single population. Discrete variables are variables that take on more than two distinct responses or categories and the responses can be ordered or unordered (i.e., the outcome can be ordinal or categorical). The procedure we describe here can be used for dichotomous (exactly 2 response options), ordinal or categorical discrete outcomes and the objective is to compare the distribution of responses, or the proportions of participants in each response category, to a known distribution. The known distribution is derived from another study or report and it is again important in setting up the hypotheses that the comparator distribution specified in the null hypothesis is a fair comparison. The comparator is sometimes called an external or a historical control.   

In one sample tests for a discrete outcome, we set up our hypotheses against an appropriate comparator. We select a sample and compute descriptive statistics on the sample data. Specifically, we compute the sample size (n) and the proportions of participants in each response category.

Test Statistic for Testing H 0 : p 1 = p 10 , p 2 = p 20 , ..., p k = p k0

The test statistic is χ² = Σ (O − E)² / E, summed over the k response categories, and we find the critical value in a table of probabilities for the chi-square distribution with degrees of freedom (df) = k-1. In the test statistic, O = observed frequency and E = expected frequency in each of the response categories. The observed frequencies are those observed in the sample and the expected frequencies are computed as described below. χ² (chi-square) is a probability distribution that ranges from 0 to ∞. The test statistic above is appropriate for large samples, defined as expected frequencies of at least 5 in each of the response categories.

When we conduct a χ 2 test, we compare the observed frequencies in each response category to the frequencies we would expect if the null hypothesis were true. These expected frequencies are determined by allocating the sample to the response categories according to the distribution specified in H 0 . This is done by multiplying the observed sample size (n) by the proportions specified in the null hypothesis (p 10 , p 20 , ..., p k0 ). To ensure that the sample size is appropriate for the use of the test statistic above, we need to ensure the following: min(np 10 , np 20 , ..., np k0 ) > 5.

The test of hypothesis with a discrete outcome measured in a single sample, where the goal is to assess whether the distribution of responses follows a known distribution, is called the χ 2 goodness-of-fit test. As the name indicates, the idea is to assess whether the pattern or distribution of responses in the sample "fits" a specified population (external or historical) distribution. In the next example we illustrate the test. As we work through the example, we provide additional details related to the use of this new test statistic.  

A University conducted a survey of its recent graduates to collect demographic and health information for future planning purposes as well as to assess students' satisfaction with their undergraduate experiences. The survey revealed that a substantial proportion of students were not engaging in regular exercise, many felt their nutrition was poor and a substantial number were smoking. In response to a question on regular exercise, 60% of all graduates reported getting no regular exercise, 25% reported exercising sporadically and 15% reported exercising regularly as undergraduates. The next year the University launched a health promotion campaign on campus in an attempt to increase health behaviors among undergraduates. The program included modules on exercise, nutrition and smoking cessation. To evaluate the impact of the program, the University again surveyed graduates and asked the same questions. The survey was completed by 470 graduates and the following data were collected on the exercise question: No Regular Exercise, 255; Sporadic Exercise, 125; Regular Exercise, 90 (Total: 470).

Based on the data, is there evidence of a shift in the distribution of responses to the exercise question following the implementation of the health promotion campaign on campus? Run the test at a 5% level of significance.

In this example, we have one sample and a discrete (ordinal) outcome variable (with three response options). We specifically want to compare the distribution of responses in the sample to the distribution reported the previous year (i.e., 60%, 25%, 15% reporting no, sporadic and regular exercise, respectively). We now run the test using the five-step approach.  

  • Step 1. Set up hypotheses and determine level of significance.

The null hypothesis again represents the "no change" or "no difference" situation. If the health promotion campaign has no impact then we expect the distribution of responses to the exercise question to be the same as that measured prior to the implementation of the program.

H 0 : p 1 =0.60, p 2 =0.25, p 3 =0.15,  or equivalently H 0 : Distribution of responses is 0.60, 0.25, 0.15  

H 1 :   H 0 is false.          α =0.05

Notice that the research hypothesis is written in words rather than in symbols. The research hypothesis as stated captures any difference in the distribution of responses from that specified in the null hypothesis. We do not specify a specific alternative distribution, instead we are testing whether the sample data "fit" the distribution in H 0 or not. With the χ 2 goodness-of-fit test there is no upper or lower tailed version of the test.

  • Step 2. Select the appropriate test statistic.  

The test statistic is: χ² = Σ (O − E)² / E.

We must first assess whether the sample size is adequate. Specifically, we need to check min(np 10 , np 20 , ..., np k0 ) > 5. The sample size here is n=470 and the proportions specified in the null hypothesis are 0.60, 0.25 and 0.15. Thus, min(470(0.60), 470(0.25), 470(0.15)) = min(282, 117.5, 70.5) = 70.5. The sample size is more than adequate so the formula can be used.

  • Step 3. Set up decision rule.  

The decision rule for the χ 2 test depends on the level of significance and the degrees of freedom, defined as degrees of freedom (df) = k-1 (where k is the number of response categories). If the null hypothesis is true, the observed and expected frequencies will be close in value and the χ 2 statistic will be close to zero. If the null hypothesis is false, then the χ 2 statistic will be large. Critical values can be found in a table of probabilities for the χ 2 distribution. Here we have df=k-1=3-1=2 and a 5% level of significance. The appropriate critical value is 5.99, and the decision rule is as follows: Reject H 0 if χ 2 > 5.99.

  • Step 4. Compute the test statistic.  

We now compute the expected frequencies using the sample size and the proportions specified in the null hypothesis. We then substitute the sample data (observed frequencies) and the expected frequencies into the formula for the test statistic identified in Step 2. The computations can be organized as follows.

Notice that the expected frequencies are taken to one decimal place and that the sum of the observed frequencies is equal to the sum of the expected frequencies. The test statistic is computed as follows: χ² = (255 − 282)²/282 + (125 − 117.5)²/117.5 + (90 − 70.5)²/70.5 = 2.59 + 0.48 + 5.39 = 8.46.

  • Step 5. Conclusion.  

We reject H 0 because 8.46 > 5.99. We have statistically significant evidence at α=0.05 to show that H 0 is false, or that the distribution of responses is not 0.60, 0.25, 0.15.  The p-value is p < 0.005.  
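The by-hand computation above can be checked in a few lines of Python (a sketch; the counts of 255 and 90 are stated in the text, and the sporadic count of 125 is inferred from the total of 470):

```python
# Chi-square goodness-of-fit for the exercise survey, done by hand.
observed = [255, 125, 90]                # no, sporadic, regular exercise
null_props = [0.60, 0.25, 0.15]          # prior-year distribution under H0

n = sum(observed)                        # 470
expected = [n * p for p in null_props]   # 282.0, 117.5, 70.5

chi_sq = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
print(round(chi_sq, 2))                  # → 8.46, which exceeds 5.99
```

Because 8.46 > 5.99 (the critical value for df = 2 at the 5% level), the code reproduces the Step 5 conclusion.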

In the χ 2 goodness-of-fit test, we conclude that either the distribution specified in H 0 is false (when we reject H 0 ) or that we do not have sufficient evidence to show that the distribution specified in H 0 is false (when we fail to reject H 0 ). Here, we rejected H 0 and concluded that the distribution of responses to the exercise question following the implementation of the health promotion campaign was not the same as the prior distribution. The test itself does not provide details of how the distribution has shifted. A comparison of the observed and expected frequencies will provide some insight into the shift (when the null hypothesis is rejected). Does it appear that the health promotion campaign was effective?

Consider the following: 

If the null hypothesis were true (i.e., no change from the prior year) we would have expected more students to fall in the "No Regular Exercise" category and fewer in the "Regular Exercise" category. In the sample, 255/470 = 54% reported no regular exercise and 90/470 = 19% reported regular exercise. Thus, there is a shift toward more regular exercise following the implementation of the health promotion campaign. There is evidence of a statistical difference, but is this a meaningful difference? Is there room for improvement?

The National Center for Health Statistics (NCHS) provided data on the distribution of weight (in categories) among Americans in 2002. The distribution was based on specific values of body mass index (BMI) computed as weight in kilograms over height in meters squared. Underweight was defined as BMI < 18.5, normal weight as BMI between 18.5 and 24.9, overweight as BMI between 25 and 29.9 and obese as BMI of 30 or greater. Americans in 2002 were distributed as follows: 2% Underweight, 39% Normal Weight, 36% Overweight, and 23% Obese. Suppose we want to assess whether the distribution of BMI is different in the Framingham Offspring sample. Using data from the n=3,326 participants who attended the seventh examination of the Offspring in the Framingham Heart Study we created the BMI categories as defined and observed the following: Underweight, 20; Normal Weight, 932; Overweight, 1,374; Obese, 1,000 (Total: 3,326).

  • Step 1.  Set up hypotheses and determine level of significance.

H 0 : p 1 =0.02, p 2 =0.39, p 3 =0.36, p 4 =0.23     or equivalently

H 0 : Distribution of responses is 0.02, 0.39, 0.36, 0.23

H 1 :   H 0 is false.        α=0.05

The formula for the test statistic is: χ² = Σ (O − E)² / E.

We must assess whether the sample size is adequate. Specifically, we need to check min(np 0 , np 1, ..., n p k ) > 5. The sample size here is n=3,326 and the proportions specified in the null hypothesis are 0.02, 0.39, 0.36 and 0.23. Thus, min( 3326(0.02), 3326(0.39), 3326(0.36), 3326(0.23))=min(66.5, 1297.1, 1197.4, 765.0)=66.5. The sample size is more than adequate, so the formula can be used.

Here we have df=k-1=4-1=3 and a 5% level of significance. The appropriate critical value is 7.81 and the decision rule is as follows: Reject H 0 if χ 2 > 7.81.

We now compute the expected frequencies using the sample size and the proportions specified in the null hypothesis. We then substitute the sample data (observed frequencies) into the formula for the test statistic identified in Step 2. We organize the computations in the following table.

The test statistic is computed as follows: χ² = (20 − 66.5)²/66.5 + (932 − 1297.1)²/1297.1 + (1374 − 1197.4)²/1197.4 + (1000 − 765.0)²/765.0 = 32.52 + 102.77 + 26.05 + 72.19 = 233.53.

We reject H 0 because 233.53 > 7.81. We have statistically significant evidence at α=0.05 to show that H 0 is false or that the distribution of BMI in Framingham is different from the national data reported in 2002, p < 0.005.  

Again, the χ 2   goodness-of-fit test allows us to assess whether the distribution of responses "fits" a specified distribution. Here we show that the distribution of BMI in the Framingham Offspring Study is different from the national distribution. To understand the nature of the difference we can compare observed and expected frequencies or observed and expected proportions (or percentages). The frequencies are large because of the large sample size, the observed percentages of patients in the Framingham sample are as follows: 0.6% underweight, 28% normal weight, 41% overweight and 30% obese. In the Framingham Offspring sample there are higher percentages of overweight and obese persons (41% and 30% in Framingham as compared to 36% and 23% in the national data), and lower proportions of underweight and normal weight persons (0.6% and 28% in Framingham as compared to 2% and 39% in the national data). Are these meaningful differences?
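The same by-hand computation can be run for the BMI example (a sketch; the observed counts of 20, 932, 1374 and 1000 are reconstructed from the percentages reported above, so treat them as an assumption):

```python
# Chi-square goodness-of-fit for the Framingham BMI distribution.
observed = [20, 932, 1374, 1000]       # under, normal, over, obese (assumed)
null_props = [0.02, 0.39, 0.36, 0.23]  # 2002 national distribution under H0

n = sum(observed)                      # 3326
expected = [n * p for p in null_props]
chi_sq = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# ≈ 233.6 with unrounded expected counts; the text's 233.53 uses
# expected frequencies rounded to one decimal place.
print(round(chi_sq, 1))
```

Either way the statistic vastly exceeds the critical value of 7.81 for df = 3, matching the conclusion in the text.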

In the module on hypothesis testing for means and proportions, we discussed hypothesis testing applications with a dichotomous outcome variable in a single population. We presented a test using a test statistic Z to test whether an observed (sample) proportion differed significantly from a historical or external comparator. The chi-square goodness-of-fit test can also be used with a dichotomous outcome and the results are mathematically equivalent.  

In the prior module, we considered the following example. Here we show the equivalence to the chi-square goodness-of-fit test.

The NCHS report indicated that in 2002, 75% of children aged 2 to 17 saw a dentist in the past year. An investigator wants to assess whether use of dental services is similar in children living in the city of Boston. A sample of 125 children aged 2 to 17 living in Boston are surveyed and 64 reported seeing a dentist over the past 12 months. Is there a significant difference in use of dental services between children living in Boston and the national data?

We presented the following approach to the test using a Z statistic. 

  • Step 1. Set up hypotheses and determine level of significance

H 0 : p = 0.75

H 1 : p ≠ 0.75                               α=0.05

We must first check that the sample size is adequate. Specifically, we need to check min(np 0 , n(1-p 0 )) = min(125(0.75), 125(1-0.75)) = min(93.75, 31.25) = 31.25. The sample size is more than adequate so the following formula can be used: Z = (p̂ − p 0 ) / √(p 0 (1 − p 0 )/n).

This is a two-tailed test, using a Z statistic and a 5% level of significance. Reject H 0 if Z < -1.960 or if Z > 1.960.

We now substitute the sample data into the formula for the test statistic identified in Step 2. The sample proportion is p̂ = 64/125 = 0.512, and the test statistic is Z = (0.512 − 0.75) / √(0.75(1 − 0.75)/125) = −6.15.


We reject H 0 because -6.15 < -1.960. We have statistically significant evidence at α=0.05 to show that there is a statistically significant difference in the use of dental services by children living in Boston as compared to the national data (p < 0.0001).

We now conduct the same test using the chi-square goodness-of-fit test. First, we summarize our sample data as follows: 64 children saw a dentist in the past 12 months and 61 did not (n=125).

H 0 : p 1 =0.75, p 2 =0.25     or equivalently H 0 : Distribution of responses is 0.75, 0.25 

We must assess whether the sample size is adequate. Specifically, we need to check min(np 10 , np 20 ) > 5. The sample size here is n=125 and the proportions specified in the null hypothesis are 0.75 and 0.25. Thus, min(125(0.75), 125(0.25)) = min(93.75, 31.25) = 31.25. The sample size is more than adequate so the formula can be used.

Here we have df=k-1=2-1=1 and a 5% level of significance. The appropriate critical value is 3.84, and the decision rule is as follows: Reject H 0 if χ 2 > 3.84. (Note that 1.96 2 = 3.84, where 1.96 was the critical value used in the Z test for proportions shown above.)

The test statistic is computed as follows: χ² = (64 − 93.75)²/93.75 + (61 − 31.25)²/31.25 = 9.44 + 28.32 = 37.8 (to one decimal place). (Note that (-6.15) 2 = 37.8, where -6.15 was the value of the Z statistic in the test for proportions shown above.)

We reject H 0 because 37.8 > 3.84. We have statistically significant evidence at α=0.05 to show that there is a statistically significant difference in the use of dental service by children living in Boston as compared to the national data.  (p < 0.0001). This is the same conclusion we reached when we conducted the test using the Z test above. With a dichotomous outcome, Z 2 = χ 2 !   In statistics, there are often several approaches that can be used to test hypotheses. 
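The equivalence Z² = χ² can be verified numerically (a sketch using the dentist example's figures):

```python
from math import sqrt

n, x, p0 = 125, 64, 0.75
p_hat = x / n                                    # 0.512

# Z test for one proportion.
z = (p_hat - p0) / sqrt(p0 * (1 - p0) / n)       # ≈ -6.15

# Chi-square goodness-of-fit on the same dichotomous outcome.
observed = [x, n - x]                            # [64, 61]
expected = [n * p0, n * (1 - p0)]                # [93.75, 31.25]
chi_sq = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# With a dichotomous outcome, chi-square equals Z squared.
print(round(z, 2), round(chi_sq, 2))             # → -6.15 37.76
```

The equality is exact algebraically, not just a numerical coincidence: for two categories, Σ (O − E)²/E reduces to n(p̂ − p0)²/(p0(1 − p0)), which is Z².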

Tests for Two or More Independent Samples, Discrete Outcome

Here we extend that application of the chi-square test to the case with two or more independent comparison groups. Specifically, the outcome of interest is discrete with two or more responses and the responses can be ordered or unordered (i.e., the outcome can be dichotomous, ordinal or categorical). We now consider the situation where there are two or more independent comparison groups and the goal of the analysis is to compare the distribution of responses to the discrete outcome variable among several independent comparison groups.  

The test is called the χ 2 test of independence and the null hypothesis is that there is no difference in the distribution of responses to the outcome across comparison groups. This is often stated as follows: The outcome variable and the grouping variable (e.g., the comparison treatments or comparison groups) are independent (hence the name of the test). Independence here implies homogeneity in the distribution of the outcome among comparison groups.    

The null hypothesis in the χ 2 test of independence is often stated in words as: H 0 : The distribution of the outcome is independent of the groups. The alternative or research hypothesis is that there is a difference in the distribution of responses to the outcome variable among the comparison groups (i.e., that the distribution of responses "depends" on the group). In order to test the hypothesis, we measure the discrete outcome variable in each participant in each comparison group. The data of interest are the observed frequencies (or number of participants in each response category in each group). The formula for the test statistic for the χ 2 test of independence is given below.

Test Statistic for Testing H 0 : Distribution of outcome is independent of groups

The test statistic is χ² = Σ (O − E)² / E, summed over all cells of the two-way table, and we find the critical value in a table of probabilities for the chi-square distribution with df=(r-1)*(c-1).

Here O = observed frequency, E=expected frequency in each of the response categories in each group, r = the number of rows in the two-way table and c = the number of columns in the two-way table.   r and c correspond to the number of comparison groups and the number of response options in the outcome (see below for more details). The observed frequencies are the sample data and the expected frequencies are computed as described below. The test statistic is appropriate for large samples, defined as expected frequencies of at least 5 in each of the response categories in each group.  

The data for the χ 2 test of independence are organized in a two-way table. The outcome and grouping variable are shown in the rows and columns of the table. The sample table below illustrates the data layout. The table entries (blank below) are the numbers of participants in each group responding to each response category of the outcome variable.

Table - Possible outcomes are listed in the columns; the groups being compared are listed in the rows.

In the table above, the grouping variable is shown in the rows of the table; r denotes the number of independent groups. The outcome variable is shown in the columns of the table; c denotes the number of response options in the outcome variable. Each combination of a row (group) and column (response) is called a cell of the table. The table has r*c cells and is sometimes called an r x c ("r by c") table. For example, if there are 4 groups and 5 categories in the outcome variable, the data are organized in a 4 X 5 table. The row and column totals are shown along the right-hand margin and the bottom of the table, respectively. The total sample size, N, can be computed by summing the row totals or the column totals. Similar to ANOVA, N does not refer to a population size here but rather to the total sample size in the analysis. The sample data can be organized into a table like the above. The numbers of participants within each group who select each response option are shown in the cells of the table and these are the observed frequencies used in the test statistic.

The test statistic for the χ 2 test of independence involves comparing observed (sample data) and expected frequencies in each cell of the table. The expected frequencies are computed assuming that the null hypothesis is true. The null hypothesis states that the two variables (the grouping variable and the outcome) are independent. The definition of independence is as follows:

 Two events, A and B, are independent if P(A|B) = P(A), or equivalently, if P(A and B) = P(A) P(B).

The second statement indicates that if two events, A and B, are independent then the probability of their intersection can be computed by multiplying the probability of each individual event. To conduct the χ 2 test of independence, we need to compute expected frequencies in each cell of the table. Expected frequencies are computed by assuming that the grouping variable and outcome are independent (i.e., under the null hypothesis). Thus, if the null hypothesis is true, using the definition of independence:

P(Group 1 and Response Option 1) = P(Group 1) P(Response Option 1).

 The above states that the probability that an individual is in Group 1 and their outcome is Response Option 1 is computed by multiplying the probability that person is in Group 1 by the probability that a person is in Response Option 1. To conduct the χ 2 test of independence, we need expected frequencies and not expected probabilities . To convert the above probability to a frequency, we multiply by N. Consider the following small example.

The data shown above are measured in a sample of size N=150. The frequencies in the cells of the table are the observed frequencies. If Group and Response are independent, then we can compute the probability that a person in the sample is in Group 1 and Response category 1 using:

P(Group 1 and Response 1) = P(Group 1) P(Response 1),

P(Group 1 and Response 1) = (25/150) (62/150) = 0.069.

Thus if Group and Response are independent we would expect 6.9% of the sample to be in the top left cell of the table (Group 1 and Response 1). The expected frequency is 150(0.069) = 10.4.   We could do the same for Group 2 and Response 1:

P(Group 2 and Response 1) = P(Group 2) P(Response 1),

P(Group 2 and Response 1) = (50/150) (62/150) = 0.138.

The expected frequency in Group 2 and Response 1 is 150(0.138) = 20.7.

Thus, the formula for determining the expected cell frequencies in the χ 2 test of independence is as follows:

Expected Cell Frequency = (Row Total * Column Total)/N.

The above computes the expected frequency in one step rather than computing the expected probability first and then converting to a frequency.  
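The shortcut can be applied directly to the two cells worked above (a sketch; only the row and column totals given in the text are used):

```python
# Expected cell frequencies via (row total * column total) / N.
N = 150
row_totals = {"Group 1": 25, "Group 2": 50}      # from the observed table
response1_total = 62                             # column total for Response 1

expected = {g: rt * response1_total / N for g, rt in row_totals.items()}
print(round(expected["Group 1"], 1))   # → 10.3 (the 10.4 above reflects
                                       #   rounding the probability to 0.069)
print(round(expected["Group 2"], 1))   # → 20.7
```

The one-step shortcut avoids the intermediate rounding of the probability, which is why it returns 10.3 rather than the 10.4 obtained in the two-step calculation.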

In a prior example we evaluated data from a survey of university graduates which assessed, among other things, how frequently they exercised. The survey was completed by 470 graduates. In the prior example we used the χ 2 goodness-of-fit test to assess whether there was a shift in the distribution of responses to the exercise question following the implementation of a health promotion campaign on campus. We specifically considered one sample (all students) and compared the observed distribution to the distribution of responses the prior year (a historical control). Suppose we now wish to assess whether there is a relationship between exercise on campus and students' living arrangements. As part of the same survey, graduates were asked where they lived their senior year. The response options were dormitory, on-campus apartment, off-campus apartment, and at home (i.e., commuted to and from the university). The data are shown below.

Based on the data, is there a relationship between exercise and students' living arrangements? Do you think where a person lives affects their exercise status? Here we have four independent comparison groups (living arrangement) and a discrete (ordinal) outcome variable with three response options. We specifically want to test whether living arrangement and exercise are independent. We will run the test using the five-step approach.

  • Step 1. Set up hypotheses and determine level of significance.

H 0 : Living arrangement and exercise are independent

H 1 : H 0 is false.                α=0.05

The null and research hypotheses are written in words rather than in symbols. The research hypothesis is that the grouping variable (living arrangement) and the outcome variable (exercise) are dependent or related.   

  • Step 2.  Select the appropriate test statistic.  

The test statistic is χ² = Σ (O − E)² / E, summed over all cells of the table. The condition for appropriate use of this test statistic is that each expected frequency is at least 5. In Step 4 we will compute the expected frequencies and we will ensure that the condition is met.

The decision rule depends on the level of significance and the degrees of freedom, defined as df = (r-1)(c-1), where r and c are the numbers of rows and columns in the two-way data table. The row variable is the living arrangement and there are 4 arrangements considered, thus r=4. The column variable is exercise and 3 responses are considered, thus c=3. For this test, df=(4-1)(3-1)=3(2)=6. Again, with χ 2 tests there are no upper, lower or two-tailed tests. If the null hypothesis is true, the observed and expected frequencies will be close in value and the χ 2 statistic will be close to zero. If the null hypothesis is false, then the χ 2 statistic will be large. The rejection region for the χ 2 test of independence is always in the upper (right-hand) tail of the distribution. For df=6 and a 5% level of significance, the appropriate critical value is 12.59 and the decision rule is as follows: Reject H 0 if χ 2 > 12.59.

We now compute the expected frequencies using the formula,

Expected Frequency = (Row Total * Column Total)/N.

The computations can be organized in a two-way table. The top number in each cell of the table is the observed frequency and the bottom number is the expected frequency.   The expected frequencies are shown in parentheses.

Notice that the expected frequencies are taken to one decimal place and that the sums of the observed frequencies are equal to the sums of the expected frequencies in each row and column of the table.  
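The cell-by-cell computation of expected frequencies can be vectorized. A sketch using NumPy with hypothetical observed counts (the survey's actual cell counts are not reproduced in this text; the numbers below are illustrative only, chosen to total 470):

```python
import numpy as np

# Hypothetical 4x3 observed table: rows = living arrangement,
# columns = exercise category. Illustrative counts only.
observed = np.array([
    [32, 12, 62],
    [74, 23, 28],
    [110, 29, 9],
    [39, 6, 46],
])

row_totals = observed.sum(axis=1)
col_totals = observed.sum(axis=0)
n = observed.sum()

# Expected Frequency = (Row Total * Column Total) / N, for every cell at once
expected = np.outer(row_totals, col_totals) / n

# As noted in the text, the row and column sums of the expected
# frequencies equal those of the observed frequencies
assert np.allclose(expected.sum(axis=1), row_totals)
assert np.allclose(expected.sum(axis=0), col_totals)

# The test statistic sums (O - E)^2 / E over all cells
chi_sq = ((observed - expected) ** 2 / expected).sum()
```

The outer product computes all r × c expected frequencies in one step, mirroring the hand computation organized in the two-way table.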

Recall in Step 2 a condition for the appropriate use of the test statistic was that each expected frequency is at least 5. This is true for this sample (the smallest expected frequency is 9.6) and therefore it is appropriate to use the test statistic.

We reject H 0 because 60.5 > 12.59. We have statistically significant evidence at α=0.05 to show that H 0 is false or that living arrangement and exercise are not independent (i.e., they are dependent or related), p < 0.005.

Again, the χ 2 test of independence is used to test whether the distribution of the outcome variable is similar across the comparison groups. Here we rejected H 0 and concluded that the distribution of exercise is not independent of living arrangement, or that there is a relationship between living arrangement and exercise. The test provides an overall assessment of statistical significance. When the null hypothesis is rejected, it is important to review the sample data to understand the nature of the relationship. Consider again the sample data. 

Because there are different numbers of students in each living situation, it makes the comparisons of exercise patterns difficult on the basis of the frequencies alone. The following table displays the percentages of students in each exercise category by living arrangement. The percentages sum to 100% in each row of the table. For comparison purposes, percentages are also shown for the total sample along the bottom row of the table.

From the above, it is clear that higher percentages of students living in dormitories and in on-campus apartments reported regular exercise (31% and 23%) as compared to students living in off-campus apartments and at home (10% each).  

Test Yourself

 Pancreaticoduodenectomy (PD) is a procedure that is associated with considerable morbidity. A study was recently conducted on 553 patients who had a successful PD between January 2000 and December 2010 to determine whether their Surgical Apgar Score (SAS) is related to 30-day perioperative morbidity and mortality. The table below gives the number of patients experiencing no, minor, or major morbidity by SAS category.  

Question: What would be an appropriate statistical test to examine whether there is an association between Surgical Apgar Score and patient outcome? Using 14.13 as the value of the test statistic for these data, carry out the appropriate test at a 5% level of significance. Show all parts of your test.

In the module on hypothesis testing for means and proportions, we discussed hypothesis testing applications with a dichotomous outcome variable and two independent comparison groups. We presented a test using a test statistic Z to test for equality of independent proportions. The chi-square test of independence can also be used with a dichotomous outcome and the results are mathematically equivalent.  

In the prior module, we considered the following example. Here we show the equivalence to the chi-square test of independence.

A randomized trial is designed to evaluate the effectiveness of a newly developed pain reliever designed to reduce pain in patients following joint replacement surgery. The trial compares the new pain reliever to the pain reliever currently in use (called the standard of care). A total of 100 patients undergoing joint replacement surgery agreed to participate in the trial. Patients were randomly assigned to receive either the new pain reliever or the standard pain reliever following surgery and were blind to the treatment assignment. Before receiving the assigned treatment, patients were asked to rate their pain on a scale of 0-10 with higher scores indicative of more pain. Each patient was then given the assigned treatment and after 30 minutes was again asked to rate their pain on the same scale. The primary outcome was a reduction in pain of 3 or more scale points (defined by clinicians as a clinically meaningful reduction). The following data were observed in the trial.

We tested whether there was a significant difference in the proportions of patients reporting a meaningful reduction (i.e., a reduction of 3 or more scale points) using a Z statistic, as follows. 

H 0 : p 1 = p 2    

H 1 : p 1 ≠ p 2                             α=0.05

Here the new or experimental pain reliever is group 1 and the standard pain reliever is group 2.

We must first check that the sample size is adequate. Specifically, we need to ensure that we have at least 5 successes and 5 failures in each comparison group or that: min[n1p̂1, n1(1-p̂1), n2p̂2, n2(1-p̂2)] ≥ 5.

In this example, we have

Therefore, the sample size is adequate, so the following formula can be used: Z = (p̂1 - p̂2) / sqrt( p̂(1-p̂)(1/n1 + 1/n2) ), where p̂ is the overall (pooled) proportion of successes.

Reject H 0 if Z < -1.960 or if Z > 1.960.

We now substitute the sample data into the formula for the test statistic identified in Step 2. We first compute the overall proportion of successes: p̂ = (x1 + x2)/(n1 + n2).

We now substitute to compute the test statistic, which gives Z = 2.53.

  • Step 5.  Conclusion.  

We reject H 0 because 2.53 > 1.960. We have statistically significant evidence at α=0.05 to show that the proportions of patients reporting a meaningful reduction in pain differ between the two treatments.

We now conduct the same test using the chi-square test of independence.  

H 0 : Treatment and outcome (meaningful reduction in pain) are independent

H 1 :   H 0 is false.         α=0.05

The formula for the test statistic is χ 2 = Σ (O - E) 2 /E, where O and E denote the observed and expected frequencies in each cell.

For this test, df=(2-1)(2-1)=1. At a 5% level of significance, the appropriate critical value is 3.84 and the decision rule is as follows: Reject H0 if χ 2 > 3.84. (Note that 1.96 2 = 3.84, where 1.96 was the critical value used in the Z test for proportions shown above.)

We now compute the expected frequencies using Expected Frequency = (Row Total * Column Total)/N.

The computations can be organized in a two-way table. The top number in each cell of the table is the observed frequency and the bottom number is the expected frequency. The expected frequencies are shown in parentheses.

A condition for the appropriate use of the test statistic was that each expected frequency is at least 5. This is true for this sample (the smallest expected frequency is 22.0) and therefore it is appropriate to use the test statistic.

We reject H 0 because 6.4 > 3.84, the same conclusion reached with the Z test. (Note that (2.53) 2 = 6.4, where 2.53 was the value of the Z statistic in the test for proportions shown above.)
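The mathematical equivalence of the two tests can be verified numerically. A sketch with a hypothetical 2×2 table (not the trial's actual counts), comparing the two-sample Z statistic for proportions to the uncorrected chi-square statistic from `scipy.stats.chi2_contingency`:

```python
import numpy as np
from scipy.stats import chi2_contingency

# Hypothetical 2x2 trial table: rows = treatment (new, standard),
# columns = outcome (meaningful reduction in pain: yes, no).
# Illustrative counts only, not the trial's actual data.
observed = np.array([
    [28, 22],
    [16, 34],
])

n1, n2 = observed.sum(axis=1)
x1, x2 = observed[:, 0]
p1, p2 = x1 / n1, x2 / n2

# Pooled proportion and the two-sample Z statistic for equality of proportions
p = (x1 + x2) / (n1 + n2)
z = (p1 - p2) / np.sqrt(p * (1 - p) * (1 / n1 + 1 / n2))

# Chi-square test of independence without Yates' continuity correction,
# so that the Pearson statistic equals Z squared exactly
chi_sq, p_value, df, expected = chi2_contingency(observed, correction=False)

# Z**2 equals the chi-square statistic, with df = (2-1)(2-1) = 1
assert np.isclose(z ** 2, chi_sq)
```

Note `correction=False`: with the default Yates correction applied, the chi-square statistic would no longer equal Z squared.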

Chi-Squared Tests in R

The video below by Mike Marin demonstrates how to perform chi-squared tests in the R programming language.

Answer to Problem on Pancreaticoduodenectomy and Surgical Apgar Scores

We have 3 independent comparison groups (Surgical Apgar Score) and a categorical outcome variable (morbidity/mortality). We can run a Chi-Squared test of independence.

H 0 : Apgar scores and patient outcome are independent of one another.

H A : Apgar scores and patient outcome are not independent.

Chi-squared = 14.13, with df = (3-1)(3-1) = 4 and a critical value of 9.49 at the 5% level of significance.

Since 14.13 is greater than 9.49, we reject H 0.

There is an association between Apgar scores and patient outcome. The lowest Apgar score group (0 to 4) experienced the highest percentage of major morbidity or mortality (16 out of 57=28%) compared to the other Apgar score groups.
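The arithmetic of this answer can be checked with SciPy; the degrees of freedom and critical value follow from the 3×3 table described in the problem:

```python
from scipy.stats import chi2

# 3 Surgical Apgar Score groups x 3 outcome categories (no/minor/major morbidity)
df = (3 - 1) * (3 - 1)
critical = chi2.ppf(0.95, df)

test_statistic = 14.13  # value given in the problem statement
reject_h0 = test_statistic > critical
print(df, round(critical, 2), reject_h0)  # 4 9.49 True
```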


  • Open access
  • Published: 08 May 2024

Decoupling excitons from high-frequency vibrations in organic molecules

  • Pratyush Ghosh   ORCID: orcid.org/0000-0001-5780-3718 1 ,
  • Antonios M. Alvertis   ORCID: orcid.org/0000-0001-5916-3419 2 , 3 ,
  • Rituparno Chowdhury 1 ,
  • Petri Murto   ORCID: orcid.org/0000-0001-7618-000X 1 , 4 ,
  • Alexander J. Gillett   ORCID: orcid.org/0000-0001-7572-7333 1 ,
  • Shengzhi Dong   ORCID: orcid.org/0000-0002-5052-5767 5 ,
  • Alexander J. Sneyd 1 ,
  • Hwan-Hee Cho   ORCID: orcid.org/0000-0002-2205-729X 1 ,
  • Emrys W. Evans   ORCID: orcid.org/0000-0002-9092-3938 1 , 6 ,
  • Bartomeu Monserrat   ORCID: orcid.org/0000-0002-4233-4071 1 , 7 ,
  • Feng Li   ORCID: orcid.org/0000-0001-5236-3709 5 ,
  • Christoph Schnedermann   ORCID: orcid.org/0000-0002-2841-8586 1 ,
  • Hugo Bronstein 1 , 4 ,
  • Richard H. Friend   ORCID: orcid.org/0000-0001-6565-6308 1 &
  • Akshay Rao   ORCID: orcid.org/0000-0003-4261-0766 1  

Nature volume  629 ,  pages 355–362 ( 2024 ) Cite this article


  • Applied physics
  • Electronic devices
  • Optoelectronic devices and components
  • Photonic devices
  • Semiconductors

The coupling of excitons in π-conjugated molecules to high-frequency vibrational modes, particularly carbon–carbon stretch modes (1,000–1,600 cm −1 ) has been thought to be unavoidable 1 , 2 . These high-frequency modes accelerate non-radiative losses and limit the performance of light-emitting diodes, fluorescent biomarkers and photovoltaic devices. Here, by combining broadband impulsive vibrational spectroscopy, first-principles modelling and synthetic chemistry, we explore exciton–vibration coupling in a range of π-conjugated molecules. We uncover two design rules that decouple excitons from high-frequency vibrations. First, when the exciton wavefunction has a substantial charge-transfer character with spatially disjoint electron and hole densities, we find that high-frequency modes can be localized to either the donor or acceptor moiety, so that they do not significantly perturb the exciton energy or its spatial distribution. Second, it is possible to select materials such that the participating molecular orbitals have a symmetry-imposed non-bonding character and are, thus, decoupled from the high-frequency vibrational modes that modulate the π-bond order. We exemplify both these design rules by creating a series of spin radical systems that have very efficient near-infrared emission (680–800 nm) from charge-transfer excitons. We show that these systems have substantial coupling to vibrational modes only below 250 cm −1 , frequencies that are too low to allow fast non-radiative decay. This enables non-radiative decay rates to be suppressed by nearly two orders of magnitude in comparison to π-conjugated molecules with similar bandgaps. Our results show that losses due to coupling to high-frequency modes need not be a fundamental property of these systems.


In the limit of weak electronic coupling between the ground and excited electronic states, the rate of non-radiative recombination ( k nr ) as a function of the energy gap Δ E between the excited and ground electronic states can be written as 1 , 3 , 4
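Equation (1) itself did not survive extraction. In the weak-coupling limit, the energy-gap law is commonly written in the Englman–Jortner form; the following is a sketch consistent with the quantities defined below, and the paper's exact prefactors may differ:

$$k_{\mathrm{nr}}=\frac{C^{2}}{\hbar}\sqrt{\frac{2\pi}{\hbar\omega\,\Delta E}}\,\exp\!\left(-\frac{\gamma\,\Delta E}{\hbar\omega}\right),\qquad \gamma=\ln\!\left(\frac{2\,\Delta E}{\sum_{i}\lambda_{i}}\right)-1 \quad (1)$$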

where C is the effective electronic coupling matrix element and \({\lambda }_{i}\) corresponds to the reorganization energy associated with the driving modes that promote 5 non-radiative relaxation. Equation ( 1 ) formalizes the energy-gap law, which predicts an increase in non-radiative decay rate with a decreasing energy gap.

High-frequency molecular vibrations (1,000–1,600 cm −1 ) are ubiquitous in π-conjugated organic molecules and are strongly coupled to electronic excited states where they directly modulate the π-bond order 1 . This is particularly problematic when the energy gap is directly coupled to π-bonding alternations, such as the phenylene ring-stretching mode 6 . On the other hand, low-frequency modes of less than 500 cm −1 are associated with high-mass displacement and are structurally delocalized in nature. These modes contribute less to fast non-radiative decay, which is captured by the ω term in equation ( 1 ). This term is the frequency of the promoting vibrational mode. Thus, exciton–vibration coupling to high-frequency modes, as generally observed for molecular systems 7 , 8 , 9 , 10 , 11 , causes rapid non-radiative decay dynamics. The key question is, therefore, whether we can decouple excitons from high-frequency vibrations in organic molecules.

Radicals as an efficient NIR emitter

We have selected a few examples of recently developed near-infrared (NIR) organic emitters that appear to violate the energy-gap law. We begin by focusing on an emerging family of spin-1/2 radical molecular semiconductors, as these have some of the highest values for the photoluminescence quantum efficiency (PLQE) and electroluminescence quantum efficiency (EQE EL ) reported for organic systems. These molecules consist of a doublet spin unit (TTM), which acts as an electron acceptor, covalently linked to a donor unit, an N-arylated carbazole moiety (TTM-3PCz and TTM-3NCz); see Fig. 1a . Both materials show strong luminescence from an intra-molecular charge-transfer exciton 12 , 13 . The lowest optical excitation for absorption and emission is the charge-transfer transition within the spin doublet manifold. With the correct tuning of the charge-transfer energetics and overlap, efficient emission can be achieved 14 . For these materials, excitation within the doublet manifold avoids access to higher multiplicity spin states and eliminates the problems due to the formation of triplet excitons in conventional closed-shell organic emitters. These systems have a very high NIR PLQE. For example, for TTM-3NCz, it is above 85% for solid-state blends in the organic light-emitting diode host CBP, for emission at 710 nm (ref. 12 ). The key question is how these systems can achieve such a high luminescence yield in the NIR and, thus, seemingly violate the energy-gap law.

Figure 1

a , Molecular structures of the conjugated polymer (rr-P3HT), a laser dye (rhodamine 6G or r6G), two conventional organic semiconductors and spin-1/2 radical emitters (TTM-3PCz and TTM-3NCz), which are both state-of-the-art deep-red/NIR emitters. The name of the molecule is highlighted in red if the corresponding emission maxima are above 650 nm. The maximum reported PLQEs, Ф, are indicated. Note that rhodamine 6G has a high PLQE despite strong coupling to high-frequency modes due to the high energy-gap emission. b , Absorption spectra and the transition involved for impulsive excitation. The grey rectangular box indicates the spectral profile of the impulsive pump used to excite the samples. c , Vibrational coherence extracted from excited-state signal. d (bottom), The corresponding excited-state Raman spectra obtained after time-resolution correction for rr-P3HT (grey), rhodamine 6G (black), TTM-3NCz (orange) and TTM-3PCz (red). d (top), Theoretically calculated non-radiative decay rate (from equation ( 1 ) using DFT and TDDFT) plotted against the vibrational frequency for TTM-3PCz excitons assuming the corresponding vibrational mode as the main deactivation pathway. The black circles indicate the normal modes of the TTM-3PCz molecule. The probe windows for the vibrational coherence shown in c are as follows: rr-P3HT, 700–710 nm (stimulated emission) 35 ; rhodamine 6G, 560–570 nm (stimulated emission); TTM-3NCz, 650-710 nm (excited-state photo-induced absorption plus stimulated emission) and TTM-3PCz: 660–670 nm (excited-state photo-induced absorption and stimulated emission convoluted). The further analysis in Supplementary Information sections 5 and 6 investigates the origin of the vibrational coherence for TTM-3PCz molecules, which confirms that high-frequency decoupled vibrational coupling corresponds to the transition involved in luminescence. All samples were measured in solution except for P3HT ( Methods ). a.u., arbitrary units.

To answer this question, we compared these materials with a range of conventional organic molecules: regioregular poly(3-hexylthiophene) or rr-P3HT, a well-studied semiconductor homopolymer with a low PLQE of less than 5% at 680 nm, and the laser dye rhodamine 6G (r6G), which emits brightly at a relatively higher energy gap (PLQE = 94% at 550 nm).

Impulsive vibrational spectroscopy

We probe the vibrational coupling in the excited electronic state of these organic molecules by employing resonant impulsive vibrational spectroscopy (IVS; Methods and Supplementary Information section 18 ) 15 .

Figure 1b shows the absorption spectra of the organic molecules investigated as well as the spectral range of the ultrafast pump pulse used in our IVS studies (grey rectangle, 8.8 fs, centred at 575 nm). For rr-P3HT and r6G, the pump pulse was resonant with a π → π* transition. By contrast, for TTM-3PCz and TTM-3NCz, the pump pulse was resonant with a charge-transfer transition, which corresponds to a doublet excitation (D 0  → D 1 ) from the 3PCz/3NCz-centred highest occupied molecular orbital (HOMO) to a TTM-centred singly occupied molecular orbital (SOMO) 12 .

Following resonant impulsive excitation by the pump pulse, the early-time electronic population dynamics exhibits distinct oscillatory modulations across the entire visible probe region for all investigated molecules (see Extended Data Figs. 1c,d and 4a for wavelength-resolved analysis of TTM-3PCz and TTM-3NCz, respectively). Figure 1c displays the isolated excited-state vibrational coherences, and Fig. 1d (bottom) shows the time-resolution-corrected excited-state Raman spectrum of the corresponding time-domain data from Fig. 1c for each molecule (see  Methods for details). We observe in rr-P3HT a pronounced vibrational mode at 1,441 cm −1 , which is due to the C=C ring-stretching mode in this conjugated polymer system 7 . Rhodamine 6G has a series of high-frequency modes (1,356, 1,504 and 1,647 cm −1 ) corresponding to localized C–C and C=C stretching motions.

By contrast, the excited-state vibrational spectra of TTM-3PCz and TTM-3NCz have only one prominent mode (232 cm −1 ), which is in the range of frequencies associated with torsional motions of the TTM to 3PCz/3NCz moiety (see Extended Data Fig. 3 for a complete analysis of low-frequency modes and theoretically calculated exciton–vibration coupling constants).

These results suggest that two different regimes for exciton–phonon coupling operate in the materials studied here. For the conjugated homopolymer (rr-P3HT) and laser dye (rhodamine 6G), the photo-excited transition leads to the formation of excitons coupled to high-frequency C–C and C=C stretching modes, as is conventionally expected for organic systems. However, the lowest lying excitons of TTM-3PCz and TTM-3NCz, which are associated with the charge-transfer D 0  → D 1 transition, are decoupled from these high-frequency modes.

The effect of this vibrational decoupling on the non-radiative loss is dramatic. This is illustrated in the top panel of Fig. 1d , which shows the non-radiative decay rate along each normal mode coordinate calculated using density functional theory (DFT). It can be seen that the high-frequency modes lead to significantly faster rates of non-radiative recombination. For instance, taking Δ E  = 14,437 cm −1 (1.79 eV) and a reorganization energy of 1,105 cm −1 (0.137 eV) (Supplementary Information section 12 ), for TTM-3PCz, a representative high-frequency 1,560 cm −1 mode leads to a non-radiative rate approximately 10 15 times faster than that of the low-frequency 230 cm −1 mode. In typical organic systems with a low bandgap, this would lead to rapid non-radiative losses. However, the key point here is that TTM-3PCz and TTM-3NCz show no coupling to these high-frequency modes.

Band-selective impulsive excitation

Having probed the charge-transfer type D 0  → D 1 transitions in radical molecules, we now turn our attention to the higher D 0  → D 2 transition, which does not involve charge-transfer excitons 12 , 16 . Studying this transition, therefore, allowed us to compare the vibrational coupling between the charge-transfer and non-charge-transfer transitions of the same molecule. Here, we focus on the novel radical molecule TTM-TPA, which has a donor–acceptor structural motif like those of TTM-3PCz and TTM-3NCz but has a triphenylamine (TPA) group as electron donor instead of the N-aryl carbazole (PCz/NCz) group (Fig. 2a ). As shown in Fig. 2a and like TTM-3PCz (ref. 12 ), the lowest-energy electronic transition (D 0  → D 1 ) in TTM-TPA corresponds to a charge-transfer excitation from the TPA-centred HOMO (donor) to the TTM-centred SOMO (acceptor), as revealed by time-dependent DFT (TDDFT) calculations (Extended Data Table 2 ). The second lowest-energy transition (D 0  → D 2 ) involves frontier molecular orbitals sitting predominantly on the TTM part (HOMO-2 to SOMO), corresponding to spatially overlapping orbitals, which is consistent with similar derivatives 12 , 16 . As illustrated in Fig. 2b , the charge-transfer character of the lowest-energy absorption band (approximately 700 nm, D 0  → D 1 ) shows the expected solvatochromic redshift, whereas the higher energy absorption band (approximately 500 nm, D 0  → D 2 ) is barely affected by the solvent polarity. As TPA has a higher-lying HOMO compared to 3PCz (ref. 17 ), the charge-transfer transition is redshifted while it maintains a nearly similar energy for the local exciton transition in TTM-TPA (Fig. 2b ). This greater energy separation between the charge-transfer and non-charge-transfer transitions allows us to compare their vibrational coupling more cleanly than would be possible in TTM-3PCz. We excited the charge-transfer state with a pump pulse centred at 725 nm (pulse P 1 , 12 fs, Fig. 2b ), whereas the local exciton state was excited with a pump pulse centred at 575 nm (pulse P 2 , 15 fs, Fig. 2b ).

Figure 2

a , Chemical structure of the novel NIR emitter TTM-TPA and its molecular orbital diagram highlighting the two lowest-energy transitions used in the band-selective excitation indicated by the purple and magenta arrows. The energy levels in the molecular orbital diagram are not to scale. b , Steady-state absorption spectra of TTM-TPA in different solvents with variable polarity (f: cyclohexane → CHCl 3  → tetrahydrofuran). The magenta and purple areas indicate the spectral profile of the impulsive pumps (P 1 and P 2 ) used to excite the different bands. c , Excited-state vibrational spectra obtained from vibrational coherence generated at early timescales (100–1,250 fs) upon exciting with the magenta and purple impulsive pumps. d , Theoretically calculated exciton–vibration coupling parameter, the so-called Huang–Rhys factor \(({S}_{{\rm{ev}}}(k))\) , for the D 0  → D 1 and D 0  → D 2 electronic transitions of TTM-TPA in the high-frequency regime for the experimentally obtained modes. CT, charge transfer. e , Vector displacement diagram of the high-frequency breathing mode with frequency 1,561 cm −1 plotted on the optimized geometry of TTM-TPA. f , g (top), Exciton wavefunction (transition density) ρ for the D 2 (non-charge-transfer exciton) transition ( f ) and the D 1 (charge-transfer exciton) transition ( g ). f , g (bottom), differential exciton wavefunction (transition density) upon displacement along the 1,561 cm −1 mode {Δ ρ } 1,561 cm −1 , plotted for the D 2 (non-charge-transfer exciton) transition ( f ) and the D 1 (charge-transfer exciton) transition ( g ).

Photo-excitation of TTM-TPA into D 1 with pulse P 1 yielded vibrational coherences like those of the previously observed charge-transfer excitons of TTM-3PCz and TTM-3NCz. Here, the excited-state impulsive vibrational spectrum is again dominated by low-frequency modes (228 cm −1 ) with a minor contribution from high-frequency modes in the range 1,100–1,650 cm −1 (magenta, Fig. 2c ). Photo-excitation into the non-charge-transfer exciton state through pulse P 2 populated the D 2 state, which rapidly cooled to the D 1 state with a time constant of 670 fs (see Extended Data Fig. 5 for the electronic and vibrational dynamics of D 2  → D 1 cooling). Figure 2c (purple) shows the corresponding vibrational spectrum obtained directly after photo-excitation into D 2 , which exhibits significantly enhanced coupling to high-frequency modes at 1,272, 1,520 and 1,565 cm −1 , in stark contrast to the spectrum obtained for D 1 (Fig. 2c , magenta; see Extended Data Fig. 6 for a wavelength-resolved analysis). Taken together, this selective photo-excitation reveals that the charge-transfer (D 1 ) and non-charge-transfer exciton (D 2 ) states exhibit large differences in their coupling to the vibrational modes, even within the same molecule.

First-principles modelling

We performed first-principles spin-unrestricted DFT and TDDFT calculations to quantify the exciton–vibration coupling 18 , 19 , 20 using the Huang–Rhys factor ( \({S}_{{\rm{ev}}}\) ) for the D 0  → D 1 (charge-transfer) and D 0  → D 2 (non-charge-transfer) electronic transitions:
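The defining equation for \({S}_{{\rm{ev}}}\) did not survive extraction. A standard central-finite-difference form consistent with the quantities defined below is the following sketch (the paper's exact normalization may differ): the Huang–Rhys factor is half the squared gradient of the excitation energy along the dimensionless mode coordinate, in units of the vibrational quantum,

$$S_{\mathrm{ev}}(k)=\frac{1}{2}\left(\frac{E_{\mathrm{ex}(i)}^{+\delta u(k)}-E_{\mathrm{ex}(i)}^{-\delta u(k)}}{2\,\delta u\,\hbar\omega}\right)^{2}.$$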

\({E}_{{\rm{ex}}(i)}^{+\delta u\left(k\right)}\) and \({E}_{{\rm{ex}}(i)}^{-\delta u\left(k\right)}\) are the excitation energies for an electronic transition \((i)\) upon displacing the equilibrium geometry by a small dimensionless quantity ( \(+\delta u,-\,\delta u\) ) along the \(k{\rm{th}}\) normal mode with frequency \(\omega \) , in the harmonic limit.

We computed \({S}_{{\rm{ev}}}(\omega )\) for the normal modes of TTM-TPA associated with both the D 0  → D 1 (charge-transfer) and D 0  → D 2 (non-charge-transfer) excitations. The results are shown in Fig. 2d for the key high-frequency vibrational modes in the experimental data, namely 1,273, 1,522, 1,561 and 1,572 cm −1 . The calculations show that for all high-frequency modes, the vibrational coupling is significantly reduced for the charge-transfer exciton (Fig. 2d , magenta) compared to the non-charge transfer exciton (Fig. 2d , purple), in line with the experimental observations.

To better understand how these vibrational modes affect the electronic structure of TTM-TPA, we computed the exciton wavefunction ( ρ ) as well as the change in the wavefunction due to displacement along a normal mode, {Δ ρ } ω . Figure 2f,g shows these differential wavefunction plots {Δ ρ } ω , for the normal mode with a frequency of 1,561 cm −1 , which is associated with C–C and C=C stretching vibrations localized primarily on the TTM moiety (Fig. 2e ). This mode is strongly present in the experimental data (1,565 cm −1 , Fig. 2c ) and also dominates the theoretically calculated exciton–vibration coupling plot (Fig. 2d ).

TDDFT calculations reveal that excitation to D 2 localizes the wavefunction onto TTM ( ρ , Fig. 2f , top), as expected for a local non-charge-transfer excitonic state 16 . Figure 2f (bottom) shows how the exciton density on the molecule varies owing to perturbations of the molecular geometry along the 1,561 cm −1 mode, which is represented by {Δ ρ } 1,561 . Displacement along this normal mode leads to large changes in the D 2 exciton wavefunction, indicating strong coupling of the high-frequency vibrational modes to the non-charge-transfer exciton wavefunction. We then compare this to the exciton density arising from D 0  → D 1 (Fig. 2g , top), which leads to a delocalized wavefunction over the whole molecule with disjoint electron and hole densities. Critically, the exciton density upon perturbation along the 1,561 cm −1 normal mode ({Δ ρ } 1,561 , Fig. 2g , bottom) shows very little change for the D 1 exciton, in marked contrast to the results for the D 2 exciton. This shows the strong suppression of exciton–vibration coupling for the charge-transfer-type D 1 exciton. Extended Data Fig. 7 presents similar results for all the other experimentally obtained high-frequency vibrational modes.

To get a complete mode-resolved picture, we also calculated the parameter \({\varphi }_{{\rm{lf}}}^{{\rm{hf}}}\) , which is defined as the ratio of the vibrational reorganization energy 21 ( \({\lambda }_{{\rm{v}}}\) ) associated with all high-frequency normal modes (1,000–2,000 cm −1 ) to the low-frequency modes (100–1,000 cm −1 ) for a particular electronic transition:

As displayed in Extended Data Fig. 8a,b , \({\varphi }_{{\rm{lf}}}^{{\rm{hf}}}\) for the non-charge-transfer-type D 2 state is 2.4 times higher than for the charge-transfer-type D 1 state, in agreement with the experimental results in Fig. 2d .

Thermally activated delayed fluorescence

We next examined how excitons couple to vibrations in other low-bandgap organic systems, especially those with a variable degree of charge-transfer character (Fig. 3a ). We selected APDC-DTPA (refs. 22 , 23 , 24 ) as an example of a highly efficient NIR-emitting thermally activated delayed fluorescence (TADF) system (PLQE = 63% at 687 nm) 22 . Here, the electronic excitation promotes an electron from the HOMO centred at TPA to the lowest unoccupied molecular orbital (LUMO) at an acenaphthene-based acceptor core (APDC). We also studied a classic green-emitting TADF system, 4CzIPN, which has a higher energy gap (refs. 25 , 26 , 27 , 28 , 29 ; PLQE = 94% at 550 nm). The key design feature of TADF systems, as first developed by Adachi and co-workers 28 , is the introduction of a donor–acceptor character such that the charge-transfer exciton has a spatially reduced electron–hole overlap that reduces the singlet–triplet exchange energy.

Figure 3

a , Molecular structures of the TADF molecules studied (4CzIPN and APDC-DTPA). b , Time-resolution-corrected excited-state Raman spectra of the TADF molecules (4CzIPN and APDC-DTPA). Asterisks indicate the solvent mode. See Extended Data Figs. 1a and 4b for a wavelength-resolved analysis of APDC-DTPA and 4CzIPN. The three-pulse IVS experiment on 4CzIPN solution is detailed in Supplementary Information section  7 . c , Experimentally obtained non-radiative rates of the studied low energy-gap molecules. Orange circles represent radical emitters. The blue circle represents TADF. The grey circles represent non-fullerene acceptors 36 (IO-4Cl, ITIC, o-IDTBR, Y5 and Y6). d , Vibrational coupling to the frontier molecular orbitals of APDC-DTPA (TADF) and TTM-3PCz (radical) obtained from first-principles DFT. The hole-accepting orbitals of both TADF (HOMO of APDC-DTPA) and the radical (HOMO of TTM-TPA) are localized on the central N atom with a non-bonding character, which is reflected in the lower coupling to the high-frequency phenylic ring-stretching modes. The electron-accepting level of the radical (SOMO of TTM-TPA) has a non-bonding character and suppressed high-frequency coupling with respect to its HOMO, whereas for the TADF structures, the hole-accepting level (HOMO of APDC-DTPA) makes a significant orbital contribution in the vicinity of the planar π bonds and shows reasonable high-frequency coupling.

As shown in Fig. 3b , the excited-state vibrational spectrum of APDC-DTPA exhibits vibrational activity only in the lower-frequency regime (183, 290, 324, 557, 587, 678 and 735 cm −1 ), which is associated with more delocalized torsional modes in the system. Similarly, 4CzIPN shows strong coupling to low-frequency torsional modes (157, 244, 429, 521 and 562 cm −1 ) with a nominal contribution from high-frequency modes.

Once again, we observed that electronic transitions featuring a strong charge-transfer character and non-planar molecular geometry, which lead to spatially separated and disjoint HOMO/SOMO or the LUMO, as seen for APDC-DTPA, 4CzIPN, TTM-3PCz, TTM-3NCz and TTM-TPA, give rise to excitons that do not couple to high-frequency modes.

Figure 3c summarizes the non-radiative decay rates for all the deep-red/NIR-emitting molecules studied here, which were based on radiative lifetime and PLQE measurements. These measurements directly show that the suppression of coupling to high-frequency modes, as measured by IVS (Figs. 1d , 2c and 3b ), results in a lower non-radiative decay rate in doublet and TADF systems. This contrasts with non-fullerene acceptor systems, which have higher rates of non-radiative decay and which we found have strong coupling to high-frequency modes, as presented in Extended Data Fig. 2 and Supplementary Information section 21 . By contrast, the TADF and radical emitters show greatly suppressed non-radiative rates, matching the lack of coupling to high-frequency modes observed by IVS.

This is also in agreement with our calculations, which show a suppressed contribution of the high-frequency normal modes to the vibrational reorganization energy for the charge-transfer excitons studied here (TTM-3PCz and APDC-DTPA) in comparison to the non-charge-transfer excitons (TTM and pentacene), consistent with non-adiabatic calculations 13 (Extended Data Fig. 8c ) and further supported experimentally by solvent polarity-dependent IVS measurements (Supplementary Information section 19 ). Taken together, the calculations support the experimental observation of suppressed high-frequency vibrational activity of the excited electronic state for charge-transfer excitations.

Intuitively, these results can be understood as follows. The charge-transfer excitation in non-planar molecules (radicals and TADF) provides spatially separated holes (HOMOs/SOMOs) and electrons (LUMOs) across the molecular backbone (Figs. 2g , 3d ). Simultaneous changes to both the electron and hole wavefunctions due to highly localized high-frequency carbon–carbon stretching motion therefore result in a smaller effect compared to planar excitonic systems, which exhibit strongly overlapping HOMOs and LUMOs with high electronic densities in the vicinity of these high-frequency nuclear oscillations. We note that although the non-fullerene acceptor systems have a donor–acceptor structural motif, the HOMO and LUMO strongly overlap in space due to their coplanar geometry and strong electronic conjugation through the fused rings, unlike in the radicals and TADF molecules, so that the dipole oscillator strength of the lowest-energy transition in these materials is very high, as required for their use in photovoltaics.

Figure 4

a , b , Chemical structures of the regio-isomers of the dimesitylated-TTM-carbazole system, M 2 TTM-3PCz ( a ) and M 2 TTM-2PCz ( b ), with their solution PLQEs. R stands for the mesityl group and R 1 represents –C 6 H 13 . c , d , The two HOMOs, having N-non-bonding and π character, and their alternating pattern of delocalization onto the phenyl group of the M 2 TTM moiety upon changing the linking position from 3 ( c ) to 2 ( d ). The grey dotted circle represents the delocalization of the molecular orbital from the PCz group onto the adjacent phenyl ring of the M 2 TTM radical core. e , f , Vibrational coupling parameters for the N-non-bonding ( e ) and π ( f ) molecular orbitals of phenylcarbazole (PCz). g , h , Time-resolution-corrected excited-state Raman spectra of M 2 TTM-3PCz ( g ) and M 2 TTM-2PCz ( h ). The selected probe windows are the blue edges of the photo-induced absorption (680–700 nm for M 2 TTM-3PCz and 680–700 nm for M 2 TTM-2PCz; see Extended Data Fig. 9 for a wavelength-resolved analysis).

Non-bonding-type electron and hole levels

Comparing the measured impulsive vibrational spectra of the radical emitters with the TADF molecules (Figs. 1d and 3b ), it can be seen that although the TADF molecules do not couple to vibrations above 1,000 cm −1 , the radicals do not couple to modes above 240 cm −1 . The radical emitters also display lower rates of non-radiative recombination than the TADF systems, as shown in Fig. 3c , and thus go beyond what these TADF systems can achieve in terms of suppressing non-radiative recombination. This suggests that something else is also suppressing the coupling to high-frequency modes in radical systems in comparison to TADF systems.

Figure 3d shows the results of first-principles calculations for the vibrational coupling to the hole and electron levels in APDC-DTPA (TADF molecule) and TTM-TPA (radical molecule). In both TADF and radical systems, TPA and N-aryl-carbazole (Cz) are widely adopted as donors 12 , 17 , 30 . These donor moieties have a nitrogen p z -centred non-bonding-type HOMO. As can be seen in Fig. 3d , these levels are not strongly coupled to high-frequency phenyl ring-stretching vibrations. The degree of further localization of the non-bonding-type HOMO on the nitrogen atom depends on how far steric hindrance twists the nitrogen p z orbital out of plane with the adjacent π systems 31 (Supplementary Information section 14 ). For the electron level of the TADF molecule, the LUMO does show coupling to high-frequency vibrations, but for the radical systems, the electron-accepting SOMO level, localized on the TTM moiety's central sp 2 carbon atom, has a non-bonding character 12 and does not show strong coupling to high-frequency vibrational modes. This implies that both the electron and hole levels of the radical systems have a non-bonding character, whereas only the hole levels have this character in TADF systems (Fig. 3d ). This may explain the weaker coupling to high-frequency vibrations (Figs. 1d and 3b ) and the lower non-radiative recombination rate in the radical systems compared to the TADF systems.

To test this hypothesis, we designed two radical molecules to tune the participation of the non-bonding character in the emitting electronic transition. As can be seen in Fig. 4a,b , M 2 TTM-3PCz and M 2 TTM-2PCz are two regio-isomers of the dimesitylated-TTM linked through either the 3 or 2 position of phenylcarbazole (PCz). This regioselective linking between donor and acceptor leads to dramatic changes in the photoluminescence efficiencies (for M 2 TTM-3PCz, PLQE = 92% with a photoluminescence lifetime of 15.2 ns, whereas for M 2 TTM-2PCz, PLQE = 14% with a photoluminescence lifetime of 3.2 ns) and a very large difference in the non-radiative rate (Extended Data Fig. 9d ). As can be visualized in Fig. 4c,d , from an electronic-structure point of view, the molecules are differentiated because either the N-non-bonding-type HOMO (for M 2 TTM-3PCz) or the carbazole-π-type HOMO-1 (for M 2 TTM-2PCz) is delocalized onto the adjacent phenyl ring of the M 2 TTM moiety. On the basis of our hypothesis of the importance of the non-bonding character in suppressing coupling to high-frequency vibrations, we would predict that M 2 TTM-3PCz should show reduced coupling to high-frequency modes in comparison to M 2 TTM-2PCz (Fig. 4e,f ). This prediction is verified by the impulsive vibrational spectra in Fig. 4g,h : M 2 TTM-2PCz has stronger coupling to the carbazole ring-stretching mode at 1,612 cm −1 because its hole-accepting level has a carbazole-π character (see Extended Data Fig. 9c for wavelength-resolved data). This tuning of the coupling to high-frequency modes through the participation of non-bonding levels in the electronic transitions allowed us to achieve an even lower non-radiative rate (Fig. 3c ) and near-unity PLQE at 680 nm. This shows that combining a charge-transfer character with non-bonding orbitals is the key to decoupling excitons from higher-frequency vibrations (over 250 cm −1 ) and that a charge-transfer character alone is not sufficient.
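The rates quoted here follow from the standard relations k_r = PLQE/τ and k_nr = (1 − PLQE)/τ. A minimal sketch in Python, using the solution PLQE and lifetime values reported above for the two regio-isomers (the variable and function names are our own):

```python
# Radiative and non-radiative decay rates from PLQE and PL lifetime,
# using k_r = PLQE / tau and k_nr = (1 - PLQE) / tau.

def decay_rates(plqe: float, tau_s: float) -> tuple[float, float]:
    """Return (k_r, k_nr) in s^-1 for a given PLQE and PL lifetime in seconds."""
    k_r = plqe / tau_s
    k_nr = (1.0 - plqe) / tau_s
    return k_r, k_nr

# Values reported in the text for the two regio-isomers.
kr_3, knr_3 = decay_rates(0.92, 15.2e-9)   # M2TTM-3PCz
kr_2, knr_2 = decay_rates(0.14, 3.2e-9)    # M2TTM-2PCz

print(f"M2TTM-3PCz: k_r = {kr_3:.2e} s^-1, k_nr = {knr_3:.2e} s^-1")
print(f"M2TTM-2PCz: k_r = {kr_2:.2e} s^-1, k_nr = {knr_2:.2e} s^-1")
print(f"k_nr ratio (2PCz / 3PCz) = {knr_2 / knr_3:.0f}")
```

The roughly 50-fold difference in k_nr between the isomers mirrors the difference in high-frequency vibrational coupling discussed above.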

Conclusions and outlook

Taken together, our experiments and calculations provide a mechanistic picture for how to decouple excitons from vibrational modes in organic systems. If an exciton wavefunction has a substantial charge-transfer character and the electron and hole wavefunctions are spatially separated, localized high-frequency modes (over 1,000 cm −1 ) do not significantly perturb its energy or its spatial distribution (experimental data in Figs. 1d and 2c and calculations in Fig. 2d,f,g ). As depicted in Figs. 2 and 3 , spatially separated disjoint hole (HOMO) and electron (LUMO) pairs can be generated by having a non-coplanar electron-rich (donor) and an electron-deficient (acceptor) moiety in a molecule, although this comes at the expense of a lower radiative rate. The selection of moieties with non-bonding electronic levels, such as the hole-accepting TPA, arylated carbazole (HOMO) and the electron-accepting TTM-donor radicals (SOMO), further decouples the exciton from high-frequency vibrations (over 250 cm −1 ). Consequently, non-radiative decay channels, which are dominated by high-frequency modes in organic molecules, can be efficiently suppressed.

This mechanism explains the high luminescence efficiency of the low-bandgap TTM-donor-based radical molecules as well as some low-bandgap TADF systems, for which the charge-transfer excitations are the lowest excited state. Our results also explain the apparent contradiction in the performance of these materials, for which the charge-transfer character of the electronic transition leads to a reduced oscillator strength and reduced radiative rate (Extended Data Table 1 ), which would normally be associated with a lower luminescence efficiency 32 , 33 . However, the suppression of non-radiative decay pathways due to the charge-transfer character of the excitations and non-bonding nature of the levels, as demonstrated here, overcomes this and enables high luminescence efficiency from these states.

This has important implications for the design of organic emitters for organic light-emitting diodes and NIR fluorescent markers for biological applications. The proposed design principles also open up new possibilities for organic photovoltaics, by allowing efficient radiative recombination (as achieved in metal-halide perovskite or GaAs solar cells) to boost the open-circuit voltage, the major outstanding challenge in the field 2 , 34 . This could enable device efficiencies well above 20% in future organic photovoltaics.

Methods

TTM-3PCz and TTM-3NCz were synthesized by the Suzuki coupling reaction as reported earlier 12 . The synthetic route of the novel radical TTM-TPA is extensively discussed in Supplementary Information section 1 . M 2 TTM-3PCz was synthesized following the recently reported Suzuki coupling and radical conversion procedures, and the novel radical M 2 TTM-2PCz was prepared by the same procedures, which are discussed in Supplementary Information section 17 . APDC-DTPA, 4CzIPN, rr-P3HT, rhodamine 6G, pentacene, ITIC, IO-4Cl, o-IDTBR, Y5, Y6 and Y7 were obtained from Lumtec, Ossila and Merck. The impulsive measurements of all samples were done in solution, except for rr-P3HT, which was measured as a spin-coated thin film to prevent it from burning on the cuvette wall in solution. In solution, rr-P3HT shows PLQE = 33% with a blueshifted emission compared to films; the higher PLQE in solution can be ascribed to the avoidance of interchain-state formation and the resulting higher-gap emission.

Impulsive vibrational spectroscopy

In IVS, an ultrafast pump pulse (sub-15 fs) resonant with the optical gap impulsively generates vibrational coherence in the photo-excited state of a material, which evolves in time according to the underlying excited-state potential energy surface. The impulsive response of the system was recorded by a time-delayed probe pulse that was spectrally tuned to probe excited-state resonances. The vibrational coherence obtained in this way manifests as oscillatory modulations superimposed on the sample's transient population dynamics and provides direct access to the excited-state Raman spectrum of the material through a Fourier transformation. The IVS experiments were performed with a home-built set-up 37 seeded by a commercially available Yb:KGW amplifier laser (PHAROS, Light Conversion; 1,030 nm, 38 kHz and 15 W). A chirped white-light continuum (WLC) spanning from 530 to 950 nm was used as the probe pulse, generated by focusing part of the fundamental beam onto a 3-mm YAG crystal and collimating it afterwards. The impulsive pump pulses were generated by a non-collinear optical parametric amplifier (NOPA), as reported previously 38 . The second-harmonic (515 nm) and third-harmonic (343 nm) pulses required to pump the NOPAs were generated with an automatic harmonic generator (HIRO, Light Conversion). The impulsive pump (experiments reported in Fig. 2 ) and the P 2 pulse for the band-selective experiment were generated with a NOPA seeded by the 1030-WLC and amplified by the third harmonic (343 nm). The P 1 pulse for the band-selective experiment was generated with a NOPA seeded by the 1030-WLC and amplified by the second harmonic (515 nm). Pump pulses were compressed using a pair of chirped mirrors in combination with wedge prisms (Layertec). The spatio-temporal profile of the pulses was measured with second-harmonic-generation frequency-resolved optical gating (Supplementary Information section 3 and Extended Data Fig. 6 ). A chopper wheel in the pump beam path modulated the pump beam at 9 kHz to generate differential transmission spectra. The pump–probe delay was set by a computer-controlled piezoelectric translation stage (PhysikInstrumente) with a step size of 4 fs. The pump and probe polarizations were parallel. The transmitted probe was recorded by a grating spectrometer equipped with a Si line camera (Entwicklungsbüro Stresing) operating at 38 kHz with a 550-nm blazed grating. Solution samples were measured in a flow-cell cuvette with an ultrathin wall aperture (Starna, Far UV Quartz, path length of 0.2 mm). Pulse compression was performed after a quartz coverslip (170 μm) was placed in the beam path of the frequency-resolved optical gating to compensate for the dispersion produced by the cuvette wall.
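Impulsive excitation of a vibrational mode requires a pump pulse shorter than the mode's period, T = 1/(c ν̃), and the 4 fs delay step sets the Nyquist limit of the measurable spectrum. A short illustrative sketch of this arithmetic (our own, not part of the published analysis):

```python
C_CM_PER_S = 2.998e10  # speed of light in cm/s

def period_fs(wavenumber_cm: float) -> float:
    """Vibrational period in femtoseconds for a mode given in cm^-1."""
    return 1e15 / (C_CM_PER_S * wavenumber_cm)

# Periods of representative modes discussed in the text.
for nu in (240, 1000, 1612):
    print(f"{nu:5d} cm^-1  ->  period {period_fs(nu):6.1f} fs")

# A 4 fs sampling step resolves oscillations up to the Nyquist limit:
nyquist_cm = 1.0 / (2 * 4e-15 * C_CM_PER_S)
print(f"Nyquist limit for 4 fs steps: {nyquist_cm:.0f} cm^-1")
```

A sub-15 fs pump is therefore short enough to launch coherence even in the ~1,600 cm −1 ring-stretching modes (period of about 21 fs), and 4 fs sampling comfortably covers the full vibrational range of interest.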

Time-domain vibrational data analysis

After correcting for the chirp and subtracting the background, the kinetic traces for each probe wavelength were truncated to exclude time delays of less than 100 fs to prevent contamination from coherent artefacts. We subsequently extracted the residual oscillations from the convoluted kinetic traces after globally fitting the electronic dynamics with a sum of two exponentially decaying functions with an offset over the whole spectral range. A series of signal-processing techniques were employed to convert the oscillatory time-domain signals to the frequency domain, including apodization (Kaiser–Bessel window, β = 1), zero-padding and a fast Fourier transformation (FFT). Before producing the intensity spectra, the |FFT| amplitude was multiplied by a frequency-dependent scaling function to remove time-resolution artefacts (the time-resolution correction method is described in detail in Supplementary Information section 3.2 ).
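The windowing, zero-padding and FFT chain described above can be sketched as follows for a synthetic oscillatory residual (a single damped mode at 500 cm −1 sampled at the 4 fs delay step used in the experiment). This is an illustration only: NumPy's `kaiser` beta parameterization may not match the Kaiser–Bessel convention quoted in the text, and the time-resolution correction step is omitted.

```python
import numpy as np

C_CM_PER_S = 2.998e10      # speed of light in cm/s
DT = 4e-15                 # pump-probe delay step, 4 fs

# Synthetic residual oscillation: one damped mode at 500 cm^-1.
t = np.arange(512) * DT
f_hz = 500.0 * C_CM_PER_S                      # wavenumber -> Hz
residual = np.cos(2 * np.pi * f_hz * t) * np.exp(-t / 2e-12)

# Apodize (Kaiser window), zero-pad and Fourier transform.
windowed = residual * np.kaiser(residual.size, 1.0)
n_pad = 4096
spectrum = np.abs(np.fft.rfft(windowed, n=n_pad))
wavenumbers = np.fft.rfftfreq(n_pad, d=DT) / C_CM_PER_S   # Hz -> cm^-1

peak = wavenumbers[np.argmax(spectrum[1:]) + 1]  # skip the DC bin
print(f"recovered mode: {peak:.0f} cm^-1")
```

Zero-padding to 4,096 points interpolates the spectrum to about 2 cm −1 per bin, so the synthetic 500 cm −1 mode is recovered to within a few wavenumbers.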

Computational methods

To study the ground state properties of the different molecules, we performed DFT calculations, employing the B3LYP hybrid functional and cc-pVDZ basis set as implemented within the software NWChem (ref. 39 ). For the open-shell systems discussed in this work, we performed spin-unrestricted DFT calculations, setting the multiplicity to two (doublet state). To compute the vibrational properties and the effect of vibrations on excited states, we coupled our molecular DFT calculations to finite displacement methods 20 . Excited-state properties were computed by TDDFT on top of the previously calculated DFT ground states using the B3LYP exchange-correlation functional and the same basis set as above. We verified for each of the studied open-shell systems that the computed ground and excited states did not suffer from spin contamination 40 .
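For readers unfamiliar with NWChem, the type of calculation described above can be sketched with the following input deck. This is our own illustrative example, not the authors' actual input: the geometry block is a placeholder, and settings such as the number of excited states are arbitrary choices.

```
# Illustrative NWChem input: spin-unrestricted B3LYP/cc-pVDZ ground state
# for a doublet radical, followed by TDDFT excited states.
echo
title "doublet radical: UB3LYP/cc-pVDZ + TDDFT (sketch)"
geometry units angstrom
  # ... molecular coordinates go here ...
end
basis
  * library cc-pVDZ
end
dft
  xc b3lyp
  mult 2        # doublet state
  odft          # force spin-unrestricted (open-shell) DFT
end
task dft optimize
tddft
  nroots 5      # number of excited states (arbitrary here)
end
task tddft energy
```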

Data availability

The data underlying all figures in the main text are publicly available from the University of Cambridge repository https://doi.org/10.17863/CAM.105569 (ref. 41 ).

Code availability

The code for analysing the data from the IVS experiments used in the manuscript is accessible through open access at https://doi.org/10.17863/CAM.105569 (ref. 41 ).

References

Wilson, J. S. et al. The energy gap law for triplet states in Pt-containing conjugated polymers and monomers. J. Am. Chem. Soc. 123 , 9412–9417 (2001).

Benduhn, J. et al. Intrinsic non-radiative voltage losses in fullerene-based organic solar cells. Nat. Energy 2 , 1–6 (2017).

Englman, R. & Jortner, J. The energy gap law for radiationless transitions in large molecules. Mol. Phys. 18 , 285–287 (1970).

Wei, Y.-C. et al. Overcoming the energy gap law in near-infrared OLEDs by exciton–vibration decoupling. Nat. Photon. 14 , 570–577 (2020).

Wang, S. F. et al. Polyatomic molecules with emission quantum yields >20% enable efficient organic light-emitting diodes in the NIR(II) window. Nat. Photon.   16 , 843–850 (2022).

Spano, F. C. Absorption and emission in oligo-phenylene vinylene nanoaggregates: the role of disorder and structural defects. J. Chem. Phys. 116 , 5877 (2002).

Falke, S. M. et al. Coherent ultrafast charge transfer in an organic photovoltaic blend. Science 344 , 1001–1005 (2014).

Rafiq, S., Fu, B., Kudisch, B. & Scholes, G. D. Interplay of vibrational wavepackets during an ultrafast electron transfer reaction. Nat. Chem. 13 , 70–76 (2021).

Musser, A. J. et al. Evidence for conical intersection dynamics mediating ultrafast singlet exciton fission. Nat. Phys. 11 , 352–357 (2015).

Schnedermann, C. et al. A molecular movie of ultrafast singlet fission. Nat. Commun. 10 , 4207 (2019).

Song, Y., Clafton, S. N., Pensack, R. D., Kee, T. W. & Scholes, G. D. Vibrational coherence probes the mechanism of ultrafast electron transfer in polymer–fullerene blends. Nat. Commun. 5 , 4933 (2014).

Ai, X. et al. Efficient radical-based light-emitting diodes with doublet emission. Nature 563 , 536–540 (2018).

Guo, H. et al. High stability and luminescence efficiency in donor–acceptor neutral radicals not following the Aufbau principle. Nat. Mater. 18 , 977–984 (2019).

Abdurahman, A. et al. Understanding the luminescent nature of organic radicals for efficient doublet emitters and pure-red light-emitting diodes. Nat. Mater. 19 , 1224–1229 (2020).

Liebel, M., Schnedermann, C., Wende, T. & Kukura, P. Principles and applications of broadband impulsive vibrational spectroscopy. J. Phys. Chem. A 119 , 9506–9517 (2015).

Cho, E., Coropceanu, V. & Brédas, J. L. Organic neutral radical emitters: impact of chemical substitution and electronic-state hybridization on the luminescence properties. J. Am. Chem. Soc. 142 , 17782–17786 (2020).

Dong, S. et al. Effects of substituents on luminescent efficiency of stable triaryl methyl radicals. Phys. Chem. Chem. Phys. 20 , 18657–18662 (2018).

Monserrat, B. Electron–phonon coupling from finite differences. J. Phys. Condens. Matter 30 , 083001 (2018).

Alvertis, A. M. et al. Impact of exciton delocalization on exciton–vibration interactions in organic semiconductors. Phys. Rev. B 102 , 081122 (2020).

Hele, T. J. H., Monserrat, B. & Alvertis, A. M. Systematic improvement of molecular excited state calculations by inclusion of nuclear quantum motion: a mode-resolved picture and the effect of molecular size. J. Chem. Phys. 154 , 244109 (2021).

Gruhn, N. E. et al. The vibrational reorganization energy in pentacene: molecular influences on charge transport. J. Am. Chem. Soc. 124 , 7918–7919 (2002).

Yuan, Y. et al. Over 10% EQE near-infrared electroluminescence based on a thermally activated delayed fluorescence emitter. Adv. Funct. Mater. 27 , 1700986 (2017).

Hu, Y. et al. Efficient near-infrared emission by adjusting the guest–host interactions in thermally activated delayed fluorescence organic light-emitting diodes. Adv. Funct. Mater. 28 , 1802597 (2018).

Congrave, D. G. et al. A simple molecular design strategy for delayed fluorescence toward 1000 nm. J. Am. Chem. Soc. 141 , 18390–18394 (2019).

Nakanotani, H., Masui, K., Nishide, J., Shibata, T. & Adachi, C. Promising operational stability of high-efficiency organic light-emitting diodes based on thermally activated delayed fluorescence. Sci. Rep. 3 , 2127 (2013).

Sandanayaka, A. S. D., Matsushima, T. & Adachi, C. Degradation mechanisms of organic light-emitting diodes based on thermally activated delayed fluorescence molecules. J. Phys. Chem. C 119 , 23845–23851 (2015).

Hosokai, T. et al. 58-2: revealing the excited-state dynamics of thermally activated delayed fluorescence molecules by using transient absorption spectroscopy. SID Symp. Dig. Tech. Pap. 47 , 786–789 (2016).

Uoyama, H., Goushi, K., Shizu, K., Nomura, H. & Adachi, C. Highly efficient organic light-emitting diodes from delayed fluorescence. Nature 492 , 234–238 (2012).

Méhes, G., Nomura, H., Zhang, Q., Nakagawa, T. & Adachi, C. Enhanced electroluminescence efficiency in a spiro-acridine derivative through thermally activated delayed fluorescence. Angew. Chem. Int. Ed. 51 , 11311–11315 (2012).

Liu, Y., Li, C., Ren, Z., Yan, S. & Bryce, M. R. All-organic thermally activated delayed fluorescence materials for organic light-emitting diodes. Nat. Rev. Mater. 3 , 18020 (2018).

Valeur, B. & Berberan‐Santos, M. N. Molecular Fluorescence: Principles and Applications 2nd edn (Wiley, 2012).

Gillett, A. J. et al. Spontaneous exciton dissociation enables spin state interconversion in delayed fluorescence organic semiconductors. Nat. Commun. 12 , 8–17 (2021).

Pershin, A. et al. Highly emissive excitons with reduced exchange energy in thermally activated delayed fluorescent molecules. Nat. Commun. 10 , 3–7 (2019).

Chen, X. K. et al. A unified description of non-radiative voltage losses in organic solar cells. Nat. Energy 6 , 799–806 (2021).

Sneyd, A. J. et al. Efficient energy transport in an organic semiconductor mediated by transient exciton delocalization. Sci. Adv. 7 , eabh4232 (2021).

Yan, C. et al. Non-fullerene acceptors for organic solar cells. Nat. Rev. Mater. 3 , 18003 (2018).

Pandya, R. et al. Exciton–phonon interactions govern charge-transfer-state dynamics in CdSe/CdTe two-dimensional colloidal heterostructures. J. Am. Chem. Soc. 140 , 14097–14111 (2018).

Liebel, M., Schnedermann, C. & Kukura, P. Sub-10-fs pulses tunable from 480 to 980 nm from a NOPA pumped by an Yb:KGW source. Opt. Lett. 39 , 4112 (2014).

Aprà, E. et al. NWChem: past, present, and future. J. Chem. Phys. 152 , 184102 (2020).

Baker, J., Scheiner, A. & Andzelm, J. Spin contamination in density functional theory. Chem. Phys. Lett. 216 , 380–388 (1993).

Ghosh, P. et al. Data and Code Supporting ’Decoupling Excitons from High-Frequency Vibrations in Organic Molecules’. Apollo — University of Cambridge Repository. https://doi.org/10.17863/CAM.105569 (2024).

Acknowledgements

We thank D. Beljonne and T. J. H. Hele for valuable discussions. This project has received funding from the European Research Council under the European Union’s Horizon 2020 research and innovation programme (Grant Agreement No. 101020167 (SCORS) to R.H.F. and Grant Agreement No. 758826 (SOLARX) to A.R.). This work has received funding from the Engineering and Physical Sciences Research Council (UK). R.H.F. acknowledges support from the Simons Foundation (Grant No. 601946). P.G. thanks the Cambridge Trust and the George and Lilian Schiff Foundation for a PhD scholarship and St John’s College, Cambridge, for additional support. H.B. acknowledges EPSRC (grant no EP/S003126/1). P.M. has received funding from Marie Skłodowska-Curie Actions (Grant Agreement No. 891167). H.C. acknowledges the George and Lilian Schiff Foundation for PhD studentship funding. P.M. and H.C. also acknowledge the European Research Council for the European Union’s Horizon 2020 research and innovation programme (grant agreement no. 101020167) for funding. S.D. and F.L. are grateful for receiving financial support from the National Natural Science Foundation of China (Grant No. 51925303). R.C. thanks the European Union’s Horizon 2020 project for funding under its research and innovation programme through Marie Skłodowska-Curie Actions (Grant Agreement No. 859752, HEL4CHIROLED). B.M. acknowledges support from a Future Leaders Fellowship from UK Research and Innovation (UKRI; Grant No. MR/V023926/1), from the Gianna Angelopoulos Programme for Science, Technology, and Innovation, and from the Winton Programme for the Physics of Sustainability. The calculations in this work were performed using resources provided by the Cambridge Tier-2 system operated by the University of Cambridge Research Computing Service and funded by the Engineering and Physical Sciences Research Council (Grant No. EP/P020259/1). A.J.G. thanks the Leverhulme Trust for an Early Career Fellowship (ECF-2022-445). 
This work was funded by the UKRI. For the purpose of open access, the authors have applied a Creative Commons Attribution (CC BY) licence to any Author Accepted Manuscript version arising.

Author information

Authors and affiliations

Cavendish Laboratory, University of Cambridge, Cambridge, UK

Pratyush Ghosh, Rituparno Chowdhury, Petri Murto, Alexander J. Gillett, Alexander J. Sneyd, Hwan-Hee Cho, Emrys W. Evans, Bartomeu Monserrat, Christoph Schnedermann, Hugo Bronstein, Richard H. Friend & Akshay Rao

KBR, Inc., NASA Ames Research Center, Moffett Field, CA, USA

Antonios M. Alvertis

Materials Sciences Division, Lawrence Berkeley National Laboratory, Berkeley, CA, USA

Yusuf Hamied Department of Chemistry, University of Cambridge, Cambridge, UK

Petri Murto & Hugo Bronstein

State Key Laboratory of Supramolecular Structure and Materials, College of Chemistry, Jilin University, Changchun, China

Shengzhi Dong & Feng Li

Department of Chemistry, Swansea University, Swansea, UK

Emrys W. Evans

Department of Materials Science and Metallurgy, University of Cambridge, Cambridge, UK

Bartomeu Monserrat

Contributions

A.R. conceived the project. P.G. developed the project, designed and built the experiments, and performed the resonant IVS and transient absorption spectroscopy measurements and quantum modelling. P.G. analysed the vibrational spectroscopy data with input from C.S. P.G. performed the quantum chemical calculations with input from A.M.A. and B.M. P.M. synthesized and characterized the dimesitylated-TTM-carbazole isomers under the supervision of H.B. P.G. and H.C. prepared the samples for the experiments. R.C. determined the PL characterization of the dimesitylated-TTM-carbazole isomers. P.G., A.J.G. and A.J.S. performed the three-pulse impulsive vibrational experiment with input from C.S. S.D. synthesized and characterized the radical materials TTM-TPA, TTM-3PCz and TTM-3NCz under the supervision of F.L. P.G., A.R. and R.H.F. co-wrote the manuscript with input from all other authors. R.H.F. and A.R. supervised the work.

Corresponding author

Correspondence to Akshay Rao .

Ethics declarations

Competing interests

The authors declare no competing interests.

Peer review

Peer review information

Nature thanks Xian-Kai Chen and the other, anonymous, reviewer(s) for their contribution to the peer review of this work. Peer reviewer reports are available.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Extended data figures and tables

Extended Data Fig. 1 Probe-wavelength-resolved resonant-IVS data of APDC-DTPA and TTM-3PCz.

a (APDC-DTPA), c (TTM-3PCz), Differential transmission maps following excitation with a 10-fs pulse centred at 575 nm, at room temperature. b (APDC-DTPA), d (TTM-3PCz), Wavelength-resolved impulsive Raman maps following impulsive photo-excitation into the lowest excited CT state.

Extended Data Fig. 2 Probe-wavelength-resolved resonant-IVS studies on non-fullerene acceptor (NFA) molecules in chloroform solution.

Wavelength-resolved impulsive Raman maps of a ITIC, b IO-4Cl, c o-IDTBR, d Y5, e Y6, f Y7. Transient absorption spectra at 1–2 ps are plotted in the right inset of every panel. We note that although the NFAs have a donor–acceptor structural motif, due to their coplanar geometry and strong electronic conjugation through the fused rings, the HOMO and LUMO strongly overlap in space, so that the dipole oscillator strength of the lowest-energy transition in these materials is very high, as required for their use in photovoltaics (see the frontier MOs in Supplementary Information section 10 ). We also note that in many of these molecules the exciton is delocalised across a large spatial extent (greater than for the APDC-DTPA and TTM-3PCz molecules), but this overall delocalisation neither significantly suppresses coupling to high-frequency modes (Supplementary Information section 21 ) nor reduces the resultant non-radiative recombination rate (Fig. 3c ).

Extended Data Fig. 3 Analysis of the exciton-vibration coupling constant of TTM-3PCz and assignment of strongly coupled mode to the excited state.

a , Excited-state Raman map of TTM-3PCz in CHCl 3 . b , Calculated Huang–Rhys factors associated with the lowest-energy transition (D 0  → D 1 ) of TTM-3PCz. c , Vector displacement diagrams of the normal modes with calculated frequencies of 156, 202 and 204 cm −1 , which correspond to the experimentally obtained 150 and 232 cm −1 modes. This figure is discussed in further detail in Supplementary Information (Section 9) .

Extended Data Fig. 4 Probe-wavelength-resolved IVS studies on a, TTM-3NCz (CHCl 3 ) and b, 4CzIPN (toluene).

(I) Chemical structure, (II) probe-wavelength-resolved impulsive Raman map, (III) excited-state Raman spectra shown for a representative probe wavelength (625–635 nm for TTM-3NCz and 870–880 nm for 4CzIPN). Black asterisks (*) correspond to the solvent modes. The experiment performed on 4CzIPN is a three-optical-pulse IVS experiment, owing to the spectral bandwidth limitation of direct impulsive excitation of high-energy transitions (>2.4 eV; see Methods). 4CzIPN (toluene) is excited to the 1CT state with a 450 nm (200 fs) pulse, and vibrational coherence (VC) is then induced with an 850 nm-centred broadband (8.5 fs) impulsive pulse 500 fs after the initial photo-excitation. The impulsive pulse spectrally overlaps with the excited-state absorption of 4CzIPN observed at >550 nm. The result shown for 4CzIPN was recorded in a 1-mm-pathlength cuvette. This experiment is analysed in further detail in Supplementary Information (Section 7) .

Extended Data Fig. 5 Electronic and Vibrational dynamics of internal conversion from D 2 to D 1 state of TTM-TPA.

a) Transient absorption signal in the form of differential transmission (ΔT/T) upon photo-excitation with the P 1 and P 2 pump pulses at different time delays (purple shaded area: P 2 excitation; magenta line: P 1 excitation). b) Electronic population dynamics of the D 2 exciton for a representative wavelength (790–800 nm), yielding a lifetime of 670 ± 125 fs (the method is described in SI section 8 ). c, d , Spectra obtained from the vibrational coherence at early times (150–1500 fs) and later times (650–2000 fs) in the band-selective excitation experiment with c) P 2 excitation and d) P 1 excitation at a representative probe wavelength (710–750 nm). It is important to note that the very close resemblance of the later-time spectra for P 2 excitation and the early-time spectra for P 1 excitation is consistent with a barrierless and coherent internal conversion process from D 2 to D 1 .

Extended Data Fig. 6 Band selective Impulsive vibrational spectroscopy of TTM-TPA.

a , Temporal profiles of the pulses P 1 and P 2 used in the band-selective experiment to excite the charge-transfer-rich and local-exciton-rich excited states, respectively. b , Vibrational coherence generated after photo-excitation with the P 1 and P 2 pulses for a representative probe window (680–700 nm). c , d , Wavelength-resolved impulsive Raman maps (time-resolution corrected) of TTM-TPA following impulsive photo-excitation by c) the P 2 pulse to the D 2 -rich state and d) the P 1 pulse to the D 1 state. The diagonal stripes in d arise from interference scattering with the P 1 pump. The fact that high-frequency oscillations (period of 20–30 fs) remain highly pronounced in the D 2 state even after photo-excitation with the slower pump pulse strongly supports the data and calculations presented in Fig. 2 .

Extended Data Fig. 7 Perturbational effect on the exciton wavefunction (transition density) of the D 2 and D 1 states along the coordinates of all experimentally obtained high-frequency vibrational modes (full mode-resolved picture of the data represented in Fig. 3g,h ).

In the top-left panel, the experimentally obtained D 1 - and D 2 -state-rich Raman spectra are reproduced. In the table, ω k (cm −1 ) is the frequency of the k th vibrational mode, with the theoretically obtained frequency given in parentheses. q(k) is the vector displacement diagram of the k th mode. {Δρ(D 2 )} k and {Δρ(D 1 )} k are the differential exciton wavefunctions (transition densities) upon displacement along the k th mode for the D 2 and D 1 excitons, respectively.

Extended Data Fig. 8 Mode-resolved reorganization energy ( λ v ) and contribution of the high-frequency modes to the vibrational reorganization energy.

a , Mode-resolved reorganisation energy for the D 2 state of TTM-TPA, obtained from TDDFT calculations by displacing the ground-state optimised geometry along the coordinate of each normal mode. b , Mode-resolved reorganisation energy for the D 1 state of TTM-TPA. c , \({\varphi }_{{lf}}^{{hf}}\) calculated for the lowest excited states of pentacene (S 1 , non-CT exciton), the TTM radical (D 1 , non-CT exciton), TTM-3PCz (D 1 , CT exciton), APDC-DTPA (S 1 , CT exciton) and TTM-TPA (D 1 , CT exciton). Panel c shows the reduced vibrational coupling to the high-frequency regime in TTM-TPA and TTM-3PCz relative to TTM.
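In the displaced-harmonic-oscillator picture underlying such calculations, each mode contributes λ k = ħω k S k to the vibrational reorganisation energy (S k being the Huang-Rhys factor), and the share carried by the high-frequency modes is the ratio of partial sums, analogous to the \({\varphi }_{{lf}}^{{hf}}\) ratio shown in c. A toy numerical illustration (the mode list is invented for the example):

```python
# Toy mode list: (frequency in cm^-1, Huang-Rhys factor S_k); invented values.
modes = [(50, 0.8), (200, 0.5), (1350, 0.02), (1600, 0.01)]

# In wavenumber units, lambda_k = omega_k * S_k.
lam_total = sum(w * s for w, s in modes)
lam_hf = sum(w * s for w, s in modes if w > 1000)  # high-frequency (>1,000 cm^-1) part

hf_fraction = lam_hf / lam_total
print(f"total lambda = {lam_total:.0f} cm^-1, "
      f"high-frequency fraction = {hf_fraction:.2f}")
```

The 1,000 cm^-1 cut-off and the mode parameters are placeholders; the point is only that a small Huang-Rhys factor on the high-frequency modes yields a small high-frequency fraction, as reported for TTM-TPA and TTM-3PCz.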

Extended Data Fig. 9 Electronic structure and wavelength-resolved impulsive vibrational spectra of M 2 TTM-3PCz/2PCz.

a , Transition dipole moments of the two lowest-energy electronic transitions of M 2 TTM-3PCz and M 2 TTM-2PCz obtained from TDDFT, together with the experimental absorption spectra. b , Frontier molecular orbitals of M 2 TTM-3PCz and M 2 TTM-2PCz; the arrows indicate the lowest bright charge-transfer transition. c , Wavelength-resolved impulsive vibrational maps of M 2 TTM-3PCz and M 2 TTM-2PCz in CHCl 3 . d, Radiative and non-radiative rates of the isomers obtained from PLQE and photoluminescence decay rates (see Extended Data Fig. 10 for details). Note that, depending on attachment at the 2- or 3-position of the phenyl-carbazole (PCz), the HOMO (N-nonbonding) and HOMO-1 (carbazole-π) molecular orbitals show different extents of leakage into the phenyl ring of M 2 TTM. The HOMO-1 (carbazole-π) orbital is delocalised onto the adjacent phenyl ring of M 2 TTM in M 2 TTM-2PCz, whereas in M 2 TTM-3PCz the HOMO (N-nonbonding) shows a similar delocalisation. The N-nonbonding-type HOMO of M 2 TTM-2PCz and the π-type HOMO-1 of M 2 TTM-3PCz are strongly localised on the carbazole, as the corresponding molecular orbitals have weak orbital coefficients on the linking carbon atom. Because the electron-accepting level (SOMO) is located on the M 2 TTM core, only delocalised occupied orbitals give rise to charge-transfer electronic transitions with non-negligible transition dipole moments, which explains the placement of the arrows in Extended Data Fig. 9b . Also, owing to the leakage onto the adjacent phenyl ring of M 2 TTM, the N-nonbonding and carbazole-π molecular orbitals have higher orbital energies in M 2 TTM-3PCz and M 2 TTM-2PCz, respectively, than in the alternative isomer (Extended Data Fig. 9b ). This explains the smaller energy gap between the two lowest-energy transitions in M 2 TTM-2PCz (Extended Data Fig. 9b ).

Extended Data Fig. 10 PL characterization of M 2 TTM-3PCz/2PCz.

a , PL spectra and b , time-resolved PL decays in solution.

Supplementary information

Supplementary information.

The file contains synthesis and characterization, supplementary experimental data, quantum chemical calculations, and an in-depth discussion of the theory of non-radiative loss and vibrational coherence.

Peer Review File

Rights and permissions.

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/ .

Reprints and permissions

About this article

Cite this article.

Ghosh, P., Alvertis, A.M., Chowdhury, R. et al. Decoupling excitons from high-frequency vibrations in organic molecules. Nature 629 , 355–362 (2024). https://doi.org/10.1038/s41586-024-07246-x

Download citation

Received : 20 September 2022

Accepted : 27 February 2024

Published : 08 May 2024

Issue Date : 09 May 2024

DOI : https://doi.org/10.1038/s41586-024-07246-x


