
Writing a Research Paper Introduction | Step-by-Step Guide

Published on September 24, 2022 by Jack Caulfield. Revised on March 27, 2023.


The introduction to a research paper is where you set up your topic and approach for the reader. It has several key goals:

  • Present your topic and get the reader interested
  • Provide background or summarize existing research
  • Position your own approach
  • Detail your specific research problem and problem statement
  • Give an overview of the paper’s structure

The introduction looks slightly different depending on whether your paper presents the results of original empirical research or constructs an argument by engaging with a variety of sources.


Table of contents

  • Step 1: Introduce your topic
  • Step 2: Describe the background
  • Step 3: Establish your research problem
  • Step 4: Specify your objective(s)
  • Step 5: Map out your paper
  • Research paper introduction examples
  • Frequently asked questions about the research paper introduction

Step 1: Introduce your topic

The first job of the introduction is to tell the reader what your topic is and why it’s interesting or important. This is generally accomplished with a strong opening hook.

The hook is a striking opening sentence that clearly conveys the relevance of your topic. Think of an interesting fact or statistic, a strong statement, a question, or a brief anecdote that will get the reader wondering about your topic.

For example, the following could be an effective hook for an argumentative paper about the environmental impact of cattle farming:

A more empirical paper investigating the relationship of Instagram use with body image issues in adolescent girls might use the following hook:

Don’t feel that your hook necessarily has to be deeply impressive or creative. Clarity and relevance are still more important than catchiness. The key thing is to guide the reader into your topic and situate your ideas.


Step 2: Describe the background

This part of the introduction differs depending on what approach your paper is taking.

In a more argumentative paper, you’ll explore some general background here. In a more empirical paper, this is the place to review previous research and establish how yours fits in.

Argumentative paper: Background information

After you’ve caught your reader’s attention, specify a bit more, providing context and narrowing down your topic.

Provide only the most relevant background information. The introduction isn’t the place to get too in-depth; if more background is essential to your paper, it can appear in the body.

Empirical paper: Describing previous research

For a paper describing original research, you’ll instead provide an overview of the most relevant research that has already been conducted. This is a sort of miniature literature review: a sketch of the current state of research into your topic, boiled down to a few sentences.

This should be informed by genuine engagement with the literature. Your search can be less extensive than in a full literature review, but a clear sense of the relevant research is crucial to inform your own work.

Begin by establishing the kinds of research that have been done, and end with limitations or gaps in the research that you intend to respond to.

Step 3: Establish your research problem

The next step is to clarify how your own research fits in and what problem it addresses.

Argumentative paper: Emphasize importance

In an argumentative research paper, you can simply state the problem you intend to discuss, and what is original or important about your argument.

Empirical paper: Relate to the literature

In an empirical research paper, try to lead into the problem on the basis of your discussion of the literature. Think in terms of these questions:

  • What research gap is your work intended to fill?
  • What limitations in previous work does it address?
  • What contribution to knowledge does it make?

You can make the connection between your problem and the existing research using phrases like the following.

Step 4: Specify your objective(s)

Now you’ll get into the specifics of what you intend to find out or express in your research paper.

The way you frame your research objectives varies. An argumentative paper presents a thesis statement, while an empirical paper generally poses a research question (sometimes with a hypothesis as to the answer).

Argumentative paper: Thesis statement

The thesis statement expresses the position that the rest of the paper will present evidence and arguments for. It can be presented in one or two sentences, and should state your position clearly and directly, without providing specific arguments for it at this point.

Empirical paper: Research question and hypothesis

The research question is the question you want to answer in an empirical research paper.

Present your research question clearly and directly, with a minimum of discussion at this point. The rest of the paper will be taken up with discussing and investigating this question; here you just need to express it.

A research question can be framed either directly or indirectly.

  • Direct: This study set out to answer the following question: What effects does daily use of Instagram have on the prevalence of body image issues among adolescent girls?
  • Indirect: We investigated the effects of daily Instagram use on the prevalence of body image issues among adolescent girls.

If your research involved testing hypotheses, these should be stated along with your research question. They are usually presented in the past tense, since the hypothesis will already have been tested by the time you are writing up your paper.

For example, the following hypothesis might respond to the research question above:


Step 5: Map out your paper

The final part of the introduction is often dedicated to a brief overview of the rest of the paper.

In a paper structured using the standard scientific “introduction, methods, results, discussion” format, this isn’t always necessary. But if your paper is structured in a less predictable way, it’s important to describe the shape of it for the reader.

If included, the overview should be concise, direct, and written in the present tense.

  • Future tense (avoid): This paper will first discuss several examples of survey-based research into adolescent social media use, then will go on to …
  • Present tense (preferred): This paper first discusses several examples of survey-based research into adolescent social media use, then goes on to …

Research paper introduction examples

Full examples of research paper introductions are shown below: first for an argumentative paper, then for an empirical paper.

Are cows responsible for climate change? A recent study (RIVM, 2019) shows that cattle farmers account for two thirds of agricultural nitrogen emissions in the Netherlands. These emissions result from nitrogen in manure, which can degrade into ammonia and enter the atmosphere. The study’s calculations show that agriculture is the main source of nitrogen pollution, accounting for 46% of the country’s total emissions. By comparison, road traffic and households are responsible for 6.1% each, the industrial sector for 1%. While efforts are being made to mitigate these emissions, policymakers are reluctant to reckon with the scale of the problem. The approach presented here is a radical one, but commensurate with the issue. This paper argues that the Dutch government must stimulate and subsidize livestock farmers, especially cattle farmers, to transition to sustainable vegetable farming. It first establishes the inadequacy of current mitigation measures, then discusses the various advantages of the results proposed, and finally addresses potential objections to the plan on economic grounds.

The rise of social media has been accompanied by a sharp increase in the prevalence of body image issues among women and girls. This correlation has received significant academic attention: Various empirical studies have been conducted into Facebook usage among adolescent girls (Tiggermann & Slater, 2013; Meier & Gray, 2014). These studies have consistently found that the visual and interactive aspects of the platform have the greatest influence on body image issues. Despite this, highly visual social media (HVSM) such as Instagram have yet to be robustly researched. This paper sets out to address this research gap. We investigated the effects of daily Instagram use on the prevalence of body image issues among adolescent girls. It was hypothesized that daily Instagram use would be associated with an increase in body image concerns and a decrease in self-esteem ratings.

Frequently asked questions about the research paper introduction

The introduction of a research paper includes several key elements:

  • A hook to catch the reader’s interest
  • Relevant background on the topic
  • Details of your research problem and your problem statement

  • A thesis statement or research question
  • Sometimes an overview of the paper

Don’t feel that you have to write the introduction first. The introduction is often one of the last parts of the research paper you’ll write, along with the conclusion.

This is because it can be easier to introduce your paper once you’ve already written the body; you may not have the clearest idea of your arguments until you’ve written them, and things can change during the writing process.

The way you present your research problem in your introduction varies depending on the nature of your research paper. A research paper that presents a sustained argument will usually encapsulate this argument in a thesis statement.

A research paper designed to present the results of empirical research tends to present a research question that it seeks to answer. It may also include a hypothesis: a prediction that will be confirmed or disproved by your research.

Cite this Scribbr article


Caulfield, J. (2023, March 27). Writing a Research Paper Introduction | Step-by-Step Guide. Scribbr. Retrieved April 8, 2024, from https://www.scribbr.com/research-paper/research-paper-introduction/


Computer Science > Computation and Language

Title: Mapping the Increasing Use of LLMs in Scientific Papers

Abstract: Scientific publishing lays the foundation of science by disseminating research findings, fostering collaboration, encouraging reproducibility, and ensuring that scientific knowledge is accessible, verifiable, and built upon over time. Recently, there has been immense speculation about how many people are using large language models (LLMs) like ChatGPT in their academic writing, and to what extent this tool might have an effect on global scientific practices. However, we lack a precise measure of the proportion of academic writing substantially modified or produced by LLMs. To address this gap, we conduct the first systematic, large-scale analysis across 950,965 papers published between January 2020 and February 2024 on the arXiv, bioRxiv, and Nature portfolio journals, using a population-level statistical framework to measure the prevalence of LLM-modified content over time. Our statistical estimation operates on the corpus level and is more robust than inference on individual instances. Our findings reveal a steady increase in LLM usage, with the largest and fastest growth observed in Computer Science papers (up to 17.5%). In comparison, Mathematics papers and the Nature portfolio showed the least LLM modification (up to 6.3%). Moreover, at an aggregate level, our analysis reveals that higher levels of LLM-modification are associated with papers whose first authors post preprints more frequently, papers in more crowded research areas, and papers of shorter lengths. Our findings suggest that LLMs are being broadly used in scientific writings.



EJIFCC, v.25(3); 2014 Oct

Peer Review in Scientific Publications: Benefits, Critiques, & A Survival Guide

Jacalyn Kelly

1 Clinical Biochemistry, Department of Pediatric Laboratory Medicine, The Hospital for Sick Children, University of Toronto, Toronto, Ontario, Canada

Tara Sadeghieh

Khosrow Adeli

2 Department of Laboratory Medicine and Pathobiology, University of Toronto, Toronto, Canada

3 Chair, Communications and Publications Division (CPD), International Federation of Clinical Chemistry and Laboratory Medicine (IFCC), Milan, Italy

The authors declare no conflicts of interest regarding publication of this article.

Peer review has been defined as a process of subjecting an author’s scholarly work, research or ideas to the scrutiny of others who are experts in the same field. It functions to encourage authors to meet the accepted high standards of their discipline and to control the dissemination of research data to ensure that unwarranted claims, unacceptable interpretations or personal views are not published without prior expert review. Despite its widespread use by most journals, the peer review process has also been widely criticised due to the slowness of the process to publish new findings and due to perceived bias by the editors and/or reviewers. Within the scientific community, peer review has become an essential component of the academic writing process. It helps ensure that papers published in scientific journals answer meaningful research questions and draw accurate conclusions based on professionally executed experimentation. Submission of low quality manuscripts has become increasingly prevalent, and peer review acts as a filter to prevent this work from reaching the scientific community. The major advantage of a peer review process is that peer-reviewed articles provide a trusted form of scientific communication. Since scientific knowledge is cumulative and builds on itself, this trust is particularly important. Despite the positive impacts of peer review, critics argue that the peer review process stifles innovation in experimentation, and acts as a poor screen against plagiarism. Despite its shortcomings, no foolproof system has yet been developed to take the place of peer review; however, researchers have been looking into electronic means of improving the peer review process. Unfortunately, the recent explosion in online-only/electronic journals has led to mass publication of a large number of scientific articles with little or no peer review. This poses a significant risk to advances in scientific knowledge and its future potential. The current article summarizes the peer review process, highlights the pros and cons associated with different types of peer review, and describes new methods for improving peer review.

WHAT IS PEER REVIEW AND WHAT IS ITS PURPOSE?

Peer Review is defined as “a process of subjecting an author’s scholarly work, research or ideas to the scrutiny of others who are experts in the same field” ( 1 ). Peer review is intended to serve two primary purposes. Firstly, it acts as a filter to ensure that only high quality research is published, especially in reputable journals, by determining the validity, significance and originality of the study. Secondly, peer review is intended to improve the quality of manuscripts that are deemed suitable for publication. Peer reviewers provide suggestions to authors on how to improve the quality of their manuscripts, and also identify any errors that need correcting before publication.

HISTORY OF PEER REVIEW

The concept of peer review was developed long before the scholarly journal. In fact, the peer review process is thought to have been used as a method of evaluating written work since ancient Greece ( 2 ). The peer review process was first described by a physician named Ishaq bin Ali al-Rahwi of Syria, who lived from 854-931 CE, in his book Ethics of the Physician ( 2 ). There, he stated that physicians must take notes describing the state of their patients’ medical conditions upon each visit. Following treatment, the notes were scrutinized by a local medical council to determine whether the physician had met the required standards of medical care. If the medical council deemed that the appropriate standards were not met, the physician in question could receive a lawsuit from the maltreated patient ( 2 ).

The invention of the printing press in 1453 allowed written documents to be distributed to the general public ( 3 ). At this time, it became more important to regulate the quality of the written material that became publicly available, and editing by peers increased in prevalence. In 1620, Francis Bacon wrote the work Novum Organum, where he described what eventually became known as the first universal method for generating and assessing new science ( 3 ). His work was instrumental in shaping the Scientific Method ( 3 ). In 1665, the French Journal des sçavans and the English Philosophical Transactions of the Royal Society were the first scientific journals to systematically publish research results ( 4 ). Philosophical Transactions of the Royal Society is thought to be the first journal to formalize the peer review process in 1665 ( 5 ). However, it is important to note that peer review was initially introduced to help editors decide which manuscripts to publish in their journals, and at that time it did not serve to ensure the validity of the research ( 6 ). It did not take long for the peer review process to evolve, and shortly thereafter papers were distributed to reviewers with the intent of authenticating the integrity of the research study before publication. The Royal Society of Edinburgh adhered to the following peer review process, published in their Medical Essays and Observations in 1731: “Memoirs sent by correspondence are distributed according to the subject matter to those members who are most versed in these matters. The report of their identity is not known to the author.” ( 7 ). The Royal Society of London adopted this review procedure in 1752 and developed the “Committee on Papers” to review manuscripts before they were published in Philosophical Transactions ( 6 ).

Peer review in the systematized and institutionalized form has developed immensely since the Second World War, at least partly due to the large increase in scientific research during this period ( 7 ). It is now used not only to ensure that a scientific manuscript is experimentally and ethically sound, but also to determine which papers sufficiently meet the journal’s standards of quality and originality before publication. Peer review is now standard practice by most credible scientific journals, and is an essential part of determining the credibility and quality of work submitted.

IMPACT OF THE PEER REVIEW PROCESS

Peer review has become the foundation of the scholarly publication system because it effectively subjects an author’s work to the scrutiny of other experts in the field. Thus, it encourages authors to strive to produce high quality research that will advance the field. Peer review also supports and maintains integrity and authenticity in the advancement of science. A scientific hypothesis or statement is generally not accepted by the academic community unless it has been published in a peer-reviewed journal ( 8 ). The Institute for Scientific Information ( ISI ) only considers journals that are peer-reviewed as candidates to receive Impact Factors. Peer review is a well-established process which has been a formal part of scientific communication for over 300 years.

OVERVIEW OF THE PEER REVIEW PROCESS

The peer review process begins when a scientist completes a research study and writes a manuscript that describes the purpose, experimental design, results, and conclusions of the study. The scientist then submits this paper to a suitable journal that specializes in a relevant research field, a step referred to as pre-submission. The editors of the journal will review the paper to ensure that the subject matter is in line with that of the journal, and that it fits with the editorial platform. Very few papers pass this initial evaluation. If the journal editors feel the paper sufficiently meets these requirements and is written by a credible source, they will send the paper to accomplished researchers in the field for a formal peer review. Peer reviewers are also known as referees (this process is summarized in Figure 1 ). The role of the editor is to select the most appropriate manuscripts for the journal, and to implement and monitor the peer review process. Editors must ensure that peer reviews are conducted fairly, and in an effective and timely manner. They must also ensure that there are no conflicts of interest involved in the peer review process.

Figure 1. Overview of the review process

When a reviewer is provided with a paper, he or she reads it carefully and scrutinizes it to evaluate the validity of the science, the quality of the experimental design, and the appropriateness of the methods used. The reviewer also assesses the significance of the research, and judges whether the work will contribute to advancement in the field by evaluating the importance of the findings, and determining the originality of the research. Additionally, reviewers identify any scientific errors and references that are missing or incorrect. Peer reviewers give recommendations to the editor regarding whether the paper should be accepted, rejected, or improved before publication in the journal. The editor will mediate author-referee discussion in order to clarify the priority of certain referee requests, suggest areas that can be strengthened, and overrule reviewer recommendations that are beyond the study’s scope ( 9 ). If the paper is accepted, as per suggestion by the peer reviewer, the paper goes into the production stage, where it is tweaked and formatted by the editors, and finally published in the scientific journal. An overview of the review process is presented in Figure 1 .

WHO CONDUCTS REVIEWS?

Peer reviews are conducted by scientific experts with specialized knowledge on the content of the manuscript, as well as by scientists with a more general knowledge base. Peer reviewers can be anyone who has competence and expertise in the subject areas that the journal covers. Reviewers can range from young and up-and-coming researchers to old masters in the field. Often, the young reviewers are the most responsive and deliver the best quality reviews, though this is not always the case. On average, a reviewer will conduct approximately eight reviews per year, according to a study on peer review by the Publishing Research Consortium (PRC) ( 7 ). Journals will often have a pool of reviewers with diverse backgrounds to allow for many different perspectives. They will also keep a rather large reviewer bank, so that reviewers do not get burnt out, overwhelmed or time constrained from reviewing multiple articles simultaneously.

WHY DO REVIEWERS REVIEW?

Referees are typically not paid to conduct peer reviews and the process takes considerable effort, so the question is raised as to what incentive referees have to review at all. Some feel an academic duty to perform reviews, and are of the mentality that if their peers are expected to review their papers, then they should review the work of their peers as well. Reviewers may also have personal contacts with editors, and may want to assist as much as possible. Others review to keep up-to-date with the latest developments in their field, and reading new scientific papers is an effective way to do so. Some scientists use peer review as an opportunity to advance their own research as it stimulates new ideas and allows them to read about new experimental techniques. Other reviewers are keen on building associations with prestigious journals and editors and becoming part of their community, as sometimes reviewers who show dedication to the journal are later hired as editors. Some scientists see peer review as a chance to become aware of the latest research before their peers, and thus be first to develop new insights from the material. Finally, in terms of career development, peer reviewing can be desirable as it is often noted on one’s resume or CV. Many institutions consider a researcher’s involvement in peer review when assessing their performance for promotions ( 11 ). Peer reviewing can also be an effective way for a scientist to show their superiors that they are committed to their scientific field ( 5 ).

ARE REVIEWERS KEEN TO REVIEW?

A 2009 international survey of 4000 peer reviewers conducted by the charity Sense About Science at the British Science Festival at the University of Surrey found that 90% of reviewers were keen to peer review ( 12 ). One third of respondents to the survey said they were happy to review up to five papers per year, and an additional one third of respondents were happy to review up to ten.

HOW LONG DOES IT TAKE TO REVIEW ONE PAPER?

On average, it takes approximately six hours to review one paper ( 12 ); however, this number may vary greatly depending on the content of the paper and the nature of the peer reviewer. One in every 100 participants in the “Sense About Science” survey claims to have taken more than 100 hours to review their last paper ( 12 ).
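As a rough back-of-the-envelope illustration, the six-hour average can be combined with the PRC estimate, cited earlier, of approximately eight reviews per reviewer per year. The two figures come from different surveys, so the product is only indicative; the function name and constants in this Python sketch are illustrative assumptions and do not come from either study.

```python
# Figures quoted in the text. Multiplying them together is an assumption
# made for illustration, since they come from two different surveys.
REVIEWS_PER_YEAR = 8   # PRC estimate of reviews per reviewer per year
HOURS_PER_REVIEW = 6   # Sense About Science average hours per review


def annual_review_hours(reviews_per_year, hours_per_review):
    """Return the estimated hours one reviewer spends reviewing per year."""
    return reviews_per_year * hours_per_review


if __name__ == "__main__":
    print(annual_review_hours(REVIEWS_PER_YEAR, HOURS_PER_REVIEW))  # 48
```

Under these assumptions, a typical reviewer would spend roughly 48 unpaid hours per year on peer review.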

HOW TO DETERMINE IF A JOURNAL IS PEER REVIEWED

Ulrichsweb is a directory that provides information on over 300,000 periodicals, including information regarding which journals are peer reviewed ( 13 ). After logging into the system using an institutional login (e.g., from the University of Toronto), search terms, journal titles or ISSNs can be entered into the search bar. The database provides the title, publisher, and country of origin of the journal, and indicates whether the journal is still actively publishing. The black book symbol (labelled ‘refereed’) reveals that the journal is peer reviewed.

THE EVALUATION CRITERIA FOR PEER REVIEW OF SCIENTIFIC PAPERS

As previously mentioned, when a reviewer receives a scientific manuscript, he/she will first determine if the subject matter is well suited for the content of the journal. The reviewer will then consider whether the research question is important and original, a process which may be aided by a literature scan of review articles.

Scientific papers submitted for peer review usually follow a specific structure that begins with the title, followed by the abstract, introduction, methodology, results, discussion, conclusions, and references. The title must be descriptive and include the concept and organism investigated, and potentially the variable manipulated and the systems used in the study. The peer reviewer evaluates if the title is descriptive enough, and ensures that it is clear and concise. A study by the National Association of Realtors (NAR) published by the Oxford University Press in 2006 indicated that the title of a manuscript plays a significant role in determining reader interest, as 72% of respondents said they could usually judge whether an article will be of interest to them based on the title and the author, while 13% of respondents claimed to always be able to do so ( 14 ).

The abstract is a summary of the paper, which briefly mentions the background or purpose, methods, key results, and major conclusions of the study. The peer reviewer assesses whether the abstract is sufficiently informative and if the content of the abstract is consistent with the rest of the paper. The NAR study indicated that 40% of respondents could determine whether an article would be of interest to them based on the abstract alone 60-80% of the time, while 32% could judge an article based on the abstract 80-100% of the time ( 14 ). This demonstrates that the abstract alone is often used to assess the value of an article.

The introduction of a scientific paper presents the research question in the context of what is already known about the topic, in order to identify why the question being studied is of interest to the scientific community, and what gap in knowledge the study aims to fill ( 15 ). The introduction identifies the study’s purpose and scope, briefly describes the general methods of investigation, and outlines the hypothesis and predictions ( 15 ). The peer reviewer determines whether the introduction provides sufficient background information on the research topic, and ensures that the research question and hypothesis are clearly identifiable.

The methods section describes the experimental procedures, and explains why each experiment was conducted. The methods section also includes the equipment and reagents used in the investigation. The methods section should be detailed enough that it can be used to repeat the experiment ( 15 ). Methods are written in the past tense and in the active voice. The peer reviewer assesses whether the appropriate methods were used to answer the research question, and if they were written with sufficient detail. If information is missing from the methods section, it is the peer reviewer’s job to identify what details need to be added.

The results section is where the outcomes of the experiment and trends in the data are explained without judgement, bias or interpretation ( 15 ). This section can include statistical tests performed on the data, as well as figures and tables in addition to the text. The peer reviewer ensures that the results are described with sufficient detail, and determines their credibility. Reviewers also confirm that the text is consistent with the information presented in tables and figures, and that all figures and tables included are important and relevant ( 15 ). The peer reviewer will also make sure that table and figure captions are appropriate both contextually and in length, and that tables and figures present the data accurately.

The discussion section is where the data is analyzed. Here, the results are interpreted and related to past studies ( 15 ). The discussion describes the meaning and significance of the results in terms of the research question and hypothesis, and states whether the hypothesis was supported or rejected. This section may also provide possible explanations for unusual results and suggestions for future research ( 15 ). The discussion should end with a conclusions section that summarizes the major findings of the investigation. The peer reviewer determines whether the discussion is clear and focused, and whether the conclusions are an appropriate interpretation of the results. Reviewers also ensure that the discussion addresses the limitations of the study, any anomalies in the results, the relationship of the study to previous research, and the theoretical implications and practical applications of the study.

The references are found at the end of the paper, and list all of the information sources cited in the text to describe the background, methods, and/or interpret results. Depending on the citation method used, the references are listed in alphabetical order according to author last name, or numbered according to the order in which they appear in the paper. The peer reviewer ensures that references are used appropriately, cited accurately, formatted correctly, and that none are missing.

Finally, the peer reviewer determines whether the paper is clearly written and if the content seems logical. After thoroughly reading through the entire manuscript, they determine whether it meets the journal’s standards for publication, and whether it falls within the top 25% of papers in its field ( 16 ) to determine priority for publication. An overview of what a peer reviewer looks for when evaluating a manuscript, in order of importance, is presented in Figure 2 .

Figure 2. How a peer review evaluates a manuscript

To increase the chance of success in the peer review process, the author must ensure that the paper fully complies with the journal guidelines before submission. The author must also be open to criticism and suggested revisions, and learn from mistakes made in previous submissions.

ADVANTAGES AND DISADVANTAGES OF THE DIFFERENT TYPES OF PEER REVIEW

The peer review process is generally conducted in one of three ways: open review, single-blind review, or double-blind review. In an open review, both the author of the paper and the peer reviewer know one another’s identity. Alternatively, in single-blind review, the reviewer’s identity is kept private, but the author’s identity is revealed to the reviewer. In double-blind review, the identities of both the reviewer and author are kept anonymous. Open peer review is advantageous in that it prevents the reviewer from leaving malicious comments, being careless, or procrastinating completion of the review ( 2 ). It encourages reviewers to be open and honest without being disrespectful. Open reviewing also discourages plagiarism amongst authors ( 2 ). On the other hand, open peer review can also prevent reviewers from being honest for fear of developing bad rapport with the author. The reviewer may withhold or tone down their criticisms in order to be polite ( 2 ). This is especially true when younger reviewers are given a more esteemed author’s work, in which case the reviewer may be hesitant to provide criticism for fear that it will dampen their relationship with a superior ( 2 ). According to the Sense About Science survey, editors find that completely open reviewing decreases the number of people willing to participate, and leads to reviews of little value ( 12 ). In the aforementioned study by the PRC, only 23% of authors surveyed had experience with open peer review ( 7 ).

Single-blind peer review is by far the most common. In the PRC study, 85% of authors surveyed had experience with single-blind peer review ( 7 ). This method is advantageous as the reviewer is more likely to provide honest feedback when their identity is concealed ( 2 ). This allows the reviewer to make independent decisions without the influence of the author ( 2 ). The main disadvantage of reviewer anonymity, however, is that reviewers who receive manuscripts on subjects similar to their own research may be tempted to delay completing the review in order to publish their own data first ( 2 ).

Double-blind peer review is advantageous as it prevents the reviewer from being biased against the author based on their country of origin or previous work ( 2 ). This allows the paper to be judged based on the quality of the content, rather than the reputation of the author. The Sense About Science survey indicates that 76% of researchers think double-blind peer review is a good idea ( 12 ), and the PRC survey indicates that 45% of authors have had experience with double-blind peer review ( 7 ). The disadvantage of double-blind peer review is that, especially in niche areas of research, it can sometimes be easy for the reviewer to determine the identity of the author based on writing style, subject matter or self-citation, and thus, impart bias ( 2 ).

Masking the author’s identity from peer reviewers, as is the case in double-blind review, is generally thought to minimize bias and maintain review quality. A study by Justice et al. in 1998 investigated whether masking author identity affected the quality of the review ( 17 ). One hundred and eighteen manuscripts were randomized; 26 were peer reviewed as normal, and 92 were moved into the ‘intervention’ arm, where editor quality assessments were completed for 77 manuscripts and author quality assessments were completed for 40 manuscripts ( 17 ). There was no perceived difference in quality between the masked and unmasked reviews. Additionally, the masking itself was often unsuccessful, especially with well-known authors ( 17 ). However, a previous study conducted by McNutt et al. had different results ( 18 ). In this case, blinding was successful 73% of the time, and they found that when author identity was masked, the quality of review was slightly higher ( 18 ). Although Justice et al. argued that this difference was too small to be consequential, their study targeted only biomedical journals, and the results cannot be generalized to journals of a different subject matter ( 17 ). Additionally, there were problems masking the identities of well-known authors, introducing a flaw in the methods. Regardless, Justice et al. concluded that masking author identity from reviewers may not improve review quality ( 17 ).

In addition to open, single-blind and double-blind peer review, there are two experimental forms of peer review. In some cases, following publication, papers may be subjected to post-publication peer review. As many papers are now published online, the scientific community has the opportunity to comment on these papers, engage in online discussions and post a formal review. For example, online publishers PLOS and BioMed Central have enabled scientists to post comments on published papers if they are registered users of the site ( 10 ). Philica is another journal launched with this experimental form of peer review. Only 8% of authors surveyed in the PRC study had experience with post-publication review ( 7 ). Another experimental form of peer review called Dynamic Peer Review has also emerged. Dynamic peer review is conducted on websites such as Naboj, which allow scientists to conduct peer reviews on articles in the preprint media ( 19 ). The peer review is conducted on repositories and is a continuous process, which allows the public to see both the article and the reviews as the article is being developed ( 19 ). Dynamic peer review helps prevent plagiarism as the scientific community will already be familiar with the work before the peer reviewed version appears in print ( 19 ). Dynamic review also reduces the time lag between manuscript submission and publishing. An example of a preprint server is the ‘arXiv’ developed by Paul Ginsparg in 1991, which is used primarily by physicists ( 19 ). These alternative forms of peer review are still un-established and experimental. Traditional peer review is time-tested and still highly utilized. All methods of peer review have their advantages and deficiencies, and all are prone to error.

PEER REVIEW OF OPEN ACCESS JOURNALS

Open access (OA) journals are becoming increasingly popular as they allow the potential for widespread distribution of publications in a timely manner ( 20 ). Nevertheless, there can be issues regarding the peer review process of open access journals. In a study published in Science in 2013, John Bohannon submitted 304 slightly different versions of a fictional scientific paper (written by a fake author, working out of a non-existent institution) to a selected group of OA journals. This study was performed in order to determine whether papers submitted to OA journals are properly reviewed before publication in comparison to subscription-based journals. The journals in this study were selected from the Directory of Open Access Journals (DOAJ) and Beall’s List, a list of journals which are potentially predatory, and all required a fee for publishing ( 21 ). Of the 304 journals, 157 accepted a fake paper, suggesting that acceptance was based on financial interest rather than the quality of the article itself, while 98 journals promptly rejected the fakes ( 21 ). Although this study highlights useful information on the problems associated with lower quality publishers that do not have an effective peer review system in place, the article also generalizes the study results to all OA journals, which can be detrimental to the general perception of OA journals. There were two limitations of the study that made it impossible to accurately determine the relationship between peer review and OA journals: 1) there was no control group (subscription-based journals), and 2) the fake papers were sent to a non-randomized selection of journals, resulting in bias.

JOURNAL ACCEPTANCE RATES

Based on a recent survey, the average acceptance rate for papers submitted to scientific journals is about 50% ( 7 ). Of all submitted manuscripts, 20% are rejected prior to review and 30% are rejected following review ( 7 ). The 50% that are accepted comprise 41% accepted on the condition of revision and only 9% accepted without a request for revision ( 7 ).
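The percentages quoted above are easier to follow when tallied as shares of all submitted manuscripts. The short Python sketch below is only an illustration: the dictionary keys and the `summarize` helper are hypothetical names introduced here, and the figures are simply those cited in the text. It checks that the four outcomes account for every submission and derives the proportion of accepted papers that required revision.

```python
# Shares of all submitted manuscripts, in percent, as quoted in the text.
# The labels and the helper below are illustrative names, not from the survey.
outcomes = {
    "rejected before review": 20,
    "rejected after review": 30,
    "accepted with revision": 41,
    "accepted without revision": 9,
}


def summarize(shares):
    """Return (accepted %, rejected %, % of accepted papers that needed revision)."""
    accepted = sum(v for k, v in shares.items() if k.startswith("accepted"))
    rejected = sum(v for k, v in shares.items() if k.startswith("rejected"))
    revised_share = 100 * shares["accepted with revision"] / accepted
    return accepted, rejected, revised_share


accepted, rejected, revised_share = summarize(outcomes)
print(f"accepted: {accepted}%, rejected: {rejected}%")
print(f"share of accepted papers that required revision: {revised_share:.0f}%")
```

Read this way, roughly four out of five accepted papers were accepted only on the condition of revision.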

SATISFACTION WITH THE PEER REVIEW SYSTEM

Based on a recent survey by the PRC, 64% of academics are satisfied with the current system of peer review, and only 12% claimed to be ‘dissatisfied’ ( 7 ). The large majority, 85%, agreed with the statement that ‘scientific communication is greatly helped by peer review’ ( 7 ). There was a similarly high level of support (83%) for the idea that peer review ‘provides control in scientific communication’ ( 7 ).

HOW TO PEER REVIEW EFFECTIVELY

The following are ten tips on how to be an effective peer reviewer as indicated by Brian Lucey, an expert on the subject ( 22 ):

1) Be professional

Peer review is a mutual responsibility among fellow scientists, and scientists are expected, as part of the academic community, to take part in peer review. If one is to expect others to review their work, they should commit to reviewing the work of others as well, and put effort into it.

2) Be pleasant

If the paper is of low quality, suggest that it be rejected, but do not leave ad hominem comments. There is no benefit to being ruthless.

3) Read the invite

When emailing a scientist to ask them to conduct a peer review, the majority of journals will provide a link to either accept or reject. Do not reply to the email; respond using the link.

4) Be helpful

Suggest how the authors can overcome the shortcomings in their paper. A review should guide the author on what is good and what needs work from the reviewer’s perspective.

5) Be scientific

The peer reviewer plays the role of a scientific peer, not an editor for proofreading or decision-making. Don’t fill a review with comments on editorial and typographic issues. Instead, focus on adding value with scientific knowledge and commenting on the credibility of the research conducted and conclusions drawn. If the paper has a lot of typographical errors, suggest that it be professionally proof edited as part of the review.

6) Be timely

Stick to the timeline given when conducting a peer review. Editors track who is reviewing what and when and will know if someone is late on completing a review. It is important to be timely both out of respect for the journal and the author, as well as to not develop a reputation of being late for review deadlines.

7) Be realistic

The peer reviewer must be realistic about the work presented, the changes they suggest and their role. Peer reviewers may set the bar too high for the paper they are reviewing by proposing changes that are too ambitious, in which case editors must override them.

8) Be empathetic

Ensure that the review is scientific, helpful and courteous. Be sensitive and respectful with word choice and tone in a review.

9) Be open

Remember that both specialists and generalists can provide valuable insight when peer reviewing. Editors will try to get both specialised and general reviewers for any particular paper to allow for different perspectives. If someone is asked to review, the editor has determined they have a valid and useful role to play, even if the paper is not in their area of expertise.

10) Be organised

A review requires structure and logical flow. A reviewer should proofread their review before submitting it for structural, grammatical and spelling errors as well as for clarity. Most publishers provide short guides on structuring a peer review on their website. Begin with an overview of the proposed improvements; then provide feedback on the paper structure, the quality of data sources and methods of investigation used, the logical flow of argument, and the validity of conclusions drawn. Then provide feedback on style, voice and lexical concerns, with suggestions on how to improve.

In addition, the American Physiological Society (APS) recommends in its Peer Review 101 Handout that peer reviewers should put themselves in both the editor’s and author’s shoes to ensure that they provide what both the editor and the author need and expect ( 11 ). To please the editor, the reviewer should ensure that the peer review is completed on time, and that it provides clear explanations to back up recommendations. To be helpful to the author, the reviewer must ensure that their feedback is constructive. It is suggested that the reviewer take time to think about the paper; they should read it once, wait at least a day, and then re-read it before writing the review ( 11 ). The APS also suggests that graduate students and researchers pay attention to how peer reviewers edit their work, as well as to what edits they find helpful, in order to learn how to peer review effectively ( 11 ). Additionally, it is suggested that graduate students practice reviewing by editing their peers’ papers and asking a faculty member for feedback on their efforts. It is recommended that young scientists offer to peer review as often as possible in order to become skilled at the process ( 11 ). The majority of students, fellows and trainees do not get formal training in peer review, but rather learn by observing their mentors. According to the APS, one acquires experience through networking and referrals, and should therefore try to strengthen relationships with journal editors by offering to review manuscripts ( 11 ). The APS also suggests that experienced reviewers provide constructive feedback to students and junior colleagues on their peer review efforts, and encourage them to peer review to demonstrate the importance of this process in improving science ( 11 ).

The peer reviewer should only comment on areas of the manuscript that they are knowledgeable about ( 23 ). If there is any section of the manuscript they feel they are not qualified to review, they should mention this in their comments and not provide further feedback on that section. The peer reviewer is not permitted to share any part of the manuscript with a colleague (even if they may be more knowledgeable in the subject matter) without first obtaining permission from the editor ( 23 ). If a peer reviewer comes across something they are unsure of in the paper, they can consult the literature to try and gain insight. It is important for scientists to remember that if a paper can be improved by the expertise of one of their colleagues, the journal must be informed of the colleague’s help, and approval must be obtained for their colleague to read the protected document. Additionally, the colleague must be identified in the confidential comments to the editor, in order to ensure that he/she is appropriately credited for any contributions ( 23 ). It is the job of the reviewer to make sure that the colleague assisting is aware of the confidentiality of the peer review process ( 23 ). Once the review is complete, the manuscript must be destroyed and cannot be saved electronically by the reviewers ( 23 ).

COMMON ERRORS IN SCIENTIFIC PAPERS

When performing a peer review, there are some common scientific errors to look out for. Most of these errors are violations of logic and common sense: these may include contradicting statements, unwarranted conclusions, suggestion of causation when there is only support for correlation, inappropriate extrapolation, circular reasoning, or pursuit of a trivial question ( 24 ). It is also common for authors to suggest that two variables are different because the effects of one variable are statistically significant while the effects of the other variable are not, rather than directly comparing the two variables ( 24 ). Authors sometimes overlook a confounding variable and do not control for it, or forget to include important details on how their experiments were controlled or the physical state of the organisms studied ( 24 ). Another common fault is the author’s failure to define terms or use words with precision, as these practices can mislead readers ( 24 ). Jargon and/or misused terms can be a serious problem in papers. Inaccurate statements about specific citations are also a common occurrence ( 24 ). Additionally, many studies produce knowledge that can be applied to areas of science outside the scope of the original study; it is therefore better for reviewers to look at the novelty of the idea, conclusions, data, and methodology, rather than scrutinize whether or not the paper answered the specific question at hand ( 24 ). Although it is important to recognize these points, when performing a review it is generally better practice for the peer reviewer to not focus on a checklist of things that could be wrong, but rather carefully identify the problems specific to each paper and continuously ask themselves if anything is missing ( 24 ). An extremely detailed description of how to conduct peer review effectively is presented in the paper How I Review an Original Scientific Article written by Frederic G. Hoppin, Jr. It can be accessed through the American Physiological Society website under the Peer Review Resources section.

CRITICISM OF PEER REVIEW

A major criticism of peer review is that there is little evidence that the process actually works, that it is actually an effective screen for good quality scientific work, and that it actually improves the quality of scientific literature. As a 2002 study published in the Journal of the American Medical Association concluded, ‘Editorial peer review, although widely used, is largely untested and its effects are uncertain’ ( 25 ). Critics also argue that peer review is not effective at detecting errors. Highlighting this point, an experiment by Godlee et al. published in the British Medical Journal (BMJ) inserted eight deliberate errors into a paper that was nearly ready for publication, and then sent the paper to 420 potential reviewers ( 7 ). Of the 420 reviewers that received the paper, 221 (53%) responded. The average number of errors spotted by reviewers was two, no reviewer spotted more than five errors, and 35 reviewers (16%) did not spot any.

Another criticism of peer review is that scientific conferences seeking large numbers of submitted papers do not conduct the process thoroughly. Such conferences often accept any paper sent in, regardless of its credibility or the prevalence of errors, because the more papers they accept, the more money they can make from author registration fees ( 26 ). This misconduct was exposed in 2005 by three MIT graduate students by the names of Jeremy Stribling, Dan Aguayo and Maxwell Krohn, who developed a simple computer program called SCIgen that generates nonsense papers and presents them as scientific papers ( 26 ). Subsequently, a nonsense SCIgen paper submitted to a conference was promptly accepted. Nature recently reported that French researcher Cyril Labbé discovered that sixteen SCIgen nonsense papers had been published by the German academic publisher Springer ( 26 ). Over 100 nonsense papers generated by SCIgen were published by the US Institute of Electrical and Electronics Engineers (IEEE) ( 26 ). Both organisations have been working to remove the papers. Labbé developed a program to detect SCIgen papers and has made it freely available to ensure publishers and conference organizers do not accept nonsense work in the future. It is available at this link: http://scigendetect.on.imag.fr/main.php ( 26 ).

Additionally, peer review is often criticized for being unable to accurately detect plagiarism. However, many believe that detecting plagiarism cannot practically be included as a component of peer review. As explained by Alice Tuff, development manager at Sense About Science, ‘The vast majority of authors and reviewers think peer review should detect plagiarism (81%) but only a minority (38%) think it is capable. The academic time involved in detecting plagiarism through peer review would cause the system to grind to a halt’ ( 27 ). Publishing house Elsevier began developing electronic plagiarism tools with the help of journal editors in 2009 to help improve this issue ( 27 ).

It has also been argued that peer review has lowered research quality by limiting creativity amongst researchers. Proponents of this view claim that peer review has repressed scientists from pursuing innovative research ideas and bold research questions that have the potential to make major advances and paradigm shifts in the field, as they believe that this work will likely be rejected by their peers upon review ( 28 ). Indeed, in some cases peer review may result in rejection of innovative research, as some studies may not seem particularly strong initially, yet may be capable of yielding very interesting and useful developments when examined under different circumstances, or in the light of new information ( 28 ). Scientists that do not believe in peer review argue that the process stifles the development of ingenious ideas, and thus the release of fresh knowledge and new developments into the scientific community.

Peer review is also criticized because the number of people competent to conduct it is limited compared to the vast number of papers that need reviewing. An enormous number of papers are published (1.3 million papers in 23,750 journals in 2006), and the number of competent peer reviewers available could not have reviewed them all ( 29 ). Thus, people who lack the required expertise to analyze the quality of a research paper are conducting reviews, and weak papers are being accepted as a result. It is now possible to publish any paper in an obscure journal that claims to be peer-reviewed, though the paper or journal itself could be substandard ( 29 ). On a similar note, the US National Library of Medicine indexes 39 journals that specialize in alternative medicine, and though they all identify themselves as “peer-reviewed”, they rarely publish any high quality research ( 29 ). This highlights the fact that peer review of more controversial or specialized work is typically performed by people who are interested and hold similar views or opinions as the author, which can cause bias in their review. For instance, a paper on homeopathy is likely to be reviewed by fellow practicing homeopaths, and thus is likely to be accepted as credible, though other scientists may find the paper to be nonsense ( 29 ). In some cases, papers are initially published, but their credibility is challenged at a later date and they are subsequently retracted. Retraction Watch is a website dedicated to revealing papers that have been retracted after publishing, potentially due to improper peer review ( 30 ).

Additionally, despite its many positive outcomes, peer review is also criticized for delaying the dissemination of new knowledge into the scientific community, and for being an unpaid activity that takes scientists’ time away from activities that they would otherwise prioritize, such as research and teaching, for which they are paid ( 31 ). As described by Eva Amsen, Outreach Director for F1000Research, peer review was originally developed as a means of helping editors choose which papers to publish when journals had to limit the number of papers they could print in one issue ( 32 ). However, nowadays most journals are available online, either exclusively or in addition to print, and many journals have very limited printing runs ( 32 ). Since there are no longer page limits to journals, any good work can and should be published. Consequently, being selective for the purpose of saving space in a journal is no longer a valid excuse that peer reviewers can use to reject a paper ( 32 ). However, some reviewers have used this excuse when they have personal ulterior motives, such as getting their own research published first.

RECENT INITIATIVES TOWARDS IMPROVING PEER REVIEW

F1000Research was launched in January 2013 by Faculty of 1000 as an open access journal that immediately publishes papers (after an initial check to ensure that the paper is in fact produced by a scientist and has not been plagiarised), and then conducts transparent post-publication peer review ( 32 ). F1000Research aims to prevent delays in new science reaching the academic community that are caused by prolonged publication times ( 32 ). It also aims to make peer reviewing more fair by eliminating any anonymity, which prevents reviewers from delaying the completion of a review so they can publish their own similar work first ( 32 ). F1000Research offers completely open peer review, where everything is published, including the name of the reviewers, their review reports, and the editorial decision letters ( 32 ).

PeerJ was founded by Jason Hoyt and Peter Binfield in June 2012 as an open access, peer reviewed scholarly journal for the Biological and Medical Sciences ( 33 ). PeerJ selects articles to publish based only on scientific and methodological soundness, not on subjective determinants of ‘impact’, ‘novelty’ or ‘interest’ ( 34 ). It works on a “lifetime publishing plan” model which charges scientists for publishing plans that give them lifetime rights to publish with PeerJ, rather than charging them per publication ( 34 ). PeerJ also encourages open peer review, and authors are given the option to post the full peer review history of their submission with their published article ( 34 ). PeerJ also offers a pre-print review service called PeerJ Pre-prints, in which paper drafts are reviewed before being sent to PeerJ to publish ( 34 ).

Rubriq is an independent peer review service designed by Shashi Mudunuri and Keith Collier to improve the peer review system ( 35 ). Rubriq is intended to decrease redundancy in the peer review process so that the time lost in redundant reviewing can be put back into research ( 35 ). According to Keith Collier, over 15 million hours are lost each year to redundant peer review, as papers get rejected from one journal and are subsequently submitted to a less prestigious journal where they are reviewed again ( 35 ). Authors often have to submit their manuscript to multiple journals, and are often rejected multiple times before they find the right match. This process could take months or even years ( 35 ). Rubriq makes peer review portable in order to help authors choose the journal that is best suited for their manuscript from the beginning, thus reducing the time before their paper is published ( 35 ). Rubriq operates under an author-pay model, in which the author pays a fee and their manuscript undergoes double-blind peer review by three expert academic reviewers using a standardized scorecard ( 35 ). The majority of the author’s fee goes towards a reviewer honorarium ( 35 ). The papers are also screened for plagiarism using iThenticate ( 35 ). Once the manuscript has been reviewed by the three experts, the most appropriate journal for submission is determined based on the topic and quality of the paper ( 35 ). The paper is returned to the author in 1-2 weeks with the Rubriq Report ( 35 ). The author can then submit their paper to the suggested journal with the Rubriq Report attached. The Rubriq Report will give the journal editors a much stronger incentive to consider the paper as it shows that three experts have recommended the paper to them ( 35 ). Rubriq also has its benefits for reviewers; the Rubriq scorecard gives structure to the peer review process, and thus makes it consistent and efficient, which decreases time and stress for the reviewer. 
Reviewers also receive feedback on their reviews and most significantly, they are compensated for their time ( 35 ). Journals also benefit, as they receive pre-screened papers, reducing the number of papers sent to their own reviewers, which often end up rejected ( 35 ). This can reduce reviewer fatigue, and allow only higher-quality articles to be sent to their peer reviewers ( 35 ).

According to Eva Amsen, peer review and scientific publishing are moving in a new direction, in which all papers will be posted online, and a post-publication peer review will take place that is independent of specific journal criteria and solely focused on improving paper quality ( 32 ). Journals will then choose papers that they find relevant based on the peer reviews and publish those papers as a collection ( 32 ). In this process, peer review and individual journals are uncoupled ( 32 ). In Keith Collier’s opinion, post-publication peer review is likely to become more prevalent as a complement to pre-publication peer review, but not as a replacement ( 35 ). Post-publication peer review will not serve to identify errors and fraud but will provide an additional measurement of impact ( 35 ). Collier also believes that as journals and publishers consolidate into larger systems, there will be stronger potential for “cascading” and shared peer review ( 35 ).

CONCLUDING REMARKS

Peer review has become fundamental in assisting editors in selecting credible, high quality, novel and interesting research papers to publish in scientific journals and to ensure the correction of any errors or issues present in submitted papers. Though the peer review process still has some flaws and deficiencies, a more suitable screening method for scientific papers has not yet been proposed or developed. Researchers have begun and must continue to look for means of addressing the current issues with peer review to ensure that it is a foolproof system that allows only quality research papers to be released into the scientific community.

April 3, 2024

A periodic table of primes: Research team claims that prime numbers can be predicted

by Michael Gibb, City University of Hong Kong

Huge breakthrough in prime number theory: study from City University of Hong Kong demonstrates primes can be predicted

Both arithmetic aficionados and the mathematically challenged will be equally captivated by new research that upends hundreds of years of popular belief about prime numbers.

Contrary to what just about every mathematician on Earth will tell you, prime numbers can be predicted, according to researchers at City University of Hong Kong (CityUHK) and North Carolina State University, U.S.

The research team comprises Han-Lin Li, Shu-Cherng Fang, and Way Kuo. Fang is the Walter Clark Chair Professor of Industrial and Systems Engineering at North Carolina State University. Kuo is a Senior Fellow at the Hong Kong Institute for Advanced Study, CityU.

This is a genuinely revolutionary development in prime number theory, says Way Kuo, who is working on the project alongside researchers from the U.S. The team leader is Han-Lin Li, a Visiting Professor in the Department of Computer Science at CityUHK.

We have known for millennia that an infinite number of prime numbers, i.e., 2, 3, 5, 7, 11, etc., can be divided by themselves and the number 1 only. But until now, we have not been able to predict where the next prime will pop up in a sequence of numbers. In fact, mathematicians have generally agreed that prime numbers are like weeds: they seem just to shoot out randomly.

"But our team has devised a way to predict accurately and swiftly when prime numbers will appear," adds Kuo.

The technical aspects of the research are daunting for all but a handful of mathematicians worldwide. In a nutshell, the outcome of the team's research is a handy periodic table of primes, or the PTP, pointing to the locations of prime numbers. The research is available as a working paper in the SSRN Electronic Journal.

The PTP can be used to shed light on finding a future prime, factoring an integer, visualizing an integer and its factors, identifying locations of twin primes, predicting the total number of primes and twin primes or estimating the maximum prime gap within an interval, among others.

More to the point, the PTP has major applications today in areas such as cyber security. Primes are already a fundamental part of encryption and cryptography, so this breakthrough means data can be made much more secure if we can predict prime numbers, Kuo explains.

This advance in prime number research stemmed from working on systems reliability design and a color coding system that uses prime numbers to enable efficient encoding and more effective color compression. During their research, the team discovered that their calculations could be used to predict prime numbers.

Provided by City University of Hong Kong


  • Open access
  • Published: 26 March 2024

Predicting and improving complex beer flavor through machine learning

  • Michiel Schreurs   ORCID: orcid.org/0000-0002-9449-5619 1 , 2 , 3   na1 ,
  • Supinya Piampongsant 1 , 2 , 3   na1 ,
  • Miguel Roncoroni   ORCID: orcid.org/0000-0001-7461-1427 1 , 2 , 3   na1 ,
  • Lloyd Cool   ORCID: orcid.org/0000-0001-9936-3124 1 , 2 , 3 , 4 ,
  • Beatriz Herrera-Malaver   ORCID: orcid.org/0000-0002-5096-9974 1 , 2 , 3 ,
  • Christophe Vanderaa   ORCID: orcid.org/0000-0001-7443-5427 4 ,
  • Florian A. Theßeling 1 , 2 , 3 ,
  • Łukasz Kreft   ORCID: orcid.org/0000-0001-7620-4657 5 ,
  • Alexander Botzki   ORCID: orcid.org/0000-0001-6691-4233 5 ,
  • Philippe Malcorps 6 ,
  • Luk Daenen 6 ,
  • Tom Wenseleers   ORCID: orcid.org/0000-0002-1434-861X 4 &
  • Kevin J. Verstrepen   ORCID: orcid.org/0000-0002-3077-6219 1 , 2 , 3  

Nature Communications volume 15, Article number: 2368 (2024)


  • Chemical engineering
  • Gas chromatography
  • Machine learning
  • Metabolomics
  • Taste receptors

The perception and appreciation of food flavor depends on many interacting chemical compounds and external factors, and therefore proves challenging to understand and predict. Here, we combine extensive chemical and sensory analyses of 250 different beers to train machine learning models that allow predicting flavor and consumer appreciation. For each beer, we measure over 200 chemical properties, perform quantitative descriptive sensory analysis with a trained tasting panel and map data from over 180,000 consumer reviews to train 10 different machine learning models. The best-performing algorithm, Gradient Boosting, yields models that significantly outperform predictions based on conventional statistics and accurately predict complex food features and consumer appreciation from chemical profiles. Model dissection allows identifying specific and unexpected compounds as drivers of beer flavor and appreciation. Adding these compounds results in variants of commercial alcoholic and non-alcoholic beers with improved consumer appreciation. Together, our study reveals how big data and machine learning uncover complex links between food chemistry, flavor and consumer perception, and lays the foundation to develop novel, tailored foods with superior flavors.


Introduction

Predicting and understanding food perception and appreciation is one of the major challenges in food science. Accurate modeling of food flavor and appreciation could yield important opportunities for both producers and consumers, including quality control, product fingerprinting, counterfeit detection, spoilage detection, and the development of new products and product combinations (food pairing) 1 , 2 , 3 , 4 , 5 , 6 . Accurate models for flavor and consumer appreciation would contribute greatly to our scientific understanding of how humans perceive and appreciate flavor. Moreover, accurate predictive models would also facilitate and standardize existing food assessment methods and could supplement or replace assessments by trained and consumer tasting panels, which are variable, expensive and time-consuming 7 , 8 , 9 . Lastly, apart from providing objective, quantitative, accurate and contextual information that can help producers, models can also guide consumers in understanding their personal preferences 10 .

Despite the myriad of applications, predicting food flavor and appreciation from its chemical properties remains a largely elusive goal in sensory science, especially for complex food and beverages 11 , 12 . A key obstacle is the immense number of flavor-active chemicals underlying food flavor. Flavor compounds can vary widely in chemical structure and concentration, making them technically challenging and labor-intensive to quantify, even in the face of innovations in metabolomics, such as non-targeted metabolic fingerprinting 13 , 14 . Moreover, sensory analysis is perhaps even more complicated. Flavor perception is highly complex, resulting from hundreds of different molecules interacting at the physiochemical and sensorial level. Sensory perception is often non-linear, characterized by complex and concentration-dependent synergistic and antagonistic effects 15 , 16 , 17 , 18 , 19 , 20 , 21 that are further convoluted by the genetics, environment, culture and psychology of consumers 22 , 23 , 24 . Perceived flavor is therefore difficult to measure, with problems of sensitivity, accuracy, and reproducibility that can only be resolved by gathering sufficiently large datasets 25 . Trained tasting panels are considered the prime source of quality sensory data, but require meticulous training, are low throughput and high cost. Public databases containing consumer reviews of food products could provide a valuable alternative, especially for studying appreciation scores, which do not require formal training 25 . Public databases offer the advantage of amassing large amounts of data, increasing the statistical power to identify potential drivers of appreciation. However, public datasets suffer from biases, including a bias in the volunteers that contribute to the database, as well as confounding factors such as price, cult status and psychological conformity towards previous ratings of the product.

Classical multivariate statistics and machine learning methods have been used to predict flavor of specific compounds by, for example, linking structural properties of a compound to its potential biological activities or linking concentrations of specific compounds to sensory profiles 1 , 26 . Importantly, most previous studies focused on predicting organoleptic properties of single compounds (often based on their chemical structure) 27 , 28 , 29 , 30 , 31 , 32 , 33 , thus ignoring the fact that these compounds are present in a complex matrix in food or beverages and excluding complex interactions between compounds. Moreover, the classical statistics commonly used in sensory science 34 , 35 , 36 , 37 , 38 , 39 require a large sample size and sufficient variance amongst predictors to create accurate models. They are not fit for studying an extensive set of hundreds of interacting flavor compounds, since they are sensitive to outliers, have a high tendency to overfit and are less suited for non-linear and discontinuous relationships 40 .

In this study, we combine extensive chemical analyses and sensory data of a set of different commercial beers with machine learning approaches to develop models that predict taste, smell, mouthfeel and appreciation from compound concentrations. Beer is particularly suited to model the relationship between chemistry, flavor and appreciation. First, beer is a complex product, consisting of thousands of flavor compounds that partake in complex sensory interactions 41 , 42 , 43 . This chemical diversity arises from the raw materials (malt, yeast, hops, water and spices) and biochemical conversions during the brewing process (kilning, mashing, boiling, fermentation, maturation and aging) 44 , 45 . Second, the advent of the internet saw beer consumers embrace online review platforms, such as RateBeer (ZX Ventures, Anheuser-Busch InBev SA/NV) and BeerAdvocate (Next Glass, inc.). In this way, the beer community provides massive data sets of beer flavor and appreciation scores, creating extraordinarily large sensory databases to complement the analyses of our professional sensory panel. Specifically, we characterize over 200 chemical properties of 250 commercial beers, spread across 22 beer styles, and link these to the descriptive sensory profiling data of a 16-person in-house trained tasting panel and data acquired from over 180,000 public consumer reviews. These unique and extensive datasets enable us to train a suite of machine learning models to predict flavor and appreciation from a beer’s chemical profile. Dissection of the best-performing models allows us to pinpoint specific compounds as potential drivers of beer flavor and appreciation. Follow-up experiments confirm the importance of these compounds and ultimately allow us to significantly improve the flavor and appreciation of selected commercial beers. 
Together, our study represents a significant step towards understanding complex flavors and reinforces the value of machine learning to develop and refine complex foods. In this way, it represents a stepping stone for further computer-aided food engineering applications 46 .

To generate a comprehensive dataset on beer flavor, we selected 250 commercial Belgian beers across 22 different beer styles (Supplementary Fig.  S1 ). Beers with ≤ 4.2% alcohol by volume (ABV) were classified as non-alcoholic and low-alcoholic. Blonds and Tripels constitute a significant portion of the dataset (12.4% and 11.2%, respectively) reflecting their presence on the Belgian beer market and the heterogeneity of beers within these styles. By contrast, lager beers are less diverse and dominated by a handful of brands. Rare styles such as Brut or Faro make up only a small fraction of the dataset (2% and 1%, respectively) because fewer of these beers are produced and because they are dominated by distinct characteristics in terms of flavor and chemical composition.

Extensive analysis identifies relationships between chemical compounds in beer

For each beer, we measured 226 different chemical properties, including common brewing parameters such as alcohol content, iso-alpha acids, pH, sugar concentration 47 , and over 200 flavor compounds (Methods, Supplementary Table  S1 ). A large portion (37.2%) are terpenoids arising from hopping, responsible for herbal and fruity flavors 16 , 48 . A second major category are yeast metabolites, such as esters and alcohols, that result in fruity and solvent notes 48 , 49 , 50 . Other measured compounds are primarily derived from malt, or other microbes such as non- Saccharomyces yeasts and bacteria (‘wild flora’). Compounds that arise from spices or staling are labeled under ‘Others’. Five attributes (caloric value, total acids and total ester, hop aroma and sulfur compounds) are calculated from multiple individually measured compounds.

As a first step in identifying relationships between chemical properties, we determined correlations between the concentrations of the compounds (Fig. 1, upper panel, Supplementary Data 1 and 2, and Supplementary Fig. S2; for the sake of clarity, only a subset of the measured compounds is shown in Fig. 1). Compounds of the same origin typically show a positive correlation, while absence of correlation hints at parameters varying independently. For example, the hop aroma compounds citronellol and alpha-terpineol show moderate correlations with each other (Spearman’s rho=0.39 and 0.57), but not with the bittering hop component iso-alpha acids (Spearman’s rho=0.16 and −0.07). This illustrates how brewers can independently modify hop aroma and bitterness by selecting hop varieties and dosage time. If hops are added early in the boiling phase, chemical conversions increase bitterness while aromas evaporate; conversely, late addition of hops preserves aroma but limits bitterness 51. Similarly, hop-derived iso-alpha acids show a strong anti-correlation with lactic acid and acetic acid, likely reflecting growth inhibition of lactic acid and acetic acid bacteria, or the consequent use of fewer hops in sour beer styles, such as West Flanders ales and Fruit beers, that rely on these bacteria for their distinct flavors 52. Finally, yeast-derived esters (ethyl acetate, ethyl decanoate, ethyl hexanoate, ethyl octanoate) and alcohols (ethanol, isoamyl alcohol, isobutanol, and glycerol) correlate with Spearman coefficients above 0.5, suggesting that these secondary metabolites are correlated with the yeast genetic background and/or fermentation parameters and may be difficult to influence individually, although the choice of yeast strain may offer some control 53.
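The rank correlations reported in this section need very little machinery to compute. The sketch below is a minimal, standard-library illustration of Spearman's rho, computed as the Pearson correlation of tie-averaged ranks. The function names and the toy concentration values are assumptions made for illustration; they are not taken from the study's data or code.

```python
from statistics import mean


def average_ranks(values):
    """Rank values from 1..n; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the full run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman_rho(x, y):
    """Spearman's rho: Pearson correlation computed on the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical concentrations of two compounds in five beers (made-up numbers).
compound_a = [1.0, 2.0, 3.0, 4.0, 5.0]
compound_b = [2.0, 1.0, 4.0, 3.0, 5.0]
rho = spearman_rho(compound_a, compound_b)
```

Because only ranks enter the calculation, any monotonic relationship yields rho = 1 regardless of its shape, which is why the measure suits concentrations spanning several orders of magnitude. With SciPy available, `scipy.stats.spearmanr` returns the same coefficient together with a p value.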

figure 1

Spearman rank correlations are shown. Descriptors are grouped according to their origin (malt (blue), hops (green), yeast (red), wild flora (yellow), Others (black)), and sensory aspect (aroma, taste, palate, and overall appreciation). Please note that for the chemical compounds, for the sake of clarity, only a subset of the total number of measured compounds is shown, with an emphasis on the key compounds for each source. For more details, see the main text and Methods section. Chemical data can be found in Supplementary Data  1 , correlations between all chemical compounds are depicted in Supplementary Fig.  S2 and correlation values can be found in Supplementary Data  2 . See Supplementary Data  4 for sensory panel assessments and Supplementary Data  5 for correlation values between all sensory descriptors.

Interestingly, different beer styles show distinct patterns for some flavor compounds (Supplementary Fig.  S3 ). These observations agree with expectations for key beer styles, and serve as a control for our measurements. For instance, Stouts generally show high values for color (darker), while hoppy beers contain elevated levels of iso-alpha acids, compounds associated with bitter hop taste. Acetic and lactic acid are not prevalent in most beers, with notable exceptions such as Kriek, Lambic, Faro, West Flanders ales and Flanders Old Brown, which use acid-producing bacteria ( Lactobacillus and Pediococcus ) or unconventional yeast ( Brettanomyces ) 54 , 55 . Glycerol, ethanol and esters show similar distributions across all beer styles, reflecting their common origin as products of yeast metabolism during fermentation 45 , 53 . Finally, low/no-alcohol beers contain low concentrations of glycerol and esters. This is in line with the production process for most of the low/no-alcohol beers in our dataset, which are produced through limiting fermentation or by stripping away alcohol via evaporation or dialysis, with both methods having the unintended side-effect of reducing the amount of flavor compounds in the final beer 56 , 57 .

Besides expected associations, our data also reveals less trivial associations between beer styles and specific parameters. For example, geraniol and citronellol, two monoterpenoids responsible for citrus, floral and rose flavors and characteristic of Citra hops, are found in relatively high amounts in Christmas, Saison, and Brett/co-fermented beers, where they may originate from terpenoid-rich spices such as coriander seeds instead of hops 58 .

Tasting panel assessments reveal sensorial relationships in beer

To assess the sensory profile of each beer, a trained tasting panel evaluated each of the 250 beers for 50 sensory attributes, including different hop, malt and yeast flavors, off-flavors and spices. Panelists used a tasting sheet (Supplementary Data  3 ) to score the different attributes. Panel consistency was evaluated by repeating 12 samples across different sessions and performing ANOVA. In 95% of cases no significant difference was found across sessions ( p  > 0.05), indicating good panel consistency (Supplementary Table  S2 ).
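As a rough illustration of the consistency check, the sketch below computes the F statistic of a one-way ANOVA for a single attribute of a single repeated beer, with tasting sessions as groups. The exact ANOVA design used by the authors is not described in this section, so the one-way layout, the function name and the scores are all assumptions.

```python
from statistics import mean


def one_way_anova_f(groups):
    """F statistic: between-group mean square over within-group mean square."""
    k = len(groups)                      # number of groups (sessions)
    n = sum(len(g) for g in groups)      # total number of observations
    grand_mean = mean(v for g in groups for v in g)
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((v - mean(g)) ** 2 for g in groups for v in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))


# Hypothetical bitterness scores for one beer tasted in three sessions.
sessions = [[3, 4, 5], [4, 5, 6], [3, 5, 4]]
f_stat = one_way_anova_f(sessions)
```

An F value near 1 means the variation between sessions is no larger than the variation among scores within a session. Converting F to a p value requires the F distribution; `scipy.stats.f_oneway` performs both steps.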

Aroma and taste perception reported by the trained panel are often linked (Fig.  1 , bottom left panel and Supplementary Data  4 and 5 ), with high correlations between hops aroma and taste (Spearman’s rho=0.83). Bitter taste was found to correlate with hop aroma and taste in general (Spearman’s rho=0.80 and 0.69), and particularly with “grassy” noble hops (Spearman’s rho=0.75). Barnyard flavor, most often associated with sour beers, is identified together with stale hops (Spearman’s rho=0.97) that are used in these beers. Lactic and acetic acid, which often co-occur, are correlated (Spearman’s rho=0.66). Interestingly, sweetness and bitterness are anti-correlated (Spearman’s rho = −0.48), confirming the hypothesis that they mask each other 59 , 60 . Beer body is highly correlated with alcohol (Spearman’s rho = 0.79), and overall appreciation is found to correlate with multiple aspects that describe beer mouthfeel (alcohol, carbonation; Spearman’s rho= 0.32, 0.39), as well as with hop and ester aroma intensity (Spearman’s rho=0.39 and 0.35).

Similar to the chemical analyses, sensorial analyses confirmed typical features of specific beer styles (Supplementary Fig.  S4 ). For example, sour beers (Faro, Flanders Old Brown, Fruit beer, Kriek, Lambic, West Flanders ale) were rated acidic, with flavors of both acetic and lactic acid. Hoppy beers were found to be bitter and showed hop-associated aromas like citrus and tropical fruit. Malt taste is most detected among scotch, stout/porters, and strong ales, while low/no-alcohol beers, which often have a reputation for being ‘worty’ (reminiscent of unfermented, sweet malt extract) appear in the middle. Unsurprisingly, hop aromas are most strongly detected among hoppy beers. Like its chemical counterpart (Supplementary Fig.  S3 ), acidity shows a right-skewed distribution, with the most acidic beers being Krieks, Lambics, and West Flanders ales.

Tasting panel assessments of specific flavors correlate with chemical composition

We find that the concentrations of several chemical compounds strongly correlate with specific aroma or taste, as evaluated by the tasting panel (Fig.  2 , Supplementary Fig.  S5 , Supplementary Data  6 ). In some cases, these correlations confirm expectations and serve as a useful control for data quality. For example, iso-alpha acids, the bittering compounds in hops, strongly correlate with bitterness (Spearman’s rho=0.68), while ethanol and glycerol correlate with tasters’ perceptions of alcohol and body, the mouthfeel sensation of fullness (Spearman’s rho=0.82/0.62 and 0.72/0.57 respectively) and darker color from roasted malts is a good indication of malt perception (Spearman’s rho=0.54).

figure 2

Heatmap colors indicate Spearman’s Rho. Axes are organized according to sensory categories (aroma, taste, mouthfeel, overall), chemical categories and chemical sources in beer (malt (blue), hops (green), yeast (red), wild flora (yellow), Others (black)). See Supplementary Data  6 for all correlation values.

Interestingly, for some relationships between chemical compounds and perceived flavor, correlations are weaker than expected. For example, the rose-smelling phenethyl acetate only weakly correlates with floral aroma. This hints at more complex relationships and interactions between compounds and suggests a need for a more complex model than simple correlations. Lastly, we uncovered unexpected correlations. For instance, the esters ethyl decanoate and ethyl octanoate appear to correlate slightly with hop perception and bitterness, possibly due to their fruity flavor. Iron is anti-correlated with hop aromas and bitterness, most likely because it is also anti-correlated with iso-alpha acids. This could be a sign of metal chelation of hop acids 61 , given that our analyses measure unbound hop acids and total iron content, or could result from the higher iron content in dark and Fruit beers, which typically have less hoppy and bitter flavors 62 .

Public consumer reviews complement expert panel data

To complement and expand the sensory data of our trained tasting panel, we collected 180,000 reviews of our 250 beers from the online consumer review platform RateBeer. This provided numerical scores for beer appearance, aroma, taste, palate, overall quality as well as the average overall score.

Public datasets are known to suffer from biases, such as price, cult status and psychological conformity towards previous ratings of a product. For example, prices correlate with appreciation scores for these online consumer reviews (rho=0.49, Supplementary Fig. S6), but not for our trained tasting panel (rho=0.19). This suggests that prices affect consumer appreciation, which has been reported in wine 63, while blind tastings are unaffected. Moreover, we observe that some beer styles, like lagers and non-alcoholic beers, generally receive lower scores, reflecting that online reviewers are mostly beer aficionados with a preference for specialty beers over lager beers. In general, we find a modest correlation between our trained panel’s overall appreciation score and the online consumer appreciation scores (Fig. 3, rho=0.29). Apart from the aforementioned biases in the online datasets, serving temperature, sample freshness and surroundings, which are all tightly controlled during the tasting panel sessions, can vary tremendously across online consumers and can further contribute to differences (in appreciation, among others) between the two categories of tasters. Importantly, in contrast to the overall appreciation scores, for many sensory aspects the results from the professional panel correlated well with results obtained from RateBeer reviews. Correlations were highest for features that are relatively easy to recognize even for untrained tasters, like bitterness, sweetness, alcohol and malt aroma (Fig. 3 and below).

figure 3

RateBeer text mining results can be found in Supplementary Data  7 . Rho values shown are Spearman correlation values, with asterisks indicating significant correlations ( p  < 0.05, two-sided). All p values were smaller than 0.001, except for Esters aroma (0.0553), Esters taste (0.3275), Esters aroma—banana (0.0019), Coriander (0.0508) and Diacetyl (0.0134).

Besides collecting consumer appreciation from these online reviews, we developed automated text analysis tools to gather additional data from review texts (Supplementary Data 7). Processing review texts on the RateBeer database yielded comparable results to the scores given by the trained panel for many common sensory aspects, including acidity, bitterness, sweetness, alcohol, malt, and hop tastes (Fig. 3). This is in line with what would be expected, since these attributes require less training for accurate assessment and are less influenced by environmental factors such as temperature, serving glass and odors in the environment. Consumer reviews also correlate well with our trained panel for 4-vinyl guaiacol, a compound associated with a very characteristic aroma. By contrast, more specific aromas like ester, coriander or diacetyl are underrepresented in the online reviews, underscoring the importance of using a trained tasting panel and standardized tasting sheets with explicit factors to be scored for evaluating specific aspects of a beer. Taken together, our results suggest that public reviews are trustworthy for some, but not all, flavor features and can complement or substitute taste panel data for these sensory aspects.

Models can predict beer sensory profiles from chemical data

The rich datasets of chemical analyses, tasting panel assessments and public reviews gathered in the first part of this study provided us with a unique opportunity to develop predictive models that link chemical data to sensorial features. Given the complexity of beer flavor, basic statistical tools such as correlations or linear regression may not always be the most suitable for making accurate predictions. Instead, we applied different machine learning models that can model both simple linear and complex interactive relationships. Specifically, we constructed a set of regression models to predict (a) trained panel scores for beer flavor and quality and (b) public reviews’ appreciation scores from beer chemical profiles. We trained and tested 10 different models (Methods): 3 linear regression-based models (simple linear regression with first-order interactions (LR), lasso regression with first-order interactions (Lasso), partial least squares regressor (PLSR)), 5 decision tree models (AdaBoost regressor (ABR), extra trees (ET), gradient boosting regressor (GBR), random forest (RF) and XGBoost regressor (XGBR)), 1 support vector regression (SVR), and 1 artificial neural network (ANN) model.
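To make "first-order interactions" concrete, the sketch below expands a feature row with the product of every pair of distinct features and counts the resulting columns. It is an illustration only: whether the authors used all 226 measured properties as predictors, or built the interaction terms exactly this way, is not stated in this section, and the function names are hypothetical.

```python
from itertools import combinations
from math import comb


def with_first_order_interactions(row):
    """Return the original features followed by all pairwise products."""
    return list(row) + [a * b for a, b in combinations(row, 2)]


def expanded_feature_count(n_features):
    """Number of columns after adding one product term per feature pair."""
    return n_features + comb(n_features, 2)


# Three toy features expand to 3 originals + 3 pairwise products.
expanded_row = with_first_order_interactions([2, 3, 5])

# Assumption: all 226 measured chemical properties are used as predictors.
n_columns = expanded_feature_count(226)
```

Under that assumption the design matrix would have 25,651 columns for only 250 beers, so an unregularized linear fit has far more free parameters than observations.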

To compare the performance of our machine learning models, the dataset was randomly split into a training and test set, stratified by beer style. After a model was trained on data in the training set, its performance was evaluated on its ability to predict the test dataset obtained from multi-output models (based on the coefficient of determination, see Methods). Additionally, individual-attribute models were ranked per descriptor and the average rank was calculated, as proposed by Korneva et al. 64 . Importantly, both ways of evaluating the models’ performance agreed in general. Performance of the different models varied (Table  1 ). It should be noted that all models perform better at predicting RateBeer results than results from our trained tasting panel. One reason could be that sensory data is inherently variable, and this variability is averaged out with the large number of public reviews from RateBeer. Additionally, all tree-based models perform better at predicting taste than aroma. Linear models (LR) performed particularly poorly, with negative R 2 values, due to severe overfitting (training set R 2  = 1). Overfitting is a common issue in linear models with many parameters and limited samples, especially with interaction terms further amplifying the number of parameters. L1 regularization (Lasso) successfully overcomes this overfitting, out-competing multiple tree-based models on the RateBeer dataset. Similarly, the dimensionality reduction of PLSR avoids overfitting and improves performance, to some extent. Still, tree-based models (ABR, ET, GBR, RF and XGBR) show the best performance, out-competing the linear models (LR, Lasso, PLSR) commonly used in sensory science 65 .
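The negative R 2 values reported for the overfitted linear models follow directly from how the coefficient of determination is defined. The sketch below is a minimal illustration, not the authors' evaluation code: the function name `r_squared` and all numeric values are hypothetical, chosen only to show that predictions worse than simply guessing the test-set mean produce a score below zero.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Hypothetical test-set appreciation scores (illustrative values only).
observed = [3.0, 3.5, 4.0, 4.5]

# Perfect predictions give R^2 = 1.
perfect = r_squared(observed, observed)

# Always predicting the mean of the observed values gives R^2 = 0.
mean_only = r_squared(observed, [3.75] * len(observed))

# Predictions that are further off than the mean give a negative R^2,
# which is the pattern an overfitted model can show on unseen data.
overfit_like = r_squared(observed, [4.5, 3.0, 4.8, 3.2])

print(perfect, mean_only, overfit_like)
```

A model can therefore reach R 2 = 1 on its training set and still score below zero on the test set.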

GBR models showed the best overall performance in predicting sensory responses from chemical information, with R 2 values up to 0.75 depending on the predicted sensory feature (Supplementary Table  S4 ). The GBR models predict consumer appreciation (RateBeer) better than our trained panel’s appreciation (R 2 value of 0.67 compared to R 2 value of 0.09) (Supplementary Table  S3 and Supplementary Table  S4 ). ANN models showed intermediate performance, likely because neural networks typically perform best with larger datasets 66 . The SVR shows intermediate performance, mostly due to the weak predictions of specific attributes that lower the overall performance (Supplementary Table  S4 ).

Model dissection identifies specific, unexpected compounds as drivers of consumer appreciation

Next, we leveraged our models to infer important contributors to sensory perception and consumer appreciation. Consumer preference is a crucial sensory aspect, because a product that shows low consumer appreciation scores often does not succeed commercially 25 . Additionally, the requirement for a large number of representative evaluators makes consumer trials one of the more costly and time-consuming aspects of product development. Hence, a model for predicting chemical drivers of overall appreciation would be a welcome addition to the available toolbox for food development and optimization.

Since GBR models on our RateBeer dataset showed the best overall performance, we focused on these models. Specifically, we used two approaches to identify important contributors. First, rankings of the most important predictors for each sensorial trait in the GBR models were obtained based on impurity-based feature importance (mean decrease in impurity). High-ranked parameters were hypothesized to be either the true causal chemical properties underlying the trait, to correlate with the actual causal properties, or to take part in sensory interactions affecting the trait 67 (Fig.  4A ). In a second approach, we used SHAP 68 to determine which parameters contributed most to the model for making predictions of consumer appreciation (Fig.  4B ). SHAP calculates parameter contributions to model predictions on a per-sample basis, which can be aggregated into an importance score.
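The aggregation step described above, turning per-sample contributions into one importance score per parameter, can be sketched as follows. This is an assumption-laden illustration: it does not compute SHAP values, the contribution matrix is invented, and taking the mean absolute contribution is a common convention rather than something the text states was used here. The helper name `mean_abs_importance` is hypothetical.

```python
def mean_abs_importance(contributions, feature_names):
    """Rank features by the mean absolute per-sample contribution.

    contributions: list of rows, one per sample; each row holds one
    signed contribution per feature (e.g. as produced by a SHAP explainer).
    """
    n_samples = len(contributions)
    scores = {}
    for j, name in enumerate(feature_names):
        scores[name] = sum(abs(row[j]) for row in contributions) / n_samples
    # Highest score first.
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Invented contributions for three samples and three parameters.
features = ["ethyl acetate", "ethanol", "protein"]
contribs = [
    [0.30, -0.10, 0.05],
    [-0.20, 0.25, 0.00],
    [0.40, 0.05, -0.05],
]

ranking = mean_abs_importance(contribs, features)
for name, score in ranking:
    print(f"{name}: {score:.3f}")
```

Because absolute values are averaged, a parameter that pushes predictions strongly up for some samples and strongly down for others still ranks as important.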

figure 4

A The impurity-based feature importance (mean decrease in impurity, MDI) calculated from the Gradient Boosting Regression (GBR) model predicting RateBeer appreciation scores. The top 15 highest ranked chemical properties are shown. B SHAP summary plot for the top 15 parameters contributing to our GBR model. Each point on the graph represents a sample from our dataset. The color represents the concentration of that parameter, with bluer colors representing low values and redder colors representing higher values. Greater absolute values on the horizontal axis indicate a higher impact of the parameter on the prediction of the model. C Spearman correlations between the 15 most important chemical properties and consumer overall appreciation. Numbers indicate the Spearman Rho correlation coefficient, and the rank of this correlation compared to all other correlations. The top 15 important compounds were determined using SHAP (panel B).

Both approaches identified ethyl acetate as the most predictive parameter for beer appreciation (Fig.  4 ). Ethyl acetate is the most abundant ester in beer with a typical ‘fruity’, ‘solvent’ and ‘alcoholic’ flavor, but is often considered less important than other esters like isoamyl acetate. The second most important parameter identified by SHAP is ethanol, the most abundant beer compound after water. Apart from directly contributing to beer flavor and mouthfeel, ethanol drastically influences the physical properties of beer, dictating how easily volatile compounds escape the beer matrix to contribute to beer aroma 69 . Importantly, it should also be noted that the importance of ethanol for appreciation is likely inflated by the very low appreciation scores of non-alcoholic beers (Supplementary Fig.  S4 ). Despite not often being considered a driver of beer appreciation, protein level also ranks highly in both approaches, possibly due to its effect on mouthfeel and body 70 . Lactic acid, which contributes to the tart taste of sour beers, is the fourth most important parameter identified by SHAP, possibly due to the generally high appreciation of sour beers in our dataset.

Interestingly, some of the most important predictive parameters for our model are not well-established as beer flavors or are even commonly regarded as being negative for beer quality. For example, our models identify methanethiol and ethyl phenyl acetate, an ester commonly linked to beer staling 71 , as key factors contributing to beer appreciation. Although there is no doubt that high concentrations of these compounds are considered unpleasant, the positive effects of modest concentrations are not yet known 72 , 73 .

To compare our approach to conventional statistics, we evaluated how well the 15 most important SHAP-derived parameters correlate with consumer appreciation (Fig.  4C ). Interestingly, only 6 of the properties derived by SHAP rank amongst the top 15 most correlated parameters. For some chemical compounds, the correlations are so low that they would have likely been considered unimportant. For example, lactic acid, the fourth most important parameter, shows a bimodal distribution for appreciation, with sour beers forming a separate cluster, that is missed entirely by the Spearman correlation. Additionally, the correlation plots reveal outliers, emphasizing the need for robust analysis tools. Together, this highlights the need for alternative models, like the Gradient Boosting model, that better grasp the complexity of (beer) flavor.

Finally, to observe the relationships between these chemical properties and their predicted targets, partial dependence plots were constructed for the six most important predictors of consumer appreciation 74 , 75 , 76 (Supplementary Fig.  S7 ). One-way partial dependence plots show how a change in concentration affects the predicted appreciation. These plots reveal an important limitation of our models: appreciation predictions remain constant at ever-increasing concentrations. This implies that once a threshold concentration is reached, further increasing the concentration does not affect appreciation. This is false, as it is well-documented that certain compounds become unpleasant at high concentrations, including ethyl acetate (‘nail polish’) 77 and methanethiol (‘sulfury’ and ‘rotten cabbage’) 78 . The inability of our models to grasp that flavor compounds have optimal levels, above which they become negative, is a consequence of working with commercial beer brands where (off-)flavors are rarely too high to negatively impact the product. The two-way partial dependence plots show how changing the concentration of two compounds influences predicted appreciation, visualizing their interactions (Supplementary Fig.  S7 ). In our case, the top 5 parameters are dominated by additive or synergistic interactions, with high concentrations for both compounds resulting in the highest predicted appreciation.
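A one-way partial dependence curve of the kind described above can be computed by fixing one feature at each grid value for every sample and averaging the model's predictions. The sketch below uses a hypothetical stand-in model (`toy_model`) whose response saturates at a threshold; it is not the GBR model from this study, and the sample values are invented. It only illustrates why a plateau in the curve appears when the model has never seen a penalty at high concentrations.

```python
def partial_dependence(predict, samples, feature_index, grid):
    """Average prediction over all samples with one feature forced to each grid value."""
    curve = []
    for value in grid:
        total = 0.0
        for row in samples:
            modified = list(row)          # copy so the original sample is untouched
            modified[feature_index] = value
            total += predict(modified)
        curve.append(total / len(samples))
    return curve


def toy_model(row):
    # Hypothetical stand-in: the effect of feature 0 rises, then plateaus at 10.
    return 3.0 + 0.1 * min(row[0], 10.0) + 0.05 * row[1]


# Invented samples: [concentration of compound A, concentration of compound B].
samples = [[2.0, 4.0], [6.0, 5.0], [9.0, 6.0]]
grid = [0.0, 5.0, 10.0, 20.0, 40.0]

curve = partial_dependence(toy_model, samples, 0, grid)
print(curve)
```

Beyond the threshold the curve stays flat however far the grid extends, mirroring the limitation noted in the text.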

To assess the robustness of our best-performing models and model predictions, we performed 100 iterations of the GBR, RF and ET models. In general, all iterations of the models yielded similar performance (Supplementary Fig.  S8 ). Moreover, the main predictors (including the top predictors ethanol and ethyl acetate) remained virtually the same, especially for GBR and RF. For the iterations of the ET model, we did observe more variation in the top predictors, which is likely a consequence of the model’s inherent random architecture in combination with co-correlations between certain predictors. However, even in this case, several of the top predictors (ethanol and ethyl acetate) remain unchanged, although their rank in importance changes (Supplementary Fig.  S8 ).

Next, we investigated if a combination of RateBeer and trained panel data into one consolidated dataset would lead to stronger models, under the hypothesis that such a model would suffer less from bias in the datasets. A GBR model was trained to predict appreciation on the combined dataset. This model underperformed compared to the RateBeer model, both in the native case and when including a dataset identifier (R 2  = 0.67, 0.26 and 0.42 respectively). For the latter, the dataset identifier is the most important feature (Supplementary Fig.  S9 ), while most of the feature importance remains unchanged, with ethyl acetate and ethanol ranking highest, like in the original model trained only on RateBeer data. It seems that the large variation in the panel dataset introduces noise, weakening the models’ performances and reliability. In addition, it seems reasonable to assume that both datasets are fundamentally different, with the panel dataset obtained by blind tastings by a trained professional panel.

Lastly, we evaluated whether beer style identifiers would further enhance the model’s performance. A GBR model was trained with parameters that explicitly encoded the styles of the samples. This did not improve model performance (R2 = 0.66 with style information vs R2 = 0.67 without). The most important chemical features are consistent with the model trained without style information (e.g., ethanol and ethyl acetate), and with the exception of the most preferred (strong ale) and least preferred (low/no-alcohol) styles, none of the styles were among the most important features (Supplementary Fig.  S9 , Supplementary Table  S5 and S6 ). This is likely due to a combination of style-specific chemical signatures, such as iso-alpha acids and lactic acid, that implicitly convey style information to the original models, as well as the low number of samples belonging to some styles, making it difficult for the model to learn style-specific patterns. Moreover, beer styles are not rigorously defined, with some styles overlapping in features and some beers being misattributed to a specific style, all of which leads to more noise in models that use style parameters.

Model validation

To test if our predictive models give insight into beer appreciation, we set up experiments aimed at improving existing commercial beers. We specifically selected overall appreciation as the trait to be examined because of its complexity and commercial relevance. Beer flavor comprises a complex bouquet rather than single aromas and tastes 53 . Hence, adding a single compound to the extent that a difference is noticeable may lead to an unbalanced, artificial flavor. Therefore, we evaluated the effect of combinations of compounds. Because Blond beers represent the most extensive style in our dataset, we selected a beer from this style as the starting material for these experiments (Beer 64 in Supplementary Data  1 ).

In the first set of experiments, we adjusted the concentrations of compounds that made up the most important predictors of overall appreciation (ethyl acetate, ethanol, lactic acid, ethyl phenyl acetate) together with correlated compounds (ethyl hexanoate, isoamyl acetate, glycerol), bringing them up to 95th percentile ethanol-normalized concentrations (Methods) within the Blond group (‘Spiked’ concentration in Fig.  5A ). Compared to controls, the spiked beers were found to have significantly improved overall appreciation among trained panelists, with panelists noting increased intensity of ester flavors, sweetness, alcohol, and body fullness (Fig.  5B ). To disentangle the contribution of ethanol to these results, a second experiment was performed without the addition of ethanol. This resulted in a similar outcome, including increased perception of alcohol and overall appreciation.
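One possible reading of "95th percentile ethanol-normalized concentration" is sketched below. The exact procedure is given in the Methods and is not reproduced here, so everything in this block is an assumption: the compound concentration is divided by the ethanol content of each beer in the style, the 95th percentile of those ratios is taken, and the result is scaled back by the base beer's ethanol content to obtain a target. The helper names, the linear-interpolation percentile, and all numbers are hypothetical.

```python
def percentile(values, q):
    """Percentile with linear interpolation between closest ranks (q in 0..100)."""
    ordered = sorted(values)
    position = (len(ordered) - 1) * q / 100.0
    lower = int(position)
    upper = min(lower + 1, len(ordered) - 1)
    fraction = position - lower
    return ordered[lower] + fraction * (ordered[upper] - ordered[lower])


def spike_amount(style_beers, base_conc, base_ethanol, q=95.0):
    """Amount to add so the base beer reaches the q-th percentile ratio of its style.

    style_beers: list of (compound concentration, ethanol content) pairs.
    ASSUMPTION: normalization means dividing concentration by ethanol content.
    """
    ratios = [conc / ethanol for conc, ethanol in style_beers]
    target_conc = percentile(ratios, q) * base_ethanol
    return max(0.0, target_conc - base_conc)  # never remove compound


# Invented (concentration in mg/L, ethanol in % ABV) pairs for one style.
blond_beers = [(10.0, 5.0), (20.0, 5.0), (30.0, 6.0), (12.0, 6.0), (40.0, 8.0)]

to_add = spike_amount(blond_beers, base_conc=18.0, base_ethanol=6.0)
print(to_add)
```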

figure 5

Adding the top chemical compounds, identified as best predictors of appreciation by our model, into poorly appreciated beers results in increased appreciation from our trained panel. Results of sensory tests between base beers and those spiked with compounds identified as the best predictors by the model. A Blond and Non/Low-alcohol (0.0% ABV) base beers were brought up to 95th-percentile ethanol-normalized concentrations within each style. B For each sensory attribute, tasters indicated the more intense sample and selected the sample they preferred. The numbers above the bars correspond to the p values that indicate significant changes in perceived flavor (two-sided binomial test: alpha 0.05, n  = 20 or 13).

In a last experiment, we tested whether using the model’s predictions can boost the appreciation of a non-alcoholic beer (beer 223 in Supplementary Data  1 ). Again, the addition of a mixture of predicted compounds (omitting ethanol, in this case) resulted in a significant increase in appreciation, body, ester flavor and sweetness.
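The two-sided binomial test named in the Figure 5 caption asks whether the number of tasters choosing one sample differs from what a 50/50 coin flip would give. The following is a minimal exact implementation using only the standard library. The function name is hypothetical, and the count of 16 out of 20 is an invented example, not a result from this study (only the panel sizes, n = 20 or 13, come from the caption).

```python
from math import comb


def two_sided_binomial_p(successes, n, p=0.5):
    """Exact two-sided binomial p-value.

    Sums the probabilities of all outcomes that are no more likely
    than the observed one under the null hypothesis.
    """
    probs = [comb(n, k) * p ** k * (1 - p) ** (n - k) for k in range(n + 1)]
    observed = probs[successes]
    # Small relative tolerance guards against floating-point ties.
    total = sum(pr for pr in probs if pr <= observed * (1 + 1e-9))
    return min(1.0, total)


# Invented example: 16 of 20 tasters prefer the spiked sample.
p_value = two_sided_binomial_p(16, 20)
print(f"p = {p_value:.4f}")

# An even split gives no evidence of a preference.
p_even = two_sided_binomial_p(10, 20)
print(f"p = {p_even:.4f}")
```

With alpha set at 0.05, as in the caption, the first invented outcome would count as a significant preference and the second would not.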

Predicting flavor and consumer appreciation from chemical composition is one of the ultimate goals of sensory science. A reliable, systematic and unbiased way to link chemical profiles to flavor and food appreciation would be a significant asset to the food and beverage industry. Such tools would substantially aid in quality control and recipe development, offer an efficient and cost-effective alternative to pilot studies and consumer trials and would ultimately allow food manufacturers to produce superior, tailor-made products that better meet the demands of specific consumer groups more efficiently.

A limited set of studies have previously tried, to varying degrees of success, to predict beer flavor and beer popularity based on (a limited set of) chemical compounds and flavors 79 , 80 . Current sensitive, high-throughput technologies allow measuring an unprecedented number of chemical compounds and properties in a large set of samples, yielding a dataset that can train models that help close the gaps between chemistry and flavor, even for a complex natural product like beer. To our knowledge, no previous research gathered data at this scale (250 samples, 226 chemical parameters, 50 sensory attributes and 5 consumer scores) to disentangle and validate the chemical aspects driving beer preference using various machine-learning techniques. We find that modern machine learning models outperform conventional statistical tools, such as correlations and linear models, and can successfully predict flavor appreciation from chemical composition. This could be attributed to the natural incorporation of interactions and non-linear or discontinuous effects in machine learning models, which are not easily grasped by the linear model architecture. While linear models and partial least squares regression represent the most widespread statistical approaches in sensory science, in part because they allow interpretation 65 , 81 , 82 , modern machine learning methods allow for building better predictive models while preserving the possibility to dissect and exploit the underlying patterns. Of the 10 different models we trained, tree-based models, such as our best performing GBR, showed the best overall performance in predicting sensory responses from chemical information, outcompeting artificial neural networks. This agrees with previous reports for models trained on tabular data 83 . Our results are in line with the findings of Colantonio et al., who also identified the gradient boosting architecture as performing best at predicting appreciation and flavor (of tomatoes and blueberries, in their specific study) 26 . Importantly, besides our larger experimental scale, we were able to directly confirm our models’ predictions in vivo.

Our study confirms that flavor compound concentration does not always correlate with perception, suggesting complex interactions that are often missed by more conventional statistics and simple models. Specifically, we find that tree-based algorithms may perform best in developing models that link complex food chemistry with aroma. Furthermore, we show that massive datasets of untrained consumer reviews provide a valuable source of data, that can complement or even replace trained tasting panels, especially for appreciation and basic flavors, such as sweetness and bitterness. This holds despite biases that are known to occur in such datasets, such as price or conformity bias. Moreover, GBR models predict taste better than aroma. This is likely because taste (e.g. bitterness) often directly relates to the corresponding chemical measurements (e.g., iso-alpha acids), whereas such a link is less clear for aromas, which often result from the interplay between multiple volatile compounds. We also find that our models are best at predicting acidity and alcohol, likely because there is a direct relation between the measured chemical compounds (acids and ethanol) and the corresponding perceived sensorial attribute (acidity and alcohol), and because even untrained consumers are generally able to recognize these flavors and aromas.

The predictions of our final models, trained on review data, hold even for blind tastings with small groups of trained tasters, as demonstrated by our ability to validate specific compounds as drivers of beer flavor and appreciation. Since adding a single compound to the extent of a noticeable difference may result in an unbalanced flavor profile, we specifically tested our identified key drivers as a combination of compounds. While this approach does not allow us to validate if a particular single compound would affect flavor and/or appreciation, our experiments do show that this combination of compounds increases consumer appreciation.

It is important to stress that, while it represents an important step forward, our approach still has several major limitations. A key weakness of the GBR model architecture is that amongst co-correlating variables, the largest main effect is consistently preferred for model building. As a result, co-correlating variables often have artificially low importance scores, both for impurity and SHAP-based methods, like we observed in the comparison to the more randomized Extra Trees models. This implies that chemicals identified as key drivers of a specific sensory feature by GBR might not be the true causative compounds, but rather co-correlate with the actual causative chemical. For example, the high importance of ethyl acetate could be (partially) attributed to the total ester content, ethanol or ethyl hexanoate (rho=0.77, rho=0.72 and rho=0.68), while ethyl phenylacetate could hide the importance of prenyl isobutyrate and ethyl benzoate (rho=0.77 and rho=0.76). Expanding our GBR model to include beer style as a parameter did not yield additional power or insight. This is likely due to style-specific chemical signatures, such as iso-alpha acids and lactic acid, that implicitly convey style information to the original model, as well as the smaller sample size per style, limiting the power to uncover style-specific patterns. This can be partly attributed to the curse of dimensionality, where the high number of parameters results in the models mainly incorporating single parameter effects, rather than complex interactions such as style-dependent effects 67 . A larger number of samples may overcome some of these limitations and offer more insight into style-specific effects. On the other hand, beer style is not a rigid scientific classification, and beers within one style often differ a lot, which further complicates the analysis of style as a model factor.

Our study is limited to beers from Belgian breweries. Although these beers cover a large portion of the beer styles available globally, some beer styles and consumer patterns may be missing, while other features might be overrepresented. For example, many Belgian ales exhibit yeast-driven flavor profiles, which is reflected in the chemical drivers of appreciation discovered by this study. In future work, expanding the scope to include diverse markets and beer styles could lead to the identification of even more drivers of appreciation and better models for special niche products that were not present in our beer set.

In addition to inherent limitations of GBR models, there are also some limitations associated with studying food aroma. Even if our chemical analyses measured most of the known aroma compounds, the total number of flavor compounds in complex foods like beer is still larger than the subset we were able to measure in this study. For example, hop-derived thiols, that influence flavor at very low concentrations, are notoriously difficult to measure in a high-throughput experiment. Moreover, consumer perception remains subjective and prone to biases that are difficult to avoid. It is also important to stress that the models are still immature and that more extensive datasets will be crucial for developing more complete models in the future. Besides more samples and parameters, our dataset does not include any demographic information about the tasters. Including such data could lead to better models that grasp external factors like age and culture. Another limitation is that our set of beers consists of high-quality end-products and lacks beers that are unfit for sale, which limits the current model in accurately predicting products that are appreciated very badly. Finally, while models could be readily applied in quality control, their use in sensory science and product development is restrained by their inability to discern causal relationships. Given that the models cannot distinguish compounds that genuinely drive consumer perception from those that merely correlate, validation experiments are essential to identify true causative compounds.

Despite the inherent limitations, dissection of our models enabled us to pinpoint specific molecules as potential drivers of beer aroma and consumer appreciation, including compounds that were unexpected and would not have been identified using standard approaches. Important drivers of beer appreciation uncovered by our models include protein levels, ethyl acetate, ethyl phenyl acetate and lactic acid. Currently, many brewers already use lactic acid to acidify their brewing water and ensure optimal pH for enzymatic activity during the mashing process. Our results suggest that adding lactic acid can also improve beer appreciation, although its individual effect remains to be tested. Interestingly, ethanol appears to be unnecessary to improve beer appreciation, both for blond beer and alcohol-free beer. Given the growing consumer interest in alcohol-free beer, with a predicted annual market growth of >7% 84 , it is relevant for brewers to know what compounds can further increase consumer appreciation of these beers. Hence, our model may readily provide avenues to further improve the flavor and consumer appreciation of both alcoholic and non-alcoholic beers, which is generally considered one of the key challenges for future beer production.

Whereas we see a direct implementation of our results for the development of superior alcohol-free beverages and other food products, our study can also serve as a stepping stone for the development of novel alcohol-containing beverages. We want to echo the growing body of scientific evidence for the negative effects of alcohol consumption, both on the individual level by the mutagenic, teratogenic and carcinogenic effects of ethanol 85 , 86 , as well as the burden on society caused by alcohol abuse and addiction. We encourage the use of our results for the production of healthier, tastier products, including novel and improved beverages with lower alcohol contents. Furthermore, we strongly discourage the use of these technologies to improve the appreciation or addictive properties of harmful substances.

The present work demonstrates that despite some important remaining hurdles, combining the latest developments in chemical analyses, sensory analysis and modern machine learning methods offers exciting avenues for food chemistry and engineering. Soon, these tools may provide solutions in quality control and recipe development, as well as new approaches to sensory science and flavor research.

Beer selection

250 commercial Belgian beers were selected to cover the broad diversity of beer styles and corresponding diversity in chemical composition and aroma. See Supplementary Fig.  S1 .

Chemical dataset

Sample preparation.

Beers within their expiration date were purchased from commercial retailers. Samples were prepared in biological duplicates at room temperature, unless explicitly stated otherwise. Bottle pressure was measured with a manual pressure device (Steinfurth Mess-Systeme GmbH) and used to calculate CO 2 concentration. The beer was poured through two filter papers (Macherey-Nagel, 500713032 MN 713 ¼) to remove carbon dioxide and prevent spontaneous foaming. Samples were then prepared for measurements by targeted Headspace-Gas Chromatography-Flame Ionization Detector/Flame Photometric Detector (HS-GC-FID/FPD), Headspace-Solid Phase Microextraction-Gas Chromatography-Mass Spectrometry (HS-SPME-GC-MS), colorimetric analysis, enzymatic analysis, Near-Infrared (NIR) analysis, as described in the sections below. The mean values of biological duplicates are reported for each compound.

HS-GC-FID/FPD

HS-GC-FID/FPD (Shimadzu GC 2010 Plus) was used to measure higher alcohols, acetaldehyde, esters, 4-vinyl guaiacol, and sulfur compounds. Each measurement comprised 5 ml of sample pipetted into a 20 ml glass vial containing 1.75 g NaCl (VWR, 27810.295). 100 µl of 2-heptanol (Sigma-Aldrich, H3003) (internal standard) solution in ethanol (Fisher Chemical, E/0650DF/C17) was added for a final concentration of 2.44 mg/L. Samples were flushed with nitrogen for 10 s, sealed with a silicone septum, stored at −80 °C and analyzed in batches of 20.

The GC was equipped with a DB-WAXetr column (length, 30 m; internal diameter, 0.32 mm; layer thickness, 0.50 µm; Agilent Technologies, Santa Clara, CA, USA) to the FID and an HP-5 column (length, 30 m; internal diameter, 0.25 mm; layer thickness, 0.25 µm; Agilent Technologies, Santa Clara, CA, USA) to the FPD. N 2 was used as the carrier gas. Samples were incubated for 20 min at 70 °C in the headspace autosampler (Flow rate, 35 cm/s; Injection volume, 1000 µL; Injection mode, split; Combi PAL autosampler, CTC analytics, Switzerland). The injector, FID and FPD temperatures were kept at 250 °C. The GC oven temperature was first held at 50 °C for 5 min and then allowed to rise to 80 °C at a rate of 5 °C/min, followed by a second ramp of 4 °C/min until 200 °C kept for 3 min and a final ramp of (4 °C/min) until 230 °C for 1 min. Results were analyzed with the GCSolution software version 2.4 (Shimadzu, Kyoto, Japan). The GC was calibrated with a 5% EtOH solution (VWR International) containing the volatiles under study (Supplementary Table  S7 ).
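The oven program above can be turned into a total run time with simple arithmetic: each ramp lasts (target − start) / rate minutes, plus any hold at the target. The sketch below works this out under two assumptions that the text does not state explicitly: there is no hold at 80 °C, and the "kept for 3 min" and "for 1 min" phrases are holds at 200 °C and 230 °C respectively. The function name is hypothetical.

```python
def program_minutes(start_temp, initial_hold, steps):
    """Total duration of a GC oven program in minutes.

    steps: list of (ramp rate in degC/min, target temp in degC, hold in min).
    """
    total = initial_hold
    temp = start_temp
    for rate, target, hold in steps:
        total += (target - temp) / rate + hold
        temp = target
    return total


# HS-GC-FID/FPD program as read from the text (holds are assumptions, see above).
hs_gc_steps = [
    (5.0, 80.0, 0.0),    # 50 -> 80 degC at 5 degC/min, no hold assumed
    (4.0, 200.0, 3.0),   # 80 -> 200 degC at 4 degC/min, then 3 min hold
    (4.0, 230.0, 1.0),   # 200 -> 230 degC at 4 degC/min, then 1 min hold
]

total = program_minutes(start_temp=50.0, initial_hold=5.0, steps=hs_gc_steps)
print(f"{total} min")  # 5 + 6 + 30 + 3 + 7.5 + 1 = 52.5
```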

HS-SPME-GC-MS

HS-SPME-GC-MS (Shimadzu GCMS-QP-2010 Ultra) was used to measure additional volatile compounds, mainly comprising terpenoids and esters. Samples were analyzed by HS-SPME using a triphase DVB/Carboxen/PDMS 50/30 μm SPME fiber (Supelco Co., Bellefonte, PA, USA) followed by gas chromatography (Thermo Fisher Scientific Trace 1300 series, USA) coupled to a mass spectrometer (Thermo Fisher Scientific ISQ series MS) equipped with a TriPlus RSH autosampler. 5 ml of degassed beer sample was placed in 20 ml vials containing 1.75 g NaCl (VWR, 27810.295). 5 µl internal standard mix was added, containing 2-heptanol (1 g/L) (Sigma-Aldrich, H3003), 4-fluorobenzaldehyde (1 g/L) (Sigma-Aldrich, 128376), 2,3-hexanedione (1 g/L) (Sigma-Aldrich, 144169) and guaiacol (1 g/L) (Sigma-Aldrich, W253200) in ethanol (Fisher Chemical, E/0650DF/C17). Each sample was incubated at 60 °C in the autosampler oven with constant agitation. After 5 min equilibration, the SPME fiber was exposed to the sample headspace for 30 min. The compounds trapped on the fiber were thermally desorbed in the injection port of the chromatograph by heating the fiber for 15 min at 270 °C.

The GC-MS was equipped with a low polarity RXi-5Sil MS column (length, 20 m; internal diameter, 0.18 mm; layer thickness, 0.18 µm; Restek, Bellefonte, PA, USA). Injection was performed in splitless mode at 320 °C, a split flow of 9 ml/min, a purge flow of 5 ml/min and an open valve time of 3 min. To obtain a pulsed injection, a programmed gas flow was used whereby the helium gas flow was set at 2.7 mL/min for 0.1 min, followed by a decrease in flow of 20 ml/min to the normal 0.9 mL/min. The temperature was first held at 30 °C for 3 min and then allowed to rise to 80 °C at a rate of 7 °C/min, followed by a second ramp of 2 °C/min till 125 °C and a final ramp of 8 °C/min with a final temperature of 270 °C.

Mass acquisition range was 33 to 550 amu at a scan rate of 5 scans/s. Electron impact ionization energy was 70 eV. The interface and ion source were kept at 275 °C and 250 °C, respectively. A mix of linear n-alkanes (from C7 to C40, Supelco Co.) was injected into the GC-MS under identical conditions to serve as external retention index markers. Identification and quantification of the compounds were performed using an in-house developed R script as described in Goelen et al. and Reher et al. 87 , 88 (for package information, see Supplementary Table  S8 ). Briefly, chromatograms were analyzed using AMDIS (v2.71) 89 to separate overlapping peaks and obtain pure compound spectra. The NIST MS Search software (v2.0 g) in combination with the NIST2017, FFNSC3 and Adams4 libraries was used to manually identify the empirical spectra, taking into account the expected retention time. After background subtraction and correcting for retention time shifts between samples run on different days based on alkane ladders, compound elution profiles were extracted and integrated using a file with 284 target compounds of interest, which were either recovered in our identified AMDIS list of spectra or were known to occur in beer. Compound elution profiles were estimated for every peak in every chromatogram over a time-restricted window using weighted non-negative least squares analysis after which peak areas were integrated 87 , 88 . Batch effect correction was performed by normalizing against the most stable internal standard compound, 4-fluorobenzaldehyde. Out of all 284 target compounds that were analyzed, 167 were visually judged to have reliable elution profiles and were used for final analysis.

Discrete photometric and enzymatic analysis

Discrete photometric and enzymatic analysis (Thermo Scientific™ Gallery™ Plus Beermaster Discrete Analyzer) was used to measure acetic acid, ammonia, beta-glucan, iso-alpha acids, color, sugars, glycerol, iron, pH, protein, and sulfite. A sample volume of 2 mL was used for the analyses. Information regarding the reagents and standard solutions used for analyses and calibrations is included in Supplementary Table S7 and Supplementary Table S9.

NIR analyses

NIR analysis (Anton Paar Alcolyzer Beer ME System) was used to measure ethanol. Each measurement used 50 mL of sample, and a 10% EtOH solution was used for calibration.

Correlation calculations

Pairwise Spearman Rank correlations were calculated between all chemical properties.
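A Spearman rank correlation is the Pearson correlation of the ranks, with tied values sharing their average rank. The sketch below is a dependency-free illustration of the pairwise calculation; the property names and values are invented, and in practice a library routine would normally be used instead.

```python
from itertools import combinations
from statistics import mean


def average_ranks(values):
    """Rank values from 1 to n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman correlation (undefined if either input is constant)."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


def pairwise_spearman(table):
    """table maps property name -> list of measurements, one per beer."""
    return {(a, b): spearman(table[a], table[b])
            for a, b in combinations(sorted(table), 2)}


# Hypothetical measurements for four beers.
table = {
    "ethanol": [4.5, 5.2, 8.0, 6.5],
    "glycerol": [1.1, 1.3, 2.4, 1.9],
    "pH": [4.4, 4.3, 3.4, 4.1],
}
correlations = pairwise_spearman(table)
```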

Sensory dataset

Trained panel.

Our trained tasting panel consisted of volunteers who gave prior verbal informed consent. All compounds used for the validation experiment were of food-grade quality. The tasting sessions were approved by the Social and Societal Ethics Committee of the KU Leuven (G-2022-5677-R2(MAR)). All online reviewers agreed to the Terms and Conditions of the RateBeer website.

Sensory analysis was performed according to the American Society of Brewing Chemists (ASBC) Sensory Analysis Methods 90. Thirty volunteers were screened through a series of triangle tests. The sixteen most sensitive and consistent tasters were retained as taste panel members. The resulting panel was diverse in age [22–42, mean: 29], sex [56% male] and nationality [7 different countries]. The panel developed a consensus vocabulary to describe beer aroma, taste and mouthfeel. Panelists were trained to identify and score 50 different attributes, using a 7-point scale to rate each attribute’s intensity. The scoring sheet is included as Supplementary Data 3. Sensory assessments took place between 10 a.m. and noon. The beers were served in black-colored glasses. Per session, between 5 and 12 beers of the same style were tasted at 12 °C to 16 °C. Two reference beers were added to each set and indicated as ‘Reference 1 & 2’, allowing panel members to calibrate their ratings. Not all panelists were present at every tasting. Scores were scaled by standard deviation and mean-centered per taster. Values are represented as z-scores and clustered by Euclidean distance. Pairwise Spearman correlations were calculated between taste and aroma sensory attributes. Panel consistency was evaluated by repeating samples on different sessions and performing ANOVA to identify differences, using the ‘stats’ package (v4.2.2) in R (for package information, see Supplementary Table S8).
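The per-taster scaling described above can be sketched as follows. This is an illustration only: it assumes that each taster's mean and standard deviation are computed over all of that taster's scores and that the sample standard deviation is used, neither of which is specified in the text. The record layout and names are hypothetical.

```python
from collections import defaultdict
from statistics import mean, stdev


def zscore_per_taster(scores):
    """scores: list of (taster, beer, attribute, raw_score) tuples.

    Returns the same records with raw_score replaced by a z-score based on
    all scores given by that taster (assumed pooling; see lead-in).
    Each taster needs at least two scores that are not all identical.
    """
    by_taster = defaultdict(list)
    for taster, _, _, value in scores:
        by_taster[taster].append(value)
    stats = {t: (mean(v), stdev(v)) for t, v in by_taster.items()}
    return [(t, beer, attr, (value - stats[t][0]) / stats[t][1])
            for t, beer, attr, value in scores]


# Two hypothetical tasters who use different parts of the 7-point scale.
raw = [
    ("A", "beer1", "sweet", 2), ("A", "beer2", "sweet", 4), ("A", "beer3", "sweet", 6),
    ("B", "beer1", "sweet", 5), ("B", "beer2", "sweet", 6), ("B", "beer3", "sweet", 7),
]
scaled = zscore_per_taster(raw)
```

After scaling, both tasters give the same z-scores to the three beers even though their raw scores differ, which is the point of centering and scaling per taster.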

Online reviews from a public database

The ‘scrapy’ package in Python (v3.6) (for package information, see Supplementary Table S8) was used to collect 232,288 online reviews (mean=922, min=6, max=5343) from RateBeer, an online beer review database. Each review entry comprised 5 numerical scores (appearance, aroma, taste, palate and overall quality) and an optional review text. The total number of reviews per reviewer was collected separately. Numerical scores were scaled and centered per rater, and mean scores were calculated per beer.

For the review texts, the language was estimated using the packages ‘langdetect’ and ‘langid’ in Python. Reviews that were classified as English by both packages were kept. Reviewers with fewer than 100 entries overall were discarded. In total, 181,025 reviews from >6000 reviewers from >40 countries remained. Text processing was done using the ‘nltk’ package in Python. Texts were corrected for slang and misspellings; proper nouns and rare words that are relevant to the beer context were specified and kept as-is (‘Chimay’, ‘Lambic’, etc.). A dictionary of semantically similar sensorial terms (for example, ‘floral’ and ‘flower’) was created, and the terms in each group were collapsed into one term. Words were stemmed and lemmatized to avoid identifying words such as ‘acid’ and ‘acidity’ as separate terms. Numbers and punctuation were removed.

Sentences from up to 50 randomly chosen reviews per beer were manually categorized according to the aspect of beer they describe (appearance, aroma, taste, palate, overall quality; not to be confused with the 5 numerical scores described above) or flagged as irrelevant if they contained no useful information. If a beer had fewer than 50 reviews, all reviews were manually classified. This labeled data set was used to train a model that classified the rest of the sentences for all beers 91. Sentences describing taste and aroma were extracted, and term frequency–inverse document frequency (TF-IDF) was implemented to calculate enrichment scores for sensorial words per beer.
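As an illustration of how TF-IDF yields per-beer enrichment scores, the sketch below treats all taste and aroma tokens of one beer as a single document. It uses one common weighting (relative term frequency times the natural log of inverse document frequency); the exact variant used by the authors is not stated here, and the beer names and tokens are invented.

```python
import math
from collections import Counter


def tfidf_per_beer(docs):
    """docs maps beer name -> list of preprocessed tokens from its reviews.

    Returns beer -> {term: score}, with tf = count / document length and
    idf = ln(number of beers / number of beers containing the term).
    """
    n_docs = len(docs)
    doc_freq = Counter()
    for tokens in docs.values():
        doc_freq.update(set(tokens))  # count each term once per beer
    scores = {}
    for beer, tokens in docs.items():
        counts = Counter(tokens)
        scores[beer] = {
            term: (count / len(tokens)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        }
    return scores


# Hypothetical token lists for two beers.
docs = {
    "beer_a": ["banana", "fruit", "sweet", "banana"],
    "beer_b": ["roast", "coffee", "sweet"],
}
scores = tfidf_per_beer(docs)
```

A term that appears for every beer (here ‘sweet’) receives a score of zero, while a term concentrated in one beer's reviews (here ‘banana’) is scored as enriched for that beer.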

The sex of the tasting subject was not considered when building our sensory database. Instead, results from different panelists were averaged, both for our trained panel (56% male, 44% female) and the RateBeer reviews (70% male, 30% female for RateBeer as a whole).

Beer price collection and processing

Beer prices were collected from the following stores: Colruyt, Delhaize, Total Wine, BeerHawk, The Belgian Beer Shop, The Belgian Shop, and Beer of Belgium. Where applicable, prices were converted to Euros and normalized per liter. Spearman correlations were calculated between these prices and mean overall appreciation scores from RateBeer and the taste panel, respectively.

Pairwise Spearman Rank correlations were calculated between all sensory properties.

Machine learning models

Predictive modeling of sensory profiles from chemical data.

Regression models were constructed to predict (a) trained panel scores for beer flavors and quality from beer chemical profiles and (b) public reviews’ appreciation scores from beer chemical profiles. Z-scores were used to represent sensory attributes in both data sets. Chemical properties with log-normal distributions (Shapiro-Wilk test, p < 0.05) were log-transformed. Missing chemical measurements (0.1% of all data) were replaced with mean values per attribute. Observations from 250 beers were randomly separated into a training set (70%, 175 beers) and a test set (30%, 75 beers), stratified per beer style. Chemical measurements (p = 231) were normalized based on the training set average and standard deviation. In total, ten models were trained: three linear regression-based models, namely linear regression with first-order interaction terms (LR), lasso regression with first-order interaction terms (Lasso) and partial least squares regression (PLSR); five decision tree-based models, namely AdaBoost regressor (ABR), Extra Trees (ET), Gradient Boosting regressor (GBR), Random Forest (RF) and XGBoost regressor (XGBR); one support vector machine model (SVR); and one artificial neural network model (ANN). The models were implemented using the ‘scikit-learn’ package (v1.2.2) and ‘xgboost’ package (v1.7.3) in Python (v3.9.16). Models were trained, and hyperparameters optimized, using five-fold cross-validated grid search with the coefficient of determination (R²) as the evaluation metric. The ANN (scikit-learn’s MLPRegressor) was optimized using Bayesian Tree-Structured Parzen Estimator optimization with the ‘Optuna’ Python package (v3.2.0). Individual models were trained per attribute, and a multi-output model was trained on all attributes simultaneously.
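The workflow described above (stratified 70/30 split, normalization based on training data, five-fold cross-validated grid search scored by R²) can be sketched with scikit-learn as follows. This is not the authors' code: the data are synthetic, only one model type is shown, and the hyperparameter grid is a small hypothetical one chosen to keep the example fast.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Synthetic stand-in data: 250 "beers", 20 "chemical properties", 5 "styles".
X = rng.normal(size=(250, 20))
styles = np.repeat(np.arange(5), 50)
y = 2.0 * X[:, 0] - X[:, 1] + rng.normal(scale=0.5, size=250)

# 70/30 split, stratified per style.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=styles, random_state=0
)

# Keeping the scaler inside the pipeline means it is fitted only on the data
# the model is trained on (each CV training fold, then the full training set).
pipeline = Pipeline([
    ("scale", StandardScaler()),
    ("gbr", GradientBoostingRegressor(random_state=0)),
])

# Hypothetical grid for illustration; the authors' grid is not given here.
param_grid = {"gbr__n_estimators": [50, 100], "gbr__max_depth": [2, 3]}

search = GridSearchCV(pipeline, param_grid, cv=5, scoring="r2")
search.fit(X_train, y_train)

test_r2 = search.score(X_test, y_test)  # R² on the held-out test set
print(search.best_params_, round(test_r2, 3))
```

Fitting the scaler inside the pipeline is a design choice that keeps test-set statistics out of training, in the spirit of normalizing on the training set only.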

Model dissection

GBR was found to outperform other methods, resulting in models with the highest average R² values in both trained panel and public review data sets. Impurity-based rankings of the most important predictors for each predicted sensorial trait were obtained using the ‘scikit-learn’ package. To observe the relationships between these chemical properties and their predicted targets, partial dependence plots (PDP) were constructed for the six most important predictors of consumer appreciation 74,75.
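The two dissection steps above can be illustrated on synthetic data. In the sketch below, the impurity-based ranking comes from the fitted model's `feature_importances_` attribute, and the partial dependence curve is computed by hand (fix one feature at a grid value, average the predictions) to make the idea explicit. Feature names and data are hypothetical and not taken from the study.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(1)
names = ["ethyl_acetate", "ethanol", "lactic_acid", "noise"]  # hypothetical labels
X = rng.normal(size=(300, 4))
y = 3.0 * X[:, 0] + np.sin(X[:, 1]) + rng.normal(scale=0.1, size=300)

model = GradientBoostingRegressor(random_state=0).fit(X, y)

# Impurity-based importances are exposed on the fitted model and sum to 1.
ranking = sorted(zip(names, model.feature_importances_),
                 key=lambda pair: pair[1], reverse=True)


def partial_dependence_curve(model, X, feature, grid):
    """Average prediction when one feature is fixed at each grid value."""
    curve = []
    for value in grid:
        X_fixed = X.copy()
        X_fixed[:, feature] = value
        curve.append(model.predict(X_fixed).mean())
    return np.array(curve)


grid = np.linspace(-2, 2, 9)
pdp = partial_dependence_curve(model, X, feature=0, grid=grid)
print(ranking[0][0], pdp.round(2))
```

Plotting `pdp` against `grid` gives the partial dependence plot for that predictor; scikit-learn's inspection module offers ready-made routines for the same purpose.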

The ‘SHAP’ package in Python (v0.41.0) was implemented to provide an alternative ranking of predictor importance and to visualize the predictors’ effects as a function of their concentration 68 .

Validation of causal chemical properties

To validate the effects of the most important model features on predicted sensory attributes, beers were spiked with the chemical compounds identified by the models and descriptive sensory analyses were carried out according to the American Society of Brewing Chemists (ASBC) protocol 90 .

Compound spiking was done 30 min before tasting. Compounds were spiked into fresh beer bottles, which were immediately resealed and inverted three times. Fresh bottles of beer were opened for the same duration, resealed, and inverted three times to serve as controls. Pairs of spiked samples and controls were served simultaneously, chilled and in dark glasses as outlined in the Trained panel section above. Tasters were instructed to select the glass with the higher flavor intensity for each attribute (directional difference test 92) and to select the glass they preferred.

The final concentration after spiking was equal to the within-style average, after normalizing by ethanol concentration. This was done to ensure balanced flavor profiles in the final spiked beer. The same methods were applied to improve a non-alcoholic beer. Compounds were the following: ethyl acetate (Merck KGaA, W241415), ethyl hexanoate (Merck KGaA, W243906), isoamyl acetate (Merck KGaA, W205508), phenethyl acetate (Merck KGaA, W285706), ethanol (96%, Colruyt), glycerol (Merck KGaA, W252506), lactic acid (Merck KGaA, 261106).

Significant differences in preference or perceived intensity were determined by performing the two-sided binomial test on each attribute.
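For a paired comparison of this kind, an exact two-sided binomial test asks how likely a split at least as extreme as the observed one would be if tasters chose at random. The sketch below assumes a null probability of 0.5 (two glasses, no real difference) and uses invented counts; it is a minimal illustration rather than the authors' analysis code.

```python
from math import comb


def binomial_test_two_sided(successes, trials, p=0.5):
    """Exact two-sided binomial test p-value.

    Sums the probabilities of all outcomes that are no more likely than the
    observed one under the null hypothesis (success probability p).
    """
    def pmf(k):
        return comb(trials, k) * p**k * (1 - p) ** (trials - k)

    observed = pmf(successes)
    # The small relative tolerance guards against floating-point ties.
    total = sum(pmf(k) for k in range(trials + 1)
                if pmf(k) <= observed * (1 + 1e-7))
    return min(1.0, total)


# Hypothetical example: 13 of 16 tasters pick the spiked glass.
p_value = binomial_test_two_sided(13, 16)
print(round(p_value, 4))
```

With these made-up counts the p-value is about 0.021, so the difference would be called significant at the 0.05 level.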

Reporting summary

Further information on research design is available in the  Nature Portfolio Reporting Summary linked to this article.

Data availability

The data that support the findings of this work are available in the Supplementary Data files and have been deposited to Zenodo under accession code 10653704 93. The RateBeer scores data are under restricted access: they are not publicly available because they are the property of RateBeer (ZX Ventures, USA). Access can be obtained from the authors upon reasonable request and with permission of RateBeer (ZX Ventures, USA). Source data are provided with this paper.

Code availability

The code for training the machine learning models, analyzing the models, and generating the figures has been deposited to Zenodo under accession code 10653704 93 .

Tieman, D. et al. A chemical genetic roadmap to improved tomato flavor. Science 355 , 391–394 (2017).

Plutowska, B. & Wardencki, W. Application of gas chromatography–olfactometry (GC–O) in analysis and quality assessment of alcoholic beverages – A review. Food Chem. 107 , 449–463 (2008).

Legin, A., Rudnitskaya, A., Seleznev, B. & Vlasov, Y. Electronic tongue for quality assessment of ethanol, vodka and eau-de-vie. Anal. Chim. Acta 534 , 129–135 (2005).

Loutfi, A., Coradeschi, S., Mani, G. K., Shankar, P. & Rayappan, J. B. B. Electronic noses for food quality: A review. J. Food Eng. 144 , 103–111 (2015).

Ahn, Y.-Y., Ahnert, S. E., Bagrow, J. P. & Barabási, A.-L. Flavor network and the principles of food pairing. Sci. Rep. 1 , 196 (2011).

Bartoshuk, L. M. & Klee, H. J. Better fruits and vegetables through sensory analysis. Curr. Biol. 23 , R374–R378 (2013).

Piggott, J. R. Design questions in sensory and consumer science. Food Qual. Prefer. 3293 , 217–220 (1995).

Kermit, M. & Lengard, V. Assessing the performance of a sensory panel-panellist monitoring and tracking. J. Chemom. 19 , 154–161 (2005).

Cook, D. J., Hollowood, T. A., Linforth, R. S. T. & Taylor, A. J. Correlating instrumental measurements of texture and flavour release with human perception. Int. J. Food Sci. Technol. 40 , 631–641 (2005).

Chinchanachokchai, S., Thontirawong, P. & Chinchanachokchai, P. A tale of two recommender systems: The moderating role of consumer expertise on artificial intelligence based product recommendations. J. Retail. Consum. Serv. 61 , 1–12 (2021).

Ross, C. F. Sensory science at the human-machine interface. Trends Food Sci. Technol. 20 , 63–72 (2009).

Chambers, E. IV & Koppel, K. Associations of volatile compounds with sensory aroma and flavor: The complex nature of flavor. Molecules 18 , 4887–4905 (2013).

Pinu, F. R. Metabolomics—The new frontier in food safety and quality research. Food Res. Int. 72 , 80–81 (2015).

Danezis, G. P., Tsagkaris, A. S., Brusic, V. & Georgiou, C. A. Food authentication: state of the art and prospects. Curr. Opin. Food Sci. 10 , 22–31 (2016).

Shepherd, G. M. Smell images and the flavour system in the human brain. Nature 444 , 316–321 (2006).

Meilgaard, M. C. Prediction of flavor differences between beers from their chemical composition. J. Agric. Food Chem. 30 , 1009–1017 (1982).

Xu, L. et al. Widespread receptor-driven modulation in peripheral olfactory coding. Science 368 , eaaz5390 (2020).

Kupferschmidt, K. Following the flavor. Science 340 , 808–809 (2013).

Billesbølle, C. B. et al. Structural basis of odorant recognition by a human odorant receptor. Nature 615 , 742–749 (2023).

Smith, B. Perspective: Complexities of flavour. Nature 486 , S6–S6 (2012).

Pfister, P. et al. Odorant receptor inhibition is fundamental to odor encoding. Curr. Biol. 30 , 2574–2587 (2020).

Moskowitz, H. W., Kumaraiah, V., Sharma, K. N., Jacobs, H. L. & Sharma, S. D. Cross-cultural differences in simple taste preferences. Science 190 , 1217–1218 (1975).

Eriksson, N. et al. A genetic variant near olfactory receptor genes influences cilantro preference. Flavour 1 , 22 (2012).

Ferdenzi, C. et al. Variability of affective responses to odors: Culture, gender, and olfactory knowledge. Chem. Senses 38 , 175–186 (2013).

Lawless, H. T. & Heymann, H. Sensory evaluation of food: Principles and practices. (Springer, New York, NY). https://doi.org/10.1007/978-1-4419-6488-5 (2010).

Colantonio, V. et al. Metabolomic selection for enhanced fruit flavor. Proc. Natl. Acad. Sci. 119 , e2115865119 (2022).

Fritz, F., Preissner, R. & Banerjee, P. VirtualTaste: a web server for the prediction of organoleptic properties of chemical compounds. Nucleic Acids Res 49 , W679–W684 (2021).

Tuwani, R., Wadhwa, S. & Bagler, G. BitterSweet: Building machine learning models for predicting the bitter and sweet taste of small molecules. Sci. Rep. 9 , 1–13 (2019).

Dagan-Wiener, A. et al. Bitter or not? BitterPredict, a tool for predicting taste from chemical structure. Sci. Rep. 7 , 1–13 (2017).

Pallante, L. et al. Toward a general and interpretable umami taste predictor using a multi-objective machine learning approach. Sci. Rep. 12 , 1–11 (2022).

Malavolta, M. et al. A survey on computational taste predictors. Eur. Food Res. Technol. 248 , 2215–2235 (2022).

Lee, B. K. et al. A principal odor map unifies diverse tasks in olfactory perception. Science 381 , 999–1006 (2023).

Mayhew, E. J. et al. Transport features predict if a molecule is odorous. Proc. Natl. Acad. Sci. 119 , e2116576119 (2022).

Niu, Y. et al. Sensory evaluation of the synergism among ester odorants in light aroma-type liquor by odor threshold, aroma intensity and flash GC electronic nose. Food Res. Int. 113 , 102–114 (2018).

Yu, P., Low, M. Y. & Zhou, W. Design of experiments and regression modelling in food flavour and sensory analysis: A review. Trends Food Sci. Technol. 71 , 202–215 (2018).

Oladokun, O. et al. The impact of hop bitter acid and polyphenol profiles on the perceived bitterness of beer. Food Chem. 205 , 212–220 (2016).

Linforth, R., Cabannes, M., Hewson, L., Yang, N. & Taylor, A. Effect of fat content on flavor delivery during consumption: An in vivo model. J. Agric. Food Chem. 58 , 6905–6911 (2010).

Guo, S., Na Jom, K. & Ge, Y. Influence of roasting condition on flavor profile of sunflower seeds: A flavoromics approach. Sci. Rep. 9 , 11295 (2019).

Ren, Q. et al. The changes of microbial community and flavor compound in the fermentation process of Chinese rice wine using Fagopyrum tataricum grain as feedstock. Sci. Rep. 9 , 3365 (2019).

Hastie, T., Friedman, J. & Tibshirani, R. The Elements of Statistical Learning. (Springer, New York, NY). https://doi.org/10.1007/978-0-387-21606-5 (2001).

Dietz, C., Cook, D., Huismann, M., Wilson, C. & Ford, R. The multisensory perception of hop essential oil: a review. J. Inst. Brew. 126 , 320–342 (2020).

Roncoroni, M. & Verstrepen, K. J. Belgian Beer: Tested and Tasted. (Lannoo, 2018).

Meilgaard, M. Flavor chemistry of beer: Part II: Flavor and threshold of 239 aroma volatiles. in (1975).

Bokulich, N. A. & Bamforth, C. W. The microbiology of malting and brewing. Microbiol. Mol. Biol. Rev. MMBR 77 , 157–172 (2013).

Dzialo, M. C., Park, R., Steensels, J., Lievens, B. & Verstrepen, K. J. Physiology, ecology and industrial applications of aroma formation in yeast. FEMS Microbiol. Rev. 41 , S95–S128 (2017).

Datta, A. et al. Computer-aided food engineering. Nat. Food 3 , 894–904 (2022).

American Society of Brewing Chemists. Beer Methods. (American Society of Brewing Chemists, St. Paul, MN, U.S.A.).

Olaniran, A. O., Hiralal, L., Mokoena, M. P. & Pillay, B. Flavour-active volatile compounds in beer: production, regulation and control. J. Inst. Brew. 123 , 13–23 (2017).

Verstrepen, K. J. et al. Flavor-active esters: Adding fruitiness to beer. J. Biosci. Bioeng. 96 , 110–118 (2003).

Meilgaard, M. C. Flavour chemistry of beer. part I: flavour interaction between principal volatiles. Master Brew. Assoc. Am. Tech. Q 12 , 107–117 (1975).

Briggs, D. E., Boulton, C. A., Brookes, P. A. & Stevens, R. Brewing 227–254. (Woodhead Publishing). https://doi.org/10.1533/9781855739062.227 (2004).

Bossaert, S., Crauwels, S., De Rouck, G. & Lievens, B. The power of sour - A review: Old traditions, new opportunities. BrewingScience 72 , 78–88 (2019).

Verstrepen, K. J. et al. Flavor active esters: Adding fruitiness to beer. J. Biosci. Bioeng. 96 , 110–118 (2003).

Snauwaert, I. et al. Microbial diversity and metabolite composition of Belgian red-brown acidic ales. Int. J. Food Microbiol. 221 , 1–11 (2016).

Spitaels, F. et al. The microbial diversity of traditional spontaneously fermented lambic beer. PLoS ONE 9 , e95384 (2014).

Blanco, C. A., Andrés-Iglesias, C. & Montero, O. Low-alcohol Beers: Flavor Compounds, Defects, and Improvement Strategies. Crit. Rev. Food Sci. Nutr. 56 , 1379–1388 (2016).

Jackowski, M. & Trusek, A. Non-alcoholic beer production – an overview. Pol. J. Chem. Technol. 20 , 32–38 (2018).

Takoi, K. et al. The contribution of geraniol metabolism to the citrus flavour of beer: Synergy of geraniol and β-citronellol under coexistence with excess linalool. J. Inst. Brew. 116 , 251–260 (2010).

Kroeze, J. H. & Bartoshuk, L. M. Bitterness suppression as revealed by split-tongue taste stimulation in humans. Physiol. Behav. 35 , 779–783 (1985).

Mennella, J. A. et al. “A spoonful of sugar helps the medicine go down”: Bitter masking by sucrose among children and adults. Chem. Senses 40 , 17–25 (2015).

Wietstock, P., Kunz, T., Perreira, F. & Methner, F.-J. Metal chelation behavior of hop acids in buffered model systems. BrewingScience 69 , 56–63 (2016).

Sancho, D., Blanco, C. A., Caballero, I. & Pascual, A. Free iron in pale, dark and alcohol-free commercial lager beers. J. Sci. Food Agric. 91 , 1142–1147 (2011).

Rodrigues, H. & Parr, W. V. Contribution of cross-cultural studies to understanding wine appreciation: A review. Food Res. Int. 115 , 251–258 (2019).

Korneva, E. & Blockeel, H. Towards better evaluation of multi-target regression models. in ECML PKDD 2020 Workshops (eds. Koprinska, I. et al.) 353–362 (Springer International Publishing, Cham, 2020). https://doi.org/10.1007/978-3-030-65965-3_23 .

Gastón Ares. Mathematical and Statistical Methods in Food Science and Technology. (Wiley, 2013).

Grinsztajn, L., Oyallon, E. & Varoquaux, G. Why do tree-based models still outperform deep learning on tabular data? Preprint at http://arxiv.org/abs/2207.08815 (2022).

Gries, S. T. Statistics for Linguistics with R: A Practical Introduction. in Statistics for Linguistics with R (De Gruyter Mouton, 2021). https://doi.org/10.1515/9783110718256 .

Lundberg, S. M. et al. From local explanations to global understanding with explainable AI for trees. Nat. Mach. Intell. 2 , 56–67 (2020).

Ickes, C. M. & Cadwallader, K. R. Effects of ethanol on flavor perception in alcoholic beverages. Chemosens. Percept. 10 , 119–134 (2017).

Kato, M. et al. Influence of high molecular weight polypeptides on the mouthfeel of commercial beer. J. Inst. Brew. 127 , 27–40 (2021).

Wauters, R. et al. Novel Saccharomyces cerevisiae variants slow down the accumulation of staling aldehydes and improve beer shelf-life. Food Chem. 398 , 1–11 (2023).

Li, H., Jia, S. & Zhang, W. Rapid determination of low-level sulfur compounds in beer by headspace gas chromatography with a pulsed flame photometric detector. J. Am. Soc. Brew. Chem. 66 , 188–191 (2008).

Dercksen, A., Laurens, J., Torline, P., Axcell, B. C. & Rohwer, E. Quantitative analysis of volatile sulfur compounds in beer using a membrane extraction interface. J. Am. Soc. Brew. Chem. 54 , 228–233 (1996).

Molnar, C. Interpretable Machine Learning: A Guide for Making Black-Box Models Interpretable. (2020).

Zhao, Q. & Hastie, T. Causal interpretations of black-box models. J. Bus. Econ. Stat. Publ. Am. Stat. Assoc. 39 , 272–281 (2019).

Hastie, T., Tibshirani, R. & Friedman, J. The Elements of Statistical Learning. (Springer, 2019).

Labrado, D. et al. Identification by NMR of key compounds present in beer distillates and residual phases after dealcoholization by vacuum distillation. J. Sci. Food Agric. 100 , 3971–3978 (2020).

Lusk, L. T., Kay, S. B., Porubcan, A. & Ryder, D. S. Key olfactory cues for beer oxidation. J. Am. Soc. Brew. Chem. 70 , 257–261 (2012).

Gonzalez Viejo, C., Torrico, D. D., Dunshea, F. R. & Fuentes, S. Development of artificial neural network models to assess beer acceptability based on sensory properties using a robotic pourer: A comparative model approach to achieve an artificial intelligence system. Beverages 5 , 33 (2019).

Gonzalez Viejo, C., Fuentes, S., Torrico, D. D., Godbole, A. & Dunshea, F. R. Chemical characterization of aromas in beer and their effect on consumers liking. Food Chem. 293 , 479–485 (2019).

Gilbert, J. L. et al. Identifying breeding priorities for blueberry flavor using biochemical, sensory, and genotype by environment analyses. PLOS ONE 10 , 1–21 (2015).

Goulet, C. et al. Role of an esterase in flavor volatile variation within the tomato clade. Proc. Natl. Acad. Sci. 109 , 19009–19014 (2012).

Borisov, V. et al. Deep Neural Networks and Tabular Data: A Survey. IEEE Trans. Neural Netw. Learn. Syst. 1–21 https://doi.org/10.1109/TNNLS.2022.3229161 (2022).

Statista. Statista Consumer Market Outlook: Beer - Worldwide.

Seitz, H. K. & Stickel, F. Molecular mechanisms of alcohol-mediated carcinogenesis. Nat. Rev. Cancer 7 , 599–612 (2007).

Voordeckers, K. et al. Ethanol exposure increases mutation rate through error-prone polymerases. Nat. Commun. 11 , 3664 (2020).

Goelen, T. et al. Bacterial phylogeny predicts volatile organic compound composition and olfactory response of an aphid parasitoid. Oikos 129 , 1415–1428 (2020).

Reher, T. et al. Evaluation of hop (Humulus lupulus) as a repellent for the management of Drosophila suzukii. Crop Prot. 124 , 104839 (2019).

Stein, S. E. An integrated method for spectrum extraction and compound identification from gas chromatography/mass spectrometry data. J. Am. Soc. Mass Spectrom. 10 , 770–781 (1999).

American Society of Brewing Chemists. Sensory Analysis Methods. (American Society of Brewing Chemists, St. Paul, MN, U.S.A., 1992).

McAuley, J., Leskovec, J. & Jurafsky, D. Learning Attitudes and Attributes from Multi-Aspect Reviews. Preprint at https://doi.org/10.48550/arXiv.1210.3926 (2012).

Meilgaard, M. C., Carr, B. T. & Carr, B. T. Sensory Evaluation Techniques. (CRC Press, Boca Raton). https://doi.org/10.1201/b16452 (2014).

Schreurs, M. et al. Data from: Predicting and improving complex beer flavor through machine learning. Zenodo https://doi.org/10.5281/zenodo.10653704 (2024).

Acknowledgements

We thank all lab members for their discussions and all tasting panel members for their contributions. Special thanks go out to Dr. Karin Voordeckers for her tremendous help in proofreading and improving the manuscript. M.S. was supported by a Baillet-Latour fellowship; L.C. acknowledges financial support from KU Leuven (C16/17/006); and F.A.T. was supported by a PhD fellowship from FWO (1S08821N). Research in the lab of K.J.V. is supported by KU Leuven, FWO, VIB, VLAIO and the Brewing Science Serves Health Fund. Research in the lab of T.W. is supported by FWO (G.0A51.15) and KU Leuven (C16/17/006).

Author information

These authors contributed equally: Michiel Schreurs, Supinya Piampongsant, Miguel Roncoroni.

Authors and Affiliations

VIB—KU Leuven Center for Microbiology, Gaston Geenslaan 1, B-3001, Leuven, Belgium

Michiel Schreurs, Supinya Piampongsant, Miguel Roncoroni, Lloyd Cool, Beatriz Herrera-Malaver, Florian A. Theßeling & Kevin J. Verstrepen

CMPG Laboratory of Genetics and Genomics, KU Leuven, Gaston Geenslaan 1, B-3001, Leuven, Belgium

Leuven Institute for Beer Research (LIBR), Gaston Geenslaan 1, B-3001, Leuven, Belgium

Laboratory of Socioecology and Social Evolution, KU Leuven, Naamsestraat 59, B-3000, Leuven, Belgium

Lloyd Cool, Christophe Vanderaa & Tom Wenseleers

VIB Bioinformatics Core, VIB, Rijvisschestraat 120, B-9052, Ghent, Belgium

Łukasz Kreft & Alexander Botzki

AB InBev SA/NV, Brouwerijplein 1, B-3000, Leuven, Belgium

Philippe Malcorps & Luk Daenen

Contributions

S.P., M.S. and K.J.V. conceived the experiments. S.P., M.S. and K.J.V. designed the experiments. S.P., M.S., M.R., B.H. and F.A.T. performed the experiments. S.P., M.S., L.C., C.V., L.K., A.B., P.M., L.D., T.W. and K.J.V. contributed analysis ideas. S.P., M.S., L.C., C.V., T.W. and K.J.V. analyzed the data. All authors contributed to writing the manuscript.

Corresponding author

Correspondence to Kevin J. Verstrepen .

Ethics declarations

Competing interests.

K.J.V. is affiliated with bar.on. The other authors declare no competing interests.

Peer review

Peer review information.

Nature Communications thanks Florian Bauer, Andrew John Macintosh and the other, anonymous, reviewer(s) for their contribution to the peer review of this work. A peer review file is available.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Supplementary Information, Peer Review File, Description of Additional Supplementary Files, Supplementary Data 1–7, Reporting Summary, Source Data

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/ .

About this article

Cite this article.

Schreurs, M., Piampongsant, S., Roncoroni, M. et al. Predicting and improving complex beer flavor through machine learning. Nat Commun 15 , 2368 (2024). https://doi.org/10.1038/s41467-024-46346-0

Received: 30 October 2023

Accepted: 21 February 2024

Published: 26 March 2024

DOI: https://doi.org/10.1038/s41467-024-46346-0

Share this article

Anyone you share the following link with will be able to read this content:

Sorry, a shareable link is not currently available for this article.

Provided by the Springer Nature SharedIt content-sharing initiative

By submitting a comment you agree to abide by our Terms and Community Guidelines . If you find something abusive or that does not comply with our terms or guidelines please flag it as inappropriate.

Quick links

  • Explore articles by subject
  • Guide to authors
  • Editorial policies

Sign up for the Nature Briefing: Translational Research newsletter — top stories in biotechnology, drug discovery and pharma.

scientific paper or research

IMAGES

  1. Examples Of Science Paper Abstract / Research Paper Sample Pdf Chapter

    scientific paper or research

  2. Format Example Of Scientific Paper / Scientific journal article example

    scientific paper or research

  3. STRUCTURE OF A RESEARCH PAPER.docx

    scientific paper or research

  4. Tips For How To Write A Scientific Research Paper

    scientific paper or research

  5. How To Write A Chemistry Research Paper? All Details

    scientific paper or research

  6. PPT

    scientific paper or research

VIDEO

  1. Difference between Research paper and a review. Which one is more important?

  2. How to Write a Scientific Research Paper

  3. Implementing A Research Project||Scientific Inquiry Research Design and Methodology||Research Notes

  4. Difference between Research Paper and Research Article

  5. What is the Difference Between Research Paper, Research Article, Review Article

  6. How scientific papers are published

COMMENTS

  1. Google Scholar

    Google Scholar provides a simple way to broadly search for scholarly literature. Search across a wide variety of disciplines and sources: articles, theses, books, abstracts and court opinions.

  2. ScienceDirect.com

    3.3 million articles on ScienceDirect are open access. Articles published open access are peer-reviewed and made freely available for everyone to read, download and reuse in line with the user license displayed on the article. ScienceDirect is the world's leading source for scientific, technical, and medical research.

  3. Research articles

    Read the latest Research articles from Scientific Reports. ... Calls for Papers Guide to referees ... Scientific Reports (Sci Rep) ...

  4. arXiv.org e-Print archive

    arXiv is a free distribution service and an open-access archive for nearly 2.4 million scholarly articles in the fields of physics, mathematics, computer science, quantitative biology, quantitative finance, statistics, electrical engineering and systems science, and economics. Materials on this site are not peer-reviewed by arXiv.

  5. Science

    Science is a leading outlet for scientific news, commentary, and cutting-edge research. Through its print and online incarnations, Science reaches an estimated worldwide readership of more than one million. Science 's authorship is global too, and its articles consistently rank among the world's most cited research. mission & scope.

  6. JSTOR Home

    Harness the power of visual materials—explore more than 3 million images now on JSTOR. Enhance your scholarly research with underground newspapers, magazines, and journals. Explore collections in the arts, sciences, and literature from the world's leading museums, archives, and scholars. JSTOR is a digital library of academic journals ...

  7. Journal Top 100

    Journal Top 100 - 2022. This collection highlights our most downloaded* research papers published in 2022. Featuring authors from around the world, these papers highlight valuable research from an ...

  8. Browse Articles

    The variation and evolution of complete human centromeres. A comparison of two complete sets of human centromeres reveals that the centromeres show at least a 4.1-fold increase in single ...

  9. Search

    Find the research you need. With 160+ million publications, 1+ million questions, and 25+ million researchers, this is where everyone can access science. Discover the world's scientific knowledge.

  10. How to Write a Scientific Paper: Practical Guidelines

    A scientific paper is the formal lasting record of a research process. It is meant to document research protocols, methods, results and conclusions derived from an initial working hypothesis. The first medical accounts date back to antiquity.

  11. How to Write a Research Paper

    Choose a research paper topic. There are many ways to generate an idea for a research paper, from brainstorming with pen and paper to talking it through with a fellow student or professor. You can try free writing, which involves taking a broad topic and writing continuously for two or three minutes to identify absolutely anything relevant that could be interesting.

  12. Ten simple rules for reading a scientific paper

    Scientists write original research papers primarily to present new data that may change or reinforce the collective knowledge of a field. Therefore, the most important parts of this type of scientific paper are the data. Some people like to scrutinize the figures and tables (including legends) before reading any of the "main text": because ...

  13. The 100 most-cited scientific papers

    Here at Science we love ranking things, so we were thrilled with this list of the top 100 most-cited scientific papers, courtesy of Nature. Surprisingly absent are many of the landmark discoveries you might expect, such as the discovery of DNA's double helix structure. Instead, most of these influential manuscripts are slightly more utilitarian in nature.

  14. Trial of Lixisenatide in Early Parkinson's Disease

    Lixisenatide, a glucagon-like peptide-1 receptor agonist used for the treatment of diabetes, has shown neuroprotective properties in a mouse model of Parkinson's disease. In this phase 2, double ...

  15. How to (seriously) read a scientific paper

    The results and methods sections allow you to pull apart a paper to ensure it stands up to scientific rigor. Always think about the type of experiments performed, and whether these are the most appropriate to address the question proposed. Ensure that the authors have included relevant and sufficient numbers of controls.

  16. Scientific Papers

    Scientific Papers. Scientific papers are for sharing your own original research work with other scientists or for reviewing the research conducted by others. As such, they are critical to the ...

  17. Semantic Scholar

    Semantic Reader is an augmented reader with the potential to revolutionize scientific reading by making it more accessible and richly contextual. Try it for select papers. Learn More. Semantic Scholar uses groundbreaking AI and engineering to understand the semantics of scientific literature to help Scholars discover relevant research.

  18. Writing a Research Paper Introduction

    Table of contents. Step 1: Introduce your topic. Step 2: Describe the background. Step 3: Establish your research problem. Step 4: Specify your objective(s). Step 5: Map out your paper. Research paper introduction examples. Frequently asked questions about the research paper introduction.

  19. Difference between research paper and scientific paper

    A research paper is a paper containing original research. That is, if you do some work to add (or try to add) new knowledge to a field of study, and then present the details of your approach and findings in a paper, that paper can be called a research paper. Not all academic papers contain original research; other kinds of academic papers ...

  20. HOW TO WRITE A SCIENTIFIC ARTICLE

    Conducting scientific and clinical research is only the beginning of the scholarship of discovery. In order for the results of research to be accessible to other professionals and have a potential effect on the greater scientific community, it must be written and published. ... The task of writing a scientific paper and submitting it to a ...

  21. Research

    By Elizabeth Finkel. Australian museum's plan to cut research draws fire from scientists. 4 Apr 2024. By Jon Cohen. Bird flu may be spreading in cows via milking and herd transport. 4 Apr 2024. By Gina Jiménez. With money running out, astronomers urge Mexico to save its giant telescope.

  22. Toolkit: How to write a great paper

    A clear format will ensure that your research paper is understood by your readers. Follow: 1. Context — your introduction. 2. Content — your results. 3. Conclusion — your discussion. Plan ...

  23. Mapping the Increasing Use of LLMs in Scientific Papers

    Scientific publishing lays the foundation of science by disseminating research findings, fostering collaboration, encouraging reproducibility, and ensuring that scientific knowledge is accessible, verifiable, and built upon over time. Recently, there has been immense speculation about how many people are using large language models (LLMs) like ChatGPT in their academic writing, and to what ...

  24. Peer Review in Scientific Publications: Benefits, Critiques, & A

    Peer review has become fundamental in assisting editors in selecting credible, high quality, novel and interesting research papers to publish in scientific journals and to ensure the correction of any errors or issues present in submitted papers. Though the peer review process still has some flaws and deficiencies, a more suitable screening ...

  25. A periodic table of primes: Research team claims that prime numbers can

    In a nutshell, the outcome of the team's research is a handy periodic table of primes, or the PTP, pointing to the locations of prime numbers. The research is available as a working paper in the SSRN ...

  26. How to write a first-class paper

    For the whole paper, the introduction sets the context, the results present the content and the discussion brings home the conclusion. It's crucial to focus your paper on a single key message ...

  27. NIE faculty and research staff participate in the ISLS Annual Meeting

    NIE faculty and research staff will maintain a strong presence at this year's Annual Meeting of the International Society of the Learning Sciences (ISLS), with the acceptance of an early career workshop proposal, three long papers, four short papers, two posters, and two symposia for the flagship conference held in Buffalo, New York, from 8 to 14 June 2024.

  28. Mandating indoor air quality for public buildings

    Science. 28 Mar 2024. Vol 383, Issue 6690. pp. 1418 - 1420. DOI: 10.1126/science.adl0677. People living in urban and industrialized societies, which are expanding globally, spend more than 90% of their time in the indoor environment, breathing indoor air (IA). Despite decades of research and advocacy, most countries do not have legislated ...

  29. Predicting and improving complex beer flavor through machine ...

    Research in the lab of K.J.V. is supported by KU Leuven, FWO, VIB, VLAIO and the Brewing Science Serves Health Fund. Research in the lab of T.W. is supported by FWO (G.0A51.15) and KU Leuven (C16 ...