Identifying Primary and Secondary Research Articles

Primary Research Articles

Primary research articles report on a single study. In the health sciences, primary research articles generally describe the following aspects of the study:

  • The study's hypothesis or research question
  • Some articles will include information on how participants were recruited or identified, as well as additional information about participants' sex, age, or race/ethnicity
  • A "methods" or "methodology" section that describes how the study was performed and what the researchers did
  • Results and conclusion sections

Secondary Research Articles

Review articles are the most common type of secondary research article in the health sciences. A review article is a summary of previously published research on a topic. Authors who are writing a review article will search databases for previously completed research and summarize or synthesize those articles, as opposed to recruiting participants and performing a new research study.

Specific types of review articles include:

  • Systematic Reviews
  • Meta-Analysis
  • Narrative Reviews
  • Integrative Reviews
  • Literature Reviews

Review articles often report on the following:

  • The hypothesis, research question, or review topic
  • Databases searched: authors should clearly describe where and how they searched for the research included in their reviews. Systematic reviews and meta-analyses should provide detailed information on the databases searched and the search strategy the authors used.
  • Selection criteria: the researchers should describe how they decided which articles to include
  • A critical appraisal or evaluation of the quality of the articles included (most frequently included in systematic reviews and meta-analyses)
  • Discussion, results, and conclusions

Determining Primary versus Secondary Using the Database Abstract

Information found in PubMed, CINAHL, Scopus, and other databases can help you determine whether the article you're looking at is primary or secondary.

Primary research article abstract

  • Note that in the "Objectives" field, the authors describe their single, individual study.
  • In the materials and methods section, they describe the number of patients included in the study and how those patients were divided into groups.
  • These are all clues that help us determine that this abstract is describing a single, primary research article, as opposed to a literature review.

[Image: Primary research article abstract]

Secondary research/review article abstract

  • Note that the words "systematic review" and "meta-analysis" appear in the title of the article
  • The objectives field also includes the term "meta-analysis" (a common type of literature review in the health sciences)
  • The "Data Source" section includes a list of databases searched
  • The "Study Selection" section describes the selection criteria
  • These are all clues that help us determine that this abstract is describing a review article, as opposed to a single, primary research article.

[Image: Secondary research article abstract]
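To make the clues above more concrete, here is a rough Python sketch. It scores an abstract by counting a few review-article clues ("systematic review", "meta-analysis", "Data Source", "Study Selection") against a few primary-study clues, such as patients being divided into groups. The keyword lists and the function name `classify_abstract` are hypothetical choices made for illustration. They are not a feature of PubMed, CINAHL or Scopus. A keyword count is only a first pass and cannot replace reading the abstract.

```python
# Hypothetical keyword heuristic based on the abstract clues described above.
# It only counts clues; a human reader still has to confirm the result.

REVIEW_CLUES = ("systematic review", "meta-analysis", "data source", "study selection")
PRIMARY_CLUES = ("patients", "participants", "randomized", "were divided into")


def classify_abstract(text: str) -> str:
    """Return 'secondary', 'primary', or 'unclear' from simple keyword clues."""
    lowered = text.lower()
    # Each clue counts once if it appears anywhere in the abstract.
    review_hits = sum(clue in lowered for clue in REVIEW_CLUES)
    primary_hits = sum(clue in lowered for clue in PRIMARY_CLUES)
    if review_hits > primary_hits:
        return "secondary"
    if primary_hits > review_hits:
        return "primary"
    return "unclear"


if __name__ == "__main__":
    review = ("A systematic review and meta-analysis. Data Sources: PubMed, CINAHL "
              "and Scopus were searched. Study Selection: randomized trials were included.")
    study = ("Objectives: To compare two dressings. Materials and Methods: 120 patients "
             "were divided into two groups.")
    print(classify_abstract(review))  # secondary
    print(classify_abstract(study))   # primary
```

The review example is still scored as secondary even though it mentions randomized trials. The abstract is describing which studies it included, not running a trial itself.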

Full Text Challenge

Can you determine if the following articles are primary or secondary?

  • Last Updated: Feb 17, 2024 5:25 PM
  • URL: https://library.usfca.edu/primary-secondary

2130 Fulton Street San Francisco, CA 94117-1080 415-422-5555

Copyright © 2022 University of San Francisco

Primary vs secondary research – what’s the difference?

Find out how primary and secondary research are different from each other, and how you can use them both in your own research program.

Primary vs secondary research: in a nutshell

The essential difference between primary and secondary research lies in who collects the data.

  • Primary research definition

When you conduct primary research, you’re collecting data by doing your own surveys or observations.

  • Secondary research definition

In secondary research, you’re looking at existing data from other researchers, such as academic journals, government agencies or national statistics.

When to use primary vs secondary research

Primary research and secondary research both offer value in helping you gather information.

Each research method can be used alone to good effect. But when you combine the two research methods, you have the ingredients for a highly effective market research strategy. Most research combines some element of both primary methods and secondary source consultation.

So assuming you’re planning to do both primary and secondary research – which comes first? Counterintuitive as it sounds, it’s more usual to start your research process with secondary research, then move on to primary research.

Secondary research can prepare you for collecting your own data in a primary research project. It can give you a broad overview of your research area, identify influences and trends, and may give you ideas and avenues to explore that you hadn’t previously considered.

Given that secondary research can be done quickly and inexpensively, it makes sense to start your primary research process with some kind of secondary research. Even if you’re expecting to find out what you need to know from a survey of your target market, taking a small amount of time to gather information from secondary sources is worth doing.

Types of market research

Primary research

Primary market research is original research carried out when a company needs timely, specific data about something that affects its success or potential longevity.

Primary research data collection might be carried out in-house by a business analyst or market research team within the company, or it may be outsourced to a specialist provider, such as an agency or consultancy. While outsourcing primary research involves a greater upfront expense, it’s less time-consuming and can bring added benefits such as researcher expertise and a ‘fresh eyes’ perspective that reduces the risk of bias and partiality affecting the research data.

Primary research gives you recent data from known primary sources about the particular topic you care about, but collecting that data from scratch takes longer than finding secondary data via an internet search or library visit.

Primary research generally takes one of two forms:

  • Exploratory research: This type of primary research is carried out to determine the nature of a problem that hasn’t yet been clearly defined. For example, a supermarket wants to improve its poor customer service and needs to understand the key drivers behind the customer experience issues. It might do this by interviewing employees and customers, or by running a survey program or focus groups.
  • Conclusive research: This form of primary research is carried out to solve a problem that the exploratory research – or other forms of primary data – has identified. For example, say the supermarket’s exploratory research found that employees weren’t happy. Conclusive research went deeper, revealing that the manager was rude, unreasonable, and difficult, making the employees unhappy and resulting in a poor employee experience, which in turn led to less than excellent customer service. Thanks to the company’s choice to conduct primary research, a new manager was brought in, employees were happier and customer service improved.

Examples of primary research

All of the following are forms of primary research data.

  • Customer satisfaction survey results
  • Employee experience pulse survey results
  • NPS rating scores from your customers
  • A field researcher’s notes
  • Data from weather stations in a local area
  • Recordings made during focus groups

Primary research methods

There are a number of primary research methods to choose from, and they are already familiar to most people. The ones you choose will depend on your budget, your time constraints, your research goals and whether you’re looking for quantitative or qualitative data.

Surveys

A survey can be carried out online, offline, face to face or via other media such as phone or SMS. It’s relatively cheap to do, since participants can self-administer the questionnaire in most cases. You can automate much of the process if you invest in good quality survey software.

Interviews

Primary research interviews can be carried out face to face, over the phone or via video calling. They’re more time-consuming than surveys, and they require the time and expense of a skilled interviewer and a dedicated room, phone line or video calling setup. However, a personal interview can provide a very rich primary source of data based not only on the participant’s answers but also on the observations of the interviewer.

Focus groups

A focus group is an interview with multiple participants at the same time. It often takes the form of a discussion moderated by the researcher. As well as taking less time and fewer resources than a series of one-to-one interviews, a focus group can benefit from the interactions between participants, which bring out more ideas and opinions. However, this can also lead to conversations going off on a tangent, which the moderator must be able to skilfully avoid by guiding the group back to the relevant topic.

Secondary research

Secondary research is research that has already been done by someone else prior to your own research study.

Secondary research is generally the best place to start any research project as it will reveal whether someone has already researched the same topic you’re interested in, or a similar topic that helps lay some of the groundwork for your research project.

Even if your preliminary secondary research doesn’t turn up a study similar to your own research goals, it will still give you a stronger knowledge base that you can use to strengthen and refine your research hypothesis. You may even find some gaps in the market you didn’t know about before.

Secondary research examples

The scope of secondary research resources is extremely broad. Here are just a few of the places you might look for relevant information.

Books and magazines

A public library can turn up a wealth of data in the form of books and magazines – and it doesn’t cost a penny to consult them.

Market research reports

Secondary research from professional research agencies can be highly valuable, as these agencies generally use sound data collection and analysis methods.

Scholarly journals, often available in reference libraries

Articles in peer-reviewed journals have been examined by experts in the relevant field before publication, adding an extra layer of oversight and careful scrutiny of the data.

Government reports and studies

Public domain data, such as census data, can provide relevant information for your research project, not least in choosing the appropriate research population for a primary research method. If the information you need isn’t readily available, try contacting the relevant government agencies.

White papers

Businesses often produce white papers as a means of showcasing their expertise and value in their field. White papers can be helpful in secondary research methods, although they may not be as carefully vetted as academic papers or public records.

Trade or industry associations

Associations may have secondary data that goes back a long way and offers a general overview of a particular industry. This data collected over time can be very helpful in laying the foundations of your particular research project.

Private company data

Some businesses may offer their company data to those conducting research in return for fees or with explicit permissions. However, if a business has data that’s closely relevant to yours, it’s likely they are a competitor and may refuse your request outright.

Examples of secondary research data

These are all forms of secondary research data in action:

  • A newspaper report quoting statistics sourced by a journalist
  • Facts from primary research articles quoted during a debate club meeting
  • A blog post discussing new national figures on the economy
  • A company consulting previous research published by a competitor

Secondary research methods

Literature reviews

Literature reviews are a core part of the secondary research process, involving data collection and constructing an argument around multiple sources. A literature review involves gathering information from a wide range of secondary sources on one topic and summarizing them in a report or in the introduction to a primary research report.

Content analysis

This systematic approach is widely used in social science disciplines. Researchers assign codes to themes, tropes or key phrases, then tally each code according to how often it occurs in the secondary data. The results help researchers draw conclusions from qualitative data.
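As a rough illustration of the tallying step, the Python sketch below counts how often the key phrases for each code appear across a few text snippets. The coding frame (the codes "price", "service" and "quality" and their phrases) and the sample snippets are hypothetical. Real content analysis depends on a carefully defined and tested coding scheme. Simple substring matching like this will also count a phrase that appears inside a longer word.

```python
from collections import Counter

# Hypothetical coding frame: each code is represented by a few key phrases.
CODING_FRAME = {
    "price": ("expensive", "cheap", "cost"),
    "service": ("staff", "rude", "helpful"),
    "quality": ("broken", "durable", "well made"),
}


def tally_codes(documents, frame=CODING_FRAME):
    """Count how often each code's key phrases occur across all documents."""
    # Start every code at zero so codes that never appear are still reported.
    counts = Counter({code: 0 for code in frame})
    for doc in documents:
        text = doc.lower()
        for code, phrases in frame.items():
            # str.count returns the number of non-overlapping occurrences.
            counts[code] += sum(text.count(phrase) for phrase in phrases)
    return counts


if __name__ == "__main__":
    snippets = [
        "The staff were rude and the tickets were expensive.",
        "Helpful staff, but parking is not cheap.",
        "The seats felt well made.",
    ]
    print(tally_codes(snippets).most_common())
    # [('service', 4), ('price', 2), ('quality', 1)]
```

The resulting frequencies are the starting point for interpretation. A researcher would still read the coded passages before drawing conclusions from them.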

Data analysis using digital tools

You can analyze large volumes of data using software that can recognize and categorize natural language. More advanced tools will even be able to identify relationships and semantic connections within the secondary research materials.

Comparing primary vs secondary research

We’ve established that both primary research and secondary research have benefits for your business, and that there are major differences in terms of the research process, the cost, the research skills involved and the types of data gathered. But is one of them better than the other?

The answer largely depends on your situation. Whether primary or secondary research wins out in your specific case depends on the particular topic you’re interested in and the resources you have available. The positive aspects of one method might be enough to sway you, or the drawbacks – such as a lack of credible evidence already published, as might be the case in very fast-moving industries – might make one method totally unsuitable.

Here’s an at-a-glance look at the features and characteristics of primary vs secondary research, illustrating some of the key differences between them.

What are the pros and cons of primary research?

Primary research provides original data and allows you to pinpoint the issues you’re interested in and collect data from your target market – with all the effort that entails.

Benefits of primary research:

  • Tells you what you need to know, nothing irrelevant
  • Yours exclusively – once acquired, you may be able to sell primary data or use it for marketing
  • Teaches you more about your business
  • Can help foster new working relationships and connections between silos
  • Primary research methods can provide upskilling opportunities – employees gain new research skills

Limitations of primary research:

  • Lacks context from other research on related subjects
  • Can be expensive
  • Results aren’t ready to use until the project is complete
  • Any mistakes you make in research design or implementation could compromise your data quality
  • May not have lasting relevance – although it could fulfill a benchmarking function if things change

What are the pros and cons of secondary research?

Secondary research relies on secondary sources, which can be both an advantage and a drawback. After all, other people are doing the work, but they’re also setting the research parameters.

Benefits of secondary research:

  • It’s often low cost or even free to access in the public domain
  • Supplies a knowledge base for researchers to learn from
  • Data has usually already been collected, analyzed and checked, saving you time and costs
  • It’s ready to use as soon as you acquire it

Limitations of secondary research:

  • May not provide enough specific information
  • Conducting a literature review in a well-researched subject area can become overwhelming
  • No added value from publishing or re-selling your research data
  • Results may not be conclusive for you – you’ll only ever be interpreting data from another organization’s experience, not your own
  • Details of the research methodology may be unknown
  • May be out of date – always check carefully when the original research was conducted

Tutorial: Evaluating Information: Primary vs. Secondary Articles

Primary vs. Secondary Research Articles

In the sciences, primary (or empirical) research articles:

  • are original scientific reports of new research findings (please note that an original scientific article does not include review articles, which summarize the research literature on a particular subject, or articles using meta-analyses, which analyze previously published data)
  • usually include the following sections: Introduction, Methods, Results, Discussion, References
  • are usually peer-reviewed (examined by expert(s) in the field before publication). Please note that a peer-reviewed article is not the same as a review article, which summarizes the research literature on a particular subject.

You may also choose to use some secondary sources (summaries or interpretations of original research) such as books (find these through the library catalog) or review articles (articles which organize and critically analyze the research of others on a topic). These secondary sources, particularly review articles, are often useful and easier-to-read summaries of research in an area. Additionally, you can use the listed references to find useful primary research articles.

Anatomy of a Scholarly Article

[Image: Anatomy of a scholarly article, from NCSU Libraries' Anatomy of a Scholarly Article]

Types of health studies

In the sciences, particularly the health sciences, there are a number of types of primary articles (the gold standard being randomized controlled trials) and secondary articles (the gold standard being systematic reviews and meta-analyses). The chart below summarizes their differences, and the linked article gives more information.

[Image: Chart of health study types]

Searching for Primary vs. Secondary Articles

Some scholarly databases will allow you to specify what kind of scholarly literature you're looking for. However, be careful! Depending on the database, the Review article type may mean book review instead of, or as well as, review article. You may also have to look under more options or custom options to find these choices.
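PubMed is one example. It lets you restrict a search to an article type with the publication type field tag `[pt]`. The Python sketch below only builds a query string and an NCBI E-utilities `esearch` URL. It does not send a request. The helper names `pubmed_query` and `esearch_url` and the example topic are hypothetical. Other databases such as CINAHL or Scopus use their own limiters, so check each database's help pages.

```python
from urllib.parse import urlencode

# NCBI E-utilities search endpoint for PubMed and other Entrez databases.
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def pubmed_query(topic, publication_type=None, exclude_reviews=False):
    """Build a PubMed search string, optionally limited by publication type."""
    term = f"({topic})"
    if publication_type:
        # [pt] restricts matches to PubMed's publication type field.
        term += f" AND {publication_type}[pt]"
    if exclude_reviews:
        # Drop records PubMed has tagged with the Review publication type.
        term += " NOT review[pt]"
    return term


def esearch_url(term):
    """Return the esearch URL for a PubMed query (no request is sent)."""
    return ESEARCH_URL + "?" + urlencode({"db": "pubmed", "term": term})


if __name__ == "__main__":
    # Secondary articles: systematic reviews on a topic.
    print(pubmed_query("pressure ulcer prevention", "systematic review"))
    # Primary articles: randomized controlled trials, excluding reviews.
    print(pubmed_query("pressure ulcer prevention", "randomized controlled trial",
                       exclude_reviews=True))
    print(esearch_url(pubmed_query("pressure ulcer prevention", "meta-analysis")))
```

Even with a publication type filter, it is still worth skimming the abstracts. Publication types are assigned by indexers and may not match how the article describes itself.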

  • Last Updated: Oct 20, 2021 11:11 AM
  • URL: https://guides.library.cornell.edu/evaluate

University Library

Distinguish between primary and secondary sources.

1. Introduction

Whether conducting research in the social sciences, humanities (especially history), arts, or natural sciences, the ability to distinguish between primary and secondary source material is essential. This distinction indicates the degree to which the author of a piece is removed from the actual event being described, informing the reader as to whether the author is reporting impressions first hand (or is first to record these immediately following an event) or conveying the experiences and opinions of others (that is, second hand).

2. Primary sources

These are contemporary accounts of an event, written by someone who experienced or witnessed the event in question. These original documents (i.e., they are not about another document or account) are often diaries, letters, memoirs, journals, speeches, manuscripts, interviews and other such unpublished works. They may also include published pieces such as newspaper or magazine articles (as long as they are written soon after the fact and not as historical accounts), photographs, audio or video recordings, research reports in the natural or social sciences, or original literary or theatrical works.  

3. Secondary sources

The function of these is to interpret primary sources, so they can be described as at least one step removed from the event or phenomenon under review. Secondary source materials, then, interpret, assign value to, conjecture upon, and draw conclusions about the events reported in primary sources. These are usually in the form of published works such as journal articles or books, but may include radio or television documentaries, or conference proceedings.

4. Defining questions

When evaluating primary or secondary sources, the following questions might be asked to help ascertain the nature and value of material being considered:

  • How does the author know these details (names, dates, times)? Was the author present at the event or soon on the scene?
  • Where does this information come from—personal experience, eyewitness accounts, or reports written by others?
  • Are the author's conclusions based on a single piece of evidence, or have many sources been taken into account (e.g., diary entries, along with third-party eyewitness accounts, impressions of contemporaries, newspaper accounts)?

Ultimately, all source materials of whatever type must be assessed critically, and even the most scrupulous and thorough work is viewed through the eyes of the writer/interpreter. This must be taken into account when one is attempting to arrive at the 'truth' of an event.

Creative Commons Attribution 3.0 License except where otherwise noted.

Evaluating Information

Learning objectives

This guide is designed to help you:

  • Identify the difference between a primary and a secondary source
  • Discuss the roles that each type plays in academic research

What is a primary source?

Primary sources are evidence that was created at the time under study. They include printed, manuscript/archival, audio/visual, and born-digital materials. When analyzing a primary source, it’s important to consider who the intended audience might have been. For example, a letter could have been sent to an individual reader; a newspaper article would have been intended for a broader audience.

  • Use primary sources to inform your research about a particular time, place, or individual.
  • Primary sources can be found online through research databases, websites like Twitter, and digitized special collections, including many items from the Brown University Library's Special Collections. Search the Brown Digital Repository for digitized special collections material.
  • Upon request, the Library can scan some primary source material that is not already digitized.
Note for research in the sciences: Primary sources in the sciences are forms of documentation of original research. This could be a conference paper, presentation, journal article, lab notebook, dissertation, or patent.

Ask yourself: How could this image be used as evidence to support my research?

Citation:  Boitard, Louis-Philippe, "Hannah Snell the female soldier" (1750). Prints, Drawings and Watercolors from the Anne S.K. Brown Military Collection. Brown Digital Repository. Brown University Library. https://repository.library.brown.edu/studio/item/bdr:227249/

What is a secondary source?

Secondary sources are scholarly or other analyses of a primary source, created by a person not directly involved with the time period or event being studied.

  • Use secondary sources to recreate, analyze, critique, and/or report on a particular topic based on review of a single or a collection of primary sources.
  • Secondary sources available online include ebooks and journals. Learn more in the Finding Information tutorial. 
  • If a secondary source is unavailable electronically through the Library, you can suggest a purchase. Once the suggestion is received, we will try to find an electronic copy of the material.
Note for research in the sciences: Secondary sources in the sciences are publications that comment on or analyze original research. This could be a handbook, monograph, public opinion, encyclopedia, or government or public policy.

Based on the research we were doing in the first example, let's look for research that others have done about cross-dressing in history, especially around the time that the etching above was created.

You can search the Library's catalog (BruKnow) with the keywords "cross dressing 18th century".

Within the results, you see a book titled In the Company of Men: Cross-dressed Women around 1800.

Krimmer, E. (2004). In the company of men : cross-dressed women around 1800. Detroit, Mich.: Wayne State University Press.

If you click to get more information on the book, you will find useful information to provide context and background for the etching, housed in the Library's Special Collections. 

  • Last Updated: Feb 16, 2024 3:55 PM
  • URL: https://libguides.brown.edu/evaluate

Brown University Library | Providence, RI 02912 | (401) 863-2165

Primary vs. Secondary Sources | Difference & Examples

Published on 4 September 2022 by Raimo Streefkerk. Revised on 15 May 2023.

When you do research, you have to gather information and evidence from a variety of sources.

Primary sources provide raw information and first-hand evidence. Examples include interview transcripts, statistical data, and works of art. A primary source gives you direct access to the subject of your research.

Secondary sources provide second-hand information and commentary from other researchers. Examples include journal articles, reviews, and academic books. A secondary source describes, interprets, or synthesises primary sources.

Primary sources are more credible as evidence, but good research uses both primary and secondary sources.

What is a primary source?

A primary source is anything that gives you direct evidence about the people, events, or phenomena that you are researching. Primary sources will usually be the main objects of your analysis.

If you are researching the past, you cannot directly access it yourself, so you need primary sources that were produced at the time by participants or witnesses (e.g. letters, photographs, newspapers).

If you are researching something current, your primary sources can either be qualitative or quantitative data that you collect yourself (e.g. through interviews, surveys, experiments) or sources produced by people directly involved in the topic (e.g. official documents or media texts).

What is a secondary source?

A secondary source is anything that describes, interprets, evaluates, or analyses information from primary sources. Common examples include:

  • Books, articles and documentaries that synthesise information on a topic
  • Synopses and descriptions of artistic works
  • Encyclopaedias and textbooks that summarize information and ideas
  • Reviews and essays that evaluate or interpret something

When you cite a secondary source, it’s usually not to analyse it directly. Instead, you’ll probably test its arguments against new evidence or use its ideas to help formulate your own.

Examples of sources that can be primary or secondary

A secondary source can become a primary source depending on your research question. If the person, context, or technique that produced the source is the main focus of your research, it becomes a primary source.

To determine if something can be used as a primary or secondary source in your research, there are some simple questions you can ask yourself:

  • Does this source come from someone directly involved in the events I’m studying (primary) or from another researcher (secondary)?
  • Am I interested in analysing the source itself (primary) or only using it for background information (secondary)?
  • Does the source provide original information (primary) or does it comment upon information from other sources (secondary)?

Most research uses both primary and secondary sources. They complement each other to help you build a convincing argument. Primary sources are more credible as evidence, but secondary sources show how your work relates to existing research.

What do you use primary sources for?

Primary sources are the foundation of original research. They allow you to:

  • Make new discoveries
  • Provide credible evidence for your arguments
  • Give authoritative information about your topic

If you don’t use any primary sources, your research may be considered unoriginal or unreliable.

What do you use secondary sources for?

Secondary sources are good for gaining a full overview of your topic and understanding how other researchers have approached it. They often synthesise a large number of primary sources that would be difficult and time-consuming to gather by yourself. They allow you to:

  • Gain background information on the topic
  • Support or contrast your arguments with other researchers’ ideas
  • Gather information from primary sources that you can’t access directly (e.g. private letters or physical documents located elsewhere)

When you conduct a literature review, you can consult secondary sources to gain a thorough overview of your topic. If you want to mention a paper or study that you find cited in a secondary source, seek out the original source and cite it directly.

Remember that all primary and secondary sources must be cited to avoid plagiarism.

Common examples of primary sources include interview transcripts, photographs, novels, paintings, films, historical documents, and official statistics.

Anything you directly analyze or use as first-hand evidence can be a primary source, including qualitative or quantitative data that you collected yourself.

Common examples of secondary sources include academic books, journal articles, reviews, essays, and textbooks.

Anything that summarizes, evaluates or interprets primary sources can be a secondary source. If a source gives you an overview of background information or presents another researcher’s ideas on your topic, it is probably a secondary source.

To determine if a source is primary or secondary, ask yourself:

  • Was the source created by someone directly involved in the events you’re studying (primary), or by another researcher (secondary)?
  • Does the source provide original information (primary), or does it summarize information from other sources (secondary)?
  • Are you directly analyzing the source itself (primary), or only using it for background information (secondary)?

Some types of sources are nearly always primary: works of art and literature, raw statistical data, official documents and records, and personal communications (e.g. letters, interviews). If you use one of these in your research, it is probably a primary source.

Primary sources are often considered the most credible in terms of providing evidence for your argument, as they give you direct evidence of what you are researching. However, it’s up to you to ensure the information they provide is reliable and accurate.

Always make sure to properly cite your sources to avoid plagiarism.

A fictional movie is usually a primary source. A documentary can be either primary or secondary depending on the context.

If you are directly analysing some aspect of the movie itself – for example, the cinematography, narrative techniques, or social context – the movie is a primary source.

If you use the movie for background information or analysis about your topic – for example, to learn about a historical event or a scientific discovery – the movie is a secondary source.

Whether it’s primary or secondary, always properly cite the movie in the citation style you are using.

Articles in newspapers and magazines can be primary or secondary depending on the focus of your research.

In historical studies, old articles are used as primary sources that give direct evidence about the time period. In social and communication studies, articles are used as primary sources to analyse language and social relations (for example, by conducting content analysis or discourse analysis).

If you are not analysing the article itself, but only using it for background information or facts about your topic, then the article is a secondary source.

Cite this Scribbr article


Streefkerk, R. (2023, May 15). Primary vs. Secondary Sources | Difference & Examples. Scribbr. Retrieved 3 June 2024, from https://www.scribbr.co.uk/working-sources/primary-vs-secondary-sources/


Raimo Streefkerk


Survey Software & Market Research Solutions - Sawtooth Software


Primary vs Secondary Research: Differences, Methods, Sources, and More


Primary vs Secondary Research – What’s the Difference?

In the search for knowledge and data to inform decisions, researchers and analysts rely on a blend of research sources. These sources are broadly categorized into primary and secondary research, each serving unique purposes and offering different insights into the subject matter at hand. But what exactly sets them apart?

Primary research is the process of gathering fresh data directly from its source. This approach offers real-time insights and specific information tailored to specific objectives set by stakeholders. Examples include surveys, interviews, and observational studies.

Secondary research, on the other hand, involves the analysis of existing data, most often collected and presented by others. This type of research is invaluable for understanding broader trends, providing context, or validating hypotheses. Common sources include scholarly articles, industry reports, and data compilations.

The crux of the difference lies in the origin of the information: primary research yields firsthand data which can be tailored to a specific business question, whilst secondary research synthesizes what's already out there. In essence, primary research listens directly to the voice of the subject, whereas secondary research hears it secondhand.

When to Use Primary and Secondary Research

Selecting the appropriate research method is pivotal and should be aligned with your research objectives. The choice between primary and secondary research is not merely procedural but strategic, influencing the depth and breadth of insights you can uncover.

Primary research shines when you need up-to-date, specific information directly relevant to your study. It's the go-to for fresh insights, understanding consumer behavior, or testing new theories. Its bespoke nature makes it indispensable for tailoring questions to get the exact answers you need.


Secondary research is your first step into the research world. It helps set the stage by offering a broad understanding of the topic. Before diving into costly primary research, secondary research can validate the need for further investigation or provide a solid background to build upon. It's especially useful for identifying trends, benchmarking, and situating your research within the existing body of knowledge.

Combining both methods can significantly enhance your research. Starting with secondary research lays the groundwork and narrows the focus, whilst subsequent primary research delves deep into specific areas of interest, providing a well-rounded, comprehensive understanding of the topic.

Primary vs Secondary Research Methods

In the landscape of market research, the methodologies employed can significantly influence the insights and conclusions drawn. Let's delve deeper into the various methods underpinning both primary and secondary research, shedding light on their unique applications and the distinct insights they offer.


Primary Research Methods:

  • Surveys: Surveys are a cornerstone of primary research, offering a quantitative approach to gathering data directly from the target audience. By employing structured questionnaires, researchers can collect a wide range of data, from customer preferences to behavioral patterns. With an adequate sample, this method yields data that can support statistically reliable conclusions to inform decision-making and strategy development. Statistical techniques such as key drivers analysis, MaxDiff, or conjoint analysis can extract further insight from the collected data.
  • One-on-One Interviews: Interviews add qualitative depth to primary research, allowing for a nuanced exploration of participants' attitudes, experiences, and motivations. Conducted either face-to-face or remotely, interviews enable researchers to delve into the complexities of human behavior, offering rich insights that surveys alone may not uncover. This method is instrumental in exploring new areas of research or obtaining detailed information on specific topics.
  • Focus Groups: Focus groups bring together a small, diverse group of participants to discuss and provide feedback on a particular subject, product, or idea. This interactive setting fosters a dynamic exchange of ideas, revealing consumers' perceptions, experiences, and preferences. Focus groups are invaluable for testing concepts, exploring market trends, and understanding the factors that influence consumer decisions.
  • Ethnographic Studies: Ethnographic studies involve systematically observing, recording, and analyzing behaviors and events in their natural setting. This method offers an unobtrusive way to gather authentic data on how people interact with products, services, or environments, providing insights that can lead to more user-centered design and marketing strategies.


Secondary Research Methods:

  • Literature Reviews: Literature reviews involve the comprehensive examination of existing research and publications on a given topic. This method enables researchers to synthesize findings from a range of sources, providing a broad understanding of what is already known about a subject and identifying gaps in current knowledge.
  • Meta-Analysis: Meta-analysis is a statistical technique that combines the results of multiple studies to arrive at a comprehensive conclusion. This method is particularly useful in secondary research for aggregating findings across different studies, offering a more robust understanding of the evidence on a particular topic.
  • Content Analysis: Content analysis is a method for systematically analyzing texts, media, or other content to quantify patterns, themes, or biases. This approach allows researchers to assess the presence of certain words, concepts, or sentiments within a body of work, providing insights into trends, representations, and societal norms. This can be performed across a range of sources including social media, customer forums or review sites.
  • Historical Research: Historical research involves the study of past events, trends, and behaviors through the examination of relevant documents and records. This method can provide context and understanding of current trends and inform future predictions, offering a unique perspective that enriches secondary research.
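Content analysis, as described above, typically begins by counting how often chosen words or concepts occur across a body of texts. The Python sketch below illustrates only that counting step, assuming a hypothetical set of review snippets and a hypothetical keyword list. The functions `keyword_counts` and `share_mentioning` are illustrative names, not part of any particular tool, and a real study would apply a validated coding scheme rather than raw word matching.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into word tokens (apostrophes kept, so "don't" is one token)."""
    return re.findall(r"[a-z']+", text.lower())


def keyword_counts(texts, keywords):
    """Count how many times each keyword appears across all texts (case-insensitive, whole words)."""
    wanted = {k.lower() for k in keywords}
    counts = Counter({k: 0 for k in wanted})  # start every keyword at zero
    for text in texts:
        for token in tokenize(text):
            if token in wanted:
                counts[token] += 1
    return counts


def share_mentioning(texts, keyword):
    """Fraction of texts that mention the keyword at least once."""
    kw = keyword.lower()
    hits = sum(1 for t in texts if kw in tokenize(t))
    return hits / len(texts) if texts else 0.0


# Hypothetical snippets, e.g. gathered from a review site
reviews = [
    "Great price, but delivery was slow.",
    "Slow delivery and poor packaging.",
    "Excellent quality for the price.",
]
print(keyword_counts(reviews, ["price", "delivery", "quality"]))
print(share_mentioning(reviews, "delivery"))
```

Counting both total mentions and the share of texts that mention a term helps separate a theme raised once by many people from one repeated heavily by a few.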

Each of these methods, whether primary or secondary, plays a crucial role in the mosaic of market research, offering distinct pathways to uncovering the insights necessary to drive informed decisions and strategies.

Primary vs Secondary Sources in Research

Both primary and secondary sources form the backbone of the insight generation process, and when they are used in tandem they provide a strong stepping stone toward genuine insights. Let’s explore how each category serves its unique purpose in the research ecosystem.

Primary Research Data Sources

Primary research data sources are the lifeblood of firsthand research, providing raw, unfiltered insights directly from the source. These include:

  • Customer Satisfaction Survey Results: Direct feedback from customers about their satisfaction with a product or service. This data is invaluable for identifying strengths to build on and areas for improvement, and it is typically refreshed each month or quarter so that metrics can be tracked over time.
  • NPS Rating Scores from Customers: Net Promoter Score (NPS) provides a straightforward metric to gauge customer loyalty and satisfaction. This quantitative data can reveal much about customer sentiment and the likelihood of referrals.
  • Ad-hoc Surveys: Ad-hoc surveys can cover any topic that requires investigation; they are typically one-off surveys that zero in on a single business objective. Ad-hoc projects are useful for situations such as investigating issues identified in other tracking surveys, new product development, ad testing, brand messaging, and many other kinds of projects.
  • A Field Researcher’s Notes: Detailed observations from fieldwork can offer nuanced insights into user behaviors, interactions, and environmental factors that influence those interactions. These notes are a goldmine for understanding the context and complexities of user experiences.
  • Recordings Made During Focus Groups: Audio or video recordings of focus group discussions capture the dynamics of conversation, including reactions, emotions, and the interplay of ideas. Analyzing these recordings can uncover nuanced consumer attitudes and perceptions that might not be evident in survey data alone.

These primary data sources are characterized by their immediacy and specificity, offering a direct line to the subject of study. They enable researchers to gather data that is specifically tailored to their research objectives, providing a solid foundation for insightful analysis and strategic decision-making.
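The NPS metric listed above is conventionally computed from answers to a single 0 to 10 "How likely are you to recommend us?" question: respondents scoring 9 or 10 are promoters, 0 to 6 are detractors, and 7 or 8 are passives. NPS is the percentage of promoters minus the percentage of detractors, giving a value between -100 and 100. A minimal sketch under that standard definition follows; the function name and the ratings are hypothetical.

```python
def net_promoter_score(ratings):
    """Return NPS (-100 to 100) from 0-10 likelihood-to-recommend ratings."""
    if not ratings:
        raise ValueError("need at least one rating")
    for r in ratings:
        if not 0 <= r <= 10:
            raise ValueError(f"rating out of range: {r}")
    promoters = sum(1 for r in ratings if r >= 9)   # scores of 9 or 10
    detractors = sum(1 for r in ratings if r <= 6)  # scores of 0 to 6
    # Passives (7 or 8) count toward the total but toward neither group
    return 100 * (promoters - detractors) / len(ratings)


# Hypothetical responses from one survey wave
ratings = [10, 9, 9, 8, 7, 6, 3, 10, 9, 5]
print(net_promoter_score(ratings))  # → 20.0
```

Here 5 of 10 respondents are promoters and 3 are detractors, so the score is 50% minus 30%, or 20. Recomputing the same function on each survey wave gives the tracked-over-time view described for satisfaction surveys.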

Secondary Research Data Sources

In contrast, secondary research data sources offer a broader perspective, compiling and synthesizing information from various origins. These sources include:

  • Books, Magazines, Scholarly Journals: Published works provide comprehensive overviews, detailed analyses, and theoretical frameworks that can inform research topics, offering depth and context that enriches primary data.
  • Market Research Reports: These reports aggregate data and analyses on industry trends, consumer behavior, and market dynamics, providing a macro-level view that can guide primary research directions and validate findings.
  • Government Reports: Official statistics and reports from government agencies offer authoritative data on a wide range of topics, from economic indicators to demographic trends, providing a reliable basis for secondary analysis.
  • White Papers, Private Company Data: White papers and reports from businesses and consultancies offer insights into industry-specific research, best practices, and market analyses. These sources can be invaluable for understanding the competitive landscape and identifying emerging trends.

Secondary data sources serve as a compass, guiding researchers through the vast landscape of information to identify relevant trends, benchmark against existing data, and build upon the foundation of existing knowledge. They can significantly expedite the research process by leveraging the collective wisdom and research efforts of others.

By adeptly navigating both primary and secondary sources, researchers can construct a well-rounded research project that combines the depth of firsthand data with the breadth of existing knowledge. This holistic approach ensures a comprehensive understanding of the research topic, fostering informed decisions and strategic insights.

Examples of Primary and Secondary Research in Marketing

In the realm of marketing, both primary and secondary research methods play critical roles in understanding market dynamics, consumer behavior, and competitive landscapes. By comparing examples across both methodologies, we can appreciate their unique contributions to strategic decision-making.

Example 1: New Product Development

Primary Research: Direct Consumer Feedback through Surveys and Focus Groups

  • Objective: To gauge consumer interest in a new product concept and identify preferred features.
  • Process: Surveys distributed to a target demographic to collect quantitative data on consumer preferences, and focus groups conducted to dive deeper into consumer attitudes and desires.
  • Insights: Direct insights into consumer needs, preferences for specific features, and willingness to pay. These insights help in refining product design and developing a targeted marketing strategy.

Secondary Research: Market Analysis Reports

  • Objective: To understand the existing market landscape, including competitor products and market trends.
  • Process: Analyzing published market analysis reports and industry studies to gather data on market size, growth trends, and competitive offerings.
  • Insights: Provides a broader understanding of the market, helping to position the new product strategically against competitors and align it with current trends.

Example 2: Brand Positioning

Primary Research: Brand Perception Analysis through Surveys

  • Objective: To understand how the brand is perceived by consumers and identify potential areas for repositioning.
  • Process: Conducting surveys that ask consumers to describe the brand in their own words, rate it against various attributes, and compare it to competitors.
  • Insights: Direct feedback on brand strengths and weaknesses from the consumer's perspective, offering actionable data for adjusting brand messaging and positioning.

Secondary Research: Social Media Sentiment Analysis

  • Objective: To analyze public sentiment towards the brand and its competitors.
  • Process: Utilizing software tools to analyze mentions, hashtags, and discussions related to the brand and its competitors across social media platforms.
  • Insights: Offers an overview of public perception and emerging trends in consumer sentiment, which can validate findings from primary research or highlight areas needing further investigation.

Example 3: Market Expansion Strategy

Primary Research: Consumer Demand Studies in New Markets

  • Objective: To assess demand and consumer preferences in a new geographic market.
  • Process: Conducting surveys and interviews with potential consumers in the target market to understand their needs, preferences, and cultural nuances.
  • Insights: Provides specific insights into the new market’s consumer behavior, preferences, and potential barriers to entry, guiding market entry strategies.

Secondary Research: Economic and Demographic Analysis

  • Objective: To evaluate the economic viability and demographic appeal of the new market.
  • Process: Reviewing existing economic reports, demographic data, and industry trends relevant to the target market.
  • Insights: Offers a macro view of the market's potential, including economic conditions, demographic trends, and consumer spending patterns, which can complement insights gained from primary research.

By leveraging both primary and secondary research, marketers can form a comprehensive understanding of their market, consumers, and competitors, facilitating informed decision-making and strategic planning. Each method brings its strengths to the table, with primary research offering direct consumer insights and secondary research providing a broader context within which to interpret those insights.

What Are the Pros and Cons of Primary and Secondary Research?

When it comes to market research, both primary and secondary research offer unique advantages and face certain limitations. Understanding these can help researchers and businesses make informed decisions on which approach to utilize for their specific needs. Below is a comparative table highlighting the pros and cons of each research type.

Navigating the Pros and Cons

  • Balance Your Research Needs: Consider starting with secondary research to gain a broad understanding of the subject matter, then delve into primary research for specific, targeted insights that are tailored to your precise needs.
  • Resource Allocation: Evaluate your budget, time, and resource availability. Primary research can offer more specific and actionable data but requires more resources. Secondary research is more accessible but may lack the specificity or recency you need.
  • Quality and Relevance: Assess the quality and relevance of available secondary sources before deciding if primary research is necessary. Sometimes, the existing data might suffice, especially for preliminary market understanding or trend analysis.
  • Combining Both for Comprehensive Insights: Often, the most effective research strategy involves a combination of both primary and secondary research. This approach allows for a more comprehensive understanding of the market, leveraging the broad perspective provided by secondary sources and the depth and specificity of primary data.



Evidence Based Practice


Primary vs. Secondary Sources


Sources are considered primary, secondary, or tertiary depending on the originality of the information they present and how close they are to the original source of that information. This distinction can differ between subjects and disciplines.

In the sciences, research findings may be communicated informally between researchers through email, presented at conferences (primary source), and then, possibly, published as a journal article or technical report (primary source). Once published, the information may be commented on by other researchers (secondary sources), and/or professionally indexed in a database (secondary sources). Later the information may be summarized into an encyclopedic or reference book format (tertiary sources).

Primary Sources

A primary source in science is a document or record that reports on a study, experiment, trial or research project. Primary sources are usually written by the person(s) who did the research, conducted the study, or ran the experiment, and they typically include a hypothesis, methodology, and results.

Primary Sources include:

  • Pilot/prospective studies
  • Cohort studies
  • Survey research
  • Case studies
  • Lab notebooks
  • Clinical trials and randomized clinical trials/RCTs
  • Dissertations

Secondary Sources

Secondary sources list, summarize, compare, and evaluate primary information and studies in order to draw conclusions about, or present the current state of knowledge in, a discipline or subject. They may include a bibliography that directs you back to the primary research reported in the article.

Secondary Sources include:

  • Reviews, systematic reviews, meta-analyses
  • Newsletters and professional news sources
  • Practice guidelines & standards
  • Clinical care notes
  • Patient education information
  • Government & legal information
  • Entries in nursing or medical encyclopedias

More on Systematic Reviews and Meta-Analysis

Systematic reviews – Systematic reviews are best for answering single questions (e.g., the effectiveness of tight glucose control on microvascular complications of diabetes). They are more scientifically structured than traditional reviews, being explicit about how the authors attempted to find all relevant articles, judged the scientific quality of each study, and weighed evidence from multiple studies with conflicting results. These reviews pay particular attention to including all strong research, whether or not it has been published, to avoid publication bias (the tendency for studies with positive results to be published preferentially).

Meta-analysis – Meta-analysis, which is commonly included in systematic reviews, is a statistical method that quantitatively combines the results from different studies. It can be used to provide an overall estimate of the net benefit or harm of an intervention, even when these effects may not have been apparent in the individual studies [9]. Meta-analysis can also provide an overall quantitative estimate of other parameters such as diagnostic accuracy, incidence, or prevalence.
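As one concrete illustration of how a meta-analysis quantitatively combines results from different studies, the sketch below computes a fixed-effect, inverse-variance pooled estimate: each study's effect is weighted by 1/SE^2, so more precise studies count for more, and the pooled standard error shrinks as evidence accumulates. This is only one of several pooling models; random-effects methods, which allow the true effect to vary between studies, are often preferred when studies differ in population or design. The function names, effect sizes, and standard errors here are hypothetical.

```python
import math


def fixed_effect_pool(estimates, std_errors):
    """Inverse-variance (fixed-effect) pooled estimate and its standard error."""
    if not estimates or len(estimates) != len(std_errors):
        raise ValueError("need matching, non-empty lists")
    # Weight each study by the inverse of its variance: precise studies dominate
    weights = [1.0 / se ** 2 for se in std_errors]
    total = sum(weights)
    pooled = sum(w * y for w, y in zip(weights, estimates)) / total
    pooled_se = math.sqrt(1.0 / total)
    return pooled, pooled_se


def ci95(estimate, se):
    """Approximate 95% confidence interval using the normal quantile 1.96."""
    return estimate - 1.96 * se, estimate + 1.96 * se


# Hypothetical effect sizes (e.g. mean differences) and standard errors from three studies
effects = [0.30, 0.10, 0.25]
ses = [0.10, 0.20, 0.15]
est, se = fixed_effect_pool(effects, ses)
print(round(est, 3), round(se, 3), ci95(est, se))
```

The pooled standard error is smaller than any single study's, which is how an overall estimate can reveal an effect that was not apparent in the individual studies.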

  • Last Updated: Nov 9, 2023 12:14 PM
  • URL: https://libraryguides.nau.edu/evidencebasedpractice


EDUC 9300: Educational Research (Hirsch)


Primary Research vs. Secondary Research: Why does it matter? How do I tell which one I'm looking at?


Different databases may contain different types of sources. When you are conducting research, you need to consider which types are most likely to contain the information you need to answer your research question. Depending on the database, you might find books, magazine articles, trade publication articles, scholarly journal articles, documents, newspapers, videos, audio clips, images, et cetera. For the purposes of your Education Literature Review, we will focus on sources that are primary research studies. Before you can do that, you need to know the difference between primary and secondary research, how to tell which one you are looking at when you evaluate a source, and why the emphasis is placed on primary research studies.

Primary Research: This is research in which the author of the source you are using conducted some method of research to gather new data, which s/he then reports, analyzes and interprets in that source. Primary = original, first-hand; the author of the source generated the research data they are using.

Secondary Research: This is when an author of the source you are using gathers existing data, usually produced by someone else, and they then report, analyze or interpret that other person's data. Secondary = second-hand; the author of the source did not generate the research data s/he is using.

Tips when first evaluating a source: Once you find a research-based source, read the abstract and/or methodology section and ask yourself: who conducted the actual research process to gather the data?

  • If the author(s) indicate they gathered the data first-hand by surveying a specific population, creating and running an experiment, conducting a focus group, observing a specific population/task, et cetera, they are doing primary research.
  • If the author(s) indicate that they used only data gathered from other people's research studies by reviewing the literature or research, they are doing secondary research. 

Common sources where primary research in the field of Education is published are:

  • Scholarly/academic journals - these are journals that are published by academic publishers (colleges, universities, et cetera) and professional organizations.
  • Conference Proceedings
  • Dissertations
  • District, State or National Reports

Any of the above sources may also be peer-reviewed, meaning that the content is reviewed by other professionals in the field before it is published. Peer-reviewed scholarly journals are considered high quality and are often required in research assignments: they are produced by experts and professionals in the field, and their primary research articles go through a peer-review vetting process defined by the journal publisher. For the same reasons, this is also where professionals in the field tend to publish their own research. It is important to keep a few things in mind:

  • Not all scholarly/academic journals are peer-reviewed.
  • Not every article in a peer-reviewed journal is a primary research article or even a secondary research article. These journals often contain book/product reviews, opinion pieces, advice for practitioners, literature reviews, et cetera.
  • Books take longer to publish, which matters when you need current research/data. It can also be hard to determine what type of peer review, if any, a book has gone through.
  • Dissertations are by their very nature peer-reviewed, especially at the doctoral level, but can be hard to access as they are often only available at the university where the dissertation was written.
  • In Education, district/state/national reports may be peer-reviewed, but, as with books, this can be hard to determine.

Research Databases are a great tool for finding and accessing primary research articles published in scholarly/academic, peer-reviewed journals. In the field of Education, our library provides access to ERIC, Education Source, Proquest Education Database and Academic Search Ultimate. Each of these databases has search functions to help you narrow your results list to scholarly/academic, peer-reviewed journals.

Depending on your topic and research question you may need to explore research databases in other disciplines as well. Here are two examples:

  • Does gender impact bullying in middle school? In addition to Education databases, I might also want to look at psychology databases (from the aspect of gender and behavior) and criminal justice databases (from the perspective that bullying can be a crime, and to look at gender- and age-based statistics).
  • What impact does arts therapy have on children with behavioral issues in the classroom? In addition to Education databases, I might also want to look at psychology and medical databases (from the aspect of children with behavioral issues and arts therapy methods).
  • Last Updated: Oct 25, 2023 12:27 PM
  • URL: https://fitchburgstate.libguides.com/educ9300

Welcome to the Columbus State Community College Library!

The Research Process: 3a. Primary vs. Secondary


Primary Vs. Secondary Sources

When evaluating the quality of the information you are using, it is useful to identify whether you are using a primary or secondary source. By doing so, you will be able to recognize whether the author is reporting on his or her own first-hand experiences or relying on the views of others. Take a look at the descriptions below:

Primary vs. Secondary? The distinction between types of sources can get tricky, because a secondary source may also be a primary source. Garry Wills' book about Lincoln's Gettysburg Address, for example, can be looked at as both a secondary and a primary source. The distinction may depend on how you are using the source and the nature of your research. If you are researching Abraham Lincoln, the book would be a secondary source because Wills is offering his opinions about Lincoln and the Gettysburg Address. If your assignment is to critique Garry Wills' thesis or write a book review of Lincoln at Gettysburg, the book becomes a primary source, because you are commenting on, evaluating, and discussing Garry Wills' ideas.

You can't always determine whether something is primary or secondary just from the source it is found in. Articles in newspapers and magazines are usually considered secondary sources. However, if a story in a newspaper about the Iraq war is an eyewitness account, that would be a primary source. If the reporter, however, includes additional materials he or she has gathered through interviews or other investigations, the article would be a secondary source. An interview in Rolling Stone with Chris Robinson of the Black Crowes would be a primary source, but a review of the latest Black Crowes album would be a secondary source. Similarly, scholarly journals publish research articles that are primary sources, but they also publish review articles that are not.

To give you more to think about (and not to confuse you further), note that some experts add tertiary sources to primary and secondary ones. Tertiary sources provide a short overview or brief summary of a topic, often digesting other sources or repackaging ideas related to a specific topic. Chief examples are Wikipedia entries, encyclopedia articles, and textbook chapters. This is why you may be advised not to include an encyclopedia article in a final bibliography.

~ Adapted from Ithaca College Library, used with permission.

Primary Sources

Search the catalog to find primary source material for your topic. Try adding one of the keywords below:

  • correspondence
  • early works
  • manuscripts
  • personal narratives

  • Last Updated: May 28, 2024 8:09 AM
  • URL: https://library.cscc.edu/research-process


Dtsch Arztebl Int, v.106(15); 2009 Apr

Types of Study in Medical Research

Bernd Röhrig

1 MDK Rheinland-Pfalz, Referat Rehabilitation/Biometrie, Alzey

Jean-Baptist du Prel

2 Zentrum für Präventive Pädiatrie, Zentrum für Kinder- und Jugendmedizin, Mainz

Daniel Wachtlin

3 Interdisziplinäres Zentrum Klinische Studien (IZKS), Fachbereich Medizin der Universität Mainz

Maria Blettner

4 Institut für Medizinische Biometrie, Epidemiologie und Informatik (IMBEI), Johannes Gutenberg Universität Mainz

The choice of study type is an important aspect of the design of medical studies. The study design and consequent study type are major determinants of a study’s scientific quality and clinical value.

This article describes the structured classification of studies into two types, primary and secondary, as well as a further subclassification of studies of primary type. This is done on the basis of a selective literature search concerning study types in medical research, in addition to the authors’ own experience.

Three main areas of medical research can be distinguished by study type: basic (experimental), clinical, and epidemiological research. Clinical and epidemiological studies can be further subclassified as either interventional or noninterventional.

Conclusions

The study type that can best answer the particular research question at hand must be determined not only on a purely scientific basis, but also in view of the available financial resources, staffing, and practical feasibility (organization, medical prerequisites, number of patients, etc.).

The quality, reliability and possibility of publishing a study are decisively influenced by the selection of a proper study design. The study type is a component of the study design (see the article "Study Design in Medical Research") and must be specified before the study starts. The study type is determined by the question to be answered and decides how useful a scientific study is and how well it can be interpreted. If the wrong study type has been selected, this cannot be rectified once the study has started.

After an earlier publication dealing with aspects of study design, the present article deals with study types in primary and secondary research. The article focuses on study types in primary research. A separate article will be devoted to study types in secondary research, such as meta-analyses and reviews. This article covers the classification of individual study types. The conception, implementation, advantages, disadvantages and possibilities of using the different study types are illustrated by examples. The article is based on a selective literature search on study types in medical research, as well as the authors’ own experience.

Classification of study types

In principle, medical research is classified into primary and secondary research. While secondary research summarizes available studies in the form of reviews and meta-analyses, the actual studies are performed in primary research. Three main areas are distinguished: basic medical research, clinical research, and epidemiological research. In individual cases, it may be difficult to assign a study to one of these three main categories or to the subcategories. In the interests of clarity and to avoid excessive length, the authors will dispense with discussing special areas of research, such as health services research, quality assurance, or clinical epidemiology. Figure 1 gives an overview of the different study types in medical research.

[Figure 1. Classification of different study types]

*1, sometimes known as experimental research; *2, analogous term: interventional; *3, analogous term: noninterventional or nonexperimental

This scheme is intended to classify the study types as clearly as possible. In the interests of clarity, we have excluded clinical epidemiology — a subject which borders on both clinical and epidemiological research ( 3 ). The study types in this area can be found under clinical research and epidemiology.

Basic research

Basic medical research (otherwise known as experimental research) includes animal experiments, cell studies, biochemical, genetic and physiological investigations, and studies on the properties of drugs and materials. In almost all experiments, at least one independent variable is varied and the effects on the dependent variable are investigated. The procedure and the experimental design can be precisely specified and implemented ( 1 ). For example, the population, number of groups, case numbers, treatments and dosages can be exactly specified. It is also important that confounding factors should be specifically controlled or reduced. In experiments, specific hypotheses are investigated and causal statements are made. High internal validity (= unambiguity) is achieved by setting up standardized experimental conditions, with low variability in the units of observation (for example, cells, animals or materials). External validity is a more difficult issue. Laboratory conditions cannot always be directly transferred to normal clinical practice and processes in isolated cells or in animals are not equivalent to those in man (= generalizability) ( 2 ).

Basic research also includes the development and improvement of analytical procedures (such as the analytical determination of enzymes, markers or genes), imaging procedures (such as computed tomography or magnetic resonance imaging), and gene sequencing (such as establishing the link between eye color and specific gene sequences). The development of biometric procedures, such as statistical test procedures, modeling and statistical evaluation strategies, also belongs here.

Clinical studies

Clinical studies include both interventional (or experimental) studies and noninterventional (or observational) studies. A clinical drug study is an interventional clinical study, defined according to §4 Paragraph 23 of the Medicines Act [Arzneimittelgesetz; AMG] as "any study performed on man with the purpose of studying or demonstrating the clinical or pharmacological effects of drugs, to establish side effects, or to investigate absorption, distribution, metabolism or elimination, with the aim of providing clear evidence of the efficacy or safety of the drug."

Interventional studies also include studies on medical devices and studies in which surgical, physical or psychotherapeutic procedures are examined. In contrast to interventional studies, §4 Paragraph 23 of the AMG describes noninterventional studies as follows: "A noninterventional study is a study in the context of which knowledge from the treatment of persons with drugs in accordance with the instructions for use specified in their registration is analyzed using epidemiological methods. The diagnosis, treatment and monitoring are not performed according to a previously specified study protocol, but exclusively according to medical practice."

The aim of an interventional clinical study is to compare treatment procedures within a patient population, which should exhibit as few internal differences as possible apart from the treatment ( 4 , e1 ). This is achieved by appropriate measures, particularly by random allocation of the patients to the groups, thus avoiding bias in the result. Possible therapies include a drug, an operation, the therapeutic use of a medical device such as a stent, or physiotherapy, acupuncture, psychosocial intervention, rehabilitation measures, training or diet. Vaccine studies also count as interventional studies in Germany and are performed as clinical studies according to the AMG.

Interventional clinical studies are subject to a variety of legal and ethical requirements, including the Medicines Act and the Law on Medical Devices. Studies with medical devices must be registered with the responsible authorities, who must also approve studies with drugs. Drug studies also require a favorable ruling from the responsible ethics committee. A study must be performed in accordance with the binding rules of Good Clinical Practice (GCP) ( 5 , e2 – e4 ). For clinical studies on persons capable of giving consent, it is absolutely essential that the patient sign a declaration of consent (informed consent) ( e2 ). A control group is included in most clinical studies. This group receives another treatment regimen and/or placebo, a therapy without substantial efficacy. The selection of the control group must not only be ethically defensible, but also be suitable for answering the most important questions in the study ( e5 ).

Clinical studies should ideally include randomization, in which the patients are allocated by chance to the therapy arms. This procedure is performed with random numbers or computer algorithms ( 6 – 8 ). Randomization ensures that the patients will be allocated to the different groups in a balanced manner and that possible confounding factors—such as risk factors, comorbidities and genetic variabilities—will be distributed by chance between the groups (structural equivalence) ( 9 , 10 ). Randomization is intended to maximize homogeneity between the groups and prevent, for example, a specific therapy being reserved for patients with a particularly favorable prognosis (such as young patients in good physical condition) ( 11 ).
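The article does not prescribe a particular randomization algorithm. As one illustration, the minimal Python sketch below uses permuted-block randomization, a common approach that keeps group sizes balanced throughout recruitment while leaving the order within each block to chance. The arm labels, the block size of 4 and the seed are assumptions chosen for the example, not values from the article.

```python
import random

def block_randomize(n_patients, arms=("A", "B"), block_size=4, seed=None):
    """Allocate patients to treatment arms using permuted blocks.

    Every block contains each arm equally often, so the group sizes stay
    balanced during recruitment, while the order within a block is random.
    """
    if block_size % len(arms) != 0:
        raise ValueError("block_size must be a multiple of the number of arms")
    rng = random.Random(seed)  # dedicated generator; a fixed seed makes the list reproducible
    allocation = []
    while len(allocation) < n_patients:
        block = list(arms) * (block_size // len(arms))  # e.g. ['A', 'B', 'A', 'B']
        rng.shuffle(block)  # random order within the block
        allocation.extend(block)
    return allocation[:n_patients]  # the last block may be cut short

# Hypothetical example: 12 patients, two arms, blocks of 4.
schedule = block_randomize(12, seed=2009)
print(schedule)
print({arm: schedule.count(arm) for arm in ("A", "B")})  # → {'A': 6, 'B': 6}
```

Fixing the seed makes the allocation list reproducible for auditing; in practice such a list is generated and kept concealed from the people recruiting patients, so that the next allocation cannot be predicted.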

Blinding is another suitable method to avoid bias. A distinction is made between single and double blinding. With single blinding, the patient is unaware which treatment he is receiving, while, with double blinding, neither the patient nor the investigator knows which treatment is planned. Blinding the patient and investigator excludes possible subjective (even subconscious) influences on the evaluation of a specific therapy (e.g. drug administration versus placebo). Thus, double blinding ensures that the patient or therapy groups are both handled and observed in the same manner. The highest possible degree of blinding should always be selected. The study statistician should also remain blinded until the details of the evaluation have finally been specified.

A well-designed clinical study must also include case number planning (sample size calculation). This ensures that the assumed therapeutic effect can be recognized as such with a previously specified statistical probability (statistical power) ( 4 , 6 , 12 ).
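For the simplest case, a comparison of two group means with a two-sided test and equal group sizes, a common normal-approximation formula is n per group = 2 × (z(1 - α/2) + z(1 - β))² × σ² / Δ², where Δ is the difference to be detected, σ the common standard deviation, α the significance level and 1 - β the power. The sketch below assumes this formula; the article does not specify one, and real trials usually rely on a biometrician and dedicated software for more specific designs.

```python
from math import ceil
from statistics import NormalDist

def sample_size_per_group(delta, sigma, alpha=0.05, power=0.80):
    """Patients needed per group to detect a true mean difference `delta`
    with a two-sided test, equal group sizes and common standard deviation
    `sigma` (normal approximation)."""
    std_normal = NormalDist()  # mean 0, standard deviation 1
    z_alpha = std_normal.inv_cdf(1 - alpha / 2)  # critical value of the two-sided test
    z_beta = std_normal.inv_cdf(power)           # quantile for the desired power
    n = 2 * ((z_alpha + z_beta) * sigma / delta) ** 2
    return ceil(n)  # round up: patients come in whole numbers

# Hypothetical planning values: detect half a standard deviation with 80% power.
print(sample_size_per_group(delta=0.5, sigma=1.0))  # → 63
```

Halving the difference to be detected roughly quadruples the required number of patients, which is why the assumed therapeutic effect must be fixed before the study starts.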

It is important for the performance of a clinical trial that it be carefully planned and that the exact clinical details and methods be specified in the study protocol ( 13 ). It is also important that the implementation of the study according to the protocol, as well as data collection, be monitored. For a high-quality study, data quality must be ensured by double data entry, programmed plausibility tests, and evaluation by a biometrician. International recommendations for the reporting of randomized clinical studies can be found in the CONSORT statement (Consolidated Standards of Reporting Trials, www.consort-statement.org ) ( 14 ). Many journals make this an essential condition for publication.
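Double data entry and plausibility tests can be illustrated with a minimal sketch: two independent entry passes are compared field by field, and each value is checked against a predefined range. The record IDs, field names and plausible ranges below are hypothetical; a real study would define its checks in the study protocol.

```python
def compare_double_entry(first_pass, second_pass):
    """Return (record_id, field, value_1, value_2) for every disagreement
    between two independent data-entry passes."""
    mismatches = []
    for record_id in sorted(first_pass.keys() | second_pass.keys()):
        rec1 = first_pass.get(record_id, {})
        rec2 = second_pass.get(record_id, {})
        for field in sorted(rec1.keys() | rec2.keys()):
            v1, v2 = rec1.get(field), rec2.get(field)
            if v1 != v2:
                mismatches.append((record_id, field, v1, v2))
    return mismatches

# Hypothetical plausibility ranges (inclusive lower and upper limits).
PLAUSIBLE_RANGES = {"age_years": (18, 100), "systolic_bp_mmhg": (60, 260)}

def plausibility_errors(records, ranges=PLAUSIBLE_RANGES):
    """Flag values that are missing or outside their plausible range."""
    errors = []
    for record_id, record in sorted(records.items()):
        for field, (low, high) in ranges.items():
            value = record.get(field)
            if value is None or not low <= value <= high:
                errors.append((record_id, field, value))
    return errors

entry_1 = {"P001": {"age_years": 54, "systolic_bp_mmhg": 142},
           "P002": {"age_years": 61, "systolic_bp_mmhg": 1380}}
entry_2 = {"P001": {"age_years": 45, "systolic_bp_mmhg": 142},
           "P002": {"age_years": 61, "systolic_bp_mmhg": 138}}

print(compare_double_entry(entry_1, entry_2))
# → [('P001', 'age_years', 54, 45), ('P002', 'systolic_bp_mmhg', 1380, 138)]
print(plausibility_errors(entry_1))
# → [('P002', 'systolic_bp_mmhg', 1380)]
```

Note that the transposed digits in P001 (54 versus 45) pass the range check and are caught only by the second entry pass, which is why the two checks complement each other.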

For all the methodological reasons mentioned above and for ethical reasons, the randomized controlled and blinded clinical trial with case number planning is accepted as the gold standard for testing the efficacy and safety of therapies or drugs ( 4 , e1 , 15 ).

In contrast, noninterventional clinical studies (NIS) are patient-related observational studies in which patients are given an individually specified therapy. The responsible physician specifies the therapy on the basis of the medical diagnosis and the patient’s wishes. NIS include noninterventional therapeutic studies, prognostic studies, observational drug studies, secondary data analyses, case series and single case analyses ( 13 , 16 ). Like interventional clinical studies, noninterventional therapy studies compare therapies; however, the treatment is given exclusively at the physician’s discretion. The evaluation is often retrospective. Prognostic studies examine the influence of prognostic factors (such as tumor stage, functional state, or body mass index) on the further course of a disease. Diagnostic studies are another class of observational studies, in which either the quality of a diagnostic method is compared to an established method (ideally a gold standard), or an investigator is compared with one or several other investigators (inter-rater comparison) or with himself at different time points (intra-rater comparison) ( e1 ). If an event is very rare (such as a rare disease or an individual course of treatment), a single-case study or a case series is an option. A case series is a study of a larger patient group with a specific disease. For example, after the discovery of the AIDS virus, the Centers for Disease Control (CDC) in the USA collected a case series of 1000 patients in order to study frequent complications of this infection. The lack of a control group is a disadvantage of case series; for this reason, case series are primarily used for descriptive purposes ( 3 ).
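The article does not name the statistics used to evaluate diagnostic studies. Two common choices, shown here as an assumption, are sensitivity and specificity when a new method is compared with a gold standard, and Cohen's kappa for inter-rater or intra-rater agreement. The data in the example are invented.

```python
def sensitivity_specificity(test_positive, disease_present):
    """Compare a new diagnostic test with a gold standard.

    Both arguments are equal-length lists of booleans, one entry per patient.
    """
    pairs = list(zip(test_positive, disease_present))
    true_pos = sum(1 for t, d in pairs if t and d)
    false_neg = sum(1 for t, d in pairs if not t and d)
    true_neg = sum(1 for t, d in pairs if not t and not d)
    false_pos = sum(1 for t, d in pairs if t and not d)
    sensitivity = true_pos / (true_pos + false_neg)  # share of diseased patients detected
    specificity = true_neg / (true_neg + false_pos)  # share of healthy patients correctly cleared
    return sensitivity, specificity

def cohens_kappa(ratings_1, ratings_2):
    """Chance-corrected agreement between two raters (inter-rater comparison)
    or one rater at two time points (intra-rater comparison)."""
    n = len(ratings_1)
    observed = sum(1 for a, b in zip(ratings_1, ratings_2) if a == b) / n
    categories = set(ratings_1) | set(ratings_2)
    # Agreement expected by chance, from each rating series' category frequencies.
    expected = sum((ratings_1.count(c) / n) * (ratings_2.count(c) / n) for c in categories)
    if expected == 1:
        raise ValueError("kappa is undefined when both ratings use a single category")
    return (observed - expected) / (1 - expected)

new_test = [True, True, False, False, True, False, False, False]
gold_std = [True, True, True, False, False, False, False, False]
print(sensitivity_specificity(new_test, gold_std))  # → (0.6666666666666666, 0.8)

rater_a = ["abnormal", "abnormal", "normal", "normal"]
rater_b = ["abnormal", "normal", "normal", "normal"]
print(cohens_kappa(rater_a, rater_b))  # → 0.5
```

The two raters agree on three of four findings (75%), but because half of that agreement would be expected by chance alone, kappa is only 0.5.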

Epidemiological studies

The main point of interest in epidemiological studies is to investigate the distribution and historical changes in the frequency of diseases and the causes for these. Analogously to clinical studies, a distinction is made between experimental and observational epidemiological studies ( 16 , 17 ).

Interventional studies are experimental in character and are further subdivided into field studies (sample from an area, such as a large region or a country) and group studies (sample from a specific group, such as a specific social or ethnic group). One example was the investigation of the iodine supplementation of cooking salt to prevent cretinism in a region with iodine deficiency. On the other hand, many interventions are unsuitable for randomized intervention studies, for ethical, social or political reasons, as the exposure may be harmful to the subjects ( 17 ).

Observational epidemiological studies can be further subdivided into cohort studies (follow-up studies), case control studies, cross-sectional studies (prevalence studies), and ecological studies (correlation studies or studies with aggregated data).

In contrast, studies with only descriptive evaluation are restricted to a simple depiction of the frequency (incidence and prevalence) and distribution of a disease within a population. The objective of the description may also be the regular recording of information (monitoring, surveillance). Registry data are also suited for the description of prevalence and incidence; for example, they are used for national health reports in Germany.

In the simplest case, cohort studies involve the observation of two healthy groups of subjects over time. One group is exposed to a specific substance (for example, workers in a chemical factory) and the other is not exposed. It is recorded prospectively (into the future) how often a specific disease (such as lung cancer) occurs in the two groups ( figure 2a ). The incidence of the disease can be determined for both groups. Moreover, the relative risk (the ratio of the incidence rates) is a very important statistical parameter which can be calculated in cohort studies. For rare types of exposure, the general population can be used as controls ( e6 ). All evaluations naturally consider the age and gender distributions in the corresponding cohorts. The objective of cohort studies is to record detailed information on the exposure and on confounding factors, such as the duration of employment and the maximum and cumulative exposure. One well-known cohort study is the British Doctors Study, which prospectively examined the effect of smoking on mortality among British doctors over a period of decades ( e7 ). Cohort studies are well suited for detecting causal connections between exposure and the development of disease. On the other hand, cohort studies often demand a great deal of time, organization, and money. So-called historical cohort studies represent a special case: all data on exposure and effect (illness) are already available at the start of the study and are analyzed retrospectively. Studies of this sort are used, for example, to investigate occupational forms of cancer. Historical cohort studies are usually cheaper than prospective ones ( 16 ).
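As a minimal sketch of the calculation, assume a closed cohort in which every subject is followed for the same period, so that incidence can be expressed as the proportion of subjects who fall ill (cumulative incidence); with unequal follow-up, incidence rates per person-time would be used instead. The relative risk is then the ratio of the two incidences. The numbers are hypothetical, and the sketch ignores the age and gender adjustment described above.

```python
def cumulative_incidence(new_cases, persons_at_risk):
    """Share of initially healthy persons who develop the disease during follow-up."""
    return new_cases / persons_at_risk

def relative_risk(cases_exposed, n_exposed, cases_unexposed, n_unexposed):
    """Ratio of the incidence in the exposed group to that in the unexposed group."""
    return (cumulative_incidence(cases_exposed, n_exposed)
            / cumulative_incidence(cases_unexposed, n_unexposed))

# Hypothetical cohort: 1,000 exposed chemical workers and 2,000 unexposed persons.
print(cumulative_incidence(30, 1000), cumulative_incidence(20, 2000))  # → 0.03 0.01
print(round(relative_risk(30, 1000, 20, 2000), 2))  # → 3.0
```

A relative risk of 3 means that, in this invented cohort, the disease occurred three times as often among the exposed as among the unexposed.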

[Figure 2. Graphical depiction of a prospective cohort study (simplest case [2a]) and a retrospective case control study (2b)]

In case control studies, cases are compared with controls. Cases are persons who have the disease in question. Controls are persons who do not have the disease but are otherwise comparable to the cases. A retrospective analysis is performed to establish to what extent persons in the case and control groups were exposed ( figure 2b ). Possible exposure factors include smoking, nutrition and pollutant load. Care should be taken that the intensity and duration of the exposure are analyzed as carefully and in as much detail as possible. If ill people are observed to be exposed more often than healthy people, it may be concluded that there is a link between the illness and the risk factor. In case control studies, the most important statistical parameter is the odds ratio. Case control studies usually require less time and fewer resources than cohort studies ( 16 ). The disadvantage of case control studies is that the incidence rate (rate of new cases) cannot be calculated. There is also a great risk of bias from the selection of the study population ("selection bias") and from faulty recall ("recall bias") (see also the article "Avoiding Bias in Observational Studies"). Table 1 presents an overview of possible types of epidemiological study ( e8 ). Table 2 summarizes the advantages and disadvantages of observational studies ( 16 ).
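The odds ratio of a case control study can be sketched as follows. It compares the odds of exposure among cases with the odds of exposure among controls; because the investigator fixes how many cases and controls are recruited, no incidence can be derived from these counts. The counts are hypothetical.

```python
def odds_ratio(exposed_cases, unexposed_cases, exposed_controls, unexposed_controls):
    """Odds of exposure among cases divided by odds of exposure among controls.

    Computed as the cross-product of the 2x2 table, which is algebraically
    equal to (exposed_cases / unexposed_cases) / (exposed_controls / unexposed_controls).
    """
    return (exposed_cases * unexposed_controls) / (unexposed_cases * exposed_controls)

# Hypothetical study: 200 cases and 200 controls, exposure = smoking.
print(odds_ratio(exposed_cases=120, unexposed_cases=80,
                 exposed_controls=60, unexposed_controls=140))  # → 3.5
```

Here the odds of having smoked are 1.5 among cases and about 0.43 among controls, giving an odds ratio of 3.5; an odds ratio of 1 would indicate no association.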

1 = slight; 2 = moderate; 3 = high; N/A, not applicable.

*Individual cases may deviate from this pattern.

Selecting the correct study type is an important aspect of study design (see "Study Design in Medical Research" in volume 11/2009). However, the scientific questions can only be correctly answered if the study is planned and performed at a qualitatively high level ( e9 ). It is very important to consider or even eliminate possible interfering factors (or confounders), as otherwise the result cannot be adequately interpreted. Confounders are characteristics which influence the target parameters. Although this influence is not of primary interest, it can interfere with the connection between the target parameter and the factors that are of interest. The influence of confounders can be minimized or eliminated by standardizing the procedure, stratification ( 18 ), or adjustment ( 19 ).
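Stratification can be illustrated with the Mantel-Haenszel odds ratio, one standard way (chosen here as an example; the article does not name a method) to combine stratum-specific 2x2 tables into a single estimate adjusted for the stratifying confounder. In the invented data below, the odds ratio is 1 within each stratum of a hypothetical confounder (age group), yet collapsing the strata into one crude table suggests an association.

```python
# Each stratum is a 2x2 table (a, b, c, d):
# a = exposed cases, b = exposed controls, c = unexposed cases, d = unexposed controls.

def crude_odds_ratio(strata):
    """Odds ratio after collapsing all strata into one table (ignores the confounder)."""
    a, b, c, d = (sum(table[i] for table in strata) for i in range(4))
    return (a * d) / (b * c)

def mantel_haenszel_odds_ratio(strata):
    """Odds ratio adjusted for the stratifying variable (Mantel-Haenszel estimator)."""
    numerator = sum(a * d / (a + b + c + d) for a, b, c, d in strata)
    denominator = sum(b * c / (a + b + c + d) for a, b, c, d in strata)
    return numerator / denominator

# Invented data stratified by the hypothetical confounder.
younger = (80, 20, 40, 10)
older = (10, 40, 20, 80)
print(crude_odds_ratio([younger, older]))            # → 2.25
print(mantel_haenszel_odds_ratio([younger, older]))  # → 1.0
```

The crude odds ratio of 2.25 arises only because the younger stratum contains both more exposed persons and more cases; once the comparison is made within strata, the apparent association disappears.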

The decision as to which study type is suitable to answer a specific primary research question must be based not only on scientific considerations, but also on issues related to resources (personnel and finances), hospital capacity, and practicability. Many epidemiological studies can only be implemented if there is access to registry data. The demands for planning, implementation, and statistical evaluation should be just as high for observational studies as for experimental studies. There are particularly strict requirements, with legally based regulations (such as the Medicines Act and Good Clinical Practice), for the planning, implementation, and evaluation of clinical studies. A study protocol must be prepared for both interventional and noninterventional studies ( 6 , 13 ). The study protocol must contain information on the conditions, the question to be answered (objective), the methods of measurement, the implementation, organization, study population, data management, case number planning, the biometric evaluation, and the clinical relevance of the question to be answered ( 13 ).
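The required protocol contents listed above can be turned into a simple completeness check for a draft protocol. This is a hypothetical sketch: the section names are paraphrased from the text, and a real protocol template would follow the applicable regulations and guidelines.

```python
# Required protocol contents, paraphrased from the list in the text.
REQUIRED_PROTOCOL_SECTIONS = (
    "conditions",
    "objective",
    "methods of measurement",
    "implementation",
    "organization",
    "study population",
    "data management",
    "case number planning",
    "biometric evaluation",
    "clinical relevance",
)

def missing_protocol_sections(draft_sections):
    """Return the required sections that a draft protocol does not yet contain."""
    present = {section.strip().lower() for section in draft_sections}
    return [s for s in REQUIRED_PROTOCOL_SECTIONS if s not in present]

# Hypothetical draft with only four sections written so far.
draft = ["Objective", "Study population", "Methods of measurement", "Data management"]
print(missing_protocol_sections(draft))
# → ['conditions', 'implementation', 'organization', 'case number planning', 'biometric evaluation', 'clinical relevance']
```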

Important and justified ethical considerations may restrict studies with optimal scientific and statistical features. A randomized intervention study under strictly controlled conditions of the effect of exposure to harmful factors (such as smoking, radiation, or a fatty diet) is not possible and not permissible for ethical reasons. Observational studies are a possible alternative to interventional studies, even though observational studies are less reliable and less easy to control ( 17 ).

A medical study should always be published in a peer reviewed journal. Depending on the study type, there are recommendations and checklists for presenting the results. For example, these may include a description of the population, the procedure for missing values and confounders, and information on statistical parameters. Recommendations and guidelines are available for clinical studies ( 14 , 20 , e10 , e11 ), for diagnostic studies ( 21 , 22 , e12 ), and for epidemiological studies ( 23 , e13 ). Since 2004, the WHO has demanded that studies should be registered in a public registry, such as www.controlled-trials.com or www.clinicaltrials.gov . This demand is supported by the International Committee of Medical Journal Editors (ICMJE) ( 24 ), which specifies that the registration of the study before inclusion of the first subject is an essential condition for the publication of the study results ( e14 ).

When specifying the study type and study design for medical studies, it is essential to collaborate with an experienced biometrician. The quality and reliability of the study can be decisively improved if all important details are planned together ( 12 , 25 ).

Acknowledgments

Translated from the original German by Rodney A. Yeates, M.A., Ph.D.

Conflict of interest statement

The authors declare that there is no conflict of interest in the sense of the International Committee of Medical Journal Editors.

Primary vs. Secondary Sources for Scientific Research


Primary and Secondary Sources

Primary Sources in the Sciences

What's a primary source in the sciences?

Primary sources in the sciences (and many social sciences) report original research, ideas, or scientific discoveries for the first time. Primary sources in the sciences may also be referred to as primary research, primary articles, or research studies. Examples include research studies, scientific experiments, papers and proceedings from scientific conferences or meetings, dissertations and theses, and technical reports.

The following are some characteristics of scientific primary sources:

  • They report results/findings/data from experiments or research studies.
  • They do not include meta-analyses, systematic reviews, or literature reviews. These are secondary sources.
  • They are frequently found in peer-reviewed or scholarly journals.
  • They should explain the research methodology used and frequently include methods, results, and discussion sections.
  • They are factual, not interpretive.

How do I find primary sources in the sciences?

A good place to start your search is in a subject-specific database. Many of these databases include options to narrow your search by source type. Not sure which database to use? Check out our Database A-Z List (use the dropdown menu to filter by subject).

Information adapted from Binghamton University Library

When searching for biomedical literature, you will find two types of articles: primary and secondary. Primary sources include articles that describe original research. Secondary sources analyze and interpret primary research.


Adapted from Regis University Library

  • Last Updated: Aug 11, 2023 11:49 AM
  • URL: https://clt.library.jwu.edu/c.php?g=1335128

JWU-Charlotte Library:

801 West Trade Street, Charlotte, North Carolina 28202

980 598-1611


Finding Primary Research Articles in the Sciences: Primary vs. Secondary


Primary Sources in the Sciences

Primary sources in the sciences are a little different from primary sources in history, the humanities, or the social sciences.

In the sciences, the focus is on the research. Primary sources are ones written by the scientists who performed the experiments - these articles include original research data. 

So how can you tell if a science article is a primary source?

Primary research articles will usually include sections about:

  • abstract - summarizes paper in one paragraph, states purpose of study
  • methodology - explaining how the experiment was conducted
  • results - detailing what happened and providing raw data sets (often as tables or graphs)
  • conclusions - connecting the results with theories and other research
  • references - to previous research or theories that influenced the research
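The section headings above, together with the names of common review types (systematic review, meta-analysis, literature review), suggest a rough screening heuristic for database abstracts. The sketch below is hypothetical: the cue words are assumptions, review cues are checked first because review abstracts also contain words such as "methods" and "results", and the result is only a hint to be confirmed by reading the article.

```python
# Hypothetical cue words; they are rough signals, not rules.
SECONDARY_CUES = ("systematic review", "meta-analysis", "literature review",
                  "narrative review", "integrative review")
PRIMARY_CUES = ("methods", "methodology", "participants", "we recruited",
                "randomized", "were enrolled", "results")

def guess_article_type(abstract):
    """Return a tentative label and the cue words that triggered it."""
    text = abstract.lower()
    # Check review cues first: review abstracts also mention "methods" and "results".
    secondary_hits = [cue for cue in SECONDARY_CUES if cue in text]
    if secondary_hits:
        return "probably secondary", secondary_hits
    primary_hits = [cue for cue in PRIMARY_CUES if cue in text]
    if primary_hits:
        return "probably primary", primary_hits
    return "unclear: read the full text", []

print(guess_article_type(
    "Methods: We recruited 120 participants and randomized them to two groups."))
# → ('probably primary', ['methods', 'participants', 'we recruited', 'randomized'])
print(guess_article_type(
    "We conducted a systematic review and meta-analysis of 14 trials."))
# → ('probably secondary', ['systematic review', 'meta-analysis'])
```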

Primary Sources: Research Article

[Image: a sample of what a primary research article might look like]

Secondary Sources

Secondary sources interpret and analyze the original event documented in a primary source. 

  • Last Updated: May 21, 2024 2:51 PM
  • URL: https://libguides.polk.edu/primaryresearch

Polk State College is committed to equal access/equal opportunity in its programs, activities, and employment. For additional information, visit polk.edu/compliance .


Peer-Reviewed Literature: Peer-Reviewed Research: Primary vs. Secondary


Peer Reviewed Research

Published literature can be either peer-reviewed or non-peer-reviewed. Official research reports are almost always peer-reviewed, while a journal's other content usually is not. In the health sciences, official research can be primary, secondary, or even tertiary: an original experiment or investigation (primary), an analysis or evaluation of primary research (secondary), or findings that compile secondary research (tertiary). If you are doing research yourself, primary or secondary sources can reveal more in-depth information.

Primary Research

Primary research is information presented in its original form without interpretation by other researchers. While it may acknowledge previous studies or sources, it always presents original thinking, reports on discoveries, or new information about a topic.

Primary research in the health sciences includes both experimental trials and observational studies, in which subjects may be tested for outcomes or investigated to gain relevant insight. Randomized controlled trials are the most prominent experimental design because randomizing subjects provides the most compelling evidence for the effectiveness of an intervention. See the Research Design presentation for further information on primary research studies.


  • Research Design

Secondary Research

Secondary research is an account of original events or facts. It is secondary to, and retrospective of, the actual findings from an experiment or trial. These studies may be appraised summaries, reviews, or interpretations of primary sources and are often written by authors other than the original researcher(s). In the health sciences, meta-analyses and systematic reviews are the most frequent types of secondary research.

  • A meta-analysis is a quantitative method of combining the results of primary research. By pooling the relevant data and statistical findings from experimental trials or observational studies, it can produce more precise estimates of effect for specific health questions.
  • A systematic review is a summary of research that addresses a focused clinical question in a systematic, reproducible manner. In order to provide the single best estimate of effect in clinical decision making, primary research studies are pooled together and then filtered through an inclusion/exclusion process. The relevant data and findings are then compiled and synthesized to arrive at a more accurate conclusion about a specific health topic. Only peer-reviewed publications are used and analyzed in a methodology which may or may not include a meta-analysis.
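To make "quantitatively combining results" more concrete, here is a minimal Python sketch of one common pooling technique, the inverse-variance fixed-effect model. It is an illustrative assumption, not the method of any particular review: real meta-analyses often use random-effects models and dedicated statistical software. The function name `fixed_effect_pool` and the example values are hypothetical.

```python
import math


def fixed_effect_pool(estimates, std_errors, z=1.96):
    """Pool study effect estimates with inverse-variance (fixed-effect) weights.

    estimates  -- per-study effect sizes on an additive scale (e.g. log odds ratios)
    std_errors -- matching per-study standard errors
    z          -- critical value for the confidence interval (1.96 ~ 95%)
    """
    if not estimates or len(estimates) != len(std_errors):
        raise ValueError("need one standard error for each estimate")
    # Each study is weighted by the inverse of its variance, so more
    # precise studies (smaller standard errors) count for more.
    weights = [1.0 / se ** 2 for se in std_errors]
    total_weight = sum(weights)
    pooled = sum(w * y for w, y in zip(weights, estimates)) / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)
    ci = (pooled - z * pooled_se, pooled + z * pooled_se)
    return pooled, pooled_se, ci


# Hypothetical example: three studies reporting log odds ratios.
pooled, se, ci = fixed_effect_pool([0.20, 0.35, 0.10], [0.10, 0.15, 0.20])
print(f"pooled={pooled:.3f}, se={se:.3f}, 95% CI=({ci[0]:.3f}, {ci[1]:.3f})")
```

Because the weights are 1/SE², the study with the smallest standard error pulls the pooled estimate hardest, and the pooled standard error is smaller than any single study's. A random-effects model would add a between-study variance term to each weight.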


  • Last Updated: Sep 29, 2023 10:05 AM
  • URL: https://ttuhsc.libguides.com/PeerReview




Primary Vs Secondary Research


Primary and secondary research are two different types of research methods used to gather information for a study or research project.

Primary Research

Primary Research involves collecting original data for a specific research purpose. This type of research is designed to answer specific research questions and is often conducted through methods such as surveys, interviews, focus groups, or experiments. Primary research is time-consuming and requires careful planning and execution to ensure that the data collected is valid and reliable. However, it provides researchers with first-hand information that is relevant to their specific research questions and can be tailored to their specific needs.

Secondary Research

Secondary research involves gathering data that has already been collected by someone else. This type of research can be conducted through various sources, such as academic journals, books, government reports, and online databases. Secondary research is less time-consuming and less expensive than primary research, as the data has already been collected and analyzed. However, the data may not be specific to the researcher’s research questions or may be outdated. Therefore, it is essential to evaluate the quality and relevance of the data collected through secondary research carefully.

Difference Between Primary and Secondary Research

Here are some key differences between primary and secondary research:


About the author


Muhammad Hassan

Researcher, Academic Writer, Web developer


Primary vs Secondary Research – A Guide with Examples

Published by Alvin Nicolas on August 16, 2021; revised on August 29, 2023

Introduction

Primary research or secondary research? How do you decide which is best for your dissertation paper?

As researchers, we need to be aware of the pros and cons of the two types of research methods to make sure that our selected method is the most appropriate for the topic of investigation.

The success of any dissertation paper largely depends on choosing the correct research design. Before you decide whether to base your research strategy on primary or secondary research, it is important to understand the difference between primary sources and secondary sources.

What is the Difference between Primary Sources and Secondary Sources?

What are Primary Sources?

According to UCL libraries, primary sources are articles, images, or documents that provide direct evidence or first-hand testimony about any given research topic.

It is important to have a clear understanding of the information resulting from the actions under investigation. Primary sources allow us to get close to those events and to see how they are analysed and interpreted in scientific and academic communities.

Examples of Primary Sources

Classic examples of primary sources include:

  • Original documents prepared by the researcher investigating a given research topic.
  • News reports written by reporters who witnessed an event.
  • Data collected through surveys, such as primary elections and population censuses.
  • Interviews, speeches, letters, and diaries – what the participants wrote or said during data collection.
  • Audio, video, and image files created to capture an event.

What are Secondary Sources?

When a researcher analyses and interprets information arising from events or actions that have already occurred, their work is regarded as a secondary source.

In essence, no secondary source can be created without using primary sources. The same information source or evidence can be considered either primary or secondary, depending on who is presenting the information and where the information is presented.

Examples of Secondary Sources

Some examples of secondary sources are:

  • Documentaries (even though the documentary maker treats the images, videos, and audio used as primary sources)
  • Articles, publications, journals, and research documents created by those not directly involved in the research.
  • Dissertations, theses, and essays.
  • Critical reviews.
  • Books presented as evidence.


What Type of Research you Should Base your Dissertation on – Primary or Secondary?

Below you will find detailed guidelines to help you make an informed decision if you have been asking yourself, “Should I use primary or secondary research in my dissertation?”


Primary Research

Primary research involves an exhaustive analysis of data to answer research questions that are specific and exploratory in nature.

Primary research methods use tools such as interviews, research surveys, numerical data, observations, audio, video, and images to collect data directly rather than relying on existing literature.

Business organisations throughout the world have their employees or an external research agency conduct primary research on their behalf to address certain issues. Undergraduate and postgraduate students, on the other hand, conduct primary research as part of their dissertation projects to fill a research gap in their respective fields of study.

As indicated above, primary data can be collected in a number of ways, and we have also conducted in-depth research on the most common independent primary data collection techniques.

Sampling in Primary Research

When conducting primary research, it is vitally important to pay attention to the chosen sampling method, which can be described as “a specific principle used to select members of the population to participate in the research”.

Often the researcher cannot work directly with the whole target population because of its size, so statistical sampling techniques become indispensable: the researcher draws conclusions from responses collected from a representative sample.

Population vs sample

The process of sampling in primary data collection includes the following five steps:

  • Identifying the target population.
  • Selecting an appropriate sampling frame.
  • Determining the sample size.
  • Choosing a sampling method.
  • Practical application of the selected sampling technique.
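The five sampling steps can be sketched in Python under simple assumptions: simple random sampling as the chosen method, and Cochran's sample-size formula with a finite population correction (95% confidence, ±5% margin of error, maximum variability p = 0.5) for determining the sample size. Both choices, and the function names `required_sample_size` and `simple_random_sample`, are illustrative; the text above does not prescribe a particular formula or method.

```python
import math
import random


def required_sample_size(population_size, z=1.96, p=0.5, margin=0.05):
    """Cochran's formula with a finite population correction (assumed choice)."""
    n0 = (z ** 2) * p * (1 - p) / margin ** 2  # size needed for a very large population
    n = n0 / (1 + (n0 - 1) / population_size)  # shrink it for a finite population
    return math.ceil(n)


def simple_random_sample(sampling_frame, sample_size, seed=None):
    """Draw a simple random sample without replacement from the sampling frame."""
    if sample_size > len(sampling_frame):
        raise ValueError("sample size cannot exceed the sampling frame")
    rng = random.Random(seed)  # seeded generator so the draw can be reproduced
    return rng.sample(sampling_frame, sample_size)


# Steps 1-2: target population and sampling frame (hypothetical respondent IDs).
sampling_frame = [f"respondent-{i}" for i in range(1, 1001)]
# Step 3: determine the sample size.
n = required_sample_size(len(sampling_frame))
# Steps 4-5: choose simple random sampling and apply it.
sample = simple_random_sample(sampling_frame, n, seed=2024)
print(n, sample[:3])
```

For a sampling frame of 1,000 people these assumptions give a sample of 278; for a very large population the same settings approach the familiar figure of 385. Stratified or cluster sampling would replace the final draw but follow the same five steps.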

When conducting primary research, the researcher gathers not only verbal responses but also nonverbal communication and gestures, which play a considerable role. These cues help the researcher identify hidden elements that cannot be identified through secondary research.


Reasons Why you Should Use Primary Research

  • As stated previously, the most prominent advantage of primary research over secondary research is that the researcher is able to directly collect the data from the respondents which makes the data more authentic and reliable.
  • Primary research has room for customisation based on the personal requirements and/or limitations of the researcher.
  • Primary research allows for a comprehensive analysis of the subject matter to address the problem at hand.
  • The researcher decides how to collect and use the data, which means they can use it in whatever way they deem fit to gain meaningful insights.
  • The results obtained from primary research are recognised as credible throughout academic and scientific communities.

Reasons Why you Should not Use Primary Research

  • If you are considering primary research for your dissertation, you need to be aware of the high costs involved in gathering primary data. Undergraduate and Masters’ students often do not have the financial resources to fund their own research work, and Ph.D. students are usually given a very limited research budget. Thus, if you are on a low or limited budget, primary research might not be the most suitable option.
  • Primary research can be extremely time-consuming. Getting your target population to participate in online surveys and face-to-face or telephone interviews requires patience and a lot of time. This is especially important for undergraduate and Masters’ students, who must complete and submit their work within a set timeframe.
  • Primary research is well recognised only when it uses several methods of data collection. Relying on a single primary research method will weaken your research, but using more than one method means you will need more time and financial resources.
  • Some participants may be unwilling to disclose their information, so this aspect is crucial and should be considered carefully.

One important aspect of primary research that researchers should look into is research ethics. Keeping participants’ information confidential is a research responsibility that should never be overlooked.


Secondary Research

Secondary research, or desk-based research, is the second type of research on which you could base the research methodology in a dissertation. This type of research reviews and analyses existing research studies to improve the overall authenticity of the research.

Secondary research methods include the use of secondary sources of information including journal articles, published reports, public libraries, books, data available on the internet, government publications, and results from primary research studies conducted by other researchers in the past.

Unlike primary research, secondary research is cost-effective and less time-consuming simply because it uses existing literature and doesn’t require the researcher to spend time and financial resources to collect first-hand data.

Not all researchers and/or business organisations are able to afford a significant amount of money towards research, and that’s one of the reasons this type of research is the most popular in universities and organisations.

The Steps for Conducting Secondary Research

Secondary research involves the following five steps:

  • Establishing the topic of research and setting up the research questions to be answered or the research hypothesis to be tested.
  • Identifying authentic and reliable sources of information.
  • Gathering data relevant to the topic of research from various secondary sources such as books, journal articles, government publications, and commercial sector reports.
  • Combining the data in a suitable format so you can gain meaningful insights.
  • Analysing the data to find a solution to the problem at hand.

Reasons Why you Should Use Secondary Research

  • Secondary sources are readily available with researchers facing little to no difficulty in accessing secondary data. Unlike primary data that involves a lengthy and complex process, secondary data can be collected by the researcher through a number of existing sources without having to leave the comfort of the desk.
  • Secondary research is a simple process, and therefore the cost associated with it is almost negligible.

Reasons Why you Should Not Use Secondary Research

  • Finding authentic and credible sources of secondary data is a real challenge. The internet these days is full of false information, so it is important to exercise caution when selecting and evaluating the available information.
  • Secondary sources may not provide accurate and/or up-to-date numbers, so your research could be weakened if it does not include accurate statistics from recent timelines.
  • Secondary research, in essence, depends on primary research and derives its findings from sets of primary data. The reliability of secondary research will therefore, to a certain degree, depend on the quality of the primary data used.

If you aren’t sure about the correct research method for your dissertation paper, you should get help from an expert who can advise you on whether to use primary or secondary research.


Key Differences between Primary and Secondary Research

Conclusion: Should I Use Primary or Secondary Research for my Dissertation Paper?

When choosing between primary and secondary research, you should always take into consideration the advantages and disadvantages of both types of research so you make an informed decision.

The best way to select the correct research strategy for your dissertation is to look at your research topic, research questions, aim and objectives, and of course the available time and financial resources.

The discussion of the two research techniques indicates that primary research should be chosen when a specific topic, case, organisation, etc. is to be researched and the researcher has access to some financial resources.

Secondary research, on the other hand, should be considered when the research is general in nature and the questions can be answered by analysing past research and published data.


Frequently Asked Questions

What is the difference between primary and secondary research?

Primary research involves collecting firsthand data from sources like surveys or interviews. Secondary research involves analyzing existing data, such as articles or reports. Primary is original data gathering, while secondary relies on existing information.


PSYC 201: Research Methods: Primary vs. Secondary articles


Article Type Chart

Use this chart to compare the different types of articles you may find in a library database.

  • Article Comparison Chart

Parts of a Primary / Empirical Article

All the details of a study are specified and usually described in sections with the headings Introduction, Materials and Methods, Results, and Discussion. See below for examples of what to look for:

[Screenshot: first part of a research article, with arrows pointing to the title, authors, introduction, and journal title]

[Screenshot: methods section of an article]

[Screenshot: data table from a research study]

[Screenshot: discussion section of the article]

Which article is the primary / empirical article?

Look at these two articles and determine which article is the primary / research article.  

Types of Scholarly Articles - VCU Libraries

VCU Libraries.  (2011, Jan 12).  Types of scholarly articles.   [Video].  YouTube.  https://www.youtube.com/watch?v=uEsAKqXSfbY&t=207s

Primary / Research / Empirical Articles


  • Last Updated: Apr 25, 2024 1:59 PM
  • URL: https://libguides.holycross.edu/psycresearch
  • Open access
  • Published: 01 June 2024

Biomarkers for personalised prevention of chronic diseases: a common protocol for three rapid scoping reviews

  • E Plans-Beriso (ORCID: orcid.org/0000-0002-9388-8744),
  • C Babb-de-Villiers,
  • D Petrova,
  • C Barahona-López,
  • P Diez-Echave,
  • O R Hernández,
  • N F Fernández-Martínez,
  • H Turner,
  • E García-Ovejero,
  • O Craciun,
  • P Fernández-Navarro,
  • N Fernández-Larrea,
  • E García-Esquinas,
  • V Jiménez-Planet,
  • V Moreno,
  • F Rodríguez-Artalejo,
  • M J Sánchez,
  • M Pollan-Santamaria,
  • L Blackburn,
  • M Kroese &
  • B Pérez-Gómez

Systematic Reviews, volume 13, Article number 147 (2024)


Introduction

Personalised prevention aims to delay or avoid the occurrence, progression, and recurrence of disease through targeted interventions that consider individual biological information (including genetic data), environmental and behavioural characteristics, and the socio-cultural context. This protocol summarises the main features of a rapid scoping review that will show the research landscape on biomarkers, or combinations of biomarkers, that may help to better identify subgroups of individuals with different risks of developing specific diseases, in which specific preventive strategies could have an impact on clinical outcomes.

This review is part of the “Personalised Prevention Roadmap for the future HEalThcare” (PROPHET) project, which seeks to highlight the gaps in current personalised preventive approaches, in order to develop a Strategic Research and Innovation Agenda for the European Union.

Objective

To systematically map and review the evidence of biomarkers that are available or under development in cancer, cardiovascular and neurodegenerative diseases that are or can be used for personalised prevention in the general population, in clinical or public health settings.

Methods

Three rapid scoping reviews are being conducted in parallel (February–June 2023), based on a common framework with some adjustments to suit each specific condition (cancer, cardiovascular or neurodegenerative diseases). Medline and Embase will be searched to identify publications between 2020 and 2023. To shorten the timeline, 10% of the papers will undergo screening by two reviewers, and only English-language papers will be considered. The following information will be extracted by two reviewers from all the publications selected for inclusion: source type, citation details, country, inclusion/exclusion criteria (population, concept, context, type of evidence source), study methods, and key findings relevant to the review question/s. The selection criteria and the extraction sheet will be pre-tested. Relevant biomarkers for risk prediction and stratification will be recorded. Results will be presented graphically using an evidence map.
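The abstract says results will be presented as an evidence map. One simple way to think about such a map is as a grid that counts included studies for each combination of two categories (here, disease group by biomarker type), with empty cells marking evidence gaps. The Python sketch below builds that grid from extracted records; the field names `disease` and `biomarker_type` and the example records are hypothetical, and the protocol does not specify how its maps are computed.

```python
from collections import Counter


def evidence_map(studies, row_key, col_key):
    """Count studies for every (row, column) category pair, filling gaps with 0."""
    counts = Counter((s[row_key], s[col_key]) for s in studies)
    rows = sorted({row for row, _ in counts})
    cols = sorted({col for _, col in counts})
    # Counter returns 0 for unseen pairs, so cells with no studies show as
    # 0: these are the evidence gaps.
    return {row: {col: counts[(row, col)] for col in cols} for row in rows}


# Hypothetical extraction records, one per included publication.
records = [
    {"disease": "cancer", "biomarker_type": "molecular"},
    {"disease": "cancer", "biomarker_type": "imaging"},
    {"disease": "CVD", "biomarker_type": "molecular"},
    {"disease": "cancer", "biomarker_type": "molecular"},
]

grid = evidence_map(records, "disease", "biomarker_type")
for disease, row in grid.items():
    print(disease, row)
```

A published evidence and gap map is usually drawn as a bubble chart or shaded table, but the underlying data can be a count matrix like this one.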

Inclusion criteria

Population: general adult populations or adults from specific pre-defined high-risk subgroups; concept: all studies focusing on molecular, cellular, physiological, or imaging biomarkers used for individualised primary or secondary prevention of the diseases of interest; context: clinical or public health settings.

Systematic review registration

https://doi.org/10.17605/OSF.IO/7JRWD (OSF registration DOI).


In recent years, innovative health research has moved quickly towards a new paradigm. The ability to analyse and process previously unseen sources and amounts of data, e.g. environmental, clinical, socio-demographic, epidemiological, and ‘omics-derived, has created opportunities in the understanding and prevention of chronic diseases, and in the development of targeted therapies that can cure them. This paradigm has come to be known as “personalised medicine”. According to the European Council Conclusion on personalised medicine for patients (2015/C 421/03), this term defines a medical model which involves characterisation of individuals’ genotypes, phenotypes and lifestyle and environmental exposures (e.g. molecular profiling, medical imaging, lifestyle and environmental data) for tailoring the right therapeutic strategy for the right person at the right time, and/or to determine the predisposition to disease and/or to deliver timely and targeted prevention [ 1 , 2 ]. In many cases, these personalised health strategies have been based on advances in fields such as molecular biology, genetic engineering, bioinformatics, diagnostic imaging and new ‘omics technologies, which have made it possible to identify biomarkers that have been used to design and adapt therapies to specific patients or groups of patients [ 2 ]. A biomarker is defined as a substance, structure, characteristic, or process that can be objectively quantified as an indicator of typical biological functions, disease processes, or biological reactions to exposure [ 3 , 4 ].

Adopting a public health perspective within this framework, one of the most relevant areas that would benefit from these new opportunities is the personalisation of disease prevention. Personalised prevention aims to delay or avoid the occurrence, progression and recurrence of disease by adopting targeted interventions that take into account biological information, environmental and behavioural characteristics, and the socio-economic and cultural context of individuals. These interventions should be timely, effective and equitable in order to maintain the best possible balance in lifetime health trajectory [ 5 ].

Among the main diseases that merit specific attention are chronic noncommunicable diseases, due to their incidence, mortality, or disability-adjusted life years [ 6 , 7 , 8 , 9 ]. Within the European Union (EU), in 2021, one-third of adults reported suffering from a chronic condition [ 10 ]. In addition, in 2019, the leading causes of mortality were cardiovascular disease (CVD) (35%), cancer (26%), respiratory disease (8%), and Alzheimer's disease (5%) [ 11 ]. For these reasons, in 2019, the PRECeDI consortium recommended the identification of biomarkers that could be used for the prevention of chronic diseases to integrate personalised medicine in the field of chronicity. This will support the goal of stratifying populations by indicating an individual’s risk of or resistance to disease and their potential response to drugs, guiding primary, secondary and tertiary preventive interventions [ 12 ]. Here, primary prevention is understood as measures taken to prevent a disease before it occurs, secondary prevention as actions aimed at early detection, and tertiary prevention as interventions to prevent complications and improve quality of life in individuals already affected by a disease [ 4 ].

The “Personalised Prevention roadmap for the future HEalThcare” (PROPHET) project, funded by the European Union’s Horizon Europe research and innovation program and linked to ICPerMed, seeks to assess the effectiveness, clinical utility, and existing gaps in current personalised preventive approaches, as well as their potential to be implemented in healthcare settings. It also aims to develop a Strategic Research and Innovation Agenda (SRIA) for the European Union. This protocol corresponds to one of the first steps in PROPHET, namely a review that aims to map the evidence and highlight evidence gaps in research on, or the use of, biomarkers in personalised prevention in the general adult population, as well as their integration with digital technologies, including wearable devices, accelerometers, and other appliances used to measure physical and physiological functions. These biomarkers may already be available or currently under development in the fields of cancer, CVD, and neurodegenerative diseases.

There is already a significant body of knowledge about primary and secondary prevention strategies for these diseases. For example, hypercholesterolemia or dyslipidaemia, hypertension, smoking, diabetes mellitus and obesity or levels of physical activity are known risk factors for CVD [ 6 , 13 ] and neurodegenerative diseases [ 14 , 15 , 16 ]; for cancer, a summary of lifestyle preventive actions with good evidence is included in the European code against cancer [ 17 ]. The question is whether there is any biomarker or combination of biomarkers that can help to better identify subgroups of individuals with different risks of developing a particular disease, in which specific preventive strategies could have an impact on clinical outcomes. Our aim in this context is to show the available research in this field.

Given the context and time constraints, the rapid scoping review design is the most appropriate method for providing landscape knowledge [ 18 ] and summary maps, such as the Campbell evidence and gap map [ 19 ]. Here, we present the protocol that will be used to elaborate three rapid scoping reviews and evidence maps of research on biomarkers investigated in relation to primary or secondary prevention of cancer, cardiovascular and neurodegenerative diseases, respectively. The results of these three rapid scoping reviews will help inform the development of the PROPHET SRIA, which will guide future EU research policy in this field.

Review question

What biomarkers are being investigated in the context of personalised primary and secondary prevention of cancer, CVD and neurodegenerative diseases in the general adult population in clinical or public health settings?

Three rapid scoping reviews are being conducted between February and June 2023, in parallel, one for each disease group included (cancer, CVD and neurodegenerative diseases), using a common framework and specifying the adaptations to each disease group in search terms, data extraction and representation of results.

This research protocol, designed according to the Joanna Briggs Institute (JBI) and Preferred Reporting Items for Systematic Reviews and Meta-Analyses extension for Scoping Reviews (PRISMA-ScR) Checklist [ 20 , 21 , 22 ], was uploaded to the Open Science Framework for public consultation [ 23 ], with registration DOI https://doi.org/10.17605/OSF.IO/7JRWD. The protocol was also reviewed by experts in the field, after which modifications were incorporated.

Eligibility criteria

Following the PCC (population, concept and context) model [ 21 , 22 ], the included studies will meet the following eligibility criteria (Table 1):

Rationale for performing a rapid scoping review

As explained above, these scoping reviews are intended to be one of the first materials produced in the PROPHET project, so that they can inform the first draft of the SRIA. Therefore, according to the planned timetable, the reviews should be completed in only 4 months. Thus, following recommendations from the Cochrane Rapid Review Methods Group [ 24 ] and taking into account the large number of records expected to be assessed, according to the preliminary searches, and in order to meet these deadlines, specific restrictions were defined for the search—limited to a 3-year period (2020–2023), in English only, and using only MEDLINE and EMBASE as possible sources—and it was decided that the title-abstract and full-text screening phase would be carried out by a single reviewer, after an initial training phase with 10% of the records assessed by two reviewers to ensure concordance between team members. This percentage could be increased if necessary.
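The protocol has two reviewers screen an initial 10% of records to ensure concordance, but it does not say how agreement is measured. As an assumption for illustration, the Python sketch below draws a 10% calibration subset and computes Cohen's kappa, a common chance-corrected agreement statistic, on two reviewers' include/exclude decisions. The function names and the example decisions are hypothetical.

```python
import math
import random
from collections import Counter


def calibration_subset(records, percent=10, seed=None):
    """Randomly pick `percent`% of records (rounded up) for double screening."""
    k = math.ceil(len(records) * percent / 100)
    return random.Random(seed).sample(records, k)


def cohens_kappa(reviewer_a, reviewer_b):
    """Chance-corrected agreement between two reviewers' decisions."""
    if not reviewer_a or len(reviewer_a) != len(reviewer_b):
        raise ValueError("need two equally long, non-empty lists of decisions")
    n = len(reviewer_a)
    observed = sum(a == b for a, b in zip(reviewer_a, reviewer_b)) / n
    counts_a, counts_b = Counter(reviewer_a), Counter(reviewer_b)
    # Agreement expected by chance, from each reviewer's label frequencies.
    labels = counts_a.keys() | counts_b.keys()
    expected = sum(counts_a[c] * counts_b[c] for c in labels) / n ** 2
    if expected == 1:
        # Both reviewers used one identical label throughout; kappa is
        # undefined here, so this sketch treats it as perfect agreement.
        return 1.0
    return (observed - expected) / (1 - expected)


subset = calibration_subset([f"record-{i}" for i in range(1, 201)], seed=1)
a = ["inc", "inc", "exc", "exc", "inc", "exc", "exc", "exc", "inc", "exc"]
b = ["inc", "exc", "exc", "exc", "inc", "exc", "exc", "inc", "inc", "exc"]
print(len(subset), round(cohens_kappa(a, b), 3))
```

With the hypothetical decisions shown, the reviewers agree on 8 of 10 records, yet kappa is only about 0.58 because some of that agreement would be expected by chance. A team could use a pre-agreed kappa threshold to decide whether to raise the double-screened percentage, which the protocol allows.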

Rationale for population selection

These rapid scoping reviews are focused on the general adult population. In addition, they give attention to studies conducted among populations that present specific risk factors relevant to the selected diseases or that include these factors among those considered in the study.

For cancer, these risk (or preventive) factors include smoking [ 25 ], obesity [ 26 ], diabetes [ 27 , 28 , 29 ], Helicobacter pylori infection/colonisation [ 30 ], human papillomavirus (HPV) infection [ 30 ], human immunodeficiency virus (HIV) infection [ 30 ], alcohol consumption [ 31 ], liver cirrhosis and viral hepatitis (HBV, HCV, HDV) [ 32 ].

For CVD, the risk factors considered are hypercholesterolaemia or dyslipidaemia, arterial hypertension, smoking, diabetes mellitus, chronic kidney disease, hyperglycaemia and obesity [6, 13].

Risk groups for neurodegenerative diseases were defined on the basis of the following risk factors: obesity [15, 33], arterial hypertension [15, 33, 34, 35], diabetes mellitus [15, 33, 34, 35], dyslipidaemia [33], alcohol consumption [36, 37] and smoking [15, 16, 33, 34].

After the general search, only the relevant, disease-specific subpopulations will be retained for each disease. Pregnancy, by contrast, is an exclusion criterion, because the very particular characteristics of this population group would require a dedicated review.

Rationale for disease selection

The search is limited to diseases with high morbidity and mortality within each of the three disease groups:

  • Cancer

Owing to time constraints, we evaluate only the malignant neoplasms with the highest mortality and incidence rates in Europe, which, according to the European Cancer Information System [38], are cancers of the breast, prostate, colorectum, lung, bladder, pancreas, liver, stomach, kidney and corpus uteri. In addition, cervix uteri and liver cancers are included because of their preventable nature and/or the existence of public health screening programmes [30, 31].

  • Cardiovascular diseases

For CVD, we evaluate the following main causes of death: ischaemic heart disease (49.2% of all CVD deaths), stroke (35.2%, including ischaemic stroke, intracerebral haemorrhage and subarachnoid haemorrhage), hypertensive heart disease (6.2%), cardiomyopathy and myocarditis (1.8%), atrial fibrillation and flutter (1.7%), rheumatic heart disease (1.6%), non-rheumatic valvular heart disease (0.9%), aortic aneurysm (0.9%), peripheral artery disease (0.4%) and endocarditis (0.4%) [6].

For CVD, rheumatic heart disease and endocarditis are not considered in this scoping review because of their infectious aetiology. Arterial hypertension, a risk factor for many cardiovascular diseases, is treated here as an intermediary condition that leads to CVD.
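As a small worked check on the figures quoted from reference [6], the shares can be summed to see how much of the CVD mortality burden the selected conditions represent once the two infectious-aetiology conditions are set aside. This is an illustrative calculation only; the dictionary keys are our own labels.

```python
# Share of all CVD deaths (%) for the main causes quoted from reference [6].
cvd_death_share = {
    "ischaemic heart disease": 49.2,
    "stroke": 35.2,
    "hypertensive heart disease": 6.2,
    "cardiomyopathy and myocarditis": 1.8,
    "atrial fibrillation and flutter": 1.7,
    "rheumatic heart disease": 1.6,
    "non-rheumatic valvular heart disease": 0.9,
    "aortic aneurysm": 0.9,
    "peripheral artery disease": 0.4,
    "endocarditis": 0.4,
}

# Excluded from the review because of their infectious aetiology.
excluded = {"rheumatic heart disease", "endocarditis"}

# Round to one decimal place to hide floating-point noise in the sums.
listed_total = round(sum(cvd_death_share.values()), 1)
covered_total = round(
    sum(share for cause, share in cvd_death_share.items() if cause not in excluded), 1
)

print(f"Ten listed causes: {listed_total}% of CVD deaths")
print(f"Causes retained in the review: {covered_total}% of CVD deaths")
```

Running it gives 98.3% for the ten listed causes and 96.3% for the causes retained after the two exclusions.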

  • Neurodegenerative diseases

The leading noncommunicable neurodegenerative causes of death are Alzheimer’s disease or dementia (20%), Parkinson’s disease (2.5%), motor neuron diseases (0.4%) and multiple sclerosis (0.2%) [8]. Alzheimer’s disease, vascular dementia, frontotemporal dementia and Lewy body disease will be specifically searched, following the pattern of European dementia prevalence studies [39]. Additionally, because amyotrophic lateral sclerosis is the most common motor neuron disease, it is also included in the search [8, 40, 41].

Rationale for context

Public health and clinical settings in any geographical location are considered. Owing to time constraints, the searches cover only the period from January 2020 to mid-February 2023.

Rationale for type of evidence

Qualitative studies are not considered, since they cannot answer the research question. Editorials, opinion pieces, protocols and conference abstracts are also excluded. Clinical practice guidelines are not included, because the information they contain should already be present in the original studies and reviews on which they are based.
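In practice these criteria are applied through the database search limits and by human reviewers in Covidence, but they can be summarised as a single predicate. The sketch below is hypothetical: the `Record` fields, the publication-type labels and the exact end date chosen for "mid-February 2023" (15 February) are assumptions, not part of the protocol.

```python
from dataclasses import dataclass
from datetime import date


# Hypothetical, simplified record; not the Covidence export format.
@dataclass
class Record:
    title: str
    language: str
    publication_date: date
    publication_type: str
    pregnant_population: bool = False


# Search window; 15 February is an assumed reading of "mid-February 2023".
WINDOW_START = date(2020, 1, 1)
WINDOW_END = date(2023, 2, 15)

# Evidence types excluded by the protocol (labels are illustrative).
EXCLUDED_TYPES = {
    "qualitative study",
    "editorial",
    "opinion piece",
    "protocol",
    "conference abstract",
    "clinical practice guideline",
}


def is_eligible(record: Record) -> bool:
    """Return True if a record passes the language, date, evidence-type and population criteria."""
    if record.language.lower() != "english":
        return False
    if not WINDOW_START <= record.publication_date <= WINDOW_END:
        return False
    if record.publication_type.lower() in EXCLUDED_TYPES:
        return False
    # Pregnancy is an exclusion criterion.
    return not record.pregnant_population


# Dummy records for illustration only.
records = [
    Record("Record A", "English", date(2021, 5, 3), "cohort study"),
    Record("Record B", "English", date(2022, 1, 10), "editorial"),
    Record("Record C", "Spanish", date(2021, 7, 1), "cohort study"),
    Record("Record D", "English", date(2019, 12, 31), "cohort study"),
    Record("Record E", "English", date(2022, 6, 1), "cohort study", pregnant_population=True),
]
eligible = [r.title for r in records if is_eligible(r)]
print(eligible)  # ['Record A']
```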

Pilot study

We conducted a pilot study to test and refine the search strategies, selection criteria and data extraction sheet, and to become familiar with the screening software, Covidence [42]. For the pilot, we selected from the results of the preliminary search matrix 100 papers ranked by best fit to the topic and 100 papers at random. The team comprised 15 reviewers (in both the pilot and the final reviews), who met daily to revise, refine and reach consensus on the search matrices, criteria and data extraction sheets.
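The pilot selection (100 best-fitting papers plus 100 random papers) could be reproduced along the lines below. How "best fit" was judged is not specified in the protocol, so `relevance` is a hypothetical scoring function, and drawing the random papers only from outside the top 100 (so that no paper is counted twice) is our own assumption.

```python
import random


def pilot_sample(records, relevance, n_best=100, n_random=100, seed=2023):
    """Select the n_best highest-scoring records plus n_random others at random.

    `relevance` is a hypothetical scoring function (higher = better fit).
    Random records are drawn only from outside the top n_best, so no paper
    appears in both samples.
    """
    ranked = sorted(records, key=relevance, reverse=True)
    best, rest = ranked[:n_best], ranked[n_best:]
    rng = random.Random(seed)  # fixed seed so the sample is reproducible
    randoms = rng.sample(rest, min(n_random, len(rest)))
    return best, randoms


# Dummy example: 1,000 record IDs with made-up, distinct relevance scores.
record_ids = list(range(1000))
scores = {rid: (rid * 37) % 1000 for rid in record_ids}
best, randoms = pilot_sample(record_ids, relevance=scores.get)
print(len(best), len(randoms), len(set(best) & set(randoms)))  # 100 100 0
```

Fixing the seed makes the random half of the pilot reproducible, which helps if the sample has to be regenerated after the search matrix changes.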

Regarding the choice of databases and platforms, we tested several options, comparing PubMed/MEDLINE with Ovid/MEDLINE and Ovid/Embase with Elsevier/Embase. Ultimately, we chose Ovid as the platform for accessing both MEDLINE and Embase, using the MeSH and Emtree thesauri. We manually translated the search terms between these thesauri to ensure consistency between them. Because the review team was spread across the UK and Spain, we centralised the searches and their results under the UK team's Ovid licence to ensure consistency. Additionally, using Ovid exclusively for both MEDLINE and Embase streamlined the process and gave easier access to preprints, which represent the latest research in this rapidly evolving field.

Identification of research

The searches are being conducted in MEDLINE via Ovid, Embase via Ovid and Embase preprints via Ovid. We also explored the feasibility of searching the CDC-Authored Genomics and Precision Health Publications Databases [43]. However, the lack of advanced tools to refine the search and the unavailability of bulk downloading prevented us from including this data source. Nevertheless, a test search of 15 records for each disease group showed full overlap with MEDLINE and/or Embase.
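The overlap test reduces to a set comparison between the identifiers of the sampled records and those retrieved from MEDLINE and Embase. A minimal sketch, assuming each record can be matched on a shared identifier such as a PMID or DOI (the identifiers below are made up):

```python
def overlap_report(sample_ids, medline_ids, embase_ids):
    """Return the share of sampled records found in MEDLINE and/or Embase, and the IDs missing."""
    if not sample_ids:
        raise ValueError("The sample must contain at least one record")
    found = sample_ids & (medline_ids | embase_ids)
    missing = sorted(sample_ids - found)
    return len(found) / len(sample_ids), missing


# Made-up identifiers: 15 sampled records for one disease group.
sample = {f"pmid:{n}" for n in range(100, 115)}
medline = {f"pmid:{n}" for n in range(90, 112)}   # contains 100-111
embase = {f"pmid:{n}" for n in range(108, 130)}   # contains 108-114
share, missing = overlap_report(sample, medline, embase)
print(f"{share:.0%} overlap; missing: {missing}")  # 100% overlap; missing: []
```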

Search strategy definition

An initial limited search of MEDLINE via PubMed and Ovid was undertaken to identify relevant papers on the topic. In this step, we identified key text words in their titles and abstracts, as well as thesaurus terms. The SR-Accelerator, Citationchaser and Yale MeSH Analyzer tools were used to assist in building the search matrix. With all this information, we developed a full search strategy adapted to each included database and information source, optimised by research librarians.
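The general shape of such a strategy (synonyms combined with OR within each concept, concepts combined with AND) can be sketched as follows. The terms and the generic, platform-neutral syntax are hypothetical; the actual strategies use MeSH and Emtree headings with Ovid field syntax and are given in Additional file 3.

```python
def or_block(terms):
    """Join the synonyms for one concept with OR, quoting multi-word phrases."""
    quoted = [f'"{t}"' if " " in t else t for t in terms]
    return "(" + " OR ".join(quoted) + ")"


def build_query(concepts):
    """Combine the concept blocks with AND (dict insertion order is preserved)."""
    return " AND ".join(or_block(terms) for terms in concepts.values())


# Hypothetical free-text terms; the real strategies are in Additional file 3.
concepts = {
    "biomarker": ["biomarker*", "polygenic risk score", "genetic marker*"],
    "prevention": ["prevention", "screening", "risk assessment"],
    "disease": ["neoplasm*", "cancer*"],
}
print(build_query(concepts))
```

This prints `(biomarker* OR "polygenic risk score" OR "genetic marker*") AND (prevention OR screening OR "risk assessment") AND (neoplasm* OR cancer*)`. Keeping each concept as its own list makes it easy to swap the disease block when running the three searches in parallel.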

Study evidence selection

The complete search strategies are shown in Additional file 3. The three searches are being conducted in parallel, with no limits on study type or setting.

Following each search, all identified citations, with their citation details, will be collated and uploaded into Covidence (Veritas Health Innovation, Melbourne, Australia, available at www.covidence.org), and duplicates will be removed.
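Covidence removes duplicates itself, and the protocol does not describe how it matches records. For readers handling exports outside Covidence, one common approach is sketched below: a record is treated as a duplicate if its DOI or its normalised title matches one already kept. The field names and DOIs are hypothetical, and title matching can produce false positives (for example, generic titles such as errata).

```python
import re


def normalise_title(title: str) -> str:
    """Lower-case a title and drop everything except letters and digits."""
    return re.sub(r"[^a-z0-9]", "", title.lower())


def deduplicate(records):
    """Keep the first occurrence of each record, matching on DOI or normalised title."""
    seen = set()
    unique = []
    for rec in records:
        keys = {("title", normalise_title(rec["title"]))}
        doi = (rec.get("doi") or "").strip().lower()
        if doi:
            keys.add(("doi", doi))
        if keys & seen:
            continue  # duplicate of a record already kept
        seen |= keys
        unique.append(rec)
    return unique


# Hypothetical exports from the three sources.
records = [
    {"title": "Biomarkers for stroke prevention", "doi": "10.1000/abc", "source": "MEDLINE"},
    {"title": "Biomarkers for Stroke Prevention.", "doi": "", "source": "Embase"},
    {"title": "Another title", "doi": "10.1000/ABC", "source": "Embase"},
    {"title": "A distinct study", "doi": None, "source": "Embase preprints"},
]
print([r["source"] for r in deduplicate(records)])  # ['MEDLINE', 'Embase preprints']
```

Checking both keys means a record that lacks a DOI in one source can still be matched to its DOI-bearing copy from another source.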

In both the title-abstract and full-text screening phases, the first 10% of the papers will be evaluated by two independent reviewers (200 or more papers in the title-abstract phase). A meeting to discuss discrepancies will then be held to adjust the inclusion and exclusion criteria and to achieve consistency between reviewers’ decisions. After that, the remaining search results will be screened by a single reviewer. Disagreements that arise between reviewers at any stage of the selection process will be resolved through discussion or with additional reviewers. We maintain an active forum to keep reviewers in continuous contact.
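The split between the dual-screened calibration set and the single-reviewer remainder can be expressed as a simple function. This sketch assumes records are kept in the order in which they are screened and that the 10% is rounded up; `calibration_percent` is a hypothetical parameter reflecting the protocol's note that the percentage may be increased.

```python
def split_for_screening(records, calibration_percent=10):
    """Split records into a dual-screened calibration set and a single-reviewer remainder.

    The first `calibration_percent`% of records (rounded up) go to two reviewers.
    Integer arithmetic avoids floating-point rounding surprises.
    """
    n_calibration = (len(records) * calibration_percent + 99) // 100
    return records[:n_calibration], records[n_calibration:]


# Hypothetical record IDs in screening order.
record_ids = [f"rec{i:04d}" for i in range(2155)]
calibration, remainder = split_for_screening(record_ids)
print(len(calibration), len(remainder))  # 216 1939
```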

The results of the searches and the study inclusion processes will be reported and presented in a flow diagram following the PRISMA-ScR recommendations [ 22 ].

Expert consultation

The protocol has been refined after consultation with experts in each field (cancer, CVD, and neurodegenerative diseases) who gave input on the scope of the reviews regarding the diverse biomarkers, risk factors, outcomes, and types of prevention relevant to their fields of expertise. In addition, the search strategies have been peer-reviewed by a network of librarians (PRESS-forum in pressforum.pbworks.com) who kindly provided useful feedback.

Data extraction

We have developed a draft data extraction sheet, which is included as Additional file 4, based on the JBI recommendations [ 21 ]. Data extraction will include citation details, study design, population type, biomarker information (name, type, subtype, clinical utility, use of AI technology), disease (group, specific disease), prevention (primary or secondary, lifestyle if primary prevention), and subjective reviewer observations. The data extraction for all papers will be performed by two reviewers to ensure consistency in the classification of data.
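The extraction fields listed above can be mirrored in a typed record, which also makes the rule that lifestyle is recorded only for primary prevention checkable. This is a sketch: the class and field names and the allowed values are our own and need not match the columns of Additional file 4.

```python
from dataclasses import dataclass
from typing import Optional

DISEASE_GROUPS = {"cancer", "CVD", "neurodegenerative"}
PREVENTION_TYPES = {"primary", "secondary"}


@dataclass
class Biomarker:
    name: str
    biomarker_type: str
    subtype: str
    clinical_utility: str
    uses_ai: bool  # whether AI technology is used with the biomarker


@dataclass
class ExtractionRecord:
    citation: str
    study_design: str
    population_type: str
    biomarker: Biomarker
    disease_group: str
    specific_disease: str
    prevention: str
    lifestyle: Optional[str] = None  # only meaningful for primary prevention
    reviewer_observations: str = ""

    def __post_init__(self):
        # Reject values outside the predefined categories.
        if self.disease_group not in DISEASE_GROUPS:
            raise ValueError(f"Unknown disease group: {self.disease_group!r}")
        if self.prevention not in PREVENTION_TYPES:
            raise ValueError(f"Unknown prevention type: {self.prevention!r}")
        if self.lifestyle is not None and self.prevention != "primary":
            raise ValueError("Lifestyle is recorded only for primary prevention")


# Dummy example.
marker = Biomarker("LDL cholesterol", "biochemical", "lipid", "risk stratification", uses_ai=False)
record = ExtractionRecord(
    citation="Dummy et al. 2021",
    study_design="cohort",
    population_type="general adult population",
    biomarker=marker,
    disease_group="CVD",
    specific_disease="ischaemic heart disease",
    prevention="primary",
    lifestyle="physical activity",
)
print(record.prevention, record.lifestyle)  # primary physical activity
```

Validating on construction means that inconsistent entries are caught when the two reviewers' extractions are compared, rather than later when the maps are built.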

Data analysis and presentation

The descriptive information about the studies collected in the previous phase will be coded according to predefined categories, allowing the construction of visual summary maps that give readers and researchers a quick overview of the main results. As in the previous phases, this process will be carried out with the aid of Covidence.

Accordingly, the extracted data will be summarised in tables, in static maps and, above all, in interactive evidence gap maps (EGMs) created using EPPI-Mapper [44], an open-access web application developed in 2018 by the Evidence for Policy and Practice Information and Coordinating Centre (EPPI-Centre) and Digital Solution Foundry, in partnership with the Campbell Collaboration, which has become the standard software for producing visual evidence gap maps.

Tables and static maps will be produced in RStudio, which will also be used to clean the database and prepare it for EPPI-Mapper by generating two Excel files: one containing the EGM structure (i.e. the columns and rows of the visual table) and coding sets, and another containing the bibliographic references and the codes that reviewers assigned to them. Finally, we will use a Python script to produce a JSON file ready for import into EPPI-Reviewer.
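The final conversion step could look roughly like the sketch below, which assumes the two spreadsheets have first been saved as CSV and uses only the Python standard library. The column names (`code_id`, `ref_id`, `codes`) and the JSON layout are hypothetical; they are not the import schema of EPPI-Reviewer or EPPI-Mapper, which would need to be checked against the EPPI-Centre documentation.

```python
import csv
import io
import json


def build_json(structure_csv: str, references_csv: str) -> str:
    """Merge a coding-structure table and a coded-references table into one JSON string.

    The layout produced here is hypothetical and is NOT the EPPI-Reviewer or
    EPPI-Mapper import schema.
    """
    codes = list(csv.DictReader(io.StringIO(structure_csv)))
    known = {row["code_id"] for row in codes}
    references = []
    for row in csv.DictReader(io.StringIO(references_csv)):
        code_ids = [c.strip() for c in row["codes"].split(";") if c.strip()]
        unknown = set(code_ids) - known
        if unknown:
            # Catch coding errors before anything is imported into the mapping tool.
            raise ValueError(f"Reference {row['ref_id']} uses undefined codes: {sorted(unknown)}")
        references.append({"id": row["ref_id"], "title": row["title"], "codes": code_ids})
    return json.dumps({"code_sets": codes, "references": references}, indent=2)


# Dummy inputs standing in for the two spreadsheets exported from R.
structure = "code_id,label,axis\nB1,Genetic biomarkers,row\nD1,Breast cancer,column\n"
coded_refs = "ref_id,title,codes\nR1,Dummy paper,B1;D1\n"
print(build_json(structure, coded_refs))
```

With real files, the strings would come from `open(path, newline="", encoding="utf-8").read()`. Checking every reference's codes against the coding structure is the main point of the sketch: an undefined code would otherwise surface only as a missing square in the map.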

The maps are matrices in which biomarker categories and subcategories define the rows and diseases define the columns. Each cell contains small squares, each representing one paper in that cell, colour-coded by study design. Depending on the map, the columns will also have a second sublevel. For each disease group, we will produce three interactive EGMs: two for primary prevention and one for secondary prevention. In the first primary-prevention map, the second sublevel will show whether, and which, lifestyle factors were considered in each paper in combination with the biomarker studied. In the second primary-prevention map and in the secondary-prevention map, the second sublevel will show the disease-specific subpopulations in which the biomarker has been used or evaluated (e.g. cirrhosis for liver cancer). The maps will also include filters that allow users to select records by additional features, such as the use of artificial intelligence in the papers. Furthermore, the EGMs, which will be freely available online, will allow users to view and export selected bibliographic references and their abstracts. An example of these interactive maps with dummy data is provided in Additional file 5.
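Underneath the interactive display, each map is a grouping of papers by (row, column) cell and then by study design. The sketch below shows that grouping and how empty cells, the evidence gaps, fall out of it. Field names, categories and paper IDs are hypothetical; a second sublevel (lifestyle or subpopulation) could be handled by adding it to the cell key.

```python
from collections import defaultdict


def build_gap_map(records):
    """Group paper IDs by (biomarker category, disease) cell, then by study design."""
    cells = defaultdict(lambda: defaultdict(list))
    for rec in records:
        cell = (rec["biomarker_category"], rec["disease"])
        cells[cell][rec["study_design"]].append(rec["id"])
    return cells


def cell_counts(cells, categories, diseases):
    """Number of papers per cell, as a rows-by-columns list of lists."""
    # .get() does not trigger the defaultdict factory, so empty cells stay absent.
    return [
        [sum(len(ids) for ids in cells.get((c, d), {}).values()) for d in diseases]
        for c in categories
    ]


def find_gaps(cells, categories, diseases):
    """Cells with no papers at all: the evidence gaps."""
    return [(c, d) for c in categories for d in diseases if (c, d) not in cells]


# Hypothetical coded papers.
papers = [
    {"id": "P1", "biomarker_category": "genetic", "disease": "breast cancer", "study_design": "cohort"},
    {"id": "P2", "biomarker_category": "genetic", "disease": "breast cancer", "study_design": "RCT"},
    {"id": "P3", "biomarker_category": "imaging", "disease": "lung cancer", "study_design": "cohort"},
]
categories = ["genetic", "imaging"]
diseases = ["breast cancer", "lung cancer"]
cells = build_gap_map(papers)
print(cell_counts(cells, categories, diseases))  # [[2, 0], [0, 1]]
print(find_gaps(cells, categories, diseases))    # [('genetic', 'lung cancer'), ('imaging', 'breast cancer')]
```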

Finally, we will prepare two scientific reports for PROPHET. The main report, which will follow the PRISMA-ScR recommendations, will summarise the results of the three scoping reviews, provide a general and global interpretation of the results, comment on their implications for the SRIA and discuss the limitations of the process. The second report will present the specific methodology for the dynamic maps.

This protocol summarises the procedure to carry out three parallel rapid scoping reviews to provide an overview of the available research and gaps in the literature on biomarkers for personalised primary and secondary prevention for the three most common chronic disease groups: cancer, CVD and neurodegenerative diseases. The result will be a common report for the three scoping reviews and the online publication of interactive evidence gap maps to facilitate data visualisation.

This work will be complemented, in a further step of the PROPHET project, by a subsequent mapping report on the scientific evidence for the clinical utility of biomarkers. Both reports are part of an overall mapping effort to characterise the current knowledge and environment around personalised preventive medicine. In this context, PROPHET will also map personalised prevention research programs, as well as bottlenecks and challenges in the adoption of personalised preventive approaches or in the involvement of citizens, patients, health professionals and policy-makers in personalised prevention. The overall results will contribute to the development of the SRIA concept paper, which will help define future priorities for personalised prevention research in the European Union.

One strength of this protocol is that a single approach is applied across all three scoping reviews. This improves the consistency and comparability of their results and makes better use of shared effort; it also facilitates coordination among the staff conducting the different reviews and allows them to discuss the reviews together, providing the more global perspective needed for the SRIA. In addition, the collaboration of researchers with different backgrounds, the inclusion of librarians in the research team and the specific software tools used have helped us to ensure the quality of the work and have shortened the time needed to finalise this protocol. Another strength is the pilot study conducted to test and refine the search strategy, selection criteria and data extraction sheet. Finally, the platform for accessing the bibliographic databases was chosen after a comparative evaluation (Ovid MEDLINE versus PubMed MEDLINE, Ovid Embase versus Elsevier Embase, etc.).

Only 10% of the papers will be screened by two reviewers; if time permits, we will calculate kappa statistics to assess inter-reviewer agreement during the screening phases. Additionally, ongoing communication and the exchange and discussion of uncertainties should ensure a high level of consensus in the review process.
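For two reviewers making include/exclude decisions on the same records, Cohen's kappa is κ = (p_o − p_e) / (1 − p_e), where p_o is the observed proportion of agreement and p_e the agreement expected by chance from each reviewer's marginal frequencies. The protocol only says "kappa statistics", so choosing Cohen's kappa for two raters is our assumption. A minimal sketch with made-up decisions:

```python
import math
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters' categorical decisions on the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("Both raters must rate the same non-empty set of items")
    n = len(rater_a)
    p_observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    categories = counts_a.keys() | counts_b.keys()
    p_expected = sum(counts_a[c] * counts_b[c] for c in categories) / n**2
    if p_expected == 1:
        return math.nan  # undefined: both raters used the same single category throughout
    return (p_observed - p_expected) / (1 - p_expected)


# Made-up include/exclude decisions for 10 calibration records.
reviewer_1 = ["inc", "inc", "exc", "exc", "exc", "exc", "exc", "exc", "inc", "exc"]
reviewer_2 = ["inc", "exc", "exc", "exc", "exc", "exc", "exc", "exc", "inc", "exc"]
print(round(cohens_kappa(reviewer_1, reviewer_2), 3))  # 0.737
```

Here p_o = 0.9 and p_e = 0.62, so κ ≈ 0.74: the reviewers agree on 9 of 10 records, but kappa discounts the agreement expected simply because most records are excluded.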

The main limitation of this work is the very broad field it covers: personalised prevention across all chronic diseases. To keep the scope manageable, we limited it to the chronic diseases with the greatest impact on the population and to the last 3 years, conducting a rapid scoping review because of time constraints and following the recommendations of the Cochrane Rapid Review Methods Group [24]. Nevertheless, as our aim is to identify gaps in the literature in an area of growing interest (personalisation and prevention), we believe that the records retrieved will provide a solid foundation for evaluating the available literature. Additionally, systematic reviews, which may include studies published before 2020, can provide valuable insights beyond the temporal limits of our search.

Thus, this protocol reflects the decisions imposed by PROPHET's timetable without compromising the quality and rigour of the work. In addition, data extraction will be performed by two reviewers for 100% of the papers to ensure the consistency of the extracted data. Lastly, beyond these three scoping reviews, the primary challenge lies in integrating their findings with those of numerous other reviews within the project to produce a cohesive concept paper for the Strategic Research and Innovation Agenda (SRIA) of the European Union, firmly grounded in evidence-based conclusions.

Council of the European Union. Council conclusions on personalised medicine for patients (2015/C 421/03). Brussels: European Union; 2015 Dec. Report No.: (2015/C 421/03). Available from: https://eur-lex.europa.eu/legal-content/EN/TXT/PDF/?uri=CELEX:52015XG1217(01)&from=FR

Goetz LH, Schork NJ. Personalized medicine: motivation, challenges, and progress. Fertil Steril. 2018;109(6):952–63.

FDA-NIH Biomarker Working Group. BEST (Biomarkers, EndpointS, and other Tools) Resource. Silver Spring (MD): Food and Drug Administration (US); 2016 [cited 3 February 2023]. Available from: http://www.ncbi.nlm.nih.gov/books/NBK326791/

Porta M, Greenland S, Hernán M, dos Silva I S, Last JM. International Epidemiological Association, editors. A dictionary of epidemiology. 6th ed. Oxford: Oxford Univ. Press; 2014. p. 343.

PROPHET. Project kick-off meeting. Rome. 2022.

Roth GA, Mensah GA, Johnson CO, Addolorato G, Ammirati E, Baddour LM, et al. Global burden of cardiovascular diseases and risk factors, 1990–2019. J Am Coll Cardiol. 2020;76(25):2982–3021.

GBD 2019 Cancer Collaboration, Kocarnik JM, Compton K, Dean FE, Fu W, Gaw BL, et al. Cancer incidence, mortality, years of life lost, years lived with disability, and disability-adjusted life years for 29 cancer groups from 2010 to 2019: a systematic analysis for the global burden of disease study 2019. JAMA Oncol. 2022;8(3):420.

Feigin VL, Vos T, Nichols E, Owolabi MO, Carroll WM, Dichgans M, et al. The global burden of neurological disorders: translating evidence into policy. The Lancet Neurology. 2020;19(3):255–65.

GBD 2019 Collaborators, Nichols E, Abd‐Allah F, Abdoli A, Abosetugn AE, Abrha WA, et al. Global mortality from dementia: Application of a new method and results from the Global Burden of Disease Study 2019. A&D Transl Res & Clin Interv. 2021;7(1). Available from: https://onlinelibrary.wiley.com/doi/10.1002/trc2.12200 [cited 7 February 2023].

Eurostat. ec.europa.eu. Self-perceived health statistics. European health interview survey (EHIS). 2022. Available from: https://ec.europa.eu/eurostat/statistics-explained/index.php?title=Self-perceived_health_statistics [cited 7 February 2023].

OECD/European Union. Health at a Glance: Europe 2022: State of Health in the EU Cycle. Paris: OECD Publishing; 2022. Available from: https://www.oecd-ilibrary.org/social-issues-migration-health/health-at-a-glance-europe-2022_507433b0-en

Boccia S, Pastorino R, Ricciardi W, Ádány R, Barnhoorn F, Boffetta P, et al. How to integrate personalized medicine into prevention? Recommendations from the Personalized Prevention of Chronic Diseases (PRECeDI) Consortium. Public Health Genomics. 2019;22(5–6):208–14.

Visseren FLJ, Mach F, Smulders YM, Carballo D, Koskinas KC, Bäck M, et al. 2021 ESC Guidelines on cardiovascular disease prevention in clinical practice. Eur Heart J. 2021;42(34):3227–337.

World Health Organization. Global action plan on the public health response to dementia 2017–2025. Geneva: WHO Document Production Services; 2017. p. 27.

Norton S, Matthews FE, Barnes DE, Yaffe K, Brayne C. Potential for primary prevention of Alzheimer’s disease: an analysis of population-based data. Lancet Neurol. 2014;13(8):788–94.

Mentis AFA, Dardiotis E, Efthymiou V, Chrousos GP. Non-genetic risk and protective factors and biomarkers for neurological disorders: a meta-umbrella systematic review of umbrella reviews. BMC Med. 2021;19(1):6.

Schüz J, Espina C, Villain P, Herrero R, Leon ME, Minozzi S, et al. European Code against Cancer 4th Edition: 12 ways to reduce your cancer risk. Cancer Epidemiol. 2015;39:S1-10.

Tricco AC, Langlois EV, Straus SE, Alliance for Health Policy and Systems Research, World Health Organization. Rapid reviews to strengthen health policy and systems: a practical guide. Geneva: World Health Organization; 2017. Available from: https://apps.who.int/iris/handle/10665/258698 [cited 3 February 2023].

White H, Albers B, Gaarder M, Kornør H, Littell J, Marshall Z, et al. Guidance for producing a Campbell evidence and gap map. Campbell Systematic Reviews. 2020;16(4). Available from: https://onlinelibrary.wiley.com/doi/10.1002/cl2.1125 [cited 3 February 2023].

Aromataris E, Munn Z, editors. JBI Manual for Evidence Synthesis. JBI; 2020.

Peters MDJ, Marnie C, Tricco AC, Pollock D, Munn Z, Alexander L, et al. Updated methodological guidance for the conduct of scoping reviews. JBI Evid Synth. 2020;18(10):2119–26.

Tricco AC, Lillie E, Zarin W, O’Brien KK, Colquhoun H, Levac D, et al. PRISMA Extension for Scoping Reviews (PRISMA-ScR): Checklist and Explanation. Ann Intern Med. 2018;169(7):467–73.

OSF. Open Science Framework webpage. Available from: https://osf.io/ [cited 8 February 2023].

Garritty C, Gartlehner G, Nussbaumer-Streit B, King VJ, Hamel C, Kamel C, et al. Cochrane Rapid Reviews Methods Group offers evidence-informed guidance to conduct rapid reviews. J Clin Epidemiol. 2021;130:13–22.

Leon ME, Peruga A, McNeill A, Kralikova E, Guha N, Minozzi S, et al. European code against cancer, 4th edition: tobacco and cancer. Cancer Epidemiology. 2015;39:S20-33.

Anderson AS, Key TJ, Norat T, Scoccianti C, Cecchini M, Berrino F, et al. European code against cancer 4th edition: obesity, body fatness and cancer. Cancer Epidemiology. 2015;39:S34-45.

Barone BB, Yeh HC, Snyder CF, Peairs KS, Stein KB, Derr RL, et al. Long-term all-cause mortality in cancer patients with preexisting diabetes mellitus: a systematic review and meta-analysis. JAMA. 2008;300(23):2754–64.

Barone BB, Yeh HC, Snyder CF, Peairs KS, Stein KB, Derr RL, et al. Postoperative mortality in cancer patients with preexisting diabetes: systematic review and meta-analysis. Diabetes Care. 2010;33(4):931–9.

Noto H, Tsujimoto T, Sasazuki T, Noda M. Significantly increased risk of cancer in patients with diabetes mellitus: a systematic review and meta-analysis. Endocr Pract. 2011;17(4):616–28.

Villain P, Gonzalez P, Almonte M, Franceschi S, Dillner J, Anttila A, et al. European code against cancer 4th edition: infections and cancer. Cancer Epidemiology. 2015;39:S120-38.

Scoccianti C, Cecchini M, Anderson AS, Berrino F, Boutron-Ruault MC, Espina C, et al. European Code against Cancer 4th Edition: Alcohol drinking and cancer. Cancer Epidemiology. 2016;45:181–8.

El-Serag HB. Epidemiology of viral hepatitis and hepatocellular carcinoma. Gastroenterology. 2012;142(6):1264-1273.e1.

Li XY, Zhang M, Xu W, Li JQ, Cao XP, Yu JT, et al. Midlife modifiable risk factors for dementia: a systematic review and meta-analysis of 34 prospective cohort studies. CAR. 2020;16(14):1254–68.

Ford E, Greenslade N, Paudyal P, Bremner S, Smith HE, Banerjee S, et al. Predicting dementia from primary care records: a systematic review and meta-analysis. Forloni G, editor. PLoS ONE. 2018;13(3):e0194735.

Xu W, Tan L, Wang HF, Jiang T, Tan MS, Tan L, et al. Meta-analysis of modifiable risk factors for Alzheimer’s disease. J Neurol Neurosurg Psychiatry. 2015;86(12):1299–306.

Guo Y, Xu W, Liu FT, Li JQ, Cao XP, Tan L, et al. Modifiable risk factors for cognitive impairment in Parkinson’s disease: A systematic review and meta-analysis of prospective cohort studies. Mov Disord. 2019;34(6):876–83.

Jiménez-Jiménez FJ, Alonso-Navarro H, García-Martín E, Agúndez JAG. Alcohol consumption and risk for Parkinson’s disease: a systematic review and meta-analysis. J Neurol. 2019 Aug;266(8):1821–34.

ECIS European Cancer Information System. Data explorer | ECIS. 2023. Estimates of cancer incidence and mortality in 2020 for all cancer sites. Available from: https://ecis.jrc.ec.europa.eu/explorer.php?$0-0$1-AE27$2-All$4-2$3-All$6-0,85$5-2020,2020$7-7,8$CEstByCancer$X0_8-3$CEstRelativeCanc$X1_8-3$X1_9-AE27$CEstBySexByCancer$X2_8-3$X2_-1-1 [cited 22 February 2023].

Bacigalupo I, Mayer F, Lacorte E, Di Pucchio A, Marzolini F, Canevelli M, et al. A systematic review and meta-analysis on the prevalence of dementia in Europe: estimates from the highest-quality studies adopting the DSM IV diagnostic criteria. Bruni AC, editor. JAD. 2018;66(4):1471–81.

Barceló MA, Povedano M, Vázquez-Costa JF, Franquet Á, Solans M, Saez M. Estimation of the prevalence and incidence of motor neuron diseases in two Spanish regions: Catalonia and Valencia. Sci Rep. 2021;11(1):6207.

Ng L, Khan F, Young CA, Galea M. Symptomatic treatments for amyotrophic lateral sclerosis/motor neuron disease. Cochrane Neuromuscular Group, editor. Cochrane Database of Systematic Reviews. 2017;2017(1). Available from: http://doi.wiley.com/10.1002/14651858.CD011776.pub2 [cited 13 February 2023].

Covidence systematic review software. Melbourne, Australia: Veritas Health Innovation; 2023. Available from: https://www.covidence.org

Centers for Disease Control and Prevention. Public Health Genomics and Precision Health Knowledge Base (v8.4). 2023. Available from: https://phgkb.cdc.gov/PHGKB/specificPHGKB.action?action=about

Digital Solution Foundry and EPPI Centre. EPPI Centre. UCL Social Research Institute: University College London; 2022.

Acknowledgements

We are grateful for the library support received from Teresa Carretero (Instituto de Salud Carlos III, ISCIII), and to Concepción Campos-Asensio (Hospital Universitario de Getafe, BiblioMadSalud Executive Committee) for the seminar on scoping review methodology and for her continuous teaching through social networks.

Also, we would like to thank Dr. Héctor Bueno (Centro Nacional de Investigaciones Cardiovasculares (CNIC), Hospital Universitario 12 de Octubre) and Dr. Pascual Sánchez (Fundación Centro de Investigación de Enfermedades Neurológicas (CIEN)) for their advice in their fields of expertise.

The PROPHET project has received funding from the European Union’s Horizon Europe research and innovation program under grant agreement no. 101057721. UK participation in Horizon Europe Project PROPHET is supported by UKRI grant number 10040946 (Foundation for Genomics & Population Health).

Author information

Plans-Beriso E and Babb-de-Villiers C contributed equally to this work.

Kroese M and Pérez-Gómez B contributed equally to this work.

Authors and Affiliations

Department of Epidemiology of Chronic Diseases, National Centre for Epidemiology, Instituto de Salud Carlos III, Madrid, Spain

E Plans-Beriso, C Barahona-López, P Diez-Echave, O R Hernández, E García-Ovejero, O Craciun, P Fernández-Navarro, N Fernández-Larrea, E García-Esquinas, M Pollan-Santamaria & B Pérez-Gómez

CIBER of Epidemiology and Public Health (CIBERESP), Madrid, Spain

E Plans-Beriso, D Petrova, C Barahona-López, P Diez-Echave, O R Hernández, N F Fernández-Martínez, P Fernández-Navarro, N Fernández-Larrea, E García-Esquinas, V Moreno, F Rodríguez-Artalejo, M J Sánchez, M Pollan-Santamaria & B Pérez-Gómez

PHG Foundation, University of Cambridge, Cambridge, UK

C Babb-de-Villiers, H Turner, L Blackburn & M Kroese

Instituto de Investigación Biosanitaria Ibs. GRANADA, Granada, Spain

D Petrova, N F Fernández-Martínez & M J Sánchez

Escuela Andaluza de Salud Pública (EASP), Granada, Spain

Cambridge University Medical Library, Cambridge, UK

National Library of Health Sciences, Instituto de Salud Carlos III, Madrid, Spain

V Jiménez-Planet

Oncology Data Analytics Program, Catalan Institute of Oncology (ICO), L’Hospitalet de Llobregat, Barcelona, 08908, Spain

Colorectal Cancer Group, ONCOBELL Program, Institut de Recerca Biomedica de Bellvitge (IDIBELL), L’Hospitalet de Llobregat, Barcelona, 08908, Spain

Department of Preventive Medicine and Public Health, Universidad Autónoma de Madrid, Madrid, Spain

F Rodríguez-Artalejo

IMDEA-Food Institute, CEI UAM+CSIC, Madrid, Spain

Contributions

BPG and MK supervised and directed the project. EPB and CBV coordinated and managed the development of the project. CBL, PDE, ORH, CBV and EPB developed the search strategy. All authors reviewed the content, commented on the methods, provided feedback, contributed to drafts and approved the final manuscript.

Corresponding author

Correspondence to E Plans-Beriso .

Ethics declarations

Competing interests.

There are no conflicts of interest in this project.

Additional information

Publisher’s note.

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

  • Additional file 1: Glossary.
  • Additional file 2: Glossary of biomarkers that may define high-risk groups.
  • Additional file 3: Search strategy.
  • Additional file 4: Data extraction sheet.
  • Additional file 5: Example of interactive maps in cancer and primary prevention.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/ . The Creative Commons Public Domain Dedication waiver ( http://creativecommons.org/publicdomain/zero/1.0/ ) applies to the data made available in this article, unless otherwise stated in a credit line to the data.

About this article

Cite this article.

Plans-Beriso, E., Babb-de-Villiers, C., Petrova, D. et al. Biomarkers for personalised prevention of chronic diseases: a common protocol for three rapid scoping reviews. Syst Rev 13 , 147 (2024). https://doi.org/10.1186/s13643-024-02554-9

Received : 19 October 2023

Accepted : 03 May 2024

Published : 01 June 2024

DOI : https://doi.org/10.1186/s13643-024-02554-9

  • Personalised prevention
  • Precision Medicine
  • Precision prevention
  • Cardiovascular diseases
  • Chronic diseases

Systematic Reviews

ISSN: 2046-4053

primary vs secondary research article

Dean's Download: Engineering Progress Spring 2024

  • by College of Engineering Communications
  • June 05, 2024

I never had any hope of writing an opening remark that did justice to the collaborative, interdisciplinary theme at the heart of this issue or, truly, to the college’s research enterprise at large, because I would have, by design, been writing by myself, separated from the community of outstanding faculty members and students who call the College of Engineering home.

That’s why I have opted to replace my dean’s letter with a conversation, a practice I hope to continue moving forward. For this inaugural edition, I could think of no better person than Raissa D’Souza, our associate dean for research, who has been a champion of our Next Level strategic research vision. I hope you enjoy our discussion, and the amazing stories of discovery on the pages that follow.

Richard L. Corsi 

Dean of Engineering

Richard Corsi: To start, let’s talk about the college's vision for Next Level engineering research. In my view, that phrase represents how we will leverage our strengths and invest in areas where we can make a real impact in the world. I'd love to hear your thoughts as our associate dean for research.

Raissa D’Souza: First, we're building on the deep expertise of the faculty, and second, we're aiming to solve big social problems through engineering solutions. The world is at a tipping point, and it's time to turn the ship on things like climate change. The turn of the ship is going to happen through engineering.

RC: Last year alone, we hired 18 new assistant professors who are all doing groundbreaking work, from proving that Amazon's Alexa was spying on customers to a new research program investigating how the human body performs in space. How do you think our early career faculty influence our overall research enterprise?

RD: In short, our early career faculty are creating the future not just for the college but for society at large. They're doing fantastic work on a broad spectrum of pressing issues, including next-generation prosthetic limbs, quantum information technologies and smart health technologies like e-tattoos — innovative, bold research that is pushing the frontier.

RC: Speaking about pushing the frontier, I am excited about Aggie Square and its potential to maximize our research innovations and elevate our efforts to advance human health. What are you looking forward to with Aggie Square, and are there other notable facilities in which our engineers are advancing next level research?

RD: The development of Aggie Square will take the partnership between engineering and medicine to the next level. There's so much opportunity for smart health to reach underserved communities, and those solutions need the partnership between the medical school researchers at UC Davis Health and our engineering researchers. There will also be a lot of exciting developments with biomedical engineering, like the maker's space that's coming online, which will greatly enhance our interactions across the causeway. The Center for Nano-MicroManufacturing, or CNM2, has been a research cornerstone of the college for quite a long time, and with the CHIPS and Science Act and the need for workforce development in the semiconductor industry, there's tremendous opportunity for discovery there. And, of course, the brand-new Coffee Center. This center will provide many opportunities to think about the impact of climate sustainability and sustainable food systems alongside the health and longevity of the people who are growing and picking these crops.

RC: You once said the grand challenges of the modern world “are problems that require inherently interdisciplinary thinking, and which have engineering at the core of the solutions.” Can you help unpack that a little bit? How is the College of Engineering best positioned to tackle the world's most pressing problems?

RD: One of the reasons that our college is well positioned to do that is the comprehensive nature of UC Davis overall. These next-generation advances need partnerships between the social sciences and humanities, the legal scholars and the medical school and the business school — all these experts partnering with engineers to make the solutions that will drive things forward. I think AI is a space where we must consider various perspectives, demands and competing objectives. It's a place like UC Davis that can rise to a challenge like this because we can have conversations with world experts in all of these different domains to engineer the right solutions that human beings want.

RC: I have one last question: You've been the college's associate dean of research for two years now, and I'm curious to know, what are you most proud of?

RD: Our administration has shined more of a spotlight on the amazing things that our engineers are doing. The other thing that is very satisfying to me is the opportunity to mentor so many of our young faculty with their CAREER Awards from the National Science Foundation and with navigating the landscape of scientific publishing, especially in high-level journals, and writing grant proposals. It’s been incredibly rewarding to see their successes.

RC: We're on a fantastic trajectory, and a big part of it is because of your efforts, Raissa. Thank you.

Learn more about the college's Next Level Research Vision

This article was originally featured in the Spring 2024 Engineering Progress Magazine.

  • Open access
  • Published: 30 May 2024

Healthcare use and costs in the last six months of life by level of care and cause of death

  • Yvonne Anne Michel
  • Eline Aas
  • Liv Ariane Augestad
  • Emily Burger
  • Lisbeth Thoresen
  • Gudrun Maria Waaler Bjørnelv (ORCID: orcid.org/0000-0003-4997-5426)

BMC Health Services Research, volume 24, Article number 688 (2024)

Background

Existing knowledge on healthcare use and costs in the last months of life is often limited to one patient group (typically cancer patients) and one level of healthcare (typically secondary care). Consequently, decision-makers lack the knowledge needed to make informed decisions about allocating healthcare resources across all patients. Our aim is to broaden the understanding of resource use and costs in the last six months of life by describing healthcare use and costs for all causes of death and at all levels of formal care.

Methods

Using five national registers, we gained access to patient-level data for all individuals who died in Norway between 2009 and 2013. We described healthcare use and costs at all levels of formal care, namely primary, secondary, and home- and community-based care, in the last six months of life, both in total and across three time periods (6–4 months, 3–2 months, and 1 month before death). Our analysis covers all causes of death, grouped into ten categories based on ICD-10 codes.

Results

During their last six months of life, individuals used healthcare resources worth an average of €46,000, ranging from €32,000 (Injuries) to €64,000 (Diseases of the nervous system and sense organs). By care level, 63% of healthcare resources were used in home- and community-based care (e.g., in-home nursing, practical assistance, or nursing home care), 35% in secondary care (mostly hospital care), and 2% in primary care (i.e., general practitioners). The amount and level of care varied by cause of death and by time to death. The proportion of home- and community-based care that individuals received during their last six months of life varied from 38% for cancer patients to 92% for individuals dying with mental diseases. The shorter the time to death, the more resources were needed: nearly 40% of all end-of-life healthcare costs were incurred in the last month of life across all causes of death. The composition of care also differed by age. Individuals aged 80 years and older used more home- and community-based care (77%) and less secondary care (21%) than individuals dying at younger ages (40% and 57%, respectively).

Conclusions

Our analysis provides valuable evidence on how much healthcare individuals receive in their last six months of life and the associated costs, broken down by level of care and cause of death. Healthcare use and costs varied considerably by cause of death, but were generally higher the closer a person was to death. Our findings enable decision-makers to make more informed resource-allocation decisions and healthcare planners to better anticipate future healthcare needs.

Background

Healthcare resources—such as trained staff, equipment, and beds in hospitals and nursing homes—are limited; therefore, decisions about how to use available healthcare resources are inevitable in publicly funded healthcare systems. Ideally, decision-makers base their resource-allocation decisions on valid, comprehensive evidence and societal preferences which indicate what is most important to the recipients of healthcare services. In reality, decision-makers have to make high-impact decisions under conditions of great uncertainty. As a result, scarce healthcare resources may be used inefficiently, due to significant knowledge gaps about which patient group needs which healthcare resources at which level of care.

The last months of life are known to be ‘resource intensive’ [1, 2, 3]. Existing knowledge on resource use during the last months of life is fragmented and incomplete; studies tend to focus on single parameters of care and, most commonly, on patients diagnosed with a specific type of cancer [4, 5, 6, 7, 8, 9]. We identified two major knowledge gaps in the existing literature on resource use and costs in the last months of life.

First, existing research on resource use in the last months of life has focused predominantly on secondary healthcare services provided at hospitals; data on the use of primary healthcare (i.e., general practitioners (GPs) and emergency primary healthcare) and home- and community-based care (e.g., care institutions and home nursing) are harder to find. Healthcare planners can fully optimise priorities when planning for future care needs only if they have knowledge about healthcare use and costs at all levels of care.

We are aware of a limited number of studies that report on resource use and costs beyond secondary care. A systematic review by Langton and colleagues summarised healthcare use in the last months of life in 3.7 million adult cancer patients [10]. They found that secondary care received in hospitals was reported in most of the studies, while components of community care were mentioned in 41% of the studies and physician visits, as an indicator of primary care, in only 30% [10]. Yet none of the included studies provided data for all levels of formal care simultaneously. Tanuseputro’s population-based study examined healthcare costs in the last 12 months of life in Ontario in 2010–2013 [11]. This study broke down costs in the last year of life by healthcare sector: on average, 43% of total costs were spent on inpatient care, while physician services, medications/devices, laboratories, and emergency rooms contributed less than 20% of total costs; almost 16% was spent on long-term care in institutions, and approximately 8% on home care [11]. However, the study did not report resource use by cause of death. Finally, a recent registry-based study from 2022 investigated care pathways for patients with different cancer diagnoses in the last six months of life for all levels of formal care [12]. The authors found that, depending on their type of cancer, patients utilised 44–66% of resources in secondary care and 31–52% in home- and community-based care during their last six months of life [12]. To our knowledge, comparable estimates for all levels of formal care are not available for causes of death other than cancer.

Second, evidence on resource use and costs in the last months of life is available for only a limited number of causes of death, such as circulatory diseases [13], stroke [14], and respiratory diseases [15]. Even so, most of the available evidence concerns cancer patients’ use of secondary care in the last months of life [5, 6, 7, 8, 9, 10, 16, 17]. Far less is known about resource use for individuals dying with mental diseases such as dementia and Alzheimer’s disease; existing studies focus solely on costs [18]. Healthcare planners in publicly funded healthcare systems cannot afford to allocate scarce resources inefficiently for a patient group as large and fast-growing as people with dementia: the WHO expects that 75 million individuals will have dementia in 2030, rising to 132 million in 2050 [19]. Thus, ageing societies worldwide urgently need evidence on resource use and costs for progressive mental diseases such as dementia.

We aim to address these knowledge gaps by estimating healthcare use and costs in the last six months of life for all levels of formal care (primary, secondary, and home- and community-based care), for all causes of death, for two age groups, and for three time periods before death. In doing so, we aim to provide a more complete understanding of resource use and costs in the last six months of life. Our findings will support decision-makers in making more informed decisions regarding resource allocation and healthcare planners in better anticipating future healthcare needs.

Methods

In this study, we describe healthcare use at all levels of formal care (primary, secondary, and home- and community-based care) during the last six months of life of all individuals who died in Norway between 2009 and 2013. Using a healthcare perspective, we estimated the cost of healthcare during individuals’ last six months of life. To gather this information, we drew from five patient-level national registries.

Healthcare in Norway

Norway’s healthcare system is built on the principles of universal coverage and egalitarianism: healthcare is provided based on need for treatment, regardless of a person’s socioeconomic status, ethnicity, or area of residence. Healthcare is publicly funded, primarily through taxes, and membership in the public health insurance is mandatory [20]. Norwegian municipalities organise primary and home- and community-based care. In primary care, GPs play an important role and function as gatekeepers, referring patients to specialised healthcare when necessary. GPs provide primary care during office hours and emergency primary healthcare outside office hours [20]. The guiding principle for home- and community-based care is enabling patients to stay at home for as long as possible but to move to care facilities (e.g., nursing homes) when needed. Four state-owned Regional Health Authorities are responsible for organising specialised secondary care; inpatient care is provided at hospitals, while outpatient treatments are provided both at hospitals and by self-employed specialists in private practice [20].

National registries

We retrieved data from The Norwegian Causes of Death Register (CDR) [ 21 ], The Norwegian Patient Register (NPR) [ 22 ], Norwegian Control and Payment of Health Reimbursements Database (KUHR) [ 23 ], The Individual-based Statistics for Nursing and Care Services Register (IPLOS) [ 24 ], and Statistics Norway (SSB) [ 25 ].

Causes of death

Our study population contained all decedents in Norway between 2009 and 2013, drawn from the CDR. From this registry, we retrieved information on cause of death, coded as an individual’s underlying cause of death using ICD-10 codes [21]. Data on underlying cause of death were based on an individual’s death certificate, which was completed by a physician. For example, if a cancer patient died from pneumonia, the physician reported pneumonia as the immediate cause of death and cancer as the underlying cause of death. Only one underlying cause of death per person is recorded, identifying the diagnosis that contributed most to the individual’s death. In dialogue with the registries, we agreed on the following categories of underlying cause of death: Communicable diseases (ICD-10 codes A00–B99), Cancer (C00–C97), Endocrine, nutritional, and metabolic diseases (E00–E99), Mental and behavioural diseases (F00–F99), Diseases of the nervous system and sense organs (G00–H95), Diseases of the circulatory system (I00–I99), Diseases of the respiratory system (J00–J99), Diseases of the digestive system (K00–K93), Injuries (V01–Y89), and Other diseases (L, M, N, O, P, Q, R, S, T and U). In Table 1, we list the five most common ICD-10 codes within each of these categories, giving the reader an overview of which causes of death are represented in each category.
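The category scheme above is effectively a lookup over ICD-10 code ranges. The Python sketch below illustrates it under stated assumptions: codes arrive as strings such as "F03.9", the helper name `categorize` is hypothetical, and any code not covered by a listed range (including D codes, which the category list does not mention) falls through to "Other diseases". It illustrates the range boundaries given in the text, not the registries' actual coding procedure.

```python
# Hypothetical helper: map an ICD-10 code to one of the ten cause-of-death
# categories listed in the text. Ranges are copied from the text; codes not
# covered by any listed range (e.g. D codes) fall into "Other diseases",
# which is an assumption of this sketch.

CATEGORY_RANGES = [
    ("Communicable diseases", ("A", 0), ("B", 99)),
    ("Cancer", ("C", 0), ("C", 97)),
    ("Endocrine, nutritional, and metabolic diseases", ("E", 0), ("E", 99)),
    ("Mental and behavioural diseases", ("F", 0), ("F", 99)),
    ("Diseases of the nervous system and sense organs", ("G", 0), ("H", 95)),
    ("Diseases of the circulatory system", ("I", 0), ("I", 99)),
    ("Diseases of the respiratory system", ("J", 0), ("J", 99)),
    ("Diseases of the digestive system", ("K", 0), ("K", 93)),
    ("Injuries", ("V", 1), ("Y", 89)),
]


def categorize(icd10_code: str) -> str:
    """Return the cause-of-death category for a code such as 'G30.1'."""
    code = icd10_code.strip().upper()
    # Chapter letter plus the two-digit block number, e.g. ('G', 30).
    key = (code[0], int(code[1:3]))
    for name, low, high in CATEGORY_RANGES:
        # Tuples compare letter first, then number, so G00-H95 spans two letters.
        if low <= key <= high:
            return name
    return "Other diseases"


print(categorize("G30.1"))  # Alzheimer's disease code
print(categorize("S72.0"))  # S codes are listed under "Other diseases"
```

Because tuples compare element by element, the ranges that span several letters (G00–H95, V01–Y89) need no special handling.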

Healthcare use and costs

Primary care

When a patient receives primary healthcare in Norway, the provider sends a claim to The Norwegian Health Economics Administration (HELFO) [ 26 ]. These claims, their associated costs, and information on patient co-payments are entered into KUHR. We used information on treatments provided by GPs, either at the GP’s office or as emergency primary healthcare outside normal office hours. We present primary healthcare use as number of visits. Costs of primary care were also retrieved from KUHR.

Secondary care

For each secondary care treatment provided at a hospital in Norway, the patient’s diagnosis and the treatment provided are registered in NPR, including information on whether inpatient or outpatient treatment was provided. All patient-related activity in hospitals is grouped into approximately 900 diagnosis-related groups (DRGs), which reflect the treatment provided and its associated mean cost across the hospitals that provide the treatment [27]. DRG costs include direct costs associated with the treatment of the disease, cost of complications during the hospital stays, and overhead costs. Additionally, we retrieved laboratory and radiology costs and patients’ co-payments from KUHR. We used information on all hospital inpatient (including day and overnight treatments) and outpatient treatments, number of days in the hospital, and total costs during the last six months of life as estimated by DRGs.

Home- and community-based care

All Norwegian municipalities must provide information to IPLOS [ 24 ]. We retrieved information on the number of days individuals spent in care institutions during their last six months of life. Additionally, we obtained information regarding whether individuals received home-based care in the form of practical or nursing assistance, which was measured in hours.

Healthcare costs

We used a healthcare perspective and present estimated costs in 2013 euros (€), converted using the 2013 annual exchange rate. All costs were estimated at the patient level.

To estimate the costs of primary care services, we used information on reimbursement claims and patient co-payments, which are recorded in KUHR for each GP consultation and emergency primary care visit. Costs were estimated by dividing the sum of claims and patient co-payments by 0.3. This is in line with recommendations from the Norwegian Directorate of Health, which estimated that the claims and co-payments recorded in KUHR reflect approximately 30% of the total cost of primary care [28]. Other guidelines suggest dividing by 0.5 [29], but a recent study found that this resulted in an underestimation of actual costs [30].
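The primary-care cost rule above is a single scaling step. A minimal Python sketch, assuming one patient's recorded claim and co-payment amounts are already available as plain lists; the function name and the example amounts are hypothetical, while the 0.3 divisor is the one described in the text.

```python
# Minimal sketch of the primary-care cost scaling described above.
# Assumption: recorded claim and co-payment amounts for one patient are
# already extracted into lists; names and amounts are illustrative only.

KUHR_SHARE = 0.3  # recorded claims + co-payments taken to cover ~30% of total cost


def primary_care_cost(claims, copayments, kuhr_share=KUHR_SHARE):
    """Scale recorded KUHR claims and co-payments up to an estimated total cost."""
    if not 0 < kuhr_share <= 1:
        raise ValueError("kuhr_share must be in (0, 1]")
    recorded = sum(claims) + sum(copayments)
    return recorded / kuhr_share


claims = [30, 30, 60]        # hypothetical per-visit claims
copayments = [15, 15, 15]    # hypothetical per-visit co-payments
print(primary_care_cost(claims, copayments))                  # divisor 0.3
print(primary_care_cost(claims, copayments, kuhr_share=0.5))  # alternative divisor 0.5
```

With the alternative 0.5 divisor, the same records yield a lower estimate, which is the direction of the underestimation the text attributes to that guideline.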

Secondary care costs were estimated by multiplying DRG weights by the yearly unit price of a DRG weight. The costs of radiology and laboratory services are recorded in KUHR. As with the other KUHR-based estimates, we summed the costs of radiology and laboratory services and patient co-payments and divided the total by 0.3 [28, 31]. We added these costs to the patient-level hospital costs.
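The secondary-care estimate combines two components: DRG-based hospital costs and scaled-up KUHR records. The Python sketch below shows that combination under simple assumptions: the DRG weights, the yearly unit price, and the KUHR amounts are hypothetical inputs for one patient, and the function name is illustrative rather than taken from the study.

```python
# Sketch of the secondary-care cost estimate for one patient.
# Assumption: inputs are already extracted from NPR (DRG weights) and KUHR
# (radiology/laboratory costs and co-payments); values below are illustrative.

KUHR_SHARE = 0.3  # same scaling factor as for other KUHR-based estimates


def secondary_care_cost(drg_weights, drg_unit_price, kuhr_amounts,
                        kuhr_share=KUHR_SHARE):
    """Hospital cost from DRG weights plus scaled-up lab/radiology records.

    drg_weights    -- DRG weights of the patient's hospital episodes
    drg_unit_price -- yearly unit price of one DRG weight
    kuhr_amounts   -- recorded radiology/laboratory costs and co-payments
    """
    hospital_cost = sum(drg_weights) * drg_unit_price
    lab_and_radiology_cost = sum(kuhr_amounts) / kuhr_share
    return hospital_cost + lab_and_radiology_cost


# Two hypothetical episodes with a hypothetical unit price.
print(secondary_care_cost([1.5, 0.5], 5000, [30, 15]))
```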

To calculate costs of home- and community-based care, we multiplied days in care institutions by SSB’s official corrected gross operating expenses, published in KOSTRA (Municipality-State-Reporting) [25]. To estimate the costs of practical and nursing assistance, we multiplied the number of hours of each type of care service that individuals received by the corresponding cost per hour, as estimated by Langeland and colleagues [31].
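The home- and community-based care estimate is a sum of quantity-times-price terms. A short Python sketch, assuming the KOSTRA-based institution expense is expressed per day and that all unit costs are hypothetical placeholders rather than the published SSB or Langeland values:

```python
# Sketch of the home- and community-based care cost for one patient.
# Assumptions: the institution cost is a per-day figure, and every unit cost
# passed in is a hypothetical placeholder, not an official value.


def home_community_cost(institution_days, cost_per_institution_day,
                        practical_hours, cost_per_practical_hour,
                        nursing_hours, cost_per_nursing_hour):
    """Estimate home- and community-based care cost for one patient."""
    institution = institution_days * cost_per_institution_day  # days x daily cost
    practical = practical_hours * cost_per_practical_hour       # hours x hourly cost
    nursing = nursing_hours * cost_per_nursing_hour             # hours x hourly cost
    return institution + practical + nursing


print(home_community_cost(10, 300, 18, 40, 56, 60))  # 7080
```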

We estimated total healthcare costs by adding the costs in primary, secondary, and home- and community-based care. Variables of healthcare use and costs are detailed in Table 2. To estimate country-specific costs, readers can multiply their country-specific unit costs by the healthcare use estimates for all decedents presented in Table 2, which are decomposed by cause of death and by age (younger than 80 years and 80 years or older) in the detailed Supplementary Material 1–3.
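Both steps in the paragraph above, summing the three care levels and re-pricing use estimates with another setting's unit costs, are simple arithmetic. The Python sketch below is illustrative: the function names and item keys are hypothetical, the use quantities in the example are the average figures reported in the Results (9 GP visits, 18 h of practical assistance, 56 h of nursing assistance), and the unit costs are placeholders.

```python
from typing import Dict


def total_cost(primary: float, secondary: float, home_community: float) -> float:
    """Total cost is the sum over the three levels of formal care."""
    return primary + secondary + home_community


def reprice(use: Dict[str, float], unit_costs: Dict[str, float]) -> float:
    """Multiply each healthcare-use quantity by a context-specific unit cost."""
    missing = set(use) - set(unit_costs)
    if missing:
        raise KeyError(f"no unit cost for: {sorted(missing)}")
    return sum(quantity * unit_costs[item] for item, quantity in use.items())


# Average use over the last six months (from the Results); unit costs are hypothetical.
use = {"gp_visits": 9, "practical_hours": 18, "nursing_hours": 56}
unit_costs = {"gp_visits": 50, "practical_hours": 40, "nursing_hours": 60}
print(reprice(use, unit_costs))  # 4530
```

Raising an error for items without a unit cost avoids silently dropping a care component from the re-priced total.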

Place of living

Based on data from NPR [ 22 ] and IPLOS [ 24 ], we estimated how many days individuals spent at home, in care institutions—including short-term care and long-term care institutions (i.e., nursing homes, sheltered housing, other round-the-clock care, and sheltered housing with 24-hour care)—and in hospitals during their last six months of life. The number of days at home was estimated by subtracting days in hospitals and in care institutions from 186 days, which corresponds to six months. We allowed days in hospitals and in care institutions to overlap, since patients who receive treatment in hospitals often keep their place in their long-term care institution.
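The days-at-home rule above can be written as one subtraction. A minimal Python sketch with a hypothetical function name: following the text, hospital and institution days are both subtracted as recorded even when they overlap, and the clamp at zero is a safeguard added for this sketch, not something the text describes.

```python
DAYS_IN_SIX_MONTHS = 186  # value used in the text for six months (6 x 31 days)


def days_at_home(hospital_days: int, institution_days: int,
                 period_days: int = DAYS_IN_SIX_MONTHS) -> int:
    """Days at home = period length minus hospital and institution days.

    Overlapping hospital/institution days are subtracted as recorded, as in
    the text. The clamp at zero is an added safeguard (an assumption).
    """
    return max(0, period_days - hospital_days - institution_days)


# Illustrative inputs only.
print(days_at_home(13, 76))  # 97
```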

Statistical analysis

We used descriptive statistics to summarise average healthcare use and costs during individuals’ last six months of life. We present healthcare use and costs for the whole six-month period before death (total) as well as for three time periods: 6 to 4 months, 3 to 2 months, and 1 month before death. To enable comparison between time periods, we present healthcare use and costs as average resource use and costs per month for all time periods. We present results for all decedents as well as stratified by cause of death. For all causes of death, we describe healthcare use and costs separately for those aged 80 years or older and for those younger than 80 years at the time of death. We provide supplementary materials with detailed cause-specific healthcare use and costs at all levels of formal care for the time periods 6 to 4 months, 3 to 2 months, and 1 month before death for all decedents (Supplementary Material 1), for those younger than 80 years (Supplementary Material 2 & 4), and for those aged 80 years or older (Supplementary Material 3 & 4). To estimate healthcare costs for other countries or contexts, our variables on resource use can be multiplied by country- or context-specific unit costs.
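The per-month normalisation and the cause-by-age stratification described above can be sketched in a few lines of Python. This is an illustration under assumptions, not the study's analysis code: the record keys (`cause`, `age`, `value`), the function names, and the example values are hypothetical, while the period lengths (three, two, and one month) and the 80-year cut-off follow the text.

```python
from collections import defaultdict
from statistics import mean

# Number of months in each reporting period (from the text).
PERIOD_MONTHS = {"6-4 months": 3, "3-2 months": 2, "last month": 1}


def per_month(period_totals):
    """Convert period totals into average cost (or use) per month."""
    return {p: total / PERIOD_MONTHS[p] for p, total in period_totals.items()}


def age_group(age_at_death):
    """Two strata: 80 years or older vs. younger than 80 years."""
    return "80+" if age_at_death >= 80 else "<80"


def mean_by_cause_and_age(records):
    """Average a per-person value within each (cause, age group) stratum.

    records -- iterable of dicts with hypothetical keys 'cause', 'age', 'value'
    """
    strata = defaultdict(list)
    for rec in records:
        strata[(rec["cause"], age_group(rec["age"]))].append(rec["value"])
    return {key: mean(values) for key, values in strata.items()}


# Illustrative inputs only.
print(per_month({"6-4 months": 12000, "3-2 months": 16000, "last month": 18000}))
print(mean_by_cause_and_age([
    {"cause": "Cancer", "age": 72, "value": 50000},
    {"cause": "Cancer", "age": 75, "value": 40000},
    {"cause": "Cancer", "age": 85, "value": 30000},
]))
```

Dividing by the number of months in each period is what makes the three unequal periods comparable.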

Results

Between 2009 and 2013, a total of 207,299 individuals died in Norway, or approximately 41,000 individuals per year. The majority of those who died were aged 80 years or older at the time of death (Table 3). We list the categories of underlying cause of death in order of prevalence: Diseases of the circulatory system (31%), Cancer (26%), Diseases of the respiratory system (10%), Injuries (6%), Mental and behavioural diseases (5%), Diseases of the nervous system and sense organs (4%), Diseases of the digestive system (3%), Endocrine, nutritional, and metabolic diseases (2%), Communicable diseases (2%), and Other diseases (10%). Dementia was the most common underlying cause of death in both Mental and behavioural diseases (Unspecified dementia 77% + Vascular dementia 8%) and Diseases of the nervous system and sense organs (Alzheimer’s disease 45%) (Table 1). The most common causes of death in the other categories are listed in Table 1.

All decedents

For the 207,299 decedents, the average healthcare cost per individual in the last six months of life was €46,166. The majority of healthcare resources were used in home- and community-based care (63%), followed by secondary care (35%) and primary care (2%). As death approached, healthcare use increased across all levels of care. On average, individuals used €17,801 in the last month of life, compared with €7,816 per month in the 3 to 2 months before death and €4,244 per month in the 6 to 4 months before death (Table 2). During their last six months of life, individuals spent most days at home (52%) and in care institutions (41%), and the fewest days in hospital (7%) (Table 2). The number of days individuals spent at home per month decreased as death approached (-6 days); correspondingly, the average number of days individuals spent in care institutions (+4 days) and in hospital (+3 days) increased over the same period (Table 2).
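Because the figures above are per-month averages over periods of three, two, and one month, they can be recombined into a six-month total as a consistency check. A short Python sketch using only the figures reported in this paragraph; the recombined €46,165 differs from the reported €46,166 by €1, presumably due to rounding, and the last month's share of roughly 39% is consistent with the "nearly 40%" stated in the abstract.

```python
# Per-month average costs (EUR) reported above, and months in each period.
per_month_cost = {"6-4 months": 4244, "3-2 months": 7816, "last month": 17801}
months_in_period = {"6-4 months": 3, "3-2 months": 2, "last month": 1}

# Recombine per-month averages into a six-month total.
six_month_total = sum(per_month_cost[p] * months_in_period[p] for p in per_month_cost)

# Share of six-month costs incurred in the last month of life.
last_month_share = per_month_cost["last month"] / six_month_total

print(six_month_total)             # 46165
print(round(last_month_share, 3))  # 0.386
```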

On average, individuals received 2 inpatient and 3 outpatient treatments, visited their GP 9 times and had 3 emergency primary healthcare visits during their last six months of life (Table  2 ). They received 18 h of practical assistance and 56 h of nursing assistance during their last six months of life (Table  2 ). Similar to costs, healthcare use increased as death approached.

By cause of death

Average total healthcare costs in the last six months of life varied by cause of death, ranging from €32,276 (Injuries) to €64,123 (Diseases of the nervous system) (Fig. 1). Costs were lowest in primary care and highest in home- and community-based care for all causes of death except cancer, for which costs were highest in secondary care (Fig. 1). Individuals used different healthcare services depending on their cause of death. For example, individuals dying with endocrine/nutritional/metabolic diseases and individuals dying with cancer both used on average approximately €48,000 in the last six months of life; however, when total costs are decomposed by care level, cancer patients used more than twice as much in secondary care (€28,655) as individuals with endocrine/nutritional/metabolic diseases (€10,931), who in turn used about twice as many resources in home- and community-based care (€36,262) as cancer patients (€18,454; Fig. 1 & Supplementary Material 1). Individuals dying with mental and nervous diseases, mostly dementia, received 86–92% of their care in the last six months of life outside secondary care, mostly in home- and community-based care. In contrast, individuals with digestive diseases or injuries used a smaller share of resources in home- and community-based care (38% and 58%, respectively; Supplementary Material 1).

Figure 1. Total healthcare costs by level of care and cause of death

Place of living differed by cause of death. While individuals dying with communicable diseases, circulatory diseases, digestive diseases, injuries, or other diseases spent most days at home, individuals dying with mental and nervous diseases spent most days in care institutions. The number of days in hospital in the last six months of life varied considerably, from 3 days in hospital for patients with dementia to 24 days in hospital for cancer patients (Fig. 2 & Supplementary Material 1 ). Individuals with communicable diseases, respiratory diseases, and digestive diseases spent 12 to 15 days in hospital, while individuals with endocrine/nutritional/metabolic diseases, nervous diseases, circulatory diseases, and injuries spent 6 to 9 days in the hospital in the last six months of life (Fig.  2 & Supplementary Material 1 ).

Figure 2. Place of living in the last six months of life by cause of death

Individuals dying with nervous diseases, including Parkinson’s and Alzheimer’s disease, used more practical (72 h) and nursing (110 h) assistance than those dying from other causes of death (Supplementary Material 1). The amount of nursing assistance received by individuals with injuries was the lowest, at 15 h, while cancer patients received the least practical assistance, at 10 h (Supplementary Material 1). On average, individuals with cancer received the highest number of inpatient and outpatient treatments and GP consultations, while individuals with mental and nervous diseases received the fewest (Supplementary Material 1).

Compared to the average cost in the last month of life (€17,800; Table  2 ), higher costs were observed for those dying with communicable, mental, nervous, endocrine/nutritional/metabolic, and respiratory diseases (Fig.  3 , Supplementary Material 1 ). In the last month of life, dying with nervous diseases was associated with the highest average costs (€29,000), while the lowest costs were observed for those dying with injuries (€11,000) (Fig.  3 , Supplementary Material 1 ). For individuals dying with all causes except cancer, home- and community-based care constituted approximately 80% of care in the last month of life. For individuals dying with mental and nervous diseases, 91–95% of care in the last month of life was provided through home- and community-based care (Fig.  3 , Supplementary Material 1 ). For detailed estimates of healthcare use and costs for all levels of care, for all causes of death and for all age groups, we refer to our comprehensive Supplementary Materials.

Figure 3. Healthcare costs in the last month of life by level of care and cause of death

The total healthcare cost during the last six months of life for individuals who died before the age of 80 years was €42,053, distributed as follows: 40% in home- and community-based care, 57% in secondary care, and 3% in primary care (Table 2, Supplementary Material 2). For individuals who died at the age of 80 years or older, average total healthcare costs amounted to €49,901, with 77% spent in home- and community-based care, 21% in secondary care, and 2% in primary care (Table 2, Supplementary Material 3). Home- and community-based care was the dominant form of care for those aged 80 years and older, regardless of the cause of death (Table 2, Supplementary Material 3 & 4). Among those younger than 80 years, however, the level of care varied depending on the cause of death (Supplementary Material 2 & 4). For instance, for those aged 80 years or older, the proportion of overall expenses allocated to home- and community-based care ranged from 54% for individuals with cancer to 94% for individuals with mental and nervous diseases, mostly dementia (Supplementary Material 3), whereas for those younger than 80 years at the time of death, this proportion ranged from 25% (cancer) to 83% (mental and nervous diseases) (Supplementary Material 2 & 4). We provide comparable data for all causes of death by age (Supplementary Material 2–3), including a figure comparing age groups (Supplementary Material 4).

Discussion

Healthcare use and costs differed by level of care, cause of death, age at death, and time to death. For all individuals who died in Norway between 2009 and 2013, the average total cost was €46,000 in the last six months of life. For all decedents, the majority of healthcare resources in the last six months of life were used in home- and community-based care (63%, Fig. 1; Table 2). Whether most care was utilised in home- and community-based care or in secondary care differed by cause of death and by age (Supplementary Material 1–4). Those who died aged 80 years or older used mostly home- and community-based care across all causes of death (Supplementary Material 3 & 4). For those who died before the age of 80 years, home- and community-based care predominated only among individuals dying with mental and nervous diseases (Supplementary Material 2 & 4).

For all decedents, across all age groups, resource use increased as the time to death shortened (Table 2, Supplementary Material 1). On average, the last month of life accounted for nearly 40% of all healthcare costs incurred in the last six months of life (Table 2). The costs associated with dying from injuries, circulatory diseases, and other diseases were lower than the average costs during the last six months of life, most likely due to sudden death (Supplementary Material 1). In contrast, individuals who died from mental and nervous diseases, communicable diseases, and respiratory diseases were more likely to have received care for a longer period of time before death, resulting in higher-than-average healthcare costs in the last months of life. Individuals dying with cancer, digestive diseases, and endocrine/nutritional/metabolic diseases had close to average costs during the last six months of life (Supplementary Material 1 & 4).

Our findings have important implications for decision-makers who are responsible for resource allocation in healthcare, as well as for healthcare planners who have to anticipate future healthcare needs. In the future, improved survival from some diseases will likely shift the causes of death at the population level; for example, if improvements in cancer treatment prevent cancer-related deaths, more individuals will die from other diseases later in life rather than from cancer. Our analysis provides knowledge on resource use and costs associated with diseases beyond cancer that are common in older age, such as dementia. Dementia is currently the seventh-leading cause of death worldwide, and its prevalence is expected to double every 20 years [19, 32]. Dementia is estimated to be one of the costliest diseases globally [33].

Kinge and colleagues estimated that dementia was the disease with the highest health spending in Norway in 2019, at 10.2% of total national health spending [30]. Evidence that facilitates assessment of the cost-effectiveness of new dementia drugs and helps in planning the expected need for relevant healthcare is urgently needed around the world. We found that individuals with dementia used an above-average amount of healthcare resources in the last six months of life and that approximately 90% of these resources were used in home- and community-based care. These findings are in line with a 2023 Norwegian population-based registry study, which found that 78% of healthcare expenses related to dementia were spent on nursing homes [30]. Similarly, a systematic review found that individuals with dementia used more resources for professional home care and nursing facilities than individuals with other diseases [18]. This type of cause-specific evidence can help healthcare planners prepare for future demands.

The validity of a decision-analytic model depends on the validity of the data used to populate it. In the absence of cause-specific estimates of resource use and costs, modellers habitually use proxy parameters available in the existing literature, or generic unit costs. Our study indicates that using proxy data from other disease types can be problematic: if cancer patients’ resource use is used to model resource use for dementia patients, results will be systematically biased, particularly the share of resource use taken up by home- and community-based care (38% for cancer patients vs. 92% for dementia patients) (Supplementary Materials 1–4). Modellers should always strive to provide a complete picture of relevant disease pathways and to include the real-world economic burden of care at all levels over the entire lifespan [34]. Because of gaps in knowledge regarding healthcare use and costs, this is not yet feasible for all patient groups. Our findings enable the use of cause-specific estimates instead of proxy parameters, which can improve estimates of resource use, the models built on them, and thus decisions on allocating healthcare resources in various settings.
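To make the size of that proxy bias concrete, the sketch below splits one total cost using the cancer share of home- and community-based care (38%) and then the dementia share (92%), both taken from the paragraph above. The total of 100,000 is a hypothetical placeholder, not a figure from the study, and the function name is illustrative only.

```python
def home_care_cost(total_cost: float, home_share: float) -> float:
    """Cost attributed to home- and community-based care for a given share."""
    return total_cost * home_share

# Shares quoted in the text (Supplementary Materials 1-4).
CANCER_HOME_SHARE = 0.38
DEMENTIA_HOME_SHARE = 0.92

total = 100_000.0  # hypothetical total cost per decedent, for illustration only
proxy = home_care_cost(total, CANCER_HOME_SHARE)     # cancer used as a proxy
actual = home_care_cost(total, DEMENTIA_HOME_SHARE)  # dementia-specific share
shortfall = actual - proxy  # home-care cost the proxy model would miss

print(f"proxy={proxy:.0f}, dementia-specific={actual:.0f}, shortfall={shortfall:.0f}")
```

Whatever total is used, the proxy assigns 54 percentage points too little of it to home- and community-based care.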

Previous studies on resource use and costs in the last months of life have often focused selectively on single causes of death and specific care variables, mainly secondary care variables. Methodological differences in samples, time frames, and healthcare settings make it difficult to compare parameters across studies. It is not possible to explain the variance in healthcare use and costs between previous studies and our findings based on the descriptive analyses we performed; nevertheless, it is helpful to put our findings into context. In the following, we focus solely on dementia, as it would be overwhelming to discuss findings for all causes of death.

The PAID 3.0, a Dutch tool initially created to incorporate future disease costs in economic evaluations, provides annual healthcare costs from the Netherlands, stratified by ICD-10 codes, age, and time to death [35]. These data are based on Dutch cost-of-illness data published in 2017 [36]. Using PAID, the total average healthcare cost in the last year of life for individuals with mental and behavioural diseases (F00–99) is estimated at €57,018 [35]. When we adjust our total cost estimate for mental diseases from 2013 to 2017 price levels, the two estimates are very similar (PAID: €57,018 vs. €58,736). The same is true for secondary care costs for individuals with mental diseases (PAID: €11,192 vs. €12,025), while for home- and community-based care, the Dutch estimate is higher than ours (PAID: €45,826 vs. €39,891). The PAID data cover the entire last year of life, while our findings summarize costs for the last six months of life; however, since most healthcare costs occur as death approaches, we consider the comparison with PAID data valuable despite the different time frames.
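The paragraph calls the total and secondary-care estimates "very similar". The sketch below expresses each gap as a percentage of the PAID figure, using only the euro amounts quoted above. It makes no correction for the different time frames (one year in PAID vs. six months in this study), and `relative_difference` is a name chosen for illustration.

```python
# Costs (EUR) for mental diseases quoted above: (PAID 3.0, this study adjusted to 2017).
comparisons = {
    "total": (57_018, 58_736),
    "secondary care": (11_192, 12_025),
    "home- and community-based care": (45_826, 39_891),
}

def relative_difference(reference: float, estimate: float) -> float:
    """Difference of `estimate` from `reference`, as a percentage of `reference`."""
    return (estimate - reference) / reference * 100

for level, (paid, ours) in comparisons.items():
    print(f"{level}: {relative_difference(paid, ours):+.1f}%")
```

The output shows gaps of about +3% for total costs, +7% for secondary care, and about -13% for home- and community-based care.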

In a recently published systematic review, Sontheimer and colleagues examined the costs of dementia from the time of diagnosis until death across different studies [18]. They found substantial variation in total cost estimates, ranging from €1385 per person for 104 dementia patients in Argentina [37] to €48,655 per person for 541 dementia patients in residential care in Australia [38]. This wide range emphasises the importance of studies (like ours) that estimate healthcare costs within a common methodological framework. The reviewed studies support our finding that individuals with dementia receive most care through home- and community-based care: patients with dementia had significantly higher costs for nursing facilities and professional home care than patients without dementia. Interestingly, the total costs for inpatient and outpatient treatment were similar for patients with and without dementia. This supports our conclusion that the additional burden associated with dementia, compared with other causes of death, arises from demand for home- and community-based care, and it highlights the importance of reflecting healthcare use and costs from home- and community-based care in decision-analytic models.

Our findings might raise the question of whether our grouping of causes of death was detailed enough. For data anonymity reasons, the grouping of decedents into these categories of cause of death was predefined by the registries before the data were delivered to the researchers. We are nevertheless confident in the present grouping, since its categories cover the major causes of death and span a wider range of causes than is common in previous studies. An earlier study estimated healthcare use and costs for individuals dying with different types of cancer and showed that the specific cancer type was less influential than other factors, such as individuals’ age and access to informal care [12]; whether this also holds for subgroups of other causes of death could not be assessed with our dataset and remains largely unknown.

Generalisability

Some aspects regarding the generalisability of our findings must be discussed. First, our data come from 2009 to 2013; this time lag arose because it took years to obtain access to comprehensive registry data. Since that period, several changes might have influenced individuals’ healthcare use in the last months of life. For instance, life-prolonging treatments might have increased survival, and patients who die today might differ from those who died in 2009–2013. Individuals dying today might be older, or they might die from different causes, which can influence healthcare use. In addition, societal changes might have shifted individuals’ healthcare use. Importantly, Norway (along with other countries) is increasingly encouraging a shift of treatment from secondary care to more local levels (i.e., the municipality); consequently, patients are meant to spend less time in hospitals, while stays in municipal care institutions are likely to increase. New analyses of updated data are needed to evaluate whether this has happened. To our knowledge, our estimates are currently the most comprehensive and up-to-date estimates of resource use and costs for all decedents and for all causes of death.

Second, our findings can be generalised to settings similar to Norway, where healthcare is universally covered, out-of-pocket payments are relatively low, and it is common to use formal care at the end of life. In healthcare settings that differ in the incidence and severity of diseases, available healthcare resources, clinical practices, and relative price levels, our findings on healthcare use can still be informative [39]. To facilitate the adaptation of our results to other countries, we have reported our results for healthcare use and costs separately in Supplementary Materials 1–4. This enables readers to multiply our estimates of healthcare use by their own country-specific unit costs.
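The adaptation described above amounts to a quantity × unit-cost sum over services. The sketch below is a minimal version under stated assumptions. The service names, quantities, and unit costs are hypothetical placeholders, not values from the Supplementary Materials. A missing unit cost raises an error so that a service is never silently costed at zero.

```python
from typing import Dict

def cost_from_use(use: Dict[str, float], unit_costs: Dict[str, float]) -> float:
    """Sum quantity x unit cost over services; fail loudly if a unit cost is missing."""
    missing = set(use) - set(unit_costs)
    if missing:
        raise KeyError(f"no unit cost for: {sorted(missing)}")
    return sum(qty * unit_costs[service] for service, qty in use.items())

# Hypothetical per-decedent use in the last six months (not figures from the study).
use = {"gp_consultations": 4.0, "hospital_bed_days": 10.0, "nursing_home_days": 30.0}
# Hypothetical unit costs in some target country's currency.
unit_costs = {"gp_consultations": 50.0, "hospital_bed_days": 900.0, "nursing_home_days": 300.0}

print(cost_from_use(use, unit_costs))
```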

Third, we are aware that informal caregivers carry a considerable burden when individuals approach the end of their lives [40, 41]. Cultural differences with regard to how much informal care families provide during this period will influence findings reporting the use of formal healthcare. In a study evaluating the number of individuals who died at home, Cohen and colleagues (2010) found that home death for persons dying with cancer varied from 12.8% in Norway to 22% in England, 23% in Wales, 28% in Belgium, 36% in Italy, and 45% in the Netherlands [42]. In 2022, 15% of all those who died from cancer in Norway died in private homes [21]. Place of death is likely connected to where individuals receive care; consequently, the amount of informal care and that of formal healthcare use might differ between these countries. In societies in which informal care is the dominant form of care in the last months of life, our findings can still be of interest, but they should be generalised with caution.

Finally, we consider it worth mentioning that it is challenging for physicians to identify the correct immediate cause of death. For this reason, we chose to use the underlying cause of death in our analysis. Still, using the Norwegian Causes of Death Register (CDR) as the source of cause of death has limitations, primarily related to coding [43]: for example, different physicians may code multimorbid patients in different ways. We validated the underlying cause of death for all individuals with cancer by comparing the ICD-10 codes provided in the CDR [21] with those in the Cancer Registry of Norway [44]. We found a reassuring overlap, which gives us confidence that the CDR provided reliable information for all causes of death.
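The text does not say how the overlap between the two registries was measured. One simple option is percentage agreement among individuals present in both sources, and the sketch below implements that option only. The comparison at the three-character ICD-10 category level (e.g. "C34"), the function name, and the records are all hypothetical assumptions for illustration.

```python
from typing import Dict

def code_agreement(cdr: Dict[str, str], cancer_registry: Dict[str, str],
                   three_char: bool = True) -> float:
    """Share of individuals found in both sources whose ICD-10 codes match.

    With three_char=True, codes are compared at the three-character
    category level (an assumed choice for this sketch).
    """
    shared = cdr.keys() & cancer_registry.keys()
    if not shared:
        raise ValueError("no individuals in common")

    def normalise(code: str) -> str:
        code = code.replace(".", "").upper()
        return code[:3] if three_char else code

    matches = sum(normalise(cdr[pid]) == normalise(cancer_registry[pid]) for pid in shared)
    return matches / len(shared)

# Hypothetical records keyed by an anonymised person ID.
cdr = {"p1": "C34.1", "p2": "C50.9", "p3": "C18.7"}
registry = {"p1": "C34.9", "p2": "C50.4", "p3": "C20"}

print(code_agreement(cdr, registry))                    # 2 of 3 agree at category level
print(code_agreement(cdr, registry, three_char=False))  # full codes: none agree
```

Choosing the comparison level matters: the same records give 2/3 agreement by category and zero by full code.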

We report a comprehensive picture of the quantity of healthcare used during the last six months of life. At the same time, we acknowledge the relevance of assessing the quality of care. More research is needed to explore to what extent end-of-life care aligns with the preferences of patients and their next-of-kin. Unfortunately, our current dataset does not provide answers to these important questions, but we are optimistic that we can address them in future studies.

Using comprehensive, population-based registry data, we described healthcare use and costs in the last six months of life by level of care, for all decedents and stratified by ten major ICD-10 categories summarising all causes of death. Our research shows that healthcare use and costs in the last six months of life differ depending on cause of death: the total amount of healthcare varies, as does the level of care at which most resources were used (primary, secondary, or home- and community-based care). These findings enable decision-makers to make better-informed decisions about resource allocation and healthcare planners to better anticipate future healthcare needs.

Data availability

Legal restrictions apply to the availability of the data underpinning the findings of this study, which were used under license for the current study. The data is not available upon request from the authors, and it cannot be made available to referees, editors, or readers upon request.

To preserve anonymity, we did not receive data for shorter time periods from the registries.

We have divided the numbers for the 3-month periods (e.g., 6-4 months before death) by 3 to obtain monthly estimates. For the entire analysis, we assumed no healthcare use for missing registrations.
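The two rules in this note can be written as a one-line transformation: each 3-month total is divided by 3, and a missing registration counts as zero use. The sketch below assumes `None` marks a missing registration. The function name and the nursing-home figures are hypothetical.

```python
from typing import List, Optional, Sequence

def monthly_estimates(period_totals: Sequence[Optional[float]]) -> List[float]:
    """Turn 3-month totals into per-month averages; missing registrations count as zero use."""
    return [(0.0 if total is None else total) / 3 for total in period_totals]

# Hypothetical nursing-home days for months 6-4 and 3-1 before death.
print(monthly_estimates([None, 45.0]))  # [0.0, 15.0]
```

Treating missing registrations as zero gives lower-bound estimates if some use went unregistered.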

Abbreviations

CDR: The Norwegian Causes of Death Register

DRG: Diagnosis-related group

ICD: International Classification of Diseases

GP: General Practitioner

IPLOS: The Individual-based Statistics for Nursing and Care Services Register

KOSTRA: The Municipality-State-Reporting

KUHR: Norwegian Control and Payment of Health Reimbursements Database

NPR: The Norwegian Patient Register

WHO: World Health Organisation

Diernberger K, Luta X, Bowden J, Fallon M, Droney J, Lemmon E, Gray E, Marti J, Hall P. Healthcare use and costs in the last year of life: a national population data linkage study. BMJ Support Palliat Care. 2021:bmjspcare-2020-002708. https://doi.org/10.1136/bmjspcare-2020-002708.

Jo M, Lee Y, Kim T. Medical care costs at the end of life among older adults with cancer: a national health insurance data-based cohort study. BMC Palliat Care. 2023;22(1):76. https://doi.org/10.1186/s12904-023-01197-2.

Chastek B, Harley C, Kallich J, Newcomer L, Paoli CJ, Teitelbaum AH. Health care costs for patients with cancer at the end of life. J Oncol Pract. 2012;8(6S):s75–80.

Sun L, Legood R, dos-Santos-Silva I, Mathur Gaiha S, Sadique Z. Global treatment costs of breast cancer by stage: a systematic review. PLoS ONE. 2018;13(11):e0207993. https://journals.plos.org/plosone/article/file?id=10.1371/journal.pone.0207993&type=printable.

Bremner KE, Krahn MD, Warren JL, Hoch JS, Barrett MJ, Liu N, Barbera L, Yabroff KR. An international comparison of costs of end-of-life care for advanced lung cancer patients using health administrative data. Palliat Med. 2015;29(10):918–28.

Dover LL, Dulaney CR, Williams CP, Fiveash JB, Jackson BE, Warren PP, Rocque GB. Hospice care, cancer-directed therapy, and Medicare expenditures among older patients dying with malignant brain tumors. Neuro Oncol. 2018;20(7):986–93.

Kyeremanteng K, Ismail A, Wan C, Thavorn K, D’Egidio G. Outcomes and cost of patients with terminal cancer admitted to acute care in the final 2 weeks of life: a retrospective chart review. Am J Hospice Palliat Med. 2019;36(11):1020–5.

Reeve R, Srasuebkul P, Langton JM, Haas M, Viney R, Pearson SA. Health care use and costs at the end of life: a comparison of elderly Australian decedents with and without a cancer history. BMC Palliat Care. 2018;17:1–10.

Shen C, Dasari A, Gu D, Chu Y, Zhou S, Xu Y, Shih YCT. Costs of Cancer Care for Elderly patients with neuroendocrine tumors. PharmacoEconomics. 2018;36:1005–13.

Langton JM, Blanch B, Drew AK, Haas M, Ingham JM, Pearson SA. Retrospective studies of end-of-life resource utilization and costs in cancer care using health administrative data: a systematic review. Palliat Med. 2014;28(10):1167–96.

Tanuseputro P, Wodchis WP, Fowler R, Walker P, Bai YQ, Bronskill SE, Manuel D. The health care cost of dying: a population-based retrospective cohort study of the last year of life in Ontario, Canada. PLoS ONE. 2015;10(3):e0121759.

Bjørnelv G, Hagen TP, Forma L, Aas E. Care pathways at end-of-life for cancer decedents: registry based analyses of the living situation, healthcare utilization and costs for all cancer decedents in Norway in 2009–2013 during their last 6 months of life. BMC Health Serv Res. 2022;22(1):1221.

Van Bulck L, Goossens E, Morin L, Luyckx K, Ombelet F, Willems R, Budts W, De Groote K, De Backer J, Annemans L, Moniotte S, de Hosson M, Marelli A, Moons P. Last year of life of adults with congenital heart diseases: causes of death and patterns of care. Eur Heart J. 2022;43(42):4483–92. https://doi.org/10.1093/eurheartj/ehac484.

Levy SA, Pedowitz E, Stein LK, Dhamoon MS. Healthcare Utilization for Stroke Patients at the end of life: nationally Representative Data. J Stroke Cerebrovasc Dis. 2021;30(10):106008. https://doi.org/10.1016/j.jstrokecerebrovasdis.2021.106008.

Faes K, De Frène V, Cohen J, Annemans L. Resource Use and Health Care costs of COPD patients at the end of life: a systematic review. J Pain Symptom Manag. 2016;52(4):588–99.

Bekelman JE, Halpern SD, Blankart CR, Bynum JP, Cohen J, Fowler R, Emanuel EJ. Comparison of site of death, health care utilization, and hospital expenditures for patients dying with cancer in 7 developed countries. JAMA. 2016;315(3):272–83.

Yabroff KR, Warren JL, Brown ML. Costs of cancer care in the USA: a descriptive review. Nat Clin Pract Oncol. 2007;4(11):643–56.

Sontheimer N, Konnopka A, König HH. The excess costs of dementia: a systematic review and Meta-analysis. J Alzheimers Dis. 2021;83(1):333–54. https://doi.org/10.3233/jad-210174.

World Health Organisation. Global action plan on the public health response to dementia 2017–2025. Geneva: World Health Organization; 2017.

Ringard Å, Sagan A, Sperre Saunes I, Lindahl AK. Norway: health system review. World Health Organization; 2013.

The Norwegian Institute of Public Health. Dødsårsaksregisteret [The Norwegian Causes of Death Register]. Available from: https://www.fhi.no/hn/helseregistre-og-registre/dodsarsaksregisteret/. Accessed 30 Jan 2024.

The Norwegian Directorate of Health. Norsk Pasientregister [The Norwegian Patient Register]. Available from: https://helsedirektoratet.no/norsk-pasientregister-npr. Accessed 30 Jan 2024.

The Norwegian Directorate of Health. KUHR-databasen [The KUHR database]. Available from: https://www.helsedirektoratet.no/tema/statistikk-registre-og-rapporter/helsedata-og-helseregistre/kuhr. Accessed 30 Jan 2024.

The Norwegian Directorate of Health. IPLOS-registeret [The IPLOS register]. Available from: https://www.helsedirektoratet.no/tema/statistikk-registre-og-rapporter/helsedata-og-helseregistre/iplos-registeret. Accessed 30 Jan 2024.

Statistics Norway. Kommune-Stat-Rapportering 2013 [The Municipality-State-Reporting]. Available from: https://www.ssb.no/offentlig-sektor/kostra. Accessed 30 Jan 2024.

The Norwegian Health Economics Administration HELFO. Available from: https://www.helfo.no/english/about-helfo. Accessed 30 Jan 2024.

The Norwegian Directorate of Health. Innsatsstyrt finansiering 2016 [Activity based funding]. 2023. Available from: https://www.helsedirektoratet.no/tema/finansiering/innsatsstyrt-finansiering-og-drg-systemet/innsatsstyrt-finansiering-isf. Accessed 30 Jan 2024.

The Norwegian Directorate of Health. Samfunnskostnader ved sykdom og ulykker. Helsetap, helsetjenestekostnader og produksjonstap fordelt på diagnoser og risikofaktorer [Societal costs of diseases and accidents. Health loss, healthcare services and production loss according to diagnoses and risk factors]. 2013. Available from: https://dokter.no/PDF-filer/Fastlegetariff_2013.pdf. Accessed 30 Jan 2024.

The Norwegian Directorate of Health. Økonomisk evaluering av helsetiltak – en veileder [Economic evaluation of healthcare interventions – a guide]. 2012. Available from: https://www.helsedirektoratet.no/veiledere/okonomisk-evaluering-av-helsetiltak. Accessed 7 Mar 2024.

Kinge JM, Dieleman JL, Karlstad Ø, Knudsen AK, Klitkou ST, Hay SI, …, Vollset SE. Disease-specific health spending by age, sex, and type of care in Norway: a national health registry study. BMC Med. 2023;21(1):201.

Langeland E, Førland O, Aas E, Birkeland A, Folkestad B, Kjeken I. Modeller for hverdagsrehabilitering – en følgeevaluering i norske kommuner. Effekter for brukerne og gevinster for kommunene? [Models for everyday rehabilitation – a follow-up evaluation in Norwegian municipalities. Effects for the users and gains for the municipalities?] 2016. Available from: https://ntnuopen.ntnu.no/ntnu-xmlui/handle/11250/2389813. Accessed 30 Jan 2024.

Prince M, Wimo A, Guerchet M, Ali GC, Wu YT, Prina M. World Alzheimer Report 2015. The global impact of dementia: an analysis of prevalence, incidence, cost and trends. London: Alzheimer’s Disease International; 2015.

Launer LJ. Statistics on the burden of dementia: need for stronger data. Lancet Neurol. 2019;18:25–7.

McCaffrey N, Currow DC. Separated at birth? BMJ Support Palliat Care. 2015;5(1):2–3. https://doi.org/10.1136/bmjspcare-2015-000855.

Kellerborg K, Perry-Duxbury M, de Vries L, van Baal P. Practical guidance for including future costs in economic evaluations in the Netherlands: introducing and applying PAID 3.0. Value Health. 2020;23(11):1453–61.

Rijksinstituut voor Volksgezondheid en Milieu (RIVM) [Dutch National Institute for Public Health and the Environment]. Available from: https://www.volksgezondheidenzorg.info/. Accessed 30 Jan 2024.

Rojas G, Bartoloni L, Dillon C, Serrano CM, Iturry M, Allegri RF. Clinical and economic characteristics associated with direct costs of Alzheimer’s, frontotemporal and vascular dementia in Argentina. Int Psychogeriatr. 2011;23(4):554–61.

Gnanamanickam ES, Dyer SM, Milte R, Harrison SL, Liu E, Easton T, …, Crotty M. Direct health and residential care costs of people living with dementia in Australian residential aged care. Int J Geriatr Psychiatry. 2018;33(7):859–66.

Sculpher MJ, Pang FS, Manca A, Drummond MF, Golder S, Urdahl H, …, Eastwood A. Generalisability in economic evaluation studies in healthcare: a review and case studies. 2004.

Bauer JM, Sousa-Poza A. Impacts of informal caregiving on caregiver employment, health, and family. J Popul Ageing. 2015;8:113–45.

Bolin K, Lindgren B, Lundborg P. Informal and formal care among single-living elderly in Europe. Health Econ. 2008;17(3):393–409.

Cohen J, Houttekier D, Onwuteaka-Philipsen B, Miccinesi G, Addington-Hall J, Kaasa S, Deliens L. Which patients with cancer die at home? A study of six European countries using death certificate data. J Clin Oncol. 2010;28(13):2267–73.

Pedersen AG, Ellingsen CL. Data quality in the Causes of Death Registry. Journal of the Norwegian Medical Association. 2015. Available from: https://tidsskriftet.no/en/2015/05/perspectives/data-quality-causes-death-registry. Accessed 30 Jan 2024.

The Cancer Registry of Norway. Kreftregisteret [The Cancer Registry of Norway]. Available from: https://www.kreftregisteret.no/en/General/About-the-Cancer-Registry/. Accessed 30 Jan 2024.

Acknowledgements

We would like to thank the Norwegian Cancer Society for funding this work as part of the SAFE project (research grant number: 208164). We thank Pauline Keller for her help in editing the final manuscript. We acknowledge two anonymous reviewers for their valuable feedback that helped us to improve this article.

The research was funded by the Norwegian Cancer Association, research grant number: 208164.

Open access funding provided by Norwegian University of Science and Technology

Author information

Authors and affiliations

Department of Health Management and Health Economics, Institute of Health and Society, University of Oslo, Oslo, Norway

Yvonne Anne Michel, Eline Aas, Liv Ariane Augestad, Emily Burger & Gudrun Maria Waaler Bjørnelv

Faculty of Social Sciences, University of Applied Sciences Zittau/ Görlitz, Görlitz, Germany

Yvonne Anne Michel

Division for Health Services, Norwegian Institute of Public Health, Oslo, Norway

Center for Health Decision Science, Harvard T.H. Chan School of Public Health, Boston, MA, USA

Emily Burger

Department for Interdisciplinary Health Sciences, Institute of Health and Society, University of Oslo, Oslo, Norway

Lisbeth Thoresen

Department of Public Health and Nursing, Norwegian University of Science and Technology, Trondheim, Norway

Gudrun Maria Waaler Bjørnelv

Contributions

All authors (YAM, EA, LAA, EB, LT, GWB) created a study plan. GWB and EA applied for ethical approval and collected the data. GWB conducted analyses in cooperation with YAM, and the results were continuously discussed with EA, LAA, EB, and LT. YAM drafted the manuscript, and EA, LAA, EB, LT, and GWB reviewed the manuscript throughout the process. All authors approved the final draft. We used a large language model, DeepL Write (www.deepl.com/write), to improve the language of this article.

Corresponding author

Correspondence to Gudrun Maria Waaler Bjørnelv .

Ethics declarations

Ethics approval and consent to participate

The Norwegian Ethics Committee and the Norwegian Data Protection Authority (ref no 2013/2090), in addition to all the registry owners, approved this study. Registry owners gave us administrative permission to access and use the data. The registry owners include the Norwegian Directorate of Health, the National Institute of Public Health, and Statistics Norway. The need for informed consent was waived by the Regional Committee for Medical Research Ethics South East Norway, since data was retrieved from national registries for the purpose of research, for which informed consent is not required. We confirm that all methods were carried out in accordance with relevant guidelines and regulations.

Consent for publication

Not applicable.

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Electronic supplementary material

Supplementary Material 1: All decedents by all causes of death

Supplementary Material 2: Decedents younger than 80 years by all causes of death

Supplementary Material 3: Decedents older than 80 years by all causes of death

Supplementary Material 4: Comparing healthcare costs by age at death

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/ . The Creative Commons Public Domain Dedication waiver ( http://creativecommons.org/publicdomain/zero/1.0/ ) applies to the data made available in this article, unless otherwise stated in a credit line to the data.

About this article

Cite this article

Michel, Y.A., Aas, E., Augestad, L.A. et al. Healthcare use and costs in the last six months of life by level of care and cause of death. BMC Health Serv Res 24 , 688 (2024). https://doi.org/10.1186/s12913-024-10877-5

Received : 29 March 2023

Accepted : 19 March 2024

Published : 30 May 2024

DOI : https://doi.org/10.1186/s12913-024-10877-5

  • End-of-life
  • Healthcare use
  • Cause of death

BMC Health Services Research

ISSN: 1472-6963

primary vs secondary research article

Emergency Department Evaluation of Young Infants With Head Injury

Todd W. Lyons , Rebekah Mannix , Michael C. Monuteaux , Sara A. Schutzman; Emergency Department Evaluation of Young Infants With Head Injury. Pediatrics June 2024; 153 (6): e2023065037. 10.1542/peds.2023-065037

We compared the emergency department (ED) evaluation and outcomes of young head-injured infants to older children.

Using the Pediatric Health Information System database, we performed a retrospective, cross-sectional analysis of children <2 years old with isolated head injuries (International Classification of Diseases, 10th Revision, diagnoses) at one of 47 EDs from 2015 to 2019. Our primary outcome was utilization of diagnostic cranial imaging. Secondary outcomes were diagnosis of traumatic brain injury (TBI), clinically important TBI, and mortality. We compared outcomes between the youngest infants (<3 months old) and children 3 to 24 months old.

We identified 112 885 ED visits for children <2 years old with isolated head injuries. A total of 62 129 (55%) were by males, and 10 325 (9.1%) were by infants <3 months of age. Compared with older children (12–23 months old), the youngest infants were more likely to undergo any diagnostic cranial imaging (50.3% vs 18.3%; difference 31.9%, 95% confidence interval [CI] 28.9%–35.0%), be diagnosed with a TBI (17.5% vs 2.7%; difference 14.8%, 95% CI 13.2%–16.4%) or clinically important TBI (4.6% vs 0.5%; difference 4.1%, 95% CI 3.8%–4.5%), and die (0.3% vs 0.1%; difference 0.2%, 95% CI 0.1%–0.3%). Among those undergoing computed tomography or MRI, TBIs were significantly more common in the youngest infants (26.4% vs 8.8%; difference 17.6%, 95% CI 16.3%–19.0%).
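The abstract reports risk differences with 95% confidence intervals but does not state how the intervals were computed. One common choice for two independent proportions is the Wald (normal-approximation) interval, and the sketch below implements that method only. The counts in the example are hypothetical round numbers, not the study's data, and the function name is illustrative.

```python
import math
from statistics import NormalDist

Z_975 = NormalDist().inv_cdf(0.975)  # about 1.96 for a two-sided 95% interval

def risk_difference_wald(x1: int, n1: int, x2: int, n2: int, z: float = Z_975):
    """Return (p1 - p2, lower, upper) using the Wald normal-approximation interval."""
    p1, p2 = x1 / n1, x2 / n2
    diff = p1 - p2
    se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    return diff, diff - z * se, diff + z * se

# Hypothetical counts: imaging in 500 of 1000 vs 200 of 1000 visits.
diff, lo, hi = risk_difference_wald(500, 1000, 200, 1000)
print(f"difference {diff:.1%}, 95% CI {lo:.1%} to {hi:.1%}")
```

The Wald interval can behave poorly for very rare outcomes such as mortality near 0.1%. There, score-based alternatives (e.g. Newcombe's method) are often preferred.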

The youngest infants with head injuries are significantly more likely to undergo cranial imaging, be diagnosed with brain injuries, and die, highlighting the need for a specialized approach for this vulnerable population.

Although the youngest infants with head injuries are the most susceptible to radiation from imaging, they are also some of the most challenging to evaluate with the added layer of concern for the possibility of abusive head trauma.

This study found the youngest infants (0–3 months old) were nearly 3 times more likely to undergo cranial imaging and 9 times more likely to be diagnosed with a clinically important traumatic brain injury than older toddlers (12–24 months old).
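The "nearly 3 times" and "9 times" figures are ratios of the percentages reported in the abstract (imaging 50.3% vs 18.3%; clinically important TBI 4.6% vs 0.5%). The short check below reproduces them from those percentages only. The tuple and function names are illustrative.

```python
# Percentages from the abstract: (youngest infants, older comparison group).
imaging = (50.3, 18.3)
citbi = (4.6, 0.5)

def ratio(a: float, b: float) -> float:
    """Relative risk computed from two percentages."""
    return a / b

print(f"imaging: {ratio(*imaging):.2f}x, ciTBI: {ratio(*citbi):.1f}x")
```

Both outputs (about 2.75 and 9.2) match the rounded figures in the summary.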

Traumatic brain injury (TBI) is a leading cause of emergency department (ED) visits, morbidity, and mortality in children. However, most head injuries are mild, and most children make a full recovery without intervention [1–4]. The ED evaluation of children with head injuries may involve cranial imaging, commonly computed tomography (CT) scans [5]. Concerns about the long-term effects of ionizing radiation have led to efforts to reduce CT utilization, including Centers for Disease Control and Prevention guidelines advising against routinely obtaining neuroimaging in children with mild TBI [6–10]. Providers are tasked with identifying children who will benefit from neuroimaging while avoiding radiation exposure in those who will not. To aid clinicians, clinical decision tools stratify a child’s risk of TBI [4, 11–13]. These tools, including the Pediatric Emergency Care Applied Research Network (PECARN) decision rules, identify children with clinically important TBI (ciTBI) [14]. However, the evaluation of young infants is challenging: significant injuries can be masked, young infants are at the highest risk of radiation-induced injury [15], and providers may be concerned about abusive head trauma [16]. Despite these concerns, little is known about the ED evaluation and outcomes of young infants with head injuries.

We sought to measure how the evaluation and outcomes of the youngest infants (<3 months old) compare with other young children (3–24 months old) in a multicenter study of children diagnosed with head injury in the ED. Understanding current evaluation and outcomes of these young infants can inform future interventions to optimize care for this vulnerable population.

We performed a retrospective, cross-sectional study of children diagnosed in the ED with a head injury. Data were obtained from the Pediatric Health Information System (PHIS), an administrative database that contains inpatient, ED, ambulatory surgery, and observation encounter-level data from not-for-profit, tertiary care pediatric hospitals in the United States to support a wide range of improvement activities, including clinical effectiveness, resource utilization, and care-guideline development [17]. These hospitals are affiliated with the Children’s Hospital Association (Lenexa, Kansas). Data quality and reliability are assured through a joint effort between the Children’s Hospital Association and the participating hospitals. Portions of the data submission and data quality processes for the PHIS database are managed by IBM Watson Health (Ann Arbor, Michigan). For the purposes of external benchmarking, participating hospitals provide discharge/encounter data including demographics, diagnoses, and procedures. These hospitals also submit resource utilization data (eg, pharmaceuticals, imaging, and laboratory studies). Data are deidentified at the time of data submission but are given unique identifiers to allow tracking across multiple encounters. For this study, we included 47 hospitals contributing data for any portion of the 5-year study period. This study was conducted in accordance with the REporting of studies Conducted using Observational Routinely-collected health Data guidelines and was approved by the institutional review board of our institution with a waiver of informed consent [18].

We included children <2 years of age with a first ED visit for head injury (defined as no previous visit for a head injury in the preceding 3 months) between January 2015 and December 2019. These dates were selected to be inclusive of the International Classification of Diseases, 10th Revision (ICD-10), period and to conclude before the coronavirus disease 2019 pandemic, because practice patterns may have changed during the pandemic. Head injury visits were identified using the previously described ICD-10 codes (Appendix 1) as assigned by the treating providers. 19 Children transferred to a participating hospital were not included in the initial query because their evaluation before arrival at the PHIS hospital is unknown. Because we were interested in isolated head injuries, we excluded children evaluated for other injuries, defined by the utilization of diagnostic imaging of the chest (CT, x-ray, or ultrasound), abdomen (CT, x-ray, or ultrasound), pelvis (CT, x-ray, or ultrasound), or extremities (x-ray), except for children who underwent radiographic imaging of these body regions as part of a skeletal survey for the evaluation of abusive head trauma. We also excluded children undergoing procedures not related to head injury management. A multidisciplinary team of providers representing emergency medicine, critical care, radiology, and neurosurgery reviewed a list of all procedures performed to reach consensus on whether they were consistent with the evaluation or management of isolated head/neck injury. Because children with head injuries often undergo evaluation for neck injury, we did not exclude children who had diagnostic imaging or procedures for neck injury. All visits to the participating PHIS institutions meeting inclusion and exclusion criteria were included in final analyses.
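The index-visit rule above can be sketched as a per-patient lookback filter. The rule requires no head-injury ED visit in the preceding 3 months and a study window of January 2015 to December 2019. This is a minimal sketch under stated assumptions, not the authors' PHIS query:

- "3 months" is approximated as 90 days.
- Input is assumed to be hypothetical `(patient_id, visit_date)` tuples already restricted to head-injury diagnoses.
- The transfer and imaging-based exclusions are not shown.

```python
from datetime import date, timedelta

# Assumption: "3 months" is approximated as 90 days.
LOOKBACK = timedelta(days=90)
STUDY_START = date(2015, 1, 1)
STUDY_END = date(2019, 12, 31)


def index_head_injury_visits(visits, start=STUDY_START, end=STUDY_END):
    """Return head-injury ED visits that qualify as index visits (hypothetical helper).

    visits: iterable of (patient_id, visit_date) tuples for head-injury ED visits.
    Visits before `start` may be included so the lookback can see them.
    A visit qualifies if it falls within [start, end] and the same patient had
    no earlier head-injury visit within LOOKBACK.
    """
    last_visit = {}  # patient_id -> date of that patient's most recent visit so far
    index_visits = []
    for patient_id, visit_date in sorted(visits):
        previous = last_visit.get(patient_id)
        recent = previous is not None and visit_date - previous <= LOOKBACK
        if start <= visit_date <= end and not recent:
            index_visits.append((patient_id, visit_date))
        # Every visit, qualifying or not, counts as a "previous visit" for later ones.
        last_visit[patient_id] = visit_date
    return index_visits
```

Because every visit updates `last_visit`, a repeat visit within 90 days also pushes the window forward for any later visit by the same child.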

We abstracted data for children from their ED/hospital encounter including demographics (age, sex, insurance status) and visit details (center, length of stay, ED disposition, mortality). We abstracted procedure codes for each included encounter. Utilization of cranial imaging was defined as the presence of Clinical Transaction Classification (CTC) codes for head/brain CT, head/brain MRI, ultrasound of the head/brain, or plain radiography of the skull (Appendix 2). We defined intubation as the presence of the PHIS mechanical ventilation flag, defined by ICD-10 procedure codes for respiratory ventilation (5A1935Z, 5A1945Z, or 5A1955Z) or CTC codes for mechanical ventilation (521166) or other specified ventilation assistance (521169). 20 We defined neurosurgery as the presence of an operating room charge flag in a child with an isolated head injury as defined above. We used performance of a skeletal survey (CTC imaging codes 427800–427899) as a marker for abusive head trauma evaluation. We considered children admitted if they were admitted to the hospital (floor or ICU) or placed in observation status.
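The code-based definitions above reduce to simple set lookups. The sketch below re-derives the intubation and skeletal-survey flags from the ICD-10 procedure and CTC codes quoted in the text. The function name and input shapes are hypothetical. The text describes intubation as a PHIS-supplied flag, so this only illustrates its definition and is not how the authors obtained it.

```python
# Codes quoted in the methods text.
VENT_ICD10_PCS = {"5A1935Z", "5A1945Z", "5A1955Z"}  # respiratory ventilation procedures
VENT_CTC = {"521166", "521169"}                      # mechanical / other specified ventilation assistance
SKELETAL_SURVEY_CTC = range(427800, 427900)          # CTC imaging codes 427800-427899 inclusive


def encounter_flags(icd10_pcs_codes, ctc_codes):
    """Derive intubation and skeletal-survey flags for one encounter (hypothetical helper)."""
    pcs = {str(c).strip().upper() for c in icd10_pcs_codes}
    ctc = {str(c).strip() for c in ctc_codes}
    intubation = bool(pcs & VENT_ICD10_PCS) or bool(ctc & VENT_CTC)
    # A skeletal survey is the marker of an abusive head trauma evaluation.
    skeletal_survey = any(c.isdigit() and int(c) in SKELETAL_SURVEY_CTC for c in ctc)
    return {"intubation": intubation, "skeletal_survey": skeletal_survey}
```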

Our primary outcome was utilization of any emergent cranial imaging (CT, MRI, x-ray, ultrasound, or combination of modalities) on day 0 to 1 of the index ED/hospital encounter. Secondary outcomes included diagnosis of TBI (epidural, subdural, or subarachnoid hemorrhage, traumatic cerebral edema, diffuse TBI, other specified intracranial injuries, or unspecified intracranial injuries [Appendix 3]), diagnosis of ciTBI (TBI plus death, neurosurgery, hospital admission for 2 or more calendar days, or intubation/mechanical ventilation), 4 diagnosis of skull fracture, evaluation for abusive head trauma, hospital admission, neurosurgery, and mortality.
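The ciTBI outcome above is a composite: a TBI diagnosis plus at least one qualifying event. A minimal sketch follows, with a hypothetical `Encounter` record whose boolean fields are assumed to have been derived already from the code lists. The admission threshold defaults to the 2 calendar days stated in the methods. It is a parameter because how calendar days are counted from the administrative data is an assumption here.

```python
from dataclasses import dataclass


@dataclass
class Encounter:
    tbi: bool           # any TBI diagnosis code (Appendix 3 list)
    died: bool
    neurosurgery: bool  # operating room charge flag in an isolated head injury
    intubated: bool     # mechanical ventilation flag
    admitted_days: int  # calendar days hospitalized; 0 if discharged from the ED


def is_citbi(e: Encounter, min_admit_days: int = 2) -> bool:
    """TBI plus death, neurosurgery, intubation, or admission of >= min_admit_days."""
    return e.tbi and (e.died or e.neurosurgery or e.intubated
                      or e.admitted_days >= min_admit_days)
```

Note the short-circuit on `e.tbi`: a child who was intubated or admitted without any TBI code is not counted as ciTBI.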

We used frequencies and proportions to characterize demographic features of the study cohort. We calculated the proportion of encounters with each of the study outcomes, stratified by age (<3 months, 3–5 months, 6–11 months, and 12–23 months). These age groups were chosen to represent distinct developmental stages, particularly with regard to independent mobility. We compared children in the youngest age group (<3 months) to children in the older 3 age groups on the primary and secondary outcomes using risk differences and 95% confidence intervals (CIs) with robust SEs clustered on hospital, to account for intrahospital correlation.
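The text does not say which Stata estimator produced the clustered risk differences, so the following is an assumption: one common way to get them is a linear probability model, `outcome = a + b*group`, fitted by ordinary least squares. With a binary group indicator, `b` is exactly the difference in proportions. A cluster-robust (sandwich) variance then accounts for correlation within hospitals. The sketch applies the CR1 small-sample factor G/(G-1) x (N-1)/(N-K) that Stata uses for `regress, vce(cluster)`. The function name and row layout are hypothetical.

```python
import math
from collections import defaultdict


def risk_difference_cluster_ci(rows, z=1.959964):
    """Risk difference (group 1 minus group 0) with a cluster-robust 95% CI.

    rows: iterable of (cluster_id, group, outcome), group and outcome coded 0/1.
    Uses OLS on outcome = a + b*group, so b is the risk difference, with a
    CR1 sandwich variance clustered on cluster_id.
    """
    rows = [(g, int(x), int(y)) for g, x, y in rows]
    n = len(rows)
    n1 = sum(x for _, x, _ in rows)
    p1 = sum(y for _, x, y in rows if x) / n1
    p0 = sum(y for _, x, y in rows if not x) / (n - n1)
    rd = p1 - p0
    # Bread: inverse of X'X for the design X = [1, group].
    det = n * n1 - n1 * n1
    bread = [[n1 / det, -n1 / det], [-n1 / det, n / det]]
    # Meat: sum over clusters of the outer product of per-cluster scores X_g' u_g.
    scores = defaultdict(lambda: [0.0, 0.0])
    for g, x, y in rows:
        u = y - (p1 if x else p0)  # OLS fitted value is the group proportion
        scores[g][0] += u
        scores[g][1] += u * x
    meat = [[0.0, 0.0], [0.0, 0.0]]
    for s in scores.values():
        for i in range(2):
            for j in range(2):
                meat[i][j] += s[i] * s[j]
    # Var(b) is element [1][1] of bread @ meat @ bread.
    var_b = sum(bread[1][i] * meat[i][j] * bread[j][1]
                for i in range(2) for j in range(2))
    G = len(scores)
    var_b *= G / (G - 1) * (n - 1) / (n - 2)  # CR1 small-sample correction (K = 2)
    se = math.sqrt(var_b)
    return rd, rd - z * se, rd + z * se
```

With many hospitals the CR1 factor is close to 1. With few clusters the interval widens noticeably, which is the intended conservatism.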

We tested for temporal trends in the utilization of cranial imaging, using logistic regression models with the cranial imaging outcome as the dependent variable and calendar year as the independent variable. Analyses were stratified by age group and with robust SEs, as described above. To evaluate practice variation across sites, we calculated hospital-level utilization of any cranial imaging (x-ray, CT, MRI, or ultrasound) stratified by age group (ie, the proportion of patients within age group at each hospital with the cranial imaging as outcome). These hospital- and age group-level proportions were depicted using box and whisker plots. We compared variability between the age group-specific, hospital-level proportions using an equality of variance test that is robust to assumptions about the normality of the underlying population distribution from which the samples were drawn. 21  
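The trend test above amounts to a two-parameter logistic regression, logit P(imaging) = b0 + b1 x year. Here exp(b1) is the odds ratio per calendar year, the quantity reported in the results (eg, 1.04 [1.01–1.08] for CT in 12- to 23-month-olds). A minimal sketch fitted by Newton-Raphson follows. Unlike the authors' analysis, it uses model-based SEs rather than hospital-clustered robust SEs, and the function name is hypothetical.

```python
import math


def year_trend_or(years, outcomes, max_iter=50, tol=1e-10, z=1.959964):
    """Odds ratio per calendar year (with 95% CI) from logit P(y=1) = b0 + b1*year.

    Year is centered for numerical stability; this changes b0 but not b1.
    The CI uses the model-based SE, not a clustered SE.
    """
    center = sum(years) / len(years)
    xs = [yr - center for yr in years]
    b0 = b1 = 0.0
    for _ in range(max_iter):
        # Score vector (g0, g1) and information matrix [[h00, h01], [h01, h11]].
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, y in zip(xs, outcomes):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))
            w = p * (1.0 - p)
            g0 += y - p
            g1 += (y - p) * x
            h00 += w
            h01 += w * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01
        # Newton step: information^-1 @ score.
        step0 = (h11 * g0 - h01 * g1) / det
        step1 = (h00 * g1 - h01 * g0) / det
        b0 += step0
        b1 += step1
        if abs(step0) + abs(step1) < tol:
            break
    se1 = math.sqrt(h00 / det)  # sqrt of element [1][1] of the inverse information
    return math.exp(b1), math.exp(b1 - z * se1), math.exp(b1 + z * se1)
```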

All statistical tests were 2-tailed, with α set at .05, and were conducted using Stata 16.0 (College Station, Texas).

We identified 124 923 nontransfer ED visits with a head injury diagnosis over the study period, of which 112 885 met all inclusion and exclusion criteria (Fig 1), including 10 325 (9.1%) visits by infants <3 months of age. Characteristics of included visits are summarized in Table 1.

ED visit identification.


Demographic Characteristics of Emergency Department Encounters for Children Aged Less Than 24 Months With Head Injury ( n = 112 885)

Insurance data missing for 2641 encounters (2.3%).

Overall, 29 423 (26.1%) children underwent cranial imaging. Compared with older age groups, the youngest infants (<3 months old) were more likely to have cranial imaging performed; more likely to be diagnosed with TBI, ciTBI, and skull fracture (with or without TBI); more likely to undergo evaluation for abusive head trauma; more likely to be admitted to the hospital (including to the ICU); and more likely to undergo neurosurgery (Table 2). The youngest infants were also at increased risk for mortality compared with the 6- to 11-month and 12- to 23-month-old age groups. Among the subgroup of children who underwent cranial imaging with CT or MRI (Table 3), rates of TBI and ciTBI were higher in the youngest infants (<3 months old) than in all older age groups.

Outcomes for Children Diagnosed With Head Injury in the Emergency Department Stratified By Age

Values in table represent frequency (percentage).

Diagnosis of TBI with at least 1 of the following: Mechanical ventilation, neurosurgery, death, or a length of stay ≥3 days.

Statistically significant (P < .05) risk difference when compared with the <3-month-old age group.

Risk of TBI Among Patients With CT or MRI Performed, Stratified by Age

Significantly different compared with <3-month-old age group at P < .05.

The most common imaging modality for all age groups was CT scan, followed by MRI (Table 2). Trends in the use of imaging modalities are shown in Fig 2. We observed a modest declining linear trend over time in the utilization of skull x-ray across all age groups (overall x-ray utilization in 2016 and 2019: 1.7% and 1.2%, respectively). However, MRI utilization increased over time across all age groups except the 12- to 23-month-old group (test for linear trend: odds ratio [95% CI] = 1.11 [0.98–1.26]). The highest MRI utilization was seen among infants <3 months of age in 2019, at 7.6%. There was no evidence of a linear trend over time in CT rates except among the 12- to 23-month-old group, in which a small upward trend was detected (test for linear trend: odds ratio [95% CI] = 1.04 [1.01–1.08]).

Trends in types of cranial imaging over time for children <2 years of age with an ED visit for head injury stratified by age group. A, skull x-ray; B, MRI; and C, CT.


The variability in rates of cranial imaging by center, stratified by age, is depicted in Fig 3. There was significantly greater variability in hospital-level imaging rates among the youngest infants <3 months (SD = 0.11) than among those aged 6 to 11 months (SD = 0.09; P = .049) and 12 to 23 months (SD = 0.07; P < .001). There was also greater variability in hospital-level cranial imaging rates among patients aged 3 to 5 months (SD = 0.12) than among those aged 12 to 23 months (P = .003).
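The methods describe the variance comparison only as an equality-of-variance test that is robust to non-normality. Levene's test with median centring (the Brown–Forsythe variant) is one such test and is assumed here. It would be applied to the hospital-level imaging proportions of each age group. The sketch computes the W statistic and its F degrees of freedom using only the standard library; the function name is hypothetical.

```python
from statistics import fmean, median


def brown_forsythe_w(*groups):
    """Brown-Forsythe (median-centred Levene) statistic W and its F degrees of freedom."""
    k = len(groups)
    # Absolute deviations from each group's median.
    z = [[abs(v - median(g)) for v in g] for g in groups]
    sizes = [len(g) for g in z]
    n_total = sum(sizes)
    group_means = [fmean(g) for g in z]
    grand_mean = sum(sum(g) for g in z) / n_total
    between = sum(n * (m - grand_mean) ** 2 for n, m in zip(sizes, group_means))
    within = sum(sum((v - m) ** 2 for v in g) for g, m in zip(z, group_means))
    w = (n_total - k) / (k - 1) * between / within
    return w, (k - 1, n_total - k)
```

The P value is the upper tail of an F(k-1, N-k) distribution. If SciPy is available, `scipy.stats.levene(*groups, center="median")` returns the same statistic together with that P value.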

Hospital-level variability in rates of cranial imaging by center, stratified by age group.


In this multicenter, cross-sectional study of children <2 years old diagnosed by the treating providers with isolated head injury at 47 EDs, we found that infants <3 months had a 50% likelihood of undergoing any cranial imaging, a rate 2.7 times higher than that of children 12 to 23 months of age. One in 6 infants <3 months old was diagnosed with a TBI, a rate 6.5 times higher than that of children 12 to 23 months old. One in 22 infants <3 months old was diagnosed with a ciTBI, a rate 9.2 times higher than that of children 12 to 23 months old. Among the subset of children undergoing cranial imaging with either CT or MRI, 26% of infants <3 months old were diagnosed with a TBI, a rate 3 times that of children 12 to 23 months old. These youngest infants were also more likely to be admitted to the hospital, to undergo evaluation for abusive head trauma, to have neurosurgery performed, and to die of their injuries. Furthermore, rates and types of imaging varied by both center and time, and MRI use increased over time, especially in the youngest infants. Our data suggest that young infants undergo a substantial amount of diagnostic imaging and have a high rate of TBI diagnosis (not explained by the increased imaging alone), with significant variability in care. The high rates of TBI and ciTBI appear to justify the higher rates of imaging, whereas the resource utilization and variability in care suggest that novel approaches to this vulnerable age group may be needed.

These data augment previous work on head injury in children and young infants. In the largest study to date, the PECARN group found rates of ciTBI in the <2-year-old age group of 0.9%. 4   A substudy of infants <3 months of age found higher rates of ciTBI and TBI on CT of 2.3% and 9.5%, respectively, with 59% undergoing CT imaging. 14   Although our observed rates of ciTBI and TBI are significantly higher, it is important to note that our study was not limited to those children with minor head trauma, and the inclusion of more severely injured children may explain this higher rate of injury. However, we found a similarly high rate of cranial imaging in this population. Our study augments these works by its inclusion of all head-injured children <2 years old, comparing evaluation and diagnosis by age subsets, and reflects the practice patterns of clinicians across a large, national sample of centers.

We observed a significantly higher rate of cranial imaging, including CT, among infants <3 months of age than among older children. However, this higher rate of imaging appears justified by the higher rates of injuries, including ciTBI. Overall, the rate of ciTBI in the cohort of children was 4.6%, a rate previously considered high enough to warrant emergent cranial imaging. 4 Importantly, the significantly higher rates of ciTBI and TBI persisted (3.5 and 3 times higher, respectively, among infants <3 months of age when compared with those 3–23 months old) even when considering only patients with CT or MRI performed, suggesting it is not indiscriminate use of imaging in this youngest age group that is driving higher rates of TBI and ciTBI. Furthermore, 28% of the youngest infants with head injury underwent evaluation for abusive head trauma as indicated by the performance of a skeletal survey, suggesting abusive head trauma remains a significant concern among providers caring for nonmobile infants.

We observed an increase in utilization of MRI over time, especially for the youngest infants. The clinical indication for MRI cannot be ascertained from these data. However, this trend may reflect clinicians' desire to image this vulnerable population without exposing them to ionizing radiation, because these infants are at highest risk from ionizing radiation. 9, 22 Previous studies have suggested that MRI, including rapid MRI, may be a viable imaging modality for children undergoing evaluation for minor head trauma, with accuracy for TBI diagnosis similar to that of CT. 23–26 Although previous data have suggested that MRI may miss cases of isolated skull fracture, 26 more recent data suggest new MRI sequences may improve MRI's ability to detect these injuries. 27 Furthermore, isolated skull fractures are exceedingly unlikely to require additional management unless there is concern for abusive trauma. 28 Finally, the observed variability in rates of imaging across centers suggests an opportunity for standardization of care. We observed the largest variability in cranial imaging utilization in the youngest infants. Although accurate clinical decision tools exist for the evaluation of children with minor head trauma, 4, 11, 12 these data suggest application and use of these rules may still vary between centers, especially in these youngest infants.

Together, these data suggest an opportunity for evaluating novel approaches to these youngest infants. Given the perceived need by clinicians for imaging in this age group, as demonstrated by half undergoing cranial imaging (a rate seemingly justified by the significantly higher rate of brain injury), novel approaches may be able to reduce radiation exposure among this most vulnerable population without missing injuries. In previous studies, the PECARN rule performed well for identifying infants <3 months of age with a ciTBI; however, infants who met low-risk criteria remained at risk for TBI on CT, something with which some providers may feel uncomfortable. 14 A recent attempt to refine the rule for this age group also performed well in ruling out ciTBI, TBI, and skull fracture. 29 Therefore, implementation of such a rule may help reduce imaging in those who are low-risk. However, clinicians may have other reasons to image this patient population, including concern for abusive head trauma and long-term neurodevelopmental consequences of missed TBI on CT. 30–32 For children without concern for abusive head trauma, rapid brain MRI may represent a radiation-sparing approach that can rule out TBI. However, its limitations in identifying skull fractures and some specific types of TBI must be considered. 26 Finally, protocols that may identify children at low risk for abusive head trauma who have no other risk factors for TBI may be able to reduce the need for cranial imaging in this population. 33, 34

Our study has limitations. First, our study was performed at tertiary care children’s hospitals, and results may not be generalizable to other settings. 35 However, data suggest imaging rates may be even higher at general EDs. 36, 37 Second, we used administrative data and lack clinical information. We cannot comment on the appropriateness of imaging or the association between specific symptoms and imaging/TBI. Furthermore, inclusion in our study relied upon a diagnosis code for head injury assigned by the treating providers. It is not known how the utilization of these codes corresponds to the types and severity (mild, moderate, or severe) of head injuries. However, these codes have been used in other large head injury studies, 19 and it is unlikely that this variation alone would explain the magnitude of difference we observed in our objective outcomes. Third, we aimed to limit our analysis to children with isolated head injury (by diagnostic and procedure codes). It is possible that some children were inappropriately included or excluded on the basis of this decision. However, because this was applied across all age groups, it is unlikely to affect the interpretation of our results. Finally, our definition of ciTBI was based on diagnostic codes and not clinical data. Although diagnostic codes are accurate for surgical interventions, death, and mechanical ventilation, we cannot know if patients were hospitalized for 2 or more days solely for their head injury.

In this multicenter study of children <2 years old undergoing evaluation in the ED for head injuries, infants <3 months of age had markedly higher rates of cranial imaging, TBI and ciTBI, hospital admission, neurosurgery, abusive head trauma evaluation, and mortality. We also found significant variability in care for these youngest infants, including rising utilization of MRI. These data underscore the importance of a thoughtful evaluation and exploration of novel and more standardized approaches to this vulnerable population, specifically given the high rates of injury and increased sensitivity to the effects of ionizing radiation.

We thank Drs Kirsten Ecklund, Stephen Voss, Christopher Weldon, and Mark Proctor for reviewing diagnostic procedures for our patients to aid in determining if they were related to head injury management.

Drs Lyons, Mannix, and Schutzman conceptualized and designed the study, contributed to data collection, supervised data analyses, and drafted the initial manuscript; Dr Monuteaux designed the study, and abstracted and performed data analyses; and all authors reviewed and revised the manuscript, approved the final manuscript as submitted, and agree to be accountable for all aspects of the work.

COMPANION PAPER: A companion to this article can be found online at www.pediatrics.org/cgi/doi/10.1542/peds.2023-065511 .

FUNDING: No external funding.

CONFLICT OF INTEREST DISCLOSURES: The authors have indicated they have no conflicts of interest relevant to this article to disclose.

Abbreviations:
ciTBI: clinically important traumatic brain injury
CI: confidence interval
CT: computed tomography
CTC: Clinical Transaction Classification
ED: emergency department
ICD-10: International Classification of Diseases, 10th Revision
PECARN: Pediatric Emergency Care Applied Research Network
PHIS: Pediatric Health Information System
TBI: traumatic brain injury


primary vs secondary research article

  • Online ISSN 1098-4275
  • Print ISSN 0031-4005
  • © Copyright American Academy of Pediatrics


Published: 05 June 2024

Machine learning predicts upper secondary education dropout as early as the end of primary school

  • Maria Psyridou 1 ,
  • Fabi Prezja 2 ,
  • Minna Torppa 3 ,
  • Marja-Kristiina Lerkkanen 3 ,
  • Anna-Maija Poikkeus 3 &
  • Kati Vasalampi 4  

Scientific Reports volume 14, Article number: 12956 (2024)


  • Computer science
  • Human behaviour

Education plays a pivotal role in alleviating poverty, driving economic growth, and empowering individuals, thereby significantly influencing societal and personal development. However, the persistent issue of school dropout poses a significant challenge, with its effects extending beyond the individual. While previous research has employed machine learning for dropout classification, these studies often suffer from a short-term focus, relying on data collected only a few years into the study period. This study expanded the modeling horizon by utilizing a 13-year longitudinal dataset, encompassing data from kindergarten to Grade 9. Our methodology incorporated a comprehensive range of parameters, including students’ academic and cognitive skills, motivation, behavior, well-being, and officially recorded dropout data. The machine learning models developed in this study demonstrated notable classification ability, achieving a mean area under the curve (AUC) of 0.61 with data up to Grade 6 and an improved AUC of 0.65 with data up to Grade 9. Further data collection and independent correlational and causal analyses are crucial. In future iterations, such models may have the potential to proactively support educators’ processes and existing protocols for identifying at-risk students, thereby potentially aiding in the reinvention of student retention and success strategies and ultimately contributing to improved educational outcomes.

Introduction

Education is often heralded as the key to poverty reduction, economic prosperity, and individual empowerment, and it plays a pivotal role in shaping societies and fostering individual growth 1 , 2 , 3 . However, the specter of school dropout casts a long shadow, with repercussions extending far beyond the individual. Dropping out of school is not only a personal tragedy but also a societal concern; it leads to a lifetime of missed opportunities and reduced potential alongside broader social consequences, including increased poverty rates and reliance on public assistance. Existing literature has underscored the link between school dropout and diminished wages, unskilled labor market entry, criminal convictions, and early adulthood challenges, such as substance use and mental health problems 4 , 5 , 6 , 7 . The socioeconomic impacts, which range from reduced tax collections and heightened welfare costs to elevated healthcare and crime expenditures, signal the urgency of addressing this critical issue 8 . Therefore, understanding and preventing school dropout is crucial for both individual and societal advancement.

Beyond its economic impact, education differentiates individuals within the labor market and serves as a vehicle for social inclusion. Students’ abandonment of the pursuit of knowledge translates into social costs for society and profound personal losses. Dropping out during upper secondary education disrupts the transition to adulthood, impedes career integration, and compromises societal well-being 9 . The strong link between educational attainment and adult social status observed in Finland and globally 10 underscores the importance of upper secondary education as a gateway to higher education and the labor market.

An increase in school dropout rates in many European countries 11 is leading to growing pockets of marginalized young people. In the European Union (EU), 9.6% of individuals between 18 and 24 years of age did not engage in education or training beyond the completion of lower secondary education 12 . This disconcerting statistic raises alarms about the challenge of preventing early exits from the educational journey. Finnish statistics 13 highlight that 0.5% of Finnish students drop out during lower secondary school, but this figure is considerably higher at the upper secondary level, with dropout rates of 13.3% in vocational school and 3.6% in general upper secondary school. Amid this landscape, there is a clear and pressing need not only to support out-of-school youths and dropouts but also to identify potential dropouts early on and prevent their disengagement. In view of the far-reaching consequences of school dropout for individuals and societies, social policy initiatives have rightly prioritized preventive interventions.

Machine learning has emerged as a transformative technology across numerous domains, particularly promising for its capabilities in utilizing large datasets and leveraging non-linear relationships. Within machine learning, deep learning 14 has gained significant traction due to its ability to outperform traditional methods given larger data samples. Deep learning has played a significant role in advancements in fields such as medical computer vision 15 , 16 , 17 , 18 , 19 , 20 and, more recently, in large foundation models 21 , 22 , 23 , 24 . Although machine learning methods have significantly transformed various disciplines, their application in education remains relatively unexplored 25 , 26 .

In education, only a handful of studies have used machine learning to classify students as either dropping out of upper secondary education or continuing in it. Previous research in this field has been constrained by short-term approaches. For instance, some studies have collected and analyzed data within the same academic year 27 , 28 . Others have restricted their data collection exclusively to the upper secondary education phase 29 , 30 , 31 , while one study expanded its dataset to include student traits across both lower and upper secondary school years 32 . Only one previous study has focused on predicting dropout within the three years following the collection of trait data 33 , and another study aimed at predictions within the next 5 years 34 . However, the process of dropping out of school often begins in the early school years and is marked by gradual disengagement and disassociation from education 35 , 36 . These findings suggest that machine learning models may need to incorporate data reaching further back in time. In this study, we extended this time horizon by leveraging a 13-year longitudinal dataset with features from kindergarten up to Grade 9 (age 15-16), and we provide the first results for the automatic classification of upper secondary school dropout and non-dropout using data available as early as the end of primary school.

Given that the process of dropping out of school often begins in early school years and may be influenced by a multitude of factors, our study utilized data from a comprehensive longitudinal study. We aimed to include a broad spectrum of traits that existing literature has shown to have a direct or indirect association with school dropout 37 , 38 , 39 . From the available variables in the dataset, we incorporated features covering family background (e.g. parental education, socio-economic status), individual factors (e.g. gender, school absences, burn-out), behavioral patterns (e.g. prosocial behaviors, hyperactivity), motivation and engagement metrics (e.g. self-concept, task value, teacher-student relationships), experiences of bullying, health behaviors (e.g. smoking, alcohol use), media usage, and academic and cognitive performance (e.g. reading fluency, arithmetic skills). By incorporating this diverse set of features, we aimed to capture a holistic view of the students’ educational journey from kindergarten through the end of lower secondary school.

This study is guided by two main research questions:

Can predictive models, developed using a comprehensive longitudinal dataset from kindergarten through Grade 9, accurately classify students’ upper secondary dropout and non-dropout status at age 19?

How does the performance of machine learning classifiers in predicting school dropout compare when utilizing data up to the end of primary school (Grade 6; age 12-13) versus data up to the end of lower secondary school (Grade 9)? Can model predictions be made as early as Grade 6 without significantly compromising accuracy?

In response to these questions, we hypothesized that a comprehensive longitudinal dataset would facilitate the development of predictive models that could accurately classify dropout and non-dropout status as early as Grade 6. However, we acknowledge that the inherent variability in individual dropout factors may constrain the overall performance of these models. Additionally, we posit that while models trained with data up to Grade 9 are likely to demonstrate higher predictive accuracy than those trained with data only up to Grade 6, accurate model predictions could still be achieved with data up to Grade 6.

We trained and validated machine learning models on a 13-year longitudinal dataset to classify upper secondary school dropout. Four supervised classification algorithms were used: Balanced Random Forest (B-RandomForest), Easy Ensemble (Adaboost Ensemble), RUSBoost (Adaboost), and the Bagging Decision Tree. Performance was evaluated with six-fold cross-validation, and confusion matrices were calculated for each classifier. The methodological research workflow is presented in Fig. 1.
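The six-fold evaluation above can be illustrated with a stratified split that keeps the small dropout class represented in every test fold. Stratification is an assumption here, since the text says only "six-fold cross-validation". The helper below is a hypothetical stdlib sketch. Libraries such as imbalanced-learn provide the balanced classifiers named above (for example `BalancedRandomForestClassifier`, `EasyEnsembleClassifier`, `RUSBoostClassifier`), but the text does not say which implementation was used.

```python
import random


def stratified_k_folds(labels, k=6, seed=0):
    """Yield (train_idx, test_idx) pairs with each class spread evenly over k folds."""
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        # Deal each class's shuffled indices round-robin across the folds.
        for j, i in enumerate(idx):
            folds[j % k].append(i)
    for f in range(k):
        test = sorted(folds[f])
        test_set = set(test)
        train = [i for i in range(len(labels)) if i not in test_set]
        yield train, test
```

Each classifier would be trained on `train` and scored on `test` once per fold. The per-fold metrics are then averaged, as with the mean AUC reported in the abstract.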

figure 1

Proposed research workflow. Our process begins with data collection over 13 years, from kindergarten to the end of upper secondary education (Step 1), followed by data processing which includes cleaning and imputing missing feature values (Step 2). We then apply four machine learning models for dropout and non-dropout classification (Step 3), and evaluate these models using 6-fold cross-validation, focusing on performance metrics and ROC curves (Step 4).
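The AUC used in Step 4 can be read as the probability that a randomly chosen student who dropped out receives a higher predicted risk than a randomly chosen student who did not. On that scale, 0.5 is chance level, which helps put the reported mean AUCs of 0.61 and 0.65 in context. A minimal rank-based sketch follows (the Mann–Whitney formulation, with ties counted as one half). The function name is hypothetical, and the paper does not state how its AUCs were computed.

```python
def roc_auc(labels, scores):
    """AUC = P(score of a random positive > score of a random negative); ties count 1/2."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for q in neg:
            wins += 1.0 if p > q else 0.5 if p == q else 0.0
    return wins / (len(pos) * len(neg))
```

This pairwise loop is quadratic in sample size. For a cohort of a few thousand students that is still fast, and it keeps the definition transparent.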

This study used existing longitudinal data from the “First Steps” follow-up study 40 and its extension, the “School Path: From First Steps to Secondary and Higher Education” study 41 . The entire follow-up spanned a 13-year period, from kindergarten to the third (final) year of upper secondary education. In the “First Steps” study, approximately 2,000 children born in 2000 were followed 10 times from kindergarten (age 6–7) to the end of lower secondary school (Grade 9; age 15-16) in four municipalities around Finland (two medium-sized, one large, and one rural). The goal was to examine students’ learning, motivation, and problem behavior, including their academic performance, motivation and engagement, social skills, peer relations, and well-being, in different interpersonal contexts. The proportion of contacted parents who agreed to participate ranged from 78% to 89%, depending on the town or municipality. Ethnically and culturally, the sample was very homogeneous and representative of the Finnish population, and parental education levels were very close to the national distribution in Finland 42 . In the “School Path” study, the participants of the “First Steps” follow-up study and their new classmates (N = 4160) were followed twice after the transition to upper secondary education: in the first year (Grade 10; age 16-17) and in the third year (Grade 12; age 18-19).

The present study focused on those participants who took part in both the “First Steps” study and the “School Path” study. Data from three time points across three phases of the follow-up were used. Data collection for Time 1 (T1) took place in Fall 2006 and Spring 2007, when the participants entered kindergarten (age 6-7). Data collection for Time 2 (T2) took place during comprehensive school (ages 7-16), which extended from the beginning of primary school (Grade 1; age 7-8) in Fall 2007 to the end of the final year of the lower secondary school (Grade 9; age 15-16) in Spring 2016. For Time 3 (T3), data were collected at the end of 2019, 3.5 years after the start of upper secondary education. We focused on students who enrolled in either general upper secondary school (the academic track) or vocational school (the vocational track) following comprehensive school, as these tracks represent the most typical choices available for young individuals in Finland. Common reasons for not completing school within 3.5 years included students deciding to discontinue their education or not fulfilling specific requirements (e.g. failing mandatory courses) during their schooling.

At T1 and T2, questionnaires were administered to the participants in their classrooms during normal school days, and their academic skills were assessed through group-administered tasks. Questionnaires were also administered to parents. At T3, register information on the completion of upper secondary education was collected from school registers. In Finland, the typical duration of upper secondary education is three years. For the data collection in comprehensive school (T1 and T2), written informed consent was obtained from the participants’ guardians. In the upper secondary phase (T3), the participants themselves provided written informed consent to confirm their voluntary participation. Ethical statements for the follow-up study were obtained from the Ethical Committee of the University of Jyväskylä in 2006 and 2018.

The target variable in the 13-year follow-up was the participant’s status 3.5 years after starting upper secondary education, as determined from the school registers. Participants who had not completed upper secondary education by this time were coded as having dropped out. Initially, 586 features were considered. However, as is common in longitudinal studies, all of them contained missing values. Features with more than 30% missing data were excluded from the analysis, leaving a total of 311 features (with one-hot encoding) (see Supplementary Table S3). These features covered family background (e.g. parental education, socio-economic status), individual factors (e.g. gender, absences from school, school burn-out), the individual’s behavior (e.g. prosocial behavior, hyperactivity), motivation (e.g. self-concept, task value), engagement (e.g. teacher-student relationships, class engagement), bullying (e.g. bullied, bullying), health behavior (e.g. smoking, alcohol use), media usage (e.g. use of media, phone, internet), cognitive skills (e.g. rapid naming, Raven), and academic outcomes (i.e. reading fluency, reading comprehension, PISA scores, arithmetic, and multiplication). Figure 2 presents an overview of the features used, while Fig. 3 summarizes the features used in the models, the grades and the corresponding ages for each grade, and the time points (T1, T2, T3) at which different assessments were conducted. Supplementary Table S3 provides details about the features included.
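The feature-filtering step described above can be sketched with pandas as follows. This is a minimal illustration under stated assumptions, not the authors’ code: the helper name `prepare_features`, the toy column names, and the choice to one-hot encode only `object`/`category` columns are hypothetical.

```python
import numpy as np
import pandas as pd


def prepare_features(df: pd.DataFrame, max_missing: float = 0.30) -> pd.DataFrame:
    """Hypothetical helper: drop columns whose share of missing values exceeds
    `max_missing`, then one-hot encode the remaining categorical columns."""
    missing_share = df.isna().mean()                # fraction of NaN in each column
    kept = df.loc[:, missing_share <= max_missing]  # the ">30% missing" exclusion rule
    categorical = kept.select_dtypes(include=["object", "category"]).columns
    return pd.get_dummies(kept, columns=list(categorical))


# Toy data; the column names are hypothetical, not the study's variables.
df = pd.DataFrame({
    "reading_g1": [10.0, 12.0, np.nan, 9.0, 11.0],        # 20% missing -> kept
    "smoking_g9": [np.nan, np.nan, "no", np.nan, "yes"],  # 60% missing -> dropped
    "gender": ["f", "m", "m", "f", "f"],                  # categorical -> one-hot encoded
})
features = prepare_features(df)
print(list(features.columns))  # ['reading_g1', 'gender_f', 'gender_m']
```

The remaining NaN in `reading_g1` is deliberately left in place; in the study, such values were handled by the separate imputation step.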

Figure 2. Feature domains used for the classification of education dropout and non-dropout. The model incorporated a set of 311 features, categorized into 10 domains: family background, individual factors, behavior, motivation, engagement, bullying experiences, health behavior, media usage, cognitive skills, and academic outcomes. Each domain encompassed a variety of measures.

Figure 3. Gantt chart summarizing the features used in the models, the grades and the corresponding ages for each grade, and the time points (T1, T2, T3) at which different assessments were conducted. Assessments from Grades 7 and 9 were not included in the models predicting dropout with data up to Grade 6.

Data processing

In our study, we employed a systematic approach to handle missing values in the dataset. First, the percentage of missing data was calculated for each feature, and features with more than 30% missing values were excluded. For categorical features, missing values were imputed with the most frequent value of each feature, while a median-based strategy was applied to numeric features. To keep the imputed values from being dominated by the majority class, they were derived from a temporary dataset in which the majority class (i.e. non-dropout cases) was randomly under-sampled to match the size of the positive class (i.e. dropout cases).
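A minimal pandas sketch of this class-balanced imputation is shown below. It assumes dropout is coded as 1 and non-dropout as 0, that `X` and `y` share the same index, and that raw (not yet one-hot encoded) categorical columns are imputed; the helper name `balanced_impute` and the toy data are hypothetical, and the order of imputation and encoding in the original pipeline is not stated.

```python
import numpy as np
import pandas as pd


def balanced_impute(X: pd.DataFrame, y: pd.Series, seed: int = 0) -> pd.DataFrame:
    """Hypothetical helper: compute fill values on a class-balanced temporary
    dataset (all dropout rows + an equal-sized random draw of non-dropout rows),
    then apply those fill values to every row of X."""
    rng = np.random.default_rng(seed)
    pos_idx = y.index[y == 1]                                  # dropout cases
    neg_idx = y.index[y == 0]                                  # non-dropout cases
    neg_draw = rng.choice(neg_idx.to_numpy(), size=len(pos_idx), replace=False)
    temp = X.loc[pos_idx.append(pd.Index(neg_draw))]           # temporary balanced dataset

    fill_values = {}
    for col in X.columns:
        if pd.api.types.is_numeric_dtype(X[col]):
            fill_values[col] = temp[col].median()              # numeric: median
        else:
            fill_values[col] = temp[col].mode().iloc[0]        # categorical: most frequent value
    return X.fillna(value=fill_values)


# Toy example: rows 0-2 are dropout (1), rows 3-7 are non-dropout (0).
X = pd.DataFrame({
    "score": [1.0, 2.0, np.nan, 100.0, 101.0, 102.0, 103.0, np.nan],
    "track": ["a", "a", np.nan, "b", "b", "b", "b", "b"],
})
y = pd.Series([1, 1, 1, 0, 0, 0, 0, 0])
X_filled = balanced_impute(X, y)
```

Only the fill values come from the balanced subset; the returned frame keeps every original row.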

Machine learning

In our study, we benchmarked a range of balanced classifiers from the imbalanced-learn Python package 43 , using their default hyperparameter settings. Our selection included Balanced Random Forest, Easy Ensemble (AdaBoost ensemble), RUSBoost (AdaBoost), and Bagging Decision Tree. The Balanced Random Forest classifier played a pivotal role in our study; its configuration, performance, and effectiveness are examined in the following sections. Below are descriptions of each classifier:

Balanced random forest: This classifier modifies the traditional random forest 44 approach by randomly under-sampling each bootstrap sample to achieve class balance. In our study, we refer to this classifier as “B-RandomForest”.

Easy ensemble (AdaBoost ensemble): This classifier, known as EasyEnsemble 45 , is a collection of AdaBoost 46 learners trained on different balanced bootstrap samples. The balancing is achieved through random under-sampling. In our study, we refer to this classifier as “E-Ensemble”.

RUSBoost (AdaBoost): This classifier integrates random under-sampling into the learning process of AdaBoost: the training data are randomly under-sampled at each iteration of the boosting algorithm. In our study, we refer to this classifier as “B-Boosting”.

Bagging decision tree: This classifier operates like the standard Bagging 47 classifier in the scikit-learn library 48 with decision trees 49 as base learners, but it adds a step that balances each training set using a sampler. In our study, we refer to this classifier as “B-Bagging”.
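To make the per-iteration under-sampling of B-Boosting concrete, the following is a minimal from-scratch sketch of the RUSBoost idea using discrete AdaBoost with decision stumps. It is an illustration under assumptions (class 1 is the minority/dropout class, stumps are the weak learners, 30 rounds, weighted error measured on the full training set, synthetic data), not the imbalanced-learn implementation used in the study, whose internals differ; the helper names are hypothetical.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier


def fit_rusboost(X, y, n_rounds=30, seed=0):
    """Hypothetical helper: discrete AdaBoost with decision stumps, where each
    round trains on a randomly under-sampled, class-balanced subset
    (all class-1 samples plus an equal number of class-0 samples)."""
    rng = np.random.default_rng(seed)
    pos, neg = np.flatnonzero(y == 1), np.flatnonzero(y == 0)
    w = np.full(len(y), 1.0 / len(y))              # AdaBoost weights over the full training set
    stumps, alphas = [], []
    for _ in range(n_rounds):
        keep = np.concatenate([pos, rng.choice(neg, size=len(pos), replace=False)])
        stump = DecisionTreeClassifier(max_depth=1, random_state=0)
        stump.fit(X[keep], y[keep], sample_weight=w[keep])
        miss = stump.predict(X) != y               # mistakes on the full training set
        err = w[miss].sum()                        # weights sum to 1, so this is the weighted error
        if err >= 0.5:                             # no better than chance: stop boosting
            break
        err = max(err, 1e-10)                      # guard against log(0) for a perfect stump
        alpha = 0.5 * np.log((1.0 - err) / err)
        w = w * np.exp(np.where(miss, alpha, -alpha))  # up-weight mistakes, down-weight hits
        w = w / w.sum()
        stumps.append(stump)
        alphas.append(alpha)
    return stumps, alphas


def predict_rusboost(stumps, alphas, X):
    """Alpha-weighted vote, with stump outputs mapped from {0, 1} to {-1, +1}."""
    score = np.zeros(len(X))
    for stump, alpha in zip(stumps, alphas):
        score += alpha * (2 * stump.predict(X) - 1)
    return (score > 0).astype(int)


X, y = make_classification(n_samples=1000, n_features=20, weights=[0.9, 0.1], random_state=0)
stumps, alphas = fit_rusboost(X, y)
pred = predict_rusboost(stumps, alphas, X)
```

Under-sampling happens inside every boosting round, so each weak learner sees a fresh balanced subset while the AdaBoost weights are still tracked over all training samples.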

Each of these classifiers was selected for its specific strengths in handling class imbalance, a critical consideration in our study’s methodology. The next section elaborates on the performance and configurations of these classifiers, particularly B-RandomForest.
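The core idea of B-RandomForest, randomly under-sampling each bootstrap sample before growing a tree, can be sketched from scratch with scikit-learn decision trees. This is a simplified illustration, not imbalanced-learn’s `BalancedRandomForestClassifier` itself: the helper names, the 50-tree forest, `max_features="sqrt"`, hard majority voting, and the synthetic data are assumptions of the sketch.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.metrics import balanced_accuracy_score
from sklearn.tree import DecisionTreeClassifier


def fit_balanced_forest(X, y, n_trees=100, seed=0):
    """Hypothetical helper: one tree per bootstrap sample, where each bootstrap
    sample is randomly under-sampled so that classes 0 and 1 are equally large."""
    rng = np.random.default_rng(seed)
    n = len(y)
    trees = []
    for _ in range(n_trees):
        boot = rng.integers(0, n, size=n)            # bootstrap: n indices drawn with replacement
        pos, neg = boot[y[boot] == 1], boot[y[boot] == 0]
        m = min(len(pos), len(neg))                  # size of the smaller class in this bootstrap
        idx = np.concatenate([rng.choice(pos, m, replace=False),
                              rng.choice(neg, m, replace=False)])
        tree = DecisionTreeClassifier(max_features="sqrt",   # random feature subset at each split
                                      random_state=int(rng.integers(2**31 - 1)))
        trees.append(tree.fit(X[idx], y[idx]))
    return trees


def predict_majority(trees, X):
    """Majority vote over the trees' 0/1 predictions (ties go to class 1)."""
    votes = np.stack([t.predict(X) for t in trees])  # shape (n_trees, n_samples)
    return (votes.mean(axis=0) >= 0.5).astype(int)


X, y = make_classification(n_samples=1000, n_features=20, weights=[0.9, 0.1], random_state=0)
forest = fit_balanced_forest(X, y, n_trees=50)
pred = predict_majority(forest, X)
in_sample_score = balanced_accuracy_score(y, pred)  # in-sample, for illustration only
```

Because every tree is grown on a balanced sample, the minority class is not outvoted simply by being rare; a proper evaluation would use held-out folds rather than the in-sample score computed here.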

Random forest

The Random Forest (RF) method, introduced by Breiman in 2001 44 , is a machine learning approach that uses a collection of decision trees for prediction tasks. Its strength lies in its ensemble nature, where multiple “weak learners” (individual decision trees) combine to form a “strong learner” (the RF). Each decision tree in an RF reaches its prediction through a sequence of binary splits, each of which compares a feature against a threshold. The mathematical representation of a single decision tree’s prediction (\(T_d\)) for an input vector \({\varvec{I}}\) is given by the following formula:

Here, n signifies the total nodes in the tree, \(v_i\) is the value predicted at the i -th node, \(f_i({\varvec{I}})\) is the i -th feature of the input vector \({\varvec{I}}\) , \(t_i\) stands for the threshold at the i -th node, and \(\delta\) represents the indicator function.

In an RF, the predictions from D individual decision trees are aggregated to form the final output. For regression problems, these outputs are typically averaged, whereas a majority vote (mode) is used for classification tasks. For classification, the prediction of an RF (\(F_D\)) for an input vector \({\varvec{I}}\) is as follows:

\[ F_D({\varvec{I}}) = \operatorname{mode}\left\{T_1({\varvec{I}}), T_2({\varvec{I}}), \ldots, T_D({\varvec{I}})\right\} \]

In this equation, \(T_d({\varvec{I}})\) is the output of the d-th tree for input vector \({\varvec{I}}\), and D is the number of decision trees in the forest. Random Forests are particularly effective at reducing overfitting compared with individual decision trees because they aggregate results across many trees. In our study, we used 100 estimators with default settings from the scikit-learn library 48 .
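One implementation detail is worth keeping in mind when relating the majority-vote description to code: scikit-learn’s `RandomForestClassifier` aggregates its trees by averaging their predicted class probabilities (soft voting) rather than by counting hard votes. The sketch below uses synthetic data (an assumption; none of the study’s data are involved), fits a 100-tree forest, reproduces `predict_proba` from the individual trees, and measures how often a strict majority vote agrees with the forest’s own predictions.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=500, n_features=20, random_state=0)
rf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)

# Per-tree probability of class 1, shape (100, n_samples).
tree_proba = np.stack([tree.predict_proba(X)[:, 1] for tree in rf.estimators_])

# scikit-learn's forest averages these probabilities (soft voting)...
soft = tree_proba.mean(axis=0)

# ...whereas a strict majority vote counts each tree's hard 0/1 decision.
hard_votes = (tree_proba > 0.5).mean(axis=0)     # share of trees voting for class 1
majority = (hard_votes > 0.5).astype(int)

agreement = np.mean(majority == rf.predict(X))   # fraction of samples where both rules agree
```

With fully grown trees the two rules usually agree on almost every sample; they can differ only where the trees are nearly split.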

Figures of merit

To evaluate the efficacy of our classification models, we employed a set of essential evaluative metrics, known as figures of merit.

The accuracy metric reflects the fraction of correct predictions (encompassing both true positive and true negative outcomes) relative to the total number of predictions. The formula for accuracy is as follows:

\[ \text{Accuracy} = \frac{\text{TP} + \text{TN}}{\text{TP} + \text{TN} + \text{FP} + \text{FN}} \]

Notably, because the target classes were imbalanced, the accuracy values reported in our analysis correspond to balanced accuracy (defined below).

Precision, or the positive predictive value, represents the proportion of true positive predictions out of all positive predictions made. The equation to determine precision is as follows:

\[ \text{Precision} = \frac{\text{TP}}{\text{TP} + \text{FP}} \]

Recall, also called sensitivity, quantifies the percentage of actual positives that were correctly identified. The formula for calculating recall is as follows:

\[ \text{Recall} = \frac{\text{TP}}{\text{TP} + \text{FN}} \]

Specificity, also known as the true negative rate, measures the proportion of actual negatives that were correctly identified. The formula for specificity is as follows:

\[ \text{Specificity} = \frac{\text{TN}}{\text{TN} + \text{FP}} \]

The F1 Score is the harmonic mean of precision and recall, providing a balance between the two metrics. It is particularly useful when the class distribution is imbalanced. The formula for the F1 Score is as follows:

\[ \text{F1} = 2 \times \frac{\text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}} \]

In these formulas, \(\text{TP}\) represents true positives, \(\text{TN}\) stands for true negatives, \(\text{FP}\) refers to false positives, and \(\text{FN}\) denotes false negatives.
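As a worked example of these definitions, the sketch below counts TP, TN, FP, and FN for a small set of hypothetical labels and predictions, computes each metric directly from the formulas, and checks the results against scikit-learn. The labels are invented for illustration, and the scikit-learn calls use their default binary (positive-class) averaging, whereas the study reports macro-averaged precision and F1.

```python
import numpy as np
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score

# Hypothetical labels: 1 = dropout (positive class), 0 = non-dropout.
y_true = np.array([1, 1, 1, 0, 0, 0, 0, 0, 0, 0])
y_pred = np.array([1, 1, 0, 1, 0, 0, 0, 0, 0, 1])

tp = int(np.sum((y_true == 1) & (y_pred == 1)))  # 2
tn = int(np.sum((y_true == 0) & (y_pred == 0)))  # 5
fp = int(np.sum((y_true == 0) & (y_pred == 1)))  # 2
fn = int(np.sum((y_true == 1) & (y_pred == 0)))  # 1

accuracy = (tp + tn) / (tp + tn + fp + fn)            # 7/10 = 0.7
precision = tp / (tp + fp)                            # 2/4 = 0.5
recall = tp / (tp + fn)                               # 2/3
specificity = tn / (tn + fp)                          # 5/7
f1 = 2 * precision * recall / (precision + recall)    # 4/7

print(accuracy, precision, recall, specificity, f1)
```

Specificity has no dedicated scikit-learn function, but it equals the recall of the negative class, which is how it is checked here.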

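The balanced accuracy metric, as described by Brodersen et al. 50 , is an important measure for classification tasks, particularly when dealing with imbalanced datasets. For binary classification, this metric is calculated as follows:

\[ \text{Balanced accuracy} = \frac{1}{2}\left(\frac{\text{TP}}{\text{TP} + \text{FN}} + \frac{\text{TN}}{\text{TN} + \text{FP}}\right) \]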

Essentially, this equation is the average of the recall computed for each class. Balanced accuracy accounts for class imbalance by weighting each sample by the inverse prevalence of its true class. When the classes are equally represented, it is identical to conventional accuracy; when they are not, the weighting prevents the majority class from dominating the score. This adjustment makes balanced accuracy a more robust and reliable measure when the class distribution is uneven. In line with this approach, we also used the macro averages of F1 and precision in our computations.
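The two readings of balanced accuracy, the mean of per-class recalls and accuracy with inverse-prevalence sample weights, can be checked numerically. The sketch below reuses a small set of hypothetical labels and also shows how the macro averages of precision and F1 are requested in scikit-learn; the data are illustrative only.

```python
import numpy as np
from sklearn.metrics import balanced_accuracy_score, f1_score, precision_score, recall_score

# Hypothetical labels: 1 = dropout (30% of samples), 0 = non-dropout.
y_true = np.array([1, 1, 1, 0, 0, 0, 0, 0, 0, 0])
y_pred = np.array([1, 1, 0, 1, 0, 0, 0, 0, 0, 1])

# Mean of the per-class recalls: recall of class 0 (= specificity) and of class 1.
per_class_recall = recall_score(y_true, y_pred, average=None)   # [5/7, 2/3]
balanced = per_class_recall.mean()                               # 29/42, about 0.690

# Same value as accuracy with each sample weighted by 1 / prevalence of its true class.
weights = np.where(y_true == 1, 1 / np.mean(y_true == 1), 1 / np.mean(y_true == 0))
weighted_acc = np.sum(weights * (y_true == y_pred)) / np.sum(weights)

# Macro averages give each class equal weight regardless of its size.
macro_precision = precision_score(y_true, y_pred, average="macro")
macro_f1 = f1_score(y_true, y_pred, average="macro")
```

Plain accuracy on these labels is 0.7, slightly above the balanced value, because the better-classified majority class contributes more samples.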

A confusion matrix is a vital tool for understanding the performance of a classification model. In our study, the performance of each classification model was summarized in a binary confusion matrix: a \(2\times 2\) table that sorts the predictions into four distinct outcomes. The columns represent the classes predicted by the model (Predicted Negative and Predicted Positive), and the rows represent the actual classes (Actual Negative and Actual Positive).

The upper-left cell contains the true negatives (TN): instances where the model correctly predicted the negative class.

The upper-right cell contains the false positives (FP): cases where the model incorrectly predicted the positive class for actual negatives.

The lower-left cell contains the false negatives (FN): cases where the model incorrectly predicted the negative class for actual positives.

Finally, the lower-right cell contains the true positives (TP): cases where the model correctly predicted the positive class.
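scikit-learn’s `confusion_matrix` uses this same layout (rows are actual classes, columns are predicted classes, both ordered negative then positive), so the four cells can be unpacked directly. A small sketch with hypothetical labels:

```python
import numpy as np
from sklearn.metrics import confusion_matrix

# Hypothetical labels: 1 = dropout (positive), 0 = non-dropout (negative).
y_true = np.array([1, 1, 1, 0, 0, 0, 0, 0, 0, 0])
y_pred = np.array([1, 1, 0, 1, 0, 0, 0, 0, 0, 1])

cm = confusion_matrix(y_true, y_pred, labels=[0, 1])
# Layout: [[TN, FP],
#          [FN, TP]]
tn, fp, fn, tp = cm.ravel()
print(cm)  # [[5 2]
           #  [1 2]]
```

Passing `labels=[0, 1]` fixes the row and column order explicitly instead of relying on the sorted unique labels.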

In our study, we aggregated the results from all iterations of the cross-validation process to generate normalized average binary confusion matrices. Normalization of the confusion matrix involves converting the raw counts of true positives, false positives, true negatives, and false negatives into proportions, which account for the varying class distributions. This approach allows for a more comparable and intuitive understanding of the model’s performance, especially when dealing with imbalanced datasets. By analyzing the normalized matrices, we obtain a comprehensive view of the model’s predictive performance across the entire cross-validation run, instead of relying on a single instance.
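The averaging of normalized matrices across folds can be sketched as follows. The text does not specify whether each fold’s matrix was normalized before averaging or whether counts were summed first; this sketch assumes the former, with each row divided by the number of actual cases of that class (so the diagonal holds specificity and recall). The helper name and fold data are hypothetical.

```python
import numpy as np
from sklearn.metrics import confusion_matrix


def mean_normalized_confusion(fold_results):
    """Hypothetical helper. `fold_results` is a list of (y_true, y_pred) pairs,
    one per cross-validation fold. Each fold's matrix is row-normalized
    (divided by the number of actual negatives / positives), then averaged."""
    mats = [confusion_matrix(t, p, labels=[0, 1], normalize="true") for t, p in fold_results]
    return np.mean(mats, axis=0)


# Two toy folds with hypothetical predictions.
folds = [
    (np.array([0, 0, 1, 1]), np.array([0, 1, 1, 1])),   # TN rate 0.5, TP rate 1.0
    (np.array([0, 0, 0, 1]), np.array([0, 0, 0, 0])),   # TN rate 1.0, TP rate 0.0
]
avg_cm = mean_normalized_confusion(folds)
print(avg_cm)  # [[0.75 0.25]
               #  [0.5  0.5 ]]
```

Because every row sums to 1, the averaged matrix stays comparable across folds even when their class counts differ.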

The AUC score, the area under the receiver operating characteristic (ROC) curve, is a widely used metric in machine learning for evaluating the performance of binary classification models. It quantifies a model’s ability to distinguish between two classes. The ROC curve plots the true positive rate (TPR) against the false positive rate (FPR) at various threshold settings. By varying the threshold that determines the classification decision, the ROC curve illustrates the trade-off between sensitivity (TPR) and specificity (1 - FPR). The TPR and FPR are defined as follows:

\[ \text{TPR} = \frac{\text{TP}}{\text{TP} + \text{FN}}, \qquad \text{FPR} = \frac{\text{FP}}{\text{FP} + \text{TN}} \]

The AUC score represents the area under the ROC curve and ranges from 0 to 1. An AUC score of 0.50 is equivalent to random guessing and indicates that the model has no discriminative ability. On the other hand, a model with an AUC score of 1.0 demonstrates perfect classification. A higher AUC score suggests a better model performance in terms of distinguishing between the positive and negative classes.
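A useful way to read the AUC is as the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, with ties counting one half. The sketch below computes the AUC this way for a few hypothetical scores and compares it with scikit-learn’s `roc_auc_score`; the helper name and the data are illustrative assumptions.

```python
import numpy as np
from sklearn.metrics import roc_auc_score


def auc_by_pairs(y_true, scores):
    """Hypothetical helper: fraction of (positive, negative) pairs in which the
    positive case gets the higher score; tied pairs count as one half."""
    pos = scores[y_true == 1]
    neg = scores[y_true == 0]
    diff = pos[:, None] - neg[None, :]               # all positive-negative score differences
    return (np.sum(diff > 0) + 0.5 * np.sum(diff == 0)) / diff.size


y_true = np.array([0, 0, 1, 1, 0, 1])
scores = np.array([0.1, 0.4, 0.35, 0.8, 0.5, 0.9])   # e.g. predicted dropout probabilities

auc_pairs = auc_by_pairs(y_true, scores)             # 7 of 9 pairs ranked correctly
auc_sklearn = roc_auc_score(y_true, scores)
```

Under this reading, an AUC of 0.50 means positives and negatives are ranked no better than by coin flip, matching the interpretation given above.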

Cross-validation

In this study, we employed stratified K-fold cross-validation with \(K=6\) to assess the robustness and generalizability of our approach 51 . This method partitions the dataset into \(K\) distinct subsets, or folds, each with approximately the same class distribution as the overall dataset. In each iteration, one fold is designated as the test set, while the remaining folds together form the training set. This cycle is repeated \(K\) times, with a different fold used as the test set each time. This technique ensured that the model’s performance was consistently evaluated against varied data samples. One formal representation of this process with \(K=6\) is as follows:

\[ \text{CV}({\mathscr {M}}, {\mathscr {D}}) = \frac{1}{6}\sum_{k=1}^{6} \text{Eval}\left({\mathscr {M}}({\mathscr {D}}_k^\text{train}), {\mathscr {D}}_k^\text{test}\right) \]

Here, \({\mathscr {M}}\) represents the machine learning model, \({\mathscr {D}}\) is the dataset, \({\mathscr {D}}_k^\text{train}\) and \({\mathscr {D}}_k^\text{test}\) respectively denote the training and test datasets for the \(k\)-th fold, and \(\text{Eval}\) is the evaluation function (e.g. accuracy, precision, recall). Our AUC plots were generated using the forthcoming version of utility functions from the Deep Fast Vision Python Library 52 .
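The stratified six-fold procedure described above can be sketched with scikit-learn’s `StratifiedKFold`. In this sketch the data are synthetic, a plain `RandomForestClassifier` stands in for the study’s imbalanced-learn classifiers, balanced accuracy plays the role of \(\text{Eval}\), and shuffling with a fixed seed is an assumption; the loop illustrates the procedure rather than reproducing the study’s setup.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import balanced_accuracy_score
from sklearn.model_selection import StratifiedKFold

X, y = make_classification(n_samples=600, n_features=20, weights=[0.85, 0.15], random_state=0)

skf = StratifiedKFold(n_splits=6, shuffle=True, random_state=0)
fold_scores, fold_pos_rates = [], []
for train_idx, test_idx in skf.split(X, y):
    # Stand-in model; any classifier with fit/predict could be plugged in here.
    model = RandomForestClassifier(n_estimators=100, random_state=0)
    model.fit(X[train_idx], y[train_idx])                       # train on D_k^train
    y_hat = model.predict(X[test_idx])
    fold_scores.append(balanced_accuracy_score(y[test_idx], y_hat))  # Eval on D_k^test
    fold_pos_rates.append(y[test_idx].mean())                   # class-1 share in this fold

cv_score = np.mean(fold_scores)   # (1/6) * sum of the six per-fold evaluations
```

The near-identical `fold_pos_rates` values show the effect of stratification: every test fold keeps roughly the same share of the minority class as the full dataset.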

Ethics declarations

Ethical approval for the original data collection was obtained from the Ethical Committee of the University of Jyväskylä in 2006 and 2018, ensuring that all experiments were performed in accordance with relevant guidelines and regulations.

This study utilized a comprehensive 13-year longitudinal dataset from kindergarten through upper secondary education. We applied machine learning techniques with data up to Grade 9 (age 15–16), and subsequently with data up to Grade 6 (age 12–13), to classify registered upper secondary education dropout and non-dropout status. The dataset included a broad range of educational data on students’ academic and cognitive skills, motivation, behavior, and well-being. Given the class imbalance in the target, we trained four classifiers: Balanced Random Forest (B-RandomForest), Easy Ensemble (AdaBoost Ensemble; E-Ensemble), RUSBoost (AdaBoost; B-Boosting), and Bagging Decision Tree (B-Bagging). The performance of each classifier was evaluated using six-fold cross-validation, as shown in Fig. 4 and Table 1.

Figure 4. Confusion matrices for classifiers using data up to Grade 9 (first row) and up to Grade 6 (second row), averaged across all folds in six-fold cross-validation.

Our analysis using data up to Grade 9 (Fig. 4, Table 1) revealed that the B-RandomForest classifier was the most effective, achieving the highest balanced mean accuracy (0.61). It also showed a recall of 0.60 (dropout class) and a specificity of 0.62 (non-dropout class). Although the other classifiers matched or exceeded this specificity (B-Bagging: 0.78, E-Ensemble: 0.64, B-Boosting: 0.62), they underperformed in classifying true positives (B-Bagging: 0.32, B-Boosting: 0.50, E-Ensemble: 0.56) and had higher false negative rates (B-Bagging: 0.68, B-Boosting: 0.48, E-Ensemble: 0.45). The B-RandomForest classifier achieved a mean area under the curve (AUC) of 0.65, indicating modest discriminative ability above chance level (Fig. 5).

Figure 5. ROC curves for the B-RandomForest classifiers from cross-validation. Left: curve for the B-RandomForest classifier trained using data up to Grade 9. Right: curve for another classifier instance trained using data up to Grade 6.

We further obtained the feature scores for the B-RandomForest models across the six-fold cross-validation (Fig. 6; for the full list, refer to Supplementary Table S1). The top 20 features (ranked by their scores averaged across folds) fell into two domains: cognitive skills and academic outcomes. Supplementary Table S3 provides a detailed description of all features. Academic outcomes were the dominant domain and included reading fluency skills in Grades 1, 2, 3, 4, 7, and 9, reading comprehension in Grades 1, 2, 3, and 4, PISA reading comprehension outcomes, arithmetic skills in Grades 1, 2, 3, and 4, and multiplication skills in Grades 4 and 7. The top-ranked features also included two cognitive skills assessed in kindergarten: vocabulary and rapid automatized naming (RAN), which involved naming a series of visual stimuli consisting of pictures of objects (e.g. a ball, a house) as quickly as possible.
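Averaging feature scores across cross-validation folds and keeping the top 20 can be sketched as follows. The sketch assumes the scores are scikit-learn’s impurity-based `feature_importances_`, which the text does not state explicitly; it uses synthetic data, a plain `RandomForestClassifier` as a stand-in for B-RandomForest, and hypothetical feature names.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold

X, y = make_classification(n_samples=600, n_features=30, n_informative=5,
                           weights=[0.85, 0.15], random_state=0)
feature_names = np.array([f"feature_{j}" for j in range(X.shape[1])])  # hypothetical names

importances = []
for train_idx, _ in StratifiedKFold(n_splits=6, shuffle=True, random_state=0).split(X, y):
    model = RandomForestClassifier(n_estimators=100, random_state=0)
    model.fit(X[train_idx], y[train_idx])
    importances.append(model.feature_importances_)   # impurity-based scores, sum to 1 per fold

mean_importance = np.mean(importances, axis=0)       # average score of each feature over folds
order = np.argsort(mean_importance)[::-1]            # highest average score first
top20 = list(zip(feature_names[order[:20]], mean_importance[order[:20]]))
```

As the discussion emphasizes, such rankings describe what the model relied on within this dataset and should not be read as causal or correlational findings.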

Figure 6. The 20 top-ranked features for the B-RandomForest using data up to Grade 9. Features are listed in order of average score from top to bottom; the scores are averages across all folds of the six-fold cross-validation. The features listed pertain to: READ2=Reading fluency, Grade 2; READ4=Reading fluency, Grade 4; READ3=Reading fluency, Grade 3; READ1=Reading fluency, Grade 1; RAN=Rapid Automatized Naming, Kindergarten; multSC7=Multiplication, Grade 4; ariSC4=Arithmetic, Grade 1 spring; ly1C5C=Reading comprehension, Grade 2; ariSC6=Arithmetic, Grade 3; ly4C7C=Reading comprehension, Grade 4; ly1C4C=Reading comprehension, Grade 1; ariSC7=Arithmetic, Grade 4; ariSC5=Arithmetic, Grade 2; ppvSC2=Vocabulary, Kindergarten; pisaC10total_sum=PISA, Grade 9; ariSC3=Arithmetic, Grade 1 fall; multSC9=Multiplication, Grade 7; READ9=Reading fluency, Grade 9; READ7=Reading fluency, Grade 7; ly1C6C=Reading comprehension, Grade 3.

Classifying school dropout using data up to grade 6

Using data from kindergarten up to Grade 6, we retrained the same four classifiers on this reduced dataset and evaluated their performance using six-fold cross-validation (Fig. 4, Table 2). The B-RandomForest classifier performed best, with a balanced mean accuracy of 0.59. It showed a recall of 0.59 (dropout class) and a specificity of 0.59 (non-dropout class). In comparison, the other classifiers had higher specificities (B-Bagging: 0.76, B-Boosting: 0.62, E-Ensemble: 0.61) but lower recall (B-Bagging: 0.30, B-Boosting: 0.50, E-Ensemble: 0.56) and higher false negative rates (B-Bagging: 0.70, B-Boosting: 0.50, E-Ensemble: 0.44). The B-RandomForest classifier achieved an AUC of 0.61 (Fig. 5). Its performance was slightly lower than, but comparable to, that of the classifier trained on the more extensive dataset up to Grade 9.

We obtained the feature scores for the B-RandomForest models across the six-fold cross-validation with data up to Grade 6 (Fig. 7; for the full list, refer to Supplementary Table S2). The top 20 features spanned four domains: cognitive skills, academic outcomes, motivation, and family background. Supplementary Table S3 contains a detailed description of all features. As in the previous models, academic outcomes ranked highest, consisting of reading fluency skills in Grades 1, 2, 3, 4, and 6, reading comprehension in Grades 1, 2, 4, and 6, arithmetic skills in Grades 1, 2, 3, and 4, and multiplication skills in Grades 4 and 6. A motivational factor (task value for math, Grade 6), parental education level, and two cognitive skills assessed in kindergarten (RAN and vocabulary) were also included in the ranking.

Figure 7. The 20 top-ranked features for the B-RandomForest using data up to Grade 6. Features are listed in order of average score from top to bottom; the scores are averages across all folds of the six-fold cross-validation. READ1=Reading fluency, Grade 1; READ2=Reading fluency, Grade 2; READ4=Reading fluency, Grade 4; multSC7=Multiplication, Grade 4; READ3=Reading fluency, Grade 3; RAN=Rapid Automatized Naming, Kindergarten; ariSC4=Arithmetic, Grade 1 spring; multSC8=Multiplication, Grade 6; READ6=Reading fluency, Grade 6; ariSC6=Arithmetic, Grade 3; ly1C5C=Reading comprehension, Grade 2; ariSC7=Arithmetic, Grade 4; ariSC5=Arithmetic, Grade 2; voedo=Parental education; ly4C7C=Reading comprehension, Grade 4; ly1C4C=Reading comprehension, Grade 1; ppvSC2=Vocabulary, Kindergarten; tavma_g6=Task value for math, Grade 6; ariSC3=Arithmetic, Grade 1 fall; ly6C8C=Reading comprehension, Grade 6.

This study signifies a major advancement in educational research, as it provides the first predictive models leveraging data from as early as kindergarten to forecast upper secondary school dropout. By utilizing a comprehensive 13-year longitudinal dataset from kindergarten through upper secondary education, we developed predictive models using the Balanced Random Forest (B-RandomForest) classifier, which effectively predicted both dropout and non-dropout cases from as early as Grade 6.

The classifier’s consistency was evident from its performance, which showed only a slight decrease in AUC, from 0.65 with data up to Grade 9 to 0.61 with data up to Grade 6. These results are notable because they show that predictive ability is largely retained when only data up to Grade 6 are used. Upon further validation and investigation, and with more data, this approach may assist in predicting dropout and non-dropout as early as the end of primary school. However, the deployment and practical application of these findings must be preceded by further data collection, study, and validation. The developed predictive models offer some indicators for future proactive approaches that could help educators within their established protocols for identifying and supporting at-risk students. Such an approach could set a new precedent for enhancing student retention and success, potentially leading to transformative changes in educational systems and policies. While our predictive models mark a significant advancement in early automatic identification, this study is only the first step in a broader process.

The use of register data was a strength of this study because it allowed us to conceptualize dropout not merely as a singular event but as a comprehensive measure of on-time upper secondary education graduation. This approach is particularly relevant for students who do not graduate by the expected time, as it highlights their high risk of encountering problems in later education and the job market and underscores the need for targeted supplementary support 37 , 53 . This conceptualization of dropout offers several advantages 53 , as it aligns with the nuanced nature of dropout and late graduation dynamics in educational practice. Additionally, it avoids mistakenly applying the dropout category to students who switch between secondary school tracks yet still graduate within the expected timeframe, or who drop out multiple times but ultimately graduate on time. From the perspective of the school system, delays in graduation incur substantial costs and necessitate intensive educational strategies. This nuanced understanding of dropout and non-dropout underpins the primary objective of our approach: to help empower educators with tools that can assist them in their evaluation of intervention needs.

In our study, we adopted a comprehensive approach to feature collection, acknowledging that the process of dropping out begins in the early school years 35 and evolves through protracted disengagement and disassociation from education 36 . With over 300 features covering a wide array of domains (family background, individual factors, behavior, motivation, engagement, bullying, health behavior, media usage, cognitive skills, and academic outcomes), our dataset presents a challenge typical of high-dimensional data: the curse of dimensionality. This phenomenon, where the volume of the feature space grows exponentially with the number of features, can lead to sparsity of data and make pattern recognition more complex.

To address these challenges, we employed machine learning classifiers such as Random Forest, which are well suited to high-dimensional data. Random Forest implicitly performs a form of feature selection, which is useful in high-dimensional spaces, by considering only a random subset of features at each split of each tree. This not only helps reduce the risk of overfitting but also enhances the model’s ability to identify intricate patterns in the data. This comprehensive analysis, with the use of machine learning, not only advances the methodology of automatic dropout and non-dropout prediction but also provides educators and policymakers with valuable tools and insights into the multifaceted nature of dropout and non-dropout identification from the perspective of machine learning classifiers.

In our study, the small size of the positive class (the dropout cases) posed a significant challenge because it made the classification data highly imbalanced. This imbalance steered our methodological decisions, leading us to forgo both neural network synthesis and conventional oversampling techniques. Instead, we focused on classification methods designed to handle highly imbalanced datasets.

Another important limitation to acknowledge pertains to the initial dataset and the subsequent handling of missing data. The study initially recruited around 2,000 kindergarten-age children and then invited their classmates to join the study at each subsequent educational stage. While this approach expanded the participant pool, it also resulted in a significant amount of missing data in many features. To maintain reliability, we excluded features with more than 30% missing values. This aspect of our methodological approach highlights the challenges of managing large-scale longitudinal data. Future studies might explore alternative strategies for handling missing data or investigate ways to include a broader range of features for feature selection, while mitigating the impact of incomplete data and the curse of dimensionality.

Despite these limitations, this study addresses a key shortcoming of current research: the focus on short-term horizons. Previous studies that used machine learning to predict upper secondary education dropout have operated within limited timeframes, either collecting data on student traits and dropout cases within the same academic year 27 , 28 , limiting the collection of data on student traits to upper secondary education 29 , 30 , 31 , or collecting data on student traits during both lower and upper secondary school 32 . Two previous studies predicted dropout within three years 33 and five years 34 of data collection, respectively. The present study extends this horizon by leveraging a 13-year longitudinal dataset, using features from kindergarten onward, and predicting upper secondary school dropout and non-dropout as early as the end of primary school.

Our study identified a set of top features from Grades 1 to 4 that the Random Forest classifier highlighted as influential in predicting dropout or non-dropout status. These features included reading fluency, reading comprehension, and arithmetic skills. These top feature rankings did not change substantially between the models using data up to Grade 9 and up to Grade 6. It is important to note that these features were identified based on their utility in improving the model’s predictions within the dataset and cross-validation; they should not be interpreted as causal or correlational factors for dropout and non-dropout rates. Given these limitations, and considering known across-time feature correlations 54 , 55 , 56 , 57 , 58 , 59 , we consider it worthwhile to offer some speculative discussion of this ranking consistency between early and later academic grades. If, upon further data collection, validation, and correlational and causal analysis, this ranking profile is re-established and validated, it could indicate that early proficiency in these key academic areas is potentially an important factor influencing students’ educational trajectory and dropout risk.

In conclusion, this study represented a significant leap forward in educational research by developing predictive models that automatically distinguished between dropouts and non-dropouts as early as Grade 6. Utilizing a comprehensive 13-year longitudinal dataset, our research enriches existing knowledge of automatic school dropout and non-dropout detection and surpasses the time-frame confines of prior studies. While incorporating data up to Grade 9 enhanced predictive accuracy, the primary aim of our study was to predict potential school dropout status at an early stage. The Balanced Random Forest classifier demonstrated proficiency across educational stages. Although confronted with challenges such as handling missing data and dealing with small positive class sizes, our methodological approach was meticulously designed to address such issues.

The developed predictive models demonstrate potential for further investigation. Given that our study predominantly utilized data from the Finnish educational system, it is not clear how the classifiers would perform with different populations. Additional data, including data from populations from different demographic and educational contexts, and further validation using independent test sets are essential. Further independent correlational and causal analyses are also crucial. In future iterations, such models may have the potential to proactively support educators’ processes and existing protocols for identifying at-risk students, thereby potentially aiding in the reinvention of student retention and success strategies, and ultimately contributing to improved educational outcomes.

Data availability

The datasets generated during and/or analysed during the current study are available from the corresponding author on reasonable request.

Huisman, J. & Smits, J. Keeping children in school: Effects of household and context characteristics on school dropout in 363 districts of 30 developing countries. SAGE Open 5 , 2158244015609666. https://doi.org/10.1177/2158244015609666 (2015).

Breton, T. R. Can institutions or education explain world poverty? An augmented Solow model provides some insights. J. Socio-Econ. 33 , 45–69. https://doi.org/10.1016/j.socec.2003.12.004 (2004).

The World Bank. The Human Capital Index 2020 Update: Human Capital in the Time of COVID-19 (The World Bank, 2021).

Bäckman, O. High school dropout, resource attainment, and criminal convictions. J. Res. Crime Delinq. 54 , 715–749. https://doi.org/10.1177/0022427817697441 (2017).

Bjerk, D. Re-examining the impact of dropping out on criminal and labor outcomes in early adulthood. Econ. Educ. Rev. 31 , 110–122. https://doi.org/10.1016/j.econedurev.2011.09.003 (2012).

Campolieti, M., Fang, T. & Gunderson, M. Labour market outcomes and skill acquisition of high-school dropouts. J. Labor Res. 31 , 39–52. https://doi.org/10.1007/s12122-009-9074-5 (2010).

Dragone, D., Migali, G. & Zucchelli, E. High school dropout and the intergenerational transmission of crime. IZA Discuss. Paper https://doi.org/10.2139/ssrn.3794075 (2021).

Catterall, J. S. The societal benefits and costs of school dropout recovery. Educ. Res. Int. 2011 , 957303. https://doi.org/10.1155/2011/957303 (2011).

Freudenberg, N. & Ruglis, J. Reframing school dropout as a public health issue. Prev. Chronic Dis. 4 , A107 (2007).

Kallio, J. M., Kauppinen, T. M. & Erola, J. Cumulative socio-economic disadvantage and secondary education in Finland. Eur. Sociol. Rev. 32, 649–661. https://doi.org/10.1093/esr/jcw021 (2016).

Gubbels, J., van der Put, C. E. & Assink, M. Risk factors for school absenteeism and dropout: A meta-analytic review. J. Youth Adolesc. 48, 1637–1667. https://doi.org/10.1007/s10964-019-01072-5 (2019).

EUROSTAT. Early leavers from education and training (2021).

Official Statistics of Finland (OSF). Discontinuation of education (2022).

LeCun, Y., Bengio, Y. & Hinton, G. Deep learning. Nature 521, 436–444 (2015).

Esteva, A. et al. Dermatologist-level classification of skin cancer with deep neural networks. Nature 542, 115–118. https://doi.org/10.1038/nature21056 (2017).

Liu, X. et al. A comparison of deep learning performance against health-care professionals in detecting diseases from medical imaging: A systematic review and meta-analysis. Lancet Digit. Health 1, e271–e297. https://doi.org/10.1016/S2589-7500(19)30123-2 (2019).

Prezja, F., Annala, L., Kiiskinen, S., Lahtinen, S. & Ojala, T. Synthesizing bidirectional temporal states of knee osteoarthritis radiographs with cycle-consistent generative adversarial neural networks. Preprint at http://arxiv.org/abs/2311.05798 (2023).

Prezja, F., Paloneva, J., Pölönen, I., Niinimäki, E. & Äyrämö, S. DeepFake knee osteoarthritis X-rays from generative adversarial neural networks deceive medical experts and offer augmentation potential to automatic classification. Sci. Rep. 12, 18573. https://doi.org/10.1038/s41598-022-23081-4 (2022).

Prezja, F. et al. Improving performance in colorectal cancer histology decomposition using deep and ensemble machine learning. Preprint at http://arxiv.org/abs/2310.16954 (2023).

Topol, E. J. High-performance medicine: The convergence of human and artificial intelligence. Nat. Med. 25, 44–56. https://doi.org/10.1038/s41591-018-0300-7 (2019).

Wornow, M. et al. The shaky foundations of clinical foundation models: A survey of large language models and foundation models for EMRs. Preprint at http://arxiv.org/abs/2303.12961 (2023).

Peng, Z. et al. Kosmos-2: Grounding multimodal large language models to the world. Preprint at http://arxiv.org/abs/2306.14824 (2023).

Livne, M. et al. nach0: Multimodal natural and chemical languages foundation model. Preprint at http://arxiv.org/abs/2311.12410 (2023).

Luo, Y. et al. Biomedgpt: Open multimodal generative pre-trained transformer for biomedicine. Preprint at http://arxiv.org/abs/2308.09442 (2023).

Bernardo, A. B. I. et al. Profiling low-proficiency science students in the Philippines using machine learning. Humanit. Soc. Sci. Commun. 10, 192. https://doi.org/10.1057/s41599-023-01705-y (2023).

Bilal, M., Omar, M., Anwar, W., Bokhari, R. H. & Choi, G. S. The role of demographic and academic features in a student performance prediction. Sci. Rep. 12, 12508. https://doi.org/10.1038/s41598-022-15880-6 (2022).

Krüger, J. G. C., Alceu de Souza, B. J. & Barddal, J. P. An explainable machine learning approach for student dropout prediction. Expert Syst. Appl. 233, 120933. https://doi.org/10.1016/j.eswa.2023.120933 (2023).

Sara, N.-B., Halland, R., Igel, C. & Alstrup, S. High-school dropout prediction using machine learning: A Danish large-scale study. In ESANN, vol. 2015, 23rd (2015).

Chung, J. Y. & Lee, S. Dropout early warning systems for high school students using machine learning. Child. Youth Serv. Rev. 96, 346–353. https://doi.org/10.1016/j.childyouth.2018.11.030 (2019).

Lee, S. & Chung, J. Y. The machine learning-based dropout early warning system for improving the performance of dropout prediction. Appl. Sci. https://doi.org/10.3390/app9153093 (2019).

Sansone, D. Beyond early warning indicators: High school dropout and machine learning. Oxf. Bull. Econ. Stat. 81, 456–485. https://doi.org/10.1111/obes.12277 (2019).

Aguiar, E. et al. Who, when, and why: A machine learning approach to prioritizing students at risk of not graduating high school on time. In Proc. of the Fifth International Conference on Learning Analytics And Knowledge, LAK ’15, 93–102, https://doi.org/10.1145/2723576.2723619 (Association for Computing Machinery, New York, NY, USA, 2015).

Colak, O. Z. et al. School dropout prediction and feature importance exploration in Malawi using household panel data: Machine learning approach. J. Comput. Soc. Sci. 6, 245–287. https://doi.org/10.1007/s42001-022-00195-3 (2023).

Sorensen, L. C. “Big Data” in educational administration: An application for predicting school dropout risk. Educ. Adm. Q. 55, 404–446. https://doi.org/10.1177/0013161X18799439 (2019).

Schoeneberger, J. A. Longitudinal attendance patterns: Developing high school dropouts. Clear. House J. Educ. Strat. Issues Ideas 85, 7–14. https://doi.org/10.1080/00098655.2011.603766 (2012).

Balfanz, R., Herzog, L., Douglas, I. & Mac, J. Preventing student disengagement and keeping students on the graduation path in urban middle-grades schools: Early identification and effective interventions. Educ. Psychol. 42, 223–235. https://doi.org/10.1080/00461520701621079 (2007).

Rumberger, R. W. Why Students Drop Out of High School and What Can Be Done About It (Harvard University Press, 2012).

De Witte, K., Cabus, S., Thyssen, G., Groot, W. & van Den Brink, H. M. A critical review of the literature on school dropout. Educ. Res. Rev. 10, 13–28 (2013).

Esch, P. et al. The downward spiral of mental disorders and educational attainment: A systematic review on early school leaving. BMC Psychiatry 14, 1–13 (2014).

Lerkkanen, M.-K. et al. The first steps study [alkuportaat] (2006-2016).

Vasalampi, K. & Aunola, K. The school path: From first steps to secondary and higher education study [koulupolku: Alkuportailta jatko-opintoihin] (2016).

Official Statistics of Finland (OSF). Statistical databases (2007).

Lemaître, G., Nogueira, F. & Aridas, C. K. Imbalanced-learn: A Python toolbox to tackle the curse of imbalanced datasets in machine learning. J. Mach. Learn. Res. 18, 1–5 (2017).

Breiman, L. Random forests. Mach. Learn. 45, 5–32 (2001).

Liu, X.-Y., Wu, J. & Zhou, Z.-H. Exploratory undersampling for class-imbalance learning. IEEE Trans. Syst. Man Cybern. B (Cybernetics) 39, 539–550 (2008).

Freund, Y. & Schapire, R. E. A decision-theoretic generalization of on-line learning and an application to boosting. J. Comput. Syst. Sci. 55, 119–139 (1997).

Breiman, L. Bagging predictors. Mach. Learn. 24, 123–140 (1996).

Pedregosa, F. et al. Scikit-learn: Machine learning in Python. J. Mach. Learn. Res. 12, 2825–2830 (2011).

Quinlan, J. R. Induction of decision trees. Mach. Learn. 1, 81–106 (1986).

Brodersen, K. H., Ong, C. S., Stephan, K. E. & Buhmann, J. M. The balanced accuracy and its posterior distribution. In 2010 20th international conference on pattern recognition, 3121–3124 (IEEE, 2010).

Kohavi, R. et al. A study of cross-validation and bootstrap for accuracy estimation and model selection. IJCAI 14, 1137–1145 (1995).

Prezja, F. Deep fast vision: A Python library for accelerated deep transfer learning vision prototyping. Preprint at http://arxiv.org/abs/2311.06169 (2023).

Knowles, J. E. Of needles and haystacks: Building an accurate statewide dropout early warning system in Wisconsin. J. Educ. Data Min. 7, 18–67. https://doi.org/10.5281/zenodo.3554725 (2015).

Aunola, K., Leskinen, E., Lerkkanen, M.-K. & Nurmi, J.-E. Developmental dynamics of math performance from preschool to Grade 2. J. Educ. Psychol. 96, 699–713. https://doi.org/10.1037/0022-0663.96.4.699 (2004).

Ricketts, J., Lervåg, A., Dawson, N., Taylor, L. A. & Hulme, C. Reading and oral vocabulary development in early adolescence. Sci. Stud. Read. 24, 380–396. https://doi.org/10.1080/10888438.2019.1689244 (2020).

Verhoeven, L. & van Leeuwe, J. Prediction of the development of reading comprehension: A longitudinal study. Appl. Cogn. Psychol. 22, 407–423. https://doi.org/10.1002/acp.1414 (2008).

Khanolainen, D. et al. Longitudinal effects of the home learning environment and parental difficulties on reading and math development across Grades 1–9. Front. Psychol. https://doi.org/10.3389/fpsyg.2020.577981 (2020).

Psyridou, M. et al. Developmental profiles of arithmetic fluency skills from grades 1 to 9 and their early identification. Dev. Psychol. 59, 2379–2396. https://doi.org/10.1037/dev0001622 (2023).

Psyridou, M. et al. Developmental profiles of reading fluency and reading comprehension from grades 1 to 9 and their early identification. Dev. Psychol. 57, 1840–1854. https://doi.org/10.1037/dev0000976 (2021).

Acknowledgements

The First Steps Study was funded by grants from the Academy of Finland (Grant numbers: 213486, 263891, 268586, 292466, 276239, 284439, and 313768). The School Path study was funded by grants from the Academy of Finland (Grant numbers: 299506 and 323773). This research was also partly funded by the Strategic Research Council (SRC) established within the Academy of Finland (Grant numbers: 335625, 335727, 345196, 358490, and 358250 for the project CRITICAL and Grant numbers: 352648 and 353392 for the project Right to Belong). In addition, Maria Psyridou was supported by the Academy of Finland (Grant number: 339418).

Author information

Authors and affiliations

Department of Psychology, University of Jyväskylä, 40014, Jyväskylä, Finland

Maria Psyridou

Faculty of Information Technology, University of Jyväskylä, 40014, Jyväskylä, Finland

Fabi Prezja

Department of Teacher Education, University of Jyväskylä, 40014, Jyväskylä, Finland

Minna Torppa, Marja-Kristiina Lerkkanen & Anna-Maija Poikkeus

Department of Education, University of Jyväskylä, 40014, Jyväskylä, Finland

Kati Vasalampi

Contributions

M.P. conceived the experiment, was involved in data curation, and analysed the results. F.P. was involved in data curation and analysed the results. M.K.L. was involved in data collection. A.M.P. was involved in data collection. M.T. conceived the experiment and was involved in data curation and data collection. K.V. conceived the experiment and was involved in data curation and data collection. All authors reviewed the manuscript.

Corresponding author

Correspondence to Maria Psyridou.

Competing interests

The authors declare no competing interests.

Additional information

Publisher's note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

About this article

Cite this article

Psyridou, M., Prezja, F., Torppa, M. et al. Machine learning predicts upper secondary education dropout as early as the end of primary school. Sci Rep 14, 12956 (2024). https://doi.org/10.1038/s41598-024-63629-0

Received: 27 February 2024

Accepted: 30 May 2024

Published: 05 June 2024

DOI: https://doi.org/10.1038/s41598-024-63629-0

  • Education dropout
  • Longitudinal data
  • Upper secondary education
  • Comprehensive education
  • Kindergarten
  • Academic outcomes

primary vs secondary research article

COMMENTS

  1. Primary vs. Secondary Sources

    Primary sources provide raw information and first-hand evidence. Examples include interview transcripts, statistical data, and works of art. Primary research gives you direct access to the subject of your research. Secondary sources provide second-hand information and commentary from other researchers. Examples include journal articles, reviews ...

  2. Identifying Primary and Secondary Research Articles

    Determining Primary versus Secondary Using the Database Abstract. Information found in PubMed, CINAHL, Scopus, and other databases can help you determine whether the article you're looking at is primary or secondary. Primary research article abstract. Note that in the "Objectives" field, the authors describe their single, individual study.

  3. Primary vs secondary research

    Primary research definition. When you conduct primary research, you're collecting data by doing your own surveys or observations. Secondary research definition: In secondary research, you're looking at existing data from other researchers, such as academic journals, government agencies or national statistics. Free Ebook: The Qualtrics ...

  4. Tutorial: Evaluating Information: Primary vs. Secondary Articles

    Primary vs. Secondary Research Articles. In the sciences, primary (or empirical) research articles: are original scientific reports of new research findings (Please note that an original scientific article does not include review articles, which summarize the research literature on a particular subject, or articles using meta-analyses, which ...

  5. Distinguish Between Primary and Secondary Sources

    1. Introduction. Whether conducting research in the social sciences, humanities (especially history), arts, or natural sciences, the ability to distinguish between primary and secondary source material is essential. Basically, this distinction illustrates the degree to which the author of a piece is removed from the actual event being described, informing the reader as to whether the author is ...

  6. Understanding Primary and Secondary Sources

    Note for research in the sciences: Primary sources in the sciences are forms of documentation of original research. This could be a conference paper, presentation, journal article, lab notebook, dissertation, or patent. ... Primary and secondary source quiz from the Ithaca College Library: https://library.ithaca.edu/r101/primary/ Learning ...

  7. Primary Research vs Secondary Research in 2024: Definitions

    Examples of Primary Research vs Secondary Research. The following table illustrates the differences between primary and secondary research examples. The first column lists examples of topics, while the second column provides primary research examples of methods and materials that researchers can use for collecting data on these topics.

  8. Primary vs. Secondary Sources

    A primary source gives you direct access to the subject of your research. Secondary sources provide second-hand information and commentary from other researchers. Examples include journal articles, reviews, and academic books. A secondary source describes, interprets, or synthesises primary sources. Primary sources are more credible as evidence ...

  9. Primary Research

    Tip: Primary vs. secondary sources It can be easy to get confused about the difference between primary and secondary sources in your research. The key is to remember that primary sources provide firsthand information and evidence, while secondary sources provide secondhand information and commentary from previous works.

  10. Primary vs Secondary Sources

    The best way to tell the difference between an original research article (primary) and a review article (secondary) is to look at the Methods section. Original research articles must have a Methods section. This section describes the study design, including who/what was the subject of the study, what was done to the subjects and how, how long ...

  11. Primary vs Secondary Research: Differences, Methods, Sources, and More

    Primary vs Secondary Sources in Research. Both primary and secondary sources of research form the backbone of the insight generation process; when both are utilized in tandem, they can provide the perfect stepping stone for the generation of real insights. Let's explore how each category serves its unique purpose in the research ecosystem.

  12. Primary vs. Secondary Sources

    Secondary Sources. Secondary sources list, summarize, compare, and evaluate primary information and studies so as to draw conclusions on or present current state of knowledge in a discipline or subject. Sources may include a bibliography which may direct you back to the primary research reported in the article. Secondary Sources include:

  13. Primary Research vs. Secondary Research

    Not every article in a peer-reviewed journal is a primary research article or even a secondary research article. These journals often contain book/product reviews, opinion pieces, advice for practitioners, literature reviews, et cetera. Books take longer to publish and when you need current research/data this is a consideration.

  14. Library: The Research Process: 3a. Primary vs. Secondary

    Primary vs. Secondary? The distinction between types of sources can get tricky, because a secondary source may also be a primary source. Garry Wills' book about Lincoln's Gettysburg Address, for example, can be looked at as both a secondary and a primary source. ... In contrast, scholarly journals include research articles with primary materials ...

  15. Types of Study in Medical Research

    This article describes the structured classification of studies into two types, primary and secondary, as well as a further subclassification of studies of primary type. This is done on the basis of a selective literature search concerning study types in medical research, in addition to the authors' own experience.

  16. What Is Primary Research? Primary vs. Secondary Research

    Primary vs. Secondary Research. Primary research fuels discoveries in the worlds of science and marketing. Whether for scientific or market research, going directly to new primary sources yourself helps you target your questioning. Learn more about how to collect valuable data firsthand.

  17. Primary vs. Secondary Sources for Scientific Research

    Primary sources in the sciences may also be referred to as primary research, primary articles, or research studies. Examples include research studies, scientific experiments, papers and proceedings from scientific conferences or meetings, dissertations and theses, and technical reports. ... you will find two types of articles: primary and ...

  18. Primary vs. Secondary

    Primary sources in the sciences are a little bit different than primary sources in history, humanities, or social sciences. In the sciences, the focus is on the research. Primary sources are ones written by the scientists who performed the experiments - these articles include original research data.

  19. Peer-Reviewed Research: Primary vs. Secondary

    Secondary research is an account of original events or facts. It is secondary to and retrospective of the actual findings from an experiment or trial. These studies may be appraised summaries, reviews, or interpretations of primary sources and often exclude the original researcher(s).

  20. Primary vs Secondary Sources: Differences and Examples

    Secondary source examples include reviews and summaries of research conducted on topics related to the study and articles that summarize and highlight key aspects of a topic. It may also include analysis and interpretation of data and information, commentaries, and opinion pieces. Reference texts such as encyclopedias and even academic ...

  21. Primary Vs Secondary Research

    Secondary research involves gathering data that has already been collected by someone else. This type of research can be conducted through various sources, such as academic journals, books, government reports, and online databases. Secondary research is less time-consuming and less expensive than primary research, as the data has already been ...

  22. Primary & Secondary Sources

    In the sciences, reports of new research written by the scientists who conducted it are considered primary sources. (p.272) Secondary source: A source that comments on, analyzes, or otherwise relies on primary sources. An article in a newspaper that reports on a scientific discovery or a book that analyzes a writer's work is a secondary source ...

  23. Primary vs Secondary Research

    Primary research: researchers "own" the data they collect. Secondary research: based on data collected in previous research. Primary research is based on raw data; secondary research is based on tried and tested data that has previously been analysed and filtered. In primary research, the data collected fits the needs of the researcher; it is customised.

  24. PSYC 201: Research Methods: Primary vs. Secondary articles

    PSYC 201: Research Methods: Primary vs. Secondary articles. About the Libraries; Primary vs. Secondary articles; Find Sources; Cite Sources; Help; ... Look at these two articles and determine which article is the primary / research article. Which article is the primary / research article? Article 1. Article 2. Article 1: 153 votes (95.63% ...

  25. Primary, Secondary and Tertiary Sources

    A secondary source is a document or work where its author had an indirect part in a study or creation; an author is usually writing about or reporting the work or research done by someone else. Secondary sources can be used for additional or supporting information; they are not the direct product of research or the making of a creative work.
