Analytical Research: What is it, Importance + Examples

Analytical research is a type of research that requires critical thinking skills and the examination of relevant facts and information.

"Research" loosely translates as "finding knowledge": a systematic and scientific way of investigating a particular subject. Research, in other words, is a form of scientific investigation that seeks to learn more, and analytical research is one form of it.

Any kind of research is a way to learn new things. In analytical research, data and other pertinent information about a project are assembled; once the information is gathered and assessed, the sources are used to support a notion or prove a hypothesis.

Using critical thinking (a way of thinking that involves identifying a claim or assumption and determining whether it is accurate or false), an individual can draw out minor facts and build them into more significant conclusions about the subject matter.

What is analytical research?

This particular kind of research calls for using critical thinking abilities and assessing data and information pertinent to the project at hand.

It determines the causal connections between two or more variables. For example, an analytical study of a trade deficit aims to identify the causes and mechanisms underlying its movement over a given period.

It is used by various professionals, including psychologists, doctors, and students, to identify the most pertinent material during investigations. One learns crucial information from analytical research that helps them contribute fresh concepts to the work they are producing.

Some researchers perform it to uncover information that supports ongoing research to strengthen the validity of their findings. Other scholars engage in analytical research to generate fresh perspectives on the subject.

Approaches to performing analytical research include literary analysis, gap analysis, general public surveys, clinical trials, and meta-analysis.

Importance of analytical research

The goal of analytical research is to develop new ideas that are more believable by combining numerous minute details.

Analytical investigation explains why a claim should be trusted. Finding out why something occurs is complex; you need to be able to evaluate information and think critically.

This kind of information aids in proving the validity of a theory or supporting a hypothesis. It assists in recognizing a claim and determining whether it is true.

This kind of research is valuable to many people, including students, psychologists, marketers, and others. It aids in determining which advertising initiatives within a firm perform best, while in medicine it helps determine how well a particular treatment performs.

Thus, analytical research can help people achieve their goals while saving lives and money.

Methods of Conducting Analytical Research

Analytical research is the process of gathering, analyzing, and interpreting information to make inferences and reach conclusions. Depending on the purpose of the research and the data you have access to, you can conduct analytical research using a variety of methods. Here are a few typical approaches:

Quantitative research

Numerical data are gathered and analyzed using this method. Statistical methods are then used to analyze the information, which is often collected using surveys, experiments, or pre-existing datasets. Results from quantitative research can be measured, compared, and generalized numerically.

Qualitative research

In contrast to quantitative research, qualitative research focuses on collecting non-numerical information. It gathers detailed information using techniques like interviews, focus groups, observations, or content analysis. Understanding social phenomena, exploring experiences, and revealing underlying meanings and motivations are all goals of qualitative research.

Mixed methods research

This strategy combines quantitative and qualitative methodologies to grasp a research problem thoroughly. Mixed methods research often entails gathering and evaluating both numerical and non-numerical data, integrating the results, and offering a more comprehensive viewpoint on the research issue.

Experimental research

Experimental research is frequently employed in scientific trials and investigations to establish causal links between variables. This approach entails modifying variables in a controlled environment to identify cause-and-effect connections. Researchers randomly divide volunteers into several groups, provide various interventions or treatments, and track the results.

Observational research

With this approach, behaviors or occurrences are observed and methodically recorded without any outside interference or manipulation of variables. Observational research can take place in both controlled and naturalistic settings. It offers useful insights into real-world behaviors and enables researchers to explore events as they naturally occur.

Case study research

This approach entails thorough research of a single case or a small group of related cases. Case studies frequently draw on a variety of information sources, including observations, records, and interviews. They offer rich, in-depth insights and are particularly helpful for researching complex phenomena in practical settings.

Secondary data analysis

With this approach, researchers examine information previously gathered for a different purpose, which may include data from earlier cohort studies, accessible databases, or corporate documents. Examining secondary information is time- and money-efficient, enabling researchers to explore new research issues or confirm prior findings.

Content analysis

This approach systematically examines the content of texts, including media, speeches, and written documents. Researchers identify and categorize themes, patterns, or keywords to make inferences about the content. Content analysis is frequently employed in the social sciences, media studies, and cross-sectional studies.

Depending on your research objectives, the resources at your disposal, and the type of data you wish to analyze, selecting the most appropriate approach or combination of methodologies is crucial to conducting analytical research.

Examples of analytical research

Analytical research goes beyond taking a single measurement. Rather than simply reporting a trade imbalance, for example, you would consider its causes and how it has changed over time; detailed statistics and statistical tests help guarantee that the results are significant.

It can likewise look into why the value of the Japanese yen has decreased, because an analytical study considers "how" and "why" questions.

Another example: someone might conduct analytical research to identify a gap in an existing study. It presents a fresh perspective on the data and thereby helps support or refute notions.

Descriptive vs analytical research

The key difference is that descriptive research reports what is happening (characteristics, frequencies, trends), while analytical research goes further to evaluate why and how it happens, critically examining the available facts.

The study of cause and effect makes extensive use of analytical research. It benefits from numerous academic disciplines, including marketing, health, and psychology, because it offers more conclusive information for addressing research issues.

QuestionPro offers solutions for every issue and industry, making it more than just survey software. For handling data, we also have systems like our InsightsHub research library.

You may make crucial decisions quickly while using QuestionPro to understand your clients and other study subjects better. Make use of the possibilities of the enterprise-grade research suite right away!


Analyst Answers

Data & Finance for Work & Life


Data Analysis: Types, Methods & Techniques (a Complete List)

(Updated Version)

While the term sounds intimidating, “data analysis” is nothing more than making sense of information in a table. It consists of filtering, sorting, grouping, and manipulating data tables with basic algebra and statistics.

In fact, you don’t need experience to understand the basics. You have already worked with data extensively in your life, and “analysis” is nothing more than a fancy word for good sense and basic logic.

Over time, people have intuitively categorized the best logical practices for treating data. These categories are what we call today types, methods, and techniques.

This article provides a comprehensive list of types, methods, and techniques, and explains the difference between them.

For a practical intro to data analysis (including types, methods, & techniques), check out our Intro to Data Analysis eBook for free.

Descriptive, Diagnostic, Predictive, & Prescriptive Analysis

If you Google "types of data analysis," the first few results will explore descriptive, diagnostic, predictive, and prescriptive analysis. Why? Because these names are easy to understand and are used a lot in "the real world."

Descriptive analysis is an informational method, diagnostic analysis explains “why” a phenomenon occurs, predictive analysis seeks to forecast the result of an action, and prescriptive analysis identifies solutions to a specific problem.

That said, these are only four branches of a larger analytical tree.

Good data analysts know how to position these four types within other analytical methods and tactics, allowing them to leverage strengths and weaknesses in each to uproot the most valuable insights.

Let’s explore the full analytical tree to understand how to appropriately assess and apply these four traditional types.

Tree diagram of Data Analysis Types, Methods, and Techniques

Here’s a picture to visualize the structure and hierarchy of data analysis types, methods, and techniques.

[Tree diagram: data analysis types, methods, and techniques]

Note: basic descriptive statistics such as mean, median, and mode, as well as standard deviation, are not shown because most people are already familiar with them. In the diagram, they would fall under the “descriptive” analysis type.

Tree Diagram Explained

The highest-level classification of data analysis is quantitative vs qualitative. Quantitative implies numbers while qualitative implies information other than numbers.

Quantitative data analysis then splits into mathematical analysis and artificial intelligence (AI) analysis. Mathematical types then branch into descriptive, diagnostic, predictive, and prescriptive.

Methods falling under mathematical analysis include clustering, classification, forecasting, and optimization. Qualitative data analysis methods include content analysis, narrative analysis, discourse analysis, framework analysis, and/or grounded theory.

Moreover, mathematical techniques include regression, Naïve Bayes, simple exponential smoothing, cohorts, factors, linear discriminants, and more, whereas techniques falling under the AI type include artificial neural networks, decision trees, evolutionary programming, and fuzzy logic. Techniques under qualitative analysis include text analysis, coding, idea pattern analysis, and word frequency.

It’s a lot to remember! Don’t worry, once you understand the relationship and motive behind all these terms, it’ll be like riding a bike.

We’ll move down the list from top to bottom, and I encourage you to refer back to the tree diagram above so you can follow along.

But first, let’s just address the elephant in the room: what’s the difference between methods and techniques anyway?

Difference between methods and techniques

Though often used interchangeably, methods and techniques are not the same. By definition, methods are the process by which techniques are applied, and techniques are the practical application of those methods.

For example, consider driving. Methods include staying in your lane, stopping at a red light, and parking in a spot. Techniques include turning the steering wheel, braking, and pushing the gas pedal.

Data sets: observations and fields

It’s important to understand the basic structure of data tables to comprehend the rest of the article. A data set consists of one far-left column containing observations, then a series of columns containing the fields (aka “traits” or “characteristics”) that describe each observation. For example, imagine we want a data table for fruit. It might look like this (with illustrative values):

fruit (observation) | color  | weight (g)
apple               | red    | 150
banana              | yellow | 120
cherry              | red    | 8

Now let’s turn to types, methods, and techniques. Each heading below consists of a description, relative importance, the nature of data it explores, and the motivation for using it.

Quantitative Analysis

  • It accounts for more than 50% of all data analysis and is by far the most widespread and well-known type of data analysis.
  • As you have seen, it holds descriptive, diagnostic, predictive, and prescriptive methods, which in turn hold some of the most important techniques available today, such as clustering and forecasting.
  • It can be broken down into mathematical and AI analysis.
  • Importance: Very high. Quantitative analysis is a must for anyone interested in becoming or improving as a data analyst.
  • Nature of Data: data treated under quantitative analysis is, quite simply, quantitative. It encompasses all numeric data.
  • Motive: to extract insights. (Note: we’re at the top of the pyramid; this gets more insightful as we move down.)

Qualitative Analysis

  • It accounts for less than 30% of all data analysis and is common in social sciences.
  • It can refer to the simple recognition of qualitative elements, which is not analytic in any way, but most often refers to methods that assign numeric values to non-numeric data for analysis.
  • Because of this, some argue that it’s ultimately a quantitative type.
  • Importance: Medium. In general, knowing qualitative data analysis is not common or even necessary for corporate roles. However, for researchers working in social sciences, its importance is very high .
  • Nature of Data: data treated under qualitative analysis is non-numeric. However, as part of the analysis, analysts turn non-numeric data into numbers, at which point many argue it is no longer qualitative analysis.
  • Motive: to extract insights. (This will be more important as we move down the pyramid.)

Mathematical Analysis

  • Description: mathematical data analysis is a subtype of quantitative data analysis that designates methods and techniques based on statistics, algebra, and logical reasoning to extract insights. It stands in opposition to artificial intelligence analysis.
  • Importance: Very High. The most widespread methods and techniques fall under mathematical analysis. In fact, it’s so common that many people use “quantitative” and “mathematical” analysis interchangeably.
  • Nature of Data: numeric. By definition, all data under mathematical analysis are numbers.
  • Motive: to extract measurable insights that can be used to act upon.

Artificial Intelligence & Machine Learning Analysis

  • Description: artificial intelligence and machine learning analyses designate techniques based on the titular skills. They are not traditionally mathematical, but they are quantitative since they use numbers. Applications of AI & ML analysis techniques are developing, but they are not yet mainstream across the field.
  • Importance: Medium. As of today (September 2020), you don’t need to be fluent in AI & ML data analysis to be a great analyst. BUT, if it’s a field that interests you, learn it. Many believe that in 10 years’ time its importance will be very high.
  • Nature of Data: numeric.
  • Motive: to create calculations that build on themselves in order to extract insights without direct input from a human.

Descriptive Analysis

  • Description: descriptive analysis is a subtype of mathematical data analysis that uses methods and techniques to provide information about the size, dispersion, groupings, and behavior of data sets. This may sound complicated, but just think about mean, median, and mode: all three are types of descriptive analysis. They provide information about the data set. We’ll look at specific techniques below.
  • Importance: Very high. Descriptive analysis is among the most commonly used data analyses in both corporations and research today.
  • Nature of Data: the nature of data under descriptive statistics is sets. A set is simply a collection of numbers that behaves in predictable ways. Data reflects real life, and there are patterns everywhere to be found. Descriptive analysis describes those patterns.
  • Motive: the motive behind descriptive analysis is to understand how numbers in a set group together, how far apart they are from each other, and how often they occur. As with most statistical analysis, the more data points there are, the easier it is to describe the set.
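As a quick sketch of descriptive analysis, Python’s standard `statistics` module computes these measures directly; the sales figures below are made up for illustration:

```python
import statistics

# Hypothetical daily sales figures (illustrative data).
sales = [12, 15, 15, 18, 20, 22, 22, 22, 30]

print(statistics.mean(sales))    # central tendency of the set
print(statistics.median(sales))  # middle value when sorted
print(statistics.mode(sales))    # most frequent value
print(statistics.stdev(sales))   # dispersion (sample standard deviation)
```

Together, these four numbers already describe how the set groups, where its center is, and how spread out it is.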

Diagnostic Analysis

  • Description: diagnostic analysis answers the question “why did it happen?” It is an advanced type of mathematical data analysis that manipulates multiple techniques, but does not own any single one. Analysts engage in diagnostic analysis when they try to explain why.
  • Importance: Very high. Diagnostics are probably the most important type of data analysis even for people who don’t do analysis themselves, because they’re valuable to anyone who’s curious. They’re most common in corporations, as managers often only want to know the “why.”
  • Nature of Data : data under diagnostic analysis are data sets. These sets in themselves are not enough under diagnostic analysis. Instead, the analyst must know what’s behind the numbers in order to explain “why.” That’s what makes diagnostics so challenging yet so valuable.
  • Motive: the motive behind diagnostics is to diagnose — to understand why.

Predictive Analysis

  • Description: predictive analysis uses past data to project future data. It’s very often one of the first kinds of analysis new researchers and corporate analysts use because it is intuitive. It is a subtype of the mathematical type of data analysis, and its three notable techniques are regression, moving average, and exponential smoothing.
  • Importance: Very high. Predictive analysis is critical for any data analyst working in a corporate environment. Companies always want to know what the future will hold — especially for their revenue.
  • Nature of Data: Because past and future imply time, predictive data always includes an element of time. Whether it’s minutes, hours, days, months, or years, we call this time series data. In fact, this data is so important that I’ll mention it twice so you don’t forget: predictive analysis uses time series data.
  • Motive: the motive for investigating time series data with predictive analysis is to predict the future in the most analytical way possible.
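One of the three techniques named above, the moving average, can be sketched in a few lines of Python; the revenue series is hypothetical:

```python
def moving_average(series, window):
    """Forecast each point as the mean of the previous `window` values."""
    return [sum(series[i - window:i]) / window
            for i in range(window, len(series) + 1)]

# Hypothetical monthly revenue: time series data, as noted above.
revenue = [100, 110, 105, 120, 130, 125]
print(moving_average(revenue, 3))  # the last value forecasts next month
```

The final element of the returned list serves as the forecast for the next period.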

Prescriptive Analysis

  • Description: prescriptive analysis is a subtype of mathematical analysis that answers the question “what will happen if we do X?” It’s largely underestimated in the data analysis world because it requires diagnostic and descriptive analyses to be done before it even starts. More than simple predictive analysis, prescriptive analysis builds entire data models to show how a simple change could impact the ensemble.
  • Importance: High. Prescriptive analysis is most common under the finance function in many companies. Financial analysts use it to build models of the financial statements that show how the data will change given alternative inputs.
  • Nature of Data: the nature of data in prescriptive analysis is data sets. These data sets contain patterns that respond differently to various inputs. Data that is useful for prescriptive analysis contains correlations between different variables. It’s through these correlations that we establish patterns and prescribe action on this basis. This analysis cannot be performed on data that exists in a vacuum — it must be viewed on the backdrop of the tangibles behind it.
  • Motive: the motive for prescriptive analysis is to establish, with an acceptable degree of certainty, what results we can expect given a certain action. As you might expect, this necessitates that the analyst or researcher be aware of the world behind the data, not just the data itself.

Clustering Method

  • Description: the clustering method groups data points together based on their relative closeness to further explore and treat them based on these groupings. There are two ways to group clusters: intuitively and statistically (e.g., with k-means).
  • Importance: Very high. Though most corporate roles group clusters intuitively based on management criteria, a solid understanding of how to group them mathematically is an excellent descriptive and diagnostic approach to allow for prescriptive analysis thereafter.
  • Nature of Data : the nature of data useful for clustering is sets with 1 or more data fields. While most people are used to looking at only two dimensions (x and y), clustering becomes more accurate the more fields there are.
  • Motive: the motive for clustering is to understand how data sets group and to explore them further based on those groups.
  • Here’s an example set:

[Example data set for clustering]

Classification Method

  • Description: the classification method aims to separate and group data points based on common characteristics . This can be done intuitively or statistically.
  • Importance: High. While simple on the surface, classification can become quite complex. It’s very valuable in corporate and research environments, but can feel like it’s not worth the work. A good analyst can execute it quickly to deliver results.
  • Nature of Data: the nature of data useful for classification is data sets. As we will see, it can be used on qualitative data as well as quantitative. This method requires knowledge of the substance behind the data, not just the numbers themselves.
  • Motive: the motive for classification is to group data not based on mathematical relationships (which would be clustering), but by predetermined outputs. This is why it’s less useful for diagnostic analysis, and more useful for prescriptive analysis.

Forecasting Method

  • Description: the forecasting method uses past time series data to forecast the future.
  • Importance: Very high. Forecasting falls under predictive analysis and is arguably the most common and most important method in the corporate world. It is less useful in research, which prefers to understand the known rather than speculate about the future.
  • Nature of Data: data useful for forecasting is time series data, which, as we’ve noted, always includes a variable of time.
  • Motive: the motive for the forecasting method is the same as that of predictive analysis: to confidently estimate future values.
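Another common forecasting technique, simple exponential smoothing, blends each new observation with the running forecast. A minimal sketch (the alpha value and sales figures are invented for illustration):

```python
def exponential_smoothing(series, alpha):
    """Simple exponential smoothing: each smoothed value blends the
    newest observation with the previous smoothed value."""
    smoothed = [series[0]]
    for x in series[1:]:
        smoothed.append(alpha * x + (1 - alpha) * smoothed[-1])
    return smoothed

# Hypothetical quarterly sales; alpha controls how fast old data fades.
print(exponential_smoothing([100, 120, 110, 130], 0.5))
# → [100, 110.0, 110.0, 120.0]
```

The last smoothed value is used as the forecast for the next period; a higher alpha reacts faster to recent changes.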

Optimization Method

  • Description: the optimization method maximizes or minimizes values in a set given a set of criteria. It is arguably most common in prescriptive analysis. In mathematical terms, it is maximizing or minimizing a function given certain constraints.
  • Importance: Very high. The idea of optimization applies to more analysis types than any other method. In fact, some argue that it is the fundamental driver behind data analysis. You would use it everywhere in research and in a corporation.
  • Nature of Data: the nature of optimizable data is a data set of at least two points.
  • Motive: the motive behind optimization is to achieve the best result possible given certain conditions.
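As a toy example of minimizing a function given a constraint (here, a bounded interval), a ternary search finds the minimum of any unimodal curve; the cost function below is invented purely for illustration:

```python
def minimize(f, lo, hi, tol=1e-9):
    """Ternary search: minimize a unimodal function f on [lo, hi]."""
    while hi - lo > tol:
        m1 = lo + (hi - lo) / 3
        m2 = hi - (hi - lo) / 3
        if f(m1) < f(m2):
            hi = m2  # the minimum lies left of m2
        else:
            lo = m1  # the minimum lies right of m1
    return (lo + hi) / 2

# Hypothetical cost curve: cost is lowest at a production level of 3.
cost = lambda x: (x - 3) ** 2 + 5
print(round(minimize(cost, 0, 10), 4))  # → 3.0
```

Real problems with many variables and constraints would use a dedicated solver, but the idea is the same: search the feasible region for the best value.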

Content Analysis Method

  • Description: content analysis is a method of qualitative analysis that quantifies textual data to track themes across a document. It’s most common in academic fields and in social sciences, where written content is the subject of inquiry.
  • Importance: High. In a corporate setting, content analysis as such is less common; if anything, Naïve Bayes (a technique we’ll look at below) is the closest corporations come to textual analysis. However, it is of the utmost importance for researchers. If you’re a researcher, check out this article on content analysis.
  • Nature of Data: data useful for content analysis is textual data.
  • Motive: the motive behind content analysis is to understand themes expressed in a large text.
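The simplest quantifying step in content analysis is word frequency, sketched here with Python’s standard library; the survey-response text is made up:

```python
from collections import Counter
import re

# Hypothetical response text from an open-ended survey question.
text = """Customer satisfaction drives loyalty. Loyal customers
recommend the brand, and satisfaction grows with every recommendation."""

# Tokenize, lowercase, and count word frequencies: the most basic
# content-analysis technique (word frequency).
words = re.findall(r"[a-z']+", text.lower())
counts = Counter(words)
print(counts.most_common(3))
```

Real content analysis would add stop-word removal and thematic coding on top of these raw counts.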

Narrative Analysis Method

  • Description: narrative analysis is a method of qualitative analysis that quantifies stories to trace themes in them. It differs from content analysis because it focuses on stories rather than research documents, and the techniques used are slightly different from those in content analysis (very nuanced and outside the scope of this article).
  • Importance: Low. Unless you are highly specialized in working with stories, narrative analysis is rare.
  • Nature of Data: the nature of the data useful for the narrative analysis method is narrative text.
  • Motive: the motive for narrative analysis is to uncover hidden patterns in narrative text.

Discourse Analysis Method

  • Description: the discourse analysis method falls under qualitative analysis and uses thematic coding to trace patterns in real-life discourse. That said, real-life discourse is oral, so it must first be transcribed into text.
  • Importance: Low. Unless you are focused on understanding real-world idea sharing in a research setting, this kind of analysis is less common than the others on this list.
  • Nature of Data: the nature of data useful in discourse analysis is first audio files, then transcriptions of those audio files.
  • Motive: the motive behind discourse analysis is to trace patterns of real-world discussions. (As a spooky sidenote, have you ever felt like your phone microphone was listening to you and making reading suggestions? If it was, the method was discourse analysis.)

Framework Analysis Method

  • Description: the framework analysis method falls under qualitative analysis and uses similar thematic coding techniques to content analysis. However, where content analysis aims to discover themes, framework analysis starts with a framework and only considers elements that fall in its purview.
  • Importance: Low. As with the other textual analysis methods, framework analysis is less common in corporate settings. Even in the world of research, only some use it. Strangely, it’s very common for legislative and political research.
  • Nature of Data: the nature of data useful for framework analysis is textual.
  • Motive: the motive behind framework analysis is to understand what themes and parts of a text match your search criteria.

Grounded Theory Method

  • Description: the grounded theory method falls under qualitative analysis and uses thematic coding to build theories around those themes.
  • Importance: Low. Like other qualitative analysis techniques, grounded theory is less common in the corporate world. Even among researchers, you would be hard pressed to find many using it. Though powerful, it’s simply too rare to spend time learning.
  • Nature of Data: the nature of data useful in the grounded theory method is textual.
  • Motive: the motive of grounded theory method is to establish a series of theories based on themes uncovered from a text.

Clustering Technique: K-Means

  • Description: k-means is a clustering technique in which data points are grouped in clusters that have the closest means. Though not considered AI or ML in its own right, it is an unsupervised approach that reevaluates clusters as data points are added. Clustering techniques can be used in diagnostic, descriptive, & prescriptive data analyses.
  • Importance: Very important. If you only take 3 things from this article, k-means clustering should be part of it. It is useful in any situation where n observations have multiple characteristics and we want to put them in groups.
  • Nature of Data: the nature of data is at least one characteristic per observation, but the more the merrier.
  • Motive: the motive for clustering techniques such as k-means is to group observations together and either understand or react to them.
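A bare-bones, one-dimensional k-means in pure Python shows the mechanics (assign each point to the nearest mean, then recompute the means); the customer ages are invented, and a real project would typically reach for a library implementation such as scikit-learn’s:

```python
def k_means(points, k, iters=20):
    """Minimal 1-D k-means: assign each point to its nearest mean,
    recompute each cluster's mean, and repeat."""
    # Deterministic start: spread the initial means across the range.
    means = [min(points) + i * (max(points) - min(points)) / (k - 1)
             for i in range(k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: abs(p - means[i]))
            clusters[nearest].append(p)
        means = [sum(c) / len(c) if c else means[i]
                 for i, c in enumerate(clusters)]
    return sorted(means)

# Two hypothetical groups of customer ages.
ages = [21, 23, 22, 24, 61, 63, 60, 64]
print(k_means(ages, 2))  # → [22.5, 62.0]
```

With more fields per observation, the same loop runs on distances in higher dimensions, which is where clustering becomes genuinely powerful.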

Regression Technique

  • Description: simple and multivariable regressions use either one independent variable or a combination of multiple independent variables to calculate a correlation to a single dependent variable using constants. Regressions are almost synonymous with correlation today.
  • Importance: Very high. Along with clustering, if you only take 3 things from this article, regression techniques should be part of it. They’re everywhere in corporate and research fields alike.
  • Nature of Data: the nature of data used in regressions is data sets with “n” number of observations and as many variables as are reasonable. It’s important, however, to distinguish between time series data and regression data: you cannot use regressions on time series data without accounting for time. The easier way is to use techniques under the forecasting method.
  • Motive: The motive behind regression techniques is to understand correlations between independent variable(s) and a dependent one.
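For one independent variable, the ordinary least-squares line can be computed by hand; the spend-vs-sales numbers below are fabricated to make the fit exact:

```python
def linear_regression(xs, ys):
    """Ordinary least squares for one independent variable:
    returns (slope, intercept) of the best-fit line y = a*x + b."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    return slope, mean_y - slope * mean_x

# Hypothetical ad spend (x) vs. sales (y): perfectly linear, y = 2x + 1.
spend = [1, 2, 3, 4, 5]
sales = [3, 5, 7, 9, 11]
print(linear_regression(spend, sales))  # → (2.0, 1.0)
```

The slope expresses the correlation between the independent and dependent variables, which is exactly the motive described above.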

Naïve Bayes Technique

  • Description: Naïve Bayes is a classification technique that uses simple probability to classify items based on previous classifications. In plain English, the formula would be “the chance that a thing with trait x belongs to class c equals the chance of observing trait x given class c, multiplied by the overall chance of class c, divided by the overall chance of trait x.” As a formula, it’s P(c|x) = P(x|c) * P(c) / P(x).
  • Importance: High. Naïve Bayes is a very common, simple classification technique because it’s effective with large data sets and it can be applied to any instance in which there is a class. Google, for example, might use it to group webpages into groups for certain search engine queries.
  • Nature of Data: the nature of data for Naïve Bayes is at least one class and at least two traits in a data set.
  • Motive: the motive behind Naïve Bayes is to classify observations based on previous data. It’s thus considered part of predictive analysis.
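The Bayes formula above can be applied directly to counts from past classifications; the spam-email numbers here are invented for illustration:

```python
# Counts from hypothetical past classifications: out of 100 emails,
# 30 were spam, of which 24 contained the word "free"; 10 of the
# 70 non-spam emails also contained "free".
total = 100
spam, spam_with_free = 30, 24
ham_with_free = 10

p_c = spam / total                              # P(spam)
p_x_given_c = spam_with_free / spam             # P("free" | spam)
p_x = (spam_with_free + ham_with_free) / total  # P("free")

# Bayes' rule: P(c|x) = P(x|c) * P(c) / P(x)
p_c_given_x = p_x_given_c * p_c / p_x
print(round(p_c_given_x, 3))  # → 0.706
```

So a new email containing “free” would be classified as spam with roughly 71% probability, based purely on the previous classifications.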

Cohorts Technique

  • Description: cohorts technique is a type of clustering method used in behavioral sciences to separate users by common traits. As with clustering, it can be done intuitively or mathematically, the latter of which would simply be k-means.
  • Importance: Very high. While it resembles k-means, the cohort technique is more of a high-level counterpart. In fact, most people are familiar with it as a part of Google Analytics. It’s most common in marketing departments in corporations, rather than in research.
  • Nature of Data: the nature of cohort data is data sets in which users are the observation and other fields are used as defining traits for each cohort.
  • Motive: the motive for cohort analysis techniques is to group similar users and analyze how you retain them and how they churn.
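
A minimal sketch of the idea in Python (the field names `signup_month` and `active` are hypothetical): group users into signup-month cohorts, then compute a retention rate per cohort.

```python
from collections import defaultdict

def cohort_retention(users):
    """Group users by a shared trait (signup month) and compute the
    share of each cohort that is still active."""
    cohorts = defaultdict(list)
    for user in users:
        cohorts[user["signup_month"]].append(user["active"])
    return {month: sum(flags) / len(flags) for month, flags in cohorts.items()}

# Toy user records:
users = [
    {"signup_month": "2023-01", "active": True},
    {"signup_month": "2023-01", "active": False},
    {"signup_month": "2023-02", "active": True},
    {"signup_month": "2023-02", "active": True},
]
retention = cohort_retention(users)
```

Tools like Google Analytics do essentially this grouping for you, tracking each cohort’s activity over successive periods.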

Factor Technique

  • Description: the factor analysis technique is a way of grouping many traits into a single factor to expedite analysis. For example, factors can be used as traits for Naïve Bayes classifications instead of more general fields.
  • Importance: High. While not commonly employed in corporations, factor analysis is hugely valuable. Good data analysts use it to simplify their projects and communicate them more clearly.
  • Nature of Data: the nature of data useful in factor analysis techniques is data sets with a large number of fields on their observations.
  • Motive: the motive for using factor analysis techniques is to reduce the number of fields in order to more quickly analyze and communicate findings.

Linear Discriminants Technique

  • Description: linear discriminant analysis techniques are similar to regressions in that they use one or more independent variables to determine a dependent variable; however, the linear discriminant technique falls under a classifier method since it uses traits as independent variables and class as a dependent variable. In this way, it becomes a classifying method AND a predictive method.
  • Importance: High. Though the analyst world speaks of and uses linear discriminants less commonly, it’s a highly valuable technique to keep in mind as you progress in data analysis.
  • Nature of Data: the nature of data useful for the linear discriminant technique is data sets with many fields.
  • Motive: the motive for using linear discriminants is to classify observations that would otherwise be too complex for simple techniques like Naïve Bayes.

Exponential Smoothing Technique

  • Description: exponential smoothing is a technique falling under the forecasting method that uses a smoothing factor on prior data in order to predict future values. It can be linear or adjusted for seasonality. The basic principle behind exponential smoothing is to use a percent weight (a value between 0 and 1 called alpha) on more recent values in a series and a smaller percent weight on less recent values. The formula is f(x) = current period value * alpha + previous smoothed value * (1 - alpha).
  • Importance: High. Most analysts still use the moving average technique (covered next) for forecasting because it’s easy to understand, though it is less accurate than exponential smoothing. However, good analysts will have exponential smoothing techniques in their pocket to increase the value of their forecasts.
  • Nature of Data: the nature of data useful for exponential smoothing is time series data. Time series data has time as part of its fields.
  • Motive: the motive for exponential smoothing is to forecast future values with a smoothing variable.
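
The smoothing recursion described above looks like this as a sketch in Python (the series values are invented; alpha is the smoothing factor between 0 and 1):

```python
def exponential_smoothing(series, alpha):
    """Simple exponential smoothing: each smoothed value blends the
    current observation (weight alpha) with the previous smoothed
    value (weight 1 - alpha)."""
    smoothed = [series[0]]  # seed with the first observation
    for value in series[1:]:
        smoothed.append(alpha * value + (1 - alpha) * smoothed[-1])
    return smoothed

smoothed = exponential_smoothing([10, 20, 30], alpha=0.5)
```

The last smoothed value serves as the forecast for the next period; seasonal variants (e.g. Holt-Winters) extend the same recursion with trend and seasonality terms.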

Moving Average Technique

  • Description: the moving average technique falls under the forecasting method and uses an average of recent values to predict future ones. For example, to predict rainfall in April, you would take the average of rainfall from January to March. It’s simple, yet highly effective.
  • Importance: Very high. While I’m personally not a huge fan of moving averages due to their simplistic nature and lack of consideration for seasonality, they’re the most common forecasting technique and therefore very important.
  • Nature of Data: the nature of data useful for moving averages is time series data.
  • Motive: the motive for moving averages is to predict future values in a simple, easy-to-communicate way.
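
The rainfall example above can be sketched in one short function (the rainfall figures are invented):

```python
def moving_average_forecast(series, window):
    """Predict the next value as the average of the last `window` observations."""
    recent = series[-window:]
    return sum(recent) / len(recent)

# Predict April rainfall (mm) from January-March:
april_estimate = moving_average_forecast([30, 45, 60], window=3)
```

Note the weakness the text mentions: every value in the window counts equally, so seasonality and recent momentum are ignored.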

Neural Networks Technique

  • Description: neural networks are a highly complex artificial intelligence technique that replicates a human’s neural analysis through a series of hyper-rapid computations and comparisons that evolve in real time. This technique is so complex that an analyst must use computer programs to perform it.
  • Importance: Medium. While the potential for neural networks is theoretically unlimited, it’s still little understood and therefore uncommon. You do not need to know it by any means in order to be a data analyst.
  • Nature of Data: the nature of data useful for neural networks is data sets of astronomical size, meaning hundreds of thousands of fields and the same number of rows at a minimum.
  • Motive: the motive for neural networks is to understand wildly complex phenomena and data, and thereafter to act on them.

Decision Tree Technique

  • Description: the decision tree technique uses artificial intelligence algorithms to rapidly calculate possible decision pathways and their outcomes on a real-time basis. It’s so complex that computer programs are needed to perform it.
  • Importance: Medium. As with neural networks, decision trees with AI are too little understood and are therefore uncommon in corporate and research settings alike.
  • Nature of Data: the nature of data useful for the decision tree technique is hierarchical data sets that show multiple optional fields for each preceding field.
  • Motive: the motive for decision tree techniques is to compute the optimal choices to make in order to achieve a desired result.

Evolutionary Programming Technique

  • Description: the evolutionary programming technique uses a series of neural networks, sees how well each one fits a desired outcome, and selects only the best to test and retest. It’s called evolutionary because it resembles the process of natural selection by weeding out weaker options.
  • Importance: Medium. As with the other AI techniques, evolutionary programming just isn’t well-understood enough to be usable in many cases. Its complexity also makes it hard to explain in corporate settings and difficult to defend in research settings.
  • Nature of Data: the nature of data in evolutionary programming is data sets of neural networks, or data sets of data sets.
  • Motive: the motive for using evolutionary programming is similar to decision trees: understanding the best possible option from complex data.

Fuzzy Logic Technique

  • Description: fuzzy logic is a type of computing based on “approximate truths” rather than simple truths such as “true” and “false.” It is essentially two tiers of classification. For example, to say whether “apples are good,” you need to first classify that “good is x, y, z.” Only then can you say apples are good. Another way to see it is as helping a computer evaluate truth the way humans do: “definitely true, probably true, maybe true, probably false, definitely false.”
  • Importance: Medium. Like the other AI techniques, fuzzy logic is uncommon in both research and corporate settings, which means it’s less important in today’s world.
  • Nature of Data: the nature of fuzzy logic data is huge data tables that include other huge data tables with a hierarchy including multiple subfields for each preceding field.
  • Motive: the motive of fuzzy logic is to replicate human truth valuations in a computer in order to model human decisions based on past data. The obvious possible application is marketing.

Text Analysis Technique

  • Description: text analysis techniques fall under the qualitative data analysis type and use text to extract insights.
  • Importance: Medium. Text analysis techniques, like all techniques under the qualitative analysis type, are most valuable for researchers.
  • Nature of Data: the nature of data useful in text analysis is words.
  • Motive: the motive for text analysis is to trace themes in a text across sets of very long documents, such as books.

Coding Technique

  • Description: the coding technique is used in textual analysis to turn ideas into uniform phrases and analyze the number of times and the ways in which those ideas appear. For this reason, some consider it a quantitative technique as well. You can learn more about coding and the other qualitative techniques here.
  • Importance: Very high. If you’re a researcher working in social sciences, coding is THE analysis technique, and for good reason. It’s a great way to add rigor to analysis. That said, it’s less common in corporate settings.
  • Nature of Data: the nature of data useful for coding is long text documents.
  • Motive: the motive for coding is to make tracing ideas on paper more than an exercise of the mind by quantifying it and understanding it through descriptive methods.

Idea Pattern Technique

  • Description: the idea pattern analysis technique fits into coding as the second step of the process. Once themes and ideas are coded, simple descriptive analysis tests may be run. Some people even cluster the ideas!
  • Importance: Very high. If you’re a researcher, idea pattern analysis is as important as the coding itself.
  • Nature of Data: the nature of data useful for idea pattern analysis is already coded themes.
  • Motive: the motive for the idea pattern technique is to trace ideas in otherwise unmanageably large documents.

Word Frequency Technique

  • Description: word frequency is a qualitative technique that stands in opposition to coding and uses an inductive approach to locate specific words in a document in order to understand its relevance. Word frequency is essentially the descriptive analysis of qualitative data because it uses stats like mean, median, and mode to gather insights.
  • Importance: High. As with the other qualitative approaches, word frequency is very important in social science research, but less so in corporate settings.
  • Nature of Data: the nature of data useful for word frequency is long, informative documents.
  • Motive: the motive for word frequency is to locate target words to determine the relevance of a document in question.
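
A minimal word-frequency sketch in Python, using an invented toy document:

```python
from collections import Counter

def word_frequency(text, targets):
    """Count how often each target word appears in a document."""
    counts = Counter(text.lower().split())
    return {word: counts[word] for word in targets}

doc = "the cat sat on the mat and the cat slept"
freq = word_frequency(doc, ["the", "cat", "dog"])
```

A real text analysis would also strip punctuation and common stop words before counting; this only shows the counting core that the descriptive stats (mean, median, mode of word counts) are built on.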

Types of data analysis in research

Types of data analysis in research methodology include every item discussed in this article. As a list, they are:

  • Quantitative
  • Qualitative
  • Mathematical
  • Machine Learning and AI
  • Descriptive
  • Prescriptive
  • Classification
  • Forecasting
  • Optimization
  • Grounded theory
  • Artificial Neural Networks
  • Decision Trees
  • Evolutionary Programming
  • Fuzzy Logic
  • Text analysis
  • Idea Pattern Analysis
  • Word Frequency Analysis
  • Naïve Bayes
  • Exponential smoothing
  • Moving average
  • Linear discriminant

Types of data analysis in qualitative research

As a list, the types of data analysis in qualitative research are the following methods:

  • Content analysis
  • Narrative analysis
  • Discourse analysis
  • Framework analysis
  • Grounded theory

Types of data analysis in quantitative research

As a list, the types of data analysis in quantitative research are:

  • Descriptive
  • Classification
  • Forecasting
  • Optimization
  • Prescriptive

Data analysis methods

As a list, data analysis methods are:

  • Content (qualitative)
  • Narrative (qualitative)
  • Discourse (qualitative)
  • Framework (qualitative)
  • Grounded theory (qualitative)

Quantitative data analysis methods

As a list, quantitative data analysis methods are:

  • Descriptive (quantitative)
  • Classification (quantitative)
  • Forecasting (quantitative)
  • Optimization (quantitative)
  • Prescriptive (quantitative)

Tabular View of Data Analysis Types, Methods, and Techniques

About the author.

Noah is the founder & Editor-in-Chief at AnalystAnswers. He is a transatlantic professional and entrepreneur with 5+ years of corporate finance and data analytics experience, as well as 3+ years in consumer financial products and business software. He started AnalystAnswers to provide aspiring professionals with accessible explanations of otherwise dense finance and data concepts. Noah believes everyone can benefit from an analytical mindset in a growing digital world. When he's not busy at work, Noah likes to explore new European cities, exercise, and spend time with friends and family.


Grad Coach

Quantitative Data Analysis 101

The lingo, methods and techniques, explained simply.

By: Derek Jansen (MBA)  and Kerryn Warren (PhD) | December 2020

Quantitative data analysis is one of those things that often strikes fear in students. It’s totally understandable – quantitative analysis is a complex topic, full of daunting lingo, like medians, modes, correlation and regression. Suddenly we’re all wishing we’d paid a little more attention in math class…

The good news is that while quantitative data analysis is a mammoth topic, gaining a working understanding of the basics isn’t that hard, even for those of us who avoid numbers and math. In this post, we’ll break quantitative analysis down into simple, bite-sized chunks so you can approach your research with confidence.

Quantitative data analysis methods and techniques 101

Overview: Quantitative Data Analysis 101

  • What (exactly) is quantitative data analysis?
  • When to use quantitative analysis
  • How quantitative analysis works

The two “branches” of quantitative analysis

  • Descriptive statistics 101
  • Inferential statistics 101
  • How to choose the right quantitative methods
  • Recap & summary

What is quantitative data analysis?

Despite being a mouthful, quantitative data analysis simply means analysing data that is numbers-based – or data that can be easily “converted” into numbers without losing any meaning.

For example, category-based variables like gender, ethnicity, or native language could all be “converted” into numbers without losing meaning – for example, English could equal 1, French 2, etc.

This contrasts with qualitative data analysis, where the focus is on words, phrases and expressions that can’t be reduced to numbers. If you’re interested in learning about qualitative analysis, check out our post and video here.

What is quantitative analysis used for?

Quantitative analysis is generally used for three purposes.

  • Firstly, it’s used to measure differences between groups. For example, the popularity of different clothing colours or brands.
  • Secondly, it’s used to assess relationships between variables. For example, the relationship between weather temperature and voter turnout.
  • And thirdly, it’s used to test hypotheses in a scientifically rigorous way. For example, a hypothesis about the impact of a certain vaccine.

Again, this contrasts with qualitative analysis, which can be used to analyse people’s perceptions and feelings about an event or situation. In other words, things that can’t be reduced to numbers.

How does quantitative analysis work?

Well, since quantitative data analysis is all about analysing numbers, it’s no surprise that it involves statistics. Statistical analysis methods form the engine that powers quantitative analysis, and these methods can vary from pretty basic calculations (for example, averages and medians) to more sophisticated analyses (for example, correlations and regressions).

Sounds like gibberish? Don’t worry. We’ll explain all of that in this post. Importantly, you don’t need to be a statistician or math wiz to pull off a good quantitative analysis. We’ll break down all the technical mumbo jumbo in this post.


As I mentioned, quantitative analysis is powered by statistical analysis methods. There are two main “branches” of statistical methods that are used – descriptive statistics and inferential statistics. In your research, you might only use descriptive statistics, or you might use a mix of both, depending on what you’re trying to figure out. In other words, depending on your research questions, aims and objectives. I’ll explain how to choose your methods later.

So, what are descriptive and inferential statistics?

Well, before I can explain that, we need to take a quick detour to explain some lingo. To understand the difference between these two branches of statistics, you need to understand two important words. These words are population and sample.

First up, population. In statistics, the population is the entire group of people (or animals or organisations or whatever) that you’re interested in researching. For example, if you were interested in researching Tesla owners in the US, then the population would be all Tesla owners in the US.

However, it’s extremely unlikely that you’re going to be able to interview or survey every single Tesla owner in the US. Realistically, you’ll likely only get access to a few hundred, or maybe a few thousand owners using an online survey. This smaller group of accessible people whose data you actually collect is called your sample.

So, to recap – the population is the entire group of people you’re interested in, and the sample is the subset of the population that you can actually get access to. In other words, the population is the full chocolate cake, whereas the sample is a slice of that cake.

So, why is this sample-population thing important?

Well, descriptive statistics focus on describing the sample, while inferential statistics aim to make predictions about the population, based on the findings within the sample. In other words, we use one group of statistical methods – descriptive statistics – to investigate the slice of cake, and another group of methods – inferential statistics – to draw conclusions about the entire cake. There I go with the cake analogy again…

With that out the way, let’s take a closer look at each of these branches in more detail.

Descriptive statistics vs inferential statistics

Branch 1: Descriptive Statistics

Descriptive statistics serve a simple but critically important role in your research – to describe your data set – hence the name. In other words, they help you understand the details of your sample. Unlike inferential statistics (which we’ll get to soon), descriptive statistics don’t aim to make inferences or predictions about the entire population – they’re purely interested in the details of your specific sample.

When you’re writing up your analysis, descriptive statistics are the first set of stats you’ll cover, before moving on to inferential statistics. But, that said, depending on your research objectives and research questions, they may be the only type of statistics you use. We’ll explore that a little later.

So, what kind of statistics are usually covered in this section?

Some common statistical tests used in this branch include the following:

  • Mean – this is simply the mathematical average of a range of numbers.
  • Median – this is the midpoint in a range of numbers when the numbers are arranged in numerical order. If the data set contains an odd number of values, the median is the number right in the middle of the set. If it contains an even number of values, the median is the midpoint between the two middle numbers.
  • Mode – this is simply the most commonly occurring number in the data set.
  • Standard deviation – this measures how spread out the numbers in a data set are around the mean. In cases where most of the numbers are quite close to the average, the standard deviation will be relatively low. Conversely, in cases where the numbers are scattered all over the place, the standard deviation will be relatively high.
  • Skewness – as the name suggests, skewness indicates how symmetrical a range of numbers is. In other words, do the values tend to cluster into a smooth bell curve shape in the middle of the graph, or do they skew to the left or right?
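
All of these statistics can be computed with Python’s standard library. The sample below is made up for illustration (it is not the data set from the example that follows), and the skewness estimate is a simple average cubed z-score rather than the adjusted formula most statistics software reports:

```python
import statistics

def describe(data):
    """Mean, median, sample standard deviation and a rough skewness estimate."""
    mean = statistics.mean(data)
    stdev = statistics.stdev(data)  # sample standard deviation
    # Rough skewness: the average cubed z-score (0 for symmetric data).
    skew = sum(((x - mean) / stdev) ** 3 for x in data) / len(data)
    return {
        "mean": mean,
        "median": statistics.median(data),
        "stdev": stdev,
        "skewness": skew,
    }

summary = describe([55, 60, 64, 70, 72, 75, 78, 81, 85, 90])
```

The mode is omitted here because, as in the example below, a sample with no repeated values has no meaningful mode (`statistics.multimode` would return every value).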

Feeling a bit confused? Let’s look at a practical example using a small data set.

Descriptive statistics example data

On the left-hand side is the data set. This details the bodyweight of a sample of 10 people. On the right-hand side, we have the descriptive statistics. Let’s take a look at each of them.

First, we can see that the mean weight is 72.4 kilograms. In other words, the average weight across the sample is 72.4 kilograms. Straightforward.

Next, we can see that the median is very similar to the mean (the average). This suggests that this data set has a reasonably symmetrical distribution (in other words, a relatively smooth, centred distribution of weights, clustered towards the centre).

In terms of the mode, there is no mode in this data set. This is because each number is present only once and so there cannot be a “most common number”. If there were two people who were both 65 kilograms, for example, then the mode would be 65.

Next up is the standard deviation. A value of 10.6 indicates that there’s quite a wide spread of numbers. We can see this quite easily by looking at the numbers themselves, which range from 55 to 90, which is quite a stretch from the mean of 72.4.

And lastly, the skewness of -0.2 tells us that the data is very slightly negatively skewed. This makes sense since the mean and the median are slightly different.

As you can see, these descriptive statistics give us some useful insight into the data set. Of course, this is a very small data set (only 10 records), so we can’t read into these statistics too much. Also, keep in mind that this is not a list of all possible descriptive statistics – just the most common ones.

But why do all of these numbers matter?

While these descriptive statistics are all fairly basic, they’re important for a few reasons:

  • Firstly, they help you get both a macro and micro-level view of your data. In other words, they help you understand both the big picture and the finer details.
  • Secondly, they help you spot potential errors in the data – for example, if an average is way higher than you’d expect, or responses to a question are highly varied, this can act as a warning sign that you need to double-check the data.
  • And lastly, these descriptive statistics help inform which inferential statistical techniques you can use, as those techniques depend on the skewness (in other words, the symmetry and normality) of the data.

Simply put, descriptive statistics are really important, even though the statistical techniques used are fairly basic. All too often at Grad Coach, we see students skimming over the descriptives in their eagerness to get to the more exciting inferential methods, and then ending up with some very flawed results.

Don’t be a sucker – give your descriptive statistics the love and attention they deserve!

Examples of descriptive statistics

Branch 2: Inferential Statistics

As I mentioned, while descriptive statistics are all about the details of your specific data set – your sample – inferential statistics aim to make inferences about the population. In other words, you’ll use inferential statistics to make predictions about what you’d expect to find in the full population.

What kind of predictions, you ask? Well, there are two common types of predictions that researchers try to make using inferential stats:

  • Firstly, predictions about differences between groups – for example, height differences between children grouped by their favourite meal or gender.
  • And secondly, relationships between variables – for example, the relationship between body weight and the number of hours a week a person does yoga.

In other words, inferential statistics (when done correctly) allow you to connect the dots and make predictions about what you expect to see in the real-world population, based on what you observe in your sample data. For this reason, inferential statistics are used for hypothesis testing – in other words, to test hypotheses that predict changes or differences.

Inferential statistics are used to make predictions about what you’d expect to find in the full population, based on the sample.

Of course, when you’re working with inferential statistics, the composition of your sample is really important. In other words, if your sample doesn’t accurately represent the population you’re researching, then your findings won’t necessarily be very useful.

For example, if your population of interest is a mix of 50% male and 50% female, but your sample is 80% male, you can’t make inferences about the population based on your sample, since it’s not representative. This area of statistics is called sampling, but we won’t go down that rabbit hole here (it’s a deep one!) – we’ll save that for another post.

What statistics are usually used in this branch?

There are many, many different statistical analysis methods within the inferential branch and it’d be impossible for us to discuss them all here. So we’ll just take a look at some of the most common inferential statistical methods so that you have a solid starting point.

First up are T-tests. T-tests compare the means (the averages) of two groups of data to assess whether they’re statistically significantly different. In other words, do the two groups have significantly different means, given the spread of values within each group?

This type of testing is very useful for understanding just how similar or different two groups of data are. For example, you might want to compare the mean blood pressure between two groups of people – one that has taken a new medication and one that hasn’t – to assess whether they are significantly different.
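
As a sketch of the mechanics (using Welch’s variant, which doesn’t assume equal variances), the t statistic itself is just a difference in means scaled by a combined standard error. This is illustrative only – a real analysis would also compute degrees of freedom and a p-value, and the blood-pressure readings below are invented:

```python
import statistics

def welch_t(group_a, group_b):
    """Welch's t statistic: difference in group means divided by the
    combined standard error of the two groups."""
    mean_a, mean_b = statistics.mean(group_a), statistics.mean(group_b)
    var_a, var_b = statistics.variance(group_a), statistics.variance(group_b)
    standard_error = (var_a / len(group_a) + var_b / len(group_b)) ** 0.5
    return (mean_a - mean_b) / standard_error

# Toy readings for a medicated and an unmedicated group:
t_stat = welch_t([118, 121, 119, 122], [126, 129, 127, 130])
```

The further the t statistic sits from zero, the less plausible it is that the two group means differ only by chance.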

Kicking things up a level, we have ANOVA, which stands for “analysis of variance”. This test is similar to a T-test in that it compares the means of various groups, but ANOVA allows you to analyse multiple groups, not just two. So it’s basically a t-test on steroids…
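
A sketch of the F statistic that ANOVA is built on: the variance between group means divided by the variance within groups. A full test would convert this into a p-value using the F distribution; the group values here are invented:

```python
def one_way_anova_f(groups):
    """One-way ANOVA F statistic: between-group mean square over
    within-group mean square."""
    all_values = [x for group in groups for x in group]
    grand_mean = sum(all_values) / len(all_values)
    k, n = len(groups), len(all_values)
    group_means = [sum(g) / len(g) for g in groups]
    # Between-group sum of squares, scaled by degrees of freedom k - 1.
    ss_between = sum(len(g) * (m - grand_mean) ** 2
                     for g, m in zip(groups, group_means))
    # Within-group sum of squares, scaled by degrees of freedom n - k.
    ss_within = sum((x - m) ** 2
                    for g, m in zip(groups, group_means) for x in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))

f_stat = one_way_anova_f([[1, 2, 3], [2, 3, 4], [9, 10, 11]])
```

When the group means are far apart relative to the spread inside each group, F is large, which is evidence that at least one group mean differs from the others.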

Next, we have correlation analysis. This type of analysis assesses the relationship between two variables. In other words, if one variable increases, does the other variable also increase, decrease or stay the same? For example, if the average temperature goes up, do average ice cream sales increase too? We’d expect some sort of relationship between these two variables intuitively, but correlation analysis allows us to measure that relationship scientifically.
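
The ice-cream example can be made concrete with a hand-rolled Pearson correlation coefficient, which ranges from -1 (perfect negative relationship) to 1 (perfect positive relationship). The temperature and sales figures are invented:

```python
def pearson_r(xs, ys):
    """Pearson correlation: covariance of x and y divided by the
    product of their standard deviations."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = sum((x - mean_x) ** 2 for x in xs) ** 0.5
    sd_y = sum((y - mean_y) ** 2 for y in ys) ** 0.5
    return cov / (sd_x * sd_y)

# Toy data: daily temperature (°C) vs. ice creams sold.
r = pearson_r([18, 22, 26, 30], [40, 55, 65, 80])
```

An r close to 1, as here, is the numeric counterpart of the bottom-left-to-top-right cluster in the scatter plot shown later in the post.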

Lastly, we have regression analysis – this is quite similar to correlation in that it assesses the relationship between variables, but it goes a step further to model how one (or more) variables predict another, which helps probe cause and effect rather than just whether variables move together. In other words, does the one variable actually cause the other one to move, or do they just happen to move together naturally thanks to another force? Just because two variables correlate doesn’t necessarily mean that one causes the other.

Stats overload…

I hear you. To make this all a little more tangible, let’s take a look at an example of a correlation in action.

Here’s a scatter plot demonstrating the correlation (relationship) between weight and height. Intuitively, we’d expect there to be some relationship between these two variables, which is what we see in this scatter plot. In other words, the results tend to cluster together in a diagonal line from bottom left to top right.

Sample correlation

As I mentioned, these are just a handful of inferential techniques – there are many, many more. Importantly, each statistical method has its own assumptions and limitations.

For example, some methods only work with normally distributed (parametric) data, while other methods are designed specifically for non-parametric data. And that’s exactly why descriptive statistics are so important – they’re the first step to knowing which inferential techniques you can and can’t use.

Remember that every statistical method has its own assumptions and limitations, so you need to be aware of these.

How to choose the right analysis method

To choose the right statistical methods, you need to think about two important factors:

  • The type of quantitative data you have (specifically, the level of measurement and the shape of the data), and
  • Your research questions and hypotheses.

Let’s take a closer look at each of these.

Factor 1 – Data type

The first thing you need to consider is the type of data you’ve collected (or the type of data you will collect). By data types, I’m referring to the four levels of measurement – namely, nominal, ordinal, interval and ratio. If you’re not familiar with this lingo, check out the video below.

Why does this matter?

Well, because different statistical methods and techniques require different types of data. This is one of the “assumptions” I mentioned earlier – every method has its assumptions regarding the type of data.

For example, some techniques work with categorical data (for example, yes/no type questions, or gender or ethnicity), while others work with continuous numerical data (for example, age, weight or income) – and, of course, some work with multiple data types.

If you try to use a statistical method that doesn’t support the data type you have, your results will be largely meaningless . So, make sure that you have a clear understanding of what types of data you’ve collected (or will collect). Once you have this, you can then check which statistical methods would support your data types here .

If you haven’t collected your data yet, you can work in reverse and look at which statistical method would give you the most useful insights, and then design your data collection strategy to collect the correct data types.

Another important factor to consider is the shape of your data. Specifically, does it have a normal distribution (in other words, is it a bell-shaped curve, centred in the middle) or is it very skewed to the left or the right? Again, different statistical techniques work for different shapes of data – some are designed for symmetrical data while others are designed for skewed data.

This is another reminder of why descriptive statistics are so important – they tell you all about the shape of your data.

Factor 2: Your research questions

The next thing you need to consider is your specific research questions, as well as your hypotheses (if you have some). The nature of your research questions and research hypotheses will heavily influence which statistical methods and techniques you should use.

If you’re just interested in understanding the attributes of your sample (as opposed to the entire population), then descriptive statistics are probably all you need. For example, if you just want to assess the means (averages) and medians (centre points) of variables in a group of people.

On the other hand, if you aim to understand differences between groups or relationships between variables and to infer or predict outcomes in the population, then you’ll likely need both descriptive statistics and inferential statistics.

So, it’s really important to get very clear about your research aims and research questions, as well as your hypotheses, before you start looking at which statistical techniques to use.

Never shoehorn a specific statistical technique into your research just because you like it or have some experience with it. Your choice of methods must align with all the factors we’ve covered here.

Time to recap…

You’re still with me? That’s impressive. We’ve covered a lot of ground here, so let’s recap on the key points:

  • Quantitative data analysis is all about analysing number-based data (which includes categorical and numerical data) using various statistical techniques.
  • The two main branches of statistics are descriptive statistics and inferential statistics. Descriptives describe your sample, whereas inferentials make predictions about what you’ll find in the population.
  • Common descriptive statistical methods include mean (average), median, standard deviation and skewness.
  • Common inferential statistical methods include t-tests, ANOVA, correlation and regression analysis.
  • To choose the right statistical methods and techniques, you need to consider the type of data you’re working with, as well as your research questions and hypotheses.


Data Analysis

  • Introduction to Data Analysis
  • Quantitative Analysis Tools
  • Qualitative Analysis Tools
  • Mixed Methods Analysis
  • Geospatial Analysis
  • Further Reading


What is Data Analysis?

According to the federal government, data analysis is "the process of systematically applying statistical and/or logical techniques to describe and illustrate, condense and recap, and evaluate data" ( Responsible Conduct in Data Management ). Important components of data analysis include searching for patterns, remaining unbiased in drawing inference from data, practicing responsible  data management , and maintaining "honest and accurate analysis" ( Responsible Conduct in Data Management ). 

In order to understand data analysis further, it can be helpful to take a step back and ask the question "What is data?" Many of us associate data with spreadsheets of numbers and values; however, data can encompass much more than that. According to the federal government, data is "the recorded factual material commonly accepted in the scientific community as necessary to validate research findings" ( OMB Circular 110 ). This broad definition can include information in many formats.

Some examples of types of data are as follows:

  • Photographs 
  • Hand-written notes from field observation
  • Machine learning training data sets
  • Ethnographic interview transcripts
  • Sheet music
  • Scripts for plays and musicals 
  • Observations from laboratory experiments ( CMU Data 101 )

Thus, data analysis includes the processing and manipulation of these data sources in order to gain additional insight from data, answer a research question, or confirm a research hypothesis. 

Data analysis falls within the larger research data lifecycle (lifecycle diagram: University of Virginia ).

Why Analyze Data?

Through data analysis, a researcher can gain additional insight from data and draw conclusions to address the research question or hypothesis. Use of data analysis tools helps researchers understand and interpret data. 

What are the Types of Data Analysis?

Data analysis can be quantitative, qualitative, or mixed methods. 

Quantitative research typically involves numbers and "close-ended questions and responses" ( Creswell & Creswell, 2018 , p. 3). Quantitative research tests variables against objective theories, usually measured and collected on instruments and analyzed using statistical procedures ( Creswell & Creswell, 2018 , p. 4). Quantitative analysis usually uses deductive reasoning. 

Qualitative  research typically involves words and "open-ended questions and responses" ( Creswell & Creswell, 2018 , p. 3). According to Creswell & Creswell, "qualitative research is an approach for exploring and understanding the meaning individuals or groups ascribe to a social or human problem" ( 2018 , p. 4). Thus, qualitative analysis usually invokes inductive reasoning. 

Mixed methods  research uses methods from both quantitative and qualitative research approaches. Mixed methods research works under the "core assumption... that the integration of qualitative and quantitative data yields additional insight beyond the information provided by either the quantitative or qualitative data alone" ( Creswell & Creswell, 2018 , p. 4). 

  • Last Updated: May 3, 2024 9:38 AM
  • URL: https://guides.library.georgetown.edu/data-analysis


The 7 Most Useful Data Analysis Methods and Techniques

Data analytics is the process of analyzing raw data to draw out meaningful insights. These insights are then used to determine the best course of action.

When is the best time to roll out that marketing campaign? Is the current team structure as effective as it could be? Which customer segments are most likely to purchase your new product?

Ultimately, data analytics is a crucial driver of any successful business strategy. But how do data analysts actually turn raw data into something useful? There are a range of methods and techniques that data analysts use depending on the type of data in question and the kinds of insights they want to uncover.

You can get a hands-on introduction to data analytics in this free short course .

In this post, we’ll explore some of the most useful data analysis techniques. By the end, you’ll have a much clearer idea of how you can transform meaningless data into business intelligence. We’ll cover:

  • What is data analysis and why is it important?
  • What is the difference between qualitative and quantitative data?
  • Regression analysis
  • Monte Carlo simulation
  • Factor analysis
  • Cohort analysis
  • Cluster analysis
  • Time series analysis
  • Sentiment analysis
  • The data analysis process
  • The best tools for data analysis
  •  Key takeaways

The first six methods listed are used for quantitative data , while the last technique applies to qualitative data. We briefly explain the difference between quantitative and qualitative data in section two, but if you want to skip straight to a particular analysis technique, just use the clickable menu.

1. What is data analysis and why is it important?

Data analysis is, put simply, the process of discovering useful information by evaluating data. This is done through a process of inspecting, cleaning, transforming, and modeling data using analytical and statistical tools, which we will explore in detail further along in this article.

Why is data analysis important? Analyzing data effectively helps organizations make business decisions. Nowadays, data is collected by businesses constantly: through surveys, online tracking, online marketing analytics, collected subscription and registration data (think newsletters), social media monitoring, among other methods.

These data will appear as different structures, including—but not limited to—the following:

Big data

The concept of big data —data that is so large, fast, or complex, that it is difficult or impossible to process using traditional methods—gained momentum in the early 2000s. Then, Doug Laney, an industry analyst, articulated what is now known as the mainstream definition of big data as the three Vs: volume, velocity, and variety.

  • Volume: As mentioned earlier, organizations are collecting data constantly. In the not-too-distant past it would have been a real issue to store, but nowadays storage is cheap and takes up little space.
  • Velocity: Received data needs to be handled in a timely manner. With the growth of the Internet of Things, this can mean these data are coming in constantly, and at an unprecedented speed.
  • Variety: The data being collected and stored by organizations comes in many forms, ranging from structured data—that is, more traditional, numerical data—to unstructured data—think emails, videos, audio, and so on. We’ll cover structured and unstructured data a little further on.

Metadata

This is a form of data that provides information about other data, such as an image. In everyday life you’ll find this by, for example, right-clicking on a file in a folder and selecting “Get Info”, which will show you information such as file size and kind, date of creation, and so on.

Real-time data

This is data that is presented as soon as it is acquired. A good example of this is a stock market ticker, which provides information on the most-active stocks in real time.

Machine data

This is data that is produced wholly by machines, without human instruction. An example of this could be call logs automatically generated by your smartphone.

Quantitative and qualitative data

Quantitative data—otherwise known as structured data— may appear as a “traditional” database—that is, with rows and columns. Qualitative data—otherwise known as unstructured data—are the other types of data that don’t fit into rows and columns, which can include text, images, videos and more. We’ll discuss this further in the next section.

2. What is the difference between quantitative and qualitative data?

How you analyze your data depends on the type of data you’re dealing with— quantitative or qualitative . So what’s the difference?

Quantitative data is anything measurable , comprising specific quantities and numbers. Some examples of quantitative data include sales figures, email click-through rates, number of website visitors, and percentage revenue increase. Quantitative data analysis techniques focus on the statistical, mathematical, or numerical analysis of (usually large) datasets. This includes the manipulation of statistical data using computational techniques and algorithms. Quantitative analysis techniques are often used to explain certain phenomena or to make predictions.

Qualitative data cannot be measured objectively , and is therefore open to more subjective interpretation. Some examples of qualitative data include comments left in response to a survey question, things people have said during interviews, tweets and other social media posts, and the text included in product reviews. With qualitative data analysis, the focus is on making sense of unstructured data (such as written text, or transcripts of spoken conversations). Often, qualitative analysis will organize the data into themes—a process which, fortunately, can be automated.

Data analysts work with both quantitative and qualitative data , so it’s important to be familiar with a variety of analysis methods. Let’s take a look at some of the most useful techniques now.

3. Data analysis techniques

Now we’re familiar with some of the different types of data, let’s focus on the topic at hand: different methods for analyzing data. 

a. Regression analysis

Regression analysis is used to estimate the relationship between a set of variables. When conducting any type of regression analysis , you’re looking to see if there’s a correlation between a dependent variable (that’s the variable or outcome you want to measure or predict) and any number of independent variables (factors which may have an impact on the dependent variable). The aim of regression analysis is to estimate how one or more variables might impact the dependent variable, in order to identify trends and patterns. This is especially useful for making predictions and forecasting future trends.

Let’s imagine you work for an ecommerce company and you want to examine the relationship between: (a) how much money is spent on social media marketing, and (b) sales revenue. In this case, sales revenue is your dependent variable—it’s the factor you’re most interested in predicting and boosting. Social media spend is your independent variable; you want to determine whether or not it has an impact on sales and, ultimately, whether it’s worth increasing, decreasing, or keeping the same. Using regression analysis, you’d be able to see if there’s a relationship between the two variables. A positive correlation would imply that the more you spend on social media marketing, the more sales revenue you make. No correlation at all might suggest that social media marketing has no bearing on your sales. Understanding the relationship between these two variables would help you to make informed decisions about the social media budget going forward.

However, it’s important to note that, on their own, regressions can only be used to determine whether or not there is a relationship between a set of variables—they don’t tell you anything about cause and effect. So, while a positive correlation between social media spend and sales revenue may suggest that one impacts the other, it’s impossible to draw definitive conclusions based on this analysis alone.

There are many different types of regression analysis, and the model you use depends on the type of data you have for the dependent variable. For example, your dependent variable might be continuous (i.e. something that can be measured on a continuous scale, such as sales revenue in USD), in which case you’d use a different type of regression analysis than if your dependent variable was categorical in nature (i.e. comprising values that can be categorised into a number of distinct groups based on a certain characteristic, such as customer location by continent). You can learn more about different types of dependent variables and how to choose the right regression analysis in this guide .
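To make the social media example concrete, here’s a minimal sketch of simple linear regression (ordinary least squares with a single predictor) in plain Python — the spend and revenue figures are invented:

```python
# Hypothetical monthly figures: social media spend (USD) vs sales revenue (USD)
spend = [1000, 1500, 2000, 2500, 3000, 3500]
revenue = [22000, 24000, 27500, 29000, 32500, 34000]

n = len(spend)
mean_x = sum(spend) / n
mean_y = sum(revenue) / n

# OLS for one predictor: slope = cov(x, y) / var(x), intercept from the means
slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(spend, revenue))
         / sum((x - mean_x) ** 2 for x in spend))
intercept = mean_y - slope * mean_x

forecast = intercept + slope * 4000  # predicted revenue at $4,000 spend
print(f"revenue ~ {intercept:.0f} + {slope:.2f} * spend")
```

A positive slope here reflects a positive correlation — but, as noted above, it says nothing about cause and effect on its own.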

Regression analysis in action: Investigating the relationship between clothing brand Benetton’s advertising expenditure and sales

b. Monte Carlo simulation

When making decisions or taking certain actions, there are a range of different possible outcomes. If you take the bus, you might get stuck in traffic. If you walk, you might get caught in the rain or bump into your chatty neighbor, potentially delaying your journey. In everyday life, we tend to briefly weigh up the pros and cons before deciding which action to take; however, when the stakes are high, it’s essential to calculate, as thoroughly and accurately as possible, all the potential risks and rewards.

Monte Carlo simulation, otherwise known as the Monte Carlo method, is a computerized technique used to generate models of possible outcomes and their probability distributions. It essentially considers a range of possible outcomes and then calculates how likely it is that each particular outcome will be realized. The Monte Carlo method is used by data analysts to conduct advanced risk analysis, allowing them to better forecast what might happen in the future and make decisions accordingly.

So how does Monte Carlo simulation work, and what can it tell us? To run a Monte Carlo simulation, you’ll start with a mathematical model of your data—such as a spreadsheet. Within your spreadsheet, you’ll have one or several outputs that you’re interested in; profit, for example, or number of sales. You’ll also have a number of inputs; these are variables that may impact your output variable. If you’re looking at profit, relevant inputs might include the number of sales, total marketing spend, and employee salaries.

If you knew the exact, definitive values of all your input variables, you’d quite easily be able to calculate what profit you’d be left with at the end. However, when these values are uncertain, a Monte Carlo simulation enables you to calculate all the possible options and their probabilities. What will your profit be if you make 100,000 sales and hire five new employees on a salary of $50,000 each? What is the likelihood of this outcome? What will your profit be if you only make 12,000 sales and hire five new employees? And so on.

The simulation does this by replacing all uncertain values with functions which generate random samples from distributions determined by you, and then running a series of calculations and recalculations to produce models of all the possible outcomes and their probability distributions. The Monte Carlo method is one of the most popular techniques for calculating the effect of unpredictable variables on a specific output variable, making it ideal for risk analysis.
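The spreadsheet description above translates almost directly into code. This sketch assumes a toy profit model with two uncertain inputs — the distributions and all the numbers are invented for illustration:

```python
import random

random.seed(42)  # fixed seed so the run is reproducible

# Hypothetical model: profit = sales * margin - fixed costs,
# where sales and margin are uncertain inputs drawn from assumed distributions.
def simulate_profit():
    sales = random.gauss(100_000, 15_000)   # units sold (normal)
    margin = random.uniform(1.50, 2.50)     # profit per unit, USD (uniform)
    fixed_costs = 120_000                   # a known, certain input
    return sales * margin - fixed_costs

# Run many trials, then summarise the resulting outcome distribution
trials = [simulate_profit() for _ in range(10_000)]
expected = sum(trials) / len(trials)
loss_prob = sum(t < 0 for t in trials) / len(trials)

print(f"expected profit ~ ${expected:,.0f}, P(loss) ~ {loss_prob:.1%}")
```

The list of trial outcomes is the probability distribution the article describes: from it you can read off not just the expected profit but the risk of any outcome you care about, such as making a loss.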

Monte Carlo simulation in action: A case study using Monte Carlo simulation for risk analysis

 c. Factor analysis

Factor analysis is a technique used to reduce a large number of variables to a smaller number of factors. It works on the basis that multiple separate, observable variables correlate with each other because they are all associated with an underlying construct. This is useful not only because it condenses large datasets into smaller, more manageable samples, but also because it helps to uncover hidden patterns. This allows you to explore concepts that cannot be easily measured or observed—such as wealth, happiness, fitness, or, for a more business-relevant example, customer loyalty and satisfaction.

Let’s imagine you want to get to know your customers better, so you send out a rather long survey comprising one hundred questions. Some of the questions relate to how they feel about your company and product; for example, “Would you recommend us to a friend?” and “How would you rate the overall customer experience?” Other questions ask things like “What is your yearly household income?” and “How much are you willing to spend on skincare each month?”

Once your survey has been sent out and completed by lots of customers, you end up with a large dataset that essentially tells you one hundred different things about each customer (assuming each customer gives one hundred responses). Instead of looking at each of these responses (or variables) individually, you can use factor analysis to group them into factors that belong together—in other words, to relate them to a single underlying construct.

In this example, factor analysis works by finding survey items that are strongly correlated. This is known as covariance . So, if there’s a strong positive correlation between household income and how much they’re willing to spend on skincare each month (i.e. as one increases, so does the other), these items may be grouped together. Together with other variables (survey responses), you may find that they can be reduced to a single factor such as “consumer purchasing power”. Likewise, if a customer experience rating of 10/10 correlates strongly with “yes” responses regarding how likely they are to recommend your product to a friend, these items may be reduced to a single factor such as “customer satisfaction”.

In the end, you have a smaller number of factors rather than hundreds of individual variables. These factors are then taken forward for further analysis, allowing you to learn more about your customers (or any other area you’re interested in exploring).
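A full factor analysis involves decomposing the correlation matrix, but its first step — finding survey items that covary strongly — can be sketched in plain Python (the responses and item names below are invented):

```python
# Toy survey responses (one value per customer, invented for illustration).
income   = [30, 45, 60, 75, 90]   # yearly household income ($k)
spend    = [20, 35, 50, 60, 80]   # monthly skincare budget ($)
exp_rate = [9, 4, 7, 5, 8]        # customer experience rating out of 10

def pearson(xs, ys):
    """Pearson correlation: normalised covariance between two items."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

# Items that correlate strongly are candidates for one shared factor
r_income_spend = pearson(income, spend)    # strong: same underlying construct
r_income_exp = pearson(income, exp_rate)   # weak: unrelated items
print(f"income~spend r={r_income_spend:.2f}, income~rating r={r_income_exp:.2f}")
```

Income and skincare budget move together, so they would be grouped under something like the hypothetical “consumer purchasing power” factor, while the experience rating would not.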

Factor analysis in action: Using factor analysis to explore customer behavior patterns in Tehran

d. Cohort analysis

Cohort analysis is a data analytics technique that groups users based on a shared characteristic , such as the date they signed up for a service or the product they purchased. Once users are grouped into cohorts, analysts can track their behavior over time to identify trends and patterns.

So what does this mean and why is it useful? Let’s break down the above definition further. A cohort is a group of people who share a common characteristic (or action) during a given time period. Students who enrolled at university in 2020 may be referred to as the 2020 cohort. Customers who purchased something from your online store via the app in the month of December may also be considered a cohort.

With cohort analysis, you’re dividing your customers or users into groups and looking at how these groups behave over time. So, rather than looking at a single, isolated snapshot of all your customers at a given moment in time (with each customer at a different point in their journey), you’re examining your customers’ behavior in the context of the customer lifecycle. As a result, you can start to identify patterns of behavior at various points in the customer journey—say, from their first ever visit to your website, through to email newsletter sign-up, to their first purchase, and so on. As such, cohort analysis is dynamic, allowing you to uncover valuable insights about the customer lifecycle.

This is useful because it allows companies to tailor their service to specific customer segments (or cohorts). Let’s imagine you run a 50% discount campaign in order to attract potential new customers to your website. Once you’ve attracted a group of new customers (a cohort), you’ll want to track whether they actually buy anything and, if they do, whether or not (and how frequently) they make a repeat purchase. With these insights, you’ll start to gain a much better understanding of when this particular cohort might benefit from another discount offer or retargeting ads on social media, for example. Ultimately, cohort analysis allows companies to optimize their service offerings (and marketing) to provide a more targeted, personalized experience. You can learn more about how to run cohort analysis using Google Analytics .
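As a minimal sketch of the grouping step, the following assumes a toy purchase log — the customer IDs, signup months and purchase months are all invented:

```python
from collections import defaultdict

# Hypothetical event log: (customer_id, signup_month, purchase_month)
events = [
    ("a", "2024-01", "2024-01"), ("a", "2024-01", "2024-02"),
    ("b", "2024-01", "2024-01"),
    ("c", "2024-02", "2024-02"), ("c", "2024-02", "2024-03"),
    ("d", "2024-02", "2024-02"),
]

# Group customers into cohorts by signup month, then track which
# later months each cohort is still active in (distinct purchasers).
cohorts = defaultdict(lambda: defaultdict(set))
for customer, signup_month, purchase_month in events:
    cohorts[signup_month][purchase_month].add(customer)

for signup_month in sorted(cohorts):
    activity = {m: len(c) for m, c in sorted(cohorts[signup_month].items())}
    print(signup_month, "->", activity)
```

Reading across a cohort’s row shows retention: of the two customers who signed up in 2024-01, only one was still purchasing in 2024-02.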

Cohort analysis in action: How Ticketmaster used cohort analysis to boost revenue

e. Cluster analysis

Cluster analysis is an exploratory technique that seeks to identify structures within a dataset. The goal of cluster analysis is to sort different data points into groups (or clusters) that are internally homogeneous and externally heterogeneous. This means that data points within a cluster are similar to each other, and dissimilar to data points in another cluster. Clustering is used to gain insight into how data is distributed in a given dataset, or as a preprocessing step for other algorithms.

There are many real-world applications of cluster analysis. In marketing, cluster analysis is commonly used to group a large customer base into distinct segments, allowing for a more targeted approach to advertising and communication. Insurance firms might use cluster analysis to investigate why certain locations are associated with a high number of insurance claims. Another common application is in geology, where experts will use cluster analysis to evaluate which cities are at greatest risk of earthquakes (and thus try to mitigate the risk with protective measures).

It’s important to note that, while cluster analysis may reveal structures within your data, it won’t explain why those structures exist. With that in mind, cluster analysis is a useful starting point for understanding your data and informing further analysis. Clustering algorithms are also used in machine learning—you can learn more about clustering in machine learning in our guide .
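The most widely used clustering algorithm, k-means, alternates between assigning each point to its nearest centroid and recomputing the centroids. A minimal sketch on invented 2-D data (k = 2, naive initialisation, and it assumes neither cluster empties out):

```python
# Two obvious groups of invented 2-D points (k = 2)
points = [(1.0, 1.2), (0.8, 1.0), (1.1, 0.9),   # cluster near (1, 1)
          (8.0, 8.1), (7.9, 8.3), (8.2, 7.8)]   # cluster near (8, 8)

centroids = [points[0], points[3]]  # naive initialisation

def nearest(p, cs):
    """Index of the centroid closest to point p (squared distance)."""
    return min(range(len(cs)),
               key=lambda i: (p[0] - cs[i][0]) ** 2 + (p[1] - cs[i][1]) ** 2)

for _ in range(10):  # alternate assignment and centroid-update steps
    groups = [[], []]
    for p in points:
        groups[nearest(p, centroids)].append(p)
    centroids = [
        (sum(x for x, _ in g) / len(g), sum(y for _, y in g) / len(g))
        for g in groups
    ]

labels = [nearest(p, centroids) for p in points]
print(labels)  # points in the same cluster share a label
```

The output labels are internally homogeneous and externally heterogeneous in exactly the sense described above — but, as the article notes, nothing in the result explains *why* the two groups exist.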

Cluster analysis in action: Using cluster analysis for customer segmentation—a telecoms case study example

f. Time series analysis

Time series analysis is a statistical technique used to identify trends and cycles over time. Time series data is a sequence of data points which measure the same variable at different points in time (for example, weekly sales figures or monthly email sign-ups). By looking at time-related trends, analysts are able to forecast how the variable of interest may fluctuate in the future.

When conducting time series analysis, the main patterns you’ll be looking out for in your data are:

  • Trends: Stable, linear increases or decreases over an extended time period.
  • Seasonality: Predictable fluctuations in the data due to seasonal factors over a short period of time. For example, you might see a peak in swimwear sales in summer around the same time every year.
  • Cyclic patterns: Unpredictable cycles where the data fluctuates. Cyclical trends are not due to seasonality, but rather, may occur as a result of economic or industry-related conditions.

As you can imagine, the ability to make informed predictions about the future has immense value for business. Time series analysis and forecasting is used across a variety of industries, most commonly for stock market analysis, economic forecasting, and sales forecasting. There are different types of time series models depending on the data you’re using and the outcomes you want to predict. These models are typically classified into three broad types: the autoregressive (AR) models, the integrated (I) models, and the moving average (MA) models. For an in-depth look at time series analysis, refer to our guide .
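One of the simplest ways to expose a trend in time series data is a moving average, which smooths out short-term (e.g. seasonal) fluctuations. A sketch on invented monthly sales figures:

```python
# Hypothetical monthly sales: an upward trend plus short-term noise
sales = [100, 120, 90, 130, 110, 150, 120, 160, 140, 180, 150, 200]

def moving_average(series, window):
    """Average each run of `window` points to smooth out fluctuations."""
    return [
        sum(series[i:i + window]) / window
        for i in range(len(series) - window + 1)
    ]

trend = moving_average(sales, window=4)
print([round(t, 1) for t in trend])
```

The smoothed series rises steadily even though the raw figures bounce around — a stable, linear increase of the kind listed under “Trends” above.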

Time series analysis in action: Developing a time series model to predict jute yarn demand in Bangladesh

g. Sentiment analysis

When you think of data, your mind probably automatically goes to numbers and spreadsheets.

Many companies overlook the value of qualitative data, but in reality, there are untold insights to be gained from what people (especially customers) write and say about you. So how do you go about analyzing textual data?

One highly useful qualitative technique is sentiment analysis, which belongs to the broader category of text analysis—the (usually automated) process of sorting and understanding textual data.

With sentiment analysis, the goal is to interpret and classify the emotions conveyed within textual data. From a business perspective, this allows you to ascertain how your customers feel about various aspects of your brand, product, or service.

There are several different types of sentiment analysis models, each with a slightly different focus. The three main types include:

Fine-grained sentiment analysis

If you want to focus on opinion polarity (i.e. positive, neutral, or negative) in depth, fine-grained sentiment analysis will allow you to do so.

For example, if you wanted to interpret star ratings given by customers, you might use fine-grained sentiment analysis to categorize the various ratings along a scale ranging from very positive to very negative.

Emotion detection

This model often uses complex machine learning algorithms to pick out various emotions from your textual data.

You might use an emotion detection model to identify words associated with happiness, anger, frustration, and excitement, giving you insight into how your customers feel when writing about you or your product on, say, a product review site.

Aspect-based sentiment analysis

This type of analysis allows you to identify what specific aspects the emotions or opinions relate to, such as a certain product feature or a new ad campaign.

If a customer writes that they “find the new Instagram advert so annoying”, your model should detect not only a negative sentiment, but also the object towards which it’s directed.

In a nutshell, sentiment analysis uses various Natural Language Processing (NLP) algorithms and systems which are trained to associate certain inputs (for example, certain words) with certain outputs.

For example, the input “annoying” would be recognized and tagged as “negative”. Sentiment analysis is crucial to understanding how your customers feel about you and your products, for identifying areas for improvement, and even for averting PR disasters in real-time!
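As a toy illustration of that input-to-output mapping, here is a rule-based classifier in Python. Real sentiment analysis systems use trained NLP models rather than hand-written word lists; the lists below are invented purely for the sketch.

```python
# Toy rule-based sentiment classifier. Production systems use trained NLP
# models; this sketch just shows the input -> label mapping described above.

POSITIVE = {"great", "love", "excellent", "happy"}
NEGATIVE = {"annoying", "bad", "terrible", "hate"}

def classify_sentiment(text):
    words = text.lower().replace(",", " ").replace(".", " ").split()
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

classify_sentiment("I find the new Instagram advert so annoying")  # -> "negative"
```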

Sentiment analysis in action: 5 Real-world sentiment analysis case studies

4. The data analysis process

In order to gain meaningful insights from data, data analysts will perform a rigorous step-by-step process. We go over this in detail in our step-by-step guide to the data analysis process—but, to briefly summarize, the data analysis process generally consists of the following phases:

Defining the question

The first step for any data analyst will be to define the objective of the analysis, sometimes called a ‘problem statement’. Essentially, you’re asking a question with regards to a business problem you’re trying to solve. Once you’ve defined this, you’ll then need to determine which data sources will help you answer this question.

Collecting the data

Now that you’ve defined your objective, the next step will be to set up a strategy for collecting and aggregating the appropriate data. Will you be using quantitative (numeric) or qualitative (descriptive) data? And will it be first-party, second-party, or third-party data?

Learn more: Quantitative vs. Qualitative Data: What’s the Difference? 

Cleaning the data

Unfortunately, your collected data isn’t automatically ready for analysis—you’ll have to clean it first. As a data analyst, this phase of the process will take up the most time. During the data cleaning process, you will likely be:

  • Removing major errors, duplicates, and outliers
  • Removing unwanted data points
  • Structuring the data—that is, fixing typos, layout issues, etc.
  • Filling in major gaps in data
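A minimal sketch of these steps in Python, using an invented list of order records (real cleaning pipelines typically run on tools like pandas or SQL, but the logic is the same):

```python
# A minimal sketch of the cleaning steps above, applied to an invented
# list of order records: de-duplicate, drop rows with missing values,
# and remove an obvious outlier using a robust median-based rule.
from statistics import median

raw = [
    {"id": 1, "amount": 25.0},
    {"id": 1, "amount": 25.0},     # duplicate
    {"id": 2, "amount": None},     # missing data point
    {"id": 3, "amount": 27.5},
    {"id": 4, "amount": 24.0},
    {"id": 5, "amount": 26.0},
    {"id": 6, "amount": 9999.0},   # likely data-entry error
]

# Remove duplicates, keeping the first occurrence
seen, deduped = set(), []
for row in raw:
    key = (row["id"], row["amount"])
    if key not in seen:
        seen.add(key)
        deduped.append(row)

# Remove rows with missing data points
complete = [r for r in deduped if r["amount"] is not None]

# Remove outliers far from the median (the median is robust to the
# outlier itself, unlike the mean)
amounts = [r["amount"] for r in complete]
med = median(amounts)
mad = median(abs(a - med) for a in amounts)
clean = [r for r in complete if abs(r["amount"] - med) <= 10 * mad]
```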

Analyzing the data

Now that we’ve finished cleaning the data, it’s time to analyze it! Many analysis methods have already been described in this article, and it’s up to you to decide which one will best suit the assigned objective. It may fall under one of the following categories:

  • Descriptive analysis , which identifies what has already happened
  • Diagnostic analysis , which focuses on understanding why something has happened
  • Predictive analysis , which identifies future trends based on historical data
  • Prescriptive analysis , which allows you to make recommendations for the future

Visualizing and sharing your findings

We’re almost at the end of the road! Analyses have been made, insights have been gleaned—all that remains to be done is to share this information with others. This is usually done with a data visualization tool, such as Google Charts or Tableau.

Learn more: 13 of the Most Common Types of Data Visualization


5. The best tools for data analysis

As you can imagine, every phase of the data analysis process requires the data analyst to have a variety of tools under their belt that assist in gaining valuable insights from data. We cover these tools in greater detail in this article, but, in summary, here’s our best-of-the-best list, with links to each product:

The top 9 tools for data analysts

  • Microsoft Excel
  • Jupyter Notebook
  • Apache Spark
  • Microsoft Power BI

6. Key takeaways and further reading

As you can see, there are many different data analysis techniques at your disposal. In order to turn your raw data into actionable insights, it’s important to consider what kind of data you have (is it qualitative or quantitative?) as well as the kinds of insights that will be useful within the given context. In this post, we’ve introduced seven of the most useful data analysis techniques—but there are many more out there to be discovered!

So what now? If you haven’t already, we recommend reading the case studies for each analysis technique discussed in this post (you’ll find a link at the end of each section). For a more hands-on introduction to the kinds of methods and techniques that data analysts use, try out this free introductory data analytics short course. In the meantime, you might also want to read the following:

  • The Best Online Data Analytics Courses for 2024
  • What Is Time Series Data and How Is It Analyzed?
  • What is Spatial Analysis?

PW Skills | Blog

Data Analysis Techniques in Research – Methods, Tools & Examples


Varun Saharawat is a seasoned professional in the fields of SEO and content writing. With a profound knowledge of the intricate aspects of these disciplines, Varun has established himself as a valuable asset in the world of digital marketing and online content creation.


Data analysis techniques in research are essential because they allow researchers to derive meaningful insights from data sets to support their hypotheses or research objectives.

Data Analysis Techniques in Research: While various groups, institutions, and professionals may have diverse approaches to data analysis, a universal definition captures its essence. Data analysis involves refining, transforming, and interpreting raw data to derive actionable insights that guide informed decision-making for businesses.


A straightforward illustration of data analysis emerges when we make everyday decisions, basing our choices on past experiences or predictions of potential outcomes.

If you want to learn more about this topic and acquire valuable skills that will set you apart in today’s data-driven world, we highly recommend enrolling in the Data Analytics Course by Physics Wallah . And as a special offer for our readers, use the coupon code “READER” to get a discount on this course.


What is Data Analysis?

Data analysis is the systematic process of inspecting, cleaning, transforming, and interpreting data with the objective of discovering valuable insights and drawing meaningful conclusions. This process involves several steps:

  • Inspecting: Initial examination of data to understand its structure, quality, and completeness.
  • Cleaning: Removing errors, inconsistencies, or irrelevant information to ensure accurate analysis.
  • Transforming: Converting data into a format suitable for analysis, such as normalization or aggregation.
  • Interpreting: Analyzing the transformed data to identify patterns, trends, and relationships.
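These four steps can be sketched end-to-end on a tiny invented example—a list of raw survey ratings collected as strings:

```python
# Sketch of inspect -> clean -> transform -> interpret on invented data.
raw_responses = ["4", "5", "3", "", "5", "4", "n/a", "2"]

# Inspecting: how many responses are unusable?
blank_or_invalid = [r for r in raw_responses if not r.isdigit()]

# Cleaning: drop entries that aren't valid ratings
cleaned = [r for r in raw_responses if r.isdigit()]

# Transforming: convert to integers and aggregate
ratings = [int(r) for r in cleaned]
average_rating = sum(ratings) / len(ratings)

# Interpreting: a simple pattern check (share of ratings of 4 or above)
share_satisfied = sum(r >= 4 for r in ratings) / len(ratings)
```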

Types of Data Analysis Techniques in Research

Data analysis techniques in research are categorized into qualitative and quantitative methods, each with its specific approaches and tools. These techniques are instrumental in extracting meaningful insights, patterns, and relationships from data to support informed decision-making, validate hypotheses, and derive actionable recommendations. Below is an in-depth exploration of the various types of data analysis techniques commonly employed in research:

1) Qualitative Analysis:

Definition: Qualitative analysis focuses on understanding non-numerical data, such as opinions, concepts, or experiences, to derive insights into human behavior, attitudes, and perceptions.

  • Content Analysis: Examines textual data, such as interview transcripts, articles, or open-ended survey responses, to identify themes, patterns, or trends.
  • Narrative Analysis: Analyzes personal stories or narratives to understand individuals’ experiences, emotions, or perspectives.
  • Ethnographic Studies: Involves observing and analyzing cultural practices, behaviors, and norms within specific communities or settings.

2) Quantitative Analysis:

Quantitative analysis emphasizes numerical data and employs statistical methods to explore relationships, patterns, and trends. It encompasses several approaches:

Descriptive Analysis:

  • Frequency Distribution: Represents the number of occurrences of distinct values within a dataset.
  • Central Tendency: Measures such as mean, median, and mode provide insights into the central values of a dataset.
  • Dispersion: Techniques like variance and standard deviation indicate the spread or variability of data.
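All three groups of measures are available in Python’s standard library; the exam scores here are invented:

```python
# Descriptive statistics on an invented set of exam scores,
# using only Python's standard library.
from collections import Counter
from statistics import mean, median, mode, pstdev, pvariance

scores = [62, 75, 75, 81, 90, 75, 68, 81, 90, 75]

frequency = Counter(scores)             # frequency distribution
central = {"mean": mean(scores),        # central tendency
           "median": median(scores),
           "mode": mode(scores)}
dispersion = {"variance": pvariance(scores),  # population variance
              "stdev": pstdev(scores)}        # population std. deviation
```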

Diagnostic Analysis:

  • Regression Analysis: Assesses the relationship between dependent and independent variables, enabling prediction or understanding causality.
  • ANOVA (Analysis of Variance): Examines differences between groups to identify significant variations or effects.
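As a sketch of the regression side, here is an ordinary least squares fit with one predictor, computed from its textbook formulas. The hours-studied and exam-score figures are invented:

```python
# Minimal ordinary least squares fit (one predictor), illustrating how
# regression estimates the relationship between two variables.

def ols_fit(x, y):
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    slope = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
             / sum((xi - x_bar) ** 2 for xi in x))
    intercept = y_bar - slope * x_bar
    return slope, intercept

hours = [1, 2, 3, 4, 5]
score = [52, 55, 61, 64, 68]

slope, intercept = ols_fit(hours, score)
predicted_6h = slope * 6 + intercept   # predict the score for 6 hours of study
```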

Predictive Analysis:

  • Time Series Forecasting: Uses historical data points to predict future trends or outcomes.
  • Machine Learning Algorithms: Techniques like decision trees, random forests, and neural networks predict outcomes based on patterns in data.

Prescriptive Analysis:

  • Optimization Models: Utilizes linear programming, integer programming, or other optimization techniques to identify the best solutions or strategies.
  • Simulation: Mimics real-world scenarios to evaluate various strategies or decisions and determine optimal outcomes.

Specific Techniques:

  • Monte Carlo Simulation: Models probabilistic outcomes to assess risk and uncertainty.
  • Factor Analysis: Reduces the dimensionality of data by identifying underlying factors or components.
  • Cohort Analysis: Studies specific groups or cohorts over time to understand trends, behaviors, or patterns within these groups.
  • Cluster Analysis: Classifies objects or individuals into homogeneous groups or clusters based on similarities or attributes.
  • Sentiment Analysis: Uses natural language processing and machine learning techniques to determine sentiment, emotions, or opinions from textual data.

Also Read: AI and Predictive Analytics: Examples, Tools, Uses, Ai Vs Predictive Analytics

Data Analysis Techniques in Research Examples

To provide a clearer understanding of how data analysis techniques are applied in research, let’s consider a hypothetical research study focused on evaluating the impact of online learning platforms on students’ academic performance.

Research Objective:

Determine if students using online learning platforms achieve higher academic performance compared to those relying solely on traditional classroom instruction.

Data Collection:

  • Quantitative Data: Academic scores (grades) of students using online platforms and those using traditional classroom methods.
  • Qualitative Data: Feedback from students regarding their learning experiences, challenges faced, and preferences.

Data Analysis Techniques Applied:

1) Descriptive Analysis:

  • Calculate the mean, median, and mode of academic scores for both groups.
  • Create frequency distributions to represent the distribution of grades in each group.

2) Diagnostic Analysis:

  • Conduct an Analysis of Variance (ANOVA) to determine if there’s a statistically significant difference in academic scores between the two groups.
  • Perform Regression Analysis to assess the relationship between the time spent on online platforms and academic performance.

3) Predictive Analysis:

  • Utilize Time Series Forecasting to predict future academic performance trends based on historical data.
  • Implement Machine Learning algorithms to develop a predictive model that identifies factors contributing to academic success on online platforms.

4) Prescriptive Analysis:

  • Apply Optimization Models to identify the optimal combination of online learning resources (e.g., video lectures, interactive quizzes) that maximize academic performance.
  • Use Simulation Techniques to evaluate different scenarios, such as varying student engagement levels with online resources, to determine the most effective strategies for improving learning outcomes.

5) Specific Techniques:

  • Conduct Factor Analysis on qualitative feedback to identify common themes or factors influencing students’ perceptions and experiences with online learning.
  • Perform Cluster Analysis to segment students based on their engagement levels, preferences, or academic outcomes, enabling targeted interventions or personalized learning strategies.
  • Apply Sentiment Analysis on textual feedback to categorize students’ sentiments as positive, negative, or neutral regarding online learning experiences.

By applying a combination of qualitative and quantitative data analysis techniques, this research example aims to provide comprehensive insights into the effectiveness of online learning platforms.

Also Read: Learning Path to Become a Data Analyst in 2024

Data Analysis Techniques in Quantitative Research

Quantitative research involves collecting numerical data to examine relationships, test hypotheses, and make predictions. Various data analysis techniques are employed to interpret and draw conclusions from quantitative data. Here are some key data analysis techniques commonly used in quantitative research:

1) Descriptive Statistics:

  • Description: Descriptive statistics are used to summarize and describe the main aspects of a dataset, such as central tendency (mean, median, mode), variability (range, variance, standard deviation), and distribution (skewness, kurtosis).
  • Applications: Summarizing data, identifying patterns, and providing initial insights into the dataset.

2) Inferential Statistics:

  • Description: Inferential statistics involve making predictions or inferences about a population based on a sample of data. This technique includes hypothesis testing, confidence intervals, t-tests, chi-square tests, analysis of variance (ANOVA), regression analysis, and correlation analysis.
  • Applications: Testing hypotheses, making predictions, and generalizing findings from a sample to a larger population.

3) Regression Analysis:

  • Description: Regression analysis is a statistical technique used to model and examine the relationship between a dependent variable and one or more independent variables. Linear regression, multiple regression, logistic regression, and nonlinear regression are common types of regression analysis.
  • Applications: Predicting outcomes, identifying relationships between variables, and understanding the impact of independent variables on the dependent variable.

4) Correlation Analysis:

  • Description: Correlation analysis is used to measure and assess the strength and direction of the relationship between two or more variables. The Pearson correlation coefficient, Spearman rank correlation coefficient, and Kendall’s tau are commonly used measures of correlation.
  • Applications: Identifying associations between variables and assessing the degree and nature of the relationship.
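The Pearson coefficient can be computed directly from its definition; the ad-spend and sales figures below are invented:

```python
# Pearson correlation coefficient from its definition:
# covariance divided by the product of the standard deviations.
from math import sqrt

def pearson_r(x, y):
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    cov = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    sx = sqrt(sum((a - x_bar) ** 2 for a in x))
    sy = sqrt(sum((b - y_bar) ** 2 for b in y))
    return cov / (sx * sy)

ad_spend = [10, 20, 30, 40, 50]
sales    = [12, 24, 33, 41, 55]
r = pearson_r(ad_spend, sales)   # close to +1: strong positive relationship
```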

5) Factor Analysis:

  • Description: Factor analysis is a multivariate statistical technique used to identify and analyze underlying relationships or factors among a set of observed variables. It helps in reducing the dimensionality of data and identifying latent variables or constructs.
  • Applications: Identifying underlying factors or constructs, simplifying data structures, and understanding the underlying relationships among variables.

6) Time Series Analysis:

  • Description: Time series analysis involves analyzing data collected or recorded over a specific period at regular intervals to identify patterns, trends, and seasonality. Techniques such as moving averages, exponential smoothing, autoregressive integrated moving average (ARIMA), and Fourier analysis are used.
  • Applications: Forecasting future trends, analyzing seasonal patterns, and understanding time-dependent relationships in data.
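Of the techniques listed, exponential smoothing is the simplest to sketch: each smoothed value blends the newest observation with the previous smoothed value, weighted by a factor alpha. The demand series is invented:

```python
# Simple exponential smoothing: each value is a weighted blend of the
# latest observation and the previous smoothed value.

def exponential_smoothing(series, alpha):
    smoothed = [series[0]]                 # initialize with the first value
    for value in series[1:]:
        smoothed.append(alpha * value + (1 - alpha) * smoothed[-1])
    return smoothed

demand = [100, 104, 101, 108, 110, 107, 112]
level = exponential_smoothing(demand, alpha=0.3)
next_forecast = level[-1]   # one-step-ahead forecast
```

A higher alpha reacts faster to new data; a lower alpha smooths more aggressively.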

7) ANOVA (Analysis of Variance):

  • Description: Analysis of variance (ANOVA) is a statistical technique used to analyze and compare the means of two or more groups or treatments to determine if they are statistically different from each other. One-way ANOVA, two-way ANOVA, and MANOVA (Multivariate Analysis of Variance) are common types of ANOVA.
  • Applications: Comparing group means, testing hypotheses, and determining the effects of categorical independent variables on a continuous dependent variable.

8) Chi-Square Tests:

  • Description: Chi-square tests are non-parametric statistical tests used to assess the association between categorical variables in a contingency table. The Chi-square test of independence, goodness-of-fit test, and test of homogeneity are common chi-square tests.
  • Applications: Testing relationships between categorical variables, assessing goodness-of-fit, and evaluating independence.
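The chi-square statistic itself is straightforward to compute by hand: compare each observed count with the count expected under independence. A sketch with an invented 2x2 table (study method vs. pass/fail):

```python
# Chi-square test of independence: sum of (observed - expected)^2 / expected
# over every cell of the contingency table. Counts are invented.

def chi_square_statistic(table):
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed - expected) ** 2 / expected
    return stat

# Rows: method (online, classroom); columns: outcome (pass, fail)
observed = [[30, 10],
            [20, 20]]
chi2 = chi_square_statistic(observed)
# Compare chi2 against the critical value for 1 degree of freedom
# (3.841 at the 5% level) to decide whether the variables are associated.
```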

These quantitative data analysis techniques provide researchers with valuable tools and methods to analyze, interpret, and derive meaningful insights from numerical data. The selection of a specific technique often depends on the research objectives, the nature of the data, and the underlying assumptions of the statistical methods being used.

Also Read: Analysis vs. Analytics: How Are They Different?

Data Analysis Methods

Data analysis methods refer to the techniques and procedures used to analyze, interpret, and draw conclusions from data. These methods are essential for transforming raw data into meaningful insights, facilitating decision-making processes, and driving strategies across various fields. Here are some common data analysis methods:

1) Descriptive Statistics:

  • Description: Descriptive statistics summarize and organize data to provide a clear and concise overview of the dataset. Measures such as mean, median, mode, range, variance, and standard deviation are commonly used.

2) Inferential Statistics:

  • Description: Inferential statistics involve making predictions or inferences about a population based on a sample of data. Techniques such as hypothesis testing, confidence intervals, and regression analysis are used.

3) Exploratory Data Analysis (EDA):

  • Description: EDA techniques involve visually exploring and analyzing data to discover patterns, relationships, anomalies, and insights. Methods such as scatter plots, histograms, box plots, and correlation matrices are utilized.
  • Applications: Identifying trends, patterns, outliers, and relationships within the dataset.

4) Predictive Analytics:

  • Description: Predictive analytics use statistical algorithms and machine learning techniques to analyze historical data and make predictions about future events or outcomes. Techniques such as regression analysis, time series forecasting, and machine learning algorithms (e.g., decision trees, random forests, neural networks) are employed.
  • Applications: Forecasting future trends, predicting outcomes, and identifying potential risks or opportunities.

5) Prescriptive Analytics:

  • Description: Prescriptive analytics involve analyzing data to recommend actions or strategies that optimize specific objectives or outcomes. Optimization techniques, simulation models, and decision-making algorithms are utilized.
  • Applications: Recommending optimal strategies, decision-making support, and resource allocation.

6) Qualitative Data Analysis:

  • Description: Qualitative data analysis involves analyzing non-numerical data, such as text, images, videos, or audio, to identify themes, patterns, and insights. Methods such as content analysis, thematic analysis, and narrative analysis are used.
  • Applications: Understanding human behavior, attitudes, perceptions, and experiences.

7) Big Data Analytics:

  • Description: Big data analytics methods are designed to analyze large volumes of structured and unstructured data to extract valuable insights. Technologies such as Hadoop, Spark, and NoSQL databases are used to process and analyze big data.
  • Applications: Analyzing large datasets, identifying trends, patterns, and insights from big data sources.

8) Text Analytics:

  • Description: Text analytics methods involve analyzing textual data, such as customer reviews, social media posts, emails, and documents, to extract meaningful information and insights. Techniques such as sentiment analysis, text mining, and natural language processing (NLP) are used.
  • Applications: Analyzing customer feedback, monitoring brand reputation, and extracting insights from textual data sources.

These data analysis methods are instrumental in transforming data into actionable insights, informing decision-making processes, and driving organizational success across various sectors, including business, healthcare, finance, marketing, and research. The selection of a specific method often depends on the nature of the data, the research objectives, and the analytical requirements of the project or organization.

Also Read: Quantitative Data Analysis: Types, Analysis & Examples

Data Analysis Tools

Data analysis tools are essential instruments that facilitate the process of examining, cleaning, transforming, and modeling data to uncover useful information, make informed decisions, and drive strategies. Here are some prominent data analysis tools widely used across various industries:

1) Microsoft Excel:

  • Description: A spreadsheet software that offers basic to advanced data analysis features, including pivot tables, data visualization tools, and statistical functions.
  • Applications: Data cleaning, basic statistical analysis, visualization, and reporting.

2) R Programming Language:

  • Description: An open-source programming language specifically designed for statistical computing and data visualization.
  • Applications: Advanced statistical analysis, data manipulation, visualization, and machine learning.

3) Python (with Libraries like Pandas, NumPy, Matplotlib, and Seaborn):

  • Description: A versatile programming language with libraries that support data manipulation, analysis, and visualization.
  • Applications: Data cleaning, statistical analysis, machine learning, and data visualization.

4) SPSS (Statistical Package for the Social Sciences):

  • Description: A comprehensive statistical software suite used for data analysis, data mining, and predictive analytics.
  • Applications: Descriptive statistics, hypothesis testing, regression analysis, and advanced analytics.

5) SAS (Statistical Analysis System):

  • Description: A software suite used for advanced analytics, multivariate analysis, and predictive modeling.
  • Applications: Data management, statistical analysis, predictive modeling, and business intelligence.

6) Tableau:

  • Description: A data visualization tool that allows users to create interactive and shareable dashboards and reports.
  • Applications: Data visualization, business intelligence, and interactive dashboard creation.

7) Power BI:

  • Description: A business analytics tool developed by Microsoft that provides interactive visualizations and business intelligence capabilities.
  • Applications: Data visualization, business intelligence, reporting, and dashboard creation.

8) SQL (Structured Query Language) Databases (e.g., MySQL, PostgreSQL, Microsoft SQL Server):

  • Description: Database management systems that support data storage, retrieval, and manipulation using SQL queries.
  • Applications: Data retrieval, data cleaning, data transformation, and database management.

9) Apache Spark:

  • Description: A fast and general-purpose distributed computing system designed for big data processing and analytics.
  • Applications: Big data processing, machine learning, data streaming, and real-time analytics.

10) IBM SPSS Modeler:

  • Description: A data mining software application used for building predictive models and conducting advanced analytics.
  • Applications: Predictive modeling, data mining, statistical analysis, and decision optimization.

These tools serve various purposes and cater to different data analysis needs, from basic statistical analysis and data visualization to advanced analytics, machine learning, and big data processing. The choice of a specific tool often depends on the nature of the data, the complexity of the analysis, and the specific requirements of the project or organization.

Also Read: How to Analyze Survey Data: Methods & Examples

Importance of Data Analysis in Research

The importance of data analysis in research cannot be overstated; it serves as the backbone of any scientific investigation or study. Here are several key reasons why data analysis is crucial in the research process:

  • Data analysis helps ensure that the results obtained are valid and reliable. By systematically examining the data, researchers can identify any inconsistencies or anomalies that may affect the credibility of the findings.
  • Effective data analysis provides researchers with the necessary information to make informed decisions. By interpreting the collected data, researchers can draw conclusions, make predictions, or formulate recommendations based on evidence rather than intuition or guesswork.
  • Data analysis allows researchers to identify patterns, trends, and relationships within the data. This can lead to a deeper understanding of the research topic, enabling researchers to uncover insights that may not be immediately apparent.
  • In empirical research, data analysis plays a critical role in testing hypotheses. Researchers collect data to either support or refute their hypotheses, and data analysis provides the tools and techniques to evaluate these hypotheses rigorously.
  • Transparent and well-executed data analysis enhances the credibility of research findings. By clearly documenting the data analysis methods and procedures, researchers allow others to replicate the study, thereby contributing to the reproducibility of research findings.
  • In fields such as business or healthcare, data analysis helps organizations allocate resources more efficiently. By analyzing data on consumer behavior, market trends, or patient outcomes, organizations can make strategic decisions about resource allocation, budgeting, and planning.
  • In public policy and social sciences, data analysis is instrumental in developing and evaluating policies and interventions. By analyzing data on social, economic, or environmental factors, policymakers can assess the effectiveness of existing policies and inform the development of new ones.
  • Data analysis allows for continuous improvement in research methods and practices. By analyzing past research projects, identifying areas for improvement, and implementing changes based on data-driven insights, researchers can refine their approaches and enhance the quality of future research endeavors.

However, it is important to remember that mastering these techniques requires practice and continuous learning. That’s why we highly recommend the Data Analytics Course by Physics Wallah. Not only does it cover all the fundamentals of data analysis, but it also provides hands-on experience with various tools such as Excel, Python, and Tableau. Plus, if you use the “READER” coupon code at checkout, you can get a special discount on the course.

For Latest Tech Related Information, Join Our Official Free Telegram Group: PW Skills Telegram Group

Data Analysis Techniques in Research FAQs

What are the 5 techniques for data analysis?

The five techniques for data analysis include: Descriptive Analysis, Diagnostic Analysis, Predictive Analysis, Prescriptive Analysis, and Qualitative Analysis.

What are techniques of data analysis in research?

Techniques of data analysis in research encompass both qualitative and quantitative methods. These techniques involve processes like summarizing raw data, investigating causes of events, forecasting future outcomes, offering recommendations based on predictions, and examining non-numerical data to understand concepts or experiences.

What are the 3 methods of data analysis?

The three primary methods of data analysis are: Qualitative Analysis, Quantitative Analysis, and Mixed-Methods Analysis.

What are the four types of data analysis techniques?

The four types of data analysis techniques are: Descriptive Analysis, Diagnostic Analysis, Predictive Analysis, and Prescriptive Analysis.



Data Analytics: What It Is, How It's Used, and 4 Basic Techniques



The term data analytics refers to the science of analyzing raw data to make conclusions about information. Many of the techniques and processes of data analytics have been automated into mechanical processes and algorithms that work over raw data for human consumption. Data analytics can be used by different entities, such as businesses, to optimize their performance and maximize their profits. This is done by using software and other tools to gather and analyze raw data.

Key Takeaways

  • Data analytics is the science of analyzing raw data to make conclusions about that information.
  • Data analytics helps a business optimize its performance, perform more efficiently, maximize profit, or make more strategically guided decisions.
  • The techniques and processes of data analytics have been automated into mechanical processes and algorithms that work over raw data for human consumption. 
  • Various approaches to data analytics include descriptive analytics, diagnostic analytics, predictive analytics, and prescriptive analytics.
  • Data analytics relies on a variety of software tools including spreadsheets, data visualization, reporting tools, data mining programs, and open-source languages.

Understanding Data Analytics

Data analytics is a broad term that encompasses many diverse types of data analysis. Any type of information can be subjected to data analytics techniques to get insight that can be used to improve things. Data analytics techniques can reveal trends and metrics that would otherwise be lost in the mass of information. This information can then be used to optimize processes to increase the overall efficiency of a business or system.

For example, manufacturing companies often record the runtime, downtime, and work queue for various machines and then analyze the data to better plan workloads so the machines operate closer to peak capacity.

Data analytics can do much more than point out bottlenecks in production. Gaming companies use data analytics to set reward schedules for players that keep the majority of players active in the game. Content companies use many of the same data analytics to keep you clicking, watching, or re-organizing content to get another view or another click.

Data analytics is important because it helps businesses optimize their performances. Implementing it into the business model means companies can help reduce costs by identifying more efficient ways of doing business.

A company can also use data analytics to make better business decisions and help analyze customer trends and satisfaction, which can lead to new and better products and services. 

Steps in Data Analysis

Data analysis involves several steps:

  • Determine the data requirements or how the data is grouped. Data may be separated by age, demographic, income, or gender. Data values may be numerical or divided by category.
  • Collect the data. This can be done through a variety of sources such as computers, online sources, cameras, environmental sources, or through personnel.
  • Organize the data after it's collected so it can be analyzed. This may take place on a spreadsheet or other software that can handle statistical data.
  • Clean up the data before it is analyzed. This means scrubbing it to ensure there is no duplication or error and that it is not incomplete. This step helps correct any errors before the data goes on to a data analyst to be analyzed.
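The organize-and-clean steps above can be sketched in a few lines of Python using only the standard library; the record fields and values here are invented for illustration:

```python
# A minimal data-cleaning sketch: drop exact duplicates and records
# that are missing required fields. Field names are hypothetical.
def clean_records(records, required_fields=("age", "income")):
    """Remove duplicate and incomplete records before analysis."""
    seen = set()
    cleaned = []
    for rec in records:
        key = tuple(sorted(rec.items()))
        if key in seen:                      # drop exact duplicates
            continue
        if any(rec.get(f) in (None, "") for f in required_fields):
            continue                         # drop incomplete records
        seen.add(key)
        cleaned.append(rec)
    return cleaned

raw = [
    {"age": 34, "income": 52000},
    {"age": 34, "income": 52000},   # duplicate
    {"age": 29, "income": None},    # incomplete
    {"age": 41, "income": 67000},
]
print(clean_records(raw))
```

In practice this step is usually done with spreadsheet software or a library such as pandas, but the logic is the same: deduplicate, then filter out incomplete rows.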

Types of Data Analytics

Data analytics is broken down into four basic types:

  • Descriptive analytics: This describes what has happened over a given period of time. Have the number of views gone up? Are sales stronger this month than last?
  • Diagnostic analytics: This focuses more on why something happened. It involves more diverse data inputs and a bit of hypothesizing. Did the weather affect beer sales? Did that latest marketing campaign impact sales?
  • Predictive analytics: This moves to what is likely going to happen in the near term. What happened to sales the last time we had a hot summer? How many weather models predict a hot summer this year?
  • Prescriptive analytics: This suggests a course of action. For example, if the likelihood of a hot summer, measured as the average of five weather models, is above 58%, we should add an evening shift to the brewery and rent an additional tank to increase output.
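As a minimal sketch, the descriptive and (naive) predictive types might look like this in Python; the monthly sales figures are invented for the example:

```python
# Hypothetical monthly sales figures used to illustrate two of the
# four analytics types.
sales = {"June": 120, "July": 150, "August": 165}

# Descriptive: what happened over a given period?
growth = (sales["August"] - sales["July"]) / sales["July"]
print(f"Sales grew {growth:.0%} month over month")  # prints "Sales grew 10% month over month"

# Predictive (naive): assume last month's growth rate continues
forecast = sales["August"] * (1 + growth)
print(f"September forecast: {forecast:.0f}")
```

Diagnostic and prescriptive analytics would then ask, respectively, why the growth occurred and what action to take in response.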

Data analytics underpins many quality control systems in the financial world, including the ever-popular Six Sigma program. It's nearly impossible to optimize something if you aren’t properly measuring it, whether it's your weight or the number of defects per million in a production line.

The sectors that have adopted the use of data analytics include the travel and hospitality industry where turnarounds can be quick. This industry can collect customer data and figure out where problems, if any, lie and how to fix them.

Healthcare combines the use of high volumes of structured and unstructured data and uses data analytics to make quick decisions. Similarly, the retail industry uses copious amounts of data to meet the ever-changing demands of shoppers. The information that retailers collect and analyze can help them identify trends, recommend products, and increase profits. 

The average total pay for a data analyst in the United States was just over $90,000 in April 2024. Although data analytics doesn't have a separate listing in the Bureau of Labor Statistics' (BLS) handbook, the responsibilities fall under the category of data scientist. The agency estimates that as many as 59,400 jobs will be created in this field between 2022 and 2032, a growth rate of 35%, which is much faster than average.

Data Analytics Techniques

Data analysts can use several analytical methods and techniques to process data and extract information. Some of the most popular methods include:

  • Regression Analysis: This entails analyzing the relationship between one or more independent variables and a dependent variable. The independent variables are used to explain the dependent variable, showing how changes in the independent variables influence the dependent variable.
  • Factor Analysis: This entails taking a complex dataset with many variables and reducing the variables to a small number. The goal of this maneuver is to attempt to discover hidden trends that would otherwise have been more difficult to see.
  • Cohort Analysis: This is the process of breaking a dataset into groups of similar data, often into a customer demographic. This allows data analysts and other users of data analytics to dive further into the numbers relating to a specific subset of data.
  • Monte Carlo Simulations: These model the probability of different outcomes happening. They're often used for risk mitigation and loss prevention. These simulations incorporate multiple values and variables and often have greater forecasting capabilities than other data analytics approaches.
  • Time Series Analysis: This tracks data over time and solidifies the relationship between the value of a data point and its time of occurrence. This technique is usually used to spot cyclical trends or to project financial forecasts.
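A Monte Carlo simulation can be sketched with the standard library alone. The demand distribution, capacity figure, and function name below are assumptions made up for the example:

```python
import random

# Monte Carlo sketch: estimate the probability that total demand over
# 30 days exceeds a fixed capacity, by simulating many possible months.
def prob_demand_exceeds(capacity, days=30, trials=10_000, seed=42):
    rng = random.Random(seed)  # seeded for reproducibility
    exceed = 0
    for _ in range(trials):
        # assume daily demand is roughly normal: mean 100, std dev 15
        total = sum(rng.gauss(100, 15) for _ in range(days))
        if total > capacity:
            exceed += 1
    return exceed / trials

print(prob_demand_exceeds(capacity=3100))
```

Expected total demand is 3,000 units, so the simulation estimates how often random variation pushes a month past 3,100; the estimate sharpens as the number of trials grows.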

Data Analytics Tools

Data analytics has rapidly evolved in technological capabilities in addition to a broad range of mathematical and statistical approaches to crunching numbers. Data analysts have a broad range of software tools to help acquire data, store information, process data, and report findings.

Data analytics has always had loose ties to spreadsheets and Microsoft Excel. Data analysts also often interact with raw programming languages to transform and manipulate databases.

Data analysts also have help when reporting or communicating findings. Both Tableau and Power BI are data visualization and analysis tools used to compile information, perform data analytics, and distribute results via dashboards and reports.

Other tools are also emerging to assist data analysts. SAS is an analytics platform that can assist with data mining . Apache Spark is an open-source platform useful for processing large sets of data. Data analysts have a broad range of technological capabilities to further enhance the value they deliver to their company.

The Role of Data Analytics

Data analytics can enhance operations, efficiency, and performance in numerous industries by shining a spotlight on patterns. Implementing these techniques can give companies and businesses a competitive edge. Let's take a look at the process of data analysis divided into four basic steps.

Gathering Data

As the name suggests, this step involves collecting or gathering data and information from across a broad spectrum of sources. Various forms of information are then recreated into the same format so they can eventually be analyzed. The process can take a good bit of time, more than any other step.

Data Management

Data requires a database to contain, manage, and provide access to the information that has been gathered. The next step in data analytics is therefore the creation of such a database to manage the information.

While some people or organizations may store data in Microsoft Excel spreadsheets, Excel is limited for this purpose and is better suited to basic analysis and calculations, such as in finance. Relational databases are a much better option than Excel for data storage. They allow for the storage of much greater volumes of data and for efficient access, and the relational structure allows tables to be used together easily. Structured Query Language, known by its initials SQL, is the computer language used to work on and query relational databases. Developed in the 1970s, SQL allows for easy interaction with relational databases, enabling datasets to be queried, built, and analyzed.
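A minimal illustration of querying a relational database with SQL, using Python's built-in sqlite3 module; the table and rows are hypothetical:

```python
import sqlite3

# Create an in-memory relational database and a small sales table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("North", 1200.0), ("South", 800.0), ("North", 300.0)],
)

# Query the data with SQL: total sales per region.
rows = conn.execute(
    "SELECT region, SUM(amount) FROM sales GROUP BY region ORDER BY region"
).fetchall()
print(rows)  # [('North', 1500.0), ('South', 800.0)]
conn.close()
```

The same `SELECT ... GROUP BY` query would work unchanged against a full-scale relational database such as PostgreSQL or MySQL.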

Statistical Analysis

The third step is statistical analysis. It involves the interpretation of the gathered and stored data into models that will hopefully reveal trends that can be used to interpret future data. This is achieved through open-source programming languages such as Python. More specific tools for data analytics, like R, can be used for statistical analysis or graphical modeling.
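As a sketch of this step, a simple linear trend can be fitted with ordinary least squares using only the Python standard library; the quarterly revenue figures are invented for illustration:

```python
# Minimal statistical-analysis sketch: fit a straight line y = slope*x + b
# to observed data with ordinary least squares, then read off the trend.
def fit_line(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, my - slope * mx  # (slope, intercept)

quarters = [1, 2, 3, 4]
revenue = [10.0, 12.0, 13.0, 15.0]   # hypothetical, in $ millions
slope, intercept = fit_line(quarters, revenue)
print(f"trend: +{slope:.2f} per quarter")  # trend: +1.60 per quarter
```

In practice this would be done with a statistics library (e.g. in Python or R), which also reports goodness-of-fit and confidence intervals, but the underlying model is the same.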

Data Presentation

The results of the data analytics process are meant to be shared. The final step is formatting the data so it’s accessible to and understandable by others, particularly those individuals within a company who are responsible for growth, analysis, efficiency, and operations. Having access can be beneficial to shareholders as well.  

Importance and Uses of Data Analytics

Data analytics is a critical component of a business’s chances of success. Gathering, sorting, analyzing, and presenting information can significantly enhance and benefit society, particularly in fields such as healthcare and crime prevention. But the uses of data analytics can be equally beneficial for small enterprises and startups looking for an edge over the business next door, albeit on a smaller scale.

Why Is Data Analytics Important?

Implementing data analytics into the business model means companies can help reduce costs by identifying more efficient ways of doing business. A company can also use data analytics to make better business decisions.

What Are the 4 Types of Data Analytics?

Data analytics is broken down into four basic types. Descriptive analytics describes what has happened over a given period. Diagnostic analytics focuses more on why something happened. Predictive analytics moves to what is likely going to happen in the near term. Finally, prescriptive analytics suggests a course of action.

Who Uses Data Analytics?

Data analytics has been adopted by several sectors where turnarounds can be quick, such as the travel and hospitality industry. Healthcare is another sector that combines the use of high volumes of structured and unstructured data, and data analytics can help in making quick decisions. The retail industry also uses large amounts of data to meet the ever-changing demands of shoppers.

Data analytics helps individuals and organizations make sense of their data in a world that's becoming increasingly reliant on information and gathered statistics. A set of raw numbers can be transformed using a variety of tools and techniques, resulting in informative, educational insights that drive decision-making and thoughtful management.

Glassdoor. " Data Analyst Salaries ."

U.S. Bureau of Labor Statistics. " Data Scientists ."

Oracle. " History of SQL ."


Research Method


Research Techniques – Methods, Types and Examples


Definition:

Research techniques refer to the various methods, processes, and tools used to collect, analyze, and interpret data for the purpose of answering research questions or testing hypotheses.

Methods of Research Techniques

The methods of research techniques refer to the overall approaches or frameworks that guide a research study, including the theoretical perspective, research design, sampling strategy, data collection and analysis techniques, and ethical considerations. Some common methods of research techniques are:

  • Quantitative research: This is a research method that focuses on collecting and analyzing numerical data to establish patterns, trends, and cause-and-effect relationships. Examples of quantitative research techniques are surveys, experiments, and statistical analysis.
  • Qualitative research: This is a research method that focuses on collecting and analyzing non-numerical data, such as text, images, and videos, to gain insights into the subjective experiences and perspectives of the participants. Examples of qualitative research techniques are interviews, focus groups, and content analysis.
  • Mixed-methods research: This is a research method that combines quantitative and qualitative research techniques to provide a more comprehensive understanding of a research question. Examples of mixed-methods research techniques are surveys with open-ended questions and case studies with statistical analysis.
  • Action research: This is a research method that focuses on solving real-world problems by collaborating with stakeholders and using a cyclical process of planning, action, and reflection. Examples of action research techniques are participatory action research and community-based participatory research.
  • Experimental research : This is a research method that involves manipulating one or more variables to observe the effect on an outcome, to establish cause-and-effect relationships. Examples of experimental research techniques are randomized controlled trials and quasi-experimental designs.
  • Observational research: This is a research method that involves observing and recording behavior or phenomena in natural settings to gain insights into the subject of study. Examples of observational research techniques are naturalistic observation and structured observation.

Types of Research Techniques

There are several types of research techniques used in various fields. Some of the most common ones are:

  • Surveys: This is a quantitative research technique that involves collecting data through questionnaires or interviews to gather information from a large group of people.
  • Experiments: This is a scientific research technique that involves manipulating one or more variables to observe the effect on an outcome, to establish cause-and-effect relationships.
  • Case studies: This is a qualitative research technique that involves in-depth analysis of a single case, such as an individual, group, or event, to understand the complexities of the case.
  • Observational studies: This is a research technique that involves observing and recording behavior or phenomena in natural settings to gain insights into the subject of study.
  • Content analysis: This is a research technique used to analyze text or other media content to identify patterns, themes, or meanings.
  • Focus groups: This is a research technique that involves gathering a small group of people to discuss a topic or issue and provide feedback on a product or service.
  • Meta-analysis: This is a statistical research technique that involves combining data from multiple studies to assess the overall effect of a treatment or intervention.
  • Action research: This is a research technique used to solve real-world problems by collaborating with stakeholders and using a cyclical process of planning, action, and reflection.
  • Interviews: These can be conducted in person or over the phone and are often used to gather in-depth information about an individual’s experiences or opinions. For example, a researcher might conduct interviews with cancer patients to learn more about their experiences with treatment.

Example of Research Techniques

Here’s an example of how research techniques might be used by a student conducting a research project:

Let’s say a high school student is interested in investigating the impact of social media on mental health. They could use a variety of research techniques to gather data and analyze their findings, including:

  • Literature review: The student could conduct a literature review to gather existing research studies, articles, and books that discuss the relationship between social media and mental health. This will provide a foundation of knowledge on the topic and help the student identify gaps in the research that they could address.
  • Surveys: The student could design and distribute a survey to gather information from a sample of individuals about their social media usage and how it affects their mental health. The survey could include questions about the frequency of social media use, the types of content consumed, and how it makes them feel.
  • Interviews: The student could conduct interviews with individuals who have experienced mental health issues and ask them about their social media use and how it has impacted their mental health. This could provide a more in-depth understanding of how social media affects people on an individual level.
  • Data analysis: The student could use statistical software to analyze the data collected from the surveys and interviews. This would allow them to identify patterns and relationships between social media usage and mental health outcomes.
  • Report writing: Based on the findings from their research, the student could write a report that summarizes their research methods, findings, and conclusions. They could present their report to their peers or their teacher to share their insights on the topic.

Overall, by using a combination of research techniques, the student can investigate their research question thoroughly and systematically, and make meaningful contributions to the field of social media and mental health research.
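The data-analysis step of this hypothetical project could be sketched, for example, as a Pearson correlation between daily social media hours and a self-reported wellbeing score; all names and values below are invented:

```python
import math

# Pearson correlation between two variables from the (invented) survey:
# daily social media hours and a 1-10 self-reported wellbeing score.
def pearson(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

hours     = [1.0, 2.5, 4.0, 5.5, 7.0]
wellbeing = [8.0, 7.0, 6.0, 5.0, 3.0]
r = pearson(hours, wellbeing)
print(f"r = {r:.2f}")  # a strongly negative r suggests an inverse association
```

A correlation alone does not establish causation, which is why the student would combine this quantitative result with the interview findings before drawing conclusions.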

Purpose of Research Techniques

The Purposes of Research Techniques are as follows:

  • To investigate and gain knowledge about a particular phenomenon or topic
  • To generate new ideas and theories
  • To test existing theories and hypotheses
  • To identify and evaluate potential solutions to problems
  • To gather data and evidence to inform decision-making
  • To identify trends and patterns in data
  • To explore cause-and-effect relationships between variables
  • To develop and refine measurement tools and methodologies
  • To establish the reliability and validity of research findings
  • To communicate research findings to others in a clear and concise manner.

Applications of Research Techniques

Here are some applications of research techniques:

  • Scientific research: to explore, investigate and understand natural phenomena, and to generate new knowledge and theories.
  • Market research: to collect and analyze data about consumer behavior, preferences, and trends, and to help businesses make informed decisions about product development, pricing, and marketing strategies.
  • Medical research: to study diseases and their treatments, and to develop new medicines, therapies, and medical technologies.
  • Social research: to explore and understand human behavior, attitudes, and values, and to inform public policy decisions related to education, health care, social welfare, and other areas.
  • Educational research: to study teaching and learning processes, and to develop effective teaching methods and instructional materials.
  • Environmental research: to investigate the impact of human activities on the environment, and to develop solutions to environmental problems.
  • Engineering research: to design, develop, and improve products, processes, and systems, and to optimize their performance and efficiency.
  • Criminal justice research: to study crime patterns, causes, and prevention strategies, and to evaluate the effectiveness of criminal justice policies and programs.
  • Psychological research: to investigate human cognition, emotion, and behavior, and to develop interventions to address mental health issues.
  • Historical research: to study past events, societies, and cultures, and to develop an understanding of how they shape our present.

About the author


Muhammad Hassan

Researcher, Academic Writer, Web developer



Neurol Res Pract


How to use and assess qualitative research methods

Loraine Busetto

1 Department of Neurology, Heidelberg University Hospital, Im Neuenheimer Feld 400, 69120 Heidelberg, Germany

Wolfgang Wick

2 Clinical Cooperation Unit Neuro-Oncology, German Cancer Research Center, Heidelberg, Germany

Christoph Gumbinger

Associated data.

Not applicable.

This paper aims to provide an overview of the use and assessment of qualitative research methods in the health sciences. Qualitative research can be defined as the study of the nature of phenomena and is especially appropriate for answering questions of why something is (not) observed, assessing complex multi-component interventions, and focussing on intervention improvement. The most common methods of data collection are document study, (non-) participant observations, semi-structured interviews and focus groups. For data analysis, field-notes and audio-recordings are transcribed into protocols and transcripts, and coded using qualitative data management software. Criteria such as checklists, reflexivity, sampling strategies, piloting, co-coding, member-checking and stakeholder involvement can be used to enhance and assess the quality of the research conducted. Using qualitative in addition to quantitative designs will equip us with better tools to address a greater range of research problems, and to fill in blind spots in current neurological research and practice.

The aim of this paper is to provide an overview of qualitative research methods, including hands-on information on how they can be used, reported and assessed. This article is intended for beginning qualitative researchers in the health sciences as well as experienced quantitative researchers who wish to broaden their understanding of qualitative research.

What is qualitative research?

Qualitative research is defined as “the study of the nature of phenomena”, including “their quality, different manifestations, the context in which they appear or the perspectives from which they can be perceived” , but excluding “their range, frequency and place in an objectively determined chain of cause and effect” [ 1 ]. This formal definition can be complemented with a more pragmatic rule of thumb: qualitative research generally includes data in form of words rather than numbers [ 2 ].

Why conduct qualitative research?

Because some research questions cannot be answered using (only) quantitative methods. For example, one Australian study addressed the issue of why patients from Aboriginal communities often present late or not at all to specialist services offered by tertiary care hospitals. Using qualitative interviews with patients and staff, it found one of the most significant access barriers to be transportation problems, including some towns and communities simply not having a bus service to the hospital [ 3 ]. A quantitative study could have measured the number of patients over time or even looked at possible explanatory factors – but only those previously known or suspected to be of relevance. To discover reasons for observed patterns, especially the invisible or surprising ones, qualitative designs are needed.

While qualitative research is common in other fields, it is still relatively underrepresented in health services research. The latter field is more traditionally rooted in the evidence-based-medicine paradigm, as seen in " research that involves testing the effectiveness of various strategies to achieve changes in clinical practice, preferably applying randomised controlled trial study designs (...) " [ 4 ]. This focus on quantitative research and specifically randomised controlled trials (RCT) is visible in the idea of a hierarchy of research evidence which assumes that some research designs are objectively better than others, and that choosing a "lesser" design is only acceptable when the better ones are not practically or ethically feasible [ 5 , 6 ]. Others, however, argue that an objective hierarchy does not exist, and that, instead, the research design and methods should be chosen to fit the specific research question at hand – "questions before methods" [ 2 , 7 – 9 ]. This means that even when an RCT is possible, some research problems require a different design that is better suited to addressing them. Arguing in JAMA, Berwick uses the example of rapid response teams in hospitals, which he describes as " a complex, multicomponent intervention – essentially a process of social change" susceptible to a range of different context factors including leadership or organisation history. According to him, "[in] such complex terrain, the RCT is an impoverished way to learn. Critics who use it as a truth standard in this context are incorrect" [ 8 ] . Instead of limiting oneself to RCTs, Berwick recommends embracing a wider range of methods , including qualitative ones, which for "these specific applications, (...) are not compromises in learning how to improve; they are superior" [ 8 ].

Research problems that can be approached particularly well using qualitative methods include assessing complex multi-component interventions or systems (of change), addressing questions beyond “what works”, towards “what works for whom when, how and why”, and focussing on intervention improvement rather than accreditation [ 7 , 9 – 12 ]. Using qualitative methods can also help shed light on the “softer” side of medical treatment. For example, while quantitative trials can measure the costs and benefits of neuro-oncological treatment in terms of survival rates or adverse effects, qualitative research can help provide a better understanding of patient or caregiver stress, visibility of illness or out-of-pocket expenses.

How to conduct qualitative research?

Given that qualitative research is characterised by flexibility, openness and responsivity to context, the steps of data collection and analysis are not as separate and consecutive as they tend to be in quantitative research [ 13 , 14 ]. As Fossey puts it: “sampling, data collection, analysis and interpretation are related to each other in a cyclical (iterative) manner, rather than following one after another in a stepwise approach” [ 15 ]. The researcher can make educated decisions with regard to the choice of method, how they are implemented, and to which and how many units they are applied [ 13 ]. As shown in Fig. 1, this can involve several back-and-forth steps between data collection and analysis where new insights and experiences can lead to adaptation and expansion of the original plan. Some insights may also necessitate a revision of the research question and/or the research design as a whole. The process ends when saturation is achieved, i.e. when no relevant new information can be found (see also below: sampling and saturation). For reasons of transparency, it is essential for all decisions as well as the underlying reasoning to be well-documented.

Fig. 1: Iterative research process

While it is not always explicitly addressed, qualitative methods reflect a different underlying research paradigm than quantitative research (e.g. constructivism or interpretivism as opposed to positivism). The choice of methods can be based on the respective underlying substantive theory or theoretical framework used by the researcher [ 2 ].

Data collection

The methods of qualitative data collection most commonly used in health research are document study, observations, semi-structured interviews and focus groups [ 1 , 14 , 16 , 17 ].

Document study

Document study (also called document analysis) refers to the review by the researcher of written materials [ 14 ]. These can include personal and non-personal documents such as archives, annual reports, guidelines, policy documents, diaries or letters.

Observations

Observations are particularly useful to gain insights into a certain setting and actual behaviour – as opposed to reported behaviour or opinions [ 13 ]. Qualitative observations can be either participant or non-participant in nature. In participant observations, the observer is part of the observed setting, for example a nurse working in an intensive care unit [ 18 ]. In non-participant observations, the observer is “on the outside looking in”, i.e. present in but not part of the situation, trying not to influence the setting by their presence. Observations can be planned (e.g. for 3 h during the day or night shift) or ad hoc (e.g. as soon as a stroke patient arrives at the emergency room). During the observation, the observer takes notes on everything or certain pre-determined parts of what is happening around them, for example focusing on physician-patient interactions or communication between different professional groups. Written notes can be taken during or after the observations, depending on feasibility (which is usually lower during participant observations) and acceptability (e.g. when the observer is perceived to be judging the observed). Afterwards, these field notes are transcribed into observation protocols. If more than one observer was involved, field notes are taken independently, but notes can be consolidated into one protocol after discussions. Advantages of conducting observations include minimising the distance between the researcher and the researched, the potential discovery of topics that the researcher did not realise were relevant and gaining deeper insights into the real-world dimensions of the research problem at hand [ 18 ].

Semi-structured interviews

Hijmans & Kuyper describe qualitative interviews as “an exchange with an informal character, a conversation with a goal” [ 19 ]. Interviews are used to gain insights into a person’s subjective experiences, opinions and motivations – as opposed to facts or behaviours [ 13 ]. Interviews can be distinguished by the degree to which they are structured (i.e. a questionnaire), open (e.g. free conversation or autobiographical interviews) or semi-structured [ 2 , 13 ]. Semi-structured interviews are characterized by open-ended questions and the use of an interview guide (or topic guide/list) in which the broad areas of interest, sometimes including sub-questions, are defined [ 19 ]. The pre-defined topics in the interview guide can be derived from the literature, previous research or a preliminary method of data collection, e.g. document study or observations. The topic list is usually adapted and improved at the start of the data collection process as the interviewer learns more about the field [ 20 ]. Across interviews the focus on the different (blocks of) questions may differ and some questions may be skipped altogether (e.g. if the interviewee is not able or willing to answer the questions or for concerns about the total length of the interview) [ 20 ]. Qualitative interviews are usually not conducted in written format as this impedes the interactive component of the method [ 20 ]. In comparison to written surveys, qualitative interviews have the advantage of being interactive and allowing for unexpected topics to emerge and to be taken up by the researcher. This can also help overcome a provider- or researcher-centred bias often found in written surveys, which by nature can only measure what is already known or expected to be of relevance to the researcher. Interviews can be audio- or video-taped, but sometimes it is only feasible or acceptable for the interviewer to take written notes [ 14 , 16 , 20 ].

Focus groups

Focus groups are group interviews to explore participants’ expertise and experiences, including explorations of how and why people behave in certain ways [ 1 ]. Focus groups usually consist of 6–8 people and are led by an experienced moderator following a topic guide or “script” [ 21 ]. They can involve an observer who takes note of the non-verbal aspects of the situation, possibly using an observation guide [ 21 ]. Depending on researchers’ and participants’ preferences, the discussions can be audio- or video-taped and transcribed afterwards [ 21 ]. Focus groups are useful for bringing together homogeneous (to a lesser extent heterogeneous) groups of participants with relevant expertise and experience on a given topic on which they can share detailed information [ 21 ]. Focus groups are a relatively easy, fast and inexpensive method to gain access to information on interactions in a given group, i.e. “the sharing and comparing” among participants [ 21 ]. Disadvantages include less control over the process and a lesser extent to which each individual may participate. Moreover, focus group moderators need experience, as do those tasked with the analysis of the resulting data. Focus groups can be less appropriate for discussing sensitive topics that participants might be reluctant to disclose in a group setting [ 13 ]. Moreover, attention must be paid to the emergence of “groupthink” as well as possible power dynamics within the group, e.g. when patients are awed or intimidated by health professionals.

Choosing the “right” method

As explained above, the school of thought underlying qualitative research assumes no objective hierarchy of evidence and methods. This means that each choice of single or combined methods has to be based on the research question that needs to be answered and a critical assessment with regard to whether or to what extent the chosen method can accomplish this – i.e. the “fit” between question and method [ 14 ]. It is necessary for these decisions to be documented when they are being made, and to be critically discussed when reporting methods and results.

Let us assume that our research aim is to examine the (clinical) processes around acute endovascular treatment (EVT), from the patient’s arrival at the emergency room to recanalization, with the aim to identify possible causes for delay and/or other causes for sub-optimal treatment outcome. As a first step, we could conduct a document study of the relevant standard operating procedures (SOPs) for this phase of care – are they up-to-date and in line with current guidelines? Do they contain any mistakes, irregularities or uncertainties that could cause delays or other problems? Regardless of the answers to these questions, the results have to be interpreted based on what they are: a written outline of what care processes in this hospital should look like. If we want to know what they actually look like in practice, we can conduct observations of the processes described in the SOPs. These results can (and should) be analysed in themselves, but also in comparison to the results of the document analysis, especially as regards relevant discrepancies. Do the SOPs outline specific tests for which no equipment can be observed or tasks to be performed by specialized nurses who are not present during the observation? It might also be possible that the written SOP is outdated, but the actual care provided is in line with current best practice. In order to find out why these discrepancies exist, it can be useful to conduct interviews. Are the physicians simply not aware of the SOPs (because their existence is limited to the hospital’s intranet) or do they actively disagree with them or does the infrastructure make it impossible to provide the care as described? Another rationale for adding interviews is that some situations (or all of their possible variations for different patient groups or the day, night or weekend shift) cannot practically or ethically be observed. 
In this case, it is possible to ask those involved to report on their actions – being aware that this is not the same as the actual observation. A senior physician’s or hospital manager’s description of certain situations might differ from a nurse’s or junior physician’s one, maybe because they intentionally misrepresent facts or maybe because different aspects of the process are visible or important to them. In some cases, it can also be relevant to consider to whom the interviewee is disclosing this information – someone they trust, someone they are otherwise not connected to, or someone they suspect or are aware of being in a potentially “dangerous” power relationship to them. Lastly, a focus group could be conducted with representatives of the relevant professional groups to explore how and why exactly they provide care around EVT. The discussion might reveal discrepancies (between SOPs and actual care or between different physicians) and motivations to the researchers as well as to the focus group members that they might not have been aware of themselves. For the focus group to deliver relevant information, attention has to be paid to its composition and conduct, for example, to make sure that all participants feel safe to disclose sensitive or potentially problematic information or that the discussion is not dominated by (senior) physicians only. The resulting combination of data collection methods is shown in Fig.  2 .

Fig. 2: Possible combination of data collection methods

Attributions for icons: “Book” by Serhii Smirnov, “Interview” by Adrien Coquet, FR, “Magnifying Glass” by anggun, ID, “Business communication” by Vectors Market; all from the Noun Project

The combination of multiple data sources as described for this example can be referred to as “triangulation”, in which multiple measurements are carried out from different angles to achieve a more comprehensive understanding of the phenomenon under study [ 22 , 23 ].

Data analysis

To analyse the data collected through observations, interviews and focus groups, these need to be transcribed into protocols and transcripts (see Fig.  3 ). Interviews and focus groups can be transcribed verbatim, with or without annotations for behaviour (e.g. laughing, crying, pausing) and with or without phonetic transcription of dialects and filler words, depending on what is expected or known to be relevant for the analysis. In the next step, the protocols and transcripts are coded, that is, marked (or tagged, labelled) with one or more short descriptors of the content of a sentence or paragraph [ 2 , 15 , 23 ]. Jansen describes coding as “connecting the raw data with “theoretical” terms” [ 20 ]. In a more practical sense, coding makes raw data sortable. This makes it possible to extract and examine all segments describing, say, a tele-neurology consultation from multiple data sources (e.g. SOPs, emergency room observations, staff and patient interviews). In a process of synthesis and abstraction, the codes are then grouped, summarised and/or categorised [ 15 , 20 ]. The end product of the coding or analysis process is a descriptive theory of the behavioural pattern under investigation [ 20 ]. The coding process is performed using qualitative data management software, the most common ones being NVivo, MAXQDA and Atlas.ti. It should be noted that these are data management tools which support the analysis performed by the researcher(s) [ 14 ].
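As a concrete illustration of how coding makes raw data sortable, the following sketch tags text segments from different (invented) data sources with short codes and retrieves all segments for a given code. The sources, segments and codes are hypothetical; in practice this is done in data management software such as those named above.

```python
# Hypothetical coded segments: (source, text, set of codes).
coded = [
    ("interview_01", "The tele-neurology consult took 20 minutes to set up.",
     {"tele-neurology", "delay"}),
    ("observation_ER", "Nurse called the stroke team before CT was done.",
     {"communication", "process"}),
    ("interview_02", "We rarely check the intranet for updated SOPs.",
     {"SOP awareness"}),
]

def extract(code):
    """Return all segments tagged with a given code, across data sources."""
    return [(src, text) for src, text, codes in coded if code in codes]

print(extract("delay"))  # all segments about delays, whatever the source
```

Sorting by code rather than by source is what allows one topic (here, delays) to be examined across SOPs, observation protocols and interview transcripts at once.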

Fig. 3: From data collection to data analysis

Attributions for icons: see Fig. 2; also “Speech to text” by Trevor Dsouza, “Field Notes” by Mike O’Brien, US, “Voice Record” by ProSymbols, US, “Inspection” by Made, AU, and “Cloud” by Graphic Tigers; all from the Noun Project

How to report qualitative research?

Protocols of qualitative research can be published separately and in advance of the study results. However, the aim is not the same as in RCT protocols, i.e. to pre-define and set in stone the research questions and primary or secondary endpoints. Rather, it is a way to describe the research methods in detail, which might not be possible in the results paper given journals’ word limits. Qualitative research papers are usually longer than their quantitative counterparts to allow for deep understanding and so-called “thick description”. In the methods section, the focus is on transparency of the methods used, including why, how and by whom they were implemented in the specific study setting, so as to enable a discussion of whether and how this may have influenced data collection, analysis and interpretation. The results section usually starts with a paragraph outlining the main findings, followed by more detailed descriptions of, for example, the commonalities, discrepancies or exceptions per category [ 20 ]. Here it is important to support main findings by relevant quotations, which may add information, context, emphasis or real-life examples [ 20 , 23 ]. It is subject to debate in the field whether it is relevant to state the exact number or percentage of respondents supporting a certain statement (e.g. “Five interviewees expressed negative feelings towards XYZ”) [ 21 ].

How to combine qualitative with quantitative research?

Qualitative methods can be combined with other methods in multi- or mixed methods designs, which “[employ] two or more different methods […] within the same study or research program rather than confining the research to one single method” [ 24 ]. Reasons for combining methods can be diverse, including triangulation for corroboration of findings, complementarity for illustration and clarification of results, expansion to extend the breadth and range of the study, explanation of (unexpected) results generated with one method with the help of another, or offsetting the weakness of one method with the strength of another [ 1 , 17 , 24 – 26 ]. The resulting designs can be classified according to when, why and how the different quantitative and/or qualitative data strands are combined. The three most common types of mixed method designs are the convergent parallel design, the explanatory sequential design and the exploratory sequential design. The designs with examples are shown in Fig.  4 .

Fig. 4: Three common mixed methods designs

In the convergent parallel design, a qualitative study is conducted in parallel to and independently of a quantitative study, and the results of both studies are compared and combined at the stage of interpretation of results. Using the above example of EVT provision, this could entail setting up a quantitative EVT registry to measure process times and patient outcomes in parallel to conducting the qualitative research outlined above, and then comparing results. Amongst other things, this would make it possible to assess whether interview respondents’ subjective impressions of patients receiving good care match modified Rankin Scores at follow-up, or whether observed delays in care provision are exceptions or the rule when compared to door-to-needle times as documented in the registry. In the explanatory sequential design, a quantitative study is carried out first, followed by a qualitative study to help explain the results from the quantitative study. This would be an appropriate design if the registry alone had revealed relevant delays in door-to-needle times and the qualitative study would be used to understand where and why these occurred, and how they could be improved. In the exploratory design, the qualitative study is carried out first and its results help inform and build the quantitative study in the next step [ 26 ]. If the qualitative study around EVT provision had shown a high level of dissatisfaction among the staff members involved, a quantitative questionnaire investigating staff satisfaction could be set up in the next step, informed by the qualitative findings on which topics dissatisfaction had been expressed. Amongst other things, the questionnaire design would make it possible to widen the reach of the research to more respondents from different (types of) hospitals, regions, countries or settings, and to conduct sub-group analyses for different professional groups.
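The explanatory sequential logic can be sketched in a few lines of code: a quantitative step flags registry cases with relevant delays, and those cases then define the focus of the qualitative follow-up. The registry data and the 60-minute target below are invented for illustration.

```python
# Step 1 (quantitative): a hypothetical registry of door-to-needle times.
registry = [  # (case_id, door_to_needle_minutes)
    ("A", 38), ("B", 95), ("C", 41), ("D", 120), ("E", 55),
]
TARGET = 60  # assumed guideline target in minutes

delayed = [case for case, minutes in registry if minutes > TARGET]

# Step 2 (qualitative): the flagged cases focus the interview guide.
interview_guide = [
    f"Case {case}: walk me through what happened between arrival and needle."
    for case in delayed
]

print(delayed)  # cases whose delays the interviews should explain
```

The point of the sequence is that the qualitative step is designed only after, and in response to, the quantitative result.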

How to assess qualitative research?

A variety of assessment criteria and lists have been developed for qualitative research, ranging in their focus and comprehensiveness [ 14 , 17 , 27 ]. However, none of these has been elevated to the “gold standard” in the field. In the following, we therefore focus on a set of commonly used assessment criteria that, from a practical standpoint, a researcher can look for when assessing a qualitative research report or paper.

Assessors should check the authors’ use of and adherence to the relevant reporting checklists (e.g. Standards for Reporting Qualitative Research (SRQR)) to make sure all items that are relevant for this type of research are addressed [ 23 , 28 ]. Discussions of quantitative measures in addition to or instead of these qualitative measures can be a sign of lower quality of the research (paper). Providing and adhering to a checklist for qualitative research contributes to an important quality criterion for qualitative research, namely transparency [ 15 , 17 , 23 ].

Reflexivity

While methodological transparency and complete reporting are relevant for all types of research, some additional criteria must be taken into account for qualitative research. This includes what is called reflexivity, i.e. sensitivity to the relationship between the researcher and the researched, including how contact was established and maintained, or the background and experience of the researcher(s) involved in data collection and analysis. Depending on the research question and population to be researched this can be limited to professional experience, but it may also include gender, age or ethnicity [ 17 , 27 ]. These details are relevant because in qualitative research, as opposed to quantitative research, the researcher as a person cannot be isolated from the research process [ 23 ]. It may influence the conversation when an interviewed patient speaks to an interviewer who is a physician, or when an interviewee is asked to discuss a gynaecological procedure with a male interviewer, and therefore the reader must be made aware of these details [ 19 ].

Sampling and saturation

The aim of qualitative sampling is for all variants of the objects of observation that are deemed relevant for the study to be present in the sample, “to see the issue and its meanings from as many angles as possible” [ 1 , 16 , 19 , 20 , 27 ], and to ensure “information-richness” [ 15 ]. An iterative sampling approach is advised, in which data collection (e.g. five interviews) is followed by data analysis, followed by more data collection to find variants that are lacking in the current sample. This process continues until no new (relevant) information can be found and further sampling becomes redundant – which is called saturation [ 1 , 15 ]. In other words: qualitative data collection finds its end point not a priori, but when the research team determines that saturation has been reached [ 29 , 30 ].
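A schematic sketch of this iterative collect-analyse-collect cycle, with invented interview codes, might look as follows: sampling stops in the first round that contributes no new codes.

```python
# Codes found in successive (invented) batches of interviews.
batches = [
    [{"delay", "SOP awareness"}, {"communication"}],  # batch 1
    [{"delay"}, {"staffing"}],                        # batch 2: one new code
    [{"communication"}, {"delay"}],                   # batch 3: nothing new
]

known_codes, rounds = set(), 0
for batch in batches:
    rounds += 1
    new = set().union(*batch) - known_codes  # codes not seen before
    known_codes |= new
    if not new:  # saturation: further sampling is redundant
        break

print(rounds, sorted(known_codes))
```

In a real study the stopping decision is a judgement call about relevance, not a mechanical check, but the loop structure (collect, analyse, compare, decide) is the same.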

This is also the reason why most qualitative studies use deliberate instead of random sampling strategies. This is generally referred to as “purposive sampling”, in which researchers pre-define which types of participants or cases they need to include so as to cover all variations that are expected to be of relevance, based on the literature, previous experience or theory (i.e. theoretical sampling) [ 14 , 20 ]. Other types of purposive sampling include (but are not limited to) maximum variation sampling, critical case sampling or extreme or deviant case sampling [ 2 ]. In the above EVT example, a purposive sample could include all relevant professional groups and/or all relevant stakeholders (patients, relatives) and/or all relevant times of observation (day, night and weekend shift).

Assessors of qualitative research should check whether the considerations underlying the sampling strategy were sound and whether or how researchers tried to adapt and improve their strategies in stepwise or cyclical approaches between data collection and analysis to achieve saturation [ 14 ].

Good qualitative research is iterative in nature, i.e. it goes back and forth between data collection and analysis, revising and improving the approach where necessary. One example of this are pilot interviews, where different aspects of the interview (especially the interview guide, but also, for example, the site of the interview or whether the interview can be audio-recorded) are tested with a small number of respondents, evaluated and revised [ 19 ]. In doing so, the interviewer learns which wording or types of questions work best, or which is the best length of an interview with patients who have trouble concentrating for an extended time. Of course, the same reasoning applies to observations or focus groups which can also be piloted.

Ideally, coding should be performed by at least two researchers, especially at the beginning of the coding process when a common approach must be defined, including the establishment of a useful coding list (or tree), and when a common meaning of individual codes must be established [ 23 ]. An initial sub-set or all transcripts can be coded independently by the coders and then compared and consolidated after regular discussions in the research team. This is to make sure that codes are applied consistently to the research data.

Member checking

Member checking, also called respondent validation, refers to the practice of checking back with study respondents to see if the research is in line with their views [ 14 , 27 ]. This can happen after data collection or analysis or when first results are available [ 23 ]. For example, interviewees can be provided with (summaries of) their transcripts and asked whether they believe this to be a complete representation of their views or whether they would like to clarify or elaborate on their responses [ 17 ]. Respondents’ feedback on these issues then becomes part of the data collection and analysis [ 27 ].

Stakeholder involvement

In those niches where qualitative approaches have been able to evolve and grow, a new trend has seen the inclusion of patients and their representatives not only as study participants (i.e. “members”, see above) but as consultants to and active participants in the broader research process [ 31 – 33 ]. The underlying assumption is that patients and other stakeholders hold unique perspectives and experiences that add value beyond their own single story, making the research more relevant and beneficial to researchers, study participants and (future) patients alike [ 34 , 35 ]. Using the example of patients on or nearing dialysis, a recent scoping review found that 80% of clinical research did not address the top 10 research priorities identified by patients and caregivers [ 32 , 36 ]. In this sense, the involvement of the relevant stakeholders, especially patients and relatives, is increasingly being seen as a quality indicator in and of itself.

How not to assess qualitative research

The above overview does not include certain items that are routine in assessments of quantitative research. What follows is a non-exhaustive, non-representative, experience-based list of the quantitative criteria often applied to the assessment of qualitative research, as well as an explanation of the limited usefulness of these endeavours.

Protocol adherence

Given the openness and flexibility of qualitative research, it should not be assessed by how well it adheres to pre-determined and fixed strategies – in other words: its rigidity. Instead, the assessor should look for signs of adaptation and refinement based on lessons learned from earlier steps in the research process.

Sample size

For the reasons explained above, qualitative research does not require specific sample sizes, nor does it require that the sample size be determined a priori [ 1 , 14 , 27 , 37 – 39 ]. Sample size can only be a useful quality indicator when related to the research purpose, the chosen methodology and the composition of the sample, i.e. who was included and why.

Randomisation

While some authors argue that randomisation can be used in qualitative research, this is not commonly the case, as neither its feasibility nor its necessity or usefulness has been convincingly established for qualitative research [ 13 , 27 ]. Relevant disadvantages include the negative impact of an overly large sample size as well as the possibility (or probability) of selecting “quiet, uncooperative or inarticulate individuals” [ 17 ]. Qualitative studies do not use control groups, either.

Interrater reliability, variability and other “objectivity checks”

The concept of “interrater reliability” is sometimes used in qualitative research to assess the extent to which the coding of two co-coders overlaps. However, it is not clear what this measure tells us about the quality of the analysis [ 23 ]. This means that these scores can be included in qualitative research reports, preferably with some additional information on what the score means for the analysis, but it is not a requirement. Relatedly, it is not relevant for the quality or “objectivity” of qualitative research to separate those who recruited the study participants from those who collected and analysed the data. Experience even shows that it might be better to have the same person or team perform all of these tasks [ 20 ]. First, when researchers introduce themselves during recruitment, this can enhance trust when the interview takes place days or weeks later with the same researcher. Second, when the audio-recording is transcribed for analysis, the researcher conducting the interviews will usually remember the interviewee and the specific interview situation during data analysis. This might be helpful in providing additional context information for interpretation of data, e.g. on whether something might have been meant as a joke [ 18 ].
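For reports that do include such a score, Cohen's kappa is a common choice for two coders who each assign one code per segment. The following pure-Python sketch computes it for invented coding data; the codes and segments are hypothetical.

```python
from collections import Counter

def cohens_kappa(coder_a, coder_b):
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    n = len(coder_a)
    observed = sum(a == b for a, b in zip(coder_a, coder_b)) / n
    freq_a, freq_b = Counter(coder_a), Counter(coder_b)
    # Chance agreement: sum over codes of p_a(code) * p_b(code).
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / n ** 2
    return (observed - expected) / (1 - expected)

# Hypothetical codes assigned to the same 8 segments by two coders:
a = ["delay", "delay", "SOP", "comm", "delay", "SOP", "comm", "comm"]
b = ["delay", "SOP",   "SOP", "comm", "delay", "SOP", "comm", "delay"]
print(round(cohens_kappa(a, b), 2))  # → 0.63
```

A kappa of 0 means agreement no better than chance, 1 means perfect agreement; as the text notes, the number alone says little without an account of how the codes were defined and consolidated.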

Not being quantitative research

Being qualitative research instead of quantitative research should not be used as an assessment criterion if it is used irrespectively of the research problem at hand. Similarly, qualitative research should not be required to be combined with quantitative research per se – unless mixed methods research is judged as inherently better than single-method research. In this case, the same criterion should be applied for quantitative studies without a qualitative component.

The main take-away points of this paper are summarised in Table 1. We aimed to show that, if conducted well, qualitative research can answer specific research questions that cannot be adequately answered using (only) quantitative designs. Seeing qualitative and quantitative methods as equal will help us become more aware and critical of the “fit” between the research problem and our chosen methods: I can conduct an RCT to determine the reasons for transportation delays of acute stroke patients – but should I? It also provides us with a greater range of tools to tackle a greater range of research problems more appropriately and successfully, filling in the blind spots on one half of the methodological spectrum to better address the whole complexity of neurological research and practice.

Take-away points

Acknowledgements

Abbreviations

Authors’ contributions

LB drafted the manuscript; WW and CG revised the manuscript; all authors approved the final versions.

Funding

No external funding.

Availability of data and materials

Ethics approval and consent to participate

Consent for publication

Competing interests

The authors declare no competing interests.

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

SYSTEMATIC REVIEW article

This article is part of the research topic Reviews in Gastroenterology 2023.

Electrogastrography Measurement Systems and Analysis Methods Used in Clinical Practice and Research: Comprehensive Review (provisionally accepted)

  • 1 VSB-Technical University of Ostrava, Czechia


Electrogastrography (EGG) is a non-invasive method with high diagnostic potential for the prevention of gastroenterological pathologies in clinical practice. In this paper, a review of the measurement systems, procedures, and methods of analysis used in electrogastrography is presented. A critical review of historical and current literature is conducted, focusing on electrode placement, measurement apparatus, measurement procedures, and time-frequency domain methods of filtration and analysis of the non-invasively measured electrical activity of the stomach. In total, 129 relevant articles with a primary focus on experimental diet were reviewed. The Scopus, PubMed and Web of Science databases were searched for articles in English, according to a specific query and using the PRISMA method. The research topic of electrogastrography has been growing continuously in popularity since the first measurement by Professor Alvarez 100 years ago, and many researchers and companies are interested in EGG today. Measurement apparatus and procedures are still being developed in both commercial and research settings. Electrode layouts vary widely, ranging from minimal numbers of electrodes for ambulatory measurements to very high numbers of electrodes for spatial measurements. Most authors used an anatomically approximated layout with 2 active electrodes in a bipolar connection and a commercial electrogastrograph with a sampling rate of 2 or 4 Hz. Test subjects were usually healthy adults, and diet was controlled. However, evaluation methods are being developed at a slower pace, and signals are usually classified only by their dominant frequency. The main contribution of this review is an overview of the spectrum of measurement systems and procedures developed by many authors; however, a firm medical standard has not yet been defined, so the method cannot yet be used for objective diagnosis in clinical practice.
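To make the dominant-frequency classification mentioned in the abstract concrete, the following sketch (not taken from the reviewed articles) finds the spectral peak of a synthetic slow-wave signal sampled at 2 Hz, one of the sampling rates named above; a normogastric rhythm is around 3 cycles per minute.

```python
import math

FS = 2.0             # sampling rate in Hz, as in common commercial devices
N = 600              # 5 minutes of signal
SLOW_WAVE_HZ = 0.05  # 3 cycles per minute, a typical normogastric rhythm

# Synthetic EGG-like signal (a clean sine; real signals are far noisier).
signal = [math.sin(2 * math.pi * SLOW_WAVE_HZ * n / FS) for n in range(N)]

def dominant_frequency(x, fs):
    """Frequency of the largest-magnitude bin of a plain DFT (skipping DC)."""
    n = len(x)
    best_k, best_mag = 1, -1.0
    for k in range(1, n // 2):
        re = sum(x[t] * math.cos(2 * math.pi * k * t / n) for t in range(n))
        im = sum(x[t] * math.sin(2 * math.pi * k * t / n) for t in range(n))
        mag = re * re + im * im
        if mag > best_mag:
            best_k, best_mag = k, mag
    return best_k * fs / n  # convert bin index to Hz

cpm = dominant_frequency(signal, FS) * 60
print(f"dominant frequency: {cpm:.1f} cycles per minute")  # → 3.0
```

Real analysis pipelines would add filtering and use an FFT, but the classification step the abstract describes reduces to exactly this: locate the spectral peak and compare it against normo-, brady- and tachygastric frequency bands.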

Keywords: electrogastrography, non-invasive method, measurement systems, electrode placement, measurement apparatus, signal processing

Received: 19 Jan 2024; Accepted: 03 Jun 2024.

Copyright: © 2024 Oczka, Augustynek, Penhaker and Kubicek. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY) . The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

* Correspondence: Dr. Jan Kubicek, VSB-Technical University of Ostrava, Ostrava, 708 33, Moravian-Silesian Region, Czechia


Application of Data Analytic Techniques and Monte-Carlo Simulation for Forecasting and Optimizing Oil Production from Tight Reservoirs

  • Original Paper
  • Published: 03 June 2024


  • Hamid Rahmanifard 1 &
  • Ian Gates   ORCID: orcid.org/0000-0001-9551-6752 1  

Prediction of well production from unconventional reservoirs is a complex problem even with considerable amounts of data especially due to uncertainties and incomplete understanding of physics. Data analytic techniques (DAT) with machine learning algorithms are an effective approach to enhance solution reliability for robust forward recovery forecasting from unconventional resources. However, there are still some difficulties in selecting and building the best DAT models, and in using them effectively for decision making. The objective of this study is to explore the application of DAT and Monte-Carlo simulation for forecasting and enhancing oil production of a horizontal well that has been hydraulically fractured in a tight reservoir. To do this, a database was first generated from 495 simulations of a tight oil reservoir, where the oil production in the first year depends on 16 variables, including reservoir characteristics and well design parameters. Afterward, using the random forest algorithm, the most influential parameters were determined. Considering the optimum hyperparameters for each algorithm, the best algorithm, which was identified through a comparative study, was then integrated with Monte-Carlo simulation to determine the quality of the production well. The results showed that oil production was mainly affected by well length, reservoir permeability, and number of fracture stages. The results also indicated that a neural network model with two hidden layers performed better than the other algorithms in predicting oil production (lower mean absolute error and standard deviation). Finally, the probabilistic analysis revealed that the completion design parameters were within the appropriate range.
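The Monte-Carlo step described in the abstract can be sketched as follows. Note that the linear proxy model, its coefficients and the input ranges below are invented stand-ins for the paper's trained neural network and reservoir data; only the overall workflow (sample uncertain inputs, evaluate the model, summarise the output distribution) reflects the described approach.

```python
import random
import statistics

random.seed(42)  # reproducibility of the random draws

def proxy_model(length_m, perm_md, n_stages):
    """Invented linear surrogate for first-year oil production (bbl)."""
    return 40.0 * length_m + 9000.0 * perm_md + 1200.0 * n_stages

# Monte-Carlo: sample uncertain inputs, evaluate the surrogate each time.
samples = []
for _ in range(10_000):
    length = random.uniform(1500, 3000)      # well length, m
    perm = random.lognormvariate(-3.0, 0.5)  # permeability, mD (skewed)
    stages = random.randint(15, 40)          # number of fracture stages
    samples.append(proxy_model(length, perm, stages))

# Summarise the resulting production distribution with three percentiles.
q = statistics.quantiles(samples, n=10)
p10, p50, p90 = q[0], q[4], q[8]
print(f"P10={p10:,.0f}  P50={p50:,.0f}  P90={p90:,.0f} bbl")
```

Replacing the surrogate with a trained ML model (as in the study) changes only the `proxy_model` call; the probabilistic workflow around it stays the same.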


There are other ML algorithms, such as reinforcement learning and semi-supervised learning, which are not discussed here.

1 mD = 9.869233 × 10⁻¹⁶ m²


Kasturi, S.N. (2023) A comprehensive guide to hyperparameter tuning [WWW Document]. DZone. https://dzone.com/articles/a-comprehensive-guide-to-hyperparameter-tuning-exp . Retrieved April 8 2024

Kocoglu, Y., Gorell, S. B., Emadi, H., Eyinla, D. S., Bolouri, F., Kocoglu, Y. C., & Arora, A. (2024). Improving the accuracy of short-term multiphase production forecasts in unconventional tight oil reservoirs using contextual Bi-directional long short-term memory. Geoenergy Science and Engineering, 235 , 212688.

Article   CAS   Google Scholar  

Koehrsen, W. (2018b). Hyperparameter tuning the random forest in Python [WWW Document]. Towards Data Science. https://towardsdatascience.com/hyperparameter-tuning-the-random-forest-in-python-using-scikit-learn-28d2aa77dd74 . Retrieved April 13 2022

Koehrsen, W. (2018a). A conceptual explanation of Bayesian hyperparameter optimization for machine learning [WWW Document]. Towards Data Science. URL https://towardsdatascience.com/a-conceptual-explanation-of-bayesian-model-based-hyperparameter-optimization-for-machine-learning-b8172278050f . Retrieved April 8 2024

Komer, B., Bergstra, J., Eliasmith, C. (2014). Hyperopt-Sklearn: Automatic hyperparameter configuration for Scikit-learn. In Proceedings of the 13th Python in Science Conference (pp. 32–37). https://doi.org/10.25080/majora-14bd3278-006

Kong, B., Chen, Z., Chen, S., & Qin, T. (2021). Machine learning-assisted production data analysis in liquid-rich Duvernay formation. Journal of Petroleum Science and Engineering, 200 , 108377. https://doi.org/10.1016/j.petrol.2021.108377

Li, W., Dong, Z., Lee, J. W., Ma, X., & Qian, S. (2022a). Development of decline curve analysis parameters for tight oil wells using a machine learning algorithm. Geofluids, 2022 , 8441075.

Google Scholar  

Li, X., Ma, X., Xiao, F., Xiao, C., Wang, F., & Zhang, S. (2022b). Time-series production forecasting method based on the integration of bidirectional gated recurrent unit (Bi-GRU) network and sparrow search algorithm (SSA). Journal of Petroleum Science and Engineering, 208 , 109309.

Liang, Y., Zhao, P. (2019). A machine learning analysis based on big data for eagle ford shale formation. In Proceedings—SPE annual technical conference and exhibition . OnePetro. https://doi.org/10.2118/196158-ms

Liao, L., Zeng, Y., Liang, Y., Zhang, H. (2020). Data mining: A novel strategy for production forecast in tight hydrocarbon resource in Canada by random forest analysis. In International petroleum technology conference 2020, IPTC 2020 . OnePetro. https://doi.org/10.2523/iptc-20344-ms

López, F. (2020). TPOT: Pipelines optimization with genetic algorithms [WWW Document]. Towards Data Science. https://towardsdatascience.com/tpot-pipelines-optimization-with-genetic-algorithms-56ec44ef6ede . Retrieved January 18 2022

López, F. (2021). HyperOpt: Hyperparameter tuning based on Bayesian optimization [WWW Document]. Towards Data Science. https://towardsdatascience.com/hyperopt-hyperparameter-tuning-based-on-bayesian-optimization-7fa32dffaf29 . Retrieved January 18 2022

Lu, Y., Shen, M., Wang, H., Wang, X., Van Rechem, C., Wei, W. (2023). Machine learning for synthetic data generation: A review . arXiv preprint arXiv:2302.04062 .

Luo, G., Tian, Y., Sharma, A., Ehlig-Economides, C. (2019). Eagle ford well insights using data-driven approaches. In International Petroleum Technology Conference 2019, IPTC 2019 . OnePetro. https://doi.org/10.2523/iptc-19260-ms

Martulandi, A. (2019). K-nearest neighbors in Python + hyperparameters tuning [WWW Document]. DataDrivenInvestor. https://medium.datadriveninvestor.com/k-nearest-neighbors-in-python-hyperparameters-tuning-716734bc557f . Retrieved April 13 2022

McElroy, P. D., Bibang, H., Emadi, H., Kocoglu, Y., Hussain, A., & Watson, M. C. (2021). Artificial neural network (ANN) approach to predict unconfined compressive strength (UCS) of oil and gas well cement reinforced with nanoparticles. Journal of Natural Gas Science and Engineering, 88 , 103816.

Mitchell, T. M. (2010). Machine learning. Machine Learning V2 . https://doi.org/10.1093/bioinformatics/btq112

Mohaghegh, S. D. (2017). Shale analytics: Data-driven analytics in unconventional resources. Shale Analytics: Data-Driven Analytics in Unconventional Resources . https://doi.org/10.1007/978-3-319-48753-3

Morales-Hernández, A., Van Nieuwenhuyse, I., & Rojas Gonzalez, S. (2022). A survey on multi-objective hyperparameter optimization algorithms for machine learning. Artificial Intelligence Review, 56 (8), 8043–8093.

Nagpal, A. (2017). Decision tree ensembles—bagging and boosting [WWW Document]. Towards Data Science. https://towardsdatascience.com/decision-tree-ensembles-bagging-and-boosting-266a8ba60fd9 . Retrieved September 9 2021

Negash, B. M., Ayoub, M. A., Jufar, S. R., & Robert, A. J. (2017). History matching using proxy modeling and multiobjective optimizations. In M. Awang, B. Negash, N. Md Akhir, L. Lubis, Md. Rafek, & A. (Eds.), ICIPEG 2016. Springer. https://doi.org/10.1007/978-981-10-3650-7_1

Nejad, A. M., Sheludko, S., Hodgson, T., McFall, R., Shelley, R. F. (2015). A case history: Evaluating well completions in the eagle ford shale using a data-driven approach. In Society of petroleum engineers—SPE hydraulic fracturing technology conference 2015 (pp. 164–182). OnePetro. https://doi.org/10.2118/spe-173336-ms

Omrani, P. S., Vecchia, A. L., Dobrovolschi, I., van Baalen, T., Poort, J., Octaviano, R., Binn-Tahir, H., Muñoz, E. (2019). Deep learning and hybrid approaches applied to production forecasting. In Society of petroleum engineers—Abu Dhabi international petroleum exhibition and conference 2019, ADIP 2019 . https://doi.org/10.2118/197498-MS

Otero, A., Carballido, J. L., Salgado, L., Canudo, J. I., Garrido, C., Kecerdasan, I., Ikep, P. (2017). Random forest: Many are better than one [WWW Document]. QuantDare. https://quantdare.com/random-forest-many-are-better-than-one/ . Retrieved January 17 2022

Panja, P., Velasco, R., Pathak, M., & Deo, M. (2018). Application of artificial intelligence to forecast hydrocarbon production from shales. Petroleum, 4 , 75–89. https://doi.org/10.1016/j.petlm.2017.11.003

Park, J., Datta-Gupta, A., Singh, A., & Sankaran, S. (2021). Hybrid physics and data-driven modeling for unconventional field development and its application to US onshore basin. Journal of Petroleum Science and Engineering, 206 , 109008. https://doi.org/10.1016/j.petrol.2021.109008

Pedamkar, P. (2020) Machine learning techniques | top 4 techniques of machine learning [WWW Document]. EDUCBA. https://www.educba.com/machine-learning-techniques/?source=leftnav . Retrieved September 9 2021

Rahmanifard, H., Alimohammadi, H., & Gates, I. (2020). Well performance prediction in montney formation using machine learning approaches . OnePetro. https://doi.org/10.15530/urtec-2020-2465

Book   Google Scholar  

Rahmanifard, H., Maroufi, P., Alimohamadi, H., Plaksina, T., & Gates, I. (2021). The application of supervised machine learning techniques for multivariate modelling of gas component viscosity: A comparative study. Fuel, 285 , 119146.

Razak, S. M., Cornelio, J., Cho, Y., Liu, H. H., Vaidya, R., & Jafarpour, B. (2022). Transfer learning with recurrent neural networks for long-term production forecasting in unconventional reservoirs. SPE Journal, 27 , 2425–2442. https://doi.org/10.2118/209594-PA

Ren, X., Yin, J., Xiao, F., Miao, S., Lolla, S., Yao, C., Lonnes, S., Sun, H., Chen, Y., Brown, J.S., Garzon, J., Pankaj, P. (2023). Data driven oil production prediction and uncertainty quantification for unconventional asset development planning through machine learning. In Proceedings of the 11th unconventional resources technology conference . https://doi.org/10.15530/URTEC-2023-3865670

Schuetter, J., Mishra, S., Zhong, M., LaFollette, R. (2015). Data analytics for production optimization in unconventional reservoirs. https://doi.org/10.15530/URTEC-2015-2167005

Schuetter, J., Mishra, S., Lin, L., Chandramohan, D. (2019). Ensemble learning: A robust paradigm for data-driven modeling in unconventional reservoirs. In SPE/AAPG/SEG unconventional resources technology conference 2019, URTeC 2019 . https://doi.org/10.15530/URTEC-2019-929

Schuetter, J., Mishra, S., Zhong, M., & LaFollette, R. (2018). A data-analytics tutorial: Building predictive models for oil production in an unconventional shale reservoir. SPE Journal, 23 (04), 1075–1089.  https://doi.org/10.2118/189969-pa

Scikit-learn (2023). 3.2. Tuning the hyper-parameters of an estimator [WWW Document]. scikit-learn 1.3.1 documentation. https://scikit-learn.org/stable/modules/grid_search.html . Reterieved October 5 2023

Shelley, R., Guliyev, N., Nejad, A. (2012). A novel method to optimize horizontal Bakken completions in a factory mode development program. In Proceedings—SPE annual technical conference and exhibition (pp. 3034–3043). OnePetro. https://doi.org/10.2118/159696-ms

Smets, K., Verdonk, B., Jordaan, E. M. (2007) Evaluation of performance measures for SVR hyperparameter selection. In IEEE international conference on neural networks — conference proceedings (pp. 637–642). https://doi.org/10.1109/IJCNN.2007.4371031

Smola, A. J., Schölkopf, B., & Schölkopf, S. (2004). A tutorial on support vector regression. Statistics and Computing . https://doi.org/10.1023/B:STCO.0000035301.49549.88

Solanki, S. (2021). Scikit-optimize: Simple guide to hyperparameters Optimization/tunning [WWW Document]. CoderzColumn. https://coderzcolumn.com/tutorials/machine-learning/scikit-optimize-guide-to-hyperparameters-optimization . Retrieved January 18 2022

Suhag, A., Ranjith, R., Aminzadeh, F. (2017) Comparison of shale oil production forecasting using empirical methods and artificial neural networks. In Proceedings - SPE annual technical conference and exhibition . OnePetro. https://doi.org/10.2118/187112-ms

Temizel, C., Canbaz, C. H., Alsaheib, H., Yanidis, K., Balaji, K., Alsulaiman, N., Basri, M., Jama, N. (2021). Geology-driven EUR forecasting in unconventional fields. In SPE Middle East oil and gas show and conference, MEOS, proceedings 2021-November . https://doi.org/10.2118/204583-MS

Tempelman, G. (2020) Comparing hyperparameter optimization frameworks in Python: A conceptual and pragmatic approach [WWW Document]. Medium. https://medium.com/@gerbentempelman/comparing-hyperparameter-optimization-frameworks-in-python-a-conceptual-and-pragmatic-approach-24d9baa1cc69 . Retrieved April 6 2024

Vyas, A., Datta-Gupta, A., Mishra, S. (2017). Modeling early time rate decline in unconventional reservoirs using machine learning techniques. In Society of petroleum engineers—SPE Abu Dhabi international petroleum exhibition and conference 2017 . OnePetro. https://doi.org/10.2118/188231-ms

Yang, L., & Shami, A. (2020). On hyperparameter optimization of machine learning algorithms: Theory and practice. Neurocomputing, 415 , 295–316.

Yousefzadeh, R., Kazemi, A., Ahmadi, M., & Gholinezhad, J. (2023). History matching and robust optimization using proxies. In R. Yousefzadeh, A. Kazemi, M. Ahmadi, & J. Gholinezhad (Eds.), Introduction to geological uncertainty management in reservoir characterization and optimization. SpringerBriefs in petroleum geoscience and engineering. Cham: Springer.

Zhang, C., & Ma, Y. (2012). Ensemble machine learning: Methods and applications . Springer. https://doi.org/10.1007/9781441993267

Zhong, R., Johnson, R., & Chen, Z. (2020). Generating pseudo density log from drilling and logging-while-drilling data using extreme gradient boosting (XGBoost). International Journal of Coal Geology, 220 , 103416.

Zhou, Z. H. (2012). Ensemble methods: Foundations and algorithms (pp. 1–218). Chapman and Hall/CRC. https://doi.org/10.1201/B12207/ENSEMBLE-METHODS-ZHI-HUA-ZHOU

Download references

Acknowledgments

The authors acknowledge support from the Natural Sciences and Engineering Research Council (NSERC) and the University of Calgary’s Canada First Research Excellence Fund program, entitled the Global Research Initiative in Sustainable Low-Carbon Unconventional Resources.

Author information

Authors and Affiliations

Department of Chemical and Petroleum Engineering, Schulich School of Engineering, University of Calgary, 2500 University Dr. NW, Calgary, AB, T2N 1N4, Canada

Hamid Rahmanifard & Ian Gates


Corresponding author

Correspondence to Ian Gates.

Supplementary Information

Below is the link to the electronic supplementary material.

Supplementary file1 (PDF 511 kb)

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article

Rahmanifard, H., Gates, I. Application of Data Analytic Techniques and Monte-Carlo Simulation for Forecasting and Optimizing Oil Production from Tight Reservoirs. Nat Resour Res (2024). https://doi.org/10.1007/s11053-024-10358-w


Received: 02 March 2024

Accepted: 06 May 2024

Published: 03 June 2024

DOI: https://doi.org/10.1007/s11053-024-10358-w


Keywords

  • Tight oil reservoirs
  • Machine learning
  • Data analytics
  • Hyperparameter tuning
  • Hydrocarbon production
  • Open access
  • Published: 27 May 2024

Discovery of novel RNA viruses through analysis of fungi-associated next-generation sequencing data

  • Xiang Lu 1 , 2   na1 ,
  • Ziyuan Dai 3   na1 ,
  • Jiaxin Xue 2   na1 ,
  • Wang Li 4 ,
  • Ping Ni 4 ,
  • Juan Xu 4 ,
  • Chenglin Zhou 4 &
  • Wen Zhang 1 , 2 , 4  

BMC Genomics volume 25, Article number: 517 (2024)


Abstract

Background

Like all other species, fungi are susceptible to infection by viruses. The diversity of fungal viruses has been rapidly expanding in recent years due to the availability of advanced sequencing technologies. However, compared to other virome studies, the research on fungi-associated viruses remains limited.

Results

In this study, we downloaded and analyzed over 200 public datasets from approximately 40 different Bioprojects to explore potential fungal-associated viral dark matter. A total of 12 novel viral sequences were identified, all of which are RNA viruses, with lengths ranging from 1,769 to 9,516 nucleotides. The amino acid sequence identity of all these viruses with any known virus is below 70%. Through phylogenetic analysis, these RNA viruses were classified into different orders or families, such as Mitoviridae, Benyviridae, Botourmiaviridae, Deltaflexiviridae, Mymonaviridae, Bunyavirales, and Partitiviridae. It is possible that these sequences represent new taxa at the level of family, genus, or species. Furthermore, a co-evolution analysis indicated that the evolutionary history of these viruses within their groups is largely driven by cross-species transmission events.

Conclusions

These findings are of significant importance for understanding the diversity, evolution, and relationships between genome structure and function of fungal viruses. However, further investigation is needed to study their interactions.


Introduction

Viruses are among the most abundant and diverse biological entities on Earth; they are ubiquitous in the natural environment but difficult to culture and detect [1, 2, 3]. In recent decades, significant advances in omics have transformed the field of virology, enabling researchers to detect potential viruses in a wide variety of environmental samples, expanding the known diversity of viruses and illuminating the viral “dark matter” that may exist in vast quantities [4]. In most cases, the hosts of these newly discovered viruses exhibit only asymptomatic infections [5, 6], and the viruses may even play an important role in maintaining the balance, stability, and sustainable development of the biosphere [7]. Some viruses, however, are involved in the emergence and development of animal or plant diseases: tobacco mosaic virus (TMV) causes poor growth in tobacco plants, while norovirus causes diarrhea in mammals [8, 9]. In the field of fungal research, viral infections have significantly reduced the yield of edible fungi, drawing increasing attention to fungal diseases caused by viruses [10]. Nevertheless, given their less apparent relevance to human health [11], fungal-associated viruses remain understudied compared to viruses affecting humans, animals, or plants.

Mycoviruses (also known as fungal viruses) are widely distributed among fungi and fungus-like organisms [12]. The first mycoviruses were discovered in the 1960s by M. Hollings in the basidiomycete Agaricus bisporus, an edible cultivated mushroom [13]. Shortly thereafter, L. F. Ellis et al. reported mycoviruses in the ascomycete Penicillium stoloniferum, confirming that viral dsRNA is responsible for interferon stimulation in mammals [13, 14, 15]. In recent years, the diversity of known mycoviruses has increased rapidly with the development and widespread application of sequencing technologies [16, 17, 18, 19, 20]. According to the classification principles of the International Committee on Taxonomy of Viruses (ICTV), mycoviruses are currently classified into 24 taxa: 23 families and one genus (Botybirnavirus) [21]. Most mycoviruses are double-stranded (ds) RNA viruses, such as the families Totiviridae, Partitiviridae, Reoviridae, Chrysoviridae, Megabirnaviridae, Quadriviridae, and the genus Botybirnavirus, or positive-sense single-stranded (+ss) RNA viruses, such as the families Alphaflexiviridae, Gammaflexiviridae, Barnaviridae, Hypoviridae, Endornaviridae, Metaviridae, and Pseudoviridae. However, negative-sense single-stranded (-ss) RNA viruses (family Mymonaviridae) and single-stranded (ss) DNA viruses (family Genomoviridae) have also been described [22]. The taxonomy of mycoviruses is continually refined as novel mycoviruses that cannot be placed in any established taxon are identified. While the vast majority of fungus-infecting viruses show no overt infection characteristics and have no significant impact on their hosts, some mycoviruses alter the phenotype of the host, leading to hypovirulence in phytopathogenic fungi [23].
The use of environmentally friendly, hypovirulence-associated mycoviruses such as Cryphonectria hypovirus 1 (CHV-1) for biological control has been considered a viable alternative to chemical fungicides [24]. As research has deepened, an increasing number of mycoviruses that cause fungal phenotypic changes have been identified [3, 23, 25]. Understanding the distribution of these viruses and their effects on their hosts will therefore help determine whether such infections can be prevented and treated.

To explore the viral dark matter hidden within fungi, this study collected over 200 available fungal-associated libraries from approximately 40 Bioprojects in the Sequence Read Archive (SRA) database, uncovering novel RNA viruses within them. We further elucidated the genetic relationships between known viruses and these newfound ones, thereby expanding our understanding of fungal-associated viruses and providing assistance to viral taxonomy.

Materials and methods

Genome assembly

To discover novel fungal-associated viruses, we downloaded 236 available libraries from the SRA database, corresponding to 32 fungal species (Supplementary Table 1). Pfastq-dump v0.1.6 ( https://github.com/inutano/pfastq-dump ) was used to convert SRA files to fastq format. Bowtie2 v2.4.5 [26] was then employed to remove host sequences. Primer sequences were trimmed from raw reads using Trim Galore v0.6.5 ( https://www.bioinformatics.babraham.ac.uk/projects/trim_galore ), and the resulting files underwent quality control with the options ‘--phred33 --length 20 --stringency 3 --fastqc’. Duplicated reads were marked using PRINSEQ-lite v0.20.4 (-derep 1). All SRA datasets were then assembled with an in-house pipeline: paired-end reads were assembled using SPAdes v3.15.5 [27] with the ‘--meta’ option, while single-end reads were assembled with MEGAHIT v1.2.9 [28], both with default parameters. The results were imported into Geneious Prime v2022.0.1 ( https://www.geneious.com ) for sorting and manual confirmation. To reduce false negatives during assembly, further semi-automatic assembly of unmapped contigs and singlets shorter than 500 nt was performed, and contigs longer than 1,500 nt after reassembly were retained. Individual contigs were then used as references for mapping against the raw data using the Low Sensitivity/Fastest setting in Geneious Prime. In addition, a mixed assembly was performed using MEGAHIT in combination with BWA v0.7.17 [29] to recover unused reads that might correspond to low-abundance contigs.
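The length filter applied after reassembly can be sketched in a few lines of Python; the FASTA parser and function names below are illustrative, not part of the published pipeline.

```python
# Minimal sketch of the post-assembly length filter described above:
# retain contigs longer than 1,500 nt. Names are illustrative.

def read_fasta(lines):
    """Parse FASTA lines into a dict of {header: sequence}."""
    records, header, seq = {}, None, []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if header is not None:
                records[header] = "".join(seq)
            header, seq = line[1:], []
        elif line:
            seq.append(line)
    if header is not None:
        records[header] = "".join(seq)
    return records

def filter_contigs(records, min_len=1500):
    """Retain contigs strictly longer than min_len nucleotides."""
    return {h: s for h, s in records.items() if len(s) > min_len}

fasta = [">c1", "AT" * 400, ">c2", "GC" * 1000]
contigs = read_fasta(fasta)
kept = filter_contigs(contigs)
# c1 (800 nt) is dropped; c2 (2,000 nt) is kept
```

The same filter would be applied per library before the manual confirmation step in Geneious Prime.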

Searching for novel viruses in fungal libraries

We identified novel viral sequences in the fungal libraries through a series of steps. First, we established a local viral database for screening assembled contigs, consisting of the non-redundant protein (nr) database downloaded in August 2023 together with IMG/VR v3 [30]. Contigs labeled as “viruses” and exhibiting less than 70% amino acid (aa) sequence identity with their best match in the database were imported into Geneious Prime for manual mapping. Putative open reading frames (ORFs) were predicted by Geneious Prime using built-in parameters (minimum size: 100) and subsequently verified by comparison with related viruses. ORF annotations were based on comparisons against the Conserved Domain Database (CDD). The manually examined sequences were subjected to genome clustering using MMseqs2 (-k 0 -e 0.001 --min-seq-id 0.95 -c 0.9 --cluster-mode 0) [31]. After excluding viruses with high aa sequence identity (> 70%) to known viruses, a dataset of 12 RNA viral sequences was obtained. This non-redundant fungal virus dataset was compared against the local database using the BLASTx program built into DIAMOND v2.0.15 [32], and significant hits with an E-value below 10⁻⁵ were selected. The coverage of each sequence in all libraries was calculated using the pileup tool in BBMap. Taxonomic identification was conducted using TaxonKit [33] together with the rma2info program integrated into MEGAN6 [34]. RNA secondary structures of the novel viruses were predicted using RNA Folding Form V2.3 ( http://www.unafold.org/mfold/applications/rna-folding-form-v2.php ).
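The novelty screen above — a significant DIAMOND BLASTx hit (E-value < 10⁻⁵) whose best match is nonetheless below 70% aa identity — can be sketched as a filter over tabular output. This assumes the default 12-column `--outfmt 6` layout (qseqid, sseqid, pident, …, evalue, bitscore); the example rows are invented for illustration.

```python
# Sketch of the novelty screen described above: keep contigs whose best
# DIAMOND BLASTx hit is significant (E-value < 1e-5) yet divergent
# (< 70% aa identity). Assumes default 12-column --outfmt 6 rows.

def parse_hits(tsv_lines):
    """Parse tabular hits into dicts with the fields we need."""
    hits = []
    for line in tsv_lines:
        f = line.rstrip("\n").split("\t")
        hits.append({"qseqid": f[0], "sseqid": f[1],
                     "pident": float(f[2]), "evalue": float(f[10])})
    return hits

def candidate_novel(hits, max_ident=70.0, max_evalue=1e-5):
    """Take the best (lowest E-value) hit per query; flag divergent ones."""
    best = {}
    for h in hits:
        b = best.get(h["qseqid"])
        if b is None or h["evalue"] < b["evalue"]:
            best[h["qseqid"]] = h
    return [q for q, h in best.items()
            if h["evalue"] < max_evalue and h["pident"] < max_ident]

rows = [
    "contig1\tvirusA\t34.7\t500\t300\t5\t1\t500\t1\t500\t1e-30\t120",
    "contig2\tvirusB\t92.1\t500\t30\t2\t1\t500\t1\t500\t1e-80\t600",
]
novel = candidate_novel(parse_hits(rows))
# contig1 is a candidate novel virus; contig2 is too close to a known one
```

Flagged contigs would then go to manual mapping and ORF verification as described.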

Phylogenetic analysis

To infer phylogenetic relationships, nucleotide and encoded protein sequences of reference strains belonging to the relevant virus groups were downloaded from the NCBI GenBank database, along with sequences of proposed species pending ratification. Related sequences were aligned using the alignment program within CLC Genomics Workbench 10.0, and the resulting alignment was further optimized using MUSCLE in MEGA-X [35]. Sites containing more than 50% gaps were temporarily removed from the alignments. Maximum-likelihood (ML) trees were then constructed using IQ-TREE v1.6.12 [36] with 1,000 bootstrap replicates (-bb 1000) and the ModelFinder function (-m MFP). The Interactive Tree Of Life (iTOL) was used for visualizing and editing phylogenetic trees [37]. Color-coded distance matrix analyses between the novel viruses and other known viruses were performed with Sequence Demarcation Tool v1.2 [38].
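The gap-stripping step (removing alignment sites with more than 50% gaps) can be sketched as follows, assuming the alignment is held as a list of equal-length strings with '-' as the gap character; the helper name and toy alignment are illustrative.

```python
# Sketch of the alignment-trimming step above: drop columns whose gap
# fraction exceeds 50%.

def strip_gappy_columns(aln, max_gap_frac=0.5):
    """Keep only columns whose gap fraction is <= max_gap_frac."""
    n = len(aln)
    keep = [i for i in range(len(aln[0]))
            if sum(seq[i] == "-" for seq in aln) / n <= max_gap_frac]
    return ["".join(seq[i] for i in keep) for seq in aln]

aln = ["MK-L", "M--L", "MKGL"]
trimmed = strip_gappy_columns(aln)
# the third column (2 of 3 gaps) is removed; the second (1 of 3) is kept
```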

To illustrate cross-species transmission and co-divergence between viruses and their hosts across the different virus groups, we reconciled the co-phylogenetic relationships between these viruses and their hosts. The evolutionary tree and topologies of the hosts involved in this study were obtained from the TimeTree website [39] by entering their Latin names. Viruses in the phylogenetic tree whose hosts could not be identified from published literature or from information provided by the authors were disregarded. Co-phylogenetic plots (‘tanglegrams’) generated using the R package phytools [40] visually represent the correspondence between host and virus trees, with lines connecting hosts to their respective viruses. The event-based program eMPRess [41] was employed to determine whether pairs of virus groups and their hosts have undergone coevolution. This tool reconciles pairs of phylogenetic trees according to the Duplication-Transfer-Loss (DTL) model [42], employing a maximum parsimony formulation to calculate the cost of each coevolution event. The costs of the duplication, host-jumping (transfer), and extinction (loss) event types were set to 1.0, while host-virus co-divergence was set to zero, as it was considered the null event.
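Under this cost scheme, each candidate reconciliation is scored as a weighted sum of its events, with co-divergence free as the null event. A hedged sketch of that accounting follows; the event counts are invented for demonstration, and eMPRess itself searches over reconciliations rather than taking counts as input.

```python
# Sketch of the event-cost accounting used in DTL reconciliation, with
# the costs stated above (duplication, transfer, loss = 1.0;
# co-divergence = 0 as the null event).

COSTS = {"codivergence": 0.0, "duplication": 1.0,
         "transfer": 1.0, "loss": 1.0}

def reconciliation_cost(event_counts, costs=COSTS):
    """Maximum-parsimony score: weighted sum of coevolution events."""
    return sum(costs[event] * n for event, n in event_counts.items())

# A hypothetical history dominated by host jumps (cross-species transmission):
score = reconciliation_cost({"codivergence": 4, "transfer": 7, "loss": 2})
# only the transfers and losses contribute to the score
```

A reconciliation whose minimum-cost solution is rich in transfer events, as here, is what underlies the paper's conclusion that cross-species transmission drives these viruses' histories.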

Data availability

The data reported in this paper have been deposited in GenBase at the National Genomics Data Center [43], Beijing Institute of Genomics, Chinese Academy of Sciences / China National Center for Bioinformation, under accession numbers C_AA066339.1–C_AA066350.1, publicly accessible at https://ngdc.cncb.ac.cn/genbase . See Table 1 for details.

Results

Twelve novel RNA viruses associated with fungi

We investigated fungi-associated novel viruses by mining publicly available metagenomic and transcriptomic fungal datasets. In total, we collected 236 datasets spanning four fungal phyla: Ascomycota (159), Basidiomycota (47), Chytridiomycota (15), and Zoopagomycota (15), corresponding to 20, 8, 2, and 2 different fungal genera, respectively (Supplementary Table 1). A total of 12 sequences containing complete coding sequences (CDS) for RNA-dependent RNA polymerase (RdRp) were identified, ranging in length from 1,769 nt to 9,516 nt. All of these sequences share less than 70% aa identity with the RdRp of any currently known virus (ranging from 32.97% to 60.43%), potentially representing novel families, genera, or species (Table 1). Some of the identified sequences were shorter than the reference genomes of related RNA viruses, suggesting that they represent partial viral genomes. To exclude the possibility of transient viral infections of the hosts, or artefacts of de novo assembly during co-infection detection, we extracted the nucleotide sequences of the coding regions of these 12 sequences and mapped them against all collected libraries to compute coverage (Supplementary Table 2). The results revealed varying degrees of read matches for these viral genomes across different libraries, spanning different fungal species. Although we only analyzed sequences longer than 1,500 nt, we also detected other viral reads in many libraries; however, we were unable to assemble them into sufficiently long contigs, possibly owing to library construction strategies or sequencing depth. In any case, this preliminary finding reveals a greater diversity of fungal-associated viruses than previously appreciated.

Positive-sense single-stranded RNA viruses

(i) Mitoviridae

Members of the family Mitoviridae (order Cryppavirales) are monopartite, linear, positive-sense (+) single-stranded (ss) RNA viruses with genomes of approximately 2.5–2.9 kb [44], carrying a single long open reading frame (ORF) that encodes a putative RdRp. Mitoviruses have no true virions and no structural proteins; the viral genome is transmitted horizontally through mating or vertically from mother to daughter cells [45]. They use mitochondria as their sites of replication and have typical 5' and 3' untranslated regions (UTRs) of varying sizes, which are responsible for viral translation and replicase recognition [46]. According to the taxonomic principles of the ICTV, viruses of the family Mitoviridae are divided into four genera: Duamitovirus, Kvaramitovirus, Triamitovirus, and Unuamitovirus. In this study, two novel viruses belonging to the family Mitoviridae were identified in the same library (SRR12744489; species: Thielaviopsis ethacetica), named Thielaviopsis ethacetica mitovirus 1 (TeMV01) and Thielaviopsis ethacetica mitovirus 2 (TeMV02) (Fig. 1A). The genome sequence of TeMV01 spans 2,689 nucleotides with a GC content of 32.2%; its 5' and 3' UTRs comprise 406 nt and 36 nt, respectively. The genome sequence of TeMV02 spans 3,087 nucleotides with a GC content of 32.6%; its 5' and 3' UTRs comprise 553 nt and 272 nt, respectively. The 5' and 3' ends of both genomes are predicted to fold into typical stem-loop structures (Fig. 1B). To determine the evolutionary relationships between these two mitoviruses and other known mitoviruses, a phylogenetic analysis based on RdRp was performed; the viral strains fell into two genetic lineages, in the genera Duamitovirus and Unuamitovirus (Fig. 1C).
In the genus Unuamitovirus, TeMV01 clustered with Ophiostoma mitovirus 4, with a highest aa identity of 51.47%, while in the genus Duamitovirus, TeMV02 clustered with a strain isolated from Plasmopara viticola, with a highest aa identity of 42.82%. According to ICTV guidelines for the taxonomy of the family Mitoviridae, a species demarcation cutoff of < 70% aa sequence identity is established [47]. Based on this criterion and the phylogenetic inferences, these two viral strains can be presumed to be novel viral species [48].
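Genome statistics like those above (GC content, UTR lengths) can be sanity-checked directly from a sequence and its ORF coordinates. A minimal Python sketch follows; the helper names are ours, the toy sequence is illustrative, and the ORF interval is the one implied by the reported TeMV01 numbers (2,689 nt genome, 406 nt 5' UTR, 36 nt 3' UTR).

```python
# Sketch of the genome statistics reported above: GC content and UTR
# lengths. ORF coordinates below are implied by the TeMV01 figures.

def gc_content(seq):
    """Percent G+C in a nucleotide sequence."""
    seq = seq.upper()
    return 100.0 * sum(seq.count(b) for b in "GC") / len(seq)

def utr_lengths(genome_len, orf_start, orf_end):
    """5' and 3' UTR lengths for a 1-based, inclusive ORF interval."""
    return orf_start - 1, genome_len - orf_end

five_utr, three_utr = utr_lengths(2689, 407, 2653)
# five_utr == 406 and three_utr == 36, matching the reported TeMV01 UTRs
```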

figure 1

Identification of novel positive-sense single-stranded RNA viruses in fungal sequencing libraries. A Genome organization of two novel mitoviruses; the putative ORF for the viral RdRp is depicted by a green box, and the predicted conserved domain region is displayed in a gray box. B Predicted RNA secondary structures of the 5'- and 3'-terminal regions. C ML phylogenetic tree of members of the family Mitoviridae . The best-fit model (LG + F + R6) was estimated using IQ-Tree model selection. The bootstrap value is shown at each branch, with the newly identified viruses represented in red font. D The genome organization of GtBeV is depicted at the top; in the middle is the ML phylogenetic tree of members of the family Benyviridae . The best-fit model (VT + F + R5) was estimated using IQ-Tree model selection. The bootstrap value is shown at each branch, with the newly identified virus represented in red font. At the bottom is the distance matrix analysis of GtBeV identified in Gaeumannomyces tritici . Pairwise sequence comparison produced with the RdRp amino acid sequences within the ML tree. E The genome organization of CrBV is depicted at the top; in the middle is the ML phylogenetic tree of members of the family Botourmiaviridae . The best-fit model (VT + F + R5) was estimated using IQ-Tree model selection. The bootstrap value is shown at each branch, with the newly identified virus represented in red font. At the bottom is the distance matrix analysis of CrBV identified in Clonostachys rosea . Pairwise sequence comparison produced with the RdRp amino acid sequences within the ML tree

(ii) Benyviridae

The family Benyviridae comprises multipartite plant viruses that are rod-shaped, approximately 85–390 nm in length and 20 nm in diameter. Within this family there is a single genus, Benyvirus [ 49 ]. One species within this genus, Beet necrotic yellow vein virus, is reported to cause the widespread and highly destructive soil-borne 'rhizomania' disease of sugar beet [ 50 ]. A full-length RNA1 sequence related to the Benyviridae was detected in Gaeumannomyces tritici (ERR3486062); it is 6,479 nt long, possesses a poly(A) tail at the 3' end, and is provisionally designated Gaeumannomyces tritici benyvirus (GtBeV). BLASTx results indicate 34.68% aa sequence identity with the best match (Fig. 1D). The non-structural polyprotein CDS of RNA1 encodes a large replication-associated protein of 1,688 amino acids with a molecular mass of 190 kDa. Four domains corresponding to representative species within the family Benyviridae were predicted in this polyprotein: the viral methyltransferase (Mtr) domain spans nucleotide positions 386 to 1411, the RNA helicase (Hel) domain occupies positions 2113 to 2995, the protease (Pro) domain lies between positions 3142 and 3410, and the RdRp domain is located at positions 4227 to 4796. A phylogenetic analysis integrating RdRp sequences of viruses closely related to GtBeV revealed that GtBeV clustered within the family Benyviridae while exhibiting substantial evolutionary divergence from all other sequences. Consequently, this virus likely represents a novel species in the family Benyviridae .

(iii) Botourmiaviridae

The family Botourmiaviridae comprises viruses infecting plants and filamentous fungi, with mono- or multi-segmented genomes [ 51 ]. Recent research has rapidly expanded the family, from 4 confirmed genera in 2020 to a total of 12. A contig identified in Clonostachys rosea (ERR5928658) by BLASTx exhibited similarity to viruses of the family Botourmiaviridae . After manual mapping, a 2,903 nt genome containing a complete RdRp region was obtained and tentatively named Clonostachys rosea botourmiavirus (CrBV) (Fig. 1E). In a phylogenetic analysis based on RdRp, CrBV clustered with members of the genus Magoulivirus , sharing 56.58% aa identity with a strain identified from Eclipta prostrata . Puzzlingly, however, according to the ICTV genus/species demarcation criteria, members of different genera/species within the family Botourmiaviridae share less than 70%/90% identity in their complete RdRp amino acid sequences. Furthermore, the RdRp sequences with accession numbers NC_055143 and NC_076766, both considered members of the genus Magoulivirus , exhibit only 39.05% aa identity to each other. Therefore, CrBV should at least be considered a new species within the family Botourmiaviridae .
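The two-tier criterion, and the tension it creates for CrBV, can be sketched as follows (the 70%/90% cutoffs are those quoted from the ICTV; the function name is ours):

```python
def botourmiavirid_rank(rdrp_aa_identity_pct: float) -> str:
    """Two-tier ICTV demarcation for Botourmiaviridae: <70% complete-RdRp
    aa identity separates genera, <90% separates species within a genus."""
    if rdrp_aa_identity_pct < 70.0:
        return "new genus"
    if rdrp_aa_identity_pct < 90.0:
        return "new species"
    return "same species"

# CrBV shares 56.58% identity with its closest Magoulivirus relative, so the
# raw cutoff would even place it in a new genus despite its phylogenetic
# clustering within Magoulivirus -- hence the conflict noted in the text.
print(botourmiavirid_rank(56.58))  # new genus
```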

(iv) Deltaflexiviridae

An assembled sequence of 3,425 nucleotides, derived from Lepista sordida (DRR252167) and showing homology to the Deltaflexiviridae within the order Tymovirales , was obtained and named Lepista sordida deltaflexivirus (LsDV). The Tymovirales comprises five recognized families: Alphaflexiviridae , Betaflexiviridae , Deltaflexiviridae , Gammaflexiviridae , and Tymoviridae [ 52 ]. The Deltaflexiviridae currently includes only one genus of fungus-associated deltaflexiviruses, whose members are mostly identified in fungi or plant pathogens [ 53 ]. LsDV was predicted to have a single large ORF, VP1, which starts with an AUG codon at nt 163–165 and ends with a UAG codon at nt 3,418–3,420. This ORF encodes a putative polyprotein of 1,086 aa with a calculated molecular mass of 119 kDa. Two conserved domains, Hel and RdRp, were identified within the VP1 protein (Fig. 2A). However, the Mtr domain was missing, indicating that the 5' end of this polyprotein is incomplete. In the phylogenetic analysis of RdRp, LsDV was closely related to viruses of the family Deltaflexiviridae and shared 46.61% aa identity with a strain (UUW06602) isolated from Macrotermes carbonarius . Nevertheless, according to the species demarcation criteria proposed by the ICTV, LsDV cannot be regarded as a novel species at present because the entire replication-associated polyprotein could not be recovered.

figure 2

Identification of novel members of family Deltaflexiviridae and Toga-like virus in fungal sequencing libraries. A On the right side of the image is the genome organization of LsDV; the putative ORF for the viral RdRp is depicted by a green box, and the predicted conserved domain region is displayed in a gray box. ML phylogenetic tree of members of the family Deltaflexiviridae . The best-fit model (VT + F + R6) was estimated using IQ-Tree model selection. The bootstrap value is shown at each branch, with the newly identified virus represented in red font. B The genome organization of GtTlV is depicted at the top; the putative ORF for the viral RdRp is depicted by a green box, and the predicted conserved domain region is displayed in a gray box. ML phylogenetic tree of members of the order Martellivirales . The best-fit model (LG + R7) was estimated using IQ-Tree model selection. The bootstrap value is shown at each branch, with the newly identified virus represented in red font

(v) Toga-like virus

Members of the family Togaviridae are primarily transmitted by arthropods and can infect a wide range of vertebrates, including mammals, birds, reptiles, amphibians, and fish [ 54 ]. Currently, this family contains only a single confirmed genus, Alphavirus . A contig of 7,588 nt with a complete ORF encoding a putative protein of 1,928 aa was discovered in Gaeumannomyces tritici (ERR3486058); the protein had 60.43% identity to Fusarium sacchari alphavirus-like virus 1 (QIQ28421) with 97% coverage. Phylogenetic analysis showed that it did not cluster with classical alphavirus members such as the VEE, WEE, EEE, and SF complexes [ 54 ], but rather with several available sequences annotated as toga-like (Fig. 2B). It was provisionally named Gaeumannomyces tritici toga-like virus (GtTlV). However, we remain cautious about the accuracy of these so-called toga-like sequences, as they show little significant relationship to members of the order Martellivirales .

Negative-sense single-stranded RNA viruses

(i) Mymonaviridae

The Mymonaviridae is a family of enveloped viruses with linear, negative-sense ssRNA genomes in the order Mononegavirales that infect fungi. Their genomes are approximately 10 kb in size and encode six proteins [ 55 ]. The family Mymonaviridae was established to accommodate Sclerotinia sclerotiorum negative-stranded RNA virus 1 (SsNSRV-1), a novel virus discovered in a hypovirulent strain of Sclerotinia sclerotiorum [ 56 ]. According to the ICTV, the family currently includes 9 genera: Auricularimonavirus , Botrytimonavirus , Hubramonavirus , Lentimonavirus , Penicillimonavirus , Phyllomonavirus , Plasmopamonavirus , Rhizomonavirus and Sclerotimonavirus . Two sequences associated with the family Mymonaviridae , originating from Gaeumannomyces tritici (ERR3486068) and Aspergillus puulaauensis (DRR266546), respectively, were identified and provisionally named Gaeumannomyces tritici mymonavirus (GtMV) and Aspergillus puulaauensis mymonavirus (ApMV). GtMV is 9,339 nt long with a GC content of 52.8%. It was predicted to contain 5 discontinuous ORFs, the largest encoding RdRp; a nucleoprotein and three hypothetical proteins of unknown function were also predicted. A multiple alignment of the nucleotide sequences among these ORFs identified a semi-conserved sequence, 5'-UAAAA-CUAGGAGC-3', located downstream of each ORF (Fig. 3A). These regions are likely gene-junction regions in the GtMV genome, a characteristic feature shared by mononegaviruses [ 57 , 58 ]. For ApMV, a complete RdRp CDS of 1,978 aa was predicted. BLASTx searches showed that GtMV shared 45.22% identity with the RdRp of Soybean leaf-associated negative-stranded RNA virus 2 (YP_010784557), while ApMV shared 55.90% identity with the RdRp of Erysiphe necator associated negative-stranded RNA virus 23 (YP_010802816). Representative members of the family Mymonaviridae were included in the phylogenetic analysis.
The results showed that GtMV and ApMV clustered closely with members of the genera Sclerotimonavirus and Plasmopamonavirus , respectively (Fig. 3B). Members of the genus Plasmopamonavirus are about 6 kb in size and encode a single protein. Therefore, GtMV and ApMV should be considered to represent new species within their respective genera.

figure 3

Identification of two new members in the family Mymonaviridae . A At the top is the nucleotide multiple sequence alignment of GtMV with the reference genomes. The putative ORF for the viral RdRp is depicted by a green box, the predicted nucleoprotein is displayed in a yellow box, and three hypothetical proteins are displayed in gray boxes. The comparison of putative semi-conserved regions between ORFs in GtMV is displayed in the 5' to 3' orientation, with conserved sequences highlighted. At the bottom is the genome organization of ApMV; the putative ORF for the viral RdRp is depicted by a green box. B ML phylogenetic tree of members of the family Mymonaviridae . The best-fit model (LG + F + R6) was estimated using IQ-Tree model selection. The bootstrap value is shown at each branch, with the newly identified viruses represented in red font

(ii) Bunyavirales

The Bunyavirales (the only order in the class Ellioviricetes ) is one of the largest groups of segmented negative-sense single-stranded RNA viruses, with mainly tripartite genomes [ 59 ]; it includes many pathogenic strains that infect arthropods (such as mosquitoes, ticks, and sand flies), plants, protozoans, and vertebrates, and some even cause severe human diseases. The order Bunyavirales consists of 14 viral families: Arenaviridae , Cruliviridae , Discoviridae , Fimoviridae , Hantaviridae , Leishbuviridae , Mypoviridae , Nairoviridae , Peribunyaviridae , Phasmaviridae , Phenuiviridae , Tospoviridae , Tulasviridae and Wupedeviridae . In this study, three complete or near-complete RNA1 sequences related to bunyaviruses were identified and named after their respective hosts: CoBV ( Conidiobolus obscurus bunyavirus; SRR6181013; 7,277 nt), GtBV ( Gaeumannomyces tritici bunyavirus; ERR3486069; 7,364 nt), and TaBV ( Thielaviopsis aethacetica bunyavirus; SRR12744489; 9,516 nt) (Fig. 4A). The 5' and 3' terminal segments of GtBV and TaBV are complementary, allowing the formation of a panhandle structure [ 60 ] that plays an essential role as a promoter of genome transcription and replication [ 61 ]; this could not be assessed for CoBV, as its 3' terminus was not fully recovered (Fig. 4B). BLASTx results indicated that these three viruses had identities ranging from 32.97% to 54.20% with the best matches in the GenBank database. Phylogenetic analysis placed CoBV in the family Phasmaviridae , with distant relationships to all of its genera; GtBV clustered well with members of the genus Entovirus of the family Phenuiviridae ; TaBV did not cluster with any known members of families within the Bunyavirales and was hence provisionally placed within a bunya-like group (Fig. 4C). Therefore, these three sequences should be considered to represent a potential new family, genus, or species within the order Bunyavirales .
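The panhandle test reduces to asking whether the 3' terminus is (nearly) the reverse complement of the 5' terminus. A minimal sketch, using hypothetical terminal sequences rather than the actual GtBV/TaBV termini:

```python
_COMPLEMENT = str.maketrans("ACGU", "UGCA")

def reverse_complement(rna: str) -> str:
    """Reverse complement of an RNA string (A-U, G-C pairing)."""
    return rna.translate(_COMPLEMENT)[::-1]

def forms_panhandle(five_prime: str, three_prime: str, min_pairs: int = 8) -> bool:
    """True if at least min_pairs consecutive terminal bases of the 5' end can
    base-pair with the 3' end, letting the linear genome fold back on itself."""
    rc = reverse_complement(three_prime)
    pairs = 0
    for a, b in zip(five_prime, rc):
        if a != b:
            break
        pairs += 1
    return pairs >= min_pairs

# Hypothetical termini for illustration only:
five_end = "ACACAAAGAC"
three_end = reverse_complement(five_end)  # a perfectly complementary 3' end
print(forms_panhandle(five_end, three_end))  # True
```

A real analysis would allow G-U wobble pairs and internal mismatches (e.g. via an RNA-folding tool), but the exact-match sketch captures the underlying idea.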

figure 4

Identification of three new members in the order Bunyavirales . A The genome organization of CoBV, GtBV, and TaBV; the putative ORF for the viral RdRp is depicted by a green box, and the predicted conserved domain region is displayed in a gray box. B The complementary structures formed at the 5' and 3' ends of GtBV and TaBV. C ML phylogenetic tree of members of the order Bunyavirales . The best-fit model (VT + F + R8) was estimated using IQ-Tree model selection. The bootstrap value is shown at each branch, with the newly identified viruses represented in red font

Double-stranded RNA viruses

Partitiviridae

The Partitiviridae is a family of small, non-enveloped viruses, approximately 35–40 nm in diameter, with bisegmented double-stranded (ds) RNA genomes. Each segment is about 1.4–3.0 kb, giving a total genome size of about 4 kb [ 62 ]. The family is currently divided into five genera: Alphapartitivirus , Betapartitivirus , Cryspovirus , Deltapartitivirus and Gammapartitivirus . Each genus has characteristic hosts: plants or fungi for Alphapartitivirus and Betapartitivirus , fungi for Gammapartitivirus , plants for Deltapartitivirus , and protozoa for Cryspovirus [ 62 ]. A complete dsRNA1 sequence associated with the family Partitiviridae , named Neocallimastix californiae partitivirus (NcPV), was retrieved from Neocallimastix californiae (SRR15362281). BLASTp indicated that it shared its highest aa identity, 41.5%, with members of the genus Gammapartitivirus , and the phylogenetic tree constructed from RdRp confirmed that NcPV falls within this genus (Fig. 5). Typical members of the genus Gammapartitivirus have two genome segments, dsRNA1 and dsRNA2, encoding the RdRp and coat protein, respectively [ 62 ]. The dsRNA1 segment of NcPV measures 1,769 nt with a GC content of 35.8% and contains a single ORF encoding a 561 aa RdRp; a CDD search revealed a catalytic region spanning residues 119 to 427. Regrettably, only the complete dsRNA1 segment was obtained; according to the classification principles of the ICTV, the lack of information on dsRNA2 prevents us from proposing NcPV as a new species. It is worth noting that, according to the genus demarcation criteria ( https://ictv.global/report/chapter/partitiviridae/partitiviridae ), members of the genus Gammapartitivirus should have a dsRNA1 of 1645 to 1787 nt and an RdRp of 519 to 539 aa. The dsRNA1 of NcPV is 1,769 nt, but its RdRp is 561 aa, challenging this classification criterion. In fact, multiple strains have already exceeded this criterion, for example GenBank accessions WBW48344, UDL14336, and QKK35392.
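The genus bounds quoted above, and the way NcPV violates them, can be made explicit (the numeric bounds are those cited from the ICTV report; the helper name is ours):

```python
def fits_gammapartitivirus_bounds(dsrna1_len_nt: int, rdrp_len_aa: int) -> bool:
    """ICTV Gammapartitivirus genus criteria quoted in the text:
    dsRNA1 of 1645-1787 nt and an RdRp of 519-539 aa."""
    return 1645 <= dsrna1_len_nt <= 1787 and 519 <= rdrp_len_aa <= 539

# NcPV: its dsRNA1 of 1,769 nt is within bounds, but its 561 aa RdRp is not.
print(fits_gammapartitivirus_bounds(1769, 561))  # False
```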

figure 5

Identification of a new member in the family Partitiviridae . The genome organization of NcPV is depicted at the top; the putative ORF for the viral RdRp is depicted by a green box, and the predicted conserved domain region is displayed in a gray box. At the bottom is the ML phylogenetic tree of members of the family Partitiviridae . The best-fit model (VT + F + R4) was estimated using IQ-Tree model selection. The bootstrap value is shown at each branch, with the newly identified virus represented in red font

Long-term evolutionary relationships between fungal-associated viruses and hosts

Understanding the co-divergence history between viruses and hosts helps reveal patterns of virus transmission and infection and influences the biodiversity and stability of ecosystems. To explore the frequency of cross-species transmission and co-divergence among fungus-associated viruses, we constructed tanglegrams linking the phylogenetic trees of viral families and their respective hosts (Fig. 6A). Cross-species transmission (host jumping) consistently emerged as the most frequent evolutionary event among all groups of RNA viruses examined in this study (median, 66.79%; range, 60.00% to 79.07%) (Fig. 6B). This finding is highly consistent with the evolutionary patterns of RNA viruses recently identified by Mifsud et al. in their extensive transcriptome survey of plants [ 63 ]. Members of the families Botourmiaviridae (79.07%) and Deltaflexiviridae (72.41%) were most frequently involved in cross-species transmission. The frequencies of co-divergence (median, 20.19%; range, 6.98% to 27.78%), duplication (median, 10.60%; range, 0% to 22.45%), and extinction (median, 2.42%; range, 0% to 5.56%) events were progressively lower. Members of the family Benyviridae exhibited the highest frequency of co-divergence events, which also supports the findings reported by Mifsud et al.; certain studies propose that members of the Benyviridae are transmitted via zoospores of plasmodiophorid protists [ 64 ], and it is speculated that the ancestor of these viruses underwent interkingdom horizontal transfer between plants and protists over evolutionary time [ 65 ]. Members of the family Mitoviridae showed the highest frequency of duplication events, while members of the families Benyviridae and Partitiviridae showed the highest frequency of extinction events.
Not surprisingly, these results are shaped by our currently limited knowledge of virus–host relationships. On the one hand, viruses whose hosts could not be recognized from the published literature or from information provided by the authors were overlooked; on the other hand, the viruses recorded in reference databases represent just the tip of the iceberg of the entire virosphere. Larger sample sizes in the future should reshape this evolutionary landscape.
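The percentages reported above are per-family reconciliation event counts normalized to 100%. A sketch with made-up counts (the study reports only the resulting proportions, not these raw numbers):

```python
# Hypothetical reconciliation event counts for one viral family:
events = {"cross_species": 35, "co_divergence": 10, "duplication": 5, "extinction": 2}
total = sum(events.values())

# Convert counts to the percentage form used in Fig. 6B:
proportions = {name: round(100 * n / total, 2) for name, n in events.items()}
print(proportions)
```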

figure 6

Co-evolutionary analysis of virus and host. A Tanglegram of phylogenetic trees for virus orders/families and their hosts. Lines and branches are color-coded to indicate host clades. The cophylo function in phytools was employed to enhance congruence between the host (left) and virus (right) phylogenies. B Reconciliation analysis of virus groups. The bar chart illustrates the proportional range of possible evolutionary events, with the frequency of each event displayed at the top of its respective column

Our understanding of the interactions between fungi and their associated viruses has long been constrained by insufficient sampling of fungal species. Advances in metagenomics in recent decades have led to a rapid expansion of the known viral sequence space, but it is far from saturated. The diversity of hosts, the instability of viral genomes (especially those of RNA viruses), and the propensity to exchange genetic material with other host viruses all contribute to the unparalleled diversity of viral genomes [ 66 ]. Fungi are diverse and widely distributed in nature and are closely related to humans. A few fungi can parasitize immunocompromised humans, but their adverse effects are limited. As decomposers in the food chain, fungi break down the remains of plants and animals and maintain the material cycle of the biological world [ 67 ]. In agricultural production, many fungi are plant pathogens; about 80% of plant diseases are caused by fungi. However, little is currently known about the diversity of mycoviruses or how these viruses affect fungal phenotypes, fungus–host interactions, and virus evolution, and the sequencing depth of fungal libraries in most public databases only meets the needs of studying bacterial genomes. Sampling viruses from a greater diversity of fungal hosts should lead to new and improved evolutionary scenarios.

RNA viruses are widespread in deep-sea sediments [ 68 ], freshwater [ 69 ], sewage [ 70 ], and rhizosphere soils [ 71 ]. Compared with DNA viruses, RNA viruses are less conserved, prone to mutation, and able to transfer between different hosts, potentially forming highly divergent and unrecognized novel viruses. These characteristics make such viruses difficult to monitor. For a long time, all discovered mycoviruses were RNA viruses; it was not until 2010 that Yu et al. reported the first DNA virus in fungi, SsHADV-1 [ 72 ]. Subsequently, new fungus-associated DNA viruses have continually been identified [ 73 , 74 , 75 ]. Viruses have now been found in all major groups of fungi, and approximately 100 types of fungi can be infected by viruses; in some instances one virus can infect multiple fungi, or one fungus can be infected by several viruses simultaneously. The transmission of mycoviruses differs from that of animal and plant viruses and is mainly categorized into vertical and horizontal transmission [ 76 ]. Vertical transmission refers to the spread of a mycovirus to the next generation through the sexual or asexual spores of the fungus, while horizontal transmission refers to its spread from one strain to another through hyphal fusion. In the phylum Ascomycota , mycoviruses generally exhibit a low ability to transmit vertically through ascospores but are commonly transmitted vertically to progeny strains through asexual spores [ 77 ].

In this study, we identified two novel species belonging to different genera within the family Mitoviridae . Interestingly, both simultaneously infect the same fungus, Thielaviopsis ethacetica , the causal agent of pineapple sett rot disease in sugarcane [ 78 ]. A previous report likewise identified three different mitoviruses in Fusarium circinatum [ 79 ]. These findings suggest that there may be a certain level of adaptability or symbiosis among members of the family Mitoviridae . Benyviruses are typically considered to infect plants, but recent evidence suggests that they can also infect fungi, such as Agaricus bisporus [ 80 ], a notion further reinforced by the virus we discovered in Gaeumannomyces tritici . Moreover, members of the family Botourmiaviridae commonly exhibit a broad host range, with viruses closely related to CrBV capable of infecting members of Eukaryota , Viridiplantae , and Metazoa in addition to fungi (Supplementary Fig. 1). The LsDV identified in this study shared the closest phylogenetic relationship with a virus identified from Macrotermes carbonarius in southern Vietnam (17_N1 + N237) [ 81 ]. M. carbonarius is an open-air foraging species that collects plant litter and wood debris to cultivate fungi in fungal gardens [ 82 ]; termites may thus act as vectors, transmitting deltaflexiviruses to other fungi. Furthermore, the viruses we identified, typically associated with fungi, also deepen the connections with species from other kingdoms in the tanglegram tree. For example, while the Partitiviridae are naturally associated with fungi and plants, NcPV also shows close connections with Metazoa . Indeed, based largely on phylogenetic predictions, various eukaryotic viruses have been found to undergo horizontal transfer among plants, fungi, and animals [ 83 ].
Rice dwarf virus was demonstrated to infect both plants and insect vectors [ 84 ]; moreover, plant-infecting rhabdoviruses, tospoviruses, and tenuiviruses are now known to replicate and spread in vector insects and to shuttle between plants and animals [ 85 ]. Furthermore, Bian et al. demonstrated that plant virus infection enables Cryphonectria hypovirus 1 to undergo horizontal transfer from fungi to plants and to other heterologous fungal species [ 86 ].

Recent studies have greatly expanded the diversity of mycoviruses [ 87 , 88 ]. Gilbert et al. [ 20 ] investigated publicly available fungal transcriptomes from the subphylum Pezizomycotina and detected 52 novel mycoviruses; Myers et al. [ 18 ] employed both culture-based and transcriptome-mining approaches to identify 85 unique RNA viruses across 333 fungi; Ruiz-Padilla et al. identified 62 new mycoviral species from 248 Botrytis cinerea field isolates; and Zhou et al. identified 20 novel viruses from 90 fungal strains across four macrofungal species [ 89 ]. Compared with these studies, our work identified fewer novel viruses, possibly for the following reasons: 1) libraries from the same BioProject are usually from the same strains (or isolates), so the datasets collected for this study contain a degree of redundancy; 2) contigs shorter than 1,500 nt were discarded, potentially overlooking short viral molecules; 3) the threshold of 70% aa sequence identity may also have excluded certain viruses; and 4) poly(A)-enriched RNA-seq libraries are likely to miss non-polyadenylated RNA viral genomes.

Taxonomy is a dynamic science, evolving with improvements in analytical methods and the emergence of new data. Identifying and rectifying incorrect classifications when new information becomes available is an ongoing and inevitable process in today's rapidly expanding field of virology. For instance, in 1975, members of the genera Rubivirus and Alphavirus were initially grouped under the family Togaviridae ; however, in 2019, Rubivirus was reclassified into the family Matonaviridae due to recognized differences in transmission modes and virion structures [ 90 ]. Additionally, the conflicts between certain members of the genera Magoulivirus and Gammapartitivirus mentioned here and their current demarcation criteria (e.g., amino acid identity, nucleotide length thresholds) need to be reconsidered.

Taken together, these findings reveal the potential diversity and novelty within fungal-associated viral communities and highlight the genetic similarities among different fungal-associated viruses. They advance our understanding of fungal-associated viruses and underscore the importance of subsequent in-depth investigations into the interactions between fungi and viruses, which will shed light on the important roles of these viruses across the fungal kingdom.

Availability of data and materials

The data reported in this paper have been deposited in GenBase at the National Genomics Data Center, Beijing Institute of Genomics, Chinese Academy of Sciences / China National Center for Bioinformation, under accession numbers C_AA066339.1-C_AA066350.1, and are publicly accessible at https://ngdc.cncb.ac.cn/genbase . Please refer to Table  1 for details.

Leigh DM, Peranic K, Prospero S, Cornejo C, Curkovic-Perica M, Kupper Q, et al. Long-read sequencing reveals the evolutionary drivers of intra-host diversity across natural RNA mycovirus infections. Virus Evol. 2021;7(2):veab101. https://doi.org/10.1093/ve/veab101 . Epub 2022/03/19 PubMed PMID: 35299787; PubMed Central PMCID: PMCPMC8923234.

Ghabrial SA, Suzuki N. Viruses of plant pathogenic fungi. Annu Rev Phytopathol. 2009;47:353–84. https://doi.org/10.1146/annurev-phyto-080508-081932 . Epub 2009/04/30 PubMed PMID: 19400634.

Ghabrial SA, Caston JR, Jiang D, Nibert ML, Suzuki N. 50-plus years of fungal viruses. Virology. 2015;479–480:356–68. https://doi.org/10.1016/j.virol.2015.02.034 . Epub 2015/03/17 PubMed PMID: 25771805.

Chen YM, Sadiq S, Tian JH, Chen X, Lin XD, Shen JJ, et al. RNA viromes from terrestrial sites across China expand environmental viral diversity. Nat Microbiol. 2022;7(8):1312–23. https://doi.org/10.1038/s41564-022-01180-2 . Epub 2022/07/29 PubMed PMID: 35902778.

Pearson MN, Beever RE, Boine B, Arthur K. Mycoviruses of filamentous fungi and their relevance to plant pathology. Mol Plant Pathol. 2009;10(1):115–28. https://doi.org/10.1111/j.1364-3703.2008.00503.x . Epub 2009/01/24 PubMed PMID: 19161358; PubMed Central PMCID: PMCPMC6640375.

Santiago-Rodriguez TM, Hollister EB. Unraveling the viral dark matter through viral metagenomics. Front Immunol. 2022;13:1005107. https://doi.org/10.3389/fimmu.2022.1005107 . Epub 2022/10/04 PubMed PMID: 36189246; PubMed Central PMCID: PMCPMC9523745.

Srinivasiah S, Bhavsar J, Thapar K, Liles M, Schoenfeld T, Wommack KE. Phages across the biosphere: contrasts of viruses in soil and aquatic environments. Res Microbiol. 2008;159(5):349–57. https://doi.org/10.1016/j.resmic.2008.04.010 . Epub 2008/06/21 PubMed PMID: 18565737.

Guo W, Yan H, Ren X, Tang R, Sun Y, Wang Y, et al. Berberine induces resistance against tobacco mosaic virus in tobacco. Pest Manag Sci. 2020;76(5):1804–13. https://doi.org/10.1002/ps.5709 . Epub 2019/12/10 PubMed PMID: 31814252.

Villabruna N, Izquierdo-Lara RW, Schapendonk CME, de Bruin E, Chandler F, Thao TTN, et al. Profiling of humoral immune responses to norovirus in children across Europe. Sci Rep. 2022;12(1):14275. https://doi.org/10.1038/s41598-022-18383-6 . Epub 2022/08/23 PubMed PMID: 35995986.

Zhang Y, Gao J, Li Y. Diversity of mycoviruses in edible fungi. Virus Genes. 2022;58(5):377–91. https://doi.org/10.1007/s11262-022-01908-6 . Epub 2022/06/07 PubMed PMID: 35668282.

Shkoporov AN, Clooney AG, Sutton TDS, Ryan FJ, Daly KM, Nolan JA, et al. The human gut virome is highly diverse, stable, and individual specific. Cell Host Microbe. 2019;26(4):527–41. https://doi.org/10.1016/j.chom.2019.09.009 . Epub 2019/10/11 PubMed PMID: 31600503.

Botella L, Janousek J, Maia C, Jung MH, Raco M, Jung T. Marine Oomycetes of the Genus Halophytophthora harbor viruses related to Bunyaviruses. Front Microbiol. 2020;11:1467. https://doi.org/10.3389/fmicb.2020.01467 . Epub 2020/08/08 PubMed PMID: 32760358; PubMed Central PMCID: PMCPMC7375090.

Kotta-Loizou I. Mycoviruses and their role in fungal pathogenesis. Curr Opin Microbiol. 2021;63:10–8. https://doi.org/10.1016/j.mib.2021.05.007 . Epub 2021/06/09 PubMed PMID: 34102567.

Ellis LF, Kleinschmidt WJ. Virus-like particles of a fraction of statolon, a mould product. Nature. 1967;215(5101):649–50. https://doi.org/10.1038/215649a0 . Epub 1967/08/05 PubMed PMID: 6050227.

Banks GT, Buck KW, Chain EB, Himmelweit F, Marks JE, Tyler JM, et al. Viruses in fungi and interferon stimulation. Nature. 1968;218(5141):542–5. https://doi.org/10.1038/218542a0 . Epub 1968/05/11 PubMed PMID: 4967851.

Jia J, Fu Y, Jiang D, Mu F, Cheng J, Lin Y, et al. Interannual dynamics, diversity and evolution of the virome in Sclerotinia sclerotiorum from a single crop field. Virus Evol. 2021;7(1):veab032. https://doi.org/10.1093/ve/veab032 .

Mu F, Li B, Cheng S, Jia J, Jiang D, Fu Y, et al. Nine viruses from eight lineages exhibiting new evolutionary modes that co-infect a hypovirulent phytopathogenic fungus. Plos Pathog. 2021;17(8):e1009823. https://doi.org/10.1371/journal.ppat.1009823 . Epub 2021/08/25 PubMed PMID: 34428260; PubMed Central PMCID: PMCPMC8415603.

Myers JM, Bonds AE, Clemons RA, Thapa NA, Simmons DR, Carter-House D, et al. Survey of early-diverging lineages of fungi reveals abundant and diverse Mycoviruses. mBio. 2020;11(5):e02027. https://doi.org/10.1128/mBio.02027-20 . Epub 2020/09/10 PubMed PMID: 32900807; PubMed Central PMCID: PMCPMC7482067.

Ruiz-Padilla A, Rodriguez-Romero J, Gomez-Cid I, Pacifico D, Ayllon MA. Novel Mycoviruses discovered in the Mycovirome of a Necrotrophic fungus. MBio. 2021;12(3):e03705. https://doi.org/10.1128/mBio.03705-20 . Epub 2021/05/13 PubMed PMID: 33975945; PubMed Central PMCID: PMCPMC8262958.

Gilbert KB, Holcomb EE, Allscheid RL, Carrington JC. Hiding in plain sight: new virus genomes discovered via a systematic analysis of fungal public transcriptomes. Plos One. 2019;14(7):e0219207. https://doi.org/10.1371/journal.pone.0219207 . Epub 2019/07/25 PubMed PMID: 31339899; PubMed Central PMCID: PMCPMC6655640.

Khan HA, Telengech P, Kondo H, Bhatti MF, Suzuki N. Mycovirus hunting revealed the presence of diverse viruses in a single isolate of the Phytopathogenic fungus diplodia seriata from Pakistan. Front Cell Infect Microbiol. 2022;12:913619. https://doi.org/10.3389/fcimb.2022.913619 . Epub 2022/07/19 PubMed PMID: 35846770; PubMed Central PMCID: PMCPMC9277117.

Kotta-Loizou I, Coutts RHA. Mycoviruses in Aspergilli: a comprehensive review. Front Microbiol. 2017;8:1699. https://doi.org/10.3389/fmicb.2017.01699 . Epub 2017/09/22 PubMed PMID: 28932216; PubMed Central PMCID: PMCPMC5592211.

Garcia-Pedrajas MD, Canizares MC, Sarmiento-Villamil JL, Jacquat AG, Dambolena JS. Mycoviruses in biological control: from basic research to field implementation. Phytopathology. 2019;109(11):1828–39. https://doi.org/10.1094/PHYTO-05-19-0166-RVW . Epub 2019/08/10 PubMed PMID: 31398087.

Rigling D, Prospero S. Cryphonectria parasitica, the causal agent of chestnut blight: invasion history, population biology and disease control. Mol Plant Pathol. 2018;19(1):7–20. https://doi.org/10.1111/mpp.12542 . Epub 2017/02/01 PubMed PMID: 28142223; PubMed Central PMCID: PMCPMC6638123.

Okada R, Ichinose S, Takeshita K, Urayama SI, Fukuhara T, Komatsu K, et al. Molecular characterization of a novel mycovirus in Alternaria alternata manifesting two-sided effects: down-regulation of host growth and up-regulation of host plant pathogenicity. Virology. 2018;519:23–32. https://doi.org/10.1016/j.virol.2018.03.027 . Epub 2018/04/10 PubMed PMID: 29631173.

Langmead B, Salzberg SL. Fast gapped-read alignment with Bowtie 2. Nat Methods. 2012;9(4):357–9. https://doi.org/10.1038/nmeth.1923 . Epub 2012/03/06 PubMed PMID: 22388286; PubMed Central PMCID: PMCPMC3322381.

Prjibelski A, Antipov D, Meleshko D, Lapidus A, Korobeynikov A. Using SPAdes De Novo assembler. Curr Protoc Bioinform. 2020;70(1):e102. https://doi.org/10.1002/cpbi.102 . Epub 2020/06/20 PubMed PMID: 32559359.

Li D, Luo R, Liu CM, Leung CM, Ting HF, Sadakane K, et al. MEGAHIT v1.0: A fast and scalable metagenome assembler driven by advanced methodologies and community practices. Methods. 2016;102:3–11. https://doi.org/10.1016/j.ymeth.2016.02.020 . Epub 2016/03/26 PubMed PMID: 27012178.

Li H, Durbin R. Fast and accurate long-read alignment with Burrows-Wheeler transform. Bioinformatics. 2010;26(5):589–95. https://doi.org/10.1093/bioinformatics/btp698 . Epub 2010/01/19 PubMed PMID: 20080505; PubMed Central PMCID: PMCPMC2828108.

Roux S, Paez-Espino D, Chen IA, Palaniappan K, Ratner A, Chu K, et al. IMG/VR v3: an integrated ecological and evolutionary framework for interrogating genomes of uncultivated viruses. Nucleic Acids Res. 2021;49(D1):D764–75. https://doi.org/10.1093/nar/gkaa946 . Epub 2020/11/03 PubMed PMID: 33137183; PubMed Central PMCID: PMCPMC7778971.

Mirdita M, Steinegger M, Soding J. MMseqs2 desktop and local web server app for fast, interactive sequence searches. Bioinformatics. 2019;35(16):2856–8. https://doi.org/10.1093/bioinformatics/bty1057 . Epub 2019/01/08 PubMed PMID: 30615063; PubMed Central PMCID: PMCPMC6691333.

Buchfink B, Reuter K, Drost HG. Sensitive protein alignments at tree-of-life scale using DIAMOND. Nat Methods. 2021;18(4):366–8. https://doi.org/10.1038/s41592-021-01101-x . Epub 2021/04/09 PubMed PMID: 33828273; PubMed Central PMCID: PMCPMC8026399.

Shen W, Ren H. TaxonKit: A practical and efficient NCBI taxonomy toolkit. J Genet Genomics. 2021;48(9):844–50. https://doi.org/10.1016/j.jgg.2021.03.006 . Epub 2021/05/19 PubMed PMID: 34001434.

Gautam A, Felderhoff H, Bagci C, Huson DH. Using AnnoTree to get more assignments, faster, in DIAMOND+MEGAN microbiome analysis. mSystems. 2022;7(1):e0140821. https://doi.org/10.1128/msystems.01408-21 . Epub 2022/02/23 PubMed PMID: 35191776; PubMed Central PMCID: PMCPMC8862659.

Kumar S, Stecher G, Li M, Knyaz C, Tamura K. MEGA X: molecular evolutionary genetics analysis across computing platforms. Mol Biol Evol. 2018;35(6):1547–9. https://doi.org/10.1093/molbev/msy096 . Epub 2018/05/04 PubMed PMID: 29722887; PubMed Central PMCID: PMCPMC5967553.

Minh BQ, Schmidt HA, Chernomor O, Schrempf D, Woodhams MD, von Haeseler A, et al. IQ-TREE 2: new models and efficient methods for phylogenetic inference in the genomic era. Mol Biol Evol. 2020;37(5):1530–4. https://doi.org/10.1093/molbev/msaa015 . Epub 2020/02/06 PubMed PMID: 32011700; PubMed Central PMCID: PMCPMC7182206.

Letunic I, Bork P. Interactive Tree of Life (iTOL) v6: recent updates to the phylogenetic tree display and annotation tool. Nucleic Acids Res. 2024. https://doi.org/10.1093/nar/gkae268 .

Muhire BM, Varsani A, Martin DP. SDT: a virus classification tool based on pairwise sequence alignment and identity calculation. Plos One. 2014;9(9):e108277. https://doi.org/10.1371/journal.pone.0108277 . Epub 2014/09/27 PubMed PMID: 25259891; PubMed Central PMCID: PMCPMC4178126.

Kumar S, Suleski M, Craig JM, Kasprowicz AE, Sanderford M, Li M, et al. TimeTree 5: an expanded resource for species divergence times. Mol Biol Evol. 2022;39(8):msac174. https://doi.org/10.1093/molbev/msac174 . Epub 2022/08/07 PubMed PMID: 35932227; PubMed Central PMCID: PMCPMC9400175.

Revell LJ. phytools 2.0: an updated R ecosystem for phylogenetic comparative methods (and other things). PeerJ. 2024;12:e16505. https://doi.org/10.7717/peerj.16505 . Epub 2024/01/09 PubMed PMID: 38192598; PubMed Central PMCID: PMCPMC10773453.

Santichaivekin S, Yang Q, Liu J, Mawhorter R, Jiang J, Wesley T, et al. eMPRess: a systematic cophylogeny reconciliation tool. Bioinformatics. 2021;37(16):2481–2. https://doi.org/10.1093/bioinformatics/btaa978 . Epub 2020/11/21 PubMed PMID: 33216126.

Ma W, Smirnov D, Libeskind-Hadas R. DTL reconciliation repair. BMC Bioinformatics. 2017;18(Suppl 3):76. https://doi.org/10.1186/s12859-017-1463-9 . Epub 2017/04/01 PubMed PMID: 28361686; PubMed Central PMCID: PMCPMC5374596.

CNCB-NGDC Members and Partners. Database resources of the National Genomics Data Center, China National Center for Bioinformation in 2024. Nucleic Acids Res. 2024;52(D1):D18–32. https://doi.org/10.1093/nar/gkad1078 . Epub 2023/11/29 PubMed PMID: 38018256; PubMed Central PMCID: PMCPMC10767964.

Shafik K, Umer M, You H, Aboushedida H, Wang Z, Ni D, et al. Characterization of a novel Mitovirus infecting Melanconiella theae isolated from tea plants. Front Microbiol. 2021;12:757556. https://doi.org/10.3389/fmicb.2021.757556 . Epub 2021/12/07 PubMed PMID: 34867881; PubMed Central PMCID: PMCPMC8635788.

Kamaruzzaman M, He G, Wu M, Zhang J, Yang L, Chen W, et al. A novel Partitivirus in the Hypovirulent isolate QT5–19 of the plant pathogenic fungus Botrytis cinerea. Viruses. 2019;11(1):24. https://doi.org/10.3390/v11010024 . Epub 2019/01/06 PubMed PMID: 30609795; PubMed Central PMCID: PMCPMC6356794.

Akata I, Keskin E, Sahin E. Molecular characterization of a new mitovirus hosted by the ectomycorrhizal fungus Albatrellopsis flettii. Arch Virol. 2021;166(12):3449–54. https://doi.org/10.1007/s00705-021-05250-4 . Epub 2021/09/24 PubMed PMID: 34554305.

Walker PJ, Siddell SG, Lefkowitz EJ, Mushegian AR, Adriaenssens EM, Alfenas-Zerbini P, et al. Recent changes to virus taxonomy ratified by the international committee on taxonomy of viruses (2022). Arch Virol. 2022;167(11):2429–40. https://doi.org/10.1007/s00705-022-05516-5 . Epub 2022/08/24 PubMed PMID: 35999326; PubMed Central PMCID: PMCPMC10088433.

Alvarez-Quinto R, Grinstead S, Jones R, Mollov D. Complete genome sequence of a new mitovirus associated with walking iris (Trimezia northiana). Arch Virol. 2023;168(11):273. https://doi.org/10.1007/s00705-023-05901-8 . Epub 2023/10/17 PubMed PMID: 37845386.

Gilmer D, Ratti C, Ictv RC. ICTV Virus taxonomy profile: Benyviridae. J Gen Virol. 2017;98(7):1571–2. https://doi.org/10.1099/jgv.0.000864 . Epub 2017/07/18 PubMed PMID: 28714846; PubMed Central PMCID: PMCPMC5656776.

Wetzel V, Willlems G, Darracq A, Galein Y, Liebe S, Varrelmann M. The Beta vulgaris-derived resistance gene Rz2 confers broad-spectrum resistance against soilborne sugar beet-infecting viruses from different families by recognizing triple gene block protein 1. Mol Plant Pathol. 2021;22(7):829–42. https://doi.org/10.1111/mpp.13066 . Epub 2021/05/06 PubMed PMID: 33951264; PubMed Central PMCID: PMCPMC8232027.

Ayllon MA, Turina M, Xie J, Nerva L, Marzano SL, Donaire L, et al. ICTV Virus taxonomy profile: Botourmiaviridae. J Gen Virol. 2020;101(5):454–5. https://doi.org/10.1099/jgv.0.001409 . Epub 2020/05/08 PubMed PMID: 32375992; PubMed Central PMCID: PMCPMC7414452.

Xiao J, Wang X, Zheng Z, Wu Y, Wang Z, Li H, et al. Molecular characterization of a novel deltaflexivirus infecting the edible fungus Pleurotus ostreatus. Arch Virol. 2023;168(6):162. https://doi.org/10.1007/s00705-023-05789-4 . Epub 2023/05/17 PubMed PMID: 37195309.

Canuti M, Rodrigues B, Lang AS, Dufour SC, Verhoeven JTP. Novel divergent members of the Kitrinoviricota discovered through metagenomics in the intestinal contents of red-backed voles (Clethrionomys gapperi). Int J Mol Sci. 2022;24(1):131. https://doi.org/10.3390/ijms24010131 . Epub 2023/01/09 PubMed PMID: 36613573; PubMed Central PMCID: PMCPMC9820622.

Hermanns K, Zirkel F, Kopp A, Marklewitz M, Rwego IB, Estrada A, et al. Discovery of a novel alphavirus related to Eilat virus. J Gen Virol. 2017;98(1):43–9. https://doi.org/10.1099/jgv.0.000694 . Epub 2017/02/17 PubMed PMID: 28206905.

Jiang D, Ayllon MA, Marzano SL, Ictv RC. ICTV Virus taxonomy profile: Mymonaviridae. J Gen Virol. 2019;100(10):1343–4. https://doi.org/10.1099/jgv.0.001301 . Epub 2019/09/04 PubMed PMID: 31478828.

Liu L, Xie J, Cheng J, Fu Y, Li G, Yi X, et al. Fungal negative-stranded RNA virus that is related to bornaviruses and nyaviruses. Proc Natl Acad Sci U S A. 2014;111(33):12205–10. https://doi.org/10.1073/pnas.1401786111 . Epub 2014/08/06 PubMed PMID: 25092337; PubMed Central PMCID: PMCPMC4143027.

Zhong J, Li P, Gao BD, Zhong SY, Li XG, Hu Z, et al. Novel and diverse mycoviruses co-infecting a single strain of the phytopathogenic fungus Alternaria dianthicola. Front Cell Infect Microbiol. 2022;12:980970. https://doi.org/10.3389/fcimb.2022.980970 . Epub 2022/10/15 PubMed PMID: 36237429; PubMed Central PMCID: PMCPMC9552818.

Wang W, Wang X, Tu C, Yang M, Xiang J, Wang L, et al. Novel Mycoviruses discovered from a Metatranscriptomics survey of the Phytopathogenic Alternaria Fungus. Viruses. 2022;14(11):2552. https://doi.org/10.3390/v14112552 . Epub 2022/11/25 PubMed PMID: 36423161; PubMed Central PMCID: PMCPMC9693364.

Sun Y, Li J, Gao GF, Tien P, Liu W. Bunyavirales ribonucleoproteins: the viral replication and transcription machinery. Crit Rev Microbiol. 2018;44(5):522–40. https://doi.org/10.1080/1040841X.2018.1446901 . Epub 2018/03/09 PubMed PMID: 29516765.

Li P, Bhattacharjee P, Gagkaeva T, Wang S, Guo L. A novel bipartite negative-stranded RNA mycovirus of the order Bunyavirales isolated from the phytopathogenic fungus Fusarium sibiricum. Arch Virol. 2023;169(1):13. https://doi.org/10.1007/s00705-023-05942-z . Epub 2023/12/29 PubMed PMID: 38155262.

Ferron F, Weber F, de la Torre JC, Reguera J. Transcription and replication mechanisms of Bunyaviridae and Arenaviridae L proteins. Virus Res. 2017;234:118–34. https://doi.org/10.1016/j.virusres.2017.01.018 . Epub 2017/02/01 PubMed PMID: 28137457; PubMed Central PMCID: PMCPMC7114536.

Vainio EJ, Chiba S, Ghabrial SA, Maiss E, Roossinck M, Sabanadzovic S, et al. ICTV Virus taxonomy profile: Partitiviridae. J Gen Virol. 2018;99(1):17–8. https://doi.org/10.1099/jgv.0.000985 . Epub 2017/12/08 PubMed PMID: 29214972; PubMed Central PMCID: PMCPMC5882087.

Mifsud JCO, Gallagher RV, Holmes EC, Geoghegan JL. Transcriptome mining expands knowledge of RNA viruses across the plant Kingdom. J Virol. 2022;96(24):e0026022. https://doi.org/10.1128/jvi.00260-22 . Epub 2022/06/01 PubMed PMID: 35638822; PubMed Central PMCID: PMCPMC9769393.

Tamada T, Kondo H. Biological and genetic diversity of plasmodiophorid-transmitted viruses and their vectors. J Gen Plant Pathol. 2013;79:307–20.

Dolja VV, Krupovic M, Koonin EV. Deep roots and splendid boughs of the global plant virome. Annu Rev Phytopathol. 2020;58:23–53.

Koonin EV, Dolja VV, Krupovic M, Varsani A, Wolf YI, Yutin N, et al. Global organization and proposed Megataxonomy of the virus world. Microbiol Mol Biol Rev. 2020;84(2):e00061. https://doi.org/10.1128/MMBR.00061-19 . Epub 2020/03/07 PubMed PMID: 32132243; PubMed Central PMCID: PMCPMC7062200.

Osono T. Role of phyllosphere fungi of forest trees in the development of decomposer fungal communities and decomposition processes of leaf litter. Can J Microbiol. 2006;52(8):701–16. https://doi.org/10.1139/w06-023 . Epub 2006/08/19 PubMed PMID: 16917528.

Li Z, Pan D, Wei G, Pi W, Zhang C, Wang JH, et al. Deep sea sediments associated with cold seeps are a subsurface reservoir of viral diversity. ISME J. 2021;15(8):2366–78. https://doi.org/10.1038/s41396-021-00932-y . Epub 2021/03/03 PubMed PMID: 33649554; PubMed Central PMCID: PMCPMC8319345.

Hierweger MM, Koch MC, Rupp M, Maes P, Di Paola N, Bruggmann R, et al. Novel Filoviruses, Hantavirus, and Rhabdovirus in freshwater fish, Switzerland, 2017. Emerg Infect Dis. 2021;27(12):3082–91. https://doi.org/10.3201/eid2712.210491 . Epub 2021/11/23 PubMed PMID: 34808081; PubMed Central PMCID: PMCPMC8632185.

La Rosa G, Iaconelli M, Mancini P, Bonanno Ferraro G, Veneri C, Bonadonna L, et al. First detection of SARS-CoV-2 in untreated wastewaters in Italy. Sci Total Environ. 2020;736:139652. https://doi.org/10.1016/j.scitotenv.2020.139652 . Epub 2020/05/29 PubMed PMID: 32464333; PubMed Central PMCID: PMCPMC7245320.

Sutela S, Poimala A, Vainio EJ. Viruses of fungi and oomycetes in the soil environment. FEMS Microbiol Ecol. 2019;95(9):fiz119. https://doi.org/10.1093/femsec/fiz119 . Epub 2019/08/01 PubMed PMID: 31365065.

Yu X, Li B, Fu Y, Jiang D, Ghabrial SA, Li G, et al. A geminivirus-related DNA mycovirus that confers hypovirulence to a plant pathogenic fungus. Proc Natl Acad Sci U S A. 2010;107(18):8387–92. https://doi.org/10.1073/pnas.0913535107 . Epub 2010/04/21 PubMed PMID: 20404139; PubMed Central PMCID: PMCPMC2889581.

Li P, Wang S, Zhang L, Qiu D, Zhou X, Guo L. A tripartite ssDNA mycovirus from a plant pathogenic fungus is infectious as cloned DNA and purified virions. Sci Adv. 2020;6(14):eaay9634. https://doi.org/10.1126/sciadv.aay9634 . Epub 2020/04/15 PubMed PMID: 32284975; PubMed Central PMCID: PMCPMC7138691.

Khalifa ME, MacDiarmid RM. A mechanically transmitted DNA Mycovirus is targeted by the defence machinery of its host, Botrytis cinerea. Viruses. 2021;13(7):1315. https://doi.org/10.3390/v13071315 . Epub 2021/08/11 PubMed PMID: 34372522; PubMed Central PMCID: PMCPMC8309985.

Yu X, Li B, Fu Y, Xie J, Cheng J, Ghabrial SA, et al. Extracellular transmission of a DNA mycovirus and its use as a natural fungicide. Proc Natl Acad Sci U S A. 2013;110(4):1452–7. https://doi.org/10.1073/pnas.1213755110 . Epub 2013/01/09 PubMed PMID: 23297222; PubMed Central PMCID: PMCPMC3557086.

Nuss DL. Hypovirulence: mycoviruses at the fungal-plant interface. Nat Rev Microbiol. 2005;3(8):632–42. https://doi.org/10.1038/nrmicro1206 . Epub 2005/08/03 PubMed PMID: 16064055.

Coenen A, Kevei F, Hoekstra RF. Factors affecting the spread of double-stranded RNA viruses in Aspergillus nidulans. Genet Res. 1997;69(1):1–10. https://doi.org/10.1017/s001667239600256x . Epub 1997/02/01 PubMed PMID: 9164170.

Freitas CSA, Maciel LF, Dos Correa Santos RA, Costa O, Maia FCB, Rabelo RS, et al. Bacterial volatile organic compounds induce adverse ultrastructural changes and DNA damage to the sugarcane pathogenic fungus Thielaviopsis ethacetica. Environ Microbiol. 2022;24(3):1430–53. https://doi.org/10.1111/1462-2920.15876 . Epub 2022/01/08 PubMed PMID: 34995419.

Martinez-Alvarez P, Vainio EJ, Botella L, Hantula J, Diez JJ. Three mitovirus strains infecting a single isolate of Fusarium circinatum are the first putative members of the family Narnaviridae detected in a fungus of the genus Fusarium. Arch Virol. 2014;159(8):2153–5. https://doi.org/10.1007/s00705-014-2012-8 . Epub 2014/02/13 PubMed PMID: 24519462.

Deakin G, Dobbs E, Bennett JM, Jones IM, Grogan HM, Burton KS. Multiple viral infections in Agaricus bisporus - characterisation of 18 unique RNA viruses and 8 ORFans identified by deep sequencing. Sci Rep. 2017;7(1):2469. https://doi.org/10.1038/s41598-017-01592-9 . Epub 2017/05/28 PubMed PMID: 28550284; PubMed Central PMCID: PMCPMC5446422.

Litov AG, Zueva AI, Tiunov AV, Van Thinh N, Belyaeva NV, Karganova GG. Virome of three termite species from Southern Vietnam. Viruses. 2022;14(5):860. https://doi.org/10.3390/v14050860 . Epub 2022/05/29 PubMed PMID: 35632601; PubMed Central PMCID: PMCPMC9143207.

Hu J, Neoh KB, Appel AG, Lee CY. Subterranean termite open-air foraging and tolerance to desiccation: Comparative water relation of two sympatric Macrotermes spp. (Blattodea: Termitidae). Comp Biochem Physiol A Mol Integr Physiol. 2012;161(2):201–7. https://doi.org/10.1016/j.cbpa.2011.10.028 . Epub 2011/11/17 PubMed PMID: 22085890.

Kondo H, Botella L, Suzuki N. Mycovirus diversity and evolution revealed/inferred from recent studies. Annu Rev Phytopathol. 2022;60:307–36. https://doi.org/10.1146/annurev-phyto-021621-122122 . Epub 2022/05/25 PubMed PMID: 35609970.

Fukushi T. Relationships between propagative rice viruses and their vectors. 1969.

Sun L, Kondo H, Andika IB. Cross-kingdom virus infection. In: Encyclopedia of Virology, 4th ed. Elsevier; 2020. pp. 443–9. https://doi.org/10.1016/B978-0-12-809633-8.21320-4 .

Bian R, Andika IB, Pang T, Lian Z, Wei S, Niu E, et al. Facilitative and synergistic interactions between fungal and plant viruses. Proc Natl Acad Sci U S A. 2020;117(7):3779–88. https://doi.org/10.1073/pnas.1915996117 . Epub 2020/02/06 PubMed PMID: 32015104; PubMed Central PMCID: PMCPMC7035501.

Chiapello M, Rodriguez-Romero J, Ayllon MA, Turina M. Analysis of the virome associated to grapevine downy mildew lesions reveals new mycovirus lineages. Virus Evol. 2020;6(2):veaa058. https://doi.org/10.1093/ve/veaa058 . Epub 2020/12/17 PubMed PMID: 33324489; PubMed Central PMCID: PMCPMC7724247.

Sutela S, Forgia M, Vainio EJ, Chiapello M, Daghino S, Vallino M, et al. The virome from a collection of endomycorrhizal fungi reveals new viral taxa with unprecedented genome organization. Virus Evol. 2020;6(2):veaa076. https://doi.org/10.1093/ve/veaa076 . Epub 2020/12/17 PubMed PMID: 33324490; PubMed Central PMCID: PMCPMC7724248.

Zhou K, Zhang F, Deng Y. Comparative analysis of viromes identified in multiple macrofungi. Viruses. 2024;16(4):597. https://doi.org/10.3390/v16040597 . Epub 2024/04/27 PubMed PMID: 38675938; PubMed Central PMCID: PMCPMC11054281.

Siddell SG, Smith DB, Adriaenssens E, Alfenas-Zerbini P, Dutilh BE, Garcia ML, et al. Virus taxonomy and the role of the International Committee on Taxonomy of Viruses (ICTV). J Gen Virol. 2023;104(5):001840. https://doi.org/10.1099/jgv.0.001840 . Epub 2023/05/04 PubMed PMID: 37141106; PubMed Central PMCID: PMCPMC10227694.

Acknowledgements

All authors participated in the design and interpretation of the studies, analysis of the data, and review of the manuscript. WZ and CZ contributed to the conception and design; XL, ZD, JXU, WL and PN contributed to the collection and assembly of data; XL, ZD and JXE contributed to data analysis and interpretation.

This research was supported by the National Key Research and Development Programs of China [Nos. 2023YFD1801301 and 2022YFC2603801] and the National Natural Science Foundation of China [No. 82341106].

Author information

Xiang Lu, Ziyuan Dai and Jiaxin Xue contributed equally to this work.

Authors and Affiliations

Institute of Critical Care Medicine, The Affiliated People’s Hospital, Jiangsu University, Zhenjiang, 212002, China

Xiang Lu & Wen Zhang

Department of Microbiology, School of Medicine, Jiangsu University, Zhenjiang, 212013, China

Xiang Lu, Jiaxin Xue & Wen Zhang

Department of Clinical Laboratory, Affiliated Hospital 6 of Nantong University, Yancheng Third People’s Hospital, Yancheng, Jiangsu, China

Clinical Laboratory Center, The Affiliated Taizhou People’s Hospital of Nanjing Medical University, Taizhou, 225300, China

Wang Li, Ping Ni, Juan Xu, Chenglin Zhou & Wen Zhang

Corresponding authors

Correspondence to Juan Xu, Chenglin Zhou or Wen Zhang.

Ethics declarations

Ethics approval and consent to participate

Not applicable.

Consent for publication

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Supplementary material 1.

Supplementary material 2.

Supplementary material 3.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/ . The Creative Commons Public Domain Dedication waiver ( http://creativecommons.org/publicdomain/zero/1.0/ ) applies to the data made available in this article, unless otherwise stated in a credit line to the data.

About this article

Cite this article

Lu, X., Dai, Z., Xue, J. et al. Discovery of novel RNA viruses through analysis of fungi-associated next-generation sequencing data. BMC Genomics 25, 517 (2024). https://doi.org/10.1186/s12864-024-10432-w

Received: 19 March 2024

Accepted: 20 May 2024

Published: 27 May 2024

DOI: https://doi.org/10.1186/s12864-024-10432-w

BMC Genomics

ISSN: 1471-2164

analytical research techniques

COMMENTS

  1. Analytical Research: What is it, Importance + Examples

    Methods of Conducting Analytical Research. Analytical research is the process of gathering, analyzing, and interpreting information to make inferences and reach conclusions. Depending on the purpose of the research and the data you have access to, you can conduct analytical research using a variety of methods. Here are a few typical approaches:

  2. Research Methods

    Qualitative analysis tends to be quite flexible and relies on the researcher's judgement, so you have to reflect carefully on your choices and assumptions and be careful to avoid research bias. Quantitative analysis methods. Quantitative analysis uses numbers and statistics to understand frequencies, averages and correlations (in descriptive ...

  3. Data Analysis: Types, Methods & Techniques (a Complete List)

    Quantitative data analysis then splits into mathematical analysis and artificial intelligence (AI) analysis. Mathematical types then branch into descriptive, diagnostic, predictive, and prescriptive. Methods falling under mathematical analysis include clustering, classification, forecasting, and optimization.

  4. What is data analysis? Methods, techniques, types & how-to

    Gaining a better understanding of different techniques and methods in quantitative research as well as qualitative insights will give your analyzing efforts a more clearly defined direction, so it's worth taking the time to allow this particular knowledge to sink in. Additionally, you will be able to create a comprehensive analytical report that will skyrocket your analysis.

  5. Quantitative Data Analysis Methods & Techniques 101

    The two "branches" of quantitative analysis. As I mentioned, quantitative analysis is powered by statistical analysis methods.There are two main "branches" of statistical methods that are used - descriptive statistics and inferential statistics.In your research, you might only use descriptive statistics, or you might use a mix of both, depending on what you're trying to figure out.

  6. The Beginner's Guide to Statistical Analysis

    Table of contents. Step 1: Write your hypotheses and plan your research design. Step 2: Collect data from a sample. Step 3: Summarize your data with descriptive statistics. Step 4: Test hypotheses or make estimates with inferential statistics.

  7. Introduction to Data Analysis

    According to Creswell & Creswell, "qualitative research is an approach for exploring and understanding the meaning individuals or groups ascribe to a social or human problem" (2018, p. 4). Thus, qualitative analysis usually invokes inductive reasoning. Mixed methods research uses methods from both quantitative and qualitative research approaches.

  8. Data analysis

    data analysis, the process of systematically collecting, cleaning, transforming, describing, modeling, and interpreting data, generally employing statistical techniques. Data analysis is an important part of both scientific research and business, where demand has grown in recent years for data-driven decision making.Data analysis techniques are used to gain useful insights from datasets, which ...

  9. The 7 Most Useful Data Analysis Techniques [2024 Guide]

    Cluster analysis. Time series analysis. Sentiment analysis. The data analysis process. The best tools for data analysis. Key takeaways. The first six methods listed are used for quantitative data, while the last technique applies to qualitative data.

  10. What Is Data Analysis? (With Examples)

    Written by Coursera Staff • Updated on Apr 19, 2024. Data analysis is the practice of working with data to glean useful information, which can then be used to make informed decisions. "It is a capital mistake to theorize before one has data. Insensibly one begins to twist facts to suit theories, instead of theories to suit facts," Sherlock ...

  11. Data Analysis Techniques In Research

    Data analysis techniques in research are essential because they allow researchers to derive meaningful insights from data sets to support their hypotheses or research objectives.. Data Analysis Techniques in Research: While various groups, institutions, and professionals may have diverse approaches to data analysis, a universal definition captures its essence.

  12. Basic statistical tools in research and data analysis

    Statistical methods involved in carrying out a study include planning, designing, collecting data, analysing, drawing meaningful interpretation and reporting of the research findings. The statistical analysis gives meaning to the meaningless numbers, thereby breathing life into a lifeless data. The results and inferences are precise only if ...

  13. Data Analysis

    Data Analysis. Definition: Data analysis refers to the process of inspecting, cleaning, transforming, and modeling data with the goal of discovering useful information, drawing conclusions, and supporting decision-making. It involves applying various statistical and computational techniques to interpret and derive insights from large datasets.

  14. A Complete Overview of Analytics Techniques: Descriptive ...

    The future of businesses is very much dependent on big data. This chapter reflects on the three types of analytics techniques used while discovering, interpreting, and communicating the meaningful patterns and trends in data, i.e., descriptive, predictive, and prescriptive analytics. ... Analytics is a multidimensional discipline which involves ...

  15. Learning to Do Qualitative Data Analysis: A Starting Point

    For many researchers unfamiliar with qualitative research, determining how to conduct qualitative analyses is often quite challenging. Part of this challenge is due to the seemingly limitless approaches that a qualitative researcher might leverage, as well as simply learning to think like a qualitative researcher when analyzing data. From framework analysis (Ritchie & Spencer, 1994) to content ...

  16. Quantitative Research

    Statistical analysis is the most common quantitative research analysis method. It involves using statistical tools and techniques to analyze the numerical data collected during the research process. Statistical analysis can be used to identify patterns, trends, and relationships between variables, and to test hypotheses and theories. Regression ...

  17. Introduction to systematic review and meta-analysis

    It is easy to confuse systematic reviews and meta-analyses. A systematic review is an objective, reproducible method to find answers to a certain research question, by collecting all available studies related to that question and reviewing and analyzing their results. A meta-analysis differs from a systematic review in that it uses statistical ...

  18. Data Analytics: What It Is, How It's Used, and 4 Basic Techniques

    Data analytics is the science of drawing insights from sources of raw information. Many of the techniques and process of data analytics have been automated into mechanical processes and algorithms ...

  19. Research Techniques

    Some common methods of research techniques are: Quantitative research: This is a research method that focuses on collecting and analyzing numerical data to establish patterns, relationships, and cause-and-effect relationships. Examples of quantitative research techniques are surveys, experiments, and statistical analysis.

  20. How to use and assess qualitative research methods

    Abstract. This paper aims to provide an overview of the use and assessment of qualitative research methods in the health sciences. Qualitative research can be defined as the study of the nature of phenomena and is especially appropriate for answering questions of why something is (not) observed, assessing complex multi-component interventions ...

  21. Descriptive and Analytical Research: What's the Difference?

    Descriptive research classifies, describes, compares, and measures data. Meanwhile, analytical research focuses on cause and effect. For example, take numbers on the changing trade deficits between the United States and the rest of the world in 2015-2018. This is descriptive research.

  22. Weighing Mixing in a Decision About Priority in Mixed Methods Research

    The Routledge Reviewer's Guide to Mixed Methods Analysis. Routledge. Pedersen M., Overgaard D., Anderson I., Baastrup M. (2018). Mechanisms and drivers of social inequality in phase II cardiac rehabilitation attendance: A convergent mixed methods study. ... Journal of Mixed Methods Research, 14(3), 379-402. https ...

  23. Electrogastrography Measurement Systems and Analysis Methods Used in

    Electrogastrography (EGG) is a non-invasive method with high diagnostic potential for the prevention of gastroenterological pathologies in clinical practice. In this paper, a review of the measurement systems, procedures, and methods of analysis used in electrogastrography is presented. A critical review of historical and current literature is conducted, focusing on electrode placement ...

  24. Application of Data Analytic Techniques and Monte-Carlo ...

    Prediction of well production from unconventional reservoirs is a complex problem even with considerable amounts of data, especially due to uncertainties and an incomplete understanding of the physics. Data analytic techniques (DAT) with machine learning algorithms are an effective approach to enhance solution reliability for robust forward recovery forecasting from unconventional resources. However ...

  25. Discovery of novel RNA viruses through analysis of fungi-associated

    Background: Like all other species, fungi are susceptible to infection by viruses. The diversity of fungal viruses has been rapidly expanding in recent years due to the availability of advanced sequencing technologies. However, compared to other virome studies, the research on fungi-associated viruses remains limited. Results: In this study, we downloaded and analyzed over 200 public datasets ...

  26. Analytical Procedure for Calculating Impulsive Responses on Floor

    An assessment of analytical methods for the vibration analysis of reinforced concrete (RC) precast one-way joist slab floor systems under human walking is presented.

  27. Evaluation of Bitemark Analysis's Potential Application in Forensic

    Bitemark analysis maintains significant importance in forensic odontology, as it can wield substantial influence, whether within a legal framework or in evaluating the well-being of children considered to be at risk []. Bitemarks act as impressions created by teeth on the skin or other flexible surfaces [2,3]. Bitemark analysis involves examining both the patterned injury and the surrounding ...

  28. Analysis of the retraction papers in oncology field from Chinese scholars

    ... and annual distribution, article types, reasons for retraction, retraction time delay, publishers, and journal characteristics of the retracted papers were analyzed. Results: A total of 2695 oncology papers from Chinese scholars published from 2013 to 2022 had been retracted. The majority of these papers were published from 2017 to 2020. In terms of article type, 2538 of the retracted papers ...

  29. Adding an Aquatic Prey Fish Module within the Everglades Vulnerability

    Ridge and slough landscape within the Everglades. Methodology for Addressing the Issue: We will use Bayesian networks to build a spatially explicit EVA module based on current knowledge and existing data on fish density and biomass trends on the landscape. Everglades fish experts will be consulted to determine the best model parameterizations and ...