
Descriptive Statistics | Definitions, Types, Examples

Published on 4 November 2022 by Pritha Bhandari. Revised on 9 January 2023.

Descriptive statistics summarise and organise characteristics of a data set. A data set is a collection of responses or observations from a sample or entire population .

In quantitative research , after collecting data, the first step of statistical analysis is to describe characteristics of the responses, such as the average of one variable (e.g., age), or the relation between two variables (e.g., age and creativity).

The next step is inferential statistics , which help you decide whether your data confirms or refutes your hypothesis and whether it is generalisable to a larger population.

Table of contents

  • Types of descriptive statistics
  • Frequency distribution
  • Measures of central tendency
  • Measures of variability
  • Univariate descriptive statistics
  • Bivariate descriptive statistics
  • Frequently asked questions

There are 3 main types of descriptive statistics:

  • The distribution concerns the frequency of each value.
  • The central tendency concerns the averages of the values.
  • The variability or dispersion concerns how spread out the values are.

Types of descriptive statistics

You can apply these to assess only one variable at a time, in univariate analysis, or to compare two or more, in bivariate and multivariate analysis.

For example, imagine a survey that asks participants how many times in the past year they did each of the following:

  • Go to a library
  • Watch a movie at a theater
  • Visit a national park

A data set is made up of a distribution of values, or scores. In tables or graphs, you can summarise the frequency of every possible value of a variable in numbers or percentages.

  • Simple frequency distribution table
  • Grouped frequency distribution table

From such a table, you might see that more women than men or people with another gender identity took part in the study. In a grouped frequency distribution, you can group numerical response values and add up the number of responses for each group. You can also convert each of these numbers to percentages.

Measures of central tendency estimate the center, or average, of a data set. The mean, median and mode are 3 ways of finding the average.

Here we will demonstrate how to calculate the mean, median, and mode using the first 6 responses of our survey.

The mean, or M, is the most commonly used method for finding the average.

To find the mean, simply add up all response values and divide the sum by the total number of responses. The total number of responses or observations is called N .

The median is the value that’s exactly in the middle of a data set.

To find the median, order each response value from the smallest to the biggest. Then, the median is the number in the middle. If there are two numbers in the middle, find their mean.

The mode is simply the most popular or most frequent response value. A data set can have no mode, one mode, or more than one mode.

To find the mode, order your data set from lowest to highest and find the response that occurs most frequently.
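As a quick illustration of these three measures, here is a minimal Python sketch using the standard library’s statistics module. The six response values are hypothetical stand-ins, not the survey data described in the article.

```python
import statistics

# Hypothetical stand-ins for the first 6 survey responses (not the article's data)
responses = [15, 3, 12, 0, 24, 3]

mean = statistics.mean(responses)      # sum of all values divided by N
median = statistics.median(responses)  # middle value; mean of the two middle values when N is even
mode = statistics.mode(responses)      # most frequent value

print(f"N = {len(responses)}, mean = {mean}, median = {median}, mode = {mode}")
# N = 6, mean = 9.5, median = 7.5, mode = 3
```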

Measures of variability give you a sense of how spread out the response values are. The range, standard deviation and variance each reflect different aspects of spread.

The range gives you an idea of how far apart the most extreme response scores are. To find the range , simply subtract the lowest value from the highest value.

Standard deviation

The standard deviation ( s ) is the average amount of variability in your dataset. It tells you, on average, how far each score lies from the mean. The larger the standard deviation, the more variable the data set is.

There are six steps for finding the standard deviation:

  • List each score and find their mean.
  • Subtract the mean from each score to get the deviation from the mean.
  • Square each of these deviations.
  • Add up all of the squared deviations.
  • Divide the sum of the squared deviations by N – 1.
  • Find the square root of the number you found.

For example, suppose the squared deviations across the six responses sum to 421.5:

Step 5: 421.5/5 = 84.3

Step 6: √84.3 ≈ 9.18

The standard deviation of this data set is therefore about 9.18. The sketch below walks through all six steps in code.
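The following is a minimal Python sketch of the six steps, using a hypothetical set of six scores chosen so that the squared deviations sum to 421.5, matching the worked figures above. The statistics.stdev call applies the same N - 1 formula and is included only as a check.

```python
import math
import statistics

# Hypothetical scores standing in for the six survey responses
scores = [15, 3, 12, 0, 24, 3]

# Step 1: list each score and find their mean
mean = sum(scores) / len(scores)

# Step 2: subtract the mean from each score to get the deviations
deviations = [x - mean for x in scores]

# Step 3: square each deviation
squared = [d ** 2 for d in deviations]

# Step 4: add up all of the squared deviations
total = sum(squared)          # 421.5 for these scores

# Step 5: divide by N - 1 (this is the sample variance)
variance = total / (len(scores) - 1)

# Step 6: take the square root (the sample standard deviation)
std_dev = math.sqrt(variance)

print(round(std_dev, 2))                   # 9.18, manual calculation
print(round(statistics.stdev(scores), 2))  # 9.18, library check using the same N - 1 formula
```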

The variance is the average of squared deviations from the mean. Variance reflects the degree of spread in the data set. The more spread the data, the larger the variance is in relation to the mean.

To find the variance, simply square the standard deviation. The symbol for variance is s².

Univariate descriptive statistics focus on only one variable at a time. It’s important to examine data from each variable separately using multiple measures of distribution, central tendency and spread. Programs like SPSS and Excel can be used to easily calculate these.

If you were to consider only the mean as a measure of central tendency, your impression of the ‘middle’ of the data set could be skewed by outliers, whereas the median and mode are not affected in the same way.

Likewise, while the range is sensitive to extreme values, you should also consider the standard deviation and variance to get easily comparable measures of spread.

If you’ve collected data on more than one variable, you can use bivariate or multivariate descriptive statistics to explore whether there are relationships between them.

In bivariate analysis, you simultaneously study the frequency and variability of two variables to see if they vary together. You can also compare the central tendency of the two variables before performing further statistical tests .

Multivariate analysis is the same as bivariate analysis but with more than two variables.

Contingency table

In a contingency table, each cell represents the intersection of two variables. Usually, an independent variable (e.g., gender) appears along the vertical axis and a dependent one appears along the horizontal axis (e.g., activities). You read ‘across’ the table to see how the independent and dependent variables relate to each other.

Interpreting a contingency table is easier when the raw data is converted to percentages. Percentages make each row comparable to the others by making it seem as if each group had exactly 100 observations or participants. When creating a percentage-based contingency table, you also add the total N for each group of the independent variable at the end of its row.

From such a table, it is clearer that similar proportions of children and adults go to the library over 17 times a year. Additionally, children most commonly went to the library between 5 and 8 times, while for adults, this number was between 13 and 16.
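As a rough sketch of how a percentage-based contingency table can be built in practice, the snippet below uses pandas’ crosstab function. The variable names and observations are invented for illustration and are not the article’s data.

```python
import pandas as pd

# Invented example data: age group (independent variable) vs. library-visit category (dependent variable)
df = pd.DataFrame({
    "age_group": ["child", "child", "child", "adult", "adult", "adult", "adult", "child"],
    "visits":    ["5-8",   "5-8",   "17+",   "13-16", "13-16", "17+",   "0-4",   "0-4"],
})

# Raw counts, with the total N for each group added on the end
counts = pd.crosstab(df["age_group"], df["visits"], margins=True, margins_name="N")

# Row percentages: each row sums to 100, making groups of different sizes comparable
percentages = pd.crosstab(df["age_group"], df["visits"], normalize="index") * 100

print(counts)
print(percentages.round(1))
```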

Scatter plots

A scatter plot is a chart that shows you the relationship between two or three variables. It’s a visual representation of the strength of a relationship.

In a scatter plot, you plot one variable along the x-axis and another one along the y-axis. Each data point is represented by a point in the chart.

From your scatter plot, you see that as the number of movies seen at movie theaters increases, the number of visits to the library decreases. Based on your visual assessment of a possible linear relationship, you perform further tests of correlation and regression.

Descriptive statistics: Scatter plot
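Here is a minimal matplotlib sketch of such a plot, using invented values for the two variables; a Pearson correlation coefficient is added as a quick numerical check on the visual impression.

```python
import numpy as np
import matplotlib.pyplot as plt

# Invented data: movies seen at theaters (x) and library visits (y) for 10 respondents
movies = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
library = np.array([18, 16, 15, 12, 11, 9, 8, 6, 4, 3])

plt.scatter(movies, library)  # one point per respondent
plt.xlabel("Movies seen at a theater (per year)")
plt.ylabel("Library visits (per year)")
plt.title("Movie visits vs. library visits")
plt.show()

# Pearson correlation as a rough numerical summary of the relationship
r = np.corrcoef(movies, library)[0, 1]
print(f"r = {r:.2f}")  # strongly negative for this invented data
```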

Descriptive statistics summarise the characteristics of a data set. Inferential statistics allow you to test a hypothesis or assess whether your data is generalisable to the broader population.

The 3 main types of descriptive statistics concern the frequency distribution, central tendency, and variability of a dataset.

  • Distribution refers to the frequencies of different responses.
  • Measures of central tendency give you the average for each response.
  • Measures of variability show you the spread or dispersion of your dataset.
  • Univariate statistics summarise only one variable at a time.
  • Bivariate statistics compare two variables .
  • Multivariate statistics compare more than two variables .


Grad Coach

Quant Analysis 101: Descriptive Statistics

Everything You Need To Get Started (With Examples)

By: Derek Jansen (MBA) | Reviewers: Kerryn Warren (PhD) | October 2023

If you’re new to quantitative data analysis , one of the first terms you’re likely to hear being thrown around is descriptive statistics. In this post, we’ll unpack the basics of descriptive statistics, using straightforward language and loads of examples . So grab a cup of coffee and let’s crunch some numbers!

Overview: Descriptive Statistics

  • What are descriptive statistics?
  • Descriptive vs inferential statistics
  • Why the descriptives matter
  • The “Big 7” descriptive statistics
  • Key takeaways

At the simplest level, descriptive statistics summarise and describe relatively basic but essential features of a quantitative dataset – for example, a set of survey responses. They provide a snapshot of the characteristics of your dataset and allow you to better understand, roughly, how the data are “shaped” (more on this later). For example, a descriptive statistic could include the proportion of males and females within a sample or the percentages of different age groups within a population.

Another common descriptive statistic is the humble average (which in statistics-talk is called the mean). For example, if you undertook a survey and asked people to rate their satisfaction with a particular product on a scale of 1 to 10, you could then calculate the average rating. This is a very basic statistic, but as you can see, it gives you some idea of how the data for that variable are shaped.

Descriptive statistics summarise and describe relatively basic but essential features of a quantitative dataset, including its “shape”

What about inferential statistics?

Now, you may have also heard the term inferential statistics being thrown around, and you’re probably wondering how that’s different from descriptive statistics. Simply put, descriptive statistics describe and summarise the sample itself , while inferential statistics use the data from a sample to make inferences or predictions about a population .

Put another way, descriptive statistics help you understand your dataset , while inferential statistics help you make broader statements about the population , based on what you observe within the sample. If you’re keen to learn more, we cover inferential stats in another post , or you can check out the explainer video below.

Why do descriptive statistics matter?

While descriptive statistics are relatively simple from a mathematical perspective, they play a very important role in any research project . All too often, students skim over the descriptives and run ahead to the seemingly more exciting inferential statistics, but this can be a costly mistake.

The reason for this is that descriptive statistics help you, as the researcher, comprehend the key characteristics of your sample without getting lost in vast amounts of raw data. In doing so, they provide a foundation for your quantitative analysis . Additionally, they enable you to quickly identify potential issues within your dataset – for example, suspicious outliers, missing responses and so on. Just as importantly, descriptive statistics inform the decision-making process when it comes to choosing which inferential statistics you’ll run, as each inferential test has specific requirements regarding the shape of the data.

Long story short, it’s essential that you take the time to dig into your descriptive statistics before looking at more “advanced” inferentials. It’s also worth noting that, depending on your research aims and questions, descriptive stats may be all that you need in any case . So, don’t discount the descriptives! 


The “Big 7” descriptive statistics

With the what and why out of the way, let’s take a look at the most common descriptive statistics. Beyond the counts, proportions and percentages we mentioned earlier, we have what we call the “Big 7” descriptives. These can be divided into two categories – measures of central tendency and measures of dispersion.

Measures of central tendency

True to the name, measures of central tendency describe the centre or “middle section” of a dataset. In other words, they provide some indication of what a “typical” data point looks like within a given dataset. The three most common measures are:

The mean , which is the mathematical average of a set of numbers – in other words, the sum of all numbers divided by the count of all numbers. 
The median , which is the middlemost number in a set of numbers, when those numbers are ordered from lowest to highest.
The mode , which is the most frequently occurring number in a set of numbers (in any order). Naturally, a dataset can have one mode, no mode (no number occurs more than once) or multiple modes.

To make this a little more tangible, let’s look at a sample dataset, along with the corresponding mean, median and mode. This dataset reflects the service ratings (on a scale of 1 – 10) from 15 customers.

Example set of descriptive stats

As you can see, the mean of 5.8 is the average rating across all 15 customers. Meanwhile, 6 is the median . In other words, if you were to list all the responses in order from low to high, Customer 8 would be in the middle (with their service rating being 6). Lastly, the number 5 is the most frequent rating (appearing 3 times), making it the mode.

Together, these three descriptive statistics give us a quick overview of how these customers feel about the service levels at this business. In other words, most customers feel rather lukewarm and there’s certainly room for improvement. From a more statistical perspective, this also means that the data tend to cluster around the 5-6 mark , since the mean and the median are fairly close to each other.

To take this a step further, let’s look at the frequency distribution of the responses . In other words, let’s count how many times each rating was received, and then plot these counts onto a bar chart.

Example frequency distribution of descriptive stats

As you can see, the responses tend to cluster toward the centre of the chart , creating something of a bell-shaped curve. In statistical terms, this is called a normal distribution .

As you delve into quantitative data analysis, you’ll find that normal distributions are very common , but they’re certainly not the only type of distribution. In some cases, the data can lean toward the left or the right of the chart (i.e., toward the low end or high end). This lean is reflected by a measure called skewness , and it’s important to pay attention to this when you’re analysing your data, as this will have an impact on what types of inferential statistics you can use on your dataset.

Example of skewness
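To put a rough number on that lean, the sketch below counts the frequency of each rating and computes the sample skewness with pandas. The ratings are invented stand-ins, not the post’s actual dataset.

```python
import pandas as pd

# Invented 1-10 service ratings standing in for the post's example data
ratings = pd.Series([2, 3, 4, 5, 5, 5, 6, 6, 6, 6, 7, 7, 8, 9, 10])

# Frequency distribution: how many times each rating was received
print(ratings.value_counts().sort_index())

# Sample skewness: roughly 0 for symmetric data, negative when values cluster toward
# the high end (tail on the low side), positive when they cluster toward the low end
print(round(ratings.skew(), 2))
```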

Measures of dispersion

While the measures of central tendency provide insight into how “centred” the dataset is, it’s also important to understand how dispersed that dataset is . In other words, to what extent the data cluster toward the centre – specifically, the mean. In some cases, the majority of the data points will sit very close to the centre, while in other cases, they’ll be scattered all over the place. Enter the measures of dispersion, of which there are three:

Range , which measures the difference between the largest and smallest number in the dataset. In other words, it indicates how spread out the dataset really is.

Variance , which measures how much each number in a dataset varies from the mean (average). More technically, it calculates the average of the squared differences between each number and the mean. A higher variance indicates that the data points are more spread out , while a lower variance suggests that the data points are closer to the mean.

Standard deviation , which is the square root of the variance . It serves the same purposes as the variance, but is a bit easier to interpret as it presents a figure that is in the same unit as the original data . You’ll typically present this statistic alongside the means when describing the data in your research.
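Before returning to the post’s example, here is a quick NumPy sketch of these three measures, again using invented ratings rather than the post’s dataset. Note ddof=1, which gives the sample versions of the variance and standard deviation (dividing by N - 1).

```python
import numpy as np

# Invented 1-10 service ratings (not the post's data)
ratings = np.array([2, 3, 4, 5, 5, 5, 6, 6, 6, 6, 7, 7, 8, 9, 10])

data_range = ratings.max() - ratings.min()  # largest value minus smallest value
variance = ratings.var(ddof=1)              # average squared deviation from the mean (sample version)
std_dev = ratings.std(ddof=1)               # square root of the variance, in the original rating units

print(data_range, round(variance, 2), round(std_dev, 2))
```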

Again, let’s look at our sample dataset to make this all a little more tangible.

Example dataset with measures of dispersion

As you can see, the range of 8 reflects the difference between the highest rating (10) and the lowest rating (2). The standard deviation of 2.18 tells us that on average, results within the dataset are 2.18 away from the mean (of 5.8), reflecting a relatively dispersed set of data .

For the sake of comparison, let’s look at another much more tightly grouped (less dispersed) dataset.

Example of skewed data

As you can see, all the ratings lie between 5 and 8 in this dataset, resulting in a much smaller range, variance and standard deviation. You might also notice that the data are clustered toward the right side of the graph – in other words, the data are skewed. Calculating the skewness for this dataset gives a result of -0.12; the negative value reflects this lean toward the high end of the scale (formally, a slight negative or left skew, with the longer tail on the low side).

In summary, range, variance and standard deviation all provide an indication of how dispersed the data are. These measures are important because they help you interpret the measures of central tendency within context. In other words, if your measures of dispersion are all fairly high numbers, you need to interpret your measures of central tendency with some caution, as the results are not particularly centred. Conversely, if the data are all tightly grouped around the mean (i.e., low dispersion), the mean becomes a much more “meaningful” statistic.

Key Takeaways

We’ve covered quite a bit of ground in this post. Here are the key takeaways:

  • Descriptive statistics, although relatively simple, are a critically important part of any quantitative data analysis.
  • Measures of central tendency include the mean (average), median and mode.
  • Skewness indicates whether a dataset leans to one side or another.
  • Measures of dispersion include the range, variance and standard deviation.



Descriptive Research Design – Types, Methods and Examples

Descriptive Research Design

Definition:

Descriptive research design is a type of research methodology that aims to describe or document the characteristics, behaviors, attitudes, opinions, or perceptions of a group or population being studied.

Descriptive research design does not attempt to establish cause-and-effect relationships between variables or make predictions about future outcomes. Instead, it focuses on providing a detailed and accurate representation of the data collected, which can be useful for generating hypotheses, exploring trends, and identifying patterns in the data.

Types of Descriptive Research Design

Types of Descriptive Research Design are as follows:

Cross-sectional Study

This involves collecting data at a single point in time from a sample or population to describe their characteristics or behaviors. For example, a researcher may conduct a cross-sectional study to investigate the prevalence of certain health conditions among a population, or to describe the attitudes and beliefs of a particular group.

Longitudinal Study

This involves collecting data over an extended period of time, often through repeated observations or surveys of the same group or population. Longitudinal studies can be used to track changes in attitudes, behaviors, or outcomes over time, or to investigate the effects of interventions or treatments.

Case Study

This involves an in-depth examination of a single individual, group, or situation to gain a detailed understanding of its characteristics or dynamics. Case studies are often used in psychology, sociology, and business to explore complex phenomena or to generate hypotheses for further research.

Survey Research

This involves collecting data from a sample or population through standardized questionnaires or interviews. Surveys can be used to describe attitudes, opinions, behaviors, or demographic characteristics of a group, and can be conducted in person, by phone, or online.

Observational Research

This involves observing and documenting the behavior or interactions of individuals or groups in a natural or controlled setting. Observational studies can be used to describe social, cultural, or environmental phenomena, or to investigate the effects of interventions or treatments.

Correlational Research

This involves examining the relationships between two or more variables to describe their patterns or associations. Correlational studies can be used to identify potential causal relationships or to explore the strength and direction of relationships between variables.

Data Analysis Methods

Descriptive research design data analysis methods depend on the type of data collected and the research question being addressed. Here are some common methods of data analysis for descriptive research:

Descriptive Statistics

This method involves analyzing data to summarize and describe the key features of a sample or population. Descriptive statistics can include measures of central tendency (e.g., mean, median, mode) and measures of variability (e.g., range, standard deviation).

Cross-tabulation

This method involves analyzing data by creating a table that shows the frequency of two or more variables together. Cross-tabulation can help identify patterns or relationships between variables.

Content Analysis

This method involves analyzing qualitative data (e.g., text, images, audio) to identify themes, patterns, or trends. Content analysis can be used to describe the characteristics of a sample or population, or to identify factors that influence attitudes or behaviors.

Qualitative Coding

This method involves analyzing qualitative data by assigning codes to segments of data based on their meaning or content. Qualitative coding can be used to identify common themes, patterns, or categories within the data.

Visualization

This method involves creating graphs or charts to represent data visually. Visualization can help identify patterns or relationships between variables and make it easier to communicate findings to others.

Comparative Analysis

This method involves comparing data across different groups or time periods to identify similarities and differences. Comparative analysis can help describe changes in attitudes or behaviors over time or differences between subgroups within a population.

Applications of Descriptive Research Design

Descriptive research design has numerous applications in various fields. Some of the common applications of descriptive research design are:

  • Market research: Descriptive research design is widely used in market research to understand consumer preferences, behavior, and attitudes. This helps companies to develop new products and services, improve marketing strategies, and increase customer satisfaction.
  • Health research: Descriptive research design is used in health research to describe the prevalence and distribution of a disease or health condition in a population. This helps healthcare providers to develop prevention and treatment strategies.
  • Educational research: Descriptive research design is used in educational research to describe the performance of students, schools, or educational programs. This helps educators to improve teaching methods and develop effective educational programs.
  • Social science research: Descriptive research design is used in social science research to describe social phenomena such as cultural norms, values, and beliefs. This helps researchers to understand social behavior and develop effective policies.
  • Public opinion research: Descriptive research design is used in public opinion research to understand the opinions and attitudes of the general public on various issues. This helps policymakers to develop effective policies that are aligned with public opinion.
  • Environmental research: Descriptive research design is used in environmental research to describe the environmental conditions of a particular region or ecosystem. This helps policymakers and environmentalists to develop effective conservation and preservation strategies.

Descriptive Research Design Examples

Here are some real-time examples of descriptive research designs:

  • A restaurant chain wants to understand the demographics and attitudes of its customers. They conduct a survey asking customers about their age, gender, income, frequency of visits, favorite menu items, and overall satisfaction. The survey data is analyzed using descriptive statistics and cross-tabulation to describe the characteristics of their customer base.
  • A medical researcher wants to describe the prevalence and risk factors of a particular disease in a population. They conduct a cross-sectional study in which they collect data from a sample of individuals using a standardized questionnaire. The data is analyzed using descriptive statistics and cross-tabulation to identify patterns in the prevalence and risk factors of the disease.
  • An education researcher wants to describe the learning outcomes of students in a particular school district. They collect test scores from a representative sample of students in the district and use descriptive statistics to calculate the mean, median, and standard deviation of the scores. They also create visualizations such as histograms and box plots to show the distribution of scores.
  • A marketing team wants to understand the attitudes and behaviors of consumers towards a new product. They conduct a series of focus groups and use qualitative coding to identify common themes and patterns in the data. They also create visualizations such as word clouds to show the most frequently mentioned topics.
  • An environmental scientist wants to describe the biodiversity of a particular ecosystem. They conduct an observational study in which they collect data on the species and abundance of plants and animals in the ecosystem. The data is analyzed using descriptive statistics to describe the diversity and richness of the ecosystem.

How to Conduct Descriptive Research Design

To conduct a descriptive research design, you can follow these general steps:

  • Define your research question: Clearly define the research question or problem that you want to address. Your research question should be specific and focused to guide your data collection and analysis.
  • Choose your research method: Select the most appropriate research method for your research question. As discussed earlier, common research methods for descriptive research include surveys, case studies, observational studies, cross-sectional studies, and longitudinal studies.
  • Design your study: Plan the details of your study, including the sampling strategy, data collection methods, and data analysis plan. Determine the sample size and sampling method, decide on the data collection tools (such as questionnaires, interviews, or observations), and outline your data analysis plan.
  • Collect data: Collect data from your sample or population using the data collection tools you have chosen. Ensure that you follow ethical guidelines for research and obtain informed consent from participants.
  • Analyze data: Use appropriate statistical or qualitative analysis methods to analyze your data. As discussed earlier, common data analysis methods for descriptive research include descriptive statistics, cross-tabulation, content analysis, qualitative coding, visualization, and comparative analysis.
  • Interpret results: Interpret your findings in light of your research question and objectives. Identify patterns, trends, and relationships in the data, and describe the characteristics of your sample or population.
  • Draw conclusions and report results: Draw conclusions based on your analysis and interpretation of the data. Report your results in a clear and concise manner, using appropriate tables, graphs, or figures to present your findings. Ensure that your report follows accepted research standards and guidelines.

When to Use Descriptive Research Design

Descriptive research design is used in situations where the researcher wants to describe a population or phenomenon in detail. It is used to gather information about the current status or condition of a group or phenomenon without making any causal inferences. Descriptive research design is useful in the following situations:

  • Exploratory research: Descriptive research design is often used in exploratory research to gain an initial understanding of a phenomenon or population.
  • Identifying trends: Descriptive research design can be used to identify trends or patterns in a population, such as changes in consumer behavior or attitudes over time.
  • Market research: Descriptive research design is commonly used in market research to understand consumer preferences, behavior, and attitudes.
  • Health research: Descriptive research design is useful in health research to describe the prevalence and distribution of a disease or health condition in a population.
  • Social science research: Descriptive research design is used in social science research to describe social phenomena such as cultural norms, values, and beliefs.
  • Educational research: Descriptive research design is used in educational research to describe the performance of students, schools, or educational programs.

Purpose of Descriptive Research Design

The main purpose of descriptive research design is to describe and measure the characteristics of a population or phenomenon in a systematic and objective manner. It involves collecting data that describe the current status or condition of the population or phenomenon of interest, without manipulating or altering any variables.

The purpose of descriptive research design can be summarized as follows:

  • To provide an accurate description of a population or phenomenon: Descriptive research design aims to provide a comprehensive and accurate description of a population or phenomenon of interest. This can help researchers to develop a better understanding of the characteristics of the population or phenomenon.
  • To identify trends and patterns: Descriptive research design can help researchers to identify trends and patterns in the data, such as changes in behavior or attitudes over time. This can be useful for making predictions and developing strategies.
  • To generate hypotheses: Descriptive research design can be used to generate hypotheses or research questions that can be tested in future studies. For example, if a descriptive study finds a correlation between two variables, this could lead to the development of a hypothesis about the causal relationship between the variables.
  • To establish a baseline: Descriptive research design can establish a baseline or starting point for future research. This can be useful for comparing data from different time periods or populations.

Characteristics of Descriptive Research Design

Descriptive research design has several key characteristics that distinguish it from other research designs. Some of the main characteristics of descriptive research design are:

  • Objective : Descriptive research design is objective in nature, which means that it focuses on collecting factual and accurate data without any personal bias. The researcher aims to report the data objectively without any personal interpretation.
  • Non-experimental: Descriptive research design is non-experimental, which means that the researcher does not manipulate any variables. The researcher simply observes and records the behavior or characteristics of the population or phenomenon of interest.
  • Quantitative : Descriptive research design is quantitative in nature, which means that it involves collecting numerical data that can be analyzed using statistical techniques. This helps to provide a more precise and accurate description of the population or phenomenon.
  • Cross-sectional: Descriptive research design is often cross-sectional, which means that the data is collected at a single point in time. This can be useful for understanding the current state of the population or phenomenon, but it may not provide information about changes over time.
  • Large sample size: Descriptive research design typically involves a large sample size, which helps to ensure that the data is representative of the population of interest. A large sample size also helps to increase the reliability and validity of the data.
  • Systematic and structured: Descriptive research design involves a systematic and structured approach to data collection, which helps to ensure that the data is accurate and reliable. This involves using standardized procedures for data collection, such as surveys, questionnaires, or observation checklists.

Advantages of Descriptive Research Design

Descriptive research design has several advantages that make it a popular choice for researchers. Some of the main advantages of descriptive research design are:

  • Provides an accurate description: Descriptive research design is focused on accurately describing the characteristics of a population or phenomenon. This can help researchers to develop a better understanding of the subject of interest.
  • Easy to conduct: Descriptive research design is relatively easy to conduct and requires minimal resources compared to other research designs. It can be conducted quickly and efficiently, and data can be collected through surveys, questionnaires, or observations.
  • Useful for generating hypotheses: Descriptive research design can be used to generate hypotheses or research questions that can be tested in future studies. For example, if a descriptive study finds a correlation between two variables, this could lead to the development of a hypothesis about the causal relationship between the variables.
  • Large sample size : Descriptive research design typically involves a large sample size, which helps to ensure that the data is representative of the population of interest. A large sample size also helps to increase the reliability and validity of the data.
  • Can be used to monitor changes : Descriptive research design can be used to monitor changes over time in a population or phenomenon. This can be useful for identifying trends and patterns, and for making predictions about future behavior or attitudes.
  • Can be used in a variety of fields : Descriptive research design can be used in a variety of fields, including social sciences, healthcare, business, and education.

Limitation of Descriptive Research Design

Descriptive research design also has some limitations that researchers should consider before using this design. Some of the main limitations of descriptive research design are:

  • Cannot establish cause and effect: Descriptive research design cannot establish cause and effect relationships between variables. It only provides a description of the characteristics of the population or phenomenon of interest.
  • Limited generalizability: The results of a descriptive study may not be generalizable to other populations or situations. This is because descriptive research design often involves a specific sample or situation, which may not be representative of the broader population.
  • Potential for bias: Descriptive research design can be subject to bias, particularly if the researcher is not objective in their data collection or interpretation. This can lead to inaccurate or incomplete descriptions of the population or phenomenon of interest.
  • Limited depth: Descriptive research design may provide a superficial description of the population or phenomenon of interest. It does not delve into the underlying causes or mechanisms behind the observed behavior or characteristics.
  • Limited utility for theory development: Descriptive research design may not be useful for developing theories about the relationship between variables. It only provides a description of the variables themselves.
  • Relies on self-report data: Descriptive research design often relies on self-report data, such as surveys or questionnaires. This type of data may be subject to biases, such as social desirability bias or recall bias.

About the author


Muhammad Hassan

Researcher, Academic Writer, Web developer



Introduction to Descriptive Statistics

Submitted: 04 July 2023 Reviewed: 20 July 2023 Published: 07 September 2023

DOI: 10.5772/intechopen.1002475


From the Edited Volume

Recent Advances in Biostatistics [Working Title]

B. Santhosh Kumar


This chapter offers a comprehensive exploration of descriptive statistics, tracing its historical development from Condorcet’s “average” concept to Galton and Pearson’s contributions. Emphasizing its pivotal role in academia, descriptive statistics serve as a fundamental tool for summarizing and analyzing data across disciplines. The chapter underscores how descriptive statistics drive research inspiration and guide analysis, and provide a foundation for advanced statistical techniques. It delves into their historical context, highlighting their organizational and presentational significance. Furthermore, the chapter accentuates the advantages of descriptive statistics in academia, including their ability to succinctly represent complex data, aid decision-making, and enhance research communication. It highlights the potency of visualization in discerning data patterns and explores emerging trends like large dataset analysis, Bayesian statistics, and nonparametric methods. Sources of variance intrinsic to descriptive statistics, such as sampling fluctuations, measurement errors, and outliers, are discussed, stressing the importance of considering these factors in data interpretation.

Keywords: academic research, data analysis, data visualization, decision-making, research methodology, data summarization

Author Information

Olubunmi Alabi*

  • African University of Science and Technology, Abuja, Nigeria

Tosin Bukola

  • University of Greenwich, London, United Kingdom

*Address all correspondence to: [email protected]

1. Introduction

Descriptive statistics trace their origins to the French mathematician and philosopher Condorcet, who promoted the idea of the “average” as a means of summarizing data. Yet the widespread use of descriptive statistics in academic study did not begin until the 19th century. Francis Galton, who was interested in the examination of human features and attributes, was one of the major forerunners of descriptive statistics; he created various statistical methods that are still frequently applied in academic research today, such as the concepts of correlation and regression analysis. In the early 20th century, the British statistician and mathematician Karl Pearson studied and popularized the “normal distribution”, the bell-shaped curve that characterizes the distribution of many natural phenomena. Pearson also developed a number of correlational measures and introduced the chi-square test, which evaluates the significance of differences between observed and expected frequencies. In the middle of the 20th century, the development of electronic computers sparked a revolution in statistical analysis, bringing with it new methods such as multivariate analysis and factor analysis. Descriptive statistics is the analysis and summarization of data to gain insights into its characteristics and distribution [ 1 ].

Descriptive statistics help researchers generate study ideas and guide further analysis by allowing them to explore data patterns and trends [ 2 ]. Their use in academic research grew because they helped researchers better comprehend their datasets and served as a basis for more sophisticated statistical techniques. Descriptive statistics are used to summarize and analyze data in a variety of academic areas, including psychology, sociology, economics, education, and epidemiology [ 3 ]. They remain a crucial research tool in academia today, giving researchers a way to compile and analyze data from many fields. Thanks to the development of new statistical techniques and computing tools, it is now simpler than ever to analyze and understand data, enabling researchers to make better-informed judgments based on their results. Descriptive statistics can benefit researchers in hypothesis creation and exploratory analysis by identifying trends, patterns, and correlations between variables in huge datasets [ 4 ]. They are also important in data-driven decision-making processes because they allow stakeholders to make educated decisions based on reliable data [ 5 ].

2. Background

The history of descriptive statistics may be traced back to the 17th century, when early pioneers like John Graunt and William Petty laid the groundwork for statistical analysis [ 6 ]. Descriptive statistics is a fundamental concept in academia that is widely used across many disciplines, including social sciences, economics, medicine, engineering, and business. Descriptive statistics provides a comprehensive background for understanding data by organizing, summarizing, and presenting information effectively [ 7 ]. In academia, descriptive statistics is used to summarize and analyze data, providing insights into the patterns, trends, and characteristics of a dataset. Similarly, in academic research, descriptive statistics are often used as a preliminary analysis technique to gain a better understanding of the dataset before applying more complex statistical methods. Descriptive statistics lay the groundwork for inferential statistics by assisting researchers in drawing inferences about a population based on observed sample data [ 8 ]. Descriptive statistics aid in the identification and analysis of outliers, which can give useful insights into unusual observations or data collecting problems [ 9 ].

Descriptive statistics enable researchers to synthesize both quantitative and qualitative data, allowing for a thorough examination of factors [ 10 ]. Descriptive statistics can provide valuable information about the central tendency, variability, and distribution of the data, allowing researchers to make informed decisions about the appropriate statistical techniques to use. Descriptive statistics are an essential component of survey research technique, allowing researchers to efficiently summarize and display survey results [ 11 ]. Descriptive statistics may be used to summarize data as well as spot outliers, or observations that dramatically depart from the trend of the data as a whole. Finding outliers can help researchers spot any issues or abnormalities in the data so they can make the necessary modifications or repairs. In academic research, descriptive statistics are frequently employed to address research issues and evaluate hypotheses. Descriptive statistics, for instance, can be used to compare the average scores of two groups to see if there is a significant difference between them. In order to create new hypotheses or validate preexisting ideas, descriptive statistics may also be used to find patterns and correlations in the data.

There are several sources of variation that can affect the descriptive statistics of a data set, including:

  • Sampling variation: descriptive statistics are often calculated from a sample rather than the entire population, so they vary depending on the particular sample that is selected.
  • Measurement variation: different measurement methods can produce different results. For example, if a scale is used to measure the weight of objects, slight differences in how the scale is used can produce slightly different measurements.
  • Data entry errors: mistakes made during the data entry process, even small ones such as transposing two digits, can significantly affect the results.
  • Outliers: extreme values that fall outside the expected range of values can skew the descriptive statistics, making them appear more or less extreme than they actually are.
  • Natural variation: the inherent variability in the data itself. For example, a data set containing measurements of the heights of trees will naturally show variation in those heights.

It is important to understand these sources of variation when interpreting and using descriptive statistics in academia. Properly accounting for them helps ensure that the descriptive statistics accurately reflect the underlying data.

Some emerging patterns in descriptive statistics in academia include:

  • Big data analysis: with the increasing availability of large data sets, researchers are using descriptive statistics to identify patterns and trends in the data. The use of big data analysis techniques, such as machine learning and data mining, is becoming more common in academic research.
  • Visualization techniques: advances in data visualization are enabling researchers to more easily identify patterns in data sets. For example, heat maps and scatter plots can be used to visualize the relationship between different variables.
  • Bayesian statistics: an emerging area of research that uses probability theory to make inferences about data. Bayesian statistics can provide more accurate estimates of descriptive statistics, particularly when dealing with complex data sets.
  • Non-parametric statistics: increasingly popular, particularly for data sets that do not meet the assumptions of traditional parametric statistical tests. Non-parametric tests do not require the data to be normally distributed and can be more robust to outliers.
  • Open science practices: pre-registration and data sharing are becoming more common, enabling researchers to more easily replicate and verify the results of descriptive statistical analyses, which can improve the quality and reliability of research findings.

Overall, these emerging patterns reflect the increasing availability of data, the need for more accurate and robust statistical techniques, and a growing emphasis on transparency and openness in research practices.

3. Benefits of descriptive statistics

The advantages of descriptive statistics extend beyond research and academia, with applications in commercial decision-making, public policy, and strategic planning [ 12 ]. The benefits of descriptive statistics include providing a clear and concise summary of data, aiding in decision-making processes, and facilitating effective communication of findings [ 13 ]. In academia, these benefits include:

  • Summarization of data: descriptive statistics allow researchers to quickly and efficiently summarize large data sets, providing a snapshot of the key characteristics of the data. This helps researchers identify patterns and trends and can simplify complex data sets.
  • Better decision-making: descriptive statistics can help researchers make data-driven decisions. For example, if a researcher is comparing the effectiveness of two different treatments, descriptive statistics can indicate which treatment is more effective based on the data.
  • Visualization of data: descriptive statistics can be used to create visualizations that make it easier to communicate research findings to others. Histograms, bar charts, and scatterplots are examples of data visualization techniques that may be used to graphically depict data in order to detect trends, outliers, and correlations [ 14 ]. Visualizations can also reveal patterns and trends in the data that are not immediately apparent from the raw data.
  • Hypothesis testing: descriptive statistics are often used in hypothesis testing, which allows researchers to determine whether a particular hypothesis about a data set is supported by the data. This can help to validate research findings and increase confidence in the conclusions drawn from the data.
  • Improved data quality: descriptive statistics can help to identify errors or inconsistencies in the data, allowing researchers to improve its quality. This can lead to more accurate research findings and a better understanding of the underlying phenomena.

Overall, the benefits of descriptive statistics in academia are many and varied. They help researchers summarize large data sets, make data-driven decisions, visualize data, validate research findings, and improve the quality of the data. By using descriptive statistics, researchers can gain valuable insights into complex data sets and make more informed decisions based on the data.

4. Practical applications of descriptive statistics

Descriptive statistics has practical applications in disciplines such as business, social sciences, healthcare, finance, and market research [ 15 ]. In academia, some of its many practical applications include:

  • Data summarization: descriptive statistics can be used to summarize large data sets, making it easier for researchers to understand the key characteristics of the data. This is particularly useful when dealing with complex data sets that contain many variables.
  • Hypothesis testing: descriptive statistics can be used to test hypotheses about a data set. For example, researchers can test whether the mean value of a particular variable is significantly different from a hypothesized value.
  • Data visualization: descriptive statistics can be used to create visualizations that make it easier to identify patterns and trends in the data. For example, a histogram or boxplot can be used to visualize the distribution of a variable.
  • Comparing groups: descriptive statistics can be used to compare different groups within a data set. For example, researchers may compare the mean values of a particular variable between different demographic groups, such as age or gender.
  • Predictive modeling: descriptive statistics can inform predictive models used to forecast future trends or outcomes. For example, a researcher might use descriptive statistics to identify the key variables that predict student performance in a particular course.

These applications are wide-ranging and span many different fields, including psychology, economics, sociology, and biology, providing insights into complex data sets and helping researchers make data-driven decisions ( Figure 1 ).


Types of descriptive statistics. Ref: https://www.analyticssteps.com/blogs/types-descriptive-analysis-examples-steps .

Descriptive statistics is a useful tool for researchers in a variety of sectors, since it allows them to describe the major characteristics of a dataset, such as its frequency, central tendency, variability, and distribution.

4.1 Central tendency measurements

Central tendency metrics, such as the mean, median, and mode, are essential descriptive statistics that offer information about the average or typical value in a collection [ 16 ]. One of the primary purposes of descriptive statistics is to summarize data in a succinct and useful manner. Some measures of central tendency, such as the median, are resistant to outliers and offer a more representative assessment of the average value in a skewed distribution [ 17 ]. The mean, median, and mode are used to characterize the usual or central value of a dataset: the mean is the arithmetic average, the median is the middle value when the data are ordered by magnitude, and the mode is the most frequently occurring value. Central tendency measurements are one of the most important aspects of descriptive statistics, as they provide a summary of the “typical” value of a data set.

The three most commonly used measures of central tendency are:

  • Mean: calculated by adding up all the values in a data set and dividing by the total number of values. The mean is sensitive to outliers, as even one extreme value can greatly affect it.
  • Median: the middle value in a data set when the values are ordered from smallest to largest. If the data set has an odd number of values, the median is the middle value; if it has an even number of values, the median is the average of the two middle values. The median is more robust to outliers than the mean.
  • Mode: the most common value in a data set. In some cases there may be multiple modes (i.e. bimodal or multimodal distributions). The mode is useful for identifying the most frequently occurring value in a data set.

Each of these measures provides a different perspective on the “typical” value of a data set, and the most appropriate measure depends on the nature of the data and the research question being addressed. For example, if the data set contains extreme outliers, the median may be a better measure of central tendency than the mean. Conversely, if the data set is symmetrical and normally distributed, the mean may provide the best measure of central tendency.

4.2 Variability indices

Another key part of descriptive statistics is determining data variability. The spread or dispersion of data points around the central tendency values is quantified by variability indices such as the range, variance, and standard deviation [18]. Indices such as the coefficient of variation also allow you to compare variability across datasets with different scales or units of measurement [19]. The range is the distance between the dataset’s greatest and lowest values, while the variance and standard deviation measure how much the data values depart from the mean. Other indices, such as the interquartile range, give insight into the data’s distribution while being less affected by extreme values than the standard deviation [20]. Some commonly used variability indices include:

  • Range: the range is the difference between the largest and smallest values in a data set. It provides a simple measure of the spread of the data, but is sensitive to outliers.
  • Interquartile range (IQR): the IQR is the range of the middle 50% of the data, calculated by subtracting the 25th percentile (lower quartile) from the 75th percentile (upper quartile). The IQR is more robust to outliers than the range.
  • Variance: the variance is a measure of how spread out the data is around the mean, calculated by taking the average of the squared differences between each data point and the mean. The variance is sensitive to outliers.
  • Standard deviation: the standard deviation is the square root of the variance. It provides a measure of how much the data varies from the mean, and is more commonly used than the variance because it has the same units as the original data.
  • Coefficient of variation (CV): the CV is a measure of relative variability, expressed as a percentage. It is calculated by dividing the standard deviation by the mean and multiplying by 100. The CV is useful for comparing variability across data sets that have different units or scales.

These variability indices provide important information about the spread and variability of the data, which can help researchers better understand the characteristics of the data and draw meaningful conclusions from it.
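A minimal base R sketch of these indices, again on a made-up numeric vector; note that var() and sd() in R use the sample formulas with n - 1 in the denominator.

```r
# Hypothetical measurements, made up for illustration only
x <- c(12, 15, 15, 18, 22, 24, 30, 41)

max(x) - min(x)        # range: largest value minus smallest value
IQR(x)                 # interquartile range: 75th percentile minus 25th percentile
var(x)                 # sample variance (n - 1 denominator)
sd(x)                  # sample standard deviation, the square root of the variance
100 * sd(x) / mean(x)  # coefficient of variation, expressed as a percentage
```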

4.3 Data visualization

In addition to numerical metrics, data may be represented visually using graphical approaches. Graphs and charts, such as histograms, box plots, and scatterplots, allow researchers to investigate patterns and correlations in the data, and the application of graphical approaches such as scatterplots and heat maps improves comprehension of correlations and patterns in large datasets [22]. Box plots and violin plots are efficient visualization approaches for showing the distribution of the data and spotting potential outliers [21]; they can be used to detect data points that deviate dramatically from the rest of the data. Data visualization is an important aspect of descriptive statistics, as it allows researchers to communicate complex data in a visual and easily understandable format. Some common types of data visualization used in descriptive statistics include:

  • Histograms: histograms display the distribution of a continuous variable. The data is divided into intervals (or “bins”), and the number of observations falling into each bin is displayed on the vertical axis. Histograms provide a visual representation of the shape of the distribution, and can help to identify outliers or skewness.
  • Box plots: box plots provide a graphical representation of the distribution of a continuous variable. The box represents the middle 50% of the data, with the median displayed as a horizontal line inside the box. The whiskers extend to the smallest and largest values that are not classed as outliers, and any outliers are displayed as points outside the whiskers. Box plots are useful for comparing distributions across different groups or for identifying outliers.
  • Scatter plots: scatter plots display the relationship between two continuous variables. Each data point is represented as a point on the graph, with one variable on the horizontal axis and the other on the vertical axis. Scatter plots can help to identify patterns or relationships in the data, such as a positive or negative correlation.
  • Bar charts: bar charts display the distribution of a categorical variable. The categories are displayed on the horizontal axis, and the frequency or percentage of observations falling into each category is displayed on the vertical axis. Bar charts can help to compare the frequency of different categories or to display the results of a survey or questionnaire.
  • Heat maps: heat maps display the relationship between two categorical variables. The categories are displayed on both the horizontal and vertical axes, and the frequency or percentage of observations falling into each combination of categories is displayed using a color scale. Heat maps can help to identify patterns or relationships in the data, such as a higher frequency of observations in certain combinations of categories.

These types of data visualization help researchers communicate complex data in a clear and understandable format, and can also provide insights into characteristics of the data that may not be immediately apparent from the raw values.
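The sketch below shows how a few of these plot types can be produced with base R graphics; the vectors are invented purely for illustration and do not come from any dataset discussed in this chapter.

```r
# Made-up data for illustration only
age    <- c(23, 25, 25, 31, 34, 38, 41, 44, 52, 60)
income <- c(21, 24, 26, 30, 35, 37, 42, 48, 55, 63)  # in thousands
group  <- c("A", "A", "B", "B", "B", "C", "C", "C", "C", "C")

hist(age)                # histogram: distribution of a continuous variable
boxplot(income ~ group)  # box plots: one distribution per group, outliers shown as points
plot(age, income)        # scatter plot: relationship between two continuous variables
barplot(table(group))    # bar chart: frequency of each category
```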

4.4 Data cleaning and preprocessing

Data cleaning and preprocessing procedures, such as imputation methods for missing data, help preserve data integrity and reduce bias in descriptive analysis [23]. Before beginning any statistical analysis, researchers should be certain that the data is clean and well organized. Data cleaning is the process of discovering and fixing flaws or inconsistencies in the data, such as missing values or outliers, while data preprocessing is the process of putting data into an appropriate format for analysis, for example by scaling or normalizing it. Finding and dealing with missing values, outliers, and inconsistencies is essential to the correctness and dependability of descriptive statistics [25]. Some common data cleaning and preprocessing steps include:

  • Handling missing data: missing data is a common problem in datasets and can affect the accuracy of the analysis. Depending on the amount of missing data, researchers may choose to remove incomplete cases or impute missing values using techniques such as mean imputation, regression imputation, or multiple imputation.
  • Handling outliers: outliers are extreme values that differ from the majority of the data points and can distort the analysis. Outlier identification and removal procedures help increase the accuracy and reliability of descriptive statistics [24]. Researchers may choose to remove or transform outliers to better reflect the characteristics of the data.
  • Data transformation: data transformation is used to normalize the data or to make it easier to analyze. Common transformations include logarithmic, square root, or Box-Cox transformations.
  • Handling categorical data: categorical data, such as nominal or ordinal data, may need to be recoded into numerical form before analysis. Researchers may also need to handle missing or inconsistent categories within the data.
  • Standardizing data: standardizing data involves scaling the data to have a mean of zero and a standard deviation of one. This can be useful for comparing variables with different units or scales.
  • Data integration: data integration involves merging or linking multiple datasets to create a single, comprehensive dataset for analysis. This may involve matching or merging datasets based on common variables or identifiers.

By performing these data cleaning and preprocessing steps, researchers can ensure that the data is accurate and ready for analysis, which leads to more reliable and meaningful insights.
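A rough sketch of what a few of these steps can look like in base R (mean imputation, a simple outlier flag at 1.5 times the IQR beyond the quartiles, and standardization), using an invented vector with one missing value:

```r
# Hypothetical variable with a missing value, made up for illustration
x <- c(4.1, 3.8, NA, 5.0, 4.6, 12.9)

# Mean imputation: replace missing values with the mean of the observed values
x_imputed <- ifelse(is.na(x), mean(x, na.rm = TRUE), x)

# Flag potential outliers: values more than 1.5 * IQR beyond the quartiles
q <- quantile(x_imputed, c(0.25, 0.75))
is_outlier <- x_imputed < q[1] - 1.5 * IQR(x_imputed) |
  x_imputed > q[2] + 1.5 * IQR(x_imputed)

# Standardize: rescale to mean 0 and standard deviation 1
x_std <- as.numeric(scale(x_imputed))
```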

5. Descriptive statistics in academic methodology

Descriptive statistics are important in academic methodology because they enable researchers to synthesize and describe the data collected for research purposes [26]. Descriptive statistics is often used in combination with other statistical techniques, such as inferential statistics, to draw conclusions and make predictions from the data. In academic research, descriptive statistics is used in a variety of ways:

  • Describing sample characteristics: descriptive statistics is used to describe the characteristics of a sample, such as the mean, median, and standard deviation of a variable. This information can be used to identify patterns, trends, or differences within the sample.
  • Identifying data outliers: descriptive statistics can help researchers identify potential outliers or anomalies in the data, which can affect the validity of the results. For example, identifying extreme values in a dataset can help researchers investigate whether these values are due to measurement error or a true characteristic of the population.
  • Communicating research findings: descriptive statistics is used to summarize and communicate research findings in a clear and concise manner. Graphs, charts, and tables can be used to display descriptive statistics in a way that is easy to understand and interpret.
  • Testing assumptions: descriptive statistics can be used to test assumptions about the data, such as normality or homogeneity of variance, which are important for selecting appropriate statistical tests and interpreting the results.

Overall, descriptive statistics is a critical part of academic research methodology that helps researchers describe and understand the characteristics of their data. By using descriptive statistics, researchers can draw meaningful insights and conclusions from their data and communicate these findings to others in a clear and concise manner.

6. Pitfalls of descriptive statistics

The potential for misinterpretation, reliance on summary measures alone, and susceptibility to extreme values or outliers are all disadvantages of descriptive statistics [27]. While descriptive statistics is an essential tool in academic research, there are several potential pitfalls that researchers should be aware of:

  • Limited scope: descriptive statistics can provide a useful summary of the characteristics of a dataset, but it is limited in its ability to provide insights into the underlying causes or mechanisms that drive the data. Descriptive statistics alone cannot establish causal relationships or test hypotheses.
  • Misleading interpretations: descriptive statistics can be misleading if not interpreted correctly. For example, a small sample size may not accurately represent the population, and summary statistics such as the mean may not be meaningful if the data is not normally distributed.
  • Incomplete analysis: descriptive statistics can only provide a limited view of the data, and researchers may need to use additional statistical techniques to fully analyze it. For example, hypothesis testing and regression analysis may be needed to establish relationships between variables and make predictions.
  • Biased data: descriptive statistics can be biased if the data is not representative of the population of interest. Sampling bias, measurement bias, or non-response bias can all affect the validity of descriptive statistics.
  • Over-reliance on summary statistics: summary statistics such as the mean or median may not provide a complete picture of the data. Visualizations and other descriptive statistics, such as measures of variability, can provide additional insight.

To avoid these pitfalls, researchers should carefully consider the scope and limitations of descriptive statistics and use additional statistical techniques as needed. They should also ensure that their data is representative of the population of interest and interpret their descriptive statistics in a thoughtful and nuanced manner.

7. Conclusion

Descriptive statistics has become a fundamental methodology in academic research, used to summarize and describe the characteristics of a dataset, such as its central tendency, variability, and distribution. It is applied across a wide range of disciplines, including the social sciences, natural sciences, engineering, and business, to describe sample characteristics, identify data outliers, communicate research findings, and test assumptions; researchers can, for instance, check the normality assumptions of their data using measures of skewness and kurtosis [28]. The kind of data, the research topic, and the particular aims of the study all influence the appropriate choice and implementation of descriptive statistical approaches [29].

However, there are several potential pitfalls of descriptive statistics, including limited scope, misleading interpretations, incomplete analysis, biased data, and over-reliance on summary statistics. The use of descriptive statistics in data presentation can improve the interpretability of study findings, making complicated material more accessible to a larger audience [ 30 ]. To use descriptive statistics effectively in academic research, researchers should carefully consider the limitations and scope of the methodology, use additional statistical techniques as needed, ensure that their data is representative of the population of interest, and interpret their descriptive statistics in a thoughtful and nuanced manner.

Conflict of interest

The authors declare no conflict of interest.

  • 1. Agresti A, Franklin C. Statistics: The Art and Science of Learning from Data. Upper Saddle River, NJ: Pearson; 2009
  • 2. Norman GR, Streiner DL. Biostatistics: The Bare Essentials. 4th ed. Shelton (CT): PMPH-USA; 2014
  • 3. Cohen J, Cohen P, West SG, Aiken LS. Applied Multiple Regression/Correlation Analysis for the Behavioral Sciences. New York: Routledge; 2013
  • 4. Osborne J. Best practices in exploratory factor analysis: Four recommendations for getting the most from your analysis. Practical Assessment. 2019; 10 (7):1-9
  • 5. Field A, Hole G. How to Design and Report Experiments. Sage; 2003
  • 6. Hald A. A History of Mathematical Statistics from 1750 to 1930. New York: Wiley; 1998
  • 7. Warner RM. Applied Statistics: From Bivariate Through Multivariate Techniques. 2nd ed. Thousand Oaks, CA: SAGE Publications; 2012
  • 8. Sullivan LM, Artino AR Jr. Analyzing and interpreting continuous data using ordinal regression. Journal of Graduate Medical Education. 2013; 5 (4):542-543
  • 9. Hoaglin DC, Mosteller F, Tukey JW. Understanding Robust and Exploratory Data Analysis. John Wiley & Sons; 2011
  • 10. Maxwell SE, Delaney HD, Kelley K. Designing Experiments and Analyzing Data: A Model Comparison Perspective. Routledge; 2017
  • 11. De Leeuw ED, Hox JJ. International Handbook of Survey Methodology. Routledge; 2008
  • 12. Chatfield C. The Analysis of Time Series: An Introduction. CRC Press; 2016
  • 13. Tabachnick BG, Fidell LS. Using Multivariate Statistics. Pearson; 2013
  • 14. Wickham H. ggplot2: Elegant Graphics for Data Analysis. Springer; 2016
  • 15. Field A, Miles J, Field Z. Discovering Statistics Using R. Sage; 2012
  • 16. Howell DC. Statistical Methods for Psychology. Cengage Learning; 2013
  • 17. Wilcox RR. Modern Statistics for the Social and Behavioral Sciences: A Practical Introduction. CRC Press; 2017
  • 18. Hair JF, Black WC, Babin BJ, Anderson RE. Multivariate Data Analysis. Pearson; 2019
  • 19. Beasley TM, Schumacker RE. Multiple regression approach to analyzing contingency tables: Post hoc and planned comparison procedures. Journal of Experimental Education. 2013; 81 (3):310-312
  • 20. Dodge Y. The Concise Encyclopedia of Statistics. Springer Science & Business Media; 2008
  • 21. Krzywinski M, Altman N. Points of significance: Visualizing samples with box plots. Nature Methods. 2014; 11 (2):119-120
  • 22. Cleveland WS. Visualizing data. Hobart Press; 1993
  • 23. Little RJ, Rubin DB. Statistical Analysis with Missing Data. John Wiley & Sons; 2019
  • 24. Filzmoser P, Maronna R, Werner M. Outlier identification in high dimensions. Computational Statistics & Data Analysis. 2008; 52 (3):1694-1711
  • 25. Shmueli G, Bruce PC, Yahav I, Patel NR, Lichtendahl KC Jr, Desarbo WS. Data Mining for Business Analytics: Concepts, Techniques, and Applications in R. John Wiley & Sons; 2017
  • 26. Aguinis H, Gottfredson RK. Statistical power analysis in HRM research. Organizational Research Methods. 2013; 16 (2):289-324
  • 27. Stevens JP. Applied Multivariate Statistics for the Social Sciences. Routledge; 2012
  • 28. Byrne BM. Structural Equation Modeling with AMOS: Basic Concepts, Applications, and Programming. Routledge; 2016
  • 29. Everitt BS, Hothorn T. An Introduction to Applied Multivariate Analysis with R. Springer; 2011
  • 30. Kosslyn SM. Graph Design for the Eye and Mind. Oxford University Press; 2006

© The Author(s). Licensee IntechOpen. This chapter is distributed under the terms of the Creative Commons Attribution 3.0 License , which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.


Descriptive statistics in research: a critical component of data analysis.

With any data, the objective is to describe the population at large, but what does that mean, and what processes, methods and measures are used to uncover insights from that data? In this short guide, we explore descriptive statistics and how it’s applied to research.

What do we mean by descriptive statistics?

With any kind of data, the main objective is to describe a population at large — and using descriptive statistics, researchers can quantify and describe the basic characteristics of a given data set.

For example, researchers can condense large data sets, which may contain thousands of individual data points or observations, into a series of statistics that provide useful information on the population of interest. We call this process “describing data”.

In the process of producing summaries of the sample, we use measures like mean, median, variance, graphs, charts, frequencies, histograms, box and whisker plots, and percentages. For datasets with just one variable, we use univariate descriptive statistics. For datasets with multiple variables, we use bivariate correlation and multivariate descriptive statistics.

Want to find out the definitions?

Univariate descriptive statistics: this is when you want to describe data with only one characteristic or attribute

Bivariate correlation: this is when you simultaneously analyze (compare) two variables to see if there is a relationship between them

Multivariate descriptive statistics: this is a subdivision of statistics encompassing the simultaneous observation and analysis of more than one outcome variable
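As a rough sketch of the difference in R, using two made-up numeric variables: a univariate summary describes one variable at a time, while a bivariate correlation describes how two variables move together.

```r
# Made-up data for illustration only
age          <- c(22, 25, 31, 38, 42, 47, 55, 61)
satisfaction <- c(8, 9, 7, 7, 6, 5, 5, 4)

summary(age)            # univariate: min, quartiles, median, mean, and max of one variable
cor(age, satisfaction)  # bivariate: Pearson correlation between two variables
```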

Then, after describing and summarizing the data, as well as using simple graphical analyses, we can start to draw meaningful insights from it to help guide specific strategies. It’s also important to note that descriptive statistics can be applied in both quantitative and qualitative research.

Describing data is undoubtedly the most critical first step in research as it enables the subsequent organization, simplification and summarization of information — and every survey question and population has summary statistics. Let’s take a look at a few examples.

Examples of descriptive statistics

Consider for a moment a number used to summarize how well a striker is performing in football: their goal conversion rate. This number is simply the number of goals scored divided by the number of shots taken (reported to three decimal places). If a striker is converting at 0.333, that’s one goal for every three shots. If they’re scoring one in four, that’s 0.250.

A classic example is a student’s grade point average (GPA). This single number describes the general performance of a student across a range of course experiences and classes. It doesn’t tell us anything about the difficulty of the courses the student is taking, or what those courses are, but it does provide a summary that enables a degree of comparison with people or other units of data.

Ultimately, descriptive statistics make it incredibly easy for people to understand complex (or data intensive) quantitative or qualitative insights across large data sets.


Types of descriptive statistics

To quantitatively summarize the characteristics of raw, ungrouped data, we use the following types of descriptive statistics:

  • Measures of Central Tendency ,
  • Measures of Dispersion and
  • Measures of Frequency Distribution.

Following the application of any of these approaches, the raw data then becomes ‘grouped’ data that’s logically organized and easy to understand. To visually represent the data, we then use graphs, charts, tables etc.

Let’s look at the different types of measurement and the statistical methods that belong to each:

Measures of Central Tendency are used to describe data by determining a single representative central value, for example the mean, median or mode.

Measures of Dispersion are used to determine how spread out a data distribution is with respect to the central value, e.g. the mean, median or mode. While central tendency gives us the average or central value, it doesn’t describe how the data is distributed within the set.

Measures of Frequency Distribution are used to describe the occurrence of data within the data set (count).

The methods of each measure are summarized in the table below:

Mean: The most popular and well-known measure of central tendency. The mean is equal to the sum of all the values in the data set divided by the number of values in the data set.

Median: The median is the middle score for a set of data that has been arranged in order of magnitude. If you have an even number of data, e.g. 10 data points, take the two middle scores and average the result.

Mode: The mode is the most frequently occurring observation in the data set.  

Range: The difference between the highest and lowest value.

Standard deviation: Standard deviation measures the dispersion of a data set relative to its mean and is calculated as the square root of the variance.

Quartile deviation: Quartile deviation measures the spread of the middle of the data; it is half the difference between the upper quartile and the lower quartile.

Variance: Variance measures the variability of the data around the average, or mean.

Absolute deviation: The absolute deviation of a dataset is the average distance between each data point and the mean.

Count: How often each value occurs.
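Purely as an illustration, the base R sketch below computes each of the measures listed above for a small made-up vector; note that R’s mad() is the median absolute deviation, so the mean absolute deviation is written out explicitly, and var() and sd() use the sample (n - 1) formulas.

```r
# Made-up data for illustration only
x <- c(3, 7, 7, 8, 10, 12, 15, 18)

mean(x)                                      # mean
median(x)                                    # median
names(which.max(table(x)))                   # mode: the most frequently occurring value
max(x) - min(x)                              # range
sd(x)                                        # standard deviation
(quantile(x, 0.75) - quantile(x, 0.25)) / 2  # quartile deviation
var(x)                                       # variance
mean(abs(x - mean(x)))                       # absolute (mean) deviation from the mean
table(x)                                     # count: how often each value occurs
```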

Scope of descriptive statistics in research

Descriptive statistics (or descriptive analysis) is broader in scope than many other quantitative and qualitative methods, as it provides a much wider picture of an event, phenomenon or population.

But that’s not all: it can use any number of variables, and as it collects data and describes it as it is, it’s also far more representative of the world as it exists.

However, it’s also important to consider that descriptive analyses lay the foundation for further methods of study. By summarizing and condensing the data into easily understandable segments, researchers can further analyze the data to uncover new variables or hypotheses.

Mostly, this practice is all about the ease of data visualization. With data presented in a meaningful way, researchers have a simplified interpretation of the data set in question. That said, while descriptive statistics helps to summarize information, it only provides a general view of the variables in question.

It is, therefore, up to the researchers to probe further and use other methods of analysis to discover deeper insights.

Things you can do with descriptive statistics

Define subject characteristics

If a marketing team wanted to build out accurate buyer personas for specific products and industry verticals, they could use descriptive analyses on customer datasets (procured via a survey) to identify consistent traits and behaviors.

They could then ‘describe’ the data to build a clear picture and understanding of who their buyers are, including things like preferences, business challenges, income and so on.

Measure data trends

Let’s say you wanted to assess propensity to buy over several months or years for a specific target market and product. With descriptive statistics, you could quickly summarize the data and extract the precise data points you need to understand the trends in product purchase behavior.

Compare events, populations or phenomena

How do different demographics respond to certain variables? For example, you might want to run a customer study to see how buyers in different job functions respond to new product features or price changes. Are all groups as enthusiastic about the new features and likely to buy? Or do they have reservations? This kind of data will help inform your overall product strategy and potentially how you tier solutions.

Validate existing conditions

When you have a belief or hypothesis but need to prove it, you can use descriptive techniques to ascertain underlying patterns or assumptions.

Form new hypotheses

With the data presented and surmised in a way that everyone can understand (and infer connections from), you can delve deeper into specific data points to uncover deeper and more meaningful insights — or run more comprehensive research.

Guiding your survey design to improve the data collected

To use your surveys as an effective tool for customer engagement and understanding, every survey goal and item should answer one simple, yet highly important question:

What am I really asking?

It might seem trivial, but by having this question frame survey research, it becomes significantly easier for researchers to develop the right questions that uncover useful, meaningful and actionable insights.

Planning becomes easier, questions clearer and perspective far wider and yet nuanced.

Hypothesize – what’s the problem that you’re trying to solve? Far too often, organizations collect data without understanding what they’re asking, and why they’re asking it.

Finally, focus on the end result. What kind of data do you need to answer your question? Also, are you asking a quantitative or qualitative question? Here are a few things to consider:

  • Clear questions are clear for everyone. It takes time to make a concept clear
  • Ask about measurable, evident and noticeable activities or behaviors.
  • Make rating scales easy. Avoid long lists, confusing scales or “don’t know” or “not applicable” options.
  • Ensure your survey makes sense and flows well. Reduce the cognitive load on respondents by making it easy for them to complete the survey.
  • Read your questions aloud to see how they sound.
  • Pretest by asking a few uninvolved individuals to answer.

Furthermore…

As well as understanding what you’re really asking, there are several other considerations for your data:

Keep it random

How you select your sample is what makes your research replicable and meaningful. Having a truly random sample helps prevent bias, increasing the quality of evidence you find.

Plan for and avoid sample error

Before starting your research project, have a clear plan for avoiding sample error. Use larger sample sizes, and apply random sampling to minimize the potential for bias.

Don’t over sample

Remember, you can sample 500 respondents selected randomly from a population and they will closely reflect the actual population 95% of the time.

Think about the mode

Match your survey methods to the sample you select. For example, how do your current customers prefer communicating? Do they have any shared characteristics or preferences? A mixed-method approach is critical if you want to drive action across different customer segments.

Use a survey tool that supports you with the whole process

Surveys created using survey research software can support researchers at every stage of the process.

These considerations have been included in Qualtrics’ survey software , which summarizes and creates visualizations of data, making it easy to access insights, measure trends, and examine results without complexity or jumping between systems.

Uncover your next breakthrough idea with Stats iQ™

What makes Qualtrics so different from other survey providers is that it is built in consultation with trained research professionals and includes high-tech statistical software like Qualtrics Stats iQ .

With just a click, the software can run specific analyses or automate statistical testing and data visualization. Testing parameters are automatically chosen based on how your data is structured (e.g. categorical data will run a statistical test like Chi-squared), and the results are translated into plain language that anyone can understand and put into action.

Get more meaningful insights from your data

Stats iQ includes a variety of statistical analyses, including: describe, relate, regression, cluster, factor, TURF, and pivot tables — all in one place!

Confidently analyze complex data

Built-in artificial intelligence and advanced algorithms automatically choose and apply the right statistical analyses and return the insights in plain English so everyone can take action.

Integrate existing statistical workflows

For more experienced stats users, built-in R code templates allow you to run even more sophisticated analyses by adding R code snippets directly in your survey analysis.

Advanced statistical analysis methods available in Stats iQ

Regression analysis – Measures the degree of influence of independent variables on a dependent variable (the relationship between two or multiple variables).

Analysis of Variance (ANOVA) test – Commonly used with a regression study to find out what effect independent variables have on the dependent variable. It can compare multiple groups simultaneously to see if there is a relationship between them.

Conjoint analysis – Asks people to make trade-offs when making decisions, then analyses the results to give the most popular outcome. Helps you understand why people make the complex choices they do.

T-Test – Helps you compare whether two data groups have different mean values and allows the user to interpret whether differences are meaningful or merely coincidental.

Crosstab analysis – Used in quantitative market research to analyze categorical data – that is, variables that are different and mutually exclusive, and allows you to compare the relationship between two variables in contingency tables.

Go from insights to action

Now that you have a better understanding of descriptive statistics in research and how to apply statistical analysis methods correctly, it’s time to use a tool that can take your research and subsequent analysis to the next level.

Try out a Qualtrics survey software demo so you can see how it can take you through descriptive research and further research projects from start to finish.


Introduction to Research Methods

10 Descriptive Statistics

This introduction doesn’t actually introduce the topic, but is rather meant as a reminder about how this and subsequent chapters will be structured. The first half will describe the concepts used in the chapter, and why they’re useful. There won’t be any coding shown in that portion of the chapter, but there will be examples of the type of output we’re discussing. The text is meant to be read just like any other book.

The second half of the chapter, the practice section, will walk students through creating all of the statistics we describe in the first half using R. The second half of the chapter should be read “actively” while practicing the code yourself. Most of learning to code is just taking code someone else has produced and practicing it until you know it.

10.1 Concepts

Descriptive statistics are a first step in taking raw data and making something more meaningful. The most common descriptive statistics either identify the middle of the data (mean, median) or how spread out the data is around the middle (percentiles, standard deviation). The statistics we calculate as descriptive statistics will be useful for many of the more advanced lessons we’ll encounter later, but they are important on their own as well.

Descriptive statistics are useful for exactly what it sounds like it would be: describing something. Specifically, describing data. Why does data need to be described? Because raw data is difficult to digest and a single data point doesn’t tell us very much.

10.1.1 Data

We’ve used the word data in a few different ways throughout this book. Data can be words, data can be numbers, data can be pictures, data can be anything. But one of the most common associations of the term is with a spreadsheet. If I tell a colleague to “send me the data” I probably mean send me a spreadsheet with the information we’re discussing. When we discuss using data in the upcoming chapters, we’re discussing using a spreadsheet. Just so that we can fully understand that use of the term, let’s discuss the anatomy of data/spreadsheets in a little more detail.

Let’s start by looking outside of R at the most popular spreadsheet program available: Excel.

Data is made up of rows and columns. Rows run from side to side on the sheet, while columns run up and down. Each data point falls into a cell, which can be identified by the exact row and column it occupies in the data.


We want to label our columns with a short phrase that indicates what the data points in that column represent (Age, Education). Each column holds the same information for all of the rows in the data, while each row has the data for a single observation. Below I show a few rows of some made-up data. The first person in the data is 18, has 12 years of education, and is not married. The people in the data are 18, 45, 32, 74, 52, and 34 years old. Each column has a name, and typically rows just have a number.


There are a lot of names for a spreadsheet. You can call it a data set, or a data frame, or just the data. Data set and data frame are pretty common, and I’ll slip back and forth between them.

10.1.2 Summary Statistics

Okay, so why does data need describing then? There are a few good reasons to use descriptive statistics. One is in order to condense data and another is for comparisons . We’ll talk about both in this chapter, and we’ll keep coming back to those words: condense and compare.

Let’s say my child is a student at Wright Elementary in Sonoma, California. Every year schools take the same standardized math test, and when the scores come out I want to know whether my school is good or not. Finally the reporting day arrives and the results are announced: Wright Elementary scored 668.3.

Is that good or bad?

I don’t know. That’s just an absolute figure, which doesn’t tell me anything about how any other school did.

So I want to know how my school did, not just in the absolute sense of how many points it earned, but relative to all other schools in the state.

I call the head of the Department of Education and I say “show me the data!” and they do…

That’s a lot of data! Number after number is in there, and somewhere in the list is the score from Wright Elementary at 668.3. Okay, so I’ve got the data; now what do I do to understand it? Just having raw data won’t help me make better decisions unless I can organize it in some way to answer my question.

Take a look at that list again then: does 668.3 look high or low? It’s lower than the first number, but higher than the second and third. But I don’t want to take the time to compare my school to every other school individually. That would be exhausting even with just the 420 schools in our California data. Rather than doing 420 individual comparisons, let’s have R do some math for us.

If I want to understand how well Wright Elementary is doing, it’d be useful to summarize the data in some sort of clear way. I can start by measuring the middle of the data, using the average or the mean.

10.1.2.1 Mean and Median

Mean is just a mathy word for average that you’ll rarely see outside of math classes. We can use the two terms interchangeably in this book, but in most math it’ll just be referred to as the mean.

To calculate the mean, you add up all the individual values and divide it by the total number of observations. Of course, you’ll never have to do that again because R can do it for you a lot quicker than you can. But it’s still worth understanding what the gears in the machine are doing: adding up all the values in a column, and dividing it by how many rows there are.

Average is perhaps the most commonly discussed statistic in the world. Every year you’ll hear reports about whether test scores are increasing or decreasing based on statewide averages. Sports fans know the average number of points their favorite basketball player scores or the batting average of baseball players.

Average can to some degree be taken as the expected value from the data. If you took a random data point, your best guess might be that it would be close to the average. If you go to a basketball game and the best player averages 30 points, you probably intuitively expect them to score about 30 points. It should be stressed that they probably won’t. Picking a random data point or watching a random game doesn’t mean the figure will be anywhere near the mean. But it sets your expectations and provides you some guidance for the future.

Do you expect the food at a restaurant that averages 4.5 stars on yelp to be better than one that has 2.5 stars on average? The mean indicates something about the overall values in a data set, even if it doesn’t guarantee that any individual experience will be different.

So what the mean does is condense all of our data into one figure that tells us something about the middle of that data. And in this case the mean of our data is 653.3426.

And my kids school got 668.3. Great, that means my kids school is above average!

Does that mean Wright Elementary is better than half of the schools in the state? Not exactly. Sometimes the mean value of data will be the exact middle of all the values, but sometimes it won’t. That’s what another measure for the middle of the data tells us: the median.

The median is the exact middle of our data. If we have 3 numbers in our data, it’s the 2nd highest one. If we have 9 numbers, it’s the 5th highest. No matter what the highest and lowest numbers are in our data, the median will always be the middle number. If you have an even number of numbers it’s the average of the middle two.
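As a small preview of the practice section, here is a minimal base R version of both calculations on a made-up vector (not the real California test score data):

```r
# Made-up scores, not the real test score data
scores <- c(640, 652, 668, 671, 703)

sum(scores) / length(scores)  # the mean, computed "by hand"
mean(scores)                  # the same thing using the built-in function
median(scores)                # the middle value: the 3rd of the 5 ordered scores
```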

Let’s use an example to describe why we might want to look at both the mean and the median.

If the average test scores in a given school district are increasing from one year to the next, does that mean every school is improving? No. Let’s take 5 hypothetical schools for some school district, and change their scores a few different ways to see how the average shifts.

So to start the average test score for the 5 schools is 524. Let’s increase the average test score by 10 points in 3 different ways.

In the column labeled Change 1 all of the schools increase their scores by 10 points, so the average test score increases by 10 points in turn. Everyone is doing better.

In the second column only school E has improved its score though, from 750 to 800. However, to calculate the mean we just add together all the values and divide it by the number of rows, so regardless the average score still rises. The third change is even more stark - Schools A, B, C and D all had decreases in their scores, but because School E did so much better the average test scores for all the schools increased!

Now imagine being an administrator for this school district, and hearing that average test scores have risen for the district. That should be good news. But as we just showed, that can mean a lot of different things about the data. Depending on which scenario occurred, most of your schools are either improving or declining, even though the reported rise in the average is exactly the same. The average is a useful starting point to understanding our data, but it’s never sufficient on its own.

We have 5 schools, so the median figure will always be the 3rd highest test score.

In one of the scenarios the median stays the same (Change 2), in one it decreases (Change 3), and in one it increases (Change 1). That’s in contrast to the mean, which increased in all 3 scenarios. Here the median seems like a better measure of how the school district is changing, depending on which set of changes has occurred. The district might want to just report the mean, because rising scores look good for all the officials! But if they ignore the underlying change, they may not understand what is actually occurring at the schools.
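The individual school scores below are invented (the text only gives the starting average of 524 and School E’s jump from 750 to 800), but they reproduce the pattern described above: the mean rises by 10 points in every scenario, while the median stays put, rises, or falls depending on which change occurred.

```r
# Hypothetical scores for schools A through E; only the starting mean (524)
# and School E's jump from 750 to 800 come from the text, the rest is made up
start    <- c(400, 450, 480, 540, 750)
change_1 <- start + 10                  # every school improves by 10 points
change_2 <- c(400, 450, 480, 540, 800)  # only School E improves
change_3 <- c(390, 440, 470, 530, 840)  # Schools A-D decline, E improves a lot

sapply(list(start, change_1, change_2, change_3), mean)    # 524, 534, 534, 534
sapply(list(start, change_1, change_2, change_3), median)  # 480, 490, 480, 470
```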

Which isn’t to say that the mean should never be used. It tells us something about the data too, and it’ll often be used in the calculation of other mathy stuff later in the book. But understanding the difference can help you to sniff out times when someone might be using statistics to lie or trick you.

There’s a famous hypothetical of Bill Gates walking into a soup kitchen where there are 9 homeless people that have zero wealth. Bill Gates of course has nearly infinite wealth ($113 billion as of my Googling), which means the average wealth of people in the kitchen is now 113 billion divided by 10. So each homeless person is now worth $11.3 billion? No, they still have zero dollars, but the average has changed. The median, on the other hand, is still 0, because the people in the middle of the room still have 0 dollars.
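The same effect takes two lines of R, treating Bill Gates’s wealth as a round $113 billion:

```r
wealth <- c(rep(0, 9), 113e9)  # nine people with nothing, plus Bill Gates
mean(wealth)                   # 11.3 billion: the mean is dragged up by one extreme value
median(wealth)                 # 0: the middle of the room hasn't changed
```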

The mean and median are often misunderstood in everyday conversation.

There’s a famous joke attributed to the comedian George Carlin that is shown below:


Are half of people stupider than the average person? Not necessarily, but 50 percent of people are stupider than (or as stupid as) the median person. That makes the joke a bit more complicated, though, for the average person to understand.

10.1.2.2 Distribution

The median in the testing data is 652.45. That’s not far from the mean, 653.3426, but it’s not exactly the same either.

Whether the mean is equal to, above, or below the median is a good indication of whether your data is skewed. If the mean and median are equal, it’s a sign that the data is evenly distributed, and both figures are equally good at describing the middle of the data.

Skew just means not symmetrical, which in this context means that the distribution doesn’t fall evenly around the mean and median. The most famous distribution is the normal distribution, where the data is evenly distributed above and below the mean and the median. A fairly normal distribution is displayed below, with a mean and median of 100.


Normal distributions are really important for some of the mathematical stuff we do later. For now we can just accept that it’s important because it’s what we compare all distributions to, to understand how close or far from a normal distribution they are. Why we compare it is sort of hard to understand unless you know the magical powers that a normal distribution has, but that’s for a later chapter.

So let’s take a look at the distribution of all of the values from math scores in California. And we can add labels to show where the mean and median sit as well.


The first thing that might jump out at you is that this doesn’t look exactly like the normal distribution I showed above. It’s more lumpy in places, and it’s not quite evenly distributed above and below the median and mean. But the mean and median are still fairly close together. They can get much further apart with heavily skewed data.

Data can be skewed to the right, as shown below (we say skewed to the right because the “tail” of the data is pulled out to the right side).


What happens to the mean and median in that case? Let’s plot it again and see.


The mean is to the right of the median, another indicator that the data is skewed to the right.

Data can also be left skewed.


Here we see the data has a looooong tail to the left, and the mean is to the left of the median.

Skewed data comes up pretty often in the real world. For instance, let’s look at the distribution of income in the United States.


Income is heavily skewed to the right, which means the mean is above the median. We more often talk about the median income of citizens than the mean because the mean can increase primarily as a result of the wealthy becoming wealthier. If we’re concerned with how the typical American is doing, the median is actually a better measure of their status.

So a basic rule of thumb is to look at both the mean and the median. If they’re the same, you can just use the mean, since it’s easier for the average reader to understand. If they differ significantly, report them both, or just report the median.

10.1.2.3 Mode

We’ve talked about two measures of the middle so far: the mean and median. There’s one more measure that is a little less common, the mode, which can be overlooked in part because it’s used less in quantitative studies.

Mode is the most common value in a list of data. It wouldn’t be a great way of analyzing the data on test scores in California. There the mode is 636.7, which appears twice, but that doesn’t help us to understand what schools are good or bad, it just tells us the most common score. Where it is more useful is with characteristics in our data, particularly if we’re trying to assess what the most common feature is. I wouldn’t want to talk about what the average color of hair is for my students, but rather what the most common hair color is. Or in a more applied setting, I might want to report what the most common race of respondents to my survey is, rather than their average race.
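Base R doesn’t ship a mode function, so a small helper built on table() is the usual workaround; the hair colors below are made up for illustration.

```r
# Made-up categorical data
hair <- c("brown", "black", "brown", "blonde", "brown", "black", "red")

counts <- table(hair)             # how many times each category occurs
names(counts)[which.max(counts)]  # "brown", the modal (most common) hair color
```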

Let’s go through some examples where the mode was (or would have been) useful.

In 1950 the US Air Force was designing a new set of planes; in order to ensure that they would be comfortable for their pilots’ bodies, they took measurements of 4,000 pilots across 140 dimensions. That produces a lot of data! And with those measurements they fit the cockpit to the average pilot in the force: the average leg length, arm length, shoulder width, etc.

What happened? It was a disaster. The plane fit the “average” pilot, but no pilot actually fit the dimensions of the numerical average. It was uncomfortable for everyone, because it was designed for a composite individual that didn’t exist. A better strategy may have been to identify the modal pilot with the most common sets of features, and design the cockpit for that pilot. That might have not been comfortable for all the other pilots, but at least someone would get a plane they could control.

Similar question then: who is the average American? I often hear that politicians are attempting to appeal to the average American, but I don’t actually know who that is. If we take the numerical average of the nation’s demographics, they would be 51% female, 61.6% non-Hispanic white, and 37.9 years old. That doesn’t really sound like anyone I know, though. What we mean by average American is actually the most common American, which would indicate that what we really want to find is the modal American. But that phrase sounds a bit clunky, so maybe it won’t catch on.

If you’re interested in knowing who the modal American is there’s an episode of the podcast Planet Money that discusses that question and has a fairly interesting answer.

10.1.2.4 Outside the middle

So we have three measures for the middle of our data, each of which might be useful depending on the question we’re attempting to answer and the distribution of our data. But we’re not just concerned about the middle; the middle is a good place to start, but it isn’t the whole picture. Mean and median are great for condensing lots of data into a single measure that gives us some handle on what the data looks like, but relying on them alone means ignoring everything that is far away from those points.

Let’s return to figuring out whether Wright Elementary school did well on the math test or not. We know their score is above the average and the median by a few points. But that’s all we know so far.

Are they the best school in California? The highest value in our data is called the max or maximum, and so the max value is the school we would say did best. Was that Wright Elementary? Sadly (for their students), no. Los Altos got a 709.5, the highest score in that year.

At the other end of the spectrum is the min or minimum, which, as you’ve probably guessed, is the lowest value in the data. In the case of the test score data that was Burrel Union Elementary with 605.4. The min and the max are useful points to give you a feel for how spread out the data is, and perhaps what a reasonable change in the data might be. It probably wouldn’t be a good idea for the principal of Wright Elementary to set a goal of adding 200 points to their math test score the next year, since that would far exceed what any school had achieved. A better goal might be a small improvement, or to match the best school in the state.
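In R the minimum and maximum are one function call each, or both at once with range(); the scores below are a made-up vector rather than the full test score data.

```r
# Made-up scores, not the full California data
scores <- c(605.4, 640.2, 653.3, 668.3, 709.5)

min(scores)    # the lowest value in the data
max(scores)    # the highest value in the data
range(scores)  # both at once, as c(min, max)
```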


10.1.2.5 Relative Figures

So we can describe the middle (mean, median, mode) or the ends of our data (min and max). Those are all really valuable for getting a quick look at your data, and they’re among the most common descriptive statistics used in research.

Another way of quickly summarizing your data would be to split it into percentiles. Earlier we referred to the score at Wright Elementary as an absolute figure. The score of 668.3 didn’t mean anything on its own; it was just the value we had. That’s in contrast to what are called relative figures, which tell us something implicitly comparative about the data.

Percentiles don’t tell you what any one school in the data scored, but rather where a school is relative to all others in the state. In order to calculate percentiles, you essentially sort all of the values from lowest to highest, and put them into 100 equally sized groups. If you had 100 numbers in your data, the lowest number would be the 1st percentile, the second number would be the 2nd percentile, and so on. If you had 1000 numbers in your data, the lowest 1/100 (or the lowest 10) would be in the first percentile, with higher numbers sorting into higher percentiles.

The benefit of reporting percentiles is that they take absolute figures, which often don’t mean anything on their own, and turn them into something that tells you the relative rank of the figure compared to everything else. For instance, I see percentiles every time I take my toddler for a health check up, after they weigh and measure her. The fact that she was 27 inches tall doesn’t mean a lot to me, because what I really care about is whether she’ll be taller than her classmates at daycare. That’s why they also report her percentile height for kids of her age and gender - so that I know whether she’s taller or shorter than other similar kids.

We’ve already met one percentile earlier. The median represents the middle value in our data, so it is also the 50th percentile. If your kid is the median height, that would mean they were taller than 50 percent of kids of a similar age, but also shorter than the other 50 percent. If they are in the 70th percentile, they are taller than 70 percent of other kids and shorter than the other 30 percent. If they are in the 27th percentile, they are taller than 27 percent of other kids that age and shorter than 73 percent.

Earlier we talked about how Wright Elementary is better than average on the math test, and scored somewhere between the average and the maximum value. But that’s all we knew so far. Percentiles give us a much more precise estimate. Wright Elementary scored in the 79th percentile. That means they did better than 79 percent of other schools in the state, but also worse than 21 percent. That’s really good. And to some degree that closes our quest to understand whether Wright Elementary did well or poorly on the math test. They weren’t the best, but they were better than a lot of other schools in the state. They should feel good about that.
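As a sketch on made-up scores rather than the real data: quantile() returns the score sitting at a given percentile, and ecdf() goes the other way, telling you what fraction of the data falls at or below a particular score.

```r
# Made-up test scores for illustration
scores <- c(610, 622, 635, 641, 650, 652, 661, 668, 684, 702)

quantile(scores, c(0.25, 0.50, 0.75))  # scores at the 25th, 50th, and 75th percentiles
ecdf(scores)(668)                      # 0.8: 80 percent of these scores are at or below 668
```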

10.1.2.6 Noise

To this point we’ve learned a few different ways to condense our data into a few different measures that help us get a quick idea of what our data contains. But along with the middle of our data we also often want to know how spread out or noisy the data is. Why do we care about how noisy or dispersed our data is?

The dispersion of your data gives you evidence of how representative the mean is of the data. If the data is highly dispersed, each individual observation is more likely to be further away from the mean. Less dispersed data is the opposite, tightly clustered around the mean.

Let’s imagine you’re choosing where to go for dinner. There are two new places you’ve heard about and want to check out; you look at yelp and see they have really similar ratings (out of 5). We’ll call one Oscar’s and one Luis’s (based on restaurants I like in my home town) and look at the average ratings at both.

That’s pretty close. It’s tough to pick between them. So you look closer and notice that Luis’s has really high variance or dispersion in its reviews. There are a lot of 5’s, but also a lot of 1’s. Oscars on the other hand is more consistently rated around a 4. For Luis’s, the mean isn’t very indicative of the typical experience, but for Oscar’s you know what to expect with just that number. That’s because Luis’s data is more dispersed.

Why is the dispersion so different? It turns out that Luis's brother works there as a chef, and he is awful. Anytime someone rates the restaurant after eating one of the dishes he cooked, it gets a bad review. But the other cooks are top notch. Oscar's chefs, on the other hand, are far more consistent. So the choice would depend on whether you want a chance at the better meal and are willing to take a risk on getting food poisoning, or whether you'd rather just know that your food will be good - but not great.

We measure dispersion using the standard deviation. The standard deviation measures how spread out the data typically is around the mean, and it gives us a range to work with. We can think of the mean, plus or minus the standard deviation, as a range we can expect most of our data to fall in. Not all of the data will fall in that range, but most of it will or should.

Let’s look at the test scores data again. We had a mean of 653.3, and a standard deviation of 18.75 points. That means a typical school scored around 653 points, plus or minus 18.7. Let’s look at a graph of that again to illustrate.

[Figure: distribution of school test scores, with blue lines marking one standard deviation below and above the mean.]

Most of our data falls within the range set by the two blue lines. We would generally say that schools between the blue lines were close to average. Some were above and some were below, but that's the sort of variation we're willing to attribute to random chance. A school that scored one point below the average and one that scored one point above aren't considered fundamentally different; they just did a little better or worse than each other. Some data does fall outside that range, though, and that indicates those schools did atypically well or poorly on the test.

To calculate the standard deviation by hand we need to:

  • Calculate the mean
  • Subtract the mean from each individual observation, and square the result
  • Calculate the mean of the squared differences (for a sample, divide their sum by the number of observations minus one rather than the number of observations, which is what R's sd() does)
  • Take the square root of that figure

That’s a mouth full. We’ll see below that we can calculate standard deviation with only a few keystrokes in R

Which is to say that calculating the standard deviation by hand is not the important lesson here. What matters is understanding what it is telling you about the data.

Let’s say you’re going to a basketball game, and the best players on both teams average around 25 points. But the standard deviation in their scoring is quite different.

Which player should you be more confident will score close to 25 points at the game? Smith. She has a much smaller standard deviation, so you can be confident that in a typical game she'll score between 22 and 28 points. Jones runs hot or cold; she might score 37, but she could just as easily score 13. There's a lot more variation in her games.

10.1.2.7 Summary

This chapter has worked through a lot of terminology: means, medians, and modes, along with mins and maxes, and let's not forget percentiles or the standard deviation. Actually, let's not forget any of it. All of the terms we've covered in this chapter will come up again as we work into more and more of the statistics researchers use to explain the world.

But they're also important on their own. Whenever you collect or use data, it's important that you give the reader some summary statistics like the ones we outlined above so that they can begin to understand what you're working with. Even if you use a common data source, like the US Census, I won't know exactly what that data looks like unless you tell me about it. So before we get to the practice of calculating or outputting descriptive statistics, let's look at the descriptive statistics used in a few journal articles.

What we're really talking about is a descriptive statistics table: a table that lists the variables that are used or important in your study, along with some brief descriptive statistics (like the ones described above), so that the reader gets a better understanding of your data.

[Table: descriptive statistics from the Hurricane Katrina gentrification study described below.]

That table was from a quantitative study I did with a colleague where we looked at whether poor neighborhoods damaged by Hurricane Katrina were more or less likely to gentrify over the decade that followed. The line for Gentrified shows that 61.4% of the 101 neighborhoods we studied did gentrify. Damage from Katrina measures the percentage of all housing in each neighborhood that was damaged by Hurricane Katrina; the data shows that the average neighborhood had 19% of its housing damaged, with the lowest amount being 0% and the largest 62.3%. The distance to the CBD (Central Business District, or downtown) shows that the average neighborhood in our sample of 101 was 2.4 miles from downtown, with the furthest neighborhood being 6.3 miles away. Looking at the standard deviation, you can see that most neighborhoods were between 1.4 and 3.4 miles from downtown. The percentage of residents that were black in 2000 shows that the average neighborhood in our study was 78.3% black, with a range of 7% to 100%. That data is much more spread out, so the standard deviation is 23.5. Note that 78.3 + 23.5 equals 101.8, which isn't possible (no neighborhood can be more than 100% anything), and that's okay: the mean plus or minus the standard deviation doesn't have to land on meaningful values in that case; it's just there to indicate how spread out the data is.

You’ll see descriptive statistics used in qualitative research too. Wait, you might say, qualitative research focuses on words - why would you present the average of something? So that the reader can understand who your average or typical respondent was. See the table below from the article More Than An Eyesore by Garvin and coauthors. It’s good for me to know that their sample was 59% male, typically unmarried, all Black, etc. Numbers like those are easier to read in the form of a table than writing them out, and they provide important context for your results.

[Table: descriptive statistics describing the interview sample in More Than An Eyesore by Garvin and coauthors.]

10.2 Practice

So now that we are starting to understand the numbers that go into a descriptive statistics table, and we’ve seen a few examples, let’s make one ourselves. Again, we’re not going to spend a lot of time on calculating things, because R wants to do those things for us.

Let’s start by just calculating some of the statistics we have above just using R. Of course, we’ll need some data to calculate these things, so be sure to load the data on California Schools that I’m using to practice. You can copy that line of code into an R script to run it.

Let’s calculate the mean and median score on the reading test (since we’ve already spent so much time talking about math). R tries to keep its commands fairly basic and easy to understand, so to calculate mean you use mean() and for median you use median().

The mean is 654.9705 and the median is 655.75; those are pretty similar numbers, so we could probably report either one if we wanted to.

We can calculate the min and the max with (drum roll)… the commands min() and max().
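A sketch consistent with that:

    min(CASchools$read)   # lowest reading score in the data
    max(CASchools$read)   # highest reading score in the data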

And to calculate the standard deviation we can skip those four steps described above and use the command sd().
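Something like:

    sd(CASchools$read)   # standard deviation of the reading scores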

The standard deviation is roughly 20, meaning that a typical school scored around 655 on the reading test, plus or minus 20 points.

Those are all statistics that you might see in a descriptive statistics table. You could run all of those commands for all of the variables you’re interested in and build a table by hand by copying and pasting the output into the table.

Or, you can have R work on building the table for you. The summary() command will actually give you a whole set of summary statistics with just one line of code.
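A sketch of that one line:

    summary(CASchools$read)   # min, 1st quartile, median, mean, 3rd quartile, and max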

Let’s say I want descriptive statistics for more than one column in my data. The summary() command can do that, it can also produce statistics for an entire data set at once.

That gets a little messier to read though.

Okay, here are the more advanced lessons. Above I just produced the descriptive statistics for all 15 variables in my data set. But maybe I don't want all 15; maybe I just want a few of the columns. I can create a new data set in R with just the columns I actually want. Let's say my analysis is focused on the math (math) and reading (read) scores for schools, along with the number of students (students) and the median income of parents (income). I can select those columns by name, as I do below.
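One way to write that line (a sketch; the original may have used a slightly different but equivalent form):

    # keep just the four columns, in a new object
    CASchools2 <- CASchools[, c("read", "math", "students", "income")]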

I’m going to break that down in detail here, but it may not still completely make sense until you practice it 100 times. I figured out that line of code by googling “how to select columns by name” dozens of times until I learned the way to do it. I’d find some post online that had the answer, and just copy the code and change the names to match my data. That’s most of what learning to code is all about. But anyways, the explanation:

[Figure: the column-selection line of code, annotated piece by piece.]

What this line of code is saying overall is: look in the object called CASchools, find the columns named read, math, students, and income, and make those into a new object called CASchools2.

New name. I'm creating a new object here called CASchools2. That way I'll still have the old data set CASchools in my environment with all the columns, but also a new data set called CASchools2 with just the 4 columns I want. I could name it anything I want, but I chose the name CASchools2 so that it would be similar to the original data; the "2" is added so I know it is a different version.

I need to tell R what data I'm taking the columns from, so I need to identify CASchools by name.

I then tell R the list of variables I want from CASchools. To make a list I need to include the c before the parentheses, similar to how we created an object called y with the values 1, 2, 3 in the last chapter. So I'm saying: take this list of 4 columns that are in CASchools.

This would be a really good time to practice that line of code, and try to extract the columns "lunch", "computer", "expenditure", "income", or some other list. Remember, to look up all the column names in a data set you can use the command names(). It'd also be a good idea to change the name of the new object; try any word you want instead of CASchools2. Change everything until the code gives you an error message, and then go back a step to something that worked.

Okay, but for now we’ve got fewer columns in our data frame called CASchools2, so there will be less text in our summary statistics.
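For example:

    summary(CASchools2)   # now only the four kept columns are summarised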

10.2.1 Advanced Practice

Anyone not interested in continuing to practice their coding skills can get off the bus here. This section will move a little faster (with less explanation), and these exercises are meant for people who are actively looking to see the full potential of using R in a project.

You can copy the output of a table produced by summary() and put it in a Word document, but below I'm going to give you some more code to show how I actually build a descriptive statistics table for one of my papers.

With a few more steps though you can A) select the exact statistics you want for your summary table and B) add the standard deviation.

Fair warning: there might be a better or more efficient way to produce what I do below. In fact, there probably is. If you can improve on it, great! Please tell me how. I'm really bad at using loops, so a loop might improve this code, but I'm not sure where to start, and just reusing this code is quicker than learning how to improve it.

First I create a new data frame using the command as.data.frame() with a list of the names of the variables I want. I don't use the short column names from the data, but rather more informative titles that start to tell the reader what each variable is. I'll use the same 4 variables we have in CASchools2 above.
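A sketch of that step; the display titles below are stand-ins for whatever wording you prefer:

    # a one-column data frame of display names, in the same order as the columns of CASchools2
    names <- as.data.frame(c("Reading score", "Math score", "Number of students", "Parental income"))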

And I’m going to rename the column in that data frame using colnames() as “Variables” so that I know exactly what it holds.
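Something like:

    colnames(names) <- "Variables"   # label the single column so we know what it holds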

I then generate a separate object for each variable with its summary statistics saved. I'll name those x1, x2, x3, and x4 - simple names that tell me the order I created them in.
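A sketch consistent with that description:

    x1 <- summary(CASchools2$read)       # six summary statistics for reading scores
    x2 <- summary(CASchools2$math)       # ...for math scores
    x3 <- summary(CASchools2$students)   # ...for the number of students
    x4 <- summary(CASchools2$income)     # ...for parental income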

Just to show you what that did, let’s look at object x1. It just has the same summary statistics we produced above for the variable read.

By saving it as an object, we can now select the elements we want from that list. The default summary statistics in R include 6 figures (min, 1st quartile, median, mean, 3rd quartile, and max), but we may not want to show all of those all the time.

First we combine the 4 different objects x1, x2, x3, and x4 with the command rbind(), which stands for row bind (it stacks the objects on top of each other as rows).
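Something like this, with x as a stand-in name for the stacked object:

    x <- rbind(x1, x2, x3, x4)   # one row of summary statistics per variable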

Then we select only the columns we want. Specifically, I’d like to keep the mean, min, and max and I’ll place those three columns in a new object called s (for summary).
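A sketch; "Mean", "Min.", and "Max." are the column labels that summary() produces:

    s <- x[, c("Mean", "Min.", "Max.")]   # keep only the three statistics we want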

Great, now the object s only has the 3 columns I wanted. Earlier I combined different rows of data with the command rbind(). Now I’ll combine different columns with cbind(), specifically the column of variable names we created earlier (names) and the 3 columns of summary statistics in s.
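A sketch, using s2 as a stand-in name for the combined table:

    s2 <- cbind(names, s)   # variable names in the first column, statistics alongside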

Two more steps to go. Now I'm going to generate the standard deviation for each variable, which isn't included in the default summary statistics, and I'll save that as an object named sd and give it the column name "SD".
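Something like:

    # standard deviations, in the same variable order as above
    sd <- as.data.frame(c(sd(CASchools2$read), sd(CASchools2$math),
                          sd(CASchools2$students), sd(CASchools2$income)))
    colnames(sd) <- "SD"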

And then we add that new column to the data frame we've been building, and we're done.
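Assuming the finished object is the s3 referred to below:

    s3 <- cbind(s2, sd)   # the complete descriptive statistics table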

That is closer to the type of summary statistics table that I would use in a paper. As a final step, I might also round all of the figures. I'll show one decimal place using the command round(), entering the name of the column followed by a comma and the number 1. If I wanted no decimals I could use the number 0, or if I wanted 2 decimal places I could use 2.
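For example:

    s3$Mean <- round(s3$Mean, 1)   # one decimal place; use 0 for none, 2 for two, and so on
    s3$Min. <- round(s3$Min., 1)
    s3$Max. <- round(s3$Max., 1)
    s3$SD   <- round(s3$SD, 1)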

If I click on the object s3 or run the command View(s3), I can copy and paste that output into my Word document for final formatting. Or I could write it to an Excel file, but that is for another lesson.

Child Care and Early Education Research Connections

Descriptive Statistics

This page describes graphical and pictorial methods of descriptive statistics and the three most common measures of descriptive statistics (central tendency, dispersion, and association).

Descriptive statistics can be useful for two purposes: 1) to provide basic information about variables in a dataset and 2) to highlight potential relationships between variables. The three most common descriptive statistics can be displayed graphically or pictorially and are measures of:

  • Central tendency
  • Dispersion
  • Association

Graphical/Pictorial Methods

There are several graphical and pictorial methods that enhance researchers' understanding of individual variables and the relationships between variables. Graphical and pictorial methods provide a visual representation of the data. Some of these methods include:

  • Histograms
  • Scatter plots
  • Geographic Information Systems (GIS)
  • Sociograms

Histograms

Visually represent the frequencies with which values of variables occur

Each value of a variable is displayed along the bottom of a histogram, and a bar is drawn for each value

The height of the bar corresponds to the frequency with which that value occurs

Scatter plots

Display the relationship between two quantitative or numeric variables by plotting one variable against the value of another variable

For example, one axis of a scatter plot could represent height and the other could represent weight. Each person in the data would receive one data point on the scatter plot that corresponds to his or her height and weight

Geographic Information Systems (GIS)

A GIS is a computer system capable of capturing, storing, analyzing, and displaying geographically referenced information; that is, data identified according to location

Using a GIS program, a researcher can create a map to represent data relationships visually

Sociograms

Display networks of relationships among variables, enabling researchers to identify the nature of relationships that would otherwise be too complex to conceptualize

Visit the following websites for more information:

Graphical Analytic Techniques

Geographic Information Systems

Glossary terms related to graphical and pictorial methods:

GIS, Histogram, Scatter Plot, Sociogram

Measures of central tendency are the most basic and, often, the most informative description of a population's characteristics. They describe the "average" member of the population of interest. There are three measures of central tendency:

  • Mean -- the sum of a variable's values divided by the total number of values
  • Median -- the middle value of a variable
  • Mode -- the value that occurs most often

Example: The incomes of five randomly selected people in the United States are $10,000, $10,000, $45,000, $60,000, and $1,000,000.

Mean Income = (10,000 + 10,000 + 45,000 + 60,000 + 1,000,000) / 5 = $225,000
Median Income = $45,000
Modal Income = $10,000

The mean is the most commonly used measure of central tendency. Medians are generally used when a few values are extremely different from the rest of the values (this is called a skewed distribution). For example, the median income is often the best measure of the average income because, while most individuals earn between $0 and $200,000, a handful of individuals earn millions.

Basic Statistics

Measures of Position

Glossary terms related to measures of central tendency:

Average, Central Tendency, Confidence Interval, Mean, Median, Mode, Moving Average, Point Estimate, Univariate Analysis

Measures of dispersion provide information about the spread of a variable's values. There are four key measures of dispersion:

  • Range
  • Variance
  • Standard deviation
  • Skew

Range  is simply the difference between the smallest and largest values in the data. The interquartile range is the difference between the values at the 75th percentile and the 25th percentile of the data.

Variance  is the most commonly used measure of dispersion. It is calculated by taking the average of the squared differences between each value and the mean.

Standard deviation , another commonly used statistic, is the square root of the variance.

Skew  is a measure of whether some values of a variable are extremely different from the majority of the values. For example, income is skewed because most people make between $0 and $200,000, but a handful of people earn millions. A variable is positively skewed if the extreme values are higher than the majority of values. A variable is negatively skewed if the extreme values are lower than the majority of values.

Example: The incomes of five randomly selected people in the United States are $10,000, $10,000, $45,000, $60,000, and $1,000,000:

Range = 1,000,000 - 10,000 = 990,000
Variance = [(10,000 - 225,000)² + (10,000 - 225,000)² + (45,000 - 225,000)² + (60,000 - 225,000)² + (1,000,000 - 225,000)²] / 5 = 150,540,000,000
Standard Deviation = √150,540,000,000 ≈ 387,995
Skew = Income is positively skewed
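These figures can be checked in R; note that R's built-in var() and sd() divide by n - 1 rather than n, so the variance and standard deviation are computed manually here to match the definitions above:

    incomes <- c(10000, 10000, 45000, 60000, 1000000)
    max(incomes) - min(incomes)               # range: 990,000
    mean((incomes - mean(incomes))^2)         # variance as defined above: 150,540,000,000
    sqrt(mean((incomes - mean(incomes))^2))   # standard deviation: about 387,995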

Survey Research Tools

Variance and Standard Deviation

Summarizing and Presenting Data

Skewness Simulation

Glossary terms related to measures of dispersion:

Confidence Interval, Distribution, Kurtosis, Point Estimate, Quartiles, Range, Skewness, Standard Deviation, Univariate Analysis, Variance

Measures of association indicate whether two variables are related. Two measures are commonly used:

  • Chi-square
  • Correlation

As a measure of association between variables, chi-square tests are used on nominal data (i.e., data that are put into classes: e.g., gender [male, female] and type of job [unskilled, semi-skilled, skilled]) to determine whether they are associated*

A chi-square is called significant if there is an association between two variables, and nonsignificant if there is not an association

To test for associations, a chi-square is calculated in the following way: Suppose a researcher wants to know whether there is a relationship between gender and two types of jobs, construction worker and administrative assistant. To perform a chi-square test, the researcher counts up the number of female administrative assistants, the number of female construction workers, the number of male administrative assistants, and the number of male construction workers in the data. These counts are compared with the number that would be expected in each category if there were no association between job type and gender (this expected count is based on statistical calculations). If there is a large difference between the observed values and the expected values, the chi-square test is significant, which indicates there is an association between the two variables.

*The chi-square test can also be used as a measure of goodness of fit, to test whether data from a sample come from a population with a specific distribution, as an alternative to the Anderson-Darling and Kolmogorov-Smirnov goodness-of-fit tests. As such, the chi-square test is not restricted to nominal data; with non-binned data, however, the results depend on how the bins or classes are created and the size of the sample.
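A minimal sketch of that procedure in R; the counts and labels below are made up for illustration:

    # contingency table of job type by gender (hypothetical counts)
    jobs <- matrix(c(10, 40,
                     35, 15),
                   nrow = 2, byrow = TRUE,
                   dimnames = list(Job = c("Construction", "Administrative"),
                                   Gender = c("Female", "Male")))
    chisq.test(jobs)   # compares observed counts with those expected under no association;
                       # a small p-value indicates the two variables are associated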

A correlation coefficient is used to measure the strength of the relationship between numeric variables (e.g., weight and height)

The most common correlation coefficient is Pearson's r, which can range from -1 to +1.

If the coefficient is between 0 and 1, as one variable increases, the other also increases. This is called a positive correlation. For example, height and weight are positively correlated because taller people usually weigh more

If the correlation coefficient is between -1 and 0, as one variable increases the other decreases. This is called a negative correlation. For example, age and hours slept per night are negatively correlated because older people usually sleep fewer hours per night
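A small illustration in R, with made-up heights (cm) and weights (kg):

    height <- c(160, 165, 170, 175, 180, 185)
    weight <- c(55, 60, 68, 72, 80, 85)
    cor(height, weight)   # Pearson's r; close to +1 here because taller people weigh more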

Chi-Square Procedures for the Analysis of Categorical Frequency Data

Chi-square Analysis

Glossary terms related to measures of association:

Association, Chi Square, Correlation, Correlation Coefficient, Measures of Association, Pearson's Correlational Coefficient, Product Moment Correlation Coefficient


Chapter 12: Descriptive Statistics

At this point, we need to consider the basics of data analysis in psychological research in more detail. In this chapter, we focus on descriptive statistics—a set of techniques for summarizing and displaying the data from your sample. We look first at some of the most common techniques for describing single variables, followed by some of the most common techniques for describing statistical relationships between variables. We then look at how to present descriptive statistics in writing and also in the form of tables and graphs that would be appropriate for an American Psychological Association (APA)-style research report. We end with some practical advice for organizing and carrying out your analyses.

Psychology Research Methods Copyright © by The Research Methods Teaching and Learning Group is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License , except where otherwise noted.


Descriptive Statistics: Reporting the Answers to the 5 Basic Questions of Who, What, Why, When, Where, and a Sixth, So What?

Affiliation.

  • 1 From the Department of Surgery and Perioperative Care, Dell Medical School at the University of Texas at Austin, Austin, Texas.
  • PMID: 28891910
  • DOI: 10.1213/ANE.0000000000002471

Descriptive statistics are specific methods basically used to calculate, describe, and summarize collected research data in a logical, meaningful, and efficient way. Descriptive statistics are reported numerically in the manuscript text and/or in its tables, or graphically in its figures. This basic statistical tutorial discusses a series of fundamental concepts about descriptive statistics and their reporting. The mean, median, and mode are 3 measures of the center or central tendency of a set of data. In addition to a measure of its central tendency (mean, median, or mode), another important characteristic of a research data set is its variability or dispersion (ie, spread). In simplest terms, variability is how much the individual recorded scores or observed values differ from one another. The range, standard deviation, and interquartile range are 3 measures of variability or dispersion. The standard deviation is typically reported for a mean, and the interquartile range for a median. Testing for statistical significance, along with calculating the observed treatment effect (or the strength of the association between an exposure and an outcome), and generating a corresponding confidence interval are 3 tools commonly used by researchers (and their collaborating biostatistician or epidemiologist) to validly make inferences and more generalized conclusions from their collected data and descriptive statistics. A number of journals, including Anesthesia & Analgesia, strongly encourage or require the reporting of pertinent confidence intervals. A confidence interval can be calculated for virtually any variable or outcome measure in an experimental, quasi-experimental, or observational research study design. Generally speaking, in a clinical trial, the confidence interval is the range of values within which the true treatment effect in the population likely resides. In an observational study, the confidence interval is the range of values within which the true strength of the association between the exposure and the outcome (eg, the risk ratio or odds ratio) in the population likely resides. There are many possible ways to graphically display or illustrate different types of data. While there is often latitude as to the choice of format, ultimately, the simplest and most comprehensible format is preferred. Common examples include a histogram, bar chart, line chart or line graph, pie chart, scatterplot, and box-and-whisker plot. Valid and reliable descriptive statistics can answer basic yet important questions about a research data set, namely: "Who, What, Why, When, Where, How, How Much?"




Descriptive Statistics

Descriptive statistics are used to describe the basic features of the data in a study. They provide simple summaries about the sample and the measures. Together with simple graphics analysis, they form the basis of virtually every quantitative analysis of data.

Descriptive statistics are typically distinguished from inferential statistics . With descriptive statistics you are simply describing what is or what the data shows. With inferential statistics, you are trying to reach conclusions that extend beyond the immediate data alone. For instance, we use inferential statistics to try to infer from the sample data what the population might think. Or, we use inferential statistics to make judgments of the probability that an observed difference between groups is a dependable one or one that might have happened by chance in this study. Thus, we use inferential statistics to make inferences from our data to more general conditions; we use descriptive statistics simply to describe what’s going on in our data.

Descriptive Statistics are used to present quantitative descriptions in a manageable form. In a research study we may have lots of measures. Or we may measure a large number of people on any measure. Descriptive statistics help us to simplify large amounts of data in a sensible way. Each descriptive statistic reduces lots of data into a simpler summary. For instance, consider a simple number used to summarize how well a batter is performing in baseball, the batting average. This single number is simply the number of hits divided by the number of times at bat (reported to three significant digits). A batter who is hitting .333 is getting a hit one time in every three at bats. One batting .250 is hitting one time in four. The single number describes a large number of discrete events. Or, consider the scourge of many students, the Grade Point Average (GPA). This single number describes the general performance of a student across a potentially wide range of course experiences.

Every time you try to describe a large set of observations with a single indicator you run the risk of distorting the original data or losing important detail. The batting average doesn’t tell you whether the batter is hitting home runs or singles. It doesn’t tell whether she’s been in a slump or on a streak. The GPA doesn’t tell you whether the student was in difficult courses or easy ones, or whether they were courses in their major field or in other disciplines. Even given these limitations, descriptive statistics provide a powerful summary that may enable comparisons across people or other units.

Univariate Analysis

Univariate analysis involves the examination across cases of one variable at a time. There are three major characteristics of a single variable that we tend to look at:

  • the distribution
  • the central tendency
  • the dispersion

In most situations, we would describe all three of these characteristics for each of the variables in our study.

The Distribution

The distribution is a summary of the frequency of individual values or ranges of values for a variable. The simplest distribution would list every value of a variable and the number of persons who had each value. For instance, a typical way to describe the distribution of college students is by year in college, listing the number or percent of students at each of the four years. Or, we describe gender by listing the number or percent of males and females. In these cases, the variable has few enough values that we can list each one and summarize how many sample cases had the value. But what do we do for a variable like income or GPA? With these variables there can be a large number of possible values, with relatively few people having each one. In this case, we group the raw scores into categories according to ranges of values. For instance, we might look at GPA according to the letter grade ranges. Or, we might group income into four or five ranges of income values.

One of the most common ways to describe a single variable is with a frequency distribution. Depending on the particular variable, all of the data values may be represented, or you may group the values into categories first (e.g. with age, price, or temperature variables, it would usually not be sensible to determine the frequencies for each value; rather, the values are grouped into ranges and the frequencies determined). Frequency distributions can be depicted in two ways, as a table or as a graph. For example, a table might show an age frequency distribution with five categories of age ranges defined, and the same frequency distribution can then be depicted in a graph. This type of graph is often referred to as a histogram or bar chart.
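As a small illustration (the ages below are made up), a grouped frequency distribution and the matching graph can be produced in R like this:

    ages <- c(19, 21, 21, 23, 25, 25, 25, 30, 34, 41)
    table(cut(ages, breaks = c(15, 20, 25, 30, 35, 45)))   # counts of people in each age range
    hist(ages)                                             # the same distribution as a histogram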

Distributions may also be displayed using percentages. For example, you could use percentages to describe the:

  • percentage of people in different income levels
  • percentage of people in different age ranges
  • percentage of people in different ranges of standardized test scores

Central Tendency

The central tendency of a distribution is an estimate of the “center” of a distribution of values. There are three major types of estimates of central tendency:

The Mean or average is probably the most commonly used method of describing central tendency. To compute the mean all you do is add up all the values and divide by the number of values. For example, the mean or average quiz score is determined by summing all the scores and dividing by the number of students taking the exam. Consider, for example, the following eight test score values: 15, 20, 21, 20, 36, 15, 25, 15.

The sum of these 8 values is 167 , so the mean is 167/8 = 20.875 .

The Median is the score found at the exact middle of the set of values. One way to compute the median is to list all scores in numerical order, and then locate the score in the center of the sample. For example, if there are 499 scores in the list, score #250 would be the median. If we order the 8 scores shown above, we would get: 15, 15, 15, 20, 20, 21, 25, 36.

There are 8 scores, and scores #4 and #5 represent the halfway point. Since both of these scores are 20, the median is 20. If the two middle scores had different values, you would have to interpolate to determine the median.

The Mode is the most frequently occurring value in the set of scores. To determine the mode, you might again order the scores as shown above, and then count each one. The most frequently occurring value is the mode. In our example, the value 15 occurs three times and is the mode. In some distributions there is more than one modal value. For instance, in a bimodal distribution there are two values that occur most frequently.

Notice that for the same set of 8 scores we got three different values ( 20.875 , 20 , and 15 ) for the mean, median and mode respectively. If the distribution is truly normal (i.e. bell-shaped), the mean, median and mode are all equal to each other.
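For reference, the same three figures can be computed in R; base R has no built-in function for the mode of a set of values, so a frequency table is used instead:

    x <- c(15, 20, 21, 20, 36, 15, 25, 15)   # the eight quiz scores
    mean(x)                                  # 20.875
    median(x)                                # 20
    sort(table(x), decreasing = TRUE)        # 15 appears three times, so the mode is 15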

Dispersion refers to the spread of the values around the central tendency. There are two common measures of dispersion, the range and the standard deviation. The range is simply the highest value minus the lowest value. In our example distribution, the high value is 36 and the low is 15 , so the range is 36 - 15 = 21 .

The Standard Deviation is a more accurate and detailed estimate of dispersion because an outlier can greatly exaggerate the range (as was true in this example, where the single outlier value of 36 stands apart from the rest of the values). The standard deviation shows the relation that the set of scores has to the mean of the sample. Again, let's take the same set of scores: 15, 20, 21, 20, 36, 15, 25, 15.

To compute the standard deviation, we first find the distance between each value and the mean. We know from above that the mean is 20.875. So, the differences from the mean are: -5.875, -0.875, 0.125, -0.875, 15.125, -5.875, 4.125, -5.875.

Notice that values that are below the mean have negative discrepancies and values above it have positive ones. Next, we square each discrepancy: 34.515625, 0.765625, 0.015625, 0.765625, 228.765625, 34.515625, 17.015625, 34.515625.

Now, we take these “squares” and sum them to get the Sum of Squares (SS) value. Here, the sum is 350.875 . Next, we divide this sum by the number of scores minus 1 . Here, the result is 350.875 / 7 = 50.125 . This value is known as the variance . To get the standard deviation, we take the square root of the variance (remember that we squared the deviations earlier). This would be SQRT(50.125) = 7.079901129253 .
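The same arithmetic can be checked in R:

    x <- c(15, 20, 21, 20, 36, 15, 25, 15)
    sum((x - mean(x))^2)   # sum of squares: 350.875
    var(x)                 # variance: 350.875 / 7 = 50.125
    sd(x)                  # standard deviation: about 7.08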

Although this computation may seem convoluted, it's actually quite simple. To see this, consider the formula for the standard deviation: s = √( Σ(X - X̄)² / (n - 1) ), where:

  • X is each score,
  • X̄ is the mean (or average),
  • n is the number of values,
  • Σ means we sum across the values.

In the top part of the ratio, the numerator, we see that each score has the mean subtracted from it, the difference is squared, and the squares are summed. In the bottom part, we take the number of scores minus 1 . The ratio is the variance and the square root is the standard deviation. In English, we can describe the standard deviation as:

the square root of the sum of the squared deviations from the mean divided by the number of scores minus one.

Although we can calculate these univariate statistics by hand, it gets quite tedious when you have more than a few values and variables. Every statistics program is capable of calculating them easily for you. For instance, I put the eight scores into SPSS and got a table reporting N = 8, mean = 20.875, median = 20, mode = 15, standard deviation = 7.0799, variance = 50.125, and range = 21, which confirms the calculations I did by hand above.

The standard deviation allows us to reach some conclusions about specific scores in our distribution. Assuming that the distribution of scores is normal or bell-shaped (or close to it!), the following conclusions can be reached:

  • approximately 68% of the scores in the sample fall within one standard deviation of the mean
  • approximately 95% of the scores in the sample fall within two standard deviations of the mean
  • approximately 99% of the scores in the sample fall within three standard deviations of the mean

For instance, since the mean in our example is 20.875 and the standard deviation is 7.0799 , we can from the above statement estimate that approximately 95% of the scores will fall in the range of 20.875-(2*7.0799) to 20.875+(2*7.0799) or between 6.7152 and 35.0348 . This kind of information is a critical stepping stone to enabling us to compare the performance of an individual on one variable with their performance on another, even when the variables are measured on entirely different scales.


Research Methodology and Descriptive Statistics Course

Dr. Henk van der Kolk

Dr. Lyset Rekes-Mombarg

Blended Learning

You follow this course together with pre-master's students of the BMS faculty. We expect PhD students who register for this course to participate actively. Although lectures and tutorials will not be organized separately for PhD students, the teachers will create a separate 'niche' where you can meet fellow PhD students. Although participation in lectures, tutorials, and discussion boards is not formally required, it is strongly recommended.

Please contact the lecturer for the actual schedule (about 36 sessions in 2 months starting September 2, 2024)


ORIGINAL RESEARCH article

The Impact of Improper Waste Disposal on Human Health and the Environment: A Case of uMgungundlovu District Municipality in KwaZulu-Natal Province of South Africa (provisionally accepted)

  • 1 University of the Free State, South Africa

The final, formatted version of the article will be published soon.

Waste generation has increased drastically worldwide in recent decades, with less than 20% of waste recycled each year and one-third of all food produced wasted. With Sustainable Development Goal 12 advocating for changing how we consume, produce, and dispose of items, driving a more sustainable future depends crucially on how we dispose of our waste. Improper waste disposal has always been a global concern. This study assessed the impacts of improper waste disposal on human health and the environment in the KwaZulu-Natal Province of South Africa. The study applied a mixed-method approach to a semi-structured questionnaire. Using the Statistical Package for the Social Sciences (SPSS) and Microsoft Excel, the study applied a series of chi-squared tests of independence, regression, and descriptive statistics to the data. The study sheds light on the complex dynamics surrounding the awareness and perception of risks associated with improper waste disposal. While a fair level of knowledge exists concerning the general risks, there are notable gaps in understanding specific human health risks related to improper waste disposal. The statistically insignificant relationships between demographic variables and critical questions regarding risk awareness suggest that demographic factors do not significantly influence awareness. This implies a need for targeted educational campaigns that transcend demographic boundaries to address the identified gaps in knowledge. Furthermore, the findings highlight a critical disparity in awareness regarding specific human health risks associated with improper waste disposal. This underscores the importance of enhancing public education and outreach programs to ensure a comprehensive understanding of the potential dangers to human health. The insignificant relationship between information availability and community concern about health impacts emphasizes the need for improved communication strategies. Efforts should focus on delivering accurate and accessible information to communities, fostering a sense of concern and responsibility regarding the health implications of improper waste disposal. The statistically significant relationship revealed by the regression model between the cost of clean-up for the municipality and waste generation necessitates re-evaluating waste management policies. The study municipality should explore sustainable waste management practices to mitigate the economic burden posed by increased waste generation.

Author contributions: Nelisiwe Manqele - Conceptualization, Formal analysis, Funding acquisition, Investigation, Methodology, Project administration, Resources, Software.

Received: 14 Feb 2024; Accepted: 13 May 2024.

Copyright: © 2024 Raphela, Manqele and Erasmus. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY) . The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

* Correspondence: Dr. Tlou Raphela, University of the Free State, Bloemfontein, South Africa
