
1.4 - Example: Descriptive Statistics (Example 1-5: Women's Health Survey)

Let us take a look at an example. In 1985, the USDA commissioned a study of women’s nutrition. Nutrient intake was measured for a random sample of 737 women aged 25-50 years. The following variables were measured:

  • Calcium (mg)
  • Iron (mg)
  • Protein (g)
  • Vitamin A (μg)
  • Vitamin C (mg)

Using Technology


We will use SAS to carry out the calculations for this example.

Download the data file: nutrient.csv

The lines of this program are saved in a plain text file with a .sas extension. If SAS is installed on the machine to which you downloaded the file, opening it should launch SAS and load the program. Marking up a printout of the SAS program is also a good strategy for learning how it is put together.


The first part of the SAS output (available for download below) contains the results of the MEANS procedure (proc means). Because SAS output is usually a relatively long document, printing these pages and marking them up with notes is highly recommended, if not required!

Example: Nutrient Intake Data - Descriptive Statistics

The MEANS Procedure

[The proc means summary table (variable, N, mean, and standard deviation for each nutrient) is not reproduced here.]

Download the SAS Output file: nutrient2.lst

The first column of the Means Procedure table gives the variable name. The second column reports the sample size. This is followed by the sample means (third column) and the sample standard deviations (fourth column) for each variable. These values, rounded slightly to make them easier to work with, are used in the rest of this example.
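For readers who want to reproduce these summaries outside of SAS, a minimal pandas sketch is shown below; the column names are assumptions for illustration, not taken from nutrient.csv itself.

```python
import pandas as pd

# Column names are illustrative; adjust them to match nutrient.csv
df = pd.read_csv("nutrient.csv")
nutrients = ["calcium", "iron", "protein", "vit_a", "vit_c"]

# Sample size, mean, and standard deviation for each nutrient,
# analogous to the proc means output described above
print(df[nutrients].agg(["count", "mean", "std"]))
```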

Here are the steps to find the descriptive statistics for the Women's Nutrition dataset in Minitab:

Descriptive Statistics in Minitab

  • Go to File > Open > Worksheet [open nutrient_tf.csv ]
  • Choose Stat > Basic Statistics > Display Descriptive Statistics.
  • Highlight C2 through C6 and choose 'Select' to move the variables into the window on the right.
  • Select 'Statistics...' and check the boxes for the statistics of interest.

Descriptive Statistics

A summary of the descriptive statistics is given here for ease of reference.

Notice that the standard deviations are large relative to their respective means, especially for Vitamin A and C. This would indicate high variability among women in nutrient intake. However, whether the standard deviations are relatively large or not will depend on the context of the application. Skill in interpreting the statistical analysis depends very much on the researcher's subject matter knowledge.

The variance-covariance matrix is also copied into the matrix below.

\[S = \left(\begin{array}{rrrrr}157829.4 & 940.1 & 6075.8 & 102411.1 & 6701.6 \\ 940.1 & 35.8 & 114.1 & 2383.2 & 137.7 \\ 6075.8 & 114.1 & 934.9 & 7330.1 & 477.2 \\ 102411.1 & 2383.2 & 7330.1 & 2668452.4 & 22063.3 \\ 6701.6 & 137.7 & 477.2 & 22063.3 & 5416.3 \end{array}\right)\]

Interpretation

  • The sample variances are given by the diagonal elements of S . For example, the variance of iron intake is \(s_{2}^{2} = 35.8 \; \text{mg}^2\).
  • The covariances are given by the off-diagonal elements of S . For example, the covariance between calcium and iron intake is \(s_{12} = 940.1\).
  • Note that the covariances are all positive, indicating that the daily intake of each nutrient increases with increased intake of the remaining nutrients.

Because the covariance between calcium and iron is positive, we see that calcium intake tends to increase with increasing iron intake. The strength of this positive association can only be judged by comparing \(s_{12}\) to the product of the sample standard deviations for calcium and iron. This comparison is most readily accomplished by looking at the sample correlation between the two variables.
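For example, using the entries of \(S\) above, the sample correlation between calcium and iron is the covariance divided by the product of the two sample standard deviations:

\[r_{12} = \frac{s_{12}}{s_1 s_2} = \frac{940.1}{\sqrt{157829.4}\,\sqrt{35.8}} \approx \frac{940.1}{(397.3)(5.98)} \approx 0.40\]

so the association, while positive, is only moderate.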

Sample Correlations

The sample correlation matrix has ones on the diagonal, since each variable is perfectly correlated with itself, while the off-diagonal elements give the correlation between each pair of variables.
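If the raw data are available, both matrices can be computed directly; a minimal pandas sketch (with the same illustrative column names as the earlier sketch) is:

```python
import pandas as pd

df = pd.read_csv("nutrient.csv")  # column names below are illustrative
nutrients = ["calcium", "iron", "protein", "vit_a", "vit_c"]

S = df[nutrients].cov()    # sample variance-covariance matrix
R = df[nutrients].corr()   # sample correlation matrix (ones on the diagonal)
print(R.round(3))
```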

Generally, we look for the strongest correlations first. The results suggest that protein, iron, and calcium are all positively associated. Each of these three nutrients increases with increasing values of the remaining two.

The coefficient of determination is another measure of association and is simply equal to the square of the correlation. For example, in this case, the coefficient of determination between protein and iron is \((0.623)^2\) or about 0.388.

\[r^2_{23} = 0.62337^2 = 0.38859\]

This says that about 39% of the variation in iron intake is explained by protein intake. Or, conversely, about 39% of the variation in protein intake is explained by the variation in iron intake. Both interpretations are equivalent.


Essential Statistics for Data Science: A Case Study using Python, Part I


Get to know some of the essential statistics you should be very familiar with when learning data science


Our last post dove straight into linear regression. In this post, we'll take a step back to cover essential statistics that every data scientist should know. To demonstrate these essentials, we'll look at a hypothetical case study involving an administrator tasked with improving school performance in Tennessee.

You should already know:

  • Python fundamentals — learn on dataquest.io

Note: this tutorial is intended to serve solely as an educational tool and not as a scientific explanation of the causes of various school outcomes in Tennessee.

Article Resources

  • Notebook and Data: Github
  • Libraries: pandas, matplotlib, seaborn

Introduction

Meet Sally, a public school administrator. Some schools in her state of Tennessee are performing below average academically. Her superintendent, under pressure from frustrated parents and voters, approached Sally with the task of understanding why these schools are under-performing. Not an easy problem, to be sure.

To improve school performance, Sally needs to learn more about these schools and their students, just as a business needs to understand its own strengths and weaknesses and its customers.

Though Sally is eager to build an impressive explanatory model, she knows the importance of conducting preliminary research to prevent possible pitfalls or blind spots (e.g., cognitive biases). Thus, she engages in a thorough exploratory analysis, which includes: a lit review, data collection, descriptive and inferential statistics, and data visualization.

Sally has strong opinions as to why some schools are under-performing, but opinions won't do, nor will a handful of facts; she needs rigorous statistical evidence.

Sally conducts a lit review, which involves reading a variety of credible sources to familiarize herself with the topic. Most importantly, Sally keeps an open mind and embraces a scientific world view to help her resist confirmation bias (seeking solely to confirm one's own world view).

In Sally's lit review, she finds multiple compelling explanations of school performance: curricula, income, and parental involvement. These sources will help Sally select her model and data, and will guide her interpretation of the results.

Data Collection

The data we want isn't always available, but Sally lucks out and finds student performance data based on test scores ( school_rating ) for every public school in middle Tennessee. The data also includes various demographic, school faculty, and income variables (see readme for more information). Satisfied with this dataset, she writes a web-scraper to retrieve the data.

But data alone can't help Sally; she needs to convert the data into useful information.

Descriptive and Inferential Statistics

Sally opens her stats textbook and finds that there are two major types of statistics, descriptive and inferential.

Descriptive statistics identify patterns in the data, but they don't allow for making hypotheses about the data.

Within descriptive statistics, there are two measures used to describe the data: central tendency and deviation . Central tendency refers to the central position of the data (mean, median, mode) while the deviation describes how far spread out the data are from the mean. Deviation is most commonly measured with the standard deviation. A small standard deviation indicates the data are close to the mean, while a large standard deviation indicates that the data are more spread out from the mean.

Inferential statistics allow us to make inferences from a sample that can be generalized to the population. For Sally, this involves developing a hypothesis about her sample of middle Tennessee schools and applying it to her population of all schools in Tennessee.

For now, Sally puts aside inferential statistics and digs into descriptive statistics.

To begin learning about the sample, Sally uses pandas' describe method, as seen below. The column headers in bold text represent the variables Sally will be exploring. Each row header represents a descriptive statistic about the corresponding column.
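The article's notebook (linked above) contains the actual code; a minimal sketch of this step, assuming the data are saved as a CSV using the column names quoted in the article, might look like this:

```python
import pandas as pd

# File name is illustrative; see the linked GitHub repo for the real data
df = pd.read_csv("middle_tn_schools.csv")

# count, mean, std, min, quartiles, and max for each numeric column,
# e.g. school_rating, reduced_lunch, stu_teach_ratio
print(df.describe())
```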

Looking at the output above, Sally's variables can be put into two classes: measurements and indicators.

Measurements are variables that can be quantified. All data in the output above are measurements. Some of these measurements, such as state_percentile_16 , avg_score_16 and school_rating , are outcomes; these outcomes cannot be used to explain one another. For example, explaining school_rating as a result of state_percentile_16 (test scores) is circular logic. Therefore we need a second class of variables.

The second class, indicators, are used to explain our outcomes. Sally chooses indicators that describe the student body (for example, reduced_lunch ) or school administration ( stu_teach_ratio ) hoping they will explain school_rating .

Sally sees a pattern in one of the indicators, reduced_lunch . reduced_lunch is a variable measuring the average percentage of students per school enrolled in a federal program that provides lunches for students from lower-income households. In short, reduced_lunch is a good proxy for household income, which Sally remembers from her lit review was correlated with school performance.

Sally isolates reduced_lunch and groups the data by school_rating using pandas' groupby method and then uses describe on the re-shaped data (see below).
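A sketch of that reshaping step, under the same column-name assumptions as before:

```python
import pandas as pd

df = pd.read_csv("middle_tn_schools.csv")  # illustrative file name

# Distribution of reduced_lunch within each school_rating group
print(df.groupby("school_rating")["reduced_lunch"].describe())
```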

Below is a discussion of the metrics from the table above and what each result indicates about the relationship between school_rating and reduced_lunch :

count : the number of schools at each rating. Most of the schools in Sally's sample have a 4- or 5-star rating, but 25% of schools have a 1-star rating or below. This confirms that poor school performance isn't merely anecdotal, but a serious problem that deserves attention.

mean : the average percentage of students on reduced_lunch among all schools for each school_rating . As school performance increases, the average percentage of students on reduced lunch decreases. Schools with a 0-star rating have 83.6% of students on reduced lunch. At the other end of the spectrum, 5-star schools on average have 21.6% of students on reduced lunch. We'll examine this pattern further in the graphing section.

std : the standard deviation of the variable. Referring to the school_rating of 0, a standard deviation of 8.813498 indicates that 68.2% (refer to readme ) of all observations are within 8.81 percentage points on either side of the average, 83.6%. Note that the standard deviation increases as school_rating increases, indicating that reduced_lunch loses explanatory power as school performance improves. As with the mean, we'll explore this idea further in the graphing section.

min : the minimum value of the variable. This represents the school with the lowest percentage of students on reduced lunch at each school rating. For 0- and 1-star schools, the minimum percentage of students on reduced lunch is 53%. The minimum for 5-star schools is 2%. The minimum value tells a similar story as the mean, but looking at it from the low end of the range of observations.

25% : the bottom quartile; represents the lowest 25% of values for the variable, reduced_lunch . For 0-star schools, 25% of the observations are less than 79.5%. Sally sees the same trend in the bottom quartile as the above metrics: as school_rating increases the bottom 25% of reduced_lunch decreases.

50% : the second quartile; represents the lowest 50% of values. Looking at the trend in school_rating and reduced_lunch , the same relationship is present here.

75% : the top quartile; represents the lowest 75% of values. The trend continues.

max : the maximum value for that variable. You guessed it: the trend continues!

The descriptive statistics consistently reveal that schools with more students on reduced lunch under-perform when compared to their peers. Sally is on to something.

Sally decides to look at reduced_lunch from another angle using a correlation matrix built with pandas' corr method. The values in the correlation matrix will be between -1 and 1 (see below). A value of -1 indicates the strongest possible negative correlation, meaning as one variable decreases the other increases, and a value of 1 indicates the opposite. The result below, -0.815757, indicates a strong negative correlation between reduced_lunch and school_rating . There's clearly a relationship between the two variables.
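A hedged sketch of the correlation step, under the same assumptions:

```python
import pandas as pd

df = pd.read_csv("middle_tn_schools.csv")  # illustrative file name

# Pairwise Pearson correlations; the reduced_lunch / school_rating entry
# is the roughly -0.82 value discussed above
corr = df.corr(numeric_only=True)
print(corr.loc["reduced_lunch", "school_rating"])
```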

Sally continues to explore this relationship graphically.

Essential Graphs for Exploring Data

Box-and-whisker plot.

In her stats book, Sally sees a box-and-whisker plot . A box-and-whisker plot is helpful for visualizing how the data are distributed around the mean. Understanding the distribution allows Sally to see how far spread out her data are from the mean; the larger the spread from the mean, the less robust reduced_lunch is at explaining school_rating .

See below for an explanation of the box-and-whisker plot.


Now that Sally knows how to read the box-and-whisker plot, she graphs reduced_lunch to see the distributions. See below.
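A sketch of the plot described here, using seaborn (file and column names are assumptions):

```python
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

df = pd.read_csv("middle_tn_schools.csv")  # illustrative file name

# One box per school rating, showing the spread of reduced_lunch
sns.boxplot(x="school_rating", y="reduced_lunch", data=df)
plt.show()
```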


In her box-and-whisker plots, Sally sees that the minimum and maximum reduced_lunch values tend to get closer to the mean as school_rating decreases; that is, as school_rating decreases so does the standard deviation in reduced_lunch .

What does this mean?

Starting with the top box-and-whisker plot, as school_rating decreases, reduced_lunch becomes a more powerful way to explain outcomes. This could be because as parents' incomes decrease they have fewer resources to devote to their children's education (such as, after-school programs, tutors, time spent on homework, computer camps, etc) than higher-income parents. Above a 3-star rating, more predictors are needed to explain school_rating due to an increasing spread in reduced_lunch .

Having used box-and-whisker plots to reaffirm her idea that household income and school performance are related, Sally seeks further validation.

Scatter Plot

To further examine the relationship between school_rating and reduced_lunch , Sally graphs the two variables on a scatter plot. See below.
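A sketch of the scatter plot with a fitted trend line, under the same assumptions:

```python
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

df = pd.read_csv("middle_tn_schools.csv")  # illustrative file name

# Each point is a school; regplot adds a linear trend line
sns.regplot(x="reduced_lunch", y="school_rating", data=df)
plt.show()
```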


In the scatter plot above, each dot represents a school. The placement of the dot represents that school's rating (y-axis) and the percentage of its students on reduced lunch (x-axis).

The downward trend line shows the negative correlation between school_rating and reduced_lunch (as one increases, the other decreases). The slope of the trend line indicates how much school_rating decreases as reduced_lunch increases. A steeper slope would indicate that a small change in reduced_lunch has a big impact on school_rating while a more horizontal slope would indicate that the same small change in reduced_lunch has a smaller impact on school_rating .

Sally notices that the scatter plot further supports what she saw with the box-and-whisker plot: when reduced_lunch increases, school_rating decreases. The tighter spread of the data as school_rating declines indicates the increasing influence of reduced_lunch . Now she has a hypothesis.

Correlation Matrix

Sally is ready to test her hypothesis: a negative relationship exists between school_rating and reduced_lunch (to be covered in a follow up article). If the test is successful, she'll need to build a more robust model using additional variables. If the test fails, she'll need to re-visit her dataset to choose other variables that possibly explain school_rating . Either way, Sally could benefit from an efficient way of assessing relationships among her variables.

An efficient graph for assessing relationships is the correlation matrix, as seen below; its color-coded cells make it easier to interpret than the tabular correlation matrix above. Red cells indicate positive correlation; blue cells indicate negative correlation; white cells indicate no correlation. The darker the colors, the stronger the correlation (positive or negative) between those two variables.
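A sketch of a color-coded correlation matrix with seaborn, under the same assumptions:

```python
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

df = pd.read_csv("middle_tn_schools.csv")  # illustrative file name

# Red for positive, blue for negative, white near zero
corr = df.corr(numeric_only=True)
sns.heatmap(corr, annot=True, cmap="RdBu_r", vmin=-1, vmax=1)
plt.show()
```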


With the correlation matrix in mind as a future starting point for finding additional variables, Sally moves on for now and prepares to test her hypothesis.

Sally was approached with a problem: why are some schools in middle Tennessee under-performing? To answer this question, she did the following:

  • Conducted a lit review to educate herself on the topic.
  • Gathered data from a reputable source to explore school ratings and characteristics of the student bodies and schools in middle Tennessee.
  • The data indicated a robust relationship between school_rating and reduced_lunch .
  • Explored the data visually.
  • Though satisfied with her preliminary findings, Sally is keeping her mind open to other explanations.
  • Developed a hypothesis: a negative relationship exists between school_rating and reduced_lunch .

In a follow up article, Sally will test her hypothesis. Should she find a satisfactory explanation for her sample of schools, she will attempt to apply her explanation to the population of schools in Tennessee.


Perspect Clin Res. 2019 Jan-Mar; 10(1).

Study designs: Part 2 – Descriptive studies

Rakesh Aggarwal

Department of Gastroenterology, Sanjay Gandhi Postgraduate Institute of Medical Sciences, Lucknow, Uttar Pradesh, India

Priya Ranganathan

1 Department of Anaesthesiology, Tata Memorial Centre, Mumbai, Maharashtra, India

One of the first steps in planning a research study is the choice of study design. The available study designs are divided broadly into two types – observational and interventional. Of the various observational study designs, the descriptive design is the simplest. It allows the researcher to study and describe the distribution of one or more variables, without regard to any causal or other hypotheses. This article discusses the subtypes of descriptive study design, and their strengths and limitations.

INTRODUCTION

In our previous article in this series,[ 1 ] we introduced the concept of “study designs”– as “the set of methods and procedures used to collect and analyze data on variables specified in a particular research question.” Study designs are primarily of two types – observational and interventional, with the former being loosely divided into “descriptive” and “analytical.” In this article, we discuss the descriptive study designs.

WHAT IS A DESCRIPTIVE STUDY?

A descriptive study is one that is designed to describe the distribution of one or more variables, without regard to any causal or other hypothesis.

TYPES OF DESCRIPTIVE STUDIES

Descriptive studies can be of several types, namely, case reports, case series, cross-sectional studies, and ecological studies. In the first three of these, data are collected on individuals, whereas the last one uses aggregated data for groups.

Case reports and case series

A case report refers to the description of a patient with an unusual disease or with simultaneous occurrence of more than one condition. A case series is similar, except that it is an aggregation of multiple (often only a few) similar cases. Many case reports and case series are anecdotal and of limited value. However, some of these bring to the fore a hitherto unrecognized disease and play an important role in advancing medical science. For instance, HIV/AIDS was first recognized through a case report of disseminated Kaposi's sarcoma in a young homosexual man,[ 2 ] and a case series of such men with Pneumocystis carinii pneumonia.[ 3 ]

In other cases, description of a chance observation may open an entirely new line of investigation. Some examples include: fatal disseminated Bacillus Calmette–Guérin infection in a baby born to a mother taking infliximab for Crohn's disease, suggesting that administration of infliximab may bring about reactivation of tuberculosis,[ 4 ] progressive multifocal leukoencephalopathy following natalizumab treatment – describing a new adverse effect of drugs that target cell adhesion molecule α4-integrin,[ 5 ] and demonstration of a tumor caused by invasive transformed cancer cells from a colonizing tapeworm in an HIV-infected person.[ 6 ]

Cross-sectional studies

Studies with a cross-sectional study design involve the collection of information on the presence or level of one or more variables of interest (health-related characteristic), whether exposure (e.g., a risk factor) or outcome (e.g., a disease) as they exist in a defined population at one particular time. If these data are analyzed only to determine the distribution of one or more variables, these are “descriptive.” However, often, in a cross-sectional study, the investigator also assesses the relationship between the presence of an exposure and that of an outcome. Such cross-sectional studies are referred to as “analytical” and will be discussed in the next article in this series.

Cross-sectional studies can be thought of as providing a “snapshot” of the frequency and characteristics of a disease in a population at a particular point in time. These are very good for measuring the prevalence of a disease or of a risk factor in a population. Thus, these are very helpful in assessing the disease burden and healthcare needs.

Let us look at a study that was aimed to assess the prevalence of myopia among Indian children.[ 7 ] In this study, trained health workers visited schools in Delhi and tested visual acuity in all children studying in classes 1–9. Of the 9884 children screened, 1297 (13.1%) had myopia (defined as spherical refractive error of −0.50 diopters (D) or worse in either or both eyes), and the mean myopic error was −1.86 ± 1.4 D. Furthermore, overall, 322 (3.3%), 247 (2.5%) and 3 children had mild, moderate, and severe visual impairment, respectively. These parts of the study looked at the prevalence and degree of myopia or of visual impairment, and did not assess the relationship of one variable with another or test a causative hypothesis – these qualify as a descriptive cross-sectional study. These data would be helpful to a health planner to assess the need for a school eye health program, and to know the proportion of children in her jurisdiction who would need corrective glasses.

The authors did, subsequently in the paper, look at the relationship of myopia (an outcome) with children's age, gender, socioeconomic status, type of school, mother's education, etc. (each of which qualifies as an exposure). Those parts of the paper look at the relationship between different variables and thus qualify as having “analytical” cross-sectional design.

Sometimes, cross-sectional studies are repeated after a time interval in the same population (using the same subjects as were included in the initial study, or a fresh sample) to identify temporal trends in the occurrence of one or more variables, and to determine the incidence of a disease (i.e., number of new cases) or its natural history. Indeed, the investigators in the myopia study above visited the same children and reassessed them a year later. This separate follow-up study[ 8 ] showed that “new” myopia had developed in 3.4% of children (incidence rate), with a mean change of −1.09 ± 0.55 D. Among those with myopia at the time of the initial survey, 49.2% showed progression of myopia with a mean change of −0.27 ± 0.42 D.

Cross-sectional studies are usually simple to do and inexpensive. Furthermore, these usually do not pose much of a challenge from an ethics viewpoint.

However, this design does carry a risk of bias, i.e., the results of the study may not represent the true situation in the population. This could arise from either selection bias or measurement bias. The former relates to differences between the population and the sample studied. The myopia study included only those children who attended school, and the prevalence of myopia could have been different in those who did not attend school (e.g., those with severe myopia may not be able to see the blackboard and hence may have been more likely to drop out of school). The measurement bias in this study would relate to the accuracy of measurement and the cutoff used. If the investigators had used a cutoff of −0.25 D (instead of −0.50 D) to define myopia, the prevalence would have been higher. Furthermore, if the measurements were not done accurately, some cases with myopia could have been missed, or vice versa, affecting the study results.

Ecological studies

Ecological (also sometimes called as correlational) study design involves looking for association between an exposure and an outcome across populations rather than in individuals. For instance, a study in the United States found a relation between household firearm ownership in various states and the firearm death rates during the period 2007–2010.[ 9 ] Thus, in this study, the unit of assessment was a state and not an individual.

These studies are convenient to do since the data have often already been collected and are available from a reliable source. This design is particularly useful when the differences in exposure between individuals within a group are much smaller than the differences in exposure between groups. For instance, the intake of particular food items is likely to vary less between people in a particular group but can vary widely across groups, for example, people living in different countries.

However, the ecological study design has some important limitations. First, an association between exposure and outcome at the group level may not be true at the individual level (a phenomenon also referred to as “ecological fallacy”).[ 10 ] Second, the association may be related to a third factor which in turn is related to both the exposure and the outcome, the so-called “confounding”. For instance, an ecological association between higher income level and greater cardiovascular mortality across countries may be related to a higher prevalence of obesity. Third, migration of people between regions with different exposure levels may also introduce an error. A fourth consideration may be the use of differing definitions for exposure, outcome, or both in different populations.

Descriptive studies, irrespective of the subtype, are often very easy to conduct. For case reports, case series, and ecological studies, the data are already available. For cross-sectional studies, these can be easily collected (usually in one encounter). Thus, these study designs are often inexpensive, quick and do not need too much effort. Furthermore, these studies often do not face serious ethics scrutiny, except if the information sought to be collected is of confidential nature (e.g., sexual practices, substance use, etc.).

Descriptive studies are useful for estimating the burden of disease (e.g., prevalence or incidence) in a population. This information is useful for resource planning. For instance, information on prevalence of cataract in a city may help the government decide on the appropriate number of ophthalmologic facilities. Data from descriptive studies done in different populations or done at different times in the same population may help identify geographic variation and temporal change in the frequency of disease. This may help generate hypotheses regarding the cause of the disease, which can then be verified using another, more complex design.

DISADVANTAGES

As with other study designs, descriptive studies have their own pitfalls. Case reports and case-series refer to a solitary patient or to only a few cases, who may represent a chance occurrence. Hence, conclusions based on these run the risk of being non-representative, and hence unreliable. In cross-sectional studies, the validity of results is highly dependent on whether the study sample is well representative of the population proposed to be studied, and whether all the individual measurements were made using an accurate and identical tool, or not. If the information on a variable cannot be obtained accurately, for instance in a study where the participants are asked about socially unacceptable (e.g., promiscuity) or illegal (e.g., substance use) behavior, the results are unlikely to be reliable.

Financial support and sponsorship

Conflicts of interest.

There are no conflicts of interest.

Python for Data Science

Table of Contents

  • Introduction
  • Descriptive statistics with Python
  • ... using Pandas
  • ... using Researchpy

Descriptive statistics

Descriptive statistics summarize the data and are broken down into measures of central tendency (mean, median, and mode) and measures of variability (standard deviation, minimum/maximum values, range, kurtosis, and skewness). The example data used on this page is [3, 5, 7, 8, 8, 9, 10, 11].

Measures of Central Tendency

Mean: the average value of the data, calculated by adding all the measurements of a variable together and dividing that sum by the number of observations:

$$ \bar{x} = \frac{\sum x}{n} $$

where \(\bar{x}\) is the estimated average, \(\sum\) indicates to add all the values in the data, \(x\) represents the measurements, and \(n\) is the total number of observations. Calculating the mean using the example data:

$$ \bar{x} = \frac{3 + 5 + 7 + 8 + 8 + 9 + 10 + 11}{8} = 7.625 $$

Median: the middle value when the measurements are placed in ascending order. If there is no true midpoint, the median is calculated by adding the two middle values together and dividing by 2:

$$ \text{median} = \frac{8 + 8}{2} = 8 $$

Mode: the value that occurs most often in the set of measurements (here, 8).

Measures of Variability

Variance: the sum of the squared deviations from the mean divided by the number of observations minus 1; using \(n - 1\) gives an unbiased estimate of the population variance. Variance is expressed in squared units of the original measurement, so it is not directly interpretable on the original scale:

$$ s^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1} $$

where \(x_i\) is the \(i^{th}\) measurement, \(\bar{x}\) is the estimated average, and \(n\) is the total number of observations.

Standard deviation: the positive square root of the variance, \(\sqrt{s^2}\); it can be interpreted in the unit of measurement of the observations.

Minimum value: the smallest of the measurements. Maximum value: the largest of the measurements. Range: the difference between the maximum and minimum values.

Kurtosis: a measure of the tailedness of a distribution. Skewness: a measure of the symmetry of the distribution of the data.
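A minimal Python sketch of these quantities for the example data (not code from the original page):

```python
import pandas as pd

x = pd.Series([3, 5, 7, 8, 8, 9, 10, 11])

print("mean:", x.mean())             # 7.625
print("median:", x.median())         # 8.0
print("mode:", x.mode().tolist())    # [8]
print("variance:", x.var())          # ddof=1, the unbiased estimate
print("std dev:", x.std())
print("range:", x.max() - x.min())
print("kurtosis:", x.kurtosis())     # pandas reports excess kurtosis
print("skew:", x.skew())
```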

Descriptive Statistics with Python

There are a few ways to get descriptive statistics using Python. Below will show how to get descriptive statistics using Pandas and Researchpy. First, let's import an example data set.

Continuous variables

Pandas' describe() method returns many useful descriptive statistics with a mix of measures of central tendency and measures of variability. This includes the number of non-missing observations; the mean; standard deviation; minimum value; 25th, 50th (a.k.a. the median), and 75th percentiles; as well as the maximum value. It is missing some information that is typically desired regarding the mean, namely the standard error and the 95% confidence interval. No worries though; pairing this with Researchpy's summary_cont() method provides the descriptive statistics that are wanted - this method will be shown later.

Categorical variables

Using both the describe() and value_counts() methods is useful since they complement each other with the information returned. The describe() method says that "Female" occurs more than "Male", but one can see that is not the case since they both occur an equal amount. For more information about these methods, please see their official documentation pages for describe() and value_counts() .
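A minimal sketch of these two calls on a categorical column (the data and column name are illustrative):

```python
import pandas as pd

df = pd.DataFrame({"sex": ["Female", "Male", "Female", "Male"]})

# describe() on an object column reports count, unique, top, and freq;
# with a tie, the "top" category it reports is arbitrary
print(df["sex"].describe())

# value_counts() shows the actual frequency of each category
print(df["sex"].value_counts())
```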

Distribution measures

For more information on these methods, please see their official documentation page for kurtosis() and skew() .

Researchpy's summary_cont() method returns less overall information than describe(), but it returns more in-depth information regarding the mean. It returns the non-missing count, mean, standard deviation (SD), standard error (SE), and the 95% confidence interval.

The summary_cat() method returns the variable name, the non-missing count, and the percentage of each category of a variable. By default, the outcomes are sorted in descending order. For more information about these methods, please see the official documentation for summary_cont() and summary_cat() .
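A hedged sketch of the researchpy calls described above (the data frame and column names are illustrative):

```python
import pandas as pd
import researchpy as rp

df = pd.DataFrame({
    "height": [3, 5, 7, 8, 8, 9, 10, 11],
    "sex": ["Female", "Male", "Female", "Male",
            "Female", "Male", "Female", "Male"],
})

# N, mean, SD, SE, and 95% confidence interval for a numeric column
print(rp.summary_cont(df["height"]))

# Count and percentage for each category, sorted in descending order
print(rp.summary_cat(df["sex"]))
```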

What Is Descriptive Analytics? 5 Examples


  • 09 Nov 2021

Data analytics is a valuable tool for businesses aiming to increase revenue, improve products, and retain customers. According to research by global management consulting firm McKinsey & Company, companies that use data analytics are 23 times more likely to outperform competitors in terms of new customer acquisition than non-data-driven companies. They were also nine times more likely to surpass them in measures of customer loyalty and 19 times more likely to achieve above-average profitability.

Data analytics can be broken into four key types :

  • Descriptive, which answers the question, “What happened?”
  • Diagnostic , which answers the question, “Why did this happen?”
  • Predictive , which answers the question, “What might happen in the future?”
  • Prescriptive , which answers the question, “What should we do next?”

Each type of data analysis can help you reach specific goals and be used in tandem to create a full picture of data that informs your organization’s strategy formulation and decision-making.

Descriptive analytics can be leveraged on its own or act as a foundation for the other three analytics types. If you’re new to the field of business analytics, descriptive analytics is an accessible and rewarding place to start.


What Is Descriptive Analytics?

Descriptive analytics is the process of using current and historical data to identify trends and relationships. It’s sometimes called the simplest form of data analysis because it describes trends and relationships but doesn’t dig deeper.

Descriptive analytics is relatively accessible and likely something your organization uses daily. Basic statistical software, such as Microsoft Excel or data visualization tools , such as Google Charts and Tableau, can help parse data, identify trends and relationships between variables, and visually display information.

Descriptive analytics is especially useful for communicating change over time and uses trends as a springboard for further analysis to drive decision-making .

Here are five examples of descriptive analytics in action to apply at your organization.

Related: 5 Business Analytics Skills for Professionals

5 Examples of Descriptive Analytics

1. Traffic and Engagement Reports

One example of descriptive analytics is reporting. If your organization tracks engagement in the form of social media analytics or web traffic, you’re already using descriptive analytics.

These reports are created by taking raw data—generated when users interact with your website, advertisements, or social media content—and using it to compare current metrics to historical metrics and visualize trends.

For example, you may be responsible for reporting on which media channels drive the most traffic to the product page of your company’s website. Using descriptive analytics, you can analyze the page’s traffic data to determine the number of users from each source. You may decide to take it one step further and compare traffic source data to historical data from the same sources. This can enable you to update your team on movement; for instance, highlighting that traffic from paid advertisements increased 20 percent year over year.

The three other analytics types can then be used to determine why traffic from each source increased or decreased over time, if trends are predicted to continue, and what your team’s best course of action is moving forward.

2. Financial Statement Analysis

Another example of descriptive analytics that may be familiar to you is financial statement analysis. Financial statements are periodic reports that detail financial information about a business and, together, give a holistic view of a company’s financial health.

There are several types of financial statements, including the balance sheet , income statement , cash flow statement , and statement of shareholders’ equity. Each caters to a specific audience and conveys different information about a company’s finances.

Financial statement analysis can be done in three primary ways: vertical, horizontal, and ratio.

Vertical analysis involves reading a statement from top to bottom and comparing each item to those above and below it. This helps determine relationships between variables. For instance, if each line item is a percentage of the total, comparing them can provide insight into which are taking up larger and smaller percentages of the whole.

Horizontal analysis involves reading a statement from left to right and comparing each item to itself from a previous period. This type of analysis determines change over time.

Finally, ratio analysis involves comparing one section of a report to another based on their relationships to the whole. This directly compares items across periods, as well as your company’s ratios to the industry’s to gauge whether yours is over- or underperforming.

Each of these financial statement analysis methods is an example of descriptive analytics, as they provide information about trends and relationships between variables based on current and historical data.
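As a rough illustration (the figures below are made up, not from the article), vertical and horizontal analysis reduce to simple percentage computations:

```python
import pandas as pd

# Hypothetical income statement lines for two periods (illustrative only)
income = pd.DataFrame(
    {"2022": [100_000, 60_000, 25_000],
     "2023": [120_000, 70_000, 28_000]},
    index=["revenue", "cost_of_goods_sold", "operating_expenses"],
)

# Vertical analysis: each line item as a percentage of revenue in its period
vertical = income.div(income.loc["revenue"]) * 100

# Horizontal analysis: percentage change in each line item from 2022 to 2023
horizontal = (income["2023"] - income["2022"]) / income["2022"] * 100

print(vertical)
print(horizontal)
```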


3. Demand Trends

Descriptive analytics can also be used to identify trends in customer preference and behavior and make assumptions about the demand for specific products or services.

Streaming provider Netflix’s trend identification provides an excellent use case for descriptive analytics. Netflix’s team—which has a track record of being heavily data-driven—gathers data on users’ in-platform behavior. They analyze this data to determine which TV series and movies are trending at any given time and list trending titles in a section of the platform’s home screen.

Not only does this data allow Netflix users to see what’s popular—and thus, what they might enjoy watching—but it allows the Netflix team to know which types of media, themes, and actors are especially favored at a certain time. This can drive decision-making about future original content creation, contracts with existing production companies, marketing, and retargeting campaigns.

4. Aggregated Survey Results

Descriptive analytics is also useful in market research. When it comes time to glean insights from survey and focus group data, descriptive analytics can help identify relationships between variables and trends.

For instance, you may conduct a survey and identify that as respondents’ age increases, so does their likelihood to purchase your product. If you’ve conducted this survey multiple times over several years, descriptive analytics can tell you if this age-purchase correlation has always existed or if it was something that only occurred this year.

Insights like this can pave the way for diagnostic analytics to explain why certain factors are correlated. You can then leverage predictive and prescriptive analytics to plan future product improvements or marketing campaigns based on those trends.

Related: What Is Marketing Analytics?

5. Progress to Goals

Finally, descriptive analytics can be applied to track progress to goals. Reporting on progress toward key performance indicators (KPIs) can help your team understand if efforts are on track or if adjustments need to be made.

For example, if your organization aims to reach 500,000 monthly unique page views, you can use traffic data to communicate how you’re tracking toward it. Perhaps halfway through the month, you’re at 200,000 unique page views. This would be underperforming because you’d like to be halfway to your goal at that point—at 250,000 unique page views. This descriptive analysis of your team’s progress can allow further analysis to examine what can be done differently to improve traffic numbers and get back on track to hit your KPI.


Using Data to Identify Relationships and Trends

“Never before has so much data about so many different things been collected and stored every second of every day,” says Harvard Business School Professor Jan Hammond in the online course Business Analytics . “In this world of big data, data literacy —the ability to analyze, interpret, and even question data—is an increasingly valuable skill.”

Leveraging descriptive analytics to communicate change based on current and historical data and as a foundation for diagnostic, predictive, and prescriptive analytics has the potential to take you and your organization far.

Do you want to become a data-driven professional? Explore our eight-week Business Analytics course and our three-course Credential of Readiness (CORe) program to deepen your analytical skills and apply them to real-world business problems.


  • 2.1 Introduction
  • 2.2 Case Study 1: The Happiness Report
  • 2.3 Case Study 1: Adding More Happiness Data
  • 2.4 Case Study 1: Comparing Happiness Data across Years
  • 2.5 Challenge in Case Study 1: Calculating a Correlation Matrix
  • 2.6 Case Study 2: Considering Starting a Business?
  • 2.7 Case Study 2: Where Should We Start Our New Business?
  • 2.8 Case Study 2: How is Business Over Time?
  • 2.9 Case Study 2: Calculating a Correlation Matrix for Business Data
  • 2.10 Glossary

2. Exploring the Data Science Pipeline via Descriptive Statistics

Descriptive Statistics

  • 2.1.1. Learning Goals
  • 2.1.2. Learning Objectives
  • 2.1.3. Reading List
  • 2.1.4. Time Required
  • 2.1.5. Summary Statistics
  • 2.2.1. Introducing the Happiness Report
  • 2.2.2. Happiness Index Research Questions
  • 2.2.3. Summary Statistics
  • 2.2.4. Visualizing Happiness
  • 2.3.1. Happiness by Region
  • 2.3.2. Joining Data from Other Sources
  • 2.3.3. Introducing Pivot Tables
  • 2.4. Case Study 1: Comparing Happiness Data across Years
  • 2.5. Challenge in Case Study 1: Calculating a Correlation Matrix
  • 2.6.1. Thinking About Starting Your Business
  • 2.6.2. Business Start-Up Data Analysis Research Questions
  • 2.6.3. Visualizing How to Start a Business
  • 2.7.1. Business Score by Region
  • 2.7.2. Joining Data from Other Sources When Considering Starting a New Business
  • 2.7.3. Summarizing Key Business Data Using Pivot Table
  • 2.8. Case Study 2: How is Business Over Time?
  • 2.9. Case Study 2: Calculating a Correlation Matrix for Business Data
  • 2.10.1. Definitions
  • 2.10.2. Keywords

Introduction to Statistical Thinking

Chapter 16 Case Studies

16.1 Student Learning Objective

This chapter concludes this book. We start with a short review of the topics that were discussed in the second part of the book, the part that dealt with statistical inference. The main part of the chapter involves the statistical analysis of 2 case studies. The tools that will be used for the analysis are those that were discussed in the book. We close this chapter and this book with some concluding remarks. By the end of this chapter, the student should be able to:

Review the concepts and methods for statistical inference that were presented in the second part of the book.

Apply these methods to requirements of the analysis of real data.

Develop a resolve to learn more statistics.

16.2 A Review

The second part of the book dealt with statistical inference; the science of making general statement on an entire population on the basis of data from a sample. The basis for the statements are theoretical models that produce the sampling distribution. Procedures for making the inference are evaluated based on their properties in the context of this sampling distribution. Procedures with desirable properties are applied to the data. One may attach to the output of this application summaries that describe these theoretical properties.

In particular, we dealt with two forms of making inference. One form was estimation and the other was hypothesis testing. The goal in estimation is to determine the value of a parameter in the population. Point estimates or confidence intervals may be used in order to fulfill this goal. The properties of point estimators may be assessed using the mean square error (MSE) and the properties of the confidence interval may be assessed using the confidence level.

The target in hypothesis testing is to decide between two competing hypotheses. These hypotheses are formulated in terms of population parameters. The decision rule is called a statistical test and is constructed with the aid of a test statistic and a rejection region. The default hypothesis among the two is rejected if the test statistic falls in the rejection region. The major property a test must possess is a bound on the probability of a Type I error, the probability of erroneously rejecting the null hypothesis. This restriction is called the significance level of the test. A test may also be assessed in terms of its statistical power, the probability of rightfully rejecting the null hypothesis.

Estimation and testing were applied in the context of single measurements and for the investigation of the relations between a pair of measurements. For single measurements we considered both numeric variables and factors. For numeric variables one may attempt to conduct inference on the expectation and/or the variance. For factors we considered the estimation of the probability of obtaining a level, or, more generally, the probability of the occurrence of an event.

We introduced statistical models that may be used to describe the relations between variables. One of the variables was designated as the response. The other variable, the explanatory variable, is identified as a variable which may affect the distribution of the response. Specifically, we considered numeric variables and factors that have two levels. If the explanatory variable is a factor with two levels then the analysis reduces to the comparison of two sub-populations, each one associated with a level. If the explanatory variable is numeric then a regression model may be applied, either linear or logistic regression, depending on the type of the response.

The foundations of statistical inference are the assumption that we make in the form of statistical models. These models attempt to reflect reality. However, one is advised to apply healthy skepticism when using the models. First, one should be aware what the assumptions are. Then one should ask oneself how reasonable are these assumption in the context of the specific analysis. Finally, one should check as much as one can the validity of the assumptions in light of the information at hand. It is useful to plot the data and compare the plot to the assumptions of the model.

16.3 Case Studies

Let us apply the methods that were introduced throughout the book to two examples of data analysis. Both examples are taken from the Rice Virtual Lab in Statistics and can be found in their Case Studies section. The analysis of these case studies may involve any of the tools that were described in the second part of the book (and some from the first part). It may be useful to read Chapters 9 – 15 again before reading the case studies.

16.3.1 Physicians’ Reactions to the Size of a Patient

Overweight and obesity are common in many of the developed countries. In some cultures, obese individuals face discrimination in employment, education, and relationship contexts. The current research, conducted by Mikki Hebl and Jingping Xu 87 , examines physicians' attitudes toward overweight and obese patients in comparison to their attitudes toward patients who are not overweight.

The experiment included a total of 122 primary care physicians affiliated with one of three major hospitals in the Texas Medical Center of Houston. These physicians were sent a packet containing a medical chart similar to the one they view upon seeing a patient. This chart portrayed a patient who was displaying symptoms of a migraine headache but was otherwise healthy. Two variables (the gender and the weight of the patient) were manipulated across six different versions of the medical charts. The weight of the patient, described in terms of Body Mass Index (BMI), was average (BMI = 23), overweight (BMI = 30), or obese (BMI = 36). Physicians were randomly assigned to receive one of the six charts, and were asked to look over the chart carefully and complete two medical forms. The first form asked physicians which of 42 tests they would recommend giving to the patient. The second form asked physicians to indicate how much time they believed they would spend with the patient, and to describe the reactions that they would have toward this patient.

In this presentation, only the question on how much time the physicians believed they would spend with the patient is analyzed. Although three patient weight conditions were used in the study (average, overweight, and obese) only the average and overweight conditions will be analyzed. Therefore, there are two levels of patient weight (average and overweight) and one dependent variable (time spent).

The data for the given collection of responses from 72 primary care physicians is stored in the file “ discriminate.csv ” 88 . We start by reading the content of the file into a data frame by the name “ patient ” and presenting the summary of the variables:

Observe that of the 72 “patients”, 38 are overweight and 33 have an average weight. The time spent with the patient, as predicted by the physicians, is distributed between 5 minutes and 1 hour, with an average of 27.82 minutes and a median of 30 minutes.

It is a good practice to have a look at the data before doing the analysis. In this examination one should see that the numbers make sense and one should identify special features of the data. Even in this very simple example we may want to have a look at the histogram of the variable “ time ”:

[Histogram of the variable time]

A feature in this plot that catches attention is the high concentration of values in the interval between 25 and 30. Together with the fact that the median is equal to 30, one may suspect that a large number of the values are actually equal to 30. Indeed, let us produce a table of the response:

Notice that 30 of the 72 physicians marked “ 30 ” as the time they expect to spend with the patient. This is the middle value in the range, and may just be the default value one marks if one needs to complete a form and does not really place much importance on the question that was asked.

The goal of the analysis is to examine the relation between the weight group of the patient and the physician’s response. The explanatory variable is a factor with two levels. The response is numeric. A natural tool to use in order to test the hypothesis of no difference between the two groups is the \(t\) -test, which is implemented with the function “ t.test ”.

First we plot the relation between the response and the explanatory variable and then we apply the test:
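A sketch of these two steps, using the formula interface (box plots of the response by weight group, then the two-sample test):

    boxplot(time ~ weight, data = patient)   # compare the two groups graphically
    t.test(time ~ weight, data = patient)    # test equality of the two expectations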

[Figure: box plots of the variable “time” for the two weight groups]

Nothing seems problematic in the box plot. The two distributions, as they are reflected in the box plots, look fairly symmetric.

When we consider the report produced by the function “ t.test ” we may observe that the \(p\) -value is equal to 0.005774. This \(p\) -value is computed in testing the null hypothesis that the expectations of the response for the two types of patients are equal against the two-sided alternative. Since the \(p\) -value is less than 0.05 we reject the null hypothesis.

The estimated value of the difference between the expectation of the response for a patient with BMI=23 and a patient with BMI=30 is \(31.36364 -24.73684 \approx 6.63\) minutes. The confidence interval is (approximately) equal to \([1.99, 11.27]\) . Hence, it looks as if the physicians expect to spend more time with the average weight patients.

After analyzing the effect of the explanatory variable on the expectation of the response one may want to examine the presence, or lack thereof, of such effect on the variance of the response. Towards that end, one may use the function “ var.test ”:
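A sketch of the call, with the same formula interface:

    var.test(time ~ weight, data = patient)   # F-test for equality of the two variances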

In this test we do not reject the null hypothesis that the two variances of the response are equal since the \(p\) -value is larger than \(0.05\) . The sample variances are almost equal to each other (their ratio is \(1.044316\) ), with a confidence interval for the ratio that essentially ranges between 1/2 and 2.

The production of \(p\) -values and confidence intervals is just one aspect of the analysis of data. Another aspect, which typically is much more time consuming and requires experience and healthy skepticism, is the examination of the assumptions that are used in order to produce the \(p\) -values and the confidence intervals. A clear violation of the assumptions may warn the statistician that the computed nominal quantities perhaps do not represent the actual statistical properties of the tools that were applied.

In this case, we have noticed the high concentration of the response at the value “ 30 ”. What is the situation when we split the sample between the two levels of the explanatory variable? Let us apply the function “ table ” once more, this time with the explanatory variable included:
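For example:

    table(patient$time, patient$weight)   # responses cross-tabulated by weight group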

Not surprisingly, there is still a high concentration at the level “ 30 ”. But one can see that only 2 of the responses of the “ BMI=30 ” group are above that value, in comparison to a much more symmetric distribution of responses for the other group.

The simulations of the significance level of the one-sample \(t\) -test for an Exponential response that were conducted in Question [ex:Testing.2] may cast some doubt on how trustworthy the nominal \(p\) -values of the \(t\) -test are when the measurements are skewed. The skewness of the response for the group “ BMI=30 ” is a reason for concern.

We may consider a different test, which is more robust, in order to validate the significance of our findings. For example, we may turn the response into a factor by setting one level for values larger than or equal to “ 30 ” and a different level for values less than “ 30 ”. The relation between the new response and the explanatory variable can be examined with the function “ prop.test ”. We first plot and then test:
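One possible way to carry this out in R (the footnote at the end of the chapter uses an expression of the same form); the name of the new factor and the plotting call are illustrative:

    heavy <- patient$time >= 30               # TRUE when the prediction is 30 minutes or more
    plot(table(patient$weight, heavy))        # mosaic plot of weight group against the new factor
    prop.test(table(patient$weight, heavy))   # compare, for each group, the proportion of FALSE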

[Figure: mosaic plot of the weight groups against the indicator of a predicted time of 30 minutes or more]

The mosaic plot presents the relation between the explanatory variable and the new factor. The level “ TRUE ” is associated with a value of the predicted time spent with the patient being 30 minutes or more. The level “ FALSE ” is associated with a prediction of less than 30 minutes.

The computed \(p\) -value is equal to \(0.05409\) , which almost reaches the significance level of 5% 89 . Notice that the probabilities that are being estimated by the function are the probabilities of the level “ FALSE ”. Overall, one may see the outcome of this test as supporting evidence for the conclusion of the \(t\) -test. However, the \(p\) -value provided by the \(t\) -test may overemphasize the evidence in the data for a significant difference in the physicians’ attitude toward overweight patients.

16.3.2 Physical Strength and Job Performance

The next case study involves an attempt to develop a measure of physical ability that is easy and quick to administer, does not risk injury, and is related to how well a person performs the actual job. The current example is based on a study by Blakley et al.  90 , published in the journal Personnel Psychology.

There are a number of very important jobs that require, in addition to cognitive skills, a significant amount of strength to be able to perform at a high level. Construction workers, electricians, and auto mechanics all require strength in order to carry out critical components of their job. An interesting applied problem is how to select the best candidates from amongst a group of applicants for physically demanding jobs in a safe and cost-effective way.

The data presented in this case study, which may be used for the development of a method for selection among candidates, were collected from 147 individuals working in physically demanding jobs. Two measures of strength were gathered from each participant: grip strength and arm strength. A piece of equipment known as the Jackson Evaluation System (JES) was used to collect the strength data. The JES can be configured to measure the strength of a number of muscle groups. In this study, grip strength and arm strength were measured. The outcomes of these measurements were summarized in two scores of physical strength called “ grip ” and “ arm ”.

Two separate measures of job performance are presented in this case study. First, the supervisors for each of the participants were asked to rate how well their employee(s) perform on the physical aspects of their jobs. This measure is summarized in the variable “ ratings ”. Second, simulations of physically demanding work tasks were developed. The summary score of these simulations is given in the variable “ sims ”. Higher values of either measure of performance indicate better performance.

The data for the 4 variables and 147 observations is stored in “ job.csv ” 91 . We start by reading the content of the file into a data frame by the name “ job ”, presenting a summary of the variables, and their histograms:
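A sketch of these steps, assuming the file has been saved to the working directory (the column names “grip”, “arm”, “ratings”, and “sims” are the ones used in the text):

    job <- read.csv("job.csv")       # read the 147 observations into a data frame
    summary(job)                     # summaries of the four variables
    par(mfrow = c(2, 2))             # arrange the four histograms in a 2-by-2 layout
    hist(job$grip); hist(job$arm); hist(job$ratings); hist(job$sims)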

[Figure: histograms of the variables “grip”, “arm”, “ratings”, and “sims”]

All variables are numeric. Examination of the 4 summaries and histograms does not produce findings of special interest. All variables are, more or less, symmetric, with the distribution of the variable “ ratings ” tending perhaps to be more uniform than the other three.

The main analyses of interest are attempts to relate the two measures of physical strength “ grip ” and “ arm ” with the two measures of job performance, “ ratings ” and “ sims ”. A natural tool to consider in this context is a linear regression analysis that relates a measure of physical strength as an explanatory variable to a measure of job performance as a response.


FIGURE 16.1: Scatter Plots and Regression Lines

Let us consider the variable “ sims ” as a response. The first step is to plot a scatter plot of the response and explanatory variable, for both explanatory variables. To the scatter plot we add the line of regression. In order to add the regression line we fit the regression model with the function “ lm ” and then apply the function “ abline ” to the fitted model. The plot for the relation between the response and the variable “ grip ” is produced by the code:
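A sketch of that code (the name “sims.grip” for the fitted model is illustrative):

    plot(sims ~ grip, data = job)              # scatter plot of the response against grip
    sims.grip <- lm(sims ~ grip, data = job)   # fit the simple linear regression
    abline(sims.grip)                          # add the fitted regression line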

The plot that is produced by this code is presented on the upper-left panel of Figure  16.1 .

The plot for the relation between the response and the variable “ arm ” is produced by this code:
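And, in the same way, for the variable “arm” (again with an illustrative name for the fitted model):

    plot(sims ~ arm, data = job)
    sims.arm <- lm(sims ~ arm, data = job)
    abline(sims.arm)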

The plot that is produced by the last code is presented on the upper-right panel of Figure  16.1 .

Both plots show similar characteristics. There is an overall linear trend in the relation between the explanatory variable and the response. The value of the response increases with the increase in the value of the explanatory variable (a positive slope). The regression line seems to follow, more or less, the trend that is demonstrated by the scatter plot.

A more detailed analysis of the regression model is possible by the application of the function “ summary ” to the fitted model. First the case where the explanatory variable is “ grip ”:
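With the model fitted in the sketch above:

    summary(sims.grip)   # coefficients, p-values, and R-squared for the grip model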

Examination of the report reveals a clear statistical significance for the effect of the explanatory variable on the distribution of the response. The value of R-squared, the ratio of the variance of the response explained by the regression, is \(0.4094\) . The square root of this quantity, \(\sqrt{0.4094} \approx 0.64\) , is the proportion of the standard deviation of the response that is explained by the explanatory variable. Hence, about 64% of the variability in the response can be attributed to the measure of the strength of the grip.

For the variable “ arm ” we get:
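Again, with the model fitted in the sketch above:

    summary(sims.arm)   # the analogous report for the arm model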

This variable is also statistically significant. The value of R-squared is \(0.4706\) . The proportion of the standard deviation that is explained by the strength of the arm is \(\sqrt{0.4706} \approx 0.69\) , which is slightly higher than the proportion explained by the grip.

Overall, the explanatory variables do a fine job in reducing the variability of the response “ sims ” and may be used as substitutes for the response in order to select among candidates. A better prediction of the response based on the values of the explanatory variables can be obtained by combining the information in both variables. The production of such a combination is not discussed in this book, though it is similar in principle to the methods of linear regression that are presented in Chapter  14 . The produced score 92 takes the form:

\[\mbox{\texttt{score}} = -5.434 + 0.024\cdot \mbox{\texttt{grip}}+ 0.037\cdot \mbox{\texttt{arm}}\;.\] We use this combined score as an explanatory variable. First we form the score and plot the relation between it and the response:
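A sketch of forming the score and producing the plot (the object name “sims.score” is the one used in the footnote at the end of the chapter):

    score <- -5.434 + 0.024 * job$grip + 0.037 * job$arm   # the combined score
    plot(job$sims ~ score)                                 # response against the score
    sims.score <- lm(job$sims ~ score)                     # regression of sims on the score
    abline(sims.score)                                     # add the regression line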

The scatter plot that includes the regression line can be found in the lower-left panel of Figure  16.1 . Indeed, the linear trend is more pronounced for this scatter plot and the regression line gives a better description of the relation between the response and the explanatory variable. A summary of the regression model produces the report:
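That report can be produced with:

    summary(sims.score)   # report for the regression of sims on the combined score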

Indeed, the score is highly significant. More importantly, the R-squared coefficient that is associated with the score is \(0.5422\) , which corresponds to a ratio of the standard deviation that is explained by the model of \(\sqrt{0.5422} \approx 0.74\) . Thus, almost 3/4 of the variability is accounted for by the score, so the score is a reasonable means of guessing what the results of the simulations will be. This guess is based only on the results of the simple tests of strength that are conducted with the JES device.

Before putting the final seal on the results let us examine the assumptions of the statistical model. First, with respect to the two explanatory variables. Does each of them really measure a different property or do they actually measure the same phenomena? In order to examine this question let us look at the scatter plot that describes the relation between the two explanatory variables. This plot is produced using the code:
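A sketch of the plotting call:

    plot(arm ~ grip, data = job)   # relation between the two measures of strength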

It is presented in the lower-right panel of Figure  16.1 . Indeed, one may see that the two measurements of strength are not independent of each other but tend to produce an increasing linear trend. Hence, it should not be surprising that the relation of each of them with the response produces essentially the same goodness of fit. The computed score gives a slightly improved fit, but still, it basically reflects either of the original explanatory variables.

In light of this observation, one may want to consider other measures of strength that represent features of strength not captured by these two variables. Namely, measures that show less of a joint trend than the two considered.

Another element that should be examined is the probabilistic assumptions that underlie the regression model. We described the regression model only in terms of the functional relation between the explanatory variable and the expectation of the response. In the case of linear regression, for example, this relation was given in terms of a linear equation. However, another part of the model corresponds to the distribution of the measurements about the line of regression. The assumption that led to the computation of the reported \(p\) -values is that this distribution is Normal.

A method that can be used in order to investigate the validity of the Normal assumption is to analyze the residuals from the regression line. Recall that these residuals are computed as the difference between the observed value of the response and its estimated expectation, namely the fitted regression line. The residuals can be computed via the application of the function “ residuals ” to the fitted regression model.

Specifically, let us look at the residuals from the regression line that uses the score that is combined from the grip and arm measurements of strength. One may plot a histogram of the residuals:
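A sketch of the two plots, using the fitted model “sims.score” from above (the Q-Q plot expression is the one given in the footnote):

    hist(residuals(sims.score))     # histogram of the residuals
    qqnorm(residuals(sims.score))   # Normal Quantile-Quantile plot of the residuals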

[Figure: histogram (upper panel) and Normal Q-Q plot (lower panel) of the residuals]

The produced histogram is presented on the upper panel. The histogram portrays a symmetric distribution that may result from Normally distributed observations. A better method to compare the distribution of the residuals to the Normal distribution is to use the Quantile-Quantile plot . This plot can be found on the lower panel. We do not discuss here the method by which this plot is produced 93 . However, we do say that any deviation of the points from a straight line is an indication of a violation of the assumption of Normality. In the current case, the points seem to lie on a single line, which is consistent with the assumptions of the regression model.

The next task should be an analysis of the relations between the explanatory variables and the other response “ ratings ”. In principle one may use the same steps that were presented for the investigation of the relations between the explanatory variables and the response “ sims ”. But of course, the conclusion may differ. We leave this part of the investigation as an exercise to the students.

16.4 Summary

16.4.1 Concluding Remarks

The book has included a description of some elements of statistics, elements that we thought are simple enough to be explained as part of an introductory course to statistics and are the minimum required for any person who is involved in academic activities in any field in which the analysis of data is required. Now, as you finish the book, it is as good a time as any to say a few words regarding the elements of statistics that are missing from this book.

One element is more of the same. The statistical models that were presented are as simple as a model can get. A typical application will require more complex models. Each of these models may require specific methods for estimation and testing. The characteristics of inference, e.g. significance or confidence levels, rely on assumptions that the models are assumed to possess. The user should be familiar with computational tools that can be used for the analysis of these more complex models. Familiarity with the probabilistic assumptions is required in order to be able to interpret the computer output, to diagnose possible divergence from the assumptions, and to assess the severity of the possible effect of such divergence on the validity of the findings.

Statistical tools can be used for tasks other than estimation and hypothesis testing. For example, one may use statistics for prediction. In many applications it is important to assess what the values of future observations may be and in what range of values they are likely to occur. Statistical tools such as regression are natural in this context. However, the required task is not the testing of hypotheses or the estimation of the values of parameters, but the prediction of future values of the response.

A different role of statistics is in the design stage. We hinted in that direction when we talked, in Chapter [ch:Confidence], about the selection of a sample size in order to assure a confidence interval with a given accuracy. In most applications, the selection of the sample size emerges in the context of hypothesis testing, and the criterion for selection is the minimal power of the test, a minimal probability to detect a true finding. Yet, statistical design is much more than the determination of the sample size. Statistics may have a crucial input in the decision of how to collect the data. With an eye on the requirements for the final analysis, an experienced statistician can make sure that the data that are collected are indeed appropriate for that final analysis. Too often is the case where a researcher steps into the statistician’s office with data that he or she collected and asks, when it is already too late, for help in the analysis of data that cannot provide a satisfactory answer to the research question the researcher tried to address. It may be said, with some exaggeration, that good statisticians are required for the final analysis only in the case where the initial planning was poor.

Last, but not least, is the mathematical theory of statistics. We tried to introduce as little as possible of the relevant mathematics in this course. However, if one seriously intends to learn and understand statistics then one must become familiar with the relevant mathematical theory. Clearly, deep knowledge of the mathematical theory of probability is required. But apart from that, there is a rich and rapidly growing body of research that deals with the mathematical aspects of data analysis. One cannot be a good statistician unless one becomes familiar with the important aspects of this theory.

I should have started the book with the famous quotation: “Lies, damned lies, and statistics”. Instead, I am using it to end the book. Statistics can be used and can be misused. Learning statistics can give you the tools to tell the difference between the two. My goal in writing the book is achieved if reading it will mark for you the beginning of the process of learning statistics and not the end of the process.

16.4.2 Discussion in the Forum

In the second part of the book we have learned many subjects. Most of these subjects, especially for those who had no previous exposure to statistics, were unfamiliar. In this forum we would like to ask you to share with us the difficulties that you encountered.

What was the topic that was most difficult for you to grasp? In your opinion, what was the source of the difficulty?

When forming your answer to this question we would appreciate it if you could elaborate and give details of what the problem was. Pointing to deficiencies in the learning material and confusing explanations will help us improve the presentation in future editions of this book.

Hebl, M. and Xu, J. (2001). Weighing the care: Physicians’ reactions to the size of a patient. International Journal of Obesity, 25, 1246-1252. ↩

The file can be found on the internet at http://pluto.huji.ac.il/~msby/StatThink/Datasets/discriminate.csv . ↩

One may propose splitting the response into two groups, with one group being associated with values of “ time ” strictly larger than 30 minutes and the other with values less than or equal to 30. The resulting \(p\) -value from the expression “ prop.test(table(patient$time>30,patient$weight)) ” is \(0.01276\) . However, the number of subjects in one of the cells of the table is equal to only 2, which is problematic in the context of the Normal approximation that is used by this test. ↩

Blakley, B.A., Quiñones, M.A., Crawford, M.S., and Jago, I.A. (1994). The validity of isometric strength tests. Personnel Psychology, 47, 247-274. ↩

The file can be found on the internet at http://pluto.huji.ac.il/~msby/StatThink/Datasets/job.csv . ↩

The score is produced by the application of the function “ lm ” to both variables as explanatory variables. The code expression that can be used is “ lm(sims ~ grip + arm, data=job) ”. ↩

Generally speaking, the plot is composed of the empirical percentiles of the residuals, plotted against the theoretical percentiles of the standard Normal distribution. The current plot is produced by the expression “ qqnorm(residuals(sims.score)) ”. ↩


Descriptive Research Design – Types, Methods and Examples


Descriptive Research Design


Definition:

Descriptive research design is a type of research methodology that aims to describe or document the characteristics, behaviors, attitudes, opinions, or perceptions of a group or population being studied.

Descriptive research design does not attempt to establish cause-and-effect relationships between variables or make predictions about future outcomes. Instead, it focuses on providing a detailed and accurate representation of the data collected, which can be useful for generating hypotheses, exploring trends, and identifying patterns in the data.

Types of Descriptive Research Design

Types of Descriptive Research Design are as follows:

Cross-sectional Study

This involves collecting data at a single point in time from a sample or population to describe their characteristics or behaviors. For example, a researcher may conduct a cross-sectional study to investigate the prevalence of certain health conditions among a population, or to describe the attitudes and beliefs of a particular group.

Longitudinal Study

This involves collecting data over an extended period of time, often through repeated observations or surveys of the same group or population. Longitudinal studies can be used to track changes in attitudes, behaviors, or outcomes over time, or to investigate the effects of interventions or treatments.

Case Study

This involves an in-depth examination of a single individual, group, or situation to gain a detailed understanding of its characteristics or dynamics. Case studies are often used in psychology, sociology, and business to explore complex phenomena or to generate hypotheses for further research.

Survey Research

This involves collecting data from a sample or population through standardized questionnaires or interviews. Surveys can be used to describe attitudes, opinions, behaviors, or demographic characteristics of a group, and can be conducted in person, by phone, or online.

Observational Research

This involves observing and documenting the behavior or interactions of individuals or groups in a natural or controlled setting. Observational studies can be used to describe social, cultural, or environmental phenomena, or to investigate the effects of interventions or treatments.

Correlational Research

This involves examining the relationships between two or more variables to describe their patterns or associations. Correlational studies can be used to identify potential causal relationships or to explore the strength and direction of relationships between variables.

Data Analysis Methods

Descriptive research design data analysis methods depend on the type of data collected and the research question being addressed. Here are some common methods of data analysis for descriptive research:

Descriptive Statistics

This method involves analyzing data to summarize and describe the key features of a sample or population. Descriptive statistics can include measures of central tendency (e.g., mean, median, mode) and measures of variability (e.g., range, standard deviation).

Cross-tabulation

This method involves analyzing data by creating a table that shows the frequency of two or more variables together. Cross-tabulation can help identify patterns or relationships between variables.

Content Analysis

This method involves analyzing qualitative data (e.g., text, images, audio) to identify themes, patterns, or trends. Content analysis can be used to describe the characteristics of a sample or population, or to identify factors that influence attitudes or behaviors.

Qualitative Coding

This method involves analyzing qualitative data by assigning codes to segments of data based on their meaning or content. Qualitative coding can be used to identify common themes, patterns, or categories within the data.

Visualization

This method involves creating graphs or charts to represent data visually. Visualization can help identify patterns or relationships between variables and make it easier to communicate findings to others.

Comparative Analysis

This method involves comparing data across different groups or time periods to identify similarities and differences. Comparative analysis can help describe changes in attitudes or behaviors over time or differences between subgroups within a population.

Applications of Descriptive Research Design

Descriptive research design has numerous applications in various fields. Some of the common applications of descriptive research design are:

  • Market research: Descriptive research design is widely used in market research to understand consumer preferences, behavior, and attitudes. This helps companies to develop new products and services, improve marketing strategies, and increase customer satisfaction.
  • Health research: Descriptive research design is used in health research to describe the prevalence and distribution of a disease or health condition in a population. This helps healthcare providers to develop prevention and treatment strategies.
  • Educational research: Descriptive research design is used in educational research to describe the performance of students, schools, or educational programs. This helps educators to improve teaching methods and develop effective educational programs.
  • Social science research: Descriptive research design is used in social science research to describe social phenomena such as cultural norms, values, and beliefs. This helps researchers to understand social behavior and develop effective policies.
  • Public opinion research: Descriptive research design is used in public opinion research to understand the opinions and attitudes of the general public on various issues. This helps policymakers to develop effective policies that are aligned with public opinion.
  • Environmental research: Descriptive research design is used in environmental research to describe the environmental conditions of a particular region or ecosystem. This helps policymakers and environmentalists to develop effective conservation and preservation strategies.

Descriptive Research Design Examples

Here are some real-time examples of descriptive research designs:

  • A restaurant chain wants to understand the demographics and attitudes of its customers. They conduct a survey asking customers about their age, gender, income, frequency of visits, favorite menu items, and overall satisfaction. The survey data is analyzed using descriptive statistics and cross-tabulation to describe the characteristics of their customer base.
  • A medical researcher wants to describe the prevalence and risk factors of a particular disease in a population. They conduct a cross-sectional study in which they collect data from a sample of individuals using a standardized questionnaire. The data is analyzed using descriptive statistics and cross-tabulation to identify patterns in the prevalence and risk factors of the disease.
  • An education researcher wants to describe the learning outcomes of students in a particular school district. They collect test scores from a representative sample of students in the district and use descriptive statistics to calculate the mean, median, and standard deviation of the scores. They also create visualizations such as histograms and box plots to show the distribution of scores.
  • A marketing team wants to understand the attitudes and behaviors of consumers towards a new product. They conduct a series of focus groups and use qualitative coding to identify common themes and patterns in the data. They also create visualizations such as word clouds to show the most frequently mentioned topics.
  • An environmental scientist wants to describe the biodiversity of a particular ecosystem. They conduct an observational study in which they collect data on the species and abundance of plants and animals in the ecosystem. The data is analyzed using descriptive statistics to describe the diversity and richness of the ecosystem.

How to Conduct Descriptive Research Design

To conduct a descriptive research design, you can follow these general steps:

  • Define your research question: Clearly define the research question or problem that you want to address. Your research question should be specific and focused to guide your data collection and analysis.
  • Choose your research method: Select the most appropriate research method for your research question. As discussed earlier, common research methods for descriptive research include surveys, case studies, observational studies, cross-sectional studies, and longitudinal studies.
  • Design your study: Plan the details of your study, including the sampling strategy, data collection methods, and data analysis plan. Determine the sample size and sampling method, decide on the data collection tools (such as questionnaires, interviews, or observations), and outline your data analysis plan.
  • Collect data: Collect data from your sample or population using the data collection tools you have chosen. Ensure that you follow ethical guidelines for research and obtain informed consent from participants.
  • Analyze data: Use appropriate statistical or qualitative analysis methods to analyze your data. As discussed earlier, common data analysis methods for descriptive research include descriptive statistics, cross-tabulation, content analysis, qualitative coding, visualization, and comparative analysis.
  • Interpret results: Interpret your findings in light of your research question and objectives. Identify patterns, trends, and relationships in the data, and describe the characteristics of your sample or population.
  • Draw conclusions and report results: Draw conclusions based on your analysis and interpretation of the data. Report your results in a clear and concise manner, using appropriate tables, graphs, or figures to present your findings. Ensure that your report follows accepted research standards and guidelines.

When to Use Descriptive Research Design

Descriptive research design is used in situations where the researcher wants to describe a population or phenomenon in detail. It is used to gather information about the current status or condition of a group or phenomenon without making any causal inferences. Descriptive research design is useful in the following situations:

  • Exploratory research: Descriptive research design is often used in exploratory research to gain an initial understanding of a phenomenon or population.
  • Identifying trends: Descriptive research design can be used to identify trends or patterns in a population, such as changes in consumer behavior or attitudes over time.
  • Market research: Descriptive research design is commonly used in market research to understand consumer preferences, behavior, and attitudes.
  • Health research: Descriptive research design is useful in health research to describe the prevalence and distribution of a disease or health condition in a population.
  • Social science research: Descriptive research design is used in social science research to describe social phenomena such as cultural norms, values, and beliefs.
  • Educational research: Descriptive research design is used in educational research to describe the performance of students, schools, or educational programs.

Purpose of Descriptive Research Design

The main purpose of descriptive research design is to describe and measure the characteristics of a population or phenomenon in a systematic and objective manner. It involves collecting data that describe the current status or condition of the population or phenomenon of interest, without manipulating or altering any variables.

The purpose of descriptive research design can be summarized as follows:

  • To provide an accurate description of a population or phenomenon: Descriptive research design aims to provide a comprehensive and accurate description of a population or phenomenon of interest. This can help researchers to develop a better understanding of the characteristics of the population or phenomenon.
  • To identify trends and patterns: Descriptive research design can help researchers to identify trends and patterns in the data, such as changes in behavior or attitudes over time. This can be useful for making predictions and developing strategies.
  • To generate hypotheses: Descriptive research design can be used to generate hypotheses or research questions that can be tested in future studies. For example, if a descriptive study finds a correlation between two variables, this could lead to the development of a hypothesis about the causal relationship between the variables.
  • To establish a baseline: Descriptive research design can establish a baseline or starting point for future research. This can be useful for comparing data from different time periods or populations.

Characteristics of Descriptive Research Design

Descriptive research design has several key characteristics that distinguish it from other research designs. Some of the main characteristics of descriptive research design are:

  • Objective : Descriptive research design is objective in nature, which means that it focuses on collecting factual and accurate data without any personal bias. The researcher aims to report the data objectively without any personal interpretation.
  • Non-experimental: Descriptive research design is non-experimental, which means that the researcher does not manipulate any variables. The researcher simply observes and records the behavior or characteristics of the population or phenomenon of interest.
  • Quantitative : Descriptive research design is quantitative in nature, which means that it involves collecting numerical data that can be analyzed using statistical techniques. This helps to provide a more precise and accurate description of the population or phenomenon.
  • Cross-sectional: Descriptive research design is often cross-sectional, which means that the data is collected at a single point in time. This can be useful for understanding the current state of the population or phenomenon, but it may not provide information about changes over time.
  • Large sample size: Descriptive research design typically involves a large sample size, which helps to ensure that the data is representative of the population of interest. A large sample size also helps to increase the reliability and validity of the data.
  • Systematic and structured: Descriptive research design involves a systematic and structured approach to data collection, which helps to ensure that the data is accurate and reliable. This involves using standardized procedures for data collection, such as surveys, questionnaires, or observation checklists.

Advantages of Descriptive Research Design

Descriptive research design has several advantages that make it a popular choice for researchers. Some of the main advantages of descriptive research design are:

  • Provides an accurate description: Descriptive research design is focused on accurately describing the characteristics of a population or phenomenon. This can help researchers to develop a better understanding of the subject of interest.
  • Easy to conduct: Descriptive research design is relatively easy to conduct and requires minimal resources compared to other research designs. It can be conducted quickly and efficiently, and data can be collected through surveys, questionnaires, or observations.
  • Useful for generating hypotheses: Descriptive research design can be used to generate hypotheses or research questions that can be tested in future studies. For example, if a descriptive study finds a correlation between two variables, this could lead to the development of a hypothesis about the causal relationship between the variables.
  • Large sample size : Descriptive research design typically involves a large sample size, which helps to ensure that the data is representative of the population of interest. A large sample size also helps to increase the reliability and validity of the data.
  • Can be used to monitor changes : Descriptive research design can be used to monitor changes over time in a population or phenomenon. This can be useful for identifying trends and patterns, and for making predictions about future behavior or attitudes.
  • Can be used in a variety of fields : Descriptive research design can be used in a variety of fields, including social sciences, healthcare, business, and education.

Limitation of Descriptive Research Design

Descriptive research design also has some limitations that researchers should consider before using this design. Some of the main limitations of descriptive research design are:

  • Cannot establish cause and effect: Descriptive research design cannot establish cause and effect relationships between variables. It only provides a description of the characteristics of the population or phenomenon of interest.
  • Limited generalizability: The results of a descriptive study may not be generalizable to other populations or situations. This is because descriptive research design often involves a specific sample or situation, which may not be representative of the broader population.
  • Potential for bias: Descriptive research design can be subject to bias, particularly if the researcher is not objective in their data collection or interpretation. This can lead to inaccurate or incomplete descriptions of the population or phenomenon of interest.
  • Limited depth: Descriptive research design may provide a superficial description of the population or phenomenon of interest. It does not delve into the underlying causes or mechanisms behind the observed behavior or characteristics.
  • Limited utility for theory development: Descriptive research design may not be useful for developing theories about the relationship between variables. It only provides a description of the variables themselves.
  • Relies on self-report data: Descriptive research design often relies on self-report data, such as surveys or questionnaires. This type of data may be subject to biases, such as social desirability bias or recall bias.



Writing a Case Study


What is a case study?


A Case study is: 

  • An in-depth research design that primarily uses a qualitative methodology but sometimes includes quantitative methodology.
  • Used to examine an identifiable problem confirmed through research.
  • Used to investigate an individual, group of people, organization, or event.
  • Used to mostly answer "how" and "why" questions.

What are the different types of case studies?


Note: As you continue to research and learn about case studies, you will find a robust list of different types beyond the primary ones.

Who are your case study participants?


What is triangulation?

Validity and credibility are an essential part of the case study. Therefore, the researcher should include triangulation to ensure trustworthiness while accurately reflecting what the researcher seeks to investigate.


How to write a Case Study?

When developing a case study, there are different ways you could present the information, but remember to include the five parts for your case study.



Descriptive Statistics


Descriptive statistics is a subfield of statistics that deals with characterizing the features of known data. Descriptive statistics give summaries of either population or sample data. Aside from descriptive statistics, inferential statistics is another important discipline of statistics used to draw conclusions about population data.

Descriptive statistics is divided into two categories:

  • Measures of Central Tendency
  • Measures of Dispersion

In this article, we will learn about descriptive statistics, including their many categories, formulae, and examples in detail.

What is Descriptive Statistics?

Descriptive statistics is a branch of statistics focused on summarizing, organizing, and presenting data in a clear and understandable way. Its primary aim is to define and analyze the fundamental characteristics of a dataset without making sweeping generalizations or assumptions about the entire data set.

The main purpose of descriptive statistics is to provide a straightforward and concise overview of the data, enabling researchers or analysts to gain insights and understand patterns, trends, and distributions within the dataset.

Descriptive statistics typically involve measures of central tendency (such as mean, median, mode), dispersion (such as range, variance, standard deviation), and distribution shape (including skewness and kurtosis). Additionally, graphical representations like charts, graphs, and tables are commonly used to visualize and interpret the data.

Histograms, bar charts, pie charts, scatter plots, and box plots are some examples of widely used graphical techniques in descriptive statistics.

Descriptive Statistics Definition

Descriptive statistics is a type of statistical analysis that uses quantitative methods to summarize the features of a population sample. It is useful to present easy and exact summaries of the sample and observations using metrics such as mean, median, variance, graphs, and charts.

Types of Descriptive Statistics

There are three types of descriptive statistics:

  • Measures of Central Tendency
  • Measures of Variability (Dispersion)
  • Measures of Frequency Distribution

Measures of Central Tendency

The central tendency is defined as a statistical measure that may be used to describe a complete distribution or dataset with a single value, known as a measure of central tendency. Any of the central tendency measures accurately describes the whole data distribution. In the following sections, we will look at the central tendency measures, their formulae, applications, and kinds in depth.

Mean

Mean is the sum of all the components in a group or collection divided by the number of items in that group or collection. The mean of a data collection is typically represented as x̄ (pronounced “x bar”). The formula for calculating the mean of ungrouped data is given as follows:

For a series of observations:

x̄ = Σx / n
  • x̄ = Mean Value of Provided Dataset
  • Σx = Sum of All Terms
  • n = Number of Terms

Example: Weights of 7 girls in kg are 54, 32, 45, 61, 20, 66 and 50. Determine the mean weight for the provided collection of data.

Mean = Σx/n = (54 + 32 + 45 + 61 + 20 + 66 + 50)/7 = 328/7 ≈ 46.86
Thus, the group’s mean weight is approximately 46.86 kg.

Median

The median, one of the measures of central tendency, is the value of the middle-most observation obtained after arranging the data in ascending order. The median formula may be used to compute the median for many types of data, such as grouped and ungrouped data.

Ungrouped Data Median (n is odd): [(n + 1)/2]th term
Ungrouped Data Median (n is even): [(n/2)th term + ((n/2) + 1)th term] / 2

Example: Weights of 7 girls in kg are 54, 32, 45, 61, 20, 66 and 50. Determine the median weight for the provided collection of data.

Arrange the provided data collection in ascending order: 20, 32, 45, 50, 54, 61, 66.
Median = [(n + 1)/2]th term = [(7 + 1)/2]th term = 4th term = 50
Thus, the group’s median weight is 50 kg.

Mode

Mode is one of the measures of central tendency, defined as the value that appears most frequently in the provided data, i.e. the observation with the highest frequency is known as the mode of the data. The formula provided below can be used to compute the mode for ungrouped data.

Mode of Ungrouped Data: Most Repeated Observation in Dataset

Example: Weights of 7 girls in kg are 54, 32, 45, 61, 20, 45 and 50. Determine the mode weight for the provided collection of data.

Mode = most repeated observation in the dataset = 45
Thus, the group’s mode weight is 45 kg.

Measures of Variability

If the variability of data within an experiment must be established, absolute measures of variability should be employed. These metrics often reflect differences in a data collection in terms of the average deviations of the observations. The most prevalent absolute measures of deviation are mentioned below. In the following sections, we will look at the variability measures and their formulae in depth.

Range

The range represents the spread of your data from the lowest to the highest value in the distribution. It is the most straightforward measure of variability to compute. To get the range, subtract the data set’s lowest and highest values.

Range = Highest Value – Lowest Value

Example: Calculate the range of the following data series:  5, 13, 32, 42, 15, 84

Arrange the provided data series in ascending order: 5, 13, 15, 32, 42, 84.
Range = H − L = 84 − 5 = 79
So, the range is 79.

Standard Deviation

Standard deviation (s or SD) represents the average level of variability in your dataset. It represents the average deviation of each score from the mean. The higher the standard deviation, the more varied the dataset is.

To calculate standard deviation, follow these six steps:

Step 1: Make a list of each score and calculate the mean.

Step 2: Calculate deviation from the mean, by subtracting the mean from each score.

Step 3: Square each of these differences.

Step 4: Sum up all the squared deviations.

Step 5: Divide the sum of squared deviations by N − 1.

Step 6: Find the square root of the number that you discovered.

Example: Calculate standard deviation of the following data series:  5, 13, 32, 42, 15, 84.

Step 1: Calculate the mean of the series using the formula Σx/n: (5 + 13 + 32 + 42 + 15 + 84)/6 = 191/6 ≈ 31.83.

Step 2: Calculate the deviation from the mean by subtracting the mean from each value.

Step 3: Square each deviation and add up the squared deviations: the sum is approximately 4182.83.

Step 4: Divide the sum of squared deviations by N − 1: 4182.83 / 5 ≈ 836.57.

Step 5: Take the square root: √836.57 ≈ 28.92.

So, the standard deviation is approximately 28.92.
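The six steps map directly onto a few lines of R; a minimal sketch for the example series, with the built-in function sd() as a check:

    x <- c(5, 13, 32, 42, 15, 84)
    m <- mean(x)                # Step 1: mean of the series (about 31.83)
    dev <- x - m                # Step 2: deviations from the mean
    sq <- dev^2                 # Step 3: squared deviations
    ss <- sum(sq)               # Step 4: sum of the squared deviations (about 4182.83)
    v <- ss / (length(x) - 1)   # Step 5: divide by N - 1 (about 836.57)
    sqrt(v)                     # Step 6: square root, about 28.92
    sd(x)                       # built-in standard deviation gives the same result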

Variance

Variance is calculated as the average of squared departures from the mean. Variance measures the degree of dispersion in a data collection. The more scattered the data, the larger the variance in relation to the mean. To calculate the variance, square the standard deviation.

The symbol for variance is s².

Example: Calculate the variance of the following data series:  5, 13, 32, 42, 15, 84.

We calculated the standard deviation above: SD ≈ 28.92. s² = (SD)² = 4182.83/5 ≈ 836.57 (squaring the rounded value 28.92 gives 836.37; the difference is due to rounding). So, the variance is approximately 836.57.

Mean Deviation

Mean Deviation is used to find the average of the absolute value of the data about the mean, median, or mode. Mean Deviation is sometimes also known as absolute deviation. The formula for mean deviation is given as follows:

Mean Deviation = Σ|X − μ| / n
  • μ is Central Value (mean, median, or mode)
  • n is Number of Observations

Quartile Deviation

Quartile Deviation is half of the difference between the third and first quartiles. The formula for quartile deviation is given as follows:

Quartile Deviation = (Q₃ − Q₁)/2
  • Q₃ is Third Quartile
  • Q₁ is First Quartile

Other measures of dispersion include the relative measures also known as the coefficients of dispersion.

Measures of Frequency Distribution

Datasets consist of various scores or values. Statisticians employ graphs and tables to summarize the occurrence of each possible value of a variable, often presented in percentages or numerical figures.

For instance, suppose you were conducting a poll to determine people’s favorite Beatle. You would create one column listing all potential options (John, Paul, George, and Ringo) and another column indicating the number of votes each received. Statisticians represent these frequency distributions through graphs or tables.

Univariate Descriptive Statistics

Univariate descriptive statistics focus on one thing at a time. We look at each thing individually and use different ways to understand it better. Programs like SPSS and Excel can help us with this.

If we only look at the average (mean) of something, like how much people earn, it might not give us the true picture, especially if some people earn a lot more or less than others. Instead, we can also look at other things like the middle value (median) or the one that appears most often (mode). And to understand how spread out the values are, we use things like standard deviation and variance along with the range.

Bivariate Descriptive Statistics

When we have information about more than one thing, we can use bivariate or multivariate descriptive statistics to see if they are related. Bivariate analysis compares two things to see if they change together. Before doing any more complicated tests, it’s important to look at how the two things compare in the middle.

Multivariate analysis is similar to bivariate analysis, but it looks at more than two things at once, which helps us understand relationships even better.

Representations of Data in Descriptive Statistics

Descriptive statistics use a variety of ways to summarize and present data in an understandable manner. This helps us grasp the data set’s patterns, trends, and properties.

Frequency Distribution Tables: Frequency distribution tables divide data into categories or intervals and display the number of observations (frequency) that fall into each one. For example, suppose we have a class of 20 students and are tracking their test scores. We may make a frequency distribution table that contains score ranges (e.g., 0-10, 11-20) and displays how many students scored in each range.

Graphs and Charts: Graphs and charts graphically display data, making it simpler to understand and analyze. For example, using the same test score data, we may generate a bar graph with the x-axis representing score ranges and the y-axis representing the number of students. Each bar on the graph represents a score range, and its height shows the number of students scoring within that range.

These approaches help us summarize and visualize data, making it easier to discover trends, patterns, and outliers, which is critical for making informed decisions and reaching meaningful conclusions in a variety of sectors.
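As an illustration, a short R sketch that builds such a frequency distribution table and bar graph; the 20 test scores here are made up for the example:

    scores <- c(7, 15, 23, 34, 42, 48, 55, 58, 63, 67,
                71, 74, 78, 81, 84, 88, 91, 93, 96, 99)             # hypothetical scores
    bins <- cut(scores, breaks = seq(0, 100, by = 10))              # score ranges of width 10
    freq <- table(bins)                                             # frequency distribution table
    freq
    barplot(freq, xlab = "Score range", ylab = "Number of students")   # bar graph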

Descriptive Statistics Applications

Descriptive statistics are used in a variety of sectors to summarize, organize, and display data in a meaningful and intelligible way. Here are a few popular applications:

  • Business and Economics: Descriptive statistics are useful for analyzing sales data, market trends, and customer behaviour. They are used to generate averages, medians, and standard deviations in order to better evaluate product performance, pricing strategies, and financial metrics.
  • Healthcare: Descriptive statistics are used to analyze patient data such as demographics, medical histories, and treatment outcomes. They assist healthcare workers in determining illness prevalence, assessing treatment efficacy, and identifying risk factors.
  • Education: Descriptive statistics are useful in education since they summarize student performance on tests and examinations. They assist instructors in assessing instructional techniques, identifying areas for improvement, and monitoring student growth over time.
  • Market Research: Descriptive statistics are used to analyze customer preferences, product demand, and market trends. They enable businesses to make educated decisions about product development, advertising campaigns, and market segmentation.
  • Finance and investment: Descriptive statistics are used to analyze stock market data, portfolio performance, and risk management. They assist investors in determining investment possibilities, tracking asset values, and evaluating financial instruments.

Difference Between Descriptive Statistics and Inferential Statistics

The key difference is that descriptive statistics summarize and describe the features of a dataset that has already been collected, whereas inferential statistics use sample data to draw conclusions or make predictions about a wider population (see also the FAQ below).

Examples of Descriptive Statistics

Example 1: Calculate the Mean, Median and Mode for the following series: {4, 8, 9, 10, 6, 12, 14, 4, 5, 3, 4}

First, we calculate the mean.
Mean = Σx/n = (4 + 8 + 9 + 10 + 6 + 12 + 14 + 4 + 5 + 3 + 4)/11 = 79/11 ≈ 7.1818. Thus, the mean is approximately 7.1818.
Next, we calculate the median. Arrange the provided data collection in ascending order: 3, 4, 4, 4, 5, 6, 8, 9, 10, 12, 14.
Median = [(n + 1)/2]th term = [(11 + 1)/2]th term = 6th term = 6. Thus, the median is 6.
Finally, we calculate the mode.
Mode = the most repeated observation in the dataset = 4. Thus, the mode is 4.

Example 2: Calculate the Range for the following series: {4, 8, 9, 10, 6, 12, 14, 4, 5, 3, 4}

Arrange the provided data series in ascending order: 3, 4, 4, 4, 5, 6, 8, 9, 10, 12, 14.
Range = H − L = 14 − 3 = 11
So, the range is 11.

Example 3: Calculate the standard deviation and variance of following data: {12, 24, 36, 48, 10, 18}

First we compute the standard deviation. Calculate the mean (148/6 ≈ 24.67), the deviation of each value from the mean, and the squared deviations.

Dividing the sum of squared deviations by N − 1: 1093.33 / 5 ≈ 218.67

√(218.67) = 14.79

So, the standard deviation is 14.79.

Now we are going to calculate the variance.

s 2 = 218.744

So, the variance is 218.744
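A short check with Python's statistics module, which also uses n − 1 in the denominator for the sample variance and sample standard deviation (a minimal sketch of Example 3):

```python
import statistics

data = [12, 24, 36, 48, 10, 18]

print(statistics.variance(data))  # 218.666... (sample variance, n - 1 denominator)
print(statistics.stdev(data))     # 14.787... (sample standard deviation)
```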

Practice Problems on Descriptive Statistics

P1) Determine the sample variance of the following series: {17, 21, 52, 28, 26, 23}

P2) Determine the mean and mode of the following series: {21, 14, 56, 41, 18, 15, 18, 21, 15, 18}

P3) Find the median of the following series: {7, 24, 12, 8, 6, 23, 11}

P4) Find the standard deviation and variance of the following series: {17, 28, 42, 48, 36, 42, 20}

FAQs on Descriptive Statistics

What is meant by descriptive statistics?

Descriptive statistics seek to summarize, organize, and display data in an accessible manner while avoiding making sweeping generalizations about the whole population. It aids in discovering patterns, trends, and distributions within the collection.

How is the mean computed in descriptive statistics?

Mean is computed by adding together all of the values in the dataset and dividing them by the total number of observations. It measures the dataset’s central tendency or average value.

What role do measures of variability play in descriptive statistics?

Measures of variability, such as range, standard deviation, and variance, aid in quantifying the spread or dispersion of data points around the mean. They give insights on the dataset’s variety and consistency.

Can you explain the median in descriptive statistics?

The median is the middle value of a dataset when it is sorted in ascending or descending order. It measures central tendency and is especially useful when dealing with skewed data or outliers.

How can frequency distribution measurements contribute to descriptive statistics?

Measures of frequency distribution summarize the incidence of various values or categories within a dataset. They give insights into the distribution pattern of the data and are commonly represented by graphs or tables.

How are inferential statistics distinguished from descriptive statistics?

Inferential statistics use sample data to draw inferences or make predictions about a wider population, whereas descriptive statistics summarize aspects of known data. Descriptive statistics concentrate on the present dataset, whereas inferential statistics go beyond the observable data.

Why are descriptive statistics necessary in data analysis?

Descriptive statistics give researchers and analysts a clear and straightforward summary of the dataset, helping them to identify patterns, trends, and distributions. It aids in making educated judgements and gaining valuable insights from data.

What are the four types of descriptive statistics?

There are four major types of descriptive statistics:

  • Measures of Frequency
  • Measures of Central Tendency
  • Measures of Dispersion or Variation
  • Measures of Position

Which is an example of descriptive statistics?

Descriptive statistics examples include the study of mean, median, and mode.


Innovative Statistics Project Ideas for Insightful Analysis


Table of contents

  • 1.1 AP Statistics Topics for Project
  • 1.2 Statistics Project Topics for High School Students
  • 1.3 Statistical Survey Topics
  • 1.4 Statistical Experiment Ideas
  • 1.5 Easy Stats Project Ideas
  • 1.6 Business Ideas for Statistics Project
  • 1.7 Socio-Economic Easy Statistics Project Ideas
  • 1.8 Experiment Ideas for Statistics and Analysis
  • 2 Conclusion: Navigating the World of Data Through Statistics

Diving into the world of data, statistics presents a unique blend of challenges and opportunities to uncover patterns, test hypotheses, and make informed decisions. It is a fascinating field that offers many opportunities for exploration and discovery. This article is designed to inspire students, educators, and statistics enthusiasts with various project ideas. We will cover:

  • Challenging concepts suitable for advanced placement courses.
  • Accessible ideas that are engaging and educational for younger students.
  • Ideas for conducting surveys and analyzing the results.
  • Topics that explore the application of statistics in business and socio-economic areas.

Each category of topics for the statistics project provides unique insights into the world of statistics, offering opportunities for learning and application. Let’s dive into these ideas and explore the exciting world of statistical analysis.

Top Statistics Project Ideas for High School

Statistics is not only about numbers and data; it’s a unique lens for interpreting the world. Ideal for students, educators, or anyone with a curiosity about statistical analysis, these project ideas offer an interactive, hands-on approach to learning. These projects range from fundamental concepts suitable for beginners to more intricate studies for advanced learners. They are designed to ignite interest in statistics by demonstrating its real-world applications, making it accessible and enjoyable for people of all skill levels.


AP Statistics Topics for Project

  • Analyzing Variance in Climate Data Over Decades.
  • The Correlation Between Economic Indicators and Standard of Living.
  • Statistical Analysis of Voter Behavior Patterns.
  • Probability Models in Sports: Predicting Outcomes.
  • The Effectiveness of Different Teaching Methods: A Statistical Study.
  • Analysis of Demographic Data in Public Health.
  • Time Series Analysis of Stock Market Trends.
  • Investigating the Impact of Social Media on Academic Performance.
  • Survival Analysis in Clinical Trial Data.
  • Regression Analysis on Housing Prices and Market Factors.

Statistics Project Topics for High School Students

  • The Mathematics of Personal Finance: Budgeting and Spending Habits.
  • Analysis of Class Performance: Test Scores and Study Habits.
  • A Statistical Comparison of Local Public Transportation Options.
  • Survey on Dietary Habits and Physical Health Among Teenagers.
  • Analyzing the Popularity of Various Music Genres in School.
  • The Impact of Sleep on Academic Performance: A Statistical Approach.
  • Statistical Study on the Use of Technology in Education.
  • Comparing Athletic Performance Across Different Sports.
  • Trends in Social Media Usage Among High School Students.
  • The Effect of Part-Time Jobs on Student Academic Achievement.

Statistical Survey Topics

  • Public Opinion on Environmental Conservation Efforts.
  • Consumer Preferences in the Fast Food Industry.
  • Attitudes Towards Online Learning vs. Traditional Classroom Learning.
  • Survey on Workplace Satisfaction and Productivity.
  • Public Health: Attitudes Towards Vaccination.
  • Trends in Mobile Phone Usage and Preferences.
  • Community Response to Local Government Policies.
  • Consumer Behavior in Online vs. Offline Shopping.
  • Perceptions of Public Safety and Law Enforcement.
  • Social Media Influence on Political Opinions.

Statistical Experiment Ideas

  • The Effect of Light on Plant Growth.
  • Memory Retention: Visual vs. Auditory Information.
  • Caffeine Consumption and Cognitive Performance.
  • The Impact of Exercise on Stress Levels.
  • Testing the Efficacy of Natural vs. Chemical Fertilizers.
  • The Influence of Color on Mood and Perception.
  • Sleep Patterns: Analyzing Factors Affecting Sleep Quality.
  • The Effectiveness of Different Types of Water Filters.
  • Analyzing the Impact of Room Temperature on Concentration.
  • Testing the Strength of Different Brands of Batteries.

Easy Stats Project Ideas

  • Average Daily Screen Time Among Students.
  • Analyzing the Most Common Birth Months.
  • Favorite School Subjects Among Peers.
  • Average Time Spent on Homework Weekly.
  • Frequency of Public Transport Usage.
  • Comparison of Pet Ownership in the Community.
  • Favorite Types of Movies or TV Shows.
  • Daily Water Consumption Habits.
  • Common Breakfast Choices and Their Nutritional Value.
  • Steps Count: A Week-Long Study.

Business Ideas for Statistics Project

  • Analyzing Customer Satisfaction in Retail Stores.
  • Market Analysis of a New Product Launch.
  • Employee Performance Metrics and Organizational Success.
  • Sales Data Analysis for E-commerce Websites.
  • Impact of Advertising on Consumer Buying Behavior.
  • Analysis of Supply Chain Efficiency.
  • Customer Loyalty and Retention Strategies.
  • Trend Analysis in Social Media Marketing.
  • Financial Risk Assessment in Investment Decisions.
  • Market Segmentation and Targeting Strategies.

Socio-Economic Easy Statistics Project Ideas

  • Income Inequality and Its Impact on Education.
  • The Correlation Between Unemployment Rates and Crime Levels.
  • Analyzing the Effects of Minimum Wage Changes.
  • The Relationship Between Public Health Expenditure and Population Health.
  • Demographic Analysis of Housing Affordability.
  • The Impact of Immigration on Local Economies.
  • Analysis of Gender Pay Gap in Different Industries.
  • Statistical Study of Homelessness Causes and Solutions.
  • Education Levels and Their Impact on Job Opportunities.
  • Analyzing Trends in Government Social Spending.

Experiment Ideas for Statistics and Analysis

  • Multivariate Analysis of Global Climate Change Data.
  • Time-Series Analysis in Predicting Economic Recessions.
  • Logistic Regression in Medical Outcome Prediction.
  • Machine Learning Applications in Statistical Modeling.
  • Network Analysis in Social Media Data.
  • Bayesian Analysis of Scientific Research Data.
  • The Use of Factor Analysis in Psychology Studies.
  • Spatial Data Analysis in Geographic Information Systems (GIS).
  • Predictive Analysis in Customer Relationship Management (CRM).
  • Cluster Analysis in Market Research.

Conclusion: Navigating the World of Data Through Statistics

In this exploration of good statistics project ideas, we’ve ventured through various topics, from the straightforward to the complex, from personal finance to global climate change. These ideas are gateways to understanding the world of data and statistics, and platforms for cultivating critical thinking and analytical skills. Whether you’re a high school student, a college student, or a professional, engaging in these projects can deepen your appreciation of how statistics shapes our understanding of the world around us. These projects encourage exploration, inquiry, and a deeper engagement with the world of numbers, trends, and patterns – the essence of statistics.



  • Open access
  • Published: 14 May 2024

Social media influence on COVID-19 vaccine perceptions among University students: a Malawi case study

  • Mervis Folotiya 1 &
  • Chimwemwe Ngoma   ORCID: orcid.org/0000-0001-8648-1244 1 , 2  

BMC Public Health volume 24, Article number: 1312 (2024)


Introduction

The global fight against the COVID-19 pandemic relies significantly on vaccination. The collective international effort has been massive, but the pace of vaccination finds hindrance due to supply and vaccine hesitancy factors. Understanding public perceptions, especially through the lens of social media, is important. This study investigates the influence of social media on COVID-19 vaccine perceptions among university students in Malawi.

Methods

The study utilized a quantitative methodology and employed a cross-sectional study design to explore the relationship between social media dynamics and COVID-19 vaccine perceptions among 382 randomly sampled students at MUBAS. Data, collected by use of a Likert-scale questionnaire, was analyzed using IBM SPSS 20 for descriptive statistics and Pearson correlation tests.

Results

The findings reveal crucial correlations. Specifically, trust in online vaccine information shows a positive correlation (r = 0.296, p < 0.01) with active engagement in social media discussions. Conversely, a negative correlation surfaces concerning individuals' reactions to vaccine availability in Malawi (r = -0.026, p > 0.05). The demographic overview highlights the prevalence of the 16 to 30 age group, representing 92.9% of respondents.

Conclusions

The identified correlations emphasize the need for careful communication strategies tailored to combat misinformation and enhance vaccine acceptance among the younger demographic in Malawi. The positive correlation between trust in online vaccine information and social media engagement underscores digital platforms' potential for disseminating accurate information. Conversely, the negative correlation with reactions to vaccine availability suggests the presence of complex factors shaping public perceptions.


Vaccination has become an important weapon in the global fight against the Coronavirus Disease 2019 (COVID-19), helping to slow down the virus’s spread [ 1 ]. Global efforts to implement COVID-19 vaccination campaigns have been unprecedented in scale and scope [ 2 ], with governments, pharmaceutical firms, and organizations collaborating to develop, manufacture, and distribute vaccines at a faster pace [ 3 ]. However, the difficulties have been complex, involving issues like vaccine hesitancy and unequal vaccine supply and distribution [ 4 ].

The pandemic has had an impact on Malawi’s healthcare delivery and public health interventions, primarily due to the country’s limited healthcare infrastructure, socio-economic disparities, and cultural beliefs that influence perceptions of healthcare practices [ 5 , 6 ]. This context has also contributed to vaccine hesitancy, resulting in a lower uptake of COVID-19 vaccines in the country. As of May 2023, the uptake rate stands at 40%, which is lower than the country’s 60% target [ 7 , 8 ].

A recent scoping review on COVID-19 vaccination hesitancy among Malawians reveals that COVID-19 vaccine reluctance is primarily the result of misinformation, with vaccines perceived as harmful or dangerous. Myths such as infertility, severe disability, or even death have contributed to vaccine hesitancy [ 9 ]. The review also reveals that some people refuse vaccinations because of their religious convictions and beliefs [ 9 ]. The challenges posed by vaccine hesitancy in Malawi highlight the need for targeted communication strategies and public health initiatives that consider the country’s unique socio-cultural context, aimed at achieving widespread vaccination coverage and understanding public opinions about COVID-19 vaccinations [ 8 , 10 , 11 ]. These strategies must address factors such as misinformation, lack of trust in healthcare institutions, fear of side effects, and cultural beliefs surrounding vaccination, which contribute to reluctance to accept COVID-19 vaccines.

Vaccine perception plays an important role in determining the success of vaccination efforts, and these perceptions are shaped by exposure to (mis)information amplified by the media, the community, and the health system. Notably, social networks may either positively or negatively impact vaccination uptake, depending on their views on vaccines [ 12 ]. Given these challenges, the success of vaccination campaigns relies not only on the development and distribution of vaccines but also on how these interventions are perceived by the public. Public attitudes and beliefs surrounding vaccine safety, efficacy, and necessity significantly impact vaccine uptake [ 11 ].

Social media has emerged as a powerful tool in the current information-dissemination landscape for influencing public opinion. Its role in health communication has expanded significantly, providing a dynamic platform for sharing information, influencing attitudes, and shaping behavior [ 13 ]. During the COVID-19 pandemic, social media platforms played an important role in amplifying public health messages [ 14 ]. However, social media's very nature, marked by a rapid flow of information and a variety of sources, also poses numerous challenges. The spread of false and scientifically inaccurate information is among the issues that health communication must deal with in the digital age [ 15 ].

Within the unique university settings, the dynamics of vaccine perceptions take on a distinctive dimension. The convergence of diverse backgrounds, cultures, and perspectives among university students creates a rich tapestry of attitudes towards health-related issues. Understanding the specific distinctions within this demographic is crucial for tailoring effective public health interventions. Factors such as lack of access, affordability, health disparities, educational background, peer pressure, political views, and lack of trust in institutions may be influenced in unique ways [ 16 ].

Despite the growing body of literature exploring the influence of social media on vaccine perceptions, a research gap exists concerning its specific impact on COVID-19 vaccine perceptions among university students in Malawi. While other studies have examined aspects of social media influence on COVID-19 perceptions [ 17 , 18 , 19 , 20 ], such as the role of trust in online vaccine information and engagement in social media discussions, there remains a need for more research within this specific demographic. Therefore, this study seeks to contribute to bridging this gap by providing insights that not only enhance academic understanding of the subject but also have practical implications for public health communication strategies tailored to the Malawian university setting.

The study employed a quantitative methodology, and utilized a cross-sectional study design to investigate the relationship between social media dynamics and COVID-19 vaccine perceptions among university students at Malawi University of Business and Applied Sciences (MUBAS), which had a student population of 7,619 during the 2022/2023 academic year.

To ensure unbiased participant selection and equitable representation, a simple random sampling technique was employed. Using the population of students at MUBAS, a sampling frame was created in Excel, with student IDs as the unique identifiers. A random number generator was then used in Excel to select participants, ensuring that each student had an equal chance of being selected. Using the Taro Yamane method, a sample size of 380 respondents was determined from the total student population of 7,619 at a precision level of 5%. However, for practical considerations, the sample size was adjusted upwards to 388. Potential participants were approached through an invitation process and were informed about the purpose of the study. A total of 382 complete questionnaires were collected.
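A small sketch of the sampling arithmetic described above: Yamane's formula n = N / (1 + N·e²) with N = 7,619 and e = 0.05 reproduces the reported sample size, while the student ID format shown is purely illustrative:

```python
import random

N = 7_619   # total student population at MUBAS
e = 0.05    # precision level

# Taro Yamane sample size: n = N / (1 + N * e^2)
n = round(N / (1 + N * e ** 2))
print(n)  # 380, as reported (later adjusted upwards to 388 for practical reasons)

# Simple random selection from a sampling frame of student IDs
# (the ID format below is made up for illustration only)
student_ids = [f"MUBAS{i:05d}" for i in range(1, N + 1)]
sample = random.sample(student_ids, 388)
print(len(sample), sample[:3])
```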

Data collection

A self-administered questionnaire, organized into three sections (demographics, access to COVID-19 vaccine information on social media, and awareness of COVID-19 vaccine information), was used for data collection. Respondents’ attitudes and perspectives were recorded using the Likert scale. The data collection process involved a comprehensive exploration of variables, covering aspects such as social media usage, access to COVID-19 vaccine information, engagement levels, trust, and demographic details. The survey yielded a response rate of 99.5%, indicating robust participant engagement and contributing to the reliability of the gathered data. Ethical considerations, including confidentiality, consent, and voluntary participation, were observed to safeguard the privacy of the participants. Additionally, the study was evaluated and approved by the MUBAS Postgraduate Research Evaluation Committee, study number MMS/20/PG/004.

Data analysis

Rigorous editing procedures were applied to ensure data completeness and consistency. IBM SPSS 20 software was used for data coding, cleaning, and analysis. Descriptive statistics and Pearson correlation tests were also performed to derive insights from the dataset. In-depth analyses involved the interpretation of demographic data, an assessment of social media dynamics, and exploration of relationships through Pearson correlation. Results were interpreted based on statistical significance, effect size, and alignment with existing research, contributing to a better understanding of the interaction between social media, COVID-19 vaccine misinformation, and hesitancy among university students.
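As an illustration of this analysis step (not the study's actual data or SPSS output), the sketch below computes descriptive statistics and a Pearson correlation on made-up Likert-scale responses for two of the variables named above:

```python
import numpy as np
from scipy import stats

# Made-up 5-point Likert responses for 382 respondents (illustration only):
# trust in online vaccine information and engagement in social media discussions
rng = np.random.default_rng(0)
trust = rng.integers(1, 6, size=382)
engagement = np.clip(trust + rng.integers(-2, 3, size=382), 1, 5)

# Descriptive statistics: mean and sample standard deviation
print(trust.mean(), trust.std(ddof=1))

# Pearson correlation with its two-tailed p-value
r, p = stats.pearsonr(trust, engagement)
print(round(r, 3), round(p, 4))
```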

The study established that all participants used social media and the internet for various purposes. A noteworthy positive correlation (r = 0.296, p < 0.01) emerged, indicating a strong association between trust in online vaccine information and active engagement in social media discussions. Conversely, a significant negative correlation (r = -0.610, p < 0.01) was identified, shedding light on the relationship between individuals' reactions to vaccine availability in Malawi and their trust in online information.

Demographic overview

This section provides an overview of the respondents' demographics using descriptive statistics. Four key characteristics, namely age, year of study, gender, and nationality, were analyzed to describe the composition of the study participants.

The study participants exclusively comprised Malawian students, aligning with the university's predominantly undergraduate student population, which constitutes a significant percentage of the study sample. International students registered at the university are mostly postgraduate students and account for less than 5% of the study population. Respondents covered various age groups, with the majority falling within the 21 to 30 years range (n = 230). This age bracket represents 60.2% of the total respondents. 26.7% of the respondents were in their fourth year, 25.4% in the first year, and 21.7% and 21.2% in the second and third years, respectively. Postgraduate students represented the smallest group at 5%. Regarding gender, 50.3% of respondents were female, while 49.7% were male.

Table 1 Correlation matrix of key variables. Correlation is deemed significant at the 0.01 level (two-tailed); N = 382 for all correlations.

Table  1 presents an analysis of key variables related to COVID-19 vaccine perceptions among university students. The Pearson correlation coefficient shows the relationship between trust in online COVID-19 vaccine information, COVID-19 vaccine effectiveness, participation in COVID-19 discussions on social media, and response to the COVID-19 vaccine.

A noteworthy positive correlation (r = 0.296, p < 0.01) between trust in online vaccine information and participation in social media discussions is observed, indicating that people who trust online vaccine information are more likely to participate in digital health discussions. On the other hand, a significant negative correlation (r = -0.610, p < 0.01) is found between trust in internet-based vaccine information and individuals' attitudes toward vaccination, highlighting the influence of trust on vaccination perceptions. Additionally, a modest negative correlation (r = -0.087, p > 0.05) between having received the COVID-19 vaccine and participating in COVID-19 discussions on social media is noted. Although not statistically significant, this relationship suggests a possible trend whereby vaccinated individuals exhibit lower levels of social media engagement.

The demographic overview in Table 2 reveals that a majority of respondents (n = 230) fall within the 21 to 30 age range. Notably, the 16 to 30 age range constitutes a cumulative 92.9% of the total respondents. This aligns with predominant internet usage patterns globally [ 21 ] and emphasizes the importance of understanding the perspectives of this demographic in shaping COVID-19 vaccine perceptions. Customizing information dissemination to this group's preferences and habits becomes important, since they make up a significant portion of social media and internet users.

The gender distribution in Table 2 shows a near-balanced representation, with 50.3% female and 49.7% male respondents. Previous research on the connection between gender and social media use reveals broader trends indicating that social media is more popular among females than among males [ 22 , 23 , 24 ]. This balance means the study benefits from the diverse perspectives that both genders contribute to social media discussions.

The study examined the relationships among the variables influencing perceptions of COVID-19 vaccines. The analysis in Table 1 reveals a positive correlation between trust in online vaccine information and active engagement in COVID-19 discussions on social media (r = 0.296, p < 0.01). This aligns with existing literature, which suggests that people who place trust in online health information are more likely to actively participate in digital health dialogues [ 25 ]. Furthermore, the relationship between trust and engagement extends beyond mere participation; it influences the dissemination of accurate information and the formation of informed opinions within online communities [ 26 ].

One effective way to promote trust in online vaccine information is through transparent communication practices, including providing information from reputable sources, ensuring data accuracy and reliability, engaging with credible health experts, and conducting educational campaigns on health literacy and critical thinking [ 27 ].

Moreover, the positive correlation emphasizes the potential of digital platforms, particularly social media, in promoting health literacy and influencing public health behaviors [ 28 ]. As individuals trust the information they encounter online, they are more likely to share it with their social networks, leading to broader awareness and understanding of vaccination-related issues [ 26 ]. This phenomenon has implications for public health communication strategies, indicating that efforts to build trust in online vaccine information can have cascading effects on community engagement and knowledge dissemination, ultimately contributing to improved vaccination rates and public health outcomes [ 29 ].

The negative correlation ( r = -0.026, p  > 0.05) found between people’s reactions to vaccine availability in Malawi and their trust in online vaccine information is an interesting finding. On the surface, one might expect that higher trust in online vaccine information would correspond to more positive reactions toward vaccine availability. However, this result is consistent with studies that highlight the complexity of public attitudes toward vaccination, often influenced by contextual and socio-cultural factors [ 30 , 31 ]. This unexpected finding prompts an exploration into the factors influencing public sentiment in the context of vaccine availability. Factors such as misinformation, fear of side effects, and cultural beliefs surrounding vaccination play a crucial role in shaping vaccine perceptions among Malawians, contributing to vaccine hesitancy and lower uptake rates compared to the set targets [ 5 , 6 ].

Additionally, a recent study on the impact of social media news on COVID-19 vaccine hesitancy and vaccination behavior suggests that individuals are more sensitive to vaccine risk news than safety news on social media, indicating a relationship between the type of information and its impact on perception [ 17 ]. This resonates with the findings of this study, highlighting the complex nature of public sentiment, shaped by the interaction of trust, engagement, and the specific content of vaccine-related information on social media.

Moreover, although the negative correlation between COVID-19 vaccination and participation in COVID-19 discussions on social media (r = -0.087, p > 0.05) was not statistically significant, it indicates a potential trend worth exploring further. This finding suggests that people who have received the COVID-19 vaccine may show slightly lower levels of participation in COVID-19 discussions on social media. This observation raises questions about the factors influencing online engagement among vaccinated individuals within this population group. One possible factor contributing to this trend could be that vaccinated individuals feel a reduced sense of urgency or concern about COVID-19 compared to unvaccinated individuals, leading to less active participation in discussions about the virus on social media.

This study underscores the interaction between trust in online vaccine information, social media engagement, and public perception regarding COVID-19 vaccination. The positive correlation identified between trust and active participation in social media discussions highlights the role of reliable online sources in shaping public discourse. The negative correlation between trust and individuals’ reactions to vaccine availability prompts a deeper exploration into the factors influencing public perception. By acknowledging and addressing these factors, policymakers and healthcare providers can enhance vaccine acceptance and uptake rates.

Limitations and future directions

While the study provides valuable insights, certain limitations warrant acknowledgment. The reliance on self-reported data and the cross-sectional design inherent in the methodology limit the extent to which causal inferences can be drawn [ 32 ]. Additionally, the exclusive focus on students from MUBAS may not fully capture the broader spectrum of the population. To address these constraints, future research could employ longitudinal designs and incorporate diverse demographic groups for a more comprehensive understanding of the dynamics shaping COVID-19 vaccine perceptions.

The findings of this study underscore the importance of promoting trust in online vaccine information and leveraging digital platforms, particularly social media, to enhance health literacy and influence public health behaviors. Addressing vaccine hesitancy requires tailored communication strategies that are responsive to widespread concerns. By actively promoting trust in the veracity of online vaccine information and recognizing the influence of contextual and socio-cultural factors on public sentiment, public health campaigns can effectively utilize social media platforms to promote positive attitudes and perceptions regarding COVID-19 vaccination.

Data availability

The data obtained from the project is accessible and can be provided by the first author upon reasonable request.

Abbreviations

COVID-19: Coronavirus Disease 2019

MUBAS: Malawi University of Business and Applied Sciences

Tang B, Zhang X, Li Q, Bragazzi NL, Golemi-Kotra D, Wu J. The minimal COVID-19 vaccination coverage and efficacy to compensate for a potential increase of transmission contacts, and increased transmission probability of the emerging strains. BMC Public Health [Internet]. 2022;22(1). https://doi.org/10.1186/s12889-022-13429-w .

Machado BAS, Hodel KVS, Fonseca LMDS, Pires VC, Mascarenhas LAB, Da Silva Andrade LPC et al. The importance of vaccination in the context of the COVID-19 pandemic: A brief update regarding the use of vaccines. Vaccines [Internet]. 2022;10(4):591. https://doi.org/10.3390/vaccines10040591 .

Druedahl LC, Minssen T, Price WN. Collaboration in times of crisis: A study on COVID-19 vaccine R&D partnerships. Vaccine [Internet]. 2021;39(42):6291–5. https://doi.org/10.1016/j.vaccine.2021.08.101 .

Blasioli E, Mansouri B, Tamvada SS, Hassini E. Vaccine Allocation and Distribution: A Review with a Focus on Quantitative Methodologies and Application to Equity, Hesitancy, and COVID-19 Pandemic. Operations Research Forum [Internet]. 2023;4(2). https://doi.org/10.1007/s43069-023-00194-8 .

Phiri M, MacPherson E, Panulo M, Chidziwisano K, Kalua K, Chirambo CM et al. Preparedness for and impact of COVID-19 on primary health care delivery in urban and rural Malawi: a mixed methods study. BMJ Open [Internet]. 2022;12(6):e051125. https://doi.org/10.1136/bmjopen-2021-051125 .

Chawinga WD, Singini W, Phuka J, Chimbatata N, Mitambo C, Sambani C et al. Combating coronavirus disease (COVID-19) in rural areas of Malawi: Factors affecting the fight. African Journal of Primary Health Care & Family Medicine [Internet]. 2023;15(1). https://doi.org/10.4102/phcfm.v15i1.3464 .

TRADING ECONOMICS. Malawi Coronavirus COVID-19 vaccination rate [Internet]. TRADING ECONOMICS. 2024 [cited 2024 Mar 23]. https://tradingeconomics.com/malawi/coronavirus-vaccination-rate .

Bwanali AN, Lubanga A, Mphepo M, Munthali L, Chumbi GD, Kangoma M. Vaccine hesitancy in Malawi: a threat to already-made health gains. Annals of Medicine and Surgery [Internet]. 2023;85(10):5291–3. https://doi.org/10.1097/ms9.0000000000001198 .

Nkambule E, Mbakaya BC. COVID-19 vaccination hesitancy among Malawians: a scoping review. Systematic Reviews [Internet]. 2024;13(1). https://doi.org/10.1186/s13643-024-02499-z .

West R, Hurst NB, Sharma S, Henry B, Vitale-Rogers S, Mutahi W et al. Communication strategies to promote vaccination behaviours in sub-Saharan Africa. BMC Global and Public Health [Internet]. 2023;1(1). https://doi.org/10.1186/s44263-023-00004-7 .

Paul E, Steptoe A, Fancourt D. Attitudes towards vaccines and intention to vaccinate against COVID-19: Implications for public health communications. The Lancet Regional Health - Europe [Internet]. 2021;1:100012. https://doi.org/10.1016/j.lanepe.2020.100012 .

Loreche AM, Pepito VCF, Sumpaico-Tanchanco LB, Dayrit MM. COVID-19 vaccine brand hesitancy and other challenges to vaccination in the Philippines. PLOS Global Public Health [Internet]. 2022;2(1):e0000165. https://doi.org/10.1371/journal.pgph.0000165 .

Benetoli A, Chen T, Aslani P. How patients’ use of social media impacts their interactions with healthcare professionals. Patient Education and Counseling [Internet]. 2018;101(3):439–44. https://doi.org/10.1016/j.pec.2017.08.015 .

Obi-Ani NA, Anikwenze C, Isiani MC. Social media and the Covid-19 pandemic: Observations from Nigeria. Cogent Arts & Humanities [Internet]. 2020;7(1):1799483. https://doi.org/10.1080/23311983.2020.1799483 .

Kadam A, Atre S. Negative impact of social media panic during the COVID-19 outbreak in India. Journal of Travel Medicine [Internet]. 2020;27(3). https://doi.org/10.1093/jtm/taaa057 .

Gilbert-Esparza E, Brady A, Haas S, Wittstruck H, Miller J, Kang Q et al. Vaccine hesitancy in college students. Vaccines [Internet]. 2023;11(7):1243. https://doi.org/10.3390/vaccines11071243 .

Zhang Q, Zhang R, Wu WC, Liu Y, Yu Z. Impact of social media news on COVID-19 vaccine hesitancy and vaccination behavior. Telematics and Informatics [Internet]. 2023;80:101983. https://doi.org/10.1016/j.tele.2023.101983 .

Cascini F, Pantović A, Al-Ajlouni YA, Failla G, Puleo V, Melnyk A et al. Social media and attitudes towards a COVID-19 vaccination: A systematic review of the literature. EClinicalMedicine [Internet]. 2022;48:101454. https://doi.org/10.1016/j.eclinm.2022.101454 .

Gudi SK, George SM, José J. Influence of social media on the public perspectives of the safety of COVID-19 vaccines. Expert Review of Vaccines [Internet]. 2022;21(12):1697–9. https://doi.org/10.1080/14760584.2022.2061951 .

Wilson SL, Wiysonge CS. Social media and vaccine hesitancy. BMJ Global Health [Internet]. 2020;5(10):e004206. https://doi.org/10.1136/bmjgh-2020-004206 .

Statista. Age distribution of internet users worldwide 2021 [Internet]. Statista. 2023. https://www.statista.com/statistics/272365/age-distribution-of-internet-users-worldwide/ .

Booker C, Kelly Y, Sacker A. Gender differences in the associations between age trends of social media interaction and well-being among 10–15 year olds in the UK. BMC Public Health [Internet]. 2018;18(1). https://doi.org/10.1186/s12889-018-5220-4 .

Karatsoli M, Nathanail E. Examining gender differences of social media use for activity planning and travel choices. European Transport Research Review [Internet]. 2020;12(1). https://doi.org/10.1186/s12544-020-00436-4 .

Chidiac M, Ross C, Marston HR, Freeman S. Age and Gender Perspectives on Social Media and Technology Practices during the COVID-19 Pandemic. International Journal of Environmental Research and Public Health [Internet]. 2022;19(21):13969. https://doi.org/10.3390/ijerph192113969 .

Impact of internet use on health-related behaviors and the patient-physician relationship: a survey-based study and review [Internet]. PubMed. 2008. https://pubmed.ncbi.nlm.nih.gov/19075034 .

Westney ZV, Hur I, Wang L, Sun J. Examining the effects of disinformation and trust on social media users’ COVID-19 vaccine decision-making. Information Technology & People [Internet]. 2023; https://doi.org/10.1108/itp-05-2022-0410 .

Fan J, Wang X, Du S, Mao A, Du H, Qiu W. Discussion of the Trust in Vaccination against COVID-19. Vaccines [Internet]. 2022;10(8):1214. https://doi.org/10.3390/vaccines10081214 .

Al-Dmour H, Masa’deh R, Salman A, Abuhashesh M, Al-Dmour R. Influence of social media platforms on public health protection against the COVID-19 pandemic via the mediating effects of public health awareness and behavioral changes: Integrated model. Journal of Medical Internet Research [Internet]. 2020;22(8):e19996. https://doi.org/10.2196/19996 .

De Freitas L, Basdeo D, Wang H. Public trust, information sources and vaccine willingness related to the COVID-19 pandemic in Trinidad and Tobago: an online cross-sectional survey. The Lancet Regional Health - Americas [Internet]. 2021;3:100051. https://doi.org/10.1016/j.lana.2021.100051 .

AlShurman BA, Khan AF, Mac C, Majeed M, Butt ZA. What demographic, social, and contextual factors influence the intention to use COVID-19 vaccines: a scoping review. International Journal of Environmental Research and Public Health [Internet]. 2021;18(17):9342. https://doi.org/10.3390/ijerph18179342 .

Larson HJ, Jarrett C, Eckersberger E, Smith D, Paterson P. Understanding vaccine hesitancy around vaccines and vaccination from a global perspective: A systematic review of published literature, 2007–2012. Vaccine [Internet]. 2014;32(19):2150–9. https://doi.org/10.1016/j.vaccine.2014.01.081 .

Levy JT, Maroney J, Kashem MA. Introduction to clinical research. In: Elsevier eBooks [Internet]. 2023. pp. 105–10. https://doi.org/10.1016/b978-0-323-90300-4.00040-9 .


Acknowledgements

We extend our heartfelt gratitude to Dr. Jolly Ntaba, the Supervisor of this study, for his invaluable guidance, and insightful feedback during the research project. We also appreciate the peer review and feedback provided by Mr. Andrew Kaponya and Mr. Ronald Udedi from the Department of Journalism and Media Studies, which greatly contributed to the improvement of this study.

Funding

The authors did not receive any funding to undertake the research and develop this paper.

Author information

Authors and affiliations.

Malawi University of Business and Applied Sciences, Blantyre, Malawi

Mervis Folotiya & Chimwemwe Ngoma

Department of Research and Innovation, ThinkSmart Consulting, Lilongwe, Malawi

Chimwemwe Ngoma


Contributions

The original concept for this study was conceived by MF, who also designed the study and undertook data collection. Data analysis was conducted by MF and CN. CN took the lead in composing the initial draft of this paper. Subsequently, MF and CN engaged in a critical revision process to enhance its intellectual depth. All authors participated in reviewing, reading, and endorsing the final version of the paper.

Corresponding author

Correspondence to Chimwemwe Ngoma .

Ethics declarations

Ethics approval and consent to participate.

The study was evaluated and approved by the Malawi University of Business and Applied Sciences (MUBAS) Postgraduate Research Evaluation Committee, study number MMS/20/PG/004. In addition, informed consent was obtained from all participants prior to their involvement in the study.

Consent for publication

Not applicable.

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note.

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This article has been updated to correct tables 1 & 2.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/ . The Creative Commons Public Domain Dedication waiver ( http://creativecommons.org/publicdomain/zero/1.0/ ) applies to the data made available in this article, unless otherwise stated in a credit line to the data.


About this article

Cite this article.

Folotiya, M., Ngoma, C. Social media influence on COVID-19 vaccine perceptions among University students: a Malawi case study. BMC Public Health 24 , 1312 (2024). https://doi.org/10.1186/s12889-024-18764-8


Received : 07 December 2023

Accepted : 02 May 2024

Published : 14 May 2024

DOI : https://doi.org/10.1186/s12889-024-18764-8


Keywords: Vaccination, Social media, University students, Vaccine perceptions, Communication strategies



  • Open access
  • Published: 11 May 2024

The reliability of the College of Intensive Care Medicine of Australia and New Zealand “Hot Case” examination

  • Kenneth R. Hoffman 1 , 2 ,
  • David Swanson 3 ,
  • Stuart Lane 4 ,
  • Chris Nickson 1 , 2 ,
  • Paul Brand 5 &
  • Anna T. Ryan 3  

BMC Medical Education volume 24, Article number: 527 (2024)


High stakes examinations used to credential trainees for independent specialist practice should be evaluated periodically to ensure defensible decisions are made. This study aims to quantify the College of Intensive Care Medicine of Australia and New Zealand (CICM) Hot Case reliability coefficient and evaluate contributions to variance from candidates, cases and examiners.

This retrospective, de-identified analysis of CICM examination data used descriptive statistics and generalisability theory to evaluate the reliability of the Hot Case examination component. Decision studies were used to project generalisability coefficients for alternate examination designs.

Examination results from 2019 to 2022 included 592 Hot Cases, totalling 1184 individual examiner scores. The mean examiner Hot Case score was 5.17 (standard deviation 1.65). The correlation between candidates’ two Hot Case scores was low (0.30). The overall reliability coefficient for the Hot Case component consisting of two cases observed by two separate pairs of examiners was 0.42. Sources of variance included candidate proficiency (25%), case difficulty and case specificity (63.4%), examiner stringency (3.5%) and other error (8.2%). To achieve a reliability coefficient of > 0.8 a candidate would need to perform 11 Hot Cases observed by two examiners.

The reliability coefficient for the Hot Case component of the CICM second part examination is below the generally accepted value for a high stakes examination. Modifications to case selection and introduction of a clear scoring rubric to mitigate the effects of variation in case difficulty may be helpful. Increasing the number of cases and overall assessment time appears to be the best way to increase the overall reliability. Further research is required to assess the combined reliability of the Hot Case and viva components.


Credentialling medical specialists requires defined performance standards [ 1 , 2 ] and traditionally relies upon high stakes examinations to assess trainees against those standards [ 3 , 4 , 5 ]. These examinations substitute for controlling quality of care by attempting to control progression through training programs for the safety of both patients and society. Specialist colleges are also expected to provide transparent and fair assessment processes, to ensure defensible decisions are made regarding trainee progression and specialist credentialling [ 6 ].

The College of Intensive Care Medicine of Australia and New Zealand (CICM) second part examination was introduced in 1979 and has undergone many revisions [ 3 ]. It has two components: a written examination and, if completed successfully, an oral examination. The oral examination includes an eight-station viva assessment and two clinical “Hot Case” assessments. This Hot Case component targets the highest level of assessment on Miller’s Pyramid [ 7 ], ‘Does’, requiring candidates to be assessed in the workplace performing real-world tasks. Of the candidates who have passed the written examination successfully, only 35% pass both Hot Cases [ 8 ]. It is therefore important to evaluate both the validity of inferences from this examination component and the reliability or reproducibility of the results [ 9 ].

Reliability describes the degree to which variation in scores reflects true variability in candidates’ proficiency, rather than measurement error. This is dependent on the task, examiner stringency and assessment context [ 10 ]. Reliability can be quantified using the reliability coefficient, with 0 representing a completely unreliable assessment and 1 representing a completely reliable assessment. The minimum standard generally considered acceptable for high stakes medical examinations is a reliability coefficient greater than 0.8 [ 11 , 12 , 13 , 14 ].
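In classical terms, this coefficient can be read as the proportion of observed score variance attributable to true differences between candidates (a standard formulation, not one quoted from the paper):

\[\text{reliability}=\frac{\sigma^2_{\text{true}}}{\sigma^2_{\text{true}}+\sigma^2_{\text{error}}}\]

so a value of 0 means all variation in scores is measurement error, and a value of 1 means all variation reflects genuine differences in proficiency.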

Generalisability theory (G-theory) provides the statistical basis for combining multiple sources of variance into a single analysis [ 15 ]. This enables the calculation of an overall reliability coefficient and calculation of the contribution from candidates, cases and examiners to examination reliability. G-theory also provides the basis for conducting decision studies (D-studies) that statistically project reliability based on alternate assessment designs.

To date, no information on the reliability of the CICM second part examination has been published. Given the implications of incorrect credentialling decisions for trainees, patients and society, the Hot Case reliability coefficient should be quantified.

Examination format

The second part examination prior to COVID-19 was held twice yearly with candidates invited to the oral component in a single Australian city. Trainees complete two Hot Cases within metropolitan intensive care units (ICU) with 20 min allocated for each: 10 min to examine an ICU patient, followed by 10 min with paired examiners to present their findings and answer questions regarding investigations and clinical management.

Format changes occurred during the COVID-19 pandemic. The first oral examination was cancelled in 2020, with trainees deferring to the second sitting. Additionally, travel restrictions meant candidates sat the Hot Case component in their home city with local examiners from the second sitting in 2020 to the second sitting in 2021. From 2022 onwards, the oral examination has been held in Sydney, Melbourne, or both.

Hot Cases are marked out of 10 by two CICM examiners using a rating scale that scores candidates based on how comfortable examiners would be supervising them. An acceptable pass standard (5/10) indicates an examiner is comfortable to leave the candidate in charge of the ICU with minimal supervision. There is no specific scoring rubric, although examiner pairs cooperatively determine clinical signs that should be identified, nominate investigations and imaging to show a candidate, and specify discussion questions. Expected levels of knowledge, interpretation and clinical management are defined prospectively. An automatic fail for the entire oral examination is triggered if candidates fail both Hot Cases and obtain a Hot Case component mark < 40% of the possible marks.

Examiner calibration

Examiners undergo calibration training prior to the examination. They independently score the candidate, then discuss their individual scores and rationale. Examiners can then amend their score before recording final scores in the examination database. Each Hot Case is marked by separate pairs of examiners, to prevent bias from a candidates first case performance influencing their second case score. Following the examination, results are presented to the whole examiner cohort for further discussion and explanation.

Data collection

The CICM provided access to their examination database from the second sitting of 2012 (2012-2) through to the first sitting of 2022 (2022-1). For each de-identified candidate, the written mark, overall Hot Case mark, viva mark, and overall examination mark were obtained. The Hot Case specific data included the cases used, examiners present and individual examiner marks, with a total of four scores per candidate (two examiner scores for each Hot Case).

Analysis was restricted to 2019-1 to 2022-1 because earlier data were recorded inconsistently, providing insufficient data for G-theory analysis. Additionally, changes occurred from 2019-1 with the introduction of the Angoff standard setting method [ 16 , 17 ] for the written examination. This altered the final score calculation: the written examination now functions as a barrier examination, and the written score no longer contributes to the final examination score. Candidates were included if they sat the oral examination for the first time in 2019 or later and, if they failed, subsequent attempts were recorded.

Statistical analysis

Statistical analysis used Microsoft Excel and SPSS. Continuous examination scores were summarised using mean and standard deviation. Categorical variables were reported as counts and percentages. Frequency distributions (histograms) were used to graph overall examination component results. A p-value of < 0.05 indicated statistical significance. Comparisons of examiner marks and relationships between examination components were analysed with Pearson’s correlation coefficient and visually represented with scatterplots.

G-theory analysis was used to calculate an overall reliability coefficient for the Hot Case examination, and the factors contributing to variance. As examiners observed multiple candidates and candidates performed multiple Hot Cases, the design was partially crossed. However, as the case identification numbers used in the examination were recorded variably, the initial design was modified to treat cases as nested within candidates for the analysis. The variance factors being analysed included candidate proficiency, examiner stringency, case to case performance variability (case specificity) and other unspecified measurement error. These were reported with variance components, square roots of variance components and percentage of total variance. G-theory was used to conduct D-studies exploring the impact of alternate assessment designs on overall generalisability coefficients and associated standard errors of measurement. The D-study calculated the generalisability coefficient based on the equation listed in Fig.  1 .

Figure 1. Generalisability coefficient equation.
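The exact layout of the equation in Fig. 1 is not reproduced here, but a standard generalisability coefficient for a design with cases nested within candidates, which approximately reproduces the coefficients reported below (0.42 for two cases with two examiners, about 0.80 for eleven cases with two examiners) when the published variance percentages are used as variance components, is:

\[E\rho^2=\frac{\sigma^2_{p}}{\sigma^2_{p}+\dfrac{\sigma^2_{c:p}}{n_c}+\dfrac{\sigma^2_{r}+\sigma^2_{e}}{n_c n_r}}\]

where \(\sigma^2_{p}\) is candidate (proficiency) variance, \(\sigma^2_{c:p}\) is case-within-candidate variance, \(\sigma^2_{r}\) is examiner variance, \(\sigma^2_{e}\) is residual error, and \(n_c\) and \(n_r\) are the numbers of cases and examiners per case.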

Overall, there were 889 candidate oral examination attempts from 2012-2 to 2022-1. After exclusion of candidate oral examination attempts prior to the 2019-1 sitting, exclusion of candidates with first attempts prior to 2019-1 and exclusion of one candidate with missing Hot Case scores, there were 296 candidate oral examination attempts analysed. This included 166 first attempts, 100 second attempts and 30 third attempts. This resulted in 592 Hot Case results and 1184 individual examiner Hot Case scores. The recruitment, exclusion and analysis of the sample are presented in Fig. 2.

Figure 2. CONSORT-style diagram demonstrating the sample size from data request through to the sample available for analysis.

The mean and standard deviation of individual examiner Hot Case scores from all examiners were 5.17 and 1.65 respectively. Of the 1184 Hot Case individual examiner scores, 645 (54.5%) achieved a score of 5 or greater, and 539 (45.5%) scored less than 5. The distribution of individual examiner Hot Case scores is presented in Fig. 3. First attempt candidates scored higher than those repeating (5.25 (SD 1.63) vs. 4.89 (SD 1.66), p < 0.01).

Figure 3. Histogram showing individual examiner Hot Case scores for all attempts.

Scores on each Hot Case are calculated as the mean of the two individual examiner Hot Case scores. Overall, 312 of 592 Hot Cases were passed (52.7%). The correlation coefficient between candidates' first and second Hot Cases was low at 0.30 (Fig. 4).

Figure 4. The correlation between each candidate's first and second Hot Case scores. A jitter function was applied to spread overlying points.

The correlation coefficient between examiners observing the same case (inter-rater agreement) was high at 0.91 (Fig.  5 ).

The summary of sources of variance for individual examiner Hot Case scores is presented in Table  1 .

Figure 5. Comparison between Hot Case scores from the first and second examiners. A jitter function was applied to spread overlying points.

The overall generalisability coefficient of the Hot Case component including two separate cases observed by two examiners each was 0.42.

The results for the D-studies are presented in Table  2 . To achieve a generalisability coefficient of 0.8 or greater, 11 Hot Cases with two examiners would be needed. A graph comparing the generalisability coefficients for one and two examiners is presented in Fig.  6 .

Figure 6. Generalisability coefficients with a variable number of cases comparing examination designs with one and two examiners observing each case.
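As a rough check on these projections, the sketch below applies the generalisability coefficient formula sketched after Fig. 1 to the published variance percentages (candidate 25%, case 63.4%, examiner 3.5%, other error 8.2%). The authors' D-study software and rounding may differ, so treat the outputs as approximate:

```python
def g_coefficient(n_cases: int, n_examiners: int) -> float:
    """Projected generalisability coefficient for n_cases Hot Cases,
    each observed by n_examiners examiners, using the published
    variance percentages as variance components."""
    var_candidate = 25.0   # candidate proficiency
    var_case = 63.4        # case difficulty / case specificity (nested in candidate)
    var_examiner = 3.5     # examiner stringency
    var_error = 8.2        # other error
    error_term = (var_case / n_cases
                  + (var_examiner + var_error) / (n_cases * n_examiners))
    return var_candidate / (var_candidate + error_term)

print(round(g_coefficient(2, 2), 2))   # ~0.42: the current format, two cases with two examiners each
print(round(g_coefficient(11, 2), 2))  # ~0.8: the 11-case design identified in the D-study
```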

The current examination format with two Hot Cases observed by two examiners has a reliability coefficient of 0.42. Achieving the widely accepted standard for high stakes examinations of a reliability coefficient of > 0.8 would require each candidate to sit 11 Hot Cases with two examiners.

These results are similar to those of the Royal Australasian College of Physicians (RACP) 60-minute long case examination observed by two examiners, which has a reliability coefficient of 0.38 [ 18 ]. When the assessment is lengthened to two long cases and four short cases, the RACP achieves a reliability coefficient of 0.71. The RACP continues to use long case examinations, as they are valued by examiners and trainees as an authentic measure of competence with an educational impact from examination preparation [ 18 ]. Educational impact is commonly cited as a reason to retain clinical examinations [ 4 , 19 , 20 ].

G-theory analysis demonstrates that examiners appear well calibrated, as examiner variance was responsible for only 3.5% of overall variance in Hot Case scores. Therefore, adding additional examiners would not substantially improve reliability. However, this conclusion may be affected by the extent of discussion between the examiners prior to recording their amended final scores. If discussion influences the opinions of an examiner strongly, it is likely there will be higher correlation between examiner scores. To evaluate this effect, independent examiner scores would need to be recorded prior to discussion, with clear guidelines around acceptable amendments to scores.

The finding that the majority of Hot Case variance (63.4%) arises from case variation is consistent with anecdotal reports from examination candidates who describe case difficulty as a “lucky dip”. This finding is consistent with the poor correlation (0.30) between candidates’ first and second Hot Cases. Whilst examiners preview the Hot Case patient, there is no formal method of quantifying and adjusting for the difficulty of each case. According to Kane’s Validity Framework [ 21 ], it is difficult to argue that the assessment is valid if the initial scoring and subsequent generalisation of those scores are based more on case specificity than candidate proficiency, particularly when the implications of the results are significant for candidates and patient safety. The CICM has introduced the Angoff method [ 16 ] for the written examination to account for variation in question difficulty and an appropriate standard setting method for the Hot Case component may mitigate this degree of case variability to some extent. The CICM has avoided the use of norm referenced assessments where candidates are compared with their peers so that all candidates deemed competent are eligible to pass. This is appropriate given the low number of candidates in each sitting, the low number of candidates taken to each case and high variability in case difficulty.

Case specificity is the concept that candidate performance depends on the particular case used, and it is a major issue in specialist credentialling examinations [4]. Problem solving ability and clinical reasoning are based on prior clinical experience, areas of particular interest and background knowledge. Candidate performance may therefore be highly case specific, meaning that limited numbers of examination cases have detrimental effects on reliability [4, 5, 22]. In the literature, increasing case numbers or overall assessment time is commonly proposed as a method of obtaining more generalisable results [6, 18, 23, 24]. However, allowing a candidate to pass overall while clearly failing a component of a credentialling examination may be difficult to defend from the perspective of patient safety and societal obligations.

The mean individual examiner Hot Case score (5.17, SD 1.65) is close to the 50% pass/fail boundary. This makes examiners’ decision making difficult, with potentially small differences in performance determining a pass or fail. This is demonstrated in the histogram in Fig. 3, with a large proportion of trainees scoring 4.5 or 5, the junction between a pass and a fail. This dichotomisation should be supported by a clear rubric defining what constitutes a minimally competent performance. Such a rubric would also give candidates clearer performance expectations and may mitigate variability due to case difficulty and specificity by defining expected competencies that are independent of case difficulty.

Assessing the quality of future care using examination performance as a substitute marker of competence has limitations [11]. There are concerns from a validity point of view about decision making based on short periods of assessment [6, 9, 10, 18, 25]. As such, credentialling examinations should focus on identifying low end outliers who may pose a true risk to patients and society without further training. Rather than failing candidates with a borderline performance, the focus should be on increasing the sample size to guide decision making. Offering additional Hot Cases to candidates with a borderline performance on the oral examination is one possible solution to increase reliability for defensible decision making. Summative Hot Cases performed during the training program, rather than only at the time of the final examination, are another option to increase the available data through a transition to a programmatic style of longitudinal assessment.

Restricting the analysis to candidates who sat the written examination from the 2019-1 sitting onwards was necessitated by the quality of the available dataset. This also aided the analysis, as the Angoff method was introduced for the written paper in 2019 [17] and the written score no longer counts toward the overall examination score. Candidates are now considered to have passed or failed the written paper, and to pass the oral examination they then require > 50% from the Hot Case component (worth 30 marks) and the viva component (worth 40 marks) combined. This results in a higher benchmark to pass the examination overall, as previously a strong written mark could contribute to an overall pass despite a weaker oral performance. A minimal sketch of this pass rule is given below.
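The sketch below expresses the oral pass rule just described; the constant names, function name and example scores are hypothetical, and the separate pass/fail hurdle for the written paper is not modelled.

```python
# Sketch of the current oral-examination pass rule described above:
# the Hot Case component is worth 30 marks, the viva component 40 marks,
# and the combined score must exceed 50% of the 70 available marks.
# (The separate pass/fail hurdle for the written paper is not modelled.)

HOT_CASE_MAX = 30
VIVA_MAX = 40

def passes_oral(hot_case_score: float, viva_score: float) -> bool:
    """Return True if the combined oral score exceeds 50% of the 70 available marks."""
    return (hot_case_score + viva_score) > 0.5 * (HOT_CASE_MAX + VIVA_MAX)

# Hypothetical example: 14/30 on the Hot Cases and 22/40 on the viva gives 36/70,
# which exceeds the 35-mark threshold, so the oral is passed despite a
# below-50% Hot Case performance.
print(passes_oral(14, 22))  # True
```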

This research fills a gap in the current understanding of credentialling intensive care physicians. However, it should be considered in the context of the overall assessment process. If high stakes assessment requires a reliability coefficient of > 0.8, this value should be the benchmark for the combined oral examination, including the Hot Case and viva components. Further research is required to assess how the Hot Case component and the viva component interact to form the overall reliability of the oral examination.

The strengths of this study include its originality, the predefined statistical plan, the large cohort and the collaboration with the CICM to provide previously unexamined data for an independent analysis. Additionally, the use of descriptive statistics, G-theory analysis and D-studies provides a comprehensive picture of the Hot Case examination's reliability in its current format.

Study limitations include dataset consistency issues that restricted the study period, the focus specifically on the Hot Case component without an in-depth analysis of the other components of the examination, the focus on traditional psychometric evaluation and the potential overestimation of examiner calibration due to revision of examiner scores after discussion. Evaluating examination performance without external measures of candidate ability is a research design that focuses on the examination itself. Assessment research is often not truly focussed on candidate competence as this is very difficult to study, so it inevitably evaluates the process rather than the product. As such, identifying poor reliability as a weakness of the Hot Case examination does not detract from potential validity in the overall examination process.

Several implications and unanswered questions remain. Firstly, examiners appear well calibrated, but the effect of discussion and score amendment may be significant. Secondly, with additional examiner time, reliability could be increased by offering candidates with borderline results additional cases upon which decisions are made. Thirdly, this research highlights the importance of a scoring rubric and robust processes for data capture. Finally, further research is required to assess how the Hot Case and viva components interact to determine the overall reliability of the oral examination. This should be supported by research aiming to assess the validity of the Hot Case as a method of evaluating clinical competence by comparing it with other forms of assessment and with measures of workplace competence.

Hot Cases have long been a method of assessment in ICU training in Australia and New Zealand, with perceived benefits from the perspective of stakeholder acceptance and educational impact. Changes to the current examination format to increase reliability would solidify its role in the credentialling process by addressing concerns within the ICU community.

The reliability of the CICM Hot Case examination is less than the generally accepted standard for a high stakes credentialling examination. Further examiner training is unlikely to improve the reliability as the examiners appear to be well calibrated. Modifications to case selection and the introduction of a clear scoring rubric to mitigate the effects of variation in case difficulty may be helpful, but are unlikely to improve reliability substantially due to case specificity. Increasing the number of cases and overall assessment time appears to be the best way to increase the overall reliability. Further research is required to assess how the Hot Case and viva results interact to quantify the reliability of the oral examination in its entirety, and to evaluate the validity of the examination format in making credentialling decisions.

Data availability

The datasets analysed during the current study are not publicly available, but are available from the corresponding author on reasonable request.

References

1. Burch VC, Norman GR, Schmidt HG, van der Vleuten CP. Are specialist certification examinations a reliable measure of physician competence? Adv Health Sci Educ Theory Pract. 2008;13(4):521–33. https://doi.org/10.1007/s10459-007-9063-5.

2. Norcini J, Anderson MB, Bollela V, Burch V, Costa MJ, Duvivier R, Hays R, Palacios Mackay MF, Roberts T, Swanson D. Consensus framework for good assessment. Med Teach. 2018;40(11):1102–9. https://doi.org/10.1080/0142159X.2018.1500016.

3. Lee RP, Venkatesh B, Morley P. Evidence-based evolution of the high stakes postgraduate intensive care examination in Australia and New Zealand. Anaesth Intensive Care. 2009;37(4):525–31. https://doi.org/10.1177/0310057X0903700422.

4. Turnbull J, Turnbull J, Jacob P, Brown J, Duplessis M, Rivest J. Contextual considerations in summative competency examinations: relevance to the long case. Acad Med. 2005;80(12):1133–7. https://doi.org/10.1097/00001888-200512000-00014.

5. Memon MA, Joughin GR, Memon B. Oral assessment and postgraduate medical examinations: establishing conditions for validity, reliability and fairness. Adv Health Sci Educ Theory Pract. 2010;15(2):277–89. https://doi.org/10.1007/s10459-008-9111-9.

6. Lane AS, Roberts C, Khanna P. Do we know who the person with the borderline score is, in standard-setting and decision-making. Health Prof Educ. 2020;6(4):617–25. https://doi.org/10.1016/j.hpe.2020.07.001.

7. Miller GE. The assessment of clinical skills/competence/performance. Acad Med. 1990;65(9 Suppl):S63–7. https://doi.org/10.1097/00001888-199009000-00045.

8. College of Intensive Care Medicine of Australia and New Zealand. Second part examination: previous exam reports [Internet]. CICM; 2022 [updated 2023; cited 2023 Oct 30]. Available from: https://www.cicm.org.au/CICM_Media/CICMSite/Files/Exams/2022-1-SP-Exam-Report.pdf.

9. Hoffman K, Nickson CP, Ryan AT, Lane S. Too hot to handle? The validity and reliability of the College of Intensive Care Medicine Hot Case examination. Crit Care Resusc. 2022;24(1):87–92. https://doi.org/10.51893/2022.1.L.

10. Van Der Vleuten C. The assessment of professional competence: developments, research and practical implications. Adv Health Sci Educ. 1996;1:41–67. https://doi.org/10.1007/BF00596229.

11. Bloch R, Norman G. Generalizability theory for the perplexed: a practical introduction and guide: AMEE Guide 68. Med Teach. 2012;34(11):960–92. https://doi.org/10.3109/0142159X.2012.703791.

12. Moonen-van Loon JM, Overeem K, Donkers HH, van der Vleuten CP, Driessen EW. Composite reliability of a workplace-based assessment toolbox for postgraduate medical education. Adv Health Sci Educ Theory Pract. 2013;18(5):1087–102. https://doi.org/10.1007/s10459-013-9450-z.

13. Crossley J, Davies H, Humphris G, Jolly B. Generalisability: a key to unlock professional assessment. Med Educ. 2002;36(10):972–8. https://doi.org/10.1046/j.1365-2923.2002.01320.x.

14. Weller JM, Castanelli DJ, Chen Y, Jolly B. Making robust assessments of specialist trainees’ workplace performance. Br J Anaesth. 2017;118(2):207–14. https://doi.org/10.1093/bja/aew412.

15. Cronbach LJ, Gleser GC, Nanda H, Rajaratnam N. The dependability of behavioral measurements: theory of generalizability for scores and profiles. New York: Wiley; 1972.

16. Angoff WH. Scales, norms, and equivalent scores. In: Educational measurement. 2nd ed. Washington: American Council on Education; 1971.

17. Karcher C. The Angoff method in the written exam of the College of Intensive Care Medicine of Australia and New Zealand: setting a new standard. Crit Care Resusc. 2019;21(1):6–8.

18. Wilkinson TJ, Campbell PJ, Judd SJ. Reliability of the long case. Med Educ. 2008;42(9):887–93. https://doi.org/10.1111/j.1365-2923.2008.03129.x.

19. Tey C, Chiavaroli N, Ryan A. Perceived educational impact of the medical student long case: a qualitative study. BMC Med Educ. 2020;20:257. https://doi.org/10.1186/s12909-020-02182-6.

20. Sim J, Daniel E. The long case as an assessment tool of clinical skills in summative assessment: a necessary evil. Int Med J. 2015;22:537–40.

21. Kane MT. Validating the interpretations and uses of test scores. J Educ Meas. 2013;50:1–73.

22. Swanson DB, Norman GR, Linn RL. Performance-based assessment: lessons from the health professions. Educ Res. 1995;24:5–11.

23. Wass V, Jolly B. Does observation add to the validity of the long case? Med Educ. 2001;35(8):729–34. https://doi.org/10.1046/j.1365-2923.2001.01012.x.

24. Van Der Vleuten C, Schuwirth L. Assessing professional competence: from methods to programmes. Med Educ. 2005;39(3):309–17. https://doi.org/10.1111/j.1365-2929.2005.02094.x.

25. Dijkstra J, Galbraith R, Hodges B, et al. Expert validation of fit-for-purpose guidelines for designing programmes of assessment. BMC Med Educ. 2012;12:20. https://doi.org/10.1186/1472-6920-12-20.


Funding

The CICM provided staff time for data extraction and de-identification prior to release. There was no additional funding provided for the research.

Author information

Authors and affiliations

Intensive Care Unit, The Alfred Hospital, Melbourne, Australia

Kenneth R. Hoffman & Chris Nickson

Department of Epidemiology and Preventative Medicine, School of Public Health, Monash University, Melbourne, Australia

Department of Medical Education, Melbourne Medical School, University of Melbourne, Melbourne, Australia

David Swanson & Anna T. Ryan

Sydney Medical School, The University of Sydney, Sydney, Australia

Stuart Lane

College of Intensive Care Medicine of Australia and New Zealand, Melbourne, Australia


Contributions

KH conceived and designed the study, analysed and interpreted the data and drafted and revised the manuscript. DS designed the study, performed the analysis, interpreted the data and revised the manuscript. SL contributed to the conception, design and interpretation of the study and revised the manuscript. CN contributed to the conception, design and interpretation of the study and revised the manuscript. PB contributed to data acquisition and analysis and revised the manuscript. AR contributed to the conception, design and interpretation of the study and revised the manuscript.

Corresponding author

Correspondence to Kenneth R. Hoffman .

Ethics declarations

Ethics approval and consent to participate

Ethics approval was provided by the University of Melbourne Low Risk Human Ethics Committee (Project number 2022-23964-28268-3). The consent requirement was waived by the University of Melbourne Low Risk Human Ethics Committee as analysis was retrospective using de-identified data with no foreseeable risk to participants in accordance with the Australian National Statement on Ethical Conduct in Human Research 2023.

Consent for publication

Not applicable.

Competing interests

KH (Fellow of the CICM), AR (No conflicts of interest), SL (Fellow of the CICM, CICM Second Part examiner 2011-2023, CICM Second Part examination committee 2019-2023, Chair of the CICM Second Part examination panel 2020-2023, CICM First Part examiner 2012-2019), CN (Fellow of the CICM, CICM First Part examiner 2017-2023, CICM First Part examination committee 2019-2023, CICM Supervisor of Training 2018-2023), DS (No conflicts of interest), PB (Employed by the CICM in the position of Information, Communication and Technology Manager).

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/ . The Creative Commons Public Domain Dedication waiver ( http://creativecommons.org/publicdomain/zero/1.0/ ) applies to the data made available in this article, unless otherwise stated in a credit line to the data.


About this article

Cite this article

Hoffman, K.R., Swanson, D., Lane, S. et al. The reliability of the College of Intensive Care Medicine of Australia and New Zealand “Hot Case” examination. BMC Med Educ 24 , 527 (2024). https://doi.org/10.1186/s12909-024-05516-w


Received : 21 February 2024

Accepted : 03 May 2024

Published : 11 May 2024

DOI : https://doi.org/10.1186/s12909-024-05516-w

Share this article

Anyone you share the following link with will be able to read this content:

Sorry, a shareable link is not currently available for this article.

Provided by the Springer Nature SharedIt content-sharing initiative

Keywords

  • Intensive care
  • Examination
  • Credentialling
  • Reliability
  • Generalisability theory

