What Is Statistical Analysis?

Statistical analysis is a technique for finding patterns in data and making inferences about those patterns to describe the variability in the results of a data set or an experiment.

In its simplest form, statistical analysis answers questions about:

  • Quantification — how big, small, tall, or wide is it?
  • Variability — how much does it grow, increase, or decline?
  • Confidence — how certain can we be about these measures of variability?

What Are the 2 Types of Statistical Analysis?

  • Descriptive Statistics:  Descriptive statistical analysis describes the quality of the data by summarizing large data sets into single measures. 
  • Inferential Statistics:  Inferential statistical analysis allows you to draw conclusions from your sample data set and make predictions about a population using statistical tests.

What’s the Purpose of Statistical Analysis?

Using statistical analysis, you can determine trends in the data by calculating your data set’s mean or median. You can also analyze the variation between different data points from the mean to get the standard deviation. Furthermore, to test the validity of your statistical analysis conclusions, you can use hypothesis testing techniques and compute a p-value, the likelihood that the observed variability could have occurred by chance.

Statistical Analysis Methods

There are two major types of statistical data analysis: descriptive and inferential. 

Descriptive Statistical Analysis

Descriptive statistical analysis describes the quality of the data by summarizing large data sets into single measures. 

Within the descriptive analysis branch, there are two main types: measures of central tendency (i.e., mean, median and mode) and measures of dispersion or variation (i.e., variance, standard deviation and range). 

For example, you can calculate the average exam results in a class using central tendency or, in particular, the mean. In that case, you’d sum all student results and divide by the number of tests. You can also calculate the data set’s spread by calculating the variance. To calculate the variance, subtract each exam result in the data set from the mean, square the answer, add everything together and divide by the number of tests.
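
To make this concrete, here is a minimal sketch of both calculations in plain Python; the exam scores are invented for illustration:

```python
# Hypothetical exam results for a class
scores = [72, 85, 90, 65, 78, 88, 95, 70]

# Mean: sum all student results and divide by the number of tests
mean = sum(scores) / len(scores)

# Variance: subtract the mean from each result, square it,
# add everything together and divide by the number of tests
variance = sum((s - mean) ** 2 for s in scores) / len(scores)

print(f"mean = {mean:.2f}, variance = {variance:.2f}")
```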

Inferential Statistics

On the other hand, inferential statistical analysis allows you to draw conclusions from your sample data set and make predictions about a population using statistical tests. 

There are two main types of inferential statistical analysis: hypothesis testing and regression analysis. We use hypothesis testing to test and validate assumptions in order to draw conclusions about a population from the sample data. Popular techniques include the Z-test, F-test, ANOVA and confidence intervals. Regression analysis, on the other hand, primarily estimates the relationship between a dependent variable and one or more independent variables. There are numerous types of regression analysis, but the most popular ones include linear and logistic regression.
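
As a hedged illustration of one such test, the sketch below runs a one-way ANOVA with SciPy on three invented samples; it demonstrates the technique in general, not any particular study design:

```python
from scipy import stats

# Hypothetical scores from three groups (e.g., three teaching methods)
group_a = [82, 85, 88, 75, 90]
group_b = [70, 68, 74, 72, 69]
group_c = [80, 79, 85, 83, 81]

# One-way ANOVA: do the group means differ more than chance would explain?
f_stat, p_value = stats.f_oneway(group_a, group_b, group_c)
print(f"F = {f_stat:.2f}, p = {p_value:.4f}")

# A small p-value (commonly below 0.05) suggests at least one group
# mean differs from the others
```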

Statistical Analysis Steps  

In the era of big data and data science, there is a rising demand for a more problem-driven approach. As a result, we must approach statistical analysis holistically. We may divide the entire process into five different and significant stages by using the well-known PPDAC model of statistics: Problem, Plan, Data, Analysis and Conclusion.

[Figure: the statistical cycle, a clockwise five-step circular chart of the PPDAC model running from Problem through Plan, Data, and Analysis to Conclusion.]

1. Problem

In the first stage, you define the problem you want to tackle and explore questions about the problem. 

2. Plan

Next is the planning phase. You can check whether data is available or if you need to collect data for your problem. You also determine what to measure and how to measure it. 

3. Data

The third stage involves data collection, understanding the data and checking its quality. 

4. Analysis

Statistical data analysis is the fourth stage. Here you process and explore the data with the help of tables, graphs and other data visualizations. You also develop and scrutinize your hypothesis in this stage of analysis. 

5. Conclusion

The final step involves interpretations and conclusions from your analysis. It also covers generating new ideas for the next iteration. Thus, statistical analysis is not a one-time event but an iterative process.

Statistical Analysis Uses

Statistical analysis is useful for research and decision making because it allows us to understand the world around us and draw conclusions by testing our assumptions. Statistical analysis is important for various applications, including:

  • Statistical quality control and analysis in product development 
  • Clinical trials
  • Customer satisfaction surveys and customer experience research 
  • Marketing operations management
  • Process improvement and optimization
  • Training needs 

Benefits of Statistical Analysis

Here are some of the reasons why statistical analysis is widespread in many applications and why it’s necessary:

Understand Data

Statistical analysis gives you a better understanding of the data and what they mean. These types of analyses provide information that would otherwise be difficult to obtain by merely looking at the numbers without considering their relationship.

Find Causal Relationships

Statistical analysis can help you investigate causal relationships and pin down exactly what an experiment shows, such as when you’re looking for a relationship between two variables.

Make Data-Informed Decisions

Businesses are constantly looking to find ways to improve their services and products. Statistical analysis allows you to make data-informed decisions about your business or future actions by helping you identify trends in your data, whether positive or negative. 

Determine Probability

Statistical analysis is an approach to understanding how the probability of certain events affects the outcome of an experiment. It helps scientists and engineers decide how much confidence they can have in the results of their research, how to interpret their data and what questions they can feasibly answer.

What Are the Risks of Statistical Analysis?

Statistical analysis can be valuable and effective, but it’s an imperfect approach. Even if the analyst or researcher performs a thorough statistical analysis, there may still be known or unknown problems that can affect the results. Therefore, statistical analysis is not a one-size-fits-all process. If you want to get good results, you need to know what you’re doing, and it can take a lot of time to figure out which type of statistical analysis will work best for your situation.

Thus, you should remember that conclusions drawn from statistical analysis don’t always guarantee correct results. This can be dangerous when making business decisions. In marketing, for example, we may come to the wrong conclusion about a product. Therefore, the conclusions we draw from statistical data analysis are often approximated; testing for all factors affecting an observation is impossible.


What Is Statistical Analysis? Definition, Types, and Jobs

Statistical analytics is a high-demand career with great benefits. Learn how you can apply your statistical and data science skills to this growing field.

Statistical analysis is the process of collecting large volumes of data and then using statistics and other data analysis techniques to identify trends, patterns, and insights. If you're a whiz at data and statistics, statistical analysis could be a great career match for you. The rise of big data, machine learning, and technology in our society has created a high demand for statistical analysts, and it's an exciting time to develop these skills and find a job you love. In this article, you'll learn more about statistical analysis, including its definition, its different types, how it's done, and jobs that use it. At the end, you'll also explore suggested cost-effective courses that can help you gain greater knowledge of both statistical and data analytics.

Statistical analysis definition

Statistical analysis is the process of collecting and analyzing large volumes of data in order to identify trends and develop valuable insights.

In the professional world, statistical analysts take raw data and find correlations between variables to reveal patterns and trends to relevant stakeholders. Working in a wide range of different fields, statistical analysts are responsible for new scientific discoveries, improving the health of our communities, and guiding business decisions.

Types of statistical analysis

There are two main types of statistical analysis: descriptive and inferential. As a statistical analyst, you'll likely use both types in your daily work to ensure that data is both clearly communicated to others and that it's used effectively to develop actionable insights. At a glance, here's what you need to know about both types of statistical analysis:

Descriptive statistical analysis

Descriptive statistics summarizes the information within a data set without drawing conclusions about its contents. For example, if a business gave you a book of its expenses and you summarized the percentage of money it spent on different categories of items, then you would be performing a form of descriptive statistics.

When performing descriptive statistics, you will often use data visualization to present information in the form of graphs, tables, and charts to clearly convey it to others in an understandable format. Typically, leaders in a company or organization will then use this data to guide their decision making going forward.
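
For instance, a minimal pandas sketch of the expense summary described above might look like this; the categories and amounts are invented:

```python
import pandas as pd

# Hypothetical expense records for a small business
expenses = pd.DataFrame({
    "category": ["rent", "salaries", "marketing", "salaries", "rent", "supplies"],
    "amount": [2000, 5000, 1200, 4800, 2000, 600],
})

# Descriptive summary: share of total spending per category
totals = expenses.groupby("category")["amount"].sum()
print((totals / totals.sum() * 100).round(1))
```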

Inferential statistical analysis

Inferential statistics takes the results of descriptive statistics one step further by drawing conclusions from the data and then making recommendations. For example, instead of only summarizing the business's expenses, you might go on to recommend in which areas to reduce spending and suggest an alternative budget.

Inferential statistical analysis is often used by businesses to inform company decisions and in scientific research to find new relationships between variables. 

Statistical analyst duties

Statistical analysts focus on making large sets of data understandable to a more general audience. In effect, you'll use your math and data skills to translate big numbers into easily digestible graphs, charts, and summaries for key decision makers within businesses and other organizations. Typical job responsibilities of statistical analysts include:

  • Extracting and organizing large sets of raw data
  • Determining which data is relevant and which should be excluded
  • Developing new data collection strategies
  • Meeting with clients and professionals to review data analysis plans
  • Creating data reports and easily understandable representations of the data
  • Presenting data
  • Interpreting data results
  • Creating recommendations for a company or other organizations

Your job responsibilities will differ depending on whether you work for a federal agency, a private company, or another business sector. Many industries need statistical analysts, so exploring your passions and seeing how you can best apply your data skills can be exciting. 

Statistical analysis skills

Because most of your job responsibilities will likely focus on data and statistical analysis, mathematical skills are crucial. High-level math skills can help you fact-check your work and create strategies to analyze the data, even if you use software for many computations. When honing your mathematical skills, focusing on statistics—specifically statistics with large data sets—can help set you apart when searching for job opportunities. Competency with computer software and learning new platforms will also help you excel in more advanced positions and put you in high demand.

Data analytics, problem-solving, and critical thinking are vital skills to help you determine a data set’s true meaning and bigger picture. Often, large data sets may not be what they appear on the surface. To get to the bottom of things, you'll need to think critically about factors that may influence the data set, create an informed analysis plan, and parse out bias to identify insightful trends. 

To excel in the workplace, you'll need to hone your database management skills, keep up to date on statistical methodology, and continually improve your research skills. These skills take time to build, so starting with introductory courses and having patience while you build skills is important.

Common software used in statistical analytics jobs

Statistical analysis often involves computations using big data that is too large to compute by hand. The good news is that many kinds of statistical software have been developed to help analyze data effectively and efficiently. Gaining mastery over this statistical software can make you look attractive to employers and allow you to work on more complex projects. 

Statistical software is beneficial for both descriptive and inferential statistics. You can use it to generate charts and graphs or perform computations to draw conclusions and inferences from the data. The specific statistical software you use will depend on your employer.

Pathways to a career in statistical analytics

Many paths to becoming a statistical analyst exist, but most jobs in this field require a bachelor’s degree. Employers will typically look for a degree in an area that focuses on math, computer science, statistics, or data science to ensure you have the skills needed for the job. If your bachelor’s degree is in another field, gaining experience through entry-level data entry jobs can help get your foot in the door. Many employers look for work experience in related careers such as being a research assistant, data manager, or intern in the field.

Earning a graduate degree in statistical analytics or a related field can also help you stand out on your resume and demonstrate a deep knowledge of the skills needed to perform the job successfully. Generally, employers focus more on making sure you have the mathematical and data analysis skills required to perform complex statistical analytics on their data. After all, you will be helping them to make decisions, so they want to feel confident in your ability to advise them in the right direction.

How much do statistical analytics professionals earn? 

Statistical analysts earn well above the national average and enjoy many benefits on the job. There are many careers utilizing statistical analytics, so comparing salaries can help determine if the job benefits align with your expectations.

Actuary

Median annual salary: $113,990

Job outlook for 2022 to 2032: 23% [1]

Data scientist

Median annual salary: $103,500

Job outlook for 2022 to 2032: 35% [2]

Financial risk specialist

Median annual salary: $102,120

Job outlook for 2022 to 2032: 8% [3]

Investment analyst

Median annual salary: $95,080

Operational research analyst

Median annual salary: $85,720

Job outlook for 2022 to 2032: 23% [4]

Market research analyst

Median annual salary: $68,230

Job outlook for 2022 to 2032: 13% [5]

Statistician

Median annual salary: $99,960

Job outlook for 2022 to 2032: 30% [6]

Statistical analysis job outlook

Jobs that use statistical analysis have a positive outlook for the foreseeable future.

According to the US Bureau of Labor Statistics (BLS), the number of jobs for mathematicians and statisticians is projected to grow by 30 percent between 2022 and 2032, adding an average of 3,500 new jobs each year throughout the decade [6].

As we create more ways to collect data worldwide, there will be an increased need for people able to analyze and make sense of the data.

Ready to take the next step in your career?

Statistical analytics could be an excellent career match for those with an affinity for math, data, and problem-solving. Here are some popular courses to consider as you prepare for a career in statistical analysis:

Learn fundamental processes and tools with Google's Data Analytics Professional Certificate. You'll learn how to process and analyze data, use key analysis tools, apply R programming, and create visualizations that can inform key business decisions.

Grow your comfort using R with Duke University's Data Analysis with R Specialization. Statistical analysts commonly use R for testing, modeling, and analysis. Here, you'll learn and practice those processes.

Apply statistical analysis with Rice University's Business Statistics and Analysis Specialization. Contextualize your technical and analytical skills by using them to solve business problems and complete a hands-on Capstone Project to demonstrate your knowledge.

Article sources

US Bureau of Labor Statistics. "Occupational Outlook Handbook: Actuaries, https://www.bls.gov/ooh/math/actuaries.htm." Accessed November 21, 2023.

US Bureau of Labor Statistics. "Occupational Outlook Handbook: Data Scientists, https://www.bls.gov/ooh/math/data-scientists.htm." Accessed November 21, 2023.

US Bureau of Labor Statistics. "Occupational Outlook Handbook: Financial Analysts, https://www.bls.gov/ooh/business-and-financial/financial-analysts.htm." Accessed November 21, 2023.

US Bureau of Labor Statistics. "Occupational Outlook Handbook: Operations Research Analysts, https://www.bls.gov/ooh/math/operations-research-analysts.htm." Accessed November 21, 2023.

US Bureau of Labor Statistics. "Occupational Outlook Handbook: Market Research Analyst, https://www.bls.gov/ooh/business-and-financial/market-research-analysts.htm." Accessed November 21, 2023.

US Bureau of Labor Statistics. "Occupational Outlook Handbook: Mathematicians and Statisticians, https://www.bls.gov/ooh/math/mathematicians-and-statisticians.htm." Accessed November 21, 2023.


Statistical analysis is a systematic method of gathering, analyzing, interpreting, presenting, and deriving conclusions from data. It employs statistical tools to find patterns, trends, and links within datasets to facilitate informed decision-making. Data collection, description, exploratory data analysis (EDA), inferential statistics, statistical modeling, data visualization, and interpretation are all important aspects of statistical analysis.

Used in quantitative research to gather and analyze data, statistical data analysis provides a more comprehensive view of operational landscapes and gives organizations the insights they need to make strategic, evidence-based decisions. Here’s what you need to know.

How Does Statistical Analysis Work?

The strategic use of statistical analysis procedures helps organizations get insights from data to make educated decisions. Statistical analytic approaches, which cover everything from data extraction to the creation of actionable recommendations, provide a systematic way to comprehend, interpret, and use large datasets. By navigating these complex processes, businesses uncover hidden patterns in their data and extract important insights that can serve as a compass for strategic decision-making.

Extracting and Organizing Raw Data

Extracting and organizing raw data entails gathering information from a variety of sources, combining datasets, and assuring data quality through rigorous cleaning. In healthcare, for example, this method may comprise combining patient information from several systems to assess patterns in illness prevalence and treatment outcomes.

Identifying Essential Data

Identifying key data—and excluding irrelevant data—necessitates a thorough analysis of the dataset. Analysts use variable selection strategies to filter datasets to emphasize characteristics most relevant to the objectives, resulting in more focused and meaningful analysis.

Developing Innovative Collection Strategies 

Innovative data collection procedures include everything from creating successful surveys and organizing experiments to mining data from a wide range of sources. Researchers in environmental studies might use remote sensing technology to obtain data on how plants and land cover change over time. Modern approaches such as satellite photography and machine learning algorithms help scientists improve the depth and precision of data collection, opening the way for more nuanced analyses and informed decision-making.

Collaborating With Experts

Collaborating with clients and specialists to review data analysis tactics helps align analytical approaches with organizational objectives. In finance, for example, engaging with investment professionals helps ensure that data analysis captures market trends and supports educated investment decisions. Analysts may modify their tactics by incorporating comments from domain experts, making the ensuing study more relevant and applicable to the given sector or subject.

Creating Reports and Visualizations 

Creating data reports and visualizations entails generating extensive summaries and graphical representations for clarity. In e-commerce, reports might indicate user purchase trends using visualizations like heatmaps to highlight popular goods. Businesses that display data in a visually accessible format can rapidly analyze patterns and make data-driven choices that optimize product offers and improve the entire consumer experience.

Analyzing Data Findings 

This step entails using statistical tools to discover patterns, correlations, and insights in the dataset. In manufacturing, data analysis can identify connections between production factors and product faults, leading process improvement efforts. Engineers may discover and resolve fundamental causes using statistical tools and methodologies, resulting in higher product quality and operational efficiency.

Acting on the Data

Synthesizing findings from data analysis leads to the development of organizational recommendations. In the hospitality business, for example, data analysis might indicate trends in client preferences, resulting in strategic suggestions for tailored services and marketing efforts. Continuous improvement ideas based on analytical results help the firm adapt to a changing market scenario and compete more effectively.

The Importance of Statistical Analysis

The importance of statistical analysis goes far beyond data processing; it is the cornerstone in giving vital insights required for strategic decision-making, especially in the dynamic area of presenting new items to the market. Statistical analysis, which meticulously examines data, not only reveals trends and patterns but also provides a full insight into customer behavior, preferences, and market dynamics.

This abundance of information is a guiding force for enterprises, allowing them to make data-driven decisions that optimize product launches, improve market positioning, and ultimately drive success in an ever-changing business landscape.

2 Types of Statistical Analysis

There are two forms of statistical analysis, descriptive statistics and statistical inference, both of which play an important role in guaranteeing data correctness and communicability using various analytical approaches.

By combining the capabilities of descriptive statistics with statistical inference, analysts can completely characterize the data and draw relevant insights that extend beyond the observed sample, guaranteeing conclusions that are resilient, trustworthy, and applicable to a larger context. This dual method improves the overall dependability of statistical analysis, making it an effective tool for obtaining important information from a variety of datasets.

Descriptive Statistics

This type of statistical analysis is all about visuals. Raw data doesn’t mean much on its own, and the sheer quantity can be overwhelming to digest. Descriptive statistical analysis focuses on creating a basic visual description of the data or turning information into graphs, charts, and other visuals that help people understand the meaning of the values in the data set. Descriptive analysis isn’t about explaining or drawing conclusions, though. It is only the practice of digesting and summarizing raw data to better understand it.
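
A minimal sketch of that practice, assuming invented monthly figures, turns a handful of raw numbers into a simple chart with matplotlib:

```python
import matplotlib.pyplot as plt

# Hypothetical monthly sales figures
months = ["Jan", "Feb", "Mar", "Apr", "May", "Jun"]
sales = [120, 135, 128, 160, 150, 175]

# A basic visual description of the raw data
plt.bar(months, sales)
plt.title("Monthly sales (units)")
plt.xlabel("Month")
plt.ylabel("Units sold")
plt.tight_layout()
plt.show()
```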

Statistical Inference

Inferential statistics involves more upfront hypotheses and follow-up explanations than descriptive statistics. In this type of statistical analysis, you are less focused on the entire collection of raw data and instead take a sample and test your hypothesis or first estimation. From this sample and the results of your experiment, you can use inferential statistics to infer conclusions about the rest of the data set.

6 Benefits of Statistical Analysis

Statistical analysis enables a methodical and data-driven approach to decision-making and helps organizations maximize the value of their data, resulting in increased efficiency, informed decision-making, and innovation. Here are six of the most important benefits:

  • Competitive Analysis: Statistical analysis illuminates your company’s objective value—knowing common metrics like sales revenue and net profit margin allows you to compare your performance to competitors.
  • True Sales Visibility: The sales team says it is having a good week, and the numbers look good, but how can you accurately measure the impact on sales numbers? Statistical data analysis measures sales data and associates it with specific timeframes, products, and individual salespeople, which gives better visibility of marketing and sales successes.
  • Predictive Analytics: Predictive analytics allows you to use past numerical data to predict future outcomes and areas where your team should make adjustments to improve performance.
  • Risk Assessment and Management: Statistical tools help organizations analyze and manage risks more efficiently. Organizations may use historical data to identify possible hazards, anticipate future outcomes, and apply risk mitigation methods, lowering uncertainty and improving overall risk management.
  • Resource Optimization: Statistical analysis identifies areas of inefficiency or underutilization, improving personnel management, budget allocation, and resource deployment and leading to increased operational efficiency and cost savings.
  • Informed Decision Making: Statistical analysis allows businesses to base judgments on factual data rather than intuition. Data analysis allows firms to uncover patterns, trends, and correlations, resulting in better informed and strategic decision-making processes.

5-Step Statistical Analysis Process

Here are five essential steps for executing a thorough statistical analysis. By carefully following these stages, analysts may undertake a complete and rigorous statistical analysis, creating the framework for informed decision-making and providing actionable insights for both individuals and businesses.

Step 1: Data Identification and Description

Identify and clarify the features of the data to be analyzed. Understanding the nature of the dataset is critical in building the framework for a thorough statistical analysis.

Step 2: Establishing the Population Connection

Make progress toward creating a meaningful relationship between the studied data and the larger sample population from which it is drawn. This stage entails contextualizing the data within the greater framework of the population it represents, increasing the analysis’s relevance and application.

Step 3: Model Construction and Synthesis

Create a model that accurately captures and synthesizes the complex relationship between the population under study and the unique dataset. Creating a well-defined model is essential for analyzing data and generating useful insights.

Step 4: Model Validity Verification

Subject the model to thorough testing and inspection to ensure its validity. This stage guarantees that the model properly represents the population’s underlying dynamics, which improves the trustworthiness of subsequent analysis and results.

Step 5: Predictive Analysis of Future Trends

Using predictive analytics tools, you may take your analysis to the next level. This final stage forecasts future trends and occurrences based on the developed model, providing significant insights into probable developments and assisting with proactive decision-making.

5 Statistical Analysis Methods

There are five common statistical analysis methods, each adapted to distinct data goals and guiding rapid decision-making. The approach you choose is determined by the nature of your dataset and the goals you want to achieve.

Mean

The mean—the average, or center point, of the dataset—is computed by adding all the values and dividing by the number of observations. In real-world situations, the mean is used to calculate a representative value that captures the usual magnitude of a group of observations. For example, in educational evaluations, the mean score of a class provides educators with a concise measure of overall performance, allowing them to determine the general level of comprehension.

Standard Deviation

The standard deviation measures the degree of variance or dispersion within a dataset. By demonstrating how far individual values differ from the mean, it provides information about the dataset’s general dispersion. In practice, the standard deviation is used in financial analysis to analyze the volatility of stock prices. A higher standard deviation indicates greater price volatility, which helps investors evaluate and manage risks associated with various investment opportunities.
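
As a sketch of that use case, the snippet below compares the volatility of two hypothetical return series with Python’s built-in statistics module; the numbers are invented:

```python
import statistics

# Hypothetical daily returns (%) for two stocks
stock_a = [0.2, -0.1, 0.3, 0.1, -0.2, 0.2]   # relatively stable
stock_b = [2.5, -3.1, 4.0, -2.2, 3.6, -1.8]  # far more volatile

# Sample standard deviation as a simple volatility measure
print(f"Stock A volatility: {statistics.stdev(stock_a):.2f}")
print(f"Stock B volatility: {statistics.stdev(stock_b):.2f}")
```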

Regression

Regression analysis seeks to understand and predict connections between variables. This statistical approach is used in a variety of disciplines, including marketing, where it helps anticipate sales based on advertising spend. For example, a corporation may use regression analysis to assess how changes in advertising spending affect product sales, allowing for more efficient resource allocation for future marketing efforts.
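
To make the advertising example concrete, here is a small sketch using scipy.stats.linregress on invented spend-versus-sales figures:

```python
from scipy import stats

# Hypothetical monthly ad spend (in $1,000s) and units sold
ad_spend = [10, 15, 20, 25, 30, 35]
sales = [120, 150, 185, 210, 250, 270]

# Simple linear regression: sales = intercept + slope * ad_spend
result = stats.linregress(ad_spend, sales)
print(f"slope = {result.slope:.2f}, r^2 = {result.rvalue ** 2:.3f}")

# Use the fitted line to anticipate sales at a planned $40k spend
print(f"predicted sales: {result.intercept + result.slope * 40:.0f}")
```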

Hypothesis Testing

Hypothesis testing is used to determine the validity of a claim or hypothesis regarding a population parameter. In medical research, hypothesis testing may be used to compare the efficacy of a novel medicine against a traditional treatment. Researchers develop a null hypothesis, implying that there is no difference between treatments, and then use statistical tests to assess if there is sufficient evidence to reject the null hypothesis in favor of the alternative.
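
A minimal version of that comparison, assuming two invented samples of patient outcomes, can be run as a two-sample t-test with SciPy:

```python
from scipy import stats

# Hypothetical recovery scores under the new and standard treatments
new_treatment = [14, 15, 13, 16, 14, 15, 17, 13]
standard = [12, 13, 11, 12, 14, 12, 13, 11]

# Null hypothesis: the two treatment means do not differ
t_stat, p_value = stats.ttest_ind(new_treatment, standard)
print(f"t = {t_stat:.2f}, p = {p_value:.4f}")

# If p falls below the chosen significance level (e.g., 0.05),
# reject the null hypothesis in favor of the alternative
```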

Sample Size Determination

Choosing an adequate sample size is critical for producing trustworthy and relevant results in a study. In clinical studies, for example, researchers determine the sample size to ensure that the study has the statistical power to detect differences in treatment results. A well-determined sample size strikes a compromise between the requirement for precision and practical factors, thereby strengthening the study’s results and helping evidence-based decision-making.
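
One common textbook approach, sketched below with assumed effect-size and variability figures, uses the normal approximation for comparing two means, n = 2((z_(1-α/2) + z_power)·σ/Δ)² per group:

```python
import math
from scipy.stats import norm

alpha = 0.05   # significance level
power = 0.80   # desired statistical power (1 - beta)
sigma = 10.0   # assumed standard deviation of the outcome
delta = 5.0    # smallest treatment difference worth detecting

# n per group = 2 * ((z_{1-alpha/2} + z_{power}) * sigma / delta)^2
z_alpha = norm.ppf(1 - alpha / 2)
z_power = norm.ppf(power)
n_per_group = 2 * ((z_alpha + z_power) * sigma / delta) ** 2
print(f"required sample size per group: {math.ceil(n_per_group)}")  # ~63
```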

Bottom Line: Identify Patterns and Trends With Statistical Analysis

Statistical analysis can provide organizations with insights into customer behavior, market dynamics, and operational efficiency. This information simplifies decision-making and prepares organizations to adapt and prosper in changing situations. Organizations that use top-tier statistical analysis tools can leverage the power of data, uncover trends, and stay at the forefront of innovation, assuring a competitive advantage in today’s ever-changing technological world.



Background & Summary

Recent policy changes in funding agencies and academic journals have increased data sharing among researchers and between researchers and the public. Data sharing advances science and provides the transparency necessary for evaluating, replicating, and verifying results. However, many data-sharing policies do not explain what constitutes an appropriate dataset for archiving or how to determine the value of datasets to secondary users 1 , 2 , 3 . Questions about how to allocate data-sharing resources efficiently and responsibly have gone unanswered 4 , 5 , 6 . For instance, data-sharing policies recognize that not all data should be curated and preserved, but they do not articulate metrics or guidelines for determining what data are most worthy of investment.

Despite the potential for innovation and advancement that data sharing holds, the best strategies to prioritize datasets for preparation and archiving are often unclear. Some datasets are likely to have more downstream potential than others, and data curation policies and workflows should prioritize high-value data instead of being one-size-fits-all. Though prior research in library and information science has shown that the “analytic potential” of a dataset is key to its reuse value 7 , work is needed to implement conceptual data reuse frameworks 8 , 9 , 10 , 11 , 12 , 13 , 14 . In addition, publishers and data archives need guidance to develop metrics and evaluation strategies to assess the impact of datasets.

Several existing resources have been compiled to study the relationship between the reuse of scholarly products, such as datasets (Table  1 ); however, none of these resources include explicit information on how curation processes are applied to data to increase their value, maximize their accessibility, and ensure their long-term preservation. The CCex (Curation Costs Exchange) provides models of curation services along with cost-related datasets shared by contributors but does not make explicit connections between them or include reuse information 15 . Analyses on platforms such as DataCite 16 have focused on metadata completeness and record usage, but have not included related curation-level information. Analyses of GenBank 17 and FigShare 18 , 19 citation networks do not include curation information. Related studies of Github repository reuse 20 and Softcite software citation 21 reveal significant factors that impact the reuse of secondary research products but do not focus on research data. RD-Switchboard 22 and DSKG 23 are scholarly knowledge graphs linking research data to articles, patents, and grants, but largely omit social science research data and do not include curation-level factors. To our knowledge, other studies of curation work in organizations similar to ICPSR – such as GESIS 24 , Dataverse 25 , and DANS 26 – have not made their underlying data available for analysis.

This paper describes a dataset 27 compiled for the MICA project (Measuring the Impact of Curation Actions) led by investigators at ICPSR, a large social science data archive at the University of Michigan. The dataset was originally developed to study the impacts of data curation and archiving on data reuse. The MICA dataset has supported several previous publications investigating the intensity of data curation actions 28 , the relationship between data curation actions and data reuse 29 , and the structures of research communities in a data citation network 30 . Collectively, these studies help explain the return on various types of curatorial investments. The dataset that we introduce in this paper, which we refer to as the MICA dataset, has the potential to address research questions in the areas of science (e.g., knowledge production), library and information science (e.g., scholarly communication), and data archiving (e.g., reproducible workflows).

We constructed the MICA dataset 27 using records available at ICPSR, a large social science data archive at the University of Michigan. Data set creation involved: collecting and enriching metadata for articles indexed in the ICPSR Bibliography of Data-related Literature against the Dimensions AI bibliometric database; gathering usage statistics for studies from ICPSR’s administrative database; processing data curation work logs from ICPSR’s project tracking platform, Jira; and linking data in social science studies and series to citing analysis papers (Fig.  1 ).

Figure 1: Steps to prepare the MICA dataset for analysis. External sources are red, primary internal sources are blue, and internal linked sources are green.

Enrich paper metadata

The ICPSR Bibliography of Data-related Literature is a growing database of literature in which data from ICPSR studies have been used. Its creation was funded by the National Science Foundation (Award 9977984), and for the past 20 years it has been supported by ICPSR membership and multiple US federally-funded and foundation-funded topical archives at ICPSR. The Bibliography was originally launched in the year 2000 to aid in data discovery by providing a searchable database linking publications to the study data used in them. The Bibliography collects the universe of output based on the data shared in each study, which is made available through each ICPSR study’s webpage. The Bibliography contains both peer-reviewed and grey literature, which provides evidence for measuring the impact of research data. For an item to be included in the ICPSR Bibliography, it must contain an analysis of data archived by ICPSR or contain a discussion or critique of the data collection process, study design, or methodology 31 . The Bibliography is manually curated by a team of librarians and information specialists at ICPSR who enter and validate entries. Some publications are supplied to the Bibliography by data depositors, and some citations are submitted to the Bibliography by authors who abide by ICPSR’s terms of use requiring them to submit citations to works in which they analyzed data retrieved from ICPSR. Most of the Bibliography is populated by Bibliography team members, who create custom queries for ICPSR studies performed across numerous sources, including Google Scholar, ProQuest, SSRN, and others. Each record in the Bibliography is one publication that has used one or more ICPSR studies. The version we used was captured on 2021-11-16 and included 94,755 publications.

To expand the coverage of the ICPSR Bibliography, we searched exhaustively for all ICPSR study names, unique numbers assigned to ICPSR studies, and DOIs 32 using a full-text index available through the Dimensions AI database 33 . We accessed Dimensions through a license agreement with the University of Michigan. ICPSR Bibliography librarians and information specialists manually reviewed and validated new entries that matched one or more search criteria. We then used Dimensions to gather enriched metadata and full-text links for items in the Bibliography with DOIs. We matched 43% of the items in the Bibliography to enriched Dimensions metadata including abstracts, field of research codes, concepts, and authors’ institutional information; we also obtained links to full text for 16% of Bibliography items. Based on licensing agreements, we included Dimensions identifiers and links to full text so that users with valid publisher and database access can construct an enriched publication dataset.

Gather study usage data

ICPSR maintains a relational administrative database, DBInfo, that organizes study-level metadata and information on data reuse across separate tables. Studies at ICPSR consist of one or more files collected at a single time or for a single purpose; studies in which the same variables are observed over time are grouped into series. Each study at ICPSR is assigned a DOI, and its metadata are stored in DBInfo. Study metadata follows the Data Documentation Initiative (DDI) Codebook 2.5 standard. DDI elements included in our dataset are title, ICPSR study identification number, DOI, authoring entities, description (abstract), funding agencies, subject terms assigned to the study during curation, and geographic coverage. We also created variables based on DDI elements: total variable count, the presence of survey question text in the metadata, the number of author entities, and whether an author entity was an institution. We gathered metadata for ICPSR’s 10,605 unrestricted public-use studies available as of 2021-11-16 ( https://www.icpsr.umich.edu/web/pages/membership/or/metadata/oai.html ).

To link study usage data with study-level metadata records, we joined study metadata from DBinfo on study usage information, which included total study downloads (data and documentation), individual data file downloads, and cumulative citations from the ICPSR Bibliography. We also gathered descriptive metadata for each study and its variables, which allowed us to summarize and append recoded fields onto the study-level metadata such as curation level, number and type of principal investigators, total variable count, and binary variables indicating whether the study data were made available for online analysis, whether survey question text was made searchable online, and whether the study variables were indexed for search. These characteristics describe aspects of the discoverability of the data to compare with other characteristics of the study. We used the study and series numbers included in the ICPSR Bibliography as unique identifiers to link papers to metadata and analyze the community structure of dataset co-citations in the ICPSR Bibliography 32 .

Process curation work logs

Researchers deposit data at ICPSR for curation and long-term preservation. Between 2016 and 2020, more than 3,000 research studies were deposited with ICPSR. Since 2017, ICPSR has organized curation work into a central unit that provides several levels of curation differing in the intensity and complexity of the data enhancement they involve. While the levels of curation are standardized as to effort (level one = least effort, level three = most effort), the specific curatorial actions undertaken for each dataset vary. The specific curation actions are captured in Jira, a work tracking program, which data curators at ICPSR use to collaborate and communicate their progress through tickets. We obtained access to a corpus of 669 completed Jira tickets corresponding to the curation of 566 unique studies between February 2017 and December 2019 28 .

To process the tickets, we focused only on their work log portions, which contained free text descriptions of work that data curators had performed on a deposited study, along with the curators’ identifiers, and timestamps. To protect the confidentiality of the data curators and the processing steps they performed, we collaborated with ICPSR’s curation unit to propose a classification scheme, which we used to train a Naive Bayes classifier and label curation actions in each work log sentence. The eight curation action labels we proposed 28 were: (1) initial review and planning, (2) data transformation, (3) metadata, (4) documentation, (5) quality checks, (6) communication, (7) other, and (8) non-curation work. We note that these categories of curation work are very specific to the curatorial processes and types of data stored at ICPSR, and may not match the curation activities at other repositories. After applying the classifier to the work log sentences, we obtained summary-level curation actions for a subset of all ICPSR studies (5%), along with the total number of hours spent on data curation for each study, and the proportion of time associated with each action during curation.
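
The classifier itself is not reproduced here, but a minimal sketch of this kind of sentence-labeling pipeline, built with scikit-learn on invented work-log sentences rather than the actual MICA training data, might look like the following:

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Invented examples for illustration only; the real training sentences
# and the eight-label scheme are described in the text above
sentences = [
    "Reviewed deposited files and planned the curation steps",
    "Recoded missing values and converted files to SPSS format",
    "Checked variable labels and frequencies for errors",
    "Transformed date variables into a standard format",
]
labels = [
    "initial review and planning",
    "data transformation",
    "quality checks",
    "data transformation",
]

# Bag-of-words features feeding a Naive Bayes text classifier
model = make_pipeline(CountVectorizer(), MultinomialNB())
model.fit(sentences, labels)

print(model.predict(["Converted raw data files to Stata format"]))
```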

Data Records

The MICA dataset 27 connects records for each of ICPSR’s archived research studies to the research publications that use them and related curation activities available for a subset of studies (Fig.  2 ). Each of the three tables published in the dataset is available as a study archived at ICPSR. The data tables are distributed as statistical files available for use in SAS, SPSS, Stata, and R as well as delimited and ASCII text files. The dataset is organized around studies and papers as primary entities. The studies table lists ICPSR studies, their metadata attributes, and usage information; the papers table was constructed using the ICPSR Bibliography and Dimensions database; and the curation logs table summarizes the data curation steps performed on a subset of ICPSR studies.

Studies (“ICPSR_STUDIES”): 10,605 social science research datasets available through ICPSR up to 2021-11-16 with variables for ICPSR study number, digital object identifier, study name, series number, series title, authoring entities, full-text description, release date, funding agency, geographic coverage, subject terms, topical archive, curation level, single principal investigator (PI), institutional PI, the total number of PIs, total variables in data files, question text availability, study variable indexing, level of restriction, total unique users downloading study data files and codebooks, total unique users downloading data only, and total unique papers citing data through November 2021. Studies map to the papers and curation logs table through ICPSR study numbers as “STUDY”. However, not every study in this table will have records in the papers and curation logs tables.

Papers (“ICPSR_PAPERS”): 94,755 publications collected from 2000-08-11 to 2021-11-16 in the ICPSR Bibliography and enriched with metadata from the Dimensions database with variables for paper number, identifier, title, authors, publication venue, item type, publication date, input date, ICPSR series numbers used in the paper, ICPSR study numbers used in the paper, the Dimension identifier, and the Dimensions link to the publication’s full text. Papers map to the studies table through ICPSR study numbers in the “STUDY_NUMS” field. Each record represents a single publication, and because a researcher can use multiple datasets when creating a publication, each record may list multiple studies or series.

Curation logs (“ICPSR_CURATION_LOGS”): 649 curation logs for 563 ICPSR studies (although most studies in the subset had one curation log, some studies were associated with multiple logs, with a maximum of 10) curated between February 2017 and December 2019 with variables for study number, action labels assigned to work description sentences using a classifier trained on ICPSR curation logs, hours of work associated with a single log entry, and total hours of work logged for the curation ticket. Curation logs map to the study and paper tables through ICPSR study numbers as “STUDY”. Each record represents a single logged action, and future users may wish to aggregate actions to the study level before joining tables.

Figure 2: Entity-relation diagram.

Technical Validation

We report on the reliability of the dataset’s metadata in the following subsections. To support future reuse of the dataset, curation services provided through ICPSR improved data quality by checking for missing values, adding variable labels, and creating a codebook.

All 10,605 studies available through ICPSR have a DOI and a full-text description summarizing what the study is about, the purpose of the study, the main topics covered, and the questions the PIs attempted to answer when they conducted the study. Personal names (i.e., principal investigators) and organizational names (i.e., funding agencies) are standardized against an authority list maintained by ICPSR; geographic names and subject terms are also standardized and hierarchically indexed in the ICPSR Thesaurus 34 . Many of ICPSR’s studies (63%) are in a series and are distributed through the ICPSR General Archive (56%), a non-topical archive that accepts any social or behavioral science data. While study data have been available through ICPSR since 1962, the earliest digital release date recorded for a study was 1984-03-18, when ICPSR’s database was first employed, and the most recent date is 2021-10-28 when the dataset was collected.

Curation level information was recorded starting in 2017 and is available for 1,125 studies (11%); approximately 80% of studies with assigned curation levels received curation services, equally distributed between Levels 1 (least intensive), 2 (moderately intensive), and 3 (most intensive) (Fig. 3). Detailed descriptions of ICPSR’s curation levels are available online 35 . Additional metadata are available for a subset of 421 studies (4%), including information about whether the study has a single PI, has an institutional PI, the total number of PIs involved, and the total variables recorded, as well as whether the study is available for online analysis, has searchable question text, has variables that are indexed for search, contains one or more restricted files, or is completely restricted. We provided additional metadata for this subset of ICPSR studies because they were released within the past five years and detailed curation and usage information were available for them. Usage statistics including total downloads and data file downloads are available for this subset of studies as well; citation statistics are available for 8,030 studies (76%). Most ICPSR studies have fewer than 500 users, as indicated by total downloads, or citations (Fig. 4).

Figure 3: ICPSR study curation levels.

Figure 4: ICPSR study usage.

A subset of 43,102 publications (45%) available in the ICPSR Bibliography had a DOI. Author metadata were entered as free text, meaning that variations may exist and require additional normalization and pre-processing prior to analysis. While author information is standardized for each publication, individual names may appear in different sort orders (e.g., “Earls, Felton J.” and “Stephen W. Raudenbush”). Most of the items in the ICPSR Bibliography as of 2021-11-16 were journal articles (59%), reports (14%), conference presentations (9%), or theses (8%) (Fig. 5). The number of publications collected in the Bibliography has increased each decade since the inception of ICPSR in 1962 (Fig. 6). Most ICPSR studies (76%) have one or more citations in a publication.

Figure 5: ICPSR Bibliography citation types.

Figure 6: ICPSR citations by decade.

Usage Notes

The dataset consists of three tables that can be joined using the “STUDY” key as shown in Fig. 2. The “ICPSR_PAPERS” table contains one row per paper with one or more cited studies in the “STUDY_NUMS” column. We manipulated and analyzed the tables as CSV files with the Pandas library 36 in Python and the Tidyverse packages 37 in R.
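
A hedged sketch of that join in pandas follows; the file names, the comma delimiter for the multi-valued STUDY_NUMS field, and the hours column name are assumptions that should be checked against the released codebooks:

```python
import pandas as pd

# Load the three MICA tables (file names assumed for illustration)
studies = pd.read_csv("ICPSR_STUDIES.csv")
papers = pd.read_csv("ICPSR_PAPERS.csv")
logs = pd.read_csv("ICPSR_CURATION_LOGS.csv")

# One row per (paper, study) pair: split the multi-valued STUDY_NUMS
# column and explode it (a comma delimiter is assumed here)
links = papers.assign(STUDY=papers["STUDY_NUMS"].astype(str).str.split(","))
links = links.explode("STUDY")
links["STUDY"] = pd.to_numeric(links["STUDY"], errors="coerce")
links = links.dropna(subset=["STUDY"]).astype({"STUDY": int})

# Aggregate curation log entries to the study level before joining
# ("HOURS" is an assumed column name for hours per log entry)
hours = logs.groupby("STUDY", as_index=False)["HOURS"].sum()

# Join citing papers and total curation hours onto the studies table
merged = studies.merge(links, on="STUDY", how="left").merge(
    hours, on="STUDY", how="left"
)
print(merged.head())
```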

The present MICA dataset can be used independently to study the relationship between curation decisions and data reuse. Evidence of reuse for specific studies is available in several forms: usage information, including downloads and citation counts; and citation contexts within papers that cite data. Analysis may also be performed on the citation network formed between datasets and papers that use them. Finally, curation actions can be associated with properties of studies and usage histories.

This dataset has several limitations of which users should be aware. First, Jira tickets can only be used to represent the intensiveness of curation for activities undertaken since 2017, when ICPSR started using both Curation Levels and Jira. Studies published before 2017 were all curated, but documentation of the extent of that curation was not standardized and therefore could not be included in these analyses. Second, the measure of publications relies upon the authors’ clarity of data citation and the ICPSR Bibliography staff’s ability to discover citations with varying formality and clarity. Thus, there is always a chance that some secondary-data-citing publications have been left out of the bibliography. Finally, there may be some cases in which a paper in the ICPSR Bibliography did not actually obtain data from ICPSR. For example, PIs have often written about or even distributed their data prior to their archival at ICPSR. Therefore, those publications would not have cited ICPSR, but they are still collected in the Bibliography as being directly related to the data that were eventually deposited at ICPSR.

In summary, the MICA dataset contains relationships between two main types of entities – papers and studies – which can be mined. The tables in the MICA dataset have supported network analysis (community structure and clique detection) 30 ; natural language processing (NER for dataset reference detection) 32 ; visualizing citation networks (to search for datasets) 38 ; and regression analysis (on curation decisions and data downloads) 29 . The data are currently being used to develop research metrics and recommendation systems for research data. Given that DOIs are provided for ICPSR studies and articles in the ICPSR Bibliography, the MICA dataset can also be used with other bibliometric databases, including DataCite, Crossref, OpenAlex, and related indexes. Subscription-based services, such as Dimensions AI, are also compatible with the MICA dataset. In some cases, these services provide abstracts or full text for papers from which data citation contexts can be extracted for semantic content analysis.

Code availability

The code 27 used to produce the MICA project dataset is available on GitHub at https://github.com/ICPSR/mica-data-descriptor and through Zenodo with the identifier https://doi.org/10.5281/zenodo.8432666 . Data manipulation and pre-processing were performed in Python. Data curation for distribution was performed in SPSS.

References

1. He, L. & Han, Z. Do usage counts of scientific data make sense? An investigation of the Dryad repository. Library Hi Tech 35, 332–342 (2017).

2. Brickley, D., Burgess, M. & Noy, N. Google Dataset Search: Building a search engine for datasets in an open web ecosystem. In The World Wide Web Conference - WWW '19, 1365–1375 (ACM Press, San Francisco, CA, USA, 2019).

3. Buneman, P., Dosso, D., Lissandrini, M. & Silvello, G. Data citation and the citation graph. Quantitative Science Studies 2, 1399–1422 (2022).

4. Chao, T. C. Disciplinary reach: Investigating the impact of dataset reuse in the earth sciences. Proceedings of the American Society for Information Science and Technology 48, 1–8 (2011).

5. Parr, C. et al. A discussion of value metrics for data repositories in earth and environmental sciences. Data Science Journal 18, 58 (2019).

6. Eschenfelder, K. R., Shankar, K. & Downey, G. The financial maintenance of social science data archives: Four case studies of long-term infrastructure work. J. Assoc. Inf. Sci. Technol. 73, 1723–1740 (2022).

7. Palmer, C. L., Weber, N. M. & Cragin, M. H. The analytic potential of scientific data: Understanding re-use value. Proceedings of the American Society for Information Science and Technology 48, 1–10 (2011).

8. Zimmerman, A. S. New knowledge from old data: The role of standards in the sharing and reuse of ecological data. Sci. Technol. Human Values 33, 631–652 (2008).

9. Cragin, M. H., Palmer, C. L., Carlson, J. R. & Witt, M. Data sharing, small science and institutional repositories. Philosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences 368, 4023–4038 (2010).

10. Fear, K. M. Measuring and Anticipating the Impact of Data Reuse. Ph.D. thesis, University of Michigan (2013).

11. Borgman, C. L., Van de Sompel, H., Scharnhorst, A., van den Berg, H. & Treloar, A. Who uses the digital data archive? An exploratory study of DANS. Proceedings of the Association for Information Science and Technology 52, 1–4 (2015).

12. Pasquetto, I. V., Borgman, C. L. & Wofford, M. F. Uses and reuses of scientific data: The data creators' advantage. Harvard Data Science Review 1 (2019).

13. Gregory, K., Groth, P., Scharnhorst, A. & Wyatt, S. Lost or found? Discovering data needed for research. Harvard Data Science Review (2020).

14. York, J. Seeking Equilibrium in Data Reuse: A Study of Knowledge Satisficing. Ph.D. thesis, University of Michigan (2022).

15. Kilbride, W. & Norris, S. Collaborating to clarify the cost of curation. New Review of Information Networking 19, 44–48 (2014).

16. Robinson-Garcia, N., Mongeon, P., Jeng, W. & Costas, R. DataCite as a novel bibliometric source: Coverage, strengths and limitations. Journal of Informetrics 11, 841–854 (2017).

17. Qin, J., Hemsley, J. & Bratt, S. E. The structural shift and collaboration capacity in GenBank networks: A longitudinal study. Quantitative Science Studies 3, 174–193 (2022).

18. Acuna, D. E., Yi, Z., Liang, L. & Zhuang, H. Predicting the usage of scientific datasets based on article, author, institution, and journal bibliometrics. In Smits, M. (ed.) Information for a Better World: Shaping the Global Future. iConference 2022, 42–52 (Springer International Publishing, Cham, 2022).

19. Zeng, T., Wu, L., Bratt, S. & Acuna, D. E. Assigning credit to scientific datasets using article citation networks. Journal of Informetrics 14, 101013 (2020).

20. Koesten, L., Vougiouklis, P., Simperl, E. & Groth, P. Dataset reuse: Toward translating principles to practice. Patterns 1, 100136 (2020).

21. Du, C., Cohoon, J., Lopez, P. & Howison, J. Softcite dataset: A dataset of software mentions in biomedical and economic research publications. J. Assoc. Inf. Sci. Technol. 72, 870–884 (2021).

22. Aryani, A. et al. A research graph dataset for connecting research data repositories using RD-Switchboard. Sci Data 5, 180099 (2018).

23. Färber, M. & Lamprecht, D. The data set knowledge graph: Creating a linked open data source for data sets. Quantitative Science Studies 2, 1324–1355 (2021).

24. Perry, A. & Netscher, S. Measuring the time spent on data curation. Journal of Documentation 78, 282–304 (2022).

25. Trisovic, A. et al. Advancing computational reproducibility in the Dataverse data repository platform. In Proceedings of the 3rd International Workshop on Practical Reproducible Evaluation of Computer Systems, P-RECS '20, 15–20, https://doi.org/10.1145/3391800.3398173 (Association for Computing Machinery, New York, NY, USA, 2020).

26. Borgman, C. L., Scharnhorst, A. & Golshan, M. S. Digital data archives as knowledge infrastructures: Mediating data sharing and reuse. Journal of the Association for Information Science and Technology 70, 888–904, https://doi.org/10.1002/asi.24172 (2019).

27. Lafia, S. et al. MICA Data Descriptor. Zenodo https://doi.org/10.5281/zenodo.8432666 (2023).

28. Lafia, S., Thomer, A., Bleckley, D., Akmon, D. & Hemphill, L. Leveraging machine learning to detect data curation activities. In 2021 IEEE 17th International Conference on eScience (eScience), 149–158, https://doi.org/10.1109/eScience51609.2021.00025 (2021).

29. Hemphill, L., Pienta, A., Lafia, S., Akmon, D. & Bleckley, D. How do properties of data, their curation, and their funding relate to reuse? J. Assoc. Inf. Sci. Technol. 73, 1432–1444, https://doi.org/10.1002/asi.24646 (2021).

30. Lafia, S., Fan, L., Thomer, A. & Hemphill, L. Subdivisions and crossroads: Identifying hidden community structures in a data archive's citation network. Quantitative Science Studies 3, 694–714, https://doi.org/10.1162/qss_a_00209 (2022).

31. ICPSR. ICPSR Bibliography of Data-related Literature: Collection Criteria. https://www.icpsr.umich.edu/web/pages/ICPSR/citations/collection-criteria.html (2023).

32. Lafia, S., Fan, L. & Hemphill, L. A natural language processing pipeline for detecting informal data references in academic literature. Proc. Assoc. Inf. Sci. Technol. 59, 169–178, https://doi.org/10.1002/pra2.614 (2022).

33. Hook, D. W., Porter, S. J. & Herzog, C. Dimensions: Building context for search and evaluation. Frontiers in Research Metrics and Analytics 3, 23, https://doi.org/10.3389/frma.2018.00023 (2018).

34. ICPSR. ICPSR Thesaurus. https://www.icpsr.umich.edu/web/ICPSR/thesaurus (2002).

35. ICPSR. ICPSR Curation Levels. https://www.icpsr.umich.edu/files/datamanagement/icpsr-curation-levels.pdf (2020).

36. McKinney, W. Data structures for statistical computing in Python. In van der Walt, S. & Millman, J. (eds.) Proceedings of the 9th Python in Science Conference, 56–61 (2010).

37. Wickham, H. et al. Welcome to the Tidyverse. Journal of Open Source Software 4, 1686 (2019).

38. Fan, L., Lafia, S., Li, L., Yang, F. & Hemphill, L. DataChat: Prototyping a conversational agent for dataset search and visualization. Proc. Assoc. Inf. Sci. Technol. 60, 586–591 (2023).


Acknowledgements

We thank the ICPSR Bibliography staff, the ICPSR Data Curation Unit, and the ICPSR Data Stewardship Committee for their support of this research. This material is based upon work supported by the National Science Foundation under grant 1930645. This project was made possible in part by the Institute of Museum and Library Services LG-37-19-0134-19.

Author information

Authors and affiliations

Inter-university Consortium for Political and Social Research, University of Michigan, Ann Arbor, MI, 48104, USA

Libby Hemphill, Sara Lafia, David Bleckley & Elizabeth Moss

School of Information, University of Michigan, Ann Arbor, MI, 48104, USA

Libby Hemphill & Lizhou Fan

School of Information, University of Arizona, Tucson, AZ, 85721, USA

Andrea Thomer


Contributions

L.H. and A.T. conceptualized the study design, D.B., E.M., and S.L. prepared the data, S.L., L.F., and L.H. analyzed the data, and D.B. validated the data. All authors reviewed and edited the manuscript.

Corresponding author

Correspondence to Libby Hemphill .

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher's note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/ .


About this article

Cite this article

Hemphill, L., Thomer, A., Lafia, S. et al. A dataset for measuring the impact of research data and their curation. Sci Data 11 , 442 (2024). https://doi.org/10.1038/s41597-024-03303-2


Received: 16 November 2023

Accepted: 24 April 2024

Published: 03 May 2024

DOI: https://doi.org/10.1038/s41597-024-03303-2


Statistical Papers

Statistical Papers is a forum for presentation and critical assessment of statistical methods encouraging the discussion of methodological foundations and potential applications.

  • The journal stresses statistical methods that have broad applications, giving special attention to those relevant to the economic and social sciences.
  • Covers all topics of modern data science, such as frequentist and Bayesian design and inference as well as statistical learning.
  • Contains original research papers (regular articles), survey articles, short communications, reports on statistical software, and book reviews.
  • High author satisfaction, with 90% likely to publish in the journal again.
  • Editors: Werner G. Müller, Carsten Jentsch, Shuangzhe Liu, Ulrike Schneider


Latest issue

Volume 65, Issue 2

Latest articles

Analyzing Quantitative Performance: Bayesian Estimation of 3-Component Mixture Geometric Distributions Based on Kumaraswamy Prior

  • Nadeem Akhtar
  • Sajjad Ahmad Khan
  • Haifa Alqahtani

Variation comparison between infinitely divisible distributions and the normal distribution


Multivariate stochastic comparisons of sequential order statistics with non-identical components

  • Tanmay Sahoo
  • Nil Kamal Hazra
  • Narayanaswamy Balakrishnan


New copula families and mixing properties

  • Martial Longla


Multiple random change points in survival analysis with applications to clinical trials


Journal updates

Write & Submit: Overleaf LaTeX Template

Journal information

  • Australian Business Deans Council (ABDC) Journal Quality List
  • Current Index to Statistics
  • Google Scholar
  • Japanese Science and Technology Agency (JST)
  • Mathematical Reviews
  • Norwegian Register for Scientific Journals and Series
  • OCLC WorldCat Discovery Service
  • Research Papers in Economics (RePEc)
  • Science Citation Index Expanded (SCIE)
  • TD Net Discovery Service
  • UGC-CARE List (India)


Enago Academy

Effective Use of Statistics in Research – Methods and Tools for Data Analysis


Remember that feeling of dread you get when you are asked to analyze your data? Now that you have all the required raw data, you need to statistically prove your hypothesis. Representing your numerical data well as part of statistics in research will also help break the stereotype of the biology student who can't do math.

Statistical methods are essential to scientific research. In fact, they span the entire research process: planning, design, data collection, analysis, meaningful interpretation, and reporting of findings. The results acquired from a research project remain meaningless raw data unless they are analyzed with statistical tools. Sound statistics in research are therefore necessary to justify research findings. In this article, we discuss how statistical methods can help draw meaningful conclusions in biological studies.


Role of Statistics in Biological Research

Statistics is a branch of science that deals with the collection, organization, and analysis of data, from a sample to the whole population. It aids in designing a study more meticulously and gives logical grounds for the conclusions drawn about a hypothesis. Biology focuses on living organisms and their complex living pathways, which are dynamic and cannot always be explained by reasoning alone. Statistics defines and explains study patterns based on the sample sizes used; in short, it reveals the trends in a study.

Biological researchers often disregard statistics when planning their research and apply statistical tools only at the end of the experiment. This gives rise to complicated sets of results that are difficult to analyze. Statistics in research instead helps a researcher approach the study in a stepwise manner, as follows:

1. Establishing a Sample Size

A biological experiment usually starts with choosing samples and selecting the right number of experimental replicates. Basic statistics provides the notions of randomness and the law of large numbers: drawing a sample of the right size from a large random pool lets you extrapolate findings while reducing experimental bias and error.
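
As an illustration, a prospective power analysis is one standard way to establish a sample size before running the experiment. The sketch below assumes a two-group design compared with a t-test; the effect size, power, and significance level are illustrative values, not recommendations.

```python
# Minimal sketch: solve for the per-group sample size of a two-sample
# t-test design using a power analysis (values are illustrative).
from statsmodels.stats.power import TTestIndPower

n_per_group = TTestIndPower().solve_power(
    effect_size=0.5,  # assumed medium effect (Cohen's d)
    power=0.8,        # 80% chance of detecting the effect if it exists
    alpha=0.05,       # 5% significance level
)
print(f"Required sample size per group: {n_per_group:.0f}")  # ~64
```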

2. Testing of Hypothesis

When conducting a statistical study with a large sample pool, biological researchers must make sure that a conclusion is statistically significant. To achieve this, a researcher must state a hypothesis before examining the distribution of the data. Statistics then helps interpret whether the data cluster near the mean of the distribution or spread across it; these trends characterize the sample and allow the hypothesis to be tested.
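
For instance, a two-sample t-test is a common way to check whether an observed difference between two groups is statistically significant. The measurements below are made-up illustrative data.

```python
# Minimal sketch: two-sample t-test on made-up control/treatment data.
from scipy import stats

control = [4.1, 3.9, 4.3, 4.0, 4.2, 3.8]
treated = [4.6, 4.8, 4.4, 4.9, 4.7, 4.5]

t_stat, p_value = stats.ttest_ind(control, treated)
print(f"t = {t_stat:.2f}, p = {p_value:.4f}")
# A p-value below the chosen significance level (e.g. 0.05) suggests the
# difference between group means is unlikely to be due to chance alone.
```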

3. Data Interpretation Through Analysis

When dealing with large data sets, statistics assists in the analysis, helping researchers draw sound conclusions from their experiments and observations. Concluding a study manually or from visual observation alone can give erroneous results. A thorough statistical analysis instead takes all relevant statistical measures and the variance in the sample into account to provide a detailed interpretation of the data, producing solid evidence to support the conclusion.

Types of Statistical Research Methods That Aid in Data Analysis


Statistical analysis is the process of distilling data samples into patterns or trends that help researchers anticipate situations and draw appropriate research conclusions. Based on the type of data and question, statistical analyses fall into the following types:

1. Descriptive Analysis

Descriptive statistical analysis organizes and summarizes large data sets into graphs and tables. It involves processes such as tabulation, measures of central tendency, measures of dispersion or variance, and skewness measurements.
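
A minimal sketch of these descriptive measures, computed with pandas on a made-up set of exam scores:

```python
# Minimal sketch: descriptive statistics on made-up exam scores.
import pandas as pd

scores = pd.Series([62, 71, 71, 75, 80, 84, 88, 93])

print("Mean:", scores.mean())               # central tendency
print("Median:", scores.median())
print("Mode:", scores.mode().tolist())
print("Variance:", scores.var())            # dispersion (sample variance)
print("Std dev:", scores.std())
print("Range:", scores.max() - scores.min())
print("Skewness:", scores.skew())
```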

2. Inferential Analysis

Inferential statistical analysis extrapolates from a small sample to the complete population. It helps draw conclusions and make decisions about the whole population on the basis of sample data, and it is the recommended approach for research projects that work with small sample sizes but aim to generalize their conclusions to a large population.
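
As an illustration, a confidence interval is one of the simplest inferential tools: it uses a small sample to bound a population parameter. The sketch below computes a 95% confidence interval for a population mean from made-up measurements.

```python
# Minimal sketch: 95% confidence interval for a population mean,
# estimated from a small made-up sample.
import numpy as np
from scipy import stats

sample = np.array([12.1, 13.4, 11.8, 12.9, 13.1, 12.5, 12.7])
low, high = stats.t.interval(
    0.95,                     # confidence level
    df=len(sample) - 1,       # degrees of freedom
    loc=sample.mean(),        # sample mean as the point estimate
    scale=stats.sem(sample),  # standard error of the mean
)
print(f"95% CI for the population mean: {low:.2f} to {high:.2f}")
```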

3. Predictive Analysis

Predictive analysis is used to forecast future events. This approach is common among marketing companies, insurance organizations, online service providers, data-driven marketers, and financial corporations.
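
A minimal sketch of the idea: fit a model to past observations and use it to forecast the next one. The monthly sales figures are made up, and a simple linear trend stands in for the far richer models predictive analysts actually use.

```python
# Minimal sketch: fit a linear trend to past data and forecast ahead.
import numpy as np
from sklearn.linear_model import LinearRegression

months = np.arange(1, 13).reshape(-1, 1)  # predictor: month number 1..12
sales = np.array([10, 12, 13, 15, 16, 18, 19, 21, 22, 24, 25, 27])

model = LinearRegression().fit(months, sales)
forecast = model.predict(np.array([[13]]))  # predict the next month
print(f"Forecast for month 13: {forecast[0]:.1f}")
```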

4. Prescriptive Analysis

Prescriptive analysis examines data to determine what should be done next. It is widely used in business analysis to identify the best possible outcome for a situation. It is closely related to descriptive and predictive analysis, but prescriptive analysis goes further by recommending the best course of action among the available options.

5. Exploratory Data Analysis

Exploratory data analysis (EDA) is generally the first step of the data analysis process, conducted before any other statistical technique. It focuses on examining patterns in the data to recognize potential relationships. EDA is used to discover unknown associations within the data, inspect missing values, and extract maximum insight.
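
A minimal first-pass EDA might look like the sketch below: inspect the raw records, count missing values, summarize each column, and scan pairwise correlations. The file name is a hypothetical placeholder.

```python
# Minimal sketch: first-pass exploratory data analysis with pandas.
import pandas as pd

df = pd.read_csv("experiment_results.csv")  # hypothetical file name

print(df.head())                   # eyeball the raw records
print(df.isna().sum())             # missing values per column
print(df.describe())               # per-column summary statistics
print(df.corr(numeric_only=True))  # candidate relationships to explore
```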

6. Causal Analysis

Causal analysis helps understand and determine the reasons why things happen the way they do. It helps identify the root cause of failures, or simply the underlying reason something could happen. For example, causal analysis is used to understand what will happen to one variable if another variable changes.
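
The sketch below illustrates the spirit of this with made-up plant-growth data: a regression coefficient estimates how the outcome moves with a variable of interest after adjusting for a confounder. Note that reading such a coefficient causally requires design assumptions (randomization or complete adjustment) that code alone cannot guarantee.

```python
# Minimal sketch: estimate the effect of one variable on another while
# adjusting for a confounder (synthetic data; association, not proof of cause).
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(42)
fertilizer = rng.uniform(0, 10, 100)  # variable of interest
sunlight = rng.uniform(4, 12, 100)    # potential confounder
growth = 2.0 * fertilizer + 1.5 * sunlight + rng.normal(0, 1, 100)

X = sm.add_constant(np.column_stack([fertilizer, sunlight]))
fit = sm.OLS(growth, X).fit()
print(fit.params)  # ~[intercept, 2.0 (fertilizer), 1.5 (sunlight)]
```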

7. Mechanistic Analysis

This is the least common type of statistical analysis. Mechanistic analysis is used in big data analytics and the biological sciences. It seeks to understand how individual changes in one variable cause corresponding changes in other variables, while excluding external influences.

Important Statistical Tools In Research

Researchers in the biological sciences often find statistical analysis the scariest part of completing research. However, statistical tools can help researchers understand what to do with their data and how to interpret the results, making the process as painless as possible.

1. Statistical Package for Social Science (SPSS)

It is a widely used software package for human behavior research. SPSS can compile descriptive statistics as well as graphical depictions of results. It also includes the option to create scripts that automate analysis or carry out more advanced statistical processing.

2. R Foundation for Statistical Computing

This software package is used in human behavior research and many other fields. R is a powerful tool with a steep learning curve, as it requires a certain level of coding. However, it has an active community engaged in building and enhancing the software and its associated packages.

3. MATLAB (The Mathworks)

It is an analytical platform and a programming language. Researchers and engineers use this software to write their own code to answer their research questions. While MATLAB can be a difficult tool for novices, it offers flexibility in terms of what the researcher needs.

4. Microsoft Excel

While not the best solution for advanced statistical analysis, MS Excel offers a wide variety of tools for data visualization and simple statistics. It is easy to generate summaries and customizable graphs and figures, making it the most accessible option for those wanting to start with statistics.

5. Statistical Analysis Software (SAS)

It is a statistical platform used in business, healthcare, and human behavior research alike. It can carry out advanced analyses and produce publication-worthy figures, tables and charts.

6. GraphPad Prism

It is a premium software package used primarily by biology researchers, but it offers a range of tools applicable to various other fields. Similar to SPSS, GraphPad provides a scripting option to automate analyses and carry out complex statistical calculations.

7. Minitab

Minitab offers basic as well as advanced statistical tools for data analysis. Similar to GraphPad and SPSS, it can run automated analyses, though it requires some command of coding.

Use of Statistical Tools In Research and Data Analysis

Statistical tools help manage large data. Many biological studies rely on large data sets to analyze trends and patterns, so statistical tools become essential: they make processing those data sets far more convenient.

Following the steps above will help biological researchers present the statistics in their research in detail, develop accurate hypotheses, and choose the right tools for the job.

There is a range of statistical tools that can help researchers manage their research data and improve the outcome of their research through better interpretation of the data. Choosing among them comes down to understanding the research question, knowledge of statistics, and your personal experience with coding.

Have you faced challenges while using statistics in research? How did you manage it? Did you use any of the statistical tools to help you with your research data? Do write to us or comment below!

Frequently Asked Questions

Q: How can statistics help structure a research study?
A: Statistics in research helps a researcher approach the study in a stepwise manner: 1. establishing a sample size, 2. testing the hypothesis, and 3. interpreting the data through analysis.

Q: Why are statistical methods essential for scientific research?
A: Statistical methods span the entire research process: planning, design, data collection, analysis, interpretation, and reporting. Research results remain meaningless raw data unless analyzed with statistical tools, so sound statistics are necessary to justify research findings.

Q: Which tools can help with statistical analysis?
A: Statistical tools help researchers understand what to do with data and how to interpret the results, and they can manage large data sets conveniently. Widely used options include SPSS, SAS (Statistical Analysis Software), and Minitab.




Trending statistics

May 10, 2024 | Music

Most Streamed Eurovision Songs on Spotify Worldwide 2024

As every year, the Eurovision Song Contest gives a stage to bands and artists selected by European and Europe-adjacent nations (as well as Australia) to compete for the Eurovision trophy. In 2024, the contest was hosted by Sweden in Malmö, following Loreen's second win in the competition the year before. The Italian song "La noia" by artist Angelina Mango was streamed over 60 million times on Spotify by the day before the competition, making it the most streamed song from Eurovision 2024 on the platform. Croatia's Baby Lasagna, the favorite to win the contest, stood at 8th position with 6.5 million streams.

May 7, 2024 | Video Streaming

Quarterly Disney+ Subscribers Count Worldwide 2020-2024

In the second quarter of 2024, the number of global Disney+ subscribers amounted to 153.6 million. This marked a decline of around four million compared with the same quarter of the previous year, amid further price increases of the service in October 2023.

The Walt Disney Company launched its highly anticipated streaming service in November 2019. Less than two years later, Disney+ reached a significant milestone by amassing 100 million subscribers worldwide – which is even more impressive considering that the company had initially set a goal of 60 to 90 million users by 2024. By comparison, it took SVOD market leader Netflix roughly a decade to reach the 100-million-mark, despite having navigated a much less competitive market then.

What makes Disney+ so appealing to audiences is not just its repertoire of animated classics but also the vast range of content from Disney’s various subsidiaries. The House of Mouse has acquired Lucasfilm, 20th Century Studios, Pixar, and Marvel Entertainment over the last few years, setting it up for success with viewers of all ages. In 2022, for example, “Moon Knight” and “Obi-Wan Kenobi,” available exclusively on Disney+, were among the five most popular original series releases worldwide .

May 2, 2024 | Travel, Tourism & Hospitality

Number of international tourist arrivals worldwide 1950-2023.

The number of international tourist arrivals worldwide increased in 2023 compared to the previous year, though it still remained below the figure reported in 2019, before the impact of the COVID-19 pandemic. After declining with the onset of the health crisis to roughly 407 million, the lowest figure recorded since 1989, global inbound tourist arrivals showed strong signs of recovery in the following years, totaling just under 1.3 billion in 2023.

Since 2005, Europe has been the global region attracting the highest number of international tourists . While inbound tourist arrivals in Europe rose significantly in 2023 over the previous year, they did not catch up yet with pre-pandemic levels. Within this region,  Southern and Mediterranean Europe was the most popular area for international tourism , recording around 265 million arrivals in 2022.

According to Statista Mobility Market Insights, the global travel and tourism market's revenue, including hotels, package holidays, vacation rentals, and camping, amounted to nearly 860 billion U.S. dollars in 2023, recovering from the impact of COVID-19. When breaking down travel and tourism's revenue worldwide by sales channel, it emerges that the online channel generated over two-thirds of the global transactions' value that year.

May 6, 2024 | Semiconductors

Semiconductor Materials Market Revenue Worldwide 2019-2023, by Region

The global semiconductor materials market generated revenues of 66.72 billion U.S. dollars in 2023, of which 19.18 billion U.S. dollars was consumed in Taiwan. A further 13.09 billion U.S. dollars was consumed in China.

May 8, 2024 | Apps

Monthly Global Downloads of Temu Shopping App 2022-2024

The popularity of ecommerce platform Temu has been surging since its debut in the fall of 2022. In April 2024, the app was downloaded over 46 million times all over the world, making it more popular than Amazon’s marketplace app.

Temu, which is owned by the Chinese online retailer PDD Holdings, has successfully replicated the meteoric growth of its sister app Pinduoduo in overseas markets through effective marketing campaigns. Focusing on providing low-cost products with free and fast shipping, Temu has emerged as a wallet-saving alternative amidst rising inflation . The newcomer has also followed the playbook of Pinduoduo, such as gamification features and personalized purchase recommendations, to make shopping on mobile more fun.

These strategies work. In the first five months of 2023, Temu generated over 1.5 billion U.S. dollars in gross merchandise volume. It has caught the eye of inflation-weary shoppers in the West, particularly young people in the United States and Mexico. In April 2023, Temu achieved its first milestone of over 100 million active users in the United States.  

May 3, 2024 | Employment

U.S. Seasonally Adjusted Unemployment Rate 2022-2024

The seasonally-adjusted national unemployment rate is measured on a monthly basis in the United States. In April 2024, the national unemployment rate was at 3.9 percent. Seasonal adjustment is a statistical method of removing the seasonal component of a time series that is used when analyzing non-seasonal trends. 
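
As a side note on the method itself, the sketch below shows one common way to seasonally adjust a monthly series: estimate the seasonal component by classical decomposition and subtract it. The series here is synthetic, not BLS data.

```python
# Minimal sketch: seasonally adjust a synthetic monthly series by
# subtracting the estimated seasonal component.
import numpy as np
import pandas as pd
from statsmodels.tsa.seasonal import seasonal_decompose

idx = pd.date_range("2021-01-01", periods=36, freq="MS")
trend = np.linspace(5.0, 4.0, 36)                     # slow decline
season = np.tile([0.4, 0.2, 0.0, -0.2, -0.4, -0.2,
                  0.0, 0.2, 0.4, 0.2, 0.0, -0.2], 3)  # yearly pattern
rate = pd.Series(trend + season, index=idx, name="rate")

result = seasonal_decompose(rate, model="additive", period=12)
adjusted = rate - result.seasonal  # the seasonally adjusted series
print(adjusted.head())
```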

According to the Bureau of Labor Statistics, the principal fact-finding agency for the U.S. federal government in labor economics and statistics, unemployment decreased dramatically from 2010 to 2019. This trend of decreasing unemployment followed a high in 2010 resulting from the 2008 financial crisis. However, after a smaller financial crisis due to the COVID-19 pandemic, unemployment reached 8.1 percent in 2020. As the economy recovered, the unemployment rate fell to 5.3 percent in 2021 and fell even further in 2022. Additional statistics from the BLS paint an interesting picture of unemployment in the United States. In November 2023, the states with the highest (seasonally adjusted) unemployment rates were Nevada and the District of Columbia, while unemployment was lowest in Maryland, at 1.8 percent. Workers in agricultural and related industries suffered the highest unemployment rate of any industry, at seven percent in December 2023.

May 10, 2024 | Apps

Monthly Active Users of Leading Music Apps in China 2024

As of February 2024, Tencent's music apps, KuGou, QQ Music, and Kuwo, were the three most popular music streaming apps in China. Monthly active users of KuGou totaled around 350 million, whereas Baidu's rebranded Qian Qian amassed around 5.7 million monthly active users.

Globally, Tencent is one of the biggest music streaming providers. The Chinese tech behemoth owns the largest music company in the country, Tencent Music Entertainment (TME). With roughly 60 percent of the local music market share, TME has reported profitable financials since its founding in 2016, and its monthly active users rose to 600 million, capturing most of the mobile music user base in the country. Among its music apps, KuGou Music was favored by users aged between 19 and 28, while QQ Music was preferred among older users.

Although mobile music streaming usage is high among Chinese netizens, most are not accustomed to paying for the service, and music streaming platforms can hardly generate high revenue per user; listeners are more likely to opt for free sources to access their choice of music. In 2020, paying members were estimated to make up about eight percent of the total music user base in China. To address the problem, industry experts have suggested raising public awareness of copyright protections.

May 8, 2024 | National Security

EU Military Defense: Defense Expenditure 2005-2022, by Member State

As of 2022, the European Union member states spent almost 240 billion Euros collectively on military defense. This includes money spent on procuring weapons, paying salaries, and other operational costs, as well as research & development expenditure. The total amount spent on defense declined significantly following the global financial crisis, as European countries cut back on public expenditure, reaching a low point of 138 billion Euros in 2012.  Since 2014, when Russia reemerged as a geopolitical threat to EU countries, due to its illegal annexation of Crimea from Ukraine and its covert military operations in the east of that country, military expenditure has increased sharply, rising particularly in Poland and Germany. Despite this, Germany still lags far behind the spending target set for it by NATO and the European Defence Agency of two percent of GDP.

May 9, 2024 | Elections

Voting Intention in the United Kingdom 2017-2024

In May 2024, 48 percent of British adults would vote for the Labour Party in a general election, compared with 18 percent who would vote for the Conservative Party. The ruling Conservatives have trailed Labour in the polls throughout 2022 and 2023, with a huge gap emerging in September 2022 when Liz Truss came to power. Truss' short time as Prime Minister was widely seen as a disaster for the country and her party, and she was succeeded by Rishi Sunak as Prime Minister that October. Labour has maintained their lead in the polls since Sunak became Prime Minister, and would win the next general election based on the most recent polls. 

The next UK general election is expected in 2024 but may take place as late as January 2025. Unlike many other democracies, general elections in the UK have no fixed date; the power to call one rests with the Prime Minister, although this must be done at least every five years. While the last election in 2019 was held in the winter, it is unlikely that Sunak will wait that long to call an election, with a spring or autumn election in 2024 the most likely scenario. It may suit the Conservatives to bide their time until the autumn, however, in the hope that the economic situation improves, giving a lift to Sunak's job rating and a boost to the government's sinking approval ratings. The job of catching Labour in the polls in 2024 may be a long shot for the Conservatives, especially as no ground was made up in 2023.

After a tough 2022, in which Britain suffered through its worst cost of living crisis in a generation, the economy was consistently identified as the main issue facing the country, just ahead of healthcare. To respond to these concerns, Rishi Sunak started 2023 with five pledges: halve inflation, grow the economy, reduce national debt, cut NHS waiting times, and stop small boats. One year on from this announcement, just one pledge can be said to have been realized, with CPI inflation falling from 10.1 percent at the start of the year to 3.9 percent by November. There is some ambiguity regarding the success of some of the other pledges. The economy shrank in the third quarter of 2023 and the national debt increased slightly, while small boat arrivals are down from 2022 but still higher than in most other years. The pledge to cut NHS waiting times was not fulfilled either, with the number of people awaiting treatment rising in 2023.

May 2, 2024 | Crime & Law Enforcement

Safest Countries in Africa 2024

In 2024, Rwanda was the safest country in Africa. It had a score of roughly 73.2 points in the safety index, making it the African nation with the lowest crime incidents and the only country with high safety levels - over 60 index points. In several other countries in Africa, the level of safety was considered moderate (40 to 60 index points).



Introduction to Statistics

(15 reviews)


David Lane, Rice University

Copyright Year: 2003

Publisher: David Lane

Language: English

Conditions of use: No Rights Reserved

Reviewed by Terri Torres, professor, Oregon Institute of Technology on 8/17/23


Comprehensiveness rating: 5

This author covers all the topics that would be covered in an introductory statistics course plus some. I could imagine using it for two courses at my university, which is on the quarter system. I would rather have the problem of too many topics rather than too few.

Content Accuracy rating: 5

Yes, Lane is both thorough and accurate.

Relevance/Longevity rating: 5

What is covered is what is usually covered in an introductory statistics book. The only topic I may, given sufficient time, cover is bootstrapping.

Clarity rating: 5

The book is clear and well-written. For the trickier topics, simulations are included to help with understanding.

Consistency rating: 5

All is organized in a way that is consistent with the previous topic.

Modularity rating: 5

The text is organized in a way that easily enables navigation.

Organization/Structure/Flow rating: 5

The text is organized like most statistics texts.

Interface rating: 5

Easy navigation.

Grammatical Errors rating: 5

I didn't see any grammatical errors.

Cultural Relevance rating: 5

Nothing is included that is culturally insensitive.

The videos that accompany this text are short and easy to watch and understand. Videos should be short enough to teach, but not so long that they are tiresome. This text includes almost everything: videos, simulations, case studies---all nicely organized in one spot. In addition, Lane has promised to send an instructor's manual and slide deck.

Reviewed by Professor Sandberg, Professor, Framingham State University on 6/29/21


This text covers all the usual topics in an Introduction to Statistics for college students. In addition, it has some additional topics that are useful.

I did not find any errors.

Some of the examples are dated, and the frequent use of male/female examples needs updating in terms of current gender splits.

I found it was easy to read and understand and I expect that students would also find the writing clear and the explanations accessible.

Even with different authors of chapter, the writing is consistent.

The text is well organized into sections making it easy to assign individual topics and sections.

The topics are presented in the usual order. Regression comes later in the text, but there is a difference of opinion about whether to present it early with descriptive statistics for bivariate data or later with inferential statistics.

I had no problem navigating the text online.

The writing is grammatically correct.

I saw no issues that would be offensive.

I did like this text. It seems like it would be a good choice for most introductory statistics courses. I liked that the Monty Hall problem was included in the probability section. The author offers to provide an instructor's manual, PowerPoint slides and additional questions. These additional resources are very helpful and not always available with online OER texts.

Reviewed by Emilio Vazquez, Associate Professor, Trine University on 4/23/21


This appears to be an excellent textbook for an Introductory Course in Statistics. It covers subjects in enough depth to fulfill the needs of a beginner in Statistics work yet is not so complex as to be overwhelming.

I found no errors in their discussions. Did not work out all of the questions and answers but my sampling did not reveal any errors.

Some of the examples may need updating depending on the times but the examples are still relevant at this time.

This is a statistics text, so it is a little dry. I found that the derivation of some of the formulas was not explained. However, the background is there to allow the instructor to derive these in class if desired.

The text is consistent throughout using the same verbiage in various sections.

The text does lend itself to reasonable reading assignments. For example, the chapter on Summarizing Distributions (Chapter 3) covers central tendency and its associated components in an easy 20 pages, with measures of variability making up most of the rest of the chapter and covering approximately another 20 pages. Exercises are available at the end of each chapter, making it easy for the instructor to assign reading and exercises to be discussed in class.

The textbook flows easily from descriptive to inferential statistics, with chapters on sampling and estimation preceding chapters on hypothesis testing.

I had no problems with navigation.

All textbooks have a few errors, but there is certainly nothing glaring here that makes the text difficult.

I saw no issues, and I am part of a cultural minority in the US.

Overall I found this to be an excellent in-depth overview of statistical theory, concepts and analysis. The length of the textbook appears to be more than adequate for a one-semester course in Introduction to Statistics. As I no longer teach a full statistics course but simply a few lectures as part of our research curriculum, I am recommending this book to my students as a good reference, especially as it is available online and in open access.

Reviewed by Audrey Hickert, Assistant Professor, Southern Illinois University Carbondale on 3/29/21


All of the major topics of an introductory level statistics course for social science are covered. Background areas include levels of measurement and research design basics. Descriptive statistics include all major measures of central tendency and dispersion/variation. Building blocks for inferential statistics include sampling distributions, the standard normal curve (z scores), and hypothesis testing sections. Inferential statistics include how to calculate confidence intervals, as well as conduct tests of one-sample tests of the population mean (Z- and t-tests), two-sample tests of the difference in population means (Z- and t-tests), chi square test of independence, correlation, and regression. Doesn’t include full probability distribution tables (e.g., t or Z), but those can be easily found online in many places.

I did not find any errors or issues of inaccuracy. When a particular method or practice is debated in the field, the authors acknowledge it (and provide citations in some circumstances).

Relevance/Longevity rating: 4

Basic statistics are standard, so the core information will remain relevant in perpetuity. Some of the examples are dated (e.g., salaries from 1999), but not problematic.

Clarity rating: 4

All of the key terms, formulas, and logic for statistical tests are clearly explained. The book sometimes uses different notation than other entry-level books. For example, the variance formula uses "M" for mean, rather than x-bar.

The explanations are consistent and build from and relate to corresponding sections that are listed in each unit.

Modularity is a strength of this text in both the PDF and interactive online format. Students can easily navigate to the necessary sections and each starts with a “Prerequisites” list of other sections in the book for those who need the additional background material. Instructors could easily compile concise sub-sections of the book for readings.

The presentation of topics differs somewhat from the standard introductory social science statistics textbooks I have used before. However, the modularity allows the instructor and student to work through the discrete sections in the desired order.

Interface rating: 4

For the most part the display of all images/charts is good and navigation is straightforward. One concern is that the organization of the Table of Contents does not exactly match the organizational outline at the start of each chapter in the PDF version. For example, sometimes there are more detailed sub-headings at the start of chapter and occasionally slightly different section headings/titles. There are also inconsistencies in section listings at start of chapters vs. start of sub-sections.

The text is easy to read and free from any obvious grammatical errors.

Although some of the examples are outdated, I did not review any that were offensive. One example of an outdated reference is using descriptive data on “Men per 100 Women” in U.S. cities as “useful if we are looking for an opposite-sex partner”.

This is a good introduction level statistics text book if you have a course with students who may be intimated by longer texts with more detailed information. Just the core basics are provided here and it is easy to select the sections you need. It is a good text if you plan to supplement with an array of your own materials (lectures, practice, etc.) that are specifically tailored to your discipline (e.g., criminal justice and criminology). Be advised that some formulas use different notation than other standard texts, so you will need to point that out to students if they differ from your lectures or assessment materials.

Reviewed by Shahar Boneh, Professor, Metropolitan State University of Denver on 3/26/21, updated 4/22/21


The textbook is indeed quite comprehensive. It can accommodate any style of introductory statistics course.

The text seems to be statistically accurate.

It is a little too extensive, which requires instructors to cover it selectively, and has a potential to confuse the students.

It is written clearly.

Consistency rating: 4

The terminology is fairly consistent. There is room for some improvement.

By the nature of the subject, the topics have to be presented in a sequential and coherent order. However, the book breaks things down quite effectively.

Organization/Structure/Flow rating: 3

Some of the topics are interleaved and not presented in the order I would like to cover them.

Good interface.

The grammar is ok.

The book seems to be culturally neutral, and not offensive in any way.

I really liked the simulations that go with the book. Parts of the book are a little too advanced for students who are learning statistics for the first time.

Reviewed by Julie Gray, Adjunct Assistant Professor, University of Texas at Arlington on 2/26/21


The textbook is for beginner-level students. The concept development is appropriate--there is always room to grow to a higher level, but for an introduction, the basics are what is needed. This is a well-thought-through OER textbook project by Dr. Lane and colleagues. It is obvious that several iterations have only made it better.

I found all the material accurate.

Essentially, statistical concepts at the introductory level are accepted as universal. This suggests that the relevance of this textbook will continue for a long time.

The book is well written for introducing beginners to statistical concepts. The figures, tables, and animated examples reinforce the clarity of the written text.

Yes, the information is consistent; when it is introduced in early chapters it ties in well in later chapters that build on and add more understanding for the topic.

Modularity rating: 4

The book is well-written with attention to modularity where possible. Due to the nature of statistics, that is not always possible. The content is presented in the order that I usually teach these concepts.

The organization of the book is good. I particularly like the sample lecture slide presentations and the problem sets with solutions for use in quizzes and exams, which are available by writing to the author. It is wonderful to have access to these helpful resources for instructors to use in preparation.

I did not find any interface issues.

The book is well written. In my reading I did not notice grammatical errors.

For this subject and in the examples given, I did not notice any cultural issues.

For the field of social work where qualitative data is as common as quantitative, the importance of giving students the rationale or the motivation to learn the quantitative side is understated. To use this text as an introductory statistics OER textbook in a social work curriculum, the instructor will want to bring in field-relevant examples to engage and motivate students. The field needs data-driven decision making and evidence-based practices to become more ubiquitous than not. Preparing future social workers by teaching introductory statistics is essential to meet that goal.

Reviewed by Mamata Marme, Assistant Professor, Augustana College on 6/25/19


Comprehensiveness rating: 4

This textbook offers a fairly comprehensive summary of what should be discussed in an introductory course in Statistics. The statistical literacy exercises are particularly interesting. It would be helpful to have the statistical tables attached in the same package, even though they are available online.

The terminology and notation used in the textbook is pretty standard. The content is accurate.

The statistical literacy examples are up to date but will need to be updated fairly regularly to keep the textbook fresh. The applications within the chapters are accessible and can be used fairly easily over a couple of editions.

The textbook does not necessarily explain the derivation of some of the formulae, and this will need to be augmented by the instructor in class discussion. What is beneficial is that each topic is discussed in multiple ways, using graphs, calculations and explanations of the results. Statistics textbooks have to cover a wide variety of topics with a fair amount of depth, and doing this concisely is difficult. There is a fine line between being concise and clear, which this textbook does well, and being somewhat dry. It may be up to the instructor to bring case studies into the readings as we go through the topics, rather than waiting until the end of the chapter.

The textbook uses standard notation and terminology. The heading section of each chapter is closely tied to topics that are covered. The end of chapter problems and the statistical literacy applications are closely tied to the material covered.

The authors have done a good job treating each chapter as if it stands alone. The lack of connection to past references, however, may create a sense of disconnect between the topics discussed.

The text's "modularity" does make the flow of the material a little disconnected. It would be better if there were accountability for what a student should already have learned in a different section. The earlier material is easy to find but not consistently referred to in the text.

I had no problem with the interface. The online version is more visually interesting than the pdf version.

I did not see any grammatical errors.

Cultural Relevance rating: 4

I am not sure how to evaluate this. The examples are mostly based on the American experience, and the data alluded to are mostly domestic. However, I am not sure that creates a problem in understanding the methodology.

Overall, this textbook will cover most of the topics in a survey of statistics course.

Reviewed by Alexandra Verkhovtseva, Professor, Anoka-Ramsey Community College on 6/3/19


This is a comprehensive enough text, considering that it is not easy to create a comprehensive statistics textbook. It is suitable for an introductory statistics course for non-math majors. It contains twenty-one chapters, covering the wide range of intro stats topics (and some more), plus the case studies and the glossary.

The content is pretty accurate, I did not find any biases or errors.

The book contains fairly recent data presented in the form of exercises, examples and applications. The topics are up-to-date, and appropriate technology is used for examples, applications, and case studies.

The language is simple and clear, which is a good thing, since students are usually scared of this class and instructors are looking for something to put them at ease. I would, however, try to make it a little more interesting, exciting, or maybe even funny.

Consistency is good, the book has a great structure. I like how each chapter has prerequisites and learner outcomes, this gives students a good idea of what to expect. Material in this book is covered in good detail.

The text can be easily divided into sub-sections, some of which can be omitted if needed. The chapter on regression is covered towards the end (chapter 14), but part of it can be covered sooner in the course.

The book contains well organized chapters that makes reading through easy and understandable. The order of chapters and sections is clear and logical.

The online version has many functions and is easy to navigate. This book also comes with a PDF version. There is no distortion of images or charts. The text is clean and clear, the examples provided contain appropriate format of data presentation.

No grammatical errors found.

The text uses simple and clear language, which is helpful for non-native speakers. I would include more culturally-relevant examples and case studies. Overall, good text.

In all, this book is a good learning experience. It contains tools and techniques that are free and easy to use, and also easy to modify, for both students and instructors. I very much appreciate this opportunity to use this textbook at no cost for our students.

Reviewed by Dabrina Dutcher, Assistant Professor, Bucknell University on 3/4/19

This is a reasonably thorough first-semester statistics book for most classes. It would have worked well for the general statistics courses I have taught in the past but is not as suitable for specialized introductory statistics courses for engineers or business applications. That is OK; they have separate texts for that! The only sections that feel somewhat light in terms of content are the confidence intervals and ANOVA sections. Given that these topics are often crammed in at the end of many introductory classes, that might not be problematic for many instructors. It should also be pointed out that while there are a couple of chapters on probability, this book presents most formulas as "black boxes" rather than worrying about their derivation or origin. The probability sections do not include any significant combinatorics work, which is sometimes included at this level.

I did not find any errors in the formulas presented, but I did not work many end-of-chapter problems to gauge the accuracy of their answers.

There isn't much changing in the introductory stats world, so I have no concerns about the book becoming outdated rapidly. The examples and problems still feel relevant and reasonably modern. My only concern is that the statistical tools most often referenced in the book are TI-83/84-type calculators. As students increasingly buy TI-89s or TI-Nspires, these sections of the book may lose relevance faster than other parts.

Solid. The book gives a list of key terms and their definitions at the end of each chapter, which is a nice feature. It also has a formula review at the end of each chapter. I can imagine that these are heavily used by students when studying! Formulas are easy to find and read and are well defined. There are a few areas that I might have found frustrating as a student. For example, the explanation for the difference in formulas for a population vs. sample standard deviation is quite weak. Again, this is a book that takes a "black-box" approach, so you may have to supplement such sections for some students.

I did not detect any problems with inconsistent symbol use or switches in terminology.

Modularity rating: 3

This low rating should not be taken as an indicator of an issue with this book but would be true of virtually any statistics book. Different books still use different variable symbols even for basic calculated statistics. So trying to use a chapter of this book without some sort of symbol/variable cheat-sheet would likely be frustrating to the students.

However, I think it would be possible to skip some chapters or use the chapters in a different order without any loss of functionality.

This book uses a very standard order for the material. The chapter on regression comes later than it does in some texts, but it doesn't really matter since that chapter never seems to fit smoothly anywhere.

There are numerous end-of-chapter problems, some with answers, in this book. I'm vacillating on whether these problems would be more useful if they were distributed after each relevant section or are better grouped at the end of the whole chapter. That might be a matter of individual preference.

I did not detect any problems.

I found no errors. However, there were several sections where the punctuation seemed non-ideal. This did not affect the overall usability of the book, though.

I'm not sure how well this book would work internationally as many of the examples contain domestic (American) references. However, I did not see anything offensive or biased in the book.

Reviewed by Ilgin Sager, Assistant Professor, University of Missouri - St. Louis on 1/14/19

As the title implies, this is a brief introductory textbook. It covers the fundamentals of introductory statistics but is not a comprehensive text on the subject. A teacher can use this book as the sole text of an introductory statistics course. The prose format of the definitions and theorems makes theoretical concepts accessible to non-math-major students. The textbook covers all the chapters required in a course at this level.

It is accurate; the subject matter in the examples is up to date and timeless, and wouldn't need to be revised in future editions; there are no errors other than a few typographical ones. There are no logic errors or incorrect explanations.

This text will remain up to date for a long time: since it has timeless examples and exercises, it won't become outdated. The information is presented clearly and simply, and the exercises help the reader follow it.

The material is presented in a clear, concise manner. The text is easily readable for the first-time statistics student.

The structure of the text is very consistent. Topics are presented with examples, followed by exercises. Problem sets are appropriate for the level of learner.

When earlier material needs to be referenced, it is easy to find; I had no trouble reading the book and finding results, as it has a consistent scheme. This book is organized into sections very well.

The text presents the information in a logical order.

The learner can easily follow up the material; there is no interface problem.

There are no logic errors or incorrect explanations, and the few typographical errors can simply be ignored.

Not applicable for this textbook.

Reviewed by Suhwon Lee, Associate Teaching Professor, University of Missouri on 6/19/18

This book is pretty comprehensive for being a brief introductory book. It covers all necessary content areas for an introduction to statistics course for non-math majors. The textbook provides an effective index, plenty of exercises, review questions, and practice tests. It provides references and case studies. The glossary and index section is very helpful for students and can be used as a great resource.

Content appears to be accurate throughout. Being an introductory book, the book is unbiased and straight to the point. The terminology is standard.

The content in the textbook is up to date. It will be very easy to update or change at any point in time because of the well-structured contents of the textbook.

The author does a great job of explaining nearly every new term or concept. The book is easy to follow, clear, and concise. The graphics are helpful, and the language is easily understandable. I found most instructions in the book to be very detailed and clear for students to follow.

Overall consistency is good. It is consistent in terms of terminology and framework. The writing is straightforward and standardized throughout the text, which makes reading easier.

The authors do a great job of partitioning the text and labeling sections with appropriate headings. The table of contents is well organized and easily divisible into reading sections and it can be assigned at different points within the course.

Organization/Structure/Flow rating: 4

Overall, the topics are arranged in an order that follows the natural progression of a statistics course, with some exceptions. They are addressed logically and given adequate coverage.

The text is free of any issues. There are no navigation problems nor any display issues.

The text contains no grammatical errors.

The text is not culturally insensitive or offensive in any way most of the time. Some examples might need source citations or different framing to reflect current inclusive teaching strategies.

Overall, it's well written and a good resource for an introduction to statistical methods. Some material may not need to be covered in a one-semester course. The various examples and quizzes can be a great resource for instructors.

Reviewed by Jenna Kowalski, Mathematics Instructor, Anoka-Ramsey Community College on 3/27/18

The text includes the introductory statistics topics covered in a college-level semester course. An effective index and glossary are included, with functional hyperlinks.

Content Accuracy rating: 3

The content of this text is accurate and error-free, based on a random sampling of various pages throughout the text. Several examples included information without formal citation, exposing the reader to potential bias and discrimination. These examples should be corrected to reflect current values of inclusive teaching.

The text contains relevant information that is current and will not become outdated in the near future. The statistical formulas and calculations have been used for centuries. The examples are direct applications of the formulas and accurately assess the conceptual knowledge of the reader.

The text is very clear and direct with the language used. The jargon does require a basic mathematical and/or statistical foundation to interpret, but this foundational requirement should be met with course prerequisites and placement testing. Graphs, tables, and visual displays are clearly labeled.

The terminology and framework of the text are consistent. The hyperlinks work effectively, and the glossary is valuable. Each chapter contains modules that begin with prerequisite information and upcoming learning objectives for mastery.

The modules are clearly defined and can be used in conjunction with other modules, or individually to exemplify a choice topic. With the prerequisite information stated, the reader understands what prior mathematical understanding is required to successfully use the module.

The topics are presented well, but I recommend placing Sampling Distributions, Advanced Graphs, and Research Design ahead of Probability in the text. I think this rearranged version of the index would better align with current Introductory Statistics texts. The structure is very organized with the prerequisite information stated and upcoming learner outcomes highlighted. Each module is well-defined.

Adding an option to return to the previous page would be of great value to the reader. While progressing through the text systematically, this is not an issue, but when the reader chooses to skip modules and read select pages, returning to the previous state of information is not easy.

No grammatical errors were found while reviewing select pages of this text at random.

Cultural Relevance rating: 3

Several examples contained data that were not formally cited. These examples need to be corrected to reflect current inclusive teaching strategies. For example, one question stated that “while men are XX times more likely to commit murder than women, …” This data should be cited, otherwise the information can be interpreted as biased and offensive.

An included solutions manual for the exercises would be valuable to educators who choose to use this text.

Reviewed by Zaki Kuruppalil, Associate Professor, Ohio University on 2/1/18

This is a comprehensive book on statistical methods, their settings, and, most importantly, the interpretation of the results. With the advent of computers and software, complex statistical analysis can be done very easily. But the challenge is knowing how to set up the case, set the parameters (for example, confidence intervals), and understand their implications for the interpretation of the results. If not done properly, this could lead to deceptive inferences, inadvertently or purposely. This book does a great job of explaining the above using many examples and real-world case studies. If you are looking for a book to learn and apply statistical methods, this is a great one. I think the author could consider revising the title of the book to reflect this, as it is more than just an introduction to statistics; perhaps include words such as "practical guide."

The contents of the book seem accurate. Some plots and calculations were randomly selected and checked for accuracy.

The book's topics are up to date and, in my opinion, will not become obsolete in the near future. I think the smartest thing the author has done is not tie the book to any particular software such as Minitab or SPSS. No matter what the software is, standard deviation is calculated the same way it always is. The only noticeable exceptions were the use of a Java applet for calculating Z values on page 261 and an excerpt of SPSS output provided for ANOVA calculations on page 416.

The contents and examples cited are clear and explained in simple language. Data analysis and presentation of the results including mathematical calculations, graphical explanation using charts, tables, figures etc are presented with clarity.

Terminology is consistent. The framework of each chapter seems consistent, with each chapter beginning with a set of defined topics, each topic divided into modules, and each module having a set of learning objectives and prerequisite chapters.

The textbook is divided into chapters, with each chapter further divided into modules. Each module has detailed learning objectives and required prerequisites, so you can extract a portion of the book and use it on its own to teach certain topics or as a learning guide for applying a relevant topic.

The topics are well thought out and presented in a logical fashion, as they would be introduced to someone learning the content. However, there are some issues with the table of contents and page numbers; for example, chapter 17 starts on page 597, not 598. Some tables and figures also lack a number; for instance, the graph shown on page 114 has none. It would also have been better if the chapter number were included in table and figure identifiers, for example, Figure 4-5. And in some cases, for instance on page 109, a figure and its title fall on two different pages.

No major issues. The only suggestion would be that, since each chapter has several modules, some means of tracing where you currently are, such as a header, would certainly help.

Grammatical Errors rating: 4

Easy to read and phrased correctly in most cases. There are minor grammatical errors, such as missing prepositions. In some places the author seems to have the habit of placing a period after a decimal number, for instance on pages 464 and 467: For X = 1, Y' = (0.425)(1) + 0.785 = 1.21. For X = 2, Y' = (0.425)(2) + 0.785 = 1.64.

However, it contains some statements (even though given as examples) that could be perceived as subjective, for which the author could consider citing sources. For example, from page 11, "Statistics include numerical facts and figures. For instance:"
  • The largest earthquake measured 9.2 on the Richter scale.
  • Men are at least 10 times more likely than women to commit murder.
  • One in every 8 South Africans is HIV positive.
  • By the year 2020, there will be 15 people aged 65 and over for every new baby born.

Solutions for the exercises would be a great teaching resource to have.

Reviewed by Randy Vander Wal, Professor, The Pennsylvania State University on 2/1/18

As a text for an introductory course, standard topics are covered. It was nice to see some topics such as power, sampling, research design, and distribution-free methods covered, as these are often omitted in abbreviated texts. Each module introduces the topic, has appropriate graphics, illustrations, or worked example(s) as appropriate, and concludes with many exercises. An instructor's manual is available by contacting the author. A comprehensive glossary provides definitions for all the major terms and concepts. The case studies give examples of practical applications of statistical analyses, and many of them contain the actual raw data. Of note, the online e-book provides several calculators for the essential distributions and tests. These are provided in lieu of printed tables, which are not included in the pdf. (Such tables are readily available on the web.)

The content is accurate and error free. Notation is standard and terminology is used accurately, as are the videos and verbal explanations therein. Online links work properly as do all the calculators. The text appears neutral and unbiased in subject and content.

The text achieves contemporary relevance by ending each section with a Statistical Literacy example, drawn from contemporary headlines and issues. Of course, the core topics are time proven. There is no obvious material that may become “dated”.

The text is very readable. While the pdf text may appear "sparse" through the absence of varied colored and inset boxes, pictures, etc., the essential illustrations and descriptions are provided. Meanwhile, for the same content, the online version appears streamlined and uncluttered, enhancing the value of the active links. Moreover, the videos provide nice short segments of "active" instruction that are clear and concise. Despite being a mathematical text, it is not overly burdened by formulas and numbers but rather has a "readable feel."

Terminology and symbol use are consistent throughout the text and with common use in the field. The pdf text and online version are also consistent in content, with the online e-book offering much greater functionality.

The chapters and topics may be used in a selective manner. Certain chapters have no pre-requisite chapter and in all cases, those required are listed at the beginning of each module. It would be straightforward to select portions of the text and reorganize as needed. The online version is highly modular offering students both ease of navigation and selection of topics.

Chapter topics are arranged appropriately. In an introductory statistics course, there is a logical flow given the buildup to the normal distribution, the concept of sampling distributions, confidence intervals, hypothesis testing, regression, and additional parametric and non-parametric tests. The normal distribution is central to an introductory course. Necessary precursor topics are covered in this text, while its use in significance and hypothesis testing follows, and thereafter more advanced topics, including multi-factor ANOVA.

Each chapter is structured as several modules, each beginning with prerequisite chapter(s) and learning objectives, and concluding with a Statistical Literacy section providing a self-check question addressing the core concept, along with its answer, followed by an extensive problem set. The clear and concise learning objectives will benefit students and the course instructor. No solutions or answer key is provided to students. An instructor's manual is available by request.

The online interface works well. In fact, I was pleasantly surprised by its options and functionality. The pdf appears somewhat sparse by comparison to publisher texts, lacking pictures, colored boxes, etc. But the online version has many active links providing definitions and graphic illustrations for key terms and topics. This can really facilitate learning by making such "refreshers" integral to the new material. Most sections also have short videos that are professionally done, with narration and smooth graphics. In this way, the text is interactive and flexible, offering varied tools for students. Of note, the interactive e-book works on both iOS and OS X.

The text in pdf form appeared to be free of grammatical errors, as did the online version's text, graphics, and videos.

This text contains no culturally insensitive or offensive content. The focus of the text is on concepts and explanation.

The text would be a great resource for students. The full content would be ambitious for a one-semester course, so such use would be unlikely. The text is clearly geared towards students with no background in statistics or calculus. The text could be used in two styles of course. For first-year students, the early chapters on graphs and distributions would be the starting point, omitting the later chapters on chi-square, transformations, distribution-free tests, and effect size. Alternatively, for upper-level students, the introductory chapters could be bypassed and the latter chapters covered to completion.

This text adopts a descriptive style of presentation, with topics well and fully explained, much like the "For Dummies" series. For this, it may seem a bit "wordy," but that can serve students well, and it notably complements PowerPoint slides, which are generally sparse on written content. This text could be used as the primary text for regular lectures or as a reference for a "flipped" class. The e-book videos are an enabling tool if this approach is adopted.

Reviewed by David Jabon, Associate Professor, DePaul University on 8/15/17

This text covers all the standard topics in a semester-long introductory course in statistics. It is particularly well indexed and very easy to navigate. There is a comprehensive hyperlinked glossary.

The material is completely accurate. There are no errors. The terminology is standard with one exception: in a number of places, the book calls what most people call the interquartile range the "H-spread." Ideally, the term "interquartile range" would be used in place of every reference to "H-spread"; it is simply a better, more descriptive term for the concept it describes, and it is more commonly used nowadays.

This book came out a number of years ago, but the material is still up to date. Some more recent case studies have been added.

The writing is very clear. There are also videos for almost every section. The section on boxplots uses a lot of technical terms that I don't find very helpful for my students (hinge, H-spread, upper adjacent value).

The text is internally consistent with one exception that I noted (the use of the synonymous words "H-spread" and "interquartile range").

The textbook is broken into very short sections, almost to a fault. Each section is at most two pages long. However, at the end of each of these sections there are a few multiple-choice questions for self-testing. These questions are a very appealing feature of the text.

The organization, in particular the ordering of the topics, is rather standard, with a few exceptions. Boxplots are introduced in Chapter 2, before the discussion of measures of center and dispersion; most books introduce them as part of the discussion of summarizing data using measures of center and dispersion. Some statistics instructors may not like the way the text lumps all of the sampling distributions into a single chapter (sampling distribution of the mean, of the difference of means, of a proportion, of r). I have tried this approach, and I now like it, but it is a very challenging chapter for students.

The book's interface has no features that distracted me. Overall the text is very clean and spare, with no additional distracting visual elements.

The book contains no grammatical errors.

The book's cultural relevance comes out in the case studies. As of this writing there are 33 such case studies, and they cover a wide range of issues from health to racial, ethnic, and gender disparity.

Each chapter has a nice set of exercises with selected answers. The thirty-three case studies are excellent and can be supplemented with other online case studies. An instructor's manual and PowerPoint slides can be obtained by emailing the author. There are direct links to online simulations within the text. This is a very high-quality textbook in every way.

Table of Contents

  • 1. Introduction
  • 2. Graphing Distributions
  • 3. Summarizing Distributions
  • 4. Describing Bivariate Data
  • 5. Probability
  • 6. Research Design
  • 7. Normal Distributions
  • 8. Advanced Graphs
  • 9. Sampling Distributions
  • 10. Estimation
  • 11. Logic of Hypothesis Testing
  • 12. Testing Means
  • 13. Power
  • 14. Regression
  • 15. Analysis of Variance
  • 16. Transformations
  • 17. Chi Square
  • 18. Distribution-Free Tests
  • 19. Effect Size
  • 20. Case Studies
  • 21. Glossary

Ancillary Material

  • Ancillary materials are available by contacting the author or publisher.

About the Book

Introduction to Statistics is a resource for learning and teaching introductory statistics. This work is in the public domain. Therefore, it can be copied and reproduced without limitation. However, we would appreciate a citation where possible. Please cite as: Online Statistics Education: A Multimedia Course of Study (http://onlinestatbook.com/). Project Leader: David M. Lane, Rice University. Instructor's manual, PowerPoint Slides, and additional questions are available.

About the Contributors

David Lane is an Associate Professor in the Departments of Psychology, Statistics, and Management at Rice University. Lane is the principal developer of this resource, although many others have made substantial contributions. This site was developed at Rice University, University of Houston-Clear Lake, and Tufts University.

Selection of Appropriate Statistical Methods for Data Analysis

Prabhaker Mishra

Department of Biostatistics and Health Informatics, Sanjay Gandhi Post Graduate Institute of Medical Sciences, Lucknow, Uttar Pradesh, India

Chandra Mani Pandey

Uttam Singh

Amit Keshri

Department of Neuro-otology, Sanjay Gandhi Post Graduate Institute of Medical Sciences, Lucknow, Uttar Pradesh, India

Mayilvaganan Sabaretnam

Department of Endocrine Surgery, Sanjay Gandhi Post Graduate Institute of Medical Sciences, Lucknow, Uttar Pradesh, India

In biostatistics, statistical methods are available for the analysis and interpretation of data in each specific situation. To select the appropriate statistical method, one needs to know the assumptions and conditions of the statistical methods, so that the proper method can be selected for data analysis. Two main statistical methods are used in data analysis: descriptive statistics, which summarizes data using indexes such as the mean and median, and inferential statistics, which draws conclusions from data using statistical tests such as Student's t-test. Selection of the appropriate statistical method depends on the following three things: the aim and objective of the study, the type and distribution of the data used, and the nature of the observations (paired/unpaired). All statistical methods used to compare means are called parametric, while statistical methods used to compare quantities other than means (e.g., medians/mean ranks/proportions) are called nonparametric. In the present article, we discuss parametric and nonparametric methods, their assumptions, and how to select the appropriate statistical methods for the analysis and interpretation of biomedical data.

Introduction

Selection of an appropriate statistical method is a very important step in the analysis of biomedical data. A wrong choice of statistical method not only creates serious problems during the interpretation of the findings but also affects the conclusions of the study. In statistics, for each specific situation, statistical methods are available to analyze and interpret the data. To select the appropriate statistical method, one needs to know the assumptions and conditions of the statistical methods, so that the proper method can be selected for data analysis.[1] Beyond knowledge of the statistical methods themselves, another very important aspect is the nature and type of the data collected and the objective of the study, because the statistical methods suitable for the given data are selected according to that objective. The use of wrong or inappropriate statistical methods is a common phenomenon in published articles in biomedical research. Incorrect statistical methods can be seen in many situations, such as the use of an unpaired t-test on paired data or the use of a parametric test for data that do not follow the normal distribution. At present, many statistical software packages, such as SPSS, R, Stata, and SAS, are available, and with them one can easily perform statistical analysis, but the selection of the appropriate statistical test is still a difficult task for biomedical researchers, especially those with a nonstatistical background.[2] Two main statistical methods are used in data analysis: descriptive statistics, which summarizes data using indexes such as the mean, median, and standard deviation, and inferential statistics, which draws conclusions from data using statistical tests such as Student's t-test and the ANOVA test.[3]

Factors Influencing Selection of Statistical Methods

Selection of the appropriate statistical method depends on the following three things: the aim and objective of the study, the type and distribution of the data used, and the nature of the observations (paired/unpaired).

Aim and objective of the study

Selection of a statistical test depends on the aim and objective of the study. Suppose our objective is to find the predictors of an outcome variable: then regression analysis is used. To compare the means of two independent samples, the unpaired samples t-test is used.

Type and distribution of the data used

For the same objective, the selection of the statistical test varies according to the data type. For nominal, ordinal, and discrete data, we use nonparametric methods, while for continuous data, both parametric and nonparametric methods are used.[4] For example, in regression analysis, when the outcome variable is categorical, logistic regression is used, while for a continuous outcome variable, a linear regression model is used. The choice of the most appropriate representative measure for a continuous variable depends on how the values are distributed. If a continuous variable follows the normal distribution, the mean is the representative measure, while for non-normal data, the median is considered the most appropriate representative measure of the data set. Similarly, for categorical data the proportion (percentage) is the representative measure, and for ranking/ordinal data it is the mean rank. In inferential statistics, hypotheses are constructed using these measures, and in hypothesis testing these measures are used to compare groups and calculate the significance level. Suppose we want to compare diastolic blood pressure (DBP) between three age groups (<30, 30-50, and >50 years). If the DBP variable is normally distributed, the mean is our representative measure, and the null hypothesis states that the mean DBP values of the three age groups are statistically equal. If the DBP variable is non-normal, the median is our representative measure, and the null hypothesis states that the distributions of DBP values among the three age groups are statistically equal. In the above example, the one-way ANOVA test is used to compare the means when DBP follows the normal distribution, while the Kruskal-Wallis H test or median test is used to compare the distributions of DBP among the three age groups when DBP follows a non-normal distribution. Similarly, suppose we want to compare the mean arterial pressure (MAP) between treatment and control groups: if the MAP variable follows the normal distribution, the independent samples t-test is used, while if it follows a non-normal distribution, the Mann-Whitney U test is used.
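
As an illustration of this decision, here is a minimal sketch (not from the article; the DBP values are hypothetical simulated data) using SciPy: `f_oneway` for the normally distributed case and `kruskal` for the non-normal case.

```python
# A minimal sketch of the DBP comparison described above, using SciPy.
# The arrays are hypothetical illustrative data, not the article's.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
dbp_under30 = rng.normal(78, 8, 30)   # hypothetical DBP values, age < 30
dbp_30to50  = rng.normal(82, 8, 30)   # hypothetical DBP values, age 30-50
dbp_over50  = rng.normal(86, 8, 30)   # hypothetical DBP values, age > 50

# If DBP is approximately normal in each group: one-way ANOVA compares means.
f_stat, p_anova = stats.f_oneway(dbp_under30, dbp_30to50, dbp_over50)

# If DBP is non-normal: the Kruskal-Wallis H test compares distributions.
h_stat, p_kw = stats.kruskal(dbp_under30, dbp_30to50, dbp_over50)

print(f"ANOVA: F={f_stat:.2f}, p={p_anova:.4f}")
print(f"Kruskal-Wallis: H={h_stat:.2f}, p={p_kw:.4f}")
```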

Observations are paired or unpaired

Another important point in the selection of a statistical test is to assess whether the data are paired (the same subjects are measured at different time points or using different methods) or unpaired (each group has different subjects). For example, to compare means between two groups, the paired samples t-test is used when the data are paired, while the independent samples t-test is used for unpaired (independent) data.
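
A minimal sketch of the paired/unpaired distinction, again with SciPy and hypothetical measurements: `ttest_rel` assumes the two arrays are matched subject-by-subject, while `ttest_ind` treats them as independent groups.

```python
# Paired vs. unpaired two-group comparisons (hypothetical data).
import numpy as np
from scipy import stats

before = np.array([126., 131., 118., 124., 129., 122.])  # same subjects, time 1
after  = np.array([121., 127., 115., 120., 126., 119.])  # same subjects, time 2
paired_t, paired_p = stats.ttest_rel(before, after)      # paired samples t-test

group_a = np.array([126., 131., 118., 124., 129., 122.]) # independent subjects
group_b = np.array([119., 125., 114., 121., 117., 123.])
unpaired_t, unpaired_p = stats.ttest_ind(group_a, group_b)

print(f"paired:   t={paired_t:.2f}, p={paired_p:.4f}")
print(f"unpaired: t={unpaired_t:.2f}, p={unpaired_p:.4f}")
```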

Concept of Parametric and Nonparametric Methods

Inferential statistical methods fall into two categories: parametric and nonparametric. All statistical methods used to compare means are called parametric, while statistical methods used to compare quantities other than means (e.g., medians/mean ranks/proportions) are called nonparametric. Parametric tests rely on the assumption that the variable is continuous and approximately normally distributed. When data are continuous with a non-normal distribution, or of any type other than continuous, nonparametric methods are used. Fortunately, the most frequently used parametric methods have nonparametric counterparts. This is useful when the assumptions of a parametric test are violated, as we can choose the nonparametric alternative as a backup analysis.[3]

Selection between Parametric and Nonparametric Methods

All t-tests and F tests are considered parametric tests. Student's t-test (one-sample t-test, independent samples t-test, paired samples t-test) is used to compare means between two groups, while the F test (one-way ANOVA, repeated measures ANOVA, etc.), which is an extension of Student's t-test, is used to compare means among three or more groups. Similarly, the Pearson correlation coefficient and linear regression are also considered parametric methods, as they are calculated using the mean and standard deviation of the data. For the above parametric methods, counterpart nonparametric methods are available. For example, the Mann-Whitney U test and Wilcoxon test are used in place of Student's t-test, while the Kruskal-Wallis H test, median test, and Friedman test are alternatives to the F test (ANOVA). Similarly, the Spearman rank correlation coefficient and log-linear regression are used as nonparametric counterparts of Pearson correlation and linear regression, respectively.[3,5,6,7,8] Parametric methods and their counterpart nonparametric methods are given in Table 1.
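
To show one parametric/nonparametric pair from Table 1 in action, here is a minimal sketch with hypothetical data comparing the Pearson correlation with its rank-based counterpart, the Spearman correlation; the outlier illustrates why the nonparametric version can be preferable.

```python
# Pearson correlation vs. its nonparametric counterpart (hypothetical data).
import numpy as np
from scipy import stats

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8, 30.0])   # last point is an outlier

pearson_r, pearson_p = stats.pearsonr(x, y)      # sensitive to the outlier
spearman_r, spearman_p = stats.spearmanr(x, y)   # rank-based, robust to it

print(f"Pearson r={pearson_r:.2f} (p={pearson_p:.3f})")
print(f"Spearman rho={spearman_r:.2f} (p={spearman_p:.3f})")
```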

Parametric and their Alternative Nonparametric Methods

Statistical Methods to Compare the Proportions

The statistical methods used to compare proportions are considered nonparametric, and these methods have no parametric alternatives. The Pearson chi-square test and Fisher's exact test are used to compare proportions between two or more independent groups. To test a change in proportions between two paired groups, the McNemar test is used, while the Cochran Q test serves the same objective for three or more paired groups. The Z test for proportions is used to compare proportions between two groups, whether independent or dependent.[6,7,8] [Table 2].
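
A minimal sketch of these proportion tests on hypothetical 2x2 tables: the chi-square and Fisher tests come from SciPy, and the McNemar test from statsmodels.

```python
# Comparing proportions (hypothetical contingency tables).
import numpy as np
from scipy.stats import chi2_contingency, fisher_exact
from statsmodels.stats.contingency_tables import mcnemar

# Independent groups: rows = group, columns = outcome (yes/no).
table = np.array([[30, 20],
                  [18, 32]])
chi2, p_chi2, dof, expected = chi2_contingency(table)
odds_ratio, p_fisher = fisher_exact(table)   # often preferred for small counts

# Paired groups: rows/columns = before/after classification of the same subjects.
paired = np.array([[40, 12],
                   [ 5, 43]])
result = mcnemar(paired, exact=True)

print(f"chi-square p={p_chi2:.4f}, Fisher p={p_fisher:.4f}, McNemar p={result.pvalue:.4f}")
```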

Other Statistical Methods

The intraclass correlation coefficient is calculated when both pre- and post-data are on a continuous scale. Unweighted and weighted kappa statistics are used to test the absolute agreement between two methods measured on the same subjects (pre-post) for nominal and ordinal data, respectively. There are some methods that are either semiparametric or nonparametric and for which no counterpart parametric methods are available: logistic regression analysis, survival analysis, and the receiver operating characteristic curve.[9] Logistic regression analysis is used to predict a categorical outcome variable from independent variable(s). Survival analysis is used to calculate survival time/survival probability, to compare survival times between groups (Kaplan-Meier method), and to identify predictors of the survival time of subjects/patients (Cox regression analysis). The receiver operating characteristic (ROC) curve is used to calculate the area under the curve (AUC) and cutoff values for a given continuous variable, with corresponding diagnostic accuracy, against a categorical outcome variable. The diagnostic accuracy of a test method is calculated in comparison with another method (usually a gold standard). Sensitivity (the proportion of actual disease cases detected by the test), specificity (the proportion of actual non-disease subjects detected by the test), and overall accuracy (the proportion of agreement between the test and gold standard methods in correctly detecting disease and non-disease subjects) are the key measures used to assess diagnostic accuracy. Other measures, such as the false-negative rate (1 - sensitivity), the false-positive rate (1 - specificity), the positive likelihood ratio (sensitivity/false-positive rate), the negative likelihood ratio (false-negative rate/specificity), the positive predictive value (the proportion of correctly detected disease cases out of all cases the test flags as diseased), and the negative predictive value (the proportion of correctly detected non-disease subjects out of all subjects the test flags as non-diseased), are also used to quantify the diagnostic accuracy of a test method.[3,6,10] [Table 3].
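
The diagnostic-accuracy measures defined above are straightforward to compute. Here is a minimal sketch with hypothetical labels and scores, using scikit-learn for the AUC and a confusion matrix for sensitivity and specificity.

```python
# Sensitivity, specificity, and AUC for a hypothetical diagnostic test.
import numpy as np
from sklearn.metrics import roc_auc_score, confusion_matrix

y_true  = np.array([1, 1, 1, 1, 0, 0, 0, 0, 1, 0])   # 1 = disease (gold standard)
y_score = np.array([0.9, 0.8, 0.7, 0.4, 0.3, 0.2, 0.6, 0.1, 0.85, 0.35])
y_pred  = (y_score >= 0.5).astype(int)                # chosen cutoff value

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
sensitivity = tp / (tp + fn)   # detected disease / actual disease
specificity = tn / (tn + fp)   # detected non-disease / actual non-disease
auc = roc_auc_score(y_true, y_score)

print(f"sensitivity={sensitivity:.2f}, specificity={specificity:.2f}, AUC={auc:.2f}")
```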

Semi-parametric and non-parametric methods

Advantages and Disadvantages of Nonparametric Methods over Parametric Methods, and Sample Size Issues

Parametric methods are stronger tests for detecting differences between groups than their counterpart nonparametric methods, although due to some strict assumptions, including normality of the data and sample size requirements, we cannot use a parametric test in every situation, and its nonparametric alternative is used instead. The mean, which parametric methods compare, is severely affected by outliers, while the median/mean rank used as the representative measure in nonparametric methods is not affected by outliers.[11]

In parametric methods like Student's t-test and the ANOVA test, the significance level is calculated using the mean and standard deviation, and calculating a standard deviation requires at least two observations per group. If any group does not have at least two observations, the nonparametric alternative, which works through comparisons of the mean ranks of the data, must be selected.

For a small sample size (on average ≤15 observations per group), normality tests are less sensitive to non-normality, and there is a chance of detecting normality despite the data being non-normal. It is recommended that when the sample size is small, a parametric method should be used only on highly normally distributed data; otherwise the corresponding nonparametric method should be preferred. Similarly, with a sufficient or large sample size (on average >15 observations per group), most normality tests are highly sensitive to non-normality, and there is a chance of wrongly detecting non-normality despite the data being normal. It is recommended that when the sample size is sufficient, a nonparametric method should be used only on highly non-normal data; otherwise the corresponding parametric method should be preferred.[12]
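
This decision flow can be scripted. The sketch below (hypothetical data) checks normality in each group with the Shapiro-Wilk test and then picks the parametric or nonparametric two-group comparison accordingly; the caveats above about the sensitivity of normality tests at small and large sample sizes still apply.

```python
# Choose between a parametric and nonparametric two-group test after
# checking normality (hypothetical MAP data).
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
treatment = rng.normal(95, 10, 25)    # hypothetical MAP values
control   = rng.normal(100, 10, 25)

alpha = 0.05
normal = (stats.shapiro(treatment).pvalue > alpha and
          stats.shapiro(control).pvalue > alpha)

if normal:
    stat, p = stats.ttest_ind(treatment, control)      # parametric
    test = "independent samples t-test"
else:
    stat, p = stats.mannwhitneyu(treatment, control)   # nonparametric
    test = "Mann-Whitney U test"

print(f"{test}: statistic={stat:.2f}, p={p:.4f}")
```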

Minimum Sample Size Required for Statistical Methods

To detect a significant difference between means/medians/mean ranks/proportions at a minimum level of confidence (usually 95%) and power of the test (usually 80%), the number of individuals/subjects (sample size) required depends on the detectable effect size. The effect size and the required sample size are inversely related: at the same level of confidence and power, as the effect size increases, the required sample size decreases. In summary, no minimum or maximum sample size is fixed for any particular statistical method; it must be estimated from the given inputs, including effect size, level of confidence, power of the study, etc. Only with a sufficient sample size can we detect a difference significantly. If the sample size is smaller than actually required, the study will be underpowered to detect the given difference, and the result will be statistically insignificant.
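
A minimal sketch of this inverse relationship, using the power-analysis utilities in statsmodels for an independent two-sample t-test; the effect sizes are standard Cohen's d benchmarks, not values from the article.

```python
# Required sample size per group shrinks as the effect size grows.
from statsmodels.stats.power import TTestIndPower

analysis = TTestIndPower()
for effect_size in (0.2, 0.5, 0.8):   # Cohen's d: small, medium, large
    n = analysis.solve_power(effect_size=effect_size, power=0.80, alpha=0.05)
    print(f"d={effect_size}: {n:.0f} subjects per group")

# Roughly 394 subjects per group are needed for d=0.2,
# but only about 26 per group for d=0.8.
```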

Impact of Wrong Selection of the Statistical Methods

For each and every situation, there are specific statistical methods. Failing to select the appropriate statistical method affects the significance level as well as the conclusions.[13] For example, in a study, the systolic blood pressure (mean ± SD) of the control group (126.45 ± 8.85, n1 = 20) and the treatment group (121.85 ± 5.96, n2 = 20) was compared using the independent samples t-test (correct practice). The result showed that the mean difference between the two groups was statistically insignificant (P = 0.061), while on the same data, the paired samples t-test (incorrect practice) indicated that the mean difference was statistically significant (P = 0.011). Due to the incorrect practice, a statistically significant difference between the groups was detected even though no difference actually existed.
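
The sketch below reproduces the shape of this mistake on simulated data (the article's raw values are not given, so the exact P-values will differ): the same two samples are analyzed once with the correct independent samples t-test and once with a misapplied paired t-test.

```python
# Correct vs. incorrect test choice on the same (simulated) data.
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
control   = rng.normal(126.45, 8.85, 20)   # matches the summary stats above
treatment = rng.normal(121.85, 5.96, 20)

t_ind, p_ind = stats.ttest_ind(control, treatment)   # correct: unpaired data
t_rel, p_rel = stats.ttest_rel(control, treatment)   # incorrect: assumes pairing

print(f"independent t-test:          p={p_ind:.3f}")
print(f"paired t-test (misapplied):  p={p_rel:.3f}")
```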

Conclusions

Selection of the appropriate statistical method is very important for quality research. It is important that a researcher knows the basic concepts of the statistical methods used to conduct a research study so that it produces valid and reliable results. Various statistical methods can be used in different situations, and each test makes particular assumptions about the data; these assumptions should be taken into consideration when deciding which test is the most appropriate. Wrong or inappropriate use of statistical methods may lead to defective conclusions and ultimately harm evidence-based practice. Hence, adequate knowledge of statistics and the appropriate use of statistical tests are important for improving and producing quality biomedical research. However, it is extremely difficult for a biomedical researcher or academician to learn all of statistics; at least a basic knowledge is therefore very important, so that appropriate statistical methods can be selected and correct or incorrect practices can be recognized in published research. Many software packages are available, online as well as offline, for analyzing data, although deciding which set of statistical tests is appropriate for the given data and study objective remains very difficult for researchers. Therefore, from the planning of the study through data collection, analysis, and finally the review process, proper consultation with statistical experts may be a good option; it can reduce the burden on clinicians of going into the depths of statistics, which requires a lot of time and effort and ultimately affects their clinical work. These practices not only ensure the correct and appropriate use of biostatistical methods in research but also ensure the highest quality of statistical reporting in research and journals.[14]

Financial support and sponsorship

Conflicts of interest

There are no conflicts of interest.

Acknowledgements

The authors would like to express their deep and sincere gratitude to Dr. Prabhat Tiwari, Professor, Department of Anaesthesiology, Sanjay Gandhi Postgraduate Institute of Medical Sciences, Lucknow, for his encouragement to write this article. His critical reviews and suggestions were very useful for improving the article.

Table of Contents

  • Types of Statistical Analysis
  • Importance of Statistical Analysis
  • Benefits of Statistical Analysis
  • Statistical Analysis Process
  • Statistical Analysis Methods
  • Statistical Analysis Software
  • Statistical Analysis Examples
  • Career in Statistical Analysis
  • Choose the Right Program
  • Become Proficient in Statistics Today

What Is Statistical Analysis?

Statistical analysis is the process of collecting and analyzing data in order to discern patterns and trends. It is a method for removing bias from evaluating data by employing numerical analysis. This technique is useful for collecting the interpretations of research, developing statistical models, and planning surveys and studies.

Statistical analysis is a scientific tool in AI and ML that helps collect and analyze large amounts of data to identify common patterns and trends to convert them into meaningful information. In simple words, statistical analysis is a data analysis tool that helps draw meaningful conclusions from raw and unstructured data. 

The conclusions drawn using statistical analysis facilitate decision-making and help businesses make future predictions on the basis of past trends. Statistical analysis can be defined as the science of collecting and analyzing data to identify trends and patterns and present them. It involves working with numbers and is used by businesses and other institutions to derive meaningful information from data.

Given below are the 6 types of statistical analysis:

Descriptive Analysis

Descriptive statistical analysis involves collecting, interpreting, analyzing, and summarizing data to present them in the form of charts, graphs, and tables. Rather than drawing conclusions, it simply makes the complex data easy to read and understand.

Inferential Analysis

The inferential statistical analysis focuses on drawing meaningful conclusions on the basis of the data analyzed. It studies the relationship between different variables or makes predictions for the whole population.

Predictive Analysis

Predictive statistical analysis is a type of statistical analysis that analyzes data to derive past trends and predict future events on the basis of them. It uses machine learning algorithms, data mining, data modeling, and artificial intelligence to conduct the statistical analysis of data.

Prescriptive Analysis

Prescriptive analysis examines the data and prescribes the best course of action based on the results. It is a type of statistical analysis that helps you make an informed decision.

Exploratory Data Analysis

Exploratory analysis is similar to inferential analysis, but the difference is that it involves exploring the unknown data associations. It analyzes the potential relationships within the data. 

Causal Analysis

The causal statistical analysis focuses on determining the cause and effect relationship between different variables within the raw data. In simple words, it determines why something happens and its effect on other variables. This methodology can be used by businesses to determine the reason for failure. 

Statistical analysis eliminates unnecessary information and catalogs important data in an uncomplicated manner, making the monumental work of organizing inputs manageable. Once the data has been collected, statistical analysis may be utilized for a variety of purposes. Some of them are listed below:

  • Statistical analysis helps summarize enormous amounts of data into clearly digestible chunks.
  • Statistical analysis aids in the effective design of laboratory, field, and survey investigations.
  • Statistical analysis may help with solid and efficient planning in any field of study.
  • Statistical analysis helps establish broad generalizations and forecast how much of something will occur under particular conditions.
  • Statistical methods, which are effective tools for interpreting numerical data, are applied in practically every field of study. Statistical approaches have been created and are increasingly applied in the physical and biological sciences, such as genetics.
  • Statistical approaches are used in the work of businessmen, manufacturers, and researchers. Statistics departments can be found in banks, insurance businesses, and government agencies.
  • A modern administrator, whether in the public or commercial sector, relies on statistical data to make correct decisions.
  • Politicians can utilize statistics to support and validate their claims while also explaining the issues they address.

Statistical analysis can be called a boon to mankind and has many benefits for both individuals and organizations. Given below are some of the reasons why you should consider investing in statistical analysis:

  • It can help you determine the monthly, quarterly, and yearly figures of sales, profits, and costs, making it easier to make decisions.
  • It can help you make informed and correct decisions.
  • It can help you identify the problem or cause of a failure and make corrections. For example, it can identify the reason for an increase in total costs and help you cut wasteful expenses.
  • It can help you conduct market analysis and craft an effective marketing and sales strategy.
  • It helps improve the efficiency of different processes.

Given below are the 5 steps to conduct a statistical analysis that you should follow:

  • Step 1: Identify and describe the nature of the data that you are supposed to analyze.
  • Step 2: The next step is to establish a relation between the data analyzed and the sample population to which the data belongs. 
  • Step 3: The third step is to create a model that clearly presents and summarizes the relationship between the population and the data.
  • Step 4: Prove if the model is valid or not.
  • Step 5: Use predictive analysis to predict future trends and events likely to happen. 

Although there are various methods used to perform data analysis, given below are the 5 most used and popular methods of statistical analysis:

Mean

The mean, or average, is one of the most popular methods of statistical analysis. The mean determines the overall trend of the data and is very simple to calculate: sum the numbers in the data set and divide by the number of data points. Despite the ease of calculation and its benefits, it is not advisable to rely on the mean as the only statistical indicator, as doing so can result in inaccurate decision-making.

Standard Deviation

Standard deviation is another very widely used statistical method. It analyzes the deviation of different data points from the mean of the entire data set and determines how the data are spread around the mean. You can use it to decide whether the research outcomes can be generalized.

Regression

Regression is a statistical tool that models the relationship between variables, typically between a dependent variable and one or more independent variables. It is generally used to predict future trends and events.
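
For instance, here is a minimal sketch of simple linear regression with scikit-learn, using hypothetical monthly sales figures; the fitted slope and intercept describe the trend, and the model can then extrapolate it.

```python
# Simple linear regression on hypothetical monthly sales data.
import numpy as np
from sklearn.linear_model import LinearRegression

months = np.arange(1, 13).reshape(-1, 1)   # independent variable (month index)
sales = 100 + 5 * months.ravel() + np.random.default_rng(0).normal(0, 3, 12)

model = LinearRegression().fit(months, sales)
print(f"slope={model.coef_[0]:.2f}, intercept={model.intercept_:.2f}")
print(f"forecast for month 13: {model.predict([[13]])[0]:.1f}")
```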

Hypothesis Testing

Hypothesis testing can be used to test the validity of a conclusion or argument against a data set. A hypothesis is an assumption made at the beginning of the research, and it can hold up or be rejected based on the analysis results.
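
For instance, a minimal sketch of a one-sample t-test with SciPy on hypothetical data, testing the null hypothesis that the population mean equals 50.

```python
# One-sample t-test of H0: population mean = 50 (hypothetical data).
import numpy as np
from scipy import stats

sample = np.array([52.1, 48.3, 51.7, 53.2, 49.8, 54.0, 50.9, 52.5])
t_stat, p_value = stats.ttest_1samp(sample, popmean=50)

# Reject H0 at the 5% significance level if p_value < 0.05.
print(f"t={t_stat:.2f}, p={p_value:.4f}")
```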

Sample Size Determination

Sample size determination, or data sampling, is a technique used to derive a sample from the entire population that is representative of the population. This method is used when the size of the population is very large. You can choose from various data sampling techniques, such as snowball sampling, convenience sampling, and random sampling.

Not everyone can perform very complex statistical calculations accurately, which makes statistical analysis a time-consuming and costly process. Statistical software has therefore become a very important tool for companies performing data analysis. Such software uses artificial intelligence and machine learning to perform complex calculations, identify trends and patterns, and create charts, graphs, and tables accurately within minutes.

Look at the standard deviation sample calculation given below to understand more about statistical analysis.

The weights of 5 pizza bases are as follows: 9, 2, 5, 4, 12.

Calculation of mean = (9 + 2 + 5 + 4 + 12)/5 = 32/5 = 6.4

Squared deviations from the mean: (9 - 6.4)² = 6.76, (2 - 6.4)² = 19.36, (5 - 6.4)² = 1.96, (4 - 6.4)² = 5.76, (12 - 6.4)² = 31.36

Mean of the squared deviations = (6.76 + 19.36 + 1.96 + 5.76 + 31.36)/5 = 65.2/5 = 13.04

This is the population variance (dividing by n); the sample variance divides by n - 1 instead, giving 65.2/4 = 16.3.

Standard deviation = √13.04 ≈ 3.611
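
The same example can be checked in a few lines of NumPy; note how `ddof` switches between the population and sample variance.

```python
# Verifying the worked example above with NumPy.
import numpy as np

weights = np.array([9, 2, 5, 4, 12])
mean = weights.mean()                 # 6.4
pop_var = weights.var()               # ddof=0: population variance, 13.04
sample_var = weights.var(ddof=1)      # ddof=1: sample variance, 16.3
pop_std = weights.std()               # population standard deviation, ~3.611

print(mean, pop_var, sample_var, round(pop_std, 3))
```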

A Statistical Analyst's career path is determined by the industry in which they work. Anyone interested in becoming a Data Analyst may usually enter the profession and qualify for entry-level Data Analyst positions right out of high school or a certificate program — potentially with a Bachelor's degree in statistics, computer science, or mathematics. Some people go into data analysis from a similar sector such as business, economics, or even the social sciences, usually by updating their skills mid-career with a statistical analytics course.

Working as a Statistical Analyst is also a great way to get started in the normally more complex area of data science. A Data Scientist is generally a more senior role than a Data Analyst, since it is more strategic in nature and necessitates a more highly developed set of technical abilities, such as knowledge of multiple statistical tools, programming languages, and predictive analytics models.

Aspiring Data Scientists and Statistical Analysts generally begin their careers by learning a programming language such as R or SQL. Following that, they must learn how to create databases, do basic analysis, and make visuals using applications such as Tableau. However, not every Statistical Analyst will need to know how to do all of these things, but if you want to advance in your profession, you should be able to do them all.

Based on your industry and the sort of work you do, you may opt to study Python or R, become an expert at data cleaning, or focus on developing complicated statistical models.

You could also learn a little bit of everything, which might help you take on a leadership role and advance to the position of Senior Data Analyst. A Senior Statistical Analyst with vast and deep knowledge might take on a leadership role leading a team of other Statistical Analysts. Statistical Analysts with extra skill training may be able to advance to Data Scientists or other more senior data analytics positions.

Supercharge your career in AI and ML with Simplilearn's comprehensive courses. Gain the skills and knowledge to transform industries and unleash your true potential. Enroll now and unlock limitless possibilities!

  • AI Engineer (Simplilearn): All Geos; 11 months; basic coding experience required; 10+ skills including data structure, data manipulation, NumPy, Scikit-Learn, Tableau and more; additional benefits: access to exclusive Hackathons, Masterclasses and Ask-Me-Anything sessions by IBM, plus applied learning via 3 Capstone and 12 industry-relevant projects; cost: $$
  • Post Graduate Program in Artificial Intelligence (Purdue): All Geos; 11 months; basic coding experience required; 16+ skills including chatbots, NLP, Python, Keras and more; additional benefits: Purdue Alumni Association membership, free 6-month IIMJobs Pro membership, and resume-building assistance; cost: $$$$
  • Post Graduate Program in Artificial Intelligence (Caltech): IN/ROW; 11 months; no coding experience required; 8+ skills including supervised and unsupervised learning, deep learning, data visualization and more; additional benefits: up to 14 CEU credits and Caltech CTME Circle membership; cost: $$$$

We hope this article helped you understand the importance of statistical analysis in every sphere of life. Artificial intelligence (AI) can help you perform statistical analysis and data analysis effectively and efficiently.

If you are fascinated by the role of AI in statistical analysis, check out the Caltech Post Graduate Program in AI & ML, offered in collaboration with Caltech. With a comprehensive syllabus and real-life projects, this popular course will give you all you need to know about artificial intelligence.


Federal Statistical Research Data Center

In partnership with the U.S. Census Bureau, IRiSS hosts one of the nation's 31 Federal Statistical Research Data Centers, providing access to restricted data for researchers from Stanford as well as nearby universities. Approved projects can access microdata from the Census Bureau's economic and demographic surveys and censuses, restricted-use data from the National Center for Health Statistics (NCHS) and the Agency for Healthcare Research and Quality (AHRQ), and administrative data from partner federal, state, and local governments.

The Stanford RDC administrator position is currently open. Please direct any questions to stephanie.m.bailey [at] census.gov (Stephanie Bailey), Administrator for the Yale Research Data Center.

As an economist who studies interactions between workers and firms, I rely heavily on rich U.S. Census Bureau data, which is housed in the RDC system. Consequently, the vast majority of my work could not be done without access to an RDC. As a grad student, having easy and nearby access to the Stanford RDC, hosted by IRiSS, allowed me to start and develop this research agenda; the RDC was a key input into all works in my dissertation and ultimately allowed me to pursue a career in academia.

— Melanie Wallskog (Economics, '22)

What Data Are Available in Federal Statistical Research Data Centers?

The Federal Statistical Research Data Centers (RDCs) located across the country—managed and administered by the U.S. Census Bureau—make two types of restricted data accessible to approved researchers: survey and census data, and administrative data.

The former is largely composed of data collected by the U.S. Census Bureau in the course of its surveys and censuses, such as the decennial census and the American Community Survey. As a whole, the content of these Census Bureau RDC data can be broadly categorized as demographic (e.g., decennial census, American Community Survey, Current Population Survey, American Housing Survey, National Crime Victimization Survey) and economic (e.g., Longitudinal Business Database, economic censuses, Annual Survey of Manufactures, Survey of Business Owners, Longitudinal Firm Trade Transactions Database).



Household Debt Rose by $184 Billion in Q1 2024; Delinquency Transition Rates Increased Across All Debt Types

NEW YORK — The Federal Reserve Bank of New York's Center for Microeconomic Data today issued its Quarterly Report on Household Debt and Credit. The report shows total household debt increased by $184 billion (1.1%) in the first quarter of 2024, to $17.69 trillion. The report is based on data from the New York Fed's nationally representative Consumer Credit Panel.

The New York Fed also issued an accompanying Liberty Street Economics blog post examining credit card utilization and its relationship with delinquency. The Quarterly Report includes a one-page summary of key takeaways and their supporting data points.

“In the first quarter of 2024, credit card and auto loan transition rates into serious delinquency continued to rise across all age groups,” said Joelle Scally, Regional Economic Principal within the Household and Public Policy Research Division at the New York Fed. “An increasing number of borrowers missed credit card payments, revealing worsening financial distress among some households.”

Mortgage balances rose by $190 billion from the previous quarter and stood at $12.44 trillion at the end of March. Balances on home equity lines of credit (HELOC) increased by $16 billion, the eighth consecutive quarterly increase since Q1 2022, and now stand at $376 billion. Credit card balances decreased by $14 billion to $1.12 trillion. Other balances, which include retail cards and consumer loans, also decreased by $11 billion. Auto loan balances increased by $9 billion, continuing the upward trajectory seen since 2020, and now stand at $1.62 trillion.

Mortgage originations continued increasing at the same pace seen in the previous three quarters, and now stand at $403 billion. Aggregate limits on credit card accounts increased modestly by $63 billion, representing a 1.3% increase from the previous quarter. Limits on HELOC grew by $30 billion and have grown by 14% over the past two years, after 10 years of observed declines.

Aggregate delinquency rates increased in Q1 2024, with 3.2% of outstanding debt in some stage of delinquency at the end of March. Delinquency transition rates increased for all debt types. Annualized, approximately 8.9% of credit card balances and 7.9% of auto loans transitioned into delinquency. Delinquency transition rates for mortgages increased by 0.3 percentage points yet remain low by historic standards.
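As a back-of-the-envelope check on the headline figure, the 1.1% quarterly change follows from the reported levels. This is a rough check, since the report rounds its figures:

```python
# A quick check of the headline percent change, using figures from the report.
new_total = 17.69e12   # total household debt, end of Q1 2024 (dollars)
increase = 184e9       # reported quarterly increase (dollars)

previous_total = new_total - increase
pct_change = increase / previous_total * 100

print(f"quarterly change: {pct_change:.1f}%")  # ~1.1%, matching the report
```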

[Table: Household Debt and Credit Developments as of Q1 2024, including flow into serious delinquency (90 days or more delinquent). Changes are shown from Q4 2023 to Q1 2024 and from Q1 2023 to Q1 2024.]

About the Report

The Federal Reserve Bank of New York's Household Debt and Credit Report provides unique data and insight into the credit conditions and activity of U.S. consumers. Based on data from the New York Fed's Consumer Credit Panel, a nationally representative sample drawn from anonymized Equifax credit data, the report provides a quarterly snapshot of household trends in borrowing and indebtedness, including data about mortgages, student loans, credit cards, auto loans and delinquencies. The report aims to help community groups, small businesses, state and local governments and the public to better understand, monitor, and respond to trends in borrowing and indebtedness at the household level. Sections of the report are presented as interactive graphs on the New York Fed's Household Debt and Credit Report web page, and the full report is available for download.



Century of statistical ecology reviewed

Crunching numbers isn't exactly how Neil Gilbert, a postdoctoral researcher at Michigan State University, envisioned a career in ecology.

"I think it's a little funny that I'm doing this statistical ecology work because I was always OK at math, but never particularly enjoyed it," he explained. "As an undergrad, I thought, I'll be an ecologist -- that means that I can be outside, looking at birds, that sort of thing."

"As it turns out," he chuckled, "ecology is a very quantitative discipline."

Now, working in the Zipkin Quantitative Ecology lab, Gilbert is the lead author on a new article in a special collection of the journal Ecology that reviews the past century of statistical ecology.

Statistical ecology, or the study of ecological systems using mathematical equations, probability and empirical data, has grown over the last century. As increasingly large datasets and complex questions took center stage in ecological research, new tools and approaches were needed to properly address them.

To better understand how statistical ecology changed over the last century, Gilbert and his fellow authors examined a selection of 36 highly cited papers on statistical ecology -- all published in Ecology since its inception in 1920.

The team's paper examines work on statistical models across a range of ecological scales, from individuals to populations, communities, ecosystems and beyond. The team also reviewed publications providing practical guidance on applying models. Gilbert noted that because "many practicing ecologists lack extensive quantitative training," such publications are key to shaping studies.

Ecology is an advantageous place for such papers because it is one of "the first internationally important journals in the field. It has played an outsized role in publishing important work," said lab leader Elise Zipkin, a Red Cedar Distinguished Associate Professor in the Department of Integrative Biology.

"It has a reputation of publishing some of the most influential papers on the development and application of analytical techniques from the very beginning of modern ecological research."

The team found a persistent evolution of models and concepts in the field, especially over the past few decades, driven by refinements in techniques and exponential increases in computational power.

"Statistical ecology has exploded in the last 20 to 30 years because of advances in both data availability and the continued improvement of high-performance computing clusters," Gilbert explained.

Included among the 36 reviewed papers were a landmark 1945 study by Lee R. Dice on predicting the co-occurrence of species in space -- Ecology's most highly cited paper of all time -- and an influential 2002 paper led by Darryl MacKenzie on occupancy models. Ecologists use these models to identify the range and distribution of species in an environment.

MacKenzie's work on species detection and sampling "played an outsized role in the study of species distributions," said Zipkin. His paper, which has been cited more than 5,400 times, spawned various software packages that are now widely used by ecologists, she explained.
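To give a flavor of the occupancy models discussed here: a site is occupied with probability psi, an occupied site yields a detection on each survey visit with probability p, and an all-zero detection history can arise either from true absence or from repeatedly missing the species. The sketch below implements only this core likelihood idea, a simplification rather than the full model from MacKenzie's paper:

```python
# A minimal sketch of the single-season occupancy likelihood idea behind
# MacKenzie et al. (2002): psi = occupancy probability, p = per-visit
# detection probability. Illustrative only, not the paper's full model.

def site_likelihood(history, psi, p):
    """Likelihood of one site's detection history (list of 0/1 visits)."""
    detections = sum(history)
    visits = len(history)
    # Probability of this history if the site is truly occupied
    occupied = psi * (p ** detections) * ((1 - p) ** (visits - detections))
    if detections == 0:
        # An all-zero history can also arise from a truly unoccupied site
        return occupied + (1 - psi)
    return occupied

# Example: 3 visits, species seen on the second visit only
print(site_likelihood([0, 1, 0], psi=0.6, p=0.4))  # 0.6 * 0.4 * 0.36 = 0.0864
```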


Story Source:

Materials provided by Michigan State University. Original written by Caleb Hess.

Journal Reference:

  • Neil A. Gilbert, Bruna R. Amaral, Olivia M. Smith, Peter J. Williams, Sydney Ceyzyk, Samuel Ayebare, Kayla L. Davis, Wendy Leuenberger, Jeffrey W. Doser, Elise F. Zipkin. A century of statistical ecology. Ecology, 2024; DOI: 10.1002/ecy.4283



Experimental R&D Value Added Statistics for the U.S. and States Now Available

Research and development activity accounted for 2.3 percent of the U.S. economy in 2021, according to new experimental statistics released today by the Bureau of Economic Analysis. R&D as a share of each state’s gross domestic product, or GDP, ranged from 0.3 percent in Louisiana and Wyoming to 6.3 percent in New Mexico, home to federally funded Los Alamos National Laboratory and Sandia National Laboratories.

[Map: R&D value added as a percent of state GDP]

These statistics are part of a new Research and Development Satellite Account BEA is developing in partnership with the National Center for Science and Engineering Statistics of the National Science Foundation. The statistics complement BEA's national data on R&D investment and provide BEA's first state-by-state numbers on R&D.

The new statistics, covering 2017 to 2021, provide information on the contribution of R&D to GDP (known as R&D value added), compensation, and employment for the nation, all 50 states, and the District of Columbia. In the state statistics, R&D is attributed to the state where the R&D is performed.

Some highlights from the newly released statistics:

R&D activity is highly concentrated in the United States. The top ten R&D-producing states account for 70 percent of U.S. R&D value added. California alone accounts for almost a third of U.S. R&D. Other top R&D-producing states include Washington, Massachusetts, Texas, and New York.
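The concentration figures above are simple shares of a total. The sketch below shows the calculation with hypothetical placeholder numbers, not BEA's actual state statistics:

```python
# A minimal sketch of computing a top-10 concentration share from
# state-level R&D value added. The numbers are hypothetical placeholders,
# in billions of dollars, not BEA's actual figures.
state_rd_value_added = {
    "California": 170.0, "Washington": 45.0, "Massachusetts": 40.0,
    "Texas": 35.0, "New York": 30.0, "New Jersey": 20.0,
    "Michigan": 18.0, "Pennsylvania": 17.0, "Illinois": 16.0,
    "Maryland": 15.0, "All others": 170.0,
}

total = sum(state_rd_value_added.values())
top10 = sorted(
    (v for k, v in state_rd_value_added.items() if k != "All others"),
    reverse=True,
)[:10]

print(f"top-10 share: {sum(top10) / total:.0%}")                       # ~70%
print(f"California share: {state_rd_value_added['California'] / total:.0%}")
```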

[Chart: State ranking by R&D value added]

Treating R&D as a sector allows for comparisons with other industries and sectors of the U.S. economy. For instance, R&D’s share of U.S. value added in 2021 is similar to hospitals (2.4 percent) and food services and drinking places (2.2 percent).

[Chart: Comparison of R&D with other sectors]

Eighty-five percent of R&D value added is generated by the business sector, followed by government and nonprofit institutions serving households.

Within the business sector, the professional, scientific, and technical services industry accounts for 40 percent of business R&D value added. Information (15 percent), chemical manufacturing (12 percent), and computer and electronic product manufacturing (11 percent) also account for sizable shares.

[Chart: R&D value added by industry within the business sector]

Visit the R&D Satellite Account on BEA's website for the full set of experimental statistics and accompanying information. To help refine the methodology and presentation of these statistics, BEA is seeking your feedback. Please submit comments to [email protected].



Older Adult Falls Data

At a Glance

Falls are the leading cause of fatal and nonfatal injuries for adults ages 65 years and older. 1 Over 14 million, or 1 in 4, older adults report falling every year. 2 Older adult falls are common, costly, and preventable. The interactive chart linked below shows the number of older adult fall-related deaths by month and year, including the most recent provisional data available.

View older adult fall trends

Keep reading to learn more about falls and fall death rates in your state.

Older adult falls reported by state

In the United States, over 14 million, or one in four, adults ages 65 and older (older adults) report falling each year. 2 While not all falls result in an injury, about 37% of those who fall reported an injury that required medical treatment or restricted their activity for at least one day, resulting in an estimated nine million fall injuries. 3

While older adult falls are common across all states, there is variability. 2

[Interactive map: percentage of older adults who reported falling, by state and year]

Data source: Centers for Disease Control and Prevention. Behavioral Risk Factor Surveillance System (BRFSS) – https://www.cdc.gov/brfss/annual_data/annual_2020.html

*Age-adjusted percentages standardized to the 2000 U.S. population with age groups 65–74, 75–84, and ≥85 years using the direct method.

Note for grayed-out states on the 2012 map: In the 2012 BRFSS survey, Michigan, Oregon, and Wisconsin used a different falls question from the rest of the states. Therefore, the 2012 falls estimates could not be calculated for these states.

Deaths from older adult falls

Falls are the leading cause of injury-related death among adults ages 65 and older, and the fall death rate is increasing. 4 The age-adjusted fall death rate increased by 41% from 55.3 per 100,000 older adults in 2012 to 78.0 per 100,000 older adults in 2021. 5
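The age-adjusted rates cited throughout this section use the direct method described in the footnotes: each age group's death rate is weighted by that group's share of the 2000 U.S. standard population, so rates remain comparable across years even as the population ages. Here is a minimal sketch with hypothetical age-specific rates and illustrative weights:

```python
# A minimal sketch of direct-method age adjustment for the 65+ population.
# The age-specific rates are hypothetical, and the standard-population
# weights are illustrative shares of the 2000 U.S. population ages 65+,
# not exact values.
age_specific_rates = {  # deaths per 100,000, made up for illustration
    "65-74": 30.0,
    "75-84": 90.0,
    "85+": 280.0,
}
standard_weights = {    # each group's share of the 65+ standard population
    "65-74": 0.53,
    "75-84": 0.35,
    "85+": 0.12,
}

age_adjusted_rate = sum(
    age_specific_rates[g] * standard_weights[g] for g in age_specific_rates
)
print(f"age-adjusted rate: {age_adjusted_rate:.1f} per 100,000")
```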

The rising number of deaths from falls among older adults can be addressed by screening for fall risk and intervening to address risk factors such as use of medicines that may increase fall risk, or poor strength and balance.

For more information on how to screen, assess, and intervene to reduce fall risk, visit www.cdc.gov/STEADI .

Data source: Centers for Disease Control and Prevention. National Center for Health Statistics. National Vital Statistics System, Mortality 1999–2021 on CDC WONDER Online Database. Accessed January 24, 2023. https://wonder.cdc.gov/ucd-icd10.html

*Age-adjusted death rates standardized to the 2000 U.S. population with age groups 65–74, 75–84, and ≥85 years using the direct method.

**Rates are marked as "unreliable" when the death count is less than 20.

  • Centers for Disease Control and Prevention, National Center for Injury Prevention and Control. Web-based Injury Statistics Query and Reporting System (WISQARS) [online].
  • Kakara R, Bergen G, Burns E, Stevens M. Nonfatal and Fatal Falls Among Adults Aged ≥65 Years—United States, 2020–2021. MMWR Morbidity and Mortality Weekly Report. 2023;72:938–943. DOI: 10.15585/mmwr.mm7235a1.
  • Moreland B, Kakara R, Henry A. Trends in Nonfatal Falls and Fall-Related Injuries Among Adults Aged ≥65 Years—United States, 2012–2018. MMWR Morbidity and Mortality Weekly Report. 2020 Jul 10;69(27):875–881. DOI: 10.15585/mmwr.mm6927a5.
  • Kakara RS, Lee R, Eckstrom EN. Cause-Specific Mortality Among Adults Aged ≥65 Years in the United States, 1999 Through 2020. Public Health Reports. 2023 Mar;139(1):54–58. DOI: 10.1177/00333549231155869.
  • Centers for Disease Control and Prevention, National Center for Health Statistics. National Vital Statistics System, Mortality 1999–2020 on CDC WONDER Online Database, released in 2021. Data are from the Multiple Cause of Death Files, 1999–2020, as compiled from data provided by the 57 vital statistics jurisdictions through the Vital Statistics Cooperative Program. Accessed February 9, 2023.

Falls—and the injuries and deaths they cause—are increasing, but falls can be prevented. Learn more about Older Adult Fall Prevention.
